Title:
Elf: Efficient lightweight fast stream processing at scale

Thumbnail Image
Author(s)
Hu, Liting
Authors
Advisor(s)
Schwan, Karsten
Wolf, Matthew
Advisor(s)
Editor(s)
Associated Organization(s)
Organizational Unit
Organizational Unit
Series
Supplementary to
Abstract
Large Internet companies like Facebook, Amazon, and Twitter are increasingly recognizing the value of stream data processing, using tools like Flume, Muppet, or Storm to continuously collect and process incoming data in real time to help govern company activities. Applications include monitoring marketing streams for business-critical decisions, identifying spam campaigns from social network streams, datacenter's intrusion detection and troubleshooting, and others. Technical challenges for stream processing include the following: how to scale to numerous, concurrently running streaming jobs, to coordinate across those jobs to share insights, to make online changes to job functions to adapt to new requirements or data characteristics, and for each job, to efficiently operate over different time windows. This dissertation presents a new stream processing model, termed ELF, which addresses these new challenges. ELF proposes a novel decentralized "many masters many workers'' architecture implemented over a set of agents enriching the web tier of datacenter systems. ELF uses a DHT protocol to assign the jobs respective sets of master/workers mapping to the agents of the web tier, where for each job, the live data streams generated by webservers are first divided into mini-batches, then inserted and aggregated as space-efficient compressed buffer trees (CBTs) in local agents' memories. Second, per-batch results are `flushed' from CBTs, to be rolled up and aggregated via shared reducer trees (SRTs), in ways that naturally balance SRT-induced load, reduce processing latencies, and allow online job changes along with cross-job coordination. An ELF prototype implemented and evaluated for a larger scale configuration demonstrates scalability, high per-node throughput, sub-second job latency, and sub-second ability to adjust the actions of jobs being run.
Sponsor
Date Issued
2016-05-10
Extent
Resource Type
Text
Resource Subtype
Dissertation
Rights Statement
Rights URI