Title:
Memory-Efficient GroupBy-Aggregate using Compressed Buffer Trees

Author(s)
Amur, Hrishikesh
Richter, Wolfgang
Andersen, David G.
Kaminsky, Michael
Schwan, Karsten
Balachandran, Athula
Zawadzki, Erik
Abstract
Memory is rapidly becoming a precious resource in many data processing environments. This paper introduces a new data structure called the Compressed Buffer Tree (CBT). Using a combination of buffering, compression, and lazy aggregation, CBTs can improve the memory efficiency of the GroupBy-Aggregate abstraction, which forms the basis of many data processing models such as MapReduce and databases. We evaluate CBTs in the context of MapReduce aggregation and show that they can provide significant advantages over existing hash-based aggregation techniques: up to 2x less memory and 1.5x the throughput, at the cost of 2.5x the CPU usage.
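The abstract names three ingredients (buffering, compression, and lazy aggregation) without showing how they fit together. The Python sketch below is a deliberately simplified illustration of that combination applied to a word-count style GroupBy-Aggregate. It is not the CBT described in the report: it uses a flat list of compressed runs rather than a tree, and the names `BufferedAggregator`, `buffer_limit`, and `word_count`, as well as the choice of `zlib` and `pickle`, are assumptions made for the example.

```python
import pickle
import zlib
from collections import defaultdict


class BufferedAggregator:
    """Toy GroupBy-Aggregate (sum per key); hypothetical, not the report's CBT."""

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit  # max pairs held uncompressed
        self.buffer = []                  # recent (key, value) pairs
        self.compressed_runs = []         # compressed partial aggregates

    def insert(self, key, value):
        # Buffering: inserts are appended cheaply, with no per-key lookup.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:
            self._spill()

    def _spill(self):
        # Combine duplicates inside the buffer, then store the result
        # in compressed form so it occupies less memory while it waits.
        partial = defaultdict(int)
        for key, value in self.buffer:
            partial[key] += value
        payload = pickle.dumps(sorted(partial.items()))
        self.compressed_runs.append(zlib.compress(payload))
        self.buffer = []

    def finalize(self):
        # Lazy aggregation: runs are only decompressed and merged
        # when the final result is requested.
        if self.buffer:
            self._spill()
        totals = defaultdict(int)
        for run in self.compressed_runs:
            for key, value in pickle.loads(zlib.decompress(run)):
                totals[key] += value
        return dict(totals)


def word_count(words, buffer_limit=4):
    """Count occurrences of each word using the toy aggregator."""
    agg = BufferedAggregator(buffer_limit)
    for word in words:
        agg.insert(word, 1)
    return agg.finalize()
```

Calling `word_count(["a", "b", "a", "c", "b", "a"])` groups the six inputs into three keys. The sketch also hints at the trade-off the abstract reports: compressing and decompressing partial results spends extra CPU in exchange for a smaller memory footprint.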
Date Issued
2012
Resource Type
Text
Resource Subtype
Technical Report