CubeCache: Efficient and Scalable Processing of OLAP Aggregation Queries in a Peer-to-Peer Network

Author(s)
Seshadri, Sangeetha
Cooper, Brian F.
Abstract
Peer-to-peer (P2P) data sharing systems are emerging as a promising infrastructure for collaborative data sharing among multiple geographically distributed data centers within a large enterprise. This paper presents CubeCache, a peer-to-peer system for efficiently serving OLAP queries and data cube aggregations in a distributed data warehouse system. CubeCache combines multiple client caches into a single query processing and caching system. Compared to existing peer-to-peer systems, the CubeCache solution has a number of unique features. First, we add a query processing layer to perform in-network data aggregation over peer caches. Second, we introduce the concept of Query-Trails: a cache listing recent data requestors. Query-Trails make it easier to find caches that are likely to have data needed for a query. Third, we design a benefit measure that incorporates the 'rarity' of a chunk into the notion of benefit, allowing controlled replication of chunks in a system plagued by frequent node departures or failures. We report the results of an analysis and an experimental study, using simulations and an implemented prototype, which show that the CubeCache solution reduces server load, improves query throughput, and reduces query latency for OLAP tasks.
Date
2007
Resource Type
Text
Resource Subtype
Technical Report