CubeCache: Efficient and Scalable Processing of OLAP Aggregation Queries in a Peer-to-Peer Network
Abstract
Peer-to-peer (P2P) data sharing systems are
emerging as a promising infrastructure for
collaborative data sharing among multiple
geographically distributed data centers within a
large enterprise. This paper presents CubeCache, a
peer-to-peer system for efficiently serving OLAP
queries and data cube aggregations in a distributed
data warehouse system. CubeCache combines
multiple client caches into a single query
processing and caching system. Compared to
existing peer-to-peer systems, the CubeCache
solution has a number of unique features. First, we
add a query processing layer to perform in-network
data aggregation over peer caches.
Second, we introduce the concept of Query-Trails:
a cache listing recent data requestors. Query-Trails
make it easier to find caches that are likely to have
data needed for a query. Third, we design a benefit
measure that incorporates the 'rarity' of a chunk
into the notion of benefit, allowing controlled
replication of chunks in a system plagued by
frequent node departures or failures. We report the
results of an analysis and an experimental study,
using simulations and an implemented prototype,
which show that the CubeCache solution reduces
server load, improves query throughput, and reduces
query latency for OLAP tasks.
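
As a rough illustration of the Query-Trail idea described above (a cache listing recent data requestors), the following minimal sketch keeps, for each chunk, a bounded list of the peers that most recently asked for it. The class name QueryTrail, the methods record and candidates, and the max_len parameter are all assumptions made for this example; the abstract does not specify how CubeCache actually stores or bounds its trails.

```python
from collections import OrderedDict, defaultdict


class QueryTrail:
    """Hypothetical per-chunk trail of the most recent requestor peers.

    This is an illustrative assumption, not the CubeCache data structure.
    """

    def __init__(self, max_len=3):
        self.max_len = max_len                    # assumed bound on trail size
        self._trails = defaultdict(OrderedDict)   # chunk_id -> ordered peer ids

    def record(self, chunk_id, peer_id):
        """Note that peer_id has just requested chunk_id."""
        trail = self._trails[chunk_id]
        trail.pop(peer_id, None)       # remove any older entry for this peer
        trail[peer_id] = True          # re-insert as the most recent requestor
        while len(trail) > self.max_len:
            trail.popitem(last=False)  # evict the oldest requestor

    def candidates(self, chunk_id):
        """Return peers likely to cache chunk_id, most recent first."""
        trail = self._trails.get(chunk_id)
        return list(reversed(trail)) if trail else []


trail = QueryTrail(max_len=2)
for peer in ["A", "B", "A", "C"]:
    trail.record("chunk-1", peer)
print(trail.candidates("chunk-1"))
```

A peer that receives a query for a chunk it does not hold could consult such a trail and forward the request to the listed peers before falling back to the warehouse server; that routing policy is likewise an assumption here.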
Date
2007
Resource Type
Text
Resource Subtype
Technical Report