Title:
Efficient Relational Algebra Algorithms and Data Structures for GPU
Efficient Relational Algebra Algorithms and Data Structures for GPU
Authors
Diamos, Gregory Frederick
Wu, Haicheng
Lele, Ashwin
Wang, Jin
Wu, Haicheng
Lele, Ashwin
Wang, Jin
Authors
Advisors
Advisors
Associated Organizations
Organizational Unit
Series
Collections
Supplementary to
Permanent Link
Abstract
Relational databases remain an important application domain for organizing and analyzing the massive
volume of data generated as sensor technology, retail and inventory transactions, social media, computer
vision, and new fields continue to evolve. At the same time, processor architectures are beginning
to shift towards hierarchical and parallel architectures employing throughput-optimized memory systems,
lightweight multi-threading, and Single-Instruction Multiple-Data (SIMD) core organizations. Examples
include general purpose graphics processing units (GPUs) such as NVIDIA’s Fermi, Intels Sandy Bridge,
and AMD’s Fusion processors. This paper explores the mapping of primitive relational algebra operations onto GPUs. In particular, we
focus on algorithms and data structure design identifying a fundamental conflict between the structure of
algorithms with good computational complexity and that of algorithms with memory access patterns and
instruction schedules that achieve peak machine utilization. To reconcile this conflict, our design space
exploration converges on a hybrid multi-stage algorithm that devotes a small amount of the total runtime
to prune input data sets using an irregular algorithm with good computational complexity. The partial
results are then fed into a regular algorithm that achieves near peak machine utilization. The design process
leading to the most efficient algorithm for each stage is described, detailing alternative implementations,
their performance characteristics, and an explanation of why they were ultimately abandoned. The least efficient algorithm (JOIN) achieves 57% 72% of peak machine performance depending
on the density of the input. The most efficient algorithms (PRODUCT, PROJECT, and SELECT) achieve
86% 92% of peak machine performance across all input data sets. To the best of our knowledge, these
represent the best known published results to date for any implementations. This work lays the foundation
for the development of a relational database system that achieves good scalability on a Multi-level Bulk-
Synchronous-Parallel (Multi-BSP) processor architecture exemplified by modern GPUs.
Sponsor
Date Issued
2012-02-01
Extent
Resource Type
Text
Resource Subtype
Technical Report