Fast Computation of Database Operations Using Graphics Processors

Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming C. Lin, and Dinesh Manocha

We present new algorithms for performing fast computation of several common database operations on commodity graphics processors. Specifically, we consider operations such as conjunctive selections, aggregations, and semi-linear queries, which are essential computational components of typical database, data warehousing, and data mining applications. While graphics processing units (GPUs) have been designed for fast display of geometric primitives, we utilize the inherent pipelining and parallelism, single instruction and multiple data (SIMD) capabilities, and vector processing functionality of GPUs, for evaluating Boolean predicate combinations and semi-linear queries on attributes and executing database operations efficiently. Our algorithms take into account some of the limitations of the programming model of current GPUs and perform no data rearrangements. We have implemented our algorithms on a programmable GPU (e.g. NVIDIA GeForce FX 5900) and applied to databases consisting of up to a million records. We have compared their performance with an optimized implementation of CPU-based algorithms. Our experiments indicate that the graphics processor available on commodity computer systems is an effective co-processor for performing database operations.

Results

Relational Query Range Query Multi-Attribute Query
Execution time of a predicate evaluation with 60% selectivity by a CPU-based algorithm and a GPU-based algorithm. Timings for the GPU-based algorithm include the time to copy data values into the depth buffer. Considering only the computation time, the GPU is nearly twenty times faster than a compiler-optimized SIMD implementation. Execution time of a range query with 60% selectivity by a CPU-based algorithm and a GPU-based algorithm. Timings for the GPU-based algorithm include the time to copy data values into the depth buffer. Considering only the computation time, the GPU is nearly forty times faster than a compiler-optimized SIMD implementation. Execution time of a multi-attribute query with 60% selectivity for each attribute and a combination of AND operator. Time i is the time to perform a query with i attributes. We show timings for CPU-based and GPU-based implementations.
K-th Largest Number #1 K-th Largest Number #2 K-th Largest Number #3
Time to compute K-th largest number on the data count attribute. We used a portion of the TCP/IP database with nearly 250,000 records. Time taken to compute the median using KthLargest and QuickSelect on varying number of records. Time taken to compute the K-th largest number by the two implementations.
  Semi-Linear Queries Accumulator  
  Execution time of a semi-linear query using four attributes of the TCP/IP database. The GPU-based implementation is nearly an order of magnitude faster than the CPU-based implementation. Time required to sum the values of an attribute by the CPU-based and by the GPU-based Accumulator algorithm.  

Contents

Fast Computation of Database Operations Using Graphics Processors in Proceedings of ACM SIGMOD 2004.

Related Links

Contact

CB #3175, Department of Computer Science
University of North Carolina
Chapel Hill, NC 27599-3175
(919) 962-1749
geom@cs.unc.edu