Introduction
GPUFFTW is a fast FFT library designed to exploit the computational performance and memory bandwidth of GPUs. It exploits the data parallelism available on current GPUs and pipelines the computation across the different stages of the graphics processor. It also uses an efficient tiling strategy to further improve memory performance. GPUFFTW efficiently handles large real and complex 1-D arrays at 32-bit floating-point precision on commodity GPUs. On an NVIDIA 8800 GPU, using the FFTW metric for measuring performance, our algorithm achieves over 29 GFLOPS on large 1-D FFTs. Furthermore, it achieves precision comparable to IEEE 32-bit FFT algorithms on CPUs, even on large 1-D arrays. The library supports both Windows and Linux platforms.
Please refer to the documentation for details regarding the API and the contents of the distribution. Also, please read through the system requirements below before using the library.
Note: GPUFFTW runs correctly on Windows XP with an 8800 GTX using the latest NVIDIA drivers (158.19). It also runs on Windows Vista and on earlier NVIDIA GPUs and drivers.
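The "FFTW metric" mentioned above is not spelled out on this page. As a rough sketch, assuming it refers to the convention used by the FFTW benchmark (a nominal 5 N log2 N floating-point operations for a complex transform of length N, and half that for a real transform), a measured time can be converted to GFLOPS as follows. The function name and the sample timing are hypothetical and are not GPUFFTW measurements.

```python
import math


def fftw_gflops(n, seconds, real_input=False):
    """Nominal GFLOPS for one 1-D FFT of length n.

    Assumes the FFTW benchmark convention: 5 * n * log2(n) operations
    for a complex transform, and half that count for a real transform.
    This is a nominal figure, not a count of operations actually executed.
    """
    if n < 2 or seconds <= 0:
        raise ValueError("n must be at least 2 and seconds must be positive")
    flops = 5.0 * n * math.log2(n)
    if real_input:
        flops /= 2.0
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timing: a 1M-point complex FFT taking 4 ms.
    print(round(fftw_gflops(2**20, 0.004), 2))
```

Because the operation count is nominal, figures computed this way are comparable across FFT implementations, whatever number of operations each one really performs.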
System Requirements
- OS: Microsoft Windows XP/2000 and Linux
- RAM: At least as much system RAM as the graphics processor's video memory is required.
- GPU: NVIDIA GeForce/Quadro family card with support for the following OpenGL extensions:
  - EXT_framebuffer_object
  - ARB_texture_rectangle
  - ARB_fragment_program
  - Multiple Render Targets
- The above requirements are met by NV40-based GPUs and above (GeForce 6, GeForce 7 and GeForce 8 series).
- The library has been tested on the following cards:
  - GeForce 8 series (use latest drivers)
  - GeForce 7 series
  - GeForce 6 series
  - Quadro FX 4000
  - Laptop graphics cards: GeForce 7/6 series based laptop cards
For reasonably high performance, we recommend a PC with an AGP 8X or PCI-Express NVIDIA GeForce 6800 GT or faster GPU.
- Video RAM: The video RAM determines the maximum array length that can be transformed on the GPU. A rough guideline for performing FFTs on 32-bit floats is: maximum array length in millions = video RAM in MB / 32. Therefore, on a card with 256 MB of VRAM, the maximum array length is 256/32 = 8 million real values or 4 million complex values.
- Drivers: Latest drivers from NVIDIA (version 7772 or higher for Windows, and 7664 for Linux).
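The video RAM guideline above can be written as a small calculation. This is a minimal sketch of that rule of thumb only; the function name is hypothetical, and the real limit on a given card may differ because the guideline is described as rough.

```python
def max_fft_length_millions(vram_mb):
    """Apply the rough guideline: max length (millions) = video RAM in MB / 32.

    Returns a pair (real_millions, complex_millions). A complex value
    takes two 32-bit floats, so the complex limit is half the real one.
    """
    if vram_mb <= 0:
        raise ValueError("vram_mb must be positive")
    real_millions = vram_mb / 32
    complex_millions = real_millions / 2
    return real_millions, complex_millions


if __name__ == "__main__":
    for vram_mb in (128, 256, 512):
        real_m, complex_m = max_fft_length_millions(vram_mb)
        print(f"{vram_mb} MB: {real_m:g}M real values or {complex_m:g}M complex values")
```

For a 256 MB card this reproduces the figures quoted above: 8 million real values or 4 million complex values.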
Note:
- FFTW: We are not porting FFTW to GPUs, and our project is not related to FFTW. FFTW is a more general library designed mainly for CPUs. GPUFFTW is a fast FFT library for GPUs, as FFTW is for CPUs; there is no other similarity between the two projects.
- ATI cards: ATI cards are not supported in the present release of GPUFFTW, mainly due to the lack of support for ARB_texture_rectangle in fragment programs on current ATI drivers. These cards will be supported in future releases.
- Higher-dimensional FFTs: Our current code only handles 1-D power-of-two single-precision FFTs. Future releases may include 2-D and 3-D FFTs.
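Since only power-of-two lengths are handled, a caller may want to check an array length before handing data to the library. The helpers below are a minimal sketch with hypothetical names and are not part of the GPUFFTW API. Note that zero-padding an array up to the next power of two changes the transform being computed, so whether padding is acceptable depends on the application.

```python
def is_power_of_two(n):
    """Return True if n is a positive integer power of two."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n):
    """Return the smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    for length in (1000, 1024, 1500000):
        print(length, is_power_of_two(length), next_power_of_two(length))
```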
©2003 Department of Computer Science, UNC Chapel Hill