Title Assessment of Graphic Processing Units (GPUs) for Department of Defense (DoD) Digital Signal Processing (DSP) Applications (Tech Report)
Author(s) John D. Owens, Shubhabrata Sengupta, Daniel Horn
Year October 2005
Institution Department of Electrical and Computer Engineering, University of California, Davis
Abstract In this report we analyze the performance of the fast Fourier transform (FFT) on graphics hardware (the GPU), comparing it to the best-of-class CPU implementation, FFTW. We describe the FFT, the architecture of the GPU, and how general-purpose computation is structured on the GPU. We then identify the factors that influence FFT performance and describe several experiments that compare these factors between the CPU and the GPU. We conclude that the overheads of transferring data and initiating GPU computation are substantially higher than on the CPU, and thus for latency-critical applications, the CPU is a superior choice. We show that the CPU implementation is limited by computation and the GPU implementation by GPU memory bandwidth and its lack of a writable cache. The GPU is comparatively better suited to larger FFTs, with many FFTs computed in parallel, in applications where FFT throughput is most important; for these applications, GPU and CPU performance are roughly on par. We also demonstrate that adding computation to an application that includes the FFT, particularly computation that is GPU-friendly, puts the GPU at an advantage compared to the CPU.
Note We gratefully acknowledge the financial support of Lockheed-Martin and the Department of Defense that made this work possible.