@techreport{Owens:2005:AOG,
  title = "Assessment of Graphic Processing Units (GPUs) for Department of Defense (DoD) Digital Signal Processing (DSP) Applications",
  author = "John D. Owens and Shubhabrata Sengupta and Daniel Horn",
  year = "2005",
  month = oct,
  keywords = "FFT GPU GPGPU",
  number = "ECE-CE-2005-3",
  institution = "Department of Electrical and Computer Engineering, University of California, Davis",
  abstract = "In this report we analyze the performance of the fast Fourier
transform (FFT) on graphics hardware (the GPU), comparing it to the
best-of-class CPU implementation FFTW. We describe the FFT, the
architecture of the GPU, and how general-purpose computation is
structured on the GPU. We then identify the factors that influence
FFT performance and describe several experiments that compare these
factors between the CPU and the GPU. We conclude that the overhead
of transferring data and initiating GPU computation are
substantially higher than on the CPU, and thus for latency-critical
applications, the CPU is a superior choice. We show that the CPU
implementation is limited by computation and the GPU implementation
by GPU memory bandwidth and its lack of a writable cache. The GPU is
comparatively better suited for larger FFTs with many FFTs computed
in parallel in applications where FFT throughput is most important;
on these applications GPU and CPU performance is roughly on par. We
also demonstrate that adding additional computation to an
application that includes the FFT, particularly computation that is
GPU-friendly, puts the GPU at an advantage compared to the CPU.",