GPU versus FPGA for high productivity computing

David Huw Jones, Adam Powell, Christos-Savvas Bouganis, Peter Y. K. Cheung

August 26, 2010

David Huw JonesGPU versus FPGA for high productivity computing1 / 21

Overview

- Just another GPU vs FPGA comparison?
- Productivity versus performance
- First review of new COTS FPGA system, HC-1
- Benchmarked on different process architectures
- Analysis from point of view of HPC user

Contents

- Devices
- Benchmarks
- Results
- Conclusions

HC-1: What is it?

[Image courtesy of Convey Computer]

- Cost: ∼ £30k
- Form: 2U
- FLOPS: 65 G
- Rated: 1.4 kW
- Typical: 600 W

HC-1: What is it?

- 5x Virtex 5
- 8x Stratix II
- 128 GB DDR2 RAM
- 80 GB/s bandwidth
- 32 300 MHz cores

GTX285: What is it?

[Image courtesy of photo.hardwarebistro.com]

- Cost: ∼ £1k
- Form: PCI card
- FLOPS: 1063 G
- Rated: 550 W
- Typical: 400 W

GTX285: What is it?

- 192x 1.2 GHz cores
- 1 GB DDR3 RAM
- 120 GB/s bandwidth
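
The headline figures on the two device slides can be turned into rough efficiency metrics. The sketch below is illustrative arithmetic only: it uses the quoted peak FLOPS, typical power, and approximate cost, and the names `Device`, `gflops_per_watt`, and `gflops_per_pound` are assumptions introduced here, not part of the talk. Peak figures say nothing about measured benchmark performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Headline figures as quoted on the device slides (hypothetical container)."""
    name: str
    peak_gflops: float    # "FLOPS" figure, in GFLOPS
    typical_watts: float  # "Typical" power figure, in watts
    cost_gbp: float       # approximate cost, in pounds

    def gflops_per_watt(self) -> float:
        # Peak GFLOPS divided by typical power draw
        return self.peak_gflops / self.typical_watts

    def gflops_per_pound(self) -> float:
        # Peak GFLOPS divided by approximate purchase cost
        return self.peak_gflops / self.cost_gbp


HC1 = Device("HC-1", peak_gflops=65.0, typical_watts=600.0, cost_gbp=30_000.0)
GTX285 = Device("GTX285", peak_gflops=1063.0, typical_watts=400.0, cost_gbp=1_000.0)

if __name__ == "__main__":
    for d in (HC1, GTX285):
        print(f"{d.name}: {d.gflops_per_watt():.3f} GFLOPS/W, "
              f"{d.gflops_per_pound():.4f} GFLOPS per pound")
```

On these quoted numbers alone the GTX285 comes out ahead on both metrics, which is why the talk looks at measured benchmarks and development effort rather than peak figures.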

Development options

Development options considered

1. Why not custom personalities?
   - Convey estimate 3-month development time
   - Average HPC job length 3 hours, max 24 days (SDSC)
   - Not OpenFPGA compliant, no open-source

2. Why not C-to-gates?
   - No compiler for HC-1 (yet)
   - Upper limit from application-specific personality benchmark
   - "Unfortunately, and despite 40 years of parallelizing compilers for all sorts of machines, [optimization] algorithms don't work terribly well" (Ian Page 2004)

Algorithmic skeletons

Benchmarks

1. Random number generation
   - Using Mersenne Twister

2. Matrix multiplication
   - Floating point
   - 32 and 64 bit

3. Sum of vector
   - Floating point
   - 32 and 64 bit

4. N-body simulation
   - 2nd order
   - 2-D and 3-D

Architecture and implementation of benchmarks

Benchmarks: 1 2 3 4

Process architecture
- Asynchronous pipeline: X X
- Tree computation: X
- Crowd computation: X

GPU process implementation
- Optimised software: X X X

FPGA process implementation
- Optimised software: X X
- Optimised firmware: X
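
Of the process architectures listed above, tree computation is the easiest to show in a few lines. The following is a minimal sketch of a pairwise (tree) reduction applied to summing a vector. `tree_sum` is a hypothetical name introduced here, and this is plain sequential Python for illustration, not the GPU or HC-1 implementation benchmarked in the talk.

```python
from typing import Sequence


def tree_sum(values: Sequence[float]) -> float:
    """Sum values by repeatedly adding adjacent pairs (a reduction tree)."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        # Every pair at this level is independent of the others, so parallel
        # hardware could evaluate the whole level in a single step.
        next_level = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An odd element has no partner and passes through unchanged.
            next_level.append(level[-1])
        level = next_level
    return level[0]


if __name__ == "__main__":
    print(tree_sum([1.0, 2.0, 3.0, 4.0, 5.0]))
```

The tree needs about log2(n) levels for n inputs, but each level ends with a synchronisation point before the next can start.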

Expected results

- GPU: 192 cores at 1.2 GHz will beat HC-1 64 cores at 300 MHz
- Synchronisation expensive on GPU
- 64-bit calculations expensive on GPU
- Custom firmware on HC-1 will beat software on GPU
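
The first expectation can be made concrete with a crude cores-times-clock product. This is only a back-of-envelope sketch under the assumption that every core retires comparable work per cycle, which ignores memory bandwidth, synchronisation cost, and 64-bit throughput. The function name `core_cycles_per_second` is hypothetical.

```python
def core_cycles_per_second(cores: int, clock_mhz: int) -> int:
    """Aggregate core-cycles per second, in units of MHz (cores x clock)."""
    return cores * clock_mhz


# Figures as quoted on the "Expected results" slide
gpu = core_cycles_per_second(cores=192, clock_mhz=1200)
hc1 = core_cycles_per_second(cores=64, clock_mhz=300)
ratio = gpu / hc1

if __name__ == "__main__":
    print(f"GPU: {gpu} MHz-cores, HC-1: {hc1} MHz-cores, ratio: {ratio:.1f}")
```

By this measure alone the GPU has twelve times the raw core-cycles of the HC-1. The remaining expectations list the factors that could narrow that gap.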

Random number generation

- HC-1: 88.9x
- GPU: 89.3x

Performance - Matrix multiplication

- HC-1: 48.8x
- GPU: 190.4x

Performance - Matrix multiplication

- HC-1: 52.5x
- GPU: 98.0x

Performance - Sum of vector

- HC-1: 125.6x
- GPU: 306.4x

Performance - Sum of vector

- HC-1: 81.1x
- GPU: 109.3x

Performance - N-body simulation

- HC-1: 1.9x
- GPU: 43.2x
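
The headline speedups on the six results slides can be compared directly. The sketch below assumes each pair of figures is measured against the same baseline, so that their ratio is meaningful. The slide labels in `HEADLINE_SPEEDUPS` and the helper `gpu_to_hc1_ratios` are names introduced here, and the two matrix multiplication and sum of vector slides are distinguished only by their order in the deck because the transcript does not say which precision each one shows.

```python
# (HC-1 speedup, GPU speedup) as printed on each results slide
HEADLINE_SPEEDUPS = {
    "Random number generation": (88.9, 89.3),
    "Matrix multiplication (first slide)": (48.8, 190.4),
    "Matrix multiplication (second slide)": (52.5, 98.0),
    "Sum of vector (first slide)": (125.6, 306.4),
    "Sum of vector (second slide)": (81.1, 109.3),
    "N-body simulation": (1.9, 43.2),
}


def gpu_to_hc1_ratios(speedups: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return GPU speedup divided by HC-1 speedup for each slide."""
    return {name: gpu / hc1 for name, (hc1, gpu) in speedups.items()}


RATIOS = gpu_to_hc1_ratios(HEADLINE_SPEEDUPS)

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name}: GPU/HC-1 = {ratio:.2f}")
```

The gap is narrowest for random number generation and widest for the N-body simulation. These are single headline numbers per slide and do not capture how the comparison varies with problem size.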

Conclusions

- Both platforms outperformed the CPU
- In most cases the GPU outperformed the HC-1
   - The exception used custom firmware
- Closed-source
- Not standards-compliant
- Developing HC-1 compatible code is a ≥ 3-month task
- Further anecdotal evidence
   - TOP500
   - Manufacturers, Cray et al.

Acknowledgements

- Thanks to
   - EPSRC (Grants EP/C549481, EP/E045472)
   - Convey Computer
   - Nvidia
- Questions?
