Presented by
Field-Programmable Gate Array Research Speeds HPC “up to 100X”
Olaf O. StoraasliFuture Technologies Group
Computer Science and Mathematics Division
2 Storaasli_FPGA_SC07
Contents Background: Why FPGAs? ORNL success: FPGA systems, tools and up to 100X speedup Partners: Research Lab, , SRC,
THE SUPERCOMPUTER COMPANY
Explore FPGAs for future ORNL HPC
Virtex4 FPGA blades “accelerate mission-critical applications > 100X.”
Steve Scott, CTO HPCWire 24/3/2006
“After exhaustive analysis, Cray concluded that, although multi-core commodity processors will deliver some improvement, exploiting parallelism through a variety of processor technologies using scalar, vector, multithreading and hardware accelerators (e.g., FPGAs or ClearSpeed co-processors) creates the greatest opportunity for application acceleration.”
ORNL benefit: Exceed petaflops and reduce power
Why HPC vendors offer FPGAs
, ,
3 Storaasli_FPGA_SC07
FPGA Logic slice
What’s an FPGA?Your “custom chip”
Xilinx Virtex4 FPGA: 25K slices (miniCPUs) Logic array: user-tailored to application On-chip RAM, multipliers and PowerPCs Gigabit transceivers/DSP blocks => FastIO/precision 100–1000 operations/clock cycle
4 Storaasli_FPGA_SC07
0
100
200
300
Computation(GOPS)
Memory Bandwidth(GB/sec)
IO Bandwidth(Gbps)
PentiumVirtex-4
FPGAVirtex4
Pentium
Why FPGAs? Performance—optimal silicon use
(maximize parallel ops/cycle) Rapid growth—cells, speed, I/O Power—1/10th CPUs Flexible—tailor to application
1000
800
600
400
200
00
100
200
300
400
500
600
700
2002 2004 2006 2008
Thou
sand
s
Logic Cells
MH
z
Clock speed (MHz)
5 Storaasli_FPGA_SC07
Cray XD1
ORNL FPGA hardware/tools SRC-6 (Carte), Digilent (Viva,
VHDL), Nallatech (Viva) Cray XD1 (MitrionC, VHDL):
6 FPGAs + 144 Opterons SGI RASC-Altix/Virtex4s (MitrionC) CHiMPS (Bee2 => Cray XD1 => DRC
=> XT4)
RASC sgi
6 Storaasli_FPGA_SC07
Find parallelism: 80% FFTs
More GF/$ GF/Watt
Goal
QuickTime™ and aTIFF (Uncompressed) decompressorare needed to see this picture.
Profile
Model faster
Ported HPC code spectral transform shallow water model (STSWM) to FPGAs
FTRNDE
FTRNPE
FTTdd
UV FFT
SHTRNS FFT
COMP1
STEP
FTRNEX FTRNVX
8 calls in parallel
3 functions in parallel
2 calls in parallel
HLL compilerCHiMPS, Mitrion(FPGA Tools Inside) FPGA
speedup
HLL developerprofiles
7 Storaasli_FPGA_SC07
Viva: Graphical icons—3-dimensional MitrionC: Text/flow—1-dimensional
Exploring programming options
Gauss matrix solver Compiler, simulator, and debugger
+ Carte/SRC, CHiMPS-VHDL/Xilinx ,
8 Storaasli_FPGA_SC07
*FPGA vs 2.2 GHz Opteron
First mixed-precision LU and solver for FPGAs
Benefits:High performance of LP arithmeticHigh precision accuracySpeedup increases with matrix size(as LU dominates calculations)
Design Double FP Single FP S10e5
PE Amount 8 16 32
Max size 128 256 256
Achievablefrequency 120 MHz 150 MHz 150 MHz
Slices 27,005 (57%) 14,792 (59%) 14,730 (62%)
BRAMs 68 (29%) 129 (55%) 65 (28%)
MULT18X18 128 (55%) 64 (27%) 32 (13%)
0
200
400
600
800
1000
Exe
cutio
n tim
e (u
s)
64 96 128Matrix size
8757149
218133
404
865
443
258
S10e5SingleDouble
0
20
30
Spe
edup
double single S10e5
10.97.7
21.3
9.7 10.3
36.6LUSolver
Design data type
10
40
37X* LU decomposition speedup10X for matrix equation solver
9 Storaasli_FPGA_SC07
FPG
A S
peed
up
8 hrs => 5 min
*Virtex-4 FPGA vs 2.2 GHz Opteron on Cray XD1
120
100
80
60
40
20
0 26 28 30 32 34 36 38 40Genome sequence
8K w/align16K w/align8K w/o align16K w/o align
100X* DNA sequence speedupBacillus anthracis human DNA comparison
24#
# 24= Sequence AE17024
10 Storaasli_FPGA_SC07
FPGA speedup growswith query size
11 Storaasli_FPGA_SC07
Acknowledgement:
This research is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Summary
ORNL FPGA research: Increasing HPC relevence FPGA systems: Cray, SRC, Nallatech, Digilent, SGI Compilers: Mitrion-C, Carte, Viva, DSPlogic, CHiMPS Speedup: 10X eqn soln, 100X DNA sequencing Partners: Xilinx, UT, Mitrion, Cray, SGI
Next: Explore DRC, more FPGAs and CHiMPS
12 Storaasli_FPGA_SC07
Contact
12 Storaasli_ReconfigHPC_SC07
Olaf StoraasliFuture Technologies GroupComputer Science and Mathematics [email protected] Olaf ORNL