An Introduction to Reconfigurable Computing
Mitch Sukalski and Craig Ulmer
Dean R&D Seminar
11 December 2003
Reconfigurable Computing…
is computation on a platform with reconfigurable (i.e., modifiable at run-time) hardware capable of implementing application-specific algorithms and functionality on demand.
Computing Spectrum
Software: General-Purpose CPU
[Figure: CPU pipeline with Fetch, Decode, Registers, Execute, Memory, and Writeback stages]
• Easily reprogrammed
• Low cost
• Fundamental bottlenecks
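Hardware: Application-Specific Integrated Circuit (ASIC)
[Figure: fixed dataflow circuit of arithmetic, xor, and z^-1 delay operators with inputs A, B, C, D, and π and a single result output]
• Not modifiable
• High cost
• Extremely fast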
Soft-Hardware: Field-Programmable Gate Arrays (FPGAs)
• Reconfigurable hardware
• Medium cost
• Speedup potential
History
1945: Eckert, Mauchly, von Neumann: ENIAC
1945: “von Neumann architecture”
1960: Estrin: Fixed+Variable Structure Computer
1970s: Simple PLDs
1985: Xilinx introduces first FPGA
1990s: Custom Computing Machines (CCMs)
1999: FPGAs exceed one million logic gates
2002: FPGAs include complex cores
[Figure: ENIAC: connecting computational blocks for an algorithm]
[Figure: Fixed+Variable CPU: users can attach new computational circuits to a fixed ALU]
[Figure: The Teramac CCM: multi-chip module of FPGAs]
[Figure: Xilinx Virtex FPGA]
[Figure: Xilinx Virtex II Pro (image courtesy of rapidio.org)]
Reconfigurable Computing in Modern HPC
• Stand-alone platforms
– OctigaBay 12K
– SRC-6
– Starbridge Hypercomputer
• Accelerator cards
– Timelogic’s DeCypher
– Nallatech’s BenNUEY
– Annapolis Micro Systems WILDSTAR II
Example: Computational Fluid Dynamics
William Smith & Austars Schnore at GE Global Research
From: “Towards an RCC-based Accelerator for Computational Fluid Dynamics,” ERSA 2003
And now for some details…
• Field-Programmable Gate Arrays (FPGAs)
• Common RC design techniques
• Reported examples
Field-Programmable Gate Arrays (FPGAs)
• FPGAs emulate digital logic circuitry
– Large array of configurable logic blocks
– Internal routing through programmable interconnection network
• FPGAs hold hardware configuration in SRAM
– Change the digital circuitry by loading new configuration
• Design approach:
– User designs in hardware description language
– Synthesis tools translate to logic gates
– Mapping tools target specific FPGA
Simplified Logic Block
[Figure: logic block containing two LUTs, each feeding a 1-bit register]
• Emulates logic function
– Thousands per chip
• Lookup Table (LUT)
– Holds truth table
– Inputs produce outputs
• 1-bit registers
– Hold data between cycles
• Note: Greatly simplified
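The logic block described above can be modeled in a few lines of software. The following is a minimal sketch, not a description of any vendor's actual block: the class name `LogicBlock`, its methods, and the most-significant-bit-first input ordering are all assumptions made for illustration.

```python
class LogicBlock:
    """Toy model of one logic block: a LUT feeding a 1-bit register."""

    def __init__(self, truth_table):
        # truth_table[i] is the output bit for the input pattern whose
        # binary value is i (first input is the most significant bit).
        self.truth_table = list(truth_table)
        self.register = 0  # value held between clock cycles

    def lut(self, *inputs):
        """Combinational lookup: pack the input bits into a table index."""
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.truth_table[index]

    def clock(self, *inputs):
        """One clock edge: return the stored bit, then capture the LUT output."""
        previous = self.register
        self.register = self.lut(*inputs)
        return previous


# "Configure" the block as a 2-input XOR by loading its truth table.
xor_block = LogicBlock([0, 1, 1, 0])
print(xor_block.lut(1, 0))    # combinational output
print(xor_block.clock(1, 0))  # register still holds its reset value
print(xor_block.clock(1, 1))  # returns the XOR captured on the previous edge
```

Loading a different list into `truth_table` turns the same block into a different logic function, which is the sense in which the hardware is reconfigurable.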
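LUT Example: 1-bit Adder
Truth Table
A B Cin   Cout Sum
0 0 0     0    0
0 0 1     0    1
0 1 0     0    1
0 1 1     1    0
1 0 0     0    1
1 0 1     1    0
1 1 0     1    0
1 1 1     1    1
[Figure: logic block with two LUTs sharing inputs A, B, and C0 (carry-in); one LUT produces Cout and the other Sum, each followed by a register]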
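As a check on the 1-bit adder example, the two LUTs can be written out as plain lists and compared against ordinary addition. This is a sketch only: the names `COUT_LUT`, `SUM_LUT`, and `full_adder` are made up here, and the index ordering (A as the most significant bit, then B, then Cin) is an assumption chosen to match the row order of the truth table.

```python
from itertools import product

# LUT contents copied from the truth table, indexed by (A, B, Cin) read as a
# 3-bit number with A as the most significant bit.
COUT_LUT = [0, 0, 0, 1, 0, 1, 1, 1]
SUM_LUT = [0, 1, 1, 0, 1, 0, 0, 1]


def full_adder(a, b, cin):
    """Evaluate the 1-bit adder purely by table lookup."""
    index = (a << 2) | (b << 1) | cin
    return COUT_LUT[index], SUM_LUT[index]


# Cross-check every row of the table against integer addition.
for a, b, cin in product((0, 1), repeat=3):
    cout, total = full_adder(a, b, cin)
    assert 2 * cout + total == a + b + cin

print(full_adder(1, 1, 0))  # row "1 1 0" of the table: (1, 0)
```

The adder contains no arithmetic at all; each output is just a stored column of the truth table, which is why one logic block with two LUTs is enough.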
Routing Data between Logic Blocks
[Figure: grid of logic blocks (LB) connected by wires and switchboxes (X)]
• Need to connect logic blocks
• Wires and Switchboxes
– LBs connect to local wires
– Switchboxes route long connections
• Routing set at compile time
– Performed by tools
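To give a feel for what a routing tool has to solve, here is a toy maze router that finds a path of free switchboxes between two points on a grid. It is a minimal sketch under loud assumptions: the grid model, the `route` function, and the use of breadth-first search are illustrative choices, not the algorithm any particular FPGA tool uses, and real routers handle many nets, wire types, and timing constraints at once.

```python
from collections import deque


def route(grid_size, blocked, source, sink):
    """Breadth-first search for a shortest path of free switchboxes.

    grid_size: (rows, cols) of the hypothetical switchbox grid
    blocked:   set of (row, col) switchboxes already used by other nets
    Returns the path as a list of (row, col), or None if unroutable.
    """
    rows, cols = grid_size
    parent = {source: None}
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == sink:
            path = []
            while cell is not None:  # walk back to the source
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            in_grid = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_grid and nxt not in blocked and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


# A 4x4 grid where an earlier net already occupies two switchboxes.
path = route((4, 4), blocked={(0, 1), (1, 1)}, source=(0, 0), sink=(0, 3))
print(path)
```

Because the result is fixed when the design is compiled, none of this search cost is paid while the circuit is running.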
Reconfiguration
• Modern FPGAs are SRAM-based
– Can be loaded with new circuitry
• Full reconfiguration
– A few megabytes of configuration data
– Takes milliseconds
• Partial reconfiguration
– Reprogram only a portion of the chip
– Reduces configuration time
– Non-trivial, poorly supported
[Figure: FPGA loaded with a full configuration image versus a partial configuration image]
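The "few megabytes, milliseconds" relationship is simple arithmetic: configuration time is image size divided by the bandwidth of the configuration port. The sketch below works one example. Every number in it (image size, port rate, partial fraction) is an assumed value for illustration and does not come from the slides or from any specific device.

```python
def config_time_ms(image_bytes, bytes_per_second):
    """Time to stream a configuration image into the FPGA, in milliseconds."""
    return 1000.0 * image_bytes / bytes_per_second


# Assumed figures for illustration only (not from the slides):
FULL_IMAGE = 2 * 1024 * 1024   # a 2 MiB full configuration image
PORT_RATE = 50 * 1024 * 1024   # a 50 MiB/s configuration port
PARTIAL_FRACTION = 0.10        # partial image covering 10% of the chip

full_ms = config_time_ms(FULL_IMAGE, PORT_RATE)
partial_ms = config_time_ms(FULL_IMAGE * PARTIAL_FRACTION, PORT_RATE)
print(f"full: {full_ms:.1f} ms, partial: {partial_ms:.1f} ms")
```

Under these assumptions a partial image shrinks the load time in proportion to the area it covers, which is the appeal of partial reconfiguration despite its tooling difficulties.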
FPGAs as Computational Accelerators
• Use FPGAs as soft-hardware
– Port algorithm to hardware
– Run inside FPGA
– Reuse hardware
• Techniques
– Concurrency, memory, partial evaluation
1. Concurrency
• Load FPGA with multiple computational circuits
– Hardware state machines are like threads, but…
– All tasks are always running
• Raw parallelism
– Units run in parallel
– Example: Key breaking
• Pipelining
– Chain units together in series
– Example: Streaming computations, data-flow
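The pipelining idea can be illustrated with a cycle-by-cycle software model in which every stage updates on the same clock edge. This is a sketch, assuming a hypothetical `run_pipeline` helper and using `None` to mark an empty pipeline slot (so stage functions should not return `None`); it models timing only and says nothing about how a real design would be written in a hardware description language.

```python
def run_pipeline(stages, inputs):
    """Cycle-by-cycle model of computational units chained in series.

    stages: list of one-argument functions, one per pipeline unit
    inputs: values fed in, one per clock cycle
    Returns (outputs, cycles_taken).
    """
    stream = list(inputs)
    regs = [None] * len(stages)  # register after each stage; None = empty slot
    outputs, cycles = [], 0
    # Run until the input is exhausted and all earlier registers have drained.
    while stream or any(r is not None for r in regs[:-1]):
        feed = stream.pop(0) if stream else None
        # Every stage reads the OLD register values, so all stages work
        # simultaneously on different data items.
        sources = [feed] + regs[:-1]
        regs = [None if x is None else stage(x)
                for stage, x in zip(stages, sources)]
        cycles += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, cycles


stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
results, cycles = run_pipeline(stages, range(5))
print(results, cycles)  # 5 items through 3 stages in 3 + 5 - 1 = 7 cycles
```

Once the pipeline is full, one result emerges per cycle regardless of how many stages there are, whereas running the three stages one after another for each item would take 15 steps.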
2. Custom Memory Interactions
• Most FPGA cards have multiple memory banks
– Fetch/store multiple data values at same time
– Predictable performance (as opposed to caches)
– Hide address generation
[Figure: FPGA connected to five independent SRAM banks (Bank 0 through Bank 4)]
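The benefit of multiple banks can be sketched by counting memory cycles for a simple reduction. The model below is an assumption-laden illustration: `interleave` and `banked_sum` are hypothetical helpers, each bank is assumed to deliver exactly one word per cycle, and the data layout (word i stored in bank i mod N) is one possible choice among many.

```python
def interleave(data, n_banks):
    """Spread consecutive words across banks: word i goes to bank i % n_banks."""
    return [data[b::n_banks] for b in range(n_banks)]


def banked_sum(banks):
    """Sum all words, reading one word from every bank in each cycle."""
    total, cycles = 0, 0
    depth = max(len(bank) for bank in banks)
    for row in range(depth):
        # All banks are read in the same cycle. The address is just a
        # counter (row), so address generation costs the datapath nothing.
        words = [bank[row] for bank in banks if row < len(bank)]
        total += sum(words)
        cycles += 1
    return total, cycles


data = list(range(20))
print(banked_sum(interleave(data, 1)))  # one bank:   (190, 20)
print(banked_sum(interleave(data, 5)))  # five banks: (190, 4)
```

The cycle count here is fully determined by the data size and bank count, which reflects the "predictable performance" point: there is no cache whose hit rate could change the answer.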
3. Partial Evaluation
• Know data constants at design time
– Apply to circuits and reduce hardware
– Synthesis tools perform automatically
Note: FPGAs are unique because we can easily generate new, optimized hardware configurations for each set of constants.
Example: 4-bit Ripple-Carry Adder
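The ripple-carry adder makes a convenient worked case for partial evaluation: if one operand is a constant known at design time, most of each full adder collapses. The sketch below folds the constant in by hand, which is the kind of simplification synthesis tools apply automatically. The function `specialize_adder`, the gate-counting convention (a general full adder counted as 5 gates: 2 XOR, 2 AND, 1 OR), and the per-bit simplification rules are assumptions made for this illustration, not the slide's own figures.

```python
def specialize_adder(constant, width=4):
    """Build a `width`-bit ripple-carry adder for x + constant.

    Returns (add, gate_count), where add(x) yields a (width + 1)-bit sum.
    """
    plan = []        # per bit: (constant bit, carry-in if known at design time)
    gates = 0
    known = 0        # carry into bit 0 is the constant 0; None = run-time value
    for i in range(width):
        b = (constant >> i) & 1
        plan.append((b, known))
        if known is None:
            gates += 2        # b=0: XOR + AND;  b=1: XNOR + OR
        elif b != known:
            gates += 1        # sum = NOT a, carry-out = a (one inverter)
            known = None      # carry now depends on the run-time input
        # else: sum = a and carry-out = b, both free (wires and constants)

    def add(x):
        result, carry = 0, 0
        for i, (b, carry_in) in enumerate(plan):
            a = (x >> i) & 1
            if carry_in is None:
                if b == 0:
                    s, carry = a ^ carry, a & carry
                else:
                    s, carry = 1 - (a ^ carry), a | carry
            elif b != carry_in:
                s, carry = 1 - a, a
            else:
                s, carry = a, b
            result |= s << i
        return result | (carry << width)

    return add, gates


add5, gates = specialize_adder(0b0101)
print(add5(9), gates)  # 14, using 7 gates versus 4 * 5 = 20 for the general adder
```

A different constant yields a different, equally small circuit, and because an FPGA can simply be reloaded, producing one optimized configuration per constant is practical in a way it would not be for fixed silicon.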
RC Performance Examples
• CFD: 23 GFLOPS sustained
– “Towards an RCC-based Accelerator for Computational Fluid Dynamics,” Smith & Schnore, 2003
• Adaptive beamforming: 20 GFLOPS
– Parallel systolic array architecture
– “20 GFLOPS QR processor on a Xilinx Virtex-E FPGA,” Walke et al., 2000
• Real-time holographic video display at 30 fps
– “Using field programmable gate arrays to scale up the speed of holographic video computation,” Nwodoh
In Summary
• Reconfigurable computing uses FPGAs to emulate application-specific hardware
– Achieve performance gains with dedicated hardware
• It is possible to implement just about any kind of digital hardware in the FPGA
– Limited by capacity and effort
– Resurrect application-specific hardware architectures
– SIMD, MIMD, Systolic Processor Arrays, Data-Flow…