031211 dean seminar1
-
8/2/2019 031211 Dean Seminar1
1/19
An Introduction to
Reconfigurable Computing
Mitch Sukalski and Craig Ulmer
Dean R&D Seminar, 11 December 2003
-
Reconfigurable Computing
is computation on a platform with reconfigurable
(i.e., modifiable at run-time) hardware capable
of implementing application-specific algorithms
and functionality on demand.
-
Computing Spectrum
Software: General-Purpose CPU
Easily reprogrammed
Low cost
Fundamental bottlenecks
[Figure: CPU pipeline with Fetch, Decode, Execute, Memory, and Writeback stages and a register file]
Hardware: Application-Specific Integrated Circuit (ASIC)
Not modifiable
High cost
Extremely fast
[Figure: fixed data-flow circuit combining inputs A, B, C, and D through adders, multipliers, an XOR, and a z^-1 delay to produce a result]
Soft-Hardware: Field Programmable Gate Arrays (FPGAs)
Reconfigurable hardware
Medium cost
Speedup potential
-
History
1945: Eckert, Mauchly, von Neumann: ENIAC
1945: von Neumann architecture
1960: Estrin: Fixed+Variable Structure Computer
1970s: Simple PLDs
1985: Xilinx introduces the first FPGA
1990s: Custom Computing Machines (CCMs)
1999: FPGAs exceed one million logic gates
2002: FPGAs include complex cores
[Figure: ENIAC]
[Figure: Fixed+Variable CPU; users can attach new computational circuits to a fixed ALU]
[Figure: connecting computational blocks for an algorithm]
[Figure: the Teramac CCM, a multi-chip module of FPGAs]
[Figure: Xilinx Virtex FPGA]
[Figure: Xilinx Virtex II Pro (image courtesy of rapidio.org)]
-
Reconfigurable Computing in Modern HPC
Stand-alone platforms
OctigaBay 12K
SRC-6
Starbridge Hypercomputer
Accelerator cards
TimeLogic DeCypher
Nallatech's BenNUEY
Annapolis Micro Systems WILDSTAR II
-
Example: Computational Fluid Dynamics
William Smith & Austars Schnore at GE Global Research
From: Towards an RCC-based Accelerator for Computational Fluid Dynamics, ERSA 2003
-
And now for some details
Field Programmable Gate Arrays (FPGAs)
Common RC design techniques
Reported examples
-
Field-Programmable Gate Arrays (FPGAs)
FPGAs emulate digital logic circuitry
Large array of configurable logic blocks
Internal routing through programmable interconnection network
FPGAs hold hardware configuration in SRAM
Change the digital circuitry by loading a new configuration
Design approach:
User designs in hardware description language
Synthesis tools translate to logic gates
Mapping tools target specific FPGA
-
[Figure: simplified logic block containing two LUTs and two 1-bit registers]
Simplified Logic Block
Emulates logic function
Thousands per chip
Lookup Table (LUT)
Holds a truth table
Inputs select which table entry is output
1-bit registers
Hold data between cycles
Note: Greatly simplified
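A minimal Python sketch of the simplified logic block, assuming a 3-input LUT whose eight truth-table bits are held in one integer (bit i is the output for input pattern i, with the first input as the most significant bit) and a single optional 1-bit register. `LogicBlock`, `load`, `lut`, and `clock` are hypothetical names for illustration, not any vendor's architecture.

```python
class LogicBlock:
    """Toy model of one simplified logic block: a 3-input LUT plus a 1-bit register.

    Hypothetical illustration only; real FPGA logic blocks are far richer.
    """

    def __init__(self, config: int, registered: bool = False) -> None:
        self.registered = registered
        self.reg = 0  # 1-bit register: holds data between clock cycles
        self.load(config)

    def load(self, config: int) -> None:
        """Reconfigure the block by loading 8 new truth-table bits (like SRAM)."""
        if not 0 <= config <= 0xFF:
            raise ValueError("a 3-input LUT needs exactly 8 configuration bits")
        self.config = config

    def lut(self, a: int, b: int, c: int) -> int:
        """The inputs form an index (a is the MSB) that selects one stored bit."""
        index = (a << 2) | (b << 1) | c
        return (self.config >> index) & 1

    def clock(self, a: int, b: int, c: int) -> int:
        """Advance one cycle.

        Unregistered: return the LUT output directly.
        Registered: return the value captured last cycle, then capture the new one.
        """
        value = self.lut(a, b, c)
        if not self.registered:
            return value
        previous, self.reg = self.reg, value
        return previous


block = LogicBlock(0x80)  # 3-input AND: only pattern (1, 1, 1) is set
assert block.lut(1, 1, 1) == 1 and block.lut(1, 0, 1) == 0
block.load(0x96)          # same block reloaded as a 3-input XOR
assert block.lut(1, 0, 0) == 1 and block.lut(1, 1, 0) == 0
```

Reloading `0x96` into the same block changes its function from AND to XOR without touching any wiring, which is the essence of reconfiguration in this toy model.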
-
LUT Example: 1-bit Adder
A B Cin Cout Sum
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
[Figure: the truth table loaded into a logic block; inputs A, B, and C0 (carry-in) feed both LUTs, one producing Cout and the other Sum]
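The two output columns of the truth table can be packed directly into LUT configuration words. This Python sketch assumes the row index is 4*A + 2*B + Cin and that bit i of each word stores row i; under that assumed convention, Sum packs to 0x96 and Cout to 0xE8. The helper names are hypothetical.

```python
# Truth table from the slide: (A, B, Cin) -> (Cout, Sum)
TRUTH_TABLE = {
    (0, 0, 0): (0, 0),
    (0, 0, 1): (0, 1),
    (0, 1, 0): (0, 1),
    (0, 1, 1): (1, 0),
    (1, 0, 0): (0, 1),
    (1, 0, 1): (1, 0),
    (1, 1, 0): (1, 0),
    (1, 1, 1): (1, 1),
}


def pack_column(column: int) -> int:
    """Pack one output column (0 = Cout, 1 = Sum) into an 8-bit LUT word.

    Assumed convention: row index = 4*A + 2*B + Cin; bit i holds row i's output.
    """
    word = 0
    for (a, b, cin), outputs in TRUTH_TABLE.items():
        word |= outputs[column] << ((a << 2) | (b << 1) | cin)
    return word


def lut_lookup(word: int, a: int, b: int, cin: int) -> int:
    # The LUT computes nothing at run time: the inputs just select a stored bit.
    return (word >> ((a << 2) | (b << 1) | cin)) & 1


COUT_LUT = pack_column(0)
SUM_LUT = pack_column(1)

if __name__ == "__main__":
    print(hex(SUM_LUT), hex(COUT_LUT))  # prints: 0x96 0xe8
    # Check both LUTs against ordinary integer addition.
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                total = a + b + cin
                assert lut_lookup(COUT_LUT, a, b, cin) == total >> 1
                assert lut_lookup(SUM_LUT, a, b, cin) == total & 1
```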
-
Reconfiguration
Modern FPGAs are SRAM-based
Can be loaded with new circuitry
Full reconfiguration
A few megabytes of configuration data
Takes milliseconds
Partial reconfiguration
Reprogram only a portion of the chip
Reduces configuration time
Non-trivial, poorly supported
[Figure: an FPGA loaded with either a full configuration image or a partial configuration image]
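A toy Python model of the difference between full and partial reconfiguration. It assumes configuration memory is divided into equal frames and that load time is roughly proportional to the bytes written; the frame size, frame count, and `ConfigMemory` class are invented for illustration and do not describe a real device.

```python
from dataclasses import dataclass, field

# Hypothetical device geometry for illustration only; real FPGAs organize
# their configuration memory differently and vary widely in size.
FRAME_BYTES = 1024
NUM_FRAMES = 2048  # 2 MiB of configuration data in total


@dataclass
class ConfigMemory:
    """Toy SRAM configuration memory that tracks how many bytes are loaded."""

    frames: list = field(default_factory=lambda: [bytes(FRAME_BYTES)] * NUM_FRAMES)
    bytes_written: int = 0

    def full_reconfigure(self, image: list) -> None:
        """Full reconfiguration: every frame of the chip is rewritten."""
        if len(image) != NUM_FRAMES:
            raise ValueError("a full image must cover every frame")
        self.frames = list(image)
        self.bytes_written += NUM_FRAMES * FRAME_BYTES

    def partial_reconfigure(self, start: int, image: list) -> None:
        """Partial reconfiguration: only frames [start, start + len(image)) change.

        Frames outside that range keep their current circuitry.
        """
        if start < 0 or start + len(image) > NUM_FRAMES:
            raise ValueError("partial image falls outside the device")
        self.frames[start:start + len(image)] = image
        self.bytes_written += len(image) * FRAME_BYTES


mem = ConfigMemory()
mem.full_reconfigure([b"\x01" * FRAME_BYTES] * NUM_FRAMES)
mem.partial_reconfigure(100, [b"\x02" * FRAME_BYTES] * 64)
```

In this model, the partial update of 64 frames writes 64 KiB instead of the full 2 MiB image, while the untouched frames keep their existing circuitry.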
-
Design Techniques
Digital logic design techniques for exploiting FPGAs
-
FPGAs as Computational Accelerators
Use FPGAs as soft-hardware
Port algorithm to hardware
Run inside FPGA
Reuse hardware
Techniques
Concurrency, memory, partial evaluation
-
1. Concurrency
Load FPGA with multiple computational circuits
Hardware state machines are like threads, but:
All tasks are always running
Raw parallelism: units run in parallel
Example: key breaking
Pipelining: chain units together in series
Example: streaming computations, data-flow
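A rough Python cycle-count model of the two forms of concurrency, assuming every unit or pipeline stage takes exactly one clock cycle per item and ignoring I/O. `exhaustive_search_cycles` and `simulate_pipeline` are hypothetical helpers, not part of any FPGA tool flow.

```python
from collections import deque
import math


def exhaustive_search_cycles(key_space: int, units: int) -> int:
    """Raw parallelism: identical key testers, each assumed to test one
    candidate key per clock cycle, split the key space between them."""
    return math.ceil(key_space / units)


def simulate_pipeline(inputs, stages):
    """Cycle-by-cycle model of a hardware pipeline.

    Every stage works on a different item in every cycle, like threads that
    are always running. None marks an empty slot (a bubble), so stage
    functions must not return None. Returns (outputs, cycles).
    """
    if not stages:
        raise ValueError("a pipeline needs at least one stage")
    latches = [None] * len(stages)  # value held at the output of each stage
    pending = deque(inputs)
    outputs, cycles = [], 0
    while pending or any(v is not None for v in latches[:-1]):
        cycles += 1
        new = [None] * len(stages)
        new[0] = stages[0](pending.popleft()) if pending else None
        for i in range(1, len(stages)):
            prev = latches[i - 1]
            new[i] = stages[i](prev) if prev is not None else None
        latches = new
        if latches[-1] is not None:
            outputs.append(latches[-1])
    return outputs, cycles
```

With three stages and eight items, `simulate_pipeline` reports 10 cycles (3 + 8 - 1), whereas one unit performing all three steps per item would need 24; likewise, 64 parallel key testers sweep a 2^16 key space in 1024 cycles under the one-key-per-cycle assumption.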
-
2. Custom Memory Interactions
Most FPGA cards have multiple memory banks
Fetch/store multiple data values at the same time
Predictable performance (unlike caches)
Hide address generation in dedicated logic
[Figure: FPGA with computational units (marked X) connected to SRAM Banks 0 through 4]
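A Python sketch of how multiple banks let one cycle fetch several values, assuming low-order interleaving (element i lives in bank i mod B at word i div B) and banks that each deliver one word per cycle. Because the schedule is fully determined by the access pattern, the cycle count is predictable in a way cache behavior is not; on an FPGA this address generation would be fixed logic. The bank count and function names are assumptions.

```python
def bank_of(index: int, num_banks: int) -> tuple:
    """Low-order interleaving: element i lives in bank i % B at word i // B."""
    return index % num_banks, index // num_banks


def fetch_schedule(indices, num_banks: int) -> list:
    """Group element fetches into clock cycles.

    Assumes each bank delivers one word per cycle and requests are issued
    in order; a second request to a busy bank starts a new cycle.
    Each cycle is a dict mapping bank number to the word address read.
    """
    cycles, current = [], {}
    for idx in indices:
        bank, addr = bank_of(idx, num_banks)
        if bank in current:  # bank conflict: wait for the next cycle
            cycles.append(current)
            current = {}
        current[bank] = addr
    if current:
        cycles.append(current)
    return cycles


NUM_BANKS = 4  # assumed bank count for this example

sequential = fetch_schedule(range(16), NUM_BANKS)
strided = fetch_schedule([0, 4, 8, 12], NUM_BANKS)
```

Streaming elements 0 to 15 across 4 banks takes 4 cycles at 4 values per cycle, while the stride-4 pattern hits bank 0 every time and needs 4 cycles for only 4 values, so data layout across banks matters.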
-
3. Partial Evaluation
Data constants are known at design time
Apply them to circuits to reduce hardware
Synthesis tools perform this automatically
Note: FPGAs are unique because we can easily generate new, optimized hardware configurations for each set of constants.
Example: 4-bit Ripple-Carry Adder
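A toy Python illustration of partial evaluation on a 4-bit ripple-carry adder. It builds the adder gate by gate while folding constants, assuming the B operand is the hypothetical constant 5 and the carry-in is 0. Real synthesis tools apply far more sophisticated optimizations; `Netlist` and `ripple_carry_adder` are illustrative names only.

```python
class Netlist:
    """Toy gate-level netlist that folds constants as gates are created.

    A signal is either an int constant (0 or 1) or a wire name (str).
    """

    def __init__(self):
        self.gates = []  # (output wire, op, input a, input b), in creation order

    def _gate(self, op, a, b=None):
        wire = f"n{len(self.gates)}"
        self.gates.append((wire, op, a, b))
        return wire

    def NOT(self, a):
        return 1 - a if isinstance(a, int) else self._gate("NOT", a)

    def AND(self, a, b):
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1:
            return a
        return self._gate("AND", a, b)

    def OR(self, a, b):
        if a == 1 or b == 1:
            return 1
        if a == 0:
            return b
        if b == 0:
            return a
        return self._gate("OR", a, b)

    def XOR(self, a, b):
        if a == 0:
            return b
        if b == 0:
            return a
        if a == 1:
            return self.NOT(b)
        if b == 1:
            return self.NOT(a)
        return self._gate("XOR", a, b)

    def evaluate(self, signals, inputs):
        """Simulate the netlist for one assignment of primary-input values."""
        ops = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y,
               "XOR": lambda x, y: x ^ y, "NOT": lambda x, _: 1 - x}
        values = dict(inputs)
        for wire, op, a, b in self.gates:
            values[wire] = ops[op](values[a], values.get(b))
        return [s if isinstance(s, int) else values[s] for s in signals]


def ripple_carry_adder(net, a_bits, b_bits, cin):
    """Chain full adders LSB first; returns the sum bits plus the carry-out."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        t = net.XOR(a, b)
        sums.append(net.XOR(t, carry))
        carry = net.OR(net.AND(a, b), net.AND(carry, t))
    return sums + [carry]


A = [f"a{i}" for i in range(4)]

# Generic adder: both operands and the carry-in are run-time inputs.
generic = Netlist()
ripple_carry_adder(generic, A, [f"b{i}" for i in range(4)], "cin")

# Partially evaluated adder: B = 5 and carry-in = 0 are known at design time.
CONST_B = 5
special = Netlist()
outs = ripple_carry_adder(special, A, [(CONST_B >> i) & 1 for i in range(4)], 0)

if __name__ == "__main__":
    print(len(generic.gates), "gates generic vs", len(special.gates), "specialized")
    for a in range(16):
        bits = special.evaluate(outs, {f"a{i}": (a >> i) & 1 for i in range(4)})
        assert sum(bit << i for i, bit in enumerate(bits)) == a + CONST_B
```

Run as a script, it prints `20 gates generic vs 9 specialized`, and the loop checks that the specialized circuit still computes a + 5 for every 4-bit a.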
-
RC Performance Examples
CFD: 23 GFLOPS sustained
Towards an RCC-based Accelerator for Computational Fluid Dynamics, Smith & Schnore, 2003
Adaptive beamforming: 20 GFLOPS
Parallel systolic array architecture
20 GFLOPS QR processor on a Xilinx Virtex-E FPGA, Walke, et al., 2000
Real-time holographic video display at 30 fps
Using field programmable gate arrays to scale up the speed of holographic video computation, Nwodoh
-
In Summary
Reconfigurable computing uses FPGAs to
emulate application-specific hardware
Achieve performance gains with dedicated hardware
Just about any kind of digital hardware can be implemented in the FPGA
Limited by FPGA capacity and design effort
Resurrect application-specific hardware architectures
SIMD, MIMD, Systolic Processor Arrays, Data-Flow