Tools for Reconfigurable Supercomputing
Kris Gaj
George Mason University
1
Application Development for Reconfigurable Computers
Program Entry
Compilation
Execution
Platform Mapping
Debugging & Verification
2
Tasks Addressed in This Presentation
Program Entry
Compilation
Execution
Platform Mapping
Debugging & Verification
3
Program Entry
Program
4
Platform Mapping: SW/HW Partitioning
5
Software (executed in the microprocessor system)
Hardware (executed in the reconfigurable processor system)
Program
SW/HW Partitioning & Coding: Traditional Approach
6
Specification
SW/HW Partitioning
SW Coding HW Coding
SW Compilation HW Compilation
SW Profiling HW Profiling
SW/HW Partitioning & Coding: New Approach
7
Specification
SW/HW Coding
SW Compilation HW Compilation
SW Profiling HW Profiling
SW/HW Partitioning
Program Entry for FPGA Accelerator Boards
8
[Figure: program entry methods (HLL, HDL, Graphical Data Flow Diagram) used for software and hardware in the traditional and extended approaches. Axes: increased productivity; increased capability to describe parallel execution.]
Program Entry for Reconfigurable Computers
9
[Figure: program entry methods (HLL, HDL, Graphical Data Flow Diagram) for two vendors. Star Bridge: software (COM objects), hardware (porting EDIF). SRC: software, hardware (HDL macros). Axes: increased productivity; increased capability to describe parallel execution.]
Examples of Software Environments for Reconfigurable Computers
DSP-oriented
• CoreFire from Annapolis Micro Systems
• Xtreme DSP from Xilinx Inc. & MathWorks
General-purpose
• Viva from Star Bridge Systems
• SRC Software Environment from SRC Computers, Inc.
10
CoreFire FPGA Application Builder
11
Design viewer
Library Cores window Message Window Diagram Editor
Xtreme DSP Environment
12
Star Bridge Software Environment
User input
13
Place & Route
.bin files
.ngo files
Application executable
Configuration bitstreams
Netlists
VIVA
Graphical User Interface
Xilinx
14
Library
Object
Sheets
SRC Compilation Process
• Application sources (.c or .f files) → µP Compiler → object files (.o files)
• Application sources (.c or .f files) → MAP Compiler → .v files
• .v files + macro sources (HDL sources, .vhd or .v files) → Logic synthesis → netlists (.ngo files)
• Netlists (.ngo files) → Place & Route → configuration bitstreams (.bin files) → .o files
• All .o files → Linker → application executable
15
Cray XD1 – Traditional Design Flow
16
• HDL entry: VHDL, Verilog, C
• Cores: RA I/F, QDR SRAM I/F
• Simulate: ModelSim
• Synthesize: Synplicity, Leonardo, Precision, Xilinx ISE
• Implement: Xilinx ISE
• Output: binary file and metadata
• Download: from command line or application
• Verify: Xilinx ChipScope
Source: [Cray, MAPLD04]
Cray XD1 – New design flow
17
Source: [Cray, MAPLD04]
SGI Altix Design Flow (HDLs)
18
IA-32 Linux machine (design iterations):
• Design Entry (Verilog, VHDL): .v, .vhd
• Design Verification: Behavioral Simulation (VCS, ModelSim); Static Timing Analysis (ISE Timing Analyzer)
• Design Synthesis (Synplify Pro, Amplify): .v, .vhd → .edf
• Design Implementation (ISE): .edf → .ncd, .pcf, .bin
• Metadata Processing (Python): .v, .vhd → .cfg
Altix:
• Device Programming (RASC Abstraction Layer, Device Manager, Device Driver): .bin, .cfg
• Real-time Verification (gdb): .c
SGI Altix Design Flow (HLLs)
IA-32 Linux machine:
• HLL Design Entry (Handel-C, Impulse C, Mitrion C, Viva)
• RTL Generation and Integration with Core Services: .v, .vhd
• Design Verification: Behavioral Simulation (VCS, ModelSim); Static Timing Analysis (ISE Timing Analyzer)
• Design Synthesis (Synplify Pro, Amplify): .v, .vhd → .edf
• Design Implementation (ISE): .edf → .ncd, .pcf, .bin
• Metadata Processing (Python): .v, .vhd → .cfg
Altix:
• Device Programming (RASC Abstraction Layer, Device Manager, Device Driver): .bin, .cfg
• Real-time Verification (gdb): .c
19
Platform Mapping: FPGA Mapping
20
Software
Hardware
Program
FPGA 1 FPGA 2
FPGA 3
FPGA 4
Example of FPGA Mapping
21
[Figure: a program consisting of add, multiply, and divide operations, mapped either onto a single FPGA or split across FPGA 1 and FPGA 2.]
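FPGA Mapping in SRC

Makefile:
    MAPFILES  = FPGA1.mc FPGA2.mc
    PRIMARY   = FPGA1.mc
    SECONDARY = FPGA2.mc
    CHIP2     = FPGA2.mc

FPGA1.mc:
    void fpga1(int64_t a, int64_t b, int64_t *sum, int mapno)
    {
        int64_t c, temp;

        send_to_bridge(b);          /* pass b to FPGA 2 for the divide */
        c = a * const1;             /* multiply locally on FPGA 1 */
        recv_from_bridge(&temp);    /* receive the quotient from FPGA 2 */
        *sum = temp + c;            /* add on FPGA 1 */
    }

FPGA2.mc:
    void fpga2()
    {
        int64_t a, d;

        recv_from_bridge(&a);       /* operand sent by FPGA 1 */
        d = a / const2;             /* divide on FPGA 2 */
        send_to_bridge(d);          /* return the quotient to FPGA 1 */
    }

[Figure: a feeds the multiply on FPGA 1, b feeds the divide on FPGA 2, and both results feed the add on FPGA 1, which produces sum.]
22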
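The split between the two chips can be tried out on an ordinary CPU before touching the MAP compiler. The following is only a sketch in plain C, not SRC code: the bridge is replaced by two global variables, the second chip runs sequentially instead of concurrently, and the values of CONST1 and CONST2 as well as the names fpga1_emulated and fpga2_emulated are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: the slide does not give values for const1/const2. */
#define CONST1 3
#define CONST2 2

/* One-word software stand-ins for the two directions of the bridge. */
static int64_t to_fpga2, to_fpga1;

/* Emulated secondary chip: divide the operand received over the bridge. */
void fpga2_emulated(void)
{
    int64_t a = to_fpga2;       /* stands in for recv_from_bridge(&a) */
    to_fpga1 = a / CONST2;      /* stands in for send_to_bridge(d)    */
}

/* Emulated primary chip: multiply locally, then add the returned quotient. */
int64_t fpga1_emulated(int64_t a, int64_t b)
{
    int64_t c, temp;

    to_fpga2 = b;               /* stands in for send_to_bridge(b) */
    c = a * CONST1;             /* in hardware this would overlap with the divide */
    fpga2_emulated();           /* sequential stand-in for the concurrently running chip */
    temp = to_fpga1;            /* stands in for recv_from_bridge(&temp) */
    return temp + c;
}
```

Calling fpga1_emulated(4, 10) with the assumed constants computes 4 * 3 + 10 / 2. The point of the sketch is only the ordering: the send precedes the local multiply so that, on real hardware, the divide on the second chip can proceed in parallel.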
FPGA Mapping in VIVA™
By changing an object's attributes, one can specify where the object is to be located.
23
Platform Mapping: FPGA-FPGA Data Transfer & Synchronization
24
Software
Hardware
Program
FPGA 1 FPGA 2
FPGA 3
FPGA 4
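FPGA-FPGA Data Transfer in SRC

FPGA1.mc:
    void fpga1(int64_t a, int64_t b, int64_t c, int64_t *d)
    {
        send_to_bridge(a, b, c);        /* a, b, c go to FPGA 2 */
        /* computation1 */
        recv_from_bridge(d);            /* d comes back from FPGA 2 */
    }

FPGA2.mc:
    void fpga2()
    {
        int64_t a, b, c, d;

        recv_from_bridge(&a, &b, &c);   /* operands sent by FPGA 1 */
        /* computation2 */
        send_to_bridge(d);              /* result returned to FPGA 1 */
    }

[Figure: FPGA 1 (computation1) and FPGA 2 (computation2) connected by two 64-bit links; a, b, and c travel from FPGA 1 to FPGA 2, and d travels back.]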
25
FPGA-FPGA Data Transfer in SRC
[Figure: bridge port between the FPGAs, built from two FIFOs, each 32 words deep and 64 bits wide.]
26
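The bridge port FIFO (32 words of 64 bits) behaves like a bounded queue. A minimal software model is sketched below, assuming a simple ring buffer; the type and function names (bridge_fifo, fifo_push, fifo_pop) are hypothetical, and the full and empty return values only stand in for whatever stall or wait mechanism the actual hardware uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 32           /* 32 words, as on the slide */

typedef struct {
    uint64_t words[FIFO_DEPTH]; /* each word is 64 bits wide */
    unsigned head;              /* index of the oldest word */
    unsigned count;             /* number of words currently stored */
} bridge_fifo;

void fifo_init(bridge_fifo *f)
{
    f->head = 0;
    f->count = 0;
}

/* Returns false when the FIFO is full; a sender would have to wait. */
bool fifo_push(bridge_fifo *f, uint64_t w)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->words[(f->head + f->count) % FIFO_DEPTH] = w;
    f->count++;
    return true;
}

/* Returns false when the FIFO is empty; a receiver would have to wait. */
bool fifo_pop(bridge_fifo *f, uint64_t *w)
{
    if (f->count == 0)
        return false;
    *w = f->words[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

One such queue per direction models the two FIFOs in the figure. A producer that runs more than 32 words ahead of the consumer gets a false return from fifo_push, which is the situation a designer has to account for when overlapping computation with transfers.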
FPGA-FPGA Data Transfer in VIVA™
27
For designs mapped over several FPGAs:
• The system description must include those FPGAs over which the design is to be mapped.
• Special partitioning objects, placed between the modules to be synthesized, automatically map the relevant lines between the FPGAs.
Platform Mapping: Use of Internal and External Memories
28
Software
Hardware
Program
FPGA 1
FPGA 2
FPGA 3
FPGA 4
OCM
SM
LM
OCM – On-Chip Memory; LM – Local Memory; SM – Shared Memory
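Using On-Chip Memory (OCM) in SRC

    void sum(int64_t a[], int *c, int mapno)
    {
        BANK_A_ALLOC(AL, int64_t, SIZE);   /* AL[]: array in shared memory (OBM) */
        int64_t ocm_a[SIZE];               /* ocm_a[]: array in on-chip memory (OCM) */
        int64_t tmp = 0;
        int i;

        cm2obm_0(AL, a, byteLength);       /* transfer a[] into AL[] */
        wait_server_0();                   /* wait for the transfer to finish */

        for (i = 0; i < SIZE; i++) {
            ocm_a[i] = AL[i];              /* copy from OBM to OCM */
        }
        for (i = 0; i < SIZE; i++) {
            tmp = ocm_a[i] + tmp;          /* accumulate out of OCM */
        }
        *c = tmp;                          /* return the result through c */
    }

[Figure: AL[] in SM (OBM) feeds ocm_a[] in the FPGA's OCM, which feeds the computations that produce c; the data paths are marked 64 and 32.]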
29
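The three stages of the OCM example (transfer into on-board memory, copy into on-chip memory, compute from on-chip memory) can be mimicked in plain C to check the arithmetic before compiling for the MAP. This is a sketch only: SIZE is an assumed value, memcpy stands in for the cm2obm_0 transfer, and the names obm_bank_a and sum_emulated are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIZE 8                       /* hypothetical array length */

/* Stand-in for an on-board memory (OBM) bank. */
static int64_t obm_bank_a[SIZE];

int64_t sum_emulated(const int64_t a[])
{
    int64_t ocm_a[SIZE];             /* stand-in for on-chip block RAM */
    int64_t tmp = 0;
    int i;

    /* Stage 1: transfer from host memory into the OBM bank. */
    memcpy(obm_bank_a, a, SIZE * sizeof(int64_t));

    /* Stage 2: copy from OBM into on-chip memory. */
    for (i = 0; i < SIZE; i++)
        ocm_a[i] = obm_bank_a[i];

    /* Stage 3: compute using on-chip memory only. */
    for (i = 0; i < SIZE; i++)
        tmp += ocm_a[i];

    return tmp;
}
```

On a CPU the intermediate copy buys nothing; it is kept here only to mirror the structure of the MAP function, where the local array is what the compiler places in block RAM.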
Using On-Chip Memory (OCM) in VIVA™
Special objects under the Memory subsystem of the library allow the programmer to use the on-chip memory of the Xilinx Virtex-II chip.
30
Platform Mapping: I/O
31
Software
Hardware
Program
FPGA 1 FPGA 2
FPGA 3
FPGA 4
SM
LM
OCM
SRC
StarBridge
Run-Time Reconfiguration in SRC
32
Program in C or Fortran:
Main program
    ...
    Function_1(a, d, e)
    ...
    Function_2(d, e, f)
    ...
Function_1
    Macro_1(a, b, c)
    Macro_2(b, d)
    Macro_2(c, e)
Function_2
    Macro_3(s, t)
    Macro_1(n, b)
    Macro_4(t, k)
FPGA contents after the Function_1 call: Macro_1 (input a, outputs b and c) feeding two instances of Macro_2 (outputs d and e).
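The idea that a sequence of function calls determines the sequence of FPGA configurations can be illustrated with a small host-side model. The sketch below is not SRC runtime code: it assumes that a configuration is loaded only when the requested one differs from the one already resident, and the names ensure_loaded, function_1, function_2, and reconfigurations are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Name of the configuration currently resident on the (modelled) FPGA. */
static const char *loaded = "";
static int reconfig_count = 0;

/* Load a configuration only if it is not already resident (an assumption). */
static void ensure_loaded(const char *name)
{
    if (strcmp(loaded, name) != 0) {
        loaded = name;
        reconfig_count++;
    }
}

/* Function_1 uses Macro_1 and two instances of Macro_2. */
void function_1(void) { ensure_loaded("Function_1"); }

/* Function_2 uses Macro_3, Macro_1 and Macro_4. */
void function_2(void) { ensure_loaded("Function_2"); }

int reconfigurations(void) { return reconfig_count; }
```

Under this model, calling function_1 twice in a row costs one reconfiguration, while alternating between function_1 and function_2 costs one per call, which is why the ordering of calls in the main program matters.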
Run-time Reconfiguration in VIVA™
33
Reconfiguration is possible by using the spawn object. By specifying the FileName attribute, a VIVA executable (.vex file) or a VIVA project can be loaded onto the same or a different FPGA.
Ideal Program Entry
Function
Program Entry
34
Actual Program Entry
35
Inputs to Program Entry:
• Function
• Preferred architectures
• SW/HW partitioning
• SW/HW interface
• FPGA mapping
• Data transfers & synchronization
• Use of internal and external memories
• Use of FPGA resources (multipliers, µP cores)
• Sequence of run-time reconfigurations
Evolution and the current status of tools
36
[Figure: chart marking each task below as not implemented, manual entry, or compiler automated for SRC, Star Bridge, and other vendors.]
• µP-FPGA partitioning
• FPGA-FPGA partitioning
• µP-FPGA data transfer
• FPGA-FPGA data transfer
• Computation-data transfer overlapping
• Choosing component version
Library Development - SRC
37
                Application Programmer    Library Developer
µP system       HLL (C, Fortran)          HLL (C, Fortran), LLL (ASM)
FPGA system     HLL (C, Fortran)          HLL (C, Fortran), HDL (VHDL, Verilog)
Library Development - StarBridge
38
                Application Programmer    Library Developer
µP system       GDF (Viva)                GDF (Viva), HLL, LLL (C++, ASM)
FPGA system     GDF (Viva)                GDF (Viva), HDL (VHDL, Verilog)
Debugging & Verification
39
CoreFire™ FPGA Application Debugger
40
CoreFire Simulation
• Insert debug modules during design editing
• Step through the design using data flow (one step = one module)
• View the value and status of each debug module as a waveform or a table of values
• Read and write directly to registers
• Read and write directly to memory
41
X86 System in VIVA™
The FileIn object as it appears when the x86 system is loaded.
42
X86 System in VIVA™
The FileIn object as it appears when the FPGA system description is loaded.
43
Debugging in VIVA™
44
Data can be viewed with widgets, which are input and output 'horns' placed in a worksheet. Several display options are available, including the kind of view and whether the data is shown as HEX or INT.
MAP Board Execution
Application
MAP Runtime Library
ComList Code
Wrapper Code
User Logic Subroutine for MAP
Data & Flags
MAP Board
User FPGAs
Control Processor
On-board Memory
User Logic
Registers & Flags
Logic Macro (four instances)
ComList Processor
DMA Engine
45
MAP Emulator + DFG Simulator
46
Emulator
Application
MAP Runtime Library
ComList Code
Wrapper Code
User Logic Subroutine for MAP
Data & Flags
User FPGAs
Control Processor
On-board Memory
User Logic
Registers & Flags
C Code Macro (four instances)
ComList Processor
DMA Engine
MAP Emulator + Verilog Simulator
Emulator
Application
MAP Runtime Library
ComList Code
Wrapper Code
User Logic Subroutine for MAP
Data & Flags
User FPGAs
Control Processor
On-board Memory
User Logic
Registers & Flags
VCS
Verilog Macro (four instances)
ComList Processor
DMA Engine
47
Summary – Program Entry
Columns: SRC | StarBridge | Xtreme DSP | Corefire
Program entry model: HLL | GDF | HLL, GDF | HLL, GDF
Programming languages:
HLL C, Fortran Matlab Java
GDF - VIVA IIADL Simulink Corefire Application Builder
HDL: VHDL, Verilog | EDIF | VHDL, Verilog | ?
48
Summary – Partitioning & Data Transfer
FPGA mapping
    SRC: separate HLL functions
    Star Bridge: system attributes of objects
    Xtreme DSP: separate design sheets
    Corefire: separate design sheets
FPGA-FPGA data transfer
    SRC: send_to_bridge, recv_from_bridge macros
    Star Bridge: special data transfer objects such as PE1=>PE2_50
    Xtreme DSP: interface library components
    Corefire: interface library components
49
Summary – Synchronization
SRC: implicit
Star Bridge: explicit (Go-Done-Busy-Wait)
Xtreme DSP: explicit (done, empty, full, etc.)
Corefire: implicit
50
Summary – Run-time reconfiguration
Run-time reconfiguration
    SRC: sequence of MAP function calls
    Star Bridge: using spawn objects associated with VIVA executables
    Xtreme DSP: ?
    Corefire: ?
51
Summary – Use of Internal Resources
Using internal components of the FPGA
Block RAMs
    SRC: arrays defined inside MAP functions
    Star Bridge: special objects under the memory subsystem
    Xtreme DSP: memory block sets
    Corefire: memory operators and resources
Multipliers
    SRC: library functions
    Star Bridge: objects under the arithmetic subsystem
    Xtreme DSP: math block sets
    Corefire: modules in the math library
52
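Summary – Data Types
Unsigned integers
    SRC: 8, 16, 32, 64 bits
    Star Bridge: 1-128 bits
    Xtreme DSP: -
    Corefire: 32, 64 bits
Signed integers
    SRC: 8, 16, 32, 64 bits
    Star Bridge: 1-128 bits
    Xtreme DSP: -
    Corefire: 8 bits
Fixed point
    SRC: -
    Star Bridge: fix16, fix32
    Xtreme DSP: signed, unsigned, variable size
    Corefire: -
Floating point
    SRC: single & double precision
    Star Bridge: 32-bit single precision
    Xtreme DSP: 64-bit double precision
    Corefire: 32-bit single precision
Arrays
    SRC: 8, 16, 32, 64 bit
    Star Bridge: vectors of 1-128 bits
    Xtreme DSP: -
    Corefire: -
User-defined types
    SRC: -
    Star Bridge: yes
    Xtreme DSP: user-defined precision options
    Corefire: -
Complex
    SRC: -
    Star Bridge: -
    Xtreme DSP: -
    Corefire: 16-bit signed integers, float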
53
Summary – Libraries
Libraries                       SRC        Star Bridge   Xtreme DSP   Corefire
arithmetic                      yes        yes           yes          yes
logic                           yes        yes           yes          yes
storage                         implicit   yes           yes          yes
memory allocation and access    yes        yes           yes          yes
control                         yes        yes           yes          -
data transfer                   yes        yes           yes          yes
debugging and profiling         yes        yes           yes          yes
DSP                             -          -             yes          yes
communication                   -          -             yes          -
User-defined components
    SRC: macros in VHDL or Verilog
    Star Bridge: objects in VIVA
    Xtreme DSP: black boxes in Simulink
    Corefire: macros in Corefire
54
Summary – Debugging & Verification
Debugging and verification   SRC   Star Bridge   Xtreme DSP   Corefire
software emulation           yes   yes           yes          no
HDL simulation               yes   no            yes          no
55
Summary – Third Party Tools & I/O
Use of external tools
Logic synthesis
    SRC: Synplicity Synplify Pro
    Star Bridge: -
    Xtreme DSP: Synplicity Synplify Pro, Mentor Graphics Leonardo Spectrum, Xilinx XST
    Corefire: -
MAP, PAR
    SRC: Xilinx ISE
    Star Bridge: Xilinx ISE
    Xtreme DSP: Xilinx ISE
    Corefire: Xilinx ISE
µP compilation
    SRC: Intel
    Star Bridge: -
    Xtreme DSP: Matlab
    Corefire: -
Schematic capture
    SRC: -
    Star Bridge: VIVA
    Xtreme DSP: Simulink
    Corefire: Application Builder
HDL simulation
    SRC: VCS
    Star Bridge: -
    Xtreme DSP: ModelSim
    Corefire: -
Components of the environment
    SRC: editors, compiler, DFG behavioral simulator, VCS HDL simulator
    Star Bridge: VIVA (program entry, compiler, debugging and verification, execution environment)
    Xtreme DSP: Simulink program entry, HDL simulator, synthesis compiler, place and route tool
    Corefire: Application Builder, debugger
Input and output
    SRC: standard HLL I/O functions, files
    Star Bridge: widgets, files
    Xtreme DSP: files, block sets
    Corefire: files, waveforms, tables
56
Acknowledgements
• SRC
• Star Bridge
• Sashisu Bajracharya (GMU)
• Esmail Chitalwala (GWU)
• Esam El-Araby (GWU)
• Miaoqing Huang (GWU)
• Allen Michalski (USC)
• Nghi Nguyen (GMU)
• Proshanta Saha (GWU)
• Nandkishore Sastry (GMU)
• Chang Shu (GMU)
• Mohamed Taher (GWU)
57