VAPRES: A Virtual Architecture for Partially Reconfigurable Embedded Systems
Presented by Joseph Antoon
Abelardo Jara-Berrocal, Ann Gordon-Ross
NSF Center for High-Performance Reconfigurable Computing (CHREC)
Department of Electrical and Computer Engineering
University of Florida
2Joseph AntoonUniversity of Florida
Adaptive Hardware Applications
• Kalman filter used for target tracking
  • Finds likely location from noisy measurements
  • Optimized filter depends on target type
Slow Target: Low Power, Constant Gain, Low Bandwidth, Kalman Filter
Fast Target: High Power, Constant Gain, High Bandwidth, Kalman Filter
Airborne Target: High Power, Variable Gain, Low Bandwidth, Multi-scale Smoother
Noisy Target: High Power, Variable Gain, Low Bandwidth, Kalman Filter
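The difference between the constant-gain and variable-gain filters listed for each target type can be illustrated with a scalar sketch. This is not the filter hardware used in this work: it assumes a one-dimensional random-walk target model, and the function names and the noise parameters `q` and `r` are hypothetical.

```python
def kalman_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Variable-gain scalar Kalman filter (assumed random-walk target model)."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q              # predict: uncertainty grows by process noise q
        k = p / (p + r)        # gain is recomputed on every step
        x = x + k * (z - x)    # correct the estimate with the residual
        p = (1.0 - k) * p      # uncertainty shrinks after the correction
        estimates.append(x)
    return estimates


def constant_gain_1d(measurements, k=0.2, x0=0.0):
    """Constant-gain variant: same correction step, gain fixed in advance."""
    x = x0
    estimates = []
    for z in measurements:
        x = x + k * (z - x)
        estimates.append(x)
    return estimates
```

The constant-gain form needs no covariance update, which is one plausible reason it maps to a lower-power implementation; the variable-gain form adapts its gain as its uncertainty changes.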
Adaptive Hardware Applications
• FPGAs often out-perform CPUs
  • Parallel computing power
  • Kalman filters scale well
• Partial Reconfiguration (PR)
  • Run-time HW adaptation
  • Allows FPGA time-sharing
• Communication Challenge
  • Transfers between modules can lock up CPU
  • Inter-module network relieves the CPU of module-to-module transfers
[Figure: CPU vs. FPGAs; FPGA device containing a CPU, memory, Filter A, and Filter B]
Using Partial Reconfiguration
System Specifications
1. Define system
2. Platform studio
3. Import into ISE
4. Divide project into mandated hierarchy
   top: static, prr_a, prr_b
5. Set PRRs as black boxes
6. Code PR region HDL
7. Synthesize!
8. Estimate (guess) a good floorplan
9. Map on to PlanAhead
10. Create “configurations”
11. Implement!
12. Write software
Could you make it just a bit different…
Identifying Issues With PR
• Support
  • Only supported by Xilinx
  • Altera support announced
• Lack of abstraction
  • Manual partitioning
  • Manual floor-planning
• App-specific architectures
  • Increased time-to-market
  • Reduced flexibility
Frustrating Design Flow!
In this work, we propose VAPRES
• A Virtual Architecture for PR Embedded Systems
• Abstracts base system from application
• Automates design flow and floor-planning
• Scalable, flexible features
VAPRES Architecture
[Figure: VAPRES block diagram showing the MicroBlaze CPU, PLB Bus, DCR Bridge, two PR Sockets, PR Region 1 and PR Region 2, FSLs (Fast Simplex Links), Switch 1 and Switch 2 with interfaces (IF), and the IO Module (to IO)]
• PR Regions (PRRs)
  – Independent clocks
  – FIFO-based I/O
  – Online placement
  – Created separately
• MACS
  – Intermodule network
• Flexible, scalable
  – PR Region Count
  – PR Region Size
  – MACS bandwidth
    • Module channel width
    • Left to right channel width
    • Right to left channel width
  – IO Module Count
PR Region Connectivity
[Figure: PR Region (PRR) attached to the MicroBlaze and a MACS Switch through FSLs (Fast Simplex Links) with producer/consumer queues and slice macros. The PR Socket contains a Device Control Register (DCR) with Enable, Reset, and Clock Select fields. Clock path labels: Fast Clock, Slow Clock, Clock Multiplexer (BUFGMUX), Regional Clock Buffer (BUFR), Clock Macro.]
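The PR socket's control fields (Enable, Reset, Clock Select) can be pictured as bits in one control word. The sketch below is only an illustration: the slides name the fields but not their bit positions, polarity, or the order in which software toggles them, so the constants and both helper functions are assumptions.

```python
# Hypothetical bit positions; the slides give field names only.
ENABLE = 0x1        # Enable field
RESET = 0x2         # Reset field
CLOCK_SELECT = 0x4  # Clock Select field: 0 = slow clock, 1 = fast clock (assumed)


def prepare_for_reconfiguration(ctrl: int) -> int:
    """Clear Enable and assert Reset before the region is rewritten (assumed order)."""
    return (ctrl & ~ENABLE) | RESET


def start_module(ctrl: int, fast_clock: bool) -> int:
    """Pick a clock, release Reset, and set Enable for a newly loaded module."""
    ctrl &= ~CLOCK_SELECT
    if fast_clock:
        ctrl |= CLOCK_SELECT
    return (ctrl & ~RESET) | ENABLE
```

Keeping the clock choice in the same register as Enable and Reset lets software change a region's clock as part of the same sequence that swaps its module.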
MACS – Intermodule Network
• Minimal Adaptive-Routing Circuit Switched Network
• Circuit based
  • Uses streaming channels
  • Circuit set by first word in channel
  • Fast setup (<10 cycles)
[Figure: Module 1, Module 2, and Module 3 attached to switches through interfaces (IF); channel labels: dst, end]
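The idea that a circuit is set by the first word in a channel can be shown with a toy software model. It is a sketch under assumptions, not the MACS hardware: the `CircuitSwitch` class, the `END` marker, and the use of a plain integer as the destination word are all hypothetical, loosely following the "dst" and "end" labels in the figure.

```python
from collections import defaultdict

END = object()  # stand-in for the end-of-stream marker (assumed encoding)


class CircuitSwitch:
    """Toy model of one input channel: the first word sets up the circuit."""

    def __init__(self):
        self.destination = None             # None means no circuit is open
        self.delivered = defaultdict(list)  # words received per destination

    def push(self, word):
        if self.destination is None:
            self.destination = word         # first word names the destination
        elif word is END:
            self.destination = None         # end marker tears the circuit down
        else:
            self.delivered[self.destination].append(word)


switch = CircuitSwitch()
for word in [3, 10, 11, END, 2, 20, END]:
    switch.push(word)
```

After the loop, module 3 has received the words 10 and 11 and module 2 has received 20. Only the first word of each stream carries routing information, so every following word passes through without a per-word header.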
Design Methodology
• Two separate design flows
  • Base System
  • Application
• Applications made independently
  • Only base system specs needed
[Figure: one Base Flow and three App Flows, linked by the base system specifications]
Base System Design Flow
• User feeds specs to VAPRES
• Base design created from specs
  • Parametric templates used
• System files generated
  • Floorplan and Constraints
  • Embedded Dev. Kit (EDK) Files
  • HDL
• Synthesis
• Implementation
• Bitstream generated
• System downloaded to the board
[Figure: Base system flow. System Specs and Templates, then Base Design, then HDL and Floorplan, then Synthesis, Implementation, and Generate Bitstream]
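Generating system files from specs through parametric templates can be sketched as follows. The `BaseSpec` record, its field names, and the template text are hypothetical placeholders; the output is not a real EDK, HDL, or constraint file format, only an illustration of filling templates from a specification.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class BaseSpec:
    """Hypothetical spec record; fields mirror parameters named on the slides."""
    prr_count: int
    prr_width_clb: int
    prr_height_rows: int
    macs: bool


# Placeholder template text, not a real EDK or constraint file format.
PRR_TEMPLATE = Template("prr_$index: width=$width CLB, height=$height rows")


def generate_system_description(spec):
    """Expand one template line per PR region, then record the MACS option."""
    lines = [
        PRR_TEMPLATE.substitute(index=i, width=spec.prr_width_clb,
                                height=spec.prr_height_rows)
        for i in range(spec.prr_count)
    ]
    lines.append("macs: " + ("yes" if spec.macs else "no"))
    return lines


description = generate_system_description(BaseSpec(2, 16, 2, True))
```

Because every generated line derives from the spec record, changing the PR region count or size regenerates the whole description without hand editing.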
Application Design Flow
• Partition App
  • Hardware
  • Software
• Software flow
  • Compile
  • Link
• Hardware Flow
  • Synthesize
  • Implement
  • Bitstream gen
• Download App
[Figure: Application Flow. System Specs feed Application Decomposition, which yields Source Code (with the API: Compile, Link, Executable) and HDL (Synthesis, Implementation, Generate Bitstream)]
Revisiting Target Tracking
[Figure: VAPRES system with the MicroBlaze CPU, PLB Bus, DCR Bridge, PR Socket, a blank PR Region, Switch 2 with interfaces (IF), an IO Module connected to a Sensor, the ICAP, and Filter Storage. Callout: "Looks like a spaceship". Module shown: Aerospace Kalman Filter.]
Seamless Filter Swapping
• Filter tracks target
  • Target slows down
  • Filter swap needed
• First load new filter
  • Spare region used
  • Old filter continues
• Redirect traffic
  • Downtime is now negligible
  • Previously in seconds
[Figure: MicroBlaze CPU, IO Module, and two switches (SW2) with interfaces (IF), each serving a module slot. Modules shown: High Power Kalman Filter, Low Power Kalman Filter, Blank Module. Callout: "The target changed!"]
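The swap sequence (load the new filter into the spare region while the old filter continues, then redirect traffic) can be modeled in a few lines. This is a behavioral sketch only: the `TrackingSystem` class, its two-region layout, and the step that blanks the old region afterwards are assumptions made for illustration.

```python
class TrackingSystem:
    """Toy model of two PR regions sharing one sensor stream."""

    def __init__(self, active_filter):
        self.regions = {0: active_filter, 1: None}  # region 1 starts blank
        self.active = 0                             # region receiving sensor data
        self.log = []                               # (filter, sample) pairs

    def process(self, sample):
        self.log.append((self.regions[self.active], sample))

    def swap_filter(self, new_filter, samples_during_load=()):
        spare = 1 - self.active
        # 1) Reconfigure the spare region; the old filter keeps processing.
        for sample in samples_during_load:
            self.process(sample)
        self.regions[spare] = new_filter
        # 2) Redirect the sensor stream to the new filter.
        old, self.active = self.active, spare
        # 3) The old region becomes the spare for the next swap (assumed).
        self.regions[old] = None


system = TrackingSystem("high_power_kalman")
system.process(1)
system.swap_filter("low_power_kalman", samples_during_load=[2, 3])
system.process(4)
```

Samples 2 and 3 arrive during reconfiguration and are still handled by the old filter, so no sample is dropped; only the redirect step separates the two filters.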
Experimental Setup - Resources
• Implemented on ML401 board
  • Virtex-4 LX25 FPGA
• VAPRES
  • Two PR Regions
  • 16x11 CLB region size
  • Two IOMs
• MACS
  • Four switches
  • 32-bit channels
  • Two channels left to right
  • Two channels right to left
[Figure: Floor Plan and Base System View, post place and route]
Results – Resource Usage
[Bar chart: VAPRES Resource Usage. MicroBlaze: 9721, MACS: 1890; axis 0 to 12000]
[Pie charts: share of device resources by device]
Device   MicroBlaze   MACS   Remaining
LX25     67%          14%    19%
LX60     28%          6%     66%
LX100    17%          4%     79%
Experimental Setup – Timing
• Two methods to reconfigure
  • Implemented in software
  • 1) Write bitfile in one stage
  • 2) Write bitfile in two stages
• One-stage method
  • Load Flash sector to BRAM
  • Write to ICAP
  • Repeat until bitfile is loaded
• Two-stage method
  • Load bitfile into BRAM
  • Write bitfile to ICAP
[Figure: data path from Flash to BRAM to ICAP, shown for both methods. Callouts: "Less RAM required", "Load once, write often". Legend: board peripheral, FPGA structure]
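The two reconfiguration methods differ only in how much of the bitfile is buffered before it is written to the ICAP. The sketch below models that difference in plain Python; the function names, the `write_icap` callback, and the byte-string stand-ins for Flash and BRAM are hypothetical and do not correspond to any real driver API.

```python
def reconfigure_one_stage(flash, write_icap, sector_size):
    """Stream the bitfile one sector at a time; returns the peak buffer size."""
    peak = 0
    for offset in range(0, len(flash), sector_size):
        buffer = flash[offset:offset + sector_size]  # load Flash sector to BRAM
        peak = max(peak, len(buffer))
        write_icap(buffer)                           # write that sector to ICAP
    return peak


def load_bitfile(flash):
    """Two-stage method, stage one: copy the whole bitfile into BRAM."""
    return bytes(flash)


def write_bitfile(bram_copy, write_icap):
    """Two-stage method, stage two: write the buffered bitfile to ICAP."""
    write_icap(bram_copy)
```

In this model the one-stage method never buffers more than one sector, while the two-stage method holds the full bitfile but can repeat `write_bitfile` without touching Flash again.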
Results – Reconfiguration Time
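[Bar chart: reconfiguration time for the One-Stage and Two-Stage methods; axis 0 to 1.25; series: Loading + Writing, Loading, Writing]
[Pie chart: Two-Stage Time Breakdown. Loading Flash: 93%, Writing ICAP: 7%]
[Pie chart: One-Stage Time Breakdown. Loading Flash: 95%, Writing ICAP: 5%]
ICAP write reduced to 71.94 ms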
Experimental Setup - Scaling
• Four VAPRES Systems Set Up
System     PRRs   Width    Height   MACS
Small      1      10 CLB   1 row    No
Medium     1      10 CLB   2 rows   No
Large      2      16 CLB   2 rows   Yes
Populous   3      16 CLB   1 row    Yes
Results - Scalability
[Bar chart: Resources (slices) for the Small, Medium, Large, and Populous systems; axis 6600 to 7600. Annotations: Increased PRR Size, Added PRR, Decreased PRR Size]
Results - Scalability
[Bar chart: Maximum Clock (MHz) for the Small, Medium, Large, and Populous systems; axis 114 to 121]
All designs meet the 100 MHz constraint
Conclusions
• We developed VAPRES
  • Virtual Architecture for Partially Reconfigurable Embedded Systems
• Contributions
  • Modular design methodology
  • PR regions with independent, selectable clocks
  • Highly parametric design
  • Seamless filter swapping
• Future work
  • Algorithms for runtime module placement
  • Tools to assist system design formulation
  • Context save and restore for modules
Thank you for attending
Questions?