Post on 22-Dec-2015
HW/SW Co-Synthesis of Dynamically Reconfigurable
Embedded Systems
HW/SW Partitioning and Scheduling Algorithms
Presentation Outline
Introduction
Basics/Preliminaries
Problem Formulation
Representative Approaches
Conclusion
Introduction
Embedded Systems? Special-purpose, dedicated systems
Design Goals? Highly optimized but cost-efficient
Examples: an embedded system that provides a friendly interface; hand-held devices, such as a cellular phone or PDA; an industrial controller; a safety-critical controller, such as an antilock brake controller in a car or an autopilot
Generic Architectural Template
[Figure: general-purpose processor, digital signal processor, ASIC, dedicated data path, memory]
HW/SW Co-Design
Need? Increasing design complexity; the design must be explored efficiently; CAD/design automation
Co-Design Steps
Co-specification: specifications describing both HW and SW elements (and the relationship between them)
Co-synthesis: automatic or semi-automatic design of HW and SW to meet a specification
Co-simulation: simultaneous simulation of HW and SW elements, often at different levels of abstraction
Co-Synthesis Problem
Partitioning the functional description between HW and SW
Allocating processes to processing elements (PEs)
Scheduling processes on the PEs
Binding processing elements to particular component types
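The partitioning step listed above can be illustrated with a toy example. This is a minimal sketch, not any published co-synthesis algorithm: the task table, the single area budget, and the assumption that tasks run strictly one after another are all hypothetical, and the search is a plain exhaustive enumeration.

```python
from itertools import product

# Hypothetical task data: (name, software time, hardware time, hardware area).
TASKS = [
    ("fir", 40, 8, 30),
    ("fft", 90, 15, 60),
    ("ctrl", 10, 9, 20),
]

def best_partition(tasks, area_budget):
    """Try every HW/SW assignment and keep the fastest one that fits the area budget."""
    best = None
    for choice in product(("SW", "HW"), repeat=len(tasks)):
        area = sum(t[3] for t, c in zip(tasks, choice) if c == "HW")
        if area > area_budget:
            continue  # violates the (assumed) hardware area constraint
        # Assumption: tasks execute sequentially, so their times simply add up.
        time = sum(t[2] if c == "HW" else t[1] for t, c in zip(tasks, choice))
        if best is None or time < best[0]:
            best = (time, dict(zip((t[0] for t in tasks), choice)))
    return best

print(best_partition(TASKS, 90))
```

Exhaustive search only works for a handful of tasks; it is used here purely to make the trade-off between area and execution time visible.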
Dynamically Reconfigurable Logic
An alternative to conventional ASICs and general-purpose processors
Customized after fabrication for a wide class of applications
Partially reconfigured at run time to implement different tasks without affecting the computation of other tasks
[Figure: on-chip SRAM/cache, embedded CPU, dynamically reconfigurable data path]
DRL Architecture Model
Frame: the atomic reconfiguration storage unit that can be dynamically updated
Multiple frames are reconfigured one by one
Reconfiguration of one frame does not disturb the execution of other frames
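The frame model can be sketched as a small data structure. This is a toy illustration under stated assumptions: the class name `FrameArray`, the contiguous placement of a task, and the rule that a frame is skipped when it already holds the same slice of the same configuration are all hypothetical choices, not part of the model described above.

```python
class FrameArray:
    """Toy model of a DRL device as a row of independently reconfigurable frames."""

    def __init__(self, num_frames):
        # Each entry is None (empty) or (configuration name, slice index).
        self.config = [None] * num_frames
        self.reconfig_count = 0  # number of frames rewritten so far

    def load(self, start, width, name):
        """Load configuration `name` into frames start .. start+width-1, one by one."""
        if start < 0 or start + width > len(self.config):
            raise ValueError("task does not fit at this position")
        for offset in range(width):
            wanted = (name, offset)
            frame = start + offset
            if self.config[frame] != wanted:
                self.config[frame] = wanted   # only this frame is touched
                self.reconfig_count += 1

fa = FrameArray(6)
fa.load(0, 2, "A")
fa.load(2, 3, "B")   # frames 0 and 1 keep running task A undisturbed
fa.load(0, 2, "A")   # already loaded in place, so nothing is rewritten
print(fa.config, fa.reconfig_count)
```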
Partitioning and Scheduling
Partitioning: coarse-grained (task level) or fine-grained (basic-block level)
Scheduling: static (at design time) or dynamic (at run time)
Challenges of Using DRL
1. Reconfiguration management
Goal: minimize the number of reconfigurations
Reconfiguration delays execution; reconfiguration consumes power
How? Task ordering; pre-fetching
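The effect of task ordering on the number of reconfigurations can be shown with a short sketch. It assumes a single reconfigurable region that holds one configuration at a time and tasks with no precedence constraints; the task and configuration names are hypothetical.

```python
def count_reconfigurations(order, config_of):
    """Count configuration switches when tasks run in `order` on one region."""
    loaded = None
    switches = 0
    for task in order:
        needed = config_of[task]
        if needed != loaded:   # a different configuration must be loaded first
            switches += 1
            loaded = needed
    return switches

# Hypothetical mapping from task to the configuration it needs.
config_of = {"t1": "filter", "t2": "crypto", "t3": "filter", "t4": "crypto"}

interleaved = count_reconfigurations(["t1", "t2", "t3", "t4"], config_of)
grouped = count_reconfigurations(["t1", "t3", "t2", "t4"], config_of)
print(interleaved, grouped)
```

Grouping tasks that share a configuration halves the number of reconfigurations in this example; real orderings are further constrained by task dependencies and deadlines.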
Representative Co-synthesis Systems
CORDS (Princeton University)
CRUSADE (Bell Labs)
SLOPES (Princeton University)
NIMBLE Compiler
Recent: run-time scheduling (Juanjo Noguera, Rosa M. Badia)
NIMBLE Compiler
The partitioning algorithm selects which loops to implement in the FPGA and which hardware version of each loop to use in order to achieve the highest application-level performance
SLOPES
Multi-objective: price, power, performance
Genetic algorithm for partitioning and allocation
Scheduling heuristic that takes into account the delay and power overheads of dynamic reconfiguration
Scheduling Issues
Scheduling sequence
Multiple ready tasks may reside in the candidate pool
Tasks have different time, resource, and reconfiguration requirements, and different power consumption
Changing the scheduling order may have a significant impact on scheduling quality
Scheduling Issues
Location assignment policy
There are several possible positions in the FPGA where the circuit implementing the task can be located
The chosen location influences not only the current task but may also affect tasks scheduled before or after it
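Enumerating the possible positions for a task can be sketched as follows. The sketch assumes that a task occupies a run of consecutive frames and that a free frame is marked `None`; both are illustrative assumptions, and the function name is hypothetical.

```python
def candidate_positions(frames, width):
    """Return every start index where `width` consecutive frames are free (None)."""
    return [
        start
        for start in range(len(frames) - width + 1)
        if all(f is None for f in frames[start:start + width])
    ]

# Hypothetical occupancy: frames 0 and 4 are in use by tasks A and B.
frames = ["A", None, None, None, "B", None, None]

print(candidate_positions(frames, 2))
print(candidate_positions(frames, 3))
```

Here a two-frame task has three possible positions, but placing it at index 1 or 2 breaks up the only gap that could hold a later three-frame task, which is why the choice of location also affects other tasks.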
SLOPES Scheduling
Scheduling sequence: the order in which tasks are scheduled is determined dynamically by task priorities
Location assignment policy: the global reconfiguration information for all the tasks assigned to the FPGA is considered
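Dynamic priority ordering can be illustrated with a small list scheduler. This is a sketch under loud assumptions and is not the SLOPES heuristic itself: the priority used here is plain slack (deadline minus projected finish time), tasks are independent, there is one reconfigurable region, and every reconfiguration costs the same fixed delay.

```python
def schedule(tasks, reconfig_delay):
    """List-schedule independent tasks on one reconfigurable region.

    tasks: dict name -> (exec_time, deadline, configuration)
    Returns a list of (name, start_time) in execution order.
    """
    pool = dict(tasks)                 # candidate pool of ready tasks
    now, loaded, order = 0, None, []
    while pool:
        def slack(name):
            exec_time, deadline, cfg = pool[name]
            overhead = 0 if cfg == loaded else reconfig_delay
            return deadline - (now + overhead + exec_time)

        # Priorities are recomputed on every pass because `now` and `loaded` change.
        name = min(pool, key=lambda n: (slack(n), n))
        exec_time, _, cfg = pool.pop(name)
        if cfg != loaded:              # pay the reconfiguration delay
            now += reconfig_delay
            loaded = cfg
        order.append((name, now))
        now += exec_time
    return order

# Hypothetical tasks: (execution time, deadline, configuration).
tasks = {"a": (4, 20, "X"), "b": (3, 12, "Y"), "c": (2, 14, "X")}
result = schedule(tasks, reconfig_delay=2)
print(result)
```

Because the reconfiguration overhead depends on which configuration is currently loaded, a task's priority can change from one scheduling step to the next, which is the point of assigning priorities dynamically.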
Scheduling Sequence Policy
Dynamic Priority Assignment
[Equation: priority of task_i, defined from latest_finish_time, exec_time, and reconfig_overhead of task_i, with a reconfiguration term between task_i and task_j]
Location Assignment Policy
Reconfiguration prefetch
Configuration pattern reutilization
Eviction candidate
Fitting policy
Slack time utilization
[Equations: tolerate_start_time of task_j, defined from start_time and slack of task_j and the depth of the sub-graph; eviction_cost, defined from recurrent_freq of frame_i for i from start_frame to end_frame]
Location Assignment Policy
Frame Priorities
[Equations: ts = start_time modulo hyperperiod; tr = ready_time modulo hyperperiod; frame priority P of frame_i, defined piecewise from ts, tr, and hyperperiod, with frame and task subscripts]
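The frame priorities work with times taken modulo the hyperperiod. The sketch below only shows that wrapping step. It assumes the usual definition of the hyperperiod for periodic task sets, the least common multiple of the task periods, which the slide does not state; the periods and times are hypothetical, and the priority formula itself is not reproduced.

```python
from functools import reduce
from math import gcd

def hyperperiod(periods):
    """Least common multiple of the task periods (assumed definition)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

def wrap(time, hp):
    """Map an absolute time onto its position inside one hyperperiod."""
    return time % hp

hp = hyperperiod([10, 15, 6])   # hypothetical task periods
ts = wrap(47, hp)               # hypothetical start time
tr = wrap(65, hp)               # hypothetical ready time
print(hp, ts, tr)
```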