rcw@dei - real needs and limits
POLITECNICO DI MILANO
Real Needs and Limits
- welcome to the real world of Reconfigurable Computing -
DRESD Team
Partial Dynamic Reconfiguration Workshop
Motivations
Reconfigurable systems provide interesting new features for hardware/software co-design and, more generally, for embedded system design, but they also introduce new problems in implementation and management. This is particularly true for systems that implement self partial reconfiguration, such as Xilinx platforms.
This talk presents the different scenarios (e.g. flexibility, lack of resources…) where reconfiguration can be effective, and also shows the drawbacks introduced by this new feature. We will show that there are two different kinds of limits, theoretical and physical, and try to highlight possible solutions to both.
Outline
Some real needs
Limits and drawbacks
What’s next
Some real needs
  Behavioral and structural flexibility
  Performance enhancement
  Fault tolerance
Limits and drawbacks
Behavioral and Structural flexibility
Speed up the overall computation of the final system
Increasing need for behavioral flexibility in embedded systems design
  Support of new standards, e.g. in media processing
  Addition of new features
New applications too large to fit on the device all at once
HW vs SW
[Figure: two Time/Area plots of the feasible solution space, with labels Aw, Shw, Sw, Ao, Sho, So, Shde and Ade. The first plot states the problem: Ad < Ade. The second plot adds the labels Ad, Avd, Shvd and Svd.]
Digital Image Processing
The Canny edge detector is used to detect the edges in a given input image of i [Kb]
4 functionalities
  Image smoothing: remove the noise
  Gradient operator: highlight regions with high spatial derivative
  Non-maximum suppression: reveal the edges
  Hysteresis: remove false edges
Each functionality has to be executed using an input of j [Kb]
  j ≤ i and x = i/j
Time analysis to identify a first partition into HW cores and SW cores
  Non-maximum suppression implemented as a SW core
  Image smoothing, Gradient operator and Hysteresis implemented as HW cores
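The partitioning step above can be illustrated with a small sketch. The slide does not report the measured timings, so the per-block figures in `STAGE_TIMES_MS`, the helper names `partition` and `total_time_ms`, and the "largest gain goes to hardware" rule are all assumptions made for illustration; only the stage names and the relation x = i/j come from the slide.

```python
# Hypothetical per-block execution times in ms: (software, hardware).
# These numbers are NOT from the slides; they only illustrate the method.
STAGE_TIMES_MS = {
    "image_smoothing": (40.0, 4.0),
    "gradient_operator": (35.0, 5.0),
    "non_maximum_suppression": (6.0, 5.0),
    "hysteresis": (30.0, 6.0),
}


def partition(stage_times, hw_cores):
    """Map the `hw_cores` stages with the largest SW-minus-HW gain to hardware."""
    by_gain = sorted(
        stage_times,
        key=lambda stage: stage_times[stage][0] - stage_times[stage][1],
        reverse=True,
    )
    in_hw = set(by_gain[:hw_cores])
    return {stage: ("HW" if stage in in_hw else "SW") for stage in stage_times}


def total_time_ms(stage_times, mapping, i_kb, j_kb):
    """Time to process an image of i_kb in blocks of j_kb, i.e. x = i/j blocks."""
    if j_kb <= 0 or j_kb > i_kb or i_kb % j_kb != 0:
        raise ValueError("expected 0 < j <= i with i a multiple of j")
    x = i_kb // j_kb
    per_block = sum(
        sw if mapping[stage] == "SW" else hw
        for stage, (sw, hw) in stage_times.items()
    )
    return x * per_block


mapping = partition(STAGE_TIMES_MS, hw_cores=3)
print(mapping)
print(total_time_ms(STAGE_TIMES_MS, mapping, i_kb=64, j_kb=16))
```

With these made-up timings, non-maximum suppression gains almost nothing from hardware and stays in software, which matches the partition listed on the slide.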
DIP: Partial Reconfiguration
Static side and IP-Cores resources requirement analysis
Time analysis, resources requirement and reconfigurability evaluation
  Hysteresis implemented as a SW core
  Image smoothing and Gradient operator implemented as reconfigurable cores
Reconfiguration time (fa into fb): 368 ms
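To see why a 368 ms reconfiguration matters, the sketch below compares two ways of running two cores, fa and fb, that share one reconfigurable region over x blocks. The per-block execution times are hypothetical, and the sketch assumes that reconfiguring fb into fa costs the same as fa into fb and that fa is already loaded at the start; neither is stated on the slide.

```python
RECONF_MS = 368.0  # reconfiguration time reported on the slide (fa into fb)


def interleaved_ms(x, t_fa, t_fb, reconf=RECONF_MS):
    """Run fa then fb on every block, swapping the region at each core change.

    With fa preloaded this needs 2x - 1 reconfigurations.
    """
    return x * (t_fa + t_fb) + (2 * x - 1) * reconf


def batched_ms(x, t_fa, t_fb, reconf=RECONF_MS):
    """Run fa on all blocks, reconfigure once, then run fb on all blocks.

    This needs memory to buffer the intermediate results of fa.
    """
    return x * (t_fa + t_fb) + reconf


# Hypothetical per-block execution times (ms) for x = 4 blocks.
print(interleaved_ms(4, t_fa=4.0, t_fb=5.0))
print(batched_ms(4, t_fa=4.0, t_fb=5.0))
```

In both orderings the computation itself takes the same time; the difference comes entirely from how many times the region is reconfigured.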
Damage/Reliability
SRAM-based FPGAs are particularly sensitive to radiation effects, not only in critical environments but also at terrestrial level
  alpha particles hitting devices cause temporary and permanent faults
  temporary faults can be modeled as
    Modification of the data being processed: user-memory corruption
    Modification of the functionality being performed: configuration-memory corruption
Embedded systems implemented on FPGAs need “robustness” to radiation, achieved by means of
  by-design fault tolerance
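The two temporary-fault models can be made concrete with a toy example. The sketch below treats a 2-input look-up table as a 4-bit truth table standing in for configuration memory, and a plain integer as user data; `lut_eval`, `flip_bit`, `truth_table` and `AND_LUT` are illustrative names, not part of any vendor tool.

```python
def lut_eval(config_bits, a, b):
    """Evaluate a 2-input LUT; bit (a << 1) | b of config_bits is the output."""
    return (config_bits >> ((a << 1) | b)) & 1


def truth_table(config_bits):
    """Outputs for inputs (0,0), (0,1), (1,0), (1,1), in that order."""
    return [lut_eval(config_bits, a, b) for a in (0, 1) for b in (0, 1)]


def flip_bit(word, position):
    """Model a single-event upset: invert one bit of a stored word."""
    return word ^ (1 << position)


AND_LUT = 0b1000  # output is 1 only for a = 1, b = 1

# User-memory corruption: the data changes, the function does not.
data = 0b1011
corrupted_data = flip_bit(data, 2)

# Configuration-memory corruption: the function itself changes.
corrupted_lut = flip_bit(AND_LUT, 0)

print(truth_table(AND_LUT))        # the original AND behaviour
print(truth_table(corrupted_lut))  # a different logic function
print(bin(corrupted_data))
```

A corrupted data word is overwritten by the next computation, whereas a corrupted configuration bit keeps producing wrong results until the configuration is rewritten.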
DRESD activities - Damage/Reliability
Designing reliable systems implemented on FPGAs, able to cope with the effects of faults caused by radiation
Applying already known and well-studied detection and recovery techniques to the particular FPGA scenario
Exploiting dynamic partial reconfiguration to trigger the reconfiguration of the affected portion of the architecture
  … while the rest of the system is still working
  … without needing to entirely reprogram the system
Enabling the assessment of reliable system properties by means of fault injection and simulation
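As one example of a well-known detection and recovery technique, the sketch below uses triple modular redundancy with a majority voter. The slides do not say which techniques DRESD applies, so TMR is chosen here purely for illustration. The replicas are toy 2-input LUT configurations, and `recover` is only a stand-in for partial reconfiguration: it rewrites the faulty replica and leaves the others untouched.

```python
from collections import Counter

GOLDEN = 0b1000  # toy 2-input AND truth table used by every replica


def run(configs, a, b):
    """Evaluate each replica (a 4-bit LUT truth table) on inputs a, b."""
    return [(cfg >> ((a << 1) | b)) & 1 for cfg in configs]


def vote(outputs):
    """Majority vote; return (voted value, indices of disagreeing replicas)."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among replicas")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


def recover(configs, faulty, golden=GOLDEN):
    """Stand-in for partial reconfiguration: rewrite only the faulty replicas."""
    for i in faulty:
        configs[i] = golden
    return configs


replicas = [GOLDEN, GOLDEN, GOLDEN]
replicas[1] ^= 1 << 3                 # fault injection: flip one config bit
value, faulty = vote(run(replicas, 1, 1))
print(value, faulty)
recover(replicas, faulty)
print(replicas == [GOLDEN] * 3)
```

The same loop doubles as a minimal fault-injection experiment: flip a bit, run the voter, and check that the fault is detected and repaired while the voted output stays correct.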
What’s next
Some real needs
Limits and drawbacks
  Simulation and verification
  Design flow support
  Reconfiguration time overhead
Limits and Drawbacks
Simulation and Verification
Design flow: the need for a comprehensive framework that can guide designers through the whole implementation process is becoming stronger
Reconfiguration times impact heavily on the final solution’s latency
Simulation and Verification
A new way of understanding simulation
  Simulation used to explore the design space to find the best architectural solution
Support for HW/SW co-design solutions, but no standard way to verify the overall (reconfigurable) design
Unfocused tools for the verification of all the reconfiguration-related aspects
  Xilinx ChipScope
  JBits (no longer supported)
Design flow
Dynamically reconfigurable embedded systems are attracting increasing interest from both the scientific and the industrial world
The need for a comprehensive framework that can guide designers through the whole implementation process is becoming stronger
There are several techniques to exploit partial reconfiguration, but...
Few approaches for frameworks and tools to design dynamically reconfigurable systems
  They do not take into consideration both the HW and the SW side of the final architecture
  They are not able to support different devices
  They cannot be used to design systems for different architectural solutions
DRESD challenges
Design of high-performance adaptive embedded systems
An adaptive system is a system that is able to adapt its behavior according to changes in its environment or in parts of the system itself by reconfiguring the existing design to counteract faults or a changed operational environment
Where DRESD comes in:
Define reconfigurable hardware and software design methodologies that exploit existing devices: multicores and FPGAs
  Design specification techniques and validation methodologies
  Design automation flows and tools to generate hardware and software components and runtime support
  Dynamically reconfigurable hardware and software architectures
Reconfiguration challenges
Reconfiguration times heavily impact on the final solution’s latency
Hiding reconfiguration time is not sufficient
Possible solutions:
  Trivial
    Bitstream dimension reduction
  Complex
    Maximize the reuse of configured modules
    Reconfiguration hiding
    Alternative implementation (SW execution)
    Relocation
Task reuse
Reconfiguration times impact heavily on the final solution’s latency, therefore:
  Do not only try to hide the reconfigurations
  But try to maximize the reuse of reconfigurable modules
Schedule length is on average at least 18.6% better than the shortest one and 19.7% better than the average.
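The benefit of reuse can be shown by counting reconfigurations for a sequence of module requests. The sketch below is a simplified model, not the scheduler behind the reported figures: it assumes every module fits in any region, and it evicts the least recently used module when all regions are full. The function name and the eviction policy are illustrative choices.

```python
from collections import OrderedDict


def count_reconfigurations(task_sequence, regions, reuse=True):
    """Count reconfigurations needed to serve the requests in order.

    Without reuse, every request reloads its module. With reuse, a module
    that is still configured in some region is used as is.
    """
    if not reuse:
        return len(task_sequence)
    loaded = OrderedDict()  # configured modules, least recently used first
    reconfigurations = 0
    for module in task_sequence:
        if module in loaded:
            loaded.move_to_end(module)  # reused: no reconfiguration needed
            continue
        reconfigurations += 1
        if len(loaded) >= regions:
            loaded.popitem(last=False)  # evict the least recently used module
        loaded[module] = None
    return reconfigurations


requests = ["A", "B", "A", "C", "A", "B"]
print(count_reconfigurations(requests, regions=2, reuse=False))
print(count_reconfigurations(requests, regions=2, reuse=True))
```

Each avoided reconfiguration removes a full reconfiguration time from the schedule, which hiding alone cannot always achieve.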
Reconfiguration hiding
[Figure: Area/Time schedules of tasks A to F with their reconfigurations (Reconf), shown with and without reconfiguration hiding. Legend (Area/Time): A 2/1, B 1/2, C 2/2, D 1/1, E 1/1, F 2/2.]
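The idea behind reconfiguration hiding can be sketched with a simplified timing model. It assumes a chain of dependent tasks, two reconfigurable regions used alternately, a single configuration port, and one fixed reconfiguration time; these assumptions and the function name are illustrative and do not reproduce the exact schedule in the figure.

```python
def schedule_length(exec_times, reconf, hide):
    """Length of a schedule for a chain of tasks, each needing its own module.

    hide=False: every task waits for its own reconfiguration.
    hide=True: the next module is loaded into the other region while the
    current task runs.
    """
    if not exec_times:
        return 0.0
    if not hide:
        return float(sum(reconf + e for e in exec_times))
    port_free = 0.0   # time at which the configuration port is free again
    finish = []       # finish time of each task
    for k, e in enumerate(exec_times):
        # The region used by task k was last used by task k - 2.
        region_free = finish[k - 2] if k >= 2 else 0.0
        reconf_done = max(port_free, region_free) + reconf
        port_free = reconf_done
        previous_done = finish[k - 1] if k >= 1 else 0.0
        finish.append(max(reconf_done, previous_done) + e)
    return finish[-1]


# Execution longer than reconfiguration: all but the first load is hidden.
print(schedule_length([4, 4, 4], reconf=2, hide=False))
print(schedule_length([4, 4, 4], reconf=2, hide=True))
# Reconfiguration longer than execution: hiding helps only marginally.
print(schedule_length([1, 1, 1], reconf=3, hide=False))
print(schedule_length([1, 1, 1], reconf=3, hide=True))
```

The second pair of calls shows the case where reconfiguration dominates execution, so most of the overhead remains visible even with hiding.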
Alternative implementation (SW execution)
Object code implemented as hardware components does not always guarantee the best performance...
Cryptography architecture
  1 GPP running Linux
  2 reconfigurable regions
  2 cryptography services (AES and DES)
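A runtime decision between the hardware and the software version of a service can be sketched as a simple cost comparison. All timings below are hypothetical, and `choose_implementation` is an illustrative helper, not part of the DRESD runtime: the hardware cost includes the reconfiguration time whenever the requested core is not already configured.

```python
def choose_implementation(service, loaded, hw_ms, sw_ms, reconf_ms):
    """Return ("HW" or "SW", estimated latency in ms) for one request."""
    hw_cost = hw_ms[service] + (0.0 if service in loaded else reconf_ms)
    sw_cost = sw_ms[service]
    return ("HW", hw_cost) if hw_cost <= sw_cost else ("SW", sw_cost)


# Hypothetical latencies for one request, in ms (not taken from the slides).
HW_MS = {"AES": 2.0, "DES": 3.0}
SW_MS = {"AES": 50.0, "DES": 60.0}
RECONF_MS = 100.0

loaded = {"AES"}  # cores currently configured on the FPGA
print(choose_implementation("AES", loaded, HW_MS, SW_MS, RECONF_MS))
print(choose_implementation("DES", loaded, HW_MS, SW_MS, RECONF_MS))
```

With these numbers a single DES request is served faster in software on the GPP, because loading the DES core would cost more than the software execution it replaces.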
Relocation: The Problem
[Figure: people demanding functionalities from a set of available functionalities, with the RFU implementations of A to F placed on an FPGA with three reconfigurable regions (RR1, RR2, RR3). Legend (Fi: Area/Time): A 2/1, B 1/2, C 2/2, D 1/1, E 1/1, F 2/2.]
Relocation: Scenario
[Figure: a possible scenario. Area/Time schedule labelled A, B, Rec. F, F, Rec. E, E, Rec. C, C, Rec. D, D, with the RFU implementations of A to F on RR1, RR2 and RR3. Legend (Fi: Area/Time): A 2/1, B 1/2, C 2/2, D 1/1, E 1/1, F 2/2.]
Relocation: Motivation
[Figure: a possible scenario. Two Area/Time schedules, one labelled A, B, Rec. C, C, Rec. F, F, Rec. E, E, Rec. D, D and one labelled A, B, Rec. C, C, R2 F, F, R2 E, E, R2 D, D, shown with two sets of RFU implementations of A to F on RR1, RR2 and RR3. Legend (Fi: Area/Time): A 2/1, B 1/2, C 2/2, D 1/1, E 1/1, F 2/2.]
Relocation: Rationale
Bitstream relocation technique to:
  speed up the overall system execution
  reduce the amount of memory used to store partial bitstreams
  achieve preemptive execution of cores
  assign bitstream placement at runtime

Slots  Modules  Bitstreams  Bitstreams with reloc.  % Memory saving
2      5        12          6                       50.0%
3      8        27          9                       75.0%
5      10       55          11                      80.0%
8      16       136         17                      87.5%
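The bitstream counts in the table follow a simple pattern, which the sketch below reproduces. The formula is inferred from the numbers and is not stated on the slide: each slot appears to need one bitstream per module plus one extra (possibly a blank configuration), while with relocation a single copy per module plus the extra one is enough. The saving is computed here on bitstream counts only.

```python
def bitstream_counts(slots, modules):
    """Return (bitstreams without relocation, with relocation, % saving).

    Assumption inferred from the table: modules + 1 bitstreams per slot.
    """
    per_slot = modules + 1
    without_reloc = slots * per_slot
    with_reloc = per_slot
    saving = 100 * (without_reloc - with_reloc) / without_reloc
    return without_reloc, with_reloc, saving


for slots, modules in [(2, 5), (3, 8), (5, 10), (8, 16)]:
    print(slots, modules, bitstream_counts(slots, modules))
```

Under this count-based reading the saving equals 1 - 1/slots. That matches the 50.0%, 80.0% and 87.5% rows, while the 3-slot row would come out at about 66.7% rather than the 75.0% reported on the slide; the slide may measure the saving differently for that row.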
Relocation: Virtual homogeneity
DRESD - Relocation management
Create an integrated HW/SW system to manage relocation (1D and 2D) in reconfigurable architectures
  Maintain information on FPGA status
  Decide how to efficiently allocate tasks
  Provide support for effective task allocation
  Perform bitstream relocation
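The bookkeeping side of such a manager can be sketched for the 1D case. The class below is a hypothetical illustration, not the DRESD implementation: it tracks which module occupies each slot and places a task in the first run of adjacent free slots that is wide enough. The returned start index is where the module's bitstream would be relocated; the rewriting of the bitstream itself is device-specific and is not modelled.

```python
class RelocationManager:
    """Toy 1D placement manager over equally sized slots (illustrative only)."""

    def __init__(self, num_slots):
        # FPGA status: module name occupying each slot, or None if free.
        self.slots = [None] * num_slots

    def allocate(self, module, width):
        """First fit: claim `width` adjacent free slots, return the start index.

        Returns None when no sufficiently wide free run exists.
        """
        if width < 1:
            raise ValueError("width must be at least 1")
        run = 0
        for i, occupant in enumerate(self.slots):
            run = run + 1 if occupant is None else 0
            if run == width:
                start = i - width + 1
                self.slots[start:i + 1] = [module] * width
                return start
        return None

    def release(self, module):
        """Free every slot currently occupied by `module`."""
        self.slots = [None if m == module else m for m in self.slots]


fpga = RelocationManager(num_slots=3)
print(fpga.allocate("A", width=2))  # A has area 2 in the slides' legend
print(fpga.allocate("D", width=1))
print(fpga.allocate("B", width=1))  # no free slot left
fpga.release("A")
print(fpga.allocate("C", width=2))
print(fpga.slots)
```

First fit is the simplest placement policy; a real manager would also have to decide which module to evict when a request cannot be placed.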
Questions