Bassam Tabbara 10/19/99
Optimizations for HW and SW Synthesis
SRC Graduate Fellow: SRC DC-324-028

Data Flow and Control Optimizations for Hardware and Software Co-Synthesis in Embedded Systems
Bassam Tabbara, Alberto Sangiovanni-Vincentelli
University of California at Berkeley
Embedded System
• Electronic “brain” found in many applications, e.g.:
  – Consumer electronics
  – Telecommunications
• Consists of:
  – Software: flexibility
  – Hardware: performance
• Application requirements on the system:
  – Small
  – Efficient
  – Power
  – Other metrics
Hardware/Software Co-design
[Figure: hardware/software co-design flow. Labels: Design Specification, Design Representation, HW/SW Partitioning, Evaluation, Synthesis, Implementation; SW runs on a micro-processor and HW is implemented as an ASIC.]
Hardware/Software Co-Synthesis
• Data-dominated applications
  – Focus: data processing (e.g., digital video TV)
  – Model: data-based (e.g., data dependency graph)
  – Optimization: data flow
  – Representative: VULCAN [Gupta, 95]; COSYMA [Ernst, 93]
• Control-dominated applications
  – Focus: reactive controllers (e.g., car brake controller)
  – Model: control-based (e.g., FSM)
  – Optimization: control
  – Representative: POLIS [Chiodo, 94]; CHINOOK [Borriello, 95]
Typical control-dominated applications are not purely control; they have data computations as well ...
Challenges in Control-dominated Co-Synthesis
• Experience and feedback from POLIS users (automotive, telecommunications, ...):
  – Cannot just focus on improved productivity
  – Applications with data computations result in poor synthesized output quality: large and inefficient
• Co-design environments that focus on small reactive controllers suffer from this as well (TOSCA [Borriello 95], SCENIC [Gupta 97], …)
Research Opportunity
• Incorporate data flow optimization into control-dominated co-synthesis flows
  – Perform it in a manner unbiased towards HW or SW
• Reduce control (e.g., y = 1; if (y == 1) … else …)
• Improve the quality of the synthesized output:
  – Small (code size of software, area of hardware)
  – Efficient (performance)
  – Power, domain-specific metrics
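The "reduce control" item can be illustrated with a minimal Python sketch of constant propagation plus branch folding inside one straight-line block. The tuple encoding of statements and the name `fold_branches` are hypothetical (not CLIF or POLIS), and the sketch assumes no other jump enters the block, so facts learned at the top stay valid.

```python
# Hypothetical mini-IR (not CLIF): ("assign", dest, value) where value is an
# int or a variable name; ("if", var, const, then_label, else_label) meaning
# if (var == const) goto then_label else goto else_label.
# Assumes a single straight-line block with no incoming jumps.


def fold_branches(stmts):
    """Propagate constants forward and turn decidable TESTs into gotos."""
    known = {}  # variable -> constant value known at this point
    out = []
    for s in stmts:
        if s[0] == "assign":
            _, dest, value = s
            if isinstance(value, int):
                known[dest] = value
            elif value in known:
                known[dest] = known[value]  # copy of a known constant
            else:
                known.pop(dest, None)  # dest is no longer a known constant
            out.append(s)
        elif s[0] == "if" and s[1] in known:
            _, var, const, then_lbl, else_lbl = s
            # The TEST outcome is fixed at synthesis time, so drop the TEST.
            out.append(("goto", then_lbl if known[var] == const else else_lbl))
        else:
            out.append(s)
    return out


print(fold_branches([("assign", "y", 1), ("if", "y", 1, "THEN", "ELSE")]))
# [('assign', 'y', 1), ('goto', 'THEN')]
```

Once the TEST is gone, y = 1 may itself become dead and can then be removed by dead operation elimination.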
Overview
• Problem Statement: Current methodologies for
designing control-dominated hardware-software
systems suffer from inefficient hardware and
software synthesis
• Research Objective: Develop a methodology that
incorporates data flow in addition to control
optimizations in a hardware and software co-design
environment in order to improve synthesis quality
Assumptions
• Target: heterogeneous control-dominated embedded system applications
  – Functional decomposition captures the design as a network of Finite State Machines extended with data computations (EFSMs)
• Focus: representation, optimization, and synthesis of each individual task
• No assumptions on how tasks are composed in the whole system
Background
Reactive System Co-synthesis
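A CDFG is suitable for describing EFSM reactive behavior, but:
  ✗ Some of the control flow is hidden (state transitions become assignments to a state variable)
  ✗ Data cannot be propagated (a := 5 and a := a + 1 end up in separate case branches, so the value 5 is not visible where a + 1 is computed)
[Figure: mapping an EFSM to a CDFG. EFSM: S0 (a := 5) → S1 (a := a + 1) → S2, emitting a. CDFG: BEGIN → Case (state) → one branch per state (S0: a := 5; state := S1 | S1: a := a + 1; state := S2; emit(a)) → END.]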
Data Flow Optimization
[Figure: EFSM representation (S0: a := 5, S1: a := a + 1, S2, output a) and the optimized EFSM representation, in which a := a + 1 is replaced by a := 6.]
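The a := 6 rewrite amounts to constant propagation along EFSM transitions: S1's only predecessor is S0, where a is 5. The Python sketch below assumes a hypothetical encoding with a single variable a and at most one action per state, and uses the standard constant-propagation lattice (unknown, a constant, or not-a-constant) with a meet over predecessor states. It illustrates the idea; it is not the POLIS implementation.

```python
# Hypothetical encoding of the EFSM on this slide: ("set", k) means a := k,
# ("add", k) means a := a + k, and None means the state does not assign a.
EFSM = {"S0": ("set", 5), "S1": ("add", 1), "S2": None}
EDGES = [("S0", "S1"), ("S1", "S2")]
NAC = "NAC"  # not-a-constant: bottom of the constant-propagation lattice


def meet(x, y):
    """Meet of the constant lattice, with None as top (no information yet)."""
    if x is None:
        return y
    if y is None:
        return x
    return x if x == y else NAC


def transfer(action, a_in):
    """Value of a after a state's action, given its value on entry."""
    if action is None:
        return a_in
    kind, k = action
    if kind == "set":
        return k
    # a := a + k is constant only if a is constant on entry
    return a_in + k if isinstance(a_in, int) else a_in


def propagate(efsm, edges, entry):
    """Round-robin iteration to a fixpoint; returns the value of a leaving each state."""
    preds = {s: [p for p, q in edges if q == s] for s in efsm}
    out = {s: None for s in efsm}
    changed = True
    while changed:
        changed = False
        for s in efsm:
            a_in = NAC if s == entry else None  # a is unknown at start-up
            for p in preds[s]:
                a_in = meet(a_in, out[p])
            new = transfer(efsm[s], a_in)
            if new != out[s]:
                out[s], changed = new, True
    return out


print(propagate(EFSM, EDGES, "S0"))  # {'S0': 5, 'S1': 6, 'S2': 6}
```

If S1 could also be re-entered from itself, the meet of 5 and 6 would be NAC and the rewrite to a := 6 would be blocked.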
Suitable Design Representation
Intermediate Design Representation
• Develop the Function Flow Graph (FFG) / C-Like Intermediate Format (CLIF)
  – Able to represent EFSMs
  – Suitable for data flow analysis
[Figure: EFSM → FFG → (data flow/control optimizations) → optimized FFG → CDFG]
Function Flow Graph (FFG)
• An FFG is a triple G = (V, E, N0) where
  – V is a finite set of nodes
  – E ⊆ V × V is the set of edges; an edge (x, y) goes from x to y, where x ∈ Pred(y), the set of predecessor nodes of y
  – N0 ∈ V is the start node, corresponding to the EFSM initial state
  – An unscheduled sequence of operations is associated with each node N
  – Operations consist of TESTs performed on the EFSM inputs and internal variables, and ASSIGNs on the EFSM outputs and internal variables
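The triple maps directly onto a small data structure. This Python sketch is illustrative only: the class name `FFG`, the `(kind, text)` operation pairs, and the example graph (the S0/S1/S2 EFSM of the "Reactive System Co-synthesis" slide) are assumptions, not the authors' data structures.

```python
from dataclasses import dataclass

# Operations are (kind, text) pairs, kind being "TEST" or "ASSIGN";
# the list order is not a schedule.
Op = tuple[str, str]


@dataclass
class FFG:
    """Function Flow Graph G = (V, E, N0) with an operation list per node."""
    nodes: dict[str, list[Op]]   # V, plus the operations of each node
    edges: set[tuple[str, str]]  # E, a subset of V x V
    start: str                   # N0, the EFSM initial state

    def __post_init__(self):
        if self.start not in self.nodes:
            raise ValueError("N0 must belong to V")
        for x, y in self.edges:
            if x not in self.nodes or y not in self.nodes:
                raise ValueError(f"edge ({x}, {y}) is not in V x V")

    def pred(self, y: str) -> set[str]:
        """Pred(y) = {x | (x, y) in E}."""
        return {x for x, z in self.edges if z == y}


g = FFG(
    nodes={"S0": [("ASSIGN", "a = 5")],
           "S1": [("ASSIGN", "a = a + 1")],
           "S2": []},
    edges={("S0", "S1"), ("S1", "S2")},
    start="S0",
)
print(g.pred("S1"))  # {'S0'}
```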
FFG / CLIF Example
[Figure: Function Flow Graph for state S1 with operations x = x + y; x = x + y; a = b + c; a = x; cond1 = (y == cst1); cond2 = !cond1, an annotation y = 1, and two outgoing edges: (cond2 == 0) / output(a) and (cond2 == 1) / output(b). Legend: constant, output flow, dead operation; S# = state, S#L# = label in state S#.]

CLIF textual representation:
    S1:   x = x + y;
          x = x + y;
          a = b + c;
          a = x;
          cond1 = (y == cst1);
          cond2 = !cond1;
          if (cond2) goto S1L0;
          output = a;
          goto S1;  /* Loop */
    S1L0: output = b;
          goto S1;
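The "dead operation" in this example can be found with a backward liveness pass. The Python sketch below covers the S1 operations up to the TEST; the `(text, dest, uses)` encoding and the live-out set are assumptions worked out by hand from the two branches (output = a / output = b) and the loop back to S1.

```python
# Each operation of the S1 block up to its TEST: (text, dest, uses).
# dest is None for the TEST, which only reads its condition.
S1_BLOCK = [
    ("x = x + y", "x", ["x", "y"]),
    ("x = x + y", "x", ["x", "y"]),
    ("a = b + c", "a", ["b", "c"]),
    ("a = x", "a", ["x"]),
    ("cond1 = (y == cst1)", "cond1", ["y", "cst1"]),
    ("cond2 = !cond1", "cond2", ["cond1"]),
    ("if (cond2) goto S1L0", None, ["cond2"]),
]

# Assumed live-out: output = a / output = b read a and b, and the loop back
# to S1 reads x, y, b, c and cst1 before redefining them.
LIVE_OUT = {"a", "b", "c", "x", "y", "cst1"}


def eliminate_dead(ops, live_out):
    """Backward liveness over one block; drop ASSIGNs whose result is never read.
    Safe for CLIF because operations have no side effects (no aliasing)."""
    live = set(live_out)
    kept = []
    for text, dest, uses in reversed(ops):
        if dest is not None and dest not in live:
            continue  # dead ASSIGN: overwritten or unused before any read
        if dest is not None:
            live.discard(dest)
        live.update(uses)
        kept.append((text, dest, uses))
    kept.reverse()
    return kept


for text, _, _ in eliminate_dead(S1_BLOCK, LIVE_OUT):
    print(text)
```

Every operation survives except a = b + c, whose value is overwritten by a = x before any read.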
C-Like Intermediate Format (CLIF)
• Import/export of the Function Flow Graph (FFG)
• Unscheduled sequence of TEST and ASSIGN operations:
  – [if (condition)] goto label
  – dest = op(src), with op ∈ {not, minus, …}
  – dest = src1 op src2, with op ∈ {+, *, /, ||, &&, |, &, …}
• No aliasing (no side effects)
• Loops are present
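The three statement shapes are regular enough to recognize with patterns. This Python sketch is a hypothetical classifier, not the actual CLIF reader; it only covers the shapes listed above (named unary operators, binary operators, optional-condition goto).

```python
import re

# Hypothetical patterns for the three CLIF statement shapes listed above.
TEST_RE = re.compile(r"^(?:if\s*\((?P<cond>[^)]*)\)\s*)?goto\s+(?P<label>\w+)\s*;?$")
UNARY_RE = re.compile(r"^(?P<dest>\w+)\s*=\s*(?P<op>\w+)\((?P<src>\w+)\)\s*;?$")
BINARY_RE = re.compile(
    r"^(?P<dest>\w+)\s*=\s*(?P<src1>\w+)\s*(?P<op>\|\||&&|[+*/|&-])\s*(?P<src2>\w+)\s*;?$"
)


def classify(stmt):
    """Return ("TEST" | "ASSIGN", fields) for one CLIF statement."""
    stmt = stmt.strip()
    for kind, pattern in (("TEST", TEST_RE), ("ASSIGN", UNARY_RE), ("ASSIGN", BINARY_RE)):
        m = pattern.match(stmt)
        if m:
            return kind, m.groupdict()
    raise ValueError(f"not a recognized CLIF statement: {stmt!r}")


for s in ["if (cond2) goto S1L0;", "goto S1;", "t = minus(b);", "x = x + y;"]:
    print(classify(s))
```

Symbolic unary operators such as the `!` in the FFG example would need one more pattern.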
Data Flow and Control Optimization
Optimizing FFG / CLIF
Optimization Approach
• Develop an optimizer for the FFG (CLIF) intermediate design representation
• Goal: optimize for speed and size by reducing
  – ASSIGN operations
  – TEST operations
  – variables
• Reach the goal by solving a sequence of data flow problems for analysis and information gathering, using an underlying data flow analysis framework
Sample Data Flow Problem: Available Expressions Example
• Goal is to eliminate re-computations
  – Formulate the Available Expressions (AE) problem
    – Global version of common sub-expression elimination
[Figure: S1 (t := a + 1) and S2 (t1 := a + 1; t2 := b + 2) both flow into S3 (a := a * 5; t3 = a + 2). AE = ∅ on entry to S1 and S2; AE = {a+1} after S1; AE = {a+1, b+2} after S2; AE = {a+1} on entry to S3; AE = {a+2} after S3.]
Data Flow Problem Instance
• A particular (problem) instance of a monotone data flow analysis framework is a pair I = (G, M), where M: V → F is a function that maps each node N in V of the FFG G to a function in F on the node label semilattice L of the framework D.
Data Flow Analysis Framework
• A monotone data flow analysis framework D = (L, ∧, F) is used to manipulate the data flow information by interpreting the node labels on N in V of the FFG G as elements of an algebraic structure, where
  – L is a bounded semilattice with meet ∧, and
  – F is a monotone function space associated with L.
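As an illustration (not taken from the slides), Available Expressions is one instance of this framework. With ℰ the set of expressions occurring in the FFG, and each node described by a Gen set G and a Kill set K:

```latex
\begin{aligned}
L &= 2^{\mathcal{E}}, \qquad x \wedge y = x \cap y, \qquad \top = \mathcal{E}, \qquad \bot = \emptyset \\
F &= \{\, f_{G,K} \mid G, K \subseteq \mathcal{E} \,\}, \qquad f_{G,K}(x) = G \cup (x - K) \\
x \subseteq y &\;\Rightarrow\; f_{G,K}(x) = G \cup (x - K) \subseteq G \cup (y - K) = f_{G,K}(y) \\
f_{G_2,K_2} \circ f_{G_1,K_1} &= f_{G_2 \cup (G_1 - K_2),\; K_1 \cup K_2}, \qquad f_{\emptyset,\emptyset} = \mathrm{id}
\end{aligned}
```

The third line is the monotonicity condition (x ⊆ y is the semilattice order for meet = ∩); the last line shows that F contains the identity and is closed under composition.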
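Solving Data Flow Problems
• Data flow equations (Available Expressions, node S3 of the example):
  $$\mathrm{Out}(S3) = \big(\mathrm{In}(S3) - \mathrm{Kill}(S3)\big) \cup \mathrm{Gen}(S3)$$
  $$\mathrm{In}(S3) = \bigcap_{P \in \{S1,\, S2\}} \mathrm{Out}(P)$$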
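The two equations can be iterated directly on the Available Expressions example. In this Python sketch the `(dest, expression)` encoding of S1, S2 and S3 is a hypothetical transcription of the figure; entry nodes start with nothing available and all other values start at the top of the lattice (every expression).

```python
# The Available Expressions example encoded as (dest, expression) pairs.
NODES = {
    "S1": [("t", "a + 1")],
    "S2": [("t1", "a + 1"), ("t2", "b + 2")],
    "S3": [("a", "a * 5"), ("t3", "a + 2")],
}
PREDS = {"S1": [], "S2": [], "S3": ["S1", "S2"]}
ALL_EXPRS = {e for ops in NODES.values() for _, e in ops}


def operands(expr):
    """Variable names read by an expression such as 'a + 1'."""
    return {tok for tok in expr.split() if tok.isidentifier()}


def gen_kill(ops):
    """Gen and Kill sets of one node, scanning its operations in order."""
    gen, kill = set(), set()
    for dest, expr in ops:
        gen.add(expr)
        killed = {e for e in ALL_EXPRS if dest in operands(e)}
        gen -= killed  # e.g. a := a * 5 kills a + 1, a * 5 and a + 2
        kill |= killed
    return gen, kill


def available_expressions(nodes, preds):
    """Iterate Out(n) = (In(n) - Kill(n)) | Gen(n), In(n) = meet of Out(preds)."""
    gk = {n: gen_kill(ops) for n, ops in nodes.items()}
    out = {n: set(ALL_EXPRS) for n in nodes}  # start from top: everything available
    changed = True
    while changed:
        changed = False
        for n in nodes:
            ins = [out[p] for p in preds[n]]
            in_n = set.intersection(*ins) if ins else set()  # nothing available on entry
            gen, kill = gk[n]
            new = (in_n - kill) | gen
            if new != out[n]:
                out[n], changed = new, True
    return out


for node, avail in available_expressions(NODES, PREDS).items():
    print(node, sorted(avail))
```

Out(S3) = {a + 2} because a := a * 5 kills every expression that reads a, matching the annotations on the example.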
Solving Data Flow Problems
• Solve data flow problems using the iterative method [Kildall, 73] [Kennedy, 76]
  – General: does not depend on the structure of the flow graph (feasibility)
  – Optimal for a class of data flow problems [Kildall, 73] (optimality)
  – Reaches a fixpoint in polynomial time, O(n²) (convergence speed)
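The iterative method can be sketched as a worklist solver that is generic in the meet and the transfer functions, i.e. in the framework D = (L, ∧, F). This Python sketch is a hedged illustration rather than Kildall's original formulation; the name `solve_forward`, its parameters, and the three-node FFG used to run Reaching Definitions are hypothetical.

```python
from collections import deque


def solve_forward(nodes, preds, succs, entry, boundary, top, meet, transfer):
    """Worklist form of the iterative method for a forward monotone framework.

    meet(values) combines a list of lattice values; transfer(n, x) is the
    node's monotone function. Returns (In, Out) at the fixpoint.
    """
    ins = {}
    out = {n: top for n in nodes}
    work = deque(nodes)
    while work:
        n = work.popleft()
        vals = [out[p] for p in preds[n]] + ([boundary] if n == entry else [])
        ins[n] = meet(vals)
        new = transfer(n, ins[n])
        if new != out[n]:
            out[n] = new
            for s in succs[n]:
                if s not in work:
                    work.append(s)  # only successors can be affected
    return ins, out


# Reaching definitions on a hypothetical FFG with a loop:
# S0: d1: i = 0;   S1: d2: i = i + 1 (self loop);   S2: d3: x = i
NODES = ["S0", "S1", "S2"]
PREDS = {"S0": [], "S1": ["S0", "S1"], "S2": ["S1"]}
SUCCS = {"S0": ["S1"], "S1": ["S1", "S2"], "S2": []}
GEN = {"S0": frozenset({"d1"}), "S1": frozenset({"d2"}), "S2": frozenset({"d3"})}
KILL = {"S0": frozenset({"d2"}), "S1": frozenset({"d1"}), "S2": frozenset()}

ins, out = solve_forward(
    NODES, PREDS, SUCCS, entry="S0", boundary=frozenset(), top=frozenset(),
    meet=lambda vals: frozenset().union(*vals),  # union: a "may" problem
    transfer=lambda n, x: GEN[n] | (x - KILL[n]),
)
print(sorted(ins["S1"]), sorted(out["S2"]))  # ['d1', 'd2'] ['d2', 'd3']
```

Both definitions of i reach S1 (d1 from S0, d2 around the loop), while only d2 reaches S2. Re-queuing only the successors of a node whose Out changed keeps the iteration from revisiting unaffected nodes.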
Data Flow Problems
• Solve the following problems in order to improve the design:
  – Reaching definitions and uses
  – Available expression computation
  – Copy propagation and constant folding
  – Reachability analysis
• Code improvement techniques:
  – Dead operation elimination
  – Computation sharing through normalization
  – Code motion
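Copy propagation can be illustrated in its local form, within one straight-line block. This Python sketch and its `(dest, rhs_tokens)` encoding are hypothetical; the global version listed above would additionally need the copies reaching each node, computed as a forward data flow problem similar to Available Expressions.

```python
def copy_propagate(ops):
    """Local copy propagation over one straight-line block.

    ops: list of (dest, rhs_tokens). After a copy x = y, later reads of x are
    replaced by y until either x or y is reassigned.
    """
    copies = {}  # x -> y for every copy x = y still valid here
    result = []
    for dest, rhs in ops:
        rhs = [copies.get(tok, tok) for tok in rhs]  # rewrite reads first
        # dest gets a new value: forget copies that mention it
        copies = {x: y for x, y in copies.items() if dest not in (x, y)}
        if len(rhs) == 1 and rhs[0].isidentifier() and rhs[0] != dest:
            copies[dest] = rhs[0]  # this statement is itself a copy
        result.append((dest, rhs))
    return result


block = [("x", ["y"]), ("z", ["x", "+", "1"]), ("y", ["5"]), ("w", ["x"])]
for dest, rhs in copy_propagate(block):
    print(dest, "=", " ".join(rhs))
```

Here z = x + 1 becomes z = y + 1, but w = x is left alone because y = 5 invalidates the copy; if x has no remaining reads, dead operation elimination can then remove x = y.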
Related Work
• High Level Synthesis (Silicon Compilers)
  ✗ Local optimizations
  ✗ Focus mainly on scheduling and allocation for hardware
  – e.g., YSC [Brayton et al., 88]; CATHEDRAL II [Rabaey et al., 88]
• Global Data Flow Optimization Techniques (Software Compilers)
  ✗ Focus on handwritten code and on compilers for assembly code generation
  – e.g., [Kildall, 1973]; [Kam and Ullman, 1976]; [Aho, Sethi, and Ullman, 1988]
  ✓ Can be applied to the design representation and specialized to the reactive embedded domain
Specializing Problems to the Reactive Embedded Domain
input inp;
output outp;
int a = 0;
int _CONST_0 = 0;
int _T11 = 0;
int _T13 = 0;
S1: goto S2;
S2: a = inp;
    _T13 = a + _CONST_0;
    _T11 = a + a;
    outp = _T11;
    goto S3;
S3: outp = _T13;
    goto S3;
Architecture Dependent Optimizations
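[Figure: EFSM, FFG, OFFG, MFFG, and CDFG stages, split into an architecture independent part and an architecture dependent part that draws on a library (lib) of architectural information.]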
Sharing Sub-expressions
• Available Expressions cannot eliminate T2 (the assignment a = c in S1 kills a + b)
• But if variables are registered (additional architectural information), we can share T1 and T2
[Figure: shared adder computing T(a+b) from a and b, feeding x and Out.]
S1: T1 = a + b;
    x = T1;
    a = c;
S2: T2 = a + b;
    Out = T(a+b);
    emit(Out)
Function Architecture Co-design
[Figure: system specs and system constraints are decomposed into a function side (AFFG, fsm, data(f)) and an architecture side (i/o, control, data; processors, ASICs). Example AFFG fragment: t1 = 3*b; t2 = t1 + a; emit x(t2), annotated with operator strength reduction and instruction selection.]
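The fragment t1 = 3*b is a typical target for operator strength reduction: a multiplication by a constant can be rewritten as shifts and adds. The Python helper below is a hypothetical illustration of that rewrite; whether it pays off depends on the target processor or ASIC, which is what the architecture dependent (instruction selection) step decides.

```python
def strength_reduce_mul(var, const):
    """Rewrite var * const (const > 0) as a sum of left shifts of var,
    one term per set bit of const, e.g. 3*b -> (b << 1) + b."""
    if const <= 0:
        raise ValueError("this sketch only handles positive constants")
    terms = []
    bit = 0
    while const:
        if const & 1:
            terms.append(var if bit == 0 else f"({var} << {bit})")
        const >>= 1
        bit += 1
    return " + ".join(reversed(terms))


print(strength_reduce_mul("b", 3))  # (b << 1) + b
```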
Co-Synthesis Flow
[Figure: EFSM (SHIFT) → FFG → OFFG → CDFG, with an FFG interpreter for simulation; the CDFG then goes either to software compilation (object code, .o) or to hardware synthesis (netlist).]
Design Representation in POLIS
• SHIFT
  – Hierarchical netlist of EFSMs; EFSMs are represented as:
    – Input/output/state signals
    – Tabular description of the transition relation
  – Stateless arithmetic or Boolean functions (sub-circuits)
• CDFG
  – Each path in the CDFG is an EFSM transition
  – Operations
Results: Synthesized Software (Quicksort [Aho, 1988])

                     CDFG      build    speed (68HC11 cycles)
Flow                 # nodes   (sec)    min     max
EFSM → CDFG          336       2.0      613     1266
EFSM → FFG → CDFG    206       0.3      229     661
Reduction (%)        38.7      85.0     62.6    47.8
Synthesized Hardware
                       Before hardware optimization      After hardware optimization
SIMPLE                 nodes    literals   latches       nodes    literals   latches
                       (BLIF)   (SOP)                    (BLIF)   (SOP)
EFSM → CDFG            353      3204       100           152      644        68
EFSM → CLIF → CDFG     236      2415       100           88       328        36
Reduction (%)          33.1     24.6       0.0           42.1     49.1       47.1
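The percentage row of each results table is the reduction relative to the unoptimized EFSM → CDFG flow, i.e. 100 · (before − after) / before. The short Python check below recomputes both rows from the table values; the dictionary keys are just labels chosen here.

```python
def reduction(before, after):
    """Percentage reduction from the EFSM -> CDFG flow to the optimized flow."""
    return round(100.0 * (before - after) / before, 1)


software = {
    "CDFG # nodes": (336, 206),
    "build (sec)": (2.0, 0.3),
    "min cycles": (613, 229),
    "max cycles": (1266, 661),
}
hardware = {
    "nodes before": (353, 236), "literals before": (3204, 2415),
    "latches before": (100, 100), "nodes after": (152, 88),
    "literals after": (644, 328), "latches after": (68, 36),
}
for table in (software, hardware):
    print([reduction(b, a) for b, a in table.values()])
# [38.7, 85.0, 62.6, 47.8]
# [33.1, 24.6, 0.0, 42.1, 49.1, 47.1]
```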
Conclusion
• New design representation (FFG/CLIF) able to capture EFSMs and permit global data flow analysis for design optimization
• Data flow and control optimization approach to hardware and software co-synthesis of embedded systems
  – Architecture independent EFSM optimizations
    – Reduction of TEST and ASSIGN operations and of variables
  – Architecture dependent EFSM optimizations
    – Function/Architecture Co-design
Co-Design Methodology
[Figure: co-design methodology. Labels: Function, Architecture, Mapping, HW, SW, Synthesis, Verification, Trade-off (twice).]