
Bassam Tabbara 10/19/99

Optimizations for HW and SW Synthesis 1

SRC Graduate Fellow: SRC DC-324-028

Data Flow and Control Optimizations for Hardware and Software Co-Synthesis in Embedded Systems

Bassam Tabbara, Alberto Sangiovanni-Vincentelli

University of California at Berkeley


Embedded System

• Electronic "brain" found in many applications, e.g.
  - Consumer electronics
  - Telecommunications
• Consists of:
  - Software: flexibility
  - Hardware: performance
• Application requirements on the system:
  - Small
  - Efficient
  - Power
  - Other metrics


Hardware/Software Co-design

[Figure: co-design flow. Labels: Design Specification, Design Representation, HW/SW Partitioning, Synthesis, Evaluation, Implementation, SW, HW, Micro-processor, ASIC]


Hardware/Software Co-Synthesis

Data dominated applications
  - Focus: data processing (e.g. digital video TV)
  - Model: data-based (e.g. data dependency graph)
  - Optimization: data flow
  - Representative
    - VULCAN [Gupta, 95]
    - COSYMA [Ernst, 93]

Control dominated applications
  - Focus: reactive controllers (e.g. car brake controller)
  - Model: control-based (e.g. FSM)
  - Optimization: control
  - Representative
    - POLIS [Chiodo, 94]
    - CHINOOK [Borriello, 95]

Typical control-dominated applications are not purely control; they have data computations as well ...


Challenges in Control-dominated Co-Synthesis

• Experience and feedback from POLIS users (automotive, telecommunications, ...)
  - Cannot just focus on improved productivity
  - Applications with data computations result in:
    - Poor synthesized output quality: large and inefficient
• Co-design environments that focus on small reactive controllers suffer from this as well (TOSCA [Borriello 95], SCENIC [Gupta 97], …)


Research Opportunity

• Incorporate data flow optimization into control dominated co-synthesis flows
  - Perform in an unbiased manner towards HW or SW
• Reduce control (e.g. y = 1; if (y == 1) … else …)
• Improve the quality of the synthesized output
  - Small (code size of software, area of hardware), and
  - Efficient (performance)
  - Power, domain-specific metrics
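The "reduce control" bullet can be illustrated with a small sketch of TEST folding: when the tested variable holds a known constant, the test and the untaken branch can be dropped. The tuple-based mini-IR and the names `fold_tests` and `assigned_vars` are assumptions made for this illustration only; they are not part of POLIS or CLIF.

```python
def assigned_vars(ops):
    """Collect every variable assigned anywhere inside ops (hypothetical mini-IR)."""
    names = set()
    for op in ops:
        if op[0] == "assign":
            names.add(op[1])
        elif op[0] == "if_eq":
            names |= assigned_vars(op[3]) | assigned_vars(op[4])
    return names


def fold_tests(ops, known=None):
    """Remove tests whose outcome is decided by a known constant.

    Ops are ("assign", dest, value), ("if_eq", var, const, then_ops, else_ops),
    or any other tuple, which is passed through unchanged.
    """
    known = {} if known is None else known  # updated in place as ops are scanned
    out = []
    for op in ops:
        if op[0] == "assign":
            _, dest, value = op
            if isinstance(value, int):
                known[dest] = value      # dest now holds a known constant
            else:
                known.pop(dest, None)    # value unknown: forget dest
            out.append(op)
        elif op[0] == "if_eq":
            _, var, const, then_ops, else_ops = op
            if var in known:
                taken = then_ops if known[var] == const else else_ops
                out.extend(fold_tests(taken, known))  # keep only the taken branch
            else:
                out.append(op)           # outcome unknown: keep the test as is
                for name in assigned_vars([op]):
                    known.pop(name, None)
        else:
            out.append(op)
    return out


program = [("assign", "y", 1),
           ("if_eq", "y", 1, [("emit", "A")], [("emit", "B")])]
folded = fold_tests(program)
undecided = fold_tests([("if_eq", "y", 1, [("emit", "A")], [("emit", "B")])])
```

For the slide's example, `folded` keeps the assignment to `y` and the operations of the taken branch only; a test on a variable with no known value is left untouched.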


Overview

• Problem Statement: Current methodologies for designing control-dominated hardware-software systems suffer from inefficient hardware and software synthesis
• Research Objective: Develop a methodology that incorporates data flow in addition to control optimizations in a hardware and software co-design environment in order to improve synthesis quality


Assumptions

• Target: heterogeneous control-dominated embedded system applications
  - Functional decomposition captures design as a network of Finite State Machines extended with data computations (EFSMs).

• Focus: representation, optimization, and synthesis of each individual task

• No assumptions on how tasks are composed in the whole system.

Background

Reactive System Co-synthesis

CDFG is suitable for describing EFSM reactive behavior but
  ✘ Some of the control flow is hidden
  ✘ Data cannot be propagated

[Figure: an EFSM and its mapping to a CDFG]

EFSM:
  S0: a := 5
  S1: a := a + 1
  S2: emit a

CDFG:
  BEGIN
  Case (state)
    S0: a := 5; state := S1
    S1: a := a + 1; state := S2
    S2: emit(a)
  END


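Data Flow Optimization

[Figure: EFSM representation next to its optimized counterpart]

EFSM Representation:
  S0: a := 5
  S1: a := a + 1
  S2: emit a

Optimized EFSM Representation:
  a := 5 followed by a := a + 1 becomes a := 6
  S2: emit a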
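The rewrite from a := 5; a := a + 1 to a := 6 can be sketched as constant propagation and folding along a straight chain of states. The representation below (each assignment as a destination plus a list of operands to be summed) and the name `propagate_constants` are assumptions for illustration; the sketch handles only a linear chain without branches or loops.

```python
def propagate_constants(chain):
    """Propagate and fold constants along a linear chain of states.

    chain: list of (state_name, ops); each op is (dest, operands), meaning
    dest := sum(operands), where an operand is an int or a variable name.
    """
    known = {}    # variable -> constant value known at this point
    result = []
    for state, ops in chain:
        new_ops = []
        for dest, operands in ops:
            # Replace variables that currently hold a known constant.
            resolved = [known.get(x, x) if isinstance(x, str) else x
                        for x in operands]
            if all(isinstance(x, int) for x in resolved):
                value = sum(resolved)        # fold to a single constant
                known[dest] = value
                new_ops.append((dest, [value]))
            else:
                known.pop(dest, None)        # dest is no longer a known constant
                new_ops.append((dest, resolved))
        result.append((state, new_ops))
    return result


efsm = [("S0", [("a", [5])]),
        ("S1", [("a", ["a", 1])]),
        ("S2", [])]
optimized = propagate_constants(efsm)
```

After this pass the assignment in S1 reads a := 6. The assignment a := 5 in S0 is then overwritten before any use, so a separate dead operation elimination pass could remove it.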

Suitable Design Representation

Intermediate Design Representation

• Develop Function Flow Graph (FFG) / C-Like Intermediate Format (CLIF)
  - Able to represent EFSM
  - Suitable for data flow analysis

[Figure: EFSM → FFG → Optimized FFG → CDFG, with Data Flow/Control Optimizations applied to the FFG]


Function Flow Graph (FFG)

• An FFG is a triple G = (V, E, N0) where
  - V is a finite set of nodes
  - E ⊆ V×V is the set of edges; (x, y) ∈ E is an edge from x to y, where x ∈ Pred(y), the set of predecessor nodes of y.
  - N0 ∈ V is the start node corresponding to the EFSM initial state.
  - An unscheduled sequence of operations is associated with each node N.
  - Operations consist of TESTs performed on the EFSM inputs and internal variables, and ASSIGNs on the EFSM outputs and internal variables
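The triple G = (V, E, N0) maps naturally onto a small data structure. The sketch below is one possible encoding, not the POLIS implementation; the class names `Node` and `FFG`, the string form of operations, and the three-state example graph are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    name: str
    # Unscheduled sequence of TEST / ASSIGN operations, kept as plain strings.
    ops: List[str] = field(default_factory=list)


@dataclass
class FFG:
    nodes: Dict[str, Node]          # V
    edges: Set[Tuple[str, str]]     # E, a subset of V x V
    start: str                      # N0

    def __post_init__(self) -> None:
        if self.start not in self.nodes:
            raise ValueError("start node N0 must belong to V")
        for x, y in self.edges:
            if x not in self.nodes or y not in self.nodes:
                raise ValueError(f"edge ({x}, {y}) is not in V x V")

    def pred(self, y: str) -> Set[str]:
        """Pred(y): every x such that (x, y) is an edge."""
        return {x for (x, z) in self.edges if z == y}


g = FFG(
    nodes={
        "S0": Node("S0", ["a = 5"]),
        "S1": Node("S1", ["a = a + 1"]),
        "S2": Node("S2", ["output = a"]),
    },
    edges={("S0", "S1"), ("S1", "S2")},
    start="S0",
)
```

Here `g.pred("S1")` yields the single predecessor S0, and the start node has no predecessors.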


FFG / CLIF Example

[Figure: Function Flow Graph]
  Legend: constant, output flow, dead operation. S# = State, S#L# = Label in State S#
  Node S1: x = x + y; x = x + y; a = b + c; a = x; cond1 = (y == cst1); cond2 = !cond1;
  Edges: (cond2 == 0) / output(a); (cond2 == 1) / output(b)
  Additional label: y = 1

CLIF Textual Representation:

  S1:
      x = x + y;
      x = x + y;
      a = b + c;
      a = x;
      cond1 = (y == cst1);
      cond2 = !cond1;
      if (cond2) goto S1L0;
      output = a;
      goto S1; /* Loop */
  S1L0:
      output = b;
      goto S1;
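The legend mentions dead operations; in the S1 block, a = b + c is overwritten by a = x before a is ever read. A minimal sketch of dead ASSIGN elimination for one block follows, using a backward liveness scan. The function name, the (dest, uses) encoding, and the chosen live-out set are assumptions for illustration, and the sketch relies on operations having no side effects.

```python
def eliminate_dead_assigns(ops, live_out):
    """Drop assignments whose destination is never read afterwards.

    ops: list of (dest, uses) in program order; uses is a tuple of variable names.
    live_out: variables assumed to be needed after the block.
    """
    live = set(live_out)
    kept = []
    for dest, uses in reversed(ops):      # scan backwards from the block end
        if dest in live:
            kept.append((dest, uses))
            live.discard(dest)            # this definition satisfies later reads
            live.update(uses)             # its operands become live
        # otherwise the value is never read: the operation is dead
    kept.reverse()
    return kept


s1_block = [
    ("x", ("x", "y")),
    ("x", ("x", "y")),
    ("a", ("b", "c")),
    ("a", ("x",)),
    ("cond1", ("y", "cst1")),
    ("cond2", ("cond1",)),
]
# Assumed live-out set: the outputs read a or b, the branch tests cond2,
# and x and y are read again because S1 loops back to itself.
optimized_block = eliminate_dead_assigns(s1_block, {"a", "b", "x", "y", "cond2"})
```

Under these assumptions only a = b + c is removed; both x = x + y operations stay because each result is read by the operation that follows.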


C-Like Intermediate Format (CLIF)

• Import/Export Function Flow Graph (FFG)
• Unscheduled sequence of TEST and ASSIGN operations
  - [if (condition)] goto label
  - dest = op(src)
    - op = {not, minus, …}
  - dest = src1 op src2
    - op = {+, *, /, ||, &&, |, &, …}
• No aliasing (no side effects)
• Loops are present


Data Flow and Control Optimization

Optimizing FFG / CLIF


Optimization Approach

• Develop optimizer for FFG (CLIF) intermediate design representation

• Goal: Optimize for speed and size by reducing
  - ASSIGN operations
  - TEST operations
  - variables

• Reach goal by solving sequence of data flow problems for analysis and information gathering using an underlying data flow analysis framework


Sample Data Flow Problem: Available Expressions Example

• Goal is to eliminate re-computations
  - Formulate Available Expressions Problem
    - Global version of common sub-expression elimination

[Figure: flow graph annotated with AE sets; AE = Available Expression]
  S1: t := a + 1                  AE in = φ,      AE out = {a+1}
  S2: t1 := a + 1; t2 := b + 2    AE in = φ,      AE out = {a+1, b+2}
  S3: a := a * 5; t3 = a + 2      AE in = {a+1},  AE out = {a+2}


Data Flow Problem Instance

• A particular (problem) instance of a monotone data flow analysis framework is a pair I = (G, M), where M: V → F is a function that maps each node N in V of the FFG G to a function in F on the node label semilattice L of the framework D.


Data Flow Analysis Framework

• A monotone data flow analysis framework D = (L, ∧, F) is used to manipulate the data flow information by interpreting the node labels on N in V of the FFG G as elements of an algebraic structure, where
  - L is a bounded semilattice with meet ∧, and
  - F is a monotone function space associated with L.


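Solving Data Flow Problems

• Data Flow Equations

  $\mathrm{Out}(S3) = (\mathrm{In}(S3) - \mathrm{Kill}(S3)) \cup \mathrm{Gen}(S3)$

  $\mathrm{In}(S3) = \bigcap_{P \in \{S1, S2\}} \mathrm{Out}(P)$

[Figure: the available-expressions flow graph, AE = Available Expression]
  S1: t := a + 1                  AE in = φ,      AE out = {a+1}
  S2: t1 := a + 1; t2 := b + 2    AE in = φ,      AE out = {a+1, b+2}
  S3: a := a * 5; t3 = a + 2      AE in = {a+1},  AE out = {a+2}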
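The two equations can be checked directly against the S1/S2/S3 example. The sketch below encodes expressions as strings and writes the Gen and Kill sets by hand; those sets, the function names, and the assumption that S1 and S2 are entered with no available expressions are choices made for this illustration.

```python
def available_in(pred_outs):
    """In(N): intersection of Out(P) over all predecessors P of N."""
    outs = [set(o) for o in pred_outs]
    if not outs:
        return set()              # assumption: no predecessors means nothing is available
    result = outs[0]
    for o in outs[1:]:
        result &= o
    return result


def available_out(av_in, gen, kill):
    """Out(N) = (In(N) - Kill(N)) | Gen(N)."""
    return (set(av_in) - set(kill)) | set(gen)


# Hand-written Gen/Kill sets for the slide's example.
# S3 assigns a, which kills every expression that reads a, and then
# computes a + 2 on the new value of a.
gen = {"S1": {"a+1"}, "S2": {"a+1", "b+2"}, "S3": {"a+2"}}
kill = {"S1": set(), "S2": set(), "S3": {"a+1", "a+2"}}

out_s1 = available_out(set(), gen["S1"], kill["S1"])
out_s2 = available_out(set(), gen["S2"], kill["S2"])
in_s3 = available_in([out_s1, out_s2])
out_s3 = available_out(in_s3, gen["S3"], kill["S3"])
```

The computed sets match the annotations on the slide: only a+1 reaches S3 along both paths, and after S3 only a+2 is available.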


Solving Data Flow Problems

• Solve data flow problems using the iterative method [Kildall, 73] [Kennedy, 76]
  - Feasibility: general, does not depend on the flow graph
  - Optimality: optimal for a class of data flow problems [Kildall, 73]
  - Convergence speed: reaches fixpoint in polynomial time (O(n^2))
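A minimal round-robin version of the iterative method, specialized here to available expressions, might look as follows. It is a sketch under assumptions rather than the solver used in the work: the function name `solve_available`, the set-based encoding, and the example graph (the S1/S2/S3 nodes with S1 as entry and a hypothetical back edge S3 → S2 added to create a loop) are all illustrative.

```python
def solve_available(nodes, preds, gen, kill, entry, universe):
    """Iterate the available-expressions equations until nothing changes."""
    out = {n: set(universe) for n in nodes}   # optimistic start: everything available
    inn = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:                        # one round-robin sweep over the graph
            if n == entry:
                new_in = set()                 # nothing is available on entry
            else:
                new_in = set(universe)
                for p in preds[n]:
                    new_in &= out[p]           # meet over predecessors
            new_out = (new_in - kill[n]) | gen[n]
            if new_in != inn[n] or new_out != out[n]:
                inn[n], out[n] = new_in, new_out
                changed = True
    return inn, out


universe = {"a+1", "b+2", "a+2"}
nodes = ["S1", "S2", "S3"]
# Hypothetical graph: S1 -> S3, S2 -> S3, plus an assumed back edge S3 -> S2.
preds = {"S1": [], "S2": ["S3"], "S3": ["S1", "S2"]}
gen = {"S1": {"a+1"}, "S2": {"a+1", "b+2"}, "S3": {"a+2"}}
kill = {"S1": set(), "S2": set(), "S3": {"a+1", "a+2"}}

inn, out = solve_available(nodes, preds, gen, kill, "S1", universe)
```

The loop stops once a full sweep leaves every In and Out set unchanged, which is the fixpoint the slide refers to. With the assumed back edge, a+2 becomes available on entry to S2 while In(S3) and Out(S3) keep their earlier values.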


Data Flow Problems

• Solve the following problems in order to improve the design:
  - Reaching Definitions and Uses
  - Available Expression Computation
  - Copy Propagation, and constant folding
  - Reachability Analysis
• Code Improvement techniques
  - Dead Operation Elimination
  - Computation sharing through normalization
  - Code motion


Related Work

• High Level Synthesis (Silicon Compilers)
  ✘ Local optimizations
  ✘ Focus mainly on scheduling, allocation for hardware
    - e.g. YSC [Brayton et al., 88]; CATHEDRAL II [Rabaey et al., 88]
• Global Data Flow Optimization Techniques (Software Compilers)
  ✘ Focus on handwritten code, and compilers for assembly code generation
    - e.g. [Kildall, 1973]; [Kam and Ullman, 1976]; [Aho, Sethi, and Ullman, 1988]
  ✔ Can be applied on the design representation and specialized to the reactive embedded domain


Specializing Problems to the Reactive Embedded Domain

  input inp;
  output outp;
  int a = 0;
  int _CONST_0 = 0;
  int _T11 = 0;
  int _T13 = 0;

  S1:
      goto S2;
  S2:
      a = inp;
      _T13 = a + _CONST_0;
      _T11 = a + a;
      outp = _T11;
      goto S3;
  S3:
      outp = _T13;
      goto S3;


Architecture Dependent Optimizations

[Figure: flow through EFSM, FFG, OFFG, MFFG, and CDFG. Labels: Architecture Independent, Architectural Information (lib), Sum]


Sharing Sub-expressions

• Available Expressions cannot eliminate T2
• But if variables are registered (additional architectural information) we can share T1 and T2

  S1:
      T1 = a + b;
      x = T1;
      a = c;
  S2:
      T2 = a + b;
      Out = T(a+b);
      emit(Out)

[Figure: datapath. Labels: a, b, +, T(a+b), x, Out]


Function Architecture Co-design

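[Figure: function/architecture co-design. Labels: System Specs, System Constraints, Decomposition (function side), Decomposition (architecture side), AFFG, fsm, data f., I/O, control, data, i/o, ASICs, processors]

Example operations:
  t1 = 3*b
  t2 = t1 + a
  emit x(t2)

Techniques: Operator Strength Reduction, Instruction Selection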
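One common form of operator strength reduction replaces a multiplication by a constant, such as t1 = 3*b, with shifts and additions. The slide does not say which rewrite is applied, so the sketch below is only an assumed illustration; the function names and the C-like output string are hypothetical, and whether the rewrite pays off depends on the target architecture.

```python
def shift_amounts(constant):
    """Shift amounts k such that constant equals the sum of (1 << k)."""
    if constant < 0:
        raise ValueError("this sketch handles non-negative constants only")
    return [k for k in range(constant.bit_length()) if (constant >> k) & 1]


def reduce_multiply(dest, constant, var):
    """Rewrite dest = constant * var as a sum of shifted copies of var."""
    terms = [var if k == 0 else f"({var} << {k})"
             for k in shift_amounts(constant)]
    if not terms:
        return f"{dest} = 0;"          # multiplying by zero
    return f"{dest} = {' + '.join(terms)};"


rewritten = reduce_multiply("t1", 3, "b")
```

For the slide's operation this produces one shift and one addition in place of the multiplication, since 3 = 1 + 2.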


Co-Synthesis Flow

[Figure: flow over SHIFT, EFSM, FFG, OFFG, and CDFG, with an FFG Interpreter (Simulation); the flow ends in Software Compilation producing Object Code (.o), or Hardware Synthesis producing a Netlist]


Design Representation in POLIS

• SHIFT
  - Hierarchical netlist of EFSMs
    - EFSMs represented as
      - Input/Output/State signals
      - Tabular description of transition relation
  - Stateless arithmetic, or Boolean functions
    - Sub-circuits
• CDFG
  - Each path in CDFG is an EFSM transition
  - Operations


Results: Synthesized Software

Quicksort [Aho, 1988]   CDFG      CDFG          speed (68HC11 cycles)
                        # nodes   build (sec)   min      max
EFSM→CDFG               336       2.0           613      1266
EFSM→FFG→CDFG           206       0.3           229      661
Result (%)              38.7      85.0          62.6     47.8
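The Result (%) row is consistent with a relative reduction from the EFSM→CDFG flow to the EFSM→FFG→CDFG flow. That reading is an inference from the numbers rather than something the slide states; the short check below recomputes the row under that assumption, with `reduction_percent` as a hypothetical helper name.

```python
def reduction_percent(before, after):
    """Relative reduction from before to after, in percent, to one decimal."""
    return round((before - after) / before * 100, 1)


# (before, after) pairs taken from the software results table.
software = {
    "CDFG # nodes": (336, 206),
    "CDFG build (sec)": (2.0, 0.3),
    "speed min (cycles)": (613, 229),
    "speed max (cycles)": (1266, 661),
}
results = {name: reduction_percent(b, a) for name, (b, a) in software.items()}
```

The recomputed values agree with the table: 38.7, 85.0, 62.6, and 47.8.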


Results: Synthesized Hardware

SIMPLE              Before Hardware Optimization          After Hardware Optimization
                    nodes    literals   latches           nodes    literals   latches
                    (BLIF)   (sop)                        (BLIF)   (sop)
EFSM→CDFG           353      3204       100               152      644        68
EFSM→CLIF→CDFG      236      2415       100               88       328        36
Result (%)          33.1     24.6       0.0               42.1     49.1       47.1


Conclusion

• New design representation (FFG/CLIF) able to capture EFSM and permit global data flow analysis for design optimization
• Data flow and control optimization approach to hardware and software co-synthesis of embedded systems
  - Architecture independent EFSM optimizations
    - TEST and ASSIGN operations, and variable reduction
  - Architecture dependent EFSM optimizations
    - Function/Architecture Co-design


Co-Design Methodology

[Figure: co-design methodology. Labels: Function, Architecture, Mapping, Synthesis, Verification, HW, SW, Trade-off (twice)]