Embedded Systems in Silicon TD5102: Introduction and overview. Henk Corporaal, http://www.ics.ele.tue.nl/~heco/courses/EmbSystems, Technical University Eindhoven, DTI / NUS Singapore, 2005/2006

Upload: swann

Post on 10-Feb-2016


Page 1: Embedded Systems in Silicon TD5102 Introduction and overview

Embedded Systems in Silicon TD5102

Introduction and overview

Henk Corporaal
http://www.ics.ele.tue.nl/~heco/courses/EmbSystems

Technical University Eindhoven
DTI / NUS Singapore

2005/2006

Page 2: Embedded Systems in Silicon TD5102 Introduction and overview

H.C. TD5102 2

Contents
• Trends
• Platforms
• Application mapping
• Design flow
• Summary

Page 3: Embedded Systems in Silicon TD5102 Introduction and overview

Observation 1: The 3 Cs

• Convergence of the 3 Cs: computers, communications and consumer electronics
• The computer enters its 3rd phase: computing power - networking - intelligent processing
• The world is one network: all information and communication available wherever, whenever

We get a smart environment

Page 4: Embedded Systems in Silicon TD5102 Introduction and overview

Observation 2: Current design practice

[Figure: Y-chart (Gajski-Kuhn) with the axes Behaviour, Structure and Physical, and the levels System, Algorithm, R/T, Logic and Circuit]

• A design flow is a path in the Y-chart
• Down to RT level the flow is largely manual

Page 5: Embedded Systems in Silicon TD5102 Introduction and overview

Observation 3: Informal system specification

[Figure: a paper spec with tasks is divided among system people, software people (C, ASM) and hardware people (VHDL, Verilog), followed by integration]

Page 6: Embedded Systems in Silicon TD5102 Introduction and overview

Observation 4: Design productivity

• Yes, we can fabricate the ICs, but …
• Can we design them?
• Can we program them?

[Figure: complexity (log scale, 10^1 to 10^3) versus year (4 to 16), showing the HW gap and the SW gap]

• Process technology: +58%
• HW design productivity: +21%
• SW productivity: +8%
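The size of the two gaps follows from compounding the growth figures. The sketch below is illustrative only: it assumes the percentages are growth rates per year (the slide does not state the period), and `relative_gap` is a hypothetical helper name.

```cpp
#include <cmath>

// Ratio between two quantities that start equal and grow at different
// compound rates for `years` periods.
// Assumption: the slide's percentages are growth rates per year.
double relative_gap(double fast_rate, double slow_rate, int years) {
    return std::pow((1.0 + fast_rate) / (1.0 + slow_rate), years);
}
```

Under that assumption, `relative_gap(0.58, 0.21, 10)` gives a HW gap of roughly 14x after ten years, and `relative_gap(0.58, 0.08, 10)` a SW gap of roughly 45x.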

Page 7: Embedded Systems in Silicon TD5102 Introduction and overview

Observation 5: More dynamic applications

[Figure: relative CPU load for 15 fps, video versus 3D (axis 0% to 1200%), an order of magnitude apart]

[Figure: load (0% to 100%) versus frame number (0 to 300, IPPP ...), varying by a factor of 2; sequence: weather, VO1, binary shape, 10 Hz, 112 kbit/s, QCIF]

Source: P. Kuhn, G. Diebel, “Complexity Analysis of the MPEG-4 VM 8.0,” ISO/IEC JTC1/SC29/WG11/MPEG97/m2862, Fribourg, October 1997

Page 8: Embedded Systems in Silicon TD5102 Introduction and overview

Observation 6: Memory problem

[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2000); µProc (CPU): 55%/year (“Moore’s Law”); DRAM: 7%/year; the processor-memory performance gap grows 50%/year] [Patterson]

Page 9: Embedded Systems in Silicon TD5102 Introduction and overview

What do we learn from these observations?

We need:
• Short time-to-market
  – reuse
  – short design time
• Flexible solution
  – programmability
  – reconfigurability
• Scalability
• Low power
• Low cost
• QoS control

At sufficient performance!

Page 10: Embedded Systems in Silicon TD5102 Introduction and overview

Solution?

1. Platforms
  – HW and SW IP reuse
  – Standardization (interfaces)
  – QoS (quality of service) hooks
2. Advanced design flow for platforms
  – Raise abstraction level
  – Tool support
  – Modeling of power, cost, performance
  – Predictable design

Page 11: Embedded Systems in Silicon TD5102 Introduction and overview

Lecture 1: Introduction
• Trends
• Platforms
• Application mapping
• Design flow
• Summary

Page 12: Embedded Systems in Silicon TD5102 Introduction and overview

What is a platform?

A platform is a generic, but domain-specific, information processing (sub-)system.

In the future it will be available as a single chip (SoC) or package (SiP).

Page 13: Embedded Systems in Silicon TD5102 Introduction and overview

What is a platform?

• HW properties:
  – One or more programmable processors
  – Advanced memory organization
  – Programmable communication network
  – I/O (highly domain dependent)
• Possible extra HW features:
  – Reconfigurable logic
  – Domain-specific accelerators

Page 14: Embedded Systems in Silicon TD5102 Introduction and overview

What is a platform?

• SW components:
  – Standardized RTOS
  – Proper tooling for platform system design
    • Compilers, Models, Exploration, Debugging, Simulation, …
• Possible extra SW features:
  – Middleware layer on top of the OS for features like:
    • QoS
    • Domain-specific protocols
    • Domain-specific SW interfaces
    • Control of reconfigurable logic
    • Library components
    • Distributed / active network processing
    • Billing
    • Security

Page 15: Embedded Systems in Silicon TD5102 Introduction and overview

Example platform: Philips Nexperia

Available in the billion-transistor era:
  – E.g. TI OMAP, Sony Cell, Philips Nexperia, TRIPS, Xilinx Virtex-4 Pro, …

[Figure: Philips Nexperia]

Page 16: Embedded Systems in Silicon TD5102 Introduction and overview

Future platforms

Example: Smart Networked Devices

[Figure: block diagram with the labels radio, programmable hardware, reconfig. hardware, accelerator hardware, OS, library, Virtual Machine, Protocols, Network and Multimedia (MPEG 21)]

Page 17: Embedded Systems in Silicon TD5102 Introduction and overview

Future platform: architecture concept

[Figure: hierarchy of levels 0, 1, …, N built from CPUs, accelerators, reconfigurable HW blocks, memories, communication networks and I/O]

Page 18: Embedded Systems in Silicon TD5102 Introduction and overview

Future platforms: NoC realization

[Figure: IP cores connected through network interfaces to an on-chip network]

IP isles:
• 32 RISC microprocessor ~ 20 Kgates
• MPEG decoding ~ 100 Kgates
• Wavelet filtering ~ 40 Kgates
• SRAM
• DRAM
• FPGA block

Page 19: Embedded Systems in Silicon TD5102 Introduction and overview

Lecture 1: Introduction
• Trends
• Platforms
• Application mapping
• Design flow
• Summary

Page 20: Embedded Systems in Silicon TD5102 Introduction and overview

Platform and platform design

[Figure: applications, platform and enabling technologies, with design technology split into SDT (system design technology) and PDT (platform design technology)]

Page 21: Embedded Systems in Silicon TD5102 Introduction and overview

What is the system designer's problem?

[Figure: Idea, Specification, Implementation]

Find, for an application (idea/specification), an efficient mapping/implementation on a given realization space, under given constraints (cost, P, E, T, E*D, throughput, #pins, ...)

Page 22: Embedded Systems in Silicon TD5102 Introduction and overview

A (single) processor: how does it look inside?

[Figure: processor datapath (register file with r0, r1, r2; function unit(s); load-store unit; data memory) and processor control (instruction memory, instruction register, decode logic)]

Page 23: Embedded Systems in Silicon TD5102 Introduction and overview

Mapping: placing operations in space and time

d = a * b;
e = a + d;
f = 2 * b + d;
r = f - e;
x = z + y;

[Figure: Data Dependence Graph (DDG) with inputs a, b, 2, z, y; operations *, *, +, +, -, +; results d, e, f, r, x]

Page 24: Embedded Systems in Silicon TD5102 Introduction and overview

How to map these operations?

Architecture 1:
• One function unit
• All operations have single-cycle latency

[Figure: the DDG next to a schedule with one operation per cycle: *, *, +, +, -, + in cycles 1 to 6]

Page 25: Embedded Systems in Silicon TD5102 Introduction and overview

How to map these operations?

Architecture 2:
• One Add-sub and one Mul unit
• All operations have single-cycle latency

[Figure: the DDG next to a schedule with a Mul column and an Add-sub column, cycle axis 1 to 6]

Page 26: Embedded Systems in Silicon TD5102 Introduction and overview

How to map these operations?

Architecture 3:
• One Add-sub and one Mul unit
• Add/Sub 1 cycle, Mul 2 cycles

[Figure: the DDG next to a schedule with a Mul column and an Add-sub column, cycle axis 1 to 6]
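The three architectures can be compared with a small scheduler. This is a minimal sketch, not the tool used in the course: it applies greedy list scheduling to the five-statement example, assumes non-pipelined units, and all names (`Op`, `example_ddg`, `schedule_length`) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Kind { Mul, AddSub };

struct Op {
    Kind kind;
    std::vector<int> deps;  // indices of the operations this one waits for
};

// DDG of: d = a*b; e = a+d; f = 2*b+d; r = f-e; x = z+y
std::vector<Op> example_ddg() {
    return {
        {Kind::Mul, {}},         // 0: d = a * b
        {Kind::Mul, {}},         // 1: t = 2 * b
        {Kind::AddSub, {0}},     // 2: e = a + d
        {Kind::AddSub, {0, 1}},  // 3: f = t + d
        {Kind::AddSub, {2, 3}},  // 4: r = f - e
        {Kind::AddSub, {}},      // 5: x = z + y
    };
}

// Greedy list scheduling in program order. shared_unit = true models one
// function unit for everything; otherwise one Mul and one Add-sub unit.
// Units are assumed not to be pipelined. Returns the schedule length in cycles.
int schedule_length(const std::vector<Op>& ops, bool shared_unit,
                    int mul_latency, int add_latency) {
    std::vector<int> finish(ops.size(), -1);  // -1 means not yet scheduled
    int unit_free[2] = {0, 0};                // first cycle each unit is free
    std::size_t done = 0;
    int last = 0;
    for (int cycle = 0; done < ops.size(); ++cycle) {
        for (std::size_t i = 0; i < ops.size(); ++i) {
            if (finish[i] >= 0) continue;
            int unit = shared_unit ? 0 : (ops[i].kind == Kind::Mul ? 0 : 1);
            if (unit_free[unit] > cycle) continue;
            bool ready = true;
            for (int d : ops[i].deps)
                if (finish[d] < 0 || finish[d] > cycle) ready = false;
            if (!ready) continue;
            int latency = ops[i].kind == Kind::Mul ? mul_latency : add_latency;
            finish[i] = cycle + latency;  // result usable from this cycle on
            unit_free[unit] = finish[i];
            last = std::max(last, finish[i]);
            ++done;
        }
    }
    return last;
}
```

With this greedy scheduler, `schedule_length(example_ddg(), true, 1, 1)` yields 6 cycles for Architecture 1, `schedule_length(example_ddg(), false, 1, 1)` yields 4 for Architecture 2, and `schedule_length(example_ddg(), false, 2, 1)` yields 6 for Architecture 3. A greedy scheduler is not optimal in general, and the schedules drawn on the slides may place operations differently.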

Page 27: Embedded Systems in Silicon TD5102 Introduction and overview

There are many mapping solutions

[Figure: solution space plotted as execution time ($T_{execution}$) versus cost; each x marks a specific architecture and code schedule; the Pareto curve bounds the solution space]

Let $S$ be the solution space containing solutions $x = (x_i)$. Then $x$ is a Pareto point $\iff x \in S$ and $\forall y \in S,\ y \neq x:\ \exists i:\ x_i < y_i$.
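A Pareto filter over (cost, execution time) points can be sketched as follows. The sketch uses the common dominance formulation (no other solution is at least as good in both objectives and strictly better in one), which is slightly more permissive than the definition above for points that tie in one objective; `Solution`, `dominates` and `pareto_points` are hypothetical names.

```cpp
#include <vector>

struct Solution {
    double cost;
    double time;  // execution time
};

// y dominates x if y is no worse in both objectives and strictly better in one.
bool dominates(const Solution& y, const Solution& x) {
    return y.cost <= x.cost && y.time <= x.time &&
           (y.cost < x.cost || y.time < x.time);
}

// Keep only the solutions that no other solution dominates.
std::vector<Solution> pareto_points(const std::vector<Solution>& s) {
    std::vector<Solution> front;
    for (const Solution& x : s) {
        bool dominated = false;
        for (const Solution& y : s) {
            if (dominates(y, x)) { dominated = true; break; }
        }
        if (!dominated) front.push_back(x);
    }
    return front;
}
```

The quadratic scan is fine for the handful of design points a manual exploration produces; larger spaces would sort by one objective first.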

Page 28: Embedded Systems in Silicon TD5102 Introduction and overview

Can we do better?

Much better!!
• transforming the specification
• a different architecture
• a different mapping
• speculative execution
• … be creative …

Page 29: Embedded Systems in Silicon TD5102 Introduction and overview

Transforming the specification (1)

Example: tree height reduction

Based on associativity of the + operation: a + (b + c) = (a + b) + c

[Figure: an addition tree of three + operations before and after height reduction]
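Tree height reduction can be illustrated with a four-operand sum. This is a sketch under the assumption of integer operands without overflow; the function names are hypothetical.

```cpp
// Serial chain: every addition waits for the previous one (height 3).
int sum_chain(int a, int b, int c, int d) {
    return ((a + b) + c) + d;
}

// Balanced tree: (a + b) and (c + d) are independent, so two adders can
// compute them in the same cycle (height 2).
int sum_tree(int a, int b, int c, int d) {
    return (a + b) + (c + d);
}

// Height of a balanced reduction tree over n operands: ceil(log2(n)).
int balanced_height(int n) {
    int height = 0;
    for (int width = 1; width < n; width *= 2) ++height;
    return height;
}
```

For floating-point operands the two forms can round differently, which is why compilers usually apply this transformation only when explicitly allowed to reassociate.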

Page 30: Embedded Systems in Silicon TD5102 Introduction and overview

Transforming the specification (2)

Original:
d = a * b;
e = a + d;
f = 2 * b + d;
r = f - e;
x = z + y;

Transformed:
r = f - e = 2*b + d - (a + d) = 2*b - a;
x = z + y;

[Figure: reduced DDG: b shifted left by 1 (<<), minus a, gives r; z + y gives x]

Page 31: Embedded Systems in Silicon TD5102 Introduction and overview

Changing the architecture: adding more complex units

[Figure: a tree of 2-input adders next to a single 4-input adder]

4-input adder: why is this faster?

Page 32: Embedded Systems in Silicon TD5102 Introduction and overview

Changing the architecture: adding more complex units

In the extreme case, put everything into one unit!

Spatial mapping: no control flow

Page 33: Embedded Systems in Silicon TD5102 Introduction and overview

More complex control flow

Program part:
-a- ;
If cond
Then -b-
Else -c- ;
-d- ;

[Figure: Control Flow Graph (CFG): -a- ends with the test cond?, which leads to -b- or -c-; both continue with -d-]

Page 34: Embedded Systems in Silicon TD5102 Introduction and overview

Mapping the CFG example: 3 options. Which is best?

Option 1: -a-, br c; -b-, jmp d; -c-; -d-
Option 2: -a-, br b; -c-, jmp d; -b-; -d-
Option 3: -a-, br c; -c-, jmp d; -b-; -d-

Page 35: Embedded Systems in Silicon TD5102 Introduction and overview

Why not remove the control flow?

Page 36: Embedded Systems in Silicon TD5102 Introduction and overview

If-conversion shortens the schedule

Before: -a-, br c; -b-, jmp d; -c-; -d-
After: -a-; cond: -b-; !cond: -c-; -d-

Using guarded instructions like:
r3: add r1,r2,r5;
!r3: mul r4,r5,#3
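At source level, if-conversion amounts to computing both alternatives and selecting the result with the guard. The sketch below is illustrative: the variable names only loosely mirror the registers in the guarded instructions above, and whether a compiler actually emits predicated or conditional-move instructions depends on the target.

```cpp
// Branching version: the control flow of the original program part.
int with_branch(bool cond, int r1, int r2, int r4) {
    int r5;
    if (cond) r5 = r1 + r2;  // -b-
    else      r5 = r4 * 3;   // -c-
    return r5;
}

// If-converted version: both operations execute, the guard selects the result.
int if_converted(bool cond, int r1, int r2, int r4) {
    int b = r1 + r2;  // executed regardless of cond
    int c = r4 * 3;   // executed regardless of cond
    return cond ? b : c;
}
```

The conversion is only safe when the guarded operations have no side effects (no stores, no exceptions), since both now execute on every path.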

Page 37: Embedded Systems in Silicon TD5102 Introduction and overview

Speculative execution makes it even shorter!

Before: -a-, br c; -b-, jmp d; -c-; -d-
After: -a-, -b- and -c- side by side; then -d-

Why not execute -d- in parallel?

Page 38: Embedded Systems in Silicon TD5102 Introduction and overview

However: real life is much more complex

E.g. MPEG-4: multimedia

Huge requirements:
• > 10 GOP/s
• > 6 GB/s
• > 10 MB storage

Software specification:
- more than 200 000 lines of C
- hundreds of files
- written by approx. 80 teams

Page 39: Embedded Systems in Silicon TD5102 Introduction and overview

Today's implementations:
- small images
- decoding only
- not real-time
- several W
- single task
- limited dynamism

Wanted features:
- large images (HDTV)
- encoding and decoding
- real-time
- 100 mW (mobile)
- multiple tasks
- dealing with dynamism

Can we handle this?

Page 40: Embedded Systems in Silicon TD5102 Introduction and overview

Lecture 1: Introduction
• Trends
• Platforms
• Application mapping
• Design flow
• Summary

Page 41: Embedded Systems in Silicon TD5102 Introduction and overview

Embedded system design

How to map your application graph AG(L,A,D) to the hardware graph RG(L,N,C)

L: design level (e.g. architecture, implementation or realization level)
A: application components (e.g. tasks, operations, data structures)
D: dependences between application components
N: hardware components (e.g. processors, ASICs, FPGAs, memories)
C: connections between hardware components

Page 42: Embedded Systems in Silicon TD5102 Introduction and overview

Abstraction levels

Idea

System specification level | Level specification languages | Inter-level transformation
Level 0: Requirements      | English                        | is modeled by
Level 1: Architecture      | ES/RT-UML, Esterel, SDL        | is implemented by
Level 2: Implementation    | C++, JAVA, C, VHDL, SystemC    | compiles into
Level 3: Realization       | Machine code, hardware modules |

[Figure: exploration search area marked at the levels]

Page 43: Embedded Systems in Silicon TD5102 Introduction and overview

Design space exploration

[Figure: a design point at level n-1 is taken by the design transformation LT(n-1,n) to an exploration search area inside the realization space at level n; exploration at level n follows a cost curve over the search area towards the global optimum]

Page 44: Embedded Systems in Silicon TD5102 Introduction and overview

Design space exploration framework: another Y-chart

[Figure: software description AG(L,A,D) and hardware description RG(L,N,C), each subject to design transformations, feed the mapper & scheduler; analysis of the resulting design point produces statistics for exploration, which steers the design transformations and the mapping]

Page 45: Embedded Systems in Silicon TD5102 Introduction and overview

Design flow steps and constraints

[Figure: refinement steps lead from the idea (high abstraction level) to the realization (low abstraction level) through transformations, under architecture / platform constraints]

Page 46: Embedded Systems in Silicon TD5102 Introduction and overview

In which order should we perform the steps?

[Figure: decision trees for the two orders: step n followed by step n+1, and step n+1 followed by step n]

Page 47: Embedded Systems in Silicon TD5102 Introduction and overview

Well-known phase ordering examples

• Concurrency versus data management
  – e.g. loop partitioning versus array partitioning for a multiprocessor
• Scheduling versus register allocation
• Logic synthesis versus placement and routing

Page 48: Embedded Systems in Silicon TD5102 Introduction and overview

Rule of thumb!

• Perform the steps with the biggest impact first
• Biggest impact:
  – depends on your interest (= cost function)
  – min. E, P, E*D, D, Area, Npins, ...

Page 49: Embedded Systems in Silicon TD5102 Introduction and overview

Phase ordering example: why fix data storage/transfer before concurrency management issues?

Recursive image processing algorithm on local neighborhoods:
(for i : 0 .. I-1 ) ::
  (for j : 0 .. J-1 ) ::
    img[i][j] = f(img[i][j-k], old_img[i][j]);

[Figure: image of I rows by J columns]

Page 50: Embedded Systems in Silicon TD5102 Introduction and overview

Why fix data storage/transfer before concurrency management issues?

Unrolling the outer loop (i) M times:
• needs M J-word FIFOs (image lines)
• M data paths

[Figure: image of I rows by J columns]

14.4 mm² (0.7 µm)

Page 51: Embedded Systems in Silicon TD5102 Introduction and overview

Why fix data storage/transfer before concurrency management issues?

Unrolling the inner loop (j) (limited by k): M - 1 buffer registers

(i : 0 .. I-1 ) ::
  (j : 0 .. (J div 2)-1 ) ::
    img[i][2j] = f(img[i][2j-k], old_img[i][2j]);
    img[i][2j+1] = f(img[i][2j-k+1], old_img[i][2j+1]);

[Figure: image of I rows by J columns]
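The inner-loop unrolling can be checked on a single image row. The sketch makes several assumptions that are not on the slide: `f` is a hypothetical stand-in, the loops start at column k so that `j - k` stays in range, and an odd leftover column is handled separately.

```cpp
#include <vector>

// Hypothetical stand-in for the slide's f().
int f(int left, int old_pixel) { return (left + old_pixel) / 2; }

// Original row update: img[j] depends on img[j-k] (recursion along the row).
void row_rolled(std::vector<int>& img, const std::vector<int>& old_img, int k) {
    for (int j = k; j < static_cast<int>(img.size()); ++j)
        img[j] = f(img[j - k], old_img[j]);
}

// Inner loop unrolled twice. For k >= 2 the two statements in the body do
// not depend on each other, so two data paths can execute them in parallel.
void row_unrolled2(std::vector<int>& img, const std::vector<int>& old_img, int k) {
    const int n = static_cast<int>(img.size());
    int j = k;
    for (; j + 1 < n; j += 2) {
        img[j]     = f(img[j - k],     old_img[j]);
        img[j + 1] = f(img[j + 1 - k], old_img[j + 1]);
    }
    if (j < n) img[j] = f(img[j - k], old_img[j]);  // leftover column
}
```

Both functions produce the same row; the unroll factor is bounded by k because a larger body would contain statements that read values written earlier in the same body.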

Page 52: Embedded Systems in Silicon TD5102 Introduction and overview

Proposed System Design Methodology

[Figure: design flow with the boxes System Specification; System-Level Exploration and refinement; Optimized algorithms (C/C++ specification); SW/HW Partitioning/Exploration; Traditional (parallelizing) Compiler Steps; Code per (parallel) proc.; Architecture; HW Synthesis Steps; Structural VHDL Code]

Page 53: Embedded Systems in Silicon TD5102 Introduction and overview

[Figure: design flow stages]
• Concurrent OO spec
• Remove OO overhead
• Dynamic memory mgmt
• Task concurrency mgmt
• Static memory mgmt
• Address optimization
• SW/HW co-design
• SW design flow
• HW design flow

Page 54: Embedded Systems in Silicon TD5102 Introduction and overview


Page 55: Embedded Systems in Silicon TD5102 Introduction and overview

Object-based versus object-oriented

Object-oriented:
• Switch (get_state(), update()) refers through `switchable` to the interface Switchable (switch_on(), switch_off())
• Button (get_state()) and Lamp (switch_on(), switch_off()) are the concrete classes
• calls through function pointer
• cannot be inlined

Object-based:
• Button (get_state(), update()) refers through `lamp` directly to Lamp (switch_on(), switch_off())
• direct calls
• can be inlined

=> OO is good for specification, not for implementation
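The difference between the two styles can be sketched in C++. This is an illustrative reading of the class diagram, not the course's code: the toggle behaviour of `update()`, the `lit` and `state` members and the name `LampOO` are assumptions, and the Button subclass of the object-oriented variant is omitted for brevity.

```cpp
// Object-oriented: Switch drives any Switchable through virtual calls.
struct Switchable {
    virtual ~Switchable() = default;
    virtual void switch_on() = 0;
    virtual void switch_off() = 0;
};

struct LampOO : Switchable {
    bool lit = false;
    void switch_on() override { lit = true; }
    void switch_off() override { lit = false; }
};

struct Switch {
    Switchable* switchable;
    bool state = false;
    explicit Switch(Switchable* s) : switchable(s) {}
    bool get_state() const { return state; }
    void update() {  // toggles, then forwards through the vtable
        state = !state;
        if (state) switchable->switch_on(); else switchable->switch_off();
    }
};

// Object-based: Button owns a concrete Lamp; calls are direct and inlinable.
struct Lamp {
    bool lit = false;
    void switch_on() { lit = true; }
    void switch_off() { lit = false; }
};

struct Button {
    Lamp lamp;
    bool state = false;
    bool get_state() const { return state; }
    void update() {  // same behaviour, statically bound
        state = !state;
        if (state) lamp.switch_on(); else lamp.switch_off();
    }
};
```

In the object-based version the compiler sees the exact type of `lamp`, so `switch_on()` and `switch_off()` can be inlined into `update()`; in the object-oriented version that is only possible if the compiler can prove the dynamic type.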

Page 56: Embedded Systems in Silicon TD5102 Introduction and overview

Whole-system optimization techniques

• Aggressive use of traditional inter-procedural techniques
  – in the embedded world you often know the whole application!
• OO-specific optimization
• Data allocation optimization

Page 57: Embedded Systems in Silicon TD5102 Introduction and overview

Example: data inlining

class A:
B *b;
A() { b = new C; }
~A() { delete b; }
void f() { b->g(); }

class A’:
C b;
A(): b() {}
~A() {}
void f() { b.g(); }

[Figure: class B and class C]

Eliminate:
• dynamic allocation
• pointer de-reference
• polymorphic calls
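A compilable version of the data inlining example might look as follows. It is a sketch with labeled assumptions: C is taken to derive from B (implied by assigning `new C` to a `B*`), `g()` returns an `int` here only so the effect is observable, and `A2` stands in for A’, which is not a valid C++ identifier.

```cpp
struct B {
    virtual ~B() = default;
    virtual int g() { return 0; }
};

struct C : B {
    int g() override { return 42; }
};

// Before: heap allocation, pointer de-reference and a polymorphic call.
class A {
    B* b;
public:
    A() : b(new C) {}
    ~A() { delete b; }
    A(const A&) = delete;             // owning raw pointer: forbid copies
    A& operator=(const A&) = delete;
    int f() { return b->g(); }
};

// After data inlining: the C object is embedded in A2, so there is no
// allocation and the call target is known at compile time.
class A2 {
    C b;
public:
    int f() { return b.g(); }
};
```

The transformation is valid only when whole-program analysis shows that `b` always points to a C, which is the kind of knowledge an embedded application typically provides.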

Page 58: Embedded Systems in Silicon TD5102 Introduction and overview

Example: dynamic allocation removal

Before:
void teq(…, short size, …) {
  float* Ryy;
  Ryy = new float[size];
  … teq computation …
  delete[] Ryy;
}

teq(…,64,…);
…
teq(…,64,…);
…

After:
void teq(…,…) {
  float Ryy[64];
  … teq computation …
}

teq(…,…);
…
teq(…,…);
…

• Eliminate dynamic allocation
• Re-use stack memory already needed for other call tree branches

Page 59: Embedded Systems in Silicon TD5102 Introduction and overview

ADSL result: footprint -33%

[Figure: bar chart of the total memory footprint (code + data), axis marks at 200 kB and 400 kB; bars at 106%, 100%, 83%, 82% and 67%; configurations: Unoptimized; ARM C++ optimized (-O2 -Ospace); Inlining, dead code, constant prop.; Virtual call elimination; Data alloc. optim.]

Page 60: Embedded Systems in Silicon TD5102 Introduction and overview

• Data type refinement
• Virtual memory management

Page 61: Embedded Systems in Silicon TD5102 Introduction and overview

Data type refinement

ATM_cell * Data_In;
Association_Table * Routing_Table;

Routing_Table = new Association_Table();
Data_In = new ATM_cell();

if ( Routing_Table->Lookup(Data_In) ) ...

[Figure: implementation alternatives plotted against a power function and an area function (log scales, 10^0 to 10^4)]

Page 62: Embedded Systems in Silicon TD5102 Introduction and overview

Data type refinement: Array

ATM_cell * Data_In;
Array * Routing_Table;

Routing_Table = new Array ();
Data_In = new ATM_cell();

if ( Routing_Table->Lookup(Data_In) ) ...

[Figure: implementation alternatives plotted against a power function and an area function (log scales, 10^0 to 10^4); Array (AR): a row of data entries]

Page 63: Embedded Systems in Silicon TD5102 Introduction and overview

Data type refinement: Linked List

ATM_cell * Data_In;
Linked_List * Routing_Table;

Routing_Table = new Linked_List ();
Data_In = new ATM_cell();

if ( Routing_Table->Lookup(Data_In) ) ...

[Figure: implementation alternatives plotted against a power function and an area function (log scales, 10^0 to 10^4); Array (AR): a row of data entries; Linked List (LL): a chain of key/data nodes]

Page 64: Embedded Systems in Silicon TD5102 Introduction and overview

Data type refinement: Binary Tree

ATM_cell * Data_In;
Binary_Tree * Routing_Table;

Routing_Table = new Binary_Tree ();
Data_In = new ATM_cell();

if ( Routing_Table->Lookup(Data_In) ) ...

[Figure: implementation alternatives plotted against a power function and an area function (log scales, 10^0 to 10^4); Array (AR): a row of data entries; Linked List (LL): a chain of key/data nodes; Binary Tree (BT): a tree of key/data nodes]
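The point of the refinement slides is that the application code stays the same while the table implementation changes. A minimal sketch of that idea follows; the `id` field of `ATM_cell`, the `Insert` methods, the three table types and `route` are all hypothetical, and `std::set` merely stands in for a binary tree (it is typically implemented as a balanced one).

```cpp
#include <algorithm>
#include <list>
#include <set>
#include <vector>

struct ATM_cell { int id; };  // hypothetical: the slides do not show the fields

struct ArrayTable {  // AR: direct indexing, one access, size grows with key range
    std::vector<bool> present;
    explicit ArrayTable(int n) : present(n, false) {}
    void Insert(int id) { present.at(id) = true; }
    bool Lookup(const ATM_cell& c) const {
        return c.id >= 0 && c.id < static_cast<int>(present.size()) && present[c.id];
    }
};

struct ListTable {  // LL: compact, but a lookup walks the list
    std::list<int> keys;
    void Insert(int id) { keys.push_back(id); }
    bool Lookup(const ATM_cell& c) const {
        return std::find(keys.begin(), keys.end(), c.id) != keys.end();
    }
};

struct TreeTable {  // BT: logarithmic lookup, pointer overhead per node
    std::set<int> keys;
    void Insert(int id) { keys.insert(id); }
    bool Lookup(const ATM_cell& c) const { return keys.count(c.id) != 0; }
};

// The application code is identical for every implementation alternative.
template <class Table>
bool route(const Table& routing_table, const ATM_cell& data_in) {
    return routing_table.Lookup(data_in);
}
```

Choosing the table type at compile time keeps the calls direct, in line with the object-based style, while the power and area of each alternative still have to be measured for the actual key distribution.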

Page 65: Embedded Systems in Silicon TD5102 Introduction and overview


Going from specification concurrency to implementation concurrency

Page 66: Embedded Systems in Silicon TD5102 Introduction and overview


Modelling MTG*

Page 67: Embedded Systems in Silicon TD5102 Introduction and overview

TCM transformations

Why transformations?
  – shift existing Pareto curves
  – create new points on the Pareto curves
  – improve available task-level parallelism

[Figure: two plots of power versus cycle budget]

Page 68: Embedded Systems in Silicon TD5102 Introduction and overview

TCM Transformations

[Figure: shared memory area (MA) versus cycle budget for tasks T1 to T6, with four schedules: independent, dynamic tasks assigned to 1 processor; tasks freely assigned to 2 processors (P1, P2); task order constrained to reduce memory requirements; partial order constraints with a ‘conflict’ on HW1, giving less memory]

Page 69: Embedded Systems in Silicon TD5102 Introduction and overview


DTSE: data transfer and storage exploration

Page 70: Embedded Systems in Silicon TD5102 Introduction and overview

Static data memory management (DMM)

[Figure: chip with processor data paths, L1 cache, L2 cache, cache & bank recombine, local latch 1 + bank 1 to local latch N + bank N, and off-chip SDRAM]

1 Reduce redundant transfers
2 Introduce locality
3 Exploit memory hierarchy
4 Avoid N-port memories
5 Meet real-time constraints
6 Exploit limited life-time and data layout freedom

Page 71: Embedded Systems in Silicon TD5102 Introduction and overview

DMM: how to improve locality?

[Figure: chip with processor data paths, L1 cache, L2 cache, cache & bank recombine, local latch 1 + bank 1 to local latch N + bank N, and off-chip SDRAM]

Introduce locality

Before:
FOR i:=1 TO N DO B[i]:=f(A[i]);
FOR i:=1 TO N DO C[i]:=g(B[i]);

After:
FOR i:=1 TO N DO {
  B[i]:=f(A[i]);
  C[i]:=g(B[i]);
}
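The loop fusion shown above can be written in C++ as follows. The bodies of `f` and `g` are hypothetical stand-ins, and the fused version additionally assumes that B is not needed after the loop, so the intermediate value can live in a scalar.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the slide's f() and g().
int f(int a) { return a + 1; }
int g(int b) { return 2 * b; }

// Two separate loops: all of B is written to memory before any of it is read.
std::vector<int> two_loops(const std::vector<int>& A) {
    std::vector<int> B(A.size()), C(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) B[i] = f(A[i]);
    for (std::size_t i = 0; i < A.size(); ++i) C[i] = g(B[i]);
    return C;
}

// Fused loop: each B value is consumed right after it is produced, so it can
// stay in a register instead of travelling through a full array.
std::vector<int> fused(const std::vector<int>& A) {
    std::vector<int> C(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        const int b = f(A[i]);  // assumption: B is not used after the loop
        C[i] = g(b);
    }
    return C;
}
```

Both functions return the same C; the fused form removes N writes and N reads of B from the memory traffic.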

Page 72: Embedded Systems in Silicon TD5102 Introduction and overview

Exploiting Memory Hierarchy

[Figure: processor data paths with a register file and a hierarchy of memories M'': 100% of the accesses (#A) at P = 0.01, 10% at P = 0.1 and 1% at P = 1]

P (before) = 100%
P (after) = 100% * 0.01 + 10% * 0.1 + 1% * 1 = 3%

Page 73: Embedded Systems in Silicon TD5102 Introduction and overview

How to Avoid N-port Memories?

[Figure: the accesses R(A), R(B) and W(C) scheduled against a single memory holding A, B, C, compared with two memories holding A, C and B]

Page 74: Embedded Systems in Silicon TD5102 Introduction and overview


Page 75: Embedded Systems in Silicon TD5102 Introduction and overview

Algebraic Transformations and Aggressive Code Hoisting for Expression Elimination

Initial:
for(y=0..9; y++) {
  for(x=0..99; x++) {
    if (x>1) A[ (y%3)*3 + (x-2)%3 ]=...
    if (x>4) ...=A[ (y%3)*3 + (x-5)%3 ];
  }
}

Optimised-1st (X3 less cost):
for(y=0..9; y++) {
  v_y = (y%3)*3;
  for(x=0..99; x++) {
    v_yx = (x-2)%3+v_y;
    if (x>1) A[v_yx] = …;
    if (x>4) … = A[v_yx];
  }
}

Page 76: Embedded Systems in Silicon TD5102 Introduction and overview

Modulo substitution for piece-wise linear addressing

Optimised-1st:
for(y=0..9; y++) {
  v_y = (y%3)*3;
  for(x=0..99; x++) {
    v_yx = (x-2)%3+v_y;
    if (x>1) A[v_yx] = …;
    if (x>4) … = A[v_yx];
  }
}

Optimised-2nd (X2 less cost):
for (p_y=0, y=0..9; y++) {
  if (p_y>=9) p_y -= 9;
  for (p_x=1, x=0..99; x++) {
    if (p_x>=3) p_x -= 3;
    v_yx = p_x + p_y;
    if (x>1) A[v_yx] = …;
    if (x>4) … = A[v_yx];
    p_x++;
  }
  p_y += 3;
}
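That the counter-based addressing matches the original modulo expressions can be checked by enumerating the addresses. The sketch below does this for the write access (x > 1); the function names are hypothetical and the loop bounds follow the slide's `0..9` and `0..99` ranges, read as inclusive.

```cpp
#include <vector>

// Addresses of the write A[(y%3)*3 + (x-2)%3] in the initial code.
std::vector<int> addresses_initial() {
    std::vector<int> out;
    for (int y = 0; y <= 9; ++y)
        for (int x = 0; x <= 99; ++x)
            if (x > 1) out.push_back((y % 3) * 3 + (x - 2) % 3);
    return out;
}

// Same addresses with both modulo operations replaced by wrap-around counters.
std::vector<int> addresses_counters() {
    std::vector<int> out;
    int p_y = 0;
    for (int y = 0; y <= 9; ++y) {
        if (p_y >= 9) p_y -= 9;  // p_y follows (y % 3) * 3
        int p_x = 1;             // value of (x - 2) mod 3 at x = 0
        for (int x = 0; x <= 99; ++x) {
            if (p_x >= 3) p_x -= 3;  // p_x follows (x - 2) mod 3
            if (x > 1) out.push_back(p_x + p_y);
            ++p_x;
        }
        p_y += 3;
    }
    return out;
}
```

The two functions return identical sequences of 980 addresses, so each modulo operation is replaced by an increment, a compare and an occasional subtraction.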

Page 77: Embedded Systems in Silicon TD5102 Introduction and overview

What do we gain? Running example: cavity detection

• Application domain:
  – Computer Tomography in medical imaging
• Algorithm:
  – Cavity detection in CT scans
  – Detect dark regions in successive images
  – Indicate cavity in brain

Bad news for owner of brain

Page 78: Embedded Systems in Silicon TD5102 Introduction and overview

Starting point

Reference (conceptual) C code for the algorithm
  – all functions: image_in[N x M] at t-1 -> image_out[N x M] at t
  – new value of a pixel depends on its neighbors
  – neighbor pixels read from background memory
  – approximately 110 lines of C code (ignoring file I/O etc.)
  – experiments with N x M = 640 x 400 pixels
  – straightforward implementation: 6 image buffers

[Figure: processing chain with the functions GaussBlur x, GaussBlur y, ComputeEdges, Reverse, DetectRoots and MaxValue]

Page 79: Embedded Systems in Silicon TD5102 Introduction and overview

Cavity Detector Results

[Figure: bar chart (axis 0 to 600) of accesses, size and cycles after each step: Original, DF trafo, Loop trafo, Data reuse, In-place, Data layout, ADOPT - modulo, ADOPT - rest]

Page 80: Embedded Systems in Silicon TD5102 Introduction and overview

Lecture 1: Introduction
• Trends
• Platforms
• Application mapping
• Design flow
• Summary

Page 81: Embedded Systems in Silicon TD5102 Introduction and overview

Summary

• Billions of embedded systems, everywhere!!!
• Multimedia applications become extremely complex and dynamic
• Time-to-market pressure
• Solution:
  – Platforms as design target (raise abstraction level)
  – Advanced embedded system design flow needed

Page 82: Embedded Systems in Silicon TD5102 Introduction and overview

Traditional Design Methodology

[Figure: design flow with the boxes System Specification; SW/HW Partitioning/Exploration; (SW System Exploration); Optimized SW spec (C specification); Traditional (parallelizing) Compiler Steps; Code per (parallel) proc.; HW System Exploration; Optimized HW spec (VHDL specification); Architecture; HW Synthesis Steps; Structural VHDL Code]

Page 83: Embedded Systems in Silicon TD5102 Introduction and overview

Proposed System Design Methodology

[Figure: design flow with the boxes System Specification; System-Level Exploration and refinement (our main focus); Optimized algorithms (C/C++ specification); SW/HW Partitioning/Exploration; Traditional (parallelizing) Compiler Steps; Code per (parallel) proc.; Architecture; HW Synthesis Steps; Structural VHDL Code]