reliable dynamic embedded data processing systemstbasten/presentations/ipa20091126.pdf ·...

Reliable Dynamic Embedded Data Processing Systems

Twan Basten

‘Knowing is not understanding.’

Charles Kettering

Twan Basten

Joint work with

Marc Geilen, AmirHossein Ghamarian, Hamid Shojaei, Sander Stuijk, Bart Theelen, Yang Yang

and others

Funding:

NWO PROMES

EC FP6 Betsy, FP7 MNEMEE

SenterNovem Octopus

2 Embedded Data Processing Systems

[Figure: example products from Sony, Philips, ASML, Océ, Thales and Apple]

5 Trend: Dynamic Behavior

• Multiprocessor systems / networking / concurrency

• Application convergence / application evolution

• Intra-application dynamism

• Inter-application dynamism

What is the optimal configuration?

- given applications starting/stopping over time

- given data-dependent execution times

- given varying communication bandwidth

- given battery status

- …

6

Intra-application dynamism

Dataflow analysis

7 Dataflow Analysis: Challenges

predictability and efficiency

… are contradictory

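8 Synchronous Data Flow Graphs

(SDFGs [Lee 1986])

[Figure: example SDFG with actors A, B and C (execution times 1, 2 and 2). A produces 2 tokens per firing and B consumes 3; B produces 1 token per firing and C consumes 2. Labels in the figure: actor, execution time, channel, rate, token.]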

9 Scenario-aware MPEG-4 SP Decoder Model

(FSM-based Scenario-Aware DataFlow (SADF))

[Figure: SADF graph with kernels VLD, IDCT, MC and RC and detector FD, connected by data channels and control channels with fixed rates and parameterized rates a to e. Scenario labels I, P0 and P99 appear in the figure. Legend: kernel, detector, data channel, control channel, fixed rate, parameterized rate, tokens.]

Rate    I     Px    P0
a       0     1     0
b       0     x     0
c       99    x     1
d       1     1     0
e       99    x     0

x = {30,40,50,60,70,80,99}

Execution time

Actor   Scenario         Time
VLD     P0               0
VLD     others           40
IDCT    P0               0
IDCT    others           17
MC      I, P0            0
MC      P30              90
MC      P40              145
MC      P50              190
MC      P60              235
MC      P70              265
MC      P80              310
MC      P99              390
RC      I                350
RC      P0               0
RC      P30, P40, P50    250
RC      P60              300
RC      P70, P80, P99    320
FD      All              0

MEMOCODE 2006

10

Throughput analysis for SDF

ACSD 2006

11 Synchronous Data Flow Graphs

(SDFGs [Lee 1986])

[Figure: example SDFG with actors A, B and C (execution times 1, 2 and 2). A produces 2 tokens per firing and B consumes 3; B produces 1 token per firing and C consumes 2. Labels in the figure: actor, execution time, channel, rate, token.]

Throughput: average number of actor firings over time

Iteration: smallest non-empty set of actor firings that does not change the token distribution (example: A:3, B:2, C:1)
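
The iteration in the example (A:3, B:2, C:1) follows from balancing production and consumption on every channel. A minimal sketch of that calculation, assuming the example graph has a channel from A to B with rates 2 and 3 and a channel from B to C with rates 1 and 2; the function name `repetition_vector` and the data layout are illustrative choices, not taken from the slides or from the SDF3 toolset:

```python
from fractions import Fraction
from math import lcm

# Assumed channels of the example graph:
# (producer, production rate, consumer, consumption rate).
CHANNELS = [("A", 2, "B", 3), ("B", 1, "C", 2)]


def repetition_vector(channels):
    """Smallest positive firing counts that leave every channel's token count unchanged."""
    ratio = {channels[0][0]: Fraction(1)}  # relative firing rate per actor
    pending = list(channels)
    while pending:
        progressed = False
        for channel in list(pending):
            src, prod, dst, cons = channel
            if src in ratio and dst not in ratio:
                ratio[dst] = ratio[src] * prod / cons
            elif dst in ratio and src not in ratio:
                ratio[src] = ratio[dst] * cons / prod
            elif src in ratio and dst in ratio:
                # Both ends known: the balance equation must already hold.
                if ratio[src] * prod != ratio[dst] * cons:
                    raise ValueError("inconsistent rates: no iteration exists")
            else:
                continue  # neither end reached yet, retry in a later pass
            pending.remove(channel)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the rational rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in ratio.values()))
    return {actor: int(r * scale) for actor, r in ratio.items()}


print(repetition_vector(CHANNELS))
```

For the assumed rates this yields three firings of A, two of B and one of C per iteration: A produces 3 × 2 = 6 tokens and B consumes 2 × 3 = 6, while B produces 2 × 1 = 2 tokens and C consumes 1 × 2 = 2.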

12 SDF Throughput Analysis

Self-timed execution:

[Figure: state space of the self-timed execution of the example SDFG, with states labelled by the active actors (A, B, C); a transient phase is followed by a periodic phase]

Throughput can be calculated from the periodic phase

Efficient implementation: consider one designated firing per iteration only to detect a recurrent state

13 Throughput Analysis: Our Result

Runtimes of various throughput analysis methods (in seconds)

                 State       Traditional methods
                 Space       Dasdan-Gupta   Howard      Young-Tarjan-Orlin
MP3 dec.         1·10^-3     1·10^-3        1·10^-3     1·10^-3
Modem            1·10^-3     82·10^-3       81·10^-3    81·10^-3
Sample Rate      2·10^-3     >1800          >1800       >1800
Satellite        56·10^-3    >1800          >1800       >1800
H.263 decoder    10·10^-3    >1800          >1800       >1800

14

Storage-throughput trade-offs

IEEE T Comp 2008

15 Lessons to be Learned

• Repeated application of throughput analysis

• Causal dependencies potentially reducing performance

16 Trade-off: Storage vs. Throughput

Tokens on channels must be stored in memory.

• Separate memory for each channel.

• Storage distribution, for example, ⟨α, β⟩ → ⟨4,2⟩

[Figure: the example SDFG (actors A, B and C with execution times 1, 2 and 2) with its two channels labelled α and β]

17 Trade-off: Storage vs. Throughput

[Figure: throughput (0 to 0.25) plotted against distribution size (5 to 11) for the example SDFG, with the storage distributions ⟨4,2⟩, ⟨6,2⟩, ⟨5,3⟩, ⟨6,3⟩ and ⟨7,3⟩ marked]

Find all minimal storage distributions for any possible throughput
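
How a storage distribution limits throughput can be reproduced with a small self-timed simulation that stops at the first recurrent state. The sketch below is not the SDF3 implementation. It assumes the example graph (A to B with rates 2 and 3 on α, B to C with rates 1 and 2 on β), no concurrent firings of the same actor, space claimed on output channels when a firing starts, and input space released when a firing ends. Throughput is counted in firings of C per time unit.

```python
from fractions import Fraction

EXEC_TIME = {"A": 1, "B": 2, "C": 2}
# Assumed channels: (name, producer, production rate, consumer, consumption rate).
CHANNELS = [("alpha", "A", 2, "B", 3), ("beta", "B", 1, "C", 2)]


def throughput(capacity):
    """Firings of C per time unit in the periodic phase of a self-timed execution."""
    tokens = {name: 0 for name, *_ in CHANNELS}
    space = dict(capacity)                         # free space per channel
    remaining = {actor: 0 for actor in EXEC_TIME}  # 0 means the actor is idle
    seen = {}                                      # state -> (time, firings of C so far)
    time = 0
    c_firings = 0
    while True:
        # Start every idle actor whose input tokens and output space are available.
        for actor in EXEC_TIME:
            ins = [(n, c) for n, _, _, dst, c in CHANNELS if dst == actor]
            outs = [(n, p) for n, src, p, _, _ in CHANNELS if src == actor]
            if (remaining[actor] == 0
                    and all(tokens[n] >= c for n, c in ins)
                    and all(space[n] >= p for n, p in outs)):
                for n, c in ins:
                    tokens[n] -= c
                for n, p in outs:
                    space[n] -= p
                remaining[actor] = EXEC_TIME[actor]
        # A recurrent state closes the periodic phase (a deadlock recurs with no firings).
        state = (tuple(tokens.values()), tuple(space.values()), tuple(remaining.values()))
        if state in seen:
            first_time, first_count = seen[state]
            return Fraction(c_firings - first_count, time - first_time)
        seen[state] = (time, c_firings)
        # Advance one time unit; finished firings produce tokens and release space.
        time += 1
        for actor in EXEC_TIME:
            if remaining[actor] > 0:
                remaining[actor] -= 1
                if remaining[actor] == 0:
                    for n, src, p, dst, c in CHANNELS:
                        if src == actor:
                            tokens[n] += p
                        if dst == actor:
                            space[n] += c
                    if actor == "C":
                        c_firings += 1


for alpha, beta in [(4, 2), (7, 3)]:
    print((alpha, beta), throughput({"alpha": alpha, "beta": beta}))
```

Under these assumptions ⟨4,2⟩ gives a throughput of 1/7 and ⟨7,3⟩ gives 1/4, the minimum positive and maximum throughput reported for the example in the experimental results. A distribution that is too small, such as ⟨3,2⟩, deadlocks and gives 0.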

18 Dependency Graph

Storage distribution ⟨α, β⟩ → ⟨4,2⟩

[Figure: the example SDFG and its dependency graph over the firings in one period (A2, A3, A4, B0, B1, C0); edges are labelled tk(α), tk(β), sp(α), sp(β) and tk(A)]

dependency: an actor firing start may depend on an actor firing end

cycles denote storage dependencies

sp - space

tk - token

19 Dependency Graph

[Figure: dependency graph over the firings in one period (A2, A3, A4, B0, B1, C0); edges are labelled tk(α), tk(β), sp(α), sp(β) and tk(A)]

dependency: an actor firing start may depend on an actor firing end

cycles denote storage dependencies

resolving storage dependencies may increase throughput

sp - space

tk - token

20 Abstract Dependency Graph

[Figure: abstract dependency graph over the actors A, B and C with edges labelled tk(α), tk(β), sp(α), sp(β) and tk(A), shown next to the dependency graph over the firings A2, A3, A4, B0, B1 and C0]

dependencies between actors

abstract dependency graph contains all storage dependencies

easily constructed from state-space exploration

dependency graph may be very large

21 Experimental Results

                     Example   Satellite   MP3      H.263
#actors              3         22          13       4
#channels            2         26          12       3
Min throughput > 0   1/7       0.18        1/137    1/21876
Distr. size          6         1542        12       4753
Max throughput       1/4       0.23        1/132    1/10000
Distr. size          10        1544        16       8006
#Pareto points       4         2           3        3254
#Distributions       5         2           3        3254
Exec. time           1ms       7ms         2ms      53min

Approximation: increase buffers with n times the minimal step size

22 Trade-off Analysis

• Start small, recursively resolve bottlenecks

• Fast in general, approximation possible

• Generalizable to other resources, computational models

23

Scenarios

24 Scenario-aware Throughput Analysis

• SDF throughput analysis considers worst-case actor execution times

• Scenario-aware throughput analysis considers worst-case execution times per scenario, but it must consider scenario transitions

• Analyzing all scenario transitions separately can be avoided

• Separate scenario transitions with an invariant reference schedule

ACM TECS 2010

MP3 Result

• SDF model: 9.28 Mcycles per frame

• SADF model: 5.83 Mcycles per frame

[Figure: scenarios M, S-S, S-L, L-S, L-L]

26 Pointers

• Yang Yang: shared resources, trade-off analysis

• Sander Stuijk: SDF3 toolset, http://www.es.ele.tue.nl/sdf3

• Future: shared resources + scenarios + trade-off analysis

27

Inter-application dynamism

Run-time adaptation

DAC 2009

28 Run-time Application Management

- Applications starting/stopping over time

- 6 procs, slow (ck1) and fast clocks (ck2)

- Minimize energy under time constraints

[Figure: Gantt chart and plots of energy against completion time; the points are labelled with pairs such as (3,ck1) and (5,ck2). The trade-off spaces are combined by Join, Const ("enforce the same clock") and Min, abbreviated JCM.]

29 Pareto Algebra

Goal: Compositional computation of trade-offs

30 Pareto-Algebraic Characterization

… of run-time adaptation

… select and configure

product - derive | constrain | abstract - minimize

[Figure: the component trade-off spaces are combined by a product (X); derive | constrain | abstract and min then yield the system trade-off space (at the time of a change), which is used to adapt]
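
The product - derive | constrain | abstract - minimize pipeline can be illustrated on two small trade-off spaces. This is a sketch under stated assumptions, not the Pareto calculator's API: the configuration data are hypothetical, each configuration is a tuple (completion time, energy, processors used) where smaller is better, and the composition rule (times combine by max, energy and processors by sum, at most 6 processors) is an illustrative choice.

```python
from itertools import product as cartesian


def minimize(points):
    """Keep the Pareto points: those not dominated by another point (smaller is better)."""
    def dominates(a, b):
        return a != b and all(x <= y for x, y in zip(a, b))
    unique = set(points)
    return sorted(p for p in unique if not any(dominates(q, p) for q in unique))


def product(left, right):
    """Free product: every combination of a left and a right configuration."""
    return [l + r for l, r in cartesian(left, right)]


def derive(points, fn):
    """Compute new quantities from each configuration; also used here to abstract."""
    return [fn(p) for p in points]


def constrain(points, ok):
    """Drop configurations that violate a constraint."""
    return [p for p in points if ok(p)]


# Hypothetical component trade-off spaces: (completion time, energy, processors used).
app1 = [(10, 2, 1), (6, 5, 2), (4, 9, 4)]
app2 = [(8, 3, 1), (5, 6, 3)]


def compose(a, b, max_procs=6):
    combined = product(a, b)  # 6-tuples: app1 fields followed by app2 fields
    combined = derive(combined, lambda p: (max(p[0], p[3]), p[1] + p[4], p[2] + p[5]))
    combined = constrain(combined, lambda p: p[2] <= max_procs)
    combined = derive(combined, lambda p: p[:2])  # abstract: hide the processor count
    return minimize(combined)


print(compose(app1, app2))
```

With these numbers the six combinations reduce to three Pareto points, (6, 11), (8, 8) and (10, 5). The pairwise dominance check in `minimize` is quadratic in the size of its input, which is why minimizing early and keeping the sets small matters.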

32 Complexity

n components with p Pareto points each

take a product and on-the-fly derive | constrain | abstract - minimize

O(p^{2n})

[Figure: the component trade-off spaces are combined by a product (X); constrain, derive, abstract and min then yield the system trade-off space (at the time of a change), which is used to adapt]

34 Complexity Control

keep n and p small

• Compositional reasoning, among others

• Approximate Pareto sets

[Figure: approximation of a Pareto set in the x-y plane using the lines y/x = c2 and y/x = c4]

35 Multi-dimensional Multiple-choice Knapsack Problem

MMKP

- One optimization objective (value)

- Multiple resource dimensions with capacity constraints

- Multiple independent applications

- Multiple independent configurations per application

Pick one configuration per application optimizing value within constraints

NP hard


36 A Parameterized Compositional Heuristic

• Project all resource dimensions into one dimension

• Approximate Pareto set in 2-dimensional space

[Figure: Pareto set in the x-y plane approximated using the lines y/x = c2 and y/x = c4]

product-constrain-derive-abstract-minimize

parameter p: maintain (at most) p Pareto points

Allows budgeting of analysis time

analysis time ≤ c · n · p^{2}

(n: number of applications; c: platform constant)
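
A simplified sketch of such a compositional MMKP heuristic follows. It is not the authors' exact algorithm. The projection (sum of usage divided by capacity per dimension), the thinning rule (evenly spread points that always include the highest value) and the instance data are all assumptions made for illustration. Applications are added one at a time, and after each step at most p partial solutions are kept.

```python
def pareto_2d(partials):
    """Pareto points for (maximize value, minimize projected resource use)."""
    best, front = float("-inf"), []
    for item in sorted(partials, key=lambda s: (s[1], -s[0])):
        if item[0] > best:  # more resource use is only worth keeping for more value
            front.append(item)
            best = item[0]
    return front


def thin(front, p):
    """Keep at most p points, evenly spread and always including the highest value."""
    if len(front) <= p:
        return front
    if p == 1:
        return [front[-1]]
    step = (len(front) - 1) / (p - 1)
    return [front[round(i * step)] for i in range(p)]


def mmkp_heuristic(apps, capacity, p):
    """apps: per application a list of (value, usage vector). Returns (value, choices) or None."""
    # A partial solution is (value, projected use, usage vector, chosen configuration indices).
    partials = [(0, 0.0, (0,) * len(capacity), ())]
    for configs in apps:
        combined = []
        for value, _, usage, choices in partials:               # product
            for index, (v, u) in enumerate(configs):
                total = tuple(a + b for a, b in zip(usage, u))
                if all(t <= c for t, c in zip(total, capacity)):  # constrain
                    projected = sum(t / c for t, c in zip(total, capacity))  # derive
                    combined.append((value + v, projected, total, choices + (index,)))
        partials = thin(pareto_2d(combined), p)                 # abstract and minimize
        if not partials:
            return None  # no feasible selection was found
    best = max(partials, key=lambda s: s[0])
    return best[0], best[3]


# Hypothetical instance: three applications, two resource dimensions.
apps = [
    [(3, (2, 1)), (5, (4, 3))],
    [(2, (1, 1)), (6, (3, 4))],
    [(1, (1, 1)), (4, (2, 2))],
]
capacity = (7, 6)
print(mmkp_heuristic(apps, capacity, p=4))
```

Each of the n steps combines at most p partial solutions with the configurations of one application, so when applications offer on the order of p configurations the work per step is on the order of p^2. This is one way to read the budget c · n · p^{2}. Because the projection discards information, the result is not guaranteed to be optimal; on this small instance it happens to select the best feasible combination.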

38 Controlling Time Budgets

[Figure: values (axis from 50 to 100) obtained for the benchmark instances I1 to I6 under time budgets of 1 ms, 5 ms, 20 ms and 100 ms, with the budgets controlled through parameter p]

• MMKP benchmark: www.laria.u-picardie.fr/hifi/OR-Benchmark/MMKP

• Always within budget

• Good values

• Similar results to the IMEC, SOC 2006, non-compositional state-of-the-art heuristic

SimIt-ARM simulator, 206 MHz StrongARM

39 Compositionality

• Applications start at various times

• 5 at a time (a), 1 at a time (b)

[Figure: two plots of values (axis 0 to 1) against time (ms), one with a time axis from 0 to 20 and one from 0 to 24, comparing the IMEC heuristic with the Pareto-based heuristic as the number of active applications grows; data labels in the plots: 768, 768, 1576, 1576, 2353, 2353, 3224, 3705, 3224, 3900]

values always at least as good as IMEC heuristic

what about applications stopping ?

40 Conclusions

• Goal: reliable, resource efficient data processing systems

• Dynamic behavior complicates analysis

• Intra-application dynamism: scenario-aware dataflow analysis and design-space exploration

• Inter-application dynamism: Pareto-algebra-based run-time adaptation

41 Thank you !

More info: www.es.ele.tue.nl/~tbasten/

SDF3 toolkit: www.es.ele.tue.nl/sdf3/

Pareto calculator: www.es.ele.tue.nl/pareto/

Questions ?

‘An understanding of the natural world and what's in it

is a source of not only a great curiosity but great fulfillment.’

David Attenborough
