reliable dynamic embedded data processing systemstbasten/presentations/ipa20091126.pdf ·...
TRANSCRIPT
Reliable Dynamic Embedded Data Processing Systems
Twan Basten
‘Knowing is not understanding.’
Charles Kettering
Twan Basten
Joint work with
Marc Geilen, AmirHossein Ghamarian, Hamid Shojaei, Sander Stuijk, Bart Theelen, Yang Yang
and others
Funding:
NWO PROMES
EC FP6 Betsy, FP7 MNEMEE
SenterNovem Octopus
Embedded Data Processing Systems2
sony
22
philips
asml
océ
thales
sony
apple
3 Embedded Data Processing Systems
sonysony
4 Embedded Data Processing Systems
Trend: Dynamic Behavior5
• Multiprocessor systems / networking / concurrency
• Application convergence / application evolution
• Intra-application dynamism
• Inter-application dynamism
What is the optimal configuration?
- given applications starting/stopping over time
- given data-dependent execution times
- given varying communication bandwidth
- given battery status
- …
6
Intra-application dynamism
Dataflow analysis
7 Dataflow Analysis: Challenges
predictability and efficiency
… are contradictory
8 Synchronous Data Flow Graphs
A,1 B,2 C,22 3 1 2
2 3 1 2
1 1 1 1 1 1
4 2
channelratetoken
execution time
actor
(SDFGs [Lee 1986])
1 1
1
1 1
1
1 1
1
9 Scenario-aware MPEG-4 SP Decoder Model
1
1VLD IDCT
RCMCFD1
1
1
1
11
1 e
ddd
a
bc c
Kernel
Parameterized
rate
Control channelData
channel
Fixed rate
Execution time
VLDP0 0
others 40
IDCTP0 0
others 17
I, P0 0
P30 90
P40 145
P 190
(FSM-based Scenario-Aware DataFlow (SADF))
RCMCFD
3
1 11
Detector Tokens
I P99 P99
P0
...
1
Rate I Px P0
a 0 1 0
b 0 x 0
c 99 x 1
d 1 1 0
e 99 x 0
MCP50 190
P60 235
P70 265
P80 310
P99 390
RC
I 350
P0 0
P30, P40, P50 250
P60 300
P70, P80, P99 320
FD All 0
x={30,40,50,60,70,80,99}
MEMOCODE 2006
10
Throughput analysis for SDF
ACSD 2006
11 Synchronous Data Flow Graphs
A,1 B,2 C,22 3 1 2
2 3 1 2
1 1 1 1 1 1
4 2
channelratetoken
execution time
actor
(SDFGs [Lee 1986])
1 1
1
1 1
1
1 1
1
Throughput: average number of actor firings over time
Iteration: smallest non-empty set of actor firings that does not change
the token distribution (example: A:3, B:2, C:1)
12 SDF Throughput Analysis
A
A, C
A, CB
B
A
Periodic PhaseTransient Phase
Self-timed execution:
A
BB
Throughput can be calculated from the periodic phase
Efficient implementationConsider one designated firing per iteration only to detect a recurrent state
13 Throughput Analysis: Our Result
State
Space
Dasdan
Gupta
Howard Young
Tarjan
Orlin
MP3 dec. 1. 10-3 1. 10-3 1. 10-3 1. 10-3
Modem 1.10-3 82. 10-3 81. 10-3 81. 10-3
Traditional methods
Modem 1.10 82. 10 81. 10 81. 10
Sample Rate 2. 10-3 >1800 >1800 >1800
Satellite 56. 10-3 >1800 >1800 >1800
H.263 decoder 10. 10-3 >1800 >1800 >1800
Runtimes of various throughput analysis methods (in seconds)
14
Storage-throughput trade-offs
IEEE T Comp 2008
15 Lessons to be Learned
• Repeated application of throughput analysis
• Causal dependencies potentially reducing performance
16 Trade-off: Storage vs. Throughput
• Separate memory for each channel.
• Storage distribution, for example, ⟨α, β⟩ → ⟨4,2⟩
A,1 B,2 C,22 3 1 2α β
Tokens on channels must be stored in memory.
1 1
1
1 1
1
1 1
1
• Storage distribution, for example, ⟨α, β⟩ → ⟨4,2⟩
17 Trade-off: Storage vs. Throughput
0.2
0.25
thro
ug
hp
ut
⟨⟨⟨⟨4,2⟩⟩⟩⟩⟨⟨⟨⟨6,2⟩⟩⟩⟩
⟨⟨⟨⟨6,3⟩⟩⟩⟩
⟨⟨⟨⟨7,3⟩⟩⟩⟩
⟨⟨⟨⟨5,3⟩⟩⟩⟩
1 1
1
1 1
1
1 1
1
A,1 B,2 C,22 3 1 2α β
5 6 7 8 9 10 110
0.05
0.1
0.15
0.2
thro
ug
hp
ut
distribution size
⟨⟨⟨⟨4,2⟩⟩⟩⟩⟨⟨⟨⟨6,2⟩⟩⟩⟩
Find all minimal storage distributions for any possible throughput
18
sp(β)
Dependency Graph
1 1
1
1 1
1
1 1
1
A,1 B,2 C,22 3 1 2α β
Storage distribution ⟨α, β⟩ → ⟨4,2⟩
dependency:firings in one
tk(α)
tk(β)
sp(α)
sp(β)
tk(A)
tk(α)sp(α) A3 A4
B0
C0
A2 B1
cycles denotestorage
dependencies
sp - space
tk - token
an actor firing start may depend on
an actor firing end
firings in one
period
19
sp(β)
Dependency Graph
resolving storage dependencies may increase throughput
dependency:firings in one
tk(α)
tk(β)
sp(α)
sp(β)
tk(A)
tk(α)sp(α) A3 A4
B0
C0
A2 B1
cycles denotestorage
dependencies
sp - space
tk - token
an actor firing start may depend on
an actor firing end
firings in one
period
20
sp(β)
Abstract Dependency Graph
tk(α) tk(β)
sp(β)
tk(A)
sp(α)
A B C
abstract dependency graph contains all storage
dependencies
dependencies between actors
tk(α)
tk(β)
sp(α)
sp(β)
tk(A)
tk(α)sp(α) A3 A4
B0
C0
A2 B1
dependencies
easily constructed from state-space exploration
dependency graphmay be very large
21 Experimental Results
Example Satellite MP3 H.263
#actors 3 22 13 4
#channels 2 26 12 3
min throughput > 0 1/7 0.18 1/137 1/21876
Distr. size 6 1542 12 4753Distr. size 6 1542 12 4753
Max throughput 1/4 0.23 1/132 1/10000
Distr. size 10 1544 16 8006
#Pareto points 4 2 3 3254
#Distributions 5 2 3 3254
Exec. time 1ms 7ms 2ms 53min
Approximation
increase buffers with n times the minimal step size
22 Trade-off Analysis
• Start small, recursively resolve bottlenecks
• Fast in general, approximation possible
• Generalizable to other resources, computational models
23
Scenarios
24 Scenario-aware Throughput Analysis
• SDF throughput analysis considers worst-case actor execution times
• Scenario-aware throughput analysis considers worst-case execution times per scenario, but it must consider scenario transitionstimes per scenario, but it must consider scenario transitions
• Analyzing all scenario transitions separately can be avoided
• Separate scenario transitions with an invariant reference schedule
ACM TECS 2010
MP3 Result
• SDF model: 9.28 Mcycles per frame
• SADF model: 5.83 Mcycles per frame
MS-SS-LL-SL-L
26 Pointers
• Yang Yang: shared resources, trade-off analysis
• Sander Stuijk: SDF3 toolset, http://www.es.ele.tue.nl/sdf3
• Future: shared resources + scenarios + trade-off analysis
27
Inter-application dynamism
Run-time adaptation
DAC 2009
28
Energy
(5,ck1)
(1,ck2)
(2,ck2)
(3,ck2)(3,ck1)
(4,ck1)
(5,ck1)
(1,ck2)
(2,ck2)
(3,ck2)
(7,ck2)
(6,ck2)
(9,ck2)
(8,ck2)
JCM
Run-time Application Management
Const
- Applications starting/stopping over time
- 6 procs, slow (ck1) and fast clocks (ck2)
- Minimize energy under time constraints
Completion time
(3,ck1)
(4,ck1)
(5,ck1)
(1,ck2)
(2,ck2)
(3,ck2)
(3,ck1)
(4,ck1)
(8,ck1)
(4,ck2)
(10,ck1)
(9,ck1)
(6,ck2) (5,ck2)
JCM
Gantt chart
Join Const Min
“enforce the
same clock”
29 Pareto Algebra
Goal: Compositional computation of trade-offs
30 Pareto-Algebraic Characterization
… of run-time adaptation
… select and configure
product - derive | constrain | abstract - minimize
off s
paces
adapt
com
ponent
trade-o
ff s
paces
system trade-off space(at the time of a change)
31 Pareto-Algebraic Characterization
… of run-time adaptation
… select and configure
product - derive | constrain | abstract - minimize
com
ponents
X
derive | constrain | abstract min
32 Complexity
n components with p Pareto points each
take a product and on-the-fly
derive | constrain | abstract - minimize
O(p2n)
off s
paces
adaptcom
ponent
trade-o
ff s
paces
system trade-off space(at the time of a change)
33 Complexity
n components with p Pareto points each
take a product and on-the-fly
derive | constrain | abstract - minimize
O(p2n)
X
com
ponents
constrainderive-abstract
min
34 Complexity Control
keep n and p small
• Compositional reasoning, among others
• Approximate Pareto sets
y =
x =
y/x = c2
y/x = c4
35 Multi-dimensional Multiple-choice Knapsack Problem
MMKP
- One optimization objective (value)
- Multiple resource dimensions with capacity constraints
- Multiple independent applications
- Multiple independent configurations per application
Pick one configuration per application optimizing value within constraints
NP hard
Pick one configuration per application optimizing value within constraints
36 A Parameterized Compositional Heuristic
• Project all resource dimensions into one dimension
• Approximate Pareto set in 2-dimensional space
y/x = c2
y =
x =
y/x = c4
product-constrain-derive-abstract-minimize
37 A Parameterized Compositional Heuristic
• Project all resource dimensions into one dimension
• Approximate Pareto set in 2-dimensional space
parameter p: maintain (at most) p Pareto points
Allows to budget analysis time
product-constrain-derive-abstract-minimize
Allows to budget analysis time
analysis time ≤ c ▪ n ▪ p2
(n number of applications; c platform constant)
38
60
70
80
90
100
T=1ms t=5ms T=20ms t=100ms
Controlling Time Budgets
budgets
time
values
par p
50
60
I1 I2 I3 I4 I5 I6
values
• MMKP benchmark: www.laria.u-picardie.fr/hifi/OR-Benchmark/MMKP
• Always within budget
• Good values
• Similar results to the IMEC, SOC 2006,
non-compositional state-of-the-art heuristicSimIt-ARM simulator
206 Mhz StrongARM
39
0
1IMEC Pareto
time(ms)
768 768
1576
1576 2353
2353
3224
3705
3224
3900
Compositionality
• Applications start at various times
• 5 at a time (a), 1 at a time (b)
values
what about applications stopping ?
0 5 10 15 20
0
0.5
1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
time(ms)
values always at least as good
as IMEC heuristic
# of active applications
# of active applications
40 Conclusions
• Goal: reliable, resource efficient data processing systems
• Dynamic behavior complicates analysis
• Intra-application dynamism: scenario-aware dataflow analysis and design-space exploration
• Inter-application dynamism: Pareto-algebra-based run-time adaptation
41 Thank you !
More info: www.es.ele.tue.nl/~tbasten/SDF3 toolkit: www.es.ele.tue.nl/sdf3/Pareto calculator: www.es.ele.tue.nl/pareto/
Questions ?
Pareto calculator: www.es.ele.tue.nl/pareto/
‘An understanding of the natural world and what's in it
is a source of not only a great curiosity but great fulfillment.’
David Attenborough