a compositional approach to embedded system performance ...€¦ · 4 r. ernst, tu braunschweig 10...
Post on 18-Jul-2020
0 Views
Preview:
TRANSCRIPT
1
R. Ernst, TU Braunschweig 1
A compositional approach to embedded system performance analysis
R. Ernst
TU Braunschweig
R. Ernst, TU Braunschweig 5
Overview
• heterogeneous embedded architectures
• formal performance analysis – holistic and flow based
• a compositional approach to performance analysis
• example
• context aware analysis
• conclusion
2
R. Ernst, TU Braunschweig 6
Embedded systems - trends
• Trend 1: higher system integration– integration of complete programmable subsystems on a single
IC – Multiprocessor-Systems-on-Chip (MpSoC)
R. Ernst, TU Braunschweig 7
Nexperia example: Viper Setop BoxExternal SDRAM
Interrupt controller
Enhanced JTAG
Universal async.receiver/transmitter
(UART)
Universal serial bus
IC debug
Clocks
CPU debug
ISO UART
Reset
MIPS(PR3940)
CPU
TriMediaC-Bridge
Memory controller
Fast C-Bridge
TriMedia(TM32)CPU
MIPS bridge
IEEE 1394 link layer controller
High-performance2D-rendering engine
MIPS C-Bridge
I²C
Exp. bus interfaceunit PCI/XIO
CRCDMA
Adv. imagecompositionProcessor
MPEG-2video decoder
Video inputprocessor
Memory-basedscaler
MPEGsystem proc.
C-bridge
Interrupt ctrl.
Audio I/O
Sony PhilipsDigital I/O
Transport streamDMA
General-purposeI/O
Synchronousserial interface
Fast
PIbu
s
MIP
SPI
bus
Mem
.M
gmt.
IFbu
s
TriM
edi a
PI b
u s
D$
I$
D$
I$
3
R. Ernst, TU Braunschweig 8
Embedded systems - trends - 2
• Trend 2: networked systemsubiquitous computing, telecom, automotive, avionics, space, ...
subsystem integration
subsystem integration
Image source: Siemens
service integrationservice
integration
R. Ernst, TU Braunschweig 9
Embedded architectures - trends
• system function integration – reactive and transformative parts– function IP, legacy code, new functions
• component and subsystem reuse (IP)– increased design productivity and reduced development cost
• programmable platforms – improved design productivity – increased volume– examples: network processors, multi-media platforms,
automotive platforms, game platforms
4
R. Ernst, TU Braunschweig 10
Embedded architecture - challenges
• design specialization – increased performance– reduced power consumption– lower cost and size
• design flexibility– late changes, platforms, reuse
• HW and SW IP integration– result of reuse
⇒ embedded HW architectures are heterogeneous
R. Ernst, TU Braunschweig 11
ES architectures are heterogeneous
• different processing element types– processors, weakly programmable coprocessors, IP
components
• different interconnection networks and communication protocols
• different memory types
• different scheduling and synchronization strategies
M
CoP
M
M
PDSP
M
P
5
R. Ernst, TU Braunschweig 12
Heterogeneous architecture: Viper Setop Box
External SDRAM
Interrupt controller
Enhanced JTAG
Universal async.receiver/transmitter
(UART)
Universal serial bus
IC debug
Clocks
CPU debug
ISO UART
Reset
MIPS(PR3940)
CPU
TriMediaC-Bridge
Memory controller
Fast C-Bridge
TriMedia(TM32)CPU
MIPS bridge
IEEE 1394 link layer controller
High-performance2D-rendering engine
MIPS C-Bridge
I²C
Exp. bus interfaceunit PCI/XIO
CRCDMA
Adv. imagecompositionProcessor
MPEG-2video decoder
Video inputprocessor
Memory-basedscaler
MPEGsystem proc.
C-bridge
Interrupt ctrl.
Audio I/O
Sony PhilipsDigital I/O
Transport streamDMA
General-purposeI/O
Synchronousserial interface
Fast
PIbu
s
MIP
SPI
bus
Mem
.M
gmt.
IFbu
s
TriM
edi a
PI b
u s
D$
I$
D$
I$
R. Ernst, TU Braunschweig 13
Managing HW architecture complexity
• development of application programmer interfaces (API) to hide complexity from application programmer and improve portability
• specialized RTOS to control resource sharing and interfaces
⇒ complex multi-level HW/SW architecture
6
R. Ernst, TU Braunschweig 14
Software architecture example
Bus
core
RTOS
I/O Int Bus-CTRL
timertimer
drivers
RTOS-APIs
application
periphery
cache
memprivate
private
private
private
shar
ed
hardware
software
architecture
application
• layered software architecture with HW dependent SW and API
⇒ embedded SW is heterogeneous
ce1
pe1
API
R. Ernst, TU Braunschweig 15
Communication Centric Design• communication network as a backbone for systems integration
• state of the art: – off chip: busses w. different
protocols and performance levels– on chip:
Proprietary or standard buses with bridgesAMBA, Sonics, ...
– future: multi-stage networks on-chip as well as off-chip (PCI Express)
CoPro
VLIW MEMIPIP IPIPMEM
RISC MEM DSP
communication networkcommunication network
Application B
Application A
7
R. Ernst, TU Braunschweig 16
Subsystem integration - example
M2IP2M3
M1
Com Netw
DSPIP1
HWCPU
M2
IP2M3
M1DSP
IP1HW
CPU
Integration
subsystem 1
subsystem 2P1
P3
P2
Sens
Sens
subsystem 2
subsystem 1
R. Ernst, TU Braunschweig 17
Communication centric design - key challenges
• subsystem integration&verification
• design space exploration and optimization
8
R. Ernst, TU Braunschweig 18
Integration tasks
• resource sharing– several SW processes mapped to one processing element
⇒ task scheduling– mapping several communications mapped to one
communication path⇒ communication (bus) scheduling
– several process data mapped to one memory ⇒ memory assignment (space & time)
• interface synthesis– synthesizing process communication to target system
communication
• integration verification
R. Ernst, TU Braunschweig 19
Integration verification
• correct implementation of specified function• HW/SW co-simulation, verification
• correct target architecture parameters• processor and communication performance• adherence to timing requirements• no memory over/underflow• no run-time dependent dead-locks
generaldesign problem
challenge to heterogeneous system design
9
R. Ernst, TU Braunschweig 20
Actuatorsystem busSensor
RTOS
I/O int bus-CTRL
timertimercore
drivers
RTOS-APIs
application
cache
MEM
RTOS
core
drivers
RTOS-APIs
application
I/Ointbus-CTRL
timertimer
ReleaseAirbag
Complex performance objectives and constraints
Crash
PctrlCPsens. PdetcCC
Pact.CC
Reaction time of airbag after crash ?
tcom+ tdrv
=tAPI
+ tprocess
+ tAPI
=tdrv
+ tcom
+ tdrv
=tAPI
+ tprocess
+ tAPI
=tdrv
+ tcom
=tsenstcrash + tcsens + tdetc + tfbus + tcact + tairbag+ tact+ + tctrl tact+ tairbag+
physical delay
tsens +tcrash +
physical delay tcom+ tdrv
tAPI+ tprocess
+ tAPI
tdrv+ tcom
+ tdrv
tAPI+ tprocess
+ tAPI
tdrv+ tcom
cache
MEM
R. Ernst, TU Braunschweig 21
Complex run-time interdependencies
M2IP2M3
M1
Com Netw
DSPIP1
HWCPUSens
• run-time dependencies of independent components via communication
• influence on timing and power
10
R. Ernst, TU Braunschweig 22
CPU1M1
communication networkcommunication network
min execution time⇒ high bus load
max execution time⇒ low bus load
P1
tbc1 twc1
Interdependency example
• complex non-functional interdependencies
• complex system corner cases
R. Ernst, TU Braunschweig 23
CoPro
Heterogeneous resource sharing -example
VLIW MEMIPIP IPIPMEM
RISC MEM DSP
com. netw.com. netw.
static executionorder scheduling
static priorityschedulingFCFS scheduling
earliest deadlinefirst scheduling
TDMA scheduling
proprietary(abstract info)
11
R. Ernst, TU Braunschweig 24
Performance verification - state of the art
• current approach: target architecture co-simulation• combines functional and performance validation• reuse component validation pattern for system integration and
function test• reuse application benchmarks for target architecture function
validation• visualization of system execution• extensive simulation run times to include many test cases
R. Ernst, TU Braunschweig 25
Co-simulation limitations
• identification of system performance corner cases– different from component performance corner cases– target architecture behavior unknown to the application
function developer (cp. functional HW test)⇒ test case definition and selection ?
• analysis of target architecture – confusing variety of run-time interdependencies– data dependent “transient” run-time effects– mixed in co-simulation
⇒ limited support of design space exploration⇒ debugging challenge
• inclusion of incomplete application specifications ⇒ additional performance models required
12
R. Ernst, TU Braunschweig 26
Alternatives?
• conservative design– install independency in resource sharing– example: fixed execution time slots – TDMA
• formal performance analysis
R. Ernst, TU Braunschweig 29
Formal performance analysis
• formal techniques known for individual components and subsystems (RMS, static scheduling etc.)
• heterogeneity is problem
13
R. Ernst, TU Braunschweig 30
Performance model structure
processexecution model
• execution path – data dependent• path execution – arch. dependent• communication – data & arch. dep.
P1 P2
activation
P1
• resource sharing strategy• process activation• component state (caches)
M
IP
M P M P
M
component & communicationexecution model
systemmodel
• communication pattern• shared memory access• environment model
influenced bymodel type
R. Ernst, TU Braunschweig 31
System performance analysis approaches
• global approach– analysis scope extension to several subsystems
• flow based hierachical approach– global flow analysis combined with local scheduling analysis
14
R. Ernst, TU Braunschweig 32
Analysis scope extension• coherent analysis („holistic“ approach)
• example: Tindell 94, Pop/Eles (DATE 2000, DAC 2002, …):TDMA + static priority – automotive applications
• problem: scalability
P2 P1
T
TTP bus interface
P3 P4
D
TTP bus interfacequeue
RTOSRTOS
TTP bus (TDMA)
static priorityprocess scheduling
static priorityqueueingT: Transmitter
process
R. Ernst, TU Braunschweig 33
Hierarchical approach
• independently scheduled subsystems are coupled by dataflow
⇒ subsystems coupled by stream of data⇒ interpreted as activating events
⇒ coupling corresponds to event propagation
SB 1
scheduling SB 1
P2
P1
SB 2
scheduling SB 2
P4
P3
15
R. Ernst, TU Braunschweig 34
Event propagation and analysis principle
environment model
local analysis
derive output event model
map to input event model
until convergence or non-schedulability
R. Ernst, TU Braunschweig 35
Flow based hierarchical approaches
• event model generalization for a set of scheduling strategies– arrival and service curves Chakraborty/Thiele/Gries/Künzli …– new analysis approaches needed, e.g. Baruah– used for network processor design
• event stream model adaptation– use abstract interface stream properties to couple local analysis– used e.g. for automotive software
16
R. Ernst, TU Braunschweig 36
Generalized event model• incoming events stream is integrated over time intervals and
captured in arrival curves
• event consumption is captured in service curves
• 2 curves each for upper and lower bounds (interval)
• upper and lower bounds linearly approximated for analyis
R. Ernst, TU Braunschweig 37
Load analysis with interval event model
• Load analyis: Addition propagation of computation and load intervals - min/max algebra
• e.g. buffer size: difference between input load and processed events HW and SW resources
resource bounds
input stream bounds
remaining resources processed packet streamssource: L. Thiele, ETH Zurich
L. Thiele, ETH Zurich
17
R. Ernst, TU Braunschweig 38
Compositional Approach (Ernst et al.)
• observation: real-time analysis assumes similar event models at input– periodic event stream
– periodic event streams with jitter
– periodic event streams with burst
– sporadic events with minimum event separation
– sporadic events with bursts
• comparable event models appear at output
• volume of generated events is fixed or interval (data dependent)
tptpte1 te2 te3
te1 te2 te3tptp
tint
tmin
te1 te2 te3 ten ten+1
te1 te2
R. Ernst, TU Braunschweig 39
RTA event model examples
• static execution order– periodic → periodic w. jitter
• Jitter: fixed part (scheduling) + variable part (data dependency)
– sporadic → sporadic w. jitter
• static priority – periodic → periodic w. jitter and burst
– sporadic → sporadic w. jitter and burst
• time driven scheduling: TDMA, round robin– adds jitter to any model
18
R. Ernst, TU Braunschweig 40
Example: Static priority with arbitrary deadlines
• complex execution sequence - may create output bursts
• found in communication scheduling and multiprocessing
busy period
T1
T2
T2
T2 T2
P3
P1
P2
priority analysis solution e.g. by Lehoczky
R. Ernst, TU Braunschweig 41
Few stream models can be derived
• nact (t) no. of activating events in time t
• burst can be simplified and modeled as jitter greater period with min. event distance
19
R. Ernst, TU Braunschweig 42
Event stream model adaptation
• composition requires model adaptation
• adaptation for compatible input and output event models– simple mathematical transformation to adapt analysis
Event Model InterFace (EMIF)– no added function, no run-time effect
• adaptation for non-compatible input and output event models– insert Event Adaptation Function (EAF) to transform models– EAF describes interface buffer function
⇒ shows that buffer is required to couple subsystems
R. Ernst, TU Braunschweig 43
periodic with jitterJ J J
TTperiodic with burst
Tb
t
b
t
periodicTT
sporadicx≥≥≥≥t x≥≥≥≥t x≥≥≥≥t
Event Model Interface Classification
jitter = 0burst length (b) = 1
t = T - J
t = T
t = t
lossless EMIF EMIF to less expressive model
T=T, t=T, b=1 T=T, J=0
20
R. Ernst, TU Braunschweig 44
Using EMIFs and EAFs
Sporadic
Periodic with Burst Periodic with Jitter
PeriodicEAF bufferrequired
upper bound only
R. Ernst, TU Braunschweig 45
Example revisited
com.netw.
subsystem 2
subsystem 1
Distortion
non-functional dependency cycle
21
R. Ernst, TU Braunschweig 46
EMIF w/ EAF
M2DSPIP2M3IP1
HWM1CPUSens
sporadic
EMIF EMIF
sporadic
EMIF
EMIF
burst
EMIF
sporadic
C1
C3
periodic periodic
com.netw.
burst-basednetw
ork analysis
C2
R. Ernst, TU Braunschweig 47
CPU
sporadicw/ jitter
periodicw/ burst
P3preemptionP1
EMIFw/ EAF
NoCC1 interference C2
HWSens
EMIF
simpleperiodic
simpleperiodic
activation by RTOS
com.network
Dependency cycle
resynchronization reduces jitter
22
R. Ernst, TU Braunschweig 48
simpleperiodic
EMIFw/ EAF
EMIF w/ EAF
M2DSPIP2M3IP1
HWM1CPU
Sens
simplesporadic sporadic
w/ jitter EMIF
simplesporadic
C1
C3
periodic periodic
NoC C2
periodicw/ burst
periodicw/ burst
periodicw/ jitter
fmax=1,7kHz, s=8kB
f=140kHz, s=1kB
f=20kHz
s=3kB
8kB→→→→ 32×××× ( 256 + 6 ) byte
1kB→→→→ 1×××× ( 1024 + 6 ) byte
3kB→→→→ 24×××× ( 128 + 6 ) byte
• t (P1)= 250 µµµµs
• t (P3)= 10 µµµµs
com.network
R. Ernst, TU Braunschweig 49
Results
39
23
R. Ernst, TU Braunschweig 50
Application model compatibility
• stream model represents exact data volume – keeps causality for application model activation– required for „AND“ activation, e.g. data flow graphs
otherwise infinite buffers or starvation
– not needed for process systems with „OR“ activation or time driven activation
P1 P2
11 1
11
1
CPU1 CPU2
P1 P2
2
1 1 11
CPU1 CPU2
P321
R. Ernst, TU Braunschweig 51
System contexts
• another stream property – partial event order is kept
• application: tagged token (SPI model) to take system contexts into account
• example set top box
24
R. Ernst, TU Braunschweig 52
Context exploitation: Set top box
system bus
decryptionunit
RF
hard-disk
R. Ernst, TU Braunschweig 53
Context exploitation: Set top box
decryptionunit
RF
hard-disk
mux MPEG-2
IP-traffic
• scenario 1: record video + watch movie + download file via IP
25
R. Ernst, TU Braunschweig 54
MPEG stream modeling
• stream properties– I, B, P frame types of different data volume– encoding with the following constraint on frame types
• stream coding– set of constraints on the sequence of tagged tokens
Min 2 out of 12
IMin 2 out of 12PMin 6 out of 12BMax 4 out of 12IMax 4 out of 12PMax 8 out of 12B
R. Ernst, TU Braunschweig 55
Event stream contexts
• intra event stream context – data volume and execution time depend on frame types and
frame sequence
• inter event stream context– bus load and response times depend on relative phase of
streams
• inter + intra event stream contexts– merged streams on bus have variable order
• solutions available for cyclo static streams (Chen/Mok) and inter event stream contexts only (Tindell), extended to constrained sequences and combinations
26
R. Ernst, TU Braunschweig 56
Response time impact
• response times for static priority bus arbitration
III I I I I I
w/o context t respIP = 920
PI I I I Ptw context t respIP = 690
t
118
muxIP
video streams are multiplexed and have higher priority than IP
R. Ernst, TU Braunschweig 57
Analysis improvement due to intra event streamcontexts
context w/otcontext wt
max resp,
max resp,
27
R. Ernst, TU Braunschweig 58
Set top box scenario 2
decryptionunit
RF
hard-disk
• scenario 2: decript video + download file via IP
Encrypted MPEG-2
Decrypted MPEG-2
IP-traffic
R. Ernst, TU Braunschweig 59
Response time impact
140
I I
t
transaction period = 100
I I50
I I
170 t
I I
100
enc enc
enc enc
dec dec
dec dec
decIP
enc
• enc and dec streams are coupled by decryption unit execution time
28
R. Ernst, TU Braunschweig 60
Analysis improvement due to inter event streamcontexts
context w/otcontext wt
max resp,
max resp,
R. Ernst, TU Braunschweig 61
Analysis improvement due to both intra and inter event stream contexts
context w/otcontext wt
max resp,
max resp,
29
R. Ernst, TU Braunschweig 62
Acknowledgement
• The following persons made major contributions to this work
– Jörn Braam– J. Bhavani– Rafik Henia – Marek Jersak– Razvan Racu – Kai Richter
R. Ernst, TU Braunschweig 63
Conclusion
• systems integration is key ES design problem
• performance verification is key integration problem
• many hidden performance problems not reflected in system function
• performance verification currently primarily based on simulation – risky and time consuming
• presented compositional analysis based on abstract eventflow models
• event flow models enable analysis of complex situations with feedback and contexts
• work applied in several ongoing industry cooperations (SpeAC, FlexFilm, automotive SW, …)
30
R. Ernst, TU Braunschweig 64
Literature
• see: www.spi-project.org
top related