Georgia Giannopoulou, Pengcheng Huang, Kai Lampka, Nikolay Stoimenov, Lothar Thiele
Multicore platforms are increasingly used in embedded real-time applications
Real-time applications are increasingly mixed-criticality
Hopes:
Multicore platforms can be used for the efficient deployment of mixed-criticality applications.
Guarantees can be provided for high-criticality as well as lower-criticality tasks.
Challenge: Interference on shared resources
07/01/2015 2
Contention on shared resources
Tasks interfere by blocking each other during resource accesses
Animation sequence:
1. A task executes on CPU1/Core1 and accesses the L1 cache, then the L2 cache, then main memory.
2. A task executes on CPU1/Core2 and accesses the L1 cache. The L2 cache is blocked by Core 1: stall.
3. A task executes on CPU2/Core1 and accesses the L1 cache, then the L2 cache. Main memory is blocked by CPU 1: stall.
4. The main memory request of CPU1/Core1 is served. CPU2/Core1 accesses main memory.
5. The L2 cache request of CPU1/Core1 is served. CPU1/Core2 accesses the L2 cache. Main memory is blocked by CPU 2: stall.
6. The main memory request of CPU2/Core1 is served. CPU1/Core2 accesses main memory.
7. The remaining main memory, L2 cache and L1 cache requests are served.
Interferences:
CPU1/Core2 blocked by CPU1/Core1 on L2 Cache
CPU2/Core1 blocked by CPU1/Core1 on Main Memory
CPU1/Core2 blocked by CPU2/Core1 on Main Memory
Criticality level (CL) expresses required protection against failure
Integration of mixed-criticality (MC) applications into a common platform
Spatial and timing isolation
Partitioning mechanisms (ARINC-653)
Certifiable, but…
Resource over-provisioning for high-criticality applications
Reclaiming unused resources for low-criticality applications is not possible
Expensive
Targeting mainly single-core systems
Migration to multicore platforms
Can we utilize them to schedule mixed-criticality applications efficiently while preserving isolation?
Relaxed timing isolation: The response time of tasks with CL l must not be affected by tasks with CL lower than l.
Incremental design: The timing properties of tasks with CL l are preserved when new tasks with CL lower than l are added to the system.
Several policies for single-core and multi-core systems
Multiple execution profiles [S. Vestal, S. Baruah, D. de Niz, …]
Efficient, but…
Resource sharing not considered
No timing isolation
[Figure: HI and LO tasks on Core 1 and Core 2, which share a memory bus]
Task  CL  C(LO)  C(HI)
τ1    HI  5      20
τ2    LO  8      -
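The table gives each task one execution-time bound per criticality level. The following is a minimal Python sketch of how such multiple execution profiles are commonly interpreted. It assumes that the system switches to HI mode once a HI task exceeds its C(LO), and that LO tasks lose their guarantee in HI mode. The names `Task`, `system_mode` and `guaranteed` are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Task:
    name: str
    cl: str               # criticality level: "HI" or "LO"
    c_lo: int             # execution-time bound at LO level
    c_hi: Optional[int]   # execution-time bound at HI level (None for LO tasks)


# Values from the table above.
TASKS = [Task("tau1", "HI", 5, 20), Task("tau2", "LO", 8, None)]


def system_mode(observed: Dict[str, int]) -> str:
    """Return "HI" if any HI task ran longer than its C(LO), else "LO"."""
    for t in TASKS:
        if t.cl == "HI" and observed.get(t.name, 0) > t.c_lo:
            return "HI"
    return "LO"


def guaranteed(mode: str) -> List[str]:
    """Tasks that keep their guarantee in `mode`.

    Assumption: LO tasks are not guaranteed in HI mode.
    """
    return [t.name for t in TASKS if mode == "LO" or t.cl == "HI"]
```

For example, an observed execution time of 6 for τ1 exceeds C(LO) = 5, so the sketch reports HI mode and only τ1 keeps its guarantee.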
Possible solutions
Time-triggered memory bus
Memory throttling (servers)
Bounded delays, but…
Not flexible
Not applicable to COTS platforms
[Figure: HI and LO tasks on Core 1 and Core 2; memory bus slots assigned as 1 1 1 1 2 2 2 2]
Execute mixed-criticality applications on multicore platforms efficiently while meeting certification requirements and considering effects of resource sharing.
1. Design scheduling policy that fulfills certification requirements
2. Develop response time analysis that bounds effects of resource sharing
3. Perform design optimization
System model
Cores
▪ Access to private caches
▪ No timing anomalies
▪ Execution stalls until memory access is completed
Crossbar between cores & banks
▪ Time- or event-driven arbitration, e.g., FCFS, RR, FP, TDMA, FlexRay, …
Shared Memory
▪ Non-preemptive accesses
▪ Bounded access time
▪ Sequential memory space
The model closely matches a large class of commercial multi-/many-core architectures such as Kalray MPPA-256 and STHorm.
Periodic tasks
Possible dependencies among tasks
Superblock structure
Extended execution profiles
Memory access bounds
Degraded mode

Existing:
Task  CL  C(LO)  C(HI)
τ1    HI  5      20
τ2    LO  8      -

Extended:
          C(LO)                     C(HI)
Task  CL  {μmin,μmax}  {emin,emax}  {μmin,μmax}  {emin,emax}
τ1    HI  {8,10}       {3,5}        {6,22}       {0,800}
τ2    LO  {4,5}        {7,8}        -            -

With degraded mode, the C(HI) entries of τ2 become {0,2} {1,1}.
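One way to hold the extended profiles in code is sketched below. It reads {μmin,μmax} as memory access bounds and {emin,emax} as execution bounds, following the column headers. The function `isolated_bound` and its `access_time` parameter are assumptions for illustration: with a bounded access time and cores that stall on each access, a task running without contention needs at most emax + μmax × access_time. The slides do not state units, so the numbers below are only illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Bounds = Tuple[int, int]  # (min, max)


@dataclass(frozen=True)
class Profile:
    mu: Bounds  # {mu_min, mu_max}: memory access bounds
    e: Bounds   # {e_min, e_max}: execution bounds


# Extended table; a level without a profile is stored as None.
PROFILES: Dict[str, Dict[str, Optional[Profile]]] = {
    "tau1": {"LO": Profile((8, 10), (3, 5)), "HI": Profile((6, 22), (0, 800))},
    "tau2": {"LO": Profile((4, 5), (7, 8)), "HI": None},
}


def with_degraded_mode(profiles):
    """Copy of the table in which tau2 gets the degraded HI-level profile {0,2} {1,1}."""
    out = {name: dict(levels) for name, levels in profiles.items()}
    out["tau2"]["HI"] = Profile((0, 2), (1, 1))
    return out


def isolated_bound(p: Profile, access_time: int) -> int:
    """Upper bound without contention: e_max + mu_max * access_time.

    `access_time` is an assumed constant in the same time unit as e.
    """
    return p.e[1] + p.mu[1] * access_time
```

With an assumed access time of 2, the LO profile of τ1 yields 5 + 10 × 2 = 25 time units in isolation.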
Model partitioned memory access
Memory interference graph
Nodes: tasks, memory blocks, banks
Edges from tasks to memory blocks: determined through memory analysis
Edges from memory blocks to banks: determined through memory optimization
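The memory interference graph can be sketched as two mappings: tasks to memory blocks, and blocks to banks. The Python example below is an illustration under the assumption that two tasks can interfere only if they access at least one common bank. The function name `interfering_pairs` and the example data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def interfering_pairs(task_blocks: Dict[str, Set[str]],
                      block_bank: Dict[str, int]) -> Set[Tuple[str, str]]:
    """Pairs of tasks that access at least one common memory bank."""
    task_banks = {
        task: {block_bank[b] for b in blocks}
        for task, blocks in task_blocks.items()
    }
    return {
        (a, b)
        for a, b in combinations(sorted(task_banks), 2)
        if task_banks[a] & task_banks[b]
    }


# Hypothetical example: three tasks, one memory block each.
task_blocks = {"t1": {"b1"}, "t2": {"b2"}, "t3": {"b3"}}
same_bank = {"b1": 0, "b2": 0, "b3": 1}   # b1 and b2 share bank 0
spread = {"b1": 0, "b2": 1, "b3": 2}      # every block in its own bank
```

Changing only the block-to-bank mapping from `same_bank` to `spread` removes the single interfering pair, which is the effect that a block-to-bank optimization aims for.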
Time-Triggered Scheduling with Synchronization Points (TTS)
[Figure: TTS schedule with HI and LO sub-frames]
Worst-case time of barrier synchronization can be computed at design time
Slack time can be used for future tasks
Support for fixed preemption points
Synchronized “partitions” among the cores, only tasks of the same CL interfere on shared memory → Relaxed timing isolation
Slack time & empty frames for future tasks → Incremental design
Dynamic dimensioning of “partitions” according to exhibited execution profiles at runtime → Efficiency
No need for hardware support for partitioning or memory interference restriction → Applicable to COTS platforms
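The runtime behaviour of one TTS frame can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes that a frame consists of a HI sub-frame followed by a LO sub-frame, that all cores synchronize at a barrier when the HI sub-frame ends, and that the LO tasks are skipped when the barrier occurs later than a bound computed at design time. The function `run_frame` and its parameters are hypothetical.

```python
from typing import Dict


def run_frame(hi_finish: Dict[str, float], lo_demand: Dict[str, float],
              frame_len: float, lo_barrier: float) -> dict:
    """Simulate one frame with a HI sub-frame followed by a LO sub-frame.

    hi_finish:  per-core time at which the HI tasks of this frame complete
    lo_demand:  per-core time needed by the LO tasks of this frame
    lo_barrier: design-time bound on the barrier under LO profiles (assumed input)
    """
    barrier = max(hi_finish.values())   # all cores synchronize here
    run_lo = barrier <= lo_barrier      # assumption: otherwise LO tasks are skipped
    lo_end = barrier + (max(lo_demand.values()) if run_lo else 0.0)
    return {"barrier": barrier, "run_lo": run_lo, "slack": frame_len - lo_end}
```

Because the barrier position follows the execution times actually exhibited, the LO sub-frame starts as early as possible, and the time left at the end of the frame is slack.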
Model contention on shared memory banks during each sub-frame
Bound WCRT of tasks in each sub-frame
▪ Different execution profiles
Specify worst-case completion time of each sub-frame
Schliecker et al. [DATE 2010], Pellizzoni et al. [DATE 2010], Schranzhofer et al. [RTAS 2010, RTAS 2011]
Event models/arrival curves specify tasks’ memory access patterns
▪ Arbitration policies: FCFS, RR, TDMA, hybrid time-/event-driven
Iterative and dynamic programming approaches to compute WCRT
Conservative derivation of resource load, abstractions over dynamic arbitration → pessimistic WCRT results
Lv et al. [RTSS 2010], Gustavsson et al. [WCET 2010]
Model checking for interference analysis
▪ Arbitration policies: FCFS, TDMA
Precise modeling of arbitration → accurate WCRT results
Analysis does not scale well beyond 2 cores
State-based modeling and analysis (model checking)
Modeling with timed automata
Accurate timing analysis
Analytic abstractions (real-time calculus)
Modeling with arrival curves
Scalability
WCRT analysis
Modeling with timed automata
Modeling with real-time calculus & timed automata
System execution model = network of collaborating TA
Static Schedulers & Tasks
Resource arbiter TA based on implemented policy
FCFS/RR
TDMA
FlexRay
→ Input to model checker
Exact WCRT results
Model checking techniques exhaustively explore all feasible traces to find the WCRT of each superblock
State space grows exponentially with:
Number of concurrently executed automata
Number of variables and clocks
Number of synchronization channels
Example: 32-core system, 1 task/core
65 TA (32 Static Schedulers, 32 Tasks, 1 Arbiter)
97 clocks
128 synchronization channels
Modeling with real-time calculus & timed automata
Core Under Analysis (CUA)
Interfering Cores
Access Request Stream
▪ Number of access requests in t = [0 .. 2.5] ms
Arrival Curve [α^l, α^u]
▪ Minimum/maximum number of arriving events in any interval of length Δ = 2.5 ms
[Figure: access request stream (# requests over t [ms]) and arrival curves α^l, α^u (demand over Δ [ms])]
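For a recorded request stream, the values of the lower and upper arrival curve at one window length can be obtained with a sliding window. The Python sketch below assumes discretized time (one list entry per tick) and a window length given in ticks. Both are simplifications chosen for illustration. The result only describes the observed trace. Curves used in the analysis must bound every possible behaviour of a task, not a single run.

```python
from typing import List, Tuple


def arrival_curves(requests_per_tick: List[int], delta: int) -> Tuple[int, int]:
    """Return (alpha_l, alpha_u) for windows of `delta` consecutive ticks.

    alpha_l / alpha_u are the minimum / maximum number of requests seen in
    any window of that length that lies completely inside the trace.
    """
    if not 1 <= delta <= len(requests_per_tick):
        raise ValueError("delta must be between 1 and the trace length")
    window = sum(requests_per_tick[:delta])
    lo = hi = window
    for i in range(delta, len(requests_per_tick)):
        # slide the window by one tick: add the new tick, drop the oldest
        window += requests_per_tick[i] - requests_per_tick[i - delta]
        lo, hi = min(lo, window), max(hi, window)
    return lo, hi


# Hypothetical trace: number of access requests issued in each tick.
trace = [1, 0, 2, 0, 0, 1, 1, 0]
```

For this trace and a window of 3 ticks, the window sums are 3, 2, 2, 1, 2, 2, so the call returns (1, 3).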
The upper arrival curve of each interfering core bounds the maximum number of access requests in any time window of a given length
Aggregate interference curve
▪ Sum of the individual arrival curves
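Summing arrival curves can be illustrated with curves represented as functions of the window length. In the Python sketch below, `periodic_burst` is a hypothetical example curve (a fixed burst of requests at the start of every period) and is not taken from the slides. `aggregate` adds the individual upper curves pointwise.

```python
import math
from typing import Callable, Iterable

# A curve maps a window length to a maximum number of access requests.
Curve = Callable[[float], int]


def periodic_burst(period: float, burst: int) -> Curve:
    """Hypothetical upper curve: `burst` requests at the start of every period."""
    return lambda delta: 0 if delta <= 0 else burst * math.ceil(delta / period)


def aggregate(curves: Iterable[Curve]) -> Curve:
    """Aggregate interference curve: pointwise sum of the individual curves."""
    curve_list = list(curves)
    return lambda delta: sum(c(delta) for c in curve_list)


core_a = periodic_burst(period=10, burst=2)
core_b = periodic_burst(period=5, burst=1)
total = aggregate([core_a, core_b])
```

Here `total(10)` evaluates to 2 × 1 + 1 × 2 = 4 requests. The sum is safe but can be pessimistic, because it assumes that all interfering cores issue their worst-case bursts in the same window.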
[Figure: state-based component (TA) combined with stateless components (RTC)]
The TA guarantee that the event streams conform to the arrival curves
An interference generator emits streams of access requests bounded by the upper arrival curve and the number of processing cores
All Task and Scheduler TA of the interfering cores are substituted by only two TA
Reduced state space!
Example: 32-core system, 1 task/core
5 TA (before: 65)
5 clocks (before: 97)
6 synchronization channels (before: 128)
Empirical Evaluation
Accuracy
Evaluation of accuracy of state-of-the-art analytic approaches
E.g., Pellizzoni et al. [DATE 2010]
▪ Similar assumptions on task execution and resource arbitration
Experimental set-up
EEMBC 1.1 benchmark suite (automotive)
FCFS/RR resource arbitration
No interference abstraction
Evaluation
State-based analysis is feasible (≤ 14 min) for up to 6 cores
WCRT estimates up to 27% tighter than [DATE 2010]
Scalability
Evaluation of scalability of WCRT analysis
Experimental set-up
4-task automotive application
FCFS/RR/TDMA/FlexRay resource arbitration
One task on CUA, interference abstraction of remaining cores
Evaluation
FCFS: 24 cores
RR, TDMA, FlexRay: 64 cores
[Figure: verification time (sec.) over #processing cores (2, 4, 8, 16, 24, 32, 64); panels: FCFS, RR, TDMA, FlexRay]
Mixed-Criticality Mapping Optimization
Objective: Correctness & Maximization of slack time for future tasks
[Figure: optimization flow. Elements: task set; platform, WCET & Memory Analysis (aiT), Mapping Optimization, Block-Bank Optimization, WCRT analysis under resource contention, Interference Graph, TTS Schedule Tables. Mapping Optimization and Block-Bank Optimization exchange the mapping and the memory interference.]
Mapping Optimization
▪ Remap a task to another core
▪ Remap a job to another frame
▪ Change a task’s preemption points
Heuristic approach based on a variation of Simulated Annealing
1. Dimension TTS cycle & frames
2. Generate initial solution
3. Explore design space (SA) until convergence OR budget exhaustion
4. Tight WCRT analysis (optional)
5. Best solution admissible? (YES / NO)
6. END
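The exploration step can be sketched with a textbook simulated annealing loop. The Python example below is a generic illustration and not the variation used by the authors. The objective `max_load` (largest per-core demand) is a stand-in for the slack-based objective, the only move is remapping one task, and every name and parameter value is hypothetical.

```python
import math
import random
from typing import Dict


def max_load(mapping: Dict[str, int], demand: Dict[str, float], n_cores: int) -> float:
    """Stand-in objective: largest per-core demand (lower leaves more slack)."""
    loads = [0.0] * n_cores
    for task, core in mapping.items():
        loads[core] += demand[task]
    return max(loads)


def anneal(demand: Dict[str, float], n_cores: int, budget: int = 2000,
           t0: float = 10.0, cooling: float = 0.995, seed: int = 0) -> Dict[str, int]:
    rng = random.Random(seed)
    tasks = sorted(demand)
    current = {t: 0 for t in tasks}            # initial solution: all tasks on core 0
    cur_cost = max_load(current, demand, n_cores)
    best, best_cost = dict(current), cur_cost
    temp = t0
    for _ in range(budget):                    # stop on budget exhaustion
        cand = dict(current)
        # move: remap one task to a randomly chosen core (possibly the same one)
        cand[rng.choice(tasks)] = rng.randrange(n_cores)
        cost = max_load(cand, demand, n_cores)
        # always accept improvements; accept worse solutions with a
        # probability that shrinks as the temperature drops
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = dict(cand), cost
        temp *= cooling
    return best


demand = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
best = anneal(demand, n_cores=2)
```

In a full design flow, the cost function would call the quick WCRT analysis for each candidate, and the admissibility check and optional tight analysis would run on `best` afterwards.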
Accurate
Method based on model checking
Employed as a post-processing step for the best solution(s) found
versus
Quick
Method based on conservative assumptions
E.g., for RR arbitration, each access is delayed by all other cores
Employed during design space exploration
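The conservative assumption for RR arbitration can be written as a closed-form bound. The Python sketch below is an illustration of that assumption, not the analysis from the cited papers. It assumes a constant access time and at most one pending request per core, so every access waits for one access of each other core before it is served. The function `quick_wcrt_rr` and its parameters are hypothetical.

```python
def quick_wcrt_rr(e_max: float, mu_max: int, n_cores: int, access_time: float) -> float:
    """Conservative response-time bound under round-robin arbitration.

    Each of the mu_max accesses waits for one access of every other core
    and is then served itself; e_max is the pure execution bound.
    """
    waiting = (n_cores - 1) * access_time   # one access from each other core
    per_access = waiting + access_time      # waiting plus own service time
    return e_max + mu_max * per_access
```

For example, with emax = 5, μmax = 10, 4 cores and an assumed access time of 2, the bound is 5 + 10 × (3 × 2 + 2) = 85 time units, compared with 25 on a single core. The bound is cheap to evaluate, which suits the inner loop of a design space exploration.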
Goals:
Comparison to partitioning and state-of-the-art scheduling policies
Applicability for industrial applications
Considered applications
Real-world application (Flight Management System)
Synthetic task sets (generated similarly to [2])
Considered platforms
1-8 cores
RR-arbitrated shared memory
[2] Global Mixed-Criticality Scheduling on Multiprocessors, H. Li et al, ECRTS 2012
Evaluation of schedulability for increasing number of cores
500 synthetic task sets with varying access time : execution time ratio (ATR)
Scalability depends not only on the number of cores, but also on the memory system
Comparison of schedulability under
TTS
TTS statically scheduled (fixed sub-frames)
500 random task sets with ATR=0.5
[Figure: schedulability under TTS and statically scheduled TTS; annotation: -15%]
Impact of TTS limitations
Frames of fixed length
Fixed preemption points
No task migration
Comparison to state-of-the-art MC scheduling policies
EDF-VD for single cores [3]
GLOBAL for multicores [4]
1000 synthetic task sets (10-20 tasks) with no memory accesses
Tasks with random, equal or harmonic periods
[3] The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task sets, S. Baruah et al, ECRTS 2012
[4] Global mixed-criticality scheduling on multiprocessors, H. Li et al, ECRTS 2012
[Figure: schedulable task sets (%) over system utilization, harmonic periods; annotations: -26% to +31%, Avg: -2.2%]
Comparable results with state-of-the-art policies despite restrictions for certifiability
Flight Management System
26 tasks
2 criticality levels
Harmonic periods
One computationally intensive task
▪ 3, 4, 7, 9 preemption points
Purpose                  Task                CL  Period  Access     Execution  Access
Sensor data acquisition  τ1                  HI  200     {50,108}   {1,20}     {50,85}
Sensor data acquisition  τ2                  HI  200     {0,0}      {1,20}     {0,7}
Sensor data acquisition  τ3, τ4              HI  200     {0,0}      {1,20}     {0,19}
Sensor data acquisition  τ5                  HI  200     {0,0}      {1,20}     {0,9}
Localization             τ6                  HI  200     {50,127}   {1,20}     {10,18}
Localization             τ7                  HI  1000    {10,18}    {1,100}    {10,18}
Localization             τ8                  HI  5000    {20,35}    {1,100}    {0,1}
Localization             τ9                  HI  1000    {20,33}    {1,100}    {0,4}
Localization             τ10                 HI  200     {0,0}      {1,20}     {10,20}
Localization             τ11                 HI  1000    {0,0}      {1,100}    {0,3}
Localization             τ12                 HI  200     {0,0}      {1,20}     {0,3}
Flightplan management    τ13, τ15, τ17, τ18  HI  1000    {100,200}  {1,100}    {20,100}
Flightplan management    τ14, τ16, τ19, τ20  LO  1000    {100,200}  {1,100}    {20,100}
Flightplan computation   τ21                 HI  1000    {0,3}      {1,100}    {0,3}
Flightplan computation   τ22                 HI  1000    {30,54}    {1,100}    {20,44}
Flightplan computation   τ23                 HI  5000    {200,300}  {700,800}  {100,180}
Guidance                 τ24, τ25            HI  200     {0,10}     {1,20}     {0,1}
Nearest airport          τ26                 LO  1000    {100,134}  {1,100}    {200,322}
Design Space Exploration
Core allocation
Number of preemption points for task τ23
Convergence to a solution within 25 min. with quick WCRT analysis
Up to 38% lower objective when combined with memory mapping optimization
[Figure: admissible schedules]
Mixed-criticality scheduling on resource-sharing multicores
Relaxed timing isolation & incremental design → Certification
Dynamic adaptation to runtime scenarios → Efficiency
Tight WCRT analysis under resource contention
Ease of implementation even on COTS platforms
Advantage from considering underlying platform
[Figure labels: Isolation, Efficiency]
[EMSOFT12] G. Giannopoulou, K. Lampka, N. Stoimenov, L. Thiele, “Timed Model Checking with Abstractions: Towards Worst-Case Response Time Analysis in Resource-Sharing Manycore Systems”, pp. 63-72, Oct. 2012
[EMSOFT13] G. Giannopoulou, N. Stoimenov, P. Huang, L. Thiele, “Scheduling of Mixed-Criticality Applications on Resource-Sharing Multicore Systems”, pp. 17:1-17:15, Oct. 2013
[DATE14] G. Giannopoulou, N. Stoimenov, P. Huang, L. Thiele, “Mapping Mixed-Criticality Applications on Multi-Core Architectures”, pp. 98:1-98:6, Mar. 2014
[RTS14] K. Lampka, G. Giannopoulou, R. Pellizzoni, Z. Wu, N. Stoimenov, “A Formal Approach to the WCRT Analysis of Multicore Systems with Memory Contention under Phase-structured Task Sets”, Real-Time Systems Journal, Vol. 50, Issue 5, pp. 736-773, Nov. 2014
Thank you for your attention!