georgia giannopoulou, pengcheng huang, kai …...contention on shared resources tasks interfere by...

Georgia Giannopoulou, Pengcheng Huang, Kai Lampka, Nikolay Stoimenov, Lothar Thiele

Multicore platforms are increasingly used in embedded real-time applications

Real-time applications are increasingly mixed-criticality

Hopes:

Multicore platforms can be used for the efficient deployment of mixed-criticality applications.

Guarantees can be provided for high-criticality as well as lower-criticality tasks.

Challenge: Interference on shared resources

07/01/2015 2

Contention on shared resources

Tasks interfere by blocking each other during resource accesses

Example access sequence (animation steps):

Task executing on Core 1: L1 Cache accessed; L2 Cache accessed; Main Memory accessed

Task executing on Core 2: L1 Cache accessed; L2 Cache is blocked by Core 1 - stall

Task executing on Core 1 (CPU 2): L1 Cache accessed; L2 Cache accessed; Main Memory is blocked by CPU 1 - stall

Pending requests are served in turn; while they are served, Main Memory is blocked once more, this time by CPU 2 - stall

Interferences:

CPU1/Core2 blocked by CPU1/Core1 on L2 Cache

CPU2/Core1 blocked by CPU1/Core1 on Main Memory

CPU1/Core2 blocked by CPU2/Core1 on Main Memory

Criticality level (CL) expresses required protection against failure

Integration of mixed-criticality (MC) applications into a common platform

Spatial and timing isolation

Partitioning mechanisms (ARINC-653)

Certifiable, but…

Resource over-provisioning for high-criticality applications

Reclaiming unused resources for low-criticality applications is not possible

Expensive

Targeting mainly single-core systems

Migration to multicore platforms

Can we utilize them to schedule mixed-criticality applications efficiently, while preserving isolation?

Relaxed timing isolation: the response time of tasks with CL l must not be affected by tasks with CL lower than l.

Incremental design: the timing properties of tasks with CL l are preserved when new tasks with CL lower than l are added to the system.

Several policies for single-core and multi-core systems

Multiple execution profiles [S. Vestal, S. Baruah, D. de Niz, …]

Efficient, but…

Resource sharing not considered

No timing isolation

[Figure: Core 1 and Core 2 connected to a shared memory bus]

Task  CL  C(LO)  C(HI)
τ1    HI  5      20
τ2    LO  8      -

Possible solutions

Time-triggered memory bus

Memory throttling (servers)

Bounded delays, but…

Not flexible

Not applicable to COTS platforms

[Figure: Core 1 and Core 2 on a time-triggered memory bus; slot assignment: 1 1 1 1 2 2 2 2]


Execute mixed-criticality applications on multicore platforms efficiently while meeting certification requirements and considering effects of resource sharing.

1. Design scheduling policy that fulfills certification requirements

2. Develop response time analysis that bounds effects of resource sharing

3. Perform design optimization

System model

Cores

Access to private caches

No timing anomalies

Execution stalls until memory access completed

Crossbar between cores & banks

Time or event-driven arbitration, e.g., FCFS, RR, FP, TDMA, FlexRay, …

Shared Memory

Non-preemptive accesses

Bounded access time

Sequential memory space

The model closely matches a large class of commercial multi-/many-core architectures, such as Kalray MPPA-256 and STHorm

Periodic tasks

Possible dependencies among tasks

Superblock structure

Extended execution profiles

Memory access bounds {μmin,μmax}

Execution time bounds {emin,emax}

Degraded mode

Existing model:

Task  CL  C(LO)  C(HI)
τ1    HI  5      20
τ2    LO  8      -

Extended model:

Task  CL  LO {μmin,μmax}  LO {emin,emax}  HI {μmin,μmax}  HI {emin,emax}
τ1    HI  {8,10}          {3,5}           {6,22}          {0,800}
τ2    LO  {4,5}           {7,8}           -               -

Extended model with degraded mode (τ2 keeps a reduced profile at level HI):

Task  CL  LO {μmin,μmax}  LO {emin,emax}  HI {μmin,μmax}  HI {emin,emax}
τ1    HI  {8,10}          {3,5}           {6,22}          {0,800}
τ2    LO  {4,5}           {7,8}           {0,2}           {1,1}
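The extended task model can be written down as a small data structure. The sketch below is only an illustration: the class and field names are hypothetical, and it assumes that {μmin,μmax} bounds the number of memory accesses and {emin,emax} the execution time, with one profile per criticality level. The values are taken from the tables above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Range = Tuple[int, int]  # (min, max)


@dataclass(frozen=True)
class Profile:
    mu: Range  # assumed: min/max number of memory accesses
    e: Range   # assumed: min/max execution time


@dataclass(frozen=True)
class Task:
    name: str
    cl: str                 # criticality level: "HI" or "LO"
    lo: Profile             # profile at level LO
    hi: Optional[Profile]   # profile at level HI; None means "not executed"


def profile(task: Task, level: str) -> Optional[Profile]:
    """Return the execution profile of a task at the given level."""
    return task.lo if level == "LO" else task.hi


tau1 = Task("tau1", "HI", Profile((8, 10), (3, 5)), Profile((6, 22), (0, 800)))
# Without degraded mode, tau2 has no HI profile.
tau2_dropped = Task("tau2", "LO", Profile((4, 5), (7, 8)), None)
# With degraded mode, tau2 keeps a reduced HI profile.
tau2_degraded = Task("tau2", "LO", Profile((4, 5), (7, 8)), Profile((0, 2), (1, 1)))

print(profile(tau2_dropped, "HI"), profile(tau2_degraded, "HI"))
```

Representing the missing HI profile as `None` keeps the difference between "dropped" and "degraded" explicit.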

Model partitioned memory access

Memory interference graph: tasks, memory blocks, banks

Task-to-block edges: determined through memory analysis

Block-to-bank edges: determined through memory optimization
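A memory interference graph of this kind can be sketched as two mappings, from tasks to memory blocks and from blocks to banks. The example below is a minimal illustration with hypothetical task, block and bank names; it assumes that two tasks can interfere only if at least one block of each is placed in the same bank.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def banks_of(task: str,
             task_blocks: Dict[str, Set[str]],
             block_bank: Dict[str, str]) -> Set[str]:
    """Banks that a task may access, via the blocks it uses."""
    return {block_bank[b] for b in task_blocks[task]}


def interfering_pairs(task_blocks: Dict[str, Set[str]],
                      block_bank: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Pairs of tasks that share at least one memory bank."""
    pairs = set()
    for a, b in combinations(sorted(task_blocks), 2):
        if banks_of(a, task_blocks, block_bank) & banks_of(b, task_blocks, block_bank):
            pairs.add((a, b))
    return pairs


# Hypothetical example: which blocks each task uses (memory analysis) ...
task_blocks = {"t1": {"b1", "b2"}, "t2": {"b2"}, "t3": {"b3"}}
# ... and where each block is placed (memory optimization).
block_bank = {"b1": "bank0", "b2": "bank1", "b3": "bank0"}

print(sorted(interfering_pairs(task_blocks, block_bank)))
```

Changing only `block_bank` changes the set of interfering pairs, which is the lever a block-to-bank optimization would use.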

Time-Triggered Scheduling with Synchronization Points (TTS)

Worst-case time of barrier synchronization can be computed at design time

Slack time can be used for future tasks

Support for fixed preemption points

Synchronized “partitions” among the cores: only tasks of the same CL interfere on shared memory → Relaxed timing isolation

Slack time & empty frames for future tasks → Incremental design

Dynamic dimensioning of “partitions” according to exhibited execution profiles at runtime → Efficiency

No need for hardware support for partitioning or memory interference restriction → Applicable to COTS platforms
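How barrier times and slack could be derived at design time can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: it assumes each frame consists of a HI sub-frame followed by a LO sub-frame, that all cores synchronize at the end of a sub-frame, and that the per-task WCRT values (hypothetical numbers here) already account for memory interference within the sub-frame.

```python
from typing import List


def barrier(wcrt_per_core: List[List[int]]) -> int:
    """Worst-case sub-frame length: the slowest core determines the barrier.

    wcrt_per_core[c] lists the WCRTs of the tasks run sequentially on core c.
    """
    return max((sum(core) for core in wcrt_per_core), default=0)


def frame_slack(frame_len: int,
                hi_wcrt: List[List[int]],
                lo_wcrt: List[List[int]]) -> int:
    """Time left in a frame after both sub-frames, usable for future tasks."""
    used = barrier(hi_wcrt) + barrier(lo_wcrt)
    if used > frame_len:
        raise ValueError("frame too short for its sub-frames")
    return frame_len - used


# Hypothetical two-core frame of length 50.
hi = [[20, 10], [25]]   # HI tasks on core 0 and core 1
lo = [[8], [5, 6]]      # LO tasks on core 0 and core 1
print(barrier(hi), barrier(lo), frame_slack(50, hi, lo))  # 30 11 9
```

Because the barriers are maxima over cores, shortening the slowest core's sub-frame is the only way to increase the slack of a frame.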

Model contention on shared memory banks during each sub-frame

Bound WCRT of tasks in each sub-frame

Different execution profiles

Specify worst-case completion time of each sub-frame

Schliecker et al. [DATE 2010], Pellizzoni et al. [DATE 2010], Schranzhofer et al. [RTAS 2010, RTAS 2011]

Event models/arrival curves specify tasks’ memory access patterns

▪ Arbitration policies: FCFS, RR, TDMA, hybrid time-/event-driven

Iterative and dynamic programming approaches to compute WCRT

Conservative derivation of resource load, abstractions over dynamic arbitration → pessimistic WCRT results

Lv et al. [RTSS 2010], Gustavsson et al. [WCET 2010]

Model checking for interference analysis

▪ Arbitration policies: FCFS, TDMA

Precise modeling of arbitration → accurate WCRT results

Analysis does not scale well beyond 2 cores

State-based modeling and analysis (model checking)

Modeling with timed automata

Accurate timing analysis

Analytic abstractions (real-time calculus)

Modeling with arrival curves

Scalability

WCRT analysis

Modeling with real-time calculus & timed automata

System execution model = network of collaborating TA

Static Schedulers & Tasks

Resource arbiter TA based on implemented policy (e.g., FCFS/RR, FlexRay)

→ Input to model checker

Exact WCRT results

Model checking techniques explore exhaustively all feasible traces to find the WCRT of each superblock

State space grows exponentially with:

Number of concurrently executed automata

Number of variables and clocks

Number of synchronization channels

Example: 32-core system, 1 task/core

65 TA (32 Static Schedulers, 32 Tasks, 1 Arbiter)

97 clocks

128 synchronization channels

WCRT analysis

Core Under Analysis (CUA)

Interfering Cores

Access request stream → arrival curve [α^l, α^u]

α^u(Δ) / α^l(Δ): maximum/minimum number of access requests arriving in any interval of length Δ

[Figure: number of access requests over t [ms] and the resulting arrival curve over Δ [ms]; example interval of length Δ = 2.5 ms, t = [0 .. 2.5] ms]

The arrival curve of each interfering core represents the maximum number of access requests in any time window

Aggregate interference curve

Sum of individual arrival curves
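The two ideas above, an upper arrival curve per interfering core and their sum, can be sketched in a few lines. The example is an illustration only: it derives the curve from a recorded trace (hypothetical timestamps in units of 0.1 ms), whereas a design-time curve would have to bound every possible trace, and it counts events in half-open windows [t, t + Δ).

```python
from bisect import bisect_left
from typing import List, Sequence


def upper_arrival(timestamps: Sequence[int], delta: int) -> int:
    """Maximum number of events in any window [t, t + delta) of the trace."""
    ts = sorted(timestamps)
    best = 0
    for i, t in enumerate(ts):
        # Index of the first event at or after t + delta; it suffices to
        # try windows that start exactly at an event.
        j = bisect_left(ts, t + delta)
        best = max(best, j - i)
    return best


def aggregate(traces: List[Sequence[int]], delta: int) -> int:
    """Aggregate interference: sum of the individual upper arrival curves."""
    return sum(upper_arrival(tr, delta) for tr in traces)


# Hypothetical access traces of two interfering cores, window of 2.5 ms.
core_a = [0, 5, 10, 30, 32]
core_b = [2, 20, 24, 26]
print(upper_arrival(core_a, 25), upper_arrival(core_b, 25),
      aggregate([core_a, core_b], 25))  # 3 4 7
```

Summing per-core maxima is safe but can be pessimistic, since the individual maxima need not occur in the same time window.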

State-based component (TA)

Stateless component (RTC)

The TA guarantee that the event streams conform to the arrival curves

An interference generator emits streams of access requests bounded by the upper arrival curve and the number of processing cores

All Task and Scheduler TA of the interfering cores are substituted by only two TA

Reduced state space!

Example: 32-core system, 1 task/core

5 TA (before: 65)

5 clocks (before: 97)

6 synchronization channels (before: 128)

Empirical Evaluation

Accuracy

Evaluation of accuracy of state-of-the-art analytic approaches

E.g., Pellizzoni et al. [DATE 2010]

▪ Similar assumptions on task execution and resource arbitration

Experimental set-up

EEMBC 1.1 benchmark suite (automotive)

FCFS/RR resource arbitration

No interference abstraction

Evaluation

Feasible state-based analysis (≤ 14 min) up to 6 cores

WCRT estimates up to 27% tighter than [DATE 2010]

Evaluation of scalability of WCRT analysis

Experimental set-up

4-task automotive application

FCFS/RR/TDMA/FlexRay resource arbitration

One task on CUA, interference abstraction of remaining cores

Evaluation

FCFS: 24 cores

RR, TDMA, FlexRay: 64 cores

[Figure: scalability, FlexRay arbitration; x-axis: #processing cores (2, 4, 8, 16, 24, 32, 64)]


Mixed-Criticality Mapping Optimization

Objective: Correctness & maximization of slack time for future tasks

[Figure: design flow. Inputs: task set; platform. Stages: WCET & Memory Analysis (aiT), Interference Graph, Block-Bank Optimization, Mapping Optimization, TTS Schedule Tables. Exchanged data: mapping; memory interference.]

Mapping optimization moves:

Remap a task to another core

Remap a job to another frame

Change a task’s preemption points

WCRT analysis under resource contention


Heuristic approach based on a variation of Simulated Annealing

1. Dimension TTS cycle & frames

2. Generate initial solution

3. Explore design space (SA) until convergence OR budget exhaustion

4. Tight WCRT analysis (optional)

5. Best solution admissible?
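A generic simulated-annealing loop for task-to-core mapping is sketched below. It is a textbook variant, not the authors' variation: the cost function (maximum core load) is a stand-in assumption for the real objective, the only move is remapping one task to another core, and all loads and parameters are hypothetical.

```python
import math
import random
from typing import Dict, Tuple


def cost(mapping: Dict[str, int], load: Dict[str, int], n_cores: int) -> int:
    """Stand-in objective: load of the most heavily loaded core."""
    per_core = [0] * n_cores
    for task, core in mapping.items():
        per_core[core] += load[task]
    return max(per_core)


def anneal(load: Dict[str, int], n_cores: int, steps: int = 2000,
           t0: float = 10.0, alpha: float = 0.995,
           seed: int = 0) -> Tuple[Dict[str, int], int]:
    rng = random.Random(seed)
    tasks = sorted(load)
    current = {t: 0 for t in tasks}            # initial solution: all on core 0
    cur_cost = cost(current, load, n_cores)
    best, best_cost = dict(current), cur_cost
    temp = t0
    for _ in range(steps):                      # budget exhaustion ends the search
        cand = dict(current)
        cand[rng.choice(tasks)] = rng.randrange(n_cores)  # remap one task
        c = cost(cand, load, n_cores)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            current, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = dict(cand), c
        temp *= alpha
    return best, best_cost


load = {"t1": 5, "t2": 8, "t3": 3, "t4": 6, "t5": 4}
mapping, value = anneal(load, n_cores=2)
print(mapping, value)
```

In a full flow, the cheap cost function would be replaced by a quick WCRT analysis, and the best solution found would be checked for admissibility afterwards.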

Accurate method based on model checking

Employed as a post-processing step for best found solution(s)

versus

Method based on conservative assumptions

E.g., for RR arbitration, each access is delayed by all cores

Employed during design space exploration
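The conservative assumption for RR arbitration can be turned into a simple closed-form bound. The sketch below assumes a constant worst-case access time `t_acc` and that every access waits for one access of each other core before being served itself; the function name and the numbers are hypothetical.

```python
def rr_wcrt_bound(e_max: int, mu_max: int, n_cores: int, t_acc: int) -> int:
    """Conservative WCRT bound under round-robin arbitration.

    Each of the mu_max accesses is assumed to wait for one access of every
    other core (n_cores - 1 slots) and then takes t_acc itself, so one
    access costs n_cores * t_acc in the worst case.
    """
    return e_max + mu_max * n_cores * t_acc


# Hypothetical task: 5 time units of computation, at most 10 accesses,
# 4 cores, access time of 1 time unit.
print(rr_wcrt_bound(e_max=5, mu_max=10, n_cores=4, t_acc=1))  # 45
```

The bound ignores how many accesses the other cores can actually issue, which is why it is fast enough for design space exploration but pessimistic compared to the model-checking based analysis.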

Goals:

Comparison to partitioning and state-of-the-art scheduling policies

Applicability for industrial applications

Considered applications

Real-world application (Flight Management System)

Synthetic task sets (generated similar to [2])

Considered platforms

1-8 cores

RR-arbitrated shared memory

[2] Global Mixed-Criticality Scheduling on Multiprocessors, H. Li et al, ECRTS 2012

Evaluation of schedulability for increasing number of cores

500 synthetic task sets with varying ratio of access time to execution time (ATR)

Scalability depends not only on the number of cores, but also on the memory system

Comparison of schedulability under

TTS statically scheduled (fixed sub-frames)

500 random task sets with ATR=0.5

Impact of TTS limitations

Frames of fixed length

Fixed preemption points

No task migration

Comparison to state-of-the-art MC scheduling policies

EDF-VD for single cores [3]

GLOBAL for multicores [4]

1000 synthetic task sets (10-20 tasks) with no memory accesses

Tasks with random, equal or harmonic periods

[3] The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task sets, S. Baruah et al, ECRTS 2012

[4] Global mixed-criticality scheduling on multiprocessors, H. Li et al, ECRTS 2012

[Figure: schedulability versus system utilization, harmonic periods; annotation: Avg: -2.2% to +31%]

Comparable results with state-of-the-art policies despite restrictions for certifiability

Flight Management System

26 tasks

2 criticality levels

Harmonic periods

One computationally intensive task

▪ 3, 4, 7, 9 preemption points

Purpose                  Task                CL  Period  Access     Execution  Access
Sensor data acquisition  τ1                  HI  200     {50,108}   {1,20}     {50,85}
                         τ2                  HI  200     {0,0}      {1,20}     {0,7}
                         τ3, τ4              HI  200     {0,0}      {1,20}     {0,19}
                         τ5                  HI  200     {0,0}      {1,20}     {0,9}
Localization             τ6                  HI  200     {50,127}   {1,20}     {10,18}
                         τ7                  HI  1000    {10,18}    {1,100}    {10,18}
                         τ8                  HI  5000    {20,35}    {1,100}    {0,1}
                         τ9                  HI  1000    {20,33}    {1,100}    {0,4}
                         τ10                 HI  200     {0,0}      {1,20}     {10,20}
                         τ11                 HI  1000    {0,0}      {1,100}    {0,3}
                         τ12                 HI  200     {0,0}      {1,20}     {0,3}
Flightplan management    τ13, τ15, τ17, τ18  HI  1000    {100,200}  {1,100}    {20,100}
                         τ14, τ16, τ19, τ20  LO  1000    {100,200}  {1,100}    {20,100}
Flightplan computation   τ21                 HI  1000    {0,3}      {1,100}    {0,3}
                         τ22                 HI  1000    {30,54}    {1,100}    {20,44}
                         τ23                 HI  5000    {200,300}  {700,800}  {100,180}
Guidance                 τ24, τ25            HI  200     {0,10}     {1,20}     {0,1}
Nearest airport          τ26                 LO  1000    {100,134}  {1,100}    {200,322}

Design Space Exploration

Core allocation

Number of preemption points for task τ23

Convergence to a solution within 25 min. with quick WCRT analysis

Up to 38% lower objective when combined with memory mapping optimization

[Figure: Admissible Schedules]

Mixed-criticality scheduling on resource-sharing multicores

Relaxed timing isolation & incremental design → Certification

Dynamic adaptation to runtime scenarios → Efficiency

Tight WCRT analysis under resource contention

Ease of implementation even on COTS platforms

Advantage from considering underlying platform

Isolation

Efficiency

[EMSOFT12] G. Giannopoulou, K. Lampka, N. Stoimenov, L. Thiele, “Timed Model Checking with Abstractions: Towards Worst-Case Response Time Analysis in Resource-Sharing Manycore Systems”, pp. 63-72, Oct. 2012

[EMSOFT13] G. Giannopoulou, N. Stoimenov, P. Huang, L. Thiele, “Scheduling of Mixed-Criticality Applications on Resource-Sharing Multicore Systems”, pp. 17:1-17:15, Oct. 2013

[DATE14] G. Giannopoulou, N. Stoimenov, P. Huang, L. Thiele, “Mapping Mixed-Criticality Applications on Multi-Core Architectures”, pp. 98:1-98:6, Mar. 2014

[RTS14] K. Lampka, G. Giannopoulou, R. Pellizzoni, Z. Wu, N. Stoimenov, “A Formal Approach to the WCRT Analysis of Multicore Systems with Memory Contention under Phase-structured Task Sets”, Real-Time Systems Journal, Vol. 50, Issue 5, pp. 736-773, Nov. 2014

Thank you for your attention! Contact: giannopoulou@tik.ee.ethz.ch
