TRANSCRIPT
www.bsc.es
Francisco J. Cazorla, Mixed Criticality/Reliability Workshop
HiPEAC CSW, Barcelona, May 2014
Resource Sharing and Partitioning in Multicore
Transition to Multicore and Manycores
Wanted or imposed transition? Deal with the 'multicore paradox' (well, one of them):
– Want: the benefits of sharing, to improve average performance
– Don't want: the problems sharing brings when providing time bounds
Opportunity:
– Integrating multiple applications onto the same hardware platform: advantages in performance, production costs, and reliability
Threat:
– Software performance becomes unpredictable and hard to analyze
Challenge:– “Prove that all temporal constraints will be satisfied during operation” in
a critical system
Hardware Resource Sharing (HRS): The problem
Contention in the access to shared resources affects the execution time, and the WCET estimates, of co-running tasks. It varies across resource types:
– Stateless: bus
– Stateful: cache
Time composability shapes how HRS is handled
Composability:
– Attained whenever a property can be determined for each element of a system in isolation, and that property does not change when multiple elements are brought together.
Time composability:
– The timing behavior of a software component observed, by analysis or measurement, in isolation is not affected by the presence of other software components.
Key benefit: enables incremental verification
Time composability shapes how HRS is handled
TC execution times:
– The execution time of a software component observed in isolation is not affected by the presence of other components
TC WCET estimates:
– The WCET bound derived for a software component in isolation is not affected by the presence of other components
[Figure: normalized execution time of Task A (y-axis, 0 to 2.5) across workloads: A alone; A+B; A+C; A+B+C; compared against A's WCET estimate]
Contention in Hardware Shared Resources (HSR)
Access time to a HSR:
– In isolation: l_isol
– In a multicore: l_multi = l_isol + Δ, where Δ is the contention delay
l_multi and Δ are history dependent:
– They depend on previously executed and active requests
• They vary from request to request
– Stateless resources (bus): arbitration policy
– Stateful resources (cache):
• bank conflicts
• content conflicts
Approach:
– Derive bounds to l_multi and Δ
– Solutions provide different design points in the tightness vs. analysis-complexity vs. time-composability space
Multiple criticalities (dual)
Two cases:
– Non real-time tasks vs. real-time tasks
– Real-time tasks vs. real-time tasks
Non real-time tasks vs. real-time tasks:
– Transparent execution:
• Hardware designs preventing non real-time tasks from affecting real-time tasks
• Split long-latency operations into small interruptible micro-operations
Real-time tasks vs. real-time tasks
Stateless resources: Time-dependent bounds
Time sharing of the resource:
– The load that the set of contenders puts on the resource does not affect a given task's WCET estimate
TDMA:
– TDMA window size = N_contenders × SlotSize
– Non work-conserving
Bounds to Δ can be effectively computed:
– But they introduce complexities in the timing analysis tool
• Δ depends on the particular 'relative cycle' in which a request becomes ready
– Tighter results when the exact access cycle(s) of each request are known
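The TDMA bound can be sketched numerically. A minimal illustration (the core and cycle counts below are hypothetical, not from the talk): a request that becomes ready just after its owner's slot begins waits for almost a full TDMA window.

```python
def tdma_delay_bound(n_contenders, slot_size):
    """Conservative upper bound on the cycles a request can wait under
    TDMA bus arbitration: the request becomes ready just after its
    owner's slot starts, so it waits for the rest of that slot plus
    all the other contenders' slots before being served."""
    window = n_contenders * slot_size  # TDMA window size
    return window - 1                  # worst-case wait before service

# Hypothetical configuration: 4 cores sharing a bus, 8-cycle slots.
print(tdma_delay_bound(4, 8))  # -> 31
```

Because the bound depends only on the static schedule (window and slot size), not on what the contenders actually do, it is time composable by construction.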
Stateless resources: History-independent bounds
Bounds are valid for any history of execution:
– Round-robin arbitration
• Δ ≤ (N_contenders − 1) × L_max
Bounds to Δ can be effectively computed:
– They introduce no complexities in the timing analysis tool
What about pessimism?
– Low access frequency to the bus → low impact of Δ on execution time
– High access frequency to the bus:
• Δ is not overly pessimistic if the load on the bus is high
• Δ is pessimistic when the other tasks have low bus usage
– "I assume that I am the last in the round-robin arbitration, but the other tasks access the bus very seldom"
• Use prioritized round robin (grouping)
Illustrative example
Requestors are split into g groups; each group g_i contains n_i tasks. Round robin:
– Across the tasks in each group
– Across groups
Each task in a group of n_i tasks has (g × n_i) − 1 contenders
[Diagram: eight tasks t1…t8 arranged into arbitration groups]
Groups (tasks per group) → Δ per task:
– (1,1,1,1,1,1,1,1): 7
– (1,1,1,5): 3 / 19
– (1,2,5): 2 / 5 / 14
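The Δ values above follow directly from the hierarchical arbitration: a task in a group of n tasks is served once every g × n slots. A small sketch reproducing the table:

```python
def rr_group_deltas(group_sizes):
    """Worst-case number of interfering requests (Delta) per task under
    prioritized round robin: RR across the g groups, then RR across the
    tasks inside each group. A task in a group of n tasks gets one slot
    every g * n arbitration rounds, so Delta = g * n - 1."""
    g = len(group_sizes)
    return [g * n - 1 for n in group_sizes]

print(rr_group_deltas([1] * 8))       # flat RR over 8 tasks -> [7, ..., 7]
print(rr_group_deltas([1, 1, 1, 5]))  # -> [3, 3, 3, 19]
print(rr_group_deltas([1, 2, 5]))     # -> [2, 5, 14]
```

Grouping lets low-frequency requestors share a slot, tightening Δ for the tasks that access the bus often.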
Stateful resources: Combined analysis & partitioning
Mainly for shared last-level caches (LLC)
Arbitration delay (bank conflicts) and access delay are hard to track: the history dependence is long!
Combined analysis:
– Extending static timing analysis (STA) to determine which accesses hit/miss in the cache
• A shift in the execution invalidates the analysis
• Breaks time composability
Cache partitioning:
– Software: coloring (set partitioning)
– Hardware: columnization (way partitioning) or bankization (set partitioning)
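Software set partitioning (coloring) works because some physical-address bits select both the page and the cache set. A sketch with hypothetical cache geometry (4 KiB pages, 64 B lines, 1024 sets; none of these numbers are from the talk):

```python
PAGE_SIZE = 4096   # bytes (hypothetical)
LINE_SIZE = 64     # bytes (hypothetical)
NUM_SETS  = 1024   # sets in the shared cache (hypothetical)

SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE      # 64 sets touched by one page
NUM_COLORS    = NUM_SETS // SETS_PER_PAGE   # 16 disjoint colors

def cache_set(addr):
    """Set index: the line address modulo the number of sets."""
    return (addr // LINE_SIZE) % NUM_SETS

def page_color(addr):
    """The page-number bits that overlap the set index define the color;
    pages of different colors map to disjoint ranges of sets."""
    return (addr // PAGE_SIZE) % NUM_COLORS

# The OS can partition the cache in software by giving each task pages
# of distinct colors: their accesses can then never conflict on a set.
a, b = 0x0000, 0x1000  # two consecutive pages, colors 0 and 1
print(page_color(a), page_color(b))   # -> 0 1
print(cache_set(a), cache_set(b))     # -> 0 64
```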
Probabilistic Timing Analysis
Replacing deterministic approaches with probabilistic ones:
– Gets rid of the need (and cost) of the detailed design knowledge required to causally model the timing behavior of all system resources
Principle: make resource latencies follow a probabilistic law that can be accurately captured
Probabilistic shared cache
Given two tasks accessing a shared cache, the features that shape their interference are:
Time-deterministic cache:
– Memory mapping of the tasks, as it determines the sets accessed
– Access frequency of each task, as it affects the LRU state
– Relative order of accesses, as it affects the LRU state
Time-randomised cache:
– Memory mapping of the tasks
– Miss frequencies of the tasks (hits do not affect the cache state)
– Relative order of accesses
[1] Mladen Slijepcevic, Leonidas Kosmidis, Jaume Abella, Eduardo Quinones, Francisco J. Cazorla. Time-Analysable Non-Partitioned Shared Caches for Real-Time Multicore Systems. Design Automation Conference (DAC), San Francisco, CA, June 2014.
Probabilistic cache
Idea [1]: in time-randomised caches, limiting how often a task can evict lines from the LLC is enough to derive trustworthy and tight WCET estimates
– Controlling the LLC miss frequency
– No need to physically partition the cache
Cache as a capacity resource
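Why bounding a contender's miss frequency is enough can be seen with a back-of-the-envelope model (an illustrative sketch, not the analysis in [1]): with random replacement, each contender miss mapping to a set evicts any given resident line with probability 1/W.

```python
def survival_probability(ways, contender_misses):
    """Probability that a cached line survives k misses from a
    contending task in a W-way time-randomised set: each miss evicts
    the line with probability 1/W, independently."""
    return (1.0 - 1.0 / ways) ** contender_misses

# Capping the contender's miss frequency caps k, and with it the
# eviction probability that the probabilistic WCET analysis must
# account for; no physical partitioning is needed.
print(survival_probability(8, 10))
print(survival_probability(8, 100))
```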
On-Chip Resources vs Off-Chip Resources
On-chip resources:
– Less visible and "malleable" from the SW
– High access frequency (nanoseconds)
– Contention captured in the WCET analysis:
• Access bounds
• Combined analysis
Off-chip resources:
– Assume no (or bounded) contention in on-chip resources
– Capture the impact of contention compositionally:
• WCRT = WCET_isol + off-chip contention
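The compositional formula can be made concrete with hypothetical numbers (none of the figures below come from the talk):

```python
def wcrt_bound(wcet_isol, offchip_requests, per_request_delay):
    """Compositional WCRT bound: the WCET derived in isolation (on-chip
    contention assumed absent or already bounded), plus worst-case
    off-chip contention, modelled here as every off-chip request
    suffering the maximum interference delay."""
    return wcet_isol + offchip_requests * per_request_delay

# Hypothetical task: 1,000,000 cycles in isolation, 2,000 off-chip
# requests, each delayed by at most 100 cycles of contention.
print(wcrt_bound(1_000_000, 2_000, 100))  # -> 1200000
```

Because the off-chip term is added after the fact, the WCET in isolation stays valid when co-runners change, preserving time composability.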
Challenges
Manycores:
– Abstracting internal HSR operation vs. exposing it
– Abstracting: history-independent access-latency bounds
• Simplify analysis but introduce pessimism
– Exposing internal operation:
• The timing analysis tool can benefit from execution history
• Tracking execution history may easily result in a state explosion
Multithreaded applications:
– Identify parallelization paradigms and how they impact WCET
– Parallel-application-aware WCET analysis
– WCET-aware and parallel-application-aware architecture