TRANSCRIPT
www.bsc.es
Francisco J. Cazorla, Mixed Criticality/Reliability Workshop
HiPEAC CSW, Barcelona, May 2014
Resource Sharing and Partitioning in Multicore
Transition to Multicore and Manycores
Wanted or imposed transition? Deal with the 'multicore paradox' (well, one of them):
– Want: the benefits of sharing, to improve average performance
– Don't want: the problems sharing brings when providing time bounds
Opportunity:
– Integrating multiple applications onto the same hardware platform: advantages in performance, production costs, and reliability
Threat:
– Software performance becomes unpredictable and hard to analyze
Challenge:– “Prove that all temporal constraints will be satisfied during operation” in
a critical system
Hardware Resource Sharing (HRS): The problem
Contention in the access to shared resources affects the execution time, and the WCET estimates, of co-running tasks. It varies across resource types:
– Stateless: bus
– Stateful: cache
Time composability shapes how HRS is handled
Composability:
– Attained whenever a property can be determined for each element of a system in isolation, and that property does not change when multiple elements are brought together.
Time composability:
– The timing behavior of a software component observed, by analysis or measurement, in isolation is not affected by the presence of other software components.
Key benefit: enables incremental verification
Time composability shapes how HRS is handled
TC execution times:
– The execution time of a software component observed in isolation is not affected by the presence of other components
TC WCET estimates:
– The WCET bound derived for a software component in isolation is not affected by the presence of other components
[Figure: normalized execution time of Task A (y-axis, 0 to 2.5) across workloads: A alone; A+B; A+C; A+B+C; compared against A's WCET estimate]
Contention in Hardware Shared Resources (HSR)
Access time to a HSR:
– In isolation: l_isol
– In a multicore: l_multi = l_isol + Δ, where Δ is the contention delay
l_multi and Δ are history dependent:
– They depend on previously executed and active requests
• They vary from request to request
– Stateless resources (bus): arbitration policy
– Stateful resources (cache):
• bank conflicts
• content conflicts
Approach:
– Derive bounds to l_multi and Δ
– Solutions provide different design points in the tightness vs. analysis-complexity vs. time-composability space
Multiple criticalities (dual)
Two cases:
– Non real-time tasks vs. real-time tasks
– Real-time tasks vs. real-time tasks
Non real-time tasks vs. real-time tasks:
– Transparent execution:
• Hardware designs preventing non real-time tasks from affecting real-time tasks
• Split long-latency operations into small interruptible micro-operations
Real-time tasks vs. real-time tasks
Stateless resources: Time-dependent bounds
Time sharing of the resource:
– The load that the set of contenders puts on the resource does not affect a given task's WCET estimate
TDMA:
– TDMA window size = N_contenders × SlotSize
– Non work-conserving
Bounds to Δ can be effectively computed:
– But they introduce complexities in the timing analysis tool
• Δ depends on the particular 'relative cycle' in which a request becomes ready
– Tighter results when the exact access cycle(s) of each request are known
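The TDMA bound can be sketched numerically. A minimal illustration (the core and cycle counts below are hypothetical, not from the talk): a request that becomes ready just after its owner's slot begins waits for almost a full TDMA window.

```python
def tdma_delay_bound(n_contenders, slot_size):
    """Conservative upper bound on the cycles a request can wait under
    TDMA bus arbitration: the request becomes ready just after its
    owner's slot starts, so it waits for the rest of that slot plus
    all the other contenders' slots before being served."""
    window = n_contenders * slot_size  # TDMA window size
    return window - 1                  # worst-case wait before service

# Hypothetical configuration: 4 cores sharing a bus, 8-cycle slots.
print(tdma_delay_bound(4, 8))  # -> 31
```

Because the bound depends only on the static schedule (window and slot size), not on what the contenders actually do, it is time composable by construction.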
Stateless resources: History-independent bounds
Bounds are valid for any history of execution:
– Round-robin arbitration
• Δ ≤ (N_contenders − 1) × L_max
Bounds to Δ can be effectively computed:
– They introduce no complexities in the timing analysis tool
What about pessimism?
– Low access frequency to the bus → low impact of Δ on execution time
– High access frequency to the bus:
• Δ is not overly pessimistic if the load on the bus is high
• Δ is pessimistic when the other tasks have low bus usage
– "I assume that I am the last in the round-robin arbitration, but the other tasks access the bus very seldom"
• Use prioritized round robin (grouping)
Illustrative example
Requestors are split into g groups; each group g_i contains n_i tasks. Round robin:
– Across the tasks in each group
– Across groups
Each task in a group of n_i tasks has (g × n_i) − 1 contenders
[Diagram: eight tasks t1…t8 arranged into arbitration groups]
Groups (tasks per group) → Δ per task:
– (1,1,1,1,1,1,1,1): 7
– (1,1,1,5): 3 / 19
– (1,2,5): 2 / 5 / 14
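The Δ values above follow directly from the hierarchical arbitration: a task in a group of n tasks is served once every g × n slots. A small sketch reproducing the table:

```python
def rr_group_deltas(group_sizes):
    """Worst-case number of interfering requests (Delta) per task under
    prioritized round robin: RR across the g groups, then RR across the
    tasks inside each group. A task in a group of n tasks gets one slot
    every g * n arbitration rounds, so Delta = g * n - 1."""
    g = len(group_sizes)
    return [g * n - 1 for n in group_sizes]

print(rr_group_deltas([1] * 8))       # flat RR over 8 tasks -> [7, ..., 7]
print(rr_group_deltas([1, 1, 1, 5]))  # -> [3, 3, 3, 19]
print(rr_group_deltas([1, 2, 5]))     # -> [2, 5, 14]
```

Grouping lets low-frequency requestors share a slot, tightening Δ for the tasks that access the bus often.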
Stateful resources: Combined analysis & partitioning
Mainly for shared last-level caches (LLC)
Arbitration delay (bank conflicts) and access delay are hard to track: the history dependence is long!
Combined analysis:
– Extending static timing analysis (STA) to determine which accesses hit/miss in the cache
• A shift in the execution invalidates the analysis
• Breaks time composability
Cache partitioning:
– Software: coloring (set partitioning)
– Hardware: columnization (way partitioning) or bankization (set partitioning)
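Software set partitioning (coloring) works because some physical-address bits select both the page and the cache set. A sketch with hypothetical cache geometry (4 KiB pages, 64 B lines, 1024 sets; none of these numbers are from the talk):

```python
PAGE_SIZE = 4096   # bytes (hypothetical)
LINE_SIZE = 64     # bytes (hypothetical)
NUM_SETS  = 1024   # sets in the shared cache (hypothetical)

SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE      # 64 sets touched by one page
NUM_COLORS    = NUM_SETS // SETS_PER_PAGE   # 16 disjoint colors

def cache_set(addr):
    """Set index: the line address modulo the number of sets."""
    return (addr // LINE_SIZE) % NUM_SETS

def page_color(addr):
    """The page-number bits that overlap the set index define the color;
    pages of different colors map to disjoint ranges of sets."""
    return (addr // PAGE_SIZE) % NUM_COLORS

# The OS can partition the cache in software by giving each task pages
# of distinct colors: their accesses can then never conflict on a set.
a, b = 0x0000, 0x1000  # two consecutive pages, colors 0 and 1
print(page_color(a), page_color(b))   # -> 0 1
print(cache_set(a), cache_set(b))     # -> 0 64
```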
Probabilistic Timing Analysis
Replacing deterministic approaches with probabilistic ones:
– Gets rid of the need (and cost) of the detailed design knowledge required to causally model the timing behavior of all system resources
Principle: make resource latencies follow a probabilistic law that can be accurately captured
Probabilistic shared cache
Given two tasks accessing a shared cache, the features that shape their interference are:
Time-deterministic cache:
– Memory mapping of the tasks, as it determines the sets accessed
– Access frequency of each task, as it affects the LRU state
– Relative order of accesses, as it affects the LRU state
Time-randomised cache:
– Memory mapping of the tasks
– Miss frequencies of the tasks (hits do not affect the cache state)
– Relative order of accesses
[1] Mladen Slijepcevic, Leonidas Kosmidis, Jaume Abella, Eduardo Quinones, Francisco J. Cazorla. Time-Analysable Non-Partitioned Shared Caches for Real-Time Multicore Systems. Design Automation Conference (DAC), San Francisco, CA, June 2014.
Probabilistic cache
Idea [1]: in time-randomised caches, limiting how often a task can evict lines from the LLC is enough to derive trustworthy and tight WCET estimates
– Controlling the LLC miss frequency
– No need to physically partition the cache
Cache as a capacity resource
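Why bounding a contender's miss frequency is enough can be seen with a back-of-the-envelope model (an illustrative sketch, not the analysis in [1]): with random replacement, each contender miss mapping to a set evicts any given resident line with probability 1/W.

```python
def survival_probability(ways, contender_misses):
    """Probability that a cached line survives k misses from a
    contending task in a W-way time-randomised set: each miss evicts
    the line with probability 1/W, independently."""
    return (1.0 - 1.0 / ways) ** contender_misses

# Capping the contender's miss frequency caps k, and with it the
# eviction probability that the probabilistic WCET analysis must
# account for; no physical partitioning is needed.
print(survival_probability(8, 10))
print(survival_probability(8, 100))
```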
On-Chip Resources vs Off-Chip Resources
On-chip resources:
– Less visible and "malleable" from the SW
– High access frequency (nanoseconds)
– Contention captured in the WCET analysis:
• Access bounds
• Combined analysis
Off-chip resources:
– Assume no (or bounded) contention in on-chip resources
– Capture the impact of contention compositionally:
• WCRT = WCET_isol + off-chip contention
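The compositional formula can be made concrete with hypothetical numbers (none of the figures below come from the talk):

```python
def wcrt_bound(wcet_isol, offchip_requests, per_request_delay):
    """Compositional WCRT bound: the WCET derived in isolation (on-chip
    contention assumed absent or already bounded), plus worst-case
    off-chip contention, modelled here as every off-chip request
    suffering the maximum interference delay."""
    return wcet_isol + offchip_requests * per_request_delay

# Hypothetical task: 1,000,000 cycles in isolation, 2,000 off-chip
# requests, each delayed by at most 100 cycles of contention.
print(wcrt_bound(1_000_000, 2_000, 100))  # -> 1200000
```

Because the off-chip term is added after the fact, the WCET in isolation stays valid when co-runners change, preserving time composability.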
Challenges
Manycores:
– Abstracting internal HSR operation vs. exposing it
– Abstracting: history-independent access-latency bounds
• Simplify analysis but introduce pessimism
– Exposing internal operation:
• The timing analysis tool can benefit from execution history
• Tracking execution history may easily result in a state explosion
Multithreaded applications:
– Identify parallelization paradigms and how they impact WCET
– Parallel-application-aware WCET analysis
– WCET-aware and parallel-application-aware architecture