Response-Time Analysis for globally scheduled Symmetric Multiprocessor Platforms
Real-Time Systems Laboratory
RETIS Lab
Marko Bertogna, Michele Cirinei
RTSS’07
Overview
• Multiprocessor global scheduling
• Existing schedulability tests for global schedulers
• Limits of existing techniques
• Extending Response Time Analysis to multiprocessor systems:
  – Generic work-conserving schedulers
  – Global EDF
  – Global FP
• Simulations and conclusions
Global scheduling on SMP
[Figure: a global queue, ordered according to a given policy, feeding CPU1, CPU2 and CPU3]
• The first m tasks in the queue are scheduled upon the m CPUs.
• When a task finishes its execution, the next one in the queue is scheduled on the available CPU.
• When a higher-priority task arrives, it preempts the task with the lowest priority among the executing ones.
• When another task ends its execution, the preempted task can resume its execution, possibly on a different CPU (in the figure, the task "migrated" from CPU3 to CPU1).
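The dispatching rule on these slides (the m highest-priority ready jobs run, the others wait in the single global queue) can be sketched in a few lines of Python. This is only an illustration: the `dispatch` helper and the `(priority, name)` job representation are assumptions of the sketch, a lower priority value is taken to mean a more urgent job, and migration cost is not modelled.

```python
def dispatch(ready_jobs, m):
    """Split the ready jobs into (running, waiting) for m identical CPUs.

    ready_jobs: list of (priority, name) pairs; a lower priority value means
    a more urgent job (an assumption of this sketch, the policy is generic).
    """
    queue = sorted(ready_jobs)       # single system-wide queue
    return queue[:m], queue[m:]      # the first m jobs run, the rest wait


running, waiting = dispatch([(3, "c"), (1, "a"), (4, "d"), (2, "b")], m=3)

# A more urgent job "z" arrives: the lowest-priority running job is preempted
# and goes back to the global queue.
running2, waiting2 = dispatch(running + waiting + [(0, "z")], m=3)
```

Re-running `dispatch` whenever a job arrives or completes reproduces the preempt-and-resume behaviour described above; which CPU a resumed job lands on is left open, which is where migrations come from.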
Global scheduling properties
• Single system-wide queue instead of multiple per-processor queues
[Figure: global scheduling (one queue shared by CPU1, CPU2 and CPU3) vs. partitioned approach (one queue per CPU)]
Global scheduling advantages
Advantages of global schedulers w.r.t. partitioning algorithms:
• Load automatically balanced
• More efficient handling of overload conditions
• More flexible reclaiming of unused bandwidth
• Easier re-scheduling
• Lower number of preemptions (but need to limit migration cost)
On-line scheduling problem for global schedulers
• Limited performance of classical algorithms (EDF, RM, etc.)
• Pfair optimal only for implicit deadlines (Di = Ti)
• No optimal algorithm known for more general task models
• Many hybrid solutions proposed (EDF-US, RM-US, fpEDF, EDZL, etc.)
Schedulability problem for global scheduling
• All known exact tests are computationally intractable for non-trivial task sets
• Many different sufficient schedulability tests
• Big gap from necessary and sufficient conditions: difficult to “compare” the various scheduling policies
• Need to reduce this schedulability gap
Considered task model
• Periodic and sporadic tasks: τi = (Ci, Di, Ti), i.e. worst-case execution time, relative deadline, and period (or minimum inter-arrival time)
• Constrained deadlines: Di ≤ Ti
• Platform composed of m identical processors
• Work-conserving global schedulers
Work-conserving scheduling policy: a processor is never idled when a task is ready to execute.
Existing schedulability tests for work-conserving global policies
• Fixed task priority:
  – Andersson et al.: utilization bound (RTSS’01), later improved and extended for constrained deadlines by Bertogna et al. (OPODIS’05)
  – Baker: demand-based polynomial test (RTSS’03, JTCC’06 and JEC’07)
  – Fisher, Baruah: load-based pseudo-polynomial tests (IASTED’06, OPODIS’07, ICDCN’08)
• Dynamic task priority:
  – Goossens et al.: EDF utilization bound (RTSJ’03), Utot ≤ m(1 − Umax) + Umax, later extended for arbitrary deadlines
  – Baker: demand-based polynomial test (RTSS’03 and TPDS’07)
  – Bertogna et al.: demand-based polynomial test (ECRTS’05)
  – Fisher, Baruah: load-based tests (ECRTS’07, RTSS’07)
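As a point of reference, the EDF utilization bound of Goossens et al., Utot ≤ m(1 − Umax) + Umax, can be checked in a few lines. The sketch below assumes implicit deadlines (the setting of the original bound) and takes the per-task utilizations Ci/Ti as input; the function name `gfb_edf_schedulable` is hypothetical.

```python
def gfb_edf_schedulable(utilizations, m):
    """Sufficient global-EDF test: Utot <= m * (1 - Umax) + Umax.

    utilizations: list of Ci/Ti values (implicit deadlines assumed).
    A False result is inconclusive, since the test is only sufficient.
    """
    u_tot = sum(utilizations)
    u_max = max(utilizations)
    return u_tot <= m * (1 - u_max) + u_max


light = gfb_edf_schedulable([0.3, 0.3, 0.3, 0.3], m=2)  # 1.2 against a bound of 1.7
heavy = gfb_edf_schedulable([0.9, 0.5, 0.5], m=2)       # 1.9 against a bound of 1.1
```

The second call shows how a single heavy task (Umax = 0.9) collapses the bound, which is one reason the sufficient tests listed here leave a large gap.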
• Hybrid algorithms:
  – Srinivasan, Baruah: bound for EDF-US (IPL’02), generalized by Baruah’s bound for fpEDF (Utot ≤ (m+1)/2), valid for implicit deadlines (TC’04)
  – Cirinei, Baker: EDZL demand-based polynomial test (ECRTS’07)
• Dynamic job priority:
  – Pfair (Utot ≤ m, valid only for implicit deadlines)
  – Andersson, Tovar: EKG for implicit deadlines (RTCSA’06)
• Feasibility results:
  – Fisher, Baruah: load-based pseudo-polynomial test (ECRTS’06, improved in ECRTS’07)
  – Baker, Cirinei: load-based pseudo-polynomial necessary test (RTSS’06)
Our approach
• All existing schedulability tests have poor performance
• A better analysis of worst-case situations is needed
• Refine the estimation of the maximum interference a task can impose on other tasks
• Apply Response Time Analysis (RTA) to multiprocessor systems; then check if WCRTi ≤ Di for all tasks
RTA for Uniprocessors
The synchronous arrival of all tasks is a critical instant: we can compute the worst-case interferences considering that situation.
[Figure: timelines of tasks τ1, τ2, …, τn, all released at the same instant]
• Synchronous arrivals
• Jobs released as soon as permitted
RTA for Uniprocessors
• For FP, the worst-case response time of a task is given by the first instance released at a critical instant
• For EDF, it is given by an instance in a busy interval starting with a critical instant
With these observations it is possible to compute the WCRT of all tasks. Example: for FP, the WCRT of a task k is given by the fixed point of:
$$R_k = C_k + \sum_{i \in hp(k)} \left\lceil \frac{R_k}{T_i} \right\rceil C_i$$
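The uniprocessor fixed point, R_k = C_k + Σ_{i ∈ hp(k)} ⌈R_k / T_i⌉ C_i, can be computed by simple iteration starting from R_k = C_k. The sketch below is a minimal illustration assuming integer task parameters and tasks given as (C, D, T) tuples sorted by decreasing priority, so that hp(k) is `tasks[:k]`; the function name is hypothetical.

```python
def rta_uniprocessor(tasks, k):
    """WCRT of tasks[k] under fixed priority on one CPU, or None if it exceeds D_k.

    tasks: list of (C, D, T) integer tuples sorted by decreasing priority
    (an indexing assumption of this sketch).
    """
    C_k, D_k, _ = tasks[k]
    R = C_k
    while R <= D_k:
        # ceil(R / T_i) computed with integers to avoid floating-point error
        R_next = C_k + sum(((R + T_i - 1) // T_i) * C_i for C_i, _, T_i in tasks[:k])
        if R_next == R:
            return R          # fixed point reached
        R = R_next
    return None               # the iteration went past the deadline


tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
wcrt = [rta_uniprocessor(tasks, k) for k in range(len(tasks))]
overloaded = rta_uniprocessor([(2, 3, 3), (2, 3, 3)], 1)
```

For the three-task example the iteration for the lowest-priority task visits 3, 6, 7, 9 and stops at 10, within its deadline of 12.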
And for Multiprocessor?
For global schedulers, things are much more difficult:
• The synchronous arrival of tasks doesn’t represent a critical instant.
• Difficult to find a worst-case situation in which to compute the maximum response times.
• Need to introduce some pessimistic assumptions to make things easier
Introducing the interference
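$I_k$ = total interference suffered by task k
$I_k^i$ = interference of task i on task k
$$I_k(R_k) = \frac{1}{m} \sum_{i \neq k} I_k^i(R_k)$$
$$R_k = C_k + I_k(R_k) = C_k + \frac{1}{m} \sum_{i \neq k} I_k^i(R_k)$$
[Figure: schedule on CPU1, CPU2 and CPU3 in the window from r_k to r_k + R_k, showing the job of task k and the interference contributions $I_k^1, \dots, I_k^8$ of the other tasks]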
Limiting the interference
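IDEA: it is sufficient to consider at most the portion $R_k - C_k + 1$ of each term $I_k^i$ in the sum.
It can be proved that WCRTk is given by the fixed point of:
$$R_k = C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(I_k^i(R_k),\; R_k - C_k + 1\right) \right\rfloor$$
[Figure: schedule on CPU1, CPU2 and CPU3 in the window from r_k to r_k + R_k, with the interference contributions $I_k^1, \dots, I_k^8$, annotated with $I_k^i(R_k)$, $I_k(R_k)$ and $R_k - C_k + 1$]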
Bounding the interference
Exactly computing the interference is complex.
Pessimistic assumptions:
1. Bound the interference of a task with its workload: $I_k^i(R_k) \le W_i(R_k)$.
2. Use an upper bound on the workload.
Improving the estimation of the workload using slack values
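Consider a situation in which:
• The first job executes as close as possible to its deadline
• Successive jobs execute as soon as possible
$$W_i(L) \le w_i(L) = N_i(L)\, C_i + \varepsilon_i(L)$$
where:
$$N_i(L) = \left\lfloor \frac{L + D_i - C_i}{T_i} \right\rfloor \quad \text{(number of jobs, excluding the last one)}$$
$$\varepsilon_i(L) = \min\left(C_i,\; L + D_i - C_i - N_i(L)\, T_i\right) \quad \text{(last job)}$$
[Figure: window of length L over the jobs of τi; labels C_i, D_i, T_i, ε_i]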
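The workload bound for this situation, N_i(L) C_i + min(C_i, L + D_i − C_i − N_i(L) T_i) with N_i(L) = ⌊(L + D_i − C_i) / T_i⌋, is cheap to evaluate. A minimal sketch, assuming integer parameters; the function name `workload_bound` is illustrative.

```python
def workload_bound(L, C, D, T):
    """Upper bound on the work a task (C, D, T) can execute in a window of length L."""
    n_jobs = (L + D - C) // T                 # N_i(L): jobs excluding the last one
    last = min(C, L + D - C - n_jobs * T)     # contribution of the last job
    return n_jobs * C + last


w_short = workload_bound(L=2, C=2, D=5, T=5)
w_long = workload_bound(L=12, C=2, D=5, T=5)
w_longer = workload_bound(L=13, C=2, D=5, T=5)
```

With C = 2 and D = T = 5, a window of length 12 can contain the late first job plus two complete later jobs (6 units of work), and one extra time unit lets a fourth job start.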
RTA for generic global schedulers
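• An upper bound on the WCRT of task k is given by the fixed point of Rk in the iteration:
$$R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(w_i(R_k),\; R_k - C_k + 1\right) \right\rfloor$$
• The slack of task k is at least:
$$S_k = D_k - R_k$$
[Figure: the interval D_k split into the response-time bound R_k followed by the slack S_k]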
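A minimal Python sketch of this fixed-point iteration for a single task follows. It assumes integer task parameters, tasks given as (C, D, T) tuples, the workload bound w_i(L) = N_i(L) C_i + min(C_i, L + D_i − C_i − N_i(L) T_i) with N_i(L) = ⌊(L + D_i − C_i) / T_i⌋, and an integer (rounded-down) division of the summed interference by m. All names are illustrative.

```python
def workload_bound(L, C, D, T):
    """w_i(L): workload bound of a task (C, D, T) in a window of length L."""
    n_jobs = (L + D - C) // T
    return n_jobs * C + min(C, L + D - C - n_jobs * T)


def rta_global(tasks, k, m):
    """Response-time upper bound of tasks[k] on m CPUs, or None if it exceeds D_k."""
    C_k, D_k, _ = tasks[k]
    R = C_k
    while R <= D_k:
        interference = sum(
            min(workload_bound(R, C, D, T), R - C_k + 1)
            for i, (C, D, T) in enumerate(tasks) if i != k
        )
        R_next = C_k + interference // m
        if R_next == R:
            return R          # fixed point reached
        R = R_next
    return None               # no fixed point within the deadline


tasks = [(1, 10, 10), (1, 10, 10), (2, 10, 10)]
bounds = [rta_global(tasks, k, m=2) for k in range(len(tasks))]
slacks = [D - R for (_, D, _), R in zip(tasks, bounds)]
```

Because no assumption is made on the queue ordering, every other task is counted as interfering, so the bound holds for any work-conserving global scheduler but can be loose.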
Improving the estimation of the workload using slack values
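Consider a situation in which:
• The first job executes as close as possible to its deadline
• Successive jobs execute as soon as possible
$$W_i(L) \le w_i(L, S_i) = N_i(L, S_i)\, C_i + \varepsilon_i(L, S_i)$$
where:
$$N_i(L, S_i) = \left\lfloor \frac{L + D_i - C_i - S_i}{T_i} \right\rfloor$$
$$\varepsilon_i(L, S_i) = \min\left(C_i,\; L + D_i - C_i - S_i - N_i(L, S_i)\, T_i\right)$$
[Figure: window of length L over the jobs of τi, with the first job completing S_i before its deadline; labels C_i, D_i, T_i, S_i]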
RTA for generic global schedulers
• An upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration:
1. $R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(w_i(R_k, S_i),\; R_k - C_k + 1\right) \right\rfloor$
2. $S_k = D_k - R_k$
If a fixed point Rk ≤ Dk is reached for every task k in the system, the task set is schedulable with any work-conserving global scheduler.
Iterative schedulability algorithm
1. All slacks initialized to zero
2. Compute slack lower bound for tasks 1,…,n
   – If higher than the old value, update the slack bound
   – If lower, do nothing
3. If all tasks have a positive slack lower bound return success
4. If no slack has been updated for tasks 1,…,n return fail
5. Otherwise, return to point 2
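The five steps can be sketched in Python as follows. The sketch assumes integer parameters, tasks as (C, D, T) tuples, and the slack-aware workload bound w_i(L, S_i) = N_i C_i + min(C_i, L + D_i − C_i − S_i − N_i T_i) with N_i = ⌊(L + D_i − C_i − S_i) / T_i⌋. It treats a task as verified as soon as a fixed point Rk ≤ Dk is found, i.e. it accepts a slack of zero; that reading of step 3 is an assumption of the sketch, as are all function names.

```python
def workload_bound(L, C, D, T, S=0):
    """w_i(L, S_i): workload bound for a task whose slack is at least S."""
    n_jobs = (L + D - C - S) // T
    return n_jobs * C + min(C, L + D - C - S - n_jobs * T)


def response_time_bound(tasks, slacks, k, m):
    """Fixed point of the RTA iteration for tasks[k], or None if it exceeds D_k."""
    C_k, D_k, _ = tasks[k]
    R = C_k
    while R <= D_k:
        total = sum(
            min(workload_bound(R, C, D, T, slacks[i]), R - C_k + 1)
            for i, (C, D, T) in enumerate(tasks) if i != k
        )
        R_next = C_k + total // m
        if R_next == R:
            return R
        R = R_next
    return None


def schedulable(tasks, m):
    slacks = [0] * len(tasks)                     # step 1
    verified = [False] * len(tasks)
    while True:
        updated = False
        for k, (_, D_k, _) in enumerate(tasks):   # step 2
            R = response_time_bound(tasks, slacks, k, m)
            if R is None:
                continue
            verified[k] = True
            if D_k - R > slacks[k]:               # only ever raise the bound
                slacks[k] = D_k - R
                updated = True
        if all(verified):                         # step 3
            return True
        if not updated:                           # step 4
            return False
        # step 5: otherwise repeat with the improved slack bounds


ok = schedulable([(1, 10, 10), (1, 10, 10), (2, 10, 10)], m=2)
ko = schedulable([(2, 3, 3), (2, 3, 3), (2, 3, 3)], m=2)
```

Slack bounds only grow from one round to the next, so the workload bounds only shrink and the loop terminates: each round either finishes or raises at least one integer slack that is bounded by the task's deadline.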
Refining the analysis for particular policies
• We can exploit further information on the scheduling algorithm in use to tighten the bounds on interference and workload
• Refined analysis for:
  – Fixed Priority
  – EDF
RTA for Fixed Priority
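• The interference of lower-priority tasks on higher-priority tasks is always null (tasks indexed by decreasing priority):
$$I_k^i(R_k) = 0, \quad \forall i > k$$
• For a system scheduled with FP, an upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration:
1. $R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i < k} \min\left(w_i(R_k, S_i),\; R_k - C_k + 1\right) \right\rfloor$
2. $S_k = D_k - R_k$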
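For fixed priority, only higher-priority tasks enter the sum, so the tasks can be analysed once each, in decreasing priority order, reusing the slack of the tasks already analysed. A minimal sketch under those assumptions (integer parameters, tasks as (C, D, T) tuples sorted by decreasing priority, the slack-aware workload bound w_i(L, S_i), illustrative names):

```python
def workload_bound(L, C, D, T, S=0):
    """w_i(L, S_i): workload bound for a task whose slack is at least S."""
    n_jobs = (L + D - C - S) // T
    return n_jobs * C + min(C, L + D - C - S - n_jobs * T)


def rta_fp(tasks, m):
    """Response-time bounds for all tasks under global FP, or None on failure.

    tasks: list of (C, D, T) integer tuples sorted by decreasing priority.
    """
    bounds, slacks = [], []
    for k, (C_k, D_k, _) in enumerate(tasks):
        R = C_k
        while R <= D_k:
            total = sum(
                min(workload_bound(R, C, D, T, slacks[i]), R - C_k + 1)
                for i, (C, D, T) in enumerate(tasks[:k])   # higher priority only
            )
            R_next = C_k + total // m
            if R_next == R:
                break             # fixed point reached
            R = R_next
        if R > D_k:
            return None           # task k cannot be verified
        bounds.append(R)
        slacks.append(D_k - R)    # used when analysing lower-priority tasks
    return bounds


fp_bounds = rta_fp([(1, 4, 4), (2, 6, 6), (3, 12, 12)], m=2)
fp_fail = rta_fp([(2, 3, 3), (2, 3, 3), (2, 3, 3)], m=2)
```

In the first example the two highest-priority tasks always find a free CPU, so their bounds equal their execution times; the third task can lose at most one time unit to them.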
RTA for EDF
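• The bound $I_k^i(R_k) \le w_i(R_k, S_i)$ is still valid.
• A different bound can be derived by analyzing the worst-case workload in a situation in which:
  – The interfering and interfered tasks have a common deadline
  – All jobs execute as late as possible
$$I_k^i(R_k) \le I_k^i(D_k) \le w_i^{EDF}(D_k, S_i)$$
with:
$$w_i^{EDF}(D_k, S_i) = DBF_k^i + \min\left(C_i,\; \max\left(0,\; D_k - DBF_k^i \frac{T_i}{C_i} - S_i\right)\right)$$
and:
$$DBF_k^i = \left(\left\lfloor \frac{D_k - D_i}{T_i} \right\rfloor + 1\right) C_i$$
[Figure: jobs of τi and τk with a common deadline at the end of a window of length D_k; labels C_i, D_i, T_i, S_i]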
RTA for EDF
• For a system scheduled with EDF, an upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration:
1. $R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(w_i(R_k, S_i),\; w_i^{EDF}(D_k, S_i),\; R_k - C_k + 1\right) \right\rfloor$
2. $S_k = D_k - R_k$
If a fixed point Rk ≤ Dk is reached for every task k in the system, the task set is schedulable with global EDF.
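The per-task term of the EDF iteration takes the minimum of three quantities: the generic workload bound over R_k, the EDF-specific bound over D_k, and R_k − C_k + 1. The sketch below assumes the EDF bound has the form DBF + min(C_i, max(0, D_k − (DBF / C_i) T_i − S_i)) with DBF = (⌊(D_k − D_i) / T_i⌋ + 1) C_i, integer parameters, and constrained deadlines; all function names are illustrative.

```python
def workload_bound(L, C, D, T, S=0):
    """w_i(L, S_i): generic workload bound in a window of length L."""
    n_jobs = (L + D - C - S) // T
    return n_jobs * C + min(C, L + D - C - S - n_jobs * T)


def edf_workload_bound(D_k, C, D, T, S=0):
    """w_i^EDF(D_k, S_i): workload of task (C, D, T) in a window of length D_k
    when its last deadline coincides with the end of the window."""
    n_jobs = (D_k - D) // T + 1      # jobs released and due inside the window
    dbf = n_jobs * C
    carry_in = min(C, max(0, D_k - n_jobs * T - S))   # earlier job, run as late as possible
    return dbf + carry_in


def edf_interference_term(R_k, C_k, D_k, task_i, S_i=0):
    """One term of the sum in the EDF iteration for the interfering task_i."""
    C, D, T = task_i
    return min(
        workload_bound(R_k, C, D, T, S_i),
        edf_workload_bound(D_k, C, D, T, S_i),
        R_k - C_k + 1,
    )


term = edf_interference_term(R_k=8, C_k=3, D_k=10, task_i=(2, 5, 5))
```

In this example the generic bound gives 5 and the cap gives 6, while the EDF bound gives 4, so the EDF-specific term is the one that tightens the iteration. The full test plugs this term into the same slack-update loop used for generic schedulers.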
Complexity
• Pseudo-polynomial complexity.
• Depends on the order in which the slack lower bounds are updated.
• We verified the schedulability of millions of task sets in a few minutes on a normal device.
• Test particularly fast for Fixed Priority systems: at most one slack update per task, if slacks are updated in decreasing priority order.
Experimental results for EDF
• 2 processors
• Constrained deadlines
• 1,000,000 task sets generated
• Our test is consistently superior at all utilizations
[Figure: number of task sets vs. task set utilization; curves: Bertogna et al.’05, Baker et al.’07, Goossens et al.’03, RTA-EDF (our test), total task sets generated; annotation: improvement over existing solutions]
Experimental results for FP
• 2 processors
• Constrained deadlines
• 1,000,000 task sets generated
• Our test is consistently superior at all utilizations
[Figure: number of task sets vs. task set utilization; curves: density bound, Baker et al.’07, Bertogna et al.’05, RTA-FP (our test), total task sets generated]
FP vs EDF
• 4 processors
• Constrained deadlines
• 1,000,000 task sets generated
• Our FP test is consistently superior to all other tests at every utilization
[Figure: number of task sets vs. task set utilization; curves: Goossens et al.’03, RTA-EDF (our EDF test), Baker et al.’07, RTA-FP (our FP test), total task sets generated]
Evaluations
• Our test behaves better than any existing polynomial and pseudo-polynomial schedulability test in the literature
• However, it doesn’t dominate all of them
• Resource augmentation bound needed
• The test is also sustainable (probably a non-trivial resource augmentation bound can be achieved)
Conclusions
• Multiprocessor real-time systems are a promising field to explore.
• Still few existing results, far from tight conditions.
• We contributed to filling this gap.
• Future work:
  – Find tighter schedulability tests.
  – Use our techniques to analyze the efficiency of other scheduling algorithms (EDZL, EDF-US, FP-DS, etc.).
  – Take into account exclusive resource access.
  – Integrate into a Resource Reservation framework.
Moore’s law effects
[Figure: power (W, logarithmic scale from 0.1 to 1000) vs. year (1971 to 2008) for Intel processors: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P1, P2, P3, P4; reference levels "hot-plate" and "nuclear reactor"; "STOP" mark at the cancelled Pentium Tejas]
• Clock speed limited to less than 4 GHz
• Leakage current intolerable @ 90nm
Motivations
• Improve computing performance at reasonable power consumption.
• Multiprocessor-based architectures:
  – High-level computing: Intel’s Pentium D, Core 2 Duo, Itanium and Xeon; AMD’s Opteron, Quad FX and Athlon64 X2; etc.
  – Embedded market: TI’s OMAP, NXP’s Nexperia, STM’s Nomadik, ARM’s MPCore, Sony-IBM-Toshiba’s Cell, and many others.
• How to program these devices?
Multiprocessor scheduling anomalies
• Scheduling problem is in general NP-hard.
• Schedulability problem is NP-hard as well.
• Dhall’s effect significantly degrades the performance of classical scheduling algorithms.
• Synchronous instant is not “critical”.
• Only sufficient schedulability conditions.
[Figure: Dhall’s effect, a deadline miss occurring with Utot close to 1]