
Response-Time Analysis for globally scheduled Symmetric Multiprocessor Platforms

Real-Time Systems Laboratory

RETIS Lab

Marko Bertogna, Michele Cirinei

RTSS’07

Overview

• Multiprocessor global scheduling
• Existing schedulability tests for global schedulers
• Limits of existing techniques
• Extending Response Time Analysis to multiprocessor systems:
  – Generic work-conserving schedulers
  – Global EDF
  – Global FP
• Simulations and conclusions

Global scheduling on SMP

[Figure: three processors (CPU1, CPU2, CPU3) served by a single global queue, ordered according to a given policy]

• The first m tasks are scheduled upon the m CPUs
• When a task finishes its execution, the next one in the queue is scheduled on the available CPU
• When a higher priority task arrives, it preempts the task with the lowest priority among the executing ones
• When another task ends its execution, the preempted task can resume its execution (in the figure, the task “migrates” from CPU3 to CPU1)

Global scheduling properties

• Single system-wide queue instead of multiple per-processor queues

[Figure: global scheduling (one queue shared by CPU1, CPU2 and CPU3) compared with the partitioned approach (one queue per CPU)]

Global scheduling advantages

Advantages of global schedulers w.r.t. partitioning algorithms:

• Load automatically balanced
• More efficient handling of overload conditions
• More flexible reclaiming of unused bandwidth
• Easier re-scheduling
• Lower number of preemptions (but need to limit migration cost)

On-line scheduling problem for global schedulers

• Limited performance of classical algorithms (EDF, RM, etc.)

• Pfair optimal only for implicit deadlines (Di = Ti)

• No optimal algorithm known for more general task models

• Many hybrid solutions proposed (EDF-US, RM-US, fpEDF, EDZL, etc.)

Schedulability problem for global scheduling

• All known exact tests are computationally intractable for non-trivial task sets

• Many different sufficient schedulability tests
• Big gap from necessary and sufficient conditions, which makes it difficult to compare the various scheduling policies

• Need to reduce this schedulability gap

Considered task model

• Periodic and sporadic tasks: $\tau_i = (C_i, D_i, T_i)$

• Constrained deadlines: Di ≤ Ti

• Platform composed of m identical processors

• Work-conserving global schedulers

Work-conserving scheduling policy: a processor is never idled when a task is ready to execute.

Existing schedulability tests for work-conserving global policies

• Fixed task priority:
  – Andersson et al.: utilization bound (RTSS’01), later improved and extended for constrained deadlines by Bertogna et al. (OPODIS’05)
  – Baker: demand-based polynomial test (RTSS’03, JTCC’06 and JEC’07)
  – Fisher, Baruah: load-based pseudo-polynomial tests (IASTED’06, OPODIS’07, ICDCN’08)

• Dynamic task priority:
  – Goossens et al.: EDF utilization bound (RTSJ’03), Utot ≤ m(1 − Umax) + Umax, later extended for arbitrary deadlines
  – Baker: demand-based polynomial test (RTSS’03 and TPDS’07)
  – Bertogna et al.: demand-based polynomial test (ECRTS’05)
  – Fisher, Baruah: load-based tests (ECRTS’07, RTSS’07)
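As a small illustration of the simplest test in this list, the utilization bound of Goossens et al. can be checked in a few lines. This is only a sketch: it assumes task utilizations Ui = Ci/Ti are given directly, and the function name `gfb_edf_test` is hypothetical.

```python
def gfb_edf_test(utilizations, m):
    """Sufficient global EDF test: Utot <= m * (1 - Umax) + Umax.

    `utilizations` holds the per-task utilizations Ui = Ci / Ti and
    `m` is the number of identical processors. A False result does
    not prove that the task set is unschedulable.
    """
    u_tot = sum(utilizations)
    u_max = max(utilizations)
    return u_tot <= m * (1 - u_max) + u_max


# Four light tasks on two processors pass the bound.
print(gfb_edf_test([0.25, 0.25, 0.25, 0.25], 2))
# Heavy tasks make the bound collapse towards Umax, so the test fails.
print(gfb_edf_test([0.9, 0.9, 0.1], 2))
```

The second call shows how quickly the bound degrades when a single task has high utilization, which is one reason for the gap discussed in the following slides.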

Existing schedulability tests for work-conserving global policies

• Hybrid algorithms:
  – Srinivasan, Baruah: bound for EDF-US (IPL’02), generalized by Baruah’s bound for fpEDF (Utot ≤ (m+1)/2), valid for implicit deadlines (TC’04)
  – Cirinei, Baker: EDZL demand-based polynomial test (ECRTS’07)

• Dynamic job priority:
  – Pfair (Utot ≤ m, valid only for implicit deadlines)
  – Andersson, Tovar: EKG for implicit deadlines (RTCSA’06)

• Feasibility results:
  – Fisher, Baruah: load-based pseudo-polynomial test (ECRTS’06, improved in ECRTS’07)
  – Baker, Cirinei: load-based pseudo-polynomial necessary test (RTSS’06)

Our approach

• All existing schedulability tests perform poorly

• A better analysis of worst-case situations is needed

• Refine the estimation of the maximum interference a task can impose on other tasks

• Apply Response Time Analysis (RTA) to multiprocessor systems; then check whether WCRTi ≤ Di for all tasks

RTA for Uniprocessors

The synchronous arrival of all tasks is a critical instant: we can compute the worst-case interferences considering that situation.

[Figure: tasks τ1, τ2, …, τn released synchronously, with every job released as soon as permitted]

• Synchronous arrivals
• Jobs released as soon as permitted

RTA for Uniprocessors

• For FP, the worst-case response time of a task is given by the first instance released at a critical instant

• For EDF, it is given by an instance in a busy interval starting with a critical instant

With these observations it is possible to compute the WCRT of all tasks. Example: for FP, the WCRT of a task k is given by the fixed point of:

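$$R_k = C_k + \sum_{i \in hp(k)} \left\lceil \frac{R_k}{T_i} \right\rceil C_i$$

where hp(k) is the set of tasks with priority higher than task k.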
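The uniprocessor fixed-point iteration above can be sketched as follows. This is a minimal illustration, assuming integer task parameters and tasks listed in decreasing priority order; the function name `uniprocessor_rta` and the tuple layout `(C, D, T)` are choices made for this example.

```python
def uniprocessor_rta(tasks):
    """Worst-case response times under uniprocessor fixed priority.

    `tasks` is a list of (C, D, T) tuples in decreasing priority order.
    Returns one entry per task: the WCRT, or None if the iteration
    exceeds the deadline D before reaching a fixed point.
    """
    results = []
    for k, (c_k, d_k, _) in enumerate(tasks):
        r = c_k  # start the iteration from the task's own execution time
        while True:
            # C_k plus ceil(R_k / T_i) * C_i for every higher priority task
            nxt = c_k + sum(-(-r // t_i) * c_i for c_i, _, t_i in tasks[:k])
            if nxt > d_k:
                r = None  # deadline exceeded: no fixed point within D_k
                break
            if nxt == r:
                break     # fixed point reached
            r = nxt
        results.append(r)
    return results


print(uniprocessor_rta([(1, 4, 4), (2, 6, 6), (3, 12, 12)]))  # [1, 3, 10]
print(uniprocessor_rta([(2, 3, 3), (2, 4, 4)]))               # [2, None]
```

The integer ceiling is written as `-(-r // t_i)` to avoid floating-point rounding.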

And for Multiprocessor?

For global schedulers, things are much more difficult:

• The synchronous arrival of tasks doesn’t represent a critical instant.

• Difficult to find a worst-case situation in which to compute the maximum response times.

• Need to introduce some pessimistic assumptions to make things easier

Introducing the interference

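$I_k$ = total interference suffered by task k

$I_k^i$ = interference of task i on task k

$$R_k = C_k + I_k(R_k) = C_k + \frac{1}{m} \sum_{i \neq k} I_k^i(R_k)$$

$$I_k(R_k) = \frac{1}{m} \sum_{i \neq k} I_k^i(R_k)$$

[Figure: schedule of CPU1, CPU2 and CPU3 in the interval from $r_k$ to $r_k + R_k$, showing the execution of task k and the interference contributions $I_k^1, \dots, I_k^8$ of the other tasks]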

Limiting the interference

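IDEA: it is sufficient to consider at most the portion $R_k - C_k + 1$ of each term $I_k^i$ in the sum.

It can be proved that WCRTk is given by the fixed point of:

$$R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(I_k^i(R_k),\; R_k - C_k + 1\right) \right\rfloor$$

[Figure: schedule of CPU1, CPU2 and CPU3 in the interval from $r_k$ to $r_k + R_k$, with the contributions $I_k^1, \dots, I_k^8$ and the terms $I_k^i(R_k)$, $I_k(R_k)$ and $R_k - C_k + 1$ annotated]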

Bounding the interference

Exactly computing the interference is complex

Pessimistic assumptions:

1. Bound the interference of a task with the workload: $I_k^i(R_k) \le W_i(R_k)$.

2. Use an upper bound on the workload.

Improving the estimation of the workload using slack values

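Consider a situation in which:
• The first job executes as close as possible to its deadline
• Successive jobs execute as soon as possible

$$W_i(L) \le w_i(L) = N_i(L)\, C_i + \varepsilon_i(L)$$

where:

$$N_i(L) = \left\lfloor \frac{L + D_i - C_i}{T_i} \right\rfloor \qquad \text{(number of jobs, excluding the last one)}$$

$$\varepsilon_i(L) = \min\left(C_i,\; L + D_i - C_i - N_i(L)\, T_i\right) \qquad \text{(last job)}$$

[Figure: window of length L in which the first job of task i completes at its deadline $D_i$, the following jobs execute $C_i$ units as soon as they are released every $T_i$, and the last job contributes $\varepsilon_i$]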

RTA for generic global schedulers

• An upper bound on the WCRT of task k is given by the fixed point of Rk in the iteration:

$$R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(w_i(R_k),\; R_k - C_k + 1\right) \right\rfloor$$

• The slack of task k is at least:

$$S_k = D_k - R_k$$






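When a slack lower bound $S_i$ is known for the interfering task, its first job must complete at least $S_i$ before the deadline, and the workload bound becomes:

$$W_i(L) \le w_i(L, S_i) = N_i(L, S_i)\, C_i + \varepsilon_i(L, S_i)$$

where:

$$N_i(L, S_i) = \left\lfloor \frac{L + D_i - C_i - S_i}{T_i} \right\rfloor$$

$$\varepsilon_i(L, S_i) = \min\left(C_i,\; L + D_i - C_i - S_i - N_i(L, S_i)\, T_i\right)$$

[Figure: window of length L in which the first job of task i completes $S_i$ before its deadline $D_i$ and the following jobs execute $C_i$ units as soon as they are released every $T_i$]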
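The workload bound with slack can be written as a short function. The sketch below assumes integer parameters with 0 ≤ S ≤ D − C; the name `workload_bound` is hypothetical, and setting `S=0` gives the bound without slack.

```python
def workload_bound(L, C, D, T, S=0):
    """Upper bound w_i(L, S_i) on the workload of one task in a window of length L.

    C, D, T are the task's worst-case execution time, relative deadline
    and period; S is a lower bound on its slack (0 if none is known).
    """
    # Number of jobs in the window, excluding the last one.
    n_jobs = (L + D - C - S) // T
    # Contribution of the last job, which may be only partly inside the window.
    last_job = min(C, L + D - C - S - n_jobs * T)
    return n_jobs * C + last_job


# Task with C=2, D=T=5 in a window of length 6.
print(workload_bound(6, 2, 5, 5))       # 4: two full jobs fit
print(workload_bound(6, 2, 5, 5, S=3))  # 3: a known slack tightens the bound
```

In the second call the first job must finish 3 time units before its deadline, so the second job is released later and only one of its two execution units falls inside the window.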

RTA for generic global schedulers

• An upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration:

1. $R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(w_i(R_k, S_i),\; R_k - C_k + 1\right) \right\rfloor$

2. $S_k = D_k - R_k$

If a fixed point Rk ≤ Dk is reached for every task k in the system, the task set is schedulable with any work-conserving global scheduler.

Iterative schedulability algorithm

1. All slacks initialized to zero

2. Compute slack lower bound for tasks 1,…,n
   – If higher than the old value, update the slack bound
   – If lower, do nothing

3. If all tasks have a positive slack lower bound return success

4. If no slack has been updated for tasks 1,…,n return fail

5. Otherwise, return to point 2
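The five steps above can be sketched as follows for a generic work-conserving scheduler. This is an illustration under stated assumptions, not the authors' implementation: parameters are integers, tasks are `(C, D, T)` tuples, all function names are hypothetical, and step 3 is implemented as "every task reached a fixed point Rk ≤ Dk in this round".

```python
def workload_bound(L, C, D, T, S):
    """Workload bound w_i(L, S_i) for one interfering task."""
    n_jobs = (L + D - C - S) // T
    return n_jobs * C + min(C, L + D - C - S - n_jobs * T)


def response_time_bound(k, tasks, slacks, m):
    """Iterate R_k from C_k; return the fixed point, or None if it exceeds D_k."""
    C_k, D_k, _ = tasks[k]
    R = C_k
    while R <= D_k:
        total = sum(
            min(workload_bound(R, C, D, T, slacks[i]), R - C_k + 1)
            for i, (C, D, T) in enumerate(tasks) if i != k
        )
        nxt = C_k + total // m  # floor of the summed interference over m CPUs
        if nxt == R:
            return R
        R = nxt
    return None


def rta_schedulable(tasks, m):
    """Sufficient test for any work-conserving global scheduler on m CPUs."""
    slacks = [0] * len(tasks)                       # step 1
    while True:
        updated, all_ok = False, True
        for k, (_, D_k, _) in enumerate(tasks):     # step 2
            R = response_time_bound(k, tasks, slacks, m)
            if R is None:
                all_ok = False
            elif D_k - R > slacks[k]:
                slacks[k] = D_k - R                 # keep only improvements
                updated = True
        if all_ok:
            return True                             # step 3
        if not updated:
            return False                            # step 4
        # step 5: otherwise repeat with the improved slack values


print(rta_schedulable([(1, 4, 4), (1, 4, 4), (2, 6, 6)], m=2))  # True
print(rta_schedulable([(3, 4, 4), (3, 4, 4), (3, 4, 4)], m=2))  # False
```

Slack values never decrease and are bounded by the deadlines, so the outer loop terminates. The second task set has total utilization above 2, so no test could accept it on two processors.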

Refining the analysis for particular policies

• We can exploit further information on the scheduling algorithm in use to tighten the bounds on interference and workload

• Refined analysis for:
  – Fixed Priority
  – EDF

RTA for Fixed Priority

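• The interference on higher priority tasks is always null (tasks indexed by decreasing priority):

$$I_k^i(R_k) = 0, \quad \forall i > k$$

• For a system scheduled with FP, an upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration:

1. $R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i < k} \min\left(w_i(R_k, S_i),\; R_k - C_k + 1\right) \right\rfloor$

2. $S_k = D_k - R_k$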
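For fixed priority the iteration only sums over higher priority tasks, so a single pass in decreasing priority order is enough. The sketch below assumes integer parameters and tasks listed from highest to lowest priority; `rta_fp` and `workload_bound` are hypothetical names.

```python
def workload_bound(L, C, D, T, S):
    """Workload bound w_i(L, S_i) for one interfering task."""
    n_jobs = (L + D - C - S) // T
    return n_jobs * C + min(C, L + D - C - S - n_jobs * T)


def rta_fp(tasks, m):
    """Response time bounds under global FP on m CPUs.

    `tasks` is a list of (C, D, T) in decreasing priority order.
    Returns the list of bounds R_k, or None if some task has no
    fixed point within its deadline.
    """
    bounds, slacks = [], []
    for k, (C_k, D_k, _) in enumerate(tasks):
        R = C_k
        while R <= D_k:
            # Only tasks 0..k-1 (higher priority) can interfere with task k.
            total = sum(
                min(workload_bound(R, C, D, T, slacks[i]), R - C_k + 1)
                for i, (C, D, T) in enumerate(tasks[:k])
            )
            nxt = C_k + total // m
            if nxt == R:
                break
            R = nxt
        if R > D_k:
            return None
        bounds.append(R)
        slacks.append(D_k - R)  # slack is final once computed
    return bounds


print(rta_fp([(1, 4, 4), (1, 4, 4), (2, 6, 6)], m=2))  # [1, 1, 3]
print(rta_fp([(3, 4, 4), (3, 4, 4), (3, 4, 4)], m=2))  # None
```

Because the slack of each higher priority task is already known when task k is analyzed, no outer loop over slack updates is needed.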

RTA for EDF

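• The bound used for generic work-conserving schedulers is still valid:

$$I_k^i(R_k) \le w_i(R_k, S_i)$$

• A different bound can be derived by analyzing the worst-case workload in a situation in which:
  – The interfering and interfered tasks have a common deadline
  – All jobs execute as late as possible

[Figure: jobs of task i executing as late as possible in a window of length $D_k$ that ends at a deadline shared by task i and task k; the earliest job completes $S_i$ before its deadline]

$$I_k^i(R_k) \le I_k^i(D_k) \le w_i^{EDF}(D_k, S_i)$$

with:

$$DBF_k^i = \left( \left\lfloor \frac{D_k - D_i}{T_i} \right\rfloor + 1 \right) C_i$$

and:

$$w_i^{EDF}(D_k, S_i) = DBF_k^i + \min\left(C_i,\; \max\left(0,\; D_k - DBF_k^i \frac{T_i}{C_i} - S_i\right)\right)$$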

RTA for EDF

• For a system scheduled with EDF, an upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration:

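1. $R_k \leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k} \min\left(w_i(R_k, S_i),\; w_i^{EDF}(D_k, S_i),\; R_k - C_k + 1\right) \right\rfloor$

2. $S_k = D_k - R_k$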

If a fixed point Rk ≤ Dk is reached for every task k in the system, the task set is schedulable with global EDF.
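A possible rendering of the EDF test combines both workload bounds inside the same slack-update loop used for generic schedulers. This is a sketch under assumptions: integer parameters, constrained deadlines, hypothetical function names, and the EDF bound written as DBF plus the contribution of one job that completes at least S before its deadline.

```python
def workload_bound(L, C, D, T, S):
    """Generic workload bound w_i(L, S_i)."""
    n_jobs = (L + D - C - S) // T
    return n_jobs * C + min(C, L + D - C - S - n_jobs * T)


def edf_workload_bound(D_k, C, D, T, S):
    """EDF bound: jobs as late as possible, deadline shared with task k."""
    n_jobs = (D_k - D) // T + 1  # jobs released and due inside the window
    dbf = n_jobs * C
    # One more job may overlap the start of the window.
    return dbf + min(C, max(0, D_k - n_jobs * T - S))


def edf_response_time_bound(k, tasks, slacks, m):
    """Fixed point of the EDF iteration for task k, or None if above D_k."""
    C_k, D_k, _ = tasks[k]
    R = C_k
    while R <= D_k:
        total = sum(
            min(workload_bound(R, C, D, T, slacks[i]),
                edf_workload_bound(D_k, C, D, T, slacks[i]),
                R - C_k + 1)
            for i, (C, D, T) in enumerate(tasks) if i != k
        )
        nxt = C_k + total // m
        if nxt == R:
            return R
        R = nxt
    return None


def rta_edf_schedulable(tasks, m):
    """Sufficient global EDF test on m CPUs; tasks are (C, D, T) tuples."""
    slacks = [0] * len(tasks)
    while True:
        updated, all_ok = False, True
        for k, (_, D_k, _) in enumerate(tasks):
            R = edf_response_time_bound(k, tasks, slacks, m)
            if R is None:
                all_ok = False
            elif D_k - R > slacks[k]:
                slacks[k] = D_k - R
                updated = True
        if all_ok:
            return True
        if not updated:
            return False


print(rta_edf_schedulable([(1, 4, 4), (1, 4, 4), (2, 6, 6)], m=2))  # True
print(rta_edf_schedulable([(3, 4, 4), (3, 4, 4), (3, 4, 4)], m=2))  # False
```

Python's floor division rounds towards minus infinity, so `n_jobs` is 0 when the interfering task's deadline is longer than `D_k`, which matches the floor in the DBF formula.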

Complexity

• Pseudo-polynomial complexity.
• Depends on the order in which the slack lower bounds are updated.
• We verified the schedulability of millions of task sets in a few minutes on a normal device.
• Test particularly fast for Fixed Priority systems: at most one slack update per task, if slacks are updated in decreasing priority order.

Experimental results for EDF

• 2 processors

• Constrained deadlines

• 1,000,000 task sets generated

• Our test is consistently superior at all utilizations

[Figure: “Improvement over existing solutions”. Number of task sets versus task set utilization for Bertogna et al.’05, Baker et al.’07, Goossens et al.’03, RTA-EDF (our test) and the total generated task sets]

Experimental results for FP

• 2 processors



• Our test is consistently superior at all utilizations

[Figure: number of task sets versus task set utilization for the density bound, Baker et al.’07, Bertogna et al.’05, RTA-FP (our test) and the total generated task sets]

FP vs EDF

• 4 processors



• Our FP test is consistently superior to all other tests at every utilization

[Figure: number of task sets versus task set utilization for Goossens et al.’03, RTA-EDF (our EDF test), Baker et al.’07, RTA-FP (our FP test) and the total generated task sets]

Evaluations

• Our test behaves better than any existing polynomial and pseudo-polynomial schedulability test in the literature

• However, it doesn’t dominate all of them
• Resource augmentation bound needed
• The test is also sustainable (a non-trivial resource augmentation bound can probably be achieved)

Conclusions

• Multiprocessor Real-Time systems are a promising field to explore.

• Still few existing results, far from tight conditions.
• We contributed to filling this gap.
• Future work:
  – Find tighter schedulability tests.
  – Use our techniques to analyze the efficiency of other scheduling algorithms (EDZL, EDF-US, FP-DS, etc.).
  – Take into account exclusive resource access.
  – Integrate into a Resource Reservation framework.

Marko Bertogna, PhD student


Thank you

Moore’s law effects

[Figure: power (W, logarithmic scale from 0.1 to 1000) versus year (1971 to 2008) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P1, P2, P3 and P4 processors, with “hot-plate” and “nuclear reactor” reference levels; the Pentium Tejas was cancelled]

• Clock speed limited to less than 4 GHz
• Leakage current intolerable @ 90nm

Motivations

• Improve computing performance at reasonable power consumption.

• Multiprocessor-based architectures:
  – High-level computing: Intel’s Pentium D, Core 2 Duo, Itanium and Xeon; AMD’s Opteron, Quad FX and Athlon64 X2; etc.
  – Embedded market: TI’s OMAP, NXP’s Nexperia, STM’s Nomadik, ARM’s MPCore, Sony-IBM-Toshiba’s Cell, and many others.

• How to program these devices?

Multiprocessor scheduling anomalies

• Scheduling problem is in general NP-hard.
• Schedulability problem is NP-hard as well.
• Dhall’s effect significantly degrades the performance of classical scheduling algorithms.
• Synchronous instant is not “critical”.
• Only sufficient schedulability conditions.

[Figure: example of Dhall’s effect, with a deadline miss at Utot → 1]
