Carnegie Mellon
R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems
Junsung Kim, Karthik Lakshmanan and Raj Rajkumar
Electrical and Computer Engineering, Carnegie Mellon University
Outline
Motivation
Goals and System Models
R-BATCH: Task Allocations with Replicas
Performance Evaluation
Conclusion
Autonomous Vehicles: Background
GM Chevy Tahoe named “Boss” won the 2007 DARPA Urban Challenge
Autonomous Vehicles: Background
Boss
Senses the environment
Fuses sensor data to form a model of the real world
Plans navigation paths
Actuates the steering wheel, brake, and accelerator
Boss requires
Safety-critical operations
Timing guarantees
Robustness to harsh environments
Autonomous Vehicles: Architecture
0.5 million lines of code for autonomous driving support
10 dual-core processors + 50 embedded processors
< Boss System Architecture >
<From C. Urmson et al.’s Tartan Racing: A Multi-Modal Approach to the DARPA Urban Challenge>
Autonomous Vehicles: Capabilities
0.5 million lines of code for autonomous driving support
10 dual-core processors + 50 embedded processors
Requires high computational capability with timeliness guarantees
Adding more processors
Using high-performance processors
Processor Reliability Trend
<Figure: failure rate vs. log time (years in service) for 800 nm (1989), 130 nm (2000), and 32 nm (2010) process nodes, showing the infant-mortality (random, extrinsic) and wear-out (intrinsic) regions>
<From Mark White’s Product Reliability and Qualification Challenges with CMOS Scaling>
Outline
Motivation
Goals and System Models
R-BATCH: Task Allocations with Replicas
Performance Evaluation
Conclusion
Goals for Fault-Tolerance
Handle permanent processor failures
Tolerate a given number of processor failures
Avoid losing functionality by adding more resources in an affordable way
Hardware replication
Software replication
Re-execution of failed jobs
Lower quality of service of tasks
Deal with the unpredictable nature of failures
Consider all possible scenarios?
System Model (1 of 2)
Primary fault model: fail-stop
An entity stops functioning when it fails instead of alternating between correct and wrong outputs
Fault containment can be guaranteed
Consider a set of periodic tasks
Periodic task τ_i is represented by (C_i, T_i)
C_i: worst-case execution time of task τ_i
T_i: period of task τ_i
Task utilization: U_i = C_i / T_i
Total utilization in a processor: U = Σ_i C_i / T_i
System Model (2 of 2)
Task classifications
Hard recovery task
cannot miss its deadline even if a failure occurs
e.g., automotive engine control
Soft recovery task
can be recovered in the next period
e.g., navigation, chassis unit control
Best-effort recovery task
can be recovered if there is enough room after a failure
Hard Recovery Task
<Figure: timeline 0, 𝑇, 2𝑇, 3𝑇 across Processor 1 and Processor 2; a failure occurs 𝐶−𝜖 into the job’s execution, and the task should be recovered within the current period>
Soft Recovery Task
<Figure: timeline 0, 𝑇, 2𝑇, 3𝑇 across Processor 1 and Processor 2; a failure occurs 𝐶−𝜖 into the job’s execution, and the task should be recovered within the next period>
Task Replication
Hot Standby | Cold Standby
Can recover Hard Recovery tasks | Can recover Soft Recovery tasks
Running multiple copies of a feature | Dormant until activated
No timing penalty | Delayed recovery time
Utilization loss | No utilization loss without failures

Observations
Hot Standby: the primary and the backups run at the same time
Cold Standby: one Cold Standby can recover several tasks on different processors
Shared system state is available in all processors (by using a network bus architecture)
Hard Recovery Task with Hot Standby
<Figure: timeline 0, 𝑇, 2𝑇, 3𝑇; the primary on Processor 1 fails 𝐶−𝜖 into the job, and the task is recovered via the Hot Standby running in parallel on Processor 2, within the current period>
Soft Recovery Task with Cold Standby
<Figure: timeline 0, 𝑇, 2𝑇, 3𝑇; the primary on Processor 1 fails 𝐶−𝜖 into the job, and the task is recovered via the Cold Standby activated on Processor 2 in the next period>
Example Scenarios
With 5 tasks and 4 processors
nP: Primary of task n; nH: Hot Standby of task n; nC: Cold Standby of task n
<Figure: three allocations of the primaries, Hot Standbys, and Cold Standbys of tasks 1–5 across processors P1–P4: the original allocation, the allocation after P3 failed, and the allocation after P1 failed>
Outline
Motivation
Goals and System Models
R-BATCH: Task Allocations with Replicas
Performance Evaluation
Conclusion
R-BATCH
Reliable Bin-packing Algorithm for Tasks with Cold standby and Hot standby
Reliable task allocation
Allocates Hot Standbys
Allocates Cold Standbys
Uniprocessor Schedulability*
Consider a set of periodic tasks
Periodic task τ_i is represented by (C_i, T_i), with C_i the worst-case execution time and T_i the period
Task utilization: U_i = C_i / T_i; total utilization in a processor: U = Σ_i C_i / T_i
Schedulability
For EDF (Earliest Deadline First)
Tasks are schedulable if U ≤ 100%
More complex; misbehaves at higher U
For RMS (Rate Monotonic Scheduling)
For general tasks: schedulable if U ≤ n(2^(1/n) − 1) ≈ 69.3% (lower utilization)
For harmonic tasks: schedulable if U ≤ 100%
Practical
<* C.L. Liu and J.W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 1973>
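The two utilization tests above are easy to check mechanically. A minimal Python sketch (not from the talk; the task utilizations below are made-up values):

```python
def edf_schedulable(utils):
    """EDF: schedulable iff total utilization does not exceed 100%."""
    return sum(utils) <= 1.0

def rms_schedulable(utils):
    """RMS (Liu & Layland): sufficient test against n(2^(1/n) - 1),
    which tends to ln 2 (about 69.3%) as n grows."""
    n = len(utils)
    return sum(utils) <= n * (2 ** (1 / n) - 1)

utils = [0.3, 0.2, 0.15]          # U_i = C_i / T_i for three tasks
print(edf_schedulable(utils))      # True  (0.65 <= 1.0)
print(rms_schedulable(utils))      # True  (0.65 <= ~0.78 for n = 3)
```

For harmonic task sets the RMS bound rises to 100%, so the EDF-style test applies there as well.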
Bin-packing Problem
Definition: The problem of packing a set of items into the fewest number of bins such that the total size does not exceed the bin capacity*
Items: utilizations of each task
Bins: processors
Then, given a set of tasks, how many bins (processors) do we need?†
<Figure: tasks Ti, Tj, Tk, Tm packed as items into processor P, the bin>
<* Mark Allen Weiss, from Data Structures and Algorithm Analysis, Addison>
<† D. Oh and T. Baker. Utilization bounds for n-processor rate monotonic scheduling with static processor assignment. Real-Time Systems, 1998.>
The Classical Approach: Bin-packing
Bin packing is used to allocate tasks to multiprocessor platforms
Best-Fit Decreasing (BFD) algorithm
Step 1: Sort the objects in descending order of size
Step 2: Sort the bins in descending order of consumed space
Step 3: Fit the next object into the first sorted bin that fits
If no bin fits, add a new bin
Step 4: If objects remain, go to Step 2
Step 5: Done
Given a set of tasks: {0.6, 0.3, 0.2}
<Figure: BFD packing of tasks 1 (0.6), 2 (0.3), and 3 (0.2) into processors P1–P4>
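The BFD steps above can be sketched directly. A hedged Python illustration, assuming a bin capacity of 1.0 (an EDF-style 100% utilization bound per processor, which the slide does not fix):

```python
def bfd(utils, capacity=1.0):
    """Best-Fit Decreasing: pack task utilizations into as few bins
    (processors) as the heuristic manages; returns the bins."""
    bins = []
    for u in sorted(utils, reverse=True):       # Step 1: largest item first
        # Steps 2-3: visit bins fullest-first and take the first that fits,
        # i.e. the feasible bin left with the least remaining space
        bins.sort(key=sum, reverse=True)
        for b in bins:
            if sum(b) + u <= capacity:
                b.append(u)
                break
        else:
            bins.append([u])                    # no bin fits: open a new one
    return bins

print(bfd([0.6, 0.3, 0.2]))    # [[0.6, 0.3], [0.2]] -- two processors
```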
BFD with Placement Constraints
We also have to deal with replicated tasks
Under the placement constraint (BFD-P*)
No two replicas can be on the same processor
Otherwise, a processor failure will take down both replicas
<* J. Chen, C. Yang, T.W., and S.Y. Tseng. Real-Time task replication for fault tolerance in identical multiprocessor systems. In Proceedings of the 13th IEEE RTAS, IEEE CS, 2007.>
Given tasks {1: 0.6, 2: 0.3, 3: 0.2}, each with a primary (nP) and a Hot Standby (nH)
<Figure: BFD-P allocation of 1P, 2P, 3P, 1H, 2H, 3H across processors P1–P4, with no task’s two replicas sharing a processor>
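A sketch of BFD-P: the same heuristic, but each item carries its task id, and a bin is ruled out when it already holds a replica of that task (the 1.0 capacity is again an assumption):

```python
def bfd_p(replicas, capacity=1.0):
    """BFD with the placement constraint: `replicas` is a list of
    (task_id, utilization) pairs covering primaries and standbys;
    no two replicas of one task may share a processor."""
    bins = []                                   # each bin: [(task_id, u), ...]
    for tid, u in sorted(replicas, key=lambda r: r[1], reverse=True):
        bins.sort(key=lambda b: sum(x[1] for x in b), reverse=True)
        for b in bins:
            fits = sum(x[1] for x in b) + u <= capacity
            disjoint = tid not in {x[0] for x in b}   # placement constraint
            if fits and disjoint:
                b.append((tid, u))
                break
        else:
            bins.append([(tid, u)])
    return bins

# primaries plus one Hot Standby per task, as in the slide's example
replicas = [(1, 0.6), (2, 0.3), (3, 0.2), (1, 0.6), (2, 0.3), (3, 0.2)]
print(len(bfd_p(replicas)))      # 4 processors
```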
Can BFD-P Be Improved?
Given a set of tasks: {0.6, 0.3, 0.2} with 2 replicas each
<Figure: the BFD-P allocation of the primaries and Hot Standbys across P1–P4>
We can, however, reduce the number of bins as follows:
<Figure: an improved allocation packing the same primaries and Hot Standbys into fewer processors>
Reliable BFD (R-BFD)
R-BFD Algorithm
Step 1: Sort the tasks in decreasing order of utilization
Step 2: Allocate each primary task to the bin that will have the smallest remaining space
Step 3: Set i = 1
Step 4: Allocate the ith replica of each task to the bin that will have the smallest remaining space
Step 5: Increment i and repeat Step 4 until all replicas are allocated
<Figure: the R-BFD allocation of the primaries and Hot Standbys, using fewer processors than BFD-P>
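The R-BFD rounds can be sketched as follows; this is a simplified reading of the steps (capacity 1.0 assumed, `n_replicas` Hot Standbys per task), not the paper's implementation:

```python
def r_bfd(tasks, n_replicas=1, capacity=1.0):
    """R-BFD sketch: `tasks` is a list of (task_id, utilization).
    Steps 1-2 place the primaries largest-first; Steps 3-5 place the
    i-th replica of every task, round by round, never putting two
    copies of one task on the same processor."""
    order = sorted(tasks, key=lambda t: t[1], reverse=True)   # Step 1
    bins = []

    def place(tid, u):
        # best fit: the feasible bin that ends up with the least remaining space
        best = None
        for b in bins:
            used = sum(x[1] for x in b)
            if used + u <= capacity and tid not in {x[0] for x in b}:
                if best is None or used > sum(x[1] for x in best):
                    best = b
        if best is None:
            bins.append([(tid, u)])       # open a new processor
        else:
            best.append((tid, u))

    for tid, u in order:                  # Step 2: primaries
        place(tid, u)
    for _ in range(n_replicas):           # Steps 3-5: replica rounds
        for tid, u in order:
            place(tid, u)
    return bins

print(len(r_bfd([(1, 0.6), (2, 0.3), (3, 0.2)])))   # 3 processors vs. 4 for BFD-P
```

Placing whole rounds of replicas after all primaries is what lets the standbys interleave with other tasks' primaries, recovering the bin that BFD-P wastes.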
Save More Processors with Cold Standby
Given a set of tasks: {0.6, 0.3, 0.2} with 3 replicas each to tolerate 2 processor failures
Instead of using two more processors, add an “empty” processor to hold a “virtual task”
<Figure: left, an allocation with a second Hot Standby per task spreading 1H, 2H, 3H over processors P1–P5; right, the primaries and first Hot Standbys on P1–P4 plus P5 holding the Cold Standbys 1C (0.6), 2C (0.3), 3C (0.2)>
Cold Standby with Virtual Task
Virtual task: a guaranteed utilization reserving slack for recovering from failures via Cold Standby
Generate Virtual Tasks
Step 1: Create a new virtual task by selecting the task with the highest utilization across all processors that is not yet covered by a virtual task
Step 2: Compare the size of the virtual task with tasks on different processors, and check whether those tasks can be recovered by using the virtual task
Step 3: Go to Step 1 if there are remaining tasks
<Figure: the generated virtual task 1C (0.6) covers tasks 1, 2, and 3; the Cold Standbys 2C (0.3) and 3C (0.2) fit within its reservation>
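One way to read Steps 1–3 as code, under the fail-stop model's single-failure view (only one processor's tasks need recovery at a time). The allocation map and the per-processor load check are my own simplifications, not the paper's exact coverage test:

```python
def generate_virtual_tasks(alloc):
    """Virtual-task generation sketch.  `alloc` maps a processor name to
    its list of (task_id, utilization).  A virtual task of size s can
    cover a group of tasks as long as the covered utilization on any one
    processor stays within s, since at most one processor fails at a time."""
    uncovered = [(p, tid, u) for p, ts in alloc.items() for tid, u in ts]
    virtual = []
    while uncovered:
        # Step 1: the largest uncovered task fixes the virtual task's size
        uncovered.sort(key=lambda t: t[2], reverse=True)
        size = uncovered[0][2]
        covered, load = [], {}          # load: covered utilization per processor
        for p, tid, u in uncovered:
            # Step 2: cover the task if its processor's covered load still fits
            if load.get(p, 0.0) + u <= size:
                covered.append(tid)
                load[p] = load.get(p, 0.0) + u
        virtual.append((size, covered))
        uncovered = [t for t in uncovered if t[1] not in covered]   # Step 3
    return virtual

alloc = {"P1": [(1, 0.6)], "P2": [(2, 0.3)], "P3": [(3, 0.2)]}
print(generate_virtual_tasks(alloc))   # [(0.6, [1, 2, 3])] -- 1C covers all three
```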
R-BATCH
Reliable Bin-packing Algorithm for Tasks with Cold Standby and Hot Standby
Step 1: Perform R-BFD with the primaries and Hot Standbys
Step 2: Generate virtual tasks
Step 3: Perform R-BFD with the virtual tasks
<Figure: the R-BATCH allocation across P1–P4: the primaries and Hot Standbys packed by R-BFD, plus a virtual task of size 0.6 holding the Cold Standbys 1C (0.6), 2C (0.3), 3C (0.2)>
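Putting the three steps together in one hedged end-to-end sketch. Step 2 is folded into a single virtual task per Cold Standby round, sized by the largest utilization (a simplification of the generation procedure on the previous slide: with one failure at a time, that much slack can re-admit any one task); capacity 1.0 is assumed:

```python
def r_batch(tasks, n_hot=1, n_cold=1, capacity=1.0):
    """R-BATCH sketch over `tasks` = [(task_id, utilization), ...]:
    Step 1: R-BFD the primaries plus n_hot Hot Standby rounds.
    Step 2: size one virtual task per Cold Standby round by the
            largest utilization (one processor fails at a time).
    Step 3: R-BFD the virtual tasks into the same bins."""
    bins = []

    def place(tid, u):
        # best fit under the placement constraint (no duplicate ids per bin)
        best = None
        for b in bins:
            used = sum(x[1] for x in b)
            if used + u <= capacity and tid not in {x[0] for x in b}:
                if best is None or used > sum(x[1] for x in best):
                    best = b
        if best is None:
            bins.append([(tid, u)])
        else:
            best.append((tid, u))

    order = sorted(tasks, key=lambda t: t[1], reverse=True)
    for _ in range(1 + n_hot):                  # Step 1
        for tid, u in order:
            place(tid, u)
    v_size = max(u for _, u in tasks)           # Step 2
    for i in range(n_cold):                     # Step 3
        place("v%d" % i, v_size)
    return bins

# the slide's tasks: 1 Hot Standby each plus Cold Standbys via one virtual task
print(len(r_batch([(1, 0.6), (2, 0.3), (3, 0.2)], n_hot=1, n_cold=1)))   # 4
# a second Hot Standby round instead would need a fifth processor
print(len(r_batch([(1, 0.6), (2, 0.3), (3, 0.2)], n_hot=2, n_cold=0)))   # 5
```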
Outline
Motivation
Goals and System Models
R-BATCH: Task Allocations with Replicas
Performance Evaluation
Conclusion
Performance Evaluation (R-BFD)
<Figure: ratio of saved processors (normalized to BFD-P) vs. number of tasks, with u_max = 0.3; R-BFD saves up to 18% of processors>
Performance Evaluation (R-BATCH)
<Figure: ratio of saved processors (normalized to BFD-P) vs. number of tasks, with u_max = 0.3; R-BATCH saves up to 49% of processors>
Performance Evaluation
<Figure: ratios of saved processors (normalized to BFD-P) for R-BFD and R-BATCH>
For smaller task set sizes, R-BFD is more beneficial
For larger task set sizes, R-BATCH is more beneficial
Back to Boss
20 periodic tasks for autonomous driving support
By using R-BATCH
Can tolerate 5 failures with 10 dual-core processors
35% saving compared to BFD-P
With the primary, 1 Hot Standby per task, and 4 Cold Standbys per task
Conclusion
Many safety-critical real-time systems must also support redundancy for tolerating faults
We defined recovery task models
Hard Recovery Task
Soft Recovery Task
Best-effort Recovery Task
We used two types of recovery schemes
Hot Standby (for Hard Recovery Tasks)
Cold Standby (for Soft Recovery Tasks)
We can tolerate a fixed number of (fail-stop) failures
R-BFD: 18% fewer processors with Hot Standby
R-BATCH: 49% fewer processors with Hot Standby and Cold Standby; utilizes slack for additional tasks