R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems. Junsung Kim, Karthik Lakshmanan and Raj Rajkumar, Electrical and Computer Engineering, Carnegie Mellon University


Post on 11-Jan-2016



TRANSCRIPT

Page 1: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Carnegie Mellon

R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Junsung Kim, Karthik Lakshmanan and Raj Rajkumar

Electrical and Computer Engineering, Carnegie Mellon University

Page 2: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Outline

Motivation

Goals and Systems Models

R-BATCH: Task Allocations with Replicas

Performance Evaluation

Conclusion

Page 3: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Autonomous Vehicles: Background

GM Chevy Tahoe named “Boss”
Won the 2007 DARPA Urban Challenge

Page 4: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Autonomous Vehicles: Background

Boss
  Senses the environment
  Fuses sensor data to form a model of the real world
  Plans navigation paths
  Actuates steering wheel, brake, and accelerator

Boss requires
  Safety-critical operations
  Timing guarantees
  Robustness to harsh environments

Page 5: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Autonomous Vehicles: Architecture

0.5 million lines of code for autonomous driving support
10 dual-core processors + 50 embedded processors

[Figure: Boss system architecture. From C. Urmson et al., “Tartan Racing: A Multi-Modal Approach to the DARPA Urban Challenge”]

Page 6: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Autonomous Vehicles: Capabilities

0.5 million lines of code for autonomous driving support
10 dual-core processors + 50 embedded processors

Requires high computational capabilities with timeliness guarantees
  Adding more processors
  Using high-performance processors

Page 7: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Processor Reliability Trend

[Figure: failure rate versus log time (years in service) for the 800 nm (1989), 130 nm (2000), and 32 nm (2010) technology nodes, with infant-mortality (random, extrinsic) and wear-out (intrinsic) regions. From Mark White, “Product Reliability and Qualification Challenges with CMOS Scaling”]

Page 8: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Outline

Motivation

Goals and Systems Models

R-BATCH: Task Allocations with Replicas

Performance Evaluation

Conclusion

Page 9: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Goals for Fault-Tolerance

Handle permanent processor failures
  Tolerate a given number of processor failures
Avoid losing functionality by adding more resources in an affordable way
  Hardware replication
  Software replication
  Re-execution of failed jobs
  Lower quality of service of tasks
Deal with the unpredictable nature of failures
  Consider all possible scenarios?

Page 10: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

System Model (1 of 2)

Primary fault model: fail-stop
  An entity stops functioning when it fails, instead of alternating between correct and wrong outputs
  Fault containment can be guaranteed

Consider a set of periodic tasks
  Periodic task $\tau_i$
    Represented by $(C_i, T_i)$
    $C_i$: Worst-case execution time of task $\tau_i$
    $T_i$: Period of task $\tau_i$
  Task utilization: $u_i = C_i / T_i$
  Total utilization in a processor: $U = \sum_i u_i$
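As a small worked example of the utilization definitions on this slide, take two periodic tasks on one processor. The notation ($\tau_i$ for a task, $C_i$ for its worst-case execution time, $T_i$ for its period, $u_i$ and $U$ for task and total utilization) and the numbers are illustrative assumptions, not values from the talk.

```latex
\begin{align*}
\tau_1 &= (C_1, T_1) = (2, 10), & u_1 &= \frac{C_1}{T_1} = \frac{2}{10} = 0.2 \\
\tau_2 &= (C_2, T_2) = (5, 20), & u_2 &= \frac{C_2}{T_2} = \frac{5}{20} = 0.25 \\
U &= u_1 + u_2 = 0.45
\end{align*}
```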

Page 11: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

System Model (2 of 2)

Task classifications
  Hard recovery task
    Cannot miss its deadline even if a failure occurs
    e.g., automotive engine control
  Soft recovery task
    Can be recovered in the next period
    e.g., navigation, chassis unit control
  Best-effort recovery task
    Can be recovered if there is enough room after a failure

Page 12: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Hard Recovery Task

[Figure: execution timelines of a hard recovery task on Processor 1 and Processor 2 over 0, T, 2T, 3T. Labels: “Failure occurred”, “Task recovered”, “Task should be recovered within”, C − ε, C]

Page 13: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Soft Recovery Task

[Figure: execution timelines of a soft recovery task on Processor 1 and Processor 2 over 0, T, 2T, 3T. Labels: “Failure occurred”, “Task recovered”, “Task should be recovered within”, C − ε, C]

Page 14: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Task Replication

Hot Standby                            | Cold Standby
Can recover Hard Recovery task         | Can recover Soft Recovery task
Running multiple copies of a feature   | Dormant until activated
No timing penalty                      | Delayed recovery time
Utilization loss                       | No utilization loss without failures

Observations
  Hot Standby
    The primary and the backups run at the same time
  Cold Standby
    One Cold Standby can recover several tasks on different processors
    Shared system state is available in all processors
      By using a network bus architecture

Page 15: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Hard Recovery Task with Hot Standby

[Figure: execution timelines on Processor 1 and Processor 2 over 0, T, 2T, 3T. Labels: “Failure occurred”, “Task recovered via Hot Standby”, C − ε, C]

Page 16: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Soft Recovery Task with Cold Standby

[Figure: execution timelines on Processor 1 and Processor 2 over 0, T, 2T, 3T. Labels: “Failure occurred”, “Task recovered via Cold Standby”, C − ε, C]

Page 17: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Example Scenarios

With 5 tasks and 4 processors
  nP: Primary of task n
  nH: Hot Standby of task n
  nC: Cold Standby of task n

[Figure: allocations of primaries, Hot Standbys, and Cold Standbys of tasks 1 to 5 across processors P1 to P4, shown in three panels, including panels labeled “P3 failed” and “P1 failed”]

Page 18: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Outline

Motivation

Goals and Systems Models

R-BATCH: Task Allocations with Replicas

Performance Evaluation

Conclusion

Page 19: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

R-BATCH

Reliable Bin-packing Algorithm for Tasks with Cold Standby and Hot Standby
  Reliable task allocation
  Allocates Hot Standbys
  Allocates Cold Standbys

Page 20: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Uniprocessor Schedulability*

Consider a set of periodic tasks
  Periodic task $\tau_i$
    Represented by $(C_i, T_i)$
    $C_i$: Worst-case execution time of task $\tau_i$
    $T_i$: Period of task $\tau_i$
  Task utilization: $u_i = C_i / T_i$
  Total utilization in a processor: $U = \sum_i u_i$

Schedulability
  For EDF (Earliest Deadline First)
    Tasks are schedulable if $U \le 100\%$ (more complex; misbehaves at higher U)
  For RMS (Rate Monotonic Scheduling)
    Tasks are schedulable if $U \le 69\%$ for general tasks (lower utilization)
    Tasks are schedulable if $U \le 100\%$ for harmonic tasks (practical)

<* C.L. Liu and J.W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 1973>
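The utilization-based tests on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions: tasks are given as hypothetical `(C, T)` pairs with integer periods, deadlines equal periods, and the function names are invented for this sketch. The RMS test uses the Liu and Layland bound $n(2^{1/n} - 1)$, which approaches about 69% as the number of tasks grows; it is a sufficient test only.

```python
def utilization(tasks):
    """Total utilization U = sum(C_i / T_i) for a list of (C, T) pairs."""
    return sum(c / t for c, t in tasks)


def edf_schedulable(tasks):
    """EDF on one processor: schedulable if U <= 1 (deadlines equal periods)."""
    return utilization(tasks) <= 1.0


def rms_bound(n):
    """Liu and Layland utilization bound for n tasks under RMS."""
    return n * (2 ** (1 / n) - 1)


def rms_schedulable(tasks):
    """Sufficient (not necessary) RMS test against the Liu and Layland bound."""
    return utilization(tasks) <= rms_bound(len(tasks))


def is_harmonic(tasks):
    """True if every period divides every longer period (integer periods assumed)."""
    periods = sorted(t for _, t in tasks)
    return all(b % a == 0 for a, b in zip(periods, periods[1:]))


# Hypothetical task set: harmonic periods, total utilization exactly 1.0.
tasks = [(2, 4), (2, 8), (4, 16)]
print(utilization(tasks), edf_schedulable(tasks), rms_schedulable(tasks), is_harmonic(tasks))
```

For this task set the general RMS bound is not met, yet the periods are harmonic, which is the case where the slide allows utilization up to 100%.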

Page 21: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Bin-packing Problem

Definition: The problem of packing a set of items into the fewest number of bins such that the total size does not exceed the bin capacity*
  Items: Utilizations of each task
  Bins: Processors

Then, given a set of tasks, how many bins (processors) do we need?†

[Figure: tasks Ti, Tj, Tk, and Tm packed into Processor P]

<* Mark Allen Weiss, from Data Structures and Algorithm Analysis, Addison>
<† D. Oh and T. Baker. Utilization bounds for n-processor rate monotonic scheduling with static processor assignment. Real-Time Systems, 1998.>

Page 22: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

The Classical Approach: Bin-packing

Bin packing is used to allocate tasks to multiprocessor platforms
Best-fit Decreasing (BFD) algorithm
  Step 1: Sort the objects in descending order of size
  Step 2: Sort the bins in descending order of consumed space
  Step 3: Fit the next object into the first sorted bin that fits
    If no bin fits, add a new bin to fit into
  Step 4: If objects remain, go to Step 2.
  Step 5: Done.

Given a set of tasks: {0.6, 0.3, 0.2}

[Figure: tasks 1 (0.6), 2 (0.3), and 3 (0.2) before and after allocation onto processors P1 to P4]
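The BFD steps on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `best_fit_decreasing`, the unit bin capacity, and the small floating-point tolerance are assumptions of the sketch.

```python
def best_fit_decreasing(sizes, capacity=1.0):
    """Pack item sizes into bins following the BFD steps listed on the slide."""
    bins = []  # each bin is a list of the item sizes placed in it
    # Step 1: consider the objects in descending order of size.
    for size in sorted(sizes, reverse=True):
        # Step 2: order the bins by descending consumed space (fullest first).
        bins.sort(key=sum, reverse=True)
        # Step 3: place the object into the first sorted bin that fits.
        for b in bins:
            if sum(b) + size <= capacity + 1e-9:  # tolerance for float rounding
                b.append(size)
                break
        else:
            # No bin fits: open a new bin for this object.
            bins.append([size])
    return bins


# The slide's task set, read as utilizations on unit-capacity processors.
print(best_fit_decreasing([0.6, 0.3, 0.2]))
```

On the slide's task set {0.6, 0.3, 0.2} the sketch places 0.6 and 0.3 together and opens a second bin for 0.2, since all three together would exceed a capacity of 1.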

Page 23: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

BFD with Placement Constraints

We also have to deal with replicated tasks
Under the placement constraint (BFD-P*)
  No two replicas of the same task can be on the same processor
  Otherwise, a processor failure will take down both replicas

[Figure: primaries 1P (0.6), 2P (0.3), 3P (0.2) and Hot Standbys 1H (0.6), 2H (0.3), 3H (0.2) before and after allocation onto processors P1 to P4]

<* J. Chen, C. Yang, T.W., and S.Y. Tseng. Real-Time task replication for fault tolerance in identical multiprocessor systems. In Proceedings of the 13th IEEE RTAS, IEEE CS, 2007.>
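One way to read BFD under the placement constraint is sketched below in Python. It is an illustration under assumptions, not the cited BFD-P implementation: all copies of all tasks are treated as one list of items sorted by decreasing utilization, and the names `bfd_p`, `copies`, and `capacity` are hypothetical.

```python
def bfd_p(tasks, copies, capacity=1.0):
    """BFD where no two copies of the same task may share a processor.

    tasks: dict mapping task name to utilization.
    copies: number of copies per task (primary plus replicas).
    Returns a list of bins; each bin is a list of (task, copy index, utilization).
    """
    items = sorted(
        ((name, k, u) for name, u in tasks.items() for k in range(copies)),
        key=lambda item: -item[2],
    )
    bins = []
    for name, k, u in items:
        # Try the fullest bins first, as in plain BFD.
        for b in sorted(bins, key=lambda b: -sum(x[2] for x in b)):
            fits = sum(x[2] for x in b) + u <= capacity + 1e-9
            clash = any(x[0] == name for x in b)  # placement constraint
            if fits and not clash:
                b.append((name, k, u))
                break
        else:
            bins.append([(name, k, u)])
    return bins


# The slide's task set with a primary and one Hot Standby per task.
allocation = bfd_p({"1": 0.6, "2": 0.3, "3": 0.2}, copies=2)
print(len(allocation))
```

Under these assumptions the example with two copies per task ends up on four processors, because the two 0.2 copies can neither join a 0.9-loaded bin nor share a bin with each other.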

Page 24: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Can BFD-P Be Improved?

Given a set of tasks: {0.6, 0.3, 0.2} with 2 replicas each
By using BFD with placement constraint

[Figure: BFD-P allocation of 1P (0.6), 2P (0.3), 3P (0.2), 1H (0.6), 2H (0.3), and 3H (0.2) onto processors P1 to P4]

We can however reduce the number of bins as follows:

[Figure: alternative allocation of the same primaries and Hot Standbys onto fewer processors]

Page 25: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Reliable BFD (R-BFD)

R-BFD Algorithm
  Step 1: Sort tasks in decreasing order according to the utilization of each task
  Step 2: Allocate each primary task in the bin which will have the smallest remaining space
  Step 3: Set i = 1
  Step 4: Allocate the ith replica of each task in the bin which will have the smallest remaining space.
  Step 5: Increment i and repeat Step 4 until all replicas are allocated.

[Figure: R-BFD allocation of 1P (0.6), 2P (0.3), 3P (0.2), 1H (0.6), 2H (0.3), and 3H (0.2) onto processors P1 to P4]
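The five steps on this slide can be sketched in Python as follows. This is a hedged reading, not the authors' implementation: the placement constraint (no two copies of a task on one processor) is assumed to carry over from BFD-P, ties go to the earliest bin, and the names `r_bfd`, `copies`, and `capacity` are hypothetical.

```python
def r_bfd(tasks, copies, capacity=1.0):
    """Allocate all primaries first, then the i-th replica of every task.

    tasks: dict mapping task name to utilization.
    copies: copies per task (copy 0 is the primary, copy i >= 1 the i-th replica).
    Returns a list of bins; each bin is a list of (task name, copy index).
    """
    def load(b):
        return sum(tasks[name] for name, _ in b)

    # Step 1: order tasks by decreasing utilization.
    order = sorted(tasks, key=tasks.get, reverse=True)
    bins = []
    # Steps 2 to 5: one pass for the primaries, then one pass per replica index.
    for i in range(copies):
        for name in order:
            feasible = [
                b for b in bins
                if load(b) + tasks[name] <= capacity + 1e-9
                and all(other != name for other, _ in b)
            ]
            if feasible:
                # Best fit: the feasible bin left with the smallest remaining space.
                max(feasible, key=load).append((name, i))
            else:
                bins.append([(name, i)])
    return bins


example = {"1": 0.6, "2": 0.3, "3": 0.2}
print(len(r_bfd(example, copies=2)), len(r_bfd(example, copies=3)))
```

Tracing the sketch on the slide's task set gives three processors for two copies per task and five processors for three copies per task.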

Page 26: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Save More Processors with Cold Standby

Given a set of tasks: {0.6, 0.3, 0.2} with 3 replicas each to tolerate 2 processor failures
Instead of using two more processors, add an “empty” processor to hold a “virtual task”

[Figure: two allocations on processors P1 to P5: one with a primary and two Hot Standbys per task, and one in which the second Hot Standby of each task is replaced by a Cold Standby (1C, 0.6; 2C, 0.3; 3C, 0.2)]

Page 27: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Cold Standby with Virtual Task

Virtual task
  A guaranteed utilization reserving slack for recovering failures via Cold Standby
Generate Virtual Tasks
  Step 1: Create a new virtual task by selecting the task with the highest utilization across all processors, which is not allocated to virtual tasks
  Step 2: Compare the size of the virtual task with tasks on different processors, and check if those tasks can be recovered by using the virtual task
  Step 3: Go to Step 1 if there are remaining tasks

[Figure: allocation of primaries and Hot Standbys on processors P1 to P4 with Cold Standbys 1C (0.6), 2C (0.3), and 3C (0.2); the generated virtual task 1C covers tasks 1, 2, and 3]
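The claim that a single 0.6 virtual task can cover tasks 1, 2, and 3 can be checked by brute force. The Python sketch below is not the authors' virtual-task generation procedure. It is an illustrative check that enumerates every combination of failed processors and sums the utilization of the tasks that lose all running copies. The function name and the example allocation (primaries and Hot Standbys of the three tasks spread over three processors) are assumptions of this sketch.

```python
from itertools import combinations


def worst_case_cold_demand(allocation, util, failures):
    """Largest utilization that Cold Standbys must recover in any failure scenario.

    allocation: list with one set per processor, holding the names of tasks
                that have a running copy (primary or Hot Standby) there.
    util: dict mapping task name to utilization.
    failures: number of processors that fail simultaneously.
    """
    worst = 0.0
    processors = range(len(allocation))
    for failed in combinations(processors, failures):
        # Tasks that still have at least one running copy on a surviving processor.
        alive = set()
        for p in processors:
            if p not in failed:
                alive |= allocation[p]
        lost = set(util) - alive
        worst = max(worst, sum(util[t] for t in lost))
    return worst


util = {"1": 0.6, "2": 0.3, "3": 0.2}
# Assumed allocation: P1 = {1P, 2P}, P2 = {3P, 1H}, P3 = {2H, 3H}.
allocation = [{"1", "2"}, {"3", "1"}, {"2", "3"}]
print(worst_case_cold_demand(allocation, util, failures=2))
```

For this assumed allocation, any two processor failures leave exactly one task without a running copy, so the worst-case demand equals the largest task utilization, 0.6.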

Page 28: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

R-BATCH

Reliable Bin-packing Algorithm for Tasks with Cold Standby and Hot Standby
  Step 1: Perform R-BFD with the primary and Hot Standbys
  Step 2: Generate virtual tasks
  Step 3: Perform R-BFD with virtual tasks

[Figure: allocation of primaries, Hot Standbys, and the virtual task holding Cold Standbys 1C (0.6), 2C (0.3), and 3C (0.2) on processors P1 to P4]

Page 29: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Outline

Motivation

Goals and Systems Models

R-BATCH: Task Allocations with Replicas

Performance Evaluation

Conclusion

Page 30: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Performance Evaluation (R-BFD)

[Figure: ratio of saved processors (normalized to BFD-P) versus number of tasks, for $u_{max} = 0.3$; annotated value: 18%]

Page 31: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Performance Evaluation (R-BATCH)

[Figure: ratio of saved processors (normalized to BFD-P) versus number of tasks, for $u_{max} = 0.3$; annotated value: 49%]

Page 32: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Performance Evaluation

[Figure: two plots of the ratio of saved processors (normalized to BFD-P), one for R-BFD and one for R-BATCH]

For smaller task set sizes, R-BFD is more beneficial
For larger task set sizes, R-BATCH is more beneficial

Page 33: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Back to Boss

20 periodic tasks for autonomous driving support
By using R-BATCH
  Can tolerate 5 failures with 10 dual-core processors
    With the primary
    With 1 Hot Standby per task
    With 4 Cold Standbys per task
  35% saving compared to BFD-P

Page 34: R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Conclusion

Many safety-critical real-time systems must also support redundancy for tolerating faults
We defined recovery task models
  Hard Recovery Task
  Soft Recovery Task
  Best-effort Recovery Task
We used two types of recovery schemes
  Hot Standby (for Hard Recovery Task)
  Cold Standby (for Soft Recovery Task)
We can tolerate a fixed number of (fail-stop) failures
  R-BFD
    18% fewer processors with Hot Standby
  R-BATCH
    49% fewer processors with Hot Standby and Cold Standby
    Utilizes slack for additional tasks