Carnegie Mellon
R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems
Junsung Kim, Karthik Lakshmanan and Raj Rajkumar
Electrical and Computer Engineering, Carnegie Mellon University
Outline
Motivation
Goals and System Models
R-BATCH: Task Allocations with Replicas
Performance Evaluation
Conclusion
Autonomous Vehicles: Background
GM Chevy Tahoe named “Boss” won the 2007 DARPA Urban Challenge
Autonomous Vehicles: Background
Boss
Senses the environment
Fuses sensor data to form a model of the real world
Plans navigation paths
Actuates the steering wheel, brake, and accelerator
Boss requires
Safety-critical operations
Timing guarantees
Robustness to harsh environments
Autonomous Vehicles: Architecture
0.5 million lines of code for autonomous driving support
10 dual-core processors + 50 embedded processors
< Boss System Architecture >
<From C. Urmson et al.’s Tartan Racing: A Multi-Modal Approach to the DARPA Urban Challenge>
Autonomous Vehicles: Capabilities
0.5 million lines of code for autonomous driving support
10 dual-core processors + 50 embedded processors
Requires high computational capability with timeliness guarantees
Adding more processors
Using high-performance processors
Processor Reliability Trend
<Figure: failure rate vs. log time (years in service) for 800 nm (1989), 130 nm (2000), and 32 nm (2010) process nodes, showing the infant-mortality (random, extrinsic) and wear-out (intrinsic) regions>
<From Mark White’s Product Reliability and Qualification Challenges with CMOS Scaling>
Outline
Motivation
Goals and System Models
R-BATCH: Task Allocations with Replicas
Performance Evaluation
Conclusion
Goals for Fault-Tolerance
Handle permanent processor failures
Tolerate a given number of processor failures
Avoid losing functionality by adding more resources in an affordable way
Hardware replication
Software replication
Re-execution of failed jobs
Lower quality of service of tasks
Deal with the unpredictable nature of failures
Consider all possible scenarios?
System Model (1 of 2)
Primary fault model: fail-stop
An entity stops functioning when it fails instead of alternating between correct and wrong outputs
Fault containment can be guaranteed
Consider a set of periodic tasks
Periodic task τ_i is represented by (C_i, T_i)
C_i: worst-case execution time of task τ_i
T_i: period of task τ_i
Task utilization: U_i = C_i / T_i
Total utilization in a processor: U = Σ_i C_i / T_i
System Model (2 of 2)
Task classifications
Hard recovery task
cannot miss its deadline even if a failure occurs
e.g., automotive engine control
Soft recovery task
can be recovered in the next period
e.g., navigation, chassis unit control
Best-effort recovery task
can be recovered if there is enough room after a failure
Hard Recovery Task
<Figure: timeline 0, 𝑇, 2𝑇, 3𝑇 across Processor 1 and Processor 2; a failure occurs 𝐶−𝜖 into the job’s execution, and the task should be recovered within the current period>
Soft Recovery Task
<Figure: timeline 0, 𝑇, 2𝑇, 3𝑇 across Processor 1 and Processor 2; a failure occurs 𝐶−𝜖 into the job’s execution, and the task should be recovered within the next period>
Task Replication
Hot Standby | Cold Standby
Can recover Hard Recovery tasks | Can recover Soft Recovery tasks
Running multiple copies of a feature | Dormant until activated
No timing penalty | Delayed recovery time
Utilization loss | No utilization loss without failures

Observations
Hot Standby: the primary and the backups run at the same time
Cold Standby: one Cold Standby can recover several tasks on different processors
Shared system state is available in all processors (by using a network bus architecture)
Hard Recovery Task with Hot Standby
<Figure: timeline 0, 𝑇, 2𝑇, 3𝑇; the primary on Processor 1 fails 𝐶−𝜖 into the job, and the task is recovered via the Hot Standby running in parallel on Processor 2, within the current period>
Soft Recovery Task with Cold Standby
<Figure: timeline 0, 𝑇, 2𝑇, 3𝑇; the primary on Processor 1 fails 𝐶−𝜖 into the job, and the task is recovered via the Cold Standby activated on Processor 2 in the next period>
Example Scenarios
With 5 tasks and 4 processors
nP: Primary of task n; nH: Hot Standby of task n; nC: Cold Standby of task n
<Figure: three allocations of the primaries, Hot Standbys, and Cold Standbys of tasks 1–5 across processors P1–P4: the original allocation, the allocation after P3 failed, and the allocation after P1 failed>
Outline
Motivation
Goals and System Models
R-BATCH: Task Allocations with Replicas
Performance Evaluation
Conclusion
R-BATCH
Reliable Bin-packing Algorithm for Tasks with Cold standby and Hot standby
Reliable task allocation
Allocates Hot Standbys
Allocates Cold Standbys
Uniprocessor Schedulability*
Consider a set of periodic tasks
Periodic task τ_i is represented by (C_i, T_i), with C_i the worst-case execution time and T_i the period
Task utilization: U_i = C_i / T_i; total utilization in a processor: U = Σ_i C_i / T_i
Schedulability
For EDF (Earliest Deadline First)
Tasks are schedulable if U ≤ 100%
More complex; misbehaves at higher U
For RMS (Rate Monotonic Scheduling)
For general tasks: schedulable if U ≤ n(2^(1/n) − 1) ≈ 69.3% (lower utilization)
For harmonic tasks: schedulable if U ≤ 100%
Practical
<* C.L. Liu and J.W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 1973>
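The two utilization tests above are easy to check mechanically. A minimal Python sketch (not from the talk; the task utilizations below are made-up values):

```python
def edf_schedulable(utils):
    """EDF: schedulable iff total utilization does not exceed 100%."""
    return sum(utils) <= 1.0

def rms_schedulable(utils):
    """RMS (Liu & Layland): sufficient test against n(2^(1/n) - 1),
    which tends to ln 2 (about 69.3%) as n grows."""
    n = len(utils)
    return sum(utils) <= n * (2 ** (1 / n) - 1)

utils = [0.3, 0.2, 0.15]          # U_i = C_i / T_i for three tasks
print(edf_schedulable(utils))      # True  (0.65 <= 1.0)
print(rms_schedulable(utils))      # True  (0.65 <= ~0.78 for n = 3)
```

For harmonic task sets the RMS bound rises to 100%, so the EDF-style test applies there as well.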
Bin-packing Problem
Definition: The problem of packing a set of items into the fewest number of bins such that the total size does not exceed the bin capacity*
Items: utilizations of each task
Bins: processors
Then, given a set of tasks, how many bins (processors) do we need?†
<Figure: tasks Ti, Tj, Tk, Tm packed as items into processor P, the bin>
<* Mark Allen Weiss, from Data Structures and Algorithm Analysis, Addison>
<† D. Oh and T. Baker. Utilization bounds for n-processor rate monotonic scheduling with static processor assignment. Real-Time Systems, 1998.>
The Classical Approach: Bin-packing
Bin packing is used to allocate tasks to multiprocessor platforms
Best-Fit Decreasing (BFD) algorithm
Step 1: Sort the objects in descending order of size
Step 2: Sort the bins in descending order of consumed space
Step 3: Fit the next object into the first sorted bin that fits
If no bin fits, add a new bin
Step 4: If objects remain, go to Step 2
Step 5: Done
Given a set of tasks: {0.6, 0.3, 0.2}
<Figure: BFD packing of tasks 1 (0.6), 2 (0.3), and 3 (0.2) into processors P1–P4>
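The BFD steps above can be sketched directly. A hedged Python illustration, assuming a bin capacity of 1.0 (an EDF-style 100% utilization bound per processor, which the slide does not fix):

```python
def bfd(utils, capacity=1.0):
    """Best-Fit Decreasing: pack task utilizations into as few bins
    (processors) as the heuristic manages; returns the bins."""
    bins = []
    for u in sorted(utils, reverse=True):       # Step 1: largest item first
        # Steps 2-3: visit bins fullest-first and take the first that fits,
        # i.e. the feasible bin left with the least remaining space
        bins.sort(key=sum, reverse=True)
        for b in bins:
            if sum(b) + u <= capacity:
                b.append(u)
                break
        else:
            bins.append([u])                    # no bin fits: open a new one
    return bins

print(bfd([0.6, 0.3, 0.2]))    # [[0.6, 0.3], [0.2]] -- two processors
```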
BFD with Placement Constraints
We also have to deal with replicated tasks
Under the placement constraint (BFD-P*)
No two replicas can be on the same processor
Otherwise, a processor failure will take down both replicas
<* J. Chen, C. Yang, T.W., and S.Y. Tseng. Real-Time task replication for fault tolerance in identical multiprocessor systems. In Proceedings of the 13th IEEE RTAS, IEEE CS, 2007.>
Given tasks {1: 0.6, 2: 0.3, 3: 0.2}, each with a primary (nP) and a Hot Standby (nH)
<Figure: BFD-P allocation of 1P, 2P, 3P, 1H, 2H, 3H across processors P1–P4, with no task’s two replicas sharing a processor>
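A sketch of BFD-P: the same heuristic, but each item carries its task id, and a bin is ruled out when it already holds a replica of that task (the 1.0 capacity is again an assumption):

```python
def bfd_p(replicas, capacity=1.0):
    """BFD with the placement constraint: `replicas` is a list of
    (task_id, utilization) pairs covering primaries and standbys;
    no two replicas of one task may share a processor."""
    bins = []                                   # each bin: [(task_id, u), ...]
    for tid, u in sorted(replicas, key=lambda r: r[1], reverse=True):
        bins.sort(key=lambda b: sum(x[1] for x in b), reverse=True)
        for b in bins:
            fits = sum(x[1] for x in b) + u <= capacity
            disjoint = tid not in {x[0] for x in b}   # placement constraint
            if fits and disjoint:
                b.append((tid, u))
                break
        else:
            bins.append([(tid, u)])
    return bins

# primaries plus one Hot Standby per task, as in the slide's example
replicas = [(1, 0.6), (2, 0.3), (3, 0.2), (1, 0.6), (2, 0.3), (3, 0.2)]
print(len(bfd_p(replicas)))      # 4 processors
```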
Can BFD-P Be Improved?
Given a set of tasks: {0.6, 0.3, 0.2} with 2 replicas each
<Figure: the BFD-P allocation of the primaries and Hot Standbys across P1–P4>
We can, however, reduce the number of bins as follows:
<Figure: an improved allocation packing the same primaries and Hot Standbys into fewer processors>
Reliable BFD (R-BFD)
R-BFD Algorithm
Step 1: Sort the tasks in decreasing order of utilization
Step 2: Allocate each primary task to the bin that will have the smallest remaining space
Step 3: Set i = 1
Step 4: Allocate the ith replica of each task to the bin that will have the smallest remaining space
Step 5: Increment i and repeat Step 4 until all replicas are allocated
<Figure: the R-BFD allocation of the primaries and Hot Standbys, using fewer processors than BFD-P>
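The R-BFD rounds can be sketched as follows; this is a simplified reading of the steps (capacity 1.0 assumed, `n_replicas` Hot Standbys per task), not the paper's implementation:

```python
def r_bfd(tasks, n_replicas=1, capacity=1.0):
    """R-BFD sketch: `tasks` is a list of (task_id, utilization).
    Steps 1-2 place the primaries largest-first; Steps 3-5 place the
    i-th replica of every task, round by round, never putting two
    copies of one task on the same processor."""
    order = sorted(tasks, key=lambda t: t[1], reverse=True)   # Step 1
    bins = []

    def place(tid, u):
        # best fit: the feasible bin that ends up with the least remaining space
        best = None
        for b in bins:
            used = sum(x[1] for x in b)
            if used + u <= capacity and tid not in {x[0] for x in b}:
                if best is None or used > sum(x[1] for x in best):
                    best = b
        if best is None:
            bins.append([(tid, u)])       # open a new processor
        else:
            best.append((tid, u))

    for tid, u in order:                  # Step 2: primaries
        place(tid, u)
    for _ in range(n_replicas):           # Steps 3-5: replica rounds
        for tid, u in order:
            place(tid, u)
    return bins

print(len(r_bfd([(1, 0.6), (2, 0.3), (3, 0.2)])))   # 3 processors vs. 4 for BFD-P
```

Placing whole rounds of replicas after all primaries is what lets the standbys interleave with other tasks' primaries, recovering the bin that BFD-P wastes.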
Save More Processors with Cold Standby
Given a set of tasks: {0.6, 0.3, 0.2} with 3 replicas each to tolerate 2 processor failures
Instead of using two more processors, add an “empty” processor to hold a “virtual task”
<Figure: left, an allocation with a second Hot Standby per task spreading 1H, 2H, 3H over processors P1–P5; right, the primaries and first Hot Standbys on P1–P4 plus P5 holding the Cold Standbys 1C (0.6), 2C (0.3), 3C (0.2)>
Cold Standby with Virtual Task
Virtual task: a guaranteed utilization reserving slack for recovering from failures via Cold Standby
Generate Virtual Tasks
Step 1: Create a new virtual task by selecting the task with the highest utilization across all processors that is not yet covered by a virtual task
Step 2: Compare the size of the virtual task with tasks on different processors, and check whether those tasks can be recovered by using the virtual task
Step 3: Go to Step 1 if there are remaining tasks
<Figure: the generated virtual task 1C (0.6) covers tasks 1, 2, and 3; the Cold Standbys 2C (0.3) and 3C (0.2) fit within its reservation>
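One way to read Steps 1–3 as code, under the fail-stop model's single-failure view (only one processor's tasks need recovery at a time). The allocation map and the per-processor load check are my own simplifications, not the paper's exact coverage test:

```python
def generate_virtual_tasks(alloc):
    """Virtual-task generation sketch.  `alloc` maps a processor name to
    its list of (task_id, utilization).  A virtual task of size s can
    cover a group of tasks as long as the covered utilization on any one
    processor stays within s, since at most one processor fails at a time."""
    uncovered = [(p, tid, u) for p, ts in alloc.items() for tid, u in ts]
    virtual = []
    while uncovered:
        # Step 1: the largest uncovered task fixes the virtual task's size
        uncovered.sort(key=lambda t: t[2], reverse=True)
        size = uncovered[0][2]
        covered, load = [], {}          # load: covered utilization per processor
        for p, tid, u in uncovered:
            # Step 2: cover the task if its processor's covered load still fits
            if load.get(p, 0.0) + u <= size:
                covered.append(tid)
                load[p] = load.get(p, 0.0) + u
        virtual.append((size, covered))
        uncovered = [t for t in uncovered if t[1] not in covered]   # Step 3
    return virtual

alloc = {"P1": [(1, 0.6)], "P2": [(2, 0.3)], "P3": [(3, 0.2)]}
print(generate_virtual_tasks(alloc))   # [(0.6, [1, 2, 3])] -- 1C covers all three
```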
R-BATCH
Reliable Bin-packing Algorithm for Tasks with Cold Standby and Hot Standby
Step 1: Perform R-BFD with the primaries and Hot Standbys
Step 2: Generate virtual tasks
Step 3: Perform R-BFD with the virtual tasks
<Figure: the R-BATCH allocation across P1–P4: the primaries and Hot Standbys packed by R-BFD, plus a virtual task of size 0.6 holding the Cold Standbys 1C (0.6), 2C (0.3), 3C (0.2)>
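Putting the three steps together in one hedged end-to-end sketch. Step 2 is folded into a single virtual task per Cold Standby round, sized by the largest utilization (a simplification of the generation procedure on the previous slide: with one failure at a time, that much slack can re-admit any one task); capacity 1.0 is assumed:

```python
def r_batch(tasks, n_hot=1, n_cold=1, capacity=1.0):
    """R-BATCH sketch over `tasks` = [(task_id, utilization), ...]:
    Step 1: R-BFD the primaries plus n_hot Hot Standby rounds.
    Step 2: size one virtual task per Cold Standby round by the
            largest utilization (one processor fails at a time).
    Step 3: R-BFD the virtual tasks into the same bins."""
    bins = []

    def place(tid, u):
        # best fit under the placement constraint (no duplicate ids per bin)
        best = None
        for b in bins:
            used = sum(x[1] for x in b)
            if used + u <= capacity and tid not in {x[0] for x in b}:
                if best is None or used > sum(x[1] for x in best):
                    best = b
        if best is None:
            bins.append([(tid, u)])
        else:
            best.append((tid, u))

    order = sorted(tasks, key=lambda t: t[1], reverse=True)
    for _ in range(1 + n_hot):                  # Step 1
        for tid, u in order:
            place(tid, u)
    v_size = max(u for _, u in tasks)           # Step 2
    for i in range(n_cold):                     # Step 3
        place("v%d" % i, v_size)
    return bins

# the slide's tasks: 1 Hot Standby each plus Cold Standbys via one virtual task
print(len(r_batch([(1, 0.6), (2, 0.3), (3, 0.2)], n_hot=1, n_cold=1)))   # 4
# a second Hot Standby round instead would need a fifth processor
print(len(r_batch([(1, 0.6), (2, 0.3), (3, 0.2)], n_hot=2, n_cold=0)))   # 5
```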
Outline
Motivation
Goals and System Models
R-BATCH: Task Allocations with Replicas
Performance Evaluation
Conclusion
Performance Evaluation (R-BFD)
<Figure: ratio of saved processors (normalized to BFD-P) vs. number of tasks, with u_max = 0.3; R-BFD saves up to 18% of processors>
Performance Evaluation (R-BATCH)
<Figure: ratio of saved processors (normalized to BFD-P) vs. number of tasks, with u_max = 0.3; R-BATCH saves up to 49% of processors>
Performance Evaluation
<Figure: ratios of saved processors (normalized to BFD-P) for R-BFD and R-BATCH>
For smaller task set sizes, R-BFD is more beneficial
For larger task set sizes, R-BATCH is more beneficial
Back to Boss
20 periodic tasks for autonomous driving support
By using R-BATCH
Can tolerate 5 failures with 10 dual-core processors
35% saving compared to BFD-P
With the primary, 1 Hot Standby per task, and 4 Cold Standbys per task
Conclusion
Many safety-critical real-time systems must also support redundancy for tolerating faults
We defined recovery task models
Hard Recovery Task
Soft Recovery Task
Best-effort Recovery Task
We used two types of recovery schemes
Hot Standby (for Hard Recovery Tasks)
Cold Standby (for Soft Recovery Tasks)
We can tolerate a fixed number of (fail-stop) failures
R-BFD: 18% fewer processors with Hot Standby
R-BATCH: 49% fewer processors with Hot Standby and Cold Standby; utilizes slack for additional tasks