christian delbe1 christian delbé oasis team inria -- cnrs - i3s -- univ. of nice sophia-antipolis...
TRANSCRIPT
Christian Delbe 1
Christian Delbé
OASIS Team
INRIA -- CNRS - I3S -- Univ. of Nice Sophia-Antipolis
November 29 2006
Automatic Fault Tolerance inAutomatic Fault Tolerance inProActiveProActive
Christian Delbe 2
Fault ToleranceFault Tolerance
A system is said to be fault tolerant if it can continue operating properly in the event of
failure of some of its parts.
• New requirements for Grid Computing • Large scale• High failure rate
• Simultaneous failures
• Heterogeneous Software• Portability
• Heterogeneous Hardware• Different dependability characteristics in each group
Christian Delbe 3
Fault Tolerance in JavaFault Tolerance in Java• Rollback-Recovery approach
• Each process periodically takes a checkpoint• Based on the availability of a stable storage
• Checkpoints are used to recover application in a correct state
• But Java threads are not checkpointable !
• Provide checkpointability with specific tools ?• System level, Virtual Machine level, Compiler level
Unfortunately …
Loss of portability / efficiency
Unique and non-standard implementation
Christian Delbe 4
Fault Tolerance in ProActiveFault Tolerance in ProActive• New Communication-Induced-Checkpointing protocol (CIC)
• Pessimistic Message-Logging protocol (PML)
• Non-intrusive • 100% standard Java, based on serialization
• Transparent for the programmer• Fault tolerance settings in deployment descriptors
• Based on a Fault Tolerance Server• Checkpoint storage• Failures detection• Resource service (deployed nodes or P2P infrastructure)• Localization service
Christian Delbe 5
CIC Protocol Overview CIC Protocol Overview
• Creation of a consistent global snapshot• Non-blocking synchronization: low failure-free overhead
p1
p4p3
p2
Christian Delbe 6
p1
p4p3
p2
p4
CIC Protocol Overview CIC Protocol Overview
• Creation of a consistent global snapshot• Non-blocking synchronization: low failure-free overhead
• After a failure, the entire system restarts• Recovery time increases with system size
Christian Delbe 7
PML Protocol Overview PML Protocol Overview • Independent checkpoints
• All messages must be logged• Failure free overhead increases with message rate
m1
p1
p4p3
p2
m1
Christian Delbe 8
p1
p4p3
p2
m1
p4
• Independent checkpoints
• All messages must be logged• Failure free overhead increases with message rate
•After a failure, only the faulty restarts• Recovery time is system size independent
PML Protocol Overview PML Protocol Overview
Christian Delbe 9
Performance comparisonPerformance comparisonCIC vs PMLCIC vs PML
• Jacobi iteration (SPMD iterative reduction of matrix) on matrix of size 30002 and 50002
• System size increases Checkpoint size decreases Message rate increases
Recovery Time
0102030405060708090
100
0 5 10 15 20 25 30
# Nodes
Re
cove
ry T
ime
(s)
CIC 3000 PML 3000
Failure Free Overhead
0,0%
5,0%
10,0%
15,0%
20,0%
25,0%
30,0%
35,0%
40,0%
0 5 10 15 20 25 30
# Nodes
Ove
rhe
ad
(%
)
CIC 3000 PML 3000 CIC 5000 PML 5000
Christian Delbe 10
Mixing CIC and PMLMixing CIC and PML
• Based on Recovery Groups• Independent groups linked with PML• After a failure, only the group have to restart• Fault Tolerance Servers are independent
• Groups Dynamically created on common stable server
CIC
PML
CIC
PML
PML
Christian Delbe 11
Rollback on Grid requirementsRollback on Grid requirements• Large scale
+ Divide-and-Conquer approach
• High failure rate+ Failure impact limited to the group+ Can handle multiple failures
• Heterogeneous Software+ Only Standard Java
• Heterogeneous Hardware+ Can apply the most adapted settings in each group
Christian Delbe 12
Recovery Time
0
50
100
150
200
250
300
0 20 40 60 80 100
# Nodes
Re
cove
ry T
ime
(s)
CIC 7000 Mixed 7000 CIC 9000 Mixed 9000
Performance ComparisonPerformance ComparisonCIC vs MixedCIC vs Mixed
• Jacobi iteration on 70002 and 90002 matrix• Two groups mapped on two clusters of Grid5000
Failure Free Overhead
0,0%
10,0%
20,0%
30,0%
40,0%
50,0%
60,0%
70,0%
0 20 40 60 80 100
# Nodes
Ove
rhe
ad
(%
)
CIC 7000 Mixed 7000 CIC 9000 Mixed 9000
Christian Delbe 13
Completion Time Rate (CIC/mixed) Jacobi 7000
0,5
0,7
0,9
1,1
1,3
1,5
0 1 2 3 4 5 6 7 8 9 10# failures
Ra
te
16 nœuds 36 nodes 81 nodes
Completion Time Rate (CIC/mixed) Jacobi 9000
0,5
0,7
0,9
1,1
1,3
1,5
0 1 2 3 4 5 6 7 8 9 10# failures
Ra
te
16 nodes 36 nodes 81 nodes 100 nodes
Performance ComparisonPerformance ComparisonCIC vs MixedCIC vs Mixed
• Jacobi iteration on 70002 and 90002 matrix• Two groups mapped on two clusters of Grid5000
1 1
nodes
Christian Delbe 14
• Automatic and Transparent Fault Tolerance • Easy to use• Configured at deployment time
• Three protocols: • Depends on hardware and application properties
CIC PML Mixed
• Next release 3.2 will include Mixed protocol
Fault Tolerance in ProActiveFault Tolerance in ProActive
- Failure Frequency + - Communication Rate +
Christian Delbe 15
Performance of the Mixed protocolPerformance of the Mixed protocol
Global Overhead - Jacobi 16000
20%30%40%50%60%70%80%90%
100%
50 100 150 200 250 300 350
# nodes
Ove
rhe
ad
4 clusters 5 clusters 6 clusters
• Jacobi iteration on a 16000 matrix• Groups mapped on 4 to 6 clusters of Grid5000
Christian Delbe 16
CIC Performance EvaluationCIC Performance Evaluation
• Jacobi iteration (SPMD iterative reduction of matrix)• CG NAS Parallel Benchmark (Conjugate Gradient)
Overhead without checkpointNPB kernel CG
0,0%
1,0%
2,0%
3,0%
4,0%
5,0%
6,0%
7,0%
0 10 20 30
# nodes
ove
rhe
ad
CG - A CG - B CG - C
Overhead without checkpointJacobi
0,0%
0,5%
1,0%
1,5%
2,0%
2,5%
3,0%
3,5%
0 20 40 60
# nodes
ove
rhe
ad
Jacobi 3000 Jacobi 5000
Christian Delbe 17
CIC Performance EvaluationCIC Performance Evaluation
• Jacobi iteration (SPMD iterative reduction of matrix)• CG NAS Parallel Benchmark (Conjugate Gradient)
Global Overhead Jacobi
0,0%
2,0%
4,0%
6,0%
8,0%
10,0%
12,0%
0 30 60 90 120 150
TTC (s)
Ove
rhe
ad
Jacobi 5000, 16 nodes Jacobi 5000, 36 nodesJacobi 5000, 64 nodes
Global OverheadNPB kernel CG
0,0%
10,0%
20,0%
30,0%
40,0%
50,0%
0 30 60 90 120 150
TTC (s)
Ove
rhe
ad
CG-C, 8 nodes CG-C, 16 nodes