christian delbe1 christian delbé oasis team inria -- cnrs - i3s -- univ. of nice sophia-antipolis...

Christian Delbe 1

Christian Delbé

OASIS Team

INRIA -- CNRS - I3S -- Univ. of Nice Sophia-Antipolis

November 29 2006

Automatic Fault Tolerance inAutomatic Fault Tolerance inProActiveProActive

Christian Delbe 2

Fault ToleranceFault Tolerance

A system is said to be fault tolerant if it can continue operating properly in the event of

failure of some of its parts.

• New requirements for Grid Computing • Large scale• High failure rate

• Simultaneous failures

• Heterogeneous Software• Portability

• Heterogeneous Hardware• Different dependability characteristics in each group

Christian Delbe 3

Fault Tolerance in JavaFault Tolerance in Java• Rollback-Recovery approach

• Each process periodically takes a checkpoint• Based on the availability of a stable storage

• Checkpoints are used to recover application in a correct state

• But Java threads are not checkpointable !

• Provide checkpointability with specific tools ?• System level, Virtual Machine level, Compiler level

Unfortunately …

Loss of portability / efficiency

Unique and non-standard implementation

Christian Delbe 4

Fault Tolerance in ProActiveFault Tolerance in ProActive• New Communication-Induced-Checkpointing protocol (CIC)

• Pessimistic Message-Logging protocol (PML)

• Non-intrusive • 100% standard Java, based on serialization

• Transparent for the programmer• Fault tolerance settings in deployment descriptors

• Based on a Fault Tolerance Server• Checkpoint storage• Failures detection• Resource service (deployed nodes or P2P infrastructure)• Localization service

Christian Delbe 5

CIC Protocol Overview CIC Protocol Overview

• Creation of a consistent global snapshot• Non-blocking synchronization: low failure-free overhead

p1

p4p3

p2

Christian Delbe 6

p1

p4p3

p2

p4

CIC Protocol Overview CIC Protocol Overview

• Creation of a consistent global snapshot• Non-blocking synchronization: low failure-free overhead

• After a failure, the entire system restarts• Recovery time increases with system size

Christian Delbe 7

PML Protocol Overview PML Protocol Overview • Independent checkpoints

• All messages must be logged• Failure free overhead increases with message rate

m1

p1

p4p3

p2

m1

Christian Delbe 8

p1

p4p3

p2

m1

p4

• Independent checkpoints

• All messages must be logged• Failure free overhead increases with message rate

•After a failure, only the faulty restarts• Recovery time is system size independent

PML Protocol Overview PML Protocol Overview

Christian Delbe 9

Performance comparisonPerformance comparisonCIC vs PMLCIC vs PML

• Jacobi iteration (SPMD iterative reduction of matrix) on matrix of size 30002 and 50002

• System size increases Checkpoint size decreases Message rate increases

Recovery Time

0102030405060708090

100

0 5 10 15 20 25 30

# Nodes

Re

cove

ry T

ime

(s)

CIC 3000 PML 3000

Failure Free Overhead

0,0%

5,0%

10,0%

15,0%

20,0%

25,0%

30,0%

35,0%

40,0%

0 5 10 15 20 25 30

# Nodes

Ove

rhe

ad

(%

)

CIC 3000 PML 3000 CIC 5000 PML 5000

Christian Delbe 10

Mixing CIC and PMLMixing CIC and PML

• Based on Recovery Groups• Independent groups linked with PML• After a failure, only the group have to restart• Fault Tolerance Servers are independent

• Groups Dynamically created on common stable server

CIC

PML

CIC

PML

PML

Christian Delbe 11

Rollback on Grid requirementsRollback on Grid requirements• Large scale

+ Divide-and-Conquer approach

• High failure rate+ Failure impact limited to the group+ Can handle multiple failures

• Heterogeneous Software+ Only Standard Java

• Heterogeneous Hardware+ Can apply the most adapted settings in each group

Christian Delbe 12

Recovery Time

0

50

100

150

200

250

300

0 20 40 60 80 100

# Nodes

Re

cove

ry T

ime

(s)

CIC 7000 Mixed 7000 CIC 9000 Mixed 9000

Performance ComparisonPerformance ComparisonCIC vs MixedCIC vs Mixed

• Jacobi iteration on 70002 and 90002 matrix• Two groups mapped on two clusters of Grid5000

Failure Free Overhead

0,0%

10,0%

20,0%

30,0%

40,0%

50,0%

60,0%

70,0%

0 20 40 60 80 100

# Nodes

Ove

rhe

ad

(%

)

CIC 7000 Mixed 7000 CIC 9000 Mixed 9000

Christian Delbe 13

Completion Time Rate (CIC/mixed) Jacobi 7000

0,5

0,7

0,9

1,1

1,3

1,5

0 1 2 3 4 5 6 7 8 9 10# failures

Ra

te

16 nœuds 36 nodes 81 nodes

Completion Time Rate (CIC/mixed) Jacobi 9000

0,5

0,7

0,9

1,1

1,3

1,5

0 1 2 3 4 5 6 7 8 9 10# failures

Ra

te

16 nodes 36 nodes 81 nodes 100 nodes

Performance ComparisonPerformance ComparisonCIC vs MixedCIC vs Mixed

• Jacobi iteration on 70002 and 90002 matrix• Two groups mapped on two clusters of Grid5000

1 1

nodes

Christian Delbe 14

• Automatic and Transparent Fault Tolerance • Easy to use• Configured at deployment time

• Three protocols: • Depends on hardware and application properties

CIC PML Mixed

• Next release 3.2 will include Mixed protocol

Fault Tolerance in ProActiveFault Tolerance in ProActive

- Failure Frequency + - Communication Rate +

Christian Delbe 15

Performance of the Mixed protocolPerformance of the Mixed protocol

Global Overhead - Jacobi 16000

20%30%40%50%60%70%80%90%

100%

50 100 150 200 250 300 350

# nodes

Ove

rhe

ad

4 clusters 5 clusters 6 clusters

• Jacobi iteration on a 16000 matrix• Groups mapped on 4 to 6 clusters of Grid5000

Christian Delbe 16

CIC Performance EvaluationCIC Performance Evaluation

• Jacobi iteration (SPMD iterative reduction of matrix)• CG NAS Parallel Benchmark (Conjugate Gradient)

Overhead without checkpointNPB kernel CG

0,0%

1,0%

2,0%

3,0%

4,0%

5,0%

6,0%

7,0%

0 10 20 30

# nodes

ove

rhe

ad

CG - A CG - B CG - C

Overhead without checkpointJacobi

0,0%

0,5%

1,0%

1,5%

2,0%

2,5%

3,0%

3,5%

0 20 40 60

# nodes

ove

rhe

ad

Jacobi 3000 Jacobi 5000

Christian Delbe 17

CIC Performance EvaluationCIC Performance Evaluation

• Jacobi iteration (SPMD iterative reduction of matrix)• CG NAS Parallel Benchmark (Conjugate Gradient)

Global Overhead Jacobi

0,0%

2,0%

4,0%

6,0%

8,0%

10,0%

12,0%

0 30 60 90 120 150

TTC (s)

Ove

rhe

ad

Jacobi 5000, 16 nodes Jacobi 5000, 36 nodesJacobi 5000, 64 nodes

Global OverheadNPB kernel CG

0,0%

10,0%

20,0%

30,0%

40,0%

50,0%

0 30 60 90 120 150

TTC (s)

Ove

rhe

ad

CG-C, 8 nodes CG-C, 16 nodes

christian delbe1 christian delbé oasis team inria -- cnrs - i3s -- univ. of nice sophia-antipolis...

Documents

group slide

system size slide

proactive slide

christian delbe10 mixing

event of failure

christian delbe11 rollback

christian delbe13 perf

automatic fault tolerance