Multiparty Protocol-Induced Recovery · mrg.doc.ic.ac.uk/talks/2017/06/opct-2/slides.pdf

Page 1: Multiparty Protocol-Induced Recoverymrg.doc.ic.ac.uk/talks/2017/06/opct-2/slides.pdf · 2020-07-02 · Organise your processes in supervision trees Let it crash: Erlang’s fault

Let it Recover: Multiparty Protocol-Induced Recovery

“Fail fast and recover quickly” (Erlang proverb)

“Fail fast and recover quickly and safely” (OPCT proverb, after this talk)

Part One: Background

The Erlang programming language

factorial(0) -> 1;
factorial(X) when X > 0 -> X * factorial(X-1).

Erlang’s coding philosophy

Let it crash: Erlang’s fault tolerance model

Do not program defensively, let the process crash

In case of error, the process is automatically terminated

Processes are linked. When a process crashes, linked processes are notified and can be restarted.

Organise your processes in supervision trees

Recently adopted by [logos not preserved in this transcript]

Supervision strategies: one-for-one, all-for-one, rest-for-one
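
As a point of reference, the strategies named on this slide correspond to the `strategy` flag of a standard OTP supervisor (OTP spells all-for-one as `one_for_all`). The following is a minimal sketch, not code from the talk: the module name `strategy_sketch`, the roles `a`, `b`, `c` and the `start_worker/1` helper are assumptions made for illustration.

```erlang
-module(strategy_sketch).
-behaviour(supervisor).
-export([start_link/1, start_worker/1, init/1]).

%% Start a supervisor with the given restart strategy:
%% one_for_one, one_for_all or rest_for_one.
start_link(Strategy) ->
    supervisor:start_link(?MODULE, Strategy).

%% Hypothetical worker: a linked process that idles until told to crash.
start_worker(Name) ->
    Pid = spawn_link(fun() -> worker_loop(Name) end),
    {ok, Pid}.

worker_loop(Name) ->
    receive
        crash  -> exit({crashed, Name});
        _Other -> worker_loop(Name)
    end.

%% Supervisor callback: the strategy is fixed here, when the tree is built.
init(Strategy) ->
    SupFlags = #{strategy => Strategy, intensity => 3, period => 10},
    Children = [#{id => Name,
                  start => {?MODULE, start_worker, [Name]},
                  restart => permanent}
                || Name <- [a, b, c]],
    {ok, {SupFlags, Children}}.
```

With `one_for_one`, a crash of `a` restarts only `a`; with `one_for_all`, all three children are restarted; with `rest_for_one`, the crashed child and those started after it are restarted. In each case the choice is made once in `init/1`, independently of how far the roles have progressed.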

Supervision strategies: Drawbacks

Supervision strategies are statically defined and error-prone:

unsound: a recovery may cause deadlocks, orphan messages, reception errors

inefficient

unsound, inefficient

How to generate sound and efficient supervision strategies?

By using Session Types!

Session Types Overview

Global protocol (session type)

Local protocol (session type): slice of the global protocol relevant to one role, mechanically derived from the global protocol

Process language: execution model of I/O actions by roles

A system of well-behaved processes is free from deadlocks, orphan messages and reception errors

The framework has been applied to Java, Python, MPI/C, Go…
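
To make the idea of a local protocol as a slice of the global protocol concrete, here is a minimal sketch of projection for the simplest case: a linear protocol with no choice or recursion. The tuple encoding `{Index, Sender, Receiver}`, the module name `projection_sketch` and the three-role example are assumptions made for illustration and do not come from the talk.

```erlang
-module(projection_sketch).
-export([example/0, project/2]).

%% Hypothetical global protocol: a sends to b, b sends to c, c sends to a.
example() ->
    [{1, a, b}, {2, b, c}, {3, c, a}].

%% Keep only the interactions that mention Role, seen from Role's side.
project(Role, Global) ->
    lists:filtermap(
      fun({N, From, To}) when From =:= Role -> {true, {send, N, To}};
         ({N, From, To}) when To =:= Role   -> {true, {recv, N, From}};
         (_)                                 -> false
      end, Global).
```

For this example, `projection_sketch:project(b, projection_sketch:example())` returns `[{recv, 1, a}, {send, 2, c}]`: role `b` never sees interaction 3.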

Part Two: Let It Recover

Recovery workflow

[Figure: Protocol, Dependency Graph and Recovery Table (recovery algorithm) feeding the Erlang Runtime (implementation), with labels (A:3), (B:1), (C:2)]

A recovered system is free from deadlocks, orphan messages and reception errors.

Outperforms one of the built-in recovery strategies in Erlang

This talk: Safe Recovery for Session Protocols

Approach

A recovery algorithm analyses a global protocol to calculate the dependencies of a failed process.

Local supervisors monitor the state of each process in the protocol

Protocol supervisors use the algorithm at runtime to decide which processes to recover
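
The slides do not spell out the dependency calculation, so the following is only a rough approximation of the idea and not the algorithm from the talk. It assumes a linear global protocol encoded as `{Index, Sender, Receiver}` tuples and treats a role as dependent on the failed role if a chain of interactions connects them. The module name `dependency_sketch` and the function `affected/2` are hypothetical.

```erlang
-module(dependency_sketch).
-export([affected/2]).

%% Sorted list of roles connected to Failed through chains of interactions.
affected(Failed, Global) ->
    lists:sort(closure([Failed], [Failed], Global)).

%% Worklist traversal: visit each pending role and add its unseen peers.
closure([], Seen, _Global) ->
    Seen;
closure([Role | Rest], Seen, Global) ->
    Peers = [peer(Role, I) || I <- Global, involves(Role, I)],
    New = lists:usort(Peers) -- Seen,
    closure(Rest ++ New, Seen ++ New, Global).

%% True when Role is the sender or the receiver of the interaction.
involves(Role, {_N, From, To}) ->
    From =:= Role orelse To =:= Role.

%% The other participant of an interaction that involves Role.
peer(Role, {_N, Role, To})   -> To;
peer(Role, {_N, From, Role}) -> From.
```

For the hypothetical protocol `[{1, a, b}, {2, b, c}, {3, d, e}]`, `dependency_sketch:affected(a, ...)` returns `[a, b, c]`. Roles `d` and `e` only interact with each other, so under this approximation they would not need to be recovered when `a` fails.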

Causalities

Part Three: Recovery Algorithm

Recovery Algorithm

Protocol: 1: B → E; 2: C → E; 3: B → A; 4: C → A; 5: A → B; 6: D → E; 7: E → D;

[Figure: dependency graph over interactions 1 to 7, stepped through the algorithm from "Initialise" to "Final condition", with nodes marked "done" or "not done"]

Recovery points

Recovery point: take the top node from the set of recovery nodes

Protocol: 1: B → C; 2: C → E; 3: B → A; 4: C → A;

Global Recovery Table

Failure    Recovery points
…          …
3, A       A:3, B:3, C:4
3, B       A:3, B:3, C:5
4, C       C:2, E:2
4, A       C:1, B:1, …
…          …
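
One way to read the global recovery table is as a lookup from a failure (interaction number and failed role) to the point at which each affected role resumes. The sketch below is illustrative only. It assumes that the failure and recovery-point entries on the slide pair up in the order they appear, and it leaves out the entry that is truncated in the transcript. The module name `recovery_table_sketch` and the map encoding are hypothetical.

```erlang
-module(recovery_table_sketch).
-export([table/0, recovery_points/2]).

%% {Interaction, FailedRole} => [{Role, RecoveryPoint}].
%% Entries copied from the slide under the pairing assumption stated above.
table() ->
    #{ {3, a} => [{a, 3}, {b, 3}, {c, 4}],
       {3, b} => [{a, 3}, {b, 3}, {c, 5}],
       {4, c} => [{c, 2}, {e, 2}] }.

%% Returns {ok, Points} for a known failure, or error otherwise.
recovery_points(Failure, Table) ->
    maps:find(Failure, Table).
```

Because the table is computed from the protocol ahead of time, the runtime decision reduces to a single map lookup when a failure is reported.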

Main Results: Transparency and Safety (informally)

Theorem: Transparency

The recovered protocol is a reduction of the initial protocol. The configuration of the system after a failure is reachable from the initial configuration.

Theorem: Safety

Any configuration reachable from an initial configuration of a well-formed global protocol is free from deadlocks, orphan messages and reception errors.

Part Four: Recovery Implementation

Enabling Protocol Recovery in Erlang

gen_server (stores recovery tables, protocol specification)

protocol supervisor (recovers processes)

local supervisors (monitor the process behaviour)

gen_server (used to implement processes)
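
The division of labour between local and protocol supervisors can be sketched with Erlang's built-in process monitors. This is a simplified illustration, not the talk's implementation: it uses a plain process instead of a `gen_server`, and the module name `local_sup_sketch` and the `{role_failed, Role, Reason}` message shape are assumptions.

```erlang
-module(local_sup_sketch).
-export([start/3]).

%% Spawn a hypothetical local supervisor that runs WorkerFun as role Role
%% and reports an abnormal exit to the protocol supervisor ProtocolSup.
start(Role, WorkerFun, ProtocolSup) ->
    spawn(fun() ->
        {Pid, Ref} = spawn_monitor(WorkerFun),
        receive
            %% Normal termination: nothing to recover.
            {'DOWN', Ref, process, Pid, normal} ->
                ok;
            %% Crash: hand the decision to the protocol supervisor.
            {'DOWN', Ref, process, Pid, Reason} ->
                ProtocolSup ! {role_failed, Role, Reason}
        end
    end).
```

Calling `local_sup_sketch:start(a, fun() -> exit(boom) end, self())` delivers `{role_failed, a, boom}` to the caller, which here stands in for the protocol supervisor that would consult the recovery table.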

Enabling Protocol Recovery in Erlang: Example

Evaluation: Web Crawler Example

[Plot: seconds against number of crashes]

A process is chosen at random at the start

Improvement when several failures occur

We initially implemented all-for-one by mistake, which introduced a deadlock

source: http://foat.me/articles/crawling-with-akka/

Evaluation: Concurrency Patterns

[Plots, in seconds: Map Reduce, Ring, Calculator]

52% improvement with intense local computation and disconnected interactions

Up to 7% overhead when all roles are restarted

Future work & Resources

Framework summary

Ensure processes are safe and conform to a protocol (even in cases of failures)
Create supervision trees and link processes dynamically based on a protocol structure

Future work

Support for stateful processes
Integration with checkpoints
Replications and recovery actions

Additional Resources

Scribble webpage: scribble.doc.ic.ac.uk
Project source: https://gitlab.doc.ic.ac.uk/rn710/codeINspire
MRG webpage: http://mrg.doc.ic.ac.uk/

Q & A
