conflict-free replicated data types...

102
Marc Shapiro, INRIA & LIP6 Nuno Preguiça, U. Nova de Lisboa Carlos Baquero, U. Minho Marek Zawirski, INRIA & UPMC Conflict-free Replicated Data Types (CRDTs) for collaborative environments

Upload: others

Post on 16-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Marc Shapiro, INRIA & LIP6Nuno Preguiça, U. Nova de Lisboa

Carlos Baquero, U. MinhoMarek Zawirski, INRIA & UPMC

Conflict-free Replicated Data Types (CRDTs)

for collaborative environments

Page 2: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Conflict-free objects for large-scale distribution

Shared Mutable data• Read ⇒ replicate• Updates?

Novel, principled approach:Conflict-free objects

Can we design useful object types without any synchronisation whatsoever?

Can we build practical systems from such objects?

2

•Large, dynamic graph•Incremental, parallel,

asynchronous:- updates - processing

Page 3: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Conflict-free objects for large-scale distribution

Shared Mutable data• Read ⇒ replicate• Updates?

Novel, principled approach:Conflict-free objects

Can we design useful object types without any synchronisation whatsoever?

Can we build practical systems from such objects?

2

•Large, dynamic graph•Incremental, parallel,

asynchronous:- updates - processing

Page 4: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Conflict-free objects for large-scale distribution

Shared Mutable data• Read ⇒ replicate• Updates?

Novel, principled approach:Conflict-free objects

Can we design useful object types without any synchronisation whatsoever?

Can we build practical systems from such objects?

2

•Large, dynamic graph•Incremental, parallel,

asynchronous:- updates - processing

Page 5: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Conflict-free objects for large-scale distribution

Shared Mutable data• Read ⇒ replicate• Updates?

Novel, principled approach:Conflict-free objects

Can we design useful object types without any synchronisation whatsoever?

Can we build practical systems from such objects?

2

•Large, dynamic graph•Incremental, parallel,

asynchronous:- updates - processing

Page 6: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Conflict-free objects for large-scale distribution

Shared Mutable data• Read ⇒ replicate• Updates?

Novel, principled approach:Conflict-free objects

Can we design useful object types without any synchronisation whatsoever?

Can we build practical systems from such objects?

2

•Large, dynamic graph•Incremental, parallel,

asynchronous:- updates - processing

Page 7: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Replication for beginners

Page 8: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Replicated data

Share data ⇒ Replicate at many locations• Performance: local reads• Availability: immune from network failure• Fault-tolerance: replicate computation• Scalability: load balancing

Updates• Push to all replicas• Conflicts: Consistency?

4

•Fault tolerance•and parallelism too?•Conflict!!

Page 9: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong consistency

Preclude conflicts• All replicas execute updates

in same total order• Any deterministic object

Consensus• Serialisation bottleneck• Tolerates < n/2 faults

5

•Very general•Correct•Doesn't scale

•Simultaneous N-way agreement

Page 10: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong consistency

Preclude conflicts• All replicas execute updates

in same total order• Any deterministic object

Consensus• Serialisation bottleneck• Tolerates < n/2 faults

5

•Very general•Correct•Doesn't scale

1 •Simultaneous N-way agreement

Page 11: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong consistency

Preclude conflicts• All replicas execute updates

in same total order• Any deterministic object

Consensus• Serialisation bottleneck• Tolerates < n/2 faults

5

•Very general•Correct•Doesn't scale

12 •Simultaneous N-way agreement

Page 12: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong consistency

Preclude conflicts• All replicas execute updates

in same total order• Any deterministic object

Consensus• Serialisation bottleneck• Tolerates < n/2 faults

5

•Very general•Correct•Doesn't scale

123 •Simultaneous N-way agreement

Page 13: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong consistency

Preclude conflicts• All replicas execute updates

in same total order• Any deterministic object

Consensus• Serialisation bottleneck• Tolerates < n/2 faults

5

•Very general•Correct•Doesn't scale

1234 •Simultaneous N-way agreement

Page 14: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong consistency

Preclude conflicts• All replicas execute updates

in same total order• Any deterministic object

Consensus• Serialisation bottleneck• Tolerates < n/2 faults

5

•Very general•Correct•Doesn't scale

12345 •Simultaneous N-way agreement

Page 15: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong consistency

Preclude conflicts• All replicas execute updates

in same total order• Any deterministic object

Consensus• Serialisation bottleneck• Tolerates < n/2 faults

5

•Very general•Correct•Doesn't scale

123456 •Simultaneous N-way agreement

Page 16: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Eventual Consistency

Update local + propagate• No foreground synch• Expose tentative state• Eventual, reliable delivery

On conflict• Arbitrate• Roll back

Consensus moved to background

6

•Availability ++•Parallelism++•Latency --

•Complexity ++

Page 17: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Eventual Consistency

Update local + propagate• No foreground synch• Expose tentative state• Eventual, reliable delivery

On conflict• Arbitrate• Roll back

Consensus moved to background

6

•Availability ++•Parallelism++•Latency --

•Complexity ++

Page 18: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Eventual Consistency

Update local + propagate• No foreground synch• Expose tentative state• Eventual, reliable delivery

On conflict• Arbitrate• Roll back

Consensus moved to background

6

•Availability ++•Parallelism++•Latency --

•Complexity ++

Page 19: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Eventual Consistency

Update local + propagate• No foreground synch• Expose tentative state• Eventual, reliable delivery

On conflict• Arbitrate• Roll back

Consensus moved to background

6

•Availability ++•Parallelism++•Latency --

•Complexity ++

Page 20: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Eventual Consistency

Update local + propagate• No foreground synch• Expose tentative state• Eventual, reliable delivery

On conflict• Arbitrate• Roll back

Consensus moved to background

6

•Availability ++•Parallelism++•Latency --

•Complexity ++

Page 21: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Eventual Consistency

Update local + propagate• No foreground synch• Expose tentative state• Eventual, reliable delivery

On conflict• Arbitrate• Roll back

Consensus moved to background

6

•Availability ++•Parallelism++•Latency --

•Complexity ++

Conflict!

Page 22: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Eventual Consistency

Update local + propagate• No foreground synch• Expose tentative state• Eventual, reliable delivery

On conflict• Arbitrate• Roll back

Consensus moved to background

6

•Availability ++•Parallelism++•Latency --

•Complexity ++

Page 23: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Eventual Consistency

Update local + propagate• No foreground synch• Expose tentative state• Eventual, reliable delivery

On conflict• Arbitrate• Roll back

Consensus moved to background

6

•Availability ++•Parallelism++•Latency --

•Complexity ++

Page 24: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Eventual Consistency

Update local + propagate• No foreground synch• Expose tentative state• Eventual, reliable delivery

On conflict• Arbitrate• Roll back

Consensus moved to background

6

•Availability ++•Parallelism++•Latency --

•Complexity ++

Page 25: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong Eventual Consistency

Update local + propagate• No synch• Expose intermediate state• Eventual, reliable delivery

No conflict• Deterministic outcome of

concurrent updates

No consensus: ≤ n-1 faultsNot universalSolves the CAP problem

7

•Available, responsive•More parallelism•No conflicts•No rollback

Page 26: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong Eventual Consistency

Update local + propagate• No synch• Expose intermediate state• Eventual, reliable delivery

No conflict• Deterministic outcome of

concurrent updates

No consensus: ≤ n-1 faultsNot universalSolves the CAP problem

7

•Available, responsive•More parallelism•No conflicts•No rollback

Page 27: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong Eventual Consistency

Update local + propagate• No synch• Expose intermediate state• Eventual, reliable delivery

No conflict• Deterministic outcome of

concurrent updates

No consensus: ≤ n-1 faultsNot universalSolves the CAP problem

7

•Available, responsive•More parallelism•No conflicts•No rollback

Page 28: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong Eventual Consistency

Update local + propagate• No synch• Expose intermediate state• Eventual, reliable delivery

No conflict• Deterministic outcome of

concurrent updates

No consensus: ≤ n-1 faultsNot universalSolves the CAP problem

7

•Available, responsive•More parallelism•No conflicts•No rollback

Page 29: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Strong Eventual Consistency

Update local + propagate• No synch• Expose intermediate state• Eventual, reliable delivery

No conflict• Deterministic outcome of

concurrent updates

No consensus: ≤ n-1 faultsNot universalSolves the CAP problem

7

•Available, responsive•More parallelism•No conflicts•No rollback

Page 30: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

The challenge:What interesting objects can

we design with no synchronisation whatsoever?

Page 31: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Portfolio of CRDTs

Register• Last-Writer Wins• Multi-Value

Set• Grow-Only• 2P• Observed-Remove

Map• Set of Registers

Counter• Unlimited• Non-negative

Graphs• Directed• Monotonic DAG• Edit graph

Sequence• Edit sequence

9

Page 32: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Set design alternatives

Sequential specification:• {true} add(e) {e ∈ S}• {true} remove(e) {e ∉ S}

{true} add(e) || remove(e) {????}• linearisable?• add wins?• remove wins?• last writer wins?• error state?

10

•linearisable: sequential order

•equivalent to real-time order

•Requires consensus

Page 33: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Observed-Remove Set

11

•Can never remove more tokens than exist

•Op order ⇒ removed tokens have been previously added

{}

{}

{}s3

s1

s2

s

Page 34: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Observed-Remove Set

11

•Can never remove more tokens than exist

•Op order ⇒ removed tokens have been previously added

add(a){}{aα}

{}

{}s3

s1

s2

sS

Page 35: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Observed-Remove Set

11

•Can never remove more tokens than exist

•Op order ⇒ removed tokens have been previously added

add(a)

add(a)

{}{aα}

{}

{}s3

s1

s2

sS

S

{aβ}

Page 36: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Observed-Remove Set

11

•Can never remove more tokens than exist

•Op order ⇒ removed tokens have been previously added

add(a)

add(a)

{}{aα}

{}

{aβ}M

{aβ}

{}s3

s1

s2

sS

S

{aβ}

Page 37: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Observed-Remove Set

11

•Can never remove more tokens than exist

•Op order ⇒ removed tokens have been previously added

add(a)

add(a)

{}{aα}

{}

{aβ}M

{aα}

M

{aβ} {aβ, aα}

{}s3

s1

s2

sS

S

{aβ}

Page 38: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Observed-Remove Set

11

•Can never remove more tokens than exist

•Op order ⇒ removed tokens have been previously added

add(a)

add(a)

rmv (a){}{aα} {aα}

{}

{aβ}M

{aα}

M

{aβ} {aβ, aα}

{}s3

s1

s2

sS S

S

{aβ}

Page 39: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Observed-Remove Set

11

•Can never remove more tokens than exist

•Op order ⇒ removed tokens have been previously added

add(a)

add(a)

rmv (a){}{aα} {aα}

{}

{aβ}M

{aα}

M

{aα}M

{aβ} {aβ, aα} {aβ, aα}{}

s3

s1

s2

sS S

S

{aβ}

Page 40: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Observed-Remove Set

11

•Can never remove more tokens than exist

•Op order ⇒ removed tokens have been previously added

add(a)

add(a)

rmv (a){}{aα} {aα}

{}

{aβ}M

{aα}

M

{aα}M

{aβ}M

{aβ, aα}

{aβ} {aβ, aα} {aβ, aα}{}

s3

s1

s2

sS S

S

{aβ}

Page 41: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Observed-Remove Set

• Payload: added, removed (element, unique-token)add(e) = A ≔ A ∪ {(e, α)}

• Remove: all unique elements observedremove(e) = R ≔ R ∪ { (e, –) ∈ A}

• lookup(e) = ∃ (e, –) ∈ A \ R • merge (S, S') = (A ∪ A', R ∪ R')• {true} add(e) || remove(e) {e ∈ S}

11

•Can never remove more tokens than exist

•Op order ⇒ removed tokens have been previously added

add(a)

add(a)

rmv (a){}{aα} {aα}

{}

{aβ}M

{aα}

M

{aα}M

{aβ}M

{aβ, aα}

{aβ} {aβ, aα} {aβ, aα}{}

s3

s1

s2

sS S

S

{aβ}

Page 42: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

OR-Set

Set: solves Dynamo Shopping Cart anomalyOptimisations

• Just mark tombstones• Garbage-collect tombstones• Operation-based approach

12

Page 43: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Graph design alternatives

Graph = (V, E) where E ⊆ V ! V

Sequential specification:• {v,v' ∈ V} addEdge(v,v') {…}• {∄(v,v') ∈ E} removeVertex(v) {…}

Concurrent: removeVertex(v') || addEdge(v,v')• linearisable?• addEdge wins?• removeVertex wins?• etc.

13

•for our Web Search Engine application, removeVertex wins

•Do not check precondition at add/remove

Page 44: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Graph

Payload = OR-Set V, OR-Set EUpdates add/remove to V, E• addVertex(v), removeVertex(v)• addEdge(v,v'), removeEdge(v,v')

Do not enforce invariant a priori• lookupEdge(v,v') = (v,v') ∈ E

∧ v ∈ V ∧ v' ∈ VremoveVertex(v') || addEdge(v,v')• remove wins"

14

Page 45: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Graph

Payload = OR-Set V, OR-Set EUpdates add/remove to V, E• addVertex(v), removeVertex(v)• addEdge(v,v'), removeEdge(v,v')

Do not enforce invariant a priori• lookupEdge(v,v') = (v,v') ∈ E

∧ v ∈ V ∧ v' ∈ VremoveVertex(v') || addEdge(v,v')• remove wins"

14

Page 46: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Graph

Payload = OR-Set V, OR-Set EUpdates add/remove to V, E• addVertex(v), removeVertex(v)• addEdge(v,v'), removeEdge(v,v')

Do not enforce invariant a priori• lookupEdge(v,v') = (v,v') ∈ E

∧ v ∈ V ∧ v' ∈ VremoveVertex(v') || addEdge(v,v')• remove wins"

14

Page 47: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Graph

Payload = OR-Set V, OR-Set EUpdates add/remove to V, E• addVertex(v), removeVertex(v)• addEdge(v,v'), removeEdge(v,v')

Do not enforce invariant a priori• lookupEdge(v,v') = (v,v') ∈ E

∧ v ∈ V ∧ v' ∈ VremoveVertex(v') || addEdge(v,v')• remove wins"

14

Page 48: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Graph

Payload = OR-Set V, OR-Set EUpdates add/remove to V, E• addVertex(v), removeVertex(v)• addEdge(v,v'), removeEdge(v,v')

Do not enforce invariant a priori• lookupEdge(v,v') = (v,v') ∈ E

∧ v ∈ V ∧ v' ∈ VremoveVertex(v') || addEdge(v,v')• remove wins"

14

Page 49: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

⊢ ⊣⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 50: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

⊣⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 51: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 52: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

Nβ A

ε

⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 53: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

Nβ A

ε

⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 54: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 55: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 56: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 57: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 58: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Co-operative editing

⊢ ⊣

I N R I A

⊢ α β γ δ ε ⊣

• Deep (internal) view:

• DAG representing insert order

• Surface view: summarises total order

• strong order takes precedence over weak

• add: between already-ordered elements

• begin and end sentinels

• ensure consistent ordering at all replicas

{x,z ∈ graphi ∧ x < z}add-betweeni (x, y, z)

{y ∈ graphi ∧ x<y<z}

• Local constraint implies globally acyclic

I N R A

⊢ α β γ ε ⊣

•This spec is implemented directly by WOOT

•Clean

Page 59: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Continuum

Assign each element a unique real number• position

Real numbers not appropriate• approximate by tree

16

⊣⊣⊢⊢ I0

N100

Page 60: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Continuum

Assign each element a unique real number• position

Real numbers not appropriate• approximate by tree

16

⊣⊣⊢⊢ I0

A101

N100

Page 61: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Continuum

Assign each element a unique real number• position

Real numbers not appropriate• approximate by tree

16

⊣⊣⊢⊢ I0

I100.5

A101

N100

Page 62: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Continuum

Assign each element a unique real number• position

Real numbers not appropriate• approximate by tree

16

⊣⊣⊢⊢ I0

R100.25

I100.5

’-1.00

A101

N100

L-1.01

Page 63: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Treedoc binary tree

add appends leaf ⇒ non-destructive, IDs don’t change

17

1

1

0

Binary naming tree: • compact, self-adjusting• Logarithmic properties

0

remove: tombstone, IDs don't change

= L ’ I N R I

I

R

A

N I

•logarithmic: assuming tree

•Compact, low-arity tree

•In the following slides, will

•Low arity: binary, quaternary

Page 64: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Treedoc binary tree

add appends leaf ⇒ non-destructive, IDs don’t change

17

1

1

0

Binary naming tree: • compact, self-adjusting• Logarithmic properties

0

L

0

remove: tombstone, IDs don't change

= L ’ I N R I

I

R

A

N I

•logarithmic: assuming tree

•Compact, low-arity tree

•In the following slides, will

•Low arity: binary, quaternary

Page 65: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Treedoc binary tree

add appends leaf ⇒ non-destructive, IDs don’t change

17

1

1

0

Binary naming tree: • compact, self-adjusting• Logarithmic properties

0

L

0

remove: tombstone, IDs don't change

= L ’ I N R I

I

R

A

N I

•logarithmic: assuming tree

•Compact, low-arity tree

•In the following slides, will

•Low arity: binary, quaternary

Page 66: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Treedoc binary tree

add appends leaf ⇒ non-destructive, IDs don’t change

17

1

1

0

Binary naming tree: • compact, self-adjusting• Logarithmic properties

0

L

0

remove: tombstone, IDs don't change

= L ’ I N R I

I

R

A

N I

•logarithmic: assuming tree

•Compact, low-arity tree

•In the following slides, will

•Low arity: binary, quaternary

Page 67: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Treedoc binary tree

add appends leaf ⇒ non-destructive, IDs don’t change

17

1

1

0

Binary naming tree: • compact, self-adjusting• Logarithmic properties

0

L

0

remove: tombstone, IDs don't change

= L ’ I N R I

I

R

A

N I

•logarithmic: assuming tree

•Compact, low-arity tree

•In the following slides, will

•Low arity: binary, quaternary

Page 68: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Layered Treedoc

18

binary tree

Page 69: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Layered Treedoc

18

Site 34 Site 79sparse 864--ary tree

binary tree

Page 70: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Layered Treedoc

18

Site 34 Site 79sparse 864--ary tree

binary tree

Site 22

Page 71: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Layered Treedoc

18

Site 34 Site 79sparse 864--ary tree

binary tree

Site 34 Site 22

Page 72: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Layered Treedoc

18

Site 34 Site 79

Site 34 Site 66 Site 79

sparse 864--ary tree

binary tree

Site 34 Site 22

Page 73: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Layered Treedoc

18

Site 34 Site 79

Site 34 Site 66 Site 79

sparse 864--ary tree

binary tree

Site 34 Site 22

Edit: Binary treeConcurrency: Sparse tree

Page 74: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

The theoryTwo simple conditions

for correctness without synchronisation

Page 75: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Query

Local at source replica• Client's choice

20

•Example: Amazon shopping cart is replicated

•unspecified client, e.g., Web front-end

•One or more•load-balancer, failures may direct

client to different replicas

s3

s1

s2

s

client

Page 76: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Query

Local at source replica• Client's choice

20

•Example: Amazon shopping cart is replicated

•unspecified client, e.g., Web front-end

•One or more•load-balancer, failures may direct

client to different replicas

S

s1.q(a)

s3

s1

s2

s

client

Page 77: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Query

Local at source replica• Client's choice

20

•Example: Amazon shopping cart is replicated

•unspecified client, e.g., Web front-end

•One or more•load-balancer, failures may direct

client to different replicas

s2.q(b)S

S

s1.q(a)

s3

s1

s2

s

client

Page 78: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

State-based replication

Local at source s1.u(a), s2.u(b), …• Compute• Update local payload

Convergence:• Episodically: send si payload• On delivery: merge payloads m

21

•merge two valid states

•produce valid state•no historical info

available•Inefficient if

payload is large

s3

s1

s2

s

client

Page 79: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

State-based replication

Local at source s1.u(a), s2.u(b), …• Compute• Update local payload

Convergence:• Episodically: send si payload• On delivery: merge payloads m

21

•merge two valid states

•produce valid state•no historical info

available•Inefficient if

payload is large

S

s1.u(a)

s3

s1

s2

s

client

Page 80: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

State-based replication

Local at source s1.u(a), s2.u(b), …• Compute• Update local payload

Convergence:• Episodically: send si payload• On delivery: merge payloads m

21

•merge two valid states

•produce valid state•no historical info

available•Inefficient if

payload is large

s2.u(b)S

S

s1.u(a)

s3

s1

s2

s

client

Page 81: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

State-based replication

Local at source s1.u(a), s2.u(b), …• Compute• Update local payload

Convergence:• Episodically: send si payload• On delivery: merge payloads m

21

•merge two valid states

•produce valid state•no historical info

available•Inefficient if

payload is large

s2.u(b)S

S

s1.u(a)

s3

s1

s2

s

Page 82: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

State-based replication

Local at source s1.u(a), s2.u(b), …• Compute• Update local payload

Convergence:• Episodically: send si payload• On delivery: merge payloads m

21

•merge two valid states

•produce valid state•no historical info

available•Inefficient if

payload is large

M

s2.m(s1)

s2.u(b)S

S

s1.u(a)

s3

s1

s2

s

s1

Page 83: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

State-based replication

Local at source s1.u(a), s2.u(b), …• Compute• Update local payload

Convergence:• Episodically: send si payload• On delivery: merge payloads m

21

•merge two valid states

•produce valid state•no historical info

available•Inefficient if

payload is large

M

s2.m(s1)

s3.m(s2)M

s2.u(b)S

S

s1.u(a)

s3

s1

s2

s

s1

s2

Page 84: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

State-based replication

Local at source s1.u(a), s2.u(b), …• Compute• Update local payload

Convergence:• Episodically: send si payload• On delivery: merge payloads m

21

•merge two valid states

•produce valid state•no historical info

available•Inefficient if

payload is large

M

s2.m(s1)

s3.m(s2)M

s2.u(b)S

S

s1.u(a) s1.m(s2)M

s3

s1

s2

s

s1s2

s2

Page 85: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

If • payload type forms a semi-lattice• updates are increasing• merge computes Least Upper Bound

then replicas converge to LUB of last valuesExample: Payload = int, merge = max

22

•no reference to history

•⊔ = Least Upper Bound LUB = merge

State-based: monotonic semi-lattice ⇒ CRDT

M

s2.m(s1)

s3.m(s2)M

s2.u(b)S

S

s1.u(a) s1.m(s2)M

s3

s1

s2

s

s1s2

s2

Page 86: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Operation-based replication

At source:• prepare• broadcast to all replicas

Eventually, at all replicas:• update local replica

23

S

S

•push to all replicas eventually

•push small updates- more efficient than

state-based

s3

s1

s2

s s1.u(a)

s2.u(b)

Page 87: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Operation-based replication

At source:• prepare• broadcast to all replicas

Eventually, at all replicas:• update local replica

23

s1.u(b)D

s3.u(b)

D

S

S

•push to all replicas eventually

•push small updates- more efficient than

state-based

s3

s1

s2

s s1.u(a)

s2.u(b) b

b

Page 88: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Operation-based replication

At source:• prepare• broadcast to all replicas

Eventually, at all replicas:• update local replica

23

D

s3.u(a)

s1.u(b)D

s3.u(b)

D

S

S

D

s2.u(a)

•push to all replicas eventually

•push small updates- more efficient than

state-based

s3

s1

s2

s s1.u(a)

s2.u(b) b a

ba

Page 89: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Op-based: commute ⇒ CRDT

If:! •! (Liveness) all replicas execute all operations! ! ! in delivery order

• (Safety) concurrent operations all commuteThen: replicas converge

24

•Delivery order ≃ ensures downstream precondition

•happened-before or weaker

s1.u(a)

s2.u(b)

D

D

D

S

S

D

s3

s1

s2

s

b a

b

s3.u(b) s3.u(a)

s1.u(b)

s2.u(a)a

Page 90: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Monotonic semi-lattice⇔ commutative

1. A state-based object can emulate an operation-based object, and vice-versa

2. State-based emulation of a CvRDT is a CmRDT

3. Operation-based emulation of a CvRDT is a CmRDT

25

•Systematic transformation•Inefficient•⇒ Hand-crafted op-based implementation

Page 91: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Operation-based OR-Set

26

•Set of IDs associated with a in the old state

•As observed by source!

Payload: S = { (e,α), (e, β), (e', γ), … }! ! where α, β,… uniqueOperations: • lookup(e) = ∃ α: (e, α) ∈ S

Page 92: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Operation-based OR-Set

• add(e) = S ≔ S ∪ {(e, α)} where α fresh

26

•Set of IDs associated with a in the old state

•As observed by source!

Payload: S = { (e,α), (e, β), (e', γ), … }! ! where α, β,… uniqueOperations: • lookup(e) = ∃ α: (e, α) ∈ S

Page 93: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Operation-based OR-Set

• add(e) = S ≔ S ∪ {(e, α)} where α fresh• remove (e) =! (at source) R = {(e, α) ∈ S}! (downstream) S ≔ S \ R

26

•Set of IDs associated with a in the old state

•As observed by source!

Payload: S = { (e,α), (e, β), (e', γ), … }! ! where α, β,… uniqueOperations: • lookup(e) = ∃ α: (e, α) ∈ S

Page 94: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Operation-based OR-Set

• add(e) = S ≔ S ∪ {(e, α)} where α fresh• remove (e) =! (at source) R = {(e, α) ∈ S}! (downstream) S ≔ S \ R

No tombstones

26

•Set of IDs associated with a in the old state

•As observed by source!

Payload: S = { (e,α), (e, β), (e', γ), … }! ! where α, β,… uniqueOperations: • lookup(e) = ∃ α: (e, α) ∈ S

Page 95: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Operation-based OR-Set

• add(e) = S ≔ S ∪ {(e, α)} where α fresh• remove (e) =! (at source) R = {(e, α) ∈ S}! (downstream) S ≔ S \ R

No tombstones{true} add(e) || remove(e) {e ∈ S}

26

•Set of IDs associated with a in the old state

•As observed by source!

Payload: S = { (e,α), (e, β), (e', γ), … }! ! where α, β,… uniqueOperations: • lookup(e) = ∃ α: (e, α) ∈ S

Page 96: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Ongoing work

Page 97: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

CRDTsfor P2P

& Cloud Computing

ConcoRDanT: ANR 2010–2013Systematic study of conflict-free design space

• Theory and practice• Characterise invariants• Library of data types

Not universal• Conflict-free vs. conflict semantics• Move consensus off critical path, non-critical ops

28

Page 98: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

CRDT + dataflow

Incremental, asynchronous processing• Replicate, shard CRDTs near the edge• Propagate updates ≈ dataflow• Throttle according to QoS metrics

(freshness, availability, cost, etc.)Scale: shardedSynchronous processing: snapshot, at centre

29

Web site 1

Web site 2

Web site 3

Whitelist

crawl

Content DB

extractlinks

extractwords

Graph

Words

spam detector

Map URL to last 2 versions

Map word to Set of URLs

Graph

Set

Page 99: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

OR-Set + Snapshot

Read consistent snapshot• Despite concurrent, incremental updates

Unique token = time (vector clock)• α = Lamport (process i, counter t)• UIDs identify snapshot version• Snapshot: vector clock value• Retain tombstones until not needed

lookup(e, t) = ∃ (e, i, t')∈A : t'>t ∧ ∄ (e, i, t')∈R: t'>t

30

Page 100: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Sharded OR-Set

Very large objects• Independent shards• Static: hash

Statically-Sharded CRDT• Each shard is a CRDT• Update: single shard• No cross-object invariants• The combination remains a CRDT

Statically Sharded OR-Set• Combination of smaller OR-Sets• Snapshot: clock across shards

31

•(Dynamic: requires consensus to rebalance)

Page 101: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Take aways

Principled approach • Strong Eventual Consistency

Two sufficient conditions:• State: monotonic semi-lattice• Operation: commutativity

Useful CRDTs• Register, Counter, Set, Map (KVS),

Graph, Monotonic DAG, SequenceFuture work

• Snapshot, sharding, dataflow• A wee bit of synchronisation

32

Page 102: Conflict-free Replicated Data Types (CRDTs)events.telecom-sudparis.eu/wetice/invited_speakers/... · Conflict-free Replicated Data Types Conflict-free objects for large-scale distribution

Conflict-free Replicated Data Types

Portfolio of CRDTs

Register• Last-Writer Wins• Multi-Value

Set• Grow-Only• 2P• Observed-Remove

Map• Set of Registers

Counter• Unlimited• Non-negative

Graphs• Directed• Monotonic DAG• Edit graph

Sequence• Edit sequence

33