bigsets…crdt sets but bigger... “…after some analysis we found that much of our data could be...

168
Bigsets…CRDT sets but BIGGER Russell Brown, Basho Technologies [email protected]

Upload: others

Post on 11-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Bigsets…CRDT sets but BIGGER

Russell Brown, Basho Technologies [email protected]

Page 2: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 3: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 4: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 5: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 6: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

This project is funded by the European Union,

7th Research Framework Programme, ICT call 10,

grant agreement n°609551.

Page 7: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

1.What are CRDTs (good for)?

2.History of CRDT Sets

3.Sets in Riak

4.Bigger Sets in Riak

4 Sections

Page 8: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Why CRDTs?

Page 9: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Fundamental Trade Off

• Lipton/Sandberg ’88

• Attiya/Welch ’94

• Gilbert/Lynch ’02

Low Latency/Availability: - Increased Revenue - User Engagement

Strong Consistency:- Easier for Programmers

- Less user “surprise”

Page 10: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 11: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 12: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

1 CLIENT

2 REPLICAS

1 KEY

Page 13: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

GETPUT

UPDATE

REPLICATE

Page 14: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

PUTPUT

Page 15: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Quorum

Page 16: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

GET

A

PUT

A

GET

B

PUT

B

TEMPORAL TIME

Page 17: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

happens before

concurrent ——— divergent

convergent

Logical Clocks

Page 18: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

happens before

concurrent ——— divergent

convergent

Logical Clocks

Page 19: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 20: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

>155196119890 155196118001

Timestamp based reconciliation

Page 21: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 22: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Business Logic/Semantic Reconciliation

Page 23: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 24: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

happens before

concurrent ——— divergent

convergent

Logical Clocks

Page 25: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Removes?

Page 26: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

GET

A

PUT

A

GET

B

PUT

B

TEMPORAL TIME

Page 27: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Page 28: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 29: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Removes?

Page 30: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Google F1“Designing applications to cope with concurrency anomalies in their data is very error-prone, time-consuming, and ultimately not worth the performance gains.”

Page 31: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

http://www.infoq.com/articles/key-lessons-learned-from-transition-to-nosql

“…writing merge functions was likely to confuse the hell out of

all our developers and slow down development…”

Page 32: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

CRDTsDATA TYPESThat CONVERGE

Page 33: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

CRDTsOff the shelfMERGE functions

Page 34: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

CRDTsCRDTs are Data Types (maps/sets/booleans/graphs/etc)

THAT CONVERGE

Page 35: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 36: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

CRDT SETS

Page 37: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

http://www.infoq.com/articles/key-lessons-learned-from-transition-to-nosql

“…after some analysis we found that much of our data could be modelled

within sets so by leveraging CRDT’s our developers don't have to worry about

writing bespoke merge functions for 95% of carefully selected use cases…”

Page 38: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Evolution of a CRDT Set

Page 39: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Evolution of a Set

G-SET

Page 40: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Bob

Replica BReplica AShelly

AnnaPete

AlexShelly

Page 41: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Bob

Replica BReplica AShelly

AnnaPete

AlexShelly⨆

Page 42: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

BobShelly

AnnaPete

AlexG-SET

Page 43: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Removes?

Page 44: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Evolution of a Set

G-SET2P-SET

Page 45: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Adds

BobPete

Shelly

Anna

Removes

BobPete

Shelly

2P-SET

Page 46: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Adds

BobPete

Shelly

Anna

Removes

BobPete

Shelly

= Anna

Page 47: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Value /= Structure

Page 48: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Adds

BobPete

Shelly

Anna

Removes

BobPete

Shelly

= Anna

Page 49: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

I changed my mind!

Page 50: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Adds

BobPete

Shelly

Anna

Removes

BobPete

Shelly

= AnnaShelly

Page 51: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Evolution of a Set

U-SET

Page 52: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Bob

Replica BReplica A

Shelly

AnnaPete

AlexShelly⨆

4321 5

6

Page 53: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

BobShelly

AnnaPete

Alex432

1,6

5

U-SET

Page 54: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Evolution of a Set

U-SETOR-SET

Page 55: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Evolution of a Set

OR-SETAW-SET

Page 56: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Adds

Shelly

Removes

BobShelly

AnnaPete

432

1,5ShellyBob

Shelly

Pete321

AW-SET

Page 57: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Adds

2

3

Bob

Pete

1 Shelly

Replica A

Removes

2

3

Bob

Pete

1 Shelly

Replica B

4 Anna

5 Shelly

Adds

Page 58: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Adds

2

3

Removes

Bob

Pete

1,5 Shelly

2

3

Bob

Pete

1 Shelly

4 Anna

Anna=Shelly

Page 59: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ObservedRemove

Page 60: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Semantics

AddWins

Page 61: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Evolution of a Set

AW-SET

Page 62: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Adds

2

3

Removes

Bob

Pete

1,5 Shelly

2

3

Bob

Pete

1,5 Shelly

=Shelly

4 Anna

[]4 Anna

Page 63: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so
Page 64: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Evolution of a Set

OR-SWOT

Page 65: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Evolution of a Set

OR-SWOTOptimised

AW-SET

Page 66: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Version Vectors

A B

Page 67: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Version Vectors

A B

{a, 1}

{b, 1}

{a, 2}

[ ]{a, 2}, {b, 1}

Page 68: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

EVENTS/TAGS

A

{a, 1}

{a, 2}

Page 69: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 1}]

{a, 1} Shelly

Replica A

Page 70: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 1}] [{a, 1}

{a, 1} Shelly {a, 1} Shelly

Replica B Replica A

Page 71: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 1}] [{a, 1}, {b, 3}]

{a, 1} Shelly

{b, 1}

{b, 2}

{b, 3}

Bob

Pete

Phil

{a, 1} Shelly

Replica A Replica B

Page 72: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 1}, {b,3}] [{a, 1}, {b, 3}]

{a, 1} Shelly

{b, 1}

{b, 2}

{b, 3}

Bob

Pete

Phil

{a, 1} Shelly

{b, 1}

{b, 2}

{b, 3}

Bob

Pete

Phil

Replica B Replica A

Page 73: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 2}, {b, 3}]

{b, 1}

{b, 3}

[{a, 1}, {b, 4}]

Bob

Pete

{a, 1} Shelly

{b, 1}

{b, 2}

{b, 3}

Bob

Pete

Phil

{a, 1} Shelly

{a, 2} Anna {b, 4} John

Replica A Replica B

{b, 2} Phil

Page 74: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 2}, {b, 3}]

{b, 1}

{b, 3}

[{a, 1}, {b, 4}]

Bob

Pete

{a, 1} Shelly

{b, 1}

{b, 2}

{b, 3}

Bob

Pete

Phil

{a, 1} Shelly

{a, 2} Anna {b, 4} John

Replica A Replica B

Page 75: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 2}, {b, 3}]

{b, 1}

{b, 3}

[{a, 1}, {b, 4}]

Bob

Pete

{a, 1} Shelly

{b, 1}

{b, 2}

{b, 3}

Bob

Pete

Phil

{a, 1} Shelly

{a, 2} Anna {b, 4} John

Replica A Replica B MERGE

Page 76: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 2}, {b, 3}]

{b, 1}

{b, 3}

[{a, 1}, {b, 4}]

Bob

Pete

{a, 1} Shelly

{b, 1}

{b, 2}

{b, 3}

Bob

Pete

Phil

{a, 1} Shelly

{a, 2} Anna {b, 4} John

Replica A Replica B MERGE

{a, 1} Shelly

{b, 1} Bob

Page 77: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 2}, {b, 3}]{b, 2} ∈Phil

Page 78: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 2}, {b, 3}]{b, 2} ∈Phil

Page 79: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 2}, {b, 3}]

{b, 1}

{b, 3}

[{a, 1}, {b, 4}]

Bob

Pete

{a, 1} Shelly

{b, 1}

{b, 2}

{b, 3}

Bob

Pete

Phil

{a, 1} Shelly

{a, 2} Anna {b, 4} John

Replica A Replica B MERGE

{a, 1} Shelly

{b, 1} Bob

{b, 3} Pete

Page 80: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

∉{a, 2}Anna

[{a, 1}, {b, 4}]

Page 81: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 2}, {b, 3}]

{b, 1}

{b, 3}

[{a, 1}, {b, 4}]

Bob

Pete

{a, 1} Shelly

{b, 1}

{b, 2}

{b, 3}

Bob

Pete

Phil

{a, 1} Shelly

{a, 2} Anna {b, 4} John

Replica A Replica B MERGE

{a, 1} Shelly

{b, 1} Bob

{b, 3} Pete

{a, 2} Anna

{b, 4} John

⨆ =

Page 82: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 2}, {b, 4}]

{b, 1}

{a, 2}

{b, 3}

Bob

Pete

Anna

{a, 1} Shelly

{b, 4} John

= Bob

Pete

Anna

Shelly

John

Page 83: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

CRDT Sets

a semantic of “Add-Wins” via

“Observed Remove”

Page 84: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

SETS in RIAK 2.0+

Page 85: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Riak 2.0riak_dt -> Riak Data Types

Page 86: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Sets in Riak

An optimized conflict-free replicated set Annette Bieniusa et al

http://arxiv.org/abs/1210.3368

Page 87: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

WHO USES THE LIB?

Page 88: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

A B

Client 1

who’s the actor?

Client 10000Client 2 ………

Page 89: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

A B

Client 1 Client 10000Client 2 ………

{c1, 1} {cX, 1}{c2, 1}

Page 90: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

A B

Client 1

[]

Page 91: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

A B

Client 1

{c1, 1} Shelly

Page 92: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

A B

Client 1

[]

Page 93: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

A B

Client 1

{c1, 1} Bob

Page 94: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Read Your Own

Writes

Page 95: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

A B

REPLICAS ACTORS

Page 96: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

HOW TO USE THE LIB?

Page 97: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

A B

[HAIRDRYER, PENCIL CASE]

SHOPPING CART

Page 98: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

A B

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …, …, …, …, …, …, …, …], […,

ZUCK’s FOLLOWERS?

Page 99: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Client X Client Y

Add “Shelly” Add “Bob”

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 100: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Client X Client Y

remove “Shelly” remove “Bob”

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 101: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ObservedRemove

Page 102: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Client X Client Y

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

Page 103: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Client X Client Y

remove “Shelly”[{A3, B4}]

remove “Bob”[{A1, B5}]

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 104: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Sets in Riak• Operation Based API

• With causal Context for removes!

• Vnode As Actor/Replica

• Action-at-a-distance

• Full state replication

Page 105: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Sets in Riak

An optimized conflict-free replicated set Annette Bieniusa et al

http://arxiv.org/abs/1210.3368

Page 106: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Sets in Riak

Page 107: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Sets in Riak

Page 108: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Sets in Riak

PHOTO © 2011 J. RONALD LEE, CC ATTRIBUTION 3.0. https://www.flickr.com/photos/jronaldlee/5566380424

Page 109: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Client X

Add “Shelly”

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 110: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 111: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 112: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 113: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 114: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

[{a, 34}, {b, 1000}…]

Page 115: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a35}shelly, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

[{a, 35}, {b, 1000}…]

Page 116: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 117: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 118: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

REPLICATE

Page 119: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

U

Page 120: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 121: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[…, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …,

…, …, …], […, …, …, …, …, …, …, …,

ZUCK’s FOLLOWERS

Page 122: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Problem?

• 1key -> 1 Set • Poor Write speed • Can’t have “big” sets

Page 123: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Every time we change the set we read and write the

whole set!

Page 124: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Sets in Riak

Page 125: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Sets in Riak

10k sets, 100k elements, 50 workers - write

Page 126: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Sets in Riak

Small : riak object 1MB limit

Page 127: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Bigsets:Make writes faster

andsets bigger

Page 128: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Bigset Design: Overview

Page 129: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Bigset Design: Overview

Page 130: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Initial Results

Page 131: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Initial Results

10k sets, 100k elements, 50 workers - write

Page 132: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

10k sets, 100k elements, 50 workers - write

Page 133: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Client X

Add “Shelly”

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 10}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Page 134: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 10}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Page 135: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 10}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 10}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Page 136: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 11}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Shelly a 11

[{a, 11}, {b 99}…{z, 89}]

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 10}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Page 137: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

[{a, 11},

Shelly a 11

[{a, 11}, {b 99}…{z, 89}]

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 11}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Page 138: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

REPLICATEZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 11}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Shelly a 11

Shelly a 11

Page 139: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 10}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Shelly a 11

Page 140: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 10}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Shelly a 11[{a, 10}, {b 99}…{z, 89}]

Page 141: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 10}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Shelly a 11

[{a, 10}, {b 99}…{z, 89}]

Page 142: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ZUCK’s FOLLOWERS CLOCK

bob a 1

[{a, 10}, {b 99}…{z, 89}]

bob c 12

Zooey b 97

Shelly a 11

[{a, 11}, {b 99}…{z, 89}][{a, 11}, {b 99}…{z, 89}]

Page 143: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

ZUCK’s FOLLOWERS CLOCK [{a, 10}, {b 99}…{z, 89}]

bob a 1

bob c 12

Zooey b 97

Shelly a 11

[{a, 11}, {b 99}…{z, 89}]

Page 144: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

THAT’S IT!

Page 145: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

THAT’S IT?• Reads! • Version Vector! • Hand-Off! • AAE!

Page 146: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Reads?

Page 147: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Initial Read Results

Page 148: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

10k sets, 100k elements, 20 workers - read

Page 149: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

clock

element-N

end_key

set-tombstone

element-1

clock

element-N

end_key

set-tombstone

element-1

Read Clock

Iterate keys

Page 150: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

C++clock

element-N

end_key

set-tombstone

element-1

Read Clock

Iterate each key

ERLANG

Page 151: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

C++clock

element-N

end_key

set-tombstone

element-1

Read Clock

100k buffer of keys

ERLANG

Page 152: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

<<Set, \0, $c, \0, Actor, \0>>

<<Set, \0, %z, \0, \0>>

<<Set, \0, $e, \0, Element-1, \0, Actor, Cnt:/64/big-unsigned-

integer>>

<<Set, \0, $e, \0, Element-1, \0, Actor, Cnt:/64/big-unsigned-

integer>>

No Sext

No T2B

Page 153: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Bigset Design: read

Client X

read fsm

block1

replica A replica C

block1

blockN

block1

blockN

incremental merge

blockN

Page 154: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Reads Today

10k sets, 100k elements, 20 workers - read

Page 155: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Reads Today

10k sets, 100k elements, 20 workers - read

Page 156: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Full Set Read or Queries?

Why read the whole set?

‘Cos you HAVE TO!

Page 157: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Full Set Read or Queries?

Why read the whole set?

Page 158: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Full Set Read or Queries?

Why read the whole set?

‘Cos you HAVE TO!

Page 159: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Bigset Queries• Subset

• Is Member? • Range queries SORTED! • Pagination

Page 160: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Removes• Observed-Remove - context

• Requires _some kind_ of read

• cheap membership check

Page 161: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Is Member(X)

Client X

read fsm

is_member(<<bob>>).

replica A replica C

read(<<bob>>)

Page 162: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Is Member(X)Replica A

{ReplicaA, clock} [{a, 3}, {b,3}]

{<<bob>>, a, 1} <<>>

{<<bob>>, b, 2} <<>>

{<<shelly>>, b, 1} <<>>

{ReplicaA, end_key} <<>>

read clock

{<<anne>>, a, 3} <<>>

Seek <<bob>>

Page 163: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Is Member(X)Replica A

{ReplicaA, clock} [{a, 3}, {b,3}]

{<<bob>>, a, 1} <<>>

{<<bob>>, b, 2} <<>>

{<<shelly>>, b, 1} <<>>

{ReplicaA, end_key} <<>>

read clock

{<<anne>>, a, 3} <<>>

Seek <<bob>>

Page 164: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Is Member(X)Replica A

[{a, 3}, {b,3}]

<<bob>> [{a,1}, {b,2}]

read fsm

<<bob>> Set

Replica C

[{a,3}, {c, 1}]

<<bob>> [{c, 1}]

Page 165: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Is Member(X)

⊔=[{a, 3}, {b,3}]

<<bob>> [{a,1}, {b,2}]

[{a,3}, {c, 1}]

<<bob>> [{c, 1}]

==<<bob>> [{b,2}, {c,1}]

Read FSM

{true, [{b,2}, {c, 1}]}Client X

Page 166: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Next?

• Other “Big” Types - Maps

• Quorum Read Secondary Indexes

• Big Sets of Maps - Tables

• Joins? SQL?

Page 167: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

Summary

• CRDTs make eventual consistency easier on developers

• There exists an Optimised Add-Wins Set…

• It takes more more than a lib

• A little engineering goes a long way

Page 168: Bigsets…CRDT sets but BIGGER... “…after some analysis we found that much of our data could be modelled within sets so

THANK YOU!

WE’RE HIRING! UK Client Service Engineer Developer Advocate EMEA

bashojobs.theresumator.com

VISIT OUR STAND Experience our Riak TS demo and be

entered to win a Scalextric set! Get your invitation to our IoT Riak TS

Roadshow

Bigset Paper https://arxiv.org/abs/1605.06424