Algebraic data types: Semilattices

a.k.a. eventually consistent data structures

Bernhard Huemer IRIAN Solutions

@bhuemer

.. because distributed is the new normal

Why are we here?

Shamelessly stolen from: https://skillsmatter.com/skillscasts/4915-how-do-we-reconcile-eventually-consistent-data


Source: Wikimedia Commons

130 ms

E = mc²

Latency might be one reason why you want distribution



Scale-up vs scale-out

foo = foo + 1

foo = foo + 2

Race conditions

Network partitions

Conflict resolution (1)

Not clinging to some total order will make your life easier


Leave it to the user to resolve conflicts, often there’s something meaningful you can do (e.g. merge shopping carts)

G-Counters


.. or this thing that Riak does for you
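A minimal sketch of a G-Counter (grow-only counter), assuming one slot per replica and a merge that takes the per-replica maximum. The names `GCounter`, `increment`, `merge` and the replica ids "a" and "b" are made up for illustration; this is not Riak's API.

```scala
// Hypothetical G-Counter: each replica only ever increments its own slot.
final case class GCounter(counts: Map[String, Long] = Map.empty) {
  def increment(node: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(counts.updated(node, counts.getOrElse(node, 0L) + by))
  }

  // The observed value is the sum over all replicas.
  def value: Long = counts.values.sum

  // Per-replica maximum: idempotent, commutative and associative.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { node =>
      node -> math.max(counts.getOrElse(node, 0L), other.counts.getOrElse(node, 0L))
    }.toMap)
}

object GCounterDemo {
  val start  = GCounter()
  val onA    = start.increment("a", 1) // "foo = foo + 1" on replica a
  val onB    = start.increment("b", 2) // "foo = foo + 2" on replica b
  val merged = onA.merge(onB)          // neither increment is lost
}
```

Under these assumptions `GCounterDemo.merged.value` is 3 whichever replica merges first, and merging the same state again changes nothing.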

Algebraic data types

Source: http://en.wikipedia.org/wiki/Algebraic_structure

Algebra - the GoF design pattern collection for functional programmers

Rather than solving this problem over and over again, let’s find a more general solution


Semilattice

trait Semilattice[T] {
  def join(a: T, b: T): T
}

Monoid

trait Monoid[T] {
  def id: T
  def op(a: T, b: T): T
}

Semilattice laws: idempotency, commutativity, associativity

Monoid laws: identity, associativity
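To make the two traits concrete, here is a small sketch with a few instances. The object name `Instances` and the instance names are illustrative assumptions; the traits are repeated so the snippet stands alone.

```scala
trait Semilattice[T] {
  def join(a: T, b: T): T
}

trait Monoid[T] {
  def id: T
  def op(a: T, b: T): T
}

object Instances {
  // max on integers is idempotent, commutative and associative.
  val maxInt: Semilattice[Int] = new Semilattice[Int] {
    def join(a: Int, b: Int): Int = math.max(a, b)
  }

  // Set union satisfies the same three laws.
  def setUnion[A]: Semilattice[Set[A]] = new Semilattice[Set[A]] {
    def join(a: Set[A], b: Set[A]): Set[A] = a union b
  }

  // Integer addition has an identity and is associative,
  // but it is not idempotent (1 + 1 is not 1), so it is only a monoid.
  val intAddition: Monoid[Int] = new Monoid[Int] {
    def id: Int = 0
    def op(a: Int, b: Int): Int = a + b
  }
}
```

Joining a value with itself returns the same value for `maxInt` and `setUnion`, whereas `intAddition.op(1, 1)` yields 2.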

Idempotency

List(a) ++ List(a) ≠ List(a)

Set(a) ++ Set(a) = Set(a)

1 + 1 ≠ 1

max(1, 1) = 1


• Familiar binary operations that form monoids need not form semilattices!

• Immutability isn’t enough, and it isn’t the same thing as idempotency

• It doesn’t matter how many times you apply the operation
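As a rough illustration of the bullets above, the sketch below applies the same update once and then a second time, as if a message had been delivered twice. The object and value names are hypothetical.

```scala
object IdempotencyDemo {
  // Set union: a duplicate delivery changes nothing.
  val setOnce: Set[String]  = Set.empty[String] ++ Set("a")
  val setTwice: Set[String] = setOnce ++ Set("a")

  // List concatenation: a duplicate delivery is visible in the result.
  val listOnce: List[String]  = List.empty[String] ++ List("a")
  val listTwice: List[String] = listOnce ++ List("a")

  // max: applying the same operand again is harmless.
  val maxOnce: Int  = math.max(0, 1)
  val maxTwice: Int = math.max(maxOnce, 1)
}
```

Note that every value here is immutable, yet only the set and max versions tolerate the repeated delivery.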

Commutativity

• The order in which you apply operations no longer matters

• If we notice dropped packets, just send them again

1 + 2 = 2 + 1

max(1, 2) = max(2, 1)

List(a) ++ List(b) ≠ List(b) ++ List(a)

Set(a) ++ Set(b) = Set(b) ++ Set(a)
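One way to see this is to replay the same updates in every possible arrival order and count the distinct outcomes. This is only a sketch; the update values and names are made up.

```scala
object CommutativityDemo {
  val updates: List[Int] = List(3, 1, 2)

  // Combine the updates with max in every arrival order.
  val maxResults: Set[Int] =
    updates.permutations
      .map(order => order.reduce((a, b) => math.max(a, b)))
      .toSet

  // Combine the same updates by list concatenation in every arrival order.
  val concatResults: Set[List[Int]] =
    updates.permutations
      .map(order => order.foldLeft(List.empty[Int])((acc, x) => acc ++ List(x)))
      .toSet
}
```

With max every ordering collapses to a single result, while concatenation produces a different list for each of the six orderings.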

Associativity (1)

• Allows you to split up and batch computations

• Each node needn’t receive all atomic operands; intermediate results will do just as well

1 + (2 + 3) = (1 + 2) + 3

max(1, max(2, 3)) = max(max(1, 2), 3)

List(a) ++ (List(b) ++ List(c)) = (List(a) ++ List(b)) ++ List(c)
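The batching idea can be sketched as follows, assuming two hypothetical nodes that each reduce half of the operands and forward only their partial result.

```scala
object BatchingDemo {
  val operands: List[Int] = List(4, 9, 2, 7, 5, 1)

  // One node sees every operand.
  val direct: Int = operands.reduce((a, b) => math.max(a, b))

  // Each batch of three is reduced separately; only the partial results travel.
  val partials: List[Int] =
    operands.grouped(3).map(batch => batch.reduce((a, b) => math.max(a, b))).toList

  // Combining the partial results gives the same answer as the direct reduction.
  val batched: Int = partials.reduce((a, b) => math.max(a, b))
}
```

Because max is associative, how the operands are grouped into batches does not affect the final value.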

Associativity (2)

• Again, intermediate results are as good as atomic operands

• You never lose any information in the whole computation

red + blue = blue + red = purple

red + (blue + blue) = red + blue ≠ (red + blue) + blue = purple + blue

* Simplified example: it assumes that mixing loses information about the volume of each colour (as when mixing paint)

avg(1, avg(2, 4)) = avg(1, 3) ≠ avg(avg(1, 2), 4) = avg(1.5, 4)
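One common way around this, sketched here as an assumption rather than something from the slides, is to keep the information that avg throws away: carry the running sum and count, and divide only at the end. The type `Avg` and its methods are hypothetical. Its combine is associative and commutative but not idempotent, so it illustrates the "never lose information" point rather than a full semilattice.

```scala
// Hypothetical average that keeps sum and count instead of the quotient.
final case class Avg(sum: Double, count: Long) {
  def combine(other: Avg): Avg = Avg(sum + other.sum, count + other.count)
  def value: Double = sum / count
}

object Avg {
  def of(x: Double): Avg = Avg(x, 1L)
}

object AvgDemo {
  val left: Avg  = Avg.of(1.0).combine(Avg.of(2.0).combine(Avg.of(4.0)))
  val right: Avg = Avg.of(1.0).combine(Avg.of(2.0)).combine(Avg.of(4.0))
}
```

Both groupings end up with a sum of 7.0 over 3 values, so they agree on the average.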

2P-Set
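A minimal sketch of a 2P-Set (two-phase set), assuming the usual formulation with one grow-only set of additions and one grow-only set of removals, merged by union. The names `TwoPSet`, `added`, `removed` and the shopping-cart items are illustrative.

```scala
// Hypothetical 2P-Set: two grow-only sets, one for adds and one for removes.
final case class TwoPSet[A](added: Set[A] = Set.empty[A], removed: Set[A] = Set.empty[A]) {
  def add(a: A): TwoPSet[A] = copy(added = added + a)

  // Only an element that has been added can be removed.
  def remove(a: A): TwoPSet[A] =
    if (added(a)) copy(removed = removed + a) else this

  def contains(a: A): Boolean = added(a) && !removed(a)

  def elements: Set[A] = added -- removed

  // Union on both components: idempotent, commutative and associative.
  def merge(other: TwoPSet[A]): TwoPSet[A] =
    TwoPSet(added union other.added, removed union other.removed)
}

object TwoPSetDemo {
  val base     = TwoPSet[String]().add("milk").add("eggs")
  val replicaA = base.remove("milk") // one replica removes an item
  val replicaB = base.add("bread")   // another replica adds an item concurrently
  val merged   = replicaA.merge(replicaB)
}
```

In this sketch a removal is permanent: once "milk" is in the removed set, adding it again has no visible effect.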

OP-Set

Further reading (1)

• “Jonas Bonér - The Road to Akka Cluster, and Beyond”: https://skillsmatter.com/skillscasts/4543-the-road-to-akka-cluster-and-beyond

• “Noel Welsh - Reconciling eventually consistent data”: https://skillsmatter.com/skillscasts/4915-how-do-we-reconcile-eventually-consistent-data

• “Sean Cribbs - Eventually Consistent Data Structures”: https://vimeo.com/43903960


Further reading (2)

• “A comprehensive study of Convergent and Commutative Replicated Data Types”: http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf


One more thing …
