algebraic data types: semilattices
DESCRIPTION
Introduction to the algebraic data type Semilattice and its application in distributed environments.
TRANSCRIPT
Algebraic data type: Semilattices
a.k.a. eventually consistent data structures
Bernhard Huemer IRIAN Solutions
@bhuemer
.. because distributed is the new normal
Why are we here?
Shamelessly stolen from: https://skillsmatter.com/skillscasts/4915-how-do-we-reconcile-eventually-consistent-data
Source: Wikimedia Commons
130 ms
E = mc²
Latency might be one reason why you want distribution
Shamelessly stolen from: https://skillsmatter.com/skillscasts/4915-how-do-we-reconcile-eventually-consistent-data
Scale-up vs scale-out
foo = foo + 1
foo = foo + 2
Race conditions
Network partitions
Conflict resolution (1)
Not clinging to some total order will make your life easier
Conflict resolution (2)
Leave it to the user to resolve conflicts; often there’s something meaningful you can do (e.g. merge shopping carts)
G-Counters
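The slide only names the structure, so here is a minimal sketch of a grow-only counter as it is commonly described: one slot per node, merged by taking the per-node maximum. The class name `GCounter`, the string node ids and the method names are assumptions made for illustration, not taken from the talk or from any particular library.

```scala
// Grow-only counter: one non-negative slot per node (node ids are hypothetical strings).
final case class GCounter(counts: Map[String, Long] = Map.empty) {
  // A node only ever increments its own slot.
  def increment(node: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(counts.updated(node, counts.getOrElse(node, 0L) + by))
  }

  // The counter's value is the sum over all slots.
  def value: Long = counts.values.sum

  // Merge takes the per-node maximum, which is idempotent, commutative and associative.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { node =>
      node -> math.max(counts.getOrElse(node, 0L), other.counts.getOrElse(node, 0L))
    }.toMap)
}
```

With this shape, the concurrent `foo = foo + 1` and `foo = foo + 2` updates from the race-condition slide become increments on two different slots, and merging the replicas in either order yields 3.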
Conflict resolution (3)
.. or this thing that Riak does for you
Algebraic data types
Source: http://en.wikipedia.org/wiki/Algebraic_structure
Algebra - the GoF design pattern collection for functional programmers
Rather than solving this problem over and over again, let’s find a more general solution
Semilattice
trait Semilattice[T] {
  def join(a: T, b: T): T
}
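To make the trait concrete, here is a sketch of two instances built from the operations the later slides use as examples, `max` and set union. The object name `SemilatticeInstances` and the value names are hypothetical; the trait is repeated so the block compiles on its own.

```scala
trait Semilattice[T] {
  def join(a: T, b: T): T
}

object SemilatticeInstances {
  // max on integers: joining a value with itself changes nothing.
  val intMax: Semilattice[Int] = new Semilattice[Int] {
    def join(a: Int, b: Int): Int = math.max(a, b)
  }

  // Set union: duplicates and ordering are irrelevant.
  def setUnion[A]: Semilattice[Set[A]] = new Semilattice[Set[A]] {
    def join(a: Set[A], b: Set[A]): Set[A] = a union b
  }
}
```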
Monoid
trait Monoid[T] {
  def id: T
  def op(a: T, b: T): T
}
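As a rough illustration of what the identity element buys you, the sketch below defines integer addition as a monoid and folds a sequence with it. `MonoidInstances` and `combineAll` are names invented for this example; the trait is repeated so the block stands alone.

```scala
trait Monoid[T] {
  def id: T
  def op(a: T, b: T): T
}

object MonoidInstances {
  // Integer addition with 0 as the identity element.
  val intAddition: Monoid[Int] = new Monoid[Int] {
    val id: Int = 0
    def op(a: Int, b: Int): Int = a + b
  }

  // Folding works for any monoid; `id` covers the empty case.
  def combineAll[T](values: Seq[T])(m: Monoid[T]): T =
    values.foldLeft(m.id)(m.op)
}
```

Integer addition is a monoid but not a semilattice, because `1 + 1` is not `1`.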
Semilattice laws: idempotency, commutativity, associativity
Monoid laws: identity, associativity
Idempotency
List(a) ++ List(a) ≠ List(a)
Set(a) ++ Set(a) = Set(a)
1 + 1 ≠ 1
max(1, 1) = 1
• Familiar binary operations forming monoids don’t need to be semilattices!
• Immutability isn’t the same thing as idempotency, and isn’t enough on its own
• It doesn’t matter how many times you apply the operation
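The points above can be sketched as a message that is delivered more than once. The helper `deliver` and the object name are hypothetical; the block only assumes that a replica folds each incoming message into its local state with some binary operation.

```scala
object IdempotencyDemo {
  // Apply the same incoming message `times` times to a local state.
  def deliver[T](state: T, message: T, times: Int)(op: (T, T) => T): T =
    (1 to times).foldLeft(state)((acc, _) => op(acc, message))
}
```

With `max`, delivering 7 to a state of 5 once or three times both give 7. With `+`, one delivery gives 12 and three deliveries give 26, so duplicates corrupt the result.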
Commutativity
• Order in which you apply operations doesn’t matter any more
• If we notice dropped packets, just send them again
1 + 2 = 2 + 1
max(1, 2) = max(2, 1)
List(a) ++ List(b) ≠ List(b) ++ List(a)
Set(a) ++ Set(b) = Set(b) ++ Set(a)
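One way to see the difference between the four examples above is to fold the same messages in every possible arrival order and count the distinct results. This is only a sketch; `outcomes` is a hypothetical helper, and a single outcome also relies on the operation being associative, which holds for `+`, `max` and set union.

```scala
object CommutativityDemo {
  // Fold the messages in every possible arrival order and collect the distinct outcomes.
  def outcomes[T](start: T, messages: List[T])(op: (T, T) => T): Set[T] =
    messages.permutations.map(_.foldLeft(start)(op)).toSet
}
```

For three singleton sets merged with union there is exactly one outcome, whereas three singleton lists concatenated with `++` produce six different lists.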
Associativity (1)
• Allows you to split up and batch computations
• Each node needn’t receive all atomic operands, intermediate results will do as well
1 + (2 + 3) = (1 + 2) + 3
max(1, max(2, 3)) = max(max(1, 2), 3)
List(a) ++ (List(b) ++ List(c)) = (List(a) ++ List(b)) ++ List(c)
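The batching claim can be sketched as follows: reduce each batch independently, as separate nodes might, then combine the partial results. `batched` is a hypothetical helper and assumes a non-empty input list.

```scala
object AssociativityDemo {
  // Reduce each batch on its own (e.g. on separate nodes), then combine the partial results.
  // Assumes `values` is non-empty; `reduce` throws on an empty collection.
  def batched[T](values: List[T], batchSize: Int)(op: (T, T) => T): T =
    values.grouped(batchSize).map(_.reduce(op)).reduce(op)
}
```

For an associative operation the result does not depend on `batchSize`, so a node can forward an intermediate result instead of every atomic operand.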
Associativity (2)
• Again, intermediate results are as good as atomic operands
• You never lose any information in the whole computation
red + blue = blue + red = purple
red + (blue + blue) = red + blue
≠ (red + blue) + blue = purple + blue
* Simplistic version that assumes we’re losing information about the volume of the colour, for example (if you’re mixing paint)
avg(1, avg(2, 4)) = avg(1, 3)
≠ avg(avg(1, 2), 4) = avg(1.5, 4)
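A common way to repair the average example, shown here only as a sketch, is to stop discarding information: carry the sum and the count and divide at the very end. The `Avg` type and its method names are assumptions made for this illustration.

```scala
// Running average that keeps the information a plain avg(a, b) throws away.
final case class Avg(sum: Double, count: Long) {
  // Combining adds sums and counts, which is associative and commutative.
  def combine(other: Avg): Avg = Avg(sum + other.sum, count + other.count)
  def value: Double = sum / count
}

object Avg {
  // Lift a single observation into the structure.
  def of(x: Double): Avg = Avg(x, 1L)
}
```

Grouping no longer matters: both groupings of 1, 2 and 4 end up with a sum of 7 over a count of 3. Note that `combine` is still not idempotent, so this fixes associativity only; duplicated messages would still skew the result.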
G-Set
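A grow-only set is usually described as a set that supports only additions and merges by union. The sketch below follows that description; the class and method names are hypothetical.

```scala
// Grow-only set: elements can be added but never removed.
final case class GSet[A](elements: Set[A] = Set.empty[A]) {
  def add(a: A): GSet[A] = GSet(elements + a)
  def contains(a: A): Boolean = elements.contains(a)

  // Merge is plain set union, so it is idempotent, commutative and associative.
  def merge(other: GSet[A]): GSet[A] = GSet(elements union other.elements)
}
```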
2P-Set
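A two-phase set is commonly built from two grow-only sets, one for additions and one for removals (tombstones). The following is a minimal sketch under that assumption; `TwoPSet` and its field names are invented for illustration.

```scala
// Two-phase set: a pair of grow-only sets, one for additions and one for removals.
final case class TwoPSet[A](added: Set[A] = Set.empty[A], removed: Set[A] = Set.empty[A]) {
  def add(a: A): TwoPSet[A] = copy(added = added + a)

  // Removal is recorded as a tombstone; a removed element can never come back.
  def remove(a: A): TwoPSet[A] =
    if (added.contains(a)) copy(removed = removed + a) else this

  def contains(a: A): Boolean = added.contains(a) && !removed.contains(a)

  // Both halves merge by union, so the merge inherits the semilattice laws.
  def merge(other: TwoPSet[A]): TwoPSet[A] =
    TwoPSet(added union other.added, removed union other.removed)
}
```

The price of this simplicity is that removal wins permanently: re-adding a removed element has no visible effect.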
OP-Set
Further reading (1)
• “Jonas Bonér - The Road to Akka Cluster, and Beyond”: https://skillsmatter.com/skillscasts/4543-the-road-to-akka-cluster-and-beyond
• “Noel Welsh - Reconciling eventually consistent data”: https://skillsmatter.com/skillscasts/4915-how-do-we-reconcile-eventually-consistent-data
• “Sean Cribbs - Eventually Consistent Data Structures”: https://vimeo.com/43903960
Further reading (2)
• “A comprehensive study of Convergent and Commutative Replicated Data Types”: http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf
One more thing …