creating fault tolerant services on mesos

21
Creating fault-tolerant services on Mesos Max Neunhöffer Hangzhou, 19 November 2016 www.arangodb.com

Upload: arangodb

Post on 13-Apr-2017

168 views

Category:

Technology


3 download

TRANSCRIPT

Page 1: Creating Fault Tolerant Services on Mesos

Creating fault-tolerant services onMesosMax Neunhöffer

Hangzhou, 19 November 2016

www.arangodb.com

Page 2: Creating Fault Tolerant Services on Mesos

What is this all about?Fundamental problemCreating, deploying and maintaining resilient distributed systems is hard.Do not “reinvent square wheels”.Use proven and battle-hardenedmethods, algorithms and tools.

This is aboutholding cluster-wide configurationorganizing automatic fail-overdiscovering other serverssynchronizing actions across the cluster

Page 3: Creating Fault Tolerant Services on Mesos

Consensus— what is this and why is it needed?In a distributed system, the various parts need to agree on things.

This is a hard problem!

All is easy if all servers and the network constantly work.Neither is the case in practice! Not at all!!!The “various parts” need to “agree” across all potential failures!Consensusis the art to achieve this as well as possible in software.

Page 4: Creating Fault Tolerant Services on Mesos

Paxos and RaftTraditionally, one uses the Paxos Consensus Protocol (1998).More recently, Raft (2013) has been proposed.

Paxos is a challenge to understand and to implement efficiently.Various variants exist.Raft is designed to be understandable.My advice:First try to understand Paxos for some time (do not implement it!), thenenjoy the beauty of Raft, but do not implement it either!

Use some battle-tested implementation you trust!

But most importantly: DO NOT TRY TO INVENT YOUR OWN!

Page 5: Creating Fault Tolerant Services on Mesos

Raft in a slideAn odd number of servers each keep a persisted log of events.Everything is replicated to everybody.They democratically elect a leader with absolute majority.Only the leader may append to the replicated log.An append only counts when a majority has persisted and confirmed it.Very smart logic to ensure a unique leader and automatic recovery from failure.It is all a lot of fun to get right, but it is proven to work.One puts a key/value store on top, the log contains the changes.

Page 6: Creating Fault Tolerant Services on Mesos

ArangoDB Agency— the data modelPhilosophy: All data is JSON, all communication is via HTTP.You can think of the state of our key/value store as a single JSON document:{ "a": 12,

"b": { "name": "Max","children": ["Savina", "Phil"] },

"c": { "x": 12,"y": { "u": 17, "v": 27, "counts": true },"z": [null, true, false, "Humpdidump"] } }

The API allows to— read projections,— change parts using transactions, and to— specify preconditions.

Page 7: Creating Fault Tolerant Services on Mesos

ArangoDB Agency— read transactionsExample state: { "a": 12,

"b": { "name": "Max","children": ["Savina", "Phil"] },

"c": { "x": 12,"y": { "u": 17, "v": 27, "counts": true },"z": [null, true, false, "Humpdidump"] } }

A read transaction is described by an array of arrays of paths:[ [ "/b/name" ], [ "/a", "/c/y" ] ]

and would here result in an array of single JSON values:[ { "b": { "name": "Max" },

{ "a": 12, "c": { "y": { "u": 17, "v": 27, "counts": true } } } ]

For each string list, you get the smallest projection that contains all paths.

Page 8: Creating Fault Tolerant Services on Mesos

ArangoDB Agency— write transactionsA write transaction is described by a JSON, the attribute names are paths:[ { "/a": { "op": "set", "new": 13 },

"/c/y/u": { "op": "increment" } } ]

for example, sets STATE.a to 13 and increments STATE.c.y.u by 1 in one go.We could have changed the whole of STATE.c.y and deleted STATE.c.x like so:[ { "/c/x": { "op": "delete" } ,

"/c/y": { "op": "set", "new": { "u": 18, "v": 27 } } } ]

There is a short cut for simple operations:[ { "/a": 13 } ] means [ { "/a": { "op": "set", "new": 13 } ]

There are also array operations like “push” and “pop” and one can give a time to live.The K/V store promises to order all transactions linearly.

Page 9: Creating Fault Tolerant Services on Mesos

ArangoDB Agency— preconditionsA write transaction can have a precondition, i.e. a pair [U, P] is sent instead of [U]:[ { "/a": { "op": "set", "new": 13 },

"/c/y/u": { "op": "increment" } },{ "/a": { "old": 12 },

"/c/y": { "oldEmpty": false } } ]

The transaction will only be done, if STATE.a has the value 12 and STATE.c.y is set.There are conditions for a value— to be empty ({ "oldEmpty": true })— to be non-empty ({ "oldEmpty": false })— to have a specified value ({ "old": ... })— to be an array ({ "isArray": true })

Page 10: Creating Fault Tolerant Services on Mesos

ArangoDB Agency— HTTP callbacksOne can order an HTTP callback for some value:[ { "/c/y": { "op": "observe",

"url": "http://<host>:<port>/<path>" } } ]

Whenever anything changes under /c/y, an HTTP POST request is issued.This is good for performance optimizations to react quickly.

BUTNever rely on these for correctness since networks are unreliable.

Page 11: Creating Fault Tolerant Services on Mesos

ArangoDB Agency— supervisionA good design pattern for distributed systems is that

There is a single, fault-tolerant place for configuration.A supervision process monitors and takes action in the central configuration.All servers decentrally discover the change and adapt accordingly.It is ensured, that there is always exactly one supervision process running.ArangoDB AgencyCan play the central role above for a distributed system.One can easily code a supervision process

in JavaScript as Foxx app,with direct access to the agency, andcode is executed on the Raft leader.

Page 12: Creating Fault Tolerant Services on Mesos

Typical use casesUse cases in distributed systems

Central configuration managementHeartbeats and health managementSupervision and automatic fail-overSynchronization between serversLocking and resource managementLeader electionService discoveryUnique initializationetc.

Page 13: Creating Fault Tolerant Services on Mesos

DeploymentYou can deploy an ArangoDB agency via Marathon:

dcos marathon app add arangodb-agency.json

{ "id": "arangodb-agency", "instances": 3,"container": {

"docker": {"image": "arangodb/arangodb-agency", "network": "HOST","forcePullImage": true }, "type": "DOCKER",

"volumes": [{ "hostPath":"data","containerPath":"/var/lib/arangodb3","mode":"RW" },{ "containerPath":"data","persistent":{"size":1024},"mode":"RW"}] },

"portDefinitions": [ { "port": 0, "protocol": "tcp" } ],"cpus": 1, "mem": 1024, "disk": 1024, "env": { "AGENCY_SIZE": "3" } }

Page 14: Creating Fault Tolerant Services on Mesos

Idea of usage

Every task starts up independently, and only knowsits own address (from within the cluster)a way to find the ArangoDB Agency

The idea is thatthe Agency is always up and runningsynchronization and coordination works via Agency transactionsAgency communication is not necessarily fast but reliableclients know all Agency nodes and follow redirects to the leader

Page 15: Creating Fault Tolerant Services on Mesos

DiscoveryInitialization: Every task creates a UUID and does:POST /_api/agency/write[[{"/servers/83e451f04323": {"op": "set",

"new": "172.0.1.3:8529"}},{"/servers/83e451f04323": {"oldEmpty": true}}]

]

and retries with a different UUID if this fails.Regularly: Every task checks the list of servers:POST /_api/agency/read[[ "/servers" ]]and they can talk with each other happily.

Page 16: Creating Fault Tolerant Services on Mesos

One time initialization

ProblemSome global configuration data needs to be initialized once andsubsequently used by everybody.

Page 17: Creating Fault Tolerant Services on Mesos

One time initializationSolution: every node does this at startup:POST /_api/agency/read[[ "/initState" ]]if initState == "done": start serviceif initState non-empty: wait 15s and start from scratch

POST /_api/agency/write[[{"/initState": {"op": "set, "new": "83e451f04323", ttl: 60}},

{"/initState": {"oldEmpty": true}}]]if precondition failed: start from scratch

POST /_api/agency/write[[{"/whatever": ... , "/initState": "done"}]]

Page 18: Creating Fault Tolerant Services on Mesos

Leader election

ProblemAt all times one of the tasks is considered to be the leader.

Page 19: Creating Fault Tolerant Services on Mesos

Leader electionSolution: every follower does this regularly:POST /_api/agency/read[[ "/theLeader" ]]-> empty or {"leader": <id>, "term": <number>}if theLeader is non-empty: follow this leader, keep term

POST /_api/agency/write[[{"/theLeader": {"set": {"leader": "83e451f04323",

"term": <nextTerm>},"ttl": 10}},

{"/theLeader": {"oldEmpty": true}}]]if precondition failed: start overBe leader with <nextTerm> as <myTerm>...

Page 20: Creating Fault Tolerant Services on Mesos

Leader electionSolution: every leader does this regularly:POST /_api/agency/write[[{"/theLeader": {"set": {"leader": "83e451f04323",

"term": <myTerm>},"ttl": 10}},

{"/theLeader": {"old": {"leader": "83e451f04323","term": <myTerm>}}}]]

if precondition failed: resign and become follower

Continue being leader, use <myTerm> with every action

Important: A follower stores the term of the current leader and does notaccept orders from a leader with a lower term.

Page 21: Creating Fault Tolerant Services on Mesos

Linkshttps://www.arangodb.com

https://docs.arangodb.com/cookbook/index.html

http://the-paper-trail.org/blog/consensus-protocols-paxos/

https://raft.github.io/

http://mesos.apache.org/

https://mesosphere.com/

https://mesosphere.github.io/marathon/

https://github.com/neunhoef/AgencyUsage