
Distributed Systems + NodeJS

Bruno Bossola

MILAN 25-26 NOVEMBER 2016

@bbossola

Whoami

Developer since 1988

XP Coach 2000+

Co-founder of JUG Torino

Java Champion since 2005

CTO @ EF (Education First)



I live in London, love the weather...

Agenda

Distributed programming

How does it work, what does it mean

The CAP theorem

CAP explained with code:

CA system using two-phase commit

AP system using sloppy quorums

CP system using majority quorums

What next?

Q&A

Distributed programming

Do we need it?

The 93 petaflop Sunway TaihuLight is installed at the National Supercomputing Centre in Wuxi. At its peak, the computer can perform around 93,000 trillion calculations per second.

It has more than 10.5 million processing cores and 40,960 nodes and runs on a Linux-based operating system.

Distributed programming

Any system has to deal with two tasks:

Storage

Computation

How do we deal with scale?

How do we use multiple computers to do what we used to do on one?

What do we want to achieve?

Scalability

Availability

Consistency

Scalability

The ability of a system/network/process to:

handle a growing amount of work

be enlarged to accommodate new growth

A scalable system continues to meet the needs of its users as the scale increases


Scalability flavours

size: more nodes, more speed

more nodes, more space

more data, same latency

geographic: more data centers, quicker response

administrative: more machines, no additional work

How do we scale? partitioning

Slice the dataset into smaller independent sets

reduces the impact of dataset growth

improves performance by limiting the amount of data to be examined

improves availability by allowing partitions to fail independently
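To make partitioning concrete, here is a minimal sketch in Node (my illustration, not code from the talk; the pickPartition helper is hypothetical). It hashes each key onto one of N independent partitions:

const crypto = require('crypto');

// Hash the key and map it onto one of `numPartitions` independent
// slices of the dataset.
function pickPartition(key, numPartitions) {
  const hash = crypto.createHash('md5').update(key).digest();
  return hash.readUInt32BE(0) % numPartitions;
}

// The same key always lands on the same partition; each partition can
// live on a different machine and fail independently of the others.
console.log(pickPartition('user:42', 4));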

How do we scale? partitioning

But it can also be a source of problems:

what happens if a partition becomes unavailable?

what if it becomes slower?

what if it becomes unresponsive?


How do we scale? replication

Copies of the same data on multiple machines

Benefits:

allows more servers to take part in the computation

improves performance by making additional computing power and bandwidth available

improves availability by creating copies of the data

How do we scale? replication

But it's also a source of problems:

there are independent copies of the data

they need to be kept in sync on multiple machines

Your system must follow a consistency model

[diagram: replicas drifting apart: one set of copies holds v4 v4 v8 v8, another v4 v5 v7 v8]
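Why copies drift is easy to demonstrate. A toy illustration (mine, not the talk's code): two replicas receive the same writes in different orders and end up holding different values until something reconciles them:

// Two naive replicas that simply apply writes in arrival order.
const replicaA = new Map();
const replicaB = new Map();

// The same two writes reach the replicas in opposite orders...
replicaA.set('x', 'v7'); replicaA.set('x', 'v8');  // A applies v7, then v8
replicaB.set('x', 'v8'); replicaB.set('x', 'v7');  // B applies v8, then v7

// ...so without a consistency model the copies silently disagree.
console.log(replicaA.get('x'), replicaB.get('x'));  // prints: v8 v7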


Availability

The proportion of time a system is in a functioning condition

The system is fault-tolerant:

the ability of your system to behave in a well-defined manner once a fault occurs

All clients can always read and write

In distributed systems this is achieved by redundancy


Introducing: performance

The amount of useful work accomplished compared to the time and resources used

Basically:

short response time for a unit of work

high rate of processing

low utilization of resources


There are tradeoffs involved in optimizing for any of these outcomes. For example, a system may achieve a higher throughput by processing larger batches of work thereby reducing operation overhead. The tradeoff would be longer response times for individual pieces of work due to batching.
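A toy sketch of that batching tradeoff (my illustration, not from the talk): one fixed overhead is amortized over a whole batch, which raises throughput, but the first item in every batch now waits for the rest to arrive:

const BATCH_SIZE = 100;
const queue = [];

function submit(item) {
  queue.push(item);                          // item waits in the queue...
  if (queue.length >= BATCH_SIZE) flush();   // ...until the batch is full
}

function flush() {
  // One fixed cost (think disk sync or network round-trip) per batch,
  // instead of one per item: higher throughput, longer response times.
  const batch = queue.splice(0, queue.length);
  console.log(`processed ${batch.length} items with a single overhead`);
}

for (let i = 0; i < BATCH_SIZE; i++) submit(i);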

Introducing: latency

The period between the initiation of something and its occurrence

The time between when something happens and when its impact becomes visible

More high-level examples:

how long until you become a zombie after a bite?

how long until my post is visible to others?


I find that low latency - achieving a short response time - is the most interesting aspect of performance, because it has a strong connection with physical (rather than financial) limitations. It is harder to address latency using financial resources than the other aspects of performance.

Consistency

Any read on a data item X returns a value corresponding to the result of the most recent write on X.

Each client always has the same view of the data

Also known as Strong Consistency


Strong consistency
every replica sees every update in the same order. Updates are made atomically, so that no two replicas may have different values at the same time.

Weak consistency
every replica will see every update, but possibly in different orders.

Eventual consistency
every replica will eventually see every update (i.e. there is a point in time after which every replica has seen a given update), and will eventually agree on all values. Updates are therefore not atomic.

Consistency flavours

Strong consistency: every replica sees every update in the same order.

no two replicas may have different values at the same time.

Weak consistency: every replica will see every update, but possibly in different orders.

Eventual consistency: every replica will eventually see every update and will eventually agree on all values.

The CAP theorem

CONSISTENCY

AVAILABILITY

PARTITION TOLERANCE

The CAP theorem

You cannot have all three :(

You can select two properties at once

Sorry, this has been mathematically proven, and no, it has not been debunked.

Consistency means that each client always has the same view of the data.

Availability means that all clients can always read and write.

Partition tolerance means that the system works well across physical network partitions.

Consistency is considered strong here. Atomic (linearizable) consistency: there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.

The CAP theorem

CA systems! You selected consistency and availability!

Strict quorum protocols (two/multi-phase commit)

Most RDBMS

Hey! A network partition will f**k you up good!

The CAP theorem

AP systems! You selected availability and partition tolerance!

Sloppy quorums and conflict resolution protocols

Amazon Dynamo, Riak, Cassandra

The CAP theorem

CP systems! You selected consistency and partition tolerance!

Majority quorum protocols (Paxos, Raft, ZAB)

Apache Zookeeper, Google Spanner

Raft, Paxos and Zookeeper's ZAB all provide linearizable writes.

This is intuitive since they use a leader which publishes the quorum-voted changes atomically and in order, creating a virtual synchrony.

CockroachDB and Google Spanner also provide linearizability (Google also uses atomic clocks to optimize latency).

NodeJS time!

Let's write our brand new key value store

We will code all three different flavours

We will have many nodes, fully replicated

No sharding

We will kill servers!

We will trigger network partitions! (no worries, it's a simulation!)


explain CAP theorem with a distributed key-value store

move to AP and implement Lamport clocks

move to CP and implement consensus

Node APP: general design

[diagram: the node app exposes an API with GET(k) and SET(k,v), on top of a Core (functions fX, fY, fZ, fK) and a Storage layer backed by a database]
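As a baseline, a minimal single-node version of that design could look like this sketch (my names, not the talk's; the real code is in the repo linked in the Q&A section):

// The smallest possible version of the design above: a GET/SET API
// sitting directly on top of an in-memory database.
class Database {
  constructor() { this.data = new Map(); }
  get(k) { return this.data.get(k); }
  set(k, v) { this.data.set(k, v); }
}

const db = new Database();
db.set('answer', 42);
console.log(db.get('answer'));  // 42

// Everything that follows distributes exactly this API across many nodes.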

CA key-value store

Uses classic two-phase commit

Works like a local system

Not partition tolerant

It provides the illusion of behaving like a single system, but it cannot tolerate network partitions or failures of its parts

Nodeapp CA: two-phase commit, simplified

[diagram: the same API (GET(k), SET(k,v)) and Storage, with a 2PC component in the Core coordinating propose(tx), commit(tx) and rollback(tx) across nodes]
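A minimal sketch of that propose/commit/rollback flow (my simplification, assuming replica objects that expose propose, commit and rollback, and assuming every replica is reachable, which is precisely the CA assumption):

// Coordinator side of a simplified two-phase commit.
// Phase 1: every replica must vote 'yes' on the proposed transaction.
// Phase 2: commit everywhere, or roll back everywhere.
async function twoPhaseCommit(replicas, tx) {
  const votes = await Promise.all(replicas.map(r => r.propose(tx)));
  if (votes.every(vote => vote === 'yes')) {
    await Promise.all(replicas.map(r => r.commit(tx)));
    return 'committed';
  }
  await Promise.all(replicas.map(r => r.rollback(tx)));
  return 'rolled back';
}

// If even one replica is unreachable, propose() cannot complete and the
// write blocks: this is exactly how a network partition hurts a CA system.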

AP key-value store

Eventually consistent design

Prioritizes availability over consistency

Example: Amazon Dynamo (Riak, Cassandra...)

Dynamo prioritizes availability over consistency; it does not guarantee single-copy consistency. Instead, replicas may diverge from each other when values are written; when a key is read, there is a read reconciliation phase that attempts to reconcile differences between replicas before returning the value back to the client.

For many features on Amazon, it is more important to avoid outages than it is to ensure that data is perfectly consistent, as an outage can lead to lost business and a loss of credibility. Furthermore, if the data is not particularly important, then a weakly consistent system can provide better performance and higher availability at a lower cost than a traditional RDBMS.

Nodeapp AP: sloppy quorums, simplified

[diagram: the same API (GET(k), SET(k,v)) and Storage, with a QUORUM component in the Core handling reads, read repair, and propose(tx)/commit(tx)/rollback(tx)]
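A rough sketch of the quorum logic (my illustration; a real Dynamo-style sloppy quorum additionally writes to fallback nodes and uses hinted handoff). Replicas are assumed to expose get and set and to store versioned values:

// Simplified quorum read/write over N replicas. With W + R > N every
// read quorum overlaps every write quorum; a *sloppy* quorum relaxes
// this, accepting whichever nodes answer, to stay available.
const N = 5, W = 3, R = 3;

async function quorumSet(replicas, k, v, version) {
  const acks = await Promise.allSettled(replicas.map(r => r.set(k, v, version)));
  const ok = acks.filter(a => a.status === 'fulfilled').length;
  if (ok < W) throw new Error('write failed: fewer than W replicas acked');
}

async function quorumGet(replicas, k) {
  const results = await Promise.allSettled(replicas.map(r => r.get(k)));
  const values = results.filter(a => a.status === 'fulfilled').map(a => a.value);
  if (values.length < R) throw new Error('read failed: fewer than R replicas answered');
  // Read reconciliation: return the newest version; stale replicas are
  // then repaired by writing this winner back to them.
  return values.reduce((a, b) => (a.version >= b.version ? a : b));
}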

CP key-value store

Uses majority quorums

Guarantees strong consistency

Will use the RAFT protocol

CP: majority quorums (raft, simplified)

[diagram: the same API (GET(k), SET(k,v)) and Storage, with a RAFT component in the Core exchanging beat (heartbeat), voteme (vote request) and history (log replication) messages]
Nodeapp: urgently needs refactoring!!!!

What about BASE?

It's just a way to qualify eventually consistent systems

BAsic Availability: the database appears to work most of the time.

Soft-state: stores don't have to be write-consistent, nor do different replicas have to be mutually consistent all the time.

Eventual consistency: stores exhibit consistency at some later point (e.g., lazily at read time).

What about Lamport clocks?

It's a mechanism to maintain a distributed notion of time

Each process maintains a counter:

whenever a process does work, increment the counter

whenever a process sends a message, include the counter

when a message is received, set the counter to max(local_counter, received_counter) + 1
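Those rules translate almost line for line into code; a minimal sketch:

// Lamport clock: a single logical counter per process.
class LamportClock {
  constructor() { this.counter = 0; }

  // On local work (and before sending a message): increment, and
  // attach the returned counter to the outgoing message.
  tick() { return ++this.counter; }

  // On receive: jump past both clocks.
  receive(receivedCounter) {
    this.counter = Math.max(this.counter, receivedCounter) + 1;
    return this.counter;
  }
}

const clock = new LamportClock();
clock.tick();        // 1
clock.receive(41);   // max(1, 41) + 1 = 42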


What about Vector clocks?

Maintains an array of N Lamport clocks, one per node

Whenever a process does work, increment the logical clock value of the current node in the vector

Whenever a process sends a message, include the full vector

When a message is received:

update each element to max(local, received)

increment the logical clock of the current node in the vector
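And the vector version, sketched the same way (nodeId and size are illustrative parameters):

// Vector clock: one logical clock per node; the whole vector travels
// with every message, so causality can be compared, not just order.
class VectorClock {
  constructor(nodeId, size) {
    this.nodeId = nodeId;
    this.vector = new Array(size).fill(0);
  }

  // Local work (or sending): advance only our own slot.
  tick() { this.vector[this.nodeId] += 1; }

  // On receive: element-wise max, then advance our own slot.
  receive(received) {
    this.vector = this.vector.map((v, i) => Math.max(v, received[i]));
    this.vector[this.nodeId] += 1;
  }
}

const a = new VectorClock(0, 3);
a.tick();              // [1, 0, 0]
a.receive([0, 2, 0]);  // [2, 2, 0]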


What next?

Learn the lingo and the basics

Do your homework

Start playing with these concepts

It's complicated, but not rocket science



Be inspired!

Q&A

Amazon Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

The RAFT consensus algorithm: https://raft.github.io/ and http://thesecretlivesofdata.com/raft/

The code used in this presentation: https://github.com/bbossola/sysdist


[email protected]

@bbossola