Distributed Systems + NodeJS
Bruno Bossola
MILAN 25-26 NOVEMBER 2016
@bbossola
Whoami
Developer since 1988
XP Coach 2000+
Co-founder of JUG Torino
Java Champion since 2005
CTO @ EF (Education First)
I live in London, love the weather...
Agenda
Distributed programming
How does it work, what does it mean
The CAP theorem
CAP explained with code:
CA system using two phase commit
AP system using sloppy quorums
CP system using majority quorums
What next?
Q&A
Distributed programming
Do we need it?
The 93 petaflop Sunway TaihuLight is installed at the National Supercomputing Centre in Wuxi. At its peak, the computer can perform around 93,000 trillion calculations per second.
It has more than 10.5 million processing cores and 40,960 nodes and runs on a Linux-based operating system.
Distributed programming
Any system should deal with two tasks:
Storage
Computation
How do we deal with scale?
How do we use multiple computers to do what we used to do on one?
What do we want to achieve?
Scalability
Availability
Consistency
Scalability
The ability of a system/network/process to: handle a growing amount of work
be enlarged to accommodate new growth
A scalable system continues to meet the needs of its users as the scale increases
clipart courtesy of openclipart.org
Scalability flavours
size: more nodes, more speed
more nodes, more space
more data, same latency
geographic: more data centers, quicker response
administrative: more machines, no additional work
How do we scale? partitioning
Slice the dataset into smaller independent sets
reduces the impact of dataset growth
improves performance by limiting the amount of data to be examined
improves availability by allowing partitions to fail independently
How do we scale? partitioning
But it can also be a source of problems:
what happens if a partition becomes unavailable?
what if it becomes slower?
what if it becomes unresponsive?
clipart courtesy of openclipart.org
How do we scale? replication
Copies of the same data on multiple machines
Benefits:
allows more servers to take part in the computation
improves performance by making additional computing power and bandwidth available
improves availability by creating copies of the data
How do we scale? replication
But it's also a source of problems:
there are independent copies of the data
they need to be kept in sync on multiple machines
Your system must follow a consistency model
clipart courtesy of openclipart.org
Availability
The proportion of time a system is in a functioning condition
The system is fault-tolerant:
the ability of your system to behave in a well-defined manner once a fault occurs
All clients can always read and write
In distributed systems this is achieved by redundancy
clipart courtesy of openclipart.org
Introducing: performance
The amount of useful work accomplished compared to the time and resources used
Basically:
short response time for a unit of work
high rate of processing
low utilization of resources
clipart courtesy of openclipart.org
There are tradeoffs involved in optimizing for any of these outcomes. For example, a system may achieve a higher throughput by processing larger batches of work thereby reducing operation overhead. The tradeoff would be longer response times for individual pieces of work due to batching.
Introducing: latency
The period between the initiation of something and its occurrence
The time between when something happens and when it has an impact or becomes visible
more high-level examples:
how long until you become a zombie after a bite?
how long until my post is visible to others?
clipart courtesy of cliparts.co
I find that low latency - achieving a short response time - is the most interesting aspect of performance, because it has a strong connection with physical (rather than financial) limitations. It is harder to address latency using financial resources than the other aspects of performance.
Consistency
Any read on a data item X returns a value corresponding to the result of the most recent write on X.
Each client always has the same view of the data
Also known as Strong Consistency
clipart courtesy of cliparts.co
Strong consistency
every replica sees every update in the same order. Updates are made atomically, so that no two replicas may have different values at the same time.
Weak consistency
every replica will see every update, but possibly in different orders.
Eventual consistency
every replica will eventually see every update (i.e. there is a point in time after which every replica has seen a given update), and will eventually agree on all values. Updates are therefore not atomic.
Consistency flavours
Strong consistency: every replica sees every update in the same order; no two replicas may have different values at the same time.
Weak consistency: every replica will see every update, but possibly in different orders.
Eventual consistency: every replica will eventually see every update and will eventually agree on all values.
The CAP theorem
CONSISTENCY
AVAILABILITY
PARTITION TOLERANCE
The CAP theorem
You cannot have all three :(
You can select two properties at once
Sorry, this has been mathematically proven and no, it has not been debunked.
Consistency means that each client always has the same view of the data.
Availability means that all clients can always read and write.
Partition tolerance means that the system works well across physical network partitions.
Consistency is considered strong here:
Atomic (linearizable) consistency: there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests to the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.
The CAP theorem
CA systems!
You selected consistency and availability!
Strict quorum protocols (two/multi phase commit)
Most RDBMS
Hey! A network partition will f**k you up good!
The CAP theorem
AP systems!
You selected availability and partition tolerance!
Sloppy quorums and conflict resolution protocols
Amazon Dynamo, Riak, Cassandra
The CAP theorem
CP systems!
You selected consistency and partition tolerance!
Majority quorum protocols (paxos, raft, zab)
Apache Zookeeper, Google Spanner
Raft, Paxos and Zookeeper's ZAB all provide linearizable writes.
This is intuitive since they use a leader which publishes the quorum-voted changes atomically and in order, creating a virtual synchrony.
CockroachDB and Google Spanner, also provide linearizability (Google also uses atomic clocks to optimize latency).
NodeJS time!
Let's write our brand new key value store
We will code all three different flavours
We will have many nodes, fully replicated
No sharding
We will kill servers!
We will trigger network partitions!
(no worries, it's a simulation!)
clipart courtesy of cliparts.co
explain the CAP theorem with a distributed key-value store
move to AP and implement Lamport clocks
move to CP and implement consensus
Node app: general design (diagram)
The API layer exposes GET(k) and SET(k,v); a Core dispatches to functions fX, fY, fZ, fK on top of a Storage layer backed by a Database.
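A minimal in-memory sketch of this layering, with illustrative names and no replication yet: the API simply delegates GET and SET to the storage layer.

```javascript
// Toy version of the node app's general design: API layer over storage.
// All names are illustrative, not the presentation's actual code.
class Storage {
  constructor() { this.db = new Map(); }   // the "Database"
  read(k)     { return this.db.get(k); }
  write(k, v) { this.db.set(k, v); }
}

class NodeApp {
  constructor() { this.storage = new Storage(); }
  get(k)    { return this.storage.read(k); }    // API: GET (k)
  set(k, v) { this.storage.write(k, v); }       // API: SET (k,v)
}
```

Everything that follows in the talk swaps the Core between the API and the storage: 2PC for CA, sloppy quorums for AP, RAFT for CP.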
CA key-value store
Uses classic two-phase commit
Works like a local system
Not partition tolerant
It provides the illusion of behaving like a single system, but cannot tolerate network partitions or failures of its parts
Node app, CA: two phase commit, simplified (diagram)
The API exposes GET(k) and SET(k,v); a 2PC Core coordinates the Storage/Database through propose(tx), commit(tx) and rollback(tx).
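The propose/commit/rollback flow can be sketched as a toy, in-process two-phase commit. This is an illustration, not the code from the sysdist repo: class names are made up, there is no real network, and the coordinator-failure cases that make 2PC block in practice are omitted.

```javascript
// Toy two-phase commit across in-process replicas. Illustrative only.
class Replica {
  constructor() { this.db = new Map(); this.pending = null; }
  propose(tx) { this.pending = tx; return true; }  // phase 1: vote yes
  commit() {                                       // phase 2: apply
    this.db.set(this.pending.k, this.pending.v);
    this.pending = null;
  }
  rollback() { this.pending = null; }              // phase 2: abort
}

function twoPhaseCommit(replicas, tx) {
  // Phase 1: every replica must vote yes on the proposed transaction.
  const votes = replicas.map(r => r.propose(tx));
  if (votes.every(v => v)) {
    replicas.forEach(r => r.commit());             // Phase 2: commit all
    return true;
  }
  replicas.forEach(r => r.rollback());             // Phase 2: abort all
  return false;
}
```

Note the CA weakness: if even one replica cannot vote (a partition!), the whole write fails, and there is no way to make progress.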
AP key-value store
Eventually consistent design
Prioritizes availability over consistency
Example: Amazon Dynamo (Riak, Cassandra...)
Dynamo prioritizes availability over consistency; it does not guarantee single-copy consistency. Instead, replicas may diverge from each other when values are written; when a key is read, there is a read reconciliation phase that attempts to reconcile differences between replicas before returning the value back to the client.
For many features on Amazon, it is more important to avoid outages than it is to ensure that data is perfectly consistent, as an outage can lead to lost business and a loss of credibility. Furthermore, if the data is not particularly important, then a weakly consistent system can provide better performance and higher availability at a lower cost than a traditional RDBMS.
Node app, AP: sloppy quorums, simplified (diagram)
The API exposes GET(k) and SET(k,v); a QUORUM Core coordinates the Storage/Database through propose(tx), commit(tx) and rollback(tx), plus read and repair operations.
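A toy sketch of the read-reconciliation idea, assuming each replica stores a { value, version } pair with comparable integer versions. Names are illustrative, not the talk's code; a real Dynamo-style store would compare vector clocks and contact any reachable replicas rather than the first N.

```javascript
// Toy quorum read with read repair. Illustrative only.
class QuorumReplica {
  constructor() { this.db = new Map(); }
  read(k)          { return this.db.get(k); }
  write(k, v, ver) { this.db.set(k, { value: v, version: ver }); }
}

function quorumRead(replicas, k, quorum) {
  // Read from a quorum of replicas; they may have diverged.
  const answers = replicas.slice(0, quorum).map(r => r.read(k)).filter(Boolean);
  if (answers.length === 0) return undefined;
  // Reconcile: keep the answer with the highest version.
  const newest = answers.reduce((a, b) => (a.version >= b.version ? a : b));
  // Read repair: push the winning value back to every replica.
  replicas.forEach(r => r.write(k, newest.value, newest.version));
  return newest.value;
}
```

Availability wins here: a stale replica never blocks the read, it just gets repaired after the fact.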
CP key-value store
Uses majority quorums (raft)
Guarantees strong consistency (linearizable writes)
Will use the RAFT protocol
CP: majority quorums (raft, simplified) (diagram)
The API exposes GET(k) and SET(k,v); a RAFT Core coordinates the Storage/Database and exchanges beat, voteme and history messages with its peers.
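A heavily simplified, in-process sketch of the majority-quorum rule at the heart of this design. This is not real RAFT: terms, elections (voteme), heartbeats (beat) and log repair (history) are all omitted, and the names are illustrative. A write commits only when a majority of the cluster, leader included, has accepted it.

```javascript
// Toy majority-quorum replication in the spirit of RAFT. Illustrative only.
class Follower {
  constructor(alive = true) { this.log = []; this.alive = alive; }
  append(entry) {              // true if the entry was accepted
    if (!this.alive) return false;
    this.log.push(entry);
    return true;
  }
}

class Leader {
  constructor(followers) { this.log = []; this.followers = followers; }
  set(k, v) {
    const entry = { k, v };
    this.log.push(entry);
    // Count acks: the leader itself plus every follower that accepted.
    const acks = 1 + this.followers.filter(f => f.append(entry)).length;
    const clusterSize = this.followers.length + 1;
    return acks > clusterSize / 2;  // committed only with a majority
  }
}
```

This is why a CP system stays consistent under partition: the minority side simply cannot commit, so it goes unavailable instead of diverging.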
Node app: urgently needs refactoring!!!!
What about BASE?
It's just a way to qualify eventually consistent systems
BAsic Availability
The database appears to work most of the time.
Soft-state
Stores don't have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
Eventual consistency
Stores exhibit consistency at some later point (e.g., lazily at read time).
What about Lamport clocks?
It's a mechanism to maintain a distributed notion of time
Each process maintains a counter:
Whenever a process does work, increment the counter
Whenever a process sends a message, include the counter
When a message is received, set the counter to max(local_counter, received_counter) + 1
clipart courtesy of cliparts.co
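The three rules above fit in a few lines. A minimal sketch, with an illustrative class name:

```javascript
// A minimal Lamport clock: one counter per process.
class LamportClock {
  constructor() { this.counter = 0; }
  tick() { return ++this.counter; }   // rule 1: local work
  send() { return ++this.counter; }   // rule 2: stamp outgoing message
  receive(received) {                 // rule 3: merge incoming stamp
    this.counter = Math.max(this.counter, received) + 1;
    return this.counter;
  }
}
```

The resulting timestamps respect causality: if event A can influence event B, A's timestamp is smaller, though the converse does not hold.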
What about Vector clocks?
Maintains an array of N Lamport clocks, one per node
Whenever a process does work, increment the logical clock value of the node in the vector
Whenever a process sends a message, include the full vector
When a message is received:
update each element to max(local, received)
increment the logical clock of the current node in the vector
clipart courtesy of cliparts.co
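The same rules, sketched for a fixed cluster of N nodes (illustrative names; `nodeId` indexes this node's slot in the vector):

```javascript
// A minimal vector clock for a fixed cluster of `size` nodes.
class VectorClock {
  constructor(nodeId, size) {
    this.nodeId = nodeId;
    this.vector = new Array(size).fill(0);
  }
  tick() {                         // local work: bump our own slot
    this.vector[this.nodeId]++;
    return [...this.vector];
  }
  send() { return this.tick(); }   // include the full vector in the message
  receive(received) {
    // Merge element-wise, then bump our own slot.
    this.vector = this.vector.map((v, i) => Math.max(v, received[i]));
    this.vector[this.nodeId]++;
    return [...this.vector];
  }
}
```

Unlike Lamport clocks, comparing two vectors tells you whether two events are causally ordered or concurrent, which is exactly what an AP store needs for conflict resolution.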
What next?
Learn the lingo and the basics
Do your homework
Start playing with these concepts
It's complicated, but not rocket science
Be inspired!
Q&A
Amazon Dynamo:
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
The RAFT consensus algorithm:
https://raft.github.io/
http://thesecretlivesofdata.com/raft/
The code used in this presentation:
https://github.com/bbossola/sysdist
clipart courtesy of cliparts.co
bbossola@gmail.com
@bbossola