
Distributed Systems Day 12: Consistency

“Got to Catch Them All…”

Today

• Raft Recap

• Consensus

• CAP Theorem

Raft

• Proposed by Ongaro and Ousterhout in 2014

• Five components
  • Leader election
  • Log replication
  • Safety
  • Client protocol
  • Membership changes

• Assumes crash failures (so no Byzantine failures)

• No dependency on time for safety
  • But depends on time for availability

• Tolerates (N-1)/2 failures

Raft Properties

• Safety: at most one leader (per term)
  • Each follower votes for at most one candidate
  • A candidate needs a majority to become leader

• Liveness: eventually there will be a leader
  • Challenge: if multiple servers call for an election at once → split vote
  • Timeout + randomness: randomness helps ensure that one server's election timeout fires before the others'

• Log Safety: if the leader commits, then the data is in all future leaders
  • Election modification: followers only vote for a candidate whose log is at least as up-to-date (higher term/index); see the sketch below
  • Commit modification: a new leader does not commit until entries in its current term have been agreed on by followers
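As a rough sketch of these election rules (my own illustration, not from the slides; the RequestVote fields follow the Raft paper, but the state object is an assumed stand-in):

    import random

    ELECTION_TIMEOUT = (0.150, 0.300)  # seconds; typical Raft range, illustrative

    def random_election_timeout():
        # Randomness makes it likely that exactly one follower times out
        # first, becomes a candidate, and collects a majority before the
        # others even start an election (avoids repeated split votes).
        return random.uniform(*ELECTION_TIMEOUT)

    def handle_request_vote(state, term, candidate_id, last_log_index, last_log_term):
        """Follower's vote handler: at most one vote per term, and only for
        a candidate whose log is at least as up-to-date as ours."""
        if term < state.current_term:
            return False                      # stale candidate
        if term > state.current_term:
            state.current_term = term
            state.voted_for = None            # new term: our vote is free again
        if state.voted_for not in (None, candidate_id):
            return False                      # already voted in this term
        my_last_term = state.log[-1].term if state.log else 0
        my_last_index = len(state.log)
        # Up-to-date check: compare last log term, then last log index.
        if (last_log_term, last_log_index) < (my_last_term, my_last_index):
            return False
        state.voted_for = candidate_id
        return True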

How do you Change Cluster Size? N=5 → N=7

Goal: add two new servers to the cluster

Challenge: consistently get all servers to agree on the new cluster size

Case: during the cluster update, the leader fails AND different servers have different notions of the size → multiple leaders


Solution: need a protocol to consistently update the cluster

Configuration Changes

• Cannot switch directly from one configuration to another: conflicting majorities could arise (checked in the sketch below)

• Example: switching from N=3 to N=5; see the paper for details

[Figure: timeline of Servers 1-5 during the switch from Cold to Cnew; a majority of Cold and a majority of Cnew can be disjoint, so both could elect a leader at the same time.]
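A tiny check (my own, not from the slides) that "conflicting majorities" are real: with Cold = {S1, S2, S3} and Cnew = {S1, ..., S5}, a majority of Cold and a majority of Cnew can share no server, so each could elect its own leader:

    from itertools import combinations

    c_old = {"S1", "S2", "S3"}               # N=3
    c_new = {"S1", "S2", "S3", "S4", "S5"}   # N=5

    def majorities(cluster):
        need = len(cluster) // 2 + 1         # quorum size
        return [set(m) for m in combinations(sorted(cluster), need)]

    # Find a majority of c_old and a majority of c_new with no server in
    # common -- two simultaneous leaders would then be possible.
    disjoint = [(a, b) for a in majorities(c_old) for b in majorities(c_new)
                if not (a & b)]
    print(disjoint[0])   # e.g. ({'S1', 'S2'}, {'S3', 'S4', 'S5'})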

Today

• Raft Recap

• Consensus

• CAP Theorem

Approaches to Replication

Passive replication
• Total ordering
• Protocols: ZooKeeper, Paxos, Chubby

Active replication
• FIFO ordering
• Tolerates Byzantine failures

Lazy replication
• Causal ordering
• Protocols: Gossip, Dynamo, Cassandra, Voldemort, MongoDB

[Figure: three architectures side by side. Active replication: each front end (FE) sends every request to all of Servers A, B, C. Passive replication: FEs send requests to Server A (leader), which replicates to Servers B and C (followers). Lazy replication: each FE sends a request to any one of Servers A, B, C.]

Approaches to Replication

Passive replication
• Total ordering
• Performance issues: slow and limits parallelism
  • All servers process the same requests

Lazy replication
• Causal ordering
• Performance: faster, as sketched below
  • Any server can process any request
  • More parallelism

[Figure: passive replication (all requests through the leader) vs. lazy replication (any server handles any request), as above.]
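To make the performance contrast concrete, a minimal routing sketch (my own; the server names are assumed):

    import random

    SERVERS = ["A", "B", "C"]
    LEADER = "A"

    def route_passive(request):
        # Passive: every request funnels through the leader, which totally
        # orders it and replicates to the followers. The leader is a
        # bottleneck, and every server does the same work.
        return LEADER

    def route_lazy(request):
        # Lazy: any server accepts the request and syncs with the others
        # later (e.g. by gossip). More parallelism, weaker (causal) ordering.
        return random.choice(SERVERS)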

Thinking About Consistency

• Pretend all replicas are one server

• If different clients write to and read from this "one" server, what should we expect?

[Figure: running example against the replicated store (Server A leader, Servers B and C followers). Clients C1-C4 issue operations on key c: two writes, set(c=5) and set(c=7), and six reads, Get(c).]

Consistency Spectrum

STRONG CONSISTENCY (slower but easy to program) → WEAK CONSISTENCY (fast but harder to program):

Strict Serializability → Linearizable → Sequential → Causal+ → Eventual

Linearizable

• Total order + FIFO + "time"
  • Recall: total order means all servers apply operations in the same order
  • Linearizable histories are a subset of totally ordered ones: the order must be FIFO per client and consistent with real time

[Figure: the running example with initial c=3; under linearizability, each Get(c) must return the value of the most recent write that completed before it in real time. See the sketch below.]
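As a hedged sketch (my own illustration, not from the slides) of the real-time rule for a single register: a Get may return the latest write that completed before it started, or any write still in flight when it starts:

    def linearizable_reads(writes, read_start, initial=3):
        """Legal values for a Get starting at `read_start`, given
        non-overlapping writes [{'value', 'start', 'end'}, ...]."""
        done = [w for w in writes if w["end"] < read_start]
        in_flight = [w for w in writes if w["start"] <= read_start <= w["end"]]
        legal = {w["value"] for w in in_flight}
        # The latest completed write (or the initial value, if none) is legal.
        legal.add(max(done, key=lambda w: w["end"])["value"] if done else initial)
        return legal

    writes = [{"value": 5, "start": 0, "end": 1},    # set(c=5)
              {"value": 7, "start": 2, "end": 3}]    # set(c=7)
    print(linearizable_reads(writes, read_start=4))  # {7}: 3 and 5 are now illegal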

Sequential

• Total order + FIFO

• Sequential histories are a subset of totally ordered ones
  • The order must be FIFO: each client's own requests stay in program order
  • Requests from different clients can be reshuffled relative to real time (see the example below)

[Figure: the running example with initial c=3; under sequential consistency, a read may return a stale value, as long as all servers agree on one order that preserves each client's program order.]
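One history (my own example) that shows the reshuffling: it is sequentially consistent but not linearizable:

    # Real-time order of completed operations on c (initial c=3):
    #   C1: set(c=5)  finishes, then
    #   C2: set(c=7)  finishes, then
    #   C3: Get(c)    returns 3
    #
    # Linearizability forbids this: the read started after set(c=7)
    # completed, so it must return 7.
    # Sequential consistency allows it: all servers can agree on the one
    # order [C3: Get(c)=3, C1: set(c=5), C2: set(c=7)], which preserves
    # each client's program order; real time is ignored.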

Causal+

• Must respect causality

• Needs vector clocks to track and maintain causality (see the sketch below)

• Only causally related events need to be ordered

• NO TOTAL ORDERING!

[Figure: the running example with initial c=3; under causal+, the concurrent writes set(c=5) and set(c=7) need not be ordered, so different clients may see them in different orders.]
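A minimal vector-clock comparison (my own sketch, not from the slides), showing how causal order, and its absence, is decided:

    def vc_leq(a, b):
        """a happened-before-or-equal b: a[i] <= b[i] for every node i."""
        return all(a.get(n, 0) <= b.get(n, 0) for n in set(a) | set(b))

    def compare(a, b):
        if a == b:
            return "equal"
        if vc_leq(a, b):
            return "a -> b (must be ordered)"
        if vc_leq(b, a):
            return "b -> a (must be ordered)"
        return "concurrent (no order required)"

    # set(c=5) at node X and set(c=7) at node Y, neither saw the other:
    print(compare({"X": 1}, {"Y": 1}))           # concurrent
    # A read that already saw set(c=5) is causally after it:
    print(compare({"X": 1}, {"X": 1, "Y": 1}))   # a -> b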

Eventual

• Anything can happen

• If the writes stop → eventually all servers return the same data (one merge rule is sketched below)

[Figure: the running example with initial c=3; under eventual consistency, any Get(c) may return 3, 5, or 7 until the replicas converge.]
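One common way to make "eventually all servers return the same data" concrete is last-writer-wins merging during anti-entropy; a hedged sketch (my own, not how any particular system above does it):

    def merge_lww(a, b):
        """Merge replica states {key: (timestamp, value)}: per key, keep the
        write with the larger timestamp. Merging is commutative, so after
        enough pairwise exchanges every replica converges to the same state."""
        out = dict(a)
        for key, tv in b.items():
            if key not in out or tv > out[key]:
                out[key] = tv
        return out

    r1 = {"c": (2, 7)}   # saw set(c=7) at t=2
    r2 = {"c": (1, 5)}   # saw set(c=5) at t=1
    assert merge_lww(r1, r2) == merge_lww(r2, r1) == {"c": (2, 7)}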

Consistency Spectrum

• Linearizable: total order + real time
• Sequential: total order + per-client order
• Causal+: causally ordered + eventually everyone agrees
• Eventual: eventually everyone agrees

Strict Serializability → Linearizable → Sequential → Causal+ → Eventual

STRONG CONSISTENCY: slower, but easy to program
WEAK CONSISTENCY: fast and parallel, but harder to program (needs conflict resolution)

Today

• Raft Recap

• Consensus

• CAP Theorem

CAP Theorem

• Consistency model: how does your system react during a partition?

• C: Consistency (linearizable)
  • All accesses are linearizable
  • Requires consensus

• A: Availability
  • All clients can make progress

• P: Partition tolerance
  • C and A hold even during a partition

CAP Theorem: choosing Consistency

[Figure: a network partition splits the cluster of Server A (leader) and Servers B and C (followers), with a front end (FE) on each side.]

Servers cut off from the leader can't commit:
• Not the leader
• Can't reach the leader

→ No availability on that side of the partition, but consistency is preserved.

CAP Theorem: choosing Availability

[Figure: the same partition, but the cut-off servers keep accepting writes: "I'll commit: there WILL BE CONFLICTS."]

→ Availability (all clients make progress), but no consistency.


Raft (linearizable), passive replication:
• Strong consistency
• During a partition, some clients will make no progress, because the leader is unreachable

Eventual consistency:
• During a partition, clients will make progress, since any client can change the same data
• But no consistency guarantees

CAP Theorem

• C: Consistency (linearizable)
• A: Availability
• P: Partition tolerance

• Given a partition, you must pick between availability and consistency
  • Pick consistency: some clients (not all) can change data, consistently
  • Pick availability: all clients can change data, but inconsistently

Today

• Raft recap
  • Challenge: how to change the size of the cluster

• Consensus: consistency models
  • Definitions of the different consistency models
  • Differences between the models

• CAP theorem: given "P", you can only have "A" or "C"
  • When designing a system that must tolerate partitions, you must pick between "A" and "C"
