cap, pacelc, and determinism

20
CAP, PACELC, and CAP, PACELC, and Determinism Determinism Daniel Abadi Daniel Abadi Yale University Yale University January 18 January 18 th th , 2011 , 2011 Twitter: @daniel_abadi Twitter: @daniel_abadi Blog: Blog: dbmsmusings.blogspot.com dbmsmusings.blogspot.com

Upload: daniel-abadi

Post on 10-May-2015

5.408 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: CAP, PACELC, and Determinism

CAP, PACELC, and DeterminismCAP, PACELC, and Determinism

Daniel AbadiDaniel Abadi

Yale University Yale University January 18January 18thth, 2011, 2011

Twitter: @daniel_abadiTwitter: @daniel_abadi

Blog: dbmsmusings.blogspot.comBlog: dbmsmusings.blogspot.com

Page 2: CAP, PACELC, and Determinism

MIT TheoremMIT Theorem

Friends Sleep

Grades

Page 3: CAP, PACELC, and Determinism

CAP TheoremCAP Theorem

Tolerance of Partitioning

AvailabilityConsistency

Page 4: CAP, PACELC, and Determinism

Intuition Behind ProofIntuition Behind Proof

DB 1 DB 2network

Page 5: CAP, PACELC, and Determinism

Intuition Behind ProofIntuition Behind Proof

DB 1 DB 2

network partition

Page 6: CAP, PACELC, and Determinism

Problems with CAPProblems with CAP

Not as elegant as the MIT theoremNot as elegant as the MIT theorem

Page 7: CAP, PACELC, and Determinism

MIT TheoremMIT Theorem

Friends Sleep

Grades

Page 8: CAP, PACELC, and Determinism

CAP TheoremCAP Theorem

Tolerance of Partitioning

AvailabilityConsistency

Page 9: CAP, PACELC, and Determinism

Problems with CAPProblems with CAP

Not as elegant as the MIT theoremNot as elegant as the MIT theorem– There are not three different choices!There are not three different choices!– CA and CP are indistinguishableCA and CP are indistinguishable

Used as an excuse to not bother with Used as an excuse to not bother with consistencyconsistency– ““Availability is really important to me, so CAP Availability is really important to me, so CAP

says I have to get rid of consistency” says I have to get rid of consistency”

Page 10: CAP, PACELC, and Determinism

You can have A and C when You can have A and C when there are no partitions!there are no partitions!

DB 1 DB 2network

Page 11: CAP, PACELC, and Determinism

Source of ConfusionSource of Confusion

Asymmetry of CAP propertiesAsymmetry of CAP properties– Some are properties of the system in generalSome are properties of the system in general– Some are properties of the system only when Some are properties of the system only when

there is a partitionthere is a partition

Page 12: CAP, PACELC, and Determinism

Another Problem to FixAnother Problem to Fix

There are other costs to consistency (besides There are other costs to consistency (besides availability in the face of partitions)availability in the face of partitions)– Overhead of synchronization schemes Overhead of synchronization schemes

Deterministic execution model significantly reduces the cost Deterministic execution model significantly reduces the cost (see Thomson and Abadi in VLDB 2010)(see Thomson and Abadi in VLDB 2010)

– LatencyLatencyIf workload is geographically partitionableIf workload is geographically partitionable

– Latency is not so badLatency is not so bad

Otherwise Otherwise – No way to get around at least one round-trip messageNo way to get around at least one round-trip message

Page 13: CAP, PACELC, and Determinism

A Cut at Fixing Both ProblemsA Cut at Fixing Both Problems

PACELCPACELC– In the case of a partition (P), does the system In the case of a partition (P), does the system

choose availability (A) or consistency (C)?choose availability (A) or consistency (C)?– Else (E), does the system choose latency (L) Else (E), does the system choose latency (L)

or consistency (C)?or consistency (C)?

Page 14: CAP, PACELC, and Determinism

ExamplesExamples

PA/ELPA/EL– Dynamo, SimpleDB, Cassandra, Riptano, CouchDB, Dynamo, SimpleDB, Cassandra, Riptano, CouchDB,

CloudantCloudant

PC/ECPC/EC– ACID compliant database systemsACID compliant database systems

PA/ECPA/EC– GenieDBGenieDB– See CIDR paper from Wada, Fekete, et. al.See CIDR paper from Wada, Fekete, et. al.

Indicates that Google App Engine data store (eventual Indicates that Google App Engine data store (eventual consistent option) falls under this categoryconsistent option) falls under this category

PC/EL: Existence is debatable, see next slidePC/EL: Existence is debatable, see next slide

Page 15: CAP, PACELC, and Determinism

Does PC/EL exist?Does PC/EL exist?

Strengthening consistency when there is a partition doesn’t Strengthening consistency when there is a partition doesn’t seem to make senseseem to make senseI think about it the following wayI think about it the following way– Some EL systems have stronger consistency guarantees than Some EL systems have stronger consistency guarantees than

eventual consistencyeventual consistencyMonotonic reads consistencyMonotonic reads consistencyTimeline consistencyTimeline consistencyRead-your-writes consistencyRead-your-writes consistencyNon-atomic consistencyNon-atomic consistency

– CAP presents a problem for these guaranteesCAP presents a problem for these guarantees– Hence, partitioning causes some loss of availabilityHence, partitioning causes some loss of availability– Examples: PNUTs/Sherpa (original option), MongoDBExamples: PNUTs/Sherpa (original option), MongoDB

Dan Weinreb makes a case that this is too confusing: Dan Weinreb makes a case that this is too confusing: http://danweinreb.org/blog/improving-the-pacelc-taxonomy http://danweinreb.org/blog/improving-the-pacelc-taxonomy

Page 16: CAP, PACELC, and Determinism

A Case for P*/ECA Case for P*/EC

Increased push for horizontally scalable transactional database Increased push for horizontally scalable transactional database systemssystems– Reasons:Reasons:

Cloud computingCloud computingDistributed applicationsDistributed applicationsDesire to deploy applications on cheap, commodity hardwareDesire to deploy applications on cheap, commodity hardware

Vast majority of currently available horizontally scalable systems are Vast majority of currently available horizontally scalable systems are P*/ELP*/EL– Developed by engineers at Google, Facebook, Yahoo, Amazon, etc.Developed by engineers at Google, Facebook, Yahoo, Amazon, etc.– These engineers can handle reduced consistency, but it’s really hard, These engineers can handle reduced consistency, but it’s really hard,

and there needs to be an option for the rest of usand there needs to be an option for the rest of us

AlsoAlso– Distributed concurrency control and commit protocols are expensiveDistributed concurrency control and commit protocols are expensive– Once consistency is gone, atomicity usually goes next ---> NoSQLOnce consistency is gone, atomicity usually goes next ---> NoSQL

Page 17: CAP, PACELC, and Determinism

Key Problems to OvercomeKey Problems to Overcome

High availability is critical, replication must be a first class High availability is critical, replication must be a first class citizencitizen– Today’s systems generally act, then replicateToday’s systems generally act, then replicate

Complicates semantics of sending read queries to replicasComplicates semantics of sending read queries to replicas

Need confirmation from replica before commit (increased latency) if Need confirmation from replica before commit (increased latency) if you want durability and high availabilityyou want durability and high availability

In progress transactions must be aborted upon a master failureIn progress transactions must be aborted upon a master failure

– Want system that replicates then actsWant system that replicates then acts

Distributed concurrency control and commit are Distributed concurrency control and commit are expensive, want to get rid of them bothexpensive, want to get rid of them both

Page 18: CAP, PACELC, and Determinism

Key IdeaKey Idea

Instead of weakening ACID, strengthen it!Instead of weakening ACID, strengthen it!– Guaranteeing equivalence to SOME serial Guaranteeing equivalence to SOME serial

order makes active replication difficultorder makes active replication difficultRunning the same set of xacts on two different Running the same set of xacts on two different replicas might cause replicas to divergereplicas might cause replicas to diverge

Disallow any nondeterministic behaviorDisallow any nondeterministic behaviorDisallow aborts caused by DBMSDisallow aborts caused by DBMS– Disallow deadlockDisallow deadlock– Distributed commit much easier if you don’t Distributed commit much easier if you don’t

have to worry about abortshave to worry about aborts

Page 19: CAP, PACELC, and Determinism

Consequences of DeterminismConsequences of Determinism

Replicas produce the same output, given the same input,Replicas produce the same output, given the same input,– Facilitates active replicationFacilitates active replication

Only initial input needs to be logged, state at failure can Only initial input needs to be logged, state at failure can be reconstructed from this input log (or from a replica)be reconstructed from this input log (or from a replica)Active distributed xacts not aborted upon node failureActive distributed xacts not aborted upon node failure– Greatly reduces (or eliminates) cost of distributed commitGreatly reduces (or eliminates) cost of distributed commit

Don’t have to worry about nodes failing during commit protocolDon’t have to worry about nodes failing during commit protocolDon’t have to worry about affects of transaction making it to disk Don’t have to worry about affects of transaction making it to disk before promising to commit transactionbefore promising to commit transactionJust need one message from any node that potentially can Just need one message from any node that potentially can deterministically abort the xactdeterministically abort the xactThis message can be sent in the middle of the xact, as soon as it This message can be sent in the middle of the xact, as soon as it knows it will commitknows it will commit

Page 20: CAP, PACELC, and Determinism

So how can this be done?So how can this be done?

Serialize transactionsSerialize transactions– SlowSlow

Partition data across cores, serialize transactions in each partitionPartition data across cores, serialize transactions in each partition– My work with Stonebraker, Madden, and Harizopoulos shows that this My work with Stonebraker, Madden, and Harizopoulos shows that this

works well if workload is partitionable (H-Store project)works well if workload is partitionable (H-Store project)– H-Store project became VoltDBH-Store project became VoltDB

What about non-partitionable workloads?What about non-partitionable workloads?– See Thomson and Abadi VLDB 2010 paper: “The case for determinism See Thomson and Abadi VLDB 2010 paper: “The case for determinism

in database systems”in database systems”– db.cs.yale.edu/determinism-vldb10.pdfdb.cs.yale.edu/determinism-vldb10.pdf

Bottom line:Bottom line:– Cost of consistency is less than people thinkCost of consistency is less than people think– Latency is still an issue over a WAN unless workload is geographically Latency is still an issue over a WAN unless workload is geographically

partitionablepartitionable– VLDB 2010 paper makes P*/EC systems more viableVLDB 2010 paper makes P*/EC systems more viable