cap, pacelc, and determinism
Post on 10-May-2015
5.408 Views
Preview:
TRANSCRIPT
CAP, PACELC, and DeterminismCAP, PACELC, and Determinism
Daniel AbadiDaniel Abadi
Yale University Yale University January 18January 18thth, 2011, 2011
Twitter: @daniel_abadiTwitter: @daniel_abadi
Blog: dbmsmusings.blogspot.comBlog: dbmsmusings.blogspot.com
MIT TheoremMIT Theorem
Friends Sleep
Grades
CAP TheoremCAP Theorem
Tolerance of Partitioning
AvailabilityConsistency
Intuition Behind ProofIntuition Behind Proof
DB 1 DB 2network
Intuition Behind ProofIntuition Behind Proof
DB 1 DB 2
network partition
Problems with CAPProblems with CAP
Not as elegant as the MIT theoremNot as elegant as the MIT theorem
MIT TheoremMIT Theorem
Friends Sleep
Grades
CAP TheoremCAP Theorem
Tolerance of Partitioning
AvailabilityConsistency
Problems with CAPProblems with CAP
Not as elegant as the MIT theoremNot as elegant as the MIT theorem– There are not three different choices!There are not three different choices!– CA and CP are indistinguishableCA and CP are indistinguishable
Used as an excuse to not bother with Used as an excuse to not bother with consistencyconsistency– ““Availability is really important to me, so CAP Availability is really important to me, so CAP
says I have to get rid of consistency” says I have to get rid of consistency”
You can have A and C when You can have A and C when there are no partitions!there are no partitions!
DB 1 DB 2network
Source of ConfusionSource of Confusion
Asymmetry of CAP propertiesAsymmetry of CAP properties– Some are properties of the system in generalSome are properties of the system in general– Some are properties of the system only when Some are properties of the system only when
there is a partitionthere is a partition
Another Problem to FixAnother Problem to Fix
There are other costs to consistency (besides There are other costs to consistency (besides availability in the face of partitions)availability in the face of partitions)– Overhead of synchronization schemes Overhead of synchronization schemes
Deterministic execution model significantly reduces the cost Deterministic execution model significantly reduces the cost (see Thomson and Abadi in VLDB 2010)(see Thomson and Abadi in VLDB 2010)
– LatencyLatencyIf workload is geographically partitionableIf workload is geographically partitionable
– Latency is not so badLatency is not so bad
Otherwise Otherwise – No way to get around at least one round-trip messageNo way to get around at least one round-trip message
A Cut at Fixing Both ProblemsA Cut at Fixing Both Problems
PACELCPACELC– In the case of a partition (P), does the system In the case of a partition (P), does the system
choose availability (A) or consistency (C)?choose availability (A) or consistency (C)?– Else (E), does the system choose latency (L) Else (E), does the system choose latency (L)
or consistency (C)?or consistency (C)?
ExamplesExamples
PA/ELPA/EL– Dynamo, SimpleDB, Cassandra, Riptano, CouchDB, Dynamo, SimpleDB, Cassandra, Riptano, CouchDB,
CloudantCloudant
PC/ECPC/EC– ACID compliant database systemsACID compliant database systems
PA/ECPA/EC– GenieDBGenieDB– See CIDR paper from Wada, Fekete, et. al.See CIDR paper from Wada, Fekete, et. al.
Indicates that Google App Engine data store (eventual Indicates that Google App Engine data store (eventual consistent option) falls under this categoryconsistent option) falls under this category
PC/EL: Existence is debatable, see next slidePC/EL: Existence is debatable, see next slide
Does PC/EL exist?Does PC/EL exist?
Strengthening consistency when there is a partition doesn’t Strengthening consistency when there is a partition doesn’t seem to make senseseem to make senseI think about it the following wayI think about it the following way– Some EL systems have stronger consistency guarantees than Some EL systems have stronger consistency guarantees than
eventual consistencyeventual consistencyMonotonic reads consistencyMonotonic reads consistencyTimeline consistencyTimeline consistencyRead-your-writes consistencyRead-your-writes consistencyNon-atomic consistencyNon-atomic consistency
– CAP presents a problem for these guaranteesCAP presents a problem for these guarantees– Hence, partitioning causes some loss of availabilityHence, partitioning causes some loss of availability– Examples: PNUTs/Sherpa (original option), MongoDBExamples: PNUTs/Sherpa (original option), MongoDB
Dan Weinreb makes a case that this is too confusing: Dan Weinreb makes a case that this is too confusing: http://danweinreb.org/blog/improving-the-pacelc-taxonomy http://danweinreb.org/blog/improving-the-pacelc-taxonomy
A Case for P*/ECA Case for P*/EC
Increased push for horizontally scalable transactional database Increased push for horizontally scalable transactional database systemssystems– Reasons:Reasons:
Cloud computingCloud computingDistributed applicationsDistributed applicationsDesire to deploy applications on cheap, commodity hardwareDesire to deploy applications on cheap, commodity hardware
Vast majority of currently available horizontally scalable systems are Vast majority of currently available horizontally scalable systems are P*/ELP*/EL– Developed by engineers at Google, Facebook, Yahoo, Amazon, etc.Developed by engineers at Google, Facebook, Yahoo, Amazon, etc.– These engineers can handle reduced consistency, but it’s really hard, These engineers can handle reduced consistency, but it’s really hard,
and there needs to be an option for the rest of usand there needs to be an option for the rest of us
AlsoAlso– Distributed concurrency control and commit protocols are expensiveDistributed concurrency control and commit protocols are expensive– Once consistency is gone, atomicity usually goes next ---> NoSQLOnce consistency is gone, atomicity usually goes next ---> NoSQL
Key Problems to OvercomeKey Problems to Overcome
High availability is critical, replication must be a first class High availability is critical, replication must be a first class citizencitizen– Today’s systems generally act, then replicateToday’s systems generally act, then replicate
Complicates semantics of sending read queries to replicasComplicates semantics of sending read queries to replicas
Need confirmation from replica before commit (increased latency) if Need confirmation from replica before commit (increased latency) if you want durability and high availabilityyou want durability and high availability
In progress transactions must be aborted upon a master failureIn progress transactions must be aborted upon a master failure
– Want system that replicates then actsWant system that replicates then acts
Distributed concurrency control and commit are Distributed concurrency control and commit are expensive, want to get rid of them bothexpensive, want to get rid of them both
Key IdeaKey Idea
Instead of weakening ACID, strengthen it!Instead of weakening ACID, strengthen it!– Guaranteeing equivalence to SOME serial Guaranteeing equivalence to SOME serial
order makes active replication difficultorder makes active replication difficultRunning the same set of xacts on two different Running the same set of xacts on two different replicas might cause replicas to divergereplicas might cause replicas to diverge
Disallow any nondeterministic behaviorDisallow any nondeterministic behaviorDisallow aborts caused by DBMSDisallow aborts caused by DBMS– Disallow deadlockDisallow deadlock– Distributed commit much easier if you don’t Distributed commit much easier if you don’t
have to worry about abortshave to worry about aborts
Consequences of DeterminismConsequences of Determinism
Replicas produce the same output, given the same input,Replicas produce the same output, given the same input,– Facilitates active replicationFacilitates active replication
Only initial input needs to be logged, state at failure can Only initial input needs to be logged, state at failure can be reconstructed from this input log (or from a replica)be reconstructed from this input log (or from a replica)Active distributed xacts not aborted upon node failureActive distributed xacts not aborted upon node failure– Greatly reduces (or eliminates) cost of distributed commitGreatly reduces (or eliminates) cost of distributed commit
Don’t have to worry about nodes failing during commit protocolDon’t have to worry about nodes failing during commit protocolDon’t have to worry about affects of transaction making it to disk Don’t have to worry about affects of transaction making it to disk before promising to commit transactionbefore promising to commit transactionJust need one message from any node that potentially can Just need one message from any node that potentially can deterministically abort the xactdeterministically abort the xactThis message can be sent in the middle of the xact, as soon as it This message can be sent in the middle of the xact, as soon as it knows it will commitknows it will commit
So how can this be done?So how can this be done?
Serialize transactionsSerialize transactions– SlowSlow
Partition data across cores, serialize transactions in each partitionPartition data across cores, serialize transactions in each partition– My work with Stonebraker, Madden, and Harizopoulos shows that this My work with Stonebraker, Madden, and Harizopoulos shows that this
works well if workload is partitionable (H-Store project)works well if workload is partitionable (H-Store project)– H-Store project became VoltDBH-Store project became VoltDB
What about non-partitionable workloads?What about non-partitionable workloads?– See Thomson and Abadi VLDB 2010 paper: “The case for determinism See Thomson and Abadi VLDB 2010 paper: “The case for determinism
in database systems”in database systems”– db.cs.yale.edu/determinism-vldb10.pdfdb.cs.yale.edu/determinism-vldb10.pdf
Bottom line:Bottom line:– Cost of consistency is less than people thinkCost of consistency is less than people think– Latency is still an issue over a WAN unless workload is geographically Latency is still an issue over a WAN unless workload is geographically
partitionablepartitionable– VLDB 2010 paper makes P*/EC systems more viableVLDB 2010 paper makes P*/EC systems more viable
top related