nosql databases, the cap theorem, and the theory of relativity

69
NoSQL, CAP, and relativity 2013-09-18 Lars Marius Garshol, [email protected], http://twitter.com/larsga 1

Upload: lars-marius-garshol

Post on 08-Sep-2014

10.043 views

Category:

Technology


2 download

DESCRIPTION

A presentation showing how the CAP theorem causes NoSQL databases to have BASE semantics. That is, they don't support ACID consistency. Then shows how CAP is related to Einstein's theory of relativity. And finally shows how Google Spanner and F1 provide ACID that scales.

TRANSCRIPT

Page 1: NoSQL databases, the CAP theorem, and the theory of relativity

1

NoSQL, CAP, and relativity2013-09-18Lars Marius Garshol, [email protected], http://twitter.com/larsga

Page 2: NoSQL databases, the CAP theorem, and the theory of relativity

2

Agenda

• CAP theorem mostly• A bit about NoSQL databases• Are triple stores NoSQL?• Connection with Einstein’s theory

of relativity• And, finally, a surprise

Page 3: NoSQL databases, the CAP theorem, and the theory of relativity

3

NoSQL databases

Page 4: NoSQL databases, the CAP theorem, and the theory of relativity

4

What makes a NoSQL database?• Doesn’t use SQL as query

language– usually more primitive query language– sometimes key/value only

• BASE rather than ACID– that is, sacrifices consistency for

availability– much more about this later

• Schemaless– that is, data need not conform to a

predefined schema

Page 5: NoSQL databases, the CAP theorem, and the theory of relativity

5

BASE vs ACID

• ACID– Atomicity– Consistency– Isolation– Durability

• BASE– Basically Available– Soft-state– Eventual consistency

Page 6: NoSQL databases, the CAP theorem, and the theory of relativity

6

Eventual consistency• A key property of non-ACID systems• Means

– if no further changes made,– eventually all nodes will be consistent

• In itself eventual consistency is a very weak guarantee– when is “eventually”? it doesn’t say– in practice it means the system can be inconsistent

at any time• Stronger guarantees are sometimes made

– with prediction and measuring, actual behaviour can be quantified

– in practice, systems often appear strongly consistent

Page 7: NoSQL databases, the CAP theorem, and the theory of relativity

7

Implementing ev. consistency• Nodes must exchange information

about writes– basically, after performing a write, node must

inform all other replicas of the changed objects– signal OK to reader before or during replication– for example, by broadcast to all nodes

• Must have some way to deal with conflicts– all nodes must agree on conflict resolution– common solution: embed clock value in write

message, then let last writer win– clock need not be in sync for all nodes

Page 8: NoSQL databases, the CAP theorem, and the theory of relativity

8

What’s wrong with ACID?

• Semantics are easier for developers– applications can lean back and just trust

the db• However, doesn’t scale as well– that is, doesn’t scale with number of

nodes– requires too much communication and

agreement between nodes• Bigger web sites have therefore

gone BASE– Facebook, Flickr, Twitter, ...

Page 9: NoSQL databases, the CAP theorem, and the theory of relativity

9

Other benefits of NoSQL

• Schemaless– possible to write much more flexible code– schema evolution vastly easier

• Avoid joins– document databases allow hierarchical

non-normalized objects to be retrieved directly

Page 10: NoSQL databases, the CAP theorem, and the theory of relativity

10

Downsides to NoSQL• Everyone knows SQL

– few people know your specific NoSQL database• Lack of validation

– code will typically do anything the database lets it get away with (especially over time)

• No standards– you can’t easily switch databases– (well, except with SPARQL)

• Lack of maturity– lack of supporting tools, unpleasant

surprises, ...• Weak query languages

– means you have to do more in code– may hurt performance

Page 11: NoSQL databases, the CAP theorem, and the theory of relativity

11

Triple stores (SPARQL)

• Non-SQL? Yes• BASE? No• Schemaless? Yes

Only two out of three, so whethertriple stores are NoSQL databasesis debatable.

At the very least, they differ substantiallyfrom the core examples of NoSQL databases.

Page 12: NoSQL databases, the CAP theorem, and the theory of relativity

12

Can triple stores be BASE?

• In theory, yes– nothing inherent in graph structure that

prevents it• But,– how do you shard graph data?– no known way to do it that’s efficient

• So, in practice this is hard

Page 13: NoSQL databases, the CAP theorem, and the theory of relativity

13

When should you use NoSQL?• If scalability is a concern– however, don’t forget that sites like Flickr

and Wikipedia used RDBMSs for years– relational databases scale a long way

• If schemalessness is important– sometimes it really is– seriously consider RDF for this use case

• If fashion is a concern– for a surprising number of people, using

something new and shiny is the main thing

Page 14: NoSQL databases, the CAP theorem, and the theory of relativity

14

The CAP Theorem

Page 15: NoSQL databases, the CAP theorem, and the theory of relativity

15

CAP

• Consistency– all nodes always give the same answer

• Availability– nodes always answer queries and accept

updates• Partition-tolerance– system continues working even if one or

more nodes go quietCAP Theorem: You can only have two of these.Partition tolerance: Without this, the cluster dies the moment one node goes silent. Can’t really drop this one.

Page 16: NoSQL databases, the CAP theorem, and the theory of relativity

16

C ≠ C

• C in ACID– means all data obeys constraints in

schema• C in CAP– means all servers agree on the data

• However,– ACID implementations also follow the C

in CAP

Page 17: NoSQL databases, the CAP theorem, and the theory of relativity

17

History

• First formulated by Eric Brewer in 2000– based on experience with Inktomi search

engine– described the SQL/NoSQL divide very

well– coined the BASE acronym

• Formalized and proven in 2002– by Seth Gilbert and Nancy Lynch

• Today CAP is better understood– widely considered a key tradeoff in

designing distributed systems– and particularly databases– in some ways gave rise to NoSQL

databases

Page 18: NoSQL databases, the CAP theorem, and the theory of relativity

18

Consistent or Available?

request request request

Page 19: NoSQL databases, the CAP theorem, and the theory of relativity

19

Consistency

DB node 1

DB node 2

Client 1

Client 2

read account Xbalance -> 100

set account X balance = 0 set account X

balance = 0set account X balance = 0

set account X balance = 0

Page 20: NoSQL databases, the CAP theorem, and the theory of relativity

20

Availability

DB node 1

DB node 2

Client 1

Client 2

read account Xbalance -> 100

set account X balance = 0 set account X

balance = 0

set account X balance = 0

read account Xbalance -> 100

set account X balance = 0

Happy customer walksaway, richer by 200.

Servers eventually agreebalance is 0.

Page 21: NoSQL databases, the CAP theorem, and the theory of relativity

21

What exactly is the problem?

• The ordering of events affects the outcome

• The different nodes do not necessarily observe the same order

• However, it is possible to impose a consistent ordering of events

• The trouble is, that involves communication with all nodes

• Waiting for all of them takes time

Page 22: NoSQL databases, the CAP theorem, and the theory of relativity

22

Math digression

Page 23: NoSQL databases, the CAP theorem, and the theory of relativity

23

Time, Clocks and the Ordering of Events in a Distributed SystemThe origin of this paper was a note titled The Maintenance of Duplicate Databases by Paul Johnson and Bob Thomas. I believe their note introduced the idea of using message time-stamps in a distributed algorithm. I happen to have a solid, visceral understanding of special relativity (see [5]). This enabled me to grasp immediately the essence of what they were trying to do. ... I realized that the essence of Johnson and Thomas's algorithm was the use of timestamps to provide a total ordering of events that was consistent with the causal order. ...

It didn't take me long to realize that an algorithm for totally ordering events could be used to implement any distributed system.

http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#time-clocks

Page 24: NoSQL databases, the CAP theorem, and the theory of relativity

24

Order theory

• An ordering relation is any relation ≤ such that– a ≤ a (reflexivity)– if a ≤ b and b ≤ a then a = b

(antisymmetry)– if a ≤ b and b ≤ c then a ≤ c (transitivity)

• A total order is an order such that– a ≤ b or b ≤ a (totality)

• A partial order is any order which is not total– that is, for some pairs a and b, neither a ≤

b nor b ≤ a

Page 25: NoSQL databases, the CAP theorem, and the theory of relativity

25

Examples

• Total orders– normal ordering of numbers and letters

• Partial orders– ordering sets by the subset relation (see

figure)– “In general, the ·order-relation· on

duration is a partial order since there is no determinate relationship between certain durations such as one month (P1M) and 30 days (P30D)” XML Schema, pt2

Page 26: NoSQL databases, the CAP theorem, and the theory of relativity

26

Relativity

Page 27: NoSQL databases, the CAP theorem, and the theory of relativity

27

History

• 1687– Isaac Newton publishes

Philosophiæ Naturalis Principia Mathematica

– physics begins– no changes over next two

centuries• 1905– Albert Einstein publishes special

relativity– abandons notions of fixed time

and space• 1916– Einstein’s general relativity– takes into account gravity– no changes since

Page 28: NoSQL databases, the CAP theorem, and the theory of relativity

28

...(we skip 270 slides)

Page 29: NoSQL databases, the CAP theorem, and the theory of relativity

29

The barn is too small

• Three people (M, F, and B) own a board (5m wide) and a barn (4m wide)

• The board doesn’t fit inside the barn!

• What to do?

4

Page 30: NoSQL databases, the CAP theorem, and the theory of relativity

30

Page 31: NoSQL databases, the CAP theorem, and the theory of relativity

31

We have a solution!

As seen by F & B: board is 4m long (relativistic shortening), barn continues to be 4m wide.

When the board is exactly inside the barn, F and B will close their doors simultaneously,and the problem will be solved.

(As seen by M: board is 5m long (at rest relative to him), barn shortened to 3.2m wide. Pay no attention to this.)

Page 32: NoSQL databases, the CAP theorem, and the theory of relativity

32

What they observeF and B• When the board is

just inside, both close their doors simultaneously

• Right after, the board crashes through back door

M• B shuts his door just

as the front of the board reaches him

• 0.6 seconds later, F closes his door

• Board crashes through back door

Page 33: NoSQL databases, the CAP theorem, and the theory of relativity

33

The key point• They don’t agree on the order of events!

– and this is not a paradox– it is in fact how the universe works

• Change the story slightly and the three people could have three different orders of events

• No total order of events exists on which all observers can agree– the ordering of events in the universe is a

partial order• What then of causality?

– if A causes B, but some people think B happened before A, then what?

Page 34: NoSQL databases, the CAP theorem, and the theory of relativity

34

Resolution

• A, B, and C are events• The cone is the “light cone”

from A– that is, the spread of light from A

• C is outside the cone– therefore A cannot influence C– observers may disagree on order

of A&C• B is inside the cone– therefore A can influence B– observers may not disagree on

order

Page 35: NoSQL databases, the CAP theorem, and the theory of relativity

35

...(we skip 532 slides)

Page 36: NoSQL databases, the CAP theorem, and the theory of relativity

36

Back to CAP

Page 37: NoSQL databases, the CAP theorem, and the theory of relativity

37

Relevance to the CAP theorem• Distributed nodes will never agree

on the order of events– unless a communication delay is

introduced• Basically, only events inside the

“light cone” can be totally ordered– communications delay can never be less

than time taken by light to traverse physical distance

– in practice, it will be quite a bit bigger– how big depends on hardware and

design constraints

Page 38: NoSQL databases, the CAP theorem, and the theory of relativity

38

One solution: Paxos

• Protocol created by Leslie Lamport• Can be used to introduce logical clock– all nodes agree to always increase the

number• Which again can order events• Allowing all nodes to agree on order of

eventshttp://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple

Page 39: NoSQL databases, the CAP theorem, and the theory of relativity

39

Another solution: ev. consistency• Essentially, this solution says some

% of errors is acceptable– Amazon may have to compensate

purchasers with a gift card once every 100,000 transactions

– business value of remaining available is higher than cost of errors

• Even in banking this may apply– ATMs often continue working even if

contact with bank is lost– allow withdrawals up to some limit– accept it if customers overcharge– (they’ll have to pay fees and interest,

anyway)

Page 40: NoSQL databases, the CAP theorem, and the theory of relativity

40

CALM• Client code may have to compensate

for database inconsistencies– this can quickly become complex– complex means error-prone

• However, there is a way around that– CALM = consistency as logical monotonicity– means facts used by clients to make decisions

never change• A database that never deletes or

overwrites is CALM– for example because it logs trades or other

events– non-CALM databases may require ACID

Page 41: NoSQL databases, the CAP theorem, and the theory of relativity

41

A CALM example

• Client 1 reads A = 10• Client 1 uses this information to

write B = 5• If Client 2 now reads B = 5, that

client cannot read a value of A older than A = 10

• Doing so would violate what’s known as “causal consistency”

Page 42: NoSQL databases, the CAP theorem, and the theory of relativity

42

ACID 2.0

• A bit misleading, because it’s not really ACID at all

• Basically requires update operations to have these properties– associativity a + (b + c) = (a

+ b) + c– commutativity a + b = b + a– idempotence f(x) = f(f(x))– distributed

• One approach is to use datatypes which guarantee these properties

Page 43: NoSQL databases, the CAP theorem, and the theory of relativity

43

SDShare updates

• These are actually– associative– commutative– idempotent

• So SDShare is already ACID 2.0...

Page 44: NoSQL databases, the CAP theorem, and the theory of relativity

44

Another solution: datatypes

• CRDTs– commutative, replicated data types

• Basically,– data types designed so that the order of

operations doesn’t matter– the end result is always the same,

regardless of the order of operations

Page 45: NoSQL databases, the CAP theorem, and the theory of relativity

45

An example

• A problem with our cash withdrawal example is the overwrite operation

• What if the operation were “increment(account, -100)” instead?– this operation is associative and

commutative– (not inherently idempotent, however)

• Nodes can now apply incoming updates as they get them– the ordering of updates can be ignored– once all updates are applied, all nodes

will agree (which is eventual consistency)

Page 46: NoSQL databases, the CAP theorem, and the theory of relativity

46

Thus far, all is well, and everyone agrees

Page 47: NoSQL databases, the CAP theorem, and the theory of relativity

47

“To go wildly faster, one

must remove all four

sources of the overhead

discussed above. This is

possible in either a SQL

context or some other

context.”

Page 48: NoSQL databases, the CAP theorem, and the theory of relativity

48

What on earth is he talking about?

Everyone knows SQL and ACID don’t scale!

Page 49: NoSQL databases, the CAP theorem, and the theory of relativity

49

Meanwhile, at Google...

Page 50: NoSQL databases, the CAP theorem, and the theory of relativity

50

AdWords database difficulties

This backend was originally based on a MySQL database

that was manually sharded many ways. The uncompressed dataset is tens of terabytes, which is small compared to many NoSQL instances, but was large enough to cause difficulties with sharded MySQL. The MySQL sharding scheme assigned each customer and all related data to a fixed shard. This layout enabled the use of indexes and complex query processing on a per-customer basis, but required some knowledge of the sharding in

application business logic. Resharding this revenue-critical database as it grew in the number of customers and their data was extremely costly. The last resharding took

over two years of intense effort, and involved coordination and testing across dozens of teams to minimize risk.

Page 51: NoSQL databases, the CAP theorem, and the theory of relativity

51

More background

We store financial data and have hard requirements on data integrity and consistency. We also have a lot of experience with eventual consistency systems at Google. In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date. We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level.

Page 52: NoSQL databases, the CAP theorem, and the theory of relativity

52

Yet more background

At least 300 applications within Google use Megastore (despite its relatively low per- formance) because its data model is simpler to manage than Bigtable’s, and because of its support for synchronous replication across datacenters. (Bigtable only supports eventually-consistent replication across data- centers.) Examples of well-known Google applications that use Megastore are Gmail, Picasa, Calendar, Android Market, and AppEngine.

Page 53: NoSQL databases, the CAP theorem, and the theory of relativity

53

Requirements

• Scalability– scale simply by adding hardware– no manual sharding

• Availability– no downtime, for any reason

• Consistency– full ACID transactions

• Usability– full SQL with indexes

Uh, didn’t we just learnthat this is impossible?

Page 54: NoSQL databases, the CAP theorem, and the theory of relativity

54

Spanner

• Globally-distributed semi-relational db– SQL as query language– versioned data with non-locking read-

only transactions• Externally consistent reads/writes• Atomic schema updates– even while transactions are running

• High availability– experiment: killing 25 out of 125 servers

has no effect (except on throughput)

Page 55: NoSQL databases, the CAP theorem, and the theory of relativity

55

Transaction model

• Fairly close to traditional MVCC– every row has a timestamp– reads have associated timestamp, see

database as of that point in time• The key is a consistent order of

timestamps across nodes

Page 56: NoSQL databases, the CAP theorem, and the theory of relativity

56

Spanner architecture

Page 57: NoSQL databases, the CAP theorem, and the theory of relativity

57

Spanner architecture #2

Page 58: NoSQL databases, the CAP theorem, and the theory of relativity

58

TrueTime• Enables consistency in Spanner by giving

transactions timestamps– that is, imposes a consistent ordering on

transactions• Represents time with uncertainty interval

– the bigger the uncertainty, the more careful nodes must be

– bigger uncertainty leads to slower transactions• Uses two kinds of time servers to reduce

uncertainty– GPS-based servers– atomic clocks

Page 59: NoSQL databases, the CAP theorem, and the theory of relativity

59

Use of Paxos is key

• Combines Paxos with TrueTime to ensure timestamps are monotonically increasing

• Paxos requires majority votes to agree– implies less than half of data centers can

fail at any one time• AdWords therefore runs with 5 data

centers– allows two simultaneous failures without

effect– three on East Coast, two on West Coast– (in Google East Coast + West Goast =

globally)

Page 60: NoSQL databases, the CAP theorem, and the theory of relativity

60

Data model

Page 61: NoSQL databases, the CAP theorem, and the theory of relativity

61

F1 – the next layer up

• Builds on Spanner, adds– distributed SQL queries– including joins from external sources– transactionally consistent indexes– asynchronous schema changes– optimistic transactions– automatic change history

Page 62: NoSQL databases, the CAP theorem, and the theory of relativity

62

Why built-in change history?“Many database users build mechanisms to log changes, either from

application code or using database features like triggers. In the MySQL

system that AdWords used before F1, our Java application libraries

added change history records into all transactions. This was nice, but it

was inefficient and never 100% reliable. Some classes of changes would

not get history records, including changes written from Python

scripts and manual SQL data changes.”

Application code is not enough to enforce business rules,because many important changes are made behind theapplication code. For example, data conversion.

Look at any database that’s a few years old, and you’llfind data disallowed by the application code, but allowedby the schema.

Page 63: NoSQL databases, the CAP theorem, and the theory of relativity

63

Distributed queries

Page 64: NoSQL databases, the CAP theorem, and the theory of relativity

64

Two interfaces

• NoSQL interface– basically a simple key->row lookup– simpler in code for object lookup– faster because no SQL parsing

• Full SQL interface– good for analytics and more complex

interactions

Page 65: NoSQL databases, the CAP theorem, and the theory of relativity

65

Status

• >100 terabyte of uncompressed data– distributed across 5 data centers– Five nines (99.999%) uptime

• Serves up to hundreds of thousands of requests/second

• SQL queries scan trillions of rows/day

• No observable increase of latency compared to MySQL-based backend– but change tracking and sharding now

invisible to application

Page 66: NoSQL databases, the CAP theorem, and the theory of relativity

66

Winding up

Page 67: NoSQL databases, the CAP theorem, and the theory of relativity

67

Conclusion

• NoSQL is mostly about BASE– to some degree also schemalessness

• The CAP Theorem is key to understanding distributed systems– NoSQL is BASE because of CAP

• The CAP Theorem is a consequence of the theory of relativity

• New systems seem to indicate that ACID may scale, after all– basically, the speed of light is greater

than we thought

Page 68: NoSQL databases, the CAP theorem, and the theory of relativity

68

Further reading• NoSQL eMag, InfoQ, pilot issue May 2013– http://www.infoq.com/minibooks/emag-NoSQL

• Brewer’s original presentation– http://www.cs.berkeley.edu/~brewer/cs262b-

2004/PODC-keynote.pdf• Proof by Lynch & Gilbert– http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-

SigAct.pdf• Why E=mc2?, Cox & Forshaw• Eventual Consistency Today: Limitations,

Extensions, and Beyond, ACM Queue– http://queue.acm.org/detail.cfm?id=2462076

Page 69: NoSQL databases, the CAP theorem, and the theory of relativity

69

Further reading

• Spanner paper– http://research.google.com/archive/

spanner.html• F1 papers– http://research.google.com/pubs/

pub38125.html– http://research.google.com/pubs/

pub41376.html