scalability: rdbms vs other data stores

RDBMS vs. Other Data Stores for Scalability [email protected] TechTalk 2009, IIIT Hyderabad

Upload: ramki-gaddipati

Post on 11-May-2015




Scalability

• Increase Resources → Increase Performance (Linearly)

• Performance?
  – Latency, Capacity, Throughput

• Vertical Scalability (Scaling Up)
  – Divide the functionality

• Horizontal Scalability (Scaling Out)
  – Divide the data


Relational Database

• Table, Row, Column
• Set, Item, Property


Relational Theory

• Projection: SELECT
• Selection (Filter): WHERE
• Join: JOIN, LEFT JOIN, RIGHT JOIN
• Correlation:

SELECT A.a FROM A WHERE A.b IN (SELECT B.b FROM B WHERE B.a > A.a)
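A minimal sketch of the correlated query on this slide, run against an in-memory SQLite database. The sample rows are invented for illustration, and the column references are fully qualified so that the inner query clearly refers back to the current row of the outer table A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER, b INTEGER);
    CREATE TABLE B (a INTEGER, b INTEGER);
    -- Illustrative sample data, not from the talk.
    INSERT INTO A VALUES (1, 10), (5, 20), (9, 30);
    INSERT INTO B VALUES (3, 10), (4, 20), (2, 30);
""")

# The subquery is re-evaluated for each row of A because it mentions A.a.
rows = conn.execute("""
    SELECT A.a FROM A
    WHERE A.b IN (SELECT B.b FROM B WHERE B.a > A.a)
    ORDER BY A.a
""").fetchall()
result = [row[0] for row in rows]
print(result)
```

Because the inner query depends on the outer row, the database cannot simply compute it once. This per-row dependency is part of why correlated queries are hard to spread across machines.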


Relational Theory

• Aggregation
  – Set Operators
    • Union, Intersection, Minus
  – Group By
    • MAX, MIN, SUM, AVG
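The operators listed above can be tried out with a short sketch on an in-memory SQLite database. The table names and rows are hypothetical, and SQLite spells the Minus operator as EXCEPT (MINUS is the Oracle keyword).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and rows, for illustration only.
    CREATE TABLE online_orders (customer TEXT, amount INTEGER);
    CREATE TABLE store_orders  (customer TEXT, amount INTEGER);
    INSERT INTO online_orders VALUES ('ann', 10), ('bob', 20), ('ann', 30);
    INSERT INTO store_orders  VALUES ('bob', 5), ('cat', 15);
""")

def customers(op):
    """Combine the customer columns of both tables with a set operator."""
    sql = (f"SELECT customer FROM online_orders {op} "
           f"SELECT customer FROM store_orders ORDER BY customer")
    return [row[0] for row in conn.execute(sql)]

union = customers("UNION")
intersection = customers("INTERSECT")
minus = customers("EXCEPT")  # SQLite's name for Minus

# One output row per customer, with the aggregates computed per group.
totals = conn.execute("""
    SELECT customer, MAX(amount), MIN(amount), SUM(amount), AVG(amount)
    FROM online_orders GROUP BY customer ORDER BY customer
""").fetchall()
```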


Transactions: Atomicity

• Transaction Level
  – The entire logical operation is a transaction
  – Multiple statements

• Statement Level
  – Each statement is either successful or not, no partial success
  – Multiple records

• Record Level
  – All modifications to a record are successful or not
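Transaction-level atomicity can be sketched with Python's built-in sqlite3 module. The account table, the CHECK constraint, and the transfer helper are assumptions made up for this example; the point is only that a failure in the second statement also undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money as one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM account"))
```

Calling transfer("alice", "bob", 500) fails on the debit, and the earlier credit to bob is rolled back with it.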


Transactions: Consistency

• Integrity Constraints
• Referential Integrity


Transactions: Isolation Levels

• Serializable
  – A definite order of mutations/transactions is possible to arrive at state B from state A

• Repeatable Read
  – Any data read by a transaction will remain so until the transaction is complete

• Non-Repeatable Read, aka Read Committed
  – Two reads within a transaction may give different results

• Dirty Read, aka Read Uncommitted
  – A transaction might read data which might then be rolled back


RDBMS Luxuries

• Multiple Indexes
• Auto Increments/Sequences
• Triggers


Scalability in RDBMS

• Replication
  – Read Replication (Master-Slave)
  – Read Write Replication (Master-Master)

• Cluster
  – Distributed Transaction
  – Two-phase commits
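The two-phase commit mentioned above can be sketched as a toy, single-process simulation. The Participant class and two_phase_commit function are hypothetical names for this illustration; a real cluster would also need durable logs, timeouts, and recovery, which are left out here.

```python
class Participant:
    """One node in the cluster; stages a change and applies it only on commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that must vote "no"
        self.staged = None
        self.data = {}

    def prepare(self, key, value):
        # Phase 1: vote yes only if the change can be applied later.
        if self.can_commit:
            self.staged = (key, value)
        return self.can_commit

    def commit(self):
        # Phase 2 (all voted yes): make the staged change visible.
        key, value = self.staged
        self.data[key] = value
        self.staged = None

    def abort(self):
        # Phase 2 (someone voted no): discard the staged change.
        self.staged = None


def two_phase_commit(participants, key, value):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

Every write waits for a vote from every participant, so one slow or unreachable node stalls the whole transaction.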


Scalability Impediments

• Performance
  – Sub-Queries/Correlation, Joins, Aggregates
  – Referential Integrity constraints

• Basic Guarantee
  – Consistency
  – Availability


CAP?

• Conjecture: Distributed systems cannot ensure all three of the following properties at once
  – Consistency: The client perceives that a set of operations has occurred all at once.
  – Availability: Every operation must terminate in an intended response.
  – Partition tolerance: Operations will complete, even if individual components are unavailable.


ACID to BASE

• Basically Available - system seems to work all the time

• Soft State - it doesn't have to be consistent all the time

• Eventually Consistent - becomes consistent at some later time


BASE: An Example

BEGIN Transaction
  INSERT INTO ORDER (oid, timestamp, customer)
  FOREACH item IN itemList
    INSERT INTO ORDER_ITEM (oid, item.id, item.quantity, item.unitprice)
    //UPDATE INVENTORY SET quantity=quantity-item.quantity WHERE item = item.id
  COMMIT
END Transaction

Assume each statement is queued for execution. You will get COMMIT success.
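One possible reading of this example, sketched in Python: the order is acknowledged as soon as its inventory updates are queued, and the inventory catches up later. The QueuedStore class and its method names are hypothetical, and the queue is drained by hand here, where a real system would use a background worker.

```python
from collections import deque

class QueuedStore:
    """Toy store that acknowledges orders before inventory is updated."""

    def __init__(self, inventory):
        self.inventory = dict(inventory)
        self.orders = {}
        self.pending = deque()  # inventory updates not yet applied

    def place_order(self, oid, items):
        """Record the order, queue the inventory updates, return at once."""
        self.orders[oid] = list(items)
        for item_id, quantity in items:
            self.pending.append((item_id, quantity))
        return "COMMIT success"

    def drain(self):
        """Apply the queued updates; after this the state is consistent."""
        while self.pending:
            item_id, quantity = self.pending.popleft()
            self.inventory[item_id] -= quantity
```

Between place_order and drain the inventory is stale (soft state); once the queue is empty, the store is consistent again (eventual consistency).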


Alternate Implementations

• BigTable – Google – CP
• HBase – Apache – CP
• Hypertable – Community – CP
• Dynamo – Amazon – AP
• SimpleDB – Amazon – AP
• Voldemort – LinkedIn – AP
• Cassandra – Facebook – AP
• MemcacheDB – Community – CP/AP


Data Models

• Key/Value Pairs
  – Dynamo, MemcacheDB, Voldemort

• Row-Column
  – BigTable, Cassandra, SimpleDB, Hypertable, HBase
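The difference between the two data models can be sketched with plain Python dictionaries. This only shows the shape of the data and is not the API of any of the stores listed; the keys and values are made up for illustration.

```python
# Key/value model: the store sees one opaque value behind each key.
kv_store = {}
kv_store["user:42"] = b'{"name": "Ann", "city": "Hyderabad"}'

# Row-column model: row key -> column name -> value, so a single column
# can be read or written without touching the rest of the row.
row_store = {}
row_store.setdefault("com.cnn.www", {})["anchor:www.c-span.org"] = "CNN"
row_store.setdefault("com.cnn.www", {})["contents:"] = "<html>...</html>"

def get_column(row_key, column):
    """Return one column of one row, or None if either is missing."""
    return row_store.get(row_key, {}).get(column)
```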


Programming Models

// Open the table
Table *T = OpenOrDie("/bigtable/web/webtable");
// Write a new anchor and delete an old anchor
RowMutation r1(T, "com.cnn.www");
r1.Set("anchor:www.c-span.org", "CNN");
r1.Delete("anchor:www.abc.com");
Operation op;
Apply(&op, &r1);


BigTable: Consistent yet Infinitely Scalable

• Single Master
• B+ tree based data distribution
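The idea behind tree-based data distribution is that rows are kept sorted by key and split into contiguous key ranges, each served by one machine. A simplified, single-level sketch of the lookup follows; the range boundaries and server names are invented, and a real system would keep this index in a multi-level tree.

```python
import bisect

# Each entry is (end_key, server): the server holds row keys up to and
# including end_key. Entries are sorted by end_key. The last end key is a
# sentinel assumed to sort above every real row key.
TABLETS = [("g", "server-1"), ("p", "server-2"), ("\uffff", "server-3")]
END_KEYS = [end for end, _ in TABLETS]

def locate(row_key):
    """Return the server whose key range contains row_key."""
    index = bisect.bisect_left(END_KEYS, row_key)
    return TABLETS[index][1]
```

Because neighbouring keys land on the same server, range scans stay local, and a range that grows too large can be split without moving the others.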


BigTable: Transactions

• Entities and Entity Groups

[Diagram: Invoice, Invoice Item, Delivery Note]


Dynamo: Highly available and Infinitely Scalable

• Consistent Hashing
• Peer to Peer Distributed
• Gossip based member discovery
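Consistent hashing can be sketched as a ring of hashed node positions, where each key belongs to the first node found clockwise from the key's own hash. The HashRing class, the use of SHA-256, and the number of virtual points per node are assumptions for this illustration, not details taken from Dynamo.

```python
import bisect
import hashlib

def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out load
        self._ring = []           # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        # First ring point at or after the key's position, wrapping around.
        index = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

When a node leaves, only the keys it owned move to other nodes; every other key keeps its owner, which is what lets the cluster grow or shrink without reshuffling all the data.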


RDBMS or Other?

• Nature of Business
• Maturity of the Product
• Cost of Adoption
• Maturity of the alternative Datastores


Q&A