scalable sql and nosql slides · • acid: atomicity, consistency, isolation, durability ......
Scalable SQL and NoSQL Data Stores
Rick Cattell
Presenter: MoHan Zhang
What is NoSQL?
NoSQL
• Stands for: Not Only SQL / Not Relational
• Features:
• Ability to scale to many servers
• Efficient use of distributed indexes & RAM for data storage
• Dynamically add new attributes to data records (dynamic schema)
• Weaker concurrency model than ACID transactions of most relational databases
ACID vs BASE
• ACID: Atomicity, Consistency, Isolation, Durability
• BASE: Basically Available, Soft State, Eventually Consistent
• Updates are eventually propagated, but with limited guarantees on read consistency
• Give up ACID constraints = Higher Performance and Scalability
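The limited read-consistency guarantee under BASE can be illustrated with a toy model. This is a minimal sketch, not any product's replication protocol; the class and method names (`EventuallyConsistentStore`, `read_replica`, `propagate`) are hypothetical.

```python
class EventuallyConsistentStore:
    """Toy primary/replica pair with asynchronous update propagation."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # updates accepted by the primary, not yet on the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # A replica read may return stale or missing data.
        return self.replica.get(key)

    def propagate(self):
        # Stand-in for background replication: apply queued updates in order.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("user:1", "Alice")
stale = store.read_replica("user:1")  # replica has not seen the write yet
store.propagate()
fresh = store.read_replica("user:1")  # replica has converged
```

Between `write` and `propagate`, a replica read misses the update; once propagation has run, both copies agree.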
Key Property: Shared Nothing Architecture
• Replicate and partition data over many servers
• Support a large number of simple read/write operations per second
The purpose of this paper is to survey a set of scalable SQL and NoSQL data stores under the following four categories:
• Key-value Stores
• Document Stores
• Extensible Record Stores
• Relational Databases
Key-value Stores
• Systems under this category store values and an index to find them, based on a programmer defined key
• Insert, Delete, Lookup Operations
• Scalability through key distributions over nodes
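The three operations and the idea of distributing keys over nodes can be sketched as follows. This is an illustrative toy, not the interface of any system in the survey; `PartitionedKVStore` and its methods are hypothetical names, and each "node" is simply a local dictionary.

```python
import hashlib


class PartitionedKVStore:
    """Toy key-value store that spreads keys over a fixed set of nodes."""

    def __init__(self, node_count):
        # Each "node" is just a local dict in this sketch.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key picks the node that owns it.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def insert(self, key, value):
        self._node_for(key)[key] = value

    def lookup(self, key):
        return self._node_for(key).get(key)

    def delete(self, key):
        self._node_for(key).pop(key, None)


kv = PartitionedKVStore(node_count=4)
kv.insert("session:42", {"user": "alice"})
found = kv.lookup("session:42")
kv.delete("session:42")
missing = kv.lookup("session:42")
```

Modulo placement is the simplest choice, but changing the node count remaps most keys, which is one reason several of the systems in this category use consistent hashing instead.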
Use Case:
• Simple application, one kind of object, only need to look up on one attribute
Project Voldemort
• Written in Java, open-source, supported by LinkedIn
• Multi-version Concurrency Control (MVCC) for updates
• No guarantee of consistent data
• Optimistic Locking
• Consistent Hashing
• Store data in RAM or in storage engines
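Consistent hashing, listed above, can be sketched as a ring of hash positions. This is a generic illustration of the technique and not Voldemort's implementation; `HashRing`, the node names, and the virtual-node count are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed at several positions to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When `node-d` joins, the only keys that change owner are the ones it takes over; keys never shuffle between the existing nodes.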
Riak
• Written in Erlang, open-source, with a RESTful client interface
• Objects can be fetched and stored in JSON
• Objects can have multiple fields (like documents)
• Only lookup is on Primary Key
• MVCC & Consistent Hashing
• Map/Reduce to split work over nodes in a cluster
• Unique Feature: Store links between objects
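The map/reduce idea of splitting work over nodes can be sketched in a few lines. This is a single-process illustration with made-up data and does not use Riak's actual map/reduce interface; `partitions`, `map_phase`, and `reduce_phase` are hypothetical names.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands for the objects held by one node.
partitions = [
    [{"lang": "erlang"}, {"lang": "java"}],
    [{"lang": "java"}, {"lang": "c"}],
    [{"lang": "java"}],
]


def map_phase(objects):
    # Runs independently on each node over its local objects.
    return Counter(obj["lang"] for obj in objects)


def reduce_phase(left, right):
    # Merges two partial results; Counter addition sums per-key counts.
    return left + right


partials = [map_phase(objects) for objects in partitions]
totals = reduce(reduce_phase, partials, Counter())
```

Only the small partial counts need to travel between nodes; the objects themselves stay where they are stored.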
Redis
• Written in C, Open-source
• Client side does the distributed hashing over servers, servers store data in RAM
• Updates by locking
• Asynchronous Replication
Membase
• Based on Memcached, a distributed in-memory indexing system
• Open-source
• Elastically add / remove servers in a running system
Other systems:
• Scalaris
• Tokyo Cabinet
| | Riak | Redis | Scalaris | Tokyo Cabinet | Membase | Voldemort |
|---|---|---|---|---|---|---|
| Data store | RAM or disk | RAM | RAM | RAM or disk | RAM | RAM or disk |
| Replication | Async | Async | Sync | Async | Sync | Async |
| Transactions | No | No | Yes | Yes | No | No |
| Updates | MVCC | Locking | Locking | Locking | Locking | MVCC |
Document Stores
• Systems under this category store documents. Documents are indexed and a query mechanism is provided.
• Secondary indexes and multiple types of objects per database
• No ACID Transactional Properties
Use Case:
• Multiple kinds of objects (e.g. Driver Licensing, with vehicles and drivers), need to look up on multiple attributes (driver_name, license_number, owned_vehicle, birthday)
• Need to tolerate eventual consistency
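The driver-licensing use case can be sketched with a toy in-memory store that keeps one secondary index per attribute. The class `DocumentStore`, its methods, and the sample records are hypothetical; real document stores expose their own query interfaces.

```python
from collections import defaultdict


class DocumentStore:
    """Toy in-memory document store with secondary indexes."""

    def __init__(self, indexed_fields):
        self.docs = {}
        # One index per field: field value -> set of document ids.
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, doc_id, doc):
        self.delete(doc_id)  # drop stale index entries on overwrite
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def delete(self, doc_id):
        doc = self.docs.pop(doc_id, None)
        if doc is None:
            return
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].discard(doc_id)

    def find(self, field, value):
        ids = self.indexes[field].get(value, set())
        return [self.docs[i] for i in sorted(ids)]


licensing = DocumentStore(["driver_name", "license_number"])
# Drivers and vehicles are different kinds of objects in the same store.
licensing.put("d1", {"type": "driver", "driver_name": "Ann Lee",
                     "license_number": "X123", "birthday": "1990-01-01"})
licensing.put("v1", {"type": "vehicle", "plate": "ABC-001",
                     "owner_license": "X123"})
by_name = licensing.find("driver_name", "Ann Lee")
by_license = licensing.find("license_number", "X123")
```

Documents need not share a schema: the vehicle record simply does not appear in the driver indexes.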
SimpleDB
• Pay-as-you-go service from Amazon
• Select, Delete, GetAttributes, PutAttributes
• Does not allow nested documents
• Eventual Consistency & Async replication
• More than one grouping in one database
• Multiple indexes
• No automatic data partitioning over servers
MongoDB
• Written in C++, GPL open-source
• Automatic sharding distributes documents over many servers
• Replication used for failover, not for scalability
• Data stored in BSON format (binary JSON)
• Master-slave replication with automatic failover and recovery
Other systems
• CouchDB
• Terrastore
Extensible Record Stores
• Systems under this category store extensible records that can be partitioned vertically and horizontally across nodes
• Motivated by Google’s BigTable, but none achieved the scalability of BigTable
Use Case:
• Multiple kinds of objects and a need to look up on multiple attributes, with higher throughput than Document Stores and stronger concurrency
• e.g. eBay application:
• Cluster users by country
• Separate rarely changed customer information in one place, and frequently updated information in another place for improvements in performance
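The two kinds of partitioning in this use case can be sketched together: rows are split by country (horizontal) and columns are split into a rarely changed group and a frequently updated group (vertical). The column names, group names, and sample users below are made up for illustration.

```python
# Column groups (vertical partitioning): which columns are stored together.
COLUMN_GROUPS = {
    "profile": {"name", "country"},           # rarely changed
    "activity": {"last_login", "bid_count"},  # frequently updated
}


def partition_record(user_id, record):
    """Split one record into (shard, column group, row fragment) pieces."""
    shard = record["country"]  # horizontal partitioning: cluster users by country
    pieces = []
    for group, columns in COLUMN_GROUPS.items():
        fragment = {c: v for c, v in record.items() if c in columns}
        pieces.append((shard, group, {user_id: fragment}))
    return pieces


users = {
    "u1": {"name": "Ann", "country": "US", "last_login": "2010-05-01", "bid_count": 7},
    "u2": {"name": "Bo", "country": "DE", "last_login": "2010-05-02", "bid_count": 3},
}

storage = {}  # (shard, column group) -> {user_id: fragment}
for user_id, record in users.items():
    for shard, group, rows in partition_record(user_id, record):
        storage.setdefault((shard, group), {}).update(rows)

# A frequent update touches only the "activity" group of one shard.
storage[("US", "activity")]["u1"]["bid_count"] += 1
```

Because the profile columns live in a separate group, the frequent `bid_count` update never rewrites them.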
HBase
• Written in Java, Apache project
• Uses the Hadoop distributed file system (HDFS); updates are held in memory and periodically written to disk
• Updates go to the end of data files
• B-trees allow fast range queries and sorting
• Optimistic Concurrency control
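The write path described above (buffer updates in memory, then write them out by appending) can be sketched as a toy. This is a rough illustration under simplifying assumptions, not HBase code: `BufferedStore`, `memtable`, and `segments` are hypothetical names, flushed batches are kept in a Python list rather than in files, and no B-tree index is modelled.

```python
class BufferedStore:
    """Toy write path: buffer updates in memory, flush them as append-only segments."""

    def __init__(self, flush_threshold=2):
        self.memtable = {}   # recent updates held in memory
        self.segments = []   # flushed batches, oldest first; never modified
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.memtable[row_key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Append the buffered updates as a new sorted segment; existing
        # segments are left untouched, so nothing is rewritten in place.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, row_key):
        if row_key in self.memtable:
            return self.memtable[row_key]
        # Newer segments shadow older ones.
        for segment in reversed(self.segments):
            for key, value in segment:
                if key == row_key:
                    return value
        return None


table = BufferedStore(flush_threshold=2)
table.put("row1", "a")
table.put("row2", "b")   # threshold reached: first segment is flushed
table.put("row1", "c")   # newer value, still in memory
table.flush()            # appended as a second segment
latest = table.get("row1")
```

The old value of `row1` still sits in the first segment; reads simply prefer the newest segment that contains the key.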
Hypertable
• Written in C++, Open-source, sponsored by Baidu
• Similar to BigTable and HBase
• Uses query language named HQL
Cassandra
• Written in Java, Open-source, basic features similar to HBase
• Used by Facebook and other companies
• Weaker Concurrency Model: No locking, Async replica updates
Scalable Relational Databases
• Pre-defined Schema, SQL interface, ACID transactions
• Penalize large-scope operations, whereas NoSQL systems forbid them
• Avoid cross-node operations to deliver scalability
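The difference between a small-scope and a large-scope operation can be sketched with two in-memory SQLite databases standing in for two nodes. This is an illustration only; the `orders` table, the modulo sharding rule, and the helper names are assumptions, and none of the systems below are implemented this way.

```python
import sqlite3

SHARD_COUNT = 2
# Each in-memory SQLite database stands in for one database node.
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")


def shard_for(customer_id):
    return shards[customer_id % SHARD_COUNT]


def add_order(customer_id, amount):
    # Single-shard write: only the node owning this customer is touched.
    shard_for(customer_id).execute(
        "INSERT INTO orders VALUES (?, ?)", (customer_id, amount)
    )


def customer_total(customer_id):
    # Small-scope query: answered by exactly one node.
    row = shard_for(customer_id).execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


def grand_total():
    # Large-scope query: must visit every node and combine the results.
    return sum(
        db.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]
        for db in shards
    )


for customer_id, amount in [(1, 10), (2, 25), (1, 5), (4, 40)]:
    add_order(customer_id, amount)
```

`customer_total` scales with the number of nodes because each call stays on one shard, while `grand_total` costs more with every node added.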
Use Case:
• Many tables across different kinds of data, need for a centralized schema, need for simplicity of SQL
• Database being updated from many locations
MySQL Cluster
• Shared nothing architecture: shards data over multiple database servers
• In-memory & Disk-based data
• Can scale to more nodes than other RDBMSs, but runs into bottlenecks after a few dozen nodes
VoltDB
• Open-source RDBMS, designed for scalability and per-node performance
• Tables partitioned over many servers
• Shards replicated for crash recovery
• Designed for databases that fit into the distributed RAM of the servers, so that the system never waits for the disk
• This and other optimizations boost single node performance
Clustrix
• Nodes sold as rack-mounted appliances
• Scalability to hundreds of nodes, automatic sharding & replication
• Automatic failover and failure recovery
• Seamlessly compatible with MySQL
Other systems
• ScaleDB
• ScaleBase
• NimbusDB
Conclusion
Some predictions from 2010
• Many developers are willing to abandon globally ACID transactions in order to gain scalability, availability, and other advantages
• The simplicity, flexibility, and scalability of NoSQL data stores fill a niche market
• Many of the data stores described today will not be enterprise-ready for a while
• One or two systems within each category will become the leaders
Relational > NoSQL?
• Relational can do everything NoSQL can, with analogous performance and scalability, adding in the convenience of SQL
• Relational DBMSs have been dominating the market for more than 30 years
• Relational DBMSs have been built to deal with other problems and they will have no problem dealing with scalability
NoSQL > Relational?
• No benchmarks show that relational systems can achieve the scalability of some NoSQL systems
• In NoSQL: only pay the learning curve for the complexity you require
• Relational DBMSs make expensive (multi-node, multi-table) operations too accessible; NoSQL systems make them impossible or visibly expensive to programmers
• While relational DBMSs have been successful, over the years there have been other products occupying niche markets
Thank you!
Q&A