Post on 12-Apr-2018
RDBMS vs NoSQL
Relational Database Management Systems versus Big Data Management (NoSQL) Systems
ARNAB BHATTACHARYA
arnabb@cse.iitk.ac.in
Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India
9th August, 2017
TEQIP Short Course on Big Data
Database
Concept of a database
A database is a collection of interrelated data
A database management system (DBMS) provides an environment that is efficient and convenient to use
Programs and interface to
    Store data
    Visualize data
    Access (query) data
    Manipulate data
Arnab Bhattacharya (arnabb@cse.iitk.ac.in) RDBMS vs NoSQL 09/08/17 2 / 24
RDBMS
Relational DBMS
Table-based
Relational algebra as mathematical background
Operators precisely defined
Operands are relations
Relations are sets of tuples
Tuples consist of named attributes
Concept of candidate keys to uniquely identify tuples
Queries across relations (joins) are natural
Procedures can be coded into RDBMS engine
Triggers and views are supported
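The relational ideas listed above (tuples with named attributes, keys, joins, views) can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the schema, table names and data are hypothetical and chosen purely for illustration.

```python
import sqlite3

# Hypothetical schema: table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        roll   INTEGER PRIMARY KEY,   -- candidate key chosen as primary key
        name   TEXT NOT NULL
    );
    CREATE TABLE enrolment (
        roll   INTEGER REFERENCES student(roll),
        course TEXT,
        PRIMARY KEY (roll, course)    -- composite key identifies each tuple
    );
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO enrolment VALUES (1, 'CS315'), (1, 'CS685'), (2, 'CS315');

    -- A view is a named query that can be used like a relation.
    CREATE VIEW course_roster AS
        SELECT e.course, s.name
        FROM enrolment e JOIN student s ON s.roll = e.roll;
""")

# Query across two relations (a join) through the view.
roster = conn.execute(
    "SELECT name FROM course_roster WHERE course = 'CS315' ORDER BY name"
).fetchall()

# The key constraint rejects a second tuple with the same roll number.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Here `roster` holds the names of students enrolled in the hypothetical course CS315, and `duplicate_rejected` records that the key constraint refused the repeated roll number.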
RDBMS
SQL
Structured Query Language
Formally defined programming language based on relational algebra
Declarative language
    RDBMS engine free to choose implementation of operations
    Decades of query optimization
    Indexing
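One way to see what "declarative" means in practice: the same SQL text can be executed with different strategies, and the engine picks one. The sketch below uses SQLite through Python's sqlite3 module and asks for the plan before and after an index is created. The table, the index name `item_name_idx` and the data are assumptions made for this illustration, and the exact plan wording depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO item (name, price) VALUES (?, ?)",
    [(f"item{i}", float(i)) for i in range(1000)],
)

# The query states WHAT is wanted, not HOW to find it.
query = "SELECT price FROM item WHERE name = ?"

def plan(sql, params):
    """Return the engine's chosen strategy as text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(query, ("item42",))      # typically a full table scan
conn.execute("CREATE INDEX item_name_idx ON item(name)")
plan_after = plan(query, ("item42",))       # typically a search using the index

# The query text is unchanged; only the engine's implementation differs.
result = conn.execute(query, ("item42",)).fetchone()
```

Printing `plan_before` and `plan_after` shows the change of strategy, while `result` is the same either way.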
RDBMS
Transactions
RDBMSs offer in-built transaction support
A transaction is a logical unit of a program
ACID properties to preserve data integrity
    Atomicity: either all operations are performed or none
    Consistency: database remains consistent before and after a transaction
    Isolation: one transaction has no effect on another even if they run concurrently
    Durability: effect of a transaction is permanent
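Atomicity is the easiest of the ACID properties to demonstrate. The sketch below, again using sqlite3, transfers money between two hypothetical accounts. The `account` table, the `CHECK` constraint and the `transfer` helper are assumptions for illustration, not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts as one transaction; return True on commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer("A", "B", 30)         # succeeds
failed = transfer("A", "B", 500)    # would make A negative, so it is rolled back
final = balances()
```

In the failed transfer the credit to B has already been applied when the debit from A violates the constraint. The rollback undoes the credit as well, so either both updates happen or neither does.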
RDBMS
Schedules
A schedule is a chronological sequence of instructions from concurrent transactions
    If a transaction appears in a schedule, all instructions of the transaction must appear in the schedule
    Order of instructions within a transaction must be maintained in the schedule
To increase concurrency
    Multiple transactions should be able to run simultaneously
Serializability ensures correctness
Recoverability ensures consistency despite failures
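The slide does not say how serializability is checked. One standard textbook test, swapped in here as an illustration, is conflict serializability: build a precedence graph between transactions and look for a cycle. The schedule encoding as `(transaction, operation, item)` triples is an assumption of this sketch.

```python
from itertools import combinations

def is_conflict_serializable(schedule):
    """schedule: list of (transaction, operation, item), e.g. ("T1", "R", "x")."""
    # Build the precedence graph: Ti -> Tj if an operation of Ti conflicts with
    # a later operation of Tj (same item, different transactions, one is a write).
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))

    # The schedule is conflict-serializable iff the graph has no cycle.
    # Repeatedly remove nodes that have no incoming edge from remaining nodes.
    nodes = {t for t, _, _ in schedule}
    while nodes:
        free = [
            n for n in nodes
            if not any(dst == n and src in nodes for src, dst in edges)
        ]
        if not free:
            return False      # every remaining node has a predecessor: a cycle
        nodes -= set(free)
    return True

# T1 finishes its work on x before T2 touches it: equivalent to T1 then T2.
good = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]

# T2 writes x between T1's read and T1's write: no serial order is equivalent.
bad = [("T1", "R", "x"), ("T2", "W", "x"), ("T1", "W", "x")]
```

Calling `is_conflict_serializable(good)` accepts the first schedule and `is_conflict_serializable(bad)` rejects the second, because the second produces edges in both directions between T1 and T2.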
RDBMS
Issues of RDBMS
Scalability of RDBMS is a problem
    It is at most vertical, i.e., across relations
    All tuples in a relation must stay on one machine
Distributed design is harder
    Indexing across distributed machines is not provided naturally
Hard to model complex data
    Hierarchical
    Spatio-temporal
    Graphs
    Semi-structured
Unnatural way to model data
NoSQL
NoSQL
NoSQL is not “no-SQL”
    It is not only SQL
    It does not aim to provide the ACID properties
    Originated as no-SQL though
    Later changed since RDBMS is too powerful to always ignore
Scalability is horizontal, i.e., can put tuples across distributed machines
Flexibility to model any kind of data
Natural way of modeling data
Distribution support is in-built
NoSQL
CAP theorem
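All of C (consistency), A (availability) and P (partition tolerance) cannot be satisfied simultaneously
    CA: single-site; partitioning is not allowed
    CP: what is available is consistent
    AP: everything is available but may not be consistent
Began as a conjecture (Brewer, 2000); a formal version was proved later (Gilbert and Lynch, 2002)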
NoSQL
BASE properties
Basically Available: System guarantees availability
Soft state: State of system is soft, i.e., it may change without input to maintain consistency
Eventual consistency: Data will eventually become consistent, provided no further updates arrive in the interim
Sacrifices consistency
Coined as a counterpoint to ACID
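A toy model may make soft state and eventual consistency more concrete. The sketch below assumes two replicas that resolve conflicts by last-write-wins on a timestamp and synchronise in the background. The `Replica` class and `anti_entropy` function are hypothetical names, and real systems use more careful reconciliation than this.

```python
class Replica:
    """A toy replica holding key -> (timestamp, value); all names are hypothetical."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Last-write-wins: keep the value with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def anti_entropy(a, b):
    """Background sync: exchange entries so both replicas converge."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)

r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], timestamp=1)          # accepted by r1 only
r2.write("cart", ["book", "pen"], timestamp=2)   # a later write lands on r2

stale = r1.read("cart")          # r1 is available but serves the old value
anti_entropy(r1, r2)             # state changes without any client input
converged = (r1.read("cart"), r2.read("cart"))
```

Before the sync, `r1` answers with stale data (basically available). After the sync, with no new writes in between, both replicas return the same value (eventually consistent).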
NoSQL
Types
Four main types of NoSQL data stores:
    1. Columnar families
    2. Bigtable systems
    3. Document databases
    4. Graph databases
http://nosql-database.org/
NoSQL
Columnar storage
Instead of rows being stored together, columns are stored consecutively
A single disk block (or a set of consecutive blocks) stores a single column family
    A column family may consist of one or multiple columns
    This set of columns is called a super column
Two main types
    Columnar relational models
    Key-value stores and/or big tables
NoSQL
Columnar relational models
Not NoSQL and is actually RDBMS
Column-wise storage on the disk
    Allows faster querying when only a few columns are touched across the entire data
    Allows compression of columns
    Provides better memory caching
    Joins are faster since they are mostly on similar columns from two tables
Not good for updates
Not good when many columns of a few tuples are accessed
Good for OLAP (online analytical processing)
Not good for OLTP (online transaction processing)
Example: MonetDB
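The trade-offs above follow from the storage layout. The sketch below models a column store with plain Python lists, which is an in-memory simplification rather than how MonetDB lays out disk blocks. The data and field names are made up, and run-length encoding is just one possible compression scheme.

```python
# Hypothetical sales data; field names are made up for illustration.
rows = [
    {"id": 1, "region": "north", "amount": 120.0},
    {"id": 2, "region": "north", "amount": 80.0},
    {"id": 3, "region": "south", "amount": 200.0},
]

# Column-wise layout: one contiguous list per attribute.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# OLAP-style query: touches a single column across the entire data.
total_amount = sum(columns["amount"])

def run_length_encode(values):
    """Compress runs of equal values, which are common within one column."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

compressed_region = run_length_encode(columns["region"])

# OLTP-style access: rebuilding one tuple must visit every column.
tuple_2 = {name: values[1] for name, values in columns.items()}
```

The aggregate reads only the `amount` list, whereas reassembling a single tuple has to touch all three lists, which is why point lookups and updates are the weak spot of this layout.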
NoSQL
Key-value stores
Two columns: a key and a value
    Key is mostly text
    Value can be anything and is simply an object
Essentially, actual data becomes “value” and a unique id is generated which becomes “key”
Whole database is then just one big table with these two columns
Becomes schema-less
Can be distributed and is, thus, highly scalable
In essence, a big distributed hash table
All queries are on keys
Keys are necessarily indexed
Example: Cassandra, CouchDB, HBase, Tokyo Cabinet, Redis
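The "big distributed hash table" view can be sketched in a few lines. In the toy below each node is just a local dictionary and a hash of the key selects the node. The `KeyValueStore` class is hypothetical and does not mirror the API of any system named above.

```python
import hashlib
import uuid

class KeyValueStore:
    """Toy distributed hash table: each 'node' is just a local dict."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # A stable hash of the key decides which node holds the pair.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, value, key=None):
        # The value is any object; a unique id is generated if no key is given.
        key = key or uuid.uuid4().hex
        self._node_for(key)[key] = value
        return key

    def get(self, key):
        # All queries are by key, so a lookup goes straight to one node.
        return self._node_for(key).get(key)

store = KeyValueStore(num_nodes=4)
k1 = store.put({"name": "Asha", "courses": ["CS315"]})   # schema-less values
k2 = store.put(b"\x00\x01raw bytes")
store.put(42, key="answer")
```

Hashing modulo the number of nodes is the simplest placement rule, but it moves most keys whenever a node is added. Production systems commonly use consistent hashing to limit that movement.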
NoSQL
Bigtable systems
Started from Google’s BigTable implementation
Uses a key-value store
Data can be replicated for better availability
Uses a timestamp
Timestamp is used to
    Expire data
    Delete stale data
    Resolve read-write conflicts
Same value can be indexed using multiple keys
Map-reduce framework to compute
Example: BigTable, HBase, Cassandra, HyperTable, SimpleDB
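The role of the timestamp can be illustrated with a small versioned map. This is a toy model under stated assumptions: the `VersionedTable` class, its `max_versions` parameter and the `info:city` column name are all invented for the sketch, and it does not reproduce the interface of BigTable or HBase.

```python
class VersionedTable:
    """Toy Bigtable-like map: (row, column) -> list of (timestamp, value)."""

    def __init__(self, max_versions=2):
        self.cells = {}
        self.max_versions = max_versions

    def put(self, row, column, value, timestamp):
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)   # newest first
        del versions[self.max_versions:]                    # delete stale versions

    def get(self, row, column, as_of=None):
        """Return the newest value, or the newest one not later than as_of."""
        for timestamp, value in self.cells.get((row, column), []):
            if as_of is None or timestamp <= as_of:
                return value
        return None

    def expire(self, older_than):
        """Drop every version whose timestamp is below the cutoff."""
        for key in list(self.cells):
            kept = [tv for tv in self.cells[key] if tv[0] >= older_than]
            if kept:
                self.cells[key] = kept
            else:
                del self.cells[key]

table = VersionedTable(max_versions=2)
table.put("user1", "info:city", "Kanpur", timestamp=10)
table.put("user1", "info:city", "Delhi", timestamp=20)
table.put("user1", "info:city", "Mumbai", timestamp=30)   # evicts timestamp 10

latest = table.get("user1", "info:city")
earlier = table.get("user1", "info:city", as_of=25)
table.expire(older_than=30)
after_expiry = table.get("user1", "info:city", as_of=25)
```

A reader that asks for the state "as of" a given time sees a stable version even while newer writes arrive, which is one simple way timestamps help with read-write conflicts.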
NoSQL
Document databases
Uses documents as the main storage format of data
    Popular document formats are XML, JSON, BSON, YAML
Document itself is the key while the content is the value
Document can be indexed by id or simply its location (e.g., URI)
Content needs to be parsed to make sense
Content can be organised further
Extremely useful for insert-once read-many scenarios
Can use map-reduce framework to compute
Example: MongoDB, CouchDB
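A small sketch of the document model and of a map-reduce computation over it, using only Python's json module. The document ids, fields and the tag-counting task are hypothetical, and the two functions imitate the map and reduce steps in a single process rather than using the MongoDB or CouchDB APIs.

```python
import json
from collections import defaultdict

# Documents are stored as JSON text and looked up by id (ids are hypothetical).
documents = {
    "doc1": json.dumps({"title": "RDBMS", "tags": ["sql", "acid"]}),
    "doc2": json.dumps({"title": "NoSQL", "tags": ["base", "scalable"],
                        "author": {"name": "A. Author"}}),
    "doc3": json.dumps({"title": "Column stores", "tags": ["olap", "scalable"]}),
}

def map_tags(doc_id, raw):
    """Map step: parse one document and emit (tag, 1) pairs."""
    doc = json.loads(raw)          # content must be parsed to make sense
    for tag in doc.get("tags", []):
        yield tag, 1

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

emitted = [pair for doc_id, raw in documents.items() for pair in map_tags(doc_id, raw)]
tag_counts = reduce_counts(emitted)

# Nested fields are optional: documents need not share a schema.
author = json.loads(documents["doc2"]).get("author", {}).get("name")
```

Only `doc2` carries an `author` field, yet all three documents live in the same collection, which is the modeling flexibility the slide refers to.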
NoSQL
Graph databases
Nodes represent entities or objects
Edges encode relationships between nodes
    Can be directed
    Can have hyper-edges as well
Easier to find distances and neighbors
Example: Neo4J, HyperGraph, Infinite Graph, Titan, FlockDB
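Why neighbors and distances are easy in a graph model can be seen with an adjacency list and a breadth-first search. The graph below is made up, and the functions are a plain-Python sketch rather than the query interface of any graph database listed above.

```python
from collections import deque

# Directed edges as adjacency lists; the node names are made up.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}

def neighbors(node):
    """Nodes reachable over one outgoing edge."""
    return graph.get(node, [])

def distance(source, target):
    """Number of edges on a shortest path, or None if target is unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None
```

For example, `distance("alice", "erin")` follows three edges, while `distance("erin", "alice")` finds no path because the edges are directed. In a relational design the same question needs one self-join per hop.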
NoSQL Systems
NoSQL systems
Three most popular ones are
    HBase
    Cassandra
    MongoDB
https://www.linkedin.com/pulse/real-comparison-nosql-databases-hbase-cassandra-mongodb-sahu
NoSQL Systems
HBase
Based on Hadoop, HDFS and BigTable
Key-value store
Column stores
Uses column families
Requires Zookeeper to maintain distributed coordination, configuration and maintenance
Centralized master that dictates slaves
Strong consistency
Versioning can be done
Scales by adding nodes
CP
NoSQL Systems
Cassandra
Column-store based on BigTable
Decentralized architecture
Replicated
Any node can perform any action
Strong security
Continuous availability
Extremely good single-tuple read performance
Not fully consistent
Requires quorum reads for consistency
AP
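The remark about quorum reads rests on a simple counting argument: with N replicas, if a write is acknowledged by W of them and a read consults R of them, then R + W > N forces the two sets to overlap, so the read sees at least one up-to-date copy. The toy below illustrates this with explicit replica lists. The `QuorumStore` class and its version numbers are assumptions of the sketch and not Cassandra's actual interface.

```python
class QuorumStore:
    """Toy store with N replicas; each holds key -> (version, value)."""

    def __init__(self, n):
        self.replicas = [{} for _ in range(n)]

    def write(self, key, value, version, replica_ids):
        # Only the listed replicas acknowledge the write (W = len(replica_ids)).
        for i in replica_ids:
            self.replicas[i][key] = (version, value)

    def read(self, key, replica_ids):
        # Ask R replicas and return the value carrying the highest version.
        answers = [
            self.replicas[i][key] for i in replica_ids if key in self.replicas[i]
        ]
        if not answers:
            return None
        return max(answers, key=lambda answer: answer[0])[1]

store = QuorumStore(n=3)
store.write("x", "old", version=1, replica_ids=[0, 1, 2])
store.write("x", "new", version=2, replica_ids=[0, 1])    # W = 2; replica 2 is stale

single_read = store.read("x", replica_ids=[2])            # R = 1: may be stale
quorum_read = store.read("x", replica_ids=[1, 2])         # R = 2: R + W > N
```

With N = 3 and W = 2, any two replicas chosen for a read include at least one that saw the latest write, whereas a single-replica read can land on the stale copy.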
NoSQL Systems
MongoDB
Document store
    Data stored in JSON or BSON format
Supports master-slave replication
Strong consistency
Good index support
Data modeling flexibility
CP
NoSQL Systems
Issues of NoSQL systems
No join support (unless columnar RDBMS)
Cannot work across tables
Requires unraveling of data values to answer deeper queries
No natural or direct procedural support
Consistency
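The first three issues can be sketched in a few lines. The example assumes a store without join support, so the application has to combine two collections itself and unravel nested values to answer the query. The collections, field names and the `pens_by_city` function are all hypothetical.

```python
# Hypothetical collections; names, fields and values are invented for illustration.
orders = [
    {"order_id": 1, "customer_id": 10,
     "items": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}]},
    {"order_id": 2, "customer_id": 11,
     "items": [{"sku": "pen", "qty": 5}]},
]
customers = [
    {"customer_id": 10, "city": "Kanpur"},
    {"customer_id": 11, "city": "Delhi"},
]


def pens_by_city(orders, customers):
    """Total quantity of pens ordered per city, joined in application code."""
    # Build a lookup table by hand: the store itself cannot work across collections.
    city_of = {c["customer_id"]: c["city"] for c in customers}
    totals = {}
    for order in orders:
        city = city_of.get(order["customer_id"])
        if city is None:
            continue  # order refers to an unknown customer
        # Unravel the nested array to reach the values the query needs.
        for item in order["items"]:
            if item["sku"] == "pen":
                totals[city] = totals.get(city, 0) + item["qty"]
    return totals


if __name__ == "__main__":
    print(pens_by_city(orders, customers))
```

In an RDBMS the same question is a single declarative query with a join and a GROUP BY, and the optimizer chooses the join strategy; here that work moves into the application.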
Discussion
Discussion
NoSQL, although it started as anti-SQL, is no longer so
More a realisation that, for some cases
  RDBMS does not scale or distribute, or
  ACIDity is an overkill
NoSQL is not good for every scenario
Consistency cannot always be sacrificed
Most legacy systems still use RDBMS
Many NoSQL systems are increasingly using features of RDBMS
NoSQL horizon is shifting rapidly with almost no control or sense
However, the trend is towards NoSQL, as cloud computing and big data rely on it
https://db-engines.com/en/system/Cassandra%3BHBase%3BMongoDB%3BPostgreSQL
Discussion
Conclusions
RDBMS is still going strong
NoSQL is catching up as a real choice and not just a buzzword
https://db-engines.com/en/
THANK YOU!
Questions? Answers!