nosql overview
Presented at JavaOne 2013, Wednesday September 25.
NOSQL Overview
Tobias Lindaaker
Software Developer @ Neo Technology
twitter: @thobe / @neo4j / #neo4j
email: [email protected]
web: http://neo4j.org/
web: http://thobe.org/
CON6449
Agenda
๏Key/Value Stores
๏Document Databases
๏NewSQL Databases
๏Graph Databases
๏Column Oriented Databases
๏Caches
๏Message Queues
๏Hadoop
General
Two main categories
๏Aggregate oriented
๏Graph
Distinction defined by Martin Fowler
Source: NoSQL Distilled
Trend: Less uniformity
Sparse data - Relational mismatch
[Table: rows id 1337, 2468, 3145, 3579, 4468, 7878; columns α β γ δ ε ζ η θ ι κ λ μ π τ]
entity  key  value
1337    a    lorem ipsum
1337    b    lorem ipsum
3145    b    lorem ipsum
3578    a    lorem ipsum
3579    f    lorem ipsum
3579    j    lorem ipsum
4468    c    lorem ipsum
4468    f    lorem ipsum
7878    g    lorem ipsum
7878    f    lorem ipsum
Sparse data - Relational mismatch
Data Table
id    data
1337  {"foo":"bar", ...}
2468  {"foo":"bar", ...}
3145  {"foo":"bar", ...}
3579  {"foo":"bar", ...}
4468  {"foo":"bar", ...}
7878  {"foo":"bar", ...}
Search Tables
id    foo
1337  bar
2468  baz
3145  quux
3579  quux
4468  waldo
7878  fred
id    bar
1337  foo
2468  baz
3145  quux
3579  quux
4468  waldo
7878  fred
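The data table / search table layout can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (data, search_foo) and the helpers put and find_by_foo are hypothetical and do not come from the talk.

```python
import json
import sqlite3


def make_store():
    # One data table holding each entity as an opaque JSON blob,
    # plus one narrow search table for the field we want to query on.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, data TEXT)")
    db.execute("CREATE TABLE search_foo (id INTEGER, foo TEXT)")
    return db


def put(db, entity_id, fields):
    db.execute("INSERT OR REPLACE INTO data VALUES (?, ?)",
               (entity_id, json.dumps(fields)))
    # Keep the search table in sync; entities without "foo" get no row.
    db.execute("DELETE FROM search_foo WHERE id = ?", (entity_id,))
    if "foo" in fields:
        db.execute("INSERT INTO search_foo VALUES (?, ?)",
                   (entity_id, fields["foo"]))


def find_by_foo(db, value):
    # Look up ids in the search table, then fetch the blobs from the data table.
    rows = db.execute(
        "SELECT d.data FROM search_foo s JOIN data d ON d.id = s.id "
        "WHERE s.foo = ? ORDER BY d.id", (value,))
    return [json.loads(row[0]) for row in rows]


db = make_store()
put(db, 1337, {"foo": "bar", "a": "lorem ipsum"})
put(db, 3145, {"foo": "quux"})
put(db, 3579, {"foo": "quux", "j": "lorem ipsum"})
put(db, 4468, {"c": "lorem ipsum"})  # sparse: no "foo" at all
quux = find_by_foo(db, "quux")
```

Every new searchable field needs its own search table and its own synchronization code, which is the relational mismatch the slide title refers to.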
Trend: Exponential data growth
[Chart: x-axis years 2005 through 2012]
Trend: Data becomes more connected
[Chart: connectedness (y-axis) over time (x-axis)]
Nothing is new - everything changes
Then
๏Navigational databases: IDS (Codasyl), IMS (IBM)
๏Multivalued databases: PICK/BASIC
๏Key/Value databases: MUMPS/M
๏COPYBOOK: COBOL
๏Object databases: Objectivity, db4o
๏XML databases
Now
๏Graph databases: Neo4j
๏Column databases: Cassandra
๏Key/Value databases: Couchbase
๏Document databases: MongoDB, Redis
Still recent enough to not have “new” counterparts...
Key/Value stores
๏Amazon SimpleDB
๏memcached
๏Oracle NoSQL Database
๏Redis
[Diagram labels: A, B, C, D, E, F, G]
Sample use case: Content sharing
Document Databases
๏Lotus Notes
๏MongoDB
๏Riak
๏Redis
๏CouchDB
‣ id: 99CC
‣ fname: John
‣ lname: Smith
‣ clock:
  ‣ type: Fob watch
  ‣ make: Gallifreyan
  ‣ diameter: 2”
‣ id: 1337
‣ fname: Martha
‣ lname: Jones
‣ occupation: MD
‣ id: 2468
‣ fname: Rose
‣ lname: Tyler
‣ in_love_with: 99CC
post
  title: ___
  text: ___
  tags: [...]
  comments
    text: ___
    text: ___
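As a rough sketch, the three person documents can be written as plain Python dictionaries. Each document carries only the fields it needs, and one field (clock) nests a sub-document. The find helper and its dotted-path syntax are hypothetical, not the query API of any particular document database.

```python
people = [
    {"id": "99CC", "fname": "John", "lname": "Smith",
     "clock": {"type": "Fob watch", "make": "Gallifreyan", "diameter": '2"'}},
    {"id": "1337", "fname": "Martha", "lname": "Jones", "occupation": "MD"},
    {"id": "2468", "fname": "Rose", "lname": "Tyler", "in_love_with": "99CC"},
]


def find(collection, path, value):
    """Return the documents whose (possibly nested) field at `path` equals value."""
    hits = []
    for doc in collection:
        node = doc
        for key in path.split("."):
            # Documents that lack the field are skipped, not treated as errors.
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if node is not None and node == value:
            hits.append(doc)
    return hits


doctors = find(people, "occupation", "MD")
clock_owners = find(people, "clock.make", "Gallifreyan")
```

Note that in_love_with holds only the id of another document; following that reference is left to the application.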
The rise of REST for databases
๏It’s actually all about Hypermedia:
•When one aggregate root references another
•Not necessarily on the same host
•Hyperlinks provide the desired decoupling, and can reference documents qualified by host
๏HTTP and the ease of developing client drivers are a further driver
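A small sketch of one aggregate referencing another by hyperlink, using only the standard library. The hosts (posts.example.com, users.example.org), the href field name and the resolve helper are assumptions made up for this illustration.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical document and the URL it would be served from.
post_url = "http://posts.example.com/posts/42"
post = {
    "title": "___",
    # Absolute link: the referenced aggregate lives on another host.
    "author": {"href": "http://users.example.org/users/99CC"},
    # Host-relative link: resolved against the document's own URL.
    "comments": {"href": "/posts/42/comments"},
}


def resolve(base_url, link):
    """Turn a link found in a document into a URL a client can fetch."""
    return urljoin(base_url, link["href"])


author_url = resolve(post_url, post["author"])
comments_url = resolve(post_url, post["comments"])
author_host = urlsplit(author_url).netloc
```

The client never needs to know in advance which host stores the author; the link itself carries that information.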
NewSQL
NewSQL defined
๏Relational databases with (primarily) a SQL interface that adopt the scaling benefits of NoSQL databases.
๏Automatic/Transparent sharding of data
๏Distributed, Fault Tolerant, Highly Available
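To make "transparent sharding" concrete, here is a toy sketch in which one logical SQL table is spread over several in-memory SQLite databases. The ShardedTable class, the accounts schema and the hash-modulo placement are all assumptions for illustration; real NewSQL systems also handle replication, failover and distributed transactions, none of which is shown.

```python
import sqlite3
import zlib


class ShardedTable:
    """Hypothetical router: one SQL table spread over several SQLite databases."""

    def __init__(self, shard_count):
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for db in self.shards:
            db.execute(
                "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")

    def _shard_for(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def insert(self, key, balance):
        self._shard_for(key).execute(
            "INSERT INTO accounts VALUES (?, ?)", (key, balance))

    def get(self, key):
        row = self._shard_for(key).execute(
            "SELECT balance FROM accounts WHERE id = ?", (key,)).fetchone()
        return row[0] if row else None

    def total_balance(self):
        # Scatter-gather: run the same aggregate on every shard, then combine.
        return sum(
            db.execute("SELECT COALESCE(SUM(balance), 0) FROM accounts")
              .fetchone()[0]
            for db in self.shards)


table = ShardedTable(shard_count=3)
for name, balance in [("alice", 10), ("bob", 20), ("carol", 30), ("dave", 40)]:
    table.insert(name, balance)
```

Callers use insert, get and total_balance without knowing which shard holds which row, which is the "transparent" part.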
NewSQL databases
๏Google Spanner
๏VoltDB
๏TokuDB (MySQL engine)
๏Clustrix
๏RethinkDB
Graph Databases
Neo4j is a Graph Database
[Diagram: Neo4j -[:IS_A]-> Graph Database]
Example Graph Databases
๏Neo4j
๏InfiniteGraph (by Objectivity)
๏AllegroGraph (by Franz Inc.)
๏HyperGraphDB
๏InfoGrid
๏DEX
๏VertexDB
๏FlockDB
[Diagram: a graph built up step by step. Relationship types: from, stole, companion, married, enemy, plays, in. Node labels include "A Good Man Goes to War" and "Bad Wolf".]
Querying Graph Databases (Neo4j)
Graph Patterns
[Diagram: node A connected to node B by a LOVES relationship]
ASCII art
A -[:LOVES]-> B
START A=node:person(name="A")
MATCH A -[:LOVES]-> B
RETURN B as lover
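The pattern A -[:LOVES]-> B can be mimicked over an in-memory list of relationships. This is a sketch of what the pattern means, not of how Neo4j evaluates it; the triple representation, the sample data and the match helper are hypothetical.

```python
# Relationships as (start node, type, end node) triples.
relationships = [
    ("A", "LOVES", "B"),
    ("A", "KNOWS", "C"),
    ("C", "LOVES", "A"),
    ("A", "LOVES", "D"),
]


def match(rels, start, rel_type):
    """Return every node B such that  start -[:rel_type]-> B  holds."""
    return [end for (s, t, end) in rels if s == start and t == rel_type]


# Fix the start node, follow outgoing LOVES relationships,
# and return whatever sits at the other end.
lovers = match(relationships, "A", "LOVES")
```

The direction of the arrow matters: C loves A, but C is not returned as one of A's lovers.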
Column Oriented Databases
Column Store
Column Oriented Databases
๏Cassandra
๏BigTable (internal at Google)
๏HBase (part of Hadoop)
๏Hypertable
Column DB - Classic example
Twitter clone
Column Databases
๏Use as underlying storage for a higher level data storage model
๏E.g. a graph database model implemented on top of Cassandra
•Notable example: Aurelius Titan
Caches
Caches - Improving Reads
๏Read from cache first, only read from DB on cache miss
๏Preferably cache aggregates, possibly after passing through app-level processing
๏memcached - mainly a cache, has tried to reposition itself as a NOSQL DB
•as have other cache products
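The "read from cache first, only read from DB on cache miss" rule can be sketched as follows. A dictionary stands in for the cache and another for the database; the class and function names are hypothetical, and eviction and expiry are deliberately left out.

```python
class ReadThroughCache:
    """Serve reads from memory, loading from the database only on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        # Cache miss: fall back to the database and remember the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = value
        return value

    def invalidate(self, key):
        # Call after the underlying record changes.
        self._entries.pop(key, None)


# A dictionary stands in for the database; the cached value is a whole
# aggregate (a post with its comments), not a single row.
db = {"post:1": {"title": "___", "comments": ["___", "___"]}}
db_reads = []


def load_post(key):
    db_reads.append(key)
    return db[key]


cache = ReadThroughCache(load_post)
first = cache.get("post:1")
second = cache.get("post:1")
```

Only the first get reaches the database; the second is answered from memory.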
Message Queues
Message Queues - Improving Writes
๏Write to Queue, process work from Queue in batches
•Alleviates transactional overhead by grouping writes
•Still guarantees writes if the Queue has durability guarantees
•Needs tx synchronization with DB (2PC)
๏Writes not immediately visible, delayed through queue
•Write-to-cache can be used to get around this, if a cache is used
๏Amazon SQS
๏RabbitMQ
๏ZeroMQ
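A minimal sketch of grouping writes through a queue, using Python's queue module and an in-memory SQLite database. The events table and the helper names are hypothetical. The in-process queue used here is not durable and there is no two-phase commit, so this illustrates only the batching and the delayed visibility, not the durability guarantees mentioned above.

```python
import queue
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
pending = queue.Queue()


def enqueue_write(payload):
    # Writers return immediately; nothing touches the database yet.
    pending.put(payload)


def drain(batch_size):
    """Move up to batch_size queued writes into the DB in one transaction."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    if batch:
        with db:  # one commit for the whole batch, rollback on error
            db.executemany("INSERT INTO events (payload) VALUES (?)",
                           [(payload,) for payload in batch])
    return len(batch)


def count_events():
    return db.execute("SELECT COUNT(*) FROM events").fetchone()[0]


for i in range(5):
    enqueue_write(f"event-{i}")
visible_before = count_events()  # queued writes are not visible yet
written = drain(batch_size=3)
visible_after = count_events()
```

If the insert fails after items were taken off the queue, those items are lost in this sketch; that gap is what the transaction synchronization between queue and database has to close.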
Hadoop - Big Data processing
[Diagram labels: Oracle, Neo4j, Cassandra, MapReduce]
Hadoop - Data Analysis/Processing
๏Batch process large amounts of data, typically offline or semi-online, not for interactive querying
๏Ingest data from your DB, process and generate report
•Ex. Read Neo4j graph, generate centrality analysis report
๏Ingest data from event stream, process and generate data for DB
•Ex. Read access logs, create Neo4j data for security analysis
๏Ingest data from one DB, process and generate data for another
•Ex. Read MySQL transaction logs, create Neo4j data for query acceleration
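The access-log example can be sketched as a map step and a reduce step in plain Python. This is not the Hadoop API: the log format, the function names and the single-process run_job driver are assumptions that only show the shape of the computation.

```python
from collections import defaultdict

# Hypothetical access log: "<user> <method> <path>" per line.
access_log = [
    "alice GET /admin",
    "bob GET /index",
    "alice GET /admin",
    "mallory GET /admin",
]


def map_line(line):
    # Emit one (key, value) pair per log line.
    user, _method, path = line.split()
    yield (user, path), 1


def reduce_counts(key, values):
    # Combine all values that share a key.
    return key, sum(values)


def run_job(lines):
    groups = defaultdict(list)  # the "shuffle" step: group values by key
    for line in lines:
        for key, value in map_line(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


# Each result could become a weighted (user)-[accessed]->(path) relationship.
edges = run_job(access_log)
```

In a real cluster the map and reduce functions run on many machines and the grouping happens over the network; the functions themselves keep this shape.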
More DB history
Building Databases is hard
๏The current NOSQL wave took off in 2009
๏... many much older databases still have issues...
๏Most likely there will be issues
๏https://github.com/aphyr/jepsen (by Kyle Kingsbury / @aphyr)
• ... most distributed databases fail in the event of Partitions
๏Test, Test, Test, and Test
•Test the database heavily before you put it in production
•Test for your use cases - generic benchmarks are useless
•Test with real load
•Test continuously
Serious Database Vendors take Data Seriously
๏Make sure to test their product under “real” load
๏Make sure to test their product in the event of failures
๏But you still need to Test!
๏Report issues to the vendor
๏Data loss is too embarrassing - will be fixed!
๏Performance is important - you’ll be heard!
Polyglot Persistence: combining multiple databases
Polyglot Persistence - Multiple DBs
๏Real world examples:
•RDBMS as system of record, Neo4j for accelerating (join) queries
•Neo4j for storing metadata and structure, Cassandra for storing event logs, S3 for storing BLOB data
Conclusion
It is all about modelling
Simplify the world enough
‣ to reason about
‣ to store and process
Model mis-match
[Diagram labels: Real World, Model]
Complex problem? - right tool for each job!
Image credits: Unknown :’(
Key/Value stores
๏Examples:
•Amazon SimpleDB, memcached, Oracle NoSQL, Redis
๏Use when data is opaque
๏Scalability is important
๏Scale simply with the addition of more servers
•rebalance equally simply
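One common way to get simple scaling and rebalancing for opaque keys is consistent hashing. The talk does not name the technique, so treat this as an assumed illustration; the HashRing class, the server names and the replica count are hypothetical.

```python
import bisect
import hashlib


def _position(label):
    # Map any string to a stable position on the ring.
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server owns many points, which evens out the load.
        for i in range(self.replicas):
            bisect.insort(self._points, (_position(f"{server}#{i}"), server))

    def server_for(self, key):
        # First point clockwise from the key's position, wrapping around.
        index = bisect.bisect(self._points, (_position(key), ""))
        return self._points[index % len(self._points)][1]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {key: ring.server_for(key) for key in keys}
ring.add("D")
after = {key: ring.server_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

After server D joins, the only keys that change owner are the ones D takes over; everything else stays where it was, which is what keeps rebalancing cheap.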
Document Databases
๏Examples:
•MongoDB, Riak
๏Use when data is collections of similar entities
• But semi structured (sparse) rather than tabular
•When fields in entries have multiple values
Column Family Databases
๏Examples:
•Cassandra
๏Use when scalability is the main issue
•Both scaling size and scaling load
‣In particular scaling write load
๏Linear scalability (as you add servers) both in read and write
๏Low level - will require you to duplicate data to support queries
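The "duplicate data to support queries" point can be illustrated with the Twitter clone example. Nested dictionaries stand in for column-family rows (row key, then column name, then value); this is not Cassandra's API, and the follow, tweet and timeline helpers are hypothetical.

```python
from collections import defaultdict

followers = defaultdict(set)   # row key: user, columns: follower names
timelines = defaultdict(dict)  # row key: user, columns: tweet_id -> text


def follow(follower, followee):
    followers[followee].add(follower)


def tweet(author, tweet_id, text):
    # Fan-out on write: copy the tweet into the author's own row and into
    # the row of every current follower, so a read touches a single row.
    for reader in {author} | followers[author]:
        timelines[reader][tweet_id] = f"{author}: {text}"


def timeline(user, limit=10):
    row = timelines[user]
    # Columns are read newest first, assuming tweet ids increase over time.
    return [row[tweet_id] for tweet_id in sorted(row, reverse=True)[:limit]]


follow("rose", "john")
follow("martha", "john")
tweet("john", 1, "lorem")
tweet("rose", 2, "ipsum")
```

The same tweet is stored once per reader. That costs extra writes and storage, but the timeline query becomes a single-row read, which suits a store that scales writes well.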
Graph Databases
๏Examples:
•Neo4j, DEX, InfiniteGraph
๏Use when (deep) traversals are important
๏For complex domains
๏When how entities relate is an important aspect of the domain
When not to use a NOSQL Database
๏RDBMSes have been the de-facto standard for years, and still have better tools for some tasks
• Especially for reporting
๏When maintaining a system that works already
๏Sometimes when data is uniform / structured
๏When aggregations over (subsets of) the entire dataset are key
๏But please don’t use a Relational database for persisting objects
http://neotechnology.com
Questions?