NoSQL Database Technology – Why? And why now?
TRANSCRIPT
1
NoSQL Database Technology Why? And why now?
James Phillips, Co-founder and SVP Products
2
Changes in interactive software – the NoSQL driver
3
Modern interactive software architecture
Application scales out: just add more commodity web servers.
Database scales up: get a bigger, more complex server.
Note – Relational database technology is great for what it is great for, but it is not great for this.
4
Survey: Schema inflexibility is the #1 adoption driver

What is the biggest data management problem driving your use of NoSQL in the coming year?
  Lack of flexibility/rigid schemas: 49%
  Inability to scale out data: 35%
  High latency/low performance: 29%
  Costs: 16%
  All of these: 12%
  Other: 11%

Source: Couchbase NoSQL Survey, December 2011, n=1351
5
Extending the scope of RDBMS technology
• Data partitioning ("sharding")
  – Disruptive to reshard; impacts the application
  – No cross-shard joins
  – Schema management on every shard
• Denormalizing
  – Increases speed
  – At the limit, provides complete flexibility
  – Eliminates relational query benefits
• Distributed caching
  – Accelerates reads
  – Scales out
  – Another tier; no write acceleration; coherency management
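The resharding disruption above can be made concrete with a sketch. This is illustrative code, not any particular database's implementation: with naive modulo hashing, adding one shard remaps most keys, which forces a bulk migration the application has to orchestrate itself.

```python
# Sketch (not real database code): why manual sharding is disruptive.
# With modulo hashing, growing the shard count remaps most keys.

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with simple modulo hashing."""
    return sum(ord(c) for c in key) % num_shards

keys = [f"user:{i}" for i in range(1000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}     # one shard added

moved = sum(1 for k in keys if before[k] != after[k])
moved_fraction = moved / len(keys)             # most keys must migrate
```

Consistent hashing reduces this churn, which is one reason the auto-sharding systems described later avoid the naive scheme.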
6
Lacking market solutions, users were forced to invent
Bigtable: November 2006
Dynamo: October 2007
Cassandra: August 2008
Voldemort: February 2009
Very few organizations want to (and fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology.
• No schema required before inserting data
• No schema change required to change data format
• Auto-sharding without application participation
• Distributed queries
• Integrated main memory caching
• Data synchronization (mobile, multi-datacenter)
7
NoSQL database matches the application logic tier architecture. The data layer now scales with linear cost and constant performance.
Application scales out: just add more commodity web servers.
Database scales out: just add more commodity data servers.
Scaling out flattens the cost and performance curves.
NoSQL Database Servers
8
NOSQL TAXONOMY
9
The Key-Value Store – the foundation of NoSQL
[Diagram: a key mapped to an opaque binary value]
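The key-value model above can be sketched in a few lines. This is an illustrative toy, not any shipping product: the point is that the store sees only opaque bytes, so all it can do is get and set by key.

```python
# Minimal sketch of the key-value model: the store understands only
# get/set on an opaque value; it cannot interpret or query the bytes.

class KVStore:
    def __init__(self):
        self._data = {}                 # key -> opaque bytes

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)      # None if the key is absent

store = KVStore()
store.set("user:1001", b"\x01\x02\x03")  # the store sees only bytes
blob = store.get("user:1001")
```

Everything else in the taxonomy that follows (data structures, documents, columns, graphs) layers interpretation on top of this foundation.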
10
Memcached – the NoSQL precursor
[Diagram: a key mapped to an opaque binary value]
memcached
In-memory only. Limited set of operations:
  Blob storage: Set, Add, Replace, CAS
  Retrieval: Get
  Structured data: Append, Increment
"Simple and fast." Challenges: cold cache, disruptive elasticity.
11
Redis – More “Structured Data” commands
[Diagram: a key mapped to an opaque binary value]
"Data structures": blob, list, set, hash, …
redis
In-memory only. Vast set of operations:
  Blob storage: Set, Add, Replace, CAS
  Retrieval: Get, Pub-Sub
  Structured data: Strings, Hashes, Lists, Sets, Sorted lists

Example operations for a set: add, count, subtract sets, intersection, is member?, atomic move from one set to another.
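The set operations listed above can be emulated in plain Python to show what the server does on the application's behalf. This is a sketch using Python sets standing in for a Redis instance; the command names (SADD, SCARD, SISMEMBER, SINTER, SMOVE) are real Redis commands, but the bodies here are illustrative.

```python
# Sketch of Redis-style set commands, emulated with Python sets.

store = {}                              # key -> set, stand-in for Redis

def sadd(key, *members):                # SADD: add members
    store.setdefault(key, set()).update(members)

def scard(key):                         # SCARD: count members
    return len(store.get(key, set()))

def sismember(key, member):             # SISMEMBER: membership test
    return member in store.get(key, set())

def sinter(*keys):                      # SINTER: set intersection
    sets = [store.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()

def smove(src, dst, member):            # SMOVE: atomic move between sets
    if member in store.get(src, set()):
        store[src].discard(member)
        store.setdefault(dst, set()).add(member)
        return True
    return False

sadd("online", "alice", "bob")
sadd("paying", "bob", "carol")
overlap = sinter("online", "paying")    # who is both online and paying
```

Doing these operations server-side saves a round trip per member and keeps them atomic, which is what distinguishes a data-structure server from a plain blob cache.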
12
NoSQL catalog
  Categories: Key-Value | Data Structure | Document | Column | Graph
  Cache (memory only): memcached (key-value), redis (data structure)
13
Membase – From key-value cache to database

Disk-based with built-in memcached cache
Cache refill on restart
Memcached compatible (drop-in replacement)
Highly available (data replication)
Add or remove capacity to a live cluster
"Simple, fast, elastic."
[Diagram: membase, a key mapped to an opaque binary value]
14
NoSQL catalog
  Categories: Key-Value | Data Structure | Document | Column | Graph
  Cache (memory only): memcached (key-value), redis (data structure)
  Database (memory/disk): membase (key-value)
15
Couchbase – Document-oriented database
Key →
{
  "string" : "string",
  "string" : value,
  "string" : { "string" : "string", "string" : value },
  "string" : [ array ]
}

Auto-sharding
Disk-based with built-in memcached cache
Cache refill on restart
Memcached compatible (drop-in replacement)
Highly available (data replication)
Add or remove capacity to a live cluster
When values are JSON objects ("documents"): create indices and views, and query against the views
JSON OBJECT
(“DOCUMENT”)
Couchbase
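The "views" mentioned above are built by running a map function over every document and collecting the emitted (key, value) pairs into an index. The sketch below emulates that in Python; Couchbase itself defines view map functions in JavaScript, and the document ids and fields here are made up for illustration.

```python
# Sketch of a view index: a map function turns JSON documents into
# (key, value) index entries that can then be queried.

docs = {
    "farm:1": {"owner": "alice", "animals": ["black sheep", "cow"]},
    "farm:2": {"owner": "bob",   "animals": ["pig"]},
}

def map_fn(doc_id, doc):
    """Emit one index entry per animal in the farm document."""
    for animal in doc.get("animals", []):
        yield (animal, doc_id)

index = {}
for doc_id, doc in docs.items():
    for key, value in map_fn(doc_id, doc):
        index.setdefault(key, []).append(value)

# Query the view instead of scanning documents in the application.
black_sheep_farms = index.get("black sheep", [])
```

The crucial shift is that the question ("which farms have a black sheep?") is answered by the index, not by the application fetching every document.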
16
NoSQL catalog
  Categories: Key-Value | Data Structure | Document | Column | Graph
  Cache (memory only): memcached (key-value), redis (data structure)
  Database (memory/disk): membase (key-value), couchbase (document)
17
MongoDB – Document-oriented database

Key →
{
  "string" : "string",
  "string" : value,
  "string" : { "string" : "string", "string" : value },
  "string" : [ array ]
}

Disk-based with in-memory "caching"
BSON ("binary JSON") format and wire protocol
Master-slave replication
Auto-sharding
Values are BSON objects
Supports ad hoc queries; best when indexed
BSON OBJECT
(“DOCUMENT”)
MongoDB
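The "ad hoc queries, best when indexed" point can be sketched as follows. This is illustrative Python, not MongoDB's engine; a real MongoDB query would be expressed as something like `find({"level": {"$gte": 10}})`, and without an index it degenerates into the full collection scan shown here.

```python
# Sketch of an ad hoc, MongoDB-style query evaluated as a full scan.

players = [
    {"_id": 1, "name": "ann",  "level": 12},
    {"_id": 2, "name": "ben",  "level": 7},
    {"_id": 3, "name": "cara", "level": 15},
]

def find(collection, field, predicate):
    """Full scan; an index would avoid touching every document."""
    return [doc for doc in collection if predicate(doc.get(field))]

high_level = find(players, "level", lambda v: v is not None and v >= 10)
names = sorted(d["name"] for d in high_level)
```

The flexibility is that no schema had to declare `level` in advance; the cost is that performance depends on whether an index backs the queried field.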
18
NoSQL catalog
  Categories: Key-Value | Data Structure | Document | Column | Graph
  Cache (memory only): memcached (key-value), redis (data structure)
  Database (memory/disk): membase (key-value), mongoDB and couchbase (document)
19
Cassandra – Column overlays
Disk-based system
Clustered
External caching required for low-latency reads
"Columns" are overlaid on the data
Not all rows must have all columns
Supports efficient queries on columns
Restart required when adding columns
Good cross-datacenter support
[Diagram: a key mapped to an opaque binary value, with Column 1 and Column 2 overlaid; Column 3 not present]
20
NoSQL catalog
  Categories: Key-Value | Data Structure | Document | Column | Graph
  Cache (memory only): memcached (key-value), redis (data structure)
  Database (memory/disk): membase (key-value), mongoDB and couchbase (document), cassandra (column)
21
Neo4j – Graph database
Disk-based system
External caching required for low-latency reads
Nodes, relationships and paths
Properties on nodes
Delete, Insert, Traverse, etc.
Neo4j
[Diagram: several key/value nodes connected by relationships in a graph]
22
NoSQL catalog
  Categories: Key-Value | Data Structure | Document | Column | Graph
  Cache (memory only): memcached (key-value), redis (data structure)
  Database (memory/disk): membase (key-value), mongoDB and couchbase (document), cassandra (column), Neo4j (graph)
23
Document-oriented database advantages

Performance. The document data model keeps related data in a single physical location in memory and on disk (a document). This allows consistently low-latency access to the data: reads and writes happen with very little delay. Database latency can be perceived as "lag" by the player of a game, and avoiding it is a key success criterion.

Dynamic elasticity. Because the document approach keeps records "in one place" (a single document in a contiguous physical location), it is much easier to move the data from one server to another while maintaining consistency, and without requiring any game downtime. Moving data between servers is required to add and remove cluster capacity, cost-effectively matching the aggregate performance needs of the application to the performance capability of the database. Doing this at any time, without stopping the revenue flow of the game, can make a material difference in game profitability.

Schema flexibility. While all NoSQL databases provide schema flexibility, key-value and document-oriented databases enjoy the most. Column-oriented databases still require maintenance to add new columns and to group them. A key-value or document-oriented database requires no database maintenance to change the schema (to add or remove "fields" or data elements from a given record).

Query flexibility. Balancing schema flexibility with query expressiveness (the ability to ask the database questions, for example "return a list of all the farms in which a player purchased a black sheep last month") is important. While a key-value database is completely flexible, allowing a user to put any desired value in the "value" part of the key-value pair, it doesn't provide the ability to ask questions; it only permits accessing the data record associated with a given key. I can ask for the farm data for users A, B, C, … and check whether they have a black sheep, but I can't ask the database to do that work on my behalf.

Document databases provide the best balance: schema flexibility without giving up the ability to do sophisticated queries.
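The black-sheep example above contrasts two styles of answering a question. The sketch below shows the key-value style, where the application must fetch each record and filter client-side; the farm data is invented for illustration.

```python
# Sketch of the query-flexibility point: with a pure key-value store,
# the application fetches record by record and does the filtering itself.

farms = {
    "A": {"animals": ["black sheep", "cow"]},
    "B": {"animals": ["pig"]},
    "C": {"animals": ["black sheep"]},
}

def black_sheep_owners_kv(store, keys):
    """One lookup per key; the database does none of the work."""
    return [k for k in keys if "black sheep" in store[k]["animals"]]

owners = black_sheep_owners_kv(farms, ["A", "B", "C"])
```

A document database with views or indexes moves that loop inside the database, so the application asks one question instead of issuing one read per record.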
24
Big data and NoSQL – Hadoop and Couchbase
[Diagram, numbered steps 1-3: click stream events; profiles, campaigns; profiles, real-time campaign statistics. The system has 40 milliseconds to respond with the decision.]
25
COUCHBASE
26
Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements.
Couchbase Server
Simple. Fast. Elastic. NoSQL.
27
Representative user list
28
Typical Couchbase production environment

[Diagram: application users → load balancer → application servers → Couchbase servers]
29
Couchbase architecture
Data Manager (database operations):
  Membase EP Engine (built-in memcached)
  CouchDB storage interface

Cluster Manager (cluster management), built on Erlang/OTP:
  On each node: Heartbeat, Process monitor, Global singleton supervisor, Configuration manager
  One per cluster: Rebalance orchestrator, Node health monitor, vBucket state and replication manager, HTTP REST management API/Web UI
30
Couchbase deployment
Data Flow
Cluster Management
Web Application
Couchbase Client Library
31
Clustering With Couchbase
[Diagram, numbered steps:]
1. SET request arrives at KEY's master server (Listener-Sender)
2. SET acknowledgement returned to the application
3. Replicas sent to Replica Server 1 and Replica Server 2 for KEY
4. Couchbase storage engine writes the value from RAM to disk
32
Basic Operation

§ Docs distributed evenly across servers in the cluster
§ Each server stores both active and replica docs
  § Only one server active at a time for a given doc
§ Client library provides the app with a simple interface to the database
§ Cluster map tells the client which server a doc is on
  § The app never needs to know
§ App reads, writes, and updates docs
§ Multiple app servers can access the same document at the same time
[Diagram: three-server Couchbase cluster (user-configured replica count = 1); each server holds active docs and replica docs; app servers 1 and 2 read/write/update through the Couchbase client library using a cluster map.]
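The cluster-map lookup described above can be sketched as a two-step mapping: hash the document id to a partition ("vBucket" in Couchbase terms), then look the partition up in the cluster map to find its current owner. The numbers below are illustrative, not Couchbase's actual 1024-vBucket layout, and the server names are made up.

```python
# Sketch of client-side routing: doc id -> vBucket -> owning server.

import zlib

NUM_VBUCKETS = 64                      # illustrative; not Couchbase's 1024
cluster_map = {vb: f"server{vb % 3 + 1}" for vb in range(NUM_VBUCKETS)}

def server_for(doc_id: str) -> str:
    """Hash the id to a vBucket, then consult the cluster map."""
    vbucket = zlib.crc32(doc_id.encode()) % NUM_VBUCKETS
    return cluster_map[vbucket]

s = server_for("Doc 4")                # deterministic: same id, same server
```

Because only the cluster map changes during rebalance or failover, the hash-to-vBucket step stays fixed and clients just refresh the map.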
33
Add Nodes

§ Two servers added to the cluster
  § One-click operation
§ Docs automatically rebalanced across the cluster
  § Even distribution of docs
  § Minimum doc movement
§ Cluster map updated
§ App database calls now distributed over a larger number of servers
User Configured Replica Count = 1

[Diagram: Servers 4 and 5 join the cluster; active and replica docs are rebalanced across all five servers; the app servers' cluster maps are updated.]
34
Fail Over Node

§ App servers happily accessing docs on Server 3
§ Server fails
  § App server requests to Server 3 fail
  § Cluster detects the server has failed
    § Promotes replicas of its docs to active
    § Updates the cluster map
§ App server requests for those docs now go to the appropriate server
§ Typically a rebalance would follow
User Configured Replica Count = 1

[Diagram: Server 3 fails; replicas of its active docs on the surviving servers are promoted to active, and the app servers' cluster maps are updated.]
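The failover step described above (promote replicas, update routing) can be sketched with plain dictionaries. This is an illustrative model, not Couchbase code; the server names and doc ids are invented.

```python
# Sketch of failover: a server dies, replicas of its active docs are
# promoted to active on the servers that hold them.

active = {"server3": {"Doc 1", "Doc 7"}, "server1": {"Doc 4"}}
replicas = {"server1": {"Doc 1"}, "server2": {"Doc 7", "Doc 4"}}

def fail_over(failed):
    """Promote replicas of the failed server's active docs."""
    lost = active.pop(failed, set())
    for doc in lost:
        for server, reps in replicas.items():
            if doc in reps:
                reps.discard(doc)
                active.setdefault(server, set()).add(doc)  # promote
                break

fail_over("server3")
```

After promotion the cluster map would be updated so clients route requests for these docs to their new owners; a rebalance then restores the replica count.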
35
COUCHBASE SOLUTION: OPERATING A CLUSTER
36
Reading and Writing

Reading data: the application server asks "Give me document A" and the server responds "Here is document A."

Writing data: the application server asks "Please store document A" and the server responds "OK, I stored document A."

(We'll save the arithmetic for the sizing section.)
37
Reading data

Application server: "Give me document A." Server: "Here is document A."

If document A is in memory:
    return document A to the application
Else:
    add the document to the read queue
    (a reader eventually loads the document from disk into memory)
    return document A to the application
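The read path above can be sketched directly. This is an illustrative model, not Couchbase's engine: a memory hit returns immediately, while a miss falls through to disk and populates RAM for next time.

```python
# Sketch of the read path: serve from RAM when possible, otherwise
# load from disk into memory (the slow branch).

ram = {}                                   # in-memory cache
disk = {"A": "document A contents"}        # persistent store
reads_from_disk = 0

def get(doc_id):
    global reads_from_disk
    if doc_id in ram:                      # fast path: memory hit
        return ram[doc_id]
    reads_from_disk += 1                   # slow path: disk read
    doc = disk[doc_id]
    ram[doc_id] = doc                      # keep it resident for next time
    return doc

first = get("A")                           # loads from disk
second = get("A")                          # served from RAM
```

The second call never touches disk, which is exactly why the next slide insists the working set should fit in RAM.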
38
Keeping the working data set in RAM is key to read performance

Your application's working set should fit in RAM…

[Diagram repeats the read path: if document A is in memory, return it to the application; else add the document to the read queue, let the reader eventually load it from disk into memory, then return it.]

… or else! (Because you don't want the "else" branch happening very often: it is MUCH slower than a memory read, and you could have to wait in line an indeterminate amount of time for the read to happen.)
39
Working set ratio depends on your application

Ad network: any cookie can show up at any time (working/total set = 1).
Business application: users logged in during the day; the day moves around the globe (working/total set = .33).
Late-stage social game: many users no longer active; few logged in at any given time (working/total set = .01).
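The sizing arithmetic implied by these ratios is simple. The sketch below uses an assumed 500 GB total data set for illustration; it ignores per-document overhead and replica copies, which a real sizing exercise would include.

```python
# Sketch of working-set sizing: RAM needed is roughly
# working-set ratio x total data size.

def ram_needed_gb(total_data_gb, working_set_ratio):
    return total_data_gb * working_set_ratio

TOTAL_GB = 500                               # assumed data set size

ad_network = ram_needed_gb(TOTAL_GB, 1.0)    # every cookie can show up
business_app = ram_needed_gb(TOTAL_GB, 0.33) # a third of users active
social_game = ram_needed_gb(TOTAL_GB, 0.01)  # few users still logging in
```

The spread (500 GB vs 5 GB of RAM for the same total data) is why the same database deployment can look wildly different across these three applications.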
40
Couchbase in operation: writing data

Application server: "Store document A." Server: "OK, it is stored."

If there is room for the document in RAM:
    store the document in RAM
Else:
    eject other document(s) from RAM
    store the document in RAM
Add the document to the replication queue
    (the replicator eventually transmits the document)
Add the document to the write queue
    (the writer eventually writes the document to disk)
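The write path above can be sketched as RAM storage plus two asynchronous queues. This is an illustrative model, not Couchbase's engine; the tiny RAM capacity and FIFO ejection are deliberately simplistic so the ejection branch is visible.

```python
# Sketch of the write path: store in RAM (ejecting if needed), then
# enqueue the doc for the replicator and for the disk writer.

from collections import deque

RAM_CAPACITY = 2                          # tiny, to force an ejection
ram = {}
replication_queue = deque()
disk_write_queue = deque()

def set_doc(doc_id, doc):
    if doc_id not in ram and len(ram) >= RAM_CAPACITY:
        ram.pop(next(iter(ram)))          # eject some resident document
    ram[doc_id] = doc                     # store in RAM
    replication_queue.append(doc_id)      # replicator drains this later
    disk_write_queue.append(doc_id)       # writer drains this later

set_doc("A", "doc A")
set_doc("B", "doc B")
set_doc("C", "doc C")                     # forces an ejection
```

The acknowledgement to the application happens after the RAM store; replication and persistence drain their queues in the background, which is what the next slides examine.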
41
Flow of data when writing

[Diagram: application servers writing to Couchbase over the network; Couchbase transmitting replicas; Couchbase writing to disk]
42
Queues build if the aggregate arrival rate exceeds the drain rates

[Diagram: application servers writing over the network; each server's replication queue and disk write queue grow when writes arrive faster than they drain]
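The flow-rate point can be made concrete with a little arithmetic. The rates below are assumed numbers for illustration: if writes arrive faster than a node's disk writer (or replicator) can drain them, its queue grows without bound; spreading the same arrival rate over more servers lets each node keep up.

```python
# Sketch of queue growth: depth after t seconds is
# max(0, per-node arrival rate - drain rate) * t.

def queue_depth_after(seconds, arrival_rate, drain_rate, servers=1):
    """Depth of one node's write queue (rates in ops/sec)."""
    per_node_arrival = arrival_rate / servers
    growth = max(0.0, per_node_arrival - drain_rate)
    return growth * seconds

# Assumed rates: 5,000 writes/sec arriving, 3,000/sec drain per node.
one_server = queue_depth_after(60, arrival_rate=5000, drain_rate=3000)
three_servers = queue_depth_after(60, 5000, 3000, servers=3)
```

With one server the queue gains 2,000 entries every second; with three servers each node sees under 1,700 ops/sec against a 3,000 ops/sec drain rate, so the queues stay empty. That is the quantitative version of the next slide's claim.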
43
Scaling out permits matching of aggregate flow rates so queues do not grow

[Diagram: three servers, each receiving its share of the application servers' writes over the network]