NoSQL for SQL Professionals
Don Pinto
Product Manager
Macro Trends Driving NoSQL Technology
NoSQL: More Data + More Users + Interactive Apps
Lacking Solutions, Users Forced to Invent
Bigtable: November 2006
Dynamo: October 2007
Cassandra: August 2008
Voldemort: February 2009
Very few organizations can build and maintain database software technology. But every organization building interactive web applications needs this technology.
What Is the Biggest Data Management Problem Driving Use of NoSQL in the Coming Year?
Lack of flexibility/rigid schemas: 49%
Inability to scale out data: 35%
Performance challenges: 29%
Cost: 16%
All of these: 12%
Other: 11%
Source: Couchbase Survey, December 2011, n = 1351.
Relational vs. NoSQL
Key Differences
Relational Technology Scales Up
Web/App Server Tier: Application scales out. Just add more commodity web servers.
Relational Database: RDBMS scales up. Get a bigger, more complex server.
Expensive and disruptive sharding; doesn’t perform at web scale.
[Charts: System Cost vs. Users and Application Performance vs. Users for each tier. The relational database curve is marked "Won’t scale beyond this point".]
NoSQL Database Scales Out Like App Tier
Web/App Server Tier: Application scales out. Just add more commodity web servers.
Couchbase Distributed Data Store: NoSQL database scales out. Cost and performance mirror the app tier.
Scaling out flattens the cost and performance curves.
[Charts: System Cost vs. Users and Application Performance vs. Users for each tier.]
Relational vs Document Data Model
Relational data model: Highly structured table organization with rigidly defined data formats and record structure.
Document data model: Collection of complex documents with arbitrary, nested data formats and varying “record” format.
RDBMS Example: User Profile
User Info
KEY  First  Last    ZIP_id
1    Dipti  Borkar  2
2    Joe    Smith   2
3    Ali    Dodson  2
4    John   Doe     3

Address Info
ZIP_id  CITY  STATE  ZIP
1       DEN   CO     30303
2       MV    CA     94040
3       CHI   IL     60609
4       NY    NY     10010

Example: user 1 has ZIP_id 2, which joins to address row 2 (MV, CA, 94040).
To get information about a specific user, you perform a join across the two tables.
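The join described above can be sketched with Python's built-in sqlite3 module. The table and column names (user_info, address_info, user_key) are assumptions modeled on the slide's two tables, not names taken from any real schema.

```python
import sqlite3

# Table and column names are assumptions modeled on the slide's two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address_info (
        zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT);
    CREATE TABLE user_info (
        user_key INTEGER PRIMARY KEY, first TEXT, last TEXT,
        zip_id INTEGER REFERENCES address_info(zip_id));
    INSERT INTO address_info VALUES
        (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040'),
        (3, 'CHI', 'IL', '60609'), (4, 'NY', 'NY', '10010');
    INSERT INTO user_info VALUES
        (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2),
        (3, 'Ali', 'Dodson', 2), (4, 'John', 'Doe', 3);
""")


def get_user_profile(conn, user_key):
    """Return one user's name and address by joining the two tables."""
    return conn.execute(
        """
        SELECT u.user_key, u.first, u.last, a.city, a.state, a.zip
        FROM user_info AS u
        JOIN address_info AS a ON a.zip_id = u.zip_id
        WHERE u.user_key = ?
        """,
        (user_key,),
    ).fetchone()


profile = get_user_profile(conn, 1)
print(profile)
```

Every profile read touches both tables, which is the cost the document model avoids by storing the address inside the user record.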
Document Example: User Profile
All data in a single document
{ "ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", "CITY": "MV", "STATE": "CA" }
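As a rough illustration of single-document access, the sketch below uses a plain Python dict as a stand-in for a document store. The put/get helpers and the "user::1" key format are hypothetical and are not the API of any particular database.

```python
import json

# Hypothetical in-memory stand-in for a document store: key -> JSON text.
store = {}


def put(doc_id, doc):
    """Serialize the whole document and store it under one key."""
    store[doc_id] = json.dumps(doc)


def get(doc_id):
    """Fetch and deserialize the whole document with one key lookup."""
    return json.loads(store[doc_id])


put("user::1", {
    "ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
    "ZIP": "94040", "CITY": "MV", "STATE": "CA",
})

profile = get("user::1")
print(profile["FIRST"], profile["CITY"], profile["STATE"])
```

One lookup returns the name and the address together, with no second table involved.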
Making a Change Using RDBMS

User Table
User ID  First  Last    Zip    Country ID
1        Dipti  Borkar  94040  001
2        Joe    Smith   94040  001
3        Ali    Dodson  94040  001
4        Sarah  Gorin   NW1    002
5        Bob    Young   30303  001
6        Nancy  Baker   10010  001
7        Ray    Jones   31311  001
8        Lee    Chen    V5V3M  008
• • •
50000    Doug   Moore   04252  001
50001    Mary   White   SW195  002
50002    Lisa   Clark   12425  001

Country Table
Country ID  Country name
001         USA
002         UK
003         Argentina
004         Australia
005         Aruba
006         Austria
007         Brazil
008         Canada
009         Chile
• • •
130         Portugal
131         Romania
132         Russia
133         Spain
134         Sweden

Photo Table
User ID  Photo ID  Comment  Country ID
2        d043      NYC      001
2        b054      Bday     007
5        c036      Miami    001
7        d072      Sunset   133
5002     e086      Spain    133

Status Table
User ID  Status ID  Text     Country ID
1        a42        At conf  134
4        b26        excited  007
5        c32        hockey   008
12       d83        Go A’s   001
5000     e34        sailing  005

Affiliations Table
User ID  Affl ID  Affl Name  Country ID
2        a42      Cal        001
4        b96      USC        001
7        c14      UW         001
8        e22      Oxford     002
...
Making the Same Change With a Document DB
{ "ID": 1, "FIRST": "Don", "LAST": "Pinto", "ZIP": "94040", "CITY": "MV", "STATE": "CA", "STATUS": { "TEXT": "At Conf", "GEO_LOC": "134" }, "COUNTRY": "USA" }
Just add information to a document
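A minimal sketch of what "just add information to a document" can look like in application code. The helper names add_location and country_of are hypothetical; the point is only that new fields are attached to one document without altering any other document or a shared schema.

```python
# The original document, before the change.
doc = {"ID": 1, "FIRST": "Don", "LAST": "Pinto",
       "ZIP": "94040", "CITY": "MV", "STATE": "CA"}

# Another document that is never migrated (values from the slide's user table).
untouched = {"ID": 2, "FIRST": "Joe", "LAST": "Smith", "ZIP": "94040"}


def add_location(document, status_text, geo_loc, country):
    """Return a copy of the document with the new fields added."""
    updated = dict(document)
    updated["STATUS"] = {"TEXT": status_text, "GEO_LOC": geo_loc}
    updated["COUNTRY"] = country
    return updated


def country_of(document):
    """Readers must tolerate documents that lack the new field."""
    return document.get("COUNTRY", "unknown")


updated = add_location(doc, "At Conf", "134", "USA")
print(country_of(updated), country_of(untouched))
```

The trade-off is that the application, rather than the database schema, becomes responsible for handling documents with and without the new fields.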
Relational vs Document Performance

User Table
User ID  First  Last    Zip
1        Frank  Weigel  94040
2        Joe    Smith   94040
3        Ali    Dodson  94040
4        Sarah  Gorin   NW1
5        Bob    Young   30303
6        Nancy  Baker   10010
7        Ray    Jones   31311
8        Lee    Chen    V5V3
• • •
5000     Doug   Moore   04252
5001     Mary   White   41694
5002     Lisa   Clark   12425

Photo Table
User ID  Photo ID  Comment
2        d043      NYC
2        b054      Bday
5        c036      Miami
7        d072      Sunset
5002     e086      Spain

Status Table
User ID  Status ID  Text
1        a42        At conf
4        b26        excited
5        c32        hockey
12       d83        Go A’s
5000     e34        sailing

Affiliations Table
User ID  Affiliations ID  Affiliations Name
2        a42              Cal
4        b96              USC
7        c14              UW
8        e22              Oxford

[Figure: in the document model, each user's row and its related status, photo, and affiliation rows are stored together in one JSON document. For example, user 1 (Frank Weigel, 94040) is stored with status a42 "At conf", user 8 (Lee Chen, V5V3) with affiliation e22 "Oxford", and user 5002 (Lisa Clark, 12425) with photo e086 "Spain".]
Faster response times and higher throughput
Document Databases Easily Accommodate Unstructured Data
Hotels
{ "ID": 1, "NAME": "Fairmont San Francisco", "DESCRIPTION": "Historic grandeur…", "AVG_REVIEWER_SCORE": "4.3", "AMENITY": [ {"TYPE": "gym", "DESCRIPTION": "fitness center"}, {"TYPE": "wifi", "DESCRIPTION": "free wifi"} ], "RATE_TYPE": "nightly", "PRICE": "$199", "REVIEWS": ["review_1", "review_2"], "ATTRACTIONS": "Chinatown" }
{ "ID": 2, "NAME": "W San Francisco", "DESCRIPTION": "Chic, hip accommodations..", "AVG_REVIEWER_SCORE": "4.0", "AMENITY": [ {"TYPE": "spa", "DESCRIPTION": "Bliss Spa"}, {"TYPE": "wifi", "DESCRIPTION": "free wifi"}, {"TYPE": "dining", "DESCRIPTION": "bar/lounge"} ], "RATE_TYPE": "nightly", "PRICE": "$194", "REVIEWS": ["review_1", "review_2"] }
Document Databases Easily Accommodate Unstructured Data
Hotels
{ "ID": 1, "NAME": "Fairmont San Francisco",…}
Reviews
{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel & Location", "WOULD RECOMMEND": "yes", "AVG_REVIEWER_SCORE": "5", "REVIEW_DATE": "May 29, 2013", "USER_PROFILE_ID": "271" }
{ "REVIEW_ID": 2, "REVIEW": "Nice, but a few kinks", "WOULD RECOMMEND": "yes", "AVG_REVIEWER_SCORE": "4", "REVIEW_DATE": "May 22, 2013", "USER_PROFILE_ID": "923" }
Document Databases Easily Accommodate Unstructured Data
Hotel Descriptions
{ "ID": 1, "NAME": "Fairmont San Francisco",…}
Reviews
{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…",…}
{ "REVIEW_ID": 2, "REVIEW": "Nice, but …",…}
User Profiles
{ "USER_ID": 1, "DISPLAY_NAME": "Ted’s Trip Experience", "CITY": "Saratoga", "STATE": "California", "NUM_OF_REVIEWS": "8" }
{ "USER_ID": 2, "DISPLAY_NAME": "WhatWhat567", "CITY": "Kansas City", "STATE": "MO", "NUM_OF_REVIEWS": "3" }
Document Databases Easily Accommodate Unstructured Data
Hotel Descriptions
{ "ID": 1, "NAME": "Fairmont San Francisco",…}
Reviews
{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…",…}
{ "REVIEW_ID": 2, "REVIEW": "Nice, but …",…}
User Profiles
{ "USER_ID": 1, "DISPLAY": "Ted’s Trip…",…}
{ "USER_ID": 2, "DISPLAY": "WhatWhat …",…}
Document IDs associate related objects
Hotels point to reviews
Reviews point to users
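The way document IDs link hotels to reviews and reviews to users can be sketched as a chain of key lookups. The dict below stands in for the database. The key formats ("hotel_1", "user_271") and the user display names are placeholders invented for illustration, since the slides do not show the profiles for users 271 and 923.

```python
# Hypothetical in-memory stand-in for the document store.
documents = {
    "hotel_1": {"ID": 1, "NAME": "Fairmont San Francisco",
                "REVIEWS": ["review_1", "review_2"]},
    "review_1": {"REVIEW_ID": 1, "REVIEW": "Loved Hotel & Location",
                 "USER_PROFILE_ID": "271"},
    "review_2": {"REVIEW_ID": 2, "REVIEW": "Nice, but a few kinks",
                 "USER_PROFILE_ID": "923"},
    # Placeholder profiles: display names are not from the source.
    "user_271": {"USER_ID": 271, "DISPLAY_NAME": "placeholder-name-271"},
    "user_923": {"USER_ID": 923, "DISPLAY_NAME": "placeholder-name-923"},
}


def reviews_with_authors(hotel_key):
    """Follow hotel -> reviews -> users using only document IDs."""
    hotel = documents[hotel_key]
    result = []
    for review_key in hotel["REVIEWS"]:
        review = documents[review_key]
        user = documents["user_" + review["USER_PROFILE_ID"]]
        result.append((review["REVIEW"], user["DISPLAY_NAME"]))
    return result


pairs = reviews_with_authors("hotel_1")
for text, author in pairs:
    print(author, "wrote:", text)
```

Each hop is a direct key lookup made by the application; no join is performed by the database.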
Indexing with Document Databases
Index on AVG_REVIEWER_SCORE
Index: … (4.0, doc_id), (4.0, doc_id), (4.1, doc_id), (4.3, doc_id), (5.0, doc_id) …
Querying with Document Databases
Query on AVG_REVIEWER_SCORE
Index: … (3.4, doc_id), (3.4, doc_id), (3.5, doc_id), (3.6, doc_id), (3.7, doc_id), (3.8, doc_id), (4.0, doc_id), (4.1, doc_id), (4.3, doc_id), (4.5, doc_id), (4.7, doc_id), (4.9, doc_id), (5.0, doc_id) … (5.0, doc_id)
[Figure: a query against the index returns the matching results.]
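One way to picture an index on AVG_REVIEWER_SCORE is a sorted list of (score, doc_id) pairs that a range query scans with binary search. This is a conceptual sketch only. It does not reflect how any particular database stores or queries its indexes, and the function names are made up for the example.

```python
import bisect

hotels = {
    "hotel_1": {"NAME": "Fairmont San Francisco", "AVG_REVIEWER_SCORE": "4.3"},
    "hotel_2": {"NAME": "W San Francisco", "AVG_REVIEWER_SCORE": "4.0"},
}


def build_index(docs, field):
    """Build a sorted list of (value, doc_id) for docs that have the field."""
    return sorted(
        (float(doc[field]), doc_id)
        for doc_id, doc in docs.items()
        if field in doc
    )


def range_query(index, low, high):
    """Return doc IDs whose indexed value lies in [low, high]."""
    keys = [value for value, _ in index]
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [doc_id for _, doc_id in index[start:end]]


index = build_index(hotels, "AVG_REVIEWER_SCORE")
top_rated = range_query(index, 4.1, 5.0)
print(index, top_rated)
```

The query returns document IDs, so the application still fetches the matching documents by ID afterwards.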
Flavors of NoSQL
NoSQL catalog
Categories: Key-Value, Data Structure, Document, Column, Graph
Tiers: Cache (memory only), Database (memory/disk)
Products shown: memcached, redis, mongoDB, couchbase, cassandra, Neo4j
Couchbase Open Source Project
• Leading NoSQL database project focused on distributed database technology and surrounding ecosystem
• Supports both key-value and document-oriented use cases
• All components are available under the Apache 2.0 Public License
• Available as packaged software in both enterprise and community editions
Couchbase Server
• Easy Scalability: Grow cluster without application changes, without downtime, with a single click
• Consistent High Performance: Consistent sub-millisecond read and write response times with consistent high throughput
• Always On 24x365: No downtime for software upgrades, hardware maintenance, etc.
• Flexible Data Model: JSON document model with no fixed schema
Couchbase Server Architecture
Data Manager
• 11211: Moxi (Memcapable 1.0)
• 11210: Memcached (Memcapable 2.0)
• Couchbase EP Engine
• Storage interface
• New Persistence Layer
• 8092: Query API, Query Engine
Cluster Manager (Erlang/OTP)
• 8091: HTTP, REST management API/Web UI
• 4369: Erlang port mapper
• 21100 - 21199: Distributed Erlang
• On each node: Heartbeat, Process monitor, Global singleton supervisor, Configuration manager
• One per cluster: Rebalance orchestrator, Node health monitor, vBucket state and replication manager
Couchbase Server Architecture
Data Manager
• 11210 / 11211: Data access ports
• Object-managed Cache
• Multi-threaded Persistence Engine
• 8092: Query API, Query Engine
Cluster Manager (Erlang/OTP)
• 8091: Admin Console, REST management API/Web UI (http)
• Replication, Rebalance, Shard State Manager
Where is NoSQL a good fit?
Market Adoption
Internet Companies
• Social Gaming
• Ad Networks
• Social Networks
• Online Business Services
• E-Commerce
• Online Media
• Content Management
• Cloud Services
Enterprises
• Communications
• Retail
• Financial Services
• Health Care
• Automotive/Airline
• Agriculture
• Consumer Electronics
• Business Systems
Application Characteristics - Data driven
• 3rd party or user defined structure (Twitter feeds)
• Support for unlimited data growth (Viral apps)
• Data with non-homogeneous structure
• Need to change data structure quickly and often
• Variable length documents
• Sparse data records
• Hierarchical data
NoSQL is a good fit
Application Characteristics - Performance driven
• Low latency critical (e.g., 1 millisecond)
• High throughput (e.g., 200,000 ops/sec)
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Read / Mixed / Write heavy workloads
NoSQL is a good fit
Q & A
Extra - Couchbase Operations
Single node - Couchbase Write Operation
[Figure: the App Server writes Doc 1 to the Couchbase Server Node. Doc 1 is placed in the Managed Cache, then added to the Replication Queue (to other node) and to the Disk Queue (to Disk).]
Single node - Couchbase Update Operation
[Figure: the App Server sends Doc 1’ to the Couchbase Server Node. Doc 1’ replaces Doc 1 in the Managed Cache, then is added to the Replication Queue (to other node) and to the Disk Queue (to Disk), where it replaces Doc 1.]
Single node - Couchbase Read Operation
[Figure: the App Server issues GET Doc 1 to the Couchbase Server Node. Doc 1 is returned from the Managed Cache; the Disk Queue, Replication Queue, and Disk are also shown.]
Single node - Couchbase Cache Miss
[Figure: the Managed Cache holds Doc 2 through Doc 6 but not Doc 1. The App Server issues GET Doc 1; the Couchbase Server Node fetches Doc 1 from Disk, places it in the Managed Cache, and returns it to the App Server.]
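The write, update, read, and cache-miss flows on a single node can be approximated with a toy model. The Node class below, its method names, and its eviction rule (only persisted documents may leave the cache) are simplifying assumptions for illustration, not Couchbase Server's actual implementation.

```python
from collections import deque


class Node:
    """Toy single node: managed cache, disk queue, replication queue, disk."""

    def __init__(self):
        self.cache = {}
        self.disk = {}
        self.disk_queue = deque()
        self.replication_queue = deque()

    def set(self, key, value):
        """Write or update: cache first, then queue for disk and replicas."""
        self.cache[key] = value
        self.disk_queue.append((key, value))
        self.replication_queue.append((key, value))

    def flush_to_disk(self):
        """Drain the disk queue; later writes overwrite earlier ones."""
        while self.disk_queue:
            key, value = self.disk_queue.popleft()
            self.disk[key] = value

    def evict(self, key):
        """Drop a document from the cache only if disk has the same value."""
        if key in self.cache and self.disk.get(key) == self.cache[key]:
            del self.cache[key]
            return True
        return False

    def get(self, key):
        """Read from cache; on a miss, load from disk and cache the result."""
        if key in self.cache:
            return self.cache[key], "hit"
        value = self.disk[key]
        self.cache[key] = value
        return value, "miss"


node = Node()
node.set("doc1", "Doc 1")             # write
first_read = node.get("doc1")         # read served from the cache
node.set("doc1", "Doc 1'")            # update
evicted_early = node.evict("doc1")    # refused: not yet persisted
node.flush_to_disk()
evicted = node.evict("doc1")          # allowed after persistence
miss_read = node.get("doc1")          # cache miss, loaded from disk
hit_read = node.get("doc1")           # now cached again
print(first_read, miss_read, hit_read)
```

In this model the write returns as soon as the cache is updated; persistence and replication happen later from the queues.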
Basic Operation
• Docs distributed evenly across servers
• Each server stores both active and replica docs
  • Only one server active at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
  • App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
[Figure: Couchbase Server cluster of three servers (Server 1, Server 2, Server 3), each holding active docs and replica docs, with user-configured replica count = 1. App Server 1 and App Server 2 each use the Couchbase client library and its cluster map to send read/write/update requests to the server holding the active copy of a doc.]
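A cluster map can be pictured as a table from partitions (vBuckets) to servers, with the client hashing each document key to a partition. The sketch below is a simplification under stated assumptions: the partition count of 64, the CRC32-modulo hash, and the round-robin placement are illustrative choices and may differ from what Couchbase client libraries actually do.

```python
import zlib

NUM_VBUCKETS = 64  # illustrative partition count, chosen for the example


def vbucket_for(key):
    """Hash a document key to a partition (simplified hashing)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers):
    """Place each partition's active and single replica copy round-robin.

    Requires at least two servers so the replica lives on a different
    server than the active copy (replica count = 1).
    """
    n = len(servers)
    return {
        vb: {"active": servers[vb % n], "replica": servers[(vb + 1) % n]}
        for vb in range(NUM_VBUCKETS)
    }


def server_for(cluster_map, key):
    """Client-side lookup: all reads and writes go to the active copy."""
    return cluster_map[vbucket_for(key)]["active"]


cluster_map = build_cluster_map(["server1", "server2", "server3"])
target = server_for(cluster_map, "doc_5")
print("doc_5 is served by", target)
```

Because the lookup happens inside the client library, application code only supplies the document key.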
Add Nodes to Cluster
• Two servers added
  • One-click operation
• Docs automatically rebalanced across cluster
  • Even distribution of docs
  • Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
[Figure: Couchbase Server cluster grown from three to five servers (Server 1 through Server 5), each holding active docs and replica docs, with user-configured replica count = 1. App Server 1 and App Server 2 send read/write/update requests through the Couchbase client library and its cluster map.]
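The goals of "even distribution" and "minimum doc movement" can be illustrated with a small rebalancing routine over active partitions only. The algorithm below is one possible approach written for this example, not the rebalance logic Couchbase Server uses, and it ignores replica placement to stay short.

```python
def rebalance(active_map, new_servers):
    """Spread partitions evenly after adding servers, moving as few as possible.

    active_map: dict of partition -> server. Returns (new_map, moved).
    """
    servers = sorted(set(active_map.values())) + list(new_servers)
    base, extra = divmod(len(active_map), len(servers))
    # Each server's target share; the first `extra` servers hold one more.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}

    owned = {s: [] for s in servers}
    for partition in sorted(active_map):
        owned[active_map[partition]].append(partition)

    # Collect only the surplus partitions from over-full servers.
    surplus = []
    for s in servers:
        while len(owned[s]) > target[s]:
            surplus.append(owned[s].pop())

    # Hand the surplus to servers that are below their target.
    new_map = dict(active_map)
    moved = []
    for s in servers:
        while len(owned[s]) < target[s]:
            partition = surplus.pop()
            owned[s].append(partition)
            new_map[partition] = s
            moved.append(partition)
    return new_map, moved


# Three servers holding 64 partitions, then two servers are added.
before = {p: "server%d" % (p % 3 + 1) for p in range(64)}
after, moved = rebalance(before, ["server4", "server5"])
print(len(moved), "of", len(before), "partitions moved")
```

Only the partitions needed to fill the new servers are moved; everything else stays where it was, so the cluster map changes as little as possible.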
Fail Over Node
• App servers accessing docs
• Requests to Server 3 fail
• Cluster detects server failed
  • Promotes replicas of docs to active
  • Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
[Figure: Couchbase Server cluster of five servers (Server 1 through Server 5), each holding active docs and replica docs, with user-configured replica count = 1. Server 3 fails, and replicas of its docs on other servers are promoted to active. App Server 1 and App Server 2 access the cluster through the Couchbase client library and its cluster map.]
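The failover steps above (promote replicas, update the cluster map) can be sketched as a pure function over a partition map. The map layout and the fail_over function are assumptions made for this example and do not represent Couchbase Server's internal data structures.

```python
def fail_over(cluster_map, failed):
    """Return an updated map with the failed server removed.

    Partitions whose active copy was on the failed server get their
    replica promoted. Partitions that only lose a replica keep their
    active copy and run without a replica until a rebalance.
    """
    new_map = {}
    for partition, entry in cluster_map.items():
        if entry["active"] == failed:
            new_map[partition] = {"active": entry["replica"], "replica": None}
        elif entry["replica"] == failed:
            new_map[partition] = {"active": entry["active"], "replica": None}
        else:
            new_map[partition] = dict(entry)
    return new_map


# Five servers, 20 partitions, replica count = 1 (round-robin placement).
servers = ["server1", "server2", "server3", "server4", "server5"]
before = {
    p: {"active": servers[p % 5], "replica": servers[(p + 1) % 5]}
    for p in range(20)
}
after = fail_over(before, "server3")

promoted = [p for p in before if before[p]["active"] == "server3"]
print("promoted partitions:", promoted)
```

After failover some partitions have no replica, which is why a rebalance typically follows to restore the configured replica count.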