NoSQL 101: Couchbase Connect 2014
©2014 Couchbase, Inc. 3
NoSQL
Macro Trends Driving NoSQL Technology
More Data + More Users + Interactive Apps
Cloud-Based, Data-Centric Apps are Creating a Disruption
Why NoSQL?
Client/Server
Apps run on premise, support thousands of simultaneous users
Centralized architecture on high-end, expensive servers
Manage relatively small amount of mostly structured data
Cloud
Apps run in the cloud, support millions of simultaneous users
Distributed, web-scale architecture on low-cost, commodity servers
Data-centric apps that must handle large amounts of unstructured data
Right Database for Cloud-Based, Data-Centric Apps
Why NoSQL?
Scalability, Performance, Agile Development, Availability
JSON Data Model Fits Today’s Developer Needs Better
Agile Development
Hundreds or thousands of inter-related tables
Handles structured data well, unstructured data poorly
Rigid schema requires migrations that can take weeks, months
Impedance mismatch with developers
Aggregates & denormalizes data into single document
Handles structured & unstructured data equally well
Inferred schema requires no migration
JSON rapidly being adopted
Hotel Descriptions
{ "ID": 1, "NAME": "Fairmont San Francisco", … }
Reviews
{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…", … }
{ "REVIEW_ID": 2, "REVIEW": "Nice, but …", … }
User Profiles
{ "USER_ID": 1, "DISPLAY": "Ted’s Trip…", … }
{ "USER_ID": 2, "DISPLAY": "WhatWhat …", … }
Hotels point to reviews
Reviews point to users
Must Dynamically Scale Apps to Support Millions of Users
Scalability
Centralized, scale up architecture with big, expensive servers
Manual sharding at app level struggles to support “web scale”
High software costs & TCO
Distributed, scale-out architecture with cluster of low-cost, commodity servers
Auto-sharding at database level to support Big Data, Big Users
Open source & lower TCO
RDBMS scales up: get a bigger, more complex server
Application scales out: just add more commodity web servers
[Charts: system cost and application performance as users grow, with the annotation "Won’t scale beyond this point"]
Consumers & Employees Demand Highly Responsive Apps
Performance
Architecture based on “speed of disk”
Requires joins across hundreds or thousands of tables
High throughput requires very expensive hardware
Architecture based on “speed to memory”
Faster access to aggregated, de-normalized objects
High throughput at low TCO with cluster of commodity servers
[Diagram labels: Application layer, RDBMS, Cache; Application layer, RDBMS, Cache, Couchbase]
Apps Must Now Stay Online 24 x 365
Availability
Relational systems use clustering as an afterthought
Must take database down for “maintenance windows”
Struggle to support XDCR replication across many DCs
Clustered systems with intra-cluster replication for availability
Designed for online software upgrades & maintenance
Native master-master XDCR for higher availability
NoSQL catalog
Categories: Key-Value, Data Structure, Document, Column, Graph
Products shown: memcached, redis, mongoDB, couchbase, cassandra, Neo4j
Rows: Cache (memory only), Database (memory/disk)
The Key-Value Store – the foundation of NoSQL
Key -> Opaque Binary Value
memcached – the NoSQL precursor
memcached
In-memory only
Limited set of operations
- Blob Storage: Set, Add, Replace, CAS
- Retrieval: Get
- Structured Data: Append, Increment
Simple and fast
Challenges: cold cache, disruptive elasticity
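The operation set listed above can be sketched as a toy in-memory class. This is only an illustration of the command semantics, not memcached itself or any client library: the class name `MiniCache` and its method names are hypothetical, and a real server adds networking, eviction, and expiry.

```python
import itertools


class MiniCache:
    """Toy in-memory key-value cache with memcached-style commands (hypothetical)."""

    def __init__(self):
        self._data = {}                    # key -> (value bytes, CAS token)
        self._tokens = itertools.count(1)  # a fresh token for every write

    def _store(self, key, value):
        self._data[key] = (value, next(self._tokens))
        return True

    def set(self, key, value):
        """Store unconditionally."""
        return self._store(key, value)

    def add(self, key, value):
        """Store only if the key does not exist yet."""
        return key not in self._data and self._store(key, value)

    def replace(self, key, value):
        """Store only if the key already exists."""
        return key in self._data and self._store(key, value)

    def gets(self, key):
        """Return (value, cas_token), or (None, None) on a miss."""
        return self._data.get(key, (None, None))

    def get(self, key):
        return self.gets(key)[0]

    def cas(self, key, value, token):
        """Store only if nobody has written the key since `token` was read."""
        current = self._data.get(key)
        if current is None or current[1] != token:
            return False
        return self._store(key, value)

    def append(self, key, suffix):
        """Append bytes to an existing value."""
        current = self._data.get(key)
        return current is not None and self._store(key, current[0] + suffix)

    def incr(self, key, delta=1):
        """Treat the stored bytes as a decimal counter and add `delta`."""
        current = self._data.get(key)
        if current is None:
            return None
        new_value = int(current[0]) + delta
        self._store(key, str(new_value).encode())
        return new_value
```

Because everything lives in a process-local dictionary, a restart empties it, which is the "cold cache" challenge the slide mentions.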
Couchbase – document-oriented database
Key -> JSON object ("document")
{ "string" : "string", "string" : value, "string" : { "string" : "string", "string" : value }, "string" : [ array ] }
Auto-sharding
Disk-based with built-in memcached cache
Elastic scalability
Highly-available (data replication)
When values are JSON objects ("documents"): create indices, views and query against the views
Couchbase Architecture
[Diagram: APP SERVER 1 and APP SERVER 2 each run a Couchbase client library that holds the cluster map and sends read/write/update requests to a three-server cluster]
Server    Active shards  Replica shards
SERVER 1  5, 2, 9        4, 1, 8
SERVER 2  4, 7, 8        6, 3, 2
SERVER 3  1, 3, 6        7, 9, 5
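As a rough sketch of how a client library might use a cluster map, the snippet below hashes a key to a shard and looks up which server holds the active and replica copies. The shard layout is read off the architecture slide; the CRC32-modulo hash, the nine-shard count, and the names `shard_for` and `locate` are assumptions for illustration and are not the actual Couchbase client algorithm.

```python
import zlib

NUM_SHARDS = 9  # the slide shows shards 1 to 9; real clusters use many more

# Shard placement as drawn on the slide (server name -> shard numbers).
ACTIVE = {"server1": [5, 2, 9], "server2": [4, 7, 8], "server3": [1, 3, 6]}
REPLICA = {"server1": [4, 1, 8], "server2": [6, 3, 2], "server3": [7, 9, 5]}


def build_map(layout):
    """Invert server -> shards into shard -> server."""
    return {shard: server for server, shards in layout.items() for shard in shards}


ACTIVE_MAP = build_map(ACTIVE)
REPLICA_MAP = build_map(REPLICA)


def shard_for(key):
    """Map a document key to a shard number (hypothetical hash choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS + 1


def locate(key):
    """Return (shard, active server, replica server) for a key."""
    shard = shard_for(key)
    return shard, ACTIVE_MAP[shard], REPLICA_MAP[shard]
```

Because the lookup happens in the client, the application talks directly to the server that owns the key, and in this layout every shard's replica sits on a different server than its active copy.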
MongoDB – document-oriented database
Disk-based with OS caching
BSON ("binary JSON") format and wire protocol
Master-slave replication
Auto-sharding
Values are BSON objects
Supports ad hoc queries - best when indexed
Key -> JSON object ("document")
{ "string" : "string", "string" : value, "string" : { "string" : "string", "string" : value }, "string" : [ array ] }
Cassandra – Column-family database
More disk-based system
Key includes a row, column family and column name
Store versioned blobs in one large table
Queries can be done on rows, column families and column names
Row and column designs are critical
Clustered
External caching required for low-latency reads
[Diagram: a key maps to opaque binary values stored per column: Column 1, Column 2, Column 3 (not present)]
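To make the "row, column family and column name" key and the "versioned blobs" idea concrete, here is a small in-memory sketch. It is not Cassandra's storage engine or API: `ColumnFamilyStore` and its methods are hypothetical names, and picking the newest timestamp on read is a simplifying assumption.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy sparse table keyed by (row, column family, column name)."""

    def __init__(self):
        # (row, family) -> {column name -> [(timestamp, blob), ...]}
        self._rows = defaultdict(dict)

    def put(self, row, family, column, blob, timestamp):
        """Record a new version of an opaque blob for one column."""
        versions = self._rows[(row, family)].setdefault(column, [])
        versions.append((timestamp, blob))
        versions.sort(key=lambda version: version[0])

    def get(self, row, family, column):
        """Return the newest blob, or None if the column is not present."""
        versions = self._rows.get((row, family), {}).get(column)
        return versions[-1][1] if versions else None

    def columns(self, row, family):
        """List only the columns that actually exist for this row."""
        return sorted(self._rows.get((row, family), {}))
```

A row that never wrote "Column 3" simply has no entry for it, which matches the "(not present)" column in the diagram.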
Neo4j – Graph database
Disk-based system
External caching required for low-latency reads
Nodes, relationships and paths
Properties on nodes
Delete, Insert, Traverse, etc.
[Diagram: a Neo4j graph of five nodes, each shown as a key with an opaque binary value]
NoSQL Considerations
Accessing data
- No standards exist yet
- Typically via SDKs or over HTTP
- Check if the programming language of your choice is supported
Consistency
- Consistent only at the document level
- Most document stores currently don’t support multi-document transactions
- Analyze your application needs
Availability
- Each node stores active and replica data (Couchbase)
- Each node is either a master or slave (MongoDB)
NoSQL considerations
Operations
- Monitoring the system
- Backup and restore the system
- Upgrades and maintenance
- Support
Ease of Scaling
- Ease of adding and reducing capacity
- Single node type
- App availability on topology changes
Indexing and Querying
- Secondary indexes
- Aggregates, grouping
- Basic querying / ad hoc querying
Application Characteristics - Data driven
3rd party or user defined structure (Twitter feeds)
Support for unlimited data growth (viral apps)
Data with non-homogeneous structure
Need to quickly and often change data structure
Variable length documents
Sparse data records
Hierarchical data
Application Characteristics - Performance driven
Low latency critical (ex. 1 millisecond)
High throughput (ex. 200,000 ops/sec)
Large number of users
Unknown demand with sudden growth of users/data
Predominantly direct document access
Read / Mixed / Write heavy workloads
Use Case 1: High-Availability Caching
[Diagram: user requests reach the application layer, which sends read-write requests to the Couchbase distributed cache; cache misses and write requests go to the RDBMS]
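The request flow in the caching diagram can be sketched as a simple cache-in-front-of-database reader. This is a minimal illustration under stated assumptions: plain dictionaries stand in for both the Couchbase cache and the RDBMS, and `CachedReader` is a hypothetical name, not part of any SDK.

```python
class CachedReader:
    """Serve reads from a cache and fall back to the database on a miss."""

    def __init__(self, database):
        self.database = database  # dict standing in for the RDBMS
        self.cache = {}           # dict standing in for the distributed cache
        self.db_reads = 0         # counts how often the database was hit

    def read(self, key):
        if key in self.cache:     # cache hit: the database is not touched
            return self.cache[key]
        self.db_reads += 1        # cache miss: go to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Writes go to the database, and the cache is refreshed so that
        # later reads do not return a stale value.
        self.database[key] = value
        self.cache[key] = value
```

Repeated reads of a popular key hit the database only once, which is how the cache takes load off the RDBMS.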
Use Case 1: High-Availability Caching
Data cached in Couchbase?
Application objects
Popular search query results
Session information
Heavily accessed web landing pages
Application characteristic
Speed up RDBMS
Consistently low response times for document / key lookups
High-availability 24x7x365
Replacement for entire caching tier
Use Case 2: Session Store
Application characteristic
Extremely fast access to session data using unique session ID
Easy scalability to handle fast growing number of users and user-generated data
Always-on functionality for global user base
Data stored in Couchbase?
Session values or cookies (stored as key-value pairs)
Examples include: items in a shopping cart, flights selected, search results, etc.
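A session store of this kind boils down to key-value access by session ID. The sketch below is a local, in-memory illustration: `SessionStore` is a hypothetical name, and the time-to-live expiry is an assumption added for realism rather than something stated on the slide.

```python
import time


class SessionStore:
    """Toy session store: session ID -> session values, with expiry."""

    def __init__(self, ttl_seconds=1800.0):
        self.ttl_seconds = ttl_seconds
        self._sessions = {}  # session_id -> (expires_at, data)

    def save(self, session_id, data, now=None):
        """Store the session values and restart the expiry clock."""
        now = time.time() if now is None else now
        self._sessions[session_id] = (now + self.ttl_seconds, dict(data))

    def load(self, session_id, now=None):
        """Return the session values, or None if missing or expired."""
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None or entry[0] <= now:
            self._sessions.pop(session_id, None)  # drop expired sessions
            return None
        return entry[1]
```

The `now` parameter exists only so the behavior can be exercised deterministically; callers would normally omit it.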
Use Case 3: Globally Distributed User Profile Store
Application characteristic
Extremely fast access to individual profiles
Always online system as multiple applications access user profiles
Flexibility to add and update user attributes
Easy scalability to handle fast growing number of users
Data stored in Couchbase?
User profile with unique ID
User setting / preferences
User’s network
User application state
Use Case 4: Data Aggregation
Application characteristic
Flexibility to store any kind of content
Flexibility to handle schema changes
Full-text search across data set
High speed data ingestion
Scales horizontally as more content gets added to the system
Data stored in Couchbase?
Social media feeds: Twitter, Facebook, LinkedIn
Blogs, news, press articles
Data service feeds: Hoovers, Reuters
Data from other systems
Use Case 5
Content and Metadata
Nature, Field, Summer, Farm, Sky, Environment, Landscaped, Grass, Green, Blue, Oilseed, Rape, Agriculture, Scenics, Land, Spring, Non-Urban Scene, Environmental, Conservation, Sun, Meadow, Horizon, Season, Cloud, Landscapes, Travel Locations, Pasture, Cultivated Land, Stratosphere, cloudy day, Oilseed Rape, Rural Scene, Vibrant Color, No People, Beauty In Nature, Gold, Color Image, Beauty, Idyllic, Multicolored, Yellow, Colors, Cloudscape, Outdoors, Plant, Sunlight, Horizon Over Land
Content and metadata store
Use Case 5: Content and Metadata Store
Application characteristic
Flexibility to store any kind of content
Fast access to content metadata (most accessed objects) and content
Full-text search across data set
Scales horizontally as more content gets added to the system
Data stored in Couchbase?
Content metadata
Content: articles, text
Landing pages for website
Digital content: eBooks, magazine, research material
Document Databases Easily Accommodate Unstructured Data
Hotels
{
  "ID": 1,
  "NAME": "Fairmont San Francisco",
  "DESCRIPTION": "Historic grandeur…",
  "AVG_REVIEWER_SCORE": "4.3",
  "AMENITY": [
    {"TYPE": "gym", "DESCRIPTION": "fitness center"},
    {"TYPE": "wifi", "DESCRIPTION": "free wifi"}
  ],
  "RATE_TYPE": "nightly",
  "PRICE": "$199",
  "REVIEWS": ["review_1", "review_2"],
  "ATTRACTIONS": "Chinatown"
}
{
  "ID": 2,
  "NAME": "W San Francisco",
  "DESCRIPTION": "Chic, hip accommodations..",
  "AVG_REVIEWER_SCORE": "4.0",
  "AMENITY": [
    {"TYPE": "spa", "DESCRIPTION": "Bliss Spa"},
    {"TYPE": "wifi", "DESCRIPTION": "free wifi"},
    {"TYPE": "dining", "DESCRIPTION": "bar/lounge"}
  ],
  "RATE_TYPE": "nightly",
  "PRICE": "$194",
  "REVIEWS": ["review_1", "review_2"]
}
Document Databases Easily Accommodate Unstructured Data
Hotels
{ "ID": 1, "NAME": "Fairmont San Francisco", … }
Reviews
{
  "REVIEW_ID": 1,
  "REVIEW": "Loved Hotel & Location",
  "WOULD RECOMMEND": "yes",
  "AVG_REVIEWER_SCORE": "5",
  "REVIEW_DATE": "May 29, 2013",
  "USER_PROFILE_ID": "271"
}
{
  "REVIEW_ID": 2,
  "REVIEW": "Nice, but a few kinks",
  "WOULD RECOMMEND": "yes",
  "AVG_REVIEWER_SCORE": "4",
  "REVIEW_DATE": "May 22, 2013",
  "USER_PROFILE_ID": "923"
}
Document Databases Easily Accommodate Unstructured Data
Hotel Descriptions
{ "ID": 1, "NAME": "Fairmont San Francisco", … }
Reviews
{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…", … }
{ "REVIEW_ID": 2, "REVIEW": "Nice, but …", … }
User Profiles
{
  "USER_ID": 1,
  "DISPLAY_NAME": "Ted’s Trip Experience",
  "CITY": "Saratoga",
  "STATE": "California",
  "NUM_OF_REVIEWS": "8"
}
{
  "USER_ID": 2,
  "DISPLAY_NAME": "WhatWhat567",
  "CITY": "Kansas City",
  "STATE": "MO",
  "NUM_OF_REVIEWS": "3"
}
Document Databases Easily Accommodate Unstructured Data
Hotel Descriptions
{ "ID": 1, "NAME": "Fairmont San Francisco", … }
Reviews
{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…", … }
{ "REVIEW_ID": 2, "REVIEW": "Nice, but …", … }
User Profiles
{ "USER_ID": 1, "DISPLAY": "Ted’s Trip…", … }
{ "USER_ID": 2, "DISPLAY": "WhatWhat …", … }
Document IDs associate related objects
Hotels point to reviews
Reviews point to users
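The way document IDs tie hotels, reviews, and user profiles together can be sketched with a plain dictionary acting as the document store. The key scheme ("hotel_1", "review_1", "user_271"), the pairing of reviews with particular profiles, and the function name `hotel_with_reviews` are all invented for illustration; the field names follow the hotel example in the slides.

```python
# Dictionary standing in for a document store: document ID -> JSON document.
DOCUMENTS = {
    "hotel_1": {
        "ID": 1,
        "NAME": "Fairmont San Francisco",
        "REVIEWS": ["review_1", "review_2"],
    },
    "review_1": {"REVIEW_ID": 1, "REVIEW": "Loved Hotel & Location", "USER_PROFILE_ID": "271"},
    "review_2": {"REVIEW_ID": 2, "REVIEW": "Nice, but a few kinks", "USER_PROFILE_ID": "923"},
    # Hypothetical profile keys derived from USER_PROFILE_ID.
    "user_271": {"DISPLAY_NAME": "Ted's Trip Experience"},
    "user_923": {"DISPLAY_NAME": "WhatWhat567"},
}


def hotel_with_reviews(store, hotel_key):
    """Follow hotel -> reviews -> users by document ID and assemble one view."""
    hotel = store[hotel_key]
    reviews = []
    for review_key in hotel["REVIEWS"]:          # hotels point to reviews
        review = store[review_key]
        user = store.get("user_" + review["USER_PROFILE_ID"])  # reviews point to users
        reviews.append({
            "review": review["REVIEW"],
            "reviewer": user["DISPLAY_NAME"] if user else None,
        })
    return {"hotel": hotel["NAME"], "reviews": reviews}
```

Each hop is a direct key lookup rather than a join, so the application decides how many related documents to fetch.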
Indexing with Document Databases
Index on AVG_REVIEWER_SCORE
…
4.0, doc_id
4.0, doc_id
4.1, doc_id
4.3, doc_id
5.0, doc_id
…
Querying with Document Databases
Query on AVG_REVIEWER_SCORE
…
3.4, doc_id
3.4, doc_id
3.5, doc_id
3.6, doc_id
3.7, doc_id
3.8, doc_id
4.0, doc_id
4.1, doc_id
4.3, doc_id
4.5, doc_id
4.7, doc_id
4.9, doc_id
5.0, doc_id
…
5.0, doc_id
[Diagram labels: Query, Index, Matching Results]
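The index and query slides describe a sorted list of (score, doc_id) entries that a query scans by range. A minimal sketch of that idea follows, assuming an in-memory sorted list searched with binary search; `FieldIndex` is a hypothetical name, "hotel_3" and "hotel_4" are made-up documents, and real document databases build and maintain their indexes quite differently.

```python
import bisect


class FieldIndex:
    """Sorted secondary index over one field of a set of documents."""

    def __init__(self, documents, field):
        # Documents that lack the field are simply left out of the index.
        entries = sorted(
            (float(doc[field]), doc_id)
            for doc_id, doc in documents.items()
            if field in doc
        )
        self.scores = [score for score, _ in entries]
        self.doc_ids = [doc_id for _, doc_id in entries]

    def query(self, low, high):
        """Return doc IDs whose indexed value lies in [low, high]."""
        start = bisect.bisect_left(self.scores, low)
        end = bisect.bisect_right(self.scores, high)
        return self.doc_ids[start:end]


HOTELS = {
    "hotel_1": {"NAME": "Fairmont San Francisco", "AVG_REVIEWER_SCORE": "4.3"},
    "hotel_2": {"NAME": "W San Francisco", "AVG_REVIEWER_SCORE": "4.0"},
    "hotel_3": {"NAME": "Hypothetical Inn", "AVG_REVIEWER_SCORE": "3.5"},
    "hotel_4": {"NAME": "Unrated Hotel"},  # no score, so not indexed
}

index = FieldIndex(HOTELS, "AVG_REVIEWER_SCORE")
matches = index.query(4.0, 5.0)  # hotels scoring between 4.0 and 5.0
```

The query returns document IDs only; fetching the matching documents is a separate key lookup, as in the diagram.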
@dborkar