webinar - making the shift from relational to nosql
TRANSCRIPT
Making the Shift fromRelational to NoSQL
Tugdual “Tug” Grall
Technical Evangelist
Agenda
• Why moving to NoSQL?
• Key differences
• How to move?
• Modeling the data
• Accessing the data
• Migrating the data
You can download Couchbase at www.couchbase.com/download
Why transition to NoSQL?
You are here because..
You want to build applications
and currently have a relational database
It’s probably gotten a little bit big..
http://www.seoclerks.com/imagedb/2005/BIG-CAT-FOUND-Spoh
Or maybe huge
http://www.2pep.com/extreme-funny-stuff/
And complicated
You’re not aloneTop two big drivers for NoSQL
adoption
Lack of flexibility/rigid schemas
Inability to scale out data
Performance challenges
Cost All of these Other
49%
35%
29%
16%12% 11%
Source: Couchbase Survey, December 2011, n = 1351.
NoSQL catalog
Key-Value
memcached redis
Data Structure Document Column Graph
mongoDB
couchbase cassandra
Cac
he
(mem
ory
on
ly)
Dat
abas
e(m
emo
ry/d
isk)
Neo4j
Key Differences
Relational vs Document data model
Relational data model Document data modelCollection of complex documents with
arbitrary, nested data formats andvarying “record” format.
Highly-structured table organization with rigidly-defined data formats and
record structure.
JSONJSON
JSON
C1 C2 C3 C4
{
}
Document Databases
• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex • All databases require a unique key• Documents are stored using JSON or
XML or their derivatives• Content can be indexed and queried • Offer auto-sharding for scaling and
replication for high-availability
{ “UUID”: “21f7f8de-8051-5b89-86“Time”: “2011-04-01T13:01:02.42“Server”: “A2223E”,
“Calling Server”: “A2213W”,“Type”: “E100”,“Initiating User”: “[email protected]”,“Details”:
{
“IP”: “10.1.1.22”,“API”: “InsertDVDQueueItem”,“Trace”: “cleansed”,“Tags”:
[
“SERVER”, “US-West”, “API”]
}
}
Example: User Profile
Address Info
1 DEN 30303CO
2 MV 94040CA
3 CHI 60609IL
User Info
KEY First ZIP_idLast
4 NY 10010NY
1 Dipti 2Borkar
2 Joe 2Smith
3 Ali 2Dodson
4 John 3Doe
ZIP_id CITY ZIPSTATE
1 2
2 MV 94040CA
To get information about specific user, you perform a join across two tables
All data in a single document
Document Example: User Profile
{“ID”: 1,“FIRST”: “Tug”,“LAST”: “Grall”,“ZIP”: “94040”,“CITY”: “MV”,“STATE”: “CA”
}JSON
= +
User ID First Last Zip
1 Dipti Borkar 94040
2 Joe Smith 94040
3 Ali Dodson 94040
4 Sarah Gorin NW1
5 Bob Young 30303
6 Nancy Baker 10010
7 Ray Jones 31311
8 Lee Chen V5V3M
•••
50000 Doug Moore 04252
50001 Mary White SW195
50002 Lisa Clark 12425
Country ID
TEL3
001
Country ID Country name
001 USA
002 UK
003 Argentina
004 Australia
005 Aruba
006 Austria
007 Brazil
008 Canada
009 Chile
•••
130 Portugal
131 Romania
132 Russia
133 Spain
134 Sweden
User ID Photo ID Comment
2 d043 NYC
2 b054 Bday
5 c036 Miami
7 d072 Sunset
5002 e086 Spain
Photo Table
001
007
001
133
133
User ID Status ID Text
1 a42 At conf
4 b26 excited
5 c32 hockey
12 d83 Go A’s
5000 e34 sailing
Status Table
134
007
008
001
005
Country Table
User ID Affl ID Affl Name
2 a42 Cal
4 b96 USC
7 c14 UW
8 e22 Oxford
Affiliations TableCountry
ID
001
001
001
002
Country ID
Country ID
001
001
002
001
001
001
008
001
002
001
User Table
.
.
.
Making a Change Using RDBMS
Making the Same Change with a Document Database
{“ID”: 1,“FIRST”: “Tug”,“LAST”: “Grall”,“ZIP”: “94040”,“CITY”: “MV”,“STATE”: “CA”,“STATUS”: { “TEXT”: “At Conf”
}
}
“GEO_LOC”: “134” },“COUNTRY”: ”USA”
Just add information to a document
JSON
,}
Where is NoSQL a good fit?
Market Adoption
Internet Companies Enterprises
• Social Gaming
• Ad Networks
• Social Networks
• Online Business Services
• E-Commerce
• Online Media
• Content Management
• Cloud Services
• Communications
• Retail
• Financial Services
• Health Care
• Automotive/Airline
• Agriculture
• Consumer Electronics
• Business Systems
Market Adoption – Customers
Internet Companies Enterprises
More than 300 customers -- 5,000 production deployments worldwide
Application Characteristics - Data driven
• 3rd party or user defined structure (Twitter feeds)
• Support for unlimited data growth (Viral apps)
• Data with non-homogenous structure
• Need to quickly and often change data structure
• Variable length documents
• Sparse data records
• Hierarchical data
Couchbase is a good fit
Application Characteristics -Performance driven
• Low latency critical (ex. 1millisecond)
• High throughput (ex. 200000 ops / sec)
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Read / Mixed / Write heavy workloads
Couchbase is a good fit
Common Use CasesSocial Gaming
• Couchbase stores player and game data
• Examples customers include: Zynga
• Tapjoy, Ubisoft, Tencent
Mobile Apps
• Couchbase stores user info and app content
• Examples customers include: Kobo, Playtika
Ad Targeting
• Couchbase stores user information for fast access
• Examples customers include: AOL, Mediamind, Convertro
Session store
• Couchbase Server as a key-value store
• Examples customers include: Concur, Sabre
User Profile Store
• Couchbase Server as a key-value store
• Examples customers include: Tunewiki
High availability cache
• Couchbase Server used as a cache tier replacement
• Examples customers include: Orbitz
Content & Metadata Store
• Couchbase document store with Elastic Search
• Examples customers include: McGraw Hill
3rd party data aggregation
• Couchbase stores social media and data feeds
• Examples customers include: Sambacloud
From SQL to NoSQL
Document Design
Document Design
• One document that contains all related data
Data is de-normalized
Better performance and scale
Eliminate client-side joins
But if the document grows continuously or has write contention, it should be split
• Separate documents for different object types with cross references
Data duplication is reduced
Objects may not be co-located
Transactions supported only on a document boundary
Most document databases do not support joins
Example: Entities for a Blog
• User profile
The main pointer into the user data
• Blog entries
• Badge settings, like a twitter badge
• Blog posts
Contains the blogs themselves
• Blog comments
• Comments from other users
Single vs. Multiple Documents
• We need to decide the best way to store comments
{
}
Single Multiple
vs.
Document
Comment
Comment
Comment
{ “UUID”: “21f7f8de-8051-5b89-86“Time”: “2011-04-01T13:01:02.42“Server”: “A2223E”,
“Calling Server”: “A2213W”,“Type”: “E100”,“Initiating User”: “[email protected]”,“Details”:
{
“IP”: “10.1.1.22”,“API”: “InsertDVDQueueItem”,“Trace”: “cleansed”,“Tags”:
[
“SERVER”, “US-West”, “API”]
}
}
Blog Document – Option 1 – Single document
{
“_id”: “Couchbase_Hello_World”,
“author”: “dborkar”,
“type”: “post”
“title”: “Hello World”,
“format”: “markdown”,
“body”: “Hello from [Couchbase](http://couchbase.com).”,
“html”: “<p>Hello from <a href=\“http: …
“comments”:[
[“format”: “markdown”, “body”:”Awesome post!”],
[“format”: “markdown”, “body”:”Like it.” ]
]}
Blog Document – Option 2 - Split into multiple docs
{ “UUID”: “21f7f8de-8051-5b89-86“Time”: “2011-04-01T13:01:02.42“Server”: “A2223E”,
“Calling Server”: “A2213W”,“Type”: “E100”,“Initiating User”: “[email protected]”,“Details”:
{
“IP”: “10.1.1.22”,“API”: “InsertDVDQueueItem”,“Trace”: “cleansed”,“Tags”:
[
“SERVER”, “US-West”, “API”]
}
}
{
“_id”: “Coucbase_Hello_World”,
“author”: “tgrall”,
“type”: “post”
“title”: “Hello World”,
“format”: “markdown”,
“body”: “Hello from
[Couchbase](http://couchbase.com).”,
“html”: “<p>Hello from <a href=\“http: …
“comments”:[
“comment1_Couchbase_Hello_world”
]
}{ “UUID”: “21f7f8de-8051-5b89-86“Time”: “2011-04-01T13:01:02.42“Server”: “A2223E”,
“Calling Server”: “A2213W”,“Type”: “E100”,“Initiating User”: “[email protected]”,“Details”:
{
“IP”: “10.1.1.22”,“API”: “InsertDVDQueueItem”,“Trace”: “cleansed”,“Tags”:
[
“SERVER”, “US-West”, “API”]
}
}
{
“_id”: “comment1_Couchbase_Hello_World”,
“format”: “markdown”,
“body”:”Awesome post!”
}
BLOG DOC
COMMENT
Threaded Comments
• You can imagine how to take this to a threaded list
Blog
First comment
Reply to comment
More Comments
List
List
Advantages
• Only fetch the data when you need it• For example, rendering part of a web page
• Spread the data and load across the entire cluster
Which one is the best?
Case by Case:
• How data are created by the application?
• How data are read by the application?
• Focus on the most common use case in the application
Note: changing document “schema” is really easy !
Write your application
Mental Adjustments
• In SQL we tend to want to avoid hitting the database as much as possible
• In Couchbase, gets and sets on documents are in memory. This is fast.
• The key to finding data is the key of the document.
• Many newcomers see views as a replacement for key design, because it seems more “SQL” –like.
Code Change
• Replace your database calls by Couchbase calls: Find/Select by ID : couchbase.get(key);
Save/Update : couchbase.set(key, document);
Delete : couchbase.delete(key);
• If you need locking for concurrent access use CAS Check and Swap : Optimistic Locking
• All these operations are atomic
Query the data
• Create Couchbase views to index your documents
• Use views to: Index documents on primary index
Index documents on secondary index
Query these indexes using range, aggregate, …
CRUD Operations MAP()
emit()
(processed)
Couchbase Views
Data Migration
Data Migration
Move your data:• Using ETL or other tools• Own code
Moving Data using an ETL
DEMO
Moving Data Using TalendMoving Data Using Java Importer
Conclusion
• Move to NoSQL to gain scalability and flexibility
• Design your document based on the application logic
• Migrate the data easily with tools
• Resources:
• Java Petstore Migration: http://goo.gl/mzRNqt
• Talend Connector : http://couchbase.com/couchbase-server/connectors/talend
• Couchbase SQL Importer : https://github.com/tgrall/couchbase-sql-importer
Download Couchbase: www.couchbase.com/download
Questions?