webinar - making the shift from relational to nosql

Making the Shift fromRelational to NoSQL

Tugdual “Tug” Grall

Technical Evangelist

Agenda

• Why moving to NoSQL?

• Key differences

• How to move?

• Modeling the data

• Accessing the data

• Migrating the data

You can download Couchbase at www.couchbase.com/download

Why transition to NoSQL?

You are here because..

You want to build applications

and currently have a relational database

It’s probably gotten a little bit big..

http://www.seoclerks.com/imagedb/2005/BIG-CAT-FOUND-Spoh

Or maybe huge

http://www.2pep.com/extreme-funny-stuff/

And complicated

You’re not aloneTop two big drivers for NoSQL

adoption

Lack of flexibility/rigid schemas

Inability to scale out data

Performance challenges

Cost All of these Other

49%

35%

29%

16%12% 11%

Source: Couchbase Survey, December 2011, n = 1351.

NoSQL catalog

Key-Value

memcached redis

Data Structure Document Column Graph

mongoDB

couchbase cassandra

Cac

he

(mem

ory

on

ly)

Dat

abas

e(m

emo

ry/d

isk)

Neo4j

Key Differences

Relational vs Document data model

Relational data model Document data modelCollection of complex documents with

arbitrary, nested data formats andvarying “record” format.

Highly-structured table organization with rigidly-defined data formats and

record structure.

JSONJSON

JSON

C1 C2 C3 C4

{

}

Document Databases

• Each record in the database is a self-describing document

• Each document has an independent structure

• Documents can be complex • All databases require a unique key• Documents are stored using JSON or

XML or their derivatives• Content can be indexed and queried • Offer auto-sharding for scaling and

replication for high-availability

{ “UUID”: “21f7f8de-8051-5b89-86“Time”: “2011-04-01T13:01:02.42“Server”: “A2223E”,

“Calling Server”: “A2213W”,“Type”: “E100”,“Initiating User”: “[email protected]”,“Details”:

{

“IP”: “10.1.1.22”,“API”: “InsertDVDQueueItem”,“Trace”: “cleansed”,“Tags”:

[

“SERVER”, “US-West”, “API”]

}

}

Example: User Profile

Address Info

1 DEN 30303CO

2 MV 94040CA

3 CHI 60609IL

User Info

KEY First ZIP_idLast

4 NY 10010NY

1 Dipti 2Borkar

2 Joe 2Smith

3 Ali 2Dodson

4 John 3Doe

ZIP_id CITY ZIPSTATE

1 2

2 MV 94040CA

To get information about specific user, you perform a join across two tables

All data in a single document

Document Example: User Profile

{“ID”: 1,“FIRST”: “Tug”,“LAST”: “Grall”,“ZIP”: “94040”,“CITY”: “MV”,“STATE”: “CA”

}JSON

= +

User ID First Last Zip

1 Dipti Borkar 94040

2 Joe Smith 94040

3 Ali Dodson 94040

4 Sarah Gorin NW1

5 Bob Young 30303

6 Nancy Baker 10010

7 Ray Jones 31311

8 Lee Chen V5V3M

•••

50000 Doug Moore 04252

50001 Mary White SW195

50002 Lisa Clark 12425

Country ID

TEL3

001

Country ID Country name

001 USA

002 UK

003 Argentina

004 Australia

005 Aruba

006 Austria

007 Brazil

008 Canada

009 Chile

•••

130 Portugal

131 Romania

132 Russia

133 Spain

134 Sweden

User ID Photo ID Comment

2 d043 NYC

2 b054 Bday

5 c036 Miami

7 d072 Sunset

5002 e086 Spain

Photo Table

001

007

001

133

133

User ID Status ID Text

1 a42 At conf

4 b26 excited

5 c32 hockey

12 d83 Go A’s

5000 e34 sailing

Status Table

134

007

008

001

005

Country Table

User ID Affl ID Affl Name

2 a42 Cal

4 b96 USC

7 c14 UW

8 e22 Oxford

Affiliations TableCountry

ID

001

001

001

002

Country ID

Country ID

001

001

002

001

001

001

008

001

002

001

User Table

.

.

.

Making a Change Using RDBMS

Making the Same Change with a Document Database

{“ID”: 1,“FIRST”: “Tug”,“LAST”: “Grall”,“ZIP”: “94040”,“CITY”: “MV”,“STATE”: “CA”,“STATUS”: { “TEXT”: “At Conf”

}

}

“GEO_LOC”: “134” },“COUNTRY”: ”USA”

Just add information to a document

JSON

,}

Where is NoSQL a good fit?

Market Adoption

Internet Companies Enterprises

• Social Gaming

• Ad Networks

• Social Networks

• Online Business Services

• E-Commerce

• Online Media

• Content Management

• Cloud Services

• Communications

• Retail

• Financial Services

• Health Care

• Automotive/Airline

• Agriculture

• Consumer Electronics

• Business Systems

Market Adoption – Customers

Internet Companies Enterprises

More than 300 customers -- 5,000 production deployments worldwide

Application Characteristics - Data driven

• 3rd party or user defined structure (Twitter feeds)

• Support for unlimited data growth (Viral apps)

• Data with non-homogenous structure

• Need to quickly and often change data structure

• Variable length documents

• Sparse data records

• Hierarchical data

Couchbase is a good fit

Application Characteristics -Performance driven

• Low latency critical (ex. 1millisecond)

• High throughput (ex. 200000 ops / sec)

• Large number of users

• Unknown demand with sudden growth of users/data

• Predominantly direct document access

• Read / Mixed / Write heavy workloads

Couchbase is a good fit

Common Use CasesSocial Gaming

• Couchbase stores player and game data

• Examples customers include: Zynga

• Tapjoy, Ubisoft, Tencent

Mobile Apps

• Couchbase stores user info and app content

• Examples customers include: Kobo, Playtika

Ad Targeting

• Couchbase stores user information for fast access

• Examples customers include: AOL, Mediamind, Convertro

Session store

• Couchbase Server as a key-value store

• Examples customers include: Concur, Sabre

User Profile Store

• Couchbase Server as a key-value store

• Examples customers include: Tunewiki

High availability cache

• Couchbase Server used as a cache tier replacement

• Examples customers include: Orbitz

Content & Metadata Store

• Couchbase document store with Elastic Search

• Examples customers include: McGraw Hill

3rd party data aggregation

• Couchbase stores social media and data feeds

• Examples customers include: Sambacloud

From SQL to NoSQL

Document Design

Document Design

• One document that contains all related data

Data is de-normalized

Better performance and scale

Eliminate client-side joins

But if the document grows continuously or has write contention, it should be split

• Separate documents for different object types with cross references

Data duplication is reduced

Objects may not be co-located

Transactions supported only on a document boundary

Most document databases do not support joins

Example: Entities for a Blog

• User profile

The main pointer into the user data

• Blog entries

• Badge settings, like a twitter badge

• Blog posts

Contains the blogs themselves

• Blog comments

• Comments from other users

Single vs. Multiple Documents

• We need to decide the best way to store comments

{

}

Single Multiple

vs.

Document

Comment

Comment

Comment



{


[


}

}

Blog Document – Option 1 – Single document

{

“_id”: “Couchbase_Hello_World”,

“author”: “dborkar”,

“type”: “post”

“title”: “Hello World”,

“format”: “markdown”,

“body”: “Hello from [Couchbase](http://couchbase.com).”,

“html”: “<p>Hello from <a href=\“http: …

“comments”:[

[“format”: “markdown”, “body”:”Awesome post!”],

[“format”: “markdown”, “body”:”Like it.” ]

]}

http://couchbase.com

Blog Document – Option 2 - Split into multiple docs



{


[


}

}

{

“_id”: “Coucbase_Hello_World”,

“author”: “tgrall”,

“type”: “post”

“title”: “Hello World”,


“body”: “Hello from

[Couchbase](http://couchbase.com).”,

“html”: “<p>Hello from <a href=\“http: …

“comments”:[

“comment1_Couchbase_Hello_world”

]

}{ “UUID”: “21f7f8de-8051-5b89-86“Time”: “2011-04-01T13:01:02.42“Server”: “A2223E”,


{


[


}

}

{

“_id”: “comment1_Couchbase_Hello_World”,


“body”:”Awesome post!”

}

BLOG DOC

COMMENT

http://couchbase.com

Threaded Comments

• You can imagine how to take this to a threaded list

Blog

First comment

Reply to comment

More Comments

List

List

Advantages

• Only fetch the data when you need it• For example, rendering part of a web page

• Spread the data and load across the entire cluster

Which one is the best?

Case by Case:

• How data are created by the application?

• How data are read by the application?

• Focus on the most common use case in the application

Note: changing document “schema” is really easy !

Write your application

Mental Adjustments

• In SQL we tend to want to avoid hitting the database as much as possible

• In Couchbase, gets and sets on documents are in memory. This is fast.

• The key to finding data is the key of the document.

• Many newcomers see views as a replacement for key design, because it seems more “SQL” –like.

Code Change

• Replace your database calls by Couchbase calls: Find/Select by ID : couchbase.get(key);

Save/Update : couchbase.set(key, document);

Delete : couchbase.delete(key);

• If you need locking for concurrent access use CAS Check and Swap : Optimistic Locking

• All these operations are atomic

Query the data

• Create Couchbase views to index your documents

• Use views to: Index documents on primary index

Index documents on secondary index

Query these indexes using range, aggregate, …

CRUD Operations MAP()

emit()

(processed)

Couchbase Views

Data Migration

Data Migration

Move your data:• Using ETL or other tools• Own code

Moving Data using an ETL

DEMO

Moving Data Using TalendMoving Data Using Java Importer

Conclusion

• Move to NoSQL to gain scalability and flexibility

• Design your document based on the application logic

• Migrate the data easily with tools

• Resources:

• Java Petstore Migration: http://goo.gl/mzRNqt

• Talend Connector : http://couchbase.com/couchbase-server/connectors/talend

• Couchbase SQL Importer : https://github.com/tgrall/couchbase-sql-importer

Download Couchbase: www.couchbase.com/download

http://goo.gl/mzRNqt

http://goo.gl/mzRNqt

http://couchbase.com/couchbase-server/connectors/talend






Questions?

Thank you

Couchbase NoSQL Document Database

[email protected]

mailto:[email protected]

webinar - making the shift from relational to nosql

Technology

defined data formats

specific user

document databaseseach

relational database

selfdescribing document

structured table organization

record structure

key differenceshow