SQL to NoSQL: Top 6 Questions Before Making the Move

SQL to NoSQL: Top 6 Questions
Glynn Bird
Developer Advocate @ IBM

@glynn_bird

Agenda

• Top 6 Questions When Moving to NoSQL
  1. Why NoSQL?
  2. Rows and Tables Become ... What?
  3. Will I Have to Rebuild My App?
  4. How do I query data?
  5. What's _rev?
  6. Does it replicate?

• Live Q&A

1. Why NoSQL?

But, What Is NoSQL, Really?

• Umbrella term for databases using non-SQL query languages
  • Key-Value stores
  • Column-family stores
  • Document stores
  • Graph stores

• Some also say "non-relational," because data is not decomposed into separate tables, rows, and columns
  • It's still possible to represent relationships in NoSQL

• The question is, are these relationships always necessary?

NoSQL Document Stores

• That's databases like MongoDB, Apache CouchDB™, Cloudant, and Amazon DynamoDB

• Optimized for "semi-structured" or "schema-optional" data
  • People say "unstructured," but that's inaccurate

• Each document has its own structure

[Diagram: Cloudant features, including multi-node clustering, Cloudant Geo, Cloudant Query (Mango), Cloudant Search (Lucene), and the Dashboard]

Schema Flexibility

• Cloudant uses JavaScript Object Notation (JSON) as its data format
• Cloudant is based on Apache CouchDB. In both systems, a "database" is simply a collection of JSON documents

{
  "docs": [
    {
      "_id": "df8cecd9809662d08eb853989a5ca2f2",
      "_rev": "1-8522c9a1d9570566d96b7f7171623270",
      "Movie_runtime": 162,
      "Movie_rating": "PG-13",
      "Person_name": "Zoe Saldana",
      "Actor_actor_id": "0757855",
      "Movie_genre": "AVYS",
      "Movie_name": "Avatar",
      "Actor_movie_id": "0499549",
      "Movie_earnings_rank": "1",
      "Person_pob": "New Jersey, USA",
      "Person_id": "0757855",
      "Movie_id": "0499549",
      "Movie_year": 2009,
      "Person_dob": "1978-06-19"
    }
  ]
}

The Cloudant Data Layer

• Distributed NoSQL data persistence layer

• Available as a fully-managed DBaaS, or managed by you on-premises

• Transactional JSON document database with REST API

• Spreads data across data centers & devices for scale & high availability

• Ideal for apps that require:
  • Massive, elastic scalability
  • High availability
  • Geo-location services
  • Full-text search
  • Offline-first design for occasionally connected users

Not One DB Server; a Cluster of Servers

• A Cloudant cluster
  • Horizontal scale
  • Redundant load balancers backed by multiple DB servers

• Designed for durability
  • Saves multiple copies of data
  • Spreads copies across cluster
  • All replicas do reads & writes

• Access Cloudant over the Web
  • Developers get an API
  • Cloudant manages it all behind the scenes

Horizontal Scaling

• Shard across many commodity servers vs. few expensive ones
• Performance improves linearly with cost, not exponentially
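As a rough illustration of what "shard across many servers" and "saves multiple copies" can mean in practice, here is a minimal Python sketch. The slides do not describe Cloudant's actual placement scheme, so the hash function, the shard count, the node names, and the choice of three copies are all assumptions made for the example.

```python
import hashlib


def shard_for(doc_id: str, shard_count: int) -> int:
    """Map a document _id to a shard number in [0, shard_count)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def replica_nodes(shard: int, nodes: list[str], copies: int = 3) -> list[str]:
    """Pick distinct nodes for a shard by walking the node list as a ring.

    The default of 3 copies is an assumption for this sketch.
    """
    count = min(copies, len(nodes))
    return [nodes[(shard + i) % len(nodes)] for i in range(count)]


# Hypothetical four-node cluster
nodes = ["db1", "db2", "db3", "db4"]
shard = shard_for("user:456", shard_count=8)
holders = replica_nodes(shard, nodes)
```

Adding a node to the list gives every shard one more place to live, which is the sense in which capacity grows by adding commodity machines rather than buying a bigger one.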

2. Rows and Tables Become ... What?

... This!

SQL Terms/Concepts --> Document Store Terms/Concepts

database --> database
table --> bunch of documents
row --> document
column --> field
materialized view --> index/database view/secondary index
primary key --> "_id":
table JOIN operations --> entity relations

Rows --> Documents

• Use some field to group documents by schema
• Example:
  "type":"user" or "type":"book"
  "_id":"user:456" or "_id":"book:9988"

Tables --> Databases

• Put all tables in one database; use "type": to distinguish
• Model entity relationships with secondary indexes

• http://wiki.apache.org/couchdb/EntityRelationship
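The "type" field and prefixed "_id" conventions can be sketched in a few lines of Python. This is only an illustration: the function name, the table names, and the sample rows are hypothetical, and nothing here talks to a database.

```python
def row_to_document(table: str, primary_key: str, row: dict) -> dict:
    """Turn one SQL-style row into a JSON-style document.

    The table name becomes the "type" field and a prefix of "_id", so
    documents from different "tables" can share one database.
    """
    doc = dict(row)                     # copy so the caller's row is untouched
    key_value = doc.pop(primary_key)    # the primary key moves into "_id"
    doc["_id"] = f"{table}:{key_value}"
    doc["type"] = table
    return doc


# Hypothetical rows from a "user" table and a "book" table
user_doc = row_to_document("user", "id", {"id": 456, "name": "John Smith"})
book_doc = row_to_document("book", "id", {"id": 9988, "title": "Example Title"})
```

Both documents can then live side by side in a single database, and a secondary index on "type" separates them again at query time.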

3. How Do You Query NoSQL?

Indexes and Queries

• An "index" in Cloudant is not strictly a performance optimization
  • Instead, more akin to "materialized view" in RDBMS terms
  • Index also called a "database view" in Cloudant

• Index, then query
  • You need one before you can do the other

• Create index, then query by URL
  • Can create a secondary index on any field within a document
  • You get primary index (based on reserved "_id": field) by default

• Indexes precomputed, updated in real time
  • Indexes are updated using incremental MapReduce
  • You don't need to rebuild the entire index every time a document is changed, added, or deleted
  • Performant at big-honkin' scale
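To make "incremental MapReduce" a little more concrete, here is a toy Python model of an index that is updated one document at a time. It is an analogy only: in CouchDB and Cloudant the map function is JavaScript stored in a design document, and the class and function names below are invented for this sketch.

```python
from collections import defaultdict


def map_by_type(doc):
    """Map step: emit (key, value) pairs for one document."""
    if "type" in doc:
        yield doc["type"], 1


class CountIndex:
    """Toy secondary index that keeps a running sum per emitted key."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.emitted = {}               # _id -> rows this document last emitted
        self.totals = defaultdict(int)  # key -> reduced value (a sum)

    def update(self, doc):
        """Apply one added or changed document; other documents are untouched."""
        self.remove(doc["_id"])
        rows = list(self.map_fn(doc))
        for key, value in rows:
            self.totals[key] += value
        self.emitted[doc["_id"]] = rows

    def remove(self, doc_id):
        """Retract whatever a deleted or replaced document had emitted."""
        for key, value in self.emitted.pop(doc_id, []):
            self.totals[key] -= value

    def query(self, key):
        return self.totals.get(key, 0)


index = CountIndex(map_by_type)
index.update({"_id": "user:456", "type": "user"})
index.update({"_id": "user:457", "type": "user"})
index.update({"_id": "book:9988", "type": "book"})
index.remove("user:457")  # only this document's rows are retracted
```

The point of the sketch is that each change touches only the rows that one document emitted, so the cost of an update does not depend on how many documents the index already covers.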

One Cloudant DB, Many Indexes

The Cloudant API

Cloudant Query

curl -X POST 'https://<accountname>.cloudant.com/users/_find' \
  -H 'Content-Type: application/json' \
  -d '{
    "selector": {
      "age": {
        "$gt": 25,
        "$lte": 50
      }
    }
  }'

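The same kind of Cloudant Query request can be assembled from Python's standard library. This is a sketch under assumptions: the account name is a placeholder, authentication is left out, and the request is only built, not sent, so it runs without network access.

```python
import json
import urllib.request

ACCOUNT = "example-account"  # hypothetical account name, not a real one


def build_find_request(account, database, selector):
    """Build (but do not send) a POST to the database's _find endpoint."""
    url = f"https://{account}.cloudant.com/{database}/_find"
    body = json.dumps({"selector": selector}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Users older than 25 and no older than 50
request = build_find_request(ACCOUNT, "users", {"age": {"$gt": 25, "$lte": 50}})

# urllib.request.urlopen(request) would send it; that step needs a real
# account and credentials, so it is omitted here.
```

Keeping the selector as a plain dictionary means the query can be composed and inspected in code before anything is sent over the wire.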
4. Will I Have to Rebuild My App?

By ripping out the bad parts:

• Extract, Transform, Load

• Schema migrations

• JOINs that don't scale

Each of My Tables Becomes a Different Type of JSON Document?

No

• Fancy explanation:
  • Best practice is to denormalize data rather than keep it in 3rd normal form

• Or, less fancy:
  • Smoosh relationships for each entry all together into one JSON doc

• Denormalization
  • Approach to data modeling that shards well and scales well
  • Works well with data that is somewhat static, or infrequently updated

A smooshed and griddled cheese sandwich

Example

{
  "_id": "johnsmith@us.ibm.com",
  "_rev": "12-89e6128fb2d3e2e14559e796b6a71c9d",
  "name": "John Smith",
  "title": "Technical Sales Manager",
  "products": [ "Cloudant", "Information Server" ],
  "languages": [ "English" ],
  "geolocation": {
    "coordinates": [ -122.18258, 37.880058 ],
    "type": "point"
  },
  "address": {
    "street": "63 Citron knoll",
    "city": "Orinda",
    "state": "CA",
    "country": "USA"
  }
}
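One way to picture the "smooshing" is as a join done once, at write time. The Python sketch below folds rows from several normalized tables into a single document shaped like the example. The table layouts and the person_email column are hypothetical; the slides do not show the source schema.

```python
def denormalize_person(person, products, languages, addresses):
    """Fold one person's related rows into a single JSON-style document."""
    email = person["email"]
    address = next(a for a in addresses if a["person_email"] == email)
    return {
        "_id": email,
        "name": person["name"],
        "title": person["title"],
        "products": [r["product"] for r in products if r["person_email"] == email],
        "languages": [r["language"] for r in languages if r["person_email"] == email],
        "address": {k: v for k, v in address.items() if k != "person_email"},
    }


# Hypothetical normalized rows
person = {"email": "johnsmith@us.ibm.com", "name": "John Smith",
          "title": "Technical Sales Manager"}
products = [
    {"person_email": "johnsmith@us.ibm.com", "product": "Cloudant"},
    {"person_email": "johnsmith@us.ibm.com", "product": "Information Server"},
]
languages = [{"person_email": "johnsmith@us.ibm.com", "language": "English"}]
addresses = [{"person_email": "johnsmith@us.ibm.com", "street": "63 Citron knoll",
              "city": "Orinda", "state": "CA", "country": "USA"}]

doc = denormalize_person(person, products, languages, addresses)
```

Reading the person back is then a single document fetch with no JOIN, at the cost of rewriting the document whenever one of the folded-in values changes.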

5. Does it replicate?

{
  "_id": "johnsmith@us.ibm.com",
  "_rev": "12-89e6128fb2d3e2e14559e796b6a71c9d",
  "name": "John Smith",
  "title": "Technical Sales Manager",
  "products": [ "Cloudant", "Information Server" ],
  "languages": [ "English" ],
  "geolocation": {
    "coordinates": [ -122.18258, 37.880058 ],
    "type": "point"
  },
  "address": {
    "street": "63 Citron knoll",
    "city": "Orinda",
    "state": "CA",
    "country": "USA"
  }
}
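The "_rev" value in the document above has two parts: a counter that goes up each time the document is revised, and a hash. A small Python sketch of pulling the two apart follows; the function name is made up for the example.

```python
def parse_rev(rev: str) -> tuple[int, str]:
    """Split a revision string such as "12-89e6..." into (generation, hash)."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


generation, digest = parse_rev("12-89e6128fb2d3e2e14559e796b6a71c9d")
# generation is 12: the document has been written twelve times
```

In CouchDB-style databases an update has to quote the current "_rev", which is how the database notices that two writers started from the same version of a document.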

Replication targets

• Apache CouchDB
• IBM Cloudant
• PouchDB (client & server)
• Cloudant Sync Libraries
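Replication between any two of these targets is requested with a small JSON document that names a source and a target. The Python sketch below only builds that body, in the shape accepted by CouchDB's _replicate endpoint; the URLs are placeholders and nothing is sent.

```python
import json


def replication_job(source_url, target_url, continuous=False, create_target=False):
    """Build the JSON body for a CouchDB-style replication request."""
    return {
        "source": source_url,
        "target": target_url,
        "continuous": continuous,        # keep syncing as new changes arrive
        "create_target": create_target,  # create the target database if missing
    }


# Hypothetical source and target databases
job = replication_job(
    "https://example-account.cloudant.com/articles",
    "http://localhost:5984/articles",
    continuous=True,
    create_target=True,
)
body = json.dumps(job)
```

Because the job is just "copy from this URL to that URL", the same request shape covers cloud-to-cloud, cloud-to-laptop, and laptop-to-cloud copies.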

www.glynnbird.com

• My home page

• Cloudant database of articles

• Replicated to PouchDB

• Appcache for offline first

• http://www.glynnbird.com/

6. How do I get data in and out?

• Yes
  • https://cloudant.com/for-developers/migrating-data/

• But every use case is different and everyone’s data is different

• Lots of DIY tools on GitHub that could work for you

• Cloudant's Homegrown CSV --> JSON Tools
• Python: https://github.com/claudiusli/csv-import

• Java: https://github.com/cavanaugh-ibm/db-data-loader

• Node: https://github.com/glynnbird/couchimport
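For a sense of what a CSV --> JSON import involves, here is a minimal Python sketch using only the standard library. It is not how the tools listed above are implemented. It turns CSV text into the {"docs": [...]} shape that the _bulk_docs endpoint accepts, and the "type" value and sample data are invented for the example.

```python
import csv
import io
import json


def csv_to_bulk_docs(csv_text, doc_type):
    """Convert CSV text into a {"docs": [...]} payload, one document per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row in reader:
        doc = dict(row)          # column names become field names
        doc["type"] = doc_type   # tag each document so it can be indexed by type
        docs.append(doc)
    return {"docs": docs}


sample = "name,year\nAvatar,2009\n"
payload = csv_to_bulk_docs(sample, "movie")
body = json.dumps(payload)  # ready to POST to <database>/_bulk_docs
```

Note that every CSV value arrives as a string ("2009", not 2009), so a real import usually needs a step that converts numbers and dates before the documents are written.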

Simple Data Pipe

• https://github.com/ibm-cds-labs/pipes

Simple Search Service

https://developer.ibm.com/clouddataservices/simple-search-service/

Glynn Bird
Developer Advocate, Cloud Data Services
glynn.bird@uk.ibm.com
@glynn_bird
github.com/glynnbird

Legal Slide #1

"Apache", "CouchDB", "Apache CouchDB", "Apache Lucene", "Lucene", and the CouchDB logo are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

Legal Slide #2

IBM and the IBM Cloudant logo are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml
