Introducing Azure DocumentDB - NoSQL, No Problem

Upload: andrew-liu

Post on 11-Feb-2017

TRANSCRIPT

Page 1: Introducing Azure DocumentDB - NoSQL, No Problem

Hello DocumentDB

NoSQL, No Problem

{ "name": "Andrew Liu", "e-mail": "[email protected]", "twitter": "@aliuy8"}

Page 2: Introducing Azure DocumentDB - NoSQL, No Problem

First… a Rant

Page 3: Introducing Azure DocumentDB - NoSQL, No Problem

managing servers makes me cry

Page 4: Introducing Azure DocumentDB - NoSQL, No Problem

structuring data is really hard

Page 5: Introducing Azure DocumentDB - NoSQL, No Problem

backfilling data and managing indexes makes me angry

Page 6: Introducing Azure DocumentDB - NoSQL, No Problem

A Solution

Page 7: Introducing Azure DocumentDB - NoSQL, No Problem

DocumentDB

document-database… as a service!

Page 8: Introducing Azure DocumentDB - NoSQL, No Problem

What’s a document database?

• Part of the NoSQL family of databases
• Built for simplicity, scale and performance
• Non-relational, no schema enforced by the database
• Flexible query options

Great for these documents …

Page 9: Introducing Azure DocumentDB - NoSQL, No Problem

What’s a document database?

Not ideal for these documents …

Page 10: Introducing Azure DocumentDB - NoSQL, No Problem

What’s a document database?

Definitely not for these documents …

Page 11: Introducing Azure DocumentDB - NoSQL, No Problem

Heterogeneous data

Page 12: Introducing Azure DocumentDB - NoSQL, No Problem

Item                                      Author               Pages  Language
Harry Potter and the Sorcerer’s Stone     J.K. Rowling         309    English
Game of Thrones: A Song of Ice and Fire   George R.R. Martin   864    English

Page 13: Introducing Azure DocumentDB - NoSQL, No Problem

Item                                      Author               Pages  Language
Harry Potter and the Sorcerer’s Stone     J.K. Rowling         309    English
Game of Thrones: A Song of Ice and Fire   George R.R. Martin   864    English
Lenovo Thinkpad X1 Carbon                 ???                  ???    ???

Page 14: Introducing Azure DocumentDB - NoSQL, No Problem

!=

Page 15: Introducing Azure DocumentDB - NoSQL, No Problem

!=

Page 16: Introducing Azure DocumentDB - NoSQL, No Problem

{
  "ItemType": "Book",
  "Title": "Harry Potter and the Sorcerer's Stone",
  "Author": "J.K. Rowling",
  "Pages": "309",
  "Languages": [ "English", "Spanish", "Portuguese", "Russian", "French" ]
}

{
  "ItemType": "Laptop",
  "Name": "Lenovo Thinkpad X1 Carbon",
  "Processor": "Core i7 3.3 Ghz",
  "Memory": "8 GB DDR3L SDRAM",
  "Storage": "256 GB SSD",
  "Graphics": "Intel HD Graphics 4400",
  "Weight": "1 pound"
}

Page 17: Introducing Azure DocumentDB - NoSQL, No Problem

Rapid Iterative Development

Page 18: Introducing Azure DocumentDB - NoSQL, No Problem

3rd Party Data

Page 19: Introducing Azure DocumentDB - NoSQL, No Problem

It just works.

Page 20: Introducing Azure DocumentDB - NoSQL, No Problem

Where does it fit in the Azure family?

A fully managed, scalable, queryable, schema-free JSON document database service for modern applications.

Diagram labels:

• fully featured RDBMS
• transactional processing
• rich query
• managed as a service
• elastic scale
• internet accessible (HTTP/REST)
• schema-free data model
• arbitrary data formats

Page 21: Introducing Azure DocumentDB - NoSQL, No Problem

Some of my favorite things

• query over schema-free JSON
• transactional integrated JavaScript
• tunable performance
• fully managed as a service

Page 22: Introducing Azure DocumentDB - NoSQL, No Problem

Query over schema-free JSON

Page 23: Introducing Azure DocumentDB - NoSQL, No Problem

Automatic Indexing

No need to define secondary indices or schema hints for indexing!

Page 24: Introducing Azure DocumentDB - NoSQL, No Problem

Query over schema-free JSON

SQL Query Grammar

-- Nested lookup against index
SELECT Books.Author
FROM Books
WHERE Books.Author.Name = "Leo Tolstoy"

-- Transformation, Filters, Array access
SELECT { Name: Books.Title, Author: Books.Author.Name }
FROM Books
WHERE Books.Price > 10 AND Books.Languages[0] = "English"

-- Joins, User Defined Functions (UDF)
SELECT CalculateRegionalTax(Books.Price, "USA", "WA")
FROM Books
JOIN LanguagesArr IN Books.Languages
WHERE LanguagesArr.Language = "Russian"
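To make the nested lookup and the JOIN ... IN clause concrete, here is a rough plain-JavaScript analogue that runs against an in-memory array rather than DocumentDB. The sample books are invented for illustration, and the Languages elements are assumed to be objects with a Language property, as the third query implies. The point it tries to show is that JOIN pairs each document with the elements of its own array; it does not join across documents.

```javascript
// Hypothetical sample data standing in for a Books collection.
const books = [
  { Title: "War and Peace", Price: 12, Author: { Name: "Leo Tolstoy" },
    Languages: [{ Language: "Russian" }, { Language: "English" }] },
  { Title: "Emma", Price: 8, Author: { Name: "Jane Austen" },
    Languages: [{ Language: "English" }] },
];

// Nested lookup: SELECT Books.Author ... WHERE Books.Author.Name = "Leo Tolstoy"
const byTolstoy = books
  .filter((b) => b.Author.Name === "Leo Tolstoy")
  .map((b) => b.Author);

// JOIN LanguagesArr IN Books.Languages: expand each book by its own array,
// keep the pairs whose language matches, and project the book title.
const russianEditions = books.flatMap((b) =>
  b.Languages
    .filter((l) => l.Language === "Russian")
    .map(() => b.Title));

console.log(byTolstoy, russianEditions);
```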

Page 25: Introducing Azure DocumentDB - NoSQL, No Problem

Transactional Integrated JavaScript

Page 26: Introducing Azure DocumentDB - NoSQL, No Problem

Transactional Integrated JavaScript

Page 27: Introducing Azure DocumentDB - NoSQL, No Problem

Transactional Integrated JavaScript

Page 28: Introducing Azure DocumentDB - NoSQL, No Problem

Transactional Integrated JavaScript

Database (stored procedure):

function(playerId1, playerId2) {
    var playersToSwap = __.filter(function (document) {
        return (document.id == playerId1 || document.id == playerId2);
    });

    var player1 = playersToSwap[0], player2 = playersToSwap[1];

    // Swap the two players' items.
    var player1ItemTemp = player1.item;
    player1.item = player2.item;
    player2.item = player1ItemTemp;

    // Write both documents back; a thrown error aborts the whole transaction.
    __.replaceDocument(player1)
        .then(function() { return __.replaceDocument(player2); })
        .fail(function(error) { throw 'Unable to update players, abort'; });
}

Client:

client.executeStoredProcedureAsync("procs/1234", ["MasterChief", "SolidSnake"])
    .then(function (response) {
        console.log("success!");
    }, function (err) {
        console.log("Failed to swap!", err);
    });
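The all-or-nothing behaviour of the swap can be imitated without a database. The sketch below is a simulation only, not how DocumentDB implements transactions: the store, the item names, and the helpers runAtomically and swapItems are all hypothetical. Work is done on a private copy, and the copy is published only if nothing throws.

```javascript
// In-memory stand-in for a collection, keyed by document id (hypothetical data).
const store = new Map([
  ["MasterChief", { id: "MasterChief", item: "energy sword" }],
  ["SolidSnake", { id: "SolidSnake", item: "cardboard box" }],
]);

// Run `work` against a private copy; publish the copy only if `work` does not throw.
function runAtomically(work) {
  const draft = new Map([...store].map(([id, doc]) => [id, { ...doc }]));
  work(draft); // a throw here leaves `store` untouched
  store.clear();
  for (const [id, doc] of draft) store.set(id, doc);
}

function swapItems(draft, id1, id2) {
  const p1 = draft.get(id1);
  const p2 = draft.get(id2);
  if (!p1 || !p2) throw new Error("Unable to update players, abort");
  [p1.item, p2.item] = [p2.item, p1.item];
}

// Successful swap: both writes become visible together.
runAtomically((draft) => swapItems(draft, "MasterChief", "SolidSnake"));

// Failed swap: the partial write made before the error is discarded.
let aborted = false;
try {
  runAtomically((draft) => {
    draft.get("MasterChief").item = "nothing";
    swapItems(draft, "MasterChief", "Nobody");
  });
} catch (e) {
  aborted = true;
}
console.log(aborted, store.get("MasterChief").item);
```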

Page 29: Introducing Azure DocumentDB - NoSQL, No Problem

Tunable Performance

Page 30: Introducing Azure DocumentDB - NoSQL, No Problem

Tunable Consistency Levels

Brewer’s CAP Theorem: Consistency, Availability, Partition Tolerance

Page 31: Introducing Azure DocumentDB - NoSQL, No Problem

Tunable Consistency Levels

DocumentDB offers 4 consistency levels: Strong, Bounded Staleness, Session, and Eventual.

Brewer’s CAP Theorem: Consistency, Availability, Partition Tolerance

99.95% Availability SLA

Page 32: Introducing Azure DocumentDB - NoSQL, No Problem

Fully managed as a service

Page 33: Introducing Azure DocumentDB - NoSQL, No Problem

• Predictable Performance

• Hourly Billing

• 99.95% Availability

• Adjustable Performance Levels

Fully managed as a service

Diagram labels: database; users, permissions; collections (S1, S2, S3, …)

I’m not crying anymore

Page 34: Introducing Azure DocumentDB - NoSQL, No Problem

DocumentDB in action

Page 35: Introducing Azure DocumentDB - NoSQL, No Problem

And many others too!

“With Azure DocumentDB, we didn’t have to say ‘no’ to the business, and we weren’t a bottleneck to launching the promotion — in fact, we came in ahead of schedule.”

- Andreas Helland, Mobility Architect, Telenor

Page 36: Introducing Azure DocumentDB - NoSQL, No Problem

Ready to get started?

Page 37: Introducing Azure DocumentDB - NoSQL, No Problem

Fully managed as a service

Page 38: Introducing Azure DocumentDB - NoSQL, No Problem
Page 39: Introducing Azure DocumentDB - NoSQL, No Problem
Page 40: Introducing Azure DocumentDB - NoSQL, No Problem
Page 41: Introducing Azure DocumentDB - NoSQL, No Problem
Page 42: Introducing Azure DocumentDB - NoSQL, No Problem

Part of the Azure Ecosystem

Page 43: Introducing Azure DocumentDB - NoSQL, No Problem

Enriched app experiences

DocumentDB + Search

http://aka.ms/docdbsearch

Page 44: Introducing Azure DocumentDB - NoSQL, No Problem

Big data and analytics

DocumentDB + HDInsight

http://aka.ms/docdbhdi

Page 45: Introducing Azure DocumentDB - NoSQL, No Problem

Quick Tips

• Collections != Tables
• De-normalize data where appropriate
• Tuning / Perf
    • Consistency Levels
    • Index Policies
    • Understand Query Costs / Limits / Avoid Scans
    • Pre-aggregate where possible

Page 46: Introducing Azure DocumentDB - NoSQL, No Problem

Recent Announcements (since April 2015)

• Order By

• Geospatial Indexing

• Id based routing

• Online index transformations

• JavaScript language-integrated query

• Azure Preview portal enhancements

• Data migration tool enhancements

• Azure Data Factory Integration

• Partitioning support

• General availability in Australia

Page 47: Introducing Azure DocumentDB - NoSQL, No Problem

Thank You

Get started with Azure DocumentDB

http://www.azure.com/docdb

Query Demo: https://www.documentdb.com/sql/demo

Andrew [email protected]

@aliuy8

Page 48: Introducing Azure DocumentDB - NoSQL, No Problem
Page 49: Introducing Azure DocumentDB - NoSQL, No Problem

Data Modeling

Page 50: Introducing Azure DocumentDB - NoSQL, No Problem

Data modeling with denormalization

{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "addresses": [
    { "line1": "100 Some Street", "line2": "Unit 1", "city": "Seattle", "state": "WA", "zip": 98012 }
  ],
  "contactDetails": [
    { "email": "[email protected]" },
    { "phone": "+1 555 555-5555", "extension": 5555 }
  ]
}

Try to model your entity as a self-contained document. Generally, use embedded data models when:

• There are "contains" relationships between entities
• There are one-to-few relationships between entities
• Embedded data changes infrequently
• Embedded data won’t grow without bound
• Embedded data is integral to data in a document

Denormalizing typically provides better read performance.

Page 51: Introducing Azure DocumentDB - NoSQL, No Problem

Data modeling with referencing

In general, use normalized data models when:

• Write performance is more important than read performance
• Representing one-to-many relationships
• Representing many-to-many relationships
• Related data changes frequently

Referencing provides more flexibility than embedding, at the cost of more round trips to read data.

User document:

{ "id": "xyz", "username": "user xyz" }

Address document:

{ "id": "address_xyz", "userid": "xyz", "address": { … } }

Contact details document:

{ "id": "contact_xyz", "userid": "xyz", "email": "[email protected]", "phone": "555 5555" }

Normalizing typically provides better write performance.
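The read-side cost of referencing can be seen by counting reads. This is a minimal sketch with an in-memory map standing in for a collection: readDocument, the counter, the id conventions, and the sample address are hypothetical and only illustrate the "more round trips" point.

```javascript
// Hypothetical in-memory "collection"; each readDocument call models one round trip.
const docs = new Map([
  ["xyz", { id: "xyz", username: "user xyz" }],
  ["address_xyz", { id: "address_xyz", userid: "xyz", address: { city: "Seattle" } }],
  ["contact_xyz", { id: "contact_xyz", userid: "xyz", phone: "555 5555" }],
  ["embedded_xyz", { id: "embedded_xyz", username: "user xyz",
    address: { city: "Seattle" }, contact: { phone: "555 5555" } }],
]);
let roundTrips = 0;
function readDocument(id) {
  roundTrips += 1;
  return docs.get(id);
}

// Referenced model: one read per document, stitched together by the application.
function loadReferencedUser(userId) {
  const user = readDocument(userId);
  const address = readDocument("address_" + userId).address;
  const contact = readDocument("contact_" + userId);
  return { username: user.username, address: address, phone: contact.phone };
}

// Embedded model: a single read returns everything.
function loadEmbeddedUser(id) {
  const d = readDocument(id);
  return { username: d.username, address: d.address, phone: d.contact.phone };
}

roundTrips = 0;
const referenced = loadReferencedUser("xyz");
const referencedTrips = roundTrips;

roundTrips = 0;
const embedded = loadEmbeddedUser("embedded_xyz");
const embeddedTrips = roundTrips;

console.log(referencedTrips, embeddedTrips);
```

Both calls return the same view of the user; only the number of reads differs, which is the trade-off against cheaper, more targeted writes in the referenced model.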

Page 52: Introducing Azure DocumentDB - NoSQL, No Problem

Hybrid models ~ denormalize + reference + aggregate

No magic bullet. Think about how your data is going to be written and read, and model accordingly.

Author document:

{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "countOfBooks": 3,
  "books": [1, 2, 3],
  "images": [
    { "thumbnail": "http://....png" },
    { "profile": "http://....png" }
  ]
}

Book document:

{
  "id": 1,
  "name": "DocumentDB 101",
  "authors": [
    { "id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png" },
    { "id": 2, "name": "William Wakefield", "thumbnail": "http://....png" }
  ]
}
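A pre-aggregated field such as countOfBooks only helps if every write keeps it in step with the referenced list. The helper below is a hypothetical sketch of that bookkeeping on a plain object; persisting the updated document is left out.

```javascript
const author = {
  id: "1", firstName: "Thomas", lastName: "Andersen",
  countOfBooks: 3, books: [1, 2, 3],
};

// Hypothetical helper: add a book reference and recompute the stored count.
// Returns a new document and leaves the input unchanged.
function addBook(authorDoc, bookId) {
  if (authorDoc.books.includes(bookId)) return authorDoc; // already referenced
  const books = [...authorDoc.books, bookId];
  return { ...authorDoc, books: books, countOfBooks: books.length };
}

const updated = addBook(author, 4);
console.log(updated.countOfBooks, updated.books);
```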

Page 53: Introducing Azure DocumentDB - NoSQL, No Problem

Request Units

Page 54: Introducing Azure DocumentDB - NoSQL, No Problem

Request Units

Request Unit (RU) is the normalized currency (% memory, % IOPS, % CPU).

Each replica gets a fixed budget of Request Units.

Operation   REST request
READ        GET resource (or resource set)
INSERT      POST resource
REPLACE     PUT resource
DELETE      DELETE resource
QUERY       POST documents (SQL)
EXECUTE     POST sprocs (args)

Predictable Performance

• Each DocumentDB collection has reserved throughput in terms of request units (RUs)
• RUs are a normalized currency across database operations
• RUs offer accurate accounting in the face of diverse database operations

Page 55: Introducing Azure DocumentDB - NoSQL, No Problem

Request Units

Operation                                                                       Request units (RUs) consumed*
Reading a single 1 KB document                                                  1
Reading a single 2 KB document                                                  2
Query with a simple predicate for a 1 KB document                               3
Creating a single 1 KB document with 10 JSON properties (consistent indexing)   14
Creating a single 1 KB document with 100 JSON properties (consistent indexing)  20
Replacing a single 1 KB document                                                28
Executing a stored procedure with two create documents                          30
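The per-operation costs can be turned into a rough throughput estimate. In this sketch the RU costs are copied from the table above, while the workload rates are invented for illustration; actual charges depend on document shape and indexing, so treat the result as a first guess.

```javascript
// RU cost per operation, taken from the table above.
const RU_COST = { read1KB: 1, query1KB: 3, create1KB10Props: 14, replace1KB: 28 };

// Hypothetical workload, in operations per second.
const workload = { read1KB: 500, query1KB: 50, create1KB10Props: 20, replace1KB: 10 };

// Sum cost * rate over every operation type in the workload.
function requiredRUsPerSecond(costs, opsPerSecond) {
  return Object.entries(opsPerSecond)
    .reduce((total, [op, rate]) => total + costs[op] * rate, 0);
}

const needed = requiredRUsPerSecond(RU_COST, workload);
console.log(needed); // 500*1 + 50*3 + 20*14 + 10*28 = 1210
```

Comparing the estimate with the reserved throughput of a collection shows whether one collection is enough or the data needs to be partitioned.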

Page 56: Introducing Azure DocumentDB - NoSQL, No Problem

Partitioning

Page 57: Introducing Azure DocumentDB - NoSQL, No Problem

Why Partition?

• Data Size: a single collection holds 10 GB

• Throughput: 3 performance tiers with a max of 2,500 RU/sec

Page 58: Introducing Azure DocumentDB - NoSQL, No Problem

Partitioning - Spillover

Start with 1 partition, fill it, then move to the next.

Diagram label: headroom (fill factor)
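A spillover resolver can be sketched in a few lines. The 10 GB capacity comes from the deck; the 80% fill factor, the partition names, and the pickSpilloverPartition helper are assumptions made for illustration.

```javascript
const CAPACITY_GB = 10;   // per-collection size quoted in the deck
const FILL_FACTOR = 0.8;  // assumed headroom threshold

// Pick the first partition still under the fill threshold;
// append a new, empty partition if every existing one is full.
function pickSpilloverPartition(partitions) {
  const open = partitions.find((p) => p.usedGB < CAPACITY_GB * FILL_FACTOR);
  if (open) return open;
  const fresh = { name: "partition-" + partitions.length, usedGB: 0 };
  partitions.push(fresh);
  return fresh;
}

const spillPartitions = [
  { name: "partition-0", usedGB: 8.5 },
  { name: "partition-1", usedGB: 3 },
];
const firstTarget = pickSpilloverPartition(spillPartitions);

// Once partition-1 also passes the threshold, writes spill to a new partition.
spillPartitions[1].usedGB = 9;
const secondTarget = pickSpilloverPartition(spillPartitions);
console.log(firstTarget.name, secondTarget.name);
```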

Page 59: Introducing Azure DocumentDB - NoSQL, No Problem

Partitioning - Range

Keep current data hot, warm historical data, scale down older data, purge / archive.

Diagram label: current period
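One way to picture range partitioning is a resolver that maps a timestamp to a period. The monthly period and the "events-YYYY-MM" naming below are hypothetical conventions chosen for the sketch; the epoch values are the ones used later in the fan-out example.

```javascript
// Map an epoch timestamp (seconds) to a monthly partition name, using UTC.
function rangePartitionFor(epochSeconds) {
  const d = new Date(epochSeconds * 1000);
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return "events-" + d.getUTCFullYear() + "-" + month;
}

console.log(rangePartitionFor(1401662986)); // June 2014
console.log(rangePartitionFor(1376779786)); // August 2013
```

With names like these, scaling down or archiving old data becomes a per-partition operation on the oldest periods.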

Page 60: Introducing Azure DocumentDB - NoSQL, No Problem

Partitioning - Lookup

Home a tenant / user to a specific partition. Use a "master" lookup.

Tenant         Partition Id
Customer       1
Big Customer   2
Another        3

Cache this shard map to avoid making the lookup the bottleneck.
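The caching advice can be sketched as a small read-through cache in front of the master lookup. The map contents mirror the tenant table above; readFromMaster, the read counter, and partitionForTenant are hypothetical names, and cache invalidation when a tenant moves is not covered.

```javascript
// Stand-in for the "master" lookup collection; the counter shows how often it is hit.
const masterLookup = new Map([["Customer", 1], ["Big Customer", 2], ["Another", 3]]);
let masterReads = 0;
function readFromMaster(tenant) {
  masterReads += 1;
  return masterLookup.get(tenant);
}

// Read-through cache of the shard map.
const shardMapCache = new Map();
function partitionForTenant(tenant) {
  if (!shardMapCache.has(tenant)) {
    const partitionId = readFromMaster(tenant);
    if (partitionId === undefined) throw new Error("Unknown tenant: " + tenant);
    shardMapCache.set(tenant, partitionId);
  }
  return shardMapCache.get(tenant);
}

const firstLookup = partitionForTenant("Big Customer");
const secondLookup = partitionForTenant("Big Customer"); // served from the cache
console.log(firstLookup, secondLookup, masterReads);
```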

Page 61: Introducing Azure DocumentDB - NoSQL, No Problem

Evenly distribute across n number of partitions (algorithmic) ….

Partitioning - Hash
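A hash resolver needs only a deterministic hash and a modulo. The sketch below uses an FNV-1a-style 32-bit hash over the key's UTF-16 code units; that choice of hash, and the function names, are assumptions for illustration rather than anything DocumentDB prescribes.

```javascript
// FNV-1a-style 32-bit hash over the key's UTF-16 code units.
function fnv1a(key) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned
  }
  return hash;
}

// The same key always lands on the same partition for a given partition count.
function hashPartitionFor(key, partitionCount) {
  return fnv1a(key) % partitionCount;
}

console.log(hashPartitionFor("user-42", 4));
```

Because the result depends on partitionCount, changing the number of partitions remaps most keys, which is why reshuffling is the hard part of this scheme.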

Page 62: Introducing Azure DocumentDB - NoSQL, No Problem

Partitioning - Fan-out Queries

- The application needs to query each candidate partition (this can be done in parallel)
- The application consolidates (or reduces) the results

SELECT * FROM root r WHERE r.created.epoch BETWEEN 1376779786 AND 1401662986

Document sets shown on the slide:

{ record: "1", created: { "date": "6/1/2014", "epoch": 1401662986 } },
{ record: "3", created: { "date": "9/23/2014", "epoch": 1411512586 } },
{ record: "123", created: { "date": "8/17/2013", "epoch": 1376779786 } }

{ record: "1", created: { "date": "6/1/2014", "epoch": 1401662986 } },
{ record: "3", created: { "date": "9/23/2014", "epoch": 1411512586 } }

{ record: "43233", created: { "epoch": 1411512586 } },
{ record: "1123", created: { "date": "8/17/2013", "epoch": 1376779786 } },
{ record: "43234", created: { "epoch": 1376779786 } }
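The two steps, querying every candidate partition in parallel and then consolidating, can be sketched with Promise.all. The partitions here are in-memory arrays holding a few of the slide's records, and queryPartition is a hypothetical stand-in for issuing the range query against one collection.

```javascript
// Two in-memory partitions holding a subset of the records from the slide.
const fanOutPartitions = [
  [{ record: "1", created: { epoch: 1401662986 } },
   { record: "3", created: { epoch: 1411512586 } }],
  [{ record: "1123", created: { epoch: 1376779786 } },
   { record: "43233", created: { epoch: 1411512586 } }],
];

// Stand-in for running the BETWEEN query against one partition.
async function queryPartition(docs, from, to) {
  return docs.filter((d) => d.created.epoch >= from && d.created.epoch <= to);
}

// Query every partition in parallel, then merge and sort the partial results.
async function fanOut(from, to) {
  const perPartition = await Promise.all(
    fanOutPartitions.map((p) => queryPartition(p, from, to)));
  return perPartition.flat().sort((a, b) => a.created.epoch - b.created.epoch);
}

fanOut(1376779786, 1401662986).then((rows) => {
  console.log(rows.map((r) => r.record));
});
```

Sorting happens in the application because each partition only orders its own results.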

Page 63: Introducing Azure DocumentDB - NoSQL, No Problem

Design: Partitioning

Hash sharding
• Examples: Profile data (user ID, app ID), (user ID); Device and vehicle data (device/VIN ID); Catalog data (item ID)
• Pros: balanced, stateless
• Cons: reshuffling is hard

Range sharding
• Examples: Operational data (timestamp), (timestamp, event ID)
• Pros: easy sliding window, range queries
• Cons: stateful

Lookup sharding
• Examples: SaaS/multitenant service (tenant ID), Metadata store (type ID)
• Pros: simple, easy to reshuffle, can span accounts
• Cons: stateful, works only on discrete keys

Page 64: Introducing Azure DocumentDB - NoSQL, No Problem

Tunable Indexing

Page 65: Introducing Azure DocumentDB - NoSQL, No Problem

Indexing

How it works
• Automatic indexing of documents
• JSON documents are represented as trees
• Structural information and instance values are normalized into a JSON path
• Fixed upper bound on index size (typically 5-10% in real production data)

Example
{"headquarters": "Belgium"} becomes /"headquarters"/"Belgium"
{"exports": [{"city": "Moscow"}, {"city": "Athens"}]} becomes /"exports"/0/"city"/"Moscow" and /"exports"/1/"city"/"Athens"
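The path notation in the example can be reproduced by walking a document as a tree and emitting one path per leaf value. This is an illustrative sketch of the notation only, not DocumentDB's index implementation; indexPaths is a hypothetical helper.

```javascript
// Walk a JSON value and emit one path per leaf, in the notation used on the slide.
function indexPaths(value, prefix = "") {
  if (Array.isArray(value)) {
    // Array elements contribute their numeric position to the path.
    return value.flatMap((item, i) => indexPaths(item, prefix + "/" + i));
  }
  if (value !== null && typeof value === "object") {
    // Property names are written as quoted labels.
    return Object.entries(value).flatMap(([key, child]) =>
      indexPaths(child, prefix + "/" + JSON.stringify(key)));
  }
  // Leaf: append the value itself as the last path segment.
  return [prefix + "/" + JSON.stringify(value)];
}

const hqPaths = indexPaths({ headquarters: "Belgium" });
const exportPaths = indexPaths({ exports: [{ city: "Moscow" }, { city: "Athens" }] });
console.log(hqPaths, exportPaths);
```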

Page 66: Introducing Azure DocumentDB - NoSQL, No Problem

Indexing Policies

Configuration                 Level            Options
Automatic                     Per collection   True (default) or False. Override with each document write.
Indexing Mode                 Per collection   Consistent or Lazy. Lazy for eventual updates / bulk ingestion.
Included and excluded paths   Per path         Individual path or recursive includes (? and *).
Indexing Type                 Per path         Supports Hash (default) and Range. Hash for equality, Range for range queries.
Indexing Precision            Per path         Supports 3 – 7 per path. Trade off storage, query RUs and write RUs.

Page 67: Introducing Azure DocumentDB - NoSQL, No Problem

Indexing Paths

Path: /
Description/use case: Default path for collection. Recursive and applies to whole document tree.

Path: /"prop"/?
Description/use case: Serve queries like the following (with Hash or Range types respectively):
    SELECT * FROM collection c WHERE c.prop = "value"
    SELECT * FROM collection c WHERE c.prop > 5

Path: /"prop"/*
Description/use case: All paths under the specified label.

Path: /"prop"/"subprop"/
Description/use case: Used during query execution to prune documents that do not have the specified path.

Path: /"prop"/"subprop"/?
Description/use case: Serve queries (with Hash or Range types respectively):
    SELECT * FROM collection c WHERE c.prop.subprop = "value"
    SELECT * FROM collection c WHERE c.prop.subprop > 5

Page 68: Introducing Azure DocumentDB - NoSQL, No Problem

Thank You

Get started with Azure DocumentDB

http://www.azure.com/docdb

Query Demo: https://www.documentdb.com/sql/demo

Andrew [email protected]

@aliuy8