Introducing Azure DocumentDB - NoSQL, No Problem

Hello DocumentDB: NoSQL, No Problem

{ "name": "Andrew Liu", "e-mail": "andrl@microsoft.com", "twitter": "@aliuy8"}

First… a Rant

managing servers makes me cry

structuring data is really hard

backfilling data and managing indexes makes me angry

A Solution

DocumentDB: document database… as a service!

• Part of the NoSQL family of databases
• Built for simplicity, scale and performance
• Non-relational, no schema enforced by the database
• Flexible query options

What’s a document database?

Great for these documents …

Not ideal for these documents …

Definitely not for these documents …

Heterogeneous data

Item | Author | Pages | Language
--- | --- | --- | ---
Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
Lenovo Thinkpad X1 Carbon | ??? | ??? | ???

{
  "ItemType": "Book",
  "Title": "Harry Potter and the Sorcerer's Stone",
  "Author": "J.K. Rowling",
  "Pages": "309",
  "Languages": [ "English", "Spanish", "Portuguese", "Russian", "French" ]
}

{
  "ItemType": "Laptop",
  "Name": "Lenovo Thinkpad X1 Carbon",
  "Processor": "Core i7 3.3 GHz",
  "Memory": "8 GB DDR3L SDRAM",
  "Storage": "256 GB SSD",
  "Graphics": "Intel HD Graphics 4400",
  "Weight": "1 pound"
}

Rapid Iterative Development

3rd Party Data

It just works.

A fully managed, scalable, queryable, schema-free JSON document database service for modern applications

fully featured RDBMS

transactional processing

rich query

managed as a service

elastic scale

internet accessible http/rest

schema-free data model

arbitrary data formats

Where does it fit in the Azure family?

Some of my favorite things

• query over schema-free JSON
• transactional integrated JavaScript
• tunable performance
• fully managed as a service

Query over schema-free JSON

No need to define secondary indices / schema hints for indexing!

Automatic Indexing

-- Nested lookup against index
SELECT Books.Author
FROM Books
WHERE Books.Author.Name = "Leo Tolstoy"

-- Transformation, Filters, Array access
SELECT { Name: Books.Title, Author: Books.Author.Name }
FROM Books
WHERE Books.Price > 10 AND Books.Languages[0] = "English"

-- Joins, User Defined Functions (UDF)
SELECT CalculateRegionalTax(Books.Price, "USA", "WA")
FROM Books
JOIN LanguagesArr IN Books.Languages
WHERE LanguagesArr.Language = "Russian"

SQL Query Grammar
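The JOIN in the last query pairs each document with the elements of its own array rather than with a second collection. A minimal in-memory sketch of that behavior follows; the sample books, their prices, and the flat 10% `calculateRegionalTax` body are assumptions made up for illustration and are not the service's implementation.

```javascript
// Hypothetical in-memory stand-in for a Books collection.
const books = [
  { Title: "War and Peace", Price: 20, Languages: [{ Language: "Russian" }, { Language: "English" }] },
  { Title: "Emma", Price: 10, Languages: [{ Language: "English" }] },
];

// Placeholder UDF: a flat 10% tax, purely illustrative.
function calculateRegionalTax(price, country, state) {
  return price / 10;
}

// JOIN l IN b.Languages: pair each book with every element of its own Languages array,
// keep the pairs that pass the WHERE filter, then project through the UDF.
const results = books.flatMap((b) =>
  b.Languages
    .filter((l) => l.Language === "Russian")
    .map(() => calculateRegionalTax(b.Price, "USA", "WA"))
);

console.log(results);
```

Only the first book has a Russian entry, so only one row is produced.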

Query over schema-free JSON

Transactional Integrated JavaScript

// Stored procedure (runs inside the database): swap the items of two players in one transaction.
function swapItems(playerId1, playerId2) {
  var playersToSwap = __.filter(function (document) {
    return (document.id == playerId1 || document.id == playerId2);
  });

  var player1 = playersToSwap[0], player2 = playersToSwap[1];
  var player1ItemTemp = player1.item;
  player1.item = player2.item;
  player2.item = player1ItemTemp;

  __.replaceDocument(player1)
    .then(function () { return __.replaceDocument(player2); })
    // Throwing aborts the transaction, so neither update is kept.
    .fail(function (error) { throw 'Unable to update players, abort'; });
}

// Client: invoke the stored procedure with the two player ids.
client.executeStoredProcedureAsync("procs/1234", ["MasterChief", "SolidSnake"])
  .then(function (response) {
    console.log("success!");
  }, function (err) {
    console.log("Failed to swap!", err);
  });

Client Database

Tunable Performance

Tunable Consistency Levels

DocumentDB offers 4 consistency levels: Strong, Bounded Staleness, Session, and Eventual

Brewer’s CAP Theorem: Consistency, Availability, Partition Tolerance

99.95% Availability

Fully managed as a service

• Predictable Performance

• Hourly Billing

• 99.95% Availability

• Adjustable Performance Levels

database: users, permissions, collections …

I’m not crying anymore

DocumentDB in action

And many others too!

“With Azure DocumentDB, we didn’t have to say ‘no’ to the business, and we weren’t a bottleneck to launching the promotion - in fact, we came in ahead of schedule.”

- Andreas Helland, Mobility Architect, Telenor

Ready to get started?

Part of the Azure Ecosystem

Enriched app experiences

DocumentDB + Search

http://aka.ms/docdbsearch

Big data and analytics

DocumentDB + HDInsight

http://aka.ms/docdbhdi

• Collections != Tables

• De-normalize data where appropriate

• Tuning / Perf
  • Consistency Levels
  • Index Policies
  • Understand Query Costs / Limits / Avoid Scans
  • Pre-aggregate where possible

Quick Tips

Recent Announcements (since April 2015)

• Order By

• Geospatial Indexing

• Id based routing

• Online index transformations

• JavaScript language-integrated query

• Azure Preview portal enhancements

• Data migration tool enhancements

• Azure Data Factory Integration

• Partitioning support

• General availability in Australia

Thank You

Get started with Azure DocumentDB: http://www.azure.com/docdb

Query Demo: https://www.documentdb.com/sql/demo

Andrew Liu
andrl@microsoft.com
@aliuy8

Data Modeling

{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "addresses": [
    { "line1": "100 Some Street", "line2": "Unit 1", "city": "Seattle", "state": "WA", "zip": 98012 }
  ],
  "contactDetails": [
    { "email": "thomas@andersen.com" },
    { "phone": "+1 555 555-5555", "extension": 5555 }
  ]
}

Try to model your entity as a self-contained document.

Generally, use embedded data models when:

• There are "contains" relationships between entities
• There are one-to-few relationships between entities
• Embedded data changes infrequently
• Embedded data won’t grow without bound
• Embedded data is integral to data in a document

Data modeling with denormalization

Denormalizing typically provides for better read performance

In general, use normalized data models when:

• Write performance is more important than read performance
• Representing one-to-many relationships
• Representing many-to-many relationships
• Related data changes frequently

Referencing provides more flexibility than embedding, at the cost of more round trips to read data.

Data modeling with referencing

User document:
{
  "id": "xyz",
  "username": "user xyz"
}

Address document:
{
  "id": "address_xyz",
  "userid": "xyz",
  "address": { … }
}

Contact details document:
{
  "id": "contact_xyz",
  "userid": "xyz",
  "email": "user@user.com",
  "phone": "555 5555"
}

Normalizing typically provides better write performance
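The read-side trade-off between embedding and referencing can be sketched by counting point reads. The `Store` class below is a hypothetical in-memory stand-in for a collection (not a DocumentDB API), and the address contents are a placeholder because the slide elides them.

```javascript
// Hypothetical in-memory store that counts point reads (a stand-in for round trips).
class Store {
  constructor(docs) {
    this.docs = new Map(docs.map((d) => [d.id, d]));
    this.reads = 0;
  }
  read(id) {
    this.reads += 1;
    return this.docs.get(id);
  }
}

// Embedded model: one self-contained document.
const embedded = new Store([
  {
    id: "1",
    firstName: "Thomas",
    addresses: [{ city: "Seattle" }],
    contactDetails: [{ email: "thomas@andersen.com" }],
  },
]);
const person = embedded.read("1");

// Referenced model: user, address, and contact details live in separate documents.
const referenced = new Store([
  { id: "xyz", username: "user xyz" },
  { id: "address_xyz", userid: "xyz", address: { city: "Seattle" } }, // placeholder address
  { id: "contact_xyz", userid: "xyz", email: "user@user.com", phone: "555 5555" },
]);
const user = referenced.read("xyz");
const address = referenced.read("address_" + user.id);
const contact = referenced.read("contact_" + user.id);

console.log("embedded reads:", embedded.reads, "referenced reads:", referenced.reads);
```

The flip side is on writes: changing the phone number in the referenced model touches only the small contact document, while the embedded model rewrites the whole person document.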

No magic bullet

Think about how your data is going to be written and read, and model accordingly

Hybrid models ~ denormalize + reference + aggregate

Author document:
{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "countOfBooks": 3,
  "books": [1, 2, 3],
  "images": [
    {"thumbnail": "http://....png"},
    {"profile": "http://....png"}
  ]
}

Book document:
{
  "id": 1,
  "name": "DocumentDB 101",
  "authors": [
    {"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
    {"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
  ]
}
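In the hybrid model the author document carries both references (`books`) and a pre-aggregated value (`countOfBooks`), so the two must be updated together. A minimal sketch of that bookkeeping, assuming a hypothetical helper named `addBook` that is not part of any SDK:

```javascript
// Author document with references to books (ids only) and a pre-aggregated count.
const author = {
  id: "1",
  firstName: "Thomas",
  lastName: "Andersen",
  countOfBooks: 3,
  books: [1, 2, 3],
};

// Returns a new author document with the book referenced and the count kept in sync.
// If the book is already referenced, the original document is returned unchanged.
function addBook(authorDoc, bookId) {
  if (authorDoc.books.includes(bookId)) return authorDoc;
  const books = [...authorDoc.books, bookId];
  return { ...authorDoc, books, countOfBooks: books.length };
}

const updated = addBook(author, 4);
console.log(updated.countOfBooks, updated.books);
```

Reading `countOfBooks` then costs a single document read instead of a query that counts books.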

Request Units

Request Unit (RU) is the normalized currency

% Memory

% IOPS

Replica gets a fixed budget of Request Units

READ: GET Resource
INSERT: POST Resource
REPLACE: PUT Resource
DELETE: DELETE Resource
QUERY: POST Document, returns a Resource set
EXECUTE: POST sprocs args

Predictable Performance

• Each DocumentDB collection has reserved throughput in terms of request units (RUs)
• Normalized currency across database operations
• RUs offer accurate accounting in the face of diverse database operations

Request Units

Operation | Request units (RUs) consumed*
--- | ---
Reading a single 1 KB document | 1
Reading a single 2 KB document | 2
Query with a simple predicate for a 1 KB document |
Creating a single 1 KB document with 10 JSON properties (consistent indexing) |
Creating a single 1 KB document with 100 JSON properties (consistent indexing) |
Replacing a single 1 KB document | 28
Executing a stored procedure with two create documents |
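Because RUs are a single currency, capacity planning reduces to arithmetic. The sketch below uses only the three costs that appear in the table (1, 2, and 28 RUs) and the 2,500 RU/sec figure quoted elsewhere in this deck; the function names and the sample workload are assumptions for illustration.

```javascript
// RU costs per operation, taken from the rows of the table that list a value.
const RU_COST = { read1KB: 1, read2KB: 2, replace1KB: 28 };

// How many operations of one kind fit into a per-second RU budget.
function opsPerSecond(budgetRUs, costPerOp) {
  return Math.floor(budgetRUs / costPerOp);
}

// RU/sec needed by a mixed workload, given as { operationName: operationsPerSecond }.
function requiredRUs(workload) {
  return Object.entries(workload).reduce(
    (sum, [op, rate]) => sum + RU_COST[op] * rate,
    0
  );
}

const budget = 2500; // RU/sec
console.log(opsPerSecond(budget, RU_COST.read1KB));         // 2500
console.log(opsPerSecond(budget, RU_COST.replace1KB));      // 89
console.log(requiredRUs({ read1KB: 500, replace1KB: 50 })); // 1900
```

The same budget that serves 2,500 small reads per second serves only 89 replaces, which is why write-heavy workloads reach the throughput limit sooner.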

Partitioning

Why Partition?

• Data Size: a single collection holds up to 10 GB

• Throughput: 3 performance tiers with a max of 2,500 RU/sec

Start with 1 partition, fill it, then move to next

headroom (fill factor)

Partitioning - Spillover

Keep current data hot, Warm historical data, Scale-down older data, Purge / Archive

current period

Partitioning - Range
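Range partitioning by time period can be sketched as a lookup over half-open windows. The partition names and the calendar-year boundaries below are hypothetical; in practice each entry would point at a collection.

```javascript
// Hypothetical range map: each partition owns a half-open [from, to) window of epoch seconds.
const ranges = [
  { from: 1356998400, to: 1388534400, partition: "events-2013" }, // 2013 (UTC)
  { from: 1388534400, to: 1420070400, partition: "events-2014" }, // 2014 (UTC)
];

// Route a single timestamp to the partition that owns it.
function resolveByRange(epoch) {
  const hit = ranges.find((r) => epoch >= r.from && epoch < r.to);
  if (!hit) throw new Error("no partition covers epoch " + epoch);
  return hit.partition;
}

// Partitions whose window overlaps [start, end]; only these need to be queried.
function partitionsForRange(start, end) {
  return ranges
    .filter((r) => r.from <= end && r.to > start)
    .map((r) => r.partition);
}

console.log(resolveByRange(1376779786));
console.log(partitionsForRange(1376779786, 1401662986));
```

Keeping the current period hot and scaling down or archiving older periods then becomes an operation on whole partitions rather than on individual documents.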

Home tenant / user to a specific partition. Use "master" lookup.

Tenant | Partition Id
--- | ---
Customer | 1
Big Customer |
Another | 3

Cache this shard map to avoid making the lookup the bottleneck

Partitioning - Lookup
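A minimal sketch of lookup partitioning with a cached shard map follows. The tenant names, partition names, and the `readFromMaster` helper are hypothetical; `masterLookup` stands in for whatever store holds the tenant-to-partition mapping.

```javascript
// Hypothetical "master" lookup: stands in for the store that maps tenants to partitions.
const masterLookup = new Map([
  ["tenant-a", "partition-1"],
  ["tenant-b", "partition-3"],
]);

let masterReads = 0;
function readFromMaster(tenant) {
  masterReads += 1; // each call represents a round trip to the master store
  return masterLookup.get(tenant);
}

// Local cache of the shard map so repeated requests skip the master lookup.
const shardMapCache = new Map();

function resolveByLookup(tenant) {
  if (!shardMapCache.has(tenant)) {
    const partition = readFromMaster(tenant);
    if (partition === undefined) throw new Error("unknown tenant: " + tenant);
    shardMapCache.set(tenant, partition);
  }
  return shardMapCache.get(tenant);
}

console.log(resolveByLookup("tenant-b"), resolveByLookup("tenant-b"), masterReads);
```

Moving a tenant means copying its data and updating one map entry, which is why this scheme is easy to reshuffle; the cost is that the cache must be invalidated when the map changes.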

Evenly distribute across n number of partitions (algorithmic) ….

Partitioning - Hash
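Hash partitioning can be sketched as a deterministic hash of the partition key taken modulo the number of partitions. The FNV-1a hash below (applied to UTF-16 code units) is an assumption chosen for brevity; any stable hash that the application controls would serve.

```javascript
// 32-bit FNV-1a over the key's UTF-16 code units; a stand-in for the application's hash.
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply, keep as unsigned 32-bit
  }
  return hash >>> 0;
}

// Map a key (for example a user ID) to one of partitionCount partitions.
function resolveByHash(key, partitionCount) {
  return fnv1a(key) % partitionCount;
}

const users = ["MasterChief", "SolidSnake", "user xyz"];
for (const u of users) {
  console.log(u, "->", resolveByHash(u, 4));
}
```

No shard map is needed, so routing is stateless. The drawback is that changing the partition count changes the modulus, so most keys map to a different partition and data has to be moved.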

- Application needs to query each candidate partition (can be done in parallel)
- Application consolidates (or reduces) results

{ "record": "1", "created": { "date": "6/1/2014", "epoch": 1401662986 } },
{ "record": "3", "created": { "date": "9/23/2014", "epoch": 1411512586 } },
{ "record": "123", "created": { "date": "8/17/2013", "epoch": 1376779786 } }

SELECT * FROM root r WHERE r.created.epoch BETWEEN 1376779786 AND 1401662986

{ "record": "1", "created": { "date": "6/1/2014", "epoch": 1401662986 } },
{ "record": "3", "created": { "date": "9/23/2014", "epoch": 1411512586 } }

{ "record": "43233", "created": { "epoch": 1411512586 } },
{ "record": "1123", "created": { "date": "8/17/2013", "epoch": 1376779786 } },
{ "record": "43234", "created": { "epoch": 1376779786 } }

Partitioning - Fan-out Queries
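The two fan-out steps (query every candidate partition in parallel, then consolidate) can be sketched with `Promise.all`. The partitions here are in-memory arrays holding the records from the slide, and `queryPartition` is a hypothetical stand-in for running the BETWEEN query against one partition; the grouping of records into two partitions is an assumption.

```javascript
// Hypothetical partitions, modeled as in-memory arrays of the records shown above.
const partitions = [
  [
    { record: "1", created: { date: "6/1/2014", epoch: 1401662986 } },
    { record: "3", created: { date: "9/23/2014", epoch: 1411512586 } },
  ],
  [
    { record: "43233", created: { epoch: 1411512586 } },
    { record: "1123", created: { date: "8/17/2013", epoch: 1376779786 } },
    { record: "43234", created: { epoch: 1376779786 } },
  ],
];

// Stand-in for running the BETWEEN query (inclusive bounds) against one partition.
async function queryPartition(docs, from, to) {
  return docs.filter((d) => d.created.epoch >= from && d.created.epoch <= to);
}

// Fan out to every candidate partition in parallel, then consolidate the results.
async function fanOut(from, to) {
  const perPartition = await Promise.all(
    partitions.map((p) => queryPartition(p, from, to))
  );
  return perPartition.flat().sort((a, b) => a.created.epoch - b.created.epoch);
}

fanOut(1376779786, 1401662986).then((rows) =>
  console.log(rows.map((r) => r.record))
);
```

The consolidation step here is a merge plus a sort on `created.epoch`; an aggregate query would instead reduce the per-partition results, for example by summing counts.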

Design: Partitioning

Hash sharding
• Examples: Profile data (user ID, app ID), (user ID), Device and vehicle data (device/vin ID), Catalog data (item ID)
• Pros: balanced, stateless
• Cons: reshuffling is hard

Range sharding
• Examples: Operational data (timestamp), (timestamp, event ID)
• Pros: easy sliding window, range queries
• Cons: stateful

Lookup sharding
• Examples: SaaS/multitenant service (tenant ID), Metadata store (type ID)
• Pros: simple, easy to reshuffle, can span accounts
• Cons: stateful, works only on discrete keys

Tunable Indexing

Indexing: How it works

• Automatic indexing of documents
• JSON documents are represented as trees
• Structural information and instance values are normalized into a JSON-Path
• Fixed upper bound on index size (typically 5-10% in real production data)

Example

{"headquarters": "Belgium"} becomes /"headquarters"/"Belgium"

{"exports": [{"city": "Moscow"}, {"city": "Athens"}]} becomes /"exports"/0/"city"/"Moscow" and /"exports"/1/"city"/"Athens"
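The path normalization in the example can be reproduced with a short recursive walk over the document tree. This is a sketch of the idea only; `toIndexPaths` is a hypothetical helper and says nothing about how the service stores its index.

```javascript
// Flattens a JSON value into path strings of the form /"label"/index/"value".
function toIndexPaths(value, prefix = "") {
  if (Array.isArray(value)) {
    // Array elements contribute their position as a path segment.
    return value.flatMap((item, i) => toIndexPaths(item, `${prefix}/${i}`));
  }
  if (value !== null && typeof value === "object") {
    // Object properties contribute their quoted label as a path segment.
    return Object.entries(value).flatMap(([key, child]) =>
      toIndexPaths(child, `${prefix}/${JSON.stringify(key)}`)
    );
  }
  // Leaf: the instance value becomes the last path segment.
  return [`${prefix}/${JSON.stringify(value)}`];
}

console.log(toIndexPaths({ headquarters: "Belgium" }));
console.log(toIndexPaths({ exports: [{ city: "Moscow" }, { city: "Athens" }] }));
```

Both structure (labels and array positions) and values end up in the same path, which is what lets a query on any property be answered without a declared schema.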

Indexing Policies

Configuration | Level | Options
--- | --- | ---
Automatic | Per collection | True (default) or False. Override with each document write
Indexing Mode | Per collection | Consistent or Lazy. Lazy for eventual updates/bulk ingestion
Included and excluded paths | Per path | Individual path or recursive includes (? and *)
Indexing Type | Per path | Supports Hash (default) and Range. Hash for equality, Range for range queries
Indexing Precision | Per path | Supports 3 to 7 per path. Tradeoff between storage, query RUs and write RUs

Indexing Paths

Path | Description/use case
--- | ---
/ | Default path for collection. Recursive and applies to the whole document tree.
/"prop"/? | Serves queries like the following (with Hash or Range types respectively): SELECT * FROM collection c WHERE c.prop = "value" and SELECT * FROM collection c WHERE c.prop > 5
/"prop"/* | All paths under the specified label.
/"prop"/"subprop"/ | Used during query execution to prune documents that do not have the specified path.
/"prop"/"subprop"/? | Serves queries like the following (with Hash or Range types respectively): SELECT * FROM collection c WHERE c.prop.subprop = "value" and SELECT * FROM collection c WHERE c.prop.subprop > 5
