Introducing Azure DocumentDB - NoSQL, No Problem

Post on 11-Feb-2017


Category:

Software


Hello DocumentDB: NoSQL, No Problem

{ "name": "Andrew Liu", "e-mail": "andrl@microsoft.com", "twitter": "@aliuy8"}

First… a Rant

managing servers makes me cry

structuring data is really hard

backfilling data and managing indexes make me angry

A Solution

DocumentDB: a document database… as a service!

What’s a document database?

• Part of the NoSQL family of databases
• Built for simplicity, scale and performance
• Non-relational, no schema enforced by the database
• Flexible query options

Great for these documents …


Not ideal for these documents …


Definitely not for these documents …

Heterogeneous data

Item                                      Author               Pages   Language
Harry Potter and the Sorcerer’s Stone     J.K. Rowling         309     English
Game of Thrones: A Song of Ice and Fire   George R.R. Martin   864     English
Lenovo Thinkpad X1 Carbon                 ???                  ???     ???

{
  "ItemType": "Book",
  "Title": "Harry Potter and the Sorcerer's Stone",
  "Author": "J.K. Rowling",
  "Pages": "309",
  "Languages": [ "English", "Spanish", "Portuguese", "Russian", "French" ]
}

{
  "ItemType": "Laptop",
  "Name": "Lenovo Thinkpad X1 Carbon",
  "Processor": "Core i7 3.3 Ghz",
  "Memory": "8 GB DDR3L SDRAM",
  "Storage": "256 GB SSD",
  "Graphics": "Intel HD Graphics 4400",
  "Weight": "1 pound"
}

Rapid Iterative Development

3rd Party Data

It just works.

A fully managed, scalable, queryable, schema-free JSON document database service for modern applications

• fully featured RDBMS
• transactional processing
• rich query
• managed as a service
• elastic scale
• internet accessible HTTP/REST
• schema-free data model
• arbitrary data formats

Where does it fit in the Azure family?

Some of my favorite things

• query over schema-free JSON
• transactional integrated JavaScript
• tunable performance
• fully managed as a service

Query over schema-free JSON


No need to define secondary indices / schema hints for indexing!

Automatic Indexing

-- Nested lookup against index
SELECT Books.Author
FROM Books
WHERE Books.Author.Name = "Leo Tolstoy"

-- Transformation, Filters, Array access
SELECT { Name: Books.Title, Author: Books.Author.Name }
FROM Books
WHERE Books.Price > 10 AND Books.Languages[0] = "English"

-- Joins, User Defined Functions (UDF)
SELECT CalculateRegionalTax(Books.Price, "USA", "WA")
FROM Books
JOIN LanguagesArr IN Books.Languages
WHERE LanguagesArr.Language = "Russian"

SQL Query Grammar
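To make the "Transformation, Filters, Array access" query concrete, here is a minimal in-memory sketch of the same filter and projection in plain JavaScript. It does not call DocumentDB; the `sampleBooks` data is made up for illustration and only the field names come from the slide.

```javascript
// Hypothetical sample documents; only the field names come from the slide.
const sampleBooks = [
  { Title: "War and Peace", Price: 12, Author: { Name: "Leo Tolstoy" }, Languages: ["Russian", "English"] },
  { Title: "Anna Karenina", Price: 15, Author: { Name: "Leo Tolstoy" }, Languages: ["English", "Russian"] },
  { Title: "Short Pamphlet", Price: 5, Author: { Name: "Anonymous" }, Languages: ["English"] }
];

const projected = sampleBooks
  // WHERE Books.Price > 10 AND Books.Languages[0] = "English"
  .filter((book) => book.Price > 10 && book.Languages[0] === "English")
  // SELECT { Name: Books.Title, Author: Books.Author.Name }
  .map((book) => ({ Name: book.Title, Author: book.Author.Name }));

console.log(projected);
```

Only the second sample document passes both predicates; the first fails the array-position check and the third fails the price check.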


Transactional Integrated JavaScript


Database (stored procedure):

// Swap the items held by two players inside a single transaction.
function(playerId1, playerId2) {
    var playersToSwap = __.filter(function (document) {
        return (document.id == playerId1 || document.id == playerId2);
    });

    var player1 = playersToSwap[0], player2 = playersToSwap[1];

    var player1ItemTemp = player1.item;
    player1.item = player2.item;
    player2.item = player1ItemTemp;

    // Throwing aborts the stored procedure, so neither write is kept.
    __.replaceDocument(player1)
        .then(function () { return __.replaceDocument(player2); })
        .fail(function (error) { throw 'Unable to update players, abort'; });
}

Client:

client.executeStoredProcedureAsync("procs/1234", ["MasterChief", "SolidSnake"])
    .then(function (response) {
        console.log("success!");
    }, function (err) {
        console.log("Failed to swap!", err);
    });
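The all-or-nothing behaviour that the stored procedure relies on can be sketched without a database. The example below is an in-memory stand-in, not the DocumentDB API: the `Map`-based store, the snapshot-and-restore rollback, and the item values are all assumptions made for illustration.

```javascript
// In-memory stand-in for a collection of player documents (NOT the DocumentDB API).
function swapItems(store, playerId1, playerId2) {
  // Copy every document first so a failure can undo partial writes.
  const snapshot = new Map([...store].map(([id, doc]) => [id, { ...doc }]));
  try {
    const player1 = store.get(playerId1);
    if (!player1) throw new Error("Unable to update players, abort");
    const player1ItemTemp = player1.item;
    player1.item = null; // first write lands before the second lookup

    const player2 = store.get(playerId2);
    if (!player2) throw new Error("Unable to update players, abort");
    player1.item = player2.item;
    player2.item = player1ItemTemp;
  } catch (error) {
    // Roll back: restore the snapshot so no partial swap is visible.
    store.clear();
    snapshot.forEach((doc, id) => store.set(id, doc));
    throw error;
  }
}

// Hypothetical player documents.
const players = new Map([
  ["MasterChief", { id: "MasterChief", item: "rifle" }],
  ["SolidSnake", { id: "SolidSnake", item: "box" }]
]);

swapItems(players, "MasterChief", "SolidSnake");
console.log(players.get("MasterChief").item, players.get("SolidSnake").item);
```

If the second player is missing, the first player's item is restored from the snapshot, which mirrors the "abort" path in the slide's stored procedure.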


Tunable Performance


Tunable Consistency Levels

Brewer’s CAP Theorem: Consistency, Availability, Partition Tolerance

DocumentDB offers 4 consistency levels

99.95% Availability SLA

Fully managed as a service


• Predictable Performance

• Hourly Billing

• 99.95% Availability

• Adjustable Performance Levels


Diagram: a database holds users, permissions, and collections …; collections run at performance levels S1, S2, or S3.

I’m not crying anymore

DocumentDB in action

And many others too!

“With Azure DocumentDB, we didn’t have to say ‘no’ to the business, and we weren’t a bottleneck to launching the promotion - in fact, we came in ahead of schedule.”

- Andreas Helland, Telenor (Mobility Architect)

Ready to get started?


Part of the Azure Ecosystem

Enriched app experiences

DocumentDB + Search

http://aka.ms/docdbsearch

Big data and analytics

DocumentDB + HDInsight

http://aka.ms/docdbhdi

Quick Tips

• Collections != Tables
• De-normalize data where appropriate
• Tuning / Perf
  • Consistency Levels
  • Index Policies
  • Understand Query Costs / Limits / Avoid Scans
  • Pre-aggregate where possible

Recent Announcements (since April 2015)

• Order By

• Geospatial Indexing

• Id based routing

• Online index transformations

• JavaScript language-integrated query

• Azure Preview portal enhancements

• Data migration tool enhancements

• Azure Data Factory Integration

• Partitioning support

• General availability in Australia

Thank You

Get started with Azure DocumentDB
http://www.azure.com/docdb

Query Demo: https://www.documentdb.com/sql/demo

Andrew Liu
andrl@microsoft.com
@aliuy8

Data Modeling

{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "addresses": [
    {
      "line1": "100 Some Street",
      "line2": "Unit 1",
      "city": "Seattle",
      "state": "WA",
      "zip": 98012
    }
  ],
  "contactDetails": [
    {"email": "thomas@andersen.com"},
    {"phone": "+1 555 555-5555", "extension": 5555}
  ]
}

Data modeling with denormalization

Try to model your entity as a self-contained document.

Generally, use embedded data models when:
• There are "contains" relationships between entities
• There are one-to-few relationships between entities
• Embedded data changes infrequently
• Embedded data won’t grow without bound
• Embedded data is integral to data in a document

Denormalizing typically provides better read performance

Data modeling with referencing

In general, use normalized data models when:
• Write performance is more important than read performance
• Representing one-to-many relationships
• Representing many-to-many relationships
• Related data changes frequently

Trade-offs:
• Provides more flexibility than embedding
• More round trips to read data

User document

{
  "id": "xyz",
  "username": "user xyz"
}

Address document

{
  "id": "address_xyz",
  "userid": "xyz",
  "address": {
    …
  }
}

Contact details document

{
  "id": "contact_xyz",
  "userid": "xyz",
  "email": "user@user.com",
  "phone": "555 5555"
}

Normalizing typically provides better write performance
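The read-side cost of referencing ("more round trips to read data") can be counted with a toy store. This is a sketch under stated assumptions: `makeStore` is a hypothetical in-memory stand-in that counts reads, and the address body is a placeholder because the slide elides it.

```javascript
// Hypothetical in-memory store that counts how many reads it serves.
function makeStore(docs) {
  let reads = 0;
  const byId = new Map(docs.map((doc) => [doc.id, doc]));
  return {
    get(id) { reads += 1; return byId.get(id); },
    readCount() { return reads; }
  };
}

// Embedded model: one self-contained document.
const embeddedStore = makeStore([
  { id: "xyz", username: "user xyz",
    address: { note: "placeholder, elided on the slide" },
    contact: { email: "user@user.com", phone: "555 5555" } }
]);

// Referenced model: three documents linked by userid, as on the slide.
const referencedStore = makeStore([
  { id: "xyz", username: "user xyz" },
  { id: "address_xyz", userid: "xyz", address: { note: "placeholder, elided on the slide" } },
  { id: "contact_xyz", userid: "xyz", email: "user@user.com", phone: "555 5555" }
]);

function loadEmbedded(store, id) {
  return store.get(id); // one read
}

function loadReferenced(store, id) {
  const user = store.get(id);
  const address = store.get("address_" + id);
  const contact = store.get("contact_" + id);
  return { ...user, address: address.address,
           contact: { email: contact.email, phone: contact.phone } };
}

const embeddedProfile = loadEmbedded(embeddedStore, "xyz");
const referencedProfile = loadReferenced(referencedStore, "xyz");
console.log(embeddedStore.readCount(), referencedStore.readCount());
```

Both calls assemble the same profile, but the referenced model needs three reads where the embedded model needs one. The flip side is that updating the contact details touches only one small document in the referenced model.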

No magic bullet

Think about how your data is going to be written and read, and model accordingly

Hybrid models ~ denormalize + reference + aggregate

Author document

{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "countOfBooks": 3,
  "books": [1, 2, 3],
  "images": [
    {"thumbnail": "http://....png"},
    {"profile": "http://....png"}
  ]
}

Book document

{
  "id": 1,
  "name": "DocumentDB 101",
  "authors": [
    {"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
    {"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
  ]
}

Request Units


Request Unit (RU) is the normalized currency

% Memory

% IOPS

% CPU

Replica gets a fixed budget of Request Units

Operation   HTTP request
READ        GET Resource
INSERT      POST Resource
REPLACE     PUT Resource
DELETE      DELETE Resource
QUERY       POST Documents (SQL)
EXECUTE     POST sprocs (args)

Results come back as a Resource or a Resource set.

Predictable Performance

• Each DocumentDB collection has reserved throughput in terms of request units (RUs)
• Normalized currency across database operations
• RUs offer accurate accounting in the face of diverse database operations

Request Units

Operation                                                                       Request units (RUs) consumed*
Reading a single 1 KB document                                                  1
Reading a single 2 KB document                                                  2
Query with a simple predicate for a 1 KB document                               3
Creating a single 1 KB document with 10 JSON properties (consistent indexing)   14
Creating a single 1 KB document with 100 JSON properties (consistent indexing)  20
Replacing a single 1 KB document                                                28
Executing a stored procedure with two create documents                          30
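Because RUs are a single currency, a workload's throughput need can be estimated by multiplying each operation's rate by its RU cost and summing. The sketch below uses the slide's figures as given; the operation keys, the `estimateRuPerSecond` helper, and the sample workload are hypothetical names and numbers chosen for illustration.

```javascript
// RU cost per operation, copied from the slide's table (keys are hypothetical names).
const RU_COST = new Map([
  ["read1KB", 1],
  ["read2KB", 2],
  ["query1KB", 3],
  ["create1KB10Props", 14],
  ["create1KB100Props", 20],
  ["replace1KB", 28],
  ["sprocTwoCreates", 30]
]);

// opsPerSecond maps an operation key to how often it runs each second.
function estimateRuPerSecond(opsPerSecond) {
  let total = 0;
  for (const [operation, rate] of Object.entries(opsPerSecond)) {
    if (!RU_COST.has(operation)) throw new Error("unknown operation: " + operation);
    total += RU_COST.get(operation) * rate;
  }
  return total;
}

// Hypothetical workload: 100 reads, 20 simple queries, 10 small creates per second.
const workload = { read1KB: 100, query1KB: 20, create1KB10Props: 10 };
const neededRu = estimateRuPerSecond(workload); // 100*1 + 20*3 + 10*14
console.log(neededRu);
```

The estimate can then be compared against the throughput reserved for the collection to see whether the workload fits.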

Partitioning

Why Partition?

• Data Size: a single collection holds 10 GB

• Throughput: 3 performance tiers with a max of 2,500 RU/sec

Partitioning - Spillover

Start with 1 partition, fill it, then move to the next. Leave headroom (fill factor) in each partition.
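A spillover resolver can be sketched as "pick the first partition still under its fill factor". The partition shape (`name`, `usedGB`, `capacityGB`), the 0.9 fill factor, and the function name are assumptions for illustration, not part of any DocumentDB SDK.

```javascript
// Hypothetical spillover resolver: write to the first partition that still has headroom.
function pickSpilloverPartition(partitions, fillFactor) {
  for (const partition of partitions) {
    if (partition.usedGB < partition.capacityGB * fillFactor) {
      return partition.name;
    }
  }
  return null; // every partition is past its fill factor: time to add one
}

// Hypothetical state: the first collection is nearly full, the second is empty.
const spilloverPartitions = [
  { name: "collection-0", usedGB: 9.5, capacityGB: 10 },
  { name: "collection-1", usedGB: 0, capacityGB: 10 }
];

const spilloverTarget = pickSpilloverPartition(spilloverPartitions, 0.9);
console.log(spilloverTarget);
```

Keeping the fill factor below 1 leaves room for existing documents in a "full" partition to grow.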

Partitioning - Range

Keep current data hot (current period), warm historical data, scale down older data, purge / archive.
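Range partitioning by time period can be sketched as a lookup over half-open intervals. The quarterly boundaries, partition names, and `resolveRangePartition` helper below are hypothetical and chosen only to illustrate the idea.

```javascript
// Hypothetical range map: one partition per quarter, covering [start, end) in epoch milliseconds.
const rangeMap = [
  { name: "2014-Q1", start: Date.UTC(2014, 0, 1), end: Date.UTC(2014, 3, 1) },
  { name: "2014-Q2", start: Date.UTC(2014, 3, 1), end: Date.UTC(2014, 6, 1) },
  { name: "2014-Q3", start: Date.UTC(2014, 6, 1), end: Date.UTC(2014, 9, 1) }
];

// Return the partition whose range contains the timestamp, or null if none does.
function resolveRangePartition(map, timestampMs) {
  const hit = map.find((range) => timestampMs >= range.start && timestampMs < range.end);
  return hit ? hit.name : null;
}

// 1 June 2014 falls in the second quarter.
const juneTarget = resolveRangePartition(rangeMap, Date.UTC(2014, 5, 1));
console.log(juneTarget);
```

A sliding window then amounts to appending a new range for the current period and archiving or dropping the oldest one.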

Partitioning - Lookup

Home tenant / user to a specific partition. Use "master" lookup.

Tenant         Partition Id
Customer       1
Big Customer   2
Another        3

Cache this shard map to avoid making the lookup the bottleneck.
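The "cache this shard map" advice can be sketched as a read-through cache in front of the master lookup. This assumes the slide's table reads Customer → 1, Big Customer → 2, Another → 3; the function names and the lookup counter are hypothetical, and a real master lookup would be a network call.

```javascript
// Shard map as read from the slide (assumed reading): tenant -> partition id.
const masterShardMap = new Map([
  ["Customer", 1],
  ["Big Customer", 2],
  ["Another", 3]
]);

let masterLookups = 0;

// Stand-in for the slow "master" lookup.
function lookupInMaster(tenant) {
  masterLookups += 1;
  if (!masterShardMap.has(tenant)) throw new Error("unknown tenant: " + tenant);
  return masterShardMap.get(tenant);
}

const shardCache = new Map();

// Read-through cache: only the first request per tenant reaches the master.
function resolveTenantPartition(tenant) {
  if (!shardCache.has(tenant)) {
    shardCache.set(tenant, lookupInMaster(tenant));
  }
  return shardCache.get(tenant);
}

resolveTenantPartition("Big Customer"); // goes to the master
resolveTenantPartition("Big Customer"); // served from the cache
console.log(masterLookups);
```

Moving a tenant to another partition means updating the master map and invalidating that tenant's cache entry, which this sketch leaves out.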

Partitioning - Hash

Evenly distribute across n partitions (algorithmic) …
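A hash resolver maps a key to one of n partitions with no stored state. The sketch below uses a simple djb2-style string hash purely as an illustration; it is an assumption, not the hash that DocumentDB or its SDKs use.

```javascript
// Simple deterministic string hash (djb2-style, XOR variant); illustrative only.
function hashKey(key) {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    // >>> 0 keeps the value an unsigned 32-bit integer.
    hash = ((hash * 33) ^ key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Map a partition key (for example a user ID) to a partition index in [0, partitionCount).
function resolveHashPartition(key, partitionCount) {
  return hashKey(key) % partitionCount;
}

// Hypothetical keys spread over 4 partitions.
const hashAssignments = ["user-1", "user-2", "device-9"].map((key) => ({
  key,
  partition: resolveHashPartition(key, 4)
}));
console.log(hashAssignments);
```

The same key always lands on the same partition, which is what makes the scheme stateless. Changing the partition count remaps most keys, which is why reshuffling is hard with plain modulo hashing.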

Partitioning - Fan-out Queries

- Application needs to query each candidate partition (can be done in parallel)
- Application consolidates (or reduces) results

{ "record": "1", "created": { "date": "6/1/2014", "epoch": 1401662986 } },
{ "record": "3", "created": { "date": "9/23/2014", "epoch": 1411512586 } },
{ "record": "123", "created": { "date": "8/17/2013", "epoch": 1376779786 } }

SELECT * FROM root r WHERE r.created.epoch BETWEEN 1376779786 AND 1401662986

{ "record": "1", "created": { "date": "6/1/2014", "epoch": 1401662986 } },
{ "record": "3", "created": { "date": "9/23/2014", "epoch": 1411512586 } }

{ "record": "43233", "created": { "epoch": 1411512586 } },
{ "record": "1123", "created": { "date": "8/17/2013", "epoch": 1376779786 } },
{ "record": "43234", "created": { "epoch": 1376779786 } }
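The two fan-out steps (query every candidate partition in parallel, then consolidate) can be sketched with promises. The partitions here are plain arrays holding records from the slide, and `queryPartition` is a hypothetical stand-in for a per-partition query, not a DocumentDB call.

```javascript
// Two in-memory "partitions" holding records from the slide.
const fanOutPartitions = [
  [
    { record: "1", created: { date: "6/1/2014", epoch: 1401662986 } },
    { record: "3", created: { date: "9/23/2014", epoch: 1411512586 } }
  ],
  [
    { record: "43233", created: { epoch: 1411512586 } },
    { record: "1123", created: { date: "8/17/2013", epoch: 1376779786 } },
    { record: "43234", created: { epoch: 1376779786 } }
  ]
];

// Stand-in for a per-partition query (asynchronous, like a network call).
function queryPartition(docs, fromEpoch, toEpoch) {
  return Promise.resolve(
    docs.filter((doc) => doc.created.epoch >= fromEpoch && doc.created.epoch <= toEpoch)
  );
}

// Consolidate step: flatten per-partition results and order them by epoch.
function mergeResults(perPartition) {
  return perPartition.flat().sort((a, b) => a.created.epoch - b.created.epoch);
}

// Fan out to every partition in parallel, then reduce to one result set.
async function fanOutQuery(partitions, fromEpoch, toEpoch) {
  const perPartition = await Promise.all(
    partitions.map((docs) => queryPartition(docs, fromEpoch, toEpoch))
  );
  return mergeResults(perPartition);
}

fanOutQuery(fanOutPartitions, 1376779786, 1401662986)
  .then((results) => console.log(results.map((doc) => doc.record)));
```

Sorting in the merge step matters because each partition only knows its own order; any ORDER BY or TOP semantics have to be reapplied by the application.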

Design: Partitioning

Hash sharding
• Examples: Profile data (user ID, app ID), (user ID), Device and vehicle data (device/VIN ID), Catalog data (item ID)
• Pros: balanced, stateless
• Cons: reshuffling is hard

Range sharding
• Examples: Operational data (timestamp), (timestamp, event ID)
• Pros: easy sliding window, range queries
• Cons: stateful

Lookup sharding
• Examples: SaaS/multitenant service (tenant ID), Metadata store (type ID)
• Pros: simple, easy to reshuffle, can span accounts
• Cons: stateful, works only on discrete keys

Tunable Indexing

Indexing

How it works
• Automatic indexing of documents
• JSON documents are represented as trees
• Structural information and instance values are normalized into a JSON-Path
• Fixed upper bound on index size (typically 5-10% in real production data)

Example
{"headquarters": "Belgium"} becomes /"headquarters"/"Belgium"
{"exports": [{"city": "Moscow"}, {"city": "Athens"}]} becomes /"exports"/0/"city"/"Moscow" and /"exports"/1/"city"/"Athens"
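The tree-to-path normalization in the example can be sketched as a recursive walk that emits one path per leaf value. `toIndexPaths` is a hypothetical helper that reproduces the slide's notation; it is not DocumentDB's actual indexing code.

```javascript
// Walk a JSON value and emit one path per leaf, in the slide's notation:
// property names and string values are quoted, array positions are bare numbers.
function toIndexPaths(value, prefix = "") {
  if (Array.isArray(value)) {
    return value.flatMap((element, index) => toIndexPaths(element, `${prefix}/${index}`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) =>
      toIndexPaths(child, `${prefix}/"${key}"`)
    );
  }
  // Leaf: JSON.stringify quotes strings and leaves numbers, booleans and null bare.
  return [`${prefix}/${JSON.stringify(value)}`];
}

const headquartersPaths = toIndexPaths({ headquarters: "Belgium" });
const exportsPaths = toIndexPaths({ exports: [{ city: "Moscow" }, { city: "Athens" }] });
console.log(headquartersPaths, exportsPaths);
```

Because every leaf yields its own path, an equality or range predicate on any property can be answered from the paths without the author declaring that property in advance.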

Indexing Policies

Configuration                 Level            Options
Automatic                     Per collection   True (default) or False. Override with each document write.
Indexing Mode                 Per collection   Consistent or Lazy. Lazy for eventual updates/bulk ingestion.
Included and excluded paths   Per path         Individual path or recursive includes (? and *).
Indexing Type                 Per path         Supports Hash (default) and Range. Hash for equality, Range for range queries.
Indexing Precision            Per path         Supports 3 to 7 per path. Trade off storage, query RUs and write RUs.

Indexing Paths

Path                   Description/use case
/                      Default path for collection. Recursive and applies to whole document tree.
/"prop"/?              Serve queries like the following (with Hash or Range types respectively):
                       SELECT * FROM collection c WHERE c.prop = "value"
                       SELECT * FROM collection c WHERE c.prop > 5
/"prop"/*              All paths under the specified label.
/"prop"/"subprop"/     Used during query execution to prune documents that do not have the specified path.
/"prop"/"subprop"/?    Serve queries (with Hash or Range types respectively):
                       SELECT * FROM collection c WHERE c.prop.subprop = "value"
                       SELECT * FROM collection c WHERE c.prop.subprop > 5
