Azure DocumentDB: Advanced Features for Large-Scale Apps

{ "name": "Andrew Liu", "e-mail": "[email protected]", "twitter": "@aliuy8" }

Upload: andrew-liu

Post on 11-Feb-2017


TRANSCRIPT

Page 1: Azure DocumentDB: Advanced Features for Large Scale-Apps

Azure DocumentDB: Advanced Features for Large-Scale Apps

{ "name": "Andrew Liu", "e-mail": "[email protected]", "twitter": "@aliuy8"}

Page 2: Azure DocumentDB: Advanced Features for Large Scale-Apps

First… a Rant

Page 3: Azure DocumentDB: Advanced Features for Large Scale-Apps

managing servers makes me cry

Page 4: Azure DocumentDB: Advanced Features for Large Scale-Apps

structuring data is really hard

Page 5: Azure DocumentDB: Advanced Features for Large Scale-Apps

managing schema and indexes makes me angry

Page 6: Azure DocumentDB: Advanced Features for Large Scale-Apps

DocumentDB

NoSQL… as a service!

Page 7: Azure DocumentDB: Advanced Features for Large Scale-Apps

Let's talk about…

• A quick recap on NoSQL
• Big Data Challenges
• Partitioning, Data Modeling, Stored Procedures
• Q&A

Page 8: Azure DocumentDB: Advanced Features for Large Scale-Apps

NoSQL in a nutshell

• NoSQL is a buzzword
• NoSQL is varied
  • Key-value
  • Wide-column
  • Graph
  • Document-oriented

Page 9: Azure DocumentDB: Advanced Features for Large Scale-Apps

Perfect for these documents

{
  "name": "SmugMug",
  "permalink": "smugmug",
  "homepage_url": "http://www.smugmug.com",
  "blog_url": "http://blogs.smugmug.com/",
  "category_code": "photo_video",
  "products": [
    { "name": "SmugMug", "permalink": "smugmug" }
  ],
  "offices": [
    {
      "description": "",
      "address1": "67 E. Evelyn Ave",
      "address2": "",
      "zip_code": "94041",
      "city": "Mountain View",
      "state_code": "CA",
      "country_code": "USA",
      "latitude": 37.390056,
      "longitude": -122.067692
    }
  ]
}

Schema-agnostic JSON store for hierarchical and de-normalized data at scale

Page 10: Azure DocumentDB: Advanced Features for Large Scale-Apps

Not these documents

Page 12: Azure DocumentDB: Advanced Features for Large Scale-Apps

Azure DocumentDB

Blazing fast, planet-scale NoSQL service

Elastic, limitless scale
• Millions of RPS
• Many TBs of data
• Transparent partitioning

Guaranteed low latency
• <10 ms reads, <15 ms writes @ P99

Globally replicated
• Low-latency access around the globe!

Schema freedom
• Automatic indexing
• Easy-to-learn query grammar
• Multi-record transactions

99.99% SLAs for availability, latency, and throughput

Page 13: Azure DocumentDB: Advanced Features for Large Scale-Apps

How does this fit in the Azure family?

Page 14: Azure DocumentDB: Advanced Features for Large Scale-Apps

“If all you have is a hammer, everything looks like a nail”

- Abraham Maslow

Page 15: Azure DocumentDB: Advanced Features for Large Scale-Apps

The database renaissance!

Page 16: Azure DocumentDB: Advanced Features for Large Scale-Apps

Choose the right tools for the right job

Page 17: Azure DocumentDB: Advanced Features for Large Scale-Apps

Problem 1: Variety

Page 18: Azure DocumentDB: Advanced Features for Large Scale-Apps

Item                                      Author               Pages  Language
Harry Potter and the Sorcerer’s Stone     J.K. Rowling         309    English
Game of Thrones: A Song of Ice and Fire   George R.R. Martin   864    English

Page 19: Azure DocumentDB: Advanced Features for Large Scale-Apps

Item                                      Author               Pages  Language
Harry Potter and the Sorcerer’s Stone     J.K. Rowling         309    English
Game of Thrones: A Song of Ice and Fire   George R.R. Martin   864    English
Lenovo Thinkpad X1 Carbon                 ???                  ???    ???

Page 20: Azure DocumentDB: Advanced Features for Large Scale-Apps

!=

Page 22: Azure DocumentDB: Advanced Features for Large Scale-Apps

Item                                      Author               Pages  Language  Processor        Memory  Storage
Harry Potter and the Sorcerer’s Stone     J.K. Rowling         309    English   ???              ???     ???
Game of Thrones: A Song of Ice and Fire   George R.R. Martin   864    English   ???              ???     ???
Lenovo Thinkpad X1 Carbon                 ???                  ???    ???       Core i7 3.3 GHz  8 GB    256 GB SSD

What a waste of space…

Page 23: Azure DocumentDB: Advanced Features for Large Scale-Apps

More tables!

Item                                      Author               Pages  Language
Harry Potter and the Sorcerer’s Stone     J.K. Rowling         309    English
Game of Thrones: A Song of Ice and Fire   George R.R. Martin   864    English

Item                        CPU              Memory  Storage
Lenovo Thinkpad X1 Carbon   Core i7 3.3 GHz  8 GB    256 GB SSD

Okay… What if I have 100,000 product types?
Or what if I have varying features for a single product type?

Page 24: Azure DocumentDB: Advanced Features for Large Scale-Apps

ProductId  Item
1          Harry Potter and the Sorcerer’s Stone
2          Game of Thrones: A Song of Ice and Fire
3          Lenovo Thinkpad X1 Carbon

ProductId  Attribute  Value
1          Author     J.K. Rowling
1          Pages      309
…
2          Author     George R.R. Martin
2          Pages      864
…
3          Processor  Core i7 3.3 GHz
3          Memory     8 GB

Page 25: Azure DocumentDB: Advanced Features for Large Scale-Apps

{
  "ItemType": "Book",
  "Title": "Harry Potter and the Sorcerer's Stone",
  "Author": "J.K. Rowling",
  "Pages": "309",
  "Languages": [ "English", "Spanish", "Portuguese", "Russian", "French" ]
}

{
  "ItemType": "Laptop",
  "Name": "Lenovo Thinkpad X1 Carbon",
  "Processor": "Core i7 3.3 Ghz",
  "Memory": "8 GB DDR3L SDRAM",
  "Storage": "256 GB SSD",
  "Graphics": "Intel HD Graphics 4400",
  "Weight": "1 pound"
}

Page 26: Azure DocumentDB: Advanced Features for Large Scale-Apps

It just works.

Page 27: Azure DocumentDB: Advanced Features for Large Scale-Apps

Problem 2: Scale (Volume and Velocity)

Page 28: Azure DocumentDB: Advanced Features for Large Scale-Apps

Let’s begin with a Story

Page 29: Azure DocumentDB: Advanced Features for Large Scale-Apps

Indexing JSON and fighting zombies at SCALE

Page 30: Azure DocumentDB: Advanced Features for Large Scale-Apps

Next Games
• Game development studio based in Helsinki, Finland
• 65 employees
• Develops F2P mobile games for iOS and Android
• Based on own & licensed IP

The Walking Dead TV show
• Drama about a zombie walker apocalypse on AMC
• First cable drama to beat broadcast shows
• Most watched cable TV show in the US (16M users)

Page 31: Azure DocumentDB: Advanced Features for Large Scale-Apps

The Challenge: More Users, More Problems

• Scale with the expectation of millions of users on Day 1
• Deliver real-time responsiveness for a lag-free gaming experience
• Highly competitive: high scores and global leaderboards are critical

Page 32: Azure DocumentDB: Advanced Features for Large Scale-Apps
Page 33: Azure DocumentDB: Advanced Features for Large Scale-Apps

The Results

• #1 in Apple App Store free apps during launch week
• >1M downloads
• ~1B queries per day
• 99th-percentile (P99) queries served in under 10 ms

Page 34: Azure DocumentDB: Advanced Features for Large Scale-Apps

How?

Page 35: Azure DocumentDB: Advanced Features for Large Scale-Apps

Just throw some data in a database!

Page 37: Azure DocumentDB: Advanced Features for Large Scale-Apps

Not that easy…

Page 38: Azure DocumentDB: Advanced Features for Large Scale-Apps

Why is this such a hard problem?

• Caches: scoreboard keeps updating…
• SQL database: need to shard, schema and index management, loss of relational benefits
• Azure Table Storage: secondary indexes, latency, throughput

Page 39: Azure DocumentDB: Advanced Features for Large Scale-Apps

Planet-Scale NoSQL

• Horizontal scaling for storage and throughput
• High performance with SSDs and automatic indexing
• Operating on a global scale

Page 40: Azure DocumentDB: Advanced Features for Large Scale-Apps

Partitioning

Page 41: Azure DocumentDB: Advanced Features for Large Scale-Apps

Fact: Managing shards is really painful.

Page 42: Azure DocumentDB: Advanced Features for Large Scale-Apps

Elastic Scale

Page 43: Azure DocumentDB: Advanced Features for Large Scale-Apps

Good news: DocumentDB has done all the heavy lifting.

Page 44: Azure DocumentDB: Advanced Features for Large Scale-Apps

Request Units

Request Unit (RU) is the normalized currency (% Memory, % IOPS, % CPU)

Each replica gets a fixed budget of Request Units

Operation   Request
READ        GET resource
INSERT      POST resource
REPLACE     PUT resource
DELETE      DELETE resource
QUERY       POST documents (SQL)
EXECUTE     POST sprocs (args)

(Diagram: each request returns a resource or a resource set.)

Predictable performance

Most important metric in DocumentDB!
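
To make the idea of an RU budget concrete, here is a rough sketch of sizing a throughput budget from a workload mix. The per-operation RU charges and the request rates are made-up placeholders, not DocumentDB figures; real charges depend on document size, indexing policy, and query shape, and should be measured.

```javascript
// Hypothetical per-operation RU charges (assumptions for illustration only).
const assumedRuCost = { read: 1, insert: 5, query: 10 };

// Estimate the RU/s budget a workload needs: sum of (rate * cost) per operation.
function requiredRuPerSecond(opsPerSecond, ruCost) {
  let total = 0;
  for (const [op, rate] of Object.entries(opsPerSecond)) {
    if (!(op in ruCost)) throw new Error(`no RU cost for operation: ${op}`);
    total += rate * ruCost[op];
  }
  return total;
}

// Hypothetical workload: operations per second by type.
const workload = { read: 500, insert: 100, query: 20 };
const needed = requiredRuPerSecond(workload, assumedRuCost);
console.log(`estimated budget: ${needed} RU/s`);
```

Because every operation is priced in the same unit, a read-heavy and a write-heavy workload can be compared against one provisioned number.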

Page 45: Azure DocumentDB: Advanced Features for Large Scale-Apps

Partitioned Collections

Page 46: Azure DocumentDB: Advanced Features for Large Scale-Apps

What’s left? Choosing a Partition Key

Page 47: Azure DocumentDB: Advanced Features for Large Scale-Apps

Choosing a Partition Key

• Workload – Read vs Write heavy?
• Top Queries
• Transaction Boundary
• Avoid Storage + Performance Bottlenecks
• Multi-Tenancy: Tenant Size
• Examples: partition by tenant, device, timestamp, or composite
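
The bottleneck point above can be illustrated with a toy hash partitioner. This is only a sketch: the hash function, the partition count of 8, and the sample events are assumptions for illustration and say nothing about how DocumentDB actually places data.

```javascript
// Toy string hash (NOT DocumentDB's hashing scheme), kept as unsigned 32-bit.
function hashString(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Count how many documents land in each partition for a given key choice.
function countPerPartition(docs, keyOf, partitionCount) {
  const counts = new Array(partitionCount).fill(0);
  for (const doc of docs) {
    counts[hashString(String(keyOf(doc))) % partitionCount] += 1;
  }
  return counts;
}

// 1,000 hypothetical game events from 100 players on 3 platforms.
const platforms = ["ios", "android", "web"];
const events = [];
for (let i = 0; i < 1000; i++) {
  events.push({ playerId: `player-${i % 100}`, platform: platforms[i % 3] });
}

const byPlayer = countPerPartition(events, (d) => d.playerId, 8);
const byPlatform = countPerPartition(events, (d) => d.platform, 8);
console.log("by playerId:", byPlayer);
console.log("by platform:", byPlatform);
```

A key with only three distinct values can fill at most three partitions, so most of the provisioned storage and throughput sits idle while a few partitions run hot. A high-cardinality key such as the player ID spreads the same events far more widely.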

Page 48: Azure DocumentDB: Advanced Features for Large Scale-Apps

Creating partitioned collections

// Namespaces from the DocumentDB .NET SDK
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Pre-defined collections: performance is set by the offer type
DocumentCollection collectionSpec = new DocumentCollection { Id = "Walkers" };
RequestOptions options = new RequestOptions { OfferType = "S3" };

DocumentCollection documentCollection = await client.CreateDocumentCollectionAsync(
    "dbs/" + database.Id, collectionSpec, options);

// Partitioned collections: set a partition key path and a throughput in RU/s
DocumentCollection collectionSpec = new DocumentCollection { Id = "Walkers" };
collectionSpec.PartitionKey.Paths.Add("/walkerId");
int collectionThroughput = 100000;
RequestOptions options = new RequestOptions { OfferThroughput = collectionThroughput };

DocumentCollection documentCollection = await client.CreateDocumentCollectionAsync(
    "dbs/" + database.Id, collectionSpec, options);

Page 49: Azure DocumentDB: Advanced Features for Large Scale-Apps

Let's talk about a physics problem

Page 50: Azure DocumentDB: Advanced Features for Large Scale-Apps

Globally Distributed

• Not just for disaster recovery… DocumentDB is unreasonably highly available
• Replicate data across any number of regions of your choice
• Low-latency access to your data around the globe
• Dynamically configure your write and read regions

Azure DocumentDB gives you the ability to cheat the speed of light!

Page 51: Azure DocumentDB: Advanced Features for Large Scale-Apps

… with well-defined consistency models!

Strong | Bounded Staleness | Session | Eventual

Left to right: relaxed consistency => better performance and availability
(from strong consistency, high latency to eventual consistency, low latency)

Guarantee                   | Strong | Bounded Staleness                                                                        | Session                     | Eventual
----------------------------+--------+------------------------------------------------------------------------------------------+-----------------------------+---------
Total global order          | Yes    | Yes, outside of the “staleness window”                                                   | No, partial “session” order | No
Consistent prefix guarantee | Yes    | Yes                                                                                      | Yes                         | Yes
Monotonic reads             | Yes    | Yes, across regions outside of the staleness window and within a region all the time    | Yes, for the given session  | No
Monotonic writes            | Yes    | Yes                                                                                      | Yes                         | Yes
Read your writes            | Yes    | Yes (in the write region)                                                                | Yes                         | No

Observed Distribution (pie chart; segments 27%, 3%, 54%, 16%; legend: Bounded Staleness, Eventual, Session, Strong)
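
The "read your writes" row can be illustrated with a toy model of one write replica and one lagging read replica. This is a sketch under stated assumptions: the classes, the sequence-number session token, and the fallback-to-primary rule are invented for illustration and do not describe DocumentDB's internal protocol.

```javascript
// One replica: a key-value map plus the sequence number of the last applied write.
class Replica {
  constructor() { this.lsn = 0; this.data = new Map(); }
  apply(lsn, key, value) { this.data.set(key, value); this.lsn = lsn; }
}

// Toy store with a primary (write) replica and a secondary that lags behind.
class ToyStore {
  constructor() {
    this.primary = new Replica();
    this.secondary = new Replica();
    this.pending = []; // writes not yet replicated to the secondary
  }
  // Write to the primary; the returned sequence number acts as a session token.
  write(key, value) {
    const lsn = this.primary.lsn + 1;
    this.primary.apply(lsn, key, value);
    this.pending.push({ lsn, key, value });
    return lsn;
  }
  // Ship all queued writes to the secondary.
  replicate() {
    for (const w of this.pending) this.secondary.apply(w.lsn, w.key, w.value);
    this.pending = [];
  }
  // Eventual: read whatever the secondary has, however stale.
  readEventual(key) { return this.secondary.data.get(key); }
  // Session: use the secondary only if it has caught up to the session token.
  readSession(key, sessionToken) {
    const replica = this.secondary.lsn >= sessionToken ? this.secondary : this.primary;
    return replica.data.get(key);
  }
}

const store = new ToyStore();
const token = store.write("score", 42);
const staleRead = store.readEventual("score");         // secondary has not caught up yet
const sessionRead = store.readSession("score", token); // served by a caught-up replica
store.replicate();
const laterRead = store.readEventual("score");
console.log({ staleRead, sessionRead, laterRead });
```

The session read costs more than the eventual read in this model only when the nearby replica is behind, which is the trade-off the table summarizes.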

Page 52: Azure DocumentDB: Advanced Features for Large Scale-Apps

App-defined regional preferences

ConnectionPolicy docClientConnectionPolicy = new ConnectionPolicy
{
    ConnectionMode = ConnectionMode.Direct,
    ConnectionProtocol = Protocol.Tcp
};

docClientConnectionPolicy.PreferredLocations.Add(LocationNames.EastUS2);
docClientConnectionPolicy.PreferredLocations.Add(LocationNames.WestUS);

docClient = new DocumentClient(
    new Uri("https://myglobaldb.documents.azure.com:443"),
    "PARvqUuBw2QTO4rRXr6d1GnLCR7VinERcYrBQvDRh6EDTJLOHtZxgjTS4pv8nQv2Lg1QQLBLfO6TVziOZKvYow==",
    docClientConnectionPolicy);

Page 53: Azure DocumentDB: Advanced Features for Large Scale-Apps

Enjoy true schema-freedom

Page 54: Azure DocumentDB: Advanced Features for Large Scale-Apps

Automatic Indexing

• Index is a union of all the document trees

(Figure: the trees of documents 1 and 2, with their common structure highlighted.)

Terms                  Postings List/Values
$/location/0/          1, 2
location/0/country/    1, 2
location/0/city/       1, 2
0/country/Germany      1, 2
1/country/France       2
…                      …
0/city/Moscow          2
0/dealers/0            2

http://aka.ms/docdbvldb

No need to define secondary indices / schema hints!
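
The terms-and-postings idea can be sketched as a small inverted index over JSON paths. This is a simplified illustration, not DocumentDB's index format: it uses one full-path term per leaf value rather than the segment-based terms in the table, and the two sample documents are hypothetical.

```javascript
// Walk a JSON value and emit one term per leaf: the path followed by the leaf value.
function collectTerms(value, path, terms) {
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      collectTerms(child, `${path}/${key}`, terms);
    }
  } else {
    terms.push(`${path}/${value}`);
  }
  return terms;
}

// Build a map from term to postings list (the ids of documents containing the term).
function buildIndex(docsById) {
  const index = new Map();
  for (const [id, doc] of Object.entries(docsById)) {
    for (const term of new Set(collectTerms(doc, "$", []))) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(Number(id));
    }
  }
  return index;
}

// Hypothetical documents with different shapes; no schema is declared anywhere.
const sampleDocs = {
  1: { location: [{ country: "Germany", city: "Berlin" }, { country: "France", city: "Paris" }] },
  2: { location: [{ country: "Germany", city: "Bonn" }], dealers: [{ name: "Hans" }] },
};

const pathIndex = buildIndex(sampleDocs);
console.log(pathIndex.get("$/location/0/country/Germany"));
console.log(pathIndex.get("$/dealers/0/name/Hans"));
```

Every path that appears in any document becomes searchable as soon as the document is written, which is why no secondary index definitions are needed up front.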

Page 55: Azure DocumentDB: Advanced Features for Large Scale-Apps

Index policies

Customize index management, including storage overhead, throughput and query consistency

• Range, hash and spatial indexes
• Included and excluded paths
• Indexing mode: consistent or lazy
• Index precision
• Online, in-place index transformations

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        { "kind": "Range", "dataType": "Number", "precision": -1 },
        { "kind": "Hash", "dataType": "String", "precision": 3 },
        { "kind": "Spatial", "dataType": "Point" }
      ]
    }
  ],
  "excludedPaths": []
}

Page 56: Azure DocumentDB: Advanced Features for Large Scale-Apps

SQL Query Grammar

Query over schema-free JSON

-- Nested lookup against index
SELECT Books.Author
FROM Books
WHERE Books.Author.Name = "Leo Tolstoy"

-- Transformation, Filters, Array access
SELECT { Name: Books.Title, Author: Books.Author.Name }
FROM Books
WHERE Books.Price > 10 AND Books.Languages[0] = "English"

-- Joins, User Defined Functions (UDF)
SELECT CalculateRegionalTax(Books.Price, "USA", "WA")
FROM Books
JOIN LanguagesArr IN Books.Languages
WHERE LanguagesArr.Language = "Russian"

Page 57: Azure DocumentDB: Advanced Features for Large Scale-Apps

JavaScript as a Modern Day T-SQL

Page 58: Azure DocumentDB: Advanced Features for Large Scale-Apps

Transactional Integrated JavaScript

Page 60: Azure DocumentDB: Advanced Features for Large Scale-Apps

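Transactional Integrated JavaScript

Database (stored procedure):

function(playerId1, playerId2) {
    // Find both player documents; the results arrive in the callback.
    var result = __.filter(function (document) {
        return (document.id == playerId1 || document.id == playerId2);
    }, function (err, playersToSwap) {
        if (err) throw err;
        if (playersToSwap.length != 2) throw 'Unable to find both players, abort';

        var player1 = playersToSwap[0], player2 = playersToSwap[1];
        var player1ItemTemp = player1.item;
        player1.item = player2.item;
        player2.item = player1ItemTemp;

        // A thrown error aborts the transaction, so neither update is kept.
        __.replaceDocument(player1._self, player1, function (err) {
            if (err) throw 'Unable to update players, abort';
            __.replaceDocument(player2._self, player2, function (err) {
                if (err) throw 'Unable to update players, abort';
            });
        });
    });
    if (!result.isAccepted) throw 'Unable to query players, abort';
}

Client:

client.executeStoredProcedureAsync("procs/1234", ["MasterChief", "SolidSnake"])
    .then(function (response) {
        console.log("success!");
    }, function (err) {
        console.log("Failed to swap!", err);
    });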

Page 61: Azure DocumentDB: Advanced Features for Large Scale-Apps

Getting Started

Page 62: Azure DocumentDB: Advanced Features for Large Scale-Apps

Fully managed as a service

Page 63: Azure DocumentDB: Advanced Features for Large Scale-Apps
Page 64: Azure DocumentDB: Advanced Features for Large Scale-Apps

API and Toolchain Options

DocumentDB

REST over HTTPS/TCP

Java .NET

PowerBI

Page 65: Azure DocumentDB: Advanced Features for Large Scale-Apps

Tip: Data Modeling

Page 66: Azure DocumentDB: Advanced Features for Large Scale-Apps

Data modeling with denormalization

{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "addresses": [
    {
      "line1": "100 Some Street",
      "line2": "Unit 1",
      "city": "Seattle",
      "state": "WA",
      "zip": 98012
    }
  ],
  "contactDetails": [
    { "email": "[email protected]" },
    { "phone": "+1 555 555-5555", "extension": 5555 }
  ]
}

Try to model your entity as a self-contained document.

Generally, use embedded data models when:
• There are "contains" relationships between entities
• There are one-to-few relationships between entities
• Embedded data changes infrequently
• Embedded data won’t grow without bound
• Embedded data is integral to data in a document

Denormalizing typically provides better read performance

Page 67: Azure DocumentDB: Advanced Features for Large Scale-Apps

Data modeling with referencing

In general, use normalized data models when:
• Write performance is more important than read performance
• Representing one-to-many relationships
• Representing many-to-many relationships
• Related data changes frequently

Provides more flexibility than embedding, at the cost of more round trips to read data

User document:

{
  "id": "xyz",
  "username": "user xyz"
}

Address document:

{
  "id": "address_xyz",
  "userid": "xyz",
  "address": {
    …
  }
}

Contact details document:

{
  "id": "contact_xyz",
  "userid": "xyz",
  "email": "[email protected]",
  "phone": "555 5555"
}

Normalizing typically provides better write performance
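
The round-trip trade-off between embedding and referencing can be sketched with an in-memory stand-in for a document store that counts reads. Everything here is hypothetical: the `CountingStore` class, the helper functions, and the sample field values are illustrations, not part of any DocumentDB SDK.

```javascript
// In-memory stand-in for a document store; each read() models one round trip.
class CountingStore {
  constructor(docs) {
    this.docs = new Map(docs.map((d) => [d.id, d]));
    this.reads = 0;
  }
  read(id) {
    this.reads += 1;
    return this.docs.get(id);
  }
}

// Embedded model: everything about the user lives in one document.
const embeddedStore = new CountingStore([
  { id: "xyz", username: "user xyz", address: { city: "Seattle" }, contact: { phone: "555 5555" } },
]);

// Referenced model: user, address, and contact details are separate documents.
const referencedStore = new CountingStore([
  { id: "xyz", username: "user xyz" },
  { id: "address_xyz", userid: "xyz", address: { city: "Seattle" } },
  { id: "contact_xyz", userid: "xyz", phone: "555 5555" },
]);

function loadProfileEmbedded(store, userId) {
  return store.read(userId);
}

function loadProfileReferenced(store, userId) {
  const user = store.read(userId);
  const { address } = store.read(`address_${userId}`);
  const { phone } = store.read(`contact_${userId}`);
  return { ...user, address, contact: { phone } };
}

const profileA = loadProfileEmbedded(embeddedStore, "xyz");
const profileB = loadProfileReferenced(referencedStore, "xyz");
console.log("embedded reads:", embeddedStore.reads, "referenced reads:", referencedStore.reads);
```

The referenced model pays for its extra reads with cheaper writes: changing the phone number rewrites only the small contact document instead of the whole profile.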

Page 68: Azure DocumentDB: Advanced Features for Large Scale-Apps

No magic bullet

Think about how your data is going to be written and read, and model accordingly

Hybrid models ~ denormalize + reference + aggregate

Author document:

{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "countOfBooks": 3,
  "books": [1, 2, 3],
  "images": [
    { "thumbnail": "http://....png" },
    { "profile": "http://....png" }
  ]
}

Book document:

{
  "id": 1,
  "name": "DocumentDB 101",
  "authors": [
    { "id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png" },
    { "id": 2, "name": "William Wakefield", "thumbnail": "http://....png" }
  ]
}

Page 69: Azure DocumentDB: Advanced Features for Large Scale-Apps

Quick Tips

• De-normalize data where appropriate
• Collections != Tables
• Tuning / Perf
  • Consistency Levels
  • Index Policies
  • Understand Query Costs / Limits / Avoid Scans
  • Pre-aggregate where possible

Page 70: Azure DocumentDB: Advanced Features for Large Scale-Apps

Thank You

Get started with Azure DocumentDB
http://www.azure.com/docdb

Query Demo: https://www.documentdb.com/sql/demo

Andrew Liu
[email protected]
@aliuy8