azure documentdb

Italian Virtual Chapter – 19.10.2016 Azure DocumentDb Marco Parenzan Microsoft MVP for Azure Microsoft Azure Trainer @ Cloud Academy SAGL Community Lead 1nn0va [email protected] @marco_parenzan

Upload: marco-parenzan

Post on 11-Feb-2017


TRANSCRIPT

Page 1: Azure DocumentDb

Italian Virtual Chapter – 19.10.2016

Azure DocumentDb
Marco Parenzan

Microsoft MVP for Azure
Microsoft Azure Trainer @ Cloud Academy SAGL

Community Lead [email protected]

@marco_parenzan

Page 2: Azure DocumentDb

Document Db
◇ Fully managed
◇ Schema agnostic
◇ Scalable
◇ Tunable consistency levels
◇ Tunable indexing policies
◇ Familiar SQL syntax for querying
◇ JavaScript execution

Page 3: Azure DocumentDb

Documents

Page 4: Azure DocumentDb

Developer Appeal
◇ A document is a JSON document
◇ DocumentDb is a schemaless Db
◇ Resilient to iterative schema changes
◇ Promotes code-first development (mapping objects to JSON)
◇ Low impedance as object / JSON store; no ORM required
◇ Richer query and indexing (compared to KV stores)
◇ It just works
◇ It's fast
◇ It's great for Catalog Data, Preference and State, Event Store, User Generated Content, Data Exchange

Page 5: Azure DocumentDb

Train yourself with ViewModels
◇ Implement a real contract: something to exchange from Presentation to BL/DA
◇ ViewModel = a model that is functional just for presentation, not persistence
  ■ No more Ids
  ■ No more null fields
  ■ No more grayed/hidden fields
  ■ No more graphs
  ■ No more joins
  ■ Not many roles per entity (just one)
◇ Well represented in JSON

Page 6: Azure DocumentDb

Come as you are

Data normalization
ORM

Embedding vs. Referencing

Page 7: Azure DocumentDb

embed reference

Embedding vs. referencing

Page 8: Azure DocumentDb

Referencing
◇ Representing one-to-many relationships.
◇ Representing many-to-many relationships.
◇ Related data changes frequently.
◇ Referenced data could be unbounded
◇ Provides more flexibility than embedding
  ■ More round trips to read data
◇ Normalizing typically provides better write performance

Page 9: Azure DocumentDb

Embedding
◇ There are "contains" relationships between entities.
◇ There are one-to-few relationships between entities.
◇ There is embedded data that changes infrequently.
◇ There is embedded data that won't grow without bound.
◇ There is embedded data that is integral to data in a document.
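
The two modeling styles can be compared with a minimal sketch. The order/customer document shapes and the in-memory `fetchById` helper are hypothetical illustrations, not taken from the talk; the point is only that a referenced model costs one read per related document, while an embedded model is read in a single round trip.

```javascript
// Embedded: the order "contains" its customer and line items (one-to-few, bounded).
const embeddedOrder = {
  id: "order-1",
  customer: { id: "cust-1", name: "Contoso" },
  items: [
    { sku: "A1", qty: 2 },
    { sku: "B7", qty: 1 }
  ]
};

// Referenced: related data lives in its own documents and is linked by id.
const store = new Map([
  ["order-1", { id: "order-1", customerId: "cust-1", itemIds: ["item-1", "item-2"] }],
  ["cust-1", { id: "cust-1", name: "Contoso" }],
  ["item-1", { id: "item-1", sku: "A1", qty: 2 }],
  ["item-2", { id: "item-2", sku: "B7", qty: 1 }]
]);

// Hypothetical in-memory stand-in for a document read; it counts round trips.
let roundTrips = 0;
function fetchById(id) {
  roundTrips += 1;
  return store.get(id);
}

// Resolving a referenced order needs one read per referenced document.
function loadReferencedOrder(orderId) {
  const order = fetchById(orderId);
  return {
    id: order.id,
    customer: fetchById(order.customerId),
    items: order.itemIds.map(id => fetchById(id))
  };
}

const resolved = loadReferencedOrder("order-1");
console.log(roundTrips, resolved.items.length);
```

The referenced version pays in reads but lets the customer or an item change in one place, which matches the "related data changes frequently" guidance.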

Page 10: Azure DocumentDb

Resource Model
◇ DocumentDb is Platform as a Service
  ■ No on-premises version
◇ RESTful API
  ■ All DocDb elements are public and accessible through a resource URI
◇ Resource
  ■ JSON resources

Page 11: Azure DocumentDb

Resource Model Items

Database Account > Databases > Collections > Documents
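
As a rough sketch of how this hierarchy shows up in resource URIs, the helpers below build id-based links, assuming the `dbs/{database}/colls/{collection}/docs/{document}` layout. The function names and the sample ids are made up for illustration.

```javascript
// Builds id-based resource links that mirror the
// database > collection > document hierarchy.
function databaseLink(databaseId) {
  return "dbs/" + databaseId;
}

function collectionLink(databaseId, collectionId) {
  return databaseLink(databaseId) + "/colls/" + collectionId;
}

function documentLink(databaseId, collectionId, documentId) {
  return collectionLink(databaseId, collectionId) + "/docs/" + documentId;
}

// Sample ids are hypothetical.
const link = documentLink("library", "books", "war-and-peace");
console.log(link);
```

Each level of the link is itself an addressable resource, which is what "accessible as Resource Uri" refers to.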

Page 12: Azure DocumentDb

Database Account
◇ Unit of Authorization
◇ Unit of Consistency

Page 13: Azure DocumentDb

Unit of Authorization
◇ Master keys
  ■ Upon creation of a DocumentDB account, two master keys (primary and secondary) are created. These keys enable full administrative access to all resources within the DocumentDB account.
◇ Read-only keys
  ■ Upon creation of a DocumentDB account, two read-only keys (primary and secondary) are created. These keys enable read-only access to all resources within the DocumentDB account.

Page 14: Azure DocumentDb

Unit of Consistency
◇ Query / transaction throughput (and reliability, i.e., hardware failure) depend on replication!
  ■ All writes to the primary are replicated across two secondary replicas
  ■ All reads are distributed across three copies
  ■ "Scalability of throughput": allowing different clients to read from different replicas helps prevent bottlenecks
◇ BUT replication takes time!
  ■ Potential scenario: some clients are reading while another is writing
  ■ Now, the data is stale (out-of-date), inconsistent!
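
The stale-read scenario can be reproduced with a toy model. This is a conceptual sketch only: the `ReplicaSet` class and its explicit `replicate()` step are assumptions made for illustration and say nothing about how the service is implemented.

```javascript
// Toy model: one primary and two secondaries; replication is an explicit step.
class ReplicaSet {
  constructor() {
    this.primary = new Map();
    this.secondaries = [new Map(), new Map()];
    this.pending = []; // writes not yet applied to the secondaries
  }

  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  // Applies queued writes to the secondaries ("replication takes time").
  replicate() {
    for (const [key, value] of this.pending) {
      for (const replica of this.secondaries) replica.set(key, value);
    }
    this.pending = [];
  }

  // replicaIndex 0 is the primary, 1 and 2 are the secondaries.
  read(key, replicaIndex) {
    const replica = replicaIndex === 0 ? this.primary : this.secondaries[replicaIndex - 1];
    return replica.get(key);
  }
}

const replicas = new ReplicaSet();
replicas.write("stock", 10);
replicas.replicate();
replicas.write("stock", 9);                    // one client writes...
const fresh = replicas.read("stock", 0);       // read served by the primary
const stale = replicas.read("stock", 1);       // secondary has not caught up yet
replicas.replicate();
const caughtUp = replicas.read("stock", 1);    // same secondary after replication
console.log(fresh, stale, caughtUp);
```

Spreading reads over the secondaries is what buys throughput, and it is also exactly where the stale value comes from.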

Page 15: Azure DocumentDb

Tweakable Consistency
◇ Trade-off: speed (performance & availability) or consistency (data correctness)?
  ■ "Does every read need the MOST current data?"
  ■ "Or do I need every request to be handled and handled quickly?"
◇ 4 options …
  ■ Strong, Session, Bounded Staleness, Eventual
  ■ Default consistency for the entire Db…
  ■ At collection basis in a future release
  ■ On query basis (optional parameter on CreateDocumentQuery method)

Page 16: Azure DocumentDb

Stale data
◇ ViewModel is state
  ■ ViewModel is state disconnected from our truth (the DB!)
  ■ ViewModel is state duplicated from the DB
  ■ Many users can duplicate state from the DB
◇ So… which is reality?
  ■ You have STALE data, you have a lot of smell
◇ What smells?
  ■ Copies of the data that are not the truth
◇ An entity can be a lie, because it says "that will be the state"

Page 17: Azure DocumentDb

CAP Theorem
◇ Consistency:
  ■ All nodes should see the same data at the same time
◇ Availability:
  ■ Node failures do not prevent survivors from continuing to operate
◇ Partition-tolerance:
  ■ The system continues to operate despite network partitions
◇ A distributed system can satisfy any two of these guarantees at the same time but not all three

Page 18: Azure DocumentDb

Strong
◇ Client always sees completely consistent data
◇ Slowest reads / writes
◇ Mission critical: e.g. stock market, banking, airline reservation

Page 19: Azure DocumentDb

Session
◇ Default: even trade-off between performance & availability vs. data correctness
◇ Client reads its own writes, but other clients reading this same data might see older values

Page 20: Azure DocumentDb

Bounded Staleness
◇ Client might see old data, but it can specify a limit for how old that data can be (e.g. 2 seconds)
◇ Updates happen in order received
◇ Similar to Session consistency, but speeds up reads while still preserving the order of updates

Page 21: Azure DocumentDb

Eventual
◇ Client might see old data for as long as it takes a write to propagate to all replicas
◇ High performance & availability, but a client might sometimes read out-of-date information or see updates out of order
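
One way to keep the four levels apart is to ask, for a given read, whether each level would accept the answer a lagging replica can give. The sketch below is a deliberately simplified model: the version counters, the lag fields and the `isReadAcceptable` function are all assumptions for illustration, not the service's actual mechanism.

```javascript
// Simplified model: every write bumps a version number; a replica is described
// by the version it has applied and by how far behind it is in time.
function isReadAcceptable(level, read) {
  switch (level) {
    case "Strong":
      // Only the most recent version is acceptable.
      return read.replicaVersion === read.latestVersion;
    case "BoundedStaleness":
      // Old data is fine as long as it is not older than the configured bound.
      return read.replicaLagSeconds <= read.maxLagSeconds;
    case "Session":
      // The client must at least see its own last write.
      return read.replicaVersion >= read.sessionVersion;
    case "Eventual":
      // Any replica state is acceptable.
      return true;
    default:
      throw new Error("Unknown consistency level: " + level);
  }
}

// A replica that is two versions and 1.5 seconds behind the primary.
const laggingRead = {
  latestVersion: 42,
  replicaVersion: 40,
  sessionVersion: 40,
  replicaLagSeconds: 1.5,
  maxLagSeconds: 2
};

const verdicts = ["Strong", "BoundedStaleness", "Session", "Eventual"]
  .map(level => level + "=" + isReadAcceptable(level, laggingRead));
console.log(verdicts.join(", "));
```

The same lagging replica is rejected by Strong and accepted by the three weaker levels, which is the performance versus correctness trade-off in miniature.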

Page 22: Azure DocumentDb

Setting Consistency
◇ At the database level (see preview portal)
◇ On a per-read or per-query basis (optional parameter on CreateDocumentQuery method)

Page 23: Azure DocumentDb

Globally Distributed
◇ Azure DocumentDB gives you the ability to cheat the speed of light!
◇ Not just for disaster recovery… DocumentDB is unreasonably highly available
◇ Replicate data across any number of regions of your choice
◇ Low-latency access to your data around the globe
◇ Dynamically configure your write and read regions

Page 24: Azure DocumentDb

Databases
◇ Unit of Namespace

Page 25: Azure DocumentDb

Collections

Page 26: Azure DocumentDb

DocumentDb Performance
◇ Data is saved on SSD
◇ All writes to the primary are replicated across two secondary replicas
  ■ (Replicas are spread on different hardware in the same region to protect against failures)
◇ All reads are distributed across the three copies (when and how depends on the consistency level for the db account and query)

Page 27: Azure DocumentDb

Collections
◇ A unit of scale for transactions
  ■ for stored procedures and triggers
◇ A unit of query throughput
  ■ (capacity units allocated uniformly across all collections)
◇ A unit of replication
  ■ A collection is replicated three times
◇ A container of JSON documents
  ■ JSON docs inside of a collection can vary dramatically

Page 28: Azure DocumentDb

Collections
[Diagram: resource hierarchy showing Database Account, Databases, Users, Permissions, Collections, Documents, Attachments, Stored Procedures (JS), Triggers (JS), User Defined Functions (JS)]

Page 29: Azure DocumentDb

Unit of query throughput
◇ Collection-based RU reservation
  ■ (Capacity units allocated uniformly across all collections)
◇ Standard pricing tier with hourly billing
◇ Performance levels can be adjusted
◇ Each collection = 10GB of SSD
  ■ Limit of 100 collections (1 TB)
  ■ Soft limit, can be lifted as needed per account (with Support)

Page 30: Azure DocumentDb

Performance levels

Page 31: Azure DocumentDb

Request Units
◇ Predictable Performance
◇ Each DocumentDB collection has reserved throughput in terms of request units (RUs)
◇ Normalized currency across database operations
◇ RU=
◇ RUs offer accurate accounting in face of diverse database operations

Operation | RU Consumed
Reading a single 1KB document | 1
Reading a single 2KB document | 2
Query with a simple predicate for a 1KB document | 3
Creating a single 1 KB document with 10 JSON properties (consistent indexing) | 14
Creating a single 1 KB document with 100 JSON properties (consistent indexing) | 20
Replacing a single 1 KB document | 28
Executing a stored procedure with two create documents | 30
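
Because RUs are a single currency, the table can be turned into a back-of-the-envelope capacity estimate. The sketch below copies the per-operation costs from the table above; the operation names and the sample workload are hypothetical.

```javascript
// Per-operation costs copied from the table (RUs per operation).
const RU_COST = {
  read1KB: 1,
  read2KB: 2,
  query1KB: 3,
  create1KB10Props: 14,
  create1KB100Props: 20,
  replace1KB: 28,
  sprocTwoCreates: 30
};

// workload maps an operation name to operations per second.
function requiredRuPerSecond(workload) {
  return Object.entries(workload).reduce((total, [operation, perSecond]) => {
    if (!(operation in RU_COST)) throw new Error("Unknown operation: " + operation);
    return total + RU_COST[operation] * perSecond;
  }, 0);
}

// Hypothetical workload: 100 reads/s, 10 creates/s, 5 replaces/s.
// 100 * 1 + 10 * 14 + 5 * 28 = 380 RU/s
const needed = requiredRuPerSecond({ read1KB: 100, create1KB10Props: 10, replace1KB: 5 });
console.log(needed);
```

Note how 15 writes per second cost almost three times as much as 100 reads per second in this example; writes dominate the budget.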

Page 32: Azure DocumentDb

DEMO

Page 33: Azure DocumentDb

Partitioning

Page 34: Azure DocumentDb

Why Partition?
◇ Data Size
  ■ A single collection holds 10GB
◇ Throughput
  ■ 3 performance tiers with a max of 2,500 RU/sec
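
Those two limits translate directly into a minimum number of collections. The following sketch uses the 10GB and 2,500 RU/sec figures from the slide and assumes, optimistically, that data and load spread evenly across collections; the function name and sample inputs are illustrative.

```javascript
const COLLECTION_SIZE_GB = 10;   // storage limit per collection, per the slide
const COLLECTION_MAX_RU = 2500;  // throughput limit per collection, per the slide

// Returns the minimum number of collections for a given data size and throughput,
// assuming both are spread evenly across the collections.
function collectionsNeeded(dataSizeGb, ruPerSecond) {
  const bySize = Math.ceil(dataSizeGb / COLLECTION_SIZE_GB);
  const byThroughput = Math.ceil(ruPerSecond / COLLECTION_MAX_RU);
  return Math.max(1, bySize, byThroughput);
}

const storageBound = collectionsNeeded(35, 1000);    // 35GB needs 4 collections
const throughputBound = collectionsNeeded(5, 6000);  // 6,000 RU/s needs 3 collections
console.log(storageBound, throughputBound);
```

Whichever limit is hit first decides the partition count, so a small but very hot data set still has to be partitioned.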

Page 35: Azure DocumentDb

Partitioning our data
[Diagram: Request, Collection]

Page 36: Azure DocumentDb

Partitioning our data
[Diagram: Requests, Partition 1, Partition 2, logical grouping]

Page 37: Azure DocumentDb

Partitioning - Hash
◇ Evenly distribute across n partitions (algorithmic)
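
A minimal sketch of hash partitioning on the client side follows. The hash function (a djb2 variant), the collection names and the `resolveCollection` helper are assumptions chosen for illustration; any stable hash would serve the same purpose.

```javascript
// Simple deterministic string hash (djb2 variant); any stable hash would do.
function hashKey(key) {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    hash = ((hash * 33) ^ key.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash;
}

// Maps a partition key to one of n collections.
function resolveCollection(partitionKey, collections) {
  return collections[hashKey(partitionKey) % collections.length];
}

// Collection names are hypothetical.
const collections = ["coll0", "coll1", "coll2", "coll3"];
const target = resolveCollection("user-42", collections);
console.log(target);
```

The same key always lands in the same collection, so reads can go straight to it. The cost of this simple modulo scheme is that changing n remaps most keys.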

Page 38: Azure DocumentDb

Partitioning - Range
◇ Keep current data hot
◇ Warm historical data
◇ Scale down older data
◇ Purge / archive
[Diagram: brace marking the current period]
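
Range partitioning by time can be sketched as a small lookup over date intervals. The half-year ranges and collection names below are hypothetical; ISO date strings are used because they compare correctly as plain strings.

```javascript
// Each range covers [from, to) in ISO dates and maps to a collection name.
const ranges = [
  { from: "2016-01-01", to: "2016-07-01", collection: "events-2016-h1" },
  { from: "2016-07-01", to: "2017-01-01", collection: "events-2016-h2" }
];

// Finds the collection whose date range contains the given ISO date.
function resolveByDate(isoDate, rangeMap) {
  const match = rangeMap.find(r => isoDate >= r.from && isoDate < r.to);
  if (!match) throw new Error("No partition covers " + isoDate);
  return match.collection;
}

const current = resolveByDate("2016-10-19", ranges);
const historical = resolveByDate("2016-03-01", ranges);
console.log(current, historical);
```

With this layout the collection for the current period can stay at a high performance level while older ones are scaled down or archived as a whole.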

Page 39: Azure DocumentDb

Partitioning - Lookup
◇ Home tenant / user to a specific partition. Use "master" lookup.

Tenant | Partition Id
Customer | 1
Big Customer | 2
Another | 3

◇ Cache this shard map to avoid making the lookup the bottleneck
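
The lookup approach with a cached shard map can be sketched as follows. The tenant table mirrors the one on the slide; `lookupInMaster` is a hypothetical in-memory stand-in for the real "master" lookup, and the counter exists only to show that the cache is hit.

```javascript
// Hypothetical "master" lookup table; in practice it would be stored remotely.
const masterShardMap = { "Customer": 1, "Big Customer": 2, "Another": 3 };

let masterLookups = 0;
function lookupInMaster(tenant) {
  masterLookups += 1;
  return masterShardMap[tenant];
}

// Local cache of the shard map so repeated requests skip the master lookup.
const shardCache = new Map();

function resolvePartition(tenant) {
  if (!shardCache.has(tenant)) {
    const partitionId = lookupInMaster(tenant);
    if (partitionId === undefined) throw new Error("Unknown tenant: " + tenant);
    shardCache.set(tenant, partitionId);
  }
  return shardCache.get(tenant);
}

const first = resolvePartition("Big Customer");
const second = resolvePartition("Big Customer");
console.log(first, second, masterLookups);
```

A real cache would also need an invalidation strategy for when a tenant is moved to another partition; that is left out here.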

Page 40: Azure DocumentDb

Indexing

Page 41: Azure DocumentDb

Index policies
◇ Customize index management, including storage overhead, throughput and query consistency
  ■ range, hash and spatial indexes
  ■ included and excluded paths
  ■ indexing mode: consistent or lazy
  ■ index precision
  ■ online, in-place index transformations

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        { "kind": "Range", "dataType": "Number", "precision": -1 },
        { "kind": "Hash", "dataType": "String", "precision": 3 },
        { "kind": "Spatial", "dataType": "Point" }
      ]
    }
  ],
  "excludedPaths": []
}

Page 42: Azure DocumentDb

Indexing Policies

Configuration | Level | Options
Automatic | Per collection | True (default) or False; override with each document write
Indexing Mode | Per collection | Consistent or Lazy; Lazy for eventual updates / bulk ingestion
Included and excluded paths | Per path | Individual path or recursive includes (? and *)
Indexing Type | Per path | Supports Hash (default) and Range; Hash for equality, Range for range queries
Indexing Precision | Per path | Supports 3 to 7 per path; trade-off between storage, query RUs and write RUs

Page 43: Azure DocumentDb

Indexing Paths

Path: /
Description/use case: Default path for collection. Recursive and applies to whole document tree.

Path: /"prop"/?
Description/use case: Serves queries like the following (with Hash or Range types respectively):
  SELECT * FROM collection c WHERE c.prop = "value"
  SELECT * FROM collection c WHERE c.prop > 5

Path: /"prop"/*
Description/use case: All paths under the specified label.

Path: /"prop"/"subprop"/
Description/use case: Used during query execution to prune documents that do not have the specified path.

Path: /"prop"/"subprop"/?
Description/use case: Serves queries (with Hash or Range types respectively):
  SELECT * FROM collection c WHERE c.prop.subprop = "value"
  SELECT * FROM collection c WHERE c.prop.subprop > 5

Page 44: Azure DocumentDb

Indexing tips
◇ Use lazy indexing for faster peak time ingestion rates
◇ Exclude unused paths from indexing for faster writes
◇ Specify range index path type for all paths used in range queries
◇ Vary index precision for write vs query performance vs storage tradeoffs
◇ http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/
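
The first three tips can be combined in a small helper that assembles an indexing policy object. This is a sketch under assumptions: the `buildIndexingPolicy` function, its option names and the sample paths are invented for illustration, the policy shape follows the JSON shown on the index policies slide, and the exact path syntax accepted by the service should be checked against its documentation.

```javascript
// Assembles an indexing policy object following the shape shown on the
// "Index policies" slide. Option names are hypothetical.
function buildIndexingPolicy(options) {
  const policy = {
    // Tip: lazy indexing favors ingestion rate over immediately consistent indexes.
    indexingMode: options.lazy ? "lazy" : "consistent",
    automatic: true,
    includedPaths: [
      {
        path: "/*",
        indexes: [
          { kind: "Hash", dataType: "String", precision: 3 },
          { kind: "Range", dataType: "Number", precision: -1 }
        ]
      }
    ],
    // Tip: paths that are never queried are excluded to make writes cheaper.
    excludedPaths: (options.excluded || []).map(path => ({ path: path }))
  };

  // Tip: paths used in range queries get an explicit Range index.
  for (const path of options.rangePaths || []) {
    policy.includedPaths.push({
      path: path,
      indexes: [{ kind: "Range", dataType: "String", precision: -1 }]
    });
  }
  return policy;
}

// Sample paths are made up.
const policy = buildIndexingPolicy({
  lazy: true,
  excluded: ["/rawPayload/*"],
  rangePaths: ["/timestamp/?"]
});
console.log(JSON.stringify(policy, null, 2));
```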

Page 45: Azure DocumentDb

Querying

Page 46: Azure DocumentDb

Query
◇ Query over heterogeneous documents without defining schema or managing indexes
◇ Query arbitrary paths, properties and values without specifying secondary indexes or indexing hints
◇ Execute queries with consistent results
◇ Supported SQL features: predicates, iterations (arrays), sub-queries, logical operators, UDFs, intra-document JOINs, JSON transforms
◇ In general, more predicates result in a larger request charge.
◇ Additional predicates can help if they result in narrowing the overall result set.

LINQ Query

from book in client.CreateDocumentQuery<Book>(collectionSelfLink)
where book.Title == "War and Peace"
select book;

from book in client.CreateDocumentQuery<Book>(collectionSelfLink)
where book.Author.Name == "Leo Tolstoy"
select book.Author;

SQL Query Grammar

-- Nested lookup against index
SELECT B.Author
FROM Books B
WHERE B.Author.Name = "Leo Tolstoy"

-- Transformation, Filters, Array access
SELECT { Name: B.Title, Author: B.Author.Name }
FROM Books B
WHERE B.Price > 10 AND B.Language[0] = "English"

-- Joins, User Defined Functions (UDF)
SELECT udf.CalculateRegionalTax(B.Price, "USA", "WA")
FROM Books B
JOIN L IN B.Languages
WHERE L.Language = "Russian"
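
The intra-document JOIN in the last query is easy to misread as a join between collections. A plain JavaScript emulation may help: it pairs each book with the elements of its own Languages array and then filters. The sample documents, the 10% tax rate and the `calculateRegionalTax` stand-in are all made up for this sketch.

```javascript
// Sample documents, invented for illustration.
const books = [
  { Title: "War and Peace", Price: 20, Languages: [{ Language: "Russian" }, { Language: "English" }] },
  { Title: "Hamlet", Price: 12, Languages: [{ Language: "English" }] }
];

// Hypothetical stand-in for udf.CalculateRegionalTax; the flat 10% rate is made up.
function calculateRegionalTax(price, country, state) {
  return Math.round(price * 0.1 * 100) / 100;
}

// FROM Books B JOIN L IN B.Languages:
// pair each book with each element of its own Languages array.
function* joinLanguages(docs) {
  for (const B of docs) {
    for (const L of B.Languages) yield { B, L };
  }
}

// WHERE L.Language = "Russian", then SELECT udf.CalculateRegionalTax(B.Price, ...)
const taxes = [...joinLanguages(books)]
  .filter(({ L }) => L.Language === "Russian")
  .map(({ B }) => calculateRegionalTax(B.Price, "USA", "WA"));
console.log(taxes);
```

The join never leaves the document: a book with two languages yields two pairs, and a book with no matching language drops out of the result.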

Page 47: Azure DocumentDb

DEMO

Page 48: Azure DocumentDb

Programmability

Page 49: Azure DocumentDb

Query with user-defined function
◇ The complexity of a query impacts the request units consumed for an operation:
◇ Use of user-defined functions (UDFs)
  ■ SELECT or WHERE clauses
◇ To take advantage of indexing, try to have at least one filter against an indexed property when leveraging a UDF in the WHERE clause.

function region(doc) {
    // Maps the numeric region code of a document to a readable name.
    switch (doc.Location.Region) {
        case 0: return "North";
        case 1: return "Middle";
        case 2: return "South";
    }
}

Page 50: Azure DocumentDb

Executing Stored Procedures
◇ Execute “explicit” JavaScript code on a collection

function count(filterQuery, continuationToken) {
    var collection = getContext().getCollection();

    // Max number of docs to process in one batch; when reached, return to the
    // client and request continuation. Intentionally set low to demonstrate
    // the concept. This can be much higher. Try experimenting.
    // We've had it in the high thousands before seeing the stored procedure timing out.
    var maxResult = 25;

    // The number of documents counted.
    var result = 0;

    // tryQuery is a helper whose definition is not shown on the slide.
    tryQuery(continuationToken);
}
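
Since the slide cuts the procedure off before `tryQuery` is defined, here is one way the complete pattern could look. This is a sketch, not the speaker's code: the helpers `tryQuery`, `onPage` and `respond` are assumptions, and `makeMockContext` is a tiny in-memory mock of the server-side context that exists only so the procedure can be run locally with Node.

```javascript
// Possible completion of the counting procedure; helper names are assumptions.
function count(filterQuery, continuationToken) {
  var collection = getContext().getCollection();
  var maxResult = 25; // page size per query round
  var result = 0;     // number of documents counted so far

  tryQuery(continuationToken);

  function tryQuery(nextToken) {
    var options = { continuation: nextToken, pageSize: maxResult };
    // queryDocuments returns false when the request is not accepted,
    // for example because the procedure is running out of time.
    var accepted = collection.queryDocuments(
      collection.getSelfLink(), filterQuery, options, onPage);
    if (!accepted) respond(nextToken);
  }

  function onPage(err, docs, responseOptions) {
    if (err) throw err;
    result += docs.length;
    if (responseOptions.continuation) tryQuery(responseOptions.continuation);
    else respond(null);
  }

  // Returns the partial count plus the token the client needs to resume.
  function respond(token) {
    getContext().getResponse().setBody({ count: result, continuationToken: token });
  }
}

// Minimal in-memory mock of the server-side context, for local runs only.
function makeMockContext(docs) {
  var body;
  var response = {
    setBody: function (b) { body = b; },
    getBody: function () { return body; }
  };
  var collection = {
    getSelfLink: function () { return "dbs/db/colls/coll"; },
    queryDocuments: function (link, query, options, callback) {
      var start = Number(options.continuation || 0);
      var page = docs.slice(start, start + options.pageSize);
      var end = start + page.length;
      callback(undefined, page, { continuation: end < docs.length ? String(end) : undefined });
      return true;
    }
  };
  return {
    getCollection: function () { return collection; },
    getResponse: function () { return response; }
  };
}

var mock = makeMockContext(new Array(60).fill({}));
globalThis.getContext = function () { return mock; };
count("SELECT * FROM c", undefined);
var counted = mock.getResponse().getBody();
console.log(counted);
```

When the procedure responds with a non-null continuation token, the client is expected to call it again with that token and add up the partial counts.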

Page 51: Azure DocumentDb

Triggers
◇ Execute “implicit” JavaScript code on CRUD operations (Insert, Update, Delete) on collections

function normalize() {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var doc = getContext().getRequest().getBody();

    var newDoc = {
        "Sensor": { "Id": doc.sensorId, "Class": 0 },
        "Degree": { "Value": doc.degreeValue, "Type": 0 },
        "Location": {
            "Name": doc.locationName,
            "Region": doc.locationRegion,
            "Longitude": doc.locationLong,
            "Latitude": doc.locationLat
        },
        "id": doc.id
    };

    // Update the request -- this is what is going to be inserted.
    getContext().getRequest().setBody(newDoc);
}

Page 52: Azure DocumentDb

Conclusions

Page 53: Azure DocumentDb

Conclusions
◇ DocumentDb is a RESTful service
◇ Documents define the unit of cost, measured in Request Units
◇ Database Account defines accessibility and consistency
◇ Database is a namespace placeholder
◇ Collections (containers) are the unit of scale

Page 54: Azure DocumentDb

Usage: what is DocumentDb for?
◇ User generated content
◇ Many specific data (varbinary(MAX) in SQL)
◇ Catalog data
◇ Log data
◇ User preferences data
◇ Device sensor data
◇ IoT use cases commonly share some patterns in how they ingest, process and store data. First, these systems allow for data intake that can ingest bursts of data from device sensors of various locales. Next, these systems process and analyze streaming data to derive real-time insights. And last but not least, most if not all data will eventually land in a data store for ad hoc querying and offline analytics.

Page 55: Azure DocumentDb

Any questions?
You can find me at: [email protected]/@marco_parenzan

Thanks!