Introducing Azure DocumentDB - NoSQL, No Problem
Post on 11-Feb-2017
Hello DocumentDB
NoSQL, No Problem
{ "name": "Andrew Liu", "e-mail": "andrl@microsoft.com", "twitter": "@aliuy8"}
First… a Rant
managing servers makes me cry
structuring data is really hard
backfilling data and managing indexes makes me angry
A Solution
DocumentDB
document-database… as a service!
What’s a document database?
• Part of the NoSQL family of databases
• Built for simplicity, scale and performance
• Non-relational, no schema enforced by the database
• Flexible query options
Great for these documents …
Not ideal for these documents …
Definitely not for these documents …
Heterogeneous data
Item                                      Author               Pages   Language
Harry Potter and the Sorcerer’s Stone     J.K. Rowling         309     English
Game of Thrones: A Song of Ice and Fire   George R.R. Martin   864     English
Lenovo Thinkpad X1 Carbon                 ???                  ???     ???
Laptop != Book
{
  "ItemType": "Book",
  "Title": "Harry Potter and the Sorcerer's Stone",
  "Author": "J.K. Rowling",
  "Pages": "309",
  "Languages": [ "English", "Spanish", "Portuguese", "Russian", "French" ]
}
{
  "ItemType": "Laptop",
  "Name": "Lenovo Thinkpad X1 Carbon",
  "Processor": "Core i7 3.3 Ghz",
  "Memory": "8 GB DDR3L SDRAM",
  "Storage": "256 GB SSD",
  "Graphics": "Intel HD Graphics 4400",
  "Weight": "1 pound"
}
Rapid Iterative Development
3rd Party Data
It just works.
A fully managed, scalable, queryable, schema-free JSON document database service for modern applications
fully featured RDBMS
transactional processing
rich query
managed as a service
elastic scale
internet accessible http/rest
schema-free data model
arbitrary data formats
Where does it fit in the Azure family?
Some of my favorite things
• query over schema-free JSON
• transactional integrated JavaScript
• tunable performance
• fully managed as a service
Query over schema-free JSON
No need to define secondary indices / schema hints for indexing!
Automatic Indexing
-- Nested lookup against index
SELECT Books.Author
FROM Books
WHERE Books.Author.Name = "Leo Tolstoy"

-- Transformation, Filters, Array access
SELECT { Name: Books.Title, Author: Books.Author.Name }
FROM Books
WHERE Books.Price > 10 AND Books.Languages[0] = "English"

-- Joins, User Defined Functions (UDF)
SELECT CalculateRegionalTax(Books.Price, "USA", "WA")
FROM Books
JOIN LanguagesArr IN Books.Languages
WHERE LanguagesArr.Language = "Russian"
SQL Query Grammar
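The JOIN ... IN form in the last query iterates over an array inside each document rather than joining two collections. A minimal in-memory sketch of that behavior in plain JavaScript follows; it does not use the DocumentDB SDK, and the sample documents and the tax rule inside calculateRegionalTax are assumptions made up for illustration.

```javascript
// In-memory stand-in for a "Books" collection (hypothetical sample data).
const books = [
  { Title: "War and Peace", Price: 20, Languages: [{ Language: "Russian" }, { Language: "English" }] },
  { Title: "Emma", Price: 8, Languages: [{ Language: "English" }] },
];

// Hypothetical UDF; the "one tenth of the price" rule is invented for the example.
function calculateRegionalTax(price, country, state) {
  return price / 10;
}

// JOIN l IN b.Languages yields one (book, language) pair per array element.
function joinLanguages(docs) {
  return docs.flatMap((b) => b.Languages.map((l) => ({ book: b, lang: l })));
}

// WHERE l.Language = "Russian", then SELECT the UDF result.
const taxes = joinLanguages(books)
  .filter(({ lang }) => lang.Language === "Russian")
  .map(({ book }) => calculateRegionalTax(book.Price, "USA", "WA"));

console.log(taxes); // [ 2 ]
```

Because the join happens within a document, a book with no matching array element simply contributes no rows.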
Transactional Integrated JavaScript
// Stored procedure: swap the items held by two players in one transaction.
function(playerId1, playerId2) {
  var playersToSwap = __.filter(function (document) {
    return (document.id == playerId1 || document.id == playerId2);
  });

  var player1 = playersToSwap[0], player2 = playersToSwap[1];
  var player1ItemTemp = player1.item;
  player1.item = player2.item;
  player2.item = player1ItemTemp;

  __.replaceDocument(player1)
    .then(function() { return __.replaceDocument(player2); })
    // Throwing aborts the stored procedure and rolls back the transaction.
    .fail(function(error) { throw 'Unable to update players, abort'; });
}

// Client: execute the stored procedure with the two player ids.
client.executeStoredProcedureAsync("procs/1234", ["MasterChief", "SolidSnake"])
  .then(function (response) {
    console.log("success!");
  }, function (err) {
    console.log("Failed to swap!", err);
  });
Client Database
Tunable Performance
Tunable Consistency Levels
Brewer’s CAP Theorem: Consistency, Availability, Partition Tolerance
DocumentDB offers 4 consistency levels
99.95% Availability SLA
Fully managed as a service
• Predictable Performance
• Hourly Billing
• 99.95% Availability
• Adjustable Performance Levels
database: users, permissions
collections: S1, S2, S3, …
I’m not crying anymore
DocumentDB in action
And many others too!
“With Azure DocumentDB, we didn’t have to say ‘no’ to the business, and we weren’t a bottleneck to launching the promotion; in fact, we came in ahead of schedule.”
- Andreas Helland, Mobility Architect, Telenor
Ready to get started?
Part of the Azure Ecosystem
Enriched app experiences
DocumentDB + Search
http://aka.ms/docdbsearch
Big data and analytics
DocumentDB + HDInsight
http://aka.ms/docdbhdi
Quick Tips
• Collections != Tables
• De-normalize data where appropriate
• Tuning / Perf
  • Consistency Levels
  • Index Policies
  • Understand Query Costs / Limits / Avoid Scans
  • Pre-aggregate where possible
Recent Announcements (since April 2015)
• Order By
• Geospatial Indexing
• Id based routing
• Online index transformations
• JavaScript language-integrated query
• Azure Preview portal enhancements
• Data migration tool enhancements
• Azure Data Factory Integration
• Partitioning support
• General availability in Australia
Thank You
Get started with Azure DocumentDB
http://www.azure.com/docdb
Query Demo: https://www.documentdb.com/sql/demo
Andrew Liu
andrl@microsoft.com
@aliuy8
Data Modeling
{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "addresses": [
    { "line1": "100 Some Street", "line2": "Unit 1", "city": "Seattle", "state": "WA", "zip": 98012 }
  ],
  "contactDetails": [
    { "email": "thomas@andersen.com" },
    { "phone": "+1 555 555-5555", "extension": 5555 }
  ]
}
Try to model your entity as a self-contained document.
Generally, use embedded data models when:
• There are "contains" relationships between entities
• There are one-to-few relationships between entities
• Embedded data changes infrequently
• Embedded data won’t grow without bound
• Embedded data is integral to data in a document
Data modeling with denormalization
Denormalizing typically provides for better read performance
In general, use normalized data models when:
• Write performance is more important than read performance
• Representing one-to-many relationships
• Representing many-to-many relationships
• Related data changes frequently
Referencing provides more flexibility than embedding, at the cost of more round trips to read data.
Data modeling with referencing
User document
{
  "id": "xyz",
  "username": "user xyz"
}
Address document
{
  "id": "address_xyz",
  "userid": "xyz",
  "address": {
    …
  }
}
Contact details document
{
  "id": "contact_xyz",
  "userid": "xyz",
  "email": "user@user.com",
  "phone": "555 5555"
}
Normalizing typically provides better write performance
No magic bullet
Think about how your data is going to be written and read, and model accordingly.
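The read-side trade-off between embedding and referencing can be made concrete by counting reads. The sketch below uses a tiny in-memory store as a stand-in for the database; CountingStore and the sample field values are hypothetical and exist only for this illustration.

```javascript
// A tiny in-memory document store that counts reads (a stand-in for round trips).
class CountingStore {
  constructor(docs) {
    this.docs = new Map(docs.map((d) => [d.id, d]));
    this.reads = 0;
  }
  read(id) {
    this.reads += 1;
    return this.docs.get(id);
  }
}

// Embedded model: one self-contained document.
const embedded = new CountingStore([
  { id: "xyz", username: "user xyz", address: { city: "Seattle" }, email: "user@user.com" },
]);
const user1 = embedded.read("xyz");

// Referenced model: three documents linked by id.
const referenced = new CountingStore([
  { id: "xyz", username: "user xyz" },
  { id: "address_xyz", userid: "xyz", address: { city: "Seattle" } },
  { id: "contact_xyz", userid: "xyz", email: "user@user.com" },
]);
const user2 = {
  ...referenced.read("xyz"),
  address: referenced.read("address_xyz").address,
  email: referenced.read("contact_xyz").email,
};

console.log(embedded.reads, referenced.reads); // 1 3
```

The referenced model pays for its extra reads with cheaper writes: changing the address touches only the small address document.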
Hybrid models ~ denormalize + reference + aggregate
Author document
{
  "id": "1",
  "firstName": "Thomas",
  "lastName": "Andersen",
  "countOfBooks": 3,
  "books": [1, 2, 3],
  "images": [
    {"thumbnail": "http://....png"},
    {"profile": "http://....png"}
  ]
}
Book document
{
  "id": 1,
  "name": "DocumentDB 101",
  "authors": [
    {"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
    {"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
  ]
}
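In a hybrid model the application has to keep the references, the denormalized copies and the pre-aggregated count in step on every write. A minimal sketch of that bookkeeping on plain objects follows; linkBook and the fourth book are hypothetical, and in a real service the two documents would be saved with two writes (or inside a stored procedure).

```javascript
// Link a book to an author, keeping the pre-aggregated count and the
// denormalized author summary consistent with the id references.
function linkBook(author, book) {
  if (!author.books.includes(book.id)) {
    author.books.push(book.id);                 // reference
    author.countOfBooks = author.books.length;  // aggregate, maintained on write
  }
  if (!book.authors.some((a) => a.id === author.id)) {
    book.authors.push({
      id: author.id,
      name: `${author.firstName} ${author.lastName}`, // denormalized copy
    });
  }
}

const author = { id: 1, firstName: "Thomas", lastName: "Andersen", countOfBooks: 3, books: [1, 2, 3] };
const book = { id: 4, name: "A hypothetical fourth book", authors: [] };

linkBook(author, book);
linkBook(author, book); // calling twice does not double-count

console.log(author.countOfBooks, book.authors[0].name); // 4 Thomas Andersen
```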
Request Units
Request Unit (RU) is the normalized currency
% Memory, % IOPS, % CPU
Replica gets a fixed budget of Request Units
Operation   Request
READ        GET resource
INSERT      POST resource
REPLACE     PUT resource
DELETE      DELETE resource
QUERY       POST documents (SQL)
EXECUTE     POST sprocs (args)
Results shown in the diagram: resource, resource set
Predictable Performance
• Each DocumentDB collection has reserved throughput in terms of request units (RUs)
• RUs are a normalized currency across database operations
• RUs offer accurate accounting in the face of diverse database operations
Request Units
Operation                                                                       Request units (RUs) consumed*
Reading a single 1 KB document                                                  1
Reading a single 2 KB document                                                  2
Query with a simple predicate for a 1 KB document                               3
Creating a single 1 KB document with 10 JSON properties (consistent indexing)   14
Creating a single 1 KB document with 100 JSON properties (consistent indexing)  20
Replacing a single 1 KB document                                                28
Executing a stored procedure with two create documents                          30
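The table above can be turned into a rough throughput estimate by multiplying each operation's rate by its RU charge and summing. The sketch below assumes the listed charges apply unchanged to the workload; the workload figures and the names RU_COST and requiredRUsPerSecond are hypothetical.

```javascript
// RU charges copied from the table above (per single operation).
const RU_COST = { read1KB: 1, query1KB: 3, create1KB: 14, replace1KB: 28 };

// Sum (operations per second x RU per operation) across the workload.
function requiredRUsPerSecond(opsPerSecond) {
  return Object.entries(opsPerSecond).reduce((total, [op, rate]) => {
    if (!(op in RU_COST)) throw new Error(`unknown operation: ${op}`);
    return total + rate * RU_COST[op];
  }, 0);
}

// Hypothetical workload: 500 reads, 50 queries, 20 creates, 10 replaces per second.
const needed = requiredRUsPerSecond({ read1KB: 500, query1KB: 50, create1KB: 20, replace1KB: 10 });

console.log(needed); // 500 + 150 + 280 + 280 = 1210
```

Writes dominate quickly: in this example 30 writes per second cost more RUs than 500 reads.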
Partitioning
Why Partition?
• Data size: a single collection holds 10 GB
• Throughput: 3 performance tiers with a max of 2,500 RU/sec
Start with 1 partition, fill it, then move to next
headroom (fill factor)
Partitioning - Spillover
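Spillover placement can be sketched as a small class that always writes to the newest partition and opens a new one once the fill factor is reached. This is an in-memory illustration only; SpilloverPartitioner, the capacity of 10 units and the 0.8 fill factor are assumptions, not values from the slides.

```javascript
// Spillover: always write to the newest partition; open a new one when the
// current partition would exceed capacity * fillFactor (the rest is headroom).
class SpilloverPartitioner {
  constructor(capacity, fillFactor) {
    this.limit = capacity * fillFactor;
    this.sizes = [0]; // space used per partition, in the same unit as capacity
  }

  // Returns the index of the partition that should receive the document.
  place(docSize) {
    const current = this.sizes.length - 1;
    if (this.sizes[current] + docSize > this.limit) {
      this.sizes.push(docSize);
      return current + 1;
    }
    this.sizes[current] += docSize;
    return current;
  }
}

const spill = new SpilloverPartitioner(10, 0.8); // hypothetical: 10 units, 80% fill
const placements = [3, 3, 3, 3].map((size) => spill.place(size));

console.log(placements); // [ 0, 0, 1, 1 ]
```

The headroom left in older partitions gives existing documents room to grow when they are updated.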
Keep current data hot, Warm historical data, Scale-down older data, Purge / Archive
current period
Partitioning - Range
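A range scheme needs two resolvers: one that maps a single key to its partition for writes, and one that maps a query range to the partitions it overlaps. The sketch below assumes yearly partitions keyed by epoch seconds; the partition names and boundaries are hypothetical.

```javascript
// Each partition owns a half-open range [from, to) of epoch seconds.
const ranges = [
  { from: 1356998400, to: 1388534400, partition: "2013" }, // 2013-01-01 .. 2014-01-01 UTC
  { from: 1388534400, to: 1420070400, partition: "2014" }, // 2014-01-01 .. 2015-01-01 UTC
];

// Writes: find the one partition that owns this timestamp.
function resolveRange(epoch) {
  const hit = ranges.find((r) => epoch >= r.from && epoch < r.to);
  if (!hit) throw new Error(`no partition covers ${epoch}`);
  return hit.partition;
}

// Range queries: find every partition that overlaps [from, to].
function partitionsForRange(from, to) {
  return ranges.filter((r) => r.from <= to && from < r.to).map((r) => r.partition);
}

console.log(resolveRange(1376779786));                   // 2013
console.log(partitionsForRange(1376779786, 1401662986)); // [ '2013', '2014' ]
```

Dropping or archiving an old period then amounts to removing its entry from the range list, which is what makes the sliding window easy.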
Home tenant / user to a specific partition. Use "master" lookup.
Tenant         Partition Id
Customer       1
Big Customer   2
Another        3
Cache this shard map to avoid making the lookup the bottleneck
Partitioning - Lookup
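Caching the shard map can be sketched as a small wrapper that consults the master lookup only on a cache miss. ShardMapCache is a hypothetical name, the master object stands in for the lookup collection, and the lookup is synchronous here for brevity where a real one would be an asynchronous database call.

```javascript
// Lookup partitioning: a "master" map homes each tenant to a partition.
// Results are cached so the master lookup is not hit on every request.
class ShardMapCache {
  constructor(fetchPartitionId) {
    this.fetchPartitionId = fetchPartitionId; // (tenant) => partition id
    this.cache = new Map();
    this.masterLookups = 0;
  }

  resolve(tenant) {
    if (!this.cache.has(tenant)) {
      this.masterLookups += 1;
      const id = this.fetchPartitionId(tenant);
      if (id === undefined) throw new Error(`unknown tenant: ${tenant}`);
      this.cache.set(tenant, id);
    }
    return this.cache.get(tenant);
  }
}

// Stand-in for the master lookup, using the tenants from the table above.
const master = { Customer: 1, "Big Customer": 2, Another: 3 };
const shardMap = new ShardMapCache((tenant) => master[tenant]);

const first = shardMap.resolve("Big Customer");
const second = shardMap.resolve("Big Customer"); // served from the cache

console.log(first, second, shardMap.masterLookups); // 2 2 1
```

If a tenant is ever moved to another partition, the cached entry has to be invalidated, which is the stateful cost of this scheme.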
Evenly distribute across n number of partitions (algorithmic) ….
Partitioning - Hash
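An algorithmic resolver needs only a stable hash and the partition count. The sketch below uses an FNV-1a style 32-bit hash over the key's character codes; that choice is an assumption for illustration (any deterministic hash would do), and it is not the hash used by any DocumentDB SDK.

```javascript
// Deterministic 32-bit string hash in the FNV-1a style.
function fnv1a(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value as unsigned 32-bit
  }
  return hash;
}

// Map a key to one of partitionCount partitions; no lookup state is needed.
function hashPartition(key, partitionCount) {
  return fnv1a(String(key)) % partitionCount;
}

// Count how 1000 hypothetical user ids spread over 4 partitions.
const counts = [0, 0, 0, 0];
for (let i = 0; i < 1000; i++) {
  counts[hashPartition(`user-${i}`, 4)] += 1;
}
console.log(counts);
```

The same key always lands on the same partition, but changing partitionCount remaps most keys, which is why reshuffling is the hard part of hash sharding.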
- Application needs to query each candidate partition (can be done in parallel)
- Application consolidates (or reduces) results
{ "record": "1", "created": { "date": "6/1/2014", "epoch": 1401662986 } },
{ "record": "3", "created": { "date": "9/23/2014", "epoch": 1411512586 } },
{ "record": "123", "created": { "date": "8/17/2013", "epoch": 1376779786 } }

SELECT * FROM root r WHERE r.created.epoch BETWEEN 1376779786 AND 1401662986

{ "record": "1", "created": { "date": "6/1/2014", "epoch": 1401662986 } },
{ "record": "3", "created": { "date": "9/23/2014", "epoch": 1411512586 } }

{ "record": "43233", "created": { "epoch": 1411512586 } },
{ "record": "1123", "created": { "date": "8/17/2013", "epoch": 1376779786 } },
{ "record": "43234", "created": { "epoch": 1376779786 } }
Partitioning - Fan-out Queries
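The fan-out pattern above can be sketched with Promise.all: issue the same filter against every candidate partition in parallel, then merge and sort in the application. The partitions here are in-memory arrays, and queryPartition is a hypothetical stand-in for a real per-collection query.

```javascript
// Two in-memory partitions (stand-ins for two collections).
const partitions = [
  [
    { record: "1", created: { epoch: 1401662986 } },
    { record: "3", created: { epoch: 1411512586 } },
  ],
  [
    { record: "43233", created: { epoch: 1411512586 } },
    { record: "1123", created: { epoch: 1376779786 } },
    { record: "43234", created: { epoch: 1376779786 } },
  ],
];

// Hypothetical per-partition query; BETWEEN is inclusive on both ends.
async function queryPartition(docs, from, to) {
  return docs.filter((d) => d.created.epoch >= from && d.created.epoch <= to);
}

// Fan out in parallel, then consolidate (flatten and sort) in the application.
async function fanOut(from, to) {
  const perPartition = await Promise.all(
    partitions.map((p) => queryPartition(p, from, to))
  );
  return perPartition.flat().sort((a, b) => a.created.epoch - b.created.epoch);
}

fanOut(1376779786, 1401662986).then((rows) => {
  console.log(rows.map((r) => r.record)); // [ '1123', '43234', '1' ]
});
```

Ordering, paging and aggregation across partitions all become the application's job in this pattern, so it pays to choose a partition key that keeps most queries on a single partition.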
Design: Partitioning
Hash sharding
• Examples: Profile data (user ID, app ID), (user ID), Device and vehicle data (device/VIN ID), Catalog data (item ID)
• Pros: balanced, stateless
• Cons: reshuffling is hard
Range sharding
• Examples: Operational data (timestamp), (timestamp, event ID)
• Pros: easy sliding window, range queries
• Cons: stateful
Lookup sharding
• Examples: SaaS/multitenant service (tenant ID), Metadata store (type ID)
• Pros: simple, easy to reshuffle, can span accounts
• Cons: stateful, works only on discrete keys
Tunable Indexing
Indexing: How it works
• Automatic indexing of documents
• JSON documents are represented as trees
• Structural information and instance values are normalized into a JSON-Path
• Fixed upper bound on index size (typically 5-10% in real production data)
Example
{"headquarters": "Belgium"} -> /"headquarters"/"Belgium"
{"exports": [{"city": "Moscow"}, {"city": "Athens"}]} -> /"exports"/0/"city"/"Moscow" and /"exports"/1/"city"/"Athens"
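The path normalization in the example can be reproduced with a short recursive walk over the document tree. This is only a sketch of the idea, with a hypothetical function name toIndexPaths; it says nothing about how the service stores or compresses its index.

```javascript
// Flatten a JSON document into index-style paths: /"label"/.../"value".
// Array elements contribute their numeric position as a path segment.
function toIndexPaths(value, prefix = "") {
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => toIndexPaths(item, `${prefix}/${i}`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) =>
      toIndexPaths(child, `${prefix}/${JSON.stringify(key)}`)
    );
  }
  // Leaf: strings are quoted, numbers and booleans are written as-is.
  return [`${prefix}/${JSON.stringify(value)}`];
}

console.log(toIndexPaths({ headquarters: "Belgium" }));
// [ '/"headquarters"/"Belgium"' ]
console.log(toIndexPaths({ exports: [{ city: "Moscow" }, { city: "Athens" }] }));
// [ '/"exports"/0/"city"/"Moscow"', '/"exports"/1/"city"/"Athens"' ]
```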
Indexing Policies
Configuration                 Level            Options
Automatic                     Per collection   True (default) or False. Override with each document write.
Indexing Mode                 Per collection   Consistent or Lazy. Lazy for eventual updates/bulk ingestion.
Included and excluded paths   Per path         Individual path or recursive includes (? and *)
Indexing Type                 Per path         Supports Hash (default) and Range. Hash for equality, Range for range queries.
Indexing Precision            Per path         Supports 3 - 7 per path. Trade off storage, query RUs and write RUs.
Indexing Paths
Path                    Description/use case
/                       Default path for collection. Recursive and applies to whole document tree.
/"prop"/?               Serve queries like the following (with Hash or Range types respectively):
                        SELECT * FROM collection c WHERE c.prop = "value"
                        SELECT * FROM collection c WHERE c.prop > 5
/"prop"/*               All paths under the specified label.
/"prop"/"subprop"/      Used during query execution to prune documents that do not have the specified path.
/"prop"/"subprop"/?     Serve queries (with Hash or Range types respectively):
                        SELECT * FROM collection c WHERE c.prop.subprop = "value"
                        SELECT * FROM collection c WHERE c.prop.subprop > 5