TRANSCRIPT
Breaking Barriers: MongoDB Design Patterns
Nikolaos Vyzas & Christos Soulios
A bit about us… and this talk
Who we are… what we do…
Christos Soulios
• Christos is a Principal Architect at Pythian
• Delivers Big Data platforms for some of the world's top tech organizations
• Expert in Big Data, Hadoop, NoSQL, etc.
• Working with MongoDB since v1.7 (back in 2011)
Nik Vyzas
• Nik is a Sr. TechOps Architect at Percona
• 10+ years of experience in production support and enterprise software development for large-scale distributed environments
• Expert in a variety of open-source technologies, especially RHEL, Debian, Percona Server, XtraDB Cluster, MongoDB, Ansible, Java and Python
• Over the years he has also mastered the dark art of turning caffeine into new software and bug fixes
What is this talk about?
• Proven MongoDB design patterns
• Data modelling and indexing principles
• Common MongoDB pitfalls and how to avoid them
• Balancing performance and data consistency
• Best practices for scaling out / sharding
• How to generally press the go-faster button
Session Overview
• Indexing Strategy
• Data Modeling: To reference or to embed?
• Ranking / Fast Accounting in MongoDB
• Atomic Updates and the Optimistic Locking Pattern
• Keyword Search Pattern
• Defensive Programming
• Read / Write Concern
• Sharding Considerations
Indexing Strategy
Optimizing Query Performance
Sometimes queries are a bit slow…
Think about your indexing
Index Types
Basic index types:
• Single field indexes
• Compound indexes
Other noteworthy indexes:
• Text indexes
• Geospatial indexes
• Hashed indexes (mainly for sharding)
Index Properties
• Unique - acts as both an index and a constraint
• Sparse - indexes only documents where the field exists
• Partial - indexes documents based on specified criteria
• TTL - keep in mind the TTL deleter thread runs every 60 seconds
Indexing Tips
• Ensure your indexes fit in memory. Try to be minimal
• Don't index everything – indexes are costly
• When indexing timestamps, always index coarsely
• NEVER index milliseconds
• Don't index fields with low cardinality
• Careful with text indexes
More Indexing Tips
• Prefer compound indexes that will improve multiple queries
• Create indexes that cover the queries - all data is retrieved from the index itself
• When developing code you can start mongod with the --notablescan option
• Over time schemas and query patterns evolve, so always review your indexes
The explain() plan
• Returns the query execution plan for a specific query
• Provides execution statistics, e.g. rows scanned, indexes used, etc.
• For sharded collections, information regarding the shards accessed is included
Use the explain plan to identify required indexes for filtering and sorting documents
The explain() plan – No Index
db.movies.find({'year': '2001'}).explain(true)
…
"queryPlanner" : {
  "winningPlan" : {
    "stage" : "COLLSCAN",
    "filter" : { "year" : { "$eq" : "2001" } },
    "direction" : "forward"
  }
}
…
The explain() plan – No Index
"executionStats" : {
  "totalDocsExamined" : 250,
  …
  "executionStages" : {
    …
    "nReturned" : 7,
    "advanced" : 7,
    "direction" : "forward",
    "docsExamined" : 250
  }
}
The explain() plan - Indexed
"queryPlanner" : {
  "winningPlan" : {
    "stage" : "IXSCAN",
    "keyPattern" : { "year" : 1 },
    "indexName" : "year_1",
    "isMultiKey" : false,
    "isUnique" : false,
    "isSparse" : false,
    "isPartial" : false,
    "indexVersion" : 1,
    "direction" : "forward",
    "indexBounds" : { "year" : [ "[\"2001\", \"2001\"]" ] }
  }
}
…
The explain() plan - Indexed
"executionStats" : {
  …
  "nReturned" : 7,
  "executionTimeMillis" : 0,
  "totalKeysExamined" : 7,
  "totalDocsExamined" : 7,
  …
}
The Database Profiler
• Collects data about operations, cursors and db commands
• Configurable per database or per instance
• Allows setting slowOpThresholdMs to capture only slow queries, or all of them
• Crucial for identifying bottlenecks and understanding the workload
Data Modeling: To reference or to embed?
Embedded vs. referenced pattern implementations
Data modeling
• We define data relationships between collections
• How do I join data?
• Effective data modeling
• Application-side joins
• Two basic models: Embedded or Referenced
• ALWAYS ask yourself “to reference or to embed?”
Pattern: Embedded One-to-One Relationship
{
  '_id' : ObjectId(…),
  'title' : 'Shawshank Redemption',
  …
  'director' : { 'name' : 'Frank Darabont', … },
  …
}
Pattern: Embedded One-to-Many Relationship
{
  '_id' : ObjectId(…),
  'title' : 'Shawshank Redemption',
  …
  'writers' : [
    { 'name' : 'Stephen King', … },
    { 'name' : 'Frank Darabont', … }
  ],
  …
}
The Embedded Model
• Faster reads / writes – the whole BSON document is retrieved in one database call
• Updates at the document level enforce atomicity
• Duplication can lead to data inconsistencies
• Avoid embedding data with unbounded growth
• Never embed documents that grow after creation (MMAPv1 storage engine)
Pattern: Referenced One-to-Many Relationship
movies:
{
  '_id' : 'xyz',
  'title' : 'The Wall',
  ...
}

reviews:
{
  '_id' : ObjectId(…),
  'movie_id' : 'xyz',
  'rating' : 8,
  ...
}
{
  '_id' : ObjectId(…),
  'movie_id' : 'xyz',
  'rating' : 2,
  ...
}

Reference on movie_id = 'xyz'
The Referenced Model
• Enforces data consistency
• Allows for Parent & Child Tree References
• Each relationship requires an additional call
• Ensure that your referenced fields are indexed
• This becomes costly:
  ▪ Makes reading slower
  ▪ Makes writing slower
  ▪ Requires more indexes
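The per-relationship call pattern above is just an application-side join. A minimal in-memory sketch of the logic (the collections here are stand-in Python lists with illustrative data, not pymongo cursors):

```python
# In-memory stand-ins for the 'movies' and 'reviews' collections.
movies = [{'_id': 'xyz', 'title': 'The Wall'}]
reviews = [
    {'_id': 1, 'movie_id': 'xyz', 'rating': 8},
    {'_id': 2, 'movie_id': 'xyz', 'rating': 2},
]

def movie_with_reviews(movie_id):
    """Application-side join: one lookup per relationship."""
    movie = next(m for m in movies if m['_id'] == movie_id)  # first "call"
    movie = dict(movie)
    # Second "call" - this is the lookup that must be backed by an index
    # on movie_id in a real deployment.
    movie['reviews'] = [r for r in reviews if r['movie_id'] == movie_id]
    return movie

doc = movie_with_reviews('xyz')
```

Each relationship you resolve this way is another round trip, which is exactly the cost the slides warn about.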
Atomic Updates & the Optimistic Lock Pattern
Atomic Updates & Collection Versioning
Atomic Update Operations
>>> db.movies.update_one({'rating': {'$gt': 9 }},
{'$set' : {'favorite' : True }})
>>> old_doc = db.movies.find_one_and_update(
{'rating': {'$gt': 9 }},
{'$set' : {'favorite' : True }})
The update and return occur within a single atomic operation
Update Operators
Some of the most useful update operators:
$inc : Increment a counter
$set : Set a new value
$unset : Remove a field
$addToSet : Add a value to an array (duplicates are not inserted)
$push : Append a value to an array
$pop : Remove the first or last value of an array
$pull / $pullAll : Remove instance(s) of matching value(s) from an array
$rename : Update key name(s)
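To illustrate the semantics of a few of these operators, here they are modelled as plain-Python transformations on a dict (a simplification for illustration; the server applies the real operators atomically in place):

```python
def apply_inc(doc, field, amount):
    """$inc semantics: increment a numeric field, creating it if missing."""
    doc[field] = doc.get(field, 0) + amount

def apply_add_to_set(doc, field, value):
    """$addToSet semantics: append only if the value is not already present."""
    arr = doc.setdefault(field, [])
    if value not in arr:
        arr.append(value)

def apply_pull(doc, field, value):
    """$pull semantics: remove every instance of the value from the array."""
    doc[field] = [v for v in doc.get(field, []) if v != value]

movie = {'title': 'Zootopia', 'tags': ['animation', 'comedy']}
apply_inc(movie, 'views', 1)
apply_add_to_set(movie, 'tags', 'animation')  # already present, no duplicate
apply_pull(movie, 'tags', 'comedy')
```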
Pattern: Optimistic Locking
For complex changes use the Optimistic Locking Design Pattern:
• Include a version field in all documents
  {'_id': ObjectId(…), 'title': 'Zootopia', 'v': 1}
• Retrieve a document and remember its version
• Make a series of complex transformations to the document, or create a new one
• Do not forget to increment the version of the new document
• Update the document only if the version has not changed
Pattern: Optimistic Locking
Update only if the document version has not changed:
m = db.movies.find_one({'title' : 'Zootopia'})
v = m['v']                           # Remember the old version
m = complex_transformations(m)
m['v'] = v + 1                       # Increment the version
r = db.movies.replace_one({'_id' : m['_id'], 'v' : v}, m)
if r.modified_count == 0:
    compensate()
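One possible compensate() strategy is simply to retry the whole read-transform-replace cycle. A self-contained sketch of that compare-and-swap loop against an in-memory stand-in for the collection (store, replace_if_version and the document are illustrative, not pymongo calls):

```python
# In-memory stand-in for the collection: _id -> document.
store = {'m1': {'_id': 'm1', 'title': 'Zootopia', 'v': 1, 'likes': 0}}

def replace_if_version(doc_id, expected_v, new_doc):
    """Emulates replace_one({'_id': doc_id, 'v': expected_v}, new_doc)."""
    current = store.get(doc_id)
    if current is None or current['v'] != expected_v:
        return False  # someone else updated the document first
    store[doc_id] = new_doc
    return True

def update_with_retry(doc_id, transform, max_retries=3):
    for _ in range(max_retries):
        doc = dict(store[doc_id])  # read the document
        v = doc['v']               # remember its version
        doc = transform(doc)       # complex transformations
        doc['v'] = v + 1           # increment the version
        if replace_if_version(doc_id, v, doc):
            return doc             # the swap succeeded
    raise RuntimeError('too much contention')

result = update_with_retry('m1', lambda d: {**d, 'likes': d['likes'] + 1})
```

On contention the loop simply re-reads the fresher document and reapplies the transformation, which is what makes the lock "optimistic".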
Ranking / Fast Accounting in MongoDB
High performance accounting to avoid aggregation
Pattern: Fast Accounting
Use case: Count daily and monthly reviews posted for each movie. Display a histogram on the movie page.
Naive solution: Run counts on the reviews collection whenever histograms must be rendered.
• Slow and resource-consuming to aggregate millions of documents
• Calculating on every page view is too often
• Indexing may help but it will not solve the problem
• Fetching old data destroys the page cache
Pattern: Fast Accounting
• Solution: Fast Accounting Design Pattern
• Create a separate collection to store aggregate counters
• Update the counters when a new review is submitted
• If there is more than one counter, multiple updates will be performed
• This is a pattern taken from Complex Event Processing (CEP)
Pattern: Fast Accounting - Schema
Create a separate collection named 'review_counts':
{
  '_id' : { 'movie_id' : ObjectId(…), 'day' : '2016-04-21' },
  'count' : 10345
},
{
  '_id' : { 'movie_id' : ObjectId(…), 'month' : '2016-04' },
  'count' : 11210345
}
The fields inside _id (movie_id plus day or month) are the query dimensions.
Pattern: Fast Accounting – Increment counts
Update daily counts:
>>> db.review_counts.update_one(
...     {'_id' : {'movie_id' : ObjectId(…), 'day' : '2016-04-21'}},
...     {'$inc' : {'count' : 1}}, upsert=True)
Update monthly counts:
>>> db.review_counts.update_one(
...     {'_id' : {'movie_id' : ObjectId(…), 'month' : '2016-04'}},
...     {'$inc' : {'count' : 1}}, upsert=True)
Pattern: Fast Accounting – Retrieve counts
Retrieve the daily count for a single day:
>>> db.review_counts.find_one(
...     {'_id' : {'movie_id' : ObjectId(…), 'day' : '2016-04-21'}})['count']
10345
Pattern: Fast Accounting
• Documents for the latest dates and months are in memory
• Retrievals are very fast because they search indexed data
• Updates are very fast – they happen in memory
• Use the _id index to ensure uniqueness and save space
Pattern: Fast Accounting
• Updates are atomic – they can scale to thousands of concurrent updates
• Always use upsert=True to create new counters
• More dimensions can be added to the counter – don't overdo it
• This pattern can be adopted for aggregating any timeseries data
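The heart of the pattern is an upsert-style increment keyed by the query dimensions. An in-memory sketch of that logic (tuple keys stand in for the compound _id documents; data is illustrative):

```python
from collections import defaultdict

# Counter collection emulated as a dict:
# key = (movie_id, period), value = count. defaultdict gives upsert-like
# behaviour: missing counters spring into existence at zero.
review_counts = defaultdict(int)

def record_review(movie_id, day):
    """Increment one counter per dimension when a review is submitted."""
    month = day[:7]                        # '2016-04-21' -> '2016-04'
    review_counts[(movie_id, day)] += 1    # daily counter
    review_counts[(movie_id, month)] += 1  # monthly counter

for _ in range(3):
    record_review('xyz', '2016-04-21')
record_review('xyz', '2016-04-22')
```

Rendering the histogram then becomes a handful of point reads on small, hot documents instead of an aggregation over millions of reviews.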
Keyword Search Pattern
Modelling data for retrieval based on a specific keyword or tag
Living in the #hashtag world
Use case: Retrieve a document based on a specific hashtag or keyword.
Naive solution: Add all tags, delimited, to a “tags” field and create a text index, e.g. db.movies.createIndex({"tags":"text"})
• Text indexes require more space
• Take very long to build
• Significantly slow down insertion
• More intensive retrieval processing
Keyword Search Pattern
• Solution: Keyword Search Pattern
• Creates separate index entries per tag
• Groups documents based on tags
• Leverages multikey indexes over an array (created automatically)
• Results in smaller and faster indexes compared to text indexes
Keyword Search Pattern: Schema
• A “movies” collection with search keywords / tags:
{
  '_id' : { 'movie_id' : ObjectId(…), 'title' : 'World War Z' },
  'tags' : ['thriller', '2016', 'zombies']
}

db.movies.createIndex({tags: 1})

### Separate index entries have now been created for:
###   - thriller
###   - 2016
###   - zombies
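Conceptually, the multikey index created above holds one entry per array element. A toy inverted-index sketch of that behaviour (in-memory and illustrative; a real multikey index lives inside the storage engine):

```python
movies = [
    {'_id': 1, 'title': 'World War Z', 'tags': ['thriller', '2016', 'zombies']},
    {'_id': 2, 'title': 'Zootopia',    'tags': ['animation', '2016']},
]

# Build one "index entry" per tag per document, as a multikey index does.
tag_index = {}
for doc in movies:
    for tag in doc['tags']:
        tag_index.setdefault(tag, []).append(doc['_id'])

def find_by_tag(tag):
    """A tag lookup is a single index probe, no text processing needed."""
    return tag_index.get(tag, [])
```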
Gotchas and Pitfalls
• Be careful: insertion performance degrades at high cardinality (i.e. thousands of tags)
• If indexes get too large, asynchronous indexing may be required
• Allowing free-text entry can lead to high cardinality - try to maintain a fixed list if possible
• Be wary of case sensitivity - consider forcing UPPER / lower case
• Do not use this pattern for full-text search; prefer “text” indexing instead
Defensive Programming
Best practices for reading and writing data with a schemaless database
Structure in a schemaless world
MongoDB does not enforce a schema.
Key considerations for coding:
• Is the data I’m writing valid?
• Is the data I’m reading valid?
Structure in a schemaless world
Methods for ensuring data is valid:
• Using BSON document types
• Document validation capability (3.2+)
BSON Document Types
BSON provides support for common variable types, most importantly:
bool … int … long … double
string … array
timestamp … date
objectId … object
…
BSON Document Types
• Python types are supported by PyMongo
• PyMongo converts Python types in a JSON document to BSON types
• BSON types are also supported by the Java driver
• Generally, language-specific drivers support BSON
• Custom types can also be defined using a class
• Document types can also be defined using an ORM such as MongoEngine
BSON Document Types
For example - insert a document with PyMongo, enforcing a datetime:

>>> doc = {
...     "date": datetime(2003, 11, 26),
...     "title_id": "tt0111161",
...     "user_location": "Texas",
...     "title_name": "The Shawshank Redemption",
...     "summary": "Best movie ever!!"
... }
>>> db.bsontest.insert(doc)
BSON Document Types
Then retrieve the value in the mongo CLI:

> db.bsontest.find()
{
  "_id" : ObjectId("570aae6d0059a38a781fed60"),
  "title_id" : "tt0111161",
  "user_location" : "Texas",
  "summary" : "Tied for the best movie I have ever seen",
  "date" : ISODate("2003-11-26T00:00:00Z"),
  "title_name" : "The Shawshank Redemption"
}
Document Validation
Document validation is supported in MongoDB 3.2+.
Validation can be set during collection creation or on an existing collection.
Two validation levels:
• Strict - applied to all document inserts / updates
• Moderate - applied to inserts, and to updates of documents that already conform to the validation rules
Setting validationAction:
• “warn” for testing (logs violations)
• “error” for enforcing (rejects the write with an error)
New Document Validation
Create a validation on the “dvtest” collection:

db.createCollection("dvtest", { validator :
  { $and: [
      { "title_id" : { $type: "string" } },
      { "user_location" : { $exists: true } },
      { "title_name" : { $type: "string" } }
  ]}
})
New Document Validation
Insert an invalid document into the “dvtest” collection:

db.dvtest.insert({"foo": "bar"})
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 121,
    "errmsg" : "Document failed validation"
  }
})
Existing Document Validation
Add a new validation to an existing “dvtest” collection with “moderate” validation:

db.runCommand({
  collMod: "dvtest",
  validator: { $and: [ { title_id: { $exists: true } } ] },
  validationLevel: "moderate"
})
Read / Write Concern
Read / Write Concern levels for CRUD operations
What is Write Concern
Write concern determines the level of acknowledgement for data written by mongod processes:
• w = 0: No acknowledgement at all. It fails only if connectivity errors occur at the client application
• w = 1 (default): Requires acknowledgement by the Primary replica
• w > 1: Acknowledgement by the number of replicas equal to w
• w = “majority”: Acknowledgement by a majority of the replica set members
Write Concern
• wtimeout: Time (ms) to wait for an acknowledgement before returning
• The j=True option requires an acknowledgement that the data was written to the database journal
Write Concern Tips
• Never do unsafe writes (w=0) – except if you don’t care about your data
• w=1 is not safe at all: a write can be overwritten by an outdated replica after a failover
• w='majority' is safe, but it’s slow
• w>1 is your best bet
• Always use wtimeout when w>1. If the write concern cannot be achieved, the write will block forever
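How the different w values translate into an acknowledgement rule can be sketched as a small predicate (a deliberate simplification of the server-side logic, for illustration only):

```python
def write_acknowledged(ack_count, w, replica_set_size):
    """Return True if enough members have acknowledged the write.

    Simplified model: w=0 never waits, an integer w waits for that many
    members, and 'majority' waits for more than half of the replica set.
    """
    if w == 0:
        return True  # fire-and-forget: no acknowledgement requested
    if w == 'majority':
        return ack_count > replica_set_size // 2
    return ack_count >= w

# In a 3-member replica set, one ack satisfies w=1 but not w='majority'.
```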
Read Preference
read_preference specifies the replica instance that read operations are directed at.
Possible values:
• PRIMARY [default]
• PRIMARY_PREFERRED
• SECONDARY
• SECONDARY_PREFERRED
• NEAREST
Read Concern
• Read concern specifies the isolation level for read operations
• ReadConcern(‘local’) returns the local data stored on the replica queried [default]
• ReadConcern(‘majority’) returns data replicated to the majority of replicas, i.e. already safely replicated
• majority is only supported by the WiredTiger storage engine, not by MMAPv1
Read Concern Tips
• Read from secondaries when possible to scale reads
• All read preference modes except PRIMARY may return stale data because of replication lag
• majority read concern is slow
• majority read concern does not guarantee the latest data, only the latest data replicated to the majority of replicas
Sharding Considerations
Hash vs. Timestamp Distribution
Sharding in MongoDB
• Sharding: horizontal partitioning of data across multiple nodes / replica sets
• Sharded replica sets are recommended for HA
• Collections are sharded across replica sets based on a shard key
• High cardinality of the shard key ensures even distribution across replica sets
• Collections which are not sharded remain on the primary shard
Sharding in MongoDB
What are my sharding options?
• Hash based
• Range based
• Tag based
Hash Based Sharding
• Uses hashed indexes for the ranges
• Evenly distributed reads / writes
• Random operations due to the random sharding algorithm
• Retrieving multiple documents can lead to scatter-gather
• Key use cases:
  • Scaling: load balancing reads & writes (example to follow)
  • Disaster recovery: parallel shard recovery
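The routing idea can be sketched in a few lines: hash the shard key value and map the hash onto a shard, so that consecutive key values land on unrelated shards (a toy model using Python's hashlib, not MongoDB's actual hash function or chunk mechanics):

```python
import hashlib

SHARDS = ['shard1', 'shard2', 'shard3']  # illustrative shard names

def route(shard_key_value):
    """Hash the shard key and pick a shard from the hash value."""
    digest = hashlib.md5(str(shard_key_value).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Consecutive datetimes hash to unrelated shards, which spreads writes
# evenly - but a range query over these dates must visit every shard
# that holds a matching key (scatter-gather).
targets = {route(f'2016-4-{day} 00:00:00') for day in (18, 19, 20)}
```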
Hash Based Sharding
Example: Shard key hash(datetime) - good write distribution
[Diagram: writes with keys “2016-4-19”, “2016-4-20” and “2016-4-18” hash evenly across Shard1 (Primary), Shard2 and Shard3]
Hash Based Sharding
Example: Shard key hash(datetime) - scatter gather reads
[Diagram: finding datetime values between the 17th and 21st of April requires reads from all three shards - scatter-gather]
Hash Based Sharding
Example: Shard key hash(userid) - good read distribution
[Diagram: finding the user with id = ed4f7269 routes the read to a single shard]
Range Based Sharding
• Ranges are defined on the shard key data, e.g. a number or date-time
• Data is divided across ranges of documents
  • E.g. 4 shards with int 1..100: Shard1 holds values 1..25, etc.
• Can lead to hotspot shards on date-based ranges
• As ranges change, chunk migration may cause overhead
• Key use cases:
  • Scaling: load balancing reads & writes
  • Disaster recovery: parallel shard recovery
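Range routing, by contrast, can be sketched with a sorted list of chunk boundaries (the split points and shard names below are illustrative, not how chunks are actually stored):

```python
import bisect

# Upper bounds of each chunk, in shard-key order (illustrative split points).
BOUNDS = ['2016-4-19', '2016-4-20']          # defines 3 chunks
SHARDS = ['shard1', 'shard2', 'shard3']      # one chunk per shard here

def route(key):
    """Keys below the first bound go to shard1, and so on upwards."""
    return SHARDS[bisect.bisect_right(BOUNDS, key)]

# Every key from "today" sorts above the last bound, so all current writes
# land on the final shard - the hotspot the slides warn about.
```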
Range Based Sharding
Example: Shard key datetime value - bad write distribution
[Diagram: with a range-based datetime shard key, on the 20th of April all writes go to Shard3, which holds the “2016-4-20” range - a write hotspot]
Range Based Sharding
Example: Shard key datetime value - bad write distribution
[Diagram: a similar scenario with reads - queries for recent dates all hit Shard3]
Tag Based Sharding
• Allows for custom data distribution
• Data is divided across predefined tags
  • E.g. “Americas” on Shard1, “EU” on Shard2, “APAC” on Shard3
• Can lead to hotspots depending on the use case
• Key use cases:
  • Geo-locality: force data into suitable, geographically dispersed shards
  • HW optimization: force hot data onto faster hardware
Tag Based Sharding
Example: Shard tags on “location” - faster response times
[Diagram: Shard1 (Primary) tagged AM, Shard2 tagged EU, Shard3 tagged APAC; writes for “USA”, “GREECE” and “AUSTRALIA” each occur in a local DC]
Tag Based Sharding
Example: Shard tags on “location” - faster response times
[Diagram: with the same location tags, reads for each location also occur in a local DC]
Tag Based Sharding
Example: Shard tags on “year” ranges - automatic archiving
[Diagram: Shard1 (tag 2016, 32 cores / 128GB RAM / SSD) receives all new data; Shard2 (tag 2010-2015, 16 cores / 64GB RAM / SSD) receives fewer writes; Shard3 (tag < 2010, 4 cores / 32GB RAM / rotational disks) receives no write activity - new data is written to the high-speed node]
Tag Based Sharding
Example: Shard tags on “year” ranges - automatic archiving
[Diagram: Shard1 (tag 2016, 64 cores / 256GB RAM / SSD) serves reads of new data; Shard2 serves fewer reads; Shard3 serves only reporting]
…the end
Q&A