TRANSCRIPT
Breaking Barriers: MongoDB Design Patterns
Nikolaos Vyzas & Christos Soulios
A bit about us… and this talk
Who we are… what we do…
Christos Soulios
• Christos is a Principal Architect at Pythian
• Delivers Big Data platforms for some of the world's top tech organizations
• Expert in Big Data, Hadoop, NoSQL, etc.
• Working with MongoDB since v1.7 (back in 2011)
Nik Vyzas
• Nik is a Sr. TechOps Architect at Percona
• 10+ years of experience in production support and enterprise software development for large-scale distributed environments
• Expert in a variety of open-source technologies, especially RHEL, Debian, Percona Server, XtraDB Cluster, MongoDB, Ansible, Java and Python
• Over the years he has also mastered the dark art of turning caffeine into new software and bug fixes
What is this talk about?
• Proven MongoDB design patterns
• Data modelling and indexing principles
• Common MongoDB pitfalls and how to avoid them
• Balancing performance and data consistency
• Best practices for scaling out / sharding
• How to generally press the go-faster button
Session Overview
• Indexing Strategy
• Data Modeling: To reference or to embed?
• Ranking / Fast Accounting in MongoDB
• Atomic Updates and the Optimistic Locking Pattern
• Keyword Search Pattern
• Defensive Programming
• Read / Write Concern
• Sharding Considerations
Indexing Strategy
Optimizing Query Performance
Sometimes queries are a bit slow…
Think about your indexing
Index Types
Basic index types:
• Single field indexes
• Compound indexes
Other noteworthy indexes:
• Text indexes
• Geospatial indexes
• Hashed indexes (mainly for sharding)
Index Properties
• Unique - acts as both an index and a constraint
• Sparse - indexes only documents where the field exists
• Partial - indexes documents based on specified criteria
• TTL - keep in mind the TTL deleter thread runs every 60 seconds
Indexing Tips
• Ensure your indexes fit in memory. Try to be minimal
• Don't index everything – indexes are costly
• When indexing timestamps, always index coarsely
• NEVER index milliseconds
• Don't index fields with low cardinality
• Careful with text indexes
More Indexing Tips
• Prefer compound indexes that will improve multiple queries
• Create indexes that cover the queries - all data is retrieved from the index itself
• When developing code you can start mongod with the --notablescan option
• Over time schemas and query patterns evolve, so always review your indexes
The explain() plan
• Returns the query execution plan for a specific query
• Provides execution statistics, e.g. rows scanned, indexes used, etc.
• For sharded collections, information regarding the shards accessed is included
Use the explain plan to identify required indexes for filtering and sorting documents
The explain() plan – No Index
db.movies.find({'year': '2001'}).explain(true)
…
"queryPlanner" : {
  "winningPlan" : {
    "stage" : "COLLSCAN",
    "filter" : { "year" : { "$eq" : "2001" } },
    "direction" : "forward"
  }
}
…
The explain() plan – No Index
"executionStats" : {
  "totalDocsExamined" : 250,
  …
  "executionStages" : {
    …
    "nReturned" : 7,
    "advanced" : 7,
    "direction" : "forward",
    "docsExamined" : 250
  }
}
The explain() plan - Indexed
"queryPlanner" : {
  "winningPlan" : {
    "stage" : "IXSCAN",
    "keyPattern" : { "year" : 1 },
    "indexName" : "year_1",
    "isMultiKey" : false,
    "isUnique" : false,
    "isSparse" : false,
    "isPartial" : false,
    "indexVersion" : 1,
    "direction" : "forward",
    "indexBounds" : { "year" : [ "[\"2001\", \"2001\"]" ] }
  }
}
…
The explain() plan - Indexed
"executionStats" : {
  …
  "nReturned" : 7,
  "executionTimeMillis" : 0,
  "totalKeysExamined" : 7,
  "totalDocsExamined" : 7,
  …
}
The Database Profiler
• Collects data about operations, cursors and db commands
• Configurable per database or per instance
• Allows setting slowOpThresholdMs to capture only slow queries, or all of them
• Crucial for identifying bottlenecks and understanding the workload
Data Modeling: To reference or to embed?
Embedded vs. referenced pattern implementations
Data modeling
• We define data relationships between collections
• How do I join data?
• Effective data modeling
• Application-side joins
• Two basic models: Embedded or Referenced
• ALWAYS ask yourself “to reference or to embed?”
Pattern: Embedded One-to-One Relationship
{
  '_id' : ObjectId(…),
  'title' : 'Shawshank Redemption',
  …
  'director' : { 'name' : 'Frank Darabont', … },
  …
}
Pattern: Embedded One-to-Many Relationship
{
  '_id' : ObjectId(…),
  'title' : 'Shawshank Redemption',
  …
  'writers' : [
    { 'name' : 'Stephen King', … },
    { 'name' : 'Frank Darabont', … }
  ],
  …
}
The Embedded Model
• Faster reads / writes – the whole BSON document is retrieved in one database call
• Updates at the document level enforce atomicity
• Duplication can lead to data inconsistencies
• Avoid embedding data with unbounded growth
• Never embed documents that grow after creation (MMAPv1 storage engine)
Pattern: Referenced One-to-Many Relationship
movies:
{
  '_id' : 'xyz',
  'title' : 'The Wall',
  ...
}

reviews:
{
  '_id' : ObjectId(…),
  'movie_id' : 'xyz',
  'rating' : 8,
  ...
}
{
  '_id' : ObjectId(…),
  'movie_id' : 'xyz',
  'rating' : 2,
  ...
}

Reference on movie_id = 'xyz'
The Referenced Model
• Enforces data consistency
• Allows for Parent & Child Tree References
• Each relationship requires an additional call
• Ensure that your referenced fields are indexed
• This becomes costly:
  ▪ Makes reading slower
  ▪ Makes writing slower
  ▪ Requires more indexes
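The per-relationship call pattern above is just an application-side join. A minimal in-memory sketch of the logic (the collections here are stand-in Python lists with illustrative data, not pymongo cursors):

```python
# In-memory stand-ins for the 'movies' and 'reviews' collections.
movies = [{'_id': 'xyz', 'title': 'The Wall'}]
reviews = [
    {'_id': 1, 'movie_id': 'xyz', 'rating': 8},
    {'_id': 2, 'movie_id': 'xyz', 'rating': 2},
]

def movie_with_reviews(movie_id):
    """Application-side join: one lookup per relationship."""
    movie = next(m for m in movies if m['_id'] == movie_id)  # first "call"
    movie = dict(movie)
    # Second "call" - this is the lookup that must be backed by an index
    # on movie_id in a real deployment.
    movie['reviews'] = [r for r in reviews if r['movie_id'] == movie_id]
    return movie

doc = movie_with_reviews('xyz')
```

Each relationship you resolve this way is another round trip, which is exactly the cost the slides warn about.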
Atomic Updates & the Optimistic Lock Pattern
Atomic Updates & Collection Versioning
Atomic Update Operations
>>> db.movies.update_one({'rating': {'$gt': 9 }},
{'$set' : {'favorite' : True }})
>>> old_doc = db.movies.find_one_and_update(
{'rating': {'$gt': 9 }},
{'$set' : {'favorite' : True }})
The update and return occur within a single atomic operation
Update Operators
Some of the most useful update operators:
$inc : Increment a counter
$set : Set a new value
$unset : Remove a field
$addToSet : Add a value to an array (duplicates are not inserted)
$push : Append a value to an array
$pop : Remove the first or last value of an array
$pull / $pullAll : Remove instance(s) of matching value(s) from an array
$rename : Update key name(s)
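To illustrate the semantics of a few of these operators, here they are modelled as plain-Python transformations on a dict (a simplification for illustration; the server applies the real operators atomically in place):

```python
def apply_inc(doc, field, amount):
    """$inc semantics: increment a numeric field, creating it if missing."""
    doc[field] = doc.get(field, 0) + amount

def apply_add_to_set(doc, field, value):
    """$addToSet semantics: append only if the value is not already present."""
    arr = doc.setdefault(field, [])
    if value not in arr:
        arr.append(value)

def apply_pull(doc, field, value):
    """$pull semantics: remove every instance of the value from the array."""
    doc[field] = [v for v in doc.get(field, []) if v != value]

movie = {'title': 'Zootopia', 'tags': ['animation', 'comedy']}
apply_inc(movie, 'views', 1)
apply_add_to_set(movie, 'tags', 'animation')  # already present, no duplicate
apply_pull(movie, 'tags', 'comedy')
```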
Pattern: Optimistic Locking
For complex changes use the Optimistic Locking Design Pattern:
• Include a version field in all documents
  {'_id': ObjectId(…), 'title': 'Zootopia', 'v': 1}
• Retrieve a document and remember its version
• Make a series of complex transformations to the document, or create a new one
• Do not forget to increment the version of the new document
• Update the document only if the version has not changed
Pattern: Optimistic Locking
Update only if the document version has not changed:
m = db.movies.find_one({'title' : 'Zootopia'})
v = m['v']                           # Remember the old version
m = complex_transformations(m)
m['v'] = v + 1                       # Increment the version
r = db.movies.replace_one({'_id' : m['_id'], 'v' : v}, m)
if r.modified_count == 0:
    compensate()
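One possible compensate() strategy is simply to retry the whole read-transform-replace cycle. A self-contained sketch of that compare-and-swap loop against an in-memory stand-in for the collection (store, replace_if_version and the document are illustrative, not pymongo calls):

```python
# In-memory stand-in for the collection: _id -> document.
store = {'m1': {'_id': 'm1', 'title': 'Zootopia', 'v': 1, 'likes': 0}}

def replace_if_version(doc_id, expected_v, new_doc):
    """Emulates replace_one({'_id': doc_id, 'v': expected_v}, new_doc)."""
    current = store.get(doc_id)
    if current is None or current['v'] != expected_v:
        return False  # someone else updated the document first
    store[doc_id] = new_doc
    return True

def update_with_retry(doc_id, transform, max_retries=3):
    for _ in range(max_retries):
        doc = dict(store[doc_id])  # read the document
        v = doc['v']               # remember its version
        doc = transform(doc)       # complex transformations
        doc['v'] = v + 1           # increment the version
        if replace_if_version(doc_id, v, doc):
            return doc             # the swap succeeded
    raise RuntimeError('too much contention')

result = update_with_retry('m1', lambda d: {**d, 'likes': d['likes'] + 1})
```

On contention the loop simply re-reads the fresher document and reapplies the transformation, which is what makes the lock "optimistic".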
Ranking / Fast Accounting in MongoDB
High performance accounting to avoid aggregation
Pattern: Fast Accounting
Use case: Count daily and monthly reviews posted for each movie. Display a histogram on the movie page.
Naive solution: Run counts on the reviews collection whenever histograms must be rendered.
• Slow and resource-consuming to aggregate millions of documents
• Calculating on every page view is too often
• Indexing may help but it will not solve the problem
• Fetching old data destroys the page cache
Pattern: Fast Accounting
• Solution: Fast Accounting Design Pattern
• Create a separate collection to store aggregate counters
• Update the counters when a new review is submitted
• If there is more than one counter, multiple updates will be performed
• This is a pattern taken from Complex Event Processing (CEP)
Pattern: Fast Accounting - Schema
Create a separate collection named 'review_counts':
{
  '_id' : { 'movie_id' : ObjectId(…), 'day' : '2016-04-21' },
  'count' : 10345
},
{
  '_id' : { 'movie_id' : ObjectId(…), 'month' : '2016-04' },
  'count' : 11210345
}
The fields inside _id (movie_id plus day or month) are the query dimensions.
Pattern: Fast Accounting – Increment counts
Update daily counts:
>>> db.review_counts.update_one(
...     {'_id' : {'movie_id' : ObjectId(…), 'day' : '2016-04-21'}},
...     {'$inc' : {'count' : 1}}, upsert=True)
Update monthly counts:
>>> db.review_counts.update_one(
...     {'_id' : {'movie_id' : ObjectId(…), 'month' : '2016-04'}},
...     {'$inc' : {'count' : 1}}, upsert=True)
Pattern: Fast Accounting – Retrieve counts
Retrieve the daily count for a single day:
>>> db.review_counts.find_one(
...     {'_id' : {'movie_id' : ObjectId(…), 'day' : '2016-04-21'}})['count']
10345
Pattern: Fast Accounting
• Documents for the latest dates and months are in memory
• Retrievals are very fast because they search indexed data
• Updates are very fast – they happen in memory
• Use the _id index to ensure uniqueness and save space
Pattern: Fast Accounting
• Updates are atomic – they can scale to thousands of concurrent updates
• Always use upsert=True to create new counters
• More dimensions can be added to the counter – don't overdo it
• This pattern can be adopted for aggregating any timeseries data
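The heart of the pattern is an upsert-style increment keyed by the query dimensions. An in-memory sketch of that logic (tuple keys stand in for the compound _id documents; data is illustrative):

```python
from collections import defaultdict

# Counter collection emulated as a dict:
# key = (movie_id, period), value = count. defaultdict gives upsert-like
# behaviour: missing counters spring into existence at zero.
review_counts = defaultdict(int)

def record_review(movie_id, day):
    """Increment one counter per dimension when a review is submitted."""
    month = day[:7]                        # '2016-04-21' -> '2016-04'
    review_counts[(movie_id, day)] += 1    # daily counter
    review_counts[(movie_id, month)] += 1  # monthly counter

for _ in range(3):
    record_review('xyz', '2016-04-21')
record_review('xyz', '2016-04-22')
```

Rendering the histogram then becomes a handful of point reads on small, hot documents instead of an aggregation over millions of reviews.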
Keyword Search Pattern
Modelling data for retrieval based on a specific keyword or tag
Living in the #hashtag world
Use case: Retrieve a document based on a specific hashtag or keyword.
Naive solution: Add all tags, delimited, to a “tags” field and create a text index, e.g. db.movies.createIndex({"tags":"text"})
• Text indexes require more space
• Take very long to build
• Significantly slow down insertion
• More intensive retrieval processing
Keyword Search Pattern
• Solution: Keyword Search Pattern
• Creates separate index entries per tag
• Groups documents based on tags
• Leverages multikey indexes over an array (created automatically)
• Results in smaller and faster indexes compared to text indexes
Keyword Search Pattern: Schema
• A “movies” collection with search keywords / tags:
{
  '_id' : { 'movie_id' : ObjectId(…), 'title' : 'World War Z' },
  'tags' : ['thriller', '2016', 'zombies']
}

db.movies.createIndex({tags: 1})

### Separate index entries have now been created for:
###   - thriller
###   - 2016
###   - zombies
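Conceptually, the multikey index created above holds one entry per array element. A toy inverted-index sketch of that behaviour (in-memory and illustrative; a real multikey index lives inside the storage engine):

```python
movies = [
    {'_id': 1, 'title': 'World War Z', 'tags': ['thriller', '2016', 'zombies']},
    {'_id': 2, 'title': 'Zootopia',    'tags': ['animation', '2016']},
]

# Build one "index entry" per tag per document, as a multikey index does.
tag_index = {}
for doc in movies:
    for tag in doc['tags']:
        tag_index.setdefault(tag, []).append(doc['_id'])

def find_by_tag(tag):
    """A tag lookup is a single index probe, no text processing needed."""
    return tag_index.get(tag, [])
```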
Gotchas and Pitfalls
• Be careful: insertion performance degrades at high cardinality (i.e. thousands of tags)
• If indexes get too large, asynchronous indexing may be required
• Allowing free-text entry can lead to high cardinality - try to maintain a fixed list if possible
• Be wary of case sensitivity - consider forcing UPPER / lower case
• Do not use this pattern for full-text search; prefer “text” indexing instead
Defensive Programming
Best practices for reading and writing data with a schemaless database
Structure in a schemaless world
MongoDB does not enforce a schema.
Key considerations for coding:
• Is the data I’m writing valid?
• Is the data I’m reading valid?
Structure in a schemaless world
Methods for ensuring data is valid:
• Using BSON document types
• Document validation capability (3.2+)
BSON Document Types
BSON provides support for common variable types, most importantly:
bool … int … long … double
string … array
timestamp … date
objectId … object
…
BSON Document Types
• Python types are supported by PyMongo
• PyMongo converts Python types in a JSON document to BSON types
• BSON types are also supported by the Java driver
• Generally, language-specific drivers support BSON
• Custom types can also be defined using a class
• Document types can also be defined using an ORM such as MongoEngine
BSON Document Types
For example - insert a document with PyMongo, enforcing a datetime:

>>> doc = {
...     "date": datetime(2003, 11, 26),
...     "title_id": "tt0111161",
...     "user_location": "Texas",
...     "title_name": "The Shawshank Redemption",
...     "summary": "Best movie ever!!"
... }
>>> db.bsontest.insert(doc)
BSON Document Types
Then retrieve the value in the mongo CLI:

> db.bsontest.find()
{
  "_id" : ObjectId("570aae6d0059a38a781fed60"),
  "title_id" : "tt0111161",
  "user_location" : "Texas",
  "summary" : "Tied for the best movie I have ever seen",
  "date" : ISODate("2003-11-26T00:00:00Z"),
  "title_name" : "The Shawshank Redemption"
}
Document Validation
Document validation is supported in MongoDB 3.2+.
Validation can be set during collection creation or on an existing collection.
Two validation levels:
• Strict - applied to all document inserts / updates
• Moderate - applied to inserts, and to updates of documents that already conform to the validation rules
Setting validationAction:
• “warn” for testing (logs violations)
• “error” for enforcing (rejects the write with an error)
New Document Validation
Create a validation on the “dvtest” collection:

db.createCollection("dvtest", { validator :
  { $and: [
      { "title_id" : { $type: "string" } },
      { "user_location" : { $exists: true } },
      { "title_name" : { $type: "string" } }
  ]}
})
New Document Validation
Insert an invalid document into the “dvtest” collection:

db.dvtest.insert({"foo": "bar"})
WriteResult({
  "nInserted" : 0,
  "writeError" : {
    "code" : 121,
    "errmsg" : "Document failed validation"
  }
})
Existing Document Validation
Add a new validation to an existing “dvtest” collection with “moderate” validation:

db.runCommand({
  collMod: "dvtest",
  validator: { $and: [ { title_id: { $exists: true } } ] },
  validationLevel: "moderate"
})
Read / Write Concern
Read / Write Concern levels for CRUD operations
What is Write Concern
Write concern determines the level of acknowledgement for data written by mongod processes:
• w = 0: No acknowledgement at all. It fails only if connectivity errors occur at the client application
• w = 1 (default): Requires acknowledgement by the Primary replica
• w > 1: Acknowledgement by the number of replicas equal to w
• w = “majority”: Acknowledgement by a majority of the replica set members
Write Concern
• wtimeout: Time (ms) to wait for an acknowledgement before returning
• The j=True option requires an acknowledgement that the data was written to the database journal
Write Concern Tips
• Never do unsafe writes (w=0) – except if you don’t care about your data
• w=1 is not safe at all: a write can be overwritten by an outdated replica after a failover
• w='majority' is safe, but it’s slow
• w>1 is your best bet
• Always use wtimeout when w>1. If the write concern cannot be achieved, the write will block forever
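How the different w values translate into an acknowledgement rule can be sketched as a small predicate (a deliberate simplification of the server-side logic, for illustration only):

```python
def write_acknowledged(ack_count, w, replica_set_size):
    """Return True if enough members have acknowledged the write.

    Simplified model: w=0 never waits, an integer w waits for that many
    members, and 'majority' waits for more than half of the replica set.
    """
    if w == 0:
        return True  # fire-and-forget: no acknowledgement requested
    if w == 'majority':
        return ack_count > replica_set_size // 2
    return ack_count >= w

# In a 3-member replica set, one ack satisfies w=1 but not w='majority'.
```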
Read Preference
read_preference specifies the replica instance that read operations are directed at.
Possible values:
• PRIMARY [default]
• PRIMARY_PREFERRED
• SECONDARY
• SECONDARY_PREFERRED
• NEAREST
Read Concern
• Read concern specifies the isolation level for read operations
• ReadConcern(‘local’) returns the local data stored on the replica queried [default]
• ReadConcern(‘majority’) returns data replicated to the majority of replicas, i.e. already safely replicated
• majority is only supported by the WiredTiger storage engine, not by MMAPv1
Read Concern Tips
• Read from secondaries when possible to scale reads
• All read preference modes except PRIMARY may return stale data because of replication lag
• majority read concern is slow
• majority read concern does not guarantee the latest data, only the latest data replicated to the majority of replicas
Sharding Considerations
Hash vs. Timestamp Distribution
Sharding in MongoDB
• Sharding: horizontal partitioning of data across multiple nodes / replica sets
• Sharded replica sets are recommended for HA
• Collections are sharded across replica sets based on a shard key
• High cardinality of the shard key ensures even distribution across replica sets
• Collections which are not sharded remain on the primary shard
Sharding in MongoDB
What are my sharding options?
• Hash based
• Range based
• Tag based
Hash Based Sharding
• Uses hashed indexes for the ranges
• Evenly distributed reads / writes
• Random operations due to the random sharding algorithm
• Retrieving multiple documents can lead to scatter-gather
• Key use cases:
  • Scaling: load balancing reads & writes (example to follow)
  • Disaster recovery: parallel shard recovery
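The routing idea can be sketched in a few lines: hash the shard key value and map the hash onto a shard, so that consecutive key values land on unrelated shards (a toy model using Python's hashlib, not MongoDB's actual hash function or chunk mechanics):

```python
import hashlib

SHARDS = ['shard1', 'shard2', 'shard3']  # illustrative shard names

def route(shard_key_value):
    """Hash the shard key and pick a shard from the hash value."""
    digest = hashlib.md5(str(shard_key_value).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Consecutive datetimes hash to unrelated shards, which spreads writes
# evenly - but a range query over these dates must visit every shard
# that holds a matching key (scatter-gather).
targets = {route(f'2016-4-{day} 00:00:00') for day in (18, 19, 20)}
```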
Hash Based Sharding
Example: Shard key hash(datetime) - good write distribution
[Diagram: writes with keys “2016-4-19”, “2016-4-20” and “2016-4-18” hash evenly across Shard1 (Primary), Shard2 and Shard3]
Hash Based Sharding
Example: Shard key hash(datetime) - scatter gather reads
[Diagram: finding datetime values between the 17th and 21st of April requires reads from all three shards - scatter-gather]
Hash Based Sharding
Example: Shard key hash(userid) - good read distribution
[Diagram: finding the user with id = ed4f7269 routes the read to a single shard]
Range Based Sharding
• Ranges are defined on the shard key data, e.g. a number or date-time
• Data is divided across ranges of documents
  • E.g. 4 shards with int 1..100: Shard1 holds values 1..25, etc.
• Can lead to hotspot shards on date-based ranges
• As ranges change, chunk migration may cause overhead
• Key use cases:
  • Scaling: load balancing reads & writes
  • Disaster recovery: parallel shard recovery
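Range routing, by contrast, can be sketched with a sorted list of chunk boundaries (the split points and shard names below are illustrative, not how chunks are actually stored):

```python
import bisect

# Upper bounds of each chunk, in shard-key order (illustrative split points).
BOUNDS = ['2016-4-19', '2016-4-20']          # defines 3 chunks
SHARDS = ['shard1', 'shard2', 'shard3']      # one chunk per shard here

def route(key):
    """Keys below the first bound go to shard1, and so on upwards."""
    return SHARDS[bisect.bisect_right(BOUNDS, key)]

# Every key from "today" sorts above the last bound, so all current writes
# land on the final shard - the hotspot the slides warn about.
```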
Range Based Sharding
Example: Shard key datetime value - bad write distribution
[Diagram: with a range-based datetime shard key, on the 20th of April all writes go to Shard3, which holds the “2016-4-20” range - a write hotspot]
Range Based Sharding
Example: Shard key datetime value - bad write distribution
[Diagram: a similar scenario with reads - queries for recent dates all hit Shard3]
Tag Based Sharding
• Allows for custom data distribution
• Data is divided across predefined tags
  • E.g. “Americas” on Shard1, “EU” on Shard2, “APAC” on Shard3
• Can lead to hotspots depending on the use case
• Key use cases:
  • Geo-locality: force data into suitable, geographically dispersed shards
  • HW optimization: force hot data onto faster hardware
Tag Based Sharding
Example: Shard tags on “location” - faster response times
[Diagram: Shard1 (Primary) tagged AM, Shard2 tagged EU, Shard3 tagged APAC; writes for “USA”, “GREECE” and “AUSTRALIA” each occur in a local DC]
Tag Based Sharding
Example: Shard tags on “location” - faster response times
[Diagram: with the same location tags, reads for each location also occur in a local DC]
Tag Based Sharding
Example: Shard tags on “year” ranges - automatic archiving
[Diagram: Shard1 (tag 2016, 32 cores / 128GB RAM / SSD) receives all new data; Shard2 (tag 2010-2015, 16 cores / 64GB RAM / SSD) receives fewer writes; Shard3 (tag < 2010, 4 cores / 32GB RAM / rotational disks) receives no write activity - new data is written to the high-speed node]
Tag Based Sharding
Example: Shard tags on “year” ranges - automatic archiving
[Diagram: Shard1 (tag 2016, 64 cores / 256GB RAM / SSD) serves reads of new data; Shard2 serves fewer reads; Shard3 serves only reporting]
…the end
Q&A