breakingbarriers:) mongodb)design)patternspattern:)fast)accounting...

80
Breaking Barriers: MongoDB Design Patterns Nikolaos Vyzas & Christos Soulios

Upload: others

Post on 14-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Breaking  Barriers:  MongoDB  Design  Patterns

Nikolaos  Vyzas  &  Christos  Soulios

Page 2: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

a  bit  about  us…  and  this  talkWho  we  are…  what  we  do…

Page 3: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Christos  Soulios

• Christos  is  a  principal  architect  at  Pythian  • Delivers  Big  Data  platforms  for  some  of  the  world's  top  tech  organizations

• Expert  in  Big  Data,  Hadoop,  NoSQL etc• Working  with  MongoDB since  v1.7  (back  in  2011)

3

Page 4: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Nik  Vyzas

• Nik  is  a  Sr.  TechOps  Architect  at  Percona• 10+  years  experience  in  production  support  and  enterprise  software  development  for  large  scale  distributed  environments

• Expert  in  a  variety  of  open-­source  technologies  especially  RHEL,  Debian,  Percona  Server,  XtraDB  Cluster,  MongoDB,  Ansible,  Java  and  Python.  

• Over  the  years  he  has  also  mastered  the  dark  art  of  turning  caffeine  into  new  software  and  bug  fixes.

4

Page 5: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

What  is  this  talk  about?

• Proven  MongoDB  design  patterns• Data  modelling  and  indexing  principles• Common  MongoDB  pitfalls  and  how  to  avoid  them• Balancing  performance  and  data  consistency• Best  practices  for  scaling  out  /  sharding• How  to  generally  press  the  go-­faster  button

5

Page 6: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Session  Overview

• Indexing  Strategy• Data  Modeling:  To  reference  or  to  embed?• Ranking  /  Fast  Accounting  in  MongoDB• Atomic  Updates  and  the  Optimistic  Locking  Pattern• Keyword  Search  Pattern• Defensive  Programming• Read  /  Write  Concern• Sharding  Considerations

6

Page 7: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Indexing  StrategyOptimizing  Query  Performance

Page 8: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Sometimes  queries  are  a  bit  slow…

8

Page 9: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Think  about  your  indexing

9

Page 10: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Index  Types

Basic  index  types:• Single  field  indexes• Compound  indexes

Other  noteworthy  indexes:• Text  indexes• Geospatial  indexes• Hashed  indexes  (*  mainly  for  sharding)

10

Page 11: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Index  Properties

• Unique  -­ acts  as  index  and  constraint• Sparse  -­ only  when  field  exists  • Partial  -­ based  on  specified  criteria• TTL  -­ *  Keep  in  mind  the  TTLDeleter  thread  runs  every  60  seconds

11

Page 12: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Indexing  Tips  

• Ensure  your  indexes  fit  in  memory.  Try  to  be  minimal• Don't  index  everything  – Indexes  are  costly• When  indexing  timestamps,  always  index  coarsely.  • NEVER  index  milliseconds

• Don’t  index  fields  with  low  cardinality• Careful  with  text  indexes

12

Page 13: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

More  Indexing  Tips  

• Prefer  compound  indexes  that  will  improve  multiple  queries• Create  indexes  that  cover  the  queries  -­ all  data  is  retrieved  from  the  index• When  developing  code  you  can  start  mongod  with  the  –notablescan  option• Over  time  schemas  and  query  patterns  evolve,  always  review  your  indexes

13

Page 14: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

The  explain()  plan

14

Page 15: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

The  explain()  plan

• Returns  the  query  execution  plan  for  a  specific  query• Provides  execution  statistics  e.g.  rows  scanned,  indexes  used  etc.• For  sharded  collections  information  regarding  shards  accessed  is  included

15

Use the explain plan to identify required indexesfor filtering and sorting documents

Page 16: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

The  explain()  plan  – No  Index

db.movies.find({'year': '2001'}).explain(true)…

"queryPlanner" : {"winningPlan" : {

"stage" : "COLLSCAN","filter" : { "year" : {"$eq" : "2001" } },"direction" : "forward"

}}

Page 17: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

The  explain()  plan  – No  Index

"executionStats" : {"totalDocsExamined" : 250,…"executionStages" : {

… "nReturned" : 7,"advanced" : 7,"direction" : "forward","docsExamined" : 250

},},

17

Page 18: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

The  explain()  plan  -­ Indexed

"queryPlanner" : {"winningPlan" : {

"stage" : "IXSCAN","keyPattern" : { "year" : 1 },"indexName" : "year_1","isMultiKey" : false,"isUnique" : false,"isSparse" : false,"isPartial" : false,"indexVersion" : 1,"direction" : "forward","indexBounds" : { "year" : ["[\"2001\", \"2001\"]" ] }

…18

Page 19: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

The  explain()  plan  -­ Indexed

"executionStats" : {…

"nReturned" : 7,

"executionTimeMillis" : 0,

"totalKeysExamined" : 7,

"totalDocsExamined" : 7,

}

19

Page 20: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

The  Database  Profiler

20

• Collects  data  about  operations,  cursors  and  db  commands• Configurable  per  database  or  per  instance• Allows  setting  slowOpThresholdMs  to  capture  only  slow  queries  or  all• Crucial  for  identifying  bottlenecks  and  understanding  workload

Page 21: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Data  Modeling:  To  reference  or  to  embed?

Embedded  vs.  referenced  pattern  implementations

Page 22: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Data  modeling

• We  define  data  relationships  between  collections• How  do  I  join  data?• Effective  data  modeling  • Application  side  joins

• Two  basic  models:  Embedded  or  Referenced• ALWAYS  ask  yourself  “to  reference  or  to  embed?”

22

Page 23: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Embedded  One-­to-­One  Relationship{'_id' : ObjectId(…),

'title' : 'Shawshank Redemption',…

'director': { 'name' : 'Frank Darabont', … },…

}

23

Page 24: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Embedded  One-­to-­Many  Relationship

{'_id' : ObjectId(…),

'title' : 'Shawshank Redemption',…

'writers' : [{'name':'Stephen King', … },

{'name':'Frank Darabont', … }], …

}

24

Page 25: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

The  Embedded  Model

• Faster  reads  /  writes  – whole  BSON  is  retrieved  in  1x  database  call• Updates  at  the  document  level  enforce  atomicity• Duplication  can  lead  to  data  inconsistencies• Avoid  embedding  data  with  unbound  growth• Never  embed  documents  that  grow  after  creation  (MMAPv1  storage  engine)

25

Page 26: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Referenced  One-­to-­Many  Relationship

26

{'_id' : 'xyz','title' : 'The Wall',...

}

{'_id' : ObjectId(…),'movie_id' : 'xyz','rating' : 8,...

}

movies

reviews

{'_id' : ObjectId(…),'movie_id' : 'xyz','rating' : 2,...

}

Reference on movie_id = 'xyz'

Page 27: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

The  Referenced  Model

• Enforces  data  consistency• Allows  for  Parent  &  Child  Tree  References• Each  relationship  requires  an  additional  call• Ensure  that  your  referenced  fields  are  indexed• This  becomes  costly:

▪ Makes  reading  slower▪ Makes  writing  slower▪ Requires  more  indexes

27

Page 28: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Atomic  Updates  &  the  Optimistic  Lock  Pattern

Atomic  Updates  &  Collection  Versioning

Page 29: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Atomic  Update  Operations

>>> db.movies.update_one({'rating': {'$gt': 9 }},

{'$set' : {'favorite' : True }})

>>> old_doc = db.movies.find_one_and_update(

{'rating': {'$gt': 9 }},

{'$set' : {'favorite' : True }})

29

Page 30: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Atomic  Update  Operations

>>> db.movies.update_one({'rating': {'$gt': 9 }},

{'$set' : {'favorite' : True }})

>>> old_doc = db.movies.find_one_and_update(

{'rating': {'$gt': 9 }},

{'$set' : {'favorite' : True }})

30

The update and return occurs within a single atomic operation

Page 31: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Update  Operators

31

The list of valid update operators:

$inc : Increment counter$set : Set a new value $unset : Set value = NULL$addToSet : Add value into array (duplicates not inserted)$push / $pushAll : Add value into array $pop / $popAll : Remove first / last value(s) of array$pull / $pullAll : Remove instance(s) of value from array$rename : Update key name(s)

Page 32: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Optimistic  Locking

For  complex  changes  use  the  Optimistic  Locking  Design  Pattern:• Include  a  version  field  in  all  documents

{'_id': ObjectId(…), 'title':'Zootopia', 'v':1 }

• Retrieve  a  document  and  remember  its  version• Make  a  series  of  complex  transformations  to  the  document  or  create  a  new  one

• Do  not  forget  to  increment  the  version  of  the  new  document• Update  the  document  only  if  the  version  has  not  changed

32

Page 33: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Optimistic  Locking

Update  only  if  the  document  version  has  not  changedm = db.movies.find_one({'title' : 'Zootopia'})v = m['v'] # Remember the old versionm = complex_transformations(m)m['v'] = v + 1 # Increment the versionr = db.movies.replace_one({'_id' : m['_id'], 'v' : v}, m)if r.modified_count == 0:

compensate()

33

Page 34: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Ranking  /  Fast  Accounting  in  MongoDB

High  performance  accounting  to  avoid  aggregation  

Page 35: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Fast  Accounting

Use  case:  Count  daily  and  monthly  reviews  posted  for  each  movie.  Display  a  histogram  on  the  movie  pageNaive  solution:  Run  counts  on  the  reviews  collection  when  histograms  must  be  rendered• Slow  and  resource  consuming  to  aggregate  millions  of  documents• Calculating  on  every  page  view  is  too  often• Indexing  may  help  but  it  will  not  solve  the  problem• Fetching  old  data  destroys  page  cache

35

Page 36: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Fast  Accounting

• Solution:  Fast  Accounting  Design  Pattern• Create  a  separate  collection  to  store  aggregate  counters• Update  counters  when  a  new  review  is  submitted• If  there  are  more  than  one  counters,  multiple  updates  will  be  performed• This  is  a  pattern  taken  from  Complex  Event  Processing  (CEP)

36

Page 37: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Fast  Accounting  -­ Schema

Create  a  separate  collection  named  'review_counts':{ '_id':

{'movie_id: ObjectId(…),'day' : '2016-04-21'},

'count' : 10345},

{ '_id': {'movie_id: ObjectId(…),

'month' : '2016-04'},'count' : 11210345

}

37

Query Dimensions

Page 38: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Fast  Accounting  – Increment  countsUpdate  daily  counts:>>> db.review_counts.update_one({'_id':

{'movie_id': ObjectId(…),'day' : '2016-04-21'}},{'$inc' : {'count' : 1}}, upsert=True)

Update  monthly  counts:>>> db.review_counts.update_one({'_id':

{'movie_id': ObjectId(…),'month' : '2016-04'}},{'$inc' : {'count' : 1}}, upsert=True)

38

Page 39: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Fast  Accounting  – Retrieve  counts

Retrieve  daily  count  for  a  single  day:>>> db.review_counts.find_one({'_id':

{'movie_id': ObjectId(…),'day' : '2016-04-21'}})['count']

10345

39

Page 40: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Fast  Accounting

• Documents  for  the  latest  dates  and  months  are  in  memory• Retrievals  are  very  fast  because  they  search  indexed  data• Updates  are  very  fast  – They  happen  in  memory• Use  the  _id  index  to  ensure  uniqueness  and  save  space

40

Page 41: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Pattern:  Fast  Accounting

• Updates  are  atomic – They  can  scale  to  thousands  of  concurrent  updates• Always  use  upsert=True  to  create  new  counters• More  dimensions  can  be  added  in  the  counter  – don't  overdo  it• This  pattern  can  be  adopted  for  aggregating  any  timeseries  data

41

Page 42: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Keyword  Search  PatternModelling  data  for  retrieval  based  on  specific  keyword  or  tag

Page 43: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Living  in  the  #hashtag  world

Use  case:  Retrieve  a  document  based  on  a  specific  hashtag  or  keywordNaive  solution:  Add  all  tags  delimited  to  a  “tags”  field  and  create  a  text index  e.g.  db.movies.createIndex({"tags":"text"})• Text  indexes  require  more  space• Take  very  long  to  build• Significantly  reduce  insertion  time• More  intensive  retrieval  processing  

43

Page 44: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Keyword  Search  Pattern

• Solution:  Keyword  Search  Pattern• Create  separate  index  entries  per  tags• Groups  documents  based  on  tags• Leverages  multi-­key  indexes  using  an  array  (automatically  created)• Results  in  smaller  and  faster  indexes  compared  to  text

44

Page 45: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Keyword  Search  Pattern:  Schema

• A  “movies”  collection  with  search  keywords  /  tags:{ '_id':

{'movie_id: ObjectId(…),'title' : 'World War Z'},'tags' : ['thriller','2016','zombies']

},db.movies.createIndex({tags: 1})

### Separate index entries have now been created for:### - thriller### - 2016### - zombies

45

Page 46: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Gotchas  and  Pitfalls

• Be  careful:  Insertion  degrades  on  high  cardinality  (i.e.  thousands)  • If  indexes  get  too  large  asynchronous  indexing  may  be  required• Allowing  for  “free-­text”  entry  can  lead  to  high  cardinality  -­ try  to  maintain  a  list  if  possible

• Be  weary  of  case-­sensitivity  -­ consider  forcing  UPPER  /  LOWER  case• Do  not  use  this  pattern  for  full-­text-­search,  rather  prefer  “text”  indexing

46

Page 47: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Defensive  programmingBest  practices  for  reading  and  writing  data  with  a  schemaless  database

Page 48: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Structure  in  a  schemaless  world

MongoDB  does  not  enforce  schemaKey  considerations  for  coding:

• Is  the  data  I’m  writing  valid?• Is  the  data  I’m  reading  valid?

48

Page 49: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Structure  in  a  schemaless  world

Methods  for  ensuring  data  is  valid:• Using  BSON  document  types• Document  validation  capability  (3.2+)

49

Page 50: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

BSON  Document  Types

BSON  provides  support  for  common  variable  types,  most  importantly:

bool … int … long … double

string … array

timestamp … date

objectId … object

50

Page 51: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

BSON  Document  Types

• Python  types  supported  by  Pymongo• Pymongo  converts  Python  types  in  a  JSON  document  to  BSON  types  • BSON  types  also  supported  by  the  Java  driver• Generally  -­ language  specific  drivers  support  BSON• Custom  Types  can  also  be  defined  using  a  “class”  • Document  types  can  also  be  defined  using  an  ORM  such  as  MongoEngine

51

Page 52: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

BSON  Document  Types

For  example  -­ insert  a  document  in  Pymongo  enforcing  datetime:

>>> doc = {

"date": datetime(2003, 11, 26),"title_id":"tt0111161","user_location":"Texas","title_name":"The Shawshank Redemption","summary":"Best movie ever!!”

}

>>> db.bsontest.insert(doc)

52

Page 53: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

BSON  Document  Types

Then  retrieve  the  value  in  Mongo  cli:

> db.bsontest.find() {

"_id" : ObjectId("570aae6d0059a38a781fed60"),"title_id" : "tt0111161","user_location" : "Texas","summary" : "Tied for the best movie I have ever

seen","date" : ISODate("2003-11-26T00:00:00Z"),"title_name" : "The Shawshank Redemption"

}

53

Page 54: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Document  Validation

Document  validation  is  supported  in  Mongo  3.2+Validation  can  be  set  during  collection  creation  or  on  an  existing  collectionTwo  modes  of  operation:

• Strict  -­ Applied  to  all  document  inserts  /  updates• Moderate  -­ Applied  to  inserts  /  updates  on  documents  that  conform

• Setting  validationAction:• “warn”  for  testing  (logs  errors)• “error”  for  enforcing  (throws  an  error)

54

Page 55: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

New  Document  Validation

Create  a  validation  on  the  “dvtest”  collection:db.createCollection(“dvtest",{ validator :

{ $and: [ {"title_id" : { $type: "string" }},{"user_location" : { $exists: true }},{"title_name" : { $type: "string" }}

]}

}

55

Page 56: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

New  Document  Validation

Insert  an  invalid  document  into  the  “dvtest”  collection:db.dvtest.insert({“foo": "bar"})WriteResult({

"nInserted" : 0,"writeError" : {

"code" : 121,"errmsg" : "Document failed validation"

}})

56

Page 57: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Existing  Document  Validation

Add  a  new  validation  to  an  existing  “dvtest”  collection  with  “moderate”  validation:db.runCommand( {

collMod: "dvtest",validator: {$and:[{title_id: {$exists:true}}]},validationLevel: "moderate"}

)

57

Page 58: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Read  /  Write  ConcernRead  /  Write  Concern  levels  for  CRUD  operations

Page 59: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

What  is  Write  Concern

Write  concern  determines  the  level  of  acknowledgement  for  data  written  by  mongod  processes:  • w  =  0:  No  acknowledgement  at  all.  It  fails  only  if  connectivity  errors  occur  at  the  client  application

• w  =  1  (default):  Require  acknowledgement  by  the  Primary  replica• w  >  1:  Acknowledgment  by  the  number  of  replicas  equal  to  w• w  =  “majority”

59

Page 60: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Write  Concern

• wtimeout:  Time  (ms)  for  an  acknowledgement  to  return• The  j=True  option  requires  an  acknowledgement  that  data  was  written  to    the  database  journal

60

Page 61: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Write  Concern  Tips

• Never  do  unsafe  writes  (w=0)  – Except  if  you  don’t  care  about  your  data• w=1  is  not  safe  at  all.  A  write  can  be  overwritten  by  an  outdated  replica  after  a  fail  over

• w='majority'  is  safe.  But  it’s  slow• w>1  is  your  best  bet• Always  use  wtimeout  when  w>1.  If  write  concern  cannot  be  achieved,  the  write  will  block  forever

61

Page 62: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Read  Preference

read_preference  specifies  the  replica    instance  that  read  operations  are  directed  at:Possible  values:  • PRIMARY  [default]• PRIMARY_PREFERRED• SECONDARY• SECONDARY_PREFERRED• NEAREST

62

Page 63: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Read  Concern

• Read  concern  specifies  the  isolation  level  for  read  operations• ReadConcern(‘local’)  returns  local  data  stored  on  the  replica  queried  [default]  

• ReadConcern(‘majority’)  returns  data  replicated  to  the  majority  of  replicas  i.e.  already  replicated

• majority  is  only  supported  by  the  WiredTiger  storage  engine,  not  by  the  MMAPv1

63

Page 64: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Read  Concern  Tips

• Read  from  secondaries when  possible  to  scale  reads• All  read  preference  modes  except  PRIMARY  may  return  stale  data  because  of  replication  lag

• majority  read  concern  is  slow• majority  read  concern  does  not  guarantee  the  latest  data,  but  the  latest  data  replicated  to  the  majority  of  replicas

64

Page 65: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Sharding  ConsiderationsHash  vs.  Timestamp  Distribution

Page 66: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Sharding  in  MongoDB

• Sharding:  Horizontal  partitioning  of  data  across  multiple  nodes  /  replicasets

• Sharded  replicasets  are  recommended  for  HA• Collections  are  sharded  across  replicasets  based  on  a  shard  key• High  cardinality  of  the  shard-­key  ensures  even  distribution  across  replicasets

• Collections  which  are  not  sharded  remain  on  the  primary  shard

66

Page 67: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Sharding  in  MongoDB

What  are  my  sharding  options?• Hash  based    • Range  based• Tag  based

67

Page 68: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Hash  Based  Sharding

• Use  hash  indexes  for  the  ranges• Evenly  distributed  reads  /  writes• Random  operations  due  to  random  sharding  algorithm• Retrieving  multiple  documents  can  lead  to  scatter  -­ gather• Key  use  cases:  

• Scaling:  Load  balancing  reads  &  writes  (example  to  follow)• Disaster  recovery:  Parallel  shard  recovery

68

Page 69: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Hash  Based  Sharding

Example:  Shard  key  hash(datetime)  -­ good  write  distribution

69

Shard1 (Primary) Shard2 Shard3

WRITES

“2016-4-19 00:00:00” “2016-4-20 00:00:00” “2016-4-18 00:00:00”

Page 70: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Hash  Based  Sharding

Example:  Shard  key  hash(datetime)  -­ scatter  gather  reads

70

Shard1 (Primary) Shard2 Shard3

“2016-4-19 00:00:00” “2016-4-20 00:00:00” “2016-4-18 00:00:00”

READS

Find datetime values between 17th & 21st of April

Page 71: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Hash  Based  Sharding

Example:  Shard  key  hash(userid)  -­ good  read  distribution

71

Shard1 (Primary) Shard2 Shard3

“ed4f7269”

Find user with id = ed4f7269

READS

Page 72: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Range  Based  Sharding

• Ranges  are  defined  on  the  defined  data  e.g.  number  /  date-­time• Data  is  divided  across  range  of  documents

• E.G.  4x  shards  with  int  1..100  >>  Shard1  with  values  1..25  etc.• Can  lead  to  hotspot  shards  on  date-­based  ranges• As  ranges  change  chunk  migration  may  cause  overhead• Key  use  cases:  

• Scaling:  Load  balancing  reads  &  writes• Disaster  recovery:  Parallel  shard  recovery

72

Page 73: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Range  Based  Sharding

Example:  Shard  key  datetime  value  -­ bad  write  distribution

73

Shard1 (Primary) Shard2 Shard3

WRITES

“2016-4-18 00:00:00” “2016-4-19 00:00:00” “2016-4-20 00:00:00”

On 20th April all writes go to Shard3

Page 74: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Range  Based  Sharding

Example:  Shard  key  datetime  value  -­ bad  write  distribution

74

Shard1 (Primary) Shard2 Shard3

“2016-4-18 00:00:00” “2016-4-19 00:00:00” “2016-4-20 00:00:00”

Similar scenario with reads

READS

Page 75: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Tag  Based  Sharding

Allows  for  custom  data  distributionData  is  divided  across  predefined  tags

• E.G.  “Americas”  on  Shard1  ..  “EU”  on  Shard2  ..  “APAC”  on  Shard3Can  lead  to  hotspots  depending  on  use-­caseKey  use  cases:  

• Geo-­locality:  Force  data  into  suitable  geographically  dispersed  shards• HW  Optimization:  Force  hot  data  onto  faster  hardware

75

Page 76: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Tag  Based  Sharding

Example:  Shard  tags  on  “location”  -­ faster  response  times

76

Shard1 (Primary) Tag: AM Shard2 Tag: EU Shard3 Tag: APAC

WRITES

“USA” “GREECE” “AUSTRALIA”

Writes occur in a local DC

Page 77: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Tag  Based  Sharding

Example:  Shard  tags  on  “location”  -­ faster  response  times

77

Shard1 (Primary) Tag: AM Shard2 Tag: EU Shard3 Tag: APAC

WRITES

“USA” “GREECE” “AUSTRALIA”

READS

Reads occur in a local DC

Page 78: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Tag  Based  Sharding

Example:  Shard  tags  on  “year”  ranges  -­ automatic  archiving

78

Shard1 (Primary) Tag: 2016 Shard2 Tag: 2010 - 2015 Shard3 Tag: < 2010

WRITES

“<NEW DATA>”

New data is written to high speed node

32x Cores - 128GB RAM - SSD 16x Cores - 64GB RAM - SSD 4x Cores - 32GB RAM - Rotational

“<FEWER WRITES>” “<NO WRITE ACTIVITY>”

Page 79: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

Tag  Based  Sharding

Example:  Shard  tags  on  “year”  ranges  -­ automatic  archiving

79

Shard1 (Primary) Tag: 2016 Shard2 Tag: 2010 - 2015 Shard3 Tag: < 2010

WRITES

“<NEW DATA>”

64x Cores - 256GB RAM - SSD 16x Cores - 64GB RAM - SSD 4x Cores - 32GB RAM - Rotational

READS

“<FEWER READS>” “<ONLY REPORTING>”

New data is written to high speed node

Page 80: BreakingBarriers:) MongoDB)Design)PatternsPattern:)Fast)Accounting Use(case:(Count(dailyand(monthlyreviewsposted(for(each(movie.(Displaya(histogram(on(the(movie(page Naive(solution:(Run(countson(the

…the  endQ&A