
MADRID · NOV 18-19 · 2016

19th November 2016, 18:30-19:15

MongoDB: Improve the performance of your application
by @juanroycouto

@codemotion_es

#Agenda

● Data Model
● Application Patterns
● .explain()
● Indexes
● Storage Engines
● Query Optimizer
● Aggregation Framework
● Profiling
● Tuning
● Sharding
● Storage & Compression


#Data Model

● All the information in one document:
  ○ No joins
  ○ Single read
  ○ Higher performance
  ○ ACID compliance at the document level
● MMAPv1: avoid unbounded document growth
● WiredTiger rewrites the document for each update
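The single-document model can be sketched with a hypothetical "order" document that embeds its line items, so one read returns everything and any update to it is atomic at the document level:

```javascript
// Sketch of the embedded data model (hypothetical fields): one order
// document carries its line items instead of referencing a second
// collection, so a single read returns the complete order.
const order = {
  _id: 1001,
  customer: "Ada",
  status: "shipped",
  items: [ // embedded, not joined
    { sku: "A-1", qty: 2, price: 9.99 },
    { sku: "B-7", qty: 1, price: 24.5 }
  ]
};

// No join needed: the total is computed from the one document we read.
const total = order.items.reduce((sum, it) => sum + it.qty * it.price, 0);
console.log(total.toFixed(2)); // "44.48"
```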

#Application Patterns

● Modify only those fields that have changed
● Project only the fields you need
● Avoid negation in queries
● Use Covered Queries when possible
● Avoid scatter-gather queries
● Write Concern:
  ○ w : 0 → faster than w : all
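The write-concern tradeoff can be sketched in the mongo shell (hypothetical collection name; w : "majority" shown as the acknowledged end of the spectrum):

```javascript
// Fire-and-forget: the client does not wait for acknowledgement
// (fastest, but the write may be lost).
db.events.insert({ type : "click" }, { writeConcern : { w : 0 } });

// Wait until a majority of replica-set members acknowledge
// (slower, but durable).
db.events.insert({ type : "payment" }, { writeConcern : { w : "majority" } });
```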

#.explain()

● Use .explain() to get the time a query needs
● Verbosity options:
  ○ queryPlanner (default)
  ○ executionStats (stats for the winning plan)
  ○ allPlansExecution (stats for all candidate plans)

#.explain()

● In-memory sorting is limited to 32 MB of data (beyond that you need an index)
● Index selectivity is the ratio between nReturned and totalKeysExamined
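That ratio is easy to read straight off the executionStats output; a minimal sketch (the numbers are from a hypothetical explain() run):

```javascript
// Index selectivity from explain("executionStats") output: the closer
// nReturned / totalKeysExamined is to 1, the more selective the index.
function selectivity(executionStats) {
  return executionStats.nReturned / executionStats.totalKeysExamined;
}

console.log(selectivity({ nReturned: 100, totalKeysExamined: 100 }));   // 1 (ideal)
console.log(selectivity({ nReturned: 100, totalKeysExamined: 10000 })); // 0.01 (poor)
```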


#Indexes

● Improve the performance of read-intensive apps
● Overhead:
  ○ Write operations
  ○ Disk usage
  ○ Memory consumption

#Indexes-Working Set

● The Working Set is made of:
  ○ Indexes
  ○ Your most frequently accessed data

#Indexes-Working Set

● If your Working Set does not fit into RAM:
  ○ Add more RAM
  ○ Shard the database
  ○ Avoid scanning all documents in the collection:
    ■ Page faults
    ■ You will fill your Working Set with infrequently accessed data

#Indexes-Before creating indexes

● Understand your data
● Know what your queries will look like
● Create as few indexes as possible
● Put your highly selective fields first in the index
● Check your current indexes before creating new ones

#Indexes-Advice

● Do not index low-cardinality fields
● Check your index performance
● Monitor your system to check whether:
  ○ Your strategy is still up to date
  ○ You need new indexes
  ○ You can remove unnecessary indexes

#Indexes-Compound

● A compound index performs better than:
  ○ Multiple single-field indexes
  ○ Index intersection
● Remove indexes that are prefixes of other indexes
● The value of the combined fields must have high cardinality

#Indexes-Compound

> db.collection.createIndex( { equality, sort, range } );

> db.collection.find( {
      book_author : 'John',        // equality
      book_id : { $gt : 54 }       // range
  } ).sort( { book_price : 1 } );  // sort

#Indexes-Compound example

> use test
> db.example.createIndex( { a : 1, b : 1 } );
> db.example.explain("executionStats").find( { a : 56, b : 87 } );
  "nReturned" : 100,
  "executionTimeMillis" : 2,
  "totalKeysExamined" : 100,
  "totalDocsExamined" : 100

> db.example.dropIndex( { a : 1, b : 1 } );
> db.example.createIndex( { a : 1 } );
> db.example.createIndex( { b : 1 } );
> db.example.explain("executionStats").find( { a : 56, b : 87 } );
  "nReturned" : 100,
  "executionTimeMillis" : 36,
  "totalKeysExamined" : 10000,
  "totalDocsExamined" : 10000

#Indexes-Covered Queries

● They return results directly from the index
● A covered query does not read at the document level:
  ○ totalDocsExamined : 0
● The index must contain all the fields:
  ○ Included in the query
  ○ Returned by the query

#Indexes-Covered Queries example

> use test
> db.example.createIndex( { a : 1, b : 1 } );
> db.example.explain("executionStats").find( { a : 56, b : 87 } );
  "nReturned" : 100,
  "executionTimeMillis" : 3,
  "totalKeysExamined" : 100,
  "totalDocsExamined" : 100

> db.example.explain("executionStats").find( { a : 56, b : 87 }, { _id : 0, a : 1, b : 1 } );
  "nReturned" : 100,
  "executionTimeMillis" : 3,
  "totalKeysExamined" : 100,
  "totalDocsExamined" : 0

#Indexes-TTL

● TTL (Time-To-Live) Indexes:
  ○ Allow the user to specify a period of time after which the data will be automatically deleted

#Indexes-TTL example

> use test
> db.timetolive.createIndex( { date : 1 } , { expireAfterSeconds : 20 } );
> db.timetolive.insert( { date : new Date() } );
> db.timetolive.find().pretty()
{
  "_id" : ObjectId("581fb43f5626d57bcf66fe73"),
  "date" : ISODate("2016-11-06T22:52:47.373Z")
}
> db.timetolive.count()
0
>

The background task that removes expired documents runs every 60 seconds.

#Indexes-Partial

● Partial Indexes:
  ○ Index only a subset of your data by specifying a filter expression at index creation
  ○ Better query performance while reducing system overhead

#Indexes-Partial example

> use test
> db.sysprofile.createIndex( { ns : 1, op : 1 }, { partialFilterExpression : { millis : { $gt : 10000 } } } );

> db.sysprofile.explain().find( { ns : 'school2.students', op : 'query' }, { millis : 1 } );
  "winningPlan" : { "inputStage" : {
      "stage" : "COLLSCAN"

> db.sysprofile.explain().find( { ns : 'school2.students', op : 'query', millis : { $gte : 20000 } }, { millis : 1 } );
  "winningPlan" : { "inputStage" : {
      "stage" : "IXSCAN"

#Indexes-Sparse

● Sparse Indexes:
  ○ Only contain entries for documents that contain the indexed field
  ○ Smaller and more efficient indexes

#Indexes-Sparse example

> use test
> db.test.createIndex( { b : 1 } , { sparse : true } );
> db.test.stats()
16384
>

> db.test.createIndex( { b : 1 } );
> db.test.stats()
2580480
>

#Storage Engines-WiredTiger

● Reduced storage space and higher I/O scalability
● It compresses indexes in RAM, freeing up Working Set space for documents
● Performance (granular concurrency control)
● Native compression:
  ○ per collection
  ○ per index

#Storage Engines-WiredTiger

● Compression types:
  ○ Snappy (default): data and journal
  ○ zlib: maximum compression
  ○ Prefix compression: for indexes
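The compressor can be overridden per collection at creation time; a sketch in the mongo shell ("logs" is a hypothetical collection name):

```javascript
// Override the default (Snappy) block compressor for one collection,
// trading CPU for maximum on-disk compression with zlib.
db.createCollection("logs", {
  storageEngine : {
    wiredTiger : { configString : "block_compressor=zlib" }
  }
});
```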

#Storage Engines-In-Memory

● Extreme performance coupled with real-time analytics for the most demanding, latency-sensitive applications
● Delivers extreme throughput and predictable latency
● Eliminates the need for separate caching layers

#Query Optimizer

● It selects the best index to use
● It periodically runs alternative query plans and selects the index with the best response time for each query type
● The results of these tests are stored as a cached query plan and are updated periodically
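To compare the optimizer's choice against a specific index, the shell's hint() can force one (hypothetical collection and index):

```javascript
// Force the { a : 1 } index instead of the optimizer's pick, then
// inspect the stats to see which plan actually performs better.
db.example.find( { a : 56, b : 87 } )
          .hint( { a : 1 } )
          .explain("executionStats");
```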


#Aggregation Framework

● Use $indexStats to determine how frequently your index is used
● Each shard has its own statistics
● Stats are reset when you restart your mongod service

> db.collection.aggregate([ { $indexStats : {} } ])

#Aggregation Framework example

> db.test.aggregate([ { $indexStats : {} } ]).pretty()
{
  "name" : "b_1",
  "key" : {
    "b" : 1
  },
  "host" : "hostname:27017",
  "accesses" : {
    "ops" : NumberLong(4),
    "since" : ISODate("2016-11-07T00:05:39.771Z")
  }
}
...


#Tools-Compass

● MongoDB Compass:
  ○ Analyze the size and usage of the indexes
  ○ Visual explain plans
  ○ Index creation
  ○ Real-time stats
  ○ Identify and fix performance and data problems

#Tools-Ops Manager

● The Visual Query Profiler analyzes query performance and recommends indexes to add
● It can visualize the profiler's output when identifying slow queries

#Tools-Cloud Manager & Atlas

● Use the Query Targeting chart

#Tools-Profiling

● Find the slowest queries so you can improve their indexes
● Log information for all operations, or only those whose duration exceeds a configurable threshold (100 ms by default)

#Tools-Profiling example

> db.setProfilingLevel(2)
{ "was" : 0, "slowms" : 100, "ok" : 1 }
> db.example.find( { c : 98 } );
> db.system.profile.find().pretty()
{
  "op" : "query",
  "ns" : "test.example",
  "query" : {
    "find" : "example",
    "filter" : {
      "c" : 98
    }
  },
  ...
  "millis" : 6

#Tuning-SSD

● Most disk access patterns are not sequential
● Data files benefit from SSDs
● MongoDB journal files have highly sequential write patterns (a good fit for spinning disks)
● RAM/SSDs are better than spinning drives
● Use high-performance storage
● Avoid networked storage

#Tuning-CPU

● MongoDB performs better on faster CPUs
● The WiredTiger Storage Engine is better able to saturate multi-core processors than the MMAPv1 Storage Engine

#Tuning-Thread

● The WiredTiger Storage Engine is multi-threaded and can take advantage of many CPU cores

#Sharding

● The balancer has moved from the mongos to the primary member of the config server replica set
● Sharding improves read performance
● Benefits (even with a 'poor' shard key and scatter-gather queries):
  ○ More total memory available to store indexes
  ○ Queries are parallelized across shards, reducing latency

#Storage & Compression types

● Balance query latency against storage density and cost by assigning data sets, based on a value such as a timestamp, to specific storage devices

#Storage & Compression types

● Recent, frequently accessed data can be assigned to high-performance SSDs with Snappy compression
● Older, less frequently accessed data is tagged to lower-throughput hard disk drives compressed with zlib, attaining maximum storage density at a lower cost-per-bit
● As data ages, MongoDB automatically migrates it between shards
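This tiering can be sketched with tag-aware sharding (hypothetical shard, database, and collection names; the cutoff date is illustrative):

```javascript
// Tag SSD-backed and HDD-backed shards, then pin ranges of a
// timestamp-based shard key to each tier. The balancer migrates
// chunks to match the tags as data ages.
sh.addShardTag("shard0000", "recent");   // SSD-backed shard
sh.addShardTag("shard0001", "archive");  // HDD-backed shard

sh.addTagRange("mydb.events",
  { ts : ISODate("2016-01-01") }, { ts : MaxKey }, "recent");
sh.addTagRange("mydb.events",
  { ts : MinKey }, { ts : ISODate("2016-01-01") }, "archive");
```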

#Questions?

MongoDB: Improve the performance of your application
by @juanroycouto

Thank you for your attention!