mythbusting: understanding how we measure the performance of mongodb
DESCRIPTION
TRANSCRIPT
- 1. Senior Director of Performance Engineering, MongoDB Alvin Richards #MongoDBWorld Mythbusting: Understanding How We Measure the Performance of MongoDB
- 2. Before we start We are going to look a lot at C++ kernel code Java benchmarks JavaScript tests And lots of charts
- 3. Measuring "Performance" https://www.youtube.com/watch?v=7wm-pZp_mi0
- 4. Benchmarking Some common traps Performance measurement & diagnosis What's next
- 5. Part One Some Common Traps
- 6. The Milk Train Doesn't Stop Here Anymore Tennessee Williams "We all live in a house on fire, no fire department to call; no way out, just the upstairs window to look out of while the fire burns the house down with us trapped, locked in it."
- 7. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); #1 Time taken to Insert x Documents
- 8. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); #1 Time taken to Insert x Documents
- 9. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); #1 Time taken to Insert x Documents
- 10. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); #1 Time taken to Insert x Documents
- 11. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); #1 Time taken to Insert x Documents
- 12. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); So that looks ok, right?
- 13. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); What are else you measuring? Object creation and GC management?
- 14. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); What are else you measuring? Thread contention on nextInt()? Object creation and GC management?
- 15. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); What are else you measuring? Time to synthesize data? Object creation and GC management? Thread contention on nextInt()?
- 16. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); What are else you measuring? Object creation and GC management? Thread contention on addAndGet()? Thread contention on nextInt()? Time to synthesize data?
- 17. long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "" doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); What are else you measuring? Object creation and GC management? Clock resolution? Thread contention on nextInt()? Time to synthesize data? Thread contention on addAndGet()?
- 18. // Pre Create the Object outside the Loop BasicDBObject[] aDocs = new BasicDBObject[documentsPerInsert]; for (int i=0; i < documentsPerInsert; i++) { BasicDBObject doc = new BasicDBObject(); String cVal = ""; doc.put("c",cVal); String padVal = ""; doc.put("pad",padVal); aDocs[i] = doc; } Solution: Pre-Create the objects Pre-create non varying data outside the timing loop Alternative Pre-create the data in a file; load from file
- 19. // Use ThreadLocalRandom generator or an instance of java.util.Random per thread java.util.concurrent.ThreadLocalRandom rand; for (long roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; doc = aDocs[i]; doc.put("_id",id); doc.put("k", nextInt(rand, numMaxInserts)+1); } coll.insert(aDocs); numInserts += documentsPerInsert; } // Maintain count outside the loop globalInserts.addAndGet(documentsPerInsert * roundNum); Solution: Remove contention Remove contention nextInt() by making Thread local
- 20. // Use ThreadLocalRandom generator or an instance of java.util.Random per thread java.util.concurrent.ThreadLocalRandom rand; for (long roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; doc = aDocs[i]; doc.put("_id",id); doc.put("k", nextInt(rand, numMaxInserts)+1); } coll.insert(aDocs); numInserts += documentsPerInsert; } // Maintain count outside the loop globalInserts.addAndGet(documentsPerInsert * roundNum); Solution: Remove contention Remove contention on addAndGet() Remove contention nextInt() by making Thread local
- 21. long startTime = System.currentTimeMillis(); long endTime = System.currentTimeMillis(); long startTime = System.nanoTime(); long endTime = System.nanoTime() - startTime; Solution: Timer resolution "resolution is at least as good as that of currentTimeMillis()" "granularity of the value depends on the underlying operating system and may be larger" Source http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
- 22. General Principal #1 Know what you are measuring
- 23. BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); #2 Response time to return all results
- 24. BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); #2 Response time to return all results
- 25. BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); #2 Response time to return all results
- 26. BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); #2 Response time to return all results
- 27. BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); So that looks ok, right?
- 28. BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); What are else you measuring? Each doc is is 4080 bytes on disk with powerOf2Sizes
- 29. BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); What are else you measuring? Each doc is is 4080 bytes on disk with powerOf2Sizes Unrestricted predicate?
- 30. BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); What are else you measuring? Each doc is is 4080 bytes on disk with powerOf2Sizes Measuring Time to parse & execute query Time to retrieve all document But also Cost of shipping ~4MB data through network stack Unrestricted predicate?
- 31. BasicDBObject predicate = new BasicDBObject(); predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20)); BasicDBObject projection = new BasicDBObject(); projection.put("_id", 1); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate, projection ); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); Solution: Limit the projection Return fixed range
- 32. BasicDBObject predicate = new BasicDBObject(); predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20)); BasicDBObject projection = new BasicDBObject(); projection.put("_id", 1); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate, projection ); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); Solution: Limit the projection Only project _id Return fixed range
- 33. BasicDBObject predicate = new BasicDBObject(); predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20)); BasicDBObject projection = new BasicDBObject(); projection.put("_id", 1); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate, projection ); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); Solution: Limit the projection Only project _id Only 46k transferred through network stack Return fixed range
- 34. General Principal #2 Measure only what you need to measure
- 35. Part Two Performance measurement & diagnosis
- 36. The Physical Principles of the Quantum Theory (1930) Werner Heisenberg "Every experiment destroys some of the knowledge of the system which was obtained by previous experiments."
- 37. Broad categories Micro Benchmarks Workloads
- 38. Micro benchmarks: mongo-perf
- 39. mongo-perf: goals Measure commands Configure Single mongod, ReplSet size (1 -> n), Sharding Single vs. Multiple DB O/S Characterize Throughput by thread count Compare
- 40. What do you get? Better
- 41. What do you get? Measured improvement between rc0 and rc2 Better
- 42. tests.push( { name: "Commands.CountsIntIDRange", pre: function( collection ) { collection.drop(); for ( var i = 0; i < 1000; i++ ) { collection.insert( { _id : i } ); } collection.getDB().getLastError(); }, ops: [ { op: "command", ns : "testdb", command : { count : "mycollection", query : { _id : { "$gt" : 10, "$lt" : 100 } } } } ] } ); Benchmark source code
- 43. tests.push( { name: "Commands.CountsIntIDRange", pre: function( collection ) { collection.drop(); for ( var i = 0; i < 1000; i++ ) { collection.insert( { _id : i } ); } collection.getDB().getLastError(); }, ops: [ { op: "command", ns : "testdb", command : { count : "mycollection", query : { _id : { "$gt" : 10, "$lt" : 100 } } } } ] } ); Benchmark source code
- 44. tests.push( { name: "Commands.CountsIntIDRange", pre: function( collection ) { collection.drop(); for ( var i = 0; i < 1000; i++ ) { collection.insert( { _id : i } ); } collection.getDB().getLastError(); }, ops: [ { op: "command", ns : "testdb", command : { count : "mycollection", query : { _id : { "$gt" : 10, "$lt" : 100 } } } } ] } ); Benchmark source code
- 45. tests.push( { name: "Commands.CountsIntIDRange", pre: function( collection ) { collection.drop(); for ( var i = 0; i < 1000; i++ ) { collection.insert( { _id : i } ); } collection.getDB().getLastError(); }, ops: [ { op: "command", ns : "testdb", command : { count : "mycollection", query : { _id : { "$gt" : 10, "$lt" : 100 } } } } ] } ); Benchmark source code
- 46. Code Change
- 47. Workloads "public" workloads YCSB Sysbench "real world" simulations Inbox fan in/out Message Stores Content Management
- 48. Example: Bulk Load Performance 16m Documents Better 55% degradation 2.6.0-rc1 vs 2.4.10
- 49. Ouch where's the tree in the woods? 2.4.10 -> 2.6.0 4495 git commits
- 50. git-bisect Bisect between good/bad hashes git-bisect nominates a new githash Build against githash Re-run test Confirm if this githash is good/bad Rinse and repeat
- 51. Code Change - Bad Githash
- 52. Code Change - Fix
- 53. Bulk Load Performance - Fix Better 11% improvement 2.6.1 vs 2.4.10
- 54. The problem with measurement Observability What can you observe on the system? Effect What effects can an observation cause?
- 55. mtools
- 56. mtools MongoDB log file analysis Filter logs for operations, events Response time, lock durations Plot https://github.com/rueckstiess/mtools
- 57. Response Times > 100ms Bulk Insert 2.6.0-rc0 Ops/Sec Time
- 58. Response Times > 100ms Bulk Insert 2.6.0-rc0 vs. 2.6.0-rc2 Floor raised
- 59. Code Change Yielding Policy
- 60. Code Change
- 61. Response Times Bulk Insert 2.6.0 vs 2.6.1 Ceiling similar, lower floor resulting in 40% improvement in throughput
- 62. Secondary effects of Yield policy change Write lock time reduced Order of magnitude reduction of write lock duration
- 63. > db.serverStatus() Yes will cause a read lock to be acquired > db.serverStatus({recordStats:0}) No lock is not acquired > mongostat Yes - until SERVER-14008 resolved, uses db.serverStatus() Unexpected side effects of measurement?
- 64. CPU sampling Get an impression of Call Graphs CPU time spent on node and called nodes
- 65. > sudo apt-get install google-perftools > sudo apt-get install libunwind7-dev > scons --use-cpu-profiler mongod Setup & building with google- profiler
- 66. > mongodb dbpath Note: Do not use fork > mongo > use admin > db.runCommand({_cpuProfilerStart: {profileFilename: 'foo.prof'}}) Execute some commands that you want to profile > db.runCommand({_cpuProfilerStop: 1}) Start the profiling
- 67. Sample start vs. end of workload
- 68. Sample start vs. end of workload
- 69. Code change
- 70. Public Benchmarks Not all forks are the same YCSB https://github.com/achille/YCSB sysbench-mongodb https://github.com/mdcallag/sysbench-mongodb
- 71. Part Three And next?
- 72. Beavis & Butthead "The future sucks. Change it." "I'm way cool Beavis, but I cannot change the future."
- 73. What we are working on mongo-perf UI refactor Adding more micro benchmarks Workloads Adding external benchmarks Creating benchmarks for common use cases Inbox fan in/out Analytical dashboards Stream / Feeds Customers, Partners & Community
- 74. Here's how you can help change the future! Got a great workload? Great benchmark? Want to donate it? [email protected]
- 75. Don't be that benchmark #1 Know what you are measuring #2 Measure only what you need to measure
- 76. [email protected] Senior Director of Performance Engineering, MongoDB Alvin Richards #MongoDBWorld Thank You