Big and Fat: Using MongoDB with Deep and Diverse Datasets (MongoATL version)
The best presentation ever. Life changing.
Tuesday, February 8, 2011
Big and Fat
Using MongoDB with deep and diverse datasets: a case study
About me
• My name is Jeremy McAnally
• “Software architect” at Intridea
• Write a lot of books, OSS, etc.
• http://github.com/jm
• http://twitter.com/jm
• http://authoringebooks.com
• http://wickhamhousebrand.com
New book!
-2 days from today
Preface
The Application™
Disclaimer
We moved to (mostly) SQL.
[Slide filled with “YAK SHAVE” repeated over and over]
Lesson 1
Abstraction is a double-edged sword.
Abstract away!
Talking to all data the same way, no matter the source, will keep you sane.
users = MySQL::Query.execute("SELECT * FROM users;")
users.each do |u|
  posts = db.collection('posts').find(:user_id => u['id'])
  # [...]
  comments = db.collection('comments').find("$where" => "this.admin_count + this.moderator_count == 5")
end
users = User.all
users.each do |u|
  posts = Post.find(:user_id => u.id)
  # [...]
  comments = Comment.where("this.admin_count + this.moderator_count == 5")
end
users = User.all
users.each do |u|
  posts = Post.find(:user_id => u.id)
  # [...]
  comments = Comment.with_five_things
end
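The win in that last version is that the condition gets a name and lives in one place. Here is a minimal plain-Ruby sketch of the idea, assuming an in-memory class standing in for a real ORM model; `records`, `create`, and the block-taking `where` are hypothetical helpers, not Mongoid or MongoMapper APIs.

```ruby
# Hypothetical in-memory stand-in for an ORM model; no database involved.
class Comment
  attr_reader :admin_count, :moderator_count

  def initialize(admin_count:, moderator_count:)
    @admin_count = admin_count
    @moderator_count = moderator_count
  end

  # Pretend collection: a class-level array of records.
  def self.records
    @records ||= []
  end

  def self.create(**attrs)
    record = new(**attrs)
    records << record
    record
  end

  # Generic filter: every caller has to know the query logic.
  def self.where(&predicate)
    records.select(&predicate)
  end

  # Named query: the condition lives in one place behind an intention-revealing name.
  def self.with_five_things
    where { |c| c.admin_count + c.moderator_count == 5 }
  end
end

Comment.create(admin_count: 2, moderator_count: 3)
Comment.create(admin_count: 1, moderator_count: 1)
puts Comment.with_five_things.size # prints 1
```

If the backing store changes, only `where` and `with_five_things` need to move; the calling loop stays the same.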
...but wait!
MongoDB has a lot of features that perform better and mean less (and often better) code.
pharmacists = {}
Patient.all.each do |patient|
  patient.prescriptions.each do |prescription|
    pharmacists[prescription.name] ||= 0
    pharmacists[prescription.name] += 1
  end
end
SLOW AS CRAP
map = "function() {
  this.prescriptions.forEach(function(p) {
    emit(p.name, { count : 1 });
  });
}"
reduce = "function(k, v) {
  var number = 0;
  v.forEach(function(value) { number += value.count; });
  return { count : number };
}"
pharms = @patients.map_reduce(map, reduce)
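MongoDB runs the map function on each document and then reduces the values collected for each key; it may also run reduce again on earlier reduce results, so reduce has to return the same shape it receives. A small in-memory imitation of that contract in plain Ruby (not the Ruby driver's API; `simulate_map_reduce` and the sample drug names are hypothetical):

```ruby
# In-memory imitation of the map/reduce contract; not the Ruby driver's API.
def simulate_map_reduce(docs, map_fn, reduce_fn)
  groups = Hash.new { |hash, key| hash[key] = [] }
  emit = ->(key, value) { groups[key] << value }

  # Map phase: each document may emit any number of (key, value) pairs.
  docs.each { |doc| map_fn.call(doc, emit) }

  # Reduce phase: like MongoDB, skip reduce for keys that emitted a single value.
  groups.each_with_object({}) do |(key, values), result|
    result[key] = values.size == 1 ? values.first : reduce_fn.call(key, values)
  end
end

map_fn = lambda do |patient, emit|
  patient[:prescriptions].each { |p| emit.call(p[:name], { count: 1 }) }
end

# Output has the same shape as the emitted values, so it can safely be re-reduced.
reduce_fn = lambda do |_key, values|
  { count: values.sum { |v| v[:count] } }
end

# Hypothetical sample data.
patients = [
  { prescriptions: [{ name: "drug_a" }, { name: "drug_b" }] },
  { prescriptions: [{ name: "drug_a" }] }
]

result = simulate_map_reduce(patients, map_fn, reduce_fn)
puts result["drug_a"][:count] # prints 2
```

The counting happens next to the data instead of pulling every patient into the app, which is where the loop-based version loses.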
Lesson 2
Schema design matters.
DATA MODEL
Embedding works.
Embedding documents is a smart decision in a lot of cases.
SELECT * FROM patients WHERE id=212;
SELECT * FROM prescriptions WHERE patient_id=212;
SELECT * FROM appointments WHERE patient_id=212;
SELECT * FROM contacts WHERE patient_id=212;
SELECT * FROM claims WHERE patient_id=212;
...
{
  "_id" : ObjectId("4d51959614971661303ea716"),
  "title" : "Blogs rawk.",
  "body" : "Fo realz",
  "comments" : [
    { "user_name" : "Jeremy", "user_id" : 1234, "body" : "Yup." }
  ]
}
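Embedding is roughly the inverse of that stack of SELECTs: each child table becomes an array on the parent document, so one read returns everything. A plain-Ruby sketch, assuming rows are hashes and every child row carries a `patient_id` (the `embed_patient` helper and the sample rows are hypothetical):

```ruby
# Fold normalized child rows into a single embedded document for one patient.
def embed_patient(patient_id, patients, children)
  base = patients.find { |p| p[:id] == patient_id }
  return nil if base.nil?

  doc = base.dup
  children.each do |name, rows|
    # Each child table becomes an array on the parent; the foreign key is now redundant.
    doc[name] = rows.select { |r| r[:patient_id] == patient_id }
                    .map { |r| r.reject { |k, _| k == :patient_id } }
  end
  doc
end

# Hypothetical sample rows standing in for the tables above.
patients = [{ id: 212, name: "Subject A" }]
children = {
  prescriptions: [{ patient_id: 212, name: "drug_a" }, { patient_id: 7, name: "drug_b" }],
  appointments:  [{ patient_id: 212, date: "2011-02-08" }]
}

doc = embed_patient(212, patients, children)
puts doc[:prescriptions].size # prints 1
```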
...but watch it.
You can also hit a ton of performance and design issues.
[Diagram: OUR GIANT DOCUMENT vs. Mongo’s Pre-Allocated Space]
[Diagram: Patient, Pharmacy, and a “Reference” Pharmacy used for search, listing, etc.]
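One reading of the diagram above is that the patient holds a reference to its pharmacy while a separate pharmacy collection serves search and listing, which keeps the patient document from growing without bound. Under that assumption (all names and sample data here are hypothetical), the shape looks something like:

```ruby
# Hypothetical reference collection: the single authoritative copy of each pharmacy.
PHARMACIES = {
  "ph1" => { id: "ph1", name: "Pharmacy A", city: "Springfield" },
  "ph2" => { id: "ph2", name: "Pharmacy B", city: "Shelbyville" }
}

# The patient stores only the pharmacy's id, so it stays small
# and pharmacy edits happen in one place.
patient = { name: "Subject A", pharmacy_id: "ph1" }

# Resolve the reference when the full pharmacy is actually needed.
def pharmacy_for(patient, pharmacies)
  pharmacies.fetch(patient[:pharmacy_id])
end

# Search and listing run against the reference collection, not inside patients.
def pharmacies_in(city, pharmacies)
  pharmacies.values.select { |ph| ph[:city] == city }
end

puts pharmacy_for(patient, PHARMACIES)[:name] # prints Pharmacy A
```

The trade-off is an extra lookup on read in exchange for documents that do not keep outgrowing their allocated space.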
Lesson 3
Don’t go nuts.
Schemaless is fun!
Having schemaless data has its own battery of advantages.
[Image labeled “nosql”, captioned OH MAN MONGO JUST GOT REAL UP IN HERE]
Schemaless Joy
• Transforming data models is a delight
• Formless data isn’t awkward
{
  "_id" : ObjectId("4d50c6c32472473e54122d29"),
  "name" : "Subject A",
  "2007" : 199,
  "2008" : 2002,
  "2010" : 387
},
{
  "_id" : ObjectId("4d50c6d92472473e54122d2a"),
  "name" : "Subject B",
  "2005" : 8,
  "2008" : 99,
  "2012" : 466
},
{
  "_id" : ObjectId("4d50c6f52472473e54122d2b"),
  "name" : "Subject C",
  "2005" : 100,
  "2009" : 120,
  "2010" : 1201,
  "2012" : 3469
}
> db.subjects.find({2008: {$ne: null}})
{ "_id" : ObjectId("4d50c6c32472473e54122d29"), "name" : "Subject A", "2007" : 199, "2008" : 2002, "2010" : 387 }
{ "_id" : ObjectId("4d50c6d92472473e54122d2a"), "name" : "Subject B", "2005" : 8, "2008" : 99, "2012" : 466 }
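That query just asks which documents have a non-null `2008` field. The same filter over the three sample documents in plain Ruby, as a rough equivalent rather than a description of how the server evaluates it:

```ruby
# The three documents above as plain hashes (ObjectIds omitted).
subjects = [
  { "name" => "Subject A", "2007" => 199, "2008" => 2002, "2010" => 387 },
  { "name" => "Subject B", "2005" => 8, "2008" => 99, "2012" => 466 },
  { "name" => "Subject C", "2005" => 100, "2009" => 120, "2010" => 1201, "2012" => 3469 }
]

# Rough equivalent of {2008: {$ne: null}}: the "2008" key is present and not nil.
with_2008 = subjects.reject { |doc| doc["2008"].nil? }
names = with_2008.map { |doc| doc["name"] } # => ["Subject A", "Subject B"]

# No two documents share the same fields, so ask the data which years it has.
years = subjects.flat_map { |doc| doc.keys - ["name"] }.uniq.sort
```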
• Arbitrary embedding is awesome
• Building to work with schemaless data can lead to some really powerful app concepts
...but be wary.
Going nuts will create headaches for you.
Schemaless Pain
• Weird app behavior
• Huge, long-running data transformations
• Annoying data transforms for development environments
• Difficult to version data models
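One common way to soften the last few points is to stamp each document with a version number and upgrade old shapes as they are read, instead of in one huge batch job. A minimal sketch, assuming a hypothetical `schema_version` field and a made-up v1 to v2 change that splits `name`:

```ruby
CURRENT_VERSION = 2

# One upgrade step per version bump; each takes a document at version N to N + 1.
UPGRADES = {
  1 => lambda do |doc|
    # Hypothetical v1 -> v2 change: split a single "name" into first/last.
    first, last = doc["name"].to_s.split(" ", 2)
    doc.reject { |k, _| k == "name" }.merge("first_name" => first, "last_name" => last)
  end
}

# Upgrade a document on read; documents without a version are treated as v1.
def upgrade(doc)
  version = doc.fetch("schema_version", 1)
  while version < CURRENT_VERSION
    doc = UPGRADES.fetch(version).call(doc)
    version += 1
  end
  doc.merge("schema_version" => version)
end

puts upgrade({ "name" => "Jeremy McAnally" })["last_name"] # prints McAnally
```

Writing the upgraded document back on save lets old data migrate gradually, and the version number gives development environments an explicit answer to "which shape is this?".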
Lesson 4
Dig deep.
> db.runCommand({"serverStatus" : 1})
{
  "version" : "1.4.3",
  "uptime" : 96,
  "localTime" : "Thu Nov 18 2010 01:49:38 GMT-0500 (EST)",
  "globalLock" : {
    "totalTime" : 96005290,
    "lockTime" : 174040,
    "ratio" : 0.0018128167729090762
  },
  "mem" : { "bits" : 64, "resident" : 2, "virtual" : 2396, "supported" : true, "mapped" : 0 },
  "connections" : { "current" : 1, "available" : 19999 },
  "extra_info" : { "note" : "fields vary by platform" },
  "indexCounters" : {
    "btree" : { "accesses" : 0, "hits" : 0, "misses" : 0, "resets" : 0, "missRatio" : 0 }
  },
  "backgroundFlushing" : {
    "flushes" : 1,
    "total_ms" : 0,
    "average_ms" : 0,
    "last_ms" : 0,
    "last_finished" : "Thu Nov 18 2010 01:49:02 GMT-0500 (EST)"
  },
  "opcounters" : { "insert" : 0, "query" : 1, "update" : 0, "delete" : 0, "getmore" : 0, "command" : 3 },
  "asserts" : { "regular" : 0, "warning" : 0, "msg" : 0, "user" : 0, "rollovers" : 0 },
  "ok" : 1
}
"opcounters" : { "insert" : 0, "query" : 1, "update" : 0, "delete" : 0, "getmore" : 0, "command" : 3}
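The opcounters are cumulative since the server started, so turning them into a rate takes two snapshots. A sketch of that arithmetic, which is essentially what mongostat's per-second columns show; `before` uses the counters above and `after` is made-up sample data:

```ruby
# Rate of each operation between two opcounters snapshots taken `seconds` apart.
def ops_per_second(before, after, seconds)
  after.each_with_object({}) do |(op, count), rates|
    # Counters are cumulative, so the difference is the work done in the interval.
    rates[op] = (count - before.fetch(op, 0)) / seconds.to_f
  end
end

# Values from the opcounters output above.
before = { "insert" => 0, "query" => 1, "update" => 0, "delete" => 0, "getmore" => 0, "command" => 3 }
# Hypothetical second snapshot taken 10 seconds later.
after = { "insert" => 10, "query" => 21, "update" => 0, "delete" => 0, "getmore" => 0, "command" => 13 }

rates = ops_per_second(before, after, 10)
puts rates["query"] # prints 2.0
```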
"connections" : { "current" : 1, "available" : 19999}
Jeremy-McAnallys-MacBook-Pro:~ jeremymcanally$ mongostat
connected to: 127.0.0.1
insert/s query/s update/s delete/s getmore/s command/s mapped vsize res % locked % idx miss conn     time
       0       0        0        0         0         1      0  2396   3        0          0    1 01:53:32
       0       0        0        0         0         1      0  2396   3        0          0    1 01:53:33
       0       0        0        0         0         1      0  2396   3        0          0    1 01:53:34
       0       0        0        0         0         1      0  2396   3        0          0    1 01:53:35
       0       0        0        0         0         1      0  2396   3        0          0    1 01:53:36
       0       0        0        0         0         1      0  2396   3        0          0    1 01:53:37
       0       0        0        0         0         1      0  2396   3        0          0    1 01:53:38
       0       0        0        0         0         1      0  2396   3        0          0    1 01:53:39
db._adminCommand({ diagLogging : 1 })
db.currentOp()
{
  inprog: [
    { "opid" : 35, "op" : "query", "ns" : "fundb.parties", "query" : "{ score : 1.0 }", "inLock" : 1 }
  ]
}
> db.oplog.$main.find()
{ "ts" : { "t" : 1290063566000, "i" : 1 }, "op" : "i", "ns" : "ming.foo", "o" : { "_id" : ObjectId("4ce4ceceabb1b65158000001"), "thing" : 2 } }
{ "ts" : { "t" : 1290063569000, "i" : 1 }, "op" : "n", "ns" : "", "o" : { } }
{ "ts" : { "t" : 1290063579000, "i" : 1 }, "op" : "n", "ns" : "", "o" : { } }
{ "ts" : { "t" : 1290063581000, "i" : 1 }, "op" : "i", "ns" : "ming.foo", "o" : { "_id" : ObjectId("4ce4ceddabb1b65158000002"), "thing" : 2 } }
{ "ts" : { "t" : 1290063581000, "i" : 2 }, "op" : "i", "ns" : "ming.foo", "o" : { "_id" : ObjectId("4ce4ceddabb1b65158000003"), "thing" : 2 } }
{ "ts" : { "t" : 1290063566000, "i" : 1 }, "op" : "i", "ns" : "ming.foo", "o" : { "_id" : ObjectId("4ce4ceceabb1b65158000001"), "thing" : 2 } }
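Each oplog entry carries an `op` code (`i` for insert and `n` for no-op in the output above; `u`, `d`, and `c` mark updates, deletes, and commands) plus a `ts` whose `i` orders entries that share the same `t`. A plain-Ruby sketch that tallies the trimmed entries above and picks the latest one; the helper and constant names are hypothetical:

```ruby
# Oplog op codes mapped to readable names.
OP_NAMES = { "i" => :insert, "u" => :update, "d" => :delete, "c" => :command, "n" => :noop }

# Count entries per operation type.
def summarize_oplog(entries)
  entries.each_with_object(Hash.new(0)) do |entry, counts|
    counts[OP_NAMES.fetch(entry["op"], :unknown)] += 1
  end
end

# Entries from the db.oplog.$main output above, with the "o" payloads trimmed.
entries = [
  { "ts" => { "t" => 1290063566000, "i" => 1 }, "op" => "i", "ns" => "ming.foo" },
  { "ts" => { "t" => 1290063569000, "i" => 1 }, "op" => "n", "ns" => "" },
  { "ts" => { "t" => 1290063579000, "i" => 1 }, "op" => "n", "ns" => "" },
  { "ts" => { "t" => 1290063581000, "i" => 1 }, "op" => "i", "ns" => "ming.foo" },
  { "ts" => { "t" => 1290063581000, "i" => 2 }, "op" => "i", "ns" => "ming.foo" }
]

counts = summarize_oplog(entries)
# Order by (t, i): two entries share the same t, and i breaks the tie.
latest = entries.max_by { |e| [e["ts"]["t"], e["ts"]["i"]] }
puts counts[:insert] # prints 3
```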
That’s all I got.
Questions?