Download - Data Modeling Deep Dive
![Page 1: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/1.jpg)
![Page 2: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/2.jpg)
Data Modeling:
Four use cases
Toji GeorgeSolutions ArchitectMongoDB Inc.
![Page 3: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/3.jpg)
Agenda
• 4 Real World Schemas
– Inbox
– History
– Indexed Attributes
– Multiple Identities
• Conclusions
![Page 4: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/4.jpg)
In MongoDB
Application Development requires Good Schema
Design
Success comes from Proper Data Structure
“Schema-less”?
![Page 5: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/5.jpg)
#1 –Message Inbox
![Page 6: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/6.jpg)
Lets get social
![Page 7: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/7.jpg)
Sending Messages
?
![Page 8: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/8.jpg)
Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox
![Page 9: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/9.jpg)
Reading My Inbox
?
![Page 10: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/10.jpg)
Three (of many) Approaches
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
![Page 11: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/11.jpg)
Fan out on read
// Shard on "from"db.shardCollection( "mongodbdays.inbox", { from: 1 } )
// Make sure we have an index to handle inbox readsdb.inbox.ensureIndex( { to: 1, sent: 1 } )
msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(), message: "Hi!",
}
// Send a messagedb.inbox.save( msg )
// Read my inboxdb.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
![Page 12: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/12.jpg)
Fan out on read – I/O
Shard 1 Shard 2 Shard 3
Send Message
![Page 13: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/13.jpg)
Fan out on read – I/O
Shard 1 Shard 2 Shard 3
Read Inbox
Send Message
![Page 14: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/14.jpg)
Considerations
• Write: One document per message sent
• Read: Find all messages with my own name in
the recipient field
• Read: Requires scatter-gather on sharded
cluster
• A lot of random I/O on a shard to find everything
![Page 15: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/15.jpg)
Fan out on write
// Shard on “recipient” and “sent” db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } )
msg = {from: "Joe",to: [ "Bob", "Jane" ],sent: new Date(), message: "Hi!",
}
// Send a messagefor ( recipient in msg.to ) {
msg.recipient = msg.to[recipient]db.inbox.save( msg );
}
// Read my inboxdb.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )
![Page 16: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/16.jpg)
Fan out on write – I/O
Shard 1 Shard 2 Shard 3
Send Message
![Page 17: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/17.jpg)
Fan out on write – I/O
Read Inbox
Send Message
Shard 1 Shard 2 Shard 3
![Page 18: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/18.jpg)
Considerations
• Write: One document per recipient
• Read: Find all of the messages with me as the
recipient
• Can shard on recipient, so inbox reads hit one
shard
• But still lots of random I/O on the shard
![Page 19: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/19.jpg)
Fan out on write with buckets
// Shard on "owner / sequence"
db.shardCollection( "mongodbdays.inbox",
{ owner: 1, sequence: 1 } )
db.shardCollection( "mongodbdays.users", { user_name: 1 } )
msg = {
from: "Joe",
to: [ "Bob", "Jane" ],
sent: new Date(),
message: "Hi!",
}
![Page 20: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/20.jpg)
Fan out on write with buckets
// Send a messagefor( recipient in msg.to) {
count = db.users.findAndModify({query: { user_name: msg.to[recipient] },
update: { "$inc": { "msg_count": 1 } },upsert: true,new: true }).msg_count;
sequence = Math.floor(count / 50);
db.inbox.update({ owner: msg.to[recipient], sequence: sequence }, { $push: { "messages": msg } },{ upsert: true } );
}
// Read my inboxdb.inbox.find( { owner: "Joe" } )
.sort ( { sequence: -1 } ).limit( 2 )
![Page 21: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/21.jpg)
Fan out on write with buckets
• Each “inbox” document is an array of messages
• Append a message onto “inbox” of recipient
• Bucket inboxes so there’s not too many
messages per document
• Can shard on recipient, so inbox reads hit one
shard
• 1 or 2 documents to read the whole inbox
![Page 22: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/22.jpg)
Fan out on write with buckets – I/O
Shard 1 Shard 2 Shard 3
Send Message
![Page 23: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/23.jpg)
Fan out on write with buckets – I/O
Shard 1 Shard 2 Shard 3
Read Inbox
Send Message
![Page 24: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/24.jpg)
#2 - History
![Page 25: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/25.jpg)
![Page 26: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/26.jpg)
Design Goals
• Need to retain a limited amount of history e.g.
– Hours, Days, Weeks
– May be legislative requirement (e.g. HIPPA, SOX,
DPA)
• Need to query efficiently by
– match
– ranges
![Page 27: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/27.jpg)
3 (of many) approaches
• Bucket by Number of messages
• Fixed size array
• Bucket by date + TTL collections
![Page 28: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/28.jpg)
Bucket by number of messages
db.inbox.find(){ owner: "Joe", sequence: 25, messages: [
{ from: "Joe",to: [ "Bob", "Jane" ],sent: ISODate("2013-03-01T09:59:42.689Z"),message: "Hi!"
},…
] }
// Query with a date rangedb.inbox.find ({owner: "friend1",
messages: { $elemMatch: {sent:{$gte: ISODate("…") }}}})
// Remove elements based on a datedb.inbox.update({owner: "friend1" },
{ $pull: { messages: { sent: { $gte: ISODate("…") } } } } )
![Page 29: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/29.jpg)
Considerations
• Shrinking documents, space can be reclaimed
with– db.runCommand ( { compact: '<collection>' } )
• Removing the document after the last element in
the array as been removed– { "_id" : …, "messages" : [ ], "owner" :
"friend1", "sequence" : 0 }
![Page 30: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/30.jpg)
Fixed size array
msg = {from: "Your Boss",to: [ "Bob" ],sent: new Date(), message: "CALL ME NOW!"
}
// 2.4 Introduces $each, $sort and $slice for $pushdb.messages.update(
{ _id: 1 }, { $push: { messages: { $each: [ msg ],
$sort: { sent: 1 }, $slice: -50 }
}}
)
![Page 31: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/31.jpg)
Considerations
• Need to compute the size of the array based on
retention period
![Page 32: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/32.jpg)
TTL Collections
// messages: one doc per user per day
db.inbox.findOne()
{
_id: 1,
to: "Joe",
sequence: ISODate("2013-02-04T00:00:00.392Z"),
messages: [ ]
}
// Auto expires data after 31536000 seconds = 1 year
db.messages.ensureIndex( { sequence: 1 },
{ expireAfterSeconds: 31536000 } )
![Page 33: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/33.jpg)
#3 – Indexed Attributes
![Page 34: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/34.jpg)
Design Goal
• Application needs to stored a variable number of
attributes e.g.
– User defined Form
– Meta Data tags
• Queries needed
– Equality
– Range based
• Need to be efficient, regardless of the number of
attributes
![Page 35: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/35.jpg)
2 (of many) Approaches
• Attributes as Embedded Document
• Attributes as Objects in an Array
![Page 36: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/36.jpg)
Attributes as a sub-document
db.files.insert( { _id: "local.0",
attr: { type: "text", size: 64,
created: ISODate("..." } } )
db.files.insert( { _id: "local.1",
attr: { type: "text", size: 128} } )
db.files.insert( { _id: "mongod",
attr: { type: "binary", size: 256,
created: ISODate("...") } } )
// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text"} )
// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
![Page 37: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/37.jpg)
Considerations
• Each attribute needs an Index
• Each time you extend, you add an index
• Lots and lots of indexes
![Page 38: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/38.jpg)
Attributes as objects in array
db.files.insert( {_id: "local.0",
attr: [ { type: "text" },
{ size: 64 },
{ created: ISODate("...") } ] } )
db.files.insert( { _id: "local.1",
attr: [ { type: "text" },
{ size: 128 } ] } )
db.files.insert( { _id: "mongod",
attr: [ { type: "binary" },
{ size: 256 },
{ created: ISODate("...") } ] } )
db.files.ensureIndex( { attr: 1 } )
![Page 39: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/39.jpg)
Considerations
• Only one index needed on attr
• Can support range queries, etc.
• Index can be used only once per query
![Page 40: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/40.jpg)
#4 –Multiple Identities
![Page 41: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/41.jpg)
Design Goal
• Ability to look up by a number of different
identities e.g.
- Username
- Email address
- FB handle
- LinkedIn URL
![Page 42: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/42.jpg)
2 (of many) approaches
• Identifiers in a single document
• Separate Identifiers from Content
![Page 43: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/43.jpg)
Single document by user
db.users.findOne()
{ _id: "joe",
email: "[email protected],
fb: "joe.smith", // facebook
li: "joe.e.smith", // linkedin
other: {…}
}
// Shard collection by _id
db.shardCollection("mongodbdays.users", { _id: 1 } )
// Create indexes on each key
db.users.ensureIndex( { email: 1} )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )
![Page 44: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/44.jpg)
Read by _id (shard key)
Shard 1 Shard 2 Shard 3
find( { _id: "joe"} )
![Page 46: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/46.jpg)
Considerations
• Lookup by shard key is routed to 1 shard
• Lookup by other identifier is scatter gathered
across all shards
• Secondary keys cannot have a unique index
![Page 47: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/47.jpg)
Document per identity
// Create unique index
db.identities.ensureIndex( { identifier : 1} , { unique: true} )
// Create a document for each users document
db.identities.save(
{ identifier : { hndl: "joe" }, user: "1200-42" } )
db.identities.save(
{ identifier : { email: "[email protected]" }, user: "1200-42" } )
db.identities.save(
{ identifier : { li: "joe.e.smith" }, user: "1200-42" } )
// Shard collection by _id
db.shardCollection( "mydb.identities", { identifier : 1 } )
// Create unique index
db.users.ensureIndex( { _id: 1} , { unique: true} )
// Shard collection by _id
db.shardCollection( "mydb.users", { _id: 1 } )
![Page 48: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/48.jpg)
Read requires 2 reads
Shard 1 Shard 2 Shard 3
db.identities.find({"identifier" : { "hndl" : "joe" }})
db.users.find( { _id: "1200-42"} )
![Page 49: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/49.jpg)
Considerations
• Lookup to Identities is a routed query
• Lookup to Users is a routed query
• Unique indexes available
• Must do two queries per lookup
![Page 50: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/50.jpg)
Conclusion
![Page 51: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/51.jpg)
Summary
• Multiple ways to model a domain problem
• Understand the key uses cases of your app
• Balance between ease of query vs. ease of write
• Reduce random I/O where possible for better
performance
![Page 52: Data Modeling Deep Dive](https://reader034.vdocuments.us/reader034/viewer/2022052601/559829901a28abdd308b466e/html5/thumbnails/52.jpg)