Aaron Silverman
Mongo or Die!How MongoDB powers Doodle or Die
(@Zugwalt)
Doodle or Die
@DoodleOrDie
What is Doodle or Die?
Telephone
I'd like some beer!
I'd like some deer
I see no deer
I’ve no idea!
Phrase
Phrase
Phrase
Phrase
Doodle or Die
Shining Apple
Doodle
Phrase
Phrase Eat your fruit or DIE!
Doodle
What Powers Doodle or Die?
Started very, very small
4 Cores128MB RAM
Node Server
MongoDB Server
Small - 2GB
Got serious about our servers
MongoDB Server8 Cores256MB RAM
Node Server
Large - 5GB
Called in some Reinforcements
MongoDB Server12 Cores1GB RAM
Node Server
In the last 30 Days:
• 2,500,000 page views
• 100,000 uniques
• 35,000 active player accounts
• 2,000,000 new doodles and descriptions
“Small Data”Mongo DB4 GB data size<1 GB index size~10 queries/sec
Amazon170 GB data size8 GB in/month200 GB out/month
MongoHQ: $67• Amazon S3: $70
• Player Info• Chain Info (excluding
doodles)• Group Info• Game State• Logs
• Doodles• Static Content• Compressed
Database Backups
MongoDB - $65 / month
Daily backups to Amazon S3: $16/mo
Amazon S3 - $70 / month
Node - $62/ month
Total Cost To Host: $197/monthMongoDB $65Amazon S3 $70Node $62
Total $197
PAAS Provides Easy Upgrade Path
General Principles
Custom _idPartially random string generated using ShortId node module
ObjectIdObjectId("4fd02d5d78315a502d15cdde")ObjectId("4fd02d5a78315a502d15cddd")ObjectId("4fd02d5878315a502d15cddc")
ShortId"8rOIwh2VD""1qyY61Lu1""5GQnbx-1"
• Shorter, less cumbersome in code and queries
db.players.findOne({_id: ‘58mwYlTKV’});
db.chains.update({_id: ‘58mwYlTKV’}, {$set: activePlayer_id: ‘88ueYaL6V’});
http://doodleordie.com/c/5ONtvvSGH
• Randomness could help with sharding; more importantly makes it harder to cheat
<span class="doodle" data-jsonp="http://doodles.s3.amazonaws.com/d2/Eh8-Po2R5/1Em5kj3LY.js">
Question Oriented Subdocuments
What chain is this player working on right now?
What are this player’s stats?
Which players are not eligible to be assigned this chain?
Goal is for most “Questions” to be able to be answered in one query from one sub document
Related “Questions” will share common ancestors
db.players.findOne({_id: ‘58mwYlTKV’}, {‘game.recentSkips’: 1});
db.players.findOne({_id: ‘58mwYlTKV’}, {game: 1});
Indexes are designed to make answering questions easy!
Which player is working on this chain?
db.players.ensureIndex({‘game.activeChain_id’: 1});
What chains are recently awaiting a new doodle?
db.chains.ensureIndex({inUse: -1, activeState: 1 lastModified" : -1});
Doodle or Die Collections
Disclaimer: Much of the original Doodle or Die code and schema were created during a weekend long hackathon!
players
chains
groups
sessions
log
Primary
Support
Players
players
account
info
login
chainHistory
game
stats QueryFrequency
Often
Rarely
players
account
info
login
chainHistory
stats
game • activeState• activeChain_id• activeStepIndex• recentSkips
Answers question: “What is this player working on right now?”db.players.findOne({_id: ‘58mwYlTKV’}, {game: 1});
players
account
info
login
stats
game
chainHistory
Chain1• datePlayed• dateViewed
ChainN
Answers the question: “What has the player worked on?”db.players.findOne({_id: ‘58mwYlTKV’}, {chainHistory: 1});
players
account
info
login
stats
game
chainHistory
Chain1• datePlayed• dateViewed
ChainN
YUCK!Plan to refactor out along with a refactor of how chains store steps
players
account
info
login
game
chainHistory
Answers the question: “How active/good is this player?”
stats
• totalSteps• drawSteps• phraseSteps• numSkips• numLikes
db.players.findOne({_id: ‘58mwYlTKV’}, {stats: 1});
Chains
chains
• activePlayer_id• activeState• numSteps• lastModified
ineligiblePlayer_idssteps
chains are assigned (not as many questions)db.chains.findOne({inUse: false, activeState : player.game.activeState, ineligiblePlayer_ids: {$ne: player._id}, lastModified: { $gte: timeRange}});
chains
• activePlayer_id• activeState• numSteps• lastModified
*
steps
ineligiblePlayer_ids
[player_id1 player_id2, player_id3 … player_idN]
Note: $addToSet and $pull work great in maintaining this array
chains
• activePlayer_id• activeState• numSteps• lastModified
*ineligiblePlayer_ids
steps
[
]
Step1• player_id• state• date
StepN
content
• phrase
Description Step Content:
Doodle Step Content:• url (points to S3)• time• numStrokes• numDistinctColors
• phrase
Description Step Content:
Doodle Step Content:• url (points to S3)• time• numStrokes• numDistinctColors
doodles
• player_id• date• url (points to S3)• time• numStrokes• numDistinctColors
descriptions
• player_id• date• text
We plan to stop embedding steps in chains, and link to “doodles” and “descriptions” collections
How Does it all Work?
players queried for users who are associated with the authenticated twitter account
chain history used to load thumbnails, previous chain is loaded
db.players.find({‘login.twitter.uid’: ‘XXXXXXX’}, {game: 1, chainHistory: 1});
db.chains.find({_id: {$in: [player.game.activeChain_id, player.game.lastChain_id]}});
chains collection searched and atomically updated for unclaimed eligible chain to be given to player
db.chains.findAndModify( {inUse: false, activeState : player.game.activeState, ineligiblePlayer_ids: {$ne: player._id}, lastModified: { $gte: timeRange}}, {$set: {inUse: true, activePlayer_id: player._id, $addToSet: {ineligiblePlayer_ids: player._id}});
Content saved to chain
db.chains.update({_id: chain._id, activePlayer_id: player._id}}, {$inc: {numSteps: 1}, $set: inUse: false, lastModified: datePlayed} $unset: {activePlayer_id: 1}, $push: {steps: { player_id: player._id, state: chain.activeState, content: content, date: datePlayed}}});
Doodle strokes will be saved to S3(but url to S3 and metadata saved to chain)
{"color":"#000000","size":15,"path":[307,66,308,66,308,68,308,69,308,70,308,71,308,72,308,73,306,76,305,79,305,81,302,83,302,84,302,85,302,86,301,87,300,89,300,90,300,91,300,92,300,95,300,97,300,102,300,103,300,104,300,108,300,109,300,110,301,114,303,116,304,116,305,119,305,121,307,124,309,127,310,130,310,131,313,132,314,133,316,137,317,138,320,140,321,140,323,143,326,143,328,143,333,144,337,144,341,144,343,144,347,143,352,143,354,141,357,140,358,140,359,139,362,138,363,138,365,137,368,135,369,132,370,131,371,130,374,128,375,128,376,125,378,125,379,124,380,123,381,123,382,120,385,119,386,118,388,115,391,112,394,109,394,107,394,106,397,104,397,103,397,102,397,101,397,100,397,98,397,96,397,94,397,93,397,91,397,90,397,88,397,87,397,86,397,83,395,80,395,79,395,77,394,74,393,72,393,70,393,67,392,65,391,63,389,62,388,59,386,57,383,54,383,52,381,49,379,48,378,47,376,46,375,44,374,43,372,43,370,42,369,42,368,42,364,41,362,41,359,41,355,41,351,42,349,42,347,42,346,44,343,44,342,45,339,46,337,47,336,48,333,49,332,51,330,51,328,51,327,52,326,53,323,53,322,53,321,54,320,55,317,55,316,56,314,58,309,59,306,61,306,62,305,62,304,63]}
Player’s chainHistory retrieved
db.chains.find({$in: chain_idArr});
db.players.find({‘login.urlSafe’: urlSafeUid}, {chainHistory: 1});
Retrieved chains ordered (on server) based on previously sorted chainHistory
chainHistory filtered and sorted on server, Applicable chains/steps retrieved
Stats are loaded from the counting log which is essentially a bunch of counters incremented
db.log.find({_id: {$in: [‘2012-06-20’, ‘2012-06-20’, ‘2012-06-20’,
‘2012-06-20’]}});
Extremely simple implementation using nested subdocuments for organization
Some Additional Doodle or Die Tricks
Build assumptions into queries
Eve tries to join despite being banned!
groups: { _id: ‘8rOIwh2VD’, members: [ {name: ‘Alice’, dateJoined: ‘2012-05-24’}, {name: ‘Bob’, dateJoined: ‘2012-05-25’}
] banned: [ {name: ‘Eve’, dateBanned: ‘2012-05-25’}
]}
Alice and Bob are in an awesome group
groups.update( {_id: ‘8rOIwh2VD’}, {$push: {members: {name: ‘Eve’, dateJoined: new Date()}
}, }, function(err) {
if (err) throw err;callback();
});
Bugs in our code fail to detect Eve’s trickery and the update query is run!
groups: { _id: ‘8rOIwh2VD’, members: [ {name: ‘Alice’, dateJoined: ‘2012-05-24’}, {name: ‘Bob’, dateJoined: ‘2012-05-25’}, {name: ‘Eve’, dateJoined: ‘2012-05-26’}
] banned: [ {name: ‘Eve’, dateRequested: ‘2012-05-25’}
]}
Blast! Eve got through! How can we help prevent this?
Bake in our assumptions!
groups.update( {_id: ‘8rOIwh2VD’, ‘members.name’: {$ne: ‘Eve’}, ‘banned.name’: {$ne: ‘Eve’}}, {$push: {members: {name: ‘Eve’, dateJoined: new Date()}
}, }, function(err, updateCount) { if (err) throw err; if (updateCount !== 1) throw new Error(‘bugs!’); callback(updateCount === 1); });
Always Specify Fields
To assign a player a new chain, we only need their _id and game information
players
account
info
login
chainHistory
stats
gamegame
To load a player’s profile content, we just need their history, stats, and profile info
players
account
info
login
chainHistory
stats
gamechainHistory
stats
info
We need to assign Bob a new chain, lets figure out what state he needs next
db.players.find({name: ‘Bob’});
This will fetch and send back the ENTIRE player object!
Lets specify that we only want our “game” subdocument
db.players.find({name: ‘Bob’}, {fields: [‘game’]});
Hooray! Much less to retrieve and send over the series of tubes.
Store Everything!
If you have information available, save it even if you don’t plan on using it!
db.chains.update({_id: ‘2VmVO18hs’}, {$push: {player_id: ‘1qyY61Lu1’, state: ‘draw’, date: new Date(), content: {time: 106896,
count: 27,
width: 520,
height: 390,
step_id: ‘1i7RlFbgU’ } });
Some Pain Points
Less obvious database structure
mysql> DESC players;+----------------+---------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+----------------+---------+------+-----+---------+-------+| id | int(11) | NO | PRI | NULL | || activeStateId | int(11) | YES | | NULL | || activeStepId | int(11) | YES | | NULL | |+----------------+---------+------+-----+---------+-------+
Data integrity up to application
Mysql> ALTER TABLE players ADD CONSTRAINT fk_players_chains FOREIGN KEY (activeChainId) REFERENCES chains(id);
/* Oh no! This is going to mess things up! */
players.update({_id: ‘c58D4’}, {$set: {‘game.activeChain_id’: ‘bad_id’}});
No mature GUI query development tools
If we could start over would we still use MongoDB?
Absolutely!
• NoSQL in general is great for rapid prototyping
• Fantastic performance
• Intuitive language keeps it easy to run one-off queries / refactor schema
• Excellent (and now officially supported) Node driver
Questions?
@DoodleOrDie@Zugwalt