MONGO SV
Keeping the lights on with MongoDB
Tony Tam
12/3/2010
PRESENTATION OVERVIEW
Data >>> code
  Treat it appropriately
Manage and maintain Mongo
  Mongo is young (and robust!)
Performance and Features
  The right hooks exist
WHO IS WORDNIK
Wordnik is:
  The world’s largest English language reference, ~10M words!
  Mapping every word, based on real data
  A (free) API to add word information, everywhere
WORDNIK’S MONGODB DEPLOYMENT
Over 12 months with Mongo
  Corpus / UGC / Structured Data / Statistics
  Master/Slave
  ~3TB data
  ~12B records
We love Mongo’s performance
Read more: http://blog.wordnik.com/12-months-with-mongodb
ENGINEERING + IT OPS
First, Guiding Principles
  Know your data
  Don’t rely on IT magic
Equal Importance in WebApps / SaaS
  Hold hands and be friends
  If you can’t manage it, don’t deploy it
ADMINS: BE PREPARED
ok, this sucks.
HOW?
Replicate!
  Is that enough?
  Well, not if your company is on the line
Snapshot
  Every minute???
Export often
  Really???
THEN WHAT?
Yes, Mongo can do Incremental
  Use the mongo slave mechanism
  It’s exposed
  It’s supported
  It’s very easy
  It’s extremely fast
How?
  Snapshot your data
  Stream write ops to disk
  Repeat
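The snapshot / stream / repeat loop can be sketched in a few lines. This is a minimal illustration, not the Wordnik tooling: the oplog is simulated as a list of dicts using the standard oplog field names (`ts`, `op`, `ns`, `o`), timestamps are plain integers, and `stream_ops` is a hypothetical helper name.

```python
import io
import json

def stream_ops(oplog, after_ts, out):
    """Append every write op newer than after_ts to out, one JSON object per line.

    Returns the newest timestamp written, to be used as the next checkpoint.
    """
    last_ts = after_ts
    for entry in oplog:
        # "i" = insert, "u" = update, "d" = delete; other op types are skipped here
        if entry["ts"] > after_ts and entry["op"] in ("i", "u", "d"):
            out.write(json.dumps(entry) + "\n")
            last_ts = max(last_ts, entry["ts"])
    return last_ts

# Simulated oplog (assumption: integer timestamps, hypothetical namespace)
oplog = [
    {"ts": 1, "op": "i", "ns": "test.words", "o": {"_id": 1, "word": "alpha"}},
    {"ts": 2, "op": "i", "ns": "test.words", "o": {"_id": 2, "word": "beta"}},
    {"ts": 3, "op": "d", "ns": "test.words", "o": {"_id": 1}},
]

backup_file = io.StringIO()
snapshot_ts = 1  # assume the snapshot already contains everything up to ts 1
checkpoint = stream_ops(oplog, snapshot_ts, backup_file)
# "Repeat": a second pass from the checkpoint finds nothing new
checkpoint = stream_ops(oplog, checkpoint, backup_file)
```

Keeping the checkpoint between passes is what makes the backup incremental: each pass only writes ops the previous pass has not seen.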
BETTER THAN FREE
Take our tools. They work!!!
SnapshotUtil
  Selectively snapshot in BSON
  Index info too!
IncrementalBackupUtil
  Tail the oplog, stream to disk
  Only the collections you want!
  Compress & rotate
RestoreUtil
  Recover your snapshots
  Apply indexes yourself
ReplayUtil
  Apply your incremental backups
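To make "only the collections you want" plus "compress & rotate" concrete, here is a rough sketch of the idea. It is not how IncrementalBackupUtil is implemented: the function name, the file naming scheme, and rotating by op count (rather than by size or time) are all assumptions chosen to keep the example short.

```python
import gzip
import json
import os
import tempfile

def write_incremental(entries, wanted_ns, out_dir, max_ops_per_file=2):
    """Keep ops for the wanted namespaces and write them to rotated gzip files."""
    selected = [e for e in entries if e["ns"] in wanted_ns]
    paths = []
    for start in range(0, len(selected), max_ops_per_file):
        chunk = selected[start:start + max_ops_per_file]
        # Hypothetical naming scheme: a zero-padded sequence number per file
        path = os.path.join(out_dir, "oplog-%06d.json.gz" % len(paths))
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            for entry in chunk:
                fh.write(json.dumps(entry) + "\n")
        paths.append(path)
    return paths

def read_incremental(paths):
    """Read the rotated files back in order, yielding one op per line."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                yield json.loads(line)

# Simulated ops for two hypothetical collections
entries = [
    {"ts": 1, "op": "i", "ns": "test.words", "o": {"_id": 1}},
    {"ts": 2, "op": "i", "ns": "test.logs", "o": {"_id": 7}},
    {"ts": 3, "op": "i", "ns": "test.words", "o": {"_id": 2}},
    {"ts": 4, "op": "d", "ns": "test.words", "o": {"_id": 1}},
]
backup_dir = tempfile.mkdtemp()
paths = write_incremental(entries, {"test.words"}, backup_dir)
restored = list(read_incremental(paths))
```

Filtering before writing keeps the backup files small, and fixed-size files are easier to ship and replay in batches.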
WHAT IF SCENARIOS
One collection gets corrupted?
  Restore it
  Apply all operations to it
“My top developer dropped a collection!”
  Restore just that one
  Apply operations to it until that point in time (POT)
“We got hacked!”
  Restore it all
  Apply operations until that POT
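A point-in-time restore is a replay with a cutoff. The sketch below assumes a simplified oplog: integer timestamps, updates that carry the full replacement document in `o` (real updates may carry modifiers instead), and a hypothetical `replay` helper. It restores one collection to the moment just before a drop.

```python
def replay(snapshot_docs, ops, ns, until_ts):
    """Rebuild one collection from a snapshot plus ops up to and including until_ts."""
    docs = {d["_id"]: dict(d) for d in snapshot_docs}
    for entry in sorted(ops, key=lambda e: e["ts"]):
        if entry["ns"] != ns or entry["ts"] > until_ts:
            continue  # other collections, commands, and anything after the cutoff
        if entry["op"] == "i":
            docs[entry["o"]["_id"]] = dict(entry["o"])
        elif entry["op"] == "u":
            # Simplification: treat "o" as the full replacement document
            doc_id = entry["o2"]["_id"]
            docs[doc_id] = dict(entry["o"], _id=doc_id)
        elif entry["op"] == "d":
            docs.pop(entry["o"]["_id"], None)
    return docs

snapshot_docs = [{"_id": 1, "word": "alpha"}]
ops = [
    {"ts": 2, "op": "i", "ns": "test.words", "o": {"_id": 2, "word": "beta"}},
    {"ts": 3, "op": "u", "ns": "test.words", "o2": {"_id": 1}, "o": {"word": "ALPHA"}},
    # The accidental drop is recorded as a command at ts 4
    {"ts": 4, "op": "c", "ns": "test.$cmd", "o": {"drop": "words"}},
]
# Restore test.words to the state just before the drop
restored_words = replay(snapshot_docs, ops, "test.words", until_ts=3)
```

The same function covers all three scenarios above: no cutoff for a corrupted collection, a cutoff just before the drop, or a cutoff just before the intrusion applied to every collection.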
WHAT ELSE IS POSSIBLE?
Replication
  Why not use built-in?
  Control, of course
  Same logic as Incremental + Replay
  Add some filters and it gets interesting
HOT DATACENTER
Create incremental backups
Compress
Push to DC in batch
Apply to master
[Diagram: Primary Datacenter (Master -> Incremental Backup -> Files) --SCP--> Hot Datacenter (Replay Util -> Master)]
DEV ENVIRONMENT
Developers need production-ish data
Anonymize while replicating to dev server
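Anonymizing during replication amounts to rewriting each op before it is replayed on the dev server. A minimal sketch follows, assuming insert-style ops with the document in `o`. The field names in `SENSITIVE_FIELDS` and the choice of a truncated SHA-256 digest are illustrative assumptions, not part of the original tooling.

```python
import copy
import hashlib

# Hypothetical list of fields that must not reach the dev server in clear text
SENSITIVE_FIELDS = ("email", "user_name")

def anonymize(entry):
    """Return a copy of an oplog entry with sensitive fields replaced by a digest."""
    clean = copy.deepcopy(entry)  # never mutate the production-side entry
    doc = clean.get("o", {})
    for field in SENSITIVE_FIELDS:
        if field in doc:
            digest = hashlib.sha256(str(doc[field]).encode("utf-8")).hexdigest()
            doc[field] = digest[:12]
    return clean

original = {
    "ts": 1, "op": "i", "ns": "test.users",
    "o": {"_id": 1, "email": "someone@example.com", "favorite_word": "zeitgeist"},
}
for_dev = anonymize(original)
```

Hashing instead of deleting keeps values stable across replays, so lookups and joins on the anonymized field still behave like production.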
MULTIPLE UPSTREAM MASTERS
Aggregate to single collection
Target can be a master!
[Diagram: Master A (db.page_views) and Master B (db.page_views) both replicate into Master C]
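One way to picture the aggregation is a timestamp-ordered merge of the upstream op streams into one target collection. The sketch below is an assumption about how this could be done, not a description of the Wordnik tools: keying target documents by (source, _id) to avoid collisions is a design choice made for this example.

```python
import heapq

def merge_streams(streams):
    """Merge per-master op lists (each sorted by ts) into one ts-ordered list."""
    tagged = [
        [(entry["ts"], name, entry) for entry in ops]
        for name, ops in sorted(streams.items())
    ]
    # The key avoids comparing the dict payloads when timestamps tie
    return list(heapq.merge(*tagged, key=lambda item: (item[0], item[1])))

def apply_inserts(merged, target):
    """Apply insert ops to the target, tagging each document with its source."""
    for _ts, source, entry in merged:
        if entry["op"] == "i":
            doc = dict(entry["o"], source=source)
            target[(source, doc["_id"])] = doc
    return target

streams = {
    "A": [
        {"ts": 1, "op": "i", "ns": "db.page_views", "o": {"_id": 1, "page": "/a"}},
        {"ts": 3, "op": "i", "ns": "db.page_views", "o": {"_id": 2, "page": "/b"}},
    ],
    "B": [
        {"ts": 2, "op": "i", "ns": "db.page_views", "o": {"_id": 1, "page": "/c"}},
        {"ts": 3, "op": "i", "ns": "db.page_views", "o": {"_id": 2, "page": "/d"}},
    ],
}
merged = merge_streams(streams)
page_views = apply_inserts(merged, {})
```

Because both upstream masters generate their own `_id` values, some collision strategy is needed on the target; tagging with the source is the simplest one.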
UNBLOCK MAPREDUCE
Map Reduce can lock up your server
Replicate source data to another mongod
Replicate results back to master
[Diagram: Master replicates db.source_data to the MR Server; the MR Server replicates db.summary_data back to the Master]
MESH MODE
Write to Multiple Masters
Filter by “Server Identifier”
Master 1 <-> Master 2
  db.documents, filter: documents.src != 1
  db.documents, filter: documents.src != 2

> db.documents.find().limit(2)
{"_id":99887,"src":2,"title":"favorite.png","fsid":33774}
{"_id":128773,"src":1,"title":"select.png","fsid":837743}
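The server-identifier filter is what stops writes from echoing back and forth between masters. Here is a small sketch using the two documents shown above. It handles insert ops only, and the `ops_to_apply` helper and the `test.documents` namespace are assumptions for illustration.

```python
MASTER_1, MASTER_2 = 1, 2

def ops_to_apply(peer_oplog, my_id):
    """Keep only inserts that did not originate on this server (src != my_id)."""
    return [
        entry for entry in peer_oplog
        if entry["op"] == "i" and entry["o"].get("src") != my_id
    ]

# Master 2's oplog holds its own write (src 2) and one it replicated from Master 1 (src 1)
master_2_oplog = [
    {"ts": 1, "op": "i", "ns": "test.documents",
     "o": {"_id": 99887, "src": 2, "title": "favorite.png", "fsid": 33774}},
    {"ts": 2, "op": "i", "ns": "test.documents",
     "o": {"_id": 128773, "src": 1, "title": "select.png", "fsid": 837743}},
]

# Master 1 replays Master 2's oplog but skips anything it wrote itself
for_master_1 = ops_to_apply(master_2_oplog, MASTER_1)
```

Without the filter, the document with `src` 1 would be sent back to Master 1, which already has it, and the two servers would keep re-replicating each other's copies.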
WHAT’S NEXT
Multi-Master in Wordnik Production
Multiple Datacenter Presence
More data => more challenges
TRY IT OUT
http://blog.wordnik.com/mongoutils
Questions?