Migrating from MySQL to MongoDB at Wordnik
DESCRIPTION
Slides from Tony Tam's presentation at MongoSF on 4/30/2010
TRANSCRIPT
MongoSF 4/30/2010
From MySQL to MongoDB
Migrating a Live Application
Tony Tam
WHAT IS WORDNIK
Project to track language like GPS for English
Dictionary is a road block to the language
Roughly 200 new words created daily
Language is not static
Capture information about all words
Meaning is often undefined in traditional sense
Machines can determine meaning through analysis
Needs LOTS of data
WHY SHOULD YOU CARE
Every Developer can use a Robust Language API!
Wordnik migrated to MongoDB
> 5 billion documents
> 1.2 TB
Zero application downtime
Learn from our Experience
WORDNIK
Not just a website!
But we have one
Launched Wordnik entirely on MySQL
Hit road bumps with insert speed at ~4B rows on MyISAM tables
Tables locked for 10’s of seconds during inserts
But we need more data!
Created elaborate update schemes to work around it
Lost lots of sleep babysitting servers while researching a long-term solution
WORDNIK + MONGODB
What are our storage needs?
Database vs. Application Logic
No PK/FK constraints
No Stored Procedures
Consistency?
Lots of R&D
Tried nearly all NoSQL solutions
MIGRATING STORAGE ENGINES
Many parts to this effort:
Setup & Administration
Software Design
Optimization
Many types of data at Wordnik:
#1 Corpus
#2 Structured Hierarchical Data
#3 User Data
Migrated #1 & #2
SERVER INFRASTRUCTURE
Wordnik is heavily read-only
Master / Slave deployment
Looking at replica pairs
MongoDB loves system resources
Wordnik runs it on dedicated boxes so other apps don't get paged out to disk (and time out)
Memory + Disk = Happy Mongo
Uses many times the disk space of MySQL
Easy pill to swallow until…
SERVER INFRASTRUCTURE
Physical hardware: 2 x 4-core CPU, 32GB RAM, FC SAN
Had bad luck on VMs (you might not)
Disk speed => performance
SOFTWARE DESIGN
Two distinct use cases for MongoDB
Identical structure, different storage engine
Same underlying objects, same storage fidelity (largely key/value)
Hierarchical data structure
Same underlying objects, document-oriented storage
SOFTWARE DESIGN
Create BasicDBObjects from POJOs and use collection methods

BasicDBObject dbo = new BasicDBObject("sentence", s.getSentence())
    .append("rating", s.getRating())
    .append(...);
ID generation to manage unique _id values
Analogous to MySQL auto-increment behavior
Compatible with MySQL IDs (more later)

dbo.append("_id", getId());
collection.save(dbo);
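The slides don't show the generator behind getId(); below is a minimal sketch of one way to produce MySQL-compatible, auto-increment-style IDs (the class name and the AtomicLong seeding are assumptions, not Wordnik's actual implementation):

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: hand out sequential IDs that stay compatible with
// the values MySQL's AUTO_INCREMENT already assigned.
public class SequentialIdGenerator {
    private final AtomicLong counter;

    // Seed with the highest ID already present in the existing data.
    public SequentialIdGenerator(long maxExistingId) {
        this.counter = new AtomicLong(maxExistingId);
    }

    public long getId() {
        return counter.incrementAndGet();
    }
}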
Implemented all CRUD methods in DAOs
Swappable between MongoDB and MySQL at runtime
SOFTWARE DESIGN
Key-Value storage use case
Easy as implementing new DAOs

SentenceHandler h = new MongoDBSentenceHandler();

Save methods construct BasicDBObject and call save() on collection
Implement same interface
Same methods against DAO between MySQL and MongoDB versions
Data Abstraction 101
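As a rough illustration of that abstraction, here is a minimal sketch of what the swappable DAO pair might look like (the interface methods, constructor, and Sentence accessors are assumptions beyond the SentenceHandler name shown above; Sentence is the existing Wordnik POJO):

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;

// SentenceHandler.java — hypothetical storage-agnostic DAO interface.
public interface SentenceHandler {
    Sentence findById(long id);
    void save(Sentence s);
}

// MongoDBSentenceHandler.java — MongoDB-backed implementation; a MySQLSentenceHandler
// would implement the same interface against JDBC, so callers can swap them at runtime.
public class MongoDBSentenceHandler implements SentenceHandler {
    private final DBCollection collection;

    public MongoDBSentenceHandler(DB db) {
        this.collection = db.getCollection("sentences");
    }

    public Sentence findById(long id) {
        DBObject dbo = collection.findOne(new BasicDBObject("_id", id));
        return dbo == null ? null : toSentence(dbo);
    }

    public void save(Sentence s) {
        BasicDBObject dbo = new BasicDBObject("sentence", s.getSentence())
            .append("rating", s.getRating())
            .append("_id", s.getId());
        collection.save(dbo);
    }

    // Map the stored document back onto the POJO (details depend on the Sentence class).
    private Sentence toSentence(DBObject dbo) {
        Sentence s = new Sentence();
        s.setSentence((String) dbo.get("sentence"));
        s.setRating((Integer) dbo.get("rating"));
        return s;
    }
}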
SOFTWARE DESIGN
What about bulk inserts?
FAF (fire-and-forget) queued approach
Add objects to queue, return to caller
Every X seconds, process queue
All objects from same collection are appended to a single List<DBObject>
Call collection.insert(…) before hitting 2M characters
Reduces network overhead
Very fast inserts
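A minimal sketch of that kind of queued, fire-and-forget writer (the class name, flush interval, and the way the 2M-character limit is approximated by counting serialized JSON length are assumptions):

import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical fire-and-forget writer: callers enqueue and return immediately,
// a background task batches documents into a single insert call.
public class QueuedWriter {
    private final DBCollection collection;
    private final ConcurrentLinkedQueue<DBObject> queue = new ConcurrentLinkedQueue<DBObject>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public QueuedWriter(DBCollection collection, long flushIntervalSeconds) {
        this.collection = collection;
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() { flush(); }
        }, flushIntervalSeconds, flushIntervalSeconds, TimeUnit.SECONDS);
    }

    // Returns to the caller immediately (fire-and-forget).
    public void asyncWrite(DBObject dbo) {
        queue.add(dbo);
    }

    // Drain the queue into batches, flushing before a batch gets too large.
    private void flush() {
        List<DBObject> batch = new ArrayList<DBObject>();
        long chars = 0;
        DBObject dbo;
        while ((dbo = queue.poll()) != null) {
            batch.add(dbo);
            chars += dbo.toString().length();   // approximate size via JSON string length
            if (chars > 2000000) {              // stay under ~2M characters per insert
                collection.insert(batch);
                batch = new ArrayList<DBObject>();
                chars = 0;
            }
        }
        if (!batch.isEmpty()) {
            collection.insert(batch);
        }
    }
}

The trade-off of fire-and-forget is that queued writes can be lost if the process dies before a flush.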
SOFTWARE DESIGN
Hierarchical Data done more elegantly
Wordnik Dictionary Model
Java POJOs already had JAXB annotations
Part of public REST API
Used MySQL:
12+ tables
13 DAOs
2500 lines of code
50 requests/second uncached
Memcache needed to maintain reasonable speed
SOFTWARE DESIGN
TMGO
SOFTWARE DESIGN
MongoDB’s Document Storage let us…
Turn the Objects into JSON via Jackson Mapper (fasterxml.com)
Call save
Support all fetch types, enhanced filters
1000 requests / second
No explicit caching
No less scary code
SOFTWARE DESIGN
Saving a complex object:

String rawJSON = getMapper().writeValueAsString(veryComplexObject);
DBObject dbo = (DBObject) JSON.parse(rawJSON);
dbo.put("_id", getId());
collection.save(dbo);

Fetching a complex object:

DBObject fetched = cursor.next();
ComplexObject obj = getMapper().readValue(fetched.toString(), ComplexObject.class);
No joins, 20x faster
MIGRATING DATA
Migrating => existing data logic
Use logic to select DAOs appropriately
Read from old, write with new
Great system test for MongoDB

SentenceHandler mysqlSh = new MySQLSentenceHandler();
SentenceHandler mongoSh = new MongoDBSentenceHandler();
while (hasMoreData) {
    mongoSh.asyncWrite(mysqlSh.next());
    ...
}
MIGRATING DATA
Wordnik moved 5 billion rows from MySQL
Sustained 100,000 inserts/second
Migration tool was CPU bound: ID generation logic, among other things
Wordnik reads MongoDB fast
Read + create Java objects @ 250k/second (!)
GOING LIVE TO PRODUCTION
Choose your use case carefully if migrating incrementally
Scary no matter what
Test your perf monitoring system first!
Use your DAOs from migration
Turn on MongoDB on one server, monitor, tune (rollback, repeat)
Full switch over when comfortable
GOING LIVE TO PRODUCTION
Really?

SentenceHandler h = null;
if (useMongoDb) {
    h = new MongoDBSentenceHandler();
} else {
    h = new MySQLSentenceHandler();
}
return h.find(...);
OPTIMIZING PERFORMANCE
Home-grown connection pooling
Master only: ConnectionManager.getReadWriteConnection()
Slave only: ConnectionManager.getReadOnlyConnection()
Round-robin all servers, bias on slaves: ConnectionManager.getConnection()
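A minimal sketch of what such a home-grown ConnectionManager could look like (everything beyond the three method names on the slide, including the host setup, database name, and bias scheme, is an assumption):

import com.mongodb.DB;
import com.mongodb.Mongo;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: writes always hit the master, reads hit the slaves,
// and getConnection() round-robins everything with a bias toward slaves.
public class ConnectionManager {
    private static Mongo master;
    private static List<Mongo> slaves;
    private static final AtomicInteger next = new AtomicInteger();

    public static void init(Mongo masterConn, List<Mongo> slaveConns) {
        master = masterConn;
        slaves = slaveConns;
    }

    // Master only
    public static DB getReadWriteConnection() {
        return master.getDB("wordnik");
    }

    // Slaves only, round-robin
    public static DB getReadOnlyConnection() {
        int i = (next.getAndIncrement() & Integer.MAX_VALUE) % slaves.size();
        return slaves.get(i).getDB("wordnik");
    }

    // Round-robin all servers; one master slot among many slave slots biases reads toward slaves
    public static DB getConnection() {
        int i = (next.getAndIncrement() & Integer.MAX_VALUE) % (slaves.size() + 1);
        return i == slaves.size() ? getReadWriteConnection() : slaves.get(i).getDB("wordnik");
    }
}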
OPTIMIZING PERFORMANCE
Caching
Had complex logic to handle cache invalidation
Out-of-process caches are not free
MongoDB loves your RAM
Let it do your LRU cache (it will anyway)
Hardware
Do not skimp on your disk or RAM
Indexes
Schema-less design: even if an attribute has no value in any document, every document still has to be read to check it unless the attribute is indexed
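For example, indexing a frequently queried attribute keeps those lookups from scanning the whole collection. A minimal sketch using the ensureIndex call from the 2010-era Java driver (the collection and field names, and the existing db handle, are assumptions):

DBCollection sentences = db.getCollection("sentences");        // assumes an existing DB handle
sentences.ensureIndex(new BasicDBObject("rating", 1));          // index "rating" ascending

// This query can now be answered without touching every document:
sentences.find(new BasicDBObject("rating", 5));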
OPTIMIZING PERFORMANCE
Disk space
Schemaless => schema stored per document (row)
Choose your mappings wisely:
({veryLongAttributeName:true}) => more disk space than ({vlan:true})
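Because the hierarchical objects go through the Jackson mapper, one way to keep stored attribute names short without renaming the Java fields is a property annotation. A hypothetical sketch (Wordnik's actual mappings aren't shown in the slides; the modern com.fasterxml package is used here, while 2010-era Jackson lived under org.codehaus.jackson):

import com.fasterxml.jackson.annotation.JsonProperty;

// The Java field keeps its readable name; the stored document key stays short.
public class ExampleRecord {
    @JsonProperty("vlan")
    public boolean veryLongAttributeName;
}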
OPTIMIZING PERFORMANCE
A Typical Day at the Office for MongoDB
API call rate: 47.7 calls/sec
OTHER TIPS
Data Types
Use caution when changing them (a defensive read is sketched below)

DBObject obj = cur.next();
long id = (Long) obj.get("IWasAnIntOnce");

Attribute names
Don’t change them without migrating existing data!
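If a field started life as an int and was later written as a long, older documents hold an Integer while newer ones hold a Long, so the cast above can throw a ClassCastException. A small defensive read, reusing the slide's example field name:

// Number covers both Integer and Long values stored for the same field.
Object raw = obj.get("IWasAnIntOnce");
long id = ((Number) raw).longValue();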
WTFDMDG????
WHAT’S NEXT?
GridFS
Store audio files, currently on disk
Requires clustered file system for shared access
Capped Collections (rolling out this week)
UGC from MySQL => MongoDB
Beg/Bribe 10gen for some Features
Questions?