Breaking the Oracle tie; High Performance OLTP and analytics using MongoDB
Alexandros Giamas, Senior Software Engineer
Persado: Proven Value - Worldwide
Skype
$500 Million Incremental Revenue
30+ Premium Brands
20+ Worldwide Languages
40+ Countries
500M+ Engaged Consumers
100% Average Conversion Lift
Can you afford to leave half the opportunity on the table?
You won't believe it
Pick an Online Number!
Why you'll love your Online Number:
1. Your friends without VoIP can call you
2. You answer on VoIP
3. You also have voicemail included
I like that!
2.07%
You won't believe it
Pick an Online Number!
Why get an Online Number:
1. Your friends without VoIP can call you
2. You answer on VoIP
3. You also have voicemail included
I like that!
1.42%
You won't believe it
They dial, you answer on VoIP!
Why you'll love your Online Number:
1. Family & friends without VoIP can call you
2. You answer on VoIP
3. And you can use it from anywhere in the world
I like that!
1.11%
…another 16 Million + combinations
The Marketing Communication Suite
We generate the marketing messages that work best.
For any customer, any product, at any time.
Persado History
Oracle shop
• Exponentially growing dataset
• Data value/KB?
Persado History
Not anymore...
Transactional Data and Analytics
Transaction (Re)-defined
Social, Mobile, Email, Web, Display, Search
Which one stands out?
Conversational and Transactional Properties
Web-based channels
Mobile Text Messaging
Flexi-structured data
One User across campaigns and mediums
{
  "_id" : ObjectId("511e3cbea9f1fd01fbd51c67"),
  "domain" : DBRef("Domain", NumberLong(3)),
  "locale" : "en",
  "msisdn" : "59210000000",
  "email" : "[email protected]",
  "mobileInfo" : { "State" : "CA" },
  "emailInfo" : { "referral" : "www.google.com" },
  "expclk" : { "h" : 0.05, "d" : 0.02 }
}
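A minimal sketch of how one user record could pick up channel-specific sub-documents as campaigns reach the user over different mediums. The helper name mergeChannelInfo and the "<channel>Info" key convention are assumptions for illustration, modelled on the mobileInfo and emailInfo fields of the sample document; the talk does not show the actual update logic.

```javascript
// Hypothetical helper: attach or extend a channel-specific sub-document
// ("mobileInfo", "emailInfo", ...) on a single user record.
// It returns a new object and leaves the input untouched.
function mergeChannelInfo(user, channel, info) {
  const key = channel + "Info";
  return { ...user, [key]: { ...(user[key] || {}), ...info } };
}

// One user, first seen on mobile, later reached by email.
const baseUser = { locale: "en", msisdn: "59210000000" };
const afterMobile = mergeChannelInfo(baseUser, "mobile", { State: "CA" });
const afterEmail = mergeChannelInfo(afterMobile, "email", {
  referral: "www.google.com",
});
```

Because each channel only adds the fields it knows about, documents for users reached over one medium stay small, while multi-channel users grow extra sub-documents.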
Overall Architecture - Data flow
Sizing transactional data
☛ User Terminated data
☛ User Originated data
☛ Metadata (state for User per campaign and globally)
☛ Must hold data in memory, or at least indexes
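The sizing rule above (hold the data in memory, or at least the indexes) can be turned into a back-of-the-envelope estimate. The sketch below is only arithmetic; every figure in it is a hypothetical placeholder, not a number from the talk, and the function name is made up for illustration.

```javascript
// Rough working-set estimate: all indexes plus the "hot" share of documents.
// Every input is an assumption to be replaced with measured values.
function estimateWorkingSetBytes(p) {
  const indexBytes = p.docCount * p.avgIndexEntryBytes * p.indexCount;
  const hotDataBytes = p.docCount * p.avgDocBytes * p.hotFraction;
  return { indexBytes, hotDataBytes, totalBytes: indexBytes + hotDataBytes };
}

const sizing = estimateWorkingSetBytes({
  docCount: 100e6,        // hypothetical: 100 million documents
  avgDocBytes: 500,       // hypothetical average document size
  avgIndexEntryBytes: 40, // hypothetical average index entry size
  indexCount: 3,          // hypothetical number of indexes
  hotFraction: 0.25,      // hypothetical share of frequently read documents
});

// Compare against the RAM available to mongod (again a placeholder).
const ramBytes = 32e9;
const indexesFitInRam = sizing.indexBytes <= ramBytes;
```

With these placeholder inputs the indexes alone come to 12 GB and the hot data to 12.5 GB, so both would fit in 32 GB of RAM; real sizing should start from measured collection and index statistics.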
ETL for OLAP
Offline / Online processing
• Going online is mostly simpler
• Offline must take into account data irregularities (data validation policy driven by business needs)
ETL for OLAP
☛ Custom data transformation
☛ Custom “continueOnError” implementation
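One way a custom "continue on error" load could look, as a sketch: each record is transformed and inserted on its own, and failures are collected instead of aborting the batch. The function names, the validation rule, and the in-memory target are all hypothetical; the talk does not show the actual implementation.

```javascript
// Transform and insert records one by one; keep going when a record fails
// and report the failures at the end instead of stopping the whole load.
function loadContinuingOnError(records, transform, insert) {
  const failures = [];
  let loaded = 0;
  records.forEach((record, index) => {
    try {
      insert(transform(record));
      loaded += 1;
    } catch (err) {
      failures.push({ index, record, reason: err.message });
    }
  });
  return { loaded, failures };
}

// Hypothetical transformation with a business-driven validation rule.
function toUserDoc(raw) {
  if (!/^\d+$/.test(raw.msisdn || "")) throw new Error("invalid msisdn");
  return { msisdn: raw.msisdn, locale: raw.locale || "en" };
}

const target = []; // stands in for the destination collection
const loadReport = loadContinuingOnError(
  [{ msisdn: "59210000000" }, { msisdn: "not-a-number" }, { msisdn: "59210000001", locale: "es" }],
  toUserDoc,
  (doc) => target.push(doc)
);
```

Keeping the failed records and reasons makes it possible to apply the validation policy later, for example by fixing and replaying only the rejected rows.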
Analytics
First cut: custom server-side JavaScript using $where
Analytics
GWL: Global Write Lock
Analytics In the real world
Your own mini transactions
Break down Spring Batch steps into idempotent and non-idempotent ones
• For idempotent steps, just replay them
• For non-idempotent steps, replace the current state with the last known good state from before the latest Spring Batch step invocation (undo log) and retry the step
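The replay and undo-log idea above can be sketched without Spring Batch. In this illustration a step is a plain object with an idempotent flag and a run function; those names, the JSON-based snapshot, and the retry limit are assumptions, not the talk's code.

```javascript
// Run one step with retries. Idempotent steps are simply replayed.
// Non-idempotent steps are retried from the last known good state,
// which is snapshotted (the "undo log") before the step first runs.
function runStepWithUndo(state, step, maxAttempts = 3) {
  const undoLog = JSON.parse(JSON.stringify(state)); // last known good state
  let working = state;
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      step.run(working);
      return working;
    } catch (err) {
      lastError = err;
      if (!step.idempotent) {
        working = JSON.parse(JSON.stringify(undoLog)); // roll back, then retry
      }
    }
  }
  throw lastError;
}

// A non-idempotent step that fails once after it has already changed state.
let shouldFail = true;
const incrementStep = {
  idempotent: false,
  run(s) {
    s.count += 1;
    if (shouldFail) {
      shouldFail = false;
      throw new Error("transient failure");
    }
  },
};
const finalState = runStepWithUndo({ count: 0 }, incrementStep);
```

Without the rollback, the retried increment would be applied twice; with it, finalState.count ends up as 1.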
Your own mini transactions: Issues
• 16MB document size limit...
• Slow to replay
• Hard to test using Selenium
Analytics In the real world
Map Reduce Implementation
Analytics In the real world
Caching layers
✓ Caching in collections
Analytics In the real world
Caching layers
✓ Caching in ehcache
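The in-process caching layer can be illustrated with a small time-to-live cache. This is not ehcache and not the talk's code; it is a plain JavaScript stand-in, with a hypothetical createTtlCache helper and an injectable clock so the expiry behaviour is easy to follow.

```javascript
// Minimal TTL cache: return a stored result while it is fresh,
// otherwise recompute it (e.g. rerun an expensive analytics query).
function createTtlCache(ttlMs, now = () => Date.now()) {
  const entries = new Map();
  return {
    get(key, compute) {
      const hit = entries.get(key);
      if (hit && now() - hit.storedAt < ttlMs) return hit.value;
      const value = compute();
      entries.set(key, { value, storedAt: now() });
      return value;
    },
  };
}

// Usage with a fake clock and a counter standing in for the real query.
let fakeTime = 0;
let queryRuns = 0;
const reportCache = createTtlCache(1000, () => fakeTime);
const runReport = () => { queryRuns += 1; return { total: 42 }; };

reportCache.get("daily", runReport); // computed
reportCache.get("daily", runReport); // served from cache
fakeTime = 1500;
reportCache.get("daily", runReport); // expired, computed again
```

The same pattern applies to caching in collections: the store changes from a Map to a collection, while the freshness check stays the same.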
Analytics using the Aggregation Framework
{ $project: {
    "rdd": {
      $isoDate: {
        year: { $year: "$_id.receivedDateHour" },
        month: { $month: "$_id.receivedDateHour" },
        dayOfMonth: { $dayOfMonth: "$_id.receivedDateHour" },
        hour: { $hour: "$_id.receivedDateHour" }
      }
    },
    "value.diffDaysSum.0": 1,
    "value.diffDaysSum.1": 1,
    "value.diffDaysSum.2": 1
} },
{ $project: {
    rdd: 1,
    diffDaysSum: { $add: [ "$value.diffDaysSum.0", "$value.diffDaysSum.1", "$value.diffDaysSum.2" ] }
} },
{ $group: {
    _id: "$rdd",
    totalSumPerDay: { $sum: "$diffDaysSum" }
} }
Analytics using the Aggregation Framework
Double project phase, followed by grouping results
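To make the double project plus group flow concrete, here is the same shape of computation on an in-memory array in plain JavaScript. The sample rows are hypothetical, diffDaysSum is assumed to be an array with at least three numeric entries, and the date is assumed to be truncated to the hour in UTC; none of that is stated on the slide.

```javascript
// Hypothetical input rows, shaped like the documents the pipeline reads.
const inputRows = [
  { _id: { receivedDateHour: new Date(Date.UTC(2013, 1, 15, 10, 25)) }, value: { diffDaysSum: [1, 2, 3] } },
  { _id: { receivedDateHour: new Date(Date.UTC(2013, 1, 15, 10, 40)) }, value: { diffDaysSum: [0, 4, 0] } },
  { _id: { receivedDateHour: new Date(Date.UTC(2013, 1, 15, 11, 5)) }, value: { diffDaysSum: [2, 3, 4] } },
];

// Rebuild a date from year, month, day and hour only (minutes are dropped).
function truncateToHourUtc(d) {
  return new Date(
    Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), d.getUTCHours())
  ).toISOString();
}

// First project: derive the grouping key and keep the three counters.
const firstProject = inputRows.map((r) => ({
  rdd: truncateToHourUtc(r._id.receivedDateHour),
  parts: r.value.diffDaysSum.slice(0, 3),
}));

// Second project: add the three counters into one number per row.
const secondProject = firstProject.map((p) => ({
  rdd: p.rdd,
  diffDaysSum: p.parts[0] + p.parts[1] + p.parts[2],
}));

// Group: sum the per-row totals for each key.
const totalsByKey = {};
for (const row of secondProject) {
  totalsByKey[row.rdd] = (totalsByKey[row.rdd] || 0) + row.diffDaysSum;
}
```

The two rows that fall in the 10:00 hour are combined under one key, and the 11:00 row gets its own.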
Analytics using the Aggregation Framework
Pros:
✓ More flexible than it sounds
✓ Rapid development
✓ Easy debugging
Cons:
✘ No custom JS supported
✘ Memory limitation
✘ API still evolving
Fine grained write semantics and asynchronous magic
Fine grained write semantics
• WriteConcern.SAFE for most writes
• WriteConcern.REPLICAS_SAFE for writes that are costly to recompute in case of failure
Reactive Mongo
• Asynchronous and non-blocking Scala driver for MongoDB
• Async writes with WriteConcern.SAFE and callback retry policy in case of error
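The "async write with callback retry policy" idea can be sketched independently of any driver. The code below is plain JavaScript, not ReactiveMongo; writeWithRetry, makeFlakyWrite, the attempt limit, and the delay are all hypothetical, and the write function is a stand-in that reports its outcome through a Node-style callback.

```javascript
// Retry an asynchronous, acknowledged write until it succeeds
// or the attempt limit is reached. `write` takes a (err, result) callback.
function writeWithRetry(write, maxAttempts = 3, delayMs = 0) {
  return new Promise((resolve, reject) => {
    const attempt = (n) => {
      write((err, result) => {
        if (!err) return resolve({ result, attempts: n });
        if (n >= maxAttempts) return reject(err);
        setTimeout(() => attempt(n + 1), delayMs); // schedule the retry
      });
    };
    attempt(1);
  });
}

// Stand-in for a driver call: fails `failures` times, then acknowledges.
function makeFlakyWrite(failures) {
  let calls = 0;
  return (callback) => {
    calls += 1;
    const failed = calls <= failures;
    setTimeout(() => callback(failed ? new Error("write failed") : null, failed ? undefined : { ok: 1 }), 0);
  };
}

// Usage: succeeds on the third attempt without blocking the caller.
const pendingWrite = writeWithRetry(makeFlakyWrite(2));
```

The caller keeps working while the write and its retries run; only writes that are costly to recompute would need a stricter concern than the default acknowledged write.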
Lessons Learned
Use:
• Replica sets
• Journaling
• Aggregation Framework
• MMS
Don't use:
• Development versions across the team
• Unbounded datasets that can't fit in memory
• MapReduce if you don't need to
MongoDB on EC2
4 nodes with 6 mongod processes
MongoDB on EC2 Using LVMs
http://goo.gl/8NbV7
For high performance, use LVMs with RAID 0 or 10
Have your guerrilla team ready:
MongoDB on EC2 Lesson Learned
Unix level tweaks:
• Raise ulimit
• Raise TCP timeout
• Mount with noatime, nodiratime
• Use XFS or ext4
• Use LVM for snapshotting
Use journaling