MongoDB @ Yle.fi
An Evening with MongoDB Helsinki, June 11th
Kalle Ylä-Anttila | Jussi Pöri
Agenda
• About Yle.fi
• Technology stack
• MongoDB
  • Use cases
  • Why
  • How
  • Lessons learned
Yle.fi
• Some stats
  • ~3.5 million unique browsers per week
  • yle.fi/uutiset 2.4 million, Areena 1.6 million
  • Lots of traffic during big events
    • sports, elections, etc.
• ~40 developers (mostly external)
Yle.fi
• Main focus at the moment: Yle API
• End user device fragmentation is a challenge
• Provides “pure” data without presentation info
• Enables flexibility and rapid changes
• Hides the complexity of our legacy backend services (mainly broadcast services)
• All new customer services are powered by Yle API
Yle.fi
• Yle API
  • Consists of several small APIs
  • Technology stacks vary between APIs
  • All APIs are used via public REST endpoints
  • No direct linking, no direct reads of another API's data store, no back doors whatsoever
  • = “The Jeff Bezos way” :)
  • We eat our own dog food
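The "REST only, no back doors" rule can be sketched as follows. This is a minimal illustration, not Yle's actual code: the base URL, the `fetch_article` helper and the stub transport are all hypothetical names invented for the example.

```python
import json
from typing import Callable

# Hypothetical base URL; the real Yle API endpoints are not given in the talk.
ARTICLES_API = "https://articles.example.invalid/v1"


def fetch_article(article_id: str, http_get: Callable[[str], str]) -> dict:
    """Read an article through the owning API's public REST endpoint.

    `http_get` takes a URL and returns the response body as text, so any
    HTTP client (or a test stub) can be plugged in. There is deliberately
    no database handle here: the consumer never touches the other API's
    data store.
    """
    url = f"{ARTICLES_API}/articles/{article_id}"
    return json.loads(http_get(url))


def stub_get(url: str) -> str:
    # Stand-in for a real HTTP call, used only to make the sketch runnable.
    return json.dumps({"url": url, "title": "example"})


article = fetch_article("123", stub_get)
```

Because the only dependency is the HTTP contract, the owning API can change its storage without breaking consumers.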
Technology stack
Yle API architecture
Architecture of a single API
Why MongoDB
• Ease of use
• High availability
• Automatic failover
• Horizontal scaling
• Widely used
• Open source
How
• MongoDB 2.4.8
• 8 replica set clusters
  • 3 + 1 nodes per replica set
  • Overall 32 production nodes
• Configuration with Puppet
• MMS for monitoring
• New Relic for server monitoring and alerts
• Casbah (2.6.2) and ReactiveMongo (0.10.2)
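The topology above could look roughly like this as replica set configuration documents. This is a sketch under assumptions: the host names are made up, and the "+ 1" is read as the hidden backup node mentioned in the lessons learned. The `priority` and `hidden` member fields are standard replica set configuration fields.

```python
REPLICA_SETS = 8   # from the slide
DATA_NODES = 3     # from the slide; the "+ 1" is assumed to be a hidden backup node


def replica_set_config(name: str) -> dict:
    """Build a replica set configuration document for one 3 + 1 set."""
    members = [
        {"_id": i, "host": f"{name}-node{i}.example.invalid:27017"}  # hypothetical hosts
        for i in range(DATA_NODES)
    ]
    # A hidden member must have priority 0: it never becomes primary and is
    # invisible to clients, but it still votes in elections by default.
    members.append({
        "_id": DATA_NODES,
        "host": f"{name}-backup.example.invalid:27017",  # hypothetical host
        "priority": 0,
        "hidden": True,
    })
    return {"_id": name, "members": members}


configs = [replica_set_config(f"rs{n}") for n in range(REPLICA_SETS)]
total_nodes = sum(len(c["members"]) for c in configs)
print(total_nodes)  # 32
```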
Case - uutisvahti
• Yle news mobile app
• User can give weights for interesting topics
Case - uutisvahti
• Over 100 000 subscribers
• 2.7M topic preferences
• 12.6M articles read
• 25M push notifications sent
• Lead time from published news to app notification only ~2 s
• 50-100 ms average API response time
• Backend processing for news ratings
Case - uutisvahti
Stats
• 15 449 826 objects
• Avg. object size 1462 bytes
• Data size 21 GB
• Index size 14 GB
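As a quick sanity check, the figures above are mutually consistent when the average object size is taken in bytes (the unit MongoDB's `db.stats()` reports for `avgObjSize`). The variable names below are only for illustration.

```python
objects = 15_449_826
avg_object_bytes = 1462      # average object size, read as bytes
GIB = 1024 ** 3

# Object count times average size should land close to the reported data size.
data_gib = objects * avg_object_bytes / GIB

# Indexes are large relative to the data: 14 GB of index for 21 GB of data.
index_to_data = 14 / 21

print(f"{data_gib:.1f}")       # 21.0
print(f"{index_to_data:.2f}")  # 0.67
```

An index footprint of about two thirds of the data size is why the index-size lesson matters for this database.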
Case - uutisvahti
Lessons learned
• Watch the IOPS of network storage
• Use separate databases for heavy writes
• The hidden (backup) replication node does affect voting. Do not shut down 2 of 4 nodes.
• Watch index sizes so they do not fill your memory
• Fewer getMores
• Be careful with reindexing when updating indexes
• Use secondary reads where possible
• A slow network will affect replication
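The voting lesson comes down to majority arithmetic: a replica set can elect a primary only while a strict majority of its voting members is reachable. A small sketch, assuming all four nodes (including the hidden one) hold one vote each; the function names are illustrative.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, members_up: int) -> bool:
    """True if the reachable members still form a majority."""
    return members_up >= majority(voting_members)


print(majority(4))              # 3
print(can_elect_primary(4, 3))  # True: one node down is fine
print(can_elect_primary(4, 2))  # False: two of four down, no primary
```

Note that a four-member set tolerates only one failure, the same as a three-member set, so the hidden node adds a backup copy but no extra fault tolerance.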
Case - metrics
• Provide metrics for applications about publications
  • Social media metrics: likes, tweets
  • View/play counts
• Store historical data
• Background processing to fetch data from different sources
Case - metrics
• some-data
  • 10 collections
  • 3.5 M objects
  • Avg object size 164 bytes
  • Index size 650 MB
  • Data size 557 MB
• somedata
  • 5 collections
  • 176 M objects
  • Avg object size 167 bytes
  • Index size 32 GB
  • Data size 27.6 GB
• somedata-comscore
  • 4 collections
  • 57 M objects
  • Avg object size 167 bytes
  • Index size 10.3 GB
  • Data size 8.2 GB
• somedata-facebook
  • 4 collections
  • 3.5 M objects
  • Avg object size 164 bytes
  • Index size 154 MB
  • Data size 131 MB
Case - metrics
Lessons learned
• No collection/DB level isolation: be careful when updating big batches of data. Clients will see an incomplete view.
• When querying medium amounts of data, be careful with built-in functions: mapReduce, distinct and aggregate have pretty small built-in memory limits, so your query may not complete
• Sharding a cluster will be expensive (lots of nodes)
• When playing around, use a graphical query browser (e.g. Robomongo); sometimes it's easier to write queries when you include underscore.js
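One common way to avoid exposing a half-written batch, which is not described in the talk, is to tag every document with a batch version and publish the batch by flipping a single pointer once it is complete (single-document writes are atomic in MongoDB). The sketch below simulates the idea with plain Python structures standing in for collections; `VersionedStore` and its methods are hypothetical names.

```python
class VersionedStore:
    """In-memory stand-in for a data collection plus one pointer document."""

    def __init__(self) -> None:
        self.rows = []                 # stands in for the data collection
        self.current = {"version": 0}  # stands in for a single pointer document

    def write_batch(self, version: int, docs: list) -> None:
        # Writers may take as long as they like; readers ignore this version
        # until it is published.
        for doc in docs:
            self.rows.append({"version": version, **doc})

    def publish(self, version: int) -> None:
        # One small write makes the whole batch visible at once.
        self.current["version"] = version

    def read(self) -> list:
        live = self.current["version"]
        return [
            {k: v for k, v in row.items() if k != "version"}
            for row in self.rows
            if row["version"] == live
        ]


store = VersionedStore()
store.write_batch(1, [{"id": "a", "likes": 1}, {"id": "b", "likes": 2}])
store.publish(1)

store.write_batch(2, [{"id": "a", "likes": 5}])  # batch 2 is only half written
during_update = store.read()                     # still the complete batch 1

store.write_batch(2, [{"id": "b", "likes": 7}])
store.publish(2)
after_update = store.read()                      # now the complete batch 2
```

The trade-off is extra storage for the old version until it is cleaned up, and every read has to filter on the live version.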
Future
• Upgrade to 2.6.1
• SSD disks
• Architecture with public cloud
• Improve backups