MongoDB at Yle

MongoDB @ Yle.fi
An Evening with MongoDB Helsinki, June 11th

Kalle Ylä-Anttila | Jussi Pöri

Agenda

• About Yle.fi
• Technology stack
• MongoDB
  • Use cases
  • Why
  • How
  • Lessons learned

Yle.fi

• Some stats
  • ~3.5 million unique browsers per week
  • yle.fi/uutiset 2.4, Areena 1.6
  • Lots of traffic during big events
    • sports, elections, etc.
• ~40 developers (mostly external)

Yle.fi

• Main focus at the moment: Yle API
• End user device fragmentation is a challenge
• Provides “pure” data without presentation info
• Enables flexibility and rapid changes
• Hides the complexity of our legacy backend services (mainly broadcast services)
• All new customer services are powered by Yle API

Yle.fi

• Yle API
• Consists of several small APIs
• Technology stacks vary between APIs
• All APIs are used via public REST endpoints
  • No direct linking, no direct reads of another API's data store, no back doors whatsoever
  • = “The Jeff Bezos way” :)
• We eat our own dog food

Technology stack

Yle API architecture

Architecture of a single API

Why MongoDB

• Ease of use
• High availability
• Automatic failover
• Horizontal scaling
• Widely used
• Open source

How

• MongoDB 2.4.8
• 8 replica set clusters
  • 3 + 1 nodes per replica set
  • Overall 32 production nodes
• Configuration with Puppet
• MMS for monitoring
• New Relic for server monitoring and alerts
• Casbah (2.6.2) and ReactiveMongo (0.10.2)
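
As a rough illustration of the "3 + 1" layout, the sketch below builds a replica set configuration document of the shape MongoDB accepts, with three regular members and one hidden member. It assumes the "+ 1" node is the hidden backup node mentioned later in the lessons learned; the set name and host names are hypothetical, not Yle's actual configuration.

```python
def replica_set_config(name, hosts, hidden_host):
    """Build a replica set config document with one extra hidden member."""
    members = [{"_id": i, "host": host} for i, host in enumerate(hosts)]
    members.append({
        "_id": len(hosts),
        "host": hidden_host,
        "priority": 0,   # a hidden member can never become primary
        "hidden": True,  # invisible to client reads, still replicates data
    })
    return {"_id": name, "members": members}


# Hypothetical names, for illustration only.
config = replica_set_config(
    "rs-example",
    ["db1.example:27017", "db2.example:27017", "db3.example:27017"],
    "backup.example:27017",
)

nodes_per_set = len(config["members"])
total_nodes = 8 * nodes_per_set  # 8 replica sets, as on the slide
print(nodes_per_set, total_nodes)  # 4 32
```

The hidden member keeps its default vote, which matters for the election arithmetic in the lessons learned.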

Case - uutisvahti

• Yle news mobile app
• Users can give weights to the topics that interest them

Case - uutisvahti

• Over 100 000 subscribers
• 2.7M topic preferences
• 12.6M articles read
• 25M push notifications sent
• Lead time from published news to app notification only ~2 s
• 50 - 100 ms average API response time
• Backend processing for news ratings

Case - uutisvahti

Stats
• 15 449 826 objects
• Avg. object size 1462 bytes
• Data size 21 GB
• Index size 14 GB
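
A quick sanity check on these figures: data size divided by object count should land close to the average object size. The sketch below assumes "GB" means 2^30 bytes, which the slide does not state, and that the sizes are rounded.

```python
# Figures from the slide; the GB-to-bytes conversion is an assumption.
GIB = 1024 ** 3

object_count = 15_449_826
data_size_bytes = 21 * GIB
index_size_bytes = 14 * GIB

# Average bytes of data and of index per stored object.
avg_object_bytes = data_size_bytes / object_count
index_bytes_per_object = index_size_bytes / object_count

print(f"data per object:  ~{avg_object_bytes:.0f} bytes")
print(f"index per object: ~{index_bytes_per_object:.0f} bytes")
```

The result is on the order of 1.4 to 1.5 thousand bytes of data per object, so the average object size only adds up when read in bytes. Indexes add roughly two thirds of that again per object.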

Case - uutisvahti

Lessons learned
• Network storage IOPS
• Use separate databases for heavy writes
• A hidden (backup) replica set node still takes part in voting. Do not shut down 2 of 4 nodes.
• Watch index sizes so they do not fill your memory
• Fewer getMores
• Reindexing: be careful when updating indexes
• Use secondary reads where possible
• A slow network will affect replication
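
The voting point comes down to simple majority arithmetic. The sketch below assumes all four members of a set, including the hidden one, hold one vote each (the MongoDB default); the function names are illustrative.

```python
def majority(voting_members):
    """Votes needed to elect (or remain) a primary: a strict majority."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, members_up):
    """True if the members still reachable form a strict majority."""
    return members_up >= majority(voting_members)


print(majority(4))              # 3
print(can_elect_primary(4, 3))  # True: one node down is fine
print(can_elect_primary(4, 2))  # False: two nodes down, no primary
```

With four voting members, two nodes down leaves only two votes out of the three required, so the set cannot keep a primary and stops accepting writes.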

Case - metrics

• Provide metrics for applications about publications
  • Social media metrics, likes, tweets
  • View/play counts
• Store historical data
• Background processing to fetch data from different sources

Case - metrics

• some-data
  • 10 collections
  • 3.5 M objects
  • Avg object size 164 bytes
  • Index size 650 MB
  • Data size 557 MB
• somedata
  • 5 collections
  • 176 M objects
  • Avg object size 167 bytes
  • Index size 32 GB
  • Data size 27.6 GB
• somedata-comscore
  • 4 collections
  • 57 M objects
  • Avg object size 167 bytes
  • Index size 10.3 GB
  • Data size 8.2 GB
• somedata-facebook
  • 4 collections
  • 3.5 M objects
  • Avg object size 164 bytes
  • Index size 154 MB
  • Data size 131 MB
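
One thing these figures show is that the indexes are larger than the data in every database listed. The sketch below computes the index-to-data ratio from the slide's numbers; each pair is kept in its own unit (MB or GB), which does not affect the ratio.

```python
# (data size, index size) per database, both in the same unit per row.
databases = {
    "some-data": (557, 650),         # MB
    "somedata": (27.6, 32),          # GB
    "somedata-comscore": (8.2, 10.3),  # GB
    "somedata-facebook": (131, 154),   # MB
}

# Ratio above 1.0 means the indexes take more space than the documents.
ratios = {name: index / data for name, (data, index) in databases.items()}

for name, ratio in ratios.items():
    print(f"{name}: index/data = {ratio:.2f}")
```

With small documents and several indexes per collection, index size rather than data size is what has to fit in memory.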

Case - metrics

Lessons learned
• No collection/DB-level isolation: be careful when updating big batches of data. Clients will see an incomplete view.
• When querying medium amounts of data, be careful with built-in functions: mapReduce, distinct and aggregate have fairly small built-in memory limits, so your query may not complete.
• Sharding a cluster will be expensive (lots of nodes).
• When playing around, use a graphical query browser (e.g. Robomongo); sometimes it's easier to write queries when you include underscore.js.
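
One way to sidestep the server-side memory limits is to stream documents to the client and aggregate there, much as underscore.js helpers would in a query browser. This is a minimal sketch of that idea, not the approach used at Yle: it works on any iterable of dict-like documents (such as a driver cursor), and the field names are hypothetical.

```python
from collections import Counter


def distinct_values(documents, field):
    """Client-side 'distinct': stream documents, keep only the unique values."""
    seen = set()
    for doc in documents:
        if field in doc:
            seen.add(doc[field])
    return seen


def count_by(documents, field):
    """Client-side group-and-count over a stream of documents."""
    counts = Counter()
    for doc in documents:
        if field in doc:
            counts[doc[field]] += 1
    return counts


# Hypothetical sample documents standing in for a cursor.
sample = [
    {"source": "facebook", "likes": 3},
    {"source": "twitter", "tweets": 5},
    {"source": "facebook", "likes": 7},
    {"views": 10},  # no "source" field, so it is skipped
]

print(sorted(distinct_values(sample, "source")))  # ['facebook', 'twitter']
print(count_by(sample, "source")["facebook"])     # 2
```

Only the running set or counter is held in memory, at the cost of transferring every document over the network.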

Future

• Upgrade to 2.6.1
• SSD disks
• Architecture with public cloud
• Improve backups