MongoDB, our Swiss Army Knife Database

MongoDB at fotopedia: Our Swiss Army Knife Database

Upload: mathieu-poumeyrol

Posted on 25-May-2015


DESCRIPTION

Experience feedback on 10 months of happy MongoDB usage at fotopedia. You may also check out: http://www.slideshare.net/octplane/mongodb-vs-mysql-a-devops-point-of-view

TRANSCRIPT

Page 1: Mongodb, our Swiss Army Knife Database

MongoDB at fotopedia

Our Swiss Army Knife Database

Page 2: Mongodb, our Swiss Army Knife Database

MongoDB at fotopedia

• Context

• Wikipedia data storage

• Metacache

Page 3: Mongodb, our Swiss Army Knife Database

Fotopedia

• Fotonauts, an American/French company

• Photo — Encyclopedia

• Heavily interconnected system: Flickr, Facebook, Wikipedia, Picasa, Twitter…

• MongoDB in production since last October

• main store lives in MySQL… for now

Page 4: Mongodb, our Swiss Army Knife Database

First contact

• Wikipedia imported data

Pages 5 to 8: (image-only slides)
Page 9: Mongodb, our Swiss Army Knife Database

Wikipedia queries

• wikilinks from one article

• links to one article

• geo coordinates

• redirect

• why not use the Wikipedia API?
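
A rough idea of what these lookups can look like from Ruby once the dumps are loaded; the collection and field names are made up for illustration, not the actual fotopedia schema:

```ruby
require 'mongo'

# Hypothetical layout: one collection per derived dataset, keyed by article title.
client = Mongo::Client.new(['127.0.0.1:27017'], database: 'wikipedia')

# wikilinks from one article
client[:wikilinks].find(_id: 'Paris').first       # => { "_id" => "Paris", "links" => [...] }

# links to one article
client[:backlinks].find(_id: 'Paris').first

# geo coordinates
client[:geo].find(_id: 'Paris').first             # => { "_id" => "Paris", "lat" => 48.85, "lng" => 2.35 }

# redirect resolution
client[:redirects].find(_id: '2nd_arrondissement_of_Paris').first
```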

Page 10: Mongodb, our Swiss Army Knife Database

Download: ~5.7 GB gzipped XML

Geo / Redirect / Backlink / Related

~12 GB of tabular data

Page 11: Mongodb, our Swiss Army Knife Database

Problem

Load ~12GB into a K/V store

Page 12: Mongodb, our Swiss Army Knife Database

CouchDB 0.9 attempt

• CouchDB had no dedicated import tool

• need to go through the HTTP / REST API
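
To give an idea of the overhead: with no import tool, every batch has to travel through HTTP. A sketch using CouchDB's standard _bulk_docs endpoint (database name and document shape invented):

```ruby
require 'net/http'
require 'json'

# One HTTP round trip per batch of documents.
uri   = URI('http://127.0.0.1:5984/wikipedia/_bulk_docs')
batch = { 'docs' => [{ '_id' => 'Paris', 'links' => ['Seine', 'France'] }] }

res = Net::HTTP.post(uri, batch.to_json, 'Content-Type' => 'application/json')
puts res.code   # millions of documents means a lot of these round trips
```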

Page 13: Mongodb, our Swiss Army Knife Database

“DATA LOADING!”

(obviously hijacked from xkcd.com)

Page 14: Mongodb, our Swiss Army Knife Database

Problem, rephrased

Load ~12GB into any K/V store

in hours, not days

Page 15: Mongodb, our Swiss Army Knife Database

Hadoop HBase?

• as we were already using Hadoop Map/Reduce for preparation

• bulk load was just emerging at that time, requiring us to code against HBase private APIs, generate the data in an ad-hoc binary format, ...

Page 17: Mongodb, our Swiss Army Knife Database

Problem, re-rephrased

Load ~12GB into any K/V store

in hours, not days

without wasting a week on development

and another week on setup

and several months on tuning

please ?

Page 18: Mongodb, our Swiss Army Knife Database

MongoDB attempt

• Transforming the tabular data into a JSON form (sketched after this list): about half an hour of code, 45 minutes of Hadoop parallel processing

• setup of the mongo server: 15 minutes

• mongoimport: 3 minutes to start it, 90 minutes to run

• plug the RoR app onto mongo: minutes

• prototype was done in a day
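
The transform step mentioned above is essentially this, sketched here with invented field names and file layout; the heavy lifting is then left to mongoimport:

```ruby
require 'json'

# Turn one tab-separated record per line into one JSON document per line,
# which is exactly the format mongoimport consumes.
ARGF.each_line do |line|
  title, links = line.chomp.split("\t", 2)
  puts({ _id: title, links: (links || '').split('|') }.to_json)
end

# then, roughly:
#   mongoimport --db wikipedia --collection wikilinks --file wikilinks.json
```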

Page 19: Mongodb, our Swiss Army Knife Database

Download: ~5.7 GB gzip

Geo / Redirect / Backlink / Related

~12 GB, 12M docs

Batch Synchronous

Ruby on Rails

Page 20: Mongodb, our Swiss Army Knife Database

Hot swap?

• Indexing was locking everything.

• Just run two instances of MongoDB.

• One instance is servicing the web app

• One instance is asleep or loading data

• A third instance knows the status of the other two.
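
A sketch of how the swap can be driven (ports, database and field names are made up): the third instance only holds a pointer saying which of the two data instances is currently live.

```ruby
require 'mongo'

# Tiny "status" instance: a single pointer document.
status  = Mongo::Client.new(['127.0.0.1:27019'], database: 'status')
pointer = status[:live].find(_id: 'wikipedia').first   # e.g. { "_id" => "wikipedia", "port" => 27017 }

# The web app talks to whichever instance the pointer designates.
live = Mongo::Client.new(["127.0.0.1:#{pointer['port']}"], database: 'wikipedia')

# Once the sleeping instance has finished importing and indexing, flip the pointer.
status[:live].update_one({ _id: 'wikipedia' }, '$set' => { 'port' => 27018 })
```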

Page 21: Mongodb, our Swiss Army Knife Database

We loved:

• JSON import format

• efficiency of mongoimport

• simple and flexible installation

• just one cumbersome dependency

• easy to start (we use runit)

• easy to have several instances on one box

Page 22: Mongodb, our Swiss Army Knife Database

Second contact

• itʼs just all about graphs, anyway.

• wikilinks

• people following people

• related community albums

• and soon, interlanguage links
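
All of these graphs can be stored the way the wikilinks are: one document per node carrying its edges. A sketch with invented collection and field names:

```ruby
require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'fotopedia')

# "people following people": one edge list per user, in each direction.
client[:following].insert_one(_id: 'john', follows: ['jane', 'bob'])
client[:followers].insert_one(_id: 'jane', followed_by: ['john'])

# who does john follow?
client[:following].find(_id: 'john').first['follows']   # => ["jane", "bob"]

# related community albums, same idea
client[:related].insert_one(_id: 'en/Paris', albums: ['en/Seine', 'en/Louvre'])
```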

Page 23: Mongodb, our Swiss Army Knife Database
Page 24: Mongodb, our Swiss Army Knife Database

all about graphs...

• ... and itʼs also all about cache.

• The application needs to “feel” faster, letʼs cache more.

• The application needs to “feel” right, so letʼs cache less.

• or — big sigh — invalidate.

Page 25: Mongodb, our Swiss Army Knife Database
Page 27: Mongodb, our Swiss Army Knife Database

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton

Haiku?

Page 28: Mongodb, our Swiss Army Knife Database

Naming things

• REST has been a strong design principle at fotopedia since the early days, and the effort is paying off.

Page 29: Mongodb, our Swiss Army Knife Database

/en/2nd_arrondissement_of_Paris

/en/Paris/fragment/left_col

/en/Paris/fragment/related

/users/john/fragment/contrib

Page 30: Mongodb, our Swiss Army Knife Database

Invalidating

• REST allows us to invalidate by URL prefix.

• When the Paris album changes, we have to invalidate /en/Paris.*

Page 31: Mongodb, our Swiss Army Knife Database

Varnish invalidation

• Varnish's built-in regexp-based invalidation is not designed for intensive, fine-grained invalidation.

• We need to invalidate URLs individually.

Page 32: Mongodb, our Swiss Army Knife Database

/en/Paris.*

/en/Paris

/en/Paris/fragment/left_col

/en/Paris/photos.json?skip=0&number=20

/en/Paris/photos.json?skip=13&number=27

Page 33: Mongodb, our Swiss Army Knife Database

Metacache workflow

RoR application

Varnish HTTP cache

Nginx SSI

metacache feeder

varnish log

invalidation worker

/en/Paris

/en/Paris/fragment/left_col

/en/Paris/photos.json?skip=0&number=20

/en/Paris/photos.json?skip=13&number=27

/en/Paris/fragment/left_col

/en/Paris.*
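
A sketch of the two MongoDB-facing pieces (collection, field names, ports and the purge call are illustrative): the feeder records every URL Varnish serves, and the invalidation worker expands a prefix into the individual URLs and purges each one.

```ruby
require 'mongo'
require 'net/http'

# One document per URL Varnish has in cache.
urls = Mongo::Client.new(['127.0.0.1:27017'], database: 'metacache')[:urls]
urls.indexes.create_one({ url: 1 }, unique: true)

# metacache feeder: called for every hit seen in the varnish log.
def record(urls, url)
  urls.update_one({ url: url }, { '$setOnInsert' => { url: url } }, upsert: true)
end

# Net::HTTP has no PURGE verb, so declare it; Varnish can be configured to accept it.
class Purge < Net::HTTPRequest
  METHOD = 'PURGE'
  REQUEST_HAS_BODY = false
  RESPONSE_HAS_BODY = true
end

# invalidation worker: expand the prefix with an anchored regexp, purge each URL.
def invalidate(urls, prefix)
  urls.find(url: /^#{Regexp.escape(prefix)}/).each do |doc|
    Net::HTTP.start('127.0.0.1', 80) { |http| http.request(Purge.new(doc['url'])) }
    urls.delete_one(url: doc['url'])
  end
end
```

The anchored regexp is what keeps this cheap: a ^-prefixed regular expression can be served from the index on url.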

Page 34: Mongodb, our Swiss Army Knife Database

Wow.

• This time we are actually using MongoDB as a BTree. Impressive.

• The metacache has been running fine for several months, and we want to go further.

Page 35: Mongodb, our Swiss Army Knife Database

Invalidate less

• We need to be more specific as to what we invalidate.

• Today, if somebody votes on a photo in the Paris album, we invalidate the whole /en/Paris prefix, even though most of it is unchanged.

• We will move towards a more clever metacache.

Page 36: Mongodb, our Swiss Army Knife Database

Metacache reloaded

• Pub/Sub metacache

• Have the backend send a specific header, to be caught by the metacache feeder, containing a “subscribe” message.

• This header will be a JSON document, to be pushed to the metacache.

• The purge commands will be mongo search queries.

Page 37: Mongodb, our Swiss Army Knife Database

{url:/en/Paris, observe:[summary,links]}

{url:/en/Paris/fragment/left_col, observe: [cover]}

{url:/en/Paris/photos.json?skip=0&number=20, observe:[photos]}

{url:/en/Paris/photos.json?skip=13&number=27, observe:[photos]}

/en/Paris

/en/Paris/fragment/left_col

/en/Paris/photos.json?skip=0&number=20

/en/Paris/photos.json?skip=13&number=27

Page 38: Mongodb, our Swiss Army Knife Database

{url:/en/Paris, observe:[summary,links]}

{url:/en/Paris/fragment/left_col, observe: [cover]}

{url:/en/Paris/photos.json?skip=0&number=20, observe:[photos]}

{url:/en/Paris/photos.json?skip=13&number=27, observe:[photos]}

when somebody votes: { url: /en/Paris.*, observe: photos }

when the summary changes: { url: /en/Paris.*, observe: summary }

when a new link is created: { url: /en/Paris.*, observe: links }
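
With the subscriptions stored in MongoDB, a purge command really is nothing more than a search query. A sketch (collection name invented, document shape as on the slides above):

```ruby
require 'mongo'

subs = Mongo::Client.new(['127.0.0.1:27017'], database: 'metacache')[:subscriptions]

# "somebody votes on a photo of Paris": only URLs observing the photos
# facet under the /en/Paris prefix are selected for purging.
subs.find(url: /^\/en\/Paris/, observe: 'photos').each do |doc|
  puts doc['url']   # hand each matching URL to the Varnish purge step
end
```

Matching the observe array against a single value selects every document whose list contains that value, which is exactly the pub/sub semantics described above.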

Page 39: Mongodb, our Swiss Army Knife Database

Other use cases

• Timeline activities storage: just one more BTree usage.

• Moderation workflow data: tiny dataset, but more complex queries, map/reduce.

• Suspended experimentation around log collection and analysis
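
The timeline case is the same BTree pattern yet again: one document per activity, one compound index, range queries. A sketch with invented field names:

```ruby
require 'mongo'

timelines = Mongo::Client.new(['127.0.0.1:27017'], database: 'fotopedia')[:timelines]
timelines.indexes.create_one({ user_id: 1, at: -1 })

# append an activity
timelines.insert_one(user_id: 'john', at: Time.now, verb: 'vote', album: 'en/Paris')

# read back the 20 most recent activities for one user
timelines.find(user_id: 'john').sort(at: -1).limit(20).to_a
```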

Page 40: Mongodb, our Swiss Army Knife Database

Current situation

• MySQL: main data store

• CouchDB: old timelines (+ chef)

• MongoDB: metacache, wikipedia, moderation, new timelines

• Redis: raw data cache for counters, recent activity (+ resque)

Page 41: Mongodb, our Swiss Army Knife Database

What about the main store?

• albums are a good fit for documents

• votes and score may be more tricky

• recent introduction of resque

Page 42: Mongodb, our Swiss Army Knife Database

In short

• Simple, fast.

• Hackable: in a language most can read.

• Clear roadmap.

• Very helpful and efficient team.

• Designed with application developer needs in mind.