Yieldbot Tech Talk, Sept 20, 2012


Upload: yieldbot

Post on 06-May-2015

TRANSCRIPT

Page 1: Yieldbot Tech Talk, Sept 20, 2012

© 2012 Yieldbot / CONFIDENTIAL

Yieldbot Tech Talk – MongoDB to k/v

© 2012 Yieldbot

Page 2: Yieldbot Tech Talk, Sept 20, 2012

Yieldbot Tech Talk – MongoDB to key/value, Sept 20, 2012

What We Do

• Yieldbot technology creates marketplaces where advertisers target realtime consumer intent flowing through premium publishers.
• At a high level: Analytics + Ad Serving
  – Geo-distributed
    • Data collection
    • Realtime ad matching
  – Cascalog batch analytics
  – Rich Analytics Results visualizations

Page 3: Yieldbot Tech Talk, Sept 20, 2012

Why MongoDB (Dec 2009)

• Needed to be manageable by the dev team (1 person!)
• Flexible
• Easy to get started, run on a laptop or deploy
• Scale wasn’t initially the biggest concern
• Could focus on other stuff
  – Lucene
  – Analytics
  – Ad serving dynamics

Page 4: Yieldbot Tech Talk, Sept 20, 2012

How MongoDB Was Used Initially

• Configuration
  – Publisher profiles, ad matching rules, etc.
• Data collection
  – Pageviews, impressions, clicks
• Analytics results
• Task state tracking
• Lookup tables for ad serving
• Real-time ad stats

Page 5: Yieldbot Tech Talk, Sept 20, 2012

A Couple of Aspects of Note

• Master/Slave
  – Convenient for simple durability
  – Convenient for geo distribution
  – Not unique to Mongo; we now use a similar redis topology
• Indexing
  – Easy to set up, but eventually a RAM scaling issue
  – Initially great for efficient views of data in the UI
  – Moved analytics results to key/value in Mongo
• Durable sharded config (replica sets) is expensive

Page 6: Yieldbot Tech Talk, Sept 20, 2012

Data Collection

• Mongo: collections for pageviews, impressions, clicks
  – Wasn’t archived anywhere else
  – Not where you want to infinitely scale
• Now flows through redis, to files, to S3

Page 7: Yieldbot Tech Talk, Sept 20, 2012

Data Collection with redis Assist

• redis lists populated as events come in
• Daemons pull off lists and write to files
• Periodically compress and archive files to S3
• S3 files used for input later
  – Hadoop (Cascalog) batch analytics
  – Advertising Stats Calculations
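
The list-to-file-to-archive flow on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not Yieldbot's code: a `collections.deque` stands in for the redis list (push on the left, pop from the right), the function names and file names are hypothetical, and the final S3 upload is left out entirely.

```python
import gzip
import json
import tempfile
from collections import deque
from pathlib import Path

# Stand-in for a redis list (assumption): producers push on the left and the
# daemon pops from the right, so events come off in arrival order.
event_queue = deque()


def record_event(event: dict) -> None:
    """Producer side: serialize the event and push it onto the list."""
    event_queue.appendleft(json.dumps(event))


def drain_to_file(path: Path, max_events: int = 1000) -> int:
    """Daemon side: pop up to max_events and append them to a log file."""
    written = 0
    with path.open("a", encoding="utf-8") as out:
        while event_queue and written < max_events:
            out.write(event_queue.pop() + "\n")
            written += 1
    return written


def compress_for_archive(path: Path) -> Path:
    """Gzip a finished log file and remove the original.

    Uploading the .gz file to S3 would follow here; it is omitted.
    """
    archive = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(archive, "wb") as dst:
        dst.write(src.read())
    path.unlink()
    return archive


# Usage with made-up events.
workdir = Path(tempfile.mkdtemp())
log_path = workdir / "pageviews.log"
record_event({"type": "pageview", "url": "/a"})
record_event({"type": "click", "url": "/a"})
drained = drain_to_file(log_path)
archive_path = compress_for_archive(log_path)
with gzip.open(archive_path, "rt", encoding="utf-8") as fh:
    archived = [json.loads(line) for line in fh]
```

Keeping the write path to a simple list push means the ad-serving side never waits on disk or S3; the daemons absorb that latency.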

Page 8: Yieldbot Tech Talk, Sept 20, 2012

Matching Lookup Tables

• Mongo: collections for different lookup types
  – E.g., geo, URL
  – Built periodically, updated on config change
  – Lookup in each, correlate results
• redis
  – Ability to pipeline operations in a single server call
  – Set intersection across lookup dimensions and one response back
  – Same master/slave as Mongo for distribution
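
A minimal sketch of the set-intersection idea, assuming one set of candidate ad ids per lookup value. The key names (`geo:US`, `url:/sports`) and ad ids are invented for illustration, and plain Python sets stand in for redis sets; in redis itself the equivalent operation is `SINTER`, which returns the intersection in a single reply.

```python
# In-memory stand-in for redis sets, keyed by lookup dimension and value.
# All keys and members here are hypothetical.
lookup_sets = {
    "geo:US": {"ad1", "ad2", "ad3"},
    "geo:CA": {"ad2"},
    "url:/sports": {"ad2", "ad3", "ad4"},
    "url:/news": {"ad1"},
}


def match_ads(*keys: str) -> set:
    """Intersect the sets for every requested key.

    A missing key is treated as an empty set, so the result is empty,
    which mirrors how SINTER treats keys that do not exist.
    """
    sets = [lookup_sets.get(key, set()) for key in keys]
    if not sets:
        return set()
    return set.intersection(*sets)


matched = match_ads("geo:US", "url:/sports")
```

Compared with querying one collection per dimension and correlating the results in the application, the intersection happens server-side and only the final candidates travel back.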

Page 9: Yieldbot Tech Talk, Sept 20, 2012

Configuration

• Mongo
  – Database per publisher
  – Collections for objects
  – Denormalized where possible
  – Manual foreign keys
  – Obviously the best candidate for a relational model
• History and versioning were paramount to us
  – Roll our own: HeroDB

Page 10: Yieldbot Tech Talk, Sept 20, 2012

HeroDB

• History and granular versioning highest goal
• Database built on top of git
  – Golden database is a bare repo
  – Can clone to anywhere, make changes, push
  – Changes in single commit are atomic
• How, when, and who changed it
• Ability to set to specific previous state of DB
• Much more to do, in production 6+ months
  – Recent change, caching
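
The properties listed here (atomic multi-key commits, who/when/why metadata, returning to an earlier state) can be illustrated with a toy in-memory store. This is not HeroDB and does not use git; the `VersionedStore` and `Commit` names and the example keys are hypothetical, and each commit simply keeps a full snapshot.

```python
import copy
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    number: int
    author: str       # who changed it
    message: str      # how / why it changed
    timestamp: float  # when it changed
    snapshot: dict    # full state of the store after this commit


class VersionedStore:
    """Toy versioned key/value store: every commit is a full snapshot."""

    def __init__(self):
        self._commits = [Commit(0, "system", "initial empty state", time.time(), {})]

    @property
    def head(self) -> Commit:
        return self._commits[-1]

    def commit(self, author: str, message: str, changes: dict) -> Commit:
        """Apply all changes as one unit; a value of None deletes the key."""
        new_state = copy.deepcopy(self.head.snapshot)
        for key, value in changes.items():
            if value is None:
                new_state.pop(key, None)
            else:
                new_state[key] = value
        new_commit = Commit(len(self._commits), author, message, time.time(), new_state)
        self._commits.append(new_commit)
        return new_commit

    def state_at(self, number: int) -> dict:
        """Return a copy of the store as it was after commit `number`."""
        return copy.deepcopy(self._commits[number].snapshot)

    def log(self) -> list:
        return [(c.number, c.author, c.message) for c in self._commits]


# Usage with made-up configuration keys.
store = VersionedStore()
store.commit("alice", "add publisher", {"pub:1/name": "Example News", "pub:1/floor": 0.5})
store.commit("bob", "raise floor", {"pub:1/floor": 0.75})
```

Building on git gives the same guarantees without reimplementing them: a commit either lands whole or not at all, and the log already records author, time, and message.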

Page 11: Yieldbot Tech Talk, Sept 20, 2012

Analytics Results

• ARCv1, Mongo: indexed collections
  – Very easy to code to
  – Initially with everything else in same server
  – Moved out to dedicated server
  – Memory became an issue
    • Indexes bigger than data itself
  – Overhead of importing Cascalog results
    • Pull json files from S3 to local disk
    • mongoimport files into DB

Page 12: Yieldbot Tech Talk, Sept 20, 2012

Analytics Results Cont’d

• ARCv2, Mongo: paged data, key/value
  – Migrated app to key/value access pattern
  – Much better memory usage
  – Application sharded, publishers spread around
  – DB per day per publisher, most recent 7 held
  – Still overhead of importing Hadoop results
    • Pull json files from S3 to local disk
    • mongoimport files into DB
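
One way to picture application-level sharding with a database per day per publisher is sketched below. The server names, the `arc_<publisher>_<yyyymmdd>` naming scheme, and the CRC-based placement are all assumptions made for illustration; the slide only states that publishers were spread around and that the most recent 7 databases were held (read here as 7 days).

```python
import zlib
from datetime import date, timedelta

SERVERS = ["mongo-a", "mongo-b", "mongo-c"]  # hypothetical server names
RETAIN_DAYS = 7                              # "most recent 7 held"


def server_for(publisher_id: str) -> str:
    """Pick a server in application code with a stable hash of the publisher id."""
    return SERVERS[zlib.crc32(publisher_id.encode("utf-8")) % len(SERVERS)]


def db_name(publisher_id: str, day: date) -> str:
    """One database per publisher per day (naming scheme is an assumption)."""
    return f"arc_{publisher_id}_{day:%Y%m%d}"


def dbs_to_drop(existing: list, publisher_id: str, today: date) -> list:
    """Return this publisher's databases that fall outside the retention window."""
    keep = {db_name(publisher_id, today - timedelta(days=n)) for n in range(RETAIN_DAYS)}
    prefix = f"arc_{publisher_id}_"
    return sorted(name for name in existing if name.startswith(prefix) and name not in keep)


# Usage: ten days of databases exist, the three oldest should be dropped.
today = date(2012, 9, 20)
existing = [db_name("pub42", today - timedelta(days=n)) for n in range(10)]
stale = dbs_to_drop(existing, "pub42", today)
```

Dropping a whole database per day makes retention cheap, since no per-document deletes or index maintenance are needed.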

Page 13: Yieldbot Tech Talk, Sept 20, 2012

Analytics Results - ElephantDB

• Cascalog support to directly write EDB format
  – Berkeley DB or LevelDB
• Ring Topology
  – Shards distributed around ring, consistent hashing
  – Configurable replication factor
  – Request to any node, forwards as necessary
  – Incrementally increase ring size
• Import from S3 efficient
  – Copy shard from S3 to local disk
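
The ring bullets describe a generic consistent-hashing layout. The sketch below shows that general technique, not ElephantDB's implementation: the `HashRing` class, the virtual-point count, and the node and shard names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with a configurable replication factor."""

    def __init__(self, nodes, replication_factor=2, points_per_node=64):
        self.replication_factor = replication_factor
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node: str) -> None:
        """Place several virtual points for the node around the ring."""
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def nodes_for(self, key: str) -> list:
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if not self._ring:
            return []
        distinct = {node for _, node in self._ring}
        wanted = min(self.replication_factor, len(distinct))
        start = bisect.bisect_left(self._ring, (self._hash(key), ""))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == wanted:
                    break
        return owners


# Usage: place 32 hypothetical shards, then grow the ring by one node.
ring = HashRing(["node-a", "node-b", "node-c"], replication_factor=2)
shards = [f"shard-{i}" for i in range(32)]
before = {shard: ring.nodes_for(shard)[0] for shard in shards}
ring.add_node("node-d")
moved = [shard for shard in shards if ring.nodes_for(shard)[0] != before[shard]]
```

When the ring grows, a shard's primary owner either stays put or moves to the new node, which is what makes incremental growth cheap: only the shards taken over by `node-d` need to be copied.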

Page 14: Yieldbot Tech Talk, Sept 20, 2012

Real-time Ad Stats

• Mongo: DB per day, collection by entity type
  – Document per entity instance
  – stat_type.hour.minute nested values, atomic increment
  – Never a good story around aggregating at larger timeframes
• Enter redis again
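
The `stat_type.hour.minute` layout maps naturally onto MongoDB's `$inc` update operator with a dotted field path. The sketch below builds such an update document and applies it to a plain dict to show the resulting shape; the stat names and document id are invented, and `apply_inc` is only a local stand-in for what the server does (the real `$inc` is atomic on a single document, this helper is not).

```python
from datetime import datetime


def build_inc_update(stat_type: str, when: datetime, amount: int = 1) -> dict:
    """Build a MongoDB-style update that increments stat_type.hour.minute."""
    return {"$inc": {f"{stat_type}.{when.hour}.{when.minute}": amount}}


def apply_inc(document: dict, update: dict) -> dict:
    """Local stand-in for $inc on nested paths.

    Creates missing intermediate documents and adds to the leaf value.
    """
    for path, amount in update["$inc"].items():
        *parents, leaf = path.split(".")
        node = document
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = node.get(leaf, 0) + amount
    return document


# Usage: two impressions and one click in the same minute (made-up entity).
entity_doc = {"_id": "ad:123"}
ts = datetime(2012, 9, 20, 14, 5)
update = build_inc_update("impressions", ts)
apply_inc(entity_doc, update)
apply_inc(entity_doc, update)
apply_inc(entity_doc, build_inc_update("clicks", ts))
```

The nesting makes per-minute writes cheap, but summing a day or a week means walking every hour and minute key of every document, which is the aggregation problem the slide mentions.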

Page 15: Yieldbot Tech Talk, Sept 20, 2012

Real-time Ad Stats Cont’d

• redis has robust access patterns
  – More pipelining
• Initially realtime and aggregated kept in redis
• Issue with redis scaling is DB has to fit in memory
• Time-period aggregations now kept in HBase
• Only most recent hours kept in redis
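
A rough sketch of the split between a small hot window and long-term aggregates. Everything here is assumed for illustration: a dict stands in for redis, a `defaultdict` stands in for HBase, the 3-hour window and the hours-since-epoch index are arbitrary, and the talk does not say how the aggregates were actually produced.

```python
from collections import defaultdict

RECENT_HOURS = 3  # hypothetical size of the in-memory window

recent = {}                    # stand-in for redis: (entity, stat, hour) -> count
aggregates = defaultdict(int)  # stand-in for HBase: (entity, stat, day) -> count


def increment(entity: str, stat: str, hour_index: int, amount: int = 1) -> None:
    """Record a real-time stat for the given hour (hours since epoch)."""
    key = (entity, stat, hour_index)
    recent[key] = recent.get(key, 0) + amount


def roll_up(current_hour: int) -> int:
    """Fold hours that left the window into daily totals and evict them."""
    cutoff = current_hour - RECENT_HOURS
    expired = [key for key in recent if key[2] <= cutoff]
    for entity, stat, hour_index in expired:
        count = recent.pop((entity, stat, hour_index))
        aggregates[(entity, stat, hour_index // 24)] += count
    return len(expired)


# Usage with made-up counts.
increment("ad:1", "clicks", 100, 5)
increment("ad:1", "clicks", 101, 2)
increment("ad:1", "clicks", 104, 1)
moved_hours = roll_up(current_hour=104)
```

Bounding the hot window keeps the in-memory data set roughly constant in size regardless of how much history accumulates.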

Page 16: Yieldbot Tech Talk, Sept 20, 2012

Task State Tracking

• The last holdout
• Collection of tasks
  – Each task is a document
  – Indexed as needed
  – Mongo query and update syntax convenient
    • Both in static code, but also in Python or Mongo repl

Page 17: Yieldbot Tech Talk, Sept 20, 2012

Honorable Mention

• redis for the celery backend, used for task messaging infrastructure
• But this was never in Mongo anyway...

Page 18: Yieldbot Tech Talk, Sept 20, 2012

MongoDB Migration Summary

• Configuration            HeroDB
• Data Collection          to S3 via redis
• Analytics Results        ElephantDB
• Task State Tracking      still Mongo
• Matcher Lookup Tables    redis
• Real-time Ad Stats       redis/HBase

Page 19: Yieldbot Tech Talk, Sept 20, 2012

Thanks!

Site: yieldbot.com
Blog: blog.yieldbot.com
Twitter: @yieldbot
Email: [email protected]