Moving from C#/.NET to Hadoop/MongoDB

Robert Vandehey

December 4, 2012

© 2012 Rovi Corporation. Company confidential.

We power the Discovery, Delivery and Display of Digital Entertainment

Global Reach

• 137M+ viewers use our guide technologies through service provider offerings

• 266M+ consumer electronic (CE) devices have our CE guide technologies

• 40M+ households reached globally by Rovi Advertising Network

• 600M+ devices certified for high quality DivX video playback

• 47M+ storefronts with entertainment services powered by Rovi Entertainment Store

• Data coverage:

– 4.5M+ TV shows, movies, sports and celebrities

– 3.3M+ album releases and 32M music tracks

– 500K+ movie titles

The Problem

ETL/Cache Loading Data Takes Too Long

Cache Loading Process

[Diagram. Recoverable labels: DSG DB Server(s) with DSG Database; WSP ETL Server with CI Database (Extract, Transform); Node 1 DB Server and Node 2 DB Server, each with a CI Database (Backup & Restore); MemcacheD Cluster; MemcacheDB Cluster; MemcacheD scratch server(s)]

Table Loading Process

The Solution

Hadoop/MongoDB

Network Diagram

Mongo Sharding

Challenges

• Transition existing Windows/.NET team to Linux/Java

– Environment setup. Technology framework choices

– Coding differences

– Cultural differences

– Platform differences

– Transitioning the team from .NET to Java was easier than expected; no religious battles

• Backwards compatibility of CXF web services to Microsoft .NET web services

• Managing new releases of Hadoop

• BCP took too long

– Converted to base tables. Used Pig to join the data

• Writes to Mongo are very fast. Updates are slower and saturated the disks

– Implemented Diff process (MD5 calc) to allow Hadoop to do the work and minimize writes to Mongo
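The diff idea can be sketched in a few lines of Python. This is a minimal single-process illustration, not the Hadoop job described in the talk: the function names, the dictionary-based inputs, and the choice of canonical JSON as the hashing form are all assumptions made for the example. The point is only that comparing a stored MD5 per record against a freshly computed one identifies which keys actually need a write.

```python
import hashlib
import json


def record_md5(record):
    """Return the MD5 hex digest of a record's canonical JSON form.

    Sorting keys makes the digest independent of field order (an assumption
    of this sketch; any stable serialization would do).
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def diff_records(previous_hashes, current_records):
    """Classify current records against hashes saved from the last load.

    previous_hashes: dict mapping key -> MD5 hex digest from the prior run
    current_records: dict mapping key -> record (a dict) from this run
    Returns (to_insert, to_update, to_delete) as sorted lists of keys.
    Unchanged records appear in none of the lists, so they cause no write.
    """
    to_insert, to_update = [], []
    for key, record in current_records.items():
        old_hash = previous_hashes.get(key)
        if old_hash is None:
            to_insert.append(key)
        elif old_hash != record_md5(record):
            to_update.append(key)
    to_delete = [key for key in previous_hashes if key not in current_records]
    return sorted(to_insert), sorted(to_update), sorted(to_delete)


if __name__ == "__main__":
    # Hypothetical sample data: "a" is unchanged, "b" changed, "c" is new.
    previous = {
        "a": record_md5({"title": "Alien"}),
        "b": record_md5({"title": "Brazil"}),
    }
    current = {
        "a": {"title": "Alien"},
        "b": {"title": "Brazil (1985)"},
        "c": {"title": "Casablanca"},
    }
    print(diff_records(previous, current))  # (['c'], ['b'], [])
```

Only the keys in the three returned lists would be sent to the database; everything else is skipped, which is where the reduction in writes comes from.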

Lessons Learned

• General

– Current versions of Hadoop CDH4 and MongoDB 2.0 are actually very stable products

  • We purchased enterprise support agreements from both Cloudera and 10gen

– Create a developers VM image

– Deploy early and often even if not ready for real customers

– Use the same setup in test and production environments

  • Sharding caused differences

• SQL

– Get raw tables without any transformation or joins

  • Let Hadoop do the processing for you

• Hadoop

– Do as much work as you can in Hadoop

– Take the time to create small datasets to iterate fast

– Take the time to learn and use Pig

  • It is very fast and provides tons of functionality that you don’t need to code in Java

– Don’t create Runners; use Oozie workflows

– Measure, benchmark and track performance: use Hadoop counters
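As an illustration of the counters advice, here is a small Python sketch for a Hadoop Streaming mapper. Java MapReduce jobs increment counters through the task context; Streaming tasks instead write lines of the form `reporter:counter:<group>,<counter>,<amount>` to stderr. The group name `ETL`, the counter names, and the tab-separated input layout are hypothetical choices for this example, not details from the talk.

```python
import sys


def counter_line(group, counter, amount=1):
    """Format a Hadoop Streaming counter update line."""
    return "reporter:counter:{},{},{}".format(group, counter, amount)


def increment_counter(group, counter, amount=1, stream=None):
    """Write a counter update to stderr (or to `stream`, for testing)."""
    stream = sys.stderr if stream is None else stream
    stream.write(counter_line(group, counter, amount) + "\n")


def map_lines(lines, stream=None):
    """Yield "key<TAB>value" for rows with at least two tab-separated fields.

    Rows with fewer fields are dropped and counted as malformed, so the
    job's counter totals show how much input was rejected.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            increment_counter("ETL", "MALFORMED_ROWS", stream=stream)
            continue
        increment_counter("ETL", "ROWS_OUT", stream=stream)
        yield fields[0] + "\t" + fields[1]
```

In an actual Streaming job the mapper script would iterate `map_lines(sys.stdin)` and print each result; the counter totals then appear alongside the built-in counters in the job report, which makes run-to-run comparisons straightforward.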

Lessons Learned - 2

• MongoDB

– RAM, RAM, RAM!!!

– Many writes from Hadoop can easily overwhelm MongoDB

  • Single database lock

  • Drive bandwidth saturation (can be expanded through sharding)

  • Do as much as possible to minimize writes

  • Measure where your application is blocking and optimize

– Don’t shard unless you have to; if you do shard, preconfigure your shard key

  • You need a good shard key

– Use replica sets. They are easy to set up and work well.

  • Make sure the oplog (replication log) is large enough.

– Use MongoDB Monitoring Service (MMS) – It’s free

– Mongo queries are fast!

Mongo Query – returns 90 rows from a database of 9 million in 44ms

Q&A

Follow-up Information

• Email: robert.vandehey@rovicorp.com

• LinkedIn: http://www.linkedin.com/in/bvandehey

• Twitter: @bvandehey

• Rovi Cloud Services: http://developer.rovicorp.com/

Thank You
