Moving from C#/.NET to Hadoop/MongoDB Robert Vandehey December 4, 2012

Post on 24-Jan-2015

TRANSCRIPT

Page 1: Moving from C#/.NET to Hadoop/MongoDB

Moving from C#/.NET to Hadoop/MongoDB

Robert Vandehey

December 4, 2012

Page 2: Moving from C#/.NET to Hadoop/MongoDB

4 © 2012 Rovi Corporation. Company confidential.

We power the Discovery, Delivery and Display of Digital Entertainment

Page 3: Moving from C#/.NET to Hadoop/MongoDB

Global Reach

• Viewers using our guide technologies through service provider offerings: 137M+

• Consumer electronics (CE) devices with our CE guide technologies: 266M+

• Households reached globally by Rovi Advertising Network: 40M+

• Devices certified for high quality DivX video playback: 600M+

• Storefronts with entertainment services powered by Rovi Entertainment Store: 47M+

Data coverage:

• TV shows, movies, sports and celebrities: 4.5M+

• Album releases: 3.3M+, with 32M music tracks

• Movie titles: 500K+

Page 4: Moving from C#/.NET to Hadoop/MongoDB
Page 5: Moving from C#/.NET to Hadoop/MongoDB
Page 6: Moving from C#/.NET to Hadoop/MongoDB

Page 7: Moving from C#/.NET to Hadoop/MongoDB

The Problem

Page 8: Moving from C#/.NET to Hadoop/MongoDB

ETL/Cache Loading Data Takes Too Long

[Diagram: Table Loading Process and Cache Loading Process. Components shown: DSG DB Server(s) with the DSG Database; WSP ETL Server (Extract, Transform) with a CI Database; Node 1 and Node 2 DB Servers, each with a CI Database populated by Backup & Restore; a MemcacheD cluster; a MemcacheDB cluster; MemcacheD scratch server(s).]

Page 9: Moving from C#/.NET to Hadoop/MongoDB

The Solution

Page 10: Moving from C#/.NET to Hadoop/MongoDB

Hadoop/MongoDB

Page 11: Moving from C#/.NET to Hadoop/MongoDB

Network Diagram

Page 12: Moving from C#/.NET to Hadoop/MongoDB

Mongo Sharding

Page 13: Moving from C#/.NET to Hadoop/MongoDB

Challenges

Page 14: Moving from C#/.NET to Hadoop/MongoDB

Challenges

• Transition existing Windows/.NET team to Linux/Java

– Environment setup. Technology framework choices

– Coding differences

– Cultural differences

– Platform differences

– Easier than expected to transition team from .NET to Java – No religious battles

• Backwards compatibility of CXF web services to Microsoft .NET web services

• Managing new releases of Hadoop

• BCP (bulk copy) export took too long

– Switched to exporting the base tables and used Pig to join the data

• Writes to Mongo are very fast. Updates are slower and saturated the disks

– Implemented Diff process (MD5 calc) to allow Hadoop to do the work and minimize writes to Mongo
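The diff process in the last bullet can be illustrated with a small sketch. The slides do not show the actual implementation (which ran in Hadoop), so everything here is an assumption made for illustration: the function names `record_digest` and `diff_records`, the `id` key field, and the use of canonical JSON as the hashing input. The idea is only that a record whose MD5 digest matches the digest from the previous load does not need to be written to MongoDB again.

```python
import hashlib
import json


def record_digest(record):
    """Return an MD5 hex digest of a record, independent of key order.

    Canonical JSON is an assumed serialization, chosen for illustration.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def diff_records(previous_digests, new_records, key_field="id"):
    """Classify new_records against digests saved from the previous load.

    previous_digests maps a record key to its earlier MD5 digest.
    Returns (inserts, updates, unchanged_keys); only the first two
    groups would need to be written to the database.
    """
    inserts, updates, unchanged_keys = [], [], []
    for record in new_records:
        key = record[key_field]
        old_digest = previous_digests.get(key)
        if old_digest is None:
            inserts.append(record)        # never seen before
        elif old_digest != record_digest(record):
            updates.append(record)        # content changed
        else:
            unchanged_keys.append(key)    # identical: skip the write
    return inserts, updates, unchanged_keys
```

When most records are unchanged between loads, only the small `inserts` and `updates` groups reach the database, which is the write reduction the bullet describes.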

Page 15: Moving from C#/.NET to Hadoop/MongoDB

Lessons Learned

Page 16: Moving from C#/.NET to Hadoop/MongoDB

Lessons Learned

• General

– Current versions of Hadoop (CDH4) and MongoDB 2.0 are actually very stable products

    • We purchased enterprise support agreements from both Cloudera and 10gen

– Create a developers VM image

– Deploy early and often even if not ready for real customers

– Use the same setup in test and production environments

    • Sharding caused differences

• SQL

– Get raw tables without any transformation or joins

    • Let Hadoop do the processing for you

• Hadoop

– Do as much work as you can in Hadoop

– Take the time to create small datasets to iterate fast

– Take the time to learn and use Pig

    • It is very fast and provides tons of functionality that you don’t need to code in Java

– Don’t create Runners - Use Oozie workflows

– Measure, benchmark and track performance – Use Hadoop counters
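As an illustration of the counters advice, here is a minimal sketch of a mapper that reports custom counters. It assumes Hadoop Streaming, which the slides do not mention and which is used here only to keep the example in Python; a streaming task updates a counter by writing a `reporter:counter:<group>,<counter>,<amount>` line to stderr. The `Load` group, the counter names, and the tab-separated input layout are hypothetical. In a Java job the equivalent call is `context.getCounter(group, name).increment(1)`.

```python
import sys


def counter_line(group, name, amount=1):
    """Format a Hadoop Streaming counter update (written to stderr)."""
    return "reporter:counter:{},{},{}\n".format(group, name, amount)


def map_lines(lines, out=sys.stdout, err=sys.stderr):
    """Emit key/value pairs and count good versus malformed records.

    Input lines are assumed to be tab-separated with a non-empty key
    in the first field; that layout is an assumption for this sketch.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            err.write(counter_line("Load", "MalformedRecords"))
            continue
        out.write("{}\t{}\n".format(fields[0], fields[1]))
        err.write(counter_line("Load", "GoodRecords"))

# In a streaming job the script would end with: map_lines(sys.stdin)
```

The counter totals show up in the job summary, so a sudden jump in malformed records or a drop in good records is visible from run to run without reading logs.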

Page 17: Moving from C#/.NET to Hadoop/MongoDB

Lessons Learned - 2

• MongoDB

– RAM, RAM, RAM!!!

– Many writes from Hadoop can easily overwhelm MongoDB

    • Single database lock

    • Drive bandwidth saturation – Can be expanded through sharding

    • Do as much as possible to minimize writes

    • Measure where your application is blocking and optimize

– Don’t shard unless you have to – if you do shard, preconfigure your shard key

    • You need a good shard key

– Use replica sets. They are easy to set up and work well.

    • Make sure the oplog (replication log) is large enough.

– Use MongoDB Monitoring Service (MMS) – It’s free

– Mongo queries are fast!
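Why a good shard key matters can be shown with a small simulation. This is a sketch under stated assumptions, not MongoDB's routing code: `range_shard` models range-partitioned chunks with fixed split points, and `hashed_shard` stands in for any well-distributed key by taking an MD5 hash of the key modulo the shard count. All names and numbers are hypothetical.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def range_shard(key, boundaries):
    """Shard index for a range-partitioned key.

    boundaries is a sorted list of split points; keys below the first
    boundary go to shard 0, keys at or above the last go to the last shard.
    """
    return bisect_right(boundaries, key)


def hashed_shard(key, num_shards):
    """Shard index for a key spread by hashing (illustrative only)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribution(keys, shard_of):
    """Count how many keys land on each shard."""
    return Counter(shard_of(key) for key in keys)


# Three shards split at 1000 and 2000; new inserts use increasing ids.
new_ids = range(3000, 4000)
by_range = distribution(new_ids, lambda k: range_shard(k, [1000, 2000]))
by_hash = distribution(new_ids, lambda k: hashed_shard(k, 3))
```

With a steadily increasing key, every new insert in `by_range` lands on the last shard, so one machine absorbs the whole write load. In `by_hash` the same inserts are spread across all three shards.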

Page 18: Moving from C#/.NET to Hadoop/MongoDB

Mongo Query – returns 90 rows from a database of 9 million in 44ms

Page 19: Moving from C#/.NET to Hadoop/MongoDB

Q&A

Page 20: Moving from C#/.NET to Hadoop/MongoDB

Follow-up Information

• Email: [email protected]

• LinkedIn: http://www.linkedin.com/in/bvandehey

• Twitter: @bvandehey

• Rovi Cloud Services: http://developer.rovicorp.com/

Page 21: Moving from C#/.NET to Hadoop/MongoDB

Thank You
