Post on 24-Jan-2015
Moving from C#/.NET to Hadoop/MongoDB
Robert Vandehey
December 4, 2012
© 2012 Rovi Corporation. Company confidential.
We power the Discovery, Delivery and Display of Digital Entertainment
Global Reach
• Viewers use our guide technologies through service provider offerings: 137M+
• Consumer electronic (CE) devices have our CE guide technologies: 266M+
• Households reached globally by Rovi Advertising Network: 40M+
• Devices certified for high quality DivX video playback: 600M+
• Storefronts with entertainment services powered by Rovi Entertainment Store: 47M+
Data coverage:
• TV shows, movies, sports and celebrities: 4.5M+
• Album releases and 32M music tracks: 3.3M+
• Movie titles: 500K+
The Problem
ETL/Cache Loading Data Takes Too Long
Cache Loading Process
[Diagram. Components shown: DSG DB Server(s) hosting the DSG Database; WSP ETL Server (Extract, Transform) with the CI Database; Node 1 and Node 2 DB Servers, each with a CI Database and a Backup & Restore step; a MemcacheD cluster; a MemcacheDB cluster; MemcacheD scratch server(s).]
Table Loading Process
The Solution
Hadoop/MongoDB
Network Diagram
Mongo Sharding
Challenges
• Transition existing Windows/.NET team to Linux/Java
– Environment setup. Technology framework choices
– Coding differences
– Cultural differences
– Platform differences
– Easier than expected to transition team from .NET to Java – No religious battles
• Backwards compatibility of CXF web services with Microsoft .NET web services
• Managing new releases of Hadoop
• BCP took too long
– Switched to extracting base tables and used Pig to join the data
• Writes to Mongo are very fast. Updates are slower and saturated the disks
– Implemented a diff process (MD5 calculation) so that Hadoop does the work and writes to Mongo are minimized
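The diff idea in the last bullet can be sketched as follows. This is a minimal in-memory illustration, not the process used in the talk: the slides only say that an MD5 calculation was used to let Hadoop find the differences. The function names, the JSON canonicalization, and the record layout are all assumptions made for the example; in a real pipeline the comparison would run as a Hadoop job over the full extract.

```python
import hashlib
import json


def record_digest(record):
    """Return an MD5 hex digest of a record (keys sorted so the digest is stable)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(previous_digests, current_records):
    """Compare the current extract against digests kept from the previous load.

    previous_digests: dict of record id -> MD5 digest from the last run
    current_records:  dict of record id -> record (a plain dict)
    Returns (to_insert, to_update, to_delete, new_digests).
    """
    to_insert, to_update, new_digests = {}, {}, {}
    for record_id, record in current_records.items():
        digest = record_digest(record)
        new_digests[record_id] = digest
        old = previous_digests.get(record_id)
        if old is None:
            to_insert[record_id] = record      # never seen before
        elif old != digest:
            to_update[record_id] = record      # content changed
        # identical digest: nothing is written to the database
    to_delete = sorted(set(previous_digests) - set(current_records))
    return to_insert, to_update, to_delete, new_digests


# Hypothetical data: one unchanged, one changed, one new, one removed record.
previous = {
    "m1": record_digest({"title": "Alien", "year": 1979}),
    "m2": record_digest({"title": "Heat", "year": 1995}),
    "m3": record_digest({"title": "Brazil", "year": 1985}),
}
current = {
    "m1": {"title": "Alien", "year": 1979},
    "m2": {"title": "Heat", "year": 1996},
    "m4": {"title": "Fargo", "year": 1996},
}
to_insert, to_update, to_delete, new_digests = diff_snapshots(previous, current)
print(sorted(to_insert), sorted(to_update), to_delete)
# prints ['m4'] ['m2'] ['m3']
```

Only the changed and new records reach the database, so an unchanged record costs one hash comparison instead of a disk-bound update.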
Lessons Learned
• General
– Current versions of Hadoop CDH4 and MongoDB 2.0 are actually very stable products
   • We purchased enterprise support agreements from both Cloudera and 10gen
– Create a developers VM image
– Deploy early and often even if not ready for real customers
– Use the same setup in test and production environments
   • Sharding caused differences
• SQL
– Get raw tables without any transformation or joins
   • Let Hadoop do the processing for you
• Hadoop
– Do as much work as you can in Hadoop
– Take the time to create small datasets to iterate fast
– Take the time to learn and use Pig
   • It is very fast and provides tons of functionality that you don’t need to code in Java
– Don’t create Runners - Use Oozie workflows
– Measure, benchmark and track performance – Use Hadoop counters
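As one way to picture the counters advice, here is a small sketch of a mapper that counts good and malformed records. It uses the Hadoop Streaming convention of writing `reporter:counter:<group>,<counter>,<amount>` lines to stderr, swapped in here so the example can stay in Python; the talk's team worked in Java, where the MapReduce counter API would be used instead. The group name `etl`, the counter names, and the tab-separated `id<TAB>title` input format are assumptions for illustration.

```python
import io
import sys


def emit_counter(group, name, amount=1, stream=None):
    """Report a counter increment using the Hadoop Streaming stderr protocol."""
    stream = sys.stderr if stream is None else stream
    stream.write("reporter:counter:%s,%s,%d\n" % (group, name, amount))


def map_lines(lines, out=None, err=None):
    """Pass through well-formed 'id<TAB>title' lines and count what was seen."""
    out = sys.stdout if out is None else out
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2 or not fields[0]:
            # Bad input is skipped, but it stays visible in the job counters.
            emit_counter("etl", "malformed_records", stream=err)
            continue
        emit_counter("etl", "records_out", stream=err)
        out.write("%s\t%s\n" % (fields[0], fields[1].strip()))


# Local dry run on a tiny hypothetical dataset, capturing both streams.
sample = ["1\tAlien\n", "bad line\n", "\tno id\n", "2\tHeat \n"]
captured_out, captured_err = io.StringIO(), io.StringIO()
map_lines(sample, out=captured_out, err=captured_err)
print(captured_out.getvalue(), end="")
print(captured_err.getvalue(), end="")
```

Running the mapper locally on a small sample like this also fits the advice to create small datasets and iterate fast; under Hadoop Streaming the same function would be called with `sys.stdin`.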
Lessons Learned - 2
• MongoDB
– RAM, RAM, RAM!!!
– Many writes from Hadoop can easily overwhelm MongoDB
   • Single database lock
   • Drive bandwidth saturation – Can be expanded through sharding
   • Do as much as possible to minimize writes
   • Measure where your application is blocking and optimize
– Don’t shard unless you have to – if you do shard, preconfigure your shard key
   • You need a good shard key
– Use replica sets. They are easy to set up and work well.
   • Make sure the oplog (replication log) is large enough.
– Use MongoDB Monitoring Service (MMS) – It’s free
– Mongo queries are fast!
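Why the shard key matters can be shown with a toy simulation. It is not MongoDB code and does not model the balancer: it only assumes range-based chunks with fixed, pre-split boundaries assigned round-robin to shards. The split points, the id range, and the MD5-prefix key are hypothetical choices for the example.

```python
import bisect
import hashlib


def chunk_for(key, split_points):
    """Return the index of the range chunk that holds `key`."""
    return bisect.bisect_right(split_points, key)


def writes_per_shard(keys, split_points, num_shards):
    """Count writes landing on each shard (chunks assigned round-robin)."""
    counts = [0] * num_shards
    for key in keys:
        counts[chunk_for(key, split_points) % num_shards] += 1
    return counts


def hashed_prefix(key):
    """Hypothetical application-side key: first 4 hex characters of an MD5."""
    return hashlib.md5(str(key).encode("utf-8")).hexdigest()[:4]


# 4000 new records with monotonically increasing ids, all above the old splits.
new_ids = range(100000, 104000)

# Key = the raw id: every write falls into the last chunk.
skewed = writes_per_shard(new_ids, [25000, 50000, 75000], 4)
print(skewed)  # prints [0, 0, 0, 4000]

# Key = hash prefix, with chunks pre-split evenly over the hex key space.
spread = writes_per_shard(
    [hashed_prefix(i) for i in new_ids], ["4000", "8000", "c000"], 4
)
print(spread)  # roughly a quarter of the writes per shard
```

With the increasing key a single shard takes the whole load, which is the situation the lock and disk-saturation bullets describe. The trade-off of a hash-style key is that range queries on the original id no longer map to a single chunk.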
Mongo Query – returns 90 rows from a database of 9 million in 44ms
Q&A
Follow-up Information
• Email: robert.vandehey@rovicorp.com
• LinkedIn: http://www.linkedin.com/in/bvandehey
• Twitter: @bvandehey
• Rovi Cloud Services: http://developer.rovicorp.com/
Thank You