Post on 06-May-2015
Embed Size (px)
- 1.Hadoop on Mesos with a short history of distributed computing
2. Agenda 1. Introduction (to me) 2. A short history of distributed computing 3. Hadoop on Mesos 4. Case study - Airbnb 5. Final thoughts 6. Q&A 3. About me - Brenden Matthews cyclist runner started computering before it was cool free software advocate & contributor (Conky) for a living, engineers software @ Airbnb 4. About me - Brenden Matthews cyclist runner started computering before it was cool free software advocate & contributor (Conky) for a living, engineers software @ I don't even like computers. 5. Von Neumann Bottleneck Forever limited by memory and other I/O bandwidth limitations To do more, you must scale beyond a single node Even with SMP systems, the same limitations apply A little history 6. Early days of distributed computing Working around the Von Neumann Bottleneck: scaling up & out (Cray, SGI, IBM) 'Supercomputers' only practical for organizations with budget multipliers that start with a 'B' 7. Who has time to build a datacentre? Xen hypervisor is released in 2003, paves the way for an 'abstract datacentre' through virtualization Amazon launches EC2 in 2006, kicks off the 'cloud computing' craze 8. DIY supercomputer; a novel approach Google's MapReduce papers formalized the concept of 'black-box' distributed computing (2004) Google's own infrastructure is built upon free software and commodity hardware 9. DIY supercomputer; a novel approach Hadoop: a free implementation of Google's infrastructure; 'big computing' for all (2005) Robust High tolerance of system failure 10. We're still left with many incomplete solutions EC2 doesn't solve some problems: Virtualization delivers poor performance when compared to 'bare metal'; must compensate by adding more instances Frequent instance failures (mystery reboots, etc) EC2 isn't 'application aware' (though some have tried) What else? Supercomputers aren't affordable Building a datacentre is not feasible for most Existing 'application in the cloud' systems are too restrictive 11. How can we overcome these problems? 12. The dream is alive. 13. Mesos is an operating system for your cluster that provides application level distributed computing Mesos helps bridge the gap between the hardware and your application (or 'framework', in Mesos terms) What's Mesos? 14. Why Mesos? yes, but... 15. I enjoy doing things the hard way. 16. I really enjoy doing things the hard way. 17. Hadoop on Mesos: Why? Formalized, scalable distributed computing Extensive toolset (Hive, Pig, Cascading, Cascalog, ...) Familiar to many ('gold standard') Hadoop as a distributed application (a novel concept!) Multiple versions of Hadoop (upgrade path) Why stop at Hadoop? There's more to do with our cluster! (Chronos, Storm, Jenkins, Spark, ...) and who has time to manage it? 18. Hadoop on Mesos: Goals Avoid complexity: rely on existing, vetted systems, where possible Hadoop on Mesos should behave like any other Hadoop Realize high resource utilization Minimize contention & starvation Make Hadoop a first class framework on Mesos 19. Hadoop terminology JobTracker: manages cluster resources, assigns tasks to TaskTrackers TaskTracker: manages individual map/reduce tasks, serves intermediate data amongst other TaskTrackers Job: collection of map and reduce tasks Task: one unit of work for a job (be it map or reduce) Slot: a task executor, is either map or reduce HDFS: distributed filesystem (outside scope) 20. Hadoop on Mesos: Challenges Availability: JobTracker must ensure adequate map and reduce slots are available for current & future jobs Capacity: how do you estimate capacity? How do you profile jobs? Optimization: general case, or specific cases? Per job resource allocation policies? Separate JobTrackers for different job types? 21. Hadoop on Mesos: Challenges Mesos reservations allow for reservation of slave resources for frameworks Hadoop FairScheduler supports role fair sharing and task pre-emption within JobTracker Resource reservations: handling competing frameworks on the same cluster 22. Hadoop on Mesos: Challenges Job Maps Reduces Duration Start 1 95 5 1h 0 2 5 100 1m 1m 3 10 10 30m 60m 4 50 0 20m 70m 5 100 5 1h 80m Maps Reduces 95 5 48 52 10 10 60 10 90 10 Job Flow With capacity for 100 slots A contrived example Maps Reduces 50 50 50 50 50 50 50 50 50 50 Ideal allocation Actual Hadoop 23. Hadoop on Mesos: What we did Mesos Scheduler is a thin layer atop the Hadoop scheduler JobTracker launches TaskTrackers for each job, using either a fixed or variable slot policy Fixed policy launches a fixed number of slots per TaskTracker Variable policy attempts to launch an ideal number of TaskTrackers and slots based on job queue Task scheduling is left to the underlying scheduler (i.e., Hadoop FairScheduler) 24. Suggested key configuration values Hadoop on Mesos: How we did it Name Value mapred.tasktracker.map.tasks.maximum 50 mapred.tasktracker.reduce.tasks.maximum 50 mapred.mesos.slot.map.minimum 1000 mapred.mesos.slot.reduce.minimum 1000 mapred.mesos.scheduler.policy.fixed false mapred.mesos.slot.cpus 0.95 mapred.mesos.slot.mem 1550 25. Engineering & analytics departments use Hive, Pig, Cascading and other tools on Hadoop: Building search indices Pricing suggestion system Trust & safety, fraud detection Business analytics Dealing with hypergrowth Case study: Airbnb 26. Had previously been using EMR, Amazon's managed Hadoop as a service EMR suffers from: limited Hive/Pig features feature lag inability to patch or modify Hadoop Data infrastructure was prone to error due to significant complexity EMR clusters would be spun up & destroyed every week accessing Hadoop required strange SSH 'hopping' Case study: Airbnb, yesterday 27. Case study: Airbnb, today We run Chronos, Hadoop, and Storm on Mesos now Finished complete migration to Mesos from EMR (June 2013) ~500 Chronos jobs ~20TiB of daily Hive data, ~1-2PiB of archived data 28. Data availability: all time high Eng. & analytics customer satisfaction through the roof Case study: Airbnb, today 29. Action shots 30. Action shots 31. Next steps Locality awareness HDFS on Mesos HA JobTracker JobTracker on Mesos 32. Links The code: https://github.com/airbnb/mesos Airbnb Engineering Blog: http://nerds.airbnb. com/ My other stuff: https://github. com/brndnmtthws email@example.com firstname.lastname@example.org 33. Thanks! 34. Questions?