Scalable On-Demand Hadoop Clusters with Docker and Mesos

Post on 28-Jul-2015

TRANSCRIPT

1. Scalable On-Demand Hadoop Clusters with Docker and Mesos
   Andrew Nelson, Nutanix, @vmwnelson, http://virtual-hiking.blogspot.com
   Chris Mutchler, VMware, @chrismutchler, http://virtualelephant.com

2. Agenda
   - New Approach for Hadoop Ops
   - Infrastructure Resource Considerations
   - Docker as the new Unit of Work
   - Future Work

3. Last Year's State of the Art
   - Self-service and multi-tenant Hadoop
   - Elastic and decoupled infrastructure
   - Extensible blueprinting

4. New Goals
   - Operationalize multiple frameworks
   - Decoupled service architecture
   - Flexible and developer-friendly form factor

5. Apache Mesos Introduction
   - Started at Berkeley
   - Graduated to top-level Apache project in 2013
   - Commercial entity is Mesosphere
   - https://github.com/apache/mesos/

6. Mesos Architecture
   [Figure] Source: http://mesos.apache.org/assets/img/documentation/architecture3.jpg

7. Mesos as a Multi-Tenant Resource Pool
   [Figure] Source: https://github.com/mesos/myriad/blob/phase1/docs/how-it-works.md

8. Tools to Build and Scale
   - Serengeti, VMware: https://github.com/vmware-serengeti
   - BOSH, Pivotal: https://github.com/cloudfoundry/bosh
   - Cloudify, GigaSpaces: https://github.com/CloudifySource/cloudify
   - Cloudbreak, SequenceIQ: https://github.com/sequenceiq/cloudbreak

9. Advantages for Ops
   - Mesos as a Resource Pool
   - Multiple concurrent frameworks
   - Decouple frameworks from resource pools

10. Compute Partitions on Mesos
   [Diagram: shared versus siloed compute partitions. Labels: Hadoop, Storm, Spark, Kafka, Cassandra, Marathon]

11. HDFS as a Service
   [Diagram labels: Namenode, Standby Namenode, Secondary Namenode, HDFS, MapReduce, Spark, Hive, Storm]

12. Networking Services
   - Service Discovery
     - Handled per framework
     - Port range resource managed by Mesos slave
     - For example, Marathon uses HAProxy for request routing
   - Per-container network monitoring
   - Egress rate-limiting

13. Scheduling Options
   - Mesos scheduling
     - Capacity Scheduler
     - Fair Scheduler
   - Tenant scheduling examples
     - Hadoop on Mesos
     - Myriad (YARN) on Mesos

14. Dev Workflow
   - Code Repo / Registry
     - Pull / Push / Commit / Run
     - Automated Builds
     - Version tagging
   - Marathon
     - CI / CD
     - Dependencies
     - Rolling restarts

15. Registry Services
   - Pluggable storage
   - Webhooks
   - Image control
   - Security
   - Logging
   [Diagram labels: Registry, Repository, Image]

16. Advantages for Developers
   - Interchangeable verbs for code and containers
   - Choice of framework to use as their PaaS
   - Adopt microservices approach to app pipeline

17. Recommendations for Success
   - Start small, scale fast
   - Use most appropriate framework for the job
   - Think ahead, decouple
   - Plan for rolling restart capacity up front

18. Gap Analysis
   - Be prepared to look under the hood
   - Variable maturity and resiliency of the layers
   - Networking
   - Security

19. Where Are We Going Next
   - Scale and learn
   - Container-focused OS
   - Software-defined networking services
   - Discover key performance and availability metrics

20. Wrapping up
   - Mesos allows for choice of framework
   - Devs utilize Docker with familiar workflow
   - Portable, flexible, and scalable architecture
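
Editor's sketch (not from the slides): slide 14 lists rolling restarts under Marathon, and slide 17 recommends planning rolling restart capacity up front. The Python below shows one rough way to estimate that headroom from a Marathon-style app definition. The field names (`instances`, `cpus`, `mem`, `container.docker.image`, `upgradeStrategy.minimumHealthCapacity`, `upgradeStrategy.maximumOverCapacity`) follow Marathon's app JSON; the app id, image name, sizes, and the helper `rolling_restart_headroom` are hypothetical, and the rounding is a planning approximation rather than a statement of Marathon's exact behavior.

```python
import json
import math

# Hypothetical Marathon app definition for a Dockerized worker.
# The id, image, and resource sizes are illustrative assumptions.
app = {
    "id": "/hadoop/worker",
    "instances": 10,
    "cpus": 2.0,
    "mem": 4096,  # MB per instance
    "container": {
        "type": "DOCKER",
        "docker": {"image": "example/hadoop-worker:latest"},
    },
    "upgradeStrategy": {
        "minimumHealthCapacity": 0.8,  # fraction that should stay healthy
        "maximumOverCapacity": 0.2,    # fraction of extra instances allowed
    },
}


def rolling_restart_headroom(app):
    """Estimate the spare capacity a rolling restart of `app` may need."""
    n = app["instances"]
    strategy = app["upgradeStrategy"]
    # round() guards against floating-point noise before ceil/floor.
    min_healthy = math.ceil(round(n * strategy["minimumHealthCapacity"], 9))
    extra = math.floor(round(n * strategy["maximumOverCapacity"], 9))
    return {
        "min_healthy_instances": min_healthy,
        "extra_instances": extra,
        "extra_cpus": extra * app["cpus"],
        "extra_mem_mb": extra * app["mem"],
    }


if __name__ == "__main__":
    print(json.dumps(app, indent=2))
    print(rolling_restart_headroom(app))
```

With these assumed values, the estimate is two extra instances (4 CPUs and 8192 MB) of free capacity in the resource pool while the restart is in progress. If no spare capacity is available, the alternative is a lower `minimumHealthCapacity`, which trades headroom for reduced service capacity during the restart.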