Scaling Like Twitter with Apache Mesos

Download Scaling Like Twitter with Apache Mesos

Post on 06-Jan-2017

1.095 views

Category:

Technology

0 download

TRANSCRIPT

#SCALING LIKE TWITTER WITH APACHE MESOSPhilip Norman & Sunil Shah

2015 Mesosphere, Inc. All Rights Reserved.

#Dan the Datacenter OperatorDoesnt sleep very wellLoves automationWants to control what runs in his datacenterAlice the Application DeveloperFinds setting up infrastructure tediousWants her application to be deployed as quickly as possibleMODERN INFRASTRUCTURE

2015 Mesosphere, Inc. All Rights Reserved.

#Clean separation of responsibilitiesNo more 3am wake upsEasy programmatic deployment3 TENETSModern Infrastructure

2015 Mesosphere, Inc. All Rights Reserved.

#CLEAN SEPARATIONModern InfrastructureBeforeDan cares about his hardware and Alices software that runs on it

Alice cares about her software and what hardware Dan providesNow

With Mesos, all the nodes are provisioned exactly the same (but may have heterogenous hardware).

Dan doesnt care what software is deployed since applications are well encapsulated.

Alice doesnt care where her software is deployed because its easy enough to scale up and down.

2015 Mesosphere, Inc. All Rights Reserved.

In the old world, both our datacenter operator Dan and our application developer Alice had to be concerned about both the physical infrastructure running an application and the actual configuration and deployment of an application.- Dan would have to know how to run Alice's application in case a node went away.- We get a painful situation where Dan has to treat each of his machines like carefully crafted snowflakes.- Alice would have to know how much hardware to ask Dan for, and would usually overestimate to be safe.- This means that resource utilisation is often pretty bad.

Now:- In the Mesos world, you have a homogenous cluster running your applications. It doesn't matter where your software gets deployed since it brings its dependencies with it.- Dan doesn't have to worry about specific machines going away and disproportionately impacting specific applications.- Alice doesn't have to worry too much about requesting enough resources ahead of time, because it's quick and easy to scale her application.

#NO MORE 3AM WAKE UPSModern InfrastructureBeforeDan had to react every time an application or machine went down.NowMesos and Marathon monitor running tasks.

If a task fails or is lost (due to a machine going offline), Mesos communicates that to Marathon.

Marathon restarts the application.

Dan gets to sleep peacefully!

2015 Mesosphere, Inc. All Rights Reserved.

If an application or server went down, your infrastructure might have enough slack to tolerate it but often that's not possible. Your operator and/or developer gets paged and has to restart the application manually or try to revive the faulty hardware.Now- Mesos and Marathon provide a health checking mechanism that allows them to tell if an application is up and responsive. If it's not, they'll reap the application and start it again somewhere else.- Similarly, if a machine goes down, Mesos communicates the task failure to Marathon. It will try to maintain the desired state (e.g. 100 instances) and re-deploy the application somewhere else.- Dan doesn't get woken up because DCOS takes care of these failures.

#EASY PROGRAMMATIC DEPLOYMENTModern InfrastructureBeforeServers were handcrafted.

Deploying new or updated software would require oversight and involvement from both Alice and Dan.NowDan provides Alice with her own instance of Marathon that makes it hard for her to take down someone elses application.

Running applications are isolated from each other by Mesos.

Marathon offers a nice API that allows Alice to easily deploy new versions safely.

2015 Mesosphere, Inc. All Rights Reserved.

- Provisioning machines was a privileged action. Alice would have to petition Dan for an upgrade. Changes were slow.Now-- Dan can provide Alice with her own instance of Marathon. These tasks are inherently isolated from other tasks running on the same cluster. Alice is able to do what she run what she likes using this instance (with some reasonable constraints).- Alice can now programmatically deploy applications to Marathon using its APIs.

#Apache MesosLAYER OF ABSTRACTION

2015 Mesosphere, Inc. All Rights Reserved.

Coordinator -> schedulerSlaves -> agents

#Apache MesosINTRODUCTIONApache Mesos is a cluster resource manager.

It handles:Aggregating resources and offering them to schedulersLaunching tasks (i.e. processes) on those resourcesCommunicating the state of those tasks back to schedulersINTRODUCTIONApache Mesos

2015 Mesosphere, Inc. All Rights Reserved.

#PRODUCTION CUSTOMERS AND MESOS USERS

Government Agencies

2015 Mesosphere, Inc. All Rights Reserved.

#MESOS: ORIGINS

2015 Mesosphere, Inc. All Rights Reserved.

#THE BIRTH OF MESOSTWITTER TECH TALKThe grad students working on Mesos give a tech talk at Twitter.March 2010APACHE INCUBATIONMesos enters the Apache Incubator.

Spring 2009CS262BBen Hindman, Andy Konwinski and Matei Zaharia create Nexus as their CS262B class project.

MESOS PUBLISHEDMesos: A Platform for Fine-Grained Resource Sharing in the Data Center is published as a technical report.September 2010

December 2010

2015 Mesosphere, Inc. All Rights Reserved.

Spring 2009: 262B project between BenH, Andy and Matei2009: Nexus project started at UC Berkeleys RAD Lab (later renamed to Mesos)March 2010: Grad students give talk at Twitter, Twitter starts exploring deploymentSeptember 2010: Mesos technical report published (http://mesos.berkeley.edu/mesos_tech_report.pdf)December 2010: Mesos enters Apache Incubator (http://incubator.apache.org/projects/mesos.html)

#

Sharing resources between batch processing frameworksHadoopMPISpark

What does an operating system provide?Resource managementProgramming abstractionsSecurityMonitoring, debugging, loggingTECHNOLOGYVISION

2015 Mesosphere, Inc. All Rights Reserved.

#ARCHITECTUREMESOS FUNDAMENTALS

2015 Mesosphere, Inc. All Rights Reserved.

Slave -> agent

#ARCHITECTUREMESOS FUNDAMENTALSAgents advertise resources to MasterMaster offers resources to SchedulerScheduler rejects/uses resourcesAgents report task status to Master

2015 Mesosphere, Inc. All Rights Reserved.

#ARCHITECTUREMESOS FUNDAMENTALSAgents advertise resources to MasterMaster offers resources to SchedulerScheduler rejects/uses resourcesAgents report task status to Master

2015 Mesosphere, Inc. All Rights Reserved.

#ARCHITECTUREMESOS FUNDAMENTALSAgents advertise resources to MasterMaster offers resources to SchedulerScheduler rejects/uses resourcesAgents report task status to Master

2015 Mesosphere, Inc. All Rights Reserved.

#ARCHITECTUREMESOS FUNDAMENTALSAgents advertise resources to MasterMaster offers resources to SchedulerScheduler rejects/uses resourcesAgents report task status to Master

2015 Mesosphere, Inc. All Rights Reserved.

#ARCHITECTUREMESOS FUNDAMENTALSAgents advertise resources to MasterMaster offers resources to SchedulerScheduler rejects/uses resourcesAgents report task status to Master

2015 Mesosphere, Inc. All Rights Reserved.

#A naive approach to handling varied app requirements: static partitioning.

This can cope with heterogeneity, but is very expensive.KEEP IT STATICtime

2015 Mesosphere, Inc. All Rights Reserved.

#Maintaining sufficient headroom to handle peak workloads on all partitions leads to poor utilisation overall.KEEP IT STATICtime

2015 Mesosphere, Inc. All Rights Reserved.

#Multiple frameworks can use the same cluster resources, with their share adjusting dynamically.SHARED RESOURCEStime

2015 Mesosphere, Inc. All Rights Reserved.

#TWITTER & MESOS

2015 Mesosphere, Inc. All Rights Reserved.

Sunil starts here

Lets talk a little about how Mesos had so much utility for Twitter

#THE BIRTH OF MESOSTWITTER TECH TALKThe grad students working on Mesos give a tech talk at Twitter.March 2010APACHE INCUBATIONMesos enters the Apache incubator.

Spring 2009CS262BBen Hindman, Andy Konwinski and Matei Zaharia create Nexus as their CS262B class project.

MESOS PUBLISHEDMesos: A Platform for Fine-Grained Resource Sharing in the Data Center is published as a technical report.September 2010

December 2010

2015 Mesosphere, Inc. All Rights Reserved.

When Mesos was presented at Twitter, engineers in the audience immediately saw potential. A couple of engineers, who had been at Google, saw this as a possible replacement for Borg.

#Former Google engineers at Twitter thought Mesos could provide the same functionality as Borg.

Mesos actually works pretty well for long running services.MESOS REALLY HELPS

2015 Mesosphere, Inc. All Rights Reserved.

Mesos actually works pretty well for long running services.

#MESOS WITH MARATHON IN PRODUCTION

2015 Mesosphere, Inc. All Rights Reserved.

#WHAT IS MARATHON?Mesos with Marathon in ProductionService scheduler for Mesosinit.d for long-running appsYour own private PaaS

2015 Mesosphere, Inc. All Rights Reserved.

#WHAT IS MARATHON?Mesos with Marathon in Production

2015 Mesosphere, Inc. All Rights Reserved.

#USEFUL MARATHON FEATURESMesos with Marathon in ProductionStart, stop, scale, update appsHighly available, no SPoFNative Docker supportPowerful Web UIFully featured REST APIPluggable event busArtifact staging

2015 Mesosphere, Inc. All Rights Reserved.

#USEFUL MARATHON FEATURES: DEPLOY LIKE FACEBOOKMesos with Marathon in ProductionApplication versioningRolling deploy / restartDeployment strategies

2015 Mesosphere, Inc. All Rights Reserved.

#USEFUL MARATHON FEATURES: DEPLOY LIKE A TELCOMesos with Marathon in ProductionApplication versioningHot/hot new/old clustersAuthentic scale testingManual *and* automated testing

2015 Mesosphere, Inc. All Rights Reserved.

#MESOS WITH MARATHON IN ACTION

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#Mesos with Marathon in ActionTASK FAILURE

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#

2015 Mesosphere, Inc. All Rights Reserved.

#How do my applications discover each other?

Two main service discovery mechanisms:DNS based (Mesos-DNS)HAProxy based (Marathon-lb)SERVICE DISCOVERYMesos with Marathon in Production

2015 Mesosphere, Inc. All Rights Reserved.

This a little more involved.- Marathon takes care of deploying your application, just post marathon.json to http://dcos-host/service/my-marathon

- It has different strategies, default is to start as many healthy new instances of the app as are currently running before killing the old app.

- Configurable.

#MESOS-DNSService Discovery

Ingests cluster state periodically.

Uses cluster state to generate DNS records for all running Mesos tasks.

Services query DNS server to discover IP address and port of other services.

Primarily used for internal service discovery.

No extra configuration required!

2015 Mesosphere, Inc. All Rights Reserved.

#MARATHON-LBService Discovery

Ingests state of running Marathon applications.

Regenerates HAProxy configuration.

Supports virtual hosts!

Can be used for both internal and external service discovery.

Must add HAPROXY_GROUP and HAPROXY_0_VHOST variables to your marathon.json.

2015 Mesosphere, Inc. All Rights Reserved.

#Using chef/puppet/ansible (or a reliable intern)Install ZooKeeper and MesosInstall your scheduler (Marathon)Deploy some long-running services.See https://open.mesosphere.com/getting-started/tools/ for more docsHOW TO DEPLOY A MESOS CLUSTER (THE HARD WAY)Mesos as the Datacenter Kernel

2015 Mesosphere, Inc. All Rights Reserved.

#Visit http://mesosphere.comHit the Get Started buttonHOW TO DEPLOY A MESOS CLUSTER (OUR WAY)Mesos as the Datacenter Kernel

2015 Mesosphere, Inc. All Rights Reserved.

#

Existing InfrastructureMesosphere DCOSServices & ContainersMESOS AS THE DATACENTER KERNEL

2015 Mesosphere, Inc. All Rights Reserved.

#Jenkins on Mesos allows you to share build resources between multiple Jenkins masters.

PayPal does this with hundreds of Jenkins mastersBetween them, they use less than a hundred build slaves to service several thousand developers.Combining Jenkins with a PaaS like Marathon or Kubernetes allows you to practice easy continuous deployment.JENKINS: BUILD RESOURCE POOLINGMesos as the Datacenter Kernel

2015 Mesosphere, Inc. All Rights Reserved.

#

An example continuous deployment pipeline using Mesos, Marathon, Docker and Jenkins on Mesos.

2015 Mesosphere, Inc. All Rights Reserved.

#CONTINUOUS DELIVERY DEMO

2015 Mesosphere, Inc. All Rights Reserved.

#KUBERNETES ON MESOSMesos as the Datacenter Kernel

2015 Mesosphere, Inc. All Rights Reserved.

#KUBERNETES ON MESOSMesos as the Datacenter Kernel

2015 Mesosphere, Inc. All Rights Reserved.

#Mesos was built for and is great for running big data workloads:Chronos (time scheduled jobs)SparkCassandraKafkaHadoop/YARN (via Myriad)BIG DATA ON MESOSMesos as the Datacenter Kernel

2015 Mesosphere, Inc. All Rights Reserved.

QUESTIONS?THANK YOU!#Come and talk to us!Email us at philip@mesosphere.io, sunil@mesosphere.ioSlides will be up at http://mesosphere.github.io/presentations