How Yelp Uses Mesos to Power Its SOA Infrastructure
● Scheduling: puppet manifests
● Delivery:
○ service code: rsync + cron jobs
○ dependencies: puppet + apt
● Discovery: hand-crafted LB configs + VIPs
● Alerting: hand-crafted nagios configs
Ye Olde Yelp SOA
● Scheduling: yelpsoa-configs
● Delivery:
○ service code: rsync + cron jobs
○ dependencies: virtualenv, some puppet
● Discovery: Smartstack
● Alerting: Sensu based on yelpsoa-configs
Not-so-Olde Yelp SOA
● All services on all machines?
○ Tough if your memory footprint is too big
● Some subset on each?
○ What are the biggest resource users?
● Which services actually need resources?
What if you buy more machines? How do you redistribute load?
● Hope you have a box with spare capacity
● Spin up more boxes? See previous slide.
What if a service needs more CPU?
● virtualenv for python libs
● what about C shared libraries?
What if services have conflicting dependencies?
● virtualenv for python libs
● What about different interpreters?
● Different HTTP server?
What if a service has unusual dependencies?
PaaSTA
● Scheduling: Mesos+Marathon
● Delivery: Docker
● Discovery: Smartstack
● Alerting: Sensu based on yelpsoa-configs
Scheduling in PaaSTA: Mesos and Marathon
● Mesos is an "SDK for distributed systems", batteries not included.
● Requires a framework
○ Marathon (like ASG for Mesos)
○ Chronos (Periodic tasks)
● Supports Docker as task executor
● Containers: like lightweight VMs
● Provide language (Dockerfile) for describing container image
● Reproducible builds (mostly)
● Provides software flexibility
Delivery in PaaSTA: Docker
How do we configure Marathon?
● Need a wall around Marathon: it has root on your entire cluster.
● Cron job
● Combines yelpsoa-configs and currently-blessed docker image
● # instances
● cpu
● mem
● healthcheck timeouts, grace period
● bounce strategy
● cmd / args
marathon.yaml
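The cron job described above can be sketched roughly as follows. This is a minimal illustration, not PaaSTA's actual code: the function name, the config keys (`instances`, `cpus`, `mem`, `cmd`, `args`, and the two healthcheck keys), the default values, the `/status` path, and the `service.instance` app id are all assumptions. The output keys follow the Marathon app definition format. Bounce strategy is left out because it is not a Marathon app field.

```python
def build_marathon_app(service, instance, config, docker_image):
    """Combine one marathon.yaml entry with the blessed image into a Marathon app dict."""
    app = {
        # Hypothetical naming scheme for the Marathon app id.
        "id": f"/{service}.{instance}",
        "instances": config.get("instances", 1),
        "cpus": config.get("cpus", 0.25),
        "mem": config.get("mem", 1024),
        "container": {
            "type": "DOCKER",
            "docker": {"image": docker_image},
        },
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/status",  # assumed healthcheck endpoint
                "gracePeriodSeconds": config.get("healthcheck_grace_period_seconds", 60),
                "timeoutSeconds": config.get("healthcheck_timeout_seconds", 10),
            }
        ],
    }
    # cmd and args are optional; only pass through what the config sets.
    for key in ("cmd", "args"):
        if key in config:
            app[key] = config[key]
    return app


example = build_marathon_app(
    "example_service",
    "main",
    {"instances": 3, "cpus": 0.5, "mem": 512, "cmd": "python -m example"},
    "registry.example.com/example_service:12abcd34",
)
```

Keeping this step in one small function is what makes the "wall around Marathon" practical: service authors only edit the config, and only the cron job talks to Marathon.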
● Distribution with private registry
● S3 bucket shared among all environments
● Bless images by creating git tags
○ 1:1 git commit <-> docker image
Building/shipping Docker images
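One way to picture the 1:1 mapping between git commits and docker images is the sketch below. The tag name format (`blessed/<environment>`), the registry host, and both function names are hypothetical and chosen only for illustration; the slides do not say how the tags or images are actually named.

```python
def image_for_commit(registry, service, sha):
    """Name the docker image for a commit, so each commit maps to exactly one image."""
    return f"{registry}/{service}:{sha}"


def blessed_image(tags, environment, registry, service):
    """Look up the commit blessed for an environment and return its image name.

    `tags` maps git tag names to commit shas. Returns None if nothing is blessed.
    """
    sha = tags.get(f"blessed/{environment}")  # hypothetical tag naming scheme
    if sha is None:
        return None
    return image_for_commit(registry, service, sha)


tags = {"blessed/stage": "99ffee00", "blessed/prod": "12abcd34"}
prod_image = blessed_image(tags, "prod", "registry.example.com", "example_service")
dev_image = blessed_image(tags, "dev", "registry.example.com", "example_service")
```

Because the image name is derived from the commit, blessing a different commit is just a tag change in git; no image has to be rebuilt or renamed.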
● configure_nerve.py queries local mesos-slave API
● This means registrations work even when Mesos master or Marathon is down
● Provides backwards compatibility with older SOA
Registering with SmartStack
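A rough sketch of the idea behind configure_nerve.py: read the local mesos-slave state and list the running tasks with their host ports, which is what nerve needs in order to register them. This is not the real script. The function names are made up, and the state layout (`frameworks` → `executors` → `tasks`, with ports as a string such as `"[31000-31000]"`) is an assumption about the JSON the slave returns; the dict would normally come from the slave's HTTP state endpoint.

```python
def parse_ports(ports):
    """Expand a ports string such as "[31000-31001, 31005-31005]" into a list of ints."""
    result = []
    for chunk in ports.strip("[]").split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        first, _, last = chunk.partition("-")
        result.extend(range(int(first), int(last or first) + 1))
    return result


def running_service_ports(slave_state):
    """Return (task name, first host port) for every running task on this slave."""
    found = []
    for framework in slave_state.get("frameworks", []):
        for executor in framework.get("executors", []):
            for task in executor.get("tasks", []):
                if task.get("state") != "TASK_RUNNING":
                    continue
                ports = parse_ports(task.get("resources", {}).get("ports", ""))
                if ports:
                    found.append((task["name"], ports[0]))
    return found


sample_state = {
    "frameworks": [{
        "executors": [{
            "tasks": [
                {"name": "example_service.main", "state": "TASK_RUNNING",
                 "resources": {"ports": "[31000-31000]"}},
                {"name": "old_service.main", "state": "TASK_FINISHED",
                 "resources": {"ports": "[31001-31001]"}},
            ]
        }]
    }]
}
registrations = running_service_ports(sample_state)
```

Nothing here contacts the Mesos master or Marathon, which is why registration keeps working while either of them is down.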
[Architecture diagram. Components: Jenkins, Registry, mesos-slave, mesos-master, Marathon, yelpsoa-configs, docker, nerve, client, synapse, HAProxy, Zookeeper, git. Legend: code, metadata, traffic.]
● Describe end goal, not path
● Helps achieve fault tolerance
"Deploy 12abcd34 to prod" vs.
"Commit 12abcd34 should be running in prod"
Gas pedal vs. Cruise Control
Declarative control
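The "cruise control" idea can be sketched as a reconcile step: compare what should be running with what is running, and emit only the actions needed to close the gap. The function name, the action tuples, and the second service in the example are illustrative assumptions, not PaaSTA's actual interface.

```python
def reconcile(desired, running):
    """Return the actions that move `running` toward `desired`.

    Both arguments map a service name to the commit it should run / is running.
    """
    actions = []
    for service, commit in sorted(desired.items()):
        if running.get(service) != commit:
            actions.append(("deploy", service, commit))
    for service in sorted(running):
        if service not in desired:
            actions.append(("stop", service, running[service]))
    return actions


desired = {"example_service": "12abcd34"}
running = {"example_service": "deadbeef", "retired_service": "0000aaaa"}

first_pass = reconcile(desired, running)
# Once the cluster matches the desired state, there is nothing left to do.
second_pass = reconcile(desired, desired)
```

Run from a periodic job, a step like this is safe to repeat and recovers on its own after a missed or failed run, which is where the fault tolerance comes from.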
Reading Comprehension Question
A. To describe how cool Yelp's PaaSTA system is
B. To tease viewers about a system that is not open source yet?
D. To confuse viewers into making them consider yet another docker-based PaaS
C. To inspire viewers to build their own bespoke PaaS based on Mesos with some of these ideas.