Finagle, linkerd, and Apache Mesos: Twitter-style microservices at scale
TRANSCRIPT
Finagle, linkerd, and Mesos: Twitter-style microservices at scale
Oliver Gould, CTO, Buoyant
ApacheCon North America, May 2016
oliver gould • cto @ buoyant
• open-source microservice infrastructure
• previously, tech lead @ twitter: observability, traffic
• core contributor: finagle
• creator: linkerd
• loves: dogs
@olix0r · [email protected]
overview
• 2010: A Failwhale Odyssey
• Automating the Datacenter
• Microservices: A Silver Bullet
• Finagle: The Once and Future Layer 5
• Introducing linkerd
• Demo
• Q&A
2010: A FAILWHALE ODYSSEY
Twitter, 2010
10⁷ users
10⁷ tweets/day
10² engineers
10¹ services
10¹ deploys/week
10² hosts
0 datacenters
10¹ user-facing outages/week
https://blog.twitter.com/2010/measuring-tweets
The Monorail, 2010
10³s of RPS
10²s of RPS/host
10¹s of RPS/process
[Diagram: hardware lb, fronting the monorail, backed by mysql, memcache, kestrel]
Problems with the Monorail
• Ruby performance
• MySQL scaling
• Memcache operability
• Deploys
• Events
https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
Asymmetry
Photo by Troy Holden
Provisioning
automating the datacenter
mesos.apache.org
UC Berkeley, 2010
Twitter, 2011
Apache, 2012
Abstracts compute resources
Promise: don’t worry about the hosts
aurora.apache.org
Twitter, 2011
Apache, 2013
Schedules processes on Mesos
Promise: no more puppet, monit, etc.
[Diagram: Aurora (or Marathon, or …) schedules service instances (timelines, users, notifications: ×800, ×300, ×1000) onto hosts managed by Mesos.]
microservices: A SILVER BULLET
scaling teams
growing software
flexibility
performance · correctness · monitoring · debugging · efficiency · security
resilience
not a silver bullet. (sorry.)
Resilience is an imperative: our software runs on the truly dismal computers we call datacenters. Besides being heinously complex… they are unreliable and prone to operator error.
Marius Eriksen (@marius), RPC Redux
resilience in microservices
software you didn’t write
hardware you can’t touch
network you can’t configure
break in new and surprising ways
and your customers shouldn’t notice
resilient microservices means
resilient communication
datacenter:
[1] physical
[2] link
[3] network
[4] transport
(layers 1–4 provided by the platform: aurora, marathon, …; mesos; canal, weave, …; aws, azure, digitalocean, gce, …)
[5] session: rpc (http/2, mux, …)
[6] presentation: json, protobuf, thrift, …
[7] application: business logic (languages, libraries)
layer 5 dispatches requests onto layer 4 connections
finagle
THE ONCE AND FUTURE LAYER 5
github.com/twitter/finagle
RPC library (JVM)
asynchronous
built on Netty
scala
functional
strongly typed
first commit: Oct 2010
used by…
programming finagle

```scala
val users = Thrift.newIface[UserSvc]("/s/users")
val timelines = Thrift.newIface[TimelineSvc]("/s/timeline")

Http.serve(":8080", Service.mk[Request, Response] { req =>
  for {
    user     <- users.get(userReq(req))
    timeline <- timelines.get(user)
  } yield renderHTML(user, timeline)
})
```
operating finagle
transport security
service discovery
circuit breaking
backpressure
deadlines
retries
tracing
metrics
keep-alive
multiplexing
load balancing
per-request routing
service-level objectives
[Diagram: the Finagle client stack: Observe, Session timeout, Retries, Request draining, Load balancer, Monitor, Observe, Trace, Failure accrual, Request timeout, Pool, Fail fast, Expiration, Dispatcher.]
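One module from that stack, failure accrual, can be modeled as a counter of consecutive failures. Below is a minimal, hypothetical sketch of the idea; Finagle's real policies are richer (e.g. success-rate windows and timed revival), so treat this as an illustration, not the library's API.

```scala
// Minimal sketch of failure accrual: after `threshold` consecutive
// failures a node is marked dead and taken out of rotation until a
// later success revives it. (Hypothetical model, not Finagle's API.)
final class FailureAccrual(threshold: Int) {
  private var consecutiveFailures = 0
  private var dead = false

  def isAvailable: Boolean = !dead

  def recordSuccess(): Unit = {
    consecutiveFailures = 0
    dead = false
  }

  def recordFailure(): Unit = {
    consecutiveFailures += 1
    if (consecutiveFailures >= threshold) dead = true
  }
}
```

The load balancer consults `isAvailable` and simply stops sending traffic to nodes that are accruing failures.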
“It’s slow” is the hardest problem you’ll ever debug.
Jeff Hodges (@jmhodges), Notes on Distributed Systems for Young Bloods
the more components you deploy, the more problems you have
lb algorithms:
• round-robin
• fewest connections
• queue depth
• exponentially-weighted moving average (ewma)
• aperture
load balancing at layer 5
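The ewma algorithm above can be sketched as follows: keep an exponentially-weighted moving average of each node's observed latency and prefer the node with the lowest estimate. This is a simplified illustration under that assumption; Finagle's peak-EWMA balancer also weights by outstanding load.

```scala
// EWMA latency estimate per node: new observations are blended into
// the running estimate with weight `alpha`.
final class Ewma(alpha: Double) {
  private var estimate = 0.0
  private var seeded = false

  def observe(latencyMs: Double): Unit =
    if (!seeded) { estimate = latencyMs; seeded = true }
    else estimate = alpha * latencyMs + (1 - alpha) * estimate

  def value: Double = estimate
}

// Pick the node with the lowest latency estimate.
def pickLowest(nodes: Map[String, Ewma]): String =
  nodes.minBy { case (_, e) => e.value }._1
```

A slow node's estimate rises with each bad observation, so it naturally receives less traffic, and recovers as its latency improves.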
timeouts & retries
[Diagram: call chain web → timelines → users → db, annotated per hop: timeout=400ms retries=3; timeout=400ms retries=2; timeout=200ms retries=3.]
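The danger with naive per-hop retries is that they multiply down the chain. Assuming retries=n means up to n extra attempts (so n + 1 tries) per logical call, the worst case compounds:

```scala
// Worst-case amplification of nested retries along a call chain,
// e.g. web -> timelines -> users -> db.
def worstCaseTries(retriesPerHop: Seq[Int]): Int =
  retriesPerHop.map(_ + 1).product

// With per-hop retries of 3, 2, and 3, one web request can in the
// worst case produce 4 * 3 * 4 = 48 tries against the database.
val triesAtDb = worstCaseTries(Seq(3, 2, 3))
```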
deadlines
[Diagram: the same chain with deadline propagation: web sets timeout=400ms; after 177ms elapsed the request carries deadline=223ms; after 213ms more elapsed, deadline=10ms.]
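Deadline propagation replaces independent per-hop timeouts with a single time budget that travels with the request; each hop subtracts the time already spent before calling downstream. A sketch of the arithmetic:

```scala
// Remaining deadline after some elapsed time, floored at zero.
def remainingMs(budgetMs: Long, elapsedMs: Long): Long =
  math.max(0L, budgetMs - elapsedMs)

// The slide's numbers: web starts with a 400ms timeout; after 177ms
// the propagated deadline is 223ms; after another 213ms only 10ms
// remains for the final hop.
val afterFirstHop  = remainingMs(400L, 177L)          // 223
val afterSecondHop = remainingMs(afterFirstHop, 213L) // 10
```

A downstream service receiving deadline=10ms can fail fast instead of doing work its caller will never see.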
retry budget
typical: retries=3, worst case: 300% more load!!!
better: retryBudget=20%, worst case: 20% more load
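A retry budget bounds retry load as a fraction of original traffic: each original request deposits a fraction of a retry token, and a retry is only allowed if a whole token is available. This is a simplified sketch of the idea; Finagle's actual RetryBudget is a token bucket over a sliding time window.

```scala
// Integer token accounting: each original request deposits
// `percent`% of one token (1 token = 1000 millitokens); each retry
// withdraws a whole token if available.
final class RetryBudget(percent: Int) {
  private var millitokens: Long = 0L

  def deposit(): Unit = millitokens += percent * 10L

  def tryWithdraw(): Boolean =
    if (millitokens >= 1000L) { millitokens -= 1000L; true } else false
}
```

With percent = 20, a burst of 100 original requests funds at most 20 retries: 20% extra load in the worst case, rather than 300%.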
tracing
layer 5 routing
applications refer to logical names
requests are bound to concrete names
delegations express routing
/s/users
/io.l5d.zk/prod/users
/s => /io.l5d.zk/prod/http
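Delegation can be modeled as prefix rewriting on paths. The sketch below is a deliberately simplified, one-step model: real dtabs support recursion, alternation, and precedence, and the rule used here (`/s => /io.l5d.zk/prod`) is an assumed variant chosen so the logical name above maps to the concrete name shown.

```scala
// One-step delegation: the first dtab rule whose prefix matches the
// logical name rewrites it to a concrete name.
def delegate(name: String, dtab: Seq[(String, String)]): String =
  dtab.collectFirst {
    case (prefix, target) if name.startsWith(prefix) =>
      target + name.stripPrefix(prefix)
  }.getOrElse(name)

val concrete = delegate("/s/users", Seq("/s" -> "/io.l5d.zk/prod"))
// "/io.l5d.zk/prod/users"
```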
per-request routing: staging
GET / HTTP/1.1
Host: mysite.com
Dtab-local: /s/B => /s/B2
per-request routing: debug proxy
GET / HTTP/1.1
Host: mysite.com
Dtab-local: /s/E => /s/P/s/E
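The Dtab-local header works because per-request rules are appended after the base dtab, and later rules win for the names they match, so a single request can be steered to a staging or debug instance without touching global config. A simplified one-step model of that precedence:

```scala
// Resolve a logical name against base dtab rules plus per-request
// Dtab-local rules; later rules take precedence, so local overrides
// win over base routes.
def resolve(
    name: String,
    base: Seq[(String, String)],
    local: Seq[(String, String)]
): String =
  (base ++ local).reverse.collectFirst {
    case (prefix, target) if name.startsWith(prefix) =>
      target + name.stripPrefix(prefix)
  }.getOrElse(name)

// Staging example: `Dtab-local: /s/B => /s/B2` overrides the base
// route for /s/B on this request only.
val routed = resolve("/s/B", Seq("/s/B" -> "/s/B1"), Seq("/s/B" -> "/s/B2"))
// "/s/B2"
```

Names the local rules don't match still follow the base dtab, so the override is scoped to exactly the service being tested.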
so all i have to do is rewrite my app in scala?
linkerd
github.com/buoyantio/linkerd
microservice rpc proxy
layer-5 router
aka l5d
built on finagle & netty
pluggable
http, thrift, …
etcd, consul, kubernetes, marathon, zookeeper, …
…
magic resiliency sprinkles
transport security
service discovery
circuit breaking
backpressure
deadlines
retries
tracing
metrics
keep-alive
multiplexing
load balancing
per-request routing
service-level objectives
[Diagram: each service instance (Service A, Service B, Service C) runs with a linkerd instance alongside it, proxying its RPC.]
namerd
released in March
centralized routing policy
delegates logical names to service discovery
pluggable:
etcd
kubernetes
zookeeper
…
demo: gob’s microservice
[Diagram, built up in steps: web and wordgen linked through per-instance l5d routers; a gen-v2 instance added behind l5d; namerd added to coordinate routing policy.]
[Diagram, built up in steps: a DC/OS cluster with a master running marathon and zookeeper, public and private nodes behind ELBs; linkerd running on every node; namerd alongside; demo services: web (×1), gen (×3), word (×3), word-growthhack (×3).]
github.com/buoyantio/linkerd-examples
linkerd roadmap
• Netty 4.1
• HTTP/2 + gRPC (linkerd#174)
• TLS client certs
• Richer routing policies
• Announcers
• More configurable everything
more at linkerd.io
slack: slack.linkerd.io
email: [email protected]
twitter:
• @olix0r
• @linkerd
thanks!