Computing at Scale with Finagle (monkey.orgmarius/wt294.pdf)

Marius Eriksen · EECS 294 · March 22, 2015 · @marius · #wt294



Page 2

Setting

The modern computing environment is distributed.
• Datacenters are the only way we know how to build big, cheap computers.

Internet services must be “carrier grade.”
• Their usefulness is directly related to their availability.

Utility computing means environment variability.
• You get what you pay for: resources may be revoked at any time; you may have noisy neighbors.

Page 3

The dismal computer

A datacenter is a really crappy computer. Datacenters:
• have deep memory hierarchies,
• exhibit partial failures,
• have dynamic topologies,
• are heterogeneous,
• are connected via asynchronous networks,
• make lots of room for operator error,
• and are very complex.

But they’re what we’ve got. We need to gain reliability, safety, and efficiency through software.

Page 4

It gets worse

Most of our tools, languages, tribal knowledge, and capacity for sophistication are geared towards local computing, for example:
• debuggers,
• profilers,
• linkers/loaders,
• memory models,
• runtimes, and
• type systems.

The very model of computing changes.

Page 5

Problems

How do we harness datacenter computing?

How do we provide a sane programming model?

Can we recoup what is lost?

Page 6

What I’m about to talk about, IPC/RPC systems, is just a small, though important, piece of this puzzle.

At the end of the day, the implications of datacenter computing are pervasive, and must be accounted for in every layer.

What we can do, however, is provide good tools to solve these problems. That’s what I’m talking about today.

Page 7

The service model

A service is an autonomous, asynchronous, isolated, and failure-explicit module.

Services compose other services. A system is a graph of services.

Services operate concurrently.

Page 8

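Figure: an example system as a graph of services, arranged in four tiers and connected by HTTP, Thrift, and “Stuff.”
ROUTING: TFE
PRESENTATION: API, Web, Monorail
LOGIC: Tweet, User, Timeline, Social Graph, DMs
STORAGE & RETRIEVAL: Redis, Memcache, Flock, T-Bird, MySQL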

Page 9

Services in Finagle

    // A service is a function that takes a
    // Req-typed value, returning a
    // Future of a Rep-typed value.
    //
    // It’s an asynchronous function.
    trait Service[Req, Rep] extends (Req => Future[Rep])
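Finagle's `Future` is Twitter's `com.twitter.util.Future`. So that the sketch below runs with only the Scala standard library, it swaps in `scala.concurrent.Future`. That is an assumption, not Finagle's API. Otherwise the `Service` trait mirrors the slide; `upper`, `length`, and `compose` are hypothetical names. The sketch illustrates the slide's point: because a service is just an asynchronous function, two services compose into a third with ordinary `flatMap`.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ServiceSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Mirrors the slide's definition, but over the standard library's Future
  // (an assumption made so the sketch runs without Twitter's util library).
  trait Service[Req, Rep] extends (Req => Future[Rep])

  // Two hypothetical services, for illustration only.
  val upper: Service[String, String] = new Service[String, String] {
    def apply(req: String): Future[String] = Future.successful(req.toUpperCase)
  }
  val length: Service[String, Int] = new Service[String, Int] {
    def apply(req: String): Future[Int] = Future(req.length)
  }

  // Services compose like functions: the first service's reply
  // asynchronously becomes the second service's request.
  def compose[A, B, C](first: Service[A, B], second: Service[B, C]): Service[A, C] =
    new Service[A, C] {
      def apply(req: A): Future[C] = first(req).flatMap(second)
    }

  val upperThenLength: Service[String, Int] = compose(upper, length)

  // Blocks only for demonstration; real services never block.
  def demo(): Int = Await.result(upperThenLength("finagle"), 1.second)
}
```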

Page 10

Defining a service

calc.thrift:

    // Define an interface using an IDL.
    // (In this case, Thrift.)
    service Calculator {
      i32 multiply(1: i32 a, 2: i32 b);
    }

calc.scala:

    trait Calculator {
      def multiply(a: Int, b: Int): Future[Int]
    }

Page 11

Implementing a calculator

    val calculator = new Calculator {
      def multiply(a: Int, b: Int): Future[Int] =
        Future.value(a * b)
    }

    Rpc.serve("calculator", calculator)

Page 12

Using the calculator

    val calculator = Rpc.bind[Calculator]("calculator")

    val f: Future[Int] = calculator.multiply(100, 200)

    f.respond {
      case Return(res) => println(s"100*200=$res")
      case Throw(err) => println(s"error $err")
    }
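Twitter's `Future#respond` passes the callback a `Try` whose cases are `Return` and `Throw`. The standard library's counterpart is `onComplete`, with `Success` and `Failure`. Below is a local, in-process sketch of the calculator that uses that substitution. There is no network and no `Rpc` layer. `CalculatorSketch` and `describe` are hypothetical names.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object CalculatorSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  trait Calculator {
    def multiply(a: Int, b: Int): Future[Int]
  }

  // Local implementation; in the slides this would sit behind Rpc.serve.
  val calculator: Calculator = new Calculator {
    def multiply(a: Int, b: Int): Future[Int] = Future.successful(a * b)
  }

  // Same shape as the slide's respond block, with Success/Failure in place of
  // Return/Throw. The message goes into a Promise so a caller can observe it.
  def describe(a: Int, b: Int, calc: Calculator = calculator): Future[String] = {
    val message = Promise[String]()
    calc.multiply(a, b).onComplete {
      case Success(res) => message.success(s"$a*$b=$res")
      case Failure(err) => message.success(s"error $err")
    }
    message.future
  }
}
```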

Page 13

Concurrent composition

    def querySegment(id: Int, query: String): Future[Set[Result]]

    def search(query: String): Future[Set[Result]] = {
      // Query every segment concurrently.
      val queries: Seq[Future[Set[Result]]] =
        for (id <- 0 until NumSegments) yield querySegment(id, query)

      // Wait for all segments, then merge their result sets.
      Future.collect(queries) map { (results: Seq[Set[Result]]) =>
        results.flatten.toSet
      }
    }
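The same scatter/gather shape can be sketched against the standard library, where `Future.sequence` plays the role of Finagle's `Future.collect`. Here `querySegment` is a hypothetical stand-in that fabricates one hit per segment. `Result` is modeled as a `String`, and `NumSegments = 4` is arbitrary.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SearchSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  type Result = String
  val NumSegments = 4

  // Hypothetical segment lookup: pretends each segment has exactly one hit.
  def querySegment(id: Int, query: String): Future[Set[Result]] =
    Future(Set(s"$query@segment$id"))

  // Issue every segment query at once, then merge the partial results.
  def search(query: String): Future[Set[Result]] = {
    val queries: Seq[Future[Set[Result]]] =
      for (id <- 0 until NumSegments) yield querySegment(id, query)
    Future.sequence(queries).map(_.flatten.toSet)
  }
}
```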

Page 14

An RPC system

Figure: four servers each call Rpc.serve("calculator"); a client calls Rpc.bind("calculator"); the RPC system sits between them.
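To make the serve/bind relationship concrete, here is an in-process toy. `Registry.serve` records an implementation under a logical name. `Registry.bind` returns a stub that spreads calls across every replica registered under that name, in round-robin order. `Registry` and its methods are hypothetical stand-ins for the slides' `Rpc` object. A real RPC system does this across a network, with the layers the following slides describe.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object Registry {
  trait Calculator { def multiply(a: Int, b: Int): Future[Int] }

  // Logical name -> replicas currently serving it.
  private val replicas = mutable.Map.empty[String, Vector[Calculator]]

  def serve(name: String, impl: Calculator): Unit = synchronized {
    replicas(name) = replicas.getOrElse(name, Vector.empty) :+ impl
  }

  // Returns a stub that round-robins over the replicas known at call time.
  def bind(name: String): Calculator = new Calculator {
    private val next = new AtomicInteger(0)
    def multiply(a: Int, b: Int): Future[Int] = {
      val live = Registry.synchronized(replicas.getOrElse(name, Vector.empty))
      if (live.isEmpty) Future.failed(new NoSuchElementException(s"no replicas for $name"))
      else live(Math.floorMod(next.getAndIncrement(), live.size)).multiply(a, b)
    }
  }
}
```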

Page 15

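Finagle from 10k feet

Figure: the RPC system as a stack of layers. On the client, calc.multiply(100, 200) passes through Serialize, Name, Distribute, and Session. It then crosses the datacenter network. On the server, it passes through Session, Admit, and Deserialize before calc.multiply(100, 200) is invoked.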

Page 16

Serialization

Figure: calculator.multiply(100, 200) is reified as Call("Calculator.multiply", 100, 200). The call is serialized into an opaque byte stream (shown as a hex dump) and deserialized back into Call("Calculator.multiply", 100, 200) on the receiving side.
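Below is a minimal sketch of the serialize/deserialize step. It assumes a made-up wire format: the method name written with `DataOutputStream.writeUTF`, then an argument count, then each argument as a big-endian 32-bit integer. With that format, a reified call can be round-tripped as shown. In practice this step is handled by a real protocol such as Thrift. `Call`, `encode`, and `decode` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object WireSketch {
  // A reified call: which method, with which (integer) arguments.
  final case class Call(method: String, args: Seq[Int])

  // Hypothetical format: writeUTF(method) (2-byte length + bytes),
  // then the argument count, then each argument as a 32-bit int.
  def encode(call: Call): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeUTF(call.method)
    out.writeInt(call.args.size)
    call.args.foreach(a => out.writeInt(a))
    out.flush()
    bytes.toByteArray
  }

  // Reads the fields back in exactly the order encode wrote them.
  def decode(bytes: Array[Byte]): Call = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val method = in.readUTF()
    val n = in.readInt()
    Call(method, Seq.fill(n)(in.readInt()))
  }
}
```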

Page 17

Naming

• Logical/Abstract: calculator
• Replica set: zone/owner/env/calculator
• Physical: host1.smf1:122, host2.smf1:123, host3.smf1:124, …
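Resolution walks these levels in order. The sketch below is minimal and assumes two hypothetical lookup tables. In a real deployment they would be backed by a service-discovery system. The replica-set path `smf1/alice/prod/calculator` is invented; the host addresses come from the slide.

```scala
object NamingSketch {
  // Hypothetical table: logical name -> replica-set path (zone/owner/env/name).
  val logical: Map[String, String] =
    Map("calculator" -> "smf1/alice/prod/calculator")

  // Hypothetical table: replica-set path -> physical host:port addresses.
  val replicaSets: Map[String, Seq[String]] =
    Map("smf1/alice/prod/calculator" ->
      Seq("host1.smf1:122", "host2.smf1:123", "host3.smf1:124"))

  // Walk logical -> replica set -> physical, failing with a reason at either step.
  def resolve(name: String): Either[String, Seq[String]] =
    for {
      path  <- logical.get(name).toRight(s"unknown name: $name")
      addrs <- replicaSets.get(path).toRight(s"no replica set: $path")
    } yield addrs
}
```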

Page 18

Distribution

Figure: a distributor spreads requests across server1, server2, server3, …, servern.

Page 19

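Distribution (2)

Figure: a distributor spreads requests across server1, server2, server3, …, servern. For each server (server1 shown), it tracks statistics: failures, latencies, load, and session health. The diagram also shows a controller and a circuit breaker.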
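A circuit breaker is one way to act on per-server statistics. The sketch below assumes a simple policy that is not necessarily Finagle's:

- Trip after `maxFailures` consecutive failures.
- Stay open for `openForMillis`.
- Then let a single probe through (half-open), and close again on success.

`CircuitBreaker` and its parameters are hypothetical. Time is passed in explicitly to keep the logic testable, and the class is not thread-safe.

```scala
object BreakerSketch {
  sealed trait State
  case object Closed extends State
  case object Open extends State
  case object HalfOpen extends State

  // Per-server breaker: consecutive failures trip it open; after a cool-down
  // a single probe is allowed (half-open); a success closes it again.
  final class CircuitBreaker(maxFailures: Int, openForMillis: Long) {
    private var state: State = Closed
    private var failures = 0
    private var openedAt = 0L

    def currentState: State = state

    // Should the distributor send a request to this server at time `now`?
    def allow(now: Long): Boolean = state match {
      case Closed => true
      case Open if now - openedAt >= openForMillis =>
        state = HalfOpen
        true
      case Open => false
      case HalfOpen => false // a probe is already in flight
    }

    def onSuccess(): Unit = { state = Closed; failures = 0 }

    def onFailure(now: Long): Unit = {
      failures += 1
      if (state == HalfOpen || failures >= maxFailures) {
        state = Open
        openedAt = now
      }
    }
  }
}
```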

Page 20

Session (Mux)

Figure: each side of a session has a data plane (request, response) and a control plane (cancel, ping, credit, error, nack).
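The core idea of a multiplexed session is that many requests share one connection. Each request is identified by a tag, so replies can arrive in any order and individual requests can be cancelled or nacked. Below is a minimal in-memory sketch of the client half. It assumes a hypothetical message set that only loosely mirrors the slide; it is not Mux's wire format.

```scala
import scala.collection.mutable
import scala.concurrent.{Future, Promise}

object MuxSketch {
  // Hypothetical messages; data plane and control plane share one session.
  sealed trait Message
  final case class Request(tag: Int, body: String) extends Message  // data
  final case class Response(tag: Int, body: String) extends Message // data
  final case class Cancel(tag: Int) extends Message                 // control
  final case class Nack(tag: Int) extends Message                   // control

  final class NackedException(tag: Int) extends Exception(s"request $tag nacked")

  // Client half of a session: assigns tags, remembers pending requests, and
  // routes each incoming message to the request it belongs to.
  final class ClientSession(send: Message => Unit) {
    private var nextTag = 1
    private val pending = mutable.Map.empty[Int, Promise[String]]

    def dispatch(body: String): (Int, Future[String]) = synchronized {
      val tag = nextTag
      nextTag += 1
      val p = Promise[String]()
      pending(tag) = p
      send(Request(tag, body))
      (tag, p.future)
    }

    def cancel(tag: Int): Unit = synchronized {
      pending.remove(tag).foreach { p =>
        send(Cancel(tag))
        p.tryFailure(new java.util.concurrent.CancellationException(s"request $tag cancelled"))
      }
    }

    def receive(msg: Message): Unit = synchronized {
      msg match {
        case Response(tag, body) => pending.remove(tag).foreach(_.trySuccess(body))
        case Nack(tag)           => pending.remove(tag).foreach(_.tryFailure(new NackedException(tag)))
        case _                   => () // ignore messages a client does not expect
      }
    }
  }
}
```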

Page 21

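Admission control

Figure: an incoming request first faces an "admit?" check. If admitted, it goes to the service, which returns a response; if not, the server replies with a nack.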
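Admission control decides, before any work is done, whether the server can take a request. A rejected request gets a cheap NACK so the client can go elsewhere. The sketch below is minimal and assumes a concurrency limit as the admission policy, which is one of several possible policies. `Admission`, `maxPending`, and `Nacked` are hypothetical names, and `scala.concurrent.Future` stands in for Twitter's.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object AdmissionSketch {
  final case class Nacked(reason: String) extends Exception(reason)

  // Wraps a service so that at most `maxPending` requests run at once;
  // anything beyond that is nacked immediately without invoking the service.
  final class Admission[Req, Rep](maxPending: Int, service: Req => Future[Rep])(
      implicit ec: ExecutionContext) extends (Req => Future[Rep]) {
    private val pending = new AtomicInteger(0)

    def apply(req: Req): Future[Rep] =
      if (pending.incrementAndGet() > maxPending) {
        pending.decrementAndGet()
        Future.failed(Nacked("over capacity"))
      } else {
        val rep =
          try service(req)
          catch { case NonFatal(e) => Future.failed(e) }
        // Release the slot once the request finishes, successfully or not.
        rep.andThen { case _ => pending.decrementAndGet() }
      }
  }
}
```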

Page 22

Distributing traffic

Distribution happens at different scales:
• Time.
• Geography.

This is a recursive problem!
• Use the same mechanism, in different places.

Page 23

Diagnostics

    val sr: StatsReceiver
    val counter = sr.counter("requests")
    val stat = sr.stat("latency")

    counter.incr()
    stat.add(latencyMs)
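`StatsReceiver`, `counter`, and `stat` are the slide's Finagle names. To show what such an interface records without depending on the library, here is a hypothetical in-memory stand-in built on the standard library. It is not Finagle's implementation, and `InMemoryStats` is an invented name.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

object StatsSketch {
  // Monotonic count of events, e.g. requests served.
  final class Counter {
    private val n = new AtomicLong(0)
    def incr(): Unit = { n.incrementAndGet(); () }
    def value: Long = n.get
  }

  // Raw samples of a distribution, e.g. latencies in milliseconds.
  final class Stat {
    private val samples = mutable.ArrayBuffer.empty[Float]
    def add(x: Float): Unit = synchronized { samples += x; () }
    def snapshot: Vector[Float] = synchronized { samples.toVector }
  }

  // Hypothetical in-memory receiver: metrics are created on first use and
  // shared by name, so every caller asking for "requests" gets one counter.
  final class InMemoryStats {
    private val counters = new ConcurrentHashMap[String, Counter]()
    private val stats = new ConcurrentHashMap[String, Stat]()
    def counter(name: String): Counter = counters.computeIfAbsent(name, (_: String) => new Counter)
    def stat(name: String): Stat = stats.computeIfAbsent(name, (_: String) => new Stat)
  }
}
```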

Page 24

Diagnostics

    % curl http://.../admin/metrics.json
    ...
    "Gizmoduck/request_latency_ms": {
      "average": 1,
      "count": 124909591,
      "maximum": 950,
      "minimum": 0,
      "p50": 1,
      "p90": 3,
      "p95": 5,
      "p99": 19,
      "p999": 105,
      "p9999": 212,
      "sum": 222202958
    },
    ...
    "err/CancelledRequest": 11,
    "err/Unknown": 285,
    "err/Timeout": 106,
    ...
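Fields such as `p50`, `p99`, and `p9999` summarize the latency distribution. Production metrics systems usually approximate them with histograms rather than keeping every sample. The nearest-rank method over raw samples, sketched below with hypothetical names, shows what the numbers mean: pN is the smallest sample such that at least N% of all samples are less than or equal to it.

```scala
object PercentileSketch {
  // Nearest-rank percentile: sort, then take the ceil(p/100 * n)-th sample (1-based).
  def percentile(samples: Seq[Double], p: Double): Double = {
    require(samples.nonEmpty, "no samples")
    require(p > 0 && p <= 100, "p must be in (0, 100]")
    val sorted = samples.sorted
    val rank = math.ceil(p / 100.0 * sorted.size).toInt
    sorted(rank - 1)
  }

  // A few of the summary fields shown above, computed from raw samples.
  def summary(samples: Seq[Double]): Map[String, Double] =
    Map(
      "p50" -> percentile(samples, 50),
      "p90" -> percentile(samples, 90),
      "p99" -> percentile(samples, 99),
      "p999" -> percentile(samples, 99.9)
    )
}
```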

Page 26

Aggregates

avg(ts(AVG, Gizmoduck, Gizmoduck/request_latency_ms.p{50,90,99,999}))

Page 29

A word on resiliency

resilience, n. The act of resiling, springing back, or rebounding; as, the resilience of a ball or of sound.

Systems must be designed, end-to-end, for resiliency. RPC systems are a toolkit for resilient applications, not a panacea.

Resilient systems should balance MTTF (mean time to failure) against MTTR (mean time to recovery).
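One standard way to make the MTTF/MTTR balance concrete, not taken from the slides, is steady-state availability: the fraction of time a component is up. The numbers below are illustrative only.

```latex
\begin{aligned}
A &= \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} \\
\mathrm{MTTF} = 1000\,\mathrm{h},\ \mathrm{MTTR} = 1\,\mathrm{h}: \quad A &= \tfrac{1000}{1001} \approx 0.99900 \\
\text{doubling MTTF to } 2000\,\mathrm{h}: \quad A &= \tfrac{2000}{2001} \approx 0.99950 \\
\text{halving MTTR to } 0.5\,\mathrm{h}: \quad A &= \tfrac{1000}{1000.5} = \tfrac{2000}{2001} \approx 0.99950
\end{aligned}
```

Halving recovery time buys exactly as much availability as doubling the time between failures. This is why investing in fast recovery can pay off as much as preventing failures.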

We can’t wish these problems away.

Page 30

Message broker architectures

Explicit, decoupled message queues.
• Publishers, subscribers.
• Topics.
• Patterns on top: “request/reply,” “fire and forget,” etc.

Brokered by middleware: the message queues.
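For contrast with the service model, here is a toy broker in which publishers and subscribers share only a topic name. It is a hypothetical sketch: delivery is synchronous and in memory, whereas real brokers queue messages (often durably) between the two sides.

```scala
import scala.collection.mutable

object BrokerSketch {
  // Hypothetical in-memory broker. Publishers and subscribers know only topic
  // names, never each other; the broker fans each message out to a topic's
  // current subscribers.
  final class Broker {
    private val subscribers = mutable.Map.empty[String, Vector[String => Unit]]

    def subscribe(topic: String)(handler: String => Unit): Unit = synchronized {
      subscribers(topic) = subscribers.getOrElse(topic, Vector.empty) :+ handler
    }

    // Fire and forget: the publisher gets no reply, only the number of
    // subscribers the message was handed to.
    def publish(topic: String, message: String): Int = {
      val handlers = synchronized(subscribers.getOrElse(topic, Vector.empty))
      handlers.foreach(handler => handler(message))
      handlers.size
    }
  }
}
```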

Page 31

Actor architectures

Actors and Services are both structuring idioms.
• Whereas Services are asynchronous functions, actors are asynchronous sinks.

Actors:
• Independent, isolated.
• Asynchronous message passing.
• Arranged into systems.
• Request-reply is a pattern.

Error handling through supervisor hierarchies.
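The function/sink distinction can be sketched with the standard library. An actor's `!` returns `Unit`, and request-reply is layered on top by sending a `Promise` for the reply. All names here (`Actor`, `CounterActor`, `ask`, `countAfter`) are hypothetical. A single-threaded executor stands in for a mailbox, and supervision is omitted.

```scala
import java.util.concurrent.{ExecutorService, Executors}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object ActorSketch {
  // An actor is an asynchronous sink: `!` enqueues a message and returns Unit.
  // A single-threaded executor serves as the mailbox, so messages are handled
  // one at a time, in order, and the actor's private state needs no locks.
  abstract class Actor[M] {
    private val mailbox: ExecutorService = Executors.newSingleThreadExecutor()
    protected def receive(msg: M): Unit
    final def !(msg: M): Unit = mailbox.execute(() => receive(msg))
    final def stop(): Unit = mailbox.shutdown()
  }

  sealed trait CounterMsg
  case object Incr extends CounterMsg
  // Request-reply as a pattern: the request carries where to send the reply.
  final case class Get(replyTo: Promise[Int]) extends CounterMsg

  final class CounterActor extends Actor[CounterMsg] {
    private var count = 0
    protected def receive(msg: CounterMsg): Unit = msg match {
      case Incr         => count += 1
      case Get(replyTo) => replyTo.success(count)
    }
  }

  // "Ask": send a request carrying a fresh promise and hand back its future.
  def ask(actor: CounterActor): Future[Int] = {
    val reply = Promise[Int]()
    actor ! Get(reply)
    reply.future
  }

  // Usage helper: send n increments, then ask for the count.
  def countAfter(n: Int): Int = {
    val actor = new CounterActor
    try {
      (1 to n).foreach(_ => actor ! Incr)
      Await.result(ask(actor), 1.second)
    } finally actor.stop()
  }
}
```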

Page 32

Thanks.

Page 33

This is all open source!

github.com/twitter/finagle

github.com/twitter/util

github.com/twitter/scrooge

Page 34

Backup slides

Page 35

Systems thinking

We no longer care about a single {process, machine, service, …}. What matters is how the system works.

For example, we want to optimize end-to-end performance, not that of individual servers.

Page 36

Lessons

Define high-level objectives (e.g., SLOs), not low-level parameters.
• Give the system more freedom.
• Make use of dynamism.
• Balance with simplicity.

Page 37

Load balancing

A rich topic, with many tradeoffs.

Power of Two Choices (Mitzenmacher).

Apertures.

Latency-based metrics.
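The power-of-two-choices idea works like this: instead of scanning every backend for the least-loaded one, pick two at random and send the request to the less loaded of the pair. The sketch below is minimal and assumes load is measured as outstanding requests. `Backend` and `P2C` are hypothetical names, and the random generator is injected so runs are reproducible. Apertures and latency-based load metrics are not modeled.

```scala
import scala.util.Random

object P2CSketch {
  final class Backend(val name: String) {
    var outstanding: Int = 0 // current load: requests in flight
  }

  final class P2C(backends: IndexedSeq[Backend], rng: Random) {
    require(backends.nonEmpty, "need at least one backend")

    // Pick two distinct backends uniformly at random (when there are two or
    // more) and return whichever has fewer outstanding requests.
    def pick(): Backend =
      if (backends.size == 1) backends(0)
      else {
        val i = rng.nextInt(backends.size)
        var j = rng.nextInt(backends.size - 1)
        if (j >= i) j += 1 // skip i so that the two picks differ
        val (a, b) = (backends(i), backends(j))
        if (a.outstanding <= b.outstanding) a else b
      }
  }
}
```

Because the two picks always differ, a single overloaded backend is never chosen while any lighter one exists. Yet each decision inspects only two backends.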

Page 38

Session liveness

How do we determine whether a session is live? A surprisingly tricky question.

φ accrual.

Threshold detector.
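A φ accrual detector outputs a continuous suspicion level rather than a yes/no verdict: φ = -log10 P(the next heartbeat arrives later than the current silence). The sketch below is minimal and assumes exponentially distributed heartbeat intervals with the observed mean. That is a simplification; implementations may fit other distributions. A plain threshold detector sits alongside it, and all names are hypothetical. Under that assumption, φ reduces to t / (mean · ln 10). So the same silence is far more suspicious for a peer that normally beats every 100 ms than for one that beats every second.

```scala
object LivenessSketch {
  // Threshold detector: suspect the peer once it is silent longer than a fixed timeout.
  def thresholdSuspects(sinceLastMillis: Double, timeoutMillis: Double): Boolean =
    sinceLastMillis > timeoutMillis

  // φ accrual with an exponential model of inter-arrival times:
  // P(next heartbeat later than t) = exp(-t / mean), so
  // φ = -log10(exp(-t / mean)) = t / (mean * ln 10).
  final class PhiAccrual(windowSize: Int) {
    private val intervals = scala.collection.mutable.Queue.empty[Double]
    private var last: Option[Double] = None

    // Record a heartbeat, keeping only the most recent `windowSize` intervals.
    def heartbeat(nowMillis: Double): Unit = {
      last.foreach { prev =>
        intervals.enqueue(nowMillis - prev)
        if (intervals.size > windowSize) intervals.dequeue()
      }
      last = Some(nowMillis)
    }

    def phi(nowMillis: Double): Double = last match {
      case Some(prev) if intervals.nonEmpty =>
        val mean = intervals.sum / intervals.size
        (nowMillis - prev) / (mean * math.log(10))
      case _ => 0.0 // not enough history to suspect anything
    }
  }
}
```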

Page 39

Requeueing

After receiving a server NACK, what do we do?

Credit/debit scheme. Cost ratio.
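Here is one possible reading of the credit/debit scheme, sketched with hypothetical names. Each request deposits one credit, and each requeue withdraws `cost` credits. In steady state, then, at most 1/`cost` of requests are requeued; that is the cost ratio, and `cost = 5` allows about 20%. The balance is capped so that a long run of healthy traffic cannot bank an unbounded burst of requeues. This is an illustrative assumption, not necessarily how Finagle implements it.

```scala
object RequeueSketch {
  // Hypothetical credit/debit requeue budget: one credit per request,
  // `cost` credits per requeue, balance capped at `maxBalance`.
  final class RequeueBudget(cost: Int, maxBalance: Int) {
    require(cost > 0 && maxBalance >= cost, "need cost > 0 and maxBalance >= cost")
    private var balance = 0

    // Called once per original request.
    def deposit(): Unit = synchronized { balance = math.min(balance + 1, maxBalance) }

    // Called after a NACK: true (and debits) if a requeue is affordable;
    // false means surface the NACK to the caller instead of retrying.
    def tryWithdraw(): Boolean = synchronized {
      if (balance >= cost) { balance -= cost; true } else false
    }
  }
}
```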