reactive systems architecture - qcon london 2020 · reactive systems architecture. iot–like media...

43
Reactive systems architecture

Upload: others

Post on 24-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

Reactive systems architecture

Page 2: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

IoT–like media processing

• Operates stream processing devices

• Exposes health–checks and pipeline topology

• Provides a global view of the pipeline

Page 3: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

A distributed system without durable messaging easily

grows into a monolith

Page 4: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

DeviceDeviceShadow

Page 5: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

DeviceDeviceShadow

Log file

Page 6: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

DeviceDeviceShadow

DB

Page 7: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

DeviceDeviceShadow

Message queue X

Page 8: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

A distributed system without supervision is binary:

working or failed

Page 9: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

def makeDeviceRequest(request: DeviceRequest): Future[DeviceResponse] = ???

makeDeviceRequest(request).onComplete { case Success(dr) => // working! case Failure(ex) => // failed! } log.error(ex) makeDeviceRequest(request) }

log.error(ex) scheduleOnce(1000L, makeDeviceRequest(request)) } 👉

Page 10: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

Naïve timeouts cause

Page 11: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history
Page 12: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

…other timeouts

Page 13: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

…excessive downstream load when combined with re–tries

Page 14: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

DeviceDeviceShadow

👉

Page 15: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

• Timeout = 2000 ms

• Timeout = 1950 ms

• Timeout = 1900 ms

Page 16: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

• Timeout = 2000 ms

• Timeout = 1950 ms

• Timeout = 1900 ms

• Retries = 3 • Timeout = 2000 ms

• Retries = 2 • Timeout = 1950 ms

• Retries = 3 • Timeout = 1900 ms

666

975

633

👉

📻

💾

Page 17: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

• Deadline = 2000 ms • QoS = …

• Deadline = 1950 ms • QoS = …

• Deadline = 1900 ms • QoS = …

Page 18: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

def makeDeviceRequest(request: DeviceRequest): Future[DeviceResponse] = ???

makeDeviceRequest(request).onComplete { case Success(dr) => // working! case Failure(ex) => // failed! } log.error(ex) makeDeviceRequest(request) }

log.error(ex) scheduleOnce(1000L, makeDeviceRequest(request)) } 👉

Page 19: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

implicit val sys: ActorSystem = … implicit val mat: ActorMaterializer = … val base = Uri(…).authority val pool = Http(sys).cachedHostConnectionPool[String](base.host.address(), base.port)

val correlationId = UUID.randomUUID().toString val rq = HttpRequest(…)

val _ = Source.single(rq ! correlationId) .via(pool) .runForeach(…) .recoverWithRetries(5, …)

.runForeach(…)👉

🎉

Page 20: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

makeDeviceRequest(request)

Failed

Failed

👉

Page 21: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

A distributed system without back–pressure will fail or will

make everything around it fail

Page 22: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

“Let’s just go with the defaults for the thermal exhaust ports.” —Galen Erso

Page 23: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

implicit val sys: ActorSystem = … implicit val mat: ActorMaterializer = … val base = Uri(…).authority val pool = Http(sys).cachedHostConnectionPool[String](base.host.address(), base.port)

val correlationId = UUID.randomUUID().toString val rq = HttpRequest(…)

val _ = Source.single(rq ! correlationId) .via(pool) .recoverWithRetries(5, …) .runForeach(…)

• Pool size • TCP connect timeout • TCP receive timeout

• Body receive timeout • Flow timeout

• How many retries? • Within what time–frame? • Idempotent endpoints? • nonces, etc.

Page 24: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

A distributed system without observability and monitoring is

a stack of black boxes

Page 25: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

👉

Page 26: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

A distributed system without robust access control is a

ticking time–bomb

Page 27: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

Service A

Message

privateKey

payload

string correlationId = 1;string token = 2;bytes signature = 3;bytes payload = 4;

token

publicKeys

Service B

privateKey payload

token

publicKeys

valid?

👉

Page 28: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

A distributed system without chaos testing is going to fail in

the most creative ways

Page 29: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

val mb = Array[Byte](8, 1, 12, 3, 65, 66, 67, 99, ..., 99) X.parseFrom(mb)

Exception in thread "main" java.lang.StackOverflowError at … JsonStreamContext.<init>(…:43) at … JsonReadContext.<init>(…:58) at … JsonReadContext.createChildObjectContext(…:128) at … ReaderBasedJsonParser._nextAfterName(…:773) at … ReaderBasedJsonParser.nextToken(…:636) at … JValueDeserializer.deserialize(…:45)

X.validate(mb)

val mj = """{"x":""" * 2000 JsonFormat.fromJsonString[X](mj)

Exception in thread "main" java.lang.StackOverflowError at … $StreamDecoder.readTag(…:2051) at … $StreamDecoder.skipMessage(…:2158) at … $StreamDecoder.skipField(…:2090) …

👉

Page 30: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

message DeviceRequest { string method = 1; string uri = 2; map<string, string> headers = 3; string entity_content_type = 4; bytes entity = 5;

string schedule_time = 10; MisfireStrategy misfire_strategy = 11;

enum MisfireStrategy { BEST_EFFORT = 0; FORGET = 1; } }

👉

class DeviceActor extends Actor { … override def receive: Receive = { case TopicPartitionOffsetMessage(tpo, dr: DeviceRequest, _) => val d = Duration.between(ZonedDateTime.now, ZonedDateTime.parse(dr.scheduleTime)) context.system.scheduler.scheduleOnce(FiniteDuration(d.toMillis, TimeUnit.MS), self, dr) case d: DeviceRequest => Source.single(d -> …).via(…).run(…) } }

Page 31: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

class DeviceActor extends Actor { … override def receive: Receive = { case TopicPartitionOffsetMessage(tpo, dr: DeviceRequest, _) => val d = Duration.between(ZonedDateTime.now, ZonedDateTime.parse(dr.scheduleTime)) context.system.scheduler.scheduleOnce(FiniteDuration(d.toMillis, TimeUnit.MS), self, dr) case d: DeviceRequest => Source.single(d -> …).via(…).run(…) } }

message DeviceRequest { string method = 1; string uri = 2; map<string, string> headers = 3; string entity_content_type = 4; bytes entity = 5;

string schedule_time = 10; MisfireStrategy misfire_strategy = 11;

enum MisfireStrategy { BEST_EFFORT = 0; FORGET = 1; } }

"GET" "/" Map.empty "application/json" "e30="

"2018-11-14T11:42:06+00:00” "BEST_EFFORT"

"GET" "/" Map.empty "application/json" "e30="

"2014-10-14T11:42:06+00:00" "BEST_EFFORT"

"GET" "/" Map.empty "application/json" "e30="

"2018-13-14T11:42:06+00:00" "BEST_EFFORT"

"I n vok

i n g

t h e f è ẹl

i n g o f

h á o s

"

"a/%%30%30" Map.empty "#cmds=({'/bin/echo', #eps})" "4oGmdGVzdOKBpw=="

"2017-10-14T11:42:06+00:00" "BEST_EFFORT"

👉

Page 32: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

class DeviceActor extends Actor { … override def receive: Receive = { case TopicPartitionOffsetMessage(tpo, dr: DeviceRequest, _) => val d = Duration.between(ZonedDateTime.now, ZonedDateTime.parse(dr.scheduleTime)) context.system.scheduler.scheduleOnce(FiniteDuration(d.toMillis, TimeUnit.MS), self, dr) case d: DeviceRequest => Source.single(d -> …).via(…).run(…) } }

💣 Exception in thread "…" java.time.format.DateTimeParseException: Text '2014-12-10T05:44:06.635Z[😜] could not be parsed at index 21

💣 Exception in thread "…" java.time.format.DateTimeParseException: Text '2014-12-10T05:44:06.635Z[GMT] could not be parsed at index 11

💣 GET http://host/foo.action Content-Type: #cmds=({'/bin/echo', #eps})

💣 SIGSEGV (0xb) at pc=0x000000010fda8262, pid=21419, tid=18435 # V [libjvm.dylib+0x3a8262] PhaseIdealLoop::idom_no_update(Node*) const+0x12

https://github.com/minimaxir/big-list-of-naughty-strings

👉

Page 33: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

Do tell another anecdote…

Page 34: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

We measured

• For every file in every commit in every project…

• Classification of the kind and quality of code

• Matching production performance data from PagerDuty

Page 35: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

👉

Page 36: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history
Page 37: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history
Page 38: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

The biggest impact on production performance

comes from…

Page 39: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history
Page 40: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history
Page 41: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history

Four things successful projects do before breakfast

• Structured and performance–tested logging

• Monitoring & [distributed] tracing

• Performance testing

• Reactive architecture & code

throughout their commit history

Page 42: Reactive systems architecture - QCon London 2020 · Reactive systems architecture. IoT–like media processing ... • Reactive architecture & code throughout their commit history