Service Stampede: Surviving a Thousand Services
TRANSCRIPT
![Page 1: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/1.jpg)
Service Stampede
Anil Gursel, PayPal
![Page 2: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/2.jpg)
Agenda
Monoliths to Microservices
Problems with microservices
Solutions & Practices
The need for standardization
Introducing squbs
![Page 3: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/3.jpg)
Monolith to Microservices
Requests
Congrats! Your monolith became a thousand microservices – now you’re in serious trouble!!!
![Page 4: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/4.jpg)
Cost/Benefits of Moving to Microservices
Gains
• Independence – faster PDLC
• Freedom of choice for service implementation
• Easy evolution of service & technology
• Coexisting services across generations
• Complexity & Latency
Losses
• Homogeneity
• Consistency of implementation across
• Timing & Determinism
Hmm. To be, or not to be… a service, that is...
![Page 5: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/5.jpg)
Microservices Issues
Latency & Determinism
Service Boundaries: to be, or not to be a service
Scaling and rightsizing
Many failure points – need resiliency
Inconsistency – need standardization
![Page 7: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/7.jpg)
Latency Determinism
![Page 8: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/8.jpg)
Latency by Deployment Topology
• Avoid too many layers of services
• Keep state close to the edge
• The more hops, the higher and less deterministic the latency
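The effect of hop count can be made concrete with a little arithmetic. The sketch below is illustrative only and not from the talk: it assumes hops are called sequentially, and that each hop meets its latency target independently with the same probability `p`. The object and function names are hypothetical.

```scala
object HopLatencySketch {
  // Sequential hops add up: total latency is the sum of the per-hop latencies.
  def totalLatencyMs(hopLatenciesMs: Seq[Double]): Double = hopLatenciesMs.sum

  // Assumption: each hop independently meets its latency target with probability p.
  // The whole chain then meets every target with probability p^hops.
  def chainOnTimeProbability(p: Double, hops: Int): Double = math.pow(p, hops)
}
```

Under these assumptions, a hop that is on time 99% of the time yields a five-hop chain that is on time only about 95% of the time, which is one way to read "less deterministic".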
![Page 11: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/11.jpg)
Services Need to Scale
• Scale horizontally with increasing workload
  • More nodes, or…
  • More pods with increasing workload
• Scale vertically – why?
  • Keep the number of instances under control
  • 125 nodes @ 16 CPUs are easier to manage than 1000 nodes @ 2 CPUs
  • Less load on network and switching infrastructure
  • Potentially better utilization & cache hits
  • Stateful systems: More limited horizontal scale
  • Need critical mass for redundancy
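The node counts on this slide describe the same total capacity at two node sizes. A minimal sketch of that arithmetic, with a hypothetical helper name and assuming capacity is measured purely in CPU cores:

```scala
object SizingSketch {
  // Nodes needed to supply totalCores when each node has coresPerNode (rounded up).
  def nodesNeeded(totalCores: Int, coresPerNode: Int): Int =
    (totalCores + coresPerNode - 1) / coresPerNode
}
```

For 2000 cores this gives 125 nodes at 16 cores each, versus 1000 nodes at 2 cores each.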
![Page 14: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/14.jpg)
Practices for Successful Microservices
Deployment Topologies
Reactive Systems
Resilience with Circuit Breakers
Asynchronous Communication
Standardization
![Page 16: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/16.jpg)
Individual Service Deployments
[Diagram: Service A and Service B, each labeled "Requests"]
![Page 17: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/17.jpg)
Joint Deployments
[Diagram: Requests; Service A, Service B, Service C]
• Deployment orchestration using Chef, etc.
• Kubernetes Pods
![Page 19: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/19.jpg)
The Reactive Manifesto
Responsive
Message Driven
Elastic
Resilient
![Page 20: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/20.jpg)
Why Does it Matter?
• Responsive: Respond in a deterministic, timely manner. Controls determinism
• Resilient: Stays responsive in the face of failure – even cascading failures
• Elastic: Stays responsive under workload spikes
• Message Driven: Basic building block for responsive, resilient, and elastic systems
![Page 22: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/22.jpg)
Circuit Breaker
Keeps systems responsive under failure
Avoids cascading failures
Especially with multi-generational downstream services
Critical to keeping your 1000 services alive
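To make the mechanism concrete, here is a minimal, dependency-free sketch of a circuit breaker. It is not the implementation used in the talk or in any library; the class name, parameters, and the injectable `now` clock are all hypothetical. After `maxFailures` consecutive failures the breaker opens and callers get the fallback immediately; after `resetMillis` it lets one trial call through.

```scala
import scala.util.{Failure, Success, Try}

object CircuitBreakerSketch {
  sealed trait State
  case object Closed extends State
  case object Open extends State
  case object HalfOpen extends State

  // Hypothetical minimal breaker. `now` is injectable so the reset timeout can be tested.
  // For simplicity the lock is held while `body` runs; a real breaker would not do that.
  final class Breaker(maxFailures: Int, resetMillis: Long,
                      now: () => Long = () => System.currentTimeMillis()) {
    private var failures = 0
    private var openedAt = 0L
    private var current: State = Closed

    // An open breaker becomes half-open once the reset timeout has elapsed.
    def state: State = synchronized {
      if (current == Open && now() - openedAt >= resetMillis) current = HalfOpen
      current
    }

    def call[T](body: => T)(fallback: => T): T = synchronized {
      state match {
        case Open => fallback // fail fast without touching the downstream service
        case _ =>
          Try(body) match {
            case Success(value) =>
              failures = 0
              current = Closed
              value
            case Failure(_) =>
              failures += 1
              // A failed trial call, or too many consecutive failures, opens the breaker.
              if (current == HalfOpen || failures >= maxFailures) {
                current = Open
                openedAt = now()
              }
              fallback
          }
      }
    }
  }
}
```

While the breaker is open, a failing downstream service costs the caller nothing more than returning the fallback, which is what keeps the failure from cascading upstream.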
![Page 25: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/25.jpg)
Standardization
• Monitoring
  • Need to collect metrics, consistently
• Logging
  • Correlation across services
  • Uniformity in logs
• Security
  • Need to apply standard security configuration
• Environment Resolution
  • Staging, production, etc.
Consistency in the face of Heterogeneity
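As one illustration of log correlation across services, the sketch below reuses an incoming correlation ID, passes it downstream, and emits a uniform log line. The header name `X-Correlation-ID`, the log format, and all function names are assumptions made for this example; the talk does not prescribe them.

```scala
import java.util.UUID

object CorrelationSketch {
  // Assumed header name, chosen for illustration only.
  val HeaderName = "X-Correlation-ID"

  // Reuse the caller's ID when present so logs across services share one key;
  // otherwise mint a new one at the edge.
  def correlationId(headers: Map[String, String]): String =
    headers.getOrElse(HeaderName, UUID.randomUUID().toString)

  // Headers for a downstream call carry the same ID.
  def outbound(headers: Map[String, String]): Map[String, String] =
    Map(HeaderName -> correlationId(headers))

  // One uniform log line shape for every service.
  def logLine(service: String, id: String, msg: String): String =
    s"service=$service correlationId=$id msg=$msg"
}
```

When every service applies the same rule, a single ID can be used to find one request's log lines across all of them.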
![Page 26: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/26.jpg)
Standardized Reactive Platform for Large Scale Internet Deployments
![Page 27: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/27.jpg)
Akka, Spray, Akka Http & Streams
Asynchronous
High Performance
Resilience & Supervision
Great Libraries for building Reactive Systems
![Page 28: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/28.jpg)
Bootstrap and Lifecycle Management
Unicomplex: Lightweight bootstrap module
Emits lifecycle events: starting, active, stopping
Startup and shutdown hooks
Allows obtaining the current state
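The idea of lifecycle events, hooks, and a queryable current state can be sketched in a few lines. This is not the Unicomplex API: the types and method names are hypothetical, and only the three states named on the slide are modeled.

```scala
object LifecycleSketch {
  sealed trait LifecycleState
  case object Starting extends LifecycleState
  case object Active extends LifecycleState
  case object Stopping extends LifecycleState

  // Hypothetical lifecycle holder: tracks the current state and notifies registered hooks.
  final class Lifecycle {
    private var current: LifecycleState = Starting
    private var listeners: List[LifecycleState => Unit] = Nil

    // Allows obtaining the current state.
    def state: LifecycleState = synchronized(current)

    // Registers a hook that is called on every later transition.
    def onTransition(hook: LifecycleState => Unit): Unit =
      synchronized { listeners ::= hook }

    // Records the new state, then emits the event to all hooks outside the lock.
    def transitionTo(next: LifecycleState): Unit = {
      val toNotify = synchronized { current = next; listeners }
      toNotify.foreach(_(next))
    }
  }
}
```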
![Page 29: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/29.jpg)
Listener
• Declares configuration for port binding, interfaces, security, etc.
![Page 30: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/30.jpg)
Service
• Akka Http/Spray Routes and Http Request Handler Actors
• Configured in squbs-meta.conf
• A service can be defined in a dependency artifact
![Page 31: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/31.jpg)
Extension
• To start low level (non-actor) facilities needed for the environment
![Page 32: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/32.jpg)
Request/Response Pipeline
![Page 33: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/33.jpg)
Cubes: Another Deployment Topology
squbs: rhymes with cubes
Drop-in modules
Cubes can run in isolation as well as on a flat classpath
Easy to compose/decompose/refactor
Cubes share the actor system
Provide better predictability
![Page 34: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/34.jpg)
Orchestration
[Diagram: Input, task1, task2, task3, task4, task5, Output]
![Page 35: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/35.jpg)
Orchestration

    // task1 and task2 both start from the input.
    val task1F = doTask1(input)
    val task2F = doTask2(input)
    // >> passes the completed results on to the next task.
    val task3F = (task1F, task2F) >> doTask3
    val task4F = task2F >> doTask4
    val task5F = (task3F, task4F) >> doTask5
    // Reply to the requester once task5 completes, then stop this actor.
    for { result <- task5F } {
      requester ! result
      context.stop(self)
    }
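For readers without squbs at hand, the same dependency graph can be approximated with standard-library Futures. This is a sketch, not the squbs Orchestration DSL: the five task bodies are made-up arithmetic stand-ins, and the object and `run` helper are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object OrchestrationSketch {
  // Hypothetical stand-ins for the five tasks; real tasks would call other services.
  def doTask1(in: Int): Future[Int] = Future(in + 1)
  def doTask2(in: Int): Future[Int] = Future(in * 2)
  def doTask3(a: Int, b: Int): Future[Int] = Future(a + b)
  def doTask4(b: Int): Future[Int] = Future(b - 1)
  def doTask5(c: Int, d: Int): Future[Int] = Future(c * d)

  def orchestrate(input: Int): Future[Int] = {
    // Start the independent tasks first so they run concurrently.
    val task1F = doTask1(input)
    val task2F = doTask2(input)
    // task3 needs task1 and task2; task4 needs only task2.
    val task3F = for { a <- task1F; b <- task2F; c <- doTask3(a, b) } yield c
    val task4F = task2F.flatMap(doTask4)
    // task5 needs task3 and task4.
    for { c <- task3F; d <- task4F; r <- doTask5(c, d) } yield r
  }

  // Blocking helper for demonstration only.
  def run(input: Int): Int = Await.result(orchestrate(input), 5.seconds)
}
```

The plain-Future version needs a for-comprehension per join, which is the boilerplate the `>>` operator in the slide's code removes.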
![Page 36: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/36.jpg)
Orchestration DSL
High-performance asynchronous orchestration
Responsive: Respond within SLA, with or without results
Streamlined error handling
Reduced code complexity
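"Respond within SLA, with or without results" can be sketched as a race between the real work and a deadline. The code below uses only the standard library and a JDK scheduler; it is not the squbs implementation, and `withinSla`, its parameters, and the `await` helper are hypothetical names.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SlaSketch {
  // Daemon timer thread so the sketch never keeps the JVM alive.
  private val timer = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "sla-timer")
      t.setDaemon(true)
      t
    }
  })

  // Completes with the real result if it arrives within slaMillis, else with the fallback.
  // A failure of `work` before the deadline is passed through unchanged.
  def withinSla[T](work: Future[T], slaMillis: Long, fallback: => T): Future[T] = {
    val p = Promise[T]()
    timer.schedule(new Runnable {
      def run(): Unit = { p.trySuccess(fallback); () }
    }, slaMillis, TimeUnit.MILLISECONDS)
    work.onComplete(r => { p.tryComplete(r); () })
    p.future
  }

  // Blocking helper for demonstration only.
  def await[T](f: Future[T]): T = Await.result(f, 2.seconds)
}
```

Whichever side completes the promise first wins, so the caller always gets an answer by the deadline, even if it is only a partial or default one.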
![Page 37: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/37.jpg)
More Utilities
• Http Client
• Admin Console
• Actor Registry
• Perpetual Stream
• Persistence Buffer
• …
![Page 38: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/38.jpg)
Summary
• A large number of services has benefits, but is more difficult to manage
• Control your service topology for more determinism and lower latency
  • Rule of thumb: No more than two hops of synchronous calls from the edge
• Reactive systems – ideal for services
  • Responsive & resilient
• Standardization
  • Walk like a duck, quack like a duck, and manage it like a duck
• squbs: Have the cake, and eat it too
![Page 39: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/39.jpg)
Q&A – Feedback Appreciated
Join us on – link from https://github.com/paypal/squbs
@squbs
![Page 40: Service Stampede: Surviving a Thousand Services](https://reader036.vdocuments.us/reader036/viewer/2022062523/58eea1661a28ab542f8b45cb/html5/thumbnails/40.jpg)