resilient functional service design
![Page 1: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/1.jpg)
Resilient Functional Service Design
The usually forgotten parts of resilient software design
Uwe Friedrichsen – codecentric AG – 2015-2017
![Page 2: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/2.jpg)
@ufried Uwe Friedrichsen | [email protected] | http://slideshare.net/ufried | http://ufried.tumblr.com
![Page 3: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/3.jpg)
What’s that “resilience” thing?
![Page 4: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/4.jpg)
Business
Production
Availability
![Page 5: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/5.jpg)
(Almost) every system is a distributed system
Chas Emerick
http://www.infoq.com/presentations/problems-distributed-systems
![Page 6: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/6.jpg)
A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.
Leslie Lamport
![Page 7: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/7.jpg)
Failures in today’s complex, distributed and interconnected systems are not the exception.
• They are the normal case
• They are not predictable
![Page 8: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/8.jpg)
… and it’s getting “worse”
• Cloud-based systems
• Microservices
• Zero Downtime
• Mobile & IoT
• Social Web
→ Ever-increasing complexity and connectivity
![Page 9: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/9.jpg)
Do not try to avoid failures. Embrace them.
![Page 10: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/10.jpg)
resilience (IT): the ability of a system to handle unexpected situations
- without the user noticing it (best case)
- with a graceful degradation of service (worst case)
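As a minimal sketch of the "graceful degradation" case in the definition above: the page type, the `render_product_page` function and the recommendation lookup are hypothetical names chosen for illustration and do not come from the talk. The idea is only that an optional part of a response may be dropped instead of failing the whole request.

```python
from dataclasses import dataclass, field


@dataclass
class ProductPage:
    product_id: str
    recommendations: list = field(default_factory=list)
    degraded: bool = False  # True if an optional part had to be left out


def render_product_page(product_id, fetch_recommendations):
    """Build the page; recommendations are optional and may be dropped."""
    try:
        recs = fetch_recommendations(product_id)
        return ProductPage(product_id, recs)
    except Exception:
        # Degrade gracefully instead of failing the whole request.
        return ProductPage(product_id, [], degraded=True)


def healthy(product_id):
    # Hypothetical stand-in for a working recommendation service.
    return ["p-2", "p-3"]


def broken(product_id):
    # Hypothetical stand-in for an unreachable recommendation service.
    raise ConnectionError("recommendation service unreachable")


ok_page = render_product_page("p-1", healthy)
degraded_page = render_product_page("p-1", broken)
```

Whether a part of a response is optional at all is a functional design decision, not something a library can decide.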
![Page 11: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/11.jpg)
Beware of the “100% available” trap!
![Page 12: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/12.jpg)
Designing for resilience The pitfall
![Page 13: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/13.jpg)
First, you learn about resilience …
![Page 14: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/14.jpg)
[Diagram: areas of resilient software design: Core, Detect, Treat, Prevent, Recover, Mitigate, Complement]
![Page 15: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/15.jpg)
[Diagram: overview of resilience patterns]
Areas: Core, Detect, Treat, Prevent, Recover, Mitigate, Complement
Other labels: Supporting patterns, Node level, System level, Either level, Statically, Dynamically
Patterns shown: Isolation, Bulkhead, Communication paradigm, Confinement, Redundancy, Stateless, Idempotency, Escalation, Zero downtime deployment, Location transparency, Relaxed temporal constraints, Fallback, Shed load, Share load, Marked data, Queue for resources, Bounded queue, Finish work in progress, Fresh work before stale, Deferrable work, Monitor, Watchdog, Heartbeat, Acknowledgement, Voting, Synthetic transaction, Leaky bucket, Routine checks, Health check, Fail fast, Let sleeping dogs lie, Small releases, Hot deployments, Routine maintenance, Backup request, Anti-fragility, Diversity, Jitter, Error injection, Spread the news, Anti-entropy, Backpressure, Retry, Limit retries, Rollback, Roll-forward, Checkpoint, Safe point, Failover, Read repair, Error handler, Reset, Restart, Reconnect, Fail silently, Default value, Timeout, Circuit breaker, Complete parameter checking, Checksum
![Page 16: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/16.jpg)
... then, you digest the stuff just learned
![Page 17: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/17.jpg)
[Diagram: the same pattern overview as on the previous pattern slide, annotated with the following comments]
“Oh, my! Theoretical blah! Uncool! Know that anyway for eons. So, let’s move on to the cool parts …”
“Ah, now we’re talkin’! Here’s the cool stuff! That’s practical, applicable. Don’t you have more code examples? Or even better: Can’t we turn that all into a live hacking session?”
“Offline activities? Hmm, let’s focus on the other stuff.”
“Uh, sounds like one-off, tough stuff … Better start with the easier stuff, best with library support”
“Yeah, more cool stuff! Aren’t there more libs like Hystrix that we can drag into our projects with a line of configuration?”
“Well, neat … I’ll come back to that stuff whenever I really need it”
![Page 18: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/18.jpg)
[Diagram: the areas Core, Detect, Recover, Mitigate, Treat, Prevent, Complement, set against “Developer priority” and “Relevance for application robustness”]
Ye be warned!
If you don’t get this part right, nothing else matters
Here be dragons!
This is extremely hard and poorly understood
![Page 19: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/19.jpg)
Let’s recap …
![Page 20: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/20.jpg)
The core parts are
• extremely important
• poorly understood
• massively underestimated
![Page 21: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/21.jpg)
Houston, we have a problem!
![Page 22: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/22.jpg)
Let’s have a closer look at the core parts
![Page 23: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/23.jpg)
[Diagram: areas of resilient software design: Core, Detect, Treat, Prevent, Recover, Mitigate, Complement]
![Page 24: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/24.jpg)
[Diagram: areas Core, Detect, Treat, Prevent, Recover, Mitigate, Complement; pattern shown: Isolation]
![Page 25: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/25.jpg)
Isolation
• System must not fail as a whole
• Split the system into parts and isolate the parts against each other
• Avoid cascading failures
• Foundation of resilient software design
![Page 26: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/26.jpg)
[Diagram: areas Core, Detect, Treat, Prevent, Recover, Mitigate, Complement; patterns shown: Isolation, Bulkhead]
![Page 27: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/27.jpg)
Bulkheads are not about thread pools!
![Page 28: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/28.jpg)
Bulkheads
• Core isolation pattern (a.k.a. “failure units” or “units of mitigation”)
• Diverse implementation choices available, e.g., µservice, actor, SCS, ...
• Implementation choice impacts system and resilience design a lot
• Shaping good bulkheads is extremely hard (pure design issue)
![Page 29: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/29.jpg)
Sounds easy. Where is the problem?
![Page 30: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/30.jpg)
How do we avoid this …
[Diagram: Request → Service A → Service B]
Due to functional design, Service A always needs backing from Service B to be able to answer a client request, i.e. the isolation is broken by design
![Page 31: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/31.jpg)
... and this ...
[Diagram: a single request that involves many services]
Due to functional design we need to call a lot of services to be able to answer a client request, i.e. availability is broken by design
![Page 32: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/32.jpg)
... without ending up with this?
[Diagram: Request → Mothership Service (a.k.a. Monolith)]
By trying to avoid the aforementioned issues we ended up cramming all required functionality into one big service, i.e. the isolation is broken by design
![Page 33: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/33.jpg)
Let’s use the well-known best practices
• Divide & conquer a.k.a. functional decomposition
• DRY (Don’t Repeat Yourself)
• Design for reusability
• Layered architecture
• …
![Page 34: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/34.jpg)
Unfortunately, …
![Page 35: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/35.jpg)
... this usually leads to this ...
[Diagram: Request → Service A → Service B]
Due to functional design, Service A always needs backing from Service B to be able to answer a client request, i.e. the isolation is broken by design
![Page 36: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/36.jpg)
... and this ...
[Diagram: a single request that involves many services]
Due to functional design we need to call a lot of services to be able to answer a client request, i.e. availability is broken by design
![Page 37: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/37.jpg)
... and in the end also often to this.
[Diagram: Request → Mothership Service (a.k.a. Monolith)]
By trying to avoid the aforementioned issues we ended up cramming all required functionality into one big service, i.e. the isolation is broken by design
![Page 38: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/38.jpg)
Welcome to distributed hell!
![Page 39: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/39.jpg)
Caches to the rescue!
![Page 40: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/40.jpg)
[Diagram: Request → Service A (with a cache of B) → Service B]
Due to functional design, Service A always needs backing from Service B to be able to answer a client request, i.e. the isolation is broken by design
Break tight service coupling by caching data/responses of downstream service
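A minimal sketch of what such a cache in front of a downstream service could look like. The class name `CachedDownstream`, the `max_age_seconds` parameter and the fake `service_b` are assumptions made for illustration; the talk does not prescribe an implementation. The sketch returns an explicit staleness flag, because serving old data is exactly the trade-off this approach buys.

```python
import time


class CachedDownstream:
    """Wraps a call to a downstream service and remembers the last good answer."""

    def __init__(self, call_service_b, max_age_seconds=60.0, clock=time.monotonic):
        self._call = call_service_b
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, stored_at)

    def get(self, key):
        """Return (value, is_stale). Re-raises if B fails and nothing usable is cached."""
        try:
            value = self._call(key)
        except Exception:
            if key in self._cache:
                cached_value, stored_at = self._cache[key]
                if self._clock() - stored_at <= self._max_age:
                    return cached_value, True  # served from cache, possibly stale
            raise
        self._cache[key] = (value, self._clock())
        return value, False


# Hypothetical downstream service with a switch to simulate an outage
# and a fake clock so the example does not need to sleep.
state = {"up": True, "now": 0.0}


def service_b(key):
    if not state["up"]:
        raise ConnectionError("Service B unavailable")
    return {"sku-1": 10}[key]


client = CachedDownstream(service_b, max_age_seconds=60.0, clock=lambda: state["now"])
fresh = client.get("sku-1")   # B is up, answer comes from B
state["up"] = False
state["now"] = 30.0
stale = client.get("sku-1")   # B is down, answer comes from the 30 s old cache entry
```

Once the cached entry is older than `max_age_seconds`, Service A fails again, so the cache only postpones the coupling problem rather than removing it.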
![Page 41: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/41.jpg)
Caches to the rescue?
![Page 42: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/42.jpg)
Do you really think that copying stale data all over your system
is a suitable measure to fix an inherently broken design?
![Page 43: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/43.jpg)
We have to re-learn design for distributed systems!
![Page 44: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/44.jpg)
A works-out-of-the-box-in-all-contexts, just-add-water-and-stir,
three-bullet-point panacea for designing perfect bulkheads
![Page 45: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/45.jpg)
You need lots of those …
![Page 46: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/46.jpg)
... maybe some of those
![Page 47: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/47.jpg)
Then it is a lot of hard work …
![Page 48: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/48.jpg)
... and there is no silver bullet
![Page 49: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/49.jpg)
Yet, a few guiding thoughts about bulkhead design …
![Page 50: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/50.jpg)
Foundations of design
• High cohesion, low coupling
• Separation of concerns
• Crucial across process boundaries
• Still poorly understood issue
• Start with
  • Understanding organizational boundaries
  • Understanding use cases and flows
  • Identifying functional domains (→ DDD)
  • Finding areas that change independently
• Do not start with a data model!
![Page 51: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/51.jpg)
Short activation paths
• Long activation paths affect availability
• Increase latency and likelihood of failures
• Minimize remote calls per request
• Need to balance opposing forces
  • Avoid monolith → clear separation of concerns
  • Minimize requests → cluster functionality & data
• Caches sometimes help, but stale data as trade-off
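The effect of long activation paths on availability can be made concrete with a little arithmetic. The sketch below assumes that every service on the path must answer for the request to succeed and that failures are independent; both are simplifications. The figures (99.9 % per service, 12 services) are illustrative assumptions, not numbers from the talk.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a request that needs every listed service to succeed.

    Assumes independent failures, which is a simplification.
    """
    return prod(availabilities)


one_call = serial_availability([0.999])
twelve_calls = serial_availability([0.999] * 12)

print(f"1 service:   {one_call:.4f}")
print(f"12 services: {twelve_calls:.4f}")
```

Under these assumptions a path through twelve services is available only about 98.8 % of the time, although each single service reaches 99.9 %.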
![Page 52: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/52.jpg)
Dismiss reusability
• Reusability increases coupling
• Reusability leads to bad service design
• Reusability compromises availability
• Reusability rarely pays
• Do not strive for reuse
• Strive for replaceability instead
![Page 53: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/53.jpg)
Broadening the options ...
![Page 54: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/54.jpg)
[Diagram: areas Core, Detect, Treat, Prevent, Recover, Mitigate, Complement; patterns shown: Isolation, Bulkhead, Communication paradigm]
![Page 55: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/55.jpg)
Communication paradigm
• Request-response <-> messaging <-> events <-> …
• Heavily influences resilience patterns to be used
• Also heavily influences functional bulkhead design
• Very fundamental decision which is often underestimated
![Page 56: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/56.jpg)
Request/Response: Horizontal slicing
[Diagram: several µS and the Flow / Process]
Event-driven: Vertical slicing
[Diagram: several µS and the Flow / Process]
![Page 57: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/57.jpg)
Synchronous R/R vs. asynchronous events
• Decomposition
  • Vertically divide-and-conquer vs. horizontally go-with-the-flow
• Coordination
  • Coordination logic/services and orchestration vs. event chains and choreography
• Transactions
  • Built-in transaction handling vs. external supervision
• Error handling
  • Built into service vs. escalation/supervision strategy
• Separation of concerns
  • Multiple responsibilities service vs. single responsibility services
• Encapsulation
  • Domain logic distributed across services vs. domain logic in one place
  • Reusability vs. Replaceability
• Complexity
  • A draw …
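The coordination difference (orchestration vs. choreography) can be sketched in a few lines. Everything here is a hypothetical illustration: the function names, the `EventBus` class and the event names are assumptions, and the in-process, synchronous bus merely stands in for a real asynchronous messaging system.

```python
from collections import defaultdict


# --- Request/response style: a coordinating service orchestrates the flow ---
def authorize_payment(order):
    return {"order_id": order["id"], "authorized": True}


def ship_items(order):
    return {"order_id": order["id"], "shipped": True}


def fulfill_order(order):
    """Orchestrator: knows every step and calls each service in turn."""
    payment = authorize_payment(order)
    if not payment["authorized"]:
        return "payment failed"
    ship_items(order)
    return "fulfilled"


# --- Event-driven style: services react to events, nobody owns the whole flow ---
class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)
        self.log = []  # published event names, kept for illustration only

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def publish(self, event_name, payload):
        self.log.append(event_name)
        for handler in list(self._handlers[event_name]):
            handler(payload)


bus = EventBus()
# The payment service reacts to confirmed orders ...
bus.subscribe("order_confirmed",
              lambda order: bus.publish("payment_authorized", order))
# ... and the warehouse service reacts to authorized payments.
bus.subscribe("payment_authorized",
              lambda order: bus.publish("shipment_requested", order))

result = fulfill_order({"id": 42})
bus.publish("order_confirmed", {"id": 42})
```

In the first variant a new step means changing `fulfill_order`; in the second, a new service subscribes to an existing event without touching the others, but the overall flow is no longer visible in one place.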
![Page 58: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/58.jpg)
The communication paradigm influences the functional service design a lot
and also the resilience patterns to be used
![Page 59: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/59.jpg)
Example: order fulfillment
• Simple order, credit card, non-digital items
• Add coupons incl. validation
• Add promotions incl. usage notification
• Add bonus card incl. purchase notification
• Customer accounts as payment type
• PayPal as payment type
• Integrate digital music library
• Integrate digital video library
• Integrate e-book library
![Page 60: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/60.jpg)
Design exercise – Part 1
Create a bulkhead design for the case study
• Use one communication paradigm
  • Synchronous request/response (e.g., REST)
  • Asynchronous messaging (e.g., Akka)
  • Asynchronous events (e.g., Pub/Sub)
• Assume incremental requirements
  • How many services do you need to touch?
  • What about the functional isolation of the services?
  • How big/maintainable are the resulting services?
• Take a few notes
![Page 61: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/61.jpg)
Customer pressed “Buy now” → ?
[Diagram: Online Shop Checkout and the surrounding systems: Credit Card Provider, Warehouse System, Coupon Management, Campaign Management, PayPal, Loyalty Management, Accounts Receivables, Music Library, E-Book Library, Video Library, E-Mail Server]
![Page 62: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/62.jpg)
[Diagram: a service design for the order fulfillment example; legend: <Foreign Service>, <Own Service>]
Services and systems shown: Online Shop, Order Fulfillment Service, Payment Service, Shipment Service, Account Service, Payment Provider, Credit Card Provider, Warehouse System, Coupon Management, Campaign Management, Loyalty Management, Accounts Receivables, PayPal, Music Library, E-Book Library, Video Library, E-Mail Server
Further labels: Coordinate (three times), Coupon, Promotion, Loyalty, Credit Card, PayPal, Warehouse, Assets, Notify Cust.
![Page 63: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/63.jpg)
[Diagram: event-driven design for the order fulfillment example; legend: <Foreign Service>, <Own Service>, <Event>]
Events: Order confirmed, Payment authorized, Payment failed, Digital asset provisioned
Services: Account Service, Credit Card Service, PayPal Service, Warehouse Service, Promotion Service, Bonus Card Service, Coupon Service, Music Library Service, Video Library Service, E-Book Library Service, Notification Service
Systems: Online Shop, Credit Card Provider, Warehouse System, Coupon Management, Campaign Management, Loyalty Management, Accounts Receivables, Music Library, E-Book Library, Video Library, E-Mail Server, PayPal
Order fulfillment supervisor
• Track flow of events
• Reschedule events in case of failure
Services are responsible for eventually succeeding or failing for good, usually incorporating a supervision/escalation hierarchy for that
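A rough sketch of what a supervisor that tracks the flow of events and reschedules them could look like. The class `FulfillmentSupervisor`, its methods and the `max_attempts` parameter are hypothetical; the talk only names the two responsibilities. For brevity the sketch has no real timers: every expectation that is still open when `check_overdue` runs counts as overdue.

```python
class FulfillmentSupervisor:
    """Tracks which follow-up events are still outstanding per order."""

    def __init__(self, republish, max_attempts=3):
        self._republish = republish      # callable(event_name, order_id)
        self._max_attempts = max_attempts
        self._pending = {}               # (order_id, expected_event) -> (trigger_event, attempts)
        self.escalated = []              # flows handed over to a human or parent supervisor

    def expect(self, order_id, trigger_event, expected_event):
        """Record that trigger_event should eventually lead to expected_event."""
        self._pending[(order_id, expected_event)] = (trigger_event, 1)

    def observe(self, order_id, event_name):
        """Called for every event seen; clears the matching expectation."""
        self._pending.pop((order_id, event_name), None)

    def check_overdue(self):
        """Called periodically: reschedule outstanding work or escalate."""
        for key, (trigger_event, attempts) in list(self._pending.items()):
            order_id, _expected_event = key
            if attempts >= self._max_attempts:
                del self._pending[key]
                self.escalated.append(key)
            else:
                self._pending[key] = (trigger_event, attempts + 1)
                self._republish(trigger_event, order_id)


sent = []  # stands in for the event channel in this example
supervisor = FulfillmentSupervisor(lambda name, order_id: sent.append((name, order_id)))
supervisor.expect(1, "order_confirmed", "payment_authorized")
supervisor.expect(2, "order_confirmed", "payment_authorized")
supervisor.observe(1, "payment_authorized")  # order 1 progressed normally
supervisor.check_overdue()                   # order 2 is rescheduled
```

Rescheduling an event means a service may see it more than once, so this style only works if the event handlers are idempotent.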
![Page 64: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/64.jpg)
Do not limit your design options upfront without an important reason
![Page 65: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/65.jpg)
Wrap-up
• Today’s systems are distributed
• Failures are not avoidable, nor predictable
• Resilient software design needed
• Bulkhead design is
  • crucial for application robustness
  • poorly understood
  • massively underrated
  • different from traditional design best practices
• Communication paradigms broaden your bulkhead design options
![Page 66: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/66.jpg)
We have to re-learn design for distributed systems
![Page 67: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/67.jpg)
@ufried Uwe Friedrichsen | [email protected] | http://slideshare.net/ufried | http://ufried.tumblr.com
![Page 68: Resilient Functional Service Design](https://reader034.vdocuments.us/reader034/viewer/2022052300/58a95c391a28ab77408b687d/html5/thumbnails/68.jpg)