from mainframe to microservice: an introduction to distributed systems
TRANSCRIPT
![Page 1: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/1.jpg)
From Mainframe to Microservice
An Introduction to Distributed Systems
@tyler_treat Workiva
![Page 2: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/2.jpg)
An Introduction to Distributed Systems❖ Building a foundation of understanding
❖ Why distributed systems?
❖ Universal fallacies
❖ Characteristics and the CAP theorem
❖ Common pitfalls
❖ Digging deeper
❖ Byzantine Generals Problem and consensus
❖ Split-brain
❖ Hybrid consistency models
❖ Scaling shared data and CRDTs
![Page 3: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/3.jpg)
–Leslie Lamport
“A distributed system is one in which the failure of a computer you didn't even know existed can
render your own computer unusable.”
![Page 4: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/4.jpg)
Scale Up vs. Scale Out
❖ Add resources to a node ❖ Increases node capacity, load
is unaffected ❖ System complexity unaffected
Vertical Scaling❖ Add nodes to a cluster ❖ Decreases load, capacity is
unaffected ❖ Availability and throughput w/
increased complexity
Horizontal Scaling
![Page 5: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/5.jpg)
A distributed system is a collection of independent computers that behave as a single coherent system.
![Page 6: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/6.jpg)
Why Distributed Systems?
Availability
Fault Tolerance
Throughput
Architecture
Economics
serve every request
resilient to failures
parallel computation
decoupled, focused services
scale-out becoming manageable/cost-effective
![Page 7: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/7.jpg)
oh shit…
![Page 8: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/8.jpg)
–Ken Arnold
“You have to design distributed systems with the
expectation of failure.”
![Page 9: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/9.jpg)
Distributed systems engineers are the world’s biggest pessimists.
![Page 10: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/10.jpg)
Universal Fallacy #1The network is reliable.
❖ Message delivery is never guaranteed ❖ Best effort ❖ Is it worth it? ❖ Resiliency/redundancy/failover
![Page 11: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/11.jpg)
Universal Fallacy #2Latency is zero.
❖ We cannot defy the laws of physics ❖ LAN to WAN deteriorates quickly ❖ Minimize network calls (batch) ❖ Design asynchronous systems
![Page 12: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/12.jpg)
Universal Fallacy #3Bandwidth is infinite.
❖ Out of our control ❖ Limit message sizes ❖ Use message queueing
![Page 13: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/13.jpg)
Universal Fallacy #4The network is secure.
❖ Everyone is out to get you ❖ Build in security from day 1 ❖ Multi-layered ❖ Encrypt, pentest, train developers
![Page 14: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/14.jpg)
Universal Fallacy #5Topology doesn’t change.
❖ Network topology is dynamic ❖ Don’t statically address hosts ❖ Collection of services, not nodes ❖ Service discovery
![Page 15: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/15.jpg)
Universal Fallacy #6There is one administrator.
❖ May integrate with third-party systems ❖ “Is it our problem or theirs?” ❖ Conflicting policies/priorities ❖ Third parties constrain; weigh the risk
![Page 16: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/16.jpg)
Universal Fallacy #7Transport cost is zero.
❖ Monetary and practical costs ❖ Building/maintaining a network is not
trivial ❖ The “perfect” system might be too costly
![Page 17: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/17.jpg)
Universal Fallacy #8The network is homogenous.
❖ Networks are almost never homogenous ❖ Third-party integration? ❖ Consider interoperability ❖ Avoid proprietary protocols
![Page 18: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/18.jpg)
These problems apply to LAN and WAN systems (single-data-center and cross-data-center)
No one is safe.
![Page 19: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/19.jpg)
–Murphy’s Law
“Anything that can go wrong will go wrong.”
![Page 20: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/20.jpg)
![Page 21: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/21.jpg)
Characteristics of a Reliable Distributed System
Fault-tolerant
Available
Scalable
Consistent
Secure
Performant
nodes can fail
serve all the requests, all the time
behave correctly with changing topologies
state is coordinated across nodes
access is authenticated
it’s fast!
![Page 22: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/22.jpg)
![Page 23: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/23.jpg)
Distributed systems are all about trade-offs.
![Page 24: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/24.jpg)
CAP Theorem
❖ Presented in 1998 by Eric Brewer
❖ Impossible to guarantee all three:
❖ Consistency
❖ Availability
❖ Partition tolerance
![Page 25: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/25.jpg)
Consistency
![Page 26: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/26.jpg)
Consistency❖ Linearizable - there exists a total order of all state
updates and each update appears atomic
❖ E.g. mutexes make operations appear atomic
❖ When operations are linearizable, we can assign a unique “timestamp” to each one (total order)
❖ A system is consistent if every node shares the same total order
❖ Consistency which is both global and instantaneous is impossible
![Page 27: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/27.jpg)
Consistency
Eventual consistencyreplicas allowed to diverge, eventually converge
Strong consistencyreplicas can’t diverge; requires linearizability
![Page 28: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/28.jpg)
Availability
❖ Every request received by a non-failing node must be served
❖ If a piece of data required for a request is unavailable, the system is unavailable
❖ 100% availability is a myth
![Page 29: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/29.jpg)
Partition Tolerance
❖ A partition is a split in the network—many causes
❖ Partition tolerance means partitions can happen
❖ CA is easy when your network is perfectly reliable
❖ Your network is not perfectly reliable
![Page 30: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/30.jpg)
Partition Tolerance
![Page 31: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/31.jpg)
Common Pitfalls
❖ Halting failure - machine stops
❖ Network failure - network connection breaks
❖ Omission failure - messages are lost
❖ Timing failure - clock skew
❖ Byzantine failure - arbitrary failure
![Page 32: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/32.jpg)
Exploring some higher-level concepts
Digging Deeper
![Page 33: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/33.jpg)
Byzantine Generals Problem
❖ Consider a city under siege by two allied armies
❖ Each army has a general
❖ One general is the leader
❖ Armies must agree when to attack
❖ Must use messengers to communicate
❖ Messengers can be captured by defenders
![Page 34: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/34.jpg)
Byzantine Generals Problem
![Page 35: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/35.jpg)
Byzantine Generals Problem
❖ Send 100 messages, attack no matter what
❖ A might attack without B
❖ Send 100 messages, wait for acks, attack if confident
❖ B might attack without A
❖ Messages have overhead
❖ Can’t reliably make decision (provenly impossible)
![Page 36: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/36.jpg)
Distributed Consensus
❖ Replace 2 generals with N generals
❖ Nodes must agree on data value
❖ Solutions:
❖ Multi-phase commit
❖ State replication
![Page 37: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/37.jpg)
Two-Phase Commit
❖ Blocking protocol
❖ Coordinator waits for cohorts
❖ Cohorts wait for commit/rollback
❖ Can deadlock
![Page 38: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/38.jpg)
Three-Phase Commit
❖ Non-blocking protocol
❖ Abort on timeouts
❖ Susceptible to network partitions
![Page 39: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/39.jpg)
State Replication
❖ E.g. Paxos, Raft protocols
❖ Elect a leader (coordinator)
❖ All changes go through leader
❖ Each change appends log entry
❖ Each node has log replica
![Page 40: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/40.jpg)
State Replication
❖ Must have quorum (majority) to proceed
❖ Commit once quorum acks
❖ Quorums mitigate partitions
❖ Logs allow state to be rebuilt
![Page 41: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/41.jpg)
Split-Brain
![Page 42: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/42.jpg)
Split-Brain
![Page 43: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/43.jpg)
Split-Brain
![Page 44: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/44.jpg)
Split-Brain
❖ Optimistic (AP) - let partitions work as usual
❖ Pessimistic (CP) - quorum partition works, fence others
![Page 45: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/45.jpg)
Hybrid Consistency Models❖ Weak == available, low latency, stale reads
❖ Strong == fresh reads, less available, high latency
❖ How do you choose a consistency model?
❖ Hybrid models
❖ Weaker models when possible (likes, followers, votes)
❖ Stronger models when necessary
❖ Tunable consistency models (Cassandra, Riak, etc.)
![Page 46: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/46.jpg)
Scaling Shared Data❖ Sharing mutable data at large scale is difficult
❖ Solutions:
❖ Immutable data
❖ Last write wins
❖ Application-level conflict resolution
❖ Causal ordering (e.g. vector clocks)
❖ Distributed data types (CRDTs)
![Page 47: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/47.jpg)
Scaling Shared Data
Imagine a shared, global counter…
“Get, add 1, and put” transaction will not scale
![Page 48: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/48.jpg)
CRDT
❖ Conflict-free Replicated Data Type
❖ Convergent: state-based
❖ Commutative: operations-based
❖ E.g. distributed sets, lists, maps, counters
❖ Update concurrently w/o writer coordination
![Page 49: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/49.jpg)
CRDT❖ CRDTs always converge (provably)
❖ Operations commute (order doesn’t matter)
❖ Highly available, eventually consistent
❖ Always reach consistent state
❖ Drawbacks:
❖ Requires knowledge of all clients
❖ Must be associative, commutative, and idempotent
![Page 50: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/50.jpg)
G-Counter
![Page 51: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/51.jpg)
CRDT
❖ Add to set is associative, commutative, idempotent ❖ add(“a”), add(“b”), add(“a”) => {“a”, “b”}
❖ Adding and removing items is not ❖ add(“a”), remove(“a”) => {}
❖ remove(“a”), add(“a”) => {“a”}
❖ CRDTs require interpretation of common data structures w/ limitations
![Page 52: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/52.jpg)
Two-Phase Set❖ Use two sets, one for adding, one for removing
❖ Elements can be added once and removed once
❖ { “a”: [“a”, “b”, “c”], “r”: [“a”]}
❖ => {“b”, “c”}
❖ add(“a”), remove(“a”) => {“a”: [“a”], “r”: [“a”]}
❖ remove(“a”), add(“a”) => {“a”: [“a”], “r”: [“a”]}
![Page 53: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/53.jpg)
![Page 54: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/54.jpg)
Let’s Recap...
![Page 55: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/55.jpg)
Distributed architectures allow us to build highly available, fault-tolerant systems.
![Page 56: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/56.jpg)
We can't live in this fantasy land where everything works perfectly
all of the time.
![Page 57: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/57.jpg)
![Page 58: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/58.jpg)
![Page 59: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/59.jpg)
Shit happens — network partitions, hardware failure, GC pauses, latency, dropped packets…
![Page 60: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/60.jpg)
Build resilient systems.
![Page 61: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/61.jpg)
Design for failure.
![Page 62: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/62.jpg)
kill -9
![Page 63: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/63.jpg)
Consider the trade-off between consistency and availability.
![Page 64: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/64.jpg)
Partition tolerance is not an option, it’s required.
(if you’re building a distributed system)
![Page 65: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/65.jpg)
Use weak consistency when possible, strong when necessary.
![Page 66: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/66.jpg)
Sharing data at scale is hard, let’s go shopping.
(or consider your options)
![Page 67: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/67.jpg)
State is hell.
![Page 68: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/68.jpg)
Further Readings❖ Jepsen series
Kyle Kingsbury (aphyr)
❖ A Comprehensive Study of Convergent and Commutative Replicated Data TypesShapiro et al.
❖ In Search of an Understandable Consensus Algorithm Ongaro et al.
❖ CAP Twelve Years Later Eric Brewer
❖ Many, many more…
![Page 69: From Mainframe to Microservice: An Introduction to Distributed Systems](https://reader036.vdocuments.us/reader036/viewer/2022062308/55d57882bb61ebae2f8b45ba/html5/thumbnails/69.jpg)
@tyler_treat github.com/tylertreatbravenewgeek.com
Thanks!