Top Mistakes When Writing Reactive Applications - Scala by the Bay 2016
Petr Zapletal @petr_zapletal @ScalaByTheBay
@cakesolutions
Agenda
● Motivation
● Actors vs Futures
● Serialization
● Graceful Shutdown
● Distributed Transactions
● Longtail Latencies
● Quick Tips
Pick the Right Tool for the Job
[Diagram labels: Local Abstractions, Distribution, Akka Typed, Akka Actors, Akka Stream, Power, Constraints]
Actor Use Cases
● State management
● Location transparency
● Resilience mechanisms
● Single writer
● In-memory lock-free cache
● Sharding
Avoid Java Serialization
Java serialization is the default in Akka because it is easy to start with, but it is very slow and has a heavy footprint
Java Serialization - Footprint
Java Serialization:
----sr--model.Order----h#-----J--idL--customert--Lmodel/Customer;L--descriptiont--Ljava/lang/String;L--orderLinest--Ljava/util/List;L--totalCostt--Ljava/math/BigDecimal;xp--------ppsr--java.util.ArrayListx-----a----I--sizexp----w-----sr--model.OrderLine--&-1-S----I--lineNumberL--costq-~--L--descriptionq-~--L--ordert--Lmodel/Order;xp----sr--java.math.BigDecimalT--W--(O---I--scaleL--intValt--Ljava/math/BigInteger;xr--java.lang.Number-----------xp----sr--java.math.BigInteger-----;-----I--bitCountI--bitLengthI--firstNonzeroByteNumI--lowestSetBitI--signum[--magnitudet--[Bxq-~----------------------ur--[B------T----xp----xxpq-~--xq-~--
XML:
<order id="0" totalCost="0"><orderLines lineNumber="1" cost="0"><order>0</order></orderLines></order>
JSON:
{"order":{"id":0,"totalCost":0,"orderLines":[{"lineNumber":1,"cost":0,"order":0}]}}
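The footprint gap above can be checked with a few lines of standard-library code. This is only a minimal sketch: `Order` and `OrderLine` here are simplified, hypothetical stand-ins for the model in the slide, and the JSON is built by hand rather than with a real library.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.nio.charset.StandardCharsets

object FootprintSketch {
  // Hypothetical, simplified stand-ins for the Order model shown above.
  final case class OrderLine(lineNumber: Int, cost: Int)
  final case class Order(id: Long, totalCost: Int, orderLines: List[OrderLine])

  // Number of bytes produced by default Java serialization.
  def javaSerializedSize(value: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.size()
  }

  // Hand-written JSON rendering, for size comparison only.
  def toJson(o: Order): String = {
    val lines = o.orderLines
      .map(l => s"""{"lineNumber":${l.lineNumber},"cost":${l.cost}}""")
      .mkString(",")
    s"""{"order":{"id":${o.id},"totalCost":${o.totalCost},"orderLines":[$lines]}}"""
  }

  // Number of bytes in the UTF-8 encoded JSON.
  def jsonSize(o: Order): Int = toJson(o).getBytes(StandardCharsets.UTF_8).length
}
```

Comparing `javaSerializedSize(order)` with `jsonSize(order)` for a small order shows how much of the Java payload is class and field metadata rather than data.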
Points of Interest
● Performance
● Footprint
● Schema evolution
● Implementation effort
● Human readability
● Language bindings
● Backwards & forwards compatibility
● ...
JSON
● Advantages:
○ Human readability
○ Simple & well known
○ Many good libraries for all platforms
● Disadvantages:
○ Slow
○ Large
○ Object names included
○ No schema (except e.g. json schema)
○ Format and precision issues
● json4s, circe, µPickle, spray-json, argonaut, rapture-json, play-json, …
Binary formats [Schema-less]
● Metadata sent together with data
● Advantages:
○ Implementation effort
○ Performance
○ Footprint *
● Disadvantages:
○ No human readability
● Kryo, Binary JSON (MessagePack, BSON, ... )
Binary formats [Schema]
● Schema defined by some kind of DSL
● Advantages:
○ Performance
○ Footprint
○ Schema evolution
● Disadvantages:
○ Implementation effort
○ No human readability
● Protobuf (+ projects like Flatbuffers, Cap’n Proto, etc.), Thrift, Avro
Summary
● The default (Java serialization) should always be changed
● Depends on particular use case
● Quick tips:
○ json4s
○ kryo
○ protobuf
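Switching the serializer is mostly a configuration change. A minimal sketch of what this can look like for protobuf, assuming the `akka.actor.serializers` and `akka.actor.serialization-bindings` keys and the protobuf serializer that ships with akka-remote; check the exact keys against the Akka version in use:

```
akka {
  actor {
    serializers {
      # Ships with akka-remote; handles protobuf-generated message classes.
      proto = "akka.remote.serialization.ProtobufSerializer"
    }
    serialization-bindings {
      # Route every protobuf message through the serializer declared above.
      "com.google.protobuf.Message" = proto
    }
  }
}
```

Other serializers (e.g. Kryo) are registered the same way, with their own serializer class and bindings.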
Graceful Shutdown
We have thousands of sharded actors on multiple nodes and we want to shut one of them down
High-level Procedure
1. JVM gets the shutdown signal
2. Coordinator tells all local ShardRegions to shut down gracefully
3. Node leaves cluster
4. Coordinator gives singletons a grace period to migrate
5. Actor System & JVM Termination
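The ordering of these steps can be sketched without any Akka dependency. In the sketch below, `Phase` and `runPhases` are hypothetical names, and the actual work of each step (stopping shard regions, leaving the cluster, waiting for singletons, terminating the actor system) is assumed to be supplied by the caller as a function returning a `Future`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.Try

object ShutdownSketch {
  // A named shutdown step with its own time budget (all names hypothetical).
  final case class Phase(name: String, timeout: FiniteDuration, run: () => Future[Unit])

  // Runs the phases strictly in order. A phase that fails or times out is
  // recorded as unsuccessful and the sequence continues, so a stuck step
  // cannot prevent the later steps from being attempted.
  def runPhases(phases: List[Phase]): List[(String, Boolean)] =
    phases.map { phase =>
      val succeeded = Try(Await.result(phase.run(), phase.timeout)).isSuccess
      phase.name -> succeeded
    }
}
```

One possible use is to call `ShutdownSketch.runPhases(phases)` from a JVM shutdown hook registered with `sys.addShutdownHook`. Blocking with `Await` is acceptable there because the hook runs on its own thread, not on an actor dispatcher.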
Integration with Sharded Actors
● Handling of added messages
○ Passivate() message for graceful stop
○ context.stop() for immediate stop
● Priority mailbox
○ Priority message handling
○ Message retrying support
Summary
● We don’t want to lose data (usually)
● Shutdown coordinator on every node
● Integration with sharded actors
Distributed Transactions
Any situation where a single event results in the mutation of two separate sources of data which cannot be committed atomically
What’s Wrong With Them
● Simple happy paths
● 8 Fallacies of Distributed Computing
○ The network is reliable.
○ Latency is zero.
○ Bandwidth is infinite.
○ The network is secure.
○ Topology doesn't change.
○ There is one administrator.
○ Transport cost is zero.
○ The network is homogeneous.
Two-phase commit (2PC)
[Diagram: Stage 1 (Prepare): the Transaction Manager sends Prepare to each Resource Manager and each replies Prepared. Stage 2 (Commit): the Transaction Manager sends Commit to each Resource Manager and each replies Committed.]
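The two stages can be sketched in a few lines. This is an in-memory illustration only: `ResourceManager`, `InMemoryResource`, and `run` are hypothetical names, and the sketch deliberately ignores crashes, timeouts, and lost messages, which are exactly the cases that make real 2PC hard.

```scala
object TwoPhaseCommitSketch {
  // Hypothetical participant interface.
  trait ResourceManager {
    def prepare(): Boolean // Stage 1 vote: true means "Prepared"
    def commit(): Unit     // Stage 2
    def rollback(): Unit   // undo a successful prepare
  }

  sealed trait Outcome
  case object Committed extends Outcome
  case object RolledBack extends Outcome

  // Transaction manager: commit only if every participant votes yes.
  def run(participants: List[ResourceManager]): Outcome = {
    // Stage 1: ask in order, stopping at the first "no" vote.
    val prepared = participants.takeWhile(_.prepare())
    if (prepared.size == participants.size) {
      // Stage 2: everyone is prepared, so commit everywhere.
      participants.foreach(_.commit())
      Committed
    } else {
      // Someone voted no: roll back those that had already prepared.
      prepared.foreach(_.rollback())
      RolledBack
    }
  }

  // Trivial participant used to exercise the protocol.
  final class InMemoryResource(votesYes: Boolean) extends ResourceManager {
    var state: String = "idle"
    def prepare(): Boolean = { state = if (votesYes) "prepared" else "aborted"; votesYes }
    def commit(): Unit = state = "committed"
    def rollback(): Unit = state = "rolled back"
  }
}
```

Even in this happy-path form, every participant holds its prepared state until the transaction manager decides. That waiting is one reason the approach is fragile and scales poorly.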
The Big Trade-Off
● Distributed transactions can usually be avoided
○ Hard, expensive, fragile and do not scale
● Every business event needs to result in a single synchronous commit
● Other data sources should be updated asynchronously
● Introducing eventual consistency
Longtail Latencies
Consider a system where each service typically responds in 10ms but with a 99th percentile latency of one second
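A short worked example of why this matters, under the assumption (not stated on the slide) that a request fans out to several such services in parallel and that their slow responses are independent. The chance that at least one of n calls lands in the slow 1% is 1 - 0.99^n. The function name below is hypothetical.

```scala
object TailMath {
  // Probability that at least one of `fanOut` independent calls is slow,
  // when each call is slow with probability `slowFraction`.
  def probAtLeastOneSlow(fanOut: Int, slowFraction: Double = 0.01): Double =
    1.0 - math.pow(1.0 - slowFraction, fanOut.toDouble)
}
```

With a fan-out of 100, `TailMath.probAtLeastOneSlow(100)` is roughly 0.63, so most user requests would take over a second even though each individual service is slow only 1% of the time.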
Longtail Latencies
[Chart: Latency Normal vs. Longtail. X-axis: Percentile (0, 25, 50, 75, 90, 99, 99.9); y-axis: Latency (ms), 0 to 50; series: Normal, Longtail]
Longtails really matter
● Latency accumulation
● Not just noise
● Don’t have to be power users
● Real problem
Tolerating Longtail Latencies
● Hedging your bet
● Tied requests
● Selectively increase replication factors
● Put slow machines on probation
● Consider ‘good enough’ responses
● Hardware update
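"Hedging your bet" can be sketched with plain Scala futures: send the request to one replica, and if it has not answered within a short delay, send the same request to a second replica and take whichever answer arrives first. The names `hedged`, `primary`, `backup`, and `hedgeAfter` are hypothetical, and the sketch assumes the request is safe to execute twice.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object HedgedRequestSketch {
  // Starts `primary` immediately. If no result (success or failure) has
  // arrived after `hedgeAfter`, also starts `backup`. The returned future
  // completes with whichever request finishes first.
  def hedged[A](primary: () => Future[A], backup: () => Future[A], hedgeAfter: FiniteDuration)(
      implicit ec: ExecutionContext, scheduler: ScheduledExecutorService): Future[A] = {
    val result = Promise[A]()
    primary().onComplete(result.tryComplete)

    // Fire the backup request only if the primary is still outstanding.
    val hedge = scheduler.schedule(new Runnable {
      def run(): Unit =
        if (!result.isCompleted) backup().onComplete(result.tryComplete)
    }, hedgeAfter.toMillis, TimeUnit.MILLISECONDS)

    // Once there is a result, the pending hedge is no longer needed.
    result.future.onComplete(_ => hedge.cancel(false))
    result.future
  }
}
```

A scheduler can be created with `Executors.newSingleThreadScheduledExecutor()`. Setting `hedgeAfter` near the 95th percentile latency keeps the extra load small, because only the slowest few percent of requests are sent twice.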
Quick Tips
● Monitoring
● Network partitions & Split Brain Resolver
● Blocking
● Too many actor systems
● Error Handling
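For the blocking tip, one common remedy is to isolate blocking calls on their own bounded thread pool so they cannot starve the threads that run everything else. The sketch below uses only the Scala standard library; the pool size of 4 and the names `blockingEc` and `blockingCall` are hypothetical. In Akka the same idea is usually expressed as a dedicated dispatcher.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService, Future}

object BlockingSketch {
  // Dedicated, bounded pool reserved for blocking calls (size is an assumption).
  val blockingEc: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4))

  // Runs a blocking body on the dedicated pool instead of the default one.
  def blockingCall[A](body: => A): Future[A] = Future(body)(blockingEc)
}
```

Callers wrap the blocking operation, e.g. a JDBC query, in `BlockingSketch.blockingCall { ... }` and compose the resulting future as usual. The pool should be shut down with `blockingEc.shutdown()` when the application stops.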
References
● http://www.slideshare.net/ktoso/zen-of-akka
● http://eishay.github.io/jvm-serializers/prototype-results-page/
● http://java-persistence-performance.blogspot.com/2013/08/optimizing-java-serialization-java-vs.html
● https://github.com/romix/akka-kryo-serialization
● http://gotocon.com/dl/goto-chicago-2015/slides/CaitieMcCaffrey_ApplyingTheSagaPattern.pdf
● http://www.grahamlea.com/2016/08/distributed-transactions-microservices-icebergs/
● http://www.cs.duke.edu/courses/cps296.4/fall13/838-CloudPapers/dean_longtail.pdf
● https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency
● http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html