storm at forter
TRANSCRIPT
![Page 1: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/1.jpg)
Storm at
Picture https://www.flickr.com/photos/silentmind8/15865860242 by silentmind8 under CC BY 2.0 http://creativecommons.org/licenses/by/2.0/
![Page 2: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/2.jpg)
Forter• We detect fraud
• A lot of data is collected
• New data can introduce new data sources
• At transaction time, we do our magic. Fast.
• We deny less
![Page 3: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/3.jpg)
What’s Storm?• Streaming/data-pipeline infrastructure
• What’s a pipeline?
• “Topology” driven flow, static
• Written over JVM and also supports Python and Node.js
• Easy clustering
• Apache top level project, large community
![Page 4: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/4.jpg)
Storm Lingo• Tuples
• The basic data transfer object in storm. Basically a dictionary (key->val).
• Spouts
• Entry points into the pipe. This is where data comes from.
• Bolts
• Components that can transform and route tuples
• Joins
• Joins are where async branches of the topology meet and join
• Streams
• Streams allow for flow control in the topology
![Page 5: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/5.jpg)
System challenges• Latency should be determined by business needs -
flexible per customer (300ms - customers who just don’t care)
• Data dependencies in decision part can get very complex
• Getting data can be slow, especially 3rd party
• Data scientists write in Python
• Should be scaleable, because we’re ever growing
• Should be very granularly monitored
![Page 6: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/6.jpg)
Bird’s eye view
• Two systems:
• System 1: data prefetching & preparing
• System 2: decision engine, must have all available data handy at TX time
![Page 7: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/7.jpg)
System 1: high throughput pipeline
• Stream Batching
• Prefetching / Preparing
• Common use case, lots of competitors
![Page 8: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/8.jpg)
System 2: low latency decision
• Dedicated everything
• Complex dependency graph
• Less common, fewer players
![Page 9: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/9.jpg)
System 1High Throughput
![Page 10: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/10.jpg)
Cache and cache layering• Storm constructs make it easy to tweak caches,
add enrichment steps transparently
• Different enrichment operations may require different execution power
• Each operation can be replaced by a sub-topology - layering of cache levels
• Field grouping allows the ability to maintain state in components - local cache or otherwise
![Page 11: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/11.jpg)
![Page 12: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/12.jpg)
Maintain a stored state• Many events coming in, some cause a state to
change
• State of a working set is saved in memory
• New/old states are fetched from an external data source
• Sate updates are saved immediately
• State machine is scalable - again, field grouping
![Page 13: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/13.jpg)
![Page 14: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/14.jpg)
And the rest…
• Batching content for writing (Storm’s tick tuples)
• Aggregating events in memory
• Throttling/Circuit-breaking external calls
![Page 15: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/15.jpg)
System 2: Low Latency
![Page 16: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/16.jpg)
Unique Challenges• Scaling. Resources need to be very dedicated,
parallelizing is bad
• Join logic is much stricter, with short timeouts
• Data validity is crucial for the stream routing
• Error handling
• Component graph is immense and hard to contain mentally - especially considering the delicate time window configurations.
![Page 17: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/17.jpg)
Scalability• Each topology is built to handle a fixed number of
parallel TXs. Storm’s max-spout-pending
• Each topology atomically polls a queue
• Trying to keep as much of the logic in the same process to reduce network and serialization costs
• Latency is the only measure
![Page 18: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/18.jpg)
Joining and errors• Waiting is not an option
• Tick tuples no good, break the single thread illusion
• Static topologies are easy to analyze and edit in runtime, and intervene
• Fallback streams are an elegant solution to the problem, preventing developers from explicitly defining escape routes
• Also allow for “try->finally” semantics
![Page 19: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/19.jpg)
Multilang• Storm allows running bolt processes (shell-bolt)
with the builtin capability of communicating through standard i/o
• Not hugely scalable, but works
• Implemented are: Node.js (our contribution) and Python
• We use for legacy and to keep data scientists happy
![Page 20: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/20.jpg)
Data Validity• Wrapping the bolts, we implemented contracts for
outputs
• Java POJOs with Hibernate Validator
• Contracts allow us “hard-typing” the links in the topologies
• Also help minimize data flow, especially to shell-bolts
• Checkout storm-data-contracts on github
![Page 21: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/21.jpg)
Managing Complexity• Complexity of the data dependencies is maintained
by literally drawing it.
• Nimbus REST APIs offer access to the topology layout
• Timing complexity reduced by synchronizing the joins to a shared point-in-time. Still pretty complex.
• Proves better than our previous iterative solution
![Page 22: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/22.jpg)
Monitoring• Nimbus metrics give out averages - not
good enough
• Reimann used to efficiently monitor latencies for every tuple in the system
• Inherent low latency monitoring issue: CPU utilization monitoring
• More at Itai Frenkel’s lecture
![Page 23: Storm at Forter](https://reader031.vdocuments.us/reader031/viewer/2022022411/58f052ad1a28abbc1f8b458b/html5/thumbnails/23.jpg)
Questions?
Contact info:
Re’em Bensimhon [email protected] / [email protected] linkedin.com/in/bensimhon twitter: @reembs