comparing processing frameworks v7
Post on 18-Aug-2015
Spark vs Storm

                     Spark                          Storm
Implemented in       Scala                          Clojure, Java
Delivery semantics   Exactly once                   At least once; exactly once with Trident
API                  Python, Java, Scala            Java, Scala, Clojure, Python, etc.
                                                    Trident: Java, Scala, Clojure
Processing model     Batch; micro-batches with      Record at a time; Trident allows
                     Spark Streaming (~500 ms)      micro-batches
Latency              1-2 seconds                    Sub-second
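The delivery-semantics row above is worth unpacking: "at least once" means a tuple may be replayed after a failure, so a consumer sees duplicates. A minimal sketch (plain Python, not Storm or Trident code) of how tracking message IDs turns at-least-once delivery into effectively-exactly-once processing:

```python
# Illustration only: deduplicating replayed tuples by message ID.
# The message IDs and handler below are hypothetical, not a Storm API.

def process_stream(messages, handler):
    """Apply handler at most once per message ID, skipping replays."""
    seen_ids = set()
    for msg_id, payload in messages:
        if msg_id in seen_ids:   # duplicate from an at-least-once replay
            continue
        seen_ids.add(msg_id)
        handler(payload)

counts = {}
def count_word(word):
    counts[word] = counts.get(word, 0) + 1

# Tuple with ID 2 is delivered twice, as an at-least-once system may do;
# it is counted only once.
process_stream([(1, "spark"), (2, "storm"), (2, "storm"), (3, "spark")],
               count_word)
```

Trident achieves a similar effect at scale by tracking batch transaction IDs in state, rather than per-message sets.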
Node specifications

Spark Streaming:
● 4 AWS m3.medium nodes
● ZooKeeper 3.4.6, Kafka 0.8.2.1
● Spark (Streaming) 1.3
● 1 master node, 3 workers

Storm:
● 4 AWS m3.medium nodes
● ZooKeeper 3.4.6, Kafka 0.8.2.1
● Storm 0.9.5
Cluster Configuration
Master node
Worker 1
Worker 2
Worker 3
Metric
Throughput: the amount of data processed per unit time, measured:
● by changing the batch size
● by changing the load (i.e., scaling up)
The program used for benchmarking is word count.
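A minimal sketch of the measurement: a pure-Python word count standing in for the Spark/Storm topology, timed to yield a records-per-second throughput figure. Function names here are illustrative, not from the benchmark's actual code.

```python
import time

def word_count(lines):
    """Pure-Python stand-in for the word-count benchmark program."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def throughput(lines):
    """Records processed per second for one batch of input lines."""
    start = time.perf_counter()
    word_count(lines)
    elapsed = time.perf_counter() - start
    return len(lines) / elapsed

# One synthetic batch; in the real setup the load comes from Kafka producers.
batch = ["the quick brown fox", "jumps over the lazy dog"] * 10_000
rate = throughput(batch)
```

Varying the batch size (Spark) or the number of producers (both systems) changes `lines` per interval, which is exactly the knob the test matrices below turn.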
# Producers   Batch interval
1             1 s, 2 s, 3 s, 4 s, 6 s
4             1 s, 2 s, 3 s, 4 s, 6 s
8             1 s, 2 s, 3 s, 4 s, 6 s
Tests for Spark Streaming
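The matrix above is a full cross product of producer counts and batch intervals, which can be sketched as a run list (variable names assumed, not from the benchmark scripts):

```python
from itertools import product

# Every (# producers, batch interval) pair in the table is one benchmark run.
producers = [1, 4, 8]
batch_intervals_s = [1, 2, 3, 4, 6]

runs = list(product(producers, batch_intervals_s))
# 3 producer counts x 5 intervals = 15 Spark Streaming runs
```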
# Producers   Window for tuples emitted/acked
1             10 min
4             10 min
8             10 min
Tests for Storm
Preliminary results for Storm
Takeaways
● Set the batch interval in Spark Streaming by monitoring processing times and load size.
● For Storm, as the number of producers increases, so do throughput and spout latency.
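The first takeaway can be made concrete: a batch interval is only sustainable if observed batch processing times stay below it, otherwise batches queue up and end-to-end latency grows without bound. A hedged sketch (the `headroom` safety margin is an assumption, not a Spark setting):

```python
def stable_interval(batch_interval_s, processing_times_s, headroom=0.8):
    """True when observed processing times fit within the batch interval
    with some safety margin; False means batches will fall behind.
    `headroom` is an assumed margin, not a Spark configuration value."""
    return max(processing_times_s) <= headroom * batch_interval_s

# Example: processing peaking at 1.4 s fits in 2 s batches...
ok = stable_interval(2.0, [1.1, 1.3, 1.4])
# ...but the same load falls behind with 1 s batches.
overloaded = not stable_interval(1.0, [1.1, 1.3, 1.4])
```

This is the monitoring loop the takeaway describes: watch processing time under the current load, and widen the interval (or scale out) when it approaches the interval length.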
Would like to add:
● Increase the number of producers and use real data.
● Add a graph as a second use case.
● Add a dashboard to monitor live streaming.