comparing processing frameworks v7
Post on 18-Aug-2015
Spark vs Storm

                     Spark                          Storm
Implemented in       Scala                          Clojure, Java
Delivery semantics   Exactly once                   At least once; exactly once with Trident
API                  Python, Java, Scala            Java, Scala, Clojure, Python, etc.
                                                    Trident: Java, Scala, Clojure
Processing model     Batch; micro-batches with      Record at a time; Trident allows
                     Spark Streaming (~500 ms)      micro-batches
Latency              1-2 seconds                    Sub-second
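The delivery-semantics row above is worth unpacking: "at least once" means a tuple may be replayed after a failure, so a consumer sees duplicates. A minimal sketch (plain Python, not Storm or Trident code) of how tracking message IDs turns at-least-once delivery into effectively-exactly-once processing:

```python
# Illustration only: deduplicating replayed tuples by message ID.
# The message IDs and handler below are hypothetical, not a Storm API.

def process_stream(messages, handler):
    """Apply handler at most once per message ID, skipping replays."""
    seen_ids = set()
    for msg_id, payload in messages:
        if msg_id in seen_ids:   # duplicate from an at-least-once replay
            continue
        seen_ids.add(msg_id)
        handler(payload)

counts = {}
def count_word(word):
    counts[word] = counts.get(word, 0) + 1

# Tuple with ID 2 is delivered twice, as an at-least-once system may do;
# it is counted only once.
process_stream([(1, "spark"), (2, "storm"), (2, "storm"), (3, "spark")],
               count_word)
```

Trident achieves a similar effect at scale by tracking batch transaction IDs in state, rather than per-message sets.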
Node specifications

Spark Streaming:
● 4 AWS m3.medium nodes
● ZooKeeper 3.4.6, Kafka 0.8.2.1
● Spark (Streaming) 1.3
● 1 master node, 3 workers

Storm:
● 4 AWS m3.medium nodes
● ZooKeeper 3.4.6, Kafka 0.8.2.1
● Storm 0.9.5
Cluster Configuration
Master node
Worker 1
Worker 2
Worker 3
Metric
Throughput: the amount of data processed per unit time, measured:
● by changing the batch size
● by changing the load (i.e., scaling up)
The program used for benchmarking is word count.
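A minimal sketch of the measurement: a pure-Python word count standing in for the Spark/Storm topology, timed to yield a records-per-second throughput figure. Function names here are illustrative, not from the benchmark's actual code.

```python
import time

def word_count(lines):
    """Pure-Python stand-in for the word-count benchmark program."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def throughput(lines):
    """Records processed per second for one batch of input lines."""
    start = time.perf_counter()
    word_count(lines)
    elapsed = time.perf_counter() - start
    return len(lines) / elapsed

# One synthetic batch; in the real setup the load comes from Kafka producers.
batch = ["the quick brown fox", "jumps over the lazy dog"] * 10_000
rate = throughput(batch)
```

Varying the batch size (Spark) or the number of producers (both systems) changes `lines` per interval, which is exactly the knob the test matrices below turn.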
# Producers   Batch interval
1             1 s, 2 s, 3 s, 4 s, 6 s
4             1 s, 2 s, 3 s, 4 s, 6 s
8             1 s, 2 s, 3 s, 4 s, 6 s
Tests for Spark Streaming
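The matrix above is a full cross product of producer counts and batch intervals, which can be sketched as a run list (variable names assumed, not from the benchmark scripts):

```python
from itertools import product

# Every (# producers, batch interval) pair in the table is one benchmark run.
producers = [1, 4, 8]
batch_intervals_s = [1, 2, 3, 4, 6]

runs = list(product(producers, batch_intervals_s))
# 3 producer counts x 5 intervals = 15 Spark Streaming runs
```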
# Producers   Window for tuples emitted/acked
1             10 min
4             10 min
8             10 min
Tests for Storm
Preliminary results for Storm
Takeaways
● Set the batch interval in Spark Streaming by monitoring processing times and load size.
● For Storm, as the number of producers increases, so do throughput and spout latency.
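The first takeaway can be made concrete: a batch interval is only sustainable if observed batch processing times stay below it, otherwise batches queue up and end-to-end latency grows without bound. A hedged sketch (the `headroom` safety margin is an assumption, not a Spark setting):

```python
def stable_interval(batch_interval_s, processing_times_s, headroom=0.8):
    """True when observed processing times fit within the batch interval
    with some safety margin; False means batches will fall behind.
    `headroom` is an assumed margin, not a Spark configuration value."""
    return max(processing_times_s) <= headroom * batch_interval_s

# Example: processing peaking at 1.4 s fits in 2 s batches...
ok = stable_interval(2.0, [1.1, 1.3, 1.4])
# ...but the same load falls behind with 1 s batches.
overloaded = not stable_interval(1.0, [1.1, 1.3, 1.4])
```

This is the monitoring loop the takeaway describes: watch processing time under the current load, and widen the interval (or scale out) when it approaches the interval length.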
Would like to add:
● Increase the number of producers and use real data.
● Add a graph as a second use case.
● Add a dashboard to monitor live streaming.