Cassandra & Spark for IoT
Posted on 07-Jan-2017
TRANSCRIPT
Cassandra & Spark for IoT_
Matthias Niehoff
Cassandra
•Distributed database
•Highly Available
•Horizontally & Linearly Scalable
•Multi Datacenter Support
•No Single Point Of Failure
•Chooses Availability Over Strong Consistency
Cassandra for IoT_
[Diagram: four nodes on a token ring, each owning one token range: 1-25, 26-50, 51-75, 76-0]
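The ring above can be expressed in a few lines of code. This is a toy illustration of token-range partitioning using the slide's 1-100 token space; real Cassandra hashes partition keys with the Murmur3Partitioner over a 64-bit range, and the node names and hash function here are made up:

```scala
// Toy token ring: four nodes, each owning one range of the 1..100 space.
object TokenRing {
  // (node name, inclusive upper bound of its token range)
  val ranges = Seq(("Node 1", 25), ("Node 2", 50), ("Node 3", 75), ("Node 4", 100))

  // Toy hash mapping any partition key into the 1..100 token space.
  def token(key: String): Int = math.abs(key.hashCode % 100) + 1

  // The node owning the range that contains the key's token.
  def nodeFor(key: String): String =
    ranges.find { case (_, upper) => token(key) <= upper }.get._1
}
```

Because the key-to-token mapping is deterministic, every client computes the same owner for a given key, and adding nodes just splits the ranges further, which is what makes the ring linearly scalable.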
Great for Time Series Data_
CREATE TABLE sensors (
  sensorId uuid,
  time timeuuid,
  metricName text,
  metricValue double,
  PRIMARY KEY (sensorId, time)
);
Row layout: id | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11 (stored sequentially on disk)
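With that layout, a range query on one sensor stays on a single partition and reads sequentially. A minimal sketch of such a query (the UUID and time bounds are made up; minTimeuuid/maxTimeuuid are standard CQL functions for converting timestamps to timeuuid bounds):

```sql
-- All metrics for one sensor within a time window, in clustering (time)
-- order; this touches a single partition.
SELECT time, metricName, metricValue
FROM sensors
WHERE sensorId = 123e4567-e89b-12d3-a456-426655440000
  AND time > minTimeuuid('2017-01-01 00:00+0000')
  AND time < maxTimeuuid('2017-01-07 00:00+0000');
```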
Spark
•Open source since 2010, Apache project since 2013
•Data processing Framework • Batch processing • Stream processing
What Is Apache Spark_
•Fast • up to 100 times faster than Hadoop • lots of in-memory processing • linearly scalable by adding nodes
•Easy • Scala, Java and Python APIs • clean code (e.g. with lambdas in Java 8) • rich API: map, reduce, filter, groupBy, sort, union, join, reduceByKey, groupByKey, sample, take, first, count
•Fault-tolerant • easily reproducible
Why Use Spark_
•RDDs - Resilient Distributed Datasets • Read-only description of a collection of objects • Partitioned for distribution • Defined through transformations • Allows automatic rebuild on failure
•Operations • Transformations (map, filter, reduce, ...) -> new RDD • Actions (count, collect, save)
•Only actions start processing!
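The "only actions start processing" point can be demonstrated without Spark at all. Below is a pure-Scala sketch (names made up) using a LazyList as a stand-in for an RDD: the transformations only describe the computation, and nothing runs until a terminal operation forces it.

```scala
// Lazy "RDD" sketch: map/filter build a description, sum plays the action.
object LazyDemo {
  var evaluations = 0  // counts how many elements were actually processed

  val data = LazyList.range(1, 11)                          // the "RDD": 1..10
  val doubled = data.map { x => evaluations += 1; x * 2 }   // transformation: runs nothing
  val big = doubled.filter(_ > 10)                          // still nothing evaluated

  def run(): Int = {
    val before = evaluations  // 0: no work has been done yet
    val result = big.sum      // the "action": forces the whole pipeline
    require(before == 0 && evaluations == 10)
    result                    // 12 + 14 + 16 + 18 + 20 = 80
  }
}
```

In Spark the same lineage of transformations is what allows a lost partition to be rebuilt: the description can simply be replayed on the source data.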
Easily Reproducible?_
RDD Example_
scala> val textFile = sc.textFile("README.md")
textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09

scala> linesWithSpark.count()
res0: Long = 126
Spark & Cassandra
•Spark Cassandra Connector by DataStax • https://github.com/datastax/spark-cassandra-connector
• Cassandra tables as Spark RDDs (read & write)
• Mapping of C* tables and rows onto Java/Scala objects
• Server-side filtering ("where")
• Included as a Maven / SBT dependency in your application
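Taken together, those features look roughly like this in application code. A sketch only: it needs a running Spark cluster and a reachable Cassandra node, and the keyspace iot, the target table, the host and the sensor UUID are assumptions, while cassandraTable, where and saveToCassandra are the connector's documented entry points:

```scala
import java.util.UUID
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object ConnectorSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("iot-demo")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
    val sc = new SparkContext(conf)

    val someSensor = UUID.fromString("123e4567-e89b-12d3-a456-426655440000")

    // Cassandra table as a Spark RDD, filtered server-side ("where")
    val readings = sc.cassandraTable("iot", "sensors")
      .where("sensorId = ?", someSensor)

    // map rows onto plain values and write them back to another table
    readings
      .map(row => (row.getUUID("sensorid"), row.getDouble("metricvalue")))
      .saveToCassandra("iot", "latest_values", SomeColumns("sensorid", "metricvalue"))
  }
}
```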
Connecting Spark With Cassandra_
Two Datacenter - Two Purposes_
[Diagram: DC1 - Online serves application traffic with Cassandra nodes only; DC2 - Analytics replicates the same data, with a Spark worker (WN) co-located on each Cassandra node plus a separate Spark Master]
Spark Streaming
• Real Time Processing using micro batches
• Supported sources: Files, TCP, MQTT, Kafka, Twitter,..
• Data as Discretized Stream (DStream)
• Same programming model as for batches
• All operations of Spark Core, SQL and MLlib
• Stateful Operations & Sliding Windows
Stream Processing With Spark Streaming_
val ssc = new StreamingContext(sc, Milliseconds(500))
val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883", "foo", StorageLevel.MEMORY_ONLY_SER_2)

val data = lines.map(input => input.toLowerCase)
data.foreachRDD(_.saveToCassandra("mqtt", "sensors"))

ssc.start()
// await manual termination or error
ssc.awaitTermination()
// manual termination
ssc.stop()
Spark Streaming - MQTT Example_
Use Cases
•Spark Streaming • Continuous data streams • MQTT, Kafka, ZeroMQ... • Easily reliable
• Spark Core • Existing data • SQL Databases, CSV, Json...
• Use the same programming model or even the same code!
Use Cases for Spark and Cassandra in IoT_
Ingestion
• Real-Time Analysis • React on events • Join with existing data • Apply events on ML models
• Batch Analysis • Scheduled jobs • Analytics on the data • Train ML models
Use Cases for Spark and Cassandra in IoT_
Analyses
Demo
Questions?
Matthias Niehoff, IT-Consultant
codecentric AG Zeppelinstraße 2 76185 Karlsruhe, Germany
mobile: +49 (0) 172.1702676 matthias.niehoff@codecentric.de
www.codecentric.de blog.codecentric.de
matthiasniehoff