Cassandra & Spark for IoT
![Page 1: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/1.jpg)
Cassandra & Spark for IoT
Matthias Niehoff
![Page 2: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/2.jpg)
Cassandra
![Page 3: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/3.jpg)
Cassandra for IoT
• Distributed database
• Highly available
• Scales horizontally and linearly
• Multi-datacenter support
• No single point of failure
• Chooses availability over strong consistency
Diagram: a ring of four nodes (Node 1 to Node 4) with the token ranges 1-25, 26-50, 51-75 and 76-0.
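The ring diagram can be read as a lookup from a row's token to the node that owns it. The following is a minimal sketch in plain Scala, not Cassandra's implementation: the token space of 0 to 99, the assignment of ranges to nodes in slide order, and the names `TokenRing`, `token` and `ownerOf` are assumptions for illustration. Cassandra itself hashes the partition key with Murmur3 by default and uses a much larger token range.

```scala
object TokenRing {
  // Assumption: a toy token space of 100 tokens (0 to 99).
  val TokenSpace = 100

  // Assumption: ranges are assigned to nodes in the order shown on the slide.
  // Each entry is (first token, last token, owning node).
  val ranges: Seq[(Int, Int, String)] = Seq(
    (1, 25, "Node 1"),
    (26, 50, "Node 2"),
    (51, 75, "Node 3"),
    (76, 99, "Node 4")
  )

  // Toy hash: maps a partition key onto the token space.
  def token(partitionKey: String): Int =
    Math.floorMod(partitionKey.hashCode, TokenSpace)

  // Finds the node whose range contains the token.
  // Token 0 falls into the wrap-around range "76-0", owned by the last node.
  def ownerOf(token: Int): String =
    ranges
      .collectFirst { case (lo, hi, node) if token >= lo && token <= hi => node }
      .getOrElse("Node 4")
}
```

Because every client can compute the owner from the key alone, no central coordinator is needed, which fits the "no single point of failure" bullet.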
![Page 4: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/4.jpg)
Great for Time Series Data
CREATE TABLE sensors (
  sensorId uuid,
  time timeuuid,
  metricName text,
  metricValue double,
  PRIMARY KEY (sensorId, time)
);
Diagram: one partition per id, holding the entries t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11
Stored sequentially on disk
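In the table definition, sensorId is the partition key and time is the clustering column, so all readings of one sensor sit together and are ordered by time. A minimal in-memory sketch of that layout with plain Scala collections follows; it is not Cassandra code, and the `Reading` type, the Long timestamps (standing in for timeuuid) and the helper names are assumptions for illustration.

```scala
import scala.collection.immutable.TreeMap

object TimeSeriesLayout {
  // Assumption: a simplified reading with a Long timestamp instead of a timeuuid.
  final case class Reading(sensorId: String, time: Long, metricValue: Double)

  // One entry per partition key; inside a partition, rows are kept sorted by time.
  def layout(readings: Seq[Reading]): Map[String, TreeMap[Long, Double]] =
    readings.groupBy(_.sensorId).map { case (id, rs) =>
      id -> TreeMap(rs.map(r => r.time -> r.metricValue): _*)
    }

  // A time-range read is a contiguous slice of one partition: from <= time < until.
  def slice(partition: TreeMap[Long, Double], from: Long, until: Long): List[(Long, Double)] =
    partition.range(from, until).toList
}
```

The point of the design is that a query such as "readings of sensor X between two timestamps" touches one partition and reads a contiguous run of rows.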
![Page 5: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/5.jpg)
Spark
![Page 6: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/6.jpg)
What Is Apache Spark
• Open source since 2010, Apache project since 2013
• Data processing framework
  • Batch processing
  • Stream processing
![Page 7: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/7.jpg)
Why Use Spark
• Fast
  • Up to 100 times faster than Hadoop MapReduce (for in-memory workloads)
  • A lot of in-memory processing
  • Scales linearly with more nodes
• Easy
  • Scala, Java and Python APIs
  • Clean code (e.g. with lambdas in Java 8)
  • Expanded API: map, reduce, filter, groupBy, sort, union, join, reduceByKey, groupByKey, sample, take, first, count
• Fault-tolerant
  • Easily reproducible
![Page 8: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/8.jpg)
Easily Reproducible?
• RDDs: Resilient Distributed Datasets
  • Read-only description of a collection of objects
  • Partitioned for distribution
  • Determined through transformations
  • Can be rebuilt automatically on failure
• Operations
  • Transformations (map, filter, reduceByKey, ...) -> new RDD
  • Actions (count, collect, reduce, save)
• Only actions start processing!
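The split between transformations and actions can be imitated with a lazy Scala `Iterator`. This is only an analogy in plain Scala, not Spark code: `LazyPipeline`, `predicateCalls` and the sample lines are made up for illustration. The filter describes the work, and nothing runs until a consuming call (the "action") asks for a result.

```scala
object LazyPipeline {
  // Counts how often the filter predicate has actually been evaluated.
  var predicateCalls = 0

  // Assumption: a tiny in-memory stand-in for a text file.
  val lines: List[String] = List("Apache Spark", "Apache Cassandra", "Spark Streaming")

  // "Transformation": Iterator.filter is lazy, so this only describes the computation.
  // Calling the method again rebuilds the pipeline from the source, much like an RDD
  // can be recomputed from its lineage after a failure.
  def linesWithSpark: Iterator[String] =
    lines.iterator.filter { line =>
      predicateCalls += 1
      line.contains("Spark")
    }
}
```

Calling `LazyPipeline.linesWithSpark` leaves `predicateCalls` at 0; only a consuming call such as `.size` evaluates the predicate for each line.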
![Page 9: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/9.jpg)
RDD Example
scala> val textFile = sc.textFile("README.md")
textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09
scala> linesWithSpark.count()
res0: Long = 126
![Page 10: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/10.jpg)
Spark & Cassandra
![Page 11: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/11.jpg)
Connecting Spark With Cassandra
• Spark Cassandra Connector by DataStax
  • https://github.com/datastax/spark-cassandra-connector
• Cassandra tables as Spark RDDs (read & write)
• Mapping of C* tables and rows onto Java/Scala objects
• Server-side filtering ("where")
• Included as a Maven / SBT dependency in your application
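A read with server-side filtering might look roughly like the sketch below. It assumes Spark and the Spark Cassandra Connector are on the classpath and a Cassandra node is reachable. The keyspace "mqtt" and table "sensors" are borrowed from the other slides; the local master, the host address, the object name `SensorReadings` and passing the sensor id as a program argument are assumptions for illustration. Column names are lowercase because unquoted CQL identifiers are stored in lowercase.

```scala
import java.util.UUID

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object SensorReadings {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("sensor-readings")
      .setMaster("local[*]")                               // assumption: local test run
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumption: Cassandra on localhost
    val sc = new SparkContext(conf)

    // Hypothetical input: the sensor id is passed as the first program argument.
    val sensorId = UUID.fromString(args(0))

    // The table is exposed as an RDD of CassandraRow.
    // select and where are applied by Cassandra, not by Spark.
    val values = sc.cassandraTable("mqtt", "sensors")
      .select("time", "metricvalue")
      .where("sensorid = ?", sensorId)
      .map(row => row.getDouble("metricvalue"))

    println(s"readings for $sensorId: ${values.count()}")
    sc.stop()
  }
}
```

Filtering on the partition key keeps the read on one partition, so Spark only receives the rows it needs.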
![Page 12: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/12.jpg)
Two Datacenters - Two Purposes
Diagram: eight Cassandra (C*) nodes spread over two datacenters, DC1 (Online) and DC2 (Analytics), together with four Spark worker nodes (WN) and one Spark master.
![Page 13: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/13.jpg)
Spark Streaming
![Page 14: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/14.jpg)
Stream Processing With Spark Streaming
• Real-time processing using micro-batches
• Supported sources: files, TCP, MQTT, Kafka, Twitter, ...
• Data as Discretized Stream (DStream)
• Same programming model as for batches
• All operations of Spark Core, SQL and MLlib
• Stateful operations & sliding windows
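Micro-batching and sliding windows can be illustrated without a cluster. The sketch below uses plain Scala collections, not the Spark Streaming API: the `Event` type and the helpers `batches` and `windowedSums` are assumptions for illustration. Events are cut into fixed-length batches, and a window then aggregates several consecutive batches.

```scala
object MicroBatches {
  // Assumption: an event carries a timestamp in milliseconds and a numeric value.
  final case class Event(timeMs: Long, value: Double)

  // Cuts the events into batches of batchMs milliseconds, ordered by time.
  // Intervals without events produce no batch in this simplified version.
  def batches(events: Seq[Event], batchMs: Long): Seq[Seq[Event]] =
    events.groupBy(_.timeMs / batchMs).toSeq.sortBy(_._1).map(_._2)

  // Sums the values over windows of windowLen batches, moving forward by slide batches.
  def windowedSums(bs: Seq[Seq[Event]], windowLen: Int, slide: Int): List[Double] =
    bs.sliding(windowLen, slide).map(_.flatten.map(_.value).sum).toList
}
```

With a batch length of 500 ms, a window of two batches that slides by one batch covers one second of data and is updated every half second.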
![Page 15: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/15.jpg)
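Spark Streaming - MQTT Example
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import org.apache.spark.streaming.mqtt.MQTTUtils
import com.datastax.spark.connector._

// sc is an existing SparkContext; each micro-batch covers 500 ms
val ssc = new StreamingContext(sc, Milliseconds(500))
// subscribe to topic "foo" on a local MQTT broker
val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883", "foo", StorageLevel.MEMORY_ONLY_SER_2)
// in a real job, map each message to a tuple or case class matching the table's columns
val data = lines.map(input => input.toLowerCase)
data.foreachRDD(_.saveToCassandra("mqtt", "sensors"))
ssc.start()
// await manual termination or error
ssc.awaitTermination()
// manual termination
ssc.stop()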
![Page 16: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/16.jpg)
Use Cases
![Page 17: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/17.jpg)
Use Cases for Spark and Cassandra in IoT: Ingestion
• Spark Streaming
  • Continuous data streams
  • MQTT, Kafka, ZeroMQ, ...
  • Easy to make reliable
• Spark Core
  • Existing data
  • SQL databases, CSV, JSON, ...
• Use the same programming model or even the same code!
![Page 18: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/18.jpg)
Use Cases for Spark and Cassandra in IoT: Analyses
• Real-time analysis
  • React to events
  • Join with existing data
  • Apply ML models to events
• Batch analysis
  • Scheduled jobs
  • Analytics on the data
  • Train ML models
![Page 19: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/19.jpg)
Demo
![Page 20: Cassandra & Spark for IoT](https://reader030.vdocuments.us/reader030/viewer/2022021506/586fdaf51a28ab18428b5f33/html5/thumbnails/20.jpg)
Questions?
Matthias Niehoff, IT-Consultant
codecentric AG Zeppelinstraße 2 76185 Karlsruhe, Germany
mobile: +49 (0) 172.1702676 [email protected]
www.codecentric.de blog.codecentric.de
matthiasniehoff