DevOps Spark Streaming
Spark Streaming
credit: https://spark.apache.org/docs/latest/cluster-overview.html
Berkeley AMPLab, 2009
Fast, general-purpose cluster computing platform
10X to 100X faster than Hadoop MapReduce, largely because it runs in memory
Runs on top of Hadoop
1. Open-source implementation of Resilient Distributed Datasets (RDDs)
2. Advanced DAG execution engine supporting cyclic data flow and in-memory computing
3. APIs in Java, Scala, Python and R
4. Deployment: Mesos, YARN, Standalone, Cloud, Notebook
5. Data sources: HDFS, Hive, Cassandra, HBase, Tachyon, Hadoop
RDDs + DAG + Lazy Execution
credit: Pietro Michiardi - Spark Internals
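The "RDDs + DAG + lazy execution" idea can be shown without a cluster. Below is a toy, single-machine sketch: the `ToyRDD` class is hypothetical and is not `pyspark.RDD`. Transformations such as `map` and `filter` only record a lineage; nothing is computed until an action such as `collect` runs the recorded chain. Real Spark additionally splits the lineage graph into stages and runs partitions on executors.

```python
# Toy imitation of lazy RDD evaluation. ToyRDD is hypothetical, not pyspark.RDD.
class ToyRDD:
    def __init__(self, source, lineage=None):
        self._source = source            # original data (any re-iterable)
        self._lineage = lineage or []    # recorded transformations, in order

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, f):
        return ToyRDD(self._source, self._lineage + [("map", f)])

    def filter(self, pred):
        return ToyRDD(self._source, self._lineage + [("filter", pred)])

    def lineage(self):
        return [name for name, _ in self._lineage]

    # Action: only now is the recorded chain executed over the source.
    def collect(self):
        data = list(self._source)
        for name, f in self._lineage:
            if name == "map":
                data = [f(x) for x in data]
            else:
                data = [x for x in data if f(x)]
        return data


calls = []

def square(x):
    calls.append(x)  # record that work actually happened
    return x * x

rdd = ToyRDD(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
print(calls)          # [] : nothing computed yet
print(rdd.lineage())  # ['map', 'filter']
print(rdd.collect())  # [1, 9, 25]
```

Because the lineage is kept, calling `collect` again recomputes the same result from the source, which is the same property Spark relies on to rebuild lost partitions.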
credit: http://spark.apache.org/docs/1.0.0/streaming-programming-guide.html
Spark Streaming ecosystem
credit: http://techblog.netflix.com/2015/03/can-spark-streaming-survive-chaos-monkey.html
NETFLIX ARCHITECTURE
credit: Diving into Spark Streaming
credit: Amplab - deep dive into Spark Streaming
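Spark Streaming's core idea is to chop a live stream into small batches: a DStream is a sequence of RDDs, one per batch interval. A minimal plain-Python sketch of that discretization step follows; `discretize` is a hypothetical helper (not a Spark API), and timestamps are assumed to be seconds since the stream started.

```python
from collections import defaultdict

def discretize(events, batch_interval):
    """Group (timestamp_seconds, record) pairs into consecutive batches.

    Hypothetical helper. Batch k covers [k * batch_interval, (k + 1) * batch_interval).
    Returns batches in time order, including empty ones; records keep arrival order.
    """
    if not events:
        return []
    buckets = defaultdict(list)
    for ts, record in events:
        buckets[int(ts // batch_interval)].append(record)
    last = max(buckets)
    return [buckets.get(k, []) for k in range(last + 1)]


events = [(0.2, "a"), (0.9, "b"), (1.5, "c"), (3.1, "d")]
batches = discretize(events, batch_interval=1.0)
print(batches)                    # [['a', 'b'], ['c'], [], ['d']]
print([len(b) for b in batches])  # [2, 1, 0, 1]
```

Each inner list plays the role of one RDD; a streaming job then applies the same computation to every batch.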
ZooKeeper is a highly available system for distributed coordination and service discovery
ZooKeeper Features
1. Distributed coordination
2. Distributed queues
3. Distributed locks
4. Discovery service
5. Leader election
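Leader election in ZooKeeper is usually built from a documented recipe: each candidate creates an ephemeral, sequential znode under a common parent, and the candidate owning the lowest sequence number leads; when its session ends, its node disappears and the next-lowest takes over. The sketch below only simulates that recipe in memory. `ToyElection` is hypothetical; there are no sessions, watches, or network calls, and it is not a ZooKeeper client.

```python
import itertools

class ToyElection:
    """In-memory imitation of the ephemeral-sequential-node election recipe."""

    def __init__(self):
        self._seq = itertools.count()
        self._nodes = {}  # znode-like path -> candidate id

    def join(self, candidate):
        # ZooKeeper appends a zero-padded 10-digit counter to sequential nodes,
        # so lexicographic order of the names matches creation order.
        path = f"/election/n_{next(self._seq):010d}"
        self._nodes[path] = candidate
        return path

    def leave(self, path):
        # Models an ephemeral node vanishing when its session ends.
        self._nodes.pop(path, None)

    def leader(self):
        # The candidate owning the lowest sequence number is the leader.
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]


election = ToyElection()
a = election.join("broker-a")
b = election.join("broker-b")
c = election.join("broker-c")
print(election.leader())  # broker-a
election.leave(a)
print(election.leader())  # broker-b
```

In the real recipe each non-leader watches only the node just below its own, so a leader failure wakes one candidate instead of all of them.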
Kafka Features
1. Distributed: runs on a set of servers called brokers
2. Scalable
3. Publisher-subscriber system with topic-based subscription
4. Reliable: messages passed to Kafka are replicated and persisted to disk
5. Preserves message order within a partition
credit: http://www.michael-noll.com/blog/2013/03/13/running-a-multi-broker-apache-kafka-cluster-on-a-single-node/
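Kafka's ordering guarantee is per partition: messages with the same key go to the same partition and are read back in the order they were appended, each at its own offset. The in-memory sketch below illustrates only that behaviour. `ToyTopic` is hypothetical, replication and disk persistence are not modelled, and `zlib.crc32` is a stand-in hash (Kafka's default partitioner uses a different one).

```python
import zlib

class ToyTopic:
    """One topic split into append-only partition logs, held in memory."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash: same key -> same partition. Not Kafka's actual hash.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset):
        """Return every record in a partition from `offset` onward."""
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=3)
for i in range(3):
    topic.send("user-1", f"click-{i}")
topic.send("user-2", "view-0")

p = topic.partition_for("user-1")
user1 = [v for k, v in topic.read(p, 0) if k == "user-1"]
print(user1)  # ['click-0', 'click-1', 'click-2']
```

A consumer that remembers the next offset per partition can resume exactly where it stopped, which is what Spark's Kafka integration builds on.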
ZooKeeper connection string: localhost:2181 (2181 is ZooKeeper's default client port)
credit: https://spark.apache.org/docs/latest/streaming-programming-guide.html#a-quick-example
credit: Jeremy Freeman
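The quick example in the programming guide counts words per batch: in PySpark it reads lines with `ssc.socketTextStream("localhost", 9999)`, then applies `flatMap` (split into words), `map` (pair each word with 1) and `reduceByKey` (sum per word). The plain-Python sketch below mirrors those three steps on in-memory batches so the data flow can be run without Spark; `flat_map`, `reduce_by_key` and `word_count` are hypothetical helpers, not PySpark functions.

```python
def flat_map(f, items):
    # Like DStream.flatMap: each input can yield zero or more outputs.
    return [y for x in items for y in f(x)]

def reduce_by_key(f, pairs):
    # Like reduceByKey: combine values that share a key with f.
    out = {}
    for k, v in pairs:
        out[k] = f(out[k], v) if k in out else v
    return out

def word_count(batch_lines):
    words = flat_map(lambda line: line.split(" "), batch_lines)
    pairs = [(w, 1) for w in words]
    return reduce_by_key(lambda a, b: a + b, pairs)


batches = [["hello world", "hello spark"], ["spark streaming"]]
for i, batch in enumerate(batches):
    print(i, word_count(batch))
# 0 {'hello': 2, 'world': 1, 'spark': 1}
# 1 {'spark': 1, 'streaming': 1}
```

As in the guide, counts are per batch, not cumulative; running totals across batches need a stateful operation such as `updateStateByKey`.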
Semantics
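1. At most once: each record is processed zero or one times, so records can be lost
2. At least once: each record is processed one or more times, so nothing is lost but duplicates are possible
3. Exactly once: each record is processed exactly one time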
credit: https://spark.apache.org/docs/latest/streaming-programming-guide.html
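The three guarantees differ in how a sender reacts to failures. The simulation below is a hypothetical sketch, not Spark code: one record's first send is dropped before reaching the sink, and another record is written but its acknowledgement is lost. Never retrying loses data (at most once), retrying until acknowledged duplicates data (at least once), and retrying against a sink that ignores already-seen record ids gives exactly-once results. The idempotent sink is one common technique, assumed here for illustration.

```python
def deliver(records, mode, dropped=frozenset(), lost_ack=frozenset()):
    """Simulate sending (record_id, value) pairs to a sink under failures.

    dropped:  ids whose FIRST attempt never reaches the sink
    lost_ack: ids whose FIRST attempt is written but not acknowledged
    mode:     "at_most_once" | "at_least_once" | "exactly_once"
    """
    sink = []          # what the downstream system ends up storing
    seen_ids = set()   # only used by the idempotent (exactly-once) sink

    def write(rid, value):
        if mode == "exactly_once":
            if rid in seen_ids:
                return          # duplicate id: ignore it, but still acknowledge
            seen_ids.add(rid)
        sink.append(value)

    for rid, value in records:
        attempt = 0
        while True:
            attempt += 1
            first = attempt == 1
            if first and rid in dropped:
                acked = False   # the send was lost before the sink saw it
            else:
                write(rid, value)
                acked = not (first and rid in lost_ack)
            if acked or mode == "at_most_once":
                break           # at-most-once never retries
    return sink


records = [(1, "a"), (2, "b"), (3, "c")]
for mode in ("at_most_once", "at_least_once", "exactly_once"):
    print(mode, deliver(records, mode, dropped={2}, lost_ack={3}))
# at_most_once ['a', 'c']
# at_least_once ['a', 'b', 'c', 'c']
# exactly_once ['a', 'b', 'c']
```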
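Spark Streaming supports "at least once" semantics and, with Kafka (direct approach), "exactly once"; end-to-end exactly once also requires idempotent or transactional output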
credit: https://databricks.com/blog/2015/03/30/improvements-to-kafka-integration-of-spark-streaming.html
At Least Once example
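The at-least-once path can be illustrated with a write-ahead log: received data is logged durably before processing, and after a crash every logged batch not recorded as completed is replayed. If the crash happens after a batch's output was emitted but before its completion was recorded, that output is emitted again. The sketch below is a simplified, hypothetical simulation of that failure window (`run_with_wal` is not a Spark API) and is assumed to correspond to the receiver-plus-log design the Databricks post contrasts with the direct approach.

```python
def run_with_wal(batches, crash_batch=None):
    """Process logged batches; optionally crash once, then replay from the log."""
    wal = list(enumerate(batches))  # every received batch is logged first
    completed = set()               # batch ids whose completion was recorded
    output = []                     # side effects already visible downstream

    def run_pending(crash_at):
        for i, batch in wal:
            if i in completed:
                continue
            output.extend(batch)    # 1. emit results downstream
            if i == crash_at:
                return False        # crash before step 2 is recorded
            completed.add(i)        # 2. record completion
        return True

    if not run_pending(crash_batch):
        run_pending(None)           # recovery: replay unfinished batches from the log
    return output


print(run_with_wal([["a", "b"], ["c"]]))                 # ['a', 'b', 'c']
print(run_with_wal([["a", "b"], ["c"]], crash_batch=0))  # ['a', 'b', 'a', 'b', 'c']
```

Nothing is lost, but batch 0 is emitted twice: that is exactly the "at least once" behaviour.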
Exactly Once example
credit: http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
Lambda Architecture - combine batch and streaming data
credit: Strata+Hadoop NYC
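In the Lambda Architecture, a batch layer periodically recomputes a view over the full history, a speed layer keeps a view of events the last batch run has not covered, and queries merge the two. The sketch below shows that merge for simple counts; the function names are hypothetical, and counting page views is an assumed example workload.

```python
from collections import Counter

def batch_view(master_dataset):
    # Batch layer: recomputed from scratch over all history (slow, complete).
    return Counter(master_dataset)

def realtime_view(recent_events):
    # Speed layer: covers only events newer than the last batch run.
    return Counter(recent_events)

def query(key, batch, realtime):
    # Serving layer: merge both views at read time (missing keys count as 0).
    return batch[key] + realtime[key]


history = ["page_a", "page_b", "page_a"]
recent = ["page_a", "page_c"]
bv, rv = batch_view(history), realtime_view(recent)
print(query("page_a", bv, rv))  # 3
print(query("page_c", bv, rv))  # 1
```

When the next batch run absorbs `recent` into the history, the speed layer's view for that period can be discarded.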
Combine machine learning with real-time data
credit: Strata+Hadoop NYC
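One way to combine machine learning with a stream is to update a model on every micro-batch. Spark MLlib offers streaming k-means (`StreamingKMeans`), whose documented per-cluster update is c' = (c·n·a + x̄·m) / (n·a + m) and n' = n·a + m, where m new points with mean x̄ were assigned to the cluster and a is a decay factor. The sketch below applies that rule to 1-D points in plain Python; `nearest` and `update` are hypothetical helpers, not the MLlib API.

```python
def nearest(x, centers):
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

def update(centers, weights, batch, decay=1.0):
    """One mini-batch update of 1-D streaming k-means."""
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for x in batch:                   # assign with the current centers
        i = nearest(x, centers)
        sums[i] += x
        counts[i] += 1
    new_centers, new_weights = [], []
    for c, n, s, m in zip(centers, weights, sums, counts):
        w = n * decay + m             # n' = n * a + m
        if w == 0:
            new_centers.append(c)     # no history and no new points: keep as-is
        else:
            # c' = (c * n * a + mean * m) / (n * a + m), and mean * m == s
            new_centers.append((c * n * decay + s) / w)
        new_weights.append(w)
    return new_centers, new_weights


centers, weights = [0.0, 10.0], [0.0, 0.0]
centers, weights = update(centers, weights, [1.0, 2.0, 9.0, 11.0])
print(centers, weights)  # [1.5, 10.0] [2.0, 2.0]
centers, weights = update(centers, weights, [4.0], decay=0.5)
print(centers, weights)  # [2.75, 10.0] [2.0, 1.0]
```

A decay below 1 makes older batches count less, so the centers can follow a drifting stream.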
Combine SQL with real-time data
credit: Strata+Hadoop NYC
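Combining SQL with a stream typically means running a query over each micro-batch; the Spark Streaming guide does this inside `foreachRDD` by turning each batch into a DataFrame, registering it as a temporary view and calling SQL on it. The plain-Python sketch below imitates that pattern with an in-memory SQLite table that is rebuilt for every batch; the `words` table and the query are assumed examples.

```python
import sqlite3

QUERY = "SELECT word, COUNT(*) AS n FROM words GROUP BY word ORDER BY n DESC, word"

def sql_per_batch(batches):
    """Load each batch into a fresh table and run the same SQL over it."""
    conn = sqlite3.connect(":memory:")
    results = []
    for batch in batches:
        conn.execute("DROP TABLE IF EXISTS words")   # like replacing a temp view
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany("INSERT INTO words VALUES (?)", [(w,) for w in batch])
        results.append(conn.execute(QUERY).fetchall())
    conn.close()
    return results


print(sql_per_batch([["spark", "kafka", "spark"], ["sql"]]))
# [[('spark', 2), ('kafka', 1)], [('sql', 1)]]
```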