Big data streaming with Apache Spark on Azure

Willem Meints

Posted on 21-Feb-2017


Page 1: Big data streaming with Apache Spark on Azure

Big data streaming
Willem Meints


Microservices & analytics


Microservices
• Multiple smaller services that scale independently
• Each service has its own data store
• Data flows between services through the event bus
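To make the event bus idea concrete, here is a minimal in-memory sketch, assuming a simple publish/subscribe model. `EventBus`, `InvoicesService` and the topic name `order_placed` are hypothetical; in the architecture of this talk a broker such as Azure Event Hubs plays the role of the bus.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Hypothetical in-memory event bus: every subscriber of a topic gets each event."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


class InvoicesService:
    """Hypothetical service that reacts to events and keeps its own data store."""

    def __init__(self, bus: EventBus) -> None:
        self.store: List[dict] = []  # this service's private data store
        bus.subscribe("order_placed", self.on_order_placed)

    def on_order_placed(self, event: dict) -> None:
        # Copy only the fields this service needs into its own store.
        self.store.append({"order_id": event["order_id"], "amount": event["amount"]})


bus = EventBus()
invoices = InvoicesService(bus)
# The publishing (orders) side only talks to the bus, not to the invoices service.
bus.publish("order_placed", {"order_id": 1, "amount": 250.0})
print(invoices.store)  # [{'order_id': 1, 'amount': 250.0}]
```

Because services only share the bus, each one can be scaled or replaced independently, which is also why the complete picture ends up spread over many data stores.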


Rapids

Rivers

Lakes


Data analytics challenges with microservices
• A complete picture exists, but it is spread over a vast landscape
• Most data doesn't arrive in a database
• Data changes rapidly


Exploring some scenarios


Scenario 1: Get an annual sales report
• The goal is to get a complete picture of the situation
• Data is based on business events


[Diagram: Orders and Invoices services, Event bus, Data analytics, Data Lake]
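A sketch of the reporting step, assuming the business events have already landed in the data lake as simple records. The event types `InvoiceSent` and `OrderPlaced` and their fields are hypothetical. Only invoices are counted here, an assumption made so that one sale is not counted twice (once as an order and once as an invoice).

```python
from collections import defaultdict
from datetime import date

# Hypothetical business events, as they might be read back from the data lake.
events = [
    {"type": "InvoiceSent", "date": date(2016, 3, 1), "amount": 120.0},
    {"type": "OrderPlaced", "date": date(2016, 11, 4), "amount": 80.0},
    {"type": "InvoiceSent", "date": date(2016, 11, 5), "amount": 80.0},
    {"type": "InvoiceSent", "date": date(2017, 1, 20), "amount": 200.0},
]


def annual_sales(events):
    """Sum invoiced amounts per calendar year."""
    totals = defaultdict(float)
    for event in events:
        # Count each sale once, at the moment it is invoiced.
        if event["type"] == "InvoiceSent":
            totals[event["date"].year] += event["amount"]
    return dict(sorted(totals.items()))


print(annual_sales(events))  # {2016: 200.0, 2017: 200.0}
```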


Scenario 2: Detect anomalies
• The goal is to detect anomalies on the website and prevent abuse
• Machine learning is needed to detect the anomalies
• Data is based on the data lake

[Diagram: Click stream collector, Event bus, Data analytics, Data Lake, Model]
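The split between a model built from the data lake and checks on live events can be sketched as follows. This swaps in a simple statistical technique (a z-score test) instead of a trained machine learning model, purely to show the flow; the click counts and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, pstdev


def fit(history):
    """'Train' the model: summarise historical click counts as (mean, std dev)."""
    return mean(history), pstdev(history)


def is_anomalous(model, observed, threshold=3.0):
    """Flag a value more than `threshold` standard deviations away from the mean."""
    mu, sigma = model
    if sigma == 0:
        # No variation in history: anything different from the mean is unusual.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical clicks per minute for one visitor, as read back from the data lake.
history = [40, 42, 38, 41, 39, 40, 43, 37]
model = fit(history)
print(is_anomalous(model, 41))   # False
print(is_anomalous(model, 400))  # True
```

The model is built offline from the data lake, while `is_anomalous` is cheap enough to run on every event coming off the event bus.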


Analytics tools


[Diagram: Event bus, Data processing tool, Distributed database, Alerting, Dashboarding, Flow control logic, Cluster Manager]
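As a rough illustration of how the data processing tool feeds the other boxes, here is a sketch where one processing step routes its results to a database, an alerting sink and a dashboard. All three sinks are hypothetical in-memory stand-ins, and the HTTP status field and the error threshold are assumptions.

```python
from collections import Counter

# Hypothetical stand-ins for the boxes in the diagram.
database = []          # distributed database: keeps every processed record
alerts = []            # alerting: only what needs attention
dashboard = Counter()  # dashboarding: running aggregates


def process(events, error_threshold=2):
    """Processing step: enrich each event, then route results to the sinks."""
    errors = 0
    for event in events:
        record = {**event, "is_error": event["status"] >= 500}
        database.append(record)
        dashboard[event["status"]] += 1
        errors += record["is_error"]  # True counts as 1
    if errors >= error_threshold:
        alerts.append(f"{errors} server errors in last batch")


process([{"status": 200}, {"status": 500}, {"status": 503}])
print(alerts)  # ['2 server errors in last batch']
```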


The Azure-based solution

[Diagram: Azure Event Hub, HDInsight, Azure Data Lake, Alerting, Dashboarding, Azure App Services, Cluster Manager]


Demo: A short introduction to Apache Spark

[Diagram: Spark SQL, Spark Streaming, Machine Learning and GraphX on top of Apache Spark Core]


Resilient Distributed Datasets
[Diagram: a Resilient Distributed Dataset made up of partitions, each holding records]
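The partition/record structure can be mimicked with a toy class. This is not Spark's API: `ToyRDD` is hypothetical, and its method names only echo Spark's `parallelize`, `map` and `count` to show that work happens per partition and that partial results are combined afterwards.

```python
from typing import Callable, List


class ToyRDD:
    """Toy stand-in for an RDD: a dataset split into partitions of records."""

    def __init__(self, partitions: List[list]) -> None:
        self.partitions = partitions

    @classmethod
    def parallelize(cls, records: list, num_partitions: int) -> "ToyRDD":
        # Deal the records round-robin over the partitions.
        return cls([records[i::num_partitions] for i in range(num_partitions)])

    def map(self, f: Callable) -> "ToyRDD":
        # Transformation: each partition is processed on its own, so in a cluster
        # different worker nodes could handle different partitions in parallel.
        # The original dataset is untouched; a new one is returned.
        return ToyRDD([[f(record) for record in part] for part in self.partitions])

    def count(self) -> int:
        # Action: count per partition first, then combine the partial counts.
        return sum(len(part) for part in self.partitions)


rdd = ToyRDD.parallelize([1, 2, 3, 4, 5], num_partitions=2)
print(rdd.partitions)                       # [[1, 3, 5], [2, 4]]
print(rdd.map(lambda x: x * 2).partitions)  # [[2, 6, 10], [4, 8]]
print(rdd.count())                          # 5
```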

Streams with Spark
[Diagram: Spark Streaming cuts a stream into batches, and the Spark Engine turns those batches into processed data; the stream of batches is a list of RDDs]
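The micro-batching idea can be sketched in a few lines: timestamped events are grouped into consecutive batches of a fixed interval, and each batch is then processed like a small dataset. This is a hypothetical helper, not Spark code; it assumes the events arrive sorted by timestamp.

```python
from typing import Dict, List, Tuple


def to_batches(events: List[Tuple[float, str]], interval: float) -> List[List[str]]:
    """Group (timestamp, payload) events into consecutive batches of `interval` seconds."""
    if not events:
        return []
    start = events[0][0]
    batches: Dict[int, List[str]] = {}
    for ts, payload in events:
        index = int((ts - start) // interval)
        batches.setdefault(index, []).append(payload)
    # An interval without events still yields an (empty) batch.
    return [batches.get(i, []) for i in range(max(batches) + 1)]


stream = [(0.0, "a"), (0.5, "b"), (1.2, "c"), (3.1, "d")]
batches = to_batches(stream, interval=1.0)
print(batches)                          # [['a', 'b'], ['c'], [], ['d']]
print([len(batch) for batch in batches])  # [2, 1, 0, 1]
```

In Spark Streaming each of these batches would be an RDD, so the stream as a whole is a list of RDDs.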


Demo: Deploying Spark to Azure using HDInsight


Azure Event Hubs
• Capable of streaming large volumes of data
• SDKs available for many languages and platforms: Ruby, Python, Java/Scala, C# and Apache Spark


How does an Azure Event Hub work?
[Diagram: three publishers send events to an Event Hub with three partitions; consumers in two consumer groups read from those partitions]
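The two key ideas in this picture, partitioning by key and independent consumer groups, can be sketched as an in-memory model. Everything here is hypothetical: Event Hubs uses its own hashing of the partition key, and MD5 is only a stand-in; three partitions matches the diagram.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumption: matches the three partitions in the diagram
partitions = defaultdict(list)  # partition id -> append-only log of events


def partition_for(key: str) -> int:
    # Same key, same partition: events for one key stay in order.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def publish(key: str, body: str) -> None:
    partitions[partition_for(key)].append(body)


class ConsumerGroup:
    """Keeps its own read offset per partition, so groups read independently."""

    def __init__(self) -> None:
        self.offsets = defaultdict(int)

    def read(self, partition_id: int) -> list:
        log = partitions[partition_id]
        start = self.offsets[partition_id]
        self.offsets[partition_id] = len(log)
        return log[start:]


publish("user-1", "click A")
publish("user-1", "click B")
pid = partition_for("user-1")
dashboard_group, archive_group = ConsumerGroup(), ConsumerGroup()
print(dashboard_group.read(pid))  # ['click A', 'click B']
print(dashboard_group.read(pid))  # []
print(archive_group.read(pid))    # ['click A', 'click B']
```

Reading does not remove events from a partition; each consumer group just moves its own offset, which is why a dashboard and an archiving job can consume the same stream.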


Demo: Using Azure Event Hub with Spark


Tips for going into production
• When using streams, always have n+1 worker nodes
• More partitions = more parallelism, and thus more speed
• Longer batch intervals respond more slowly, but are sometimes better
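The last two tips can be illustrated with a deliberately simplified model, which is an assumption and not how Spark schedules work: each batch pays a fixed overhead, plus per-record work spread over `min(partitions, cores)` parallel tasks. A streaming job keeps up only if each batch finishes within the batch interval. The rates, costs and overhead below are made-up numbers.

```python
def batch_processing_time(interval, rate, cost_per_record, overhead, partitions, cores):
    """Toy model: fixed per-batch overhead plus record work spread over parallel tasks."""
    records = rate * interval
    parallelism = min(partitions, cores)
    return overhead + records * cost_per_record / parallelism


def is_stable(interval, **settings):
    # The job keeps up only if each batch finishes before the next one is due.
    return batch_processing_time(interval=interval, **settings) <= interval


settings = dict(rate=10_000, cost_per_record=0.0001, overhead=0.6, cores=4)
print(is_stable(1, partitions=2, **settings))  # False: 0.6 s + 0.5 s > 1 s
print(is_stable(5, partitions=2, **settings))  # True: 0.6 s + 2.5 s <= 5 s
print(is_stable(1, partitions=4, **settings))  # True: 0.6 s + 0.25 s <= 1 s
```

In this model a longer interval spreads the fixed overhead over more records (higher latency, but stable), and more partitions help only while there are cores to run them on.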


Thanks!
Willem Meints
Technical Evangelist/Microsoft MVP
@willem_meints