sahara+storm: real time processing in sahara · 2019. 2. 26. · event stream processing tools...


Post on 22-Aug-2020


Sahara+Storm: Real time processing in Sahara


Presenters

● Telles Nóbrega: Software Engineer, Universidade Federal de Campina Grande (Brazil); OpenStack ATC
● Andrey Brito: Professor, Universidade Federal de Campina Grande (Brazil)
● Michael McCune: Senior Software Engineer, Red Hat (USA); Sahara Core


Motivation
● What is event stream processing?
○ Techniques for handling unbounded sequences of events
○ Events are pieces of data carrying relevant information
○ Events are processed individually or in small batches (event windows)
● Why do we need it?
○ Log/metric analysis, trending topics, fraud detection, IoT
○ Value of information sometimes decreases quickly
○ Sometimes data is too big to be stored in raw format
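The idea of processing events in small batches (event windows) can be illustrated with a short Python sketch. It is not taken from the slides: the function names, the window size, and the synthetic `sensor_readings` stream are all assumptions chosen for illustration, and it shows only a fixed-size (tumbling) count window, one of several possible windowing strategies.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def tumbling_windows(events: Iterable[int], size: int) -> Iterator[List[int]]:
    """Group a (possibly unbounded) event stream into fixed-size batches."""
    iterator = iter(events)
    while True:
        window = list(islice(iterator, size))
        if not window:
            return  # the stream ended; an unbounded stream never reaches this
        yield window


def sensor_readings() -> Iterator[int]:
    """Hypothetical unbounded source: emits 0, 1, 2, ... forever."""
    value = 0
    while True:
        yield value
        value += 1


# Only the first three windows are consumed, so the unbounded source is safe.
first_three = list(islice(tumbling_windows(sensor_readings(), 4), 3))
averages = [sum(window) / len(window) for window in first_three]
print(first_three)
print(averages)
```

Because each window is reduced to a single average as soon as it fills, the raw events never need to be stored, which matches the point that some data is too big to keep in raw format.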


Motivation
● Event Stream Processing Tools
○ Popular open-source tools (e.g., Apache Storm, Apache Samza, Apache Spark)
○ Cloud-based (e.g., AWS Kinesis)
○ Academic (e.g., StreamMine3G)
● Storm
○ Originated at BackType
○ BackType was acquired by Twitter
○ Apache top-level project since 2014


Storm
● Why Storm?
○ Well known and widely used
○ Scalable
○ Support for fault tolerance
○ Programming language agnostic
● Major components
○ Nimbus
○ Zookeeper
○ Supervisor

[Diagram: one Nimbus node, three Zookeeper nodes, five Supervisor nodes]


Storm - Topology elements
● Data into the system - Spouts
● Data processing - Bolts

https://storm.apache.org/
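The split between spouts (data into the system) and bolts (data processing) can be modeled in plain Python without any Storm dependency. This is a conceptual sketch only, not Storm's API: the generator-based functions, their names, and the sample sentences are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator


def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout: the entry point that brings data into the topology."""
    for line in lines:
        yield line


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt: consumes each tuple and emits zero or more new tuples."""
    for sentence in sentences:
        for word in sentence.split():
            yield word.lower()


def count_bolt(words: Iterable[str]) -> Counter:
    """Bolt: keeps running state (a count per word) across tuples."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
    return counts


# Hypothetical input; a real spout would read from a queue or an API.
lines = ["Storm runs on Sahara", "Sahara provisions Storm clusters"]
word_counts = count_bolt(split_bolt(sentence_spout(lines)))
print(word_counts)
```

In a real topology each element runs as many parallel tasks spread over Supervisor nodes; the chained generators here only show the direction of data flow.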


Storm - Communication between elements
● Shuffle
● Fields (diagram labels: Field = 1, Field = 2)
● Global
● All
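The four groupings named on this slide decide which downstream task receives each tuple. The following Python sketch models that routing decision under simplifying assumptions: the function names are hypothetical, the CRC32 hash is my choice rather than Storm's, and a plain random pick stands in for Storm's shuffle, which additionally balances load across tasks.

```python
import random
import zlib
from typing import List


def shuffle_grouping(num_tasks: int, rng: random.Random) -> List[int]:
    """Shuffle: send the tuple to one randomly chosen task."""
    return [rng.randrange(num_tasks)]


def fields_grouping(field_value: str, num_tasks: int) -> List[int]:
    """Fields: tuples with the same field value always reach the same task."""
    # CRC32 is used because it is stable across runs (an illustrative choice).
    return [zlib.crc32(field_value.encode("utf-8")) % num_tasks]


def global_grouping(num_tasks: int) -> List[int]:
    """Global: every tuple goes to a single task (the lowest task id)."""
    return [0]


def all_grouping(num_tasks: int) -> List[int]:
    """All: every tuple is replicated to every task."""
    return list(range(num_tasks))


NUM_TASKS = 4
rng = random.Random(42)  # seeded only to make the demonstration repeatable
for user in ["alice", "bob", "alice"]:
    print(
        user,
        "shuffle:", shuffle_grouping(NUM_TASKS, rng),
        "fields:", fields_grouping(user, NUM_TASKS),
        "global:", global_grouping(NUM_TASKS),
        "all:", all_grouping(NUM_TASKS),
    )
```

Fields grouping is the one to pick when a bolt keeps per-key state, such as a counter per user, because all tuples for a key land on the same task.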


Storm - Exclamation Topology

example -> example!!! -> example!!!!!!

https://github.com/apache/storm/tree/master/examples/storm-starter/
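The code shown on the following slides was not captured in this transcript. As a stand-in, here is a minimal Python sketch of the data flow the slide describes: a spout emits a word and two chained bolts each append "!!!". The function names and the generator-based wiring are assumptions for illustration; the actual storm-starter example is written in Java against Storm's API.

```python
from typing import Iterable, Iterator, List


def word_spout(words: Iterable[str]) -> Iterator[str]:
    """Spout: emits one word per tuple."""
    for word in words:
        yield word


def exclamation_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt: appends three exclamation marks to every incoming word."""
    for word in stream:
        yield word + "!!!"


def run_topology(words: Iterable[str]) -> List[str]:
    """Wire spout -> first bolt -> second bolt and collect the output."""
    first_stage = exclamation_bolt(word_spout(words))
    second_stage = exclamation_bolt(first_stage)
    return list(second_stage)


result = run_topology(["example"])
print(result)  # ['example!!!!!!']
```

Reusing the same bolt twice mirrors the slide, where the word gains three exclamation marks at each of the two processing steps.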


Storm - Exclamation Topology - Spout


Storm - Exclamation Topology - Bolt


Storm - Exclamation Topology - Main


Storm - Exclamation Topology

● Building and Testing Locally
○ git clone https://github.com/apache/storm.git
○ cd storm/examples/storm-starter
○ mvn clean package
○ storm jar target/storm-starter-0.0.1-SNAPSHOT-standalone.jar storm.starter.ExclamationTopology exclamation-topology


Sahara Overview
● OpenStack’s Data Processing service
● Easy-to-use standard interfaces:
○ Provision clusters of machines
○ Deploy data processing frameworks
○ Scale clusters
○ Run jobs on frameworks
● Full integration into OpenStack Dashboard
● Support for a variety of processing frameworks
○ Hadoop, including vendor-specific distributions
○ Spark
○ Storm


Sahara Architecture


Sahara
● REST interface allows access to all features
● Run jobs of several types: Pig, Hive, Java, Shell, MapReduce, and Streaming
● Access data stored in Swift, HDFS, or Manila
● Elastic Data Processing with the ability to use static, transient, and scalable clusters
● Plugin-based structure for data processing frameworks
○ Allows for addition of new frameworks


Sahara + Storm
● Storm became part of Sahara in the Kilo release
● Full UX of Storm via OpenStack
○ Horizon


Preparing a Cluster - Node Group Templates
● Nimbus (Master)
● Supervisor (Slave)
● Zookeeper
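The slides create these templates through the Horizon dashboard, and the screenshots are not part of this transcript. As a rough sketch of what the three node group templates might look like as JSON request bodies, the Python below builds one per role. Everything specific here is an assumption to be checked against the Sahara API reference for your release: the field names, the plugin name `storm`, the version string `0.9.2`, the process names, and the flavor ID.

```python
import json
from typing import Dict, List

# All values below are assumptions for illustration, not taken from the slides.
STORM_PLUGIN = "storm"    # assumed plugin name
STORM_VERSION = "0.9.2"   # assumed plugin version string
FLAVOR_ID = "2"           # hypothetical flavor; depends on the deployment


def node_group_template(name: str, processes: List[str]) -> Dict[str, object]:
    """Build a node group template body for one cluster role."""
    return {
        "name": name,
        "plugin_name": STORM_PLUGIN,
        "hadoop_version": STORM_VERSION,  # assumed name of the version field
        "flavor_id": FLAVOR_ID,
        "node_processes": processes,
    }


# One template per role listed on the slide.
templates = [
    node_group_template("storm-nimbus", ["nimbus"]),
    node_group_template("storm-supervisor", ["supervisor"]),
    node_group_template("storm-zookeeper", ["zookeeper"]),
]
payloads = [json.dumps(template, indent=2) for template in templates]
print(payloads[0])
```

Keeping one process per template makes it easy to scale the Supervisor group on its own later, since a cluster template can reference each node group with a separate instance count.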


Preparing a Cluster - Cluster Templates
● Preparing your cluster


Preparing a Cluster - Starting
● Preparing your cluster


Preparing a Cluster - Storm UI


http://storm-master:8080


Running a Job - Binaries


Running a Job - Creating Template


Running a Job - Templates


Running a Job - Launch Job


Running a Job - Configs


Running a Job


Running a Job - Storm UI


storm-master:8080


Sahara + Storm - Next Steps
● Improve integration with Data Sources
○ Kafka
○ Zaqar
● Add ability to run Python jobs
● Put newer versions of Storm into Sahara
○ 0.9.3
○ 0.9.4
○ 0.9.5


Demo

FilterSaharaTweets


Wiring Topology


Demo

● Please Tweet!!!
○ e.g.: Storm works great in #Sahara


Acknowledgements


Thank you!


Sahara + Storm


AlarmBolt


Wiring Topology
