apache gearpump - lightweight real-time streaming engine
TRANSCRIPT
![Page 1: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/1.jpg)
Apache GearpumpLightweight Real-time Streaming Engine
![Page 2: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/2.jpg)
About me ● Software Engineer at Intel Big Data Team● Apache Gearpump committer, awesome-streaming● Previously MapReduce NativeTask, storm-benchmark● Shanghai Big Data Streaming Meetup
![Page 3: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/3.jpg)
History of Gearpump● Conceived at Intel in mid-2014 ● Open source project on GitHub from start● Entered Apache incubation on Mar.8th, 2016● Current stable release 0.8.0
“The name Gearpump is a reference to the engineering term “Gear Pump”, which is a super simple pump that consists of only two gears, but is very powerful at streaming water.”
![Page 4: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/4.jpg)
Yet Another Streaming Engine ?
https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/
![Page 5: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/5.jpg)
(source: The Evolution of Massive-Scale Data Processing, slide 4)
![Page 6: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/6.jpg)
Data Processing TradeoffsCorrectness
Low Latency Low Cost
(source: The Beam Model, slide 10)
![Page 7: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/7.jpg)
Billing
Correctness
Low Latency Low Cost
Correctness
Low Latency Low Cost
Live Cost Estimate
Correctness
Low Latency Low Cost
Abuse Detection
Correctness
Low Latency Low Cost
Abuse Detection Backfill
Use case: charge advertisers
![Page 8: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/8.jpg)
GearpumpCorrectness
Low Latency Low Cost
(source: The Beam Model, slide 10)
● Out of order processing ● Exactly Once● Flow Control● Fault Tolerance
● Native Streaming● Message Driven
● Message Driven● Dynamic DAG
![Page 9: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/9.jpg)
Overview - DAG
Task
![Page 10: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/10.jpg)
Overview - Application
Executor (JVM)
AppMaster
![Page 11: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/11.jpg)
Overview - Cluster
Woker
Master (HA) Dashboard
Woker
Client A
AppMaster
Client B
![Page 12: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/12.jpg)
Overview - Deployment● Local mode● Standalone mode● YARN mode
![Page 13: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/13.jpg)
Overview - API
Processor API
Scala DSL
Java DSLStorm
Akka Streams Beam API
SAMOA REST
![Page 14: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/14.jpg)
Storm compatibility● Binary compatibility● Dynamic DAG● Support Storm 0.9 and 0.10
![Page 15: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/15.jpg)
Overview - Performance
● 100 byte message ● 48-core, 256 GB memory,
four node cluster
Source Sink
shuffle
![Page 16: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/16.jpg)
Yahoo Streaming Benchmarks
(source: benchmarking streaming computation engines at yahoo)
![Page 17: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/17.jpg)
Yahoo Streaming Benchmarks
https://github.com/yahoo/streaming-benchmarks/pull/10
![Page 18: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/18.jpg)
(source: https://www.flickr.com/photos/mike_lao/2588723972)
![Page 19: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/19.jpg)
Demo
![Page 20: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/20.jpg)
Dynamic DAGRuntime DAG modification without restarting applications
![Page 21: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/21.jpg)
Change parallelism
![Page 22: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/22.jpg)
Change logicv1
v1
v2
v2
Jarupload
![Page 23: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/23.jpg)
Message Driven Processing
![Page 24: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/24.jpg)
Message driven processing
Executor (JVM) ● Task is thread safe● Task is only taking up CPU on
incoming messages ● Scale up to 10000 task on
single four-core machine1
1. Gearpump Task is actually Akka Actor and it is reported ~2.5 million actors per GB of heap by Akka
![Page 25: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/25.jpg)
Out of order processing
(source: The Evolution of Massive-Scale Data Processing, slide 72)
![Page 26: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/26.jpg)
Event time based window count
7:00 7:01 6:59
Watermark = 7:00
7:02 6:58 6:56 6:51 6:52
Watermark = 6:55 Watermark = 6:50
6:49
([6:50, 6:55], 3) ([6:45, 6:50], 1)Window count
![Page 27: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/27.jpg)
Exactly Once No lost or duplicate updates to state
![Page 28: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/28.jpg)
0:50
0:45
0:55
0:35
0:40
0:30
0:30
kafka
kafka
Persistent Store
(timestamp, kafka_offset)
(timestamp, state) (timestamp)
local watermark = 0:45
local watermark = 0:30
global watermark = min(local watermarks)= 0:30
Checkpoint
![Page 29: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/29.jpg)
0:50
0:45
0:55
0:40
0:30
0:30
kafka
kafka
Persistent Store
(timestamp, kafka_offset)
(timestamp, state) (timestamp)
local watermark = 0:45
local watermark = 0:30
global watermark = min(local watermarks)= 0:30
Crash
![Page 30: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/30.jpg)
0:30
0:30
0:30
0:30
kafka
kafka
Persistent Store
(0:30, kafka_offset)
(0:30, state) (0:30)
Recover0:30
0:30
0:30
replay
local watermark = global watermark= 0:30
![Page 31: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/31.jpg)
Example
global watermark (local watermark, kafka offset) (local watermark, state)
0:00 (0:10, 1) (0:00, 0)
0:10 (0:20, 2) (0:10, 1)
0:20 (0:30, 3) (0:20, 2)
0:30 (0:40, 4) (0:30, 3)
checkpoint
crash
![Page 32: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/32.jpg)
Example
global watermark (local watermark, kafka offset) (local watermark, state)
0:30 (0:30, 3) (0:30, 3)
0:30 (0:40, 4) (0:30, 3)
0:40 (0:50, 5) (0:40, 4)
0:50 (1:00, 6) (0:50, 5)
recover
checkpoint
![Page 33: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/33.jpg)
Flow Control
![Page 34: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/34.jpg)
Without flow control
Fast Task
Slow Task
Fast Task
OOM
![Page 35: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/35.jpg)
Performant message track
A B100101102200201202
100
A B200201202300301302
200
AckRequestAckRequest
AckRequest AckRequest
Ack
Ack
Received 100 messages
Received 200 messages
![Page 36: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/36.jpg)
Flow control
Fast Task
Slow Task
Fast Task
No ack, stop sending
Back-pressure
![Page 37: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/37.jpg)
Flow control
Slow Task
Slow Task
Fast Task
No ack, stop sending
Back-pressure
![Page 38: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/38.jpg)
Flow control
Slow Task
Slow Task
Fast Task
No ack, stop sending
Back-pressure
![Page 39: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/39.jpg)
Fault Tolerance
![Page 40: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/40.jpg)
Master HA ● Conflict-free Replicated Data Type (CRDT) for state consistency
Leader
Follower Follower
sync
sync
sync
![Page 41: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/41.jpg)
Resource isolation● Linux CGroup● Configurable CPU resource per executor (JVM)● Configurable executor number per application
![Page 42: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/42.jpg)
References1. An Introduction to the Beam Model2. The Evolution of Massive-Scale Data Processing3. gearpump.apache.org4. akka.io5. http://www.slideshare.net/SeanZhong/strata-singapore-gearpumpreal-
time-dagprocessing-with-akka-at-scale
![Page 43: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/43.jpg)
Q & A
![Page 44: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/44.jpg)
Backup slides
![Page 45: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/45.jpg)
Supervisor hierarchyMaster
AppMaster
Executor
Task
AppMaster
Executor Executor
Task Task Task
watch
![Page 46: Apache Gearpump - Lightweight Real-time Streaming Engine](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a8b3ea1a28abbd6b8b53c5/html5/thumbnails/46.jpg)
Supervisor hierarchyMaster
AppMaster
Executor
Task
AppMaster
Executor Executor
Task Task Task
watch
report failure
report failure