kenneth knowles - apache beam - a unified model for batch and streaming data processing
TRANSCRIPT
![Page 1: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/1.jpg)
Apache Beam (incubating)Kenneth Knowles
Apache Beam (incubating) PPMCSoftware Engineer @ Google
[email protected] / @KennKnowles Flink Forward 2016https://goo.gl/jzlvD9
A Unified Model for Batch and Streaming Data Processing
![Page 2: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/2.jpg)
What is Apache Beam?
Apache Beam is
a unified programming model
for expressing
efficient and portable
data processing pipelines.
![Page 3: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/3.jpg)
Big Data: Infinite & Out of Order
The Beam Model
Beam Project / Technical Vision
Agenda
1
2
3
3
![Page 4: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/4.jpg)
4
Big Data:Infinite & Out of Order
1
![Page 5: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/5.jpg)
https://commons.wikimedia.org/wiki/File:Globe_centered_in_the_Atlantic_Ocean_(green_and_grey_globe_scheme).svg5
![Page 6: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/6.jpg)
6
Unbounded, delayed, out of order
9:008:00 14:0013:0012:0011:0010:002:001:00 7:006:005:004:003:00
6
8:00
8:008:00
![Page 7: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/7.jpg)
Incoming!
Score per user?
7
![Page 8: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/8.jpg)
Organizing the stream
8
8:00
8:00
8:00
![Page 9: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/9.jpg)
Completeness Latency Cost
$$$
Data Processing Tradeoffs
9
![Page 10: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/10.jpg)
What is important for your application?
Completeness Low Latency Low Cost
Important
Not Important
$$$10
![Page 11: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/11.jpg)
Monthly Billing
Completeness Low Latency Low Cost
Important
Not Important
$$$11
![Page 12: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/12.jpg)
Billing estimate
Completeness Low Latency Low Cost
Important
Not Important
$$$12
![Page 13: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/13.jpg)
Abuse Detection
Completeness Low Latency Low Cost
Important
Not Important
$$$13
![Page 14: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/14.jpg)
20142004 2006 2008 2010 2012 20162005 2007 2009 2013 20152011
MapReduce(paper)
Apache Hadoop
Dataflow Model(paper)
See also: Tyler Akidau's Evolution of Massive-Scale Data Processing (goo.gl/VlVAEp)
MillWheel(paper)
Heron
ApacheSpark
ApacheStorm
Apache Gearpump
(incubating)Apache
Apex
Apache Flink
Cloud Dataflow
FlumeJava(paper)
Apache Beam (incubating)
Choices abound
Apache Samza
![Page 15: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/15.jpg)
15
The Beam Model2
![Page 16: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/16.jpg)
The Beam Model
Pipeline
16
PTransform
PCollection
(bounded or unbounded)
![Page 17: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/17.jpg)
The Beam Vision (for users)
Sum Per Key
17
input.apply(
Sum.integersPerKey())
Java
input | Sum.PerKey()
Python
Apache Flink
Apache Spark
Cloud Dataflow
⋮ ⋮
Apache Gearpump
(incubating)
Apache Apex
![Page 18: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/18.jpg)
Pipeline p = Pipeline.create(options);
p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*"))
.apply(FlatMapElements.via(line → Arrays.asList(line.split("[^a-zA-Z']+"))))
.apply(Filter.byPredicate(word → !word.isEmpty()))
.apply(Count.perElement())
.apply(MapElements.via(count → count.getKey() + ": " + count.getValue())
.apply(TextIO.Write.to("gs://..."));
p.run();
What your (Java) Code Looks Like
18
![Page 19: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/19.jpg)
The Beam Model: Asking the Right Questions
What are you computing?
Where in event time?
When in processing time are results produced?
How do refinements relate?
19
![Page 20: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/20.jpg)
The Beam Model: Asking the Right Questions
What are you computing?
Where in event time?
When in processing time are results produced?
How do refinements relate?
20
Aggregations, transformations, ...
![Page 21: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/21.jpg)
The Beam Model: What are you computing?
Sum Per User
21
![Page 22: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/22.jpg)
The Beam Model: What are you computing?
Sum Per Key
22
input.apply(Sum.integersPerKey()) .apply(BigQueryIO.Write.to(...));
Java
input | Sum.PerKey() | Write(BigQuerySink(...))
Python
![Page 23: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/23.jpg)
Per element(ParDo)
Grouping(Group/Combine Per Key)
Composite
The Beam Model: What are you computing?
![Page 24: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/24.jpg)
The Beam Model: What are you computing?
Sum Per Key
24
input.apply(Sum.integersPerKey()) .apply(BigQueryIO.Write.to(...));
Java
input | Sum.PerKey() | Write(BigQuerySink(...))
Python
![Page 25: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/25.jpg)
The Beam Model: Asking the Right Questions
What are you computing?
Where in event time?
When in processing time are results produced?
How do refinements relate?
25
Event time windowing
![Page 26: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/26.jpg)
26
The Beam Model: Where in Event Time?8:00
8:00
8:00
![Page 27: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/27.jpg)
Processing Time vs Event Time
Event Time = Processing Time ??
27
![Page 28: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/28.jpg)
Processing Time vs Event Time
28
Proc
essi
ng T
ime
Event Time
![Page 29: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/29.jpg)
Proc
essi
ng T
ime
Processing Time vs Event Time
Realtime
29
This is not possible
Event Time
![Page 30: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/30.jpg)
Processing Time vs Event Time
30
Processing Delay
Proc
essi
ng T
ime
![Page 31: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/31.jpg)
Processing Time vs Event TimeVery delayed
31
Proc
essi
ng T
ime
Event Time
![Page 32: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/32.jpg)
Processing Time windows(probably are not what you want)
Proc
essi
ng T
ime
Event Time 32
![Page 33: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/33.jpg)
Event Time Windows
33
Proc
essi
ng T
ime
Event Time
![Page 34: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/34.jpg)
Proc
essi
ng T
ime
Event Time
Event Time Windows
34
(implementing processing time windows)
Just throw away your data's timestamps and replace them with "now()"
![Page 35: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/35.jpg)
input | WindowInto(FixedWindows(3600) | Sum.PerKey()
| Write(BigQuerySink(...))
Python
The Beam Model: Where in Event Time?
Sum Per Key
Window Into
35
input.apply(
Window.into(
FixedWindows.of(
Duration.standardHours(1))) .apply(Sum.integersPerKey())
.apply(BigQueryIO.Write.to(...))
Java
![Page 36: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/36.jpg)
Fixed Windows(also called Tumbling)
Sliding Windows
User Sessions
The Beam Model: Where in Event Time?1. Assign each timestamped
event to one or more windows
2. Merge those windows according to custom logic
![Page 37: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/37.jpg)
So that's what and where...
37
![Page 38: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/38.jpg)
The Beam Model: Asking the Right Questions
What are you computing?
Where in event time?
When in processing time are results produced?
How do refinements relate?
38
Watermarks & Triggers
![Page 39: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/39.jpg)
Event time windowsPr
oces
sing
Tim
e
39
Event Time
![Page 40: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/40.jpg)
Fixed cutoff (we can do better)Pr
oces
sing
Tim
e
Event Time40
Allowed delay
Concurrent windows
![Page 41: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/41.jpg)
Perfect watermarkPr
oces
sing
Tim
e
41
Event Time
Check out Slava's slides from Strata London 2016 talk on watermarks:https://goo.gl/K4FnqQ
![Page 42: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/42.jpg)
Heuristic WatermarkPr
oces
sing
Tim
e
42
Event Time
![Page 43: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/43.jpg)
Heuristic WatermarkPr
oces
sing
Tim
e
43
Current processing time
Event Time
![Page 44: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/44.jpg)
Heuristic WatermarkPr
oces
sing
Tim
e
44
Current processing time
Event Time
![Page 45: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/45.jpg)
Heuristic WatermarkPr
oces
sing
Tim
e
45
Current processing time
Late data
Event Time
![Page 46: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/46.jpg)
Watermarks measure completeness
46
$$$
$$$
$$$
? Running Total
✔ Monthly billing
? Abuse Detection
![Page 47: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/47.jpg)
The Beam Model: When in Processing Time?
Sum Per Key
Window Into
47
input
.apply(Window.into(FixedWindows.of(...))
.triggering(
AfterWatermark.pastEndOfWindow())) .apply(Sum.integersPerKey())
.apply(BigQueryIO.Write.to(...))
Java
input | WindowInto(FixedWindows(3600),
trigger=AfterWatermark())
| Sum.PerKey()
| Write(BigQuerySink(...))
Python
Trigger after end of window
![Page 48: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/48.jpg)
Proc
essi
ng T
ime
Event Time
AfterWatermark.pastEndOfWindow()
48
![Page 49: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/49.jpg)
Current processing time
Proc
essi
ng T
ime
Event Time49
AfterWatermark.pastEndOfWindow()
![Page 50: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/50.jpg)
Proc
essi
ng T
ime
Event Time
Late data
50
Current processing time
AfterWatermark.pastEndOfWindow()
![Page 51: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/51.jpg)
Proc
essi
ng T
ime
Event Time51
High completeness
Potentially high latency
Low cost
AfterWatermark.pastEndOfWindow()
$$$
![Page 52: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/52.jpg)
Proc
essi
ng T
ime
Event Time
Repeatedly.forever( AfterPane.elementCountAtLeast(2))
52
![Page 53: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/53.jpg)
Proc
essi
ng T
ime
Event Time53
Current processing time
Repeatedly.forever( AfterPane.elementCountAtLeast(2))
![Page 54: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/54.jpg)
Current processing time
Proc
essi
ng T
ime
Event Time54
Repeatedly.forever( AfterPane.elementCountAtLeast(2))
![Page 55: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/55.jpg)
Proc
essi
ng T
ime
Event Time55
Current processing time
Repeatedly.forever( AfterPane.elementCountAtLeast(2))
![Page 56: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/56.jpg)
Proc
essi
ng T
ime
Event Time56
Repeatedly.forever( AfterPane.elementCountAtLeast(2))
Low completeness
Low latency
Cost driven by input$$$
![Page 57: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/57.jpg)
Build a finely tuned trigger for your use caseAfterWatermark.pastEndOfWindow()
.withEarlyFirings(
AfterProcessingTime
.pastFirstElementInPane()
.plusDuration(Duration.standardMinutes(1))
.withLateFirings(AfterPane.elementCountAtLeast(1))
57
Bill at end of month
Near real-time estimates
Immediate corrections
![Page 58: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/58.jpg)
Proc
essi
ng T
ime
Event Time58
.withEarlyFirings(after 1 minute)
.withLateFirings(ASAP after each element)
![Page 59: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/59.jpg)
Proc
essi
ng T
ime
Event Time59
Current processing time
.withEarlyFirings(after 1 minute)
.withLateFirings(ASAP after each element)
![Page 60: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/60.jpg)
Proc
essi
ng T
ime
Event Time60
Current processing time
Low completeness
Low latency
Low cost, driven by time$$$
.withEarlyFirings(after 1 minute)
.withLateFirings(ASAP after each element)
![Page 61: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/61.jpg)
Current processing time
Proc
essi
ng T
ime
Event Time61
.withEarlyFirings(after 1 minute)
.withLateFirings(ASAP after each element)
![Page 62: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/62.jpg)
Current processing time
Proc
essi
ng T
ime
Event Time
Late output
62
.withEarlyFirings(after 1 minute)
.withLateFirings(ASAP after each element)
![Page 63: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/63.jpg)
Proc
essi
ng T
ime
Event Time
Late output
63
.withEarlyFirings(after 1 minute)
.withLateFirings(ASAP after each element)
![Page 64: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/64.jpg)
Trigger CatalogueComposite TriggersBasic Triggers
64
AfterEndOfWindow()
AfterCount(n)
AfterProcessingTimeDelay(Δ)
AfterEndOfWindow()
.withEarlyFirings(A)
.withLateFirings(B)
AfterAny(A, B)
AfterAll(A, B)
Repeat(A)
Sequence(A, B)
![Page 65: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/65.jpg)
The Beam Model: Asking the Right Questions
What are you computing?
Where in event time?
When in processing time are results produced?
How do refinements relate?
65
Accumulation Mode
![Page 66: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/66.jpg)
66
The Beam Model: How do refinements relate?2
5 7 14 25
Window.into(...)
.triggering(...)
.discardingFiredPanes()
5
Window.into(...)
.triggering(...)
.accumulatingFiredPanes()
711
![Page 67: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/67.jpg)
The Beam Model: Asking the Right Questions
What are you computing?
Where in event time?
When in processing time are results produced?
How do refinements relate?
67
![Page 68: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/68.jpg)
68
Beam Project / Technical Vision3
![Page 69: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/69.jpg)
Dataflow → BeamGoogleCloudPlatform/DataflowJavaSDK
cloudera/spark-dataflowdataArtisans/flink-dataflow
apache/incubator-beam
Contributors [with GitHub badges] from:Google, Data Artisans, Cloudera, Talend, Paypal, Spotify, Intel, Twitter, Capital One, DataTorrent, …, <your org here>
![Page 70: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/70.jpg)
End users - who want to write pipelines in a language that’s familiar.
SDK authors - who want to make Beam concepts available in new languages.
Runner authors - who have a distributed processing environment and want to run Beam pipelines Beam Fn API: Invoke user-definable functions
Apache Flink
Apache Spark
Beam Runner API: Build and submit a pipeline
OtherLanguagesBeam Java
Beam Python
Execution Execution
Cloud Dataflow
Execution
The Beam Vision
Apache Apex
Apache Gearpump (incubating)
![Page 71: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/71.jpg)
Outlook
Dataflow Java 1.x
Apache Beam Java 0.x
Apache Beam Java 2.xBug Fix
Feature
Breaking Change
We are
here
Feb 2016
![Page 72: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/72.jpg)
Capability Matrix
http://beam.apache.org/learn/runners/capability-matrix/
![Page 73: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/73.jpg)
Unified - One model handles batch and streaming use cases.
Portable - Pipelines can be executed on multiple execution environments, avoiding lock-in.
Extensible - Supports user and community driven SDKs, Runners, transformation libraries, and IO connectors.
Why Apache Beam?
![Page 74: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/74.jpg)
Why Apache Beam?http://data-artisans.com/why-apache-beam/
"We firmly believe that the Beam model is the correct programming model for streaming and batch data processing."
- Kostas Tzoumas (Data Artisans)
https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
"We hope it will lead to a healthy ecosystem of sophisticated runners that compete by making users happy, not [via] API lock in."
- Tyler Akidau (Google)
![Page 75: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/75.jpg)
Beam here at Flink Forward 2016
75
I hope you saw:
Beaming Flink to the Cloud @ Netflix Monal Daxini - Netflix
And stay in this room for:
Flink and Beam: Current State & RoadmapMaximilian Michels - data Artisans
No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - Google
![Page 76: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/76.jpg)
http://beam.incubator.apache.org/
Join the community!
User discussions - [email protected] discussions - [email protected] @ApacheBeam on Twitter
Good Reads
Why Apache Beam? (from Data Artisans)Why Apache Beam? (from Google)
Streaming 101Streaming 102The Dataflow Beam Model
More Beam!
76
![Page 77: Kenneth Knowles - Apache Beam - A Unified Model for Batch and Streaming Data Processing](https://reader036.vdocuments.us/reader036/viewer/2022081604/587138441a28abf0568b62fd/html5/thumbnails/77.jpg)
END
77