Stream Processing made simple with Kafka

Kafka Streams: Stream Processing Made Simple with Kafka. Guozhang Wang, Hadoop Summit, June 28, 2016


TRANSCRIPT

Page 1: Stream Processing made simple with Kafka

Kafka Streams Stream processing Made Simple with Kafka

1

Guozhang Wang Hadoop Summit, June 28, 2016

Page 2: Stream Processing made simple with Kafka

2

What is NOT Stream Processing?

Page 3: Stream Processing made simple with Kafka

3

Stream Processing isn’t (necessarily)

• Transient, approximate, lossy…

• …something that requires batch processing as a safety net

Page 4: Stream Processing made simple with Kafka

4

Page 5: Stream Processing made simple with Kafka

5

Page 6: Stream Processing made simple with Kafka

6

Page 7: Stream Processing made simple with Kafka

7

Page 8: Stream Processing made simple with Kafka

8

Stream Processing

• A different programming paradigm

• .. that brings computation to unbounded data

• .. with tradeoffs between latency / cost / correctness

Page 9: Stream Processing made simple with Kafka

9

Why Kafka in Stream Processing?

Page 10: Stream Processing made simple with Kafka

10

• Persistent Buffering

• Logical Ordering

• Highly Scalable “source-of-truth”

Kafka: Real-time Platforms

Page 11: Stream Processing made simple with Kafka

11

Stream Processing with Kafka

Page 12: Stream Processing made simple with Kafka

12

• Option I: Do It Yourself !

Stream Processing with Kafka

Page 13: Stream Processing made simple with Kafka

13

• Option I: Do It Yourself !

Stream Processing with Kafka

while (isRunning) {
    // read some messages from Kafka
    inputMessages = consumer.poll();

    // do some processing…

    // send output messages back to Kafka
    producer.send(outputMessages);
}

Page 14: Stream Processing made simple with Kafka

14

DIY Stream Processing is Hard

• Ordering

• Partitioning & Scalability

• Fault tolerance

• State Management

• Time, Window & Out-of-order Data

• Re-processing

Page 15: Stream Processing made simple with Kafka

15

• Option I: Do It Yourself !

• Option II: full-fledged stream processing system

• Storm, Spark, Flink, Samza, ..

Stream Processing with Kafka

Page 16: Stream Processing made simple with Kafka

16

MapReduce Heritage?

• Config Management

• Resource Management

• Deployment

• etc..



Can I just use my own?!

Page 19: Stream Processing made simple with Kafka

19

• Option I: Do It Yourself !

• Option II: full-fledged stream processing system

• Option III: lightweight stream processing library

Stream Processing with Kafka

Page 20: Stream Processing made simple with Kafka

Kafka Streams

• In Apache Kafka since v0.10, May 2016

• Powerful yet easy-to-use stream processing library

• Event-at-a-time, Stateful

• Windowing with out-of-order handling

• Highly scalable, distributed, fault tolerant

• and more…

Page 21: Stream Processing made simple with Kafka

21

Anywhere, anytime


Page 22: Stream Processing made simple with Kafka

22

Anywhere, anytime

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams</artifactId>
  <version>0.10.0.0</version>
</dependency>

Page 23: Stream Processing made simple with Kafka

23

Anywhere, anytime

War File, Rsync, Puppet/Chef, YARN, Mesos, Docker, Kubernetes

(Very Uncool → Very Cool)

Page 24: Stream Processing made simple with Kafka

24

Simple is Beautiful

Page 25: Stream Processing made simple with Kafka

Kafka Streams DSL

25

public static void main(String[] args) {
    // specify the processing topology by first reading in a stream from a topic
    KStream<String, String> words = builder.stream("topic1");

    // count the words in this stream as an aggregated table
    KTable<String, Long> counts = words.countByKey("Counts");

    // write the result table to a new topic
    counts.to("topic2");

    // create a stream processing instance and start running it
    KafkaStreams streams = new KafkaStreams(builder, config);
    streams.start();
}
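The topology above needs a running Kafka cluster, but the core of what countByKey maintains is just an incremental count per key. A minimal plain-Java sketch of that bookkeeping (the WordCount class and its method are illustrative helpers, not part of the Kafka Streams API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCount {
    // Incrementally count occurrences per key, the way an aggregated KTable would.
    public static Map<String, Long> countByKey(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : keys) {
            // each incoming record updates the "table" one entry at a time
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByKey(List.of("hello", "kafka", "hello")));
    }
}
```

Kafka Streams does the same thing continuously, record by record, with the counts held in a fault-tolerant state store rather than a plain HashMap.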


Page 31: Stream Processing made simple with Kafka

31

Native Kafka Integration

Properties cfg = new Properties();

cfg.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");

cfg.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

cfg.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

cfg.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");

cfg.put(KafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "registry:8081");

StreamsConfig config = new StreamsConfig(cfg);

KafkaStreams streams = new KafkaStreams(builder, config);


Page 33: Stream Processing made simple with Kafka

33

API, coding

“Full stack” evaluation

Operations, debugging, …

Page 34: Stream Processing made simple with Kafka

34


Simple is Beautiful

Page 35: Stream Processing made simple with Kafka

35

Key Idea:

Outsource hard problems to Kafka!

Page 36: Stream Processing made simple with Kafka

Kafka Concepts: the Log

(diagram: an append-only log of messages at offsets 3, 4, 5, …, 12; the producer writes at the tail while Consumer1 reads at offset 7 and Consumer2 reads at offset 10, each at its own pace)
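The log in the picture reduces to an append-only list plus per-consumer read positions. A toy sketch of that idea (the ToyLog class is illustrative, not Kafka's API):

```java
import java.util.ArrayList;
import java.util.List;

public class ToyLog {
    private final List<String> messages = new ArrayList<>();

    // Producers append to the tail; a message's offset is simply its index.
    public long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Each consumer tracks its own offset and reads independently of the others.
    public String read(long offset) {
        return messages.get((int) offset);
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        for (String m : List.of("m0", "m1", "m2")) {
            log.append(m);
        }
        System.out.println(log.read(1)); // a consumer positioned at offset 1
    }
}
```

Because consumers only move a cursor, the same log can serve many readers at different positions, which is what makes it a durable, replayable "source of truth".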

Page 37: Stream Processing made simple with Kafka

(diagram: Topic 1 and Topic 2 are each split into partitions spread across brokers; producers write to and consumers read from those partitions)

Kafka Concepts: the Log

Page 38: Stream Processing made simple with Kafka

38

Kafka Streams: Key Concepts

Page 39: Stream Processing made simple with Kafka

Stream and Records

39

Key Value Key Value Key Value Key Value

Stream

Record

Page 40: Stream Processing made simple with Kafka

Processor Topology

40

Stream

Page 41: Stream Processing made simple with Kafka

Processor Topology

41

(diagram: a Processor node operating on a Stream)

Page 42: Stream Processing made simple with Kafka

Processor Topology

42

KStream<..> stream1 = builder.stream("topic1");

KStream<..> stream2 = builder.stream("topic2");

KStream<..> joined = stream1.leftJoin(stream2, ...);

KTable<..> aggregated = joined.aggregateByKey(...);

aggregated.to("topic3");


Page 47: Stream Processing made simple with Kafka

Processor Topology

47

Source Processors:

KStream<..> stream1 = builder.stream("topic1");

KStream<..> stream2 = builder.stream("topic2");

Sink Processor:

aggregated.to("topic3");

Page 48: Stream Processing made simple with Kafka

Processor Topology

48

(diagram: the processor topology runs inside the Kafka Streams application, reading from and writing to Kafka)

Page 49: Stream Processing made simple with Kafka

Data Parallelism

49

(diagram: two application instances, MyApp.1 and MyApp.2, run Task1 and Task2; each task processes its own partitions of Kafka Topic A and writes to Kafka Topic B)
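This scales because records are routed to tasks by key: every record for a given key lands on the same partition, so one task sees all of them. A sketch of key-based routing (the modulo-of-hashCode scheme here is illustrative; Kafka's default partitioner hashes with murmur2):

```java
public class Partitioner {
    // Route a record to one of numPartitions partitions by hashing its key.
    public static int partitionFor(String key, int numPartitions) {
        // Math.abs keeps the result non-negative for this sketch
        return Math.abs(key.hashCode() % numPartitions);
    }

    public static void main(String[] args) {
        // the same key always maps to the same partition, so the task that
        // owns that partition sees every record for the key
        System.out.println(partitionFor("alice", 4));
        System.out.println(partitionFor("alice", 4));
    }
}
```

Adding an instance just reassigns partitions (and their tasks) across instances; no record for a key ever ends up split across two tasks.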

Page 50: Stream Processing made simple with Kafka

50

Stream Processing Hard Parts

• Ordering

• Partitioning & Scalability

• Fault tolerance

• State Management

• Time, Window & Out-of-order Data

• Re-processing

Page 51: Stream Processing made simple with Kafka

States in Stream Processing

51

Stateless: filter, map

Stateful: join, aggregate

Page 52: Stream Processing made simple with Kafka

52

Page 53: Stream Processing made simple with Kafka

States in Stream Processing

53

KStream<..> stream1 = builder.stream("topic1");

KStream<..> stream2 = builder.stream("topic2");

KStream<..> joined = stream1.leftJoin(stream2, ...);

KTable<..> aggregated = joined.aggregateByKey(...);

aggregated.to("topic3");

State

Page 54: Stream Processing made simple with Kafka

States in Stream Processing

54

(diagram: Task1 and Task2 each keep their own local State while reading partitions of Kafka Topic A and writing to Kafka Topic B)

Page 55: Stream Processing made simple with Kafka

It’s all about Time

• Event-time (when an event is created)

• Processing-time (when an event is processed)

55

Page 56: Stream Processing made simple with Kafka

Event-time:       1     2     3     4     5     6     7
Processing-time:  1999  2002  2005  1997  1980  1983  2015

56

(Example: the seven Star Wars films — The Phantom Menace, Attack of the Clones, Revenge of the Sith, A New Hope, The Empire Strikes Back, Return of the Jedi, The Force Awakens. The episode number is the event-time; the release year is the processing-time.)

Out-of-Order
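A record is out of order when it arrives after some record with a larger event-time has already been processed. A minimal detector sketch (the class is illustrative, not a Kafka Streams API):

```java
public class OutOfOrderDetector {
    private long maxEventTime = Long.MIN_VALUE;

    // Returns true if this record's event-time is smaller than the largest
    // event-time seen so far, i.e. it arrived late in processing order.
    public boolean isOutOfOrder(long eventTime) {
        boolean late = eventTime < maxEventTime;
        maxEventTime = Math.max(maxEventTime, eventTime);
        return late;
    }

    public static void main(String[] args) {
        OutOfOrderDetector d = new OutOfOrderDetector();
        // episodes in release order: 4, 5, 6, 1, 2, 3, 7
        for (long episode : new long[]{4, 5, 6, 1, 2, 3, 7}) {
            System.out.println("episode " + episode + " out of order: "
                    + d.isOutOfOrder(episode));
        }
    }
}
```

Run on the Star Wars releases, episodes 1, 2, and 3 are flagged as out of order: they arrive after episodes 4 through 6 have already been seen.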


Page 59: Stream Processing made simple with Kafka

Timestamp Extractor

59

// processing-time
public long extract(ConsumerRecord<Object, Object> record) {
    return System.currentTimeMillis();
}

// event-time
public long extract(ConsumerRecord<Object, Object> record) {
    return record.timestamp();
}
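The two extractors above need ConsumerRecord from the Kafka client jar; the distinction itself can be shown self-contained with a stand-in record type (StubRecord and the method names below are illustrative, not Kafka's API):

```java
public class ExtractorDemo {
    // Stand-in for Kafka's ConsumerRecord: just the embedded timestamp.
    public record StubRecord(long timestamp) {}

    // event-time semantics: use the timestamp carried inside the record
    public static long eventTime(StubRecord r) {
        return r.timestamp();
    }

    // processing-time semantics: use the wall clock at the moment of processing
    public static long processingTime(StubRecord r) {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        StubRecord r = new StubRecord(1467072000000L); // some fixed past timestamp
        System.out.println(eventTime(r));      // deterministic, the same on replay
        System.out.println(processingTime(r)); // different on every run
    }
}
```

The practical difference: event-time results are reproducible when a topic is replayed, while processing-time results depend on when the job happens to run.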

Page 60: Stream Processing made simple with Kafka

Windowing

60

(diagram: aggregation windows advancing along time t)
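The bucketing behind a windowed aggregation can be sketched in one line: each event-time maps to the window that contains it. This shows fixed-size tumbling windows identified by their start timestamp (an illustrative sketch, not the Kafka Streams TimeWindows API):

```java
public class Windows {
    // Assign an event-time to a tumbling window of the given size;
    // the window is identified by its start timestamp (inclusive).
    public static long windowStart(long eventTime, long sizeMs) {
        return eventTime - (eventTime % sizeMs);
    }

    public static void main(String[] args) {
        // with 1-minute windows, 90s and 100s land in the same window [60s, 120s)
        System.out.println(windowStart(90_000, 60_000));  // 60000
        System.out.println(windowStart(100_000, 60_000)); // 60000
    }
}
```

Because assignment uses event-time, a late record still falls into its original window, which is how out-of-order data can be folded into the right aggregate.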


Page 67: Stream Processing made simple with Kafka

67

Stream Processing Hard Parts

• Ordering

• Partitioning & Scalability

• Fault tolerance

• State Management

• Time, Window & Out-of-order Data

• Re-processing

Page 68: Stream Processing made simple with Kafka

Stream vs. Table?

68

KStream<..> stream1 = builder.stream("topic1");

KStream<..> stream2 = builder.stream("topic2");

KStream<..> joined = stream1.leftJoin(stream2, ...);

KTable<..> aggregated = joined.aggregateByKey(...);

aggregated.to("topic3");

State

Page 69: Stream Processing made simple with Kafka

69

Tables ≈ Streams

Page 70: Stream Processing made simple with Kafka

70

Page 71: Stream Processing made simple with Kafka

71

Page 72: Stream Processing made simple with Kafka

72

Page 73: Stream Processing made simple with Kafka

The Stream-Table Duality

• A stream is a changelog of a table

• A table is a materialized view of a stream at a point in time

• Example: change data capture (CDC) of databases

73

Page 74: Stream Processing made simple with Kafka

KStream = interprets data as a record stream

~ think: “append-only”

KTable = interprets data as a changelog stream

~ a continuously updated materialized view

74
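The duality is mechanical: replaying a changelog in order, with later values for a key overwriting earlier ones, yields the table. A hedged sketch (class and method names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Duality {
    // Materialize a changelog stream into a table: apply the updates in order,
    // letting a later value for a key overwrite the earlier one.
    public static Map<String, String> materialize(List<Map.Entry<String, String>> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> update : changelog) {
            table.put(update.getKey(), update.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        // three changelog records; the second record for "alice" is an update
        Map<String, String> table = materialize(List.of(
                Map.entry("alice", "lnkd"),
                Map.entry("bob", "googl"),
                Map.entry("alice", "msft")));
        System.out.println(table); // only the latest value per key remains
    }
}
```

The other direction also holds: iterating over a table's entries (or capturing each put as a record) gives back a changelog stream, which is exactly what database CDC does.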

Page 75: Stream Processing made simple with Kafka

75

KStream (user purchase history): (alice, eggs), (bob, lettuce), (alice, milk)

KTable (user employment profile): (alice, lnkd), (bob, googl), (alice, msft)

Page 76: Stream Processing made simple with Kafka

76

(the same records, read one at a time over time)

KStream (user purchase history): “Alice bought eggs.”

KTable (user employment profile): “Alice is now at LinkedIn.”

Page 77: Stream Processing made simple with Kafka

77

(after all three records)

KStream (user purchase history): “Alice bought eggs and milk.”

KTable (user employment profile): “Alice is now at Microsoft.” (the LinkedIn entry has been overwritten)

Page 78: Stream Processing made simple with Kafka

78

Records over time: (alice, 2), (bob, 10), (alice, 3)

After the first record (alice, 2):

KStream.aggregate() → (key: Alice, value: 2)

KTable.aggregate() → (key: Alice, value: 2)

Page 79: Stream Processing made simple with Kafka

79

Records over time: (alice, 2), (bob, 10), (alice, 3)

After the third record (alice, 3):

KStream.aggregate() → (key: Alice, value: 2 + 3)

KTable.aggregate() → (key: Alice, value: 3, the new value replacing 2)
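The two aggregation semantics can be sketched side by side: a KStream treats each record as an independent fact, a KTable treats each record as an update (class and method names below are illustrative, not the Kafka Streams API):

```java
import java.util.List;

public class AggregateSemantics {
    // KStream semantics: every record is an independent fact, so values add up.
    public static long sumAsStream(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // KTable semantics: every record is an update, so the latest value wins.
    public static long latestAsTable(List<Long> values) {
        return values.get(values.size() - 1);
    }

    public static void main(String[] args) {
        List<Long> alice = List.of(2L, 3L); // Alice's values, in arrival order
        System.out.println(sumAsStream(alice));   // 5 = 2 + 3
        System.out.println(latestAsTable(alice)); // 3, the update replaces 2
    }
}
```

Choosing KStream or KTable for a topic is therefore a statement about what the records mean, not about how they are stored.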

Page 80: Stream Processing made simple with Kafka

80

KStream: map(), filter(), join(), …; reduce(), aggregate() → KTable

KTable: map(), filter(), join(), …; toStream() → KStream

Page 81: Stream Processing made simple with Kafka

81

Updates Propagation in KTable

(diagram: updates flow from KStream stream1 and KStream stream2 through KStream joined into KTable aggregated, which maintains its State)


Page 85: Stream Processing made simple with Kafka

85

Stream Processing Hard Parts

• Ordering

• Partitioning & Scalability

• Fault tolerance

• State Management

• Time, Window & Out-of-order Data

• Re-processing

Page 86: Stream Processing made simple with Kafka

86

Remember?

Page 87: Stream Processing made simple with Kafka

87

Fault Tolerance

(diagram: each processing task keeps local State next to its Process logic; every state update is also written to a changelog topic in Kafka, alongside the Kafka input and output topics)
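Because every state update is also appended to a changelog topic, a lost store can be rebuilt by replaying that changelog. A minimal restore sketch (the ChangelogRestore class is illustrative; Kafka Streams does this internally):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangelogRestore {
    // Rebuild a state store by replaying its changelog records in order;
    // for each key, the last update in the changelog wins.
    public static Map<String, Long> restore(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> state = new HashMap<>();
        for (Map.Entry<String, Long> update : changelog) {
            state.put(update.getKey(), update.getValue());
        }
        return state;
    }

    public static void main(String[] args) {
        // the changelog recorded two updates to Alice's count before the failure
        Map<String, Long> state = restore(List.of(
                Map.entry("alice", 2L),
                Map.entry("alice", 5L)));
        System.out.println(state); // the restored store holds the latest value
    }
}
```

This is the "outsource hard problems to Kafka" idea in action: durability of state is delegated to the same replicated log that carries the data.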

Page 88: Stream Processing made simple with Kafka

88

Fault Tolerance

(diagram: when an instance fails, the consumer group rebalance protocol migrates its tasks to the surviving instances, which rebuild the lost State by replaying the Kafka changelog)

Page 90: Stream Processing made simple with Kafka

90

Page 91: Stream Processing made simple with Kafka

91

Page 92: Stream Processing made simple with Kafka

92

Page 93: Stream Processing made simple with Kafka

93

Page 94: Stream Processing made simple with Kafka

94


Page 95: Stream Processing made simple with Kafka

95

Stream Processing Hard Parts

• Ordering

• Partitioning & Scalability

• Fault tolerance

• State Management

• Time, Window & Out-of-order Data

• Re-processing

Simple is Beautiful

Page 96: Stream Processing made simple with Kafka

96

But how do we get data in and out of Kafka?

Page 97: Stream Processing made simple with Kafka

97

Page 98: Stream Processing made simple with Kafka

98

Page 99: Stream Processing made simple with Kafka

99

Page 100: Stream Processing made simple with Kafka

100


Page 103: Stream Processing made simple with Kafka

Take-aways

• Stream Processing: a new programming paradigm

• Kafka Streams: stream processing made easy

103

THANKS!

Guozhang Wang | [email protected] | @guozhangwang

Visit Confluent at the Syncsort Booth (#1303), live demos @ 29th

Download Kafka Streams: www.confluent.io/product

Page 104: Stream Processing made simple with Kafka

104

We are Hiring!