Stream Application Development with Apache Kafka


TRANSCRIPT

Page 1: Stream Application Development with Apache Kafka


Stream Application Development with Apache® Kafka™

Matthias J. Sax | Software Engineer

[email protected]

@MatthiasJSax

Page 2: Stream Application Development with Apache Kafka


Apache Kafka is… a distributed streaming platform

Consumers

Producers

Connectors

Processing

Page 3: Stream Application Development with Apache Kafka


Confluent is… a company founded by the original creators of Apache Kafka

• Built on Apache Kafka
• Confluent Open Source
• Confluent Enterprise

All components but Kafka are optional to run Confluent. Mix and match them as required.

Page 4: Stream Application Development with Apache Kafka


Kafka Streams is… the easiest way to process data in Kafka (as of v0.10)

• Easy to use library
  • Real stream processing / record by record / ms latency
  • DSL
  • Focus on applications
  • No cluster / “cluster to-go”
  • “DB to-go”

• Expressive
  • Single record transformations
  • Aggregations / joins
  • Time, windowing, out-of-order data
  • Stream-table duality

• Tightly integrated within Kafka
  • Fault-tolerant
  • Scalable (s/m/l/xl), elastic
  • Encryption, authentication, authorization

• Stateful
  • Backed by Kafka
  • Queryable / “DB to-go”

• Data reprocessing
  • Application “reset button”

Page 5: Stream Application Development with Apache Kafka


Before Kafka Streams

Do-it-yourself stream processing
• Hard to get right / lots of “glue code”
• Fault tolerance / scalability … ???

plain consumer/producer clients

Page 6: Stream Application Development with Apache Kafka


Before Kafka Streams

Page 7: Stream Application Development with Apache Kafka


Before Kafka Streams

Do-it-yourself stream processing
• Hard to get right / lots of “glue code”
• Fault tolerance / scalability … ???

Using a framework
• Requires a cluster
  • Bare metal – hard to manage
  • YARN / Mesos
• Test locally, deploy remotely
  • “Can you please deploy my code?”
• Jar and dependency hell

How does your application interact with your stream processing job?

plain consumer/producer clients

and others...

Page 8: Stream Application Development with Apache Kafka


Before Kafka Streams

Page 9: Stream Application Development with Apache Kafka


Build apps, not clusters!

Page 10: Stream Application Development with Apache Kafka


Easy to use!

$ java -cp MyApp.jar \
      io.confluent.MyApp

Page 11: Stream Application Development with Apache Kafka


Easy to integrate!

Page 12: Stream Application Development with Apache Kafka


Queryable / “DB to-go”

Page 13: Stream Application Development with Apache Kafka


How to install Kafka Streams? Not at all. It’s a library.

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.10.1.0</version>
</dependency>

Page 14: Stream Application Development with Apache Kafka


How do I deploy my app? Whatever works for you. It’s just an app like any other!

Page 15: Stream Application Development with Apache Kafka


If it’s just a regular application…

• How does it scale?
• How can it be fault-tolerant?
• How does it handle distributed state?

Off-load hard problems to brokers.

• Kafka is a streaming platform: no need to reinvent the wheel
• Exploit consumer groups and the group management protocol
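To make this concrete, here is a minimal sketch of how little an app needs (Kafka Streams 0.10.1 API; the application id, broker address, and topic names are illustrative assumptions):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

final Properties props = new Properties();
// All instances sharing this application id form one consumer group;
// the brokers rebalance partitions across them automatically.
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");      // hypothetical id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker

final KStreamBuilder builder = new KStreamBuilder();
builder.stream("input-topic").to("output-topic"); // hypothetical topics

final KafkaStreams streams = new KafkaStreams(builder, props);
streams.start(); // to scale out, just start more instances of the same app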

Page 16: Stream Application Development with Apache Kafka


Scaling

Page 17: Stream Application Development with Apache Kafka


Scaling

Page 18: Stream Application Development with Apache Kafka


Scaling

Easy to scale! It’s elastic! “Cluster to-go”

Page 19: Stream Application Development with Apache Kafka


Fault-tolerance

Consumer group rebalance

Page 20: Stream Application Development with Apache Kafka


Distributed State

State stores

Page 21: Stream Application Development with Apache Kafka


Distributed State

State stores

Page 22: Stream Application Development with Apache Kafka


Distributed State

Page 23: Stream Application Development with Apache Kafka


Yes, it’s complicated…

API, coding

Org. processes

Reality™

Operations

Security

Architecture

Page 24: Stream Application Development with Apache Kafka


But…

API, coding

Org. processes

Reality™

Operations

Security

Architecture

You

Kafka core / Kafka Streams

Page 25: Stream Application Development with Apache Kafka


KStream/KTable

• KStream
  • Record stream
  • Each record describes an event in the real world
  • Example: click stream

• KTable
  • Changelog stream
  • Each record describes a change to a previous record
  • Example: position report stream
  • In Kafka Streams: KTable holds a materialized view of the latest update per key as internal state
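In code, the distinction is just how a topic is read; a minimal sketch (0.10.1 API; topic and store names are illustrative assumptions):

import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

final KStreamBuilder builder = new KStreamBuilder();

// Record stream: every record is an independent event
final KStream<String, String> clicks = builder.stream("Clicks"); // hypothetical topic

// Changelog stream: every record is an update for its key; Kafka Streams
// materializes the latest value per key in a local state store
final KTable<String, String> positions = builder.table("Positions", "PositionsStore"); // hypothetical names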

Page 26: Stream Application Development with Apache Kafka


KTable

User profile/location information as a changelog stream: (alice, paris), (bob, zurich), (alice, berlin)

KTable state after each record:
1. { alice → paris }
2. { alice → paris, bob → zurich }
3. { alice → berlin, bob → zurich }

Page 27: Stream Application Development with Apache Kafka


KTable (count moves)

Record stream: (alice, paris), (bob, zurich), (alice, berlin)

Counting moves per user with count(), the KTable state and the emitted changelog output evolve as follows:
1. state { alice → 0 }: output (alice, 0)
2. state { alice → 0, bob → 0 }: output (bob, 0)
3. state { alice → 1, bob → 0 }: output (alice, 1)
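A sketch of this in code (0.10.1 API; topic and store names are illustrative):

import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

final KStreamBuilder builder = new KStreamBuilder();
final KStream<String, String> moves = builder.stream("Moves"); // hypothetical topic, <user : city>

// count() maintains a per-key count as a KTable, backed by a local state
// store; every update is also emitted downstream as a changelog record.
// Note: the built-in count() counts records, so for the stream above it
// would emit (alice, 1), (bob, 1), (alice, 2), while the slide counts
// moves starting at 0.
final KTable<String, Long> movesPerUser = moves
    .groupByKey()
    .count("MovesStore"); // hypothetical store name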

Page 28: Stream Application Development with Apache Kafka


KTable (cont.)

• Internal state: continuously updating materialized view of the latest status

• Downstream result (“output”): changelog stream, describing every update to the materialized view

KStream stream = …
KTable table = stream.groupByKey().aggregate(...)

It’s the changelog!

Page 29: Stream Application Development with Apache Kafka


KStream/KTable

Page 30: Stream Application Development with Apache Kafka


Time and Windows

• Event time (default)
  • Create time
  • (Broker) ingestion time
  • Customized
• Processing time

• (Hopping) time windows
  • Overlapping or non-overlapping (tumbling)
  • For aggregations
• Sliding windows
  • For KStream-KStream joins

KStream stream = …

KTable table = stream.groupByKey().aggregate(..., TimeWindows.of(10 * 1000), ...);
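Hopping and tumbling windows differ only in the advance interval; a sketch with illustrative sizes:

import org.apache.kafka.streams.kstream.TimeWindows;

// Tumbling (non-overlapping) windows: advance == size
final TimeWindows tumbling = TimeWindows.of(60 * 1000L);

// Hopping (overlapping) windows: advance < size
final TimeWindows hopping = TimeWindows.of(5 * 60 * 1000L).advanceBy(60 * 1000L);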

Page 31: Stream Application Development with Apache Kafka


KTable Semantics

• Non-windowed:
  • State is kept forever
  • Out-of-order/late-arriving records are handled straightforwardly
  • A KTable aggregation can be viewed as a landmark window (i.e., window size == infinite)
  • Output is a changelog stream

• Windowed:
  • Windows (i.e., state) are kept “forever” (well, there is a configurable retention time)
  • Out-of-order/late-arriving records are handled straightforwardly
  • Output is a changelog stream

• No watermarks required
• Early updates/results
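The configurable retention time mentioned above is set per window; a one-line sketch (duration is illustrative):

import org.apache.kafka.streams.kstream.TimeWindows;

// Keep window state for 24 hours so late-arriving records can still update
// their (old) windows; after that, the windows are dropped.
final TimeWindows windows = TimeWindows.of(60 * 1000L).until(24 * 60 * 60 * 1000L);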

Page 32: Stream Application Development with Apache Kafka


Show Code!

Page 33: Stream Application Development with Apache Kafka


Page Views per Region

[Diagram] The click stream <userId : page> is joined (stream/table join) with the profile changelog <userId : region>, which materializes the current user info; a count per region then yields the page views per region <region : page>.

Page 34: Stream Application Development with Apache Kafka


Page Views per Region

final KStreamBuilder builder = new KStreamBuilder();

// read record stream from topic “PageViews” and changelog stream from topic “UserProfiles”
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>

Page 35: Stream Application Development with Apache Kafka


Page Views per Region

final KStreamBuilder builder = new KStreamBuilder();

// read record stream from topic “PageViews” and changelog stream from topic “UserProfiles”
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>

// enrich page views with user’s region -- stream-table join
final KStream<String, String> viewsWithRegionKey = views
    .leftJoin(userProfiles, (page, userRegion) -> page + "," + userRegion)
    // and set “region” as the new key
    .map((userId, pageAndRegion) ->
        new KeyValue<>(pageAndRegion.split(",")[1], pageAndRegion.split(",")[0]));

Page 36: Stream Application Development with Apache Kafka


Page Views per Region

final KStreamBuilder builder = new KStreamBuilder();

// read record stream from topic “PageViews” and changelog stream from topic “UserProfiles”
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>

// enrich page views with user’s region -- stream-table join AND set “region” as new key
final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles, ...).map(...); // <region : page>

// count views by region, using hopping windows of size 5 minutes that advance every 1 minute
final KTable<Windowed<String>, Long> viewsPerRegion = viewsWithRegionKey
    .groupByKey() // redistribute data
    .count(TimeWindows.of(5 * 60 * 1000L).advanceBy(60 * 1000L), "GeoPageViewsStore");

Page 37: Stream Application Development with Apache Kafka


Page Views per Region

final KStreamBuilder builder = new KStreamBuilder();

// read record stream from topic “PageViews” and changelog stream from topic “UserProfiles”
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>

// enrich page views with user’s region -- stream-table join AND set “region” as new key
final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles, ...).map(...); // <region : page>

// count views by region, using hopping windows of size 5 minutes that advance every 1 minute
final KTable<Windowed<String>, Long> viewsByRegion = viewsWithRegionKey.groupByKey().count(TimeWindows.of(...)..., ...);

// write result
viewsByRegion
    .toStream((windowedRegion, count) -> windowedRegion.toString()) // prepare result
    .to(stringSerde, longSerde, "PageViewsByRegion"); // write to topic “PageViewsByRegion”

Page 38: Stream Application Development with Apache Kafka


Page Views per Region

final KStreamBuilder builder = new KStreamBuilder();

// read record stream from topic “PageViews” and changelog stream from topic “UserProfiles”
final KStream<String, String> views = builder.stream("PageViews"); // <userId : page>
final KTable<String, String> userProfiles = builder.table("UserProfiles", "UserProfilesStore"); // <userId : region>

// enrich page views with user’s region -- stream-table join AND set “region” as new key
final KStream<String, String> viewsWithRegionKey = views.leftJoin(userProfiles, ...).map(...); // <region : page>

// count views by region, using hopping windows of size 5 minutes that advance every 1 minute
final KTable<Windowed<String>, Long> viewsByRegion = viewsWithRegionKey.groupByKey().count(TimeWindows.of(...)..., ...);

// write result to topic “PageViewsByRegion”
viewsByRegion.toStream(...).to(..., "PageViewsByRegion");

// start application
final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration); // streamsConfiguration omitted for brevity
streams.start();

// stop application
streams.close();

/* https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/PageViewRegionExample.java */

Page 39: Stream Application Development with Apache Kafka


Interactive Queries

• KTable is a changelog stream with a materialized internal view (state)
• A KStream-KTable join can do lookups into the materialized view
• What if the application could do lookups, too?

https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/

Yes, it can! “DB to-go”
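A minimal sketch of such a lookup with the 0.10.1 interactive-queries API, given the app's running KafkaStreams instance `streams` (as created on page 38; the store name and key are illustrative assumptions):

import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// Open a read-only view on one of the app's own local state stores...
final ReadOnlyKeyValueStore<String, Long> store =
    streams.store("CountsStore", QueryableStoreTypes.<String, Long>keyValueStore());

// ...and query the materialized view directly, no external database needed
final Long aliceCount = store.get("alice");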

Page 40: Stream Application Development with Apache Kafka


Interactive Queries

charlie → 3, bob → 5, alice → 2

New API to access local state stores of an app instance

Page 41: Stream Application Development with Apache Kafka


Interactive Queries

charlie → 3, bob → 5, alice → 2

New API to discover running app instances

“host1:4460”

“host5:5307”

“host3:4777”
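A sketch of the discovery call (0.10.1 API; store name and key are illustrative, and the actual RPC between instances is left to the application):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.StreamsMetadata;

// Which running instance hosts the store partition for key "alice"?
final StreamsMetadata metadata =
    streams.metadataForKey("CountsStore", "alice", Serdes.String().serializer());

// Route the query there via your own RPC layer, e.g. "host1:4460"
final String target = metadata.host() + ":" + metadata.port();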

Page 42: Stream Application Development with Apache Kafka


Interactive Queries

charlie → 3, bob → 5, alice → 2

You: inter-app communication (RPC layer)

Page 43: Stream Application Development with Apache Kafka


Wrapping Up

• Kafka Streams is available in Apache Kafka 0.10 and Confluent Platform 3.1
  • http://kafka.apache.org/
  • http://www.confluent.io/download (open source + enterprise versions, tar/zip/deb/rpm)

• Kafka Streams demos at https://github.com/confluentinc/examples
  • Java 7, Java 8+ with lambdas, and Scala
  • WordCount, joins, Avro integration, top-N computation, windowing, interactive queries

• Apache Kafka documentation: http://kafka.apache.org/documentation.html
• Confluent documentation: http://docs.confluent.io/current/streams/
  • Quickstart, Concepts, Architecture, Developer Guide, FAQ

• Join our bi-weekly Confluent Developer Roundtable sessions on Kafka Streams
  • Contact me at [email protected] for details

Page 44: Stream Application Development with Apache Kafka


Thank You! We are hiring!