introducing apache kafka's streams api - kafka meetup munich, jan 25 2017

1 Apache Kafka meetup, Munich, Germany, Jan 25, 2017 Introducing Kafka’s Streams API Taking real-time processing to the mainstream Michael Noll <[email protected]> Product manager, Confluent

Upload: michael-noll

Post on 07-Feb-2017

TRANSCRIPT

Page 1: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Apache Kafka meetup, Munich, Germany, Jan 25, 2017

Introducing Kafka’s Streams API
Taking real-time processing to the mainstream

Michael Noll <[email protected]>
Product manager, Confluent

Page 2: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 3: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 4: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 5: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Our Dream Our Reality

Page 6: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Kafka’s Streams API
Taking real-time processing to the mainstream

Page 7: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Kafka’s Streams API
Taking real-time processing to the mainstream

• Powerful yet easy-to-use library to build stream processing apps
• Apps are standard Java applications that run on client machines
• Part of open source Apache Kafka, introduced in 0.10+
• https://github.com/apache/kafka/tree/trunk/streams

[Diagram: Your App, containing the Streams API, connected to a Kafka cluster]

Page 8: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams</artifactId>
  <version>0.10.1.1</version>
</dependency>

Build Applications, not Clusters!

Page 9: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

“Cluster to go”: elastic, scalable, distributed, fault-tolerant, secure apps

Page 10: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

“Database to go”: tables, state management, interactive queries

Page 11: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Equally viable for S / M / L / XL / XXL use cases

Ok. Ok. Ok.

Page 12: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Runs everywhere: from containers to cloud

Page 13: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

When to use Kafka’s Streams API

To build real-time applications for your core business

Use case examples
• Customer 360-degree view
• Fleet or inventory management
• Fraud detection
• Real-time monitoring & intelligence
• Location-based marketing
• Virtual Reality (avatar replication)
• <and more>

Scenarios
• Microservices
• Fast Data apps for small and big data
• Reactive applications
• Continuous queries and transformations
• Event-triggered processes
• The “T” in ETL
• <and more>

Page 14: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Some public use cases in the wild & external articles
• Why Kafka Streams: towards a real-time streaming architecture, by Sky Betting and Gaming
  • http://engineering.skybettingandgaming.com/2017/01/23/streaming-architectures/
• Applying Kafka’s Streams API for social messaging at LINE Corp.
  • http://developers.linecorp.com/blog/?p=3960
  • Production pipeline at LINE, a social platform based in Japan with 220+ million users
• Microservices and Reactive Applications at Capital One
  • https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams
• Containerized Kafka Streams applications in Scala, by Hive Streaming
  • https://www.madewithtea.com/processing-tweets-with-kafka-streams.html
• Geo-spatial data analysis
  • http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/
• Language classification with machine learning
  • https://dzone.com/articles/machine-learning-with-kafka-streams

Page 15: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Kafka Summit NYC, May 09

Here, the community will share the latest Kafka Streams use cases.

http://kafka-summit.org/

Page 16: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Do more with less

Page 17: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Architecture comparison: use case example
Real-time dashboard for security monitoring

“Which of my data centers are under attack?”

Page 18: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 19: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

With Streams API

Page 20: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Organizational benefits: decouple teams and roadmaps, scale people

Page 21: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Available APIs

Page 22: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

• API option 1: DSL (declarative)

KStream<Integer, Integer> input = builder.stream("numbers-topic");

// Stateless computation
KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2);

// Stateful computation
KTable<Integer, Integer> sumOfOdds = input
    .filter((k, v) -> v % 2 != 0)
    .selectKey((k, v) -> 1)
    .groupByKey()
    .reduce((v1, v2) -> v1 + v2, "sum-of-odds");

The preferred API for most use cases.

9 out of 10 users pick the DSL.

Particularly appeals to:
• Fans of Scala, functional programming
• Users familiar with e.g. Spark
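As a rough, Kafka-free illustration of what the two DSL computations on this slide produce, the sketch below applies the same logic to an in-memory list. The class and method names (`DslSemanticsSketch`, `doubled`, `sumOfOddsUpdates`) are made up for this example and are not part of the Streams API; a real KTable would emit such updates continuously as records arrive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Kafka: mimics the slide's logic on a list.
final class DslSemanticsSketch {

    // Stateless: each output depends only on the current input value.
    static List<Integer> doubled(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            out.add(v * 2);
        }
        return out;
    }

    // Stateful: keeps a running sum of the odd values and records the
    // updated sum after every odd input, similar to a table's update stream.
    static List<Integer> sumOfOddsUpdates(List<Integer> input) {
        List<Integer> updates = new ArrayList<>();
        int sum = 0;
        for (int v : input) {
            if (v % 2 != 0) {
                sum += v;
                updates.add(sum);
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(doubled(numbers));          // [2, 4, 6, 8, 10]
        System.out.println(sumOfOddsUpdates(numbers)); // [1, 4, 9]
    }
}
```

The stateless step needs no memory between records, while the stateful step must remember the running sum; that remembered value is what a state store holds in a real application.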

Page 23: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

• API option 2: Processor API (imperative)

class PrintToConsoleProcessor<K, V> implements Processor<K, V> {

  @Override
  public void init(ProcessorContext context) {}

  // Called once per incoming record
  @Override
  public void process(K key, V value) {
    System.out.println("Got value " + value);
  }

  @Override
  public void punctuate(long timestamp) {}

  @Override
  public void close() {}
}

Full flexibility but more manual work

Appeals to:
• Users who require functionality that is not yet available in the DSL
• Users familiar with e.g. Storm, Samza
• Still, check out the DSL!

Page 24: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Writing and running your first application
• Preparation: Ensure Kafka cluster is accessible, has data to process
• Step 1: Write the application code in Java or Scala, see next slide
  • Great starting point: https://github.com/confluentinc/examples
  • Documentation: http://docs.confluent.io/current/streams/
• Step 2: Run the application
  • During development: from your IDE, from CLI … (pro tip: Application Reset Tool is great for playing around)
  • In production: e.g. bundle as fat jar, then `java -cp my-fatjar.jar com.example.MyStreamsApp`
  • http://docs.confluent.io/current/streams/developer-guide.html#running-a-kafka-streams-application

Page 25: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Example: complete app, ready for production at large-scale

WordCount

App configuration

Define processing (here: WordCount)

Start processing

Page 26: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key concepts

Page 27: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key concepts

Page 28: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key concepts

Page 29: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key concepts

Kafka’s data model Kafka’s Streams API

Page 30: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Streams and Tables
Stream Processing meets Databases

Page 31: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 32: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 33: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key observation: close relationship between Streams and Tables

http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
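The relationship between streams and tables can be sketched in plain Java: replaying a stream of updates yields a table, and reading a table row by row yields a stream of updates. This is only an illustration under simple assumptions (string keys, integer values, no deletions); `DualitySketch` and its methods are hypothetical names, not Streams API classes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of stream-table duality; no Kafka classes involved.
final class DualitySketch {

    // Stream -> table: replay a changelog of (key, value) updates;
    // the latest value per key wins.
    static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> update : changelog) {
            table.put(update.getKey(), update.getValue());
        }
        return table;
    }

    // Table -> stream: emit one update record per row of the table.
    static List<Map.Entry<String, Integer>> toStream(Map<String, Integer> table) {
        List<Map.Entry<String, Integer>> stream = new ArrayList<>();
        for (Map.Entry<String, Integer> row : table.entrySet()) {
            stream.add(new AbstractMap.SimpleEntry<>(row.getKey(), row.getValue()));
        }
        return stream;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> changelog = new ArrayList<>();
        changelog.add(new AbstractMap.SimpleEntry<>("alice", 1));
        changelog.add(new AbstractMap.SimpleEntry<>("bob", 1));
        changelog.add(new AbstractMap.SimpleEntry<>("alice", 2));

        Map<String, Integer> table = toTable(changelog);
        System.out.println(table); // {alice=2, bob=1}
    }
}
```

Converting the resulting table back to a stream and replaying it gives the same table again, which is the round trip the duality refers to.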

Page 34: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 35: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 36: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 37: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 38: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 39: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 40: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features

Page 41: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features in 0.10
• Native, 100%-compatible Kafka integration

Page 42: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Native, 100% compatible Kafka integration

Read from Kafka

Write to Kafka

Page 43: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features in 0.10
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features

Page 44: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Secure stream processing with the Streams API
• Your applications can leverage all client-side security features in Apache Kafka
• Security features include:
  • Encrypting data-in-transit between applications and Kafka clusters
  • Authenticating applications against Kafka clusters (“only some apps may talk to the production cluster”)
  • Authorizing applications against Kafka clusters (“only some apps may read data from sensitive topics”)

[Diagram: Your App (Streams API) and the Kafka cluster, connected with data encryption]
Your App: “I’m the Payments app!”
Kafka cluster: “Ok, you may read the Purchases topic.”
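Encryption of data in transit is switched on through ordinary client configuration. The sketch below builds such a configuration as plain `java.util.Properties`; the keys `security.protocol`, `ssl.truststore.location` and `ssl.truststore.password` are standard Kafka client settings, while the class name, the path and the password are placeholders chosen for this example. Authentication and authorization need further settings on both client and broker that are not shown here.

```java
import java.util.Properties;

// Hypothetical helper: collects the client-side settings for encrypted connections.
final class SecureClientConfigSketch {

    static Properties sslSettings(String truststorePath, String truststorePassword) {
        Properties props = new Properties();
        // Talk to the brokers over TLS instead of plaintext.
        props.put("security.protocol", "SSL");
        // Truststore holding the certificates used to verify the brokers.
        props.put("ssl.truststore.location", truststorePath);
        props.put("ssl.truststore.password", truststorePassword);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; a real deployment supplies its own path and secret.
        Properties props = sslSettings("/etc/security/kafka.client.truststore.jks", "changeit");
        System.out.println(props.getProperty("security.protocol")); // SSL
    }
}
```

These properties would be merged into the application's regular configuration, so the processing code itself does not change when security is enabled.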

Page 45: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features in 0.10
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant

Page 46: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 47: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 48: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 49: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features in 0.10
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations

Page 50: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Stateful computations
• Stateful computations include aggregations (e.g. counting), joins, and windowing
• State stores are the backbone of state management
  • … are local for best performance
  • … are continuously backed up to Kafka to enable elasticity and fault-tolerance
  • … are per stream task for isolation, think: share-nothing
• Pluggable storage engines
  • Default: RocksDB (a key-value store) to allow for local state that is larger than available RAM
  • You can also use your own storage engine
• From the user perspective
  • DSL: no need to worry about anything, state management is automatically being done for you
  • Processor API: direct access to state stores – very flexible but more manual work
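The idea of a local store that is continuously backed up can be sketched without Kafka. In the simplified model below, an in-memory map stands in for the local storage engine and a list stands in for the backup kept in Kafka; a new instance rebuilds its state by replaying that backup. `CountingStoreSketch` is a hypothetical class written for this illustration and does not mirror the actual state store interfaces.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a map plays the local store, a list plays the backup in Kafka.
final class CountingStoreSketch {
    private final Map<String, Long> local = new HashMap<>();
    private final List<Map.Entry<String, Long>> changelog;

    CountingStoreSketch(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
        // Restore: replay the backup so a fresh instance recovers prior state.
        for (Map.Entry<String, Long> update : changelog) {
            local.put(update.getKey(), update.getValue());
        }
    }

    // Stateful aggregation (counting): update locally, then append to the backup.
    long increment(String key) {
        long updated = local.getOrDefault(key, 0L) + 1;
        local.put(key, updated);
        changelog.add(new AbstractMap.SimpleImmutableEntry<>(key, updated));
        return updated;
    }

    // Reads are served from local state only.
    long get(String key) {
        return local.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> backup = new ArrayList<>();
        CountingStoreSketch first = new CountingStoreSketch(backup);
        first.increment("kafka");
        first.increment("kafka");

        // Simulate a replacement instance taking over after a failure.
        CountingStoreSketch second = new CountingStoreSketch(backup);
        System.out.println(second.get("kafka")); // 2
    }
}
```

Because every update lands in the backup, another instance can take over the work, which is the property the slide ties to elasticity and fault-tolerance.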

Page 51: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 52: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 53: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 54: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 55: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Use case: real-time, distributed joins at large scale

Page 56: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Use case: real-time, distributed joins at large scale

Page 57: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Use case: real-time, distributed joins at large scale

Page 58: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features in 0.10
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries

Page 59: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Page 60: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features in 0.10
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model

Page 61: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Time

Page 62: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Time

[Figure labels: A, C, B]

Page 63: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features in 0.10
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Supports late-arriving and out-of-order data

Page 64: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Out-of-order and late-arriving data: example

Users with mobile phones enter airplane, lose Internet connectivity

Emails are being written during the 8h flight

Internet connectivity is restored, phones will send queued emails now, though with an 8h delay

Bob writes Alice an email at 2 P.M.

Bob’s email is finally being sent at 10 P.M.

Page 65: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features in 0.10
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Supports late-arriving and out-of-order data
• Windowing

Page 66: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Windowing
• To group related events in a stream
• Use case examples:
  • Time-based analysis of ad impressions (“number of ads clicked in the past hour”)
  • Monitoring statistics of telemetry data (“1min/5min/15min averages”)
  • Analyzing user browsing sessions on a news site

[Figure: input data for users alice, bob and dave, plotted by processing-time and event-time; colors represent different users’ events, rectangles denote different event-time windows]
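A minimal way to picture event-time windowing is to assign every event to a fixed-size window based on its own timestamp rather than on its arrival time. The plain-Java sketch below counts events per user and window under that assumption; `TumblingWindowSketch`, the one-hour window size and the non-negative millisecond timestamps are choices made for this example, and the code ignores concerns such as how long old windows are retained.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of fixed-size, non-overlapping event-time windows.
final class TumblingWindowSketch {
    private final long windowSizeMs;
    // Counts keyed by "user@windowStart".
    private final Map<String, Integer> counts = new TreeMap<>();

    TumblingWindowSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Start of the window that contains the given (non-negative) event timestamp.
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - (eventTimeMs % windowSizeMs);
    }

    // Events are assigned by event time, so arrival order does not matter.
    void add(String user, long eventTimeMs) {
        String key = user + "@" + windowStart(eventTimeMs, windowSizeMs);
        counts.merge(key, 1, Integer::sum);
    }

    int count(String user, long windowStartMs) {
        return counts.getOrDefault(user + "@" + windowStartMs, 0);
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        TumblingWindowSketch windows = new TumblingWindowSketch(hour);

        // An event from 22:00 arrives first, then a delayed one written at 14:05.
        windows.add("bob", 22 * hour);
        windows.add("bob", 14 * hour + 5 * 60_000L);

        // The delayed event is still counted in the 14:00 window.
        System.out.println(windows.count("bob", 14 * hour)); // 1
    }
}
```

Grouping by processing-time instead would have placed the delayed event in the 22:00 window, which is why the distinction between the two notions of time matters for late data.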

Page 67: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Key features in 0.10
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Supports late-arriving and out-of-order data
• Windowing
• Millisecond processing latency, no micro-batching
• At-least-once processing guarantees (exactly-once is in the works as we speak)
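What an at-least-once guarantee means in practice can be shown with a small simulation: progress is saved only from time to time, so after a crash the application resumes from the last saved position and may handle some records a second time. The code below is a toy model with invented names (`AtLeastOnceSketch`, `commitInterval`, `crashAt`); it does not reproduce how Kafka tracks or commits offsets internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of at-least-once processing with periodic commits.
final class AtLeastOnceSketch {
    private int committedOffset = 0;                          // last saved position
    private final List<String> processed = new ArrayList<>(); // visible side effects

    // Handles records starting at the saved position. If crashAt is reached,
    // the run stops without saving progress (crashAt < 0 means no crash).
    void run(List<String> log, int commitInterval, int crashAt) {
        int position = committedOffset;
        while (position < log.size()) {
            if (position == crashAt) {
                return; // simulated crash: unsaved progress is lost
            }
            processed.add(log.get(position));
            position++;
            if (position % commitInterval == 0) {
                committedOffset = position;
            }
        }
        committedOffset = position;
    }

    List<String> processed() {
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "c", "d", "e");
        AtLeastOnceSketch app = new AtLeastOnceSketch();
        app.run(log, 2, 3);  // crashes before record "d"; "c" was handled but not saved
        app.run(log, 2, -1); // restart resumes from the saved position
        System.out.println(app.processed()); // [a, b, c, c, d, e]
    }
}
```

No record is lost, but `c` appears twice; exactly-once semantics aim to remove that kind of duplicate.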

Page 68: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Roadmap Outlook

Page 69: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Roadmap outlook for Kafka Streams

Upcoming in Confluent 3.2 & Apache Kafka 0.10.2
• Sessionization aka “session windows” – e.g. for analyzing user browsing behavior
• Global KTables (vs. today’s partitioned KTables) – e.g. for convenient facts-to-dimensions joins
• Now you can use newer versions of the Streams API against older clusters, too
• Further operational metrics to improve monitoring and 24x7 operations of apps

Feature highlight for 2017
• Exactly-Once processing semantics
• But much more to come!

Page 70: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Wrapping Up

Page 71: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Where to go from here
• Kafka’s Streams API is available in Confluent Platform 3.1 and in Apache Kafka 0.10.1
  • http://www.confluent.io/download
• Demo applications: https://github.com/confluentinc/examples
  • Interactive Queries, Joins, Security, Windowing, Avro integration, …
• Confluent documentation: http://docs.confluent.io/current/streams/
  • Quickstart, Concepts, Architecture, Developer Guide, FAQ
• Recorded talks
  • Introduction to Kafka’s Streams API: http://www.youtube.com/watch?v=o7zSLNiTZbA
  • Application Development and Data in the Emerging World of Stream Processing (higher level talk): https://www.youtube.com/watch?v=JQnNHO5506w

Page 72: Introducing Apache Kafka's Streams API - Kafka meetup Munich, Jan 25 2017

Thank You