Deep dive into Apache Kafka consumption

Uploaded by alexandre-tamborrino on 23-Feb-2017

TRANSCRIPT

Page 1: Deep dive into Apache Kafka consumption

Page 2: Goals

• Better understanding of Apache Kafka architecture and possible delivery guarantees

• The happy coding path towards fault-tolerant Kafka consumption using the Kafka Java client and Akka Stream

Page 3: Apache Kafka?

“Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log.”

Pages 4–6: [Diagrams: a Consumer polls messages from a Kafka topic]

Page 7: [Diagram: a Consumer polls messages; ordering is guaranteed per partition]

Page 8: [Diagram: the Consumer polls messages and commits its offsets, e.g. commit (Partition 0, Offset 5), (Partition 1, Offset 3), (Partition 2, Offset 10), to a commit storage (ZK, Kafka, …)]

Page 9: [Diagram: the Consumer keeps polling messages; its committed offsets live in the commit storage (ZK, Kafka, …)]

Page 10: [Diagram: on restart, the Consumer gets its last committed offsets, (Partition 0, Offset 5), (Partition 1, Offset 3), (Partition 2, Offset 10), from the commit storage and resumes polling from there]

Page 11: [Diagram: two consumers in the same consumer-group balance the partitions between them; Consumer 1 commits (Partition 0, Offset 5) and (Partition 1, Offset 3), Consumer 2 commits (Partition 2, Offset 10)]

Page 12: [Diagram: two consumers in different consumer-groups each consume all partitions (broadcast) and keep independent offsets; Consumer 1 commits (Partition 0, Offset 5), (Partition 1, Offset 3), (Partition 2, Offset 10), Consumer 2 commits (Partition 0, Offset 2), (Partition 1, Offset 1), (Partition 2, Offset 3)]

Pages 13–14: Delivery guarantees: commit before

Loop:

1. Get message
2. Commit offset
3. Begin message processing
4. End message processing

If a node failure, redeployment or processing failure hits after the offset is committed (step 2) but before processing finishes (step 4), the message is lost: at-most-once guarantee.

Page 15: Delivery guarantees: commit after

Loop:

1. Get message
2. Begin message processing
3. End message processing
4. Commit offset

If a node failure, redeployment or processing failure hits after the message is fetched but before its offset is committed (step 4), the message is processed again on restart: at-least-once guarantee.

Page 16: Delivery guarantees: auto-commit

Loop:

1. Get message
2. Begin message processing
3. End message processing

Offsets are committed automatically in the background, so a node failure, redeployment or processing failure can hit either before or after an auto-commit: the message may be lost OR processed twice. No guarantee.

Page 17: Delivery guarantees: exactly-once?

• At-least-once + idempotent message processing

• ex: update a key-value DB that stores the last state of a device

• At-least-once + atomic message processing and storage of offset

• ex: store offset + message in a SQL DB in a transaction, and use this DB as the main offset storage (a sketch follows)
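A minimal sketch of the second option, assuming a hypothetical JDBC DataSource and hypothetical results/offsets tables (none of these names come from the deck). The message's effect and its offset are written in one SQL transaction, so a crash can never leave the effect applied without the offset recorded, or vice versa:

    import java.sql.Connection
    import javax.sql.DataSource
    import org.apache.kafka.clients.consumer.ConsumerRecord

    val dataSource: DataSource = ??? // e.g. a connection pool (hypothetical)

    def processAtomically(record: ConsumerRecord[String, String]): Unit = {
      val conn: Connection = dataSource.getConnection
      try {
        conn.setAutoCommit(false)
        // 1. Apply the message's effect...
        val insert = conn.prepareStatement("INSERT INTO results (value) VALUES (?)")
        insert.setString(1, record.value())
        insert.executeUpdate()
        // 2. ...and record the offset in the SAME transaction.
        val upsert = conn.prepareStatement(
          "UPDATE offsets SET last_offset = ? WHERE topic = ? AND part = ?")
        upsert.setLong(1, record.offset())
        upsert.setString(2, record.topic())
        upsert.setInt(3, record.partition())
        upsert.executeUpdate()
        conn.commit() // both writes or neither: a replay cannot double-apply
      } catch {
        case e: Throwable => conn.rollback(); throw e
      } finally conn.close()
    }

On startup, the consumer would read the offsets table and seek() to the stored positions instead of relying on Kafka's commit storage.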

Page 18: How can I apply these concepts in my code?

Page 19: Kafka Java client: at-least-once
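The code on this slide did not survive the transcript. A minimal sketch of what at-least-once consumption with the Kafka Java client (0.9+) looks like, assuming a blocking processMsgBlocking function; server, group and topic names are illustrative:

    import java.util.Properties
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer

    def processMsgBlocking(message: String): Unit = ??? // your blocking processing (hypothetical)

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // illustrative
    props.put("group.id", "my-group")                // illustrative
    props.put("enable.auto.commit", "false")         // we commit manually, after processing
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(List("my-topic").asJava)

    while (true) {
      val records = consumer.poll(1000)
      records.asScala.foreach(record => processMsgBlocking(record.value()))
      consumer.commitSync() // commit AFTER processing: at-least-once
    }

Auto-commit is disabled and commitSync() runs only after every polled record has been processed, which is exactly the "commit after" loop from page 15.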

Page 20: Async non-blocking?

• In a Reactive/Scala world, message processing is usually asynchronous (a non-blocking IO call to a DB, an ask to an Akka actor, …): def processMsg(message: String): Future[Result]

• How do you process your Kafka messages while staying reactive (i.e. not blocking threads)?

Page 21: Kafka Java client: async non-blocking?
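This slide's code is also missing from the transcript. A deliberately naive sketch of the tempting approach, reusing the consumer from the loop above and the processMsg signature from the previous page; do not copy this, it is the anti-pattern the next slide takes apart:

    // ANTI-PATTERN sketch: async processing, commit from the callback.
    import scala.concurrent.ExecutionContext.Implicits.global

    while (true) {
      val records = consumer.poll(1000)
      records.asScala.foreach { record =>
        processMsg(record.value()).foreach { _ =>
          // Runs later, on another thread: commits can land out of order
          // (offset N before N-1), and nothing bounds the in-flight
          // messages. (KafkaConsumer is also not thread-safe, so this
          // call from a Future callback is itself illegal.)
          consumer.commitAsync()
        }
      }
    }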

Page 22: Kafka Java client: async non-blocking?

• Out-of-order processing!

• No guarantee anymore! (offset N can be committed before N-1, “shadowing” N-1)

• Unbounded number of messages in memory: if the Kafka message rate > processing speed, this can lead to an Out Of Memory error

Pages 23–24: What do we need?

Ordered asynchronous stream processing with back pressure

ENTER REACTIVE STREAMS

Page 25: Reactive Streams

• “Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure.”

• Backed by Netflix, Pivotal, Red Hat, Twitter, Lightbend (Typesafe), …

• Implementations: RxJava, Akka Stream, Reactor, …

Page 26: Akka Stream

• Stream processing abstraction on top of Akka Actors

• Types! Types are back!

• Source[A] ~> Flow[A, B] ~> Sink[B] (see the toy example after this list)

• Automatic back pressure
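A toy example of that shape, assuming nothing beyond akka-stream itself (pre-2.6 style, as in 2017):

    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.{Flow, Sink, Source}

    implicit val system = ActorSystem("demo")
    implicit val materializer = ActorMaterializer()

    val source = Source(1 to 100)              // Source[Int]
    val flow   = Flow[Int].map(_.toString)     // Flow[Int, String]
    val sink   = Sink.foreach[String](println) // Sink[String]

    source.via(flow).runWith(sink) // back pressure is wired in automatically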

Page 27: Reactive Kafka

• Akka Stream client for Kafka

• On top of Kafka Java client 0.9+

• https://github.com/akka/reactive-kafka

Page 28: Reactive Kafka
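The slide's code is not in the transcript. A minimal sketch of at-least-once consumption with Reactive Kafka, using the 0.x-era akka-stream-kafka API; Result, processMsg, server, group and topic names are illustrative:

    import akka.actor.ActorSystem
    import akka.kafka.{ConsumerSettings, Subscriptions}
    import akka.kafka.scaladsl.Consumer
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.Sink
    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import scala.concurrent.Future

    implicit val system = ActorSystem("consumer")
    implicit val materializer = ActorMaterializer()
    import system.dispatcher

    case class Result()                                   // illustrative
    def processMsg(message: String): Future[Result] = ??? // your async processing

    val consumerSettings =
      ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
        .withBootstrapServers("localhost:9092") // illustrative
        .withGroupId("my-group")                // illustrative
        .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

    Consumer.committableSource(consumerSettings, Subscriptions.topics("my-topic"))
      .mapAsync(1) { msg =>
        processMsg(msg.record.value()).map(_ => msg.committableOffset)
      }
      .mapAsync(1)(offset => offset.commitScaladsl()) // commit AFTER processing
      .runWith(Sink.ignore)

mapAsync with parallelism 1 preserves ordering and propagates back pressure, and the offset is committed only after processMsg completes.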

Page 29: Reactive Kafka

• At-least-once semantics in case of node failure / redeployment

• Asynchronous processing without blocking any thread

• Back pressure

• Ordered processing

• But what if the processMsg function fails?

Page 30: The difference between Error and Failure

• Error: something went wrong, and it is deterministic (it will happen again if you make the same call). ex: HTTP 4xx, deserialisation exception, duplicate-key DB error

• Failure: something went wrong, and it is not deterministic (it may not happen again if you make the same call). ex: HTTP 5xx, network exception

Page 31: Error and Failure in Scala code using Scalactic

Future[Result Or Every[Error]]

The Future can contain a Failure; the Every[Error] can contain one or more Errors.
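A small sketch of the type in practice; DeviceState and ProcessingError are illustrative names standing in for the talk's Result and Error:

    import org.scalactic.{Bad, Every, Good, One, Or}
    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global

    case class DeviceState(id: String, value: Int) // illustrative Result
    case class ProcessingError(reason: String)     // illustrative Error

    // The Future captures Failures (network exception, HTTP 5xx, ...);
    // the Or captures deterministic Errors (bad payload, HTTP 4xx, ...).
    def processMsg(message: String): Future[DeviceState Or Every[ProcessingError]] =
      Future {
        if (message.isEmpty) Bad(One(ProcessingError("empty payload")))
        else Good(DeviceState("device-1", message.length))
      }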

Page 32: Error and Failure in Scala code (non-async)

Try[Result Or Every[Error]]

The Try can contain a Failure; the Every[Error] can contain one or more Errors.

Page 33: Fault-tolerant consumption with Reactive Kafka
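Another code slide lost in the transcript. A sketch of one way to get there, reusing processMsg, DeviceState and ProcessingError from the Scalactic sketch and consumerSettings from the Reactive Kafka sketch (retryWithDelay is an illustrative helper): Failures are retried with a delay until they go away, deliberately blocking subsequent messages (see the next page), while deterministic Errors are logged and skipped, and the offset is committed only afterwards:

    import akka.pattern.after
    import org.scalactic.{Bad, Every, Good, Or}
    import scala.concurrent.Future
    import scala.concurrent.duration._

    // Retry only Failures: the Future itself failed (network exception, ...).
    // Deterministic Errors come back as Bad(...) and are NOT retried.
    def retryWithDelay(message: String): Future[DeviceState Or Every[ProcessingError]] =
      processMsg(message).recoverWith { case failure =>
        system.log.warning("Processing failed ({}), retrying in 1s", failure.getMessage)
        after(1.second, system.scheduler)(retryWithDelay(message))
      }

    Consumer.committableSource(consumerSettings, Subscriptions.topics("my-topic"))
      .mapAsync(1) { msg =>
        retryWithDelay(msg.record.value()).map {
          case Good(_) => msg.committableOffset
          case Bad(errors) =>
            system.log.error("Skipping message, deterministic errors: {}", errors)
            msg.committableOffset // Error (not Failure): skip the message but still commit
        }
      }
      .mapAsync(1)(_.commitScaladsl())
      .runWith(Sink.ignore)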

Page 34: Keeping message ordering even in failure cases

• Retrying message processing upon failures will block the processing of subsequent messages, but that is fine if message processing is homogeneous

• ex: if processMsg of msg N hits a network failure calling a DB (say ELS), there is a high probability that processMsg of msg N+1 will hit the same failure, so blocking is fine, and even preferable to avoid losing messages to transient failures

• If message processing is heterogeneous (calling different external systems depending on the msg), it is better to use different consumer-groups and/or different topics

Page 35: [Diagram: a Consumer polls messages from several partitions]

Reminder: Kafka guarantees ordering only per partition.

If #(consumer instances) < #(Kafka partitions), at least one consumer instance will process two or more partitions.

Page 36: Parallel processing between partitions while keeping ordering per partition
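Yet another code slide that did not survive the transcript. A sketch using Reactive Kafka's committablePartitionedSource, reusing the settings and processMsg from the sketches above (maxPartitions is an illustrative bound): each partition becomes its own sub-stream that is processed in order, while the sub-streams run in parallel:

    val maxPartitions = 64 // illustrative upper bound on concurrent sub-streams

    Consumer.committablePartitionedSource(consumerSettings, Subscriptions.topics("my-topic"))
      .flatMapMerge(maxPartitions, { case (topicPartition, partitionSource) =>
        // Per-partition sub-stream: ordered processing, commit after processing.
        partitionSource
          .mapAsync(1)(msg => processMsg(msg.record.value()).map(_ => msg.committableOffset))
          .mapAsync(1)(_.commitScaladsl())
      })
      .runWith(Sink.ignore)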

Page 37: Bonus: auto-adaptive micro-batching windows per partition based on the back pressure signal

Dynamic trade-off between latency and throughput!
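One way to sketch this is Akka Stream's batch stage together with akka-stream-kafka's CommittableOffsetBatch helper (the max of 100 is illustrative): offsets accumulate into a batch only while the downstream commit step is busy, so batches stay small when the commit sink keeps up (low latency) and grow when it lags (high throughput):

    import akka.kafka.ConsumerMessage.CommittableOffsetBatch

    Consumer.committableSource(consumerSettings, Subscriptions.topics("my-topic"))
      .mapAsync(1)(msg => processMsg(msg.record.value()).map(_ => msg.committableOffset))
      .batch(max = 100, // illustrative cap on batch size
             seed = first => CommittableOffsetBatch.empty.updated(first)) {
        (batch, offset) => batch.updated(offset) // grows only while downstream is busy
      }
      .mapAsync(1)(batch => batch.commitScaladsl()) // one commit per micro-batch
      .runWith(Sink.ignore)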

Page 38: Conclusion

• Apache Kafka as a system is scalable and fault-tolerant, but fault-tolerant consumption can be tricky

• With the right concepts and the right tools, however, we can make Kafka consumption fault-tolerant very easily (i.e. with a few lines of extra code)

Page 39: Thank you!

Questions?