no data loss pipeline with apache kafka

Post on 16-Apr-2017

8.817 Views

Category:

Technology

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

No Data Loss Pipeline with

Apache KafkaJiangjie (Becket) Qin @ LinkedIn

● Data losso producer.send(record) is called but

record did not end up in consumer as expected

● Message reorderingo send(record1) is called before

send(record2)o record2 shows in broker before record1

doeso matters in cases like DB replication

Data loss and message reordering

Kafka based data pipelineKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Today’s Agenda:● No data loss● No message reordering● Mirror maker enhancement

○ Customized consumer rebalance listener○ Message handler

Synchronous send is safe but slow...

producer.send(record).get()

ProducerKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Using asynchronous send with callback can be trickyproducer.send(record,callback)

ProducerKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Producer can cause data loss when● block.on.buffer.full=false● retries are exhausted● sending message without using acks=all

ProducerKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Is this good enough?producer.send(record,callback)● block.on.buffer.full=TRUE● retries=Long.MAX_VALUE● acks=all● resend in callback when message send failed

ProducerKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Message reordering might happen if:● max.in.flight.requests.per.connection > 1, AND ● retries are enabled

ProducerKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Kafka BrokerProducer

message 0

message 1

message 0 failed

retry message 0

Timeline

Message reordering might also happen if:● producer is closed carelessly

o close producer in user thread, oro close without using close(0)

Producer

RecordAccumulator

Sender Thread

Kafka BrokerTimeline

1.msg 02.callback(msg 0) ack expt.

User Thread

close prod.

3.msg 1

notify

● close producer in the callback on error● close producer with close(0) to prevent further

sending after previous message send failed

Producer

RecordAccumulator

Sender Thread

Kafka BrokerTimeline

1.msg 02.callback(msg 0) ack expt.

User Thread

close(0)

notify

To prevent data loss:● block.on.buffer.full=TRUE● retries=Long.MAX_VALUE (for some use cases)● acks=allTo prevent reordering:● max.in.flight.requests.per.connection=1● close producer in callback with close(0) on send

failure

ProducerKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Not a perfect solution:● Producer needs to be closed to guarantee

message order. E.g. In mirror maker, one message send failure to a topic should not affect the whole pipeline.

● When producer is down, message in buffer will still be lost

ProducerKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Correct producer setting is not enough● acks=all still can lose data when

unclean leader election happens.● Two replicas are needed at any time

to guarantee data persistence.

Kafka BrokersKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

● replication factor >= 3● min.isr = 2● Replication factor > min.isr

o If replication factor = min.isr, partition will be offline when one replica is down

Kafka BrokersKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Settings we use:● replication factor = 3● min.isr = 2● unclean leader election disabled

Kafka BrokersKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

● Consumer might lose message when offsets are committed carelessly. E.g. commit offsets before processing messages completelyo Disable auto.offset.commito Commit offsets only after the messages are

processed

ConsumerKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Kafka based data pipelineKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Today’s Agenda:● No data loss● No message reordering● Mirror maker enhancement

○ Customized consumer rebalance listener○ Message handler

● Consume-then-produce pattern● Only commit consumer offsets of

messages acked by target cluster● Default to no-data-loss and no-

reordering settings

Mirror Maker EnhancementKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

● Customized Consumer Rebalance Listenero Can be used to propagate topic change

from source cluster to target cluster. E.g. partition number change, new topic creation.

Mirror Maker EnhancementKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

● Customized Message Handler, useful foro partition-to-partition mirroro filtering out messageso message format conversiono other simple message processing

Mirror Maker EnhancementKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

● startup/shutdown accelerationo parallelized startup and shutdowno 26 nodes cluster with 4 consumer each

takes about 1 min to startup and shutdown

Mirror Maker EnhancementKafka Cluster

(Colo 1)Producer Kafka Cluster(Colo 2) ConsumerMirror Maker

Q&A

top related