Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Upload: suneet-grover

Post on 15-Apr-2017

Page 1: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

© 2016 24/7 CUSTOMER, INC.

Apache Kafka Bay Area September Meetup - 24/7 CUSTOMER, INC.

Our Kafka journey to 0.10

Engineering Manager - Big Data Platform

Suneet Grover

Page 2: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

About [24]7

Page 3: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Today’s engagement is not driving successful moments

Q&A

IVR

Page 4: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Smart Customer Engagement

• Data-Driven: Reflecting All Available Data
• Predictive: Real-time Decisions
• Omni-channel: Across Digital & Voice
• Personalized: User Experience

Video of [24]7 in action available at http://player.vimeo.com/video/85280070

Page 5: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

They moved from

Channel-centric engagement
• Reacting to consumer behavior
• Disconnected, fragmented channels
• Too many failed experiences

to

Intent-driven engagement
• Anticipate consumer intent
• Holistic experience across channels
• Delivering the right moments

Page 6: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

[24]7 by the numbers

• 1.2b smart speech calls/year
• 127m virtual agent inquiries/year
• 30m agent chats/year
• 341m web visitors/month
• 5000+ digital chat agents (#1 WW)
• 70+ data scientists (most in industry)
• 100+ patents
• 300+ software engineers & designers

Page 7: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Agenda

• Kafka at [24]7
• Challenges and Learnings
• Transparency and Resiliency
• Upgrade path
• Configurations that worked for us
• Design for multiple data centers
• Our Kafka wish list

Page 8: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Kafka at [24]7

Page 9: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

• Intent Prediction
• Data Analytics
• Business Intelligence

Page 10: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Our Kafka Timeline

2013: Kafka 0.7
• Broker Partitions
• Less visibility

2014: Kafka 0.7 & 0.8
• Sticky partitions
• Range based MMs
• Migration procs
• Bugs

Apr 2016: Kafka 0.8.2.2
• Non-sticky partitions
• Round Robin MM
• Easier to Manage
• Fewer Issues

Aug 2016: Kafka 0.10.0.1
• Easy Upgrade
• Better client APIs
• Stable so far

Page 11: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

A few months ago …

Page 12: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Our setup

[Diagram: four clusters across two data centers (DC1 - 0.7, DC2 - 0.7, DC1 - 0.8, DC2 - 0.8). Arrow and group labels: Topics X, Topics All - X, Mirroring, Migration.]

Page 13: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Challenges with Kafka 0.8

• Broker partition stickiness
  • Cannot move clusters
  • No elasticity
• ZK load and latencies
• Range based mirror-maker algorithm
• Stale topics deletion

Page 14: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Experience with Network issues

• DNS issue causing runtime issue at ZKClient
• Connectivity issues leading to controller re-elections
• Conflict errors in mirror-makers
• Socket leaks leading to open file descriptors

Page 15: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Experience with Kafka 0.8 and upgrade

• Mismatch in Kafka vs zookeeper state
  • Producers could see certain partitions but consumers couldn’t
  • We added the same partitions back to the cluster
• Leader-Replica-ISR mismatch
  • We restarted the controller broker
• Broker not allowed into cluster
  • Controller task queue went into invalid state - KAFKA-2300
• Repeated Kafka controller switching
• Data loss due to fewer replicas

Page 16: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Learnings

• It works to delete the “/controller” node from zookeeper
• Always do a clean shutdown and restart of brokers
• Some issues are not always visible as errors or warnings
• Run ZK on SSD
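The first learning can be sketched in a few lines. This is an illustration under assumptions, not the speaker's tooling: `force_controller_reelection` and the in-memory `FakeZk` are hypothetical names, and the client is only assumed to offer kazoo-style `exists`, `get` and `delete` methods. Deleting the `/controller` znode prompts the brokers to elect a new controller, so the sketch records which broker held the role before removing it.

```python
import json


def force_controller_reelection(zk, path="/controller"):
    """Delete the controller znode and return the broker id that held it.

    `zk` is any client with kazoo-style exists/get/delete methods.
    Returns None when no controller znode is present.
    """
    if not zk.exists(path):
        return None
    data, _stat = zk.get(path)
    # Kafka stores JSON such as {"version":1,"brokerid":3,"timestamp":"..."}
    broker_id = json.loads(data.decode("utf-8"))["brokerid"]
    zk.delete(path)
    return broker_id


class FakeZk:
    """Hypothetical in-memory stand-in so the sketch runs without ZooKeeper."""

    def __init__(self, nodes):
        self.nodes = dict(nodes)

    def exists(self, path):
        return path in self.nodes

    def get(self, path):
        return self.nodes[path], None

    def delete(self, path):
        del self.nodes[path]


zk = FakeZk({"/controller": b'{"version":1,"brokerid":3,"timestamp":"0"}'})
old_controller = force_controller_reelection(zk)
print(old_controller, zk.exists("/controller"))
```

Against a real ensemble the same function could be handed a connected client instead of the stand-in. It is a last-resort step, which fits the slide's advice to prefer clean shutdowns and restarts.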

Page 17: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Upgrade Path

Page 18: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Path we took

[Diagram: upgrade options starting from Kafka 0.8. Labels: In-place 0.9, New cluster 0.8.2.2, In-place 0.8.2.2, New cluster 0.9.]

Page 19: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Our Upgrade to 0.8.2.2

• Shut down the 0.7 pipeline
• Tried an in-place upgrade from 0.8.0 to 0.8.2.2
• Succeeded by moving to a separate 0.8.2.2 cluster
• Added a lot more monitors for resiliency

Page 20: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Upgrade to Kafka 0.10.0.1

• Separated mirror makers from brokers
• Only the brokers upgraded to 0.10.0.1
• In-place upgrade worked very well
• Found an issue with the mirror-maker 0.10.0.1
• Yet to change the message format, upgrade clients etc.

Page 21: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Configurations that worked for us

Page 22: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Broker configurations

• default.replication.factor = 3
• num.partitions = 2
• delete.topic.enable = true
• auto.leader.rebalance.enable = true
• controlled.shutdown.enable = true
• queued.max.requests = 1000
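As a small sketch of how the broker settings from this slide might be kept in one place and rendered for a `server.properties` file: the keys and values are copied from the slide, while the helper name `to_properties` and the idea of generating the file from a dict are assumptions made for illustration.

```python
# Values copied from the slide; everything else here is illustrative.
BROKER_SETTINGS = {
    "default.replication.factor": 3,
    "num.partitions": 2,
    "delete.topic.enable": True,
    "auto.leader.rebalance.enable": True,
    "controlled.shutdown.enable": True,
    "queued.max.requests": 1000,
}


def to_properties(settings):
    """Render a dict as Java-properties lines (booleans become true/false)."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return "\n".join(lines)


properties_text = to_properties(BROKER_SETTINGS)
print(properties_text)
```

Note that a default replication factor of 3 only works for automatically created topics when at least three brokers are available.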

Page 23: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Upgrade specific configurations

• inter.broker.protocol.version = 0.10.0.1
• message.format.version = 0.8.2.2
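The two settings on this slide describe a cluster partway through a rolling upgrade: brokers already speak the new inter-broker protocol, but messages are still written in the old format. The sketch below lays out the usual three stages and checks which one the slide corresponds to. The stage list and the helper `current_stage` are assumptions for illustration. The key names are copied from the slide; in the Kafka broker documentation the message format setting appears as `log.message.format.version`, so check the exact key for your version.

```python
OLD_VERSION = "0.8.2.2"
NEW_VERSION = "0.10.0.1"

# Assumed stage order for a rolling upgrade; each stage is applied with
# one rolling restart of the brokers.
STAGES = [
    # 1. New binaries, still using the old protocol and message format.
    {"inter.broker.protocol.version": OLD_VERSION,
     "message.format.version": OLD_VERSION},
    # 2. All brokers upgraded: switch the inter-broker protocol.
    {"inter.broker.protocol.version": NEW_VERSION,
     "message.format.version": OLD_VERSION},
    # 3. Clients upgraded: switch the message format as well.
    {"inter.broker.protocol.version": NEW_VERSION,
     "message.format.version": NEW_VERSION},
]


def current_stage(settings):
    """Return the 1-based stage whose settings match, or None."""
    for number, stage in enumerate(STAGES, start=1):
        if all(settings.get(key) == value for key, value in stage.items()):
            return number
    return None


slide_settings = {
    "inter.broker.protocol.version": "0.10.0.1",
    "message.format.version": "0.8.2.2",
}
stage = current_stage(slide_settings)
print(stage)
```

Stage 2 matches the earlier remark that the message format change and client upgrades were still pending.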

Page 24: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Transparency and Resiliency

Page 25: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Metrics flow

[Diagram labels: Kafka Broker, Metrics Reporter, Kafka MM, JMXTrans, Zookeeper, Graphite, Grafana, Host level Metrics & Alerts, Lag monitor, ELK.]

Page 26: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Essential Broker Metrics

• Disk, CPU and throughput utilization
• Ingress, egress volume per broker and topic
• Active controller count
• Offline partitions
• Under replicated partitions
• Partitions per broker
• Log flush rate

Page 27: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Basic Alerts

• Disk, CPU utilization
• Open file handles
• Controller count
• Controller re-elections
• Under replicated partitions
• Offline partitions
• Stuck pending commands in zookeeper
• Conflicts in mirror-makers
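The alert list above maps naturally onto a small rule check over a metrics snapshot. The following is a minimal sketch, assuming a flat dict of already-collected values; the field names, the thresholds in `DEFAULT_LIMITS` and the function `evaluate_alerts` are all hypothetical and would need tuning for a real cluster.

```python
# Hypothetical ceilings for resource-style alerts.
DEFAULT_LIMITS = {
    "disk_used_pct": 80.0,
    "cpu_used_pct": 85.0,
    "open_file_handles": 50_000,
}

# Counters that should stay at zero in a healthy cluster.
ZERO_EXPECTED = (
    "controller_reelections",
    "under_replicated_partitions",
    "offline_partitions",
    "stuck_zk_commands",
    "mirror_maker_conflicts",
)


def evaluate_alerts(snapshot, limits=DEFAULT_LIMITS):
    """Return the names of the alerts that should fire for one snapshot."""
    alerts = []
    # Resource alerts fire when usage exceeds the configured ceiling.
    for name, ceiling in limits.items():
        if snapshot[name] > ceiling:
            alerts.append(name)
    # Exactly one broker in the cluster should report itself as controller.
    if snapshot["active_controller_count"] != 1:
        alerts.append("active_controller_count")
    for name in ZERO_EXPECTED:
        if snapshot[name] > 0:
            alerts.append(name)
    return alerts


healthy = {
    "disk_used_pct": 55.0, "cpu_used_pct": 40.0, "open_file_handles": 9_000,
    "active_controller_count": 1, "controller_reelections": 0,
    "under_replicated_partitions": 0, "offline_partitions": 0,
    "stuck_zk_commands": 0, "mirror_maker_conflicts": 0,
}
degraded = dict(healthy, disk_used_pct=91.0, active_controller_count=2,
                under_replicated_partitions=4)

healthy_alerts = evaluate_alerts(healthy)
degraded_alerts = evaluate_alerts(degraded)
print(healthy_alerts, degraded_alerts)
```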

Page 28: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

JMXTrans

• Push mirror-maker metrics to graphite
  • Throughput per topic, per thread, per instance etc.
  • WaitOnTake, WaitOnPut
• Push zookeeper metrics to graphite
  • Latency, quorum, connections etc.
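For readers unfamiliar with what "pushing to graphite" involves, here is a minimal sketch of Graphite's plaintext protocol, in which each metric is one line of path, value and Unix timestamp. This is not how JMXTrans itself is configured; the metric path and the helper names `graphite_line` and `send_lines` are hypothetical, and port 2003 is assumed to be the plaintext listener.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as 'path value timestamp' terminated by a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_lines(lines, host, port=2003):
    """Send pre-formatted lines to a Graphite plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Hypothetical metric path for mirror-maker throughput on one topic.
line = graphite_line("kafka.mm.dc1.topicA.throughput", 1250.5, 1472688000)
print(line, end="")
```

`send_lines` is defined but not called here, since it needs a reachable Graphite host.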

Page 29: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Data Lag Monitoring

• Measures the event level time delay
• Monitors data latencies per cluster, per topic, per partition
• Latencies between multiple steps in Kafka pipeline
• Optimize and configure sampling ratio
• Supports multiple message formats (JSON, Avro, etc.)
• Alerts based on pre-defined thresholds
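The lag monitor described above can be sketched as follows. This is a simplified illustration, not the team's implementation: it assumes JSON messages carrying a numeric `event_time` field (a hypothetical name), handles no Avro, and tracks the worst observed delay per topic and partition before comparing it with a threshold.

```python
import json
import random


def event_lag_seconds(raw_message, received_at, timestamp_field="event_time"):
    """Delay between the event's own timestamp and the time it was seen."""
    event = json.loads(raw_message)
    return received_at - event[timestamp_field]


def monitor_lag(messages, threshold_seconds, sampling_ratio=1.0, rng=random.random):
    """Return {(topic, partition): worst lag} for entries above the threshold.

    `messages` yields (topic, partition, raw_json, received_at) tuples.
    Only a `sampling_ratio` fraction of messages is inspected.
    """
    worst = {}
    for topic, partition, raw, received_at in messages:
        if rng() >= sampling_ratio:
            continue  # message not selected by the sampler
        lag = event_lag_seconds(raw, received_at)
        key = (topic, partition)
        worst[key] = max(lag, worst.get(key, lag))
    return {key: lag for key, lag in worst.items() if lag > threshold_seconds}


sample = [
    ("clicks", 0, '{"event_time": 100}', 103),
    ("clicks", 1, '{"event_time": 100}', 190),
    ("chats", 0, '{"event_time": 200}', 205),
]
breaches = monitor_lag(sample, threshold_seconds=60)
print(breaches)
```

Lowering `sampling_ratio` trades accuracy for less parsing work, which is presumably the tuning the slide refers to.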

Page 30: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Indicative Broker Metrics

• Request Metrics
  • Local Time
  • Remote Time
  • Queue Time
• Request Handler Idle Percent
• Network Processor Idle Percent

Page 31: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Now some demo

Page 32: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Design for Multiple Data Centers

Page 33: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Range Based Mirror Makers

[Chart: Skewed Partition Assignment. Num Partitions per consumer (Consumer 1 to Consumer 4) on a log axis from 1 to 1000; bar labels 1000, 181, 14, 5.]

Page 34: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Round Robin Mirror Makers

[Chart: Uniform Partition Assignment. Num Partitions per consumer (Consumer 1 to Consumer 4) on a linear axis from 0 to 350.]
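The contrast between the two strategies can be reproduced with a simplified simulation. The sketch below is not Kafka's assignor code: it counts partitions per consumer for a per-topic range split versus a single round-robin pass over all topic-partitions. The workload of 100 topics with 2 partitions each and the consumer names are made up for illustration.

```python
def range_assign(topics, consumers):
    """Per-topic range split: the first consumers absorb each topic's remainder."""
    counts = {consumer: 0 for consumer in consumers}
    ordered = sorted(consumers)
    for _topic, num_partitions in sorted(topics.items()):
        base, extra = divmod(num_partitions, len(ordered))
        for index, consumer in enumerate(ordered):
            counts[consumer] += base + (1 if index < extra else 0)
    return counts


def round_robin_assign(topics, consumers):
    """All topic-partitions go into one list and are dealt out in turn."""
    counts = {consumer: 0 for consumer in consumers}
    ordered = sorted(consumers)
    all_partitions = [
        (topic, partition)
        for topic, num_partitions in sorted(topics.items())
        for partition in range(num_partitions)
    ]
    for index, _tp in enumerate(all_partitions):
        counts[ordered[index % len(ordered)]] += 1
    return counts


# Hypothetical workload: many small topics, four mirror-maker consumers.
topics = {f"topic-{i:03d}": 2 for i in range(100)}
consumers = ["consumer-1", "consumer-2", "consumer-3", "consumer-4"]

range_counts = range_assign(topics, consumers)
round_robin_counts = round_robin_assign(topics, consumers)
print(range_counts)
print(round_robin_counts)
```

With many topics that have fewer partitions than there are consumers, the range split keeps handing work to the same first consumers, while round robin spreads the same 200 partitions evenly.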

Page 35: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Mirror-maker fine tuning

• Round Robin works better than Range based in most cases
• Spread out the topics in multiple MM consumer groups
  • If you have a few large volume topics
• Negative regex works with whitelist parameter
• Doesn’t help to have too many MM consumer threads
• Tune socket buffer size (doesn’t apply unless OS allows)
  • MM - socket.receive.buffer.bytes = 1048576
  • Broker - socket.send.buffer.bytes = 1048576
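The "negative regex" tip refers to excluding topics through a negative lookahead in the whitelist pattern. A minimal sketch follows, using Python's `re` only for illustration: MirrorMaker evaluates the whitelist as a Java regex, and the lookahead syntax shown is the same in both. The pattern, the topic names and the helper `mirrored_topics` are hypothetical.

```python
import re

# Hypothetical whitelist: every topic except those starting with test_ or tmp_.
WHITELIST = r"(?!test_|tmp_).*"


def mirrored_topics(topics, whitelist=WHITELIST):
    """Return the topics whose full name matches the whitelist pattern."""
    pattern = re.compile(whitelist)
    return [topic for topic in topics if pattern.fullmatch(topic)]


selected = mirrored_topics(["clicks", "test_clicks", "chats", "tmp_dump"])
print(selected)
```

This gives blacklist-like behavior using only the whitelist parameter.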

Page 36: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

• Critical to our data pipeline
• Carries data reliably across DCs
• Easy to manage and operate
• Good monitoring capabilities

Kafka to our components is like arteries to a body

Page 37: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Our Kafka wish list

Page 38: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

It would be great to have

• Partition assignment based on volume in brokers and MMs
• Blacklisting and whitelisting capabilities in mirror-makers
• Rolling restarts of the brokers
• Auto cleaning stale topics and partitions
• Catching uneven topics with skewed data spread (bad producers)
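To make the first wish concrete, here is one way volume-aware assignment could look. It is a sketch under assumptions, not an existing Kafka feature: a greedy pass hands the heaviest remaining partition to the currently least-loaded consumer. The volumes, partition names, consumer names and the function `assign_by_volume` are invented for the example.

```python
import heapq


def assign_by_volume(partition_volumes, consumers):
    """Greedy balance: heaviest partition first, to the least-loaded consumer."""
    heap = [(0, consumer) for consumer in sorted(consumers)]
    heapq.heapify(heap)
    assignment = {consumer: [] for consumer in consumers}
    # Sort by descending volume, breaking ties by partition name.
    ordered = sorted(partition_volumes.items(), key=lambda kv: (-kv[1], kv[0]))
    for partition, volume in ordered:
        load, consumer = heapq.heappop(heap)
        assignment[consumer].append(partition)
        heapq.heappush(heap, (load + volume, consumer))
    return assignment


# Hypothetical per-partition volumes (for example, messages per second).
volumes = {
    "big-0": 900, "big-1": 850,
    "small-0": 50, "small-1": 40, "small-2": 30, "small-3": 20,
}
assignment = assign_by_volume(volumes, ["mm-1", "mm-2"])
loads = {
    consumer: sum(volumes[p] for p in partitions)
    for consumer, partitions in assignment.items()
}
print(assignment)
print(loads)
```

A count-based assignor would give each consumer three partitions regardless of size; weighting by volume keeps the two loads within a few percent of each other in this example.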

Page 39: Apache Kafka Bay Area Sep Meetup - 24/7 Customer, Inc

Q & A
