Apache Kafka Lesson Learned

Upload: guozhang-wang

Post on 21-Apr-2017

TRANSCRIPT

Page 1: Apache Kafka Lesson Learned

Guozhang Wang | Kafka Meetup Beijing, April 15, 2017

Apache Kafka Development Experience and Lesson Learned

Page 2: Apache Kafka Lesson Learned

A Short History of Kafka

Page 6: Apache Kafka Lesson Learned

A Short History

• 2010.10: First commit of Kafka

• 2011.07: Enters Apache Incubator

• Release 0.7.0: compression, mirror-maker

• 2012.10: Graduated to top-level project

• Release 0.8.0: intra-cluster replication

• 2014.11: Confluent founded

• Release 0.8.2: new producer, quota

• Release 0.9.0: Kafka Connect, new consumer, security

• Release 0.10.0: Kafka Streams, timestamps, rack awareness

Page 7: Apache Kafka Lesson Learned

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

Page 8: Apache Kafka Lesson Learned

Example: Pub-Sub Messaging

[Diagram labels: Tracking Logs / Metrics; Apache Kafka; Hadoop / DW]

Page 9: Apache Kafka Lesson Learned

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

a real-time data pipeline.. [Hadoop Summit 2013]

Page 10: Apache Kafka Lesson Learned

Example: Centralized Data Pipeline

[Diagram labels: KV-Store, Doc-Store, RDBMS, Tracking Logs / Metrics; Apache Kafka; Hadoop / DW, Monitoring, Rec. Engine, Social Graph, Searching, Security]

Page 11: Apache Kafka Lesson Learned

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

a real-time data pipeline.. [Hadoop Summit 2013]

a distributed and replicated log.. [VLDB 2015]

Page 12: Apache Kafka Lesson Learned

Example: Data Store Geo-Replication

[Diagram: Region 1 and Region 2, each with User Apps, Local Stores, and Apache Kafka. Edge labels: write, read, append log, mirroring, apply log.]

Page 13: Apache Kafka Lesson Learned

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

a real-time data pipeline.. [Hadoop Summit 2013]

a distributed and replicated log.. [VLDB 2015]

a unified data integration stack.. [CIDR 2015]

Page 14: Apache Kafka Lesson Learned

Example: Async. Micro-Services

Page 16: Apache Kafka Lesson Learned

What is Kafka, Really?

a scalable pub-sub messaging system.. [NetDB 2011]

a real-time data pipeline.. [Hadoop Summit 2013]

a distributed and replicated log.. [VLDB 2015]

a unified data integration stack.. [CIDR 2015]

All of them!

Page 17: Apache Kafka Lesson Learned

Kafka: Streaming Platform

• Publish / Subscribe
  • Move data around as online streams

• Store
  • “Source-of-truth” continuous data

• Process
  • React / process data in real-time
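The three roles above can be illustrated with a toy, in-memory log. This is only a sketch under simplifying assumptions: `Log` and `Consumer` are hypothetical names, there is a single partition, and nothing is persisted or replicated. It does not reflect the real Kafka client API.

```python
class Log:
    """Toy append-only log (hypothetical stand-in for one topic partition)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Publish: records are only ever appended; the offset is the position.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        # Store: old records stay readable, so any reader can start anywhere.
        return self._records[offset:offset + max_records]


class Consumer:
    """Each subscriber tracks its own offset, independent of other readers."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


log = Log()
for event in ["view:home", "click:ad", "view:cart"]:
    log.append(event)

a = Consumer(log)
b = Consumer(log)

# Process: consumer `a` reacts to the stream by filtering page views.
views = [e for e in a.poll() if e.startswith("view:")]
```

Because each consumer keeps its own offset, `b` still sees all three events after `a` has read them.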

Page 18: Apache Kafka Lesson Learned

How did we get here?

Page 19: Apache Kafka Lesson Learned

Lesson 1: Build evolvable systems

Page 20: Apache Kafka Lesson Learned

Upgrading Your Kafka Cluster is like ..

Page 21: Apache Kafka Lesson Learned

Kafka @ LI (LinkedIn)

• Release from trunk
  • Push frequency: daily

• Staging cluster
  • Full traffic of production
  • Full monitoring / alerting

• Production Ramp-up
  • Prepare to roll-back anytime

Page 22: Apache Kafka Lesson Learned

Kafka: Evolvable System

• Zero down-time
  • Maintenance outage? No such thing.

• All protocols versioned
  • Brokers can talk to older versioned clients
  • And vice versa since 0.10.2!

• One should do no more than rolling bounces
  • Staging before production
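One way to picture "all protocols versioned" is a per-API negotiation in which each side advertises the range of versions it supports and the highest common version is used. The sketch below assumes exactly that. The function name `negotiate`, the API names, and the version ranges are made up for illustration and are not taken from the Kafka code base.

```python
from typing import Dict, Optional, Tuple

VersionRange = Tuple[int, int]  # (min_version, max_version), inclusive


def negotiate(client: Dict[str, VersionRange],
              broker: Dict[str, VersionRange]) -> Dict[str, Optional[int]]:
    """Pick, per API, the highest version both sides support (None if none)."""
    chosen: Dict[str, Optional[int]] = {}
    for api, (c_min, c_max) in client.items():
        if api not in broker:
            chosen[api] = None  # the broker does not know this API at all
            continue
        b_min, b_max = broker[api]
        highest = min(c_max, b_max)
        # The ranges overlap only if the highest common version is still
        # at or above both minimums.
        chosen[api] = highest if highest >= max(c_min, b_min) else None
    return chosen


# Hypothetical ranges: a newer client talking to an older broker.
new_client = {"Produce": (0, 2), "Fetch": (0, 3), "CreateTopics": (0, 1)}
old_broker = {"Produce": (0, 1), "Fetch": (0, 1)}

result = negotiate(new_client, old_broker)
```

Here the newer client falls back to version 1 for `Produce` and `Fetch`, and learns that `CreateTopics` is unavailable, instead of failing outright.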

Page 23: Apache Kafka Lesson Learned

Server-Client Compatibility

[Diagram: servers and clients running mixed versions, labeled 0.10.0, 0.8.2, 0.10.2, 0.8.0, 0.9.0, 0.8.2, 0.10.1, 0.8.1]

Page 25: Apache Kafka Lesson Learned

Lesson 2: What gets measured gets fixed

Page 26: Apache Kafka Lesson Learned

Audit Trail @ LI

Page 28: Apache Kafka Lesson Learned

.. and a LOT of Them

Page 30: Apache Kafka Lesson Learned

Metrics Reporter

• JMXReporter
  • Included in AK: sensor / metrics -> mbean / attributes

• KafkaReporter
  • Send metrics back to Kafka!

• More..
  • Ganglia, Graphite, statsd ..
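The "sensor / metrics -> mbean / attributes" mapping can be sketched as follows: metrics that share a group and tag set become one MBean, and each metric name becomes an attribute on it. This is a simplified illustration rather than Kafka's reporter code. The helper names `mbean_name` and `to_mbeans`, the `prefix:type=group,tag=value` layout, and the sample metrics are assumptions of this sketch.

```python
def mbean_name(prefix, group, tags):
    """Build an MBean-style object name from a metric's group and tags."""
    parts = [f"{prefix}:type={group}"]
    for key, value in tags.items():
        if key and value:  # skip empty tag keys or values
            parts.append(f"{key}={value}")
    return ",".join(parts)


def to_mbeans(prefix, metrics):
    """Group (group, name, tags, value) tuples into {mbean: {attribute: value}}."""
    beans = {}
    for group, name, tags, value in metrics:
        beans.setdefault(mbean_name(prefix, group, tags), {})[name] = value
    return beans


# Hypothetical sample metrics for one producer client.
metrics = [
    ("producer-metrics", "record-send-rate", {"client-id": "app-1"}, 120.0),
    ("producer-metrics", "request-latency-avg", {"client-id": "app-1"}, 3.5),
]

beans = to_mbeans("kafka.producer", metrics)
```

Both sample metrics end up as attributes of a single bean, which is the name a monitoring system would then subscribe to.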

Page 31: Apache Kafka Lesson Learned

Confluent Enterprise: Delivery Tracking

Page 33: Apache Kafka Lesson Learned

Confluent Enterprise: Cluster Health

Page 34: Apache Kafka Lesson Learned

Lesson 3: APIs stay forever

Page 35: Apache Kafka Lesson Learned

The Story of KAFKA-1481

Hey, we should stop allowing dashes / underscores in MBean names since they are used in hostnames / topics / etc.

Makes sense, let’s do this!

DONE! (a few lines of key changes)

Page 38: Apache Kafka Lesson Learned

The Story of KAFKA-1481

OMG what happened?

We need to tell our SRE to change their monitoring metrics names now..

That’s N clusters in M data centers ..

Page 42: Apache Kafka Lesson Learned

Lesson 4: Services need gatekeepers

Page 46: Apache Kafka Lesson Learned

One naughty client can bother everyone ..

[Diagram: servers and clients running mixed versions, labeled 0.10.0, 0.8.2, 0.10.2, 0.8.0, 0.9.0, 0.8.2, 0.10.1, 0.8.1]

Page 47: Apache Kafka Lesson Learned

Multi-tenancy Services

• Security from ground up [Release 0.9.0+]
  • Authentication
  • Authorization

• Resources under control [Release 0.9.0+]
  • Quota on bytes rate / request rate on clientId / GroupId etc
  • Quota on CPU resources
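A byte-rate quota can be enforced by delaying a client long enough that its average rate drops back to the limit. The sketch below assumes a single fixed measurement window and a hypothetical helper `throttle_delay_ms`. It illustrates the idea only and is not the broker's actual quota logic.

```python
def throttle_delay_ms(bytes_in_window, window_ms, quota_bytes_per_sec):
    """Delay (ms) that brings a client's average rate back down to its quota."""
    observed = bytes_in_window / (window_ms / 1000.0)  # bytes per second
    if observed <= quota_bytes_per_sec:
        return 0.0
    # Stretch the window by the fraction by which the quota was exceeded.
    return (observed - quota_bytes_per_sec) / quota_bytes_per_sec * window_ms


# Hypothetical usage over a 1-second window with a 1 MB/s quota per client id.
usage = {"well-behaved": 500_000, "naughty": 3_000_000}
delays = {cid: throttle_delay_ms(sent, 1000, 1_000_000)
          for cid, sent in usage.items()}
```

Only the client that exceeded its quota is slowed down: 3 MB spread over the original 1 s plus a 2 s delay averages out to 1 MB/s, while the well-behaved client is not delayed at all.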

Page 48: Apache Kafka Lesson Learned

Lesson 5: Ecosystems are the key

Page 49: Apache Kafka Lesson Learned

Remember Hadoop-Producer/Consumer?

Page 50: Apache Kafka Lesson Learned

What Should Go into AK?

• Container Image?

• Kafka Manager GUI?

• REST Proxy APIs?

• Operation Tooling?

• Other Language Clients?

• Schema Registry?

• Hadoop / etc Integration?

• Stream Processing?

Page 51: Apache Kafka Lesson Learned

Layered Architecture, not Monolithic

[Diagram: layers labeled Messaging, Cross-DC, Data Schemas, Fast ETL, Search & Query, Processing, Consoles. Legend: Apache Kafka, Enterprise Offerings, OS Eco-Systems, Third-party Tools, Framework Impls.]

Page 53: Apache Kafka Lesson Learned

API, coding

“Full stack” evaluation

Operations, debugging, …

Simple is Beautiful

Page 57: Apache Kafka Lesson Learned

Take-aways

• Build evolvable systems

• What gets measured gets fixed

• APIs stay forever

• (Multi-tenant) Services need gatekeepers

• Ecosystems are the key

THANKS!

Guozhang Wang | [email protected] | @guozhangwang

Kafka Summit 2017 @ NYC & SF