Transcript
Page 1: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Apache Kafka at LinkedIn

Page 2: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

About Me

2

Page 3: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Agenda

3

• Overview of Kafka

• Kafka Design

• Kafka Usage at LinkedIn

• Roadmap

• Q & A

Page 4: Apache Kafka at LinkedIn

Why We Build Kafka?

Page 5: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

We Have a lot of Data

5

• User activity tracking

• Page views, ad impressions, etc

• Server logs and metrics

• Syslogs, request-rates, etc

• Messaging

• Emails, news feeds, etc

• Computation derived

• Results of Hadoop / data warehousing, etc

Page 6: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

.. and We Build Products on Data

6

Page 7: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Newsfeed

7

Page 8: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Recommendation

8HADOOP SUMMIT 2013

People you may know

Page 9: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Recommendation

9

Page 10: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Search

10

Page 11: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Metrics and Monitoring

11

HADOOP SUMMIT 2013

System and application metrics/logging

LinkedIn Corporation ©2013 All Rights Reserved 5

Page 12: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

.. and a LOT of Monitoring

12

Page 13: Apache Kafka at LinkedIn

The Problem:

How to integrate this variety of data

and make it available to all products?

Page 14: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 14

Life back in 2010:

Point-to-Point Pipeplines

Page 15: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 15

Example: User Activity Data Flow

Page 16: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 16

What We Want

• A centralized data pipeline

Page 17: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 17

Apache Kafka

We tried some systems off-

the-shelf, but…

Page 18: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 18

What We REALLY Want

• A centralized data pipeline that is

• Elastically scalable

• Durable

• High-throughput

• Easy to use

Page 19: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

• A distributed pub-sub messaging system

• Scale-out from groundup

• Persistent to disks

• High-Throughput (10s MB/sec per server)

19

Apache Kafka

Page 20: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 20

Life Since Kafka in Production

Apache Kafka

• Developed and maintained by 5 Devs + 2 SRE

Page 21: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Agenda

21

• Overview of Kafka

• Kafka Design

• Kafka Usage at LinkedIn

• Roadmap

• Q & A

Page 22: Apache Kafka at LinkedIn

Key Idea #1:

Data-parallelism leads to scale-out

Page 23: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

• Produce/consume requests are randomly balanced among brokers

23

Distribute Clients across Partitions

Page 24: Apache Kafka at LinkedIn

Key Idea #2:

Disks are fast when used sequentially

Page 25: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

• Appends are effectively O(1)

• Reads from known offset are fast still, when cached

25

Store Messages as a Log

3 4 5 5 7 8 9 10 11 12...

Producer Write

Consumer1

Reads (offset 7)

Consumer2

Reads (offset 7)

Partition i of Topic A

Page 26: Apache Kafka at LinkedIn

Key Idea #3:

Batching makes best use of network/IO

Page 27: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

• Batched send and receive

• Batched compression

• No message caching in JVM

• Zero-copy from file to socket (Java NIO)

27

Batch Transfer

Page 28: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 28

The API (0.8)

Producer:

send(topic, message)

Consumer:

Iterable stream = createMessageStreams(…).get(topic)

for (message: stream) {// process the message

}

Page 29: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Agenda

29

• Overview of Kafka

• Kafka Design

• Kafka Usage at LinkedIn

• Pipeline deployment

• Schema for data cleanliness

• O(1) ETL

• Auditing for correctness

• Roadmap

• Q & A

Page 30: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 30

Kafka Usage at LinkedIn

• Mainly used for tracking user-activity and metrics data

• 16 - 32 brokers in each cluster (615+ total brokers)

• 527 billion messages/day

• 7500+ topics, 270k+ partitions

• Byte rates:

• Writes: 97 TB/day

• Reads: 430 TB/day

Page 31: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 31

Kafka Usage at LinkedIn

Page 32: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 32

Kafka Usage at LinkedIn

Page 33: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 33

Kafka Usage at LinkedIn

Page 34: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Agenda

34

• Overview of Kafka

• Kafka Design

• Kafka Usage at LinkedIn

• Pipeline deployment

• Schema for data cleanliness

• O(1) ETL

• Auditing for correctness

• Roadmap

• Q & A

Page 35: Apache Kafka at LinkedIn

Problems

• Hundreds of message types

• Thousands of fields

• What do they all mean?

• What happens when they change?

Page 36: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 36

Standardized Schema on Avro

• Schema

• Message structure contract

• Performance gain

• Workflow

• Check in schema

• Auto compatibility check

• Code review

• “Ship it!”

Page 37: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Agenda

37

• Overview of Kafka

• Kafka Design

• Kafka Usage at LinkedIn

• Pipeline deployment

• Schema for data cleanliness

• O(1) ETL

• Auditing for correctness

• Roadmap

• Q & A

Page 38: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 38

Kafka to Hadoop

Page 39: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 39

Hadoop ETL (Camus)

• Map/Reduce job does data load

• One job loads all events

• ~10 minute ETA on average from producer to HDFS

• Hive registration done automatically

• Schema evolution handled transparently

• Open sourced:

– https://github.com/linkedin/camus

Page 40: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Agenda

40

• Overview of Kafka

• Kafka Design

• Kafka Usage at LinkedIn

• Pipeline deployment

• Schema for data cleanliness

• O(1) ETL

• Auditing for correctness

• Roadmap

• Q & A

Page 41: Apache Kafka at LinkedIn

Does it really work?“All published messages must be delivered to all consumers (quickly)”

Page 42: Apache Kafka at LinkedIn

Audit Trail

Page 43: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 43

More Features in Kafka 0.8

• Intra-cluster replication (0.8.0)

• Highly availability,

• Reduced latency

• Log compaction (0.8.1)

• State storage

• Operational tools (0.8.2)

• Topic management

• Automated leader rebalance

• etc ..

Checkout our page for more: http://kafka.apache.org/

Page 44: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 44

Kafka 0.9

• Clients Rewrite

• Remove ZK dependency

• Even better throughput

• Security

• More operability, multi-tenancy ready

• Transactional Messaing

• From at-least-one to exactly-once

Checkout our page for more: http://kafka.apache.org/

Page 45: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure

Kafka Users: Next Maybe You?

Page 46: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 46

Acknowledgements

Page 47: Apache Kafka at LinkedIn

Questions? Guozhang Wang

[email protected]

www.linkedin.com/in/guozhangwang

Page 48: Apache Kafka at LinkedIn

Backup Slides

Page 49: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 49

Real-time Analysis with Kafka• Analytics from Hadoop can be slow

• Production -> Kafka: tens of milliseconds

• Kafka - > Hadoop: < 1 minute

• ETL in Hadoop: ~ 45 minutes

• MapReduce in Hadoop: maybe hours

Page 50: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 50

Real-time Analysis with Kafka

• Solution No.1: directly consuming from Kafka

• Solution No. 2: other storage than HDFS

• Spark, Shark

• Pinot, Druid, FastBit

• Solution No. 3: stream processing

• Apache Samza

• Storm

Page 51: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 51

How Fast can Kafka Go?

• Bottleneck #1: network bandwidth

• Producer: 100 Mb/s for 1 Gig-Ethernet

• Consumer can be slower due to multi-sub

• Bottleneck #2: disk space

• Data may be deleted before consumed at peak time•

• Configurable time/size-based retention policy

• Bottleneck #3: Zookeeper

• Mainly due to offset commit, will be lifted in 0.9

Page 52: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 52

Intra-cluster Replication• Pick CA within Datacenter (failover < 10ms)

• Network partition is rare

• Latency less than an issue

• Separate data replication and consensus

• Consensus => Zookeeper

• Replication => primary-backup (f to tolerate f-1 failure)

• Configurable ACK (durability v.s. latency)

• More details:

• http://www.slideshare.net/junrao/kafka-replication-apachecon2013

Page 53: Apache Kafka at LinkedIn

©2013 LinkedIn Corporation. All Rights Reserved. KAFKA Team, Data Infrastructure 53

Replication Architecture

Producer

Consumer

Producer

Broker Broker Broker Broker

Consumer

ZK


Top Related