jay kreps, neha narkhede, jun rao linkedin · 2019-07-17 · kafka: a distributed messaging system...

Kafka: a Distributed Messaging System for Log Processing Jay Kreps, Neha Narkhede, Jun Rao LinkedIn

Post on 15-Aug-2020

Page 1

Kafka: a Distributed Messaging System for Log Processing

Jay Kreps, Neha Narkhede, Jun Rao

LinkedIn

Page 2

AGENDA

• Kafka usage at LinkedIn

• Kafka design

• Kafka roadmap

Page 3

ABOUT LINKEDIN

• Professional social network platform

• One of the 50 largest sites in the world (by traffic)

• 100M+ members

Page 4

LOGGING OVERVIEW

• Many types of events

• user activity events: impression, search, ads, etc

• operational events: call stack, service metrics, etc

• High volume: billions of events per day

• Both online and offline use cases

• reporting, batch analysis

• security, news feeds, performance dashboard, ...

Page 5

DEPLOYMENT

[Deployment diagram. Labels: Frontend (x3), VIP, Kafka (x3), Realtime service (x2), Oracle, Asterdata, Main site; Kafka (x3), Hadoop, Analysis site]

Page 6

KAFKA DESIGN PRINCIPLES

• Simple API

• Efficient

• Distributed

Page 7

PRODUCER API

void send(String topic, ByteBufferMessageSet messages)

producer = new KafkaProducer(…);
message = new Message("test message str".getBytes());
set = new ByteBufferMessageSet(message);
producer.send("test", set);

Page 8

CONSUMER API

streams[] = Consumer.createMessageStreams("test", 1)

for (message : streams[0]) {
  bytes = message.payload()
  // do something with bytes
}

Page 9

EFFICIENCY #1: SIMPLE STORAGE

• Each topic has an ever-growing log

• A log == a list of files

• A message is addressed by a log offset
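The "list of files" idea can be illustrated with a minimal sketch. It assumes each segment file is identified by the offset of its first message, so finding the file for a given offset is a floor lookup. The class name, method names, and file names below are hypothetical and are not taken from Kafka; the lookup works the same whether offsets count bytes or messages.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a log modeled as an ordered list of segment files. */
final class SegmentedLog {
    // Base offset of each segment file -> file name (names are illustrative).
    private final TreeMap<Long, String> segments = new TreeMap<>();

    /** Registers a segment file whose first message has the given offset. */
    void addSegment(long baseOffset, String fileName) {
        segments.put(baseOffset, fileName);
    }

    /** Returns the segment file that holds the message at the given log offset. */
    String segmentFor(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        if (entry == null) {
            throw new IllegalArgumentException("offset " + offset + " is before the start of the log");
        }
        return entry.getValue();
    }

    /** Returns the offset relative to the start of the containing segment. */
    long positionInSegment(long offset) {
        Long base = segments.floorKey(offset);
        if (base == null) {
            throw new IllegalArgumentException("offset " + offset + " is before the start of the log");
        }
        return offset - base;
    }
}
```

Because the lookup needs only the sorted base offsets, no per-message index is required in this sketch.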

Page 10

EFFICIENCY #2: CAREFUL TRANSFER

• Batch send and fetch

• No message caching in Kafka layer

• Rely on file system page cache

• mostly sequential access patterns

• Zero-copy transfer: file -> socket
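The file -> socket transfer can be sketched in Java with FileChannel.transferTo, which lets the operating system move bytes from the page cache to the target channel without copying them through application buffers when the platform supports it. This is an illustration of the technique only, not Kafka's broker code; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of sending a slice of a log file with transferTo. */
final class ZeroCopySend {
    /**
     * Sends up to count bytes of logFile, starting at offset, to out.
     * Returns the number of bytes actually sent.
     */
    static long send(Path logFile, long offset, long count, WritableByteChannel out)
            throws IOException {
        try (FileChannel file = FileChannel.open(logFile, StandardOpenOption.READ)) {
            long end = Math.min(file.size(), offset + count);
            long position = offset;
            while (position < end) {
                // transferTo may send fewer bytes than requested, so loop.
                long sent = file.transferTo(position, end - position, out);
                if (sent <= 0) {
                    break; // target accepted nothing; give up in this sketch
                }
                position += sent;
            }
            return Math.max(0, position - offset);
        }
    }
}
```

In a real server the target would typically be a SocketChannel; any WritableByteChannel works for trying the sketch out.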

Page 11

EFFICIENCY #3: STATELESS BROKER

• Each consumer maintains its own state

• Message deletion driven by retention policy, not by tracking consumption

• acceptable in practice

• rewindable consumer
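The points above can be sketched as a toy in-memory model: the log keeps no record of who has read what, deletion depends only on message age, and each consumer holds its own offset and can move it backwards. All names are hypothetical, and offsets here are simple message indices chosen for brevity, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a stateless broker log and a rewindable consumer. */
final class RewindableConsumerSketch {
    /** Broker side: an append-only log with no per-consumer state. */
    static final class Log {
        private final List<String> messages = new ArrayList<>();
        private final List<Long> timestamps = new ArrayList<>();
        private long firstOffset = 0; // offset of the oldest retained message

        /** Appends a message and returns its offset. */
        long append(String message, long timestampMs) {
            messages.add(message);
            timestamps.add(timestampMs);
            return firstOffset + messages.size() - 1;
        }

        /** Returns the message at offset, or null if offset is past the end. */
        String read(long offset) {
            if (offset < firstOffset) {
                throw new IllegalArgumentException("offset " + offset + " was already deleted");
            }
            int index = (int) (offset - firstOffset);
            return index < messages.size() ? messages.get(index) : null;
        }

        /** Retention: drops messages older than retentionMs, consumed or not. */
        void enforceRetention(long nowMs, long retentionMs) {
            while (!messages.isEmpty() && nowMs - timestamps.get(0) > retentionMs) {
                messages.remove(0);
                timestamps.remove(0);
                firstOffset++;
            }
        }
    }

    /** Consumer side: the read position lives here, not on the broker. */
    static final class Consumer {
        private final Log log;
        private long offset;

        Consumer(Log log, long startOffset) {
            this.log = log;
            this.offset = startOffset;
        }

        /** Reads the next message and advances, or returns null at the end. */
        String poll() {
            String message = log.read(offset);
            if (message != null) {
                offset++;
            }
            return message;
        }

        /** Moves the read position back to re-consume retained messages. */
        void rewind(long toOffset) {
            offset = toOffset;
        }
    }
}
```

Rewinding is free in this model because the broker never discards a message merely because it was read; only the retention window limits how far back a consumer can go.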

Page 12

AUTO CONSUMER LOAD BALANCING

• brokers and consumers register in ZooKeeper

• consumers listen to broker and consumer changes

• each change triggers consumer rebalancing

[Diagram. Labels: broker (x4), consumer (x2), zookeeper]
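The slides do not spell out the rebalancing algorithm itself. A minimal sketch, assuming a deterministic range assignment (sort consumers and partitions, then give each consumer a contiguous chunk), shows how every consumer could compute the same assignment independently after a membership change. The class name, method name, and partition naming are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a deterministic range assignment for rebalancing. */
final class RebalanceSketch {
    /**
     * Splits the sorted partitions into contiguous ranges, one per sorted
     * consumer. The first (partitions % consumers) consumers get one extra.
     */
    static Map<String, List<String>> assign(Collection<String> consumers,
                                            Collection<String> partitions) {
        List<String> sortedConsumers = new ArrayList<>(consumers);
        List<String> sortedPartitions = new ArrayList<>(partitions);
        Collections.sort(sortedConsumers);
        Collections.sort(sortedPartitions);

        Map<String, List<String>> assignment = new LinkedHashMap<>();
        if (sortedConsumers.isEmpty()) {
            return assignment;
        }
        int base = sortedPartitions.size() / sortedConsumers.size();
        int extra = sortedPartitions.size() % sortedConsumers.size();
        int start = 0;
        for (int i = 0; i < sortedConsumers.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            assignment.put(sortedConsumers.get(i),
                    new ArrayList<>(sortedPartitions.subList(start, start + count)));
            start += count;
        }
        return assignment;
    }
}
```

Since the result depends only on the sorted membership lists, each consumer can rerun assign whenever it is notified of a broker or consumer change and arrive at the same answer without a central coordinator.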

Page 13

PRODUCER PERFORMANCE


Page 14

CONSUMER PERFORMANCE


Page 15

ROADMAP

• New Kafka features

• compression

• replication

• stream processing (online M/R)

• http://sna-projects.com/kafka/