Apache Kafka: A High-Throughput Distributed Messaging System @ JCConf 2014
DESCRIPTION
With the rise of big data, the generation, movement, storage, and processing of data have become increasingly specialized. Early RDBMSs could no longer keep up, so NoSQL emerged to handle larger data volumes. Traditional message queue systems are likewise falling short of big-data demands, and Apache Kafka tries to solve messaging at very large scale. Kafka can be seen as a new message queue solution, or more like an activity logging system, aiming to be fast, distributed, durable, and scalable. Many clever design decisions were made to achieve these properties and are worth a closer look. So in this talk, besides introducing Kafka's features, we will also take a deeper look at its internal design.
TRANSCRIPT
#JCConf
Apache Kafka: A high-throughput distributed messaging system
陸振恩 (popcorny) [email protected]
● What is Kafka
● Basic concept
● Why Kafka fast
● Programming Kafka
● Usage scenarios
● Recap
Outline
2
What is Kafka
● More and more data and metrics need to be collected
  - Web activity tracking
  - Operation metrics
  - Application log aggregation
  - Commit log
  - …
● We need a message bus to collect and relay these data
  - Big volume
  - Fast and scalable
Motivation
4
● Developed by LinkedIn
● Message system
  - Queue
  - Pub/Sub
● Written in Scala
● Features
  - Durability
  - Scalability
  - High availability
  - High throughput
Kafka
5
BigData World
6
                    Traditional          BigData
File System         NFS                  HDFS, S3
Database            RDBMS                Cassandra, HBase
Batch Processing    SQL                  Hadoop MapReduce, Spark
Stream Processing   In-App Processing    Storm, Spark Streaming
Message Service     AMQP-compliant       Kafka
● Durability
  - All messages are persisted
  - Sequential read & write (like a log file)
  - Consumers keep the message offset (like a file descriptor)
  - The log files are rotated (like logrotate)
  - Messages are only deleted on expiry (like logrotate)
  - Supports batch load and real-time usage!
    ● cat access.log | grep 'jcconf'
    ● tail -F access.log | grep 'jcconf'
Features
7
Design: like a Message Queue
Implementation: like a Distributed Log File
8
● Scalability
  - Horizontal scale out
  - Topic is partitioned (sharded)
● High Availability
  - Partitions can be replicated
Features
9
● High Throughput
Features
10
source: http://www.infoq.com/articles/apache-kafka
Basic Concept
● Producer - The role that sends messages to the broker
● Consumer - The role that receives messages from the broker
● Broker - One node of the Kafka cluster
● ZooKeeper - Coordinator of the Kafka cluster and consumer groups
Kafka Cluster
Physical Components
12
[Diagram: Producer → Kafka Cluster (Broker × 3, coordinated by ZooKeeper) → Consumer Group (Consumers)]
● Topic - The named destination of partitions
● Partition
  - One topic can have multiple partitions
  - Unit of parallelism
● Message
  - Key/value pair
  - Message offset
Logical Components
[Diagram: a Topic with Partition 1, Partition 2, and Partition 3, each an ordered log of messages (A, B, C, …) indexed by offsets 0..n]
13
One Partition One Consumer (Queue)
[Diagram: Producer P → Partition 1, an ordered log of messages A..J at offsets 0..9 → Consumer C with offset = 8]
14
Consumers keep the offset. The broker has no idea whether a message has been processed.
One Partition Multiple Consumer (Pub/Sub)
[Diagram: Producer P → Partition 1 (messages A..J at offsets 0..9) → Consumers C1 (offset = 8), C2 (offset = 7), C3 (offset = 9)]
15
Each consumer keeps its own offset.
Multiple Partitions
[Diagram: Producer P → Partition 1 on broker1; Partition 2 and Partition 3 on broker2; a single consumer C1 tracks p1.offset = 7, p2.offset = 9, p3.offset = 7]
16
Dispatched by hashed key
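The dispatch step can be sketched in a few lines. This mirrors the behavior of Kafka's default hash partitioner (hash the key, modulo the partition count); the class and method names here are illustrative, not Kafka's actual API.

```java
// Sketch of dispatching a keyed message to a partition by hashed key.
public class HashPartitioner {
    // Map a message key to one of numPartitions partitions.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is non-negative even when
        // hashCode() is negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition, which is what
        // preserves per-key ordering across sends.
        System.out.println(partitionFor("user-42", 3));
        System.out.println(partitionFor("user-42", 3)); // same partition again
    }
}
```

Because the mapping is deterministic, all messages for one key go to one partition, and that partition's FIFO guarantee gives per-key ordering.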
Multiple Partitions
[Diagram: the same three partitions, now with one consumer per partition: C1, C2, C3, each keeping its own offset (9, 7, 7)]
17
Can we auto-rebalance the consumers to partitions?
18
Yes, Consumer Group!!
● A group of workers
● Share the offsets
● Offsets are synced to ZooKeeper
● Auto rebalancing
Consumer Group
19
Consumer Group
20
[Diagram: Producer P → Partition 1 on broker1, Partitions 2 and 3 on broker2; Consumer Group 'group1' with C1 and C2 sharing p1.offset = 7, p2.offset = 9, p3.offset = 7]
Consumer Group
21
[Diagram: the same partitions consumed by two independent groups: 'group1' with C1 and C2, and 'group2' with a single C1; each group keeps its own offsets]
Consumer Group
[Diagram: Producer P → Partition 1 (messages A..J at offsets 0..9) → Consumer Group (C1, C2, C3) sharing offset = 9]
22
Partition to consumer is a many-to-one relation (within one consumer group)
● Messages from the same partition guarantee FIFO semantics
● A traditional MQ can only guarantee messages are delivered in order
● Kafka can guarantee messages are handled in order (within the same partition)
Message Ordering
23
[Diagram: Traditional MQ: one queue delivers in order to C1 and C2, but handling order across consumers is not guaranteed; Kafka: P1 → C1 and P2 → C2, so each partition is handled in order]
● At most once - Messages may be lost but are never redelivered
● At least once - Messages are never lost but may be redelivered
● Exactly once - Each message is delivered once and only once (this is what people actually want)
  - Two-phase commit
  - At least once + idempotence
Delivery Semantic
24
Apply multiple times without changing the final result
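A tiny illustration of that definition (not Kafka API; the map and key names are made up): a "set to value" operation is idempotent, so a redelivered message changes nothing, while an increment is not, so redelivery double-counts.

```java
import java.util.HashMap;
import java.util.Map;

// Idempotent vs non-idempotent operations under redelivery.
public class Idempotence {
    public static void main(String[] args) {
        Map<String, Integer> state = new HashMap<>();

        // Idempotent: setting the same key to the same value twice leaves
        // the state exactly as after the first application.
        state.put("pageviews", 42);
        state.put("pageviews", 42);             // redelivery: no effect
        System.out.println(state.get("pageviews")); // 42

        // Not idempotent: applying an increment twice double-counts.
        state.merge("pageviews", 1, Integer::sum);
        state.merge("pageviews", 1, Integer::sum); // redelivery: double count
        System.out.println(state.get("pageviews")); // 44
    }
}
```

This is why "at least once + idempotence" yields exactly-once results: duplicates are delivered but have no extra effect.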
● Which part do we discuss?
Delivery Semantic
25
Producer → Broker → Consumer
● At most once - Async send
● At least once - Sync send (with retry count)
● Exactly once - Idempotent delivery is not supported until the next version (0.9)
Producer To Broker
26
Producer → Broker → Consumer
● At most once - Store the offset before handling the message
● At least once - Store the offset after handling the message
● Exactly once - At least once + idempotent operation
Broker to Consumer
27
Producer → Broker → Consumer
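The consumer-side rules above can be simulated in a few lines. This is an illustrative sketch, not Kafka code: storing the offset after handling gives at-least-once delivery (a crash between the two steps causes redelivery), and an idempotent handler (a set here) absorbs the duplicate.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// At-least-once consumption: handle first, then store the offset.
public class AtLeastOnce {
    static List<String> log = List.of("A", "B", "C", "D"); // the partition
    static Set<String> handled = new HashSet<>();          // idempotent sink
    static int savedOffset = 0;                            // durable offset store

    // Consume from the saved offset; optionally "crash" once after
    // handling a message but before saving its offset.
    static void run(boolean crashOnce) {
        for (int offset = savedOffset; offset < log.size(); offset++) {
            handled.add(log.get(offset));         // 1. handle the message
            if (crashOnce && offset == 1) return; //    crash: offset not saved
            savedOffset = offset + 1;             // 2. then store the offset
        }
    }

    public static void main(String[] args) {
        run(true);   // crashes after handling "B", before saving offset 2
        run(false);  // restart: "B" is redelivered and handled again
        System.out.println(handled.size()); // 4: the set absorbs the duplicate
    }
}
```

Storing the offset before handling instead (swap steps 1 and 2) would give at-most-once: the crash would skip "B" entirely.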
● The unit of replication is the partition
● Each partition has a single leader and zero or more followers
● All reads and writes go to the leader of the partition
Replication
28
source: http://www.infoq.com/articles/apache-kafka
[Diagram: Producer writes to the Leader; the Leader syncs to the Followers; the Consumer reads from the Leader]
● Many data systems retain the latest state for data by some key
● Log compaction adds an alternative retention mechanism that retains messages by key instead of purely by time
● This matches many common data systems: a search index, a cache, etc.
Log Compaction
30
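The retention rule can be sketched as a pure function (an illustrative model, not Kafka's on-disk implementation): later entries for a key shadow earlier ones, so the compacted log still rebuilds the latest state for every key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Model of log compaction: keep only the newest value per key.
public class LogCompaction {
    // Compact a log of {key, value} entries.
    static Map<String, String> compact(String[][] log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] entry : log) {
            latest.remove(entry[0]);        // drop the stale position
            latest.put(entry[0], entry[1]); // keep only the newest value
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = {
            {"k1", "v1"}, {"k2", "v1"}, {"k1", "v2"}, {"k3", "v1"}, {"k1", "v3"}
        };
        // Only the last write per key survives compaction.
        System.out.println(compact(log)); // {k2=v1, k3=v1, k1=v3}
    }
}
```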
Log Compaction
31
Log Compaction
32
Why Kafka Fast?
Persistent and Fast?
34
● Don't fear the file system
● Six 7,200 RPM SATA disks in a RAID-5 array
  - Sequential write: 600MB/sec
  - Random write: 100KB/sec
● Sequential disk reads can be faster than random access in memory?
Sequential vs Random
35
source: http://queue.acm.org/detail.cfm?id=1563874
If we persist data, should we cache the data in memory?
36
● In-Process Cache
  - Message as object
  - Cached in the JVM heap
● Page Cache
  - Disk cache managed by the OS
In-Process Cache vs Page Cache
37
In-Process Cache vs Page Cache
38
                     In-Process Cache    Disk Page Cache
Memory Usage         In-heap memory      Free memory
Overhead             Object overhead     None
Garbage Collection   Yes                 No
Process Restart      Lost                Still warm
Controlled by        App                 OS
● Fact
  - All disk reads and writes go through the page cache. This feature cannot easily be turned off without using direct I/O, so even if a process maintains an in-process cache of the data, this data will likely be duplicated in the OS page cache, effectively storing everything twice.
● Conclusion
  - Relying on the page cache is superior to maintaining an in-process cache or other structure.
In-Process Cache vs Page Cache
39
How do we transfer data to consumers?
40
Application Copy vs Zero Copy
41
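Zero copy in the JVM means `FileChannel.transferTo`, which Kafka uses (via `sendfile`) to move log segments to the network socket without copying bytes through application buffers. A minimal sketch, using a byte-array sink in place of a real socket channel:

```java
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Zero-copy transfer: hand file bytes straight to a target channel.
public class ZeroCopyDemo {
    static byte[] transfer(Path file) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            // With a socket channel as the target, the kernel can move the
            // bytes directly from the page cache to the NIC.
            src.transferTo(0, src.size(), Channels.newChannel(sink));
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("log", ".txt");
        Files.write(tmp, "hello jcconf".getBytes());
        System.out.println(new String(transfer(tmp))); // hello jcconf
    }
}
```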
● Traditional Queue
  - Broker keeps the message state and metadata
  - B-tree, O(log n)
  - Random access
● Kafka
  - Consumers keep the offset
  - Sequential disk read/write, O(1)
Constant Time
42
Programming Kafka
#JCConf�
Producer
44
Sync Send
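The original slide showed a code screenshot that did not survive the transcript. A sketch of a synchronous send against the 0.8-era producer API (`kafka.javaapi.producer`); the broker list and topic name are placeholders:

```java
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Synchronous producer: send() blocks until the broker acknowledges.
public class SyncProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "sync");         // block on each send
        props.put("request.required.acks", "1");    // leader ack is enough
        props.put("message.send.max.retries", "3"); // retries give at-least-once

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        // The key ("jcconf") decides the partition: same key, same partition.
        producer.send(new KeyedMessage<String, String>("my-topic", "jcconf", "hello kafka"));
        producer.close();
    }
}
```

With retries enabled, a timed-out send may be re-sent, which is exactly the at-least-once behavior described on the delivery-semantics slide.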
Producer
45
Async Send
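The async variant differs only in configuration (again a hedged sketch of the 0.8-era API; values are placeholders): messages are buffered and flushed in batches, so `send()` returns immediately, and a dropped batch means at-most-once delivery.

```java
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Asynchronous producer: send() buffers; a background thread flushes batches.
public class AsyncProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");        // buffer sends in background
        props.put("queue.buffering.max.ms", "500"); // flush at least every 500 ms
        props.put("batch.num.messages", "200");     // or every 200 messages

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        for (int i = 0; i < 1000; i++) {
            producer.send(new KeyedMessage<String, String>("my-topic", "msg-" + i));
        }
        producer.close(); // flushes any remaining buffered messages
    }
}
```

Batching is what buys the throughput: many small messages become one large sequential write on the broker.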
High Level Consumer
46
Open The Consumer Connector
Open the stream for topic
High Level Consumer
47
Receive the message
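The three steps captioned above (open the connector, open the stream, receive messages) looked roughly like this in the 0.8-era high-level consumer API; the ZooKeeper address, group id, and topic are placeholders. Offsets are committed to ZooKeeper automatically on behalf of the whole consumer group:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

// High-level consumer: connector -> stream -> iterate.
public class HighLevelConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");
        props.put("group.id", "group1"); // consumers sharing this id share offsets
        props.put("auto.commit.interval.ms", "1000");

        // 1. Open the consumer connector
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // 2. Open one stream for the topic
        Map<String, Integer> topicCount = new HashMap<String, Integer>();
        topicCount.put("my-topic", 1); // one stream (thread) for this topic
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCount);
        KafkaStream<byte[], byte[]> stream = streams.get("my-topic").get(0);

        // 3. Receive messages (the iterator blocks until messages arrive)
        for (MessageAndMetadata<byte[], byte[]> msg : stream) {
            System.out.println(msg.partition() + "@" + msg.offset()
                    + ": " + new String(msg.message()));
        }
    }
}
```

Because the group id is shared, starting a second copy of this program triggers the auto-rebalancing described on the Consumer Group slides.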
Usage Scenarios
● Realtime processing and analyzing
● Stream processing frameworks
  - Storm
  - Spark Streaming
  - Samza
● Distributed stream source + distributed stream processing
● All three frameworks support Kafka as a stream source
Source of Stream Processing
49
Kafka Cluster
Stream Processing
● The most reliable source for stream processing
Source of Stream Processing
50
source: http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming
● Centralized log framework
● Distributed log collectors
  - Logstash
  - Fluentd
  - Flume
Source and/or Sink of Distributed Log Collectors
51
Kafka Cluster
Distributed Log Collector
Other Sink
Kafka Cluster
Distributed Log Collector
Other Source
● Push vs Pull
● Distributed log collectors provide configurable producers and consumers
● A Kafka cluster provides a distributed, highly available, reliable message system
Source and/or Sink of Distributed Log Collectors (cont.)
52
[Diagram: Distributed Log Collector and Kafka Cluster connected by push and pull arrows on both the producing and consuming sides]
● What is the lambda architecture?
  - Stream for realtime data
  - Batch for historical data
  - Query by merged view
Source of Lambda Architecture
53
source: http://lambda-architecture.net/
Lambda Architecture (cont.)
54
source: https://metamarkets.com/2014/building-a-data-pipeline-that-handles-billions-of-events-in-real-time/
● Features
  - Durability, Scalability, High Availability, High Throughput
● Basic concepts
  - Producer, Broker, Consumer, Consumer Group
  - Topic, Partition, Message
  - Message ordering
  - Delivery semantics
  - Replication
● Why Kafka fast
● Usage Scenarios
  - Source of stream processing
  - Source or sink of distributed log collectors
  - Source of lambda architecture
Recap
55
● Kafka Documentation
  kafka.apache.org/documentation.html
● Kafka Wiki
  https://cwiki.apache.org/confluence/display/KAFKA/Index
● The Log: What every software engineer should know about real-time data's unifying abstraction
  engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
● Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)
  engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
● Apache Kafka for Beginners
  blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
Reference
56
producer.send("thanks");
57
// any question?
question = consumer.receive();
58