Apache Kafka: A High-Throughput Distributed Messaging System @ JCConf 2014
DESCRIPTION
With the rise of big data, the generation, movement, storage, and processing of data have become increasingly specialized. Early RDBMSs could no longer keep up, so NoSQL emerged to handle larger data volumes. Traditional message queue systems are likewise falling short of big-data demands, and Apache Kafka tries to solve messaging at very large scale. Kafka can be seen as a new message queue solution, or more like an activity logging system, aiming to be fast, distributed, durable, and scalable. Many clever design decisions were made to achieve these properties and are worth a closer look. So in this talk, besides introducing Kafka's features, we will also take a deeper look at its internal design.
TRANSCRIPT
#JCConf
Apache Kafka: A high-throughput distributed messaging system
陸振恩 (popcorny) [email protected]
● What is Kafka
● Basic concept
● Why Kafka fast
● Programming Kafka
● Usage scenarios
● Recap
Outline
2
What is Kafka
● More and more data and metrics need to be collected
  - Web activity tracking
  - Operation metrics
  - Application log aggregation
  - Commit log
  - …
● We need a message bus to collect and relay these data
  - Big volume
  - Fast and scalable
Motivation
4
● Developed by LinkedIn
● Message system
  - Queue
  - Pub/Sub
● Written in Scala
● Features
  - Durability
  - Scalability
  - High availability
  - High throughput
Kafka
5
BigData World
6
                    Traditional          BigData
File System         NFS                  HDFS, S3
Database            RDBMS                Cassandra, HBase
Batch Processing    SQL                  Hadoop MapReduce, Spark
Stream Processing   In-App Processing    Storm, Spark Streaming
Message Service     AMQP-compliant       Kafka
● Durability
  - All messages are persisted
  - Sequential read & write (like a log file)
  - Consumers keep the message offset (like a file descriptor)
  - The log files are rotated (like logrotate)
  - Messages are only deleted on expiry (like logrotate)
  - Supports batch load and real-time usage!
    ● cat access.log | grep 'jcconf'
    ● tail -F access.log | grep 'jcconf'
Features
7
Design: like a Message Queue
Implementation: like a Distributed Log File
8
● Scalability
  - Horizontal scale out
  - Topic is partitioned (sharded)
● High Availability
  - Partitions can be replicated
Features
9
● High Throughput
Features
10
source: http://www.infoq.com/articles/apache-kafka
Basic Concept
● Producer - The role that sends messages to the broker
● Consumer - The role that receives messages from the broker
● Broker - One node of the Kafka cluster
● ZooKeeper - Coordinator of the Kafka cluster and consumer groups
Kafka Cluster
Physical Components
12
[Diagram: Producer → Kafka Cluster (Broker × 3, coordinated by ZooKeeper) → Consumer Group (Consumers)]
● Topic - The named destination of partitions
● Partition
  - One topic can have multiple partitions
  - Unit of parallelism
● Message
  - Key/value pair
  - Message offset
Logical Components
[Diagram: a Topic with Partition 1, Partition 2, and Partition 3, each an ordered log of messages (A, B, C, …) indexed by offsets 0..n]
13
One Partition One Consumer (Queue)
[Diagram: Producer P → Partition 1, an ordered log of messages A..J at offsets 0..9 → Consumer C with offset = 8]
14
Consumers keep the offset. The broker has no idea whether a message has been processed.
One Partition Multiple Consumer (Pub/Sub)
[Diagram: Producer P → Partition 1 (messages A..J at offsets 0..9) → Consumers C1 (offset = 8), C2 (offset = 7), C3 (offset = 9)]
15
Each consumer keeps its own offset.
Multiple Partitions
[Diagram: Producer P → Partition 1 on broker1; Partition 2 and Partition 3 on broker2; a single consumer C1 tracks p1.offset = 7, p2.offset = 9, p3.offset = 7]
16
Dispatched by hashed key
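The dispatch step can be sketched in a few lines. This mirrors the behavior of Kafka's default hash partitioner (hash the key, modulo the partition count); the class and method names here are illustrative, not Kafka's actual API.

```java
// Sketch of dispatching a keyed message to a partition by hashed key.
public class HashPartitioner {
    // Map a message key to one of numPartitions partitions.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is non-negative even when
        // hashCode() is negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition, which is what
        // preserves per-key ordering across sends.
        System.out.println(partitionFor("user-42", 3));
        System.out.println(partitionFor("user-42", 3)); // same partition again
    }
}
```

Because the mapping is deterministic, all messages for one key go to one partition, and that partition's FIFO guarantee gives per-key ordering.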
Multiple Partitions
[Diagram: the same three partitions, now with one consumer per partition: C1, C2, C3, each keeping its own offset (9, 7, 7)]
17
Can we auto-rebalance the consumers to partitions?
18
Yes, Consumer Group!!
● A group of workers
● Share the offsets
● Offsets are synced to ZooKeeper
● Auto rebalancing
Consumer Group
19
Consumer Group
20
[Diagram: Producer P → Partition 1 on broker1, Partitions 2 and 3 on broker2; Consumer Group 'group1' with C1 and C2 sharing p1.offset = 7, p2.offset = 9, p3.offset = 7]
Consumer Group
21
[Diagram: the same partitions consumed by two independent groups: 'group1' with C1 and C2, and 'group2' with a single C1; each group keeps its own offsets]
Consumer Group
[Diagram: Producer P → Partition 1 (messages A..J at offsets 0..9) → Consumer Group (C1, C2, C3) sharing offset = 9]
22
Partition to consumer is a many-to-one relation (within one consumer group)
● Messages from the same partition guarantee FIFO semantics
● A traditional MQ can only guarantee messages are delivered in order
● Kafka can guarantee messages are handled in order (within the same partition)
Message Ordering
23
[Diagram: Traditional MQ: one queue delivers in order to C1 and C2, but handling order across consumers is not guaranteed; Kafka: P1 → C1 and P2 → C2, so each partition is handled in order]
● At most once - Messages may be lost but are never redelivered
● At least once - Messages are never lost but may be redelivered
● Exactly once - Each message is delivered once and only once (this is what people actually want)
  - Two-phase commit
  - At least once + idempotence
Delivery Semantic
24
Apply multiple times without changing the final result
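A tiny illustration of that definition (not Kafka API; the map and key names are made up): a "set to value" operation is idempotent, so a redelivered message changes nothing, while an increment is not, so redelivery double-counts.

```java
import java.util.HashMap;
import java.util.Map;

// Idempotent vs non-idempotent operations under redelivery.
public class Idempotence {
    public static void main(String[] args) {
        Map<String, Integer> state = new HashMap<>();

        // Idempotent: setting the same key to the same value twice leaves
        // the state exactly as after the first application.
        state.put("pageviews", 42);
        state.put("pageviews", 42);             // redelivery: no effect
        System.out.println(state.get("pageviews")); // 42

        // Not idempotent: applying an increment twice double-counts.
        state.merge("pageviews", 1, Integer::sum);
        state.merge("pageviews", 1, Integer::sum); // redelivery: double count
        System.out.println(state.get("pageviews")); // 44
    }
}
```

This is why "at least once + idempotence" yields exactly-once results: duplicates are delivered but have no extra effect.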
● Which part do we discuss?
Delivery Semantic
25
Producer → Broker → Consumer
● At most once - Async send
● At least once - Sync send (with retry count)
● Exactly once - Idempotent delivery is not supported until the next version (0.9)
Producer To Broker
26
Producer → Broker → Consumer
● At most once - Store the offset before handling the message
● At least once - Store the offset after handling the message
● Exactly once - At least once + idempotent operation
Broker to Consumer
27
Producer → Broker → Consumer
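The consumer-side rules above can be simulated in a few lines. This is an illustrative sketch, not Kafka code: storing the offset after handling gives at-least-once delivery (a crash between the two steps causes redelivery), and an idempotent handler (a set here) absorbs the duplicate.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// At-least-once consumption: handle first, then store the offset.
public class AtLeastOnce {
    static List<String> log = List.of("A", "B", "C", "D"); // the partition
    static Set<String> handled = new HashSet<>();          // idempotent sink
    static int savedOffset = 0;                            // durable offset store

    // Consume from the saved offset; optionally "crash" once after
    // handling a message but before saving its offset.
    static void run(boolean crashOnce) {
        for (int offset = savedOffset; offset < log.size(); offset++) {
            handled.add(log.get(offset));         // 1. handle the message
            if (crashOnce && offset == 1) return; //    crash: offset not saved
            savedOffset = offset + 1;             // 2. then store the offset
        }
    }

    public static void main(String[] args) {
        run(true);   // crashes after handling "B", before saving offset 2
        run(false);  // restart: "B" is redelivered and handled again
        System.out.println(handled.size()); // 4: the set absorbs the duplicate
    }
}
```

Storing the offset before handling instead (swap steps 1 and 2) would give at-most-once: the crash would skip "B" entirely.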
● The unit of replication is the partition
● Each partition has a single leader and zero or more followers
● All reads and writes go to the leader of the partition
Replication
28
source: http://www.infoq.com/articles/apache-kafka
[Diagram: Producer writes to the Leader; the Leader syncs to the Followers; the Consumer reads from the Leader]
● Many data systems retain the latest state for data by some key
● Log compaction adds an alternative retention mechanism that retains messages by key instead of purely by time
● This matches many common data systems: a search index, a cache, etc.
Log Compaction
30
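The retention rule can be sketched as a pure function (an illustrative model, not Kafka's on-disk implementation): later entries for a key shadow earlier ones, so the compacted log still rebuilds the latest state for every key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Model of log compaction: keep only the newest value per key.
public class LogCompaction {
    // Compact a log of {key, value} entries.
    static Map<String, String> compact(String[][] log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] entry : log) {
            latest.remove(entry[0]);        // drop the stale position
            latest.put(entry[0], entry[1]); // keep only the newest value
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = {
            {"k1", "v1"}, {"k2", "v1"}, {"k1", "v2"}, {"k3", "v1"}, {"k1", "v3"}
        };
        // Only the last write per key survives compaction.
        System.out.println(compact(log)); // {k2=v1, k3=v1, k1=v3}
    }
}
```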
Log Compaction
31
Log Compaction
32
Why Kafka Fast?
Persistent and Fast?
34
● Don't fear the file system
● Six 7,200 RPM SATA disks in a RAID-5 array
  - Sequential write: 600MB/sec
  - Random write: 100KB/sec
● Sequential disk reads can be faster than random access in memory?
Sequential vs Random
35
source: http://queue.acm.org/detail.cfm?id=1563874
If we persist data, should we cache the data in memory?
36
● In-Process Cache
  - Message as object
  - Cached in the JVM heap
● Page Cache
  - Disk cache managed by the OS
In-Process Cache vs Page Cache
37
In-Process Cache vs Page Cache
38
                     In-Process Cache    Disk Page Cache
Memory Usage         In-heap memory      Free memory
Overhead             Object overhead     None
Garbage Collection   Yes                 No
Process Restart      Lost                Still warm
Controlled by        App                 OS
● Fact
  - All disk reads and writes go through the page cache. This feature cannot easily be turned off without using direct I/O, so even if a process maintains an in-process cache of the data, this data will likely be duplicated in the OS page cache, effectively storing everything twice.
● Conclusion
  - Relying on the page cache is superior to maintaining an in-process cache or other structure.
In-Process Cache vs Page Cache
39
How do we transfer data to consumers?
40
Application Copy vs Zero Copy
41
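Zero copy in the JVM means `FileChannel.transferTo`, which Kafka uses (via `sendfile`) to move log segments to the network socket without copying bytes through application buffers. A minimal sketch, using a byte-array sink in place of a real socket channel:

```java
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Zero-copy transfer: hand file bytes straight to a target channel.
public class ZeroCopyDemo {
    static byte[] transfer(Path file) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            // With a socket channel as the target, the kernel can move the
            // bytes directly from the page cache to the NIC.
            src.transferTo(0, src.size(), Channels.newChannel(sink));
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("log", ".txt");
        Files.write(tmp, "hello jcconf".getBytes());
        System.out.println(new String(transfer(tmp))); // hello jcconf
    }
}
```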
● Traditional Queue
  - Broker keeps the message state and metadata
  - B-tree, O(log n)
  - Random access
● Kafka
  - Consumers keep the offset
  - Sequential disk read/write, O(1)
Constant Time
42
Programming Kafka
#JCConf�
Producer
44
Sync Send
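The original slide showed a code screenshot that did not survive the transcript. A sketch of a synchronous send against the 0.8-era producer API (`kafka.javaapi.producer`); the broker list and topic name are placeholders:

```java
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Synchronous producer: send() blocks until the broker acknowledges.
public class SyncProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "sync");         // block on each send
        props.put("request.required.acks", "1");    // leader ack is enough
        props.put("message.send.max.retries", "3"); // retries give at-least-once

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        // The key ("jcconf") decides the partition: same key, same partition.
        producer.send(new KeyedMessage<String, String>("my-topic", "jcconf", "hello kafka"));
        producer.close();
    }
}
```

With retries enabled, a timed-out send may be re-sent, which is exactly the at-least-once behavior described on the delivery-semantics slide.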
Producer
45
Async Send
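The async variant differs only in configuration (again a hedged sketch of the 0.8-era API; values are placeholders): messages are buffered and flushed in batches, so `send()` returns immediately, and a dropped batch means at-most-once delivery.

```java
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Asynchronous producer: send() buffers; a background thread flushes batches.
public class AsyncProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");        // buffer sends in background
        props.put("queue.buffering.max.ms", "500"); // flush at least every 500 ms
        props.put("batch.num.messages", "200");     // or every 200 messages

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        for (int i = 0; i < 1000; i++) {
            producer.send(new KeyedMessage<String, String>("my-topic", "msg-" + i));
        }
        producer.close(); // flushes any remaining buffered messages
    }
}
```

Batching is what buys the throughput: many small messages become one large sequential write on the broker.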
High Level Consumer
46
Open The Consumer Connector
Open the stream for topic
High Level Consumer
47
Receive the message
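The three steps captioned above (open the connector, open the stream, receive messages) looked roughly like this in the 0.8-era high-level consumer API; the ZooKeeper address, group id, and topic are placeholders. Offsets are committed to ZooKeeper automatically on behalf of the whole consumer group:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

// High-level consumer: connector -> stream -> iterate.
public class HighLevelConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");
        props.put("group.id", "group1"); // consumers sharing this id share offsets
        props.put("auto.commit.interval.ms", "1000");

        // 1. Open the consumer connector
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // 2. Open one stream for the topic
        Map<String, Integer> topicCount = new HashMap<String, Integer>();
        topicCount.put("my-topic", 1); // one stream (thread) for this topic
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCount);
        KafkaStream<byte[], byte[]> stream = streams.get("my-topic").get(0);

        // 3. Receive messages (the iterator blocks until messages arrive)
        for (MessageAndMetadata<byte[], byte[]> msg : stream) {
            System.out.println(msg.partition() + "@" + msg.offset()
                    + ": " + new String(msg.message()));
        }
    }
}
```

Because the group id is shared, starting a second copy of this program triggers the auto-rebalancing described on the Consumer Group slides.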
Usage Scenarios
● Realtime processing and analyzing
● Stream processing frameworks
  - Storm
  - Spark Streaming
  - Samza
● Distributed stream source + distributed stream processing
● All three frameworks support Kafka as a stream source
Source of Stream Processing
49
Kafka Cluster
Stream Processing
● The most reliable source for stream processing
Source of Stream Processing
50
source: http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming
● Centralized log framework
● Distributed log collectors
  - Logstash
  - Fluentd
  - Flume
Source and/or Sink of Distributed Log Collectors
51
Kafka Cluster
Distributed Log Collector
Other Sink
Kafka Cluster
Distributed Log Collector
Other Source
● Push vs Pull
● Distributed log collectors provide configurable producers and consumers
● A Kafka cluster provides a distributed, highly available, reliable message system
Source and/or Sink of Distributed Log Collectors (cont.)
52
[Diagram: Distributed Log Collector and Kafka Cluster connected by push and pull arrows on both the producing and consuming sides]
● What is the lambda architecture?
  - Stream for realtime data
  - Batch for historical data
  - Query by merged view
Source of Lambda Architecture
53
source: http://lambda-architecture.net/
Lambda Architecture (cont.)
54
source: https://metamarkets.com/2014/building-a-data-pipeline-that-handles-billions-of-events-in-real-time/
● Features
  - Durability, Scalability, High Availability, High Throughput
● Basic concepts
  - Producer, Broker, Consumer, Consumer Group
  - Topic, Partition, Message
  - Message ordering
  - Delivery semantics
  - Replication
● Why Kafka fast
● Usage Scenarios
  - Source of stream processing
  - Source or sink of distributed log collectors
  - Source of lambda architecture
Recap
55
● Kafka Documentation
  kafka.apache.org/documentation.html
● Kafka Wiki
  https://cwiki.apache.org/confluence/display/KAFKA/Index
● The Log: What every software engineer should know about real-time data's unifying abstraction
  engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
● Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)
  engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
● Apache Kafka for Beginners
  blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
Reference
56
producer.send("thanks");
57
// any question?
question = consumer.receive();
58