kafka based distributed message system

35
DMS Kafka based Distributed Message System Samuel Chen 2014/8/1

Upload: samuel-chen

Post on 30-Oct-2014

124 views

Category:

Software


5 download

DESCRIPTION

Introduced the Apache Kafka concept such as broker, group, topic, partition, leader, producer, consumer, offset, cluster and etc. And also introduced the Java lib for quick development of either kafka producer or consumer. At the end, introduced some policy for failover.

TRANSCRIPT

Page 1: Kafka based Distributed Message System

DMS

Kafka based Distributed Message System

Samuel Chen2014/8/1

Page 2: Kafka based Distributed Message System

Agent

• Kafka concepts• Messaging System• Producer• Consumer• Failover• Web Console• Q & A

Page 3: Kafka based Distributed Message System

Kafka Concepts

• Broker• Topic & Partition• Leader• Replication• Offset

• Message• Producer• Consumer• Group• Log

Page 4: Kafka based Distributed Message System

Broker

• Broker is a cluster node• Fail tolerate for N-1 servers failures. N is

replica factor for a topic

• Zookeeper

(replica factor 5)

Broker 0 Broker 1

Broker 2Broker 4 Broker 3

Page 5: Kafka based Distributed Message System

Topic & Partition

• Topic is name of the channel for a category of messages. Simply treat it as the collection of queues for same type messages.

• A topic is separated as N partitions to provide replica and parallel capability.

• A partition is an ordered, immutable sequence of messages. Simply it can be treat as a queue in a topic. Your message will be send to one of the partitions by different algorithms.

Page 6: Kafka based Distributed Message System

Topic & Partition

• Messages are stored in topic partitions on broker.

• Message are ordered with an index - offset

• Each partition has its own offset sequence.

• Partitions are replicated to brokers.

Page 7: Kafka based Distributed Message System

Leader

• Leader is the read/write broker for a partition ( partition is replicated to multi brokers by replica factor N)

• Other replicated partitions (N-1) are followers.• Follower will forward request to leader and

fetch messages from leader.• Once the leader fail, Kafka will select a new

leader from the followers.

Page 8: Kafka based Distributed Message System

Leader

Leader 0,1,2

Follower 0,1,2

Follower 0,1,2

Leader 0

Replica by fetch

Replica by fetch

(leader -> follower if recovered)

Partition 0(leader) Partition 1(leader) Partition 2

Partition 0 (follower -> leader)Partition 1 (follower)Partition 2 (follower)

Partition 0 (follower)Partition 1 (follower)Partition 2 (follower)

Replicate partition 0 from broker 2 after failure

Page 9: Kafka based Distributed Message System

Replication

• Replicate the messages(log) for each topic’s partitions across the brokers (config, replica factor)

• Replication factor 1 means un-replicated.• Leader keeps list of followers to replicate– ZK heartbeat– Not too far behind leader

(replica.lag.max.messages & replica.lag.time.max.ms)

Page 10: Kafka based Distributed Message System

Producer

• Producer is the “man” to produce messages. Simply treat producer as the commander who sends action orders.

• A producer produces for a topic.• Send your message to a topic on either broker via high

level producer API. Kafka will find the leader and balance partition for you.

• Async / Sync + ack• Batch sending• Keyed message (log retain policy, partition selection)

Page 11: Kafka based Distributed Message System

Consumer

• Consumer is the “man” to consume messages. Simply treat consumer as the soldier who executes the commander’s (producer’s) orders.

• A consumer consume from a topic.• Consume your message from a topic on either

broker via High level API without seeking offset capability.

• One consumer per partition. Use more partition for parallel.

• Commit your change.

Page 12: Kafka based Distributed Message System

Group

• A group means a collection of consumers may not for a single topic.

• A topic may have consumers from different groups.

• A group may have consumers consuming different topics.

• A consumer is belongs only one group and consumes from single topic.

Page 13: Kafka based Distributed Message System

Offset

• Offset is the ordered index for messages. You may access message by offset even a message was consumed.

• For a partition of a topic, each group has an offset. All consumers in a same group for a topic use the same offset.

• First offset may not be zero. (data may be deleted)• Auto commit via high level API

(auto.commit.enable)

Page 14: Kafka based Distributed Message System

Message

• Ordered messages. Messages sent by a producer to a particular topic partition will be appended in the order they are sent.

• A consumer instance sees messages in the order they are stored in the log.

• Stored on server FS (retain time, size are configurable)

• Size limitation (message.max.bytes)• Serialization / Deserialization

Page 15: Kafka based Distributed Message System

Message

• At most once—Messages may be lost but are never redelivered. (Kafka default)

• At least once—Messages are never lost but may be redelivered. (Disable retry)

• Exactly once—this is what people actually want, each message is delivered once and only once. (via offset)

Page 16: Kafka based Distributed Message System

Log

• Log is record of offset and message persistently stored.

• Cleanup policy (time, size)

• Compaction – store last one for same

key– Offset no change. No

order change.

Page 17: Kafka based Distributed Message System

Messaging System

• Independent • Scalability• Durability• Online / offline / real time• Other benefits• Scenario

Page 18: Kafka based Distributed Message System

Independent

• MQ is another system. Message is the protocol between systems/apps.

• Send and forget. Focus on the what you are doing. The rest let the other app/system handle.

• A worker for an aspect• Reduce dependent from package/class

reference.• Forget processing capability

Page 19: Kafka based Distributed Message System

Scalability

• Dynamically add/remove worker to enlarge parallel processing capability.

• Distributed in different servers

Page 20: Kafka based Distributed Message System

Durability

• Distributed in different server• Start server instances as wish• No data lost• Re-processing-able

Page 21: Kafka based Distributed Message System

Online / Offline / Real Time

• Online data processing for application business logic.

• Offline data re-use-able for hadoop batch analysis

• Real time data analysis for storm• Real time message process for news/company

updates and so on.

Page 22: Kafka based Distributed Message System

Other Benefits

• None blocking - Asynchronization• No threading – simplified• Reduce DB load – No need to use DB to

exchange data for jobs.

Page 23: Kafka based Distributed Message System

Scenario

DBMQ

Workeruser activity log

Workerprovision

ProducerProvision User followed a

Company

Job for provision company Job for user activity

Page 24: Kafka based Distributed Message System

Scenario

DBMQ

Worker(s)News process mid

way

WorkerNews final process

WorkerCrawl news

Jobs to extract info of news

Job for tag news (whatever)

App job to scan new news

Page 25: Kafka based Distributed Message System

Producer

• IProduce• IMessage• DMSProducer• DMSProducerManager• Partitioner• FailManager• Logger• Customized Producer

Page 26: Kafka based Distributed Message System

Producer

• Send keyed message so that we could partition it.

• Asynchronization.• Retry and handle fail states• Handle failed data and check if recovered• Register all producers automatically• Serialization / Deserialization

Page 27: Kafka based Distributed Message System

Producer

• Implement your producer by extending DMSProducer with the topic.

• Implement your message class by implements IMessage. getKey() is used to return the message key.

• Send your message by using producer.send(IMessage) in your logic.

• Let’s handle the remains.

Page 28: Kafka based Distributed Message System

Consumer

• IConsume• ConsumerExecution• TopicConsumer• ConsumerManager• Kafka• ZookeeperListeners

Page 29: Kafka based Distributed Message System

Consumer

• One consumer instance maps to the single broker(host:port), single topic + partition, single group

• The broker is the seed to find the leader. When find a leader, it’s replaced. If this broker failed, switch to another in the broker list.

• One consumer consumes from one partition of the topic. ConsumerManager balances it.

• The consumed offset is stored in local file. It will be used to compare with server offset. If bigger, then use local offset after crashed.

Page 30: Kafka based Distributed Message System

Consumer

• Automatically register and create instance for consumers

• Support dynamically add/remove consumers from a topic. Automatically or manually select its partition (in single instance)

• Balance partition – consumer cross server/process by Listening to ZK cluster (multi instance)

• Auto/manually commit offset• Shutdown after the message consumed• Hook system signal to handle “kill” command

Page 31: Kafka based Distributed Message System

Consumer

• Implement your consumer by extending DMSConsumer

• Overwrite onMessage to handle received messages

• Try catch• Let’s handle the remains

Page 32: Kafka based Distributed Message System

Failover

• Producer failover– When failed, retry N times– If still fail, log the messages to FailProducer (File, DB)– When Recovered, send the backlog to Kafka

• Consumer offset recover– Commit offset to cluster by interval– Write offset to local file– Compare with server offset if start from crash

• Consumer balancing– Register on ZK– One partition one consumer– Rebalance it once state changed

Page 33: Kafka based Distributed Message System

Failover

• Network issue – retry by different intervals• Access issue – raise exception and handle it• Data damaged• Time lag• Server crash/killed – backup information and recover

once server comes back• Client crash/killed – backup states and compare with

server to recover when resume• One time fail is not fail• Check states with all nodes

Page 34: Kafka based Distributed Message System

TBD: Web Console

• Producer running states• Consumer running states• Interface to hot manually balance consumers

across machine/process• Interface to hot change/refresh configuration• Cluster running states

Page 35: Kafka based Distributed Message System

Q & A

• Thanks