Introduction to Kafka and Zookeeper
DESCRIPTION
A short presentation giving an overview of Kafka and Zookeeper, to help beginners understand the basic concepts of both in a lucid manner.
TRANSCRIPT
![Page 1: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/1.jpg)
Introduction to Kafka and Zookeeper
June Hadoop Meetup
Rahul Jain
@rahuldausa
![Page 2: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/2.jpg)
Who am I?
• Software Engineer
• Member of Core Technology @ IVY Comptech, Hyderabad, India
• 6 years of programming experience
• Areas of expertise/interest
– High-traffic web applications
– Java/J2EE
– Big data, NoSQL
– Information retrieval, machine learning
![Page 3: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/3.jpg)
Agenda
• Overview
• Zookeeper
• Messaging System (Basic Concepts)
• Kafka
• Q&A
![Page 4: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/4.jpg)
Apache Zookeeper™
![Page 5: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/5.jpg)
What is a Distributed System
“A distributed system consists of multiple computers that communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal.” - Wikipedia
![Page 6: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/6.jpg)
What is Zookeeper
• An open-source, high-performance coordination service for distributed applications
• Centralized service for
– Configuration management
– Locks and synchronization, providing coordination between distributed systems
– Naming service (registry)
– Group membership
• Features
– Hierarchical namespace
– Provides watches on a znode
– Allows nodes to form a cluster
• Supports a large volume of requests for data retrieval and update
• http://zookeeper.apache.org/
Source : http://zookeeper.apache.org
![Page 7: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/7.jpg)
Zookeeper Use Cases
• Configuration Management
– Cluster member nodes bootstrap their configuration from a central source
• Distributed Cluster Management
– Node join/leave
– Node status in real time
• Naming service (e.g. DNS)
• Distributed synchronization: locks, barriers
• Leader election
• Centralized and highly reliable registry
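One common way to build the leader-election use case on Zookeeper is to give every candidate an increasing sequence number when it joins and treat the lowest surviving number as the leader. The slide does not describe a specific recipe, so the sketch below is an assumption about how it could work, written as a plain in-memory model rather than against the Zookeeper client API. The class `ElectionRegistry` and its methods are hypothetical names.

```python
import itertools
from typing import Dict, Optional


class ElectionRegistry:
    """Toy registry: each member gets an increasing sequence number on join."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self._members: Dict[int, str] = {}  # sequence number -> member name

    def join(self, name: str) -> int:
        """Register a member and return its sequence number."""
        seq = next(self._counter)
        self._members[seq] = name
        return seq

    def leave(self, seq: int) -> None:
        """Remove a member, e.g. because its session ended."""
        self._members.pop(seq, None)

    def leader(self) -> Optional[str]:
        """The member holding the lowest sequence number leads."""
        if not self._members:
            return None
        return self._members[min(self._members)]


registry = ElectionRegistry()
seq_a = registry.join("node-a")
seq_b = registry.join("node-b")
seq_c = registry.join("node-c")

first_leader = registry.leader()   # node-a joined first
registry.leave(seq_a)              # the leader goes away
second_leader = registry.leader()  # the next-lowest member takes over
print(first_leader, second_leader)
```

In this model, leadership changes only when the current leader leaves, so a node joining later never displaces an existing leader.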
![Page 8: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/8.jpg)
Zookeeper Data Model
• Hierarchical namespace
• Each node is called a “znode”
• Each znode holds data (stored as a byte[] array) and can have child znodes
– Maintains a “Stat” structure with version numbers for data changes and ACL changes, plus timestamps
– The version number increases with each change
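The data model above can be mimicked with a small in-memory tree. This is a toy sketch, not the Zookeeper client API: `Znode` and `ZnodeTree` and their methods are hypothetical names, and ACLs and timestamps are left out. It shows only slash-separated hierarchical paths, byte payloads, and a version counter that goes up on each data change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Znode:
    """Hypothetical stand-in for a znode: a byte payload, a version, children."""
    data: bytes = b""
    version: int = 0  # plays the role of the data version in the "Stat" structure
    children: Dict[str, "Znode"] = field(default_factory=dict)


class ZnodeTree:
    """Toy hierarchical namespace addressed by slash-separated paths."""

    def __init__(self) -> None:
        self.root = Znode()

    def _walk(self, path: str) -> Znode:
        node = self.root
        for part in [p for p in path.split("/") if p]:
            node = node.children[part]  # KeyError if the path does not exist
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._walk(parent_path)
        if name in parent.children:
            raise ValueError(f"{path} already exists")
        parent.children[name] = Znode(data=data)

    def set_data(self, path: str, data: bytes) -> int:
        node = self._walk(path)
        node.data = data
        node.version += 1  # the version increases with each change
        return node.version

    def get(self, path: str) -> Tuple[bytes, int]:
        node = self._walk(path)
        return node.data, node.version

    def get_children(self, path: str) -> List[str]:
        return sorted(self._walk(path).children)


tree = ZnodeTree()
tree.create("/config")
tree.create("/config/db", b"host=a")
tree.set_data("/config/db", b"host=b")
print(tree.get("/config/db"))        # data and version after one update
print(tree.get_children("/config"))  # names of the child znodes
```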
![Page 9: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/9.jpg)
Let’s recall the basic concepts of a Messaging System
![Page 10: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/10.jpg)
Point to Point Messaging (Queue)
Credit: http://fusesource.com/docs/broker/5.3/getting_started/FuseMBStartedKeyJMS.html
![Page 11: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/11.jpg)
Publish-Subscribe Messaging (Topic)
Credit: http://fusesource.com/docs/broker/5.3/getting_started/FuseMBStartedKeyJMS.html
![Page 12: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/12.jpg)
Apache Kafka
![Page 13: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/13.jpg)
Overview
• An Apache project initially developed at LinkedIn
• Distributed publish-subscribe messaging system
• Designed for processing real-time activity stream data, e.g. logs, metrics collections
• Written in Scala
• Does not follow the JMS standard and does not use the JMS APIs
• Features
– Persistent messaging
– High throughput
– Supports both queue and topic semantics
– Uses Zookeeper for forming a cluster of nodes (producer/consumer/broker)
– and many more…
• http://kafka.apache.org/
![Page 14: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/14.jpg)
How it works
Credit : http://kafka.apache.org/design.html
![Page 15: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/15.jpg)
Real time transfer
[Diagram. Components: Producer, Kafka Broker (×2), Zookeeper, Consumer1 (Group1), Consumer2 (Group1), Consumer3 (Group2), Consumer4 (Group2). Arrow labels: “get Kafka broker address”, “Streaming”, “Fetch messages”, “Update Consumed Message offset”. Region labels: “Queue Topology”, “Topic Topology”.]
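The consumer groups in this slide are what let one system offer both queue and topic semantics: inside a group each message goes to only one consumer (queue behaviour), while every group sees every message (topic behaviour). The sketch below is a simplified in-memory model of that idea, not Kafka's client API. The function `deliver` is a hypothetical name, and picking the consumer round-robin per message is an assumption made for illustration.

```python
from typing import Dict, List


def deliver(messages: List[str], groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Give every message to exactly one consumer in each group.

    Round-robin inside a group is an assumption made for this sketch.
    """
    received: Dict[str, List[str]] = {
        consumer: [] for members in groups.values() for consumer in members
    }
    for index, message in enumerate(messages):
        for members in groups.values():
            chosen = members[index % len(members)]
            received[chosen].append(message)
    return received


groups = {
    "Group1": ["Consumer1", "Consumer2"],
    "Group2": ["Consumer3", "Consumer4"],
}
result = deliver(["m0", "m1", "m2", "m3"], groups)
for consumer, got in result.items():
    print(consumer, got)
```

Putting all consumers in one group gives pure queue behaviour; giving each consumer its own group gives pure publish-subscribe behaviour.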
![Page 16: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/16.jpg)
Design Elements
• Uses the filesystem cache
• Zero-copy transfer of messages
• Batching of messages
• Batch compression
• Automatic producer load balancing
• The broker does not push messages to consumers; consumers poll (pull) messages from the broker
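The pull model and batching listed above can be pictured with a minimal in-memory model: the broker only appends to a log and answers fetch requests, and the consumer decides when to ask and how much to take. This is a sketch under stated assumptions, not Kafka's API. `ToyBroker`, `PollingConsumer`, and their methods are hypothetical names, and the filesystem cache, zero-copy transfer, and compression are not modelled.

```python
from typing import List


class ToyBroker:
    """Append-only log that never pushes; it only answers fetch requests."""

    def __init__(self) -> None:
        self._log: List[bytes] = []

    def append_batch(self, batch: List[bytes]) -> int:
        """Producer side: append a whole batch in one call; returns the next offset."""
        self._log.extend(batch)
        return len(self._log)

    def fetch(self, offset: int, max_messages: int) -> List[bytes]:
        """Consumer side: return up to max_messages starting at offset."""
        return self._log[offset:offset + max_messages]


class PollingConsumer:
    """Pulls batches at its own pace and tracks its own position."""

    def __init__(self, broker: ToyBroker, batch_size: int = 2) -> None:
        self.broker = broker
        self.batch_size = batch_size
        self.offset = 0

    def poll(self) -> List[bytes]:
        batch = self.broker.fetch(self.offset, self.batch_size)
        self.offset += len(batch)  # advance only by what was actually received
        return batch


broker = ToyBroker()
broker.append_batch([b"a", b"b", b"c"])

consumer = PollingConsumer(broker, batch_size=2)
first = consumer.poll()
second = consumer.poll()
third = consumer.poll()  # nothing new yet, so the poll comes back empty
print(first, second, third, consumer.offset)
```

Because the consumer drives the fetch, a slow consumer simply polls less often instead of being flooded by the broker.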
![Page 17: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/17.jpg)
Design Elements (Contd.)
• Cluster formation of brokers/consumers using Zookeeper
– More consumers and brokers can be introduced on the fly. Rebalancing of the new cluster is taken care of through Zookeeper
• Data is persisted in the broker
– It is not removed on consumption (only after the retention period), so if a consumer fails while consuming, the same message can be consumed again later from the broker
• Simplified storage mechanism for messages
– No separate storage for each message per consumer
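The retention behaviour described above can be sketched with a toy log that keeps messages until they age out, regardless of whether anyone has read them, while each consumer remembers only an offset. This is an illustrative model, not Kafka's storage format. `RetainedLog` and its methods are hypothetical names, and time is passed in explicitly so the example stays deterministic.

```python
from typing import List, Tuple


class RetainedLog:
    """Toy log: messages stay until they are older than retention_seconds."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._entries: List[Tuple[int, float, str]] = []  # (offset, timestamp, message)
        self._next_offset = 0

    def append(self, message: str, now: float) -> int:
        offset = self._next_offset
        self._entries.append((offset, now, message))
        self._next_offset += 1
        return offset

    def expire(self, now: float) -> None:
        """Drop entries older than the retention period, consumed or not."""
        cutoff = now - self.retention_seconds
        self._entries = [e for e in self._entries if e[1] >= cutoff]

    def read_from(self, offset: int) -> List[Tuple[int, str]]:
        """Return (offset, message) pairs at or after the given offset."""
        return [(o, m) for (o, _, m) in self._entries if o >= offset]


log = RetainedLog(retention_seconds=60)
log.append("m0", now=0)
log.append("m1", now=10)

committed = 0                           # the consumer's own position
first_try = log.read_from(committed)    # consumer reads, then fails before committing
second_try = log.read_from(committed)   # after a restart the same messages are still there
log.expire(now=65)                      # m0 is now older than 60 seconds and is dropped
after_expiry = log.read_from(committed)
print(first_try, second_try, after_expiry)
```

The only per-consumer state here is the integer `committed`; the log itself is shared by all readers.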
![Page 18: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/18.jpg)
Performance Numbers
Credit : http://research.microsoft.com/en-us/UM/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
[Charts: Producer Performance; Consumer Performance]
![Page 19: Introduction to Kafka and Zookeeper](https://reader035.vdocuments.us/reader035/viewer/2022081414/54c6dd9c4a7959261a8b45bc/html5/thumbnails/19.jpg)
Questions?
@rahuldausa on Twitter and SlideShare
http://www.linkedin.com/in/rahuldausa