apache: big data 2015 kafka architecture the best of apache · pdf filethe best of apache...

33
The Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Upload: lamhanh

Post on 10-Mar-2018

235 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

The Best of Apache Kafka Architecture

Ranganathan Balashanmugam

@ran_than

Apache: Big Data 2015

Page 2: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Helló Budapest

Page 3: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

About Me

❏ Graduated as Civil Engineer.❏ <dev> 10+ years </dev>❏ <Thoughtworker from=”India”/>❏ Organizer of Hyderabad Scalability Meetup with 2000+

members.

Page 4: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

“Form follows function.”

- Louis Sullivan

Page 5: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Gravity DamIndirasagar Dam, India

img src: http://www.montanhydraulik.in

Page 6: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Forces on a gravity dam

Dam weight

Head Water

Tail Water

Uplift

Page 7: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

❏ publish-subscribe messaging service❏ distributed commit/write-ahead log

“producers produce, consumers consume, in large distributed reliable way -- real time”

Page 8: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

❏ DBs❏ Logs❏ Brokers❏ HDFS

“For highly distributed messages, Kafka stands out.”

Why Kafka?

Page 9: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Kafka Vs ________

src: https://softwaremill.com/mqperf/

Page 10: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Timeline

2011 2012 2013 2014 2015

Open sourced by LinkedIn, as version 0.6

Graduated from Apache

Latest stable - 0.8.2.1

Several Engineers who built Kakfa create Confluent

Page 11: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

A Kafka Message

CRC attributes key length

key message message length message content

kafka.message.Message

magic

Change requested:KAFKA-2511

Page 12: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Producers - push

Kafka Broker

org.apache.kafka.clients.producer.KafkaProducer

Response => [TopicName [Partition ErrorCode Offset]]

Request => RequiredAcks Timeout [TopicName [Partition MessageSetSize MessageSet]]

Page 13: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Topic

number of messages

time size

Remove messages based on

kafka.common.Topic

Page 14: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Partitions

kafka.cluster.Partition

Serves: Horizontal scaling, Parallel consumer reads

Page 15: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Consumers - pull

kafka.consumer.ConsumerConnector,kafka.consumer.SimpleConsumer

Consumer 1

Consumer 2

Page 16: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Consumer offsetscommitting and fetching consumer offsets

img src: http://www.reynanprinting.com/photos/undefined/impresion-offset1.jpg

Page 17: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

kafka:// - protocol

● Metadata● Send● Fetch● Offsets● Offset commit● Offset fetch

“Binary protocol over TCP”

Page 18: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Mechanical Sympathy"The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry." - Henry Peteroski

Image source: http://www.theguide2surrey.com

Page 19: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Persistence“Everything is faster till the disk IO.”

Page 20: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Disk faster than RAM

src: http://queue.acm.org/detail.cfm?id=1563874

Page 21: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Linear Read & Writes

On high level there are only two operations:

Append to end of logfetch messages from a partition beginning from a particular message id

sequential file I/O

Page 22: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

“Let us play pictionary”

Page 23: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Linux Page Cache

“Kafka ate my RAM”

Page 24: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

ZeroCopy

src: http://www.ibm.com/developerworks/library/j-zerocopy/

Page 25: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Batchingsmall latency to improve throughput

img src: https://prashanthpanduranga.files.wordpress.com/2015/05/tirupati.jpg

Page 26: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Compressionbandwidth is more expensive per-byte to scale than disk I/O, CPU, or network bandwidth capacity within a facility

kafka.message.CompressionCodec

Page 27: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Log compaction

img src: http://kafka.apache.org/083/documentation.htmlkafka.log.LogCleaner, LogCleanerManager

Page 28: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Message Delivery

Atleast once Atmost once Exactly once

Page 29: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Replicationun-replicated = replication factor of one

Page 30: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Quorum based

● Better latency● To tolerate “f” failures, need “2f+1” replicas

Page 31: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

Primary-backup replication

Broker 1 Broker 2 Broker 3 Broker 4

Topic 1 Topic 1 Topic 1

Topic 2 Topic 2 Topic 2

Topic 3 Topic 3Topic 3

Page 32: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

ZooKeepercluster coordinator

Page 33: Apache: Big Data 2015 Kafka Architecture The Best of Apache · PDF fileThe Best of Apache Kafka Architecture Ranganathan Balashanmugam @ran_than Apache: Big Data 2015

THANK YOUFor questions or suggestions:

Ran.ga.na.than B

[email protected]

@ran_than