Introduction to Kafka Akash Vacher 2015/12/5

Upload: akash-vacher

Post on 14-Apr-2017


Page 1: Introduction to Kafka

Introduction to Kafka

Akash Vacher

2015/12/5

Page 2: Introduction to Kafka

▪ Akash Vacher, SRE, Data Infrastructure Streaming (Bengaluru), LinkedIn

Page 3: Introduction to Kafka

SRE?

▪ Site Reliability Engineers

– Administrators

– Architects

– Developers

▪Keep the site running, always

Page 4: Introduction to Kafka

Agenda

▪ Kafka Overview

▪ Some facts and figures

▪ Basic Kafka concepts

▪ Some use cases

▪ Q and A

Page 5: Introduction to Kafka

Kafka Overview

▪ High-throughput distributed messaging system

▪ Kafka guarantees:

– At-least-once delivery

– Strong ordering within a partition

▪ Developed at LinkedIn and open-sourced in early 2011

▪ Implemented in Scala and Java
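The at-least-once guarantee can be made concrete with a small in-memory model. This is plain Python, not the Kafka client API, and `log`, `committed` and `run` are hypothetical names. The sketch assumes the consumer commits its offset only after processing a record. A crash between processing and committing therefore causes a redelivery, so no record is lost but some may be processed twice.

```python
# Hypothetical model, not the Kafka client API: a consumer that commits its
# offset only *after* processing each record. A crash between processing and
# committing means the record is processed again on restart: at least once.

log = ["m0", "m1", "m2", "m3"]   # records in one partition, indexed by offset
committed = 0                    # next offset to read, as stored for the consumer
processed = []                   # side effects, e.g. rows written downstream


def run(crash_after_processing_offset=None):
    """Consume from the committed offset; optionally 'crash' before a commit."""
    global committed
    offset = committed
    while offset < len(log):
        processed.append(log[offset])          # 1. process the record
        if offset == crash_after_processing_offset:
            return                             # crash: the commit never happens
        offset += 1
        committed = offset                     # 2. commit the next offset to read


run(crash_after_processing_offset=1)  # processes m0 and m1; the commit for m1 is lost
run()                                 # restarts at offset 1, so m1 is processed twice
print(processed)  # ['m0', 'm1', 'm1', 'm2', 'm3']
```

Downstream processing therefore has to tolerate duplicates, for example by making writes idempotent.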

Page 6: Introduction to Kafka

Kafka users

Source: https://cwiki.apache.org/confluence/display/KAFKA/Powered+By

Page 7: Introduction to Kafka

Attributes of a Kafka Cluster

• Disk Based

• Durable

• Scalable

• Low Latency

• Finite Retention

Page 8: Introduction to Kafka

Motivation

▪ Unified platform to handle all real-time data feeds

▪ High throughput

▪ Stream Processing

▪ Horizontally scalable

Page 9: Introduction to Kafka

Before

Page 10: Introduction to Kafka

After

Page 11: Introduction to Kafka

How is Kafka used at LinkedIn?

▪ Monitoring (inGraphs)

▪ User tracking

▪ Email and SMS notifications

▪ Stream processing (Samza)

▪ Database Replication

Page 12: Introduction to Kafka

Facts and figures

▪ Over 1,300,000,000,000 (1.3 trillion) messages are produced to Kafka every day at LinkedIn

▪ 300 terabytes of inbound and 900 terabytes of outbound traffic

▪ 4.5 million messages per second on a single cluster

▪ Kafka runs on ~1300 servers at LinkedIn
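A quick sanity check puts these numbers in perspective. The slide does not state a time unit for the traffic figures. This sketch assumes they are per day, like the message count, and uses decimal units (1 TB = 1,000 GB).

```python
# Back-of-the-envelope rates derived from the figures above.
# Assumption: the message count and both traffic figures are per day.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400

messages_per_day = 1_300_000_000_000      # 1.3 trillion
inbound_tb_per_day = 300
outbound_tb_per_day = 900

avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
avg_inbound_gb_per_sec = inbound_tb_per_day * 1000 / SECONDS_PER_DAY  # decimal TB -> GB
fan_out = outbound_tb_per_day / inbound_tb_per_day

print(f"{avg_msgs_per_sec:,.0f} messages/s on average")           # 15,046,296
print(f"{avg_inbound_gb_per_sec:.2f} GB/s inbound on average")    # 3.47
print(f"outbound is ~{fan_out:.0f}x inbound")                     # 3
```

The daily total averages roughly 15 million messages per second across all clusters. Outbound traffic is about three times inbound, which suggests each byte written is read roughly three times on average. This reading depends on how replication and mirroring traffic are counted, which the slide does not say.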

Page 13: Introduction to Kafka

Building blocks

Page 14: Introduction to Kafka

The humble log
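The log at the heart of Kafka is an ordered, append-only sequence of records, each identified by its offset. Below is a minimal in-memory sketch. The `Log` class is hypothetical and does not model retention. Reading does not remove records, so several readers can each read the same log from their own offset.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """An append-only sequence of records; each record's offset is its index."""
    records: list = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Add a record at the end and return the offset it was assigned."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset, without removing them."""
        return self.records[offset:offset + max_records]

    def end_offset(self) -> int:
        """Offset that the next appended record will receive."""
        return len(self.records)


log = Log()
for msg in [b"a", b"b", b"c"]:
    log.append(msg)
print(log.read(1))       # [b'b', b'c']  (reading does not consume records)
print(log.end_offset())  # 3
```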

Page 15: Introduction to Kafka

Anatomy of a topic
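A topic is split into partitions, and each partition is its own log. The sketch below shows keyed partitioning with a hypothetical `partition_for` helper: records with the same key map to the same partition, so their relative order is preserved there. `zlib.crc32` is only a stand-in hash. The Java producer's default partitioner hashes keys with murmur2.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition. crc32 stands in for Kafka's real key hash."""
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3
topic = [[] for _ in range(NUM_PARTITIONS)]  # one append-only list per partition

events = [(b"user-1", "login"), (b"user-2", "login"),
          (b"user-1", "click"), (b"user-1", "logout")]
for key, value in events:
    topic[partition_for(key, NUM_PARTITIONS)].append((key, value))

# All events for user-1 land in one partition, in the order they were sent.
p = partition_for(b"user-1", NUM_PARTITIONS)
print([v for k, v in topic[p] if k == b"user-1"])  # ['login', 'click', 'logout']
```

Ordering is guaranteed only within a partition. Events for different keys in different partitions can be consumed in any relative order.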

Page 16: Introduction to Kafka

Consumer groups
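Within a consumer group, each partition is read by exactly one member, while separate groups each receive the full stream. The sketch below shows that division using a hypothetical `assign` helper. It is similar in spirit to Kafka's round-robin assignor but is not its actual implementation.

```python
def assign(partitions, consumers):
    """Round-robin partitions over the consumers of one group.

    Each partition goes to exactly one consumer in the group, so the group as a
    whole reads every partition once; a consumer may own several partitions.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2, 3, 4]
print(assign(partitions, ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3]}
# A second group independently gets every partition again:
print(assign(partitions, ["x1", "x2", "x3"]))
# {'x1': [0, 3], 'x2': [1, 4], 'x3': [2]}
```

If a group has more consumers than partitions, the extra consumers receive no partitions and stay idle.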

Page 17: Introduction to Kafka

Bird’s eye view

Page 18: Introduction to Kafka

Kafka in action

[Diagram: Producer, Broker holding partition replicas AP0 and AP1, Consumer, Zookeeper]

Page 19: Introduction to Kafka

Performance recipe

▪ OS page cache

▪ Linear IO: never fear the file system!

▪ sendfile() system call (zero-copy transfer)

▪ Message batching
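Message batching spreads the cost of each request over many records. Below is a minimal sketch of a batcher that flushes on size or on time. The `Batcher` class and its parameters are hypothetical, loosely analogous to the Java producer's `batch.size` and `linger.ms` settings. Time is passed in explicitly so the example is deterministic.

```python
class Batcher:
    """Accumulate records and send them as one batch.

    A batch is sent when it reaches max_bytes, or when linger_s has passed
    since its first record (checked by poll()).
    """

    def __init__(self, max_bytes, linger_s, send):
        self.max_bytes = max_bytes
        self.linger_s = linger_s
        self.send = send          # callback that ships one batch as one request
        self.batch = []
        self.size = 0
        self.started = None

    def add(self, record: bytes, now: float):
        if not self.batch:
            self.started = now    # the linger clock starts with the first record
        self.batch.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes:
            self.flush()

    def poll(self, now: float):
        """Flush a partially filled batch once it has lingered long enough."""
        if self.batch and now - self.started >= self.linger_s:
            self.flush()

    def flush(self):
        if self.batch:
            self.send(self.batch)
            self.batch, self.size, self.started = [], 0, None


sent = []
b = Batcher(max_bytes=10, linger_s=0.005, send=sent.append)
for _ in range(7):
    b.add(b"abcd", now=0.0)   # 4 bytes each: a batch fills every 3 records
b.poll(now=0.010)             # the leftover record is flushed after lingering
print([len(batch) for batch in sent])  # [3, 3, 1]
```

Seven records leave as three requests instead of seven. The linger timeout bounds the extra latency a partially filled batch can add.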

Page 20: Introduction to Kafka

Operating Kafka

▪ Broker hardware

– Cisco C240, Intel Xeon quad-core, 64 GB RAM, 14-disk RAID 10

▪ Zookeeper hardware

– 5 + 1 ensemble, 64 GB RAM, 500 GB SSD

Page 21: Introduction to Kafka

Operating Kafka

▪ Monitoring

– Under-replicated partitions

– Unclean leader election

– Lag monitoring

– Burrow

▪ Cluster rebalance

– Size-wise rebalance

– Partition-wise rebalance
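Two of the monitoring signals listed above reduce to simple checks. The sketch runs them over a hypothetical snapshot of cluster state, with a made-up topic name, broker IDs and offsets. A partition is under-replicated when its in-sync replica set (ISR) is smaller than its assigned replica set. Consumer lag is the log end offset minus the group's committed offset.

```python
# Hypothetical snapshot of cluster state, as a monitoring job might collect it.
partitions = {
    # (topic, partition): assigned replica broker IDs and in-sync replicas (ISR)
    ("page-views", 0): {"replicas": [1, 2, 3], "isr": [1, 2, 3]},
    ("page-views", 1): {"replicas": [2, 3, 1], "isr": [2]},
}
log_end_offsets = {("page-views", 0): 1_000, ("page-views", 1): 980}
committed_offsets = {("page-views", 0): 990, ("page-views", 1): 400}  # one group


def under_replicated(parts):
    """Partitions whose ISR is smaller than their assigned replica set."""
    return sorted(tp for tp, s in parts.items() if len(s["isr"]) < len(s["replicas"]))


def consumer_lag(end_offsets, committed):
    """Per-partition lag: records written but not yet committed by the group.

    Assumption: a partition with no committed offset is treated as lagging
    by its full end offset.
    """
    return {tp: end_offsets[tp] - committed.get(tp, 0) for tp in end_offsets}


print(under_replicated(partitions))  # [('page-views', 1)]
lag = consumer_lag(log_end_offsets, committed_offsets)
print(lag)  # {('page-views', 0): 10, ('page-views', 1): 580}
```

Burrow, LinkedIn's lag-monitoring service, goes further. It evaluates how lag evolves over a window of offset commits instead of comparing it against a fixed threshold.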

Page 22: Introduction to Kafka

Kafka at LinkedIn

▪ Multiple data centers

▪ Mirror data

▪ Cluster Types

– Tracking

– Metrics

– Queuing

▪ Data transport from applications to Hadoop, and back

Page 23: Introduction to Kafka

Metrics collection

▪ Building blocks

– Sensors

– RRD

– Front end

▪ Facts & figures

– 320,000,000 metrics collected per minute

– 530 TB of disk space

– Over 210,000 metrics collected per service
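Converting the collection rate to other units gives a feel for the load. This assumes "metrics collected" counts individual data points.

```python
# Unit conversions of the collection rate stated above.
METRICS_PER_MINUTE = 320_000_000

per_second = METRICS_PER_MINUTE / 60
per_day = METRICS_PER_MINUTE * 60 * 24

print(f"{per_second:,.0f} metrics/s")   # 5,333,333
print(f"{per_day:,} metrics/day")       # 460,800,000,000
```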

Page 24: Introduction to Kafka

InGraphs

Page 25: Introduction to Kafka

Kafka for database replication: master-slave
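One common design for log-based master-slave replication has the master publish each committed change to a Kafka topic. Each slave then applies the changes in log order. Below is a minimal sketch of that design, assuming a single-partition topic and a hypothetical `Replica` class standing in for a slave. Delivery is at least once, so the replica tracks the next offset to apply and skips redelivered events, which makes replay idempotent.

```python
# Hypothetical change events, in the order the master committed them.
# Each event carries the offset it was written at in the single-partition topic.
changelog = [
    (0, ("set", "alice", 1)),
    (1, ("set", "bob", 2)),
    (2, ("set", "alice", 3)),
    (3, ("delete", "bob", None)),
]


class Replica:
    """Slave that applies changes strictly in log order and remembers its position."""

    def __init__(self):
        self.data = {}
        self.next_offset = 0   # first offset not yet applied

    def apply(self, events):
        for offset, (op, key, value) in events:
            if offset < self.next_offset:
                continue                       # already applied: skip the redelivery
            if op == "set":
                self.data[key] = value
            elif op == "delete":
                self.data.pop(key, None)
            self.next_offset = offset + 1


replica = Replica()
replica.apply(changelog[:3])
replica.apply(changelog)   # offsets 0..2 are redelivered and skipped; 3 is applied
print(replica.data)        # {'alice': 3}
```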

Page 26: Introduction to Kafka

Kafka for database replication: multi-master

Page 27: Introduction to Kafka

How Can You Get Involved?

▪ http://kafka.apache.org

▪ Join the mailing lists: [email protected]

▪ irc.freenode.net - #apache-kafka

▪ Contribute

Page 28: Introduction to Kafka

Questions?