kafka short

Kafka for BigData Processing, Yanai Franchi, Tikal

Upload: tikal-knowledge

Post on 27-Jan-2015

TRANSCRIPT

Page 1: Kafka short

Kafka for BigData Processing

Yanai Franchi, Tikal

Page 2: Kafka short

Find “Hot” Places

Page 3: Kafka short

Page 4: Kafka short

Let's Develop “Gogobot Checkins Heat-Map”

[Diagram. Labels: Gogobot checkin; Heat-Map Service]

Page 5: Kafka short

Key Notes

● Collector Service: collects checkins as text addresses

– We need to use a GeoLocation Service

● Upon each elapsed interval, the last locations list will be displayed as a Heat-Map in the GUI.

● Web-scale service: tens of thousands of checkins per second all over the world (imaginary, but let's do it for the exercise).
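
As a rough illustration of these notes, the sketch below geocodes text-address checkins and counts them per grid cell for each interval. It is a minimal sketch under assumptions: `geocode`, the `FAKE_GEOCODES` table, `cell_of`, the 0.5-degree cell size, and the 60-second interval are all hypothetical and only stand in for the real GeoLocation Service and the GUI.

```python
from collections import Counter

# Hypothetical stand-in for the GeoLocation Service: address -> (lat, lng).
FAKE_GEOCODES = {
    "Tel Aviv, Israel": (32.0853, 34.7818),
    "Jaffa, Israel": (32.0540, 34.7500),
    "Paris, France": (48.8566, 2.3522),
}


def geocode(address):
    """Return (lat, lng) for a text address, or None if it is unknown."""
    return FAKE_GEOCODES.get(address)


def cell_of(lat, lng, cell_deg=0.5):
    """Snap a coordinate to a grid cell identified by two integer indices."""
    return (int(lat // cell_deg), int(lng // cell_deg))


def heat_map(checkins, interval_sec=60):
    """Group (timestamp, address) checkins into per-interval cell counts."""
    intervals = {}
    for timestamp, address in checkins:
        location = geocode(address)
        if location is None:
            continue  # skip addresses that cannot be resolved
        bucket = int(timestamp // interval_sec)
        intervals.setdefault(bucket, Counter())[cell_of(*location)] += 1
    return intervals


checkins = [
    (0, "Tel Aviv, Israel"),
    (10, "Jaffa, Israel"),
    (70, "Paris, France"),
    (75, "Unknown Street 1"),
]
result = heat_map(checkins)
```

Each key of `result` is an interval index, and each value maps a grid cell to the number of checkins seen there, which is the shape of data a heat-map GUI could render once per interval.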

Page 6: Kafka short

Heat-Map Context

[Context diagram. Labels: Gogobot System with three Gogobot Micro Services; Text-Address Checkins; Heat-Map Service; Geo Location Service; Get-GeoCode(Address); Heat-Map; Last Interval Locations]

Page 7: Kafka short

Tons of Addresses Arriving Every Second

Page 8: Kafka short

First Reaction...

Page 9: Kafka short

[Architecture diagram. Labels: Checkin HTTP Firehose; Checkin HTTP Reactor; Publish Checkins; Checkins Topic; Storm Heat-Map Topology; Hotzones Topic; Web App; Push via WebSocket; HDFS]

Page 10: Kafka short

Page 11: Kafka short

They Are All Good

But not for all use-cases

Page 12: Kafka short

Kafka

A little introduction

Page 13: Kafka short

Page 14: Kafka short

Why?

Page 15: Kafka short

LinkedIn Original Architecture

Page 16: Kafka short

Page 17: Kafka short

What LinkedIn Wanted...

Page 18: Kafka short

Looks Familiar: Use Messaging (e.g. JMS, RabbitMQ)

Page 19: Kafka short

Page 20: Kafka short

Page 21: Kafka short

Page 22: Kafka short

Page 23: Kafka short

It Didn't Scale...

Page 24: Kafka short

Paradigm Change: Do NOT Track Message Consumption
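
One way to picture this change is an append-only log in which the broker side keeps no per-consumer state and every consumer remembers its own position. The classes below are a toy model under that assumption; `Log`, `Consumer`, and `poll` are hypothetical names for illustration and are not Kafka's client API.

```python
class Log:
    """Append-only message log that keeps no per-consumer state."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=10):
        """Return up to max_messages starting at the given offset."""
        return self._messages[offset:offset + max_messages]


class Consumer:
    """A consumer that tracks its own offset instead of the log doing so."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        """Fetch the next batch and advance this consumer's own offset."""
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


log = Log()
for message in ["a", "b", "c"]:
    log.append(message)

fast = Consumer(log)
slow = Consumer(log)
first_batch = fast.poll(2)   # reads "a" and "b"
everything = slow.poll()     # an independent consumer still sees all three
```

Because the position lives with the consumer, rewinding is just `fast.offset = 0`, and adding another consumer costs the log nothing.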

Page 25: Kafka short

Page 26: Kafka short

Page 27: Kafka short

Page 28: Kafka short

Stateless Broker & Doesn't Fear the File System

Page 29: Kafka short

Topics

● Logical collections of partitions (the physical files).

● A broker contains some of the partitions for a topic.
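
The relationship between a topic, its partitions, and the brokers can be sketched as two small functions. Both choices here are assumptions made for illustration, not Kafka's actual algorithms: partitions are spread over brokers round-robin, and a message key is mapped to a partition with a CRC32 hash. The function and broker names are hypothetical.

```python
import zlib


def assign_partitions(num_partitions, brokers):
    """Spread a topic's partitions over the brokers round-robin."""
    assignment = {broker: [] for broker in brokers}
    for partition in range(num_partitions):
        owner = brokers[partition % len(brokers)]
        assignment[owner].append(partition)
    return assignment


def partition_for(key, num_partitions):
    """Pick a partition from a stable hash of the message key."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


layout = assign_partitions(6, ["broker-1", "broker-2", "broker-3"])
chosen = partition_for("checkin-from-tel-aviv", 6)
```

With six partitions and three brokers, each broker ends up holding two partitions of the topic, and the same key always lands in the same partition.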

Page 30: Kafka short

A Partition Is Consumed by Exactly One Consumer per Group
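
The rule can be illustrated with a small assignment function: inside one group every partition gets exactly one owner, while a second group receives its own independent assignment of the same partitions. This is a sketch only; `assign_to_group`, the round-robin strategy, and the consumer names are hypothetical and do not reproduce Kafka's rebalancing protocol.

```python
def assign_to_group(partitions, consumers):
    """Give every partition exactly one owner within a consumer group."""
    ordered_consumers = sorted(consumers)
    owners = {}
    for index, partition in enumerate(sorted(partitions)):
        owners[partition] = ordered_consumers[index % len(ordered_consumers)]
    return owners


partitions = [0, 1, 2, 3]
group_a = assign_to_group(partitions, ["a-1", "a-2"])
group_b = assign_to_group(partitions, ["b-1"])

# If a consumer leaves the group, its partitions move to the remaining ones.
group_a_after_leave = assign_to_group(partitions, ["a-1"])
```

Each group covers all four partitions, and no partition has two owners inside the same group.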

Page 31: Kafka short

Distributed & Fault-Tolerant

Pages 32-44: Kafka short

[Diagram sequence: a cluster of Broker 1, Broker 2, and Broker 3 with ZooKeeper, Producer 1, Producer 2, Consumer 1, and Consumer 2. Pages 33-38 also show Broker 4; pages 39-41 show three brokers again; pages 42-44 show only Consumer 1.]

Page 45: Kafka short

Performance Benchmark

1 Broker, 1 Producer, 1 Consumer

Page 46: Kafka short

Page 47: Kafka short

Page 48: Kafka short

LinkedIn Kafka Performance (2012)

● 8 nodes per datacenter

– ~20 GB RAM available for Kafka

– 6TB storage, RAID 10, basic SATA drives

● 10 billion messages/day

● Sustained peak:

– 172,000 messages/second written

– 950,000 messages/second read

● 367 topics

● 40 real-time consumers

● Many ad hoc consumers

● 9.5TB log retained (~ 6 days)

● End-to-end delivery time: A few seconds
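
To put these figures in proportion, the short calculation below derives the average rate implied by the daily volume and compares it with the stated peaks. It assumes that the 10 billion messages/day figure counts written messages; the slide does not say so explicitly.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the slide.
messages_per_day = 10_000_000_000
peak_written_per_sec = 172_000
peak_read_per_sec = 950_000

# Assumption: the daily figure refers to messages written.
average_per_sec = messages_per_day / SECONDS_PER_DAY
peak_to_average = peak_written_per_sec / average_per_sec
read_to_write = peak_read_per_sec / peak_written_per_sec

print(f"average written: {average_per_sec:,.0f} messages/second")
print(f"peak write / average write: {peak_to_average:.2f}")
print(f"peak read / peak write: {read_to_write:.2f}")
```

Under that assumption the average is roughly 116,000 messages/second, so the sustained write peak is about 1.5 times the average, and at peak each written message is read about 5.5 times.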

Page 49: Kafka short

Thanks