Devoxx fr 2016 - Apache Kafka - Stream Data Platform


Upload: xebia-france

Post on 13-Jan-2017


TRANSCRIPT

Page 1: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

#DevoxxFR

Hands-on

Kafka: http://kafka.apache.org/downloads.html
Hands-on: https://github.com/mblanc/hands_on_kafka.git

Page 2: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Matthieu Blanc (@matthieublanc)
Sylvain Lequeux (@slequeux)

Page 3: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Messaging System?

Page 4: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

History

Jay Kreps, Neha Narkhede, Jun Rao

Page 5: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

[Diagram: WebApp, Relational DB, NoSQL DB, DWH, Hadoop, Monitoring, Logs]

Page 6: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Big Data?

[Diagram: three WebApps, Relational DB, NoSQL DB, DWH, Hadoop, ActiveMQ, Logs, Monitoring, Search]

Page 7: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

BIG MESS!

[Diagram: three WebApps, Relational DB, NoSQL DB, DWH, Hadoop ETL, ActiveMQ, Logs, Monitoring, Search]

Page 8: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Stream Data Platform

Page 9: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Features

● Decoupling systems
● High throughput
● Distributed - horizontal scaling
● Multiple consumers
● Persistence
● Automatic recovery from broker failure

Page 10: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

RabbitMQ/ActiveMQ?

● Cost
● Persistence
● Batch system -> performance drops
● Large-scale stream processing
● Ordering guarantees

Page 11: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Architecture

[Diagram: Producers, a Kafka Cluster made of several Brokers, Zookeeper, Consumers]

Page 12: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Distributed Commit Logs

[Diagram: a log of records numbered 1 to 19; labels: 1st record, next record written]

Reads (sequential access = high performance)
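
The commit-log idea on this slide can be sketched as a toy in-memory structure. This is an illustration only, not Kafka code: the class name `CommitLog` and its methods are hypothetical, and offsets here start at 0 while the slide numbers records from 1.

```python
class CommitLog:
    """Toy append-only log (hypothetical, for illustration; not Kafka's implementation)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Read up to max_records sequentially, starting at the given offset."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for event in ["signup", "login", "click"]:
    log.append(event)
```

Records are never modified in place: writers only append, and readers scan forward from an offset they choose.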

Page 13: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

[Diagram: three Producers write to Partitions #1, #2 and #3; each partition is a sequence of offsets running from Old to New, and the next writes land at offsets 19, 16 and 17]

message: (key byte[], value byte[])
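
A minimal sketch of how keyed messages could be spread over partitions, assuming a hash-modulo scheme. The names `choose_partition` and `Topic` are hypothetical, and `zlib.crc32` is a stand-in hash chosen for this example, not the hash used by the Kafka client.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index (crc32 is an assumption made for this sketch)."""
    return zlib.crc32(key) % num_partitions


class Topic:
    """Toy topic: a list of partitions, each an append-only list of (key, value) pairs."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key: bytes, value: bytes):
        """Append the message to its partition and return (partition, offset)."""
        p = choose_partition(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


topic = Topic(num_partitions=3)
first = topic.send(b"user-42", b"login")
second = topic.send(b"user-42", b"click")
```

Because the partition depends only on the key, all messages with the same key land in the same partition and keep their relative order there.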

Page 14: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Topic storage

[Diagram: Partition #1 with offsets 1 to 18; labels: directory, segment = file]

Page 15: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Fast

● Sequential access
● PageCache
● Linux: sendfile()
● Compression

Source: http://queue.acm.org/detail.cfm?id=1563874

Page 16: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Fast

Page 17: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Consumer Group

[Diagram: a Producer writes to Partitions #1, #2 and #3; Consumer Group A has three consumers and Consumer Group B has two; label: Consumption]

Page 18: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Fault tolerant consumption

[Diagram: a Producer writes to Partitions #1, #2 and #3; Consumer Group A has three consumers and Consumer Group B has two]

Automatic rebalancing on failure
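
The rebalancing behaviour can be sketched with a simplified round-robin assignment. This is an assumption for illustration: the function `assign` and the consumer names are hypothetical, and Kafka's own assignment strategies are pluggable and more involved.

```python
def assign(partitions, consumers):
    """Spread partitions over the live consumers of one group, round-robin."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = [1, 2, 3]
group_a = {"a1", "a2", "a3"}

before = assign(partitions, group_a)          # all three consumers alive
after = assign(partitions, group_a - {"a2"})  # consumer a2 has failed
```

After the failure, the partition that belonged to the dead consumer is handed to a survivor, so every partition is still read by exactly one member of the group.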

Page 19: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Consumer group

[Diagram: Partitions #1, #2 and #3 on an Old-to-New axis; Consumer group 2 is positioned near the old end, Consumer group 1 near the new end]

Group  Topic  Partition  Offset
1      log    1          18
1      log    2          12
1      log    3          14
2      log    1          1
2      log    2          0
2      log    3          3
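
The offset table above can be modelled as a map keyed by (group, topic, partition). This is a toy sketch: the class `OffsetStore` and its methods are hypothetical names, and it says nothing about where Kafka actually stores these offsets.

```python
class OffsetStore:
    """Toy store of consumer positions, keyed by (group, topic, partition)."""

    def __init__(self):
        self._offsets = {}

    def commit(self, group, topic, partition, offset):
        """Record how far a group has read in one partition."""
        self._offsets[(group, topic, partition)] = offset

    def position(self, group, topic, partition, default=0):
        """Return the recorded offset, or a default for an unknown group/partition."""
        return self._offsets.get((group, topic, partition), default)


store = OffsetStore()
# Values taken from the table on the slide.
for group, partition, offset in [(1, 1, 18), (1, 2, 12), (1, 3, 14),
                                 (2, 1, 1), (2, 2, 0), (2, 3, 3)]:
    store.commit(group, "log", partition, offset)
```

Each group keeps its own position per partition, which is why group 2 can lag far behind group 1 on the same topic without affecting it.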

Page 20: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Replicas/ISRs

Topic: foo
Partitions: 3
Replicas: 3

[Diagram: Broker #0, Broker #1 and Broker #2 each hold a replica of Partitions #0, #1 and #2; three replicas are marked Leader; a Producer (Writes) and a Consumer are attached to the cluster]
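
A rough sketch of replica placement and leader failover for the topic on this slide (3 partitions, 3 replicas, 3 brokers). Both functions are hypothetical simplifications: the placement is a plain round-robin, and the failover rule "first replica that is still in sync" is an assumption for illustration, not a description of Kafka's controller.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Place replicas round-robin; the first broker in each list is the preferred leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def leader(replicas, in_sync):
    """Pick the first replica that is still in sync, or None if there is none."""
    for broker in replicas:
        if broker in in_sync:
            return broker
    return None


placement = place_replicas(num_partitions=3, brokers=[0, 1, 2], replication_factor=3)
leaders_ok = {p: leader(r, in_sync={0, 1, 2}) for p, r in placement.items()}
leaders_after_failure = {p: leader(r, in_sync={0, 2}) for p, r in placement.items()}
```

With all brokers in sync, each broker leads one partition; when broker 1 drops out, its partition gets a new leader among the remaining in-sync replicas.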

Page 21: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Kafka 0.9 - New Consumer

● Unified consumer API
● Much simpler and thinner
● Allows for larger groups with far faster rebalancing
● Decouples Kafka clients from Zookeeper!

Page 22: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Security

● Authentication: Kerberos / TLS certificate
● Authorization: unix-like permissions system
● Encryption on the wire: SSL
● Encryption at rest: encrypting individual fields / filesystem security features
● User-defined quotas

Page 23: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Kafka Connect

[Diagram: Data Source -> Kafka Connect -> Kafka -> Kafka Connect -> Data Sink]

Page 24: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Kafka Streams

[Diagram: Data Source -> Kafka Connect -> Kafka -> Kafka Connect -> Data Sink, with Kafka Streams attached to Kafka]

Page 25: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Kafka Enterprise Ready

Jay Kreps, Neha Narkhede, Jun Rao

[Timeline: 2011, 2012, 2014]

Page 26: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Use cases

● User behaviour, click stream analysis
● Infrastructure monitoring and security
● Telemetry data from mobile/sensors
● IoT
● Log analysis
● ...

Page 27: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

Used by

● LinkedIn: activity stream, metrics
● Netflix: real-time monitoring
● Twitter: real-time data pipeline
● Spotify: log delivery
● Loggly: log collection and processing
● Mozilla: telemetry data
● Microsoft: Ads, Bing, Office
● Airbnb, Square, Uber, Criteo, OVH ...

Page 28: Devoxx fr 2016 - Apache Kafka - Stream Data Platform

GL HF !

● Download Kafka : http://kafka.apache.org/downloads.html

● Git Clone : https://github.com/mblanc/hands_on_kafka.git

● Open : reveal.js/index_java.html