Apache Kafka - Scalable Message Processing and more!

BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH Apache Kafka Scalable Message Processing and more! Guido Schmutz @ gschmutz guidoschmutz.wordpress.com

Upload: guido-schmutz

Post on 16-Apr-2017


Page 1: Apache Kafka - Scalable Message-Processing and more !

BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH

Apache Kafka
Scalable Message Processing and more!

Guido Schmutz

@gschmutz guidoschmutz.wordpress.com

Page 2: Apache Kafka - Scalable Message-Processing and more !

Guido Schmutz

• Working at Trivadis for more than 20 years
• Oracle ACE Director for Fusion Middleware and SOA
• Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
• Member of Trivadis Architecture Board
• Technology Manager @ Trivadis

More than 30 years of software development experience

Contact: [email protected]
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz

2 8.12.2016 Big Data & Fast Data

Page 3: Apache Kafka - Scalable Message-Processing and more !

Agenda

1. Introduction & Motivation
2. Kafka Core
3. Kafka Connect
4. Kafka Streams
5. Kafka and ”Big Data” / ”Fast Data” Ecosystem
6. Confluent Data Platform
7. Summary

Apache Kafka - Scalable Message Processing and more!3

Page 4: Apache Kafka - Scalable Message-Processing and more !

Introduction & Motivation


Page 5: Apache Kafka - Scalable Message-Processing and more !

Traditional Big Data Architecture

[Diagram. Source systems: Billing & Ordering, CRM / Profile, Marketing Campaigns. Ingestion: File Import / SQL Import. Big Data Cluster (Hadoop Cluster): Distributed Filesystem, Parallel Processing, NoSQL, SQL, Search; workloads: Machine Learning, Graph Algorithms, Natural Language Processing. Consumers: Enterprise Data Warehouse, BI Tools, Search, Online & Mobile Apps.]

Page 6: Apache Kafka - Scalable Message-Processing and more !

Event Hub – handle event stream data

[Diagram. Event sources: Location, Social, Clickstream, Sensor Data, Call Center, Weather Data, Mobile Apps. Source systems: Billing & Ordering, CRM / Profile, Marketing Campaigns. Event Hub with Data Flow into the Big Data Cluster (Hadoop Cluster): Distributed Filesystem, Parallel Processing, NoSQL, SQL, Search; workloads: Machine Learning, Graph Algorithms, Natural Language Processing. Consumers: Enterprise Data Warehouse, BI Tools, Search, Online & Mobile Apps.]

Page 7: Apache Kafka - Scalable Message-Processing and more !

Event Hub – taking Velocity into account

[Diagram. Event sources: Location, Social, Clickstream, Sensor Data, Call Center, Weather Data, Mobile Apps. Source systems: Billing & Ordering, CRM / Profile, Marketing Campaigns, via File Import / SQL Import. Event Hub feeding two paths. Batch Analytics on the Big Data Cluster (Hadoop Cluster): Distributed Filesystem, Parallel Processing, NoSQL, SQL, Search. Streaming Analytics: Stream Analytics, NoSQL, Reference / Models. Consumers: Dashboard, Enterprise Data Warehouse, BI Tools, Search, Online & Mobile Apps.]

Page 8: Apache Kafka - Scalable Message-Processing and more !

Kafka Stream Data Platform

Source: Confluent

Page 9: Apache Kafka - Scalable Message-Processing and more !

Kafka Core


Page 10: Apache Kafka - Scalable Message-Processing and more !

Apache Kafka - Overview

Distributed publish-subscribe messaging system

Designed for processing real-time activity stream data (logs, metrics collections, social media streams, …)

Initially developed at LinkedIn, now part of Apache

Does not use JMS API and standards

Kafka maintains feeds of messages in topics


Page 11: Apache Kafka - Scalable Message-Processing and more !

Apache Kafka - Motivation

LinkedIn’s motivation for Kafka was:

• “A unified platform for handling all the real-time data feeds a large company might have.”

Must haves

• High throughput to support high volume event feeds.

• Support real-time processing of these feeds to create new, derived feeds.

• Support large data backlogs to handle periodic ingestion from offline systems.

• Support low-latency delivery to handle more traditional messaging use cases.

• Guarantee fault-tolerance in the presence of machine failures.


Page 12: Apache Kafka - Scalable Message-Processing and more !

Kafka High Level Architecture

The who is who
• Producers write data to brokers.
• Consumers read data from brokers.
• All this is distributed.

The data
• Data is stored in topics.
• Topics are split into partitions, which are replicated.

[Diagram: Kafka Cluster with Broker 1, Broker 2 and Broker 3, a ZooKeeper Ensemble, three Producers and three Consumers.]
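How a message ends up in one particular partition of a topic can be illustrated with a small sketch. This is a toy model only: the function name `choose_partition` and the CRC32 hash are assumptions made for illustration, and Kafka's own default partitioner uses a different hash function. The point is just that the same key always maps to the same partition, so messages for one key stay in order.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Toy model: uses CRC32 for determinism. This is NOT the hash that
    Kafka's default partitioner uses; it only shows the principle.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # Hypothetical keys: one per truck, spread over 3 partitions.
    for key in (b"truck-1", b"truck-2", b"truck-1"):
        print(key.decode(), "-> partition", choose_partition(key, 3))
```

Because the mapping depends on the number of partitions, changing the partition count of a topic changes where keys land in this model.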


Page 13: Apache Kafka - Scalable Message-Processing and more !

Apache Kafka - Architecture

[Diagram: a Truck sends messages to a Kafka Broker holding a Movement Topic and an Engine-Metrics Topic, each shown as a sequence of messages 1 to 6, with a Movement Processor and an Engine Processor attached.]

Page 14: Apache Kafka - Scalable Message-Processing and more !

Apache Kafka - Architecture

[Diagram: same setup as the previous slide, now with partitions. The Movement Topic has Partition 0 and Partition 1, the Engine-Metrics Topic has Partition 0, each shown as a sequence of messages 1 to 6. Labels: Truck, two Movement Processors, one Engine Processor.]

Page 15: Apache Kafka - Scalable Message-Processing and more !

Apache Kafka

[Diagram: the Movement Topic with partitions P0, P1 and P2 spread over three brokers, each partition shown as messages 1 to 5. Kafka Broker 1 holds P0 and P2, Kafka Broker 2 holds P2 and P1, Kafka Broker 3 holds P0 and P1. Labels: Truck, three Movement Processors.]

Page 16: Apache Kafka - Scalable Message-Processing and more !

Apache Kafka - Architecture

• Write Ahead Log / Commit Log

• Producers always append to tail

• think append to file

[Diagram: Truck appends message 6 to the tail of the Movement Topic (messages 1 to 5) on the Kafka Broker.]
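The "think append to file" idea can be sketched as a tiny in-memory log. This is a toy model, not Kafka code: the class name `PartitionLog` and its methods are made up for illustration. Records are only ever added at the tail, and each one gets the next sequential position.

```python
class PartitionLog:
    """Toy append-only log for a single partition (illustration only)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record at the tail and return its position in the log."""
        position = len(self._records)
        self._records.append(value)
        return position

    def read(self, start, max_records=10):
        """Read up to max_records starting at the given position."""
        return self._records[start:start + max_records]

    def end_position(self):
        """Position that the next appended record will receive."""
        return len(self._records)


if __name__ == "__main__":
    log = PartitionLog()
    for movement in ("left depot", "on highway", "arrived"):
        print("appended at", log.append(movement))
    print(log.read(1))
```

Existing records are never modified in this model; readers simply choose where to start reading.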


Page 17: Apache Kafka - Scalable Message-Processing and more !

Durability Guarantees

Producer can configure acknowledgements

Value | Impact | Durability
0 | Producer doesn’t wait for leader | weak
1 (default) | Producer waits for leader; leader sends ack when message is written to log; no wait for followers | medium
all | Producer waits for leader; leader sends ack when all in-sync replicas have acknowledged | strong
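The three acknowledgement levels above can be expressed as a small decision function. This is a simplified sketch, not broker code: the function name `send_acknowledged` and its parameters are hypothetical, and real brokers involve further settings that are not modelled here.

```python
def send_acknowledged(acks, leader_wrote, follower_acks):
    """Return True once the producer would treat the send as acknowledged.

    acks          -- "0", "1" or "all", as in the table above
    leader_wrote  -- True once the leader has written the message to its log
    follower_acks -- one bool per in-sync follower; an empty list means the
                     leader is the only in-sync replica
    """
    if acks == "0":
        return True  # producer does not wait for anyone
    if acks == "1":
        return leader_wrote  # leader only, followers are ignored
    if acks == "all":
        return leader_wrote and all(follower_acks)
    raise ValueError("unknown acks value: %r" % (acks,))


if __name__ == "__main__":
    # Leader has written, one of two in-sync followers is still behind.
    for level in ("0", "1", "all"):
        print(level, send_acknowledged(level, True, [True, False]))
```

The stronger the setting, the more replicas must hold the message before the producer moves on, which is where the weak / medium / strong durability ranking comes from.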


Page 18: Apache Kafka - Scalable Message-Processing and more !

Apache Kafka - Partition offsets

Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset

• Consumers track their pointers via (offset, partition, topic) tuples

[Diagram labels: Consumer Group A, Consumer Group B]

Source: Apache Kafka
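Tracking positions via (offset, partition, topic) tuples means that each consumer group can read the same partition at its own pace. The sketch below is a toy model under that assumption; `OffsetStore` and `poll` are hypothetical names, not part of any Kafka client API.

```python
class OffsetStore:
    """Toy store of the next offset to read, per (group, topic, partition)."""

    def __init__(self):
        self._next = {}

    def position(self, group, topic, partition):
        # A group that has never read starts at the beginning in this model.
        return self._next.get((group, topic, partition), 0)

    def commit(self, group, topic, partition, next_offset):
        self._next[(group, topic, partition)] = next_offset


def poll(log, store, group, topic, partition, max_records=2):
    """Read the next batch for a group and advance only that group's offset."""
    start = store.position(group, topic, partition)
    batch = log[start:start + max_records]
    store.commit(group, topic, partition, start + len(batch))
    return batch


if __name__ == "__main__":
    partition_log = ["m0", "m1", "m2", "m3", "m4"]
    store = OffsetStore()
    print("A:", poll(partition_log, store, "group-a", "movement", 0))
    print("A:", poll(partition_log, store, "group-a", "movement", 0))
    print("B:", poll(partition_log, store, "group-b", "movement", 0))
```

The log itself is unchanged by reading; only the stored offsets differ between the two groups.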

Page 19: Apache Kafka - Scalable Message-Processing and more !

Data Retention – 4 options

1. Never
2. Time based (TTL): log.retention.{ms | minutes | hours}
3. Size based: log.retention.bytes
4. Log compaction based (older entries with the same key are removed), for example:

    kafka-topics.sh --zookeeper localhost:2181 \
      --create --topic customers \
      --replication-factor 1 --partitions 1 \
      --config cleanup.policy=compact
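What log compaction does to the entries of a topic can be sketched in a few lines. This is a toy model, assuming a log of (key, value) pairs; the function name `compact` and the sample customer records are made up, and details of how and when a real broker compacts are not modelled.

```python
def compact(log):
    """Keep only the latest entry per key, preserving the original order.

    log is a list of (key, value) tuples, oldest first.
    """
    # Remember the index of the last occurrence of every key.
    last_index = {key: index for index, (key, _value) in enumerate(log)}
    return [
        (key, value)
        for index, (key, value) in enumerate(log)
        if last_index[key] == index
    ]


if __name__ == "__main__":
    customers = [
        ("c1", "Alice"),
        ("c2", "Bob"),
        ("c1", "Alicia"),   # newer value for c1
        ("c3", "Carol"),
        ("c2", "Robert"),   # newer value for c2
    ]
    for entry in compact(customers):
        print(entry)
```

After compaction the log still answers "what is the current value for this key?", which is why this policy suits topics such as the `customers` topic in the command above.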


Page 20: Apache Kafka - Scalable Message-Processing and more !

Apache Kafka – Some numbers

Kafka at LinkedIn => 1800+ broker machines / 79K+ topics

• 1.3 trillion messages per day
• 330 terabytes in / day
• 1.2 petabytes out / day
• Peak load for a single cluster: 2 million messages/sec, 4.7 gigabits/sec inbound, 15 gigabits/sec outbound

Kafka performance on our own infrastructure => 6 brokers (VM) / 1 cluster

• 445’622 messages/second
• 31 MB / second
• 3.0405 ms average latency between producer / consumer

http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

https://engineering.linkedin.com/kafka/running-kafka-scale
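A little arithmetic puts the figures on this slide into perspective. The calculation below is a rough sketch: it assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), an even spread of the daily totals over 24 hours, and that the 445’622 messages/second and 31 MB/second figures come from the same test run.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily totals quoted on the slide.
messages_per_day = 1_300_000_000_000      # 1.3 trillion
bytes_in_per_day = 330 * 10**12           # 330 terabytes
bytes_out_per_day = 1_200 * 10**12        # 1.2 petabytes

# Average message rate if the daily total were spread evenly.
avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY

# How many bytes are read for every byte written.
fan_out = bytes_out_per_day / bytes_in_per_day

# Own-infrastructure test: implied average message size.
test_messages_per_sec = 445_622
test_bytes_per_sec = 31 * 10**6
avg_message_bytes = test_bytes_per_sec / test_messages_per_sec

if __name__ == "__main__":
    print("average messages/sec: %.0f" % avg_messages_per_sec)
    print("read/write fan-out:   %.2f" % fan_out)
    print("avg message size (B): %.1f" % avg_message_bytes)
```

Under these assumptions the daily total corresponds to roughly 15 million messages per second on average, each byte written is read about 3.6 times, and the test messages were about 70 bytes each.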


Page 21: Apache Kafka - Scalable Message-Processing and more !

Kafka Topics

Creating a topic
• Command line interface

    $ kafka-topics.sh --zookeeper zk1:2181 --create \
        --topic my.topic --partitions 3 \
        --replication-factor 2 --config x=y

• Using AdminUtils.createTopic method
• Auto-create via auto.create.topics.enable = true

Modifying a topic
https://kafka.apache.org/documentation.html#basic_ops_modify_topic

Deleting a topic
• Command line interface


Page 22: Apache Kafka - Scalable Message-Processing and more !

Inspecting the current state of a topic

Use the --describe option

    $ kafka-topics.sh --zookeeper zk1:2181 --describe --topic my.topic
    Topic:my.topic PartitionCount:3 ReplicationFactor:2 Configs:
    Topic: my.topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
    Topic: my.topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
    Topic: my.topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0

• Leader: broker ID of the currently elected leader broker
• Replica IDs = broker IDs
• ISR = “in-sync replica”: replicas that are in sync with the leader

In this example:
• Broker 0 is leader for partition 1.
• Broker 1 is leader for partitions 0 and 2.
• All replicas are in sync with their respective partition leaders.
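The per-partition lines of the --describe output can be checked mechanically. The sketch below is a hypothetical helper, not a Kafka tool: it assumes the text layout shown on this slide (`Topic: … Partition: … Leader: … Replicas: … Isr: …`), which may differ in other Kafka versions, and the names `parse_partitions` and `not_fully_in_sync` are made up.

```python
import re

# Matches one per-partition line in the layout shown on the slide.
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\d+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]+)"
)


def parse_partitions(text):
    """Return one dict per partition line found in the describe output."""
    partitions = []
    for match in PARTITION_LINE.finditer(text):
        topic, partition, leader, replicas, isr = match.groups()
        partitions.append({
            "topic": topic,
            "partition": int(partition),
            "leader": int(leader),
            "replicas": [int(b) for b in replicas.split(",")],
            "isr": [int(b) for b in isr.split(",")],
        })
    return partitions


def not_fully_in_sync(partitions):
    """Partition numbers whose ISR does not contain every replica."""
    return [p["partition"] for p in partitions
            if set(p["isr"]) != set(p["replicas"])]


SAMPLE = """\
Topic:my.topic PartitionCount:3 ReplicationFactor:2 Configs:
Topic: my.topic Partition: 0 Leader: 1 Replicas: 1,0 Isr: 1,0
Topic: my.topic Partition: 1 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: my.topic Partition: 2 Leader: 1 Replicas: 1,0 Isr: 1,0
"""

if __name__ == "__main__":
    parsed = parse_partitions(SAMPLE)
    print({p["partition"]: p["leader"] for p in parsed})
    print("not fully in sync:", not_fully_in_sync(parsed))
```

The summary line is skipped because it carries `PartitionCount:` rather than a `Partition:` field.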


Page 23: Apache Kafka - Scalable Message-Processing and more !

Kafka Connect


Page 24: Apache Kafka - Scalable Message-Processing and more !

Kafka Connect Architecture

Source: Confluent

Page 25: Apache Kafka - Scalable Message-Processing and more !

Kafka Connector Hub – Certified Connectors

Source: http://www.confluent.io/product/connectors

Page 26: Apache Kafka - Scalable Message-Processing and more !

Kafka Connector Hub – Additional Connectors

Source: http://www.confluent.io/product/connectors

Page 27: Apache Kafka - Scalable Message-Processing and more !

Kafka Streams


Page 28: Apache Kafka - Scalable Message-Processing and more !

Kafka Streams

• Designed as a simple and lightweight library in Apache Kafka

• no external dependencies on systems other than Apache Kafka

• Leverages Kafka as its internal messaging layer

• agnostic to resource management and configuration tools

• Supports fault-tolerant local state

• Event-at-a-time processing (not microbatch) with millisecond latency

• Windowing with out-of-order data using a Google DataFlow-like model
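Event-at-a-time processing with windowing over out-of-order data can be illustrated without the library itself. The sketch below is plain Python, not the Kafka Streams API: the function name `tumbling_window_counts` is made up, and it assumes each event carries its own event-time timestamp in milliseconds. Each event is assigned to a window by that timestamp, so arrival order does not matter.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_ms):
    """Count events per fixed-size, non-overlapping event-time window.

    events    -- iterable of (event_time_ms, payload), in arrival order
    window_ms -- window length in milliseconds
    Returns a dict mapping window start time to the event count.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    counts = defaultdict(int)
    for event_time_ms, _payload in events:  # one event at a time
        window_start = (event_time_ms // window_ms) * window_ms
        counts[window_start] += 1
    return dict(counts)


if __name__ == "__main__":
    # The third event arrives late: its timestamp is older than the second's.
    arrivals = [(1000, "a"), (6000, "b"), (4000, "c"), (12000, "d")]
    print(tumbling_window_counts(arrivals, window_ms=5000))
```

This sketch keeps every window forever; how long a real stream processor waits for late events before closing a window is a separate design decision not modelled here.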


Page 29: Apache Kafka - Scalable Message-Processing and more !

Kafka Streams - Architecture

• A topology defines the stream processing computational logic for your application.
• A topology is a graph of stream processors (nodes) that are connected by streams (edges).
• A source processor is a stream processor that does not have any upstream processors.
• A sink processor is a special type of stream processor that does not have downstream processors.

Source: Confluent

Page 30: Apache Kafka - Scalable Message-Processing and more !

Kafka Streams - Processor Topology

• A topology defines the stream processing computational logic for your application.
• A topology is a graph of stream processors (nodes) that are connected by streams (edges).
• A source processor is a stream processor that does not have any upstream processors. It consumes one or more Kafka topics.
• A sink processor is a special type of stream processor that does not have downstream processors. It produces to a single Kafka topic.

Source: Confluent
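The definitions of source and sink processors follow directly from the shape of the graph. The sketch below is plain Python rather than the Kafka Streams API; the function name `classify_processors` and the processor names are hypothetical. A topology is given as a list of (upstream, downstream) edges.

```python
def classify_processors(edges):
    """Find source and sink processors in a topology.

    edges -- list of (upstream, downstream) processor-name pairs
    Returns (sources, sinks) as sorted lists of names.
    """
    nodes = set()
    has_upstream = set()
    has_downstream = set()
    for upstream, downstream in edges:
        nodes.update((upstream, downstream))
        has_downstream.add(upstream)
        has_upstream.add(downstream)
    sources = sorted(nodes - has_upstream)    # nothing feeds into them
    sinks = sorted(nodes - has_downstream)    # they feed into nothing
    return sources, sinks


if __name__ == "__main__":
    # Hypothetical topology: read movements, filter, enrich, write alerts.
    topology = [
        ("read-movements", "filter"),
        ("filter", "enrich"),
        ("enrich", "write-alerts"),
    ]
    sources, sinks = classify_processors(topology)
    print("sources:", sources)
    print("sinks:  ", sinks)
```

Every processor that is neither a source nor a sink sits in the middle of the graph and both receives and forwards records.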

Page 31: Apache Kafka - Scalable Message-Processing and more !

Kafka and ”Big Data” / ”Fast Data” Ecosystem


Page 32: Apache Kafka - Scalable Message-Processing and more !

Kafka and the Big Data / Fast Data ecosystem

Kafka integrates with many popular products / frameworks

• Apache Spark Streaming

• Apache Flink

• Apache Storm

• Apache NiFi

• StreamSets

• Apache Flume

• Oracle Stream Analytics

• Oracle Service Bus

• Oracle GoldenGate

• Spring Integration Kafka Support

• …

Storm: built-in Kafka Spout to consume events from Kafka


Page 33: Apache Kafka - Scalable Message-Processing and more !

Confluent Platform


Page 34: Apache Kafka - Scalable Message-Processing and more !

Confluent Data Platform 3.1

Source: Confluent

Page 35: Apache Kafka - Scalable Message-Processing and more !

Summary


Page 36: Apache Kafka - Scalable Message-Processing and more !

Customer Event Hub – mapping of technologies

[Diagram. Event sources: Location, Social, Clickstream, Sensor Data, Call Center, Weather Data, Mobile Apps. Source systems: Billing & Ordering, CRM / Profile, Marketing Campaigns, via SQL Import. Event Hub feeding two paths. Batch Analytics on the Hadoop Cluster: Distributed Filesystem, Parallel Processing, NoSQL, SQL, Search. Streaming Analytics: Stream Analytics, NoSQL, Reference / Models. Consumers: Dashboard, Enterprise Data Warehouse, BI Tools, Search, Online & Mobile Apps.]

Page 37: Apache Kafka - Scalable Message-Processing and more !

Summary

• Kafka can scale to millions of messages per second, and more

• Easy to start with for a PoC

• Setting up a production environment takes somewhat more investment

• Monitoring is key

• Vibrant community and ecosystem

• Fast-paced technology

• Confluent provides a Kafka distribution


Page 38: Apache Kafka - Scalable Message-Processing and more !

Guido Schmutz
Technology Manager

[email protected]


@gschmutz guidoschmutz.wordpress.com