BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH

Apache Kafka - Scalable Message Processing and more!

Guido Schmutz - 24.4.2017

@gschmutz guidoschmutz.wordpress.com

Guido Schmutz

Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis

More than 30 years of software development experience

Contact: [email protected]
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz

Agenda

1. Introduction & Motivation
2. Kafka Core
3. Kafka Connect
4. Kafka Streams
5. Kafka and "Big Data" / "Fast Data" Ecosystem
6. Kafka in Enterprise Architecture
7. Confluent Data Platform
8. Summary

Introduction & Motivation

Apache Kafka - Overview

Distributed publish-subscribe messaging system

Designed for processing real-time activity stream data (logs, metrics collections, social media streams, …)

Initially developed at LinkedIn, now part of Apache

Does not use JMS API and standards

Kafka maintains feeds of messages in topics

Apache Kafka - Motivation

LinkedIn’s motivation for Kafka was:

• "A unified platform for handling all the real-time data feeds a large company might have."

Must haves

• High throughput to support high volume event feeds
• Support real-time processing of these feeds to create new, derived feeds
• Support large data backlogs to handle periodic ingestion from offline systems
• Support low-latency delivery to handle more traditional messaging use cases
• Guarantee fault-tolerance in the presence of machine failures

Apache Kafka History

Source: Confluent

Apache Kafka - Unix Analogy

$ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt

Kafka Connect API (cat < in.txt) | Kafka Streams API (grep, tr) | Kafka Connect API (> out.txt)

Kafka Core (Cluster): the pipes connecting the stages

Source: Confluent

Kafka Core

Kafka High Level Architecture

The who is who
• Producers write data to brokers.
• Consumers read data from brokers.
• All this is distributed.

The data
• Data is stored in topics.
• Topics are split into partitions, which are replicated.

[Diagram: Kafka Cluster with Broker 1, Broker 2 and Broker 3 and a ZooKeeper ensemble; three producers write to the cluster and three consumers read from it.]

Apache Kafka - Architecture

[Diagram: a Truck producer, a Kafka Broker holding the Movement topic and the Engine-Metrics topic (each shown with messages 1 to 6), a Movement Processor and an Engine Processor.]

Apache Kafka - Architecture

[Diagram: the Truck, Kafka Broker, Movement topic and Engine-Metrics topic as before, now shown with partitions (Partition 0, Partition 0 and Partition 1, each with messages 1 to 6); an Engine Processor and two Movement Processors consume from them.]

Apache Kafka

[Diagram: the Movement topic spread over three brokers, with each partition stored on two of them. Kafka Broker 1 holds P0 and P2, Kafka Broker 2 holds P2 and P1, Kafka Broker 3 holds P0 and P1 (messages 1 to 5 each); a Truck producer and three Movement Processors are attached.]

Apache Kafka - Architecture

• Write Ahead Log / Commit Log

• Producers always append to tail

• think append to file

[Diagram: the Truck producer appends message 6 to the tail of the Movement topic log (messages 1 to 5) on the Kafka Broker.]
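The "append to file" idea above can be sketched in a few lines of plain Java. This is a toy in-memory model, not Kafka's storage code: the class `PartitionLog` and its method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one topic partition as an append-only log (hypothetical, in memory only). */
class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    /** Appends to the tail and returns the offset assigned to the message. */
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Reads the message stored at the given offset; existing entries are never modified. */
    String read(long offset) {
        return messages.get((int) offset);
    }

    /** The offset that the next appended message will receive. */
    long endOffset() {
        return messages.size();
    }

    public static void main(String[] args) {
        PartitionLog movement = new PartitionLog();
        for (int i = 1; i <= 5; i++) {
            movement.append("position-" + i);
        }
        long offset = movement.append("position-6");
        System.out.println("appended at offset " + offset + ", next offset " + movement.endOffset());
    }
}
```

Because writes only ever go to the tail, the position of a message in the list can double as its identifier, which is the role offsets play later in the deck.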

Kafka Topics

Creating a topic
• Command line interface

$ kafka-topics.sh --zookeeper zk1:2181 --create \
    --topic my.topic --partitions 3 \
    --replication-factor 2 --config x=y

• Using AdminUtils.createTopic method
• Auto-create via auto.create.topics.enable = true

Modifying a topic
• https://kafka.apache.org/documentation.html#basic_ops_modify_topic

Deleting a topic
• Command line interface

Kafka Producer

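import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Producer configuration: bootstrap brokers plus key and value serializers
// (the serializer package prefix is elided on the slide)
Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

ProducerRecord<String, String> record =
    new ProducerRecord<>("topicName", "Key", "Value");

try {
    // send() is asynchronous: it returns a Future, so broker-side failures surface later
    producer.send(record);
} catch (Exception e) {
    // Don't swallow the error silently
    e.printStackTrace();
}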

Durability Guarantees

Producer can configure acknowledgements

Value | Description | Throughput | Latency | Durability
0 | Producer doesn't wait for leader | high | low | low (no guarantee)
1 (default) | Producer waits for leader; leader sends ack when message is written to log; no wait for followers | medium | medium | medium (leader)
all (-1) | Producer waits for leader; leader sends ack when all In-Sync Replicas have acknowledged | low | high | high (ISR)
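As a rough sketch of how the acknowledgement levels map to producer configuration: the level is set through the producer property `acks`. The helper class `ProducerDurabilityConfig` below is hypothetical and only builds a `java.util.Properties` object; it does not create a producer or contact a broker, and the broker addresses are placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper that builds producer properties for a chosen acknowledgement level. */
class ProducerDurabilityConfig {
    // Values from the table above; "-1" is the numeric spelling of "all".
    private static final List<String> VALID_ACKS = Arrays.asList("0", "1", "all", "-1");

    static Properties withAcks(String bootstrapServers, String acks) {
        if (!VALID_ACKS.contains(acks)) {
            throw new IllegalArgumentException("acks must be one of " + VALID_ACKS + ", got: " + acks);
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", acks);
        return props;
    }

    public static void main(String[] args) {
        // Highest durability: the leader waits for all in-sync replicas before acknowledging.
        Properties durable = withAcks("broker1:9092,broker2:9092", "all");
        System.out.println(durable.getProperty("acks"));
    }
}
```

The resulting properties would be merged with the serializer settings before being passed to a producer; picking the value is a trade-off between throughput, latency and durability, as the table shows.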

Apache Kafka - Partition offsets

Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset

• Consumers track their pointers via (offset, partition, topic) tuples

[Diagram: Consumer Group A and Consumer Group B. Source: Apache Kafka]
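The (offset, partition, topic) bookkeeping can be illustrated with a small plain-Java model. It is a sketch under assumptions, not the Kafka consumer API: `OffsetTracker` and its methods are hypothetical names, the partition log is a plain list, and a group without a stored position is assumed to start at offset 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model: each consumer group keeps its own read position per topic partition. */
class OffsetTracker {
    // Key is "group/topic/partition"; value is the next offset that group will read.
    private final Map<String, Long> positions = new HashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    /** Next offset the group will read; assumed to be 0 when nothing is stored yet. */
    long position(String group, String topic, int partition) {
        return positions.getOrDefault(key(group, topic, partition), 0L);
    }

    /** Returns up to max messages from the log and advances only this group's position. */
    List<String> poll(String group, String topic, int partition, List<String> log, int max) {
        int from = (int) Math.min(position(group, topic, partition), log.size());
        int to = Math.min(log.size(), from + max);
        positions.put(key(group, topic, partition), (long) to);
        return new ArrayList<>(log.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2", "m3");
        OffsetTracker tracker = new OffsetTracker();
        System.out.println(tracker.poll("groupA", "movement", 0, log, 3));
        System.out.println(tracker.poll("groupB", "movement", 0, log, 1));
        System.out.println(tracker.position("groupA", "movement", 0));
    }
}
```

Reading never removes anything from the log, so the two groups progress independently through the same partition.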

Data Retention – 4 options

1. Never

2. Time based (TTL) log.retention.{ms | minutes | hours}

3. Size based log.retention.bytes

4. Log compaction based (older entries with the same key are removed, so the latest value per key is kept)

kafka-topics.sh --zookeeper localhost:2181 \
    --create --topic customers \
    --replication-factor 1 --partitions 1 \
    --config cleanup.policy=compact
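The effect of log compaction can be approximated in plain Java. This is a simplified model, not the broker's compaction code: `LogCompaction.compact` is a hypothetical helper, records are plain `{key, value}` string pairs, and details of real compaction (background cleaning, delete markers, preserved offsets) are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of log compaction: keep only the most recent record for each key. */
class LogCompaction {
    /** Each record is a {key, value} pair; survivors keep the relative order of the log. */
    static List<String[]> compact(List<String[]> records) {
        Map<String, String[]> latest = new LinkedHashMap<>();
        for (String[] record : records) {
            // Remove the older entry first so the newer one is re-inserted at the tail.
            latest.remove(record[0]);
            latest.put(record[0], record);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<String[]> customers = Arrays.asList(
            new String[] {"customer-1", "Basel"},
            new String[] {"customer-2", "Bern"},
            new String[] {"customer-1", "Zurich"});
        for (String[] record : compact(customers)) {
            System.out.println(record[0] + " -> " + record[1]);
        }
    }
}
```

A compacted topic therefore behaves like a changelog of the latest state per key, which fits reference data such as the `customers` topic in the command above.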

Apache Kafka – Some numbers

Kafka at LinkedIn => 1800+ broker machines / 79K+ topics

Kafka Performance at our own infrastructure => 6 brokers (VM) / 1 cluster

• 445’622 messages/second
• 31 MB / second
• 3.0405 ms average latency between producer / consumer

1.3 trillion messages per day

330 terabytes in/day

1.2 petabytes out/day

Peak load for a single cluster: 2 million messages/sec, 4.7 gigabits/sec inbound, 15 gigabits/sec outbound

http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

https://engineering.linkedin.com/kafka/running-kafka-scale

Kafka Connect

Kafka Connect Architecture

Source: Confluent

Kafka Connector Hub – Certified Connectors

Source: http://www.confluent.io/product/connectors

Kafka Connector Hub – Additional Connectors

Source: http://www.confluent.io/product/connectors

Kafka Connect – Twitter example

./connect-standalone.sh ../demo-config/connect-simple-source-standalone.properties ../demo-config/twitter-source.properties

twitter-source.properties:

name=twitter-source
connector.class=com.eneco.trading.kafka.connect.twitter.TwitterSourceConnector
tasks.max=1
topic=tweets
twitter.consumerkey=<consumer-key>
twitter.consumersecret=<consumer-secret>
twitter.token=<token>
twitter.secret=<token-secret>
track.terms=bigdata

connect-simple-source-standalone.properties:

bootstrap.servers=localhost:9095,localhost:9096,localhost:9097
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
...

Kafka Streams

Kafka Streams

• Designed as a simple and lightweight library in Apache Kafka
• No external dependencies on systems other than Apache Kafka
• Part of open source Apache Kafka, introduced in 0.10+
• Leverages Kafka as its internal messaging layer
• Agnostic to resource management and configuration tools
• Supports fault-tolerant local state
• Event-at-a-time processing (not microbatch) with millisecond latency
• Windowing with out-of-order data using a Google DataFlow-like model
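To make the last two bullets more concrete, here is a plain-Java sketch of event-at-a-time counting in tumbling windows keyed by event time. It does not use the Kafka Streams API: `TumblingWindowCounter` is a hypothetical class, timestamps are assumed to be non-negative milliseconds, and state is kept in memory without any fault tolerance.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model: counts events per fixed-size window, assigning each event by its own timestamp. */
class TumblingWindowCounter {
    private final long windowSizeMs;
    // Window start time -> number of events whose timestamp falls into that window.
    private final Map<Long, Integer> counts = new TreeMap<>();

    TumblingWindowCounter(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    /** Handles a single event as it arrives; late events still land in their original window. */
    void onEvent(long eventTimeMs) {
        long windowStart = eventTimeMs - (eventTimeMs % windowSizeMs);
        counts.merge(windowStart, 1, Integer::sum);
    }

    Map<Long, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        TumblingWindowCounter counter = new TumblingWindowCounter(1000);
        counter.onEvent(100);
        counter.onEvent(1200);
        counter.onEvent(900); // arrives out of order, but belongs to the first window
        System.out.println(counter.counts());
    }
}
```

Assigning by event time rather than arrival time is what lets out-of-order data update the correct window; a real stream processor additionally has to decide how long to keep old windows open.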

Streams API in the context of Kafka

Source: Confluent

Kafka and "Big Data" / "Fast Data" Ecosystem

Kafka and the Big Data / Fast Data ecosystem

Kafka integrates with many popular products / frameworks

• Apache Spark Streaming

• Apache Flink

• Apache Storm

• Apache NiFi

• Streamsets

• Apache Flume

• Oracle Stream Analytics

• Oracle Service Bus

• Oracle GoldenGate

• Spring Integration Kafka Support

• …

Storm built-in Kafka Spout to consume events from Kafka

Kafka in “Enterprise Architecture”

Traditional Big Data Architecture

[Diagram components: source systems (Billing & Ordering, CRM / Profile, Marketing Campaigns); File Import / SQL Import; Big Data Cluster (Hadoop Cluster) with Distributed Filesystem, Parallel Batch Processing (Machine Learning, Graph Algorithms, Natural Language Processing), NoSQL, SQL and Search; BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps.]

Event Hub – handle event stream data

[Diagram components: event sources (Location, Social, Clickstream, Sensor Data, Call Center, Weather Data, Mobile Apps) and source systems (Billing & Ordering, CRM / Profile, Marketing Campaigns); Event Hub; Data Flow; Big Data Cluster (Hadoop Cluster) with Distributed Filesystem, Parallel Batch Processing (Machine Learning, Graph Algorithms, Natural Language Processing), NoSQL, SQL and Search; BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps.]

Event Hub – taking Velocity into account

[Diagram components: event sources (Location, Social, Clickstream, Sensor Data, Call Center, Weather Data, Mobile Apps) and source systems (Billing & Ordering, CRM / Profile, Marketing Campaigns); Event Hub; Batch Analytics on the Big Data Cluster (Hadoop Cluster) with Distributed Filesystem, Parallel Batch Processing, NoSQL, SQL and Search, plus File Import / SQL Import; Streaming Analytics with Stream Analytics, NoSQL, Reference / Models and Dashboard; BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps.]

Event Hub – Asynchronous Microservice Architecture

[Diagram components: event sources (Location, Social, Clickstream, Sensor Data, Call Center, Weather Data, Mobile Apps) and source systems (Billing & Ordering, CRM / Profile, Marketing Campaigns); Event Hub; Big Data Cluster (Hadoop Cluster) with Distributed Filesystem and Parallel Batch Processing, plus File Import / SQL Import; a Container holding a Microservice with NoSQL / RDBMS storage and an API; SQL, Search; BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps.]

Confluent Platform

Confluent Data Platform 3.2

Source: Confluent

Confluent Data Platform 3.2

Source: Confluent

Confluent Enterprise – Control Center

Source: Confluent

Summary

Summary

• Kafka can scale to millions of messages per second, and more

• Easy to start in a Proof of Concept (PoC), but setting up a production environment takes more investment

• Monitoring is key

• Vibrant community and ecosystem

• Fast-paced technology

• Confluent provides distribution and support for Apache Kafka

• Oracle Event Hub Service offers a Kafka Managed Service

Customer Event Hub – mapping of technologies

[Diagram components: event sources (Location, Social, Clickstream, Sensor Data, Call Center, Weather Data, Mobile Apps) and source systems (Billing & Ordering, CRM / Profile, Marketing Campaigns); Event Hub; Batch Analytics on the Hadoop Cluster with Distributed Filesystem, Parallel Processing, NoSQL, SQL and Search, plus SQL Import; Streaming Analytics with Stream Analytics, NoSQL, Reference / Models and Dashboard; BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps. The technologies mapped onto these components are not recoverable from the transcript.]

Guido Schmutz
Technology Manager

[email protected]

@gschmutz guidoschmutz.wordpress.com