Apache Kafka - RainFocus Kafka Scalable Message ... Introduction Motivation Apache Kafka -Scalable Message Processing and more! Apache Kafka -Overview ... Apache Spark Streaming

Download Apache Kafka - RainFocus  Kafka Scalable Message ... Introduction Motivation Apache Kafka -Scalable Message Processing and more! Apache Kafka -Overview ...  Apache Spark Streaming

Post on 05-May-2018

218 views

Category:

Documents

4 download

TRANSCRIPT

  • BASEL BERN BRUGG DSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MNCHEN STUTTGART WIEN ZRICH

    Apache KafkaScalable Message Processing and more!

    Guido Schmutz - 24.4.2017

    @gschmutz guidoschmutz.wordpress.com

  • Guido Schmutz

    Working at Trivadis for more than 20 yearsOracle ACE Director for Fusion Middleware and SOAConsultant, Trainer Software Architect for Java, Oracle, SOA andBig Data / Fast DataMember of Trivadis Architecture BoardTechnology Manager @ Trivadis

    More than 30 years of software development experience

    Contact: guido.schmutz@trivadis.comBlog: http://guidoschmutz.wordpress.comSlideshare: http://www.slideshare.net/gschmutzTwitter: gschmutz

    Apache Kafka - Scalable Message Processing and more!

  • Agenda

    1. Introduction & Motivation2. Kafka Core3. Kafka Connect4. Kafka Streams5. Kafka and "Big Data" / "Fast Data" Ecosystem6. Kafka in Enterprise Architecture7. Confluent Data Platform8. Summary

    Apache Kafka - Scalable Message Processing and more!

  • Introduction & Motivation

    Apache Kafka - Scalable Message Processing and more!

  • Apache Kafka - Overview

    Distributed publish-subscribe messaging system

    Designed for processing of real time activity stream data (logs, metrics collections, social media streams, )

    Initially developed at LinkedIn, now part of Apache

    Does not use JMS API and standards

    Kafka maintains feeds of messages in topics

    Apache Kafka - Scalable Message Processing and more!

  • Apache Kafka - Motivation

    LinkedIns motivation for Kafka was:

    "A unified platform for handling all the real-time data feeds a large company might have."

    Must haves

    High throughput to support high volume event feeds Support real-time processing of these feeds to create new, derived feeds. Support large data backlogs to handle periodic ingestion from offline systems Support low-latency delivery to handle more traditional messaging use cases Guarantee fault-tolerance in the presence of machine failures

    Apache Kafka - Scalable Message Processing and more!

  • Apache Kafka History

    Apache Kafka - Scalable Message Processing and more!

    Source:Confluent

  • Apache Kafka - Unix Analogy

    Apache Kafka - Scalable Message Processing and more!

    $ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt

    KafkaConnectAPI KafkaConnectAPIKafkaStreamsAPI

    KafkaCore(Cluster)

    Source:Confluent

  • Kafka Core

    Apache Kafka - Scalable Message Processing and more!

  • Kafka High Level Architecture

    The who is who Producers write data to brokers. Consumers read data from

    brokers. All this is distributed.

    The data Data is stored in topics. Topics are split into partitions,

    which are replicated.

    Kafka Cluster

    Consumer Consumer Consumer

    Producer Producer Producer

    Broker 1 Broker 2 Broker 3

    ZookeeperEnsemble

    Apache Kafka - Scalable Message Processing and more!

  • Apache Kafka - Architecture

    Kafka Broker

    Movement Processor

    MovementTopic

    Engine-MetricsTopic

    1 2 3 4 5 6

    EngineProcessor1 2 3 4 5 6

    Truck

    Apache Kafka - Scalable Message Processing and more!

  • Apache Kafka - Architecture

    Kafka Broker

    Movement Processor

    MovementTopic

    Engine-MetricsTopic

    1 2 3 4 5 6

    EngineProcessor

    Partition0

    1 2 3 4 5 6Partition0

    1 2 3 4 5 6Partition1 Movement

    ProcessorTruck

    Apache Kafka - Scalable Message Processing and more!

  • ApacheKafka

    Kafka Broker 1

    Movement Processor

    Truck

    MovementTopicP0

    Movement Processor

    1 2 3 4 5

    P2 1 2 3 4 5

    Kafka Broker 2MovementTopic

    P2 1 2 3 4 5

    P1 1 2 3 4 5

    Kafka Broker 3MovementTopic

    P0 1 2 3 4 5

    P1 1 2 3 4 5

    Movement Processor

  • Apache Kafka - Architecture

    Write Ahead Log / Commit Log

    Producers always append to tail

    think append to file

    Kafka Broker

    MovementTopic

    1 2 3 4 5

    Truck

    6 6

    Apache Kafka - Scalable Message Processing and more!

  • Kafka Topics

    Creating a topic Command line interface

    Using AdminUtils.createTopic method

    Auto-create via auto.create.topics.enable = true

    Modifying a topichttps://kafka.apache.org/documentation.html#basic_ops_modify_topic

    Deleting a topic Command Line interface

    $ kafka-topics.sh zookeeper zk1:2181 --create \--topic my.topic -partitions 3 \-replication-factor 2 --config x=y

    Apache Kafka - Scalable Message Processing and more!

  • Kafka Producer

    Apache Kafka - Scalable Message Processing and more!

    private Properties kafkaProps = new Properties();kafkaProps.put("bootstrap.servers","broker1:9092,broker2:9092");kafkaProps.put("key.serializer", "...StringSerializer");kafkaProps.put("value.serializer", "...StringSerializer");

    producer = new KafkaProducer(kafkaProps);

    ProducerRecord record =new ProducerRecord(topicName", Key", Value");

    try {producer.send(record);

    } catch (Exception e) {}

  • Durability Guarantees

    Producer can configure acknowledgements

    Apache Kafka - Scalable Message Processing and more!

    Value Description Throughput Latency Durability0 Producerdoesntwaitforleader high low low (no

    guarantee)1

    (default) Producerwaitsforleader Leadersends ack whenmessage

    writtentolog Nowaitforfollowers

    medium medium medium(leader)

    all(-1) Producerwaitsforleader Leadersendsack when allIn-Sync

    Replicahaveacknowledged

    low high high(ISR)

  • Apache Kafka - Partition offsets

    Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset

    Consumers track their pointers via (offset, partition, topic) tuples

    ConsumerGroupA ConsumerGroupB

    Apache Kafka - Scalable Message Processing and more!

    Source:ApacheKafka

  • Data Retention 3 options

    1. Never

    2. Time based (TTL) log.retention.{ms | minutes | hours}

    3. Size based log.retention.bytes

    4. Log compaction based (entries with same key are removed)kafka-topics.sh --zookeeper localhost:2181 \

    --create --topic customers \--replication-factor 1 --partitions 1 \--config cleanup.policy=compact

    Apache Kafka - Scalable Message Processing and more!

  • Apache Kafka Some numbers

    Kafka at LinkedIn => over 1800+ broker machines / 79K+ Topics

    Kafka Performance at our own infrastructure => 6 brokers (VM) / 1 cluster

    445622 messages/second 31 MB / second 3.0405 ms average latency between producer / consumer

    1.3Trillionmessagesperday

    330Terabytesin/day

    1.2Petabytesout/day

    Peakloadforasinglecluster2millionmessages/sec4.7Gigabits/secinbound15Gigabits/secoutbound

    http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

    https://engineering.linkedin.com/kafka/running-kafka-scale

    Apache Kafka - Scalable Message Processing and more!

  • Kafka Connect

    Apache Kafka - Scalable Message Processing and more!

  • Kafka Connect Architecture

    Apache Kafka - Scalable Message Processing and more!

    Source:Confluent

  • Kafka Connector Hub Certified Connectors

    Source:http://www.confluent.io/product/connectors

    Apache Kafka - Scalable Message Processing and more!

  • Kafka Connector Hub Additional Connectors

    Source:http://www.confluent.io/product/connectors

    Apache Kafka - Scalable Message Processing and more!

  • Kafka Connect Twitter example

    Apache Kafka - Scalable Message Processing and more!

    ./connect-standalone.sh ../demo-config/connect-simple-source-standalone.properties

    ../demo-config/twitter-source.properties

    name=twitter-sourceconnector.class=com.eneco.trading.kafka.connect.twitter.TwitterSourceConnectortasks.max=1topic=tweetstwitter.consumerkey=twitter.consumersecret=twitter.token=twitter.secret=track.terms=bigdata

    bootstrap.servers=localhost:9095,localhost:9096,localhost:9097

    key.converter=org.apache.kafka.connect.storage.StringConvertervalue.converter=org.apache.kafka.connect.storage.StringConverter...

  • Kafka Streams

    Apache Kafka - Scalable Message Processing and more!

  • Kafka Streams

    Designed as a simple and lightweight library in Apache Kafka

    no external dependencies on systems other than Apache Kafka

    Part of open source Apache Kafka, introduced in 0.10+ Leverages Kafka as its internal messaging layer agnostic to resource management and configuration tools Supports fault-tolerant local state Event-at-a-time processing (not microbatch) with millisecond

    latency Windowing with out-of-order data using a Google DataFlow-like

    model

    Apache Kafka - Scalable Message Processing and more!

  • Streams API in the context of Kafka

    Apache Kafka - Scalable Message Processing and more!

    Source:Confluent

  • Kafka and "Big Data" / "Fast Data"Ecosystem

    Apache Kafka - Scalable Message Processing and more!

  • Kafka and the Big Data / Fast Data ecosystem

    Kafka integrates with many popular products / frameworks

    Apache Spark Streaming

    Apache Flink

    Apache Storm

    Apache NiFi

    Streamsets

    Apache Flume

    Oracle Stream Analytics

    Oracle Service Bus

    Oracle GoldenGate

    Spring Integration Kafka Support

    Stormbuilt-inKafkaSpouttoconsumeeventsfromKafka

    Apache Kafka - Scalable Message Processing and more!

  • Kafka in Enterprise Architecture

    Apache Kafka - Scalable Message Processing and more!

  • Hadoop ClusterdHadoop Cluster

    Big Data Cluster

    Traditional Big Data Architecture

    BITools

    Enterprise Data Warehouse

    Billing &Ordering

    CRM / Profile

    MarketingCampaigns

    File Import / SQL Import

    SQL

    Search

    Online&MobileApps

    Search

    NoSQL

    Parallel BatchProcessing

    DistributedFilesystem

    MachineLearning GraphAlgorithms NaturalLanguageProcessing

    Apache Kafka - Scalable Message Processing and more!

  • Event HubEvent Hub

    Hadoop ClusterdHadoop Cluster

    Big Data Cluster

    Event Hub handle event stream data

    BITools

    Enterprise Data Warehouse

    Location

    Social

    Clickstream

    Sensor Data

    Billing &Ordering

    CRM / Profile

    MarketingCampaigns

    Event Hub

    CallCenter

    WeatherData

    MobileApps

    SQL

    Search

    Online&MobileApps

    Search

    Data Flow

    NoSQL

    Parallel BatchProcessing

    DistributedFilesystem

    MachineLearning GraphAlgorithms NaturalLanguageProcessing

  • Hadoop ClusterdHadoop ClusterBig Data Cluster

    Event Hub taking Velocity into account

    Location

    Social

    Clickstream

    Sensor Data

    Billing &Ordering

    CRM / Profile

    MarketingCampaigns

    CallCenter

    MobileApps

    Batch Analytics

    Streaming Analytics

    Event HubEvent HubEvent Hub

    NoSQL

    Parallel BatchProcessing

    DistributedFilesystem

    Stream AnalyticsNoSQL

    Reference /Models

    SQL

    Search

    Dashboard

    BITools

    Enterprise Data Warehouse

    Search

    Online&MobileApps

    File Import / SQL Import

    WeatherData

    Apache Kafka - Scalable Message Processing and more!

  • Container

    Hadoop ClusterdHadoop ClusterBig Data Cluster

    Event Hub Asynchronous Microservice Architecture

    Location

    Social

    Clickstream

    Sensor Data

    Billing &Ordering

    CRM / Profile

    MarketingCampaigns

    CallCenter

    MobileApps

    Event HubEvent HubEvent Hub

    ParallelBatch

    ProcessingDistributedFilesystem

    Microservice

    NoSQLRDBMS

    SQL

    Search

    BITools

    Enterprise Data Warehouse

    Search

    Online&MobileApps

    File Import / SQL Import

    WeatherData

    Apache Kafka - Scalable Message Processing and more!

    {}

    API

  • Confluent Platform

    Apache Kafka - Scalable Message Processing and more!

  • Confluent Data Platform 3.2

    Apache Kafka - Scalable Message Processing and more!

    Source:Confluent

  • Confluent Data Platform 3.2

    Apache Kafka - Scalable Message Processing and more!

    Source:Confluent

  • Confluent Enterprise Control Center

    Apache Kafka - Scalable Message Processing and more!

    Source:Confluent

  • Summary

    Apache Kafka - Scalable Message Processing and more!

  • Summary

    Kafka can scale to millions of messages per second, and more

    Easy to start in a Proof of Concept (PoC), but more to invest to setup a production environment

    Monitoring is key

    Vibrant community and ecosystem

    Fast paced technology

    Confluent provides distribution and support for Apache Kafka

    Oracle Event Hub Service offers a Kafka Managed Service

    Apache Kafka - Scalable Message Processing and more!

  • WeatherData

    SQL ImportHadoop ClusterdHadoop Cluster

    Hadoop Cluster

    Location

    Social

    Clickstream

    Sensor Data

    Billing &Ordering

    CRM / Profile

    MarketingCampaigns

    CallCenter

    MobileApps

    Batch Analytics

    Streaming Analytics

    Event HubEvent HubEvent Hub

    NoSQL

    ParallelProcessing

    DistributedFilesystem

    Stream AnalyticsNoSQL

    Reference /Models

    SQL

    Search

    Dashboard

    BITools

    Enterprise Data Warehouse

    Search

    Online&MobileApps

    Customer Event Hub mapping of technologies

    Apache Kafka - Scalable Message Processing and more!

  • Guido SchmutzTechnology Manager

    guido.schmutz@trivadis.com

    Apache Kafka - Scalable Message Processing and more!

    @gschmutz guidoschmutz.wordpress.com

Recommended

View more >