Streaming data with Apache Kafka

Marko Bonaći

Upload: mbonaci

Post on 26-Jan-2017

Category: Technology

TRANSCRIPT

Page 1: Streaming data with Apache Kafka

Marko Bonaći

Page 2: Streaming data with Apache Kafka

Apache Kafka general info

● originated at LinkedIn
● open sourced in early 2011
● implemented in Scala, client libraries in Java
● many other client libraries:
  ○ C++, Go, JS/Node, Clojure, Ruby, ...

Page 3: Streaming data with Apache Kafka

Motivation at LinkedIn

● many disparate data sources and destinations
● a need to move data around reliably
● existing solutions not adequate

Page 4: Streaming data with Apache Kafka

Requirements

● low latency
● huge throughput
● online and offline consumers
● zero data loss
● fault-tolerance

Page 5: Streaming data with Apache Kafka

[Diagram: Kafka at the center, between Producers and Consumers. Labels: Front end user events, Metrics, Logs, RDBMS & NoSQL, Hadoop (HBase), Monitoring Systems, Log Aggregation Systems, Search engines, DWH.]

"Events": as generalization of real-time msg processing

Page 6: Streaming data with Apache Kafka

Terminology

● Kafka broker = server
    hosts
● Topic = queue
    consists of
● Partition = piece of a sliced queue
    is mirrored by
● Replica = backup of a Partition

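Page 7: Streaming data with Apache Kafka

[Diagram: partition replicas spread across Broker0, Broker1 and Broker2. Partitions A0 and A1 each appear three times; partition B0 appears twice.]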
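The replica layout on this slide can be imitated with a few lines of Python. This is only a sketch: the `place_replicas` helper and its round-robin rule are assumptions made for illustration, not the placement algorithm Kafka itself uses. The one property it demonstrates is that the replicas of a partition land on distinct brokers.

```python
def place_replicas(num_partitions, replication_factor, brokers):
    """Hypothetical helper: spread each partition's replicas over distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for p in range(num_partitions):
        # Start at a different broker for each partition, then walk the broker list.
        placement[p] = [brokers[(p + r) % len(brokers)]
                        for r in range(replication_factor)]
    return placement


brokers = ["Broker0", "Broker1", "Broker2"]
topic_a = place_replicas(num_partitions=2, replication_factor=3, brokers=brokers)
topic_b = place_replicas(num_partitions=1, replication_factor=2, brokers=brokers)
```

With these inputs, A0 and A1 each get a copy on all three brokers, and B0 gets copies on two of them, matching the counts in the diagram.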

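Page 8: Streaming data with Apache Kafka

Topic consists of partitions

[Diagram: topic A split into partitions A0, A1 and A2, each an ordered sequence of offsets (0 1 2 3, 0 1 2 3 4, 0 1 2 3 4) to which the Producer appends.]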
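A minimal sketch of how a producer might spread messages over the partitions of a topic. The `Topic` class and `choose_partition` function are hypothetical names, and CRC32 merely stands in for whatever hash a real client library applies to the message key. The sketch shows that messages with the same key land in the same partition and receive increasing offsets there.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Assumption: a stable hash of the key modulo the partition count.
    return zlib.crc32(key) % num_partitions


class Topic:
    """Toy in-memory topic: one Python list per partition."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value):
        p = choose_partition(key, len(self.partitions))
        self.partitions[p].append(value)
        offset = len(self.partitions[p]) - 1  # position of the new message
        return p, offset


topic = Topic("A", num_partitions=3)
first = topic.append(b"user-1", "login")
second = topic.append(b"user-1", "click")
```

Both appends return the same partition number, and the second offset is one higher than the first.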

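Page 9: Streaming data with Apache Kafka

[Diagram: a partition log with offsets 0 1 2 3 4 5 6 7; the Producer appends at the end, while Consumer A and Consumer B read at their own positions.]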
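The picture can be read as an append-only list with one read position per consumer. The sketch below uses hypothetical `PartitionLog` and `Reader` classes to show that reading does not remove anything from the log, so two consumers can sit at different offsets of the same partition.

```python
class PartitionLog:
    """Toy append-only log; the list index plays the role of the offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records):
        # Reading never deletes: it only returns a slice of the log.
        return self._records[offset:offset + max_records]


class Reader:
    """A consumer that remembers its own position in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = PartitionLog()
for i in range(8):              # the producer writes offsets 0..7
    log.append(f"msg-{i}")

consumer_a, consumer_b = Reader(log), Reader(log)
batch_a = consumer_a.poll(3)    # A is now at offset 3
batch_b = consumer_b.poll(6)    # B is now at offset 6
```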

Page 10: Streaming data with Apache Kafka

Consumer offsets

● previously kept in ZooKeeper
● from 0.8, offsets can be stored in a special Kafka topic
● brokers are dumb - consumers track their own offsets
● ZK only needed to keep track of cluster state
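A sketch of what "consumers track their offsets" means in practice. The `OffsetStore` class below is a hypothetical in-memory stand-in for wherever offsets are kept (ZooKeeper or the special Kafka topic); the key layout of group, topic and partition is an assumption for illustration. The point is that a restarted consumer resumes from its last committed offset instead of from the beginning.

```python
class OffsetStore:
    """Hypothetical stand-in for the place where committed offsets live."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, topic, partition, next_offset):
        self._committed[(group, topic, partition)] = next_offset

    def fetch(self, group, topic, partition):
        # A group that never committed starts at offset 0 in this sketch.
        return self._committed.get((group, topic, partition), 0)


def consume(records, store, group, topic, partition, max_records):
    """Read one batch from the committed position, then commit the new position."""
    start = store.fetch(group, topic, partition)
    batch = records[start:start + max_records]
    store.commit(group, topic, partition, start + len(batch))
    return batch


records = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()

before_restart = consume(records, store, "billing", "A", 0, max_records=2)
# The consumer process could die here; only the store keeps its position.
after_restart = consume(records, store, "billing", "A", 0, max_records=2)
other_group = consume(records, store, "search", "A", 0, max_records=2)
```

The second call continues with `m2` and `m3`, while a different group starts from `m0` because its position is tracked separately.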

Page 11: Streaming data with Apache Kafka

Retention

● size based
● time based
● per-topic
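The two retention modes can be sketched as a pruning function over log segments. `Segment` and `apply_retention` are hypothetical names, and the logic is simplified: a real broker has additional rules, for example around the segment currently being written. In Kafka the corresponding per-topic settings are `retention.ms` and `retention.bytes`; the sketch takes seconds and bytes as plain arguments instead.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    created_at: float  # seconds since some epoch
    size_bytes: int


def apply_retention(segments, now, retention_seconds=None, retention_bytes=None):
    """Return the segments that survive. Input must be ordered oldest first."""
    kept = list(segments)
    if retention_seconds is not None:
        # Time based: drop everything older than the allowed age.
        kept = [s for s in kept if now - s.created_at <= retention_seconds]
    if retention_bytes is not None:
        # Size based: drop the oldest segments until the total fits.
        total = sum(s.size_bytes for s in kept)
        while kept and total > retention_bytes:
            total -= kept.pop(0).size_bytes
    return kept


segments = [Segment(0, 100), Segment(50, 100), Segment(90, 100)]

by_time = apply_retention(segments, now=100, retention_seconds=60)
by_size = apply_retention(segments, now=100, retention_bytes=150)
```

Here `by_time` keeps the two newest segments and `by_size` keeps only the newest one. Calling the function with different limits for each topic gives the per-topic behaviour from the slide.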

Page 12: Streaming data with Apache Kafka

Ordered and immutable

● strict ordering within partition
● low and high level consumers in 0.8
● number of partitions = parallelism
● 1 partition = max 1 consumer thread (within a consumer group)
● 1 consumer thread = one or more partitions

Page 13: Streaming data with Apache Kafka

Impressive performance

● 3 cheap machines
  ○ 6-core Intel Xeon 2.5GHz, 32GB RAM, 7 x 7200 RPM SATA, Gb ethernet
  ○ 3 async producers (1 per machine, since network became saturated)
● 2M msgs/s
● tens of milliseconds latency end-to-end

Page 14: Streaming data with Apache Kafka

Impressive volumes: Kafka at LinkedIn

● 600 brokers
● 30k topics
● 300k partitions (not counting replicated ones)
● Trillion messages per day
● 120 TB/day in & 500 TB/day out
● Peak load:
  ○ 6M messages/s
  ○ 15 Gb/s in & 60 Gb/s out
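To relate the daily volumes to the peak rates, the daily figures can be converted to an average bit rate. The sketch assumes that TB means 10^12 bytes and that Gb/s means gigabits per second; the slide does not state either, so the results are only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_gbit_per_s(tb_per_day):
    """Average rate in gigabits per second, assuming 1 TB = 10**12 bytes."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


avg_in = tb_per_day_to_gbit_per_s(120)    # about 11.1 Gb/s
avg_out = tb_per_day_to_gbit_per_s(500)   # about 46.3 Gb/s
```

Under those assumptions the quoted peaks of 15 Gb/s in and 60 Gb/s out are roughly 1.3 times the daily averages.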
