Apache Kafka at LinkedIn

Jay Kreps Introduction to Apache Kafka

Post on 19-Aug-2014

DESCRIPTION

Jay Kreps is a Principal Staff Engineer at LinkedIn, where he is the lead architect for online data infrastructure. He is among the original authors of several open source projects, including a distributed key-value store called Project Voldemort, a messaging system called Kafka, and a stream processing system called Samza. This talk gives an introduction to Apache Kafka, a distributed messaging system. It covers both how Kafka works and how it is used at LinkedIn for log aggregation, messaging, ETL, and real-time stream processing.

TRANSCRIPT

Page 1: Apache Kafka at LinkedIn

Jay Kreps
Introduction to Apache Kafka

Page 2: Apache Kafka at LinkedIn

The Plan
1. What is Apache Kafka?
2. Kafka and Data Integration
3. Kafka and Stream Processing

Page 3: Apache Kafka at LinkedIn

Apache Kafka

Page 4: Apache Kafka at LinkedIn

A brief history of Apache Kafka

Page 5: Apache Kafka at LinkedIn

Characteristics
• Scalability of a filesystem
  – Hundreds of MB/sec/server throughput
  – Many TB per server
• Guarantees of a database
  – Messages strictly ordered
  – All data persistent
• Distributed by default
  – Replication
  – Partitioning model

Page 6: Apache Kafka at LinkedIn

Kafka is about logs

Page 7: Apache Kafka at LinkedIn

What is a log?

Page 8: Apache Kafka at LinkedIn

Page 9: Apache Kafka at LinkedIn

Page 10: Apache Kafka at LinkedIn

Logs: pub/sub done right
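The log idea on these slides can be sketched as an append-only sequence in which every record gets an offset and every subscriber tracks its own read position. This is a minimal in-memory illustration, not Kafka's implementation; the class and method names (Log, Subscriber, append, read, poll) are hypothetical.

```python
class Log:
    """Append-only sequence of records; each record is identified by its offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset assigned to it."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Subscriber:
    """Reads from a shared log and remembers its own position (hypothetical helper)."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        """Fetch the next batch and advance this subscriber's offset."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = Log()
for event in ["a", "b", "c"]:
    log.append(event)

# Two independent subscribers read the same log at their own pace.
fast = Subscriber(log)
slow = Subscriber(log)
fast_batch = fast.poll()
slow_batch = slow.poll(max_records=1)
```

Because the log keeps the records and each subscriber keeps only an offset, a slow subscriber does not hold back a fast one, and either can re-read by resetting its offset.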

Page 11: Apache Kafka at LinkedIn

Partitioning

Page 12: Apache Kafka at LinkedIn

Nodes Host Many Partitions

Page 13: Apache Kafka at LinkedIn

Producers Balance Load

Page 14: Apache Kafka at LinkedIn

Consumers Divide Up Partitions
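The partitioning slides can be illustrated with a small sketch: a producer maps each key to a partition, and the consumers in a group split the partitions among themselves. Everything here is an assumption made for illustration: the CRC32 hash is a stand-in rather than Kafka's own partitioner, the round-robin assignment is only one possible strategy, and the function names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (stand-in, not Kafka's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Divide partitions among consumers round-robin (one possible strategy)."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# All records with the same key land in the same partition, so their order is preserved.
p1 = partition_for("user-42", 8)
p2 = partition_for("user-42", 8)

# Each partition is owned by exactly one consumer in the group.
assignment = assign_partitions(4, ["c1", "c2"])
```

With four partitions and two consumers, each consumer ends up owning two partitions, and no partition is read by both.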

Page 15: Apache Kafka at LinkedIn

End-to-End

Page 16: Apache Kafka at LinkedIn

Kafka At LinkedIn
• 175 TB of in-flight log data per colo
• Replicated to each datacenter
• Tens of thousands of data producers
• Thousands of consumers
• 7 million messages written/sec
• 35 million messages read/sec
• Hadoop integration

Page 17: Apache Kafka at LinkedIn

Performance
• Producer (3x replication):
  – Async: 786,980 records/sec (75.1 MB/sec)
  – Sync: 421,823 records/sec (40.2 MB/sec)
• Consumer:
  – 940,521 records/sec (89.7 MB/sec)
• End-to-end latency:
  – 2 ms (median)
  – 14 ms (99.9th percentile)
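Each throughput figure above pairs a record rate with a byte rate, so the average record size can be backed out by division. The sketch below assumes "MB" on the slide means 2^20 bytes, which the slide does not state; the function name is illustrative.

```python
MIB = 1024 * 1024  # assumption: the slide's "MB" means 2**20 bytes


def implied_record_size(records_per_sec, mb_per_sec, bytes_per_mb=MIB):
    """Average bytes per record implied by a record rate and a byte rate."""
    return mb_per_sec * bytes_per_mb / records_per_sec


async_size = implied_record_size(786_980, 75.1)
sync_size = implied_record_size(421_823, 40.2)
consumer_size = implied_record_size(940_521, 89.7)

for label, size in [("async producer", async_size),
                    ("sync producer", sync_size),
                    ("consumer", consumer_size)]:
    print(f"{label}: {size:.1f} bytes/record")
```

Under that assumption, all three figures work out to roughly 100 bytes per record, which suggests the numbers describe small messages rather than large payloads.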

Page 18: Apache Kafka at LinkedIn

Page 19: Apache Kafka at LinkedIn

The Plan
1. What is Apache Kafka?
2. Kafka and Data Integration
3. Kafka and Stream Processing

Page 20: Apache Kafka at LinkedIn

Data Integration

Page 21: Apache Kafka at LinkedIn

Maslow’s Hierarchy

Page 22: Apache Kafka at LinkedIn

For Data

Page 23: Apache Kafka at LinkedIn

New Types of Data
• Database data
  – Users, products, orders, etc.
• Events
  – Clicks, impressions, pageviews, etc.
• Application metrics
  – CPU usage, requests/sec
• Application logs
  – Service calls, errors

Page 24: Apache Kafka at LinkedIn

New Types of Systems
• Live Stores
  – Voldemort
  – Espresso
  – Graph
  – OLAP
  – Search
  – InGraphs
• Offline
  – Hadoop
  – Teradata

Page 25: Apache Kafka at LinkedIn

Bad

Page 26: Apache Kafka at LinkedIn

Good

Page 27: Apache Kafka at LinkedIn

Example: User views job

Page 28: Apache Kafka at LinkedIn

Comparing Data Transfer Mechanisms

Page 29: Apache Kafka at LinkedIn

The Plan
1. What is Apache Kafka?
2. Kafka and Data Integration
3. Kafka and Stream Processing

Page 30: Apache Kafka at LinkedIn

Stream Processing

Page 31: Apache Kafka at LinkedIn

Stream processing is a generalization of batch processing

Page 32: Apache Kafka at LinkedIn

Stream Processing = Logs + Jobs
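One way to read "Logs + Jobs" is that a job consumes records from an input log starting at a saved offset, transforms them, and appends the results to an output log that other jobs can consume in turn. The sketch below uses plain Python lists as stand-ins for logs; run_job and its parameters are hypothetical and are not Samza's API.

```python
def run_job(input_log, output_log, start_offset, transform):
    """Process every record from start_offset onward and return the next offset to read."""
    offset = start_offset
    while offset < len(input_log):
        output_log.append(transform(input_log[offset]))
        offset += 1
    return offset


page_views = ["home", "jobs"]   # input log (stand-in)
normalized = []                 # output log (stand-in)

# First run processes everything written so far.
next_offset = run_job(page_views, normalized, 0, str.upper)

# New data arrives later; the job resumes from its saved offset.
page_views.append("profile")
next_offset = run_job(page_views, normalized, next_offset, str.upper)
```

A batch job is then the special case in which the input log is finite and the job runs once from offset 0 to the end.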

Page 33: Apache Kafka at LinkedIn

Examples
• Monitoring
• Security
• Content processing
• Recommendations
• Newsfeed
• ETL

Page 34: Apache Kafka at LinkedIn

Frameworks Can Help

Page 35: Apache Kafka at LinkedIn

Samza Architecture

Page 36: Apache Kafka at LinkedIn

Log-centric Architecture

Page 37: Apache Kafka at LinkedIn

Kafka: http://kafka.apache.org

Samza: http://samza.incubator.apache.org

Log Blog: http://linkd.in/199iMwY

Benchmark: http://t.co/40fkKJvanx

Me: http://www.linkedin.com/in/jaykreps

@jaykreps
