Apache Kafka at LinkedIn
Posted on 19-Aug-2014
Description: Jay Kreps is a Principal Staff Engineer at LinkedIn, where he is the lead architect for online data infrastructure. He is among the original authors of several open source projects, including a distributed key-value store called Project Voldemort, a messaging system called Kafka, and a stream processing system called Samza. This talk gives an introduction to Apache Kafka, a distributed messaging system. It covers both how Kafka works and how it is used at LinkedIn for log aggregation, messaging, ETL, and real-time stream processing.
Jay Kreps
Introduction to Apache Kafka

The Plan
  1. What is Apache Kafka?
  2. Kafka and Data Integration
  3. Kafka and Stream Processing

Apache Kafka

A brief history of Apache Kafka

Characteristics
  Scalability of a filesystem
    Hundreds of MB/sec/server throughput
    Many TB per server
  Guarantees of a database
    Messages strictly ordered
    All data persistent
  Distributed by default
    Replication
    Partitioning model

Kafka is about logs

What is a log?

Logs: pub/sub done right

Partitioning

Nodes Host Many Partitions

Producers Balance Load

Consumers Divide Up Partitions

End-to-End

Kafka At LinkedIn
  175 TB of in-flight log data per colo
  Replicated to each datacenter
  Tens of thousands of data producers
  Thousands of consumers
  7 million messages written/sec
  35 million messages read/sec
  Hadoop integration

Performance
  Producer (3x replication):
    Async: 786,980 records/sec (75.1 MB/sec)
    Sync: 421,823 records/sec (40.2 MB/sec)
  Consumer: 940,521 records/sec (89.7 MB/sec)
  End-to-end latency:
    2 ms (median)
    14 ms (99.9th percentile)

The Plan
  1. What is Apache Kafka?
  2. Kafka and Data Integration
  3. Kafka and Stream Processing

Data Integration

Maslow's Hierarchy For Data

New Types of Data
  Database data
    Users, products, orders, etc.
  Events
    Clicks, impressions, pageviews, etc.
  Application metrics
    CPU usage, requests/sec
  Application logs
    Service calls, errors

New Types of Systems
  Live Stores
    Voldemort
    Espresso
    Graph
    OLAP
    Search
    InGraphs
  Offline
    Hadoop
    Teradata

Bad

Good

Example: User views job

Comparing Data Transfer Mechanisms

The Plan
  1. What is Apache Kafka?
  2. Kafka and Data Integration
  3. Kafka and Stream Processing

Stream Processing

Stream processing is a generalization of batch processing

Stream Processing = Logs + Jobs

Examples
  Monitoring
  Security
  Content processing
  Recommendations
  Newsfeed
  ETL

Frameworks Can Help

Samza Architecture

Log-centric Architecture

Kafka: http://kafka.apache.org
Samza: http://samza.incubator.apache.org
Log Blog: http://linkd.in/199iMwY
Benchmark: http://t.co/40fkKJvanx
Me: http://www.linkedin.com/in/jaykreps @jaykreps
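
The slides on logs and partitioning ("What is a log?", "Producers Balance Load", "Consumers Divide Up Partitions") survive here only as titles, so the following is a minimal Python sketch of the idea rather than anything taken from the talk. It assumes a log is an append-only sequence addressed by offset, that producers pick a partition by hashing a key, and that consumers split the partitions among themselves. The names PartitionedLog and assign_partitions are hypothetical, and the CRC32 hash and round-robin assignment are stand-ins chosen for illustration, not Kafka's actual partitioner or assignment protocol.

```python
import zlib


class PartitionedLog:
    """Toy in-memory stand-in for a partitioned topic (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list; a record's index is its offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Hash the key so records with the same key land in the same partition,
        # which keeps them strictly ordered relative to each other.
        # CRC32 is an assumption made for this sketch.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        # A consumer tracks its own offset and reads everything from there on.
        return self.partitions[partition][offset:]


def assign_partitions(num_partitions, consumers):
    """Divide partitions among consumers round-robin (illustrative only)."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


log = PartitionedLog(num_partitions=4)
first_partition, first_offset = log.append("user-1", "views job")
second_partition, second_offset = log.append("user-1", "applies to job")

# Both records share a key, so they share a partition and keep their order.
records = log.read(first_partition, first_offset)
owners = assign_partitions(4, ["consumer-a", "consumer-b"])
```

Because ordering is only maintained within a partition in this sketch, the choice of key decides which records are ordered relative to one another; records with different keys may land in different partitions and be read by different consumers.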