Apache Kafka at LinkedIn


TRANSCRIPT


Apache Kafka at LinkedIn
Guozhang Wang, Kafka Team, Data Infrastructure
BDTC 2014

About Me

Agenda

Overview of Kafka

Kafka Design

Kafka Usage at LinkedIn

Roadmap

Q & A

Why Did We Build Kafka?

We Have a Lot of Data

User activity tracking: page views, ad impressions, etc.

Server logs and metrics: syslogs, request rates, etc.

Messaging: emails, news feeds, etc.

Computation-derived data: results of Hadoop / data warehousing, etc.

As a data-serving website, LinkedIn has a lot of data.

.. and We Build Products on Data

Newsfeed

Recommendation

Based on relevance.

Recommendation

Search

Metrics and Monitoring

We have this variety of data, and we need to build all these products around it.

.. and a LOT of Monitoring

The Problem:

How to integrate this variety of data and make it available to all products?

Life Back in 2010:

Point-to-Point Pipelines

Messaging: ActiveMQ
User activity: in-house log aggregation
Logging: Splunk
Metrics: JMX => Zenoss
Database data: Databus, custom ETL

Example: User Activity Data Flow

What We Want

A centralized data pipeline

Apache Kafka

We tried some systems off the shelf, but they did not fly (ActiveMQ, for example).

What We REALLY Want

A centralized data pipeline that is:

Elastically scalable

Durable

High-throughput

Easy to use

A distributed pub-sub messaging system

Scales out from the ground up

Persists messages to disk

High throughput (tens of MB/sec per server)

Apache Kafka

Life Since Kafka in Production

Apache Kafka

Developed and maintained by 5 devs + 2 SREs.

Now you may be wondering why it works so well: for example, why can it be highly durable, persisting data to disk, while still maintaining high throughput?

Agenda

Overview of Kafka

Kafka Design

Kafka Usage at LinkedIn

Roadmap

Q & A

Key Idea #1:

Data parallelism leads to scale-out.

Produce/consume requests are randomly balanced among brokers.

Distribute Clients across Partitions

Topic = message stream. A topic has partitions, and partitions are distributed across brokers (a minimal partitioning sketch follows).
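To make the data-parallelism idea concrete, here is a minimal, illustrative Java sketch of key-based partitioning. It is not Kafka's actual partitioner (the real clients use their own hash functions), only the idea that a key deterministically maps to one of N partitions, so the same key always lands in the same partition while different keys spread the load across brokers.

public final class PartitionSketch {
    // Illustrative only: map a message key to one of numPartitions partitions.
    static int partitionFor(String key, int numPartitions) {
        // Clear the sign bit so the modulo result is a valid partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // All events for the same member id go to the same partition,
        // while different members spread across the 8 partitions.
        System.out.println("member-42 -> partition " + partitionFor("member-42", 8));
        System.out.println("member-43 -> partition " + partitionFor("member-43", 8));
    }
}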

Key Idea #2:

Disks are fast when used sequentially.

Do not be afraid of disks.

Appends are effectively O(1).

Reads from a known offset are still fast when cached.

Store Messages as a Log

[Diagram: partition i of topic A as an append-only log; the producer writes at the tail while consumer 1 and consumer 2 each read from offset 7.]

File-system caching does the heavy lifting (a toy sketch of the log abstraction follows).
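As a toy illustration of the log abstraction above, the sketch below keeps one partition as an in-memory list: appends go to the tail in O(1), and each consumer simply advances its own offset. The class and method names are invented for illustration; the real log lives on disk and relies on sequential I/O and the file-system cache.

import java.util.ArrayList;
import java.util.List;

public final class PartitionLogSketch {
    private final List<String> messages = new ArrayList<>();

    // Append at the tail and return the offset the message landed at.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // A read is just a lookup at the consumer's current offset.
    String read(long offset) {
        return messages.get((int) offset);
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        log.append("page-view-1");
        log.append("page-view-2");
        // Two consumers read independently; neither mutates the log.
        long consumer1Offset = 0, consumer2Offset = 1;
        System.out.println(log.read(consumer1Offset));
        System.out.println(log.read(consumer2Offset));
    }
}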

Key Idea #3:

Batching makes the best use of network and I/O.

Batched send and receive

Batched compression

No message caching in JVM

Zero-copy from file to socket (Java NIO); see the sketch below
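The zero-copy point refers to Java NIO's FileChannel.transferTo, which lets the broker hand bytes from a cached log segment file to a socket without copying them through user space. Below is a minimal standalone sketch, not Kafka's code; the file path, host, and port are placeholders.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public final class ZeroCopySketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(
                 Paths.get("/tmp/partition-0.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                 new InetSocketAddress("consumer-host", 9092))) {
            long position = 0;
            long remaining = log.size();
            // transferTo may send fewer bytes than requested, so loop until done.
            while (remaining > 0) {
                long sent = log.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}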

Batch Transfer

And finally, after all these tricks, the client interface we expose to users is very simple.

The API (0.8)

Producer: send(topic, message)

Consumer:

Iterable stream = createMessageStreams().get(topic)

for (message : stream) { // process the message }
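Filling out the pseudocode above, the sketch below shows roughly how the 0.8-era Java producer and high-level consumer were used. It is written from memory of the old kafka.javaapi / kafka.consumer classes (since replaced by org.apache.kafka.clients), so treat the class and property names as a sketch; the broker list, ZooKeeper address, topic, and group id are placeholders.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public final class Kafka08ApiSketch {
    public static void main(String[] args) {
        // Producer: send(topic, message)
        Properties p = new Properties();
        p.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder
        p.put("serializer.class", "kafka.serializer.StringEncoder");
        Producer<String, String> producer = new Producer<>(new ProducerConfig(p));
        producer.send(new KeyedMessage<>("page-views", "hello kafka"));
        producer.close();

        // Consumer: iterate over one stream of the topic
        Properties c = new Properties();
        c.put("zookeeper.connect", "zk1:2181"); // placeholder
        c.put("group.id", "example-group");     // placeholder
        ConsumerConnector consumer =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(c));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            consumer.createMessageStreams(Collections.singletonMap("page-views", 1));
        ConsumerIterator<byte[], byte[]> it = streams.get("page-views").get(0).iterator();
        while (it.hasNext()) {
            // process the message (blocks forever; consumer.shutdown() omitted here)
            System.out.println(new String(it.next().message()));
        }
    }
}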

Agenda

Overview of Kafka

Kafka Design

Kafka Usage at LinkedIn

Pipeline deployment

Schema for data cleanliness

O(1) ETL

Auditing for correctness

Roadmap

Q & A

Now I will switch gears and talk a bit about Kafka usage at LinkedIn.

Kafka Usage at LinkedIn

Mainly used for tracking user-activity and metrics data

16 - 32 brokers in each cluster (615+ total brokers)

527 billion messages/day

7500+ topics, 270k+ partitions

Byte rates: writes 97 TB/day, reads 430 TB/day (as of October 21st)
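For a rough sense of scale: 527 billion messages/day averages out to about 6 million messages per second, 97 TB/day of writes is a bit over 1 GB/s sustained, and 430 TB/day of reads is roughly 5 GB/s.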

Kafka Usage at LinkedIn

[Diagram slides: the Kafka pipeline deployment at LinkedIn, including the multi-colo setup.]

Agenda

Overview of Kafka

Kafka Design

Kafka Usage at LinkedIn

Pipeline deployment

Schema for data cleanliness

O(1) ETL

Auditing for correctness

Roadmap

Q & A

Problems

Hundreds of message types

Thousands of fields

What do they all mean?

What happens when they change?

Standardized Schema on Avro

Schema (an example sketch follows the workflow list):

Message structure contract

Performance gain

Workflow

Check in schema

Auto compatibility check

Code review

Ship it!
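As a concrete, hypothetical example of such a schema contract, the sketch below defines a small page-view record using the Avro Java library. The record and field names are invented for illustration; the optional referrer field with a default value is the kind of additive change that an automatic compatibility check would allow.

import org.apache.avro.Schema;

public final class PageViewSchemaExample {
    // Hypothetical schema; not one of LinkedIn's actual message types.
    static final String PAGE_VIEW_SCHEMA = "{"
        + "\"type\": \"record\","
        + "\"name\": \"PageViewEvent\","
        + "\"fields\": ["
        + "  {\"name\": \"memberId\",  \"type\": \"long\"},"
        + "  {\"name\": \"pageKey\",   \"type\": \"string\"},"
        + "  {\"name\": \"timestamp\", \"type\": \"long\"},"
        // A new optional field with a default keeps old readers compatible.
        + "  {\"name\": \"referrer\",  \"type\": [\"null\", \"string\"], \"default\": null}"
        + "]}";

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(PAGE_VIEW_SCHEMA);
        System.out.println(schema.getFullName() + " has "
            + schema.getFields().size() + " fields");
    }
}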

Agenda

Overview of Kafka

Kafka Design

Kafka Usage at LinkedIn

Pipeline deployment

Schema for data cleanliness

O(1) ETL

Auditing for correctness

Roadmap

Q & A

Kafka to Hadoop

Hadoop ETL (Camus)

Map/Reduce job does the data load

One job loads all events

~10 minute ETA on average from producer to HDFS

Hive registration done automatically

Schema evolution handled transparently

Open sourced: https://github.com/linkedin/camus

Agenda

Overview of Kafka

Kafka Design

Kafka Usage at LinkedIn

Pipeline deployment

Schema for data cleanliness

O(1) ETL

Auditing for correctness

Roadmap

Q & A

Does It Really Work?

All published messages must be delivered to all consumers (quickly).

Audit Trail

[Audit trail diagram: end-to-end completeness is 99.99%.]

More Features in Kafka 0.8

Intra-cluster replication (0.8.0)

High availability

Reduced latency

Log compaction (0.8.1): state storage

Operational tools (0.8.2): topic management

Automated leader rebalance

etc ..

Check out our page for more: http://kafka.apache.org/

0.8.2:

Delete topic, automated leader rebalancing, controlled shutdown, offset management, parallel recovery, min.isr and clean leader election

Kafka 0.9

Clients Rewrite

Remove ZK dependency

Even better throughput

Security

More operability, multi-tenancy ready

Transactional messaging

From at-least-once to exactly-once

Check out our page for more: http://kafka.apache.org/

Kafka Users: Next Maybe You?

Non-Java / Scala clients:

C / C++ / .NET, Go, Clojure, Ruby, Node.js, PHP, Python, Erlang, HTTP REST, command line, etc.

https://cwiki.apache.org/confluence/display/KAFKA/Clients

Python: pure Python implementation with full protocol support; consumer and producer included; GZIP and Snappy compression supported.
C: high-performance C library with full protocol support.
C++: native C++ library with protocol support for Metadata, Produce, Fetch, and Offset.
Go (aka golang): pure Go implementation with full protocol support; consumer and producer included; GZIP and Snappy compression supported.
Ruby: pure Ruby; consumer and producer included; GZIP and Snappy compression supported; Ruby 1.9.3 and up (CI runs MRI 2…).
Clojure: Clojure DSL for the Kafka API.
JavaScript (Node.js): pure JavaScript client.
Command line: stdin & stdout.

Acknowledgements


Questions?
Guozhang Wang
guwang@linkedin.com
www.linkedin.com/in/guozhangwang

Backup Slides

Real-time Analysis with Kafka

Analytics from Hadoop can be slow:
Production -> Kafka: tens of milliseconds
Kafka -> Hadoop: < 1 minute
ETL in Hadoop: ~45 minutes
MapReduce in Hadoop: maybe hours

Real-time Analysis with Kafka

Solution No. 1: directly consuming from Kafka

Solution No. 2: storage other than HDFS (Spark, Shark, Pinot, Druid, FastBit)

Solution No. 3: stream processing (Apache Samza, Storm)

How Fast Can Kafka Go?

Bottleneck #1: network bandwidth

Producer: ~100 MB/s on 1 Gigabit Ethernet

Consumers can be slower due to multiple subscriptions

Bottleneck #2: disk space

Data may be deleted before it is consumed at peak times; retention is a configurable time/size-based policy

Bottleneck #3: Zookeeper

Mainly due to offset commits; will be lifted in 0.9

Intra-cluster Replication

Pick CA within a datacenter (failover < 10 ms)

Network partitions are rare; latency is less of an issue

Separate data replication and consensus

Consensus => ZooKeeper

Replication => primary-backup (f replicas tolerate f-1 failures)

Configurable ACK (durability vs. latency); see the config sketch after this list

More details:

http://www.slideshare.net/junrao/kafka-replication-apachecon2013
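A small sketch of the configurable-ACK knob as it appeared in the 0.8 producer configuration; the property name is recalled from the old docs, so verify it against your version. Acks of 0 or 1 favor latency, while -1 waits for the full in-sync replica set and favors durability.

import java.util.Properties;

public final class AcksConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092"); // placeholder
        // 0 = fire and forget, 1 = leader ack only, -1 = wait for all in-sync replicas
        props.put("request.required.acks", "-1"); // favor durability over latency
        System.out.println(props);
    }
}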

Replication Architecture

[Diagram: producers and consumers connect to a cluster of brokers, coordinated by ZooKeeper.]
