couchbase meetup jan 2016

Post on 10-Feb-2017

Michael Kehoe, Senior Site Reliability Engineer, LinkedIn

LinkedIn’s Big Data Pipeline with Kafka, Hadoop and Couchbase

$ whoami

Michael Kehoe

• Sr Site Reliability Engineer (SRE)

• Member of CBVT

• B.E. (Electrical Engineering) from the University of Queensland, Australia

Kafka @ LinkedIn

• Kafka was created by LinkedIn

• Kafka is a publish-subscribe messaging system built as a distributed commit log

• Processes 500+ TB/day (~500 billion messages)
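To put the two headline figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes), takes "500+" and "~500 billion" as exactly 500, and spreads the load evenly over 24 hours, which real traffic will not do.

```python
# Back-of-the-envelope figures derived from the slide's two numbers.
# Assumptions: decimal units (1 TB = 10**12 bytes), load spread evenly over a day.
BYTES_PER_DAY = 500 * 10**12      # "500+ TB/day", taken as exactly 500 TB
MESSAGES_PER_DAY = 500 * 10**9    # "~500 billion messages"
SECONDS_PER_DAY = 24 * 60 * 60

avg_message_bytes = BYTES_PER_DAY / MESSAGES_PER_DAY
messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY
bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY

print(f"average message size: {avg_message_bytes:.0f} bytes")
print(f"average rate: {messages_per_second / 1e6:.1f} million messages/s")
print(f"average throughput: {bytes_per_second / 1e9:.1f} GB/s")
```

Under these assumptions the average message is about 1 KB, and the sustained average is a little under 6 million messages per second.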

LinkedIn’s use of Kafka

• Monitoring

• Pub-Sub Messaging

• Analytics

• Building block for (log) distributed applications

• Samza

• Espresso

• Pinot

Kafka to Hadoop (Analytics) Use Case

• LinkedIn tracks data to better understand how members use our products

• Information such as which page got viewed and which content got clicked on is sent into a Kafka cluster in each data center

• Some of these events are centrally collected and pushed onto our Hadoop grid for analysis and daily report generation
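The flow described above (per-data-center events, central collection, daily reports) can be illustrated with a small in-memory sketch. This is not LinkedIn's pipeline: the `TrackingEvent` fields, the `daily_report` function, and the sample data are all hypothetical, and plain Python lists stand in for the Kafka clusters and the Hadoop grid.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingEvent:
    """Hypothetical shape of a tracking event; field names are assumptions."""
    datacenter: str   # data center whose Kafka cluster received the event
    event_type: str   # e.g. "page_view" or "click"
    target: str       # page or content identifier
    day: str          # ISO date, e.g. "2016-01-20"


def daily_report(events):
    """Count events per (day, event_type, target) across all data centers."""
    return Counter((e.day, e.event_type, e.target) for e in events)


# Events as they might arrive from two data centers (made-up sample data).
events = [
    TrackingEvent("dc-a", "page_view", "/feed", "2016-01-20"),
    TrackingEvent("dc-b", "page_view", "/feed", "2016-01-20"),
    TrackingEvent("dc-a", "click", "article-123", "2016-01-20"),
]

report = daily_report(events)
for (day, event_type, target), count in sorted(report.items()):
    print(day, event_type, target, count)
```

The point of the sketch is the shape of the aggregation: once events from every data center land in one place, the daily report is a group-by over the event attributes.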

Couchbase @ LinkedIn

• About 80 separate services with one or more clusters in multiple data centers

• Up to ~70 servers in a cluster

• Single & multi-tenant clusters

Hadoop to Couchbase

• Our primary use case for Hadoop to Couchbase is building (warming) and restoring Couchbase buckets

• LinkedIn built its own in-house solution to work with our ETL processes, etc.

Jobs Cluster: Clusters & Numbers

• Used for read-scaling, >150k QPS, 27-node clusters

• We use Hadoop to pre-build data by partition

• Couchbase average latency is 2-3 ms

• 99th percentile is ~8-12 ms
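As an illustration of what "pre-build data by partition" could look like, the sketch below groups records into per-partition batches before loading. It is a minimal sketch under stated assumptions, not the in-house tool: `partition_for` and `prebuild_by_partition` are hypothetical names, the partition count of 1024 is assumed to match Couchbase's default vBucket count, and the plain CRC32-modulo hash is a stand-in that is not claimed to match Couchbase's actual key-to-vBucket mapping.

```python
import zlib
from collections import defaultdict

# Assumption: 1024 partitions, chosen to mirror Couchbase's default vBucket count.
NUM_PARTITIONS = 1024


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a CRC32-based hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def prebuild_by_partition(records):
    """Group (key, value) pairs into per-partition batches ready for bulk loading."""
    batches = defaultdict(list)
    for key, value in records:
        batches[partition_for(key)].append((key, value))
    return dict(batches)


# Made-up sample records standing in for the output of a Hadoop job.
records = [(f"member:{i}", {"score": i}) for i in range(10_000)]
batches = prebuild_by_partition(records)
print(f"{len(batches)} partitions, {sum(len(b) for b in batches.values())} records")
```

Grouping by partition ahead of time means each batch can be written to the node that owns that partition, so the load step does not have to route keys one at a time.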

Questions?

Thank You

©2014 LinkedIn Corporation. All Rights Reserved.
