Processing Changes from Couchbase in Kafka at PayPal


Upload: shibi-sudhakaran

Post on 15-Jul-2015


© 2014 PayPal Inc. All rights reserved.

Streaming data from Couchbase to Hadoop using Kafka

https://github.com/paypal/couchbasekafka

Shibi Sudhakaran @s007

Vinoth Kumar Pothapu @vinothtwit


100 CURRENCIES SUPPORTED

155M ACTIVE REGISTERED ACCOUNTS

203 MARKETS OFFER PAYPAL

EUROPEAN UNION EURO
AUSTRALIAN DOLLAR
CANADIAN DOLLAR
NEW ZEALAND DOLLAR
HUNGARIAN FORINT
MALAYSIAN RINGGIT
UNITED KINGDOM POUNDS STERLING
HONG KONG DOLLAR
UNITED STATES DOLLAR
TAIWAN NEW DOLLAR
CHINESE RMB
SWEDISH KRONA
SINGAPORE DOLLAR
PHILIPPINE PESO
BRAZILIAN REAL
RUSSIAN RUBLE
NORWEGIAN KRONE
JAPANESE YEN
MEXICAN PESO
TURKISH LIRA
SWISS FRANC
CZECH KORUNA
ISRAELI NEW SHEKEL
DANISH KRONE
THAI BAHT
POLISH ZLOTY


$1 in every $6 spent on e-commerce is spent through PayPal.*

Problem

Transfer data from Couchbase to Hadoop


Available Alternatives

Snapshot: needs a local Hadoop, not real time.

https://github.com/paypal/cbflume

PUSH – PUSH: the Flume agent determines the rate at which messages are sent.


Solution: Couchbase Kafka Adapter


https://github.com/paypal/couchbasekafka

config.properties
Contains settings for the TAP API and the adapter:
cb.cbserver, cb.fulldump, CBMessageConverter, monitoringEnabled, sherlockThreshold

kafkaconfig.properties
metadata.broker.list - Kafka brokers
partitioner.class - topic partition logic
request.required.acks - 0 means that the producer never waits for an acknowledgement
cookie.topic - topic to publish messages to
producer.type - async/sync
batch.size
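
To make the kafkaconfig.properties keys concrete, here is a minimal sketch in Python that parses a properties-style text and reads a few of them. Only the key names come from the slide. Every value (broker hosts, partitioner class name, topic name, batch size) is a hypothetical placeholder, and `parse_properties` is an illustrative helper, not the adapter's own loader.

```python
# Hypothetical sample file: key names are from the slide, values are made up.
SAMPLE = """\
# Kafka brokers
metadata.broker.list=broker1:9092,broker2:9092
# Topic partition logic (hypothetical class name)
partitioner.class=com.example.CookiePartitioner
# 0 means the producer never waits for an acknowledgement
request.required.acks=0
# Topic to publish messages to (hypothetical topic name)
cookie.topic=cookie-events
producer.type=async
batch.size=200
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(SAMPLE)
brokers = props["metadata.broker.list"].split(",")
# With acks set to 0 the producer is fire-and-forget: fast, but sends can be lost.
fire_and_forget = props["request.required.acks"] == "0"
```

With `request.required.acks=0` and `producer.type=async`, throughput is favored over delivery guarantees on the producer side, which is worth keeping in mind when choosing these values.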


Use case: Cookie


Node

Others

C++

Java

Persistent & Plain Text

Session & Plain Text

Persistent & Encrypted

Session & Encrypted

Functional View


Cookie Service

Couchbase DC A

Couchbase DC B

Front Tier

Customers

Application

Cookie Libraries

Mid Tier

Data Tier

XDCR

Couchbase Client


Why Couchbase?


Data volume / Scalability
• Online system – 1B documents
• 4-10k size; 5-10 TB total storage
• Linearly scalable

Availability
• Multi data center – DR
• Availability requirement of 99.99%

Data Structure
• Flexible & schema-less; document based

Performance
• 50% read / 50% write
• Low latency < 5-10 msec


What lives inside Couchbase Cluster


Cookie

User

Session

Map

Cookie

User

Migration

Status

Cookie

Audit

Cookie

Metrics

Cluster Overview


Cookie Service

Cookie Service

Cookie Service

XDCR

Active

Write

Read

Deployment Model


Bi-directional Uni-directional

Active Passive


Why do we need to extract data?


Data Driven Cookie Migration

Use immutable commit logs for batch and real-time computing.

• Analytics/real time monitoring for cookie usage/hacks.

• Enrich data by merging with Tracking, Profile, Data Science etc.

• Real time alerting for Risk


Enter… KAFKA

publish-subscribe messaging rethought as a distributed commit log.

Fast

Scalable

Durable

Distributed


KAFKA

Producer
• Partition logic
• Async producer – batching

Broker
• Topics & partitions
• Leader promotion using ZooKeeper
• Time to live
• Replication (automatic failovers)
• Load balancing between consumer groups
• Fast – page cache + sendfile – data served from cache

Consumer
• At-least-once delivery
• Fetch logic
• Offset management
• Consumer groups
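
Three of the ideas listed on this slide (partition logic, async batching, and at-least-once delivery through offset management) can be illustrated with a small in-memory simulation. This is a toy sketch in Python, not Kafka's client API. The names `AsyncProducer`, `Consumer`, `partition_for`, the partition count, and the batch size are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count
BATCH_SIZE = 2      # hypothetical batch size


def partition_for(key):
    """Stable key hash, so the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class AsyncProducer:
    """Buffers messages and appends them to the log one batch at a time."""

    def __init__(self, log):
        self.log = log
        self.buffer = []

    def send(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= BATCH_SIZE:
            self.flush()

    def flush(self):
        for key, value in self.buffer:
            self.log[partition_for(key)].append((key, value))
        self.buffer.clear()


class Consumer:
    """Advances its offset only after a message has been handled."""

    def __init__(self, log):
        self.log = log
        self.committed = defaultdict(int)  # partition -> next offset to read

    def poll(self, partition, handler):
        offset = self.committed[partition]
        for key, value in self.log[partition][offset:]:
            handler(key, value)  # if this raises, the offset is not advanced
            offset += 1
            self.committed[partition] = offset


log = defaultdict(list)  # partition number -> ordered list of messages
producer = AsyncProducer(log)
for i in range(5):
    producer.send("cookie-%d" % (i % 2), "update-%d" % i)
producer.flush()  # push the final partial batch
```

Because the offset is committed only after the handler returns, a handler that fails midway sees the same message again on the next poll. That redelivery is what "at-least-once" means in practice, so downstream processing should tolerate duplicates.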


Adapter Dashboard

What Next?


DEMO
