Processing changes from Couchbase in Kafka at PayPal
Posted on 15-Jul-2015
© 2014 PayPal Inc. All rights reserved.
Streaming data from Couchbase to Hadoop using Kafka
https://github.com/paypal/couchbasekafka
Shibi Sudhakaran @s007
Vinoth Kumar Pothapu @vinothtwit
100 CURRENCIES SUPPORTED
155M ACTIVE REGISTERED ACCOUNTS
203 MARKETS OFFER PAYPAL
EUROPEAN UNION EURO
AUSTRALIAN DOLLAR
CANADIAN DOLLAR
NEW ZEALAND DOLLAR
HUNGARIAN FORINT
MALAYSIAN RINGGIT
UNITED KINGDOM POUNDS STERLING
HONG KONG DOLLAR
UNITED STATES DOLLAR
TAIWAN NEW DOLLAR
CHINESE RMB
SWEDISH KRONA
SINGAPORE DOLLAR
PHILIPPINE PESO
BRAZILIAN REAL
RUSSIAN RUBLE
NORWEGIAN KRONE
JAPANESE YEN
MEXICAN PESO
TURKISH LIRA
SWISS FRANC
CZECH KORUNA
ISRAELI NEW SHEKEL
DANISH KRONE
THAI BAHT
POLISH ZLOTY
Available Alternatives
• Snapshot: needs a local Hadoop cluster; not real time.
• Flume (https://github.com/paypal/cbflume): push model; the Flume agent determines the rate at which messages are sent.
Solution : Couchbase Kafka Adapter
https://github.com/paypal/couchbasekafka
config.properties
Contains settings for the TAP API and the adapter:
• cb.cbserver
• cb.fulldump
• CBMessageConverter
• monitoringEnabled
• sherlockThreshold
kafkaconfig.properties
• metadata.broker.list: Kafka brokers.
• partitioner.class: topic partition logic.
• request.required.acks: 0 means that the producer never waits for an acknowledgement.
• cookie.topic: topic to publish messages to.
• producer.type: async or sync.
• batch.size
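As a rough illustration of the kafkaconfig.properties keys listed above, the sketch below parses a sample file in Python. Every value in the sample (broker hosts, partitioner class name, topic name, batch size) is a hypothetical placeholder, not taken from the adapter, and `parse_properties` is an illustrative helper rather than part of the project.

```python
# Hypothetical sample of kafkaconfig.properties; all values are placeholders.
SAMPLE = """\
# Kafka brokers
metadata.broker.list=broker1:9092,broker2:9092
# Topic partition logic (hypothetical class name)
partitioner.class=com.example.CookiePartitioner
# 0 = the producer never waits for an acknowledgement
request.required.acks=0
# Topic to publish messages to
cookie.topic=cookie-changes
producer.type=async
batch.size=200
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(SAMPLE)
# With acks set to 0 the producer does not wait for broker acknowledgements.
fire_and_forget = props["request.required.acks"] == "0"
```

With `request.required.acks=0` and `producer.type=async`, throughput is favored over delivery confirmation; a different trade-off would use other values for these keys.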
Use case: Cookie
Node, C++, Java, Others
Persistent & Plain Text
Session & Plain Text
Persistent & Encrypted
Session & Encrypted
Functional View
[Diagram labels: Customers; Front Tier; Mid Tier; Data Tier; Application; Cookie Libraries; CookieService; Couchbase Client; Couchbase DC A; Couchbase DC B; XDCR]
Why Couchbase?
Data volume / Scalability
• Online system: 1B documents
• 4-10 KB per document; 5-10 TB total storage
• Linearly scalable
Availability
• Multi data center (DR)
• Availability requirement of 99.99%
Data Structure
• Flexible and schemaless; document based
Performance
• 50% read / 50% write
• Low latency: < 5-10 msec
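The data volume figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes the slide's "4-10k" means 4 to 10 kilobytes per document and uses decimal units; it ignores replicas and metadata overhead, which the slide does not quantify.

```python
DOCS = 1_000_000_000   # 1B documents (from the slide)
KB = 1000              # assumption: decimal kilobytes
TB = 1000 ** 4         # decimal terabytes


def total_tb(doc_size_kb):
    """Raw storage in TB for DOCS documents of the given size."""
    return DOCS * doc_size_kb * KB / TB


low, high = total_tb(4), total_tb(10)
print(f"raw data: {low:.0f}-{high:.0f} TB")
```

The raw estimate of roughly 4 to 10 TB is in line with the 5-10 TB total storage quoted on the slide.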
What lives inside Couchbase Cluster
Cookie User Session Map
Cookie User Migration Status
Cookie Audit
Cookie Metrics
[Diagram labels: Cookie Service (three instances); XDCR; Active; Write; Read]
Deployment Model
Bi-directional Uni-directional
Active Passive
Why do we need to extract data?
• Data-driven cookie migration.
• Use immutable commit logs for batch and real-time computing.
• Analytics and real-time monitoring for cookie usage and hacks.
• Enrich data by merging with Tracking, Profile, Data Science, etc.
• Real-time alerting for Risk.
Enter… KAFKA
publish-subscribe messaging rethought as a distributed commit log.
Fast
Scalable
Durable
Distributed
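To make the "commit log" idea concrete, here is a minimal single-process sketch in Python. It is not Kafka's implementation; `CommitLog` and its methods are hypothetical names used only to show an append-only sequence in which each record gets an increasing offset and independent readers start from whatever offset they choose.

```python
class CommitLog:
    """Append-only log: records are never modified, only added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return the offset assigned to it."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for change in ("set cookie A", "set cookie B", "delete cookie A"):
    log.append(change)

batch_reader = log.read(0)      # a batch job replays from the beginning
realtime_reader = log.read(2)   # a real-time reader follows the tail
```

Because readers only track an offset, the same log can feed both batch and real-time consumers without either affecting the other.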
KAFKA
Producer
• Partition logic
• Async producer: batching
Broker
• Topics & partitions
• Leader promotion using ZooKeeper
• Time to live
• Replication (automatic failovers)
• Load balancing between consumer groups
• Fast: page cache + sendfile; data served from cache
Consumer
• At-least-once delivery
• Fetch logic
• Offset management
• Consumer groups
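Two of the ideas on this slide, partition logic and at-least-once delivery with offset management, can be sketched in plain Python. This is a simulation and does not use a Kafka client: `choose_partition` and `consume` are hypothetical helpers, and CRC32 is an arbitrary hash chosen for illustration, not necessarily what Kafka or the adapter uses.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition so the same key always lands in the same one."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def consume(partition_log, committed, handle, crash_after=None):
    """Handle records starting at the committed offset.

    The offset is committed only after handle() returns, so a crash between
    the two steps causes that record to be handled again on restart.
    Returns the last committed offset.
    """
    offset = committed
    while offset < len(partition_log):
        handle(partition_log[offset])
        if crash_after == offset:
            return committed        # simulated crash: commit never happened
        committed = offset + 1      # commit after successful processing
        offset = committed
    return committed


partition_log = ["c1", "c2", "c3"]
seen = []
committed = consume(partition_log, 0, seen.append, crash_after=1)
committed = consume(partition_log, committed, seen.append)  # restart
```

After the restart, `seen` holds `c2` twice and nothing is lost, which is the at-least-once guarantee: consumers that cannot tolerate duplicates need to deduplicate or make their processing idempotent.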