paypal real time analytics

© 2014 PayPal Inc. All rights reserved. Confidential and proprietary. Open Source Real Time BI using Storm, Hadoop, Titan, Druid & D3 Anil Madan Sr. Director Engineering, PayPal

Upload: anil-madan

Post on 22-Nov-2014

Category:

Data & Analytics


DESCRIPTION

Open Source Real Time BI using Storm, Hadoop, Titan, Druid & D3

TRANSCRIPT

Page 1: PayPal  Real Time Analytics

© 2014 PayPal Inc. All rights reserved. Confidential and proprietary.

Open Source Real Time BI using Storm, Hadoop, Titan, Druid & D3

Anil Madan, Sr. Director Engineering, PayPal

Page 2: PayPal  Real Time Analytics

$1 in every $6 spent on e-commerce is spent through PayPal.*

*Source: Morgan Stanley, “eCommerce Disruption: A Global Theme,” January 6, 2013, p.21.

Page 3: PayPal  Real Time Analytics

Creating Tomorrow’s Mobile Payment Experiences

25 countries with live PayPal fingerprint authentication on Samsung devices.

Page 4: PayPal  Real Time Analytics

Helping Developers Innovate & Monetize New Mobile Apps

Braintree launches its new API, including Pay with PayPal.

Page 5: PayPal  Real Time Analytics

PayPal Now Available in 203 Markets

10 new markets added in the second quarter, making PayPal available to 80 million new internet users.

Paraguay

Côte d’Ivoire

Nigeria

Monaco

Belarus

Moldova

Cameroon

Zimbabwe

Montenegro

Macedonia

Page 6: PayPal  Real Time Analytics

Business Problem

We need to better understand our customers…

Awareness: How do prospective customers learn about PayPal?
Acquisition: Where do prospects sign up for accounts?
Activation: How can we help them to complete their 1st payment?
Adoption: How can we help them use PayPal even more?

Page 7: PayPal  Real Time Analytics

How we solved it…

Diagram labels:
Direct/Home Page
Product Experiences
Search Engine Marketing
Transaction Emails
Mobile
Tracking Metadata Tool
Taxonomy
Tag Catalog
Tracking Event Service
Tracking Servers
Tracking Validation Service
Metadata
Real Time Systems
Big Data
Marketing
Segmentation
Experimentation
Attribution
Exploratory Analytics
Predictive Analytics

Page 8: PayPal  Real Time Analytics

Reporting & Visualization

Pathing Store

Logical View

Client Side Events

Page Performance

Events

Server Side Events

Collection Service

Sessionization

Behavioral Metrics

Marketing Metrics

Metadata Instrumentation Collection Processing Analytics

Performance Metrics

Operational Metrics (OpenTSDB)

Druid Metrics Store

Real Time Event

Metrics

Page 9: PayPal  Real Time Analytics

Metadata – Logical Entity Model

COMPONENTS

PAGE

TEMPLATE

TAGS

LINK

Page 10: PayPal  Real Time Analytics

Metadata – Logical Event Model

Diagram labels:
Tracking Event
Impression Event
Page Impression Event
Client Page Impression Event
Server Page Impression Event
Component Impression Event
Ad Impression Event
Reaction Event
Click Event
Click-Through Event
Mouse-over Event
Entry Event
Exit Event
Outcome Event

Page 11: PayPal  Real Time Analytics

Metadata - Self-Service Management Workflow…

Page 12: PayPal  Real Time Analytics

DATA PIPELINE

Stages: Metadata, Collection, Processing, Analysis & Visualization

Diagram labels:
Client Side
Server Side
HTTP
REST Proxy
REST Spout
Bot flagging Bolt
Geo Enrichment Bolt
Sessionization
Aggregation
Performance
Metrics
Tools
Reporting
Data Stores
Druid
Apache Titan
Metadata Service
Meta data
Developers
Product Owners
Customers
Reporting Consumers

Page 13: PayPal  Real Time Analytics

Druid Architecture

• Open-source
• Distributed
• Real-time
• Highly-available data store
• Column-oriented
• Approximate or exact

Page 14: PayPal  Real Time Analytics

Real Time Nodes

• Ingest data and buffer events in memory
• Incremental indexing
• Query data as soon as it is ingested
• Periodically persist collected events to disk
• Combine multiple disk indexes to create immutable ‘segments’
• Log-structured merge-tree
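The real-time node behavior listed on this slide (buffer in memory, stay queryable, persist periodically, merge into an immutable segment) can be sketched in a few lines of Python. This is only an illustration of the log-structured idea, not Druid's implementation: the class name `RealTimeNode`, the `max_in_memory` threshold, and the tuple-based "segment" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, page name); a hypothetical event shape


@dataclass
class RealTimeNode:
    """Toy model of a real-time node: in-memory buffer plus persisted spills."""
    max_in_memory: int = 2  # assumed persist threshold, for illustration only
    memory: List[Event] = field(default_factory=list)
    spills: List[Tuple[Event, ...]] = field(default_factory=list)

    def ingest(self, event: Event) -> None:
        # Buffer the event; spill to "disk" once the buffer is full.
        self.memory.append(event)
        if len(self.memory) >= self.max_in_memory:
            self.persist()

    def persist(self) -> None:
        # Each spill is a sorted, read-only snapshot of the buffer.
        if self.memory:
            self.spills.append(tuple(sorted(self.memory)))
            self.memory = []

    def count(self, page: str) -> int:
        # Queries see persisted spills and the live buffer alike.
        rows = [e for spill in self.spills for e in spill] + self.memory
        return sum(1 for _, name in rows if name == page)

    def hand_off(self) -> Tuple[Event, ...]:
        # Merge all spills into one immutable, sorted segment.
        self.persist()
        segment = tuple(sorted(e for spill in self.spills for e in spill))
        self.spills = []
        return segment


node = RealTimeNode(max_in_memory=2)
for event in [(3, "Login"), (1, "Login"), (2, "Checkout")]:
    node.ingest(event)

count_before = node.count("Login")  # answered before any hand-off
segment = node.hand_off()
```

The point of the sketch is that a query never waits for a merge: it reads whatever has been spilled plus whatever is still buffered.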

Page 15: PayPal  Real Time Analytics

Druid Architecture

Page 16: PayPal  Real Time Analytics

Historical Nodes

• Load immutable read-optimized data from deep storage
• Memory mapped storage engine
• Caches segments
• Supports tiered storage

Page 17: PayPal  Real Time Analytics

Druid Architecture

Page 18: PayPal  Real Time Analytics

Druid Systems Overview

Page 19: PayPal  Real Time Analytics

Metrics & Dimensions

Standard Metrics:

    { "type": "doubleSum", "name": "pageviews", "fieldName": "PV" },
    { "type": "doubleSum", "name": "bounces", "fieldName": "bnc" },
    ....

Estimated Metrics (HyperLogLog):

    { "type": "hyperUnique", "name": "unique_visits", "fieldName": "user_session_guid" },
    { "type": "hyperUnique", "name": "unique_visitors", "fieldName": "user_guid" }

Dimensions (spec fragment, truncated at both ends on the slide):

    2014/06/11/10",
    "filter": "part-",
    "parser": {
      "type": "string",
      "timestampSpec": { "column": "timestamp", "format": "auto" },
      "data": {
        "format": "json",
        "dimensions": [ "timestamp", "USER_GUID", "USER_SESSION_GUID", "PAGE_GROUP", "PAGE_NAME", "PAGEGROUP_LINK_NAME", "PAGE_LINK_NAME",
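The estimated metrics on this slide (`unique_visits`, `unique_visitors`) rely on HyperLogLog. As a rough illustration of why such a sketch suits distributed aggregation, here is a minimal HyperLogLog in Python. It is a textbook version, not Druid's `hyperUnique` implementation: the class, the SHA-1 hash, and the register count `p=10` are assumptions chosen for readability.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # Take a 64-bit hash; the top p bits pick a register.
        h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Sketches from different nodes combine with an element-wise max.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical user GUIDs, standing in for the user_guid field.
sketch = HyperLogLog()
for i in range(10_000):
    sketch.add(f"user-{i}")
print(round(sketch.estimate()))
```

Because `merge` is just a per-register maximum, partial sketches built on separate nodes or time buckets can be combined without revisiting the raw events, at the cost of a small estimation error.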

Page 20: PayPal  Real Time Analytics

Sessionization

Events

Visitor ID | Session ID | Timestamp        | Event Payload
V1         | S1         | 2014-10-16 05:12 | E1
V2         | S2         | 2014-10-16 05:14 | E2
V1         | S1         | 2014-10-16 05:15 | E3
V1         | S1         | 2014-10-16 05:20 | E4
V2         | S2         | 2014-10-16 05:21 | E5
V1         | S3         | 2014-10-16 05:25 | E6
…          | …          | …                | …

Visit Container

Visitor ID | Session ID | Payload                                                     | Events
V1         | S1         | sf, mac, {flash, quicktime}, {ca, usa}, 480 secs, …         | E1, E3, E4
V2         | S2         | ff, win, {acrobat, mediaplayer}, {wb, in}, 420 secs, …      | E2, E5
V1         | S3         | sf, mac, {quicktime, java}, {on, ca}, 60 secs               | E6
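The roll-up from individual events into per-visit containers can be sketched as a simple group-by on (visitor, session). The Python below is a batch illustration using the six events from the slide; the real pipeline does this in streaming fashion, and the function name `sessionize` and the dictionary layout are assumptions made for the example. Duration is computed as last timestamp minus first, so the single-event session S3 comes out as 0 seconds here rather than the 60 secs shown on the slide.

```python
from datetime import datetime

# Events as listed on the slide: (visitor, session, timestamp, payload).
events = [
    ("V1", "S1", "2014-10-16 05:12", "E1"),
    ("V2", "S2", "2014-10-16 05:14", "E2"),
    ("V1", "S1", "2014-10-16 05:15", "E3"),
    ("V1", "S1", "2014-10-16 05:20", "E4"),
    ("V2", "S2", "2014-10-16 05:21", "E5"),
    ("V1", "S3", "2014-10-16 05:25", "E6"),
]


def sessionize(rows):
    """Group events by (visitor, session) and derive a visit duration."""
    visits = {}
    for visitor, session, ts, payload in rows:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        visit = visits.setdefault(
            (visitor, session), {"start": when, "end": when, "events": []}
        )
        visit["start"] = min(visit["start"], when)
        visit["end"] = max(visit["end"], when)
        visit["events"].append(payload)
    return {
        key: {
            "duration_secs": int((v["end"] - v["start"]).total_seconds()),
            "events": v["events"],
        }
        for key, v in visits.items()
    }


visits = sessionize(events)
for key, visit in visits.items():
    print(key, visit)
```

For S1 and S2 this reproduces the 480 and 420 second durations shown in the visit container.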

Page 21: PayPal  Real Time Analytics

Druid Storage – Columns & Dictionaries

Timestamp (Hr) | SessionID | Country | OS  | UserAgent | Page Name
2014-10-16 05  | S1        | US      | MAC | SF        | Login, AccountOverview
2014-10-16 05  | S2        | DE      | WIN | IE        | Login, PaymentReview, AccountHistory
2014-10-16 05  | S3        | US      | LNX | FF        | Login, PaymentReview, Checkout
2014-10-16 05  | S4        | UK      | LNX | FF        | Login, Profile, Checkout
2014-10-16 05  | S5        | DE      | WIN | CR        | Login, Profile
2014-10-16 05  | S6        | UK      | MAC | SF        | Login, AccountOverview, Checkout

Dictionary

Page Name       | ID
Login           | 0
AccountOverview | 1
PaymentReview   | 2
AccountHistory  | 3
Checkout        | 4
Profile         | 5

Page Name column, dictionary-encoded (then compressed with LZF)

SessionID | Encoded Page Name
S1        | 0 1
S2        | 0 2 3
S3        | 0 2 4
S4        | 0 5 4
S5        | 0 5
S6        | 0 1 4
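Dictionary encoding of the Page Name column can be illustrated with a short Python sketch. It assigns IDs in first-seen order, which happens to reproduce the IDs on the slide; that ordering is an assumption of this example and not a statement about how Druid orders its dictionaries. The function name `dictionary_encode` is likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Page Name values per session, as shown on the slide (S1..S6).
page_names = [
    ["Login", "AccountOverview"],
    ["Login", "PaymentReview", "AccountHistory"],
    ["Login", "PaymentReview", "Checkout"],
    ["Login", "Profile", "Checkout"],
    ["Login", "Profile"],
    ["Login", "AccountOverview", "Checkout"],
]


def dictionary_encode(rows: List[List[str]]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Replace each string with a small integer ID from a shared dictionary."""
    dictionary: Dict[str, int] = {}
    encoded: List[List[int]] = []
    for pages in rows:
        ids = []
        for page in pages:
            if page not in dictionary:
                dictionary[page] = len(dictionary)  # next unused ID
            ids.append(dictionary[page])
        encoded.append(ids)
    return dictionary, encoded


dictionary, encoded = dictionary_encode(page_names)
print(dictionary)
print(encoded)
```

Storing small integers instead of repeated strings is what makes the column compact before a general-purpose compressor such as LZF is applied on top.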

Page 22: PayPal  Real Time Analytics

Druid Data Structure - Bitmap Indices

Page 23: PayPal  Real Time Analytics

Herald – Self Service Analytics

Page 24: PayPal  Real Time Analytics

Herald – Self Service Analytics

Page 25: PayPal  Real Time Analytics

Druid Metrics

Page 26: PayPal  Real Time Analytics

Enter

Pathing

Page 27: PayPal  Real Time Analytics

Fallout Reports

Page 28: PayPal  Real Time Analytics

Pathing: A->B->C->D->X->A->M and A->B->C->D->E

Visitor ID | Current Page | Next Page 1 | Next Page 2 | Prev Page 1 | Prev Page 2
S1         | A            | B           | C           | null        | null
S1         | B            | C           | D           | A           | null
S1         | C            | D           | X           | B           | A
S1         | D            | X           | A           | C           | B
S1         | X            | A           | M           | D           | C
S1         | A            | M           | null        | X           | D
S1         | M            | null        | null        | A           | X
S2         | A            | B           | C           | null        | null
S2         | B            | C           | D           | A           | null
S2         | C            | D           | E           | B           | A
S2         | D            | E           | null        | C           | B
S2         | E            | null        | null        | D           | C
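The pathing rows can be derived mechanically from an ordered page sequence: for each position, look two pages ahead and two pages back. The Python below is a small sketch of that denormalization; the function name `pathing_rows` and the `depth` parameter are assumptions for the example, with `None` standing in for the table's null.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], ...]


def pathing_rows(session_id: str, path: List[str], depth: int = 2) -> List[Row]:
    """Emit (session, current, next1..nextN, prev1..prevN) for every page view."""
    rows: List[Row] = []
    for i, page in enumerate(path):
        next_pages = [path[i + k] if i + k < len(path) else None
                      for k in range(1, depth + 1)]
        prev_pages = [path[i - k] if i - k >= 0 else None
                      for k in range(1, depth + 1)]
        rows.append((session_id, page, *next_pages, *prev_pages))
    return rows


s1_rows = pathing_rows("S1", list("ABCDXAM"))
s2_rows = pathing_rows("S2", list("ABCDE"))
for row in s1_rows + s2_rows:
    print(row)
```

Precomputing neighbors this way trades storage for query simplicity: "what comes after page C" becomes a filter plus a group-by instead of a self-join over the event stream.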

Page 29: PayPal  Real Time Analytics

Pathing

Next Page

    {
      "queryType": "groupBy",
      "dimensions": ["current_page", "dimensions like country, segmentation etc"],
      "aggregations": [
        { "type": "count", "name": "next_page_count", "fieldname": "next_page, next_page2" }
      ],
      "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
    }

Previous Page

    {
      "queryType": "groupBy",
      "dimensions": ["current_page", "dimensions like country, segmentations etc"],
      "aggregations": [
        { "type": "count", "name": "prev_page_count", "fieldname": "prev_page1, prev_page2" }
      ],
      "filter": { "type": "selector", "dimension": "current_page", "value": "C" }
    }
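What the Next Page query computes can be mimicked in plain Python: filter rows to one current page, then count the values of the next-page column. The sketch below uses the two sample sessions A->B->C->D->X->A->M and A->B->C->D->E; the function name `next_page_counts` and the in-memory row layout are assumptions, and this is not how Druid executes a groupBy.

```python
from collections import Counter

# (session, current_page, next_page_1, next_page_2) for the two sample paths.
rows = [
    ("S1", "A", "B", "C"), ("S1", "B", "C", "D"), ("S1", "C", "D", "X"),
    ("S1", "D", "X", "A"), ("S1", "X", "A", "M"), ("S1", "A", "M", None),
    ("S1", "M", None, None),
    ("S2", "A", "B", "C"), ("S2", "B", "C", "D"), ("S2", "C", "D", "E"),
    ("S2", "D", "E", None), ("S2", "E", None, None),
]


def next_page_counts(table, current_page: str) -> Counter:
    """Selector filter on current_page, then count by the first next page."""
    counts: Counter = Counter()
    for _session, current, next_1, _next_2 in table:
        if current == current_page and next_1 is not None:
            counts[next_1] += 1
    return counts


after_c = next_page_counts(rows, "C")
after_a = next_page_counts(rows, "A")
print(after_c, after_a)
```

The Previous Page report is the mirror image: the same filter, counted over the previous-page column instead.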

Page 30: PayPal  Real Time Analytics

Fallout

Fallout path A->D->X->M, matched against the full path A->B->C->D->X->A->M

    {
      "queryType": "search",
      "dimensions": ["current_page_path_count", "dimensions like country, segmentation etc"],
      "filter": { "type": "regex", "dimension": "next_page_path", "pattern": "^A.*D.*X.*M$" }
    }

How the regex filter is evaluated:

• Apply the regex pattern to the dictionary
• Figure out the values that match
• Take the bitmap indices of those values
• OR the bitmap indices together
• Use the output bitmap as the filter
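The bitmap steps for the fallout filter can be sketched in Python using integers as bitmaps (bit i set means row i holds that value). The sample `next_page_path` values, the function names, and the single-letter path strings are hypothetical; the pattern is written as `^A.*D.*X.*M$`, the standard-regex spelling of "A, then later D, then later X, ending in M". Druid's actual compressed bitmap structures are not modeled.

```python
import re
from typing import Dict, List

# Hypothetical next_page_path value for each row (row number = list index).
paths = ["ABCDXAM", "ABCDE", "ADXM", "ABCDXAM"]


def build_bitmaps(values: List[str]) -> Dict[str, int]:
    """One bitmap per distinct dictionary value; bit i marks row i."""
    bitmaps: Dict[str, int] = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def regex_filter(bitmaps: Dict[str, int], pattern: str) -> int:
    """Match the regex against dictionary values only, then OR their bitmaps."""
    regex = re.compile(pattern)
    result = 0
    for value, bitmap in bitmaps.items():
        if regex.search(value):
            result |= bitmap
    return result


def rows_of(bitmap: int) -> List[int]:
    """Expand a bitmap back into the list of matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


bitmaps = build_bitmaps(paths)
matching = regex_filter(bitmaps, r"^A.*D.*X.*M$")
print(rows_of(matching))
```

The regex runs once per distinct value rather than once per row, which is why this approach pays off when a column has many rows but few distinct values.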

Page 31: PayPal  Real Time Analytics

Herald Architecture

Diagram labels:
Client
Server
Model
View
Controller
NVD3 Directives

Page 32: PayPal  Real Time Analytics

SSO

Druid

Herald Deployment

Page 33: PayPal  Real Time Analytics

Adhoc Graph Analytics

Graph diagram. Vertices (Name):
Login_2014101611
AccountOverview_2014101611
PaymentReview_2014101611
Checkout_2014101611

Property labels as extracted from the diagram:
Country: US, Count: 15
Count: 8
Country: US, Count: 5
Count: 7
Country: US, Count: 5
Country: US, Count: 10
Count: 5
Count: 5
5, 8, 7, 6

Page 34: PayPal  Real Time Analytics

Graph diagram with the vertices Login_2014101611, AccountOverview_2014101611, PaymentReview_2014101611 and Checkout_2014101611, annotated with Country and Count properties.

gremlin> g.v('Name', 'Login_2014101611').as('x').outE.inV.loop('x'){it.loops < 4}{it.object.getProperty('name') == 'Checkout_2014101611'}.path
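The traversal above walks outgoing edges from the Login vertex for a bounded number of steps and emits the paths that reach Checkout. A plain-Python sketch of that idea follows. The edge list is hypothetical (the slide's diagram does not survive in the transcript), `max_hops` stands in for the `it.loops < 4` bound without modeling the exact semantics of Gremlin's loop counter, and `bounded_paths` is an illustrative name.

```python
from typing import Dict, List

# Hypothetical edges between the page vertices named on the slide.
edges: Dict[str, List[str]] = {
    "Login_2014101611": ["AccountOverview_2014101611", "PaymentReview_2014101611"],
    "AccountOverview_2014101611": ["Checkout_2014101611"],
    "PaymentReview_2014101611": ["Checkout_2014101611"],
}


def bounded_paths(graph: Dict[str, List[str]], start: str, target: str,
                  max_hops: int) -> List[List[str]]:
    """Collect every path from start to target using at most max_hops edges."""
    found: List[List[str]] = []
    stack: List[List[str]] = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == target and len(path) > 1:
            found.append(path)
        if len(path) - 1 < max_hops:
            # Extend along each outgoing edge, as outE.inV does.
            for neighbor in graph.get(path[-1], []):
                stack.append(path + [neighbor])
    return found


paths_found = bounded_paths(edges, "Login_2014101611", "Checkout_2014101611", max_hops=3)
for path in sorted(paths_found):
    print(" -> ".join(path))
```

The hop bound is what keeps the search finite on clickstream graphs, which routinely contain cycles such as A->...->A.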

Page 35: PayPal  Real Time Analytics

Summary

• Problem
  • Understand our customer behavior
  • Across disparate channels & experiences

• Solution
  • Democratize data
  • Consistent standardized metadata
  • Disciplined instrumentation
  • Distributed scalable backend for adhoc & interactive analytics
  • Self-service BI through modern visualization tools

Page 36: PayPal  Real Time Analytics

Questions?