time-to-event models, presented by datasong and revolution analytics

© 2013 DataSong, all rights reserved / 234 Front Street, 3rd Floor, San Francisco CA 94111 Time-to-Event Models Consumer Interaction Insight October 2013

Upload: revolution-analytics

Post on 26-Jan-2015


Category:

Business



DESCRIPTION

Companies are doing a better and better job of collecting data that explains why consumers behave the way they do. These diverse data sets cause us to rethink some of the workhorse algorithms for data analysis. Specifically, the traditional binary response model leaves much room for improvement in how it embraces time. Cross-sectional models allow much rich data to fall through the cracks. We’ll discuss real-world scenarios and how to make better use of data with time-to-event modeling.

TRANSCRIPT

Page 1: Time-to-Event Models, presented by DataSong and Revolution Analytics

© 2013 DataSong, all rights reserved / 234 Front Street, 3rd Floor, San Francisco CA 94111

Time-to-Event Models Consumer Interaction Insight

October 2013

Page 2: Time-to-Event Models, presented by DataSong and Revolution Analytics

Today’s Presenters

Tess A Nesbitt, PhD

Senior Data Scientist

John Wallace

Founder & CEO

Page 3: Time-to-Event Models, presented by DataSong and Revolution Analytics

Agenda

About us

Problem statement

High-level modeling approach

Use cases

Scoring systems

Q&A

Page 4: Time-to-Event Models, presented by DataSong and Revolution Analytics

DataSong at a Glance

Approaching $1 trillion in revenue analyzed. $2 billion in marketing spend under our lens.

Experienced 60-person team based out of San Francisco, with offices in Seattle, LA and India.

Founded in 2003 with a proven history of solving difficult analytics problems. Evolved from consulting through close partnerships with clients.

Customer interaction insight that powers applications for customer-level revenue attribution, targeting, and media optimization.

Actionable and accurate information that drives customer acquisition and revenue growth for modern direct marketers.

Patented big data approach models behavior at the individual consumer level.

Page 5: Time-to-Event Models, presented by DataSong and Revolution Analytics

DataSong Offerings

1. A regression modeling framework for prediction and inference

2. Automation of modelsets in Hadoop

3. Enterprise grade scoring in Hadoop

Page 6: Time-to-Event Models, presented by DataSong and Revolution Analytics

Modelset Creation: Current State

Flatten out the data

• 1. Aggregate a fact table (sum, count)

AccountNo  #SiteVisits
123456     5

• 2. Join a dimension to a fact table and aggregate it (sum, count)

AccountNo  #Visits_SEO  #Visits_EmailClick  #Visits_SEM  #Visits_...
123456     3            1                   1            …

• 3. Superpose time

AccountNo  #Visits_SEO_1Mo  #Visits_SEO_3Mo  #Visits_SEO_6Mo  #Visits_SEO_...
123456     1                2                3                …

• If we have a dimension with a cardinality of 25 and 6 time periods of interest, that’s 150 variables for 1 dimension
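The three flattening steps on this slide can be sketched in a few lines of Python. This is only an illustration: the visit dates, the scoring date and the window lengths are hypothetical, chosen so that the counts match the example rows for account 123456.

```python
from collections import Counter
from datetime import date

# Hypothetical event log: one row per site visit (account, channel, visit date).
visits = [
    ("123456", "SEO", date(2013, 9, 20)),
    ("123456", "SEO", date(2013, 7, 15)),
    ("123456", "SEO", date(2013, 5, 2)),
    ("123456", "EmailClick", date(2013, 8, 30)),
    ("123456", "SEM", date(2013, 6, 11)),
]

AS_OF = date(2013, 10, 1)                      # scoring date (assumed)
WINDOWS = {"1Mo": 30, "3Mo": 90, "6Mo": 180}   # lookback windows in days (assumed)


def flatten(events, as_of, windows):
    """Collapse an event log into one wide row of counts per account."""
    rows = {}
    for account, channel, day in events:
        row = rows.setdefault(account, Counter())
        row["#SiteVisits"] += 1                   # step 1: aggregate the fact table
        row[f"#Visits_{channel}"] += 1            # step 2: split by a dimension
        age = (as_of - day).days
        for label, limit in windows.items():      # step 3: superpose time
            if 0 <= age <= limit:
                row[f"#Visits_{channel}_{label}"] += 1
    return rows


flat = flatten(visits, AS_OF, WINDOWS)
print(dict(flat["123456"]))
```

Every new channel adds one column per window, which is how a dimension with 25 values and 6 windows becomes 150 variables, and the exact timing of each visit is lost once the counts are taken.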

Page 7: Time-to-Event Models, presented by DataSong and Revolution Analytics

In Our Opinion

“Feature Engineering”

• Creating good variables is many times more important than choice of algorithm

Don’t lose track of time

• The age-old practice of flattening data into 1 row per customer with 1000s of variables is limiting

Aggregations can obfuscate

Time series without customer-level data overlook important causal relationships

Page 8: Time-to-Event Models, presented by DataSong and Revolution Analytics

New Challenges for Predictive Modeling

More and more of our input data is generated from log files

• Large observational data (or if you want to call it Big Data, you can)

• We are approaching an infinite number of variables to test

Increasing # of use cases for real time scoring

Increasing # of opportunities to use models for inference

Page 9: Time-to-Event Models, presented by DataSong and Revolution Analytics

Understanding the Baseline Hazard
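The figure for this slide did not survive in the transcript. As a reminder of the concept, the hazard at time t is the rate at which events occur among customers who have not yet had the event, and the baseline hazard is that rate before any marketing effects are added. A minimal sketch of an empirical (life-table) hazard, using made-up durations rather than anything from the presentation:

```python
# Hypothetical durations: (days until purchase or end of observation, purchased?)
customers = [(3, True), (5, False), (5, True), (8, True), (10, False), (10, False)]


def discrete_hazard(records):
    """Life-table hazard: events on day t divided by customers still at risk on day t."""
    hazard = {}
    for t in sorted({d for d, event in records if event}):
        at_risk = sum(1 for d, _ in records if d >= t)
        events = sum(1 for d, event in records if d == t and event)
        hazard[t] = events / at_risk
    return hazard


hazard = discrete_hazard(customers)
```

Censored customers (those who had not purchased when observation ended) stay in the denominator until they drop out, which is what separates this from a simple response rate.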

Page 10: Time-to-Event Models, presented by DataSong and Revolution Analytics

What Are We Doing About it?

Survival Response Model

• Explains differences in response rate as we change exposure to marketing

• Know what was significant and what wasn’t

Account ID-level analysis: follows customers and cookies over time

Time-dependent Outcome: had an event or was censored

Time-dependent Covariates: the effect of an event is not constant

Time-varying Covariates: time may modify an event effect

Controls for non-marketing effects:

Baseline Hazard Rate

Customer-driven activity: many customers are driven by loyalty vs. marketing

Anniversary Effects: many sales driven by seasonal demand vs. marketing
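One common way to keep time in the data, instead of flattening it, is the counting-process layout: each customer's follow-up is cut into intervals, and each interval carries the covariate values that held during it plus a flag for whether the event closed it. The presentation does not say which layout DataSong uses, so the sketch below is an assumption; the function name and the single covariate (emails received so far) are hypothetical.

```python
def to_intervals(touch_days, end_day, purchased):
    """Split one customer's follow-up into (start, stop, emails_so_far, event) rows.

    touch_days: days (since start of observation) on which an email was sent.
    end_day:    day of purchase, or last observed day if censored.
    purchased:  True if follow-up ended with a purchase, False if censored.
    """
    cuts = sorted(d for d in set(touch_days) if 0 < d < end_day)
    bounds = [0] + cuts + [end_day]
    rows = []
    for i in range(len(bounds) - 1):
        start, stop = bounds[i], bounds[i + 1]
        emails_so_far = sum(1 for d in touch_days if d <= start)
        is_last = i == len(bounds) - 2
        # Only the final interval can end in the event; all others are censored.
        rows.append((start, stop, emails_so_far, int(purchased and is_last)))
    return rows


# Customer emailed on days 10 and 25, purchased on day 40.
buyer = to_intervals([10, 25], end_day=40, purchased=True)
# Customer emailed on day 10, still not purchased when observation ended on day 60.
censored = to_intervals([10], end_day=60, purchased=False)
```

Rows in this shape let a survival model see that the covariate changed on day 10 and again on day 25, and they treat the customer who never purchased as censored rather than as a non-responder.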

Page 11: Time-to-Event Models, presented by DataSong and Revolution Analytics

[Diagram: CUSTOMER INTERACTION, built from ID LEVEL TIMESTAMP DATA]

Data categories: BEHAVIORAL, MARKETING, SERVICE, LOYALTY, TELEMATIC

EXAMPLE sources: Prior Transactions, Email, Impressions, DM, Referring clicks, In-store service, Call center, Inbound email/forms, Redemptions, Point balances, GPS data, Smart devices

Page 12: Time-to-Event Models, presented by DataSong and Revolution Analytics

[Diagram: modeling framework, shown for the Response Model]

CUSTOMER INTERACTION (SUBSCRIPTION-CENTRIC ID LEVEL TIMESTAMP DATA): BEHAVIORAL, MARKETING, SERVICE, LOYALTY, TELEMATIC

MACRO DATA: PRICE/PROMOTION, COMPETITION, SEASONALITY

OBJECTIVE: INFERENCE, PREDICTION

TIME APPROACH: TIME-TO-EVENT, POINT-IN-TIME

OUTCOME: SITE VISIT, CUSTOMER SERVICE, PURCHASE, UPGRADE, LEAVE, DEFAULT

Page 13: Time-to-Event Models, presented by DataSong and Revolution Analytics

[Diagram: the same modeling framework as on page 12 (customer interaction data, macro data, objective, time approach, outcome), shown for the Voluntary Churn Model]

Page 14: Time-to-Event Models, presented by DataSong and Revolution Analytics

[Diagram: the same modeling framework as on page 12 (customer interaction data, macro data, objective, time approach, outcome), shown for the Involuntary Churn Model]

Page 15: Time-to-Event Models, presented by DataSong and Revolution Analytics

[Diagram: the same modeling framework as on page 12 (customer interaction data, macro data, objective, time approach, outcome), shown for the Simple Attribution Model]

Page 16: Time-to-Event Models, presented by DataSong and Revolution Analytics

[Diagram: the same modeling framework as on page 12 (customer interaction data, macro data, objective, time approach, outcome), shown for the Incremental Attribution Model]

Page 17: Time-to-Event Models, presented by DataSong and Revolution Analytics

What Would the Model Say?

[Figure: timeline, JANUARY through JUNE, of CATALOG and EMAIL treatments and PURCHASE events for Customers 1, 2 and 3, each including a $100 PURCHASE]

                   DAYS SINCE TREATMENT       Cumulative   SALES ALLOCATION
Customer   Sales   Catalog  Email  Retarget   Orders       Catalog   Email    Retarget   Brand Loyalty
#1         $100    20       40     0          1            $95.66    $0.02    $ -        $4.32
#2         $100    20       10     0          1            $77.52    $18.16   $ -        $4.32
#3         $100    20       10     0          2            $69.94    $17.74   $ -        $12.32
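The presentation does not show how the allocation is computed. One plausible reading is that each sale is split in proportion to what each source contributes to the purchase hazard at the moment of purchase, with marketing effects fading as days since treatment grow. The sketch below illustrates that idea only: the additive form, the half-lives and every parameter value are assumptions, and it is not meant to reproduce the dollar figures in the table.

```python
# Hypothetical parameters, NOT estimated from the webinar's data:
# peak lift in purchase hazard per channel and how fast that lift decays.
PEAK_LIFT = {"Catalog": 0.020, "Email": 0.010}
HALF_LIFE_DAYS = {"Catalog": 30.0, "Email": 5.0}
BASELINE = 0.001  # hazard with no marketing; credited to brand loyalty


def allocate(sale, days_since):
    """Split one sale across sources in proportion to their hazard contributions."""
    parts = {"Brand Loyalty": BASELINE}
    for channel, days in days_since.items():
        decay = 0.5 ** (days / HALF_LIFE_DAYS[channel])
        parts[channel] = PEAK_LIFT[channel] * decay
    total = sum(parts.values())
    return {name: sale * value / total for name, value in parts.items()}


# Same catalog timing, but the email is 40 days old in one case and 10 in the other.
stale_email = allocate(100.0, {"Catalog": 20, "Email": 40})
fresh_email = allocate(100.0, {"Catalog": 20, "Email": 10})
```

Under these assumptions a recent email takes credit away from the catalog, while an email sent 40 days earlier earns almost nothing, which is the qualitative pattern in rows #1 and #2 of the table.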

Page 18: Time-to-Event Models, presented by DataSong and Revolution Analytics

Revolution R Enterprise ScaleR Functions Used

Category     Function      Purpose
READ/WRITE   rxImport      read in data from flat files
             rxDataStep    read from XDF file, output to XDF file
             rxReadXdf     read from XDF file, can output to data frame
EDA          rxSummary     calculate summary stats on XDF file
             rxCrossTabs   build contingency tables of factors
             rxCube        build contingency tables of factors
             rxHistogram   create histograms of numeric vars
             rxQuantile    calculate quantiles of numeric vars
MODELING     rxLogit       build logistic regression models
             rxPredict     score data from XDF with specified model
             rxRocCurve    evaluate false and true positives of models
             rxDTree*      build classification and regression trees

Run time for 30MM rows and 30 variables is approx 5 min

Page 19: Time-to-Event Models, presented by DataSong and Revolution Analytics

Prediction: Current State

How did we deliver?

[Chart axis: Propensity Score, LOW to HIGH]

Other models only use one dimension to predict likelihood to purchase: PROPENSITY

Page 20: Time-to-Event Models, presented by DataSong and Revolution Analytics

Prediction: DataSong Approach

[Chart: Incrementality Metric, with Propensity Score (LOW to HIGH) on one axis and Sensitivity Score (LOW to HIGH) on the other]

● Breakthrough results from adding customer sensitivity score: 14% increase in response rate

● Reallocated marketing circulation: identified the best prospects not to mail, those likely to purchase without receiving a catalog

Response modeling single channel: swap set usage

INCREMENTALITY metric predicts sensitivity of the next marketing treatment
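The difference between ranking on propensity alone and ranking on sensitivity can be shown with a toy example. Here sensitivity is taken to be the modeled purchase probability with the next mailing minus the probability without it; that definition, the customer records and the function names are all assumptions made for illustration.

```python
# Hypothetical scores per customer: modeled purchase probability with and
# without the next catalog. These numbers are illustrative only.
scores = {
    "A": {"p_if_mailed": 0.30, "p_if_not_mailed": 0.29},  # buys anyway
    "B": {"p_if_mailed": 0.12, "p_if_not_mailed": 0.04},  # persuadable
    "C": {"p_if_mailed": 0.02, "p_if_not_mailed": 0.01},  # unlikely either way
}


def sensitivity(s):
    """Incremental purchase probability attributable to the next mailing."""
    return s["p_if_mailed"] - s["p_if_not_mailed"]


def mail_list(all_scores, budget, by):
    """Pick the top `budget` customers ranked by the given scoring function."""
    ranked = sorted(all_scores, key=lambda c: by(all_scores[c]), reverse=True)
    return ranked[:budget]


by_propensity = mail_list(scores, 1, by=lambda s: s["p_if_mailed"])
by_sensitivity = mail_list(scores, 1, by=sensitivity)
```

With a budget of one catalog, propensity picks customer A, who would have purchased anyway, while sensitivity picks customer B, where the mailing changes the outcome. Customers like A are the "best prospects not to mail" described on the slide.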

Page 21: Time-to-Event Models, presented by DataSong and Revolution Analytics

Scoring Discussion

Scoring systems are like picture frames: good art is never without one

Your best model may never see the light of day

• Sharing your parameter estimates isn’t enough

Who should own scoring?

• IT: Production support, high uptime mentality

• Analytics: often missing the software engineering discipline

Scale

Analytics teams should be able to manage dozens of models and score billions of records every day

Page 22: Time-to-Event Models, presented by DataSong and Revolution Analytics

DataSong Architecture

• ETL

• N marketing channels

• Behavioral variables

• Promotional data

• Overlay data

• Functions to read Hadoop output; xdf creation

• Exploratory data analysis

• GAM survival models

• Scoring for inference

• Scoring for prediction

• 5 billion scores per day per customer

DATASONG DATA FORMAT (DDF)

CUSTOM VARIABLES (PMML)

Page 23: Time-to-Event Models, presented by DataSong and Revolution Analytics

DataSong Contact

1. A regression modeling framework for prediction and inference

2. Automation of modelsets in Hadoop

3. Enterprise grade scoring in Hadoop

Linked In: www.linkedin.com/company/datasong

Facebook: www.facebook.com/datasong

Twitter: www.twitter.com/datasong

Phone: 877.540.5910

Email: [email protected]