ezCater Accomplishes Early Prediction of LTV with RapidMiner

Post on 21-May-2020

Page 1: ezCater Accomplishes Early Prediction of LTV with RapidMiner · Data New Data id Order size Day of week Food Type Time of day Head-count Location Actual Y1R 1001 103.45 Mon Mex 10

ezCater Accomplishes Early Prediction of LTV with RapidMiner

Agenda

1. It begins: “The Promotion”

2. Estimating Lifetime Value: A summary of my Googling

3. Aside: Why Y1R instead of LTV/CLV?

4. Getting the Data

5. Training

6. What did we learn about machine learning?

7. Limits of Regression

8. Enter RapidMiner

A Bit About ezCater

ezCater is an online marketplace for business and corporate catering. Need food for your next team lunch? Order it on ezCater!

We’ve raised $169 million to date

We are growing 2-3x year over year

We have 400+ employees

Working at ezCater is AWESOME and yes, we are hiring!

A Bit About Jeff Dwyer

Distinguished Software Engineer

Works on growth at ezCater

Not a machine learning expert

@jdwyah

engineering.ezcater.com

ezCater the business

• ezCater is part marketplace, part SaaS

• Customers love us, thus they keep using us

• Good retention makes the unit economics SaaS-like

• Customer quality > Number of new customers

The Promotion

You’re a SaaS business. You've been acquiring customers and on average they're worth about X. Somebody clever suggests a promotion: "$5 off your first order."

You release it and boom! Conversion rates increase, and the number of new customers increases. Yay, right?

But then someone asks that pernicious question: “Are we sure these are still ‘good’ customers?”

Put simply: “Is the promotion worth it?”

Disclaimer*

*All the numbers in this webinar are made up :)

SaaS Unit Economics

https://www.forentrepreneurs.com/saas-metrics-2/

RETENTION. IS. KING.

RETENTION. IS. LTV.

Estimating LTV: A Summary of my Googling

Basics

https://blog.profitwell.com/how-to-calculate-ltv-for-saas-the-right-way

• Totally “correct”

• Totally not actionable
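The "basic" LTV calculation is commonly written as average revenue per user divided by churn rate. A minimal sketch of that textbook formula follows; the function name and the input figures are illustrative assumptions, not ezCater numbers.

```python
def naive_ltv(arpu_per_month: float, monthly_churn_rate: float) -> float:
    """Textbook LTV: average monthly revenue per user divided by monthly churn."""
    if not 0 < monthly_churn_rate <= 1:
        raise ValueError("monthly_churn_rate must be in (0, 1]")
    return arpu_per_month / monthly_churn_rate


# Made-up inputs: $100 per user per month, 5% of users churn each month.
ltv = naive_ltv(arpu_per_month=100.0, monthly_churn_rate=0.05)
print(f"Naive LTV: ${ltv:,.2f}")
```

The result is a single number for the whole customer base. It says nothing about whether the customers acquired last week are better or worse than average, which is why it is hard to act on.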

Example

• I have 1000 users at ARPU of $1000

• Idea: Give new users a $500 iPad if they sign up and we’ll still make $500!

Customer Quality Matters
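The arithmetic behind the iPad idea only works if the new users are as valuable as the existing ones. A small sketch of that check, where the $1000 ARPU and $500 giveaway come from the example and the lower ARPU for promotion-driven users is an invented assumption:

```python
def promo_profit_per_user(revenue_per_user: float, giveaway_cost: float) -> float:
    """Revenue a user brings in minus what the giveaway cost us."""
    return revenue_per_user - giveaway_cost


GIVEAWAY_COST = 500.0     # the $500 iPad from the example
EXISTING_ARPU = 1000.0    # ARPU of the existing 1000 users, from the example
PROMO_USER_ARPU = 200.0   # ASSUMPTION: users who sign up for the iPad spend far less

expected = promo_profit_per_user(EXISTING_ARPU, GIVEAWAY_COST)
actual = promo_profit_per_user(PROMO_USER_ARPU, GIVEAWAY_COST)

print(f"Planned profit per new user: ${expected:.0f}")
print(f"Profit if promo users are lower quality: ${actual:.0f}")
```

Under that assumption, every new signup loses money even though the signup count goes up.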

Basics

It all boils down to one question:

Given X days of customer data, how can I predict their year-one revenue, and with what accuracy?

Naming things matters: Y1R vs LTV

LTV is Complex

“LTV” and “CLV” are loaded terms. They imply:

• Multi-year retention profiles

• Weighted cost of capital

Y1R is Simple

Y1R = “Year 1 Revenue”

• Just as actionable as LTV

• No arguing about definitions

• Extensible!

    • Y1B = Year 1 Bookings

    • Y1M = Year 1 Margin

    • EY1R = Estimated Year 1 Revenue

    • D180R = Day 180 Revenue
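Part of what makes these metrics simple is that each is a plain sum over a fixed window. A sketch of how Y1R and D180R could be computed from an order log is below. The order data is invented, and measuring the window from the customer's first order (365 and 180 days) is an assumption, not a definition given in the slides.

```python
from datetime import date, timedelta

# Hypothetical order log: (customer_id, order_date, revenue). Values are invented.
ORDERS = [
    (1001, date(2019, 1, 7), 103.45),
    (1001, date(2019, 3, 4), 150.00),
    (1001, date(2019, 11, 18), 246.55),
    (1001, date(2020, 2, 3), 80.00),  # more than a year after the first order
    (1002, date(2019, 5, 10), 140.12),
]


def revenue_within(orders, customer_id, days):
    """Sum a customer's revenue in the first `days` days after their first order."""
    mine = [(d, amt) for cid, d, amt in orders if cid == customer_id]
    if not mine:
        return 0.0
    first = min(d for d, _ in mine)
    cutoff = first + timedelta(days=days)
    return sum(amt for d, amt in mine if d < cutoff)


def y1r(orders, customer_id):
    """Year 1 Revenue: revenue in the 365 days starting at the first order."""
    return revenue_within(orders, customer_id, 365)


def d180r(orders, customer_id):
    """Day 180 Revenue: revenue in the 180 days starting at the first order."""
    return revenue_within(orders, customer_id, 180)


for cid in (1001, 1002):
    print(cid, "Y1R:", round(y1r(ORDERS, cid), 2), "D180R:", round(d180r(ORDERS, cid), 2))
```

Changing the window length is the only thing needed to define a new metric in the same family.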

Data

We’re going to need some data

Training

id    Order size  Day of week  Food Type  Time of day  Head-count  Location  Actual Y1R
1001  103.45      Mon          Mex        10           9           MA        500
1002  140.12      Fri          BBQ        11           11          NH        200
1003  35.00       Sat          Thai       9            3           CA        10
1004  201.12      Mon          Mex        12           20          TX        30
1005  55.32       Tue          Burg       12           3           MI        14

New Data

id    Order size  Day of week  Food Type  Time of day  Head-count  Location  Actual Y1R
1008  93.45       Sun          BBQ        9            8           VT
1009  123.99      Sat          Burg       14           10          MI
1010  18.00       Mon          Mex        9            22          TX
1011  182.12      Tue          Mex        16           9           FL
1012  65.32       Tue          Burg       12           3           MI

DATA THAT LOOKS LIKE THIS IS WHAT ML LOVES
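What makes this shape convenient is that every customer is one row with the same columns and a numeric target. Most learners still need the categorical columns turned into numbers first. The sketch below one-hot encodes the made-up training rows from the slide using only the standard library; the function names are hypothetical, and Location is left out purely to keep the example short.

```python
TRAINING_ROWS = [
    # id, order_size, day_of_week, food_type, time_of_day, head_count, location, actual_y1r
    (1001, 103.45, "Mon", "Mex", 10, 9, "MA", 500),
    (1002, 140.12, "Fri", "BBQ", 11, 11, "NH", 200),
    (1003, 35.00, "Sat", "Thai", 9, 3, "CA", 10),
    (1004, 201.12, "Mon", "Mex", 12, 20, "TX", 30),
    (1005, 55.32, "Tue", "Burg", 12, 3, "MI", 14),
]


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]


def build_matrix(rows):
    """Turn raw rows into (feature_names, X, y) with categoricals one-hot encoded."""
    days = sorted({r[2] for r in rows})
    foods = sorted({r[3] for r in rows})
    names = ["order_size", "time_of_day", "head_count"]
    names += [f"day={d}" for d in days] + [f"food={f}" for f in foods]
    X, y = [], []
    for _id, size, day, food, hour, heads, _loc, target in rows:
        numeric = [size, float(hour), float(heads)]
        X.append(numeric + one_hot(day, days) + one_hot(food, foods))
        y.append(float(target))
    return names, X, y


names, X, y = build_matrix(TRAINING_ROWS)
print(names)
print(X[0], "->", y[0])
```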

Data

But… the data is not in the warehouse

What is Stitch?

A SaaS platform for consolidating data from a wide array of data sources to data warehouses for analysis.

Getting going with Stitch

• Self-serve setup is easy

• Free plans for smaller data volumes

• Standard paid plans start at $100

• No commitment required

• No incremental cost for additional data sources

A platform with extensible data sources

• Any developer can build an integration

• Send data to Stitch, or another destination

• Existing Stitch integrations run on Singer

• Enables developers to support any use case

Singer is an open-source standard for writing scripts that move data
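To give a feel for what such a script looks like: a Singer "tap" writes JSON messages to standard output, one per line, using SCHEMA, RECORD, and STATE message types. The sketch below is a minimal illustration of that message flow, not one of the integrations mentioned in this talk; the stream name, fields, and bookmark key are hypothetical.

```python
import json
import sys


def emit(message, out=sys.stdout):
    """Write one Singer message as a single line of JSON."""
    out.write(json.dumps(message) + "\n")


def run_tap(rows, out=sys.stdout):
    """Emit a SCHEMA message, one RECORD per row, then a STATE bookmark."""
    stream = "orders"  # hypothetical stream name
    emit({
        "type": "SCHEMA",
        "stream": stream,
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "order_size": {"type": "number"},
            },
        },
    }, out)
    for row in rows:
        emit({"type": "RECORD", "stream": stream, "record": row}, out)
    # STATE lets a later run resume from where this one stopped.
    last_id = max((r["id"] for r in rows), default=None)
    emit({"type": "STATE", "value": {"last_id": last_id}}, out)


if __name__ == "__main__":
    run_tap([{"id": 1001, "order_size": 103.45}, {"id": 1002, "order_size": 140.12}])
```

Because the output is plain JSON lines, the same tap can be piped into any program that reads that format, which is the "send data to Stitch, or another destination" point above.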

Stitch is an Enterprise Solution

A simple solution to a complex problem

Data Sources

• Flexible UI

• Instant Connections

• Configurable Frequencies

• Historical Backload

• 70+ Integrations Today

Destinations

• Amazon Redshift

• Google BigQuery

• Snowflake

• Panoply

• PostgreSQL

Open Source

• Powered by singer.io

• Simple & Composable

• JSON Based Standard

• Extensible by Anyone

• Embed in your Application

Platform: Integration, Scalability, Reliability, Security, Compliance, Extensibility

Better Together :)

Read the Stitch - ezCater case study

https://www.stitchdata.com/customers/ezcater-enterprise-etl/

Data: Stitch

Data: Stitch Singer

• No waiting for AppsFlyer integration

• No ongoing costs to support AppsFlyer

• Contributing to existing HubSpot integration code

• Less lock-in

• Contractors can contribute

• Singer Slack Channel

Learning

Time to let the machines learn

Training: Start Linear

“When starting, always do a simple regression first”

- Everybody
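As a rough illustration of what "a simple regression first" can mean, the sketch below fits a one-variable least-squares line from order size to Actual Y1R using the made-up training rows from the Data slide, then scores the New Data rows. Using order size as the only feature is an assumption made to keep the example small; it is not the model described in this talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Order size and Actual Y1R from the made-up training rows on the Data slide.
order_size = [103.45, 140.12, 35.00, 201.12, 55.32]
actual_y1r = [500.0, 200.0, 10.0, 30.0, 14.0]

intercept, slope = fit_line(order_size, actual_y1r)

# Score the "New Data" rows, whose Actual Y1R is not known yet.
new_order_sizes = {1008: 93.45, 1009: 123.99, 1010: 18.00, 1011: 182.12, 1012: 65.32}
ey1r = {cid: intercept + slope * size for cid, size in new_order_sizes.items()}
for cid, estimate in ey1r.items():
    print(cid, round(estimate, 2))
```

With five invented rows the estimates mean nothing, but the shape of the workflow is the same at any scale: fit on rows with a known Y1R, then produce an EY1R for rows without one.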

Expectations

This… isn’t going to be easy

Training: Vowpal Wabbit

Out-of-core learning algorithm

• Popular on Kaggle

• Free

• Gradient descent

• Linear

• Logistic

• Neural

Input format:

40 |b event_day_of_week=Mon |c event_local_time:1200

Pros:

• Readable input format

• Super fast: <1 min for 500k

• Built-in protection against overfitting

Cons:

• Analysis totally DIY

• Viewing feature weightings wonky

• Totally DIY pipeline

• Regression not a fabulous fit
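In the input line shown for Vowpal Wabbit, the leading number is the label, each `|name` starts a namespace, a token such as `event_day_of_week=Mon` is a categorical feature with an implicit value of 1, and `event_local_time:1200` is a numeric feature. A small sketch of generating such lines from Python follows; the helper name is hypothetical and only the sample line itself comes from the slide.

```python
def to_vw_line(label, namespaces):
    """Format one example as: label |ns feature feature:value ..."""
    parts = [str(label)]
    for ns, features in namespaces.items():
        tokens = []
        for name, value in features.items():
            if isinstance(value, str):
                # Categorical: the whole token name=value is the feature (value 1).
                tokens.append(f"{name}={value}")
            else:
                # Numeric: name:value gives the feature a numeric value.
                tokens.append(f"{name}:{value}")
        parts.append(f"|{ns} " + " ".join(tokens))
    return " ".join(parts)


line = to_vw_line(40, {
    "b": {"event_day_of_week": "Mon"},
    "c": {"event_local_time": 1200},
})
print(line)
```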

Limits of Regression

Regression + co-dependence =

• People that order on the weekend are MUCH worse

• People that make fewer orders are worse

BUT

If people make many orders, then it doesn’t matter if they order on the weekend!
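A purely additive model gives "weekend" one fixed penalty, no matter how many orders the customer has made, so it cannot represent this pattern. The sketch below shows that on a 2x2 grid of invented group averages: the best additive fit applies the same weekend penalty to both groups and misses every cell.

```python
# Hypothetical average Y1R per customer group (numbers invented for illustration).
ACTUAL = {
    ("few orders", "weekday"): 100.0,
    ("few orders", "weekend"): 20.0,
    ("many orders", "weekday"): 500.0,
    ("many orders", "weekend"): 500.0,
}


def additive_fit(cells):
    """Least-squares fit of y = base + order_effect + day_effect on a balanced grid.

    With one value per cell, that fit is row mean + column mean - grand mean.
    """
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    grand = sum(cells.values()) / len(cells)
    row_mean = {r: sum(cells[r, c] for c in cols) / len(cols) for r in rows}
    col_mean = {c: sum(cells[r, c] for r in rows) / len(rows) for c in cols}
    return {(r, c): row_mean[r] + col_mean[c] - grand for r in rows for c in cols}


predicted = additive_fit(ACTUAL)
for key in ACTUAL:
    print(key, "actual:", ACTUAL[key], "additive model:", predicted[key])
```

Capturing the pattern requires either an explicit interaction feature (for example, weekend multiplied by an order-count indicator) or a model family that learns interactions on its own, such as decision trees.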

Alternatives to Regression

• Gradient boosted trees

• Random forests

• Clustering

• Neural nets

• SO MANY OTHER CHOICES
Training Choices

1. Install xgboost, learn Python & scikit-learn, or maybe R

2. See whether there’s something to one of these machine learning companies

RapidMiner

Test 4 totally different techniques on the exact same data!

Compare ROC curves trivially!

• Free trial

• ~1 hour to get my CSV -> Regression outputs

• Super easy to explain pipeline to new colleagues

Query Redshift right from RapidMiner

Generate new attributes

Clear Visual Pipeline

Simple Analysis

No lack of detail

Nice things “just work”

Results

(It works!)

R^2    Spearman’s Rho  Accuracy of prediction
.23    .648            .82
.504   .834            .89
.73    .920            .94
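For readers unfamiliar with the first two metrics: R^2 measures how much of the variation in the dollar amounts the predictions explain, while Spearman's rho only measures whether customers are ranked in the right order. The sketch below computes both from scratch on invented values. "Accuracy of prediction" is not defined in the slides, so it is not reproduced here.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman_rho(actual, predicted):
    """Spearman's rho is the Pearson correlation of the two rank vectors."""
    return pearson(ranks(actual), ranks(predicted))


# Invented example: the ordering is perfect, but the amounts are off.
actual = [500.0, 200.0, 10.0, 30.0, 14.0]
predicted = [300.0, 150.0, 40.0, 90.0, 60.0]
print("R^2:", round(r_squared(actual, predicted), 3))
print("Spearman's rho:", round(spearman_rho(actual, predicted), 3))
```

A model can rank customers well even when its dollar estimates are rough, which is one plausible reading of why rho is higher than R^2 in each row above, and ranking is what matters when comparing cohorts.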

EY1R shows major differences in cohort quality

Blue and Red converted the best… but lost on EY1R

Top Line Company Metrics

[Chart: # New Customers / week and New EY1R / week, Before vs. After]

Challenges

1. Ingo’s 2nd mistake

2. Life changes underneath you: utm_values

3. But what does it mean?

4. KISS

5. Mixing Time Horizons

6. Productionization

Takeaways

1. Estimating LTV is doable

2. Never say “LTV” by itself: prefer EY1R etc.

3. RapidMiner allows mere mortals to use data science

4. Gradient Boosted Trees are great (aka “Listen to YY”)

Let’s Connect

Jeff Dwyer

Software Engineering Manager

Works on growth at ezCater

Not a machine learning expert

@jdwyah

engineering.ezcater.com