A Predictive Model Factory Picks Up Steam


Upload: sri-ambati

Post on 12-Jul-2015


Page 1: A Predictive Model Factory Picks Up Steam

1© 2014 Cisco and/or its affiliates. All rights reserved.

A Predictive Model Factory Picks Up Steam

H2O and Cisco’s Propensity to Buy Factory

Lou Carvalheira

H2O World - Nov2014

Page 2: A Predictive Model Factory Picks Up Steam

Who we are:

• 20 professionals with advanced degrees in Statistics, Mathematics and Econometrics

• Contributed to more than $3 Billion in additional bookings to Cisco since 2007 (measured with the use of control groups in models deployed by the Marketing and Sales organizations)

• Recipients of the 2008 Gold Award for Analytical Modeling from the National Conference for Database Marketing

What we do:

• Predictive Modeling, Customer Valuation, Forecasting, Optimization

Our Mission:

• Deliver insights that will influence and improve Sales and Marketing initiatives and that are derived through the use of Statistics and Data Mining

Page 3: A Predictive Model Factory Picks Up Steam

Page 4: A Predictive Model Factory Picks Up Steam

Build Statistical Model

Historical

• Firmographics

• Past Purchase Behavior

• Contacts (# and types)

• Marketing Interactions

• Cust Sat surveys

• Macroeconomic indicators

• Purchase / Non Purchase

Outputs

• SCORE: probability that a company will buy a specific technology in the next quarter

• VALUE: bookings amount that Cisco will likely see IF the company in fact buys the technology

[Timeline figure: time axis t marked in quarters (Q4, Q1, Q2, Q3, ...), with the labels "most recent closed quarter", "past purchase behavior and marketing interactions", "firmographics and contact data…", "Scoring happens here" and "predicted purchase window"]

Latest

• Firmographics

• Past Purchase Behavior

• Contacts (# and types)

• Marketing Interactions

• Cust Sat surveys

• Macroeconomic indicators
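
The timeline on this slide can be sketched as a small labeling routine: predictors come from quarters up to the most recent closed quarter, and the target is whether the company buys the technology in the following quarter. This is a minimal illustration, not Cisco's pipeline. The data layout, the integer quarter index, the two toy features, and the names `build_training_rows`, `purchases`, `cutoff` and `lookback` are all assumptions made for the example.

```python
from collections import defaultdict

def build_training_rows(purchases, cutoff, lookback=4):
    """Build (company, features, label) rows for one technology.

    purchases: iterable of (company, quarter_index, bookings) tuples
               (hypothetical layout; quarters are consecutive integers).
    cutoff:    index of the last quarter whose data may feed the predictors.
    The label is 1 if the company bought in quarter cutoff + 1, else 0.
    """
    # Sum bookings per company and quarter.
    history = defaultdict(dict)
    for company, quarter, bookings in purchases:
        history[company][quarter] = history[company].get(quarter, 0.0) + bookings

    rows = []
    for company, by_quarter in sorted(history.items()):
        # Predictors only look at the lookback window ending at the cutoff.
        window = range(cutoff - lookback + 1, cutoff + 1)
        past = [by_quarter.get(q, 0.0) for q in window]
        features = {
            "quarters_with_purchase": sum(1 for b in past if b > 0),
            "total_past_bookings": sum(past),
        }
        # The target comes from the quarter right after the cutoff.
        label = 1 if by_quarter.get(cutoff + 1, 0.0) > 0 else 0
        rows.append((company, features, label))
    return rows

# Made-up purchase history for illustration only.
purchases = [
    ("acme", 1, 100.0), ("acme", 3, 50.0), ("acme", 5, 75.0),
    ("globex", 2, 20.0),
]
rows = build_training_rows(purchases, cutoff=4)
```

For scoring, the same feature logic would be run with the cutoff moved to the latest closed quarter, and the label would simply not exist yet.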

Page 5: A Predictive Model Factory Picks Up Steam

Too many models (60,000!)

From scratch every time

Users: “my region is different”

Cisco is constantly introducing new products and services

2 distinct universes of companies: internal and external (160M)

Models by product, country, company size, and marketing objective

Tech business changes a lot: new patterns arise every time

Companies change: mergers, acquisitions, in & out of business

Improvements in data collection may make more info available

The truth: too few data miners for “artisanal” approach to modeling

Page 6: A Predictive Model Factory Picks Up Steam

[Process diagram. Labels: "Massive Data Prep"; "Model Training"; "For all potential products"; "Scoring"; "When new data is available"; "Deployment, Embedding Control Groups"; "Naïve, Random, Challenger"; "SAS, Data Warehouse, Salesforce, etc"; "Country & Regional End of Quarter Results Assessment". Tools shown: SAS + SAS Ent. Miner; SAS, Teradata, BO, Tableau]

Challenges:

• Busy SAS environment, shared by other groups (using mostly EG)

• Training and scoring would sometimes take more than 4 weeks!

• Decision Trees only
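
The deck refers to control groups embedded at deployment and to bookings measured against them, but it does not define how the groups are built or compared. As a hedged sketch of the general idea only, the snippet below estimates incremental bookings as the difference in mean bookings per account between a model-targeted group and a control group. The function name `incremental_bookings`, the assumption that the control group is selected without the model, and the numbers are all hypothetical.

```python
from statistics import mean

def incremental_bookings(targeted, control):
    """Estimate bookings attributable to model-driven targeting.

    targeted: bookings per account for accounts selected by the model.
    control:  bookings per account for a comparable control group
              (assumed here to be selected without the model).
    Returns (lift_per_account, total_incremental_bookings).
    """
    if not targeted or not control:
        raise ValueError("both groups must be non-empty")
    # Average difference per account between the two groups.
    lift = mean(targeted) - mean(control)
    # Scale the per-account lift to the size of the targeted group.
    return lift, lift * len(targeted)

# Made-up numbers for illustration only.
targeted = [120.0, 0.0, 300.0, 80.0]
control = [50.0, 0.0, 100.0, 50.0]
lift, total = incremental_bookings(targeted, control)
```

A real assessment would also need comparable group composition and some measure of uncertainty, which this sketch leaves out.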

Page 7: A Predictive Model Factory Picks Up Steam

Page 8: A Predictive Model Factory Picks Up Steam

[Process diagram. Labels: "Massive Data Prep"; "Model Training"; "Scoring"; "Deployment, Embedding Control Groups"; "Results Assessment"]

H2O in small cluster

• 4 nodes running on CentOS

• 24 cores, 128GB memory each

• Using R to control flow of process

Training with

• Many tens of millions of observations

• GLM, Random Forest, Gradient Boosting

• Different algorithms used in ensemble and compared

Results in

• 2 days to train and score all models!

• More data, more patterns being identified

• More techniques compared

• More accuracy

Page 9: A Predictive Model Factory Picks Up Steam

[Timeline figure, "Before without H2O", spanning Q1 and Q2. Labels: "P2B Training"; "Scoring models"; "Data Refresh Q1"; "Data Refresh Q2"; "Prepare, execute Mktg & Sales activities"]

[Timeline figure, "Now with H2O", spanning Q1 and Q2. Labels repeated in each quarter: "Data Refresh"; "Train & score"; "Prepare, execute Mktg & Sales activities"]

Without H2O:

• Models needed to be prepared in advance, so as not to delay scoring

• More time preparing models, less time left for using the scores in the sales activities

With H2O:

• Newer buying patterns incorporated immediately into models

• Scores are published sooner: more time for planning and executing activities

Page 10: A Predictive Model Factory Picks Up Steam

1) Define environment and main parameters

2) Read training and scoring files

• Reserve subset of training for validation

3) Define list of predictors and target variables

4) Train first stage for each target product (what is the probability of purchase?)

• Train GLM and evaluate model against validation

• Train a couple of Random Forests using different architectures (fewer, deeper trees vs more, shallower trees) and evaluate each model against validation

• If product has traditionally been hard to predict, then train a GBM and evaluate model against validation

• Use best model (AUC) to score. If more than one has good result, use ensemble to compose probability of purchase

5) Train second stage for each target product (how much will be purchased?)

• Train a GLM and evaluate results on validation set

• Train a GBM and evaluate results on validation set

• Choose the best model to score and predict purchase value

6) Save intermediate results and treat the next target product (step 4)

7) Save final score files and clean things up
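
The selection rule in step 4 (use the best model by AUC, or an ensemble when several do well) can be sketched in a few lines. This is an illustration under stated assumptions, not the factory's actual R code: the hard-coded probability lists stand in for validation-set predictions that would come from the trained GLM and Random Forest models, the `tolerance` threshold is a hypothetical reading of "more than one has good result", and the ensemble is assumed to be a plain average.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

def select_or_ensemble(labels, candidates, tolerance=0.01):
    """Pick the best candidate by validation AUC; average close runners-up.

    candidates: dict of model name -> validation-set probabilities.
    tolerance:  hypothetical cut-off for treating a model as "also good".
    Returns (names_used, combined_probabilities).
    """
    aucs = {name: auc(labels, preds) for name, preds in candidates.items()}
    best = max(aucs.values())
    # Keep every model whose AUC is within the tolerance of the best one.
    chosen = sorted(name for name, a in aucs.items() if best - a <= tolerance)
    # Simple average of the chosen models' probabilities (assumed ensemble).
    combined = [
        sum(candidates[name][i] for name in chosen) / len(chosen)
        for i in range(len(labels))
    ]
    return chosen, combined

# Made-up validation labels and predictions for illustration only.
labels = [1, 1, 0, 0]
candidates = {
    "glm": [0.9, 0.4, 0.5, 0.1],
    "rf_deep": [0.8, 0.6, 0.3, 0.2],
    "rf_shallow": [0.7, 0.7, 0.4, 0.1],
}
chosen, combined = select_or_ensemble(labels, candidates)
```

The pairwise AUC is quadratic in the number of rows, which is fine for a toy example but not for tens of millions of observations; in practice the metric would come from the modeling library itself.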

Page 11: A Predictive Model Factory Picks Up Steam

Improvements

• P2B factory is 15x faster with H2O

• Quicker techniques for simpler problems, deeper for harder ones (grid searches!)

• Ensembles improved accuracy and stability of models significantly

Lessons Learned

• Memory is your friend! Even with few nodes, speed improvement over traditional data mining tools is substantial

• H2O becomes really powerful and robust when combined with R

• Rely on Hexadata’s extremely responsive support

• Anxious to see more data preparation capabilities in H2O

Page 12: A Predictive Model Factory Picks Up Steam

Throughout the last decade, Cisco has increasingly relied on advanced analytics to drive marketing and sales efforts.

The P2B Factory has been a fundamental component of that drive, but it needed to expand, predict new products and services, increase its accuracy and do it all in less time.

H2O allowed that improvement to happen with its powerful in-memory distributed computing algorithms, great support team and cost-effective solution.

Page 13: A Predictive Model Factory Picks Up Steam

Thank you.