TRANSCRIPT
Taking ML to Production
Nikhil Ketkar, 14 July 2015
What
• Key insights on taking machine learning models to production
• Opinionated, but backed by experience
• In the trenches of an early-phase startup
• Mileage may vary
• More than one way to do it
• Not a cookbook; you need to think critically
Build an Override Mechanism Early
• Models will never be perfect
• The cost of certain mistakes is too high
• Field requests need to be handled ASAP
• Model improvements are made incrementally
• The override DB is your feature backlog
[Diagram: a model override system sits between input X and output Y; entries in the Override DB take precedence over the model's prediction, and the (X, Y) override pairs feed back into training.]
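The override flow above can be sketched minimally in Python; class and method names here (`OverrideableModel`, `add_override`) are illustrative, not from the talk, and a real Override DB would be a shared store rather than an in-memory dict.

```python
# Minimal sketch of an override layer in front of a model.
# `model` is any object with a predict(x) method.

class OverrideableModel:
    """Checks an override store before falling back to the model."""

    def __init__(self, model, overrides=None):
        self.model = model
        self.overrides = dict(overrides or {})  # stand-in for an Override DB

    def predict(self, x):
        # Field-reported fixes take precedence over the model.
        if x in self.overrides:
            return self.overrides[x]
        return self.model.predict(x)

    def add_override(self, x, y):
        # Each override doubles as a labeled example for the next
        # training run, i.e. an entry in the feature backlog.
        self.overrides[x] = y

    def training_examples(self):
        return list(self.overrides.items())
```

The point of the wrapper is that a field request can be answered immediately with `add_override`, while the model itself is improved on its own, slower cadence.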
Establish Benchmarks and Blind Sets Early
• Concerns about a new model:
– Is the new model really better?
– Should we push it to production?
– What will be the impact?
• Kill ambiguity with a blind set
• Build the set early on
• A stakeholder builds and maintains the blind set
• No peeking
[Diagram: data scientists work only with the training + validation set; stakeholders maintain the blind set used to evaluate the model X → Y.]
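The no-peeking rule can be enforced in code as well as by process. A hypothetical gatekeeper might expose the blind-set inputs and an aggregate score, but never the labels themselves:

```python
# Illustrative blind-set gatekeeper: data scientists can submit a
# predictor and get back only an aggregate score, never the labels.

class BlindSet:
    def __init__(self, examples):
        # examples: list of (input, true_label) pairs, held privately
        self._examples = examples

    def inputs(self):
        # Inputs may be shared so predictions can be generated.
        return [x for x, _ in self._examples]

    def score(self, predict):
        # Returns accuracy only; per-example labels stay hidden.
        correct = sum(1 for x, y in self._examples if predict(x) == y)
        return correct / len(self._examples)
```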
Develop Expertise in Using Crowdsourcing
• Acquiring ground truth is priority number one
• No problem can be solved without ground truth
• Even unsupervised approaches require validation
• The crowdsourcing ecosystem is maturing, but you need to iron out process and tools
• Having multiple vendors helps
• Allocate budget for crowdsourcing
• Should be an ongoing process
• Active learning
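One part of the process to iron out is aggregating redundant crowd judgments into a single label. A minimal majority-vote sketch (the `min_agreement` threshold is an illustrative choice; real pipelines often weight workers by historical quality):

```python
from collections import Counter

# Aggregate redundant crowd judgments by majority vote, keeping only
# labels that clear an agreement threshold. Disputed items go back to
# the crowd (or into an active-learning review queue).

def aggregate(judgments, min_agreement=0.7):
    """judgments: dict mapping item -> list of crowd labels."""
    resolved, disputed = {}, []
    for item, labels in judgments.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            resolved[item] = label
        else:
            disputed.append(item)  # re-queue for more judgments
    return resolved, disputed
```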
Calibrate all your Models
• All models should be calibrated to produce probabilities, so you can:
– Filter out predictions with poor confidence
– Crowdsource labeling of poor-confidence predictions if critical
– Generate useful training data
– Make joint inferences
– Trigger alerts
– Serve use cases with different misclassification costs
• Two popular approaches: isotonic regression and Platt scaling
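Isotonic regression can be illustrated with the classic pool-adjacent-violators algorithm. This is a minimal pure-Python sketch, not a production calibrator; libraries such as scikit-learn provide both isotonic and Platt (sigmoid) calibration out of the box.

```python
# Pool Adjacent Violators: fit a monotonically non-decreasing sequence
# to 0/1 labels ordered by model score, yielding calibrated probabilities.

def pav(values, weights=None):
    weights = weights or [1.0] * len(values)
    merged = []  # each block: [mean, total_weight, point_count]
    for v, w in zip(values, weights):
        merged.append([v, w, 1])
        # Merge blocks while the monotonicity constraint is violated.
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            b2, b1 = merged.pop(), merged.pop()
            w_sum = b1[1] + b2[1]
            mean = (b1[0] * b1[1] + b2[0] * b2[1]) / w_sum
            merged.append([mean, w_sum, b1[2] + b2[2]])
    # Expand blocks back to per-point fitted values.
    fitted = []
    for mean, _, count in merged:
        fitted.extend([mean] * count)
    return fitted

def calibrate(scores, labels):
    """Sort by raw score, PAV the 0/1 labels -> (score, probability) pairs."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    probs = pav([labels[i] for i in order])
    return sorted(zip([scores[i] for i in order], probs))
```

At serving time, a new score is mapped to the calibrated probability of the nearest fitted score (interpolating between neighbors).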
Standardize Process of Model Improvement
• Bias problem
– High training error and high testing error
• Variance problem
– Low training error and high testing error
• Ground truth problem
– Low training error and low testing error, but high error on field data
[Diagram: remedies per problem]
• Bias problem → add more features; model with a richer hypothesis space
• Variance problem → regularization, feature reduction, feature selection, ensembles, model with a poorer hypothesis space, more data
• Ground truth problem → update your training and testing sets
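The triage rules above can be captured in a toy diagnostic; the 15% error threshold is purely illustrative and would depend on the problem.

```python
# Toy triage of a model's failure mode from three error rates,
# following the bias / variance / ground-truth rules above.

def diagnose(train_err, test_err, field_err, high=0.15):
    if train_err >= high and test_err >= high:
        return "bias"          # underfitting: richer model, more features
    if train_err < high and test_err >= high:
        return "variance"      # overfitting: regularize, more data, ensembles
    if field_err >= high:
        return "ground truth"  # refresh training/testing sets
    return "ok"
```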
Continuous Delivery of Models
• Model improvements are incremental
• A significant improvement on the blind set is followed by taking the model to production
• If this process is long and complicated, further model improvements take a hit
• Handing off a model to a developer does not work well
• Infrastructure (staging, blue-green deployment) is essential
• Different tech stacks for model development and production are a roadblock
• First-time deployments of models will be painful; don't abstract too much
Democratize your Data
• Data silos are evil:
– Production-specific technology choices are bad for tinkering
– Undocumented formats
– Serious programming required
– Bars all non-developers from innovation
– Kills velocity
• Everybody should be able to access all data with the tools of their choice
• Self-serve tools, e.g. Solr and Hive
• Support multiple languages
Is your Ground Truth still the Truth?
• Ground truth is a moving target
• If reality has changed, so should your ground truth and your model
• Sample and spot-check via crowdsourcing often
• Trigger alerts based on key statistical differences
• Automate the sampling, spot-checking and triggering
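One possible statistical trigger is the two-sample Kolmogorov–Smirnov statistic between a fresh field sample and the data the ground truth was built from. A pure-Python sketch (the alert threshold of 0.2 is an assumption; in practice you would pick it from the KS critical values for your sample sizes, or use `scipy.stats.ks_2samp`):

```python
# Drift check: max distance between two empirical CDFs (two-sample KS).

def ks_statistic(a, b):
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = sum(1 for x in a if x <= v) / len(a)
        cdf_b = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def drift_alert(reference, fresh, threshold=0.2):
    """True if the fresh field sample has drifted from the reference."""
    return ks_statistic(reference, fresh) > threshold
```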
Beware of Cascading Errors
• Cascading inference means cascading errors
• Try to make a joint inference
– Not always possible
– Complicated
• Experiment with order of inference
• Segment populations and use different orders
• Product Category and Product Brand is a good example
[Diagram: sequential inference A → B versus a single joint inference J over A and B.]
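Why cascading errors hurt: if the stages are roughly independent, per-stage accuracies multiply along the chain. A back-of-envelope helper:

```python
# Independent per-stage accuracies compound multiplicatively
# along a prediction cascade.

def cascade_accuracy(stage_accuracies):
    acc = 1.0
    for a in stage_accuracies:
        acc *= a
    return acc
```

Three 90%-accurate stages leave only about 73% accuracy end-to-end, which is why joint inference is worth the complication when it is feasible.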
Traceability and Repeatability
• As the prediction sequence becomes long, debugging errors becomes complicated
• Traceability is being able to view the sequence of predictions leading to a final prediction
• Repeatability is being able to reproduce the results of an end-to-end run
• Simple in theory, but think about this early
• Hard when data, code and models are changing all the time
• Infrastructure should focus on traceability and repeatability over performance
• Expose internal REST APIs over all models
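A minimal illustration of a trace record (field names are hypothetical): each stage in the cascade appends its model name, pinned version, inputs and output, so the final prediction can be walked backwards and the run reproduced against the same model versions.

```python
import json
import uuid

# Illustrative trace record for a multi-stage prediction pipeline.

class PredictionTrace:
    def __init__(self):
        self.trace_id = str(uuid.uuid4())
        self.steps = []

    def record(self, model_name, model_version, inputs, output):
        # Pinning the version is what makes the run repeatable later.
        self.steps.append({
            "model": model_name,
            "version": model_version,
            "inputs": inputs,
            "output": output,
        })
        return output

    def to_json(self):
        # Persist alongside the final prediction for later debugging.
        return json.dumps({"trace_id": self.trace_id, "steps": self.steps})
```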
On-line/Near-line/Off-line Models
• Online models
– Simple
– Fast
– Don't change often
– In the line of fire
– Good enough
• Near-line models
– Intermediate
• Offline models
– Complicated
– Slow
– Change frequently
[Diagram: product classification at three tiers, each writing to a prediction cache/DB]
• Online: Title, UPC, Brand, MPN → linear SVM and DB lookup
• Near-line: Title, Description, Breadcrumb, UPC, Brand, MPN → ensemble model and DB lookup
• Offline: Title, Description, Breadcrumb, UPC, Brand, MPN, Image → ensemble model, joint inference
Kaggle-ize your Problems
• Package your training data and blind sets with a readme and house them internally
• Allows people to take a crack at a problem if they are interested
• Can be used for internal hackathons, during the interview process, and can be put on Kaggle if it's important enough
• Allows you to take help from experts when they are available
Build Capability to Run What-if Scenarios
• Results on the blind set are a good gate for small changes
• For major changes, a full production run will be required
• Impact on dependent subsystems needs to be validated
• The ability to run what-if scenarios is paramount for velocity
• Running a parallel setup all the time is prohibitive
• Ad-hoc setup, end-to-end run, tests and tear-down of a production setup is essential
Be Smart about Sampling
• Make sure your samples are truly random
• Extract multiple samples and compare them on the statistic you care about to ensure there is no bias
• Make sure your sample sizes give you the (statistical) power you need
• Do the math or a simulation to save on the crowdsourcing budget
• Save the seed to be able to replicate experiments
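The "do the math" step for estimating a proportion (say, an error rate via spot checks) is the standard sample-size formula, where `z` is the normal quantile for the desired confidence level (1.96 for 95%):

```python
import math

# Sample size needed to estimate a proportion p to within a given
# margin of error at confidence level z (1.96 ~ 95%). Using p = 0.5
# gives the worst case when the true proportion is unknown.

def sample_size(margin_of_error, p=0.5, z=1.96):
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)
```

At 95% confidence and a ±5% margin, about 385 judgments suffice regardless of population size, which is what bounds the crowdsourcing spend per spot check.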
Ensembles and Co-Training
• Always be ensembling
• Both homogeneous (bagging, boosting) and heterogeneous (stacking)
• Segment your population and train specific models for each segment
• Co-training based on text and images
• Develop independent models that can be combined based on available data
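A heterogeneous ensemble can be as simple as a weighted combination of independently developed models; in real stacking the weights (or a full combiner model) would be learned on held-out predictions rather than chosen by hand. A minimal sketch:

```python
# Weighted combination of heterogeneous base models. Each base model
# is a callable returning a score (e.g. a calibrated probability), so
# models built on different signals (text, images) combine naturally.

def stack_predict(base_models, weights, x):
    total = sum(weights)
    return sum(w * m(x) for m, w in zip(base_models, weights)) / total
```

Because each base model is independent, the combiner can also skip models whose inputs are missing for a given item, which matches the point above about combining based on available data.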
Summary
• Build an override system early
• Establish blind sets and benchmarks early
• Invest time and gain expertise in crowdsourcing
• Calibrate your models
• Standardize the model improvement process
• Continuous delivery of models
• Democratize your data
• Is your ground truth still the truth?
• Beware of cascading errors
• Traceability and repeatability are paramount
• Online/near-line/offline models
• Kaggle-ize your problems
• Build capability to run what-if scenarios
• Be smart about sampling
• Ensembles and co-training