TRANSCRIPT
Taking ML to Production
Nikhil Ketkar, 14 July 2015
What
• Key insights on taking machine learning models to production
• Opinionated, but backed by experience
• In the trenches of an early-phase startup
• Mileage may vary
• More than one way to do it
• Not a cookbook; you need to think critically
Build an Override Mechanism Early
• Models will never be perfect
• The cost of certain mistakes is too high
• Field requests need to be handled ASAP
• Model improvements are made incrementally
• The override DB is your feature backlog
[Diagram: a model override system sits between input X and output Y; entries in the Override DB take precedence over the model's prediction, and the (X, Y) override pairs feed back into training.]
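The override flow above can be sketched minimally in Python; class and method names here (`OverrideableModel`, `add_override`) are illustrative, not from the talk, and a real Override DB would be a shared store rather than an in-memory dict.

```python
# Minimal sketch of an override layer in front of a model.
# `model` is any object with a predict(x) method.

class OverrideableModel:
    """Checks an override store before falling back to the model."""

    def __init__(self, model, overrides=None):
        self.model = model
        self.overrides = dict(overrides or {})  # stand-in for an Override DB

    def predict(self, x):
        # Field-reported fixes take precedence over the model.
        if x in self.overrides:
            return self.overrides[x]
        return self.model.predict(x)

    def add_override(self, x, y):
        # Each override doubles as a labeled example for the next
        # training run, i.e. an entry in the feature backlog.
        self.overrides[x] = y

    def training_examples(self):
        return list(self.overrides.items())
```

The point of the wrapper is that a field request can be answered immediately with `add_override`, while the model itself is improved on its own, slower cadence.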
Establish Benchmarks and Blind Sets Early
• Concerns about a new model:
– Is the new model really better?
– Should we push it to production?
– What will be the impact?
• Kill ambiguity with a blind set
• Build the set early on
• A stakeholder builds and maintains the blind set
• No peeking
[Diagram: data scientists work only with the training + validation set; stakeholders maintain the blind set used to evaluate the model X → Y.]
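The no-peeking rule can be enforced in code as well as by process. A hypothetical gatekeeper might expose the blind-set inputs and an aggregate score, but never the labels themselves:

```python
# Illustrative blind-set gatekeeper: data scientists can submit a
# predictor and get back only an aggregate score, never the labels.

class BlindSet:
    def __init__(self, examples):
        # examples: list of (input, true_label) pairs, held privately
        self._examples = examples

    def inputs(self):
        # Inputs may be shared so predictions can be generated.
        return [x for x, _ in self._examples]

    def score(self, predict):
        # Returns accuracy only; per-example labels stay hidden.
        correct = sum(1 for x, y in self._examples if predict(x) == y)
        return correct / len(self._examples)
```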
Develop Expertise in Using Crowdsourcing
• Acquiring ground truth is priority number one
• No problem can be solved without ground truth
• Even unsupervised approaches require validation
• The crowdsourcing ecosystem is maturing, but you need to iron out process and tools
• Having multiple vendors helps
• Allocate budget for crowdsourcing
• Should be an ongoing process
• Active learning
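One part of the process to iron out is aggregating redundant crowd judgments into a single label. A minimal majority-vote sketch (the `min_agreement` threshold is an illustrative choice; real pipelines often weight workers by historical quality):

```python
from collections import Counter

# Aggregate redundant crowd judgments by majority vote, keeping only
# labels that clear an agreement threshold. Disputed items go back to
# the crowd (or into an active-learning review queue).

def aggregate(judgments, min_agreement=0.7):
    """judgments: dict mapping item -> list of crowd labels."""
    resolved, disputed = {}, []
    for item, labels in judgments.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            resolved[item] = label
        else:
            disputed.append(item)  # re-queue for more judgments
    return resolved, disputed
```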
Calibrate all your Models
• All models should be calibrated to produce probabilities, so you can:
– Filter out predictions with poor confidence
– Crowdsource labeling of poor-confidence predictions if critical
– Generate useful training data
– Make joint inferences
– Trigger alerts
– Serve use cases with different misclassification costs
• Two popular approaches: isotonic regression and Platt scaling
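Isotonic regression can be illustrated with the classic pool-adjacent-violators algorithm. This is a minimal pure-Python sketch, not a production calibrator; libraries such as scikit-learn provide both isotonic and Platt (sigmoid) calibration out of the box.

```python
# Pool Adjacent Violators: fit a monotonically non-decreasing sequence
# to 0/1 labels ordered by model score, yielding calibrated probabilities.

def pav(values, weights=None):
    weights = weights or [1.0] * len(values)
    merged = []  # each block: [mean, total_weight, point_count]
    for v, w in zip(values, weights):
        merged.append([v, w, 1])
        # Merge blocks while the monotonicity constraint is violated.
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            b2, b1 = merged.pop(), merged.pop()
            w_sum = b1[1] + b2[1]
            mean = (b1[0] * b1[1] + b2[0] * b2[1]) / w_sum
            merged.append([mean, w_sum, b1[2] + b2[2]])
    # Expand blocks back to per-point fitted values.
    fitted = []
    for mean, _, count in merged:
        fitted.extend([mean] * count)
    return fitted

def calibrate(scores, labels):
    """Sort by raw score, PAV the 0/1 labels -> (score, probability) pairs."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    probs = pav([labels[i] for i in order])
    return sorted(zip([scores[i] for i in order], probs))
```

At serving time, a new score is mapped to the calibrated probability of the nearest fitted score (interpolating between neighbors).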
Standardize Process of Model Improvement
• Bias problem
– High training error and high testing error
• Variance problem
– Low training error and high testing error
• Ground truth problem
– Low training error and low testing error, but high error on field data
[Diagram: remedies per problem]
• Bias problem → add more features; model with a richer hypothesis space
• Variance problem → regularization, feature reduction, feature selection, ensembles, model with a poorer hypothesis space, more data
• Ground truth problem → update your training and testing sets
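The triage rules above can be captured in a toy diagnostic; the 15% error threshold is purely illustrative and would depend on the problem.

```python
# Toy triage of a model's failure mode from three error rates,
# following the bias / variance / ground-truth rules above.

def diagnose(train_err, test_err, field_err, high=0.15):
    if train_err >= high and test_err >= high:
        return "bias"          # underfitting: richer model, more features
    if train_err < high and test_err >= high:
        return "variance"      # overfitting: regularize, more data, ensembles
    if field_err >= high:
        return "ground truth"  # refresh training/testing sets
    return "ok"
```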
Continuous Delivery of Models
• Model improvements are incremental
• A significant improvement on the blind set is followed by taking the model to production
• If this process is long and complicated, further model improvements take a hit
• Handing off a model to a developer does not work well
• Infrastructure (staging, blue-green deployment) is essential
• Different tech stacks for model development and production are a roadblock
• First-time deployments of models will be painful; don't abstract too much
Democratize your Data
• Data silos are evil:
– Production-specific technology choices are bad for tinkering
– Undocumented formats
– Serious programming required
– Bars all non-developers from innovation
– Kills velocity
• Everybody should be able to access all data with the tools of their choice
• Self-serve tools, e.g. Solr and Hive
• Support multiple languages
Is your Ground Truth still the Truth?
• Ground truth is a moving target
• If reality has changed, so should your ground truth and your model
• Sample and spot-check via crowdsourcing often
• Trigger alerts based on key statistical differences
• Automate the sampling, spot-checking and triggering
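One possible statistical trigger is the two-sample Kolmogorov–Smirnov statistic between a fresh field sample and the data the ground truth was built from. A pure-Python sketch (the alert threshold of 0.2 is an assumption; in practice you would pick it from the KS critical values for your sample sizes, or use `scipy.stats.ks_2samp`):

```python
# Drift check: max distance between two empirical CDFs (two-sample KS).

def ks_statistic(a, b):
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = sum(1 for x in a if x <= v) / len(a)
        cdf_b = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def drift_alert(reference, fresh, threshold=0.2):
    """True if the fresh field sample has drifted from the reference."""
    return ks_statistic(reference, fresh) > threshold
```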
Beware of Cascading Errors
• Cascading inference means cascading errors
• Try to make a joint inference
– Not always possible
– Complicated
• Experiment with order of inference
• Segment populations and use different orders
• Product Category and Product Brand is a good example
[Diagram: sequential inference A → B versus a single joint inference J over A and B.]
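Why cascading errors hurt: if the stages are roughly independent, per-stage accuracies multiply along the chain. A back-of-envelope helper:

```python
# Independent per-stage accuracies compound multiplicatively
# along a prediction cascade.

def cascade_accuracy(stage_accuracies):
    acc = 1.0
    for a in stage_accuracies:
        acc *= a
    return acc
```

Three 90%-accurate stages leave only about 73% accuracy end-to-end, which is why joint inference is worth the complication when it is feasible.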
Traceability and Repeatability
• As the prediction sequence becomes long, debugging errors becomes complicated
• Traceability is being able to view the sequence of predictions leading to a final prediction
• Repeatability is being able to reproduce the results of an end-to-end run
• Simple in theory, but think about this early
• Hard when data, code and models are changing all the time
• Infrastructure should focus on traceability and repeatability over performance
• Expose internal REST APIs over all models
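A minimal illustration of a trace record (field names are hypothetical): each stage in the cascade appends its model name, pinned version, inputs and output, so the final prediction can be walked backwards and the run reproduced against the same model versions.

```python
import json
import uuid

# Illustrative trace record for a multi-stage prediction pipeline.

class PredictionTrace:
    def __init__(self):
        self.trace_id = str(uuid.uuid4())
        self.steps = []

    def record(self, model_name, model_version, inputs, output):
        # Pinning the version is what makes the run repeatable later.
        self.steps.append({
            "model": model_name,
            "version": model_version,
            "inputs": inputs,
            "output": output,
        })
        return output

    def to_json(self):
        # Persist alongside the final prediction for later debugging.
        return json.dumps({"trace_id": self.trace_id, "steps": self.steps})
```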
On-line/Near-line/Off-line Models
• Online models
– Simple
– Fast
– Don't change often
– In the line of fire
– Good enough
• Near-line models
– Intermediate
• Offline models
– Complicated
– Slow
– Change frequently
[Diagram: product classification at three tiers, each writing to a prediction cache/DB]
• Online: Title, UPC, Brand, MPN → linear SVM and DB lookup
• Near-line: Title, Description, Breadcrumb, UPC, Brand, MPN → ensemble model and DB lookup
• Offline: Title, Description, Breadcrumb, UPC, Brand, MPN, Image → ensemble model, joint inference
Kaggle-ize your Problems
• Package your training data and blind sets with a readme and house them internally
• Allows people to take a crack at a problem if they are interested
• Can be used for internal hackathons, during the interview process, and can be put on Kaggle if it's important enough
• Allows you to take help from experts when they are available
Build Capability to Run What-if Scenarios
• Results on the blind set are a good gate for small changes
• For major changes, a full production run will be required
• Impact on dependent subsystems needs to be validated
• The ability to run what-if scenarios is paramount for velocity
• Running a parallel setup all the time is prohibitive
• Ad-hoc setup, end-to-end run, tests and tear-down of a production setup is essential
Be Smart about Sampling
• Make sure your samples are truly random
• Extract multiple samples and compare them on the statistic you care about to ensure there is no bias
• Make sure your sample sizes give you the (statistical) power you need
• Do the math or a simulation to save on the crowdsourcing budget
• Save the seed to be able to replicate experiments
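The "do the math" step for estimating a proportion (say, an error rate via spot checks) is the standard sample-size formula, where `z` is the normal quantile for the desired confidence level (1.96 for 95%):

```python
import math

# Sample size needed to estimate a proportion p to within a given
# margin of error at confidence level z (1.96 ~ 95%). Using p = 0.5
# gives the worst case when the true proportion is unknown.

def sample_size(margin_of_error, p=0.5, z=1.96):
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)
```

At 95% confidence and a ±5% margin, about 385 judgments suffice regardless of population size, which is what bounds the crowdsourcing spend per spot check.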
Ensembles and Co-Training
• Always be ensembling
• Both homogeneous (bagging, boosting) and heterogeneous (stacking)
• Segment your population and train specific models for each segment
• Co-training based on text and images
• Develop independent models that can be combined based on available data
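A heterogeneous ensemble can be as simple as a weighted combination of independently developed models; in real stacking the weights (or a full combiner model) would be learned on held-out predictions rather than chosen by hand. A minimal sketch:

```python
# Weighted combination of heterogeneous base models. Each base model
# is a callable returning a score (e.g. a calibrated probability), so
# models built on different signals (text, images) combine naturally.

def stack_predict(base_models, weights, x):
    total = sum(weights)
    return sum(w * m(x) for m, w in zip(base_models, weights)) / total
```

Because each base model is independent, the combiner can also skip models whose inputs are missing for a given item, which matches the point above about combining based on available data.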
Summary
• Build an override system early
• Establish blind sets and benchmarks early
• Invest time and gain expertise in crowdsourcing
• Calibrate your models
• Standardize the model improvement process
• Continuous delivery of models
• Democratize your data
• Is your ground truth still the truth?
• Beware of cascading errors
• Traceability and repeatability are paramount
• Online/near-line/offline models
• Kaggle-ize your problems
• Build capability to run what-if scenarios
• Be smart about sampling
• Ensembles and co-training