![Page 1: Machine Learning Lifecycle - boss-workshop.github.io](https://reader033.vdocuments.us/reader033/viewer/2022061102/629ce019b98d3a34054952d8/html5/thumbnails/1.jpg)
Simplifying the Machine Learning Lifecycle
Agenda
• Broad adoption of ML … and its issues
• The need for standardization
• ML development challenges
• How MLflow tackles these
Log in to Databricks Community Edition
• Sign up for Databricks Community Edition for free.
• We will use it for this tutorial.
• Once you sign up, you can continue to use it to learn and experiment in a dedicated data science and engineering environment.
https://databricks.com/try
Go to databricks.com/try
Sign up for Community Edition
Log in to Databricks Community Edition (DBCE)
Create a Cluster on DBCE
Attach a Notebook to your Cluster
Broad Adoption of ML
Internet of Things • Digital Personalization • Healthcare and Genomics • Fraud Prevention
…and many more customers in different industries and segments.
Huge disruptive innovations are affecting most enterprises on the planet.
Hardest Part of ML Isn’t ML, It’s Data
[Figure: a small "ML Code" box surrounded by much larger boxes for Configuration, Data Collection, Data Verification, Feature Extraction, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, and Monitoring]
“Hidden Technical Debt in Machine Learning Systems,” Google, NIPS 2015
Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small green box in the middle. The required surrounding infrastructure is vast and complex.
Data & ML Tech and People Are in Silos
[Figure: DATA ENGINEERS and DATA SCIENTISTS separated into silos]
ML Lifecycle is Manual, Inconsistent and Disconnected
Prep Data
● Low-level integrations for data and ML
● Difficult to track data used for a model
Build Model
● Ad hoc approach to track experiments
● Very hard to reproduce experiments
Deploy Model
● Multiple tightly coupled deployment options
● Different monitoring approach for each framework
The need for standardization
Day in the life of a data scientist (tracking edition)
Elasticnet model (alpha=0.01, l1_ratio=1.0): RMSE: ?? MAE: 51.051828604086325 R2: 0.3951809598912357
Elasticnet model (alpha=?, l1_ratio=0.75): RMSE: 65.28994906390733 MAE: 53.759148284349266 R2: ??
Elasticnet model (alpha=0.01, l1_ratio=?): RMSE: 71.40362571026475 MAE: ?? R2: 0.2291130640003659
Did anything change in the feature engineering?
How did the hyperparameters change?
What data was this model trained on?
How did the offline metrics change?
What else am I missing?
The difference between releasing Software and deploying ML Models

| | Software | ML Models |
|---|---|---|
| Process | Write code → write unit tests → send for review → get approvals → commit → release testing → release | Analyze data → put data into the right format → write model code → train and evaluate model → experiment with params, model structure → deploy … by email? → monitor performance and trigger retraining |
| Goal | Meet a functional specification | Optimize a metric, e.g. CTR |
| Quality | Depends on code | Depends on data, code, model, params, ... |
| Tools | Typically one software stack | Combination of many libraries, tools, ... |
| Outcome | Works deterministically | Keeps changing with data, etc. |
In summary, deploying ML Models is hard!
ML Lifecycle and Challenges
[Figure: lifecycle stages Raw Data, ETL, Featurize, Train, Tuning, Model Mgmt, Deploy, Score/Serve (batch + realtime), Monitor, Alert/Debug, and Retrain/Update Features from Production Logs, annotated with challenges: Delta, AutoML and hyper-parameter search, experiment tracking, remote cloud execution, project management (scaling teams), model exchange, data drift, model drift, orchestration (Airflow, Jobs), A/B testing, CI/CD (Jenkins) push to prod, feature repository, lifecycle management, and a zoo of ecosystem frameworks]
Collaboration • Scale • Governance
MLflow: An open source platform for the machine learning lifecycle
Introducing MLflow
Unveiled in June 2018, MLflow is an open source framework designed to manage the complete machine learning lifecycle.
Now including the Model Registry
[Figure: # of contributors (y-axis, 0 to 140) vs. months since project launch (x-axis, 0 to 45)]
Components
• Tracking: record and query experiments (code, data, config, results)
• Projects: packaging format for reproducible runs on any platform
• Models: general format that standardizes deployment paths
• Model Registry (new): centralized and collaborative model lifecycle management
mlflow.org github.com/mlflow twitter.com/MLflow
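Tracking
[Figure: Notebooks, Local Apps, and Cloud Jobs log through the API to a Tracking Server that records Parameters, Metrics, Artifacts, Models, and Metadata; results are browsable in the UI and readable as a Spark Data Source]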
Key Concepts in Tracking
• Parameters: key-value inputs to your code
• Metrics: numeric values (can update over time)
• Artifacts: arbitrary files, including models
• Source: what code ran?
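
    from sklearn.linear_model import ElasticNet

    # Scikit-learn linear regression via ElasticNet
    # (train_x, train_y, test_x, test_y, alpha, l1_ratio and eval_metrics are assumed to be defined elsewhere)
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)

    # Predict
    predicted_qualities = lr.predict(test_x)

    # Evaluate metrics
    (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)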

    import mlflow

    with mlflow.start_run() as run:
        # Scikit-learn linear regression via ElasticNet
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Predict
        predicted_qualities = lr.predict(test_x)

        # Evaluate metrics
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Log
        mlflow.log_param("alpha", alpha)
        ...
GitHub Demo: https://github.com/dennyglee/mlflow-diabetes-example
Comparing Runs: Contour Plot
Projects
[Figure: a Project Spec packages Code, Config, and Metadata; the same project supports Local Execution and Remote Execution]
Example MLflow Project

    my_project/
    ├── MLproject
    ├── conda.yaml
    ├── main.py
    └── model.py
    ...

MLproject:

    conda_env: conda.yaml

    entry_points:
      main:
        parameters:
          training_data: path
          lambda: {type: float, default: 0.1}
        command: python main.py {training_data} {lambda}

Run it from the command line or from Python:

    $ mlflow run git://<my_project>

    mlflow.run("git://<my_project>", ...)
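With the MLproject above, `mlflow run` substitutes parameter values into `command` and records them as run parameters, resolving `path` parameters to local paths. A hypothetical `main.py` therefore only has to read two positional arguments; this sketch shows that contract and leaves training out.

```python
def parse_args(argv):
    """Parse the two positional arguments produced by the MLproject command
    `python main.py {training_data} {lambda}`."""
    if len(argv) != 2:
        raise SystemExit("usage: python main.py <training_data> <lambda>")
    # `training_data` is declared with type `path`, so it arrives as a local path
    training_data = argv[0]
    # `lambda` is a reserved word in Python, so the value is kept as `reg_lambda`
    reg_lambda = float(argv[1])
    return training_data, reg_lambda


def main(argv):
    """Hypothetical entry point: parse arguments, then train (omitted)."""
    training_data, reg_lambda = parse_args(argv)
    # Loading `training_data` and fitting a model with `reg_lambda` would go here.
    print(f"training on {training_data} with lambda={reg_lambda}")
    return training_data, reg_lambda
```

In the real script, `main(sys.argv[1:])` would be called under an `if __name__ == "__main__":` guard; a run with a non-default value might then look like `mlflow run git://<my_project> -P training_data=data.csv -P lambda=0.5`.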
Model Format
[Figure: an MLflow Model stores one or more flavors (Flavor 1, Flavor 2) consumed by In-Line Code, Containers, Batch & Stream Scoring, and Cloud Inference Services]
Simple model flavors usable by many tools
Example MLflow Model

    my_model/
    ├── MLmodel
    └── estimator/
        ├── saved_model.pb
        └── variables/
        ...

MLmodel:

    run_id: 769915006efd4c4bbd662461
    time_created: 2018-06-28T12:34
    flavors:
      tensorflow:
        saved_model_dir: estimator
        signature_def_key: predict
      python_function:
        loader_module: mlflow.tensorflow

• tensorflow flavor: usable by tools that understand the TensorFlow model format
• python_function flavor: usable by any tool that can run Python (Docker, Spark, etc.)
Model Registry
[Figure: Model Registry workflow connecting Data Scientists, Reviewers + CI/CD Tools, and Deployment Engineers; model versions move through Experimental, Staging, A/B Tests, and Production, and reach Downstream Users via Automated Jobs and REST Serving]
Model Registry: Benefits
One Collaborative Hub
● Central model repository
● Overview of versions in Staging/Production/etc.
● Search/filter/pagination
Management of the Entire ML Lifecycle (MLOps)
● Overview of active model versions and their deployment stage
● Request/approval workflow for transitioning deployment stages
Visibility
● Full activity log of stage transition requests, approvals, etc.
Governance and Auditability
● Full provenance from a model marked Production in the Registry to:
○ the run that produced the model
○ the notebook that produced the run
○ the exact revision history of the notebook that produced the run
Notebook Demo: https://github.com/dennyglee/tech-talks/blob/master/samples/MLflow%20Diabetes%20Example%20(with%20MLflow%20Registry).ipynb
Towards More Principled Data Science and ML
MLflow: An Open Source ML Platform
mlflow.org github.com/mlflow twitter.com/MLflow
Hands-on Workshop
bit.ly/mlflow-boss-2020