Building Machine Learning Pipelines
TRANSCRIPT
![Page 2: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/2.jpg)
What do ML Pipelines Look Like?
![Page 3: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/3.jpg)
[Diagram: TRAINING DATA → AWESOME ML TECHNIQUE → MODEL; TESTING DATA → MODEL → PREDICTIONS]
![Page 4: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/4.jpg)
Let’s build one now!
![Page 5: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/5.jpg)
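| UserID | Pet | Children | Salary |
| --- | --- | --- | --- |
| 1 | cat | 4 | 90 |
| 2 | dog | 6 | 24 |
| 3 | dog | 3 | 44 |
| 4 | fish | 3 | 27 |
| 5 | cat | 2 | 32 |
| 6 | dog | 3 | 59 |
| 7 | cat | 5 | 36 |
| 8 | fish | 4 | 27 |
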
Predict the salary from the kind of pets and the number of children a person has
![Page 6: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/6.jpg)
You may need to:
1. Binarize/normalize data
2. Remove noise
3. Reduce dimensionality of data
4. Make features from raw data

…before you get to train your model!
![Page 7: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/7.jpg)
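| Cat | Dog | Fish | Children (normalized) | Salary |
| --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 0.21 | 90 |
| 0 | 1 | 0 | 1.88 | 24 |
| 0 | 1 | 0 | -0.63 | 44 |
| 0 | 0 | 1 | -0.63 | 27 |
| 1 | 0 | 0 | -1.46 | 32 |
| 0 | 1 | 0 | -0.63 | 59 |
| 1 | 0 | 0 | 1.04 | 36 |
| 0 | 0 | 1 | 0.21 | 27 |
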
[Diagram: the Training Set (features X, target Y) feeds a Neural Net, which produces the Model]
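The binarize and normalize steps can be sketched in a few lines. This is a minimal standard-library illustration, not the code behind the slides; it assumes the normalization is a z-score using the population standard deviation, which reproduces the rounded values shown. Under that assumption the fifth row (a cat owner with 2 children, below the mean of 3.75) comes out as -1.46.

```python
from statistics import mean, pstdev

# Rows copied from the slide: (pet, children, salary).
rows = [
    ("cat", 4, 90), ("dog", 6, 24), ("dog", 3, 44), ("fish", 3, 27),
    ("cat", 2, 32), ("dog", 3, 59), ("cat", 5, 36), ("fish", 4, 27),
]

pets = sorted({pet for pet, _, _ in rows})      # ['cat', 'dog', 'fish']
children = [c for _, c, _ in rows]
mu = mean(children)                             # 3.75
sigma = pstdev(children)                        # population standard deviation (assumed)

def encode(pet, n_children):
    """One-hot encode the pet and z-score the number of children."""
    one_hot = [1 if pet == p else 0 for p in pets]
    return one_hot + [round((n_children - mu) / sigma, 2)]

X = [encode(pet, c) for pet, c, _ in rows]      # features
y = [salary for _, _, salary in rows]           # target

for features, target in zip(X, y):
    print(features, target)
```

In scikit-learn, `OneHotEncoder` and `StandardScaler` play the same two roles.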
![Page 8: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/8.jpg)
But is this enough?
![Page 9: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/9.jpg)
No ML Pipeline is complete without Cross-validation and Hyper-parameter optimization
![Page 10: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/10.jpg)
So how does our ML Pipeline look now?
![Page 11: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/11.jpg)
[Diagram: RAW DATA → PRE-PROCESSED DATA → EXTRACT FEATURES → TRAINING DATA → AWESOME ML TECHNIQUE with PARAMETERS 1 … K … N → BEST MODEL; TESTING DATA → BEST MODEL → PREDICTIONS]
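One way to read the step from "PARAMETERS 1 … N" to "BEST MODEL" is: score every parameter setting with cross-validation and keep the one with the lowest average validation error. The sketch below is a hypothetical standard-library illustration; the 1-D ridge model, the synthetic data, and the names `k_fold_indices`, `fit_ridge_1d` and `cv_score` are all assumptions made for the example, not part of the slides.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield k (train, validation) index pairs using contiguous, unshuffled folds."""
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        stop = n_samples if fold == k - 1 else start + fold_size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, val

def fit_ridge_1d(xs, ys, alpha):
    """1-D ridge regression without intercept: w = sum(x*y) / (sum(x*x) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def cv_score(xs, ys, alpha, k=4):
    """Mean validation MSE over k folds for one hyper-parameter value."""
    fold_errors = []
    for train, val in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
        fold_errors.append(mean([(ys[i] - w * xs[i]) ** 2 for i in val]))
    return mean(fold_errors)

# Synthetic data: y is roughly 2*x with a small alternating offset.
xs = list(range(1, 13))
ys = [2 * x + (0.5 if x % 2 else -0.5) for x in xs]

# "Parameters 1 ... N": the candidate regularization strengths.
scores = {alpha: cv_score(xs, ys, alpha) for alpha in [0.0, 1.0, 100.0]}
best_alpha = min(scores, key=scores.get)

# The "best model" is refit on all training data with the winning setting.
best_w = fit_ridge_1d(xs, ys, best_alpha)
print(scores, best_alpha, best_w)
```

Here "best" means lowest mean squared error on held-out folds; a different metric could pick a different winner.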
![Page 12: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/12.jpg)
What does ‘best’ model mean?
![Page 13: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/13.jpg)
ML Pipeline in Code
![Page 14: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/14.jpg)
- Series of transformations
- Transformations might involve making models
- Models can be used to transform or predict
- Grid-search on parameters
![Page 15: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/15.jpg)
>>> from sklearn.decomposition import PCA
>>> from sklearn.model_selection import GridSearchCV  # was sklearn.grid_search before release 0.18
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> # Two named steps: dimensionality reduction, then a support vector classifier.
>>> clf = Pipeline([('reduce_dim', PCA()), ('svm', SVC())])
>>> # Step parameters are addressed as <step name>__<parameter name>.
>>> clf.set_params(svm__C=10)
Pipeline(steps=[('reduce_dim', PCA()), ('svm', SVC(C=10))])
>>> # Grid of candidate values for parameters of both steps.
>>> params = dict(reduce_dim__n_components=[2, 5, 10],
...               svm__C=[0.1, 10, 100])
>>> grid_search = GridSearchCV(clf, param_grid=params)
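The snippet above only constructs the grid search. A hedged sketch of actually running it follows; it assumes a recent scikit-learn release (where `GridSearchCV` lives in `sklearn.model_selection`) and uses synthetic data from `make_classification` as a stand-in, since the slides do not name a dataset.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; not from the slides).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

clf = Pipeline([("reduce_dim", PCA()), ("svm", SVC())])
params = {
    "reduce_dim__n_components": [2, 5, 10],
    "svm__C": [0.1, 10, 100],
}

# 5-fold cross-validation over all 9 parameter combinations.
grid_search = GridSearchCV(clf, param_grid=params, cv=5)
grid_search.fit(X, y)

print(grid_search.best_params_)            # winning combination from the grid
print(grid_search.best_score_)             # its mean cross-validated accuracy
best_model = grid_search.best_estimator_   # pipeline refit on all of X, y
predictions = best_model.predict(X[:5])
```

Because the whole pipeline is the estimator, the PCA step is refit inside every fold, so no information leaks from the validation fold into the transformation.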
![Page 16: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/16.jpg)
Extra features:
- Configurable data sources
- Customized scoring metrics (average, median of results, etc.)
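As a small illustration of why the choice of aggregate matters, the sketch below collapses per-fold scores with a pluggable function. The scores and the helper name `summarize` are hypothetical, chosen only to show the effect of one outlier fold.

```python
from statistics import mean, median

# Hypothetical per-fold accuracies; the last fold is an outlier.
fold_scores = [0.81, 0.79, 0.80, 0.82, 0.35]

def summarize(scores, aggregate=mean):
    """Collapse per-fold scores into one number with a pluggable aggregate."""
    return aggregate(scores)

print(summarize(fold_scores))                    # mean, pulled down by the outlier fold
print(summarize(fold_scores, aggregate=median))  # median, less sensitive to it
```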
![Page 17: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/17.jpg)
Customize cross-validation based on nature of data
How do you cross-validate on time-series data?
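One common answer is to never train on the future: each test block must come after all of its training data. The sketch below is a minimal expanding-window splitter written for illustration; the function name and its parameters are assumptions. scikit-learn's `TimeSeriesSplit` is built on a similar idea.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train, test) index lists where every test block follows its training data."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for split in range(n_splits):
        test_start = first_test_start + split * test_size
        train = list(range(0, test_start))                      # everything before the test block
        test = list(range(test_start, test_start + test_size))  # the next block in time
        yield train, test

splits = list(expanding_window_splits(n_samples=8, n_splits=3, test_size=2))
for train, test in splits:
    print("train:", train, "test:", test)
# train: [0, 1] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
# train: [0, 1, 2, 3, 4, 5] test: [6, 7]
```

Unlike shuffled k-fold, the training window only grows forward in time, so the validation error reflects how the model would behave on data it has not yet seen.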
![Page 18: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/18.jpg)
Why use ML pipelines?
![Page 19: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/19.jpg)
DRY (Don't Repeat Yourself)
![Page 20: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/20.jpg)
Libraries with ML Pipelines
- scikit-learn, pandas and Scikit-Mapper
- Spark's MLlib
- Write your own!
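For the "write your own" route, a pipeline can be as small as a list of named steps that are fitted and applied in order. The sketch below is a hypothetical minimal version; the classes `MeanCenter`, `SlopeModel` and `SimplePipeline` are invented for illustration and only loosely mirror the fit/transform/predict convention used by scikit-learn.

```python
class MeanCenter:
    """Transformer: learns the mean during fit and subtracts it in transform."""
    def fit(self, xs, ys=None):
        self.mean_ = sum(xs) / len(xs)
        return self

    def transform(self, xs):
        return [x - self.mean_ for x in xs]

class SlopeModel:
    """Final estimator: least-squares line, assuming the inputs are already centered."""
    def fit(self, xs, ys):
        self.intercept_ = sum(ys) / len(ys)
        sxx = sum(x * x for x in xs)
        self.slope_ = sum(x * (y - self.intercept_) for x, y in zip(xs, ys)) / sxx
        return self

    def predict(self, xs):
        return [self.intercept_ + self.slope_ * x for x in xs]

class SimplePipeline:
    """Chain named steps: all but the last offer fit/transform, the last fit/predict."""
    def __init__(self, steps):
        self.steps = steps

    def fit(self, xs, ys):
        for _, step in self.steps[:-1]:
            xs = step.fit(xs, ys).transform(xs)   # fit each transformer, pass its output on
        self.steps[-1][1].fit(xs, ys)
        return self

    def predict(self, xs):
        for _, step in self.steps[:-1]:
            xs = step.transform(xs)               # reuse what was learned during fit
        return self.steps[-1][1].predict(xs)

pipe = SimplePipeline([("center", MeanCenter()), ("model", SlopeModel())])
pipe.fit([1, 2, 3, 4], [3, 5, 7, 9])   # data follows y = 2x + 1
print(pipe.predict([5]))               # [11.0]
```

Keeping the learned state (here the mean) inside the step is what lets the same transformation be replayed on test data without repeating code.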
![Page 23: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/23.jpg)
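| UserID | Pet | Children | Salary |
| --- | --- | --- | --- |
| 1 | cat | 4 | 90 |
| 2 | dog | 6 | 24 |
| 3 | dog | 3 | 44 |
| 4 | fish | 3 | 27 |
| 5 | cat | 2 | 32 |
| 6 | dog | 3 | 59 |
| 7 | cat | 5 | 36 |
| 8 | fish | 4 | 27 |
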
Need to binarize the Pet column. Might also want to normalize the Children column.
![Page 24: Building Machine Learning Pipelines](https://reader033.vdocuments.us/reader033/viewer/2022052307/55bb1181bb61ebb7268b4639/html5/thumbnails/24.jpg)
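| Is Pet a Cat? | Is Pet a Dog? | Is Pet a Fish? | Normalized number of children |
| --- | --- | --- | --- |
| 1 | 0 | 0 | 0.21 |
| 0 | 1 | 0 | 1.88 |
| 0 | 1 | 0 | -0.63 |
| 0 | 0 | 1 | -0.63 |
| 1 | 0 | 0 | -1.46 |
| 0 | 1 | 0 | -0.63 |
| 1 | 0 | 0 | 1.04 |
| 0 | 0 | 1 | 0.21 |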