Introduction to Machine Learning @ Mooncascade ML Camp
![Page 2: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/2.jpg)
ONE MACHINE LEARNING USE CASE
![Page 3: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/3.jpg)
![Page 4: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/4.jpg)
![Page 5: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/5.jpg)
![Page 6: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/6.jpg)
![Page 7: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/7.jpg)
![Page 8: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/8.jpg)
![Page 9: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/9.jpg)
![Page 10: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/10.jpg)
Can we ask a computer to create those patterns automatically?
![Page 11: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/11.jpg)
Yes
![Page 12: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/12.jpg)
How?
![Page 13: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/13.jpg)
Raw data
![Page 14: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/14.jpg)
A data sample: an instance (raw data) and its class (label), “7”
![Page 15: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/15.jpg)
How to represent it in a machine-readable form?
![Page 16: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/16.jpg)
Feature extraction
![Page 17: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/17.jpg)
28 px × 28 px
![Page 18: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/18.jpg)
28 × 28 = 784 pixels in total
Feature vector: (0, 0, 0, …, 28, 65, 128, 255, 101, 38, …, 0, 0, 0)
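Below is a minimal NumPy sketch of this feature-extraction step. It assumes the digit is stored as a 28 × 28 array of grayscale intensities from 0 to 255. The stroke drawn here is a made-up placeholder, not real data. Flattening row by row is one common convention. Under that assumption, and counting from zero, a pixel number such as #417 maps back to row 417 // 28 = 14, column 417 % 28 = 25.

```python
import numpy as np

# Hypothetical 28x28 grayscale digit: 0 = background, 255 = full ink.
# A single horizontal stroke stands in for a real scanned "7".
image = np.zeros((28, 28), dtype=np.uint8)
image[10, 8:20] = 255

# Feature extraction by flattening: concatenate the 28 rows into one vector.
feature_vector = image.reshape(-1)
print(feature_vector.shape)  # (784,)

# With row-major order and zero-based numbering, pixel #k sits at divmod(k, 28).
row, col = divmod(417, 28)
print(row, col)  # 14 25
```

The flattening discards which pixels were neighbours. Simple models such as decision trees do not need that information, but it is worth keeping in mind.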
![Page 19: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/19.jpg)
(0, 0, 0, …, 28, 65, 128, 255, 101, 38, …, 0, 0, 0)
(0, 0, 0, …, 13, 48, 102, 0, 46, 255, …, 0, 0, 0)
(0, 0, 0, …, 17, 34, 12, 43, 122, 70, …, 0, 7, 0)
(0, 0, 0, …, 98, 21, 255, 255, 231, 140, …, 0, 0, 0)
Labels: “7”, “2”, “8”, “2”
![Page 20: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/20.jpg)
Together, these labeled feature vectors form a dataset.
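Stacking the vectors gives the (samples × features) layout that most libraries expect. The sketch below uses NumPy with random placeholder images instead of real scans. The names `X` and `y` are only the customary convention; the slides do not define them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: four random 28x28 "images" with intensities 0..255.
images = rng.integers(0, 256, size=(4, 28, 28))
labels = ["7", "2", "8", "2"]  # one class label per image

# One flattened image per row gives an (n_samples, n_features) matrix X,
# and the labels go into a parallel vector y.
X = images.reshape(len(images), -1)
y = np.array(labels)
print(X.shape, y.shape)  # (4, 784) (4,)
```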
![Page 21: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/21.jpg)
The data is in the right format — what’s next?
![Page 22: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/22.jpg)
• C4.5 • Random forests • Bayesian networks • Hidden Markov models • Artificial neural network • Data clustering • Expectation-maximization algorithm • Self-organizing map • Radial basis function network • Vector Quantization • Generative topographic map • Information bottleneck method • IBSEAD • Apriori algorithm • Eclat algorithm • FP-growth algorithm • Single-linkage clustering • Conceptual clustering • K-means algorithm • Fuzzy clustering • Temporal difference learning • Q-learning • Learning Automata
• AODE • Artificial neural network • Backpropagation • Naive Bayes classifier • Bayesian network • Bayesian knowledge base • Case-based reasoning • Decision trees • Inductive logic programming • Gaussian process regression • Gene expression programming • Group method of data handling (GMDH) • Learning Automata • Learning Vector Quantization • Logistic Model Tree • Decision tree • Decision graphs • Lazy learning • Monte Carlo Method • SARSA
• Instance-based learning • Nearest Neighbor Algorithm • Analogical modeling • Probably approximately correct learning (PACL) • Symbolic machine learning algorithms • Subsymbolic machine learning algorithms • Support vector machines • Random Forest • Ensembles of classifiers • Bootstrap aggregating (bagging) • Boosting (meta-algorithm) • Ordinal classification • Regression analysis • Information fuzzy networks (IFN) • Linear classifiers • Fisher's linear discriminant • Logistic regression • Naive Bayes classifier • Perceptron • Support vector machines • Quadratic classifiers • k-nearest neighbor • Boosting
Pick an algorithm
![Page 23: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/23.jpg)
![Page 24: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/24.jpg)
DECISION TREE
vs.
![Page 25: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/25.jpg)
(0, …, 28, 65, …, 207, 101, 0, 0)
(0, …, 19, 34, …, 254, 54, 0, 0)
(0, …, 87, 59, …, 240, 52, 4, 0)
(0, …, 87, 52, …, 240, 19, 3, 0)
(0, …, 28, 64, …, 102, 101, 0, 0)
(0, …, 19, 23, …, 105, 54, 0, 0)
(0, …, 87, 74, …, 121, 51, 7, 0)
(0, …, 87, 112, …, 239, 52, 4, 0)
![Page 26: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/26.jpg)
PIXEL #417
![Page 27: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/27.jpg)
>200 <200
![Page 28: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/28.jpg)
![Page 29: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/29.jpg)
![Page 30: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/30.jpg)
PIXEL #123
![Page 31: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/31.jpg)
<100 >100
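How does the tree decide to split on PIXEL #417 first? One common answer is to try every feature and threshold, then keep the split whose two children are purest by Gini impurity. Gini is the default `criterion` of scikit-learn's `DecisionTreeClassifier`. The tree then repeats the same search inside each child.

This is an assumption, not necessarily the exact algorithm behind these slides. The toy data and the `best_split` helper are hypothetical. The helper uses `<=` where the slides write `<` / `>`, and real implementations typically place thresholds between neighbouring values.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a label array (0 means pure)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every feature and every observed value as a threshold; return
    (feature, threshold, weighted child impurity) of the purest split."""
    best = (None, None, np.inf)
    n = len(y)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            left = X[:, feature] <= threshold
            right = ~left
            if left.all() or right.all():
                continue  # a split that sends every sample one way is useless
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if score < best[2]:
                best = (feature, threshold, score)
    return best

# Hypothetical 3-pixel "images": only feature 1 separates "7" from "2" cleanly.
X = np.array([[0, 250, 5], [0, 240, 90], [1, 30, 60], [0, 20, 200]])
y = np.array(["7", "7", "2", "2"])
feature, threshold, impurity = best_split(X, y)
print(feature, threshold, impurity)  # 1 30 0.0
```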
![Page 32: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/32.jpg)
![Page 33: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/33.jpg)
DECISION TREE
![Page 34: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/34.jpg)
DECISION TREE
![Page 35: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/35.jpg)
ACCURACY
![Page 36: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/36.jpg)
ACCURACY
Confusion matrix
Rows: true class; columns: predicted class
![Page 37: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/37.jpg)
$$\text{acc} = \frac{\text{correctly classified}}{\text{total number of samples}}$$
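This small sketch builds a confusion matrix with the same layout (rows = true class, columns = predicted class) and reads accuracy off its diagonal. The labels are invented for illustration. `confusion_matrix` here is a hand-rolled helper, not scikit-learn's function of the same name.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Count (true, predicted) pairs: rows = true class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

def accuracy(cm):
    """Correctly classified samples lie on the diagonal."""
    return np.trace(cm) / cm.sum()

classes = ["2", "7", "8"]
y_true = ["7", "2", "8", "2", "7"]
y_pred = ["7", "2", "2", "2", "8"]
cm = confusion_matrix(y_true, y_pred, classes)
print(cm)
print(accuracy(cm))  # 0.6
```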
![Page 38: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/38.jpg)
Beware of an imbalanced dataset!
![Page 39: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/39.jpg)
Consider the following model: “Always predict 2”
![Page 40: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/40.jpg)
Accuracy: 0.9, even though the model has learned nothing (90% of the samples are “2”)
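Here is the arithmetic behind that 0.9, sketched in plain Python. It assumes a hypothetical test set with 90 “2”s and 10 “7”s. Per-class recall (the fraction of each true class that the model found) exposes what accuracy hides.

```python
# Hypothetical imbalanced test set: 90 samples of "2" and 10 of "7".
y_true = ["2"] * 90 + ["7"] * 10

# The "model" from the slide ignores its input and always answers "2".
y_pred = ["2"] * len(y_true)

acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for class "7": how many of the true 7s were predicted as 7.
recall_7 = sum(t == p == "7" for t, p in zip(y_true, y_pred)) / y_true.count("7")
print(acc, recall_7)  # 0.9 0.0
```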
![Page 41: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/41.jpg)
DECISION TREE
![Page 42: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/42.jpg)
DECISION TREE
“You said 100% accurate?! Every 10th digit your system detects is wrong!”
Angry client
![Page 43: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/43.jpg)
We’ve trained our system on the data the client gave us. But our system has never seen the new data the client applied it to.
And in real life, it never will…
![Page 44: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/44.jpg)
OVERFITTING
Simulate the real-life situation — split the dataset
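This sketch runs that simulation with scikit-learn. As a stand-in for the client's data, it uses the `load_digits` dataset bundled with scikit-learn. Those are 8 × 8 images, so there are 64 features instead of 784. The 80/20 split and `random_state` values are arbitrary choices. An unrestricted tree typically scores (close to) 100% on the data it was trained on, and noticeably less on the held-out part.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 1797 8x8 digit images, already flattened to 64 features.
X, y = load_digits(return_X_y=True)

# Hold back 20% that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

tree = DecisionTreeClassifier(random_state=0)  # no depth or leaf-size limit
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)  # accuracy on data the tree has seen
test_acc = tree.score(X_test, y_test)     # accuracy on data it has never seen
print(f"train: {train_acc:.3f}  test: {test_acc:.3f}")
```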
![Page 45: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/45.jpg)
![Page 46: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/46.jpg)
![Page 47: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/47.jpg)
![Page 48: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/48.jpg)
OVERFITTING
Underfitting! “Too stupid” | OK | Overfitting! “Too smart”
![Page 49: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/49.jpg)
Our current decision tree has too much capacity: it has simply memorized all of the data.
Let’s make it less complex.
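One way to reduce the tree's capacity is to require every leaf to contain at least `min_samples_leaf` training samples. We assume this is the "msl" parameter mentioned on the cross-validation slides. The sketch below uses scikit-learn with the same stand-in `load_digits` data, also an assumption, and arbitrary candidate values.

Note that choosing the value by looking at the test column is itself a form of tuning on the test set.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

leaves = {}
for msl in (1, 5, 15, 50):
    # Every leaf must hold at least `msl` training samples, so larger values
    # force a smaller tree that cannot memorize individual samples.
    tree = DecisionTreeClassifier(min_samples_leaf=msl, random_state=0)
    tree.fit(X_train, y_train)
    leaves[msl] = tree.get_n_leaves()
    print(f"msl={msl:>2}  leaves={leaves[msl]:>4}  "
          f"train={tree.score(X_train, y_train):.3f}  "
          f"test={tree.score(X_test, y_test):.3f}")
```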
![Page 50: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/50.jpg)
![Page 51: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/51.jpg)
![Page 52: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/52.jpg)
![Page 53: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/53.jpg)
You probably did not notice, but we are overfitting again :(
![Page 54: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/54.jpg)
THE WHOLE DATASET: TRAINING SET 60% | VALIDATION SET 20% | TEST SET 20%
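Here is a minimal NumPy sketch of a 60 / 20 / 20 split over sample indices. The function name and seed are hypothetical. Shuffling first matters when the data arrives sorted, for example by label.

```python
import numpy as np

def train_val_test_split(n_samples, seed=0):
    """Shuffle sample indices, then cut them 60% / 20% / 20%."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = n_samples * 6 // 10  # integer arithmetic avoids float rounding
    n_val = n_samples * 2 // 10
    # Whatever is left after train and validation (about 20%) becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = train_val_test_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```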
![Page 55: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/55.jpg)
Training set: fit various models and parameter combinations on this subset.
![Page 56: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/56.jpg)
Validation set:
• Evaluate the models created with different parameters
![Page 57: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/57.jpg)
• Estimate overfitting
![Page 58: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/58.jpg)
![Page 59: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/59.jpg)
![Page 60: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/60.jpg)
![Page 61: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/61.jpg)
![Page 62: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/62.jpg)
Test set: use only once, to get the final performance estimate.
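This scikit-learn sketch puts the three subsets to work on the stand-in `load_digits` data. The dataset and the candidate `min_samples_leaf` values are assumptions. Candidates are compared on the validation set only, and the test set is scored exactly once at the end. Two calls to `train_test_split` produce the 60 / 20 / 20 split.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# First cut off 20% as TEST, then 25% of the remaining 80% (= 20% overall) as VALIDATION.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0, stratify=y_rest
)

val_scores = {}
for msl in (1, 5, 15, 50):
    model = DecisionTreeClassifier(min_samples_leaf=msl, random_state=0)
    model.fit(X_train, y_train)
    val_scores[msl] = model.score(X_val, y_val)  # compare candidates on VALIDATION only

best_msl = max(val_scores, key=val_scores.get)
final_model = DecisionTreeClassifier(min_samples_leaf=best_msl, random_state=0)
final_model.fit(X_train, y_train)
test_acc = final_model.score(X_test, y_test)  # touch the TEST set exactly once
print(best_msl, round(test_acc, 3))
```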
![Page 63: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/63.jpg)
![Page 64: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/64.jpg)
![Page 65: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/65.jpg)
CROSS-VALIDATION
THE WHOLE DATASET: TRAINING SET 60% | VALIDATION SET 20%
![Page 66: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/66.jpg)
What if we got an overly optimistic validation set?
![Page 67: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/67.jpg)
TRAINING SET 80%
![Page 68: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/68.jpg)
Fix the parameter value you need to evaluate, say msl=15
![Page 69: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/69.jpg)
TRAINING | VAL (the VAL block sits in a different part of the 80% on each run)
Repeat 10 times
![Page 70: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/70.jpg)
Take the average validation score over the 10 runs: it is a more stable estimate.
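As drawn here, cross-validation matches k-fold cross-validation with k = 10. That is our reading of "repeat 10 times" with a moving VAL block. The sketch below uses scikit-learn's `cross_val_score`, where an integer `cv` uses stratified folds for classifiers. The data and `random_state` are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Keep 20% aside as TEST; cross-validate on the remaining 80%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fix the parameter value being evaluated, e.g. msl = 15.
model = DecisionTreeClassifier(min_samples_leaf=15, random_state=0)

# cv=10: the 80% is cut into 10 folds; each fold serves once as VAL while the
# other 9 are used for training, giving one validation score per fold.
scores = cross_val_score(model, X_train, y_train, cv=10)
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
```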
![Page 71: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/71.jpg)
![Page 72: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/72.jpg)
![Page 73: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/73.jpg)
![Page 74: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/74.jpg)
MACHINE LEARNING PIPELINE
• Take raw data
• Extract features
• Split into TRAINING and TEST
• Pick an algorithm and parameters
• Train on the TRAINING data
• Evaluate on the TRAINING data with CV
• Try out different algorithms and parameters (back to "Pick an algorithm and parameters")
• Fix the best parameters
• Train on the whole TRAINING set
• Evaluate on TEST
• Report final performance to the client
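The whole pipeline can be sketched with scikit-learn's `GridSearchCV`. It runs the CV loop over a parameter grid on the TRAINING data. With its default `refit=True`, it then retrains the best combination on the whole TRAINING set. The dataset and grid are illustrative assumptions, and only one algorithm is searched here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Take raw data / extract features: load_digits already returns flattened pixels.
X, y = load_digits(return_X_y=True)

# Split into TRAINING and TEST.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Pick an algorithm and candidate parameters; each combination is
# evaluated with 10-fold CV on the TRAINING data only.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"min_samples_leaf": [1, 5, 15, 50], "max_depth": [None, 10, 20]},
    cv=10,
)
# refit=True (default): fix the best parameters and retrain on the whole TRAINING set.
search.fit(X_train, y_train)

# Evaluate on TEST once; this is the number to report to the client.
test_acc = search.score(X_test, y_test)
print(search.best_params_, round(test_acc, 3))
```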
![Page 75: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/75.jpg)
“So it is ~87%…erm… Could you do better?”
![Page 76: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/76.jpg)
Yes
![Page 77: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/77.jpg)
Pick another algorithm
![Page 78: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/78.jpg)
![Page 79: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/79.jpg)
RANDOM FOREST
![Page 80: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/80.jpg)
RANDOM FOREST
Decision tree: pick the best out of all features
![Page 81: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/81.jpg)
Random forest: pick the best out of a random subset of features
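This NumPy sketch shows that one difference. At each node, the random forest draws a fresh random subset of features and runs the ordinary best-split search on those alone. The helper names and toy data are hypothetical.

A common subset size for classification is about √(number of features), the current default `max_features="sqrt"` in scikit-learn's `RandomForestClassifier`. Real random forests usually also train each tree on a bootstrap sample of the rows, which this sketch leaves out.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_on(X, y, features):
    """Exhaustive threshold search, restricted to the given feature indices."""
    best = (None, None, np.inf)
    for feature in features:
        for threshold in np.unique(X[:, feature]):
            left = X[:, feature] <= threshold
            if left.all() or not left.any():
                continue  # skip splits that leave one side empty
            right = ~left
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / len(y)
            if score < best[2]:
                best = (int(feature), threshold, score)
    return best

def random_forest_split(X, y, max_features, rng):
    """Draw a random subset of features for this node, then split on the best of them."""
    features = rng.choice(X.shape[1], size=max_features, replace=False)
    return best_split_on(X, y, features)

X = np.array([[0, 250, 5], [0, 240, 90], [1, 30, 60], [0, 20, 200]])
y = np.array(["7", "7", "2", "2"])
print(random_forest_split(X, y, max_features=2, rng=np.random.default_rng(0)))
```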
![Page 82: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/82.jpg)
RANDOM FOREST
![Page 83: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/83.jpg)
RANDOM FOREST
Pick the best out of another random subset of features
![Page 84: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/84.jpg)
RANDOM FOREST
Pick the best out of yet another random subset of features
![Page 85: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/85.jpg)
RANDOM FOREST
![Page 86: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/86.jpg)
RANDOM FOREST
![Page 87: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/87.jpg)
RANDOM FOREST
class
instance
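Here the trees are combined by a plain majority vote, sketched in Python: each tree predicts a class for the instance, and the most frequent class wins. This is the classic formulation. Note that scikit-learn's `RandomForestClassifier` instead averages the trees' predicted class probabilities. The votes below are made up.

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Majority vote over the classes predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical predictions of five trees for one instance.
votes = ["7", "1", "7", "7", "2"]
print(forest_predict(votes))  # 7
```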
![Page 88: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/88.jpg)
RANDOM FOREST
![Page 89: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/89.jpg)
RANDOM FOREST
![Page 90: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/90.jpg)
RANDOM FOREST
![Page 91: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/91.jpg)
![Page 92: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/92.jpg)
![Page 93: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/93.jpg)
Happy client
![Page 94: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/94.jpg)
ALL OTHER USE CASES
![Page 95: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/95.jpg)
| Raw data | Features | Class (label) |
|---|---|---|
| Sound | Frequency components | Genre |
| Text | Bag of words | Topic |
| Image | Pixel values | Cat or dog |
| Video | Frame pixels | Walking or running |
| Database records, biometric data, census data | Average salary, … | Dead or alive |
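For the text row, a bag-of-words vector simply counts words and discards their order. This minimal sketch in plain Python uses a hand-picked, hypothetical vocabulary and a naive lowercase tokenizer. Real projects would typically build the vocabulary from the training texts.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs; word order is thrown away."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in vocabulary]

vocabulary = ["ball", "goal", "vote", "election"]  # hypothetical vocabulary
print(bag_of_words("The goal came after a long ball, then another goal.", vocabulary))
# [1, 2, 0, 0]
```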
![Page 96: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/96.jpg)
![Page 97: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/97.jpg)
![Page 98: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/98.jpg)
![Page 99: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/99.jpg)
![Page 100: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/100.jpg)
![Page 101: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/101.jpg)
HANDS-ON SESSION
![Page 102: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/102.jpg)
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
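A possible starting point for the hands-on session, sketched against the `RandomForestClassifier` API linked above. It reuses the bundled `load_digits` data as an assumption. The parameter values are illustrative, not tuned.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    max_features="sqrt",  # size of the random feature subset tried at each split
    random_state=0,
)

# Estimate performance on TRAINING with 5-fold CV before touching TEST.
cv_scores = cross_val_score(forest, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Refit on the whole TRAINING set and evaluate once on TEST.
forest.fit(X_train, y_train)
test_acc = forest.score(X_test, y_test)
print(f"test accuracy: {test_acc:.3f}")
```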
![Page 103: Introduction to Machine Learning @ Mooncascade ML Camp](https://reader031.vdocuments.us/reader031/viewer/2022021813/58708ed81a28ab412b8b5129/html5/thumbnails/103.jpg)