Introduction to Machine Learning @ Mooncascade ML Camp


by Ilya Kuzovkin, ilya.kuzovkin@gmail.com

Mooncascade ML Camp 2016

MACHINE LEARNING: ESSENTIAL CONCEPTS

ONE MACHINE LEARNING USE CASE

Can we ask a computer to create those patterns automatically?

Yes

How?

Raw data

A data sample: an instance consists of the raw data and its class (label), e.g. an image of a handwritten digit labeled “7”.

How to represent it in a machine-readable form?

Feature extraction

The image is 28 px × 28 px, i.e. 784 pixels in total.

Feature vector: (0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0)

Many labeled feature vectors together form a dataset:

(0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0)    “7”
(0, 0, 0, …, 13, 48, 102, 0, 46, 255, … 0, 0, 0)      “2”
(0, 0, 0, …, 17, 34, 12, 43, 122, 70, … 0, 7, 0)      “8”
(0, 0, 0, …, 98, 21, 255, 255, 231, 140, … 0, 0, 0)   “2”
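For concreteness, a minimal sketch of this step in Python with scikit-learn. Fetching MNIST from OpenML under the name "mnist_784" is an assumption of this sketch, not something the deck prescribes:

from sklearn.datasets import fetch_openml

# Each 28x28 digit image arrives already flattened into a 784-long feature vector.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
print(X.shape)  # (70000, 784): one row per instance, one column per pixel
print(y[0])     # the class (label) of the first instance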

The data is in the right format — what’s next?

• C4.5 • Random forests • Bayesian networks • Hidden Markov models • Artificial neural network • Data clustering • Expectation-maximization algorithm • Self-organizing map • Radial basis function network • Vector quantization • Generative topographic map • Information bottleneck method • IBSEAD • Apriori algorithm • Eclat algorithm • FP-growth algorithm • Single-linkage clustering • Conceptual clustering • K-means algorithm • Fuzzy clustering • Temporal difference learning • Q-learning • Learning automata

• AODE • Artificial neural network • Backpropagation • Naive Bayes classifier • Bayesian network • Bayesian knowledge base • Case-based reasoning • Decision trees • Inductive logic programming • Gaussian process regression • Gene expression programming • Group method of data handling (GMDH) • Learning automata • Learning vector quantization • Logistic model tree • Decision tree • Decision graphs • Lazy learning • Monte Carlo method • SARSA

• Instance-based learning • Nearest neighbor algorithm • Analogical modeling • Probably approximately correct learning (PACL) • Symbolic machine learning algorithms • Subsymbolic machine learning algorithms • Support vector machines • Random forest • Ensembles of classifiers • Bootstrap aggregating (bagging) • Boosting (meta-algorithm) • Ordinal classification • Regression analysis • Information fuzzy networks (IFN) • Linear classifiers • Fisher’s linear discriminant • Logistic regression • Naive Bayes classifier • Perceptron • Support vector machines • Quadratic classifiers • k-nearest neighbor • Boosting

Pick an algorithm

DECISION TREE

Two classes of feature vectors to tell apart:

(0, …, 28, 65, …, 207, 101, 0, 0)         (0, …, 28, 64, …, 102, 101, 0, 0)
(0, …, 19, 34, …, 254, 54, 0, 0)    vs.   (0, …, 19, 23, …, 105, 54, 0, 0)
(0, …, 87, 59, …, 240, 52, 4, 0)          (0, …, 87, 74, …, 121, 51, 7, 0)
(0, …, 87, 52, …, 240, 19, 3, 0)          (0, …, 87, 112, …, 239, 52, 4, 0)

Pick the feature that separates the classes best, e.g. PIXEL #417: samples with a value >200 go down one branch, samples with a value <200 down the other. Each branch is then split again on another feature, e.g. PIXEL #123 with the threshold <100 vs. >100, and so on, until the training samples are separated.
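In code this is a one-liner with scikit-learn, assuming the X and y arrays from the sketch above:

from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()
tree.fit(X, y)              # finds splits like "pixel #417 > 200" automatically
print(tree.predict(X[:1]))  # predicted class for the first instance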

ACCURACY

Confusion matrix: rows are the true class, columns are the predicted class.

acc = correctly classified / total number of samples

Beware of an imbalanced dataset! Consider the following model: “Always predict 2”. On a dataset dominated by 2s (say, 90% of the samples), this useless model still reaches accuracy 0.9.
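Both numbers come straight out of scikit-learn; the label arrays below are hypothetical, standing in for a classifier’s output:

from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels.
y_true = ["7", "2", "8", "2", "2"]
y_pred = ["7", "2", "2", "2", "2"]

print(accuracy_score(y_true, y_pred))    # correctly classified / total = 0.8
print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted class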

DECISION TREE

“You said 100% accurate?! Every 10th digit your system detects is wrong!”

Angry client

We trained our system on the data the client gave us, but the system had never seen the new data the client applied it to. And in real life, it never will…

OVERFITTING

Simulate the real-life situation — split the dataset.

Underfitting! (“too stupid”)   …   OK   …   Overfitting! (“too smart”)

Our current decision tree has too much capacity: it has simply memorized all of the data. Let’s make it less complex.
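In scikit-learn, making the tree less complex means constraining its capacity; the particular limits below are illustrative, not values from the deck:

from sklearn.tree import DecisionTreeClassifier

# A shallower tree with larger leaves cannot memorize every training sample.
simpler_tree = DecisionTreeClassifier(max_depth=10, min_samples_leaf=5)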

You probably did not notice, but we are overfitting again :(

THE WHOLE DATASET

• TRAINING SET 60%: fit various models and parameter combinations on this subset.
• VALIDATION SET 20%: evaluate the models created with different parameters; estimate overfitting.
• TEST SET 20%: use only once, to get the final performance estimate.
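A sketch of the 60/20/20 split with scikit-learn’s train_test_split, which cuts a dataset in two, so we call it twice:

from sklearn.model_selection import train_test_split

# 60% training, then the remaining 40% halved into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)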

CROSS-VALIDATION

What if we got a too-optimistic validation set?

Merge training and validation back into a TRAINING SET of 80%. Fix the parameter value you need to evaluate, say msl=15. Split the 80% into training and validation parts, train, and score on the held-out part; repeat 10 times, each time holding out a different slice for validation. Take the average validation score over the 10 runs — it is a more stable estimate.
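The same procedure in scikit-learn; reading the slide’s msl as the tree’s min_samples_leaf parameter is an assumption:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(min_samples_leaf=15)       # the fixed parameter value
scores = cross_val_score(tree, X_train, y_train, cv=10)  # 10 different train/val splits
print(scores.mean())                                     # the more stable estimate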

MACHINE LEARNING PIPELINE

1. Take raw data.
2. Extract features.
3. Split into TRAINING and TEST.
4. Pick an algorithm and parameters.
5. Train on the TRAINING data.
6. Evaluate on the TRAINING data with CV.
7. Try out different algorithms and parameters, repeating steps 4–6.
8. Fix the best parameters.
9. Train on the whole TRAINING set.
10. Evaluate on TEST.
11. Report the final performance to the client.
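Putting the pipeline together as one hypothetical, self-contained run (same assumed X and y as above; the parameter grid is illustrative):

import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Split, tune with CV on TRAINING only, refit the best model, evaluate once on TEST.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

candidate_msl = [1, 5, 15, 50]  # illustrative grid of parameter values
cv_scores = [cross_val_score(DecisionTreeClassifier(min_samples_leaf=m),
                             X_train, y_train, cv=10).mean()
             for m in candidate_msl]
best_msl = candidate_msl[int(np.argmax(cv_scores))]

final_model = DecisionTreeClassifier(min_samples_leaf=best_msl)
final_model.fit(X_train, y_train)         # train on the whole TRAINING set
print(final_model.score(X_test, y_test))  # touch TEST only once, at the very end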

“So it is ~87%… erm… Could you do better?”

Yes

Pick another algorithm from the same long list as before.

RANDOM FOREST

Decision tree: pick the best split out of all features.

Random forest: each tree picks the best split out of a random subset of features (one tree out of one random subset, another tree out of another random subset, yet another out of yet another, and so on).

To classify an instance, every tree in the forest predicts a class, and the forest outputs the class that most trees voted for.
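In code this is the classifier linked at the end of the deck; the number of trees below is an illustrative choice:

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100)  # 100 voting trees
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))                # accuracy on the held-out TEST set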

Happy client

ALL OTHER USE CASES

The same raw data → features → class recipe applies everywhere:

• Sound: frequency components → genre
• Text: bag of words → topic
• Image: pixel values → cat or dog
• Video: frame pixels → walking or running
• Database records: biometric data, census data, average salary, … → dead or alive

HANDS-ON SESSION

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
