Machine Learning Tutorial (OKKAM Project, Trento, 2010) - Part I
Description
An introduction to machine learning basics. George Giannakopoulos, April 13, 2010. Please cite http://www.iit.demokritos.gr/~ggianna if you reuse.
Introduction
Machine Learning for... OKKAMoids
George Giannakopoulos
April 13, 2010
George Giannakopoulos Machine Learning for... OKKAMoids
Material
Thanks
Simon Colton [Colton, 2010]
T. Palpanas
Wikimedia Commons [wik, 2010]
S. Theodoridis and K. Koutroumbas [Theodoridis and Koutroumbas, 2003]
Please cite http://www.iit.demokritos.gr/~ggianna if you reuse.
Purpose
Part I: Introducing Machine Learning
Familiarize with basic terminology
Provide insight on common approaches
Give general principles
Provide classes of algorithms and examples
Part II: Going Deeper: Models, Parameter Estimation, and ML in Practice
Sneak peek at the end of this presentation...
Machine Learning
The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. [Mitchell, 1997]
Well-defined Learning Problem
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Example Tasks
Speech Recognition Recognizing words uttered by a speaker (audio).
Document Classification Classifying documents into e.g. spam and ham (text).
Preference Learning Learning what a user wants, based on feedback (interaction).
Chess Playing Learning to play chess, based on previous games (gaming).
Automatic Driving Learning to recognize and adapt to the environment (multimodal).
Important Decisions
Exact type of knowledge to be learnt
Representation of the knowledge
bag-of-* vs. sequence
vectors vs. nominals
integer vs. real
orthogonal features vs. feature relations
Learning strategy
Questions
What algorithms exist for learning?
How much training data is sufficient?
When and how can prior knowledge help in generalizing from examples?
What is the best strategy for choosing training experience?
What functions should the system attempt to learn?
How did things start?
Concept Learning
Hypothesis declaration
Search
Generalization
...getting a model describing the whole set of unknown instances.
Example: Playing Experience Based on Weather
Sky AirTemp Humidity Wind Water Forecast EnjoySport
Sunny Warm Normal Strong Warm Same Yes
Sunny Warm High Strong Warm Same Yes
Rainy Cold High Strong Warm Change No
Sunny Warm High Strong Cool Change Yes
Table: Positive and negative examples for the EnjoySport concept
But, what is actually learnable?
Probably Approximately Correct (PAC) Learning -(1)- [Valiant, 1984]
Instance space (encoded) X (Representation space)
Distribution D over X (Instances over the space)
Generator and oracle EX(X, D) gives a concept instance x ∈ X and its label c(x)
Learner algorithm A
Error bound ε
Probability δ
Probably Approximately Correct (PAC) Learning -(2)-
Definition
A problem instance space X is PAC learnable if, for every (concept) subspace C ⊆ X and every distribution D over X, there is an algorithm A that, with probability at least 1 − δ, outputs a hypothesis h ∈ C with error at most ε, given examples drawn from X according to the distribution D.
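The definition above can be stated compactly (a sketch in the usual PAC notation, where err_D(h) denotes the probability that h disagrees with the target concept c on an instance drawn from D):

```latex
\operatorname{err}_D(h) = \Pr_{x \sim D}\left[\, h(x) \neq c(x) \,\right],
\qquad
\Pr\left[\, \operatorname{err}_D(h) \le \varepsilon \,\right] \ge 1 - \delta .
```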
How can we learn?
Candidate Hypotheses Elimination
Choose the best candidates from all the possible hypotheses
Check: Generalization vs. Specificity
Check: Hypotheses’ space sufficiency
Check: Noise and insufficient data
Check: Concept drift
Check: Overfitting

Sky AirTemp Humidity Wind Water Forecast EnjoySport
Sunny Warm Normal Strong Warm Same Yes
Sunny Warm High Strong Warm Same Yes
Rainy Cold High Strong Warm Change No
Sunny Warm High Strong Cool Change Yes

But isn’t this classification?
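A minimal sketch of this hypothesis search is the Find-S algorithm, the specific-to-general half of candidate elimination, run on the EnjoySport examples above ('?' is a hypothetical wildcard meaning "any value"):

```python
# Find-S: start from the most specific hypothesis and minimally
# generalize it on every positive example ('?' matches any value).
def find_s(examples):
    hypothesis = None
    for attrs, label in examples:
        if label != "Yes":
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attrs)  # most specific consistent start
        else:
            hypothesis = [h if h == a else "?"
                          for h, a in zip(hypothesis, attrs)]
    return hypothesis

# The EnjoySport examples from the table above.
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(data))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```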
Classification as Learning
For a given instance i determine its label L.
Classifier
A model or algorithm M that can predict, for every instance i, a label M(i).
Thus, we train a classifier on known data to then use it on unknown data.
Types of Classification Based on Input
Supervised: Labeled, unlabeled instances
Unsupervised: (Is it classification?) Clustering; unlabeled instances only
Active learning: Labeled, unlabeled instances, plus chosen instances to label
Transfer learning: Re-use learning in another domain
Examples and Strategies
Bayesian approach Simply apply Bayes’ rule to each feature individually and multiply the results
Rules Candidate hypotheses elimination
Decision trees One node per feature - Recursive partitioning
Discriminative Hyperplane Regression and Support Vectors
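The Bayesian line above can be sketched as a toy naive Bayes classifier (illustrative assumptions: categorical features, working in log-space, crude add-one smoothing; not the exact formulation from the slides):

```python
from collections import Counter, defaultdict
import math

# Naive Bayes over categorical features: P(c | x) ∝ P(c) * Π_j P(x_j | c).
# Log-space avoids underflow; add-one smoothing avoids zero probabilities.
def train(examples):
    class_counts = Counter(label for _, label in examples)
    feat_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for attrs, label in examples:
        for j, value in enumerate(attrs):
            feat_counts[(j, label)][value] += 1
    return class_counts, feat_counts

def predict(model, attrs):
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for j, value in enumerate(attrs):
            counts = feat_counts[(j, label)]
            vocab = len(counts) + 1  # crude smoothing denominator
            score += math.log((counts[value] + 1) / (count + vocab))
        if score > best_score:
            best, best_score = label, score
    return best

data = [
    (("Sunny", "Warm"), "Yes"), (("Sunny", "Warm"), "Yes"),
    (("Rainy", "Cold"), "No"), (("Sunny", "Cold"), "Yes"),
]
model = train(data)
print(predict(model, ("Sunny", "Warm")))  # "Yes" on this toy data
```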
How Learners Learn: Inductive Bias
Minimum features
Nearest neighbors
Maximum margin
Minimum cross-validation error
Maximum conditional independence: P(A ∩ B | C) = P(A | C) P(B | C)
Minimum description length (Occam’s razor)
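One of these biases is easy to make concrete: a minimal 1-nearest-neighbor classifier (a toy sketch with Euclidean distance and made-up data) embodies the "nearest neighbors" bias, i.e. the assumption that instances close in feature space share a label:

```python
import math

# 1-nearest-neighbor: predict the label of the closest training instance.
def nn_predict(train, x):
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    _, label = min(train, key=lambda pair: dist(pair[0], x))
    return label

data = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((5.0, 5.0), "B")]
print(nn_predict(data, (0.3, 0.1)))  # "A": closest to the points near the origin
```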
Performance -(1)-
Confusion matrix Per-category assignments
Recall How many of the class instances were retrieved
Precision How precisely instances were retrieved
F-measure Harmonic mean of R and P

R = (true class instances found) / (true number of class instances)
P = (true class instances found) / (all instances indicated to belong to the class)
F1 = 2 * (R * P) / (R + P)
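The three measures follow directly from true-positive (TP), false-positive (FP) and false-negative (FN) counts; a minimal sketch for the binary case, with made-up counts:

```python
# Precision, recall, and F1 from TP, FP, FN counts (binary case).
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# E.g. 8 class instances found, 2 wrong assignments, 4 missed instances:
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
print(p, r, f)  # 0.8, 0.666..., 0.727...
```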
Performance -(2)-
ROC curve Receiver Operating Characteristic
Average Performance
Micro-average Calculated over all instances: e.g. count TP and FP over all classes and calculate Precision directly.
Macro-average Calculated over all categories: calculate Precision for each class and average.
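The difference is easy to see on made-up per-class (TP, FP) counts: a large, easy class next to a small, hard one.

```python
# Micro vs. macro averaged precision over per-class (TP, FP) counts.
def micro_precision(counts):
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    return tp / (tp + fp)

def macro_precision(counts):
    per_class = [tp / (tp + fp) for tp, fp in counts.values()]
    return sum(per_class) / len(per_class)

# A large easy class and a small hard class: (TP, FP) per class.
counts = {"ham": (90, 10), "spam": (1, 9)}
print(micro_precision(counts))  # 91/110 ≈ 0.827: dominated by the large class
print(macro_precision(counts))  # (0.9 + 0.1) / 2 = 0.5: classes weigh equally
```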
Clustering -(1)-
Based on distance/proximity/similarity between instances
Distributions over the space
Optimization of an objective function
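A minimal sketch of the objective-function view is k-means (shown here on toy 1-D data; initialization and data are illustrative): it alternates assignment and mean-update steps, locally minimizing the within-cluster sum of squared distances.

```python
import random

# k-means on 1-D points: alternate assignment and update steps.
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to its cluster's mean.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

points = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
centers = kmeans(points, k=2)
print(centers)  # two centers, near 1.0 and 10.0
```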
Clustering -(2)- [Theodoridis and Koutroumbas, 2003]
Sequential Clustering
Hierarchical Clustering
  Agglomerative
  Divisive
Cost function optimization
  hard
  fuzzy
  possibilistic
  probabilistic
  boundary detection
Other (Branch and bound, Genetic Clustering, Stochastic relaxation)
Recapitulation
Learning from Experience
Various tasks
Differentiation based on input (classification, clustering, ...)
Differentiation based on inductive bias (strategy)
Several ways to evaluate performance
Sneak Peek to Part II
Pattern Recognition on Sequences
Generative Models
Discriminative Models
Parametric vs Non-parametric Models
Parameter Estimation (Maximum Likelihood, Genetic Algorithms, etc.)
Good Practices for Using Machine Learning Techniques
Matching Problems to Algorithms
Available Tools
Yes, we are done (for today)!
Thank you! Please check the feedback form (http://tinyurl.com/ycommj3) to help me improve.
References
Wikimedia Commons (2010).
Colton, S. (2010). Artificial Intelligence course v231. March 30, 2010.
Mitchell, T. (1997). Machine Learning. Burr Ridge, IL: McGraw Hill.
Theodoridis, S. and Koutroumbas, K. (2003). Pattern Recognition. Academic Press.
Valiant, L. G. (1984). A theory of the learnable. Commun. ACM, 27(11):1134–1142.