an introduction to machine learning · an introduction to machine learning fabio a. gonz alez ph.d....
TRANSCRIPT
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
LearningProblems
LearningTechniques
An Introduction to Machine Learning
Fabio A. Gonzalez Ph.D.
Depto. de Ing. de Sistemas e IndustrialUniversidad Nacional de Colombia, Bogota
February 9, 2018
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
LearningProblems
LearningTechniques
Content
1 IntroductionExampleHow to State the Learning Problem?How to Solve the Learning Problem?
2 Patterns and GeneralizationGeneralizing from patternsOverfitting/ OverlearningHow to Measure the Quality of a Solution?
3 Learning ProblemsSupervisedNon-supervisedActiveOn-line
4 Learning Techniques
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
Two class classification problem
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
How to solve it?
• We need to build a prediction function f : R2 → R suchthat::
Prediction(x , y) =
{C1 si f (x , y) ≥ 0
C2 si f (x , y) < 0
• Training set: D = {((x1, y1), l1) , . . . , ((xn , yn), ln)}• Example:
D = {((1, 2),−1), ((1, 3),−1), ((3, 1), 1), . . . }• Loss function:
L(f ,D) =∑
(xi ,yi ,li )∈D
|sign(f (xi , yi))− li |2
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
L1 Error loss
f (x , y) = w1x + w0y
L(f ,D) =12
∑(xi ,yi ,li )∈D |f (xi , yi))− li |
• Are there otheralternative lossfunctions? �2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5
w0
�2.0
�1.5
�1.0
�0.5
0.0
0.5
1.0
1.5
w1
0
15
30
45
60
75
90
105
120
135
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
Square error loss
f (x , y) = w1x + w0y
L(f ,D) =12
∑(xi ,yi ,li )∈D(f (xi , yi))−
li)2
�2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5w0
�2.0
�1.5
�1.0
�0.5
0.0
0.5
1.0
1.5
w1
0
160
320
480
640
800
960
1120
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
Learning as optimization
• General optimization problem:
minf ∈H
L(f ,D)
• Two Class 2D classification using linear functions:
H = {f : f (x , y) = w2x + w1y + w0, ∀w0,w1,w2 ∈ R}
minf ∈H
L(f ,D) = minW∈R3
1
2
∑(xi ,yi )∈D
(w2xi + w1yi + w0 − li)2
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
Hypothesis space
8 6 4 2 0x
4
2
0
2
y
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
Gradient descent
Iterative optimization of the lossfunction:
initialize W 0 =w0,w1,w2
k ← 0repeat
k ← k + 1W k ←W k−1 −
η(k)∇L(fW k−1 ,S )until |η(k)∇L(fW k−1 ,S )| <Θ
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
Gradient descent iteration example(1)
�2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5w2
�2.0
�1.5
�1.0
�0.5
0.0
0.5
1.0
1.5w1
0
160
320
480
640
800
960
1120
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
Gradient descent iteration example(2)
�2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5w2
�2.0
�1.5
�1.0
�0.5
0.0
0.5
1.0
1.5w1
0
160
320
480
640
800
960
1120
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Example
How to State theLearning Problem?
How to Solve theLearning Problem?
Patterns andGeneralization
LearningProblems
LearningTechniques
Non-separable data
4 5 6 7 8 9 10 11 12x
12
11
10
9
8
7
6
5
y
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
What is a pattern?
• Data regularities
• Data relationships
• Redundancy
• Generative model
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
Learning a boolean function
x1 x2 f1 f2 ... f16
0 0 0 0 ... 1
0 1 0 0 ... 1
1 0 0 0 ... 1
1 1 0 1 ... 1
• How many Boolean functions of n variables are?
• How many candidate functions are removed by a sample?
• Is it possible to generalize?
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
Inductive bias
• In general, the learning problem is ill-posed (more thanone possible solution for the same particular problem,solutions are sensitive to small changes on the problem)
• It is necessary to make additional assumptions about thekind of pattern that we want to learn
• Hypothesis space: set of valid patterns that can belearnt by the algorithm
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
What is a good pattern?
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
What is a good pattern?
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
Occam’s razor
from Wikipedia:
Occam’s razor (also spelled Ockham’s razor) is a principleattributed to the 14th-century English logician and Franciscanfriar William of Ockham. The principle states that theexplanation of any phenomenon should make as fewassumptions as possible, eliminating, or ”shaving off”, thosethat make no difference in the observable predictions of theexplanatory hypothesis or theory. The principle is oftenexpressed in Latin as the lex parsimoniae (law of succinctnessor parsimony).”All things being equal, the simplest solution tends to bethe best one.”
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
Training error vsgeneralization error
• The loss function measures the error in the training set
• Is this a good measure of the quality of the solution?
•
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
Over-fitting and under-fitting
4 2 0 2 4
2
0
2
4
6
8
10
Subajuste
4 2 0 2 4
2
0
2
4
6
8
10
Ajuste apropiado
4 2 0 2 4
2
0
2
4
6
8
10
Sobreajuste
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
Generalization error
• Generalization error:
E [(L(fw ,S )]
• How to control the generalization error during training?
• Cross validation• Regularization
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
Generalizing frompatterns
Overfitting/Overlearning
How to Measure theQuality of aSolution?
LearningProblems
LearningTechniques
Regularization
• Vapnik, 1995:
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
LearningProblems
Supervised
Non-supervised
Active
On-line
LearningTechniques
Types
• Supervised learning
• Non-supervised learning
• Semi-supervised learning
• Active/reinforcement learning
• On-line learning
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
LearningProblems
Supervised
Non-supervised
Active
On-line
LearningTechniques
Supervised learning
• Fundamentalproblem: to find afunction that relates aset of inputs with a setof outputs
• Typical problems:
• Classification• Regression
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
LearningProblems
Supervised
Non-supervised
Active
On-line
LearningTechniques
Non-supervised learningLatent Dirichlet allocation (LDA)
gene 0.04
dna 0.02
genetic 0.01
.,,
life 0.02
evolve 0.01
organism 0.01
.,,
brain 0.04
neuron 0.02
nerve 0.01
...
data 0.02
number 0.02
computer 0.01
.,,
Topics DocumentsTopic proportions and
assignments
• Each topic is a distribution over words
• Each document is a mixture of corpus-wide topics
• Each word is drawn from one of those topics
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
LearningProblems
Supervised
Non-supervised
Active
On-line
LearningTechniques
Non-supervised learningLatent Dirichlet allocation (LDA)
gene 0.04
dna 0.02
genetic 0.01
.,,
life 0.02
evolve 0.01
organism 0.01
.,,
brain 0.04
neuron 0.02
nerve 0.01
...
data 0.02
number 0.02
computer 0.01
.,,
Topics DocumentsTopic proportions and
assignments
• Each topic is a distribution over words
• Each document is a mixture of corpus-wide topics
• Each word is drawn from one of those topics
• There are not labels for the training samples
• Fundamental problem: to find the subjacent structure ofa training data set
• Typical problems: clustering, probability densityestimation, dimensionality reduction, latent topic analysis,data compression
• Some samples may have labels, in that case it is calledsemi-supervised learning
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
LearningProblems
Supervised
Non-supervised
Active
On-line
LearningTechniques
Active/reinforcement learning
• Generally, it happens inthe context of an agentacting in an environment
• The agent is not toldwhether it has make theright decision or not
• The agent is punished orrewarded (not necessarilyin an immediate way)
• Fundamental problem:to define a policy thatallows to maximize thepositive stimulus (reward) https://www.youtube.com/watch?v=iqXKQf2BOSE
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
LearningProblems
Supervised
Non-supervised
Active
On-line
LearningTechniques
On-line learning
• Only one pass through the data
• big data volume• real time
• It may be supervised or unsupervised
• Fundamental problem: to extract the maximuminformation from data with minimum number of passes
AnIntroductionto Machine
Learning
Fabio A.Gonzalez
Ph.D.
Introduction
Patterns andGeneralization
LearningProblems
LearningTechniques
Representative techniques
• Computational
• Decision trees• Nearest-neighbor
classification• Graph-based clustering• Association rules
• Statistical
• Multivariate regression• Linear discriminant
analysis• Bayesian decision theory• Bayesian networks• K-means
• Computational-Statistical
• SVM• AdaBoost
• Bio-inspired
• Neural networks• Genetic algorithms• Artificial immune
systems