an introduction to machine learning · an introduction to machine learning fabio a. gonz alez ph.d....

30
An Introduction to Machine Learning Fabio A. Gonz´ alez Ph.D. Introduction Patterns and Generalization Learning Problems Learning Techniques An Introduction to Machine Learning Fabio A. Gonz´ alez Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogot´ a February 9, 2018

Upload: others

Post on 20-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

LearningTechniques

An Introduction to Machine Learning

Fabio A. Gonzalez Ph.D.

Depto. de Ing. de Sistemas e IndustrialUniversidad Nacional de Colombia, Bogota

February 9, 2018

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

LearningTechniques

Content

1 IntroductionExampleHow to State the Learning Problem?How to Solve the Learning Problem?

2 Patterns and GeneralizationGeneralizing from patternsOverfitting/ OverlearningHow to Measure the Quality of a Solution?

3 Learning ProblemsSupervisedNon-supervisedActiveOn-line

4 Learning Techniques

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

Two class classification problem

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

How to solve it?

• We need to build a prediction function f : R2 → R suchthat::

Prediction(x , y) =

{C1 si f (x , y) ≥ 0

C2 si f (x , y) < 0

• Training set: D = {((x1, y1), l1) , . . . , ((xn , yn), ln)}• Example:

D = {((1, 2),−1), ((1, 3),−1), ((3, 1), 1), . . . }• Loss function:

L(f ,D) =∑

(xi ,yi ,li )∈D

|sign(f (xi , yi))− li |2

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

L1 Error loss

f (x , y) = w1x + w0y

L(f ,D) =12

∑(xi ,yi ,li )∈D |f (xi , yi))− li |

• Are there otheralternative lossfunctions? �2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5

w0

�2.0

�1.5

�1.0

�0.5

0.0

0.5

1.0

1.5

w1

0

15

30

45

60

75

90

105

120

135

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

Square error loss

f (x , y) = w1x + w0y

L(f ,D) =12

∑(xi ,yi ,li )∈D(f (xi , yi))−

li)2

�2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5w0

�2.0

�1.5

�1.0

�0.5

0.0

0.5

1.0

1.5

w1

0

160

320

480

640

800

960

1120

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

Learning as optimization

• General optimization problem:

minf ∈H

L(f ,D)

• Two Class 2D classification using linear functions:

H = {f : f (x , y) = w2x + w1y + w0, ∀w0,w1,w2 ∈ R}

minf ∈H

L(f ,D) = minW∈R3

1

2

∑(xi ,yi )∈D

(w2xi + w1yi + w0 − li)2

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

Hypothesis space

8 6 4 2 0x

4

2

0

2

y

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

Gradient descent

Iterative optimization of the lossfunction:

initialize W 0 =w0,w1,w2

k ← 0repeat

k ← k + 1W k ←W k−1 −

η(k)∇L(fW k−1 ,S )until |η(k)∇L(fW k−1 ,S )| <Θ

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

Gradient descent iteration example(1)

�2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5w2

�2.0

�1.5

�1.0

�0.5

0.0

0.5

1.0

1.5w1

0

160

320

480

640

800

960

1120

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

Gradient descent iteration example(2)

�2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5w2

�2.0

�1.5

�1.0

�0.5

0.0

0.5

1.0

1.5w1

0

160

320

480

640

800

960

1120

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?

Patterns andGeneralization

LearningProblems

LearningTechniques

Non-separable data

4 5 6 7 8 9 10 11 12x

12

11

10

9

8

7

6

5

y

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

What is a pattern?

• Data regularities

• Data relationships

• Redundancy

• Generative model

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

Learning a boolean function

x1 x2 f1 f2 ... f16

0 0 0 0 ... 1

0 1 0 0 ... 1

1 0 0 0 ... 1

1 1 0 1 ... 1

• How many Boolean functions of n variables are?

• How many candidate functions are removed by a sample?

• Is it possible to generalize?

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

Inductive bias

• In general, the learning problem is ill-posed (more thanone possible solution for the same particular problem,solutions are sensitive to small changes on the problem)

• It is necessary to make additional assumptions about thekind of pattern that we want to learn

• Hypothesis space: set of valid patterns that can belearnt by the algorithm

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

What is a good pattern?

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

What is a good pattern?

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

Occam’s razor

from Wikipedia:

Occam’s razor (also spelled Ockham’s razor) is a principleattributed to the 14th-century English logician and Franciscanfriar William of Ockham. The principle states that theexplanation of any phenomenon should make as fewassumptions as possible, eliminating, or ”shaving off”, thosethat make no difference in the observable predictions of theexplanatory hypothesis or theory. The principle is oftenexpressed in Latin as the lex parsimoniae (law of succinctnessor parsimony).”All things being equal, the simplest solution tends to bethe best one.”

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

Training error vsgeneralization error

• The loss function measures the error in the training set

• Is this a good measure of the quality of the solution?

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

Over-fitting and under-fitting

4 2 0 2 4

2

0

2

4

6

8

10

Subajuste

4 2 0 2 4

2

0

2

4

6

8

10

Ajuste apropiado

4 2 0 2 4

2

0

2

4

6

8

10

Sobreajuste

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

Generalization error

• Generalization error:

E [(L(fw ,S )]

• How to control the generalization error during training?

• Cross validation• Regularization

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

Regularization

• Vapnik, 1995:

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Types

• Supervised learning

• Non-supervised learning

• Semi-supervised learning

• Active/reinforcement learning

• On-line learning

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Supervised learning

• Fundamentalproblem: to find afunction that relates aset of inputs with a setof outputs

• Typical problems:

• Classification• Regression

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Non-supervised learningLatent Dirichlet allocation (LDA)

gene 0.04

dna 0.02

genetic 0.01

.,,

life 0.02

evolve 0.01

organism 0.01

.,,

brain 0.04

neuron 0.02

nerve 0.01

...

data 0.02

number 0.02

computer 0.01

.,,

Topics DocumentsTopic proportions and

assignments

• Each topic is a distribution over words

• Each document is a mixture of corpus-wide topics

• Each word is drawn from one of those topics

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Non-supervised learningLatent Dirichlet allocation (LDA)

gene 0.04

dna 0.02

genetic 0.01

.,,

life 0.02

evolve 0.01

organism 0.01

.,,

brain 0.04

neuron 0.02

nerve 0.01

...

data 0.02

number 0.02

computer 0.01

.,,

Topics DocumentsTopic proportions and

assignments

• Each topic is a distribution over words

• Each document is a mixture of corpus-wide topics

• Each word is drawn from one of those topics

• There are not labels for the training samples

• Fundamental problem: to find the subjacent structure ofa training data set

• Typical problems: clustering, probability densityestimation, dimensionality reduction, latent topic analysis,data compression

• Some samples may have labels, in that case it is calledsemi-supervised learning

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Active/reinforcement learning

• Generally, it happens inthe context of an agentacting in an environment

• The agent is not toldwhether it has make theright decision or not

• The agent is punished orrewarded (not necessarilyin an immediate way)

• Fundamental problem:to define a policy thatallows to maximize thepositive stimulus (reward) https://www.youtube.com/watch?v=iqXKQf2BOSE

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

On-line learning

• Only one pass through the data

• big data volume• real time

• It may be supervised or unsupervised

• Fundamental problem: to extract the maximuminformation from data with minimum number of passes

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

LearningTechniques

Representative techniques

• Computational

• Decision trees• Nearest-neighbor

classification• Graph-based clustering• Association rules

• Statistical

• Multivariate regression• Linear discriminant

analysis• Bayesian decision theory• Bayesian networks• K-means

• Computational-Statistical

• SVM• AdaBoost

• Bio-inspired

• Neural networks• Genetic algorithms• Artificial immune

systems

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

LearningTechniques

Alpaydin, E. 2010 Introduction to Machine Learning(Adaptive Computation and Machine Learning). The MITPress. (Chap 1,2)