an introduction to machine learning · an introduction to machine learning fabio a. gonz alez ph.d....

AnIntroductionto Machine

Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Patterns andGeneralization

LearningProblems

LearningTechniques

An Introduction to Machine Learning

Fabio A. Gonzalez Ph.D.

Depto. de Ing. de Sistemas e IndustrialUniversidad Nacional de Colombia, Bogota

February 9, 2018


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


LearningProblems

LearningTechniques

Content

1 IntroductionExampleHow to State the Learning Problem?How to Solve the Learning Problem?

2 Patterns and GeneralizationGeneralizing from patternsOverfitting/ OverlearningHow to Measure the Quality of a Solution?

3 Learning ProblemsSupervisedNon-supervisedActiveOn-line

4 Learning Techniques


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example

How to State theLearning Problem?

How to Solve theLearning Problem?


LearningProblems

LearningTechniques

Two class classification problem


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example




LearningProblems

LearningTechniques

How to solve it?

• We need to build a prediction function f : R2 → R suchthat::

Prediction(x , y) =

{C1 si f (x , y) ≥ 0

C2 si f (x , y) < 0

• Training set: D = {((x1, y1), l1) , . . . , ((xn , yn), ln)}• Example:

D = {((1, 2),−1), ((1, 3),−1), ((3, 1), 1), . . . }• Loss function:

L(f ,D) =∑

(xi ,yi ,li )∈D

|sign(f (xi , yi))− li |2


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example




LearningProblems

LearningTechniques

L1 Error loss

f (x , y) = w1x + w0y

L(f ,D) =12

∑(xi ,yi ,li )∈D |f (xi , yi))− li |

• Are there otheralternative lossfunctions? �2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5

w0

�2.0

�1.5

�1.0

�0.5

0.0

0.5

1.0

1.5

w1

0

15

30

45

60

75

90

105

120

135


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example




LearningProblems

LearningTechniques

Square error loss

f (x , y) = w1x + w0y

L(f ,D) =12

∑(xi ,yi ,li )∈D(f (xi , yi))−

li)2

�2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5w0

�2.0

�1.5

�1.0

�0.5

0.0

0.5

1.0

1.5

w1

0

160

320

480

640

800

960

1120


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example




LearningProblems

LearningTechniques

Learning as optimization

• General optimization problem:

minf ∈H

L(f ,D)

• Two Class 2D classification using linear functions:

H = {f : f (x , y) = w2x + w1y + w0, ∀w0,w1,w2 ∈ R}

minf ∈H

L(f ,D) = minW∈R3

1

2

∑(xi ,yi )∈D

(w2xi + w1yi + w0 − li)2


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example




LearningProblems

LearningTechniques

Hypothesis space

8 6 4 2 0x

4

2

0

2

y


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example




LearningProblems

LearningTechniques

Gradient descent

Iterative optimization of the lossfunction:

initialize W 0 =w0,w1,w2

k ← 0repeat

k ← k + 1W k ←W k−1 −

η(k)∇L(fW k−1 ,S )until |η(k)∇L(fW k−1 ,S )| <Θ


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example




LearningProblems

LearningTechniques

Gradient descent iteration example(1)

�2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5w2

�2.0

�1.5

�1.0

�0.5

0.0

0.5

1.0

1.5w1

0

160

320

480

640

800

960

1120


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example




LearningProblems

LearningTechniques

Gradient descent iteration example(2)

�2.0 �1.5 �1.0 �0.5 0.0 0.5 1.0 1.5w2

�2.0

�1.5

�1.0

�0.5

0.0

0.5

1.0

1.5w1

0

160

320

480

640

800

960

1120


Learning

Fabio A.Gonzalez

Ph.D.

Introduction

Example




LearningProblems

LearningTechniques

Non-separable data

4 5 6 7 8 9 10 11 12x

12

11

10

9

8

7

6

5

y


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


Generalizing frompatterns

Overfitting/Overlearning

How to Measure theQuality of aSolution?

LearningProblems

LearningTechniques

What is a pattern?

• Data regularities

• Data relationships

• Redundancy

• Generative model


Learning

Fabio A.Gonzalez

Ph.D.

Introduction





LearningProblems

LearningTechniques

Learning a boolean function

x1 x2 f1 f2 ... f16

0 0 0 0 ... 1

0 1 0 0 ... 1

1 0 0 0 ... 1

1 1 0 1 ... 1

• How many Boolean functions of n variables are?

• How many candidate functions are removed by a sample?

• Is it possible to generalize?


Learning

Fabio A.Gonzalez

Ph.D.

Introduction





LearningProblems

LearningTechniques

Inductive bias

• In general, the learning problem is ill-posed (more thanone possible solution for the same particular problem,solutions are sensitive to small changes on the problem)

• It is necessary to make additional assumptions about thekind of pattern that we want to learn

• Hypothesis space: set of valid patterns that can belearnt by the algorithm


Learning

Fabio A.Gonzalez

Ph.D.

Introduction





LearningProblems

LearningTechniques

What is a good pattern?


Learning

Fabio A.Gonzalez

Ph.D.

Introduction





LearningProblems

LearningTechniques

Occam’s razor

from Wikipedia:

Occam’s razor (also spelled Ockham’s razor) is a principleattributed to the 14th-century English logician and Franciscanfriar William of Ockham. The principle states that theexplanation of any phenomenon should make as fewassumptions as possible, eliminating, or ”shaving off”, thosethat make no difference in the observable predictions of theexplanatory hypothesis or theory. The principle is oftenexpressed in Latin as the lex parsimoniae (law of succinctnessor parsimony).”All things being equal, the simplest solution tends to bethe best one.”


Learning

Fabio A.Gonzalez

Ph.D.

Introduction





LearningProblems

LearningTechniques

Training error vsgeneralization error

• The loss function measures the error in the training set

• Is this a good measure of the quality of the solution?

•


Learning

Fabio A.Gonzalez

Ph.D.

Introduction





LearningProblems

LearningTechniques

Over-fitting and under-fitting

4 2 0 2 4

2

0

2

4

6

8

10

Subajuste

4 2 0 2 4

2

0

2

4

6

8

10

Ajuste apropiado

4 2 0 2 4

2

0

2

4

6

8

10

Sobreajuste


Learning

Fabio A.Gonzalez

Ph.D.

Introduction





LearningProblems

LearningTechniques

Generalization error

• Generalization error:

E [(L(fw ,S )]

• How to control the generalization error during training?

• Cross validation• Regularization


Learning

Fabio A.Gonzalez

Ph.D.

Introduction





LearningProblems

LearningTechniques

Regularization

• Vapnik, 1995:


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Types

• Supervised learning

• Non-supervised learning

• Semi-supervised learning

• Active/reinforcement learning

• On-line learning


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Supervised learning

• Fundamentalproblem: to find afunction that relates aset of inputs with a setof outputs

• Typical problems:

• Classification• Regression


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Non-supervised learningLatent Dirichlet allocation (LDA)

gene 0.04

dna 0.02

genetic 0.01

.,,

life 0.02

evolve 0.01

organism 0.01

.,,

brain 0.04

neuron 0.02

nerve 0.01

...

data 0.02

number 0.02

computer 0.01

.,,

Topics DocumentsTopic proportions and

assignments

• Each topic is a distribution over words

• Each document is a mixture of corpus-wide topics

• Each word is drawn from one of those topics


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Non-supervised learningLatent Dirichlet allocation (LDA)

gene 0.04

dna 0.02

genetic 0.01

.,,

life 0.02

evolve 0.01

organism 0.01

.,,

brain 0.04

neuron 0.02

nerve 0.01

...

data 0.02

number 0.02

computer 0.01

.,,

Topics DocumentsTopic proportions and

assignments

• Each topic is a distribution over words

• Each document is a mixture of corpus-wide topics

• Each word is drawn from one of those topics

• There are not labels for the training samples

• Fundamental problem: to find the subjacent structure ofa training data set

• Typical problems: clustering, probability densityestimation, dimensionality reduction, latent topic analysis,data compression

• Some samples may have labels, in that case it is calledsemi-supervised learning


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

Active/reinforcement learning

• Generally, it happens inthe context of an agentacting in an environment

• The agent is not toldwhether it has make theright decision or not

• The agent is punished orrewarded (not necessarilyin an immediate way)

• Fundamental problem:to define a policy thatallows to maximize thepositive stimulus (reward) https://www.youtube.com/watch?v=iqXKQf2BOSE

https://www.youtube.com/watch?v=iqXKQf2BOSE


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

On-line learning

• Only one pass through the data

• big data volume• real time

• It may be supervised or unsupervised

• Fundamental problem: to extract the maximuminformation from data with minimum number of passes


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


LearningProblems

LearningTechniques

Representative techniques

• Computational

• Decision trees• Nearest-neighbor

classification• Graph-based clustering• Association rules

• Statistical

• Multivariate regression• Linear discriminant

analysis• Bayesian decision theory• Bayesian networks• K-means

• Computational-Statistical

• SVM• AdaBoost

• Bio-inspired

• Neural networks• Genetic algorithms• Artificial immune

systems


Learning

Fabio A.Gonzalez

Ph.D.

Introduction


LearningProblems

LearningTechniques

Alpaydin, E. 2010 Introduction to Machine Learning(Adaptive Computation and Machine Learning). The MITPress. (Chap 1,2)