an introduction to machine learningfgonza/courses/2011-ii/ml/introduction.pdf · introduction to...

67
An Introduction to Machine Learning Fabio A. Gonz´ alez Ph.D. Patterns and Generalization Learning Problems Learning Techniques Main Questions An Introduction to Machine Learning Fabio A. Gonz´ alez Ph.D. Depto. de Ing. de Sistemas e Industrial Universidad Nacional de Colombia, Bogot´ a August 10, 2011

Upload: others

Post on 24-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

An Introduction to Machine Learning

Fabio A. Gonzalez Ph.D.

Depto. de Ing. de Sistemas e IndustrialUniversidad Nacional de Colombia, Bogota

August 10, 2011

Page 2: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

Content

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 3: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 4: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 5: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

What is a pattern?

• Data regularities

• Data relationships

• Redundancy

• Generative model

Page 6: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

What is a pattern?

• Data regularities

• Data relationships

• Redundancy

• Generative model

Page 7: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

What is a pattern?

• Data regularities

• Data relationships

• Redundancy

• Generative model

Page 8: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

What is a pattern?

• Data regularities

• Data relationships

• Redundancy

• Generative model

Page 9: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Learning a Boolean function

x1 x2 f1 f2 ... f16

0 0 0 0 ... 1

0 1 0 0 ... 1

1 0 0 0 ... 1

1 1 0 1 ... 1

• How many Boolean functions of n variables are?

• How many candidate functions are removed by a sample?

• Is it possible to generalize?

Page 10: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Learning a Boolean function

x1 x2 f1 f2 ... f16

0 0 0 0 ... 1

0 1 0 0 ... 1

1 0 0 0 ... 1

1 1 0 1 ... 1

• How many Boolean functions of n variables are?

• How many candidate functions are removed by a sample?

• Is it possible to generalize?

Page 11: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Learning a Boolean function

x1 x2 f1 f2 ... f16

0 0 0 0 ... 1

0 1 0 0 ... 1

1 0 0 0 ... 1

1 1 0 1 ... 1

• How many Boolean functions of n variables are?

• How many candidate functions are removed by a sample?

• Is it possible to generalize?

Page 12: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Inductive bias

• In general, the learning problem is ill-posed (more thanone possible solution for the same particular problem,solutions are sensitive to small changes on the problem)

• It is necessary to make additional assumptions about thekind of pattern that we want to learn

• Hypothesis space: set of valid patterns that can belearnt by the algorithm

Page 13: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Inductive bias

• In general, the learning problem is ill-posed (more thanone possible solution for the same particular problem,solutions are sensitive to small changes on the problem)

• It is necessary to make additional assumptions about thekind of pattern that we want to learn

• Hypothesis space: set of valid patterns that can belearnt by the algorithm

Page 14: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Inductive bias

• In general, the learning problem is ill-posed (more thanone possible solution for the same particular problem,solutions are sensitive to small changes on the problem)

• It is necessary to make additional assumptions about thekind of pattern that we want to learn

• Hypothesis space: set of valid patterns that can belearnt by the algorithm

Page 15: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 16: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

What is a good pattern?

Page 17: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

What is a good pattern?

Page 18: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

Generalizing frompatterns

Overfitting/Overlearning

LearningProblems

LearningTechniques

MainQuestions

Occam’s Razor

from Wikipedia:

Occam’s razor (also spelled Ockham’s razor) is a principleattributed to the 14th-century English logician and Franciscanfriar William of Ockham. The principle states that theexplanation of any phenomenon should make as fewassumptions as possible, eliminating, or ”shaving off”, thosethat make no difference in the observable predictions of theexplanatory hypothesis or theory. The principle is oftenexpressed in Latin as the lex parsimoniae (law of succinctnessor parsimony).”All things being equal, the simplest solution tends to bethe best one.”

Page 19: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 20: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Types

• Supervised learning

• Non-supervised learning

• Semi-supervised learning

• Active learning

• On-line learning

Page 21: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 22: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Supervised learning

• Fundamentalproblem: to find afunction that relates aset of inputs with a setof outputs

• Typical problems:

• Classification• Regression

Page 23: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Supervised learning

• Fundamentalproblem: to find afunction that relates aset of inputs with a setof outputs

• Typical problems:

• Classification• Regression

Page 24: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Supervised learning

• Fundamentalproblem: to find afunction that relates aset of inputs with a setof outputs

• Typical problems:

• Classification

• Regression

Page 25: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Supervised learning

• Fundamentalproblem: to find afunction that relates aset of inputs with a setof outputs

• Typical problems:

• Classification• Regression

Page 26: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 27: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Non-supervised learning

• There are not labels for the training samples

• Fundamental problem: to find the subjacent structure ofa training data set

• Typical problems: clustering, data compression

• Some samples may have labels, in that case it is calledsemi-supervised learning

Page 28: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Non-supervised learning

• There are not labels for the training samples

• Fundamental problem: to find the subjacent structure ofa training data set

• Typical problems: clustering, data compression

• Some samples may have labels, in that case it is calledsemi-supervised learning

Page 29: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Non-supervised learning

• There are not labels for the training samples

• Fundamental problem: to find the subjacent structure ofa training data set

• Typical problems: clustering, data compression

• Some samples may have labels, in that case it is calledsemi-supervised learning

Page 30: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Non-supervised learning

• There are not labels for the training samples

• Fundamental problem: to find the subjacent structure ofa training data set

• Typical problems: clustering, data compression

• Some samples may have labels, in that case it is calledsemi-supervised learning

Page 31: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 32: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Active/reinforcing learning

• Generally, it happens in thecontext of an agent acting inan environment

• The agent is not told whetherit has make the right decisionor not

• The agent is punished orrewarded (not necessarily inan immediate way)

• Fundamental problem: todefine a policy that allows tomaximize the positivestimulus (reward)

Page 33: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Active/reinforcing learning

• Generally, it happens in thecontext of an agent acting inan environment

• The agent is not told whetherit has make the right decisionor not

• The agent is punished orrewarded (not necessarily inan immediate way)

• Fundamental problem: todefine a policy that allows tomaximize the positivestimulus (reward)

Page 34: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Active/reinforcing learning

• Generally, it happens in thecontext of an agent acting inan environment

• The agent is not told whetherit has make the right decisionor not

• The agent is punished orrewarded (not necessarily inan immediate way)

• Fundamental problem: todefine a policy that allows tomaximize the positivestimulus (reward)

Page 35: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Active/reinforcing learning

• Generally, it happens in thecontext of an agent acting inan environment

• The agent is not told whetherit has make the right decisionor not

• The agent is punished orrewarded (not necessarily inan immediate way)

• Fundamental problem: todefine a policy that allows tomaximize the positivestimulus (reward)

Page 36: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 37: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

On-line learning

• Only one pass through the data

• big data volume• real time

• It may be supervised or unsupervised

• Fundamental problem: to extract the maximuminformation from data with minimum number of passes

Page 38: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

On-line learning

• Only one pass through the data

• big data volume

• real time

• It may be supervised or unsupervised

• Fundamental problem: to extract the maximuminformation from data with minimum number of passes

Page 39: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

On-line learning

• Only one pass through the data

• big data volume• real time

• It may be supervised or unsupervised

• Fundamental problem: to extract the maximuminformation from data with minimum number of passes

Page 40: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

On-line learning

• Only one pass through the data

• big data volume• real time

• It may be supervised or unsupervised

• Fundamental problem: to extract the maximuminformation from data with minimum number of passes

Page 41: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

Supervised

Non-supervised

Active

On-line

LearningTechniques

MainQuestions

On-line learning

• Only one pass through the data

• big data volume• real time

• It may be supervised or unsupervised

• Fundamental problem: to extract the maximuminformation from data with minimum number of passes

Page 42: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 43: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

Representative techniques

• Computational

• Decision trees• Nearest-neighbor

classification• Graph-based clustering• Association rules

• Statistical

• Multivariate regression• Linear discriminant

analysis• Bayesian decision theory• Bayesian networks• K-means

• Computational-Statistical

• SVM• AdaBoost

• Bio-inspired

• Neural networks• Genetic algorithms• Artificial immune

systems

Page 44: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 45: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 46: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Two Class Classification Problem

0 2 4 6 8 100

1

2

3

4

5

6

7

8

• The idea is to buid a linear classifier function, f : R2 → R,such that:

f (x , y) =

{< 0 if (x , y) ∈ C0

≥ 0 if (x , y) ∈ C1

Page 47: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Loss Function

• Training set: S = {((x1, y1), l1) , . . . , ((xn , yn), ln)}• Example:

S = {((1, 2),−1), ((1, 3),−1), ((3, 1), 1), . . . }

• Loss function:

L(f ,S ) =12

∑(xi ,yi )∈S

(f (xi , yi)− ln)2

• Are there other alternative loss functions?

Page 48: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Loss Function

• Training set: S = {((x1, y1), l1) , . . . , ((xn , yn), ln)}• Example:

S = {((1, 2),−1), ((1, 3),−1), ((3, 1), 1), . . . }

• Loss function:

L(f ,S ) =12

∑(xi ,yi )∈S

(f (xi , yi)− ln)2

• Are there other alternative loss functions?

Page 49: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Loss Function

• Training set: S = {((x1, y1), l1) , . . . , ((xn , yn), ln)}• Example:

S = {((1, 2),−1), ((1, 3),−1), ((3, 1), 1), . . . }

• Loss function:

L(f ,S ) =12

∑(xi ,yi )∈S

(f (xi , yi)− ln)2

• Are there other alternative loss functions?

Page 50: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Square Error Loss

f (x , y) = w1x + w0y

�2.0 �1.5�1.0 �0.5 0.0 0.5 1.0 1.5w0

�2.0�1.5�1.0�0.50.0

0.5

1.0

1.5

w1

0

160

320

480

640

800

960

1120

Page 51: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

L1 Error Loss

f (x , y) = w1x + w0y

�2.0 �1.5�1.0 �0.5 0.0 0.5 1.0 1.5w0

�2.0�1.5�1.0�0.50.0

0.5

1.0

1.5

w1

0

15

30

45

60

75

90

105

120

135

Page 52: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Learning as Optimization

• General optimization problem:

minf ∈H

L(f ,S )

• Two Class 2D Classification:

H = {f : f (x , y) = w2x + w1y + w0,∀w0,w1,w2 ∈ R}

minf ∈H

L(f ,S ) = minW∈R3

12

∑(xi ,yi )∈S

(w2xi + w1yi + w0 − li)2

Page 53: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Learning as Optimization

• General optimization problem:

minf ∈H

L(f ,S )

• Two Class 2D Classification:

H = {f : f (x , y) = w2x + w1y + w0,∀w0,w1,w2 ∈ R}

minf ∈H

L(f ,S ) = minW∈R3

12

∑(xi ,yi )∈S

(w2xi + w1yi + w0 − li)2

Page 54: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 55: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Gradient Descent

Iterative optimization of the loss function:

initialize W 0 = w0,w1,w2

k ← 0repeat

k ← k + 1W k ←W k−1 − η(k)∇L(fW k−1 ,S )

until |η(k)∇L(fW k−1 ,S )| < Θ

Page 56: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Gradient Descent IterationExample (1)

�2.0�1.5�1.0�0.5 0.0 0.5 1.0 1.5w2

�2.0�1.5�1.0�0.50.0

0.5

1.0

1.5w1

0

160

320

480

640

800

960

1120

Page 57: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Gradient Descent IterationExample (2)

�2.0 �1.5�1.0 �0.5 0.0 0.5 1.0 1.5w2

�2.0�1.5�1.0�0.50.0

0.5

1.0

1.5w1

0

160

320

480

640

800

960

1120

Page 58: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Outline

1 Patterns and GeneralizationGeneralizing from patternsOverfitting/ Overlearning

2 Learning ProblemsSupervisedNon-supervisedActiveOn-line

3 Learning Techniques

4 Main QuestionsHow to State the Learning Problem?How to Solve the Learning Problem?How to Measure the Quality of a Solution?

Page 59: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Training Error vsGeneralization Error

• The loss function measures the error in the training set

• Is this a good measure of the quality of the solution?

Page 60: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Training Error vsGeneralization Error

• The loss function measures the error in the training set

• Is this a good measure of the quality of the solution?

Page 61: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Training Error vsGeneralization Error

• The loss function measures the error in the training set

• Is this a good measure of the quality of the solution?

Page 62: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Generalization Error

• Generalization error:

E [(L(fw ,S )]

• How to control the generalization error during training?

• Cross validation• Regularization

Page 63: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Generalization Error

• Generalization error:

E [(L(fw ,S )]

• How to control the generalization error during training?

• Cross validation• Regularization

Page 64: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Generalization Error

• Generalization error:

E [(L(fw ,S )]

• How to control the generalization error during training?

• Cross validation

• Regularization

Page 65: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Generalization Error

• Generalization error:

E [(L(fw ,S )]

• How to control the generalization error during training?

• Cross validation• Regularization

Page 66: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Regularization

• Vapnik, 1995:

Page 67: An Introduction to Machine Learningfgonza/courses/2011-II/ML/introduction.pdf · Introduction to Machine Learning Fabio A. Gonz´alez Ph.D. Patterns and Generalization Learning Problems

AnIntroductionto MachineLearning

Fabio A.GonzalezPh.D.

Patterns andGeneralization

LearningProblems

LearningTechniques

MainQuestions

How to State theLearning Problem?

How to Solve theLearning Problem?

How to Measure theQuality of aSolution?

Alpaydin, E. 2004 Introduction to Machine Learning(Adaptive Computation and Machine Learning). The MITPress. (Cap 1,2)