Overfitting and Regularization in Machine Learning. Based on [Bishop, PRML 05], Ch. 1

Post on 06-Aug-2020

Page 1: Overfitting and Regularization in Machine Learning · Inductive learning • Simplest form: learn a function from examples. An example is a pair (x, y): x = data, y = outcome; assume:

Overfitting and Regularization in Machine Learning

Based on [Bishop, PRML 05], Ch. 1

Feedback in Learning

• Type of feedback:
  – Supervised learning: correct answers for each example
    • Discrete (categories): classification
    • Continuous: regression
  – Unsupervised learning: correct answers not given
  – Reinforcement learning: occasional rewards

Inductive learning
• Simplest form: learn a function from examples

An example is a pair (x, y): x = data, y = outcome
Assume y is drawn from a function f(x): y = f(x) + noise

f = target function

Problem: find a hypothesis h such that h ≈ f, given a training set of examples

Note: highly simplified model:
– Ignores prior knowledge: some h may be more likely
– Assumes lots of examples are available
– Objective: maximize prediction accuracy on unseen data
– Q. How?

Inductive learning method
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:

Regression vs Classification

y = f(x)

Regression: y is continuous

Classification: y takes one of a set of discrete values

e.g. classes C1, C2, C3, ...: y ∈ {1, 2, 3, ...}

Precision vs Recall

Precision: A / Retrieved Positives

Recall: A / Actual Positives
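
The two ratios can be sketched in a few lines of Python. This is a minimal illustration, assuming that "A" on the slide denotes the overlap between the retrieved items and the actual positives (the true positives); the function name and the toy sets are hypothetical.

```python
def precision_recall(retrieved, actual_positives):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved = set(retrieved)
    actual_positives = set(actual_positives)
    # Assumed meaning of "A": items that are both retrieved and actually positive.
    overlap = retrieved & actual_positives
    precision = len(overlap) / len(retrieved) if retrieved else 0.0
    recall = len(overlap) / len(actual_positives) if actual_positives else 0.0
    return precision, recall


# Hypothetical toy data: 4 items retrieved, 6 items are actually positive.
retrieved_items = {1, 2, 3, 4}
positive_items = {3, 4, 5, 6, 7, 8}
p, r = precision_recall(retrieved_items, positive_items)
print(f"precision={p:.3f} recall={r:.3f}")
```

Here two of the four retrieved items are correct and two of the six positives are found, so retrieving more items can raise recall while lowering precision.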

Regression

Polynomial Curve Fitting

Linear Regression

y = f(x) = Σ_i w_i φ_i(x)

φ_i(x): basis functions

w_i: weights

Linear: the function is linear in the weights
Quadratic error function → derivative is linear in w
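
Because the model is linear in the weights, the best weights for a quadratic error can be found with a linear least-squares solve. The sketch below assumes polynomial basis functions φ_i(x) = x^i (matching the curve-fitting slides); the helper names are hypothetical and NumPy is an assumed dependency.

```python
import numpy as np


def design_matrix(x, degree):
    """Matrix whose column i holds the basis function phi_i(x) = x**i."""
    return np.vander(x, degree + 1, increasing=True)


def fit_weights(x, t, degree):
    """Least-squares weights for y(x) = sum_i w_i * phi_i(x)."""
    phi = design_matrix(x, degree)
    w, *_ = np.linalg.lstsq(phi, t, rcond=None)
    return w


# Noise-free toy data on the line t = 1 + 2x, so the fit should recover it.
x = np.array([0.0, 1.0, 2.0, 3.0])
t = 1.0 + 2.0 * x
w = fit_weights(x, t, degree=1)
print(w)
```

Swapping `design_matrix` for another set of basis functions leaves the solve unchanged, which is the practical meaning of "linear in the weights".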

Sum-of-Squares Error Function
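
The formula on this slide did not survive extraction. As a sketch, the code below uses the standard form from Bishop's Chapter 1, E(w) = ½ Σ_n (y(x_n, w) − t_n)², under the assumption that the slide follows it; y is taken to be a polynomial with coefficients w, and the function name is hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def sum_of_squares_error(w, x, t):
    """E(w) = 0.5 * sum_n (y(x_n, w) - t_n)**2 for a polynomial y."""
    y = P.polyval(x, w)  # w[0] + w[1]*x + w[2]*x**2 + ...
    return 0.5 * np.sum((y - t) ** 2)


# Toy check: the line y = x misses the last target (4) by 2.
x = np.array([0.0, 1.0, 2.0])
t = np.array([0.0, 1.0, 4.0])
error = sum_of_squares_error([0.0, 1.0], x, t)
print(error)
```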

0th Order Polynomial

1st Order Polynomial

3rd Order Polynomial

9th Order Polynomial

Over-fitting

Root-Mean-Square (RMS) Error:
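
The polynomial fits of order 0, 1, 3 and 9 can be reproduced roughly as follows. This is a sketch under stated assumptions: the data are 10 noisy samples of sin(2πx) as in Bishop's running example, the noise level and random seed are arbitrary choices, and the RMS error is taken as the square root of the mean squared residual.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only


def make_data(n, noise_std=0.3):
    """n evenly spaced inputs on [0, 1] with noisy sin(2*pi*x) targets."""
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, size=n)
    return x, t


def rms_error(w, x, t):
    """Root-mean-square difference between the polynomial and the targets."""
    return np.sqrt(np.mean((P.polyval(x, w) - t) ** 2))


x_train, t_train = make_data(10)
x_test, t_test = make_data(100)

train_rms, test_rms = {}, {}
for order in (0, 1, 3, 9):
    w = P.polyfit(x_train, t_train, order)  # least-squares polynomial fit
    train_rms[order] = rms_error(w, x_train, t_train)
    test_rms[order] = rms_error(w, x_test, t_test)
    print(order, train_rms[order], test_rms[order])
```

The training error can only shrink as the order grows, and the 9th-order polynomial passes through all 10 training points. Comparing the two printed columns is the point of the exercise: a model that fits the training set perfectly is not necessarily the one with the lowest error on fresh data.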

Polynomial Coefficients

9th Order Polynomial

Data Set Size: 9th Order Polynomial

Data Set Size: 9th Order Polynomial

Regularization

Penalize large coefficient values
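
One common way to penalize large coefficients is to add (λ/2)·‖w‖² to the sum-of-squares error, which gives the closed-form solution (ΦᵀΦ + λI) w = Φᵀt. The sketch below assumes that quadratic penalty and a polynomial basis; for simplicity it also penalizes the constant term w_0. The λ values, seed and function name are illustrative assumptions.

```python
import numpy as np


def fit_ridge(x, t, order, lam):
    """Minimize 0.5*sum (y - t)**2 + 0.5*lam*||w||**2 for a polynomial y."""
    phi = np.vander(x, order + 1, increasing=True)
    a = phi.T @ phi + lam * np.eye(order + 1)
    return np.linalg.solve(a, phi.T @ t)


rng = np.random.default_rng(0)  # arbitrary seed
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.size)

# Illustrative settings: a very weak penalty and a strong one.
w_weak = fit_ridge(x, t, order=9, lam=np.exp(-18.0))
w_strong = fit_ridge(x, t, order=9, lam=1.0)
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

Increasing λ shrinks the norm of the weight vector, trading a worse fit on the training points for a smoother curve.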

Regularization:

Regularization:

Regularization: vs.

Polynomial Coefficients

Information Theory

Twenty Questions

Knower: thinks of object (point in a probability space)
Guesser: asks knower to evaluate random variables

Stupid approach:

Guesser: Is it my left big toe? Knower: No.

Guesser: Is it Valmiki? Knower: No.

Guesser: Is it Aunt Lakshmi? ...

Expectations & Surprisal

Turn the key: expectation: lock will open

Exam paper showing: could be 100, could be zero. Random variable: function from set of marks to real interval [0,1]

Interestingness ∝ unpredictability

surprisal (r.v. = x) = − log2 p(x)

= 0 when p(x) = 1
= 1 when p(x) = ½
= ∞ when p(x) = 0
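
The three special cases can be checked with a small function. This is a minimal sketch; the function name is hypothetical, and p(x) = 0 is handled explicitly because the logarithm is undefined there.

```python
import math


def surprisal(p):
    """Surprisal in bits, -log2(p), of an outcome with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p == 0.0:
        return math.inf  # an impossible outcome is infinitely surprising
    return math.log2(1.0 / p)  # same value as -log2(p)


for p in (1.0, 0.5, 0.0):
    print(p, surprisal(p))
```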

Expectations in data

A: 00010001000100010001. . . 0001000100010001000100010001

B: 01110100110100100110. . . 1010111010111011000101100010

C: 00011000001010100000. . . 0010001000010000001000110000

Structure in data is easy to remember

Entropy

Used in
• coding theory
• statistical physics
• machine learning

Entropy

Entropy

In how many ways can N identical objects be allocated to M bins?

Entropy maximized when
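
The formulas on this slide are missing from the transcript. The sketch below assumes the slide follows the multiplicity argument in Bishop's Section 1.6: with n_i objects in bin i, the count is W = N! / (n_1! n_2! ... n_M!), and (1/N) ln W approaches −Σ_i p_i ln p_i with p_i = n_i / N. The bin counts are hypothetical, and `math.lgamma` is used to compute ln n! without overflow.

```python
import math


def log_multiplicity(counts):
    """ln W for W = N! / (n_1! * ... * n_M!), with N = sum(counts)."""
    n_total = sum(counts)
    return math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)


def entropy_nats(probs):
    """H = -sum_i p_i ln p_i, skipping empty bins."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)


# Hypothetical allocations of N = 1000 objects to M = 4 bins.
uniform_counts = [250, 250, 250, 250]
skewed_counts = [700, 100, 100, 100]

h_uniform = log_multiplicity(uniform_counts) / 1000
h_skewed = log_multiplicity(skewed_counts) / 1000
print(h_uniform, entropy_nats([0.25] * 4))
print(h_skewed, entropy_nats([0.7, 0.1, 0.1, 0.1]))
```

In this example the even allocation has the larger multiplicity per object, consistent with the standard result that the entropy of a discrete distribution over M states is largest when all states are equally likely.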

Entropy in Coding theory

x discrete with 8 possible states; how many bits to transmit the state of x?

All states equally likely
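
For 8 equally likely states the entropy is log2(8) = 3 bits, which matches a fixed 3-bit code. The sketch below checks this and, as an assumption about what the next slide shows, also works through the non-uniform example from Bishop's Section 1.6, where shorter codewords go to the more probable states; the function name is hypothetical.

```python
import math


def entropy_bits(probs):
    """H = -sum_i p_i log2 p_i, in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0.0)


# Uniform case from the slide: 8 states, each with probability 1/8.
h_uniform = entropy_bits([1 / 8] * 8)

# Assumed non-uniform example (Bishop, Section 1.6) with a prefix code.
skewed = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 64, 1 / 64, 1 / 64, 1 / 64]
codes = ["0", "10", "110", "1110", "111100", "111101", "111110", "111111"]
h_skewed = entropy_bits(skewed)
average_length = sum(p * len(code) for p, code in zip(skewed, codes))

print(h_uniform, h_skewed, average_length)
```

With the skewed distribution the average codeword length equals the entropy, 2 bits, so on average fewer bits are sent than with the 3-bit fixed-length code.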

Coding theory

Entropy in Twenty Questions

Intuitively: try to ask a question q whose answer is 50-50

Is the first letter between A and M?

question entropy = − [ p(Y) log2 p(Y) + p(N) log2 p(N) ]

For both answers equiprobable: entropy = − ½ · log2(½) − ½ · log2(½) = 1.0

For p(Y) = 1/1024: entropy = − (1/1024) · (−10) − (1023/1024) · log2(1023/1024) ≈ 0.0098 + 0.0014 ≈ 0.01
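
The entropy of a yes/no question can be computed directly. This sketch uses the usual definition with a leading minus sign, and takes p(Y) = 1/1024 for the lopsided case, since log2(1/1024) = −10 matches the −10 in the slide's arithmetic; the function name is hypothetical.

```python
import math


def question_entropy(p_yes):
    """Entropy in bits of a yes/no question answered 'yes' with prob p_yes."""
    total = 0.0
    for p in (p_yes, 1.0 - p_yes):
        if p > 0.0:  # a term with p = 0 contributes nothing
            total -= p * math.log2(p)
    return total


h_even = question_entropy(0.5)           # "Is the first letter between A and M?"
h_lopsided = question_entropy(1 / 1024)  # "Is it my left big toe?"
print(h_even, round(h_lopsided, 4))
```

A 50-50 question yields a full bit per answer, while a question that is almost always answered "No" yields about a hundredth of a bit, which is why the guess-one-object-at-a-time strategy is so slow.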

Change of variable x=g(y)