
Machine Learning and Pattern Recognition

A High-Level Overview

Prof. Anderson Rocha (main bulk of slides kindly provided by Prof. Sandra Avila)

Institute of Computing (IC/Unicamp)

MC886/MO444

RECOD: REasoning for COmplex Data

Today’s Agenda

● Regularization
  ○ The Problem of Overfitting
  ○ Diagnosing Bias vs. Variance
  ○ Cost Function
  ○ Regularized Linear Regression
  ○ Regularized Logistic Regression

The Problem of Overfitting

Example: Linear Regression

[Figure: three fits of the same Price vs. Size training points. Left: a straight line, 𝜃₀ + 𝜃₁x, that misses the curvature of the data — underfitting (high bias). Middle: a quadratic, 𝜃₀ + 𝜃₁x + 𝜃₂x², that follows the trend well. Right: a high-degree polynomial that passes through every training point — overfitting (high variance).]
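To make the three fits concrete, here is a minimal sketch in Python with scikit-learn (which the deck's references use); the toy price-vs-size data (size in thousands of square feet) and the degree choices 1, 2, and 10 are illustrative assumptions, not the slides' actual figures:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy "price vs. size" data (illustrative values only).
rng = np.random.default_rng(0)
size = np.sort(rng.uniform(0.5, 2.5, 20)).reshape(-1, 1)
price = 150 + 200 * size.ravel() - 30 * size.ravel() ** 2 + rng.normal(0, 10, 20)

# d=1 tends to underfit (high bias), d=2 matches the data-generating shape,
# and a high degree such as d=10 tends to overfit (high variance).
for d in (1, 2, 10):
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(size, price)
    print(f"degree {d}: training R^2 = {model.score(size, price):.3f}")

Note that a higher training R² alone does not mean a better model: the d=10 fit scores best on the training points precisely because it is chasing their noise.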

Example: Logistic Regression

[Figure: three decision boundaries separating × and + training points in the (x₁, x₂) plane. Left: a straight line — underfitting (high bias). Middle: a smooth curved boundary that fits the classes well. Right: a highly contorted boundary that wraps around every training point — overfitting (high variance).]
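The same behavior can be sketched for classification. Below is a hypothetical Python example: polynomial features feed a logistic regression, and a circular class boundary (made up for the demo) is exactly what a degree-1 model cannot represent but a degree-2 model can:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy two-class data in the (x1, x2) plane: class 1 lies inside a circle.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)

# degree=1 gives a straight-line boundary (underfits the circle);
# degree=2 can represent the circle; a very high degree risks overfitting.
for d in (1, 2, 15):
    clf = make_pipeline(PolynomialFeatures(degree=d),
                        LogisticRegression(max_iter=5000))
    clf.fit(X, y)
    print(f"degree {d}: training accuracy = {clf.score(X, y):.3f}")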

The Bias/Variance Tradeoff

A model’s generalization error can be expressed as the sum of three very different errors:

● Bias
  ○ Due to wrong assumptions, such as assuming that the data is linear when it is actually quadratic.
  ○ A high-bias model is most likely to underfit the training data.
● Variance
  ○ Due to the model’s excessive sensitivity to small variations in the training data.
  ○ A model with many degrees of freedom is likely to have high variance, and thus to overfit the training data.
● Irreducible error
  ○ Due to the noisiness of the data itself.
  ○ The only way to reduce this part of the error is to clean up the data.

Increasing a model’s complexity will typically increase its variance and reduce its bias. Conversely, reducing a model’s complexity increases its bias and reduces its variance. This is why it is called a tradeoff.

Diagnosing Bias vs. Variance

Bias/Variance

[Figure: three polynomial fits of Price vs. Size data, with degree d = 1 (underfitting, high bias), d = 2 (a good fit), and d = 4 (overfitting, high variance).]

Bias/Variance

Training error: J_train(𝜃) = (1/2m) Σᵢ₌₁ᵐ ( h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²

Cross-validation error: J_cv(𝜃) = (1/2m_cv) Σᵢ₌₁^m_cv ( h𝜃(x_cv⁽ⁱ⁾) − y_cv⁽ⁱ⁾ )²

[Figure: error versus degree of polynomial d. The training error decreases steadily as d grows, while the cross-validation error is high for small d, dips to a minimum, and rises again for large d.]

Diagnosing Bias vs. Variance

[Figure: the same plot of training and cross-validation error versus degree of polynomial d.]

Suppose your learning algorithm is performing less well than you were hoping: J_cv(𝜃) is high. Is it a bias problem or a variance problem?

● Bias (underfit): J_train(𝜃) will be high, and J_cv(𝜃) ≈ J_train(𝜃).
● Variance (overfit): J_train(𝜃) will be low, and J_cv(𝜃) ≫ J_train(𝜃).
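A sketch of this diagnostic in Python: fit increasing polynomial degrees, compute J_train and J_cv as halved mean squared errors, and read off the bias/variance pattern above. The synthetic quadratic data and the 70/30 split are assumptions for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: a noisy quadratic, split into training and validation sets.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 60).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() - 0.3 * x.ravel() ** 2 + rng.normal(0, 1, 60)
x_tr, x_cv, y_tr, y_cv = train_test_split(x, y, test_size=0.3, random_state=0)

for d in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(x_tr, y_tr)
    # J_train high and J_cv close to it -> bias (underfit);
    # J_train low and J_cv much larger  -> variance (overfit).
    j_train = mean_squared_error(y_tr, model.predict(x_tr)) / 2
    j_cv = mean_squared_error(y_cv, model.predict(x_cv)) / 2
    print(f"d={d}: J_train={j_train:8.3f}  J_cv={j_cv:8.3f}")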

Cost Function

Intuition

[Figure: Price vs. Size. Left: a quadratic fit, 𝜃₀ + 𝜃₁x + 𝜃₂x², that generalizes well. Right: a quartic fit, 𝜃₀ + 𝜃₁x + 𝜃₂x² + 𝜃₃x³ + 𝜃₄x⁴, that overfits.]

Suppose we penalize 𝜃₃ and 𝜃₄ to make them really small, e.g. by minimizing

(1/2m) Σᵢ₌₁ᵐ ( h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾ )² + 1000·𝜃₃² + 1000·𝜃₄²

The only way to make this cost small is to set 𝜃₃ ≈ 0 and 𝜃₄ ≈ 0, so the quartic effectively collapses back to a quadratic.

Regularization

Small values for the parameters 𝜃₀, 𝜃₁, ..., 𝜃n:
● “Simpler” hypothesis
● Less prone to overfitting

Housing example:
● Features: x₀, x₁, ..., x₁₀₀
● Parameters: 𝜃₀, 𝜃₁, 𝜃₂, ..., 𝜃₁₀₀

J(𝜃) = (1/2m) [ Σᵢ₌₁ᵐ ( h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾ )² + 𝜆 Σⱼ₌₁ⁿ 𝜃ⱼ² ]

Regularization

J(𝜃) = (1/2m) [ Σᵢ₌₁ᵐ ( h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾ )² + 𝜆 Σⱼ₌₁ⁿ 𝜃ⱼ² ]

Here 𝜆 is the regularization parameter. The first term pushes 𝜃 to fit the training data well; the second term keeps the parameters small.

In regularized linear regression, we choose 𝜃 to minimize J(𝜃).
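Transcribed directly into NumPy, the cost might look like the sketch below; the function name and the convention that X already carries the bias column x₀ = 1 are assumptions for illustration:

import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) [ sum of squared errors + lam * sum(theta_j^2 for j >= 1) ]."""
    m = len(y)
    errors = X @ theta - y                  # h_theta(x) - y for every example
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is not penalized
    return (errors @ errors + penalty) / (2 * m)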

What if 𝜆 is set to an extremely large value (perhaps too large for our problem, say 𝜆 = 10¹⁰)?

[Figure: Price vs. Size. With 𝜆 this large, the penalty dominates and drives 𝜃₁, ..., 𝜃n toward 0, leaving h𝜃(x) ≈ 𝜃₀ — a horizontal line that underfits the data.]
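As a quick check of this claim, scikit-learn's Ridge can stand in for regularized linear regression; its alpha parameter plays the role of 𝜆 (Ridge scales the penalty differently than the slides' J(𝜃), so the values are not directly comparable, but the qualitative effect is the same). The data below is made up for the demo:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
size = rng.uniform(0.5, 2.5, 30).reshape(-1, 1)
price = 150 + 200 * size.ravel() + rng.normal(0, 20, 30)

for alpha in (0.01, 1.0, 1e10):
    model = Ridge(alpha=alpha).fit(size, price)
    # With a huge alpha the slope is driven to ~0 and predictions flatten
    # out near the mean price: the model underfits, as the slide predicts.
    print(f"alpha={alpha:g}: slope={model.coef_[0]:.4f}, intercept={model.intercept_:.1f}")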

Regularized Linear Regression

Gradient Descent

repeat {
    𝜃₀ := 𝜃₀ − 𝛼 (1/m) Σᵢ₌₁ᵐ ( h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) x₀⁽ⁱ⁾
    𝜃ⱼ := 𝜃ⱼ − 𝛼 [ (1/m) Σᵢ₌₁ᵐ ( h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) xⱼ⁽ⁱ⁾ + (𝜆/m) 𝜃ⱼ ]
}
(simultaneously update for j = 0, 1, ..., n; 𝜃₀ is not regularized)

Grouping the 𝜃ⱼ terms, the regularized update can be rewritten as

𝜃ⱼ := 𝜃ⱼ (1 − 𝛼𝜆/m) − 𝛼 (1/m) Σᵢ₌₁ᵐ ( h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) xⱼ⁽ⁱ⁾

Since 1 − 𝛼𝜆/m is slightly less than 1, each step first shrinks 𝜃ⱼ a little and then applies the usual unregularized update.
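A minimal NumPy sketch of this loop, under the assumptions that X already includes the bias column x₀ = 1 and that 𝜃₀ is left unregularized; the function name and default hyperparameters are made up for illustration:

import numpy as np

def gradient_descent_ridge(X, y, alpha=0.01, lam=1.0, iters=1000):
    """Batch gradient descent for regularized linear regression."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # (1/m) * sum (h(x) - y) x_j
        grad[1:] += (lam / m) * theta[1:]  # add (lam/m) * theta_j for j >= 1
        theta -= alpha * grad              # simultaneous update of all theta_j
    return theta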

Normal Equation

Let X be the m×(n+1) design matrix (one training example per row, including x₀ = 1) and y the vector of targets. The regularized closed-form solution is

𝜃 = ( XᵀX + 𝜆M )⁻¹ Xᵀ y

where M is the (n+1)×(n+1) identity matrix with its top-left entry set to 0, so that 𝜃₀ is not regularized.
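The closed form translates almost line for line into NumPy; this sketch assumes X includes the bias column and uses np.linalg.solve instead of an explicit inverse, which is numerically preferable:

import numpy as np

def normal_equation_ridge(X, y, lam=1.0):
    """theta = (X^T X + lam * M)^{-1} X^T y, with M the identity
    whose top-left entry is zeroed so theta_0 is not regularized."""
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)

With the bias column of ones and 𝜆 > 0, the matrix XᵀX + 𝜆M is invertible even when XᵀX alone is singular (e.g., when there are more features than examples).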

http://melvincabatuan.github.io/Machine-Learning-Activity-4/

Regularized Logistic Regression

Gradient Descent

repeat {
    𝜃₀ := 𝜃₀ − 𝛼 (1/m) Σᵢ₌₁ᵐ ( h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) x₀⁽ⁱ⁾
    𝜃ⱼ := 𝜃ⱼ − 𝛼 [ (1/m) Σᵢ₌₁ᵐ ( h𝜃(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) xⱼ⁽ⁱ⁾ + (𝜆/m) 𝜃ⱼ ]
}
(simultaneously update for j = 0, 1, ..., n)

The update looks identical to regularized linear regression, but here the hypothesis is the sigmoid, h𝜃(x) = 1 / (1 + e^(−𝜃ᵀx)), and the regularized cost is

J(𝜃) = −(1/m) Σᵢ₌₁ᵐ [ y⁽ⁱ⁾ log h𝜃(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h𝜃(x⁽ⁱ⁾)) ] + (𝜆/2m) Σⱼ₌₁ⁿ 𝜃ⱼ²
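A NumPy sketch of the logistic version; as before, X is assumed to carry the bias column, y to hold 0/1 labels, and the names are made up for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_logistic(X, y, alpha=0.1, lam=1.0, iters=1000):
    """Regularized logistic regression by batch gradient descent."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m  # same form as the linear case,
        grad[1:] += (lam / m) * theta[1:]          # but with the sigmoid hypothesis
        theta -= alpha * grad
    return theta

In practice, scikit-learn's LogisticRegression applies L2 regularization by default; its C parameter is an inverse regularization strength, so smaller C means stronger regularization (the opposite direction of 𝜆).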

http://melvincabatuan.github.io/Machine-Learning-Activity-4/

References

Machine Learning Books
● A. Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow, Chap. 4
● C. M. Bishop, Pattern Recognition and Machine Learning, Chap. 3

Machine Learning Courses
● https://www.coursera.org/learn/machine-learning, Weeks 3 & 6
