
Page 1: Probability Theory and Parameter Estimation II

Probability Theory and Parameter Estimation II

Page 2: Probability Theory and Parameter Estimation II

Least Squares and Gauss

● How to solve the least squares problem? Carl Friedrich Gauss's solution dates to 1794 (age 18).
● Why solve the least squares problem? Gauss's answer came in 1822 (age 46): the least-squares solution is optimal in the sense that it is the best linear unbiased estimator of the polynomial coefficients.
● Assumptions: the errors have zero mean and equal variances.

http://en.wikipedia.org/wiki/Least_squares
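As a concrete companion to the first question, here is a minimal sketch of polynomial least squares in NumPy; the data, names, and settings are illustrative only, not taken from the slides:

```python
import numpy as np

# Hypothetical toy data: noisy samples of sin(2*pi*x); purely
# illustrative, not from the slides.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

M = 3  # polynomial order
A = np.vander(x, M + 1, increasing=True)  # design matrix: columns 1, x, x^2, x^3

# Least-squares solution: the w minimizing the sum of squared
# residuals ||A w - t||^2
w, *_ = np.linalg.lstsq(A, t, rcond=None)
print(w)
```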

Page 3: Probability Theory and Parameter Estimation II

Three Approaches

posterior ∝ likelihood × prior

p(Parameters | Data) ∝ p(Data | Parameters) × p(Parameters)

1. Find parameters that maximize the (log) likelihood.
2. Find parameters that maximize the posterior (MAP).
3. Find the posterior itself (fully Bayesian).

Page 4: Probability Theory and Parameter Estimation II

Maximum Likelihood I

Maximize log likelihood

Surprise! Maximizing the log likelihood is equivalent to minimizing the sum-of-squares error function!
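The equations this slide refers to were lost in extraction; the following is a sketch of the standard Gaussian-noise model from Bishop's PRML, which these slides appear to follow (the symbols $\mathbf{w}$, $\beta$, and $y(x_n, \mathbf{w})$ come from the book, not the transcript):

```latex
% Likelihood under Gaussian noise with precision beta
\[
p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)
  = \prod_{n=1}^{N} \mathcal{N}\!\bigl(t_n \mid y(x_n, \mathbf{w}),\, \beta^{-1}\bigr)
\]
% Log likelihood: only the first term depends on w, so maximizing
% over w is exactly minimizing the sum-of-squares error
\[
\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)
  = -\frac{\beta}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^{2}
    + \frac{N}{2} \ln \beta - \frac{N}{2} \ln (2\pi)
\]
```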

Page 5: Probability Theory and Parameter Estimation II

Maximum Likelihood II

Determine $\mathbf{w}_{\mathrm{ML}}$ by minimizing the sum-of-squares error, $E(\mathbf{w})$.

Maximize log likelihood
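The estimate for the noise precision is missing from the transcript; assuming the slide follows PRML, maximizing the log likelihood with respect to $\beta$ gives

```latex
\[
\frac{1}{\beta_{\mathrm{ML}}}
  = \frac{1}{N} \sum_{n=1}^{N}
    \bigl\{ y(x_n, \mathbf{w}_{\mathrm{ML}}) - t_n \bigr\}^{2}
\]
```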

Page 6: Probability Theory and Parameter Estimation II

Predictive Distribution
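The formula on this slide was lost; plugging the maximum-likelihood estimates back into the noise model gives the predictive distribution below (a reconstruction, assuming the slide matches PRML):

```latex
\[
p(t \mid x, \mathbf{w}_{\mathrm{ML}}, \beta_{\mathrm{ML}})
  = \mathcal{N}\!\bigl(t \mid y(x, \mathbf{w}_{\mathrm{ML}}),\, \beta_{\mathrm{ML}}^{-1}\bigr)
\]
```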

Page 7: Probability Theory and Parameter Estimation II

MAP: A Step towards Bayes

Determine $\mathbf{w}_{\mathrm{MAP}}$ by minimizing the regularized sum-of-squares error.

Maximum posterior ∝ likelihood × prior over parameters (α is the hyper-parameter).
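The annotated equation these labels pointed at did not survive extraction; a reconstruction of the standard PRML setup, assuming the slide follows the book:

```latex
% Prior over the weights, governed by hyper-parameter alpha
\[
p(\mathbf{w} \mid \alpha)
  = \mathcal{N}\!\bigl(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}\bigr)
\]
% Posterior is proportional to likelihood times prior
\[
p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta)
  \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha)
\]
```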

Page 8: Probability Theory and Parameter Estimation II

MAP: A Step towards Bayes

Determine $\mathbf{w}_{\mathrm{MAP}}$ by minimizing the regularized sum-of-squares error.

Surprise! Maximizing the posterior is equivalent to minimizing the regularized sum-of-squares error function!
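The error function itself is missing; taking the negative log of the posterior (a sketch, following PRML's derivation) yields the regularized sum-of-squares error named above:

```latex
\[
\widetilde{E}(\mathbf{w})
  = \frac{\beta}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^{2}
    + \frac{\alpha}{2}\, \mathbf{w}^{\mathsf{T}} \mathbf{w}
\]
```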

Page 9: Probability Theory and Parameter Estimation II

Three Approaches

posterior ∝ likelihood × prior

p(Parameters | Data) ∝ p(Data | Parameters) × p(Parameters)

1. Find parameters that maximize the (log) likelihood.
2. Find parameters that maximize the posterior (MAP).
3. Find the posterior itself (fully Bayesian).

$$p(t_0 \mid X, x_0) = \int p(t_0 \mid X, x_0, Y)\, p(Y \mid X, x_0)\, dY$$

Page 10: Probability Theory and Parameter Estimation II

Bayesian Curve Fitting
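The equations on this slide were lost; in PRML, Bayesian curve fitting marginalizes the predictive distribution over the weights rather than fixing a point estimate. A reconstruction under the assumption that the slide follows the book:

```latex
% Marginalize over w instead of plugging in a point estimate;
% for the Gaussian model this integral is tractable
\[
p(t \mid x, \mathbf{x}, \mathbf{t})
  = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\, d\mathbf{w}
  = \mathcal{N}\!\bigl(t \mid m(x),\, s^{2}(x)\bigr)
\]
```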

Page 11: Probability Theory and Parameter Estimation II

Bayesian Predictive Distribution

[Figure: mean of the predictive distribution.]

Page 12: Probability Theory and Parameter Estimation II

Review

posterior ∝ likelihood × prior

1. Maximize the (log) likelihood: yields parameters; can lead to over-fitting.
2. Maximize the posterior (MAP): yields parameters; avoids over-fitting.
3. Find the posterior (fully Bayesian): yields a distribution.


Page 14: Probability Theory and Parameter Estimation II

Model Selection

Cross-Validation
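The S-fold cross-validation diagram from this slide is not in the transcript; below is a minimal sketch of using cross-validation to choose the polynomial order M, in NumPy (all names and data are illustrative, not from the slides):

```python
import numpy as np

def cv_error(x, t, M, S=5):
    """Mean held-out sum-of-squares error of an order-M polynomial,
    estimated by S-fold cross-validation."""
    folds = np.array_split(np.random.permutation(len(x)), S)
    errs = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(x)), held_out)
        # Fit on the training folds only, evaluate on the held-out fold
        w = np.polyfit(x[train], t[train], deg=M)
        resid = np.polyval(w, x[held_out]) - t[held_out]
        errs.append(np.mean(resid ** 2))
    return np.mean(errs)

# Hypothetical data; pick the M with the smallest cross-validated error.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)
best_M = min(range(1, 10), key=lambda M: cv_error(x, t, M))
print(best_M)
```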

Page 15: Probability Theory and Parameter Estimation II

Curse of Dimensionality

Page 16: Probability Theory and Parameter Estimation II

Curse of Dimensionality

Page 17: Probability Theory and Parameter Estimation II

Curse of Dimensionality

Page 18: Probability Theory and Parameter Estimation II

Curse of Dimensionality

[Figure: polynomial curve fitting, M = 3.]

[Figure: Gaussian densities in higher dimensions.]
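None of the figures on these four slides survived extraction. As one concrete illustration of the curse (taken from PRML, and assumed to match what the slides showed): the fraction of the volume of a unit sphere in D dimensions that lies within a thin shell of thickness ε at the surface is

```latex
\[
1 - (1 - \epsilon)^{D} \;\longrightarrow\; 1
\quad \text{as } D \to \infty ,
\]
```

so in high dimensions almost all of the sphere's volume concentrates near its surface.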

Page 19: Probability Theory and Parameter Estimation II

Decision Theory

Inference step: determine either $p(t \mid x)$ (regression) or $p(C_k \mid x)$ (classification).

Decision step: for a given x, determine the optimal action a.

To minimize misclassification, maximize the posterior. For example, for classifying medical images:

$$p(\text{cancer} \mid \text{image}) = \frac{p(\text{image} \mid \text{cancer})\; p(\text{cancer})}{p(\text{image})}$$

Page 20: Probability Theory and Parameter Estimation II

Minimum Misclassification Rate
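This slide's figure and formula are missing from the transcript; the standard two-class expression from PRML (assumed to be what was shown) for the probability of a misclassification over decision regions $R_1$ and $R_2$ is

```latex
\[
p(\text{mistake})
  = \int_{R_1} p(x, C_2)\, dx + \int_{R_2} p(x, C_1)\, dx ,
\]
```

which is minimized by assigning each $x$ to the class with the larger posterior $p(C_k \mid x)$.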

Page 21: Probability Theory and Parameter Estimation II

Minimum Expected Loss

Example: classify medical images as ‘cancer’ or ‘normal’

Loss function (rows: truth; columns: decision):
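The matrix entries themselves were stripped; PRML's loss matrix for this example (assumed to be the one the slide reproduces) is

```latex
% Rows: true class (cancer, normal); columns: decision (cancer, normal).
% Missing a cancer costs 1000; a false alarm costs 1.
\[
L =
\begin{pmatrix}
  0 & 1000 \\
  1 & 0
\end{pmatrix}
\]
```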

Page 22: Probability Theory and Parameter Estimation II

Minimum Expected Loss

Regions $R_j$ are chosen to minimize the expected loss, where $j$ indexes the decision region an element falls in and $k$ its real class.
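The formula the annotations refer to was stripped; PRML's expected loss (assumed to be what the slide showed) is

```latex
% L_kj: loss for deciding class j when the real class is k;
% R_j: decision region j
\[
\mathbb{E}[L] = \sum_{k} \sum_{j} \int_{R_j} L_{kj}\, p(x, C_k)\, dx
\]
```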

Page 23: Probability Theory and Parameter Estimation II

Reject Option
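No content survived for this slide; in PRML the reject option (assumed to be what was shown) declines to classify x whenever the largest posterior falls below a threshold θ:

```latex
\[
\text{reject } x \quad \text{if} \quad \max_{k}\, p(C_k \mid x) < \theta
\]
```

With θ = 1 every example is rejected; with θ < 1/K (for K classes) none are.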

Page 24: Probability Theory and Parameter Estimation II

Why Separate Inference and Decision?

• Minimizing risk (the loss matrix may change over time)
• Reject option
• Unbalanced class priors
• Combining models

Page 25: Probability Theory and Parameter Estimation II

Decision Theory for Regression

Inference step: determine $p(t \mid x)$.

Decision step: for a given x, make the optimal prediction, $y(x)$, for $t$.

Loss function:
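The loss expression itself is missing; PRML's expected loss for regression (assumed to match the slide) is

```latex
\[
\mathbb{E}[L] = \iint L\bigl(t, y(x)\bigr)\, p(x, t)\, dx\, dt
\]
```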

Page 26: Probability Theory and Parameter Estimation II

The Squared Loss Function

As expected, the optimal prediction turns out to be the conditional mean of $t$ given $x$.
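The derivation did not survive extraction; a sketch of the standard argument from PRML, assumed to be what the slide showed:

```latex
% Expected squared loss
\[
\mathbb{E}[L] = \iint \bigl\{ y(x) - t \bigr\}^{2} p(x, t)\, dx\, dt
\]
% Setting the functional derivative with respect to y(x) to zero
% and solving gives the conditional mean:
\[
y(x) = \frac{\int t\, p(x, t)\, dt}{p(x)} = \mathbb{E}_t[t \mid x]
\]
```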