Parameter Estimation: Maximum Likelihood Estimation
Chapter 3 (Duda et al.) – Sections 3.1-3.2
CS479/679 Pattern Recognition, Dr. George Bebis


Page 1:

Parameter Estimation: Maximum Likelihood Estimation

Chapter 3 (Duda et al.) – Sections 3.1-3.2

CS479/679 Pattern Recognition
Dr. George Bebis

Page 2:

Parameter Estimation

• Bayesian Decision Theory allows us to design an optimal classifier, given that we know P(ωi) and p(x/ωi):

• Estimating P(ωi) is usually not difficult.

• Estimating p(x/ωi) is more difficult:
– The number of samples is often too small.
– The dimensionality of the feature space is large.

P(ωj/x) = p(x/ωj) P(ωj) / p(x)
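As a small concrete illustration of the Bayes rule above (not from the slides: the two classes, priors, and Gaussian parameters below are made-up values), the posterior P(ωj/x) can be computed by normalizing the products p(x/ωj)P(ωj):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density p(x/w_j) with mean mu and std sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical two-class problem: priors P(w_j) and class-conditional parameters.
priors = [0.6, 0.4]
params = [(0.0, 1.0), (2.0, 1.0)]   # (mu_j, sigma_j) for each class

def posteriors(x):
    """P(w_j/x) = p(x/w_j) P(w_j) / p(x) for every class j."""
    joint = [gaussian_pdf(x, mu, s) * P for (mu, s), P in zip(params, priors)]
    evidence = sum(joint)            # p(x), the normalizing factor
    return [j / evidence for j in joint]
```

Because p(x) normalizes the numerators, the posteriors always sum to one, and the class whose mean is closer to x wins here.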

Page 3:

Parameter Estimation (cont’d)

• Assumptions:
– A set of training samples D = {x1, x2, ..., xn}, where the samples were drawn according to p(x|ωj).
– p(x|ωj) has some known parametric form:

e.g., p(x/ωj) ~ N(μj, Σj), also denoted as p(x/θj) where θj = (μj, Σj)

• Parameter estimation problem: Given D, find the best possible θj.

Page 4:

Main Methods in Parameter Estimation

• Maximum Likelihood (ML)

• Bayesian Estimation (BE)

Page 5:

Main Methods in Parameter Estimation

• Maximum Likelihood (ML)
– Assumes that the values of the parameters are fixed but unknown.
– The best estimate is obtained by maximizing the probability of obtaining the samples x1, x2, ..., xn actually observed (i.e., the training data):

p(x1, x2, ..., xn/θ) = p(D/θ)

Page 6:

Main Methods in Parameter Estimation (cont’d)

• Bayesian Estimation (BE)
– Assumes that the parameters θ are random variables that have some known a-priori distribution p(θ).
– Estimates a distribution rather than making point estimates like ML:

p(x/D) = ∫ p(x/θ) p(θ/D) dθ

Note: the BE solution might not be of the parametric form assumed!
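A minimal numeric sketch of this integral, assuming a unit-variance Gaussian with unknown mean and approximating the integral by a sum over a grid of candidate θ values (the data, grid, and flat prior are made up for illustration):

```python
import math

def gauss(x, mu, var=1.0):
    """Unit-variance-by-default Gaussian density p(x/theta)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

D = [1.8, 2.2, 2.0, 1.9]                        # training samples (illustrative)
thetas = [i * 0.01 for i in range(-500, 501)]   # grid of candidate means

# Posterior over the grid: p(theta/D) ∝ p(D/theta) p(theta), with a flat prior here.
weights = [math.prod(gauss(xk, th) for xk in D) for th in thetas]
Z = sum(weights)
posterior = [w / Z for w in weights]

def predictive(x):
    """p(x/D) ≈ Σ_theta p(x/theta) p(theta/D): a weighted mixture of densities."""
    return sum(gauss(x, th) * w for th, w in zip(thetas, posterior))
```

The predictive density is a weighted average over all candidate θ, which illustrates the note above: in general the BE result is a mixture and need not be of the assumed parametric form.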

Page 7:

ML Estimation - Assumptions

• Let us assume c classes and that the training data consist of c sets (i.e., one for each class):

D1, D2, ..., Dc

• The samples in Dj have been drawn independently according to p(x/ωj).

• p(x/ωj) has a known parametric form with parameters θj:

e.g., θj = (μj, Σj) for a Gaussian distribution

Page 8:

ML Estimation - Problem Formulation and Solution

• Problem: given D1, D2, ..., Dc and a model for each class, estimate the parameters θ1, θ2, ..., θc.

• If the samples in Dj give no information about θi (i ≠ j), we need to solve c independent problems (i.e., one for each class).

• The ML estimate θ̂ for D = {x1, x2, ..., xn} is the value of θ that maximizes p(D/θ) (i.e., best supports the training data):

p(D/θ) = p(x1, x2, ..., xn/θ) = ∏_{k=1}^{n} p(x_k/θ)   (using the independence assumption)

Page 9:

ML Estimation - Problem Formulation and Solution (cont’d)

• How should we find the maximum of p(D/θ)?

∇_θ p(D/θ) = 0

where ∇_θ = [∂/∂θ_1, ∂/∂θ_2, ..., ∂/∂θ_p]^t is the gradient with respect to the parameters.

Page 10:

ML Estimation Using Log-Likelihood

• Consider the log-likelihood for simplicity:

p(D/θ) = p(x1, x2, ..., xn/θ) = ∏_{k=1}^{n} p(x_k/θ)

ln p(D/θ) = ∑_{k=1}^{n} ln p(x_k/θ)

• The solution θ̂ maximizes ln p(D/θ):

θ̂ = arg max_θ ln p(D/θ)

∇_θ ln p(D/θ) = 0   or   ∑_{k=1}^{n} ∇_θ ln p(x_k/θ) = 0
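A quick numeric check (with made-up samples) that the log of the likelihood product equals the sum of per-sample log-likelihoods, which is why the log form is more convenient, and numerically safer, to maximize:

```python
import math

samples = [0.5, 1.5, 1.0, 2.0]   # illustrative training data
mu = 1.0                          # candidate parameter (unit-variance Gaussian)

def pdf(x, mu):
    """Unit-variance Gaussian density p(x/mu)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

likelihood = math.prod(pdf(x, mu) for x in samples)           # p(D/mu)
log_likelihood = sum(math.log(pdf(x, mu)) for x in samples)   # ln p(D/mu)
# math.log(likelihood) agrees with log_likelihood up to rounding,
# but the sum avoids floating-point underflow when n is large.
```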

Page 11:

ML Estimation Using Log-Likelihood (cont’d)

[Figure: p(D/θ) and ln p(D/θ) plotted as functions of θ for training data with unknown mean and known variance; both curves peak at the same value θ̂ = μ̂.]

Page 12:

ML for Multivariate Gaussian Density: Case of Unknown θ=μ

• Consider p(x/μ) ~ N(μ, Σ), with Σ known:

ln p(x_k/μ) = −(1/2)(x_k − μ)^t Σ⁻¹ (x_k − μ) − (d/2) ln 2π − (1/2) ln |Σ|

• Computing the gradient, we have:

∇_μ ln p(D/μ) = ∑_{k=1}^{n} ∇_μ ln p(x_k/μ) = ∑_{k=1}^{n} Σ⁻¹ (x_k − μ)

Page 13:

• Setting ∇_μ ln p(D/μ) = 0, we have:

∑_{k=1}^{n} Σ⁻¹ (x_k − μ) = 0   or   ∑_{k=1}^{n} x_k − nμ = 0

• The solution μ̂ is given by:

μ̂ = (1/n) ∑_{k=1}^{n} x_k

The ML estimate is simply the “sample mean”.
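A small sanity check with made-up data: maximizing the unit-variance Gaussian log-likelihood over a grid of candidate μ values recovers the sample mean, matching the closed-form solution above:

```python
import math

samples = [0.5, 1.5, 1.0, 2.0]   # illustrative data; the sample mean is 1.25

def log_likelihood(mu):
    # Terms that do not depend on mu are dropped; they do not move the maximum.
    return sum(-0.5 * (x - mu) ** 2 for x in samples)

grid = [i * 0.01 for i in range(-200, 401)]   # candidate means in [-2, 4]
mu_hat = max(grid, key=log_likelihood)
sample_mean = sum(samples) / len(samples)
# mu_hat matches sample_mean up to the grid resolution (0.01)
```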

Page 14:

Special Case of ML: Maximum A-Posteriori Estimator (MAP)

• Assume that θ is a random variable with known p(θ).

• Consider:

p(θ/D) = p(D/θ) p(θ) / p(D)

• Maximize p(θ/D), or equivalently p(D/θ)p(θ), or ln p(D/θ)p(θ):

p(D/θ) p(θ) = ∏_{k=1}^{n} p(x_k/θ) p(θ)   or   ln [p(D/θ) p(θ)] = ∑_{k=1}^{n} ln p(x_k/θ) + ln p(θ)

Page 15:

Special Case of ML: Maximum A-Posteriori Estimator (MAP) (cont’d)

• What happens when p(θ) is uniform? Then ln p(θ) is the same constant for every θ, so maximizing

∑_{k=1}^{n} ln p(x_k/θ) + ln p(θ)

is the same as maximizing

∑_{k=1}^{n} ln p(x_k/θ)

MAP is equivalent to ML.

Page 16:

MAP for Multivariate Gaussian Density: Case of Unknown θ=μ

• Assume:

p(x/μ) ~ N(μ, Diag(σ_μ²))

p(μ) ~ N(μ₀, Diag(σ_{μ0}²))   where μ₀ and σ_{μ0} are known

• MAP maximizes ln p(D/μ)p(μ):

maximize ∑_{k=1}^{n} ln p(x_k/μ) + ln p(μ)

∇_μ ( ∑_{k=1}^{n} ln p(x_k/μ) + ln p(μ) ) = 0

Page 17:

MAP for Multivariate Gaussian Density: Case of Unknown θ=μ (cont’d)

• Setting the gradient to zero gives (componentwise):

∑_{k=1}^{n} (1/σ_μ²)(x_k − μ̂) − (1/σ_{μ0}²)(μ̂ − μ₀) = 0

or

μ̂ = (n σ_{μ0}² x̄ + σ_μ² μ₀) / (n σ_{μ0}² + σ_μ²),   where x̄ = (1/n) ∑_{k=1}^{n} x_k

• If σ_{μ0}²/σ_μ² ≫ 1 (a very broad prior), then μ̂ ≈ (1/n) ∑_{k=1}^{n} x_k, the ML estimate.

• What happens when σ_{μ0} = 0? Then μ̂ = μ₀: the prior dominates and the data are ignored.
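A hedged sketch of the univariate version of this estimate, using the closed form obtained by solving the gradient equation for μ (all numbers below are illustrative):

```python
# Univariate MAP estimate of a Gaussian mean with a Gaussian prior N(mu0, s02)
# and known data variance s2; data and parameter values are made up.
samples = [2.1, 1.9, 2.3, 2.0, 1.7]
s2 = 1.0                # data variance (sigma_mu^2)
mu0, s02 = 0.0, 0.25    # prior mean and prior variance (sigma_mu0^2)

n = len(samples)
xbar = sum(samples) / n

# Solving (1/s2) * sum(x_k - mu) - (1/s02) * (mu - mu0) = 0 for mu:
mu_map = (n * s02 * xbar + s2 * mu0) / (n * s02 + s2)

# A very broad prior (s02 -> infinity) recovers the ML estimate xbar;
# a degenerate prior (s02 -> 0) would force mu_map -> mu0.
mu_map_broad = (n * 1e12 * xbar + s2 * mu0) / (n * 1e12 + s2)
```

The MAP estimate lands between the prior mean μ₀ and the sample mean x̄, weighted by the two variances.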

Page 18:

ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ²)

• Assume p(x/θ) ~ N(μ, σ²), where θ = (θ₁, θ₂) = (μ, σ²):

ln p(x_k/θ) = −(1/2) ln 2πθ₂ − (1/(2θ₂)) (x_k − θ₁)²

or, equivalently,

ln p(x_k/θ) = −(1/2) ln 2πσ² − (1/(2σ²)) (x_k − μ)²

Page 19:

ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ²) (cont’d)

• Setting ∑_{k=1}^{n} ∇_θ ln p(x_k/θ) = 0, the solutions are given by:

μ̂ = (1/n) ∑_{k=1}^{n} x_k   (sample mean)

σ̂² = (1/n) ∑_{k=1}^{n} (x_k − μ̂)²   (sample variance)
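The two closed-form solutions can be computed directly; a short illustration with made-up data:

```python
samples = [4.0, 6.0, 5.0, 7.0, 3.0]   # illustrative data
n = len(samples)

mu_hat = sum(samples) / n                              # sample mean: (1/n) sum x_k
var_hat = sum((x - mu_hat) ** 2 for x in samples) / n  # sample variance, note the 1/n
```

Note the 1/n factor in the variance: as the later slides on bias show, this ML estimate is biased, and dividing by n−1 instead gives the unbiased version.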

Page 20:

ML for Multivariate Gaussian Density: Case of Unknown θ=(μ,Σ)

• In the general case (i.e., multivariate Gaussian), the solutions are:

μ̂ = (1/n) ∑_{k=1}^{n} x_k   (sample mean)

Σ̂ = (1/n) ∑_{k=1}^{n} (x_k − μ̂)(x_k − μ̂)^t   (sample covariance)

Page 21:

Biased and Unbiased Estimates

• An estimate θ̂ is unbiased when

E[θ̂] = θ

where θ is the true value.

• The ML estimate μ̂ is unbiased, i.e.,

E[μ̂] = μ

• The ML estimates σ̂² and Σ̂ are biased:

E[σ̂²] = ((n−1)/n) σ²   and   E[Σ̂] = ((n−1)/n) Σ

Page 22:

Biased and Unbiased Estimates (cont’d)

• The following are unbiased estimates of σ² and Σ:

(1/(n−1)) ∑_{k=1}^{n} (x_k − μ̂)²

(1/(n−1)) ∑_{k=1}^{n} (x_k − μ̂)(x_k − μ̂)^t
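The bias can be verified exactly with a tiny enumeration (a toy three-point distribution chosen so the arithmetic is exact, not from the slides): listing every equally likely size-n sample and averaging the two estimators shows E[σ̂²] = ((n−1)/n)σ², while the 1/(n−1) estimator is unbiased.

```python
from fractions import Fraction as F
from itertools import product

values = [F(0), F(1), F(2)]                                # uniform toy distribution
true_mean = sum(values) / 3                                # = 1
true_var = sum((v - true_mean) ** 2 for v in values) / 3   # = 2/3
n = 2

E_biased = F(0)    # expectation of (1/n) * sum (x_k - mu_hat)^2
E_unbiased = F(0)  # expectation of (1/(n-1)) * sum (x_k - mu_hat)^2
for sample in product(values, repeat=n):   # all 3^n equally likely samples
    m = sum(sample) / n                    # sample mean mu_hat
    ss = sum((x - m) ** 2 for x in sample)
    E_biased += (ss / n) / 3 ** n
    E_unbiased += (ss / (n - 1)) / 3 ** n

# E_biased == ((n-1)/n) * true_var, while E_unbiased == true_var exactly.
```

Exact rational arithmetic (`Fraction`) makes the equalities hold exactly rather than up to floating-point error.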

Page 23:

Comments about ML

• ML estimation is usually simpler than alternative methods.

• It provides more accurate estimates as the number of training samples increases.

• If the model chosen for p(x/θ) is correct and the independence assumptions among the samples hold, ML will give very good results.