
Parameter Estimation: Maximum Likelihood Estimation

Chapter 3 (Duda et al.) – Sections 3.1-3.2

CS479/679 Pattern Recognition
Dr. George Bebis

Parameter Estimation

• Bayesian Decision Theory allows us to design an optimal classifier given that we know P(ωi) and p(x/ωi):

$$P(\omega_j/\mathbf{x}) = \frac{p(\mathbf{x}/\omega_j)\,P(\omega_j)}{p(\mathbf{x})}$$

• Estimating P(ωi) is usually not difficult.

• Estimating p(x/ωi) is more difficult:
– The number of samples is often too small.
– The dimensionality of the feature space is large.
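To make the rule concrete, here is a minimal NumPy/SciPy sketch (not from the slides; the priors, means, and variances are illustrative assumptions) that evaluates the posteriors P(ωj/x) for a scalar feature x:

```python
# Posterior P(w_j|x) from Bayes' rule, assuming the class priors and
# class-conditional Gaussian densities below are known (illustrative values).
import numpy as np
from scipy.stats import norm

priors = np.array([0.6, 0.4])                      # P(w_1), P(w_2)
means, stds = np.array([0.0, 2.0]), np.array([1.0, 1.0])

x = 1.2
likelihoods = norm.pdf(x, loc=means, scale=stds)   # p(x|w_j)
posteriors = likelihoods * priors / np.sum(likelihoods * priors)
print(posteriors)                                  # P(w_j|x) for j = 1, 2
```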

Parameter Estimation (cont’d)

• Assumptions:
– A set of training samples D = {x1, x2, ..., xn}, where the samples were drawn according to p(x/ωj).
– p(x/ωj) has some known parametric form, e.g., p(x/ωi) ~ N(μi, Σi), also denoted as p(x/θ) where θ = (μi, Σi).

• Parameter estimation problem: given D, find the best possible estimate θ̂.

Main Methods in Parameter Estimation

• Maximum Likelihood (ML)

• Bayesian Estimation (BE)

Main Methods in Parameter Estimation

• Maximum Likelihood (ML)
– Assumes that the values of the parameters are fixed but unknown.
– The best estimate is obtained by maximizing the probability of obtaining the samples x1, x2, ..., xn actually observed (i.e., the training data):

$$p(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n/\boldsymbol{\theta}) = p(D/\boldsymbol{\theta})$$
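As a quick illustration (made-up data and candidate parameter values, not from the slides), the likelihood of a dataset is the product of the per-sample densities:

```python
# Minimal sketch: the likelihood p(D|theta) as a product of per-sample
# densities, for a 1D Gaussian with known variance and unknown mean.
import numpy as np
from scipy.stats import norm

D = np.array([1.1, 0.4, 1.9, 0.7])         # observed samples x_1..x_n
for mu in (0.0, 1.0):                      # two candidate values of theta = mu
    likelihood = np.prod(norm.pdf(D, loc=mu, scale=1.0))
    print(mu, likelihood)                  # mu = 1.0 supports D better
```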

Main Methods in Parameter Estimation (cont’d)

• Bayesian Estimation (BE)
– Assumes that the parameters θ are random variables that have some known a-priori distribution p(θ).
– Estimates a distribution rather than making point estimates like ML:

$$p(\mathbf{x}/D) = \int p(\mathbf{x}/\boldsymbol{\theta})\,p(\boldsymbol{\theta}/D)\,d\boldsymbol{\theta}$$

Note: the BE solution might not be of the parametric form assumed!
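A minimal sketch of this integral for a 1D Gaussian with unknown mean, approximating p(θ/D) ∝ p(D/θ)p(θ) on a grid (the data, the N(0, 2²) prior, and the known unit variance are all illustrative assumptions):

```python
# Grid approximation of the BE predictive density p(x|D).
import numpy as np
from scipy.stats import norm

D = np.array([1.1, 0.4, 1.9, 0.7])
theta = np.linspace(-5, 5, 2001)               # grid over the unknown mean
dtheta = theta[1] - theta[0]

# p(theta|D) proportional to p(D|theta) p(theta), with a N(0, 2^2) prior
log_post = norm.logpdf(D[:, None], loc=theta, scale=1.0).sum(axis=0) \
           + norm.logpdf(theta, loc=0.0, scale=2.0)
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta                    # normalize to a density

x = 1.5
p_x_given_D = np.sum(norm.pdf(x, loc=theta, scale=1.0) * post) * dtheta
print(p_x_given_D)
```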

ML Estimation - Assumptions

• Let us assume c classes and that the training data consist of c sets (i.e., one for each class): D1, D2, ..., Dc.

• Samples in Dj have been drawn independently according to p(x/ωj).

• p(x/ωj) has known parametric form with parameters θj, e.g., θj = (μj, Σj) for a Gaussian distribution.

ML Estimation - Problem Formulation and Solution

• Problem: given D1, D2, ..., Dc and a model for each class, estimate θ1, θ2, ..., θc.

• If the samples in Dj give no information about θi (i ≠ j), we need to solve c independent problems (i.e., one for each class).

• The ML estimate θ̂ for D = {x1, x2, ..., xn} is the value that maximizes p(D/θ) (i.e., best supports the training data):

$$p(D/\boldsymbol{\theta}) = p(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n/\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k/\boldsymbol{\theta}) \quad \text{(using the independence assumption)}$$

ML Estimation - Problem Formulation and Solution (cont’d)

• How should we find the maximum of p(D/θ)?

$$\nabla_{\boldsymbol{\theta}}\,p(D/\boldsymbol{\theta}) = 0$$

where $\nabla_{\boldsymbol{\theta}} = \left[\dfrac{\partial}{\partial\theta_1}, \ldots, \dfrac{\partial}{\partial\theta_p}\right]^t$

ML Estimation Using Log-Likelihood

• Consider the log-likelihood for simplicity:

$$\ln p(D/\boldsymbol{\theta}) = \ln \prod_{k=1}^{n} p(\mathbf{x}_k/\boldsymbol{\theta}) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k/\boldsymbol{\theta})$$

• The solution θ̂ maximizes ln p(D/θ):

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \ln p(D/\boldsymbol{\theta})$$

$$\nabla_{\boldsymbol{\theta}} \ln p(D/\boldsymbol{\theta}) = 0 \quad \text{or} \quad \sum_{k=1}^{n} \nabla_{\boldsymbol{\theta}} \ln p(\mathbf{x}_k/\boldsymbol{\theta}) = 0$$
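In simple cases the maximizer can also be found numerically. A minimal sketch (illustrative data; 1D Gaussian with known unit variance) that minimizes the negative log-likelihood:

```python
# Numerically maximize ln p(D|theta) by minimizing its negative.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

D = np.array([1.1, 0.4, 1.9, 0.7])

def neg_log_likelihood(mu):
    return -np.sum(norm.logpdf(D, loc=mu, scale=1.0))

res = minimize_scalar(neg_log_likelihood)
print(res.x, D.mean())    # the numerical optimum matches the sample mean
```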

ML Estimation Using Log-Likelihood (cont’d)

[Figure: p(D/θ) and ln p(D/θ) plotted as functions of θ for training data drawn from a Gaussian with unknown mean and known variance; both curves peak at the same value θ̂ = μ̂.]

ML for Multivariate Gaussian Density: Case of Unknown θ=μ

• Consider p(x/μ) ~ N(μ, Σ):

$$\ln p(\mathbf{x}_k/\boldsymbol{\mu}) = -\frac{1}{2}(\mathbf{x}_k - \boldsymbol{\mu})^t \Sigma^{-1}(\mathbf{x}_k - \boldsymbol{\mu}) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma|$$

• Computing the gradient, we have:

$$\nabla_{\boldsymbol{\mu}} \ln p(D/\boldsymbol{\mu}) = \sum_{k=1}^{n} \nabla_{\boldsymbol{\mu}} \ln p(\mathbf{x}_k/\boldsymbol{\mu}) = \sum_{k=1}^{n} \Sigma^{-1}(\mathbf{x}_k - \boldsymbol{\mu})$$

ML for Multivariate Gaussian Density: Case of Unknown θ=μ (cont’d)

• Setting $\nabla_{\boldsymbol{\mu}} \ln p(D/\boldsymbol{\mu}) = 0$, we have:

$$\sum_{k=1}^{n} \Sigma^{-1}(\mathbf{x}_k - \hat{\boldsymbol{\mu}}) = 0 \quad \text{or} \quad \sum_{k=1}^{n} \mathbf{x}_k - n\hat{\boldsymbol{\mu}} = 0$$

• The solution is given by:

$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k$$

The ML estimate is simply the “sample mean”.
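In code this is a one-liner (illustrative data):

```python
# The ML estimate of the mean is just the sample mean.
import numpy as np

X = np.array([[1.0, 2.0], [0.5, 1.5], [1.5, 2.5]])   # n=3 samples in R^2
mu_hat = X.mean(axis=0)                              # (1/n) * sum_k x_k
print(mu_hat)
```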

Special Case of ML: Maximum A-Posteriori Estimator (MAP)

• Assume that θ is a random variable with known p(θ).

• Maximize p(θ/D) or p(D/θ)p(θ) or ln p(D/θ)p(θ):

$$p(\boldsymbol{\theta}/D) = \frac{p(D/\boldsymbol{\theta})\,p(\boldsymbol{\theta})}{p(D)}$$

• Consider:

$$\prod_{k=1}^{n} p(\mathbf{x}_k/\boldsymbol{\theta})\,p(\boldsymbol{\theta}) \quad \text{or} \quad \sum_{k=1}^{n} \ln p(\mathbf{x}_k/\boldsymbol{\theta}) + \ln p(\boldsymbol{\theta})$$

Special Case of ML: Maximum A-Posteriori Estimator (MAP) (cont’d)

• What happens when p(θ) is uniform?

$$\sum_{k=1}^{n} \ln p(\mathbf{x}_k/\boldsymbol{\theta}) + \ln p(\boldsymbol{\theta}) \;\longrightarrow\; \sum_{k=1}^{n} \ln p(\mathbf{x}_k/\boldsymbol{\theta})$$

In this case, MAP is equivalent to ML.

MAP for Multivariate Gaussian Density: Case of Unknown θ=μ

• Assume:

$$p(\mathbf{x}/\boldsymbol{\mu}) \sim N(\boldsymbol{\mu}, \mathrm{Diag}(\sigma_{\mu}^2)) \qquad p(\boldsymbol{\mu}) \sim N(\boldsymbol{\mu}_0, \mathrm{Diag}(\sigma_{\mu_0}^2))$$

where μ0 is known.

• MAP maximizes ln p(D/μ)p(μ), i.e., maximize:

$$\sum_{k=1}^{n} \ln p(\mathbf{x}_k/\boldsymbol{\mu}) + \ln p(\boldsymbol{\mu})$$

• Setting the gradient to zero:

$$\nabla_{\boldsymbol{\mu}} \left( \sum_{k=1}^{n} \ln p(\mathbf{x}_k/\boldsymbol{\mu}) + \ln p(\boldsymbol{\mu}) \right) = 0$$

MAP for Multivariate Gaussian Density: Case of Unknown θ=μ (cont’d)

• Solving the gradient equation:

$$\frac{1}{\sigma_{\mu}^2}\sum_{k=1}^{n}(\mathbf{x}_k - \hat{\boldsymbol{\mu}}) - \frac{1}{\sigma_{\mu_0}^2}(\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}_0) = 0 \quad \text{or} \quad \hat{\boldsymbol{\mu}} = \frac{\boldsymbol{\mu}_0 + \dfrac{\sigma_{\mu_0}^2}{\sigma_{\mu}^2}\displaystyle\sum_{k=1}^{n}\mathbf{x}_k}{1 + n\dfrac{\sigma_{\mu_0}^2}{\sigma_{\mu}^2}}$$

• If $\sigma_{\mu_0}^2 \gg \sigma_{\mu}^2$ (i.e., the prior is very broad), then

$$\hat{\boldsymbol{\mu}} \approx \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k$$

i.e., MAP reduces to the ML estimate (the sample mean).

• What happens when $\sigma_{\mu_0} \to 0$? Then $\hat{\boldsymbol{\mu}} \approx \boldsymbol{\mu}_0$ (the prior dominates the data).
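A minimal 1D sketch of the closed-form MAP estimate above (all numbers are illustrative assumptions):

```python
# MAP estimate of a Gaussian mean with a Gaussian prior N(mu0, s0^2).
import numpy as np

D = np.array([1.1, 0.4, 1.9, 0.7])
mu0, s0_sq = 0.0, 4.0        # prior mean and variance (sigma_mu0^2)
s_sq = 1.0                   # known data variance (sigma_mu^2)

n, r = len(D), s0_sq / s_sq
mu_map = (mu0 + r * D.sum()) / (1 + n * r)
print(mu_map, D.mean())      # with a broad prior, MAP approaches the ML mean
```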

ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ²)

• Assume p(x/θ) ~ N(μ, σ²), where θ = (θ1, θ2) = (μ, σ²):

$$\ln p(x_k/\boldsymbol{\theta}) = -\frac{1}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}(x_k - \mu)^2 \quad \text{or} \quad \ln p(x_k/\boldsymbol{\theta}) = -\frac{1}{2}\ln 2\pi\theta_2 - \frac{1}{2\theta_2}(x_k - \theta_1)^2$$

ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ²) (cont’d)

• Setting the partial derivatives of the log-likelihood to zero:

$$\sum_{k=1}^{n}\frac{\partial}{\partial\theta_1}\ln p(x_k/\boldsymbol{\theta}) = 0 \qquad \sum_{k=1}^{n}\frac{\partial}{\partial\theta_2}\ln p(x_k/\boldsymbol{\theta}) = 0$$

• The solutions are given by:

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k \;\;\text{(sample mean)} \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})^2 \;\;\text{(sample variance)}$$
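A minimal sketch of the univariate fit (illustrative data):

```python
# ML fit of a univariate Gaussian: sample mean and (biased) sample variance.
import numpy as np

x = np.array([1.1, 0.4, 1.9, 0.7])
mu_hat = x.mean()
var_hat = np.mean((x - mu_hat) ** 2)   # equivalently x.var(ddof=0)
print(mu_hat, var_hat)
```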

ML for Multivariate Gaussian Density: Case of Unknown θ=(μ,Σ)

• In the general case (i.e., multivariate Gaussian) the solutions are:

$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k \;\;\text{(sample mean)} \qquad \hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n}(\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^t \;\;\text{(sample covariance)}$$
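And the multivariate version (illustrative 2D data):

```python
# ML fit of a multivariate Gaussian: sample mean and sample covariance
# with the 1/n normalization derived above.
import numpy as np

X = np.array([[1.0, 2.0], [0.5, 1.5], [1.5, 2.5], [0.8, 1.9]])
mu_hat = X.mean(axis=0)
diff = X - mu_hat
Sigma_hat = diff.T @ diff / len(X)     # equivalently np.cov(X.T, bias=True)
print(mu_hat)
print(Sigma_hat)
```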

Biased and Unbiased Estimates

• An estimate θ̂ is unbiased when E[θ̂] = θ, where θ is the true value.

• The ML estimate μ̂ is unbiased, i.e., E[μ̂] = μ.

• The ML estimates σ̂² and Σ̂ are biased:

$$E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 \qquad E[\hat{\Sigma}] = \frac{n-1}{n}\Sigma$$

Biased and Unbiased Estimates (cont’d)

• The following are unbiased estimates of σ² and Σ:

$$\frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat{\mu})^2 \qquad \frac{1}{n-1}\sum_{k=1}^{n}(\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^t$$
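In NumPy the two normalizations differ only in the ddof argument (illustrative data):

```python
# ddof switches between the biased ML estimate (1/n) and the
# unbiased estimate (1/(n-1)) of the variance.
import numpy as np

x = np.array([1.1, 0.4, 1.9, 0.7])
print(x.var(ddof=0))   # biased ML estimate: divides by n
print(x.var(ddof=1))   # unbiased estimate: divides by n-1
```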

Comments about ML

• ML estimation is usually simpler than alternative methods.

• It provides more accurate estimates as the number of training samples increases.

• If the model chosen for p(x/ θ) is correct, and independence assumptions among samples are true, ML will give very good results.
