TRANSCRIPT
ESTIMATION THEORY
4.1. Introduction:
When we fit random data by an AR model, we have to determine the process
parameters from the observed data.
In RADAR signal processing, we have to determine the location and the velocity of a
target by observing the received noisy data.
In communication, we have to infer about the transmitted signal from the received noisy
data.
Generally, estimation includes parametric estimation and non-parametric estimation.
For example, consider the problem of estimating the probability density function f_X(x)
of a random variable X. We may assume a model for X, say the Gaussian, and find the mean
μ and the variance σ² of the RV. Finding out μ and σ² from the observed values of X is a
problem of parameter estimation. Particularly, we may have to find each value of a
signal from a noisy observation. This problem is known as the signal estimation problem.
Otherwise, we may be interested in finding the true value of f_X(x) directly from the data for all
values of X without assuming any model for f_X(x). This is the non-parametric method of
estimation.
We will discuss the problem of parameter estimation here:
We have a sequence of observable random variables X_1, X_2, ..., X_n, represented by the
vector:

X = [X_1 X_2 ... X_n]'

X is governed by the joint density function f_{X/θ}(x_1, x_2, ..., x_n | θ), which depends on some
unobservable parameter θ, where θ may be deterministic or random. Our aim is to make an
inference on θ from an observed sample of X_1, X_2, ..., X_n.
An estimator θ̂(X) is a rule by which we guess about the value of an unknown θ on the basis of
X.
θ̂(X), being a function of random variables, is a random variable. For a particular observation
x_1, x_2, ..., x_n, we get what is known as an estimate (not an estimator). An estimator of a
single parameter θ is also called a point estimator.
Example 1:
Let X_1, X_2, ..., X_n be a sequence of independent and identically distributed (i.i.d.) random
variables with mean μ and variance σ².

μ̂ = (1/n) Σ_{i=1}^n X_i

is an estimator for μ.

σ̂² = (1/(n−1)) Σ_{i=1}^n (X_i − μ̂)²

is an estimator for σ².
An estimator is a function of the random sequence X_1, X_2, ..., X_n that does not involve any
unknown parameters. Such a function is generally called a statistic.
Example 2:
Suppose we have a DC voltage X corrupted by noise V_i, and the observed data Y_i, i = 1, 2, ..., n, are
given by

Y_i = X + V_i

Then,

X̂ = (1/n) Σ_{i=1}^n Y_i

is an estimator for X.
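The averaging in Example 2 is easy to check numerically. Below is a minimal sketch; the voltage value, noise level and sample size are illustrative assumptions, not from the transcript:

```python
import random

random.seed(0)
X_true = 5.0   # the unknown DC voltage (illustrative value)
n = 10000
# observations Y_i = X + V_i with zero-mean Gaussian noise V_i
Y = [X_true + random.gauss(0.0, 1.0) for _ in range(n)]
X_hat = sum(Y) / n   # the estimator X_hat = (1/n) * sum(Y_i)
print(abs(X_hat - X_true))   # small for large n
```

Averaging suppresses the zero-mean noise; the error shrinks roughly as 1/√n.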
Properties of Estimators
A good estimator should satisfy some properties: θ̂(X) should be as close to θ as possible.
Some desirable properties of the estimator can be described in terms of the mean and variance of
the estimator.
(a) Unbiased Estimator
An estimator θ̂ of θ is said to be unbiased if and only if E(θ̂) = θ. The quantity E(θ̂) − θ is
called the bias of the estimator. Unbiasedness is necessary but not sufficient to make an
estimator a good one.
An estimator θ̂ of a random parameter θ is unbiased if E(θ̂) = E(θ). We consider θ to be a deterministic parameter in this discussion.
Consider two estimators,

σ̂₁² = (1/n) Σ_{i=1}^n (X_i − μ̂)²  and  σ̂₂² = (1/(n−1)) Σ_{i=1}^n (X_i − μ̂)²

for an i.i.d. sequence X_1, X_2, ..., X_n.
We can show σ̂₂² is an unbiased estimator. Using Σ_{i=1}^n (X_i − μ̂)² = Σ_{i=1}^n X_i² − n μ̂²,

E Σ_{i=1}^n (X_i − μ̂)² = E Σ_{i=1}^n X_i² − n E μ̂²

Now, E X_i² = σ² + μ², and

E μ̂² = E((1/n) Σ_{i=1}^n X_i)²
= (1/n²) E(Σ_{i=1}^n X_i)²
= (1/n²) (Σ_{i=1}^n E X_i² + Σ_{i≠j} E X_i X_j)
= (1/n²) (n(σ² + μ²) + n(n−1)μ²)   (because of independence, E X_i X_j = μ² for i ≠ j)
= σ²/n + μ²

Therefore,

E Σ_{i=1}^n (X_i − μ̂)² = n(σ² + μ²) − n(σ²/n + μ²) = (n−1)σ²

So,

E σ̂₂² = (1/(n−1)) E Σ_{i=1}^n (X_i − μ̂)² = σ²

∴ σ̂₂² is an unbiased estimator of σ².
Similarly, the sample mean is an unbiased estimator of μ_x:

E μ̂ = E (1/n) Σ_{i=1}^n X_i = (1/n) Σ_{i=1}^n E X_i = n μ_x / n = μ_x.
Example 4: Suppose X_1, X_2, ..., X_n is a sequence of i.i.d. Poisson random variables with unknown
parameter λ. Since the mean and the variance of a Poisson random variable are both equal to λ,

λ̂₁ = (1/n) Σ_{i=1}^n X_i  and  λ̂₂ = (1/(n−1)) Σ_{i=1}^n (X_i − λ̂₁)²

are two unbiased estimators of λ.
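Since the transcript gives no numerical check, here is a simulation sketch (λ, n and the trial count are assumed values) verifying that both estimators average to λ over many realizations:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, trials = 4.0, 50, 20000
x = rng.poisson(lam, size=(trials, n))
lam_hat1 = x.mean(axis=1)          # sample mean, one estimate per trial
lam_hat2 = x.var(axis=1, ddof=1)   # sample variance with the 1/(n-1) factor
# averaging the estimates over trials approximates their expectations
print(lam_hat1.mean(), lam_hat2.mean())   # both near lam
```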
(b) Variance of the Estimator
The variance of the estimator is given by:

var(θ̂) = E(θ̂ − E θ̂)²

For an unbiased estimator,

var(θ̂) = E(θ̂ − θ)²

The variance of the estimator should be as low as possible.
An unbiased estimator θ̂ is called a minimum variance unbiased estimator (MVUE) if

E(θ̂ − θ)² ≤ E(θ̂' − θ)²

where θ̂' is any other unbiased estimator.
(c) Mean Square Error of the Estimator

MSE = E(θ̂ − θ)²

The MSE should be as small as possible. Out of all unbiased estimators, the MVUE has the
minimum mean square error.
The MSE is related to the bias b = E θ̂ − θ and the variance as shown below:

MSE = E(θ̂ − θ)²
= E(θ̂ − E θ̂ + E θ̂ − θ)²
= E(θ̂ − E θ̂)² + 2 E(θ̂ − E θ̂)(E θ̂ − θ) + (E θ̂ − θ)²
= var(θ̂) + 0 + b²   (the cross term vanishes since E(θ̂ − E θ̂) = 0)

So, MSE = var(θ̂) + b².
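The decomposition MSE = var + b² is an algebraic identity, so it can be confirmed exactly on simulated estimates. A sketch with an intentionally biased estimator (the division by n + 1 is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, trials = 2.0, 10, 100000
x = rng.normal(mu, 1.0, size=(trials, n))
est = x.sum(axis=1) / (n + 1)    # deliberately biased estimator of mu
mse = np.mean((est - mu) ** 2)   # E(theta_hat - theta)^2
bias = est.mean() - mu           # b = E(theta_hat) - theta
var = est.var()                  # E(theta_hat - E theta_hat)^2
print(np.isclose(mse, var + bias ** 2))
```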
(d) Consistent Estimators
As we have more data, the quality of estimation should be better. This idea is used in defining
the consistent estimator. An estimator θ̂ is called a consistent estimator of θ if θ̂ converges in
probability to θ:

lim_{n→∞} P(|θ̂ − θ| > ε) = 0 for any ε > 0

A less rigorous test is obtained by applying the Markov inequality to (θ̂ − θ)²:

P(|θ̂ − θ| > ε) = P((θ̂ − θ)² > ε²) ≤ E(θ̂ − θ)² / ε²

If θ̂ is an unbiased estimator (b = 0), then MSE = var(θ̂).
Therefore, if lim_{n→∞} E(θ̂ − θ)² = 0, then θ̂ will be a consistent estimator.
Also, note that MSE = var(θ̂) + b².
Therefore, if the estimator is asymptotically unbiased (i.e. b → 0 as n → ∞) and var(θ̂) → 0
as n → ∞, then MSE → 0. Therefore, for an asymptotically unbiased estimator θ̂, if var(θ̂) → 0
as n → ∞, then θ̂ will be a consistent estimator.
Example 3:
Suppose X_1, X_2, ..., X_n is an i.i.d. random sequence with unknown mean μ_x and known variance σ_x².
Let μ̂ = (1/n) Σ_{i=1}^n X_i be an estimator for μ_x. We have already shown that μ̂ is unbiased. Also,

var(μ̂) = σ_x²/n

Is it a consistent estimator?
Clearly, lim_{n→∞} var(μ̂) = lim_{n→∞} σ_x²/n = 0. Therefore, μ̂ is a consistent estimator of μ_x.
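A quick simulation (the sample sizes and σ² are assumed for illustration) shows the variance of the sample mean tracking σ²/n, which is what drives consistency:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, trials = 4.0, 5000
for n in (10, 100, 1000):
    # empirical variance of mu_hat over many independent realizations
    mu_hat = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n)).mean(axis=1)
    print(n, mu_hat.var(), sigma2 / n)   # empirical vs. theoretical sigma^2/n
```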
Efficient Estimator
Suppose θ̂₁ and θ̂₂ are two unbiased estimators of the parameter θ. The relative efficiency of the
estimator θ̂₂ with respect to the estimator θ̂₁ is defined by

Relative efficiency = var(θ̂₁)/var(θ̂₂)

Particularly, if θ̂₁ is an MVUE, it is called an efficient estimator, and the absolute efficiency of
an unbiased estimator is defined with respect to this estimator.
Example 5: Suppose X_1, X_2, ..., X_n is a sequence of i.i.d. normal random variables with unknown mean μ.
Then the sample mean μ̂ and the sample median m̂ are two unbiased estimators of μ.
We have shown that var(μ̂) = σ²/n, and it can be shown that var(m̂) = πσ²/(2n). Therefore,

Efficiency of m̂ = var(μ̂)/var(m̂) = 2/π
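The 2/π figure can be checked by Monte Carlo; this sketch uses assumed values of n and the trial count:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 101, 40000
x = rng.normal(0.0, 1.0, size=(trials, n))
var_mean = x.mean(axis=1).var()           # empirical var of the sample mean
var_median = np.median(x, axis=1).var()   # empirical var of the sample median
print(var_mean / var_median)              # close to 2/pi for large n
```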
Example 6: Suppose X_1, X_2, ..., X_n is a sequence of i.i.d. normal random variables with unknown mean μ.
Then

μ̂₁ = (1/(n+1)) Σ_{i=1}^n X_i

is a biased estimator of μ. Note that

E μ̂₁ = nμ/(n+1), so the bias is b = −μ/(n+1), and

var(μ̂₁) = (n²/(n+1)²) var(μ̂) = nσ²/(n+1)²
Minimum Variance Unbiased Estimator
We described the Minimum Variance Unbiased Estimator (MVUE), which is a desirable
estimator.
θ̂ is an MVUE if

E(θ̂) = θ  and  Var(θ̂) ≤ Var(θ̃)

where θ̃ is any other unbiased estimator of θ.
Theorem: The MVUE is unique.
Suppose θ̂₁ and θ̂₂ are two MVUEs for the deterministic parameter θ.
Clearly, E θ̂₁ = E θ̂₂ = θ
and var(θ̂₁) = var(θ̂₂) = σ².
Consider another estimator

θ̂₃ = (θ̂₁ + θ̂₂)/2

Then E θ̂₃ = θ, so θ̂₃ is unbiased, and

var(θ̂₃) = var((θ̂₁ + θ̂₂)/2)
= (var(θ̂₁) + var(θ̂₂) + 2 cov(θ̂₁, θ̂₂))/4
≤ (var(θ̂₁) + var(θ̂₂) + 2 √(var(θ̂₁) var(θ̂₂)))/4   (using the Cauchy-Schwarz inequality)
= σ²

But var(θ̂₃) cannot be less than σ², since θ̂₁ and θ̂₂ are MVUEs.
∴ var(θ̂₃) = σ², which forces cov(θ̂₁, θ̂₂) = σ².
Now consider

var(θ̂₁ − θ̂₂) = var(θ̂₁) + var(θ̂₂) − 2 cov(θ̂₁, θ̂₂) = σ² + σ² − 2σ² = 0

∴ θ̂₁ = θ̂₂ with probability 1.
Cramer-Rao theorem
Can we reduce the variance of an unbiased estimator indefinitely? The answer is given by the
Cramer-Rao theorem.
Suppose θ̂ is an unbiased estimator of θ based on the random sequence X_1, X_2, ..., X_n. Let us denote the sequence by the
vector

X = [X_1 X_2 ... X_n]'

Let f_{X/θ}(x_1, ..., x_n / θ) be the joint PDF which characterises X. As a function of θ, this is called the
likelihood function. Note that θ may also be random; in that case the likelihood function will
represent the conditional joint density function.
The quantity L(x/θ) = ln f_{X/θ}(x_1, x_2, ..., x_n / θ) is called the log-likelihood function.
Statement of the Cramer-Rao theorem
Suppose θ̂ is an unbiased estimator of θ ∈ D, where D is an open interval, and
f_{X/θ}(x_1, ..., x_n / θ) satisfies the following regularity conditions:
(i) The support {x | f_{X/θ}(x/θ) > 0} does not depend on θ. We may assume ℝⁿ
to be the support.
(ii) For θ ∈ D and x ∈ ℝⁿ, ∂L(x/θ)/∂θ exists and is finite.
Then

Var(θ̂) ≥ 1/I_n(θ)

where I_n(θ) = E(∂L/∂θ)². I_n(θ) is a measure of the average information in the random sequence
and is called the Fisher information statistic.
The equality in the CR bound holds if

∂L/∂θ = c(θ̂ − θ)

where c is a constant with respect to x.
Proof: θ̂ is an unbiased estimator of θ
⟹ E(θ̂ − θ) = 0
⟹ ∫ (θ̂ − θ) f_{X/θ}(x/θ) dx = 0

where the integration is an n-fold integration.
Differentiating with respect to θ, we get

d/dθ ∫ (θ̂ − θ) f_{X/θ}(x/θ) dx = 0

Note the regularity condition that the limits of integration are not functions of θ. Therefore,
the processes of integration and differentiation can be interchanged and we get

∫ ∂/∂θ {(θ̂ − θ) f_{X/θ}(x/θ)} dx = 0
⟹ ∫ (θ̂ − θ) (∂/∂θ) f_{X/θ}(x/θ) dx − ∫ f_{X/θ}(x/θ) dx = 0
⟹ ∫ (θ̂ − θ) (∂/∂θ) f_{X/θ}(x/θ) dx = ∫ f_{X/θ}(x/θ) dx = 1    (1)

Note that

(∂/∂θ) f_{X/θ}(x/θ) = (∂/∂θ){ln f_{X/θ}(x/θ)} f_{X/θ}(x/θ) = (∂L(x/θ)/∂θ) f_{X/θ}(x/θ)

Therefore, from (1),

∫ (θ̂ − θ) (∂L(x/θ)/∂θ) f_{X/θ}(x/θ) dx = 1

so that

(∫ (θ̂ − θ)√f_{X/θ}(x/θ) · (∂L(x/θ)/∂θ)√f_{X/θ}(x/θ) dx)² = 1    (2)

since f_{X/θ}(x/θ) ≥ 0.
Recall that the Cauchy-Schwarz inequality is given by

⟨a, b⟩² ≤ |a|² |b|²

where the equality holds when a = c b (where c is any scalar).
Applying this inequality to the L.H.S. of equation (2), we get

(∫ (θ̂ − θ)(∂L/∂θ) f_{X/θ}(x/θ) dx)² ≤ ∫ (θ̂ − θ)² f_{X/θ}(x/θ) dx · ∫ (∂L/∂θ)² f_{X/θ}(x/θ) dx
= var(θ̂) I_n(θ)

But the L.H.S. equals 1.
⟹ var(θ̂) I_n(θ) ≥ 1
⟹ var(θ̂) ≥ 1/I_n(θ)

which is the Cramer-Rao inequality. The right-hand side is the Cramer-Rao lower bound (CRLB)
for var(θ̂).
The equality will hold when

(∂L(x/θ)/∂θ)√f_{X/θ}(x/θ) = c(θ)(θ̂ − θ)√f_{X/θ}(x/θ)

so that

∂L(x/θ)/∂θ = c(θ)(θ̂ − θ)

where c is independent of x and may be a function of θ. Noting that in the equality case var(θ̂) = 1/I_n(θ) and

I_n(θ) = E(∂L/∂θ)² = c²(θ) E(θ̂ − θ)² = c²(θ) var(θ̂) = c²(θ)/I_n(θ)

we get

c(θ) = I_n(θ)

Thus the CRLB is achieved if and only if

∂L(x/θ)/∂θ = I_n(θ)(θ̂ − θ)

If θ̂ satisfies the CR bound with equality, then θ̂ is called an efficient estimator. Note that an
efficient estimator is always an MVUE.
Also, from ∫ f_{X/θ}(x/θ) dx = 1, we get

∫ (∂/∂θ) f_{X/θ}(x/θ) dx = 0
⟹ ∫ (∂L/∂θ) f_{X/θ}(x/θ) dx = 0

Taking the partial derivative with respect to θ again, we get

∫ (∂²L/∂θ²) f_{X/θ}(x/θ) dx + ∫ (∂L/∂θ)(∂/∂θ) f_{X/θ}(x/θ) dx = 0
⟹ ∫ (∂²L/∂θ²) f_{X/θ}(x/θ) dx + ∫ (∂L/∂θ)² f_{X/θ}(x/θ) dx = 0
⟹ E(∂L(x/θ)/∂θ)² = −E(∂²L(x/θ)/∂θ²)

Thus the CR inequality may be written as:

var(θ̂) ≥ 1/(−E ∂²L(x/θ)/∂θ²)
Remarks
(1) If the information I_n(θ) is more, var(θ̂) will be less.
(2) Suppose X_1, X_2, ..., X_n are iid. Then

I_n(θ) = −E (∂²/∂θ²) ln f_{X_1,...,X_n/θ}(x_1, x_2, ..., x_n/θ)
= −E (∂²/∂θ²) Σ_{i=1}^n ln f_{X_i/θ}(x_i/θ)
= Σ_{i=1}^n (−E (∂²/∂θ²) ln f_{X_i/θ}(x_i/θ))
= n I_1(θ)

where I_1(θ) is the Fisher information in a single observation.
(3) If θ̂ satisfies the CR bound with equality, then θ̂ is called an efficient estimator.
Extension to Vector Parameters
Suppose θ_1, θ_2, ..., θ_k are k parameters which are represented as the vector θ = [θ_1 θ_2 ... θ_k]'.
Then the log-likelihood function is given by

L(x/θ) = ln f_{X/θ}(x_1, x_2, ..., x_n/θ)

We can represent the first-order partial derivatives of L(x/θ) by the gradient vector

∂L(x/θ)/∂θ = [∂L(x/θ)/∂θ_1 ∂L(x/θ)/∂θ_2 ... ∂L(x/θ)/∂θ_k]'

The Fisher information matrix is given by

I_n(θ) = E[(∂L(x/θ)/∂θ)(∂L(x/θ)/∂θ)']

where E is performed on each term of the matrix.
It can be shown that

I_n(θ) = −E[∂²L(x/θ)/∂θ ∂θ']

i.e. the matrix whose (i, j)th element is −E ∂²L(x/θ)/∂θ_i ∂θ_j.
Assume that the pdf f_{X/θ}(x_1, x_2, ..., x_n/θ) satisfies the regularity condition

E(∂L(x/θ)/∂θ) = 0

where the expectation is taken with respect to f_{X/θ}(x_1, x_2, ..., x_n/θ). Then the
covariance matrix C_θ̂ of any unbiased estimator θ̂ satisfies:

C_θ̂ − I_n⁻¹(θ) ≥ 0

where the inequality with respect to the zero matrix implies that the left-hand side is a positive
semi-definite matrix.
The CR theorem for the individual variances is given by

Var(θ̂_i) ≥ (I_n⁻¹(θ))_(i,i)

where (i, i) denotes the ith diagonal element of the matrix.
The equality will hold when

∂L(x/θ)/∂θ = I_n(θ)(θ̂ − θ)
Example 3:
Let X_1, X_2, ..., X_n be an iid Gaussian random sequence with known variance σ² and unknown
mean μ.
Suppose μ̂ = (1/n) Σ_{i=1}^n X_i, which is unbiased. Find the CR bound and hence show that μ̂ is an efficient
estimator.
The likelihood function f_{X/μ}(x_1, x_2, ..., x_n/μ) will be the product of the individual densities (since iid):

f_{X/μ}(x_1, x_2, ..., x_n/μ) = (1/(√(2π) σ))ⁿ exp(−(1/(2σ²)) Σ_{i=1}^n (x_i − μ)²)

so that

L(x/μ) = −n ln(√(2π) σ) − (1/(2σ²)) Σ_{i=1}^n (x_i − μ)²

Now

∂L/∂μ = (1/σ²) Σ_{i=1}^n (x_i − μ)
∂²L/∂μ² = −n/σ²

so that −E ∂²L/∂μ² = n/σ².

CR bound = 1/I_n(μ) = 1/(−E ∂²L/∂μ²) = σ²/n

Also,

∂L/∂μ = (1/σ²) Σ_{i=1}^n (x_i − μ) = (n/σ²)((1/n) Σ_{i=1}^n x_i − μ) = (n/σ²)(μ̂ − μ)

Hence ∂L/∂μ = c(μ̂ − μ) with c = n/σ², and μ̂ is an efficient estimator.
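As a numerical sanity check on this example (μ, σ², n and the trial count are assumed for illustration), the empirical variance of μ̂ should sit at the CRLB σ²/n:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, trials = 1.0, 2.0, 25, 100000
x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
mu_hat = x.mean(axis=1)
crlb = sigma2 / n          # 1 / I_n(mu) with I_n(mu) = n / sigma^2
print(mu_hat.var(), crlb)  # the two should nearly coincide
```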
Example 4: Suppose X_i = a + b i + V_i, V_i ~ N(0, σ²), i = 1, 2, ..., n, where a and b are unknown
constants and σ² is known. Here θ = [a b]'. The likelihood function is given by

f_{X/θ}(x_1, x_2, ..., x_n/θ) = (1/(√(2π) σ))ⁿ exp(−(1/(2σ²)) Σ_{i=1}^n (x_i − a − b i)²)

L(x/θ) = −n ln(√(2π) σ) − (1/(2σ²)) Σ_{i=1}^n (x_i − a − b i)²

∂L/∂a = (1/σ²) Σ_{i=1}^n (x_i − a − b i),  ∂L/∂b = (1/σ²) Σ_{i=1}^n (x_i − a − b i) i

−E ∂²L/∂a² = n/σ²,  −E ∂²L/∂a∂b = (1/σ²) Σ_{i=1}^n i = n(n+1)/(2σ²),
−E ∂²L/∂b² = (1/σ²) Σ_{i=1}^n i² = n(n+1)(2n+1)/(6σ²)

so that

I_n(θ) = (1/σ²) [ n          n(n+1)/2
                  n(n+1)/2   n(n+1)(2n+1)/6 ]

Taking the inverse, we get

I_n⁻¹(θ) = σ² [ 2(2n+1)/(n(n−1))   −6/(n(n−1))
                −6/(n(n−1))        12/(n(n²−1)) ]

var(â) ≥ 2(2n+1)σ²/(n(n−1))  and  var(b̂) ≥ 12σ²/(n(n²−1))
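The closed-form inverse above can be cross-checked numerically. A sketch (n and σ² chosen arbitrarily) that builds I_n(θ) from the sums Σi and Σi² and compares its inverse with the stated diagonal entries:

```python
import numpy as np

n, sigma2 = 20, 1.0
i = np.arange(1, n + 1)
# Fisher information matrix for theta = [a, b]' in X_i = a + b*i + V_i
I_n = (1.0 / sigma2) * np.array([[n, i.sum()],
                                 [i.sum(), (i ** 2).sum()]])
crlb = np.linalg.inv(I_n)
# closed-form diagonal entries derived in the text
var_a = 2 * (2 * n + 1) * sigma2 / (n * (n - 1))
var_b = 12 * sigma2 / (n * (n ** 2 - 1))
print(np.allclose(np.diag(crlb), [var_a, var_b]))
```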
MVUE through Sufficient Statistic
We saw that an MVUE achieving the CRLB can be obtained through the factorization

∂L(x/θ)/∂θ = I_n(θ)(θ̂ − θ)

However, the CRLB may not be achieved by the MVUE. The sufficient
statistic can be used to find the MVUE under certain conditions.
The observations X_1, X_2, ..., X_n contain information about the unknown parameter θ. An
estimator should carry the same information about θ as the observed data. The concept of a
sufficient statistic is based on this idea.
A measurable function T(X_1, X_2, ..., X_n) is called a sufficient statistic for θ if it contains the
same information about θ as contained in the random sequence X_1, X_2, ..., X_n. In other words, the
joint conditional density f_{X_1,...,X_n | T(X_1,...,X_n)}(x_1, x_2, ..., x_n) does not involve θ.
There are a large number of sufficient statistics for a particular estimation problem. One has to select a
sufficient statistic which has good estimation properties.
Example 7: Suppose X_i ~ N(μ, 1), i = 1, 2, and T(x_1, x_2) = x_1 + x_2. Then,

f_{X_1,X_2 | T}(x_1, x_2 | T(x_1, x_2) = t)
= f_{X_1,X_2,T}(x_1, x_2, t)/f_T(t)
= f_{X_1,X_2}(x_1, x_2)/f_T(t)   (since T is a function of x_1, x_2)

Now

f_{X_1,X_2}(x_1, x_2) = (1/(2π)) exp(−((x_1 − μ)² + (x_2 − μ)²)/2)

and T = X_1 + X_2 ~ N(2μ, 2), so that

f_T(t) = (1/(2√π)) exp(−(t − 2μ)²/4)

Collecting the exponents of the ratio,

−((x_1 − μ)² + (x_2 − μ)²)/2 + (x_1 + x_2 − 2μ)²/4
= −(x_1² + x_2²)/2 + μ(x_1 + x_2) − μ² + (x_1 + x_2)²/4 − μ(x_1 + x_2) + μ²
= −(x_1² + x_2²)/2 + (x_1 + x_2)²/4
= −(x_1 − x_2)²/4

so that

f_{X_1,X_2 | T}(x_1, x_2 | t) = (1/√π) exp(−(x_1 − x_2)²/4)

Thus f_{X_1,X_2 | T}(x_1, x_2 | t) does not involve the parameter μ. Hence T(x_1, x_2) = x_1 + x_2 is a
sufficient statistic.
Remark: If T(x_1, x_2) = x_1 + 3x_2, we can show in a similar way that T(x_1, x_2) is not a sufficient
statistic.
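The μ-cancellation in Example 7 can also be checked numerically: the ratio f_{X_1,X_2}/f_T evaluated at the same (x_1, x_2) must be identical for different μ. The test point and μ values below are arbitrary:

```python
import math

def f_joint(x1, x2, mu):
    # joint density of two iid N(mu, 1) variables
    return (1 / (2 * math.pi)) * math.exp(-0.5 * ((x1 - mu) ** 2 + (x2 - mu) ** 2))

def f_T(t, mu):
    # T = X1 + X2 ~ N(2*mu, 2)
    return (1 / math.sqrt(4 * math.pi)) * math.exp(-(t - 2 * mu) ** 2 / 4)

x1, x2 = 0.7, -1.3
r0 = f_joint(x1, x2, 0.0) / f_T(x1 + x2, 0.0)
r5 = f_joint(x1, x2, 5.0) / f_T(x1 + x2, 5.0)
print(r0, r5)   # nearly identical: the conditional density does not depend on mu
```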
The above definition allows us to check whether a given statistic is sufficient or not. A way to
determine a sufficient statistic is through the Neyman-Fisher factorization theorem.
Factorization theorem
For continuous RVs X_1, X_2, ..., X_n, the statistic T(X_1, X_2, ..., X_n) is a sufficient statistic for θ
if and only if

f_{X_1,...,X_n/θ}(x_1, x_2, ..., x_n) = g(T(x_1, ..., x_n), θ) h(x_1, x_2, ..., x_n)

where g(·, θ) is a non-constant and nonnegative function of θ and T, and h(x_1, x_2, ..., x_n)
does not involve θ and is a nonnegative function of x_1, x_2, ..., x_n.
For the discrete case, the factorization theorem states:
T(x) is sufficient if and only if

p_X(x) = g(T(x), θ) h(x)
Proof: Denote the value T(x) by t. Suppose T(X) is a sufficient statistic. Then,

p_X(x) = P(X = x)
= P(X = x, T(X) = t)
= P(T(X) = t) P(X = x | T(X) = t)

Since T(X) is a sufficient statistic, P(X = x | T(X) = t) does not involve θ. Hence

p_X(x) = g(t, θ) h(x)

where g(t, θ) = P(T(X) = t) = Σ_{x': T(x') = t} p_X(x') and h(x) = P(X = x | T(X) = t).
Conversely, suppose p_X(x) = g(t, θ) h(x). Then,

P(X = x | T(X) = t) = P(X = x, T(X) = t)/P(T(X) = t)
= p_X(x)/Σ_{x': T(x') = t} p_X(x')
= g(t, θ) h(x)/(g(t, θ) Σ_{x': T(x') = t} h(x'))
= h(x)/Σ_{x': T(x') = t} h(x')

which does not depend on θ. Hence T(X) is a sufficient statistic.
Example 8: Suppose X_1, X_2, ..., X_n are iid Gaussian random variables with unknown mean μ
and known variance 1.
Then T(X) = Σ_{i=1}^n X_i is a sufficient statistic for μ.
Because

f_{X_1,...,X_n/μ}(x_1, x_2, ..., x_n) = Π_{i=1}^n (1/√(2π)) exp(−(x_i − μ)²/2)
= (1/(√(2π))ⁿ) exp(−(1/2) Σ_{i=1}^n (x_i − μ)²)
= (1/(√(2π))ⁿ) exp(−(1/2) Σ_{i=1}^n x_i²) exp(μ Σ_{i=1}^n x_i − nμ²/2)

The first exponential is a function of x_1, x_2, ..., x_n only, and the second exponential is a function of μ
and T(x) = Σ_{i=1}^n x_i. Therefore, by the factorization theorem, T(x) = Σ_{i=1}^n x_i is a sufficient
statistic for μ. Equivalently, the sample mean (1/n) Σ_{i=1}^n X_i is sufficient.
Rao-Blackwell Theorem
Suppose θ̂ is an unbiased estimator of θ and T(X) is a sufficient statistic for θ.
Then θ̃ = E(θ̂ | T(X)) is unbiased and var(θ̃) ≤ var(θ̂).
Proof: Using the property of iterated conditional expectation, we have

E θ̃ = E(E(θ̂ | T(X))) = E(θ̂) = θ

∴ θ̃ is an unbiased estimator of θ. Now

var(θ̂) = E(θ̂ − θ)²
= E(E((θ̂ − θ)² | T(X)))
≥ E((E(θ̂ − θ | T(X)))²)   (using Jensen's inequality for a convex function)
= E(θ̃ − θ)²
= var(θ̃)
Complete statistic
A statistic T(X) is said to be complete if, for any bounded function g,

E g(T(X)) = 0 for all θ

implies that

P(g(T(X)) = 0) = 1 for all θ

Example: Suppose X_1, X_2, X_3, ..., X_n are iid Bernoulli random variables with parameter p, 0 < p < 1, and

T(X) = Σ_{i=1}^n X_i

Clearly T(X) ~ Binomial(n, p), and T(X) takes the values t = 0, 1, ..., n.

E g(T(X)) = Σ_{t=0}^n g(t) C(n, t) pᵗ (1 − p)ⁿ⁻ᵗ = 0 for 0 < p < 1
⟹ (1 − p)ⁿ Σ_{t=0}^n g(t) C(n, t) (p/(1 − p))ᵗ = 0 for 0 < p < 1
⟹ Σ_{t=0}^n g(t) C(n, t) λᵗ = 0, where λ = p/(1 − p) > 0

The left-hand side is a polynomial in λ and can vanish identically if and only if the coefficients
vanish:

g(t) = 0 for t = 0, 1, 2, ..., n

Hence T(X) is a complete statistic.
Remark: If T(X) is a complete statistic, then there is only one function g(T(X)) which is
unbiased. Suppose there is another function g_1(T(X)) which is unbiased.
Then

E(g(T(X)) − g_1(T(X))) = 0
⟹ P(g(T(X)) − g_1(T(X)) = 0) = 1
⟹ g(T(X)) = g_1(T(X)) with probability 1
Lehmann-Scheffé theorem
Suppose T(X) is a complete sufficient statistic for θ and g(T(X)) is an unbiased estimator of θ based
on T(X). Then g(T(X)) is the MVUE.
Proof:
By the Rao-Blackwell theorem, E(θ̂ | T(X)), where θ̂ is any unbiased estimator of θ, is unbiased.
g(T(X)) is unique as T(X) is a complete statistic, so g(T(X)) = E(θ̂ | T(X)), and

Var(g(T(X))) = Var(E(θ̂ | T(X))) ≤ Var(θ̂)

∴ g(T(X)) is an MVUE.
Exponential Family of Distributions
A family of distributions with the probability density function (or probability mass function) of
the form

f_{X/θ}(x) = a(θ) b(x) exp(c(θ) t(x))

with a(θ) > 0 and c(θ) real functions of θ and b(x) ≥ 0,
is called an exponential family of distributions.
Similarly, a family of distributions

f_{X/θ}(x) = a(θ) b(x) exp(Σ_{i=1}^k c_i(θ) t_i(x))

with a, b and the c_i as specified above, is called a k-parameter exponential family. An
exponential family of discrete RVs will have the probability mass function in the above forms.
Example 9
Suppose X ~ N(μ, σ²). Then

f_{X/μ,σ²}(x) = (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²))
= (1/(√(2π) σ)) exp(−(x² − 2μx + μ²)/(2σ²))
= (1/(√(2π) σ)) exp(−μ²/(2σ²)) exp(−x²/(2σ²) + μx/σ²)

Thus f_{X/μ,σ²}(x) belongs to a 2-parameter exponential family with

a(θ) = (1/(√(2π) σ)) exp(−μ²/(2σ²)), b(x) = 1,
c_1(θ) = −1/(2σ²), c_2(θ) = μ/σ², t_1(x) = x², and t_2(x) = x
If X_1, X_2, ..., X_n are iid random variables from a k-parameter exponential family, then

f_{X/θ}(x_1, x_2, ..., x_n) = aⁿ(θ) Π_{j=1}^n b(x_j) exp(Σ_{i=1}^k c_i(θ) Σ_{j=1}^n t_i(x_j))
= aⁿ(θ) h(x) exp(Σ_{i=1}^k c_i(θ) T_i(x))

Define

T(x) = [T_1(x) T_2(x) ... T_k(x)]' = [Σ_{j=1}^n t_1(x_j)  Σ_{j=1}^n t_2(x_j) ... Σ_{j=1}^n t_k(x_j)]'

It is easy to show that T(x) is sufficient and complete.
Criteria for Estimation
The estimation of a parameter is based on several well-known criteria. Each of the criteria tries
to optimize some function of the observed samples with respect to the unknown parameter to be
estimated. Some of the most popular estimation criteria are:
Maximum Likelihood
Minimum Mean Square Error
Bayes' Method
Maximum Likelihood Estimator (MLE)
Suppose X_1, X_2, ..., X_n are random samples with the joint probability density
function f_{X_1,...,X_n/θ}(x_1, x_2, ..., x_n), which depends on an unknown nonrandom
parameter θ.
f_{X/θ}(x_1, x_2, ..., x_n/θ) is called the likelihood function. If X_1, X_2, ..., X_n are discrete,
then the likelihood function will be a joint probability mass function. We represent
the concerned random variables and their values in vector notation by
X = [X_1 X_2 ... X_n]' and x = [x_1 x_2 ... x_n]' respectively. Note that
L(x/θ) = ln f_{X/θ}(x/θ) is the log-likelihood function. As functions of the random
variables, the likelihood and log-likelihood functions are random variables.
The maximum likelihood estimator θ̂_MLE is such an estimator that

f_{X/θ}(x_1, x_2, ..., x_n/θ̂_MLE) ≥ f_{X/θ}(x_1, x_2, ..., x_n/θ) ∀ θ

If the likelihood function is differentiable with respect to θ, then θ̂_MLE is given by

∂f_{X/θ}(x/θ)/∂θ |_{θ = θ̂_MLE} = 0  or  ∂L(x/θ)/∂θ |_{θ = θ̂_MLE} = 0

Thus the MLE is given by the solution of the likelihood equation given above.
If we have k unknown parameters given by θ = [θ_1 θ_2 ... θ_k]', then the MLE is given by the set of
conditions

∂L(x/θ)/∂θ_1 = ∂L(x/θ)/∂θ_2 = ... = ∂L(x/θ)/∂θ_k = 0

each evaluated at θ_1 = θ̂_1,MLE, θ_2 = θ̂_2,MLE, ..., θ_k = θ̂_k,MLE.
Since the logarithm is a monotonically increasing function of its argument, maximizing the
likelihood is equivalent to maximizing the log-likelihood, and it is convenient to express the MLE
conditions in terms of the log-likelihood function, as done above.
Example 10:
Let X_1, X_2, ..., X_n be an independent identically distributed sequence of N(μ, σ²)
random variables. Find the MLE for μ and σ².

f_{X/μ,σ²}(x_1, x_2, ..., x_n/μ, σ²) = Π_{i=1}^n (1/(√(2π) σ)) exp(−(x_i − μ)²/(2σ²))

L(x/μ, σ²) = ln f_{X/μ,σ²}(x/μ, σ²) = −n ln √(2π) − n ln σ − (1/(2σ²)) Σ_{i=1}^n (x_i − μ)²

∂L/∂μ |_{μ = μ̂_MLE} = 0 ⟹ (1/σ²) Σ_{i=1}^n (x_i − μ̂_MLE) = 0

∂L/∂σ² |_{σ² = σ̂²_MLE} = 0 ⟹ −n/(2σ̂²_MLE) + (1/(2σ̂⁴_MLE)) Σ_{i=1}^n (x_i − μ̂_MLE)² = 0

Solving, we get

μ̂_MLE = (1/n) Σ_{i=1}^n x_i  and  σ̂²_MLE = (1/n) Σ_{i=1}^n (x_i − μ̂_MLE)²
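The closed-form MLEs of Example 10 are just the sample mean and the 1/n sample variance; a simulation sketch (the true μ, σ² and n are assumed values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n = 3.0, 4.0, 100000
x = rng.normal(mu, np.sqrt(sigma2), n)
mu_mle = x.mean()                         # (1/n) sum x_i
sigma2_mle = np.mean((x - mu_mle) ** 2)   # (1/n) sum (x_i - mu_mle)^2; note 1/n, not 1/(n-1)
print(mu_mle, sigma2_mle)                 # near 3.0 and 4.0
```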
Example 11:
Let X_1, X_2, ..., X_n be independent identically distributed random samples with

f_X(x) = (1/2) e^{−|x − θ|},  −∞ < x < ∞

Show that median(X_1, X_2, ..., X_n) is the MLE for θ.

f_{X_1,...,X_n/θ}(x_1, x_2, ..., x_n) = (1/2ⁿ) exp(−Σ_{i=1}^n |x_i − θ|)

L(x/θ) = ln f_{X/θ}(x/θ) = −n ln 2 − Σ_{i=1}^n |x_i − θ|

Σ_{i=1}^n |x_i − θ| is minimized by θ = median(x_1, x_2, ..., x_n)

∴ θ̂_MLE = median(x_1, x_2, ..., x_n)

Properties of MLE
(1) The MLE may be biased or unbiased. In Example 10, μ̂_MLE is unbiased whereas σ̂²_MLE is a
biased estimator.
(2) If an efficient estimator exists, the MLE is that efficient estimator.
Suppose an efficient estimator θ̂ exists. Then

∂L(x/θ)/∂θ = c(θ)(θ̂ − θ)

At θ = θ̂_MLE, ∂L(x/θ)/∂θ = 0, so

c(θ̂_MLE)(θ̂ − θ̂_MLE) = 0 ⟹ θ̂ = θ̂_MLE

(3) The MLE is asymptotically unbiased and efficient. Thus, for large n, the MLE is
approximately an MVUE.
(4) Invariance Property of the MLE
It is a remarkable property of the MLE and is not shared by other estimators. If θ̂_MLE is the MLE
of θ and h(θ) is a function of θ, then h(θ̂_MLE) is the MLE of h(θ).