ASP Lecture 3: Estimation Intro (2015)
Advanced Signal Processing
Introduction to Estimation Theory
Danilo Mandic,
room 813, ext: 46271
Department of Electrical and Electronic Engineering
Imperial College London, [email protected], URL: www.commsp.ee.ic.ac.uk/∼mandic
© D. P. Mandic, Advanced Signal Processing, Spring 2015
Aims of this lecture
◦ To introduce the notions of: Estimator, Estimate, Estimandum
◦ To discuss the bias and variance in statistical estimation theory; asymptotically unbiased and consistent estimators
◦ Performance metric: the Mean Square Error (MSE)
◦ The bias–variance dilemma and the MSE in this context
◦ To derive a feasible MSE estimator
◦ A class of Minimum Variance Unbiased (MVU) estimators
◦ Extension to the vector parameter case
◦ Point estimators, confidence intervals, statistical goodness of an estimator, the role of noise
Role of estimation in signal processing
(try also the function specgram in Matlab)
◦ An enabling technology in many electronic signal processing systems
1. Radar   2. Sonar   3. Speech
4. Image analysis   5. Biomedicine   6. Communications
7. Control   8. Seismics   9. Almost everywhere ...
◦ Radar and sonar: range and azimuth
◦ Image analysis: motion estimation, segmentation
◦ Speech: features used in recognition and speaker verification
◦ Seismics: oil reservoirs
◦ Communications: equalization, symbol detection
◦ Biomedicine: various applications
Statistical estimation problem
for simplicity, consider a DC level in WGN, x[n] = A + w[n], w ∼ N (0, σ²)
Problem statement: We seek to determine a set of parameters θ = [θ1, . . . , θp]^T from a set of data points x = [x[0], . . . , x[N − 1]]^T such that the values of these parameters would yield the highest probability of obtaining the observed data. In other words,

max_θ p(x; θ)    which reads: "p(x) parametrised by θ"
◦ The unknown parameters may be seen as deterministic or random variables

◦ There are essentially two alternatives for the statistical treatment:

– No a priori distribution assumed: Maximum Likelihood

– A priori distribution known: Bayesian estimation

◦ Key problem: to estimate a group of parameters from a discrete-time signal or dataset.
Estimation of a scalar random variable
Given an N-point dataset x[0], x[1], . . . , x[N − 1], which depends on an unknown (scalar) parameter θ, define an "estimator" as some function, g, of the dataset, that is

θ̂ = g(x[0], x[1], . . . , x[N − 1])
which may be used to estimate θ (single parameter case).
(in our DC level estimation problem, θ = A)
◦ This defines the problem of “parameter estimation”
◦ Also need to determine g(·)
◦ Theory and techniques of statistical estimation are available
◦ Estimation based on PDFs which contain unknown but deterministic parameters is termed classical estimation

◦ In Bayesian estimation, the unknown parameters are assumed to be random variables, which may be prescribed "a priori" to lie within some range of allowable parameters (or desired performance)
The statistical estimation problem
First step: to model, mathematically, the data
◦ We employ a Probability Density Function (PDF) to describe the inherently random measurement process, that is
p(x[0], x[1], . . . , x[N − 1]; θ)
which is “parametrised” by the unknown parameter θ
Example: for N = 1, and θ denoting the mean value, a generic form of the PDF for the class of Gaussian PDFs with any value of θ is given by

p(x[0]; θ) = (1/√(2πσ²)) exp[ −(1/(2σ²)) (x[0] − θ)² ]

Clearly, the observed value of x[0] impacts upon the likely value of θ.
Estimator vs. Estimate
The parameter to be estimated is then viewed as a realisation of the random variable θ

◦ Data are described by the joint PDF of the data and parameters:

p(x, θ) = p(x | θ) · p(θ)

where p(x | θ) is the conditional PDF and p(θ) the prior PDF

◦ An estimator is a rule that assigns a value of θ to each realisation of x = [x[0], . . . , x[N − 1]]^T

◦ An estimate θ̂ of θ (the parameter θ being the 'estimandum') is the value obtained for a given realisation of x = [x[0], . . . , x[N − 1]]^T, in the form θ̂ = g(x)

Example: for a noisy straight line: p(x; θ) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)² ]

◦ Performance is critically dependent upon this PDF assumption: the estimator should be robust to a slight mismatch between the measurement and the PDF assumption
Example: finding the parameters of a straight line
Specification of the PDF is critical in determining a good estimator
In practice, we choose a PDF which fits the problem constraints and any "a priori" information; but it must also be mathematically tractable.
Example: Assume that "on the average" the data are increasing.

[Figure: a noisy straight line x[n] versus n, with intercept A at n = 0, shown against the ideal noiseless line]

Data: a straight line embedded in random noise w[n] ∼ N (0, σ²):

x[n] = A + Bn + w[n],    n = 0, 1, . . . , N − 1

p(x; A, B) = (1/(2πσ²)^{N/2}) exp[ −(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)² ]

Unknown parameters: A, B ⇔ θ ≡ [A B]^T
Careful: what would be the effects of bias in A and B?
Bias in parameter estimation
Estimation theory (scalar case): estimate the value of an unknown parameter, θ, from a set of observations of a random variable described by that parameter:

θ̂ = g(x[0], x[1], . . . , x[N − 1])

Example: given a set of observations from a Gaussian distribution, estimate the mean or variance from these observations.

◦ Recall that in linear mean square estimation, when estimating a value of a random variable y from an observation of a related random variable x, the coefficients a and b in the estimate ŷ = ax + b depend upon the mean and variance of x and y, as well as on their correlation.

The difference between the expected value of the estimate and the actual value θ is called the bias, and will be denoted by B:

B = E{θ̂_N} − θ

where θ̂_N denotes estimation over N data samples, x[0], . . . , x[N − 1]
Asymptotic unbiasedness
If the bias is zero, then the expected value of the estimate is equal to the true value, that is

E{θ̂_N} = θ    ⇔    B = E{θ̂_N} − θ = 0

and the estimate is said to be unbiased.

If B ≠ 0, then the estimator θ̂ = g(x) is said to be biased.
Example: Consider the sample mean estimator of the signal x[n] = A + w[n], w ∼ N (0, 1), given by

Â = x̄ = (1/(N + 2)) Σ_{n=0}^{N−1} x[n],    that is, θ̂ = Â

Is the above sample mean estimator of the true mean A biased?
More often: an estimator is biased, but the bias B → 0 as N → ∞:

lim_{N→∞} E{θ̂_N} = θ

Such an estimator is said to be asymptotically unbiased.
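This behaviour is easy to check numerically; below is a minimal Monte Carlo sketch (NumPy, not part of the lecture) of the 1/(N + 2) estimator above, assuming a true level A = 3 and unit-variance WGN:

```python
import numpy as np

rng = np.random.default_rng(0)
A, trials = 3.0, 20000        # illustrative true DC level and Monte Carlo size

for N in (10, 100, 1000):
    x = A + rng.standard_normal((trials, N))  # x[n] = A + w[n], w ~ N(0, 1)
    A_hat = x.sum(axis=1) / (N + 2)           # the biased "sample mean"
    bias = A_hat.mean() - A                   # Monte Carlo estimate of B
    # theory: E{A_hat} = N A/(N+2), so B = -2A/(N+2) -> 0 as N -> infinity
    print(N, bias, -2 * A / (N + 2))
```

The empirical bias shrinks towards zero as N grows, in line with the asymptotic unbiasedness above.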
How about the variance?
◦ It is desirable that an estimator be either unbiased or asymptotically unbiased (think about the power of the estimation error due to a DC offset)

◦ For an estimate to be meaningful, it is necessary that we use the available statistics effectively, that is,

Var → 0 as N → ∞

or, in other words,

lim_{N→∞} var{θ̂_N} = lim_{N→∞} E{ |θ̂_N − E{θ̂_N}|² } = 0

If θ̂_N is unbiased, then E{θ̂_N} = θ, and from the Tchebycheff inequality, ∀ ε > 0,

Pr{ |θ̂_N − θ| ≥ ε } ≤ var{θ̂_N} / ε²

⇒ if Var → 0 as N → ∞, then the probability that θ̂_N differs by more than ε from the true value will go to zero (showing consistency).

In this case, θ̂_N is said to converge to θ with probability one.
Mean square convergence
Another form of convergence, stronger than convergence with probability one, is mean square convergence.

An estimate θ̂_N is said to converge to θ in the mean-square sense if

lim_{N→∞} E{ |θ̂_N − θ|² } = 0

where the expectation being driven to zero is the mean square error.
◦ For an unbiased estimator this is equivalent to the previous condition that the variance of the estimate goes to zero

◦ An estimate is said to be consistent if it converges, in some sense, to the true value of the parameter

◦ We say that the estimator is consistent if it is asymptotically unbiased and has a variance that goes to zero as N → ∞
Example: Assessing the performance of the Sample Mean as an estimator
Consider the estimation of a DC level, A, in random noise, which could be modelled as

x[n] = A + w[n]

where w[n] is some zero-mean random process.
◦ Aim: to estimate A given {x[0], x[1], . . . , x[N − 1]}
◦ Intuitively, the sample mean is a reasonable estimator
Â = (1/N) Σ_{n=0}^{N−1} x[n]

Q1: How close will Â be to A?
Q2: Are there better estimators than the sample mean?
Mean and variance of the Sample Mean estimator
x[n] = A + w[n],    w[n] ∼ N (0, σ²)

Estimator = f(random data) ⇒ a random variable itself

⇒ its performance must be judged statistically
(1) What is the mean of Â?

E{Â} = E{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N) Σ_{n=0}^{N−1} E{x[n]} = A ⇒ unbiased

(2) What is the variance of Â?

Assumption: the samples of w[n] are uncorrelated. Since Â is unbiased, E{(Â − A)²} = var{Â}, and

var{Â} = var{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N²) Σ_{n=0}^{N−1} var{x[n]} = (1/N²) · N σ² = σ²/N

Notice the variance → 0 as N → ∞ ⇒ consistent (see your P&A sets)
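Both results can be confirmed with a short Monte Carlo simulation (a NumPy sketch, not from the lecture; the values of A, σ and N are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
A, sigma, N, trials = 5.0, 2.0, 50, 100000        # illustrative values

x = A + sigma * rng.standard_normal((trials, N))  # x[n] = A + w[n], w ~ N(0, sigma^2)
A_hat = x.mean(axis=1)                            # one sample-mean estimate per trial

emp_mean = A_hat.mean()   # should approach A          (unbiased)
emp_var = A_hat.var()     # should approach sigma^2/N  (-> 0 as N grows: consistent)
print(emp_mean, emp_var, sigma**2 / N)
```

Increasing N shrinks the spread of the estimates exactly as the σ²/N law predicts.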
Minimum Variance Unbiased (MVU) estimation
Aim: to establish “good” estimators of unknown deterministic parameters
Unbiased estimator: "on the average" yields the true value of the unknown parameter, independent of its particular value, i.e.

E(θ̂) = θ,    a < θ < b

where (a, b) denotes the range of possible values of θ

Example: Unbiased estimator for a DC level in White Gaussian Noise (WGN). If we are given

x[n] = A + w[n],    n = 0, 1, . . . , N − 1

where A is the unknown, but deterministic, parameter to be estimated, which lies within the interval (−∞, ∞), then the sample mean can be used as an estimator of A, namely

Â = (1/N) Σ_{n=0}^{N−1} x[n]
Careful: the estimator is parameter dependent!
An estimator may be unbiased for certain values of the unknown parameter but not for all; such an estimator is not unbiased.

Consider another sample mean estimator:

Ǎ = (1/(2N)) Σ_{n=0}^{N−1} x[n]

Therefore: E{Ǎ} = 0 when A = 0, but E{Ǎ} = A/2 when A ≠ 0 (parameter dependent)

Hence Ǎ is not an unbiased estimator

◦ A biased estimator introduces a "systematic error" which should not generally be present

◦ Our goal is to avoid bias if we can, as we are interested in stochastic signal properties and bias is largely deterministic
Remedy
(also look in the Assignment dealing with PSD in your CW )
Several unbiased estimates of the same quantity may be averaged together, i.e. given the L independent estimates

{θ̂1, θ̂2, . . . , θ̂L}

we may choose to average them, to yield

θ̂ = (1/L) Σ_{l=1}^{L} θ̂l
Our assumption was that the individual estimators are unbiased, with equalvariance, and uncorrelated with one another.
Then (NB: averaging biased estimators will not remove the bias)

E{θ̂} = θ

and

var{θ̂} = (1/L²) Σ_{l=1}^{L} var{θ̂l} = (1/L) var{θ̂l}

Note, as L → ∞, θ̂ → θ (consistent)
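The 1/L variance reduction can be checked numerically (a NumPy sketch; θ, the per-estimate variance and L are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, var_single, L, trials = 1.0, 4.0, 16, 50000   # illustrative values

# L independent, unbiased estimates per trial, each with variance var_single
est = theta + np.sqrt(var_single) * rng.standard_normal((trials, L))
avg = est.mean(axis=1)              # the averaged estimator

print(avg.mean())                   # ~ theta          (still unbiased)
print(avg.var())                    # ~ var_single / L (L-fold variance reduction)
```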
Effects of averaging for real world data
Problem 3.4 from your P/A sets: heart rate estimation
The heart rate, h, of a patient is automatically recorded by a computer every 100 ms. In one second, the measurements {ĥ1, ĥ2, . . . , ĥ10} are averaged to obtain h̄. Given that E{ĥi} = αh for some constant α, and var(ĥi) = 1 for all i, determine whether averaging improves the estimator for α = 1 and α = 1/2.

h̄ = (1/10) Σ_{i=1}^{10} ĥi,    E{h̄} = (1/10) Σ_{i=1}^{10} αh = αh

If α = 1 the estimator is unbiased; if α = 1/2 it will not be unbiased unless the estimator is formed as h̄ = (1/5) Σ_{i=1}^{10} ĥi.

var{h̄} = (1/10²) Σ_{i=1}^{10} var{ĥi} = 1/10
[Figure: PDFs p(ĥi) before and after averaging, for α = 1 (centred at the true h) and α = 1/2 (centred at h/2); averaging narrows both PDFs but does not move their centres]
Minimum variance criterion
⇒ An optimality criterion is necessary to define an optimal estimator
Mean Square Error (MSE):

MSE(θ̂) = E{ (θ̂ − θ)² }

measures the average squared deviation of the estimator from the true value.

This criterion leads, however, to unrealisable estimators, namely ones which are not solely a function of the data:

MSE(θ̂) = E{ [(θ̂ − E(θ̂)) + (E(θ̂) − θ)]² } = var(θ̂) + [E(θ̂) − θ]² = var(θ̂) + B²(θ̂)
⇒ MSE = VARIANCE OF THE ESTIMATOR + SQUARED BIAS
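This decomposition can be confirmed numerically; with the sample-based definitions of MSE, variance and bias it even holds as an exact algebraic identity within one simulation. A NumPy sketch (illustrative values; the deliberately biased 1/(N + 2) scaling echoes an earlier example):

```python
import numpy as np

rng = np.random.default_rng(3)
A, N, trials = 2.0, 20, 200000           # illustrative DC level in unit-variance WGN

x = A + rng.standard_normal((trials, N))
A_hat = x.sum(axis=1) / (N + 2)          # a deliberately biased estimator

mse = np.mean((A_hat - A) ** 2)          # sample version of E{(estimate - truth)^2}
var = A_hat.var()                        # sample version of the estimator variance
bias = A_hat.mean() - A                  # sample version of B

print(mse, var + bias**2)                # identical up to floating-point error
```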
Example: An MSE estimator with ’gain factor’
Consider the following estimator for a DC level in WGN:

Â = (a/N) Σ_{n=0}^{N−1} x[n]

Task: Find the a which results in minimum MSE

Given

E{Â} = aA    and    var(Â) = a²σ²/N

we have

MSE(Â) = a²σ²/N + (a − 1)²A²
Of course, the choice a = 1 removes the bias; but does it also minimise the MSE?
Continued: MSE estimator with ’gain’
But, can we find a analytically? Differentiating with respect to a yields

∂MSE(Â)/∂a = 2aσ²/N + 2(a − 1)A²

and setting the result to zero gives the optimal value

a_opt = A² / (A² + σ²/N)

but we do not know the value of A
◦ The optimal value depends upon A which is the unknown parameter
◦ Comment: any criterion which depends on the value of the unknown parameter to be found is likely to yield unrealisable estimators
◦ Practically, the minimum MSE estimator needs to be abandoned, andthe estimator must be constrained to be unbiased
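Plugging in numbers makes the difficulty concrete (a small sketch; the values of A, σ² and N are illustrative assumptions, and A would in practice be unknown):

```python
# illustrative values for the DC-level-in-WGN example (A is really unknown)
A, sigma2, N = 1.0, 2.0, 10

def mse(a):
    # MSE(A_hat) = a^2 sigma^2 / N + (a - 1)^2 A^2
    return a**2 * sigma2 / N + (a - 1) ** 2 * A**2

a_opt = A**2 / (A**2 + sigma2 / N)   # depends on the unknown A -> unrealisable
print(a_opt, mse(a_opt), mse(1.0))   # MSE at a_opt is below MSE at a = 1
```

Changing the assumed A moves a_opt, which is exactly why the minimum-MSE estimator cannot be realised from the data alone.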
A counter-example: A little bias can help
(but the estimator is difficult to control)
Q: Let {y[n]}, n = 1, . . . , N, be iid Gaussian variables ∼ N (0, σ²). Consider the following estimate of σ²:

σ̂² = (α/N) Σ_{n=1}^{N} y²[n],    α > 0

Find the α which minimises the MSE of σ̂².
S: It is straightforward to show that E{σ̂²} = ασ², and

MSE(σ̂²) = E{(σ̂² − σ²)²} = E{σ̂⁴} + σ⁴(1 − 2α)

= (α²/N²) Σ_{n=1}^{N} Σ_{s=1}^{N} E{y²[n] y²[s]} + σ⁴(1 − 2α)

= (α²/N²)(N²σ⁴ + 2Nσ⁴) + σ⁴(1 − 2α) = σ⁴[ α²(1 + 2/N) + (1 − 2α) ]

The minimum MSE is obtained for α_min = N/(N + 2) and has the value min MSE(σ̂²) = 2σ⁴/(N + 2). Given that the minimum variance of an unbiased estimator (the CRLB, treated later) is 2σ⁴/N, this is an example of a biased estimator which achieves a lower MSE than the CRLB.
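A Monte Carlo sketch of this result (NumPy; σ², N and the trial count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, N, trials = 1.0, 10, 200000          # illustrative values

y = np.sqrt(sigma2) * rng.standard_normal((trials, N))
sums = (y**2).sum(axis=1)

mse_alpha1 = np.mean((sums / N - sigma2) ** 2)        # alpha = 1
alpha = N / (N + 2)
mse_mmse = np.mean((alpha * sums / N - sigma2) ** 2)  # alpha = N/(N+2)

# theory: MSE = sigma^4 [alpha^2 (1 + 2/N) + 1 - 2 alpha]
print(mse_alpha1)   # ~ 2 sigma^4 / N       = 0.2 here
print(mse_mmse)     # ~ 2 sigma^4 / (N + 2), i.e. below the CRLB of 2 sigma^4 / N
```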
Example: Estimation of hearing threshold (hearing aids)
Objective estimation of hearing threshold:
◦ Design AM or FM stimuli with carrier frequency up to 20 kHz
◦ Modulate this carrier with e.g. 40 Hz envelope
◦ Present this sound to the listener
◦ Measure the electroencephalogram, which should contain a 40 Hz signal if the subject was able to detect the sound
◦ This is a good objective estimate of the hearing curve
Key points
Simulation: Auditory steady state response (ASSR), 40 Hz amplitude modulated
◦ An estimator is a random variable; its performance can only be completely described statistically or by a PDF.

◦ An estimator's performance cannot be conclusively assessed by one computer simulation; we often average M trials of independent simulations, called "Monte Carlo" analysis.

◦ Performance and computational complexity are often traded: the optimal estimator is replaced by a suboptimal but realisable one.
[Figure: EEG time-series excerpts (left column) and the corresponding PSD estimates over 20–45 Hz (right column); the bottom-right panel shows the averaged PSD]

Simulation: Left column: the 24 s of in-the-ear (ITE) EEG, sampled at 256 Hz, was split into 3 segments of length 8 s. Right column: the respective periodograms (rectangular window). Bottom right: the averaged periodogram; observe the reduction in variance (in red).
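The variance reduction from averaging periodograms can be sketched numerically (NumPy; the 40 Hz tone amplitude, segment count and white-noise model are illustrative assumptions, not the recorded EEG):

```python
import numpy as np

rng = np.random.default_rng(5)
fs, seg_len, n_seg = 256, 8 * 256, 64      # 8 s segments at 256 Hz, as on the slide

t = np.arange(seg_len) / fs
# white noise plus a weak 40 Hz line; amplitudes here are purely illustrative
x = rng.standard_normal((n_seg, seg_len)) + 0.2 * np.sin(2 * np.pi * 40 * t)

psd = np.abs(np.fft.rfft(x, axis=1)) ** 2 / seg_len   # raw periodogram per segment
avg_psd = psd.mean(axis=0)                            # Bartlett-style averaging

freqs = np.fft.rfftfreq(seg_len, 1 / fs)
bin40 = int(np.argmin(np.abs(freqs - 40)))            # bin holding the 40 Hz line
noise_bins = freqs > 60                               # bins away from the line

# averaging keeps the mean level but cuts the variance roughly n_seg-fold
print(psd[0, noise_bins].var() / avg_psd[noise_bins].var())
```

The 40 Hz peak survives the averaging while the noise floor becomes much smoother, which is exactly what makes the response easier to detect.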
Desired: minimum variance unbiased (MVU) estimator
Minimising the variance of an unbiased estimator concentrates the PDF of the estimation error about zero ⇒ the estimation error is therefore less likely to be large
◦ Existence of the MVU estimator
The MVU estimator is an unbiased estimator with minimum variance for all θ; that is, the estimator θ̂3 on the accompanying graph.
Methods to find the MVU estimator
◦ The MVU estimator may not always exist
◦ A single unbiased estimator may not exist – in which case a search for the MVU is fruitless!

1. Determine the Cramer-Rao lower bound (CRLB) and find some estimator which satisfies it
2. Apply the Rao-Blackwell-Lehmann-Scheffe (RBLS) theorem
3. Restrict the class of estimators to be not only unbiased, but also linear(BLUE)
4. Sequential vs. block estimators
5. Adaptive estimators
Extensions to the vector parameter case
◦ If θ = [θ1, θ2, . . . , θp]^T ∈ R^{p×1} is a vector of unknown parameters, an estimator is unbiased if

E(θ̂i) = θi,    ai < θi < bi,    for i = 1, 2, . . . , p

and, by defining E(θ̂) = [E(θ̂1), E(θ̂2), . . . , E(θ̂p)]^T, an unbiased estimator has the property E(θ̂) = θ within the p-dimensional space of parameters

◦ An MVU estimator has the additional property that var(θ̂i), for i = 1, 2, . . . , p, is minimum among all unbiased estimators
Summary and food for thoughts
◦ We are now equipped with performance metrics for assessing the goodness of any estimator (bias, variance, MSE)
◦ Since MSE = var + bias2, some biased estimators may yield low MSE.However, we prefer the minimum variance unbiased (MVU) estimators
◦ Even a simple Sample Mean estimator is a very rich example of the advantages of statistical estimators

◦ The knowledge of the parametrised PDF p(data; parameters) is very important for designing efficient estimators

◦ We have introduced statistical "point estimators"; would it be useful to also know the "confidence" we have in our point estimate?

◦ In many disciplines it is useful to design so-called "set membership estimates", where the output of an estimator belongs to a pre-defined bound (range) of values
◦ We will next address linear, best linear unbiased, maximum likelihood,least squares, sequential least squares, and adaptive estimators
Homework: Check another proof for the MSE expression
MSE(θ̂) = var(θ̂) + bias²(θ̂)

Note: var(x) = E[x²] − [E[x]]²    (∗)

Idea: let x = θ̂ − θ and substitute into (∗) to give

var(θ̂ − θ) = E[(θ̂ − θ)²] − [E[θ̂ − θ]]²    (∗∗)

with term (1) on the left-hand side and terms (2) and (3) on the right. Let us now evaluate these terms:

(1) var(θ̂ − θ) = var(θ̂), since θ is deterministic

(2) E[(θ̂ − θ)²] = MSE

(3) [E[θ̂ − θ]]² = [E[θ̂] − θ]² = bias²(θ̂)

Substitute (1), (2), (3) into (∗∗) to give

var(θ̂) = MSE − bias²(θ̂) ⇒ MSE = var(θ̂) + bias²(θ̂)
Recap: Unbiased estimators
Due to the linearity of the expectation operator E{·}, that is

E{a + b} = E{a} + E{b}

the sample mean estimator can simply be shown to be unbiased, i.e.

E{Â} = (1/N) Σ_{n=0}^{N−1} E{x[n]} = (1/N) Σ_{n=0}^{N−1} A = A

◦ In some applications, the value of A may be constrained to be positive: a component value, such as an inductor, capacitor or resistor, would be positive (prior knowledge)

◦ For N data points in random noise, unbiased estimators generally have symmetric PDFs centred about their true value, i.e.

Â ∼ N (A, σ²/N)