Least Squares Estimate Additional Notes 1


Page 1: Least Squares Estimate, Additional Notes

Page 2: Introduction

• The quality of an estimate can be judged using the expected value and the covariance matrix of the estimated parameters.

• The following theorem is an important tool to determine the quality of the least squares estimate.

Page 3: Assumptions

• Assume that there exists a true parameter vector θ0 so that the data satisfy:

  $y(k) = \varphi^T(k)\,\theta_0 + e(k)$

• The stochastic process e(k) is zero-mean white noise, with variance σ2 at every step.

• Intuition: The assumption says that the true system can be represented by the chosen model, up to some errors that are well-behaved in a statistical sense.

Page 4: Theorem

• Part 1: The solution of the least-squares problem is an unbiased estimate of θ0, that is:

  $E[\hat{\theta}] = \theta_0$

• Part 2: The covariance matrix P of the estimated parameter vector using least squares is given by

  $P = \operatorname{cov}(\hat{\theta}) = \sigma^2 (\Phi^T \Phi)^{-1} \in \mathbb{R}^{n \times n}$
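Both parts of the theorem can be checked numerically. The sketch below is not part of the original notes; the regressor matrix, true parameter vector, and noise level are arbitrary choices. It simulates many datasets from the assumed model and compares the sample mean and covariance of the least-squares estimates with the theoretical values.

```python
import numpy as np

# Monte Carlo check of the theorem (illustrative sketch; Phi, theta0,
# and sigma are arbitrary assumptions, not values from the notes).
rng = np.random.default_rng(0)

theta0 = np.array([0.7, 0.5])    # assumed true parameter vector
sigma = 0.1                      # standard deviation of the white noise e(k)
Phi = rng.normal(size=(50, 2))   # fixed regressor matrix

estimates = []
for _ in range(20000):
    # Generate data satisfying the model assumption y = Phi @ theta0 + e.
    y = Phi @ theta0 + sigma * rng.normal(size=50)
    theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    estimates.append(theta_hat)
estimates = np.array(estimates)

# Part 1: the sample mean of the estimates should be close to theta0.
print(estimates.mean(axis=0))
# Part 2: the sample covariance should be close to sigma^2 (Phi^T Phi)^{-1}.
P_theory = sigma**2 * np.linalg.inv(Phi.T @ Phi)
print(np.cov(estimates.T))
```

With 20000 repetitions the sample mean and covariance agree with the theorem to within simulation noise.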

Page 5: Remark on Part 1

Part 1 of the theorem says that the solution makes (statistical) sense on average.

Page 6: Remarks on Part 2

• In the covariance matrix

  $P = \operatorname{cov}(\hat{\theta}) = \sigma^2 (\Phi^T \Phi)^{-1}$,

  the i-th main-diagonal element $P_{ii}$ is the variance of the i-th parameter $\hat{\theta}_i$. The lower the variance of the estimated parameter, the higher the confidence that $\hat{\theta}_i$ is close to the true value $\theta_i$.

• Also, it can be seen that smaller errors e(k) (i.e. a smaller variance σ2) yield a smaller parameter covariance.

Page 7: Remarks on Part 2, continued

• The calculation of the covariance matrix P assumes that we know the variance σ2 of the measurement error e, which is not known beforehand. A valid estimate of σ2 is given by

  $\hat{\sigma}^2 = \frac{V(\hat{\theta}_{LS})}{N - n} = \frac{e^T e}{N - n}$

  where N is the number of I/O data samples and n is the number of parameters to be estimated.

• Recall that the cost function evaluated at $\hat{\theta}_{LS}$ is

  $V(\hat{\theta}_{LS}) = e^T e = (Y - \Phi\hat{\theta}_{LS})^T (Y - \Phi\hat{\theta}_{LS}) = Y^T Y - Y^T \Phi (\Phi^T \Phi)^{-1} \Phi^T Y$

Page 8: Confidence intervals for the parameters

• If the residuals e are normally distributed, the confidence interval of the parameters can be computed. The confidence limits give a rough idea of the significance of the estimates.

• For example, when a parameter that is expected to lie between 0 and 1 turns out to be 0.4 with a 95% confidence region of ±0.005, it is safe to say that this is a good estimate of the parameter. However, when the 95% confidence region around the parameter is ±2, the estimate is meaningless.

Page 9: Confidence intervals for the parameters

For any Gaussian distribution N(µ, σ2), the probability of drawing a sample in the interval [µ−1.96σ, µ+1.96σ] is found (from the Gaussian formula) to be 0.95.

Page 10: Confidence intervals for the parameters

• For N ≫ n, $\hat{\theta}_i$ is distributed according to the Gaussian distribution $N(\theta_i, P_{ii})$. Therefore:

  $\Pr\!\left(\hat{\theta}_i - 1.96\sqrt{P_{ii}} \le \theta_i \le \hat{\theta}_i + 1.96\sqrt{P_{ii}}\right) = 0.95$

• Hence, the 95% confidence limits of the true parameter θi are

  $\hat{\theta}_i \pm 1.96\sqrt{P_{ii}}$

• Note that this interval is valid if the off-diagonal elements of P are small compared to the main-diagonal elements. In practice, this requirement can be relaxed, and the given interval is sufficient for deciding whether an estimate should be accepted or rejected.
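The confidence limits translate directly into a short computation. A minimal sketch, using the parameter estimates and covariance matrix from the worked example later in these notes:

```python
import numpy as np

# Sketch: 95% confidence limits theta_hat_i +/- 1.96*sqrt(P_ii),
# with numbers taken from the worked example in these notes.
theta_hat = np.array([0.718, 0.565])
P = 1e-5 * np.array([[0.345, -0.2415],
                     [-0.2415, 0.6044]])   # cov(theta_hat)

half_width = 1.96 * np.sqrt(np.diag(P))    # half-width of each 95% interval
for th, hw in zip(theta_hat, half_width):
    print(f"{th:.3f} +/- {hw:.4f}")
```

Only the diagonal of P enters the interval, which is why the formula requires the off-diagonal terms to be comparatively small.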

Page 11: Example

• Consider the following model and the input-output data:

  $y(k) = a\,y(k-1) + b\,u(k-1) + e(k)$

  k    | 0 | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8
  u(k) | 1 | 1    | 1    | 1    | 0    | 0    | 0    | 0    | *
  y(k) | 0 | 0.57 | 0.97 | 1.26 | 1.47 | 1.06 | 0.76 | 0.54 | 0.39

• Use the least squares method to find the unknown parameters a and b, along with their confidence intervals.

Page 12: Solution

• First, we form the vector y and the matrix Φ as

  $y = \begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(8) \end{bmatrix} = \begin{bmatrix} 0.57 \\ 0.97 \\ 1.26 \\ 1.47 \\ 1.06 \\ 0.76 \\ 0.54 \\ 0.39 \end{bmatrix}, \quad \Phi = \begin{bmatrix} y(0) & u(0) \\ y(1) & u(1) \\ \vdots & \vdots \\ y(7) & u(7) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0.57 & 1 \\ 0.97 & 1 \\ 1.26 & 1 \\ 1.47 & 0 \\ 1.06 & 0 \\ 0.76 & 0 \\ 0.54 & 0 \end{bmatrix}$

Page 13: Solution, continued

• We calculate the least squares estimate of a and b as

  $\hat{\theta} = \begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} = (\Phi^T \Phi)^{-1} \Phi^T y = \begin{bmatrix} 0.718 \\ 0.565 \end{bmatrix}$

• Hence, the model of the least squares fit is

  $y(k) = 0.718\,y(k-1) + 0.565\,u(k-1)$
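The fit can be reproduced with a few lines of NumPy; this sketch is not part of the original notes but uses exactly the data and formula above.

```python
import numpy as np

# Reproducing the example's least-squares fit.
y_data = np.array([0.0, 0.57, 0.97, 1.26, 1.47, 1.06, 0.76, 0.54, 0.39])  # y(0)..y(8)
u_data = np.array([1.0, 1, 1, 1, 0, 0, 0, 0])                             # u(0)..u(7)

y = y_data[1:]                                # left-hand sides y(1)..y(8)
Phi = np.column_stack([y_data[:-1], u_data])  # rows [y(k-1), u(k-1)]

# theta_hat = (Phi^T Phi)^{-1} Phi^T y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
a_hat, b_hat = theta_hat
print(round(float(a_hat), 3), round(float(b_hat), 3))   # 0.718 0.565
```

Solving the 2x2 normal equations with `np.linalg.solve` is equivalent to the matrix inverse written in the slides, but numerically better behaved.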

Page 14: Solution, continued

• The residual errors e can be calculated as:

  $e = y - \Phi\hat{\theta} = \begin{bmatrix} 0.005 & -0.0042 & -0.0013 & 0.0005 & 0.0047 & -0.0009 & -0.0056 & 0.0024 \end{bmatrix}^T$

• The number of data points is N = 8, and the number of unknown parameters is n = 2. Hence, an estimate of the noise variance is

  $\hat{\sigma}^2 = \frac{V(\hat{\theta})}{N - n} = \frac{e^T e}{8 - 2} = 1.74 \times 10^{-5}$
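The residuals and the noise-variance estimate can likewise be computed numerically. This sketch repeats the fit from the example and then applies the estimator from Page 7:

```python
import numpy as np

# Residuals and noise-variance estimate for the example (sketch).
y_data = np.array([0.0, 0.57, 0.97, 1.26, 1.47, 1.06, 0.76, 0.54, 0.39])
u_data = np.array([1.0, 1, 1, 1, 0, 0, 0, 0])
y = y_data[1:]
Phi = np.column_stack([y_data[:-1], u_data])
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

e = y - Phi @ theta_hat          # residual errors
N, n = Phi.shape                 # N = 8 samples, n = 2 parameters
sigma2_hat = (e @ e) / (N - n)   # sigma_hat^2 = e^T e / (N - n)
print(sigma2_hat)                # approximately 1.74e-5
```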

Page 15: Solution, continued

• The covariance matrix of the estimated parameters is

  $P = \operatorname{cov}(\hat{\theta}) = \hat{\sigma}^2 (\Phi^T \Phi)^{-1} = \begin{bmatrix} 0.345 & -0.2415 \\ -0.2415 & 0.6044 \end{bmatrix} \times 10^{-5}$

• The relatively large magnitudes of the off-diagonal terms in P indicate that there is a strong interaction between the estimates. This is typical in dynamic systems.

Page 16: Solution, continued

• The 95% confidence limits are roughly given by

  $\hat{a} = 0.718 \pm 1.96\sqrt{0.345 \times 10^{-5}} = 0.718 \pm 0.0036$

  $\hat{b} = 0.565 \pm 1.96\sqrt{0.6044 \times 10^{-5}} = 0.565 \pm 0.0048$

• The confidence regions are so small that the estimates can be accepted without much reservation.

Page 17: Visualization of the confidence interval

• To visualize the confidence intervals, imagine that we repeat the previous experiment 1000 times.

• In each experiment, we use 8 input-output data points and least squares to find an estimate of the parameters.

• The empirical distributions (histograms) of the estimated parameters are shown on the next slide.
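The repeated experiment can be sketched in code. The simulation below assumes that the fitted values 0.718 and 0.565 play the role of the true parameters and that the noise variance equals the estimate 1.74e-5; the spread of the 1000 re-estimated parameters should then be close to the square roots of the diagonal of P.

```python
import numpy as np

# Sketch of the repeated experiment: treat the fitted values as the true
# parameters and the estimated noise variance as the true one (assumptions),
# regenerate the 8 outputs 1000 times, and re-estimate (a, b) each time.
rng = np.random.default_rng(0)
a0, b0 = 0.718, 0.565
sigma = np.sqrt(1.74e-5)                 # assumed noise standard deviation
u = np.array([1.0, 1, 1, 1, 0, 0, 0, 0])

estimates = []
for _ in range(1000):
    y = np.zeros(9)
    for k in range(1, 9):                # y(k) = a0 y(k-1) + b0 u(k-1) + e(k)
        y[k] = a0 * y[k-1] + b0 * u[k-1] + sigma * rng.normal()
    Phi = np.column_stack([y[:-1], u])
    estimates.append(np.linalg.solve(Phi.T @ Phi, Phi.T @ y[1:]))
estimates = np.array(estimates)

# The empirical spread should roughly match sqrt(P_ii) from the notes.
print(estimates.std(axis=0))
```

The histogram of `estimates` is what the next slide visualizes; its width is described by the confidence interval.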

Page 18: Visualization of the confidence interval

The confidence interval describes roughly the width of the histogram.

Page 19: Acknowledgement

Most of the material in this course is due to:

(1) Prof. Abdel-Latif El-Shafei, Department of Electrical Power Engineering, Cairo University.

(2) Dr Lucian Busoniu, Technical University of Cluj-Napoca, Romania. http://busoniu.net/teaching/sysid2014/

(3) Prof. Xia Hong, University of Reading, England. http://www.personal.reading.ac.uk/~sis01xh/lecturenotes.html