
APPLIED OPTIMAL ESTIMATION

written by Technical Staff
THE ANALYTIC SCIENCES CORPORATION

edited by Arthur Gelb

principal authors
Arthur Gelb
Joseph F. Kasper, Jr.
Raymond A. Nash, Jr.
Charles F. Price
Arthur A. Sutherland, Jr.

THE M.I.T. PRESS
Massachusetts Institute of Technology
Cambridge, Massachusetts, and London, England


Sixteenth printing, 2001

Copyright © 1974 by The Analytic Sciences Corporation

All rights reserved. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from The Analytic Sciences Corporation. This book was printed and bound in the United States of America.

Library of Congress Catalog Card Number: 74-1604
ISBN 0-262-20027-9 (hard)
ISBN 0-262-57048-3 (paper)

This book was designed by TASC.

FOREWORD

Estimation is the process of extracting information from data - data which can be used to infer the desired information and may contain errors. Modern estimation methods use known relationships to compute the desired information from the measurements, taking account of measurement errors, the effects of disturbances and control actions on the system, and prior knowledge of the information. Diverse measurements can be blended to form "best" estimates, and information which is unavailable for measurement can be approximated in an optimal fashion. The intent of this book is to enable readers to achieve a level of competence that will permit their participation in the design and evaluation of practical estimators. Therefore, the text is oriented to the applied rather than theoretical aspects of optimal estimation. It is our intent throughout to provide a simple and interesting picture of the central issues underlying modern estimation theory and practice. Heuristic, rather than theoretically elegant, arguments are used extensively, with emphasis on physical insights and key questions of practical importance.

The text is organized into three principal parts. Part I introduces the subject matter and provides brief treatments of the underlying mathematics. Chapter 1 presents a brief overview, including a historical perspective; Chapters 2 and 3 treat the mathematics underlying random process theory and state-space characterization of linear dynamic systems, both of which are essential prerequisites to understanding optimal estimation theory. Part II provides derivations, interpretations and examples pertinent to the theory of optimal estimation. Thus, Chapters 4 and 5 address optimal linear filtering and

Page 3: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

smoothing, respectively, while Chapter 6 addresses the subject of nonlinear filtering and smoothing. Part III treats those practical issues which often mean the difference between success or failure of the implemented optimal estimator. The practical and often pivotal issues of suboptimal filtering, sensitivity analysis and implementation considerations are discussed at some length in Chapters 7 and 8. Additional topics of practical value are presented in Chapter 9; these include refinements and other viewpoints of estimation theory, and the close connection of the mathematics which underlie both optimal linear estimation theory and optimal linear control theory.

Many illustrative examples have been interspersed throughout the text to assist in effective presentation of the theoretical material. Additionally, problems with "built-in" answers have been included at the end of each chapter, to further enable self-study of the subject matter.

This book is the outgrowth of a course taught by The Analytic Sciences Corporation (TASC) at a number of U.S. Government facilities. The course notes were, in turn, based on the considerable practical experience of TASC in applying modern estimation theory to large-scale systems of diverse nature. Thus, virtually all Members of the Technical Staff of TASC have, at one time or another, contributed to the material contained herein. It is a pleasure to specifically acknowledge those current members of TASC who, in addition to the principal authors, have directly contributed to the writing of this book. Bard S. Crawford provided a complete section; Julian L. Center, Jr., Joseph A. D'Appolito, and Ronald S. Warren contributed through providing text additions, technical comments and insights of a very diverse nature. Other individuals whose contributions are acknowledged are Robert G. Bellaire, Norman H. Josephy, William F. O'Halloran, Jr. and Bahar J. Uttam. William R. Sullivan and Vicky M. Koczerga created all the artwork. Renwick E. Curry, John J. Deyst, Jr. and Professor Wallace E. Vander Velde of M.I.T. contributed through technical discussions and by providing some problems for inclusion at the end of several chapters. Professor Charles E. Hutchinson, University of Massachusetts, contributed through his participation in early TASC work which set the stage for this book. Special acknowledgement is due Harry B. Silverman of TASC for his encouragement of the project from its inception. Finally, this most recent printing benefits considerably from the careful reading given earlier printings by those individuals who have provided us with comments and corrections.

THE ANALYTIC SCIENCES CORPORATION
Reading, Massachusetts

Arthur Gelb
16 July 1979

CONTENTS

Foreword v

Chapter 1 Introduction 1

Chapter 2 Review of Underlying Mathematical Techniques 10
2.1 Vectors, Matrices, and Least Squares 10
2.2 Probability and Random Processes 24

Chapter 3 Linear Dynamic Systems 51
3.1 State-Space Notation 51
3.2 Transition Matrix 57
3.3 Matrix Superposition Integral 63
3.4 Discrete Formulation 66
3.5 System Observability and Controllability 67
3.6 Covariance Matrix 72
3.7 Propagation of Errors 75
3.8 Modeling and State Vector Augmentation 78
3.9 Empirical Model Identification 84

Chapter 4 Optimal Linear Filtering 102
4.1 Recursive Filters 105
4.2 Discrete Kalman Filter 107
4.3 Continuous Kalman Filter 119
4.4 Intuitive Concepts 127
4.5 Correlated Measurement Errors 133
4.6 Solution of the Riccati Equation 136
4.7 Statistical Steady State - The Wiener Filter 142

Chapter 5 Optimal Linear Smoothing 156
5.1 Form of the Optimal Smoother 157
5.2 Optimal Fixed-Interval Smoother 160
5.3 Optimal Fixed-Point Smoother 170
5.4 Optimal Fixed-Lag Smoother 173

Chapter 6 Nonlinear Estimation 180
6.1 Nonlinear Minimum Variance Estimation 182
6.2 Nonlinear Estimation by Statistical Linearization 203
6.3 Nonlinear Least-Squares Estimation 214
6.4 Direct Statistical Analysis of Nonlinear Systems - CADET™ 216

Chapter 7 Suboptimal Filter Design and Sensitivity Analysis 229
7.1 Suboptimal Filter Design 230
7.2 Sensitivity Analysis: Kalman Filter 246
7.3 Sensitivity Analysis Examples 255
7.4 Developing an Error Budget 260
7.5 Sensitivity Analysis: Optimal Smoother 266
7.6 Organization of a Computer Program for Covariance Analysis 268

Chapter 8 Implementation Considerations 277
8.1 Modeling Problems 278
8.2 Constraints Imposed by the Computer 288
8.3 The Inherently Finite Nature of the Computer 292
8.4 Algorithms and Computer Loading Analysis 303

Chapter 9 Additional Topics 316
9.1 Adaptive Kalman Filtering 317
9.2 Observers 320
9.3 Stochastic Approximation 335
9.4 Real-Time Parameter Identification 348
9.5 Optimal Control of Linear Systems 356

Index 371

APPLIED OPTIMAL ESTIMATION


1. INTRODUCTION

HISTORICAL PERSPECTIVE

The development of data processing methods for dealing with random variables can be traced to Gauss (circa 1800), who invented the technique of deterministic least-squares and employed it in a relatively simple orbit measurement problem (Ref. 1). The next significant contribution to the broad subject of estimation theory occurred more than 100 years later when Fisher (circa 1910), working with probability density functions, introduced the approach of maximum likelihood estimation (Ref. 2). Utilizing random process theory, Wiener (circa 1940) set forth a procedure for the frequency domain design of statistically optimal filters (Refs. 3, 4). The technique addressed the continuous-time problem in terms of correlation functions and the continuous filter impulse response. It was limited to statistically stationary processes and provided optimal estimates only in the steady-state regime. In the same time period, Kolmogorov treated the discrete-time problem (Ref. 5). During the next 20 years, Wiener's work was extended - in a way which often required cumbersome calculations - to include nonstationary and multiport systems (Refs. 6-8). Kalman and others (circa 1960) advanced optimal recursive filter techniques based on state-space, time-domain formulations (Refs. 9-13). This approach, now known as the Kalman filter, is ideally suited for digital computer implementation. Indeed, it is the very foundation for data mixing in modern multisensor systems.

It is interesting to see the many similarities between Gauss' work and themore "modern" approaches. As is pointed out in Ref. 14, Gauss notes the need


for redundant data to eliminate the influence of measurement errors; he raises the issue of dynamic modeling of the system under study; he refers to the inaccuracy of observations and thereby sets the stage for probabilistic considerations; he refers to the "suitable combination" of observations which will provide the most accurate estimates and thus touches upon the questions of estimator structure and performance criterion definition; he refers to the number of observations that are absolutely required for determination of the unknown quantities and thus addresses the subject currently referred to as "observability" of the system. Other similarities can also be cited. In fact, it can be argued that the Kalman filter is, in essence, a recursive solution* to Gauss' original least-squares problem.

It is also interesting to note two underlying differences between the classical and modern techniques; namely, the use of random process vs. deterministic signal descriptions and the use of high-speed digital computers to generate numerical solutions vs. the requirement for closed-form "pencil and paper" solutions. The former consideration enables the modern mathematics to more closely characterize physical situations being treated; the latter tremendously broadens the range of problems which may be studied.

OPTIMAL ESTIMATION

An optimal estimator is a computational algorithm that processes measurements to deduce a minimum error† estimate of the state of a system by utilizing: knowledge of system and measurement dynamics, assumed statistics of system noises and measurement errors, and initial condition information. Among the presumed advantages of this type of data processor are that it minimizes the estimation error in a well defined statistical sense and that it utilizes all measurement data plus prior knowledge about the system. The corresponding potential disadvantages are its sensitivity to erroneous a priori models and statistics, and the inherent computational burden. The important concepts embodied in these statements are explored in the sequel.

The three types of estimation problems of interest are depicted in Fig. 1.0-1. When the time at which an estimate is desired coincides with the last measurement point, the problem is referred to as filtering; when the time of interest falls within the span of available measurement data, the problem is termed smoothing; and when the time of interest occurs after the last available measurement, the problem is called prediction.

Figure 1.0-1 Three Types of Estimation Problems (estimate desired at time t; panels (a) filtering, (b) smoothing, (c) prediction; shading denotes the span of available measurement data)

Probably the most common optimal filtering technique is that developed by Kalman (Ref. 10) for estimating the state of a linear system. This technique, depicted in Fig. 1.0-2, provides a convenient example with which to illustrate the capabilities and limitations of optimal estimators. For instance, given a linear system model and any measurements of its behavior, plus statistical models which characterize system and measurement errors, plus initial condition information, the Kalman filter describes how to process the measurement data. However, the Kalman filter per se does not solve the problem of establishing an optimal measurement schedule, or of design in the presence of parameter uncertainties, or of how to deal with computational errors. Other design criteria, in addition to those used to derive the filtering algorithm, must be imposed to resolve these questions.

Figure 1.0-2 Block Diagram Depicting System, Measurement and Estimator (system error sources and measurement error sources act on the system and measurement; the optimal estimator combines the measurements with a priori information to produce the system state estimate)

*A solution that enables sequential, rather than batch, processing of the measurement data.
†In accordance with some stated criterion of optimality.


APPLICATIONS OF OPTIMAL ESTIMATION THEORY

The theory of optimal estimation has application to a tremendously broadrange of problem areas. To cite a few, we have: tracer studies in nuclearmedicine, statistical image enhancement, estimation of traffic densities, chemicalprocess control, estimation of river flows, power system load forecasting,classification of vectorcardiograms, satellite orbit estimation and nuclear reactorparameter identification.

The estimation problem may be posed in terms of a single sensor making measurements on a single process or, more generally, in terms of multiple sensors and multiple processes. The latter case is referred to as a multisensor system. Suppose there are ℓ sensors, which provide measurements on m physical processes. Some of the sensors may measure the same quantity, in which case simple redundant measurements are provided; others may measure quantities related only indirectly to the processes of interest. The estimation problem, in the context of this multisensor system, is to process the sensor outputs such that "best" estimates of the processes of interest are obtained. A computer-implemented data processing algorithm operates on the sensor data to provide the desired estimates. These estimates may be used to drive displays and also to serve as control signals for the physical systems under observation, as illustrated in Fig. 1.0-3. In modern multisensor systems, the data processing algorithm is very often derived using optimal filter theory.

Figure 1.0-3 Modern Multisensor System (sensor data pass through an interface to a data processing algorithm in a digital computer, which drives displays and provides control signals)

Some outstanding examples of multisensor systems occur in the field of navigation. External measurements were originally used to update navigation variables in a deterministic manner; for example, system position indication was changed to agree with the results of an external position measurement. This approach ignored two important facts. First, external measurements themselves contain random errors that may be significant, when compared to the navigation system errors. Secondly, navigation system errors are primarily caused by random, time-varying navigation sensor errors. The optimal use of external measurements, together with those provided by the navigation system, can provide a resulting navigation accuracy which is better than that obtainable from either external measurements or the navigation system alone.

Application of modern estimation techniques to multisensor navigation systems began in the mid-1960's, shortly after optimal recursive filter theory was developed and published. Because the errors in a typical navigation system propagate in essentially a linear manner and linear combinations of these errors can be detected from external measurements, the Kalman filter is ideally suited for their estimation. It also provides useful estimates of all system error sources with significant correlation times. In addition, the Kalman filter provides improved design and operational flexibility. As a time-varying filter, it can accommodate nonstationary error sources when their statistical behavior is known. Configuration changes in the navigation system are relatively easy to effect by programming changes. The Kalman filter provides for optimal use of any number, combination, and sequence of external measurements. It is a technique for systematically employing all available external measurements, regardless of their errors, to improve the accuracy of navigation systems. This application of the theory, among others, is often made in the following chapters.

Perhaps the essential non-hardware issues in any practical application of optimal estimation are those of modeling and realistic performance projections. These issues are, of course, related and their proper treatment is a prerequisite to successful system operation in a real environment. Based upon this perspective, it becomes clear that a reasonable use of estimation theory in any operational system is: first - design and computer evaluation of the "optimal" system behavior; second - design of a suitable "suboptimal" system with cost constraints, sensitivity characteristics, computational requirements, measurement schedules, etc., in mind; and third - construction and test of a prototype system, making final adjustments or changes as warranted.

Example 1.0-1

Consider a system comprised of two sensors, each making a single measurement, zᵢ (i = 1, 2), of a constant but unknown quantity, x, in the presence of random, independent, unbiased measurement errors, vᵢ (i = 1, 2). Design a data processing algorithm that combines the two measurements to produce an optimal estimate of x.

The measurements are described by

$$z_i = x + v_i, \quad i = 1, 2 \tag{1.0-1}$$

In the absence of any other information, we might seek an estimate of x which is a linear function of the measurements in the form (the superscript ^ denotes estimate)

$$\hat{x} = k_1 z_1 + k_2 z_2 \tag{1.0-2}$$


where k₁ and k₂ remain to be specified. Defining the estimation error, x̃, as

$$\tilde{x} = \hat{x} - x$$

we seek to minimize the mean square value of x̃ as the criterion of optimality. Furthermore, we require that our choice of k₁ and k₂ be independent of the value of x; this condition* will hold if the estimate is unbiased - i.e., if

$$E[\tilde{x}] = E[\hat{x} - x] = 0 \tag{1.0-3}$$

where E denotes the ensemble expectation† or average. Performing the indicated expectation, with E[v₁] = E[v₂] = 0 and E[x] = x since x is "nonrandom", we obtain

$$k_1 + k_2 = 1$$

or

$$k_2 = 1 - k_1 \tag{1.0-4}$$

Combining Eqs. (1.0-1) through (1.0-4), the mean square error is computed in the form

$$E[\tilde{x}^2] = E[(k_1 v_1 + (1-k_1)v_2)^2] = k_1^2\sigma_1^2 + (1-k_1)^2\sigma_2^2 \tag{1.0-5}$$

where σ₁² and σ₂² denote the variances of v₁ and v₂, respectively. Differentiating this quantity with respect to k₁ and setting the result to zero yields

$$k_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$$

The algorithm [Eq. (1.0-2)]

$$\hat{x} = \left(\frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right)z_1 + \left(\frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\right)z_2 \tag{1.0-6}$$

makes sense in the various limits of interest - i.e., if σ₁² = σ₂², the measurements are averaged; if one measurement is perfect (σ₁ or σ₂ equal to zero) the other is rejected; etc. The corresponding minimum mean square estimation error is

$$E[\tilde{x}^2] = \frac{\sigma_1^2\,\sigma_2^2}{\sigma_1^2 + \sigma_2^2} \tag{1.0-7}$$

It can be seen that the mean square estimation error is smaller than either of the mean square measurement errors.

*This condition is imposed because x is unknown; hence, the gains k₁ and k₂ must be independent of x.
†The expectation operator is discussed in Section 2.2.
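As an illustrative aside, Eqs. (1.0-6) and (1.0-7) are easily checked by simulation; a minimal Python sketch, assuming normally distributed measurement errors and arbitrary values of x, σ₁ and σ₂:

```python
# Monte Carlo check of Example 1.0-1 (illustrative sketch; values arbitrary).
import numpy as np

rng = np.random.default_rng(1)
x_true = 5.0                       # constant but unknown quantity x
sigma1, sigma2 = 2.0, 1.0          # standard deviations of v1, v2
n = 200_000

z1 = x_true + rng.normal(0.0, sigma1, n)   # Eq. (1.0-1)
z2 = x_true + rng.normal(0.0, sigma2, n)

k1 = sigma2**2 / (sigma1**2 + sigma2**2)   # optimal gain from Eq. (1.0-6)
x_hat = k1 * z1 + (1.0 - k1) * z2          # Eq. (1.0-2)

mse = np.mean((x_hat - x_true) ** 2)
print(mse)                                              # ~0.8
print(sigma1**2 * sigma2**2 / (sigma1**2 + sigma2**2))  # predicted by Eq. (1.0-7)
```

The sample mean square error is smaller than either measurement variance (4.0 and 1.0), as the text asserts.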

In Example 1.0-1, the physical quantity of interest was a scalar constant and only two linear measurements of that constant were available. In general, we shall be interested in estimates of vector time-varying quantities where many measurements are provided.

Physical systems and measurements can be categorized according to whether they are static or dynamic, continuous or discrete, and linear or nonlinear. Furthermore, we shall be interested in real-time data processing algorithms (filtering and prediction) and post-experiment algorithms (smoothing). Due to its notational simplicity and theoretical power, subsequent treatment of these physical systems and data processing algorithms is couched in terms of state space representation, using the mathematics of vectors and matrices and the notions of random process theory. These topics are discussed in Chapters 2 and 3.

REFERENCES

1. Gauss, K.F., Theoria Motus, 1809; also, Theory of the Motion of the Heavenly Bodies About the Sun in Conic Sections, Dover, New York, 1963.

2. Fisher, R.A., "On an Absolute Criterion for Fitting Frequency Curves," Messenger of Math., Vol. 41, 1912, p. 155.

3. Wiener, N., "The Extrapolation, Interpolation and Smoothing of Stationary Time Series," OSRD 370, Report to the Services 19, Research Project DIC-6037, MIT, February 1942.

4. Wiener, N., The Extrapolation, Interpolation and Smoothing of Stationary Time Series, John Wiley & Sons, Inc., New York, 1949.

5. Kolmogorov, A.N., "Interpolation und Extrapolation von Stationären Zufälligen Folgen," Bull. Acad. Sci. USSR, Ser. Math. 5, 1941, pp. 3-14.

6. Zadeh, L.A., and Ragazzini, J.R., "An Extension of Wiener's Theory of Prediction," J. Appl. Phys., Vol. 21, July 1950, pp. 645-655.

7. Laning, J.H., Jr. and Battin, R.H., Random Processes in Automatic Control, McGraw-Hill Book Co., Inc., New York, 1956, Chapter 8.

8. Shinbrot, M., "Optimization of Time-Varying Linear Systems with Nonstationary Inputs," Trans. ASME, Vol. 80, 1958, pp. 457-462.

9. Swerling, P., "First Order Error Propagation in a Stagewise Smoothing Procedure for Satellite Observations," J. Astronautical Sciences, Vol. 6, 1959, pp. 46-52.

10. Kalman, R.E., "A New Approach to Linear Filtering and Prediction Problems," J. Basic Eng., March 1960, pp. 35-46.

11. Kalman, R.E., and Bucy, R.S., "New Results in Linear Filtering and Prediction Theory," J. Basic Eng., March 1961, pp. 95-108.

12. Blum, M., "A Stagewise Parameter Estimation Procedure for Correlated Data," Numer. Math., Vol. 3, 1961, pp. 202-208.

13. Battin, R.H., "A Statistical Optimizing Navigation Procedure for Space Flight," ARS Journal, Vol. 32, 1962, pp. 1681-1696.

14. Sorenson, H.W., "Least-Squares Estimation: From Gauss to Kalman," IEEE Spectrum, July 1970, pp. 63-68.

PROBLEMS

Problem 1-1

Repeat Example 1.0-1 for the case where the measurement errors are correlated; that is,

$$E[v_1 v_2] = \rho\,\sigma_1\sigma_2$$

where ρ is a correlation coefficient (|ρ| ≤ 1). Show that

$$\hat{x} = \frac{(\sigma_2^2 - \rho\sigma_1\sigma_2)\,z_1 + (\sigma_1^2 - \rho\sigma_1\sigma_2)\,z_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}$$

and

$$E[\tilde{x}^2] = \frac{\sigma_1^2\sigma_2^2\,(1 - \rho^2)}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}$$

for the optimal estimate. Physically interpret the meaning of E[x̃²] = 0 for ρ = ±1.

Problem 1-2

Compute E[x̃²] from Eqs. (1.0-1) and (1.0-2) with the gains k₁ and k₂ unrestricted. Show that the values of the gains which minimize E[x̃²] are functions of x, using the fact that E[x] = x and E[x²] = x².

Problem 1-3

Consider a case similar to Example 1.0-1, but in which three independent measurements are available instead of two. Argue why an estimator should be sought in the form

$$\hat{x} = k_1 z_1 + k_2 z_2 + (1 - k_1 - k_2)\,z_3$$

Develop optimum values for k₁ and k₂ and use these to show that

$$E[\tilde{x}^2] = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \frac{1}{\sigma_3^2}\right)^{-1}$$

for the optimal estimate.

Problem 1-4

The concentration of a substance in solution decreases exponentially during an experiment. Noisy measurements of the concentration are made at times t₁ and t₂, such that (i = 1, 2)

$$z_i = x_0\,e^{-t_i} + v_i$$

An estimate of the initial concentration, x₀, is desired. Demonstrate that an unbiased estimator is given by

$$\hat{x}_0 = k\,e^{t_1} z_1 + (1 - k)\,e^{t_2} z_2$$

where k, not yet specified, is a constant. Show that the value of k which minimizes the mean square estimation error is

$$k = \frac{e^{-2t_1}/\sigma_1^2}{e^{-2t_1}/\sigma_1^2 + e^{-2t_2}/\sigma_2^2}$$

and that the corresponding mean square estimation error is

$$E[(\hat{x}_0 - x_0)^2] = \left(\frac{1}{\sigma_1^2}\,e^{-2t_1} + \frac{1}{\sigma_2^2}\,e^{-2t_2}\right)^{-1}$$

where σ₁² and σ₂² are the measurement error variances.
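A brief Python sketch (an illustration with arbitrary choices of x₀, t₁, t₂, σ₁ and σ₂, and assuming the estimator form shown above) verifies that the estimator is unbiased and achieves the stated mean square error:

```python
# Numerical check of the Problem 1-4 error formula (illustrative sketch).
import numpy as np

rng = np.random.default_rng(2)
x0, t1, t2 = 3.0, 0.5, 1.5
sigma1, sigma2 = 0.2, 0.3
n = 200_000

z1 = x0 * np.exp(-t1) + rng.normal(0.0, sigma1, n)
z2 = x0 * np.exp(-t2) + rng.normal(0.0, sigma2, n)

w1 = np.exp(-2 * t1) / sigma1**2        # inverse-variance weights on e^{t_i} z_i
w2 = np.exp(-2 * t2) / sigma2**2
k = w1 / (w1 + w2)                      # optimal k from above
x0_hat = k * np.exp(t1) * z1 + (1 - k) * np.exp(t2) * z2

print(np.mean(x0_hat - x0))             # ~0: the estimator is unbiased
print(np.mean((x0_hat - x0) ** 2))      # matches the predicted value below
print(1.0 / (w1 + w2))                  # (e^{-2t1}/s1^2 + e^{-2t2}/s2^2)^{-1}
```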


2. REVIEW OF UNDERLYING MATHEMATICAL TECHNIQUES


In this chapter, mathematical techniques utilized in the development and application of modern estimation theory are reviewed. These techniques include vector and matrix operations and their application to least-squares estimation. Also included is a presentation on probability, random variables and random processes. Of these, the latter is the most important for our work and, consequently, is given the greatest emphasis. The specific goal is to provide the mathematical tools necessary for development of state vector and covariance matrix methods and their subsequent application.

Note that the material in this chapter is neither a complete discussion of the topics, nor mathematically rigorous. However, more detailed material is provided in the references cited.

2.1 VECTORS, MATRICES, AND LEAST SQUARES

This section contains pertinent definitions and operations for the application of vector and matrix methods in modern estimation. The reader requiring a more complete discussion is referred to Refs. 1-3.

VECTOR OPERATIONS

An array of elements, arranged in a column, is designated by a lowercase letter with an underbar and is referred to as a vector (more precisely, a column vector). The number of elements in the vector is its dimension. Thus, an n-dimensional vector x is:

$$\underline{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{2.1-1}$$

Vector Addition - Addition of two vectors is defined by

$$\underline{x} + \underline{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix} \tag{2.1-2}$$

Vector subtraction is defined in a similar manner. In both vector addition and subtraction, x and y must have the same dimension.

Scalar Multiplication - A vector may be multiplied by a scalar, k, yielding

$$k\underline{x} = \begin{bmatrix} kx_1 \\ kx_2 \\ \vdots \\ kx_n \end{bmatrix} \tag{2.1-3}$$

Zero Vector - The vector 0 is defined as the vector in which every element is the number 0.

Vector Transpose - The transpose of a vector is defined such that, if x is the column vector

$$\underline{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{2.1-4}$$

its transpose is the row vector

$$\underline{x}^T = [x_1 \;\; x_2 \;\; \cdots \;\; x_n] \tag{2.1-5}$$


Inner Product - The quantity xᵀy is referred to as the inner product or dot product and yields the scalar

$$\underline{x}^T\underline{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n \tag{2.1-6}$$

If xᵀy = 0, x and y are said to be orthogonal. In addition, xᵀx, the squared length of the vector x, is

$$\underline{x}^T\underline{x} = x_1^2 + x_2^2 + \cdots + x_n^2 \tag{2.1-7}$$

The length of the vector x is denoted by

$$|\underline{x}| = \sqrt{\underline{x}^T\underline{x}} \tag{2.1-8}$$

Outer Product - The quantity x yᵀ is referred to as the outer product and yields the matrix

$$\underline{x}\,\underline{y}^T = \begin{bmatrix} x_1y_1 & x_1y_2 & \cdots & x_1y_n \\ x_2y_1 & x_2y_2 & \cdots & x_2y_n \\ \vdots & & & \vdots \\ x_ny_1 & x_ny_2 & \cdots & x_ny_n \end{bmatrix} \tag{2.1-9}$$

Similarly, we can form the matrix x xᵀ, where x xᵀ is called the scatter matrix of the vector x.

Vector Derivative - By using previously defined operations, the derivative of a vector may be defined. For continuous vectors x(t) and x(t + Δt) we get

$$\underline{x}(t+\Delta t) - \underline{x}(t) = \begin{bmatrix} x_1(t+\Delta t) - x_1(t) \\ x_2(t+\Delta t) - x_2(t) \\ \vdots \\ x_n(t+\Delta t) - x_n(t) \end{bmatrix} \tag{2.1-10}$$

Multiplying both sides by the scalar 1/Δt and taking the limit as Δt → 0 yields

$$\underline{\dot{x}}(t) = \frac{d\underline{x}(t)}{dt} = \begin{bmatrix} \dot{x}_1(t) \\ \dot{x}_2(t) \\ \vdots \\ \dot{x}_n(t) \end{bmatrix} \tag{2.1-11}$$

which is the desired result. The integral of a vector is similarly described - i.e., in terms of the integrals of its elements.

MATRIX OPERATIONS

A matrix is an m × n rectangular array of elements in m rows and n columns, and will be designated by a capital letter. The matrix A, consisting of m rows and n columns, is denoted as:

$$A = [a_{ij}] \tag{2.1-12}$$

where aᵢⱼ is the element in the ith row and jth column, for i = 1, 2, ..., m and j = 1, 2, ..., n. For example, if m = 2 and n = 3, A is a 2 × 3 matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \tag{2.1-13}$$

Note that a column vector may be thought of as a matrix of n rows and 1 column; a row vector may be thought of as a matrix of 1 row and n columns; and a scalar may be thought of as a matrix of 1 row and 1 column. A matrix of n rows and n columns is square; a square matrix in which all elements off the main diagonal are zero is termed a diagonal matrix. The main diagonal starts in the upper left corner and ends in the lower right; its elements are a₁₁, a₂₂, ..., aₙₙ.

Matrix Addition - Matrix addition is defined only when the two matrices to be added are of identical dimensions, i.e., they have the same number of rows and the same number of columns. Specifically,

$$A + B = [a_{ij} + b_{ij}]$$

and for m = 3 and n = 2:

$$A + B = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \\ a_{31}+b_{31} & a_{32}+b_{32} \end{bmatrix}$$
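These elementary operations map directly onto array software; an illustrative Python sketch using numpy (names and values arbitrary):

```python
# Illustrative numpy versions of the vector and matrix operations above.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y                # x^T y, Eq. (2.1-6): the scalar 32.0
outer = np.outer(x, y)       # x y^T, Eq. (2.1-9): a 3x3 matrix
scatter = np.outer(x, x)     # the scatter matrix x x^T

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # a 3x2 matrix
B = np.ones((3, 2))
print(A + B)                 # elementwise matrix addition
print(A.T @ A)               # conformable product of a 2x3 and a 3x2 matrix
```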


Scalar Multiplication - The matrix A may be multiplied by a scalar k. Such scalar multiplication is denoted by kA, where

$$kA = [k\,a_{ij}] \tag{2.1-14}$$

Thus, for matrix subtraction, A − B = A + (−1)B; that is, one simply subtracts corresponding elements.

Matrix Multiplication - The product of two matrices, AB, read A times B, in that order, is defined by the matrix

$$AB = C = [c_{ij}] \tag{2.1-15}$$

The product AB is defined only when A and B are conformable, that is, the number of columns of A is equal to the number of rows of B. Where A is m × p and B is p × n, the product matrix [cᵢⱼ] has m rows and n columns, and cᵢⱼ is given by

$$c_{ij} = \sum_{k=1}^{p} a_{ik}b_{kj} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ip}b_{pj} \tag{2.1-16}$$

For example, with A and B as previously defined, AB is given by

$$AB = \begin{bmatrix} a_{11}b_{11}+a_{12}b_{21}+a_{13}b_{31} & a_{11}b_{12}+a_{12}b_{22}+a_{13}b_{32} \\ a_{21}b_{11}+a_{22}b_{21}+a_{23}b_{31} & a_{21}b_{12}+a_{22}b_{22}+a_{23}b_{32} \end{bmatrix} \tag{2.1-17}$$

for m = 2, p = 3, n = 2. It is noted that two matrices are equal if, and only if, all of their corresponding elements are equal. Thus A = B implies aᵢⱼ = bᵢⱼ for all i (1, 2, ..., m) and all j (1, 2, ..., n). For square matrices A and B of equal dimension, the products AB and BA are both defined, but in general AB ≠ BA - i.e., matrix multiplication is noncommutative.

Vector-Matrix Product - If a vector x and matrix A are conformable, the product

$$\underline{y} = A\underline{x} \tag{2.1-18}$$

is defined such that

$$y_i = \sum_{j=1}^{n} a_{ij}x_j$$

Matrix Derivative and Integral - Using the operations of addition and multiplication by a scalar, the derivative and integral of a matrix may be formulated. Analogous to the vector operations, we obtain

$$\frac{dA(t)}{dt} = \left[\frac{da_{ij}(t)}{dt}\right] \tag{2.1-19}$$

$$\int A(t)\,dt = \left[\int a_{ij}(t)\,dt\right] \tag{2.1-20}$$

Zero Matrix - The matrix [0], herein simply denoted 0, is defined as the matrix in which every element is the number 0.

Identity Matrix - The identity matrix I is a square matrix with 1 located in each position down the main diagonal of the matrix and 0's elsewhere - i.e.,

$$I = [\delta_{ij}] \tag{2.1-21}$$

where δᵢⱼ is the Kronecker delta defined by

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \tag{2.1-22}$$

The result of multiplication by the identity matrix is AI = IA = A; that is, no change.

Matrix Determinant - The square matrix A has a determinant denoted by |A|. The determinant is a scalar defined by

$$|A| = \sum \pm\; a_{1i}\,a_{2j} \cdots a_{nl} \tag{2.1-23}$$

where the sum is taken over all arrangements in which the second subscripts i, j, ..., l are permutations of the numbers 1, 2, ..., n. Terms whose subscripts are even permutations are given plus signs,


and those with odd permutations are given minus signs. The determinant of the product AB is

$$|AB| = |A| \cdot |B| \tag{2.1-24}$$

The determinant of the identity matrix is unity. One common use of the determinant is in solving a set of linear algebraic equations by means of Cramer's Rule (Ref. 1). However, modern computer algorithms utilize other techniques, which do not rely on calculating the matrix determinant, for solving linear equations. For present purposes, the matrix determinant is used as part of a criterion for matrix invertibility - a subject which is discussed in subsequent paragraphs.

The Inverse of a Matrix - In considering the inverse of a matrix we must restrict our discussion to square matrices. If A is a square matrix, its inverse is denoted by A⁻¹ such that

$$A^{-1}A = AA^{-1} = I \tag{2.1-25}$$

That is, the multiplication of a matrix by its inverse is commutative. For square matrices A and B of the same dimension, it is easily shown that

$$(AB)^{-1} = B^{-1}A^{-1} \tag{2.1-26}$$

It is noted that all square matrices do not have inverses - only those that are nonsingular have an inverse. For purposes of clarification, a nonsingular matrix may be defined as follows:

No row (column) is a linear combination of other rows (columns)

or

The determinant of the matrix is not zero - i.e., |A| ≠ 0.

If |A| = 0, two or more of the rows (columns) of A are linearly dependent. Use of the inverse matrix A⁻¹ can be illustrated by its role in solving the matrix equation

$$A\underline{x} = \underline{y} \tag{2.1-27}$$

where A is an n × n matrix, and x and y are n-row column vectors (n × 1). Premultiplication by A⁻¹ (assuming that it exists) yields

$$\underline{x} = A^{-1}\underline{y} \tag{2.1-28}$$

which is the solution to Eq. (2.1-27). Computing inverses for large matrices is a time-consuming operation; however, it is suitable for computer solution. Results for 2 × 2 and 3 × 3 matrices are given below. If

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \tag{2.1-29}$$

then

$$A^{-1} = \frac{1}{|A|}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{2.1-30}$$

where

$$|A| = ad - bc \tag{2.1-31}$$

If

$$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \tag{2.1-32}$$

then

$$A^{-1} = \frac{1}{|A|}\begin{bmatrix} ei-fh & ch-bi & bf-ce \\ fg-di & ai-cg & cd-af \\ dh-eg & bg-ah & ae-bd \end{bmatrix} \tag{2.1-33}$$

where

$$|A| = aei + bfg + cdh - ceg - bdi - afh$$

It can be shown that, in general,

$$A^{-1} = \frac{\text{adj}\,A}{|A|} \tag{2.1-34}$$

where |A| is the determinant of the square matrix A, and adj A (the adjoint of A) is the matrix formed by replacing each element with its cofactor and transposing the result [the cofactor of aᵢⱼ is the determinant of the submatrix formed by deleting from A the ith row and jth column, multiplied by (−1)^(i+j)]. Clearly, hand computation of the inverse of a large matrix is tedious, since a number of successively smaller submatrices are obtained and the determinant for each of these must also be computed.
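As an illustrative check, the 3 × 3 closed form of Eq. (2.1-33) agrees with a general-purpose library routine; a Python sketch with an arbitrary nonsingular matrix:

```python
# Check the 3x3 inverse formula of Eq. (2.1-33) against numpy.linalg.inv.
import numpy as np

a, b, c, d, e, f, g, h, i = 2., 1., 0., 1., 3., 1., 0., 1., 2.
A = np.array([[a, b, c], [d, e, f], [g, h, i]])

detA = a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h   # 3x3 determinant
adjA = np.array([[e*i - f*h, c*h - b*i, b*f - c*e],    # adjoint (transposed
                 [f*g - d*i, a*i - c*g, c*d - a*f],    #  cofactor matrix)
                 [d*h - e*g, b*g - a*h, a*e - b*d]])
A_inv = adjA / detA                                    # Eq. (2.1-34)

print(np.allclose(A_inv, np.linalg.inv(A)))            # True
```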


The Transpose of a Matrix - The transpose of a matrix is obtained by interchanging its rows and columns. For example, if

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \tag{2.1-35}$$

then

$$A^T = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{bmatrix} \tag{2.1-36}$$

or, in general, Aᵀ = [aⱼᵢ]. Thus, an m × n matrix has an n × m transpose. For square matrices, if A = Aᵀ, then A is said to be symmetric. If Aᵀ = A⁻¹, then A is said to be orthogonal. The determinant of an orthogonal matrix is unity. If Aᵀ = −A, A is said to be skew-symmetric. A property of a skew-symmetric matrix is that all elements on the main diagonal are zero.

For matrices A and B, of appropriate dimension, it can be shown that

$$(AB)^T = B^T A^T \tag{2.1-37}$$

Trace - The trace of a square matrix A is the scalar sum of its diagonal elements. Thus

$$\text{trace}[A] = \sum_{i=1}^{n} a_{ii} \tag{2.1-38}$$

For square matrices A and B

$$\text{trace}[AB] = \text{trace}[BA] \tag{2.1-39}$$

Rank - The rank of matrix A is the dimension of the largest square matrix contained in A (formed by deleting rows and columns) which has a nonzero determinant. It follows from the discussion of invertibility that a nonsingular n × n matrix has rank n.

Matrix Pseudoinverse - The matrix inverse is defined for square matrices only; it is used in the solution of sets of linear equations - of the form Ax = y - in which the number of equations is equal to the number of unknowns. For nonsquare matrices used to describe systems of equations where the number of equations does not equal the number of unknowns, the equivalent operator is the pseudoinverse (Ref. 4).

If a matrix A has more rows than columns, the pseudoinverse is defined as

$$A^{\#} = (A^T A)^{-1} A^T \tag{2.1-40}$$

for nonsingular (AᵀA). In the solution of linear equations this is the so-called overdetermined case - where there are more equations than unknowns. The resulting solution, x̂ = A#y, is best in a least-squares sense. If A has more columns than rows, the pseudoinverse is defined as

$$A^{\#} = A^T (A A^T)^{-1} \tag{2.1-41}$$

This corresponds to the underdetermined case - there are fewer equations than unknowns. Typically, such a situation leads to an infinite number of least-squares solutions (consider the least-squares fit of a straight line to a single point). The solution resulting from the pseudoinverse is also best in a least-squares sense, and the vector x̂ = A#y is the solution of minimum length.

Several useful relations concerning the pseudoinverse are given below:

$$AA^{\#}A = A \tag{2.1-42}$$

$$A^{\#}AA^{\#} = A^{\#} \tag{2.1-43}$$

$$(A^{\#}A)^T = A^{\#}A \tag{2.1-44}$$

$$(AA^{\#})^T = AA^{\#} \tag{2.1-45}$$

If A is invertible, then A# = A⁻¹.
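For the overdetermined case, Eq. (2.1-40) coincides with the Moore-Penrose pseudoinverse computed by standard libraries; an illustrative Python sketch (values arbitrary):

```python
# Overdetermined pseudoinverse, Eq. (2.1-40): A# = (A^T A)^{-1} A^T.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # 3 equations, 2 unknowns
y = np.array([1.0, 2.0, 2.5])

A_sharp = np.linalg.inv(A.T @ A) @ A.T
x_hat = A_sharp @ y                 # least-squares solution of A x = y

print(np.allclose(A_sharp, np.linalg.pinv(A)))   # True
print(x_hat)
```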

Functions of Square Matrices - With λ as a scalar variable, the equation

$$f(\lambda) = |\lambda I - A| = 0 \tag{2.1-46}$$

is called the characteristic equation of the square matrix A. Values of λ that satisfy this equation are the eigenvalues of the square matrix A. By expanding the determinant, it can be seen that f(λ) is a polynomial in λ. The Cayley-Hamilton Theorem states that, for the same polynomial expression,

$$f(A) = 0 \tag{2.1-47}$$

That is, every square matrix satisfies its own characteristic equation.
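The Cayley-Hamilton Theorem is easily verified numerically; an illustrative Python sketch for an arbitrary 2 × 2 matrix, whose characteristic polynomial is f(λ) = λ² − trace[A]λ + |A|:

```python
# Verify the Cayley-Hamilton Theorem, Eq. (2.1-47), for a 2x2 matrix.
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
tr, det = np.trace(A), np.linalg.det(A)

# Eigenvalues are the roots of f(lambda), Eq. (2.1-46): here -1 and -2.
print(np.linalg.eigvals(A))

f_of_A = A @ A - tr * A + det * np.eye(2)        # f(A)
print(np.allclose(f_of_A, np.zeros((2, 2))))     # True: f(A) = 0
```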


It is possible to define special polynomial functions of a square matrix A, two of which are:

$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots \tag{2.1-48}$$

$$\sin A = A - \frac{A^3}{3!} + \frac{A^5}{5!} - \cdots \tag{2.1-49}$$

where A² = AA, A³ = AAA, etc.

The matrix exponential e^A occurs in the study of constant coefficient matrix differential equations, and is utilized in Chapter 3. Coddington and Levinson (Ref. 5) provide a number of useful relations for the matrix exponential. Among them are:

$$e^{A(t+s)} = e^{At}\,e^{As} \tag{2.1-50}$$

$$\left(e^{At}\right)^{-1} = e^{-At} \tag{2.1-51}$$

$$e^{A+B} = e^{A}e^{B}, \quad \text{if } AB = BA \tag{2.1-52}$$

$$\frac{d}{dt}\,e^{At} = A\,e^{At} = e^{At}A \tag{2.1-53}$$
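As an illustration, the series of Eq. (2.1-48), truncated after a modest number of terms, agrees with a library matrix exponential; a Python sketch (scipy assumed available, matrix arbitrary):

```python
# Matrix exponential: truncated series of Eq. (2.1-48) vs. scipy.linalg.expm.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])

eA = np.eye(2)
term = np.eye(2)
for k in range(1, 20):           # partial sum I + A + A^2/2! + ...
    term = term @ A / k
    eA = eA + term

print(np.allclose(eA, expm(A)))  # True to within truncation error
```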

VECTOR-MATRIX OPERATIONS

Vectors and matrices can be combined in mathematical expressions in various ways. Since a vector of dimension n can be thought of as an n × 1 matrix, the rules developed above for matrix operations can be readily applied. Several of the more common operations are briefly considered in the following discussion.

Quadratic Forms - An n × n symmetric matrix A and the n-dimensional vector x can be used to define the scalar quantity

$$\underline{x}^T A \underline{x} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\,x_i x_j \tag{2.1-55}$$

This scalar expression is referred to as a quadratic form. An orthogonal matrix Q can always be found such that (Ref. 1)

$$A' = Q^T A Q \tag{2.1-56}$$

is diagonal with a'ᵢᵢ = λᵢ, where the λᵢ are the eigenvalues of A. It can be shown that the quadratic form reduces to

$$\underline{x}^T A \underline{x} = \sum_{i=1}^{n} \lambda_i\,{x'_i}^2 \tag{2.1-57}$$

where

$$\underline{x}' = Q^T \underline{x} \tag{2.1-58}$$

Definite Forms - The quadratic form is further used to define properties of the matrix A:

If xᵀAx > 0 for all real x, A is said to be positive definite.

If xᵀAx ≥ 0 for all real x, A is said to be positive semidefinite.

If xᵀAx ≤ 0 for all real x, A is said to be negative semidefinite.

If xᵀAx < 0 for all real x, A is said to be negative definite.

Norm - The matrix-associated quantity, analogous to the length of a vector, is called the norm, and is defined by

$$\|A\| = \max_{\underline{x} \neq \underline{0}} \frac{|A\underline{x}|}{|\underline{x}|} \tag{2.1-59}$$

With vector length as defined in Eq. (2.1-8), the norm is readily computed as

$$\|A\| = \sqrt{\lambda_m} \tag{2.1-60}$$

where λₘ is the maximum eigenvalue of the matrix product AᵀA (Ref. 6).

Gradient Operations - Differentiation of vector-matrix expressions, with respect to scalar quantities such as time, has been previously discussed. Rules for differentiation, with respect to vector and matrix quantities, are given below. The gradient or derivative of a scalar function z, with respect to a vector x, is the vector

$$\frac{\partial z}{\partial \underline{x}} = \underline{y} \tag{2.1-61}$$

with

$$y_i = \frac{\partial z}{\partial x_i} \tag{2.1-62}$$

A case of special importance is the vector gradient of the inner product. We have

$$\frac{\partial}{\partial \underline{x}}\left(\underline{x}^T\underline{y}\right) = \underline{y} \tag{2.1-63}$$

and

$$\frac{\partial}{\partial \underline{x}}\left(\underline{y}^T\underline{x}\right) = \underline{y} \tag{2.1-64}$$

The second partial of a scalar z, with respect to a vector x, is a matrix denoted by

$$\frac{\partial^2 z}{\partial \underline{x}^2} = A \tag{2.1-65}$$

with

$$a_{ij} = \frac{\partial^2 z}{\partial x_i\,\partial x_j} \tag{2.1-66}$$

The determinant of A is called the Hessian of z. In general, the vector gradient of a vector is the matrix defined as

$$\frac{\partial \underline{y}^T}{\partial \underline{x}} = A \tag{2.1-67}$$

with

$$a_{ij} = \frac{\partial y_j}{\partial x_i} \tag{2.1-68}$$

If y and x are of equal dimension, the determinant of A can be found and is called the Jacobian of y. The matrix gradient of a scalar z is defined by

$$\frac{\partial z}{\partial A} = B \tag{2.1-69}$$

with

$$b_{ij} = \frac{\partial z}{\partial a_{ij}} \tag{2.1-70}$$

Two scalar functions of special note are the matrix trace and determinant. A tabulation of gradient functions for the trace and determinant is provided in Ref. 7. Some of particular interest for square matrices A, B, and C are given below:

$$\frac{\partial}{\partial A}\,\text{trace}[A] = I \tag{2.1-71}$$

$$\frac{\partial}{\partial A}\,\text{trace}[BAC] = B^T C^T \tag{2.1-72}$$

$$\frac{\partial}{\partial A}\,\text{trace}[ABA^T] = A(B + B^T) \tag{2.1-73}$$
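Identities such as Eq. (2.1-73) lend themselves to spot checks by finite differences; an illustrative Python sketch with arbitrary random matrices:

```python
# Finite-difference check of Eq. (2.1-73): d/dA trace[A B A^T] = A(B + B^T).
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
eps = 1e-6

grad_fd = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        dA = np.zeros((3, 3)); dA[i, j] = eps      # perturb one element of A
        grad_fd[i, j] = (np.trace((A + dA) @ B @ (A + dA).T)
                         - np.trace(A @ B @ A.T)) / eps

print(np.allclose(grad_fd, A @ (B + B.T), atol=1e-4))   # True
```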

LEAST-SQUARES TECHNIQUES

Vector and matrix methods are particularly convenient in the application of least-squares estimation techniques. A specific example of least-squares estimation occurs in curve-fitting problems, where it is desired to obtain a functional form of some chosen order that best fits a given set of measurements. The criterion for goodness of fit is to minimize the sum of squares of differences between the measurements and the "estimated" functional form or curve.

The linear least-squares problem involves using a set of measurements, z, which are linearly related to the unknown quantities x by the expression

$$\underline{z} = H\underline{x} + \underline{v} \tag{2.1-76}$$

where v is a vector of measurement "noise." The goal is to find an estimate of the unknown, denoted by x̂. In particular, given the vector difference z − Hx̂, we wish to find the x̂ that minimizes the sum of the squares of the elements of z − Hx̂. Recall that the vector inner product generates the sum of squares of a vector.


Thus, we wish to minimize the scalar cost function J, where

$$J = (\underline{z} - H\underline{\hat{x}})^T(\underline{z} - H\underline{\hat{x}}) \tag{2.1-77}$$

Minimization of a scalar, with respect to a vector, is obtained when

$$\frac{\partial J}{\partial \underline{\hat{x}}} = 0 \tag{2.1-78}$$

and the Hessian of J is positive semidefinite,

$$\frac{\partial^2 J}{\partial \underline{\hat{x}}^2} \geq 0 \tag{2.1-79}$$

Differentiating J and setting the result to zero yields

$$\frac{\partial J}{\partial \underline{\hat{x}}} = -2H^T(\underline{z} - H\underline{\hat{x}}) = 0 \tag{2.1-80}$$

It is readily shown that the second derivative of J, with respect to x̂, is positive semidefinite; and thus Eq. (2.1-80) does, indeed, define a minimum. When HᵀH possesses an inverse (i.e., when it has a nonzero determinant), the least-squares estimate is

$$\underline{\hat{x}} = (H^T H)^{-1}H^T\underline{z} \tag{2.1-81}$$

Note the correspondence between this result and that given in the discussion of the pseudoinverse. This derivation verifies the least-squares property of the pseudoinverse.

Implied in the preceding discussion is that all available measurements are utilized together at one time - i.e., in a so-called batch processing scheme. Subsequent discussion of optimal filtering is based on the concept of recursive processing, in which measurements are utilized sequentially, as they become available.
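Equation (2.1-81) is readily exercised on the curve-fitting problem mentioned above; an illustrative Python sketch fitting a straight line to noisy data (model and values arbitrary):

```python
# Batch least squares, Eq. (2.1-81): x_hat = (H^T H)^{-1} H^T z.
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 50)
H = np.column_stack([np.ones_like(t), t])   # model z = x0 + x1*t + v
x_true = np.array([1.0, -2.0])
z = H @ x_true + rng.normal(0.0, 0.1, t.size)

x_hat = np.linalg.inv(H.T @ H) @ H.T @ z
print(x_hat)                                # close to [1.0, -2.0]
# Numerically, a library solver such as np.linalg.lstsq(H, z) is preferred.
```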

2.2 PROBABILITY AND RANDOM PROCESSES*

This section contains a brief summation of material on probability, random variables and random processes, with particular emphasis on application to modern estimation. The reader desiring a more extensive treatment of this material is referred to any one of the following excellent books on the subject: Davenport and Root (Ref. 9), Laning and Battin (Ref. 10) and Papoulis (Ref. 11).

*This material closely follows a similar presentation in Appendix H of Ref. 8.

PROBABILITY

Consider an event E, which is a possible outcome of a random experiment. We denote the probability of this event by Pr(E) and intuitively think of it as the limit, as the number of trials becomes large, of the ratio of the number of times E occurred to the number of times the experiment was attempted. If all possible outcomes of the experiment are denoted by Eᵢ, i = 1, 2, ..., n, then

$$\sum_{i=1}^{n} \Pr(E_i) = 1 \tag{2.2-1}$$

and

$$0 \leq \Pr(E_i) \leq 1 \tag{2.2-2}$$

prescribe the limits of the probability.

The joint event that A and B and C, etc., occurred is denoted by ABC..., and the probability of this joint event, by Pr(ABC...). If the events A, B, C, etc., are mutually independent - which means that the occurrence of any one of them bears no relation to the occurrence of any other - the probability of the joint event is the product of the probabilities of the simple events. That is,

$$\Pr(ABC\ldots) = \Pr(A)\Pr(B)\Pr(C)\ldots \tag{2.2-3}$$

if the events A, B, C, etc., are mutually independent. Actually, the mathematical definition of independence is the reverse of this statement, but the result of consequence is that independence of events and the multiplicative property of probabilities go together.

The event that either A or B or C, etc., occurs is denoted by A+B+C and the probability of this event, by Pr(A+B+C). If these events are mutually exclusive - which means that the occurrence of one precludes the occurrence of any other - the probability of the total event is the sum of the probabilities of the simple events. That is,

$$\Pr(A+B+C+\ldots) = \Pr(A) + \Pr(B) + \Pr(C) + \ldots \tag{2.2-4}$$

if events A, B, C, etc., are mutually exclusive. If two events A and B are not mutually exclusive, then

$$\Pr(A+B) = \Pr(A) + \Pr(B) - \Pr(AB) \tag{2.2-5}$$

Clearly, if two events are mutually exclusive their joint probability Pr(AB) is zero.

For events which are not independent, the concept of conditional probability can provide added information. The probability of event A occurring, given that


event B has occurred, is denoted by Pr(A|B). This probability is defined by

$$\Pr(A|B) = \frac{\Pr(AB)}{\Pr(B)} \tag{2.2-6}$$

It is apparent that, if events A and B are independent, the conditional probability reduces to the simple probability Pr(A). Since A and B are interchangeable in Eq. (2.2-6), it follows that

$$\Pr(A|B)\Pr(B) = \Pr(B|A)\Pr(A) \tag{2.2-7}$$

from which we reformulate Eq. (2.2-6) as

$$\Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B)} \tag{2.2-8}$$

Let us consider possible outcomes Aᵢ, i = 1, 2, ..., n, given that B has occurred,

$$\Pr(A_i|B) = \frac{\Pr(B|A_i)\Pr(A_i)}{\Pr(B)} \tag{2.2-9}$$

But

$$\Pr(B) = \Pr(B|A_1)\Pr(A_1) + \Pr(B|A_2)\Pr(A_2) + \ldots + \Pr(B|A_n)\Pr(A_n) \tag{2.2-10}$$

so that

$$\Pr(A_i|B) = \frac{\Pr(B|A_i)\Pr(A_i)}{\sum_{j=1}^{n}\Pr(B|A_j)\Pr(A_j)} \tag{2.2-11}$$

Equation (2.2-11) is a statement of Bayes' theorem. We shall have the occasion to utilize this result in subsequent chapters.

Example 2.2-1

The probability concepts defined above are best illustrated by using a family of simple examples. In particular, the commonly used experiment involving the roll of a die is utilized.

Probability

• Experiment - roll a die
• Event - value of the die
• Possible values - 1, 2, 3, 4, 5, 6
• Pr(value = j; j = 1, 2, 3, 4, 5, 6) = 1/6.

Joint (Independent) Event

• Experiment - roll two dice, A and B
• Events - value of A and value of B
• Joint event - values of A and B
• Possible joint event values - (1,1), (1,2), ..., (6,6)
• Pr(joint event) = Pr(AB) = 1/36 = Pr(A) Pr(B).

Mutually Exclusive Events

• Experiment - roll a die
• Event - value is either 1 or 2
• Pr(1 + 2) = Pr(1) + Pr(2) = 1/6 + 1/6 = 1/3.

Non-Mutually Exclusive Events

• Experiment - roll two dice, A and B
• Event - value of either A or B is 1
• Pr(A = 1 or B = 1) = Pr(A = 1) + Pr(B = 1) − Pr(A = 1 and B = 1) = 1/6 + 1/6 − 1/36 = 11/36.

Conditional Probability

• Experiment - roll three dice A, B and C
• Event E₁ - obtain exactly two 1's
  E₂ - A = 1, B and C any value
• Pr(E₁) = Pr(A = 1)·Pr(B = 1)·Pr(C ≠ 1) + Pr(A ≠ 1)·Pr(B = 1)·Pr(C = 1) + Pr(A = 1)·Pr(B ≠ 1)·Pr(C = 1) = 3(1/6 × 1/6 × 5/6) = 5/72
  Pr(E₂) = 1/6 × 1 × 1 = 1/6
  Pr(E₁E₂) = Pr(A = 1)·Pr(B = 1)·Pr(C ≠ 1) + Pr(A = 1)·Pr(B ≠ 1)·Pr(C = 1) = 2(1/6 × 1/6 × 5/6) = 5/108
  Pr(E₁|E₂) = Pr(E₁E₂)/Pr(E₂) = (5/108)/(1/6) = 5/18
• Thus, given that A = 1, the probability of E₁ occurring is four times greater.
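The conditional probability computed in the example can be reproduced by simulation; an illustrative Python sketch:

```python
# Monte Carlo check of the conditional probability example: Pr(E1|E2) = 5/18.
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
dice = rng.integers(1, 7, size=(n, 3))      # rolls of dice A, B, C

e1 = (dice == 1).sum(axis=1) == 2           # E1: exactly two 1's
e2 = dice[:, 0] == 1                        # E2: A = 1

print(e1.mean())                            # ~5/72  = 0.0694
print((e1 & e2).mean() / e2.mean())         # ~5/18  = 0.2778, Eq. (2.2-6)
```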

RANDOM VARIABLES

A random variable X is, in simplest terms, a variable which takes on values at random; it may be thought of as a function of the outcomes of some random experiment. The manner of specifying the probability with which different values are taken by the random variable is by the probability distribution function F(x), which is defined by


$$F(x) = \Pr(X \leq x) \tag{2.2-12}$$

or by the probability density function f(x), which is defined by

$$f(x) = \frac{dF(x)}{dx} \tag{2.2-13}$$

The inverse of the defining relationship for the probability density function is

$$F(x) = \int_{-\infty}^{x} f(u)\,du \tag{2.2-14}$$

An evident characteristic of any probability distribution or density function is

$$F(\infty) = \int_{-\infty}^{\infty} f(u)\,du = 1 \tag{2.2-15}$$

From the definition, the interpretation of f(x) as the density of probability of the event that X takes a value in the vicinity of x is clear:

$$f(x) = \lim_{dx \to 0} \frac{F(x+dx) - F(x)}{dx} = \lim_{dx \to 0} \frac{\Pr(x < X \leq x+dx)}{dx} \tag{2.2-16}$$

This function is finite if the probability that X takes a value in the infinitesimal interval between x and x + dx (the interval closed on the right) is an infinitesimal of order dx. This is usually true of random variables which take values over a continuous range. If, however, X takes a set of discrete values xᵢ - with nonzero probabilities pᵢ - f(x) is infinite at these values of x. This is expressed as a series of Dirac delta functions weighted by the appropriate probabilities:

$$f(x) = \sum_{i} p_i\,\delta(x - x_i) \tag{2.2-17}$$

An example of such a random variable is the outcome of the roll of a die. A suitable definition of the delta function, δ(x), for the present purpose is a function which is zero everywhere except at x = 0, where it is infinite in such a way that the integral of the function across the singularity is unity. An important property of the delta function, which follows from this definition, is

$$\int_{-\infty}^{\infty} G(x)\,\delta(x - x_0)\,dx = G(x_0) \tag{2.2-18}$$

if G(x) is a finite-valued function which is continuous at x = x₀.

A random variable may take values over a continuous range and, in addition, take a discrete set of values with nonzero probability. The resulting probability density function includes both a finite function of x and an additive set of probability-weighted delta functions; such a distribution is called mixed.

The simultaneous consideration of more than one random variable is often necessary or useful. In the case of two, the probability of the occurrence of pairs of values in a given range is prescribed by the joint probability distribution function

$$F_2(x,y) = \Pr(X \leq x \text{ and } Y \leq y) \tag{2.2-19}$$

where X and Y are the two random variables under consideration. The corresponding joint probability density function is

$$f_2(x,y) = \frac{\partial^2 F_2(x,y)}{\partial x\,\partial y} \tag{2.2-20}$$

It is clear that the individual probability distribution and density functions for X and Y can be derived from the joint distribution and density functions. For the distribution of X,

$$F(x) = F_2(x,\infty) \tag{2.2-21}$$

$$f(x) = \int_{-\infty}^{\infty} f_2(x,y)\,dy \tag{2.2-22}$$

Corresponding relations give the distribution of Y. These concepts extend directly to the description of the joint behavior of more than two random variables.

If X and Y are independent, the event (X ≤ x) is independent of the event (Y ≤ y); thus the probability for the joint occurrence of these events is the product of the probabilities for the individual events. Equation (2.2-19) then gives

$$F_2(x,y) = \Pr(X \leq x)\Pr(Y \leq y) = F_X(x)F_Y(y) \tag{2.2-23}$$

From Eq. (2.2-20) the joint probability density function is, then,

$$f_2(x,y) = f_X(x)f_Y(y) \tag{2.2-24}$$

Subscripts X and Y are used to emphasize the fact that the distributions are different functions of different random variables.


Subscripts X and Y are used to emphasize the fact that the distributions are different functions of different random variables.

Expectations and Statistics of Random Variables - The expectation of a random variable is defined as the sum of all values the random variable may take, each weighted by the probability with which the value is taken. For a random variable that takes values over a continuous range, the summation is done by integration. The probability, in the limit as dx -> 0, that X takes a value in the infinitesimal interval of width dx near x is given by Eq. (2.2-16) as f(x) dx. Thus, the expectation of X, which we denote by E[X], is

E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx \qquad (2.2-25)

This is also called the mean value of X, the mean of the distribution of X, or the first moment of X. It is a precisely defined number toward which the average of a number of observations of X tends, in the probabilistic sense, as the number of observations increases. Equation (2.2-25) is the analytic definition of the expectation, or mean, of a random variable. This expression is valid for random variables having a continuous, discrete, or mixed distribution if the set of discrete values that the random variable takes is represented by impulses in f(x) according to Eq. (2.2-17).

It is frequently necessary to find the expectation of a function of a random variable. If Y is defined as some function of the random variable X - e.g., Y = g(X) - then Y is itself a random variable with a distribution derivable from the distribution of X. The expectation of Y is defined by Eq. (2.2-25), where the probability density function for Y would be used in the integral. Fortunately, this procedure can be abbreviated. The expectation of any function of X can be calculated directly from the distribution of X by the integral

E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx \qquad (2.2-26)

An important statistical parameter descriptive of the distribution of X is its mean squared value. Using Eq. (2.2-26), the expectation of the square of X is written

E[X^2] = \int_{-\infty}^{\infty} x^2 f(x)\, dx \qquad (2.2-27)

The quantity E[X^2] is also called the second moment of X. The root-mean-squared (rms) value of X is the square root of E[X^2]. The variance of a random variable is the mean squared deviation of the random variable from its mean; it is denoted by \sigma^2, where

\sigma^2 = \int_{-\infty}^{\infty} (x - E[X])^2 f(x)\, dx = E[X^2] - E[X]^2 \qquad (2.2-28)

The square root of the variance, or \sigma, is the standard deviation of the random variable. The rms value and standard deviation are equal only for a zero-mean random variable.

Other functions whose expectations are of interest are sums and products of random variables. It is easily shown that the expectation of the sum of random variables is equal to the sum of the expectations,

E[X_1 + X_2 + \ldots + X_n] = E[X_1] + E[X_2] + \ldots + E[X_n] \qquad (2.2-29)

whether or not the variables are independent, and that the expectation of the product of random variables is equal to the product of the expectations,

E[X_1 X_2 \ldots X_n] = E[X_1] \cdot E[X_2] \cdot \ldots \cdot E[X_n] \qquad (2.2-30)

if the variables are independent. It is also true that the variance of the sum of random variables is equal to the sum of the variances if the variables are independent - i.e., if

X = \sum_{i=1}^{n} X_i

then, for independent X_i,

\sigma_X^2 = \sum_{i=1}^{n} \sigma_{X_i}^2 \qquad (2.2-31)

A very important concept is that of statistical correlation between random variables. A partial indication of the degree to which one variable is related to another is given by the covariance, which is the expectation of the product of the deviations of two random variables from their means,

E[(X - E[X])(Y - E[Y])] = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, (x - E[X])(y - E[Y])\, f_2(x,y)
= E[XY] - E[X]\, E[Y] \qquad (2.2-32)
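These definitions translate directly into sample-based estimates. The following minimal Python sketch (the distributions, sample size and seed are illustrative assumptions, not from the text) checks Eqs. (2.2-25), (2.2-28) and (2.2-32) by Monte Carlo sampling:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    x = rng.normal(2.0, 3.0, n)                 # E[X] = 2, variance = 9
    y = 0.5 * x + rng.normal(0.0, 1.0, n)       # linearly related to x

    mean_x = x.mean()                           # estimate of E[X], Eq. (2.2-25)
    var_x = (x**2).mean() - mean_x**2           # E[X^2] - E[X]^2, Eq. (2.2-28)
    cov_xy = (x * y).mean() - mean_x * y.mean() # E[XY] - E[X]E[Y], Eq. (2.2-32)

    print(mean_x, var_x, cov_xy)                # approx 2.0, 9.0, 4.5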


In the expression, the term E[XY] is the second moment of X and Y. The covariance, normalized by the standard deviations of X and Y, is called the correlation coefficient, and is expressed as

\rho = \frac{E[XY] - E[X]\, E[Y]}{\sigma_X \sigma_Y} \qquad (2.2-33)

The correlation coefficient is a measure of the degree of linear dependence between X and Y. If X and Y are independent, \rho is zero (the inverse is not true); if Y is a linear function of X, \rho is \pm 1. If an attempt is made to approximate Y by some linear function of X, the minimum possible mean-squared error in the approximation is \sigma_Y^2 (1 - \rho^2). This provides another interpretation of \rho as a measure of the degree of linear dependence between random variables.

One additional quantity associated with the distribution of a random variable is the characteristic function. It is defined by

g(t) = E[\exp(jtX)] = \int_{-\infty}^{\infty} \exp(jtx)\, f(x)\, dx \qquad (2.2-34)

A property of the characteristic function that largely explains its value is that the characteristic function of a random variable which is the sum of independent random variables is the product of the characteristic functions of the individual variables. If the characteristic function of a random variable is known, the probability density function can be determined from

f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp(-jtx)\, g(t)\, dt \qquad (2.2-35)

Notice that Eqs. (2.2-34) and (2.2-35) are in the form of a Fourier transform pair. Another useful relation is

E[x^n] = j^{-n} \left. \frac{d^n g(t)}{dt^n} \right|_{t=0} \qquad (2.2-36)

Thus, the moments of x can be generated directly from the derivatives of the characteristic function.

The Uniform and Normal Probability Distributions - Two important specific forms of probability distribution are the uniform and normal distributions. The uniform distribution is characterized by a uniform (constant) probability density over some finite interval. The magnitude of the density function in this interval is the reciprocal of the interval width, as required to make the integral of the function unity. This function is shown in Fig. 2.2-1. The normal probability density function, shown in Fig. 2.2-2, has the analytic form

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-m)^2}{2\sigma^2}\right] \qquad (2.2-37)

where the two parameters that define the distribution are m, the mean, and \sigma, the standard deviation. The integral of the function, or area under the curve, is unity. The area within the \pm 1\sigma bounds centered about the mean is approximately 0.68. Within the \pm 2\sigma bounds the area is 0.95. As an interpretation of the meaning of these values, the probability that a normally distributed random value is outside \pm 2\sigma is approximately 0.05.

Figure 2.2-1 Uniform Probability Density Function

Figure 2.2-2 Normal Probability Density Function
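The quoted \pm 1\sigma and \pm 2\sigma areas can be checked with the closed-form relation between the normal distribution and the error function; a short Python sketch (nothing assumed beyond the density of Eq. (2.2-37)):

    from math import erf, sqrt

    def prob_within(k):
        # P(|X - m| < k*sigma) for X ~ N(m, sigma^2)
        return erf(k / sqrt(2.0))

    print(prob_within(1.0))   # 0.6827...
    print(prob_within(2.0))   # 0.9545...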


By calculating the characteristic function for a normally distributed random variable, one can immediately show that the distribution of the sum of independent normally distributed variables is also normal. Actually, this remarkable property of preserving the distribution form is true of the sum of normally distributed random variables, whether they are independent or not. Even more remarkable is the fact that under certain circumstances the distribution of the sum of independent random variables, each having an arbitrary distribution, tends toward the normal distribution as the number of variables in the sum tends toward infinity. This statement, together with the conditions under which the result can be proved, is known as the central limit theorem. The conditions are rarely tested in practical situations, but the empirically observed fact is that a great many random variables display a distribution which closely approximates the normal. The reason for the common occurrence of normally distributed random variables is certainly suggested by the central limit theorem and the fact that superposition is common in nature.
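The tendency toward the normal can be seen numerically; a minimal sketch, assuming an illustrative sum of twelve uniform variables:

    import numpy as np

    rng = np.random.default_rng(1)
    n_sum, n_samp = 12, 200_000
    u = rng.uniform(-1.0, 1.0, (n_samp, n_sum))   # each term has variance 1/3
    s = u.sum(axis=1) / np.sqrt(n_sum / 3.0)      # normalized to unit variance

    # Compare the sample tail with the normal value P(|Z| > 2) = 0.0455
    print((np.abs(s) > 2.0).mean())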

We are often interested in two random variables which possess a bivariate normal distribution. The form of the joint probability density function for zero-mean variables, written in terms of statistical parameters previously defined, is

f(x,y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{x^2}{\sigma_X^2} - \frac{2\rho x y}{\sigma_X \sigma_Y} + \frac{y^2}{\sigma_Y^2} \right] \right\} \qquad (2.2-38)

For n random variables, the multidimensional or multivariate normal distribution is

f(\underline{x}) = (2\pi)^{-n/2} |P|^{-1/2} \exp\left[ -\tfrac{1}{2} (\underline{x} - \underline{m})^T P^{-1} (\underline{x} - \underline{m}) \right] \qquad (2.2-39)

with

\underline{m} = E[\underline{x}], \qquad P = E[(\underline{x} - \underline{m})(\underline{x} - \underline{m})^T] \qquad (2.2-40)

The quantities \underline{m} and P are the mean and covariance of the vector \underline{x}, respectively. Vector and matrix functions of random variables adhere to the operational definitions and rules established in Section 2.1. Thus, the expected value of a vector (matrix) is the vector (matrix) containing expected values of the respective elements.

RANDOM PROCESSES

A random process may be thought of as a collection, or ensemble, of functions of time, any one of which might be observed on any trial of an experiment. The ensemble may include a finite number, a countable infinity, or a noncountable infinity of such functions. We shall denote the ensemble of functions by {x(t)}, and any observed member of the ensemble by x(t). The value of the observed member of the ensemble at a particular time, say t_1, as shown in Fig. 2.2-3, is a random variable. On repeated trials of the experiment, x(t_1) takes different values at random. The probability that x(t_1) takes values in a certain range is given by the probability distribution function, as it is for any random variable. In this case the dependence on the time of observation is shown explicitly in the notation, viz.:

F(x_1, t_1) = \Pr[x(t_1) \le x_1] \qquad (2.2-41)

The corresponding probability density function is

f(x_1, t_1) = \frac{dF(x_1, t_1)}{dx_1} \qquad (2.2-42)

Figure 2.2-3 Members of the Ensemble {x(t)}


These functions are adequate to define, in a probabilistic sense, the range of amplitudes which the random process displays. To gain a sense of how quickly members of the ensemble are likely to vary, one has to observe the same member function at more than one time. The probability for the occurrence of a pair of values in certain ranges is given by the second-order joint probability distribution function

F_2(x_1,t_1;\, x_2,t_2) = \Pr[x(t_1) \le x_1 \text{ and } x(t_2) \le x_2] \qquad (2.2-43)

and the corresponding joint probability density function

f_2(x_1,t_1;\, x_2,t_2) = \frac{\partial^2 F_2(x_1,t_1;\, x_2,t_2)}{\partial x_1\, \partial x_2} \qquad (2.2-44)

Higher-ordered joint distribution and density functions can be defined following this pattern, but only rarely does one attempt to deal with more than the second-order statistics of random processes.

If two random processes are under consideration, the simplest distribution and density functions that provide some indication of their joint statistical characteristics are the second-order functions

F_2(x,t_1;\, y,t_2) = \Pr[x(t_1) \le x \text{ and } y(t_2) \le y] \qquad (2.2-45)

f_2(x,t_1;\, y,t_2) = \frac{\partial^2 F_2(x,t_1;\, y,t_2)}{\partial x\, \partial y} \qquad (2.2-46)

Correlation Functions - Actually, the characterization of random processes, in practice, is usually limited to even less information than that given by the second-order distribution or density functions. Only the first moments of these distributions are commonly measured. These moments are called autocorrelation and cross-correlation functions. The autocorrelation function is defined as

\varphi_{xx}(t_1,t_2) = E[x(t_1)\, x(t_2)] = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2\; x_1 x_2\, f_2(x_1,t_1;\, x_2,t_2) \qquad (2.2-47)

and the cross-correlation function as

\varphi_{xy}(t_1,t_2) = E[x(t_1)\, y(t_2)] = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; xy\, f_2(x,t_1;\, y,t_2) \qquad (2.2-48)

Stationarity - A stationary random process is one whose statistical properties are invariant in time. This implies that the first probability density function for the process, f(x_1,t_1), is independent of the time of observation t_1. Then all the moments of this distribution, such as E[x(t_1)] and E[x^2(t_1)], are also independent of time - they are constants. The second probability density function is not in this case dependent on the absolute times of observation, t_1 and t_2, but still depends on the difference between them. So if t_2 is written as

t_2 = t_1 + \tau \qquad (2.2-49)

f_2(x_1,t_1;\, x_2,t_2) becomes f_2(x_1,t_1;\, x_2,t_1+\tau), which is independent of t_1, but still a function of \tau. The correlation functions are then functions only of the single variable \tau, viz.:

\varphi_{xx}(\tau) = E[x(t_1)\, x(t_1+\tau)] \qquad (2.2-50)

\varphi_{xy}(\tau) = E[x(t_1)\, y(t_1+\tau)] \qquad (2.2-51)

In the case where E[x(t_1)], E[x(t_2)], and E[y(t_2)] are all zero, these correlation functions are the covariances of the indicated random variables. If they are then normalized by the corresponding standard deviations, according to Eq. (2.2-33), they become correlation coefficients which measure on a scale from -1 to +1 the degree of linear dependence between the variables.

We note the following properties of these correlation functions:

\varphi_{xx}(0) = E[x^2] \qquad (2.2-52)

\varphi_{xx}(\tau) = \varphi_{xx}(-\tau) \qquad (2.2-53)

\varphi_{xy}(-\tau) = \varphi_{yx}(\tau) \qquad (2.2-54)

\varphi_{xx}(0) \ge |\varphi_{xx}(\tau)| \qquad (2.2-55)

Ergodicity - One further concept associated with stationary random processes is the ergodic hypothesis. This hypothesis claims that any statistic calculated by averaging over all members of an ergodic ensemble at a fixed time can also be calculated by averaging over all time on a single representative member of the ensemble. The key to this notion is the word "representative." If a particular member of the ensemble is to be statistically representative of all, it must display at various points in time the full range of amplitude, rate of change of amplitude, etc., which are to be found among all the members of the ensemble. A classic example of a stationary ensemble which is not ergodic is the ensemble of functions which are constant in time. The failing in this case is that no member of the ensemble is representative of all.

In practice, almost all empirical results for stationary processes are derived from tests on a single function, under the assumption that the ergodic hypothesis holds. In this case, the common statistics associated with a random process are written


E[x] = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t)\, dt \qquad (2.2-56)

E[x^2] = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x^2(t)\, dt \qquad (2.2-57)

\varphi_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t)\, x(t+\tau)\, dt \qquad (2.2-58)

\varphi_{xy}(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} x(t)\, y(t+\tau)\, dt \qquad (2.2-59)
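In practice the limit in Eq. (2.2-58) is replaced by an average over a long finite record. A minimal Python sketch of such an estimate, using the random-phase sinusoid of Example 2.2-2 below (amplitude, frequency and record length are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    A, w0, dt, n = 2.0, 1.0, 0.01, 200_000
    t = dt * np.arange(n)
    x = A * np.sin(w0 * t + rng.uniform(0.0, 2.0 * np.pi))

    def phi_xx(lag):                       # finite-record form of Eq. (2.2-58)
        return np.mean(x[:n - lag] * x[lag:])

    for lag in (0, 100, 314):              # tau = 0, 1.0, 3.14 sec
        tau = lag * dt
        print(phi_xx(lag), (A**2 / 2) * np.cos(w0 * tau))  # near-equal pairs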

Example 2.2-2

An example of a stationary, ergodic random process is the ensemble of sinusoids, of given amplitude and frequency, with a uniform distribution of phase. The member functions of this ensemble are all of the form

x(t) = A \sin(\omega t + \theta)

where \theta is a random variable, uniformly distributed over the interval (0, 2\pi) radians. Any average taken over the members of this ensemble at any fixed time would find all phase angles represented with equal probability density. But the same is true of an average over all time on any one member. For this process, then, all members of the ensemble qualify as "representative." Note that any distribution of the phase angle \theta other than the uniform distribution over an integral number of cycles would define a nonergodic process. The relevant calculations are given below. For the ensemble average autocorrelation function (f(\theta) = 1/2\pi for 0 \le \theta \le 2\pi) we get (Eq. (2.2-47))

\varphi_{xx}(\tau) = \int_0^{2\pi} A \sin(\omega t + \theta)\, A \sin(\omega t + \omega\tau + \theta)\, \frac{1}{2\pi}\, d\theta
= \frac{A^2}{4\pi} \int_0^{2\pi} \left[\cos\omega\tau - \cos(2\omega t + \omega\tau + 2\theta)\right] d\theta
= \frac{A^2}{2} \cos\omega\tau

while the time average autocorrelation function is (Eq. (2.2-58))

\varphi_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} A \sin(\omega t + \theta)\, A \sin(\omega t + \omega\tau + \theta)\, dt
= \lim_{T \to \infty} \frac{A^2}{2}\, \frac{1}{2T} \int_{-T}^{T} \left[\cos\omega\tau - \cos(2\omega t + \omega\tau + 2\theta)\right] dt
= \frac{A^2}{2} \cos\omega\tau

The two results are equivalent and, thus, x(t) is an example of an ergodic process.

Gaussian Processes - A gaussian process is one characterized by the property that its joint probability distribution functions of all orders are multidimensional normal distributions. For a gaussian process, then, the distribution of x(t) for any time t is the normal distribution, for which the density function is expressed by

f(x,t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-m)^2}{2\sigma^2}\right] \qquad (2.2-60)

The joint distribution of x(t_1) and x(t_2) is the bivariate normal distribution; higher-order joint distributions are given by the multivariate normal distribution. If \underline{x}(t) is an n-dimensional gaussian vector, then the distribution of \underline{x}(t) is the normal distribution expressed by

f(\underline{x},t) = (2\pi)^{-n/2} |P|^{-1/2} \exp\left[-\tfrac{1}{2} (\underline{x}-\underline{m})^T P^{-1} (\underline{x}-\underline{m})\right] \qquad (2.2-61)

All the statistical properties of a gaussian random process are defined by the first and second moments of the distribution. Equivalently, the statistics of the process are all contained in the autocorrelation function of the process. Clearly, this property is a great boon to analytic operations. Additionally, the output of a linear system whose input is gaussian is also gaussian.

As a consequence of the central limit theorem, gaussian processes are those most frequently encountered in actual systems; as a consequence of their analytic convenience, they are also most frequently encountered in system analysis. It is, therefore, appropriate to introduce a shorthand notation to contain the information in Eq. (2.2-61). The notation, which is used extensively in the sequel, is

\underline{x} \sim N(\underline{m}, P) \qquad (2.2-62)

which indicates that \underline{x} is a gaussian (normal) random vector with mean \underline{m} and covariance P. By way of example, for a one-dimensional random process x with mean m and standard deviation \sigma, we would write

x \sim N(m, \sigma^2) \qquad (2.2-63)


Power Spectral Density Functions - The input-output relation for a linear system may be written (Ref. 11; also, Section 3.3 herein)

y(t) = \int_{-\infty}^{t} x(\tau)\, w(t,\tau)\, d\tau \qquad (2.2-64)

where x(t) is the input function, y(t) is the output, and w(t,\tau) is the system weighting function - the response at time t to a unit impulse input at time \tau. If the system is time-invariant, this superposition integral reduces to

y(t) = \int_0^{\infty} w(\tau)\, x(t-\tau)\, d\tau \qquad (2.2-65)

This expression is also referred to as the convolution integral. Using Eq. (2.2-65), the statistics of the output process can be written in terms of those of the input. If the input process is stationary, the output process is also stationary in the steady state. Manipulation of Eq. (2.2-65) in this instance leads to several expressions for the relationships between the moments of x(t) and y(t),

E[y] = E[x] \int_0^{\infty} w(t)\, dt \qquad (2.2-66)

E[y^2] = \int_0^{\infty}\!\!\int_0^{\infty} w(\xi)\, w(\eta)\, \varphi_{xx}(\xi-\eta)\, d\xi\, d\eta \qquad (2.2-67)

\varphi_{xy}(\tau) = \int_0^{\infty} w(\xi)\, \varphi_{xx}(\tau-\xi)\, d\xi \qquad (2.2-68)

\varphi_{yy}(\tau) = \int_0^{\infty}\!\!\int_0^{\infty} w(\xi)\, w(\eta)\, \varphi_{xx}(\tau+\xi-\eta)\, d\xi\, d\eta \qquad (2.2-69)

Analytic operations on linear time-invariant systems are facilitated by the use of integral transforms which transform the convolution input-output relation of Eq. (2.2-65) into the algebraic operation of multiplication. Since members of stationary random ensembles must necessarily be visualized as existing for all negative and positive time, the two-sided Fourier transform is the appropriate transformation to employ. The Fourier transforms of the correlation functions defined above then appear quite naturally in analyses. The Fourier transform of the autocorrelation function

\Phi_{xx}(\omega) = \int_{-\infty}^{\infty} \varphi_{xx}(\tau)\, \exp(-j\omega\tau)\, d\tau \qquad (2.2-70)

is called the power spectral density function, or power density spectrum, of the random process {x(t)}. The term "power" is here used in a generalized sense, indicating the expected squared value of the members of the ensemble. \Phi_{xx}(\omega) is indeed the spectral distribution of power density for {x(t)}, in that integration of \Phi_{xx}(\omega) over frequencies in the band from \omega_1 to \omega_2 yields the mean-squared value of the process whose autocorrelation function consists only of those harmonic components of \varphi_{xx}(\tau) that lie between \omega_1 and \omega_2. In particular, the mean-squared value of {x(t)} itself is given by integration of the power density spectrum for the random process over the full range of \omega. This last result is seen as a specialization of the inverse transform relation corresponding to Eq. (2.2-70), namely

\varphi_{xx}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi_{xx}(\omega)\, \exp(j\omega\tau)\, d\omega \qquad (2.2-71)

E[x^2] = \varphi_{xx}(0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi_{xx}(\omega)\, d\omega \qquad (2.2-72)

The Fourier transform of the cross-correlation function is called the cross power spectral density function

\Phi_{xy}(\omega) = \int_{-\infty}^{\infty} \varphi_{xy}(\tau)\, \exp(-j\omega\tau)\, d\tau \qquad (2.2-73)

The desired input-output algebraic relationships corresponding to Eqs. (2.2-68) and (2.2-69) are

\Phi_{xy}(\omega) = W(j\omega)\, \Phi_{xx}(\omega) \qquad (2.2-74)

and

\Phi_{yy}(\omega) = |W(j\omega)|^2\, \Phi_{xx}(\omega) \qquad (2.2-75)

where W is the system transfer function, defined as the Laplace transform of the system weighting function (with s = j\omega; see Ref. 13)

W(s) = \int_0^{\infty} w(\tau)\, e^{-s\tau}\, d\tau \qquad (2.2-76)
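Equation (2.2-72) can be checked numerically; a minimal sketch using the first-order markov spectrum 2\beta\sigma^2/(\omega^2+\beta^2) listed in Fig. 2.2-4 below (the values of \beta, \sigma^2 and the frequency grid are illustrative assumptions):

    import numpy as np

    beta, sigma2 = 0.5, 4.0
    w = np.linspace(-2000.0, 2000.0, 2_000_001)
    psd = 2.0 * beta * sigma2 / (w**2 + beta**2)

    dw = w[1] - w[0]
    print(psd.sum() * dw / (2.0 * np.pi))   # approx sigma2 = 4.0, Eq. (2.2-72)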


White Noise - A particularly simple form for the power density spectrum is a constant, \Phi_{xx}(\omega) = \Phi_0. This implies that power is distributed uniformly over all frequency components in the full infinite range. By analogy with the corresponding situation in the case of white light, such a random process, usually a noise, is called a white noise. The autocorrelation function for white noise is a delta function

\varphi_{xx}(\tau) = \Phi_0\, \delta(\tau) \qquad (2.2-77)

We recall the definition of the Dirac delta \delta(\tau) and note that the mean-squared value of white noise, \varphi_{nn}(0) = \Phi_0 \delta(0), is infinite. Thus the process is not physically realizable. White noise is an idealized concept which does, however, serve as a very useful approximation to situations in which a disturbing noise is wideband compared with the bandwidth of a system. A familiar physical process closely approximated by white noise is the shot effect, which is a mathematical description of vacuum-tube circuit output fluctuation.

White noise is also quite useful in analytic operations; integration properties of the Dirac delta \delta(\tau) can many times be used to advantage. Additionally, a number of random processes can be generated by passing white noise through a suitable filter. Illustrations of the autocorrelation functions and power spectral density functions for several common random processes are shown in Fig. 2.2-4. Applications of such random processes will be considered in subsequent chapters.

Figure 2.2-4 Descriptions of Common Random Processes

PROCESS          | AUTOCORRELATION FUNCTION                              | POWER SPECTRAL DENSITY
white noise      | \varphi_{xx}(\tau) = \Phi_0 \delta(\tau)              | \Phi_{xx} = \Phi_0
markov (m = 0)   | \varphi_{xx}(\tau) = \sigma^2 e^{-\beta|\tau|}        | \Phi_{xx} = 2\beta\sigma^2 / (\omega^2 + \beta^2)
sinusoid         | \varphi_{xx}(\tau) = (A^2/2) \cos\omega_1 \tau        | \Phi_{xx} = (\pi A^2/2)[\delta(\omega-\omega_1) + \delta(\omega+\omega_1)]
random bias      | \varphi_{xx}(\tau) = m^2                              | \Phi_{xx} = 2\pi m^2 \delta(\omega)

Gauss-Markov Processes - A special class of random processes which can be generated by passing white noise through simple filters is the family of gauss-markov processes. A continuous process x(t) is first-order markov if for every k and

t_1 < t_2 < \ldots < t_k

it is true that

F[x(t_k) \mid x(t_{k-1}), \ldots, x(t_1)] = F[x(t_k) \mid x(t_{k-1})] \qquad (2.2-78)

That is, the probability distribution for the process at t_k is dependent only on the value at one point immediately in the past, x(t_{k-1}). If the continuous process x(t) is first-order markov, it can be associated with the differential equation

\dot{x}(t) = -\beta_1 x(t) + w(t) \qquad (2.2-79)

where w is white noise (for a discrete first-order markov process, the associated relation is a first-order difference equation). If we add the restriction that the probability density functions of w, and consequently x, also are gaussian, the process x(t) is a gauss-markov process. The statistics of a stationary gauss-markov process are completely described by the autocorrelation function

\varphi_{xx}(\tau) = \sigma^2 e^{-\beta_1 |\tau|} \qquad (2.2-80)

The so-called correlation time (1/e point) is 1/\beta_1. The spectral density of the white noise, w, which generates the process described by Eq. (2.2-80) is given in terms of the variance of x as 2\beta_1 \sigma^2. The autocorrelations of many physical phenomena are well described by Eq. (2.2-80).
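A first-order gauss-markov sample function can be generated by exact discretization of Eq. (2.2-79); a minimal sketch (parameter values, step size and record length are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    beta, sigma, dt, n = 1.0, 2.0, 0.01, 200_000
    a = np.exp(-beta * dt)
    q = sigma**2 * (1.0 - a**2)       # driving variance preserving var(x)
    x = np.empty(n)
    x[0] = sigma * rng.standard_normal()
    for k in range(n - 1):
        x[k + 1] = a * x[k] + np.sqrt(q) * rng.standard_normal()

    lag = int(1.0 / dt)               # tau = 1/beta, the correlation time
    print(np.mean(x[:-lag] * x[lag:]))  # approx sigma^2 * e^(-1) = 1.47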


A continuous process x(t) is second-order markov if for every k and

t_1 < t_2 < \ldots < t_k

it is true that

F[x(t_k) \mid x(t_{k-1}), \ldots, x(t_1)] = F[x(t_k) \mid x(t_{k-1}), x(t_{k-2})] \qquad (2.2-81)

Equivalently, the probability distribution of x(t_k) is dependent only on the conditions at two points immediately in the past. An associated differential equation is

\ddot{x}(t) + 2\beta_2 \dot{x}(t) + \beta_2^2 x(t) = w(t) \qquad (2.2-82)

If the density function of w is gaussian, the process x(t) is second-order gauss-markov. If x(t) has mean m and is stationary, its autocorrelation function has the form

\varphi_{xx}(\tau) = \sigma^2 (1 + \beta_2 |\tau|)\, e^{-\beta_2 |\tau|} + m^2 \qquad (2.2-83)

The correlation time of this process is approximately 2.146/\beta_2; the spectral density of the white noise w is 4\beta_2^3 \sigma^2. For the second-order gauss-markov process the derivative of \varphi_{xx}(\tau) is zero at \tau = 0; this characteristic is appealing for many physical phenomena (see Example 3.8-1).

Definition of an nth-order gauss-markov process proceeds directly from the above. The characteristics of such processes are given in Table 2.2-1 and Fig. 2.2-5. For the nth-order gauss-markov process, as n -> \infty, the result is a bias. Heuristically, white noise can be thought of as a "zeroth-order" gauss-markov process.

TABLE 2.2-1 CHARACTERISTICS OF STATIONARY MARKOV PROCESSES*

Order of Markov Process | Power Spectral Density, \Phi_{xx}(\omega) | Autocorrelation Function, \varphi_{xx}(\tau) | Correlation Time
1 | \dfrac{2\beta_1 \sigma^2}{\omega^2 + \beta_1^2} | \sigma^2 e^{-\beta_1 |\tau|} | 1/\beta_1
2 | \dfrac{4\beta_2^3 \sigma^2}{(\omega^2 + \beta_2^2)^2} | \sigma^2 e^{-\beta_2 |\tau|} \left(1 + \beta_2 |\tau|\right) | 2.146/\beta_2
3 | \dfrac{16\beta_3^5 \sigma^2}{3(\omega^2 + \beta_3^2)^3} | \sigma^2 e^{-\beta_3 |\tau|} \left(1 + \beta_3 |\tau| + \tfrac{1}{3}\beta_3^2 \tau^2\right) | 2.903/\beta_3
n | \dfrac{(2\beta_n)^{2n-1} [\Gamma(n)]^2 \sigma^2}{(2n-2)!\, (\omega^2 + \beta_n^2)^n} | \sigma^2 e^{-\beta_n |\tau|} \sum_{k=0}^{n-1} \dfrac{\Gamma(n)\,(2n-2-k)!\,(2\beta_n |\tau|)^k}{(2n-2)!\; k!\; \Gamma(n-k)} | solved arithmetically for each n
n -> \infty | 2\pi \sigma^2 \delta(\omega) | \sigma^2 | \infty

* \Gamma(n) is the Gamma function.

Figure 2.2-5 Autocorrelation Functions of Gauss-Markov Processes (all autocorrelation functions normalized; correlation time = 1)
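The 2.146/\beta_2 entry in Table 2.2-1 is the 1/e point of (1 + \beta_2\tau)e^{-\beta_2\tau}; a short sketch that reproduces it by bisection (no library root-finder assumed):

    import math

    def f(u):                        # u = beta*tau; solve (1+u)e^-u = 1/e
        return (1.0 + u) * math.exp(-u) - math.exp(-1.0)

    lo, hi = 1.0, 4.0                # f(lo) > 0 > f(hi); f is decreasing
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)

    print(0.5 * (lo + hi))           # 2.146...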


REFERENCES

1. Hildebrand, F.B., Methods of Applied Mathematics, Prentice-Hall, Englewood Cliffs, 1961.

2. Hoffman, K. and Kunze, R., Linear Algebra, Prentice-Hall, Englewood Cliffs, 1961.

3. Luenberger, D., Optimization by Vector Space Methods, John Wiley & Sons, Inc., New York, 1969.

4. Penrose, R., "A Generalized Inverse for Matrices," Proc. Cambridge Phil. Soc., 51, 406-413.

5. Coddington, E.A. and Levinson, N., Theory of Ordinary Differential Equations, McGraw-Hill Book Co., Inc., New York, 1955.

6. Fadeev, D.K. and Fadeeva, V.N., Computational Methods of Linear Algebra, W.H. Freeman, San Francisco, 1963.

7. Athans, M., "Gradient Matrices and Matrix Calculations," Lincoln Laboratory, Lexington, Mass., November 1965 (AD624426).

8. Gelb, A. and VanderVelde, W.E., Multiple-Input Describing Functions and Nonlinear System Design, McGraw-Hill Book Co., Inc., New York, 1968.

9. Davenport, W.B., Jr. and Root, W.L., An Introduction to Random Signals and Noise, McGraw-Hill Book Co., Inc., New York, 1958.

10. Laning, J.H., Jr. and Battin, R.H., Random Processes in Automatic Control, McGraw-Hill Book Co., Inc., New York, 1956.

11. Papoulis, A., Probability, Random Variables and Stochastic Processes, McGraw-Hill Book Co., Inc., New York, 1965.

12. Singer, R.A., "Estimating Optimal Tracking Filter Performance for Manned Maneuvering Targets," IEEE Trans. Aerosp. Elect. Syst., AES-6, 473-483, July 1970.

13. Truxal, J.G., Automatic Feedback Control System Synthesis, McGraw-Hill Book Co., Inc., New York, 1955.

14. Schweppe, F.C., Uncertain Dynamic Systems, Prentice-Hall, Englewood Cliffs, N.J., 1973.

PROBLEMS

Problem 2-1

Show that

\frac{d}{dt}\left(P^{-1}\right) = -P^{-1} \dot{P} P^{-1}

Problem 2-2

For the matrix A = [ ... ], show that the eigenvalues are

\lambda_1 = 2, \quad \lambda_2 = -4, \quad \lambda_3 = 5

Problem 2-3

Show that A is positive definite if, and only if, all of the eigenvalues of A are positive.

Problem 2-4

If R(t) is a time-varying orthogonal matrix, and

\dot{R}(t)\, R^T(t) = S(t)

show that S(t) must be skew-symmetric.

Problem 2-5

Consider the matrix A,

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}

(a) Use the Cayley-Hamilton theorem to show that

A^2 - 5A - 2I = 0

What are the eigenvalues (\lambda_1, \lambda_2) of A? (b) Use the result in (a) and the matrix exponential series expansion to show that

e^{At} = I + At + \frac{t^2}{2}(5A + 2I) + \frac{t^3}{6}(27A + 10I) + \frac{t^4}{24}(145A + 54I) + \ldots

(c) Collecting terms in the expression in (b) yields

e^{At} = I\left[1 + t^2 + \frac{5t^3}{3} + \frac{9t^4}{4} + \ldots\right] + A\left[t + \frac{5t^2}{2} + \frac{9t^3}{2} + \frac{145t^4}{24} + \ldots\right]

Closed form values for the two series a_1(t) and a_2(t) are not immediately apparent. However, the Cayley-Hamilton theorem can be used to obtain closed form expressions. Demonstrate that these solutions are of the form

a_1(t) = \frac{\lambda_2 e^{\lambda_1 t} - \lambda_1 e^{\lambda_2 t}}{\lambda_2 - \lambda_1}, \qquad a_2(t) = \frac{e^{\lambda_2 t} - e^{\lambda_1 t}}{\lambda_2 - \lambda_1}

Problem 2-6

If \underline{r} is a three-dimensional position vector, the locus of points for which

\underline{r}^T E^{-1} \underline{r} \le 1

where E is a positive definite symmetric matrix, defines an ellipsoid. Show that the volume of this ellipsoid is

V = \frac{4\pi}{3} \sqrt{|E|}


Problem 2-7

The least-squares derivation in Section 2.1 is based on the assumption that all measurements \underline{z} are of equal quality. If, in fact, it is known that it is reasonable to apply different weights to the various measurements comprising \underline{z}, the least-squares estimator should be appropriately modified. If the ith measurement z_i has a relative weight of w_i, it is reasonable to construct a matrix

W = \mathrm{diag}(w_1, w_2, \ldots, w_m)

with which to define a weighted cost function

J = (\underline{z} - H\hat{\underline{x}})^T W (\underline{z} - H\hat{\underline{x}})

Show that the weighted least-squares estimate is

\hat{\underline{x}} = (H^T W H)^{-1} H^T W \underline{z}

Problem 2-8

For an arbitrary random process x(t), show that the cross-correlation between x(t) and its derivative \dot{x}(t) is

\varphi_{x\dot{x}}(t_1, t_2) = \frac{\partial}{\partial t_2}\, \varphi_{xx}(t_1, t_2)

Problem 2-9

The Poisson or exponential random process is frequently used to describe random arrival rates, or random service times in queueing problems. If \lambda is the average number of arrivals, the probability density function describing the probability of x successes is

p(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots

Find the Poisson probability distribution function F(x). If the average arrival rate is 0.4/minute, show that the probability of exactly 4 arrivals in 10 minutes is 0.195 and the probability of no more than four arrivals in ten minutes is 0.629.

Problem 2-10

If X, Y, and Z are independent random variables each uniformly distributed in the interval (-1,1), show that the probability density function for the variable w = (X + Y + Z)/3 is

p_w(w) = \begin{cases} \dfrac{9}{8}\left(1 - 3w^2\right), & |w| < \dfrac{1}{3} \\[1ex] \dfrac{27}{16}\left(1 - |w|\right)^2, & \dfrac{1}{3} \le |w| < 1 \\[1ex] 0, & |w| \ge 1 \end{cases}

Sketch p_w(w) and comment on its shape.

Problem 2-11

If x and y are independent normal random variables with zero mean and equal variance \sigma^2, show that the variable

z = \sqrt{x^2 + y^2}

has probability density described by

f(z) = \begin{cases} \dfrac{z}{\sigma^2}\, e^{-z^2/2\sigma^2}, & z \ge 0 \\[1ex] 0, & z < 0 \end{cases}

The random variable z is called a Rayleigh process. Show that its mean and variance are

E[z] = \sqrt{\frac{\pi}{2}}\, \sigma

and

\sigma_z^2 = \left(2 - \frac{\pi}{2}\right) \sigma^2

respectively.

Problem 2-12

The probability density for the acceleration of a particular maneuvering vehicle is a combination of discrete and continuous functions. The probability of no acceleration is P_0. The probability of acceleration at a maximum rate A_max (-A_max) is P_max. For all other acceleration values, acceleration probability is described by a uniform density function. Sketch the combined probability density function. Show that the magnitude of the uniform density function is

b = \frac{1 - (P_0 + 2P_{max})}{2A_{max}}

and that the variance of the acceleration is (Ref. 12)

\sigma^2 = \frac{A_{max}^2}{3}\left(1 + 4P_{max} - P_0\right)

Problem 2-13

Find the mean, mean square and variance of a random variable uniformly distributed in the interval (a,b).

Problem 2-14

If x_1 and x_2 are zero-mean gaussian random variables with a joint probability density function given by Eq. (2.2-38), show that their sum z = x_1 + x_2 is gaussian. Note that this result holds even though x_1 and x_2 are dependent.
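The arrival probabilities quoted in Problem 2-9 can be cross-checked directly; a short Python computation assuming nothing beyond the Poisson law itself:

    from math import exp, factorial

    lam = 0.4 * 10.0            # 0.4 arrivals/minute over 10 minutes
    p = [exp(-lam) * lam**k / factorial(k) for k in range(5)]

    print(p[4])                 # P(exactly 4 arrivals) = 0.1954
    print(sum(p))               # P(no more than 4)     = 0.6288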


Problem 2-15

In Section 2.2 it is noted that white noise is physically unrealizable. However, it is often said that a first-order gauss-markov process

\dot{x}(t) = -\beta x(t) + w(t)

with

\varphi_{ww}(\tau) = 2\beta\sigma^2\, \delta(\tau)

is a physically realizable process. Since a physically realizable process cannot have a derivative with infinite variance, show that x(t) is just as physically unrealizable as white noise.

Thus, it is suggested that none of the continuous stochastic processes treated in this book are physically realizable. However, as noted in Ref. 14, this fact merely serves to emphasize the point that mathematical models are approximate but highly useful representations of the real world.

3. LINEAR DYNAMIC SYSTEMS

Application of optimal estimation is predicated on the description of a physical system under consideration by means of differential equations. In this chapter, state-space notation is introduced to provide a convenient formulation for the required mathematical description, and techniques for solving the resultant vector-matrix differential equation are also presented. Although the initial discussion is concerned with continuous physical systems, results are extended to the discrete case in which information is available or desired only at specified time intervals. Controllability and observability, two properties based on system configuration and dynamics, are defined and illustrated. Equations for time propagation of system error state covariance are obtained for both discrete and continuous systems. A number of models for noise sources that affect the system error state covariance are discussed, with special attention given to their statistical properties and the manner in which they can be described in state-space notation. Finally, some considerations for empirical determination of error models are presented.

3.1 STATE-SPACE NOTATION

Early work in control and estimation theory involved system description and analysis in the frequency domain. In contrast to these efforts, most of the recent advances - work by Pontryagin, Bellman, Lyapunov, Kalman and others - have involved system descriptions in the time domain. The formulation used employs state-space notation, which offers the advantage of mathematical and notational convenience.


Moreover, this approach to system description is closer to physical reality than any of the frequency-oriented transform techniques. It is particularly useful in providing statistical descriptions of system behavior.

The dynamics of linear, lumped parameter systems can be represented by the first-order vector-matrix differential equation

\dot{\underline{x}}(t) = F(t)\, \underline{x}(t) + G(t)\, \underline{w}(t) + L(t)\, \underline{u}(t) \qquad (3.1-1)

where \underline{x}(t) is the system state vector, \underline{w}(t) is a random forcing function, \underline{u}(t) is a deterministic (control) input, and F(t), G(t), L(t) are matrices arising in the formulation. This is the continuous form ordinarily employed in modern estimation and control theory. Figure 3.1-1 illustrates the equation. The state vector for a dynamic system is composed of any set of quantities sufficient to completely describe the unforced motion of that system. Given the state vector at a particular point in time and a description of the system forcing and control functions from that point in time forward, the state at any other time can be computed. The state vector is not a unique set of variables; any other set \underline{x}'(t) related to \underline{x}(t) by a nonsingular transformation

\underline{x}'(t) = A(t)\, \underline{x}(t) \qquad (3.1-2)

fulfills the above requirement.

Figure 3.1-1 Block Diagram of Continuous Representation of Linear Dynamics Equation

Given an nth-order linear differential equation

\left[D^n + a_{n-1}(t) D^{n-1} + \ldots + a_1(t) D + a_0(t)\right] y(t) = w(t) \qquad (3.1-3)

where D = d/dt, we can define a set of state variables x_1(t), ..., x_n(t) by

x_1(t) = y(t), \quad x_2(t) = \dot{x}_1(t), \quad \ldots, \quad x_n(t) = \dot{x}_{n-1}(t) \qquad (3.1-4)

These relations can be written as a set of n first-order linear differential equations:

\dot{x}_1(t) = x_2(t)
\dot{x}_2(t) = x_3(t)
\vdots
\dot{x}_n(t) = -a_0(t) x_1(t) - a_1(t) x_2(t) - \ldots - a_{n-1}(t) x_n(t) + w(t) \qquad (3.1-5)

The first n-1 of these equations follow from the state variable definitions; the last is obtained using the definitions and Eq. (3.1-3). Expressing the equations in vector-matrix form as in Eq. (3.1-1) yields

\dot{\underline{x}}(t) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{bmatrix} \underline{x}(t) + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} w(t) \qquad (3.1-6)

This is called the companion form of Eq. (3.1-3). The system dynamics matrix F is square with dimension n, corresponding to the order of the original differential equation. Equation (3.1-6) is illustrated in block diagram form in Fig. 3.1-2; note that in this formulation the state variables are the outputs of integrators.

Figure 3.1-2 Block Diagram Representation of Eq. (3.1-6)
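The construction of Eq. (3.1-6) is mechanical; a minimal Python sketch (the example coefficients are arbitrary assumptions):

    import numpy as np

    def companion(a):                   # a = [a0, a1, ..., a_{n-1}]
        n = len(a)
        F = np.zeros((n, n))
        F[:-1, 1:] = np.eye(n - 1)      # superdiagonal of ones
        F[-1, :] = -np.asarray(a)       # last row: -a0, -a1, ..., -a_{n-1}
        return F

    print(companion([2.0, 3.0, 5.0]))   # for y''' + 5y'' + 3y' + 2y = w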

In many linear systems of interest the forcing and control functions are multivariable - i.e., \underline{w}(t) and \underline{u}(t) in Eq. (3.1-1) are composed of several nonzero functions.


Also, the individual elements of \underline{w}(t) and \underline{u}(t) may drive several state variables simultaneously, causing G(t) and L(t) to be matrices with significant elements at locations other than those on the main diagonal. Ordinarily, system dynamics are determined directly from the physics of the problem. A block diagram of the physical system may be sketched and the first-order vector-matrix differential equation determined by inspection. The outputs of each of the various integrators would constitute a convenient set of state variables. The system dynamics equations can be written in the form of Eq. (3.1-1) as

\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \vdots \\ \dot{x}_n \end{bmatrix} = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1n} \\ f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & & & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1r} \\ g_{21} & g_{22} & \cdots & g_{2r} \\ \vdots & & & \vdots \\ g_{n1} & g_{n2} & \cdots & g_{nr} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_r \end{bmatrix} + \begin{bmatrix} \ell_{11} & \ell_{12} & \cdots & \ell_{1s} \\ \ell_{21} & \ell_{22} & \cdots & \ell_{2s} \\ \vdots & & & \vdots \\ \ell_{n1} & \ell_{n2} & \cdots & \ell_{ns} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_s \end{bmatrix} \qquad (3.1-7)

The functions \underline{w} and \underline{u} need not be of dimension n; in the equation shown the dimension of \underline{w} is r and that of \underline{u} is s. However, it is required that the products G\underline{w} and L\underline{u} be of dimension n. Reference 1 further demonstrates the steps required to convert a high-order differential equation into a set of state variables driven by a multivariable forcing function. Several examples of the application of state-space notation to physical systems are given below.

Example 3.1-1

Consider the mass m shown in Fig. 3.1-3; it is connected to the left wall by a spring with spring constant k and a damper with damping coefficient c. Frictionless wheels are assumed. Displacement x is measured (positive-left) between the indicators; the entire container is subject to acceleration w(t), which is positive to the right. This is a one-dimensional translation-motion-only system and, consequently, displacement x and velocity \dot{x} are suitable state variables.

Figure 3.1-3 Second-Order Physical System

The equation of motion of the system is obtained from Newton's second law

\sum f_x = m a \qquad (3.1-8)

The forces acting are \sum f_x = -kx - c\dot{x}, corresponding to the spring and damper. Total acceleration is a = \ddot{x} - w(t), so that

-kx - c\dot{x} = m\left[\ddot{x} - w(t)\right] \qquad (3.1-9)

m\ddot{x} + c\dot{x} + kx = m w(t) \qquad (3.1-10)

If the state vector is defined as

\underline{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x \\ \dot{x} \end{bmatrix}

the appropriate equation for the system dynamics is

\dot{\underline{x}}(t) = \begin{bmatrix} 0 & 1 \\ -k/m & -c/m \end{bmatrix} \underline{x}(t) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} w(t) \qquad (3.1-11)
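Equation (3.1-11) is easy to exercise numerically; a minimal sketch with a simple Euler integration loop (m, c, k, the step size and the disturbance are illustrative assumptions):

    import numpy as np

    m, c, k = 1.0, 0.4, 4.0
    F = np.array([[0.0, 1.0], [-k / m, -c / m]])
    G = np.array([0.0, 1.0])             # w(t) drives the velocity state

    dt, w = 0.001, 0.0
    x = np.array([1.0, 0.0])             # initial displacement, zero velocity
    for _ in range(5000):                # 5 seconds
        x = x + dt * (F @ x + G * w)

    print(x)                             # damped oscillation decaying toward zero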


Example 3.1-2

Application of state-space notation when the system is described by a block diagram is illustrated using Fig. 3.1-4, which is a single-axis inertial navigation system Schuler loop error diagram (Ref. 2). The following definitions apply: \psi is the platform tilt angle (rad), \delta v is the system velocity error (fps), \delta p is the system position error (ft), R is earth radius (ft), g is local gravity (fps^2), \epsilon_g is gyro random drift rate (rad/sec), and \epsilon_a is accelerometer uncertainty (fps^2). The system state variables are chosen as the outputs of the three integrators so that the state vector is

\underline{x} = \begin{bmatrix} \psi \\ \delta v \\ \delta p \end{bmatrix}

Figure 3.1-4 Single-Axis Schuler Loop Error Diagram

No control \underline{u} is applied and, therefore, the system dynamics equation can be written by inspection as

\begin{bmatrix} \dot{\psi} \\ \delta\dot{v} \\ \delta\dot{p} \end{bmatrix} = \begin{bmatrix} 0 & 1/R & 0 \\ -g & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \psi \\ \delta v \\ \delta p \end{bmatrix} + \begin{bmatrix} \epsilon_g \\ \epsilon_a \\ 0 \end{bmatrix} \qquad (3.1-12)

An equivalent form for the random forcing function is

\begin{bmatrix} \epsilon_g \\ \epsilon_a \\ 0 \end{bmatrix} = G\, \underline{w}(t) \qquad (3.1-13)

where

G = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad \underline{w}(t) = \begin{bmatrix} \epsilon_g \\ \epsilon_a \end{bmatrix} \qquad (3.1-14)

As previously noted, the only requirement is that the product G\underline{w} has the same dimension as the vector \underline{x}.

3.2 TRANSITION MATRIX

Having provided a mathematical formulation for the description of physical systems, we next seek techniques for solution of the system dynamics equation. The first step involves solving the equation when the forcing function \underline{w} and control function \underline{u} are not present.

The homogeneous unforced matrix differential equation corresponding to Eq. (3.1-1) is

\dot{\underline{x}}(t) = F(t)\, \underline{x}(t) \qquad (3.2-1)

Suppose that at some time, t_0, all but one of the outputs of the system integrators are set to zero and no inputs are present. Also, assume that the nonzero integrator output is given a magnitude of one. The behavior of the state vector for all times t, where t \ge t_0, can be expressed in terms of a time-varying "solution vector," \underline{\varphi}_i(t,t_0), where the subscript refers to the integrator whose output is initially nonzero:

\underline{\varphi}_i(t,t_0) = \begin{bmatrix} x_1(t,t_0)_i \\ x_2(t,t_0)_i \\ \vdots \\ x_n(t,t_0)_i \end{bmatrix} \qquad (3.2-2)

If the initial condition on the ith integrator is something other than unity - a scale factor c_i, for example - we find from the linear behavior of the system

\underline{x}(t) = c_i\, \underline{\varphi}_i(t,t_0) \qquad (3.2-3)

Now, due to the superposition principle, if integrators i and j both have nonzero outputs c_i and c_j at time t_0, the system response is the sum of the individual response vectors, viz:

\underline{x}(t) = c_i\, \underline{\varphi}_i(t,t_0) + c_j\, \underline{\varphi}_j(t,t_0) \qquad (3.2-4)

But this can be written as a product of the matrix

\left[\underline{\varphi}_i \,\vdots\, \underline{\varphi}_j\right]

and the vector

\begin{bmatrix} c_i \\ c_j \end{bmatrix}


In general, every integrator can have a nonzero value at t_0; these values comprise the state \underline{x}(t_0). The time history of the state is the sum of the individual effects,

\underline{x}(t) = \sum_{i=1}^{n} x_i(t_0)\, \underline{\varphi}_i(t,t_0) \qquad (3.2-5)

which for compactness is written as

\underline{x}(t) = \Phi(t,t_0)\, \underline{x}(t_0) \qquad (3.2-6)

The matrix \Phi(t,t_0) is called the transition matrix for the system of Eq. (3.2-1). The transition matrix allows calculation of the state vector at some time t, given complete knowledge of the state vector at t_0, in the absence of forcing functions.

Returning to Eq. (3.2-2), it can be seen that the solution vectors obey the differential equation

\frac{d\underline{\varphi}_i(t,t_0)}{dt} = F(t)\, \underline{\varphi}_i(t,t_0) \qquad (3.2-7)

where

\underline{\varphi}_i(t_0,t_0) = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}^T \quad \text{(unity in the ith position)} \qquad (3.2-8)

Similarly, the transition matrix, composed of the vectors \underline{\varphi}_i, obeys the equations

\frac{d}{dt}\, \Phi(t,t_0) = F(t)\, \Phi(t,t_0), \qquad \Phi(t,t) = I \qquad (3.2-9)

Figure 3.2-1 illustrates the time history of a state vector. The transition matrix \Phi(t_1,t_0) describes the influence of \underline{x}(t_0) on \underline{x}(t_1):

\underline{x}(t_1) = \Phi(t_1,t_0)\, \underline{x}(t_0) \qquad (3.2-10)

Figure 3.2-1 Conceptual Illustration of the Time Evolution of a State Vector in (n+1)-Dimensional Space

Also,

\underline{x}(t_2) = \Phi(t_2,t_1)\, \underline{x}(t_1) \qquad (3.2-11)

= \Phi(t_2,t_1)\, \Phi(t_1,t_0)\, \underline{x}(t_0) \qquad (3.2-12)

which is a general property of the state transition matrix, independent of the order of t_0, t_1, and t_2. Since for any t

\Phi(t,t) = \Phi(t,t_0)\, \Phi(t_0,t) = I \qquad (3.2-13)

premultiplying by \Phi^{-1}(t,t_0) provides the useful relationship

\Phi^{-1}(t,t_0) = \Phi(t_0,t) \qquad (3.2-14)

Since the inverse of \Phi(t,t_0) must exist, it necessarily follows that

|\Phi(t,t_0)| \ne 0 \qquad (3.2-15)


Other relationships involving the determinant of the transition matrix are

\frac{d}{dt}\, |\Phi(t,t_0)| = \mathrm{trace}[F(t)]\; |\Phi(t,t_0)| \qquad (3.2-16)

|\Phi(t,t_0)| = \exp\left[\int_{t_0}^{t} \mathrm{trace}[F(\tau)]\, d\tau\right] \qquad (3.2-17)

Transition Matrix for Stationary Systems - For a stationary system, the F matrix is time-invariant and the transition matrix depends only on the time interval considered, viz:

\Phi(t,t_0) = \Phi(t - t_0) \qquad (3.2-18)

This is easily shown by expanding \underline{x}(t) in a Taylor series about some time, t_0,

\underline{x}(t) = \underline{x}(t_0) + \dot{\underline{x}}(t_0)(t-t_0) + \frac{1}{2!}\, \ddot{\underline{x}}(t_0)(t-t_0)^2 + \ldots \qquad (3.2-19)

But from Eq. (3.2-1),

\dot{\underline{x}}(t_0) = F\, \underline{x}(t_0)
\ddot{\underline{x}}(t_0) = F\, \dot{\underline{x}}(t_0) = F^2\, \underline{x}(t_0)
etc.

Substituting, the expansion becomes

\underline{x}(t) = \left[I + F(t-t_0) + \frac{1}{2!}\, F^2 (t-t_0)^2 + \ldots\right] \underline{x}(t_0) \qquad (3.2-20)

In Chapter 2, the matrix exponential is defined as

e^{F(t-t_0)} = \sum_{n=0}^{\infty} \frac{F^n (t-t_0)^n}{n!} \qquad (3.2-21)

Consequently, the transition matrix for the stationary system can be identified from Eq. (3.2-20) as

\Phi(t - t_0) = e^{F(t-t_0)} \qquad (3.2-22)

which depends only on the stationary system dynamics F and the interval t - t_0. Examples of the transition matrices for several simple physical systems are illustrated below.

Example 3.2-1

Consider the circuit shown in Fig. 3.2-2, which is composed of a voltage source, v, a resistor, R, and an inductor, L. Kirchhoff's voltage law yields

v = iR + L\, \frac{di}{dt} \qquad (3.2-23)

Figure 3.2-2 Elementary Electrical Circuit

We assume i = i_0 at t = t_0 and v = 0 for all time, which yields

\frac{di}{dt} = -\frac{R}{L}\, i \qquad (3.2-24)

The system dynamics matrix F is merely the scalar quantity -R/L. Elementary differential equation solution techniques yield

i(t) = i_0\, e^{-\frac{R}{L}(t-t_0)} \qquad (3.2-25)

as the solution to Eq. (3.2-24). From the solution we identify the transition matrix as

\Phi(t,t_0) = e^{-\frac{R}{L}(t-t_0)} \qquad (3.2-26)

Properties of the transition matrix are readily verified. For times t_0, t_1, t_2, we write


\Phi(t_2,t_1)\, \Phi(t_1,t_0) = e^{-\frac{R}{L}(t_2-t_1)}\, e^{-\frac{R}{L}(t_1-t_0)} = e^{-\frac{R}{L}(t_2-t_0)} = \Phi(t_2,t_0) \qquad (3.2-27)

in agreement with Eq. (3.2-12), and

\Phi^{-1}(t_1,t_0) = e^{\frac{R}{L}(t_1-t_0)} = \Phi(t_0,t_1) \qquad (3.2-28)

in agreement with Eq. (3.2-14).

Example 3.2-2

The system shown in Fig. 3.2-3 is used as a second example of transition matrix computation. Integrator outputs x_1 and x_2 are convenient state variables; the system dynamics equation is obtained by inspection as

\dot{\underline{x}}(t) = \begin{bmatrix} 0 & -\omega_0 \\ \omega_0 & 0 \end{bmatrix} \underline{x}(t) \qquad (3.2-29)

Figure 3.2-3 Second-Order Oscillatory System

Matrix multiplication yields

Ft = \begin{bmatrix} 0 & -\omega_0 t \\ \omega_0 t & 0 \end{bmatrix}, \quad F^2 t^2 = \begin{bmatrix} -\omega_0^2 t^2 & 0 \\ 0 & -\omega_0^2 t^2 \end{bmatrix}, \quad F^3 t^3 = \begin{bmatrix} 0 & \omega_0^3 t^3 \\ -\omega_0^3 t^3 & 0 \end{bmatrix}, \; \ldots \qquad (3.2-30)

so that

e^{Ft} = I + Ft + \frac{1}{2!}\, F^2 t^2 + \frac{1}{3!}\, F^3 t^3 + \ldots
= \begin{bmatrix} 1 - \frac{1}{2!}\omega_0^2 t^2 + \ldots & -\omega_0 t + \frac{1}{3!}\omega_0^3 t^3 - \ldots \\ \omega_0 t - \frac{1}{3!}\omega_0^3 t^3 + \ldots & 1 - \frac{1}{2!}\omega_0^2 t^2 + \ldots \end{bmatrix} \qquad (3.2-31)

We identify the two series in Eq. (3.2-31) as trigonometric functions:

\Phi(t,0) = \begin{bmatrix} \cos\omega_0 t & -\sin\omega_0 t \\ \sin\omega_0 t & \cos\omega_0 t \end{bmatrix} \qquad (3.2-32)

Figure 3.2-3 is, in fact, a representation of a second-order oscillator; this identification is borne out by the oscillatory nature of the transition matrix.
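The series of Eq. (3.2-21) is exactly what a numerical matrix exponential evaluates; a minimal sketch checking Eq. (3.2-32) and the composition property of Eq. (3.2-12) (w0 and t are illustrative assumptions):

    import numpy as np
    from scipy.linalg import expm

    w0, t = 2.0, 0.7
    F = np.array([[0.0, -w0], [w0, 0.0]])

    Phi = expm(F * t)
    ref = np.array([[np.cos(w0 * t), -np.sin(w0 * t)],
                    [np.sin(w0 * t),  np.cos(w0 * t)]])
    print(np.allclose(Phi, ref))                              # True, Eq. (3.2-32)
    print(np.allclose(expm(F * 0.3) @ expm(F * 0.4), Phi))    # True, Eq. (3.2-12)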

3.3 MATRIX SUPERPOSITION INTEGRAL

Having computed the homogeneous solution, we now seek the differential equation particular solution. Consider first the linear system including forcing function inputs:

\dot{\underline{x}}(t) = F(t)\, \underline{x}(t) + L(t)\, \underline{u}(t) \qquad (3.3-1)

Referring to Fig. 3.3-1, we see that the effect of the input to the ith integrator of Fig. 3.1-2 over a small interval (\tau - \Delta\tau, \tau) can be represented as an impulse whose area is the ith row of the term L(\tau)\underline{u}(\tau) times the interval \Delta\tau. This impulse produces a small change in the output of the integrator,

\Delta x_i(\tau) = \left(L(\tau)\, \underline{u}(\tau)\right)_i \Delta\tau \qquad (3.3-2)

The change in the entire state vector can be expressed as

\Delta\underline{x}(\tau) = \begin{bmatrix} \Delta x_1(\tau) \\ \Delta x_2(\tau) \\ \vdots \\ \Delta x_n(\tau) \end{bmatrix} = L(\tau)\, \underline{u}(\tau)\, \Delta\tau \qquad (3.3-3)

The effect of this small change in the state at any subsequent point in time is given by

\underline{x}(t)\ \text{[given an impulsive input } L(\tau)\underline{u}(\tau)\Delta\tau\text{]} = \Phi(t,\tau)\, L(\tau)\, \underline{u}(\tau)\, \Delta\tau \qquad (3.3-4)


Figure 3.3-1 Representation of the Input to the ith Integrator as an Impulse

Because the system is linear, the response to an input can be viewed as the sum of individual responses to the impulses comprising the input. In the limit as \Delta\tau \to 0 the effect of the input on the state at some time, t, is given by

\underline{x}(t) = \int_{t_0}^{t} \Phi(t,\tau)\, L(\tau)\, \underline{u}(\tau)\, d\tau \qquad (3.3-5)

If the state at some time, t_0, is known, it is only necessary to observe the input after t_0; by utilizing Eq. (3.2-6):

\underline{x}(t) = \Phi(t,t_0)\, \underline{x}(t_0) + \int_{t_0}^{t} \Phi(t,\tau)\, L(\tau)\, \underline{u}(\tau)\, d\tau \qquad (3.3-6)

Equation (3.3-6) is often called the matrix superposition integral. Of course, differentiation of Eq. (3.3-6) can be shown to result in Eq. (3.3-1). An alternate derivation of the matrix superposition integral is shown in Example 3.3-1 and its application to a physical system is illustrated in Example 3.3-2.

Example 3.3-1

The solution to the homogeneous equation is

\underline{x}(t) = \Phi(t,t_0)\, \underline{x}(t_0) \qquad (3.3-7)

We seek a solution of a similar nature, which also includes the effect of the forcing function. The assumed form is

\underline{x}(t) = \Phi(t,t_0)\, \underline{z}(t) \qquad (3.3-8)

Substitution in Eq. (3.3-1) yields

\frac{d}{dt}\left[\Phi(t,t_0)\, \underline{z}(t)\right] = F(t)\, \Phi(t,t_0)\, \underline{z}(t) + L(t)\, \underline{u}(t) \qquad (3.3-9)

F(t)\, \Phi(t,t_0)\, \underline{z}(t) + \Phi(t,t_0)\, \dot{\underline{z}}(t) = F(t)\, \Phi(t,t_0)\, \underline{z}(t) + L(t)\, \underline{u}(t)

which reduces to

\dot{\underline{z}}(t) = \Phi(t_0,t)\, L(t)\, \underline{u}(t) \qquad (3.3-10)

Solving for \underline{z}(t), we obtain

\underline{z}(t) = \underline{z}(t_0) + \int_{t_0}^{t} \Phi(t_0,\tau)\, L(\tau)\, \underline{u}(\tau)\, d\tau \qquad (3.3-11)

Substituting this result in Eq. (3.3-8) yields

\underline{x}(t) = \Phi(t,t_0)\, \underline{z}(t_0) + \int_{t_0}^{t} \Phi(t,t_0)\, \Phi(t_0,\tau)\, L(\tau)\, \underline{u}(\tau)\, d\tau

But \underline{z}(t_0) = \underline{x}(t_0), so that the desired solution to Eq. (3.3-1) is

\underline{x}(t) = \Phi(t,t_0)\, \underline{x}(t_0) + \int_{t_0}^{t} \Phi(t,\tau)\, L(\tau)\, \underline{u}(\tau)\, d\tau \qquad (3.3-12)

Example 3.3-2

The electrical circuit of Fig. 3.2-2 is now subject to a nonzero input voltage, v. We assume initial condition i = i_0 at time t = 0. Prior to t = 0, the voltage v is zero; for t > 0 it is constant at v = V. The system equation is (t > 0)

\frac{di}{dt} = -\frac{R}{L}\, i + \frac{V}{L} \qquad (3.3-13)

Recall that the transition matrix is

\Phi(t,t_0) = e^{-\frac{R}{L}(t-t_0)} \qquad (3.3-14)


so that

\Phi(t,0) = e^{-\frac{R}{L}t}

Substituting in Eq. (3.3-6), we obtain

i(t) = i_0\, e^{-\frac{R}{L}t} + \int_0^{t} e^{-\frac{R}{L}(t-\tau)}\, \frac{V}{L}\, d\tau \qquad (3.3-15)

= i_0\, e^{-\frac{R}{L}t} + \frac{V}{R}\left(1 - e^{-\frac{R}{L}t}\right) \qquad (3.3-16)

The solution of the dynamics equation, when a random input is present, proceeds in an analogous fashion. Thus, corresponding to

\dot{\underline{x}}(t) = F(t)\, \underline{x}(t) + G(t)\, \underline{w}(t) + L(t)\, \underline{u}(t)

we directly find

\underline{x}(t) = \Phi(t,t_0)\, \underline{x}(t_0) + \int_{t_0}^{t} \Phi(t,\tau)\, G(\tau)\, \underline{w}(\tau)\, d\tau + \int_{t_0}^{t} \Phi(t,\tau)\, L(\tau)\, \underline{u}(\tau)\, d\tau \qquad (3.3-17)
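The integral in Eq. (3.3-15) can be checked against the closed form of Eq. (3.3-16) by quadrature; a minimal sketch (R, L, V, i0 and the grid are illustrative assumptions):

    import numpy as np

    R, L, V, i0, t = 2.0, 0.5, 1.0, 3.0, 0.6
    tau = np.linspace(0.0, t, 100_001)
    integrand = np.exp(-(R / L) * (t - tau)) * (V / L)

    dtau = tau[1] - tau[0]
    i_num = (i0 * np.exp(-(R / L) * t)
             + np.sum(integrand[:-1] + integrand[1:]) * dtau / 2)  # trapezoid rule
    i_closed = i0 * np.exp(-(R / L) * t) + (V / R) * (1.0 - np.exp(-(R / L) * t))
    print(i_num, i_closed)          # agree to several digits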

3.4 DISCRETE FORMULATION

If interest is focussed on the system state at discrete points in time, t_k, k = 1, 2, ..., the resulting difference equation is, from Eq. (3.3-17),

\underline{x}_{k+1} = \Phi_k\, \underline{x}_k + \Gamma_k\, \underline{w}_k + \Lambda_k\, \underline{u}_k \qquad (3.4-1)

where

\Phi_k = \Phi(t_{k+1}, t_k) \qquad (3.4-2)

\Gamma_k\, \underline{w}_k = \int_{t_k}^{t_{k+1}} \Phi(t_{k+1},\tau)\, G(\tau)\, \underline{w}(\tau)\, d\tau \qquad (3.4-3)

\Lambda_k\, \underline{u}_k = \int_{t_k}^{t_{k+1}} \Phi(t_{k+1},\tau)\, L(\tau)\, \underline{u}(\tau)\, d\tau \qquad (3.4-4)

In general, Eqs. (3.4-3) and (3.4-4) provide unique definitions of only the products \Gamma_k \underline{w}_k and \Lambda_k \underline{u}_k and not the individual terms \Gamma_k, \underline{w}_k, \Lambda_k and \underline{u}_k. In the following discussion, care is taken to make this distinction clear. Notice that if \underline{w}(t) is a vector of random processes, \underline{x}_k and \Gamma_k \underline{w}_k will be vectors of random processes. Equation (3.4-1) is illustrated in Fig. 3.4-1.*

Figure 3.4-1 Illustration of Discrete Representation of Linear Dynamics Equation

It is important to note that, in subsequent discussions, we do not deal with all discrete systems, but rather, only those that can be derived from continuous systems. Without this restriction there could be extensive computational difficulty - e.g., in the discrete system whose transition matrix \Phi_k is not invertible. By considering only the subset of discrete systems noted above, the invertibility of \Phi_k is assured.

*In the sequel, the case of \Gamma_k = I and \Lambda_k = I is treated quite often.
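Equations (3.4-2) and (3.4-4) are readily computed; a minimal sketch for a double-integrator plant with constant u over the step (the plant and step size are illustrative assumptions):

    import numpy as np
    from scipy.linalg import expm

    F = np.array([[0.0, 1.0], [0.0, 0.0]])
    Lc = np.array([[0.0], [1.0]])
    dt, u = 0.1, 1.0

    Phi = expm(F * dt)                                   # Eq. (3.4-2)
    taus = np.linspace(0.0, dt, 2001)
    vals = np.stack([expm(F * (dt - s)) @ Lc * u for s in taus])
    ds = taus[1] - taus[0]
    Lu = ((vals[:-1] + vals[1:]) / 2).sum(axis=0) * ds   # Eq. (3.4-4), trapezoid

    print(Phi)            # [[1, dt], [0, 1]]
    print(Lu.ravel())     # [dt^2/2, dt] = [0.005, 0.1]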

3.5 SYSTEM OBSERVABILITY AND CONTROLLABILITY

OBSERVABILITY

A discussion of observability requires that the concept of a measurement be employed. Measurements are briefly discussed in Section 2.1 with regard to least-squares estimation and further discussion is given in Chapter 4. At this point it suffices to say that measurements can indeed be made. They are denoted by \underline{z}_k and are assumed linearly related to the discrete system state \underline{x}_k by the observation matrix H_k, as


\underline{z}_k = H_k\, \underline{x}_k + \underline{v}_k \qquad (3.5-1)

where \underline{v}_k is the measurement noise. Given a sequence of measurements \underline{z}_0, \underline{z}_1, ..., \underline{z}_k, the observability condition defines our ability to determine \underline{x}_0, \underline{x}_1, ..., \underline{x}_k from the measurements.

Consider the discrete deterministic, constant nth-order system* (Ref. 3)

\underline{x}_{k+1} = \Phi\, \underline{x}_k \qquad (3.5-2)

with n scalar noise-free measurements

z_k = H\, \underline{x}_k, \qquad k = 0, 1, 2, \ldots, n-1 \qquad (3.5-3)

so that H is a constant, n-dimensional row vector. We may write

z_0 = H\, \underline{x}_0
z_1 = H\, \underline{x}_1 = H\Phi\, \underline{x}_0
\vdots
z_{n-1} = H\Phi^{n-1}\, \underline{x}_0 \qquad (3.5-4)

Therefore,

\underline{z} = \begin{bmatrix} z_0 \\ z_1 \\ \vdots \\ z_{n-1} \end{bmatrix} = \begin{bmatrix} H \\ H\Phi \\ \vdots \\ H\Phi^{n-1} \end{bmatrix} \underline{x}_0 = \Xi^T\, \underline{x}_0 \qquad (3.5-5)

If \underline{x}_0 is to be determined uniquely, the matrix \Xi^T (or equivalently \Xi) must have an inverse - i.e., be nonsingular. Thus, the observability condition is that the matrix

\Xi = \left[H^T \,\vdots\, \Phi^T H^T \,\vdots\, \ldots \,\vdots\, (\Phi^T)^{n-1} H^T\right] \qquad (3.5-6)

be of rank n. The same condition will apply when measurements are a vector; extension to continuous systems is straightforward.

More complete statements of observability and the observability condition for continuous systems follow:

- A system is observable at time t_1 > t_0 if it is possible to determine the state \underline{x}(t_0) by observing \underline{z}(t) in the interval (t_0, t_1). If all states \underline{x}(t_0) corresponding to all \underline{z}(t) are observable, the system is completely observable.

- A continuous deterministic nth-order, constant coefficient, linear dynamic system is observable if, and only if, the matrix

\Xi = \left[H^T \,\vdots\, F^T H^T \,\vdots\, \ldots \,\vdots\, (F^T)^{n-1} H^T\right] \qquad (3.5-7)

has rank n.

Example 3.5-1

Consider the third-order system in Fig. 3.5-1 described by:

\dot{\underline{x}} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 0 \end{bmatrix} \underline{x} \qquad (3.5-8)

If measurements can be made only at the output of the final integrator, then

z = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \underline{x} \qquad (3.5-9)

We compute

F^T H^T = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad (F^T)^2 H^T = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \qquad (3.5-10)

so that

\Xi = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \qquad (3.5-11)

Figure 3.5-1 Third-Order System with Output Observation Only

*A time-invariant discrete system is denoted by \Phi_k = \Phi, \Lambda_k = \Lambda, etc.


A square n x n matrix has rank n if it has a nonzero determinant. The determinant of \Xi is zero, so that the matrix has rank less than 3; thus, the system is not observable. The physical interpretation of this result is that it is impossible to distinguish between the spectrally identical states x_1 and x_2 when the only available measurement is their sum.
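The rank test is one line of linear algebra in practice; a minimal sketch for the system of Example 3.5-1 as reconstructed above (numpy's matrix_rank stands in for the determinant test):

    import numpy as np

    F = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0]])
    H = np.array([[0.0, 0.0, 1.0]])

    cols = [H.T]
    for _ in range(2):                   # build [H^T, F^T H^T, (F^T)^2 H^T]
        cols.append(F.T @ cols[-1])
    Xi = np.hstack(cols)

    print(np.linalg.matrix_rank(Xi))     # 2 < 3: not observable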

CONTROLLABILITY

We now concern ourselves with determining conditions such that it is possible to control the state of a deterministic linear dynamic system - i.e., to select an input so that the state takes any desired value after n stages. A simple derivation of controllability for the case of a discrete system driven by a scalar control is analogous to the derivation of observability; it is left as an exercise for the reader. The condition is that the matrix

\Theta = \left[\Lambda \,\vdots\, \Phi\Lambda \,\vdots\, \ldots \,\vdots\, \Phi^{n-1}\Lambda\right] \qquad (3.5-12)

be of rank n. We proceed directly to the statements of controllability and the controllability condition for a continuous system:

- A system is controllable at time t_1 > t_0 if there exists a control \underline{u}(t) such that any arbitrary state \underline{x}(t_0) = \underline{x}_0 can be driven to another arbitrary state \underline{x}(t_1) = \underline{x}_1.

- A continuous deterministic nth-order, constant coefficient, linear dynamic system is controllable if, and only if, the matrix

\Theta = \left[L \,\vdots\, FL \,\vdots\, \ldots \,\vdots\, F^{n-1}L\right] \qquad (3.5-13)

has rank n.

Example 3.5-2

Consider the system in Fig. 3.5-2. We wish to find conditions on \alpha and \beta such that the system is controllable. The system is second-order and described by:

\dot{\underline{x}}(t) = \begin{bmatrix} -\alpha & 0 \\ 0 & -\beta \end{bmatrix} \underline{x}(t) + \begin{bmatrix} 1 \\ 1 \end{bmatrix} u(t) \qquad (3.5-14)

Figure 3.5-2 Parallel First-Order Systems with a Common Control Input

so that

F = \begin{bmatrix} -\alpha & 0 \\ 0 & -\beta \end{bmatrix}, \qquad L = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

yielding

\Theta = \begin{bmatrix} 1 & -\alpha \\ 1 & -\beta \end{bmatrix} \qquad (3.5-15)

To determine the rank of \Theta, we compute its determinant

|\Theta| = -\beta + \alpha \qquad (3.5-16)

The controllability condition requires a nonzero determinant for \Theta:

0 \ne -\beta + \alpha, \qquad \alpha \ne \beta \qquad (3.5-17)

The interpretation of this result is quite straightforward. If \alpha and \beta are equal, the two first-order systems are identical and there is no way that an input u could, by itself, produce different values of x_1 and x_2. The controllability condition requires that x_1 and x_2 can be driven to any arbitrary values; with \alpha = \beta the condition cannot be met and, therefore, the system is uncontrollable. It is important to note that the test applied to the matrix \Theta only establishes controllability; it does not provide a means to determine the input u required to control the system. This latter task can be quite difficult.
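The condition of Eq. (3.5-17) can be exercised numerically; a minimal sketch (the trial values of alpha and beta are illustrative assumptions):

    import numpy as np

    def controllable(alpha, beta):
        F = np.array([[-alpha, 0.0], [0.0, -beta]])
        L = np.array([[1.0], [1.0]])
        Theta = np.hstack([L, F @ L])              # [L | FL], Eq. (3.5-13)
        return np.linalg.matrix_rank(Theta) == 2

    print(controllable(1.0, 2.0))   # True:  alpha != beta
    print(controllable(1.0, 1.0))   # False: alpha == beta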

Nonuniqueness of Model - Having introduced the concept of a measurement, we have completed the basic state-space structure of a linear system. For a continuous linear system, the general state-space model is

\dot{\underline{x}}(t) = F(t)\, \underline{x}(t) + G(t)\, \underline{w}(t) \qquad (3.5-18)

\underline{z}(t) = H(t)\, \underline{x}(t) + \underline{v}(t) \qquad (3.5-19)

For a discrete linear system, the model is

\underline{x}_{k+1} = \Phi_k\, \underline{x}_k + \Gamma_k\, \underline{w}_k \qquad (3.5-20)

\underline{z}_k = H_k\, \underline{x}_k + \underline{v}_k \qquad (3.5-21)

These models are not unique; given the pertinent system input and output quantities - i.e., \underline{x}(0), \underline{w}(t) and \underline{z}(t) in the continuous case - there are many different sets of F(t), G(t) and H(t) which will yield the same overall input-output behavior. As noted in Ref. 4, choosing a particular set of F, G and H corresponds to the choice of a coordinate system. This choice can have considerable impact in numerical analyses as well as affecting system observability and controllability as described above.

3.6 COVARIANCE MATRIX

In the following discussions both the system state and forcing function are vectors whose elements are random variables. The state vector obeys a relationship of the form of Eq. (3.1-1), while the forcing function \underline{w} is, in general, assumed to be uncorrelated in time (white noise). In the discrete formulation, the forcing function \underline{w}_k will be assumed uncorrelated from observation time to observation time (white sequence). For the remainder of this chapter, we restrict consideration to those systems for which the control input \underline{u} is zero.

It is generally assumed that random variables have zero ensemble average values - i.e., they are unbiased. The fact that a variable is known to be biased implies knowledge of the bias value. Consequently, a new quantity with the bias removed (i.e., subtracted) can be defined. This does not suggest that certain constant state variables cannot be considered; as long as their distribution over the ensemble of possible values is unbiased, constant state variables are admissible. It should be pointed out that if the state at some time t_0 is unbiased, the state will remain unbiased. This can be illustrated by taking the ensemble average of both sides of Eq. (3.4-1) where \underline{u}_k = \underline{0}:

E[\underline{x}_{k+1}] = E[\Phi_k\, \underline{x}_k + \Gamma_k\, \underline{w}_k]
= \Phi_k\, E[\underline{x}_k] + \Gamma_k\, E[\underline{w}_k]
= \underline{0} \qquad (3.6-1)

The random state and forcing function vectors are frequently described in terms of their covariance matrices. The cross-covariance matrix of two vectors \underline{r} and \underline{s} is defined in terms of the outer products:

E\left[(\underline{r} - E[\underline{r}])(\underline{s} - E[\underline{s}])^T\right] = E[\underline{r}\, \underline{s}^T] - E[\underline{r}]\, E[\underline{s}^T] \qquad (3.6-2)

When \underline{s} = \underline{r}, Eq. (3.6-2) defines the covariance of \underline{r}; it is simply a matrix whose elements are the second moments of the random components r_1, r_2, ..., r_n. In the sequel, we define the error \tilde{\underline{x}} in the estimate of a state vector to be the difference between the estimated (\hat{\underline{x}}) and actual (\underline{x}) values:

\tilde{\underline{x}} = \hat{\underline{x}} - \underline{x} \qquad (3.6-3)

The covariance of \tilde{\underline{x}}, designated P, is then given by

P = E[\tilde{\underline{x}}\, \tilde{\underline{x}}^T] \qquad (3.6-4)

It provides a statistical measure of the uncertainty in \hat{\underline{x}}. The notation permits us to discuss the properties of the covariance matrix independently of the mean value of the state.

Some features of the covariance matrix can be seen by treating the error in knowledge of two random system state variables,

\tilde{\underline{x}} = \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix} \qquad (3.6-5)

The covariance matrix of \tilde{\underline{x}} is

P = E[\tilde{\underline{x}}\, \tilde{\underline{x}}^T] = \begin{bmatrix} E[\tilde{x}_1^2] & E[\tilde{x}_1 \tilde{x}_2] \\ E[\tilde{x}_1 \tilde{x}_2] & E[\tilde{x}_2^2] \end{bmatrix} \qquad (3.6-6)

Notice that the covariance matrix of an n-state vector is an n x n symmetric matrix; this fact will be used repeatedly in subsequent chapters. The diagonal elements of this covariance matrix are the mean square errors in knowledge of the state variables. Also, the trace of P is the mean square length of the vector \tilde{\underline{x}}. The off-diagonal terms of P are indicators of cross-correlation between the elements of \tilde{\underline{x}}. Specifically, they are related to the linear correlation coefficient \rho(\tilde{x}_1, \tilde{x}_2) by



p_12 = ρ(x_1, x_2) σ_1 σ_2    (3.6-7)

where σ indicates standard deviation.

The random forcing functions are also described in terms of their covariance matrices. In the continuous formulation, the covariance matrix for the white noise random forcing function, Gw, is given by

E[(G(t) w(t))(G(τ) w(τ))^T] = G(t) Q(t) G^T(t) δ(t - τ)    (3.6-8)

where the operator δ is the Dirac delta function. The corresponding covariance matrix of the uncorrelated random sequence Γ_k w_k in the discrete formulation is

E[(Γ_k w_k)(Γ_k w_k)^T] = Γ_k Q_k Γ_k^T    (3.6-9)

The noise covariance matrix is computed using the definition of Γ_k w_k given in Eq. (3.4-3), viz:

Γ_k w_k = ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) G(τ) w(τ) dτ    (3.6-10)

yielding

Γ_k Q_k Γ_k^T = E[ ∫_{t_k}^{t_{k+1}} ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) G(τ) w(τ) w^T(α) G^T(α) Φ^T(t_{k+1}, α) dτ dα ]    (3.6-11)

The expectation operator may be taken inside the integral since, loosely speaking, the expectation of a sum is the sum of the expectations and an integral may be thought of as an infinite series. We obtain

Γ_k Q_k Γ_k^T = ∫_{t_k}^{t_{k+1}} ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) G(τ) Q(τ) δ(τ - α) G^T(α) Φ^T(t_{k+1}, α) dτ dα    (3.6-12)

The properties of the Dirac delta function, δ(τ - α), are used to perform the integration over α, yielding

Γ_k Q_k Γ_k^T = ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) G(τ) Q(τ) G^T(τ) Φ^T(t_{k+1}, τ) dτ    (3.6-13)

which is the result sought.

There is an important distinction to be made between the matrices Q(t) and Q_k. The former is a spectral density matrix, whereas the latter is a covariance matrix. A spectral density matrix may be converted to a covariance matrix through multiplication by the Dirac delta function, δ(t - τ); since the delta function has units of 1/time, it follows that the units of the two matrices Q(t) and Q_k are different. This subject is again discussed in Chapter 4, when the continuous optimal filter is derived from the discrete formulation.

The above discussion concerns covariance matrices formed between vectors of equal dimension n. The matrices generated are square and n × n. Covariance matrices can also be formed between two vectors of unlike dimension; an example of this situation arises in Chapter 7.

3.7 PROPAGATION OF ERRORS

Consider the problem of estimating the state of a dynamic system in which the state vector x is known at some time t_k with an uncertainty expressed by the error covariance matrix

P_k = E[x̃_k x̃_k^T]    (3.7-1)

where the error vector, x̃_k, is the difference between the true state, x_k, and the estimate, x̂_k:

x̃_k = x̂_k - x_k    (3.7-2)

It is desired to obtain an estimate at a later point in time, t_{k+1}, which will have an unbiased error, x̃_{k+1}. To form the estimate (i.e., the predictable portion of x̂_{k+1} given x̂_k) the known state transition matrix Φ_k of Eq. (3.4-1) is used, resulting in

x̂_{k+1} = Φ_k x̂_k    (3.7-3)

To show that the error in the estimate at t_{k+1} is unbiased, subtract Eq. (3.4-1) from Eq. (3.7-3) to obtain

x̃_{k+1} = Φ_k x̃_k - Γ_k w_k    (3.7-4)



The expected value of the error is

E[x̃_{k+1}] = Φ_k E[x̃_k] - Γ_k E[w_k] = 0    (3.7-5)

Thus, under the assumptions that x̃_k and w_k are unbiased, it can be seen that Eq. (3.7-3) permits extrapolation of the state vector estimate without introducing a bias.

If a known input were provided to the system during the interval (t_k, t_{k+1}), this would appear as the additional term, Λ_k u_k, in Eq. (3.4-1),

x_{k+1} = Φ_k x_k + Γ_k w_k + Λ_k u_k    (3.7-6)

Since the input is known, an identical quantity is added to the estimate of Eq. (3.7-3) and thus, the estimation error would be unchanged.

Equation (3.7-4) can be used to develop a relationship for projecting the error covariance matrix P from time t_k to t_{k+1}. The error covariance matrix P_{k+1} is

P_{k+1} = E[x̃_{k+1} x̃_{k+1}^T]    (3.7-7)

From Eq. (3.7-4),

x̃_{k+1} x̃_{k+1}^T = (Φ_k x̃_k - Γ_k w_k)(Φ_k x̃_k - Γ_k w_k)^T    (3.7-8)
 = Φ_k x̃_k x̃_k^T Φ_k^T - Φ_k x̃_k w_k^T Γ_k^T - Γ_k w_k x̃_k^T Φ_k^T + Γ_k w_k w_k^T Γ_k^T

Taking the expected value of the terms of Eq. (3.7-8) and using the fact that the estimation error at t_k and the noise Γ_k w_k are uncorrelated (a consequence of the fact that Γ_k w_k is a white sequence), namely

E[x̃_k w_k^T] = E[w_k x̃_k^T] = 0    (3.7-9)

the equation for projecting the error covariance is found to be

P_{k+1} = Φ_k P_k Φ_k^T + Γ_k Q_k Γ_k^T    (3.7-10)

From Eq. (3.7-10) it can be seen that the size of the random system disturbance (i.e., the "size" of Γ_k Q_k Γ_k^T) has a direct bearing on the magnitude of the error covariance at any point in time. Less obvious is the effect of dynamic system stability, as reflected in the transition matrix, on covariance behavior. In broad terms, a very stable system* will cause the first term in Eq. (3.7-10) to be smaller than the covariance P_k. No restrictions regarding the stability of the system were made in the development above, and the error covariance of an unstable system will grow unbounded in the absence of measurements taken on the state. A system with neutral stability** will also exhibit an unbounded error growth if appropriate process noise is present.

The expression for covariance propagation in the continuous system formulation is obtained through use of limiting arguments applied to Eq. (3.7-10). The noise covariance is

Γ_k Q_k Γ_k^T = ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) G(τ) Q(τ) G^T(τ) Φ^T(t_{k+1}, τ) dτ    (3.7-11)

For t_{k+1} - t_k = Δt → 0, this is replaced by

G(t) Q(t) G^T(t) Δt    (3.7-12)

where terms of order Δt² have been dropped. The differential equation for the transition matrix is

Φ̇(t) = F(t) Φ(t)    (3.7-13)

or

Φ(t + Δt) - Φ(t) = F(t) Φ(t) Δt

as Δt → 0. Rearranging terms we deduce that, for Δt → 0,

Φ_k = I + F Δt    (3.7-14)

Substituting in Eq. (3.7-10), we have

P_{k+1} = (I + F Δt) P_k (I + F Δt)^T + G Q G^T Δt    (3.7-15)
        = P_k + (F P_k + P_k F^T + G Q G^T) Δt + F P_k F^T Δt²

This equation can be rearranged in the form

(P_{k+1} - P_k)/Δt = F P_k + P_k F^T + G Q G^T + F P_k F^T Δt    (3.7-16)

As Δt → 0, the equation becomes

Ṗ(t) = F(t) P(t) + P(t) F^T(t) + G(t) Q(t) G^T(t)    (3.7-17)

*That is, one whose F-matrix only has eigenvalues with large, negative real parts.
**That is, one whose F-matrix has some purely imaginary eigenvalue pairs.



which is the continuous form of the covariance propagation equation. This is the so-called linear variance equation.
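To make Eqs. (3.7-10) and (3.7-17) concrete, the following sketch (a minimal illustration in Python, with a hypothetical two-state system and assumed values of F, G, Q and Δt) propagates the error covariance both by the discrete recursion and by Euler integration of the linear variance equation; for small Δt the two agree.

```python
import numpy as np

# Hypothetical two-state system: x1' = x2, x2' = -0.5*x2 + w
F = np.array([[0.0, 1.0],
              [0.0, -0.5]])
G = np.array([[0.0],
              [1.0]])
Q = np.array([[2.0]])        # spectral density of w (units: 1/time)
P0 = np.eye(2)               # initial error covariance
dt = 0.001

# Small-dt approximations: Phi_k ~ I + F dt (Eq. 3.7-14), Qk ~ G Q G^T dt (Eq. 3.7-12)
Phi = np.eye(2) + F * dt
Qk = G @ Q @ G.T * dt

P_disc = P0.copy()
P_cont = P0.copy()
for _ in range(int(1.0 / dt)):          # propagate over one time unit
    # Discrete covariance propagation, Eq. (3.7-10)
    P_disc = Phi @ P_disc @ Phi.T + Qk
    # Euler step of the linear variance equation, Eq. (3.7-17)
    P_cont = P_cont + (F @ P_cont + P_cont @ F.T + G @ Q @ G.T) * dt

print(np.max(np.abs(P_disc - P_cont)))  # small: the two propagations agree
```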

3.8 MODELING AND STATE VECTOR AUGMENTATION

The error propagation equations obtained in Section 3.7 are developed under the assumption that the system random disturbances (w(t) or w_k) are not correlated in time. Suppose, however, that important time correlation does exist. The characterization of system disturbances which have significant time correlation may be accomplished by means of "state vector augmentation." That is, the dimension of the state vector is increased by including the correlated disturbances as well as a description of system dynamic behavior in appropriate rows of an enlarged F (or Φ) matrix. In this manner, correlated random noises are taken to be state variables of a fictitious linear dynamic system which is itself excited by white noise. This model serves two purposes; it provides proper autocorrelation characteristics through specification of the linear system and the strength of the driving noise and, in addition, the random nature of the signal follows from the random excitation. Most correlated system disturbances can be described to a good approximation by a combination of one or more of the several types of models described in this section. The problem of correlated disturbances is treated for both continuous and discrete formulations.

State Vector Augmentation - The augmentation of the state vector to account for correlated disturbances is done using Eq. (3.1-1):

ẋ = F x + G w    (3.8-1)

Suppose w is composed of correlated quantities w_1 and uncorrelated quantities w_2,

w = w_1 + w_2    (3.8-2)

If w_1 can be modeled by a differential equation

ẇ_1 = D w_1 + w_3    (3.8-3)

where w_3 is a vector composed of uncorrelated noises, then the augmented state vector x' is given by

x' = ⎡ x   ⎤    (3.8-4)
     ⎣ w_1 ⎦

and the augmented state differential equation, driven only by uncorrelated disturbances, is

ẋ' = ⎡ F  G ⎤ x' + ⎡ G w_2 ⎤    (3.8-5)
     ⎣ 0  D ⎦      ⎣ w_3   ⎦

We now consider a number of specific correlation models for system disturbances; in each instance scalar descriptions are presented.

Random Constant - The random constant is a non-dynamic quantity with a fixed, albeit random, amplitude. The continuous random constant is described by the state vector differential equation

ẋ = 0    (3.8-6)

The corresponding discrete process is described by

x_{k+1} = x_k    (3.8-7)

The random constant can be thought of as the output of an integrator which has no input but has a random initial condition [see Fig. 3.8-1(a)].

Random Walk - The random walk process results when uncorrelated signals are integrated. It derives its name from the example of a man who takes fixed-length steps in arbitrary directions. In the limit, when the number of steps is large and the individual steps are short in length, the distance travelled in a particular direction resembles the random walk process. The state variable differential equation for the random walk process is

ẋ = w    (3.8-8)

where E[w(t) w(τ)] = q(t) δ(t - τ). A block diagram representation of this equation is shown in Fig. 3.8-1(b). The equivalent discrete process is

x_{k+1} = x_k + w_k    (3.8-9)

where the noise covariance is [Eq. (3.6-13)] q_k = q(t_{k+1} - t_k). The scalar version of the continuous linear variance equation,

ṗ = 2fp + g²q    (3.8-10)

is used to determine the time behavior of the random walk variable. For f = 0, g = 1, we obtain

ṗ = q



so that

E[x²] = p = qt    (3.8-11)

A combination of the random walk and random constant can be represented by the use of only one state variable. This is illustrated in Fig. 3.8-1(c).

[Figure 3.8-1 Block Diagrams for Random Constant and Random Walk Processes]
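The linear variance growth of Eq. (3.8-11) can be checked by direct simulation; the sketch below (illustrative only, with an assumed spectral density q and step size) compares the sample variance of an ensemble of discrete random walks, Eq. (3.8-9), against qt.

```python
import numpy as np

rng = np.random.default_rng(0)
q, dt, n_steps, n_runs = 2.0, 0.01, 1000, 5000

# Discrete random walk, Eq. (3.8-9): x_{k+1} = x_k + w_k,
# with E[w_k^2] = q * dt, the scalar specialization of Eq. (3.6-13)
w = rng.normal(0.0, np.sqrt(q * dt), size=(n_runs, n_steps))
x = np.cumsum(w, axis=1)

t = dt * np.arange(1, n_steps + 1)
sample_var = x.var(axis=0)
print(sample_var[-1], q * t[-1])   # sample variance ~ q*t, Eq. (3.8-11)
```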

Random Ramp - Frequently, random errors which exhibit a definite time-growing behavior are present. The random ramp, a function which grows linearly with time, can often be used to describe them. The growth rate of the random ramp is a random quantity with a given probability density. Two state elements are necessary to describe the random ramp:

ẋ_1 = x_2,  ẋ_2 = 0    (3.8-12)

The state x_1 is the random ramp process; x_2 is an auxiliary variable whose initial condition provides the slope of the ramp. This initial condition is exhibited in the form of a mean square slope, E[x_2²(0)]. From the solution of Eqs. (3.8-12) the mean square value of x_1 is seen to grow parabolically with time, viz:

E[x_1²(t)] = E[x_2²(0)] t²    (3.8-13)

A combination of a random ramp, random walk and random constant can be represented by the use of only two state variables as illustrated in Fig. 3.8-2. The equivalent discrete version of the random ramp is defined by the two variables

x_{1,k+1} = x_{1,k} + x_{2,k}(t_{k+1} - t_k),  x_{2,k+1} = x_{2,k}    (3.8-14)

[Figure 3.8-2 Generation of Three Random Characteristics by the Addition of Only Two State Variables]

Exponentially Correlated Random Variable - A random quantity whose autocorrelation function is a decreasing exponential,

φ(τ) = σ² e^{-β|τ|}    (3.8-15)

is frequently a useful representation of random system disturbances. Recall from Chapter 2 that this autocorrelation function is representative of a first-order gauss-markov process. This quantity is often used to provide an approximation for a band-limited signal, whose spectral density is flat over a finite bandwidth. The exponentially correlated random variable is generated by passing an uncorrelated signal through a linear first-order feedback system as shown in Fig. 3.8-3. The differential equation of the state variable is

ẋ = -βx + w    (3.8-16)



The mean square value of the exponentially correlated random variable is constant if the mean square initial condition on the integrator is taken as q/2β. The mean square value of x is found from the scalar covariance equation, Eq. (3.8-10). Since q is constant, ṗ = 0 in steady state, so that (f = -β, g = 1)

0 = -2βp + q    (3.8-17)

yielding

p = q/2β    (3.8-18)

The spectral density of the white noise w is 2βσ² δ(τ), so that p = σ².

The discrete version of the exponentially correlated random variable is described by

x_{k+1} = e^{-β(t_{k+1} - t_k)} x_k + w_k    (3.8-19)

Using Eq. (3.7-11), we readily find the covariance of w_k as

q_k = σ² [1 - e^{-2β(t_{k+1} - t_k)}]    (3.8-20)

Periodic Random Quantities - Random variables which exhibit periodic behavior also arise in physical systems. A useful autocorrelation function model is given by

φ(τ) = σ² e^{-β|τ|} cos ωτ    (3.8-21)

where the values of β and ω are chosen on the basis of the physics of the situation or to fit empirical autocorrelation data. Two state variables are necessary to represent a random variable with this autocorrelation function. One pair of quantities which provides this relation obeys the pair of first-order differential equations given as Eqs. (3.8-22) and (3.8-23) in Fig. 3.8-3 (Ref. 5).

[Figure 3.8-3 Continuous Time Models for Random Variables]
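Equations (3.8-19) and (3.8-20) give a convenient recipe for generating an exponentially correlated sequence; the following sketch (an illustration with assumed values of β, σ and Δt) does so and checks that the sample variance approaches σ².

```python
import numpy as np

rng = np.random.default_rng(1)
beta, sigma, dt, n = 0.5, 2.0, 0.1, 200_000

phi = np.exp(-beta * dt)                           # transition factor, Eq. (3.8-19)
qk = sigma**2 * (1.0 - np.exp(-2.0 * beta * dt))   # E[w_k^2], Eq. (3.8-20)

x = np.empty(n)
x[0] = rng.normal(0.0, sigma)                      # start in steady state
for k in range(n - 1):
    x[k + 1] = phi * x[k] + rng.normal(0.0, np.sqrt(qk))

print(x.var(), sigma**2)                           # sample variance ~ sigma^2
```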


[Figure 3.8-4 Discrete Version of Error Models]

A summary of the continuous models described above is given in Fig. 3.8-3. The discrete models which have been presented are summarized in Fig. 3.8-4. Occasionally, other more complex random error models arise. For example, Ref. 6 discusses a time and distance correlated error whose autocorrelation function is given by

φ(τ, d) = σ² e^{-|τ|/T} e^{-|d|/D}    (3.8-24)

where T and D are the first-order correlation time and first-order correlation distance, respectively. This correlation also occasionally appears when considering geophysical phenomena (e.g., Ref. 7). However, the majority of system measurement errors and disturbances can be described by some combination of the random variable relations summarized in Fig. 3.8-3.

3.9 EMPIRICAL MODEL IDENTIFICATION

Subsequent discussions assume that any random process under study has been modeled as a linear system driven by gaussian white noise. In this section, methods for determining the best linear-gaussian model to fit an observed sequence of data are discussed. As illustrated in Fig. 3.9-1, a model for the process is produced by combining empirical data and prior knowledge of underlying physical mechanisms. The derived model and the data are then used in an estimation process as described in succeeding chapters.

It is assumed that the available data are a sequence of scalar observations. A time-invariant, discrete linear system of the form

x_{k+1} = Φ x_k + Γ r_k    (3.9-1)
z_k = h^T x_k + r_k    (3.9-2)

is used to fit the data. It can be shown that this type of model is general enough to fit any sequence of observations generated by a stable, time-invariant linear system driven by gaussian noise. Here r_k, referred to as the residual at time t_k, is the difference between the observation at time t_k and the output the model would produce based on inputs up to but not including time t_k. The residuals reflect the degree to which the model fits the data.

Model identification typically proceeds in the following way. Once data are acquired, a preliminary model for the random process is chosen. An optimal smoothing algorithm (see Chapter 5) is used to determine the initial state and the inputs most likely to have produced the observed data. Next, residuals are examined to determine how well the model fits the data. A new model is selected if the residuals do not display the desired characteristics, and the procedure is repeated.



[Figure 3.9-1 Empirical Model Identification]

AUTOCORRELATION TECHNIQUES

Since certain of the models considered here can produce stationary gaussian random processes, it is natural to give primary consideration to the sample mean and sample autocorrelation function for the data when attempting model identification. Given a string of data z_i, i = 1, 2, ..., N, taken at constant intervals, Δt, the sample mean and sample autocorrelation function are determined as

m = (1/N) Σ_{i=1}^{N} z_i    (3.9-3)

φ_zz(ℓΔt) = [1/(N - ℓ)] Σ_{i=1}^{N-ℓ} (z_i - m)(z_{i+ℓ} - m),  ℓ = 0, 1, ..., N-2    (3.9-4a)

If the process being measured is known to be zero mean, then

φ_zz(ℓΔt) = [1/(N - ℓ)] Σ_{i=1}^{N-ℓ} z_i z_{i+ℓ},  ℓ = 0, 1, ..., N-1    (3.9-4b)
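Equations (3.9-3) and (3.9-4) translate directly into a few lines of code; the sketch below (a minimal illustration; the function name is ours) computes the sample mean and the mean-removed sample autocorrelation of Eq. (3.9-4a).

```python
import numpy as np

def sample_autocorrelation(z, max_shift):
    """Sample mean and autocorrelation, Eqs. (3.9-3) and (3.9-4a)."""
    z = np.asarray(z, dtype=float)
    N = z.size
    m = z.sum() / N                        # sample mean, Eq. (3.9-3)
    d = z - m
    # phi[l] = (1/(N-l)) * sum_{i=1}^{N-l} (z_i - m)(z_{i+l} - m)
    phi = np.array([d[: N - l] @ d[l:] / (N - l) for l in range(max_shift + 1)])
    return m, phi
```

For a record taken at 12.5 nm spacing, as in the example that follows, evaluating 12 shifts corresponds to distance shifts out to 150 nm.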

A preliminary model for the process can be obtained by choosing one whose mean and autocorrelation function closely approximate the sample mean and sample autocorrelation function. Although seemingly straightforward, the procedure is often complicated by the fact that data might not be taken over a sufficient period of time or at sufficiently frequent intervals. It is, of course, necessary to apply physical insight to the problem in order to correctly interpret calculated statistical quantities. The following example provides an illustration.

Example 3.9-1

Deflections of the vertical are angular deviations of the true gravity vector from the direction postulated by the regular model called the reference ellipsoid (Ref. 8). Consider the data sample shown in Fig. 3.9-2; it consists of measurements of one component of the deflection of the vertical, ξ, taken at 12.5 nm intervals across the 35th parallel in the United States (Ref. 9).

[Figure 3.9-2 Meridian Component, Vertical Deflections of Gravity - 35th Parallel, United States (Ref. 9)]

We are concerned with spatial rather than temporal correlation so that the shift parameter in the autocorrelation function is distance. The sample autocorrelation function calculated according to Eq. (3.9-4) is shown in Fig. 3.9-3. The formula is evaluated only out to 12 shifts, i.e., 150 nm; beyond that point two effects cloud the validity of the results. First, as the number of shifts ℓ increases, the number of terms N - ℓ in the summation decreases, and confidence in the derived values is reduced. Secondly, calculated autocorrelation values at shifts greater than 150 nm are sufficiently small that any measurement noise present tends to dominate.

Vertical deflection autocorrelation values are seen to fall off exponentially. A first-order model least-squares curve-fit results in the empirical autocorrelation function shown in Fig. 3.9-3. It has the form

φ_ξξ(d) = σ_ξ² e^{-|d|/D}    (3.9-5)

The mean m_ξ and the standard deviation, determined as

σ_ξ = 5.2 arc sec

follow from the fit, and the correlation distance is

D = 25.1 nm



and d is the distance shift parameter. As described in Ref. 10, the fitted function from which Eq. (3.9-5) is obtained is derived using techniques which compensate for finite data. Thus, it is not unnatural that the function is negative at large distance shifts. In this example the correlation distance D is approximately twice the data interval. The structure of the actual autocorrelation function cannot be accurately determined for small distance shifts - e.g., less than 12.5 nm.

[Figure 3.9-3 Vertical Deflection Sample Autocorrelation Function - 35th Parallel U.S. (actual data; fitted first-order model; fitted second-order model)]

An exponential (first-order gauss-markov) autocorrelation function describes a process whose correlation falls off exponentially at all distance shifts, including d = 0. Equivalently, the derivative of the autocorrelation function is nonzero at zero distance shift,

(∂/∂d) φ_ξξ(d) |_{d=0} = -(σ_ξ²/D) e^{-|d|/D} |_{d=0} ≠ 0    (3.9-6)

This result is not physically appealing since vertical deflections are caused by nonuniform mass distributions in the earth, which are not likely to change instantaneously in distance as would be implied by a nonzero derivative. Thus, consideration is given to the second-order gauss-markov process, whose autocorrelation function does have a zero derivative at zero shift. The form of the function is

φ_ξξ(d) = σ_ξ² (1 + |d|/D) e^{-|d|/D}    (3.9-7)

and by least-squares curve fitting the characteristic distance is determined to be

D = 11.6 nm

The empirically determined quantities σ_ξ and m_ξ are unchanged. Details of applying the second-order gauss-markov model to gravity phenomena are given in Ref. 11. Comparisons of the first- and second-order processes with the empirical autocorrelation values are given in Fig. 3.9-3.

Finite data length implies an inability to determine the form of an autocorrelation function at large shifts, whereas finite data spacing implies an inability to determine the form at small shifts. Together, the two correspond to a limited quantity of data, thus resulting in classical statistical limitations in estimating autocorrelation function parameters. These factors are treated analytically in Refs. 12 and 13; brief summaries are given here.

Consider a zero mean random process whose statistics are described by the autocorrelation function

φ(τ) = σ² e^{-|τ|/T₀}    (3.9-8)

A sequence of measurements taken at intervals Δt over a period T is used to obtain an empirical autocorrelation function

φ̂_zz(ℓΔt) = [1/(N - ℓ)] Σ_{i=1}^{N-ℓ} z_i z_{i+ℓ},  ℓ = 0, 1, 2, ..., N-1    (3.9-9)

where N = T/Δt. In Fig. 3.9-4, expected values of the normalized empirical autocorrelation function are plotted for various values of T/T₀. The case of T/T₀ → ∞ corresponds to the actual function, Eq. (3.9-8). Clearly, for small values of T/T₀, empirically derived autocorrelation functions may be quite different from the function corresponding to the underlying physical model.

Data limitations, expressed as a finite value for N, lead to an uncertainty in the estimation of autocorrelation function parameters. If the actual process variance is σ_z² and the empirically derived estimate is σ̂_z², the uncertainty in the estimate is expressed by Eq. (3.9-10). If the actual process correlation time is T₀ and the empirically derived estimate is T̂₀, the uncertainty in the estimate is expressed by Eq. (3.9-11). This relationship is illustrated in Fig. 3.9-5; the ideal situation corresponds to T large and Δt small, relative to T₀.



[Figure 3.9-4 Expectation of Measured Autocorrelation Function for Varying Sample Lengths]

[Figure 3.9-5 Variance of Estimated Correlation Time of Exponential Autocorrelation Function]

TIME SERIES ANALYSIS

In Ref. 14, Box and Jenkins present techniques for fitting a model to empirical scalar time series data. Instead of using a state variable formulation for the linear system, the equivalent difference equation formulation is used,

z_k = Σ_{i=1}^{p} b_i z_{k-i} + r_k - Σ_{i=1}^{q} c_i r_{k-i}    (3.9-12)

Here z_k is the observation of the time series at time t_k and r_k, called the residual at time t_k, is an uncorrelated gaussian random variable. The summation limits p and q as well as the parameters b_i and c_i are adjusted to fit the data. It is assumed that the data have been detrended; that is, a linear system has been found which adequately models the mean value of the data. For example, the random bias and random ramp models discussed previously can be used. The detrended data sequence is assumed to have stationary statistics.

A state variable representation of this model has the states

x_k = [r_{k-q+1} ⋯ r_{k-1}  z_{k-p+1} ⋯ z_{k-1}  ẑ_k(-)]^T    (3.9-13)

Here

ẑ_k(-) = E[z_k | z_{k-1}, z_{k-2}, ...]    (3.9-14)



represents the prediction of z_k based on the model and on knowledge of the infinite number of z's prior to time t_k. It is easy to see that

ẑ_{k+1}(-) = b_1 z_k - c_1 r_k + Σ_{i=2}^{p} b_i z_{k+1-i} - Σ_{i=2}^{q} c_i r_{k+1-i}
           = b_1 ẑ_k(-) + (b_1 - c_1) r_k + Σ_{i=2}^{p} b_i z_{k+1-i} - Σ_{i=2}^{q} c_i r_{k+1-i}    (3.9-15)

Therefore the state variable model is

x_{k+1} = A x_k + g r_k    (3.9-16)

where A is in companion form: its upper rows simply shift the stored residuals and observations, and its last row, [-c_q  -c_{q-1} ⋯ -c_2  b_p  b_{p-1} ⋯ b_1], implements Eq. (3.9-15); the input vector g feeds r_k into the residual and observation chains with unity gain and has last element b_1 - c_1.

To gain insight into how the characteristics of the time series model are affected by the coefficients of the difference equation, the general form of Eq. (3.9-12) is specialized to several particular cases. The characteristics of these forms are reflected in their autocorrelation functions. Therefore, by studying these specialized forms, it is easier to identify a model that matches the observed time series and also has a minimum number of parameters to be determined.

Autoregressive (AR) Process - If it is assumed that the present observation is a linear combination of past observations plus a gaussian random variable, Eq. (3.9-12) becomes

z_k = Σ_{i=1}^{p} b_i z_{k-i} + r_k    (3.9-17)

This type of time series is called an autoregressive (AR) process. Note that the residual r_k is the only portion of the measurement z_k which cannot be predicted from previous measurements. It is assumed that the coefficients of this difference equation have been chosen so that the linear system is stable, thus making the autoregressive process stationary. The characteristic nature of this process is reflected in its autocorrelation function. Multiplying both sides of Eq. (3.9-17) by delayed z_k and taking the ensemble expectation of the result, we find

φ_zz(k) = Σ_{i=1}^{p} b_i φ_zz(k-i)    (3.9-18)

Thus, the autocorrelation function of an AR process obeys the homogeneous difference equation for the process. For a stationary AR process, the solution to the homogeneous difference equation is given by linear combinations of damped sinusoids and exponentials.

It is desirable to express the difference equation coefficients in terms of the autocorrelation function values; estimates of the coefficients can then be obtained by using the sample autocorrelation function. Evaluation of Eq. (3.9-18) for p successive shifts results in a set of p linear equations which can be written as

⎡ φ_zz(0)    φ_zz(1)    ⋯  φ_zz(p-1) ⎤ ⎡ b_1 ⎤   ⎡ φ_zz(1) ⎤
⎢ φ_zz(1)    φ_zz(0)    ⋯  φ_zz(p-2) ⎥ ⎢ b_2 ⎥   ⎢ φ_zz(2) ⎥
⎢    ⋮          ⋮             ⋮      ⎥ ⎢  ⋮  ⎥ = ⎢    ⋮    ⎥    (3.9-19)
⎣ φ_zz(p-1)  φ_zz(p-2)  ⋯  φ_zz(0)  ⎦ ⎣ b_p ⎦   ⎣ φ_zz(p) ⎦

This set of linear equations, called the Yule-Walker equations, can be solved for the coefficients b_i, with the correlation matrix and correlation vector formed using estimates of the autocorrelation function. The state variable representation of an autoregressive process has only the p states,

x_k = [z_{k-p+1}  z_{k-p+2}  ⋯  z_{k-2}  z_{k-1}  ẑ_k(-)]^T    (3.9-20)
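Once sample autocorrelation values are in hand, the Yule-Walker equations (3.9-19) reduce to a small linear solve; the sketch below (illustrative, with an assumed stable second-order AR process used as test data) recovers the coefficients b_i.

```python
import numpy as np

def fit_ar(phi, p):
    """Solve the Yule-Walker equations, Eq. (3.9-19), for AR coefficients.

    phi : autocorrelation values phi_zz(0), phi_zz(1), ..., phi_zz(p)
    """
    # Correlation matrix is Toeplitz: element (i, j) is phi_zz(|i - j|)
    R = np.array([[phi[abs(i - j)] for j in range(p)] for i in range(p)])
    rhs = np.array(phi[1 : p + 1])
    return np.linalg.solve(R, rhs)        # b_1, ..., b_p

# Test data: stable AR(2) process z_k = 1.5 z_{k-1} - 0.7 z_{k-2} + r_k
rng = np.random.default_rng(2)
z = np.zeros(100_000)
for k in range(2, z.size):
    z[k] = 1.5 * z[k - 1] - 0.7 * z[k - 2] + rng.normal()

phi = [z[: z.size - l] @ z[l:] / (z.size - l) for l in range(3)]
print(fit_ar(phi, 2))                     # approximately [1.5, -0.7]
```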



Moving Average (MA) Process - If the time series is assumed to be generated by a finite linear combination of past and present inputs only, the process is called a moving average (MA) process. Under this assumption, the time series is generated by the difference equation

z_k = r_k - Σ_{i=1}^{q} c_i r_{k-i}    (3.9-21)

This model always produces a stationary process. It is assumed that the coefficients are chosen so that the model is also invertible - i.e., the input sequence can be completely determined from knowledge of the observed output sequence. Under the assumption that the covariance of the input sequence is

E[r_k r_ℓ] = σ_r²  for ℓ = k
           = 0     for ℓ ≠ k    (3.9-22)

the corresponding autocorrelation function of the observations is

φ_zz(k) = (-c_k + Σ_{i=1}^{q-k} c_i c_{i+k}) σ_r²   for k ≤ q
        = 0                                         for k > q    (3.9-23)

Thus, the autocorrelation function for an MA process has a finite number of non-zero values, and cuts off at the order of the process.

Mixed Autoregressive Moving Average (ARMA) Processes - A more general stationary time series can be generated by combining the AR and MA processes to get a mixed autoregressive moving average (ARMA) process. The difference equation model for the process takes the general form of Eq. (3.9-12), but the allowable range of coefficients is restricted so that the process is stationary and the model is invertible. The process autocorrelation function is identical to the pure AR process after (q - p) shifts. Thus, the autocorrelation function is given by

φ_zz(k) = Σ_{i=1}^{p} b_i φ_zz(k-i)   for k > q    (3.9-24)

with p initial conditions φ_zz(q), φ_zz(q-1), ..., φ_zz(q-p+1). If q < p, the autocorrelation function will consist of damped exponentials and sinusoids determined by the difference equation coefficients and initial conditions. If p ≤ q, there will be (q-p+1) initial values which do not follow the general pattern. An example is given by the first-order mixed ARMA process,

φ_zz(0) = [(1 + c_1² - 2 c_1 b_1)/(1 - b_1²)] σ_r²
φ_zz(1) = [(1 - b_1 c_1)(b_1 - c_1)/(1 - b_1²)] σ_r²
φ_zz(k) = b_1 φ_zz(k-1)   for k ≥ 2    (3.9-25)

Note that φ_zz(0) and φ_zz(1) depend upon both the autoregressive and moving average parameters. The autocorrelation function is exponential except for φ_zz(0), which does not follow the exponential pattern.

Autoregressive Integrated Moving Average (ARIMA) Processes - If the measurements do not exhibit stationary statistics, the AR, MA and ARMA models cannot be used directly. In certain situations this difficulty can be overcome by differencing the data. For example, suppose the first differences, z_{k+1} - z_k, are found to have stationary statistics. Then an ARMA process can be used to model these differences, and the measurements can be modeled as the sum of the differences plus an initial condition. Such a process is called an autoregressive integrated moving average (ARIMA) process; the term "integrated" refers to the summation of the differences. Note that the random walk process discussed in Section 3.8 is the simplest form of an ARIMA process.

Fast Fourier Transforms - In 1965 Cooley and Tukey (Ref. 15) described a computationally efficient algorithm for obtaining Fourier coefficients. The fast Fourier transform (FFT) is a method for computing the discrete Fourier transform of a time series of discrete data samples. Such time series result when digital analysis techniques are used for analyzing a continuous waveform. The time series will represent completely the continuous waveform, provided the waveform is frequency band-limited and the samples are taken at a rate at least twice the highest frequency present. The discrete Fourier transform of the time series is closely related to the Fourier integral transform of the continuous waveform.

The FFT has applicability in the generation of statistical error models from series of test data. The algorithm can be modified to compute the autocorrelation function of a one-dimensional real sequence or the cross-correlation



function and convolution of two one-dimensional real sequences. It can also be used to estimate the power spectral density of a one-dimensional real continuous waveform from a sequence of evenly-spaced samples. The considerable efficiency of the FFT, relative to conventional analysis techniques, and the availability of outputs from which statistical error models are readily obtained, suggest that the FFT will be of considerable utility in practical applications of linear system techniques. Computational aspects of applying the FFT as noted above are discussed in Ref. 16.
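As one illustration of such a modification, the autocorrelation function of a finite record can be obtained from the FFT of the zero-padded data; the sketch below (a minimal example, not the specific algorithms of Ref. 16) reproduces the direct summation of Eq. (3.9-9).

```python
import numpy as np

def autocorr_fft(z):
    """Autocorrelation via FFT, equivalent to the direct sum of Eq. (3.9-9)."""
    z = np.asarray(z, dtype=float)
    N = z.size
    Z = np.fft.rfft(z, 2 * N)                   # zero-pad to avoid circular wrap-around
    lagged = np.fft.irfft(Z * np.conj(Z))[:N]   # sum_i z_i z_{i+l} for each shift l
    return lagged / (N - np.arange(N))          # divide by N - l

z = np.random.default_rng(3).normal(size=4096)
phi = autocorr_fft(z)
direct = np.array([z[: z.size - l] @ z[l:] / (z.size - l) for l in range(4)])
print(phi[:4], direct)                          # the two methods agree
```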

REFERENCES

1. DeRusso, P.M., Roy, R.J., and Close, C.M., State Variables for Engineers, John Wiley & Sons, Inc., New York, 1966.

2. Leondes, C.T., ed., Guidance and Control of Aerospace Vehicles, McGraw-Hill Book Co., Inc., New York, 1963.

3. Lee, R.C.K., Optimal Estimation, Identification, and Control, M.I.T. Press, Cambridge, Mass., 1964.

4. Schweppe, F.C., Uncertain Dynamic Systems, Prentice-Hall, Englewood Cliffs, N.J., 1973.

5. Fitzgerald, R.J., "Filtering Horizon-Sensor Measurements for Orbital Navigation," AIAA Guidance and Control Conference Proceedings, August 1966, pp. 500-509.

6. Wilcox, J.C., "Self-Contained Orbital Navigation Systems with Correlated Measurement Errors," AIAA/ION Guidance and Control Conference Proceedings, August 1965, pp. 231-247.

7. D'Appolito, J.A. and Kasper, J.F., Jr., "Predicted Performance of an Integrated OMEGA/Inertial Navigation System," Proc. National Aerospace Electronics Conference, May 1971.

8. Heiskanen, W.A. and Moritz, H., Physical Geodesy, W.H. Freeman, San Francisco, 1967.

9. Rice, D.A., "A Geoidal Section in the United States," Bull. Geod., Vol. 65, 1962.

10. Levine, S.A. and Gelb, A., "Effect of Deflections of the Vertical on the Performance of a Terrestrial Inertial Navigation System," J. Spacecraft and Rockets, Vol. 6, No. 9, September 1969.

11. Kasper, J.F., Jr., "A Second-Order Markov Gravity Anomaly Model," J. Geophys. Res., Vol. 76, No. 32, November 1971.

12. Bendat, J.S. and Piersol, A.G., Measurement and Analysis of Random Data, John Wiley & Sons, Inc., New York, 1966.

13. Weinstock, H., "The Description of Stationary Random Rate Processes," M.I.T. Instrumentation Laboratory, E-1377, July 1963.

14. Box, G.E.P., and Jenkins, G.M., Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, 1970.

15. Cooley, J.W. and Tukey, J.W., "An Algorithm for the Machine Calculation of Complex Fourier Series," Math. of Comput., Vol. 19, pp. 297-301, April 1965.

16. Cochran, W.T., et al., "What is the Fast Fourier Transform?", Proc. of the IEEE, Vol. 55, No. 10, pp. 1664-1674, October 1967.


PROBLEMS

Problem 3-1

Show that

P(t) = Φ(t, t₀) P(t₀) Φ^T(t, t₀) + ∫_{t₀}^{t} Φ(t, τ) G(τ) Q(τ) G^T(τ) Φ^T(t, τ) dτ

is the solution of the linear variance equation.

Problem 3-2

Use the solution given in Problem 3-1 to show that the solution to

F P + P F^T + Q = 0

is

P = ∫₀^∞ e^{Ft} Q e^{F^T t} dt

where F and Q are constant matrices.

Problem 3-3

Extensive analysis of certain geophysical data yields the following temporal autocorrelation function

φ(τ) = σ² (α₁² + α₂² cos ωτ + α₃² e^{-β|τ|})

where ω is earth angular rate (2π/24 hr) and τ is time shift. Derive a set of state vector equations to describe this random process.

Problem 3-4

For the system shown in Fig. 3-1, where the autocorrelation function is

φ_{x₂x₂}(τ) = σ² e^{-β|τ|}

show that

P(t) = (σ²/β²) ⎡ 2(βt - 1 + e^{-βt})   β(1 - e^{-βt}) ⎤
               ⎣ β(1 - e^{-βt})        β²             ⎦

for the state vector taken as x = [x₁  x₂]^T.

[Figure 3-1]



Problem 3-5

Consider the single-axis error model for an inertial navigation system shown in Fig. 3-2. In this simplified model, gyro drift rate errors are ignored. It is desired to estimate the vertical deflection process, ξ. Consider the system state vector consisting of the four states:

ξ : unbiased random vertical deflection
v_pb : bias velocity measurement error
ε_pb : bias position measurement error
ε_p : uncorrelated position measurement error

[Figure 3-2]

(a) For position measurements only (z_v not available), set up the observability matrix and determine whether this system is observable. (b) Now assume we have both position and velocity measurements. Is this system observable? (c) Now assume v_pb = 0 and can be eliminated from the state error model. Is this system observable with position measurements only?

Problem 3-6

For the linear system whose dynamics are described by

ẋ = F x

show that, for small ωT, the state transition matrix has the upper-triangular form

Φ(T) = ⎡ 1  T  ⋯ ⎤
       ⎢ 0  1  ⋯ ⎥
       ⎣ 0  0  1 ⎦

Problem 3-7

For the constant discrete nth-order deterministic system driven by a scalar control,

x_{k+1} = Φ x_k + Λ u_k

show that the controllability criterion is that the rank of

Q = [Λ ⋮ ΦΛ ⋮ ⋯ ⋮ Φ^{n-1}Λ]

be n, where x₀, Λ and the u_i are given. (Hint: Describe the vector x_n - Φ^n x₀ in terms of u₀, ..., u_{n-1} and powers of Φ.)

Problem 3-8

Show that for the stationary linear system described in Fig. 3-3, the state transition matrix is

Φ(τ) = ⎡ e^{-τ/T₁}   [T₂/(T₁ - T₂)] (e^{-τ/T₁} - e^{-τ/T₂}) ⎤ ,  T₁ ≠ T₂
       ⎣ 0           e^{-τ/T₂}                              ⎦

[Figure 3-3]

Problem 3-9

For the system illustrated in Fig. 3-4, show that

E[x₁²(t)] = q₁ t

E[x₂²(t)] = q₁ t³/3 + q₂ t

E[x₃²(t)] = q₁ t⁵/20 + q₂ t³/3 + q₃ t

and, in general,

E[x_n²(t)] = Σ_{i=1}^{n} q_{n+1-i} t^{2i-1} / [(2i-1) (i-1)! (i-1)!]

where the white noise inputs w_i have spectral densities q_i δ(τ).

[Figure 3-4]

Problem 3-10

For the second-order system shown in Fig. 3-5, use the linear variance equation to obtain, in the steady state,

E[x₁²(t)] = q/(4ζω³)

E[x₂²(t)] = q/(4ζω)

where the white noise input w has spectral density q δ(τ).

[Figure 3-5]

Problem 3-11

For the first-order system shown in Fig. 3-6, choose the gain K to minimize the mean square error between the command input r(t) and the system output c(t). Define a state vector x^T = [c  r] and obtain the steady-state solution of the linear variance equation, P_ss. Define the quantity e(t) = c(t) - r(t) and compute its steady-state mean square value as

E[e²] = p_ss,11 - 2 p_ss,12 + p_ss,22

For σ² = β = 1.0 and N = 0.5, show that K = 1.0.

[Figure 3-6: first-order feedback system with command input r(t), output c(t), and measurement noise n(t), where φ_nn(τ) = N δ(τ)]


4. OPTIMAL LINEAR FILTERING

The preceding chapters discuss a number of properties of random processes and develop state-vector models of randomly excited linear systems. Now, we are ready to take up the principal topic of the book - namely, the estimation of a state vector from measurement data corrupted by noise. Optimal estimates that minimize the estimation error, in a well-defined statistical sense, are of particular interest. This chapter is devoted to the subject of optimal filtering for linear systems. Filtering refers to estimating the state vector at the current time, based upon all past measurements. Prediction refers to estimating the state at a future time. We shall see that prediction and filtering are closely related. Chapter 5 discusses smoothing, which means estimating the value of the state at some prior time, based on all measurements taken up to the current time.

Our approach to development of the optimal linear filter is to argue the form it should take, to specify a suitable criterion of optimality and to proceed directly to optimization of the assumed form. Before embarking upon this course, however, we briefly provide some background on the subjects of desirable characteristics of estimators in general, alternative approaches to optimal estimation and motivation for optimal linear estimators.

An estimate, x̂, is the computed value of a quantity, x, based upon a set of measurements, z. An unbiased estimate is one whose expected value is the same as that of the quantity being estimated. A minimum variance (unbiased) estimate has the property that its error variance is less than or equal to that of any other unbiased estimate. A consistent estimate is one which converges to the true value of x as the number of measurements increases. Thus, we shall look for unbiased, minimum variance, consistent estimators.

Let us assume that the set of ℓ measurements, z, can be expressed as a linear combination of the n elements of a constant vector x plus a random, additive measurement error, v. That is, the measurement process is modeled as

z = H x + v    (4.0-1)

where z is an ℓ × 1 vector, x is an n × 1 vector, H is an ℓ × n matrix and v is an ℓ × 1 vector. For ℓ > n the measurement set contains redundant information. In least-squares estimation, one chooses as x̂ that value which minimizes the sum of squares of the deviations, z_i - ẑ_i - i.e., minimizes the quantity

J = (z - H x̂)^T (z - H x̂)    (4.0-2)

The resulting least-squares estimate (ℓ ≥ n), found by setting ∂J/∂x̂ = 0 (Sec. 2.1), is

x̂ = (H^T H)^{-1} H^T z    (4.0-3)

If, instead, one seeks to minimize the weighted sum of squares of deviations,

J = (z - H x̂)^T R^{-1} (z - H x̂)    (4.0-4)

where R^{-1} is an ℓ × ℓ symmetric, positive definite weighting matrix, the weighted-least-squares estimate

x̂ = (H^T R^{-1} H)^{-1} H^T R^{-1} z    (4.0-5)

is obtained. These results have no direct probabilistic interpretation; they were derived through deterministic argument only. Consequently, the least-squares estimates may be preferred to other estimates when there is no basis for assigning probability density functions to x and z. Alternatively, one may use the maximum likelihood philosophy, which is to take as x̂ that value which maximizes the probability of the measurements z that actually occurred, taking into account known statistical properties of v. There is still no statistical model assumed for the variable x. In the simple example above, the conditional probability density function for z, conditioned on a given value for x, is just the density for v centered around Hx. With v taken as a zero mean, gaussian distributed observation with covariance matrix R, we have

p(z|x) = [(2π)^{ℓ/2} |R|^{1/2}]^{-1} exp[-½ (z - Hx)^T R^{-1} (z - Hx)]    (4.0-6)

To maximize p(z|x) we minimize the exponent in brackets. This is equivalent to minimizing the cost function in Eq. (4.0-4), although now a probabilistic basis for choosing R exists. The result, of course, is as given in Eq. (4.0-5).
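The estimates of Eqs. (4.0-3) and (4.0-5) are each a single linear solve; the following sketch (an illustration with an assumed overdetermined problem and a hypothetical diagonal R) computes both and shows that the weighted form favors the more accurate measurements.

```python
import numpy as np

rng = np.random.default_rng(4)
x_true = np.array([1.0, -2.0])

H = rng.normal(size=(20, 2))               # 20 measurements of a 2-vector (l > n)
R = np.diag(np.where(np.arange(20) < 10, 0.01, 1.0))  # first 10 far more accurate
v = rng.multivariate_normal(np.zeros(20), R)
z = H @ x_true + v

Ri = np.linalg.inv(R)
x_ls = np.linalg.solve(H.T @ H, H.T @ z)              # Eq. (4.0-3)
x_wls = np.linalg.solve(H.T @ Ri @ H, H.T @ Ri @ z)   # Eq. (4.0-5)
print(x_ls, x_wls)   # both near x_true; the weighted estimate is usually closer
```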



Still another approach is Bayesian estimation, where statistical models are available for both x and z, and one seeks the a posteriori conditional density function, p(x|z), since it contains all the statistical information of interest. In general, p(x|z) is evaluated as (Bayes' theorem)

p(x|z) = p(z|x) p(x) / p(z)    (4.0-7)

where p(x) is the a priori probability density function of x, and p(z) is the probability density function of the measurements. Depending upon the criterion of optimality, one can compute x̂ from p(x|z). For example, if the object is to maximize the probability that x̂ = x, the solution is x̂ = mode* of p(x|z). When the a priori density function p(x) is uniform (which implies no knowledge of x between its allowable limits), this estimate is equal to the maximum likelihood estimate. If the object is to find a generalized minimum variance Bayes' estimate, that is, to minimize the cost functional

J = ∫_{-∞}^{∞} (x̂ - x)^T S (x̂ - x) p(x|z) dx    (4.0-8)

where S is an arbitrary, positive semidefinite matrix, we simply set ∂J/∂x̂ = 0 to find, independent of S, that

x̂ = ∫_{-∞}^{∞} x p(x|z) dx = E[x|z]    (4.0-9)

which is the conditional mean estimate. Equation (4.0-8) has the characteristic structure

J = ∫_{-∞}^{∞} L(x̃) p(x|z) dx    (4.0-10)

where L(x̃) is a scalar "loss function" of the estimation error

x̃ = x̂ - x    (4.0-11)

The result given in Eq. (4.0-9) holds for a wide variety of loss functions in addition to that used in Eq. (4.0-8), with some mild restrictions on the form of p(x|z); in all these cases the minimum variance Bayes' estimate (conditional mean) is also the optimal Bayes' estimate (Ref. 1). Assuming gaussian distributions for x and v, the result of evaluating E[x|z] in Eq. (4.0-9) is

x̂ = (P₀^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} z    (4.0-12)

where P₀ is the a priori covariance matrix of x.

In comparing the various estimation methods just discussed, we note that if there is little or no a priori information, P₀^{-1} is very small and Eq. (4.0-12) becomes Eq. (4.0-5). And if we argue that all measurement errors are uncorrelated [i.e., R is a diagonal matrix] and all errors have equal variance [i.e., R = σ²I], Eq. (4.0-5) reduces to Eq. (4.0-3). In his important work, Kalman (Ref. 2) formulated and solved the Wiener problem for gauss-markov sequences through use of state-space representation and the viewpoint of conditional distributions and expectations. His results also reduce to those given above. Therefore, the important conclusion is reached that for gaussian random variables, identical results are obtained by all these methods as long as the assumptions are the same in each case (Ref. 3). This property motivates us to consider primarily Bayesian minimum variance estimators.

Now we observe that x̂ in Eq. (4.0-12) is a linear operation on the measurement data. Furthermore, it is proven elsewhere (Ref. 4) that, for a gaussian time-varying signal, the optimal (minimum mean square error) predictor is a linear predictor. Additionally, as a practical fact, most often all we know about the characterization of a given random process is its autocorrelation function. But there always exists a gaussian random process possessing the same autocorrelation function; we therefore might as well assume that the given random process is itself gaussian. That is, the two processes are indistinguishable from the standpoint of the amount of knowledge postulated. On the basis of these observations we are led to consider the optimal estimator as a linear operator in most applications.

Henceforth, unless stated otherwise, the term optimal estimator refers to one which minimizes the mean square estimation error. Next, we consider the recursive form of the linear estimator, which applies to gaussian random sequences. The discrete time problem is considered first; the continuous time equivalent is then obtained through a simple limiting procedure. Intuitive concepts, situations of special interest and examples comprise the remainder of the chapter.

4.1 RECURSIVE FILTERS

A recursive filter is one in which there is no need to store past measurements for the purpose of computing present estimates. This concept is best demonstrated by the following example.

Example 4.1-1

Consider the problem of estimating a scalar nonrandom** constant, x, based on k noise-corrupted measurements, z_i, where z_i = x + v_i (i = 1, 2, ..., k). Here v_i represents the

*The peak value of p(x|z).
**x is unknown and has no defined statistical properties.



measurement noise, which we assume to be a white sequence. An unbiased, minimum variance estimate x̂_k results from averaging the measurements [this can be shown from Eq. (4.0-3)]; thus, we choose

x̂_k = (1/k) Σ_{i=1}^{k} z_i    (4.1-1)

When an additional measurement becomes available we have, as the new estimate,

x̂_{k+1} = [1/(k+1)] Σ_{i=1}^{k+1} z_i    (4.1-2)

This expression can be manipulated to evidence the prior estimate, viz:

x̂_{k+1} = [k/(k+1)] (1/k) Σ_{i=1}^{k} z_i + [1/(k+1)] z_{k+1} = [k/(k+1)] x̂_k + [1/(k+1)] z_{k+1}    (4.1-3)

Hence, by employing Eq. (4.1-3) rather than Eq. (4.1-2) to compute x̂_{k+1}, the need to store past measurements is eliminated - all previous information is embodied in the prior estimate (plus the measurement index, k) - and we have a recursive, linear estimator. Note that Eq. (4.1-3) can be written in the alternative recursive form

x̂_{k+1} = x̂_k + [1/(k+1)] (z_{k+1} - x̂_k)

in which the new estimate is given by the prior estimate plus an appropriately weighted difference between the new measurement and its expected value, given by the prior estimate. The quantity z_{k+1} - x̂_k is called the measurement residual.
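The recursion is trivially mechanized; the sketch below (illustrative values) stores only the current estimate and the measurement index, yet reproduces the batch average of Eq. (4.1-1) exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
x_true = 3.0
z = x_true + rng.normal(size=1000)        # z_i = x + v_i

x_hat = 0.0
for k, zk in enumerate(z):
    # Eq. (4.1-4) form: new estimate = prior estimate + weighted residual
    x_hat = x_hat + (zk - x_hat) / (k + 1)

print(x_hat, z.mean())                    # identical to the batch average
```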

In the example we dealt with scalar quantities; generalization of the concept to vector quantities proceeds directly. Consider a discrete system whose state at time t_k is denoted by x(t_k) or simply x_k, where w_k is a zero mean, white sequence of covariance Q_k,

x_{k+1} = Φ_k x_k + w_k    (4.1-4)

Measurements are taken as linear combinations of the system state variables, corrupted by uncorrelated noise. The measurement equation is written in vector-matrix notation as

z_k = H_k x_k + v_k    (4.1-5)

where z_k is the set of ℓ measurements at time t_k, namely, z_{1k}, z_{2k}, ..., z_{ℓk}, arranged in vector form

z_k = [z_{1k}  z_{2k}  ⋯  z_{ℓk}]^T    (4.1-6)

H_k is the measurement matrix at time t_k; it describes the linear combinations of state variables which comprise z_k in the absence of noise. The dimension of the measurement matrix is ℓ × n, corresponding to ℓ-dimensioned measurements of an n-dimensioned state. The term v_k is a vector of random noise quantities (zero mean, covariance R_k) corrupting the measurements.

Given a prior estimate of the system state at time t_k, denoted x̂_k(-), we seek an updated estimate, x̂_k(+), based on use of the measurement z_k. In order to avoid a growing memory filter, this estimate is sought in the linear, recursive form*

x̂_k(+) = K_k' x̂_k(-) + K_k z_k    (4.1-7)

where K_k' and K_k are time-varying weighting matrices, as yet unspecified. Although the following derivation is for an assumed recursive, single-stage filter, the result has been shown to be the solution for a more general problem. If w_k, v_k are gaussian, the filter we will find is the optimal multi-stage filter; a nonlinear filter cannot do better (Ref. 2). In other cases we will simply have determined the optimal linear filter.

4.2 DISCRETE KALMAN FILTER

It is possible to derive the Kalman filter by optimizing the assumed form of the linear estimator. An equation for the estimation error after incorporation of the measurement can be obtained from Eq. (4.1-7) through substitution of the measurement equation [Eq. (4.1-5)] and the defining relations (tilde denotes estimation error)

x̂_k(+) = x_k + x̃_k(+)
x̂_k(-) = x_k + x̃_k(-)    (4.2-1)

The result is

x̃_k(+) = [K_k' + K_k H_k - I] x_k + K_k' x̃_k(-) + K_k v_k    (4.2-2)

By definition, E[v_k] = 0. Also, if E[x̃_k(-)] = 0, this estimator will be unbiased [i.e., E[x̃_k(+)] = 0] for any given state vector x_k only if the term in square brackets is zero. Thus, we require

*Throughout the text, (-) and (+) are used to denote the times immediately before and immediately after a discrete measurement, respectively.



K_k' = I - K_k H_k    (4.2-3)

and the estimator takes the form

x̂_k(+) = (I - K_k H_k) x̂_k(-) + K_k z_k    (4.2-4)

or alternatively,

x̂_k(+) = x̂_k(-) + K_k [z_k - H_k x̂_k(-)]    (4.2-5)

The corresponding estimation error is, from Eqs. (4.1-5), (4.2-1) and (4.2-5),

x̃_k(+) = (I - K_k H_k) x̃_k(-) + K_k v_k    (4.2-6)

Error Covariance Update - Using Eq. (4.2-6) the expression for the change in the error covariance matrix when a measurement is employed can be derived. From the definition

P_k(+) = E[x̃_k(+) x̃_k(+)^T]    (4.2-7)

Eq. (4.2-6) gives

P_k(+) = E{(I - K_k H_k) x̃_k(-) x̃_k(-)^T (I - K_k H_k)^T + K_k v_k x̃_k(-)^T (I - K_k H_k)^T
        + (I - K_k H_k) x̃_k(-) v_k^T K_k^T + K_k v_k v_k^T K_k^T}    (4.2-8)

By definition,

P_k(-) = E[x̃_k(-) x̃_k(-)^T]    (4.2-9)

and, as a result of measurement errors being uncorrelated,

E[x̃_k(-) v_k^T] = E[v_k x̃_k(-)^T] = 0    (4.2-10)

Also,

R_k = E[v_k v_k^T]    (4.2-11)

Thus

P_k(+) = (I - K_k H_k) P_k(-) (I - K_k H_k)^T + K_k R_k K_k^T    (4.2-12)

Optimum Choice of K_k - The criterion for choosing K_k is to minimize a weighted scalar sum of the diagonal elements of the error covariance matrix P_k(+). Thus, for the cost function we choose

J_k = E[x̃_k(+)^T S x̃_k(+)]    (4.2-13)

where S is any positive semidefinite matrix. As demonstrated in Eq. (4.0-9), the optimal estimate is independent of S; hence, we may as well choose S = I, yielding

J_k = trace[P_k(+)]    (4.2-14)

This is equivalent to minimizing the length of the estimation error vector. To find the value of K_k which provides a minimum, it is necessary to take the partial derivative of J_k with respect to K_k and equate it to zero. Use is made of the relation for the partial derivative of the trace of the product of two matrices A and B (with B symmetric),

(∂/∂A) [trace(A B A^T)] = 2AB

From Eqs. (4.2-12) and (4.2-13) the result is

-2 (I - K_k H_k) P_k(-) H_k^T + 2 K_k R_k = 0

Solving for K_k,

K_k = P_k(-) H_k^T [H_k P_k(-) H_k^T + R_k]^{-1}    (4.2-15)

which is referred to as the Kalman gain matrix. Examination of the Hessian of J_k reveals that this value of K_k does indeed minimize J_k [Eq. (2.1-79)].

Substitution of Eq. (4.2-15) into Eq. (4.2-12) gives, after some manipulation,

P_k(+) = P_k(-) - P_k(-) H_k^T [H_k P_k(-) H_k^T + R_k]^{-1} H_k P_k(-)    (4.2-16a)
       = [I - K_k H_k] P_k(-)    (4.2-16b)

which is the optimized value of the updated estimation error covariance matrix.

Thus far we have described the discontinuous state estimate and error covariance matrix behavior across a measurement. The extrapolation of these quantities between measurements is (Section 3.7)

x̂_k(-) = Φ_{k-1} x̂_{k-1}(+)    (4.2-17)

P_k(-) = Φ_{k-1} P_{k-1}(+) Φ_{k-1}^T + Q_{k-1}    (4.2-18)

See Fig. 4.2-1 for a "timing diagram" of the various quantities involved in the discrete optimal filter equations. The equations of the discrete Kalman filter are summarized in Table 4.2-1.



[Figure 4.2-1 Discrete Kalman Filter Timing Diagram]

[Figure 4.2-2 System Model and Discrete Kalman Filter]

Figure 4.2-2 illustrates these equations in block diagram form. The Kalman filter to be implemented appears outside the dashed-line box. That appearing inside the box is simply a mathematical abstraction - a model of what we think the system and measurement processes are. Of course, the Kalman filter we implement is based upon this model. Chapters 7 and 8 dwell on the practical consequences of this fact. It is often said that the Kalman filter generates its own error analysis. Clearly, this refers to the computation of P_k, which provides an indication of the accuracy of the estimate. Again, Chapters 7 and 8 explore the practical meaning of P_k in view of modeling errors and other unavoidable factors.

In the linear, discrete Kalman filter, calculations at the covariance level ultimately serve to provide K_k, which is then used in the calculation of mean values (i.e., the estimate x̂_k). There is no feedback from the state equations to the covariance equations. This is illustrated in Fig. 4.2-3, which is essentially a simplified computer flow diagram of the discrete Kalman filter.

[Figure 4.2-3 Discrete Kalman Filter Information Flow Diagram]

TABLE 4.2-1 SUMMARY OF DISCRETE KALMAN FILTER EQUATIONS

System Model:                   x_k = Φ_{k-1} x_{k-1} + w_{k-1},  w_k ~ N(0, Q_k)
Measurement Model:              z_k = H_k x_k + v_k,  v_k ~ N(0, R_k)
Initial Conditions:             E[x(0)] = x̂_0,  E[(x(0) - x̂_0)(x(0) - x̂_0)^T] = P_0
Other Assumptions:              E[w_k v_j^T] = 0 for all j, k
State Estimate Extrapolation:   x̂_k(-) = Φ_{k-1} x̂_{k-1}(+)
Error Covariance Extrapolation: P_k(-) = Φ_{k-1} P_{k-1}(+) Φ_{k-1}^T + Q_{k-1}
State Estimate Update:          x̂_k(+) = x̂_k(-) + K_k [z_k - H_k x̂_k(-)]
Error Covariance Update:        P_k(+) = [I - K_k H_k] P_k(-)
Kalman Gain Matrix:             K_k = P_k(-) H_k^T [H_k P_k(-) H_k^T + R_k]^{-1}

A Simpler Form for K_k - There is a matrix inversion relationship which states that, for P_k(+) as given in Eq. (4.2-16a), P_k^{-1}(+) is expressible as:

P_k^{-1}(+) = P_k^{-1}(-) + H_k^T R_k^{-1} H_k    (4.2-19)
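The relations of Table 4.2-1 map one-to-one onto code; the following sketch (a minimal implementation, assuming time-invariant Φ, H, Q and R purely for brevity) alternates extrapolation, gain computation and measurement update for a simple position-velocity tracking problem.

```python
import numpy as np

def kalman_step(x, P, z, Phi, Q, H, R):
    """One cycle of the discrete Kalman filter (Table 4.2-1)."""
    # Extrapolation (Eqs. 4.2-17, 4.2-18)
    x = Phi @ x
    P = Phi @ P @ Phi.T + Q
    # Kalman gain (Eq. 4.2-15)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # Measurement update (Eqs. 4.2-5, 4.2-16b)
    x = x + K @ (z - H @ x)
    P = (np.eye(P.shape[0]) - K @ H) @ P
    return x, P

# Example: track position and velocity from noisy position measurements
dt = 0.1
Phi = np.array([[1.0, dt], [0.0, 1.0]])
Q = np.diag([0.0, 0.01])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])

rng = np.random.default_rng(6)
x_true = np.array([0.0, 1.0])
x_hat, P = np.zeros(2), np.eye(2)
for _ in range(100):
    x_true = Phi @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.multivariate_normal(np.zeros(1), R)
    x_hat, P = kalman_step(x_hat, P, z, Phi, Q, H, R)
print(x_hat, x_true)
```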



This relationship can easily be verified by showing that P_k(+) P_k^{-1}(+) = I. We use this result to manipulate K_k as follows,

K_k = [P_k(+) P_k^{-1}(+)] P_k(-) H_k^T [H_k P_k(-) H_k^T + R_k]^{-1}
    = P_k(+) [P_k^{-1}(-) + H_k^T R_k^{-1} H_k] P_k(-) H_k^T [H_k P_k(-) H_k^T + R_k]^{-1}

Expanding and collecting terms yields

K_k = P_k(+) H_k^T [I + R_k^{-1} H_k P_k(-) H_k^T] [H_k P_k(-) H_k^T + R_k]^{-1}
    = P_k(+) H_k^T R_k^{-1}    (4.2-20)

which is the simpler form sought.

A Property of the Optimal Estimator - We may verify by direct calculation that

E[x̂_k(+) x̃_k(+)^T] = 0    (4.2-21)

That is, the optimal estimate and its error are uncorrelated (orthogonal). The state and its estimate at time t_1(+) are

x_1 = Φ_0 x_0 + w_0    (4.2-22)

and

x̂_1(+) = Φ_0 x̂_0 + K_1 [z_1 - H_1 Φ_0 x̂_0]
       = Φ_0 x̂_0 + K_1 [H_1 Φ_0 x_0 + H_1 w_0 + v_1 - H_1 Φ_0 x̂_0]    (4.2-23)

respectively, where the measurement equation, z_1 = H_1 x_1 + v_1, has been employed. Subtracting Eq. (4.2-22) from Eq. (4.2-23) yields an equation in x̃_1 - i.e.,

x̃_1(+) = (Φ_0 - K_1 H_1 Φ_0) x̃_0 + (K_1 H_1 - I) w_0 + K_1 v_1    (4.2-24)

Since E[x̃_0 x̂_0^T] = 0, we directly calculate the quantity of interest as

E[x̂_1(+) x̃_1(+)^T] = E{[Φ_0 x̂_0 + K_1 (-H_1 Φ_0 x̃_0 + H_1 w_0 + v_1)]
                        [x̃_0^T (Φ_0 - K_1 H_1 Φ_0)^T + w_0^T (K_1 H_1 - I)^T + v_1^T K_1^T]}
 = -K_1 H_1 Φ_0 P_0 (Φ_0^T - Φ_0^T H_1^T K_1^T) + K_1 H_1 Q_0 (H_1^T K_1^T - I) + K_1 R_1 K_1^T
 = -K_1 H_1 (Φ_0 P_0 Φ_0^T + Q_0) + K_1 H_1 (Φ_0 P_0 Φ_0^T + Q_0) H_1^T K_1^T + K_1 R_1 K_1^T
 = -K_1 H_1 P_1(-) [I - H_1^T K_1^T] + K_1 R_1 K_1^T
 = K_1 [-H_1 P_1(+)^T + R_1 K_1^T]
 = 0

where use has been made of Eq. (4.2-20). We may now repeat the process for E[x̂_2(+) x̃_2(+)^T] and, by induction, verify Eq. (4.2-21).

Kalman Filter Examples - Several applications of optimal filtering are presented in this section. Since analytic results are sought, only relatively low-order systems are considered. However, it is well to note that, in practice, Kalman optimal filters for 50th order systems (and even higher) have been considered, and 10th and 20th order Kalman filters are relatively common. With increasing availability of small, powerful digital computers, the main factors which limit the size of implemented Kalman filters are no longer computer limitations per se, but rather modeling errors and associated sensitivity problems. These are discussed in Chapters 7 and 8.

and

.301 = <lJo.3o +Wo

11,(+); 410110 + K,~, - H, 410&01

; 41030 + K, [-H, 410&0+ H,wo + :£,]

(4.2-22)

(4.2-23)

Example 4.2·1Estimate the value of a constant x, given discrete measurements of x corrupted by an

uncorrelated gaussian noise sequence with zero mean and variance rooThe scalar equations describing this situation are

System

Measurement

respectively, where the measurement equation, 11 = H1XI t Y1 has beenemployed. Subtracting Eq. (4.2-22) from Eq. (4.2-23) yields an equation inil - Le.,

where

"k - N(O,rol

ii.(+) ~ (410 - K, H,4r o)&0 + (K, H, - l)wo + K,Yt

Since ElioiioT] ; 0, we directly calculate the quantity of interest as

(4.2-24) that is, vk is an uncorrelated gaussian noise sequence with zero mean and variance roo Forthis problem, V'o(tk,tj);: h = I and qk;: 0 yielding the variance propagation equation

Eli, (+l&,(+)T] ; E I[410110 + K, (-H,4ro&0+ H, Wo+ yt») •

1.&0T(4ro -K,H,4ro)T + WoT(K,H, -I)T + yt TK, TJI

;-K,H,4roPo(4roT -410TH, TK, T)+K,H,Qo(H, TK, T _I)

+K,R,K,T

and single-stage optimal update equation

Pk+I(+1 = Pk+I(-) - Pk+I(-1 (Pk+I(-) +ror'Pk+I(-)

= Pk+I(-)1+ Pk+I(-)

'0

Page 62: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

[PII (+)

P(+) =

PI2(+)

114 APPLIED OPTIMAL ESTIMATION

Pk(+)1\+[(+)=--­

1 + pJc{+)'0

This difference equation can be solved, starting with Po(+) = Po, by noting that

PI(+)=~1+~

'0

The discrete optimal filter for this example problem is

PoXk+l = Xk + __'0_ 12k - xkl

I"~k'0

For sufficiently large k, Xk+1 = Xk = xas new measurements provide essentially no newinformation.

Example 4.2-2

Consider two unknown but correlated constants, XI and X2. We wish to determine theimprovement in knowledge of x I which is possible through processing a single noisymeasurement of x2'

The vector and matrix quantities of interest in this example are:

where P(-) is the covariance matrix describing the uncertainty ina before the measurement,that is, (1) 2 is the initial mean square error in knowledge of XI> 02'2 is the initial meansquare error in knowledge of x'2, and 012'2 measures the corresponding cross-correlation.Computing theupdoted covariance matrix, P(+), according to Eq. (4.2-16), yields

[

,(o,'(I-P,)+,,\1 '(-")JPI2(+) 0'1 0'22+ r'2 1110'12 0'22+ r2

P22(+J = -a,;(-, "~ T-1

a?( ,+,,-,)-02 + r2 02 r2

I

OPTIMAL LINEAR FILTERING 115

where rz denotes the measurement noise covariance, and p is the correlation coefficientdefined by IEq. (3.6-7)J

za"p=--0')0'2

A few limiting cases are worth examining. First, in the case where the measurement isperfect (i.e., ra = 0), the final uncertainty in the estimate of X2, P22(+), is, of course, zero.Also, when p = 0, the final uncertainty in the estimate of x I is, as expected. equal to theinitial uncertainty; nothing can be learned from the measurement in this case. Finally, in thecase where p = ± 1, the final uncertainty in the estima te of x I is given by

PII(+) =a,'(-I1,/)+ <1:z r:z

and the amount of information gained [Le., the reduction in p,,(+)} depends upon the ratioof initial mean square error in knowledge of X2to the mean square error in measurement ofX2' All of these results are intuitively satisfying.

Example 4.2-3

Omega is a world-wide navigation system, utilizing phase comparison of 10.2 kHzcontinuous-wave radio signals. The user employs a propagation correction, designed toaccount for globally predictable phenomena (diurnal effects, earth conductivity variations,etc.j, to bring theoretical phase values into agreement with observed phase measurements.The residual Omega phase errors are known to exhibit correlation over large distances (e.g.,> 1000 nm). Design a data processor that, operating in a limited geographic area, canprocess error data gathered at two measurement sites and infer best estimates of phase errorat other locations nearby.

In the absence of other information, assume that the phase errors are well-modeled as azero-mean markov process in space, with variance 0',/. Further assume that the phase errorprocess is isotropic - Le., it possesses the same statistics in all directions, and that themeasurement error is very small

We may now proceed in the usual way. First, denote by '1'2and '1'3the two phase errormeasurements, and by '1'1 the phase error to be estimated. Then we can form.3,and.l.as

whence it follows that

Abo, by definition

Page 63: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

116 APPLIED OPTIMAL ESTIMATiON OPTIMAL LINEAR FILTERING 117

where rlj is the distance between sites i and l. and d is the correlation distance of the phaseerror random process. The best estimator (in the sense of mmimum mean square error) isgiven by

where 6p(0). 6v(0) and 68(0) represent initial values of position, velocity and accelerationerrors, and higher-order terms have been dropped. Thus, we may write, as the stateequlltions for this system

i(+) = Kl.

sincei(-) ""Q.. K ~s given by 6v'" 6a

Sa'" 0

and the following result is obtained:

I {<P2[e-rI2/d_C-(rJ3 + r23)/dl {

'PI = I _ lr2J/d + <P3(e-rI3/d-e-(r12 +r23)/dJf

<P2= <P2

or, equivalently,

<P3= <P3 The radio position error is indicated by ep(t), see Fig. 4.2-5.

Figure 4.2-4 Phase Estimation Error - Kings Point, N.Y. (Ref. 14)

I) ,~ ~+O

epltl&0[01

[0I 0] [0o 0 1 I + 0

o 00 0

8v10)

t I"jI t

o I

Figure 4.2-5 System and Discrete Measurement Model

=G !n=~

80(01

"(I,Oj = eFI

For any initialestimatei(O) of the system state vector, we may compute the expected valueof i(T-).iust prior to the lust nx,as follows:

The transition matrix for this time-invariant system is readily computed as

24001000 1500 2000

GREENWICH MEAN TIME lhrsl

500

Example 4.2-4

Study the design of an optimal data processing scheme by which an Inertial NavigationSystem (INS), in a balloon launched ionospheric probe, can be initialized in early flightusing radio position updates.

In view of the intended application, it is appropriate to choose a model of INS errordynamics valid only over a several minute period. Taking single-axis errors intoconsideration, we may write

The first equation shows that ';'1 is computed as a weighted sum of the perfectmeasurements. The last two equations are, of course, a direct consequence of the assumedperfect measurements.

This technique has been successfully employed with actual Omega data; see Fig. 4.2-4.

t'SP(t) lO:<lip(O) +liv(O)t +sa(O)2"

Page 64: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

118 APPLIED OPTIMAL ESTIMATION

i(r) = ..(T,o)i(o)

[~ P<O) +!v(O)T: ~'(O)T'/2 ]

= "(0) +6a(O)T,.(0)

Also, corresponding to an initially diagonal covariance matrix P(O), the value of P(T-) iscomputed according to (overbar denotes ensemble expectation)

which is symmetric. but not dit1gonol.That is, position error is now correlated with velocityand acceleration errors.

The. scalar position measurement is obtained by forming the difference betweenradio-indicated and inertially-indicated positions, viz:

OPTIMAL LINEAR FILTERING 119

under the assumption Pit (T-) >- (Jp1.. Thus, the first position fix reduces the INS positionerror to approximately the fix error (actually, somewhat below it). Similar calculationsshow that the velocity and acceleration errors are essentially unchanged. The next positionfaxreduces the INS velocity error. The Kalman gain at time tk for this system is

In the optimal filter, illustrated in Fig. 4.2--6, the sampler is understood to represent animpulse modulator. Note the model of the system imbedded in the Kalman filter,

where the indicated quantities, Pind- have been described in terms of their true value, );)t,plus error components. This is equivalent to writing

z = Pind(radio) - Pind(INS)

= Pt +ep - (Pt +6p)

=-6p+cp

z(f)= (-I 0 01

H ~peT] + 'pm

6v(T) -v-'am

which enables identification of H and the measurement error. Let us assume that the radioposition errors are stationary and uncorrelated from fix to fix. The effect of a flx at time Tis computed according to «(Jp1.= ;;;):

PW) =P(T-) - P(f-jHTIHP(T-)HT + RJ-1HP(r)

= P _ _ I [PII'(rJ I~T-;P12(f-JI~=)P,,(f=)j(T J PII(T )+ap' I -.!C12i!:.-!-!P12(T..2!:!'~

I I p,,'U'-j

The upper left corner element of P(rt) is, for example,

(4.2-25)

Figure 4.2-6 Optimal Filter Configuration

Figure 4.2-7 shows the result of a computer simulation of the complete system. The firstfix occurs at t = 0, the second at t = 0.5 min and the third at t = 1.0 min. In this system theinitial errors were PII(O) = (I nm)1., P1.1.(O) =(7.3 kts)1., P33(0)=Oand(Jp= 30ft Higherorder terms were included but, as indicated earlier, these are not of primacy importancehere. Thus, the results are largely as predicted by the analysis herein.

4.3 CONTINUOUS KALMAN FILTER

The transition from the discrete to the continuous formulation of the Kalmanfilter is readily accomplished. First, in order to go from the discrete system andmeasurement models [Eqs, (4.]-4, 5») to the continuous models"

"Throughout the remainder of the text. time dependence of all continuous time quantitieswill often be suppressed for notational convenience. For example, .3,(t) and Q(t) will bedenoted by .3,and Q. etc.

Page 65: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

120 APPLIED OPTIMAL ESTIMATION OPTIMAL LINEAR FilTERING 121

CONTINUOUS PROPAGATION OF COVARIANCE

In discrete form, the state errorcovariance matrix was shown to propagateaccording to

The first two of these relationships were derived in Section 3.7. What nowremains is to establishthe equivalence between the discretewhite noise sequenceXl< and the (non-physically realizable) continuous white noise process y. Notethat, whereas Rk = E[Xl<YI?) is a covariance matrix, R(t) defined byE[y(t)yT(T)) = R(t)li(t - T) is a spectral density matrix (the Dirac function 5(t­T) has units of I/time). The covariance matrix R(t)li(t - T) has infinite-valuedelements. The discrete white noise sequence can be made to approximate thecontinuouswhite noise processby shrinking the pulselengths (..:1t) andincreasingtheir amplitude, such that Rk -> R/At. That is, in the limit as At -> 0, the discretenoise sequence tends to one of infirrite-valuedpulsesof zero duration,such thatthe areaunderthe "impulse"autocorrelation function is Rk..:1t, equal to the areaR underthe continuous white noise impulseautocorrelation function.

Using these expressions, OUr approach is simply one of writing theappropriate differenceequationsand observing theirbehaviorin the limit as ..:1t-+O. The notation will be kept as simpleaspossibleand, in keepingwith the rest ofthe presentation, the derivations are heuristic and to the point. The readerinterested in a more rigorousderivation is referred to Kalmanand Bucy (Ref. 5).For presentpurposesit shallbe assumed that R is non-singular - i.e., R-t exists.In addition,it is assumedthat wand yare uncorrelated.

//

//

//

/7 FiXE51l'0.O.51 /

//

//

//

/;'

;'

'"/

"'"'

II

II

II "XI' ~OlI

II

II

II

II

II

II

I

1 0.70

;z0

:i:;;5 0.15

z0;::Vi2~

s 0.10

Zz<t:z:u

"zVi

t t tFIX TIMES

(4.3·3)

Thisis now rewrittenas

Figure 4.2-7 Optimal Use of Position Fixes in an Ionospheric ProbeInertial Navigation System

(4.34)

whereWI I are zero meanwhite noise processeswith spectraldensity matricesQand R, respectively, it is necessary to observethe followingequivalences, valid inthe limit as tk - tk_l = At ... 0:

.Ii = FlI +GlY (4.3·1)

(4.3-2)

Expansion yields

where O(..:1t2) denotes "terms of the order ..:1t2 ." As a consequence of optimal

use of a measurement, Pk(+) can be expressed as [Eq. (4.2·16b)]

~"'I+FAt Inserting this expression into the equation for Pk+ I(-) and rearranging termsyields

Page 66: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

122 APPLIED OPTIMAL ESTIMATIONOPTIMAL LINEAR FILTERING 123

Examining the term Ijlit Kk, we note that (Eq. (4.2-15)/

it-.Kk = it- Pk(-)HkT(HkPk(-)HkT + Rkr'

= Pk(-)HkT(HkPk(-)Hk Tlit + Rklitr'

= Pk(- )HkT[HkPk(- )HkTlit + Rr'

(4.3-6)

,. ...TttfM~l~~ !.~l _'r-- --- ----,I CONTlNlK)US SYSTEM Mf"'5\JREMfNI

III !hlI\I

IwltlI-IIIII IL ~

CONTINUOUS lC"'lMAN FlllfR

HlU..-------'

!oIl)

Thus, in the limit as lit ... 0 we getFigure 4.3-1 System Model and Continuous Kalman Filter

(4.3.7)Replacing <I>k_l by I + Flit and Kk by PHTl{"' lit and rearranging terms yields

and, simultaneously,

(4.3-8)

iK(+) -!tk-l(+) = F&k-l(+) + PHTK"' ~-Hkik-l(+)l +O(lit)

(4.3-11)

where we have made the substitution

(4.3-12)

System Model i(l) = F(t)lI(t) + G(t)ll!(t), ~t)-NUl.Q(l))

Measurement Model .l<tl=H('lJl.(')+~.(l), ~t)-N(Jl,R(t))

Initial Conditions EllI(O)] =io, E[{JI(O)-io){JI(O) -io)T] =Po

Other Assumptions R- 1(t) exists

State Estimate i(l) = F(l)l.(l) + K(t)lJ.(l) - H(t)l.(tl], itO) = ioError Covariance Propagation P(l) = F(l)P(l) + P(t)FT(l) + G(l)Q(l)GT(l)

I-K(t)R(l)KT(l), P(O)= Po

Kalman Gain Matrix K(l) = P(l)HT(t)R-'(l) when E[ll!(lllT(r)] =0

I= [1'(t)HT(l) +G(l)C(l)]R-'(l)

when E[lI:(t)l!.T(r)] = C(')6(l - r)'

In the limit as lit ... 0 this becomes

which is the continuous Kalman filter, and in which P is computed according toEq. (4.3·8); see Fig. 4.3-1. The continuous Kalman filter equations aresummarized in Table 4.3-1.

TABLE 4.3-1 SUMMARY OF CONTINUOUS KALMAN FILTER EQUATIONS(WHITE MEASUREMENT NOISE)

(4.3-9)

(4.3-10)

which is the linearvariance equation previously derived in Section 3.7.

Tracing the development of the terms on the right-sideof this equation, it isclear that FP + PFT results from behavior of the homogeneous (unforced)system without measurements, GQGT accounts for the increase of uncertaintydue to process noise (this term is positive semidefinite) and _PHTR- 1 HPaccounts for the decreaseof uncertainty as a result of measurements. Equation(4.3-8) is nonlinear in P; it is referred to as the matrix Riccati equation. In theabsence of measurementswe get

CONTINUOUS KALMAN FILTER

The discrete form of the Kalman filter may bewritten as [Eq. (4.2-5»)

Page 67: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

124 APPLIED OPTIMAL ESTIMATION

The differential equation for ii can be obtained by subtracting Eq. (4.3.1)from Eq. (43-12) and employing Eq. (43-2). Recalling tbat i = 21 + ii, thisresults in (K =PJiTR-1

)

OPTIMAL LINEAR FILTERING 126

The scalarequations describing this situation are

X'" 0 System

&=(F -KH)ii -GlY+ K~

Analogous to the case of the discrete estimator, it can be shown that

(4.3-13)where

z"'x+v

v-NCO,')

Measurement

It is useful to obtain results for the case where the process and measurementnoises are correlated, viz:

One approach is to convert this new problem to an equivalent problem with nocorrelation between process and measurement noises. To do this, add zero to theright-side of Eq. (4.3-1), in the form>

where in Eq. (4.3-17) D~ is treated as a known input (Section 4.4) and (GlY ­D:l:l is treated as a process noise. By the choice of D specified above, E [(GlY­Dx)xT] = 0, and thus the measurement and process noises in this equivalentproblem are indeed uncorrelated. Applying previously derived results to thisreformulated problem, it is easily verified that the solution is as given in Table4.3-1, wilb the Kalman gain matrix specified by

E[i(t)ii(t)T) =0

CORRELATEDPROCESSAND MEASUREMENT NOISES

E[lY(t)):T(T)) =C(t) 6(t - T)

where D =GCR-1. The filtering problem now under consideration is

X=(F -DR)! + Dl.+ GlY- Dy

l.=!flI+;l:

K = [pJiT + Ge) R- '

(43-14)

(4.3-15)

(4.3-16)

(4.3-17)

. (4.3-18)

Figure 4.3-2 depicts the system and measurement models.

x 101

~,Fipre 4.3-2 System and Measurement Models

For this problem, f:: g '" q :: 0 and h :: 1; thus yielding the -scalarRiccati equation

Integrating this equation by separating variables, we get

r ~ = .: J' dtPo prO

Pop=--

l+~tr

Note that through the definition, ro :: rIT, we see that this ~ult is identicol to thatobtained in Example 4.2-1 at the instants t:: leT (k:: 0,1,2, .•. ), as It should be.

The continuous Kalman gain is

This method can also be used to derive results for the corresponding discretetime problem.

Example 4.3-1

Estimate the value of a constant x, given a continuous measurement of x corrupted by agaussian white noise process with zero mean and spectral density r.

*Method attributed to V.C. Ho.

~k:: R:: - '- ­

r 1+~tr

Figure 4.3-3 illustrates the filter which, acting on z, produces an optimal estimate of x, viz:

Page 68: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

126 APPLIED OPTIMAL EST'MATIONOPTIMAL LINEAR FIL TERING 127

Po

i(t> = -'- [z - x(t)Jl+~t

r

k(t) = p(t)/,

= 0 coth Ot

In the limit of large t, k(t) ....... 0 and not zero as in the previous example. New measurementsare always processed in this case - a direct consequence of the noise term driving thesystem model,

PoT

~

Figure 4.3-3 Optimal Filter Design

Notice that, as t -+ '", k{t) -+ 0 and thus xU) tends to a constant, as indeed it should.

Example 4.3-2

A spacecraft is falling radially away from the earth at an almost constant speed, and issubject to small. random, high-frequency disturbance accelerations of spectral density q.Determine the accuracy to which vehicle velocity can be estimated, using ground-baseddoppler radar with an error of spectral density r. Assume poor initial condition information.

Let x denote the deviation from the predicted nominal spacecraft velocity. usingavailable gravitational models. We then have

pIt)

x e w , w- N(O.q)Figure 4.3-4 Tracking Error Versus Time

z =x+v, v ....N(O.r) 4.4 INTUITIVE CONCEPTS

P(t) <1:#Ol coth Ot

In the case of poor initial condition information, Po is large and thus p(t) reduces to

and consequently (f =O.g =h =1)

(4.4·1)

Covariance Matrix - Inspection of the equations describing the behavior ofthe error covariance matrix reveals several observations which confirm ourintuition about the operation of the filter. The effect of system disturbances onerror covariance growth is the same as that observed when measurements werenot available. The larger the statistical parameters of the disturbances asreflected in the "size" of the Q-matrix, and the more pronounced the effect ofthe disturbances as reflected in the "size" of the G-matrix, the more rapidly theerror covariance increases.

The effect of measurement noise on the error covariance of the discrete filteris observed best in the expression

Large measurement noise (Rk-1 is small) provides only a small increase in the

inverse of the error covariance (a small decrease in the error covariance) whenthe measurement is used; the associated measurements contribute little toreduction in estimation errors. On the other hand, small measurement errors(large Rk-

t) cause the error covariance to decrease considerably whenever a

p(O) = Pop=q_p2/r,

Employing the identity

f/~p2 = ~ In(:~~)we find directly (0 ~v;q, p ~ .J<i{r):

("p,,- C,-O::sh::.-Ilt=--+::O:..:Sl::'nh,:,-,-Pt,- )p(t) =0 -Po sinh IJt+ Ol cosh IJt

In the steady state, p(t) -+ Ol regardless of initial conditions; see Fig. 4.3-4. The Kalman filterfor thts problem looks like that illustrated in Fig. 4.3-3, with k(t) now given by

Page 69: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

128 APPLIED OPTIMAL ESTIMATION

measurement is utilized. When measurement noise is absent, Eq. (4.2-16a) mustbe used because Rk-t does not exist.

The effect of measurement noise on the ability of the continuous Kalmanfilter to provide accurate estimates of the state, appears in the fourth term onthe right side of Eq. (4.3-8). If noise occurs in every element of themeasurement, Rand K"I are positive definite matrices. The term

(4.4-2)

is also positive definite and the negative of this will always cause a decrease inthe "size" of a nonzero error covariance matrix P. The magnitude of this term isinversely proportional to statistical parameters of the measurement noise. Largermeasurement noise will cause the error covariance to diminish less rapidly or toincrease, depending on the system dynamics, disturbances and the initial value ofP. Smaller noise will cause the ftlter estimates to converge on the true valuesmore rapidly. The effects of system disturbances and measurement noises ofdifferent magnitudes can be described graphically by considering the standarddeviation of the error in the estimate of a representative state variable. This ispresented in Fig. 4.4-1 for a hypothetical system which reaches statistical"steady state."

OPTIMAL LINEAR FilTERING 129

II

THIS IS ALSO A POSSIBILITY

IIIIII LARGER DIS1URBANCES AND

MEASUREMENT NOISE

SMALLER DISTURBANCES ANDMEASUREMENT NOISE

t-- STEADY STATE ­

(o) CONTINUOUS FILTER

Optimal Prediction - Optimal prediction can be thought of, quite simply, interms of optimal filtering in the absence of measurements. This, in turn, isequivalent to optimal filtering with arbitrarily large measurement errors (thusK"I -+ 0 and hence K -+ 0). Therefore, if measurements are unavailable beyondsome time, to, the optimal prediction of ,,(t) for &1 0 given &(to) must be

Kalman Gain Matrix - The optimality of the Kalman ftlter is contained in itsstructure and in the specification of the gain matrices. There is an intuitive logicbehind the equations for the Kalman gain matrix. It can be seen from the forms

(4.4-3)

To better observe the meaning of the expressions, assume that H is the identitymatrix. In this case, both P and R-t are nxn matrices. If j{"1 is a diagonal matrix(no cross-correlation between noise terms), K results from multiplying eachcolumn of the error covariance matrix by the appropriate inverse of mean squaremeasurement noise. Each element of the filter gain matrix is essentially the ratiobetween statistical measures of the uncertainty in the state estimate and theuncertainty in a measurement.

Thus, the gain matrix is "proportional" to the uncertainty in the estimate,and "inversely proportional" to the measurement noise. Ifmeasurement noise islarge and state estimate errors are small, the quantity J!. in Figs.4.2-2 and 4.3-1 isdue chiefly to the noise and only small changes in the state estimates should bemade. On the other hand, small measurement noise and large uncertainty in thestate estimates suggest that J!. contains considerable information about errors inthe estimates. Therefore, the difference between the actual and the predictedmeasurement will be used as a basis for strong corrections to the estimates.Hence, the filter gain matrix is specified in a way which agrees with an intuitiveapproach to improving the estimate.

'"o'"~

zQ

~....'"w'":;:'"

Figure 4.4-t

I-- STEADY STATE-----'

{bl DISCRETE FILTER

Behavior of the RMS Error in the Kalman Filter Estimate of a Par­ticular State Variable

Page 70: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

130 APPLIED OPTIMAL ESTIMATION

obtained from [Eqs. (4.3-10) and (4.3-12)] :

OPTIMAL LINEAR FllTERINO 131

for which the solution is·

i(t) = 4>(t,to)j(to)

i(t) =F(t)i(t)

discrete

continuous

(4.4-4)

(4.4-5)

I

P(t) = I 4>(t,T) G(T) Q(T) GT(T)~T(t, T) dro

(4.4-7)

where $(t,T) is the transition matrix corresponding to F. If the integral ispositive definite for some t > 0, then P(t) > 0 - i.e., the process noise excitesallthe states in the system. The system is said to be uniformly completelycontrollable when the integral is positive definite and bounded for some t > o.The property of stochastic controllability is important for establishing stabilityof the filter equations and for obtaining a unique steady-state value ofP. Whenthe system is stationary, and if Q is positive definite, this criterion of completecontrollability can be expressed algebraically; the result is exactly that discussedin Section 3.5.

In the case of discrete systems, the condition for complete controllability isexpressed as

Stochastic Observability - In the absence of process noise and a prioriinformation, the continuous system matrix Riccati equation is given by

using the matrix identity, p-I = _p-l PP-I. The solution to this linear equation inp-I is

The corresponding equations for uncertainty in the optimal predictions, givenP(t o), are Eq. (4.3-3) for a single time stage in the discrete case and Eq. (4.3-9)in the continuous case.

Note that both the discrete and continuous Kalman filters contain an exactmodel of the system in their formulation (i.e., the F or <P matrices). Thisprovides the mechanism by which past information is extrapolated into thefuture for the purpose of prediction.

System Model Contains Deterministic Inputs - When the system underobservation is excited by a deterministic time-varying input, lL, whether due to acontrol being intentionally applied or a deterministic disturbance which occurs,these known inputs must be accounted for by the optimal estimator. It is easy tosee that the modification shown in Table 4.4-1 must be made in order for theestimators to remain unbiased. By subtracting 2' from &. in both discrete andcontinuous cases, it is observed that the resultant equations for.& are preciselythose obtained before. Hence, the procedures for computing P, K, etc., remainunchanged.

TABLE 4.4-1 MODIFICATION TO ACCOUNT FOR KNOWN INPUTS U1k-1 ORJ!('))

System Model Estimator

Discrete ik(+) = ¢k-l.iIc-I(+)+ Ak-l.!!k-l

4k=~k-l&k-l+Wk-l + Ak-l.!!k-l + Kkll.k - Hkook-liIe_I(+)

.Q::: HkAk+.n. -Hkr\k_l!!."_l]

Continuous

i(')' F(llA(!) + G(')~(') + L{t)y.<') i(t) =F(t)i(!) + L(I)J!(')

l.ll) =H{')a(l) + ~I) + K{I)~(I) - H{,)i(,»)

Stochastic Controllability - In the absence of measurements and with perfecta priori information, the continuous system matrix Riccati equation is

PII';;; ~ ~(k,i+I)Qi~T(k,i+I)';;;P21i=k-N

for some value of N > 0, where PI > 0 and P,> O.

P= FP + PFT - PHTJ{"I lIP, P(O)... co

This can be rewritten as

t

p-I (I) = f <j>T(T, t)W(T)R-I(T)H(T)~T,t)dTo

(4.4-8)

(4.4·9)

(4.4-10)

(4.4-11)

P(O) = 0 (4.4-6)*This is easily verified by substitution into Eq. (4.4-6), using Leibniz' rule

I, t

-dd A(t,r) dr = A{t , t) + 1 _.iI A{t,'I')drt 0 0 t

Page 71: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

132 APPLIED OPTIMAL ESTIMATION OPTIMAL LINEAR FILTERING 133

where cl>(t,T) is the transition matrix corresponding to F. If the integral ispositive definite for some t > 0, then P- 1(t) > 0, and it follows that °<P(t) <00 - i.e., through processing measurements it is possible to acquire information(decrease the estimation error variance) about states that are initially completelyunknown (Ref. 6). The system is said to be uniformly completely observablewhen the integral is positive definite and bounded for some t > O. When thelinear system is stationary, this criterion of complete observability can beexpressed algebraically; the result is exactly that discussed in Section 3.5 (Ref.8).

In the case of discrete systems, the condition for uniform completeobservability is

4.5 CORRELATED MEASUREMENT ERRORS

Measurements may contain errorswhose correlation times are significant. Letus assume that, through the use of a shaping filter, these measurement errorsaredescribed as the output of a first-order vector differential equation forced bywhite noise. One might argue that the technique of state vector augmentationcould then be used to recast this problem into a form where the solution hasalready been obtained; but it is readily demonstrated (Refs. 9·12) that this is notthe case in continuous-time systems, and undesirable in the case of discrete-timesystems.

(4.5·1)

(4.5·2)

lY-N(Q,Q)

STATE VECTOR AUGMENTATION

Consider the continuous system and measurement described by

(4.4·12)k

"II';; L <l>T(i,k)HjTR;-1 Hj<l>(i,k).;;",Ii=k-N

for some value of N > 0, where "I >0 and '" > O.

Stability - One consideration of both practical and theoretical interest is thestability of the Kalman filter. Stability refers to the behavior of state estimateswhen measurements are suppressed. For example, in the continuous case, the"unforced" filter equation takes the form

It is desirable that the solution of Eq. (4.4·13) be asymptotically stable;- i.e.,loosely speaking, &(t) -+ II as t -+ ~, for any initial condition &:0). This willinsure that any unwanted component of i caused by disturbances driving Eq.(4.4-13) - such as computational errors arising from finite word length in adigital computer - are bounded.

Optimaiity of the Kalman filter does not guarantee its stability. However, onekey result exists which assures both stability (more precisely, uniformasymptotic stability) of the filter and uniqueness of the behavior ofp(t) foriarget, independently of P(O). It requires stochastic uniform complete observability,stochastic uniform complete controllability, bounded Q and R (from above andbelow), and bounded F (from above). References I and 5 provide details of thistheorem and other related mathematical facts. It is important to note thatcomplete observability and controllability requirementsare quite restrictive and,in many cases of practical significance, these conditions are not fulfilled; butKalman filters, designed in the normal way, operate satisfactorily. This isattributed to the fact that the solution to Eq. (4.4·13) frequently tends towardzero over a finite time interval of interest, even though it may not beasymptotically stable in the strict sense of the definition. Perhaps, from apracticalviewpoint, the key issues pertaining to various forms of instability arethose associated with modeling errorsand implementation considerations. Theseare discussed in Chapters 7 and 8.

(4.5·3)

(4.5·5)

(4.54)

lY. -N(ll,Q,)

where

The augmented state vector lI,'T= [lI, llT satisfies the differential equation

and the measurement equation becomes

In this reformulated problem, the equivalent measurement noise is zero.Correspondingly, the equivalent R matrix is singular and thus the Kalman gainmatnx, K = PHTK" I , required for the optimal state estimator, does not exist.

CONTINUOUS TIME. R SINGULAR

There is another approach to this problem which avoids both the difficulty ofsingular R and the undesirability of working with a higher order (i.e., aug­mented) system. From Eq. (4.5·3), where we see that i - El is a white noiseprocess, we are led to consider the derived measurement ~I,

(4.4·13)&(t) = [F(t) - K(t) H(t)l&:t)

Page 72: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

134 APPLIED OPTIMAL ESTIMATiON OPTIMAL LINEAR FIL TERING 135

&W) = &(0) + P(O)HT(O)[H(O)P(O)HT(O) + R(Olr' ll.(0) - H(O)i(O)) (4.5·14)

For additional details, including the treatment of general situations, see Refs. 9and II.

where E[x(OlYT(O)] = R(O). The initial condition on the filter in Fig. 4.5·1 isdependent upon the initial measurement and cannot be determined a priori, viz:

I, =.i-EI

= HX+ Hi - EHX- Ey + Y

=(H + HF - EH)X+(HGlY + lYll

(4.5-6)

where the definitions of HI and X.I are apparent. For this derived measurement,the noise YI is indeed white; it is also correlated with the process noise ~. Thecorresponding quantities R, and C, are directly obtained as (E[~,(thYT(T)1 =0)

(4.5·7)

(4.5·8)

and

p(Q+)= P(O) - P(O)HT(O)[H(O)p(O)HT(O) + R(Olr1 H(O)P(O)

initial condition = &(0+) - K, (O)l/O)

(4.5·15)

(4.5·16)

and thus the Kalman gain matrix for this equivalent problem, from Table 4.3-1,is

There remain two aspects of this solution which warrant further discussion;namely, the need for differentiation in the derived measurement and appropriateinitial values of &,(t) and p(t). Assuming K,(t) exists and K,(t) is piecewisecontinuous, use of the identity

K, =[PH, T +GC.I R,"

= [P(H + HF - EH)T + GQGTHTJ [HGQGTHT + Q, r'The equations for &(t) and P(t) are

&= Fg +K,<i- EI- H1il

p=FP+ PFT + GQGT - K,R,K,T

d . •at (K,~ =K'lI + K'lI

(4.5·9)

(4.5·10)

(4.5·11)

(4.5·12)

>-_-_~ltl

Figure 4.5-1 Correlated Noise Optimal Filter, Continuous Time

DISCRETE TIME, Rk SINGULAR

State vector augmentation can be employed in the discrete-time problem withcorrelated noise to arriveat an equivalent measurement-noise-free problem, as In

Eqs. (4.5-4) and (4.5-5). Let us suppose that the equivalentproblem is describedby

enables rewriting Eq. (4.5·10) in the form

(4.5·13)

Xk = <l>k-1Xk-l +lYk-l (4.5-17)

(4.5-18)

In this manner, the need to differentiate the measurements is avoided. Figure4.5-1 is a block diagram of the corresponding optimal estimator. Regarding theinitial estimate and error covariance matrix, note that the instant aftermeasurement data are available(t == 0+), the following discontinuities occur,

Although the measurement noise is zero (hence, Rk = 0 and Rk-1 does not

exist), apparently there is no difficulty because the discrete Kalmangain matrixdoes not explicitly involve Rk-

l . However, consider the expression for updatingthe error covariance mattix (Table 4.2-1),

Page 73: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

136 APPLIED OPTIMAL ESTIMATIONOPTIMAL LINEAR FILTERING 137

(4.5-19)Therefore

It is easy to see from this expression that(4.6-5)

(4.5-20)and the equivalent linear system equations are:

By denoting the transition matrix for this linear system as <I> (namely, <l>(to+r,to)= <l>(r) = eMT), we can write

Pk(+) must be singular, of course, since certain linear combinations of the statevector elements are known exactly. Error covariance extrapolation is performedaccording to (fable 4.2-1)

(4.5-21)

Therefore it follows that if 4\: -- I [i.e., as it would be, in a continuous systemwhich is sampled at points close in time relative to the system time constants)and if Ok is small, the covariance update may become ill-conditioned [Le.,Pk+l(-) ~ Pk(+)]. To alleviate this problem. measurement differencing may beemployed, which is analogous to measurement differentiation previouslydiscussed. Details of this procedure are available in Ref. 12.

4.6 SOLUTION OF THE RICCATI EQUATION

[~] = [G:: HT:IH][~]M

[~y~r~! ~y~(rB rr(to)]

L<I>Ay(r) I <I>,,(r)] lA<to)

<l>(r)

(4.6-6)

(4.6-7)

The optimal filtering covariance equations can only be solved analytically forsimple problems. Various numerical techniques are available for more corn­plicated problems. Numerical integration is the direct method of approach fortime-varying systems (i.e., when the matrix Riccati equation has time-varyingcoefficients). For an nth order system, there are o(n+I)/2 variables involved [thedistinct elements of the n X n symmetric matrix P(t)].

Systems for which the matrix Riceati equation has constant (or piecewise­constant) coefficients, or can be approximated as such, can be treated asfollows For the nonlinear equation

P= FP + PFT + GQGT - PHTK"1HP, p(t o) given

the transformations (Ref. 5)

and

(4.6-1)

(4.6-2)

where 4t(T) is shown partitioned into square n X n matrices. Writing out theexpressions for r(to+r) and ~to+r) and employing Eq. (4.6-2) yields

(4.6-8)

Thus, the original nonlinear matrix Riccati equation has been converted to anequivalent, albeit linear, matrix equation and solved. Once 4>(T) is computed,P may be generated as a function of time by repeated use of Eq.(4.6-8). Thesolution to the Riccati equation is obtained without any truncation errors, and issubject only to roundoff errors in computing 4>. Note the importance ofperiodically replacing P with its symmetric part, (P + pT)/2, to avoid errors dueto asymmetry. This method is often faster than direct numerical integration.Although this technique is equally applicable to general time-varying systems, M= M(t); thus, <I> is a function of both to and r in these cases. The added difficultyin computing <l>(to+r,to) is usually not justifiable when compared to othernumerical methods of solution.

Solution of the linear variance equation

(4.6-3) P=FP + PFT + GQGT , P(to) given (4.6-9)

result in a system of linear differential equations. Using the above, we compute

~=Pr+Pi

= F'1: + PFTz+ GQGTz- PHTK"t HPr - PFTr + PHTR- t HPr (4.6-4)

is easily obtained by noting that HTK"t H =0 in this case and thus <l>yy(r) =(e-PT)T, <I>,,(r) =eFT, and <l>YA(r) =O. The desired result, obtained from Eq.(4.6-8), is

(4.6-10)

Page 74: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

138 APPLIED OPTIMAL ESTIMATiON OPTIMAL LINEAR FILTERING 139

and when all the eigenvalues of F have negative real parts, the unique steadystate solution of the linear variance equation is

(4.6-11)

Example 4.6-1

For a certain class of integrating gyroscopes. all units have a constant but a priofiunknown drift rate, e. once thermally stabilized. The gyroscopes are instrumented tostabilize a single-axis test table. Continuous indications of table angle, 8, which is a directmeasure of the integrated gyroscope drift rate, are available. The table angle readout hasanerror. e, which is well described by an exponential autocorrelation function model ofstandard deviation a (sec) and short correlation time T (sec). Design an efficient real-timedrift rate test data processor.

The equation of motion of the test table, neglecting servo errors, is

and the measurement is expressed as

(see Fig. 4.6-0. Four methods of attack now suggest themselves.

(4.6-16)

8=et+8o

and the measurement is described by

z » 8 +e

(4.6-12)

(4.6-13)

FigUJe4.6-1 Model of the System and Measurement

Method 1 - Having identified the system and measurement equations, the Riccatiequation

p= FP+ PFT - PHTR1HP

with

(4.6-14)

where w is a white noise of zero mean and spectral density q = 202rr(sec2 /sec).There are two ways to formulate the data processor. One is to augment the two

differential equations implied by Eq. (4.6-12) with Eq, (4.6-14), resulting in a third-ordersystem (three-row state vector), and to proceed as usual. Another is to approximate therelatively high frequency correlated noise by a white noise process and thus delete-Eq,(4.6-14). resulting in a second-order system. In this case the spectral density of the noise(now directly representing e) is r =2a 2T (sec2sec). We proceed using the latter approach.

The differential equations corresponding to Eq. (4.6-12) are:

8=e

E=O

must be solved for P(t). Note the absence of the term GQGT in this case. With F and H asgiven in Eqs. (4.6-15) and (4.6-16), the Riccati equation becomes

r~" ~I~ ~12 P'1 + rp" ~ _+ rp,,'r12 P2~ lo oJ l02, ~ ~"PI'

PII =2Pl2 - PII 2/ r

Pl2 =P22 - Pl1P12/r

P22 =-PI22/r

which is a set of coupled nonlinear differential equations. While this set of equations can besolved directly (albeit, with considerable difficulty), there are better methods of approach.

[:} [: :][:]--F---;:-

(4.6-15)

Method 2 - Following Eq. (4.6-6), form the matrix M to obtain

to 0 I 11, 0]

M= =;-~i ~-io 0 I 0 0

Page 75: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

140 APPLIED OPTIMAL ESTIMATION OPTIMAL LINEAR FILTERING 141

(4.6-17)

(4.6-18b)

Fip.re 4.6-2 Optimal Filter Configuration

[~11 ~l~ = _ fa" a12l fo IJ _ro °l [a" a12l + fl: :]ara a22J ~12 a22J Lo 0 ~ oj al2 anJ L

Method 3 _ Notice that there is no noise term driving the state vector di!Tere~tialequation. In this case the nonJineor Riccati equation in P is identical to the folIowmg lineardifferential equation in p-I

p-I "" -P'"I F _ FTP'"I +HTR-IH

Equivalence is readily shown by premultiplying.apd postm~tiplYing both sidesl of th.~equation by P and noting the relationship P :: -PI' P. Denoting the elements of P'" byaij.

we write

tP22(0)+t2PII(0)P22(0)12]

P2,(O)+tp" (O)P22(O)I,

The partitioned matrices 4»yy. 4»yA,4»AY and 4»11.11. are readily identified. With to::0 and T::

t, and assuming P(O) is diagonal,

fpll(O) 0]P(O)· l 0 P2,(O)

Eq. (4.6-8) yields

I [PI I (O)+t 2p22(O) +t3

PII (O)P12(0)13r

P(O = "(0 ~P22(0)+"P,,(0)P22(0)/2'

where

lbis is the complete answer for P(t). K(t) can now be computed and the optimal filter hasthus been determined. In the case where there is no a priori information. PI I (0) =P22(0) - 00 and the limit of the above result yields

The transition matrix corresponding to M is computed next. In this case the matrixexponential series is finite, truncating after four terms, with the result

[

I 0 I TI' T'/2'JI, 3-T 1 I -T 12r -T 16r

4»(T) = --- - - ----o 0 I I T

o 0 I 0 I

(4.6-18a)

P ~/t 6/t'J(0='

6/t' 12/t3

The Kalman gain is thus given by

(4.6-19)

[I I' -er r J

:: -e r r - 2a12

and the optimal filter is as illustrated in Fig. 4.6-2.In the limit as t_oo • our theory says that K(t)-+O corresponding to the covariance matrix

limit Pct) - O.This steady state would not be one of ideal knowledge if there were any othererror sources driving Bq. (4.6-15). Although we have not modeled any such errors, it iscertain that they will indeed exist and hence influence the drift rate test. Therefore. thetheoretical prediction P(t)-+O should be viewed as unrealistically optimistic. and should inpractice be a(ljusted to account for other expected noise terms.

Equivalently.

all:: I/r

The solutions are.

all =:+al1(O)

Page 76: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

142 APPLIED OPTIMAL ESTIMATiON OPTIMAL LINEAR FIL TERING 143

information enters the system (P~HTI<I HP_), and (b) the system dissipation dueto damping (expressed in F) (Ref. 6).

The corresponding steady-state optimal filter is given by

In the case where P(O) is given by Bq. (4.6-17), the corresponding elements of 1'"1(0) area,,(O) = I!p,,(O), a,,(O) = lip" (0) and a12(O)= O.Thus,

[

til + IIp,,(O) -t'/21 - tip " (0) 1I'"I(!) =

-t'/2r - tip" (0) t3/31 + t'/p" (0) + IIp:12(O) £(1) = Fj(t) +!\., Wt) - Hj(t)] (4.7-2)

which, upon Inversion, yields Eq. (4.6-18). where Koo is constant (Koo = PooHTK"I). This equation may be rewritten

Method 4 - It has been mentioned that the choice of state variables is not unique. Thepresent example serves well for exploration of this point. Suppose we use. as an alternativesystem state vector,

&(t) - (F - KJ;I)&(t) = !\.,!\,t) (4.7-3)

Laplace transforming both sides and neglecting initial conditions yields

(sl - F + KJ;I>&(s)= K_!\,s) (4.7-4)

Briefly, it follows that where s is the Laplace transform variable. Thus,

H=[ltl,F=O (4.7-5)

and consequently that

pa-I= ! [I t J

r t t 2

The quantity in brackets (representing the transfer function), which operateson .l:(s) to produce j(s), is the Wiener optimal filter. For example, in the scalarcase, Eq. (4.7~5) may be written

REFERENCES

1. Kalman, R.E., "New Methods in Wiener Filtering," Chapter 9, Proc. of the FirstSymposium on Engineering Applications of Random Function Theory and Probability,John Wiley and Sons, Inc., New York, 1963.

2. Kalman, R.E., "A New Approach to Linear Filtering and Prediction Problems," JournalofBasic Engineering (ASME), Vol. 820, March 1960, pp. 35-45.

3. Smith, G.L., "The Scientific Inferential Relationships Between Statistical Estimation,Decision Theory. and Modem Filter Theory," Proc. JA ce, Rensselaer Polytechnic Inst.,June 1965, pp. 350-359.

which is the optimum filter in conventional transfer function form.Underlying Wiener filter design (Ref. 13), is the so-called Wiener-Hop!

(integral) equation, its solution through spectral factorization and the practicalproblem of synthesizing the theoretically optimal filter from its impulseresponse. The contribution of Kalman and Bucy was recognition of the fact thatthe integral equation could be converted into a nonlinear differential equation,whose solution contains all the necessary information for design of the optimalfilter, The problem of spectral factorization in the Wiener filter is analogous tothe requirement for solving n(n+l)/2 coupled nonlinear algebraic equations inthe Kalman filter [Eq. (4.7·1)].

(4.7-6)x(s) _ k_z[s) - s+(k_ h-f)

(4.7-1)

Integrating and inverting, asswning no a priori infotmation (i.e.• ,-1 (0) = 0), yields

[4/t -6/t'Jp - I

a - _6/t2 12/t 3

4.7 STATISTICAL STEADY STATE-THE WIENER FILTER

where Poo denotes the steady-state value of P. In this steady state, the rate atwhich uncertainty builds (GQGT) is just balanced by (a) the rate at which new

This result can be reconciled with Eq. (4.6-19). The procedure is left as an exercise for thereader.

In the case where system and measurement dynamics are linear, constantcoefficient equations (F, G, H are not functions of time) and the driving noisestatistics are stationary (Q, R are not functions of time), the ftltering processmay reach a "steady state" wherein P is constant. Complete observability hasbeen shown (Ref. 5) to be a sufficient condition for the existence of asteady-state solution. Complete controllability will assure that the steady-statesolution is unique. Thus, for P= 0, we have

Page 77: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

144 APPLIED OPTIMAL ESTIMATION

4. Laning, J.H., Jr. and Battin, R.H., Random Processes ill Automatic Control,McGraw-Hill Book Company, lnc., New York, 1956, pp. 269-275.

5. Kalman, R.E. and Bucy, R., "New Results in Linear Filtering and Prediction," Journalof&sic Engineering(ASME), Vol. 83D, 1961, pp. 95-108.

6. Bryson, A.E., Jr., and Ho, Y.C., Applied Optimol Control, Blaisdell PublishingCompany, Waltham, Mass., 1969, pp. 366-368.

7. Sorenson, H.W. and Stubberud, A.R., «Linear Estimation -rheorv," Chapter 1 ofTheory and AppUcations of Kolman Filtering, C.T. Leondes, Editor, NATO AGARD­ograph 139, February 1970.

8. Kalman. R.E., Ho, Y.C., and Narendra, K.S., "Controllability of Linear DynamicalSystems," Contributions to Differential Equations, Vol. I, McMillan, New York, 1961.

9. Bryson, A.E.• Jr. and Johansen, D.E., «Linear Filtering for Time-Varying Systems UsingMeasurements Containing Colored Noise," IEEE Trans. on Automatic Control, Vol.AC-IO, No. I,January 1%5, pp. 4-10.

10. Deyst, J.J., "A Derivation of the Optimum Continuous Linear Estimator for Systemswith Correlated Measurement Noise," AIM Journal, Vol 7, No. 11, September 1969,pp.2116-2119.

11. Stear, E.B. and Stubberud, A.R., "Optimal Filtering for Gauss-Markov Noise," Int. J.Control, Vol. 8, No.2, 1968, pp. 123-130.

12. Bryson, A.E., Jr. and Henrikson, L.J., «Estimation Using Sampled Data ContainingSequentially Correlated Noise," J. Spacecraft end Rockets, Vol. 5, No.6, June 1968,pp.662-665.

13. Wiener. N., The Extrapolation, Interpolation and Smoothing ofStationary Time Series.John Wiley & Sons, Inc., New York, 1949.

14. Kasper, J.F., Jr., "A Skywave Correction Adjustment Procedure for Improved OMEGAAccuracy," Proceedings o[ the ION National Marine Meeting, New London, Con­necticut, October 1970.

15. Sutherland, A.A.• Jr. and Gelb, A., "Application of the Kalman Filter to Aided InertialSystems," Naval Weapons Center, China Lake, California, NWCTP 4652, August 1968.

16. Lee, R.C.K., Optimal Estimation, Identification, and Control, Research Monograph No.28, MIT Press. Cambridge, Mass. 1964.

17. Potter, J.E. and Fraser, D.C., "A Formula for Updating the Determinant of theCovariance Matrix," AIAA Journlll, VoL 5. No. 7. July L967, pp. 1352-1354.

PROBLEMSProblem 4-1

Repeat Example 1.0-1 by treating the two measurements (a) sequentially and (b)simultaneously, within the framework of Kalman filtering. Assume no a priori information.

Problem 4-2

Use the matrix inversion lemma to easily solve Problems I-I and 1·3 in .a minimumnumber of steps.

Problem 4·3

Repeat Problem 1-4 by reformulating it as a Kalman filter problem and considering (a)simultaneous and (b) sequential measurement processing. Assume no a priQl'; informationabout xo.

OPTIMAL LINEAR FILTIRING 146

Problem 4-4The weighted-least-squares estimate, i(-), corresponding to an original linear measure­

ment set!o. measurement matrix Ho and weightin@;matrix Ro is given by [Eq. (4.0-5»)

• (T-l)-I T-'!.(-)= Ho Ro Ho Ho Ro !:JJ

Suppose an additional measurement set,.b becomes available. Defining the followingmatrices for the complete measurement set,

the new estimate, .!(+). can be found as

i(+) =(H1TR 1-1H1)-1H 1TR1-

1.!1

Using the definitions for Hj , RI,.!, above, show that.!:(+) can be manipulated into the form(1"'(-)= HoTRo-1HO):

!(+) = !(_) + P(+)HT R-l l.1- H!(-»)

1"1(+) =1"'(_) + HTR-1H

Problem <Hi

Choose i to minimize the particular scalar loss function

and directly obtain the recursive. weighted-least-squares estimator,

g(+) =i(-) + P(+)HTR-' ~-Hi(-»)

1"'(+) =1"'(_) + HTR- 1H

Problem~

For the particulat linear vector measurement equation,.!:: H!.+,I. where.::: - NCQ.R) isindependent of!.o the conditional probability~ can be written as

Demonstrate that the estimate, .!. which maximizes p~l.!> (Le., the maximum likelihoodestimate) is found by minimizing the expression (l-Hil'1"""R- 1 (l-Hi). Show that

.& = (HTR-1H)-l HTR-11.

State the recursive form of this estimate.

Page 78: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

146 APPLIED OPTIMAL ESTIMATiON

Problem 4-7

Consider a gaussian vector ]i. where.! - N<i,(-), P(-», and a linear measurement,,!::H.!+J:,where the gaussian measurement noise.!. is independent of.! and!."" N<Q.,R).

a) Show that,! - N(H.!(-), HP(-)HT+R)

b) Show that the a posterioridensity function of.! is given by (Ref. 16)

c) By direct substitution into this relationship, obtain

{I [ - T -I - T -I

P~!I e c exp -1: !.&-.!.<-1I p (-l !.&-.!c.HI +~-Hil R ~-Hil

-1!.-Hi(-lITIHP(-lHT+Rfll!.-HiHI]}

where c is a constant.

d) Complete the square in the expression in braces to obtain the form

P<.2f.!!l e c exp {-i- (i~-E(+)ITP-I(+lI'-&(+)ll}

and thus, identify as the Bayesian maximum likelihood estimate (maximum a posterioriestimate)

.!(+):: !(-) + P(+) HTR-1l!-H.!(-»)

Y'I(+):: P-I(_) + HTR-IH

Problem 4-8

OPTIMAL LINEAR FILTERING 147

Problem 4-9

a) Can a Kalman filter separate two biases in the absence of a priori information? Given apriori information?

b) Can a Kalman filter separate two sinusoids of frequency Wo given no a prioriinformation? Given a priori information about amplitude and/or phase?

c) Can a Kalman filter separate two markov processes with correlation time T given no apnon information? Given a priori information?

Problem 4-10

A random variable, x, may take on any values in the range _00 to 00. Based on a sampleof k values, Xi, i=I.2, ... ,k, we wish to compute the sample mean, mk, and samplevariance, ~k2 • as estimates of the population mean, m, and variance, 0

2• Show that unbiasedestimators for these quantities are:

-2 1 ~ - 2"k = k-l l,.,(X;-D'Jc)

I-I

and recast these expressions in recursiveform.

Problem 4-11

A simple dynamical system and measurement are given by

Show that the optimal filter error variance is given by

For the system and measurement equations,

lY- N<Q,Ql

:c N<Q,R)

xe ax e w ,

a e bx e v ,

w - N(O,q)

v-N(O,r)

consider a linear filter described by

~=K'.! +K,!

where K' and K are to be chosen to optimize the estimate, x. First, by requiring that theestimate be unbiased, show that K':: F - KH, and thus obtain-

Next, show that the covariance equation of the estimation error is

p=(F - KH)P+ P(F _ KH)T+ GQGT+ KRKT

Finally, choose K to yield a maximum rate of decrease of error by minimizing the scalarcost function, J=trace[P), and find the result:

(apo + q) sinh I3t + ttPo cosh fltP(')=( 2 )

~po-a sinh,Jt+l3coshl3t

where

l3=a~l+b:qa r

Demonstrate that the steady-state value of p(t) is given by

p_=~ (l+~\t ::: )

independent of Po. Draw a block diagram of the optimal filter and discuss its steady-statebehavior.

Page 79: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

148 APPLIED OPTIMAL ESTIMATiONOPTIMAL LINEAR FILTERING 149

Problem 4-12

A second-order system and scalar measurement are illustrated in Fig. 4-1, wherew ..... N(O,q). Draw a block diagram of the Kalman optimal filter for this system. and showthat the steady-state Kalman gain matrix is

Repeat the calculation for the case where simultaneous measurements of XI and x2 aremade. Assume uncorrelated measurement errors.

Problem 4-14

The output of an integrator driven by white noise w [where w"'" NCO,q)J is sampledevery 4. seconds (where A"'tk+l r-tk '" constant), in the presence of a measurement noise, Vk(where Vk ..... N(O,ro)J. Assume that there is no a priOf'i information. Calculate Pk(+) andPk+1(-) for k = 0,1,2. etc. and thus. demonstrate that. for all k,

Pk(+) '" ro and Pk+1(-) = ro + q4. for q4.'> ro

and

for ro > q4.

Sketch and physically interpret error variance curves for each of these cases.

Figure 4-1 Example Second-Order System

Problem 4-13

Show that the optimal filter for detecting a sine wave in white noise, based upon themeasurement

Problem 4-15

Reformulate Example 4.24 using the alternate state vector 1!a = [0P(0) ov(O) oa(O)]T.Make appropriate arguments about initial conditions and employ the matrix inversionlemma to arrive at the following result:

Ps(T')' I I j0:1 T:1 r T T:1

I~,(f.~ ,~,~,§,~{ __ ~p,!-,o)__ -l __ =-'P.,,(O-,- __1 I I __op_'_. _1_. -~I _-~

~.t(T) t-PI~)P~(O~PJ~O)~PI~~ __ 2!!2(~ __

1- I I~_._I_.~L I I PII(O)P22(O) P:12(O) PII(O)

(sinrJ.e)traceP \5;: A:: 02

Specifically. show that (Oi 2< 02 for i = 1,2,3)

Zi= 8i + vi.

1 [ , 'T']Aa<n = PII(0)P22(O)P33(0) 0p +Pl1(O)+P22(O)T +P33(0)""4

and Pll(O) =EI'p'(O»), P22(O) =E['y'(O)]. p.,(O) =EI'a'(O»). Reconcile this resultwith the exact version of Eq, (4.2-25).

Problem 4-16

By measuring the line of sight to a star, a spacecraft stellar navigation syst~m .canmeasure two of the three angies comj,rising fl.. the navigation coordinate frame misalign­ment. For 6 = [6 1 62 63 1T, P(O) '" 0 1. demonstrate the value of a "single-star fix" (i.e., ameasureme";;tof 61 and ( 2), Then. assume another measurement on a different star (i.e.• ameasurement of 91 and 83), and thus, demonstrate the value of a "two-star fix." Assumethat each component 0(£ is observed with an uncorrelated measurement error, viz. (i=1. 2or 3)

where

Figure 4-2 Optimal Sine Wave Estimator

2(t-to) - sin 2(1-to)

[(t-tO)2 - sin2(t-to)J

2sin2( t- to )

k l1 ( t) =

z(t) = XI(1) cos(t-n + x2(t) sin(t-n + v(t)

where v -- N(O.r), is as shown in Fig. 4-2, where (Ref. 1)

Page 80: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

150 APPLIED OPTIMAL ESTIMATiON OPTIMAL LINEAR FILTERING 151

and

(two) 2

trace P star I'll ~ + 0:1'2 + 03'2

fix 2

This formula for updating the determinant of the covariance matrix has been shown to bevalid independent of whether Hk is square (Ref. 17).

Problem 4-20

Problem 4-17

A polynomial tracking futer is designed to optimally estimate the state of the systemdescribed by x(t) = O.given scalar measurements z - x + v, v ""'N(O,r). Assuming no a prioriinformation and measurements spaced T time units apart, show that (k :::: 1. 2, ...)

Find the optimal linear estimator for x, given that

v is uniform over (-1,t)z » x+ v,

Observations of a constant parameter x are made through a digital instrument withquantization levels of width q. A reasonable approach for "small" q is to model thequantlzer as a noise source whose distribution is uniform over (-qJ2, qJ2), and which isuncorrelated with x - r.e.

!,][

2k + 12t

Pk+I(+)· (Ie + l)(k+2) ~

and

[

2(2k+ I) J(k+ I)(k+ 2)

Kk+1 = 6

T(k+ I)(k+ 2)

Problem 4-21

Observations z of the constant parameter x are corrupted by multiplicative noise - i.e., ascale factor error, n,

Z=(1+fl)X

(Hint: It may prove useful to employ the relationship p(Ofl = lim el, and to solve byinduction.) E -+ 0

Problem 4-18

A system and measurement are given as,

!!: - N<Q,Q)

l; - N<Q,R)

Show that the optimal differentiator associated with a particular output of the system,

where

E[x:1}:::: ox'2, £[7/'2} =ofl'2

ElK! =El"l =Elox! =0

(a) Find the optimal linear estimate of x based on a measurement z(x =kz}. (b) What is themean square error in the estimate?

Problem 4-22

Design an optimal linear filter to separate a noise net) from a signal set) when thespectral densities for the signal and noise are given by:

is

Why is it incorrect to compute! by formingi -= ME and then differentiating the result?<Hint: this problem in Wiener fIltering can be solved as the steady-state portion of a Kalmanfiltering problem).

Problem 4-19 Problem 4-23

where Bk is the angular position at t :::: kT, and vk is the measurement error. Theuncertainties in initial conditions are described by

Consider a satellite in space which is spinmng at a constant, but unknown, rate. Theangular position is measured every T seconds, viz.:

k = I, 2, 3 ...Zk=8k+V)c,

Manipulate the discrete covariance matrix update equation into the form

and thus show that, when Hk is square and nonsingutar,

Page 81: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

152 APPLIED OPTIMAL ESTIMATION

E[602] ""(20deg)2, E[9oiJ ""(20deg/sec)2

E[B.]-EIB.] =EIB •••]-O

Write the system state equations and then the linear filter equations to give an optimalestimate of the position and velocity after each observation.

OPTIMAL LINEAR FILTERING 153

relative to the water is constant. but unknown with mean ms and variance as2; the currenthas a constant mean value me. The random component of current can be modeled asintegrated white noise of spectral density 'Ieand initial variance cc 2; position measurementsa:re made every T hours by radio or sextant methods. The errors in these readings areindependent of each other and have mean zero and variance aR 2 (miles)2; the initialposition estimate (T hours before the first measurement) is xo. with uncertainty ao2.

Problem 4-24An RC filter with time constant T is excited by white noise. and the output is measured

every T seconds. The output at the sample times obeys the equation

where

T

xj e e TXk_l+WJc._l, k = I, 2, ...

a) Set up the state equations for this system. Show clearly what differential equationsa:re needed.

b) Set up the filter equations to give a continuous estimate of position and the errorin this estimate. Be sure to specify the contents of all matrices and vectors, and allinitial conditions.

Problem 4-26

The differential equation for the altitude hold mode of an airplane autopilot is given by

Elx.] = I, E[X.'J = 2

{2 j = k

E(Wk! = 0, ElwjWk] = 0 j" k

The measurements are described by

h(tl + O.OO6h(l) + 0.003h(t) = 0.3Ihc(t ) + O.Olhelt)!

where h represents altitude and he is commanded altitude. The altitude command he ismodelled as a constant h CO plUSgaussian white noise 6hc<t) in the command channel

hell) = he. + .helt)

The constant hco is a normal random variable with statistics

where VIc is a white sequence and has the following probability density function:

Zk""Xk+VJt, k e I, 2, ... mean "" 10.000 ftvariance « 250,000 ft2

Noise in the command channel has the following statistics

f(v)

-2

Fijjure4-3

Find the best linear estimate (a) of Xl based on ZI, and (b) X2 based on ZI and Z2; when

Zl = 1.5and Z2 "" 3.0.

Problem 4-25Design a data processor to determine the position of a ship at sea. E~:nine ~ne

dimensional motion (e.g., North-South) under the following assumptions: the ship s velocity

.he - N(O, 400 ft' sec)

and 6he is independent of all other variables.Discrete measurements of altitude are available every 10 seconds and we wish to process

them to obtain the minimum variance estimate of altitude. The altitude measurementscontain random errors

z(lk) =h(lkl + Vh(lk)

where z(tk) is measured altitude and Vh(tk) is a white sequence of measurement errors,

Vh- N(O, 100 ft')

Determine the difference equations defining the minimum variance estimator of h(t).Write these equations out in scalarform.

Problem 4-27

Consider the circuit in Fig. 4-4. It has been constructed and sealed into the proverbialblack box. Capacitor C I has a very low voltage rating and it is desired to monitor the voltageacross C I to determine when it exceeds the capacitor limit.

Page 82: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

154 APPLIED OPTIMAL ESTIMATION OPTIMAL LINEAR FIL TERING 165

r(t) = R (R constant>

If Ul(t) = U2(t) = 0, these equations admit the solution

9(t) =wt (w constant)

X3 = R (9 -- wt),XI =r - R,

• ;(1) I I)8(1) = -28(1) r(l) + Lr(l) U2(1)

where R3

w2 =Go - i.e., circular orbits are possible. Let XI. X2, X3, and xa be given by the

relationships

itt) = r(t) 6(t)2 - ~o + Ul(t)r(l)

~---------------,,BLACK BOX 1

i "'+ ~ f"~' f', i 1L -.JR,'R,"OHM; c, 'c, "I

and show that the linearized equations of motion about the nominally circular solution are

Note that there is no process noise included in the above state equations.It is desired to measure these small orbital deviations from observations on the ground.

Two proposals are presented. (a) In an effort to keep the measurement stations rathersimple and inexpensive, only angle (X3) measurements will be made. However. the designerrealizes the very likely possibility of measurement errors and includes an optimal ItIter in hisproposal for estimating the states. The measurement may be represented as

The only measurement that can be made on this system is the output voltage, eo. However,thanks to an exceedingly good voltmeter, perfect measurements can be made of this voltageat discrete times. In order to estimate the voltage across C r, assume that u(t) can bedescribed as

U....N(O. 2 volt 2 sec)

Detennine an expression for the optimal estimate of the voltage across C I. Asswne thatthe system starts up with no charge in the capacitors. Plot the variance of the error in theestimate as a function of time, taking measurements every half second for two seconds.

Problem 4-28

["'(1)] [0~2(t) = 3w

2

X3(t) 04(1) 0

1 0OJ [Xl(ll [0 0] [u1(t)1o 0 2w X2(t) + 1 0 U2(t)J

o 0 1 x,(I) 0 0

-2w 0 0 X4(t) 0 1

The motion of a unit mass. in an inverse square law force field, isgoverned by a pair ofsecond-order equations in the radius r and the angle 9.

Z(t) = X3(t) + V3(t),

(b) The second design proposes to use measurements of range (Xt). In this case

It is your task to determine Which of these proposals is superior.

Problem 4-29

Consider the scalar moving average time-series model,

Figure 4-5

If we assume that the unit mass has the capability of thrusting in the radial direction with athrust ur and in the tangential direction with a thrust u2, then we have

zk=rk+rk_l

where {rkl is a unit-variance, white gaussian sequence. Show that the optimal one-steppredictor lor this model is (assume Po '= 1)

(Hint: use the state-space formulation of Section 3.4)

Page 83: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

5. OPTIMAL LINEAR SMOOTHING OPTIMAL LINEAR SMOOTHING 151

uncorrelated errors, since process and measurement noises are assumed white.This suggests that the optimal combination of .&(t)and '&b(t) will, indeed, yieldthe optimal smoother; proof of this assertion can be found in Ref. 1.

Three types of smoothing are of interest. In fixed-interval smoothing, theinitial and final times 0 and T are fixed and the estimate ,&(tIT)is sought, where tvaries from 0 to T. In FIXed-point smoothing, t is fixed and i(tIT) is sought as Tincreases. In FIXed-lag smoothing, i(T-.:lIT) is sought as T increases, with .:lheld fixed.

In this chapter the two-filter form of optimal smoother is used as a point ofdeparture. Fixed-interval, fixed-point and fixed-lag smoothers are derived for thecontinuous-time case, with corresponding results presented for the discrete-timecase, and severalexamplesare discussed.

5.1 FORM OF THE OPTIMAL SMOOTHER

Following the lead of the previous chapter, we seek the optimal smoother inthe form

For unbiased filtering errors, &.(t) and Xb(t), we wish to obtain an unbiasedsmoothing error, i(tIT); thus, we set the expression in brackets to zero. Thisyields

where A and A' are weighting matrices to be determined. Replacing each of theestimates in this expression by the corresponding true value plus an estimationerror, we obtain

Smoothing is a non-real-time data processing scheme that uses all measure­ments between 0 and T to estimate the state of a system at a certain time t,where 0 ~ t ~ T. The smoothed estimate of x(t) based on all the measurementsbetween 0 and T is denoted by &(tIT). An optimal smoother can be thought ofas a suitable combination of two optimal filters. One of the filters, called a"forward filter," operates on all the data before time t and produces theestimate i(t); the other filter, called a "backward filter," operates on all the dataafter time t and produces the estimate ib(t). Together these two filters utilize allthe available information; see Fig. 5.D-1. The two estimates they provide have

.&(tIT); A.&(t)+A'~(t)

&(tIT); [A + A' - I] x(t) + A&(t) +A'ib(t)

(5.1-1)

(5.1-2)

BACKWARD FILTER

~~b

(5.1-3)

and, consequently,

o ----------­I It T

~~

FORWARD FILTER

i(tlT); A&(t) + (I - A) ~(t)

Computing the smoother error covariance, we find

P(tIT); E[&(tIT) &T(tIT)]

; AP(t) AT + (1- A) Pb(t)(I- A)T

(5.1-4)

(5.1-5)

Fip.re 5.6-1 Relationship of Forward and Backward Filterswhere product terms involving &(t) and ib(t) do not appear. P(tIT) denotes thesmoother error covariance matrix, while P(t) and Pb(t) denote forward andbackward optimal filter error covariance matrices, respectively.

Page 84: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

158 APPLIED OPTIMAL ESTIMATION OPT1MAL LINEAR SMOOTHING 159

OPTIMIZATION OF THE SMOOTHER

Once again. following the previous chapter, we choose that value of A whichminimizes the trace of P(t1T). Forming this quantity, differentiating withrespect to A and setting the result to zero, we find

SMOOTHING Pit I T)

II

I5 I~ /z FORWARD BACKWARD /~ FILTERING FILTERING /

~ "'" Pili Pb//lt/I /

~"~ ~/5 '........ __/'

I-- -------------ii'---- -7---;....;.""'--~'-'--'---~

(5.1-6)

(5.1-8)

(5.1-7)

or

0; 2AP + 2(1 - A)Pb(-1)

Inserting these results into Eq. (5.1-5), we obtain

and, correspondingly

TIME ------+

By systematically combining factors in each of the two right-side terms of thisequation, we arrive at a far more compact result. The algebraic steps aresketched below,

P(tlT); Pb(p+Pbrl p (I+p b'l pt' +P(P+Pbll Pb(P'IPb + If'

; Pb(P+ Pbrl (p-I +Pb'lr1

+P(P+Pbll (p' l +pb'l)

; (p-I +pb'lf' (5.1-10)

or

Figure 5.1-1 Advantage of Performing Optimal Smoothing

REINTERPRETATION OF PREVIOUS RESULTS

It is interesting to note that we could have arrived at these expressions byinterpretation of optimal fllter relationships. In the subsequent analogy,estimates .&b from the backward filter will be thought of as providing"measurements" with which to update the forward filter. In the corresponding"measurement equation," H =I, as the total state vector is estimated by thebackward filter. Clearly, the "measurement error" covariance matrix is thenrepresented by Pb' From Eq (4.2-19), in which pk'l(_) and pk'

I(+) are nowinterpreted as p-I (t) and p-I [tl'F), respectively, we obtain

(5.1-1 I)(5.1-13)

From Eq. (5.1-11), P(tlT) <; Pit), which means that the smoothed estimate of~(t) is always better than or equal to its filtered estimate. This is shown gra~h­

ically in Fig. 5.1·1. Performing similar manipulations on Eq. (5.1-1), we find

i(tIT); Ai(t) + (I - A) ib(t)

; Pb(P + Pbll i(t) + PIP + Pbll ib(t)

; (p-I +Pb'lr1

P-li(t)+ (p-I +Pb'lr' Pb"ib(t)

; P(tIT) [p-I (t) i(t) + pb'l (t) ib(t)]

Equations (5.1-1 I) and (5.1-12) are the results of interest.

(5.1-12)

Equations (4.2·16b) and (4.2-20) provide the relationships

Kk ; Pk(+)HkTRk" ->P(tIT)Pb'l(t)

(I - KkHk): Pk(+) pk'I(-) ->P(tIT) P-I(t)

which, when inserted in Eq. (4.2-5) yield the result

(5.1-14)

where ik(-) and ik(+) have been interpreted as i(t) and i(tIT), respectively.Thus, we arriveat the same expressions for the optimal smoother and its errorcovariancematrix as obtained previously.

Page 85: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

160 APPLIED OPTIMAL ESTIMATION

5.2 OPTIMAL FIXED·INTERVAL SMOOTHER

The forward-backward filter approach provides a particularly simple mech­anism for arriving at a set of optimal smoother equations. Other formulations arealso possible, one of which is also presented in this section. For the moment, werestrict our attention to time-invariant systems.

OPTIMAL LINEAR SMOOTHING 161

(5.2-8)

but the boundary condition on Xb(T) is yet unknown. One way of avoiding thisproblem is to transform Eq. (5.2-6) by defining the new variable

FORWARD-BACKWARD FILTER FORMULATION OF THEOPTIMAL SMOOTHER

For a system and measurement given by

where, since Xb(T) is finite, it follows that

.&:t=T)=o. or '&:7=0)=0.

(5.2-9)

(5.2-10)

~-N(o.,Q)

y- N(Q,R) (5.2-1)Computational considerations regarding the equations above lead us to theirreformulation in terms ofPb- l

• Using the relationship

the equations defining the forward filter are, as usual,

P=FP + PFT + GQGT - PHTR"' HP, P(O) =p.

&= FR +PHTR"' [~- Hi), &(0)=&. (5.2-2)

(5.2-3)

d P -, _ P _I (d P ) P -,liT b -- b liT b b

Eq. (5.2-7) can be written as

(5.2-11)

The equations defining the backward fllter are quite similar. Since this filter runsbackward in time, it is convenient to set T = T - t. Writing Eq. (5.2-1) in termsof T, gives"

(5.2-12)

for which Eq. (5.2-8) is the appropriate boundary condition. Differentiating Eq.(5.2-9) with respect to 7 and employing Eqs. (5.2-6) and (5.2-12) andmanipulating, yields

for 0 :s;;; T :s;;; T. By analogy with the forward filter, the equations for thebackward filter can be written changing F to -F and G to -G. This results in

=-FJ>-G~ (5.2-4)

(5.2-5)

(5.2-6)

(5.2-13)

for which Eq. (5.2-10) is the appropriate boundary condition. Equations(5.1-11, 12) and (5.2-2, 3,12,13) define the optimal smoother. See Table 5.2-1,in which alternate expressions for X(tIT) and P(tIT), which obviate the need forunnecessary matrix inversions, are also presented (Ref. 1). These can be verifiedby algebraic manipulation. The results presented in Table 5.2-1 are for thegeneral, time-varying case.

(5.2·7)

At time t = T, the smoothed estimate must be the same as the forward filterestimate. Therefore, &(TIT) = &(T) and P(TIT) = P(T). The latter result, incombination with Eq. (5.1-11), yields the boundary condition on Pb-

I,

*In this chapter, a dot denotes differentiation with respect to (forward) time t. Differen­tiation with respect to backward time is denoted by dId".

ANOTHER FORM OF THE EQUATIONS

Several other forms of the smoothing equations may also be derived. One isthe Rauch-Tung-Striebel form (Ref. 3), which we utilize in the sequel. Thisform, which does not involve backward filtering per se, can be obtained bydifferentiating Eqs. (5.1-11) and (5.1-12) and using Eq. (5.2.12). It is given byEqs. (5.2-2) and (5.2-3) and"

*From this point on. aUdiscussion pertains to the general time-varying case unless statedotherwise. However, for notational convenience. explicit dependence of F. G. H. Q. Rupon t may not be shown.

Page 86: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

162 APPLIED OPTIMAL ESTIMATION OPTIMAL LINEAR SMOOTHING 163

TABLE 5.2-1 SUMMARV OF CONTINUOUS, FIXED-INTERVAL OPTIMALLINEAR SMOOTHER EQUATIONS, TWO·FILTER FORM

System Model !(I) = F(I)~(t) + G(I)\\,(t), \\,(1) -N[Q,Q(I)J

Measurement .!(t) =H(t)X(t) +y(t), y(l) - NIQ,R(I))Model

Initial Conditions EI,(O)J = '0. E(l!(O) - '0)(,(0) - 'o)TJ = Po

Other EI~(tl)yT(t2)J =0 for all t10 ta: R-1(t) existsAssumptions

Forward Filter i(,) = F(I),(I) + P(I)HT(I)R" (llI.(I) - H(t)'(O]. itO)=io

P(t) = F(t) P(I) + P(I)FT(I) + G(I)Q(t)GT(I)I

Error CovariancePropagation -P(I) HT(I) R" (I) H(I) P(I), P(O) = Po

Backward Filter -;; !(T-r)= [FT(T-r) - Pj,"(T -r)G(T -r)Q(T-r)GT(T-,)J!(T-o-)(T= T-t)

+ HT(T-T)R-1(T-Tl~(T -T), 1(0) =Q

Error Covariance .! Pb-1(T -T) = i»b- I (T-T)F(T-T) + FT(T -T)P1J-1rr -T)drPropagation

_Pj," (T -,JG(T -r)Q(T-r)GT(T -r)Pj," (T-,J(T=T-t)

+ HT(T _T)R-1(T-T)H(T -T), Pb" (0) = 0

Optimal ,(liT) = P(t1T) [p"' (t),(I) + ,(I)JSmoother = (I + P(I) Pj,"(t)] ,It) + P(IIT),(I)

Error Covariance P(t1T) = [..-' (I) + Pj," (t)r'Propagation = PIt) - P(I) Pj,"(I)[1 + P(')Pj," (1)1" P(I)

All smoothing algorithms depend, in some way, on the forward filteringsolution. Therefore, accurate filtering is prerequisite to accurate smoothing.Since fixed-interval smoothing is done off-line, after the data record has beenobtained. computation speed is not usually an important factor. However. sinceit is often necessary to process long data records involvingmany measurements.computation error due to computer roundoff is an important factor. Hence, it isdesirable to have recursion formulas that are relatively insensitive to computerroundoff errors. These are discussed in Chapter 8 in connection with optimalfiltering; the extension to optimal smoothing is straightforward (Ref. 1).

The continuous-time and corresponding discrete-time (Ref. 3) fixed-intervalRauch-Tung-Striebel optimal smoother equations are summarized in Table 5.2-2.In the discrete-time case the intermediate time variable is k, with final timedenoted by N. PklN corresponds to P(tIT), XklN corresponds to X(t!T) and thesingle subscripted quantities ik and Pk refer to the discrete optimal filtersolution. Another. equivalent fixed-interval smoother is given in Ref. 4 and thecase of correlated measurement noise is treated in Ref. s.

~ltlT)

Diagram of Rauch-Tung-Striebel Fixed-Interval ContinuousOptimal Smoother {t " T)

~(T)

Figure5.281

~ltl

(5.2·J4) _i(t[T) = Fx(tIT) + GQGTp" (t)[X(t[T) - X(t)]

(5.2·16)

P(tIT)= [F+GQGTp-'(t)] P(tIT)+P(tIT)[F+GQGTp"(t)]T -GQGT

(5.2·J5)

Equations (5.2·14) and (5.2·15) are integrated backwards from t = T to t = 0,with starting conditions given by X(T[T) = X(T) and P(TIT) = peT). Figure 5.2·1is a block diagram of the optimal smoother. Note that the operation whichproduces the smoothed state estimate does not involve the processing of actualmeasurement data. It does utilize the complete filtering solution. however. sothat problem must be solved first. Thus, fixed-interval smoothing cannot bedone real-time. on-line. It must be done after all the measurement data arecollected. Note also.that P(tlT) is a continuous time function even where P(t)may be discontinuous, as can be seen from Eq. (5.2·15).

SMOOTHABILITY

A state is said to be smoothable if an optimal smoother provides a stateestimate superior to that obtained when the final optimal fllter estimate isextrapolated backwards in time. In Ref. 2, it is shown that only those stateswhich are controllable by the noise driving the system state vector aresmoothable. Thus, constant states are not srnoothable, whereas randomlytime-varying states are smoothable. This smoothability condition is exploredbelow.

Consider the case where there are no system disturbances. From Eq. (5.2.14),we find(Q=O)

~(tIT) = Fx(tIT)

Page 87: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

164 APPLIED OPTIMAL ESTIMATiON

TABLE 5.2·2 SUMMARYOF RAVCH·TUN<rSfRIEBEL FIXED-INTERVALOPTIMAL SMOOTHER EQUATIONS

OPTIMAL LINEAR SMOOTHING 166

and

Continuous-Time p(tIT) = P(T) (5.2·21)

(See Table 4.3-1 for the required continuous optimal filter terms)

Smoothed State i('IT);F(t)i(lIT) + G(I)Q(I)GT(I)P"(,)[i(lIT) - i(t)]Estimate

where T is fixed, t <lO;; T. and g(TIT) = X(T).

Error Covariance P(IITl; [F(t) + G(')Q(llGT(t)P" ('lIP(IITlMatrix Propagation

+ P(IITl(F(,) + G(')Q('lGT(tjP"('l] T - G(')Q(I)GT(t)

where P(TIT) = p(n

for all t ~ T. That is, smoothing offers no improvement over filtering when F =Q = O. This corresponds to the case in which a constant vector is being estimatedwith no process noise present. Identical results clearly apply to the m constantstates nf an n'h order system (n > m); hence, the validity of the smoothabilitycondition.

Example 5.2·'This spacecraft tracking problem was treated before in Example 4.3-2. The underlying

equations are:

Part 1- The forward filter Riccati equation is (f=O, g=h=l)

which, in the steady state (P=O),yields p = ..Jfii = 0:. The backward filter Riccati equation isfrom Eq. (5.2-7)

and the steady-state optimal filter solution was shown to be pet) = 0:, where 0: = .Jrq.Examine the steady-state, fixed-interval optimal smoother both in terms of (l) forward­backward optimal filters, and (2) Rauch-Tung-Striebel form.

Discrete-Time

(See Table 42-1 for the required discrete optimal filter terms)

Smoothed State iklN = i k(+) + Aklxk+lIN - ik-+-l(-)]Estimate

where

Ak;PJc(+)<>k T!']c+1"(-), iNIN;iN(+)ro,k'N-1.

Error Covariance PJcIN; !']c(+) + Ak [PJc+lIN - Pk+1(-)1 AkTMatrix Propagation

where ~IN = PN(+) for k = N - I

The solution is (i(TIT) =geT))

x= w,

z = x + v,

w- N(O,q)

v e- N(O,r)

x(tlT) = <I>(t,T)i(T) (5.2-17) ~Pb=q-Pb2/rdr

Thus, the optimal fixed-interval smoother estimate, when Q = 0, is the finaloptimal filter estimate extrapolated backwards in time. The correspondingsmoothed state error covariance matrix behavior is governed by

Whichhas the steady state Pb = yrq = 0:. Thus, we find, for the smoothed covariance

P(tlT) = (p-I (t) -+- Pb-I(t»)-I

P(tIT) =FP(tIT) +P(tIT)FT

for which the solution is [P(TIT) =peT)]

(5.2·18)Q

2

which is half the optimal ruter covariance. Consequently,

p(tlT) =<I>(t,T)P(T)<l>T(t,T)

If, in addition, F =0, it follows that <I>(t,T) =1 and hence, that

(5.2·19) X(IIT); p(IIT) (X('l/p(,) + xb(t)/Pb(t)1

;+ [x(t) + xb(t)l (5.2-22)

&(tIT) =i(T) (5.2-20)The smoothed estimate of x is the average of forward plus backward estimates, insteady state.

Pact 2 - In Rauch-Tung-Striebel form, the steady-state smoothed covariance matrixdifferential equation (Table 5.2-2, T fixed, t " T) is

Page 88: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

166 APPLIED OPTIMAL ESTIMATION

",,,T) ~ ~P(tlTl - q

tor which the voluuon I~ (q/o = (1,P(TIT) = 0)

P(tIT) ==f (l + e- 211(T-t) , r c T

This result is plotted in Fig. 5.2-2. For T-t sufficiently large (i.e., T -t > 2/(1), thebackward sweep IS 10 steady state. In this case, we obtain P(tIT) = 01/2, as before. Thecorresponding differential equation for the smoothed state esnmare, from Table 5.2-2, is

This can be shown to be identical to Eq. (5.2-22) by differentiation of the latter withrespect to time and manipulation of the resulting equation.

OPTIMAL LINEAR SMOOTHING 167

Figure 5.2-3 Block Diagram of Gyroscope Drift Rate Model and Test Measurement

Consider the aspects of observabjlity and smoothability. The equations governing thissystem are

ITIME ----...

i<5~

"~ s,

>-

0g>

p Ltl __

T FIXED

t~ T

[::1. [~ 1 ~ ][::] + [w~]xJ 0 -I/T ~ W,

F ~

""d

'k = (I 0 0 01 [::] + vk~ X3

'4

Figure 5.2-2 Optimal Filter and Smoother Covanances for Example 5.2-1

Example 5.2-2

Thrs example describes an investigation of the applicability of fixed-interval optimalsmooJhing to gyroscope Jesting. In the resr considered. a gyroscope is mounted on a scrvoedturntable and samples of the table angle, which IS a measure of the integrated gyroscopedrift rate, e, are recorded. The gyroscope drift rate is assumed to be a linear combination ofa random bias b, a random walk, a random ramp (slope m), and a first-order markov process;this is shown in block diagram form in Fig. 5.2-3. The available measurements are thesamples, {lk, corrupted by a noise sequence, vk- and measurement data are to be batchprocessed after the test.

The test for observability involves determination of the rank of the matnx z (Section 3.5),where in this case

Z:=[HT: FTHT: (FT)2 HT! (FT)3 HT]I I I

= [~~ 0 ~]o 0 0

o -Alr I/T2

It is easily shown that the rank of z:IS four, equal to the dimension of the system; hence,the system is completely observable. Performing the test for observabrlity, as Illustratedabove, can lend insight into the problem at hand and can help avoid attempting impossible

Page 89: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

168 APPLIED OPTIMAL ESTIMATION OPTIMAL LINEAR SMOOTKlNG 169

o

#SMOOTHED ESTIMATE

-:~ filTERED ESTIMATEo ~ ACTUAL DRFT RATE

o 0

I 3,----.-------~- __.

ui

~ II~

'"08 o.N

<i 0

:2 08

~Z 0.7

00

Figure 5.2·5 Comparison of Real Data With Filtered and SmoothedEstimates (Ref. 7)

10 20

TIME {boors}

sequences, one representing the random walk input and one representing the sequence ofmea.sure.men~ error~ Figure 5.2-5 compares a sample of a "real" time history of state x3(which ID this case IS €, the entire drift rate) with filtered and smoothed estimates. The solidline connects the "real" hourly values of x3. The dashed line indicates the real bias drift b(the initial value) and the ramp slope m (the slope of the dashed line). The departure fromthe dashed line is due to the random walk component. Each filtered estimate of xa is basedon the partial set of measurements (z I to zk). Each smoothed estimate of X3 is based on thefull set of measurements (Zl to zN).

The relative advantage of the smoother is most apparent in the first two hours, where thefilter has only one or two data points to work with. The smoother is also more accurategenerally, throughout the 2a-hour time span due to its ability to "look ahead" at thesubsequent data points. The example also shows that the filter tends to lag the real datawhenever it takes a major "swing" up or down. The smoother does not exhibit this type ofbehavior.

A STEADY-STATE, FIXED-INTERVAL SMOOTHER SOLUTION

The backward filter Riccati equations [(5.2-7) and (5.2-12)] can be solved bytransforming the n X n nonlinear matrix differential equations to 2n X 2n linearmatrix differential equations, precisely as was done in Section 4.6. Therefore,this approach warrants no further discussion here. The linear smoothercovariance equation given in Eq. (5.2·15) can also be treated in a manner similarto that used in Section 4.6; this is briefly treated below. Defining thetransformations

2010TIME(hours)

COMPlETE SYSTEM(~: x)+ x41(T":15hoursl

°o!:---....L----f;;----'----,,',

tasks. For example, it is tempting to add another integrator (state variable) to the systemshown in Fig. 5.2·3, in order to separate the bias and random walk components, with thehope of separately identifying them through filtering and smoothing. When such afive-dimensional system is formulated, and the appropriate F(5 X 5) and H(I X 5) areconsidered, the resulting z (5 X 5) matrix has rank four. This five-dimensional system is notcompletely observable because two state variables have the same dynamic relationship to themeasured quantity; it is impossible to distinguish between the bias drift and the initialcondition of the random walk component. Thus, these two components should becombined at one integrator, as in Fig. 5.2-3, where no such distinction is made. In thisexample, the randomly varying states x I, x3, and x4 are smoothable; the constant state, xa­is not.

Figure 5.2-4 shows normalized estimation errors for the drift rate € over a 20-hourperiod, based upon drift rate samples taken once per hour. Results are shown for thecomplete system and for a simpler three-dimensional system which does not include themarkov process component of gyroscope drift rate. In the latter case, state x3 is the entiredrift rate, which is the sum of a ramp, a bias, and a random walk. The filtered estimate of x3is much improved over that in the four-dimensional situation, and reaches equilibrium afterapproximately four hours. The estimate at each point is evidently based primarily on thecurrent measurement and the four previous measurements. The fixed-interval smoothedestimate reduces the rms error by almost half, being largely based on the currentmeasurement, the preceding four measurements, and the subsequent four measurements.

A numerical example based on simulated real data is presented in order to graphicallyillustrate the difference between a set of filtered and smoothed estimates of gyroscope driftmeasurements. The example corresponds to the three-dimensional 'no-markov-process case.A set of "real" data is generated by simulating this three-dimensional linear system andusing a random number generator to produce initial conditions and two white noise

---+-- FilTER

~ SMOOTHER

Figure 5.2-4 Error in Filtered and Smoothed Estimates With and Without theMarkov Process (Ref. 7) A= p(tIT) X (5.2-23a)

Page 90: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

170 APPLIED OPTIMAL ESTIMATION OPTIMAL LINEAR SMOOTHING 171

and

(5.2-23b)&(tIT) = <I>,(t,T) &(T) - J; <I>,(t,T)GQGTp-I(T)&(T) dr (5.3-2)

we fmd where <I>,(t,T) is the transition matrix corresponding to F + GQGTp-l(t), that is,

Since the boundary condition for the smoothing process is specified at t = T, welet

Equation (5.3-2) is readily verified as the solution to Eq. (5.3-1) bydifferentiation with respect to t and use of Leibniz' rule.*

Now consider t fixed, and let T increase. Differentiating Eq. (5.3-2), makinguse of Leibniz' rule, we obtain

[i] [-[F+ GQGTp-l(t)]T 0 J[l]~ = -GQGT F + GQGTp-I(t) 8

(5.2-24)

4>,(t,T) = [F + GQGTp-'(t)] <I>,(t,T), <I>,(t,t) = I (5.3-3)

T=T- t

and useT as the independent variable in Eq. (5.2-24). Thus, we obtain

(5.2-25)d&~iT) = d<l>~~,T)&(T)+<I>,(t,T) d~i) - [Q-<I>,(t,T)GQGTp-I(T)&(T)+ Jl]

= -<I>,(t,T)[F+GQGTp-1 (T)] &(T)+ <I>,(t,T)jF&(T)+ K(T)WT)

- H(T)&(T)]} + <I>,(t,T)GQGTp-1(T)&(T)

= <I>,(t,T)K(T)WT)- H(T)&(T)] (5.3-4)

where we have used the known optimal filter differential equation for &(T), andthe relationship (T;;' t)

The latter is easily shown by differentiating the expression <I>,(t,T)<I>,(T,t) = Iwith respect to t and usingEq. (5.3-3).

To establish the fixed-point. optimal smoother covariance matrix differentialequation, write the solution to Eq. (5.2-15) as [P(TIT) = P(T)]

An expression similar to Eq. (4.6.8) may, in principle, be written for therecursive solution to the smoothing covariance equation. As before, thisformulation of the solution is only of practical value when all elements of thesquare matrix in Eq. (5.2-26) are constant. However, if the system is observableand time-invariant, p(t) tends to a stationary limit, and the constant value ofP(t), denoted p~, can be employed in Eq. (5.2-26). If the time interval underconsideration is sufficiently large and if Pco is employed in Eq. (5.2-26), thelatter would also have a stationary solution. In many cases of practical interest itis precisely the stationary filtering and smoothing solutions which are sought; inthese cases the solution to Eq. (5.2-26) is obtained in the form ofEq. (4.6-8) byiterating until changes in the diagonal elements of p(tlT) are sufficiently small.

d<l>,(t,T)= -<I>,(t,T)[F(T) + G(T)Q(T)GT(T)P-I (T)]dT

<I>,(t,t)= I

(5.3-5)

5.3 OPTIMAL FIXED-POINT SMOOTHER

When the smoothing solution is sought only for a specific time of interest, itis more efficient to reformulate the smoother equations to perform the task thanto accomplish it through use of the fixed-interval smoother. To do this, we firstwrite Eq. (5.2-14) in the form

p(tIT) = <I>,(t,T)P(l)<I>,T(t,T) - J; <I>,(t,T)G(T)Q(T)GT(T)<I>,T(t,T)dT

(5.3-6)

(5.3-1)

for which the solution is Li(TIT) = i(T)] • d jb(l) db(l) da(l) fb(l).

d- f(t,T)d"'=flt,b(t»)---.f{t,a(t)]_+ -f(t,T)dTt a(t) dt dt a(t) at

Page 91: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

172 APPLIED OPTIMAL ESTIMATION OPTIMAL LINEAR SMOOTHING 173

This, too, may be verified directly by differentiation. As before, we nowconsider t fixed and differentiate with respect to T to obtain

dP(IIT) = d<l>s(I,T)P(T"" T(I T) + <I> (I T) <!P(T) <I> T(IT) (5.3.7)dT dT J~S' S' dT s ,

+ <I>,(I,T)p(T)d<l>s:i,T) - (0 _ <I>,(I,T)GQGT<l>sT(I,T) + 0)

Example 5.3-1

Determine the steady-state behavior of the fixed-point optimal smoother for the systemconsidered in Example 5.2-1. The transition matrix required for the fixed-point optimalsmoother is governed (Table 5.3-1, t fixed, T;;lot) by

"'(',')=1

which ultimately simplifies 10Thus,

~~T) = -<I>s(I,T)P(T)W(T)R"(T)H(T)P(T)4>,T(I,T) (5.3-8)

(5.3-9)

Using this result in the equation for fixed-point optimal smoother covariance propagation(Table 5.3-1), we find

In applications, the fixed-point optimal smoother is perhaps most often usedto estimate the initial state of a dynamic system or process - i.e., orbit injectionconditions for a spacecraft. The continuous-time and corresponding discrete­time (Ref. 6) optimal fixed-point smoother equations are summarized in Table5.3·1.

dp('ID= _ --e-2jl(T-I) P('I!) =adT - ,

for which the solution,

p(tlT) =f (1 + e-213(T-t») (5.3-10)

TABLE 5.3· I SUMMARYOF FIXED-POINT OPTIMAL SMOOTHER EQUATIONS

for which the solution is computed forward in time from t until the present, T.

5.4 OPTIMAL FIXED·LAG SMOOTHER

Differentiation with respect to T and combining terms, yields

<l>L(T -.l,T)GQGTp-' (T)li(T)dT

(5.4·1)

x(tlt) = x(t)

j T - <>

li(T -.lIT) = <l>dT-.l, nsrn - T

In cases where a running smoothing solution that lags the most recentmeasurement by a constant time delay, .6., is sought, the fixed-lag smoother isused. The derivation closely follows that of Sec lion 5.3. From Eq. (5.3-2), with I=T- .l,wegel

is obtained directly by integration. This equation is identical to that plotted in Fig. 5.2-2.but the interpretation differs in that now t is fixed and T is increasing. When examining apoint sufficiently far in the past (l.e., T - t ;;. 2/M, the fixed-point optimal smoother error isin the steady state described by p(tlT) = 01./2, T > t + 2/13. The differential equation for thefixed-point optimal smoother state estimate (Table 5.3-1) is

Discrete-Time

(See Table 4.2-1 for the required discrete optimal filter terms)

Continuous-Time

(See Table 4.3-1 for the required continuous optimal filter terms)

Smoothed State d~d~Tl ="s(',TlK(T)I~(T)-H(T~(Tl)Estimate

d"~~,Tl= -"sT(',TlI F(T)+G(T)Q(T)GT(TlP-1rru

where t is fixed, T ;;. t, X(tlt) = Xm and 4ls(t,t) = I

Error Covariance dd~Tl= -"s(I,T)P(T)HT(T)R"'(T)H(T)P(T)"sT(I,TlMatrix Propagation

where P(tlt) = pet)

Smoothed State RklN = .EkIN-I + BN[RN(+) - RN(-)JEstimate where

N-l

BN= TT Ai, Ai = Pj(+)41i'fPi+I-1( - )

i=k

iklk = &k; N = k + I. k + 2 , ...

Error Covariance "kIN ="kIN-l + BN[Pk(+) -Pk( -Jl BNT

Matrix Propagation where"klk = "k(+)

Page 92: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

174 APPLIED OPTIMAL ESTIMATION

cJg(~~~IT)= F(T-~) + G(T-~)Q(T-~)GT(T-~)p-'(T-~)jg(T-~1T)

-G(T-~)Q(T-~)GT(T-~)P-'(T-~)g(T-~)

+ <l>L(T-~,T)K(T)[~T) - H(T)g(T») (5.4-2)

where the relationship (<I>L(O,~) = <1>,(0. ~»)

d<l>L(~~,n= [F(T-~)+G(T-~)Q(T-~)GT(T-~jp-'(T-A)]<I>L(T-~,n

- <l>LCT-~,T)[F(T) + G(DQ(T)GT(Tjp-' (T)] (5.4.3)

has been used (T;> ~). This is easily verified by differentiation of the expression

with respect to T. The initial condition for )f(T-~IT) is the flxed-pointsolution, g(OI.6.). When measurements are received they are processed by a fixed­point algorithm until T = ~, at which point the fixed-Jag algorithm is initializedand subsequently takes over. The filtering solution, of course, is carriedthroughout. The corresponding solution for the fixed-lag smoother covariancematrix is

+ p(T-~tT)[F(T-~)+ G(T-~)Q(T-~)GT(T-~jp-' (T-~») T

- <l>L(T-~.T)P(T)HT(T)R-'(T)H(T)p(T)<I>LT(T-~.T)

- G(T-~)Q(T-~)GT(T-~) (5.4-4)

for which the initial condition is P(OI~), the optimal fixed-point smoothing errorcovariance matrix evaluated at T = .6..

In practice, the fixed-lag optimal smoother is used as a "refined:' albeitdelayed, optimal filter, Applications in communications and telemetry, amongothers, are suggested. The continuous-time and corresponding discrete-time (Ref.6) fixed-lagsmoother equations are summarized in Table 5.4-1.

Example 5.4-1

Determine the steady-state behavior of the fixed-lag optimal smoother for the systemconsidered in Example 5.2-1. The transition matrix required for the fixed-lag optimalsmoother (Table 5.4-1, T - t = a is fixed, T > a) is

OPTIMAL LINEAR SMOOTHING 175

TABLE 5.4-1 SUMMARY OF FIXED-LAG OPTIMAL SMOOTHER EQUATIONS

Continuous-Time

(See Table 4.3-1 for the required continuous optimal filter terms)

Smoothed dg(T-"'IT) = [F(T_"') +G(T-"')Q(T-",jGT(T -"')P-'(T-"')]x(T -"'IT>State

fi -

Estimate -G(T-"')Q(T _"')GT(T-"')P-' (T-",)g(T-"')

+"L(T -"',T)K(T)[~(T) - H(T)g(T)]

where T ;;. a, a fixed, and i(OI.6o) is the initial condition obtained fromthe optimal fixed-point smoother, and

d<PL~~~ = [Lt T-.:,.)+ (,( I .l)t)(I .:,.)(;1(1 .l)p-l (I ~)j <PI) I .l.11o-r

-"L(T-"'.nIF(T) + G(T)Q(T)GT(T)p-'(T)J

where 4lL(O,.6o) = 4ls(O,.6o)

Error dP(T-"'IT) = [F(T_"') +G(T-"')Q(T _"'jGT(T_",)P-' (T-"')JP(T -"'IT)Covariance dT

Matrix + P(T-"'IT)[F(T-"') + G(T-"')Q(T-"')GT(T-"') P-' (T-",))TPropagation

- "UT-"',T)P(T)HT(T)R-' (T)H(T)P(T) "LT(T-"',n

- G(T-"')Q(T _"')GT(T-"')

where P(OI.6o) is the initial condition obtained from the optimalfixed-point smoother.

Discrete-Time

(See Table 4 2-1 for the required discrete optimal filter terms)

Smoothed ik+llk+l+N = 4lkiklk -N + Qk(41kT)-I Pk-I(+)[~klk+N - ik(+)}

State+ Bk+l +NKk+l +N[.ll+l +N -Hk+l +N41k+Nik+N(+)]Estimate

k+Nwhere Bk+I +N = TT Ai, Ai = Pi(+)41iTPi+I-1

(-)

l"'k+!

; k=O. 1. 2,. . and £(OIN) is the initial condition.

Error fJ<+llk+l +N '" PHI (-) - Bk+I+NKk+1 +NHk+I+NPk+l +N(- )Bk+I+NT

Covariance-Ak-'Ifk(+) - fklk+NJ (AkT)'Matrix

Propagationwhere the initial condition is P(OIN).

Page 93: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

176 APPLIED OPTIMAL ESTIMATION

under the assumption that p(T - .6.) =p(T) =Q. The solution, a constant, is given by theinitial condition obtained from fixed-point smoothing [Eq. (5.3-9»,

The fixed-lag smoother covariance matrix, therefore, behaves according to (Table 5.4-1)

dP(T~;iT). 2Pp(T-"'iT) _ q(l +e-2~"')

Employing the initial condition P(OI.6.) obtained from the fixed-point solution, Eq. (5.3-10),we find

Thus, for .6. sufficiently large (i.e., .6. > 2/13), the delayed estimate of the state has theaccuracy of the steady-state smoother. This. of course, is the reason for its utility. Thecorresponding delayed state estimate (Table 5.4-1) IS given oy

di(T-"'IT) ("" " )--d-T- • ~ x(T-"'iT) - x(T-"') +e-~"'(z(T) - x(T)1

subject to the initial condition, x(OI.6.), which is obtained from the optimal fixed-pointsmoother.

REFERENCES

l. Fraser, D.C., and Potter, LE., "The Optimum Linear Smoother as a Combination ofTwo Optimum Linear Filters," IEEE Tram. on Automatic Control, Vol. 7, No.8.August 1969, pp. 387-390.

2. Fraser, D.C., "A New Technique for the Optimal Smoothing of Data," Ph.D. Thesis,Massachusetts Institute of Technology, January 1961.

3. Rauch, H.E., Tung, F., and Striebel, C.T., "Maximum Likelihood Estimates of LinearDynamic Systems," AIAA Journal, Vol. 3, No.8, August 1965, pp. 1445-1450.

4. Bryson, A.E., and Frazier, M., "Smoothing for Linear and Nonlinear DynamicSystems," Technical Report, ASD-TDR-63-119, Wright Patterson Air Force Base. Ohio,September 1962. pp. 353-364.

5. Mehra, R.K., and Bryson, A.E., "Linear Smoothing Using Measurements ContainingCorrelated Noise with an Application to Inertial Navigation," IEEE Trans. onAutomatic Control, Vol. ACI3, No.5, October 1968. pp. 496-503.

6. Meditch, J.S .• Stochastic Optimal Linear Estimation and Control, McGraw-Hill BookCo•• Inc.• New York, 1969.

7. Nash, R.A., Jr., Kasper. J.F .• Jr., Crawford, B.S., and Levine, S.A., "Application ofOptimal Smoothing to the Testing and Evaluation of Inertial Navigation Systems andComponents," IEEE Trans.on Automatic Control, Vol. AC-16, No.6, December 1971.pp. 806-816.

OPTIMAL LINEAR SMOOTHING 177

8. Nash. R.A., Jr., "The Estimation and Control of Terrestrial Inertial Navigation SystemErrors Due to Vertical Deflections," IEEE TnJns. on Automatic Control, Vol. AC-13,No.4, August 1968, pp. 329-338.

PROBLEMSProblem 5-1

Choose If to minimize the scalar loss function

and directly obtain the forward-backward filter form of the optimal smoother,

£(tIT)' P(tlT)lp-1(1)£ (') + Ib-I(t)£ WlI

P(tlT) • Ip-I (I) + Ib-I (tll -I

Problem 5-2

Derive the Rauch-Tung-Strlebel smoother equations [Eqs. (5.2-14) and (5.2-15») byfollowing the steps outlined in the text.

Problem 5·3

For the variable Mt),defined by

£(tiT) = £(')- P(t) ~(t)

show that

where ~(T):: Q.For A(t):: E (~(t) ~T(t»), show that

P(tlT)' P(') - pet) A(t) pet)

where

and A(T) :: O.These are the Bryson-Frazier smoother equations (Ref. 4).

Problem 5-4

A scalar system and measurement are described by

x e ax e w, w ....N(O.q)

z :: bx + v, v - N(O,r)

(a) Show that the forward filter steady-state error covariance. obtained from the Riccatiequation by setting p<t) :: 0, is

P~=~(1 +)1 +~ )

Page 94: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

178 APPLIED OPTIMAL ESTIMATION

(b) Next, obtain the steady-state fixed-interval smoother error covariance (denotedPoo(tIT») by setting p(tlT) 0; O.as

P_(IITl;~2(1 +- p_l

q

(c) Show that this result can be written in the alternate form (y2 = b'q/a'r;;>o 0)

In this form it is apparent that the smoother error variance is always less than half thefilter error variance.

Problem 5-5

It is a fact that smoothed covarfances are not necessarily symmetric about the midpointof the sampling interval (see, for example, Ref. 8). A simple illustration of this point can bemade with the second-order system of Fig. 5-1, where w - N(O,2ao'). Assume that thesystem is in steady state, and that at t=O a perfect measurement orx, is made. Show that,while E[x l'(t» is symmetric with respect to t=O, E[X2'(t» is not symmetric with respectto t=O, viz:

E(x l'(t») = 0'(1 ~ e-2al tl)

E[X22(t»)=~ (1-~e20:t), rcotJ(a+tJ) a+tJ

=~(l M(a+tJ)e-at-2tJe-tJ1J2).tJ(a+tJ) (a+tJ)(a-tJ)' t;;>O 0

(Hint: It may prove simplest to seek estimates in the form X1(t) 0; kt(t) XI(O}, x,(t) =k2(t)x,.(O). and to choose kl(t) and k,(t) to minimize £[X1'(t)] and £(X2'(t»),respectively.)

Figure 5-1

Problem 5-6

The output of an integrator driven by white noise w (where w - N<o,q» is sampledevery a seconds (where a = tk+l-tk = constant) in the presence of a measurement noise vk

OPTIMAL LINEAR SMOOTHING 179

[where vk - N(O,ro». Assume that three measurements are made, corresponding to k ::0, I, 2. Further assume that there is no a priori information.

a) Show that a fixed-interval optimal smoother yields (7 0; q afro)

P1\,o;r oG::)1 +37+7'

POI' = ro (1 + 7)(3 + 7)

b) Check these results by formulating the optimal smoother in terms of a continuousoptimal forward niter updated at the measurement times by a continuous optimalbackward filter, and sketch the Itltering and smoothing covariances.

c) Show that a fixed-point optimal smoother for the initial condition yields

POll = ro(~ ::)

1 + 3y +-y'

POI' = ro (1 +-y)(3 +7)

Problem 5-7Design an optimal linear smoother to separate a noise net) from a signal set) when the

spectral densities for the signal and noise are given by:

1 2w 2

~ss(w) =~ , ~nn(w) =w 4 + 1-

(Hint: This problem in Wiener smoothing can be solved as the steady-state portion of anoptimal linear smoothing problem.)

Page 95: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

6. NONLINEAR ESTIMATION NONLINEAR ESTIMATION 181

operation on the measurement data - e.g., the Kalman filter algorithm.Consequently, there is little theoretical justification for using a different dataprocessing technique, unless a nonBayesian optimization criterion is preferred.By contrast,in the nonlinearproblem~(t) is generallynot gaussian; hence, manyBayesian criteria lead to estimatesthat aredifferentfromthe conditionalmean.In addition,optimal estimationalgorithms for nonlinearsystems often cannot beexpressed in closed form, requiring methods for approximating optimalnonlinearfilters,

One further complication associated with general nonlinear estimationproblems arises in the structure of the system nonlinearities. Theoreticaltreatments of this subject often deal witb a more general version of Bq. (6.0-1),namely

i(t) =.fiJI(t),t) +G(3(t),t)~(t) (6.Q.3)

This chapter extends the discussion of optimal estimation for linear systemsto the more general case described by the nonlinear stochastic differentialequation

The vectori is a nonlinearfunction of the state and ~t) is zero meangaussiannoise having spectral density matrix Q(t). We shall investigate tbe problem ofestlmating gft) from samplednonlinearmeasurements of the form

li(t) = fu(t),t) + ~(t) (6.Q.l)

where G{Jl(t),t) is a nonlinear matrix function of 3(t), and ~(t) is again(formally) a vector white noise process. In this case a theory for estimating3(t)cannot be developedwithin the traditional framework of meansquarestochasticcalculus because the right side of Eq. (6.0-3) is not integrable in the mean squaresense, owing to tbe statistical properties of the term G(3(t),t~t). TIllsdifficulty is overcome by formulating tbe nonlinear filtering problem within thecontext of Ito calculus (Refs. 1 and 2) which provides consistent mathematicalrules for integrating Bq. (6.0-3). However, a theoretical discussion of the lattertopic is beyond the scope of this book.

The main goal of thischapter is to provide insight into principles of nonlinearestimation theory which will be useful for most practical problems. In this spirit,we circumvent tbe matbematical issues raised by Bq. (6.0-3) using tbe followingargument. Most physical nonlinear systems can be represented by a differentialequationof the form

lie =hk{Jl(tk» +!k, k =1,2, ... (6.Q.2) (6.0-4)

where ~2(t) is gaussian white noise. Combining Eqs. (6.Q-4) and (6.0-5) anddefining

where 32(t) is a bandlimited (nonwhite) random forcing function havingbounded nns value - i.e., there is no such thing as white noise in nature. Weshall model 32(t) as a gaussian random process generated by tbe linear system

where hk depends upon both tbe index k and the state at each sampling time,andl::tk}is a white random sequence of zero mean gaussian random variableswith associated covariance matrices {R k}. Thisconstitutes a class of estimationproblems for nonlinear systems having continuous dynamics and discrete-timemeasurements.

For several reasons, the problem of filtering and smoothing for nonlinearsystems is considerably more difficult and admits a wider variety of solutionsthan does the linear estimation problem. First of all, in the linear gaussian casethe optimal estimate of .!(t) for most reasonable Bayesian optimization criteria isthe conditional mean defined in Eq. (4.Q.9). Furthermore, the gaussian propertyimplies that the conditional mean can be computed from a unique linear

32(t) = F2(t)32(t) + ~2(t) (6.0-5)

Page 96: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

182 APPLIED OPTIMAL ESTIMATION NONLINEAR ESTIMATiON 183

6.1 NONLINEAR MINIMUM VARIANCE ESTIMATION

THE EXTENDED KALMAN FILTER

Given the equations of motion and measurement data in Eqs. (6.0-1) and(6.0-2), we seek algorithms for calculating the minimum variance estimate of~(t) as a function of time and the accumulated measurement data. Recall fromChapter 4 that the minimum varianceestimate is always the conditional mean ofthe state vector, regardless of its probability density function. Now suppose thatthe measurement at time tk-l has just been processed and the correspondingvalue i(tk_l) of the conditional mean is known. Between times tk_l and tk, nomeasurements are taken and the state propagates according to Eq. (6.0-1). Byformally integrating the latter, we obtain

having the same form as Eq. (6.0-1). Because the white noise term in Eq. (6.0-6)is independent of x.(t), the manipulations associated with mean square stochasticcalculus can be applied. A more detailed discussion of this point is provided inReLI.

Within the framework of the model in Eqs. (6.0-1) and (6.0-2), this chapterconsiders some estimation criteria that lead to practical techniques forestimating the state of a nonlinear system. Sections 6.1 and 6.2 discuss filteringand smoothing algorithms for minimum variance estimators- i.e., those whichcalculate the conditional mean of jI(t). In Section 6.1 emphasis is placed onTaylor series approximation methods for computing the estimate. Section 6.2describes the use of statistical lineariz.ation approximations. Section 6.3 brieflytreats the topic of nonlinear least-squares estimation,a technique that avoids theneed to specify statistical models for the noise processes, wet) and Xk, in Eqs.(6.0-1) and (6.0-2). Finally, Section 6.4 discussesa practical analytical techniquefor analyzingnonlinear stochastic systems, based upon statistical lineanzationarguments.

(6.1-3)&(t)=1<3(t),t),

. ~SIt) = F(t) lI(t)

=F(t)&(t)

=I(i, t)

with the initial condition

where the dependence of X upon t, and 1 upon ~ and t, is suppressed fornotational convenience.

Equations (6.1-3) and (6.1-5) are generalizations of the propagation equationsfor the linear estimation problem. IfSIt) and PIt) can be calculated, they willprovide both an estimate of the state vector between measurement times and ameasure of the estimation accuracy. Now observe that the differential equationsfor &(t) and p(t) depend upon the entire probability density function' p(l[,t) forlI(t). Recall that for linear systems fu(t), t) = F(t) lI(t) so that Eq. (6.1-3)reduces to

where the caret (j denotes the expectation operation. Similarly, a differentialequation for the estimation errorcovariance matrix

PIt) ~ E [I&(l) - ~t)) [i(t) -lI(t)]~ (6.1-4)

is derived by substituting for x(t) in Eq. (6.1-4) from Eq. (6.1-1), interchangingthe order of expectation and integration, and differentiating. The result is

A Ap(t) =!J: - SoP' +f!T -1&T+Q(t), tk_ t <; t < tk (6.1-5)

Therefore, on the interval tk_1 <; t < tk' the conditional mean of jI(t) is thesolution to Eq. (6.1-2), which can be written more compactly as

That is, &(t) depends only upon F(t) and SIt); similarly, by substituting thelinear form for i into Eq. (6.1-5) it follows that PIt) depends only upon F(t),p(t), and O(t).·· Therefore, the differential equations for the estimate and itscovariance matrix can be readily integrated. However, in the more generalnonlinearcase

(6.1-1)

(6.0-6)

~t) = lI(tk_t)+f' i(lf.(T), T) dr +1' lY(T) drtk_1 tk_1

we obtain the augmented equations of motion

3(t) = [~~t~l = [~~I~)~'~t)~t~] + [_ ~ 13,(t)J F,(t) 2I,(t) lY,(t~

Taking the expectation of both sides of Eq, (6.1-1) conditioned on all themeasurements taken up until time tk_I' interchangingthe order of expectationand integration, and differentiating produces

[(l[,t)= L-it EIJI(t)) = E[f(lf.(t),t)], tk_l <;t < tk (6.1-2)

* The symbol N.t) is used to denote the probability density function in this chapter todistinguish it from the dynamic nonlinearities,.!),K, tj.

**See Problem 6-2.

Page 97: IIIIII - ULisboausers.isr.ist.utl.pt/~pjcro/temp/Applied Optimal Estimation - Gelb.pdf · questions of practical importance. The text is organized into three principal parts. Part

184 APPLIED OPTIMAL ESnMATION NONLINEAR ESTIMATION 185

Thus, in order to compute \hat{f}(x,t), p(x,t) must be known. To obtain practical estimation algorithms, methods of computing the mean and covariance matrix which do not depend upon knowing p(x,t) are needed. A method often used to achieve this goal is to expand f in a Taylor series about a known vector \bar{x}(t) that is close to x(t). In particular, if f is expanded about the current estimate (conditional mean) of the state vector, then \bar{x} = \hat{x} and

    f(x,t) = f(\hat{x},t) + \left.\frac{\partial f}{\partial x}\right|_{x=\hat{x}} (x - \hat{x}) + \cdots    (6.1-6)

where it is assumed the required partial derivatives exist. Taking the expectation of both sides of Eq. (6.1-6) produces

    \hat{f}(x,t) = f(\hat{x},t) + 0 + \cdots

The first-order approximation to \hat{f}(x(t),t) is obtained by dropping all but the first term of the power series and substituting the result into Eq. (6.1-3); this produces*

    \dot{\hat{x}}(t) = f(\hat{x}(t),t)    (6.1-7)

Similarly, an approximate differential equation for the estimation error covariance matrix is obtained by substituting the first two terms of the expansion for f into Eq. (6.1-5), carrying out the indicated expectation operations, and combining terms; the result is

    \dot{P}(t) = F(\hat{x}(t),t) P(t) + P(t) F^T(\hat{x}(t),t) + Q(t)    (6.1-8)

where F(\hat{x}(t),t) is the matrix whose ij-th element is given by

    F_{ij}(\hat{x}(t),t) = \left.\frac{\partial f_i(x(t),t)}{\partial x_j(t)}\right|_{x(t)=\hat{x}(t)}

Equations (6.1-7) and (6.1-8) are approximate expressions for propagating the conditional mean of the state and its associated covariance matrix. Being linearized about \hat{x}(t), they have a structure similar to the Kalman filter propagation equations for linear systems. Consequently they are referred to as the extended Kalman filter propagation equations. Higher-order, more exact approximations to the optimal nonlinear filter can be achieved using more terms in the Taylor series expansion for the nonlinearities, and by deriving recursive relations for the higher moments of x. Methods of this type are discussed subsequently. Other techniques that depend upon finding functional approximations to the conditional probability density function of x, and which are not treated here, are discussed in Refs. 19 and 20.

To obtain a complete filtering algorithm, update equations which account for measurement data are needed. To develop update equations, assume that the estimate of x(t) and its associated covariance matrix have been propagated using Eqs. (6.1-7) and (6.1-8), and denote the solutions at time t_k by \hat{x}_k(-) and P_k(-). When the measurement z_k is taken, an improved estimate of the state is sought. Motivated by the linear estimation problem discussed in Chapter 4, we require that the updated estimate be a linear function of the measurement - i.e.,

    \hat{x}_k(+) = a_k + K_k z_k    (6.1-9)

where the vector a_k and the "gain" matrix K_k are to be determined. Proceeding with arguments similar to those used in Section 4.2, we define the estimation errors just before and just after update, \tilde{x}_k(-) and \tilde{x}_k(+), respectively, by

    \tilde{x}_k(+) \triangleq \hat{x}_k(+) - x_k
    \tilde{x}_k(-) \triangleq \hat{x}_k(-) - x_k    (6.1-10)

Then Eqs. (6.1-9) and (6.1-10) are combined with Eq. (6.0-2) to produce the following expression for the estimation error:

    \tilde{x}_k(+) = a_k + K_k h_k(x_k) + K_k v_k - x_k    (6.1-11)

One condition required is that the estimate be unbiased - i.e., E[\tilde{x}_k(+)] = 0. This is consistent with the fact that the desired estimate is an approximation to the conditional mean. Applying the latter requirement to Eq. (6.1-11) and recognizing that E[x_k] = \hat{x}_k(-), we obtain

    a_k = \hat{x}_k(-) - K_k \hat{h}_k(x_k)    (6.1-12)

where \hat{h}_k(x_k) denotes E[h_k(x_k)]. Solving Eq. (6.1-12) for a_k and substituting the result into Eq. (6.1-9) yields

    \hat{x}_k(+) = \hat{x}_k(-) + K_k [z_k - \hat{h}_k(x_k)]    (6.1-13)

In addition, Eqs. (6.1-11) and (6.1-12) can be combined to express the estimation error in the form

    \tilde{x}_k(+) = \tilde{x}_k(-) + K_k [h_k(x_k) - \hat{h}_k(x_k)] + K_k v_k    (6.1-14)

*Up to this point \hat{x} has denoted the exact conditional mean; henceforth, \hat{x} denotes any estimate of the state that is an approximation to the conditional mean.



To determine the optimal gain matrix K_k, the same procedure used for the linear estimation problem in Section 4.2 is employed. First an expression is obtained for the estimation error covariance matrix P_k(+) in terms of K_k; then K_k is chosen to minimize an appropriate function of P_k(+). Applying the definition

    P_k(+) = E[\tilde{x}_k(+) \tilde{x}_k^T(+)]

to Eq. (6.1-14), recognizing that v_k is uncorrelated with \tilde{x}_k(-) and x_k, using the relations

    P_k(-) = E[\tilde{x}_k(-) \tilde{x}_k^T(-)]
    R_k = E[v_k v_k^T]

and assuming that P_k(+) is independent of z_k, we obtain

    P_k(+) = P_k(-) + K_k E[(h_k(x_k) - \hat{h}_k(x_k))(h_k(x_k) - \hat{h}_k(x_k))^T] K_k^T
             + E[\tilde{x}_k(-)(h_k(x_k) - \hat{h}_k(x_k))^T] K_k^T
             + K_k E[(h_k(x_k) - \hat{h}_k(x_k)) \tilde{x}_k^T(-)] + K_k R_k K_k^T    (6.1-15)

The estimate being sought - the approximate conditional mean of x(t) - is a minimum variance estimate; that is, it minimizes the class of functions

    J_k = E[\tilde{x}_k^T(+) S \tilde{x}_k(+)]

for any positive semidefinite matrix S. Consequently, we can choose S = I, and write

    J_k = trace[P_k(+)]    (6.1-16)

Taking the trace of both sides of Eq. (6.1-15), substituting the result into Eq. (6.1-16), and solving the equation

    \partial J_k / \partial K_k = 0

for K_k yields the desired optimal gain matrix,

    K_k = -E[\tilde{x}_k(-)(h_k(x_k) - \hat{h}_k(x_k))^T]
          \times { E[(h_k(x_k) - \hat{h}_k(x_k))(h_k(x_k) - \hat{h}_k(x_k))^T] + R_k }^{-1}    (6.1-17)

Substituting Eq. (6.1-17) into Eq. (6.1-15) produces, after some manipulation,

    P_k(+) = P_k(-) + K_k E[(h_k(x_k) - \hat{h}_k(x_k)) \tilde{x}_k^T(-)]    (6.1-18)

Equations (6.1-13), (6.1-17), and (6.1-18) together provide updating algorithms for the estimate when a measurement is taken. However, they are impractical to implement in this form because they depend upon the probability density function for x(t) to calculate \hat{h}_k. To simplify the computation, expand h_k(x_k) in a power series about \hat{x}_k(-) as follows:

    h_k(x_k) = h_k(\hat{x}_k(-)) + H_k(\hat{x}_k(-))(x_k - \hat{x}_k(-)) + \cdots    (6.1-19)

where

    H_k(\hat{x}_k(-)) = \left.\frac{\partial h_k(x(t_k))}{\partial x(t_k)}\right|_{x(t_k)=\hat{x}_k(-)}    (6.1-20)

Truncating the above series after the first two terms, substituting the resulting approximation for h_k(x_k) into Eqs. (6.1-13), (6.1-17), and (6.1-18), and carrying out the indicated expectation operations produces the extended Kalman filter measurement update equations

    \hat{x}_k(+) = \hat{x}_k(-) + K_k [z_k - h_k(\hat{x}_k(-))]
    K_k = P_k(-) H_k^T(\hat{x}_k(-)) [H_k(\hat{x}_k(-)) P_k(-) H_k^T(\hat{x}_k(-)) + R_k]^{-1}
    P_k(+) = [I - K_k H_k(\hat{x}_k(-))] P_k(-)    (6.1-21)

Equations (6.1-7), (6.1-8), and (6.1-21) constitute the extended Kalman filtering algorithm for nonlinear systems with discrete measurements. A summary of the mathematical model and the filter equations is given in Table 6.1-1; the extension of these results to the case of continuous measurements is given in Table 6.1-2. A comparison of these results with the conventional Kalman filter discussed in Section 4.2 indicates an important difference: the gains K_k in Eq. (6.1-21) are actually random variables depending upon the estimate \hat{x}(t) through the matrices F(\hat{x}(t),t) and H_k(\hat{x}_k(-)). This results from the fact that we have chosen to linearize f and h_k about the current estimate of x(t). Hence, the sequence {K_k} must be computed in real time; it cannot be precomputed before the measurements are collected and stored in a computer memory. Furthermore, the sequence of (approximate) estimation error covariance matrices {P_k} is also random, depending upon the time-history of \hat{x}(t) - i.e., the estimation accuracy achieved is trajectory dependent. The reader can verify that when the system dynamics and measurements are linear, the extended Kalman filter reduces to the conventional Kalman filter.
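The mechanization of these equations is compact. The following Python sketch mirrors the continuous-discrete structure summarized in Table 6.1-1; the user-supplied callables f, F, h, H and the single Euler integration step of length dt are illustrative assumptions, not prescriptions from the text.

    import numpy as np

    def ekf_propagate(xhat, P, f, F, Q, dt):
        # Eqs. (6.1-7) and (6.1-8), integrated by one Euler step (assumed scheme)
        Fx = F(xhat)
        xhat = xhat + f(xhat) * dt
        P = P + (Fx @ P + P @ Fx.T + Q) * dt
        return xhat, P

    def ekf_update(xhat, P, z, h, H, R):
        # Measurement update, Eq. (6.1-21), linearized about xhat(-)
        Hx = H(xhat)
        K = P @ Hx.T @ np.linalg.inv(Hx @ P @ Hx.T + R)
        xhat = xhat + K @ (z - h(xhat))
        P = (np.eye(len(xhat)) - K @ Hx) @ P
        return xhat, P

Because F and H are evaluated at the current estimate, the gain K is computed inside the update call at run time, which is precisely the real-time gain computation noted above.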



TABLE 6.1-1 SUMMARY OF CONTINUOUS-DISCRETE EXTENDED KALMAN FILTER

System Model:                 \dot{x}(t) = f(x(t),t) + w(t);  w(t) ~ N(0, Q(t))
Measurement Model:            z_k = h_k(x(t_k)) + v_k;  k = 1, 2, ...;  v_k ~ N(0, R_k)
Initial Conditions:           x(0) ~ N(\hat{x}_0, P_0)
Other Assumptions:            E[w(t) v_k^T] = 0 for all k and all t
State Estimate Propagation:   \dot{\hat{x}}(t) = f(\hat{x}(t),t)
Error Covariance Propagation: \dot{P}(t) = F(\hat{x}(t),t) P(t) + P(t) F^T(\hat{x}(t),t) + Q(t)
State Estimate Update:        \hat{x}_k(+) = \hat{x}_k(-) + K_k [z_k - h_k(\hat{x}_k(-))]
Error Covariance Update:      P_k(+) = [I - K_k H_k(\hat{x}_k(-))] P_k(-)
Gain Matrix:                  K_k = P_k(-) H_k^T(\hat{x}_k(-)) [H_k(\hat{x}_k(-)) P_k(-) H_k^T(\hat{x}_k(-)) + R_k]^{-1}
Definitions:                  F(\hat{x}(t),t) = \partial f(x(t),t) / \partial x(t) |_{x(t)=\hat{x}(t)}
                              H_k(\hat{x}_k(-)) = \partial h_k(x(t_k)) / \partial x(t_k) |_{x(t_k)=\hat{x}_k(-)}

TABLE 6.1-2 SUMMARY OF CONTINUOUS EXTENDED KALMAN FILTER

System Model:                 \dot{x}(t) = f(x(t),t) + w(t);  w(t) ~ N(0, Q(t))
Measurement Model:            z(t) = h(x(t),t) + v(t);  v(t) ~ N(0, R(t))
Initial Conditions:           x(0) ~ N(\hat{x}_0, P_0)
Other Assumptions:            E[w(t) v^T(\tau)] = 0 for all t and all \tau
State Estimate Equation:      \dot{\hat{x}}(t) = f(\hat{x}(t),t) + K(t) [z(t) - h(\hat{x}(t),t)]
Error Covariance Equation:    \dot{P}(t) = F(\hat{x}(t),t) P(t) + P(t) F^T(\hat{x}(t),t) + Q(t)
                              - P(t) H^T(\hat{x}(t),t) R^{-1}(t) H(\hat{x}(t),t) P(t)
Gain Equation:                K(t) = P(t) H^T(\hat{x}(t),t) R^{-1}(t)
Definitions:                  F(\hat{x}(t),t) = \partial f(x(t),t) / \partial x(t) |_{x(t)=\hat{x}(t)}
                              H(\hat{x}(t),t) = \partial h(x(t),t) / \partial x(t) |_{x(t)=\hat{x}(t)}


It should be noted that a linearized Kalman filtering algorithm, for which the filter gains can be precalculated, is developed if the nonlinear functions f and h_k in Eqs. (6.0-1) and (6.0-2) are linearized about a vector \bar{x}(t) which is specified prior to processing the measurement data. If the stages in the extended Kalman filter derivation are repeated, with \bar{x} substituted for \hat{x} throughout, the filtering algorithm given in Table 6.1-3 is obtained. Generally speaking, this procedure yields less filtering accuracy than the extended Kalman filter because \bar{x}(t) is usually not as close to the actual trajectory as is \hat{x}(t). However, it offers the computational advantage that the gains {K_k} can be computed off-line and stored in a computer, since \bar{x}(t) is known a priori.

TABLE 6.1-3 SUMMARY OF CONTINUOUS-DISCRETE LINEARIZED KALMAN FILTER

System Model:                 \dot{x}(t) = f(x(t),t) + w(t);  w(t) ~ N(0, Q(t))
Measurement Model:            z_k = h_k(x(t_k)) + v_k;  k = 1, 2, ...;  v_k ~ N(0, R_k)
Initial Conditions:           x(0) ~ N(\hat{x}_0, P_0)
Other Assumptions:            E[w(t) v_k^T] = 0 for all k and all t;
                              nominal trajectory \bar{x}(t) is available
State Estimate Propagation:   \dot{\hat{x}}(t) = f(\bar{x}(t),t) + F(\bar{x}(t),t) [\hat{x}(t) - \bar{x}(t)]
Error Covariance Propagation: \dot{P}(t) = F(\bar{x}(t),t) P(t) + P(t) F^T(\bar{x}(t),t) + Q(t)
State Estimate Update:        \hat{x}_k(+) = \hat{x}_k(-) + K_k [z_k - h_k(\bar{x}(t_k)) - H_k(\bar{x}(t_k))(\hat{x}_k(-) - \bar{x}(t_k))]
Error Covariance Update:      P_k(+) = [I - K_k H_k(\bar{x}(t_k))] P_k(-)
Gain Matrix:                  K_k = P_k(-) H_k^T(\bar{x}(t_k)) [H_k(\bar{x}(t_k)) P_k(-) H_k^T(\bar{x}(t_k)) + R_k]^{-1}
Definitions:                  F(\bar{x}(t),t) = \partial f(x(t),t) / \partial x(t) |_{x(t)=\bar{x}(t)}
                              H_k(\bar{x}(t_k)) = \partial h_k(x(t_k)) / \partial x(t_k) |_{x(t_k)=\bar{x}(t_k)}

Because the matrix P_k in Eq. (6.1-21) is only an approximation to the true covariance matrix, actual filter performance must be verified by monte carlo simulation. There is no guarantee that the actual estimate obtained will be close to the truly optimal estimate. Fortunately, the extended Kalman filter has been found to yield accurate estimates in a number of important practical applications.



Because of this experience and its similarity to the conventional Kalman filter, it is usually one of the first methods to be tried for any nonlinear filtering problem.

HIGHER-ORDER FILTERS

The Taylor series expansion method for treating the nonlinear filtering problem, outlined in the preceding section, can be extended to obtain higher-order nonlinear filters. One method of accomplishing this is to write the estimate, \hat{x}_k(+) in Eq. (6.1-9), as a higher-order power series in z_k. Another, more commonly used, approach is to include more terms in the expansions for f(x,t) and h_k(x_k) in Eqs. (6.1-6) and (6.1-19). These methods differ in that the latter seeks better approximations to the optimal filter whose structure is constrained by Eq. (6.1-9) - i.e., the measurement appears linearly; the former allows a more general dependence of the estimate upon the measurement data. If both techniques are applied simultaneously, a more general nonlinear filter will result than from either method alone. The discussion here is restricted to the case where the estimate is constrained to be linear in z_k.

The Iterated Extended Kalman Filter - One method by which the estimate \hat{x}_k(+) given in Eq. (6.1-21) can be improved is by repeatedly calculating \hat{x}_k(+), K_k, and P_k(+), each time linearizing about the most recent estimate. To develop this algorithm, denote the i-th estimate of \hat{x}_k(+) by \hat{x}_{k,i}(+), i = 0, 1, ..., with \hat{x}_{k,0}(+) = \hat{x}_k(-), and expand h_k(x_k) in Eqs. (6.1-13) and (6.1-17) in the form

    h_k(x_k) = h_k(\hat{x}_{k,i}(+)) + H_k(\hat{x}_{k,i}(+))(x_k - \hat{x}_{k,i}(+)) + \cdots    (6.1-22)

where

    H_k(\hat{x}_{k,i}(+)) \triangleq \left.\frac{\partial h_k}{\partial x_k}\right|_{x_k = \hat{x}_{k,i}(+)}

Observe that Eq. (6.1-22) reduces to Eq. (6.1-19) when i = 0. Truncating the series in Eq. (6.1-22) after the second term, substituting the result into Eqs. (6.1-13), (6.1-17), and (6.1-18), performing the indicated expectation operations, and observing that* E[x_k] = \hat{x}_k(-), produces the following iterative expressions for the updated estimate:

    \hat{x}_{k,i+1}(+) = \hat{x}_k(-) + K_{k,i} [z_k - h_k(\hat{x}_{k,i}(+)) - H_k(\hat{x}_{k,i}(+))(\hat{x}_k(-) - \hat{x}_{k,i}(+))]
    K_{k,i} = P_k(-) H_k^T(\hat{x}_{k,i}(+)) [H_k(\hat{x}_{k,i}(+)) P_k(-) H_k^T(\hat{x}_{k,i}(+)) + R_k]^{-1}
    P_{k,i+1}(+) = [I - K_{k,i} H_k(\hat{x}_{k,i}(+))] P_k(-)
    \hat{x}_{k,0}(+) \triangleq \hat{x}_k(-),  i = 0, 1, ...    (6.1-23)

As many calculations of \hat{x}_{k,i}(+) in Eq. (6.1-23), over the index i, can be performed as are necessary to reach the point where little further improvement is realized from additional iterations. However, it should be recognized that each iteration of Eq. (6.1-23) contributes to the computation time required to mechanize the filter (a minimal sketch of the iteration is given below).

A similar procedure can be devised for iterating over the nonlinear dynamics in Eq. (6.1-7); for details the reader is referred to Ref. 1. It has been found that such iteration techniques can significantly reduce that part of the extended Kalman filter estimation error which is caused by system nonlinearities (Ref. 5).

*This is true because the expectation \hat{x}_k is conditioned on all the measurements up to, but not including, time t_k.
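The iteration of Eq. (6.1-23) can be coded as a short loop. In the Python sketch below, h and H are user-supplied callables, and the iteration cap and convergence tolerance are illustrative assumptions; the text itself only says to stop when further iterations yield little improvement.

    import numpy as np

    def iterated_ekf_update(xhat_minus, P_minus, z, h, H, R, max_iters=5, tol=1e-6):
        # Eq. (6.1-23): relinearize h about the latest estimate x_{k,i}(+)
        eta = xhat_minus.copy()                      # x_{k,0}(+) = x_k(-)
        for _ in range(max_iters):
            Hx = H(eta)
            K = P_minus @ Hx.T @ np.linalg.inv(Hx @ P_minus @ Hx.T + R)
            eta_next = xhat_minus + K @ (z - h(eta) - Hx @ (xhat_minus - eta))
            converged = np.linalg.norm(eta_next - eta) < tol
            eta = eta_next
            if converged:                            # little further improvement
                break
        P_plus = (np.eye(len(eta)) - K @ Hx) @ P_minus
        return eta, P_plus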

A Second-Order Filter - Another type of higher-order approximation to the minimum variance estimate is achieved by including second-order terms in the expansions for f(x(t),t) and h_k(x_k) in Eqs. (6.1-6) and (6.1-19). Thus, we write

    f(x(t),t) = f(\hat{x}(t),t) - F(\hat{x}(t),t) \tilde{x}(t) + (1/2) \nabla^2(f, \tilde{x}(t) \tilde{x}^T(t)) + \cdots
    h_k(x_k) = h_k(\hat{x}_k(-)) - H_k(\hat{x}_k(-)) \tilde{x}_k(-) + (1/2) \nabla^2(h_k, \tilde{x}_k(-) \tilde{x}_k^T(-)) + \cdots    (6.1-24)

where the operator \nabla^2(f, B), for any function f(x,t) and any matrix B, is a vector whose i-th element is defined by

    \nabla_i^2(f, B) = trace{ [\partial^2 f_i / \partial x_p \partial x_q] B }    (6.1-25)

The bracketed quantity in Eq. (6.1-25) is a matrix whose pq-th element is the quantity \partial^2 f_i / \partial x_p \partial x_q. Truncating the series in Eq. (6.1-24) after the third term, substituting the resulting approximations for f and h_k into Eqs. (6.1-3), (6.1-5), (6.1-13), (6.1-17), and (6.1-18), and carrying out the indicated expectation operations produces, after some manipulation,**



    \dot{\hat{x}}(t) = f(\hat{x}(t),t) + (1/2) \nabla^2(f, P(t))
    \dot{P}(t) = F(\hat{x}(t),t) P(t) + P(t) F^T(\hat{x}(t),t) + Q(t)

    \hat{x}_k(+) = \hat{x}_k(-) + K_k [z_k - h_k(\hat{x}_k(-)) - (1/2) \nabla^2(h_k, P_k(-))]
    K_k = P_k(-) H_k^T(\hat{x}_k(-)) [H_k(\hat{x}_k(-)) P_k(-) H_k^T(\hat{x}_k(-)) + R_k + A_k]^{-1}
    P_k(+) = [I - K_k H_k(\hat{x}_k(-))] P_k(-)    (6.1-26)

The matrix A_k is defined by the relation

    A_k \triangleq (1/4) E[\nabla^2(h_k, \tilde{x}_k(-) \tilde{x}_k^T(-)) \nabla^2(h_k, \tilde{x}_k(-) \tilde{x}_k^T(-))^T]
          - (1/4) \nabla^2(h_k, P_k(-)) \nabla^2(h_k, P_k(-))^T    (6.1-27)

The ij-th element of A_k is given approximately by

    a_{k,ij} = (1/4) \sum_{p,q,m,n} \left.\frac{\partial^2 h_i}{\partial x_p \partial x_q} \frac{\partial^2 h_j}{\partial x_m \partial x_n}\right|_{x_k = \hat{x}_k(-)} (P_{pm} P_{qn} + P_{pn} P_{qm})    (6.1-28)

where h_i denotes the i-th element of h_k and the dependence of h and x on the time index k is suppressed for convenience. Equation (6.1-28) is derived from the gaussian approximation

    E[x_p x_q x_m x_n] = P_{pq} P_{mn} + P_{pm} P_{qn} + P_{pn} P_{qm}

where x_i and P_{ij} denote the i-th and ij-th elements of \tilde{x}_k(-) and P_k(-), respectively. For this reason, Eq. (6.1-26) is sometimes referred to as a gaussian second-order filter (Ref. 1). It has a structure similar to the extended Kalman filter described in Table 6.1-1; however, there are additional terms to account for the effect of the nonlinearities. Consequently the improved performance is achieved at the expense of an increased computational burden.

**An assumption used in deriving Eq. (6.1-26) is that third-order moments of \tilde{x}_k(-) are negligible - i.e., the probability density for \tilde{x}_k(-) is assumed to be symmetric about the mean.
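The second-order correction terms can be formed directly from Hessians. The Python sketch below is a minimal illustration, assuming the caller supplies a list of analytic Hessian matrices (one per component of f or h); it evaluates the operator of Eq. (6.1-25) and the A_k approximation of Eq. (6.1-28).

    import numpy as np

    def grad2(hessians, B):
        # Eq. (6.1-25): i-th element is trace([d^2 f_i / dx_p dx_q] B)
        return np.array([np.trace(Hi @ B) for Hi in hessians])

    def A_matrix(hessians_h, P):
        # Eq. (6.1-28) under the gaussian fourth-moment approximation
        m = len(hessians_h)
        A = np.zeros((m, m))
        for i in range(m):
            for j in range(m):
                Hi, Hj = hessians_h[i], hessians_h[j]
                # sum over p,q,m,n of Hi[p,q] Hj[m,n] (P[p,m]P[q,n] + P[p,n]P[q,m])
                A[i, j] = 0.25 * (np.einsum('pq,mn,pm,qn->', Hi, Hj, P, P)
                                  + np.einsum('pq,mn,pn,qm->', Hi, Hj, P, P))
        return A

The bias term 0.5 * grad2(hessians_f, P) enters the state propagation of Eq. (6.1-26), and the matrix returned by A_matrix augments R_k in the gain computation.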

EXAMPLES OF NONLINEAR FILTERING

In this section a number of examples are discussed which illustrate the comparative performance of various nonlinear filters. The first is a geometric example of a hypothetical navigation application; the second and third present computer simulation results of linearized and nonlinear filters applied to a tracking problem.

Example 6.1-1

The extended Kalman filter is a popular method for treating nonlinear estimation problems (e.g., see Ref. 18). However, if nonlinearities are sufficiently important, the estimation error can be significantly reduced through use of a higher-order estimation technique. As a specific illustration, consider an aircraft whose estimated location in x,y coordinates is x_0, y_0 in Fig. 6.1-1, with a gaussian distribution p(x,y) in position uncertainty whose shape is indicated by the narrow shaded elliptical area; the value of p(x,y) is constant on the circumference of the ellipse.* Now suppose the aircraft measures its range to a transponder that has a known position in x,y coordinates. Furthermore, assume that the range measurement errors are very small. If the nonlinear relationship of range to x,y position were taken into account exactly, a new estimate of position would be deduced approximately as follows: Assuming the range measurement, r_m, is perfect, the aircraft lies somewhere on the circle of radius r_m shown in Fig. 6.1-1. Given the a priori distribution of x and y denoted by the shaded ellipse, the new minimum variance estimates, x_1 and y_1, of x and y will lie close to the peak value of p(x,y) evaluated along the range circle. Hence, x_1 and y_1 will lie approximately at the point indicated in Fig. 6.1-1.

[Figure 6.1-1: An Illustration of the Difference Between Linearized Estimates (x'_1, y'_1) and Nonlinear Estimates (x_1, y_1), showing the transponder, the constant-range circle (locus of actual positions), and the straight-line locus of linearized positions.]

*The elliptical contour is obtained by setting the magnitude of the exponent of the gaussian density function equal to a constant; that is, on the contour x and y satisfy

    [x - x_0  y - y_0] P^{-1} [x - x_0  y - y_0]^T = constant



By contrast, if the range measurement is linearized as in an extended Kalman filter, the argument proceeds as follows: In terms of the initial range estimate r_0 in Fig. 6.1-1, r_m is approximately expressed as

    r_m \approx r_0 + \left.\frac{\partial r}{\partial x}\right|_{x_0,y_0} (x - x_0) + \left.\frac{\partial r}{\partial y}\right|_{x_0,y_0} (y - y_0)    (6.1-29)

where r is the range to the transponder as a function of x and y. In other words,

    r_m - r_0 \approx \nabla r^T(x_0, y_0) [x - x_0  y - y_0]^T    (6.1-30)

where \nabla r(x_0, y_0) is the gradient of r with respect to x and y evaluated at r_0. The direction of the gradient is along the radius vector from the transponder. Equation (6.1-30) states that the new values of x and y deduced from r_m must lie along the straight dashed line shown in Fig. 6.1-1, which is normal to \nabla r(x_0, y_0). Choosing the new estimate of position at the maximum of p(x,y) along the straight line, linearized estimates, x'_1 and y'_1, are obtained at the point indicated in the figure. Depending upon the range accuracy desired, the difference between the nonlinear estimate (x_1, y_1) and (x'_1, y'_1) may be significant.

This example illustrates the point that nonlinearity can have an important effect on estimation accuracy. The degree of importance depends upon the estimation accuracy desired, the amount of nonlinearity, the shape of the density function p(x,y) and the strength of the measurement noise.

Example 6.1-2

In this example, we consider the problem of tracking a body falling freely through the atmosphere. The motion is modeled in one dimension by assuming the body falls in a straight line, directly toward a tracking radar, as illustrated in Fig. 6.1-2. The state variables for this problem are designated as

    x_1 = x,  x_2 = \dot{x},  x_3 = \beta    (6.1-31)

where \beta is the so-called ballistic coefficient of the falling body and x is its height above the earth.

[Figure 6.1-2: Geometry of the One-Dimensional Tracking Problem - the body falls vertically along the line of sight toward the radar.]

The equations of motion for the body are given by

    \dot{x}_1 = x_2
    \dot{x}_2 = d - g
    \dot{x}_3 = 0    (6.1-32)

with

    d = \frac{\rho x_2^2}{2 x_3},  \rho = \rho_0 e^{-x_1 / k_\rho}

where d is drag deceleration, g is acceleration of gravity, \rho is atmospheric density (with \rho_0 the atmospheric density at sea level) and k_\rho is a decay constant. The differential equation for velocity, x_2, is nonlinear through the dependence of drag on velocity, air density and ballistic coefficient. Linear measurements are assumed available in the continuous form

    z(t) = x_1(t) + v(t)    (6.1-33)

where v(t) ~ N(0, r). Initial values of the state variables are assumed to have mean \mu and a covariance matrix of the form

    P_0 = diag(p_110, p_220, p_330)    (6.1-34)

The problem of estimating all the state variables may be solved using both the extended and linearized Kalman filters discussed in Section 6.1. Recall that the extended Kalman filter is linearized about the current estimate of the state, whereas the general linearized filter utilizes a precomputed nominal trajectory \bar{x}(t). The latter is derived for this example by solving the differential equation

    \dot{\bar{x}}(t) = f(\bar{x}(t)),  \bar{x}(0) = \mu    (6.1-35)

Comparative performance results from the two different filters are displayed in Figs. 6.1-3 and 6.1-4, which show the estimation errors achieved from a single monte carlo trial. The simulation parameter values are given below.



[Figure 6.1-3: Performance of the Extended Kalman Filter - (a) position error, (b) velocity error, (c) ballistic coefficient error, each plotted against time (sec).]

[Figure 6.1-4: Performance of the Linearized Kalman Filter - (a) position error, (b) velocity error, (c) ballistic coefficient error, each plotted against time (sec).]



PARAMETER VALUES FOR THE ONE-DIMENSIONAL TRACKING PROBLEM

    x(0) ~ N(10^5 ft, 500 ft^2)
    \dot{x}(0) ~ N(-6000 ft/sec, 2 x 10^4 ft^2/sec^2)
    \beta ~ N(2000 lb/ft^2, 2.5 x 10^5 lb^2/ft^4)
    p_110 = 500 ft^2
    p_220 = 2 x 10^4 ft^2/sec^2
    p_330 = 2.5 x 10^5 lb^2/ft^4
    \rho_0 = 3.4 x 10^-3 lb sec^2/ft^4
    g = 32.2 ft/sec^2
    k_\rho = 22,000 ft
    r = 100 ft^2/Hz

Directing our attention to the errors in estimating the ballistic coefficient, we note that neither filter tracks \beta accurately early in the trajectory. Physically, this is due to the fact that the thin atmosphere at high altitude produces a small drag force on the body; consequently, the measurements contain little information about \beta. After the body enters the thicker atmosphere, the increased drag force enables both filters to achieve substantial reduction in the \beta estimation error; however, the extended Kalman filter (Fig. 6.1-3c) gives appreciably better performance. Consequently, the latter also gives better estimates of position and velocity as the body enters denser atmosphere. The actual performance observed for the extended Kalman filter is consistent with the behavior of its associated covariance matrix, computed from the algorithm in Table 6.1-2. This is demonstrated by the fact that the estimated states in Fig. 6.1-3 tend to remain between the smooth lines, which are the square roots of the corresponding diagonal elements in the computed P matrix. In contrast, the estimates for the linearized filter in Fig. 6.1-4 tend to have larger errors than predicted by the P matrix.

This example illustrates the fact that the extended Kalman filter generally performs better than a filter linearized about a precomputed nominal trajectory. However, the reader must remember that the latter is more easily mechanized because filter gains can be precomputed and stored. Consequently, there is a trade-off between filter complexity and estimation accuracy.
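For concreteness, the falling-body model of Eqs. (6.1-32) through (6.1-35) can be coded directly. The Python sketch below uses the numerical values from the parameter list above; the Euler step size is an assumption made for illustration. It supplies the f and Jacobian F needed by the extended Kalman filter of Table 6.1-1, and generates the nominal trajectory \bar{x}(t) used by the linearized filter of Table 6.1-3.

    import numpy as np

    rho0, k_rho, g = 3.4e-3, 22000.0, 32.2

    def f(x):
        # Eq. (6.1-32): x = [height, velocity, ballistic coefficient]
        d = rho0 * np.exp(-x[0] / k_rho) * x[1]**2 / (2.0 * x[2])
        return np.array([x[1], d - g, 0.0])

    def F(x):
        # Jacobian of f, for the covariance propagation of Eq. (6.1-8)
        rho = rho0 * np.exp(-x[0] / k_rho)
        d = rho * x[1]**2 / (2.0 * x[2])
        return np.array([[0.0, 1.0, 0.0],
                         [-d / k_rho, rho * x[1] / x[2], -d / x[2]],
                         [0.0, 0.0, 0.0]])

    def nominal_trajectory(mu, t_final, dt=0.01):
        # Eq. (6.1-35): integrate xbar_dot = f(xbar) from xbar(0) = mu
        xbar, history = np.array(mu, dtype=float), []
        for _ in range(int(t_final / dt)):
            history.append(xbar.copy())
            xbar = xbar + f(xbar) * dt       # Euler step (assumed scheme)
        return np.array(history)

Because the linearized filter evaluates F along nominal_trajectory(mu, ...) rather than along the current estimate, its gains can be computed once, off-line, which is the mechanization advantage noted above.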

Example 6.1-3

As a further comparison of the types of nonlinear filters discussed in Section 6.1, we include a two-dimensional tracking problem in which the range measurement is made along the line-of-sight illustrated in Fig. 6.1-5. This example is taken from Ref. 5. The equations of motion are the same as in the one-dimensional tracking problem except that the third state variable is identified as 1/\beta. Range measurements, taken at one second intervals, are now a nonlinear function of altitude given by

    z_k = \sqrt{r_1^2 + (x_k - r_2)^2} + v_k,  v_k ~ N(0, r)    (6.1-36)

[Figure 6.1-5: Geometry of the Two-Dimensional Tracking Problem - the radar is offset horizontally by r_1 from the vertical fall line, with altitude datum r_2.]

The problem of estimating the three state variables was solved using the extended Kalman filter, a second-order filter, and an iterated filter. Monte carlo computer simulations were performed using the following parameter values:

PARAMETER VALUES FOR THE TWO-DIMENSIONAL TRACKING PROBLEM

    x(0) = \hat{x}(0) = 3 x 10^5 ft
    \dot{x}(0) = \hat{\dot{x}}(0) = 2 x 10^4 ft/sec
    x_3(0) = 2 x 10^-4 ft^2/lb
    \hat{x}_3(0) = 6 x 10^-4 ft^2/lb
    \rho_0 g / 2 = 0.05 lb/ft^3
    r = 100 ft^2/Hz
    v_k ~ N(0, r)

The rms errors incurred in estimating altitude and the inverse ballistic coefficient, 1/\beta, using the three filters are shown in Fig. 6.1-6. These results have been obtained by computing the rms errors over 100 monte carlo runs.

[Figure 6.1-6: Comparative Performance of Several Tracking Algorithms (Ref. 5) - rms altitude error and rms 1/\beta error plotted against time (sec) over 30 sec.]



Near the beginning of the trajectory, all filters yield the same performance because they have the same initial estimation errors. Evidently, the higher-order methods - the second-order and iterated filters - yield substantially better estimates as the body falls into the earth's denser atmosphere. This occurs because both the measurement and dynamic nonlinearities become stronger as the altitude decreases.

This example also clearly demonstrates the trade-off between estimation accuracy and filter complexity. In terms of these criteria, the filters are ranked in the order of both increasing accuracy and complexity as follows: extended Kalman filter, second-order filter, iterated filter. Thus, better filter performance is achieved at the expense of a greater computational burden.

NONLINEAR SMOOTHING

The purpose of this section is to briefly indicate how smoothing algorithms for nonlinear systems can be derived using linearization techniques. The philosophy of smoothing for nonlinear systems is the same as for the linear case discussed in Chapter 5; namely, the filtered estimate of x(t) can be improved if future, as well as past, measurement data are processed. The associated estimation problem can be formulated as one of fixed-interval, fixed-point, or fixed-lag smoothing. The discussion here is confined to fixed-interval smoothing. Linear and second-order algorithms for the fixed-point and fixed-lag cases are described in Refs. 22 and 23. For a more theoretical treatment of this subject the reader is referred to Ref. 4.

Recall from Section 5.1 that the smoothed estimate \hat{x}(t|T) for a linear system, with a given data record ending at time T, can be expressed as

    \hat{x}(t|T) = P(t|T) [P^{-1}(t) \hat{x}(t) + P_b^{-1}(t) \hat{x}_b(t)]
    P^{-1}(t|T) = P^{-1}(t) + P_b^{-1}(t)    (6.1-37)

where \hat{x}(t) and \hat{x}_b(t) are estimates provided by forward and backward filters, respectively, and P(t) and P_b(t) are their corresponding covariance matrices. If the same approach is taken in the nonlinear case - i.e., if it is assumed that the smoothed estimate is a linear combination of \hat{x}(t) and \hat{x}_b(t) - then Eq. (6.1-37) still holds (Ref. 3). The quantities \hat{x}(t) and P(t), associated with the forward filter, can be obtained by any of the filtering algorithms given in the preceding sections. The subsequent discussion treats appropriate expressions for the backward filter.

The backward filter operates recursively on the measurement data, beginning at the terminal time and proceeding toward the desired smoothing point. Consequently, it is useful to redefine the equations of motion and the measurements in Eqs. (6.0-1) and (6.0-2) by making the change of variable \tau = T - t, resulting in

    \frac{d}{d\tau} x(T-\tau) = -f(x(T-\tau), T-\tau) + w(T-\tau)
    z_{N-k} = h_{N-k}(x(T-\tau_k)) + v_{N-k}    (6.1-38)

Using Eq. (6.1-38) as the model for the system dynamics, the linearization methods discussed in previous sections can be applied to obtain a backward filtering algorithm. In particular, it is convenient to assume that the forward extended Kalman filter, described in Table 6.1-1, is first applied to all the data; then the backward filter is derived by linearizing Eq. (6.1-38) about the forward filter estimate. This procedure results in a backward filter having the form given in Table 6.1-3, where the nominal trajectory \bar{x}(t) is taken to be \hat{x}(\tau) \triangleq \hat{x}(T-\tau). The resulting expressions for \hat{x}_b(\tau) and P_b(\tau) are as follows:

PROPAGATION EQUATIONS

    \frac{d}{d\tau} \hat{x}_b(\tau) = -f(\hat{x}(T-\tau), T-\tau) - F(\hat{x}(T-\tau), T-\tau)(\hat{x}_b(\tau) - \hat{x}(T-\tau))
    \frac{d}{d\tau} P_b(\tau) = -F(\hat{x}(T-\tau), T-\tau) P_b(\tau) - P_b(\tau) F^T(\hat{x}(T-\tau), T-\tau) + Q(T-\tau)    (6.1-39)

MEASUREMENT UPDATE EQUATIONS

    \hat{x}_{bk}(+) = \hat{x}_{bk}(-) + K_{bk} [z_{N-k} - h_{N-k}(\hat{x}_{N-k}) - H_{N-k}(\hat{x}_{N-k})(\hat{x}_{bk}(-) - \hat{x}_{N-k})]
    K_{bk} = P_{bk}(-) H_{N-k}^T(\hat{x}_{N-k}) [H_{N-k}(\hat{x}_{N-k}) P_{bk}(-) H_{N-k}^T(\hat{x}_{N-k}) + R_{N-k}]^{-1}
    P_{bk}(+) = [I - K_{bk} H_{N-k}(\hat{x}_{N-k})] P_{bk}(-),  k = 0, 1, ..., N-1    (6.1-40)

where \hat{x}_{N-k} and H_{N-k} are obtained from the forward filter. The quantities \hat{x}_{bk}(-) and P_{bk}(-) are the solutions to Eq. (6.1-39) evaluated at time \tau_k, just before z_{N-k} is processed.

The problem of obtaining initial conditions for the backward filter is circumvented by deriving algorithms for P_b^{-1}(\tau) and \hat{y}(\tau) \triangleq P_b^{-1}(\tau) \hat{x}_b(\tau), both of which are initialized at zero, just as in the case of the linear smoother discussed in Chapter 5. Using Eqs. (6.1-39) and (6.1-40) and the matrix identities

    (A^{-1} + B^T C^{-1} B)^{-1} = A - A B^T (B A B^T + C)^{-1} B A
    (A^{-1} + B^T C^{-1} B)^{-1} B^T C^{-1} = A B^T (B A B^T + C)^{-1}    (6.1-41)

and making the assumption that R_{N-k} has an inverse, the following, more convenient expressions for the backward filter are obtained:



PROPAGATION EQUATIONS

    \frac{d}{d\tau} P_b^{-1} = P_b^{-1} F + F^T P_b^{-1} - P_b^{-1} Q P_b^{-1}
    \frac{d}{d\tau} \hat{y} = [F^T - P_b^{-1} Q] \hat{y} + P_b^{-1} [F \hat{x}(T-\tau) - f]    (6.1-42)

where F and f are understood to be functions of \hat{x}(T-\tau) and (T-\tau).

MEASUREMENT UPDATE EQUATIONS

    \hat{y}_k(+) = \hat{y}_k(-) + H_{N-k}^T R_{N-k}^{-1} [z_{N-k} - h_{N-k} + H_{N-k} \hat{x}_{N-k}]
    P_{bk}^{-1}(+) = P_{bk}^{-1}(-) + H_{N-k}^T R_{N-k}^{-1} H_{N-k}
    P_{b0}^{-1}(-) = 0,  \hat{y}_0(-) = 0,  k = 0, 1, ..., N-1    (6.1-43)

Substituting \hat{y}(t) into Eq. (6.1-37), the smoothed estimate is obtained in the form

    \hat{x}(t|T) = P(t|T) [P^{-1}(t) \hat{x}(t) + \hat{y}(t)]    (6.1-44)

Again by analogy with linear systems, another form of the fixed-interval smoother is the Rauch-Tung-Striebel algorithm, which expresses \hat{x}(t_k|T) and P(t_k|T), denoted by \hat{x}_{k|N} and P_{k|N}, in terms of a forward filter estimate and the smoothed estimate at stage k+1. This is more convenient to mechanize than the forward-backward filter inasmuch as it does not require inverting a covariance matrix. The relations for the smoothed estimate at successive measurement times are stated below without proof:

    \hat{x}_{k|N} = \hat{x}_k + K_k (\hat{x}_{k+1|N} - \Phi_k \hat{x}_k - b_k),  \hat{x}_{N|N} = \hat{x}_N
    K_k = P_k \Phi_k^T (\Phi_k P_k \Phi_k^T + Q_k)^{-1}
    P_{k|N} = P_k + K_k (P_{k+1|N} - \Phi_k P_k \Phi_k^T - Q_k) K_k^T,  P_{N|N} = P_N    (6.1-45)

where P_k and \hat{x}_k are obtained from a forward filter and P_{k|N} is (approximately) the error covariance matrix associated with \hat{x}_{k|N}. The quantity b_k is the solution to the differential equation

    \dot{b}(t) = F(\hat{x}(t),t) b(t) + f(\hat{x}(t),t) - F(\hat{x}(t),t) \hat{x}(t),  b(t_k) = 0    (6.1-46)

evaluated at time t_{k+1}; \Phi_k is the transition matrix from time t_k to t_{k+1} associated with the homogeneous differential equation

    \dot{\xi}(t) = F(\hat{x}(t),t) \xi(t)    (6.1-47)

and Q_k is given by

    Q_k = \int_{t_k}^{t_{k+1}} \Phi(t_{k+1}, \tau) Q(\tau) \Phi^T(t_{k+1}, \tau) d\tau    (6.1-48)

Observe that the initial conditions for Eq. (6.1-45) are the filter outputs \hat{x}_N and P_N; clearly the latter are identical with the corresponding smoothed quantities at the data endpoint. When the system dynamics and measurements are linear, b_k = 0 and Eq. (6.1-45) reduces to the form given in Ref. 1.
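Before leaving the forward-backward form, note that the combination step of Eqs. (6.1-37) and (6.1-44) is a pointwise matrix computation. A minimal Python sketch, assuming the information-form quantities y and P_b^{-1} have already been obtained by integrating Eqs. (6.1-42) and (6.1-43):

    import numpy as np

    def combine_forward_backward(xhat_f, P_f, y_b, Pb_inv):
        # Eq. (6.1-37): P^{-1}(t|T) = P^{-1}(t) + Pb^{-1}(t)
        P_smooth = np.linalg.inv(np.linalg.inv(P_f) + Pb_inv)
        # Eq. (6.1-44): xhat(t|T) = P(t|T) [P^{-1}(t) xhat(t) + y(t)]
        x_smooth = P_smooth @ (np.linalg.solve(P_f, xhat_f) + y_b)
        return x_smooth, P_smooth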

6.2 NONLINEAR ESTIMATION BY STATISTICAL LINEARIZATION

In Section 6.1, a number of approximately optimal nonlinear filters have been derived using truncated Taylor series expansions to represent the system nonlinearities. An alternative approach, and one that is generally more accurate than the Taylor series expansion method, is referred to as statistical approximation (Refs. 6, 8, and 9). The basic principle of this technique is conveniently illustrated for a scalar function, f(x), of a random variable x.

Consider that f(x) is to be approximated by a series expansion of the form

    f(x) \approx n_0 + n_1 x + n_2 x^2 + \cdots    (6.2-1)

The problem of determining appropriate coefficients n_k is similar to the estimation problem where an estimate of a random variable is sought from given measurement data. Analogous to the concept of estimation error, we define a function representation error, e, of the form

    e = f(x) - (n_0 + n_1 x + n_2 x^2 + \cdots)    (6.2-2)

It is desirable that the n_k's be chosen so that e is small in some "average" sense; any procedure that is used to accomplish this goal, which is based upon the statistical properties of x, can be thought of as a statistical approximation technique.

The most frequently used method for choosing the coefficients in Eq. (6.2-1) is to minimize the mean square value of e, E[e^2]. This is accomplished by forming E[e^2] and setting the partial derivatives of this quantity with respect to each n_k equal to zero. The result is a set of algebraic equations, linear in the n_k's, that can be solved in terms of the moments and cross-moments of x and f(x).



Without carrying out the details of this procedure, we can see one distinct advantage that statistical approximation has over the Taylor series expansion; it does not require the existence of derivatives of f. Thus, a large number of nonlinearities - relay, saturation, threshold, etc. - can be treated by this method without having to approximate discontinuities or corners in f(x) by smooth functions. On the other hand, because of the expectation operation in Eq. (6.2-2), an apparent disadvantage of the method is that the probability density function for x must be known in order to compute the coefficients n_k, a requirement that does not exist when f(x) is expanded in a Taylor series about its mean value. However, it turns out that approximations can often be made for the probability density function used to calculate the coefficients, such that the resulting expansion for f(x) is considerably more accurate than the Taylor series, from a statistical point of view. Thus, statistical approximations for nonlinearities have potential performance advantages for designing nonlinear filters.

This section discusses the calculation of the first two terms in a series, having the form of Eq. (6.2-1), for a vector function f(x). This provides a statistical linearization of f that can be used to construct filter algorithms analogous to those provided in Section 6.1. An example is given that illustrates the comparative accuracy of the nonlinear filter algorithms derived from both the Taylor series and the statistical linearization approximations.

STATISTICAL LINEARIZATION

We seek a linear approximation for a vector function f(x) of a vector random variable x, having probability density function p(x). Following the statistical approximation technique outlined in the introduction, we propose to approximate f(x) by the linear expression

    f(x) \approx a + N_f x    (6.2-3)

where a and N_f are a vector and a matrix to be determined. Defining the error

    e = f(x) - a - N_f x    (6.2-4)

we desire to choose a and N_f so that the quantity

    J = E[e^T A e]    (6.2-5)

is minimized for some symmetric positive semidefinite matrix A. Substituting Eq. (6.2-4) into Eq. (6.2-5) and setting the partial derivative of J with respect to the elements of a equal to zero, we obtain

    E[A (f(x) - a - N_f x)] = 0    (6.2-6)

Therefore, a is given by

    a = \hat{f} - N_f \hat{x}    (6.2-7)

where the caret (^) denotes the expectation operation. Substituting a from Eq. (6.2-7) into J and taking the partial derivative with respect to the elements of N_f, we obtain

    E[(f(x) - \hat{f})(x - \hat{x})^T] = N_f E[(x - \hat{x})(x - \hat{x})^T]    (6.2-8)

Solving Eq. (6.2-8) produces

    N_f = [E[f(x) x^T] - \hat{f} \hat{x}^T] P^{-1}    (6.2-9)

where P is the conventional covariance matrix for x. Observe that both a and N_f, as given by Eqs. (6.2-7) and (6.2-9), are independent of the weighting matrix A; hence, they provide a generalized minimum mean square error approximation to f.

Equation (6.2-9) has an important connection with describing function theory (Ref. 7) for approximating nonlinearities. In particular, if both f and x are scalars and their mean values are zero, N_f becomes the scalar quantity

    n_f = \frac{E[f(x) x]}{E[x^2]}    (6.2-10)

which is the describing function gain for an odd-function nonlinearity whose input is a zero mean random variable (e.g., see Fig. 6.2-1). The use of describing functions to approximate nonlinearities has found wide application in analyzing nonlinear control systems. Tables of expressions for n_f have been computed for many common types of nonlinearities having gaussian inputs.

[Figure 6.2-1: The Describing Function Approximation for a Scalar Odd Nonlinearity]


At this point, it is worthwhile to note why statistical linearization tends to be more accurate than the Taylor series approximation method. Consider the example of the saturation nonlinearity in Fig. 6.2-1. If f(x) for this case is expanded in a Taylor series of any order about the origin (x = 0), we obtain

    f(x) = x    (6.2-11)

The effect of the saturation is completely lost because of the discontinuity in the first derivative of f. By contrast, if statistical linearization is used, we have

    f(x) \approx n_f x    (6.2-12)

where n_f is the describing function gain defined by

    n_f = \frac{\int_{-\infty}^{\infty} x f(x) p(x) dx}{\int_{-\infty}^{\infty} x^2 p(x) dx}    (6.2-13)

and p(x) is the probability density function for x. If we now assume that x is a gaussian random variable, then

    p(x) = \frac{1}{\sqrt{2\pi} \sigma} e^{-x^2 / 2\sigma^2}    (6.2-14)

Substituting Eq. (6.2-14) into Eq. (6.2-13) and evaluating n_f for the saturation function shown in Fig. 6.2-1, we obtain (from Ref. 7) the result shown in Fig. 6.2-2. Evidently, n_f is a function of the linear part of f(x), the point \delta at which saturation occurs, and the standard deviation of x. The essential feature of the describing function is that it takes into account the probability that x can lie within the saturation region.

For values of \sigma which are small relative to \delta, the probability of saturation is low and n_f is approximately equal to one - i.e., f is approximately equal to the Taylor series given in Eq. (6.2-11). For larger values of \sigma, n_f is significantly smaller than one because there is a higher probability of saturation. Thus, for a given saturation function, statistical linearization provides a series of \sigma-dependent linearizations, illustrated in Fig. 6.2-3. In subsequent sections the usefulness of this approximation for designing nonlinear filters and analyzing nonlinear system performance is demonstrated.

[Figure 6.2-2: The Describing Function for a Saturation Nonlinearity (Ref. 7)]

[Figure 6.2-3: Illustration of Statistical Linearization as a \sigma-Dependent Gain - (a) Saturating Nonlinearity, (b) Describing Function]
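For the unity-slope saturation with limits at +/- \delta and a zero mean gaussian input, Eq. (6.2-13) can be evaluated in closed form, giving n_f = erf(\delta / \sqrt{2}\sigma). The Python sketch below is a minimal check, with the numerical quadrature serving only to verify the closed form; the grid width and resolution are arbitrary assumptions.

    import numpy as np
    from math import erf, sqrt, pi

    def saturation(x, delta):
        return np.clip(x, -delta, delta)     # unity slope, limits at +/- delta

    def nf_closed_form(delta, sigma):
        # E[x f(x)] / E[x^2] for a zero mean gaussian input reduces to erf(.)
        return erf(delta / (sqrt(2.0) * sigma))

    def nf_numerical(delta, sigma, n=200001, width=10.0):
        # Direct quadrature of Eq. (6.2-13) over +/- width*sigma
        x = np.linspace(-width * sigma, width * sigma, n)
        p = np.exp(-x**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)
        return np.trapz(x * saturation(x, delta) * p, x) / np.trapz(x**2 * p, x)

    # For sigma << delta the gain approaches 1; for sigma >> delta it falls off.
    for sigma in (0.2, 1.0, 5.0):
        print(sigma, nf_closed_form(1.0, sigma), nf_numerical(1.0, sigma))

The printed gains trace out exactly the \sigma-dependent family of linearizations sketched in Fig. 6.2-3.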

DESIGN OF NONLINEAR FILTERS

The approximation of a nonlinear function f(x) by a statistically optimized power series can be combined with results obtained in Section 6.1 to derive approximate minimum variance filtering algorithms. In particular, by substituting Eq. (6.2-7) into Eq. (6.2-3), we obtain

    f(x) \approx \hat{f} + N_f (x - \hat{x})

Because of the aforementioned connection with describing function theory, we will refer to N_f as the describing function gain matrix.


We now can identify f with the nonlinear function in the equations of motion [Eq. (6.0-1)] for a nonlinear stochastic system. The latter is in general a function of time as well, in which case N_f as computed from Eq. (6.2-9) is also time-dependent. Introducing the notation N_f(t) to denote the describing function gain matrix for f(x(t),t), we write

    f(x(t),t) \approx \hat{f} + N_f(t)(x(t) - \hat{x}(t))    (6.2-15)

Using similar notation for the measurement nonlinearity in Eq. (6.0-2) produces

    h_k(x_k) \approx \hat{h}_k + N_h(k)(x_k - \hat{x}_k(-))    (6.2-16)

Now the issue of computing \hat{f}, \hat{h}_k, N_f, and N_h arises. Each of these quantities depends upon the probability density function for x, a quantity that is generally not readily available. In fact, the absence of the latter is the entire motivation for seeking power series expansions for f and h_k. Consequently, an approximation is needed for p(x) that permits the above quantities to be calculated. For this purpose, it is most frequently assumed that x is gaussian. Since the probability density function for a gaussian random variable is completely defined by its mean \hat{x} and its covariance matrix P, both of which are part of the computation in any filtering algorithm, it will be possible to compute all the averages in Eqs. (6.2-15) and (6.2-16).

Assuming that the right sides of Eqs. (6.2-15) and (6.2-16) are calculated by making the gaussian approximation for x, the statistical linearization for f and h_k can be substituted into Eqs. (6.1-3), (6.1-5), (6.1-13), (6.1-17), and (6.1-18) for the nonlinear filter. Making this substitution and carrying out the indicated expectation operations produces the filter equations given in Table 6.2-1. Observe that the structure of the latter is similar to the extended Kalman filter in Table 6.1-1; however, N_f and N_h have replaced F and H_k. The computational requirements of the statistically linearized filter may be greater than for filters derived from Taylor series expansions of the nonlinearities because the expectations - \hat{f}, \hat{h}_k, etc. - must be performed over the assumed gaussian density for x. However, as demonstrated below, the performance advantages offered by statistical linearization may make the additional computation worthwhile.

TABLE 6.2-1 SUMMARY OF THE STATISTICALLY LINEARIZED FILTER

System Model:                 \dot{x}(t) = f(x(t),t) + w(t);  w(t) ~ N(0, Q(t))
Measurement Model:            z_k = h_k(x(t_k)) + v_k;  k = 1, 2, ...;  v_k ~ N(0, R_k)
Initial Conditions:           x(0) ~ N(\hat{x}_0, P_0)
Other Assumptions:            E[w(t) v_k^T] = 0 for all k and all t
State Estimate Propagation:   \dot{\hat{x}}(t) = \hat{f}
Error Covariance Propagation: \dot{P}(t) = N_f(t) P(t) + P(t) N_f^T(t) + Q(t)
Describing Function Calculation: N_f(t) = [E[f x^T] - \hat{f} \hat{x}^T] P^{-1}(t)
State Estimate Update:        \hat{x}_k(+) = \hat{x}_k(-) + K_k [z_k - \hat{h}_k]
Error Covariance Update:      P_k(+) = [I - K_k N_h(k)] P_k(-)
Gain Matrix Calculation:      K_k = P_k(-) N_h^T(k) [N_h(k) P_k(-) N_h^T(k) + R_k]^{-1}
Describing Function Calculation: N_h(k) = [E[h_k x_k^T(-)] - \hat{h}_k \hat{x}_k^T(-)] P_k^{-1}(-)
Definitions:                  \hat{f} and N_f(t) are expectations calculated assuming x ~ N(\hat{x}(t), P(t));
                              \hat{h}_k and N_h(k) are expectations calculated assuming x_k(-) ~ N(\hat{x}_k(-), P_k(-))

Example 6.2-1

To compare the filtering algorithm in Table 6.2-1 with those derived in Section 6.1, the following scalar example, due to Phaneuf (Ref. 6), is presented. The equation of motion is

    \dot{x}(t) = -\sin(x(t)) + w(t);  x(0) ~ N(0, p_0),  w(t) ~ N(0, q)    (6.2-17)

The unforced solution (w(t) = 0) to Eq. (6.2-17) is of interest, and is displayed in Fig. 6.2-4 for several different initial values of x. Observe that x(t) approaches a steady-state limit that depends upon the value of the initial condition.

[Figure 6.2-4: Unforced Solutions to Eq. (6.2-17) - x(t) versus time (sec) for several initial conditions]
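For this example the gaussian expectations in Table 6.2-1 happen to have closed forms: with x ~ N(\hat{x}, P), E[sin x] = sin(\hat{x}) e^{-P/2} and E[(x - \hat{x}) sin x] = P cos(\hat{x}) e^{-P/2}, so that \hat{f} = -sin(\hat{x}) e^{-P/2} and n_f = -cos(\hat{x}) e^{-P/2}. The following minimal Python sketch propagates the statistically linearized estimate between measurements; the Euler step is an assumption made here.

    import numpy as np

    def stat_lin_propagate(xhat, P, q, dt):
        # f(x) = -sin(x); closed-form gaussian expectations
        atten = np.exp(-P / 2.0)            # e^{-P/2}
        fhat = -np.sin(xhat) * atten        # expected state derivative
        nf = -np.cos(xhat) * atten          # describing function gain
        xhat = xhat + fhat * dt             # xhat_dot = fhat
        P = P + (2.0 * nf * P + q) * dt     # Pdot = 2 nf P + q (scalar case)
        return xhat, P

An extended Kalman filter would instead use f(\hat{x}) = -sin(\hat{x}) and F = -cos(\hat{x}); statistical linearization attenuates both by e^{-P/2}, reflecting the averaging of the sinusoid over the spread of the estimation error.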



A nonlinear discrete-time measurement equation of the form

    z_k = \sin(2 x_k) + v_k,  v_k ~ N(0, r)    (6.2-18)

is assumed. The dynamics of this example are sufficiently simple so that a variety of nonlinear filters can be constructed for estimating x(t). Algorithms were derived for the extended Kalman filter, for higher-order filters using second-, third-, and fourth-order Taylor series expansions of the nonlinearities, and for the statistically linearized (quasi-linear) filter discussed in this section. The results of monte carlo runs, performed with identical noise sequences for each of the above five filters, are compared in Fig. 6.2-5. Evidently the error associated with statistical linearization is much smaller than for the filters derived from Taylor series expansions, even up to fourth-order, during the transient period. This is explained on the basis that Taylor series expansions are least accurate when the estimation error is large.

[Figure 6.2-5: Estimation Errors for Five Filters (Ref. 6) - rms estimation error versus time; measurement noise r = 0.02, system noise q = 0.01.]

COMPUTATIONAL CONSIDERATIONS

In the vector case, the statistically linearized filter requires computation of the matrices N_f and N_h in Eqs. (6.2-15) and (6.2-16). This, in general, involves the evaluation in real time of multidimensional expectation integrals having the form

    \hat{f}_i = \int_{-\infty}^{\infty} f_i(x) p(x) dx
    \xi_{ij} = \int_{-\infty}^{\infty} f_i(x) x_j p(x) dx    (6.2-19)

Similar expressions must be evaluated for components of \hat{h}. The quantity \xi_{ij} denotes the ij-th element of the matrix E[f(x) x^T] in Eq. (6.2-9), and p(x) is the assumed joint gaussian density function for x(t),

    p(x) = c \exp[-(1/2)(x - \hat{x})^T P^{-1} (x - \hat{x})]    (6.2-20)

where c is the appropriate normalizing constant. The degree of difficulty in calculating \xi_{ij} and \hat{f}_i is an important consideration in judging whether to use statistical linearization for a particular application. The computational effort required is, of course, dependent upon the specific type of nonlinearity. Some systematic procedures for evaluating the expectation integrals are discussed next.

Both integrals in Eq. (6.2-19) can be represented by the general expression

    \hat{g} = \int_{-\infty}^{\infty} g(x) p(x) dx    (6.2-21)

where g(x) is a nonlinear function. In general, \hat{g} will be difficult to evaluate; however, its computation can be facilitated by making a change of variables which simplifies the exponential term in Eq. (6.2-21) (Ref. 6). First, we define

    u = x - \hat{x}    (6.2-22)

so that \hat{g} becomes

    \hat{g} = c \int_{-\infty}^{\infty} g(u + \hat{x}) \exp[-(1/2) u^T P^{-1} u] du    (6.2-23)


Next, recognizing that P^{-1} is a symmetric matrix, we can find a nonsingular transformation matrix, T, such that

    T P^{-1} T^T = D    (6.2-24)

where D is a diagonal matrix whose diagonal elements are the eigenvalues of P^{-1} (Ref. 10). Now, if we define

    v = T u    (6.2-25)

it follows from the properties of T that \hat{g} can be written in the form*

    \hat{g} = c \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(T^T v + \hat{x}) \exp\left[-(1/2) \sum_{k=1}^{n} d_k v_k^2\right] dv_1 \cdots dv_n    (6.2-26)

or, alternatively,

    \hat{g} = c \int_{-\infty}^{\infty} e^{-d_n v_n^2/2} \left[ \cdots \int_{-\infty}^{\infty} e^{-d_1 v_1^2/2} g(T^T v + \hat{x}) dv_1 \cdots \right] dv_n    (6.2-27)

where d_i is the i-th diagonal element of D. Consequently, \hat{g} can be evaluated from Eq. (6.2-27) by successively computing integrals of the form

    \hat{g}_k = \int_{-\infty}^{\infty} \hat{g}_{k-1} e^{-d_k v_k^2/2} dv_k;  \hat{g}_0 \triangleq g(T^T v + \hat{x}),  k = 1, ..., n    (6.2-28)

The task of evaluating \hat{g} from Eq. (6.2-27) may still be difficult; however, formulating the solution as a sequence of integrals of the form given in Eq. (6.2-28) permits the systematic application of approximate integration techniques, such as gaussian quadrature formulae. Other methods are suggested in Ref. 7.

Turning now to some important special cases of Eq. (6.2-21), we note that if g(x) has the form of a product of the state variables,

    g(x) = x_1 x_2 \cdots x_m    (6.2-29)

then \hat{g} can be expressed analytically in terms of the first and second moments of x, assuming the latter is gaussian. For example, if x has zero mean and

    g(x) = x_1 x_2 x_3 x_4    (6.2-30)

then it can be shown (Ref. 24) that

    \hat{g} = E[x_1 x_2] E[x_3 x_4] + E[x_1 x_3] E[x_2 x_4] + E[x_1 x_4] E[x_2 x_3]    (6.2-31)

Product nonlinearities are an important class of functions because they can often be used to approximate more general nonlinearities; a numerical check of Eq. (6.2-31) is sketched below.

*In particular, T is an orthogonal matrix whose determinant is one. Consequently, after changing variables of integration from u to v, \hat{g} is given by Eq. (6.2-26).
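As a quick verification of Eq. (6.2-31), the following Python sketch compares the analytic fourth moment against a monte carlo average; the covariance matrix used is an arbitrary positive definite choice made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    P = np.array([[2.0, 0.3, 0.1, 0.0],
                  [0.3, 1.0, 0.2, 0.1],
                  [0.1, 0.2, 1.5, 0.4],
                  [0.0, 0.1, 0.4, 1.2]])      # assumed covariance, zero mean
    x = rng.multivariate_normal(np.zeros(4), P, size=1_000_000)

    g_mc = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
    g_formula = P[0, 1] * P[2, 3] + P[0, 2] * P[1, 3] + P[0, 3] * P[1, 2]
    print(g_mc, g_formula)    # agree to monte carlo accuracy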

A second special case of practical interest arises when the nonlinear part of the system dynamics is a function of only a few of the total number of state variables. For example, there may be a single saturation nonlinearity of the type shown in Fig. 6.2-1 embedded in a dynamical system which is otherwise linear. In this case, the system equation of motion could be expressed as

    \dot{x}(t) = F x(t) + b f_i(x_i(t)) + w(t)    (6.2-32)

where f_i(x_i) is a nonlinear function of only one state variable and b is a constant vector. It is demonstrated below that this type of structure generally simplifies the computational effort involved in computing the describing function gain matrix N_f in Eq. (6.2-9).


Consider the ij-th element of N_f, which requires calculating the quantity \xi_{ij} in Eq. (6.2-19). When the nonlinearities are functions of a limited number of state variables, there will be some elements of N_f that depend upon values of \xi_{ij} of the form

    \xi_{ij} = \int_{-\infty}^{\infty} f_i(x_s) x_j p(x) dx    (6.2-33)

where x_s is a subset of the vector x which excludes x_j. If it is further assumed that x is a gaussian random variable, then with some manipulation the expression for \xi_{ij} in Eq. (6.2-33) can be simplified as follows*:

    \xi_{ij} = \int \int f_i(x_s) x_j p(x_s, x_j) dx_s dx_j
             = \int f_i(x_s) \left[ \int x_j p(x_j | x_s) dx_j \right] p(x_s) dx_s
             = \hat{f}_i \hat{x}_j + n_{si} E[(x_s - \hat{x}_s)(x_j - \hat{x}_j)]    (6.2-34)

where

    \hat{f}_i = E[f_i(x_s)]    (6.2-35)

and n_{si} is the describing function vector for f_i(x_s), defined by

    n_{si} \triangleq E[f_i(x_s)(x_s - \hat{x}_s)^T] ( E[(x_s - \hat{x}_s)(x_s - \hat{x}_s)^T] )^{-1}    (6.2-36)

The last equality in Eq. (6.2-34) verifies that \xi_{ij} in Eq. (6.2-33) can be calculated by first linearizing f_i(x_s) independently of x_j, viz:

    f_i(x_s) \approx \hat{f}_i + n_{si} (x_s - \hat{x}_s)

and then carrying out the indicated expectation operation. This result simplifies the statistical linearization computations considerably, when nonlinearities are functions of only a few state variables.

*This derivation was suggested by Prof. Wallace E. Vander Velde, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology. Line three of Eq. (6.2-34) is obtained from line two by substituting the gaussian form for p(x_s, x_j) and p(x_s).

SUMMARY

This section demonstrates a method of deriving approximately optimal nonlinear filters using statistical linearization. In many instances, this technique yields superior performance to filters developed from Taylor series expansion of the nonlinearities. The fact that practical application of the filter algorithm requires that a form be assumed for the density function of the state is not overly restrictive, since the same assumption is implicit in many of the existing Taylor series-type filters. The decision as to which of the several types of filters discussed in Sections 6.1 and 6.2 should be employed in a particular application ultimately depends upon their computational complexity and relative performance as observed from realistic computer simulations.

6.3 NONLINEAR LEAST-SQUARES ESTIMATION

As an alternative to the minimum variance estimation criterion employed throughout Sections 6.1 and 6.2, we briefly mention the subject of least-squares estimation. This approach requires no statistical assumptions about the sources of uncertainty in the problem, because the estimate of the state is chosen to provide a "best" fit to the observed measurement data in a deterministic, rather than a statistical, sense. To illustrate, for the case with no process noise - i.e.,

    \dot{x}(t) = f(x(t), t)    (6.3-1)

with discrete measurements

    z_k = h_k(x_k) + v_k    (6.3-2)

a weighted least-squares estimate of the state at stage k, \hat{x}_k, is one which minimizes the deterministic performance index

    J_k = \sum_{i=1}^{k} [z_i - h_i(x_i)]^T W_i [z_i - h_i(x_i)]    (6.3-3)

where {W_i} is a sequence of weighting matrices. Since Eq. (6.3-1) is unforced, x(t) is determined for all time by its initial value x_0; hence, the estimation problem is equivalent to determining the value of x_0 which minimizes J_k. The k-th estimate of the initial condition, \hat{x}_{0k}, can in principle be obtained by solving the familiar necessary conditions for the minimum of a function,

    \partial J_k / \partial x_0 = 0    (6.3-4)

where x_k in Eq. (6.3-3) is expressed in terms of \hat{x}_{0k} through the solution to Eq. (6.3-1).

The above formulation of the estimation problem is in the format of a classical parameter optimization problem, where J, the function to be minimized, is subject to the equality constraint conditions imposed by the solutions to Eq. (6.3-1) at the measurement times. The latter can be expressed as a nonlinear difference equation,

    x_{i+1} = \phi_i(x_i),  i = 0, 1, ..., k-1    (6.3-5)

which can be adjoined to J_k with a set of Lagrange multipliers {\lambda_i} to form an augmented performance index,

    J_k' = J_k + \sum_{i=0}^{k-1} \lambda_{i+1}^T [x_{i+1} - \phi_i(x_i)]    (6.3-6)

Consequently, the estimate \hat{x}_{0k} at each stage is typically determined as the solution to a two-point boundary value problem (e.g., see Ref. 17). For general nonlinear functions h_k and f, the boundary value problem cannot be solved in closed form; thus, approximate solution techniques must be employed to obtain practical algorithms.



To obtain smoothed estimates of x_0, after a fixed number of measurements have been collected, various iterative techniques that have been developed specifically for solving optimization problems can be employed - e.g., steepest descent (Ref. 17), conjugate gradient (Ref. 11) and quasi-linearization (Refs. 12 and 13). To obtain recursive estimates at each measurement stage - i.e., to mechanize a filter - the sequence of functions J_1, J_2, ..., defined in Eq. (6.3-3), must be minimized. Approximate recursive solutions have been obtained using both the invariant embedding (Ref. 14) and quasi-linearization methods. Extensions of the least-squares method to cases where Eq. (6.3-1) includes a driving noise term, and where the measurements are continuous, are given in Refs. 15 and 16.

The above discussion presents a very brief summary of the philosophy of least-squares estimation. It is a useful alternative to minimum variance estimation in situations where the statistics of uncertain quantities are not well defined. To obtain specific least-squares data processing algorithms, the reader is referred to the cited works.
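To make the formulation concrete, the sketch below fits the initial condition x_0 of Eq. (6.3-1) by direct numerical minimization of J_k in Eq. (6.3-3). The scalar dynamics, measurement model, noise level, and the use of scipy's general-purpose minimizer are all illustrative assumptions; a practical mechanization would exploit the two-point boundary value structure described above.

    import numpy as np
    from scipy.optimize import minimize

    dt, n_meas = 0.1, 50

    def propagate(x0):
        # Solve Eq. (6.3-1) by Euler integration; f(x) = -sin(x) is assumed
        xs, x = [], float(x0)
        for _ in range(n_meas):
            x = x + (-np.sin(x)) * dt
            xs.append(x)
        return np.array(xs)

    def J(x0, z, w=1.0):
        # Eq. (6.3-3) with h_i(x) = x and scalar weights w
        x0 = float(np.atleast_1d(x0)[0])
        return np.sum(w * (z - propagate(x0))**2)

    x_true = propagate(1.2)
    z = x_true + 0.05 * np.random.default_rng(1).standard_normal(n_meas)
    x0_hat = minimize(J, x0=np.array([0.5]), args=(z,)).x[0]

Note that no noise statistics enter J beyond the arbitrary weights w, which is exactly the deterministic character of the least-squares criterion.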

6.4 DIRECT STATISTICAL ANALYSIS OF NONLINEAR SYSTEMS - CADET(TM)

One often encounters systems, for which the statistical behavior is sought, in which significant nonlinearities exist. These problems may include filters, linear or otherwise, or may simply involve statistical signal propagation. In either case, the existence of significant nonlinear behavior has traditionally necessitated the employment of monte carlo techniques - repeated simulation trials plus averaging - to arrive at a statistical description of system behavior. Hundreds, often thousands, of sample responses are needed to obtain statistically meaningful results; correspondingly, the computer burden can be exceptionally severe, both in cost and time. Thus, one is led to search for alternative methods of analysis.

An exceptionally promising technique (Ref. 21), developed by The Analytic Sciences Corporation specifically for the direct statistical analysis of dynamical nonlinear systems, is presented herein. It is called the Covariance Analysis DEscribing function Technique (CADET). This technique employs the device of statistical linearization discussed in Section 6.2; however, the viewpoint here is statistical analysis rather than estimation.

The general form of the system to be considered is given by Eq. (6.0-1) and is illustrated in Fig. 6.4-1, where x is the system state vector, w is a white noise input [w ~ N(b, Q)], and f(x) is a vector nonlinear function. The objective is to determine the statistical properties - mean and covariance matrix - of x(t). The success of CADET in achieving this goal depends on how well f(x) can be approximated. The approximation criterion used here is the same as that employed in Section 6.2; however, the CADET algorithm is derived somewhat differently here, in order to expose the reader to additional properties of statistical linearization.

[Figure 6.4-1: Nonlinear System Block Diagram]

Consider approximation of the nonlinear function f(x) by a linear function, in the sense suggested by Fig. 6.4-2. The input to the nonlinear function, x, is taken to be comprised of a mean, m, plus a zero mean, independent, random process, r. Thus,*

    x(t) = m(t) + r(t)    (6.4-1)

The mean can arise due to an average value of w, a mean initial value of x, a rectification in the nonlinearity, or a combination of these.

[Figure 6.4-2: Describing Function Approximation - f(x) is approximated by the sum N_m m + N_r r.]

*Comparing subsequent notation with the development in Section 6.2, note that the following equivalences hold: r = x - \hat{x}, N_m m = \hat{f}, and N_r = N_f.



We think of the approximating output, f_a(x), as being comprised of the sum of two terms, one linearly related to m and the other linearly related to r. The so-called multiple-input describing function gain matrices, N_m and N_r, are chosen to minimize the mean square error in representing the actual nonlinearity output f(x) by f_a(x).

Calculation of N_m and N_r is readily accomplished. Note first from Fig. 6.4-2 that the approximation error is

    e = f(x) - N_m m - N_r r    (6.4-2)

Forming the matrix e e^T, we minimize the mean square approximation error by computing

    \partial E[trace(e e^T)] / \partial N_m = 0,  \partial E[trace(e e^T)] / \partial N_r = 0    (6.4-3)

These computations result in the relationships

    E[f(x)] m^T = N_m m m^T    (6.4-4)
    E[f(x) r^T] = N_r E[r r^T]    (6.4-5)

since E[r m^T] = E[m r^T] = 0. Equations (6.4-4) and (6.4-5) define N_m and N_r. Rather than attempt to solve for N_m (which requires a pseudoinverse calculation, since m m^T is always singular), we simply require

    N_m m = E[f(x)] \triangleq \hat{f}    (6.4-6)

Denoting the random process covariance matrix by S(t), viz:

    S(t) = E[r(t) r^T(t)]    (6.4-7)

and assuming that S is nonsingular, we find

    N_r = E[f(x) r^T] S^{-1}    (6.4-8)

This result is all we shall need to solve the problem at hand. Evaluation of the expectations in Eqs. (6.4-7) and (6.4-8) requires an assumption about the probability density function of x(t). Most often a gaussian density is assumed, although this need not be the case.

Replacing the nonlinear function in Eq. (6.0-1) by the describing function approximation indicated in Fig. 6.4-2, we see directly that the differential equations of the resulting quasi-linear system are* (see Fig. 6.4-3)

    \dot{m} = N_m(m, S) m + b
    \dot{r} = N_r(m, S) r + u    (6.4-9)

and the covariance matrix for r satisfies

    \dot{S} = N_r(m, S) S + S N_r^T(m, S) + Q    (6.4-10)

where Eq. (6.4-10) is simply the linear variance equation with F replaced by N_r(m, S). These equations are initialized by associating the mean portion of x(0) with m(0), and the random portion with S(0), where

    m(0) = E[x(0)]
    S(0) = E[(x(0) - m(0))(x(0) - m(0))^T]

A single forward integration of these coupled differential equations will then produce m(t) and S(t); a minimal sketch of this integration is given below.

[Figure 6.4-3: Quasi-linear System Model - the deterministic portion (gain N_m) and the random portion (gain N_r) propagate m and r in parallel.]

*These equations also follow directly from Eqs. (6.1-3) and (6.1-5). Note that u = w - b is simply the zero mean value portion of the input w.

A few special cases are worth noting. When the system is /inear,fu) = FX andEqs. (6.4-7) and (6.4-8) immediately lead to the result Nm = N, = F.Corresponding to Eqs. (6.4-9) and (6.4-10) are the familiar equations for meanand covariance propagation in linear systems. When the nonlinearity is odd [i.e .•I! -lI) = -.!<JI)I. the effective gain to a smaU mean m in the presence of a

"Ihese equations also fonow directly from Eqs. (6.1-3) and (6.1-5). Note that.!! =~ - 11is simply the zero mean value portion of the input. l!:.

A few special cases are worth noting. When the system is linear, f(x) = Fx, and Eqs. (6.4-7) and (6.4-8) immediately lead to the result Nm = Nr = F. Corresponding to Eqs. (6.4-9) and (6.4-10) are the familiar equations for mean and covariance propagation in linear systems. When the nonlinearity is odd [i.e., f(-x) = -f(x)], the effective gain to a small mean m in the presence of a multidimensional gaussian process, r, can be shown to be the same as the effective gain of that nonlinearity to r itself (Ref. 6), that is

lim (m→0) Nm = Nr(m = 0)     (6.4-11)

The same is true for the effective gain to a small sinusoid and other small signals. Discussion of this interesting result can be found in Ref. 7. It is also worth noting that when x is gaussian, Nr can be computed from the relationship

Nr = E[∂f(x)/∂x^T]     (6.4-12)

Proof of this useful result is left as an exercise for the reader. Finally, in the case of scalar nonlinearities, the scalar describing function gains are computed from [Eqs. (6.4-7) and (6.4-8)]

n_m = (1/(√(2π) m σ)) ∫ f(r + m) e^(-r²/2σ²) dr     (6.4-13)

and

n_r = (1/(√(2π) σ³)) ∫ f(r + m) r e^(-r²/2σ²) dr     (6.4-14)

where the integrals run from -∞ to ∞ and r has been assumed to be a gaussian random process. Tables of the result of this calculation for a wide variety of common nonlinearities can be found in Ref. 7. When the mean of x is known to be zero, we set m to zero and calculate only a single-input describing function,

n_r = (1/(√(2π) σ³)) ∫ f(r) r e^(-r²/2σ²) dr     (6.4-15)

These calculations are also extensively tabulated in Ref. 7.

Example 6.4-1

One issue of considerable importance is the degree of sensitivity which describing function gains display as a function of different input process probability densities. To get a feel for this sensitivity, we shall examine the limiter nonlinearity with a zero mean, random input whose probability density function is either uniform, triangular or gaussian.

Consider the uniform probability density function. For a/2 ≥ δ (see Fig. 6.4-4) we first calculate σ² as

σ² = a²/12

and then utilize this result to obtain n_r, viz:

n_r = (1/σ²) ∫ f(r) r p(r) dr = (2/σ²) [∫₀^δ (r²/a) dr + ∫_δ^(a/2) (δr/a) dr]
    = (√3/2)(δ/σ) - (1/(6√3))(δ/σ)³   for σ/δ ≥ 1/√3

For a/2 < δ, it is clear that the input never causes output saturation to occur. Hence, in this event we find

n_r = 1   for σ/δ < 1/√3

Similarly, for the triangular probability density function we calculate

σ² = a²/24

and

n_r = (√6/3)(δ/σ) [1 - (1/6)(δ/σ)² + (1/(12√6))(δ/σ)³]   for σ/δ ≥ 1/√6

n_r = 1   for σ/δ < 1/√6

For the gaussian probability density function we obtain the form

n_r = erf(δ/√2 σ)

in which the so-called probability integral, which is tabulated (Ref. 1), occurs. This result was depicted earlier in Fig. 6.2-2.

The results of these three calculations are plotted in Fig. 6.4-4. Qualitatively, at least, the relative insensitivity of the linearizing gain to the various input signal probability densities is apparent. This is obtained with other nonlinearities as well. It accounts, to some degree, for the success of CADET, given that the required probability densities are in fact never exactly known.

[Figure 6.4-4 Describing Function Results With Different Input Probability Densities]
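The gaussian curve of Fig. 6.4-4 is easy to check numerically. The short sketch below is an illustration, not part of the text; it evaluates Eq. (6.4-15) for the limiter by direct quadrature and compares the result with the probability-integral form quoted above. The σ/δ ratios are arbitrary.

```python
import numpy as np
from math import erf, sqrt, pi

# Limiter describing function under a gaussian input: Eq. (6.4-15) by
# quadrature vs. the closed form erf(delta/(sqrt(2)*sigma)).
def nr_quadrature(delta, sigma):
    r = np.linspace(-8.0 * sigma, 8.0 * sigma, 200001)
    f = np.clip(r, -delta, delta)                        # unity-slope limiter
    p = np.exp(-r**2 / (2.0 * sigma**2)) / (sqrt(2.0 * pi) * sigma)
    return np.sum(f * r * p) * (r[1] - r[0]) / sigma**2  # Eq. (6.4-15)

for ratio in (0.5, 1.0, 2.0):              # sigma/delta values spanning Fig. 6.4-4
    sigma, delta = ratio, 1.0
    print(ratio, nr_quadrature(delta, sigma), erf(delta / (sqrt(2.0) * sigma)))
```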

Example 6.4-2

Consider the problem of a pursuer and an evader, initially closing head-on in a plane. The evader has a random lateral acceleration maneuver, perpendicular to the initial line-of-sight, characterized by a first-order markov process with a standard deviation of 0.5g (16.1 ft/sec²). The pursuer has first-order commanded lateral acceleration dynamics with lateral acceleration saturation. Pursuer lateral acceleration guidance commands are taken as proportional to the pursuer-evader line-of-sight rate (proportional guidance), with the guidance gain set at 3. A block diagram of the system under consideration is shown in Fig. 6.4-5, where TGO is the time-to-intercept. Total problem time is 10 seconds.

In the absence of information to the contrary, we assume x₁ to be a gaussian random process. If it is, the nonlinearity output will be significantly nongaussian for σ > δ; but the action of linear filtering around the loop back to x₁ will be such as to reshape the density function, whatever it is, back towards gaussian (a result of the central limit theorem). This is the often cited filter hypothesis, common to most describing function analyses.

Making use of Eqs. (6.4-1) and (6.4-12), the quasi-linearized system can be represented in state vector notation as follows:

[Figure 6.4-5 Kinematic Guidance Loop With Pursuer Lateral Acceleration Saturation. Definitions: x₁ = commanded pursuer acceleration, x₂ = relative lateral separation, x₃ = relative separation rate, x₄ = evader lateral acceleration, w = white noise]

CADET and 200 run ensemble monte carlo simulation results for the evader acceleration (x₄) and the relative separation (x₂) are presented in Fig. 6.4-6 for the linearized system (δ = ∞) along with a 1g and a 0.1g pursuer lateral acceleration saturation level. The rms miss distance is given by the relative separation at 10 seconds (TGO = 0). The results clearly demonstrate good agreement between CADET and the monte carlo method, even when the nonlinear saturation effect is dominant, as in Figure 6.4-6d. The advantage of CADET, of course, is that it analytically computes the system statistics, thereby saving considerable computer time. More extensive applications of CADET are discussed in Ref. 25.

[Figure 6.4-6 Simulation Results for Various Levels of Pursuer Lateral Acceleration Saturation: CADET and monte carlo solutions, rms relative lateral separation and rms evader lateral acceleration vs. time (sec), panels (a)-(d)]


REFERENCES

1. Jazwinski, A.H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.

2. Wong, E., Stochastic Processes in Information and Dynamical Systems, McGraw-Hill Book Co., Inc., New York, 1971.

3. Fraser, D.C., "A New Technique for the Optimal Smoothing of Data," Ph.D. Thesis, Massachusetts Institute of Technology, January 1967.

4. Leondes, C.T., Peller, J.B., and Stear, E.B., "Nonlinear Smoothing Theory," IEEE Trans. Systems Science and Cybernetics, Vol. SSC-6, No. 1, January 1970.

5. Wishner, R.P., Tabaczynski, J.A., and Athans, M., "A Comparison of Three Non-Linear Filters," Automatica, Vol. 5, 1969, pp. 487-496.

6. Phaneuf, R.J., "Approximate Nonlinear Estimation," Ph.D. Thesis, Massachusetts Institute of Technology, May 1968.

7. Gelb, A. and Vander Velde, W.E., Multiple-Input Describing Functions and Nonlinear System Design, McGraw-Hill Book Co., Inc., New York, 1968.

8. Sunahara, Y., "An Approximate Method of State Estimation for Nonlinear Dynamical Systems," Joint Automatic Control Conference, University of Colorado, 1969.

9. Mahalanabis, A.K. and Farooq, M., "A Second-Order Method for State Estimation of Nonlinear Dynamical Systems," Int. J. Control, Vol. 14, No. 4, 1971, pp. 631-639.

10. Hildebrand, F.B., Methods of Applied Mathematics, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1952.

11. Fletcher, R. and Reeves, C.M., "Function Minimization by Conjugate Gradients," The Computer Journal, Vol. 7, 1964, p. 149.

12. Bellman, R.E. and Kalaba, R., Quasilinearization and Boundary Value Problems, American Elsevier Publishing Co., New York, 1965.

13. Chen, R.T.N., "A Recurrence Relationship for Parameter Estimation by the Method of Quasilinearization and Its Connection with Kalman Filtering," Joint Automatic Control Conference, Atlanta, Georgia, June 1970.

14. Pearson, J.B., "On Nonlinear Least-Squares Filtering," Automatica, Vol. 4, 1967, pp. 97-105.

15. Sage, A.P., Optimum Systems Control, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1968.

16. Sage, A.P. and Melsa, J.L., System Identification, Academic Press, New York, 1971.

17. Bryson, A.E., Jr. and Ho, Y.C., Applied Optimal Control, Blaisdell Publishing Co., Waltham, Mass., 1969.

18. Schmidt, S.F., Weinberg, J.D. and Lukesh, J.S., "Case Study of Kalman Filtering in the C5 Aircraft Navigation System," Case Studies in System Control, University of Michigan, June 1968, pp. 57-109.

19. Sorenson, H.W. and Stubberud, A.R., "Nonlinear Filtering by Approximation of the A Posteriori Density," Int. J. Control, Vol. 8, 1968, pp. 33-51.

20. Sorenson, H.W. and Alspach, D.L., "Recursive Bayesian Estimation Using Gaussian Sums," Automatica, Vol. 7, 1971, pp. 465-479.

21. Gelb, A. and Warren, R.S., "Direct Statistical Analysis of Nonlinear Systems," Proc. AIAA Guidance and Control Conf., Palo Alto, August 1972.

22. Sage, A.P. and Melsa, J.L., Estimation Theory with Applications to Communications and Control, McGraw-Hill Book Co., Inc., New York, 1971.

23. Biswas, K.K. and Mahalanabis, A.K., "Suboptimal Algorithms for Nonlinear Smoothing," IEEE Trans. on Aerospace and Electronic Systems, Vol. AES-9, No. 4, July 1973, pp. 529-534.

24. Davenport, W.B., Jr. and Root, W.L., An Introduction to the Theory of Random Signals and Noise, McGraw-Hill Book Co., Inc., New York, 1958.

25. Price, C.F., Warren, R.S., Gelb, A., and Vander Velde, W.E., "Evaluation of Homing Guidance Laws Using the Covariance Analysis Describing Function Technique," Trans. First NWC Symposium on the Application of Control Theory to Modern Weapons Systems, China Lake, June 1973, pp. 73-94.

PROBLEMS

Problem 6-1

Suppose that a parameter x has the probability density function

p(x) = λ² x e^(-λx)   for x ≥ 0
     = 0              for x < 0


with a specified value of λ. (a) Show that E[x] = 2/λ. (b) Show that the maximum value of p(x) occurs at x = 1/λ. (c) Consider an arbitrary density function, p(y), for a random variable. Determine restrictions on p(y) such that E[y] is the same as the value of y that maximizes p(y).

Problem 6-2

Prove that Eq. (6.1-5) reduces to

Ṗ = FP + PF^T + Q

when f(x) = Fx.

Problem 6-3

Consider a scalar nonlinear system

ẋ(t) = f(x) + w(t),  w(t) ~ N(0,q)

with measurements

z_k = x_k + v_k,  v_k ~ N(0,r)

Let the estimate of the state at time t_k be updated according to

x̂_k(+) = a_k + b_k z_k + c_k z_k²

Assuming that the conditional mean just before the update, x̂_k(-), is known, (a) show that x̂_k(+) is unbiased if

a_k + b_k x̂_k(-) + c_k [x̂_k²(-) + r] - x̂_k(-) = 0

(b) Determine b_k and c_k, such that x̂_k(+) is a minimum variance estimate.

Problem 6-4

Derive the estimation equations given in Table 6.1-3, following the steps outlined in the text.

Problem 6-5

Defining

s(t) = P_b^-1(t) x̂_b(t)

derive Eqs. (6.1-42) and (6.1-43) from Eqs. (6.1-39) and (6.1-40).

Problem 6-6

(a) Given a scalar nonlinear function f(x) of a random variable x, show that the constants a, b, and c which minimize the mean square error

E[(f(x) - a - bx - cx²)²]

are given by

b = [(4x̄²m₂ + 4x̄m₃ + m₄)(E[fx] - E[f] x̄) - (2x̄m₂ + m₃)(E[fx²] - E[f] E[x²])] / (m₂m₄ - m₃²)

c = [-(2x̄m₂ + m₃)(E[fx] - E[f] x̄) + m₂(E[fx²] - E[f] E[x²])] / (m₂m₄ - m₃²)

where x̄ = E[x] and mᵢ = E[(x - x̄)ⁱ] for i = 2, 3, 4.

(b) Show that if f(x) is an odd function of x, and if x̄ = m₃ = 0, then c = 0 and

b = E[fx] / m₂

Compare these values for b and c with Eqs. (6.2-7) and (6.2-10).

Problem 6-7

Suppose x is a zero mean gaussian random variable with variance σ². Show that E[xⁿ] = (1)(3)···(n-1) σⁿ for n even, and zero for n odd. (Hint: Use the characteristic function (Ref. 24) of the probability density for x.)

Problem 6-8

Supply the missing details in deriving the third equality in Eq. (6.2-34).

Problem 6-9

Employ Eq. (6.4-3) to arrive at the relationships which define the multiple-input describing functions Nm(m,S) and Nr(m,S), Eqs. (6.4-4) and (6.4-5).

Problem 6-10

Demonstrate that the describing function approximation error, e (see Fig. 6.4-2), is uncorrelated with the nonlinearity input. That is, show that E[e x^T] = 0.

Problem 6-11

For a scalar nonlinearity described by

f(x) = x + x³

show that the multiple-input describing function gains n_m and n_r are given by

n_m(m, σ_r²) = 1 + m² + 3σ_r²

n_r(m, σ_r²) = 1 + 3m² + 3σ_r²

(Hint: Use the result E[r⁴] = 3σ_r⁴, valid for a gaussian random variable r.)


Problem 6-12

For the ideal relay nonlinearity defined by

f(x) = D    for x > 0
     = 0    for x = 0
     = -D   for x < 0

show that the describing functions for gaussian, triangular and uniform input signal probability density functions are:

n_r(gaussian) = √(2/π) (D/σ) ≈ 0.80 D/σ

n_r(triangular) = (√6/3)(D/σ) ≈ 0.82 D/σ

n_r(uniform) = (√3/2)(D/σ) ≈ 0.87 D/σ

Problem 6-13

For the nonlinear differential equation

where w ~ N(b,q), show that the covariance analysis describing function technique (CADET) yields the following equations for mean and covariance propagation:

7. SUBOPTIMAL FILTER DESIGN AND SENSITIVITY ANALYSIS

After an excursion into the realm of nonlinear systems, measurements and filters in Chapter 6, we now direct our attention back to the more mathematically tractable subject of linear filtering, picking up where we left off at the conclusion of Chapter 5. The subject here, suboptimal linear filters and linear filter sensitivity, is of considerable practical importance to the present-day design and operation of multisensor systems.

The filter and smoother equations, developed in Chapters 4 and 5, provide a simple set of rules for designing optimal linear data processors. At first glance, the problem of filter design appears to have been solved. However, when the Kalman filter equations are applied to practical problems, several difficulties quickly become obvious. The truly optimal filter must model all error sources in the system at hand. This often places an impossible burden on the computational capabilities available. Also, it is assumed in the filter equations that exact descriptions of system dynamics, error statistics and the measurement process are known. Similar statements apply to the use of the optimal smoother equations. Because an unlimited computer capability is not usually available, the designer of a filter or smoother purposely ignores or simplifies certain effects when he represents them in his design equations; this results in a suboptimal data processor. For this reason, and because some of the information about the system behavior and statistics is not known precisely, the prudent engineer performs a separate set of analyses to determine the sensitivity of his design to any differences that might exist between his filter or smoother and one that fits the optimal mold exactly. This process is called sensitivity analysis.


[Figure 7.1-1 Illustration of Relative Computer Burden and Estimation Error for Three Data Processing Algorithms]

The sensitivity analysis discussed here is distinct from another procedure which may, from time to time, be given the same name in other writings - that of recomputing the optimal filter equations for several sets of assumptions, each time finding the accuracy which could be achieved if all the conditions of optimality were perfectly satisfied.

In addition to establishing the overall sensitivity of a particular linear data processing algorithm, the equations and procedures of sensitivity analysis can inform the filter designer of individual error source contributions to estimation errors. This type of source-by-source breakdown is valuable in assessing potential hardware improvements. For example, if, for a given system and filter, the error contribution from a particular component is relatively small, specifications on that piece of equipment can be relaxed without seriously degrading system performance. Conversely, critical component specifications are revealed when errors in a particular device are found to be among the dominant sources of system error.

This chapter discusses various proven approaches to suboptimal filter design. The equations for sensitivity analysis of linear filters and smoothers are presented, with emphasis on the underlying development of relationships for analyzing systems employing "optimal" filters. Several valuable methods of utilizing information generated during sensitivity analyses are illustrated, and a computer program, organized to enable study of suboptimal filter design and sensitivity analysis, is described.

7.1 SUBOPTIMAL FILTER DESIGN


The data-combination algorithm, or filter, for a multisensor system is very often a deliberate simplification of, or approximation to, the optimal (Kalman) filter. One reason, already noted, for which the filter designer may choose to depart from the strict Kalman filter formula is that the latter may impose an unacceptable computer burden. Figure 7.1-1 illustrates the common experience that a judicious reduction in filter size, leading to a suboptimal filter, often provides a substantially smaller computer burden with little significant reduction in system accuracy (the "conventional" data processing algorithm is understood to be a fixed-gain filter, arrived at by some other means).

Since solution of the error covariance equations usually represents the major portion of computer burden in any Kalman filter, an attractive way to make the filter meet computer hardware limitations is to precompute the error covariance, and thus the filter gain. In particular, the behavior of the filter gain elements may be approximated by a time function which is easy to generate using simple electronic components. Useful choices for approximating the filter gain are: constants, "staircases" (piecewise constant functions), and decaying exponentials. Whenever an approximation to the optimal filter gain behavior is made, a sensitivity analysis is required to determine its effect on estimation errors.

The filter designer may also depart from the optimal design because it appears too sensitive to differences between the parameter values he uses to design the filter and those which may exist in practice. By choosing an appropriate suboptimal design, it may be possible to reduce the sensitivity to uncertain parameters, as illustrated in Fig. 7.1-2. In the figure, it is assumed that there is a range of uncertainty in the value of σtrue, a certain parameter which is critical to the filter design. The value of the parameter that is used, σdesign, is chosen in the center of the interval in which σtrue is known to lie. Of course when σtrue ≠ σdesign the filter suffers some loss of accuracy, as indicated by the concave shape of the plots. In the hypothetical case shown, the theoretically optimal filter requires thirty states (n=30), but exhibits a wide variation in performance over the range of design uncertainty. The 10-state design is relatively insensitive but provides less accuracy than the 20-state filter shown, over the region of uncertainty. The minimum sensitivity filter of 20 states represents the "best" filter in this case. The best filter may be found, for example, by assuming a probability density function for σtrue and computing an expected value of the performance measure for each proposed filter, selecting the one with the lowest mean performance measure. It must be noted that reduced sensitivity is achieved at the price of a larger minimum error.

While the discussion of suboptimal filter design in this chapter centers on modifications to the basic Kalman filter procedure, observer theory (see Chapter 9) is also suggested as a viable technique for producing realizable filters.


[Figure 7.1-2 Conceptual Example of Designing for Minimum Sensitivity (σtrue held constant, σdesign is varied)]

CHOOSING SIMPLIFIED SYSTEM MODELS

As just indicated, the filter designer may need to reduce the number of states modeled not only because a fully optimal filter imposes too great a burden on the computer available, but also because the optimal filter may be too sensitive to uncertainties in the statistical parameters (spectral densities of system noises, etc.) which must be provided. There is emerging a body of theory which can help the designer to reduce the sensitivity of his filter; this facet of the design problem is discussed later. By and large, however, the filter designer is left with few rules of general applicability to guide him in the process of eliminating those parts of the full description of the system that can be ignored, rearranged or replaced by a simpler mathematical description. He must depend, first and foremost, on his physical understanding of the system with which he is dealing. The common procedure is to make a simplified model based on insight, then analyze the accuracy of the resulting filter in the presence of a complete set of system dynamics, properly represented. Thus, the approach is generally one of analysis rather than synthesis, and a number of steps are necessary before a satisfactory result emerges. The equations for such analyses are given in Section 7.2. The present discussion concentrates on particular filter simplification approaches which have proven successful in the past, illustrated by examples.

Decoupling States - The number of multiplications necessary to compute the error covariance matrix used in Kalman-based linear filters generally varies as the third power of the state size (for more details see Chapter 8). Therefore, the main thrust of attempts to reduce the computer burden imposed by such filters is aimed at reducing the filter state size. Often the number of state variables reaches an irreducible minimum and the filter still makes excessive demands on the computer. Sometimes, an alternative technique is available that complements the more obvious approach of deleting state variables; if certain portions of the system are weakly coupled, it may be possible to break the relatively high-order filter into several mutually exclusive lower-order filters - each with separate covariance calculations, filter gains, etc. The advantage is evident if one considers breaking an n-state filter into three n/3-state filters. Using the rule stated above, the n-state filter requires kn³ multiplications each time its covariance matrix is propagated. On the other hand, the three n/3-state filters require a total of 3k(n³/27) = kn³/9 multiplications to perform the same operation; the corresponding computer burden is thus reduced by about 90%.

Example 7.1-1

Consider the second-order coupled system, with continuous measurements (see Fig. 7.1-3), given by

ẋ₁ = -α₁x₁ + w₁
ẋ₂ = γx₁ - α₂x₂ + w₂
z₁ = x₁ + v₁
z₂ = x₂ + v₂     (7.1-1)

[Figure 7.1-3 Block Diagram of Example System]

where w₁, w₂, v₁ and v₂ are uncorrelated white noises. Notice that the subsystem whose state is x₁ receives no feedback from x₂. If γ is zero, the system is composed of two independent first-order markov processes. The estimation error covariance equations for a Kalman filter for the coupled, second-order system are

ṗ₁₁ = -2α₁p₁₁ + q₁₁ - p₁₁²/r₁₁ - p₁₂²/r₂₂     (7.1-2)

ṗ₁₂ = -(α₁ + α₂ + p₁₁/r₁₁ + p₂₂/r₂₂) p₁₂ + γp₁₁     (7.1-3)

ṗ₂₂ = -2α₂p₂₂ + q₂₂ - p₂₂²/r₂₂ - p₁₂²/r₁₁ + 2γp₁₂     (7.1-4)

where q₁₁, q₂₂, r₁₁ and r₂₂ are the spectral densities of w₁, w₂, v₁ and v₂, respectively.

If the coupling term γ is small, it is tempting to view the system as two separate, uncoupled systems, as follows (see Fig. 7.1-4):

ẋ₁ = -α₁x₁ + w₁,  z₁ = x₁ + v₁     (7.1-5a)

ẋ₂ = -α₂x₂ + w₂,  z₂ = x₂ + v₂     (7.1-5b)

[Figure 7.1-4 Block Diagram of Example System with γ Taken as Zero]

In this case the estimation error covariance equations are simply

ṗ₁₁ = -2α₁p₁₁ + q₁₁ - p₁₁²/r₁₁     (7.1-6)

ṗ₂₂ = -2α₂p₂₂ + q₂₂ - p₂₂²/r₂₂     (7.1-7)

Several qualitative observations can be drawn from a comparison of Eqs. (7.1-2), (7.1-3) and (7.1-4) with Eqs. (7.1-6) and (7.1-7). The expressions for ṗ₁₁ and ṗ₂₂ differ by terms proportional to p₁₂²; if p₁₂ is small, the decoupled filter error covariances are similar to those of the filter modeling the complete system. Furthermore, the term in ṗ₁₁ that involves p₁₂ always acts to reduce the error covariance in the complete filter; this suggests that the filter using the complete model of the system will always do a better job of estimating x₁. Finally, if we view the error covariances as varying very slowly (a quasi-static approximation), Eq. (7.1-3) shows that the estimation error correlation term p₁₂ can be viewed as the output of a stable first-order system (assuming α₁ > 0, α₂ > 0) driven by a forcing term γp₁₁. It can be seen that if the coupling coefficient is zero and p₁₂(0) = 0, the two sets of error covariance equations are identical.
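The qualitative comparison in Example 7.1-1 can be reproduced numerically. The sketch below is not from the text; all parameter values are illustrative assumptions. It integrates the coupled covariance equations (7.1-2) through (7.1-4) alongside the decoupled equations (7.1-6) and (7.1-7) and prints the values reached in steady state.

```python
import numpy as np

# Integrate the coupled covariance equations (7.1-2)-(7.1-4) and the
# decoupled ones (7.1-6)-(7.1-7). Parameter values are illustrative.
a1, a2, g = 1.0, 0.5, 0.1                 # alpha_1, alpha_2 and coupling gamma
q11, q22, r11, r22 = 1.0, 1.0, 0.5, 0.5

p11, p12, p22 = 1.0, 0.0, 1.0             # coupled-filter covariances
d11, d22 = 1.0, 1.0                       # decoupled-filter covariances
dt = 1e-3
for _ in range(20000):
    dp11 = -2*a1*p11 + q11 - p11**2/r11 - p12**2/r22            # Eq. (7.1-2)
    dp12 = -(a1 + a2 + p11/r11 + p22/r22)*p12 + g*p11           # Eq. (7.1-3)
    dp22 = -2*a2*p22 + q22 - p22**2/r22 - p12**2/r11 + 2*g*p12  # Eq. (7.1-4)
    p11, p12, p22 = p11 + dp11*dt, p12 + dp12*dt, p22 + dp22*dt
    d11 += (-2*a1*d11 + q11 - d11**2/r11)*dt                    # Eq. (7.1-6)
    d22 += (-2*a2*d22 + q22 - d22**2/r22)*dt                    # Eq. (7.1-7)
print("coupled:", p11, p22, "  decoupled:", d11, d22)
```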

Deleting States - When the computer burden indicates a need to delete states, it must be done very carefully, and always with some risk. Here the filter designer's knowledge of the physics of the problem and of how the filter works will assist him in selecting which states to eliminate first. Once the state vector and nominal dynamic and statistical parameters are selected, the resulting filter is subjected to a performance evaluation via sensitivity analysis techniques (see Section 7.2). If the performance is not acceptable, dominant error sources must be isolated and an effort made to reduce their effects. Where a dominant correlated error source was not estimated initially in the filter, it may be necessary to estimate it by adding extra states to the filter.* If it was too simply modeled in the filter, model complexity may have to be increased or model parameters adjusted to reduce the effect of this error source. Based on considerations of this type, the filter design is amended and the evaluation process repeated, thus starting a second cycle in this trial and error design approach.

*It should be emphasized that the act of estimating an additional important correlated error source does not, in itself, guarantee significantly improved filter performance.

Example 7.1-2

Consider the second-order system shown in Fig. 7.1-5 and given by Eq. (7.1-8),

ẋ₁ = -α₁x₁ + w
ẋ₂ = x₁ - α₂x₂     (7.1-8)

[Figure 7.1-5 Block Diagram of Eq. (7.1-8)]

Between measurements the error covariance equations for the state variables in Eq. (7.1-8) are given by

ṗ₁₁ = -2α₁p₁₁ + q     (7.1-9)

ṗ₁₂ = p₁₁ - (α₁ + α₂)p₁₂     (7.1-10)

ṗ₂₂ = 2p₁₂ - 2α₂p₂₂     (7.1-11)

where q is the spectral density of w. If the frequency content of x₁ is high compared to the bandwidth of the first-order loop whose output is x₂, the filter designer may wish to model the entire system by a single first-order markov process driven by white noise. In that case, only one state variable, x₂, remains. The spectral density, q', of the "white noise" driving this system is given by q' = 2p₁₁∞/α₁ = q/α₁² (Refs. 1 and 2), where p₁₁∞ is the steady-state value of p₁₁ calculated from Eq. (7.1-9). The error covariance equation for the lone state variable of the simplified system is

ṗ = -2α₂p + q'     (7.1-12)

In the steady state, the state error covariance of the simplified system can be found from Eq. (7.1-12) to be

p∞ = q'/2α₂ = q/2α₁²α₂     (7.1-13)

The steady-state error covariance of the corresponding state, x₂, in the full system is found from Eqs. (7.1-9), (7.1-10) and (7.1-11) to be

p₂₂∞ = q/2α₁α₂(α₁ + α₂)     (7.1-14)

It can be seen from Eqs. (7.1-13) and (7.1-14) that when the frequency content of x₁ is much larger than the bandwidth of the first-order system whose output is x₂ (i.e., α₁ ≫ α₂), the steady-state covariance for the output of the simplified system closely approximates that of the full system. Reference 3 explores this technique in more generality. It was also employed in Example 4.6-1.

Verification of Filter Design - We conclude this section with an attempt to impress upon the reader the importance of checking, by suitable sensitivity analyses, any suboptimal filter design he may produce. The motivation is best illustrated by discussing an unusual type of filter behavior that has been observed in many apparently well-thought-out suboptimal filter designs, when they were subjected to analysis which correctly accounts for all error sources. This anomalous behavior is characterized by a significant growth in errors in the estimate of a particular variable when some measurements are incorporated by the filter. The reason for this unusual performance is usually traced to a difference between error correlations indicated by the filter covariance matrix, and those that are found to exist when the true error behavior is computed.

Figure 7.1-6 helps to illustrate how incorrect correlations can cause a measurement to be processed improperly. Before the measurement the state estimates are x̂₀ and ŷ₀, with the estimation error distributions illustrated in the figure by ellipses which represent equal-error contours. The measurement indicates, within some narrow band of uncertainty not shown, that the state y is at ym. Because the measurement is quite precise, the filter corrects the estimate of the y coordinate, ŷ, to a value that is essentially equal to the measurement, ym. Because of the correlation that the filter thinks exists between the x and y coordinates, the x coordinate is corrected to x̂₁. Observe that no direct measurement of x is made, but the correlation is used to imply x from a measurement of y. If the correct error correlations were known, the filter would correct the state to ŷ₁ ≈ ym and x̂₁'. Because the filter covariance matrix indicated the wrong correlation, the filter has "corrected" its estimate of x in the wrong direction, thus increasing the error in its estimate of that variable. This sort of improper behavior cannot usually be anticipated when choosing states, etc., for a suboptimal filter design. It can only be observed by performing sensitivity analyses.

[Figure 7.1-6 Illustration of Improper Use of a Measurement]

CHOOSING SIMPLIFIED FILTER GAINS

It has been pointed out previously that the major portion of the computational burden imposed by the Kalman filter involves computing the filter error covariance matrix for use in determining the filter gains. While the previous discussions emphasized the role of reducing this effort by simplifying the system description and cutting the filter state size, it is sometimes possible and desirable to eliminate altogether the on-line covariance calculations. In these cases, the error covariance matrix is computed beforehand and the filter gain histories are observed. While a pre-recorded set of precise gain histories could be stored and used, a more attractive approach is to approximate the observed gain behavior by analytic functions of time, which can be easily computed in real time. These tend to be exponentials, staircases and constants. Figure 7.1-7 illustrates a staircase, or piecewise constant, approximation to an optimal time history for a single element in a filter gain matrix. The same gain element could also be well approximated by a decaying exponential function of time.

[Figure 7.1-7 Piecewise Constant Suboptimal Gain]

Approximating the Optimal Gains - Reference 4 discusses a system for which, as a practical matter, an investigation was performed to determine a set of piecewise constant filter gains that will approximate the performance of a Kalman optimal filter. Figure 7.1-8 illustrates such a gain approximation used during initial operation of the system. The continuous optimal gain curve was approximated by a piecewise constant gain; Fig. 7.1-9 shows one resulting error in the system. As expected, the suboptimal gain history produces larger errors for a certain duration of time. The interesting observation here is that the steady-state error does not suffer as a consequence of the gain approximation. In many systems, the longer convergence time may be an acceptable price to pay for not having to compute the error covariance and gain matrices on-line or to store their detailed time histories.

[Figure 7.1-8 Fixed-Gain Approximations (Ref. 4)]

[Figure 7.1-9 History of rms Error for Gain Choices Shown in Fig. 7.1-8 (Ref. 4)]

While it is usually possible to select good approximations to the optimal filter gains simply by observing their time behavior and using subjective judgement, some analysis has been performed which could be brought to bear on the selection of an optimal set of piecewise constant gains with which to approximate time-varying gain elements. The work is described in Ref. 5, and deals with the linear regulator problem whose formulation is similar to that of optimal estimation. Briefly, a set of discrete time intervals - tᵢ₋₁ ≤ t ≤ tᵢ, i = 1, 2, ..., N - is established and a constant controller gain matrix is sought for each of these intervals by minimizing the average quadratic cost function. A steepest descent technique is used to converge on the minimum. The same technique could be useful for selecting piecewise constant filter gains for a linear estimator. Figure 7.1-10 shows the one set of optimal piecewise constant gains, k₁(t), chosen in Ref. 5 and the optimal continuous gain they replace. It can be seen that the problem was solved for several arbitrary subdivisions of the time scale (one, two and eight subintervals). Note that the constant gains do not usually represent the average optimal continuous gain for the time interval they span.
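The effect of a staircase gain approximation is easy to examine for a scalar problem. The following sketch is an illustration under assumed parameter values, not the system of Ref. 4 or Ref. 5: it propagates the optimal gain history of a first-order markov filter, holds the gain constant over half-second steps, and evaluates the resulting error variance with the arbitrary-gain covariance equation quoted later as Eq. (7.2-3).

```python
import numpy as np

# Compare the optimal time-varying gain with a staircase approximation.
# System: xdot = -beta*x + w, z = x + v. All values are illustrative.
beta, q, r, p0 = 1.0, 1.0, 0.1, 10.0
dt, T = 1e-3, 5.0
t = np.arange(0.0, T, dt)

# Optimal filter: pdot = -2*beta*p + q - p^2/r, with gain k = p/r
p = np.empty_like(t); p[0] = p0
for i in range(len(t) - 1):
    p[i+1] = p[i] + (-2*beta*p[i] + q - p[i]**2/r)*dt
k_opt = p / r

# Staircase gain: hold the optimal gain value sampled every 0.5 sec
idx = np.rint(np.floor(t/0.5) * 0.5 / dt).astype(int)
k_stair = k_opt[idx]

# Actual error variance with the suboptimal gain, scalar form of Eq. (7.2-3):
# pdot = 2*(-beta - k)*p + q + k^2 * r
ps = np.empty_like(t); ps[0] = p0
for i in range(len(t) - 1):
    ps[i+1] = ps[i] + (2*(-beta - k_stair[i])*ps[i] + q + k_stair[i]**2 * r)*dt
print("optimal p(T):", p[-1], "  staircase p(T):", ps[-1])
```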

[Figure 7.1-10 Optimal and Suboptimal (Piecewise Constant) Feedback Gains for Third-Order System (Ref. 5)]

Using Steady-State Gains - The limiting case of a set of piecewise constant gains is choosing each gain to be constant over all time; logical choices for the constants are the set of gains reached when the filter error covariance equations are allowed to achieve steady state. The gain matrix is simply given by

K∞ = P∞ H^T R⁻¹

where the H and R matrices, as well as the dynamics matrix F, must be constant for the steady-state error covariance, P∞, to exist. More generally, H^TR⁻¹H must be constant; this condition is seldom satisfied if H and R are not constant. In Chapter 4 it is shown that a Kalman filter which uses gains derived from the steady-state covariance is, in fact, identical with the Wiener filter.
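For constant F, H, GQG^T and R, the steady-state covariance P∞ can be precomputed by solving the algebraic Riccati equation directly. The sketch below is an assumed two-state illustration, not an example from the text; it applies scipy's continuous-time ARE solver in its estimation (dual) form.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Precompute the steady-state filter covariance and gain K_inf = P_inf H^T R^-1.
# The two-state model below is an illustrative assumption.
F = np.array([[-1.0, 1.0], [0.0, -0.5]])
H = np.array([[1.0, 0.0]])
Q = np.diag([0.0, 1.0])          # GQG^T, process noise mapped into the states
R = np.array([[0.1]])

# Filter Riccati: F P + P F^T - P H^T R^-1 H P + Q = 0 (dual of the control ARE)
P_inf = solve_continuous_are(F.T, H.T, Q, R)
K_inf = P_inf @ H.T @ np.linalg.inv(R)
print(K_inf)
```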

The use of steady-state gains assumes that measurements are available over a sufficiently long period of time such that a steady-state condition in the filter is achieved before critical (in terms of system accuracy) points in time are reached. This approach forfeits the rapid convergence capability of the filter, which depends largely on time-varying gains to weight the first few measurements heavily when initial uncertainty about the state value is high. How much time is required for the fixed gain filter errors to approach steady state (and therefore satisfy conditions for which the filter was designed) is, of course, a function of the particular problem. It is sufficient to say that, in most cases, the constant gain filter will be considerably slower in reaching steady state than the time-varying filter whose steady-state gains the former may be using. A lower bound on the convergence time of the fixed-gain filter can be determined by noting how long it takes for the steady-state covariance matrix, on which those fixed gains are based, to develop.

Example 7.1-3

Consider the problem of estimating a random walk from a noisy measurement

ẋ = w,  w ~ N(0,q)
z = x + v,  v ~ N(0,r)

The estimation error covariance equation for the optimal filter is simply

ṗ = q - p²/r

In the steady state, ṗ∞ = 0 and

k∞ = p∞/r = √(q/r)

Observe that the steady-state gain weights the residuals, (z - x̂), highly when the process noise is high and the measurement noise is low. Also, the steady-state error is driven by the process and measurement noises alone.

Consider the error covariance matrix equations for a continuous Kalman filter and a Wiener filter

ṖK = FPK + PKF^T + GQG^T - PKH^TR⁻¹HPK     (Kalman filter)

ṖW = (F - K∞H)PW + PW(F - K∞H)^T + GQG^T + K∞RK∞^T     (Wiener filter)*

where

K∞ = P∞H^TR⁻¹

and P∞ is the steady-state Kalman filter error covariance. Reference 6 shows that an upper bound on the Hilbert or spectral norm (Ref. 7)** of the difference between the Kalman and Wiener error covariances is given by

‖PW(t) - PK(t)‖ ≤ ‖P(0) - P∞‖² ‖H^TR⁻¹H‖ / 8|αmax|     (7.1-15)

where P(0) is the initial error covariance and αmax is the maximum real part of the eigenvalues of (F - K∞H). Equation (7.1-15) indicates that an upper bound on the difference between the Kalman and Wiener filter errors is large when the difference between the initial and steady-state errors, ΔP(0) = P(0) - P∞, is large. Also, αmax is the inverse of the largest time constant in the Wiener filter; the bound on ΔP varies with that time constant. The matrix H^TR⁻¹H is recognized as being related to the information contained in each measurement, from the matrix inversion relationship, Eq. (4.2-19), of Chapter 4.

Example 7.1-4

Consider the Kalman and Wiener filters for the scalar system

ẋ = ax + w,  w ~ N(0,q)
z = bx + v,  v ~ N(0,r)

The Kalman filter error covariance equation is

ṗK = 2apK + q - (b²/r)pK²,  pK(0) = p₀

The solution of this equation is, from Problem 4-11,

pK(t) = [(ap₀ + q) sinh βt + βp₀ cosh βt] / [((b²/r)p₀ - a) sinh βt + β cosh βt]

where

β = a √(1 + b²q/(a²r))

*An error covariance differential equation for a filter with the structure of a Kalman filter but an arbitrary gain K is

Ṗ = (F - KH)P + P(F - KH)^T + GQG^T + KRK^T

Proof of this is left as an exercise for the reader (see Problem 7-7).

**The Hilbert norm of a matrix M, denoted ‖M‖, is given by

‖M‖ = √(λmax(M^TM))

where λmax(M^TM) is the largest eigenvalue of M^TM.


It can be shown by limiting arguments that

p∞ = (ar/b²)(1 + β/a)

The gain of the Wiener filter is thus

k∞ = (a + β)/b

Using Eq. (7.1-15), the upper bound on δp(t) = pW(t) - pK(t) is

δp(t) ≤ ‖p₀ - p∞‖² ‖b²/r‖ / 8|αmax|

which, given the properties of the Hilbert norm,* becomes

δp(t) ≤ (p₀ - p∞)² b² / 8rβ

*From the definition of the Hilbert norm, when M is a scalar, m, ‖m‖ = |m|.
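The bound of Eq. (7.1-15) can be verified numerically for this scalar example. In the sketch below (parameter values are illustrative assumptions), the Kalman and Wiener variance equations are integrated directly and the peak difference is compared against the bound; β is taken positive, consistent with a stable filter.

```python
import numpy as np

# Check of Eq. (7.1-15) for the scalar system of Example 7.1-4 by direct
# integration of the Kalman and Wiener covariance equations.
a, b, q, r, p0 = -1.0, 1.0, 1.0, 0.5, 4.0
beta = np.sqrt(a*a + b*b*q/r)          # beta taken positive
p_inf = (a + beta) * r / b**2          # steady-state Kalman covariance
k_inf = (a + beta) / b                 # Wiener filter gain; alpha_max = -beta

dt = 1e-4
pk, pw, dmax = p0, p0, 0.0
for _ in range(200000):
    pk += (2.0*a*pk + q - (b*b/r)*pk*pk) * dt               # Kalman filter
    pw += (2.0*(a - k_inf*b)*pw + q + k_inf*k_inf*r) * dt   # constant-gain filter
    dmax = max(dmax, pw - pk)

bound = (p0 - p_inf)**2 * b*b / (8.0 * r * beta)            # Eq. (7.1-15)
print(dmax, "<=", bound)
```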

MINIMUM SENSITIVITY DESIGN

In many problems dynamic and statistical parameters are known only to lie in certain bounded ranges. A game-theoretic approach to filter design in the presence of these parameter uncertainties has been formulated and studied in Refs. 8 and 9. Using any of the previously described state reduction techniques, a suboptimal filter form is selected a priori, but the filter parameters are left unspecified. Any filter performance measure is now a function of uncertain real world system parameters and the unspecified filter parameters. Let us denote the uncertain real world system parameters by a vector a and the unspecified filter parameters by the vector b. It is assumed that a and b lie in closed bounded sets, A and B, respectively. A convenient scalar measure for any suboptimal filter performance is then

J(a,b) = Trace [MP]     (7.1-16)

where M is a positive definite weighting matrix, included to balance the importance one assigns to each system error, and P is the filter error covariance matrix. J(a,b) is then simply a weighted sum of all system errors. For a given a, the minimum value of J, denoted J₀(a), is attained by the Kalman filter and is a function of a alone. Clearly, J(a,b) ≥ J₀(a) for all a ∈ A and all b ∈ B. Since a is unknown and b alone is available for selection by the designer, it seems most natural to view a and b as adversaries in the game-theoretic sense. With this in mind, three sensitivity measures and their associated rules of synthesis are appropriate. They are:

S₁ = min (b∈B) max (a∈A) J(a,b)

S₂ = min (b∈B) max (a∈A) [J(a,b) - J₀(a)]

S₃ = min (b∈B) max (a∈A) [J(a,b) - J₀(a)] / J₀(a)

The S₁ design simply minimizes the maximum value of J over the parameter set a ∈ A. This places an upper bound on the cost, and might be interpreted as a "worst case" design. The second and third criteria minimize the maximum absolute and relative deviations, respectively, of the filter error from optimum, over the parameter set a ∈ A. Thus, the S₂ and S₃ criteria force the filter error to track the optimum error within some tolerance over the entire set a ∈ A. In each case the above procedures yield a fixed value of b and, therefore, a fixed filter design good for all values of the uncertain parameters a.

Research to date has concentrated on the design of optimally insensitive filters in the presence of uncertain system and measurement noise statistics. Specifically, it has been assumed that elements of the system and measurement noise covariance matrices, Q and R respectively, were unknown. These unknown elements then constitute the vector a. For the S₁ filter, which is the easiest to find, a rule of thumb is available: a good initial guess for a is that value which maximizes J₀(a), denoted a* - i.e.,

J₀(a*) = max (a∈A) J₀(a)

When the filter state and the real world state are of the same dimension, this result is exact and the S₁ filter is simply the Kalman filter for a*.

Example 7.1-5

Consider a first-order plant with a noisy measurement

ẋ = -x + w,  w ~ N(0,q),  0 < q ≤ 1

z = x + v,  v ~ N(0,r),  0 < r ≤ 1

Given the filter equation

dx̂/dt = -x̂ + k(z - x̂),  k unspecified

select J = p∞, where p∞ is the steady-state value of the filter error covariance, and perform the minimization over the range b = k, k > -1 for stability, and the maximization over the

range of q and r. (Note that this is the same measure of filter performance expressed in Eq. (7.1-16).) Then

J = (k²r + q) / 2(1 + k),  J₀ = (r² + rq)^(1/2) - r     (7.1-17)

The S₁ value of J, S₁ = min (k > -1) max (q,r) [J(k,q,r)], occurs at q = r = 1 with k = √2 - 1. The S₂ criterion is satisfied when q = 0, r = 1 and q = 1, r = 0 with k = 1. Since J₀ = 0 when r or q are zero, the argument of the S₃ criterion is infinite at those boundaries of the r,q space under consideration, and the filter satisfying the S₃ criterion does not exist in this example.

Example 7.1-6

Consider the continuous first-order markov process with noisy measurements (β > 0)

ẋ = -βx + w,  w ~ N(0,q)

z = x + v,  v ~ N(0,r)
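A brute-force check of the S₁ result is straightforward. The sketch below (grid ranges and resolution are arbitrary choices, not from the text) evaluates J of Eq. (7.1-17) over a grid of q, r and k and locates the minimax gain, which approaches k = √2 - 1.

```python
import numpy as np

# Brute-force S1 design for Example 7.1-5 using J = (k^2 r + q)/(2(1 + k)).
ks = np.linspace(-0.9, 3.0, 4000)
qs = np.linspace(1e-3, 1.0, 60)
rs = np.linspace(1e-3, 1.0, 60)
QQ, RR = np.meshgrid(qs, rs)

# For each candidate gain, find the worst-case cost over the (q, r) set
worst = np.array([np.max((k*k*RR + QQ) / (2.0*(1.0 + k))) for k in ks])
k_s1 = ks[np.argmin(worst)]
print(k_s1, "vs analytic", np.sqrt(2.0) - 1.0)
```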

Assuming that the value of β is uncertain, a filter identical in form to the Kalman filter is chosen to estimate x, namely

dx̂/dt = -βf x̂ + k(z - x̂)

where it is required for stability that βf + k > 0. The covariance of the estimation error for this simple problem has a steady-state value, denoted here as p∞, and given by

p∞ = (k²r + q) / [2(βf + k)] + q(βf² - β²) / [2β(βf + k)(β + βf + k)]

Observe that p∞ is a function of both the unknown β and the unspecified filter parameters βf and k. The S₁, S₂ and S₃ filters described above were determined for this example for the case where q = 10, r = 1 and 0.1 ≤ β ≤ 1.

Figure 7.1-11 shows the error performance of the optimal, S₁, S₂, and S₃ filters versus the true value of β. The performance of a Kalman filter designed for β = 0.5 is also illustrated. Observe that the S₁ filter has the smallest maximum error, while the S₂ and S₃ filters tend to track the optimal performance more closely. The S₃ error covariance, for example, is everywhere less than 8% above optimum and is only 0.4% from optimum at β = 0.3. By comparison, the Kalman filter designed for a nominal value of β equal to 0.5 degrades rapidly as the true value of β drops below 0.5.

This example shows that the minimax filters can achieve near optimal performance over wide parameter variations. The filter implemented is identical in form to the Kalman filter and thus avoids the additional mechanization complexity of adaptive schemes sometimes suggested.

[Figure 7.1-11 Estimation Error Covariance Comparison as a Function of True System Bandwidth (Ref. 9)]

7.2 SENSITIVITY ANALYSIS: KALMAN FILTER

The statistical (or covariance) analysis which determines the true behavior of estimation errors in a suboptimal linear filter is necessarily more complex than the work presented in previous chapters. The equations are developed below. Emphasis here is placed on the technical aspects of the analysis, rather than the motivation. Also, the equations developed, while correct, are not necessarily most efficient, because emphasis here is placed on the tutorial presentation of the subject. References 10 and 11 provide alternative sensitivity equations which may require less computer space and time.

Observe the structure of the Kalman filter, illustrated in Fig. 7.2-1, for the continuous case; the filter contains an exact model of the system dynamics and measurement process. Additionally, note that the filter gain matrix is calculated using the exact models of dynamics and measurement and exact knowledge of the process noise covariance (and the influence matrix G), measurement error covariance, and initial estimation error covariance.

There are two broad questions we can ask with respect to the sensitivity of the filter: "How does the error covariance behave if we make approximations in computing the gain matrix K, but use the correct values of F and H in the implemented filter?" and "How does the error covariance behave if we compute

the gain matrix in some manner (optimal or otherwise), and use wrong values of F and H in the implemented filter?" The first question is relatively easy to answer, while the second question requires a considerable amount of extra calculation.

[Figure 7.2-1 Block Diagram of the Continuous Filter Equations]

Under the assumptions with which we are presently dealing, Fig. 7.2-1 (which illustrates the similarity between the filter and the system) can be rearranged to provide a corresponding block diagram for estimation error dynamics, shown in Fig. 7.2-2. A similar error block diagram for the discrete filter is shown in Fig. 7.2-3. The error equations illustrated in these two figures are used in the derivation of Eqs. (7.2-1), (7.2-2) and (7.2-3).

EXACT IMPLEMENTATION OF DYNAMICS AND MEASUREMENTS

The error covariance relationships for a discrete filter with the same structure as the Kalman filter, but with an arbitrary gain matrix, are (see Problem 7-6):

P_k(+) = (I - K_kH_k) P_k(-) (I - K_kH_k)^T + K_kR_kK_k^T     (7.2-1)

P_k+1(-) = Φ_k P_k(+) Φ_k^T + Q_k     (7.2-2)

A single equation describes the corresponding error propagation for the continuous filter (see Problem 7-7), viz:

Ṗ = (F - KH)P + P(F - KH)^T + GQG^T + KRK^T     (7.2-3)

[Figure 7.2-2 Block Diagram of Estimation Error Dynamics of a Continuous Filter: System Dynamics and Measurement Process Perfectly Modeled in the Filter]

[Figure 7.2-3 Block Diagram of Estimation Error Dynamics of a Discrete Filter: System Dynamics and Measurement Process Perfectly Modeled in the Filter]

If the expressions for the optimal gain matrix provided earlier are substituted into Eqs. (7.2-1) and (7.2-3), they reduce to the previously stated covariance relations for the optimal filter, in which the K matrix does not appear explicitly.

Equations (7.2-3) or (7.2-1) and (7.2-2), together with the covariance equations for the optimal filter and the definition of the gain matrix, can be used to determine the effect of using incorrect values of F, H, G, R or P(0) in the calculation of the Kalman gain matrix. The procedure involves two steps which can be performed either simultaneously or in sequence. In the latter case K is computed and stored for later use in Eqs. (7.2-1) or (7.2-3).


Step 1: In previous chapters the following error covariance equations, in which the filter gain matrix, K, does not appear explicitly, were derived. For the discrete-time case,

P_k(+) = P_k(-) - P_k(-)H_k^T [H_kP_k(-)H_k^T + R_k]⁻¹ H_kP_k(-)     (7.2-4)

P_k+1(-) = Φ_k P_k(+) Φ_k^T + Q_k     (7.2-5)

or, for the continuous-time case,

Ṗ = FP + PF^T + GQG^T - PH^TR⁻¹HP     (7.2-6)

Using Eqs. (7.2-4) and (7.2-5), or Eq. (7.2-6), and design* values of F, H, G, Q, R and P(0), compute the error covariance history. Also, using the equations from previous discussions, compute the filter gain matrix which would be optimal if the design values were correct.

Step 2: Inserting the gain matrix computed in Step 1 into Eq. (7.2-1) or (7.2-3), and using the correct values of F, H, G, Q, R and P(0) (which are to be implemented in the filter), compute the "actual" error covariance history.

Because Eqs. (7.2-1), (7.2-2) and (7.2-3) are based only on the structure of the Kalman filter, and not on any assumption that the optimal gain matrix is employed, they can be used to analyze the filter error covariance for any set of filter gains. This permits investigation of proposed sets of precomputed gains or simplified filter gains, such as decaying exponentials, etc., assuming the correct F (or Φ) and H matrices are implemented in the filter. In that case, the gain matrix is simply inserted in Eqs. (7.2-1) or (7.2-3) and, using correct values for F, H, G, Q, R and P(0), Eqs. (7.2-1) and (7.2-2), or (7.2-3), are used to compute the error covariance.

Answering the second question posed in the introduction to this section is more difficult. It is tempting to compute a filter gain matrix based on Eqs. (7.2-4) and (7.2-5) or (7.2-6) and a set of design values, and insert it into Eq. (7.2-1) or (7.2-3); essentially, this is following the same procedure as outlined above. The fallacy of this approach is that Eqs. (7.2-1) through (7.2-6) are all based on the assumption that the system dynamics and measurement process are identical in the Kalman filter implementation and the real world - the set of circumstances treated in the previous section.

*Design values are those used to derive the filter gain matrix. The filter implementation requires specification of system dynamics and measurement matrices, which do not necessarily have to be the same as the corresponding matrices used in filter gain matrix design.

INCORRECT IMPLEMENTATION OF DYNAMICS AND MEASUREMENTS

Continuous Filter - The procedure for deriving error covariance equations which can be used to answer the second question posed is quite similar to that used in the earlier derivations in Chapters 3 and 4. The error sensitivity equations for the continuous filter are derived as follows. From Chapter 4 we know that the equations for the state and the estimate are given by

ẋ = Fx + Gw

dx̂/dt = F*x̂ + K*(z - H*x̂)     (7.2-7)

where the asterisked quantities K*, H* and F* represent the filter gain and the measurement process and system dynamics implemented in the filter; H and F represent the actual measurement process and system dynamics. In this derivation, it is assumed that the state variables estimated and those needed to completely describe the process are identical. The sensitivity equations for the case when the estimated state is a subset of the entire state are provided at the end of the derivation. Equation (7.2-7) illustrates the fact that the actual system dynamics and measurement process, represented by the matrices F and H, are not faithfully reproduced in the filter - i.e., F ≠ F*, H ≠ H*. The error in the estimate, x̃ = x̂ - x, obeys, from Eq. (7.2-7),

dx̃/dt = (F* - K*H*)x̂ - (F - K*H)x - Gw + K*v     (7.2-8)

Letting

ΔF ≜ F* - F   and   ΔH ≜ H* - H

and recalling the relation between z, x and v, Eq. (7.2-8) can be written as

dx̃/dt = (F* - K*H*)x̃ + (ΔF - K*ΔH)x - Gw + K*v     (7.2-9)

A new vector, x', is defined by

x' = [x̃]
     [x ]

The differential equation for x' in vector-matrix form is, from Eqs. (7.2-7) and (7.2-9),

dx'/dt = A x' + B [w]     A = [F*-K*H*   ΔF-K*ΔH]     B = [-G   K*]
                  [v]         [0          F      ]         [ G   0 ]     (7.2-10)


Note that Eq. (7.2-10) is in the form of the differential equation for the state of a linear system driven by white noise. From Chapter 4, the covariance equation for x' is

(d/dt)P' = F'P' + P'F'^T + G'E[y'y'^T]G'^T   (7.2-11)

where F' and G' denote the coefficient matrices of Eq. (7.2-10) and y'^T = [w^T  v^T]. The quantity of interest, the covariance of x̃, is the upper left corner of the covariance matrix of x':

P' = [P    V^T]
     [V    U  ]   (7.2-12)

where V ≜ E[x x̃^T] and U ≜ E[x x^T]. Since the initial uncertainty in the estimate is identical with the uncertainty in the state, each partition of P'(0) is determined by P(0). Combining Eqs. (7.2-11) and (7.2-12), and expressing the expected value of y'y'^T in terms of the spectral density matrices Q and R, yields the partitioned covariance equation, Eq. (7.2-13). Breaking Eq. (7.2-13) into its component parts, the error sensitivity equations become

Ṗ = (F* - K*H*)P + P(F* - K*H*)^T + (ΔF - K*ΔH)V + V^T(ΔF - K*ΔH)^T + GQG^T + K*RK*^T   (7.2-14)

V̇ = FV + V(F* - K*H*)^T + U(ΔF - K*ΔH)^T - GQG^T   (7.2-15)

U̇ = FU + UF^T + GQG^T   (7.2-16)

Note that when the actual system dynamics and measurement process are implemented (ΔF = 0, ΔH = 0), Eqs. (7.2-14), (7.2-15) and (7.2-16) reduce to Eq. (7.2-3), as expected. There can be only one gain matrix of interest here, the one implemented in the filter; consequently, K* = K.

If the state vector of the implemented filter contains fewer elements than the correct state vector, F and F*, and H and H*, will not be compatible in terms of their dimensions. They can be made compatible by the use of an appropriate nonsquare matrix. For example, if the state vector implemented consists of the first four elements in a six-element true state vector, P is 6 × 6 and P* is 4 × 4. The difference matrix ΔF can be defined by

ΔF = W^T F*W - F   (7.2-17)

where the matrix W accounts for the dimensional incompatibility between F and F*,

W = [1 0 0 0 0 0]
    [0 1 0 0 0 0]
    [0 0 1 0 0 0]
    [0 0 0 1 0 0]

A similar transformation can make H* compatible with H,

ΔH = H*W - H   (7.2-18)

The matrix W is, in fact, the transformation between the true state and that implemented,

x_implemented = W x_true

The sensitivity equations for a continuous filter which does not estimate the entire state vector are

Ṗ = W^T(F* - K*H*)WP + PW^T(F* - K*H*)^T W + (ΔF - W^T K*ΔH)V + V^T(ΔF - W^T K*ΔH)^T + GQG^T + W^T K*RK*^T W

V̇ = FV + VW^T(F* - K*H*)^T W + U(ΔF - W^T K*ΔH)^T - GQG^T

U̇ = FU + UF^T + GQG^T   (7.2-19)

where ΔF and ΔH are defined in Eqs. (7.2-17) and (7.2-18). A more complex set of sensitivity equations, covering the case where the filter state is a general linear combination of the true states, is given in Refs. 10 and 11 for both continuous and discrete linear filters.
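Although the text exercises these equations analytically in the examples that follow, they are also easy to integrate numerically. The sketch below Euler-integrates the scalar form of Eqs. (7.2-14) through (7.2-16); every numerical value in it (F = -1, a mismatched F* = -1.2, unit noise densities, a fixed gain of 0.4, and the initial conditions) is a hypothetical choice for illustration only:

    # Scalar illustration of the sensitivity equations (7.2-14)-(7.2-16).
    # True system: (d/dt)x = F x + w;  measurement: z = H x + v.
    # The implemented filter uses a mismatched F*, H* and a fixed gain K*.
    F, H, G = -1.0, 1.0, 1.0        # actual dynamics and measurement
    Fs, Hs = -1.2, 1.0              # implemented (starred) values, F* != F
    q, r = 1.0, 1.0                 # spectral densities of w and v
    Ks = 0.4                        # any fixed filter gain may be studied

    dF, dH = Fs - F, Hs - H         # Delta-F and Delta-H of Eq. (7.2-9)
    P, V, U = 1.0, 0.0, 1.0         # illustrative initial conditions

    dt = 1.0e-3
    for _ in range(int(20.0 / dt)):  # integrate toward steady state
        A = Fs - Ks * Hs             # F* - K*H*
        B = dF - Ks * dH             # Delta-F - K* Delta-H
        Pdot = 2.0 * A * P + 2.0 * B * V + G * q * G + Ks * r * Ks  # (7.2-14)
        Vdot = F * V + V * A + U * B - G * q * G                    # (7.2-15)
        Udot = 2.0 * F * U + G * q * G                              # (7.2-16)
        P, V, U = P + Pdot * dt, V + Vdot * dt, U + Udot * dt

    print("steady-state error covariance P =", round(P, 4))

With ΔF = ΔH = 0 the same loop reproduces Eq. (7.2-3), which provides a convenient check on an implementation.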


It should be emphasized that the only purpose of the transformation W is to account for the elimination of certain state variables in the implementation. The approach illustrated here will not cover the situation where entirely different sets of state variables are used.

Discrete Filter - The sensitivity equations for the discrete filter which correspond to Eqs. (7.2-14) through (7.2-16) are, between measurements,

P_k+1(-) = Φ_k* P_k(+) Φ_k*^T + Φ_k* V_k^T(+) ΔΦ_k^T + ΔΦ_k V_k(+) Φ_k*^T + ΔΦ_k U_k ΔΦ_k^T + Q_k

V_k+1(-) = Φ_k V_k(+) Φ_k*^T + Φ_k U_k ΔΦ_k^T - Q_k

U_k+1 = Φ_k U_k Φ_k^T + Q_k   (7.2-20)

and across a measurement,

P_k(+) = (I - K_k*H_k*) P_k(-) (I - K_k*H_k*)^T - (I - K_k*H_k*) V_k^T(-) ΔH_k^T K_k*^T - K_k* ΔH_k V_k(-) (I - K_k*H_k*)^T + K_k* ΔH_k U_k ΔH_k^T K_k*^T + K_k* R_k K_k*^T   (7.2-21)

where

ΔΦ ≜ Φ* - Φ and ΔH ≜ H* - H

When ΔΦ and ΔH are zero, Eqs. (7.2-20) and (7.2-21) become identical with Eqs. (7.2-2) and (7.2-1), respectively.

It is worth noting here that if all the states whose values are estimated by the filter are instantaneously adjusted after each measurement according to

x_corrected = x - x̂(+)   (7.2-22)

the estimate of the state following such a correction must be zero. In that case, the equation for propagating the estimate between measurements and incorporating the next measurement is

x̂_k+1(+) = K_k+1 z_k+1   (7.2-23)

and there is no need for specifying filter matrices Φ* and H*. Equations (7.2-1) and (7.2-2) are the equations which apply in this case. Instantaneous adjustments can be made if the states being estimated are formed from variables stored in a digital computer.

7.3 SENSITIVITY ANALYSIS EXAMPLES

This section provides the reader with several examples of the application of sensitivity analysis.

Example 7.3-1

Consider a stationary, first-order markov process where the state is to be identified from a noisy measurement:

ẋ(t) = -βx + w,  w ~ N(0,q)
z(t) = x + v,  v ~ N(0,r)   (7.3-1)

Equation (7.2-3) becomes the scalar equation

ṗ = (-β-k)p + p(-β-k) + q + k²r   (7.3-2)

In the steady state, we set ṗ∞ = 0, and thus find

p∞ = (q + k²r) / (2(β + k))   (7.3-3)

The optimal k (denoted k₀) is computed as p∞h/r (here h = 1) or, equivalently, from

∂p∞/∂k = 0

both of which yield

k₀ = -β + √(β² + q/r)   (7.3-4)

The variation of p∞ as a function of k is illustrated in Fig. 7.3-1.

Figure 7.3-1 Sensitivity Curve, p∞ vs k
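The scalar results of this example are easily checked numerically. The short sketch below evaluates Eqs. (7.3-3) and (7.3-4) for the illustrative case β = q = r = 1 and sweeps k to trace out a curve of the kind shown in Fig. 7.3-1:

    import math

    def p_inf(k, beta=1.0, q=1.0, r=1.0):
        """Steady-state error covariance of the fixed-gain filter, Eq. (7.3-3)."""
        return (q + k**2 * r) / (2.0 * (beta + k))

    def k_opt(beta=1.0, q=1.0, r=1.0):
        """Optimal fixed gain, Eq. (7.3-4)."""
        return -beta + math.sqrt(beta**2 + q / r)

    k0 = k_opt()
    print("k0 =", round(k0, 3))                  # 0.414 for beta = q = r = 1
    print("p_inf(k0) =", round(p_inf(k0), 3))    # minimum of the curve
    for k in (0.1, k0, 1.0, 2.0):                # bowl-shaped sensitivity curve
        print(round(k, 3), round(p_inf(k), 3))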


If the filter is designed for particular values of r and q, k is fixed according to Eq. (7.3-4). It can be seen from Eq. (7.3-3) that, with k fixed, the steady-state error covariance varies linearly with the actual process noise spectral density or measurement error spectral density. This is illustrated in Fig. 7.3-2.

If the true noise variances are assumed fixed and the design values of q and r are varied, quite different curves result. Any deviation of the design variances, and consequently k, from the correct values will cause an increase in the filter error variance. This is a consequence of the optimality of the filter, and is illustrated in Fig. 7.3-3. The sensitivities of the first-order process shown in Figs. 7.3-2 and 7.3-3 are similar to those observed in higher-order systems.

Example 7.3-2

Continuing with the example, let us suppose that there is an uncertainty in the value of the coefficient β in the system dynamics. Furthermore, let us suppose that the filter designer selects a design value β* and is certain of the mean square magnitude, σ², of the state x. Consequently, the designer always picks q* = 2β*σ². In this case, the steady-state estimation error covariance can be shown to be

p∞ = [2βσ² + (λ - β*)²r] / [2(β - β* + λ)]   (7.3-5)

where λ = √(β*² + 2β*σ²/r)

Figure 7.3-4 is a plot of p∞ as a function of β*, for the case β = r = σ² = 1. Notice the similarity with Fig. 7.3-3.
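Since Eq. (7.3-5) is just Eq. (7.3-3) evaluated at the design gain k = λ - β* with the true process noise density q = 2βσ², it can be verified mechanically. The sketch below does so for β = r = σ² = 1, the case plotted in Fig. 7.3-4; the function name is arbitrary:

    import math

    def p_inf_design(beta_star, beta=1.0, r=1.0, sigma2=1.0):
        """Actual steady-state covariance, Eq. (7.3-5): the filter is designed
        with beta* and q* = 2 beta* sigma^2, but the true coefficient is beta."""
        lam = math.sqrt(beta_star**2 + 2.0 * beta_star * sigma2 / r)
        k = lam - beta_star              # design gain, from Eq. (7.3-4)
        q_true = 2.0 * beta * sigma2     # true process noise density
        return (q_true + k**2 * r) / (2.0 * (beta + k))   # Eq. (7.3-3)

    # The curve has its minimum at beta* = beta, as in Fig. 7.3-4:
    for bs in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(bs, round(p_inf_design(bs), 4))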

Figure 7.3-2 Effect of Changing Actual Noise Variances [(a) variation of actual process noise; (b) variation of actual measurement noise]

Figure 7.3-3 Effects of Changing Design Values of Noise Variances [(a) variation of design process noise; (b) variation of design measurement noise]

Figure 7.3-4 Effect of Varying Design Value of System Dynamics Parameter [p∞ vs β*]

Example 7.3-3

To illustrate the effect of different system dynamics models, we consider the system shown in Fig. 7.3-5a. The state of interest is influenced by the forcing function. A measurement, several integrations removed, is used to estimate the system state. The two possible models for the forcing function considered are: a first-order markov process and a random ramp, both of which are illustrated in Fig. 7.3-5b. Figures 7.3-6, 7.3-7, and 7.3-8 illustrate the results of research into the importance of that model.

Figure 7.3-6 Filter Based on Exponential Correlation [rms estimation error vs rms slope of the random ramp, γ; values used in filter design: time constant 3 hrs, standard deviation 0.002]

In Fig. 7.3-6, the filter has been designed with the forcing function modeled as a first-order markov process but the

actual forcing function is a random ramp. The rms error in the estimate of the state of interest is shown as a function of the rms ramp slope γ. In Fig. 7.3-7, the filter has been redesigned with the forcing function modeled as a random ramp and the value of γ in the filter design is assumed to be always correct (i.e., the filter and actual values of γ coincide for each value of γ shown; in this sense the filter is always optimal).

It can be seen by comparing Figs. 7.3-6 and 7.3-7 that having the correct system model makes a dramatic improvement in filter performance. However, the situation depicted in Fig. 7.3-7 is optimistic, since the designer cannot hope to have perfect information about the rms slope of the random ramp. Figure 7.3-8 illustrates a more realistic situation in which the designer has picked a nominal value for γ (i.e., 10⁻⁶). The effect of the existence of other values of γ is shown. It is evident that some deterioration in performance takes place when the actual forcing function is greater than that presumed in the filter design. However, it is interesting to note, by comparing Figs. 7.3-6, 7.3-7 and 7.3-8, that the form chosen for the system model has a much greater impact on accuracy in this case than the numerical parameters used to describe the magnitude of the forcing function.

Example 7.3-4

We conclude this section with an illustration of how the sensitivity analysis equations, developed in Section 7.2, can be used to help choose which states will provide the most efficient suboptimal filter and what deterioration from the optimal filter will be experienced. Figure 7.3-9 is drawn from a study of a multisensor system in which the complete (optimal) filter would have 28 state variables. It shows the effect of deleting state variables from the optimal filter. A judicious reduction from 28 to 16 variables produces less than a 1% increase in the error in the estimate of an important parameter, suggesting that the 12 states deleted were not significant. Overlooking for the moment the case of 13 state variables, we can see that further deletions of variables result in increasing estimation errors. But even the removal of 19 states gives rise to only a 5% increase in estimation error.

Figure 7.3-5 System Used to Illustrate the Effect of Different System Dynamics Models [(a) the system: the forcing function drives a chain of integrators, with initial conditions, to the measured state; (b) forcing functions considered: first-order markov process and random ramp]

Figure 7.3-7 Filter Based on Ramp with Correct rms Slope [rms estimation error vs rms slope of the random ramp, γ]

Figure 7.3-8 Filter Based on Ramp with Fixed rms Slope [value used in filter design: γ = 10⁻⁶]


Figure 7.3-9 Effect of Reduction in Number of Estimated State Variables [bar chart of the increase in estimation error vs number of filter state variables (9, 10, 12, 13, 16); the complete formulation has 28 state variables]

Of course, the case of 13 variables does not fit the general pattern. The point illustrated here is that care must be exercised in removing states. The 13 variables represented by this bar are not the twelve variables represented by the bar to its left, with one addition. (If they were, a lower error than that shown for 12 states could generally be expected.) They are a different subset of the original 28 states and they have obviously not been well chosen. This kind of information comes to the filter designer only through the careful exercise of the sensitivity analysis equations. It is behavior such as this that keeps good filter design a mixture of both art and science.

7.4 DEVELOPING AN ERROR BUDGET

Error budget calculations are a specialized form of sensitivity analysis. They determine the separate effects of individual error sources, or groups of error sources, which are thought to have potential influence on system accuracy. The underlying assumption is that a Kalman-like linear filter is designed, based on some choice of state variables, measurement process, noise spectra, etc. ("filter model"). When that filter is employed, all of the error sources ("truth model"), whether modeled in the filter or not, contribute in their own manner to errors in the filter estimate. The error budget is a catalog of those contributions. This section describes how the error budget calculations are performed. The discussion treats the discrete filter in some detail, but the approach is valid for continuous filters as well.

Briefly, the steps required to evaluate a proposed filter design by producing an error budget are as follows: First, using the filter designer's rules, determine the time history of the filter gain matrix. Then, using the complete model of error sources, evaluate system errors. Developing an error budget involves determining the individual effects of a single error source, or group of error sources. These steps are illustrated in Fig. 7.4-1. Their implementation leads to a set of time histories of the contributions of the sets of error sources treated. The error budget is a snapshot of the effects of the error sources at a particular point in time. Any number of error budgets can be composed from the time traces developed. In this way, information can be summarized at key points in the system operation. As illustrated in Fig. 7.4-1, each column of the error budget can represent the contributions to a particular system error of interest, while a row can represent the effects of a particular error source or group of error sources. For example, the columns might be position errors in a multisensor navigation system, while one of the rows might be the effects on those position errors of a noise in a particular measurement. Assuming the error sources are uncorrelated, the total system errors can be found by taking the root-sum-square of all the contributions in each column. Error sources correlated with each other can also be treated, either by including them within the same group or by providing a separate line on the error budget to account for correlation.

Figure 7.4-1 Diagram Illustrating the Steps in Error Budget Development [the filter covariance yields an (optimistic) filter-indicated performance; the truth-model covariance yields realistically projected system performance]

Notice that Fig. 7.4-1 indicates that the (optimal) filter error covariance equations are nonlinear [see Eqs. (7.2-4) through (7.2-6)] while the sensitivity equations [Eqs. (7.2-19) through (7.2-21)] are linear. That is, the sensitivity equations are linear in F (or Φ), H, R, Q, and initial conditions once the filter (F*, H*, R*, Q* and K*) is fixed. As a consequence of the linearity of the sensitivity equations, it is a simple matter to develop sensitivity curves once the error budget has been assembled. This will be illustrated in a later example. The figure also points out that the filter covariance calculations are generally an optimistic indication of the accuracy of the filter. When all the error sources are


considered, the true filter accuracy is usually seen to be somewhat poorer than the filter covariance, per se, would lead one to believe. Certain steps in the generation of an error budget are now discussed in some detail.

Filter Gain Calculations - Each Kalman filter mechanization specifies a particular set of matrices - a filter representation of state dynamics (symbolically, the matrix F* or Φ*), a filter representation of the measurement process (H*), a process noise covariance matrix (Q*), and a measurement noise covariance matrix (R*). These matrices, combined with an initial estimation error covariance [P*(0)], are used to compute the time history of filter error covariance and, subsequently, the filter gain matrix, K*. The filter then proceeds to propagate its estimate and process incoming measurements according to the Kalman filter equations, using K*, H* and F* (or Φ*). All of the above-listed matrices [F*, H*, Q*, R*, P*(0)] must be provided in complete detail.

True Error Covariance Calculations - If all of the pertinent error effects are properly modeled in the implemented filter, the error covariance matrix computed as described above would be a correct measure of the system accuracy. However, for reasons discussed previously, many sources of error are not properly accounted for in the design of most filters, and the consequences of ignoring or approximating these error effects are to be investigated. The behavior of each element of the so-called "truth model", whether or not included in the filter design under evaluation, must be described by a linear differential equation. For example, an individual error, α, which is represented as a first-order markov process obeys the differential equation

α̇ = -α/τ_a + w

where τ_a is a time constant and w is a white noise. Information concerning the rms magnitude of such a variable is provided by its initial covariance and in the spectral density ascribed to w. When a constant error source such as a measurement bias error, b, is considered, it obeys the linear differential equation

ḃ = 0

In this case, the rms value of the error is entirely specified by an initial covariance parameter. All of the untreated sources of error are added to the filter states to form an augmented "state" vector for performing the error budget calculations.* The dynamics of the augmented state are represented by a matrix F or its counterpart in the discrete representation, Φ. An initial covariance matrix, P(0), is also formed for the augmented state.

"The augmented state vector is not to be confused with the vector x' whose dynamics aregiven in Eq. (1.2~10). The augmented state vector discussed here is always represented bythe.! component of i in Section 1.2.


Of course, it is essential to the systems under consideration that measurements be incorporated to reduce errors in estimates of the variables of interest. The gain matrix designed for each filter only contains enough rows to account for the number of states modeled in the filter. But the augmented state is generally much larger, and a convenient device is used to make the filter gain matrix, K*, conformable for multiplications with the covariance matrix for the new state. A matrix W is defined and a new gain matrix is formed as follows:

K' = WK*

If the augmented state is formed by simply adding new state variables to those used in the filter, we can write

W = [I]
    [0]

where the identity matrix has dimensions equal to that of the filter state. Clearly, filter states which are more complex (but linear) combinations of the augmented state can be accommodated by proper definition of W.

The covariance calculations for the augmented state yield, at the appropriate locations in the matrix P, the true mean square errors in the estimates of interest. The difficulty involved in performing those calculations can be greatly reduced if it is assumed that all of the state variables are properly corrected [see the discussion surrounding Eqs. (7.2-22) and (7.2-23)] for each measurement; this assumption is made in what follows. The covariance equation at the time a measurement is taken is similar to Eq. (7.2-1), viz:

P_k(+) = (I - WK_k*H_k) P_k(-) (I - WK_k*H_k)^T + WK_k* R_k K_k*^T W^T   (7.4-1)

In the absence of measurements the augmented state (truth model) covariance changes in time according to Eq. (7.2-2), which is repeated here for convenience:

P_k+1(-) = Φ_k P_k(+) Φ_k^T + Q_k   (7.4-2)

In Eq. (7.4-2) the matrix Q_k represents the effects of uncorrelated forcing functions over the interval t_k to t_k+1. When the state variables are not corrected, Eqs. (7.2-20) and (7.2-21) apply, with the appropriate substitution of WK_k* for K_k*.

Equations (7.4-1) and (7.4-2) can be used to determine the effects of the various items in the truth model. In order to separate the effects, the equations must be exercised many times, with different initial conditions and forcing functions. For example, to investigate the effects of measurement bias errors alone, all elements of P(0) not corresponding to these errors will be set to zero, along with the Q and R matrices. The error covariance elements generated by this procedure only result from measurement bias errors. To look at first-order


markov errors, the appropriate elements of P(0) and Q are entered (all others set to zero) and the equations rerun. This procedure must be repeated many times to generate the error budget. The root-sum-square of all the effects provides the true measure of total system performance. Section 7.6 describes the organization of a computer program to perform these covariance calculations.

When bias error sources are under consideration, their separate effects can also be computed using "simulation" equations such as

x_k+1(-) = Φ_k x_k(+)   (7.4-3)

x_k(+) = (I - WK_k*H_k) x_k(-)   (7.4-4)

Equations (7.4-3) and (7.4-4) are vector rather than matrix equations, and offer some reduction in computer complexity over the covariance equations described above [Eqs. (7.4-1) and (7.4-2)]. However, their use precludes calculating the effects of groups of error sources in a single computer run. Each bias error source must be treated separately with Eqs. (7.4-3) and (7.4-4) and their effects root-sum-squared, while the covariance equations permit the calculation of the effects of a set of bias error sources in one operation. Also, the "simulation" approach cannot handle correlations between error sources.

Example 7.4-1

From Example 7.3-1 we can form an error budget which displays individual contributions to the steady-state estimation error. The sources of error in the estimate of the single state, x, are: initial errors, x(0); process noise, w; and measurement noise, v.

Effect of Initial Errors - By inspection of Eq. (7.3-3) we can see that the initial errors, represented by p(0), do not contribute to the steady-state estimation error.

Effect of Process Noise - Equation (7.3-3) permits us to calculate the effect of process noise, characterized by the spectral density q, by setting r = 0:

p∞(q) = q / (2(β + k))

Effect of Measurement Noise - We can find the effect of measurement noise on the steady-state error covariance by setting q = 0 in Eq. (7.3-3):

p∞(r) = k²r / (2(β + k))

If we assign values to β, q and r, we can construct an error budget from the above equations. Since the error budget commonly gives the root-mean-square contribution of each error source, we take the square root of p∞ in finding entries in the error budget. Set β = q = r = 1. Then, from Eq. (7.3-4),

k = -1 + √2 = 0.414

and

√p∞(q) = √(1/(2(1.414))) = 0.595

√p∞(r) = √(0.414²/(2(1.414))) = 0.245

The error budget generated is given in Table 7.4-1. The root-sum-square error can be computed easily.

TABLE 7.4-1 EXAMPLE ERROR BUDGET

Error Source                     Contribution to Steady-State Error in Estimate of x
Initial Error                    0
Process Noise                    0.595
Measurement Noise                0.245
Total (Root-Sum-Square) Error    0.643

The linearity of the equations used in forming error budgets permits easy development of sensitivity curves which illustrate the effects of different values of the error sources on the estimation errors, as long as the filter design is unchanged. The procedure for developing a set of sensitivity curves for a particular error source is as follows: First, subtract the contribution of the error source under consideration from the mean square total system error. Then, to compute the effect of changing the error source by a factor of y, multiply its contributions to the mean square system errors by y². Next, replace the original contribution to mean square error by the one computed above and, finally, take the square root of the newly computed mean square error to obtain the new rss system error. Sensitivity curves developed in this manner can be used to establish the effect of incorrectly prescribed values of error sources, to identify critical error sources, and to explore the effects of substituting into the system under study alternative hardware devices which have different error magnitudes.

Example 7.4-2

Continuing Example 7.4-1, we can develop a sensitivity curve showing the effect of different values of measurement noise. If the rms measurement noise is halved (implying a 1/4 reduction in r), while k, β and q are held constant, the entry on the third line of the error budget becomes

√(0.245²/4) = 0.123

All other entries are unchanged. The new total (rss) error is

√(0.595² + 0.123²) = 0.608

When the rms measurement noise is doubled (factor of 4 increase in r), the entry is doubled and the total error is

√(0.595² + 0.490²) = 0.771

The sensitivity curve constructed from the three points now available is shown in Fig. 7.4-2.

Figure 7.4-2 Example Sensitivity Curve [total rss error vs rms measurement noise, proportional to √r]
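The error budget of Table 7.4-1 and the sensitivity-curve points of Example 7.4-2 reduce to a few lines of arithmetic, sketched below; it reproduces the tabulated entries to within the rounding used in the text:

    import math

    beta = q = r = 1.0
    k = -beta + math.sqrt(beta**2 + q / r)                 # Eq. (7.3-4): 0.414

    rms_process = math.sqrt(q / (2 * (beta + k)))          # 0.595
    rms_meas = math.sqrt(k**2 * r / (2 * (beta + k)))      # ~0.245
    total = math.hypot(rms_process, rms_meas)              # ~0.643
    print(round(rms_process, 3), round(rms_meas, 3), round(total, 3))

    def rss_with_scaled_meas(y):
        """Sensitivity curve: scale the measurement-noise contribution by y
        (its mean square contribution by y**2), holding the filter gain fixed."""
        return math.hypot(rms_process, y * rms_meas)

    print(round(rss_with_scaled_meas(0.5), 3))   # halved rms noise -> ~0.608
    print(round(rss_with_scaled_meas(2.0), 3))   # doubled rms noise -> ~0.771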


7.5 SENSITIVITY ANALYSIS: OPTIMAL SMOOTHER

Sensitivity analyses of optimal smoothers provide the same insights, and are motivated by the same concerns, as those discussed in Section 7.2 with regard to optimal filters. The derivation of the sensitivity equations proceeds along lines similar to that detailed in Section 7.2. The two cases treated in Section 7.2 - when the system dynamics and measurement process are accurately modeled and when they are not - also arise in the sensitivity analysis of smoothers. They are discussed separately below.

EXACT IMPLEMENTATION OF DYNAMICS AND MEASUREMENTS

Using the same notation as developed in Section 5.2 for the Rauch-Tung-Striebel form of the optimal smoother, the optimal filter covariance is computed using Eq. (7.2-3),

Ṗ(t) = (F - KH)P(t) + P(t)(F - KH)^T + KRK^T + GQG^T   (7.5-1)

Then, using the end condition P(T|T) = P(T), the fixed-interval optimal smoother error covariance is computed from the relation [Eq. (5.2-15)]

Ṗ(t|T) = [F + GQG^T P⁻¹(t)] P(t|T) + P(t|T) [F + GQG^T P⁻¹(t)]^T - GQG^T   (7.5-2)

If we define a "smoother gain", K_s, from the first equation in Table 5.2-2,

K_s = GQG^T P⁻¹(t)   (7.5-3)

Eq. (7.5-2) can be written

Ṗ(t|T) = (F + K_s) P(t|T) + P(t|T)(F + K_s)^T - GQG^T   (7.5-4)

To determine the effects of differences between design values of F, H, G, Q, R, and P(0) and those that may actually exist, assuming that correct values of F and H are used in the filter implementation equations, the following steps are taken: First, compute P(t) using the design values. Next compute the filter gain K, as before, and K_s as in Eq. (7.5-3). Finally, using Eqs. (7.5-1), (7.5-4), the end condition given above, and the actual values of F, H, G, Q, R, and P(0), compute the actual error covariance history.

INCORRECT IMPLEMENTATION OF DYNAMICS AND MEASUREMENTS

When incorrect values of F and H are used in filter and smoother implementation, a more complex set of relations must be used to perform the sensitivity analysis. As before, we designate implemented values with an asterisk. It is also necessary to define a number of new matrices, some with statistical meaning as covariances between familiar vectors (e.g., the true state, estimate, smoothed estimate, error in smoothed estimate, etc.) and others which serve only as mathematical intermediaries. The equations for sensitivity analysis of the linear smoother are

Ṗ(t) = (F* - K*H*)P(t) + P(t)(F* - K*H*)^T + (ΔF - K*ΔH)V + V^T(ΔF - K*ΔH)^T + GQG^T + K*RK*^T,  P(0) = P*(0)   (7.5-5)


V̇ = FV + V(F* - K*H*)^T + U(ΔF - K*ΔH)^T - GQG^T,  V(0) = E[x(0)x̃^T(0)]   (7.5-6)

U̇ = FU + UF^T + GQG^T,  U(0) = E[x(0)x^T(0)]   (7.5-7)

Ṗ(t|T) = (F* + G*Q*G*^T B⁻¹) P(t|T) + P(t|T)(F* + G*Q*G*^T B⁻¹)^T
  + G*Q*G*^T B⁻¹[V^T D^T - P(t)C^T] + [DV - CP(t)]B⁻¹ G*Q*G*^T
  + ΔF(VC^T - UD^T) + (CV^T - DU)ΔF^T + (C + D)GQG^T + GQG^T(C + D)^T - GQG^T,
  P(T|T) = P(T)   (7.5-8)

Ḃ = (F* - K*H*)B + B(F* - K*H*)^T + G*Q*G*^T + K*R*K*^T,  B(0) = P*(0)   (7.5-9)

Ċ = (F* + G*Q*G*^T B⁻¹)C - G*Q*G*^T B⁻¹ - C(F* - K*H*),  C(T) = I   (7.5-10)

together with a companion relation of similar form defining the matrix D   (7.5-11)

Note that P(t), V, U, ΔF and ΔH are the same as P, V, U, ΔF and ΔH in Section 7.2. The matrix B is the error covariance of the optimal filter if the set of "implemented" matrices are correct. Note also that when ΔF and ΔH are zero, B = P(t), D = 0, and, when the implemented matrices are the same as the design matrices (F* = F, Q* = Q, etc.), the equation for the actual smoother error covariance [Eq. (7.5-8)] reduces to Eq. (7.5-2) for the optimal smoother.
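For the exact-implementation case, Eqs. (7.5-1) through (7.5-4) can be exercised directly: integrate the filter covariance forward, then integrate Eq. (7.5-4) backward from the end condition P(T|T) = P(T). The scalar sketch below does this for the first-order markov process of Example 7.3-1; the step size, interval length and parameter values are illustrative:

    # Scalar smoother covariance for xdot = -beta x + w, z = x + v (G = H = 1).
    beta, q, r = 1.0, 1.0, 1.0
    dt, T = 1.0e-3, 5.0
    n = int(T / dt)

    # Forward pass: optimal filter covariance, Eq. (7.5-1) with K = p/r.
    p = [1.0]
    for _ in range(n):
        k = p[-1] / r
        pdot = 2.0 * (-beta - k) * p[-1] + k * k * r + q
        p.append(p[-1] + pdot * dt)

    # Backward pass: smoother covariance, Eq. (7.5-4), using the "smoother
    # gain" Ks = q / p(t) of Eq. (7.5-3) and end condition P(T|T) = P(T).
    ps = p[-1]
    for i in range(n, 0, -1):
        ks = q / p[i]
        psdot = 2.0 * (-beta + ks) * ps - q
        ps -= psdot * dt            # step backward from t to t - dt

    print("filter p(T):", round(p[-1], 4), " smoothed p(0|T):", round(ps, 4))

The smoothed covariance comes out below the filter's steady-state value, as expected of a fixed-interval smoother.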

7.6 ORGANIZATION OF A COMPUTER PROGRAM FOR COVARIANCE ANALYSIS

This section describes the organization of a computer program which has been found useful in the evaluation and design of linear filters. The program description illustrates the practical application of the covariance analysis methods developed in previous sections. The description is presented in terms of algebraic equations and a flow chart and, therefore, is independent of any particular programming language.

EVALUATION OF AN n-STATE FILTER OPERATING IN AN m-STATE WORLD

The program described here is suited to a fairly broad class of filter evaluation problems. The overall program scheme is applicable, with minor modifications, to an even broader class of problems involving suboptimal or optimal linear filters. The general situation being considered is one where a linear filter design has been proposed; it is to receive measurements at discrete points in time and


estimate states of a system which is continuously evolving. The filter design is based on an incomplete and/or incorrect model of the actual system dynamics. It is desired to obtain a measure of the filter's performance based on a truth model which is different (usually more complex) from the model contained in the filter itself.

The problem is formulated in terms of two covariance matrices: P, representing the truth model or system estimation error covariance, and P*, representing the filter's internal calculation of the estimation error covariance. P is an m × m matrix, where m is the dimension of the truth model. P* is an n × n matrix, where n is the number of estimated states. Usually, m is larger than n; sometimes much larger. The principal program output is the time history of P; its value before and after each update is computed. A useful graphical output is a collection of plots of the square roots of the diagonal elements of P versus time. The time histories of the filter gain matrix K and covariance P* are also computed, and may be printed or plotted if desired.

The principal program inputs are the following collection of system and filter matrices:

System Inputs

P0  the initial truth model covariance matrix (m × m)
F   the truth model dynamics matrix (m × m)
H   the truth model measurement matrix (ℓ × m), where ℓ is the measurement vector dimension
Q   the truth model process noise matrix (m × m)
R   the truth model measurement noise matrix (ℓ × ℓ)

Filter Inputs

P0* the initial filter covariance matrix (n × n)
F*  the filter dynamics matrix (n × n)
H*  the filter measurement matrix (ℓ × n)
Q*  the filter process noise matrix (n × n)
R*  the filter measurement noise matrix (ℓ × ℓ)

The five system matrices represent a linearized description of the entire physical situation, as understood by the person performing the evaluation. The five filter matrices represent, usually, a purposely simplified model, which it is hoped will produce adequate results. In the general case, F, H, Q, R, F*, H*, Q*, and R* are time-varying matrices, whose elements are computed during the course of the problem solution. For the special case of constant dynamics and stationary noise statistics these eight matrices are simply held constant at their input values.


The immediate objective in running the program is to produce:

• A quantitative measure of overall system performance, showing how the n-state filter design performs in the m-state representation of the real world.

• A detailed error budget, showing the contribution of each system state or driving noise (both those that are modeled and those that are unmodeled in the filter) to estimation errors.

An underlying objective is to gain insight into the various error mechanisms involved. Such insight often leads the way to improved filter design.

TRUTH MODEL AND FILTER COVARIANCE EQUATION RELATIONSHIPS

Two sets of covariance equations, involving P and P*, are presented below. The relationship between these two sets is explained with the aid of an information flow diagram. This relationship is a key element in determining the organization of the covariance analysis computer program.

Both sets of equations are recursion relations which alternately propagate the covariance between measurements and update the covariance at measurement times. The truth model covariance equations used in the example program are a special case of equations given in Section 7.2. For propagation between measurements

P_k+1(-) = Φ_k P_k(+) Φ_k^T + Q_k   (7.6-1)

For updating

P_k(+) = (I - WK_k*H_k) P_k(-) (I - WK_k*H_k)^T + WK_k* R_k K_k*^T W^T   (7.6-2)

where W is an m × n transformation matrix [I ⋮ 0]^T as defined in Section 7.4. The transition matrix Φ_k and noise matrix Q_k are found using matrix series solutions to

Φ_k = e^{FΔt}

and

Q_k = ∫₀^Δt e^{Fτ} GQG^T e^{F^T τ} dτ

where Δt is the time interval between measurements. Equations (7.6-1) and (7.6-2) are appropriate forms of Eqs. (7.2-20) and (7.2-21), accounting for the difference between filter state size and actual state size, when one of the two following situations obtains: (1) when ΔF = ΔH = 0 - i.e., when the part of the truth model state vector that is estimated by the filter is correctly modeled; or (2) when a feedback filter is mechanized and the estimated states are immediately corrected, in accordance with the filter update equation. When neither of the above conditions hold, the more complicated equations involving the V and U matrices should be used.

Except for one missing ingredient, the five system matrix inputs along with Eqs. (7.6-1) and (7.6-2) would completely determine the discrete time history of P_k, the estimation error covariance. The missing ingredient is the sequence of filter gain matrices K_k*. The filter covariance equations must be solved in order to produce this sequence. These equations are, for propagation

P_k+1*(-) = Φ_k* P_k*(+) Φ_k*^T + Q_k*   (7.6-3)

for gain calculation

K_k* = P_k*(-) H_k*^T [H_k* P_k*(-) H_k*^T + R_k*]⁻¹   (7.6-4)

and for updating

P_k*(+) = (I - K_k* H_k*) P_k*(-)   (7.6-5)

where

Φ_k* = e^{F*Δt}

Figure 7.6-1 is an information flow diagram illustrating the relationship between these two sets of recursion formulae. The upper half of the diagram represents the iterative solution of the filter covariance equations. These are solved in order to generate the sequence of filter gains, K_k*, which is a necessary input to the lower half of the diagram, representing the iterative solution of the truth model covariance equations.
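A compact rendition of the two coupled recursions, Eqs. (7.6-1) through (7.6-5), is sketched below; it follows the structure just described, with the filter covariance loop producing the gain sequence K_k* that drives the truth-model loop. The 2-state truth model, 1-state filter, and the first-order series approximations for the transition and noise matrices are hypothetical simplifications:

    import numpy as np

    def evaluate(F, H, Q, R, P0, Fs, Hs, Qs, Rs, P0s, dt, steps):
        """n-state filter operating in an m-state world, Eqs. (7.6-1)-(7.6-5)."""
        m, n = F.shape[0], Fs.shape[0]
        W = np.vstack([np.eye(n), np.zeros((m - n, n))])   # W = [I : 0]^T
        Phi = np.eye(m) + F * dt        # first-order series for the
        Phis = np.eye(n) + Fs * dt      # transition matrices
        Qk, Qks = Q * dt, Qs * dt
        P, Ps = P0.copy(), P0s.copy()
        for _ in range(steps):
            Ps = Phis @ Ps @ Phis.T + Qks                        # (7.6-3)
            Ks = Ps @ Hs.T @ np.linalg.inv(Hs @ Ps @ Hs.T + Rs)  # (7.6-4)
            Ps = (np.eye(n) - Ks @ Hs) @ Ps                      # (7.6-5)
            P = Phi @ P @ Phi.T + Qk                             # (7.6-1)
            IWKH = np.eye(m) - W @ Ks @ H
            P = IWKH @ P @ IWKH.T + W @ Ks @ R @ Ks.T @ W.T      # (7.6-2)
        return P, Ps

    # Truth model: modeled state plus an unmodeled markov error;
    # the filter models only the first state.
    F = np.array([[-1.0, 1.0], [0.0, -0.1]])
    H = np.array([[1.0, 0.0]])
    Q = np.diag([1.0, 0.02]); R = np.array([[1.0]]); P0 = np.eye(2)
    Fs = np.array([[-1.0]]); Hs = np.array([[1.0]])
    Qs = np.array([[1.0]]); Rs = np.array([[1.0]]); P0s = np.eye(1)

    P, Ps = evaluate(F, H, Q, R, P0, Fs, Hs, Qs, Rs, P0s, dt=0.1, steps=200)
    print("truth rms:", np.sqrt(P[0, 0]), " filter-indicated rms:", np.sqrt(Ps[0, 0]))

The truth-model rms error typically exceeds the filter-indicated value, illustrating the optimism of the filter covariance noted in Section 7.4.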

PROGRAM ARCHITECTURE

A "macro flow chart" of a main program and its communicationwith twosubroutines, is shown in Figure 7.6-2. The function of the main program is toupdate and propagate both system and filter covariances, P and P", in a singleloop. A single time step is taken with each passage around the loop. Note thatthe filt~r gain and update calculations are performedbefore the system updatecalculationbecausethe currentfiltergainmatrixmust be available as an input to


Figure 7.6-1 Covariance Analysis Information Flow

For constant dynamics problems, no subroutines are needed; the values of F, F*, Q, Q*, H, H*, R, and R* are read in as inputs and do not change thereafter. These constant matrices, along with the initial values, P0 and P0*, completely determine the solution. For time-varying problems a subroutine TVM is called once during each passage around the loop. Its function is to generate the time-varying elements of the system and filter matrices.

While the main program is generally applicable to a broad class of problems, TVM is a special purpose subroutine which is tailored to a particular time-varying problem. TVM can be designed in a variety of ways. For example, in the case of problems involving maneuvering vehicles a useful feature, corresponding to the organization shown in Figure 7.6-2, is the inclusion of a subsidiary subroutine TRAJ, which provides trajectory information. Once each time step TRAJ passes position, velocity and acceleration vectors (r, v, and a) to TVM. TVM generates various matrix elements, which are expressed in terms of these trajectory variables. This modular organization is useful in a number of ways. For example, once TVM is written for a particular truth model, corresponding to a given filter evaluation problem, the evaluation can be repeated for different trajectories by inserting different versions of TRAJ and leaving TVM untouched. Similarly, if two or more filter designs are to be compared over a given trajectory, different versions of TVM can be inserted while leaving TRAJ untouched. TRAJ can be organized as a simple table-look-up procedure, or as a logical grouping of expressions representing a sequence of trajectory phases. Individual phases can be constant velocity or constant acceleration straight line segments, circular arc segments, spiral climbs, etc. The particular group of expressions to be used during a given pass depends on the current value of time.
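The TVM/TRAJ organization might be rendered as two interchangeable callbacks, as sketched below; the function names, the constant-velocity trajectory, and the placeholder matrices are all hypothetical stand-ins for the subroutines described in the text:

    import numpy as np

    def traj(t):
        """TRAJ: return position, velocity, acceleration at time t
        (here a constant-velocity straight-line segment)."""
        v = np.array([100.0, 0.0, 0.0])
        return v * t, v, np.zeros(3)

    def tvm(t, traj_fn):
        """TVM: build time-varying system/filter matrix elements from
        trajectory variables (illustrative 2-state placeholder)."""
        r, v, a = traj_fn(t)
        F = np.array([[0.0, 1.0], [0.0, -np.linalg.norm(v) / 1000.0]])
        H = np.array([[1.0, 0.0]])
        return F, H

    F, H = tvm(10.0, traj)   # swapping in a different traj() re-runs the
    print(F)                 # same evaluation over a new trajectory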

Figure 7.6-2 Covariance Program Macro Flow Chart

An overall system performance projection for a given trajectory can be generated in a single run by inputting appropriate elements of P0, Q, and R, corresponding to the entire list of truth model error sources. The effects of changing the trajectory, the update interval or the measurement schedule can be found using repeat runs with appropriate changes in the trajectory subroutine or the parameters, such as Δt, which control the sequencing of the main program loop. Contributions to system error of individual error sources, or small groups of error sources, can be found by inputting individual or small groups of elements of P0, Q, and R. A sequence of such runs can be used to generate a system error budget of the type discussed in Section 7.4, which tabulates these individual contributions separately. Such a tabulation reveals the major error contributors for a given design. The time-history plots produced in conjunction with these error budget runs are also useful in lending insight into the important error mechanisms involved. This information and these insights are very helpful in suggesting filter design improvements.


REFERENCES

1. Bryson, A.E., Jr. and Ho, Y.C., Applied Optimal Control, Blaisdell Publishing Co., Waltham, Massachusetts, 1969.

2. Sutherland, A.A., Jr. and Gelb, A., "Application of the Kalman Filter to Aided Inertial Systems," Naval Weapons Center, NWC TP 4652, August 1968.

3. Huddle, J.R., "On Suboptimal Linear Filter Design," IEEE Trans. on Automatic Control, Vol. AC-12, No. 3, pp. 317-318, June 1967.

4. Downey, J.J., Hollander, S. and Rottach, E.A., "A Loran Inertial Navigation System," Sperry Engineering Review, Vol. 21, No. 1, pp. 36-43, 1968.

5. Fortmann, T., Kleinman, D.L., and Athans, M., "On the Design of Linear Systems with Piecewise-Constant Feedback Gains," IEEE Trans. on Automatic Control, Vol. AC-13, No. 4, pp. 354-361, August 1968.

6. Singer, R.A. and Frost, P.A., "On the Relative Performance of the Kalman and Wiener Filters," IEEE Trans. on Automatic Control, Vol. AC-14, No. 4, pp. 391-394, August 1969.

7. Isaacson, E. and Keller, H.B., Analysis of Numerical Methods, John Wiley and Sons, Inc., New York, 1966.

8. Hutchinson, C.E. and D'Appolito, J.A., "Design of Low-Sensitivity Kalman Filters for Hybrid Navigation Systems," AGARD Conference Proceedings No. 54 on Hybrid Navigation Systems, AGARD CP No. 54, January 1970.

9. D'Appolito, J.A. and Hutchinson, C.E., "Low Sensitivity Filters for State Estimation in the Presence of Large Parameter Uncertainties," IEEE Trans. on Automatic Control, Vol. AC-14, No. 3, pp. 310-312, June 1969.

10. D'Appolito, J.A., "The Evaluation of Kalman Filter Designs for Multisensor Integrated Navigation Systems," The Analytic Sciences Corp., AFAL-TR-70-271 (AD 881286), January 1971.

11. Nash, R.A., D'Appolito, J.A. and Roy, K.J., "Error Analysis of Hybrid Inertial Navigation Systems," AIAA Guidance and Control Conference, Paper No. 72-848, Stanford, California, August 1972.

PROBLEMS

Problem 7-1

Using Eq. (7.2-3), show that in the steady state the error covariance for the filter using the fixed gain k∞ = √(q/r), described in Example 7.1-3, is equal to √(rq), the steady-state error covariance determined for a Kalman filter in the same example.

Problem 7-2

Using the relations

x_k+1 = Φ_k x_k + w_k
z_k = H_k x_k + v_k
x̂_k+1(-) = Φ_k* x̂_k(+)
x̂_k+1(+) = x̂_k+1(-) + K_k+1[z_k+1 - H_k+1* x̂_k+1(-)]

and the definitions provided for V, U, ΔΦ and ΔH, derive Eqs. (7.2-20) and (7.2-21).

Problem 7-3

Using Eq. (7.1-1) and the equation

Ṗ = FP + PF^T + GQG^T - PH^T R⁻¹ HP

derive Eqs. (7.1-2) through (7.1-4).

Problem 7-4

Using the method followed in Example 7.4-2, develop a sensitivity curve depicting the sensitivity of the steady-state error in Examples 7.4-1 and 7.4-2 to process noise.

Problem 7-5

Using Eq. (7.3-3), show that the curve in Fig. 7.4-2 will asymptotically approach the line (ordinate = 0.247 × abscissa) as r grows without bound.

Problem 7-6

Utilizing the equation for propagation of the error in the estimate in a linear filter having the structure of a Kalman filter but without an explicitly defined gain matrix, show that, for such a filter, the error covariance equation across a measurement is

P_k(+) = (I - K_k H_k) P_k(-) (I - K_k H_k)^T + K_k R_k K_k^T

Problem 7-7

Refer to Table 4.3-1. The state estimate equation there contains the essence of the structure of the Kalman filter without explicitly defining the filter gain matrix K(t). The definition for K(t) given in the table is that which defines the optimal gain and thus the optimal filter. (a) Show that, for the case of a general K(t), the error differential equation is given by Eq. (4.3-13). (b) Show that the error covariance for the filter with a general gain matrix, in the case where w and v are not correlated, is [Eq. (7.2-3)]

Ṗ = (F - KH)P + P(F - KH)^T + GQG^T + KRK^T

Problem 7-8

Consider the problem of estimating a random ramp function given by the 2-state set of equations:

ẋ₁ = 0
ẋ₂ = x₁

with the noisy measurement

z = x₂ + v,  v ~ N(0,r)

Show that the differential equations for the elements of the error covariance matrix

P = [p₁₁  p₁₂]
    [p₁₂  p₂₂]

are

ṗ₁₁ = -p₁₂²/r
ṗ₁₂ = p₁₁ - p₁₂p₂₂/r
ṗ₂₂ = 2p₁₂ - p₂₂²/r

Observe that in the steady state (ṗ₁₁ = ṗ₁₂ = ṗ₂₂ = 0), the elements of the error covariance matrix all vanish. While the process being observed is nonstationary and grows without bound, the estimate converges on the true values of the states. Notice also that, since the error covariance vanishes, the filter gain matrix vanishes. Show that if there is a small amount of white noise of spectral density q forcing the state - i.e.,

ẋ₁ = w,  w ~ N(0,q)

and if that forcing function is not accounted for in the filter design (i.e., the gains are allowed to go to zero), the error covariance grows according to the relations

ṗ₁₁ = q
ṗ₁₂ = p₁₁
ṗ₂₂ = 2p₁₂

8. IMPLEMENTATION CONSIDERATIONS

In Chapter 7, the effects of either inadvertent or intended discrepancies between the true system model and the system model "assumed by the Kalman filter" were described. The rms estimation errors associated with the so-called suboptimal Kalman filter are always larger than what they would have been had the filter had an exact model of the true system, and they are often larger than the predicted rms errors associated with the filter gain matrix computation. In addition, the computations associated with calculating the true rms estimation errors are more complicated than those associated with the optimal filter. In this chapter, these same kinds of issues are addressed from the standpoint of real-time implementation of the Kalman filter equations. The point of view taken is that a simplified system model, upon which a Kalman filtering algorithm is to be based, has largely been decided upon; it remains to implement this algorithm in a manner that is both computationally efficient and that produces a state estimate that "tracks" the true system state in a meaningful manner.

In most practical situations, the Kalman filter equations are implemented on a digital computer. The digital computer used may be a small, relatively slow, or special purpose machine. If an attempt is made to implement the "theoretical" Kalman filter equations, often the result is that it is either impossible to do so due to the limited nature of the computer or, if it is possible, the resulting estimation simply does not correspond to that predicted by theory. These difficulties can usually be ascribed to the fact that the original model for the system was inaccurate, or that the computer really cannot exactly solve the


Kalman filter equations. These difficulties and the methods that may be used to overcome them are categorized in this chapter. The material includes a discussion of:

• modeling problems
• constraints imposed by the computer
• the inherently finite nature of the computer
• special purpose Kalman filter algorithms
• computer loading analysis

When reading this chapter, it is important to keep in mind that a unified body of theory and practice has not yet been developed in this area. Thus, the discussion encompasses a number of ideas which are not totally related. Also, it is to be emphasized that this chapter deals with real-time applications of Kalman filtering, and not covariance analyses or monte carlo simulations performed on large general-purpose digital computers. However, many of the concepts presented here are applicable to the latter situation.

8.1 MODELING PROBLEMS

Performance projections for data processing algorithms such as the Kalman filter are based on assumed models of the real world. Since these models are never exactly correct, the operation of the filter in a real-time computer and in a real-time environment is usually degraded from the theoretical projection. This discrepancy, commonly referred to as "divergence", can conveniently be separated into two categories: apparent divergence and true divergence (Refs. 1, 2, 3). In apparent divergence, the true estimation errors approach values that are larger, albeit bounded, than those predicted by theory. In true divergence, the true estimation errors eventually become "infinite." This is illustrated in Fig. 8.1-1.

Figure 8.1-1 Two Kinds of Divergence [(a) apparent divergence; (b) true divergence; each panel compares true and theoretical estimation errors vs time]


Apparent divergence can arise when there are modeling errors which cause the implemented filter to be suboptimal - i.e., it yields a larger rms error than that predicted by the filter covariance computations. This type of behavior was discussed in Chapter 7. True divergence occurs when the filter is not stable, in the sense discussed in Section 4.4 and Ref. 3, or when there are unmodeled, unbounded system states, which cause the true estimation error to grow without bound. These concepts are illustrated in the following example.

Example 8.1-1

As shown in Fig. 8.1-2, the Kalman filter designer assumes that the state to be estimated, x₁, is simply a bias whereas, in fact, x₁(t) is a bias, x₁(0), plus the ramp, x₂(0)t. Recall from Example 4.3-1 that the filter is simply a first-order lag whose gain k(t) → 0 as t → ∞.

Figure 8.1-2 Example System Illustrating True Divergence [real world: bias plus ramp; Kalman filter model: bias only]

Thus, eventually, the error in the estimate of x₁(t) diverges as x₂(0)t. The analogous discrete filtering problem displays similar behavior. More complicated examples of divergence are given in Refs. 46 and 41.

Practical solutions to the true divergence problem due to modeling errors can generally be grouped into three categories: estimate unmodeled states, add process noise, or use finite memory filtering. In the first of these, if one suspects that there are unmodeled growing states in the real world, they are modeled in the filter for "insurance." Thus, in Example 8.1-1, a ramp state would have been included in the filter. However, this approach is generally considered unsatisfactory, since it adds complexity to the filter and one can never be sure that all of the suspected unstable states are indeed modeled. The other two categories are treated below. More sophisticated approaches are also possible; for example, see Refs. 4 and 17.

FICTITIOUS PROCESS NOISE

An attractive solution for preventing divergence is the addition of fictitious process noise w(t) to the system model. This idea is easily explained in terms of Example 8.1-1. Suppose that the true system is thought to be a bias (ẋ₁ = 0), but that the designer purposely adds white noise (ẋ₁ = w), as shown in Fig. 8.1-3. The Kalman filter based on this model has the same form as that in Fig.


8.1-2. However, as shown in Fig. 8.1-3, the Kalman gain k(t) now approaches a constant nonzero value as t → ∞. If, as in Example 8.1-1, the true system consists of a bias plus a ramp, the Kalman filter will now track the signal x₁ with a nongrowing error.

Figure 8.1-3 The Use of Fictitious Process Noise in Kalman Filter Design [(a) assumed system; (b) typical gain history]

The reason that this technique works can be seen by examining the Riccati equation for the error covariance matrix P:

Ṗ = FP + PF^T + GQG^T - PH^T R⁻¹ HP   (8.1-1)

In steady state, Ṗ = 0, so that

0 = FP∞ + P∞F^T + GQG^T - P∞H^T R⁻¹ HP∞   (8.1-2)

If certain elements of Q are zero, corresponding elements in the steady-state value of P, and consequently K, are zero. Thus, with respect to the states assumed not driven by white noise, the filter disregards new measurements ("non-smoothable" states, see Section 5.2 and Ref. 5). However, if these elements of Q are assumed to be nonzero, then the corresponding elements of K will be nonzero, and the filter will always try to track the true system. Analogous remarks can be made about the discrete case.

The choice of the appropriate level of the elements of Q is largely heuristic, and depends to a great extent upon what is known about the unmodeled states. Some examples are given in Ref. 6.
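The effect is easy to exhibit in the scalar case: with F = 0 and G = H = 1, Eq. (8.1-2) reduces to 0 = q - p²/r, so p∞ = √(qr) and the limiting gain is p∞/r. The sketch below tabulates this for a few values of the fictitious noise density q (numbers illustrative):

    import math

    # Scalar bias model: (d/dt)x = w (density q), z = x + v (density r).
    # Equation (8.1-2) with F = 0, G = H = 1 gives 0 = q - p**2 / r,
    # so p_inf = sqrt(q r) and the limiting gain is k_inf = p_inf / r.
    r = 1.0
    for q in (0.0, 0.01, 0.1, 1.0):
        k_inf = math.sqrt(q * r) / r
        print("q =", q, " limiting gain k =", round(k_inf, 3))
    # q = 0 leaves k -> 0 (the divergent case); any q > 0 keeps the filter
    # responsive, so a real-world ramp is tracked with a nongrowing error.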

FINITE MEMORY FILTERING

The basic idea of finite memory filtering is the elimination of old data, which are no longer thought to be meaningful (this is often called a "moving window"). This idea is conveniently explained in terms of an example.

Example 8.1-2

The true system is the same as that in Example 8.1-1. The designer wishes to estimate the assumed bias x₁, using a simple averager of the form

x̂₁(t) = (1/T) ∫ from t-T to t of z(τ) dτ

where

z(t) = x₁(t) + v(t)

Note that this averager only uses data T units of time into the past. If, in fact, x₁(t) also contains a ramp, then

x₁(t) = x₁(0) + x₂(0)t

It follows that

x̂₁(t) = x₁(0) + x₂(0)(t - T/2) + (1/T) ∫ from t-T to t of v(τ) dτ

The estimation error is

x̂₁(t) - x₁(t) = (1/T) ∫ from t-T to t of v(τ) dτ - x₂(0)T/2

It is easy to show that the rms value of this error is bounded for a fixed value of the data window T. Though an unknown ramp is present, the estimator is able to track the signal with a bounded error.

Although this example does serve to illustrate the concept of finite memory filtering, it does not represent a recursive filtering algorithm. In order to correspond to, and take advantage of, the discrete Kalman filter formulation, it is necessary to cast the finite memory filter in the form

x̂_k(N) = Ψ_k x̂_k-1(N) + K_k z_k   (8.1-3)

where x̂_k(N) is an estimate of x at time t_k based on the last N data points, Ψ_k is the transition matrix of the filter, and K_k is the filter gain. Somehow, the finite memory property of the filter must be embodied in Ψ_k and K_k. Also, an additional term in Eq. (8.1-3), involving z_k-N, may be needed to "subtract out" the effect of old measurements. The problem of casting the finite memory filter in the form of Eq. (8.1-3) has not yet been satisfactorily resolved, although some work based on least-squares ideas has been done (Ref. 7). However, three practically useful approximation techniques do exist; these are described in the following paragraphs.
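The bounded-error claim of Example 8.1-2 can be checked by simulation. The sketch below applies the T-second averager to a bias-plus-ramp signal in noise (all numbers invented for illustration); the error settles near the predicted lag of -x₂(0)T/2 instead of growing:

    import random
    random.seed(1)

    dt, T = 0.01, 2.0                   # sample interval and data window
    N = int(T / dt)
    x1_0, x2_0, sigma_v = 1.0, 0.5, 0.3

    # Noisy measurements z = x1(t) + v(t) of a bias plus ramp
    z = [x1_0 + x2_0 * (k * dt) + random.gauss(0.0, sigma_v)
         for k in range(20 * N)]

    # Moving-window average over the last T seconds; the error hovers near
    # -x2(0) * T / 2 = -0.5 instead of growing without bound.
    for k in (N, 5 * N, 10 * N, 20 * N):
        est = sum(z[k - N:k]) / N
        t = k * dt
        print(round(t, 1), round(est - (x1_0 + x2_0 * t), 3))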


Direct Limited Memory Filter Theory - If there is no process noise (Q = 0), Jazwinski (Refs. 8, 9) has shown that the equations for the limited memory filter are (j < k)

x̂_k(N) = P_k(N) [P_k|k⁻¹ x̂_k - P_k|j⁻¹ x̂_k|j]

[P_k(N)]⁻¹ = P_k|k⁻¹ - P_k|j⁻¹   (8.1-4)

where

x̂_k(N) = best estimate of x_k, given the last N = k - j measurements
P_k|k = covariance of x̂_k - x_k; that is, P_k(+) in "normal" nomenclature
x̂_k|j = estimate of x_k given j measurements - i.e., x̂_k|j = Φ(k,j) x̂_j(+)
P_k|j = covariance of x̂_k|j - x_k = Φ(k,j) P_j(+) Φ^T(k,j)
Φ(k,j) = transition matrix of the observed system
P_k(N) = covariance of x̂_k(N) - x_k

From Eq. (8.1-4), it is seen that the limited memory filter estimate is the weighted difference of two infinite memory filter estimates. This is unacceptable, both because the weighting matrices involve inverses of the full dimension of x (a tedious computation) and because the computation involves two infinite memory estimates, the very calculation sought to be avoided.

An approximate method of solving Eq. (8.1-4) has been devised that largely avoids the above difficulties. This method is illustrated in the following (see Fig. 8.1-4), for a hypothetical example where N = 8: (1) Run the regular Kalman filter from (a) to (b) in Fig. 8.1-4, obtaining x̂₈, P₈|₈ = P₈(+); (2) continue to run the regular Kalman filter from (b) to (c), obtaining x̂₁₆, P₁₆|₁₆ = P₁₆(+); (3) calculate x̂₁₆|₈ = Φ(16,8)x̂₈ and P₁₆|₈ = Φ(16,8)P₈(+)Φ^T(16,8); (4) calculate x̂₁₆(8) and P₁₆(8) from Eq. (8.1-4); and (5) run the regular Kalman filter from (c) to (d) using x̂₁₆(8) and P₁₆(8) as initial conditions, etc. Note that for k > 8, x̂_k is conditioned on the last 8 to 16 measurements.

Figure 8.1-4 Measurement Schedule for the Limited Memory Example

Jazwinski (Refs. 8, 9) has applied this technique to a simulated orbit determination problem. The idea is to estimate the altitude and altitude rate of a satellite, given noisy measurements of the altitude. The modeling error in the problem is a 0.075% uncertainty in the universal gravitational constant times the mass of the earth, which manifests itself as an unknown bias. Scalar measurements are taken every 0.1 hr and the limited memory window is N = 10 (1.0 hr). The results are shown in Fig. 8.1-5; note that the "regular" Kalman filter quickly diverges.

Figure 8.1-5 Position Estimation Error for the Regular and Limited Memory Kalman Filters (Refs. 8, 9) [orbit between 8 and 34 earth radii; observation noise 1 nm; a data gap occurs in the measurement record]
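Equation (8.1-4) can be demonstrated on a one-state problem, estimating a constant from noisy measurements and then differencing the two information forms to strip away the first j measurements. A minimal sketch, with hypothetical numbers; for a constant state the limited memory estimate should collapse to the average of the last N data points:

    # One-state limited memory filter, Eq. (8.1-4): Phi = 1 (constant state),
    # scalar measurements z = x + v with variance r, no process noise.
    import random
    random.seed(2)

    x_true, r, p0 = 3.0, 1.0, 100.0
    z = [x_true + random.gauss(0.0, 1.0) for _ in range(16)]

    def kalman(p, xhat, meas):
        for zi in meas:
            k = p / (p + r)
            xhat, p = xhat + k * (zi - xhat), (1 - k) * p
        return p, xhat

    p_kj, x_kj = kalman(p0, 0.0, z[:8])      # estimate given first j = 8 points
    p_kk, x_kk = kalman(p_kj, x_kj, z[8:])   # estimate given all k = 16 points

    # Eq. (8.1-4): information-difference of two infinite-memory estimates.
    pN = 1.0 / (1.0 / p_kk - 1.0 / p_kj)
    xN = pN * (x_kk / p_kk - x_kj / p_kj)
    print("limited memory estimate:", round(xN, 3))
    print("average of last 8 points:", round(sum(z[8:]) / 8, 3))  # agrees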


The ε Technique - As indicated earlier, filter divergence often occurs when the values of P and K, calculated by the filter, become unrealistically small and the filter stops "paying attention" to new data. These ideas motivate the addition of a correction term to the regular Kalman filter update equation, which is based on the current measurement. This concept, which is due to Schmidt (Refs. 9, 10, 11), can be expressed mathematically as

x̂_k+1(+) - Φ_k x̂_k(+) = [x̂_k+1(+) - Φ_k x̂_k(+)]_reg + ε Δx̂_k+1(+)   (8.1-5)

where the subscript reg denotes "regular" - i.e.,

[x̂_k+1(+) - Φ_k x̂_k(+)]_reg = K_k+1 [z_k+1 - H_k+1 Φ_k x̂_k(+)]   (8.1-6)

and where

Δx̂_k+1(+) = (H_k+1^T r_k+1⁻¹ H_k+1)^# H_k+1^T r_k+1⁻¹ [z_k+1 - H_k+1 Φ_k x̂_k(+)]   (8.1-7)

In Eq. (8.1-6), K_k is the "regular" Kalman gain based on the assumed values of F, H_k, Q_k, r_k, and P_0. In Eq. (8.1-7), # denotes the pseudoinverse, z_k+1 must be a scalar (H_k is a row matrix, r_k is a scalar), and ε is a scalar factor to be determined. Since H_k is a row vector, Eq. (8.1-7) reduces to

Δx̂_k+1(+) = H_k+1^T (H_k+1 H_k+1^T)⁻¹ [z_k+1 - H_k+1 Φ_k x̂_k(+)]   (8.1-8)

Equations (8.1-7) and (8.1-8) represent the best estimate of x_k+1 - Φ_k x̂_k(+), based only on the current measurement residual (Ref. 12). Note that this estimate retains the desirable property that Δx̂_k+1 = 0 if z_k+1 = H_k+1 Φ_k x̂_k(+). Since x_k generally has more components than the scalar measurement z_k, the problem is underdetermined, and use of the pseudoinverse is required.

It is convenient to define a new scalar, ε', where

ε' = ε / [H_k+1 P_k+1(-) H_k+1^T + r_k+1]   (8.1-9)

Note that the numerator and denominator of Eq. (8.1-9) are scalars. If Eqs. (8.1-8) and (8.1-9) are substituted into Eq. (8.1-5), the update equations become Eq. (8.1-10). From this equation, it is seen that the effective gain of this modified filter is

K_k+1 + K_ow   (8.1-11)

where K_ow is an "overweight" gain proportional to ε. Meaningful values of ε lie in the range of 0 to 1, and are usually chosen by experience or trial and error. Of course, ε = 0 corresponds to the "regular" Kalman filter. When ε = 1, H_k+1 x̂_k+1(+) = z_k+1 - i.e., the estimate of the measurement equals the measurement.

Using the techniques in Chapter 7, it can be shown that the covariance matrix of the estimation error is given by Eq. (8.1-12), assuming that the modeled values F, Q_k, H_k, r_k, and P_0 are correct. Practically speaking then, this technique prevents P_k(+) from becoming too small, and thus overweights the last measurement. This is clearly a form of a limited memory filter. A practical application of the "ε technique" is given in Ref. 10.

Fading Memory Filters and Age-Weighting (Refs. 9, 13-16) - Discarding old data can be accomplished by weighting them according to when they occurred, as illustrated in Fig. 8.1-6. This means that the covariance of the measurement noise must somehow be increased for past measurements. One manner of accomplishing this is to set

R_k* = s^(j-k) R_k,   k = j, j-1, j-2, ...;  s ≥ 1   (8.1-13)

where s is some number greater than or equal to one, R_k is the "regular" noise covariance, R_k* is the "new" noise covariance, and j denotes the present time.

Figure 8.1-6 Age-Weighting Concept [measurement weighting vs time, increasing toward the present time]


For example, suppose R_k is the same for all k - i.e., R_k = R. A convenient way to think of the factor s is

s = e^{Δt/τ}   (8.1-14)

where Δt is the measurement interval and τ is the age-weighting time constant. It then follows that

R*_{j−m} = e^{mΔt/τ} R,   m = 0, 1, 2, ...   (8.1-15)

This is illustrated in Fig. 8.1-7.

Figure 8.1-7 Conceptual Illustration of Age-Weighted Noise Covariance Matrix Behavior [R* grows with measurement age, shown at times k−3, k−2, k−1, k]

A recursive Kalman filter can be constructed under these assumptions. The equations are:

P'_k(−) = s Φ_{k−1} P'_{k−1}(+) Φ_{k−1}ᵀ + Q_{k−1}
K_k = P'_k(−)H_kᵀ[H_k P'_k(−)H_kᵀ + R_k]⁻¹
P'_k(+) = P'_k(−) − P'_k(−)H_kᵀ[H_k P'_k(−)H_kᵀ + R_k]⁻¹ H_k P'_k(−)   (8.1-16)

Comparison of this set of equations with the set for the Kalman filter will show that they are nearly identical. The only difference is the appearance of the age-weighting factor, s, in the equation for P'_k(−). There is, however, a conceptual difference between these sets. In the case of the Kalman filter (s = 1), P'_k (usually denoted by P_k) is the error covariance, E[(x̂_k − x_k)(x̂_k − x_k)ᵀ]. If s ≠ 1, it can be shown that in general P'_k is not the error covariance. The true error covariance can easily be calculated using the techniques in Chapter 7.

Example 8.1-3

The age-weighted filter will now be applied to the simple example used several times previously in this section and illustrated again in Fig. 8.1-8. For this case, it follows from Eq. (8.1-16) that the necessary equations are

p'_k(−) = s p'_{k−1}(+)
p'_k(+) = p'_k(−) − p'_k(−)²/(r + p'_k(−))
        = s p'_{k−1}(+) − s²p'²_{k−1}(+)/(r + s p'_{k−1}(+))
        = r s p'_{k−1}(+)/(r + s p'_{k−1}(+))

where a lower case p'_k is used to emphasize the scalar nature of the example. For steady state, p'_k(+) = p'_{k−1}(+) = p'_∞, and the solution is

p'_∞ = r(1 − 1/s)

k_∞ = 1 − 1/s

Note that k_∞ > 0 for all s > 1. This is exactly the desired behavior; the gain does not "turn off" as it does in the "regular" Kalman filter.

Figure 8.1-8 Bias System and Matrices (lower case q, r, h denote scalars)
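The steady-state behavior derived in Example 8.1-3 is easy to confirm numerically. The following is a minimal sketch of the scalar age-weighted recursion; the values of s, r, and the initial p' are assumed for illustration.

```python
# Sketch of the age-weighted filter of Example 8.1-3 (scalar bias, h = 1, q = 0).
s, r = 1.5, 1.0       # assumed age-weighting factor and noise variance
p = 10.0              # assumed initial p'_0(+)

for k in range(100):
    p_minus = s * p                      # p'_k(-) = s p'_{k-1}(+)
    k_gain = p_minus / (p_minus + r)     # scalar gain
    p = (1.0 - k_gain) * p_minus         # p'_k(+)

print("steady-state p':", p,      " predicted:", r * (1 - 1 / s))
print("steady-state k :", k_gain, " predicted:", 1 - 1 / s)
```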


It can also be shown that for this problem, as k becomes large, the true error covariance, P_k(+) ≡ E[(x̂_k(+) − x_k)²], approaches the limit

P_∞ ≡ lim_{k→∞} P_k(+) = ((s − 1)/(s + 1)) r

when s ≥ 1. This equation is summarized in Table 8.1-1 for s = 1 and s → ∞. Note that when s → ∞, x̂_k equals the measurement. This implies that the filter also disregards the initial condition information, E[x²(0)]; hence the caption, "one-stage estimator."

TABLE 8.1-1 LIMITING CONDITIONS OF THE AGE-WEIGHTED FILTER EXAMPLE

        | Regular Kalman Filter | One-Stage Estimator
s       | 1                     | → ∞
p_∞     | 0                     | r
k_∞     | 0                     | 1

8.2 CONSTRAINTS IMPOSED BY THE COMPUTER

Generally speaking, the class of digital computers available for a particular mission is largely constrained by considerations (e.g., weight and size restrictions) other than the complexity of the Kalman filter equations. This leads one to attempt to reduce the number of calculations performed in the implementation of the filter as much as possible. To accomplish this, the following techniques are often used: reducing the number of states, decoupling the equations (or otherwise simplifying the F matrix), and prefiltering.

DELETING STATES

Deleting states implies eliminating states in the system model upon which the filter is based, thus automatically reducing the number of states in the filter. Presently, state reduction is largely based upon engineering judgment and experience. However, some general guidelines (also see Section 7.1) are: (1) states with small rms values, or with a small effect on other state measurements of interest, can often be eliminated; (2) states that cannot be estimated accurately, or whose numerical values are of no practical interest, can often be eliminated; and (3) a large number of states describing the errors in a particular device can often be represented with fewer "equivalent" states. These ideas are illustrated below.

Example 8.2-1

The error dynamics of an externally-aided inertial navigation system can be represented in the form shown in Fig. 8.2-1 (Refs. 18-21). Typically, 25 to 75 states are required to describe such a system. However, long-term position error growth is eventually dominated by gyro drift rates and the associated 24-hr dynamics. Thus, if external position and velocity measurements are not available frequently, accelerometer errors, gravity uncertainties, and the 84-minute (Schuler) dynamics can generally be neglected in the filter. Of course, estimates of the high frequency components of velocity error are lost in this simplified filter. Even if measurements are available frequently, optimal covariance studies will often show that although the 84-minute dynamics must be modeled in the filter, error sources such as accelerometer errors, vertical deflections, and gravity anomalies cannot be estimated accurately relative to their a priori rms values. Thus, states representing these errors need not be modeled in the filter.

Figure 8.2-1 Error Flow in an Inertial Navigation System [block diagram: gyro drift rates, accelerometer errors, and gravity uncertainties drive the reference position and velocity errors]

DECOUPLING STATES

One method of simplifying the Kalman filter equations is to decouple the equations. As mentioned in Section 7.1, the rationale here is that there are always fewer computations involved in solving two sets of equations of dimension n/2, as opposed to one set of dimension n. For example, suppose the model equations are

[ẋ₁]   [F₁₁  F₁₂][x₁]   [w₁]
[ẋ₂] = [F₂₁  F₂₂][x₂] + [w₂]   (8.2-1)

If the elements of F₁₂ and F₂₁ are small, and if w₁ and w₂ are uncorrelated, it may be possible to uncouple x₁ and x₂, and work with the simplified set

ẋ₁ = F₁₁x₁ + w₁
ẋ₂ = F₂₂x₂ + w₂   (8.2-2)

Of course, covariance studies would have to be performed to determine whether Eq. (8.2-2) was a reasonable approximation to Eq. (8.2-1).
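One way to perform such a comparison is to propagate the full covariance and the two decoupled covariances side by side. The following is a minimal sketch, with assumed F blocks, noise levels, and step size (none are from the text); the transition matrix is approximated to first order.

```python
import numpy as np

# Sketch: full covariance propagation vs. the decoupled set of Eq. (8.2-2).
dt, steps = 0.1, 50
F11 = np.array([[0.0, 1.0], [-1.0, -0.5]])     # assumed block dynamics
F22 = np.array([[-0.2, 0.0], [0.0, -0.3]])
F12 = 0.01 * np.ones((2, 2))                   # "small" coupling (assumed)
F21 = 0.01 * np.ones((2, 2))
F = np.block([[F11, F12], [F21, F22]])
Q = np.diag([0.0, 0.1, 0.0, 0.1]) * dt         # assumed process noise

Phi = np.eye(4) + F * dt                       # first-order transition matrix
Phi1 = np.eye(2) + F11 * dt
Phi2 = np.eye(2) + F22 * dt
P = np.eye(4)                                  # full 4-state covariance
P1, P2 = np.eye(2), np.eye(2)                  # decoupled 2-state covariances

for _ in range(steps):
    P = Phi @ P @ Phi.T + Q
    P1 = Phi1 @ P1 @ Phi1.T + Q[:2, :2]
    P2 = Phi2 @ P2 @ Phi2.T + Q[2:, 2:]

print("max |P11(full) - P11(decoupled)| =", np.abs(P[:2, :2] - P1).max())
```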


Example 8.2-2

A more sophisticated approach to the decoupling problem was recently taken with regard to an Apollo application (Ref. 26). In general, the system equations can be partitioned to read as in Eq. (8.2-1). The goal is to eliminate the states x₂ from the filter, thus reducing its complexity, but still somehow compensating for their loss. The true gain equation for K₁ can be written

K₁ = (P₁₁H₁ᵀ + P₁₂H₂ᵀ)[H₁(P₁₁H₁ᵀ + P₁₂H₂ᵀ) + H₂(P₂₁H₁ᵀ + P₂₂H₂ᵀ) + R]⁻¹

If P₁₂ is small and P₂₂ is dominated by its diagonal elements (and they only vary slightly about their initial conditions), then the following approximations can be made:

P₁₂ ≈ 0,   P₂₂ ≈ P₂₂(0)

The gain equation reduces to

K₁ = P₁₁H₁ᵀ[H₁P₁₁H₁ᵀ + R + H₂P₂₂(0)H₂ᵀ]⁻¹

This is the "normal" equation for K₁ when x₂ is neglected, plus the extra term H₂P₂₂(0)H₂ᵀ.

For the Apollo application in question, x₁ represented the position and velocity of the orbiting spacecraft, and x₂ represented ten instrument biases. The measurements are range, range-rate, and pointing error. In Table 8.2-1, the performance of the optimal and reduced-state filters is compared. Note that 80% of the performance loss realized when x₂ was deleted is recovered by the inclusion of the term H₂P₂₂(0)H₂ᵀ.

TABLE 8.2-1 COMPARISON OF OPTIMAL AND REDUCED-STATE FILTERS

rms Filter Accuracy:
                                     | x (ft) | y (ft) | z (ft) | ẋ (fps) | ẏ (fps) | ż (fps)
Optimal Filter                       | 1096   | 7417   | 919    | 0.410   | 8.21    | 1.51
No Compensation (H₂P₂₂(0)H₂ᵀ = 0)    | 1271   | 8179   | 1768   | 0.637   | 9.63    | 2.67
Full Compensation                    | 1112   | 9186   | 1015   | 0.442   | 9.46    | 1.63

Often, other kinds of simplifications can be made in the F matrix "assumed" by the filter. For example, it is common to approximate broadband error sources by white noise, thus eliminating certain terms in the F matrix. Even if this simplification does not actually decouple certain equations, it offers advantages in the computation of the transition matrix, particularly if use is made of the fact that certain elements in the F matrix are zero.

PREFILTERING

The concept of "prefiltering" (also referred to as "measurement averaging" or "data compression") is motivated by cases where measurements are available more frequently than it is necessary, desirable, or possible to process them. For example, suppose measurements of a state x are available every Δt. Suppose further that the time required by the computer to cycle through the Kalman filter equations (obtain x̂_{k+1} and P_{k+1} from x̂_k, P_k, and z_k) and to perform other required operations is ΔT, where ΔT = nΔt for n > 1. Clearly each individual measurement cannot be used. However, it is generally wasteful to use the measurement only every ΔT seconds. This problem is often resolved by using some form of average of the measurement (over ΔT) every ΔT seconds.

To illustrate what is involved, consider the scalar measurement

z_k = x_k + v_k   (8.2-3)

After every ΔT seconds, an averaged measurement z'_m is used, where

z'_m = (1/n) Σ_{i=1}^{n} z_i   (8.2-4)

where the index i runs over the measurements collected during the previous ΔT interval. In order to use the standard Kalman filtering equations, z'_m must be put in the form (state) + (noise). Therefore, by definition, the measurement noise v'_m must be (measurement) − (state), or

v'_m = (1/n) Σ_{i=1}^{n} v_i + [(1/n) Σ_{i=1}^{n} x_i − x_m]   (8.2-5)

where x_m denotes the state at the time the averaged measurement is processed; the first term is the reduced noise due to smoothing the original noise, and the bracketed term is the additional noise due to smoothing the signal. As indicated in Eq. (8.2-5), the original noise v_k is indeed smoothed, but there is an additional error due to smoothing the signal x_k.
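The two contributions in Eq. (8.2-5) can be seen directly in simulation. The following sketch averages n samples of a first-order Markov signal plus white noise; all constants (n, correlation time, noise levels) are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of measurement averaging, Eqs. (8.2-4) and (8.2-5).
n, N, dt = 4, 20000, 0.1              # n raw samples averaged per Delta-T
sigma_x, tau, sigma0 = 1.0, 1.0, 0.5  # assumed signal and noise statistics
phi = np.exp(-dt / tau)

x = np.zeros(N)
for k in range(1, N):                 # discrete first-order Markov signal
    x[k] = phi * x[k - 1] + rng.normal(scale=sigma_x * np.sqrt(1 - phi**2))
v = rng.normal(scale=sigma0, size=N)
z = x + v                             # raw measurements, Eq. (8.2-3)

zm = z.reshape(-1, n).mean(axis=1)    # averaged measurement z'_m
x_end = x.reshape(-1, n)[:, -1]       # state at the processing time
vm = zm - x_end                       # effective noise, Eq. (8.2-5)

print("raw noise variance :", sigma0**2)
print("sigma0^2 / n       :", sigma0**2 / n)
print("actual var of v'_m :", vm.var())   # larger, due to signal smoothing
```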


To use the Kalman filter, the variance of v'_m must be specified. Since v_k and x_k are assumed independent, this calculation reduces to specifying the variance of the "reduced noise" and the "additional noise." Since v_k is an uncorrelated sequence, and if E[v_k²] = σ₀², then

E[((1/n) Σ_{i=1}^{n} v_i)²] = σ₀²/n   (8.2-6)

Properties of the "additional noise" depend on statistical properties of x_k. For example, suppose that n = 2, and that

E[x_i x_j] = σ_x² e^{−|i−j|Δt}   (8.2-7)

For n = 2 the additional noise is (x₁ + x₂)/2 − x₂ = (x₁ − x₂)/2; it then follows that

E[((x₁ − x₂)/2)²] = (σ_x²/2)(1 − e^{−Δt})   (8.2-8)

It is very important to realize that the measurement noise v'_m, associated with the averaged measurement, is not white noise and - in addition - is correlated with the state x, both facts being due to the presence of the "additional noise." This must be accounted for in the Kalman filter mechanization or serious performance degradation may result. An approximate method for accomplishing this is indicated in Ref. 10. A more exact technique involves the so-called "delayed state" Kalman filter (Refs. 27, 28).

Recent work by Joglekar (Refs. 51 and 52) has attempted to quantify the concept of data compression and presents examples related to aircraft navigation. For example, Joglekar shows that the accuracy lost through data compression will be small if the plant noise w is small compared to the observation noise v. Other examples of prefiltering and data compression are given in Ref. 24 (inertial navigation systems), Ref. 48 (communication systems), Ref. 49 (trajectory estimation) and Ref. 50 (usage in the extended Kalman filter).

8.3 THE INHERENTLY FINITE NATURE OF THE COMPUTER

A digital computer cannot exactly solve analytic equations because numerical algorithms must be used to approximate mathematical operations such as integration and differentiation (thus leading to truncation errors) and the word length of the computer is finite (thus leading to roundoff errors). The nature of the resultant errors must be carefully considered prior to computer implementation. Here, these errors are described quantitatively along with methods for reducing their impact.

ALGORITHMS AND INTEGRATION RULES

Kalman Filter Equations - In practical applications, the discrete form of the Kalman filter is always used. However, the discrete equations arise from sampling a continuous system. In particular, recall from Chapter 3 that the continuous system differential equation is "replaced" by

x_{k+1} = Φ_k x_k + w_k   (8.3-1)

Similarly, the covariance matrix Q_k of w_k is related to the spectral density matrix Q by the integral

Q_k = ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) Q Φᵀ(t_{k+1}, τ) dτ   (8.3-2)

Further, recall that the discrete Kalman filter equations are

x̂_{k+1}(+) = Φ_k x̂_k(+) + K_{k+1}[z_{k+1} − H_{k+1}Φ_k x̂_k(+)]   (8.3-3)

P_{k+1}(−) = Φ_k P_k(+) Φ_kᵀ + Q_k   (8.3-4)

where

K_{k+1} = P_{k+1}(−)H_{k+1}ᵀ[H_{k+1}P_{k+1}(−)H_{k+1}ᵀ + R_{k+1}]⁻¹

Thus, exclusive of other matrix algebra, this implementation of the Kalman filter includes a matrix inverse operation and determination of Φ_k and Q_k. Algorithms designed to calculate matrix inverses are not peculiar to Kalman filtering and they are well-documented in the literature (see, for example, Ref. 30). Consequently, these calculations will not be discussed further here, although a method for avoiding the inverse is given in Section 8.4.

The state transition matrix Φ is obtained from

dΦ(t, τ)/dt = F(t)Φ(t, τ),   Φ(t, t) = I   (8.3-5)

and Q_k is given by Eq. (8.3-2). Differentiation of this equation using Leibniz' formula yields

Q̇_k(t) = F(t)Q_k(t) + Q_k(t)Fᵀ(t) + Q,   Q_k(t_k) = 0   (8.3-6)

Thus, solutions to matrix differential or integral equations are required. Excluding those cases where the solution is known in closed form, Eqs. (8.3-5)


and (8.3-6) may be solved by either applying standard integration rules that will solve differential equations in general, or by developing algorithms that exploit the known properties of the equations in question. It is to be emphasized that the following discussion is slanted towards real-time applications. However, the results can generally be extrapolated to the off-line situation, where computer time and storage considerations are not as critical.

Integration Algorithms - Standard integration rules typically make use of Taylor series expansions (Refs. 31 and 32). To illustrate, the solution to the scalar differential equation

ẋ = f(x, t)   (8.3-7)

can be expressed as

x(t) = x(t₀) + ẋ(t₀)(t − t₀) + ẍ(t₀)(t − t₀)²/2 + x⃛(t₀)(t − t₀)³/6 + ...   (8.3-8)

where the initial condition x(t₀) is known. The two-term approximation to Eq. (8.3-8) is

x_{k+1} = x_k + ẋ_k Δt_k = x_k + f(x_k, t_k) Δt_k   (8.3-9)

where the subscript k refers to time t_k, and where

Δt_k = t_{k+1} − t_k   (8.3-10)

The integration algorithm represented by Eq. (8.3-9) is called Euler's method. For many problems, this method is unsatisfactory - the approximate solution diverges quickly from the true solution.

A more accurate algorithm can be derived by retaining the first three terms of Eq. (8.3-8),

x_{k+1} = x_k + ẋ_k Δt_k + ẍ_k Δt_k²/2   (8.3-11)

The second derivative can be approximated according to

ẍ_k ≈ (ẋ_{k+1} − ẋ_k)/Δt_k   (8.3-12)

so that

x_{k+1} = x_k + (Δt_k/2)(ẋ_k + ẋ_{k+1})   (8.3-13)

However, this is not satisfactory since ẋ_{k+1} is not known. Recalling that ẋ_k = f(x_k, t_k) exactly, and approximating x_{k+1} by

x̄_{k+1} = x_k + f(x_k, t_k)Δt_k   (8.3-14)

we see that ẋ_{k+1} can be approximated as

ẋ_{k+1} ≈ f(x̄_{k+1}, t_{k+1})   (8.3-15)

Substitution of Eq. (8.3-15) into Eq. (8.3-13) yields the algorithm

x_{k+1} = x_k + (Δt_k/2)[f(x_k, t_k) + f(x_k + f(x_k, t_k)Δt_k, t_{k+1})]   (8.3-16)

This algorithm is known as the "modified Euler method." All of the more sophisticated integration algorithms are extensions and modifications of the Taylor series idea. For example, the Runge-Kutta method uses terms through the fourth derivative.

The important point to be made is that theoretically, the accuracy of the algorithm increases as more terms of the Taylor series are used. However, computer storage requirements, execution time, and roundoff errors also increase. This tradeoff is illustrated in Fig. 8.3-1. In this figure, "truncation error" refers to the error in the algorithm due to the Taylor series approximation.

Figure 8.3-1 Integration Algorithm Tradeoffs [truncation error decreases, while storage requirements and execution time increase, with the complexity of the integration rule (number of terms of Taylor series used)]
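The accuracy difference between Eqs. (8.3-9) and (8.3-16) is easy to exhibit. The following sketch integrates the assumed test equation dx/dt = −x (exact solution e^{−t}) with both rules; the step size is also assumed.

```python
import numpy as np

# Sketch: Euler, Eq. (8.3-9), vs. modified Euler, Eq. (8.3-16).
f = lambda x, t: -x          # assumed test system dx/dt = -x
dt, T = 0.1, 2.0

x_e = x_m = 1.0
for t in np.arange(0.0, T, dt):
    x_e = x_e + f(x_e, t) * dt                            # Euler
    pred = x_m + f(x_m, t) * dt                           # Eq. (8.3-14)
    x_m = x_m + 0.5 * dt * (f(x_m, t) + f(pred, t + dt))  # Eq. (8.3-16)

exact = np.exp(-T)
print("Euler error          :", abs(x_e - exact))
print("modified Euler error :", abs(x_m - exact))
```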


Special Algorithms for the Transition Matrix - As opposed to general integration algorithms, consider now the special case of the equation for the transition matrix

dΦ/dt = FΦ   (8.3-17)

If F is constant, it is well-known that

Φ(t₂, t₁) = Φ(Δt) = e^{FΔt} = Σ_{n=0}^{∞} (Δt)ⁿFⁿ/n!   (8.3-18)

where Δt = t₂ − t₁. When Δt is less than the dominant time constants in the system, n on the order of 10 or 20 will furnish six to eight place decimal digit accuracy. For Δt much smaller, just two or three terms may suffice. For larger time intervals, Δt, the property

Φ(t₃, t₁) = Φ(t₃, t₂)Φ(t₂, t₁)   (8.3-19)

is used so that

Φ(t + nΔt, t) = Π_{i=1}^{n} Φᵢ,   Φᵢ = Φ(t + iΔt, t + (i−1)Δt)

In most realistic cases, F is not time-invariant. However, the exponential time-series idea can still be used - i.e.,

Φ(t₂, t₁) ≅ e^{F(t₁)(t₂ − t₁)}   (8.3-20)

when t₂ − t₁ is much less than the time required for significant changes in F(t). Equation (8.3-19) can then be used to propagate Φ. Other examples of the "constant F approximation" are:

a) Φᵢ ≅ I + F(tᵢ)Δt, with Φ(t + nΔt, t) = Π_{i=1}^{n} Φᵢ

b) Φᵢ ≅ I + F(tᵢ)Δt + ½F²(tᵢ)Δt², with Φ(t + nΔt, t) = Π_{i=1}^{n} Φᵢ

c) Multiplying out (a),

Φ(t + nΔt, t) = I + Δt Σ_{i=1}^{n} F(tᵢ) + O(Δt²) ≅ I + Δt Σ_{i=1}^{n} F(tᵢ)   (8.3-21)

where O(Δt²) indicates terms of order Δt² or higher.
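The subdivision idea is illustrated numerically in Example 8.3-1 below. As a preview, the following sketch builds the scalar transition matrix of that example by chaining truncated-series factors per Eqs. (8.3-19) and (8.3-20); the number of series terms and the evaluation of F at the subinterval midpoint are choices assumed here.

```python
import math
import numpy as np

# Sketch of Example 8.3-1 (below): Phi(0.5 hr, 0) for the 1x1 F(t) = sin(Omega t).
omega = np.deg2rad(15.0)            # 15 deg/hr earth rate, in rad/hr

def phi_half_hour(n_sub, terms=3):
    dt = 0.5 / n_sub
    phi = 1.0
    for i in range(n_sub):
        Fi = np.sin(omega * (i + 0.5) * dt)   # F held constant on subinterval
        factor = sum((Fi * dt) ** m / math.factorial(m) for m in range(terms))
        phi *= factor                         # chaining, Eq. (8.3-19)
    return phi

for n in (1, 2, 5, 10):
    print(n, "subdivisions:", phi_half_hour(n))
```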


Of these algorithms, (b) is more accurate than (a) because more terms are used in the series; (a) and (b) are more accurate than (c), but (c) requires the least computer space and computation time.

Example 8.3-1

Consider the case where F(t) is the 1 × 1 matrix sin Ωt, where Ω is 15 deg/hr (earth rate). Suppose the desired quantity is the transition matrix Φ(0.5 hr, 0). Equations (8.3-21) and (8.3-19) are applied for an increasingly larger number of subdivisions of the 0.5 hr interval. The calculated value of Φ(0.5 hr, 0) is plotted in Fig. 8.3-2, and it is seen that an acceptable interval over which the constant F matrix approximation can be used is less than 6 min.

Figure 8.3-2 Calculated Value of Φ [Φ(0.5 hr, 0) vs. number of subdivisions of the 0.5 hr interval, 0 to 10]

Example 8.3-2

An algorithm considered for Φ in the C-5A aircraft navigation program (Ref. 10) is a modification of (b) and (c) just discussed. Multiplying (b) out,

Φ(t + nΔt, t) = I + Δt Σ_{i=1}^{n} F(tᵢ) + ½Δt² Σ_{i=1}^{n} F²(tᵢ) + Δt² Σ_{i=1}^{n} F(tᵢ) Σ_{k=1}^{i−1} F(t_k) + O(Δt³)

where O(Δt³) indicates terms of order Δt³ or higher. The algorithm actually implemented is

Φ ≅ I + Δt Σ_{i=1}^{n} F(tᵢ) + ½Δt² Σ_{i=1}^{n} F_c²

where F_c is the time-invariant portion of F(tᵢ) - i.e., F(tᵢ) = F_c + F'(tᵢ), and F'(tᵢ) is time-varying. Note that the cross-product term has been neglected. This term is both small and computationally costly.

Algorithms for Q_k - The design of algorithms that calculate Q_k is motivated by the same considerations just outlined for Φ. Consider first the case of applying standard integration rules to Eq. (8.3-6). Suppose, for example, it is desired to obtain Q_k by breaking the interval (t_k, t_{k+1}) into N equal steps of length Δt, and then applying Euler's method. Denote calculated values of Q_k after each integration step by Q_k(Δt), Q_k(2Δt), etc., where Q_k(NΔt) = Q_k, and let F(t_k + iΔt) = Fᵢ. Then, according to Eq. (8.3-6) and Eq. (8.3-9), and since Q_k(0) = 0,

Q_k(Δt) = Q_k(0) + [F₀Q_k(0) + Q_k(0)F₀ᵀ + Q]Δt = QΔt   (8.3-22)

and

Q_k(2Δt) = Q_k(Δt) + [F₁Q_k(Δt) + Q_k(Δt)F₁ᵀ + Q]Δt
         = QΔt + [F₁QΔt + QΔtF₁ᵀ + Q]Δt
         = 2QΔt + (F₁Q + QF₁ᵀ)Δt²   (8.3-23)

where Q is assumed constant. If this process is continued, one obtains

Q_k = NQΔt + Δt² Σ_{i=1}^{N−1} i(FᵢQ + QFᵢᵀ) + O(Δt³)   (8.3-24)

In addition, like the equation for Φ, the equation for Q_k has certain known properties which can be exploited. In particular, note that Eq. (8.3-6) for Q_k is identical to the matrix Riccati equation with the nonlinear term absent. Therefore, it follows from Section 4.6 that if F and Q do not change appreciably over the interval t_{k+1} − t_k, then

Q_k = Φ_AY Φ_AA⁻¹   (8.3-25)

where Φ_AY and Φ_AA are submatrices of the transition matrix

Θ(t_{k+1} − t_k) = [Φ_AA   0  ]
                   [Φ_AY  Φ_YY]   (8.3-26)

corresponding to the dynamics matrix

[−Fᵀ | 0]
[ Q  | F]   (8.3-27)

Thus, the problem reduces to determining algorithms to evaluate transition matrices. A more sophisticated application of the relationship between Q_k and the matrix Riccati equation is given in Ref. 33.
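The relation in Eqs. (8.3-25) through (8.3-27) is straightforward to mechanize once a matrix exponential routine is available. The following is a minimal sketch with assumed F, Q, and step size; for small Δt the result should approach QΔt.

```python
import numpy as np
from scipy.linalg import expm

# Sketch of Eqs. (8.3-25)-(8.3-27): Q_k from submatrices of the transition
# matrix of the augmented dynamics matrix.  F, Q, dt are assumed values.
F = np.array([[0.0, 1.0], [-2.0, -0.8]])
Q = np.diag([0.0, 0.4])            # process noise spectral density
dt = 0.1
n = F.shape[0]

A = np.zeros((2 * n, 2 * n))       # dynamics matrix of Eq. (8.3-27)
A[:n, :n] = -F.T
A[n:, :n] = Q
A[n:, n:] = F

Theta = expm(A * dt)               # transition matrix, Eq. (8.3-26)
Phi_AA = Theta[:n, :n]
Phi_AY = Theta[n:, :n]
Qk = Phi_AY @ np.linalg.inv(Phi_AA)   # Eq. (8.3-25)

print(Qk)                          # symmetric, close to Q*dt for small dt
```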

WORD LENGTH ERRORS

Computer errors associated with finite word length are best visualized by considering multiplication in a fixed-point machine. As illustrated in Fig. 8.3-3, the calculated value of the product a × b is

(ab)_calc = (ab)_true + r   (8.3-28)

where r is a remainder that must be discarded. This discarding is generally done in two ways: symmetric rounding up or down to the nearest whole number, or chopping down to the next smallest number (Ref. 34).
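The systematic bias of chopping, discussed next, can be demonstrated with a few lines of code. In the sketch below a quantum q plays the role of the last bit of the machine word; all values are assumed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch: accumulated remainder of many consistently-positive products
# under chopping vs. symmetric rounding.
q = 0.01
prods = rng.uniform(5.0, 15.0, size=1000)   # all-positive "true" products

chop = np.floor(prods / q) * q               # chopping (toward zero here)
sym = np.round(prods / q) * q                # symmetric rounding

print("mean chopping error :", (chop - prods).mean())   # biased, about -q/2
print("mean rounding error :", (sym - prods).mean())    # near zero
```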


Figure 8.3-3 Multiplication in a Fixed-Point Machine [inputs a and b; computed product ab with discarded remainder r]

Symmetric rounding and chopping are illustrated in Fig. 8.3-4. Note that in symmetric rounding, the probability distribution p(r) of the remainder is symmetric about r = 0, whereas in chopping, the remainder r is either positive or negative depending upon the sign of ab. For example, with one or two decimal digits, symmetric rounding gives 13.7 → 14, 13.3 → 13, −13.7 → −14, and −13.3 → −13; chopping gives 13.7 → 13 (r = −0.7), 13.3 → 13 (r = −0.3), −13.7 → −13 (r = 0.7), and −13.3 → −13 (r = 0.3). Thus, if the products are in fact consistently positive or negative, chopping tends to produce systematic and larger errors than symmetric rounding. This is illustrated in Fig. 8.3-5.

Figure 8.3-4 Distribution of the Remainder for Symmetric Rounding and Chopping (ℓ = number of bits)

Figure 8.3-5 Errors Due to Finite Word Length (Ref. 34) [accumulated error vs. time for (a) chopping and (b) symmetric rounding]

The errors caused by finite word length can be determined in two ways. One is to solve the Kalman filter equations by direct simulation on a machine with a very long word length, then solve the filter equations again using the shorter word length machine in question (or on a short word length machine simulation), and compare the results. The other is to use the probability distribution in Fig. 8.3-4 and theoretically compute the roundoff error. The choice of methods depends upon the availability of computers and the tradeoff between engineering time and computer time.

Analytic Methods (Ref. 34) - The basic idea of using the probability distributions in Fig. 8.3-4 is the following. Suppose, for example, it is desired to calculate

l = Hx̂,   H (n × n), x̂ (n × 1)   (8.3-29)

The value of l calculated by the computer is

(l)_calc = Hx̂ + ε = (l)_true + ε

where

ε = r₁ + r₂ + ... + r_n   (per component)   (8.3-30)

and r is the word length error committed at each multiplication. If it is assumed that all r's have the same distribution, symmetric rounding is used, and the r's are independent, then it follows from the central limit theorem (see Section 2.2) that ε is approximately gaussian and has the variance

E[ε²] = nσ_r² = n 2⁻²ℓ/12   (8.3-31)

where σ_r² is the variance of a single rounding error, uniformly distributed over one quantization interval. From this equation, it is seen that roundoff error increases as the number of calculations (n) increases and decreases as the word length (ℓ) increases, an intuitively satisfying result.

With respect to Kalman filter applications, these concepts can be used to calculate the mean and covariance of the error in the calculated value of x̂ due to word length errors. These ideas can also be extended to floating-point machines, although the manipulations become more cumbersome.

EFFECT OF WORD LENGTH ERRORS

In Kalman filter implementations, word length errors tend to manifest themselves as errors in the calculation of the "filter-computed" covariance matrix P_k(+) and in x̂_k. Since the equation for P_k(+) does not account for word length errors, P_k(+) tends to assume an unrealistically small value and perhaps even loses positive-definiteness and thus, numerical significance (positive-definiteness is generally only lost if the filter model is uncontrollable and thus the filter is not asymptotically stable). This causes inaccuracies in the calculation of the gain K_k which in turn may cause the estimate x̂_k to diverge from the true value x_k.

There are several ways in which sensitivity to word length errors can be reduced. These include minimizing the number of calculations, increasing word length (double precision), specialized Kalman filter algorithms, the ε technique, choice of state variables, and good a priori estimates. The first of these involves techniques described elsewhere in the text (decoupling, reduced state filters, etc.), while the second is an obvious solution limited by computer size. Specialized Kalman filter algorithms are discussed in Section 8.4. The remaining alternatives are discussed separately in what follows.

Choice of State Variables (Ref. 36) - Consider a spacecraft orbiting the moon and attempting to estimate altitude with an altimeter as the measurement source. There are three variables, two of which are independent and should be included as state variables: the radius of the moon, R, the radius of orbit, r, and the altitude h = r − R. After a number of measurements, h can theoretically be determined with great accuracy. However, uncertainties in R and r remain large; thus, R and h, or r and h, should be used as state variables. Otherwise, ĥ = r̂ − R̂ tends to have a large error due to roundoff errors in r̂ and R̂.

Good A Priori Estimates (Ref. 36) - For a fixed-point machine, roundoff causes errors in the last bit. Therefore, it is advantageous to fill the whole word with significant digits. For example, suppose the maximum allowable number is 1000 with a roundoff error of ±1; if all numerical quantities can be kept near 1000, the maximum error per operation ≅ 1/1000 = 0.1%. It follows that it is advantageous to obtain a good initial estimate, x̂₀. Then P_k and x̂_k will not differ greatly from P₀ and x̂₀, and all of the quantities can be scaled to fill the entire word.

The ε Technique - The ε technique was discussed in Section 8.1, in conjunction with finite memory filtering. Recall from that discussion that this technique consists of increasing the Kalman gain in such a manner as to give extra weight to the most recent measurement. The resultant theoretical P matrix is increased according to Eq. (8.1-12). Therefore, the ε technique also produces the desired result for the problem at hand, preventing P_k from becoming unrealistically small. Often a modification of the technique is used, where the diagonal terms of P_k(+) are not allowed to fall below a selected threshold.

The effectiveness of the ε technique is illustrated in Table 8.3-1. Note that without the ε technique, and contrary to the previous discussion, symmetric rounding is inferior to chopping. This is commented upon in Ref. 35 as being curious, and points out that each problem may have unique characteristics which need to be examined closely.

TABLE 8.3-1 TYPICAL ERRORS FOR AN ORBITAL ESTIMATION PROBLEM FOR 20 ORBITS (REF. 35) - 150 NM CIRCULAR ORBIT, 28-BIT MACHINE

Maximum Error:
                   | Without ε Technique | With ε Technique
Symmetric Rounding | 32,000 ft           | 153 ft
Chopping           | 12,000 ft           | 550 ft
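The diagonal-threshold modification mentioned above is trivial to mechanize. The following is a minimal sketch; the threshold values are assumed and would in practice be chosen from covariance studies.

```python
import numpy as np

# Sketch of the covariance floor modification: keep the diagonal of P(+)
# from falling below chosen thresholds (values assumed).
def apply_floor(P, floor):
    P = P.copy()
    np.fill_diagonal(P, np.maximum(P.diagonal(), floor))
    return P

P = np.array([[1e-9, 0.0], [0.0, 2.0]])
print(apply_floor(P, floor=np.array([1e-4, 1e-4])))
```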


8.4 ALGORITHMS AND COMPUTER LOADING ANALYSIS

This chapter has emphasized the fact that in any real-time application of Kalman filtering, key issues are the accuracy of the implemented algorithm in the face of finite word length and other "noise" sources, and the associated computer burden in terms of memory and execution time. In this section, certain specialized algorithms are discussed from this perspective, including an example of "computer loading analysis."

PROCESSING MEASUREMENTS ONE AT A TIME

As early as 1962, it was known that the Kalman filter algorithm was in such a form as to avoid the requirement for taking the inverse of other than a scalar (Refs. 37 and 38). This is because the update equation for the covariance matrix,

P(+) = P(−) − P(−)Hᵀ[HP(−)Hᵀ + R]⁻¹HP(−)   (8.4-1)

contains an inverse of the dimension of the measurement vector z, and this dimension can always be taken to equal one by considering the simultaneous measurement components to occur serially over a zero (i.e., very short) time span.

Example 8.4-1

Suppose a system is defined by the measurement matrices

H = [1  0]     R = [1  0]
    [0  1],        [0  1]

where the subscripts k have been dropped for convenience. It follows from Eq. (8.4-1) that

P(+) = P(−) − P(−)[P(−) + R]⁻¹P(−)   (8.4-2)

On the other hand, let us now assume that the two measurements are separated by an "instant" and process them individually. For this case, we first assume

H = [1  0],   R = [1]

and obtain the intermediate result (denoted Pⁱ(+)). Utilizing this result, the desired value of P(+) is calculated by next assuming

H = [0  1],   R = [1]

It is easily verified that this yields Eq. (8.4-2). The advantage of this method is that only a 1 × 1 matrix need be inverted, thus avoiding the programming (storage) of a matrix inverse routine. However, some penalty may be incurred in execution time. Similar manipulations can be performed to obtain x̂.

Although this example considers a case where the elements of the measurement vector are uncorrelated (i.e., R is a diagonal matrix), the technique can be extended to the more general case (Ref. 39).
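The equivalence asserted in Example 8.4-1 can be checked directly. The following is a minimal sketch with H = I and R = I as in the example; the value of P(−) is assumed for illustration.

```python
import numpy as np

# Sketch: batch covariance update, Eq. (8.4-1), vs. one-at-a-time scalar
# updates (diagonal R).  P(-) below is an assumed value.
P = np.array([[2.0, 0.5], [0.5, 1.0]])
H = np.eye(2)
R = np.eye(2)

# Batch update, Eq. (8.4-1):
S = H @ P @ H.T + R
P_batch = P - P @ H.T @ np.linalg.inv(S) @ H @ P

# Sequential scalar updates -- no matrix inverse required:
P_seq = P.copy()
for i in range(2):
    h = H[i:i + 1, :]                      # row measurement matrix
    r = R[i, i]                            # scalar noise variance
    s = (h @ P_seq @ h.T).item() + r       # scalar innovation variance
    K = (P_seq @ h.T) / s
    P_seq = P_seq - K @ h @ P_seq

print(np.allclose(P_batch, P_seq))         # True
```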

MATHEMATICAL FORM OF EQUATIONS

Recall that the covariance matrix update equations are, equivalently,

P(+) = (I − KH)P(−)   (8.4-3)

and

P(+) = (I − KH)P(−)(I − KH)ᵀ + KRKᵀ   (8.4-4)

where

K = P(−)Hᵀ[HP(−)Hᵀ + R]⁻¹

and the subscripts have again been dropped for convenience. Bucy and Joseph (Ref. 36) have pointed out that Eq. (8.4-3), although simpler, is computationally inferior to Eq. (8.4-4). If K → K + δK, it is easily shown that

δP(+) = −δK HP(−)   (8.4-5)

for Eq. (8.4-3). For Eq. (8.4-4), we instead find

δP(+) = δK[RKᵀ − HP(−)(I − KH)ᵀ] + [KR − (I − KH)P(−)Hᵀ]δKᵀ   (8.4-6)


Substituting for K, we obtain

δP(+) = 0   (8.4-7)

to first order. Equation (8.4-4) is sometimes referred to as the "Joseph algorithm." Of course, despite its inherent accuracy, Eq. (8.4-4) consumes considerably more computation time than Eq. (8.4-3).

SQUARE-ROOT FORMULATIONS

In the square-root formulation, a matrix W is calculated instead of P, where P = WWᵀ. Thus, P(+) is always assured to be positive definite. The square-root formulation gives the same accuracy in single precision as does the conventional formulation in double precision. Unfortunately, the matrix W is not unique, and this leads to a proliferation of square-root algorithms (Refs. 11, 39-42, 53, 54). The square-root algorithm due to Andrews (Ref. 41) takes the form:

Update:   W(+) = W(−)[I − Z(Uᵀ)⁻¹(U + V)⁻¹Zᵀ]   (8.4-8)

Extrapolation:   Ẇ(−) = FW(−) + ½Q[Wᵀ(−)]⁻¹   (8.4-9)

where

Z = Wᵀ(−)Hᵀ,   UUᵀ = R + ZᵀZ,   VVᵀ = R   (8.4-10)

and where the subscripts have been dropped for convenience. Note that Eq. (8.4-9) can be verified by expanding the time derivative of P:

Ṗ = ẆWᵀ + WẆᵀ
  = [FW + ½Q(Wᵀ)⁻¹]Wᵀ + W[WᵀFᵀ + ½W⁻¹Q]
  = F(WWᵀ) + ½Q + (WWᵀ)Fᵀ + ½Q
  = FP + PFᵀ + Q   (8.4-11)

Schmidt (Ref. 11) has shown how to replace the differential equation in Eq. (8.4-9) with an equivalent difference equation.

Of course, a penalty is paid in that the "square root" of certain matrices such as R must be calculated, a somewhat tedious process involving eigenvalue-eigenvector routines. An indication of the number of extra calculations required to implement the square-root formulation can be seen from the sample case illustrated in Table 8.4-1, which is for a state vector of dimension 10 and a scalar measurement. This potential increase in computation time has motivated a search for efficient square-root algorithms. Recently, Carlson (Ref. 53) has derived an algorithm which utilizes a lower triangular form for W to improve computation speed. Carlson demonstrates that this algorithm approaches or exceeds the speed of the conventional algorithm for low-order filters and reduces existing disadvantages of square-root filters for the high-order case.

TABLE 8.4-1 COMPARISON OF THE NUMBER OF CALCULATIONS INVOLVED IN THE CONVENTIONAL AND SQUARE-ROOT FORMULATIONS OF THE KALMAN FILTER (REF. 11)

Update:
             | Square Roots | M&D  | A&S  | Equivalent M&D
Conventional | 0            | 310  | 211  | 352
Square-Root  | 1            | 322  | 302  | 387

Extrapolation:
             | Square Roots | M&D  | A&S  | Equivalent M&D
Conventional | 0            | 2100 | 2250 | 2550
Square-Root  | 10           | 4830 | 4785 | 5837

Note: M&D = multiplications and divisions; A&S = additions and subtractions.
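The roundoff advantage of Eqs. (8.4-4) and (8.4-8) can be reproduced numerically. The following sketch uses the setup of Example 8.4-2, which follows, with single-precision arithmetic standing in for a short word length; eps is chosen so that 1 + eps ≠ 1 but 1 + eps² rounds to 1.

```python
import numpy as np

eps = np.float32(1e-4)
P = np.eye(2, dtype=np.float32)                    # P(-)
H = np.array([[1.0, 0.0]], dtype=np.float32)
r = eps * eps                                      # noise variance

s = (H @ P @ H.T + r)[0, 0]                        # rounds to 1 in float32
K = (P @ H.T) / s
I_KH = np.eye(2, dtype=np.float32) - K @ H

P_conv = I_KH @ P                                  # Eq. (8.4-3)
P_jos = I_KH @ P @ I_KH.T + (K * r) @ K.T          # Eq. (8.4-4)

W = np.eye(2, dtype=np.float32)                    # P = W W^T
z = W.T @ H.T                                      # Z, Eq. (8.4-10)
u = np.sqrt(r + z.T @ z)                           # U (scalar here)
v = np.sqrt(r)                                     # V, with V V^T = R
W_new = W @ (np.eye(2, dtype=np.float32) - (z / u) @ (z.T / (u + v)))
P_sqrt = W_new @ W_new.T                           # Eq. (8.4-8)

print("conventional P11:", P_conv[0, 0])           # 0.0 -- variance lost
print("Joseph       P11:", P_jos[0, 0])            # ~ eps**2
print("square root  P11:", P_sqrt[0, 0])           # ~ eps**2
```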


Example 8.4-2 illustrates the manner in which square-root algorithms minimize roundoff error.

Example 8.4-2 (Ref. 39)

Suppose

P(−) = [1  0]     H = [1  0],   r = ε²
       [0  1],

where ε ≪ 1 and, to simulate computer word length roundoff, we assume 1 + ε ≠ 1 but 1 + ε² ≐ 1. It follows that the exact value for P(+) is

P(+) = [ε²/(1 + ε²)  0]
       [0            1]

whereas the value calculated in the computer using the standard Kalman filter algorithm in Eq. (8.4-1) is

P(+) = [0  0]
       [0  1]

Using the square-root algorithm in Eq. (8.4-8), the result is

P(+) = [ε²/(1 + ε)²  0]
       [0            1]

Clearly, the conventional formulation may lead to divergence problems.

COMPUTER LOADING ANALYSIS

The burden that Kalman filtering algorithms place on real-time digital computers is considerable. Meaningful measures of this burden are storage and computation time. The first item impacts the memory requirements of the computer, whereas the second item helps to determine the rate at which measurements can be accepted. If a Kalman filter algorithm is to be programmed in real time, it is generally necessary to form some estimate of these storage and computation time requirements. These estimates can then be used to establish tradeoffs between computer size and speed and algorithm complexity. Of course, the constraints of the situation must be recognized; frequently the computer has been selected on the basis of other considerations, and the Kalman filter algorithm simply must be designed to "fit."

Computation time can be estimated by inspecting Kalman filter algorithm equations - i.e., counting the number of "adds," "multiplies," and "divides," multiplying by the individual computer execution times, and totalling the results. Additional time should also be added for logic (e.g., branching instructions), linkage between "executive" and subprograms, etc. An example of the results of such a calculation for several airborne computers is shown in Table 8.4-2 for a 20-state filter with a 9-state measurement. "Total cycle time" refers to the time required to compute the Kalman filter covariance and gain, update the estimate of the state, and extrapolate the covariance to the next measurement time. Considering the nature of the Kalman filter equations, one can see from the execution times in Table 8.4-2 that the computer multiply time tends to dominate the algebraic calculations. In Ref. 44, it is shown that logic time is comparable to multiply time.

TABLE 8.4-2 TYPICAL CYCLE TIME FOR KALMAN FILTER EQUATIONS (20-STATE FILTER, 9-STATE MEASUREMENT)

                               | 1972 Computer | 1968 Computer | 1965 Computer | 1963 Computer
                               | (Fast)        | (Moderate)    | (Slow)        | (Slow - No Floating Point Hardware)
Load (μsec)                    | 0.4           | 1             | 8             | 120
Multiply (μsec)                | 1.0           | 6             | 100           | 800
Divide (μsec)                  | 4.0           | 7             | 130           | 1000
Add (μsec)                     | 0.8           | 2             | 50            | 300
Store (μsec)                   | 0.4           | 2             | 8             | 120
Increment Index Register (μsec)| 0.4           | 1             | 8             | 80
Estimated Cycle Time (sec)     | 0.18          | 0.9           | 11.8          | 97.5

Storage, or computer memory, can be divided into the space required to store the program instructions and the space required to store scalars, vectors, matrices, etc. If one does not have past experience to draw upon, the first step in determining the space required for the program instructions is to simply write


the program out in the appropriate language. Then, using the knowledge of the particular machine in question, the number of storage locations can be determined. For example, the instruction A + B = C requires 1 1/2 28-bit words in the Litton LC-728 computer.

When determining the storage required for matrices, etc., care must be taken to account for the fact that many matrices are symmetric, and that it may be possible to avoid storing zeros. For example, any n × n covariance matrix only requires n(n + 1)/2 storage locations. Similarly, transition matrices typically contain many zeros, and these need not be stored. Of course, overlay techniques should also be used when possible; this can be done for P_k(+) and P_k(−), for example.
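The n(n + 1)/2 observation amounts to storing one triangle of the symmetric matrix as a flat array. The following is a minimal sketch of the index arithmetic (row-major upper triangle); the helper name is, of course, only illustrative.

```python
# Sketch: packed storage of a symmetric n x n matrix in n(n+1)/2 locations.
def sym_index(i, j, n):
    """Flat index of element (i, j) of the packed upper triangle."""
    if i > j:
        i, j = j, i                    # symmetry: (i, j) and (j, i) coincide
    return i * n - i * (i - 1) // 2 + (j - i)

n = 4
flat = [0.0] * (n * (n + 1) // 2)      # 10 locations instead of 16
flat[sym_index(2, 1, n)] = 3.5         # stores P[1][2] == P[2][1]
print(flat)
```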

For the example illustrated in Table 8.4-2, 3300 32-bit words of storage were required. Of these, 900 words were devoted to storage of the program instructions. A further example of storage requirements is illustrated in Table 8.4-3. The total storage requirement (exclusive of program linkage) is 793 32-bit words. The computer in question is the Raytheon RAC-230, and the speed of this machine is comparable with the "1968 computer" in Table 8.4-2 (Ref. 45). The Kalman filter equation cycle time is computed to be 9.2 msec.

TABLE 8.4-3 COMPUTER OPERATIONS REQUIRED FOR A 5-STATE FILTER WITH A 3-STATE MEASUREMENT*

Operation                                     | Erasable Storage | Permanent Storage | Program Instructions | Divides | Multiplies | Adds
1. Compute Φ                                  | 5  | -  | 23  | -  | 6   | 2
2. Compute P_k(−) = ΦP_{k−1}(+)Φᵀ             | 29 | -  | 58  | -  | 250 | 300
3. Compute H                                  | 31 | -  | 345 | 54 | 92  | 51
4. Compute K_k = P_k(−)Hᵀ[HP_k(−)Hᵀ + R]⁻¹    | 59 | 9  | 164 | 9  | 144 | 173
5. Compute P_k(+) = [I − K_kH]P_k(−)          | 2  | -  | 22  | -  | 75  | 125
6. Compute x̂_k(+) = x̂_k(−) + K_k[z_k − Hx̂_k(−)] | 3 | 2 | 35  | -  | 18  | 21
Total                                         | 98 | 65 | 630 | 63 | 581 | 670

*There is no process noise in this example (Q = 0).


Note that for the 5-state filter in Table 8.4-3, program instructions comprise the major portion of computer storage, whereas the opposite is true for the 20-state filter in Table 8.4-2. The reason for this is that the matrix storage space tends to be proportional to the square of the state size. Conversely, program instruction storage is relatively independent of state size, so long as branch-type instructions are used. Branch instructions are those that set up looping operations. Thus, for matrix multiplication, it would require the same number of machine instructions to multiply a 2 × 2 matrix as a 20 × 20 matrix.

Gura and Bierman (Ref. 43), Mendel (Ref. 44) and others have attempted to develop parametric curves of computer storage and time requirements for several Kalman filter algorithms ("standard" Kalman, Joseph, square-root, etc.). A partial summary of their storage results for large n (n is the state vector dimension) is shown in Table 8.4-4, where m is the dimension of the measurement vector. Mendel has considered only the "standard" Kalman filter algorithm, and has constructed plots of storage and time requirements vs. the dimension of the filter state, measurement, and noise.*

TABLE 8.4-4 KALMAN FILTER STORAGE REQUIREMENTS FOR LARGE n (PROGRAM INSTRUCTIONS NOT INCLUDED) (REF. 43)

Storage Locations:

Algorithm                     | n ≫ m  | n ≥ 1          | n ≫ 1  | m ≫ n
Standard Kalman†              | 2.5 n² | 3.5 n(n + 1.3) | 3.5 n² | m²
Joseph‡                       | 2.5 n² | 7.5 n(n + 1)   | 7.5 n² | m²
Andrews Square-Root           | 3 n²   | 5.5 n(n + 0.8) | 5.5 n² | 2.5 m²
Standard Kalman (no symmetry) | 3 n²   | 5 n(n + 0.6)   | 5 n²   | 2 m²
Joseph (no symmetry)          | 3 n²   | 6 n(n + 0.6)   | 6 n²   | 2 m²

†Eq. (8.4-1)   ‡Eq. (8.4-4)   n is the state vector dimension; m is the measurement vector dimension.

*Recall R may be singular - i.e., the dimension of v_k may be less than the dimension of z_k.


REFERENCES

1. Price, C.F., "An Analysis of the Divergence Problem in the Kalman Filter," IEEE Transactions on Automatic Control, Vol. AC-13, No. 6, December 1968, pp. 699-702.

2. Fitzgerald, R.J., "Error Divergence in Optimal Filtering Problems," Second IFAC Symposium on Automatic Control in Space, Vienna, Austria, September 1967.

3. Fitzgerald, R.J., "Divergence of the Kalman Filter," IEEE Transactions on Automatic Control, Vol. AC-16, No. 6, December 1971, pp. 736-747.

4. Nahi, N.E. and Schaefer, B.M., "Decision-Directed Adaptive Recursive Estimators: Divergence Prevention," IEEE Transactions on Automatic Control, Vol. AC-17, No. 1, February 1972, pp. 61-67.

5. Fraser, D.C., "A New Technique for the Optimal Smoothing of Data," Ph.D. Thesis, Massachusetts Institute of Technology, January 1967.

6. D'Appolito, J.A., "The Evaluation of Kalman Filter Designs for Multisensor Integrated Navigation Systems," The Analytic Sciences Corp., AFAL-TR-70-271, (AD 881206), January 1971.

7. Lee, R.C.K., "A Moving Window Approach to the Problems of Estimation and Identification," Aerospace Corp., El Segundo, California, Report No. TR-1001(2307)-23, June 1967.

8. Jazwinski, A.H., "Limited Memory Optimal Filtering," IEEE Transactions on Automatic Control, Vol. AC-13, No. 5, October 1968, pp. 558-563.

9. Jazwinski, A.H., Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.

10. Schmidt, S.F., Weinberg, J.D., and Lukesh, J.S., "Case Study of Kalman Filtering in the C-5 Aircraft Navigation System," Case Studies in System Control, University of Michigan, June 1968, pp. 57-109.

11. Schmidt, S.F., "Computational Techniques in Kalman Filtering," Theory and Applications of Kalman Filtering, Advisory Group for Aerospace Research and Development, AGARDograph 139, (AD 704 306), February 1970.

12. Deutsch, R., Estimation Theory, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1965.

13. Fagin, S.L., "Recursive Linear Regression Theory, Optimal Filter Theory, and Error Analysis of Optimal Systems," IEEE International Convention Record, March 1964, pp. 216-240.

14. Tarn, T.S. and Zaborsky, J., "A Practical Nondiverging Filter," AIAA Journal, Vol. 8, No. 6, June 1970, pp. 1127-1133.

15. Sacks, J.E. and Sorenson, H.W., "Comment on 'A Practical Nondiverging Filter,'" AIAA Journal, Vol. 9, No. 4, April 1971, pp. 767-768.

16. Miller, R.W., "Asymptotic Behavior of the Kalman Filter with Exponential Aging," AIAA Journal, Vol. 9, No. 3, March 1971, pp. 537-539.

17. Jazwinski, A.H., "Adaptive Filtering," Automatica, Vol. 5, 1969, pp. 975-985.

18. Leondes, C.T., ed., Guidance and Control of Aerospace Vehicles, McGraw-Hill Book Co., Inc., New York, 1963.

19. Britting, K.R., Inertial Navigation Systems Analysis, John Wiley & Sons, New York, 1971.

20. Nash, R.A., Jr., Levine, S.A., and Roy, K.J., "Error Analysis of Space-Stable Inertial Navigation Systems," IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-7, No. 4, July 1971, pp. 617-629.

21. Hutchinson, C.E. and Nash, R.A., Jr., "Comparison of Error Propagation in Local-Level and Space-Stable Inertial Systems," IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-7, No. 6, November 1971, pp. 1138-1142.

22. Kayton, M. and Fried, W., eds., Avionics Navigation Systems, John Wiley & Sons, Inc., New York, 1969.

23. O'Halloran, W.F., Jr., "A Suboptimal Error Reduction Scheme for a Long-Term Self-Contained Inertial Navigation System," National Aerospace and Electronics Conference, Dayton, Ohio, May 1972.

24. Nash, R.A., Jr., D'Appolito, J.A., and Roy, K.J., "Error Analysis of Hybrid Inertial Navigation Systems," AIAA Guidance and Control Conference, Stanford, California, August 1972.

25. Bona, B.E. and Smay, R.J., "Optimum Reset of Ship's Inertial Navigation System," IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-2, No. 4, July 1966, pp. 409-414.

26. Olin, P.P., "Real Time Kalman Filtering of APOLLO LM/AGS Rendezvous Radar Data," AIAA Guidance, Control and Flight Mechanics Conference, Santa Barbara, California, August 1970.

27. Brown, R.G., "Analysis of an Integrated Inertial-Doppler-Satellite Navigation System, Part I, Theory and Mathematical Models," Engineering Research Institute, Iowa State University, Ames, Iowa, ERI-62600, November 1969.

28. Brown, R.G. and Hartmann, G.L., "Kalman Filter with Delayed States as Observables," Proceedings of the National Electronics Conference, Chicago, Illinois, Vol. 24, December 1968, pp. 67-72.

29. Klementis, K.A. and Standish, C.J., "Final Report - Phase I - Synergistic Navigation System Study," IBM Corp., Owego, N.Y., IBM No. 67-923-7, (AD 678 070), October 1966.

30. Faddeev, D.K. and Faddeeva, V.N., Computational Methods of Linear Algebra, W.H. Freeman and Co., San Francisco, Cal., 1963.

31. Kelly, L.G., Handbook of Numerical Methods and Applications, Addison-Wesley Publishing Co., Reading, Mass., 1967.

32. Grove, W.E., Brief Numerical Methods, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1968.

33. D'Appolito, J.A., "A Simple Algorithm for Discretizing Linear Stationary Continuous Time Systems," Proc. of the IEEE, Vol. 54, No. 12, December 1966, pp. 2010-2011.

34. Lee, J.S. and Jordan, J.W., "The Effect of Word Length in the Implementation of an Onboard Computer," ION National Meeting, Los Angeles, California, July 1967.

35. "Computer Precision Study II," IBM Corp., Rockville, Maryland, Report No. SSD-TDR-65-134, (AD 474 102), October 1965.

36. Bucy, R.S. and Joseph, P.D., Filtering for Stochastic Processes with Applications to Guidance, Interscience Publishers, N.Y., 1968.

37. Ho, Y.C., "The Method of Least Squares and Optimal Filtering Theory," Memo. RM-3329-PR, RAND Corp., Santa Monica, Cal., October 1962.

38. Ho, Y.C., "On the Stochastic Approximation Method and Optimal Filtering Theory," J. Math. Analysis and Applications, Vol. 6, 1963, pp. 152-154.

39. Kaminski, P.G., Bryson, A.E., Jr., and Schmidt, S.F., "Discrete Square Root Filtering: A Survey of Current Techniques," IEEE Transactions on Automatic Control, Vol. AC-16, No. 6, December 1971, pp. 727-736.

40. Bellantoni, J.F. and Dodge, K.W., "A Square Root Formulation of the Kalman-Schmidt Filter," AIAA Journal, Vol. 5, No. 7, July 1967, pp. 1309-1314.

41. Andrews, A., "A Square Root Formulation of the Kalman Covariance Equations," AIAA Journal, June 1968, pp. 1165-1166.

42. Battin, R.H., Astronautical Guidance, McGraw-Hill Book Co., Inc., New York, 1964, pp. 338-339.

43. Gura, I.A. and Bierman, A.B., "On Computational Efficiency of Linear Filtering Algorithms," Automatica, Vol. 7, 1971, pp. 299-314.

44. Mendel, J.M., "Computational Requirements for a Discrete Kalman Filter," IEEE Transactions on Automatic Control, Vol. AC-16, No. 6, December 1971, pp. 748-758.

45. Dawson, W.F. and Parke, N.G., IV, "Development Tools for Strapdown Guidance Systems," AIAA Guidance, Control and Flight Dynamics Conference, Paper No. 68-826, August 1968.

46. Toda, N.F., Schlee, F.H., and Obsharsky, P., "The Region of Kalman Filter Convergence for Several Autonomous Modes," AIAA Guidance, Control and Flight Dynamics Conference, AIAA Paper No. 67-623, Huntsville, Alabama, August 1967.

47. Schlee, F.H., Standish, C.J., and Toda, N.F., "Divergence in the Kalman Filter," AIAA Journal, Vol. 5, No. 6, pp. 1114-1120, June 1967.

48. Davisson, L.D., "The Theoretical Analysis of Data Compression Systems," Proceedings of the IEEE, Vol. 56, pp. 176-186, February 1968.

49. Bar-Shalom, Y., "Redundancy and Data Compression in Recursive Estimation," IEEE Trans. on Automatic Control, Vol. AC-17, No. 5, October 1972, pp. 684-689.

50. Dressler, R.M. and Ross, D.W., "A Simplified Algorithm for Suboptimal Non-Linear State Estimation," Automatica, Vol. 6, May 1970, pp. 477-480.

51. Joglekar, A.N., "Data Compression in Recursive Estimation with Applications to Navigation Systems," Dept. of Aeronautics and Astronautics, Stanford University, Stanford, California, SUDAAR No. 458, July 1973.

52. Joglekar, A.N. and Powell, J.D., "Data Compression in Recursive Estimation with Applications to Navigation Systems," AIAA Guidance and Control Conference, Key Biscayne, Florida, Paper No. 73-901, August 1973.

53. Carlson, N.A., "Fast Triangular Formulation of the Square Root Filter," AIAA Journal, Vol. 11, No. 5, pp. 1259-1265, September 1973.

54. Dyer, P. and McReynolds, S., "Extension of Square-Root Filtering to Include Process Noise," Journal of Optimization Theory and Applications, Vol. 3, No. 6, pp. 444-458, 1969.

PROBLEMS

Problem 8-1

Verify that Eq. (8.1-12) does yield the correct covariance matrix when the ε technique is employed.

Problem 8-2

In Example 8.1-3, derive the equation for p_∞.

Problem 8-3

Show that when the second measurement in Example 8.4-1 is processed, the correct covariance matrix is recovered.

Problem 8-4

Show formally why the filter in Example 8.1-1 is not asymptotically stable.

Problem 8-5

Apply the ε technique to the system in Examples 8.1-1 and 8.1-3. Compare the result with the other two techniques in the examples.

Problem 8-6

Show the relationship between the exponential series for the transition matrix [Eq. (8.3-18)] and the Euler and modified Euler integration algorithms in Section 8.3.

Problem 8-7

Using the definition P ≡ WWᵀ, set up the equations for W, the square root of a given 2 × 2 matrix P. Note that the equations do not have a unique solution. What happens when W is constrained to be triangular?


9. ADDITIONAL TOPICS

This chapter presents brief treatments of several important topics which are closely related to the material presented thus far; each of the topics selected has practical value. For example, while it has been assumed until now that the optimal filter, once selected, is held fixed in any application, it is entirely reasonable to ask whether information acquired during system operation can be used to improve upon the a priori assumptions that were made at the outset. This leads us to the topic of adaptive filtering, treated in Section 9.1. One might also inquire as to the advantages of a filter chosen in a form similar to that of the Kalman filter, but in which the gain matrix, K(t), is specified on a basis other than to produce a statistically optimal estimate. This question is imbedded in the study of observers, Section 9.2, which originated in the treatment of state reconstruction for linear deterministic systems. Or, one might be interested in the class of estimation techniques which are not necessarily optimal in any statistical sense, but which yield recursive estimators possessing certain well-defined convergence properties. These stochastic approximation methods are examined in Section 9.3. The subject of real-time parameter identification can be viewed as an application of nonlinear estimation theory; it is addressed in Section 9.4. Finally, the very important subject of optimal control - whose mathematics, interestingly enough, closely parallels that encountered in optimal estimation - is treated in Section 9.5.

9.1 ADAPTIVE KALMAN FILTERING

We have seen that for a Kalman filter to yield optimal performance, it is necessary to provide the correct a priori descriptions of F, G, H, Q, R, and P(0). As a practical fact, this is usually impossible; guesses of these quantities must be advanced. Hopefully, the filter design will be such that the penalty for misguesses is small. But we may raise an interesting question - i.e., "Is it possible to deduce non-optimal behavior during operation and thus improve the quality of a priori information?" Within certain limits, the answer is yes. The particular viewpoint given here largely follows Mehra (Ref. 1); other approaches can be found in Refs. 2-4.

INNOVATIONS PROPERTY OF THE OPTIMAL FILTER

For a continuous system and measurement given by

ẋ(t) = F(t)x(t) + G(t)w(t)   (9.1-1)

z(t) = H(t)x(t) + v(t)   (9.1-2)

and a filter given by

dx̂(t)/dt = F(t)x̂(t) + K(t)ν(t)   (9.1-3)

ν(t) = z(t) − H(t)x̂(t)   (9.1-4)

the innovations property (Ref. 5) states that, if K is the optimal gain,

E[ν(t₂)νᵀ(t₁)] = R(t₁)δ(t₂ − t₁)   (9.1-5)

In other words, the innovations process, ν, is a white noise process. Heuristically, there is no "information" left in ν if x̂ is an optimal estimate.

Equation (9.1-5) is readily proved. From Eqs. (9.1-2) and (9.1-4) we see that (x̃ ≜ x̂ − x)

ν = v − Hx̃   (9.1-6)

Thus, for t₂ > t₁, we get

E[ν(t₂)νᵀ(t₁)] = H(t₂)E[x̃(t₂)x̃ᵀ(t₁)]Hᵀ(t₁) − H(t₂)E[x̃(t₂)vᵀ(t₁)] + R(t₁)δ(t₂ − t₁)   (9.1-7)

since the remaining cross term, E[v(t₂)x̃ᵀ(t₁)], vanishes - future measurement noise is uncorrelated with past estimation error. From Eqs. (9.1-1, 2, 3, 4) it is seen that x̃ satisfies the differential equation

dx̃/dt = (F − KH)x̃ + Kv − Gw   (9.1-8)


The solution to this equation is

\tilde{x}(t_2) = \Phi(t_2, t_1)\tilde{x}(t_1) + \int_{t_1}^{t_2} \Phi(t_2, \tau)\,[K(\tau) v(\tau) - G(\tau) w(\tau)]\, d\tau    (9.1-9)

where Φ(t₂, t₁) is the transition matrix corresponding to (F − KH). Using Eq. (9.1-9), we directly compute

E[\tilde{x}(t_2)\tilde{x}^T(t_1)] = \Phi(t_2, t_1) P(t_1)    (9.1-10)

E[\tilde{x}(t_2) v^T(t_1)] = \Phi(t_2, t_1) K(t_1) R(t_1)    (9.1-11)

Therefore, from Eqs. (9.1-7, 10, 11),

E[\nu(t_2)\nu^T(t_1)] = H(t_2)\Phi(t_2, t_1)[P(t_1) H^T(t_1) - K(t_1) R(t_1)] + R(t_1)\,\delta(t_2 - t_1)    (9.1-12)

But for the optimal filter K(t₁) = P(t₁)Hᵀ(t₁)R⁻¹(t₁); therefore,

E[\nu(t_2)\nu^T(t_1)] = R(t_1)\,\delta(t_2 - t_1)    (9.1-13)

which is the desired result. Note that Eq. (9.1-12) could have been employed in a derivation of the Kalman gain as that which "whitened" the process ν(t).

ADAPTIVE KALMAN FILTER

At this point we restrict our attention to time-invariant systems for which Eqs. (9.1-1) and (9.1-3) are stable. Under this condition the autocorrelation function, E[ν(t₂)νᵀ(t₁)], is a function of τ = t₂ − t₁ only, viz.:

E[\nu(t_2)\nu^T(t_1)] = H e^{(F-KH)\tau}[P H^T - K R] + R\,\delta(\tau), \qquad \tau = t_2 - t_1 \geq 0    (9.1-14)

In the stationary, discrete-time case, corresponding results are

E[\nu_k \nu_{k-j}^T] = H P(-) H^T + R, \quad \text{for } j = 0
E[\nu_k \nu_{k-j}^T] = H[\Phi(I - KH)]^{j-1}\Phi\,\bigl(P(-) H^T - K[H P(-) H^T + R]\bigr), \quad \text{for } j > 0    (9.1-15)

which is independent of k. Here, Φ is the discrete system transition matrix. Note that the optimal choice, K = P(−)Hᵀ[HP(−)Hᵀ + R]⁻¹, makes the expression vanish for all j ≠ 0.
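To make the whiteness test implied by Eq. (9.1-15) concrete, the following Python sketch - an illustration with made-up numbers, not part of the original text - runs a steady-state scalar filter with an assumed gain and computes the normalized sample autocorrelation of the innovation sequence. Lags falling outside roughly ±1.96/√N signal a non-optimal gain.

import numpy as np

# Scalar system x_k = phi x_{k-1} + w_k, z_k = x_k + v_k, filtered with an
# assumed (here deliberately non-optimal) steady-state gain K.
rng = np.random.default_rng(0)
phi, q_true, r_true = 0.95, 1.0, 2.0
K_assumed = 0.2                 # the optimal gain is ~0.48; with it, the
N = 20000                       # nonzero-lag correlations collapse to noise

x, xhat = 0.0, 0.0
innov = np.empty(N)
for k in range(N):
    x = phi * x + rng.normal(scale=np.sqrt(q_true))
    z = x + rng.normal(scale=np.sqrt(r_true))
    nu = z - phi * xhat                  # innovation: z_k minus predicted measurement
    xhat = phi * xhat + K_assumed * nu
    innov[k] = nu

nu0 = innov - innov.mean()
rho = [np.dot(nu0[:N - j], nu0[j:]) / np.dot(nu0, nu0) for j in range(1, 11)]
print("rho(1..10):", np.round(rho, 3), "  95% band: +/-", round(1.96 / np.sqrt(N), 3))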

In an adaptive Kalman filter, the innovations property is used as a criterion to test for optimality; see Fig. 9.1-1.

[Figure 9.1-1 Adaptive Kalman Filter - an identifier applies tests on the innovation sequence and issues changes to the filter gain K_k]

Employing tests for whiteness, mean and covariance, the experimentally measured steady-state correlation function E[ν_k ν_{k−j}ᵀ] is processed to identify unknown Q and R, for known F, G and H. It can be shown that, for this case, the value of K which whitens the innovations process is the optimal gain. If F, G, and H are also unknown, the equations for identification are much more complicated. Care must be exercised in the identification of unknown system matrices from the innovations sequence; for example, it is known that whiteness of the innovations sequence is not a sufficient condition to identify an unknown system F matrix. Thus, non-unique solutions for the system F matrix can be obtained from an identification scheme based on the innovations sequence. The following simple example demonstrates use of the innovations sequence for adaptive Kalman filtering.

Example 9.1-1

Suppose we have the continuous (scalar) system and measurement given by

\dot{x} = w, \qquad w \sim N(0, q)

z = x + v, \qquad v \sim N(0, r)

and utilize the measurement data to estimate x according to

\dot{\hat{x}} = k(z - \hat{x})

where k is based on a set of incorrect assumed values for q and r. The true values of q and r can be deduced as follows.


First, for steady-state system operation, the process ν(t) = z − x̂ is recorded and its autocorrelation function obtained. For a sufficient amount of information, this yields [F = 0, H = 1 in Eq. (9.1-12)]

E[\nu(t)\nu(t-\tau)] = \varphi_{\nu\nu}(\tau) = e^{-k|\tau|}(p_\infty - kr) + r\,\delta(\tau)

Next, this experimental result is further processed to yield the numbers r and (p∞ − kr). Here, p∞ is the steady-state value of p; it is the solution to the linear variance equation associated with Eq. (9.1-8),

\dot{p}_\infty = 0 = -2 k p_\infty + q + k^2 r

With the values of k, r, and p∞ already known, this equation is solved for q. Thus, we have identified r and q based on analysis of data acquired during system operation, for this simple example.
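The identification procedure of this example can be simulated directly. The sketch below (Python; the true q, r, the gain k, and the step size are all made up for illustration) records the innovation process of the suboptimal filter, estimates its autocorrelation, and recovers r and q as described above. Long data records are needed for tight estimates.

import numpy as np

rng = np.random.default_rng(1)
q_true, r_true = 4.0, 1.0       # unknown to the filter designer
k = 1.0                         # filter gain based on wrong assumed q, r
dt, N = 0.01, 200_000

x, xhat = 0.0, 0.0
nu = np.empty(N)
for n in range(N):
    x += np.sqrt(q_true * dt) * rng.standard_normal()      # xdot = w
    z = x + np.sqrt(r_true / dt) * rng.standard_normal()   # sampled white v, PSD r
    nu[n] = z - xhat
    xhat += k * nu[n] * dt                                  # xhat_dot = k (z - xhat)

C = lambda j: np.dot(nu[j:], nu[:N - j]) / (N - j)   # sample autocorrelation
A = C(1) * np.exp(k * dt)        # extrapolated exponential part: A ~ (p_inf - k r)
r_hat = (C(0) - A) * dt          # the delta-function weight at lag zero gives r
p_inf = A + k * r_hat
q_hat = 2 * k * p_inf - k**2 * r_hat    # from 0 = -2 k p_inf + q + k^2 r
print(f"r_hat = {r_hat:.2f} (true {r_true}),  q_hat = {q_hat:.2f} (true {q_true})")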


In practice, the crux of the matter is processing E[ν_k ν_{k−j}ᵀ] to yield the quantities of interest. For the high-order systems of practical interest, the algorithms proposed in Refs. 1 and 3 may not work as well as theory would predict; other more heuristically motivated approaches may be both computationally simpler and more effective (e.g., Ref. 6). Nevertheless, the viewpoint presented herein is enlightening, and thus worthwhile. Extensions of the viewpoint to non-real-time adaptive smoothing may be accomplished through application to the forward filter. Here again, heuristically motivated and computationally simpler approaches may have a great deal to offer in practice.

9.2 OBSERVERS

In some estimation problems, it may be desired to reconstruct the state of a deterministic, linear dynamical system based on exact observations of the system output. For deterministic problems of this nature, stochastic estimation concepts are not directly applicable. Luenberger (Refs. 7 and 8) formulated the notion of an observer for reconstructing the state vector of an observable deterministic linear system from exact measurements of the output.

Assume that m linearly independent, noise-free measurements are available from an nth-order system (m < n). The initial system state, x₀, is assumed to be a random vector. Then an observer of order (n − m) can be formulated which, by observing the system output, will reconstruct the current state of the system exactly in an asymptotic sense. Hence, an observer is a reduced-order estimator. A major application of observer concepts has been to deterministic feedback control problems, where the control law may depend on knowledge of all the system states, while only limited combinations of the states are measurable (Ref. 9).

As formulated by Luenberger, an observer is designed to be an exponential estimator - i.e., for a time-invariant linear system, the estimation error will decay exponentially. Since there is no stochastic covariance equation which can be used to specify a unique optimal observer in the minimum mean square error sense, the eigenvalues of the observer can be chosen arbitrarily to achieve desired response characteristics. The observer response time chosen should be fast enough to provide convergence of the estimates within the time interval of interest. Observers can also be constructed to provide accurate state estimates for time-varying, deterministic systems, provided the observer response time is chosen to be short relative to the system time variations.

In the sequel, the theory of reduced-order observers for continuous deterministic dynamic systems is presented. These results are then generalized to continuous stochastic estimation problems, containing both noisy and noise-free measurements. The stochastic observer unifies the concepts of deterministic Luenberger observer theory and stochastic Kalman filtering theory (Refs. 10 and 11). Only continuous linear systems are treated herein; however, analogous results have also been derived for discrete systems (Refs. 12, 13 and 14).

OBSERVERS FOR DETERMINISTIC SYSTEMS

In this section, Luenberger's theory of reduced-order observers is described in a form which facilitates extension to stochastic state estimation for time-varying systems. Consider a linear deterministic nth-order system described by

\dot{x}(t) = F(t) x(t) + L(t) u(t); \qquad x(t_0) = x_0    (9.2-1)

where u(t) is a deterministic (control) input. Observations of the state are available according to

z(t) = H(t) x(t)    (9.2-2)

H(t) is an m × n measurement matrix (m < n) which is assumed to be of full rank. Thus, z(t) represents m linearly independent combinations of the state vector, x(t). It is also assumed that the system described by Eqs. (9.2-1) and (9.2-2) is completely observable (the observability condition is defined in Section 3.5).

It is desired to provide an estimate of the state vector, x(t), employing an (n − m)th-order observer. To do this, introduce an (n − m)-dimensional vector y(t),

y(t) = T(t) x(t)    (9.2-3)

such that

\begin{bmatrix} T(t) \\ H(t) \end{bmatrix}    (9.2-4)


is a nonsingular matrix. The vector y(t) represents (n − m) linear combinations of the system states which are independent of the measurements, z(t). It is therefore possible to obtain the inverse transformation

x(t) = \begin{bmatrix} T(t) \\ H(t) \end{bmatrix}^{-1} \begin{bmatrix} y(t) \\ z(t) \end{bmatrix}    (9.2-5)

For convenience, define

\begin{bmatrix} T(t) \\ H(t) \end{bmatrix}^{-1} \triangleq [A(t) \;\vdots\; B(t)]    (9.2-6)

so that

x(t) = A(t) y(t) + B(t) z(t)    (9.2-7)

The concept of observers is based on devising an (n − m)th-order estimator for the transformed state vector y(t), which can then be used to reconstruct an estimate of the original state vector x(t), according to the relationship of Eq. (9.2-7). In the following development, the form of the dynamic observer is presented and the corresponding error equations derived.

At the outset, some constraint relationships can be established between the A, B, T and H matrices, viz*:

A T + B H = I    (9.2-8)

T A = I, \quad T B = 0, \quad H A = 0, \quad H B = I    (9.2-9)

These constraints, which are a direct consequence of the inverse relationship defined in Eq. (9.2-6), are useful in what follows.

A differential equation for y can be easily obtained by differentiating Eq. (9.2-3) and substituting from Eqs. (9.2-1) and (9.2-7). The result is

\dot{y} = (TFA + \dot{T}A) y + (TFB + \dot{T}B) z + TLu    (9.2-10)

By differentiating the appropriate partitions of Eq. (9.2-9), it is seen that the relationships ṪA = −TȦ and ṪB = −TḂ must hold. It is convenient to substitute these relationships into Eq. (9.2-10) to obtain an equivalent differential equation for y, in the form

\dot{y} = (TFA - T\dot{A}) y + (TFB - T\dot{B}) z + TLu    (9.2-11)

In order to reconstruct an estimate, x̂, of the vector x, it is appropriate to design an observer which models the known dynamics of y, given by Eq. (9.2-11), utilizing z and u as known inputs. We are therefore led to an observer of the form

\dot{\hat{y}} = (TFA - T\dot{A}) \hat{y} + (TFB - T\dot{B}) z + TLu    (9.2-12a)

\hat{x} = A\hat{y} + Bz    (9.2-12b)

A block diagram of this observer is illustrated in Fig. 9.2-1.

[Figure 9.2-1 Deterministic Observer Block Diagram]

It is important to note that for every initial state of the system, x(t₀), there exists an initial state ŷ(t₀) of the observer given by Eq. (9.2-12) such that x̂(t) = x(t) for any u(t), for all t ≥ t₀. Thus, if properly initialized, the observer will track the true system state exactly. In practice, however, the proper initial condition is not known, so it is appropriate to consider the propagation of the observer error. As mentioned previously, observers exhibit the property that the observer error, defined by

\tilde{y} \triangleq \hat{y} - y    (9.2-13)

decays exponentially to zero. This is easily demonstrated by subtracting Eq. (9.2-11) from Eq. (9.2-12a), to obtain the differential equation

\dot{\tilde{y}} = (TFA - T\dot{A})\tilde{y}    (9.2-14)

*Henceforth, the time arguments are dropped from variable quantities, except where necessary for clarification.


Note that if the observer is chosen to be asymptotically stable, then ỹ(t) will tend uniformly and asymptotically to zero for arbitrary ỹ(t₀). The stability of the observer and the behavior of ỹ are both determined by the properties of the matrix TFA − TȦ; the eigenvalues of this matrix can be chosen arbitrarily by appropriate specification of T, A and B, subject to the constraints of Eqs. (9.2-8) and (9.2-9).

Now consider the total state estimation error, x̃, defined by

\tilde{x} = \hat{x} - x    (9.2-15)

The differential equation for x̃ is derived in the following manner. From Eqs. (9.2-7) and (9.2-12b), it is seen that x̃ is related to ỹ by

\tilde{x} = A\tilde{y}    (9.2-16)

It follows from Eq. (9.2-16) that x̃ tends uniformly and asymptotically to zero, owing to the convergence properties of ỹ previously discussed. The corresponding relationship

\tilde{y} = T\tilde{x}    (9.2-17)

can be obtained by premultiplying Eq. (9.2-16) by T and invoking the constraint TA = I from Eq. (9.2-9). Using these relationships, the differential equation for x̃ can be derived as

\dot{\tilde{x}} = A\dot{\tilde{y}} + \dot{A}\tilde{y} = (\dot{A}T + ATF - AT\dot{A}T)\tilde{x}    (9.2-18)

Notice from Eq. (9.2-18) that the estimation error behavior depends on specifying the matrix products ȦT and AT. It would be more convenient, from a design standpoint, to specify the desired observer error characteristics in terms of fewer parameters. Fortunately, it is easy to demonstrate that Eq. (9.2-18) can be written in the equivalent form

\dot{\tilde{x}} = (F - BHF - \dot{B}H)\tilde{x}    (9.2-19)

The transformation from Eq. (9.2-18) to Eq. (9.2-19) is left as an exercise for the reader. (Hint: show that ATx̃ = x̃, and employ the constraints AT + BH = I and ḢA = −HȦ.)

From Eq. (9.2-19), it is apparent that for a given system described by F and H, the estimation error depends only on the choice of B. It can be shown that if the system is completely observable, the matrix B can be chosen to achieve any desired set of (n − m) eigenvalues for the error response. For the special case of a time-invariant system, the observer may be specified according to the following procedure (a code sketch follows below):

• Choose an arbitrary set of eigenvalues, Λ (complex conjugate pairs)
• Pick B such that F − BHF has Λ as its nonzero set of eigenvalues
• Choose A and T consistent with AT + BH = I.

The choice of A and T which satisfies Eq. (9.2-8) is not unique. To illustrate this, suppose that an allowable pair of matrices (A*, T*) is chosen (a method for constructing such an allowable pair is given in Ref. 52). Then the pair (A, T) given by

A = A^* M^{-1}, \qquad T = M T^*    (9.2-20)

also satisfies Eq. (9.2-8), where M is any nonsingular matrix. The set of all allowable pairs (A, T) defines an equivalent class of observers which exhibit the same error behavior.
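The three-step procedure lends itself to a few lines of code. The following Python sketch - a design aid under assumed numbers, not a procedure from the text - applies it to the second-order tracking model used in Example 9.2-1 below, and verifies the constraints of Eqs. (9.2-8) and (9.2-9).

import numpy as np

beta = 1.0
F = np.array([[0.0, 1.0], [0.0, -beta]])
H = np.array([[1.0, 0.0]])          # single noise-free measurement (m = 1)
lam = -5.0 * beta                   # step 1: desired observer eigenvalue

# Step 2: with B = [1, b2]^T (HB = I forces b1 = 1) and HF = [0, 1],
# F - BHF = [[0, 0], [0, -(beta + b2)]], so b2 = -lam - beta places the pole.
b2 = -lam - beta
B = np.array([[1.0], [b2]])

# Step 3: choose A, T consistent with AT + BH = I.
A = np.array([[0.0], [1.0]])
T = np.array([[-b2, 1.0]])
assert np.allclose(A @ T + B @ H, np.eye(2))    # Eq. (9.2-8)
assert np.allclose(T @ A, np.eye(1))            # part of Eq. (9.2-9)
print("observer pole:", (T @ F @ A).item())     # equals lam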

Example 9.2-1

Consider the second-order example illustrated in Fig. 9.2-2. This system model might be representative of a simplified one-dimensional tracking problem, where x₁ and x₂ are position and velocity of the tracked object, respectively.

[Figure 9.2-2 Second-Order Tracking System Example]

The system dynamics are

\dot{x} = \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & -\beta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 0 \\ \kappa \end{bmatrix} u

with measurements of position available according to

z = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}

It is desired to construct an observer for this system.


From Eq. (9.2-9), HB = I, so that B is constrained to be of the form

B = \begin{bmatrix} 1 \\ b_2 \end{bmatrix}

The error dynamics can be written as (Eq. (9.2-19))

\dot{\tilde{x}} = (F - BHF)\tilde{x}

where

F - BHF = \begin{bmatrix} 0 & 0 \\ 0 & -(\beta + b_2) \end{bmatrix}

Since n − m = 1 for this problem, the observer is of first order and is specified by the single eigenvalue, λ = −(β + b₂). It is desirable to choose the observer time constant to be significantly smaller than the system time constant. Arbitrarily choose λ = −5β, which implies b₂ = 4β. A possible choice of A and T, satisfying Eqs. (9.2-8) and (9.2-9), is

A = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad T = \begin{bmatrix} -4\beta & 1 \end{bmatrix}

This completes the specification of the first-order observer. The corresponding block diagram is illustrated in Fig. 9.2-3. Although the choice of A and T is not unique, it is easily demonstrated that any allowable choice of A and T leads to the equivalent observer configuration of Fig. 9.2-3. This is left as an exercise for the reader.

[Figure 9.2-3 Observer Configuration for Simplified Tracking Example]

The observer performance may be illustrated by a numerical example. Assume that the system parameters are

β = 1.0 sec⁻¹, κ = 1.0, u(t) = −1.0 ft/sec²
x₁(0) = 1.0 ft, x₂(0) = 1.0 ft/sec


The observer is initialized by choosing ŷ(0) so that x̂₂(0) = −4.0 ft/sec. The resulting system and observer outputs are plotted in Fig. 9.2-4. Note that x₁ is estimated without error since it is directly measured, while the error in the estimate of x₂ decays exponentially to zero with a time constant of 0.2 sec.

[Figure 9.2-4 Actual and Estimated States vs. Time for Simplified Tracking Example]
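A minimal simulation of this example (Python; assuming the parameter reading β = κ = 1.0 used above, and ŷ(0) = −8.0 so that x̂₂(0) = ŷ(0) + 4βz(0) = −4.0 ft/sec) shows the velocity estimate converging:

import numpy as np

beta, kappa, u = 1.0, 1.0, -1.0
F = np.array([[0.0, 1.0], [0.0, -beta]]); L = np.array([0.0, kappa])
H = np.array([1.0, 0.0])
A = np.array([0.0, 1.0]); B = np.array([1.0, 4 * beta]); T = np.array([-4 * beta, 1.0])

dt, tf = 0.001, 2.0
x = np.array([1.0, 1.0])            # x1(0) = 1.0 ft, x2(0) = 1.0 ft/sec
y = -8.0                            # observer state, giving x2hat(0) = -4 ft/sec
for _ in range(int(tf / dt)):
    z = H @ x
    ydot = (T @ F @ A) * y + (T @ F @ B) * z + (T @ L) * u
    x = x + (F @ x + L * u) * dt    # true system
    y = y + ydot * dt               # first-order observer, Eq. (9.2-12a)
xhat = A * y + B * (H @ x)          # reconstruction, Eq. (9.2-12b)
print("x =", np.round(x, 3), "  xhat =", np.round(xhat, 3))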

OBSERVERS FOR STOCHASTIC SYSTEMS

The formulation for deterministic observers can be extended to encompass stochastic systems. Stochastic observer theory forges the link between reduced-state deterministic observers and optimal Kalman filtering; in some applications, the stochastic observer offers a convenient approach for the design of reduced-state filtering algorithms. In the sequel, a heuristic approach is taken to the design of stochastic observers.

Consider the system described by

\dot{x} = Fx + Gw, \qquad w \sim N(0, Q)    (9.2-21)

For simplicity, deterministic inputs are not included in the system model; any deterministic inputs are assumed to be compensated, and can be removed from the formulation without loss of generality. The measurements are assumed to be partly deterministic and partly stochastic, viz:

z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} H_1 \\ H_2 \end{bmatrix} x + \begin{bmatrix} v_1 \\ 0 \end{bmatrix}    (9.2-22)

Of the m measurements, m₁ are noisy with measurement noise spectral density R₁, and m₂ are noise-free (m₂ = m − m₁). As before, we assume that H₂ is of full rank.

A configuration is sought for the stochastic observer that has the form of the (n − m)th-order deterministic observer when all the measurements are noise-free (m₁ = 0), and becomes the nth-order Kalman filter when all the measurements are noisy (m₂ = 0). For combined noisy and noise-free measurements, the conventional continuous Kalman filter cannot be implemented, due to singularity of the measurement noise covariance matrix. In Section 4.5, an approach is given for modifying the Kalman filter to circumvent the singularity of R for certain special cases. The concept of stochastic observers presented herein is applicable to the more general case and reduces exactly to the modified Kalman filter of Section 4.5, under equivalent conditions.

It is immediately apparent from the deterministic observer block diagram, Fig. 9.2-1, that the noisy measurements must be processed differently than the noise-free measurements to avoid the appearance of white noise on the output. It is necessary to provide additional filtering of the noisy measurements to avoid this situation.



Define the (n − m₂) × n transformation

y(t) = T(t) x(t)    (9.2-23)

such that

\begin{bmatrix} T \\ H_2 \end{bmatrix}    (9.2-24)

is a nonsingular matrix. It then follows that

x = A y + B_2 z_2    (9.2-25)

where

A T + B_2 H_2 = I    (9.2-26)

By analogy to the deterministic problem, y can be shown to satisfy the differential equation

\dot{y} = (TFA - T\dot{A}) y + (TFB_2 - T\dot{B}_2) z_2 + TGw    (9.2-27)

This is the same expression previously derived in Eq. (9.2-11), except that the deterministic input, Lu, has been replaced by the stochastic input, Gw.

It is reasonable to postulate that the stochastic observer should be designed to process the noise-free measurements, z₂, according to the deterministic observer formulation, while processing the noisy measurements, z₁, according to the stochastic Kalman filter formulation. Accordingly, the stochastic observer can be postulated to have the form

\dot{\hat{y}} = (TFA - T\dot{A}) \hat{y} + (TFB_2 - T\dot{B}_2) z_2 + TB_1(z_1 - H_1\hat{x})    (9.2-28a)

\hat{x} = A\hat{y} + B_2 z_2    (9.2-28b)

An additional n × m₁ free gain matrix, B₁, has been incorporated for processing the noisy measurements, z₁. Notice that the noisy measurements appear only as inputs to the observer dynamics, in a manner analogous to the Kalman filter formulation, and are not fed forward directly into the state estimates, as are the noise-free measurements. A block diagram for this stochastic observer configuration is illustrated in Fig. 9.2-5. The choice of the observer specified by Eqs. (9.2-28) has been constrained so that in the absence of the z₂ measurements, the observer structure will be identical to the nth-order Kalman filter. It is easily shown that, for m₂ = 0,


T = A = I, \qquad \dot{A} = 0    (9.2-29)

and Fig. 9.2-5 reduces to the continuous Kalman filter shown in Fig. 4.3-1. In this case, the optimal choice of the gain B₁ corresponds to the Kalman gain. For the other extreme, where all m measurements are noise-free (m₁ = 0), Fig. 9.2-5 immediately reduces to the deterministic (n − m)th-order observer depicted in Fig. 9.2-1.

[Figure 9.2-5 Stochastic Observer Block Diagram]

The estimation error dynamics for the stochastic observer may be determined in a manner analogous to that used to derive the deterministic observer dynamics. From Eqs. (9.2-27) and (9.2-28b), the expression for ỹ̇ is

\dot{\tilde{y}} = (TFA - T\dot{A})\tilde{y} + TB_1(z_1 - H_1\hat{x}) - TGw    (9.2-30)

Noting that

z_1 - H_1\hat{x} = v_1 - H_1\tilde{x}    (9.2-31a)

\tilde{x} = A\tilde{y}    (9.2-31b)

the differential equation for the estimation error, x̃, is then obtained as

\dot{\tilde{x}} = A\dot{\tilde{y}} + \dot{A}\tilde{y} = (\dot{A}T + ATF - AT\dot{A}T - ATB_1H_1)\tilde{x} + ATB_1 v_1 - ATGw    (9.2-32)

Following the deterministic observer analysis, an equivalent differential equation for x̃ can be expressed in terms of the gain B₂ as

\dot{\tilde{x}} = (F - B_2H_2F - \dot{B}_2H_2 - ATB_1H_1)\tilde{x} + ATB_1 v_1 - (I - B_2H_2)Gw    (9.2-33)

Consider replacing B₁ by ATB₁ in Eq. (9.2-33). Using the identity TA = I, it can be seen that the error dynamics are unaffected by the substitution. Hence, Eq. (9.2-33) may be simplified by replacing the term ATB₁ by B₁, yielding the following equivalent expression for the estimation error dynamics:

\dot{\tilde{x}} = (F - B_2H_2F - \dot{B}_2H_2 - B_1H_1)\tilde{x} + B_1 v_1 - (I - B_2H_2)Gw    (9.2-34)

Notice that, in the case of no measurement or process noise, Eq. (9.2-34) reduces to the deterministic observer error dynamics of Eq. (9.2-19). In the absence of noise-free measurements, B₂ and H₂ are zero and Eq. (9.2-34) reduces to the standard Kalman filter error dynamics of Eq. (4.3-13), where B₁ is identified as the Kalman gain matrix.

OPTIMAL CHOICE OF B₁ AND B₂

The stochastic observer design may now be optimized by choosing B₁ and B₂ to minimize the mean square estimation error. The error covariance equation, determined from Eq. (9.2-34), is

\dot{P} = (F - B_2H_2F - \dot{B}_2H_2 - B_1H_1)P + P(F - B_2H_2F - \dot{B}_2H_2 - B_1H_1)^T + B_1R_1B_1^T + (I - B_2H_2)GQG^T(I - B_2H_2)^T    (9.2-35)

Both B₁ and B₂ may now be chosen to minimize the trace of Ṗ (see Problem 4-8). Minimizing first with respect to B₁ yields the optimal gain

B_{1\,\mathrm{opt}} = P H_1^T R_1^{-1}    (9.2-36)

Substituting B₁ opt into the covariance equation gives

\dot{P} = (F - B_2H_2F - \dot{B}_2H_2)P + P(F - B_2H_2F - \dot{B}_2H_2)^T - PH_1^TR_1^{-1}H_1P + (I - B_2H_2)GQG^T(I - B_2H_2)^T    (9.2-37)

The optimum choice of B₂ can be determined by minimizing the trace of Ṗ in Eq. (9.2-37), with respect to B₂. This computation leads readily to

B_{2\,\mathrm{opt}} = [P(H_2F + \dot{H}_2)^T + GQG^TH_2^T][H_2GQG^TH_2^T]^{-1}    (9.2-38)

To complete the specification of the optimal stochastic observer, it is necessary to properly initialize the filter estimates and the covariance matrix. Due to the exact measurements, discontinuities occur at t = 0⁺.


Single-stage estimation theory can be used to determine the initial conditions x̂(0⁺) and P(0⁺) from z₂(0) and the prior estimates, x̂(0) and P(0):

\hat{x}(0^+) = \hat{x}(0) + P(0)H_2^T(0)[H_2(0)P(0)H_2^T(0)]^{-1}[z_2(0) - H_2(0)\hat{x}(0)]    (9.2-39)

P(0^+) = P(0) - P(0)H_2^T(0)[H_2(0)P(0)H_2^T(0)]^{-1}H_2(0)P(0)    (9.2-40)
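A small Python sketch of this single-stage initialization (hypothetical numbers; position measured exactly, velocity not):

import numpy as np

def initialize(xhat0, P0, H2, z2):
    # Eqs. (9.2-39) and (9.2-40): measurement update with a noise-free z2 = H2 x
    S = H2 @ P0 @ H2.T
    K = P0 @ H2.T @ np.linalg.inv(S)
    return xhat0 + K @ (z2 - H2 @ xhat0), P0 - K @ H2 @ P0

xhat0 = np.array([0.0, 0.0]); P0 = np.diag([4.0, 1.0])
H2 = np.array([[1.0, 0.0]]); z2 = np.array([1.0])
x_plus, P_plus = initialize(xhat0, P0, H2, z2)
print(x_plus, P_plus)   # the measured component jumps to z2; its variance to zero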


The initial estimate depends on the initial measurement, z₂(0), and cannot be determined a priori. The initial condition for the observer, ŷ(0⁺), is related to x̂(0⁺) by

\hat{y}(0^+) = T(0)\hat{x}(0^+)    (9.2-41)

Note that the optimal gain B₂ opt, specified according to Eq. (9.2-38), depends on the presence of process noise. In order to compute B₂ opt, it is necessary to invert the matrix H₂GQGᵀH₂ᵀ. This matrix will be nonsingular, yielding a unique solution for B₂ opt, if the first derivatives of the noise-free measurements contain white process noise. If some of the derivatives of the noise-free measurements are also free of white noise, then the choice of B₂ is not completely specified. Under these conditions, B₂ may be chosen to give desirable error convergence properties, as in the case of completely deterministic observers.

SPECIALIZATION TO CORRELATED MEASUREMENT ERRORS

In many filtering problems of practical interest, the measurements may be modeled as containing correlated measurement errors, where the measurement errors are described by first-order differential equations driven by white noise. Consider the nth-order system, with m measurements, described by

\dot{x} = Fx + Gw    (9.2-42)

z = Hx + v    (9.2-43)

where the measurement noise, v, satisfies the differential equation

\dot{v} = Ev + w_v    (9.2-44)

The (n + m)th-order augmented system, with x′ᵀ = [xᵀ vᵀ], is described by

\dot{x}' = F'x' + G'w'
z = H'x'

where

F' = \begin{bmatrix} F & 0 \\ 0 & E \end{bmatrix}, \quad G' = \begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix}, \quad H' = [H \;\vdots\; I], \quad w' = \begin{bmatrix} w \\ w_v \end{bmatrix}    (9.2-45)

In the augmented formulation, all m measurements are noise-free measurements of x′; the nth-order optimal stochastic observer is now derived for this problem.

A useful property of observers is that the estimation error is orthogonal to the noise-free measurements, so that

H'\tilde{x}' = 0    (9.2-46)

The proof of this result is left as an exercise for the reader. [Hint: Premultiply the relation x̃′ = Aỹ by H′, and use the constraint of Eq. (9.2-9).] Hence, the covariance matrix, P′ = E[x̃′x̃′ᵀ], can be expressed as

P' = \begin{bmatrix} P & -PH^T \\ -HP & HPH^T \end{bmatrix}    (9.2-47)

where P is the covariance matrix of the error x̃. The pertinent observer matrices may be partitioned according to

B_2 = \begin{bmatrix} B_{21} \\ B_{22} \end{bmatrix}, \quad T = [T_1 \;\vdots\; T_2], \quad A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}    (9.2-48)

where B₂₁ is n × m, T₁ is n × n and A₁ is n × n. Substituting Eqs. (9.2-45) and (9.2-47) into the optimal gain expression of Eq. (9.2-38) gives

B_{21\,\mathrm{opt}} = [P(\dot{H} + HF - EH)^T + GQG^TH^T][HGQG^TH^T + Q_v]^{-1}    (9.2-49)

B_{22\,\mathrm{opt}} = [-HP(\dot{H} + HF - EH)^T + Q_v][HGQG^TH^T + Q_v]^{-1} = I - HB_{21\,\mathrm{opt}}    (9.2-50)

where Q_v is the spectral density of w_v. A possible selection of A and T that satisfies the constraint AT + B₂H′ = I is given by

A = \begin{bmatrix} I \\ -H \end{bmatrix}, \qquad T = [I - B_{21}H \;\vdots\; -B_{21}]    (9.2-51)

The observer dynamics are described by


\hat{x}' = A\hat{y} + B_2 z

\dot{\hat{y}} = (TF'A - T\dot{A})\hat{y} + (TF'B_2 - T\dot{B}_2)z

Substituting for T, F′, A and B₂ leads to the result

TF'A - T\dot{A} = F - B_{21}H_1    (9.2-52)

TF'B_2 - T\dot{B}_2 = FB_{21} - B_{21}E - \dot{B}_{21} - B_{21}H_1B_{21}    (9.2-53)

where H₁ has been defined as

H_1 \triangleq \dot{H} + HF - EH    (9.2-54)

A block diagram of the optimal observer is illustrated in Fig. 9.2-6, for the observer defined by Eqs. (9.2-52) and (9.2-53). Since we are not interested in estimating the noise, v, only the estimates x̂ are shown in the figure.

[Figure 9.2-6 Correlated Measurement Error Optimal Observer]

Assume that the prior estimate of P′ is given by

P'(0) = \begin{bmatrix} P(0) & 0 \\ 0 & R(0) \end{bmatrix}    (9.2-55)

For an initial estimate x̂(0), the discontinuities in x̂(0⁺) and P(0⁺) are determined by appropriately partitioning Eqs. (9.2-39) and (9.2-40), resulting in

\hat{x}(0^+) = \hat{x}(0) + P(0)H^T(0)[H(0)P(0)H^T(0) + R(0)]^{-1}[z(0) - H(0)\hat{x}(0)]
P(0^+) = P(0) - P(0)H^T(0)[H(0)P(0)H^T(0) + R(0)]^{-1}H(0)P(0)    (9.2-56)

The optimal reduced-order observer for the special case of correlated measurement errors is identical to the modified Kalman filter derived in Section 4.5, using a completely different approach. The Kalman filter approach to the problem requires that special allowances be made to circumvent the singularity of the R matrix. Using the structure of stochastic observer theory, however, the solution to the correlated measurement error problem follows directly.

9.3 STOCHASTIC APPROXIMATION

Most of the material contained in this book is concerned with obtaining optimal estimates of a random vector x, or a vector random process x(t), from noise-corrupted measurement data. Recall that an optimal estimate is one which minimizes an appropriate functional of the estimation error; examples of such criteria - maximum likelihood, least squares, etc. - are discussed in the introduction to Chapter 4. This section considers a class of estimation techniques, called stochastic approximation methods, that are not necessarily optimal in any statistical sense, but which yield recursive estimates with certain well-defined convergence properties.

The motivation for stochastic approximation methods is that optimal estimation criteria often depend upon assumptions about the statistical characteristics of x(t), and its associated measurement data, which may not hold true in practice. For instance, the Kalman filter yields a minimum variance estimate of x(t), provided the latter satisfies a linear stochastic differential equation driven by gaussian noise and measurements are linearly related to x(t) with additive gaussian noise. If the dynamics of x(t) and its observations are dominated by nonlinear effects that cannot be accurately approximated by linearization, or if the noise processes are nongaussian, the corresponding optimal estimation algorithm is often too complex to mechanize. More generally, if the noise statistics are unknown or undefined, the optimal estimate may be indeterminate. In such circumstances it is sometimes possible, through use of stochastic approximation methods, to obtain a sequence of estimates for x(t) that either asymptotically approaches the true value (when x(t) is constant), or possesses a statistically bounded error (in the time-varying case). The mathematical assumptions needed to prove these convergence properties are generally much weaker than those required to determine optimal estimators. Furthermore, most stochastic approximation algorithms are recursive linear functions of the measurement data that can be readily mechanized in a computer. Consequently, they offer attractive alternatives to optimal estimation techniques in some applications.

THE SCALAR CONSTANT-PARAMETER CASE

Stochastic approximation methods were first developed as iterative procedures for determining a solution, x₀, to the scalar nonlinear algebraic equation

g(x) = 0    (9.3-1)


where g(x_k) cannot be evaluated exactly for trial values, or "estimates," x₁, ..., x_k, ..., of the argument, x. This is analogous to the classical deterministic problem of obtaining a solution to Eq. (9.3-1) for a known function g(x), which can be treated by any of several classical numerical techniques - Newton's method, successive approximations, etc. Most such numerical techniques have the common property that an approximate solution for x₀ is obtained by iteratively performing the calculation

x_{k+1} = x_k + k_k\, g(x_k)    (9.3-2)

where k_k, k = 1, 2, ..., is an appropriately chosen sequence of "gains" (denoted by {k_k}). In the method of successive approximations, k_k = −Sgn[g′(x_k)] for all values of k; in Newton's method,*

k_k = -\frac{1}{g'(x_k)}    (9.3-3)

*g′(x_k) ≜ dg(x)/dx evaluated at x = x_k; Sgn[α] ≜ 1 for α > 0, 0 for α = 0, and −1 for α < 0.

The objective at each stage is to apply a correction to the most recent estimate of x₀ which yields a better estimate. Conditions under which the sequence {x_k}, generated by Eq. (9.3-2), converges to a solution of Eq. (9.3-1) can be stated in terms of restrictions on the sequence of gains, on the function g(x) in the vicinity of the solution, and on the initial "guess," x₁ (see Ref. 15).

The stochastic case refers to situations where g(x_k) cannot be evaluated exactly; instead, for each trial value of x, a noise-corrupted observation

m_k = g(x_k) + v_k    (9.3-4)

is generated. In the sequel it is convenient to assume that g(x) is a monotonically increasing or decreasing function having a unique solution to Eq. (9.3-1), and {v_k} is a sequence of zero mean independent random variables having bounded variances. Furthermore, it is assumed that g(x) has finite, nonzero slope as illustrated in Fig. 9.3-1 - i.e.,

0 < a \leq |g'(x)| \leq b < \infty

Somewhat less restrictive conditions could be imposed; however, those given above suffice for many applications of interest.

[Figure 9.3-1 Graphical Illustration of the Class of Functions, g(x)]

The background of iterative methods available for determining the solution to Eq. (9.3-1) in cases where g(x) is a known function led investigators, beginning in the early 1950's, to inquire whether a recursion relation of the form

x_{k+1} = x_k + k_k m_k    (9.3-5)

can also generate a sequence that in some sense converges to x₀ when g(x_k) is observable only through Eq. (9.3-4). Equation (9.3-5) has the same form as its deterministic counterpart, Eq. (9.3-2), except that g(x_k) is replaced by the measurement m_k. Because m_k has a random component, Eq. (9.3-5) is called a stochastic approximation algorithm.

A practical example of the type of problem described above might be the nonlinear control system illustrated in Fig. 9.3-2, where x represents the value of a control gain and g(x) represents the steady-state error between the system output and a constant input, as an unknown function of x. If the objective is to determine the proper gain setting to achieve zero steady-state error, an experiment can be devised whereby successive gain settings are tried and the resulting steady-state error is measured with an error v_k.

[Figure 9.3-2 A Control System Application Where Stochastic Approximation Methods Can Be Applied to Determine the Proper DC Gain]

A common definition of convergence applied to the random sequence generated by Eq. (9.3-5) is mean square convergence, defined by

\lim_{k \to \infty} E[(x_0 - x_k)^2] = 0    (9.3-6)

The pioneering work of Robbins and Munroe (Ref. 16) demonstrates that the solution to Eq. (9.3-5) converges in mean square, if the gains k_k satisfy the conditions


Sgn(k_k) = -Sgn[g'(x_k)]    (9.3-7a)

\lim_{k \to \infty} k_k = 0    (9.3-7b)

\sum_{k=1}^{\infty} |k_k| = \infty    (9.3-7c)

\sum_{k=1}^{\infty} k_k^2 < \infty    (9.3-7d)

We shall not supply convergence proofs here; however, the reader should qualitatively understand why the conditions in Eq. (9.3-7) are generally necessary for convergence. For instance, Eq. (9.3-7a) insures that the correction to each estimate x_k is in the proper direction, by analogy with the classical techniques for determining zeros of known functions. Hence, although g(x) is unknown, the sign of its derivative must be known in order to choose a proper sequence of gains. The condition that the gain sequence approaches zero [Eq. (9.3-7b)] is needed to insure that

\lim_{k \to \infty} x_{k+1} = x_k    (9.3-8)

in Eq. (9.3-5); otherwise {x_k} cannot converge in a mean square sense to x₀. [Note that condition (9.3-7b) is implied by condition (9.3-7d).] The conditions in Eqs. (9.3-7c) and (9.3-7d) are needed to insure that {k_k} approaches zero at the proper rate - not too fast, Eq. (9.3-7c), but not too slowly either, Eq. (9.3-7d); an example that specifically demonstrates the necessity of these conditions is given in the problem section.
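The recursion is only a few lines of code. The Python sketch below (illustrative; the function g and the noise level are made up) uses gains k_k = −1/k, which satisfy Eqs. (9.3-7a) through (9.3-7d) for a g(x) with positive slope:

import numpy as np

rng = np.random.default_rng(2)
g = lambda x: 2.0 * x - 4.0          # unknown to the algorithm; root x0 = 2
x = 10.0                             # initial guess x_1
for k in range(1, 5001):
    m = g(x) + rng.normal()          # noise-corrupted observation, Eq. (9.3-4)
    x = x + (-1.0 / k) * m           # Eq. (9.3-5) with k_k = -1/k
print("estimate:", round(x, 3))      # converges toward 2 in mean square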

Following Robbins and Munroe, others derived alternative conditions for convergence of stochastic approximation algorithms and investigated ways of choosing the gain sequence {k_k} in Eq. (9.3-5) to achieve a high rate of convergence (Refs. 17, 18 and 19). Algorithms have been obtained for problems in estimation (Refs. 20 through 25), optimization (Refs. 26 and 27), and pattern classification (Refs. 28, 29 and 30). The subsequent discussion in this section pursues the viewpoint of our main interest, viz., estimation theory.

Suppose that an unknown constant x₀ is to be estimated from observations of the form

z_k = h(x_0) + v_k    (9.3-9)

where h(x) is a known function and {v_k} is a sequence of independent zero mean random variables, having otherwise unknown statistical properties. If a new function g(x) is defined by

g(x) \triangleq h(x_0) - h(x)    (9.3-10)

then the problem of determining x₀ is equivalent to finding the solution to

g(x) = 0    (9.3-11)

Subtracting h(x_k) from both sides of Eq. (9.3-9), using Eq. (9.3-10), and defining

m_k \triangleq z_k - h(x_k)    (9.3-12)

we obtain equivalent measurements

m_k = g(x_k) + v_k    (9.3-13)

Thus, estimation of x₀ is recast as the problem of approximating the solution to Eq. (9.3-11) from the measurement data {m_k} in Eq. (9.3-13). Applying the algorithm in Eq. (9.3-5), we can estimate x₀ recursively, from the relation

x_{k+1} = x_k + k_k[z_k - h(x_k)]    (9.3-14)

where the gain sequence {k_k} is chosen to satisfy the conditions in Eq. (9.3-7). Equation (9.3-14) is similar in form to the discrete Kalman filtering algorithm described in Section 4.2. In particular, the new estimate at each stage is a linear function of the difference between the new measurement and the most recent estimate of h(x₀). However, notable differences are that h(x) can be a nonlinear


function and the gains {k_k} are determined without making any assumptions about the statistical properties of x₀ and {v_k}.

As an example, suppose x₀ is observed through linear measurements of the form

z_k = x_0 + v_k    (9.3-15)

Then Eq. (9.3-14) becomes

x_{k+1} = x_k + k_k(z_k - x_k)    (9.3-16)

Using the fact that g(x) = x₀ − x, the condition in Eq. (9.3-7a) reduces to k_k > 0; hence Eq. (9.3-14) becomes a linear filter with positive gains. Some gain sequences which satisfy the other convergence conditions in Eq. (9.3-7) are

k_k = \frac{1}{k}, \qquad k_k = \frac{\beta}{\alpha + k}; \qquad \alpha, \beta > 0; \; k = 1, 2, \ldots    (9.3-17)

The class of estimation algorithms discussed above can be extended to situations where the observation function, h(x₀), varies with time - i.e.,

z_k = h_k(x_0) + v_k    (9.3-18)

Defining

g_k(x) \triangleq h_k(x_0) - h_k(x)    (9.3-19)

and corresponding observations,

m_k \triangleq z_k - h_k(x_k)    (9.3-20)

m_k = g_k(x_k) + v_k    (9.3-21)

we seek the value of x that satisfies

g_k(x) = 0    (9.3-22)

for all values of k. To this end, an algorithm having the same form as Eq. (9.3-14) is employed, viz.:

x_{k+1} = x_k + k_k[z_k - h_k(x_k)]    (9.3-23)

In order for the sequence {x_k} to converge to x₀ in the case of a time-varying observation function, conditions must be imposed which take into account the variations in g_k(x) with the index k, as well as the properties of the gain sequence {k_k}. Henceforth, attention will be restricted to linear problems - i.e.,

z_k = h_k x_0 + v_k    (9.3-24)

where h_k may be a function of k. This category includes the important problem of estimating the coefficient of a known time-varying function - e.g., z_k = x₀ sin(ωt_k) + v_k. For this case, the following conditions, in addition to those in Eq. (9.3-7), are imposed (Ref. 24):

|h_k| \leq b < \infty \quad \text{for all } k    (9.3-25a)

\sum_{k=1}^{\infty} |k_k|\,|h_k| = \infty    (9.3-25b)

The condition in Eq. (9.3-25a) is analogous to that in Fig. 9.3-1, which bounds g(x) by a linear function with a finite slope. The condition in Eq. (9.3-25b) is a generalization of Eq. (9.3-7c) (the latter is implied by the former) needed to insure that the measurements contain sufficient information about x₀. For example, if the measurements z_k = x₀ sin(ωt_k) + v_k are taken at uniform intervals of length 2π/ω sec, beginning at t = 0, then h_k = sin(2πk) is zero for all values of k, causing Eq. (9.3-25b) to be violated - i.e., Σ|k_k||h_k| = 0 ≠ ∞. In this case, the data would contain no information about x₀.

The condition in Eq. (9.3-25b) is physically similar to the concept of stochastic observability (see Section 4.4), which describes the information content of measurement data associated with linear dynamical systems. To demonstrate this, we begin by recalling that a discrete-time dynamic system is uniformly stochastically observable if

c_1 I \leq \sum_{i=k-N}^{k} \Phi(i,k)^T H_i^T R_i^{-1} H_i \Phi(i,k) \leq c_2 I    (9.3-26)


for some value of N, with c₁ > 0 and c₂ > 0, where Φ(i,k) is the transition matrix from stage k to stage i, H_i is the measurement matrix and R_i is the measurement noise covariance matrix. It follows from Eq. (9.3-26) that

\sum_{i=1}^{\infty} \Phi(i,k)^T H_i^T R_i^{-1} H_i \Phi(i,k) = \infty    (9.3-27)

Applied to the problem of estimating x₀ in Eq. (9.3-24), for which

\Phi(i,k) = 1 \;\text{ for all } i, k; \qquad H_i = h_i

with the assumption that the additive measurement noise has constant variance,* σ², Eq. (9.3-27) reduces to

\sum_{i=1}^{\infty} h_i^2 = \infty    (9.3-28)

Now, the optimal least-squares estimate for x₀ in Eq. (9.3-24), obtained from a set of k − 1 measurements of the form

z_j = h_j x_0 + v_j, \qquad j = 1, \ldots, k-1

is given by (see the discussions of least-squares estimation in the introduction to Chapter 4)

x_k = \frac{\sum_{j=1}^{k-1} h_j z_j}{\sum_{j=1}^{k-1} h_j^2}    (9.3-29)

That is, x_k in Eq. (9.3-29) is the value which minimizes the function

\sum_{j=1}^{k-1} (z_j - h_j x_k)^2

If the additive measurement errors have bounded variance, it is easy to prove that x_k will converge to x₀ - i.e.,

\lim_{k \to \infty} x_k = x_0

provided the denominator in Eq. (9.3-29) grows without bound as k approaches infinity - i.e., provided x₀ is observable according to Eq. (9.3-28). Thus, observability implies convergence of the least-squares estimate.

On the other hand, Eq. (9.3-29) can be written in the recursive form

x_{k+1} = x_k + \frac{h_k}{\sum_{j=1}^{k} h_j^2}\,(z_k - h_k x_k)    (9.3-30)

Comparing Eqs. (9.3-23) and (9.3-30), we can regard the latter as a specific stochastic approximation algorithm having a gain given by

k_k = \frac{h_k}{\sum_{j=1}^{k} h_j^2}    (9.3-31)

In order that the condition in Eq. (9.3-25b) be satisfied, it must be true that

\sum_{k=1}^{\infty} \frac{h_k^2}{\sum_{j=1}^{k} h_j^2} = \infty    (9.3-32)

It can be shown that Eq. (9.3-32) will hold provided (see Ref. 31 on the Abel-Dini Theorem)

\sum_{k=1}^{\infty} h_k^2 = \infty    (9.3-33)

which is equivalent to Eq. (9.3-28). Consequently, for least-squares estimation, stochastic observability and Eq. (9.3-25b) imply the same conditions on h_k.

*This assumption is for convenience of exposition only; the analogy we are making here will hold under much weaker conditions.


However, Eq. (9.3-25b) has more general applicability, in the sense that it provides convergence conditions for gain sequences that are not derived from a particular optimization criterion.
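As a concrete sketch of Eqs. (9.3-30) and (9.3-31) (Python; the particular h_k and the noise are made up), note that the gain is formed from a simple running sum of h_j², so no noise statistics are needed:

import numpy as np

rng = np.random.default_rng(3)
x0_true = 3.0
x, S = 0.0, 0.0                       # estimate and running sum of h_j^2
for k in range(1, 2001):
    h = 1.0 + 0.5 * np.sin(0.1 * k)   # bounded h_k with unbounded sum of h_k^2
    z = h * x0_true + rng.normal()
    S += h * h
    x += (h / S) * (z - h * x)        # gain k_k of Eq. (9.3-31)
print("estimate:", round(x, 3))       # approaches 3.0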


Example 9.3-1

To illustrate the performance of stochastic approximation methods, consider the estimation of a gaussian random variable x₀ with mean μ₀ and variance σ₀². Suppose that x₀ is observed from linear measurements defined by

z_k = x_0 + v_k    (9.3-34)

where {v_k} is a sequence of zero mean, independent gaussian random variables having variance σ². A commonly used stochastic approximation algorithm is one having gains of the form

k_k = \frac{1}{k}    (9.3-35)

This choice is motivated by least-squares estimators of the type given in Eq. (9.3-30), whose gains are the same as in Eq. (9.3-35) in the special case, h_k = 1. Applying this sequence of gains, the algorithm for estimating x₀ becomes

x_{k+1} = x_k + \frac{1}{k}(z_k - x_k), \qquad k = 1, 2, \ldots    (9.3-36)

If the variances of x₀ and {v_k} are known, the mean square estimation error can be computed. In particular, for this example

E[(x_{k+1} - x_0)^2] = \frac{\sigma^2}{k}    (9.3-37)

It is instructive to compare the result in Eq. (9.3-37) with the error that would be achieved if an optimal Kalman filter, based upon knowledge of μ₀, σ₀², and σ², had been used. The filter equation has the form

\hat{x}_{k+1} = \hat{x}_k + \frac{1}{k + \sigma^2/\sigma_0^2}(z_k - \hat{x}_k)    (9.3-38)

and the corresponding mean square estimation error is given by

E[(\hat{x}_{k+1} - x_0)^2] = \frac{\sigma^2}{k + \sigma^2/\sigma_0^2}    (9.3-39)

The mean square estimation errors, computed from Eqs. (9.3-37) and (9.3-39), are shown in Fig. 9.3-3 for the case σ₀² = 0.5σ² - i.e., the error variance in the a priori estimate of x₀ is smaller than the measurement error. The Kalman filter takes advantage of this information by using gains that give less weight to the first few measurements than does the algorithm in Eq. (9.3-36), thereby providing a lower estimation error. This comparison simply emphasizes the fact that it is wise to take advantage of any available knowledge about the statistics of x₀ and {v_k}.

[Figure 9.3-3 Comparison of rms Estimation Error for Various Filtering Algorithms (rms error vs. total number of measurements)]

On the other hand, if a Kalman filter design is based upon assumed values for the statistical parameters which are incorrect, the algorithm in Eq. (9.3-36) may yield a better estimate. To illustrate, suppose x₀ has zero mean and variance

E[x_0^2] = \sigma_0^2 + \Delta\sigma_0^2

where σ₀² is the assumed value and Δσ₀² represents an error. Then the Kalman filter in Eq. (9.3-38) produces an actual mean square estimation error given by

E[(\hat{x}_{k+1} - x_0)^2] = \frac{\sigma^2}{k + \sigma^2/\sigma_0^2} + \frac{\Delta\sigma_0^2\,(\sigma^2/\sigma_0^2)^2}{(k + \sigma^2/\sigma_0^2)^2}    (9.3-40)

By contrast, the mean square estimation error associated with Eq. (9.3-36) remains the same as in Eq. (9.3-37), which is independent of σ₀². The mean square error computed from Eq. (9.3-40) is also shown in Fig. 9.3-3, for the case Δσ₀² = 2σ². Evidently, the improperly designed Kalman filter now performs consistently worse than the stochastic approximation algorithm, reflecting the fact that the latter generally tends to be less sensitive to changes in a priori statistics.
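The three error formulas are easily tabulated. The Python sketch below uses the parameter choices quoted above (σ² = 1, σ₀² = 0.5σ², Δσ₀² = 2σ²) and reproduces the qualitative behavior of Fig. 9.3-3:

sigma2 = 1.0
r = 2.0                    # sigma^2 / sigma0^2, for sigma0^2 = 0.5 sigma^2
dsigma02 = 2.0 * sigma2    # prior-variance error of Eq. (9.3-40)
for k in (1, 2, 5, 10):
    mse_sa = sigma2 / k                                            # Eq. (9.3-37)
    mse_kf = sigma2 / (k + r)                                      # Eq. (9.3-39)
    mse_kf_bad = sigma2 / (k + r) + dsigma02 * r**2 / (k + r)**2   # Eq. (9.3-40)
    print(f"k={k:2d}  SA={mse_sa:.3f}  KF={mse_kf:.3f}  mismatched KF={mse_kf_bad:.3f}")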

We conclude, from the above discussion and example, that a stochastic approximation algorithm is a reasonable alternative to an optimal estimation


technique when the statistical parameters on which the latter is based are not well known. In addition, in many nonlinear estimation problems the linear stochastic approximation algorithm is generally more easily mechanized. Finally, in real-time filtering applications where measurements must be processed rapidly due to high data rates, easily computed gain sequences such as that in Eq. (9.3-35) may be preferable to the generally more complicated optimal gains. This advantage is particularly applicable to the vector case described in the next section. Thus this class of estimation techniques should be included within the data processing repertoire of any engineer concerned with filtering and estimation problems. For additional details the reader is referred to the cited references, particularly Ref. 24.

GENERALIZATIONS

The Vector Constant-Parameter Case - To extend the preceding discussion to the problem of estimating a set of n constant parameters arranged in a vector x₀, we again consider a scalar linear measurement of the form

z_k = h_k^T x_0 + v_k    (9.3-41)

where h_k is a time-varying vector and {v_k} is a sequence of zero mean independent random variables having bounded variance. The case of vector measurements can be included within this category by considering individual measurements one at a time. By analogy with Eq. (9.3-23), the corresponding vector stochastic approximation algorithm takes the form

x_{k+1} = x_k + k_k(z_k - h_k^T x_k)    (9.3-42)

Conditions for convergence of the sequence of vectors {x_k} to x₀ can be found in Ref. 24. They represent a nontrivial multidimensional generalization of the conditions required in the scalar parameter case; we do not include them here, in keeping with our purpose of presenting a principally qualitative description of stochastic approximation methods. However, it is useful to mention a particular gain sequence that often yields convergent estimates. In particular, if

k_k = \frac{h_k}{\sum_{j=1}^{k} h_j^T h_j}    (9.3-43)

subject to the conditions

\sum_{j=k+1}^{k+N} h_j h_j^T \geq a I \quad \text{for some } a, N > 0    (9.3-44)

then {x_k} converges to x₀. The form of Eq. (9.3-43) is suggested by the least-squares gain sequence in Eq. (9.3-31) for the scalar parameter case. It has the advantage of being easily computed, and the conditions for convergence can be readily checked.
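A Python sketch of the vector algorithm with this gain choice (a made-up two-parameter problem; h_k chosen so the sums appearing in Eqs. (9.3-43) and (9.3-44) are well behaved):

import numpy as np

rng = np.random.default_rng(4)
x0_true = np.array([1.0, -2.0])
x = np.zeros(2)
S = 0.0                                      # running sum of h_j' h_j
for k in range(1, 5001):
    h = np.array([1.0, np.sin(0.05 * k)])    # persistently exciting h_k
    z = h @ x0_true + rng.normal()
    S += h @ h
    x = x + (h / S) * (z - h @ x)            # Eqs. (9.3-42) and (9.3-43)
print("estimate:", np.round(x, 2))           # approaches [1.0, -2.0]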

Time-Varying Parameters - Stochastic approximation methods were first developed for estimating constant parameters. However, analogous methods can be applied for estimating variables that are time-varying. To illustrate, suppose it is desired to estimate x_k from the measurements

z_k = h_k^T x_k + v_k    (9.3-45)

where x_k varies from sample to sample. Specifically, the variation in x_k is assumed to be governed by the stochastic difference equation

x_{k+1} = \varphi_k(x_k) + w_k    (9.3-46)

where φ_k(x) is a sequence of known functions, and {w_k} is a sequence of zero mean random variables.

By analogy with the constant parameter case, an algorithm of the form

\hat{x}_{k+1} = \varphi_k(\hat{x}_k) + k_k[z_k - h_k^T \hat{x}_k]    (9.3-47)

is suggested for estimating x_k. Equation (9.3-47) differs from Eq. (9.3-23) in that φ_k(x̂_k) replaces x̂_k, to account for the predicted change in x_k at each stage. In general, we cannot expect the sequence {x̂_k} to converge to x_k; however, it is often possible to achieve an estimation error that is statistically bounded, according to the condition

\lim_{k \to \infty} E[\|\hat{x}_k - x_k\|^2] \leq c < \infty    (9.3-48)

Criteria for achieving statistically bounded estimates, for a scalar nonlinear estimation problem, are provided in Ref. 24. In the case of linear dynamic systems, any Kalman filter whose gains may be incorrect due to errors in the


assumed noise covariance matrices, or because of imperfect knowledge of the system dynamics, can be viewed as a time-varying stochastic approximation algorithm. Conditions for the boundedness of the resulting estimation error are given in Refs. 32 and 33. Aside from these special cases, development of simple algorithms that yield bounded estimation errors for time-varying parameters - particularly when the dynamics and/or the measurements are nonlinear - is, in many respects, still an open problem requiring further research.

9.4 REAL-TIME PARAMETER IDENTIFICATION

The problem of identifying constant parameters in a system can be regarded as a special case of the general state estimation problem discussed throughout this book, where the parameters are a set of random variables, a, satisfying the differential equation

\dot{a}(t) = 0    (9.4-1)

The need to estimate parameters can arise in a number of ways. For example, in linear filtering theory it is assumed that the elements of the matrices F, H, ..., etc., that describe the dynamics and measurement data for the linear system, are known. In practice, this condition is not always satisfied, and the task of estimating certain unknown parameters may be included as part of the overall state estimation problem. Similarly, in control system design, it is necessary that we know the dynamics of the plant to be controlled, so that an appropriate control law can be derived. If some plant parameters are unknown, a parameter estimation algorithm may be a necessary part of the system control loop.

In this section, we present an example of real-time parameter identification for a specific control system - viz., an airframe autopilot - where several dynamic quantities are unknown. The equations of motion are nominally linear, in the form

\dot{x}_s(t) = F_s(a, t)\,x_s(t) + g_s(a, t)\,u(t)    (9.4-2)

where u(t) is a known control input. In the more general case, process noise can be included in Eq. (9.4-2); however, it is omitted from this particular illustration. For this application, real-time estimates of x_s(t) and a are needed so that satisfactory control of the system can be maintained. That is, u(t) is a feedback control depending upon the current best estimate of x(t); the nature of this dependence is discussed in the sequel. Since u(t) is assumed to be known, it is not necessary to know how it is generated in order to design a filter for x(t); it can simply be regarded as a time-varying known input analogous to u(t) in Table 4.4-1.

Both a and x_s(t) are to be estimated in real time from the linear noisy measurement data

z_k = H_s(t_k)\,x_s(t_k) + v_k    (9.4-3)

If a and x_s(t) are combined into a composite state vector x(t), the combined equations of motion for the system become

\dot{x}(t) = \begin{bmatrix} \dot{a}(t) \\ \dot{x}_s(t) \end{bmatrix} = \begin{bmatrix} 0 \\ F_s(a, t)\,x_s(t) + g_s(a, t)\,u(t) \end{bmatrix}    (9.4-4)

In this form it is evident that the composite estimation problem is nonlinear, because the product F_s(a)x_s(t) is a nonlinear function of a and x_s.

MODELING UNKNOWN PARAMETERS

Applications where the need for parameter identification often arises are in systems where the dynamics are nominally time-invariant, but where it is expected that certain parameters will change with time while the system is operating. The exact nature of this time variation is frequently unknown; however, the range of variation may be sufficiently large to require that it be included in modeling the equations of motion. To illustrate, suppose that the nominal equations of motion are

\dot{x}_s(t) = F_s(a_0, t)\,x_s(t)    (9.4-5)

with parameters, a₀. One approach to allowing for time variations in the parameters is to model a₀ as a truncated power series,

a_0 = a_1 + a_2 t + \cdots + a_{n+1} t^n    (9.4-6)

If Eq. (9.4-6) is substituted into Eq. (9.4-5), the equations of motion take the form of Eq. (9.4-4) with the vector a being composed of the constant sets of coefficients a₁, ..., a_{n+1}.

One apparent difficulty with the above modeling technique is that the dimension of a - and hence, also, the dimension of the composite state vector in Eq. (9.4-4) - increases with every term added in Eq. (9.4-6). Because the complexity of the filter algorithm required to estimate a increases with the number of parameters, it is desirable that Eq. (9.4-6) have as few terms as possible. However, a practical, finite number of terms may not adequately describe the change in a₀ over the time interval of interest.

An alternative method useful in accounting for time-varying parameters, that avoids introducing a large number of state variables, is to assume that the time-variation in a₀ is partially random in nature. In particular, we replace the constant parameter model (ȧ = 0) in Eq. (9.4-1) with the expression

\dot{a}(t) = w_a(t)    (9.4-7)

where w_a(t) ~ N(0, Q_a). The strength of the noise should correspond roughly to the possible range of parameter variation. For example, if it is known that the


ith element of a is likely to change by an amount Δa_i over the interval of interest, Δt, then require

i\text{th diagonal element of } Q_a = \frac{\Delta a_i^2}{\Delta t}    (9.4-8)

In practice, it is frequently observed that Eq. (9.4-7) is a good model for the purpose of filter design, even though the parameter changes may actually be deterministic in nature. In fact, the random model often permits dropping all but the first term in Eq. (9.4-6), thus keeping the number of state variables at a minimum. Combining Eq. (9.4-7) with Eqs. (9.4-2) and (9.4-3), we obtain the model

\dot{x}(t) = \begin{bmatrix} \dot{a}(t) \\ \dot{x}_s(t) \end{bmatrix} = \begin{bmatrix} w_a(t) \\ F_s(a, t)\,x_s(t) + g_s(a, t)\,u(t) \end{bmatrix}    (9.4-9)

z_k = H_k\,x(t_k) + v_k    (9.4-10)

The equations of motion and observation given in Eq. (9.4-9) have the same form as the nonlinear system dynamics in Eq. (6.0-1) with linear discrete measurements, if we make the identifications

f(x(t), t) \triangleq \begin{bmatrix} 0 \\ F_s(a(t), t)\,x_s(t) + g_s(a(t), t)\,u(t) \end{bmatrix}, \qquad w(t) \triangleq \begin{bmatrix} w_a(t) \\ 0 \end{bmatrix}, \qquad H_k \triangleq [0 \;\vdots\; H_s(t_k)]    (9.4-11)

In addition, if statistical models are assumed for a(t₀), x_s(t₀) and v_k, the Bayesian nonlinear filtering methods discussed in Chapter 6 can be applied. The extended Kalman filter, described in Table 6.1-1, is used for the application discussed in this section. However, the reader should be aware that there are many different methods that can be used for parameter identification - e.g., maximum likelihood (Refs. 34 and 35), least-squares (Ref. 36), equation error (Refs. 37 and 38), stochastic approximation (see Section 9.3 and Ref. 24), and correlation (Ref. 39). Some discussion of the relative applicability of these techniques is in order.

Some of the alternative identification methods mentioned above are advocated for situations where the unknown parameters are assumed to be constant with unknown statistics, and where an algorithm is desired that yields perfect (unbiased, consistent) estimates in the limit, as an infinite number of measurements is taken. The equation error, stochastic approximation, and correlation methods are all in this category. However, when the parameters are time-varying, as in Eq. (9.4-7), the convergence criteria do not apply. Whether or not these methods will operate satisfactorily in the time-varying case can only be ascertained by simulation.

Other methods are based upon various optimization criteria - e.g., maximum likelihood and least-squares. The maximum likelihood estimate is one that maximizes the joint probability density function for the set of unknown parameters; however, it is typically calculated by a non-real-time algorithm that is not well suited for control system applications. In addition, the probability density function can have several peaks, in which case the optimal estimate may require extensive searching. The least-squares criterion seeks an estimate that "best" fits the observed measurement data, in the sense that it minimizes a quadratic penalty function; in general, the estimate is different for each quadratic penalty function. In contrast with the minimum variance estimate, the least-squares estimate does not require knowledge of noise statistics.

The autopilot design application treated here demonstrates the parameter identification capability of the minimum variance estimation algorithms discussed in Chapter 6. The latter provide a logical choice of identification techniques when the random process statistics are known and the requirement exists for real-time filtered estimates; both of these conditions exist in this example.
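Before turning to the autopilot example, the mechanics of Eqs. (9.4-9) through (9.4-11) can be seen on a deliberately simple scalar plant. The Python sketch below (all constants hypothetical) runs an extended Kalman filter on the augmented state x = [a, x_s] for ẋ_s = a x_s + u with an unknown, nominally constant parameter a:

import numpy as np

rng = np.random.default_rng(5)
dt, N = 0.01, 3000
qa = 0.01                    # spectral density of w_a, chosen per Eq. (9.4-8)
r = 0.04                     # measurement noise variance
a_true, xs = -2.0, 1.0

xhat = np.array([-0.5, 0.5])         # initial guesses for [a, xs]
P = np.diag([4.0, 1.0])
Q = np.diag([qa * dt, 0.0])
H = np.array([[0.0, 1.0]])           # H_k = [0 : Hs]; only xs is measured

for k in range(N):
    u = np.sin(0.5 * k * dt)                       # known control input
    xs += (a_true * xs + u) * dt                   # truth propagation
    z = xs + rng.normal(scale=np.sqrt(r))
    a_h, xs_h = xhat                               # EKF propagation of f(x) = [0, a*xs + u]
    xhat = xhat + np.array([0.0, a_h * xs_h + u]) * dt
    Phi = np.array([[1.0, 0.0], [xs_h * dt, 1.0 + a_h * dt]])   # linearized transition
    P = Phi @ P @ Phi.T + Q
    S = (H @ P @ H.T + r).item()                   # EKF update, linear z = H x + v
    K = (P @ H.T / S).ravel()
    xhat = xhat + K * (z - xhat[1])
    P = P - np.outer(K, (H @ P).ravel())
print("a estimate:", round(xhat[0], 2), "(truth", a_true, ")")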

Example 9.4-1

The framework of the parameter identification problem described here is the design ofan adaptive pitch-plane autopilot for an aerodynamic vehicle having the functional blockdiagram shown in Fig. 9.4-1. This example treats motion in a single planeonly - referredtoas the pitch-plane. The airframe dynamics are assumed linear. with a numberof unknowntime-varying parameters. The objective is to obtain a feedback control signal u(t), thatcauses the airframe lateral acceleration to closely follow the input steering command c(t).

If the airframe dynamics were completely known, feedback compensation could be selected to obtain the desired autopilot response characteristics using any appropriate system design technique. Because certain parameters are unknown, an autopilot design is sought that identifies (estimates) the parameters in real time and adjusts the feedback compensation accordingly to maintain the desired response characteristics. In this application, the method used to determine the control input is the so-called pole assignment technique, whereby u(t) is specified as a function of x̂_s, â, and c(t) from the relation

    u(t) = h0(â) c(t) - hᵀ(â) x̂_s(t)    (9.4-12)

Assuming the parameter estimates are perfect, the feedback gains, h, are chosen such that the autopilot "instantaneous closed-loop poles" have desired values; h0 is an input gain chosen to provide a specified level of input-output dc gain. If the airframe parameters are known and constant, h0 and h will be constant; the mathematical details for deriving the functional expressions for the gains can be found in Ref. 40. However, because a is initially unknown and time-varying, the estimates â will vary and the control gains must be periodically recalculated; this is the adaptive control feature of the autopilot. For the purpose of developing the parameter identification algorithm, we need say no more about u(t); the fact that it is known to the designer as a function of the real-time estimates of x_s and a is sufficient.

Figure 9.4-1 Adaptive Autopilot Including Parameter Identification
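A sketch of the gain-update step is given below, assuming SciPy's place_poles routine for the pole assignment; the matrices, pole locations, and the choice of lateral acceleration as the first state are hypothetical (the book's closed-form gain expressions are in Ref. 40):

    import numpy as np
    from scipy.signal import place_poles

    def autopilot_gains(F_hat, g_hat, desired_poles):
        """Recompute (h, h0) from the current parameter estimates so that
        eig(F_hat - g_hat h^T) sits at desired_poles, with h0 scaled for
        unity input-output dc gain."""
        h = place_poles(F_hat, g_hat.reshape(-1, 1), desired_poles).gain_matrix.ravel()
        F_cl = F_hat - np.outer(g_hat, h)
        # dc gain from c to the first state (assumed: lateral acceleration)
        dc = -np.array([1.0, 0.0, 0.0]) @ np.linalg.solve(F_cl, g_hat)
        return h, 1.0 / dc

    # illustrative estimates F_s(a_hat), g_s(a_hat) at one instant
    F_hat = np.array([[-1.2,  1.0,  -2.0],
                      [40.0, -1.5,  15.0],
                      [ 0.0,  0.0, -20.0]])
    g_hat = np.array([0.0, 0.0, 20.0])
    h, h0 = autopilot_gains(F_hat, g_hat, [-4 + 4j, -4 - 4j, -20.0])
    # control law: u(t) = h0*c(t) - h . x_s_hat(t), recomputed as a_hat evolves

Each time the filter delivers a new â, this function would be re-invoked so the closed-loop poles remain at their designer-specified locations.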

The airframe configuration in this application is for the tail-controlled vehicle illustrated in Fig. 9.4-2.

Figure 9.4-2 Airframe Configuration

The equations of motion for the airframe can be expressed in the vector-matrix form

    [q̇(t)]   [ Mq    Mα    Mδ - MαLδ ] [q(t)]   [ 0  ]
    [ṅ(t)] = [ VLα   -Lα   -VLδ      ] [n(t)] + [VLδ] u(t)    (9.4-13)
    [δ̇(t)]   [ 0     0     -λ        ] [δ(t)]   [ λ  ]

or, in abbreviated notation,

    ẋ_s(t) = F_s x_s(t) + g_s u(t)    (9.4-14)

The quantity λ represents known actuator dynamics, V is unknown airspeed, and Mq, Mδ, etc., are unknown stability derivatives. In the context of Eq. (9.4-9), the set of unknown parameters is defined to be

    a = [f11  f12  f21  f22  f33]ᵀ    (9.4-15)

where the fij denote the unknown elements of F_s. For this demonstration, the vehicle is assumed to be thrusting longitudinally at a level of 25 g's, causing rapid variations in the parameters in Eq. (9.4-14) through their dependence upon airspeed. The trajectory duration is three seconds. The simulation truth model uses piecewise-linear time functions to represent the parameters; the filter model assumes that the parameters vary randomly according to Eq. (9.4-7). Measurements of the three airframe state variables are assumed to be available in the form

    z_k = x_s(t_k) + v_k    (9.4-16)

where {v_k} is a gaussian white sequence. With these assumptions, Eqs. (9.4-14) and (9.4-16) fit the format of Eq. (9.4-9) and the extended Kalman filter algorithm can be directly applied. A digital computer monte carlo simulation of this model was performed under the following conditions:

    Measurement noise rms levels:  pitch rate gyro, 4.5 x 10⁻⁴ rad/sec;
        accelerometer, 0.16 ft/sec²; deflection angle resolver, 6.3 x 10⁻³ rad

    Input command, c(t):  square wave with amplitude = 10 ft/sec² and
        frequency = 1.67 Hz

    Measurement interval:  t_{k+1} - t_k = 0.02 sec
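These conditions translate directly into a simulation harness. The sketch below encodes them; the overall structure and the truth-model breakpoints are our assumptions, while the numerical values are those just listed:

    import numpy as np

    dt, t_final = 0.02, 3.0                      # measurement interval and trajectory length, sec
    t = np.arange(0.0, t_final + dt, dt)

    # measurement noise: gyro (rad/sec), accelerometer (ft/sec^2), resolver (rad)
    rms = np.array([4.5e-4, 0.16, 6.3e-3])
    rng = np.random.default_rng(1)
    v = rng.normal(size=(t.size, 3)) * rms       # white gaussian sequence v_k

    # input steering command: 10 ft/sec^2 square wave at 1.67 Hz
    c = 10.0 * np.where(np.sin(2 * np.pi * 1.67 * t) >= 0.0, 1.0, -1.0)

    def truth_parameter(tk):
        # truth model: piecewise-linear parameter history (breakpoints hypothetical)
        return np.interp(tk, [0.0, 1.5, 3.0], [-1.0, -4.0, -2.5])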

The parameter tracking accuracy achieved for the system in Fig. 9.4-1 is illustrated by the curves in Fig. 9.4-3, which display both the truth model and the estimated parameter values as functions of time.

It is observed from Fig. 9.4-3 that some parameters are identified less accurately than others; in particular, f11 and f22 are relatively poorly estimated. This is physically explained by the fact that these two parameters describe the open loop damping characteristics of the missile airframe. Since the damping is very light, it has little influence on the airframe state variables x_s(t), and hence, on the measurement data. Thus, f11 and f22 are relatively unobservable. However, this behavior does not adversely affect autopilot design because parameters that have little influence on airframe behavior tend also to have little effect on the airframe control law.

Another phenomenon demonstrated by this application is that the character of the input signal u(t) in Eq. (9.4-13) influences the parameter identification accuracy - an effect that is not observed in truly linear filtering problems.* This occurs because the extended Kalman filter gains are dependent upon x̂(t) which, in turn, is dependent upon u(t). To illustrate this behavior, two simulations were performed under identical conditions except that different values of the square wave signal frequency were used for c(t). One parameter in Eq. (9.4-13), f33, exhibited marked sensitivity to the frequency changes, as demonstrated in Fig. 9.4-4.

*Note from Table 4.4-1 that the covariance matrix for a linear system is unaffected by the presence of u(t).


[Figure 9.4-3 panels: estimated and truth-model time histories of the parameters over the 0 to 3 sec trajectory]

[Figure 9.4-4 panels: parameter estimates for (a) Input Square Wave Period 1.6 sec and (b) Input Square Wave Period 0.2 sec]

Figure 9.4-3 Comparison of Parameter Estimates and Truth Model Parameter Values

Figure 9.4-4 Demonstration of Identification Accuracy Sensitivity to Control Input Signal Frequency



The above example illustrates the use of the extended Kalman filter for obtaining estimates of state variables in problems that are inherently nonlinear. Practical problems in parameter identification are currently receiving considerable attention in many fields - e.g., high-speed aircraft, ship control and nuclear power plant control. Work on the design of optimal inputs for system parameter identification is also a subject of current research (e.g., Ref. 41).

9.5 OPTIMAL CONTROL OF LINEAR SYSTEMS

An important application for the estimation techniques discussed in this book is the field of control system design. The concept of control arises when the equations of motion of a system contain a set of "free" variables u(t), whose functional form can be prescribed by a designer to alter the system dynamic properties. A linear stochastic system having this capability is represented by the equation

    ẋ(t) = F(t) x(t) + G(t) w(t) + L(t) u(t)    (9.5-1)

where L(t) is a gain matrix describing the influence of u(t) on the state vector. If u(t) is explicitly a function of time only, it is referred to as an open loop control. If u(t) is explicitly a function of x(t) also, it is called a closed loop or feedback control.

We note that Eq. (9.5-1) has the same form as the equations for the continuous system in Table 4.4-1, where u(t) is referred to as a deterministic input. The linear open loop control system falls into the same category, because the control variables are specified by the designer; therefore, they can be considered as known (deterministic).

To mechanize a feedback control, u[x(t), t], the state variables x(t) must be accessible. Typically x(t) can be observed only through the available measurements z(t). In designing linear control systems, the latter are assumed to be given by the familiar linear expression

    z(t) = H(t) x(t) + v(t)    (9.5-2)

where v(t) is gaussian measurement noise. As we shall subsequently demonstrate, the role of state estimation is to process the measurement data so that an appropriate approximation to the desired feedback control law, u[x̂(t), t], can be achieved. An example of this type was treated in Section 9.4; however, in this section we are explicitly concerned with the design of the control law.

Just as various optimization criteria have been employed to derive optimal estimation algorithms, the concept of optimal control arises when u(t) is chosen to minimize a performance index, or figure of merit, for the controlled system. For linear systems with certain specific types of performance indices, there are significant similarities in the solutions to the control and estimation problems. This section briefly discusses the control problem, compares it with the estimation problem, and indicates how both subjects are combined in the design of an optimal stochastic control system.

DETERMINISTIC OPTIMAL LINEAR SYSTEMS - DUALITY

First, we discuss the control problem for deterministic linear systems where the noise processes, w(t) and v(t), in Eqs. (9.5-1) and (9.5-2) are absent. To determine a control law for the system described by

    ẋ(t) = F(t) x(t) + L(t) u(t)    (9.5-3)

it is desirable to impose a performance criterion that leads to a unique choice of u(t). An important class of problems, successfully treated from this point of view, is the so-called regulator problem, wherein x(t) is assumed to have an initial value x0 at time t0 and the control is chosen to drive the state toward zero. This objective is stated more precisely by requiring that u(t) minimize a performance index, J, which provides a measure of the size of x(t). In addition, the index should include a weighting on the magnitude of u(t) to limit the amount of control effort expended in nulling the state vector. A form of J that is found to be convenient and useful for linear systems is the quadratic performance index, defined by

    J = xᵀ(tf) Vf x(tf) + ∫_{t0}^{tf} [ xᵀ(t) V(t) x(t) + uᵀ(t) U(t) u(t) ] dt    (9.5-4)

where Vf, V(t) and U(t) are specified weighting matrices and tf is a specified final time. The matrices Vf and V(t) are usually required to be symmetric and positive semidefinite; U(t) is symmetric and positive definite.

The positive quadratic terms in x in Eq. (9.5-4) provide performance measures that tend to achieve the desired reduction in the state when u(t) is chosen to minimize J. However, the amount of reduction achieved is compromised with the control effort expended by the inclusion of a quadratic term in u(t). The smaller the elements of U(t), relative to those of Vf and V(t), the larger will be the magnitude of the optimal control and the greater will be the reduction in x(t).
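For a sampled trajectory, the quadratic index of Eq. (9.5-4) can be evaluated numerically. A minimal sketch, with trapezoidal quadrature as our own discretization choice:

    import numpy as np

    def quadratic_index(t, x, u, Vf, V, U):
        """J = x'(tf) Vf x(tf) + integral of (x'Vx + u'Uu) dt, with x of
        shape (N, n) and u of shape (N, m) sampled on the time grid t."""
        terminal = x[-1] @ Vf @ x[-1]
        integrand = (np.einsum('ni,ij,nj->n', x, V, x)
                     + np.einsum('ni,ij,nj->n', u, U, u))
        return terminal + np.trapz(integrand, t)

    # shrinking U relative to Vf and V yields larger controls and a smaller state

Comparing J for candidate gain schedules gives a quick numerical feel for the trade between state reduction and control effort described above.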


The procedure outlined above for designing a feedback control system is referred to as an optimal control problem formulation, where a control law is sought that minimizes a scalar index of performance, J. Such problems are extensively treated in the control systems literature (see Refs. 36, 42-44). Solutions for the optimal controls are obtained by various mathematical techniques - the principal ones being the calculus of variations, Pontryagin's maximum principle, and dynamic programming. It is not our intention here to provide a detailed exposition of control theory; instead we discuss the analogy between the control and estimation problems for linear systems, and demonstrate the role of estimation theory in the control of linear stochastic systems. For these purposes, a relatively simple "special-purpose" derivation (Ref. 45) of the optimal feedback control law, which minimizes the index in Eq. (9.5-4), will suffice.

Let us assume that a time-varying symmetric matrix S(t) exists, defined on the interval t0 ≤ t ≤ tf, such that

    S(tf) = Vf    (9.5-5)

where Vf is the weighting matrix for the terminal value of the state appearing in Eq. (9.5-4). For the present, no additional conditions are imposed on S(t); hence, its values at other times can be quite arbitrary. Observe that Eq. (9.5-5) implies

    xᵀ(tf) Vf x(tf) = xᵀ(tf) S(tf) x(tf)    (9.5-6)

Now we form

    d/dt (xᵀSx) = ẋᵀSx + xᵀṠx + xᵀSẋ    (9.5-7)

where it is tacitly assumed that S(t) is differentiable. For notational convenience the explicit dependence of x, S, and other time-varying quantities upon t is frequently omitted in the sequel. If ẋ is substituted from Eq. (9.5-3) into Eq. (9.5-7), the result can be written as

    d/dt (xᵀSx) = xᵀ(FᵀS + SF + Ṡ)x + uᵀLᵀSx + xᵀSLu
                = xᵀ(FᵀS + SF + Ṡ + V)x + uᵀLᵀSx + xᵀSLu + uᵀUu - xᵀVx - uᵀUu    (9.5-8)

The final expression of Eq. (9.5-8) is obtained simply by adding and subtracting the terms in the integrand of the quadratic performance index to the right side of Eq. (9.5-7).

Examination of Eq. (9.5-8) reveals that it can be rewritten in the form

    d/dt (xᵀSx) = (xᵀSL + uᵀU) U⁻¹ (LᵀSx + Uu) - xᵀVx - uᵀUu    (9.5-9)

if we impose the following condition on S(t):

    Ṡ = -FᵀS - SF + SLU⁻¹LᵀS - V    (9.5-10)

Equation (9.5-9) is readily verified by inspection of Eqs. (9.5-8) and (9.5-10). Until now, S(t) has been an arbitrary symmetric matrix, except for its value at tf specified by Eq. (9.5-5); hence, we have the freedom to require that S(t) satisfy Eq. (9.5-10), recognized as the matrix Riccati equation, on the interval t0 ≤ t < tf.

Finally we observe that

    xᵀ(tf) S(tf) x(tf) = x0ᵀ S(t0) x0 + ∫_{t0}^{tf} d/dt [ xᵀ(t) S(t) x(t) ] dt    (9.5-11)

Combining Eqs. (9.5-4), (9.5-5), (9.5-9), and (9.5-11), we obtain the following equivalent expression for the performance index:

    J = x0ᵀ S(t0) x0 + ∫_{t0}^{tf} (xᵀSL + uᵀU) U⁻¹ (LᵀSx + Uu) dt    (9.5-12)

Both S(t0) and x0 are independent of the control u(t); consequently, J can be minimized by minimizing the integral alone in Eq. (9.5-12). Furthermore, it is evident that the integrand in Eq. (9.5-12) is nonnegative and can be made identically zero by requiring

    u(t) = -U⁻¹(t) Lᵀ(t) S(t) x(t)    (9.5-13)

Solving Eq. (9.5-13) for u(t) and Eq. (9.5-10) for S(t), subject to the condition imposed in Eq. (9.5-5), provides the complete solution to the optimal control problem, which is summarized as follows:

    u(t) = -U⁻¹(t) Lᵀ(t) S(t) x(t)
    Ṡ(t) = -Fᵀ(t) S(t) - S(t) F(t) + S(t) L(t) U⁻¹(t) Lᵀ(t) S(t) - V(t)
    S(tf) = Vf    (9.5-14)

Observe that the optimal control law, derived above, has the linear form

    u(t) = -C(t) x(t)    (9.5-15)


with the feedback gains C(t) given by

    C(t) = U⁻¹(t) Lᵀ(t) S(t)    (9.5-16)

The latter are evaluated by integrating the Riccati differential equation for S(t) backward in time from the terminal condition, S(tf) = Vf. If all of the system state variables are measured directly [H = I in Eq. (9.5-2)] with negligible measurement error, the control law can be mechanized as illustrated in Fig. 9.5-1. If the matrix H in Eq. (9.5-2) is singular, an approximation to x(t) can be generated with an "observer," as demonstrated in Fig. 9.5-2 (see the discussion in Section 9.2). The case where appreciable measurement errors may be present is treated in the next section.

Figure 9.5-1 Optimal Control System Configuration, H = I

Figure 9.5-2 Approximately Optimal Control System Configuration; H is Singular
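The backward sweep of Eqs. (9.5-14) and (9.5-16) can be sketched as follows; the system matrices are illustrative constants, and SciPy's solve_ivp is used since it accepts a decreasing time span for the backward integration:

    import numpy as np
    from scipy.integrate import solve_ivp

    n, tf = 2, 5.0
    F = np.array([[0.0, 1.0], [0.0, -0.5]])
    L = np.array([[0.0], [1.0]])
    V, Vf, U = np.eye(n), np.zeros((n, n)), np.array([[0.1]])
    Uinv = np.linalg.inv(U)

    def riccati(t, s):                  # Eq. (9.5-14): Sdot, integrated backward
        S = s.reshape(n, n)
        Sdot = -F.T @ S - S @ F + S @ L @ Uinv @ L.T @ S - V
        return Sdot.ravel()

    sol = solve_ivp(riccati, (tf, 0.0), Vf.ravel(), dense_output=True, rtol=1e-8)

    def C(t):                           # feedback gains, Eq. (9.5-16)
        S = sol.sol(t).reshape(n, n)
        return Uinv @ L.T @ S           # u(t) = -C(t) x(t)

    print(C(0.0))   # for tf large, the gains near t0 approach constant values

Storing S(t) from the backward pass and replaying C(t) forward in time is the usual mechanization of this two-pass computation.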

There are important structural similarities between the linear control law, derived above, and the Kalman filtering algorithm for continuous systems, discussed in Section 4.3. In particular, the control gains C(t) are determined by solution of a matrix Riccati equation in Eq. (9.5-14), similar in form to that in Table 4.3-1 which specifies the filter gains, K(t). Therefore, all the conditions under which solutions to the filtering problem exist have their counterparts in the control problem. For this reason the control and estimation problems are said to be the duals of each other.

One consequence of duality is the fact that the concepts of observability and controllability, discussed in Section 4.4, are also defined for control systems. The control system is said to be controllable if the integral

    ∫_{t0}^{t} Φ(t,τ) L(τ) U⁻¹(τ) Lᵀ(τ) Φᵀ(t,τ) dτ

is positive definite for some t > 0, and observable if the integral

    ∫_{t0}^{t} Φᵀ(τ,t) V(τ) Φ(τ,t) dτ

is positive definite for some t > 0. The quantity Φ is the transition matrix associated with F in Eq. (9.5-3). Comparing the above definitions with those given in Section 4.4, we find that controllability in the control system, defined in terms of F, L, and U⁻¹, corresponds to observability in the Kalman filter, defined in terms of Fᵀ, Hᵀ and R⁻¹. Similarly, observability for the control system, defined in terms of Fᵀ and V, corresponds to controllability of the Kalman filter, defined in terms of F and GQGᵀ. Thus, for each controllable dynamic system in a filtering problem, there exists a corresponding observable dynamic system for a control problem, and vice versa.

A list of duality transformations for the control and filtering problems is given in Table 9.5-1. This comparison is useful because only one body of theory is needed to treat both situations.
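In the steady-state case this duality can be exercised numerically: a single algebraic Riccati solver serves both problems under the substitutions of Table 9.5-1 below. A sketch with illustrative matrices, assuming SciPy's solve_continuous_are:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    F = np.array([[0.0, 1.0], [-2.0, -0.3]])
    L, U, V = np.array([[0.0], [1.0]]), np.array([[0.5]]), np.eye(2)
    H, R = np.array([[1.0, 0.0]]), np.array([[0.2]])
    GQGT = 0.1 * np.eye(2)

    S = solve_continuous_are(F, L, V, U)          # control problem
    P = solve_continuous_are(F.T, H.T, GQGT, R)   # same solver, dual substitutions
    C = np.linalg.inv(U) @ L.T @ S                # control gains, Eq. (9.5-16)
    K = P @ H.T @ np.linalg.inv(R)                # Kalman gains from the dual solution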


TABLE 9.5-1 DUALITY RELATIONSHIPS

    Filtering Problem          Control Problem
    F                          Fᵀ
    GQGᵀ                       V
    H                          Lᵀ
    R                          U
    P                          S
    K                          Cᵀ
    t                          tf - t
    Observability              Controllability
    Controllability            Observability

OPTIMAL LINEAR STOCHASTIC CONTROL SYSTEMS - SEPARATION PRINCIPLE

When the noise processes, w(t) and v(t), are present in Eqs. (9.5-1) and (9.5-2), the optimal control problem must be posed differently than for the deterministic case treated in the preceding section. Because the exact value of the state vector is unknown, due to the noisy measurement data, it is meaningless to talk about minimizing the index J in Eq. (9.5-4), which depends upon x(t). Instead, a statistical measure of performance is needed; this can be obtained by defining a new performance index J̄, which is the average value of J, written as

    J̄ = E{ xᵀ(tf) Vf x(tf) + ∫_{t0}^{tf} [ xᵀ(t) V(t) x(t) + uᵀ(t) U(t) u(t) ] dt }    (9.5-17)

The optimal stochastic control problem is to choose u(t) so that J̄ is minimized, subject to the equations of motion given in Eq. (9.5-1). If a feedback control is sought, u(t) will depend upon the measurement data in Eq. (9.5-2).

To describe the solution to the control problem defined above, we first note that regardless of what method is used to generate u(t), the conditional mean (optimal estimate) of the state vector x̂(t) can always be determined by applying a Kalman filter to the measurement data. This is true because u(t), in Eq. (9.5-1), is effectively a known time-varying input to the linear system when the control law is specified. Consequently, x̂(t) can be determined from the algorithm in Table 4.4-1. With this fact in mind, we can state the optimal control law which minimizes J̄; it consists of two separate cascaded functions. First, a conventional Kalman filter is employed in the manner outlined above to obtain an optimal estimate of the state vector. Then the control command, u(t), is generated according to the relation

    u(t) = -C(t) x̂(t)    (9.5-18)

where C(t) is the set of control gains derived in the preceding section. A functional diagram of the control law is shown in Fig. 9.5-3. The realization of the optimal linear stochastic control law, in terms of distinct filtering and control functions, is referred to as the separation principle.

Figure 9.5-3 Optimal Stochastic Control System Configuration

The separation principle is usually derived by applying the theory of dynamic programming to the minimization of J̄ in Eq. (9.5-17) (Ref. 44). To provide insight as to why the principle is true, we offer a relatively simple plausibility argument, without the mathematical rigor of a complete proof. Using a procedure similar to that described in the preceding section for deriving the optimal deterministic control law, it can be shown that the performance index can be written in the form

    J̄ = E[x0ᵀ S(t0) x0] + ∫_{t0}^{tf} trace[S G Q Gᵀ] dt
        + E[ ∫_{t0}^{tf} (xᵀSL + uᵀU) U⁻¹ (LᵀSx + Uu) dt ]    (9.5-19)

where S(t) is the solution to the matrix Riccati equation in Eq. (9.5-14). The derivation of Eq. (9.5-19) is left as an exercise. The principal difference between Eqs. (9.5-12) and (9.5-19) is that the latter contains the term SGQGᵀ, which arises from the process noise in Eq. (9.5-1). The first and second terms in Eq. (9.5-19) are independent of u(t) and can be disregarded in the minimization. Interchanging the order of expectation and integration, the third term in Eq. (9.5-19) becomes


    ∫_{t0}^{tf} E[ (xᵀSL + uᵀU) U⁻¹ (LᵀSx + Uu) ] dt    (9.5-20)

At any given time t, having collected all the measurement data up to time t, the integrand in Eq. (9.5-20) is instantaneously minimized with respect to the control by solving the expression

    ∂/∂u E[ (xᵀSL + uᵀU) U⁻¹ (LᵀSx + Uu) ] = 0    (9.5-21)

for u(t), where the expectation is conditioned on the available measurement data. The resulting value of the control is given by

    u(t) = -U⁻¹(t) Lᵀ(t) S(t) x̂(t)    (9.5-22)

Observe that it is expressed in terms of the optimal state estimate, x̂(t). Substituting Eq. (9.5-22) into Eq. (9.5-20) and carrying out the indicated expectation operation produces

    ∫_{t0}^{tf} trace[ S L U⁻¹ Lᵀ S P ] dt    (9.5-23)

which is independent of x(t); the result depends only upon the trajectory-independent quantities S, L, U and P. Therefore, if u(t) is specified by Eq. (9.5-22), the integrand in Eq. (9.5-20) is minimized at each point in time. Consequently, the integral itself, and hence J̄, is minimized by the choice of control given in Eq. (9.5-22).

The separation principle cited above is most frequently associated with linear gaussian stochastic systems having quadratic performance indices. However, it has been demonstrated (Refs. 46-49) that optimal control laws for linear stochastic systems with more general types of performance indices, including the possibility of an explicit constraint on the control magnitude as well as nongaussian measurement noise, also satisfy a form of separation principle. In the latter case, the control law consists of a Kalman filter cascaded with a controller that computes u(t) in terms of x̂(t). However, for nonquadratic performance indices, the controller portion of the control law can be a nonlinear function of x̂(t); furthermore, it generally depends upon the statistics of the process and measurement noises. In many cases the resulting controller computation cannot be carried out in closed form. The structure of this more general separation principle is similar to that shown in Fig. 9.5-3; however, the controller can now be nonlinear. The reader is referred to the works cited above for further details.

The purpose of this section has been to demonstrate the role that optimal estimation theory plays in developing optimal control laws for linear stochastic systems. For nonlinear systems, the optimal control laws generally do not separate into an optimal filter cascaded with a control command computation (Refs. 50 and 51). However, in these cases it may be possible to linearize the nonlinearities and design an approximately optimal linearized system using the separation principle. Thus, from this point of view, estimation theory is seen to be exceptionally useful for a wide variety of feedback control system design problems.
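The cascade of Eq. (9.5-18) and Fig. 9.5-3 reduces to a few lines per cycle. The following discrete-time sketch assumes the gain matrices have already been computed (for example, as in the earlier sketches); all names are local to this illustration:

    import numpy as np

    def lqg_step(x_hat, P, z, u_prev, Phi, Gam, H, Q, R, C):
        """One cycle of the separation-principle controller: Kalman filter on
        the measurement z (u_prev is the known control applied last cycle),
        then the cascaded command u = -C x_hat of Eq. (9.5-18)."""
        x_pred = Phi @ x_hat + Gam @ u_prev          # propagate with known input
        P_pred = Phi @ P @ Phi.T + Q
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
        x_new = x_pred + K @ (z - H @ x_pred)        # Kalman update
        P_new = (np.eye(x_hat.size) - K @ H) @ P_pred
        return x_new, P_new, -C @ x_new              # control command

The filter and the controller exchange only x̂ and u, which is exactly the "two separate cascaded functions" structure asserted by the separation principle.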

REFERENCES

1. Mehra, R.K., "On the Identification of Variances and Adaptive Kalman Filtering," IEEE Trans. on Automatic Control, Vol. AC-15, No. 2, April 1970, pp. 175-184.

2. Weiss, I.M., "A Survey of Discrete Kalman-Bucy Filtering with Unknown Noise Covariances," AIAA Guidance, Control and Flight Mechanics Conference, Paper No. 70-955, Santa Barbara, California, August 1970.

3. Mehra, R.K., "On-Line Identification of Linear Dynamic Systems with Applications to Kalman Filtering," IEEE Trans. on Automatic Control, Vol. AC-16, No. 1, February 1971, pp. 12-21.

4. Abramson, P.D., Jr., "Simultaneous Estimation of the State and Noise Statistics in Linear Dynamic Systems," Ph.D. Thesis, Massachusetts Institute of Technology, TE-25, May 1968.

5. Kailath, T., "An Innovations Approach to Least Squares Estimation - Part I: Linear Filtering in Additive White Noise," IEEE Trans. on Automatic Control, Vol. AC-13, No. 6, December 1968.

6. Gelb, A., Dushman, A., and Sandberg, H.J., "A Means for Optimum Signal Identification," NEREM Record, Boston, November 1963.

7. Luenberger, D.G., "Observing the State of a Linear System," IEEE Transactions on Military Electronics, Vol. MIL-8, April 1964.

8. Luenberger, D.G., "Observers for Multivariable Systems," IEEE Transactions on Automatic Control, Vol. AC-11, No. 2, April 1966.

9. Uttam, B.J., "On the Stability of Time-Varying Systems Using an Observer for Feedback Control," International Journal of Control, December 1971.

10. Uttam, B.J. and O'Halloran, W.F., Jr., "On Observers and Reduced Order Optimal Filters for Linear Stochastic Systems," Proc. of the Joint Automatic Control Conference, August 1972.

11. O'Halloran, W.F., Jr. and Uttam, B.J., "Derivation of Observers for Continuous Linear Stochastic Systems From the Discrete Formulation," Sixth Asilomar Conference, Pacific Grove, California, November 1972.

12. Aoki, M. and Huddle, J.R., "Estimation of a State Vector of a Linear Stochastic System with a Constrained Estimator," IEEE Transactions on Automatic Control, Vol. AC-12, No. 4, August 1967.

13. Tse, E. and Athans, M., "Optimal Minimal-Order Observer Estimators for Discrete Linear Time-Varying Systems," IEEE Transactions on Automatic Control, Vol. AC-15, No. 4, August 1970.

14. Novak, L.M., "Optimal Minimal-Order Observers for Discrete-Time Systems - A Unified Theory," Automatica, Vol. 8, July 1972.



15. Todd, J., Survey of Numerical Analysis, McGraw-Hill Book Co., Inc., New York, 1962.

16. Robbins, H. and Munroe, S., "A Stochastic Approximation Method," Ann. Math. Statistics, Vol. 22, No. 1, 1951, pp. 400-407.

17. Wolfowitz, J., "On the Stochastic Approximation Method of Robbins and Munroe," Ann. Math. Statistics, Vol. 23, No. 3, 1952, pp. 457-466.

18. Schmetterer, L., "Stochastic Approximation," Proc. 4th Berkeley Symp. on Mathematical Statistics and Probability, Vol. 1, Los Angeles: University of California Press, 1958, pp. 587-609.

19. Derman, C., "Stochastic Approximation," Ann. Math. Statistics, Vol. 27, 1956, pp. 879-886.

20. Ho, Y.C., "On the Stochastic Approximation Method and Optimal Filtering Theory," Journal of Mathematical Analysis and Applications, Vol. 6, 1962, pp. 152-154.

21. Ho, Y.C. and Lee, R.C.K., "Identification of Linear Dynamic Systems," Journal of Information and Control, Vol. 8, 1965, pp. 93-110.

22. Sakrison, D.J., "Stochastic Approximation: A Recursive Method for Solving Regression Problems," Advances in Communication Systems, Vol. 2, 1966, pp. 51-106.

23. Sage, A.P. and Melsa, J.L., System Identification, Academic Press, New York, 1967.

24. Albert, A.E. and Gardner, L.A., Jr., Stochastic Approximation and Nonlinear Regression, The M.I.T. Press, Cambridge, Mass., 1967.

25. Sinha, N.K. and Griscik, M.P., "A Stochastic Approximation Method," IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-1, No. 4, October 1971, pp. 338-343.

26. Kiefer, E. and Wolfowitz, J., "Stochastic Estimation of the Maximum of a Regression Function," Ann. Math. Statistics, Vol. 23, No. 3, 1952.

27. Ho, Y.C. and Newbold, P.M., "A Descent Algorithm for Constrained Stochastic Extrema," Journal of Optimization Theory and Applications, Vol. 1, No. 3, 1967, pp. 215-231.

28. Saridis, G.N., Nikolic, Z.J., and Fu, K.S., "Stochastic Approximation Algorithms for System Identification, Estimation and Decomposition of Mixtures," IEEE Trans. on Systems Science and Cybernetics, Vol. SSC-5, No. 1, January 1969, pp. 8-15.

29. Tsypkin, Y.S., "Adaptation, Training and Self-Organizing in Automatic Systems," Automatika i Telemekhanika, Vol. 27, No. 1, January 1966, pp. 23-61.

30. Mendel, J.M. and Fu, K.S., Adaptive, Learning and Pattern Recognition Systems, Academic Press, New York, 1970.

31. Knopp, K., Theory and Application of Infinite Series, Hafner, New York, 1947.

32. Deyst, J.J., Jr. and Price, C.F., "Conditions for Asymptotic Stability of the Discrete, Minimum Variance, Linear Estimator," IEEE Trans. on Automatic Control, Vol. AC-13, No. 6, December 1968, pp. 702-705.

33. Price, C.F., "An Analysis of the Divergence Problem in the Kalman Filter," IEEE Trans. on Automatic Control, Vol. AC-13, No. 6, December 1968, pp. 699-702.

34. Carney, T.M., "On Joint Estimation of the State and Parameters for an Autonomous Linear System Based on Measurements Containing Noise," Ph.D. Thesis, Rice University, 1967.

35. Mehra, R.K., "Identification of Stochastic Linear Dynamic Systems," 1969 IEEE Symposium on Adaptive Processes, November 1969, Philadelphia, Pennsylvania.

36. Sage, A.P., Optimum Systems Control, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1968.

37. Lion, P.M., "Rapid Identification of Linear and Nonlinear Systems," AIAA Journal, Vol. 5, No. 10, October 1967.

38. Kushner, H.J., "On the Convergence of Lion's Identification Method with Random Inputs," IEEE Trans. on Automatic Control, Vol. AC-15, No. 6, December 1970, pp. 652-654.

39. Astrom, K.J. and Eykhoff, P., "System Identification - A Survey," Automatica, Vol. 7, No. 2, March 1971, pp. 123-162.

40. Price, C.F. and Koenigsberg, W.D., "Parameter Identification for Tactical Missiles," TR-170-3, The Analytic Sciences Corp., 1 December 1971.

41. Mehra, R.K., "Frequency-Domain Synthesis of Optimal Inputs for Linear System Parameter Identification," Harvard University, Technical Report No. 645, July 1973.

42. Kalman, R.E., "Fundamental Study of Adaptive Control Systems," Technical Report No. ASD-TR-61-27, Vol. I, Flight Control Laboratory, Wright-Patterson Air Force Base, Ohio, April 1962.

43. Athans, M. and Falb, P.L., Optimal Control, McGraw-Hill Book Co., Inc., New York, 1966.

44. Bryson, A.E., Jr. and Ho, Y.C., Applied Optimal Control, Blaisdell Publishing Co., Waltham, Mass., 1969.

45. Stear, E.B., Notes for a short course on Guidance and Control of Tactical Missiles, presented at the University of California at Los Angeles, July 21-25, 1969.

46. Potter, J.E., "A Guidance-Navigation Separation Theorem," Report RE-11, Experimental Astronomy Laboratory, M.I.T., August 1964.

47. Striebel, C.T., "Sufficient Statistics in the Optimal Control of Stochastic Systems," Journal of Mathematical Analysis and Applications, Vol. 12, pp. 576-592, December 1965.

48. Deyst, J.J., Jr., "Optimal Control in the Presence of Measurement Uncertainties," Sc.D. Thesis, January 1967, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, Massachusetts.

49. Deyst, J.J., Jr. and Price, C.F., "Optimal Stochastic Guidance Laws for Tactical Missiles," AIAA Journal of Spacecraft and Rockets, Vol. 10, No. 5, May 1973, pp. 301-308.

50. Fel'dbaum, A.A., Optimal Control Systems, Academic Press, New York, 1965.

51. Aoki, M., Optimization of Stochastic Systems, Academic Press, New York, 1967.

52. Yoshikawa, T. and Kobayashi, H., "Comments on 'Optimal Minimal-Order Observer-Estimators for Discrete Linear Time-Varying Systems'," IEEE Transactions on Automatic Control, Vol. AC-17, No. 2, April 1972.

PROBLEMS

Problem 9-1

A system and measurement are described by the set of differential equations

    ẋ = -βx + w,    w ~ N(0, q)

    z = x + v,    v ~ N(0, r)


with measurement data processed to provide an estimate x̂ according to

    dx̂/dt = -βx̂ + k(z - x̂)

where k is based on erroneous a priori assumptions about q and r. Describe a computational approach leading to the adaptive behavior of this estimator, in which the value of k can be improved as the system operates. Detail all the relevant equations and estimate, in terms of the given parameters, the length of time it would take to establish the correct value of k to within 5%.

Problem 9-2

The stochastic observer undergoes initial discontinuities in both the state estimates and estimation error covariance due to the exact information provided by the noise-free measurements at t = 0+. Show that x̂(0+) and P(0+) satisfy Eqs. (9.2-39) and (9.2-40). [Hint: First show that

    x̂(0+) = x̂(0) + B2(0)[z(0) - H1(0) x̂(0)]

    P(0+) = [I - B2(0) H1(0)] P(0) [I - B2(0) H1(0)]ᵀ

and then determine the optimum gain, B2opt(0), by minimizing the trace of P(0+) with respect to B2(0).]

Problem 9-3

Define the covariance of the observer error, ε, to be

    Π = E[ε εᵀ]

Show that Π and P can be related by

    Π = T P Tᵀ

    P = A Π Aᵀ

Verify this result for the special case of the colored-noise stochastic observer, where A and T are specified according to Eq. (9.2-51).

Problem 9-4

The stochastic observer covariance propagation and optimal gains given by Eqs. (9.2-36) to (9.2-38) are written in terms of the n x n covariance matrix, P. As shown in Problem 9-3, the observer errors may be equivalently represented by the reduced-order (n-m) x (n-m) covariance matrix, Π. Starting with the differential equation for the observer error, ε, given in Eq. (9.2-30), derive the differential equation for the observer error covariance matrix, Π. By minimizing the trace of Π, derive expressions for the optimal gains of the stochastic observer in terms of the reduced-order covariance matrix, Π.

Problem 9-5

Prove that the modified Newton's algorithm

    x_{i+1} = x_i - k0 g(x_i)

will converge to the solution, x0, of

    g(x) = 0

from any initial guess, x1, where g(x) is a known function satisfying

    0 < a ≤ dg(x)/dx ≤ b < ∞

and k0 is a constant which satisfies

    0 < k0 < 2/b

(Hint: Find a recursion for the quantity, x_i - x0.)

Problem 9-6

Demonstrate that conditions in Eqs. (9.3-7c) and (9.3-7d) are necessary in order that the sequence {x_k} generated by Eq. (9.3-5) converges to the solution of

    g(x) = 0

(Hint: Evaluate x_i for the special case, g(x) = x - a, in terms of the gain sequence {k_k}.)

Problem 9-7

Verify that the gain sequences defined in Eq. (9.3-17) satisfy the conditions in Eqs. (9.3-7c) and (9.3-7d).

Problem 9-8

Derive Eqs. (9.3-37) through (9.3-40).

Problem 9-9

Suppose that a random sequence {x_k} evolves according to

    x_{k+1} = x_k + w_k,    k = 1, 2, ...

    E[x_1] = 0,    E[x_1²] = σ0²

where {w_k} is a sequence of independent zero mean random variables having uniform variance σ1². Measurements of the form

    z_k = x_k + v_k

are available, where {v_k} is a sequence of zero mean independent random variables having uniform variance σ2². Using the algorithm

    x̂_{k+1} = x̂_k + k0(z_{k+1} - x̂_k)

to estimate x_k, where k0 is constant, derive conditions on k0 such that the mean square estimation error is bounded. Compute the rms estimation error in terms of σ0², σ1², σ2², and k0.


Problem 9-10

Formulate and give the solution to the optimal control problem which is the dual of the estimation problem described in Example 4.6-1.

Problem 9-11

Prove that J̄ in Eq. (9.5-17) can be expressed in the form given in Eq. (9.5-19). (Hint: Mimic the development beginning with Eq. (9.5-5), replacing xᵀSx by E[xᵀSx].)
