

700 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 60, NO. 2, FEBRUARY 2012

Sequential Detection With Mutual Information Stopping Cost

Vikram Krishnamurthy, Fellow, IEEE, Robert R. Bitmead, Fellow, IEEE, Michel Gevers, Fellow, IEEE, and Erik Miehling

Abstract—This paper formulates and solves a sequential detection problem that involves the mutual information (stochastic observability) of a Gaussian process observed in noise with missing measurements. The main result is that the optimal decision is characterized by a monotone policy on the partially ordered set of positive definite covariance matrices. This monotone structure implies that numerically efficient algorithms can be designed to estimate and implement monotone parametrized decision policies. The sequential detection problem is motivated by applications in radar scheduling where the aim is to maintain the mutual information of all targets within a specified bound. We illustrate the problem formulation and performance of monotone parametrized policies via numerical examples in fly-by and persistent-surveillance applications involving a ground moving target indicator (GMTI) radar.

Index Terms—Kalman filter, lattice programming, monotone decision policy, mutual information, radar tracking, sequential detection, stopping time problem.

I. INTRODUCTION

CONSIDER the following sequential detection problem. Multiple targets (Gaussian processes) are allocated priorities. A sensor obtains measurements of these evolving targets, with the signal-to-noise ratio (SNR) for each target proportional to its priority. A decision maker has two choices at each time: if the decision maker chooses the continue action, then the sensor takes another measurement and accrues a measurement cost. If the decision maker chooses the stop action, then a stopping cost proportional to the mutual information (stochastic observability) of the targets is

Manuscript received May 20, 2011; revised August 30, 2011; accepted October 27, 2011. Date of publication November 09, 2011; date of current version January 13, 2012. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Ljubisa Stankovic. The work of authors V. Krishnamurthy and E. Miehling was supported by an NSERC Strategic Grant and DRDC Ottawa. The work of author M. Gevers was supported by the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office. The scientific responsibility rests with the authors.

V. Krishnamurthy and E. Miehling are with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: [email protected]; [email protected]).

R. R. Bitmead is with the Department of Mechanical and Aerospace Engineering, University of California San Diego, CA 92093-0411 USA (e-mail: [email protected]).

M. Gevers is with the Department of Mathematical Engineering, Université Catholique de Louvain, B-1348, Louvain-la-Neuve, Belgium, and also with the Department ELEC, Vrije Universiteit Brussel (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2011.2175388

accrued and the problem terminates. What is the optimal time for the decision maker to apply the stop action? Our main result is that the optimal decision policy is a monotone function of the target covariances (with respect to the positive definite partial ordering). This facilitates devising numerically efficient algorithms to compute the optimal policy.

The sequential detection problem addressed in this paper is non-trivial since the decision to continue or stop is based on Bayesian estimates of the targets' states. In addition to Gaussian noise in the measurement process, the sensor has a non-zero probability of missing observations. Hence, the sequential detection problem is a partially observed stochastic control problem. Targets with high priority are observed with higher SNR and the uncertainty (covariance) of their estimates decreases. Lower priority targets are observed with lower SNR and their relative uncertainty increases. The aim is to devise a sequential detection policy that maintains the stochastic observability (mutual information or conditional entropy) of all targets within a specified bound.

Why stochastic observability? As mentioned above, the stopping cost in our sequential detection problem is a function of the mutual information (stochastic observability) of the targets. The use of mutual information as a measure of stochastic observability was originally investigated in [1]. In [2], determining optimal observer trajectories to maximize the stochastic observability of a single target is formulated as a stochastic dynamic programming problem—but no structural results or characterization of the optimal policy is given; see also [3]. We also refer to [4], where a nice formulation of sequential waveform design for MIMO radar is given using a Kullback-Leibler divergence based approach. As described in Section III-C, another favorable property of stochastic observability is that its monotonicity with respect to covariances does not require stability of the state matrix of the target (eigenvalues strictly inside the unit circle). In target models, the state matrix for the dynamics of the target has eigenvalues at 1 and thus is not stable.

Organization and Main Results:

i) To motivate the sequential detection problem, Section II presents a GMTI (ground moving target indicator) radar with macro/micro-manager architecture and a linear Gaussian state space model for the dynamics of each target. A Kalman filter is used to track each target over the time scale at which the micro-manager operates. Due to the presence of missed detections, the covariance update via the Riccati equation is measurement dependent (unlike the standard Kalman filter where the covariance is functionally independent of the measurements).

1053-587X/$26.00 © 2011 IEEE


KRISHNAMURTHY et al.: SEQUENTIAL DETECTION WITH MUTUAL INFORMATION STOPPING COST 701

ii) In Section III, the sequential detection problem is formulated. The cost of stopping is the stochastic observability, which is based on the mutual information of the targets. The optimal decision policy satisfies Bellman's dynamic programming equation. However, it is not possible to compute the optimal policy in closed form.1 Despite this, our main result (Theorem 1) shows that the optimal policy is a monotone function of the target covariances. This result is useful for two reasons: a) algorithms can be designed to construct policies that satisfy this monotone structure and b) the monotone structural result holds without stability assumptions on the linear dynamics. So there is an inherent robustness to this result since it holds even if the underlying model parameters are not exactly specified.

iii) Section IV exploits the monotone structure of the optimal decision policy to construct finite-dimensional parametrized policies. Then a simulation-based stochastic approximation (adaptive filtering) algorithm (Algorithm 1) is given to compute these optimal parametrized policies. The practical implication is that, instead of solving an intractable dynamic programming problem, we exploit the monotone structure of the optimal policy to compute such parametrized policies in polynomial time.

iv) Section V presents a detailed application of the sequential detection problem in GMTI radar resource management. By bounding the magnitude of the nonlinearity in the GMTI measurement model, we show that for typical operating values, the system can be approximated by a linear time-invariant state space model. Then detailed numerical examples are given that use the above monotone policy and stochastic approximation algorithm to demonstrate the performance of the radar management algorithms. We present numerical results for two important GMTI surveillance problems, namely, the target fly-by problem and the persistent surveillance problem. In both cases, detailed numerical examples are given and the performance is compared with periodic stopping policies. Persistent surveillance has received much attention in the defense literature [6], [7], since it can provide critical, long-term surveillance information. By tracking targets for long periods of time using aerial-based radars, such as DRDC-Ottawa's XWEAR radar [6] or the U.S. Air Force's Gorgon Stare Wide Area Airborne Surveillance System, operators can "rewind the tapes" in order to determine the origin of any target of interest [7].

v) The Appendix presents the proof of Theorem 1. It uses lattice programming and supermodularity. A crucial step in the proof is that the conditional entropy described by the Riccati equation update is monotone. This involves use of Theorem 2, which derives monotone properties of the Riccati and Lyapunov equations. The idea of using lattice programming and supermodularity to prove the existence of monotone policies is well known in stochastic control; see [8] for a textbook treatment of

1For stochastic control problems with continuum state spaces such as considered in this paper, apart from special cases such as linear quadratic control and partially observed Markov decision processes, there are no finite-dimensional characterizations of the optimal policy [5]. Bellman's equation does not translate into practical solution methodologies since the state space is a continuum. Quantizing the space of covariance matrices to a finite state space and then formulating the problem as a finite-state Markov decision process is infeasible since such quantization typically would require an intractably large state space.

the countable state Markov decision process case. However, in our case, since the state space comprises covariance matrices that are only partially ordered, the optimal policy is monotone with respect to this partial order. The structural results of this paper allow us to determine the nature of the optimal policy without brute-force numerical computation.

Motivation—GMTI Radar Resource Management: This paper is motivated by GMTI radar resource management problems [9], [10], [12]. The radar macro-manager deals with priority allocation of targets, determining regions to scan, and target revisit times. The radar micro-manager controls the target tracking algorithm and determines how long to maintain a priority allocation set by the macro-manager. In the context of GMTI radar micro-management, the sequential detection problem outlined above reads: suppose the radar macro-manager specifies a particular target priority allocation. How long should the micro-manager track targets using the current priority allocation before returning control to the macro-manager? Our main result, that the optimal decision policy is a monotone function of the targets' covariances, facilitates devising numerically efficient algorithms for the optimal radar micro-management policy.

II. RADAR MANAGER ARCHITECTURE AND TARGET DYNAMICS

This section motivates the sequential detection problem by outlining the macro/micro-manager architecture of the GMTI radar and the target dynamics. (The linear dynamics of the target model are justified in Section V-A, where a detailed description is given of the GMTI kinematic model.)

A. Macro- and Micro-Manager Architecture

(The reader who is uninterested in the radar application can skip this subsection.) Consider a GMTI radar with an agile beam tracking multiple ground moving targets. In this section we describe a two-time-scale radar management scheme comprising a micro-manager and a macro-manager.

a) Macro-Manager: At the beginning of each scheduling interval, the radar macro-manager allocates the target priority vector, whose elements are the priorities of the individual targets. The priority weight determines what resources the radar devotes to a target. This affects the track variances as described below. The choice is typically rule-based, depending on several extrinsic factors. For example, in GMTI radar systems, the macro-manager picks the target priority vector based on the track variances (uncertainty) and threat levels of the targets.

The track variances of the targets are determined by the Bayesian tracker as discussed below.

b) Micro-Manager: Once the target priority vector is chosen (we omit the scheduling-interval subscript for convenience), the micro-manager is initiated. The clock on the fast time scale (which is called the decision epoch time scale in Section V-A) is reset and commences ticking. At this decision epoch time scale, the targets are tracked/estimated by a Bayesian tracker. Each target is allocated a fraction of the total number of observations proportional to its priority (by integrating observations on the fast time scale; see Section V-A) so that the



observation noise variance is scaled inversely with the priority. The question we seek to answer is: How long should the micro-manager track the targets with the current priority vector before returning control to the macro-manager to pick a new priority vector? We formulate this as a sequential decision problem.

Note that the priority allocation vector and track variances of the targets capture the interaction between the micro- and macro-managers.

B. Target Kinematic Model and Tracker

We now describe the target kinematic model at the epoch time scale. The state of each ground moving target comprises its Cartesian coordinates and velocities.

Section V-A shows that on the micro-manager time scale, the GMTI target dynamics can be approximated as the following linear time-invariant Gaussian state space model

x^i_{k+1} = A x^i_k + w^i_k
y^i_k = C x^i_k + v^i_k   with probability p^i_d
y^i_k = ∅                 with probability 1 − p^i_d      (1)

The parameters A and C are defined in Section V. They can be target dependent; to simplify notation we have not done this. In (1), y^i_k denotes a 3-dimensional observation vector of target i at epoch time k. The noise processes w^i_k and v^i_k are mutually independent, white, zero-mean Gaussian random vectors with covariance matrices Q and R, respectively. (Q and R are defined in Section V.) Finally, p^i_d denotes the probability of detection of target i, and ∅ represents a missed observation that contains no information about the state x^i_k.2

Define the one-step-ahead predicted covariance matrix of target i at time k as

Σ^i_k = E[(x^i_k − x̂^i_k)(x^i_k − x̂^i_k)^T | y^i_{1:k−1}]

where x̂^i_k is the one-step-ahead predicted state estimate. Here the superscript T denotes transpose. Based on the priority vector and model (1), the covariance of the state estimate of target i is computed via the following measurement-dependent Riccati equation

Σ^i_{k+1} = A Σ^i_k A^T + Q − 1{y^i_k ≠ ∅} A Σ^i_k C^T (C Σ^i_k C^T + R/π_i)^{−1} C Σ^i_k A^T      (2)

where π_i denotes the priority of target i and 1{·} denotes the indicator function. In the special case when a target is allocated zero priority (so that its effective observation noise is unbounded), or when there is a missing observation (y^i_k = ∅), then (2) specializes to the Kalman predictor updated via the Lyapunov equation

Σ^i_{k+1} = A Σ^i_k A^T + Q.      (3)
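The two covariance recursions can be sketched numerically. The following is a hedged illustration assuming generic symbols (state matrix `A`, observation matrix `C`, noise covariances `Q` and `R`, a scalar priority weight that scales the observation noise inversely, and a detection flag); the function and argument names are illustrative rather than the paper's notation:

```python
import numpy as np

def covariance_update(sigma, A, C, Q, R, priority, detected):
    """One step of the measurement-dependent Riccati recursion (2).

    Falls back to the Lyapunov (Kalman predictor) update (3) when the
    observation is missed or the target has zero priority.
    """
    pred = A @ sigma @ A.T + Q           # Lyapunov part, common to both cases
    if not detected or priority == 0.0:
        return pred                       # missed observation: predictor only
    R_eff = R / priority                  # higher priority -> lower noise
    S = C @ sigma @ C.T + R_eff           # innovation covariance
    gain = A @ sigma @ C.T @ np.linalg.inv(S)
    return pred - gain @ C @ sigma @ A.T  # Riccati correction term
```

Because the correction term is positive semidefinite, the detected-case covariance never exceeds the predictor-only covariance in the positive definite partial order, which is the mechanism behind the monotonicity results later in the paper.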

III. SEQUENTIAL DETECTION PROBLEM

This section presents our main structural result on the sequential detection problem. Section III-A formulates the

2With suitable notational abuse, we use '∅' as a label to denote a missing observation. When a missing observation is encountered, the track estimate is updated by the Kalman predictor with covariance update (3).

stopping cost in terms of the mutual information of the targets being tracked. Section III-B formulates the sequential detection problem. The optimal decision policy is expressed as the solution of a stochastic dynamic programming problem. The main result (Theorem 1 in Section III-C) states that the optimal policy is a monotone function of the target covariances. As a result, the optimal policy can be parametrized by monotone policies and estimated in a computationally efficient manner via stochastic approximation (adaptive filtering) algorithms. This is described in Section IV.

Notation: Given the priority vector allocated by the macro-manager, the highest-priority target is the one with the largest priority weight. Its covariance is tracked separately from the set of covariance matrices of the remaining targets; the sequential decision problem below is formulated in terms of all of these covariances.

A. Formulation of Mutual Information Stopping Cost

As mentioned in Section II-A, once the radar macro-manager determines the priority vector, the micro-manager switches on and its clock begins to tick. The radar micro-manager then solves a sequential detection problem involving two actions: at each slot, the micro-manager chooses either the continue action or the stop action. To formulate the sequential detection problem, this subsection specifies the costs incurred with these actions.

Radar Operating Cost: If the micro-manager chooses the continue action, it incurs a radar operating cost that depends on the radar operating parameters.

Stopping Cost—Stochastic Observability: If the micro-manager chooses the stop action, a stopping cost is incurred. In this paper, we formulate a stopping cost in terms of the stochastic observability of the targets; see also [1], [2]. Define the stochastic observability of each target i at epoch k as the mutual information

O^i_k = α h(x^i_k) − β h(x^i_k | y^i_{1:k})      (4)

In (4), α and β are non-negative constants chosen by the designer. Recall from information theory [13] that h(x^i_k) denotes the differential entropy of target i at time k, and h(x^i_k | y^i_{1:k}) denotes the conditional differential entropy of target i at time k given the observation history y^i_{1:k}. The mutual information is the average reduction in uncertainty of the target's coordinates given the measurements. In the standard definition of mutual information, α = β = 1. However, we are also interested in the special case α = 0, in which case we are considering the conditional entropy for each target (see Case 4 below).

Consider the following stopping cost if the micro-manager chooses the stop action at a given decision epoch:

(5)

Recall that the stopping cost involves the mutual information of the highest-priority target. In (5), the stopping cost is given by a function chosen by the designer to be monotone increasing in each of its variables (examples are given below).

The following lemma follows from straightforward arguments in [13].



Lemma 1: Under the assumption of linear Gaussian dynamics (1) for each target i, the mutual information of target i defined in (4) is

O^i_k = (α/2) log det Σ̄^i_k − (β/2) log det Σ^i_k + c      (6)

where α and β are the non-negative constants in (4) and c is a constant that vanishes when α = β. Here Σ̄^i_k denotes the predicted (a priori) covariance of target i at epoch k given no observations. It is computed using the Kalman predictor covariance update (3) for k iterations. Also, Σ^i_k is the posterior covariance and is computed via the Kalman filter covariance update (2).
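Under Lemma 1, evaluating the stochastic observability reduces to log-determinants of two covariance matrices. A minimal sketch in which the additive constant of (6) is omitted (it vanishes when the two designer constants are equal); the function and argument names are ours, not the paper's:

```python
import numpy as np

def mutual_information(sigma_pred, sigma_post, alpha=1.0, beta=1.0):
    # (alpha/2) log det of the no-observation predicted covariance minus
    # (beta/2) log det of the posterior covariance, as in (6); the additive
    # constant is dropped since it vanishes when alpha == beta.
    sign_p, logdet_p = np.linalg.slogdet(sigma_pred)
    sign_q, logdet_q = np.linalg.slogdet(sigma_post)
    assert sign_p > 0 and sign_q > 0, "covariances must be positive definite"
    return 0.5 * alpha * logdet_p - 0.5 * beta * logdet_q
```

Using `slogdet` rather than `log(det(...))` avoids overflow for large covariances; setting `alpha=0` recovers the conditional-entropy special case of Case 4 below.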

Using Lemma 1, the stopping cost in (5) can be expressed in terms of the Kalman filter and predictor covariances. Define the four-tuple of sets of covariance matrices

(7)

Therefore the stopping cost (5) can be expressed as

(8)

Examples: We consider the following examples of the designer function in (8):

Case 1. Maximum Mutual Information Difference Stopping Cost: the designer function takes the maximum of the targets' mutual informations, in which case,

(9)

The stopping cost is the difference in mutual information between the target with highest mutual information and the target with highest priority. This can be viewed as a stopping cost that discourages stopping too soon.

Case 2. Minimum Mutual Information Difference Stopping Cost: the designer function takes the minimum of the targets' mutual informations, in which case,

(10)

The stopping cost is the difference in mutual information between the target with lowest mutual information and the target with highest priority. This can be viewed as a conservative stopping cost in the sense that preference is given to stop sooner.

Case 3. Average Mutual Information Difference Stopping Cost: the designer function averages the targets' mutual informations, in which case,

(11)

This stopping cost is the difference between the average mutual information of the targets (provided the designer constants in (4) include the appropriate averaging term) and the mutual information of the highest-priority target.

Case 4. Conditional Differential Entropy Difference Stopping Cost: We are also interested in the following special case, which involves scheduling between a Kalman filter and measurement-free Kalman predictors; see [14]. Suppose the high-priority target is allocated a Kalman filter and the remaining targets are allocated measurement-free Kalman predictors. This corresponds to the case in (1) where the radar assigns all its resources to the high-priority target and no resources to any other target. Then solving the sequential detection problem is equivalent to posing the following question: What is the optimal stopping time when the radar should decide to start tracking another target? In this case, the mutual information of each unobserved target is zero (since its predicted and posterior covariances coincide in (6)). So it is appropriate to set the constant α in (4) to zero for the remaining targets in (8). Note from (4) that when α = 0, the stopping cost of each individual target becomes the negative of its conditional entropy. That is, the stopping cost is the difference in conditional differential entropy instead of mutual information.
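Cases 1 to 3 differ only in how the targets' mutual informations are aggregated before subtracting the highest-priority target's value. A sketch of the three variants (the function name, argument names, and mode strings are our own):

```python
def stopping_cost(mi, i_star, mode):
    """Cases 1-3: aggregate of all targets' mutual information minus that
    of the highest-priority target.

    mi: list of per-target mutual information values.
    i_star: index of the highest-priority target.
    """
    if mode == "max":        # Case 1: discourages stopping too soon
        agg = max(mi)
    elif mode == "min":      # Case 2: conservative, prefers stopping sooner
        agg = min(mi)
    elif mode == "avg":      # Case 3: compromise between Cases 1 and 2
        agg = sum(mi) / len(mi)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return agg - mi[i_star]
```

Since `max >= avg >= min` over the same list, the three costs are ordered for any fixed covariances, which matches the discussion of when each case encourages earlier or later stopping.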

Discussion: A natural question is: How should one pick the stopping cost (8) depending on the target priorities? One can design the choice of stopping cost (namely, Case 1, 2, or 3 above) depending on the range of target priorities. For example, suppose the priority of a target is the negative of its mutual information.

i) If two or more targets have similar high priorities, it makes sense to use Case 2 since the stopping cost would be close to zero. This would give incentive for the micro-manager to stop quickly and consider other high priority targets. Note also that if multiple targets have similar high priorities, the radar would devote similar amounts of time to them according to the protocol in Section II-A, thereby not compromising the accuracy of the estimates of these targets.

ii) If one target has a significantly higher priority than all other targets, then Case 1 or 3 can be chosen for the stopping cost. As mentioned above, Case 1 would discourage stopping too soon, thereby allocating more resources to the high-priority target. In comparison, Case 3 is a compromise between Case 1 and Case 2, since it would consider the average of all other target priorities (instead of the maximum or minimum).

Since, as will be shown in Section IV, the parametrized micro-management policies can be implemented efficiently, the radar system can switch between the above stopping costs in real time (at the macro-manager time scale). Finally, from a practical point of view, the macro-manager, which is responsible for assigning the priority allocations, will rarely assign equal priorities to two targets. This is because the priority computation in realistic scenarios is based on many factors, such as target proximity and heading relative to assets in the surveillance region, error covariances in state estimates, and target type.

B. Formulation of Sequential Decision Problem

With the above stopping and continuing costs, we are now ready to formulate the sequential detection problem that we wish to solve. Consider a stationary decision policy of the form

(12)

Recall from (7) that the policy's argument is a 4-tuple of sets of covariance matrices. Consider the family of such stationary policies. For



any prior 4-tuple (recall notation (7)) and policy chosen by the micro-manager, define the stopping time as the first decision epoch at which the stop action is chosen. The following cost is associated with the sequential decision procedure:

(13)

Here the radar operating cost and the stopping cost are as introduced in Section III-A, and the expectation is with respect to the stopping time and the initial condition. (A measure-theoretic definition, which involves an absorbing state to deal with the stopping time, is given in [15].)

The goal is to determine the optimal stopping time with minimal cost, that is, compute the optimal policy to minimize (13). Denote the optimal cost as

(14)

The existence of an optimal stationary policy follows from [5, Prop. 1.3, Ch. 3]. Since the operating cost is non-negative, for the conditional entropy cost function of Case 4 in Section III-A, stopping is guaranteed in finite time, i.e., the stopping time is finite with probability 1. For Cases 1 to 3, in general the stopping time is not necessarily finite—however, this does not cause problems from a practical point of view since the micro-manager typically has a prespecified upper time bound at which it always chooses the stop action and reverts back to the macro-manager. Alternatively, for Cases 1 to 3, if one truncates the stopping time to some upper bound, then again stopping is guaranteed in finite time.
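The truncation argument can be made concrete: run the micro-manager loop and force the stop action at a prespecified upper bound, which guarantees a finite stopping time. A sketch with hypothetical `policy` and `step` callables (both names are ours):

```python
def run_micromanager(policy, step, sigma0, max_epochs=100):
    """Run the sequential detection loop with a truncated horizon.

    policy: maps the current covariance state to "continue" or "stop".
    step:   covariance update applied under the continue action.
    Returns (stopping_time, final_state); truncation at max_epochs forces
    a stop in finite time even if the policy never chooses to stop.
    """
    sigma = sigma0
    for k in range(max_epochs):
        if policy(sigma) == "stop":
            return k, sigma
        sigma = step(sigma)          # covariance update under 'continue'
    return max_epochs, sigma         # forced stop at the truncation bound
```

In practice `step` would be the measurement-dependent Riccati update of Section II-B and `policy` a monotone parametrized policy of Section IV.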

Considering the above cost (13), the optimal stationary policy and associated value function are the solution of the following "Bellman's dynamic programming equation" [8] (recall notation (7)):

(15)

where the covariance updates were defined in (2) and (3), with the Kalman filter covariance update applied to the lower-priority targets according to (2). Our goal is to characterize the optimal policy and the optimal stopping set defined as

(16)

In the special Case 4 of Section III-A, where the stopping cost reduces to conditional entropy, the stopping set takes the corresponding specialized form.

The dynamic programming equation (15) does not translate into practical solution methodologies since the space of 4-tuples of sets of positive definite matrices is uncountable, and it is not possible to compute the optimal decision policy in closed form.

C. Main Result: Monotone Optimal Decision Policy

Our main result below shows that the optimal decision policy is a monotone function of the covariance matrices of the targets. To characterize in the sequential decision problem below, we introduce the following notation:

— Let denote the dimension of the state in (1). (In the GMTI radar example .)

— Let denote the set of all real-valued, symmetric positive semi-definite matrices. For define the positive definite partial ordering as if for all , and if for .

— Define with the inequalities reversed. Notice that is a partially ordered set (poset).

— Note that ordering positive definite matrices also orders their eigenvalues. Let and denote vectors with elements in . Then define the componentwise partial order on (denoted by ) as (equivalently, ) if for all .

— For any matrix , let denote the eigenvalues of arranged in decreasing order as a vector. Note implies . Clearly, is a poset.
— Define scalar function to be increasing³ if implies , or equivalently, if implies . Finally, we say that is increasing in if is increasing in each component of .

The following is the main result of this paper regarding the policy .

Theorem 1: Consider the sequential detection problem (13) with stochastic observability cost (8) and stopping set (16).

1) The optimal decision policy is increasing in , decreasing in , decreasing in , and increasing in on the poset . Alternatively, is increasing in , decreasing in , decreasing in and increasing in on the poset . Here denotes the vectors of eigenvalues (and similarly for ).

2) In the special case when for all (i.e., Case 4 in Section III-A, where the stopping cost is the conditional entropy), the optimal policy is increasing in and decreasing in on the poset . Alternatively, is increasing in , and decreasing in on the poset .

The proof is in Appendix B. The monotone property of the optimal decision policy is useful since (as described in Section IV) parametrized monotone policies are readily implementable at the radar micro-manager level and can be adapted in real time. Note that in the context of GMTI radar, the above policy is equivalent to the radar micro-manager opportunistically deciding when to stop looking at a target: if the measured quality of the current target is better than some threshold, then continue; otherwise stop.

To get some intuition, consider the second claim of Theorem 1 when each state process has dimension . Then the covariance of each target is a non-negative scalar. The second claim of Theorem 1 says that there exists a threshold switching

³Throughout this paper, we use the term "increasing" in the weak sense. That is, "increasing" means non-decreasing. Similarly, the term "decreasing" means non-increasing.


KRISHNAMURTHY et al.: SEQUENTIAL DETECTION WITH MUTUAL INFORMATION STOPPING COST 705

curve , where is increasing in each element of , such that for it is optimal to stop, and for it is optimal to continue. This is illustrated in Fig. 1. Moreover, since is monotone, it is differentiable almost everywhere (by Lebesgue's theorem).
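For scalar covariances, the threshold structure is easy to state in code. The sketch below is illustrative only: the curve `gamma` and the orientation of the stop region are our assumptions (the extracted symbols are lost); what the theorem asserts is only that an increasing switching curve exists.

```python
# Illustrative sketch of a monotone threshold switching curve for two
# targets with scalar covariances (P1, P2). The curve gamma() and which
# side of the curve is the stop region are hypothetical choices.

def gamma(p2, a=0.5, b=1.0):
    """A hypothetical increasing threshold curve: Gamma(P2) = a*P2 + b."""
    return a * p2 + b

def policy(p1, p2):
    """Stop above the switching curve, continue below it."""
    return "stop" if p1 >= gamma(p2) else "continue"

# Monotonicity: raising P1 can only switch 'continue' -> 'stop', while
# raising P2 (which raises the threshold) can only switch 'stop' -> 'continue'.
assert policy(2.0, 1.0) == "stop"      # 2.0 >= 0.5*1.0 + 1.0 = 1.5
assert policy(1.0, 1.0) == "continue"  # 1.0 <  1.5
assert policy(2.0, 3.0) == "continue"  # threshold has risen to 2.5
```

Because `gamma` is increasing, the stop region is an "upper set" in the first covariance and shrinks as the other covariance grows, which is exactly the shape sketched in Fig. 1.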

To prove Theorem 1 we will require the following monotonicity result regarding the Riccati and Lyapunov equations of the Kalman covariance update. This is proved in Appendix C. Below, denotes determinant.

Theorem 2: Consider the Kalman filter Riccati covariance update, , defined in (2) with possibly missing measurements, and the Lyapunov covariance update, , defined in (3). The following properties hold for and (where denotes the dimension of the observation vector in (1)): (i) and (ii) are monotone decreasing in on the poset .

Discussion: An important property of Theorem 2 is that stability of the target system matrix (see (24)) is not required. In target tracking models (such as (1)), has eigenvalues at 1 and is therefore not stable. By using Theorem 2, Lemma 4 (in Appendix A) shows that the stopping cost involving stochastic observability is a monotone function of the covariances. This monotone property of the stochastic observability of a Gaussian process is of independent interest.

Instead of stochastic observability (which deals with log-determinants), suppose we had chosen the stopping cost in terms of the trace of the covariance matrices. Then, in general, it is not true that is decreasing in on the poset . Such a result typically requires stability of .

IV. PARAMETRIZED MONOTONE POLICIES AND STOCHASTIC OPTIMIZATION ALGORITHMS

Theorem 1 shows that the optimal sequential decision policy is monotone in . Below, we characterize and compute optimal parametrized decision policies of the form for the sequential detection problem formulated in Section III-B. Here denotes a suitably chosen finite-dimensional parameter and is a subset of Euclidean space. Any such parametrized policy needs to capture the essential feature of Theorem 1: it needs to be decreasing in and increasing in . In this section, we derive several examples of parametrized policies that satisfy this property. We then present simulation-based adaptive filtering (stochastic approximation) algorithms to estimate these optimal parametrized policies. To summarize, instead of attempting to solve an intractable dynamic programming problem (15), we exploit the monotone structure of the optimal decision policy (Theorem 1) to estimate a parametrized optimal monotone policy (Algorithm 1 below).

A. Parametrized Decision Policies

Below we give several examples of parametrized decision policies for the sequential detection problem that are monotone in the covariances. Because such parametrized policies satisfy the conclusion of Theorem 1, they can be used to approximate the monotone optimal policy of the sequential detection problem. Lemma 2 below shows that the constraints we specify are necessary and sufficient for the parametrized policy to be monotone, implying that such policies are an approximation to the optimal policy within the appropriate parametrized class .

First we consider 3 examples of parametrized policies that are linear in the vector of eigenvalues (defined in Section III-C). Recall that denotes the dimension of the state in (1). Let and denote the parameter vectors that parametrize the policy defined as (17)–(19), shown at the bottom of the page.

As a fourth example, consider the parametrized policy in terms of covariance matrices. Below, and are unit-norm vectors, i.e., and for . Let denote the space of unit-norm vectors. Define the parametrized policy as (20), shown at the bottom of the next page.

The following lemma states that the above parametrized policies satisfy the conclusion of Theorem 1, namely that the policies are monotone. The proof is straightforward and hence omitted.

Lemma 2: Consider each of the parametrized policies (17), (18), and (19). Then is necessary and sufficient for the parametrized policy to be monotone increasing in and decreasing in . For (20), (unit-norm vectors) is necessary and sufficient for the parametrized policy to be monotone increasing in and decreasing in .

Lemma 2 says that since the constraints on the parameter vector are necessary and sufficient for a monotone policy, the classes of policies (17), (18), (19), and (20) do not leave out any monotone policies; nor do they include any non-monotone

if … , otherwise …   (17)
if … , otherwise …   (18)
if … , otherwise …   (19)


Fig. 1. Threshold switching curve for optimal decision policy . Claim 2 of Theorem 1 says that the optimal decision policy is characterized by a monotone increasing threshold curve when each target has state dimension .

policies. Therefore, optimizing over for each case yields the best approximation to the optimal policy within the appropriate class.

Remark: Another example of a parametrized policy that satisfies Lemma 2 is obtained by replacing with in (17), (18), and (19). In this case, the parameters are scalars. However, numerical studies (not presented here) show that this scalar parametrization is not rich enough to yield useful decision policies.

B. Stochastic Approximation Algorithm to Estimate

Having characterized monotone parametrized policies above, our next goal is to compute the optimal parametrized policy for the sequential detection problem described in Section III-B. This can be formulated as the following stochastic optimization problem:

(21)

Recall that is the stopping time at which the stop action is applied, i.e., .

The optimal parameter in (21) can be computed by simulation-based stochastic optimization algorithms, as we now describe. Recall that for the first three examples above (namely, (17), (18) and (19)), there is the explicit constraint that and . This constraint can be eliminated straightforwardly by choosing each component of as where . The optimization problem (21) can then be formulated in terms of this new unconstrained parameter vector .

In the fourth example above, namely (20), the parameter is constrained to the boundary set of the -dimensional unit hypersphere . This constraint can be eliminated by parametrizing in terms of spherical coordinates as follows: Let

(22)

where denote a parametrization of . Then it is trivially verified that . Again, the optimization problem (21) can then be formulated in terms of this new unconstrained parameter vector .
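The spherical-coordinate change of variables in (22) can be sketched as follows. The parametrization below is the standard one for the unit sphere (the paper's exact convention in (22) is not recoverable from the extraction), and the function name is ours.

```python
import math

def spherical_to_unit(phi):
    """Map unconstrained angles phi = (phi_1, ..., phi_{n-1}) to a
    unit-norm vector g in R^n via the standard spherical parametrization:
      g_1 = cos(phi_1)
      g_k = sin(phi_1)...sin(phi_{k-1}) * cos(phi_k),  1 < k < n
      g_n = sin(phi_1)...sin(phi_{n-1})
    so that ||g|| = 1 identically and the constraint disappears."""
    n = len(phi) + 1
    g = []
    sin_prod = 1.0
    for k in range(n - 1):
        g.append(sin_prod * math.cos(phi[k]))
        sin_prod *= math.sin(phi[k])
    g.append(sin_prod)
    return g

g = spherical_to_unit([0.3, 1.1, 2.0])   # a point on the unit sphere in R^4
assert abs(sum(x * x for x in g) - 1.0) < 1e-12
```

Any gradient algorithm can then update the angles freely; the resulting vector is always feasible for (20).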

Algorithm 1: Policy Gradient Algorithm for Computing Optimal Parametrized Policy

Step 1: Choose initial threshold coefficients and parametrized policy .

Step 2: For iterations :
• Evaluate sample cost .
• Compute gradient estimate as:

with probability , with probability .

Here denotes the gradient step size with and .

• Update threshold coefficients via (where below denotes the step size)

and (23)

Several possible simulation-based stochastic approximation algorithms can be used to estimate in (21). In our numerical examples, we used Algorithm 1 to estimate the optimal parametrized policy. Algorithm 1 is a simultaneous perturbation stochastic approximation (SPSA) algorithm [16]; see [17] for other more sophisticated gradient estimators. Algorithm 1 generates a sequence of estimates , and thus , that converges to a local minimum of (21) with policy . In Algorithm 1 we denote the policy as , since is parametrized in terms of as described above.

The SPSA algorithm [16] picks a single random direction (see Step 2) along which the derivative is evaluated after each batch . As is apparent from Step 2 of Algorithm 1, evaluation of the gradient estimate requires only 2 batch simulations. This is unlike the well-known Kiefer–Wolfowitz stochastic approximation algorithm [16], where batch simulations are required to evaluate the gradient estimate. Since the stochastic gradient algorithm (23) converges to a local optimum, it is necessary to retry with several distinct initial conditions.

if … , otherwise …   (20)
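A minimal SPSA sketch in the spirit of Algorithm 1: all components of the parameter are perturbed along a single random direction with ±1 (Bernoulli) entries, and the gradient estimate uses only two cost evaluations. The step-size schedules and the noisy quadratic stand-in for the sample-path cost are our assumptions for illustration, not the paper's settings.

```python
import random

def spsa_minimize(cost, theta, iters=3000, seed=0):
    """Minimal SPSA sketch: perturb theta along one random +/-1 direction,
    estimate the gradient from TWO cost evaluations, and take a
    decreasing-step-size gradient step."""
    rng = random.Random(seed)
    theta = list(theta)
    for n in range(1, iters + 1):
        c_n = 0.5 / n ** 0.25            # perturbation size (decreasing)
        a_n = 1.0 / (n + 10)             # update step size (decreasing)
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]   # Bernoulli +/-1
        plus = [t + c_n * d for t, d in zip(theta, delta)]
        minus = [t - c_n * d for t, d in zip(theta, delta)]
        diff = cost(plus) - cost(minus)                    # only 2 evaluations
        theta = [t - a_n * diff / (2.0 * c_n * d)
                 for t, d in zip(theta, delta)]
    return theta

def noisy_cost(theta, _rng=random.Random(1)):
    """Noisy quadratic stand-in for the sample-path cost, minimized at 2."""
    return sum((t - 2.0) ** 2 for t in theta) + 0.01 * _rng.gauss(0.0, 1.0)

theta_hat = spsa_minimize(noisy_cost, [0.0, 5.0])
assert all(abs(t - 2.0) < 0.5 for t in theta_hat)
```

Note the two-evaluation property: the cost of one gradient estimate is independent of the parameter dimension, which is the advantage over Kiefer–Wolfowitz finite differences noted in the text.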

V. APPLICATION: GMTI RADAR SCHEDULING AND NUMERICAL RESULTS

This section illustrates the performance of the monotone parametrized policy (21) computed via Algorithm 1 in a GMTI radar scheduling problem. We first show that the nonlinear measurement model of a GMTI tracker can be approximated satisfactorily by the linear Gaussian model (1) that was used above. Therefore, the main result Theorem 1 applies, implying that the optimal radar micro-management decision policy is monotone. To illustrate these micro-management policies numerically, we then consider two important GMTI surveillance problems: the target fly-by problem and the persistent surveillance problem.

A. GMTI Kinematic Model and Justification of Linearized Model (1)

The observation model below is an abstraction based on approximating several underlying preprocessing steps. For example, given raw GMTI measurements, space-time adaptive processing (STAP) (which is a two-dimensional adaptive filter) is used for near real-time detection; see [6] and references therein. Similar observation models can be used as abstractions of synthetic aperture radar (SAR) based processing.

A modern GMTI radar manager operates on three time-scales (the description below is a simplified variant of an actual radar system), as follows.

1) Individual observations of target are obtained on the fast time-scale . The period at which ticks is typically 1 millisecond. At this time-scale, ground targets can be considered to be static.

2) Decision epoch is the time-scale at which the micro-manager and target tracker operate. Recall is the stopping time at which the micro-manager decides to stop and return control to the macro-manager. The clock period at which ticks is typically seconds. At this epoch time-scale , the targets move according to the kinematic model (24) and (25) below. Each epoch is comprised of intervals of the fast time-scale, where is typically of the order of 100. So, 100 observations are integrated at the -time-scale to yield a single observation at the -time-scale.

3) The scheduling interval is the time-scale at which the macro-manager operates. Each scheduling interval is comprised of decision epochs. This stopping time is determined by the micro-manager. is typically in the range 10 to 50; in absolute time, it corresponds to the range 1 to 5 s. In such a time period, a ground target moving at 50 km/h moves approximately in the range 14 to 70 m.
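The quoted numbers on the three time-scales are mutually consistent, as a quick computation confirms (inputs are the values stated in the text; the rounding is ours):

```python
# Cross-check of the three GMTI time-scales quoted in the text.
fast_tick_s = 1e-3          # fast time-scale: ~1 ms per observation
ticks_per_epoch = 100       # ~100 fast ticks integrated per decision epoch
epoch_s = fast_tick_s * ticks_per_epoch
assert epoch_s == 0.1       # micro-manager epoch: 0.1 s

n_epochs = (10, 50)         # typical range of epochs per scheduling interval
interval_s = tuple(n * epoch_s for n in n_epochs)
assert interval_s == (1.0, 5.0)   # scheduling interval: 1 to 5 s

speed_ms = 50.0 / 3.6       # ground target at 50 km/h ~ 13.9 m/s
dist_m = tuple(round(speed_ms * t, 1) for t in interval_s)
assert dist_m == (13.9, 69.4)     # moves roughly 14 to 70 m per interval
```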

1) GMTI Kinematic Model: The tracker assumes that each target has kinematic model and GMTI observations [9]

(24)

with probability ,
with probability .   (25)

Here denotes a 3-dimensional (range, bearing and range rate) observation vector of target at epoch time , and denotes the Cartesian coordinates and speed of the platform (aircraft) on which the GMTI radar is mounted. The noise processes and are zero-mean Gaussian random vectors with covariance matrices and , respectively. The observation in decision epoch is the average of the measurements obtained at the fast time-scale . Thus, the observation noise variance in (25) is scaled by the reciprocal of the target priority . In (24) and (25) for a GMTI system,

(26)

Recall that is typically 0.1 seconds. The elements of correspond to range, azimuth, and range rate, respectively. Also, denotes the and position and speeds, respectively, and denotes the altitude, assumed to be constant, of the aircraft on which the GMTI radar is mounted.

2) Approximation by Linear Gaussian State Space Model: Starting with the nonlinear state space model (24), the aim below is to justify the use of the linearized model (1). We start with linearizing the model (24) as follows; see [18, Ch. 8.3]. For each target , consider a nominal deterministic target trajectory and nominal measurement , where . Defining and , a first-order Taylor series expansion around this nominal trajectory yields (27) at the bottom of the next page, where and for some . In the above equation, is the Jacobian matrix defined as (for simplicity we omit the superscript for target )

(28)


where denotes the relative position of the target with respect to the platform, and denote the relative velocities. Since the target is ground based and the platform is at constant altitude, is a constant.
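A Jacobian of the form (28) can be checked numerically by finite differences. The measurement equations below are standard range/azimuth/range-rate forms and are our assumptions (the exact symbols of (25) were lost in extraction); the state layout (px, vx, py, vy) and all numbers are likewise hypothetical.

```python
import math

def h(x, platform):
    """Assumed GMTI observation: [range, azimuth, range rate] of a ground
    target with state x = (px, vx, py, vy) relative to a platform at
    (Px, Py) with constant altitude H. Standard textbook forms, not
    verbatim from the paper."""
    Px, Py, H = platform
    dx, dy = x[0] - Px, x[2] - Py
    r = math.sqrt(dx * dx + dy * dy + H * H)   # slant range
    az = math.atan2(dy, dx)                    # azimuth
    rdot = (dx * x[1] + dy * x[3]) / r         # range rate
    return [r, az, rdot]

def jacobian_fd(x, platform, eps=1e-3):
    """Central finite-difference approximation of the 3x4 Jacobian dh/dx."""
    J = []
    for i in range(3):
        row = []
        for j in range(4):
            xp = list(x); xp[j] += eps
            xm = list(x); xm[j] -= eps
            row.append((h(xp, platform)[i] - h(xm, platform)[i]) / (2 * eps))
        J.append(row)
    return J

x = [30000.0, 10.0, 5000.0, -5.0]        # hypothetical target state
J = jacobian_fd(x, platform=(0.0, 0.0, 10000.0))
# Sanity check one analytic entry: d(range)/d(px) = dx / r.
r = math.sqrt(30000.0 ** 2 + 5000.0 ** 2 + 10000.0 ** 2)
assert abs(J[0][0] - 30000.0 / r) < 1e-6
```

In a tracker, such a numerical Jacobian is a useful cross-check on the hand-derived entries of (28) before they are used in the extended Kalman filter.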

In (27), denotes the 3 × 4 × 4 Hessian tensor. By evaluating this Hessian tensor for typical operating modes and , we show below that

(29)

The first inequality above says that the model is approximately linear, in the sense that the ratio of linearization error to linear term is small; the second inequality says that the model is approximately time-invariant, in the sense that the relative magnitude of the error between linearizing around and is small. Therefore, on the micro-manager time-scale, the target dynamics can be viewed as a linear time-invariant state space model (1).

Justification of (29): Using typical GMTI operating parameters, we evaluate the bounds in (29). Denote the state of the platform (aircraft) on which the radar is situated as 35 000 m, 100 m/s, 15 000 m, 20 m/s. Then the platform height is , where is the depression angle, typically between 10° and 25°. We assume a depression angle of below, yielding 10203.2 m. Next, consider typical behavior of ground targets with speed 15 m/s (54 km/h) and select the following significantly different initial target state vectors (denoted by superscripts )

(30)

(31)

Now, propagate these initial states using the target model with 0.1 s, 0.5 , 20 m, 5 m/s, 0.5 , with a true track variability parameter (used for true track simulation as and ). Define (see (29), and recall that for some )

TABLE I
RATE OF CHANGE OF JACOBIAN FOR VARIOUS RUNNING TIMES

TABLE II
RATIO OF SECOND-ORDER TO FIRST-ORDER TERM OF TAYLOR SERIES EXPANSION FOR

TABLE III
RATIO OF SECOND-ORDER TO FIRST-ORDER TERM OF TAYLOR SERIES EXPANSION FOR

Tables I to III show how and evolve with iteration . The entries in the tables are small, thereby justifying the linear time-invariant state space model (1).

Remark: Since a linear Gaussian model is an accurate approximate model, most real GMTI trackers use an extended Kalman filter. Approximate nonlinear filtering methods, such as sequential Markov chain Monte Carlo methods (particle filters), are not required.

B. Numerical Example 1: Target Fly-By

With the above justification of the model (1), we present the first numerical example. Consider ground targets that are tracked by a GMTI platform, as illustrated in Fig. 2. The nominal range from the GMTI sensor to the target region is approximately 30 km. For this example, the initial (at the start of the micro-manager cycle) estimated and true target states of the four targets are given in Table IV.

We assume in this example that the most uncertain target is regarded as being the highest priority. Based on the initial states

with probability ,
with probability .   (27)


Fig. 2. Target fly-by scenario. The GMTI platform (aircraft) moves with constant altitude and velocity at nominal range 30 km from the target region. ( is defined in (28).) Initial states of the four targets are specified in Table IV.

TABLE IV
INITIAL TARGET STATES AND ESTIMATES

and estimates in Table IV, the mean square error values are MSE , MSE , MSE , and MSE . Thus, target is the most uncertain and is allocated the highest priority. So we denote .

The simulation parameters are as follows: sampling time 0.1 s (see Section V-A1); probability of detection (for all targets, so the superscript is omitted); track standard deviations of target model 0.5 m; measurement noise standard deviations 20 m, 5 m/s; and platform states 10 km, 53 m/s, 30 km, 85 m/s. We assume a target priority vector of . Recall from (25) that the target priority scales the inverse of the covariance of the observation noise. We chose an operating cost of , and the stopping cost of specified in (11), with constants . The parametrized policy chosen for this example was defined in (20). We used the SPSA algorithm (Algorithm 1) to estimate the parameter that optimizes the objective (21). Since SPSA converges to a local minimum, several initial conditions were evaluated via a random search.

Fig. 3 explores the sensitivity of the sample-path cost (achieved by the parametrized policy) with respect to the probability of detection, , and the operating cost, . The sample-path cost increases with and decreases with . Larger values of the operating cost, , cause the radar micro-manager to specify the "stop" action sooner than for lower values of . As can be seen in the figure, neither the sample-path cost nor the average stopping time is particularly sensitive to changes in the probability of detection. However, as expected, varying the operating cost has a large effect on both the sample-path cost and the associated average stopping time.

Fig. 3. Dependence of the sample-path cost achieved by the parametrized policy on the probability of detection, , and the operating cost, . The sample-path cost increases with the operating cost, but decreases with the probability of detection. Note the stopping times associated with the labelled vertices above.

Fig. 4 compares the optimal parametrized policy with periodic myopic policies. Such periodic myopic policies stop at a deterministic prespecified time (without considering state information) and then return control to the macro-manager. The performance of the optimal parametrized policy is measured using multiple initial conditions. As seen in Fig. 4, the optimal parametrized policy is the lower envelope of all possible periodic stopping times, for each initial condition. The optimal periodic policy is highly dependent upon the initial condition. The main performance advantage of the optimal parametrized policy is that it achieves virtually the same cost as the optimal periodic policy for any initial condition.

C. Numerical Example 2: Persistent Surveillance

As mentioned in Section II-A, persistent surveillance involves exhaustive surveillance of a region over long time intervals, typically over a period of several hours or weeks [19], and is useful in providing critical, long-term battlefield information. Fig. 5 illustrates the persistent surveillance setup. Here is the nominal range from the target region to the GMTI platform, assumed in our simulations to be approximately 30 km. The points on the GMTI platform track labeled (1)–(72) correspond to locations⁴ where we evaluate the Jacobian (28). Assume a constant platform orbit speed of 250 m/s (or approximately 900 km/h [20]) and a constant altitude of approximately 5000 m. Assuming 72 divisions along the 30 km radius orbit, the platform sensor takes 10.4 s to travel between the track segments. Using a similar analysis to the Appendix, the measurement model changes less than 5% in -norm in 10.4 s; thus

⁴The platform state at location is defined as .


Fig. 4. Plot of sample-path cost of periodic policies and the parametrized policy (thick-dashed line) versus initial conditions. These initial conditions are ordered with respect to the cost achieved using the parametrized policy for that particular initial condition. Notice that the sample-path cost is the lower envelope of all deterministic stopping times for any initial condition. (a) Sample-path cost. (b) Magnified region.

the optimal parameter vector is approximately constant on each track segment.

Simulation parameters for this example are as follows: number of targets ; sampling time 0.1 s; probability of detection ; track variances of target model 0.5 m; and measurement noise parameters 20 m, 5 m/s. The platform velocity is now changing (assume a constant speed of 250 m/s), unlike the previous example, which assumed a constant-velocity platform. Since the linearized model will be different at each of the prespecified points (1)–(72) along the GMTI track, we computed the optimal parametrized policy at each of the respective locations. The radar manager then switches between these policies depending on the estimated position of the targets.

We consider Case 4 of Section III-A, where the radar devotes all its resources to one target, and none to the other targets. That is, we assume a target priority vector of . In this case, the first target is allocated a Kalman filter, with

Fig. 5. Representation of the persistent surveillance scenario in GMTI systems. The GMTI platform (aircraft) orbits the target region in order to obtain persistent measurements as long as targets remain within the target region. The nominal range from the platform to the target region is assumed to be 30 km.

all the other targets allocated measurement-free Kalman predictors. Since the threshold parametrization vectors depend on the target's state and measurement models, the first target has a unique parameter vector, while targets all have the same parameter vectors. Also, for all .

We chose in stopping cost (11) (average mutual information difference stopping cost). The parametrized policy considered was in (20). The optimal parametrized policy was computed using Algorithm 1 at each of the 72 locations on the GMTI sensor track. As the GMTI platform orbits the target region, we switch between these parametrized policy vectors, thus continually changing the adopted tracking policy. We implemented the following macro-manager: . The priority vector was chosen as and for all . Fig. 6 shows log-determinants of each of the targets' error covariance matrices over multiple macro-management tracking cycles.

VI. CONCLUSION

This paper considers a sequential detection problem with mutual information stopping cost. Using lattice programming, we prove that the optimal policy has a monotone structure in terms of the covariance estimates (Theorem 1). The proof involved showing monotonicity of the Riccati and Lyapunov equations (Theorem 2). Several examples of parametrized decision policies that satisfy this monotone structure were given. A simulation-based adaptive filtering algorithm (Algorithm 1) was given to estimate the parametrized policy. The sequential detection problem was illustrated in a GMTI radar scheduling problem with numerical examples. In related work, [11] derives


Fig. 6. Plot of log-determinants for each target over multiple scheduling intervals. On each scheduling interval, a Kalman filter is deployed to track one target and Kalman predictors track the remaining 3 targets. The bold line corresponds to the target allocated the Kalman filter by the micro-manager in each scheduling interval. Initially a Kalman filter is deployed on target . Data points marked in red indicate missing observations.

threshold policies for partially observed Markov decision processes in sequential detection. In contrast to the current paper, [11] uses the monotone likelihood ratio partial order for the underlying hidden Markov model state filter.

APPENDIX

PROOF OF THEOREM 1 AND THEOREM 2

This Appendix presents the proof of the main result, Theorem 1. Appendix A presents the value iteration algorithm and the supermodularity concepts that will be used as the basis of the inductive proof. The proof of Theorem 1 in Appendix B uses lattice programming [21] and depends on certain monotone properties of the Kalman filter Riccati and Lyapunov equations. These properties are proved in Theorem 2 in Appendix C.

A. Preliminaries

We first rewrite Bellman's equation (15) in a form that is suitable for our analysis. Define

where … if … , … otherwise,

(32)

In (32), we have assumed that the missed observation events in (25) are statistically independent between targets, and so . Actually, the results below hold for any joint distribution of missed observation events (and therefore allow these events to be dependent between targets). For notational convenience we assume (25) and independent missed observation events.

Clearly, and the optimal decision policy satisfy Bellman's equation

(33)

Our goal is to characterize the stopping set defined as

Since the function is translation invariant (that is, for any functions and ), both the stopping set and optimal policy in these new coordinates are identical to those in the original coordinate system (16), (15).

Value Iteration Algorithm: The value iteration algorithm will be used to construct a proof of Theorem 1 by mathematical induction. Let denote the iteration number. The value iteration algorithm is a fixed-point iteration of Bellman's equation and proceeds as follows:

(34)
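The fixed-point iteration (34) is easy to run for a toy scalar-covariance version of the stopping problem. Everything below (the scalar Riccati constants, the operating cost, the logarithmic stopping cost) is a stand-in chosen for illustration, not the paper's model; the point is that value iteration produces a threshold, hence monotone, policy.

```python
import math

def riccati(P, q=0.2, r=1.0):
    """Scalar Kalman Riccati update with A = C = 1: P -> P - P^2/(P+r) + q."""
    return P - P * P / (P + r) + q

grid = [0.05 * k for k in range(1, 201)]       # covariance grid (0, 10]
snap = lambda p: min(grid, key=lambda g: abs(g - p))   # project onto grid
next_P = {P: snap(riccati(P)) for P in grid}   # precomputed transition
c = 0.05                                       # per-step operating cost
stop_cost = lambda P: math.log(1.0 + P)        # monotone stopping cost

V = {P: 0.0 for P in grid}
for _ in range(200):                           # value-iteration sweeps (34)
    V = {P: min(stop_cost(P), c + V[next_P[P]]) for P in grid}

labels = ["stop" if stop_cost(P) <= c + V[next_P[P]] else "continue"
          for P in grid]
assert labels[1] == "stop"         # small covariance: cheap to stop now
assert labels[-1] == "continue"    # large covariance: keep measuring first
```

With these stand-in numbers, small covariances are stopped immediately (the stop cost is still cheap, and continuing only inflates it), while large covariances are measured down toward the Riccati fixed point before stopping; the switch between the two regimes occurs at a single threshold.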

Submodularity: Next, we define the key concept of submodularity [21]. While it can be defined on general lattices with an arbitrary partial order, here we restrict the definition to the posets and , where the partial orders and were defined above.

Definition 1 (Submodularity and Supermodularity [21]): A scalar function is submodular in if for . is supermodular if is submodular. A scalar function is sub/supermodular in each component of if it is sub/supermodular in each component . An identical definition holds with respect to on .

The most important feature of a supermodular (submodular) function is that decreases (increases)


in its argument ; see [21]. This is summarized in the following result.

Theorem 3 [21]: Suppose is submodular in , submodular in , supermodular in and supermodular in . Then there exists a , that is increasing in , decreasing in , increasing in and decreasing in .

Next, we state a well-known result (see [22] for proof) that the evolution of the covariance matrix in the Lyapunov and Riccati equations is monotone.

Lemma 3 [22]: and are monotone operators on the poset . That is, if , then and for all .
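In the scalar case, Lemma 3 can be verified directly. The constants below are arbitrary stand-ins; note that a = 1 mirrors the unstable, eigenvalue-at-1 dynamics of the tracking model (24).

```python
# Numerical sanity check of Lemma 3 in the scalar case: the Riccati and
# Lyapunov operators preserve the covariance ordering P1 <= P2.
a, c, q, r = 1.0, 1.0, 0.5, 2.0   # stand-in model constants; a = 1 (not stable)

def lyapunov(P):
    """Prediction-only (missed measurement) update: P -> a*P*a + q."""
    return a * P * a + q

def riccati(P):
    """Measurement update followed by prediction (scalar Kalman filter)."""
    return a * (P - P * c * c * P / (c * c * P + r)) * a + q

for P1 in (0.1, 1.0, 5.0):
    for P2 in (0.1, 1.0, 5.0):
        if P1 <= P2:
            assert lyapunov(P1) <= lyapunov(P2)   # Lyapunov is monotone
            assert riccati(P1) <= riccati(P2)     # Riccati is monotone
```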

Finally, we present the following lemma, which states that the stopping costs (stochastic observability) are monotone in the covariance matrices. The proof of this lemma depends on Theorem 2, the proof of which is given in Appendix C below.

Lemma 4: For in Case 1 (9), Case 2 (10), and Case 3 (11), the cost defined in (32) is decreasing in and increasing in . (Case 4 is a special case when for all .)

Proof: For Case 1 and Case 2, let or , respectively. From (32), with denoting determinant,

(35)

For Case 3,

(36)

Theorem 2 shows that and are decreasing in and for all .

B. Proof of Theorem 1

Proof: The proof is by induction on the value iteration algorithm (34). Note that defined in (34) is decreasing in and increasing in via Lemma 3.

Next, assume is decreasing in and increasing in . Since and are monotone increasing in and , it follows that the term is decreasing in and increasing in in (34). Next, it follows from Lemma 4 that is decreasing in and increasing in . Therefore, from (34), inherits this property. Hence, is decreasing in and increasing in . Since value iteration converges pointwise, i.e., pointwise, it follows that is decreasing in and increasing in .

Therefore, is decreasing in and increasing in . This implies is submodular in , submodular in , supermodular in and supermodular in . Therefore, from Theorem 3, there exists a version of that is increasing in and decreasing in .

C. Proof of Theorem 2

We start with the following lemma.

Lemma 5: If matrices and are invertible, then for conformable matrices and

(37)

Proof: The Schur complement formulas applied to yield

Taking determinants yields (37).

Theorem 2i): Given positive definite matrices and , and arbitrary matrix , is decreasing in , or equivalently, .

Proof: Applying (37) with

(38)

Since , then and thus . Since positive definite dominance implies dominance of determinants, it follows that

Using (38), the result follows.
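The step "positive definite dominance implies dominance of determinants" states that A ⪰ B ≻ 0 implies det(A) ≥ det(B). A randomized spot check:

```python
import numpy as np

# If A >= B > 0 in the Loewner (positive semidefinite) order, then
# det(A) >= det(B). Spot check over random instances.

rng = np.random.default_rng(3)
ok = True
for _ in range(100):
    n = int(rng.integers(2, 6))
    M = rng.standard_normal((n, n))
    B = M @ M.T + np.eye(n)          # B > 0
    D = rng.standard_normal((n, n))
    A = B + D @ D.T                  # A >= B in the Loewner order
    ok &= np.linalg.det(A) >= np.linalg.det(B) - 1e-9
print(ok)  # True
```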


Theorem 2ii): Given positive definite matrices and an arbitrary matrix , is decreasing in . That is, for

(39)

Proof: Using the matrix inversion lemma

(40)
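The display (40) is not recoverable from this copy; the matrix inversion lemma in its generic Woodbury form reads (A + U C V)⁻¹ = A⁻¹ − A⁻¹ U (C⁻¹ + V A⁻¹ U)⁻¹ V A⁻¹, which can be verified numerically:

```python
import numpy as np

# Woodbury matrix inversion lemma (generic form, assumed here):
#   (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}

rng = np.random.default_rng(4)
n, k = 4, 2
A = rng.standard_normal((n, n)) + 4 * np.eye(n)   # diagonal shift: invertible
U = rng.standard_normal((n, k))
Cm = rng.standard_normal((k, k)) + 4 * np.eye(k)
V = rng.standard_normal((k, n))

Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ Cm @ V)
rhs = Ai - Ai @ U @ np.linalg.inv(np.linalg.inv(Cm) + V @ Ai @ U) @ V @ Ai

print(np.allclose(lhs, rhs))  # True
```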

Applying the identity (37) with , we have

(41)

Further, using (37) with , we have

(42)

Substituting (42) into (41),

(43)

(44)

From (44) and (40)

(45)

We are now ready to prove the result. Since :

• ;
• ;
• ;
• .

Therefore, (39) follows from the following inequality:

ACKNOWLEDGMENT

The contribution of author M. Gevers was limited to the proofs of the submodularity properties in the Appendix. The contribution of author E. Miehling was to code the algorithms proposed in the paper and prepare the numerical examples in Section V.

REFERENCES

[1] R. R. Mohler and C. S. Hwang, “Nonlinear data observability and information,” J. Franklin Inst., vol. 325, no. 4, pp. 443–464, 1988.

[2] A. Logothetis and A. Isaksson, “On sensor scheduling via information theoretic criteria,” in Proc. Amer. Control Conf., San Diego, CA, 1999, pp. 2402–2406.

[3] A. R. Liu and R. R. Bitmead, “Stochastic observability in network state estimation and control,” Automatica, vol. 47, pp. 65–78, 2011.

[4] E. Grossi and M. Lops, “MIMO radar waveform design: A divergence-based approach for sequential and fixed-sample size tests,” in Proc. 3rd IEEE Int. Workshop on Comput. Adv. Multi-Sensor Adapt. Process., 2009, pp. 165–168.

[5] D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 2000, vol. 1 and 2.

[6] B. Bhashyam, A. Damini, and K. Wang, “Persistent GMTI surveillance: Theoretical performance bounds and some experimental results,” in Radar Sensor Technology XIV. Orlando, FL: SPIE, 2010.

[7] R. Whittle, “Gorgon Stare broadens UAV surveillance,” Aviation Week, Nov. 3, 2010.

[8] D. P. Heyman and M. J. Sobel, Stochastic Models in Operations Research. New York: McGraw-Hill, 1984, vol. 2.

[9] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems. Norwood, MA: Artech House, 1999.

[10] J. Wintenby and V. Krishnamurthy, “Hierarchical resource management in adaptive airborne surveillance radars—A stochastic discrete event system formulation,” IEEE Trans. Aerosp. Electron. Syst., vol. 20, no. 2, pp. 401–420, Apr. 2006.

[11] V. Krishnamurthy, “Bayesian sequential detection with phase-distributed change time and nonlinear penalty—A POMDP lattice programming approach,” IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 7096–7120, Oct. 2011.

[12] V. Krishnamurthy and D. V. Djonin, “Optimal threshold policies for multivariate POMDPs in radar resource management,” IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3954–3969, 2009.

[13] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley-Interscience, 2006.

[14] R. Evans, V. Krishnamurthy, and G. Nair, “Networked sensor management and data rate control for tracking maneuvering targets,” IEEE Trans. Signal Process., vol. 53, no. 6, pp. 1979–1991, Jun. 2005.

[15] O. Hernández-Lerma and J. B. Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria. New York: Springer-Verlag, 1996.

[16] J. Spall, Introduction to Stochastic Search and Optimization. New York: Wiley, 2003.

[17] G. Pflug, Optimization of Stochastic Models: The Interface Between Simulation and Optimization. Norwell, MA: Kluwer, 1996.

[18] A. H. Jazwinski, Stochastic Processes and Filtering Theory. New York: Academic, 1970.

[19] R. D. Rimey, W. Hoff, and J. Lee, “Recognizing wide-area and process-type activities,” in Proc. 10th Int. Conf. Inf. Fusion, Jul. 2007, pp. 1–8.

[20] U.S. Air Force, E-8C Joint STARS Fact Sheet [Online]. Available: http://www.af.mil/information/factsheets/factsheet.asp?id=100, Sep. 2007.

[21] D. M. Topkis, Supermodularity and Complementarity. Princeton, NJ: Princeton Univ. Press, 1998.


[22] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979.

Vikram Krishnamurthy (F’05) was born in 1966. He received the Bachelor’s degree from the University of Auckland, New Zealand, in 1988 and the Ph.D. degree from the Australian National University, Canberra, in 1992.

He is currently a Professor and holds the Canada Research Chair at the Department of Electrical Engineering, University of British Columbia, Vancouver, Canada. His current research interests include computational game theory, stochastic dynamical systems for modeling of biological ion channels, and stochastic optimization and scheduling.

Dr. Krishnamurthy served as Distinguished Lecturer for the IEEE Signal Processing Society in 2009 and 2010. For the term 2010 to 2012, he is serving as Editor-in-Chief of the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING. He has served as Associate Editor for several journals, including the IEEE TRANSACTIONS ON AUTOMATIC CONTROL and the IEEE TRANSACTIONS ON SIGNAL PROCESSING.

Robert R. Bitmead (F’91) was born in Sydney, Australia, in 1954. He received the B.Sc. degree in applied mathematics from the University of Sydney, Sydney, in 1976 and the M.E. and Ph.D. degrees in electrical engineering from the University of Newcastle, Australia, in 1977 and 1979, respectively.

He currently holds the Cymer Corporation Endowed Chair in the Department of Mechanical and Aerospace Engineering, University of California, San Diego, CA, where he has been on the faculty since 1999. He held faculty positions at the Australian National University from 1982 to 1999 and James Cook University of North Queensland from 1980 to 1982. He has held visiting faculty positions at Cornell University, the University of Louvain, INRIA, Kyoto University, the Norwegian University of Science & Technology, Politecnico di Milano, ETH Zurich, and Melbourne University. His research is in the areas of adaptive systems, estimation, control design, modeling, and telecommunications.

Dr. Bitmead is a Fellow of the Australian Academy of Technological Sciences and Engineering and the International Federation of Automatic Control.

Michel Gevers (F’90) was born in Antwerp, Belgium, in 1945. He received the Electrical Engineering degree from the Université Catholique de Louvain, Belgium, in 1968 and the Ph.D. degree from Stanford University, Stanford, CA, in 1972, under the supervision of T. Kailath.

He is Professor Emeritus at the Department of Mathematical Engineering at the Université Catholique de Louvain, Louvain-la-Neuve, Belgium, and part-time Professor at the Department ELEC of the Vrije Universiteit Brussel. For 20 years, he has been the coordinator of the Belgian Interuniversity Network DYSCO (Dynamical Systems, Control, and Optimization) funded by the Federal Ministry of Science. This is a network of excellence in systems, control, and optimization, which counts about 55 academics and almost 200 Ph.D. students and postdoctoral researchers. He has spent long-term visits at the University of Newcastle, Australia; the Technical University of Vienna; and the Australian National University. He has published more than 250 journal and conference papers, and coauthored two books: one with R. R. Bitmead and V. Wertz titled Adaptive Optimal Control—The Thinking Man’s GPC (Prentice-Hall, 1990), and one with G. Li titled Parameterizations in Control, Estimation and Filtering Problems: Accuracy Aspects (Springer-Verlag, 1993). His present research interests are in system identification and its interconnection with robust control design, optimal experiment design for control, and data-based control design.

Dr. Gevers holds an Honorary Degree (Doctor Honoris Causa) from the University of Brussels and the University of Linköping, Sweden. He was President of the European Union Control Association (EUCA) from 1997 to 1999 and Vice-President of the IEEE Control Systems Society in 2000 and 2001. He was Associate Editor of Automatica and of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL. He is presently Associate Editor at Large of the European Journal of Control and Associate Editor of Mathematics of Control, Signals, and Systems (MCSS). He is an IFAC Fellow and a Distinguished Member of the IEEE Control Systems Society.

Erik Miehling received the Bachelor’s degree in electrical engineering from the University of British Columbia, Vancouver, BC, Canada, in 2009, and the Master’s degree in electrical engineering, also from the University of British Columbia, in 2011.

His research interests include stochastic control and resource management with applications in defense.