
A COMPARISON OF STATISTICAL ALGORITHMS FOR

LANDMINE DETECTION

by

Peter Acerbo Torrione

Department of Electrical and Computer Engineering
Duke University

Date:
Approved:

Dr. Leslie Collins, Supervisor

Dr. Gary Ybarra

Dr. Gregg Trahey

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science

in the Department of Electrical and Computer Engineering in the Graduate School of

Duke University

2002

Contents

List of Tables
List of Figures

1 Introduction

2 Background
2.1 Electromagnetic Induction Systems
2.1.1 Physics of EMI Systems
2.1.2 The GEM-3 Sensor
2.2 Data Collection
2.3 Parameter Estimation and the Cramer-Rao Lower Bound
2.4 The Detection Problem: Likelihood Ratios and Generalized Likelihood Ratios
2.4.1 The Likelihood Ratio Test
2.4.2 The Generalized Likelihood Ratio Test
2.4.3 The Matched Filter
2.5 Linear Algebra Preliminaries and Matched Subspace Detectors
2.5.1 Linear Algebra Preliminaries
2.5.2 Invariance of Hypothesis Testing Problems
2.5.3 Invariance Tests and Maximal Invariant Statistics
2.5.4 Matched Subspace Detectors
2.6 Support Vector Machines
2.6.1 Problem Statement and the Vapnik-Chervonenkis Dimension
2.6.2 Kernel Functions and Avoiding the Complexities of a High Dimensional Space
2.6.3 Finding the Optimal Hyperplane

3 The Cramer-Rao Lower Bound
3.1 Additive White Noise
3.2 Additive White Noise and DC Term (in-phase)
3.3 Additive White Noise and Additive Function of Frequency (model 1 quadrature)
3.4 Additive White Noise and Multiplicative Term (model 2 quadrature)

4 Signal Processing Using Matched Subspace Detectors
4.1 Properties of Estimated Landmine Responses
4.2 Basis Estimation
4.3 Designing the Matched Subspace Filter
4.4 Matched Subspace Results

5 Decay Rate Estimation
5.1 Decay Rates
5.2 Estimation Procedure
5.3 Gaussian Models and Detection
5.4 Decay Rate Estimation Results

6 Support Vector Machine Algorithms
6.1 Building the Support Vector Machine
6.2 Model and Parameter Selection and Implementation
6.3 Support Vector Machine Results

7 Conclusions and Future Work

Bibliography

List of Tables

2.1 Calibration grid landmine type and depth specifications

List of Figures

2.1 Calibration Lane Data Collection
2.2 Blind Lane Data Collection
2.3 Data separation in 2 Dimensions
2.4 Data separation in 3 Dimensions
3.1 Typical in-phase and quadrature background measurements versus log-frequency
3.2 Typical in-phase background measurements visibly shifted by some constant
3.3 Typical quadrature background measurements corrupted by some multiplicative constant, or some additive term which increases with frequency
3.4 Plots of the Cramer-Rao lower bound, calculated, and sample estimator variances versus the standard deviation of k. Parameters: $b_i = 10$, $\sigma_n^2 = 1$.
4.1 Signatures of VS-50 landmines versus log-frequency
4.2 Signatures of M-14 landmines versus log-frequency
4.3 Actual, mean, and estimated signatures of M-14 landmines
4.4 Comparison of filter bank outputs resulting from landmine and clutter responses. Note that the sum across the filter banks from the clutter response is larger than from the landmine response.
4.5 Comparison of in-phase and quadrature matched subspace receiver operating characteristics from the calibration grid
4.6 Comparison of quadrature matched subspace detector and baseline energy detector receiver operating characteristics from the blind and calibration grids
5.1 Estimation of VS-50 Response
5.2 Estimation of M-14 Response
5.3 Estimated landmine decay rates plotted against $\omega_1$ and $\omega_2$ in Hz. Each landmine type is represented by a different shape.
5.4 Estimated landmine decay rates plotted against $\omega_1$ and $\omega_2$ in Hz (close-up). Each landmine type is represented by a different shape. Note the high degree of spatial correlation between landmines of each type.
5.5 Estimated clutter decay rates plotted against $\omega_1$ and $\omega_2$ in Hz. Note that the estimated decay rates for clutter objects are spread throughout a wide frequency range.
5.6 Gaussian PDF contours with scattered landmine and clutter decay rates
5.7 ROC for Gaussian-PDF estimated decay rate-based detector operating in the calibration grid.
6.1 Support Vector Machine decision boundaries for non-rejecting SVMs and relevant landmine and clutter parameter locations from the calibration grid.
6.2 Receiver operating characteristics of non-rejecting support vector machines trained on decay rates, matched subspace outputs, and full signal responses operating in the calibration grid
6.3 Receiver operating characteristics of rejecting support vector machines trained on decay rates, matched subspace outputs, and full signal responses operating in the calibration grid
6.4 Receiver operating characteristics for three different support vector machines operating in the blind grid
7.1 Comparison of detector operating characteristics for matched subspace and support vector machines

Chapter 1

Introduction

Although estimates vary, agencies including the Red Cross and the United Nations

concede that there are between 60 and 70 million active landmines in the ground,

buried across 70 countries around the globe. Every year approximately 26,000 people

are maimed or killed by landmines and 8,000 to 10,000 of these victims are children

[1].

Currently, there are approximately 340 different models of anti-personnel land-

mines. Although these landmines cost as little as three dollars to produce, their

presence inflicts a tremendous cost - especially in developing areas. Firstly, the

cost to safely detect and remove each landmine can range between $300 and $1000.

Furthermore, many surviving landmine victims require prosthetic limbs. These

artificial limbs can cost between $100 and $3000, and they must be regularly replaced

(every 3-5 years in adults, and every 6 months in children) [1]. It is impossible to

measure the damage landmines inflict upon productivity, emotional well being, and

the peaceful reconciliation of neighbors after years of war.

As of 2002, the landmine crisis primarily affects poorer countries for which the

economic impact of landmines is especially devastating. There are an estimated 22.5

million landmines in Egypt, 16 million in Iran, and 10 million in Iraq, to list only

some of the most egregiously affected countries [1].

The primary contributor to the large cost of landmine removal is a high false alarm

rate stemming from large amounts of anthropic clutter that pervades minefields. Until

it is excavated and determined to not pose a threat, this clutter must be considered

as dangerous as an actual landmine. On occasion, false alarm rates as high as 95% have been reported when clearing minefields [2].

There are two distinct categories of landmine remediation: military and humani-

tarian [1]. The primary goal of military landmine removal is to clear a path through

a suspected minefield to allow troop movement in the area. Generally, this must

be accomplished quickly, usually at night, to avoid exposure to the enemy. Military

landmine clearance is often accomplished by driving large rollers, flails, or plows over

landmines to detonate them and clear a path [1]. Unfortunately, these techniques may

only achieve clearance rates of 80 percent, which is not an acceptable detection level

for humanitarian situations (the regulations for humanitarian de-mining are described

by the International Standards for Humanitarian Mine Clearance Operations, see

reference [3]). Humanitarian landmine removal is a much more arduous process gen-

erally involving indigenous workers using hand-held devices to locate possible targets

and safely remove them.

Landmines are indiscriminate killers, and while the UN is lobbying for a world-

wide ban on their use, concerted efforts are underway to remove landmines in areas

where they can cause harm to civilians. The goal of the research presented in this

thesis is to develop signal processing techniques that expedite accurate detection

of potential threats by decreasing the false alarm rates associated with currently

deployed landmine detectors while maintaining high detection rates.

Several novel sensor modalities have been investigated for discerning the locations

of buried landmines. Possible ground querying techniques include neutron backscat-

tering [4], ground penetrating radar [5, 6, 7, 8], seismic detectors [9], and acoustic-to-

seismic coupling [10]. While many of these technologies hold great promise for future

application, currently almost all fielded or nearly-fielded landmine detection systems

use electromagnetic induction (EMI) sensors, which operate on the same principles

as a standard metal-detector.


A large body of literature exists dealing with the applications of EMI sensors and

processing of EMI data to the detection of buried landmines and unexploded ordnance

(UXO). Some of this work has focused on determining the EMI responses from rota-

tionally symmetric bodies [11, 12] and development of simplified phenomenological

models to fit such responses [13]. Several researchers have explored the processing

of time-domain EMI responses for landmine detection using estimated decay rates

[14, 15, 16, 17, 18, 19]. Other work by Won et al. has indicated that the wideband

EMI spectral responses from different landmines are unique [20]. Gao et al. have

derived the complicated optimal wideband EMI detector and have compared its re-

sults to sub-optimal detectors [21, 22]. Additional signal processing research on the

detection and classification of low-metallic content landmines via EMI data has been

performed by Collins et al. [23, 24].

In this thesis, we build on this body of work in three ways. First, we will address

the problem of landmine response estimation via soil (background) removal and show that our proposed estimator achieves the Cramer-Rao lower bound under specific

statistical models of the received data. Second, we will apply the theory of matched

subspace detectors [25, 26, 27] to the detection and classification of landmines versus

clutter. Third, we will explore the possible applications of support vector machines

(SVMs) [28, 29, 30, 31, 32, 33] to the landmine detection problem.

The remainder of this thesis is organized as follows.

In chapter two we review some of the information fundamental to the rest of

the thesis. We begin with a brief overview of electromagnetic induction sensors, the

data collection procedure used, and the particular EMI sensor used in this study:

the GEM-3. This is followed by a review of the Cramer-Rao lower bound and some

linear algebra preliminaries to the matched subspace detector. A full treatment of

matched subspace detectors is given prior to discussing the derivation and properties


of support vector machines.

Chapter three focuses on applying the Cramer-Rao lower bound to background

response estimation. A method of estimating the received signal by subtracting an

estimate of the background signal is proposed. The performance of the estimation

procedure is considered under four different models of the received data. The esti-

mation procedure is shown to achieve the Cramer-Rao bound for three of the models

and to approach the bound for the fourth model.

Chapter four discusses the particulars of the matched subspace detector as ap-

plied to the landmine detection problem. This includes subspace basis estimation

procedures and energy pre-screening. Results from a blind field trial are presented.

Chapter five deals with the problems of decay rate estimation from frequency-

domain EMI data. We briefly discuss a simple detection technique based on multiple

Gaussian probability density functions.

Chapter six describes the application of support vector machines to the landmine

detection problem. Three different support vector machines are presented, and their

receiver operating characteristics from blind field trials are discussed.

In chapter seven we review the research covered by this thesis and present thoughts

on the results. A comparison of the results of the support vector machine and matched

subspace landmine detection techniques is presented. Possible avenues for future work

are also discussed.


Chapter 2

Background

2.1 Electromagnetic Induction Systems

In 1831 Michael Faraday made the discovery that a changing magnetic field can

generate or induce a current in a nearby conductor. Building upon Faraday's work, Maxwell formulated the four equations upon which all electromagnetics is

based. The phenomenology associated with EMI sensors (like hobby metal-detectors)

is based directly on these equations.

2.1.1 Physics of EMI Systems

A standard EMI sensor has a primary coil, or transmitter coil, composed of wire

through which alternating current flows. This current flow generates a changing

magnetic field around the sensor that penetrates the ground. As Faraday noted, the

changing magnetic field from the transmitter coil induces current flow in the ground.

The current flowing through the earth (and any contaminants therein) generates

another magnetic field. Thus, it is possible to use a receiver coil to listen for the

magnetic field that results from the induced current flow in the earth. Of course, care

must be taken in the placement of and recording of measurements from the receiver

coil since the magnetic field of the transmitter will, in general, be much stronger than

the secondary field resulting from the earth's response. The magnitude and phase of

the measured wideband EMI responses can be used to discern the amount, type, and

shape of buried metal objects [20, 34, 35].

Although Maxwell's equations completely govern the responses of conducting materials in any shape and orientation, solving these equations for shapes of arbitrary complexity is mathematically difficult.

It has been shown [12, 36] that the frequency-domain response of a buried highly

conducting object subject to EMI radiation can be modeled as:

$$H(\omega) = a + \sum_n \frac{b_n}{\omega - j\omega_n} \tag{2.1}$$

Furthermore, the initial term a has been shown to be non-zero only for ferrous

targets [37]. Similarly, the time-domain response of such a system has been shown

[14, 38, 39] to be the weighted sum of exponentials:

$$S(t) = a\,\delta(t) + \sum_n A_n e^{-\omega_n t} \tag{2.2}$$

where, since the real part of the pole $j\omega_n$ is negligible, $\omega_n$ is real. In practice, the actual

responses of buried targets are well approximated by the first few terms in each of

the above summations. The primary parameters of interest are often assumed to be

the first few decay rates: $\omega_1$ and $\omega_2$. A significant amount of work has focused on the

application of estimated decay rates to landmine detection [14, 15, 11, 19, 17, 18, 40].

For high metal-content objects, the primary decay rates are generally fairly small,

resulting in slowly decaying exponential responses. Such responses are relatively easy

to sample in the time-domain. However, for objects containing small amounts of

metal (like most modern landmines), the decay rate parameters are very large and

the resulting exponential signature decays very rapidly. This makes time-domain

measurement of the decay rates difficult due to the rate at which the signal decays.
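The contrast between slowly and rapidly decaying responses can be illustrated with a minimal Python sketch. It evaluates a weighted sum of decaying exponentials and computes how long a single term takes to fall to 1% of its initial value. The decay rates and the helper names (`time_to_fraction`, `response`) are hypothetical, chosen only for illustration; they are not values measured in this work.

```python
import math

def time_to_fraction(decay_rate, fraction=0.01):
    """Time for exp(-decay_rate * t) to fall to `fraction` of its initial value."""
    return -math.log(fraction) / decay_rate

def response(t, amplitudes, decay_rates):
    """Weighted sum of decaying exponentials: sum_n A_n * exp(-w_n * t)."""
    return sum(a * math.exp(-w * t) for a, w in zip(amplitudes, decay_rates))

# Hypothetical decay rates (1/s), for illustration only.
slow_rate = 1.0e3   # stands in for a high metal-content object
fast_rate = 1.0e5   # stands in for a low metal-content object

t_slow = time_to_fraction(slow_rate)
t_fast = time_to_fraction(fast_rate)
print(f"slow: {t_slow * 1e3:.3f} ms, fast: {t_fast * 1e3:.3f} ms")
```

Under these assumed values the fast response reaches the 1% level one hundred times sooner than the slow one, which is the sampling difficulty described above.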

In this work, a wideband frequency-domain EMI sensor is utilized. Since a wideband frequency-domain sensor's responses are not time dependent, these sensors are advantageous when measuring quickly decaying exponential signals.


2.1.2 The GEM-3 Sensor

In this work, data from a Geophex GEM-3 sensor was used [41]. This section describes

the GEM-3 sensor.

The GEM-3 is a wideband digital electromagnetic sensor weighing about 10

pounds. The sensor head of the GEM-3 consists of three concentric coils. The

inner coil is the receiver coil, and the two outer coils comprise the transmitter coil.

The combination of the magnetic fields induced by the outer coils creates a magnetic

cavity (area with zero magnetic field) at the receiving coil. This prevents interference

between the transmitted and induced magnetic fields [42].

When operating as a wideband sensor the GEM-3 prompts for a set of frequencies

at which to collect the induced EMI response. The GEM-3 can operate at frequencies

between 30 Hz and 24 kHz. In this work, the GEM-3 was programmed to collect data

at the following ten frequencies:

750 1410 2370 4050 6030 8250 10890 14430 19450 23970 Hz

A sensor that operates at multiple frequencies has the advantage of being able to see at multiple depths into the medium, since a low-frequency signal will penetrate further into the medium than a high-frequency signal. It has been previously

shown that the GEM-3 performs significantly better for discriminating landmines

from clutter than several other sensors at blind, government-run test sites [43].

It has also been established that different types of landmines generate unique

frequency-domain signatures, which are relatively independent of target-sensor ori-

entation and distance for high metal content objects [20, 44]. However, the signatures

are dependent on target-sensor orientation and distance if the object's metal content

is low [43]. Recent work has also shown that these signatures change when the objects

are buried [43]. The goal of this research is to develop algorithms that reduce the effects of the soil on the measured signal and maximize the detection and classification

of landmines using their frequency-domain EMI signatures.

2.2 Data Collection

The GEM-3 data used in this work was taken from a government test site in Virginia. The site is segmented into a large (50 m x 20 m) grid consisting of squares measuring 1 meter per side. Before the site was used as a testing ground, all of the anthropic clutter was systematically removed from it. Some clutter was subsequently replaced to provide discrete opportunities for clutter-induced false alarms. At the center of each 1 m x 1 m grid square a landmine, a clutter item, or nothing is emplaced. Ground truth, i.e. the object buried in each square, is sequestered for this area and is known only to the government sponsor. A separate area measuring 25 m by 5 m was designated for sensor calibration and algorithm testing. The ground truth for the calibration section is available to the public so that algorithms can be tested prior to application on the blind grid.

The calibration data used for algorithm training in this work was recorded from

various spots throughout the calibration grid. In all, 20 clutter responses and 27

landmine signatures from 12 different landmine types at varying depths were col-

lected from the calibration lanes. Data from 980 potential targets was measured in

the blind grid. In the calibration lanes, where the ground truth is known, two back-

ground measurements were taken from either side of the center target location as

shown in Fig. 2.1. In the blind grid, measurements alternated between background

and potential targets at the locations shown in Fig. 2.2. All of the central and background measurements were taken by human operators. Although the sensor height is

approximately constant across all measurements, variations are bound to exist due to

uneven ground, operator height and posture, and other factors. Thus, sensor height is essentially a random variable.

[Figure 2.1: Calibration Lane Data Collection; grid squares are 1 m x 1 m]

In summary, for each calibration-grid data point two unique background signals

were measured on each side of the possible target location. For each blind grid data

point there are two shared background signals for each square with the exception of

the first and last squares from each column.

Table 2.1 indicates the depths and number of occurrences of each landmine type in the calibration area. In the table, HE indicates that high explosive is present. For additional information regarding the data collection, please see the Hand Held Metallic

Mine Detector Performance Baselining Collection Plan [45, 46].

2.3 Parameter Estimation and the Cramer-Rao Lower Bound

Estimating an unknown parameter from data is a research topic that has been studied extensively [47, 27, 48]. In this section two standard approaches to parameter estimation and the Cramer-Rao bound, which places limits on the best possible unbiased estimator, are discussed.

[Figure 2.2: Blind Lane Data Collection; grid squares are 1 m x 1 m]

Minetype      Number of measurements   Depth Range (in)
VS-50         5                        0 - 2.25
TS-50         3                        0 - 1.75
M-14          3                        0.25 - 1.75
M-14 (HE)     2                        0.5 - 1.125
PMA-3         2                        0 - 1.5
VAL69         1                        0
VS-2.2        2                        1.50 - 3
M-19          2                        1.25 - 2.5
TMA-4         2                        1.75 - 3
TM62P3        2                        1.50 - 3
T-72          1                        1.25
TM-46         1                        3
VS1.6         1                        1

Table 2.1: Calibration grid landmine type and depth specifications

Consider a data set $x$ consisting of data points $x_i$ drawn from some distribution $F$ with parameter $\theta$: $F(x, \theta)$. The goal of an estimator is to predict the value of $\theta$ using only the set of data given and (possibly) some prior knowledge of $F$. The estimated value is then referred to as $\hat{\theta}$. $\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}|x) = \theta$ (where $E$ represents the expected value). $\hat{\theta}$ is said to be a consistent estimator of $\theta$ if the variance of $\hat{\theta}$, $E((\hat{\theta} - \theta)^2|x)$, tends toward zero with probability one as the size of the data set grows to infinity.

There are two common approaches to parameter estimation: Bayesian and Maximum Likelihood [47]. In Bayesian estimation one assumes a prior distribution $F(\theta)$ on the parameter of interest. One then considers the distribution $F_x(x|\theta)$, and

$$\hat{\theta} = E(\theta|x) = \int \theta\, f(\theta|x)\, d\theta \tag{2.3}$$

where:

$$f(\theta|x) = \frac{f(x|\theta)\, f(\theta)}{f(x)} \tag{2.4}$$

In maximum likelihood estimation one considers the density $f(x, \theta)$ and, given a set of data $x$, chooses $\hat{\theta}$ to maximize $f(x, \theta)$.
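As an illustration of the two approaches, the following sketch assumes a scalar Gaussian model that is not taken from this thesis: data $x_i = \theta + n_i$ with known noise variance, and a Gaussian prior on $\theta$. Under those assumptions the maximum likelihood estimate is the sample mean, and the Bayesian estimate (2.3) has a closed form. All numerical values and function names are hypothetical.

```python
import random

def ml_estimate(data):
    """Maximum likelihood estimate of a Gaussian mean: the sample mean."""
    return sum(data) / len(data)

def bayes_estimate(data, noise_var, prior_mean, prior_var):
    """Posterior mean E(theta | x) for a Gaussian likelihood and Gaussian prior."""
    n = len(data)
    return (prior_var * sum(data) + noise_var * prior_mean) / (n * prior_var + noise_var)

rng = random.Random(0)
theta_true, noise_var = 2.0, 1.0          # hypothetical values
data = [rng.gauss(theta_true, noise_var ** 0.5) for _ in range(5)]

theta_ml = ml_estimate(data)
theta_bayes = bayes_estimate(data, noise_var, prior_mean=0.0, prior_var=1.0)
print(theta_ml, theta_bayes)
```

With a zero-mean, unit-variance prior and unit noise variance, the Bayesian estimate is the sample mean shrunk toward the prior mean by the factor n/(n+1); as the prior variance grows, the two estimates coincide.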

Often it is difficult to derive, implement, or show that the optimal estimator exists for a given problem [47]. Although consistency guarantees that the variance of an estimate tends to zero, the variance of some estimators approaches zero more quickly than that of others. It is useful to determine if a given estimator approaches or achieves the statistics of the best possible estimator; the Cramer-Rao lower bound (CRLB) provides such a tool [47, 49]. The CRLB is the smallest variance that an unbiased estimator can achieve on a given set of data. If an unbiased estimator achieves this bound, it is the best (minimum variance) unbiased estimator. Consider an estimator $\hat{\theta}$ of some parameter $\theta$. Further, consider a set of data $X = \{x_i\}$ drawn from the density $f(x_i, \theta)$. In mathematical terms, the CRLB states that the variance of an unbiased estimator satisfies:

$$\mathrm{VAR}(\hat{\theta}) \geq \frac{1}{J(\theta)} \tag{2.5}$$

where $J$ is the Fisher information defined as:

$$J(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \ln f(x; \theta)\right)^2\right] \tag{2.6}$$

An alternative formulation of $J(\theta)$ is given in [48] as:

$$J(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2} \ln f(x|\theta) \,\Big|\, \theta\right] \tag{2.7}$$
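A small numerical check can make the bound concrete. The sketch below assumes the textbook case of estimating the mean of n independent Gaussian samples with known variance, for which the Fisher information is $J(\theta) = n/\sigma^2$ and the sample mean is known to achieve the bound. This model, the parameter values, and the function names are illustrative assumptions and are not the background models analyzed later in the thesis.

```python
import random
import statistics

def crlb_gaussian_mean(noise_var, n):
    """CRLB for the mean of n i.i.d. Gaussian samples: 1/J with J = n / noise_var."""
    fisher_information = n / noise_var
    return 1.0 / fisher_information

def sample_mean_variance(theta, noise_var, n, trials, seed=0):
    """Monte Carlo variance of the sample-mean estimator."""
    rng = random.Random(seed)
    sigma = noise_var ** 0.5
    estimates = [
        sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.pvariance(estimates)

# Hypothetical parameter values, for illustration only.
bound = crlb_gaussian_mean(noise_var=1.0, n=10)
empirical = sample_mean_variance(theta=10.0, noise_var=1.0, n=10, trials=20000)
print(f"CRLB = {bound:.4f}, empirical variance = {empirical:.4f}")
```

The empirical variance should sit close to the bound, up to Monte Carlo error, which is what it means for an estimator to achieve the CRLB.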

2.4 The Detection Problem: Likelihood Ratios and Generalized Likelihood Ratios

This thesis is primarily concerned with the detection of signals in noise. In this section the optimal solution to the hypothesis testing problem (the likelihood ratio test) and a sub-optimal version of this test (the generalized likelihood ratio test) are reviewed.

2.4.1 The Likelihood Ratio Test

In most binary decision problems, one has a set of data and wishes to determine

which of two separate distributions the data was drawn from. The two hypotheses

are generally termed $H_0$ and $H_1$, or the null and alternative hypotheses, respectively.

The likelihood ratio is the optimal decision statistic for a wide range of decision

problems [48] and is defined as:

$$\Lambda(x) = \frac{p(x|H_1)}{p(x|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \eta \tag{2.8}$$

The null hypothesis is accepted if $\Lambda(x)$ is less than a certain threshold, $\eta$; otherwise the alternative hypothesis is accepted.

Determining the optimal threshold value to use depends on the performance criterion chosen. The two most commonly used performance criteria are the Neyman-Pearson criterion and the Bayes criterion [48].

2.4.2 The Generalized Likelihood Ratio Test

The standard likelihood ratio test assumes that the conditional distributions of the

data under the two hypotheses are known. Often this assumption is invalid. When

the two probability density functions are not known or are difficult to estimate,

the Generalized Likelihood Ratio Test (GLRT) is often utilized. The GLRT is an

intuitive (although not optimal) mechanism by which to approach the problem of

unknown distributions in a two-hypothesis decision scenario. Consider again the two probability distribution functions, except assume that some parameter, denoted $\theta$, associated with the probability density function $p$ is unknown:

$$p(x|H_1) \rightarrow p(x|\theta, H_1) \tag{2.9}$$

$$p(x|H_0) \rightarrow p(x|\theta, H_0) \tag{2.10}$$

The likelihood ratio is [47]:

$$\Lambda(x) = \frac{\int p(x|\theta, H_1)\, p(\theta|H_1)\, d\theta}{\int p(x|\theta, H_0)\, p(\theta|H_0)\, d\theta} \tag{2.11}$$

In practice, the calculation of this integral is often difficult or, if $p(\theta|H_1)$ is unknown, impossible. One sub-optimal solution results from substituting estimates of the unknown $\theta$ into the density functions. This formulation is termed the generalized likelihood ratio test [48]:

$$\Lambda(x) = \frac{p(x|\theta, H_1)\big|_{\theta = \hat{\theta}}}{p(x|\theta, H_0)\big|_{\theta = \hat{\theta}}} \tag{2.12}$$
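The substitution step can be sketched for one simple case. The example assumes a model that is not stated in the text above: a known signal shape $s$ with an unknown gain $\theta$ in white Gaussian noise, $x = \theta s + n$, with $\theta = 0$ under $H_0$ (so only the $H_1$ density depends on the unknown parameter). The maximum likelihood estimate of the gain is substituted into the $H_1$ density, and the logarithm of the resulting ratio is returned. Signal values, noise variance, and function names are hypothetical.

```python
import random

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def ml_amplitude(x, s):
    """ML estimate of theta in x = theta * s + n (white Gaussian noise)."""
    return dot(s, x) / dot(s, s)

def log_glrt(x, s, noise_var):
    """Log generalized likelihood ratio with theta replaced by its ML estimate under H1."""
    theta_hat = ml_amplitude(x, s)
    resid_h1 = [xj - theta_hat * sj for xj, sj in zip(x, s)]
    # ln p(x | theta_hat, H1) - ln p(x | H0); the normalizing constants cancel.
    return (dot(x, x) - dot(resid_h1, resid_h1)) / (2.0 * noise_var)

rng = random.Random(1)
s = [1.0, -1.0, 2.0, 0.5]          # hypothetical known signal shape
noise_var = 0.25                   # hypothetical noise variance
noise = [rng.gauss(0.0, noise_var ** 0.5) for _ in s]
x_h0 = noise                                          # noise only
x_h1 = [3.0 * sj + nj for sj, nj in zip(s, noise)]    # unknown gain of 3.0

print(log_glrt(x_h0, s, noise_var), log_glrt(x_h1, s, noise_var))
```

For this assumed model the statistic simplifies to $(s^T x)^2 / (2\sigma_n^2\, s^T s)$, so the detector responds to the energy of $x$ along $s$ regardless of the sign or size of the gain.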


2.4.3 The Matched Filter

One simple and commonly encountered hypothesis testing problem involves deter-

mining the presence of a known signal s in the presence of additive zero-mean white

noise. In this case, the likelihood ratio reduces to a filter known as a correlation

detector or matched filter [48].

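Let $s$ and $n$ be length-$i$ vectors consisting of the known signal and statistically independent, $N(0, I\sigma_n^2)$ noise respectively. Consider a received data vector $x$. Under the null and alternative hypotheses:

$$H_0 : x = n$$

$$H_1 : x = s + n$$

The distributions of $x$ under $H_0$ and $H_1$ are:

$$f(x|H_0) = \prod_{j=1}^{i} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{x_j^2}{2\sigma_n^2}\right)$$

$$f(x|H_1) = \prod_{j=1}^{i} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{(x_j - s_j)^2}{2\sigma_n^2}\right)$$

The likelihood ratio (equation 2.8) is

$$\Lambda(x) = \prod_{j=1}^{i} \exp\left(\frac{2 x_j s_j - s_j^2}{2\sigma_n^2}\right)$$

Taking the natural logarithm and incorporating the known values ($\sigma_n^2$, $s_j$) into the threshold ($\eta$) yields:

$$\Lambda(x) = \sum_{j=1}^{i} x_j s_j \underset{H_0}{\overset{H_1}{\gtrless}} \eta$$

which is the well-known matched filter.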
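The matched filter statistic is simple enough to sketch directly. The following Python example correlates a received vector with a known signal and compares the result to a threshold. The signal, noise level, and the midpoint threshold (half the signal energy, which corresponds to a likelihood ratio threshold of one in equation 2.8) are hypothetical choices made for illustration, not settings used in this thesis.

```python
import random

def matched_filter(x, s):
    """Correlate the received vector x with the known signal s: sum_j x_j * s_j."""
    return sum(xj * sj for xj, sj in zip(x, s))

def detect(x, s, threshold):
    """Decide H1 (signal present) when the statistic exceeds the threshold."""
    return matched_filter(x, s) > threshold

rng = random.Random(42)
s = [0.5, 1.0, 1.5, 1.0, 0.5]            # hypothetical known signal
sigma_n = 0.3                            # hypothetical noise standard deviation
threshold = 0.5 * matched_filter(s, s)   # midpoint between the H0 and H1 means

trials = 2000
hits = sum(
    detect([sj + rng.gauss(0, sigma_n) for sj in s], s, threshold)
    for _ in range(trials)
)
false_alarms = sum(
    detect([rng.gauss(0, sigma_n) for _ in s], s, threshold)
    for _ in range(trials)
)
print(f"detection rate = {hits / trials:.3f}, false alarm rate = {false_alarms / trials:.3f}")
```

Raising the threshold trades detections for fewer false alarms; sweeping it over a range of values traces out a receiver operating characteristic.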


2.5 Linear Algebra Preliminaries and Matched Subspace Detectors

The common matched filter is a special case of a more general class of filters termed matched subspace detectors [27]. Scharf's derivation of the matched subspace detectors (see [27, 25]) requires some linear algebra preliminaries which allow him to show that the matched subspace detector has many interesting and powerful properties, including invariance to rotations in certain subspaces and optimal performance under certain assumptions. In this section the linear algebra associated with projection matrices (which are an integral part of matched subspace filters) is discussed. A summary of Scharf's definitions of invariance and maximal invariant statistics (closely following the discussion from [27]) is given, and finally, summaries of Scharf's application of these ideas to the development of the matched subspace filter and his proof that the matched subspace detector is a uniformly most powerful test are provided.

2.5.1 Linear Algebra Preliminaries

Before discussing matched subspace filters, it is important to review the formation

and properties of projection matrices.

The span of a set of vectors $\{v_1, v_2, \ldots, v_N\}$ is defined as the set of all linear combinations of $\{v_1, v_2, \ldots, v_N\}$. A vector $b$ is then an element of the span of $\{v_i\}$ if and only if the equation:

$$b = a_1 v_1 + a_2 v_2 + \ldots + a_N v_N \tag{2.13}$$

has a solution. When the vectors $\{v_i\}$ are considered columns in a matrix $H$, the span of $\{v_i\}$ is equivalent to the subspace denoted by $\langle H \rangle$. The orthogonal complement of $\langle H \rangle$ is denoted $\langle H \rangle^{\perp}$.

A projection matrix $E$ is a square matrix that gives a projection onto a given subspace. The projection onto a subspace $\langle H \rangle$ is denoted as $E_H$. A projection matrix must be idempotent (equal to its own square):

$$E^2 = E \tag{2.14}$$

An orthogonal projection matrix has the additional constraint of being Hermitian (equal to its Hermitian transpose). Such projections are denoted with the letter $P$:

$$P^H = P \tag{2.15}$$

The most common orthogonal projection matrices are the Cartesian coordinate pro-

jections in

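An orthogonal projection onto $\langle H \rangle$ maps vectors contained in the subspace $\langle H \rangle$ to themselves, and maps vectors lying in $\langle H \rangle^{\perp}$ to the zero vector. This can be seen using the Cartesian projections:

$$P_x \begin{bmatrix} c \\ 0 \end{bmatrix} = \begin{bmatrix} c \\ 0 \end{bmatrix} \tag{2.22}$$

$$P_x \begin{bmatrix} 0 \\ d \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{2.23}$$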
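These properties can be checked numerically for a subspace that is not aligned with the coordinate axes. The sketch below uses one standard construction, assumed here for illustration: the basis vectors are orthonormalized with Gram-Schmidt and the orthogonal projection is formed as the sum of outer products of the resulting vectors. The basis vectors, test vectors, and helper names are hypothetical, and the code is restricted to real-valued vectors, for which the Hermitian condition reduces to symmetry.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal vectors spanning the same subspace as `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(u, w)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = dot(w, w) ** 0.5
        if norm > tol:                      # skip linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis

def projection_matrix(vectors):
    """Orthogonal projection P = sum_k u_k u_k^T onto span(vectors)."""
    basis = orthonormal_basis(vectors)
    n = len(vectors[0])
    return [[sum(u[i] * u[j] for u in basis) for j in range(n)] for i in range(n)]

def matvec(m, v):
    """Matrix-vector product."""
    return [dot(row, v) for row in m]

def matmul(a, b):
    """Matrix-matrix product."""
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]

# Hypothetical basis vectors spanning a 2-D subspace of R^3.
h1, h2 = [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]
P = projection_matrix([h1, h2])

in_span = [2.0 * a - 3.0 * b for a, b in zip(h1, h2)]   # lies in the subspace
orthogonal = [1.0, -1.0, 1.0]                            # orthogonal to h1 and h2

print(matvec(P, in_span))      # approximately in_span
print(matvec(P, orthogonal))   # approximately the zero vector
```

The matrix `P` built this way is idempotent and symmetric, matching (2.14) and (2.15) in the real-valued case.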

2.5.2 Invariance of Hypothesis Testing Problems

In many decision problems, there are parameters associated with the probability

distribution functions of the measured signals which are considered nuisance pa-

rameters. In these cases it is desirable to reduce the set of viable decision rules to

those which are (in some sense) invariant to changes in the nuisance parameters.

As Scharf states:

This leads to the key idea behind invariance in hypothesis testing: When

presented with nuisance parameters that are extraneous to the hypothesis

test, look for transformations of the measured data that would introduce

these nuisance parameters and then look for a decision rule that is invari-

ant to these transformations. [27] pg. 128

Consider the hypothesis testing problem of determining if $X$ was drawn from $F_1(x)$ or $F_0(x)$. If for every $g$ in $G$:

$$x : F_\theta(x) \tag{2.24}$$

$$y = g(x) \tag{2.25}$$

$$F_\theta(y) = P[g(X) \leq y] \tag{2.26}$$

where $F_\theta(y)$ is the distribution of $y$ with parameter $\theta$, and

$$F_\theta(y) = F_{g(\theta)}(y) \tag{2.27}$$

(that is, if the only effect of the function $g(x)$ on the distribution $F_\theta(x)$ is to change the parameter from $\theta$ to $g(\theta)$) then the family of distributions for which equation 2.27 holds is said to be invariant to $G$. Also, if the transformation $g$ maintains the dichotomy between $H_1$ and $H_0$, the hypothesis testing problem is said to be invariant to $G$.

2.5.3 Invariance Tests and Maximal Invariant Statistics

A hypothesis test $\phi$ is invariant to $G$ if $\phi(g(x)) = \phi(x)$ [27]. Furthermore, a statistic $M$ is maximally invariant if

$$M(g(x)) = M(x) \text{ for all } g \text{ in } G \quad \text{(invariant)} \tag{2.28}$$

$$M(x_1) = M(x_2) \text{ implies } x_1 = g(x_2) \text{ for some } g \text{ in } G \quad \text{(maximal)} \tag{2.29}$$

Thus, all invariant tests may be written as a function of a maximally invariant statistic [27]:

$$\phi(x) = \phi(M(x)) \tag{2.30}$$

These results are important for the landmine detection problem because they show that when deriving a decision rule for any invariant hypothesis testing problem, it is possible to consider only functions of a maximal invariant statistic.

2.5.4 Matched Subspace Detectors

In this section a review of Scharf's work is presented which shows that the problem statement leading up to the matched filter is naturally invariant to a set of transformations and that the matched subspace detector is a maximal invariant statistic. Scharf's explanation of why the matched subspace detector is uniformly most powerful is also reviewed.

In a detection problem, the exact form of the signal of interest is often unknown. The signal may be subject to an arbitrary gain, or it may be a random (unknown) combination of a set of basis vectors. As has been previously noted, a vector $x$ which lies in the subspace $\langle H \rangle$ can always be represented by a linear combination of a set of vectors comprising the matrix $H$. The signal $x$ can then be represented as:

$$x = \sum_n \theta_n h_n = H\theta \tag{2.31}$$

where $H$ is an $N \times P$ matrix and $\theta$ is a $P \times 1$ vector containing the coordinates of $x$ in $\langle H \rangle$. If the weight vector $\theta$ is known a priori then, since the subspace $\langle H \rangle$ is known, the vector $x$ is completely determined, and the optimal detector is the matched filter. However, if $\theta$ is unknown, then all that is known about the vector $x$ is that it lies somewhere in the space spanned by $H$. Under these assumptions, if $x$ is corrupted by white noise and biased in $\langle H \rangle^{\perp}$, Scharf [25] has shown that the optimal test statistic is:

$$\chi^2 = x^T P_H x \tag{2.32}$$

Here we summarize his proof.

Let $X = \mu H\theta + N$ with $N \sim N[0, \sigma^2 I]$. If a channel also rotates the signal in $\langle H \rangle$ and adds a bias $v$ in the subspace $\langle H \rangle^{\perp}$, this can be described mathematically as:

$$Q_H(X + v) \tag{2.33}$$

where $Q_H$ is a rotation matrix in $\langle H \rangle$ and $v$ lies in the subspace $\langle H \rangle^{\perp}$. Note that the rotation of $v$ leaves $v$ unchanged (since we are rotating in $\langle H \rangle$), and $H\theta$ is mapped to $H\theta'$. Let

$$y = Q_H(X + v) \tag{2.34}$$

$$y \sim N[\mu H\theta' + v, \sigma^2 I] \tag{2.35}$$

The hypothesis test is then to discern between the null hypothesis ($\mu = 0$) and the alternative ($\mu > 0$). As mentioned above, since $Q_H$ and $v$ are unknown, they are considered nuisance parameters and the matched subspace detector should ideally be invariant to them. To show that the matched subspace detector is uniformly most powerful, Scharf shows that the distribution of $y$ is invariant to these parameters, the matched subspace detector is a maximal invariant statistic, and the matched subspace detector has a monotone likelihood ratio.

It can be shown that the hypothesis testing problem in this case is invariant to the set of functions

$$G = \{g : g(y) = Q_H(y + w)\} \tag{2.36}$$

since the distribution of $Q_H(y + w)$ is

$$N[\mu H\theta'' + v + w, \sigma^2 I] \tag{2.37}$$

and the distribution of $y$ is given by eq. 2.35. Note that the form of the distribution has not changed (only the mean parameter has been altered), thus the distribution of $y$ is invariant to $G$. Also, since the transformation of the parameter $(\mu H\theta' + v)$ is:

$$g(\mu H\theta' + v) = \mu H\theta'' + v + w \tag{2.38}$$

the transformations of the hypotheses are:

$$g(H_0) = v + w = H_0 \tag{2.39}$$

and

$$g(H_1) = \mu H\theta'' + v + w = H_1 \tag{2.40}$$

the dichotomy of the original parameter space is maintained, and the hypothesis testing problem is $G$-invariant.

To show that the matched subspace statistic

$$\chi^2 = M(y) = y^T P_H y \tag{2.41}$$

is maximal invariant to the group $G$, Scharf shows that eq. 2.28 and 2.29 hold with:

$$g(y) = Q_H(y + v) \tag{2.42}$$

For eq. 2.28:

$$(Q_H(y + v))^T P_H (Q_H(y + v)) \tag{2.43}$$

since $Q_H^T Q_H = I$:

$$= (y + v)^T P_H (y + v) \tag{2.44}$$

and since $v$ is in $\langle H \rangle^{\perp}$

$$= y^T P_H y \tag{2.45}$$

For eq. 2.29:

$$y_1^T P_H y_1 = y_2^T P_H y_2 \tag{2.46}$$

note that the quadratic form involving $P_H$ is the energy of the vectors in the subspace $\langle H \rangle$. Since the energies of both $y_1$ and $y_2$ in the subspace $\langle H \rangle$ are the same, $y_2$ must be a rotation of $y_1$ and/or differ only in the subspace $\langle H \rangle^{\perp}$. Thus:

$$y_1 = Q_H(y_2 + v) \tag{2.47}$$

for some $Q_H$ and $v$.
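The invariance argument for eq. 2.28 can be illustrated with a small numerical check. The sketch below assumes a toy setting that is not taken from Scharf's proof: the subspace is spanned by the first two coordinate axes of a three-dimensional space, so a rotation within the subspace is a rotation about the third axis and a bias in the orthogonal complement is a shift along the third axis. The function names and numerical values are hypothetical.

```python
import math

def project_onto_first_two(y):
    """P_H y for H = [e1, e2] in R^3: keep the first two coordinates."""
    return [y[0], y[1], 0.0]

def statistic(y):
    """Matched subspace statistic y^T P_H y (energy of y inside the subspace)."""
    p = project_onto_first_two(y)
    return sum(a * b for a, b in zip(y, p))

def rotate_in_subspace(y, angle):
    """Q_H y: rotate within span{e1, e2} and leave the e3 component alone."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * y[0] - s * y[1], s * y[0] + c * y[1], y[2]]

def transform(y, angle, bias):
    """g(y) = Q_H (y + v) with v = bias * e3 in the orthogonal complement."""
    shifted = [y[0], y[1], y[2] + bias]
    return rotate_in_subspace(shifted, angle)

y = [1.5, -2.0, 0.7]
before = statistic(y)
after = statistic(transform(y, angle=0.9, bias=4.2))
```

The two values agree up to floating-point rounding, since neither the rotation nor the bias changes the energy of `y` inside the subspace.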

Since the statistic $\chi^2/\sigma^2$ ($\chi^2$ from eq. 2.41) is primarily the square of a Gaussian-distributed vector, it can be shown (see [27]) that it is distributed as a chi-squared random variable. By the Karlin-Rubin theorem, since all $\chi^2$ random variables have monotone likelihood ratios, the $\chi^2$ test is uniformly most powerful [27].

In the above discussions, the variance of the noise ($\sigma^2$) has been assumed to be known. If this is not the case, then the maximal invariant statistic becomes:

$$F = \frac{x^T P_H x}{x^T P_H^{\perp} x} \tag{2.48}$$

or

$$F = \frac{x^T P_H x}{x^T (I - P_H) x} \tag{2.49}$$

Furthermore, note that the constant false alarm rate matched filter can be described using a cosine statistic as [50]:

$$\cos^2 = \frac{x^T P_H x}{x^T x} \tag{2.50}$$

Although matched subspace detectors are significantly more complicated than the special case of the matched filter, they provide a wide range of invariances and are significantly more robust than matched filters when the signal of interest is not known exactly, as is the case in the particular problem of landmine detection.
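As a short worked step that is not part of the original derivation, the statistics in equations 2.49 and 2.50 can be related directly. Because $P_H$ is an orthogonal projection, the energy of $x$ splits into the part inside the subspace and the part in its orthogonal complement, which gives:

$$x^T x = x^T P_H x + x^T (I - P_H) x \quad\Longrightarrow\quad F = \frac{x^T P_H x}{x^T (I - P_H) x} = \frac{\cos^2}{1 - \cos^2}$$

Under this identity, $F$ is a monotonically increasing function of the cosine statistic on $[0, 1)$, so thresholding either quantity yields the same decisions. Both ratios are also unchanged when $x$ is replaced by $a x$ for any nonzero scalar $a$, which is consistent with the gain invariance discussed above.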

2.6 Support Vector Machines

Support vector machines (SVMs) are a relatively new type of learning machine that has many interesting properties [29, 32, 28, 31]. Support vector machines operate by mapping the data of interest to a high dimensional space and generating a separating hyperplane in that space. The high dimensional separating hyperplane can then be used for hypothesis testing. In this section, we describe the mathematics associated with SVMs and review how they avoid the complexities usually associated with decision making in a high dimensional space.

2.6.1 Problem Statement and the Vapnik-Chervonenkis Dimension

Assume that a set of training vectors $\{x_i\}$ are available which were drawn from some probability density function $P(x, y)$ where $y \in Y : \{-1, 1\}$. Here, $y$ represents the classification of the training data into one of two sets or hypotheses. Let $y = -1$ correspond to $H_0$ and $y = 1$ correspond to $H_1$. Then consider the set of training data:

$$(x_1, y_1), \ldots, (x_N, y_N)$$

Unfortunately, the equations governing the VC dimension are complicated and

usually not of practical value [29]. If the search for f is restricted to linear forms:

$$f(x) = (w \cdot x) + b \tag{2.54}$$

(hyperplanes in some space), it can be shown [28] that the VC dimension is bounded

by the minimal distance from the hyperplane to a data point; this distance is called

the margin.

2.6.2 Kernel Functions and Avoiding the Complexities of a High Dimen-

sional Space

Although the linear restrictions suggested above appear to be somewhat limiting,

this apparent shortcoming can be overcome by mapping the observed data into high

dimensional spaces. Consider a function :

A simple example from [32] and [29] illustrates this point. Consider a set of data

distributed in

Figure 2.3: Data separation in 2 Dimensions (axes $X_1$, $X_2$)

Figure 2.4: Data separation in 3 Dimensions (axes $Z_1$, $Z_2$, $Z_3$)

Special rules can be applied to determine if a function is a valid kernel. In this thesis, we restrict ourselves to polynomial and Gaussian functions of the 2-norm of the data. These are valid kernel functions by Mercer's theorem [29].

2.6.3 Finding the Optimal Hyperplane

Previous work, including the illustrative example above, has shown that in some cases

mapping data into higher dimensions may decrease the complexity of the data separa-

tion problem. Furthermore, kernel functions provide a tool to obtain dot products of

vectors in high-dimensional spaces without actually performing the high-dimensional

mapping. However, a technique for determining the optimal hyperplane as to achieve

the best possible performance has not been presented. In order to find the optimal

hyperplane, the discussion given in [29] is reviewed.
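The claim that a kernel returns a high-dimensional dot product without performing the mapping can be illustrated with a homogeneous degree-2 polynomial kernel. The sketch below assumes the commonly used map from two to three dimensions, $(x_1, x_2) \mapsto (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. This particular map and the sample vectors are assumptions made for illustration and may differ from the example used in the cited references.

```python
import math

def feature_map(x):
    """Explicit map from R^2 to R^3 (assumed form): (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

x = (1.0, 2.0)
z = (3.0, -1.0)
explicit = dot(feature_map(x), feature_map(z))   # dot product after mapping
via_kernel = poly_kernel(x, z)                   # same value, no mapping needed
```

The two values agree up to rounding, although `poly_kernel` never forms the three-dimensional vectors.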

Optimal performance, and thus the optimal hyperplane, can be found by minimizing the expected risk. Since the expected risk is generally unknown, the optimal hyperplane is found by minimizing the upper bound on the expected risk via [28]:

$$R[f] \le R_{emp}[f] + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\left(\frac{\delta}{4}\right)}{n}} \tag{2.59}$$

with probability of at least $1 - \delta$ for $n > h$, where $h$ is the VC dimension of the function class $F$.

If the training data is assumed to be perfectly separable by $f$, then $R_{emp}[f]$ is zero, and the risk is bounded by a monotonic function of the VC dimension $h$ [29].
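To make the monotonic dependence on $h$ concrete, the confidence term of the bound in equation 2.59 can be evaluated for a few VC dimensions. This is a sketch that assumes the standard form of the term, $\sqrt{(h(\ln(2n/h) + 1) - \ln(\delta/4))/n}$, where $\delta$ is the allowed failure probability. The function name and the sample values of $n$, $\delta$, and $h$ are hypothetical.

```python
import math

def vc_confidence(h, n, delta):
    """Confidence term: sqrt((h (ln(2n/h) + 1) - ln(delta/4)) / n)."""
    if not (0 < h < n) or not (0.0 < delta < 1.0):
        raise ValueError("requires 0 < h < n and 0 < delta < 1")
    inner = h * (math.log(2.0 * n / h) + 1.0) - math.log(delta / 4.0)
    return math.sqrt(inner / n)

n, delta = 1000, 0.05                  # illustrative sample size and failure probability
terms = [vc_confidence(h, n, delta) for h in (5, 20, 80, 320)]
```

For fixed `n` and `delta` the term grows with `h`, which is why a smaller VC dimension tightens the bound when the empirical risk is zero.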

Furthermore, Vapnik has shown [28] that for linear classifiers (like the one determined by the optimal hyperplane) the VC dimension itself is bounded by a monotonic function of $\|w\|$. Thus, one can find the optimal hyperplane by minimizing $\|w\|$ while maintaining perfect training data separation:

$$y_i((w \cdot \Phi(x_i)) + b) \ge 1, \quad i = 1, \ldots, n. \tag{2.60}$$

This minimization is complicated, but through Lagrange multipliers, it is possible to arrive at the following quadratic programming formula [29, 32, 28]:

$$\max_{\alpha} \; \alpha^T \mathbf{1} - \frac{1}{2}\alpha^T D \alpha \tag{2.61}$$

subject to:

$$\alpha^T Y = 0 \tag{2.62}$$

$$\alpha \ge 0 \tag{2.63}$$

where:

$$\mathbf{1}^T = [1, \ldots, 1] \tag{2.64}$$

$$\alpha^T = [\alpha_1, \ldots, \alpha_n] \tag{2.65}$$

$$w = \sum_{i=1}^{n} \alpha_i y_i \Phi(x_i) \tag{2.66}$$

$$D_{ij} = y_i y_j k(x_i, x_j) \tag{2.67}$$

with $k$ being the kernel function.

The decision statistic is then:

$$f(x) = \mathrm{sign}\left[\sum_{i=1}^{n} y_i \alpha_i (\Phi(x) \cdot \Phi(x_i)) + b\right] \tag{2.68}$$

or:

$$f(x) = \mathrm{sign}\left[\sum_{i=1}^{n} y_i \alpha_i k(x, x_i) + b\right] \tag{2.69}$$
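The dual problem and the kernel form of the decision statistic can be worked by hand for the smallest possible case: one training point per class. The sketch below assumes a Gaussian kernel with an illustrative width and two made-up training points. With only two points the equality constraint forces both multipliers to be equal, so the quadratic program has a closed-form solution and no solver is required. It is a toy illustration, not the procedure used by any particular SVM package.

```python
import math

def gaussian_kernel(x, z, width=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 width^2)); the width is an illustrative choice."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-dist2 / (2.0 * width ** 2))

# Two training points, one per class (toy data, not from the thesis).
xs = [(0.0, 0.0), (2.0, 1.0)]
ys = [1, -1]

# The constraint sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a, and the dual
# objective 2a - 0.5 a^2 (k11 + k22 - 2 k12) is then maximized in closed form.
k11 = gaussian_kernel(xs[0], xs[0])
k22 = gaussian_kernel(xs[1], xs[1])
k12 = gaussian_kernel(xs[0], xs[1])
a = 2.0 / (k11 + k22 - 2.0 * k12)
alphas = [a, a]
# For this kernel k11 = k22 = 1, so the two margin conditions give b = 0.
b = 0.0

def decision_value(x):
    """Argument of sign[...] in the kernel form of the decision statistic."""
    return sum(y * al * gaussian_kernel(x, xi) for y, al, xi in zip(ys, alphas, xs)) + b

def classify(x):
    """Return +1 or -1 according to the sign of the decision value."""
    return 1 if decision_value(x) >= 0 else -1

# Both training points should sit exactly on the margin: y_i * f(x_i) = 1.
margins = [y * decision_value(x) for x, y in zip(xs, ys)]
```

Both entries of `margins` equal one up to rounding, so both training points are support vectors, as expected for a two-point problem.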

In the above discussion it is assumed that the training data available is perfectly separable by a hyperplane in $F$. If this is not the case, a hyperplane that is a solution to:

$$\max_{\alpha} \; \alpha^T \mathbf{1} - \frac{1}{2}\left[\alpha^T D \alpha + \frac{\alpha_{\max}^2}{C}\right] \tag{2.70}$$

(subject to the same constraints) must be determined.

There is a substantial body of literature on solving the quadratic programming problem (for a list of references, see [29]). In this work, we use Cawley's SVM package (available from [30] or [51]). It achieves good performance by splitting the quadratic optimization problem into mini-problems of size two using the sequential minimal optimization technique [29].

Chapter 3

The Cramer-Rao Lower Bound

The response of the ground to wideband EMI sensors is a random vector b which

depends upon the makeup of the soil and the height of the sensor above the ground.

When measuring the EMI responses of buried targets in the earth, the variability in

the background response degrades our received signal. Thus, the measured response

from a buried M-14 landmine will differ significantly depending on the composition

of the soil under which the landmine is buried [43]. Since landmines are found

throughout the world in varying environments, background interference adversely

affects one's ability to define a robust non-adaptive decision algorithm.

One approach to reducing the effect of the background response is to take mea-

surements near the potential target and use these measurements to estimate the

background signal at the target location. In this chapter we discuss several models of

the received background data and show that under certain assumptions the Cramer-

Rao lower bound can be achieved by using the available background measurements to

remove an estimate of the background signature from the potential target location.

In the measurements from the site in Virginia, two background signals were taken

for each potential target (see figures 2.1 and 2.2). We will assume that the background

response at the site is constant over a distance of one meter. This allows us to

model the background response as constant over the potential target location and

two neighboring background measurements. The assumption that the background

response is constant is reasonable since the composition of the soil is not expected

to change substantially over one meter and it has been shown [43] that sensor drift occurs over a longer time span than would be required to take EMI readings over a one meter square. For examples of in-phase and quadrature background signals, see figure 3.1.

Figure 3.1: Typical in-phase and quadrature background measurements versus log-frequency

This chapter is divided into four sections each considering a different model of

the background response: additive zero mean white Gaussian noise, additive white

Gaussian noise with an additive constant term across frequencies, additive white

Gaussian noise with an additive non-constant variance term across frequencies, and

additive white Gaussian noise with a multiplicative term across frequencies.

3.1 Additive White Noise

For each target we have three measurements from the GEM-3. They will be denoted $s_i$ and are modeled as:

$$s_1 = n_1 + b \tag{3.1}$$

$$s_2 = n_2 + b + \mu r \tag{3.2}$$

$$s_3 = n_3 + b \tag{3.3}$$

where

$b$ is some unknown (but constant across the three measurements) vector representing the ground response

$n_i$ is additive zero-mean white Gaussian noise [43]

$r$ is the response of a buried target

$\mu$ represents an arbitrary (non-negative) gain affecting the target response due to the target's depth beneath the ground and the sensor's height above the ground

The hypothesis test will be to decide between $\mu > 0$ and $\mu = 0$. First, we are concerned with obtaining the best estimate of $b$ so that we can estimate $r$ via

$$\hat{r} = s_2 - \hat{b}. \tag{3.4}$$

We propose the estimator

$$\hat{b} = \frac{s_1 + s_3}{2}. \tag{3.5}$$

This estimator is widely used in practice [43], but little analysis has been performed to evaluate its statistical properties. First we must show that $\hat{b}$ is unbiased. This is easily shown by:

$$E[\hat{b}] = E\left[\frac{s_1 + s_3}{2}\right] \tag{3.6}$$

$$= \frac{1}{2}E[n_1 + b + n_3 + b] \tag{3.7}$$

$$= b. \tag{3.8}$$

Note that $b$ is a vector. In the following mathematical treatment we exploit the assumption that the interfering noise is always white [43], so the measurements between data points are uncorrelated. We use $b_i$ to represent the $i$th element of $b$ and show that our estimators satisfy our criteria for general $b_i$ and thus for $b$ (also, $x_{ji}$ represents the $i$th data point in vector $x$ from ground measurement $j \in \{1, 2, 3\}$). The variance of $\hat{b}_i$ is:

$$\mathrm{VAR}[\hat{b}_i] = E[(\hat{b}_i - b_i)^2] \tag{3.9}$$

$$= E[\hat{b}_i^2] - b_i^2 \tag{3.10}$$

$$= \frac{1}{4}E[(s_{1i} + s_{3i})^2] - b_i^2 \tag{3.11}$$

$$= \frac{1}{4}E[n_{1i}^2 + n_{3i}^2 + 4n_{1i}b_i + 4n_{3i}b_i + 2n_{1i}n_{3i} + 4b_i^2] - b_i^2 \tag{3.12}$$

$$= \frac{\sigma_n^2}{2} \tag{3.13}$$

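To determine optimality, we must show that the variance of $\hat{b}_i$ achieves the CRLB (eq. 2.5), using eq. 2.7 for the Fisher information. Since $s_1$ and $s_3$ are distributed as $N(b, \sigma_n^2 I)$, we have:

$$J(b_i) = -E_{b_i}\left[\frac{\partial^2}{\partial b_i^2} \ln(f(s_{1i}, s_{3i}|b_i))\right] \tag{3.14}$$

Simplifying from the inside out:

$$f(s_{1i}, s_{3i}|b_i) = C \exp\left(\frac{-(s_{1i} - b_i)^2 - (s_{3i} - b_i)^2}{2\sigma_n^2}\right) \tag{3.15}$$

$$\ln(f(s_{1i}, s_{3i}|b_i)) = \ln(C) - \frac{1}{2\sigma_n^2}(s_{1i}^2 - 2s_{1i}b_i + b_i^2 + s_{3i}^2 - 2s_{3i}b_i + b_i^2) \tag{3.16}$$

$$\frac{\partial}{\partial b_i}\ln(f(s_{1i}, s_{3i}|b_i)) = -\frac{1}{2\sigma_n^2}(-2s_{1i} + 2b_i - 2s_{3i} + 2b_i) \tag{3.17}$$

differentiating again yields:

$$-\frac{2}{\sigma_n^2} \tag{3.18}$$

Finally, taking the expected value and multiplying by $-1$, we have

$$J(b_i) = \frac{2}{\sigma_n^2} \tag{3.19}$$

And the CRLB is satisfied:

$$\frac{1}{J(b_i)} = \frac{\sigma_n^2}{2} = \mathrm{VAR}(\hat{b}_i) \tag{3.20}$$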

Figure 3.2: Typical in-phase background measurements visibly shifted by some constant

Thus we have the optimal estimator of b given s1 and s3.
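The result can be checked with a small Monte Carlo experiment. The sketch below assumes the additive white Gaussian noise model of this section, works with a single frequency sample $b_i$, and compares the sample variance of the averaged background estimate against the bound $\sigma_n^2/2$. The numerical values of $b_i$ and $\sigma_n$, the number of trials, and the function name are illustrative and are not taken from the measured data.

```python
import random
import statistics

def simulate_estimates(b_i, sigma_n, trials, seed=0):
    """Draw s_1i and s_3i as b_i plus noise and return the estimates (s_1i + s_3i) / 2."""
    rng = random.Random(seed)
    return [
        0.5 * ((b_i + rng.gauss(0.0, sigma_n)) + (b_i + rng.gauss(0.0, sigma_n)))
        for _ in range(trials)
    ]

b_i, sigma_n = 10.0, 1.5          # illustrative values, not taken from the data
estimates = simulate_estimates(b_i, sigma_n, trials=20000)
sample_mean = statistics.fmean(estimates)              # should be close to b_i (unbiased)
sample_var = statistics.pvariance(estimates, mu=b_i)   # mean squared error about b_i
crlb = sigma_n ** 2 / 2.0                              # 1 / J(b_i) with J(b_i) = 2 / sigma_n^2
```

With this many trials the sample variance should fall within a few percent of `crlb`, in line with the derivation above.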

In analyzing the experimental data, we noted that the data received from the

GEM-3 processor for adjacent background measurements was more variable than

could be accounted for by additive zero-mean Gaussian noise. The in-phase readings

appeared to be shifted by some additive constant across the frequency range, and

the quadrature readings appeared to either be corrupted by an additive term with

variance that increases across the frequency range, or have some small multiplicative

noise effects. For examples of these effects, see figures 3.2 and 3.3.

For clarity, we will refer to the additive-noise quadrature model as quadrature model 1, and the multiplicative-noise quadrature model as quadrature model 2. Intuitively, it is reasonable to assume that the in-phase and quadrature signals should be subject to the same noise effects (additive, multiplicative, etc.). However, it is unclear which statistical assumptions better model the background interference. For completeness, we present the Cramer-Rao lower bound derivations for both cases. We proceed to determine whether the previously posed estimator is still optimal when the assumptions regarding the statistics of the noise are modified.

Figure 3.3: Typical quadrature background measurements corrupted by some multiplicative constant, or some additive term which increases with frequency

3.2 Additive White Noise and DC Term (in-phase)

For the in-phase case we will model the extra interference as a random DC term $c_j$ with variance $\sigma_c^2$:

$$s_1 = n_1 + b + c_1 \tag{3.21}$$

$$s_2 = n_2 + b + \mu r + c_2 \tag{3.22}$$

$$s_3 = n_3 + b + c_3 \tag{3.23}$$

We assume that the $c_j$ are distributed as $N(0, \sigma_c^2)$. Note that while the $b$ and $n$ vectors are functions of frequency, the DC terms $c_j$ are constant across frequency. Under these assumptions, the $s_i$ are distributed $N(b, I(\sigma_n^2 + \sigma_c^2))$. It is easy to show that the estimator $\hat{b}$ is unbiased, and that its variance is $\sigma_{\hat{b}}^2 = \frac{\sigma_n^2 + \sigma_c^2}{2}$. From the distribution $f(s_{1i}, s_{3i}|b_i)$, we can show that the expression corresponding to equation 3.16 is:

$$\ln(f(s_{1i}, s_{3i}|b_i)) = \ln(C) - \frac{1}{2(\sigma_n^2 + \sigma_c^2)}\left[(s_{1i} - b_i)^2 + (s_{3i} - b_i)^2\right] \tag{3.24}$$

Differentiating twice with respect to $b_i$ yields the equation corresponding to 3.18:

$$-\frac{2}{\sigma_n^2 + \sigma_c^2}. \tag{3.25}$$

Multiplying by negative one and taking the inverse, we again find the CRLB equal to the variance of the estimator and the estimator is thus optimal under the in-phase hypothesis.

3.3 Additive White Noise and Additive Function of Frequency (model 1 quadrature)

This derivation is very similar to the in-phase model. In fact, the in-phase model of an additive DC term is really a special case of the general additive vector encountered here. In this model, the extra interference is modeled as a vector $c_j$ whose individual terms $c_{ji}$ have variance $\sigma_{c_i}^2$:

$$s_1 = n_1 + b + c_1 \tag{3.26}$$

$$s_2 = n_2 + b + \mu r + c_2 \tag{3.27}$$

$$s_3 = n_3 + b + c_3 \tag{3.28}$$

From the observed data, we can see that the variance of the $c_{ji}$ increases with frequency. We assume that the $c_{ji}$ are distributed as $N(0, \sigma_{c_i}^2)$. Let $\sigma_c^2$ be the vector of $c_i$ variances.

$$\sigma_c^2 = \left[\sigma_{c_1}^2, \ldots, \sigma_{c_i}^2, \ldots, \sigma_{c_n}^2\right]^T \tag{3.29}$$

The $c_j$ vectors are distributed as $N(0, I\sigma_c^2)$. Under these assumptions, the $s_j$ are distributed $N(b, I(\sigma_n^2 + \sigma_c^2))$. It is easy to show that the estimator $\hat{b}$ is unbiased, and that its variance is $\sigma_{\hat{b}}^2 = \frac{\sigma_n^2 + \sigma_c^2}{2}$. The distribution of the individual $s_{ji}$ is:

$$f(s_{1i}, s_{3i}|b_i) = C \exp\left(\frac{-(s_{1i} - b_i)^2 - (s_{3i} - b_i)^2}{2(\sigma_n^2 + \sigma_{c_i}^2)}\right) \tag{3.30}$$

$$\ln(f(s_{1i}, s_{3i}|b_i)) = \ln(C) - \frac{1}{2(\sigma_n^2 + \sigma_{c_i}^2)}(s_{1i}^2 - 2s_{1i}b_i + b_i^2 + s_{3i}^2 - 2s_{3i}b_i + b_i^2) \tag{3.31}$$

$$\frac{\partial}{\partial b_i}\ln(f(s_{1i}, s_{3i}|b_i)) = -\frac{1}{2(\sigma_n^2 + \sigma_{c_i}^2)}(-2s_{1i} + 2b_i - 2s_{3i} + 2b_i) \tag{3.32}$$

differentiating again yields:

$$-\frac{2}{\sigma_n^2 + \sigma_{c_i}^2} \tag{3.33}$$

Taking the expected value and multiplying by $-1$, we have

$$J(b_i) = \frac{2}{\sigma_n^2 + \sigma_{c_i}^2} \tag{3.34}$$

And the CRLB is satisfied:

$$\frac{1}{J(b_i)} = \frac{\sigma_n^2 + \sigma_{c_i}^2}{2} = \mathrm{VAR}(\hat{b}_i) \tag{3.35}$$

3.4 Additive White Noise and Multiplicative Term (model 2 quadrature)

We now consider the quadrature case and assume that multiplicative Gaussian noise is affecting the measured background signals. In this model, the multiplicative scaling effects known to affect target responses are also assumed to affect the background responses. This makes this model perhaps the most intuitively satisfying of all the statistical models presented.

The multiplicative noise terms affecting the background responses are denoted $k_j$ and are assumed to be distributed as $N(1, \sigma_k^2)$. The received signals are modeled as:

$$s_1 = n_1 + k_1 b \tag{3.36}$$

$$s_2 = n_2 + k_2 b + \mu r \tag{3.37}$$

$$s_3 = n_3 + k_3 b \tag{3.38}$$

Note that $s_1$ and $s_3$ are distributed $N(b, (b^2\sigma_k^2 + \sigma_n^2)I)$. Furthermore, the estimator $\hat{b} = \frac{s_1 + s_3}{2}$ is still unbiased.

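Since the mean value ($b_i$) enters the signal distribution in the variance as well as the mean, the calculations are more complicated. Since we assume that the noise interference is white, we can consider the scalar equivalents of the pdf. The variance of $\hat{b}_i$ is given by:

$$\mathrm{VAR}[\hat{b}_i] = E[\hat{b}_i^2] - b_i^2 \tag{3.39}$$

$$= E\left[\left(\frac{n_{1i} + k_{1i}b_i + n_{3i} + k_{3i}b_i}{2}\right)^2\right] - b_i^2 \tag{3.40}$$

$$= \frac{1}{4}E\left[\begin{aligned} &n_{1i}^2 + 2n_{1i}k_{1i}b_i + 2n_{1i}n_{3i} + 2n_{1i}k_{3i}b_i + k_{1i}^2 b_i^2 \\ &+ 2k_{1i}b_i n_{3i} + 2k_{1i}b_i^2 k_{3i} + n_{3i}^2 + 2n_{3i}k_{3i}b_i + k_{3i}^2 b_i^2 \end{aligned}\right] - b_i^2 \tag{3.41}$$

Taking the expected value, we obtain:

$$\mathrm{VAR}[\hat{b}_i] = \frac{\sigma_{si}^2}{2} \tag{3.42}$$

with

$$\sigma_{si}^2 = (b_i^2\sigma_k^2 + \sigma_n^2) \tag{3.43}$$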

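To determine optimality, we begin with the conditional probability density function:

$$f(s_{1i}, s_{3i}|b_i) = \frac{1}{2\pi\sigma_{si}^2}\exp\left[-\frac{1}{2\sigma_{si}^2}\left((s_{1i} - b_i)^2 + (s_{3i} - b_i)^2\right)\right] \tag{3.44}$$

and apply equation 3.16. After taking the natural logarithm, the equation can be separated into two terms from the coefficient and exponential portions of equation 3.44:

$$\ln\left(\frac{1}{2\pi\sigma_{si}^2}\right) - \frac{1}{2\sigma_{si}^2}\left((s_{1i} - b_i)^2 + (s_{3i} - b_i)^2\right) \tag{3.45}$$

Differentiating equation 3.45 twice with respect to $b_i$ yields:

$$\frac{\begin{aligned} &6\sigma_n^2 b_i^2\sigma_k^2 - 2\sigma_n^4 + 2s_{1i}b_i^3\sigma_k^4 - 6s_{1i}b_i\sigma_k^2\sigma_n^2 + 2s_{3i}b_i^3\sigma_k^4 - 6s_{3i}b_i\sigma_k^2\sigma_n^2 \\ &- 3s_{3i}^2\sigma_k^4 b_i^2 + s_{3i}^2\sigma_k^2\sigma_n^2 - 3\sigma_k^4 s_{1i}^2 b_i^2 + s_{1i}^2\sigma_k^2\sigma_n^2 + 2b_i^4\sigma_k^6 - 2\sigma_k^2\sigma_n^4 \end{aligned}}{(b_i^2\sigma_k^2 + \sigma_n^2)^3} \tag{3.46}$$

The expected value operator then replaces $s_{ji}^2$ with $\sigma_{si}^2 + b_i^2$ and $s_{ji}$ with $b_i$, yielding:

$$-\frac{2(2b_i^2\sigma_k^4 + b_i^2\sigma_k^2 + \sigma_n^2)}{(b_i^2\sigma_k^2 + \sigma_n^2)^2} \tag{3.47}$$

The Cramer-Rao lower bound is given by:

$$\frac{1}{\dfrac{2(2b_i^2\sigma_k^4 + b_i^2\sigma_k^2 + \sigma_n^2)}{(b_i^2\sigma_k^2 + \sigma_n^2)^2}} \tag{3.48}$$

or:

$$\frac{1}{2}\cdot\frac{\sigma_{si}^4}{2b_i^2\sigma_k^4 + \sigma_{si}^2} \tag{3.49}$$

Note that in this case, our estimator does not achieve the Cramer-Rao lower bound. In order to determine how close the variance of the proposed estimator is to the variance of the optimal estimator, consider the term:

$$2b_i^2\sigma_k^4 \tag{3.50}$$

in the denominator. Since this term differentiates the CRLB from the variance of the proposed estimator, as the term approaches zero, the difference between the variances becomes negligible.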

Figure 3.4: Plots of the Cramer-Rao lower bound, calculated, and sample estimator variances versus the standard deviation of $k$. Parameters: $b_i = 10$, $\sigma_n^2 = 1$. (Legend: CRLB, sample variance, and $(\sigma_k^2 b^2 + \sigma_n^2)/2$.)

To determine how well the proposed estimator performs compared to the CRLB, a set of data was generated under the proposed assumptions and the actual (sample) variance of the estimator was compared with the theoretically calculated variance of the estimator and the Cramer-Rao lower bound. Figure 3.4 shows the Cramer-Rao lower bound, the sample variance from a set of ten thousand data points, and the calculated variance of the estimator ($\sigma_s^2/2$). Note that the difference between the CRLB and the sample and computed variances is small, especially for small $\sigma_k^2$ values. In experiments, almost all estimated $\sigma_k^2$ values were found to be below 0.1 (except for the lowest frequency measurement which, due to near-zero average magnitude, had a high estimated $\sigma_k^2$). Thus, despite not achieving the CRLB, the proposed estimator is expected to perform well on this data set.
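A comparison of this kind can be sketched in a few lines. The code below assumes the multiplicative model of equations 3.36 and 3.38 for a single frequency sample, uses the parameters quoted for figure 3.4 ($b_i = 10$, $\sigma_n^2 = 1$), and evaluates the calculated variance from equations 3.42 and 3.43 and the bound from equation 3.49. The grid of $\sigma_k$ values, the random seed, and the function names are illustrative choices, so the numbers will not reproduce the figure exactly.

```python
import random
import statistics

def multiplicative_estimates(b_i, sigma_n, sigma_k, trials, seed=0):
    """Simulate s_ji = n_ji + k_j * b_i for j = 1, 3 and return (s_1i + s_3i) / 2."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        s1 = rng.gauss(0.0, sigma_n) + rng.gauss(1.0, sigma_k) * b_i
        s3 = rng.gauss(0.0, sigma_n) + rng.gauss(1.0, sigma_k) * b_i
        out.append(0.5 * (s1 + s3))
    return out

def calculated_variance(b_i, sigma_n, sigma_k):
    """sigma_si^2 / 2 with sigma_si^2 = b_i^2 sigma_k^2 + sigma_n^2."""
    return (b_i ** 2 * sigma_k ** 2 + sigma_n ** 2) / 2.0

def crlb(b_i, sigma_n, sigma_k):
    """(1/2) sigma_si^4 / (2 b_i^2 sigma_k^4 + sigma_si^2)."""
    var_s = b_i ** 2 * sigma_k ** 2 + sigma_n ** 2
    return 0.5 * var_s ** 2 / (2.0 * b_i ** 2 * sigma_k ** 4 + var_s)

b_i, sigma_n = 10.0, 1.0          # parameters quoted for figure 3.4
rows = []
for sigma_k in (0.05, 0.1, 0.2, 0.3):
    est = multiplicative_estimates(b_i, sigma_n, sigma_k, trials=10000)
    rows.append((sigma_k,
                 crlb(b_i, sigma_n, sigma_k),
                 calculated_variance(b_i, sigma_n, sigma_k),
                 statistics.pvariance(est)))
```

Each row holds $\sigma_k$, the bound, the calculated variance, and the sample variance. The bound never exceeds the calculated variance, and the gap between them widens as $\sigma_k$ grows.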


We have shown that the intuitive estimation procedure that involves subtracting

the mean of the received background signals is optimal under three different assump-

tions regarding the underlying stochastic nature of the received signals:

1. if the signal is corrupted by additive white noise

2. if the signal is corrupted by additive white noise and a Gaussian-distributed

additive DC term (in-phase)

3. if the signal is corrupted by additive white noise and a Gaussian-distributed

additive vector (quadrature model 1)

and although not optimal, the intuitive procedure is a low-variance estimate when the

signal is corrupted by additive white noise and a Gaussian-distributed multiplicative

term (quadrature model 2). In the following chapters we will utilize the proposed

estimation technique to obtain estimates of the actual target responses for use in our

detection algorithms.


Chapter 4

Signal Processing Using Matched

Subspace Detectors

In chapter 3 we proposed an estimator of the background signal b which is an optimal

or low-variance estimator under several models of the underlying stochastic processes.

Using this estimator, we can now estimate the target response via

$$\hat{r} = s_2 - \hat{b}. \tag{4.1}$$

Using this target response estimate, a detection algorithm that distinguishes between

landmines and clutter and between different landmine types can be developed. In this

section the application of matched subspace filters to correctly identify and classify

landmines is presented.

4.1 Properties of Estimated Landmine Responses

We begin by inspecting the responses of different landmine types. As expected, the

landmines all have unique wideband EMI signatures [20].

Figure 4.1 shows the estimated in-phase and quadrature responses of five VS-50

landmines which were obtained by subtracting the estimated background as suggested

in Chapter 3. These five landmines were buried at depths from 0 to 1.875 inches.

Figure 4.2 shows the estimated in-phase and quadrature responses of three M-14

landmines which were obtained in the same manner. These three landmines were

buried at depths from 0.25 to 1.75 inches.

The responses of different landmine types are distinguishable from one another and, as has been shown (see [20, 43]), the general shape of the responses stays constant across measurements despite differences in target-sensor orientation and mine depth.

Figure 4.1: Signatures of VS-50 landmines versus log-frequency

Note that the final data point, corresponding to 23,970 Hz, in the estimated sig-

nals appears to be markedly out of place - especially in the quadrature measurements.

Comparisons to previous work on landmine responses and the theoretical treatment

of responses given in chapter 2 led us to believe that the final data points are dis-

torted. Whether this corruption is a function of the sensor (it is operating at the

very limit of its frequency range), the additional noise inherent to measurements at

these frequencies, or user error is unclear. Due to the apparent erroneous nature of

the highest frequency measurement, the final data point is excluded in the work that

follows.

Figure 4.2: Signatures of M-14 landmines versus log-frequency

Although the landmine responses are discernible from one another and maintain their approximate shape despite differences in their depth, it is clear that the energies of the responses from any particular landmine type vary widely. This problem is inherent in real-fielded landmine detection: the depth at which a landmine is buried substantially alters the energy of the received signal [36]. This is particularly evident in the quadrature responses of high metal-content mines like the VS-50 (see figure 4.1). This signal distortion can be modeled as an uncertainty parameter in the distributions of our data. Consider an unknown parameter $\mu$ which acts as a multiplicative gain on the received data. Physically, $\mu$ represents the depth at which the landmine is buried. An effective detector should be robust or invariant to changes in the uncertainty parameter $\mu$. The matched subspace detector is such a detector [52].

4.2 Basis Estimation

In order to apply a matched subspace detector, a linear subspace containing the

received signals is needed. Alternatively, a set of basis functions that spans the

responses from a particular landmine type must be found.

Estimating a signal subspace is a well studied problem [27], but the maximum likelihood solution was not appropriate in this situation. The maximum likelihood estimate of a signal subspace consists of the eigenvectors of the sample covariance matrix that correspond to its $p$ largest eigenvalues [27]. However, the calibration data available often only contained between one and three instances of any particular landmine type. The sample covariance matrix in this case would clearly be inaccurate. Furthermore, if it is assumed that variation in target-sensor distance leads primarily to a change in the gain of the received signals, we can very easily model the subspace in a much simpler fashion: as scaled versions of a mean vector.

In figure 4.3 an actual M-14 landmine quadrature response, the mean of all M-14 landmine responses, and an estimate of the actual M-14 using a scaled version of the mean are shown. The error in the resulting signal estimation is about 0.7% of the original signal's energy. In this particular case the estimation of a landmine response as a scaled version of the mean of all landmine responses is very accurate, and this result holds across all different landmine types (although the technique performs significantly better on the quadrature data).
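The scaled-mean approximation and its error measure can be sketched as follows. The code assumes that the scale is chosen by least squares, which is one natural reading of "a scaled version of the mean" but is not stated explicitly in the text, and it measures the error as residual energy relative to the energy of the original response. The response vectors are made up for illustration and are not measured landmine signatures.

```python
def dot(u, v):
    """Ordinary dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def mean_vector(responses):
    """Element-wise mean of equally long response vectors."""
    count = len(responses)
    return [sum(col) / count for col in zip(*responses)]

def scaled_mean_fit(response, mean):
    """Least-squares gain a minimizing ||response - a * mean||^2, and the fitted vector."""
    gain = dot(response, mean) / dot(mean, mean)
    return gain, [gain * m for m in mean]

def relative_error_energy(response, fit):
    """Energy of the residual as a fraction of the energy of the response."""
    residual = [r - f for r, f in zip(response, fit)]
    return dot(residual, residual) / dot(response, response)

# Made-up responses sharing one shape at different gains, plus small deviations.
shape = [2.0, 5.0, 9.0, 12.0, 14.0]
responses = [
    [0.5 * s + d for s, d in zip(shape, [0.05, -0.02, 0.03, 0.00, -0.04])],
    [1.0 * s + d for s, d in zip(shape, [-0.03, 0.04, 0.00, 0.02, 0.01])],
    [1.8 * s + d for s, d in zip(shape, [0.02, 0.01, -0.05, 0.03, 0.00])],
]
mean = mean_vector(responses)
gain, fit = scaled_mean_fit(responses[0], mean)
error = relative_error_energy(responses[0], fit)
```

When the responses differ mainly by a gain, as assumed here, `error` is a small fraction of the response energy, which is the behavior described for the M-14 data.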

The decision to model the different responses as scaled versions of a single re-

sponse is also intuitively satisfying, since it applies a simple law to account for

distance-induced differences in measurements. Furthermore, the scaling relationship

associated with target-sensor distance is well known [36, 35].

Figure 4.3: Actual, mean, and estimated signatures of M-14 landmines

4.3 Designing the Matched Subspace Filter

The clutter present in the blind grid poses a unique problem to traditional subspace

detection techniques. Clutter is by nature difficult to classify (generally made up of

anthropic and natural conductors with an enormous range of sizes and shapes). Also,

the calibration data set contained only 20 clutter responses. One approach considered

was to model the clutter as a set of basis functions and have a clutter-detection

algorithm to compare against our landmine detection algorithm. However, attempts

to formulate a basis to model clutter are inherently limited since clutter is comprised

of an infinite set of possible shapes, sizes, and materials. Despite the wide range

of clutter which impedes most detection techniques, a matched subspace detector

should be somewhat naturally robust to clutter interference. Consider a piece of random clutter whose response is some vector $x$. Our decision statistic is the cosine statistic (equation 2.50):

$$\cos^2 = \frac{x^T P_H x}{x^T x}. \tag{4.2}$$

The numerator can be considered a matched-energy detector since the output of the numerator is the amount of the energy in $x$ which lies in the subspace spanned by $H$. We have assumed that there is only one basis vector in $H$ corresponding to the mean of the landmine responses for a given landmine-type. Therefore, for clutter to register a large response in the detector, it must look much like a scaled version of our landmine response (i.e. lie in the subspace spanned by the mean vector of the landmine responses).
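With a single basis vector the projection reduces to a rank-one matrix, and the cosine statistic of equation 4.2 becomes a normalized squared correlation. The sketch below assumes this rank-one form, $P_H = h h^T / (h^T h)$, and uses made-up vectors: a stand-in mean response, two scaled copies of it, and an arbitrary vector with a different shape playing the role of clutter. None of the numbers come from the measured data.

```python
def dot(u, v):
    """Ordinary dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def cosine_statistic(x, h):
    """x^T P_H x / x^T x for the rank-one projection P_H = h h^T / (h^T h)."""
    return dot(x, h) ** 2 / (dot(h, h) * dot(x, x))

mean_response = [1.0, 3.0, 6.0, 8.0]               # stand-in for a mean landmine response
deep_mine = [0.2 * v for v in mean_response]       # same shape, small gain
shallow_mine = [5.0 * v for v in mean_response]    # same shape, large gain
clutter = [4.0, -1.0, 0.5, -2.0]                   # arbitrary vector with a different shape

scores = {
    "deep_mine": cosine_statistic(deep_mine, mean_response),
    "shallow_mine": cosine_statistic(shallow_mine, mean_response),
    "clutter": cosine_statistic(clutter, mean_response),
}
```

Both scaled copies score one regardless of gain, while the clutter vector scores close to zero because little of its energy lies along the mean response.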

The standard matched subspace detector is appropriate for finding a single land-

mine type amongst background or clutter (binary hypothesis test). However, the

blind grid is populated with various landmine types. In the multiple hypothesis

test case our detector must decide between H0 and all the alternative hypotheses:

{H1,H2,...,Hn}. The standard likelihood ratio then becomes:

=p(x|{H1, H2, ..., Hn})

p(x|H0) (4.3)

=p(x|H1)p(H1) + p(x|H2)p(H2) + ...+ p(x|Hn)p(Hn)

p(x|H0) (4.4)

=i

i(x)p(Hi) (4.5)

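where $\lambda_i(x) = p(x \mid H_i)/p(x \mid H_0)$ is the likelihood ratio for mine type i and p(Hi) is the
a priori probability of mine type i. Since all mine types are considered equally likely
a priori, this reduces (up to a constant factor) to:

$$\Lambda(x) = \sum_i \lambda_i(x) \qquad (4.6)$$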

Equation 4.6 suggests implementing a bank of matched subspace filters and summing
their outputs to form a decision statistic. However, this formulation also assumes
that the distribution p(x|H0) is known, but in this work, the distribution of
clutter is unknown and difficult to estimate. As an illustrative example of the problems
encountered when p(x|H0) is unknown, consider n matched subspace filters each
tuned to a specific landmine type. When a landmine response is presented to the

bank of filters, a typical set of outputs contains one large response coinciding with

the matched subspace filter tuned to that landmine type. When a clutter response

is fed to the same bank of filters, although no filter bank produces a particularly

large result, the clutter vector generates significant responses from several different

filter banks because the clutter model in the denominator which would normally off-

set the numerator is missing. That clutter induces significant responses from several

filter banks makes intuitive sense since all of the landmine responses, when taken

together, span a large subspace and clutter will undoubtedly have some energy in

the span of this space. For typical examples of the matched subspace filter bank

outputs for clutter and landmine data, see figure 4.4. Note that the sum of the outputs
across filter banks for the input clutter vector is larger than the sum for the

landmine vector. In this case, a better (although sub-optimal) decision statistic than

the summation across the filter banks is the maximum value across the filter banks.

Although this technique is not equivalent to the Bayesian solution to the multiple

hypothesis test problem, the similarities are evident. The Bayesian solution to the

multiple hypothesis testing problem is to choose Hi such that Hi maximizes the a

posteriori probability p(Hi|x) [48].

[Figure 4.4: Comparison of filter bank outputs resulting from landmine and clutter responses. Note that the sum across the filter banks from the clutter response is larger than from the landmine response. Plot title: Matched Subspace Outputs vs. Filter Bank for Landmine and Clutter Responses. Axes: filter bank number (horizontal), filter bank output (vertical). Legend: landmine filter bank outputs, sum = 1.0110; clutter filter bank outputs, sum = 1.2803.]

A bank of matched subspace detectors was thus generated, with each filter tuned
to a specific landmine type. The decision statistic chosen was the maximum value
across the bank of filters:

$$\Lambda = \max_i \lambda_i \qquad (4.7)$$

Despite not being optimal, we shall see that the performance of this statistic is very

good. Furthermore, the maximum value across the filter banks provides an intuitive

method to perform landmine classification - the landmine type corresponding to the

largest filter bank output is considered our best guess of the underlying landmine type.

This is different from the maximum a posteriori Bayesian solution which chooses Hi

to maximize p(Hi|x); here Hi is chosen to maximize the percent energy of x in $P_{H_i}$, which is an intuitive measure of p(x|Hi).
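The difference between summing and taking the maximum across the bank can be illustrated with a small sketch. It assumes one real-valued template vector per landmine type; the type names, templates and test vectors are hypothetical and chosen only so that the templates are not mutually orthogonal.

```python
def cosine_statistic(x, h):
    """Fraction of the energy of x along a single template vector h."""
    hx = sum(hi * xi for hi, xi in zip(h, x))
    hh = sum(hi * hi for hi in h)
    xx = sum(xi * xi for xi in x)
    return (hx * hx) / (hh * xx) if hh > 0.0 and xx > 0.0 else 0.0


def bank_outputs(x, templates):
    """Output of every filter in the bank for input vector x."""
    return {name: cosine_statistic(x, h) for name, h in templates.items()}


def max_detector(x, templates):
    """Decision statistic (largest output) and the type that produced it."""
    outputs = bank_outputs(x, templates)
    best_type = max(outputs, key=outputs.get)
    return outputs[best_type], best_type


# Hypothetical, non-orthogonal templates for three landmine types.
templates = {
    "type_A": [1.0, 0.0, 0.0],
    "type_B": [1.0, 1.0, 0.0],
    "type_C": [0.0, 1.0, 1.0],
}
mine = [3.0, 0.0, 0.0]      # scaled copy of the type_A template
clutter = [1.0, 1.0, 0.5]   # matches no template exactly

mine_sum = sum(bank_outputs(mine, templates).values())
clutter_sum = sum(bank_outputs(clutter, templates).values())
mine_max, mine_type = max_detector(mine, templates)
clutter_max, clutter_type = max_detector(clutter, templates)

print("sum:", mine_sum, clutter_sum)   # clutter has the larger sum
print("max:", mine_max, clutter_max)   # the mine has the larger maximum
print("classified mine as", mine_type)
```

In this toy case the clutter vector spreads moderate energy over several filters and wins on the sum, while the landmine vector wins on the maximum, and the index of the maximum doubles as the classification.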

Note that since equation 4.2 contains a normalization term in the denominator
($x^T x$), the maximum output of the detector is one, regardless of the energy of
the input vector. This is important in a bank of detectors
since the numerator ($x^T P_H x$) will very often produce a large result for a
high-energy input x.

The invariance that matched subspace detectors provide to gain occasionally has

some drawbacks. The primary drawback in this work stems from very low energy

clutter which often looks like deeply buried high-energy landmines. Consider the

VS-50 landmine (see fig. 4.1) which has a relatively high energy and rather flat fre-

quency response. A substantial amount of low energy clutter also has a flat frequency

response. As a result, scaled low energy clutter often looks like a VS-50 landmine

to a matched subspace detector.

However, our prior knowledge regarding the depths at which landmines can be

buried leads us to conclude that very low energy flat signatures are not landmines

buried meters in the ground but are instead small pieces of clutter. In this work we

assume that landmines will not be buried beyond their tactical depths. We further

assume that the distribution of landmine depths in the blind grid is uniform and com-

mensurate with the depths found in the calibration grid. Under these assumptions,

we implemented an energy pre-screener that evaluates the energy of each potential

target vector to ensure that it is commensurate with the current filter bank landmine

type (within one order of magnitude from the lowest and highest energies from the

calibration grid for that particular landmine type). If the energy is within limits, the

subspace detector proceeds normally, otherwise that particular bank of the subspace

detector (wherever the input energy was found to be outside the reasonable range of

energies for that landmine type) is manually assigned a low output value.
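A minimal sketch of such a pre-screener is given below. It assumes real-valued vectors and a single template per landmine type; the function names, the calibration energies and the choice of 0.0 as the "low output value" are all hypothetical, since the text does not specify them.

```python
def energy(x):
    """Energy of a response vector (sum of squared samples)."""
    return sum(v * v for v in x)


def energy_limits(calibration_energies, factor=10.0):
    """Allowed range: one order of magnitude beyond the calibration extremes."""
    return min(calibration_energies) / factor, max(calibration_energies) * factor


def cosine_statistic(x, h):
    """Fraction of the energy of x along a single template vector h."""
    hx = sum(hi * xi for hi, xi in zip(h, x))
    hh = sum(hi * hi for hi in h)
    xx = sum(xi * xi for xi in x)
    return (hx * hx) / (hh * xx) if hh > 0.0 and xx > 0.0 else 0.0


def prescreened_output(x, h, limits, low_value=0.0):
    """Filter output for one landmine type, gated by the energy pre-screener.

    low_value is an assumed stand-in for the "low output value" assigned
    when the input energy is implausible for this landmine type.
    """
    lowest, highest = limits
    if not (lowest <= energy(x) <= highest):
        return low_value
    return cosine_statistic(x, h)


# Hypothetical template and calibration energies for one landmine type.
template = [1.0, 2.0, 3.0, 4.0]
limits = energy_limits([30.0, 120.0])        # -> (3.0, 1200.0)

weak_lookalike = [0.01 * v for v in template]  # right shape, far too little energy
plausible_mine = [2.0 * v for v in template]   # right shape, plausible energy

print(prescreened_output(weak_lookalike, template, limits))
print(prescreened_output(plausible_mine, template, limits))
```

Without the gate both vectors would score near one, because the cosine statistic is invariant to gain; the gate is what separates them.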

Besides discriminating between clutter and landmines, detection algorithms must

also discriminate between empty ground signatures and landmines. While the blind

grid contains several blank squares containing neither anthropic clutter nor landmines,
no such squares were measured in the calibration grid, so our detector may be subject

to false alarms caused by empty grid squares. We did not consider this a serious prob-

lem because background-corrected responses from blank grid squares should contain

very little energy and be automatically rejected by the energy pre-screener.

4.4 Matched Subspace Results

To determine the effectiveness of our matched subspace detector in discriminating

landmines from clutter, receiver operating characteristic (ROC) curves were gener-

ated for the calibration and blind grids. The calibration grid ROCs were generated

manually, and the blind grid ROCs were generated by the government sponsor of the

test site. We expect our calibration lane ROCs to be very good since the filter was

trained on that data, while good results from the blind grid would be an indicator of

the algorithm's robustness.

Before sending our results to be scored, the algorithm was run on the calibration

grid to determine its effectiveness. Two separate detectors utilizing the in-phase and

quadrature data were created and tested. As can be seen in figure 4.5, the algorithm

performs significantly better on the quadrature data than on the in-phase data. In

fact, the in-phase results are not significantly better than a simple energy detector.

We believe the poor in-phase performance is due to the relatively high amount of

noise inherent in the in-phase readings. Alternatively, the in-phase data may be

more difficult to model as a linear combination of a set of vectors. Future efforts that

may improve the in-phase processor results are discussed in chapter 7.

[Figure 4.5: Comparison of in-phase and quadrature matched subspace receiver operating characteristics from the calibration grid. Axes: Pf (horizontal), Pd (vertical). Legend: quadrature MSS ROC; in-phase MSS ROC.]

Figure 4.6 shows the ROCs of the matched subspace filter operating on the blind
and calibration data as well as a simple baseline energy detector operating on the
blind grid data. The matched subspace detector is nearly as effective on the blind
grid as on the calibration grid. This indicates that the algorithm is fairly robust and
that the assumptions made regarding the interfering noise statistics are reasonable.

Further, we note the substantial decrease in the false alarm rate as compared to the

simple energy detector. The matched subspace detector achieves a false alarm rate of

11% (at a probability of detection of 95%) in the blind grid, which is an improvement

of over a factor of 6 versus the energy detector.

The major difference between the two matched subspace curves appears between

the 60% and 95% probability of detection range. We believe the difference between

the two curves here stems from the vast amount of clutter present in the blind grid

as compared to the calibration lanes. The smoothness of the blind-grid ROC stems

from the 800 or so pieces of clutter present therein, and the discrete-jump nature of

the calibration ROC stems from the 20 pieces of clutter found there.
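For readers who wish to reproduce curves of this kind, the following sketch shows one common way to build an empirical ROC from detector outputs and to read off the false alarm rate at a required probability of detection. It is not the scoring procedure used by the test site; the function names and all scores are hypothetical.

```python
def roc_points(target_scores, clutter_scores):
    """Empirical ROC: (Pf, Pd) pairs obtained by sweeping the threshold."""
    thresholds = sorted(set(target_scores) | set(clutter_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        pd = sum(s >= t for s in target_scores) / len(target_scores)
        pf = sum(s >= t for s in clutter_scores) / len(clutter_scores)
        points.append((pf, pd))
    return points


def pf_at_pd(points, required_pd):
    """Smallest false alarm rate at which Pd reaches required_pd."""
    return min(pf for pf, pd in points if pd >= required_pd)


# Hypothetical decision statistics for four landmines and five clutter items.
target_scores = [0.9, 0.8, 0.7, 0.4]
clutter_scores = [0.6, 0.5, 0.3, 0.2, 0.1]

points = roc_points(target_scores, clutter_scores)
print(points)
print(pf_at_pd(points, 0.95))
```

With N clutter samples, Pf can only move in steps of 1/N, which is why a curve built from 20 clutter items looks stepped while one built from roughly 800 looks smooth.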

[Figure 4.6: Comparison of quadrature matched subspace detector and baseline energy detector receiver operating characteristics from the blind and calibration grids. Plot title: Matched Subspace Detector ROCs in Calibration and Blind Grids. Axes: Pf (horizontal), Pd (vertical). Legend: calibration MSS ROC; blind MSS ROC; blind energy detector ROC.]

Chapter 5

Decay Rate Estimation

As discussed in chapter 1, a popular method of discriminating landmines from clutter

is through characteristic decay rate estimates. In this chapter we discuss why decay

rates may be useful for target identification and discrimination, the estimation proce-

dure that has been utilized, the relative locations of poles from the calibration lanes,

and a simple method of discrimination using Gaussian probability density functions.

5.1 Decay Rates

The EMI responses of a highly conducting body are given by equations 2.1 and 2.2

which are repeated here for convenience.

$$H(\omega) = a + \sum_n \frac{b_n\,\omega}{\omega - j\omega_n} \qquad (5.1)$$

$$S(t) = a\,\delta(t) + \sum_n A_n e^{-\omega_n t} \qquad (5.2)$$

There is a substantial amount of work in the literature pertaining to estimating

decay rates from time-domain signals (see [14, 15, 17, 18, 19, 53, 54]). Decay rates

have been investigated for several reasons. First, they provide a compact space with

which to model landmine responses. In our experiments two decay rates are used to

model a signal which has eighteen data points (9 in-phase and 9 quadrature), thus

the computational load on our detection algorithm is reduced (note that obtaining

these decay rates is, however, computationally very expensive). Furthermore, as has

been noted [37], the decay rates should be purely target dependent at the frequencies

the GEM-3 sensor is operating at. Arguments against using decay rates cite the

computational load required to estimate these parameters and the fact that decay

rates do not provide a sufficient statistic [15].

5.2 Estimation Procedure

In this work, we focused on estimating the two primary decay rates from our EMI

data. To estimate $\omega_1$ and $\omega_2$, an objective function was defined as
the mean-square error between our estimated responses and the data. The MATLAB
function FMINUNC (in the Optimization Toolbox) was then used to find the
five parameters that minimize this error for each landmine: the DC term a, two gains b1
and b2, and two decay rates $\omega_1$ and $\omega_2$.

Often (especially when modeling clutter), the algorithm used by FMINUNC could

not find potential solutions any significant distance from the initial values provided.

This may be due to a local minimum in the objective function near the initial guess.

In these cases (when the resulting parameters were deemed too close to the initial

guesses), the initial decay rates were varied over a wide range and the optimization

was carried out at each point. The resulting estimate with the lowest error was chosen

as the best estimate of the target decay rates.
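The idea of trying many starting decay rates and keeping the lowest-error fit can be sketched as follows. This is not the FMINUNC procedure used in this work: it swaps in an exhaustive grid over candidate decay-rate pairs and, at each pair, solves for a, b1 and b2 by linear least squares. The model form a + sum_n b_n w / (w - j w_n) is assumed here, and the frequencies, grid and parameter values are hypothetical.

```python
import math


def basis(omega, w1, w2):
    """Assumed model terms 1, w/(w - j*w1), w/(w - j*w2) at one frequency."""
    return (1.0 + 0.0j, omega / (omega - 1j * w1), omega / (omega - 1j * w2))


def solve3(G, r):
    """Solve a 3x3 real system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [ri] for row, ri in zip(G, r)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            return None  # singular system, e.g. indistinguishable basis terms
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, 3):
            f = A[row][col] / A[col][col]
            for k in range(col, 4):
                A[row][k] -= f * A[col][k]
    c = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        c[i] = (A[i][3] - sum(A[i][k] * c[k] for k in range(i + 1, 3))) / A[i][i]
    return c


def fit_fixed_rates(omegas, data, w1, w2):
    """Best real a, b1, b2 for fixed decay rates; returns (mse, [a, b1, b2])."""
    rows = [basis(w, w1, w2) for w in omegas]
    G = [[sum((p[i].conjugate() * p[j]).real for p in rows) for j in range(3)]
         for i in range(3)]
    r = [sum((p[i].conjugate() * y).real for p, y in zip(rows, data))
         for i in range(3)]
    c = solve3(G, r)
    if c is None:
        return math.inf, None
    errors = [abs(sum(ck * pk for ck, pk in zip(c, p)) - y) ** 2
              for p, y in zip(rows, data)]
    return sum(errors) / len(errors), c


def multi_start_fit(omegas, data, rate_grid):
    """Try every pair w1 < w2 from the grid and keep the lowest-error fit."""
    best = (math.inf, None, None)
    for i, w1 in enumerate(rate_grid):
        for w2 in rate_grid[i + 1:]:
            mse, c = fit_fixed_rates(omegas, data, w1, w2)
            if mse < best[0]:
                best = (mse, (w1, w2), c)
    return best


# Synthetic nine-frequency response generated from known, hypothetical parameters.
omegas = [1000.0 * 10 ** (k / 8) for k in range(9)]
true_a, true_b1, true_b2, true_w1, true_w2 = 5.0, 100.0, -40.0, 2000.0, 8000.0
data = [sum(c * p for c, p in zip((true_a, true_b1, true_b2),
                                  basis(w, true_w1, true_w2))) for w in omegas]
rate_grid = [500.0, 1000.0, 2000.0, 4000.0, 8000.0, 16000.0]

mse, rates, coeffs = multi_start_fit(omegas, data, rate_grid)
print(rates, coeffs, mse)
```

Because the model is linear in a, b1 and b2 once the decay rates are fixed, only the two decay rates need to be searched; a gradient-based optimizer could then refine the best grid point.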

The error in these models was very low across a wide range of mine energies.

Figures 5.1 and 5.2 show the parametrized fits to the data for one high-energy and

one low-energy landmine.

Since the estimated decay rates approximate the actual responses well and the

estimated response shapes are highly correlated, it is intuitive to suppose that the

decay rates estimated from different responses from the same landmine type would be

clustered together to some degree. Such clustering would indicate that the estimated

decay rates are drawn from some target dependent distribution and could facilitate

the formulation of a detector based on them.

[Figure 5.1: Estimation of VS-50 Response. Plot title: Estimated and Fitted VS50 Responses. Horizontal axis: log frequency. Fitted response error = 0.33042%. Legend: quadrature data; in-phase data; quadrature fit; in-phase fit.]

Several attempts were made to use clustering algorithms available in MATLAB to

group the different landmines automatically. However, the results obtained seemed

slightly counter-intuitive and did not take into account our a priori knowledge of

which estimates were from which landmine types. Figure 5.3 illustrates a clustering

of decay rates by landmine type made manually, and figure 5.4 provides a closeup of

the same figure.

Note the high degree of intra-mine type correlation. The majority of landmines for

a given type were grouped together. The only two instances where all the responses

from a particular landmine type were not grouped together were the M-14 HE / non-

HE landmines. In the calibration lanes, two of the M-14 landmines were measured

with their primary high-explosive fills present. This altered the responses enough to

warrant the separation of these M-14s from their counterparts (the difference between
HE and non-HE landmine responses is documented in [43]).

[Figure 5.2: Estimation of M-14 Response. Plot title: Estimated and Fitted M14 Responses. Horizontal axis: log frequency. Fitted response error = 0.047565%. Legend: quadrature data; in-phase data; quadrature fit; in-phase fit.]

The decay rate estimates for the clutter from the calibration grid are shown in
figure 5.5. As can be seen from the figure, the clutter decay rates are distributed
throughout the range of frequencies but are more densely concentrated at low values
of the first decay rate $\omega_1$.

5.3 Gaussian Models and Detection

One of the simplest approaches to incorporate the estimated decay rates into a detec-

tion algorithm is to model their statistical distribution with a 2-Dimensional Gaussian

probability density function and generate detectors based on these PDFs. By combining
the probability density functions for the different landmine types, a mixture of
Gaussian densities is formed. For each cluster of decay rates (clusters do not necessarily
represent all landmines of a given landmine type) the sample mean and variance
were calculated using standard techniques. However, with so few data points for each
landmine type, these estimates are suspect. For example, the calibration grid contains
only one instance of certain landmines. These solitary landmines are clustered
alone. To estimate the variance of their decay rate distribution functions, an estimate
of the average decay rate variance across landmine types was used. Also, no attempt
was made to estimate correlation matrices, since there was rarely enough
data to do so reliably. Contours of some of the resulting estimated
Gaussian distributions are shown in figure 5.6. The combination of the separate decay
rate PDFs results in a mixture of Gaussian PDFs across the range of $\omega_i$ values.

[Figure 5.3: Estimated landmine decay rates plotted against $\omega_1$ and $\omega_2$ in Hz. Each landmine type is represented by a different shape. Plot title: Clustering of Mine Decay Rate Estimates by Mine Type. Legend: VS50, TS50, M14, PMA3, VAL69, VS2.2, M19, TMA4, TM62P3, T72, TM46, V31.6.]

[Figure 5.4: Estimated landmine decay rates plotted against $\omega_1$ and $\omega_2$ in Hz (close-up). Each landmine type is represented by a different shape. Note the high degree of spatial correlation between landmines of each type. Plot title: Clustering of Mine Decay Rate Estimates by Mine Type. Legend: VS50, TS50, M14, PMA3, VAL69, VS2.2, M19, TMA4, TM62P3, T72, TM46, V31.6.]

We assumed that the clutter decay rates were totally random (uniform across

the range of frequencies) since we had little information to base any general clutter

model upon. Under this assumption the optimal detector for each landmine type is a

threshold on the mixture of Gaussian PDFs (or a monotonic function thereof). Since
we have estimated the means and variances of the landmine clusters, this decision
statistic is a GLRT. To make the detector capable of discerning between all landmine
types and clutter, we followed the filter bank procedure outlined in chapter 4.

Thus, our results could be used to discriminate between landmine types by choosing

the filter bank with the highest response to an estimated set of decay rates.
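The Gaussian bank described in this section can be sketched as follows. It assumes independent Gaussian models for the two decay rates (no correlation terms, as stated above), uses the log of the density as the monotonic function to threshold, and substitutes the average variance of the multi-point clusters for any single-point cluster. The cluster names and decay-rate values are hypothetical.

```python
import math


def fit_cluster(points):
    """Sample mean and per-axis sample variance of (w1, w2) estimates."""
    n = len(points)
    mean = [sum(p[d] for p in points) / n for d in range(2)]
    if n < 2:
        return mean, None  # variance cannot be estimated from one point
    var = [sum((p[d] - mean[d]) ** 2 for p in points) / (n - 1) for d in range(2)]
    return mean, var


def build_bank(clusters):
    """Fit every cluster; single-point clusters borrow the average variance."""
    fitted = {name: fit_cluster(pts) for name, pts in clusters.items()}
    known = [var for _, var in fitted.values() if var is not None]
    if not known:
        raise ValueError("at least one cluster needs two or more points")
    avg_var = [sum(v[d] for v in known) / len(known) for d in range(2)]
    return {name: (mean, var if var is not None else avg_var)
            for name, (mean, var) in fitted.items()}


def log_gaussian(point, mean, var):
    """Log of a two-dimensional Gaussian density with diagonal covariance."""
    return sum(-0.5 * math.log(2.0 * math.pi * var[d])
               - (point[d] - mean[d]) ** 2 / (2.0 * var[d]) for d in range(2))


def classify(point, bank):
    """Largest log-density across the bank and the cluster that produced it."""
    scores = {name: log_gaussian(point, m, v) for name, (m, v) in bank.items()}
    best = max(scores, key=scores.get)
    return scores[best], best


# Hypothetical decay-rate estimates (w1, w2) grouped by landmine cluster.
clusters = {
    "type_A": [(2000.0, 9000.0), (2200.0, 9500.0), (1900.0, 8800.0)],
    "type_B": [(6000.0, 30000.0), (6400.0, 31000.0)],
    "type_C": [(4000.0, 20000.0)],  # a solitary landmine
}
bank = build_bank(clusters)
score, label = classify((2100.0, 9100.0), bank)
print(label, score)
```

Thresholding `score` gives the detection decision, and `label` gives the landmine type, mirroring the filter bank used for the matched subspace detector.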

[Figure 5.5: Estimated clutter decay rates plotted against $\omega_1$ and $\omega_2$ in Hz. Note that the estimated decay rates for clutter objects are spread throughout a wide frequency range. Plot title: Clustering of Estimated Decay Rates for Clutter.]

5.4 Decay Rate Estimation Results

In this section we briefly discuss the ROC curves generated from the parameter based

detector discussed above. Figure 5.7 shows the ROC generated from the calibration

data.

Note that the algorithm does not achieve a 95% detection rate until its false alarm

rate approaches 35%, and the algorithm only achieves a 100% detection rate at a 45%

false alarm rate. Furthermore, we have good reason to believe that the detector will

n
