
ELSEVIER Mathematics and Computers in Simulation 42 (1996) 97-105


Parallel architecture of adaptive MTD processors ☆

Christo Kabakchiev, Vera Behar*

CICT Bulgarian Academy of Sciences, Acad. G. Bontchev Str. 25-A, Sofia, Bulgaria

Abstract

A parallel systolic structure of the adaptive moving target detector (AMTD) is described. The adaptive signal processing is realized in the whole area of observation by a set of systolic processors operating in parallel. The cost of systolic AMTD implementation (necessary number of processing elements and computational steps) is also evaluated.

1. Introduction

In 1978 Kung and Leiserson [1] introduced the term "systolic array". They proposed systolic arrays for applications with two important characteristics. First, these applications require high throughput and a large processing bandwidth. Second, they are supported by algorithms that can be implemented on arrays built from a few types of simple processing elements. Such algorithms repeatedly perform a few types of relatively simple operations that are common to many input data items. For this reason, digital signal processing algorithms are very well suited to implementation on systolic arrays. In the present paper we present and evaluate one optimal variant of a systolic architecture for digital signal processors intended for adaptive moving target detection (AMTD processors).

The Moving Target Detector (MTD) is a pulsed Doppler radar that uses the Doppler effect to select signals from targets with different radial velocities. We consider the latest adaptive version of the MTD processor (the AMTD processor), developed at the Lincoln Laboratory in 1985. This version is described in [3], where close attention is paid to the adaptive filter design and to the choice of a sufficient number of bits for representing the filter coefficients. In the present paper we address another difficult problem of AMTD implementation: the choice of an optimal computer architecture. The AMTD processor involves a bank of Doppler filters with adaptive weights. The general block diagram of this processor is shown in Fig. 1.

The filters of the bank are uniformly spaced over a frequency interval equal to the pulse repetition frequency, each tuned to a different Doppler frequency. To reduce the processing time, all filter weights can be computed in advance for several types of clutter environment and stored in computer memory. The selection of filter weights is controlled by a "range-azimuth" clutter map. The decision between target present and target absent is made by an adaptive signal threshold.

Fig. 1. Block diagram of AMTD processor (video from receiver, tapped delay line, filter weights estimation, Doppler filter tuned to $f_0$, square-law detectors, adaptive thresholding, clutter level estimation, computer memory).

☆ Supported in part by Grant TN No. 245/92 from the Bulgarian National Foundation for Scientific Investigations and developed at the Laboratory "Signal", Bulgarian Academy of Sciences.

* Corresponding author. Address: Inst. of Information Technologies, Acad. G. Bonchev Str., bl. 2, Sofia, Bulgaria.

0378-4754/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. PII S0378-4754(96)00058-4

Radar signal parameter estimation can be formulated as a problem in statistical decision theory. The class of possible radar signals (echoes) can be represented as points $S$ in a signal space. Each point in this space represents a waveform with a particular combination of radar signal parameters (Doppler shift, delay time, amplitude), corresponding to a particular combination of target parameters (distance, velocity, azimuth). In a similar way, a noise space can be defined whose points $n$ describe all possible waveform realizations of the noise process within the observation interval. Next, an observation space is defined whose points $X$ represent joint combinations of signal $S$ and noise $n$. A decision rule maps subsets of points of the observation space into points of the decision space.

The structure of signal processing applied to the waveform $X_{df}$ is shown in Fig. 2. The decision $D_{dfn}$ indicates the presence or absence of the desired signal $S_{dfn}$ in the observed waveform $X_{df}$ after it has passed through the filter tuned to frequency $f_n$. All points in the decision space, i.e. the decisions $D_{dfn}$, take only two values (1: signal present; 0: signal absent).

2. Systolic structure of AMTD processors

The most effective approach to speeding up moving target detection over a large radar observation space is to design a signal processor with maximum parallelism. Following this approach, the whole radar observation space is divided into $D$ resolution cells in range, $F$ resolution cells in azimuth and $N_f$ resolution cells in radial velocity. As a result, the whole radar observation space can be represented as a set of $D \times F \times N_f$ "range-azimuth-Doppler velocity" resolution cells, where the signal processing in a single resolution cell $A_{dfn}$ ($d = 1, \dots, D$; $f = 1, \dots, F$; $n = 1, \dots, N_f$) is performed by a single processor $SP_{dfn}$. Therefore the whole radar observation space can be processed by a set of identical processors operating in parallel. The number of such processors $N_{sp}$ is given by

$$N_{sp} = D \times F \times N_f. \tag{1}$$

Fig. 2. Signal processing in the $A_{df}$ resolution cell (bank of filters, square-law detector, CFAR threshold).

The processor $SP_{dfn}$ consists of two cascaded subprocessors, $SP^{DF}_{dfn}$ and $SP^{CFAR}_{dfn}$: the first performs signal filtering, the second signal detection and parameter estimation (see Fig. 3). Since all processors operate in parallel, the processing time for the whole radar observation space, $T^{obs}$, is

$$T^{obs} = T^{DF}_{dfn} + T^{CFAR}_{dfn}, \tag{2}$$

where $T^{DF}_{dfn}$ and $T^{CFAR}_{dfn}$ are the numbers of computational steps in the subprocessors $SP^{DF}_{dfn}$ and $SP^{CFAR}_{dfn}$, respectively. The running time of the processor $SP_{dfn}$ can therefore be minimized by designing optimal parallel architectures for both subprocessors.

2.1. Systolic structure of the processor $SP^{DF}_{dfn}$

According to [2,3], the sampled complex amplitude of the IF (intermediate frequency) signal observed at the input of the filter bank can be described by the vector $\mathbf{X}_{df} = (x_{df1}, \dots, x_{dfi}, \dots, x_{dfN_p})$, whose components are $x_{dfi} = x_{df}(iT) = a_{i1} + j a_{i2}$ ($T$: interpulse period; $N_p$: number of coherent pulses; $a_{i1}$, $a_{i2}$: real and imaginary components of the complex $x_{dfi}$).

Fig. 3. Structure of the processor $SP_{dfn}$ (includes memory blocks labelled MEM-W and MEM-Q).

The signal power at the output of the processor $SP^{DF}_{dfn}$ is given by $Z_{dfn} = Y_{dfn} Y^{*}_{dfn}$, where $Y_{dfn}$ is the output signal of the $n$th Doppler filter. It is given by $Y_{dfn} = \mathbf{X}_{df} \mathbf{W}_n^{T}$, where the filter coefficients (weights) of the $n$th Doppler filter are $\mathbf{W}_n = (w_{n1}, \dots, w_{ni}, \dots, w_{nN_p})$, with $w_{ni} = w_n(iT) = w_{ni1} + j w_{ni2}$.

According to [3], all filter coefficients can be computed in advance for several (for example, three) types of clutter environment: no clutter, weak clutter and strong clutter. It is assumed that the three variants of the vector $\mathbf{W}_n$ are computed in advance and stored in computer memory. Expanding the product in the expression for $Y_{dfn}$ (and dropping the filter index $n$ from the weights for brevity) gives

$$Y_{dfn} = \left( \sum_{i=1}^{N_p} a_{i1} w_{i1} - \sum_{i=1}^{N_p} a_{i2} w_{i2} \right) + j \left( \sum_{i=1}^{N_p} a_{i1} w_{i2} + \sum_{i=1}^{N_p} a_{i2} w_{i1} \right).$$

Denoting

$$R_{11} = \sum_{i=1}^{N_p} a_{i1} w_{i1}, \quad R_{22} = \sum_{i=1}^{N_p} a_{i2} w_{i2}, \quad R_{12} = \sum_{i=1}^{N_p} a_{i1} w_{i2}, \quad R_{21} = \sum_{i=1}^{N_p} a_{i2} w_{i1},$$

it can be rewritten as $Y_{dfn} = (R_{11} - R_{22}) + j(R_{12} + R_{21})$. Substituting this expression into the equation for $Z_{dfn}$ gives

$$Z_{dfn} = (R_{11} - R_{22})^2 + (R_{12} + R_{21})^2.$$
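The equivalence between the complex product and the four real inner products $R_{11}$, $R_{22}$, $R_{12}$, $R_{21}$ can be checked numerically. The following Python sketch is illustrative only and not part of the original design: the function names are hypothetical, and samples and weights are assumed to be passed as lists of (real, imaginary) pairs. It computes $Z_{dfn}$ both ways.

```python
import random


def filter_power(a, w):
    """Output power Z of one Doppler filter from the four real inner products.

    a: list of (a_i1, a_i2) pairs, real/imaginary parts of the samples x_dfi
    w: list of (w_i1, w_i2) pairs, real/imaginary parts of the weights w_ni
    """
    r11 = sum(ai1 * wi1 for (ai1, _), (wi1, _) in zip(a, w))
    r22 = sum(ai2 * wi2 for (_, ai2), (_, wi2) in zip(a, w))
    r12 = sum(ai1 * wi2 for (ai1, _), (_, wi2) in zip(a, w))
    r21 = sum(ai2 * wi1 for (_, ai2), (wi1, _) in zip(a, w))
    return (r11 - r22) ** 2 + (r12 + r21) ** 2


def filter_power_complex(a, w):
    """Reference: Z = |Y|^2 with Y = sum_i x_i * w_i (no conjugation, as in Y = X W^T)."""
    y = sum(complex(*ai) * complex(*wi) for ai, wi in zip(a, w))
    return abs(y) ** 2


if __name__ == "__main__":
    random.seed(0)
    n_p = 8  # hypothetical number of coherent pulses
    a = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_p)]
    w = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_p)]
    print(filter_power(a, w), filter_power_complex(a, w))
```

Only real multiplications and additions appear in `filter_power`, which is what makes the computation convenient for simple systolic processing elements.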

The linear systolic array of the processor $SP^{DF}_{dfn}$, intended for signal filtering in a single "range-azimuth-Doppler velocity" resolution cell, is shown in Fig. 4. It involves four types of processing elements with very simple logic, also presented in Fig. 4. One of them is the well-known processing element with internal accumulation. Note, however, that this structure requires every input data item to be fed in twice. The cost of this systolic implementation is given by the following expressions:

The number of processing elements is

$$N^{DF}_{dfn} = 7. \tag{3}$$

The number of computational steps is

$$T^{DF}_{dfn} = N_p + 3. \tag{4}$$

Another version of the systolic array of the processor $SP^{DF}_{dfn}$ is shown in Fig. 5. This structure combines rectangular and linear arrays and involves five types of processing elements with simple logic. The cost of this systolic implementation is given by the following expressions:

The number of processing elements is

$$N^{DF}_{dfn} = 4 N_p + 5. \tag{5}$$

The number of computational steps is

$$T^{DF}_{dfn} = N_p + 4. \tag{6}$$

Fig. 4. Systolic array of the processor $SP^{DF}_{dfn}$ (inputs: sample components $a_{i1}$, $a_{i2}$ and weight components $w_{i1}$, $w_{i2}$, $i = 1, \dots, N_p$, each stream entering the array twice; four types of processing elements with delay elements D; output $Z_{dfn}$).

2.2. Systolic structure of the processor $SP^{CFAR}_{dfn}$

Constant False Alarm Rate (CFAR) processors are widely used for detecting target echoes in a background of clutter of unknown intensity. In the basic concept, automatic target detection is implemented by comparing the voltage in each resolution cell with an adaptive threshold determined from a noise power estimate over adjacent range and/or Doppler resolution cells [2,4]. The decision rule for testing the two hypotheses $H_1$ ($D_{dfn} = 1$: target present) and $H_0$ ($D_{dfn} = 0$: target absent) in the $A_{dfn}$ resolution cell is

$$D_{dfn} = \begin{cases} 1, & Z_{dfn} \ge T \times P, \\ 0, & Z_{dfn} < T \times P, \end{cases}$$

where $Z_{dfn}$ is the voltage in the cell under test, $P$ is the noise power estimate and $T$ is the detection scale factor obtained from the equation

$$P_{fa} = \int_0^{\infty} \int_{TP}^{\infty} f(Z_{dfn} : H_0)\, f(P)\, dZ_{dfn}\, dP,$$

where $f(Z_{dfn} : H_0)$ is the noise pdf in the cell under test, $f(P)$ is the pdf of the noise power estimate, and $P_{fa}$ is the false alarm probability to be maintained. The OS CFAR processor proposed by Rohling [4] estimates the noise power simply by selecting the $k$th smallest sample (the sample of rank $k$ in ascending order) in a reference window of size $N$. The OS CFAR processor suffers only a minor degradation in detection probability (in an exponential, homogeneous noise background) and resolves closely spaced targets effectively when $k$ approaches its maximum value.

Fig. 5. Systolic array of the processor $SP^{DF}_{dfn}$.
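For the OS CFAR detector in a homogeneous background of exponentially distributed (square-law detected) noise, the integral above reduces to the closed form $P_{fa} = \prod_{i=0}^{k-1} (N - i)/(N - i + T)$ known from Rohling's analysis [4], which decreases monotonically in $T$. The Python sketch below assumes that noise model and solves for the scale factor $T$ by bisection; the function names and the chosen $P_{fa}$ value are hypothetical.

```python
def os_cfar_pfa(t, n, k):
    """False-alarm probability of OS CFAR in homogeneous exponential noise.

    Pfa(T) = prod_{i=0}^{k-1} (N - i) / (N - i + T)   (assumed noise model)
    """
    pfa = 1.0
    for i in range(k):
        pfa *= (n - i) / (n - i + t)
    return pfa


def os_cfar_scale_factor(pfa_target, n, k, tol=1e-10):
    """Solve Pfa(T) = pfa_target for T by bisection (Pfa decreases monotonically in T)."""
    lo, hi = 0.0, 1.0
    while os_cfar_pfa(hi, n, k) > pfa_target:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if os_cfar_pfa(mid, n, k) > pfa_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    t = os_cfar_scale_factor(1e-6, n=24, k=22)  # N, k of Sample 1 in Table 1; Pfa is hypothetical
    print(t, os_cfar_pfa(t, 24, 22))
```

Because $P_{fa}$ does not depend on the actual noise level, $T$ can be computed once offline and stored, which is consistent with the constant-false-alarm property.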

The general block diagram of this processor is shown in Fig. 6. The OS CFAR processor involves a sorting procedure, which ranks all voltages in the reference window in order of increasing magnitude:

$$Y_1 \le Y_2 \le \dots \le Y_k \le \dots \le Y_N,$$

where $N$ is the size of the reference window. The estimate is then formed as $P = Y_k$. The main practical difficulty of the OS CFAR implementation is the time-consuming sorting procedure. There is a large set of relatively fast sorting algorithms, namely HeapSort, QuickSort, Odd-Even Transposition Sort, Bucket Sort, Counting Sort, Odd-Even Merging, etc. [5]. Most of these algorithms can be parallelized on a systolic architecture. However, we believe that a more effective solution is to develop special adaptations of sorting algorithms that exploit all prior information about the input sequence to be sorted and about the OS CFAR parameters. Such information can include the size of the reference window ($N$), the rank of the element to be selected ($k$) and the false alarm probability to be maintained ($P_{fa}$). According to Rohling, practical values of $k$ and $N$ are usually chosen to satisfy the condition

$$k \ge \tfrac{3}{4} N, \quad N \ge 24. \tag{7}$$

Fig. 6. Block diagram of OS CFAR processor (range tapped delay line, SORT and SELECT $k$th reference cell, estimated clutter power $P$, OS scale factor $T$, threshold $S = T \times P$, comparator producing the target decision $D_{dfn}$).
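The OS CFAR decision for one cell can be sketched in a few lines of Python. This is a plain sequential illustration, not the systolic implementation; it assumes square-law detected reference values, uses the ascending ranking $Y_1 \le \dots \le Y_N$ given above, and the function name is hypothetical.

```python
def os_cfar_detect(z_cut, reference, k, t):
    """OS CFAR decision for one cell under test (illustrative sketch).

    z_cut:     square-law output Z_dfn of the cell under test
    reference: the N reference-window values (any order)
    k:         rank of the order statistic used as noise power estimate (1-based)
    t:         detection scale factor T
    Returns 1 (target present) if Z >= T * P, else 0, where P = Y_k.
    """
    ranked = sorted(reference)  # Y_1 <= Y_2 <= ... <= Y_N
    p = ranked[k - 1]           # P = Y_k, the k-th smallest value
    return 1 if z_cut >= t * p else 0
```

In the systolic processor, the `sorted` call is exactly the costly step that the sorting procedure discussed next is designed to replace.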

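Using this condition, we proposed in [6] the following practical sorting algorithm. It consists of the following computational steps:

(1) Split the input vector $Y$ into $M$ subvectors of size $L$ satisfying

$$L = N / M \quad \text{and} \quad L \ge N - k + 1. \tag{8}$$

(2) Sort all subvectors in parallel in order of decreasing magnitude.
(3) Form new subvectors $Z$ from the largest $(N - k + 1)$ elements of each subvector of $Y$.
(4) Merge the subvectors $Z$ into a single vector, denoted $\mathbf{P}$.
(5) Sort the vector $\mathbf{P}$ in order of decreasing magnitude.
(6) Take the $(N - k + 1)$th element of the sorted vector as the estimate $P$.

The procedure is exact because the $k$th smallest value of the window is its $(N - k + 1)$th largest value, and each of the $N - k + 1$ largest values of the window is necessarily among the $N - k + 1$ largest values of its own subvector.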
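The six steps can be sketched sequentially in Python; the loop over subvectors stands in for the processors that work in parallel, and the function name is hypothetical. Whenever Eq. (8) holds, it returns the same value as a full sort of the window.

```python
def os_select_m_split(y, k, m):
    """Return the k-th smallest value of y via the M-splitting procedure (sketch)."""
    n = len(y)
    r = n - k + 1  # number of largest values kept from each subvector
    if n % m != 0 or n // m < r:
        raise ValueError("Eq. (8) requires L = N/M to be an integer with L >= N - k + 1")
    size = n // m
    candidates = []
    for j in range(m):
        # step (1): split; steps (2)-(3): sort each subvector, keep its r largest
        part = sorted(y[j * size:(j + 1) * size], reverse=True)
        candidates.extend(part[:r])  # step (4): merge the Z subvectors
    candidates.sort(reverse=True)    # step (5): sort merged vector in decreasing order
    return candidates[r - 1]         # step (6): P = P_{N-k+1}


if __name__ == "__main__":
    window = [(i * 41) % 101 for i in range(24)]  # hypothetical reference window
    print(os_select_m_split(window, k=22, m=4), sorted(window)[21])
```

The final sort handles only $M(N - k + 1)$ values instead of $N$, which is where the savings come from when $k$ is close to $N$.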

We realized this algorithm as a sorting network based on the odd-even transposition sort, which is well suited to systolic implementation. The general block diagram of the OS CFAR processor built on this $M$-splitting procedure is shown in Fig. 7.
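A software simulation of an odd-even transposition sorting network is sketched below (illustrative only; the function name is hypothetical). For $L$ inputs it uses $L$ stages and $L(L-1)/2$ compare-exchange elements in total; these counts appear to be the building blocks of Eqs. (9) and (10).

```python
def odd_even_transposition_sort(values, descending=True):
    """Simulate an odd-even transposition sorting network on a list.

    The network for L inputs has L stages. Even-numbered stages compare the
    neighbour pairs (0,1), (2,3), ...; odd-numbered stages compare (1,2), (3,4), ...
    All compare-exchange elements of one stage can work simultaneously, so one
    stage corresponds to one computational step of a systolic array.

    Returns (sorted list, number of compare-exchange elements, number of stages).
    """
    x = list(values)
    length = len(x)
    elements = 0
    for stage in range(length):
        for i in range(stage % 2, length - 1, 2):
            elements += 1
            # compare-exchange: larger value first when descending, smaller first otherwise
            out_of_order = x[i] < x[i + 1] if descending else x[i] > x[i + 1]
            if out_of_order:
                x[i], x[i + 1] = x[i + 1], x[i]
    return x, elements, length


if __name__ == "__main__":
    print(odd_even_transposition_sort([0.3, 2.1, 0.7, 1.5, 0.2, 0.9]))
```

Each compare-exchange element only talks to its immediate neighbours, which is why this method maps naturally onto a systolic array.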

Fig. 7. Structure of the processor $SP^{CFAR}_{dfn}$.

At the first stage of sorting, all voltages in the reference window are split into $M$ vectors $Y_1, \dots, Y_M$. Each vector must contain at least $(N - k + 1)$ elements (Eq. (8)). The vectors $Y_1, \dots, Y_M$ are then sorted in order of decreasing magnitude by the processors $SP^{CFAR}_1, \dots, SP^{CFAR}_M$ operating in parallel. The first $(N - k + 1)$ elements of the sorted vectors form the vectors $Z_1, Z_2, \dots, Z_M$. At the second stage of sorting, the vectors $Z_1, Z_2, \dots, Z_M$ are merged into a new vector, which is sorted in decreasing order by the processor $SP^{CFAR}_{M+1}$. The result of sorting is the vector $\mathbf{P}$, i.e. $P_1 \ge P_2 \ge \dots \ge P_{N-k+1} \ge \dots \ge P_{M(N-k+1)}$. The power estimate $P$ is taken to be the value $P_{N-k+1}$. Each of the processors $SP^{CFAR}_1, \dots, SP^{CFAR}_{M+1}$ is a systolic network based on the odd-even transposition sort. This systolic array is shown in Fig. 8. The logic of the processing elements PE1 and PE2 is shown in Fig. 9 and described by the following expressions:

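PE1:

$$x^{(1)}_{out} = \begin{cases} x^{(1)}_{in}, & x^{(1)}_{in} \le x^{(2)}_{in}, \\ x^{(2)}_{in}, & \text{otherwise}, \end{cases} \qquad x^{(2)}_{out} = \begin{cases} x^{(1)}_{in}, & x^{(1)}_{in} > x^{(2)}_{in}, \\ x^{(2)}_{in}, & \text{otherwise}. \end{cases}$$

PE2:

$$z_{out} = \begin{cases} 1, & z^{(1)}_{in} \ge z^{(2)}_{in} \cdot z^{(3)}_{in}, \\ 0, & \text{otherwise}. \end{cases}$$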

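The number of processing elements PE1 needed for sorting alone is

$$N^{(M)}_{PE1} = \frac{N (N/M - 1)}{2} + \frac{M(N - k + 1)\,[M(N - k + 1) - 1]}{2}. \tag{9}$$

The number of computational steps needed for sorting alone is

$$N^{(M)}_{T} = N/M + M(N - k + 1). \tag{10}$$

For $M = 1$ the whole reference window is sorted in a single stage, so only the first term of Eqs. (9) and (10) applies; this gives the $M = 1$ entries of Table 1.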

The cost of the systolic implementation of the sort network, evaluated for several values of $M$ and two reference window sizes, is shown in Table 1. The results show that an optimal splitting of the reference window into subwindows substantially reduces the number of processing elements (from 276 to 126 for $N = 24$, about 54%, and from 496 to 178 for $N = 32$, about 64%) and minimizes the running time of the systolic array of the OS CFAR processor (for example, $M = 4$ for both samples). By varying the basic OS CFAR parameters ($N$, $M$ and $k$), the optimal systolic architecture of the OS CFAR processor $SP^{CFAR}_{dfn}$ can be designed.

Fig. 8. Systolic structure of the processors $SP^{CFAR}_1, \dots, SP^{CFAR}_{M+1}$ (a network of PE1 elements with inputs $Y_i$ and outputs $Z_i$).

Fig. 9. Structure of processing elements PE1 and PE2.

Table 1. Cost of systolic implementation of a sort network

| Reference window | M | Number of PE1, $N^{(M)}_{PE1}$ | Number of steps, $N^{(M)}_T$ |
|---|---|---|---|
| Sample 1: N = 24, k = 22 | 1 | 276 | 24 |
| | 2 | 147 | 18 |
| | 4 | 126 | 18 |
| Sample 2: N = 32, k = 30 | 1 | 496 | 32 |
| | 2 | 255 | 22 |
| | 4 | 178 | 20 |
| | 8 | 324 | 28 |
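The entries of Table 1 follow directly from Eqs. (9) and (10). The Python sketch below (hypothetical function name) recomputes them, using only the first stage when $M = 1$.

```python
def sort_network_cost(n, k, m):
    """Number of PE1 elements (Eq. (9)) and steps (Eq. (10)) of the sort network.

    n: reference window size N, k: rank of the selected sample, m: number of subwindows M.
    With m == 1 the window is sorted in one stage, so the second terms are dropped.
    """
    if n % m != 0 or n // m < n - k + 1:
        raise ValueError("Eq. (8) is not satisfied")
    length = n // m              # L, size of each subwindow
    merged = m * (n - k + 1)     # size of the vector sorted by SP_{M+1}
    pe1 = n * (length - 1) // 2  # M first-stage networks of L(L-1)/2 elements each
    steps = length
    if m > 1:
        pe1 += merged * (merged - 1) // 2
        steps += merged
    return pe1, steps


if __name__ == "__main__":
    for n, k, ms in ((24, 22, (1, 2, 4)), (32, 30, (1, 2, 4, 8))):
        for m in ms:
            print(f"N={n} k={k} M={m}: PE1, steps = {sort_network_cost(n, k, m)}")
```

For example, `sort_network_cost(32, 30, 8)` returns `(324, 28)`, the last row of the table.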

3. Conclusions

The cost effectiveness of the systolic array of the AMTD processor can be evaluated both for a single "range-azimuth-Doppler velocity" resolution cell and for the whole radar observation space. The number of processing elements needed for signal processing in the systolic array of the AMTD processor is computed as follows:

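- in the $A_{dfn}$ "range-azimuth-Doppler velocity" resolution cell:

$$N_{dfn} = \begin{cases} 8 + N^{(M)}_{PE1} & \text{(array of Fig. 4, Eq. (3))}, \\ 4N_p + 6 + N^{(M)}_{PE1} & \text{(array of Fig. 5, Eq. (5))}; \end{cases} \tag{11}$$

- in the whole radar observation space:

$$N_{obs} = N_{sp} N_{dfn}. \tag{12}$$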

The number of processors $N_{sp}$ is given by Eq. (1). Finally, the number of computational steps needed by the AMTD processor for signal processing in the whole radar observation space is

$$T^{obs} = T_{dfn} = \begin{cases} N_p + 5 + N^{(M)}_T & \text{(array of Fig. 4, Eq. (4))}, \\ N_p + 6 + N^{(M)}_T & \text{(array of Fig. 5, Eq. (6))}. \end{cases} \tag{13}$$

The two variants of $N_{dfn}$ and $T^{obs}$ correspond to the two systolic arrays of the processor $SP^{DF}_{dfn}$ described by Eqs. (3)-(6).
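The overall cost can be evaluated for concrete parameters with the Python sketch below. It is illustrative only: the function name, the `filter_array` switch and the example values of $D$, $F$, $N_f$ and $N_p$ are hypothetical, and it assumes that the two filter-array variants contribute $8 + N^{(M)}_{PE1}$ or $4N_p + 6 + N^{(M)}_{PE1}$ processing elements and $N_p + 5 + N^{(M)}_T$ or $N_p + 6 + N^{(M)}_T$ steps per cell, with $N^{(M)}_{PE1}$ and $N^{(M)}_T$ taken from Eqs. (9) and (10).

```python
def amtd_cost(d, f, n_f, n_p, n, k, m, filter_array="linear"):
    """Total PE count and step count of the systolic AMTD processor (sketch).

    filter_array: "linear" (Fig. 4, Eqs. (3)-(4)) or "rect" (Fig. 5, Eqs. (5)-(6)).
    Assumption: per-cell costs as in Eqs. (11) and (13).
    """
    r = n - k + 1
    length = n // m
    pe1 = n * (length - 1) // 2  # Eq. (9), first stage
    n_t = length                 # Eq. (10), first stage
    if m > 1:                    # second stage only when the window is split
        merged = m * r
        pe1 += merged * (merged - 1) // 2
        n_t += merged
    if filter_array == "linear":
        n_dfn, t_obs = 8 + pe1, n_p + 5 + n_t
    elif filter_array == "rect":
        n_dfn, t_obs = 4 * n_p + 6 + pe1, n_p + 6 + n_t
    else:
        raise ValueError("filter_array must be 'linear' or 'rect'")
    n_sp = d * f * n_f           # Eq. (1)
    return n_sp * n_dfn, t_obs   # Eq. (12) and running time


if __name__ == "__main__":
    print(amtd_cost(100, 10, 8, n_p=8, n=24, k=22, m=4))
```

With $D = 100$, $F = 10$, $N_f = 8$, $N_p = 8$, $N = 24$, $k = 22$ and $M = 4$, the linear variant gives 1,072,000 processing elements and 31 steps. The count grows linearly with the number of resolution cells while the running time does not depend on it, which is the trade-off of full parallelism.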

We believe that, by varying the basic AMTD parameters (number of coherent pulses $N_p$, size of the reference window $N$, number of subwindows $M$, rank $k$ of the selected sample in the reference window, etc.), a compromise can be found between the cost of the systolic implementation of the AMTD processor and the required quality of moving target detection. Finally, it should be noted that the systolic architectures of the AMTD processor obtained in this way are well suited to VLSI technology.

References

[1] H. Kung and C. Leiserson, Sparse Matrix Proc. (Academic Press, Orlando, FL, 1978) 256-282.
[2] D. Barton, Modern Radar System Analysis (Artech House, 1988) 255-260.
[3] E. D'Addio and G. Galati, IEE Proc. 132 (1) (1985) 58-65.
[4] H. Rohling, IEEE Trans. AES-19 (4) (1983) 608-621.
[5] S. G. Akl, Parallel Sorting Algorithms (Academic Press, Canada, 1985) 41-47.
[6] V. Behar and Chr. Kabakchiev, Proc. ECCTD'93 (Davos, Switzerland, 1993) 981-984.
