Unsupervised clustering of PolSAR data using Polarimetric G Distribution and Markov Random Fields

Salman Khan, Surrey Space Centre, University of Surrey, [email protected], United Kingdom

Anthony Paul Doulgeris, University of Tromsø, [email protected], Norway

Abstract

In this paper, an unsupervised PolSAR data clustering algorithm utilizing the flexible polarimetric G distribution is proposed for the first time. This algorithm has been demonstrated in earlier contributions using other non-Gaussian distributions such as the K, G0, and U distributions. The K and G0 distributions suffer from limited modeling capability because they have only one shape parameter, while the U distribution, although as flexible as the G model, has a very cumbersome probability density function, making its software implementation difficult and its computation slow. The proposed algorithm with the G distribution has a non-Gaussian modeling accuracy similar to that of the U model, a more easily implementable probability density function, and a much faster computation time. The only disadvantage is that the log cumulants of the G model are computable only by numerical differentiation, and hence fractional moment estimators are used in this analysis.

1 Introduction

Synthetic Aperture Radar (SAR) data are of significant interest because SAR sensors operate independently of weather and lighting conditions. These data offer a viable alternative in situations where other sensors (e.g. optical) are blocked by cloud cover, rain, or smoke, or are limited by light conditions. The recent advent of high resolution SARs now provides data with sub-meter resolution comparable to that of some optical sensors. Further, the availability of polarimetric SAR (PolSAR) data provides more diverse information through the different transmit and receive polarization pairs. This helps in differentiating between the physical scattering mechanisms occurring at the target of interest.

SAR data are inherently statistical due to the presence of speckle, which is a characteristic phenomenon of a coherent imaging system. Therefore, the analysis of such data inevitably takes a probabilistic approach. Gaussian statistics model low resolution SAR data reasonably well; however, when the resolution increases and the central limit theorem is not strictly satisfied, non-Gaussian statistics are observed. Consequently, many non-Gaussian probability models have been used to describe both single-channel and PolSAR data. For multilook PolSAR data, which is the format analyzed in this research, the underlying Gaussian statistics are modeled by the scaled complex Wishart distribution, sWd, while the non-Gaussian statistics are derived using the product model [1], which states that the backscattered signal results from the product of a Gaussian speckle noise random variate and a positive scalar texture random variable.
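For multilook covariance matrices, the product model described above can be written compactly as follows. The symbols τ (texture) and W (speckle matrix), and the unit-mean normalization of τ, are notation assumed here for illustration; the normalization is chosen to agree with the moment expression in Eq. (2).

```latex
\mathbf{C} = \tau\,\mathbf{W}, \qquad
\mathbf{W} \sim s\mathcal{W}_d(L, \boldsymbol{\Sigma}), \qquad
\tau > 0, \qquad E\{\tau\} = 1,
\qquad \tau \text{ independent of } \mathbf{W}.
```

A constant texture (τ = 1) gives back the sWd model, while a generalized inverse Gaussian texture with shape parameters α and ω leads to the Gd distribution discussed in Section 3.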

The non-Gaussian multilook polarimetric Kd [2] and G0d [3] distributions are relatively more flexible (one texture parameter each), and successfully model many PolSAR scenes. However, it has been noted that more modeling flexibility is sometimes needed for real PolSAR data. In this regard, the multilook polarimetric Ud [4] and Gd [3] distributions are more flexible, with two texture shape parameters each, and have the sWd, Kd, and G0d distributions as special cases. In fact, the modeling flexibility of the Ud and Gd distributions is very similar, as shown recently in [5].

In many applications, clustering or segmentation of SAR data is of interest. These include land monitoring, mapping, change detection, damage assessment and detection, and rescue and recovery operations. Some recent algorithms are presented in [4, 6–10]. The clustering algorithm of interest in this paper is a modified version of the unsupervised expectation maximization (EM) algorithm. It was proposed by Doulgeris et al. in [7] and later extended in [10] to incorporate contextual smoothing through the use of Markov random fields (MRF). The algorithm uses one of the aforementioned probability distributions as the underlying statistical model, and has recently been proposed with the flexible Ud distribution in [11]. However, the pdf of the Ud distribution is computationally challenging as it involves Kummer-U functions, which do not have readily available logarithmic implementations and are only computable through numerical integration. This is the reason for the slow computation time of this segmentation algorithm, as noted in [11]. In contrast, the similarly flexible Gd distribution pdf contains modified Bessel functions of the second kind, which have stable and well tested logarithmic implementations in the GNU Scientific Library (GSL) [12]. It is therefore expected to be computationally faster with a similar modeling capability. This is exactly the motivation behind the current research.

The rest of the paper is organized as follows. Section 2 gives a brief overview of the clustering algorithm. The Gd distribution and its estimator are presented in Section 3. Section 4 presents the application of the algorithm to PolSAR data. Section 5 discusses the numerical inaccuracies observed during parameter estimation, while Section 6 lists some conclusions and possible future study.

EUSAR 2014

ISBN 978-3-8007-3607-2 / ISSN 2197-4403 © VDE VERLAG GMBH ∙ Berlin ∙ Offenbach, Germany

2 Clustering Algorithm

The clustering algorithm is developed for multilook polarimetric data available in the form of polarimetric covariance matrices. It is also assumed that the scalar product model is valid, and that the multilooking procedure is a simple box-car averaging of single-look scattering vectors. Currently, the equivalent number of looks (ENL) is estimated only once from a homogeneous area in the image and is used throughout the clustering algorithm, although this could be replaced by an adaptive ENL estimation.
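The box-car multilooking assumed above can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB code: the function name `boxcar_multilook`, the array layout, and the choice of non-overlapping windows are all assumptions made here.

```python
import numpy as np

def boxcar_multilook(scattering, win):
    """Average outer products s s^H over non-overlapping win x win blocks.

    scattering: complex array of shape (rows, cols, d) holding one
    single-look scattering vector per pixel (layout assumed here).
    Returns an array of shape (rows // win, cols // win, d, d).
    """
    rows, cols, d = scattering.shape
    r, c = rows // win, cols // win
    s = scattering[: r * win, : c * win]  # crop to a whole number of blocks
    # Single-look covariance estimate s s^H for every pixel.
    outer = np.einsum("hwi,hwj->hwij", s, s.conj())
    # Group pixels into win x win blocks and average inside each block.
    blocks = outer.reshape(r, win, c, win, d, d)
    return blocks.mean(axis=(1, 3))
```

With a `win` × `win` window the nominal number of looks is `win**2`; the equivalent number of looks estimated from a homogeneous area is usually lower because neighbouring pixels are correlated.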

The clustering algorithm uses the method of multivariate fractional moments (MoMFM), recently proposed in [13], to estimate the texture shape parameters of the Gd distribution within the expectation maximization framework. It is pertinent to introduce the two-dimensional matrix log cumulants (2D-MLC) diagram here. Figure 1 shows the 2D-MLC diagram, depicting the color-coded manifolds spanned by the theoretical MLCs of several matrix-variate compound distributions. The dimension of each manifold is equal to the number of texture parameters present in the compound pdf. The two texture shape parameters of the Gd distribution only need to be estimated when the sample matrix log cumulants fall within the Gd distribution manifold in the 2D-MLC diagram. In this case MoMFM is used to estimate the texture shape parameters. Outside this domain, only one texture parameter needs to be estimated, which corresponds to the texture parameter of the Kd or G0d distribution depending on whether the sample matrix log cumulants fall on the left or the right side of the Gd distribution manifold, respectively. In this case the corresponding method of matrix log cumulants (MoMLC) is used for texture shape parameter estimation [14].
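To place a cluster in the 2D-MLC diagram, its sample matrix log cumulants are needed. The sketch below assumes the definition used in [14], where the matrix log cumulants are the cumulants of ln|C|, and estimates them with standard unbiased k-statistics. The function names are hypothetical, and the test of whether a point lies inside the Gd manifold is not shown, since the manifold boundary has to be evaluated numerically.

```python
import numpy as np
from scipy.special import polygamma

def sample_mlc(C):
    """Sample matrix log cumulants (kappa1, kappa2, kappa3) of a stack of
    Hermitian positive definite matrices C with shape (n, d, d)."""
    _, logdet = np.linalg.slogdet(C)      # ln|C| for every matrix
    n = logdet.size
    k1 = logdet.mean()
    c = logdet - k1
    k2 = (c ** 2).sum() / (n - 1)                     # unbiased variance
    k3 = n * (c ** 3).sum() / ((n - 1) * (n - 2))     # unbiased 3rd cumulant
    return k1, k2, k3

def wishart_mlc(L, d, order):
    """Theoretical kappa_order (order >= 2) of the scaled complex Wishart
    distribution: the sum of polygamma(order - 1, L - i) for i = 0..d-1."""
    return sum(polygamma(order - 1, L - i) for i in range(d))
```

The Wishart values give the reference point of the diagram (texture-free data); a sample whose (κ3, κ2) pair lies away from that point indicates texture.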

Figure 1: Manifolds of different models in the matrix log cumulants diagram (horizontal axis κ3, vertical axis κ2; labelled regions: U/G, W, M, K, G0, Wis). The U and G models have the same manifold. The W and M models are currently considered invalid and ignored.

The algorithm separates the image pixels into clusters based on the Gd distribution. It uses a modified version of the iterative expectation maximization algorithm as detailed in [7], and contextual smoothing is achieved with an MRF approach, which integrates the Gd distribution to model the statistics of each image cluster and a Potts model for the spatial context.
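A Potts prior rewards a pixel for sharing its label with its neighbours. The sketch below shows one common way to combine such a prior with per-class log-likelihoods. The 8-pixel neighbourhood, the smoothing weight `beta`, and both function names are assumptions made for illustration; the exact formulation used in the algorithm is given in [10].

```python
import numpy as np

def potts_neighbor_counts(labels, n_classes):
    """counts[h, w, k] = number of 8-neighbours of pixel (h, w) with label k."""
    rows, cols = labels.shape
    onehot = np.eye(n_classes)[labels]                   # (rows, cols, K)
    padded = np.pad(onehot, ((1, 1), (1, 1), (0, 0)))    # zero border
    counts = np.zeros_like(onehot)
    for dh in (-1, 0, 1):
        for dw in (-1, 0, 1):
            if dh == 0 and dw == 0:
                continue                                 # skip the pixel itself
            counts += padded[1 + dh : 1 + dh + rows, 1 + dw : 1 + dw + cols]
    return counts

def mrf_posteriors(log_lik, labels, beta):
    """Class posteriors proportional to exp(log_lik + beta * neighbour count).

    log_lik: (rows, cols, K) per-class log-likelihoods of each pixel.
    labels:  (rows, cols) current integer labels defining the context.
    """
    log_post = log_lik + beta * potts_neighbor_counts(labels, log_lik.shape[-1])
    log_post -= log_post.max(axis=-1, keepdims=True)     # avoid overflow
    post = np.exp(log_post)
    return post / post.sum(axis=-1, keepdims=True)
```

Setting `beta = 0` recovers the purely statistical posteriors, while larger values give smoother label maps.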

The goodness-of-fit (GoF) testing in the algorithm is performed using Pearson's chi-squared GoF test instead of the matrix log cumulants based GoF procedures used in [11]. The primary reason for not using matrix log cumulants based GoF testing is that the theoretical matrix log cumulants of the Gd distribution do not have closed forms and can only be computed using numerical differentiation. Pearson's GoF test is performed by comparing the fitted model to the histogram of the determinants of the multilook polarimetric covariance matrices.
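The mechanics of Pearson's test can be illustrated on scalar data. In the sketch below a frozen SciPy distribution stands in for the fitted model of the determinants, because the corresponding Gd-based model is not available in closed form here; the function name, the equiprobable binning, and the default of 20 bins are assumptions, not details from the paper.

```python
import numpy as np
from scipy import stats

def pearson_gof(samples, model, n_bins=20, n_fitted=0):
    """Pearson chi-squared goodness-of-fit of samples to a frozen SciPy
    distribution, using bins that are equiprobable under the model.

    n_fitted is the number of model parameters estimated from the same
    samples; it reduces the degrees of freedom.
    Returns (statistic, p_value).
    """
    samples = np.asarray(samples)
    # Interior bin edges at model quantiles: equal expected count per bin.
    interior = model.ppf(np.arange(1, n_bins) / n_bins)
    observed = np.bincount(np.searchsorted(interior, samples), minlength=n_bins)
    expected = samples.size / n_bins
    statistic = ((observed - expected) ** 2 / expected).sum()
    dof = n_bins - 1 - n_fitted
    return statistic, stats.chi2.sf(statistic, dof)
```

A cluster would be flagged as a poor fit when the p-value drops below one minus the chosen confidence level, for example below 0.05 at 95% confidence.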

The GoF of each cluster is used to automatically determine the number of significant clusters within the dataset. Poorly fitting clusters are split into two clusters and the EM algorithm is re-applied until convergence. The algorithm stops when all clusters are considered good fits to the data histograms. Consistent initialisation is achieved by always starting with one cluster. This results in as many statistically distinct classes as allowed by the chosen underlying pdf, the number of data samples, and the chosen confidence level, e.g., 95%. The algorithm optionally includes adaptive sensitivity and sub-sampling, as explained in [7].

3 The Gd Distribution

The multilook polarimetric Gd distribution was initially proposed in [3]. It has two texture shape parameters, α ∈ ℝ and ω > 0. When ω → 0+, it reduces to the Kd or G0d distribution if α is positive or negative, respectively. When |α| → ∞ or ω → ∞, it reduces to the Gaussian case of the sWd distribution. Its pdf is given by [5, 13]:

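$$
p_{\mathbf{C}}(\mathbf{C}; L, \boldsymbol{\Gamma}, \alpha, \omega, \eta) =
\frac{L^{Ld}\,|\mathbf{C}|^{L-d}}{\Gamma_d(L)\,|\boldsymbol{\Gamma}|^{L}}\,
\frac{1}{\eta^{\alpha} K_{\alpha}(\omega)}
\left( \frac{2L\,\mathrm{Tr}(\boldsymbol{\Gamma}^{-1}\mathbf{C}) + \omega\eta}{\omega/\eta} \right)^{\frac{\alpha - Ld}{2}}
K_{\alpha - Ld}\!\left( \sqrt{\frac{\omega}{\eta}\left( 2L\,\mathrm{Tr}(\boldsymbol{\Gamma}^{-1}\mathbf{C}) + \omega\eta \right)} \right),
\tag{1}
$$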

where L is the number of looks, d is the dimension of C, Γd(·) is the multivariate gamma function, Γ is the normalized sample covariance matrix, η is the scale parameter, and Kν(·) is the modified Bessel function of the second kind of order ν.
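Eq. (1) is most conveniently evaluated in the log domain. The following is a minimal sketch under stated assumptions: it uses SciPy's exponentially scaled Bessel function `kve` rather than the GSL routines mentioned in the Introduction, it takes Γd(L) to be the complex multivariate gamma function, and when `eta` is omitted it uses the unit-mean texture value η = Kα(ω)/Kα+1(ω), which is the normalization implied by Eq. (2). All function names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln, kve

def log_besselk(order, x):
    """ln K_order(x), using kve(v, x) = K_v(x) * exp(x) and K_{-v} = K_v."""
    return np.log(kve(abs(order), x)) - x

def log_complex_multigamma(L, d):
    """ln Gamma_d(L) = d(d-1)/2 * ln(pi) + sum_{i=0}^{d-1} ln Gamma(L - i)."""
    return 0.5 * d * (d - 1) * np.log(np.pi) + sum(gammaln(L - i) for i in range(d))

def gd_logpdf(C, L, Gamma, alpha, omega, eta=None):
    """Log of Eq. (1) for a single d x d covariance matrix C."""
    d = C.shape[0]
    if eta is None:
        # Assumed default: unit-mean texture, eta = K_alpha / K_{alpha+1}.
        eta = np.exp(log_besselk(alpha, omega) - log_besselk(alpha + 1, omega))
    # q = 2 L Tr(Gamma^{-1} C) + omega * eta
    q = 2 * L * np.trace(np.linalg.solve(Gamma, C)).real + omega * eta
    _, logdet_C = np.linalg.slogdet(C)
    _, logdet_G = np.linalg.slogdet(Gamma)
    order = alpha - L * d
    return (L * d * np.log(L) + (L - d) * logdet_C
            - log_complex_multigamma(L, d) - L * logdet_G
            - alpha * np.log(eta) - log_besselk(alpha, omega)
            + 0.5 * order * (np.log(q) - np.log(omega / eta))
            + log_besselk(order, np.sqrt(omega / eta * q)))
```

For d = 1 the expression reduces to a density on the positive real line, which gives a simple sanity check: it should integrate to one.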

Its MoMFM estimator can be derived from the following

equation:

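$$
E\{\mathrm{Tr}(\boldsymbol{\Sigma}^{-1}\mathbf{C})^{\nu}\} =
\frac{K_{\hat{\alpha}+\nu}(\hat{\omega})\, K_{\hat{\alpha}}^{\nu-1}(\hat{\omega})}{K_{\hat{\alpha}+1}^{\nu}(\hat{\omega})}
\times \frac{\Gamma(Ld+\nu)}{L^{\nu}\,\Gamma(Ld)},
\tag{2}
$$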

by simultaneously solving two equations with ν = 1/8 and ν = 1/4. Outside the Gd distribution manifold ω → 0+, and therefore only α needs to be estimated, which can be easily done using the MoMLC estimators for the Kd and G0d distributions listed in [14]. The MoMLC estimator for the texture shape parameters of the Ud distribution is also listed in [14]. Moreover, the performance analysis of these estimators on simulated PolSAR data can be found in [13, 14].
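The two-equation solve described above can be sketched as follows. The right-hand side of Eq. (2) is evaluated in the log domain and matched to sample fractional moments at ν = 1/8 and ν = 1/4. The use of `scipy.optimize.least_squares`, the log-residual form, the lower bound on ω, and the starting point are choices made here for illustration, not details taken from [13]; all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import gammaln, kve

def log_besselk(order, x):
    """ln K_order(x) via the exponentially scaled Bessel function."""
    return np.log(kve(abs(order), x)) - x

def theoretical_moment(nu, alpha, omega, L, d):
    """Right-hand side of Eq. (2)."""
    log_texture = (log_besselk(alpha + nu, omega)
                   + (nu - 1) * log_besselk(alpha, omega)
                   - nu * log_besselk(alpha + 1, omega))
    log_speckle = gammaln(L * d + nu) - nu * np.log(L) - gammaln(L * d)
    return np.exp(log_texture + log_speckle)

def sample_moment(C, Sigma, nu):
    """Mean of Tr(Sigma^{-1} C)^nu over a stack C of shape (n, d, d)."""
    t = np.einsum("ij,nji->n", np.linalg.inv(Sigma), C).real
    return np.mean(t ** nu)

def momfm_estimate(sample_moments, L, d, x0=(1.0, 1.0), nus=(0.125, 0.25)):
    """Solve Eq. (2) at the two orders in nus for (alpha, omega)."""
    def residuals(x):
        alpha, omega = x
        return [np.log(theoretical_moment(nu, alpha, omega, L, d)) - np.log(m)
                for nu, m in zip(nus, sample_moments)]
    # omega is kept strictly positive, as required by the Gd distribution.
    sol = least_squares(residuals, x0, bounds=([-np.inf, 1e-6], [np.inf, np.inf]))
    return sol.x
```

Setting ν = 1 in Eq. (2) gives an expected value of exactly d, which is a quick check of the implementation. Because low-order fractional moments vary only weakly with (α, ω), the solve can be poorly conditioned, which is in line with the numerical inaccuracy discussed in Section 5.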

Figure 2: Clustering of simulated Gd data with 6 classes, sub-sampling = 4. (Legend: class labels 1 to 6.)

Figure 3: Fitting of estimated G pdf to cluster histograms of simulated data. Panel titles: cluster 1: α = 19, ω = 0.0, |Σ|^(1/d) = 32.9; cluster 2: α = −8.2, ω = 9.7, |Σ|^(1/d) = 10.2; cluster 3: α = 3.7, ω = 0.0, |Σ|^(1/d) = 9.24; cluster 4: α = −33, ω = 0.0, |Σ|^(1/d) = 3.67; cluster 5: α = 3.7, ω = 0.0, |Σ|^(1/d) = 0.731; cluster 6: α = −2.1, ω = 0.0, |Σ|^(1/d) = 2.43.

4 Results

The clustering algorithm using the Gd distribution has been applied to both simulated and real PolSAR data. Selected results for each case are shown below.

4.1 Simulated PolSAR Data

Synthetic dual-pol PolSAR data (250 × 250 pixels) with six distinct classes of Gd distribution were generated using 5 looks. The parameters chosen for the simulated data were taken from real data samples. The results of the clustering algorithm, with a sub-sampling of four, after MRF smoothing are shown in Figure 2. The labeled classes show a nearly perfect clustering performance. The algorithm took approximately 1 minute and 2 seconds to compute the shown results using an Intel quad core 3.1 GHz processor with 8 GB RAM and MATLAB software. The fitting of the G pdf to the six detected clusters is shown in Figure 3.
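Data of this kind can be generated directly from the product model. The sketch below is an assumption-laden illustration rather than the authors' generator: it draws a unit-mean generalized inverse Gaussian texture with `scipy.stats.geninvgauss` and multiplies it by scaled complex Wishart speckle. The function name, the example covariance matrix, and the parameter values are hypothetical.

```python
import numpy as np
from scipy.special import kve
from scipy.stats import geninvgauss

def simulate_gd(n, L, Sigma, alpha, omega, rng):
    """Draw n multilook covariance matrices C = tau * W of shape (n, d, d)."""
    d = Sigma.shape[0]
    # Unit-mean texture: eta = K_alpha(omega) / K_{alpha+1}(omega)
    # (the exponential scaling of kve cancels in the ratio).
    eta = kve(abs(alpha), omega) / kve(abs(alpha + 1), omega)
    tau = geninvgauss.rvs(alpha, omega, scale=eta, size=n, random_state=rng)
    # Circular complex Gaussian scattering vectors with covariance Sigma.
    A = np.linalg.cholesky(Sigma)
    z = (rng.standard_normal((n, L, d))
         + 1j * rng.standard_normal((n, L, d))) / np.sqrt(2)
    s = z @ A.T
    # L-look speckle matrices: average of s s^H over the looks.
    W = np.einsum("nli,nlj->nij", s, s.conj()) / L
    return tau[:, None, None] * W
```

For example, `simulate_gd(62500, 5, Sigma, 2.0, 3.0, np.random.default_rng(0))` with a 2 × 2 `Sigma` would produce one dual-pol class with as many pixels as a 250 × 250 image; a multi-class scene is obtained by filling different image regions with draws that use different parameters.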

Figure 4: (Top) False color Pauli RGB image. (Bottom) Clustering of quad-pol TerraSAR-X data with 6 looks and sub-sampling = 8. (Legend: class labels 1 to 9.)

Figure 5: Fitting of estimated G pdf to cluster histograms of real data. Panel titles: cluster 1: α = 5.783E+03, ω = 779, |Σ|^(1/d) = 0.856; cluster 2: α = −1.9, ω = 0.0, |Σ|^(1/d) = 0.36; cluster 3: α = −11, ω = 0.0, |Σ|^(1/d) = 0.0652; cluster 4: α = −11, ω = 0.0, |Σ|^(1/d) = 0.0727; cluster 5: α = −24, ω = 0.0, |Σ|^(1/d) = 0.0724; cluster 6: α = 2.6, ω = 3.5, |Σ|^(1/d) = 0.0368; cluster 7: α = −2.340E+03, ω = 0.0, |Σ|^(1/d) = 0.0573; cluster 8: α = −1.000E+04, ω = 0.0, |Σ|^(1/d) = 0.0548; cluster 9: α = −1.000E+04, ω = 0.0, |Σ|^(1/d) = 0.0425.



4.2 Real PolSAR Data

The algorithm was also applied to quad-pol TerraSAR-X data (300 × 400 pixels) with an ENL of six, using a sub-sampling of eight. The results are shown in Figure 4, where they can also be compared to the corresponding Pauli decomposed false color image. Nine distinct classes were found, with the first class containing only one pixel. A comparison of the clustering results with the Pauli decomposed image shows a visually acceptable performance. The algorithm took 5 minutes and 11 seconds to compute the shown results on the same computing platform. The fitting of the G pdf to the nine detected clusters is shown in Figure 5.

5 Numerical Inaccuracy of Parameter Estimation

It has been experimentally observed that parameter estimation using multivariate fractional moments has slight numerical inaccuracies, which become more pronounced on real PolSAR data. The authors agree that, for practical purposes, the fast computation time of the MoMFM estimators outweighs their slight numerical inaccuracy. In the proposed version of the algorithm, this effect has been mitigated by using sub-sampling and by limiting the maximum sample size to 10,000 pixels. This reduces the sensitivity of the GoF test enough to offset the small inaccuracy in estimation. However, improving the accuracy of these estimators is a suitable subject for future research, as the fast computation time is highly desirable.

6 Conclusions and Future Work

A fast unsupervised clustering algorithm for multilook PolSAR data has been proposed using the flexible G distribution for the first time. The results on simulated and real PolSAR data look very promising. The computation has been found to be fast and the software implementation straightforward. The only drawback is the numerical inaccuracy during parameter estimation, which will form a topic of further investigation.

7 Acknowledgments

This work has been funded by the EC FP7 project Demining ToolBOX (D-BOX), grant agreement no. 284996, and the TerraSAR-X dataset has been provided by DLR.

References

[1] C. Oliver and S. Quegan, Understanding Synthetic Aperture Radar Images, 2nd ed. Raleigh, NC: SciTech Publishing, 2004.

[2] J. Lee, D. Schuler, R. Lang, and K. Ranson, “K-distribution for multi-look processed polarimetric SAR imagery,” in Proc. IGARSS, vol. 4, Pasadena, CA, Aug. 1994, pp. 2179–2181.

[3] C. Freitas, A. Frery, and A. Correia, “The polarimetric G distribution for SAR data analysis,” Environmetrics, vol. 16, no. 1, pp. 13–31, Feb. 2005.

[4] L. Bombrun, G. Vasile, M. Gay, and F. Totir, “Hierarchical segmentation of polarimetric SAR images using heterogeneous clutter models,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 2, pp. 726–737, Feb. 2011.

[5] S. Khan and R. Guida, “Application of Mellin-kind statistics to polarimetric G distribution for SAR data,” IEEE Trans. Geosci. Remote Sens., vol. PP, no. 99, pp. 1–16, 2013.

[6] J.-M. Beaulieu and R. Touzi, “Segmentation of textured polarimetric SAR scenes by likelihood approximation,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 10, pp. 2063–2072, Oct. 2004.

[7] A. Doulgeris, S. Anfinsen, and T. Eltoft, “Automated non-Gaussian clustering of polarimetric synthetic aperture radar images,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3665–3676, Oct. 2011.

[8] A. C. Frery, J. Jacobo-Berlles, J. Gambini, and M. Mejail, “Polarimetric SAR image segmentation with B-splines and a new statistical model,” Multidimensional Syst. Signal Process., vol. 21, no. 4, pp. 319–342, Dec. 2010.

[9] O. Harant, L. Bombrun, M. Gay, R. Fallourd, E. Trouvé, and F. Tupin, “Segmentation and classification of polarimetric SAR data based on the KummerU distribution,” in Proc. PolInSAR, Frascati, Italy, 2011.

[10] V. Akbari, A. P. Doulgeris, G. Moser, T. Eltoft, S. N. Anfinsen, and S. B. Serpico, “A textural-contextual model for unsupervised segmentation of multipolarization synthetic aperture radar images,” IEEE Trans. Geosci. Remote Sens., no. 99, pp. 1–12, 2012.

[11] A. Doulgeris, V. Akbari, and T. Eltoft, “Automatic PolSAR segmentation with the U-distribution and Markov Random Fields,” in Proc. EUSAR, Nuremberg, Germany, Apr. 2012, pp. 183–186.

[12] B. Gough, GNU Scientific Library Reference Manual - Third Edition. Network Theory Ltd., 2009.

[13] S. Khan and R. Guida, “On fractional moments of multilook polarimetric whitening filter for polarimetric SAR data,” IEEE Trans. Geosci. Remote Sens., vol. PP, no. 99, pp. 1–11, 2013.

[14] S. Anfinsen and T. Eltoft, “Application of the matrix-variate Mellin transform to analysis of polarimetric radar images,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2281–2295, Jun. 2011.
