Computers, Environment and Urban Systems 39 (2013) 48–62

Contents lists available at SciVerse ScienceDirect

Computers, Environment and Urban Systems

journal homepage: www.elsevier.com/locate/compenvurbsys

An adaptive fuzzy-genetic algorithm approach for building detection using high-resolution satellite images

0198-9715/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.compenvurbsys.2013.01.004

⇑ Corresponding author. Tel.: +90 312 246 66 66; fax: +90 312 246 66 60.
E-mail addresses: [email protected] (E. Sumer), [email protected] (M. Turker).
1 Tel.: +90 312 297 69 90; fax: +90 312 297 61 69.

Emre Sumer a,⇑, Mustafa Turker b,1

a Baskent University, Faculty of Engineering, Department of Computer Engineering, 06810 Ankara, Turkey
b Hacettepe University, Faculty of Engineering, Department of Geomatics Engineering, 06800 Ankara, Turkey

Article info / Abstract

Article history:
Received 12 March 2012
Received in revised form 22 January 2013
Accepted 23 January 2013

Keywords:
Building detection
Image processing
High resolution satellite imagery
Genetic algorithms
Fuzzy logic

We propose a new approach for building detection using high-resolution satellite imagery based on an adaptive fuzzy-genetic algorithm. This novel approach improves object detection accuracy by reducing the premature convergence problem encountered when using genetic algorithms. We integrate the fundamental image processing operators with genetic algorithm concepts such as population, chromosome, gene, crossover and mutation. To initiate the approach, training samples are selected that represent the two specified feature classes, in this case “building” and “non-building”. The image processing operations are carried out on a chromosome-by-chromosome basis to reveal the attribute planes. These planes are then reduced to one hyperplane that is optimal for discriminating between the specified feature classes. For each chromosome, the fitness value is calculated through the analysis of detection and mis-detection rates. This analysis is followed by genetic algorithm operations such as selection, crossover and mutation. At the end of each generation cycle, the adaptive fuzzy module determines the new (adjusted) probabilities of crossover and mutation. This evolutionary process repeats until a specified number of generations has been reached. To enhance the detected building patches, morphological image processing operations are applied. The approach was tested on ten different test scenes of the Batikent district of the city of Ankara, Turkey, using 1 m resolution pan-sharpened IKONOS imagery. The kappa statistics computed for the proposed adaptive fuzzy-genetic algorithm approach were between 0.55 and 0.88. The extraction performance of the algorithm was better for urban and suburban buildings than for buildings in rural test scenes.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Building detection has long been one of the major research areas in urban remote sensing. At first glance, buildings may appear to be simple objects that can be easily identified and extracted. However, automatic building extraction from high-resolution images must address several difficulties caused by differences in viewpoint and by buildings of complex shape and size. Buildings are one of the fundamental GIS data components, and building detection has proved extremely useful in urban planning, infrastructure development, the construction of telecommunication lines, pollution modeling, disaster planning and many other types of urban simulation.

Several approaches based on high-resolution multi-spectral spaceborne imagery have been developed for the acquisition of 2-D building information. Depending on the application area, single and stereo uses of panchromatic, multi-spectral and pan-sharpened spaceborne imagery are commonly encountered. Fraser, Baltsavias, and Gruen (2001), Lee, Shan, and Bethel (2003), Shackelford and Davis (2003), Kim, Lee, and Kim (2006), Sirmacek and Unsalan (2009) and Koc San and Turker (2012) utilized IKONOS imagery for building extraction with different methodologies based on image classification, image segmentation, fuzzy pixel- and object-based approaches, line analysis and graph-theoretical methods. Additionally, QuickBird imagery has been used to extract buildings in several studies employing different region- and feature-based approaches such as clustering, edge detection and snake contours (Liu, Cui, & Yan, 2008; Mayunga, Coleman, & Zhang, 2007; Wei, Zhao, & Song, 2004). Inglada (2007) proposed an image processing system based on support vector machines for the detection and recognition of man-made objects from SPOT-5 imagery. Furthermore, hybrid datasets such as integrated SAR–optical and LIDAR–optical imagery have also been tested in several studies (Karantzalos & Paragios, 2010; Sohn & Dowman, 2007; Tupin & Roux, 2003). Moreover, the use of digital elevation models (DEMs) and digital surface models (DSMs) generated from different spaceborne sensors for building detection can be observed in Ioannidis, Psaltis, and Potsiou (2009), Lafarge, Descombes, Zerubia, and Pierrot-Deseilligny (2010) and Tournaire, Bredif, Boldo, and Durupt (2010).

These previous approaches have generally employed purely spectral input vectors built from the intensity values of each spectral channel for each pixel in the image. Although these vectors provide a suitable fixed-dimensionality space in which conventional classifiers often work well, spatial relationships such as texture, proximity and shape can also be very informative in feature extraction. This type of additional information can be added to the spectral information. However, there is a huge number of potential combinations of these additional vector dimensions. To address this problem, a hybrid evolutionary algorithm called GENIE (GENetic Image Exploitation) was developed by Perkins et al. (2000). The algorithm maintains a population of primitive image processing operators such as basic mathematical, logical and texture operators. For each individual (chromosome) in the population, the ability to find the feature of interest (e.g., buildings) is tested by assigning a fitness value. The fitness of an individual is determined by the agreement between the extracted feature of interest and the user-provided reference pixels. After fitness determination, evolutionary operators such as selection, crossover and mutation are employed until some stopping criterion is satisfied. As a general tendency, the “less-fit” individuals are discarded and the “more-fit” ones are preserved to produce better operator chains. In a study conducted by Harvey et al. (2002), GENIE performed better than conventional supervised classifiers such as minimum distance, maximum likelihood, Mahalanobis distance, spectral angle mapping and binary encoding. In a further study, Perkins et al. (2005) developed a system called GENIE Pro. Like GENIE, GENIE Pro is a general-purpose adaptive tool that derives automatic pixel classification algorithms for satellite and aerial imagery from training inputs. In particular, GENIE Pro integrates spectral information and spatial cues such as texture, local morphology and large-scale shape information in a considerably more sophisticated way than GENIE.

The performance of genetic algorithms (GAs) is quite sensitive to their control parameters. For example, a well-performing chromosome may be destroyed when the crossover probability is high. On the other hand, a low crossover probability may prevent the algorithm from obtaining better individuals and does not guarantee faster convergence. A high mutation rate may cause too much diversity and take longer to reach the optimal solution, whereas a low mutation rate tends to miss some near-optimal points. With a low mutation rate, the whole population may also tend to converge to a single suboptimal solution. If all of the members of the population are very similar, the crossover operator has little effect and mutation becomes the primary operator (Herrera & Lozano, 2003). This negative effect triggers the problem of premature convergence, in which the search is trapped in a suboptimal state and most of the operators are no longer able to generate offspring that surpass their parents. Using fuzzy logic controllers to adapt the GA parameters is one possible way to overcome these impediments and improve the performance of the GA.

In this study, we propose an adaptive fuzzy logic-based genetic approach to the detection of buildings from high-resolution satellite images. The approach combines GAs with supervised image classification and can therefore be considered a hybrid feature extraction procedure. Its major novelty is an adaptive fuzzy logic module, integrated with the conventional GA, that adjusts the algorithm's parameters in an attempt to improve the performance of the GA and reduce the premature convergence problem. Unlike the studies above that rely on hybrid datasets or elevation models, the present study uses only satellite imagery; no auxiliary data such as LIDAR or a DEM are employed to locate the buildings. The approach was implemented in the MATLAB programming environment.

2. Methodology

A flowchart of the proposed methodology is given in Fig. 1. First, training and test regions are selected within the imagery. Next, predefined image processing operations are randomly applied to the blue (B), green (G), red (R) and near-infrared (NIR) image bands to obtain spectral and textural attributes, which are then reduced through Fisher's linear discriminant analysis to a single binary image band (the temporary building regions). Then, based on the temporary output and the reference data, the fitness value of the candidate solution is computed by comparing the pixels that belong to the region of interest. This computation is followed by the GA operations of selection, crossover and mutation. Selection retains the successful solutions (operator chains), whereas crossover and mutation try to diversify the remaining candidate solutions for the next generations. In the next step, the parameters of the GA are updated by an adaptive fuzzy logic controller to improve the algorithm's performance, and the newly adjusted parameters are used in the next generation. This evolutionary process is repeated until a predefined number of generations is reached. As the final step, we apply post-processing operations, which are essential to remove the various false-alarm areas and image distortions that are likely to appear. These operations include morphological image processing functions such as opening, artifact removal, closing and hole-filling.

2.1. Image-based genetic algorithm fundamentals

GAs are a relatively popular paradigm that mimics the principles of genetics and natural selection. A GA is a search heuristic that is used to generate solutions to optimization and search problems (Haupt & Haupt, 2004). In recent years, GAs have become a popular optimization technique in the field of image processing. Applications of image-based GAs range from image enhancement filters (Chang-Shing, Shu-Mei, & Chin-Yuan, 2005) and edge detection (Li, Bai, & Zhang, 2007) to image classification (Yang, 2007) and segmentation (Maulik, 2009). For instance, GAs can be used to construct new image enhancement filters or to optimize the parameters of existing filters. Different image-based GA studies have addressed different problems. Every approach is unique, with its own chromosome and gene encoding scheme and its own selection, crossover and mutation strategies, which are key to the success of the optimization (Paulinas & Usinskas, 2007).

The structure of the image-based GA used in our study is shown in Fig. 2. In this model, the population of the GA is generated from a predefined number of chromosomes, each of which can be seen as a candidate solution for extracting the building regions. Each chromosome consists of a predefined number of image processing operations (genes). The genes are well-known operators such as basic mathematical, logical, thresholding, texture, spectral and filtering operators.

2.2. Spectral and textural attribute extraction

The first processing step is the selection of training and test areas from the imagery. In this study, we aim to discriminate buildings from all other background objects. Therefore, the feature classes “building” and “non-building” are specified, and training samples are selected for these two classes. We also collect test samples with which to compute the fitness value of the extracted building regions. Next, the chromosomes are initialized with image processing operators (genes), which are randomly selected from a gene pool. The complete list of image processing operators (Gonzalez, Woods, & Eddins, 2009, chaps. 2–6 & 9–10) included in the gene pool is given in Table 1.

Fig. 1. Flowchart of the proposed adaptive fuzzy-genetic algorithm approach for building detection.

Fig. 2. The structure of a population that is composed of M chromosomes and N genes in each chromosome.
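The random initialization can be sketched as follows, assuming each gene is stored simply as an integer operator Id from Table 1 (1–35) and drawn uniformly with replacement. The function name `init_population` and the use of Python's `random` module are illustrative; this is not the authors' MATLAB code.

```python
import random

GENE_POOL = list(range(1, 36))  # operator Ids 1-35 from Table 1

def init_population(n_chromosomes, n_genes, rng=None):
    """Return M chromosomes, each a list of N operator Ids drawn from the gene pool."""
    rng = rng or random.Random()
    return [[rng.choice(GENE_POOL) for _ in range(n_genes)]
            for _ in range(n_chromosomes)]

# M = 4 chromosomes with N = 5 genes each, seeded for reproducibility
population = init_population(n_chromosomes=4, n_genes=5, rng=random.Random(0))
```

Input and output bands and scalar parameters are chosen separately for each gene; this sketch only fixes which operators make up each chromosome.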

In the gene pool, the first category (Table 1) consists of basic mathematical operators. Operator 1 simply adds two bands of imagery. Operator 2 adds a positive or negative scalar parameter to a band. Operator 3 subtracts one band from another, whereas Operator 4 subtracts a positive or negative scalar parameter from a band. Operator 5 is similar to Operator 3 but divides the result by the sum of its two inputs. Operator 6 multiplies the pixel values of two bands. Operator 7 scales the input band by a positive scalar. Operator 8 divides two bands pixel by pixel, where the dividend and divisor bands are selected arbitrarily. Operator 9 is similar to Operator 7 but multiplies the input band by the reciprocal of the scalar. Operators 10, 11 and 12 apply the negation, square root and square operators, respectively, to a single input band. Operator 13 is similar to Operator 7 but adds an extra parameter to the scaled input. Operator 14 outputs a linear combination of two inputs specified by a parameter that takes a value between 0 and 1.
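A few of these operators can be sketched on floating-point numpy arrays as follows. The function names are illustrative, the small `eps` guarding Operator 5 against division by zero is an assumption, and the form `alpha * a + (1 - alpha) * b` for Operator 14 is one plausible reading of "a linear combination specified by a parameter between 0 and 1", not a confirmed definition. With a = NIR and b = R, Operator 5 yields the familiar NDVI, (NIR − R)/(NIR + R).

```python
import numpy as np

def add_bands(a, b):                         # Operator 1
    return a + b

def subtract_bands(a, b):                    # Operator 3
    return a - b

def normalized_difference(a, b, eps=1e-12):  # Operator 5: (a - b) / (a + b)
    return (a - b) / (a + b + eps)

def multiply_by_scalar(a, s):                # Operator 7, s > 0
    return a * s

def linear_scale(a, s, t):                   # Operator 13: scaled input plus an offset t
    return a * s + t

def linear_combination(a, b, alpha):         # Operator 14 (assumed form), 0 <= alpha <= 1
    return alpha * a + (1.0 - alpha) * b

nir = np.array([[200.0, 50.0]])
red = np.array([[50.0, 50.0]])
ndvi = normalized_difference(nir, red)       # approximately [[0.6, 0.0]]
```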

The second category comprises the fundamental logical operators. Operators 15 and 16 perform the minimum and maximum operations, respectively, pixel by pixel. Operator 17 outputs its third input wherever the first input is less than the second input and outputs its fourth input otherwise. The third category in the gene pool comprises several basic thresholding operators. In this category, Operator 18 truncates any pixel values above a value set by its parameter. Operator 19 performs the reverse operation of Operator 18. With Operator 20, the values below its parameter are set to 0 (black), whereas the values above the parameter are set to 1 (white).
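The logical and thresholding operators map directly onto element-wise numpy calls. In this sketch, Operator 19 is read as raising values below its parameter up to that parameter, and pixels exactly equal to the Operator 20 threshold are assumed to go to the "off" value; the source specifies neither. The `on` argument lets Operator 20 emit 255 instead of 1 for 8-bit data, as in the worked example later in this section.

```python
import numpy as np

def minimum(a, b):                  # Operator 15, pixel by pixel
    return np.minimum(a, b)

def maximum(a, b):                  # Operator 16, pixel by pixel
    return np.maximum(a, b)

def if_less_than_else(a, b, c, d):  # Operator 17: c where a < b, otherwise d
    return np.where(a < b, c, d)

def clip_high(a, t):                # Operator 18: truncate values above t
    return np.minimum(a, t)

def clip_low(a, t):                 # Operator 19 (assumed reading): raise values below t
    return np.maximum(a, t)

def threshold(a, t, on=1, off=0):   # Operator 20: above t -> on, otherwise off
    return np.where(a > t, on, off)
```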

Table 1
The primitive image processing operators (the gene pool).

Category | Operator Id | Operator description | # of input bands | # of parameters
---|---|---|---|---
Basic mathematical | 1 | Add bands | 2 | 0
Basic mathematical | 2 | Add scalar | 1 | 1
Basic mathematical | 3 | Subtract bands | 2 | 0
Basic mathematical | 4 | Subtract scalar | 1 | 1
Basic mathematical | 5 | Normalized difference | 2 | 0
Basic mathematical | 6 | Multiply bands | 2 | 0
Basic mathematical | 7 | Multiply by scalar | 1 | 1
Basic mathematical | 8 | Divide bands | 2 | 0
Basic mathematical | 9 | Divide by scalar | 1 | 1
Basic mathematical | 10 | Negate band | 1 | 0
Basic mathematical | 11 | Square root | 1 | 0
Basic mathematical | 12 | Square | 1 | 0
Basic mathematical | 13 | Linear scale | 1 | 2
Basic mathematical | 14 | Linear combination | 2 | 1
Logical | 15 | Minimum | 2 | 0
Logical | 16 | Maximum | 2 | 0
Logical | 17 | If less than else | 4 | 0
Thresholding | 18 | Clip high | 1 | 1
Thresholding | 19 | Clip low | 1 | 1
Thresholding | 20 | Threshold | 1 | 1
Texture | 21 | R5R5 | 1 | 0
Texture | 22 | LAWB | 1 | 0
Texture | 23 | LAWD | 1 | 0
Texture | 24 | LAWF | 1 | 0
Texture | 25 | LAWH | 1 | 0
Spectral | 26 | Distance similarity | 3 | 0
Spectral | 27 | Correlation similarity | 3 | 0
Spectral | 28 | Similarity value | 3 | 0
Filtering | 29 | Average | 1 | 0
Filtering | 30 | Sobel | 1 | 1
Filtering | 31 | Prewitt | 1 | 1
Filtering | 32 | Gaussian | 1 | 1
Filtering | 33 | Laplacian | 1 | 1
Filtering | 34 | Laplacian of Gaussian | 1 | 1
Filtering | 35 | Unsharp | 1 | 1

The operators in the texture category apply Laws' texture energy measures (Laws, 1980) to the input bands. The fundamental vectors L3, E3 and S3 and the derived vectors L5, E5, S5, W5 and R5 are 1-D convolution kernels: L3 = [1 2 1], E3 = [−1 0 1], S3 = [−1 2 −1], L5 = [1 4 6 4 1], E5 = [−1 −2 0 2 1], S5 = [−1 0 2 0 −1], W5 = [−1 2 0 −2 1] and R5 = [1 −4 6 −4 1]. For these kernels, the mnemonics stand for (L)evel, (E)dge, (S)pot, (W)ave and (R)ipple (Laws, 1980). In this study, R5R5, LAWB, LAWD, LAWF and LAWH (Operators 21–25) are generated from the above set of 1-D kernels, where R5R5 corresponds to $R5^{T} \times R5$, and LAWB, LAWD, LAWF and LAWH correspond to $S3^{T} \times L3$, $E3^{T} \times E3$, $L3^{T} \times S3$ and $S3^{T} \times S3$, respectively. In the next category, which is composed of spectral operators (Operators 26–28), the spectral similarity within the input bands is measured by the distance and correlation similarities along with the similarity value.
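The 2-D masks of Operators 21–25 are outer products of the 1-D vectors listed above: for row vectors a and b, $a^{T} \times b$ equals `np.outer(a, b)`. The sketch below builds the five masks and applies one with a plain 2-D correlation and reflected borders. The border handling and the helper name `correlate2d_same` are assumptions, and the source does not say whether a local energy (absolute-sum) window follows the mask, so none is applied here.

```python
import numpy as np

# 1-D Laws vectors used by Operators 21-25 (values as listed above)
L3 = np.array([1, 2, 1])
E3 = np.array([-1, 0, 1])
S3 = np.array([-1, 2, -1])
R5 = np.array([1, -4, 6, -4, 1])

# 2-D masks: np.outer(a, b) equals a^T x b for row vectors a and b
LAWS_MASKS = {
    "R5R5": np.outer(R5, R5),  # Operator 21
    "LAWB": np.outer(S3, L3),  # Operator 22
    "LAWD": np.outer(E3, E3),  # Operator 23
    "LAWF": np.outer(L3, S3),  # Operator 24
    "LAWH": np.outer(S3, S3),  # Operator 25
}

def correlate2d_same(img, mask):
    """2-D correlation of img with mask; reflected borders; output has img's shape."""
    kh, kw = mask.shape
    padded = np.pad(np.asarray(img, dtype=float),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    out = np.zeros(np.shape(img), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += mask[i, j] * padded[i:i + out.shape[0], j:j + out.shape[1]]
    return out
```

Because E3, S3 and R5 each sum to zero, all five masks have zero sum and respond with 0 on a uniform region, which is what makes them texture rather than brightness measures.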

The last category (Operators 29–35) consists of various filtering operators with a default kernel size of 3 × 3. Operator 29 performs average filtering. Operators 30 and 31 emphasize edges; an additional binary parameter (0 or 1) indicates the gradient direction, where 0 indicates vertical and 1 horizontal. Operator 32 performs rotationally symmetric Gaussian low-pass filtering with a standard deviation (sigma) between 0 and 1. Operator 33 is a Laplacian filter that approximates the shape of the two-dimensional Laplacian operator; its parameter ‘alpha’ controls the shape of the Laplacian and ranges from 0 to 1. Operator 34 is the Laplacian of Gaussian filter, with the same standard deviation parameter as Operator 32. Finally, Operator 35 performs unsharp contrast enhancement filtering, using a filter created from the negative of the Laplacian filter with an ‘alpha’ parameter that controls the shape of the Laplacian and must be between 0 and 1.
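The 3 × 3 kernels behind Operators 29, 30 and 32 can be sketched as follows; applying them to a band is an ordinary 2-D correlation and is omitted here. How the paper's 0/1 direction flag maps onto the two Sobel orientations is not fully specified, so in this sketch 1 returns the mask that emphasizes horizontal edges (the same mask MATLAB's `fspecial('sobel')` returns) and 0 returns its transpose. The Gaussian kernel is sampled on the 3 × 3 grid and normalized to unit sum.

```python
import numpy as np

def average_kernel(size=3):          # Operator 29
    return np.full((size, size), 1.0 / size**2)

def sobel_kernel(direction):         # Operator 30; direction flag 0 or 1 (assumed mapping)
    k = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
    return k.T if direction == 0 else k

def gaussian_kernel(sigma, size=3):  # Operator 32, 0 < sigma <= 1
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()               # normalize so the filter preserves mean brightness
```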

All chromosomes in the population have the same fixed number of genes. An example of a chromosome with five genes could be [3 10 20 10 24], where the numbers denote the operators. For this chromosome, the image processing operators “Subtraction (Operator 3)”, “Negation (Operator 10)”, “Thresholding (Operator 20)”, “Negation (Operator 10)” and “Texture (LAWF) (Operator 24)” are applied in sequence to randomly selected input and output bands. The input image bands are B (blue), G (green), R (red) and NIR (near-infrared), and the output bands are empty temporary bands. A temporary output band can also be used as an input band after it has been initialized by an operator. Throughout this study, we used four predefined temporary bands: “temp1”, “temp2”, “temp3” and “temp4”.

Considering the above hypothetical chromosome, an example scenario works as follows. Let us assume that the algorithm selects two input bands (R and G) and one output band (temp3) for Operator 3. The result of the subtraction (R − G) is written to “temp3”. From now on, band “temp3” can also be used as an input band. For the next gene (Operator 10), single input and output bands are selected. For example, the bands “NIR” and “temp1” might be selected as the input and output, respectively. The result of negating the “NIR” band is written to “temp1”. Next, a single input band (temp3) and an output band (temp2) are selected for Operator 20, together with a scalar parameter between 0 and 255 for an 8-bit image. In our example, the scalar parameter is 135. Therefore, the pixel values of “temp3” above 135 are set to 255, whereas values below the parameter are set to 0. The result of this operation is written to band “temp2”. After that, Operator 10 is used to negate an input band (B), and the resulting image is written to an output band (temp4). In this case, “temp4” becomes the automatic output band because it is the only remaining empty band. Finally, an input band (temp1) is selected for Operator 24, the LAWF texture mask is applied to this band and the output is written to the selected output band (temp4).
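The scenario above can be sketched as a small interpreter that runs decoded genes in order and lets each written temporary band feed later genes. Here each gene is given explicitly as a hypothetical `(op_id, input_bands, output_band, params)` tuple, whereas in the paper the bands and parameters are chosen at random. Only Operators 3, 10 and 20 are included; negation is assumed to be `-x` (an implementation on unsigned 8-bit data might use `255 - x` instead); and thresholding emits 255 as in the 8-bit example.

```python
import numpy as np

# Hypothetical operator table: Id -> (function, number of input bands)
OPS = {
    3:  (lambda a, b: a - b, 2),                         # subtract bands
    10: (lambda a: -a, 1),                               # negate band (assumed -x)
    20: (lambda a, t: np.where(a > t, 255.0, 0.0), 1),   # threshold, 8-bit output
}

def run_chromosome(bands, genes):
    """Apply decoded genes in order and return the temporary bands they wrote.

    bands: dict of band name -> 2-D float array (B, G, R, NIR initially).
    genes: list of (op_id, input_names, output_name, params) tuples.
    """
    bands = dict(bands)            # do not modify the caller's dict
    written = {}
    for op_id, inputs, output, params in genes:
        func, n_in = OPS[op_id]
        if len(inputs) != n_in:
            raise ValueError(f"operator {op_id} expects {n_in} input band(s)")
        result = func(*(bands[name] for name in inputs), *params)
        bands[output] = result     # a written temp band may be an input later
        written[output] = result
    return written

image = {"R": np.array([[200.0, 10.0]]), "G": np.array([[50.0, 5.0]]),
         "NIR": np.array([[30.0, 40.0]]), "B": np.array([[20.0, 25.0]])}
genes = [(3, ["R", "G"], "temp3", []), (10, ["NIR"], "temp1", []),
         (20, ["temp3"], "temp2", [135]), (10, ["B"], "temp4", [])]
out = run_chromosome(image, genes)   # out["temp2"] -> [[255.0, 0.0]]
```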

2.3. Dimension reduction using Fisher’s linear discriminant analysis

In the next step, the temporary output bands are reduced to a single band that represents the temporary building regions. The reduction is carried out by means of Fisher's linear discriminant, a conventional classification algorithm. The method provides the linear combination of the temporary output bands that maximizes the separation between the means of the true pixels (building) and the false pixels (non-building), normalized by the total within-class variance in the projection defined by the linear combination. The result of the discriminant-finding phase is a gray-scale image, which is then reduced to a binary image using the threshold value that maximizes the “fitness”. Fig. 3 illustrates the dimension reduction procedure (Duda, Hart, & Stork, 2001).

For a projection onto one direction w (a two-class problem), the samples are d-dimensional vectors $x_1, \ldots, x_n$, which are divided into two subsets, $D_1$ and $D_2$. The projected samples are computed using Eq. (1) and are correspondingly divided into two subsets, $Y_1$ and $Y_2$ (Duda et al., 2001, chap. 3):

$$y = w^{t} x \qquad (1)$$

The criterion is to maximize Fisher’s linear discriminant J(w):

$$J(w) = \frac{w^{t} S_B w}{w^{t} S_W w} \qquad (2)$$

where $S_B = (m_1 - m_2)(m_1 - m_2)^{t}$ is the between-class scatter matrix ($m_i$ is the mean of the $x \in D_i$) and $S_W = S_1 + S_2$ is the within-class scatter matrix, with

$$S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^{t} \qquad (3)$$

The optimal line direction w can be computed as follows:

$$w = S_W^{-1} (m_1 - m_2) \qquad (4)$$

Fig. 3. The optimum direction w for discriminating between points that belong to two different classes (red and black). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
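Eqs. (1)–(4) can be sketched in numpy as follows, with samples stored one per row. `np.linalg.solve` is used instead of forming $S_W^{-1}$ explicitly, a standard numerical choice rather than something stated in the source. `project_bands` is a hypothetical helper that treats each pixel of the temporary bands as one d-dimensional sample and returns the gray-scale projection image; the fitness-maximizing threshold that turns it into a binary image is not shown.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Eq. (4): w = S_W^{-1} (m1 - m2); X1, X2 hold one sample per row."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)          # Eq. (3) for D1
    S2 = (X2 - m2).T @ (X2 - m2)          # Eq. (3) for D2
    return np.linalg.solve(S1 + S2, m1 - m2)

def fisher_criterion(w, X1, X2):
    """Eq. (2): J(w) = (w^t S_B w) / (w^t S_W w)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sb = np.outer(m1 - m2, m1 - m2)
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    return (w @ Sb @ w) / (w @ Sw @ w)

def project_bands(bands, w):
    """Eq. (1) per pixel: stack the bands as features and return y = w^t x as an image."""
    X = np.stack([b.ravel() for b in bands], axis=1)
    return (X @ w).reshape(bands[0].shape)
```

Since $S_W$ is positive definite, $(m_1 - m_2)^{t} w > 0$, so building pixels (class $D_1$) project to larger values on average and a single threshold on y separates the two classes.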

2.4. Fitness computation

After the building regions have been extracted in binary form, the fitness value ($F_T$) of the candidate solution (chromosome) is computed. For the building and non-building regions, the fitness value of a chromosome can be defined by the degree of agreement between the binary output and the test pixels. For each chromosome, $F_T$ is calculated using the following equation:

$$F_T = 500\,\bigl(D + (1 - MD)\bigr) \qquad (5)$$

where D, the detection rate, is the fraction of test pixels marked as “building” that the classifier marks as “building” plus the fraction of test pixels marked as “non-building” that the classifier marks as “non-building”. MD, the mis-detection rate, is the fraction of test pixels marked as “building” that the classifier marks as “non-building” plus the fraction of test pixels marked as “non-building” that the classifier marks as “building”. For instance, if D = 1, then MD becomes 0 and $F_T$ is 1000, which is the best case. In the worst case, $F_T$ becomes 0, with D = 0 and MD = 1. Note that a fitness score of 500 can be achieved by a classifier that labels all pixels as “building”, or all pixels as “non-building”.
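Read literally, D and MD as defined above are each a sum of two fractions, so each ranges from 0 to 2 and they always sum to 2. The stated values (1000 for a perfect classifier, 0 for the worst case and 500 for an all-building or all-non-building classifier) are reproduced if each is taken as the mean of its two per-class fractions. The sketch below assumes that reading, under which MD = 1 − D and Eq. (5) reduces to 1000·D. The names `fitness`, `pred` and `truth` are illustrative, and both classes must be present among the test pixels.

```python
import numpy as np

def fitness(pred, truth):
    """Eq. (5) with D and MD taken as class-averaged rates (an assumed reading).

    pred, truth: boolean arrays over the test pixels, True = "building".
    """
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tpr = np.mean(pred[truth])           # building test pixels marked "building"
    tnr = np.mean(~pred[~truth])         # non-building test pixels marked "non-building"
    D = 0.5 * (tpr + tnr)
    MD = 0.5 * ((1 - tpr) + (1 - tnr))   # the two error fractions, averaged
    return 500.0 * (D + (1.0 - MD))

truth = np.array([True, True, False, False])
# perfect -> 1000, inverted -> 0, constant labeling -> 500
```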

2.5. Selection, crossover and mutation operations

After the fitness values have been obtained for all chromosomes in the population, the chromosomes are ranked according to their fitness, and only the highest-ranking chromosomes are selected; the rest are discarded. The selection rate $X_R$ is the fraction of the total population $N_{POP}$ that survives to the next generation. The number of chromosomes to be kept, $N_{KEPT}$, is computed as follows:

$$N_{KEPT} = N_{POP} \, X_R \qquad (6)$$

Of the $N_{POP}$ chromosomes in a generation, only the top $N_{KEPT}$ are kept for mating, and the remaining $N_{POP} - N_{KEPT}$ are discarded to make room for new chromosomes. Next, two chromosomes are selected from among the $N_{KEPT}$ kept chromosomes to produce two new offspring. The chromosomes are selected with a random pairing technique that uses a uniform random number generator. To preserve the success rate in the next generations, the chromosome with the highest fitness value, the “elite chromosome”, is excluded from this process, as suggested by De Jong (1975).
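Ranking, truncation by Eq. (6), elitism and uniform random pairing can be sketched as below. Rounding $N_{POP} X_R$ down is an assumption (the source does not state a rounding rule), as is the reading that the elite is simply held out of the pairing pool. `select_and_pair` is a hypothetical name.

```python
import math
import random

def select_and_pair(population, fitnesses, selection_rate, rng):
    """Keep the top N_KEPT chromosomes, hold out the elite, draw one random mating pair."""
    n_pop = len(population)
    n_kept = math.floor(n_pop * selection_rate)     # Eq. (6); rounding down is assumed
    if n_kept < 3:
        raise ValueError("need the elite plus at least two candidate parents")
    ranked = sorted(range(n_pop), key=lambda i: fitnesses[i], reverse=True)
    kept = [population[i] for i in ranked[:n_kept]]
    elite, candidates = kept[0], kept[1:]           # elite is excluded from pairing
    parent_a, parent_b = rng.sample(candidates, 2)  # uniform random pairing
    return elite, kept, (parent_a, parent_b)

pop = [[i] for i in range(10)]
fit = [5, 9, 1, 7, 3, 8, 2, 6, 4, 0]
elite, kept, (a, b) = select_and_pair(pop, fit, 0.5, random.Random(1))
# elite -> [1]; kept -> [[1], [5], [3], [7], [0]]
```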

After the parent chromosomes have been selected, the mating procedure is carried out. Mating can be defined as the creation of one or more offspring chromosomes from the selected parents. The most common forms of mating are the production of two offspring by two parents (crossover) and the creation of a single offspring from one parent (mutation). In general, mutation takes place after crossover has been performed. These operations aim to create a better population in the next generation by producing altered versions of “fit” parent chromosomes. The probabilities of the parent chromosomes being involved in the crossover and mutation operations are set to $P_c$ and $P_m$, respectively. In this study, we use the “single point” crossover operation, in which a crossover point is randomly selected between the first and the last genes of the parents' chromosomes (Holland, 1992). In Fig. 4, Parent Chromosome-1 first copies its genes to the left of the crossover point to Offspring Chromosome-1. Similarly, Parent Chromosome-2 copies its genes to the left of the same crossover point to Offspring Chromosome-2. Then, the genes to the right of the crossover point in Parent Chromosome-1 are moved to Offspring Chromosome-2, and the corresponding genes of Parent Chromosome-2 (genes 1–3) are passed to Offspring Chromosome-1 in the same manner.
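Single-point crossover on two equal-length gene lists can be sketched as follows. The cut is drawn strictly between the first and last genes, so each offspring inherits at least one gene from each parent. Whether a given pair is crossed at all (probability $P_c$) is left to the caller in this sketch; the function name is illustrative.

```python
import random

def single_point_crossover(parent1, parent2, rng):
    """Swap the gene tails of two equal-length chromosomes after a random cut."""
    if len(parent1) != len(parent2) or len(parent1) < 2:
        raise ValueError("parents must have the same length of at least 2 genes")
    cut = rng.randint(1, len(parent1) - 1)   # cut lies strictly between genes
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2, cut

p1 = [3, 10, 20, 10, 24]
p2 = [1, 15, 30, 7, 35]
c1, c2, cut = single_point_crossover(p1, p2, random.Random(4))
```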

Mutation is the second way of diversifying a population. As with crossover, we employ a single point mutation procedure (Holland, 1992). The gene to be mutated is randomly selected from the offspring chromosome produced by the crossover operation and exchanged for a new gene. This new gene is selected arbitrarily from the gene pool, which comprises the set of image operators (Fig. 5). At the end of the crossover and mutation operations, the parents are expected to have produced a total of $N_{POP} - N_{KEPT}$ offspring, keeping the chromosome population at $N_{POP}$. To achieve this, the selection and mating procedures are repeated until the required number of offspring has been produced.

Fig. 4. An example for the crossover operation and the generated offspring chromosomes.
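Single-point mutation can be sketched as below: with probability $P_m$, one randomly chosen gene is replaced by an operator Id drawn from the gene pool. Excluding the current operator from the draw, so that the mutated gene actually changes, is an interpretation of "exchanged for a new gene" rather than a documented detail; the names are illustrative.

```python
import random

GENE_POOL = range(1, 36)  # operator Ids 1-35 of Table 1

def single_point_mutation(chromosome, p_m, rng):
    """With probability p_m, replace one random gene by a different gene from the pool."""
    child = list(chromosome)                 # leave the input chromosome untouched
    if rng.random() < p_m:
        pos = rng.randrange(len(child))
        child[pos] = rng.choice([g for g in GENE_POOL if g != child[pos]])
    return child

c = [3, 10, 20, 10, 24]
m = single_point_mutation(c, 1.0, random.Random(0))  # exactly one gene differs
```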

2.6. Adaptive fuzzy logic controller operation

Before proceeding to the next generation of the genetic image component, an adaptive fuzzy logic controller operation proposed by Herrera and Lozano (2003) and Liu, Xu, and Abraham (2005) is performed. In this study, the measures of GA performance, namely the average and maximum fitness values, together with the control parameters (the crossover and mutation probabilities), are fed into the adaptive fuzzy logic controller. The controller then returns the adjusted parameters for use in the next generation of the GA cycle. The idea behind the adaptive fuzzy logic controller is that the crossover and mutation probabilities ($P_c$ and $P_m$) should increase if the GA consistently produces better offspring. However, $P_c$ should decrease and $P_m$ should increase when $f_{\mathrm{ave}}(k)$ (the average fitness of the $k$th generation) approaches $f_{\max}(k)$ (the maximum fitness of the $k$th generation) or when $f_{\mathrm{ave}}(k-1)$ approaches $f_{\mathrm{ave}}(k)$. This scheme encourages well-performing genes to produce more offspring and reduces the chance that poorly performing genes destroy promising chromosomes during crossover and mutation. Two parameters, $e_1$ and $e_2$ (Eqs. (7) and (8)), were introduced to define the fuzzy rules for the crossover and mutation operations. These fuzzy rules describe the relationship between the inputs ($e_1$ and $e_2$) and the output, which is the step size of the crossover or mutation probability (Table 2).

$$e_1 = \frac{f_{\max}(k) - f_{\mathrm{ave}}(k)}{f_{\max}(k)} \tag{7}$$

$$e_2 = \frac{f_{\mathrm{ave}}(k) - f_{\mathrm{ave}}(k-1)}{f_{\max}(k)} \tag{8}$$

Table 2
Fuzzy rules for crossover and mutation operations (rows: e1; columns: e2).

Crossover (ΔPc(k))
e1 \ e2   NL    NS    ZE    PS    PL
PL        NS    ZE    NS    PS    PL
PS        ZE    ZE    NL    ZE    ZE
ZE        NS    NL    NL    NL    NL

Mutation (ΔPm(k))
e1 \ e2   NL    NS    ZE    PS    PL
PL        PS*   ZE*   PS*   NS*   NL*
PS        ZE*   ZE*   PL*   ZE*   NS*
ZE        PS*   PL*   PL*   PL*   PS*

Fig. 5. An example of the mutation operation and the updated offspring.

Fig. 6. An example defuzzification process: (a) the original membership functions, (b) the membership functions with their top portions removed and (c) the superimposed trapezoids with the centroid of the shape (red dashed line), which specifies the defuzzified value. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

In Table 2, the abbreviations NL, NS, ZE, PS and PL stand for “Negative Large”, “Negative Small”, “Zero”, “Positive Small” and “Positive Large”, respectively. The values of these linguistic terms are derived from the membership functions given in Liu et al. (2005). The inputs of the mutation controller are the same as those of the crossover controller. However, the output values of the mutation controller (marked with an asterisk in Table 2) are scaled by 10% (i.e., PS* = PS/10). The output values computed by the defuzzification process specify the step sizes $\Delta P_c(k)$ and $\Delta P_m(k)$ for the crossover and mutation probabilities, respectively. Defuzzification, which converts the fuzzy output back into a numerical value, is performed with the centroid approach using the membership functions described above. In this approach, each fuzzy set membership function has the shape of a triangle (Fig. 6a). If this triangle is cut along a horizontal line somewhere between its top and bottom and the top portion is removed, the remaining shape is a trapezoid (Fig. 6b). In the first step of defuzzification, the membership functions are cut in this way to form trapezoids. All of these trapezoids are then superimposed on one another, forming a single geometric shape. The centroid of this shape is then calculated and used as the defuzzified value (Fig. 6c). If the shape is regarded as a plate of uniform density, the centroid is the point along the horizontal axis about which the shape would balance. Using the values $\Delta P_c(k)$ and $\Delta P_m(k)$ obtained by defuzzification, the control parameters of the GA are updated as follows (Liu et al., 2005):

$$P_c(k) = P_c(k-1) + \Delta P_c(k) \tag{9}$$

$$P_m(k) = P_m(k-1) + \Delta P_m(k) \tag{10}$$

After the new crossover and mutation probabilities have been determined, the next generation is initiated with a renewed population. The number of generations in the model depends on whether an acceptable solution is reached or a predefined number of iterations is reached. After a while, all of the chromosomes and their fitness values become the same; at this point, the algorithm is stopped. In our experiments, the GA is stopped after the predefined number of generations is reached.
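To make the controller concrete, the sketch below chains Eqs. (7)–(10) with the rule base of Table 2 using standard Mamdani-style inference: each rule fires with the minimum of its two input memberships, each output triangle is clipped at its firing strength (the cut shown in Fig. 6b), the clipped shapes are superimposed with a maximum, and the centroid of the result is taken. The membership functions of Liu et al. (2005) are not reproduced in the text, so the evenly spaced triangular sets (`E1_PEAKS`, `E2_PEAKS`, `DPC_PEAKS`) are hypothetical placeholders, as is the clamping of the probabilities to [0, 1].

```python
# Triangular fuzzy sets with evenly spaced peaks. All peak positions are
# hypothetical placeholders for the membership functions of Liu et al. (2005).
E1_LABELS = ["ZE", "PS", "PL"]
LABELS = ["NL", "NS", "ZE", "PS", "PL"]
E1_PEAKS = [0.0, 0.05, 0.10]
E2_PEAKS = [-0.02, -0.01, 0.0, 0.01, 0.02]
DPC_PEAKS = [-0.10, -0.05, 0.0, 0.05, 0.10]
DPM_PEAKS = [p / 10 for p in DPC_PEAKS]  # mutation outputs scaled by 10% (PS* = PS/10)

# Table 2: rows are e1 labels, columns follow e2 = NL, NS, ZE, PS, PL.
CROSSOVER_RULES = {"PL": ["NS", "ZE", "NS", "PS", "PL"],
                   "PS": ["ZE", "ZE", "NL", "ZE", "ZE"],
                   "ZE": ["NS", "NL", "NL", "NL", "NL"]}
MUTATION_RULES = {"PL": ["PS", "ZE", "PS", "NS", "NL"],
                  "PS": ["ZE", "ZE", "PL", "ZE", "NS"],
                  "ZE": ["PS", "PL", "PL", "PL", "PS"]}


def memberships(x, peaks):
    """Membership of x in each triangle; inputs outside the range saturate."""
    x = min(max(x, peaks[0]), peaks[-1])
    half_width = peaks[1] - peaks[0]
    return [max(0.0, 1.0 - abs(x - p) / half_width) for p in peaks]


def centroid(strengths, peaks, samples=301):
    """Clip each output triangle at its strength, superimpose with max, take the centroid."""
    half_width = peaks[1] - peaks[0]
    lo, hi = peaks[0] - half_width, peaks[-1] + half_width
    num = den = 0.0
    for i in range(samples):
        y = lo + i * (hi - lo) / (samples - 1)
        mu = max(min(s, max(0.0, 1.0 - abs(y - p) / half_width))
                 for s, p in zip(strengths, peaks))
        num += y * mu
        den += mu
    return num / den if den > 0 else 0.0


def infer(e1, e2, rules, out_peaks):
    """Evaluate one rule table and return the defuzzified step size."""
    m1 = dict(zip(E1_LABELS, memberships(e1, E1_PEAKS)))
    m2 = memberships(e2, E2_PEAKS)
    strength = dict.fromkeys(LABELS, 0.0)
    for row, outputs in rules.items():
        for col, out in enumerate(outputs):
            # AND = min; rules sharing an output label are combined with max.
            strength[out] = max(strength[out], min(m1[row], m2[col]))
    return centroid([strength[label] for label in LABELS], out_peaks)


def clamp01(p):
    """Assumption of this sketch: keep probabilities inside [0, 1]."""
    return min(max(p, 0.0), 1.0)


def adapt(pc, pm, f_max, f_ave, f_ave_prev):
    """One controller step: returns the adjusted (Pc, Pm)."""
    e1 = (f_max - f_ave) / f_max          # Eq. (7)
    e2 = (f_ave - f_ave_prev) / f_max     # Eq. (8)
    dpc = infer(e1, e2, CROSSOVER_RULES, DPC_PEAKS)
    dpm = infer(e1, e2, MUTATION_RULES, DPM_PEAKS)
    return clamp01(pc + dpc), clamp01(pm + dpm)  # Eqs. (9) and (10)
```

For example, `adapt(0.8, 0.2, 944.0, 944.0, 944.0)` (average fitness equal to the maximum and no change since the previous generation) fires only the rule (ZE, ZE), so under these placeholder sets it returns approximately (0.7, 0.21): crossover is reduced and mutation is increased, as the text prescribes.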

2.7. Morphological post-processing

Once the buildings are detected, artifacts are removed by applying post-processing operations. The post-processing functions that we utilize include the following image morphological operations: (i) opening, (ii) artifact removal, (iii) closing and (iv) hole filling (Gonzalez & Woods, 2008, chap. 9). First, the opening operation is carried out to smooth the contours of the building regions and eliminate thin protrusions. The opening of set $A$ by structuring element $B$ is denoted $A \circ B$ and is formulated as

$$A \circ B = (A \ominus B) \oplus B \tag{11}$$

where the symbols $\ominus$ and $\oplus$ denote morphological erosion and dilation, respectively. Erosion tends to decrease the sizes of objects and removes small anomalies by eliminating objects with a radius smaller than that of the structuring element. In contrast, dilation generally increases the sizes of objects and connects areas that are separated by gaps smaller than the structuring element. Next, an artifact removal operation is carried out to remove isolated regions. Then, the closing operation, which also tends to smooth sections of contours, is carried out. In contrast to the opening operation, the closing operation generally fuses narrow breaks and long thin gulfs. The closing of set $A$ by structuring element $B$, denoted $A \bullet B$, is defined as

$$A \bullet B = (A \oplus B) \ominus B \tag{12}$$
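Eqs. (11) and (12) can be checked with a small pure-Python sketch in which a binary image is a set of foreground (row, col) pixels. The disk radius of 3 matches Section 3.2; treating pixels outside the image as background is an assumption of this sketch, and the brute-force loops favour clarity over speed (a production pipeline would use an image-processing library).

```python
def disk(radius):
    """Offsets (dr, dc) of a disk-shaped structuring element B."""
    return {(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius}


def dilate(image, element, shape):
    """A ⊕ B. B is symmetric, so no reflection is needed; results are clipped to the image."""
    rows, cols = shape
    return {(r + dr, c + dc)
            for r, c in image
            for dr, dc in element
            if 0 <= r + dr < rows and 0 <= c + dc < cols}


def erode(image, element, shape):
    """A ⊖ B: keep pixels whose translated element lies entirely in the foreground."""
    rows, cols = shape
    return {(r, c)
            for r in range(rows)
            for c in range(cols)
            if all((r + dr, c + dc) in image for dr, dc in element)}


def opening(image, element, shape):
    """Eq. (11): erosion followed by dilation."""
    return dilate(erode(image, element, shape), element, shape)


def closing(image, element, shape):
    """Eq. (12): dilation followed by erosion."""
    return erode(dilate(image, element, shape), element, shape)


# Usage: a 12 x 12 "building" with a one-pixel-wide spur, in a 30 x 30 image.
shape = (30, 30)
B = disk(3)  # radius 3, as in Section 3.2
building = {(r, c) for r in range(8, 20) for c in range(8, 20)}
spur = {(13, c) for c in range(20, 25)}
opened = opening(building | spur, B, shape)
print((13, 23) in opened)  # False: the thin protrusion is removed
```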

where the disk-shaped structuring element is used once again. In the final step of post-processing, the hole-filling operation is carried out. A hole is defined as a set of background pixels surrounded by a connected border of foreground pixels in a binary image. In general, hole-filling algorithms are based on a combination of dilation, complementation and intersection. The effects of the morphological operators on a sample test scene (scene 1) are illustrated in Fig. 7. The combined effect of the opening and artifact removal operations shows that isolated regions and small protrusions are eliminated to a great extent. Closing fuses the narrow parts of building objects, and hole filling fills the isolated regions inside the building patches.
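Hole filling as described above can be sketched with a flood fill: background pixels reachable from the image border through 4-connected background are “outside”, and every other pixel is either foreground or a hole. Repeatedly dilating a border marker with a cross-shaped element and intersecting the result with the complement of the image yields the same set, so the breadth-first search below is one way to realize the dilation, complementation and intersection scheme. The function name and the 4-connectivity are assumptions of this sketch.

```python
from collections import deque


def fill_holes(image, shape):
    """Return image with every background region not 4-connected to the border filled."""
    rows, cols = shape
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if (r in (0, rows - 1) or c in (0, cols - 1)) and (r, c) not in image)
    outside = set(queue)  # background pixels reachable from the border
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            neighbour = (nr, nc)
            if (0 <= nr < rows and 0 <= nc < cols
                    and neighbour not in image and neighbour not in outside):
                outside.add(neighbour)
                queue.append(neighbour)
    # Foreground pixels plus enclosed background (the holes).
    return {(r, c) for r in range(rows) for c in range(cols) if (r, c) not in outside}


# Usage: a hollow 5 x 5 square whose 3 x 3 interior is a hole.
ring = ({(r, c) for r in range(2, 7) for c in range(2, 7)}
        - {(r, c) for r in range(3, 6) for c in range(3, 6)})
print(len(fill_holes(ring, (9, 9))))  # 25
```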

3. Experimental setup

3.1. Description of study area and data

The developed methodology was implemented in 10 different test scenes selected from the Batikent district of the city of Ankara, Turkey. Batikent, which covers an area of approximately 1000 ha, is located on the western corridor of Ankara. It is a planned and regularly developed settlement area that contains various types of buildings with different shapes and uses, such as residential, industrial, commercial, social and cultural facilities. The district was the biggest mass-housing project of the 1980s in Turkey and was accomplished through cooperatives. The project was planned for 50,000 housing units and 250,000 inhabitants (European Resettlement Fund, 2011).

Fig. 8 illustrates the test scenes in false color composite. Scenes 1, 3, 4, 5, 6 and 7 were classified as “urban” following a study conducted by Steiniger, Lange, Burghardt, and Weibel (2008). The perceptual properties of an urban area are that the built-up area is dense and the building shapes are generally complex and compact. The areas covered by the urban scenes are 353 × 263 m for scene 1, 283 × 282 m for scene 3, 164 × 156 m for scene 4, 357 × 348 m for scene 5, 324 × 196 m for scene 6 and 247 × 190 m for scene 7. Scenes 2 and 8 were identified as “suburban” areas, which are characterized by rows of single houses along roads. Moreover, the built-up density is low and the buildings are rather dispersed. The areas covered by the suburban scenes are 572 × 347 m for scene 2 and 632 × 431 m for scene 8. The remaining scenes (scenes 9 and 10) were classified as “rural” areas, where the rural context generally comprises single buildings. Additionally, the built-up area is open and the sizes of the buildings vary from small to large. Rural scenes 9 and 10 cover areas of 552 × 572 m and 646 × 607 m, respectively. For the satellite data, we used 1-m resolution pan-sharpened IKONOS imagery acquired on August 4, 2002. The image was in “Geo” format. To assess the classification accuracies, a reference dataset was prepared for each scene by manually delineating and labeling the buildings in the image (Fig. 9).

3.2. Parameter assignment

Fig. 7. For a sample test scene: (a) the extracted building patches and the building patches after applying the (b) opening, (c) artifact removal, (d) closing and (e) hole-filling operations.

The proposed approach to building detection uses a number of parameters, including the selection rate ($X_R$), the number of generations, the number of runs, the population size (number of chromosomes), the chromosome size (number of genes) and the probabilities of crossover ($P_c$) and mutation ($P_m$). We determined a suitable value for each of these parameters. The choice of selection rate is somewhat arbitrary: letting only a few chromosomes survive may limit the number of available genes, whereas keeping too many chromosomes may result in poor performance. Following Haupt and Haupt (2004, chaps. 1–2), $X_R$ was set to 50% in the natural selection process. After several tests, we found that 20 generations were sufficient for the maximum fitness to level off. As in the studies by Garg (2009) and Bielza, Fernandez del Pozo, Larranaga, and Bengoetxea (2010), the number of runs was set to 10. The initial crossover and mutation probabilities were set to 0.8 and 0.2, respectively, in line with the literature (Haupt & Haupt, 2004; Liu et al., 2005; Perkins et al., 2000). To determine the optimum population and chromosome sizes, the accuracy test conducted in Sumer (2011), based on the average and maximum fitness values, was used. The test results showed that a population size of 30 and a chromosome size of 5 can be used for scenes with characteristics similar to those of the test scenes used in this study.
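The settings above can be gathered into one configuration object, sketched below with hypothetical class and field names. The derived counts assume the usual convention of Haupt and Haupt (2004) that the number of kept chromosomes is $N_{KEPT} = X_R \cdot N_{POP}$; with $X_R = 0.5$ and $N_{POP} = 30$, 15 chromosomes survive and $N_{POP} - N_{KEPT} = 15$ offspring are bred in each generation. $P_c$ and $P_m$ are stored only as starting values, since the adaptive controller updates them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAParameters:
    """Hypothetical container for the settings reported in Section 3.2."""
    selection_rate: float = 0.5   # X_R
    generations: int = 20
    runs: int = 10
    population_size: int = 30     # N_POP (chromosomes)
    chromosome_size: int = 5      # genes (image operators) per chromosome
    initial_pc: float = 0.8       # starting crossover probability
    initial_pm: float = 0.2       # starting mutation probability

    @property
    def n_kept(self) -> int:
        # Assumes N_KEPT = X_R * N_POP (Haupt & Haupt convention).
        return round(self.selection_rate * self.population_size)

    @property
    def n_offspring(self) -> int:
        return self.population_size - self.n_kept


params = GAParameters()
print(params.n_kept, params.n_offspring)  # 15 15
```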

For the post-processing operations, a disk-shaped structuring element with a radius of 3 was used for both the opening and closing operations. Although other structuring-element shapes, such as diamond, line, square and rectangle, are also available, the disk-shaped element with a radius of 3 was found to be more suitable for preserving the orientation of the building regions. The threshold value for the artifact removal operation was set to 75 square meters, which is half the area of the smallest building patch (150 square meters) in the reference data. Patches above this threshold were therefore considered to be buildings and preserved, whereas patches below it were removed from the binary image.
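The artifact-removal rule can be sketched as connected-component labelling followed by an area test. The 8-connectivity and the decision to keep patches of exactly 75 pixels are assumptions (the text only states that patches above the threshold are kept and those below are removed); at the 1-m resolution of the imagery, 75 pixels correspond to 75 square meters. The function name is illustrative.

```python
from collections import deque

MIN_AREA = 75  # pixels; 1 pixel = 1 m^2 for the 1-m IKONOS imagery


def remove_small_patches(image, min_area=MIN_AREA):
    """Drop 8-connected foreground patches smaller than min_area pixels."""
    remaining = set(image)
    kept = set()
    while remaining:
        seed = remaining.pop()
        patch = {seed}
        queue = deque([seed])
        while queue:  # breadth-first labelling of one connected patch
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        patch.add(neighbour)
                        queue.append(neighbour)
        if len(patch) >= min_area:  # patches of exactly 75 pixels are kept (assumption)
            kept |= patch
    return kept


building = {(r, c) for r in range(10) for c in range(10)}       # 100 pixels
speck = {(r, c) for r in range(20, 25) for c in range(20, 25)}  # 25 pixels
print(remove_small_patches(building | speck) == building)        # True
```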

3.3. Performance evaluation

To evaluate the performance of the proposed building detection approach, the quantitative metrics proposed by Shufelt (1999), Lillesand, Kiefer, and Chipman (2008, chap. 7) and Rutzinger, Rottensteiner, and Pfeifer (2009) were used. When the detected buildings are compared with the reference data, a True Positive (TP) is an entity labeled as “building” that also corresponds to a “building” in the reference data. A True Negative (TN) is an entity that belongs to “non-building” in both the detection results and the reference data. A False Positive (FP) is an entity labeled as “building” that corresponds to a “non-building” in the reference data, whereas a False Negative (FN) is the opposite case: an entity labeled as “non-building” that corresponds to a “building” in the reference data.

Fig. 8. The false color pan-sharpened IKONOS images for the test scenes 1–10.

Fig. 9. The reference data prepared for the test scenes 1–10.

To evaluate the accuracies of the detected building patches, the metrics Producer’s Accuracy (PA), User’s Accuracy (UA) and Kappa (κ) were computed using the following equations (Lillesand et al., 2008):


$$\mathrm{PA} = \frac{TP}{TP + FN} \tag{13}$$

$$\mathrm{UA} = \frac{TP}{TP + FP} \tag{14}$$

$$\kappa = \frac{(TP + TN)\,(TP + TN + FP + FN) - \text{chance agreement}}{(TP + TN + FP + FN)^2 - \text{chance agreement}} \tag{15}$$

$$\text{chance agreement} = (TP + FP)(TP + FN) + (TN + FN)(TN + FP) \tag{16}$$
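Eqs. (13)–(16) translate directly into code. The counts in the usage line are hypothetical and serve only to show the arithmetic: with TP = 80, TN = 900, FP = 20 and FN = 20, PA = UA = 0.8, the chance agreement is 100·100 + 920·920 = 856,400, and κ = (980·1020 − 856,400)/(1020² − 856,400) = 143,200/184,000 ≈ 0.778.

```python
def accuracy_metrics(tp, tn, fp, fn):
    """Producer's accuracy, user's accuracy and kappa from pixel counts."""
    n = tp + tn + fp + fn
    pa = tp / (tp + fn)                                     # Eq. (13)
    ua = tp / (tp + fp)                                     # Eq. (14)
    chance = (tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)  # Eq. (16)
    kappa = ((tp + tn) * n - chance) / (n * n - chance)     # Eq. (15)
    return pa, ua, kappa


pa, ua, kappa = accuracy_metrics(tp=80, tn=900, fp=20, fn=20)
print(f"PA={pa:.3f} UA={ua:.3f} kappa={kappa:.3f}")  # PA=0.800 UA=0.800 kappa=0.778
```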

The metric PA, also referred to as “Detection Rate” or “Completeness”, is treated as a measure of object detection performance. It evaluates the fraction of reference pixels labeled as “building” that are also identified as such by the approach. The metric UA, also referred to as “Correctness”, indicates how well the detected buildings match the reference data and is an indicator of the false alarm rate. The compound metric κ is generally assumed to be a more robust measure than a simple percent agreement calculation because it also takes into account the agreement that would occur by chance. The kappa value is 0 if two datasets agree only at the rate expected by chance, 1 if they always agree, and negative if the agreement is worse than chance. In general, a kappa value above 0.8 is considered “good” agreement, a value between 0.67 and 0.8 is taken as “fair”, and agreement below 0.67 is assumed to be “dubious” (Manning, Raghavan, & Schütze, 2008, chap. 8).

4. Results and discussion

4.1. Building detection using the adaptive fuzzy logic controller

The developed approach was tested using the predetermined GA parameters: 20 for the number of generations, 30 for the population size, 5 for the chromosome size, 0.8 for the crossover probability and 0.2 for the mutation probability. Owing to the adaptive fuzzy logic controller, the crossover and mutation probabilities were adjusted adaptively with respect to the performance measures, so 0.8 and 0.2 are their initial values. Each experiment was repeated 10 times, and the highest average fitness values computed for scenes 1–10 in the 20th generation were 942, 938, 907, 907, 831, 965, 913, 857, 927 and 944, respectively.

Table 3
(a) For 10 individual runs, the fitness values computed for 20 generations for scene #1 and (b) average fitness values for scenes #2–#10.

(a) Scene #1, fitness by run number
Gen.  #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Avg.
1     921  855  902  861  903  894  918  893  896  918  896
2     937  858  902  862  903  899  920  896  896  937  901
3     940  858  902  923  903  899  920  921  902  937  910
4     940  858  902  934  903  899  920  921  902  937  912
5     940  942  902  935  903  899  920  921  902  937  920
6     940  942  938  935  936  935  920  921  902  938  931
7     940  942  939  935  936  935  920  921  931  946  934
8     940  942  939  935  936  935  920  921  931  946  934
9     940  942  939  935  936  935  920  936  931  946  936
10    940  942  941  935  936  935  920  936  944  946  937
11    944  942  942  935  936  935  920  936  944  946  938
12    944  942  942  935  936  935  934  936  944  946  939
13    944  942  942  935  936  939  935  936  944  946  940
14    944  942  947  935  936  941  935  936  944  946  941
15    944  942  947  935  936  941  935  940  944  946  941
16    944  942  947  935  936  941  935  940  944  947  941
17    944  942  947  935  936  941  935  940  944  947  941
18    944  942  947  935  936  941  935  940  944  947  941
19    944  942  947  936  939  944  935  940  944  947  942
20    944  943  947  936  940  944  935  940  944  947  942

(b) Average fitness by scene
Gen.  #2   #3   #4   #5   #6   #7   #8   #9   #10
1     908  897  897  810  945  851  750  852  887
2     918  899  898  813  951  864  773  854  892
3     920  901  901  815  953  866  775  860  912
4     921  902  902  816  954  868  787  872  916
5     924  902  902  817  954  869  795  881  917
6     930  902  902  820  954  873  801  885  921
7     933  902  903  822  955  874  804  894  933
8     934  902  904  823  959  877  811  902  933
9     934  903  904  824  959  877  816  909  933
10    935  903  904  824  960  877  837  910  934
11    935  903  904  826  960  889  843  917  936
12    936  905  905  826  962  895  844  917  939
13    936  905  905  826  962  904  844  920  939
14    937  906  906  827  962  905  847  920  940
15    937  906  906  827  963  905  850  920  940
16    937  906  906  827  963  905  852  924  940
17    938  906  906  829  964  912  853  926  942
18    938  906  907  830  964  913  854  927  943
19    938  906  907  831  964  913  857  927  943
20    938  907  907  831  965  913  857  927  944

Fig. 10. The detected buildings for the test scenes 1–10 with the metrics: Fitness, Producer’s Accuracy, User’s Accuracy and Kappa.

Fig. 11. Examples of false alarm regions (circled by red), incompletely detected buildings (circled by yellow) and closely located buildings (circled by green) selected from the test scenes 1–4. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The complete fitness values obtained for scene 1 and the average fitness values for the remaining scenes (scenes 2–10) are given in Table 3. For scene 1, the best progress was made by run 2, which starts from a fitness value of 855 and reaches 943. These extreme values are shown in italics in Table 3a. However, assessing the progress of an individual run may yield misleading results because the runs may start from different fitness scores in the parameter space. This is because the proposed approach is based on a random process in which an individual may produce a very successful score in one run but fail in another run for the same scene. Therefore, to smooth out the extreme values, we considered the average of the fitness values. In this context, the average fitness values were analyzed for all generations, where the first generation refers to the classification results of Fisher’s linear discriminant analysis and the last generation corresponds to the final fitness values. As expected, an increasing trend is evident for all scenes; the differences in average fitness between the first and the last generations were found to be 46, 30, 10, 10, 21, 20, 62, 107, 75 and 57 for scenes 1–10, respectively. The best progress, with a standard deviation of 33.28, was achieved for scene 8. This scene is classified as “suburban” and covers an area of 632 × 431 m, which can be considered rather large. On the other hand, the worst progress was observed for scenes 3 and 4, with standard deviations of 2.76 and 2.86, respectively. These scenes were categorized as “urban” and, based on their areas (283 × 282 m for scene 3 and 164 × 156 m for scene 4), they can be considered small.
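The averaging step can be illustrated with the scene 1 values in Table 3a. Averaging the ten runs generation by generation gives 896.1 in the first generation and 942.0 in the twentieth; these round to the tabulated 896 and 942 and yield the reported gain of 46. To keep the sketch short, only the first and last generations are included, and the function names are illustrative.

```python
from statistics import fmean

# Fitness of runs #1-#10 for scene 1 in generations 1 and 20 (Table 3a).
SCENE1_RUNS = [
    [921, 944], [855, 943], [902, 947], [861, 936], [903, 940],
    [894, 944], [918, 935], [893, 940], [896, 944], [918, 947],
]


def average_curve(runs):
    """Mean fitness per generation across runs (runs[i][g] = run i, generation g)."""
    return [fmean(generation) for generation in zip(*runs)]


def gain(curve):
    """Progress between the first and last generations of an averaged curve."""
    return curve[-1] - curve[0]


curve = average_curve(SCENE1_RUNS)
print([round(v, 1) for v in curve])      # [896.1, 942.0]
print(gain([round(v) for v in curve]))  # 46
```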

The computed fitness values are not used for the quantitative evaluation of the results; they serve only to select the output images, which are then assessed using the metrics described in Section 3.3. Therefore, for each test scene, we considered the output image with the highest fitness value. For the quantitative assessment, the counts TP, TN, FP and FN were computed together with the metrics PA, UA and κ. The binary output images with the highest fitness values and the corresponding PA, UA and κ values are shown in Fig. 10.

The producer’s accuracies ranged from 0.69 (scene 2) to 0.89 (scene 1). The user’s accuracies ranged from 0.50 (scene 10) to 0.91 (scene 3), and the kappa values ranged from 0.55 (scene 5) to 0.88 (scene 1). The average kappa value was 0.76 for the suburban context (scenes 2 and 8), 0.74 for the urban context (scenes 1, 3, 4, 5, 6 and 7) and 0.68 for the rural context (scenes 9 and 10). Similarly, the average producer’s and user’s accuracies were 0.75 and 0.84 for the suburban context, 0.79 and 0.85 for the urban context and 0.78 and 0.66 for the rural context, respectively. Of the ten test scenes, scene 1 can be said to be the most successful one, with the highest kappa value of 0.88; it therefore demonstrates “good” agreement. In contrast, scene 5 yielded the lowest kappa value, 0.55, which can be interpreted as “dubious”. We believe that the failure of the classification for this scene is due to its high building density.

Although the computed accuracies were satisfactory, several shortcomings were evident in certain cases. Investigating the reasons for the failures, we found that the success of building detection was highly affected by the scene characteristics. For instance, in scene 3, vegetation occludes the roof edges to a certain extent, which increases the number of FN pixels (Fig. 11c). Furthermore, in the lower part of scene 2, the zigzag-shaped roof structures cause spectral confusion that tends to increase the number of FP pixels (Fig. 11b). Another shortcoming arising from the scene characteristics is that closely located buildings are detected as a single joined building patch. This case is shown in scene 4 (circled in green), where the urban buildings are rather dense (Fig. 11d). In addition, the red and yellow circles indicate areas of false alarms and incompletely detected buildings, respectively (Fig. 11).

Aside from the experimental results, the running times were also measured for each test scene. The average running time over 10 individual runs varied between 40 s (scene 4) and 274 s (scene 10). Of the test scenes used, scene 4 has the smallest area, 164 × 156 m, whereas scene 10 has the largest area, 646 × 607 m. However, we found that the scene size was not the only factor affecting the execution time. The gene arrangement is also an important factor because every image processing operator has a different run time. The total time required to process the 10 scenes was 20.9 min (1251 s), which means that each scene was processed in 2.1 min on average. All of these performance measurements were made on a Windows XP operating system with a Pentium Core 2 1.86 GHz processor and 2 GB RAM.

Table 4
(a) For 10 individual runs, the fitness values computed for 20 generations using the conventional approach for scene #1 and (b) average fitness values for scenes #2–#10.

(a) Scene #1, fitness by run number
Gen.  #1   #2   #3   #4   #5   #6   #7   #8   #9   #10  Avg.
1     915  934  904  899  863  903  938  880  932  864  903
2     915  934  904  899  897  903  938  880  932  880  908
3     931  934  906  903  897  903  939  880  932  880  910
4     931  934  906  903  897  903  939  880  932  880  910
5     931  934  934  935  897  903  939  880  932  880  916
6     931  934  934  935  897  903  939  880  932  880  916
7     931  934  934  935  897  903  939  903  932  889  920
8     931  934  934  935  897  903  939  903  939  889  920
9     931  934  934  935  897  903  939  903  939  902  922
10    931  934  934  935  899  903  939  903  939  902  922
11    931  934  934  935  922  934  939  903  939  902  927
12    931  934  934  935  922  934  940  938  939  902  931
13    931  937  934  935  922  934  940  938  939  902  931
14    931  937  934  935  922  935  940  938  939  902  931
15    931  938  934  940  922  935  940  938  939  908  932
16    931  938  934  940  922  935  941  938  939  908  932
17    931  938  934  940  922  935  941  938  939  908  933
18    931  938  934  940  922  935  945  938  939  908  933
19    931  938  935  940  922  935  945  941  939  908  933
20    931  938  935  940  922  935  945  941  939  908  933

(b) Average fitness by scene
Gen.  #2   #3   #4   #5   #6   #7   #8   #9   #10
1     919  895  897  812  951  849  786  858  903
2     920  896  897  813  952  860  789  864  907
3     930  898  898  814  952  865  810  877  910
4     930  899  898  818  955  865  813  886  919
5     931  899  899  819  955  869  814  887  923
6     932  900  900  820  955  873  827  897  924
7     932  900  900  823  956  879  830  898  924
8     933  900  900  823  956  884  835  901  926
9     933  901  901  823  957  885  838  905  931
10    933  901  901  823  957  891  839  909  933
11    934  901  902  824  958  891  839  909  933
12    934  902  903  824  959  892  839  909  933
13    934  902  903  825  960  892  840  911  935
14    935  902  903  825  960  896  843  912  936
15    935  902  904  825  960  896  843  913  936
16    935  902  904  826  960  896  848  913  936
17    935  902  904  826  961  896  850  914  936
18    935  902  904  827  961  898  851  915  937
19    935  902  904  827  961  898  851  915  937
20    935  902  904  827  961  899  853  916  938

Fig. 12. For the two approaches, the equalized performance curves generated for the test scenes 1–10. The dashed lines denote the proposed adaptive approach, while the solid lines refer to the conventional approach.

4.2. Building detection without using the adaptive fuzzy logic controller

To assess the contribution of the adaptive fuzzy approach, the system was re-executed without the adaptive fuzzy logic controller. In this case, the crossover and mutation probabilities were not allowed to change and remained at 0.8 and 0.2, respectively, throughout the execution. As in the previous (adaptive) scenario, each experiment was repeated 10 times. The complete fitness values computed for scene 1 and the average fitness values for the remaining scenes (scenes 2–10) are provided in Table 4. For scene 1, the best fitness progress was made by run 8, which starts from a fitness value of 880 and reaches 941. The extreme values are shown in italics in Table 4a. As in the adaptive scenario, the average fitness values were analyzed instead of individual runs. The highest values (averaged over 10 runs) in the 20th generation were 933, 935, 902, 904, 827, 961, 899, 853, 916 and 938 for scenes 1–10, respectively. The fitness differences between the first and the last generations were 30, 16, 7, 7, 15, 10, 50, 67, 58 and 35 for scenes 1–10, respectively. As in the adaptive scenario, the best progress, with a standard deviation of 19.74, was achieved for scene 8. On the other hand, scenes 3 and 4 made the worst progress, with standard deviations of 2.09 and 2.57, respectively.

In addition to the fitness values, the performance curves of the two approaches (Fig. 12) were also analyzed. To allow a fairer comparison between the conventional approach and the proposed adaptive approach, the starting fitness values were equalized. For all test scenes, the proposed adaptive approach performs better than the conventional approach. The equalized differences between the two approaches are 16 (949–933) for scene 1, 14 (949–935) for scene 2, 3 (905–902) for scene 3, 3 (907–904) for scene 4, 6 (833–827) for scene 5, 10 (971–961) for scene 6, 12 (911–899) for scene 7, 40 (893–853) for scene 8, 17 (933–916) for scene 9 and 22 (960–938) for scene 10 (Fig. 12). The performance curves indicate that the conventional approach is more likely to get trapped in local optima. In particular, the conventional performance curves (solid lines) for scenes 1 and 6 reach their maxima after the 17th generation, whereas those for scenes 2, 3 and 4 reach their maxima at local optimal solutions after the 14th, 12th and 15th generations, respectively. For scenes 3 and 7, the adaptive approach takes more time to find a better solution but is more likely to arrive at the optimum solution. On the other hand, scene 8 yields the largest difference between the adaptive and conventional approaches.
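The equalized differences above can be reproduced from the first and last averaged fitness values in Tables 3 and 4 by shifting each adaptive curve so that it starts at the conventional curve’s first value. This reading of “equalized” is our interpretation, but it matches all ten reported numbers; under it, the equalized difference equals the adaptive gain minus the conventional gain. The dictionary and function names are illustrative.

```python
# (first, last) average fitness per scene, taken from Tables 3 and 4.
ADAPTIVE = {1: (896, 942), 2: (908, 938), 3: (897, 907), 4: (897, 907), 5: (810, 831),
            6: (945, 965), 7: (851, 913), 8: (750, 857), 9: (852, 927), 10: (887, 944)}
CONVENTIONAL = {1: (903, 933), 2: (919, 935), 3: (895, 902), 4: (897, 904), 5: (812, 827),
                6: (951, 961), 7: (849, 899), 8: (786, 853), 9: (858, 916), 10: (903, 938)}


def equalized_difference(adaptive, conventional):
    """Shift the adaptive curve to the conventional start and compare end points."""
    (a_first, a_last), (c_first, c_last) = adaptive, conventional
    shifted_last = a_last + (c_first - a_first)
    return shifted_last, shifted_last - c_last


for scene in ADAPTIVE:
    end, diff = equalized_difference(ADAPTIVE[scene], CONVENTIONAL[scene])
    print(f"scene {scene}: {diff} ({end}-{CONVENTIONAL[scene][1]})")
```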

As with the adaptive approach, the running times were also measured in this scenario. The average running times ranged from 32 s for scene 4 to 274 s for scene 10, so the fastest and slowest scenes were the same as in the adaptive approach. The total running time required to process the 10 scenes was 20.1 min (1205 s), which is about 4% faster than the adaptive approach (1251 s). We believe this difference is due to the variable crossover and mutation probabilities in the adaptive approach. In the present scenario, the crossover and mutation probabilities are fixed, whereas in the adaptive approach they are subject to change. In particular, the experiments show that the mutation probability tends to increase in later generations, making the running times slightly longer because many more operations are executed.

5. Conclusions

In this study, we presented a GA-based building detection approach using high-resolution satellite imagery. The approach is a hybrid system that combines evolutionary techniques with a traditional classification method (Fisher’s linear discriminant) and an adaptive fuzzy logic component. The approach improves object detection accuracy by reducing the premature convergence problem encountered in GAs. The fundamental image processing operators are integrated with GA concepts such as population, chromosome, gene, crossover and mutation. The experiments demonstrated the effectiveness of the proposed approach.

The experimental validation of the approach was carried out on ten selected test scenes with different characteristics. The kappa values computed for the detected building patches ranged from 0.55 to 0.88. The extraction performance was better for urban and suburban buildings than for the buildings in the rural test scenes. Among the scenes analyzed, the rural scenes generally include more buildings under construction, which yield low user’s and producer’s accuracies. Further, the detection of densely located urban buildings is also problematic, although the computed user’s and producer’s accuracies were fairly high.

The proposed approach provided higher fitness values than a traditional classification method (the Fisher’s linear discriminant classifier), whose fitness values correspond to the first generation of the GA. In the experimental tests, considerable improvements were evident; for example, the average fitness increased by 107 of 1000 units for scene 8. On the other hand, only minor improvements were obtained for test scenes with smaller areas (e.g. scene 3). Moreover, when the algorithm was initiated at a high fitness value, it had little chance of making a significant jump.

Compared to the conventional GA approach, the proposed adaptive fuzzy-genetic algorithm approach is more effective, yielding higher average fitness values. For the test scenes analyzed, the differences in average fitness values between the two approaches ranged from 3 to 40. We believe that these differences are due to the fixed crossover and mutation probabilities of the conventional approach, which greatly increase the risk of getting trapped in a local optimum. In other words, the adaptive fuzzy-genetic approach finds better solutions by dynamically adjusting the GA control parameters.

Finally, the image morphology-based post-processing stage successfully reduced the false alarm areas. The morphological opening and artifact removal operations removed the isolated regions and small protrusions to a large extent. Further, the closing and hole-filling operations successfully fused the narrow building parts and removed the holes in the building patches, respectively.

The proposed approach has a few limitations. First, although an adaptive fuzzy logic controller is integrated into the approach, there is no absolute assurance that the GA will find a globally optimal solution. Second, parameter selection and initialization are critical issues in terms of execution cost and overall accuracy. Careful choices are needed for settings such as the selection, crossover and mutation methods, along with the population and chromosome sizes. Choosing improper parameters might result in longer program runs or even unsatisfactory results.

Acknowledgments

The authors thank the two anonymous reviewers for their constructive suggestions and comments on this article.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compenvurbsys.2013.01.004.

References

Bielza, C., Fernandez del Pozo, J. A., Larranaga, P., & Bengoetxea, E. (2010).Multidimensional statistical analysis of the parameterization of a geneticalgorithm for the optimal ordering of tables. Expert Systems with Applications, 37,804–815.

Chang-Shing, L., Shu-Mei, G., & Chin-Yuan, H. (2005). Genetic-based fuzzy imagefilter and its applications to image processing. IEEE Transactions on Systems,Man, and Cybernetics – Part B. Cybernetics, 35, 694–711.

De Jong, K. A. (1975). An analysis of the behavior of a class of genetic adaptive systems.Doctoral dissertation, University of Michigan, Ann Arbor, Michigan, USA.

Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification (2nd ed.). Wiley-Blackwell.

European Resettlement Fund. (2011). Batikent project-Turkey. <http://vcn.bc.ca/citizens-handbook/unesco/most/easteur1.html> Retrieved 12.10.11.

Fraser, C. S., Baltsavias, E., & Gruen, A. (2001). 3-D building reconstruction fromhigh-resolution Ikonos stereo-imagery. In E. P. Baltsavias, A. Gruen, & L. V. Gool(Eds.), Automatic extraction of man-made objects from aerial and space images (III)(pp. 331–344). Leiden: Balkema Publishing.

Garg, P. (2009). A comparison between memetic algorithm and genetic algorithmfor the cryptanalysis of simplified data encryption standard algorithm.International Journal of Network Security & Its Applications, 1, 34–42.

Gonzalez, R. C., Woods, R. E., & Eddins, S. L. (2009). Digital image processing using Matlab (2nd ed.). Gatesmark Publishing, LLC.

Gonzalez, R. C., & Woods, R. E. (2008). Digital image processing (3rd ed.). New Jersey: Pearson Education.

Harvey, N. R., Theiler, J., Brumby, S. P., Perkins, S., Szymanski, J. J., Bloch, J. J., et al. (2002). Comparison of GENIE and conventional supervised classifiers for multispectral image feature extraction. IEEE Transactions on Geoscience and Remote Sensing, 40, 393–404.

Haupt, R. L., & Haupt, S. E. (2004). Practical genetic algorithms (2nd ed.). New Jersey: John Wiley & Sons.

Herrera, F., & Lozano, M. (2003). Fuzzy adaptive genetic algorithms: Design, taxonomy and future directions. Soft Computing, 7, 545–562.

Holland, J. H. (1992). Adaptation in natural and artificial systems (2nd ed.). Cambridge, MA: MIT Press (Chapter 6).

Inglada, J. (2007). Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features. ISPRS Journal of Photogrammetry and Remote Sensing, 62, 236–248.

Ioannidis, C., Psaltis, C., & Potsiou, C. (2009). Towards a strategy for control of suburban informal buildings through automatic change detection. Computers, Environment and Urban Systems, 33, 64–74.

Karantzalos, K., & Paragios, N. (2010). Large scale building reconstruction through information fusion and 3-D priors. IEEE Transactions on Geoscience and Remote Sensing, 48, 2283–2296.

Kim, T., Lee, T. Y., & Kim, K. O. (2006). Semiautomatic building line extraction from Ikonos images through monoscopic line analysis. Photogrammetric Engineering and Remote Sensing, 72, 541–549.

Koc San, D., & Turker, M. (2012). A model-based approach for automatic building database updating from high-resolution space imagery. International Journal of Remote Sensing, 33, 4193–4218.

Lafarge, F., Descombes, X., Zerubia, J., & Pierrot-Deseilligny, M. (2010). Structural approach for building reconstruction from a single DSM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 135–146.

Laws, K. I. (1980). Rapid texture identification. Proceedings of SPIE, 238, 376–380.

Lee, D. S., Shan, J., & Bethel, J. S. (2003). Class-guided building extraction from Ikonos imagery. Photogrammetric Engineering and Remote Sensing, 69, 143–150.

Lillesand, T., Kiefer, R. W., & Chipman, J. W. (2008). Remote sensing and image interpretation (6th ed.). John Wiley & Sons Inc.

Li, Y., Bai, B., & Zhang, Y. (2007). An adaptive immune genetic algorithm for edge detection. Advanced Intelligent Computing Theories and Applications: With Aspects of Artificial Intelligence, Lecture Notes in Computer Science, 4682, 565–571.

Liu, Z., Cui, S., & Yan, Q. (2008). Building extraction from high resolution satellite imagery based on multi-scale image segmentation and model matching. In Proceedings of 2008 international workshop on earth observation and remote sensing applications, Beijing.

Liu, H., Xu, Z., & Abraham, A. (2005). Hybrid fuzzy-genetic algorithm approach for crew grouping. In Proceedings of 5th international conference on Intelligent Systems Design and Applications (ISDA’05) (pp. 332–337), Wroclaw.

Manning, C. D., Raghavan, P., & Schütze, H. (2008). An introduction to information retrieval. Cambridge University Press.

Maulik, U. (2009). Medical image segmentation using genetic algorithms. IEEE Transactions on Information Technology in Biomedicine, 13, 166–173.

Mayunga, S. D., Coleman, D. J., & Zhang, Y. (2007). A semi-automated approach for extracting buildings from Quickbird imagery applied to informal settlement mapping. International Journal of Remote Sensing, 28, 2343–2357.

Paulinas, M., & Usinskas, A. (2007). A survey of genetic algorithms applications for image enhancement and segmentation. Information Technology and Control, 36, 278–284.

Perkins, S., Edlund, K., Esch-Mosher, D., Eads, D., Harvey, N., & Brumby, S. (2005). Genie Pro: Robust image classification using shape, texture and spectral information. Proceedings of SPIE, 5806, 139–148.

Perkins, S., Theiler, J., Brumby, S. P., Harvey, N. R., Porter, R., Szymanski, J. J., et al. (2000). GENIE: A hybrid genetic algorithm for feature classification in multispectral images. Proceedings of SPIE, 4120, 52–62.

Rutzinger, M., Rottensteiner, F., & Pfeifer, N. (2009). A comparison of evaluation techniques for building extraction from airborne laser scanning. IEEE Journal of Selected Topics in Applied Earth Observation and Remote Sensing, 2, 11–20.

Shackelford, A. K., & Davis, C. H. (2003). A combined fuzzy pixel-based and object-based approach for classification of high-resolution multispectral data over urban areas. IEEE Transactions on Geoscience and Remote Sensing, 41, 2354–2363.

Shufelt, J. A. (1999). Performance evaluation and analysis of monocular building extraction from aerial imagery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 311–326.

Sirmacek, B., & Unsalan, C. (2009). Urban-area and building detection using SIFT keypoints and graph theory. IEEE Transactions on Geoscience and Remote Sensing, 47, 1156–1167.

Sohn, G., & Dowman, I. (2007). Data fusion of high-resolution satellite imagery and LIDAR data for automatic building extraction. ISPRS Journal of Photogrammetry and Remote Sensing, 62, 43–63.

Steiniger, S., Lange, T., Burghardt, D., & Weibel, R. (2008). An approach for the classification of urban building structures based on discriminant analysis techniques. Transactions in GIS, 12, 31–59.

Sumer, E. (2011). Automatic reconstruction of photorealistic 3-D building models from satellite and ground-level images. Unpublished PhD thesis, Middle East Technical University, Ankara, Turkey.

Tournaire, O., Bredif, M., Boldo, D., & Durupt, M. (2010). An efficient stochastic approach for building footprint extraction from digital elevation models. ISPRS Journal of Photogrammetry and Remote Sensing, 65, 317–327.

Tupin, F., & Roux, M. (2003). Detection of building outlines based on the fusion of SAR and optical features. ISPRS Journal of Photogrammetry and Remote Sensing, 58, 71–82.

Wei, Y., Zhao, Z., & Song, J. (2004). Urban building extraction from high-resolution satellite panchromatic image using clustering and edge detection. In Proceedings of the IEEE international geoscience and remote sensing symposium, Anchorage, AK, 2008–2010.

Yang, M. D. (2007). A genetic algorithm (GA) based automated classifier for remote sensing imagery. Canadian Journal of Remote Sensing, 33, 203–213.