

354 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 48, NO. 3, MARCH 2018

Bottom-Up Merging Segmentation for Color Images With Complex Areas

Haifeng Sima, Ping Guo, Senior Member, IEEE, Youfeng Zou, Zhiheng Wang, and Mingliang Xu

Abstract—Most color images obtained from the real world, such as natural scene images, remote sensing images, and medical images, contain complex areas. All these types of images are very difficult to segment accurately and automatically because of the complex colors and structures they contain. In this paper, we focus on detecting hybrid cues of a color image to segment complex scenes in a bottom-up framework. The proposed segmentation method is based on a two-step procedure: 1) a reasonable superpixels computing method is conducted and 2) a Mumford–Shah (M–S) optimal merging model is proposed for the presegmented superpixels. First, a set of seed pixels is positioned at the lowest points of a texture energy map computed from structure tensor diffusion features. Next, we implement a growing procedure to extract superpixels from the selected seed pixels using color and texture cues. After that, a color-texture histogram feature is defined to measure similarity between regions, and an M–S optimal merging process is executed by comparing the similarity of adjacent regions under standard deviation constraints to obtain the final segmentation. Extensive experiments are conducted on the Berkeley segmentation database, some remote sensing images, and medical images. The experimental results verify the effectiveness of the proposed method in segmenting complex scenes and indicate that it is more robust and accurate than conventional methods.

Index Terms—Color-texture histograms, image segmentation, Mumford–Shah (M–S) merging, structure tensor diffusion, superpixels.

I. INTRODUCTION

IMAGE segmentation is a basic vision task that facilitates image analysis and understanding, and it is a fundamental

Manuscript received January 10, 2016; revised April 14, 2016 and July 2, 2016; accepted August 31, 2016. Date of publication January 18, 2017; date of current version February 14, 2018. This work was supported in part by the Joint Fund of National Natural Science Foundation of China (NSFC) under Grant U1261206, in part by the Joint Research Fund of NSFC and the Chinese Academy of Sciences in Astronomy under Grant U1531242, in part by the NSFC under Grant 61375045, Grant 61572173, and Grant 61602157, in part by the Science and Technology Planning Project of Henan Province under Grant 162102210062, in part by the Key Scientific Research Fund of the Henan Provincial Education Department for higher school under Grant 15A520072, and in part by the Doctoral Foundation of Henan Polytechnic University under Grant B2016-37. This paper was recommended by Associate Editor D. Zhang. (Corresponding author: Zhiheng Wang.)

H. Sima and Z. Wang are with the Department of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454003, China (e-mail: [email protected]; [email protected]).

P. Guo is with the School of System Science, Beijing Normal University, Beijing 100875, China (e-mail: [email protected]).

Y. Zou is with Henan Polytechnic University, Jiaozuo 454003, China.

M. Xu is with the School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMC.2016.2608831

and crucial step in many computer vision applications. The primary aim of segmentation is to divide an image into meaningful and spatially connected regions (such as object and background) on the basis of similar properties. Existing segmentation approaches can predominantly be divided into the following four categories: 1) region-based methods [1]–[3]; 2) feature-based methods [4]; 3) boundary-based methods [5], [6]; and 4) model-based methods [7]–[10]. It is difficult to define a perfect assessment criterion to measure segmentation quality, because there is no universal solution for color image segmentation. In most applications, image segmentation methods are designed to adapt to specific applications and requirements.

In the segmentation of images with complex scenes, superpixels extraction is an ideal preprocessing technique to simplify the segmentation task. A superpixel is defined as a connected set of homogeneous pixels, and superpixels have attracted increasing attention in recent years. The various superpixels extraction algorithms differ in their solution measures, which results in different presentations of shape, area, boundaries, compactness, etc. For the subsequent task, boundary precision is the most important index in this paper, as preparation for the merging stage.

The region merging process is the critical step in bottom-up segmentation approaches. A merging technique is a set of region-based manipulations that decide whether to merge regions according to their properties and features. The process starts from an initial over-segmentation (superpixels), and these superpixels are iteratively merged until a given termination criterion is satisfied. Thus, an effective superpixels computing method and merging strategy are the key issues in complex scene segmentation. The goal of this paper is thereby to find an ideal method to extract superpixels with accurate boundaries and to establish a consistent optimal region merging model for the bottom-up segmentation framework.

In this paper, we propose a two-step segmentation method based on pixel growing and Mumford–Shah (M–S) optimal merging. First, the texture information is obtained by total variation flow (TV flow) diffusion on structure tensors. Next, we select seeds in a texture gallery and grow them into uniform superpixels according to the similarity of color and texture features. Then we define a combined color and texture descriptor for superpixels based on the local color pattern (LCP) and design a standard deviation-based M–S minimization merging method that gathers adjacent, homogeneous superpixels into the final segmentation regions. A flowchart of our proposed

2168-2216 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


SIMA et al.: BOTTOM-UP MERGING SEGMENTATION FOR COLOR IMAGES WITH COMPLEX AREAS 355

Fig. 1. Segmentation process and intermediate results of our method.

method is depicted in Fig. 1. The main contributions of this paper are as follows.

1) The total variation diffusion of the structure tensor is developed on three channels for computing texture features, and a pixels growing method is first used to produce superpixels that contain complete textures and accurate boundaries using the mixture features.

2) In the second stage, we establish a two-tier merging criterion based on the M–S model and the standard deviation of region features to control region merging. We design a combined histogram feature that mixes color statistical characteristics with distribution information to describe the commonality and distinction of superpixels. These two measures are very effective for distinguishing homogeneous and inhomogeneous superpixels in the merging process, leading to accurate segmentation.

The remainder of this paper is organized as follows. In Section II, we describe the related work. Section III describes the proposed texture detection and growing strategy used to obtain the superpixels. Section IV describes the basic concepts of the color-texture feature and the merging scheme in detail. Section V describes extensive experiments conducted on the Berkeley segmentation dataset (BSD300) and some selected images to test and validate the efficiency of our method. Section VI summarizes and concludes this paper.

II. RELATED WORK

A number of superpixels-based segmentation methods have shown good performance. Yang et al. [26] proposed a compression-based texture merging (CTM) segmentation method based on superpixels. The method uses a sliding window to extract texture features within superpixels in each channel of a Lab color image, and reduces the color-texture feature to 8-D by principal component analysis. It then uses a mixture-of-Gaussians degradation distribution to describe the texture features and uses data clustering to merge superpixels for image segmentation. Its segmentation performance depends significantly on the texture difference parameter, and the texture feature computation is very time-consuming. Xie et al. [15] proposed a Laplacian sparse subspace clustering for salient object extraction based on similarity between superpixels. Li et al. [35] proposed



a novel segmentation framework based on bipartite graph partitioning that is able to aggregate multilayer superpixels in a principled and effective way, which can lead to fast segmentation of images. However, there are limitations: first, the number of segments must be predefined; second, texture images are poorly handled. GL-graph [36] refined the B-graph using medium-sized superpixels through a sparse representation of superpixels features obtained by solving an ℓ0-minimization problem. Taylor [16] explored approaches to accelerating segmentation and edge detection algorithms based on the gPb framework; it invokes a globalization procedure that captures the essential features based on an analysis of the normalized-cuts eigenvectors.

The superpixels extraction methods usually divide an image into hundreds of homogeneous segments instead of many redundant pixels. These methods gather homogeneous pixels that meet the corresponding demands. The most popular methods are based on graph-cut [9], Lattice cut [23], local clustering [22], geometric flows [24], etc. Graph-cut [9] tries to segment homogeneous connected regions based on the max-flow/min-cut principle, and it performs poorly on texture regions. Lattice cut [23] employs a supervised binary segmentation method to conduct unsupervised over-segmentation into superpixels based on an alternating optimization strategy. Local clustering [22] chooses cluster centers from uniform blocks, and each pixel is assigned to a similar superpixel by a linear iterative clustering algorithm with color and coordinate features. Geometric flows [24] is an effective superpixels extraction method based on an evolving contour model. Besides, some novel models have been designed for superpixels computing. Linear spectral clustering (LSC) [25] uses a kernel function to explicitly map pixel values and coordinates into a high-dimensional feature space and optimizes the normalized-cuts cost function by iteratively applying simple K-means clustering in the proposed feature space. Yang et al. [26] proposed a K-means-like clustering method applied to RGB-D data for over-segmentation, using an 8-D distance metric constructed from both color and 3-D geometrical information. As the first step of image segmentation, the primary job is to obtain homogeneous regions and accurate boundaries, but all these methods perform inadequately at boundary detection on images with complex scenes. For color images with complex scenes, color and texture are the basic and essential visual features used to separate the complex areas. Much research has focused on improving their adaptability to nonuniform color distributions, input noise, and texture discrimination [26].

Rousson et al. [21] proposed TV flow to describe texture characteristics. The method is not only able to express the details of the texture features, including direction and magnitude, but can also estimate local scale information of texture by variational evolution. TV flow also makes up for the loss of texture scale in features coming from the structure tensor operator. Hence, the variational tensor diffusion model is appropriate for computing texture information for superpixels extraction in this paper.

In general, growing methods often result in over-segmented regions, so they are often followed by a merging procedure to improve the segmentation results [45]. The region merging process aims to yield larger meaningful regions with respect to the input image content by gathering sets of adjacent segments. Both the similarity in feature space and the spatial relations of pixels or regions need to be computed in the merging process. The similarity measurement usually employs color and texture information.

Much research has developed merging techniques in various aspects. Nock and Nielsen [27] improved the merging strategy by establishing a statistical feature of the initially quantized image and enhancing the merging model with a nonparametric mixture model estimation. Kuan et al. [44] presented a novel region merging strategy for segmenting salient regions in color images. The technique generates an initial set of regions by extracting dominant colors in the input image using a nonparametric density estimation methodology; subsequently, a merging protocol based on an importance index and merging likelihood criteria is used to refine the initial set. A good segmentation algorithm should preserve certain global properties according to perceptual cues, so Peng et al. [29] proposed a graph-based merging order by defining a region merging predicate with homogeneity, continuity, and similarity of regions. Among the numerous texture feature detection methods, the local binary pattern (LBP) is used to describe the spatial structure of local texture. The LBP approach was proposed by Ojala et al. [17] and provides highly discriminative texture information. It is invariant to monotonic changes in gray level and its computation is simple. For texture description, histograms of LBP patterns are used for segmentation [18].
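As a concrete illustration of the LBP coding mentioned above, the following is a minimal numpy sketch of the basic 8-neighbor operator from Ojala et al. [17]. The function names, the ≥-comparison convention, and the border handling are our own illustrative choices, not taken from this paper.

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbor local binary pattern: each neighbor whose value
    is >= the center contributes its power-of-two weight. `gray` is a
    2-D array; a 1-pixel border is skipped."""
    g = gray.astype(np.float64)
    c = g[1:-1, 1:-1]
    codes = np.zeros_like(c, dtype=np.uint8)
    # Offsets enumerate the eight neighbors clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for k, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= (neigh >= c).astype(np.uint8) << k
    return codes

def lbp_histogram(gray, bins=256):
    """Normalized histogram of LBP codes, usable as a texture descriptor."""
    codes = lbp_image(gray)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / max(codes.size, 1)
```

On a flat patch every neighbor ties with the center, so all eight bits are set and the code is 255; discriminative structure appears only where intensities vary.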

III. FORMING SUPERPIXELS BY GROWING PIXELS

Of all the image segmentation strategies, region growing is a stable method for obtaining homogeneous and continuous regions. In order to find reliable segments, it is necessary to introduce both color and texture features when the growing process deals with complex images. To enhance the description ability of local regions, both the structure tensor and the color channels are combined as pixel features, which are integrated into the region growing strategy for superpixels extraction. The basic idea underlying TV flow is to blur some image information while simultaneously computing the minimum TV difference, so as to achieve denoising and edge preservation. We therefore adopt nonlinear diffusion to extract texture areas and enhance the edges simultaneously. The structure tensor calculation method provided by Bigun et al. [20] can be extended to color images with three channels. The structure tensor of a color image I with c (= 3) channels is defined as

T_S = \begin{bmatrix} T_S^{1,1} & T_S^{1,2} \\ T_S^{2,1} & T_S^{2,2} \end{bmatrix} = g * \nabla I_c \nabla I_c^{T} = \begin{bmatrix} \left(G_c^{x}\right)^{2} & G_c^{x} G_c^{y} \\ G_c^{x} G_c^{y} & \left(G_c^{y}\right)^{2} \end{bmatrix} \qquad (1)

where ∇ denotes the gradient vector, g is the Gaussian kernel function used to smooth the structure tensor, and G_c^{x(y)} denotes the partial derivative of channel c in direction x or y. T_S is easily corrupted by noise, so it usually has to be filtered for texture-region detection relying



Fig. 2. Boundary affinity illustration of SLIC [22], Turbo [24], and LSC [25] conducted on original images and combined features F(R, G, B, U′).

on nonlinear diffusion filtering. As a consequence, the second moment matrix T_S is replaced by its square root via eigenvalue decomposition [16]:

\tilde{T}_S = \sqrt{T_S} = \frac{T_S}{|\nabla I|} = \begin{bmatrix} T_S^{1,1}/|\nabla I| & T_S^{1,2}/|\nabla I| \\ T_S^{2,1}/|\nabla I| & T_S^{2,2}/|\nabla I| \end{bmatrix} = \begin{bmatrix} \tilde{T}_S^{1,1} & \tilde{T}_S^{1,2} \\ \tilde{T}_S^{2,1} & \tilde{T}_S^{2,2} \end{bmatrix} \qquad (2)
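The per-channel structure tensor of (1) and its gradient-normalized form akin to (2) can be sketched as follows. This is an illustrative numpy approximation: a separable Gaussian stands in for the kernel g, finite differences for the derivatives, and a small eps-guarded division by the gradient magnitude replaces the eigenvalue-decomposition-based square root; it is not the authors' implementation.

```python
import numpy as np

def _gauss_kernel(sigma):
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def _smooth(img, sigma):
    """Separable Gaussian smoothing, standing in for the kernel g in (1)."""
    k = _gauss_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def structure_tensor(channel, sigma=1.0):
    """Smoothed structure tensor of one channel, per (1).
    Returns (T11, T12, T22); T21 equals T12 by symmetry."""
    gy, gx = np.gradient(channel.astype(np.float64))
    return (_smooth(gx * gx, sigma),
            _smooth(gx * gy, sigma),
            _smooth(gy * gy, sigma))

def normalized_tensor(channel, sigma=1.0, eps=1e-8):
    """Tensor components divided by the gradient magnitude, per (2)."""
    gy, gx = np.gradient(channel.astype(np.float64))
    mag = np.sqrt(gx ** 2 + gy ** 2) + eps
    t11, t12, t22 = structure_tensor(channel, sigma)
    return t11 / mag, t12 / mag, t22 / mag
```

On a horizontal intensity ramp, only the T^{1,1} component is nonzero, matching the intuition that the tensor encodes the local gradient orientation and strength.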

Thus, the texture feature in this section is denoted by U = (u_c^1, u_c^2, u_c^3, u_c^4, u_c^5), which forms a 15-D matrix for a color image (c = 3). Here c represents an arbitrary channel of the image in RGB color space, u_c^1 denotes the original channel c of the image, ⟨u_c^2, u_c^3, u_c^4⟩ are the corresponding structure tensor components ⟨\tilde{T}_S^{1,1}, \tilde{T}_S^{1,2}, \tilde{T}_S^{2,2}⟩ of channel c, and u_c^5 is the scale estimation of channel c. The scale of a region is considered to be inversely proportional to the changes in pixel density; the detailed scale measurement is described in [21]. In order to ensure consistency of all characteristics, the values in U are all mapped to the interval [0, 255]:

U_c^{\prime k} = \frac{255 \left( U_c^{k} - \min\left(U^{k}\right) \right)}{\max\left(U^{k}\right) - \min\left(U^{k}\right)}. \qquad (3)

An image of size m × n is divided into average lattices with S = \sqrt{m \times n / K}, where K is the predefined initial number of superpixels. The seed pixels of the superpixels are located at the minimum-gradient points within the average lattices on the image; the specific seed-sampling method can be found in [22]. When computing superpixels, pure texture features blur the true edges, so the boundary information of the superpixels is weakened. We therefore integrate the RGB channels for locating accurate boundaries while growing superpixels. The color-texture descriptor of a pixel is denoted by F(R, G, B, U′). The superpixels growing process is implemented in the adjacent lattice (r = 1.5S) around the seed pixels to obtain compact and uniform superpixels with complete textures. The normalized distance between pixels and regions in the combined feature

Algorithm 1 Growing Algorithm for Superpixels
1: Initialize the selected seed pixels as areas Ω_K;
2: Assign a label p_i to each area Ω_i;
3: Record all labeled pixels in a list T;
4: Select a seed pixel x_i (x_i in Ω_i) from T and check its eight-neighbor pixels Neigh(8). If the distance d(i, j) between Ω_i and Neigh(j) is smaller than the distance d(i, R) between Ω_i and the average F of the neighbor regions (r = 1.5S) around the seed of Ω_i, set the label of Neigh(j) to p_i; then add the neighbor to region Ω_i and to T, update the F of region Ω_i, and delete x_i from T. Repeat step 4 until no pixels can be labeled.
5: Connect all small regions (< 20 pixels) to similar, adjacent regions.

space is calculated as follows:

d_{i,j} = \frac{\left\| RGB_i - RGB_j \right\| + \left\| U'_i - U'_j \right\|}{2 \times 255}. \qquad (4)

where ‖·‖ denotes the Euclidean distance in the combined feature space of color and texture. Let s_1, s_2, ..., s_i denote the initial seeds of the local areas and Ω_i denote the growing area corresponding to s_i. The RGB and U features of a growing region Ω_i are the averages over the pixels in Ω_i. The growing algorithm is outlined in Algorithm 1.

To preserve the real boundaries and homogeneous regions, the explicit number of growing iterations is set in proportion to the number of pixels per superpixel, S/2.
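The growing procedure of Algorithm 1 with the distance of (4) can be sketched roughly as follows. This is a simplified, hypothetical version: it grows a single region using a fixed threshold against the running region mean, whereas Algorithm 1 compares against the average feature of the neighboring regions, so the acceptance test here is only a stand-in.

```python
import numpy as np
from collections import deque

def combined_distance(f_region, f_pixel, n_color=3):
    """Normalized distance in the combined feature F(R, G, B, U'), per (4)."""
    f_region = np.asarray(f_region, dtype=np.float64)
    f_pixel = np.asarray(f_pixel, dtype=np.float64)
    d_rgb = np.linalg.norm(f_region[:n_color] - f_pixel[:n_color])
    d_tex = np.linalg.norm(f_region[n_color:] - f_pixel[n_color:])
    return (d_rgb + d_tex) / (2 * 255.0)

def grow_region(features, seed, threshold):
    """Greedy 8-neighbor growing from one seed over an (H, W, D) feature map.
    A pixel joins when its distance to the running region mean stays below
    `threshold` (a stand-in for the neighbor-region test of Algorithm 1)."""
    h, w, _ = features.shape
    label = np.zeros((h, w), dtype=bool)
    label[seed] = True
    mean, count = features[seed].astype(np.float64), 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not label[ny, nx]:
                    if combined_distance(mean, features[ny, nx]) < threshold:
                        label[ny, nx] = True
                        # Incrementally update the region's mean feature.
                        mean = (mean * count + features[ny, nx]) / (count + 1)
                        count += 1
                        queue.append((ny, nx))
    return label
```

Running one such grower per seed, then attaching leftover small regions to their most similar neighbor, mirrors the overall shape of Algorithm 1.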

We show our advantages in Figs. 2 and 3 by comparison with the superpixels produced by SLIC [22], Turbo [24], and LSC [25]. In Fig. 2, we display the boundary affinity comparison of our method against SLIC, Turbo, and LSC on two images with complex texture. One can note that our method retains good localization of boundaries on the test images. For the sake of fairness, we also test SLIC, Turbo, and LSC with the combined features F(R, G, B, U′) and show the results in Fig. 2. In the cheetah image, there are a few improvements in the real boundary detection with the combined features F(R, G, B, U′). In the zebra



Fig. 3. Boundary precision comparison with SLIC [22], Turbo [24], and LSC [25] on BSD300.

image, the boundary affinity with the ground truth is improved in certain areas (red rectangles), but the method does not perform very well in the yellow rectangles. Fig. 3 shows the average boundary precision on BSD300 compared with SLIC, Turbo, and LSC; it is obvious that our method obtains better boundary affinity with the manual boundaries and produces uniform superpixels with complete textures.

From the experimental results in Fig. 3, we can see that the average boundary precision of segmentations with 900 superpixels is higher than 95% compared with the ground truth, which meets the requirement of the segmentation task. Moreover, more superpixels would increase the computational complexity of merging and feature calculation, and would hinder computing a complete description of the texture within a single superpixel. Therefore, we set the superpixels number K = 900 to ensure boundary accuracy and texture integrity in the growing procedure.

IV. GRAPH-BASED MUMFORD–SHAH OPTIMAL REGION MERGING

In recent works, many region merging algorithms based on statistical properties [27], [28], graph properties [29], and interactive models [30] have been developed for image segmentation. A basic merging criterion that relies on color homogeneity alone always suffers from small and meaningless regions. Some other merging algorithms use global optimization energy terms, the region number, and the region area to control the merging process rather than setting a termination condition.

To find an optimal sequence of merges that produces a union of optimal labelings for all regions, the minimization of a certain objective function F is required. The M–S function is a variational model proposed for segmentation, and it detects the optimal result by using an objective function. M–S segmentation uses the energy functional to judge whether pixels can be merged into homogeneous regions and to obtain qualitative region boundaries.

In addition to the merging strategy, feature selection is the key issue for reliable segmentation. A combined feature including color-texture information is proposed to describe the region properties, which provides a reliable similarity measurement for region discrimination, and the standard deviation is introduced for further control of the merging process.

Therefore, we design a hybrid-constraints merging method based on the M–S model that can optimize multiple image features for reasonable merging. The M–S model is described in Section IV-A, the feature definition and distance metric are described in Section IV-B, and the detailed description of the merging algorithm is given in Section IV-C.

A. Mumford–Shah Model

In the M–S model [32] in (5), I_0 represents the image to be segmented and I represents a differentiable segmented image. In [32], R is the planar image domain, the R_i are the disjoint connected areas, and Γ is the union of the portions of the boundaries of the R_i inside R. The positive constant coefficients μ and ν are the scale controllers of the M–S model. The M–S functional is then defined as follows:

E(I, \Gamma) = \mu^2 \iint_{R} (I - I_0)^2 \, dx\, dy + \iint_{R - \Gamma} \|\nabla I\|^2 \, dx\, dy + \nu |\Gamma|. \qquad (5)

M–S theory minimizes the energy E(I, Γ) to achieve a set of I as the best segmentation result. The three terms on the right side of (5) mean the following: the first term maintains the approximation between I and I_0 as far as possible; the second term controls the smoothness of I within each segmented region; and the third term ν|Γ| controls the boundary length to be parsimonious. Through these three controls, the M–S energy equation tries to minimize the boundary length and obtain continuous, smooth segments. By restricting I to be a locally constant function in each region of the segmentation, the M–S model is simplified to

E(I, \Gamma) = \sum_{i} \iint_{R_i} (I - c_i)^2 \, dx\, dy + \nu |\Gamma| \qquad (6)

where Γ represents the union of all object boundaries (with |Γ| its total length), I represents the image to be segmented, c_i represents the average intensity of the ith region R_i, and ν is a constant parameter.
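The simplified functional (6) can be evaluated discretely for any labeling. The sketch below uses the count of 4-neighbor label changes as a stand-in for the boundary length |Γ|; it is our own illustration, not the paper's implementation.

```python
import numpy as np

def ms_energy(image, labels, nu=1.0):
    """Piecewise-constant M–S energy of a labeling, per (6).
    Data term: squared deviation from each region's mean intensity.
    Boundary term: nu times the number of label changes between
    4-neighbors, a discrete proxy for the boundary length."""
    img = image.astype(np.float64)
    data = 0.0
    for lab in np.unique(labels):
        vals = img[labels == lab]
        data += np.sum((vals - vals.mean()) ** 2)
    boundary = (np.count_nonzero(labels[:, 1:] != labels[:, :-1])
                + np.count_nonzero(labels[1:, :] != labels[:-1, :]))
    return data + nu * boundary
```

A labeling whose regions are internally constant incurs zero data cost, so the energy reduces to ν times the discrete boundary length, exactly the trade-off (6) encodes.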

The key issue of the M–S segmentation model is to design an efficient energy-minimization function that combines the 2-D region features and the 1-D (boundary) constraints so that they act together. We study two typical M–S segmentation models to design our merging strategy. One model seeks the largest energy decrease in a heuristic greedy manner and merges two regions whenever that largest decrease occurs; the energy model is simplified to an inequality relation between the boundary and regional terms. The other model assumes two neighboring regions containing N_i and N_j pixels with average intensities c_i and c_j, joined along a common border of a given length. This model defines the energy before and after merging from the region features and the common boundary; if the energy becomes smaller after merging, the two regions should be merged.
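For the second model above, the energy change under a merge has a closed form in the piecewise-constant case: replacing c_i and c_j by their pixel-count-weighted mean raises the data term of (6) by N_i N_j (c_i − c_j)² / (N_i + N_j), while removing the common border of length ℓ lowers the boundary term by νℓ. A sketch of the resulting test follows (scalar intensities; the function name is our own):

```python
def should_merge(ni, nj, ci, cj, common_len, nu):
    """Energy-difference merge test for the piecewise-constant M–S model.
    Merging replaces the means ci, cj by their weighted average, which
    raises the data term by ni*nj/(ni+nj) * (ci-cj)^2, while the boundary
    term drops by nu * common_len. Merge when the net change is <= 0."""
    delta_data = (ni * nj) / (ni + nj) * (ci - cj) ** 2
    return delta_data <= nu * common_len
```

Larger ν therefore biases the process toward coarser segmentations, since a bigger boundary credit offsets larger inter-region contrasts.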

B. Hybrid Region Histogram Feature and Distance Metric

In this section, we design a statistical histogram feature as an auxiliary descriptor of superpixels carrying color and texture information, for two reasons: 1) the histogram is a rotation-invariant feature of superpixels and 2) complex regions cannot be properly characterized by color information alone. The histogram feature we define is able to describe the local color


Fig. 4. LCP coding of a single pixel.

distribution and enhance the discriminative power of superpixels.

Because the number of pixels in each superpixel differs and the number of colors that can be discerned visually is limited, a quantized image is sufficient to describe the color distribution of a local area for segmentation. Quantization based on uniformly cutting the color space is too scattered and generates errors and spurious texture. Therefore, the image is quantized to 64 colors by the minimum-variance method, which ensures the least color distortion and is able to reduce the computing time [31].
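As an illustration of this step, variance-minimizing quantization can be approximated with a small k-means color quantizer. This is a stand-in sketch, not the method of [31]: the function name `quantize_colors` and the use of k-means (which minimizes within-cluster variance, a close analogue of minimum-variance quantization) are our assumptions:

```python
import numpy as np

def quantize_colors(img, n_colors=64, iters=10, seed=0):
    """Variance-minimizing color quantization via k-means (an illustrative
    stand-in for the minimum-variance method); returns a per-pixel label map
    and the color palette."""
    pixels = img.reshape(-1, img.shape[-1]).astype(float)
    rng = np.random.default_rng(seed)
    # Seed the centers with distinct pixel samples.
    centers = pixels[rng.choice(len(pixels), n_colors, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest center, then recompute the means.
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in range(n_colors):
            if np.any(assign == k):
                centers[k] = pixels[assign == k].mean(0)
    return assign.reshape(img.shape[:2]), centers
```

Indexing the palette with the label map (`palette[labels]`) reconstructs the quantized image used for the histogram statistics below.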

In the quantized color image, Con denotes the connection relation between a neighbor pixel j and the center pixel Cp; the eight-neighbor pixels Neigh(j), j = 1∼8, are coded from 2^1 to 2^8. If the color difference Cd between Neigh(j) and Cp is greater than the standard deviation of the image (denoted Cdth; 48 in Fig. 4), then Con(j) = 0; otherwise Con(j) = 1. Inspired by LBP coding theory, the LCP value of a pixel is defined accordingly, with the eight neighbors coded in the order shown in Fig. 4. After texture coding, the LCP values of all pixels with the same color in a superpixel are summed to serve as the texture weight of the corresponding color bin; the probabilities of the quantized colors are then multiplied by the texture weights to obtain the composite histogram feature of the superpixel. Hence, the distribution characteristic of a superpixel is defined by

$$\mathrm{His}(\text{superpixel}) = \sum_{C=1}^{64} \mathrm{LCP}(C) \, P(C \mid \text{superpixel}). \tag{7}$$

Here, C denotes the mean value of the corresponding pixels in the color map, and P represents the statistical value of the color bins. The similarity between two histograms is measured as

$$\mathrm{dist}(X, Y) = \sqrt{(X - Y)^{T} A (X - Y)} \tag{8}$$

where X and Y are the 64-bin histogram features of adjacent regions, and A is the bin-similarity matrix. In this paper, A is the 64 × 64 identity matrix because the bins are identical between X and Y, so the metric reduces to the Euclidean distance between the histograms.
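The LCP coding, the composite histogram of (7), and the distance of (8) can be sketched as follows. This is our illustrative reading of the text: the neighbor ordering, the sense of the threshold test, and the identity bin-similarity matrix are assumptions, and the function names are hypothetical:

```python
import numpy as np

def lcp_codes(qimg, cd_th):
    """LCP code per pixel of a quantized color-index image: bit j is set when
    the color difference to the j-th 8-neighbor is within the threshold cd_th
    (neighbors weighted 2^1 .. 2^8, borders left as zero)."""
    h, w = qimg.shape
    codes = np.zeros((h, w), dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for j, (dy, dx) in enumerate(offsets):
                if abs(int(qimg[y + dy, x + dx]) - int(qimg[y, x])) <= cd_th:
                    code |= 1 << (j + 1)
            codes[y, x] = code
    return codes

def hybrid_histogram(qimg, codes, mask, n_colors=64):
    """Composite color-texture histogram of one superpixel, per (7): the
    summed LCP texture weight of each color bin times its color probability."""
    his = np.zeros(n_colors)
    vals, lcp = qimg[mask], codes[mask]
    for c in range(n_colors):
        sel = vals == c
        if sel.any():
            his[c] = lcp[sel].sum() * (sel.sum() / len(vals))
    return his

def hist_distance(x, y, A=None):
    """Quadratic-form distance between two histograms, per (8); with A the
    identity matrix this is the Euclidean distance."""
    d = np.asarray(x, float) - np.asarray(y, float)
    if A is None:
        A = np.eye(len(d))
    return float(np.sqrt(d @ A @ d))
```

For a uniform region every interior pixel takes the full code 2^1 + … + 2^8 = 510, so textureless areas still receive a nonzero, color-dominated signature.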

C. Hybrid Feature Fusion Mumford–Shah Merging Model

In this section, we propose an accelerated optimized merging method based on the M–S model. A simple greedy search strategy easily deviates from the correct segmentation, so we employ both M–S minimization and a standard-deviation-based threshold to constrain the merging process and obtain a more rational merging sequence.

1) Standard Deviation-Based Threshold: The hybrid histogram is a combined feature of a pixel group that has the advantage of discriminating different regions. To define the merging threshold we use the standard deviation, a validated measure of dispersion in statistics and an important index of the differences between samples [47]. Thus, the standard deviation of the features of all K superpixels in the image is computed to set the termination condition; it is defined as

$$V^2 = \frac{1}{K} \sum_{i=1}^{K} \left( \mathrm{Hist}(i) - \mathrm{Hist}(\text{image}) \right)^2. \tag{9}$$

In tests we found that the standard deviation changes too fast to control the merging speed, which may lead to over-segmentation. Therefore, we introduce an exponential function to modify the standard deviation V as follows, where N is the current iteration count:

$$th = \frac{V}{\exp(N)}. \tag{10}$$
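A minimal sketch of the threshold computation in (9) and (10); measuring each superpixel's histogram deviation as the Euclidean norm of the vector difference is our assumption, as is the helper name:

```python
import numpy as np

def merge_threshold(hists, n_iter):
    """Standard-deviation threshold of (9)-(10): V is the root-mean-square
    deviation of the superpixel histogram features around the global mean
    feature, decayed by exp(N) for iteration N."""
    hists = np.asarray(hists, float)
    global_mean = hists.mean(axis=0)
    v = np.sqrt(np.mean(np.sum((hists - global_mean) ** 2, axis=1)))
    return v / np.exp(n_iter)
```

The threshold shrinks by a factor of e per iteration, so early iterations allow generous merges while later iterations only accept near-identical neighbors.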

2) M–S Minimization: Inspired by the region-competition merging model in [46], we develop the M–S model by introducing the novel histogram feature into the M–S equation. The hybrid feature is integrated as a coefficient of the region feature in the energy computation. Thus, for a given pair of adjacent segments R_i and R_j, the boundary Γ in (6) can be replaced by their common boundary ∂(R_i, R_j). The energies before and after merging are defined as

$$E_{\text{pre}} = \xi_i \iint_{R_i} (I - c_i)^2 \, dx \, dy + \xi_j \iint_{R_j} (I - c_j)^2 \, dx \, dy + \lambda \cdot \ell(\partial(R_i, R_j)) \tag{11}$$

$$E_{\text{after}} = \xi_i \iint_{R_i} (I - c_{i+j})^2 \, dx \, dy + \xi_j \iint_{R_j} (I - c_{i+j})^2 \, dx \, dy \tag{12}$$

where ξ_i is the ratio of the hybrid histogram feature of region i relative to the standard deviation of all superpixel histogram features, computed as ξ_i = |his_i − his(image)|/V, with V defined in (9). Here c_i denotes the mean color value of region i in the color image; I is the original image restricted to the region and μ is its mean value there; λ is the weight of the common boundary in the M–S model; ℓ is the 1-D Hausdorff measure of boundary length [32]; and ∂(R_i, R_j) denotes the common boundary of the two regions. Therefore the energy difference is

$$E_{\text{after}} - E_{\text{pre}} = \xi_i |R_i| (c_i - c_{i+j})^2 + \xi_j |R_j| (c_j - c_{i+j})^2 - \lambda \cdot \ell(\partial(R_i, R_j)) \tag{13}$$

where |R| denotes the surface measure (area) of region R.


Fig. 5. Details of the merging process with intermediate results for two test images.

3) M–S Merging Strategy: The region adjacency graph (RAG) is a convenient model for representing adjacency relations between superpixels, so we employ the RAG of all superpixels to describe the optimized merging process. All superpixels are mapped to an undirected graph G = (V, E), where V is the set of nodes corresponding to superpixels and E is the set of edges connecting pairs of adjacent nodes whose superpixels share common boundaries. The weight of each edge is computed from the distance between the hybrid histogram features of the corresponding superpixels.

In the merging algorithm, the adjacency graph is established according to the boundary coincidence of all superpixels. Each pair of adjacent nodes (superpixels) (v_i, v_j) is connected by an edge E(v_i, v_j) with similarity weight dist(v_i, v_j). We establish the adjacency matrix Sim(G) and a dynamic queue; the similarity matrix Sim(G) is constructed from the adjacency relationships.

The merging manipulation is executed as follows: insert all nodes into a queue Q ordered by the similarity of their link edges in ascending order. In each iteration, check each superpixel S in the queue and all its adjacent regions N(i), i = 0, 1, 2, ..., k. If two regions meet the merging conditions (Eafter(S, N(i)) > Epre(S, N(i)) and dist(RS, RN(i)) < th), merge the corresponding regions into one region, delete the edge e(vS, vN(i)) from the graph, and remove the two nodes from Q. Insert the new region S' at the tail and update the adjacency graph and similarity matrix Sim(G) for S'. The loop executes until no regions meet the merging criteria.

Algorithm 2 M–S Merging Algorithm for Superpixels
Input: the initial RAG of the labeled superpixels, G(L)
Output: the merged RAG and relabeled superpixels
1: Construct the region adjacency matrix Sim(G) and a dynamic queue Q for G(L)
2: For each node S in G(L), compute the similarity between adjacent regions to initialize Sim(G), and insert all nodes into Q ordered by dist in descending order
3: For each node S in Q, let N(i) be the neighbor-region set of S
4: if Eafter(S, N(i)) > Epre(S, N(i)) and dist(RS, RN(i)) < th then
5:     merge N(i) into S; L = L − 1
6:     change the region label of N(i) to S
7:     update the combined histogram feature of the new region
8:     update Q, the RAG, and Sim(G)
9: end if
10: Return to step 2 until no regions satisfy Eafter(S, N(i)) − Epre(S, N(i)) > 0; otherwise end the merging process
11: Renumber the labels of the merged segments
12: Return the relabeled segments

When merging two regions, we conduct three postprocessing steps: 1) change the larger region label to the smaller one and sum the pixel counts of the two regions; 2) recompute the region histogram feature of the merged region; and 3) update the adjacency relations in the adjacency table. The merging process is very efficient because no pixel-level processing is executed. Fig. 5 displays the merging details of two images with intermediate results.
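The merging loop of Algorithm 2 can be sketched in simplified form; the feature update by averaging and the `can_merge` callback are illustrative stand-ins for the M–S and threshold tests described above:

```python
import numpy as np

def merge_superpixels(feats, adj, can_merge):
    """Simplified sketch of the merging loop: scan adjacent region pairs in
    ascending feature-distance order, merge every pair accepted by the
    caller's can_merge(f_a, f_b) test, update the features and the adjacency
    graph, and repeat until no pair merges."""
    feats = {r: np.asarray(f, float) for r, f in feats.items()}
    adj = {r: set(n) for r, n in adj.items()}
    changed = True
    while changed:
        changed = False
        edges = sorted({tuple(sorted((a, b))) for a in adj for b in adj[a]},
                       key=lambda e: float(np.linalg.norm(feats[e[0]] - feats[e[1]])))
        for a, b in edges:
            if a not in adj or b not in adj or b not in adj[a]:
                continue  # stale edge: an endpoint was merged away earlier
            if can_merge(feats[a], feats[b]):
                feats[a] = (feats[a] + feats[b]) / 2.0  # illustrative update
                neighbors_b = adj.pop(b)
                del feats[b]
                adj[a] = (adj[a] | neighbors_b) - {a, b}
                for r in adj:  # redirect edges that pointed at b toward a
                    if b in adj[r]:
                        adj[r].discard(b)
                        if r != a:
                            adj[r].add(a)
                changed = True
    return feats, adj
```

As in the paper's procedure, only region-level features and the adjacency structure are touched during merging; no per-pixel work is repeated, which is what keeps the loop fast.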

In this merging process, the adjacency properties, region features, and geometric boundaries are all incorporated into the merging likelihood. By using the hybrid-feature-fusion M–S merging, it is easy to find a region-merging sequence that yields higher merging efficiency and more accurate segmentation results. The specific merging algorithm is described in Algorithm 2. Fig. 6 shows some segmentation results from the test images, including the superpixels, merged results, and region-iteration curves.

V. EXPERIMENTAL RESULTS

A. Evaluation of BSD Images

In this section, we test the segmentation performance of the proposed method on BSD300, the most popular dataset for segmentation evaluation. It includes 300 images of the same size, 321 × 481 pixels, covering various types of real-world texture.

We first evaluate the proposed method qualitatively on several representative images, comparing it with several well-known and recent algorithms: mean-shift [33], J-measure-based segmentation (JSEG) [34], CTM [26], B-graph [35], GL-graph [36], and hybrid ordering and quartile analysis (HOQ) [37]. The comparison is based on three quantitative


Fig. 6. Some segmentation results with superpixels, merged results, and iteration curves.

performance measures: 1) probabilistic Rand index (PRI) [38]; 2) variation of information (VoI) [39]; and 3) global consistency error (GCE) [40]. Many researchers have used these indexes to compare the performance of various image segmentation methods; consequently, we believe these measures can demonstrate the segmentation effectiveness of the compared algorithms.

We select six images from BSD300 to display the visual segmentation effect. First, we show the segmentation results and compare them with the other algorithms in Fig. 7. The images in the first row are the originals, and those in the second row are the mean-shift results with bandwidth parameters hs = 8, hr = 12, and min-area = 500 pixels; this parameter setting aims to achieve more complete semantic regions. The segmentation results of JSEG are shown in the third row. JSEG needs three predefined parameters: 1) quantized colors = 10; 2) number of scales = 5; and 3) merging parameter = 0.78, as provided in [34]. The fourth row shows the CTM results (λ = 0.1), and the fifth and sixth rows show the B-graph and GL-graph results, with parameters as provided by the authors in their released code. HOQ uses its default merger threshold THl = 0.65, and its results are displayed in the seventh row. The images in the last row are our segmentation results.

TABLE I
PRI, VoI, AND GCE VALUES OF SIX TEST IMAGES

Fig. 7. Some segmentation results and comparison. Each row from top to bottom presents the original images and segmentation results produced by mean-shift, JSEG, CTM, B-graph, GL-graph, HOQ, and our method.

The experimental data for the six compared images are listed in Table I. The table shows that our method outperforms the other algorithms on the PRI, VoI, and GCE measures. From the segments in Fig. 7, we observe that mean-shift performs well in smooth color regions, but its performance relies excessively on the color and spatial scale parameters hs and hr, and it tends to produce redundant, over-segmented regions. JSEG performs well at identifying texture areas given specific, proper scale parameters. However, the experimental data indicate


Fig. 8. Some segmentation results from BSD300.

TABLE II
AVERAGE PERFORMANCE INDEX OF OUR METHOD AGAINST OTHER METHODS ON BSD300

that JSEG easily generates over- or under-segmented regions with improper parameters, and it has difficulty segmenting low-color-contrast, small, and narrow regions. CTM often produces over- or under-segmented regions enclosed by faulty boundaries, and it is rather time-consuming. The B-graph method is a graph-partitioning method based on multilayer superpixels, which facilitates pixel grouping over multiple superpixel results. GL-graph encodes both the local and global homogeneity of an object more adaptively than B-graph. B-graph and GL-graph clearly outperform the other methods in homogeneous-region detection, but they do not work well in complex areas. The results of HOQ are more consistent with subjective human perception than the above methods. From Fig. 7, one can note that the proposed growing-merging framework segments the image into compact semantic regions and decreases over-segmentation. In addition, by using the histogram-based color-texture distance metric and M–S optimization, the proposed method obtains results with more complete regions and more accurate boundaries. Fig. 8 shows some of the segmentations from BSD300.

The average PRI, VoI, and GCE values of the compared methods on BSD300 are given in Table II. Our method shows strong performance on all evaluation indexes: it ranks second in PRI and GCE and third in VoI, and its PRI is very close to the best performance.

In addition to regional measures, boundary accuracy is also an important index of segmentation performance. Therefore, we employ a boundary-accuracy computing strategy to evaluate the proposed method [43]. Precision measures the proportion of boundary pixels in the automatic segmentation Ssource that correspond to a boundary pixel in the ground truth Starget. Recall measures the proportion of boundary pixels in the ground truth Starget that

Fig. 9. Boundary precision-recall curve of images in BSD300.

TABLE III
AVERAGE RUNNING TIME OF OUR METHOD AGAINST OTHER METHODS ON BSD300

were detected in the automatic segmentation Ssource. Here Ssource is the set of boundaries extracted from the segmentation results by the algorithms, and Starget is the set of human-segmentation boundaries provided by BSD300. A boundary pixel is identified as a true positive when the smallest distance between the extracted boundary and the ground truth is less than a threshold (ε = 3 pixels in this paper). Precision and recall are defined as follows:

$$\mathrm{Precision}(S_{\text{source}}, S_{\text{target}}) = \frac{\mathrm{Matched}(S_{\text{source}}, S_{\text{target}})}{|S_{\text{source}}|} \tag{14}$$

$$\mathrm{Recall}(S_{\text{source}}, S_{\text{target}}) = \frac{\mathrm{Matched}(S_{\text{target}}, S_{\text{source}})}{|S_{\text{target}}|}. \tag{15}$$
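A minimal sketch of the boundary matching behind (14) and (15), using a brute-force nearest-neighbor distance between boundary pixel sets (the function name and the Euclidean point distance are our assumptions):

```python
import numpy as np

def boundary_pr(source_pts, target_pts, eps=3.0):
    """Boundary precision and recall per (14)-(15): a boundary pixel counts
    as matched when its nearest counterpart lies within eps pixels."""
    src = np.asarray(source_pts, float)
    tgt = np.asarray(target_pts, float)
    # Pairwise distances between every source and target boundary pixel.
    d = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    precision = np.mean(d.min(axis=1) <= eps)  # matched source / |S_source|
    recall = np.mean(d.min(axis=0) <= eps)     # matched target / |S_target|
    return precision, recall
```

For large boundary sets a k-d tree would replace the quadratic distance matrix, but the brute-force form makes the two ratios of (14) and (15) explicit.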

We show the boundary precision-recall curves of the six tested methods on BSD300 in Fig. 9. Our algorithm clearly performs better than the compared methods on most of the test images. Compared with the above strategies, the boundaries obtained by our method are closer to the manual segmentations; moreover, it produces relatively fewer excess segmentation contours and detects complete regions precisely.

All test images in BSD300 have a resolution of 321 × 481. Our algorithm takes an average of 16.8 s to segment an image, including all calculations and iterations. The average running time and code type of the other algorithms are shown in Table III (measured on a laptop running Windows 7 with a 2.70-GHz CPU and 2.0 GB of RAM).

B. Evaluation on Remote Sensing Images

Remote sensing images contain complex land-cover information. To better support remote sensing image understanding and interpretation, the segmentation of


Fig. 10. Comparison of some segmentation results. From left to right: original images and segmentation results produced by mean-shift, JSEG, CTM, B-graph, and our method.

Fig. 11. Comparison of segmentation on a large image: original image (left), mean-shift (middle), and our method (right).

land cover is the key issue for exploiting semantic information. Here, we choose several remote sensing images to demonstrate the segmentation performance of our method. All test images are 512 × 512-pixel slices of Iowa in Brazil covering different land covers, acquired on July 5, 2015 by the Advanced Land Imager on Earth Observing-1. We compare the segmentation results against mean-shift, JSEG, CTM, and Bi-graph-cut, with the same parameters as in Section V-A. The visual comparison is presented in Fig. 10; our method obviously obtains more accurate boundaries of homogeneous regions than the compared algorithms.

In addition, we carried out an experiment to test the performance of the proposed algorithm on large images. A larger-scope remote sensing image of Iowa in Brazil, 2332 × 2332 pixels, was chosen for the segmentation experiment. Because of the large size, the superpixel number is set to 3600 to obtain accurate boundaries. The segmentation result is shown in Fig. 11 and compared with mean-shift. Because of overflow and errors during computation, JSEG, CTM, and Bi-graph-cut could not produce results. This indicates that the proposed method has stronger adaptability in dealing with large images.

C. Evaluation on Medical Images

Medical images are notoriously difficult to segment accurately with common methods. Level set and active contour (AC) models are widely used in medical image

Fig. 12. Comparison with level set and AC models on skin-lesion and cell images. From left to right: original images, superpixels, and segmentation results produced by our method, level set, and CV.

segmentation. In this section, we demonstrate the effectiveness and robustness of the proposed strategy on some medical images. We compare our method with an improved level set method [41] and an AC model [42] on skin-lesion and cell images. The level set parameters are set as follows: μ = 0.2, Δt = 1, λ = 2, α = −0.3; and α = 0.1 in AC. The segmentation results from superpixels and the comparison are shown in Fig. 12. From the results, our method detects more accurate boundaries of lesions and cells than the other two methods. Another advantage of the proposed method is that it can segment multiple objects and arbitrary regions regardless of the subordinative relationships between them.

D. Algorithm Complexity

The complexity of our algorithm has two parts: 1) superpixel computation and 2) superpixel merging. Let N be the total number of image pixels. In the superpixel growing process, the texture features, gradients, and the distance function over all pixels are computed, so each update takes O(N) operations. In the superpixel merging process, each iteration requires computing the similarity table of the adjacency matrix and the histogram features; each update takes O(N/K) × O(log N) operations (K is the superpixel number), and the algorithm takes O(K) iterations to converge. The overall complexity of our algorithm is roughly O(N log N).

VI. CONCLUSION

In this paper, we present a novel bottom-up method for color image segmentation based on superpixels. The method groups adjacent pixels into superpixels by growing seed pixels, which allows us to divide a complex image into small homogeneous regions. This process exploits color-texture and scale features derived from structure tensor diffusion to generate superpixels with accurate boundaries. Another contribution is a composite color-texture histogram descriptor for regions, combined with the M–S model to supervise the merging process, which is


robust for discriminating regions with different texture structures. Compared with other texture-based methods, our segmentation method is less affected by color mutation. The results of the experiments conducted on the test databases attest to the excellent performance of the proposed method under both visual and quantitative evaluation.

REFERENCES

[1] L. Khelifi and M. Mignotte, "A novel fusion approach based on the global consistency criterion to fusing multiple segmentations," IEEE Trans. Syst., Man, Cybern., Syst., to be published.

[2] H. D. Cheng, X. H. Jiang, and J. Wang, "Color image segmentation based on homogram thresholding and region merging," Pattern Recognit., vol. 35, no. 2, pp. 373–393, Feb. 2002.

[3] S. Makrogiannis, G. Economou, S. Fotopoulos, and N. G. Bourbakis, "Segmentation of color images using multiscale clustering and graph theoretic region synthesis," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 35, no. 2, pp. 224–238, Mar. 2005.

[4] M. Mignotte, "Segmentation by fusion of histogram-based K-means clusters in different color spaces," IEEE Trans. Image Process., vol. 17, no. 5, pp. 780–787, May 2008.

[5] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic active contours," Int. J. Comput. Vis., vol. 22, no. 1, pp. 61–79, Feb./Mar. 1997.

[6] L. Chen, C. L. P. Chen, and M. Lu, "A multiple-kernel fuzzy C-means algorithm for image segmentation," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 5, pp. 1263–1274, Oct. 2011.

[7] C. Li, C. Xu, C. Gui, and M. D. Fox, "Distance regularized level set evolution and its application to image segmentation," IEEE Trans. Image Process., vol. 19, no. 12, pp. 3243–3254, Dec. 2010.

[8] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.

[9] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient graph-based image segmentation," Int. J. Comput. Vis., vol. 59, no. 2, pp. 167–181, Sep. 2004.

[10] L. Bertelli, B. Sumengen, B. S. Manjunath, and F. Gibou, "A variational framework for multiregion pairwise-similarity-based image segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 8, pp. 1400–1414, Aug. 2008.

[11] P. Krajcevski and D. Manocha, SegTC: Fast Texture Compression Using Image Segmentation, I. Wald and J. Ragan-Kelley, Eds. Lyon, France: Eurograph. Assoc., 2014, pp. 71–77.

[12] J. Shuai, L. Qing, J. Miao, Z. Ma, and X. Chen, "Salient region detection via texture-suppressed background contrast," in Proc. IEEE Conf. ICIP, Melbourne, VIC, Australia, 2013, pp. 2470–2474.

[13] H. Sima and P. Guo, "Texture superpixels merging by color-texture histograms for color image segmentation," KSII Trans. Internet Inf. Syst., vol. 8, no. 7, pp. 2400–2419, Jul. 2014.

[14] A. Y. Yang, J. Wright, Y. Ma, and S. S. Sastry, "Unsupervised segmentation of natural images via lossy data compression," Comput. Vis. Image Understand., vol. 110, no. 2, pp. 212–225, 2008.

[15] Y. Xie, H. Lu, and M.-H. Yang, "Bayesian saliency via low and mid level cues," IEEE Trans. Image Process., vol. 22, no. 5, pp. 1689–1698, May 2013.

[16] C. J. Taylor, "Towards fast and accurate segmentation," in Proc. IEEE Conf. CVPR, Portland, OR, USA, 2013, pp. 1916–1922.

[17] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002.

[18] M. Pietikäinen, T. Mäenpää, and J. Viertola, "Colour texture classification with colour histograms and local binary patterns," in Proc. Workshop Texture Anal. Mach. Vis., Copenhagen, Denmark, 2002, pp. 109–112.

[19] T. Brox and J. Weickert, "A TV flow based local scale measure for texture discrimination," in Proc. Int. Conf. Comput. Vis., vol. 2. Prague, Czech Republic, 2004, pp. 578–590.

[20] J. Bigun, G. H. Granlund, and J. Wiklund, "Multidimensional orientation estimation with applications to texture analysis and optical flow," IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 8, pp. 775–790, Aug. 1991.

[21] M. Rousson, T. Brox, and R. Deriche, "Active unsupervised texture segmentation on a diffusion based feature space," in Proc. IEEE CVPR, vol. 2. Madison, WI, USA, 2003, pp. 699–704.

[22] R. Achanta et al., "SLIC superpixels compared to state-of-the-art superpixel methods," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2274–2282, Nov. 2012.

[23] A. P. Moore, S. J. D. Prince, and J. Warrell, "'Lattice cut'—Constructing superpixels using layer constraints," in Proc. IEEE Conf. CVPR, San Francisco, CA, USA, 2010, pp. 2117–2124.

[24] A. Levinshtein et al., "TurboPixels: Fast superpixels using geometric flows," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 12, pp. 2290–2297, Dec. 2009.

[25] Z. Li and J. Chen, "Superpixel segmentation using linear spectral clustering," in Proc. IEEE Conf. CVPR, Boston, MA, USA, 2015, pp. 1356–1363.

[26] J. Yang, Z. Gan, K. Li, and C. Hou, "Graph-based segmentation for RGB-D data using 3-D geometry enhanced superpixels," IEEE Trans. Cybern., vol. 45, no. 5, pp. 927–940, May 2015.

[27] R. Nock and F. Nielsen, "Statistical region merging," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1452–1458, Nov. 2004.

[28] F. Calderero and F. Marques, "Region merging techniques using information theory statistical measures," IEEE Trans. Image Process., vol. 19, no. 6, pp. 1567–1586, Jun. 2010.

[29] B. Peng, L. Zhang, and J. Yang, "Iterated graph cuts for image segmentation," in Proc. ACCV, Xi'an, China, 2009, pp. 677–686.

[30] H. Liu, Q. Guo, M. Xu, and I. Shen, "Fast image segmentation using region merging with a k-nearest neighbor graph," in Proc. IEEE Int. Conf. Cybern. Intell. Syst., London, U.K., 2008, pp. 179–184.

[31] J.-P. Braquelaire and L. Brun, "Comparison and optimization of methods of color image quantization," IEEE Trans. Image Process., vol. 6, no. 7, pp. 1048–1052, Jul. 1997.

[32] A. C. Sobieranski, E. Comunello, and A. von Wangenheim, "Learning a nonlinear distance metric for supervised region-merging image segmentation," Comput. Vis. Image Understand., vol. 115, no. 2, pp. 127–139, Feb. 2011.

[33] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, May 2002.

[34] Y. Deng and B. S. Manjunath, "Unsupervised segmentation of color-texture regions in images and video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 8, pp. 800–810, Aug. 2001.

[35] Z. Li, X.-M. Wu, and S.-F. Chang, "Segmentation using superpixels: A bipartite graph partitioning approach," in Proc. IEEE Conf. CVPR, Providence, RI, USA, 2012, pp. 789–796.

[36] X. Wang, Y. Tang, S. Masnou, and L. Chen, "A global/local affinity graph for image segmentation," IEEE Trans. Image Process., vol. 24, no. 4, pp. 1399–1411, Apr. 2015.

[37] H.-C. Shih and E.-R. Liu, "New quartile-based region merging algorithm for unsupervised image segmentation using color-alone feature," Inf. Sci., vol. 342, pp. 24–36, May 2016.

[38] R. Unnikrishnan, C. Pantofaru, and M. Hebert, "Toward objective evaluation of image segmentation algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 929–944, Jun. 2007.

[39] M. Meila, "Comparing clusterings: An axiomatic view," in Proc. ACM ICML, Bonn, Germany, Aug. 2005, pp. 577–584.

[40] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. ICCV, Vancouver, BC, Canada, Jul. 2001, pp. 416–423.

[41] K. Zhang, L. Zhang, H. Song, and W. Zhou, "Active contours with selective local or global segmentation: A new formulation and level set method," Image Vis. Comput., vol. 28, no. 4, pp. 668–676, 2010.

[42] S. Lankton and A. Tannenbaum, "Localizing region-based active contours," IEEE Trans. Image Process., vol. 17, no. 11, pp. 2029–2039, Nov. 2008.

[43] F. J. Estrada and A. D. Jepson, "Quantitative evaluation of a novel image segmentation algorithm," in Proc. CVPR, San Diego, CA, USA, 2005, pp. 1132–1139.

[44] Y.-H. Kuan, C.-M. Kuo, and N.-C. Yang, "Color-based image salient region segmentation using novel region merging strategy," IEEE Trans. Multimedia, vol. 10, no. 5, pp. 832–845, Aug. 2008.

[45] S. R. Vantaram and E. Saber, "Survey of contemporary trends in color image segmentation," J. Electron. Imag., vol. 21, no. 4, pp. 040901-1–040901-28, 2012.

[46] Y. Pan, J. D. Birdwell, and S. M. Djouadi, "Bottom-up hierarchical image segmentation using region competition and the Mumford–Shah functional," in Proc. ICPR, vol. 2. Hong Kong, Aug. 2006, pp. 117–121.

[47] M. T. Ibrahim, T. M. Khan, M. A. Khan, and L. Guan, "Automatic segmentation of pupil using local histogram and standard deviation," in Proc. VCIP, Huangshan, China, Jul. 2010, Art. no. 77442S.


Haifeng Sima received the B.E. and M.E. degrees in computer science from Zhengzhou University, Zhengzhou, China, in 2004 and 2007, respectively, and the Ph.D. degree in software and theory from the Beijing Institute of Technology, Beijing, China, in 2015.

Since 2007, he has been with the faculty of Henan Polytechnic University, Jiaozuo, China, where he is currently a Lecturer with the School of Computer Science and Technology. His research interests include pattern recognition, image processing, image

segmentation, and image classification.

Ping Guo (SM'05) received the B.S. and M.S. degrees from Peking University, Beijing, China, in 1980 and 1983, respectively, and the Ph.D. degree from the Chinese University of Hong Kong, Hong Kong, in 2002.

He is currently with the School of System Science, Beijing Normal University, Beijing, where he is the Founding Director of the Image Processing and Pattern Recognition Laboratory. He is also an Adjunct Professor with the School of Computer Science, Beijing Institute of Technology, Beijing.

His research interests include computational intelligence, image processing, pattern recognition, and astronomy big data analysis systems and applications.

Prof. Guo was a recipient of the Science and Technology (Third Rank) Award of the 2012 Beijing People's Government for his contributions to studies of regularization methods and their applications.

Youfeng Zou received the B.E. degree from the Jiaozuo Institute of Technology, Jiaozuo, China, in 1984, and the M.S. and Ph.D. degrees from the China University of Mining and Technology, Beijing, China, in 1988 and 1994, respectively.

He is currently a Professor with Henan Polytechnic University, Jiaozuo. His research interests include mine survey and remote sensing classification.

Zhiheng Wang received the B.Sc. degree in mechatronic engineering from the Beijing Institute of Technology, Beijing, China, in 2004, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2009.

He is currently an Associate Professor with the School of Computer Science and Technique, Henan Polytechnic University, Jiaozuo, China. His research interests include computer vision, pattern recognition, and image processing.

Mingliang Xu received the B.E. and M.E. degrees from Zhengzhou University, Zhengzhou, China, in 2005 and 2008, respectively, and the Ph.D. degree from the State Key Laboratory of CAD & CG, Zhejiang University, Hangzhou, China, in 2012, all in computer science.

He is currently an Associate Professor with the School of Information Engineering, Zhengzhou University, and also the Director of the Center for Interdisciplinary Information Science Research. His research interests include computer graphics and computer vision.