Robust Image Segmentation Using Learned Priors

Ayman El-Baz
BioImaging Lab., University of Louisville
Louisville, KY, USA 40292

Georgy Gimel'farb
Computer Science Dept., University of Auckland
Private Bag 92019, Auckland 1142, New Zealand

Abstract

A novel parametric deformable model of a goal object controlled by shape and appearance priors learned from co-aligned training images is introduced. The shape prior is built in a linear space of vectors of distances to the training boundaries from their common centroid. The appearance prior is modeled with a spatially homogeneous second-order Markov-Gibbs random field (MGRF) of gray levels within each training boundary. The geometric structure of the MGRF and the Gibbs potentials are analytically estimated from the training data. To accurately separate goal objects from arbitrary background, the deformable model is evolved by solving an Eikonal partial differential equation with a speed function combining the shape and appearance priors and the current appearance model. The latter represents empirical gray level marginals inside and outside an evolving boundary with adaptive linear combinations of discrete Gaussians (LCDG). The analytical shape and appearance priors and a simple Expectation-Maximization procedure for getting the object and background LCDGs make our segmentation considerably faster than most of the known counterparts. Experiments with various images confirm the robustness, accuracy, and speed of our approach.

1. Introduction

Parametric and geometric deformable models are widely used for image segmentation. However, in many applications, especially in medical image analysis, accurate segmentation with these models is a challenging problem due to noisy or low-contrast 2D/3D images with fuzzy boundaries between goal objects (e.g. anatomical structures) and their background; similarly shaped objects with different visual appearances; and discontinuous boundaries caused by occlusions or by the similar visual appearance of adjacent parts of objects of different shapes [1, 2]. Prior knowledge about the goal shape and/or visual appearance helps in solving such segmentation problems [2].

Relationship to previous works: Conventional parametric deformable models [3] and geometric models

(e.g. [4]) based on level set techniques [5] search for strong signal discontinuities (grayscale or color edges) in an image and do not account for prior shape constraints. But the evolution guided only by edges and general continuity–curvature limitations fails when a goal object is not clearly distinguishable from background. More accurate results, but at the expense of a considerably reduced speed, were obtained by restricting grayscale or color patterns within an evolving surface [6]. Nonetheless, the accuracy remains poor without prior knowledge of goal objects. At present, 2D/3D parametric and geometric deformable models with shape and/or appearance priors learned from a training set of manually segmented images are of the main interest.

Initial attempts to involve the prior shape knowledge were built upon the edges. Pentland and Sclaroff [7] and Cootes et al. [8] described an evolving curve with shape and pose parameters of a parametric set of points matched to strong image gradients and used a linear combination of eigenvectors to represent variations from an average shape. A parametric point model of Staib and Duncan [9] was based on an elliptic Fourier decomposition of landmarks. Model parameters ensure the best match between the evolving curve and points of strong gradients. Chakraborty et al. [10] extended this approach to a hybrid model combining the region gradient and homogeneity information.

More efficient results were obtained by learning the priors from a training set of manually segmented images of goal objects [11, 12, 13, 14, 1, 15]. Pizer et al. [11] and Styner et al. [13] segment 3D medical images by coarse-to-fine deformation of a shape-based medial representation (“m-rep”). A deformable model of Huang et al. [16] integrates region, shape and interior signal features assuming an approximate region shape is a priori known and aligned with the image to initialize the model. Leventon et al. [12] and Shen et al. [14] augment a level set-based energy function guiding the evolution with special terms attracting to more likely shapes specified with the principal component analysis (PCA) of the training set of goal objects, while Chen et al. [17] use a geometric model with the prior “average shape”. The most advanced level set-based geometric model of Tsai et al. [1] evolves as zero level of a 2D map

2009 IEEE 12th International Conference on Computer Vision (ICCV), p. 857. 978-1-4244-4419-9/09/$25.00 ©2009 IEEE


of the signed shortest distances between each pixel and the boundary. The goal shapes are approximated with a linear combination of the training distance maps for a set of mutually aligned training images. The high dimensionality of the distance map space hinders the PCA, and to simplify the model, only a few top-rank principal components are included in the linear combination. Unfortunately, this space is not closed with respect to linear operations, and the zero level contour of a linear combination need not relate to a valid distance map. Therefore, the prior shape may differ substantially from the training ones.

Prior appearance models have not been explicitly used to guide the evolution of any of the above models. Typically, the guidance was based on simple assumptions such as significantly different gray level means or variances in an object and background (see e.g. [1]). More promising is to learn the appearance model as well, although the segmentation in this case usually involves only pixel- or voxel-wise classification [18]. An alternative approach of Joshi [19] performs nonparametric warping of a goal surface to a deformable atlas. The atlas contours are then transferred to the goal volume. But in the absence of a shape prior, the resulting boundaries need not approach the actual ones. The speed of iteratively warping a whole image volume is also quite low.

To overcome these problems, Paragios and Deriche [20] and Cootes et al. [21] learned joint probabilistic shape–appearance prior models of goal objects with the PCA. A few such models have been successfully applied to segment complex 3D medical images [22]. But segmentation was still slow due to setting up pixel- or voxel-wise model–image correspondences at each step. To speed the process up, Yang and Duncan [23] and Freedman et al. [24] introduced probabilistic appearance priors with only low-order signal statistics. Yang and Duncan [23] used a joint Gaussian probability model to describe variability of the goal shapes and image gray levels and developed optimization algorithms for estimating model parameters from a set of training images and conducting the maximum a posteriori (MAP) segmentation. Freedman et al. [24] derived a special level-set function such that its zero level approximately produces the goal shapes and used empirical marginal gray level distributions for the training objects as the appearance prior to guide the alignment of a deformable model to an image to be segmented. A level set-based segmentation of textured objects in [25] uses a structure tensor and nonlinear diffusion to extract features that describe each object to be segmented. In [26], a nonlinear structure tensor based on nonlinear matrix-valued diffusion was proposed to solve the problem of edge dislocation in a feature space due to using a smoothed Gaussian structure tensor.

Our approach follows the same ideas of using the shape and appearance priors but differs in three aspects: (i) we use a simple parametric deformable boundary instead of the level set framework, which runs into problems with linear combinations of the distance maps; (ii) visual appearance of goal objects is roughly described by multiple characteristic statistics of gray level co-occurrences; and (iii) in addition to the priors, the boundary evolution is guided at each step with a simple first-order probability model of the current appearance of a goal object and its background.

Grayscale object patterns are considered as samples of a spatially homogeneous Markov-Gibbs random field (MGRF) with multiple pairwise interactions.

Both the evolving shape and each goal shape are represented by piecewise-linear boundaries with a predefined number of control points. Corresponding points are positioned on roughly equiangular rays from the common center, namely the centroid of the control points along each boundary. A robust wave propagation is used to find correspondences in an aligned pair of the boundaries. A preliminary variant of our approach was tested in [28]. Below it is refined and validated on larger and more diverse sets of natural images.

2. Shape Prior

Let $g = (g_{x,y} : (x,y) \in \mathbf{R};\ g_{x,y} \in \mathbf{Q})$ and $m = (m_{x,y} : (x,y) \in \mathbf{R};\ m_{x,y} \in \mathbf{L})$ be a digital image and its region map, respectively, on a finite arithmetic lattice $\mathbf{R} = \{(x,y) : x = 0,\dots,X-1;\ y = 0,\dots,Y-1\}$. Here, $\mathbf{Q} = \{0,\dots,Q-1\}$ is a finite set of gray values and $\mathbf{L} = \{\mathrm{ob},\mathrm{bg}\}$ is a binary set of labels. Each label $m_{x,y}$ indicates whether the pixel $(x,y)$ in the corresponding image $g$ belongs to a goal object ($\mathrm{ob}$) or background ($\mathrm{bg}$). Let $b = (p_1,\dots,p_K)$ be a deformable piecewise-linear boundary with the $K$ control points $p_k = (x_k,y_k)$ forming a circularly connected chain of line segments $(p_1,p_2)$, . . . , $(p_{K-1},p_K)$, $(p_K,p_1)$. Let $d = (d_1,\dots,d_K)$ be a vector description of the boundary $b$ in terms of the squared distances $d_k = (x_k - x_0)^2 + (y_k - y_0)^2$ from the control points to the model centroid $p_0 = (x_0,y_0) = \big(\frac{1}{K}\sum_{k=1}^{K} x_k,\ \frac{1}{K}\sum_{k=1}^{K} y_k\big)$, i.e. to the point at the minimum mean square distance from all the control points. Let $\mathcal{S} = \{(g_t, m_t, b_t) : t = 1,\dots,T\}$ denote a training set of grayscale images of the goal objects with manually prepared region maps and boundary models.
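The squared-distance description of a boundary follows directly from its control points; a minimal numpy sketch (the function name `distance_descriptor` is ours):

```python
import numpy as np

def distance_descriptor(points):
    """Squared distances from each control point to the boundary centroid.

    points: (K, 2) array of control points (x_k, y_k).
    Returns the centroid p0 and the K-vector d with d_k = |p_k - p0|^2.
    """
    points = np.asarray(points, dtype=float)
    p0 = points.mean(axis=0)              # centroid: minimum mean-square point
    d = ((points - p0) ** 2).sum(axis=1)  # squared distance of each point to p0
    return p0, d

# A square boundary centered at the origin: all control points are
# equidistant from the centroid, so d is constant.
square = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]])
p0, d = distance_descriptor(square)
print(p0, d)   # centroid (0, 0), d = [2, 2, 2, 2]
```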

To build the shape prior, all the training objects in $\mathcal{S}$ are mutually aligned to have the same centroid and unified poses (orientations and scales of the object boundaries) as in Fig. 1(a). For definiteness, each training boundary $b_t \in \mathcal{S}$ is represented with $K$ control points in the polar system of $K_\circ$ equiangular rays (i.e. with the angular pitch $360^\circ/K_\circ$) emitted from the common centroid $p_0$. The rays are enumerated clockwise, with zero angle for the first position $p_{t,1}$ of each boundary. Generally, the number $K$ of the control points for a particular boundary may differ from $K_\circ$ because of rays with no or more than one intersection.


Figure 1. (a) Mutually aligned training boundaries and (b) searching for corresponding points between the aligned boundaries.

Because the training boundaries $b_t \in \mathcal{S}$; $t = 1,\dots,T$, share the same centroid $p_0$, any linear combination $d = \sum_{t=1}^{T} w_t d_t$ of the training distance vectors defines a unique new boundary $b$ with the same centroid. Typically, shapes of the training objects are similar, and thus their shape prior can be built by the PCA. Let $D = [d_1\ d_2\ \cdots\ d_T]$ denote the $K \times T$ matrix with the training distance vectors as columns and $U = D^{\mathsf T} D$ be the symmetric $T \times T$ Gram matrix of sums of squares and pair products $u_{t,\tau} = \sum_{k=1}^{K} d_{k,t} d_{k,\tau}$; $t,\tau = 1,\dots,T$, of their components. The PCA of the matrix $U$ produces $T$ eigenvectors $e_t$; $t = 1,\dots,T$, sorted by their eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_T \ge 0$. Due to similar training shapes, most of the bottom-rank eigenvalues are close to zero, and the corresponding eigenvectors can be discarded. Only a few top-rank eigenvectors actually represent the training shapes, the top one, $e_1$, giving an “average” shape and a few others determining its variability. For simplicity, we select the top-rank eigenvectors $e_t$; $t = 1,\dots,T'$; $T' \le T$, by thresholding: $\sum_{t=1}^{T'} \lambda_t \ge \theta \sum_{t=1}^{T} \lambda_t$ with an empirically chosen threshold $\theta$.

An arbitrary boundary $b^\ast$ aligned with the training set is described with the vector $d^\ast$ of the squared distances from its control points to the centroid. The prior shape that approximates this boundary is specified by the linear combination of the selected training eigenvectors: $d^\circ = \sum_{t=1}^{T'} w_t e_t$ with $w_t = e_t^{\mathsf T} d^\ast$. Each signed difference $\Delta_k = d^\circ_k - d^\ast_k$ determines the direction and force to move the boundary $b^\ast$ towards the closest shape prior $b^\circ$ specified by the distance vector $d^\circ$.

Search for corresponding points is performed to suppress local “noise” (spurious deviations) in the training boundaries. The corresponding points are found in the aligned training boundaries by a robust wave-propagation based search (see Fig. 1(b)). An orthogonal wave is emitted from a point in one boundary, and the point at which the maximum curvature position of the wave front hits the second boundary is considered as the corresponding point.
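The PCA construction can be sketched over the Gram matrix of the training distance vectors; the sketch below lifts the Gram eigenvectors back to the K-dimensional shape space, a detail the text glosses over, and the retained-energy threshold is illustrative (all names are ours):

```python
import numpy as np

def shape_prior_basis(D, energy=0.95):
    """Orthonormal top-rank shape basis from the training distance vectors.

    D: (K, T) matrix whose columns are the training distance vectors d_t.
    Eigen-decomposes the T x T Gram matrix, keeps enough components to
    retain `energy` of the total eigenvalue mass, and lifts them to K-space.
    """
    U = D.T @ D                            # T x T Gram matrix
    lam, E = np.linalg.eigh(U)             # ascending eigenvalues
    lam, E = lam[::-1], E[:, ::-1]         # re-sort descending
    keep = np.searchsorted(np.cumsum(lam) / lam.sum(), energy) + 1
    basis = D @ E[:, :keep]                # lift Gram eigenvectors to K-space
    return basis / np.linalg.norm(basis, axis=0)

def project_to_prior(basis, d_star):
    """Closest shape prior d° = sum_t w_t e_t with w_t = e_t^T d*."""
    return basis @ (basis.T @ d_star)

# Ten noisy copies of a common 100-point shape descriptor.
rng = np.random.default_rng(0)
base = np.full(100, 50.0)
D = np.stack([base + rng.normal(0, 1, 100) for _ in range(10)], axis=1)
basis = shape_prior_basis(D)
d_prior = project_to_prior(basis, D[:, 0])
print(np.abs(d_prior - D[:, 0]).mean())  # small residual: prior smooths noise
```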

Any one from a given set of aligned training boundaries is selected as the reference shape, $b_r$, and represented with $K$ equiangular points. For every point of the reference shape $b_r$, the corresponding point in each other training boundary, $b_t$, is found as follows: (i) the intersection area between the shapes $b_t$ and $b_r$ is found (note that this step is performed after the global alignment ensuring the overlaps between the aligned objects); (ii) the normalized minimum Euclidean distance $D(x,y)$ from every point $(x,y)$ in the intersection area to the boundary $b_r$ is computed by solving the Eikonal equation

$$|\nabla T(x,y)|\, F(x,y) = 1 \qquad (1)$$

where $T(x,y)$ is the time at which the front crosses the point $(x,y)$; the solution uses the fast marching level sets at the unit speed function, $F(x,y) = 1$ [27]; (iii) for every control point of the reference shape $b_r$, an orthogonal wave is generated by solving the Eikonal equation with the fast marching level sets at the speed function $F(x,y) = e^{-D(x,y)}$; (iv) the point with the maximum curvature is traced at each propagating wave front, and (v) the point at which the maximum curvature point of the propagating wave hits the boundary $b_t$ is selected as the control point corresponding to the starting point of the reference shape $b_r$. The use of the resulting control points is equivalent to small-scale local deformations of each boundary that increase similarity between the training shapes.

Figure 2. Corresponding points found by SIFT in each pair (a,b) and (a,c), respectively, shown in red and yellow.

SIFT-based alignment: Just as the conventional level-set based geometric models with the shape priors, e.g. in [1], our approach depends essentially on the accuracy of mutual alignment of shapes at both the training and segmentation stages. In the latter case the deformable model is initialized by aligning an image $g$ to be segmented with a training image, say, $g_1 \in \mathcal{S}$, arbitrarily chosen as a prototype.

First we use the scale invariant feature transform (SIFT) proposed by Lowe [29] to reliably determine a number of point-wise correspondences between two images under their relative affine geometric and local contrast/offset signal distortions (comparisons to other approaches in [30] confirm its higher robustness under the affine distortions). Then the affine transform aligning $g$ most closely to $g_1$ is determined by the gradient descent minimization of the mean squared positional error between the corresponding points.
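As a simplification of this step, the aligning affine transform can also be recovered from the point correspondences in closed form by linear least squares instead of gradient descent; a numpy sketch (function name ours):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src points onto dst points.

    src, dst: (N, 2) arrays of corresponding points (N >= 3).
    Returns the 2x3 matrix A such that dst ~ A @ [x, y, 1]^T.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    X = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coordinates
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)    # (3, 2) solution
    return A.T                                     # 2 x 3 affine matrix

# Recover a known rotation + scale + shift from noiseless correspondences.
theta, s, t = 0.3, 1.5, np.array([2.0, -1.0])
R = s * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
src = np.random.default_rng(1).uniform(-5, 5, (20, 2))
dst = src @ R.T + t
A = fit_affine(src, dst)
mapped = np.hstack([src, np.ones((20, 1))]) @ A.T
print(np.abs(mapped - dst).max())  # ~0: exact recovery without noise
```

In practice the SIFT matches contain outliers, so such a fit would be wrapped in a robust loop (e.g. RANSAC) before being used for alignment.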


Figure 3. Mutually aligned images shown in Fig. 2 and overlaps of the training region maps after (A) the alignment.

Figure 2 shows correspondences found by SIFT in digital natural images of starfish, zebras, and magnetic resonance images (MRI) of human kidneys. The resulting affinely aligned goal shapes have roughly the same center and similar poses (orientations and scales). The quality of such alignment is evaluated in Fig. 3(A) by averaging all the training region maps $m_t$; $t = 1,\dots,T$, after the training set $\mathcal{S}$ is mutually aligned.

3. Appearance Models

Our appearance prior is a rough descriptor of typically complex goal grayscale patterns in terms of multiple second-order signal statistics. Each goal image is considered as a sample of a Markov–Gibbs random field (MGRF) with an arbitrary structure of translation invariant pairwise interactions. Let $\mathbf{N} = \{(\xi_1,\eta_1),\dots,(\xi_n,\eta_n)\}$ be a finite set of $(x,y)$-offsets specifying neighbors $(x+\xi, y+\eta)$ and $(x-\xi, y-\eta)$ interacting with each pixel $(x,y) \in \mathbf{R}$. Let $\mathbf{C}_{\xi,\eta}$ be a family of pairwise cliques $c_{\xi,\eta}(x,y) = \big((x,y),(x+\xi,y+\eta)\big)$ with the offset $(\xi,\eta) \in \mathbf{N}$ in the interaction graph on $\mathbf{R}$. Let $\mathbf{V}$ be a vector of Gibbs potentials for gray level co-occurrences in the cliques: $\mathbf{V} = [\mathbf{V}_{\xi,\eta} : (\xi,\eta) \in \mathbf{N}]$ where $\mathbf{V}_{\xi,\eta} = [V_{\xi,\eta}(q,q') : (q,q') \in \mathbf{Q}^2]$.

A generic second-order MGRF on $\mathbf{R}$ is specified by the Gibbs probability distribution (GPD)

$$P(g) = \frac{1}{Z} \exp\!\big( |\mathbf{R}|\, \mathbf{V}^{\mathsf T} \mathbf{F}(g) \big)$$

where $Z$ is the partition function, $|\mathbf{R}|$ is the cardinality of $\mathbf{R}$, and $\mathbf{F}(g)$ is the vector of scaled empirical probability distributions of gray level co-occurrences over each clique family: $\mathbf{F}(g) = [\rho_{\xi,\eta} \mathbf{F}_{\xi,\eta}(g) : (\xi,\eta) \in \mathbf{N}]$. Here, $\mathbf{F}_{\xi,\eta}(g) = [f_{\xi,\eta}(q,q'|g) : (q,q') \in \mathbf{Q}^2]$; $\rho_{\xi,\eta} = |\mathbf{C}_{\xi,\eta}|/|\mathbf{R}|$ is the relative family size, and the empirical probabilities are $f_{\xi,\eta}(q,q'|g) = |\mathbf{C}_{\xi,\eta;q,q'}(g)|/|\mathbf{C}_{\xi,\eta}|$ where $\mathbf{C}_{\xi,\eta;q,q'}(g)$ is the subfamily of the pairs $c_{\xi,\eta}(x,y) \in \mathbf{C}_{\xi,\eta}$ supporting a co-occurrence $(g_{x,y}, g_{x+\xi,y+\eta}) = (q,q')$ in the image $g$.

To form the appearance prior, let $\mathbf{R}_t = \{(x,y) : (x,y) \in \mathbf{R};\ m_{t;x,y} = \mathrm{ob}\}$ be the part of $\mathbf{R}$ supporting the goal object in the training image–map pair $(g_t, m_t) \in \mathcal{S}$. Let $\mathbf{C}_{t;\xi,\eta}$ and $\mathbf{F}_{t;\xi,\eta}(g_t)$ denote the subfamily of the pixel pairs in $\mathbf{R}_t$ with the coordinate offset $(\xi,\eta) \in \mathbf{N}$ and the empirical probability distribution of gray level co-occurrences in the training image $g_t$ over the subfamily $\mathbf{C}_{t;\xi,\eta}$, respectively.

The model of the $t$-th training object has the GPD on the sublattice $\mathbf{R}_t$:

$$P_t(g_t) = \frac{1}{Z_t} \exp\!\Big( |\mathbf{R}_t| \sum_{(\xi,\eta)\in\mathbf{N}} \rho_{t;\xi,\eta}\, \mathbf{V}_{\xi,\eta}^{\mathsf T} \mathbf{F}_{t;\xi,\eta}(g_t) \Big)$$

where $\rho_{t;\xi,\eta} = |\mathbf{C}_{t;\xi,\eta}|/|\mathbf{R}_t|$ is the relative size of $\mathbf{C}_{t;\xi,\eta}$ with respect to $\mathbf{R}_t$.

The areas and shapes of the aligned goal objects are similar for all $t = 1,\dots,T$, so that $|\mathbf{R}_t| \approx |\mathbf{R}_\circ|$ and $|\mathbf{C}_{t;\xi,\eta}| \approx |\mathbf{C}_{\circ;\xi,\eta}|$ where $|\mathbf{R}_\circ|$ and $|\mathbf{C}_{\circ;\xi,\eta}|$ are the average cardinalities over the training set $\mathcal{S}$. Assuming the independent samples, all the training objects are represented with the joint GPD:

$$P_{\mathcal{S}} = \frac{1}{Z} \exp\!\Big( T |\mathbf{R}_\circ| \sum_{(\xi,\eta)\in\mathbf{N}} \rho_{\circ;\xi,\eta}\, \mathbf{V}_{\xi,\eta}^{\mathsf T} \mathbf{F}_{\circ;\xi,\eta} \Big)$$

where $\rho_{\circ;\xi,\eta} = |\mathbf{C}_{\circ;\xi,\eta}|/|\mathbf{R}_\circ|$, and the empirical probability distributions $\mathbf{F}_{\circ;\xi,\eta}$ describe the gray level co-occurrences in all the training goal objects.

Figure 4. Relative interaction energies for the clique families as functions of the offsets $(\xi,\eta)$ and the characteristic pixel neighbors (white areas in the $(\xi,\eta)$-plane) for the training sets of 43 starfish (a), 67 zebra (b), and 1300 kidney (c) images.

We use the analytical maximum likelihood estimator for the second-order Gibbs potentials introduced in [31]:

$$V_{\xi,\eta}(q,q') = \lambda \big( f_{\xi,\eta}(q,q') - f(q) f(q') \big) \qquad (2)$$

where $f(q)$ is an empirical marginal distribution of pixel intensities, $f_{\xi,\eta}(q,q')$ is an empirical joint distribution of intensity co-occurrences, and $\lambda$ is a common scaling factor that is also computed analytically. In our case the latter can be omitted, i.e. set to $\lambda = 1$, because only relative energies $E_{\xi,\eta}$ of pairwise interaction in the goal objects are of interest:

$$E_{\xi,\eta} = \sum_{(q,q')\in\mathbf{Q}^2} f_{\xi,\eta}(q,q')\, V_{\xi,\eta}(q,q')$$

Most characteristic neighbors $\mathbf{N}^\ast \subset \mathbf{N}$ are selected as the prior appearance descriptors in Eq. (2) by thresholding the relative energies [31, 32]. Figure 4 shows the relative energies for many candidate offsets $(\xi,\eta)$ and the selected neighbors.
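With λ = 1, the estimator of Eq. (2) and the relative energies reduce to simple histogram arithmetic; a numpy sketch for a single clique family (function name ours; non-negative offsets assumed for brevity):

```python
import numpy as np

def relative_energy(img, dx, dy, Q):
    """Interaction energy of one clique family under Eq. (2) with lambda = 1.

    Collects the empirical joint distribution f(q, q') of gray co-occurrences
    at offset (dx, dy) (assumed non-negative here), forms the analytical
    potential V = f(q, q') - f(q) f(q'), and returns E = sum f(q, q') V(q, q').
    """
    H, W = img.shape
    a = img[:H - dy, :W - dx]                      # pixel (x, y)
    b = img[dy:, dx:]                              # neighbor (x + dx, y + dy)
    joint = np.bincount((a * Q + b).ravel(), minlength=Q * Q).astype(float)
    joint = joint.reshape(Q, Q) / joint.sum()
    marg = np.bincount(img.ravel(), minlength=Q) / img.size   # f(q)
    V = joint - np.outer(marg, marg)               # Eq. (2), lambda = 1
    return (joint * V).sum()

# Vertically striped binary pattern: along the stripes (offset (0, 1)) the
# paired gray values always coincide, giving a high interaction energy;
# across the stripes (offset (1, 0)) the pairs are nearly independent.
img = np.tile(np.array([0, 0, 1, 1]), (8, 4))      # 8 x 16 image, Q = 2
print(relative_energy(img, 0, 1, 2), relative_energy(img, 1, 0, 2))
```

Ranking many candidate offsets by this energy and thresholding is exactly the neighbor-selection step illustrated in Fig. 4.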

Under this prior description, a grayscale pattern within each current deformable boundary $b$ in an image $g$ is described by its relative Gibbs energy

$$E(g,b) = \sum_{(\xi,\eta)\in\mathbf{N}^\ast} \mathbf{V}_{\xi,\eta}^{\mathsf T} \mathbf{F}_{b;\xi,\eta}(g)$$

where $\mathbf{N}^\ast$ is the selected subset of the top-rank neighbors, and the empirical distributions $\mathbf{F}_{b;\xi,\eta}(g)$ are collected within $b$.

Model of current appearance is added to the learned shape and appearance priors in order to more accurately account for an image to be segmented. Both a goal object and its background are roughly modeled by assuming they relate to distinct dominant modes of empirical marginal gray level distributions inside and outside an initial deformable boundary $b$. The distributions are approximated with linear combinations of discrete Gaussians (LCDG) using an Expectation-Maximization based approach similar to the LCG-approximation in [34]. Components of a DG $\Psi_\theta = (\psi(q|\theta) : q \in \mathbf{Q})$ integrate a continuous Gaussian with parameters $\theta = (\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are the mean and the variance, respectively, over successive gray level intervals¹ in $\mathbf{Q}$. Numbers $K_l$; $l \in \mathbf{L}$, of the dominant DGs in the models are determined by maximizing the Akaike Information Criterion (AIC) [33] for the corresponding empirical distributions. Then the LCDG-models with the positive dominant and sign-alternate subordinate DGs are built by the EM-based techniques. The subordinate components approximate deviations of the empirical distribution from the dominant mixture. Let $C_p$; $C_p \ge K_{\mathrm{ob}} + K_{\mathrm{bg}}$, and $C_n$ denote total numbers of the positive and negative components. Then the LCDG appearance model is as follows:

$$p(q) = \sum_{i=1}^{C_p} w_{p,i}\, \psi(q|\theta_{p,i}) - \sum_{j=1}^{C_n} w_{n,j}\, \psi(q|\theta_{n,j})$$

with the non-negative weights $w = [w_{p,i}, w_{n,j}]$ such that $\sum_{i=1}^{C_p} w_{p,i} - \sum_{j=1}^{C_n} w_{n,j} = 1$. As shown in [34], the numbers $C_p - K_{\mathrm{ob}} - K_{\mathrm{bg}}$ and $C_n$ of its subordinate components and the parameters $w$, $\Theta$ (weights, means, and variances) of all the DGs are estimated first by a sequential EM-based algorithm producing a close initial LCDG-approximation of the empirical distribution. Then, under the fixed numbers $C_p$ and $C_n$, all other parameters are refined with a modified EM algorithm that accounts for the sign-alternate components. Each final LCDG-model is partitioned into two LCDG-submodels $P_l = [p(q|l) : q \in \mathbf{Q}]$, one per class $l \in \mathbf{L}$, by associating the subordinate DGs with the corresponding dominant terms such that the misclassification rate is minimal.

¹That is, $\psi(0|\theta) = \Phi_\theta(0.5)$; $\psi(q|\theta) = \Phi_\theta(q+0.5) - \Phi_\theta(q-0.5)$ for $q = 1,\dots,Q-2$; and $\psi(Q-1|\theta) = 1 - \Phi_\theta(Q-1.5)$, where $\Phi_\theta(q)$ is the cumulative Gaussian probability function.
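A sketch of the building blocks, assuming the simplified case of a purely positive dominant mixture; the sign-alternate subordinate components and the sequential initialization of [34] are omitted, and all names are ours:

```python
import numpy as np

def discrete_gaussian(Q, mu, var):
    """DG component: a continuous Gaussian integrated over unit gray intervals."""
    from math import erf, sqrt
    cdf = lambda x: 0.5 * (1 + erf((x - mu) / sqrt(2 * var)))
    edges = np.array([cdf(q + 0.5) for q in range(Q - 1)])
    return np.diff(np.concatenate([[0.0], edges, [1.0]]))  # sums to 1 over Q

def em_dominant_mixture(hist, mus, vars_, iters=200):
    """EM fit of a positive mixture of DGs to an empirical gray histogram."""
    Q = len(hist)
    hist = hist / hist.sum()
    w = np.full(len(mus), 1.0 / len(mus))
    q = np.arange(Q)
    for _ in range(iters):
        comp = np.stack([w[i] * discrete_gaussian(Q, mus[i], vars_[i])
                         for i in range(len(mus))])        # (C, Q)
        resp = comp / np.maximum(comp.sum(0), 1e-12)       # E-step
        g = resp * hist                                    # weighted data
        w = g.sum(1)                                       # M-step: weights
        mus = (g * q).sum(1) / np.maximum(w, 1e-12)        # M-step: means
        vars_ = (g * (q - mus[:, None]) ** 2).sum(1) / np.maximum(w, 1e-12)
    return w, mus, vars_

# Bimodal histogram: object mode near gray 60, background mode near 180.
Q = 256
hist = (0.4 * discrete_gaussian(Q, 60, 15 ** 2)
        + 0.6 * discrete_gaussian(Q, 180, 20 ** 2))
w, mus, vars_ = em_dominant_mixture(hist, mus=np.array([50.0, 200.0]),
                                    vars_=np.array([400.0, 400.0]))
print(w, mus)  # recovers the two dominant modes
```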

4. Model Evolution

The evolution $b_\tau \to b_{\tau+1}$ of the deformable boundary $b$ in discrete time, $\tau = 0, 1, \dots$, is determined by solving the Eikonal equation $|\nabla T(p_k)|\, F(p_k) = 1$; $k = 1,\dots,K$, where $F(p)$ is a speed function for the control point $p = (x,y)$ of the current boundary. Our speed function depends on the shape prior, the LCDG-model of current appearance, and the MGRF-based appearance prior:

$$F(p) = \kappa\, e^{-|d(p)|}\; p(g_{x,y}|\mathrm{ob})\; \pi(g_{x,y}|g_{\mathbf{N}^\ast}) \qquad (3)$$

Here, $d(p)$ is the signed distance between the current control point $p \in b_\tau$ and the like one in the closest shape prior along the ray from the current boundary centroid. Note that the sign of $d(p)$ determines the direction of propagation as shown in Fig. 5. The constant factor $\kappa$ determines the evolution speed ($0 < \kappa < 1$ for a smooth propagation). The marginal probabilities $p(g_{x,y}|\mathrm{ob})$ and $p(g_{x,y}|\mathrm{bg})$ of the gray value $g_{x,y} \in \mathbf{Q}$ are estimated with the LCDG-submodels for the object and its background, respectively. The prior conditional probability $\pi(g_{x,y}|g_{\mathbf{N}^\ast})$ of the gray value $g_{x,y} \in \mathbf{Q}$ in the pixel $p = (x,y)$, given the current gray values in its neighborhood, is estimated in line with the MGRF prior appearance model:

$$\pi(g_{x,y}|g_{\mathbf{N}^\ast}) = \frac{\exp\big(E_{x,y}(g_{x,y})\big)}{\sum_{q\in\mathbf{Q}} \exp\big(E_{x,y}(q)\big)}$$

where $E_{x,y}(q)$ is the pixel-wise Gibbs energy for the gray value $q$ in the pixel $(x,y)$, given the fixed gray values in its characteristic neighborhood:

$$E_{x,y}(q) = \sum_{(\xi,\eta)\in\mathbf{N}^\ast} \big( V_{\xi,\eta}(g_{x-\xi,y-\eta}, q) + V_{\xi,\eta}(q, g_{x+\xi,y+\eta}) \big)$$
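The pixel-wise conditional π follows mechanically from the potential tables; a numpy sketch with illustrative placeholder offsets and potentials:

```python
import numpy as np

def gibbs_conditional(img, x, y, V, offsets, Q):
    """pi(q | neighborhood) ∝ exp(E_{x,y}(q)) over all gray values q.

    V: dict mapping each offset (dx, dy) to its Q x Q potential table.
    Both the 'left' neighbor (x - dx, y - dy) and the 'right' neighbor
    (x + dx, y + dy) contribute when they fall inside the lattice.
    """
    H, W = img.shape
    E = np.zeros(Q)
    for (dx, dy) in offsets:
        if 0 <= y + dy < H and 0 <= x + dx < W:
            E += V[dx, dy][:, img[y + dy, x + dx]]   # pair (q, right neighbor)
        if 0 <= y - dy < H and 0 <= x - dx < W:
            E += V[dx, dy][img[y - dy, x - dx], :]   # pair (left neighbor, q)
    p = np.exp(E - E.max())                          # numerically stable softmax
    return p / p.sum()

# Toy binary field with an attractive potential: the conditional strongly
# favors the gray value shared by the pixel's neighbors.
Q = 2
V = {(1, 0): np.array([[0.5, -0.5], [-0.5, 0.5]]),
     (0, 1): np.array([[0.5, -0.5], [-0.5, 0.5]])}
img = np.zeros((5, 5), dtype=int)
pi = gibbs_conditional(img, 2, 2, V, [(1, 0), (0, 1)], Q)
print(pi)  # mass concentrated on gray value 0
```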

In total, the proposed segmentation algorithm is as follows:

1. Initialization ($\tau = 0$):

(a) Find the affine alignment of a given image $g$ to a selected prototype $g_1 \in \mathcal{S}$ using the SIFT correspondences and gradient optimization.

(b) Initialize the deformable model with the training boundary $b_1$ for $g_1$.

(c) Find the current appearance LCDG model.

Figure 5. Illustration of the propagation of a parametric deformable model using fast marching level sets [27].

2. Evolution ($\tau \leftarrow \tau + 1$):

(a) Evolve the parametric deformable model boundary by solving the Eikonal PDE with the speed function in Eq. (3).

(b) Terminate the process if the overall absolute deformation $\sum_{k=1}^{K} |d_{\tau,k} - d_{\tau-1,k}| \le \epsilon$ (a small predefined threshold); otherwise return to Step 2a.

3. Segmentation: transfer the final boundary to the initial (non-aligned) image $g$ by the inverse affine transform.
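The evolution loop can be sketched in a radically simplified form, replacing the fast-marching Eikonal solver by a direct radial update of the control-point distances; the speed function is a caller-supplied stand-in for Eq. (3), and all names and constants are ours:

```python
import numpy as np

def evolve(radii, prior_radii, speed_of, kappa=0.5, eps=1e-3, max_iter=500):
    """Simplified radial evolution of a deformable boundary.

    radii: current squared distances d_k of the K control points from the
    centroid; prior_radii: the closest shape prior d°_k. Each point moves
    toward its prior position with a step scaled by the speed function,
    and the loop stops once the overall absolute deformation falls below
    eps, mirroring Step 2b of the algorithm above.
    """
    radii = radii.astype(float).copy()
    for _ in range(max_iter):
        delta = prior_radii - radii            # signed distance: direction
        step = kappa * speed_of(radii) * delta
        radii += step
        if np.abs(step).sum() <= eps:          # overall absolute deformation
            return radii
    return radii

# With a unit speed everywhere, the boundary relaxes onto the shape prior.
K = 64
start = np.full(K, 30.0)
prior = 30.0 + 5.0 * np.sin(np.linspace(0, 2 * np.pi, K, endpoint=False))
final = evolve(start, prior, speed_of=lambda r: np.ones_like(r))
print(np.abs(final - prior).max())  # near zero: converged onto the prior
```

In the actual algorithm the per-point speed would be Eq. (3) evaluated from the LCDG submodels and the Gibbs conditional, and the front is propagated by fast marching rather than this direct relaxation.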

5. Experimental Results and Conclusions

The performance of the proposed parametric deformable model was evaluated on a large number of intricate digital images such as starfish and zebras with the visually obvious ground truth (actual object boundaries), MRI of the human brain, and dynamic contrast-enhanced MRI (DCE-MRI) of human kidneys with the ground truth presented by a radiologist. The DCE-MR images are usually noisy, with continuously changing and low contrast. About one third of the images of each type were used to learn the priors.

Basic segmentation stages of the algorithm are shown in Fig. 6. Figures 7 and 8 highlight the advantages of using the learned prior shape model in addition to the learned visual appearance models when the object is occluded by another object, as shown in Fig. 7, or when two overlapping objects have the same visual appearance, as shown in Fig. 8.

Additional experimental results for starfish, kidney, brain ventricle, corpus callosum, and brain stem images are shown in Figures 9, 10, and 11. Also, comparative results shown in Table 1 for 130 starfish images, 200 zebra images, and 4000 kidney DCE-MR images confirm the accuracy and robustness of our approach². According to these

²Due to space limitations, more results for starfish, zebra, and kidney images, brain stem, corpus callosum, and brain ventricles, and comparisons with other approaches (e.g. the active shape model, etc.) are presented on the web site http://louisville.edu/speed/bioengineering/faculty/bioengineering-full/dr-ayman-el-baz/elbazlab.html.

Figure 6. Chosen training kidney prototype (a), an image to be segmented (b), its alignment to the prototype and initialization (in blue) (c), LCDG-estimates (d) of the marginal gray level distributions $p(q|\mathrm{ob})$ and $p(q|\mathrm{bg})$, final segmentation (in red) of the image aligned to the training prototype (e), the same result (f) after its inverse affine transform to the initial image (b) (the total error 0.63% compared to the ground truth in green (i)), the segmentation (g) with the algorithm in [1] (the total error 4.9% compared to the ground truth (i)), and the segmentation (h) with the algorithm in [35] (the total error 5.17% compared to the ground truth (radiologist segmentation) (i)). Note that our results were obtained using 140 points describing the kidney shape.

Figure 7. From left to right: an image to be segmented (note that the starfish is occluded by a leaf); segmentation (shown on the energy image) based on the learned prior and current appearance models; final segmentation (shown on the energy image) based on the learned prior shape and the prior and current appearance models; and the same result shown on the image (the total error 1.27% compared to the ground truth). Our results were obtained with the 525 control points for the starfish shape.

results, by accuracy and processing time our parametric deformable model notably outperforms one of the best level set-based geometric models, proposed in [1]. One possible reason is that its shape prior, specified by linear combinations of the level-set distance maps, may produce unpredictably distorted shapes in comparison to the training ones (the same limitations of signed distance maps in describing prior shape models are also mentioned in [36, 37]). Simultaneously, our approach is faster due to mostly analytical computations.
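The distortion mentioned above is easy to reproduce in a toy setting: a pointwise average of the signed distance maps of two disjoint circles is positive everywhere, so the "shape" it implies is empty, and its gradient magnitude collapses toward 0, so it is no longer a distance map at all. A minimal numeric sketch (illustrative only; the circles stand in for aligned training shapes, not for any data in the paper):

```python
import numpy as np

# Exact signed distance maps (negative inside) of two disjoint circles.
y, x = np.mgrid[0:101, 0:101]

def circle_sdf(cx, cy, r):
    # Signed distance to a circle of radius r centered at (cx, cy).
    return np.hypot(x - cx, y - cy) - r

phi1 = circle_sdf(30, 50, 15)
phi2 = circle_sdf(70, 50, 15)
phi_avg = 0.5 * (phi1 + phi2)  # linear combination of distance maps

# Each training map encodes a genuine, non-empty shape...
print((phi1 < 0).any(), (phi2 < 0).any())  # True True
# ...but their average is positive everywhere: the implied shape is empty.
print(phi_avg.min())                       # 5.0

# It is also no longer a distance map: |grad| should be ~1 for a true
# signed distance map, yet here it drops to 0 between the circles.
gy, gx = np.gradient(phi_avg)
print(np.hypot(gx, gy).min())              # 0.0
```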

Experiments with different natural images support the proposed parametric deformable model guided with the learned shape and appearance priors. Our approach assumes that (i) the boundaries of the training and test objects



(a) (b)

(c) (d)
Figure 8. (a) An image to be segmented; (b) segmentation results based on the learned prior and current appearance models imposed on the energy domain; and the final segmentation results based on the learned shape prior and the prior and current appearance models imposed on the energy domain (c) and in the image domain (d) (the total error 2.03% compared to the ground truth). Our results were obtained using 927 control points for the zebra shape.

Table 1. Accuracy (the minimum, maximum, and mean errors and the standard deviation of errors, all in %) and average processing time (sec) of our segmentation in comparison to the approach of Tsai et al. [1] and the Active Shape Model (ASM) [35]. One third of the images of each type, i.e. 43, 67, and 1300 starfish, zebra, and kidney images, respectively, are used as training sets.

              Starfish (130 images)    Zebra (200 images)      Kidney (4000 images)
              Our    [1]    [35]       Our    [1]    [35]      Our    [1]    [35]
Min., %       0.15   5.2    4.9        0.21   7.3    8.1       0.25   3.9    1.3
Max., %       2.0    14.6   15.7       1.1    18.6   13.1      1.5    8.3    10.3
Mean, %       1.0    10.1   12.3       0.54   10.3   9.87      0.83   5.8    5.95
St. dev., %   0.67   3.1    5.1        0.31   3.3    2.9       0.45   1.5    3.7
Time, sec     51     549    10         59     790    13        23     253    7

are reasonably similar up to a relative affine transform and (ii) SIFT reliably detects corresponding points to automatically align the goal objects in the images in spite of their different backgrounds. Although these assumptions restrict the application area of our approach compared to the conventional parametric models, the latter typically fail on the above and similar images. More accurate level set-based geometric models with linear combinations of the training distance maps as the shape priors also rely on the mutual image alignment. Compared to these models, our approach avoids some of their theoretical inconsistencies, is computationally much simpler and faster, and has similar accuracy on high-contrast images, but notably better performance on low-contrast and multimodal ones.
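Assumption (ii) amounts to recovering a 2D affine transform from matched SIFT keypoints, which reduces to linear least squares: each correspondence contributes two linear equations in the six affine parameters. A minimal sketch under that reading (`fit_affine` and the toy data below are ours, not the paper's alignment pipeline):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points to dst.

    src, dst: (N, 2) arrays of matched keypoint coordinates, N >= 3.
    Returns a 2x3 matrix A such that dst ~= src @ A[:, :2].T + A[:, 2].
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    ones = np.ones((len(src), 1))
    X = np.hstack([src, ones])             # (N, 3) design matrix
    # Solve X @ P = dst in the least-squares sense; P is 3 x 2.
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return P.T

# Toy check: points mapped by a known affine transform are recovered
# exactly from noise-free correspondences.
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, (10, 2))
A_true = np.array([[1.2, -0.3, 5.0],
                   [0.4,  0.9, -7.0]])
dst = src @ A_true[:, :2].T + A_true[:, 2]
A_est = fit_affine(src, dst)
print(np.allclose(A_est, A_true))  # True
```

With noisy real correspondences the same call returns the best-fit transform in the least-squares sense; robust variants (e.g. RANSAC over the matches) are the usual refinement when outlier matches are present.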

References

[1] A. Tsai, A. Yezzi, W. Wells, C. Tempany, D. Tucker, A. Fan, W. Grimson, and A. Willsky, “A shape based approach to the segmentation of medical imagery using level sets,” IEEE Trans. Medical Imaging, vol. 22, pp. 137–154, 2003.

(a) Error 1.8% (b) Error 1.9% (c) Error 1.1%

(a) Error 6.1% (b) Error 12.8% (c) Error 12.4%
Figure 9. Segmentation of three other starfish images with our approach (the upper row) and the algorithm in [1] (the bottom row).

Error: 0.67% 1.5% 1.5% 1.3%

Error: 4.1% 7.4% 6.4% 5.9%
Figure 10. Segmentation of four other kidney DCE-MR images with our approach (the upper row) and the algorithm in [1] (the bottom row) vs. the ground truth: the final boundaries and the ground truth are in red and green, respectively.

(a)

(b)

(c)
Figure 11. More segmentation results: (a) brain ventricle, (b) corpus callosum, and (c) brain stem.

[2] M. Rousson and N. Paragios, “Shape priors for level set representations,” Proc. 7th European Conf. Computer Vision, Copenhagen, Denmark, pp. 78–92, 2002.

[3] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” Int. J. Computer Vision, vol. 1, pp. 321–331, 1987.

[4] V. Caselles, R. Kimmel, and G. Sapiro, “On geodesic active contours,” Int. J. Computer Vision, vol. 22, pp. 61–79, 1997.

[5] S. Osher and J. Sethian, “Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations,” J. Computational Physics, vol. 79, pp. 12–49, 1988.

[6] T. Chan and L. Vese, “Active contours without edges,” IEEE Trans. Image Processing, vol. 10, pp. 266–277, 2001.

[7] A. Pentland and S. Sclaroff, “Closed-form solutions for physically based shape modeling and recognition,” IEEE Trans. Pattern Analysis Machine Intell., vol. 13, pp. 715–729, 1991.

[8] T. Cootes, C. Taylor, D. Cooper, and J. Graham, “Active shape models – their training and application,” Computer Vision Image Understanding, vol. 61, pp. 38–59, 1995.

[9] L. Staib and J. Duncan, “Boundary finding with parametrically deformable contour models,” IEEE Trans. Pattern Analysis Machine Intell., vol. 14, pp. 1061–1075, 1992.

[10] A. Chakraborty, L. Staib, and J. Duncan, “An integrated approach to boundary finding in medical images,” Proc. IEEE Workshop Biomedical Image Analysis, Seattle, Washington, June 24–25, 1994, pp. 13–22.

[11] S. Pizer, G. Gerig, S. Joshi, and S. Aylward, “Multiscale medial shape-based analysis of image objects,” Proc. IEEE, vol. 91, pp. 1670–1679, 2003.

[12] M. Leventon, E. Grimson, and O. Faugeras, “Statistical shape influence in geodesic active contours,” Proc. IEEE Conf. Computer Vision Pattern Recognition, Hilton Head, South Carolina, June 13–15, 2000, pp. 316–323.

[13] M. Styner, G. Gerig, S. Joshi, and S. Pizer, “Automatic and robust computation of 3D medial models incorporating object variability,” Int. J. Computer Vision, vol. 55, pp. 107–122, 2002.

[14] D. Shen and C. Davatzikos, “An adaptive-focus deformable model using statistical and geometric information,” IEEE Trans. Pattern Analysis Machine Intell., vol. 22, pp. 906–913, 2000.

[15] D. Cremers, T. Kohlberger, and C. Schnorr, “Nonlinear shape statistics in Mumford-Shah based segmentation,” Proc. ECCV, Copenhagen, Denmark, pp. 93–108, 2002.

[16] X. Huang, D. Metaxas, and T. Chen, “MetaMorphs: Deformable shape and texture models,” Proc. CVPR, Washington, D.C., June 27 – July 2, 2004, vol. 1, pp. 496–503.

[17] Y. Chen, S. Thiruvenkadam, H. Tagare, F. Huang, D. Wilson, and E. Geiser, “On the incorporation of shape priors into geometric active contours,” Proc. IEEE Workshop Variational and Level Set Methods, Vancouver, Canada, July 13, 2001.

[18] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, “A unifying framework for partial volume segmentation of brain MR images,” IEEE Trans. Medical Imaging, vol. 22, no. 1, pp. 105–119, 2003.

[19] S. Joshi, “Large Deformation Diffeomorphisms and Gaussian Random Fields for Statistical Characterization of Brain Submanifolds,” PhD Thesis, Washington Univ., St. Louis, 1997.

[20] N. Paragios and R. Deriche, “Geodesic active contours and level sets for the detection and tracking of moving objects,” IEEE Trans. Pattern Analysis Machine Intell., vol. 22, pp. 266–280, 2000.

[21] T. Cootes and C. Taylor, “Statistical models of appearance for medical image analysis and computer vision,” Proc. SPIE, vol. 4322, pp. 236–248, 2001.

[22] A. Hill, A. Thornham, and C. Taylor, “Model-based interpretation of 3-D medical images,” Proc. 4th British Machine Vision Conf., Guildford, Sept. 1993, pp. 339–348.

[23] J. Yang and J. Duncan, “3D image segmentation of deformable objects with joint shape-intensity prior models using level sets,” Med. Image Anal., vol. 8, no. 3, pp. 285–294, 2004.

[24] D. Freedman, R. Radke, T. Zhang, Y. Jeong, D. Lovelock, and G. Chen, “Model-based segmentation of medical imagery by matching distributions,” IEEE Trans. Medical Imaging, vol. 24, no. 3, pp. 281–292, 2005.

[25] M. Rousson, T. Brox, and R. Deriche, “Active unsupervised texture segmentation on a diffusion based feature space,” Proc. CVPR, Madison, USA, June 16–22, 2003, pp. 1–8.

[26] T. Brox and J. Weickert, “Nonlinear matrix diffusion for optic flow estimation,” Proc. 24th DAGM Symposium (LNCS 2449), pp. 446–453, 2002.

[27] D. Adalsteinsson and J. Sethian, “A fast level set method for propagating interfaces,” J. Computational Physics, vol. 118, pp. 269–277, 1995.

[28] A. El-Baz and G. Gimel’farb, “Image segmentation with a parametric deformable model using shape and appearance priors,” Proc. CVPR, Anchorage, USA, June 23–28, 2008, pp. 1–8.

[29] D. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Computer Vision, vol. 60, pp. 91–110, 2004.

[30] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” Proc. CVPR, Madison, Wisconsin, June 16–22, 2003, vol. 2, pp. 257–263.

[31] G. Gimel’farb and D. Zhou, “Accurate identification of a Markov-Gibbs model for texture synthesis by bunch sampling,” Proc. CAIP, Vienna, Austria, August 27–29, 2007 (LNCS 4673), pp. 979–986.

[32] G. Gimel’farb, Image Textures and Gibbs Random Fields, Kluwer, 1999.

[33] F. Evans, “Detecting fish in underwater video using the EM algorithm,” Proc. ICIP, Barcelona, Spain, September 14–17, 2003, vol. II, pp. 177–180.

[34] A. El-Baz, “Novel Stochastic Models for Medical Image Analysis,” Ph.D. dissertation, University of Louisville, Louisville, KY, Chapter 2, pp. 24–60, 2006.

[35] T. Cootes and C. Taylor, “A mixture model for representing shape variation,” Image Vis. Computing, vol. 17, no. 8, pp. 567–574, 1999.

[36] D. Cremers and S. Soatto, “A pseudo-distance for shape priors in level set segmentation,” in N. Paragios (Ed.), IEEE 2nd Int. Workshop on Variational, Geometric and Level Set Methods, pp. 169–176, 2003.

[37] K. Pohl, S. Warfield, R. Kikinis, W. Grimson, and W. Wells, “Coupling statistical segmentation and PCA shape modeling,” Proc. MICCAI, pp. 151–159, 2004.
