
Automatic segmentation of the liver in computed tomography scans with voxel classification and atlas matching

Eva van Rikxoort, Yulia Arzhaeva, and Bram van Ginneken

Image Sciences Institute, University Medical Center Utrecht, the Netherlands {eva,yulia,bram}@isi.uu.nl

Abstract. A fully automatic system for segmentation of the liver from CT scans is presented. The core of the method is a voxel labeling procedure in which the probability that each voxel is part of the liver is estimated using a statistical classifier (k-nearest-neighbor) and a set of features. Several features encode positional information obtained from a multi-atlas registration procedure. In addition, pre-processing steps are carried out to determine the vertical scan range of the liver and to rotate the scan so that the subject is in supine position, and post-processing is applied to the voxel classification result to smooth and improve the final segmentation. The method is evaluated on 10 test scans and performs robustly: the volumetric overlap error is 12.5% on average and 15.3% in the worst case. A careful inspection of the results reveals, however, that many local errors are made and that the border is often not localized precisely. The causes of and possible solutions for these failures are briefly discussed.

1 Introduction

Segmentation of the liver has several clinical applications. It can be used to measure the graft volume before living donor liver transplantation [1]. It is often performed as part of hepatic surgery planning, and although locating vessels, tumors, and liver segments may be more important in this application, segmentation of the liver is reported to be one of the most time-consuming steps of the planning procedure [2]. Liver segmentation could also be useful to monitor patients with liver metastases, where the disease is related to an enlargement of the liver.

Automatic liver segmentation from computed tomography (CT) scans is an extremely challenging task because of shape variations, neighboring structures with similar intensity values, the enormous variability of structures in the abdomen in general, the presence of tumors, other abnormalities, and contrast material in the liver, and variations in scanning protocols.

2 Method overview & Rationale

The task we set ourselves in this study was to use a general supervised segmentation algorithm that could potentially be used to solve a wide range of medical image segmentation tasks, and to apply it to CT liver segmentation with as little tweaking as possible. The core of the method consists of a voxel labeling procedure: for every voxel in the test set, a number of numerical values (a feature vector) is computed, and a statistical classifier [4], trained with examples of feature vectors and their corresponding truth (liver or not liver), maps this vector to a probability that the voxel is part of the liver.

T. Heimann, M. Styner, B. van Ginneken (Eds.): 3D Segmentation in The Clinic: A Grand Challenge, pp. 101-108, 2007.

In this approach, the key to success is to define good features. From an inspection of the training data, two types of features seemed important. The first type describes the density of the voxel and the variation of density values in its vicinity. We encode these by computing Gaussian derivatives at various scales. There is nothing particular about this way of representing local image structure, although it is interesting to note that some psychophysical evidence supports the idea that this set of operators is encoded in the sensorium of humans and other organisms [5]. It is clear, however, that such gray-level features alone will not be sufficient. Indeed, if we train a voxel classification system with only these features, it elicits a strong response (i.e., it assigns high liver probability to voxels) in certain other organs. These voxels are hard to distinguish from liver if one examines, in isolation, only a small part of the scan around a central voxel. The location of a voxel relative to the global anatomy in the scan is therefore the second important clue, and it should be encoded in one or more features if our voxel labeling approach is to be successful.
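The gray-level features can be sketched as follows, assuming SciPy's `scipy.ndimage.gaussian_filter`, whose `order` argument selects a derivative order per axis. Per scale, the sketch returns the ten Gaussian derivatives of total order 0 to 2 plus the gradient magnitude, i.e. 11 features, so the four scales σ = 1, 2, 4, 8 voxels listed in Section 3.3 would give 44 features; that this is how the 44 features arise is our assumption. The function name `gaussian_features` is hypothetical, and no scale normalization is applied.

```python
import itertools

import numpy as np
from scipy.ndimage import gaussian_filter


def gaussian_features(volume, scales=(1, 2, 4, 8)):
    """Multi-scale Gaussian derivative features for a 3D volume (hypothetical helper).

    Returns an array of shape volume.shape + (11 * len(scales),): per scale,
    the 10 Gaussian derivatives of total order 0, 1 and 2, then the
    gradient magnitude.
    """
    volume = np.asarray(volume, dtype=np.float64)
    # Derivative orders per axis, e.g. (0, 0, 1) = first derivative along axis 2.
    orders = [o for o in itertools.product(range(3), repeat=3) if sum(o) <= 2]
    features = []
    for sigma in scales:
        first_order = []
        for order in orders:
            response = gaussian_filter(volume, sigma=sigma, order=order)
            features.append(response)
            if sum(order) == 1:
                first_order.append(response)
        # Gradient magnitude from the three first-order derivatives.
        features.append(np.sqrt(sum(d ** 2 for d in first_order)))
    return np.stack(features, axis=-1)
```

For a full scan this produces a large array (44 values per voxel), so in practice the features would be computed only inside the region of interest described in Section 3.3.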

A first idea to accomplish this might be to take the coordinates of a voxel (preferably in real-world coordinates, to accommodate differences in slice thickness between scans) as features. We have used this approach successfully for lung segmentation from chest radiographs by pixel classification [6]. In the current application, however, the 'raw' position features turn out to be not very informative, because the scans considered in this study have widely varying scan ranges. Some scans include the complete thorax and abdomen; others are limited to the vertical range that contains the liver only. In addition, some scans showed the patient in a rotated position on the table. To accommodate these variations, we developed a simple ad hoc procedure, detailed in the next section, to find the vertical scan range that contains the liver and to correct the orientation if needed.

After applying this preprocessing to the scan, the raw voxel coordinates (in millimeters) could, in principle, provide useful features to complement the gray-level features in the voxel labeling. Results with this approach were, unfortunately, somewhat disappointing. Note that the preprocessing stage can be thought of as a rough form of alignment. This led us to incorporate a more sophisticated form of alignment into the method. After the preprocessing stage of finding the liver scan range and correcting for rotation, we registered each test scan with twelve of the twenty training scans (see Section 3.2) and applied the resulting transformations to the reference segmentations provided with those training scans. The transformed segmentations were averaged to provide a probabilistic atlas segmentation. This procedure is generally known as non-rigid multi-atlas segmentation [7].


The atlas segmentation could be used directly as the final result. This did not, however, lead to very accurate results: because the anatomy of the abdomen varies greatly between individuals, a sensible elastic deformation between two scans is often impossible. We therefore incorporate the atlas segmentation into the voxel classification by computing, for every voxel, which part of the liver volume (according to the atlas segmentation) is above, behind, and to the left of it, as three additional features. Next, the voxel classification is applied. Details about the classifier are given in the next section. The final result is obtained by applying post-processing steps that employ the prior knowledge that the liver is a single connected object that is more or less convex and contains no holes.
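A minimal NumPy sketch of these three positional features, under our own interpretation: for each voxel and each axis, it returns the fraction of the total atlas mass that lies in slices with a strictly smaller index along that axis (an exclusive cumulative sum of per-slice mass). Which direction counts as "above", "behind", or "to the left" depends on the axis orientation of the scan and is an assumption here; the opposite direction carries essentially the same information. The name `atlas_position_features` is hypothetical.

```python
import numpy as np


def atlas_position_features(atlas):
    """Fraction of total atlas mass in slices with a strictly smaller index
    along each axis (hypothetical helper). Returns shape atlas.shape + (3,)."""
    atlas = np.asarray(atlas, dtype=np.float64)
    total = atlas.sum()
    if total <= 0:
        raise ValueError("atlas has no mass")
    feats = np.empty(atlas.shape + (3,))
    for axis in range(3):
        others = tuple(a for a in range(3) if a != axis)
        per_slice = atlas.sum(axis=others)           # atlas mass in each slice along `axis`
        before = np.cumsum(per_slice) - per_slice    # mass strictly before each slice
        shape = [1, 1, 1]
        shape[axis] = -1
        # Broadcast the 1D profile over the two other axes.
        feats[..., axis] = (before / total).reshape(shape)
    return feats
```

Because each feature depends only on one coordinate, the cost is a few sums over the volume, which is negligible compared to registration.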

The approach outlined here is in principle generic and could be applied to many other segmentation tasks. We note that a largely similar method was applied in [8] to segment lungs in CT scans.

3 Method

3.1 Pre-processing

Inspection of the training data revealed several sources of variation that may interfere with registration and voxel-based classification: 1) variation in slice thickness; 2) variation in vertical scan range; 3) variation in patient orientation. We performed several preprocessing steps aimed at removing this variation.

Resampling. We resampled the data in the vertical direction (using linear interpolation) so that the voxels became isotropic (the slices measured 512 by 512 voxels). This removes the variation in slice thickness, but scans with thicker slices of course have more blurring in the vertical direction, and this does affect certain subsequent processing steps.
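A minimal NumPy sketch of the vertical resampling, assuming a (z, y, x) array, a known slice spacing and in-plane pixel spacing in millimeters, and new slices placed at multiples of the in-plane spacing starting at the first slice (our choice of grid). The name `resample_isotropic_z` is hypothetical.

```python
import numpy as np


def resample_isotropic_z(volume, slice_spacing, pixel_spacing):
    """Linearly interpolate a (z, y, x) volume along z so that the slice
    spacing equals the in-plane pixel spacing (both in mm). Hypothetical helper."""
    volume = np.asarray(volume, dtype=np.float64)
    nz = volume.shape[0]
    extent = (nz - 1) * slice_spacing                     # mm between first and last slice
    z_new = np.arange(0.0, extent + 1e-9, pixel_spacing)  # new slice positions in mm
    pos = z_new / slice_spacing                           # fractional source slice index
    i0 = np.floor(pos).astype(int)
    i1 = np.minimum(i0 + 1, nz - 1)
    w = (pos - i0)[:, None, None]                         # interpolation weight per new slice
    return (1.0 - w) * volume[i0] + w * volume[i1]
```

For example, three slices 2 mm apart with 1 mm pixels become five slices, with the new in-between slices set to the average of their neighbors.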

Liver Scan Range Determination. We attempted to determine a range that covers the liver and a bit above and below it. A connected component analysis was performed on all voxels below -500 HU. Components that touched the sides of the scan in the x- and y-directions were discarded (they correspond to air outside the body). Subsequently, the two largest components were selected; these are likely the lungs. The lowest slice that contained any of these objects was computed. The range taken is 140 mm above this slice and 150 mm below it (as far as these slices are available in the scan). This range was selected by inspecting the vertical extent of the liver in relation to the lung base in the training data.
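The scan range procedure can be sketched with `scipy.ndimage.label`, which uses 6-connectivity in 3D by default. The sketch assumes a (z, y, x) volume in HU whose z index increases toward the feet, so the "lowest" lung slice is the largest index; this orientation, the rounding of millimeters to slices, and the name `liver_scan_range` are assumptions.

```python
import numpy as np
from scipy import ndimage


def liver_scan_range(hu, slice_spacing_mm, above_mm=140.0, below_mm=150.0):
    """Return (first, last) slice indices, inclusive, of the liver search range.
    Hypothetical helper; assumes z index increases toward the feet."""
    air = np.asarray(hu) < -500
    labels, n = ndimage.label(air)                       # 6-connected components
    # Mark components touching the x or y borders (air outside the body).
    border = np.zeros(n + 1, dtype=bool)
    for face in (labels[:, 0, :], labels[:, -1, :], labels[:, :, 0], labels[:, :, -1]):
        border[np.unique(face)] = True
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    sizes[0] = 0                                         # ignore non-air voxels
    sizes[border] = 0
    lungs = np.argsort(sizes)[::-1][:2]                  # two largest remaining components
    lungs = lungs[sizes[lungs] > 0]
    if lungs.size == 0:
        raise ValueError("no candidate lung component found")
    z_idx = np.nonzero(np.isin(labels, lungs).any(axis=(1, 2)))[0]
    lung_base = z_idx.max()                              # lowest slice containing a lung voxel
    first = max(0, lung_base - int(round(above_mm / slice_spacing_mm)))
    last = min(air.shape[0] - 1, lung_base + int(round(below_mm / slice_spacing_mm)))
    return int(first), int(last)
```

As the Discussion notes, this heuristic fails when a "lung" component is connected to bowel gas, because the lung base then appears too low.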

Fig. 1. Left: slice from case #1, after rotation correction and subsampling to isotropic data with 256 by 256 voxels per slice. This slice roughly corresponds to the upper-right slice in Fig. 2. Note that this test scan has the patient in a rotated position, and this rotation has been corrected. Middle: probabilistic atlas segmentation of the slice. Large parts of the liver in this slice have been registered correctly, but some registrations were partly incorrect, leading to non-zero probabilities (gray zones) outside the liver. Right: result of voxel classification before post-processing. The accuracy has markedly improved compared to the atlas segmentation, but certain regions outside the liver still yield non-zero responses. The outline after post-processing is shown in blue in Fig. 2, upper-right.

Rotation correction. We determine the orientation of the body around the vertical axis of the scan. (We do not attempt to correct for orientation variation in other directions, although such variation exists too.) We first compute a maximum intensity projection along the vertical direction and threshold it at 400 HU. This detects dense structures, mostly bone and possibly contrast material. We then exploit the fact that the bony skeleton has more spread from left to right than from back to front. We rotate this image by angles from -90 to +90 degrees with a step size of 1 degree, and for every orientation the standard deviation of the projection along the horizontal direction is determined. The angle for which this spread is maximal is selected. For most scans this angle is close to 0; when its absolute value is larger than 5 degrees, we apply that rotation to the scan.
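A minimal NumPy sketch of the angle search. Two simplifications are ours: the "spread" is taken to be the standard deviation of the horizontal coordinates of the above-threshold pixels, and the pixel coordinates are rotated instead of resampling the image. The sign convention of the returned angle, and how the rotation is subsequently applied to the volume, are assumptions; `estimate_rotation` and `needs_correction` are hypothetical names.

```python
import numpy as np


def estimate_rotation(hu, threshold=400.0, max_angle=90, step=1):
    """Angle (degrees) that maximizes the left-right spread of dense structures
    in the maximum intensity projection of a (z, y, x) CT volume."""
    mip = np.asarray(hu).max(axis=0)              # MIP along the vertical (z) axis
    ys, xs = np.nonzero(mip > threshold)          # bone and contrast pixels
    if xs.size == 0:
        return 0
    xs = xs - xs.mean()
    ys = ys - ys.mean()
    best_angle, best_spread = 0, -np.inf
    for angle in range(-max_angle, max_angle + 1, step):
        t = np.deg2rad(angle)
        # Horizontal coordinate of each pixel after rotating by `angle`.
        x_rot = xs * np.cos(t) - ys * np.sin(t)
        spread = x_rot.std()
        if spread > best_spread:
            best_angle, best_spread = angle, spread
    return best_angle


def needs_correction(angle, tolerance=5):
    """The rotation is only applied when the estimated angle exceeds the tolerance."""
    return abs(angle) > tolerance
```

Working on point coordinates avoids interpolation at each of the 181 candidate angles; like the original procedure, it can be misled by large one-sided contrast deposits.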

3.2 Registration

Twelve of the twenty training scans were selected, excluding rotated scans and scans with lower resolution. The registration between these twelve training scans and a test scan was performed with Elastix, a publicly available registration package [9]. As non-rigid registrations are computationally fairly expensive, subsampled data were used (128 by 128 voxels per slice, isotropic voxels). Elastix is a volumetric registration package that treats registration as an optimization problem in which the similarity between the target and atlas image is maximized. As cost function, negative mutual information was used, estimated with the Parzen window procedure proposed in [10] with 32 histogram bins. The transformation is initialized with an affine transformation, followed by a non-rigid transformation modeled by B-splines. The cost function was optimized with an iterative stochastic gradient descent optimizer. To avoid local minima, a multi-resolution strategy was used. For the affine registration, 4 resolutions were used, in each of which 512 iterations of the stochastic gradient descent optimizer were performed. For the non-rigid B-spline registration, 5 resolutions were used, with 256 iterations per level. The B-spline grid spacing used in these resolutions was 64, 64, 32, 16, and 8 voxels, respectively.
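The multi-resolution schedule above can be written down as elastix-style parameter files. The sketch below generates them from Python dictionaries; the parameter names follow current elastix conventions and are an assumption (the 2007 version may have differed), and the optimizer and sampler entries are our guess at how "stochastic gradient descent" would be configured. Everything not listed is left at elastix defaults. The helper `elastix_parameter_text` is hypothetical.

```python
def elastix_parameter_text(params):
    """Format a dict as elastix-style parameter lines: (Key value ...)."""
    lines = []
    for key, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        tokens = [f'"{v}"' if isinstance(v, str) else str(v) for v in values]
        lines.append(f"({key} {' '.join(tokens)})")
    return "\n".join(lines)


common = {
    "Registration": "MultiResolutionRegistration",
    "Metric": "AdvancedMattesMutualInformation",  # Parzen-window MI as in [10]
    "NumberOfHistogramBins": 32,
    "Optimizer": "StandardGradientDescent",       # assumption: SGD via random sampling
    "ImageSampler": "Random",
    "NewSamplesEveryIteration": "true",
}
affine = {**common, "Transform": "AffineTransform",
          "NumberOfResolutions": 4, "MaximumNumberOfIterations": 512}
bspline = {**common, "Transform": "BSplineTransform",
           "NumberOfResolutions": 5, "MaximumNumberOfIterations": 256,
           "FinalGridSpacingInVoxels": 8,
           # Factors applied to the final spacing: 64, 64, 32, 16, 8 voxels.
           "GridSpacingSchedule": [8, 8, 4, 2, 1]}
```

Written to two files (e.g. the hypothetical `affine.txt` and `bspline.txt`), these could be passed in sequence to the elastix command-line tool as `elastix -f fixed.mhd -m moving.mhd -p affine.txt -p bspline.txt -out outdir`, after which the companion `transformix` tool can warp a reference segmentation.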


The average of the twelve transformed reference segmentations (the atlas segmentation) is upsampled by linear interpolation to isotropic data with 256 by 256 voxels per slice. An example is shown in Fig. 1.
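A minimal NumPy sketch of the averaging step, together with the bounding box that Section 3.3 derives from the atlas (threshold 0.5, margin of 30 voxels). It assumes the warped masks are already on the test-scan grid (the upsampling is not shown) and treats the threshold as inclusive, which is our assumption; both function names are hypothetical.

```python
import numpy as np


def probabilistic_atlas(warped_masks):
    """Average of binary reference segmentations warped onto the test scan."""
    stack = np.stack([np.asarray(m, dtype=np.float32) for m in warped_masks])
    return stack.mean(axis=0)                    # per-voxel fraction of atlases voting liver


def atlas_bounding_box(atlas, threshold=0.5, margin=30):
    """Slices of the box around atlas >= threshold, extended by `margin` voxels."""
    inside = np.argwhere(atlas >= threshold)
    if inside.size == 0:
        raise ValueError("no voxel reaches the threshold")
    lo = np.maximum(inside.min(axis=0) - margin, 0)
    hi = np.minimum(inside.max(axis=0) + margin + 1, atlas.shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
```

With twelve atlases, a value of 0.5 means that exactly six of the warped reference segmentations label the voxel as liver.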

3.3 Voxel classification

Voxel classification is performed on isotropic data with slices of 256 by 256 voxels. The features computed are the gradient magnitude and the Gaussian derivatives up to second order, at scales σ = 1, 2, 4, 8 voxels (44 features in total). Spatial information is encoded by 3 features that indicate, for each voxel, which percentage of the atlas segmentation mass (the sum of all probabilities) is above, next to, or behind it. Voxel classification is only applied to voxels that lie in a box determined from the bounding box of the atlas segmentation thresholded at 0.5, extended by 30 voxels in all directions. The extension ensures that a representative set of negative (non-liver) samples can be drawn and that the complete liver is included in the box. For training, every 8th voxel in each direction is taken from the twenty training scans, resulting in a training set of 111,412 voxels. A k-NN classifier [11] with k = 15 is used for classification. The fast implementation for approximate nearest neighbor retrieval by Arya and Mount [12] is used, with ε set to 2.0. An optimal subset of features is selected with sequential floating forward search [13], for which the training set is divided into two equal parts. Ten features were selected. A voxel classification result is shown in Fig. 1.
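The k-NN posterior can be sketched as the fraction of the k = 15 nearest training samples that are labeled liver. This sketch swaps the approximate search of [12] for an exact brute-force NumPy search, which is only practical for small sets; it does not standardize the features, since the text does not say whether that was done. The name `knn_liver_probability` is hypothetical.

```python
import numpy as np


def knn_liver_probability(train_x, train_y, test_x, k=15):
    """P(liver) = fraction of the k nearest training samples labeled liver.
    Exact brute-force search (the paper used approximate search [12])."""
    train_x = np.asarray(train_x, dtype=np.float64)
    test_x = np.asarray(test_x, dtype=np.float64)
    train_y = np.asarray(train_y, dtype=np.float64)
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]   # indices of the k nearest
    return train_y[nearest].mean(axis=1)
```

The training subsampling could be expressed as `features[::8, ::8, ::8]` on each training scan's feature volume. For 111,412 training samples and millions of test voxels, the distance matrix above would not fit in memory, which is why a tree-based or approximate nearest-neighbor search is needed in practice.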

3.4 Post-processing

The voxel classification result could simply be thresholded to obtain a final segmentation, but this does not lead to a smooth, single connected object. We therefore applied a heuristic sequence of operations to achieve better results. First, the voxel classification result is blurred with σ = 1 voxel and then thresholded at 0.3. The largest connected component is selected, and an opening with a spherical structuring element of 3 voxels radius is applied. These values were determined by visually inspecting the results of a few settings on a few test cases. Holes in the object, if any, are filled. Finally, the pre-processing steps, including the resampling, are undone in reverse order to obtain a binary segmentation volume.
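The post-processing chain can be sketched with `scipy.ndimage`, following the order given above (blur, threshold, largest component, opening, hole filling). Undoing the pre-processing is not shown, and the name `postprocess` is hypothetical.

```python
import numpy as np
from scipy import ndimage


def postprocess(prob, sigma=1.0, threshold=0.3, radius=3):
    """Blur, threshold, keep the largest component, open with a ball, fill holes."""
    smooth = ndimage.gaussian_filter(np.asarray(prob, dtype=np.float64), sigma=sigma)
    mask = smooth > threshold
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                                  # ignore background
    mask = labels == sizes.argmax()               # largest connected component
    r = np.arange(-radius, radius + 1)
    zz, yy, xx = np.meshgrid(r, r, r, indexing="ij")
    ball = zz**2 + yy**2 + xx**2 <= radius**2     # spherical structuring element
    mask = ndimage.binary_opening(mask, structure=ball)
    return ndimage.binary_fill_holes(mask)
```

Note that the opening is applied after component selection, so it could in principle split off small pieces again; the sketch keeps that order rather than re-selecting the largest component.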

4 Experiments & Results

The method was applied to 10 test scans. Five metrics were computed: volumetric overlap error, relative volume difference, and mean, RMS, and maximum surface distance. For each metric and each case, a score was computed in such a way that 100 points corresponds to a perfect result and 75 points roughly corresponds to the score that can be expected of a human observer. A more detailed description of the data and the scoring system can be found in [3].
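The exact scoring rule is defined in [3]. As an assumption, a linear mapping that gives 100 points for zero error, 75 points when the error equals a human reference value, and is clipped at 0 reproduces the overlap scores in Table 1 when the human reference overlap error is taken to be 6.4% (e.g., 12.6% gives 51 points and 15.3% gives 40 points). The function name `sliver_score` and the reference value are illustrative.

```python
def sliver_score(value, human_value):
    """Assumed linear score: 100 at zero error, 75 at the human reference
    error, decreasing linearly and clipped at 0. Signed metrics such as the
    relative volume difference enter through their absolute value."""
    return max(0.0, 100.0 - 25.0 * abs(value) / human_value)
```

The clipping explains why a case can score 0 on one metric (case 6, RMS distance) without the total becoming negative.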

The results are listed in Table 1. Slices from three cases, showing the result of the method (in blue) and the reference (in red), are given in Fig. 2.


Dataset  Overlap Error    Volume Diff.     Avg. Dist.       RMS Dist.        Max. Dist.       Total
         [%]    Score     [%]    Score     [mm]   Score     [mm]   Score     [mm]   Score     Score
1        12.6   51        3.6    81        2.2    44        4.1    43        27.4   64        56
2        15.3   40        4.2    77        2.5    38        4.3    40        26.6   65        52
3        10.4   59        -3.4   82        2.2    44        3.7    48        31.8   58        58
4        12.3   52        3.5    82        2.5    37        3.8    47        22.6   70        57
5        11.9   54        0.5    98        2.6    34        4.4    39        37.1   51        55
6        12.3   52        -4.8   75        3.0    24        8.0    0         66.3   13        33
7        13.8   46        -1.2   93        2.5    38        4.8    34        36.8   52        52
8        10.1   61        8.7    54        1.9    52        2.8    61        22.0   71        59
9        15.1   41        1.6    92        2.6    36        5.0    30        35.9   53        50
10       11.2   56        5.5    71        1.9    52        2.7    62        17.5   77        64
Average  12.5   51        1.8    80        2.4    40        4.4    40        32.4   57        54

Table 1. Results of the comparison metrics and scores for all ten test cases.

5 Discussion

The method we have presented performs quite robustly: the highest overlap error is 15.3% and the lowest is 10.1%. Visual inspection of the results shows, however, that there is still much room for improvement. In all scans, two types of errors are made. First, occasionally a completely incorrect border is found. In these situations, both the voxel classification and the registration are almost always incorrect. For humans, these errors are often easy to spot because they result in an implausible liver shape. Explicit inclusion of a shape model in our method might resolve some of these errors, but incorporating global shape constraints in a local approach such as voxel classification is not trivial. It is possible that a few simple but intelligently chosen post-processing rules could also avoid many of these errors, but the wide variation in liver morphology makes it hard to define rules that are robust.

A second type of error is imprecise localization of the border. This is especially visible in the second row of Fig. 2. The fact that we performed the voxel classification on subsampled data is probably one of the reasons for such errors, which occurred more often in the scans with thicker slices. Also, unusually shaped livers are likely to be misclassified at their borders because those parts of the feature space are not adequately sampled.

The preprocessing steps performed satisfactorily. In one test scan with thick slices, one of the two largest components that were supposed to represent the lungs was actually connected to air in the bowels, causing a slight overestimate of the scan range. In one case, the rotation correction was about 15 degrees off because a large amount of contrast material was present in the bowels on only the left side of the body. Clearly, a more sophisticated analysis is necessary to find the scan range of the liver and the orientation of the patient in the scan reliably. Alternatively, a more robust registration algorithm, e.g., one that employs multiple start positions, may make the preprocessing stage superfluous.


Fig. 2. From left to right, a sagittal, coronal, and transversal slice from a relatively easy case (1, top), an average case (4, middle), and a relatively difficult case (3, bottom). The outline of the reference standard segmentation is in red; the outline of the segmentation produced by the method described in this paper is in blue. Slices are displayed with a window of 400 and a level of 70.

The registration and voxel classification are crucial elements of this method. We have not performed extensive experiments to determine which settings and design choices are best. It is possible that a different similarity measure, such as the sum of squared differences instead of mutual information, would give better results (although the variations in contrast material and the presence of tumors may favor a similarity measure that does not assume similar intensities of registered voxels). Likewise, different features or classifiers may be more suitable than the ones employed here.

It is interesting to note that the combination of atlas matching and voxel classification performed much better than either of them alone. Possibly, the use of a larger number of reference scans (atlases), in combination with suitable atlas selection strategies, may allow atlas matching alone to yield good results.

The computation time required by this method (measured on a standard PC) is quite large. Registration of one scan pair requires 3 minutes (so 36 minutes for registering the twelve selected training scans to one test scan), and voxel classification takes 10 minutes. Pre- and post-processing require a bit more than a minute in total. As the vast majority of voxels do not require a complex classifier, and twelve reference scans for registration is probably excessive, we believe the processing time for one test scan could be brought down to around 15 minutes, or to a few minutes if the computations are distributed over multiple cores or PCs.

References

1. Hermoye, L., Laamari-Azjal, I., Cao, Z., Annet, L., Lerut, J., Dawant, B.M., Beers, B.E.V.: Liver segmentation in living liver transplant donors: comparison of semiautomatic and manual methods. Radiology 234(1) (2005) 171–178

2. Heimann, T., Wolf, I., Meinzer, H.P.: Active shape models for a fully automated 3D segmentation of the liver – an evaluation on clinical data. In: Medical Image Computing and Computer-Assisted Intervention. Volume 4190 of Lecture Notes in Computer Science. (2006) 41–48

3. Heimann, T., van Ginneken, B., Styner, M.: Workshop on 3D segmentation in the clinic: A grand challenge. http://mbi.dkfz-heidelberg.de/grand-challenge2007/ (Accessed July 18, 2007)

4. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. 2nd edn. John Wiley and Sons, New York (2001)

5. Koenderink, J., van Doorn, A.: Representation of local geometry in the visual system. Biological Cybernetics 55 (1987) 367–375

6. van Ginneken, B., Stegmann, M., Loog, M.: Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database. Medical Image Analysis 10(1) (2006) 19–40

7. Rohlfing, T., Brandt, R., Menzel, R., Russakoff, D.B., Maurer, Jr., C.R.: Quo vadis, atlas-based segmentation? In: The Handbook of Medical Image Analysis – Volume III: Registration Models, New York, NY, Kluwer Academic / Plenum Publishers (2005) 435–486

8. Sluimer, I.C., Prokop, M., van Ginneken, B.: Towards automated segmentation of the pathological lung in CT. IEEE Transactions on Medical Imaging 24(8) (2005) 1025–1038

9. Klein, S., Staring, M.: Elastix. http://www.isi.uu.nl/Elastix/ (Accessed July 17, 2007)

10. Mattes, D., Haynor, D.R., Vesselle, H., Lewellen, T.K., Eubank, W.: PET-CT image registration in the chest using free-form deformations. IEEE Transactions on Medical Imaging 22(1) (2003) 120–128

11. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13(1) (1967) 21–27

12. Arya, S., Mount, D., Netanyahu, N., Silverman, R., Wu, A.: An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM 45(6) (1998) 891–923

13. Pudil, P., Novovicova, J., Kittler, J.: Floating search methods in feature selection. Pattern Recognition Letters 15(11) (1994) 1119–1125
