
USER-FRIENDLY IMAGE-BASED SEGMENTATION AND ANALYSIS OF CHROMOSOMES

V. Uhlmann, R. Delgado-Gonzalo, M. Unser∗

Biomedical Imaging Group, EPFL, Lausanne, Switzerland

P. O. Michel, L. Baldi, F. M. Wurm

Cellular Biotechnology Laboratory, EPFL, Lausanne, Switzerland

ABSTRACT

We designed two efficient and user-friendly tools for the segmentation and analysis of images containing chromosomes or, more generally, rod-shaped elements that are spread on microscopic slides. The segmentation tool automatically extracts the profile of each chromosome and sorts the collection of profiles into a karyotype image. The analysis tool is interactive; it extracts quantitative measurements and lets the user annotate the position of the centromere relative to the chromosome extremities in a fast and reproducible way. The two methods rely on custom variants of parametric active contours. Both have been designed as user-friendly plug-ins for the open-source software ImageJ.

Index Terms— Active contours, segmentation, chromosomes, image analysis.

1. INTRODUCTION

Karyotyping is an important operation for genetic analyses such as species or gender determination of eukaryotic cells and organisms. It is based on the analysis of the number and morphology of chromosomes, which become compacted into rod-like structures during metaphase, a short phase of the cell division process. More generally, the assessment of chromosomal size as well as the analysis of shape parameters (such as centromere position and sizes of chromosome arms) through the microscopic analysis of metaphase spreads is a widely applied technique for the study of genomic structure, organization, function, and evolution in different fields of biological and environmental sciences [1, 2, 3].

Automated computer-based chromosome analysis tools have been developed since the seventies, including early methods based on densitometric scanning of standard photographs [4, 5]. These approaches remained anecdotal due to the limited computational power available at the time. More recently, the availability of multi-purpose open-source image analysis software has enabled biologists to process and annotate chromosome images. However, most of the processing is still performed in a fully manual fashion, accounting for a large amount of operator time, as genetic experiments usually involve large collections of data.

∗This work was funded by the Swiss National Science Foundation under Grants 200020-144355 and 200020-121763.

Fig. 1. Chromosome segmentation and analysis pipeline. Raw image (top left), chromosome annotation using the annotation tool (top right), and karyotype image generated from the raw image using the segmentation tool (bottom).

In this paper, we present a relatively simple and cost-effective method based on digital image processing for karyotyping and performing chromosome analysis (Figure 1). We designed two pieces of software that can be used together or separately to segment and analyze chromosome images. First, the segmentation tool extracts the image profile of each chromosome and sorts the profiles by size in a karyotype image. The underlying algorithm relies on standard image processing methods along with a custom model of parametric active contours. More precisely, it is composed of four steps: image normalization, detection of candidate regions via data clustering and pruning, outlining with active contours, and extraction of the profile of the chromosomes. Then, the analysis tool enables the precise extraction of numerical measurements such as lengths and centromere positions in images of chromosomes. The output is provided in a standard format to allow further data processing. The approach is mostly automated and requires minimal user input. Thus, it reduces the intra- and inter-user variability and speeds up the annotation process.

Both tools have been programmed as plug-ins for ImageJ, which is a free, open-source, multiplatform Java image-processing software [6]. They do not depend on any specialized hardware and, through ImageJ, any common image format may be used. The two plug-ins are freely available online along with their companion user manual¹. We dedicate this paper to the description of the algorithmic details of the segmentation tool (Section 2) and of the annotation tool (Section 3). Both tools have already been successfully used for practical applications in biology [7], significantly speeding up the image analysis and data extraction steps.

978-1-4799-2349-6/16/$31.00 ©2016 IEEE

2. DETECTION AND SEGMENTATION OF THE CHROMOSOMES

In this section, we provide the algorithmic details of the segmentation tool, which extracts the chromosomes and gathers them in a karyotype image containing the different chromosomes sorted by size.

2.1. Image Normalization

To improve the robustness of the segmentation, the image is first locally normalized to zero mean and unit variance over some neighborhood. This process is repeated in a sliding fashion over the whole image. It is especially useful to correct for nonuniform illumination or shading artifacts. The estimation of the local mean and standard deviation is performed through local spatial smoothing using fast recursive Gaussian filters.
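As an illustration of this normalization step, the following is a minimal sketch and not the plug-in's implementation. It assumes a grayscale image stored as a list of rows, and it swaps the fast recursive Gaussian filters for a plain truncated, separable Gaussian convolution. The function names, the default `sigma`, and the stabilizing constant `eps` are hypothetical choices made for the example.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(image, sigma):
    """Separable Gaussian smoothing with edge replication (not recursive)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(k * row[clamp(x + j - radius, width)] for j, k in enumerate(kernel))
             for x in range(width)] for row in image]
    return [[sum(k * rows[clamp(y + j - radius, height)][x] for j, k in enumerate(kernel))
             for x in range(width)] for y in range(height)]

def local_normalize(image, sigma=2.0, eps=1e-6):
    """Subtract the local mean and divide by the local standard deviation."""
    mean = smooth(image, sigma)
    mean_sq = smooth([[v * v for v in row] for row in image], sigma)
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, v in enumerate(row):
            # Local variance estimated as E[f^2] - (E[f])^2 under Gaussian weights.
            var = max(mean_sq[y][x] - mean[y][x] ** 2, 0.0)
            out_row.append((v - mean[y][x]) / (math.sqrt(var) + eps))
        out.append(out_row)
    return out
```

Under these assumptions, adding a constant offset to the whole image leaves the normalized output essentially unchanged, which is the behavior one wants when correcting slowly varying illumination.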

2.2. Detection of Candidate Regions

Standard k-means clustering [8] is then performed on the normalized image in order to partition all pixels into k classes. The best results, which allow the inner part of the chromosomes to be separated from the outer part, are achieved by clustering the image into k = 3 classes. In this situation, the class with the lowest mean is identified as the background, the class with the highest mean contains the pixels that represent the chromosomes, and the remaining class contains the pixels that belong to a transition zone between the background and the chromosomes.
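The three-class partition can be sketched as follows. This is a minimal illustration under stated assumptions rather than the plug-in's code: it runs Lloyd's algorithm on scalar intensities, initializes the centers evenly between the minimum and maximum intensity (an assumption, since the initialization is not specified here), and relabels the classes by increasing mean so that 0 is background, 1 is the transition zone, and 2 is the chromosome class. The function names are hypothetical.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Lloyd's algorithm on scalar intensities; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Assumption: centers start evenly spaced between the extreme intensities.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest center for every pixel.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def classify_pixels(image, k=3):
    """Return an image of class ranks: 0 = lowest mean, k - 1 = highest mean."""
    flat = [v for row in image for v in row]
    centers, labels = kmeans_1d(flat, k)
    order = sorted(range(k), key=lambda c: centers[c])
    rank = {c: r for r, c in enumerate(order)}
    ranked = [rank[lab] for lab in labels]
    width = len(image[0])
    return [ranked[i:i + width] for i in range(0, len(ranked), width)]
```

With k = 3, the pixels ranked 2 would form the foreground cluster that is passed on to the connected-component analysis.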

The k-means clustering provides a set of pixels that represents the foreground objects, but the algorithm does not discriminate between different chromosomes within the foreground cluster. To separate them, the 8-connected components of the foreground cluster are labeled using a linear-time algorithm [9]. Each connected component is a chromosome candidate. However, due to the presence of photometric noise and overlap between different chromosomes, it is necessary to discard candidate regions that do not meet certain criteria. To that end, the centroid of each region, its inertia matrix, and its area are computed. Regions smaller than τ_a = 50 square pixels are then discarded. After computing the minimal bounding rectangle along the direction given by the inertia matrix of a candidate region, regions associated with bounding rectangles that are longer than some user-specified value τ_l are finally discarded.

¹Segmentation tool: http://bigwww.epfl.ch/algorithms/chromosomeJ/; analysis tool: http://bigwww.epfl.ch/algorithms/chromosomeK/

Fig. 2. The rectanguscule. The inner rectangle $R_{\mathrm{in}}$, shown in brighter gray, is of constant width w. The outer rectangle $R_{\mathrm{out}}$, shown in darker gray, has the same centroid and orientation as the inner one and is obtained by extending the boundaries of $R_{\mathrm{in}}$ by a distance β. These rectangles are entirely determined by the pair of points {p, q} that belong to the boundary of the inner one.
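The candidate pruning of Section 2.2 can be sketched as follows. This is a minimal illustration under stated assumptions: the labeling uses a breadth-first flood fill with 8-connectivity instead of the sequential algorithm of [9], the orientation is taken as the major axis of the second-moment (inertia) matrix, and the length of the bounding rectangle is approximated by the extent of the pixel centers along that axis plus one pixel. The function names and the dictionary layout are hypothetical.

```python
import math
from collections import deque

def label_components(mask):
    """Return the 8-connected components of a binary mask as lists of (x, y)."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            components.append(pixels)
    return components

def describe(pixels):
    """Area, centroid, major-axis angle, and extent along the major axis."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major axis of the moment matrix
    ux, uy = math.cos(angle), math.sin(angle)
    along = [(x - cx) * ux + (y - cy) * uy for x, y in pixels]
    length = max(along) - min(along) + 1  # approximate length, in pixels
    return {"area": n, "centroid": (cx, cy), "angle": angle, "length": length}

def prune(components, min_area=50, max_length=float("inf")):
    """Keep regions with area >= min_area and length <= max_length."""
    kept = []
    for pixels in components:
        d = describe(pixels)
        if d["area"] >= min_area and d["length"] <= max_length:
            kept.append(d)
    return kept
```

Here `min_area` plays the role of τ_a and `max_length` the role of τ_l; the centroid and angle of each retained region would then initialize a bounding rectangle.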

2.3. Outlining with Custom Active Contours

Each chromosome is individually segmented, using the bounding rectangles from the previous step as initialization. For this task, we rely on active contours (a.k.a. snakes [10]). Snakes are effective tools for image segmentation; they consist of a curve that evolves from an initial position towards the boundary of an object. The evolution of the curve is formulated as a minimization problem. The associated cost function is called the snake energy.

To guide the process, we make three assumptions. First, the chromosomes are assumed to have a conserved width. Second, we assume that the shape of the chromosomes can be described by one rectangle or by a composition of at most two rectangles. Finally, the pixels that represent the chromosome are assumed to be brighter than the background.

To efficiently segment bright objects, we use an approach similar to [11] and adapt it to the specifics of the segmentation of quasi-rectangular chromosomes. We name our rectangular snake the rectanguscule (Figure 2). The rectanguscule is parameterized by two points p and q on the image plane. The inner rectangle, $R_{\mathrm{in}}$, is of constant user-specified width w. The length and orientation are determined by the control points. The area of the inner rectangle is given by $|R_{\mathrm{in}}| = w\,\|\mathbf{p} - \mathbf{q}\|$. The outer rectangle, $R_{\mathrm{out}}$, has the same centroid and orientation as the inner one. It is constructed by extending the boundaries of the inner rectangle by a constant user-specified distance β. The area of the outer rectangle is thus computed as $|R_{\mathrm{out}}| = (2\beta + \|\mathbf{p} - \mathbf{q}\|)\,(2\beta + w)$.

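The corners of the outer rectangle are determined by the control points according to the relations

$$
\begin{aligned}
\mathbf{r}^{\mathrm{out}}_1 &= \mathbf{q} + \beta\,\mathbf{u} + \left(\tfrac{w}{2} + \beta\right)\mathbf{v}, &
\mathbf{r}^{\mathrm{out}}_2 &= \mathbf{q} + \beta\,\mathbf{u} - \left(\tfrac{w}{2} + \beta\right)\mathbf{v}, \\
\mathbf{r}^{\mathrm{out}}_3 &= \mathbf{p} - \beta\,\mathbf{u} - \left(\tfrac{w}{2} + \beta\right)\mathbf{v}, &
\mathbf{r}^{\mathrm{out}}_4 &= \mathbf{p} - \beta\,\mathbf{u} + \left(\tfrac{w}{2} + \beta\right)\mathbf{v},
\end{aligned}
$$

where $\mathbf{u} = (\mathbf{q} - \mathbf{p}) / \|\mathbf{q} - \mathbf{p}\|$ is the unit vector that determines the orientation of the rectangle and $\mathbf{v}$ is a unit vector orthogonal to $\mathbf{u}$. Likewise, the corners of the inner rectangle are determined by the control points according to the relations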

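$$
\begin{aligned}
\mathbf{r}^{\mathrm{in}}_1 &= \mathbf{q} + \tfrac{w}{2}\,\mathbf{v}, &
\mathbf{r}^{\mathrm{in}}_2 &= \mathbf{q} - \tfrac{w}{2}\,\mathbf{v}, \\
\mathbf{r}^{\mathrm{in}}_3 &= \mathbf{p} - \tfrac{w}{2}\,\mathbf{v}, &
\mathbf{r}^{\mathrm{in}}_4 &= \mathbf{p} + \tfrac{w}{2}\,\mathbf{v}.
\end{aligned}
$$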
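The corner relations can be checked numerically against the two area formulas. The following is a minimal sketch; the function names are hypothetical, and the sign of v (here u rotated by 90 degrees counterclockwise) is an assumption, since either orientation yields the same pair of rectangles.

```python
import math

def rectanguscule_corners(p, q, w, beta):
    """Corners (r1..r4) of the inner and outer rectangles from p and q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    u = (dx / length, dy / length)  # unit vector along the rectangle
    v = (-u[1], u[0])               # unit vector orthogonal to u (sign assumed)

    def point(base, a, b):
        # base + a * u + b * v
        return (base[0] + a * u[0] + b * v[0], base[1] + a * u[1] + b * v[1])

    half = w / 2
    inner = [point(q, 0, half), point(q, 0, -half),
             point(p, 0, -half), point(p, 0, half)]
    outer = [point(q, beta, half + beta), point(q, beta, -(half + beta)),
             point(p, -beta, -(half + beta)), point(p, -beta, half + beta)]
    return inner, outer

def polygon_area(corners):
    """Shoelace formula for a simple polygon given in boundary order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(corners, corners[1:] + corners[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

inner, outer = rectanguscule_corners((1.0, 2.0), (4.0, 6.0), w=2.0, beta=1.0)
# With ||p - q|| = 5: |R_in| = w * 5 and |R_out| = (2*beta + 5) * (2*beta + w).
print(polygon_area(inner), polygon_area(outer))
```

For this example, the shoelace areas should agree with w‖p − q‖ = 10 and (2β + ‖p − q‖)(2β + w) = 28 up to floating-point rounding.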

The rectanguscule is a surface snake whose energy is not driven by the data under the curve, but by the data enclosed by it. At each iteration of the optimization process, the geometry of the rectanguscule is updated to increase the contrast between the intensity of the data averaged over its rectangular core $R_{\mathrm{in}}$ and the intensity of the data averaged over its rectangular shell $R_{\mathrm{out}} \setminus R_{\mathrm{in}}$. For $R_{\mathrm{in}} \subset R_{\mathrm{out}}$ and the input image data f, the snake energy is defined as

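$$
E = \int_{R_{\mathrm{out}} \setminus R_{\mathrm{in}}} f(x, y)\,\mathrm{d}x\,\mathrm{d}y \;-\; \lambda \int_{R_{\mathrm{in}}} f(x, y)\,\mathrm{d}x\,\mathrm{d}y,
$$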

where the factor λ is set to

$$
\lambda = \left(\frac{|R_{\mathrm{out}}|}{|R_{\mathrm{in}}|} - 1\right).
$$

This value of λ ensures that the energy remains zero when f takes a constant value, irrespective of the size and orientation of the snake. The position of the control points is tuned using a standard unconstrained optimization algorithm that minimizes the energy of the snake; specifically, the optimization is carried out efficiently by a Powell-like line-search method [12]. The final configuration of the control points provides an accurate description of the orientation and size of each chromosome.
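A discrete version of this energy makes the role of λ concrete. The sketch below is an illustration under stated assumptions, not the plug-in's implementation: the integrals are replaced by sums over pixel centers, the areas in λ are measured as pixel counts (so that a constant image yields zero energy up to rounding), and the function names are hypothetical. No optimizer is included.

```python
import math

def inside(x, y, p, q, half_width, margin):
    """True if (x, y) lies in the rectangle around segment pq grown by margin."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    ux, uy = dx / length, dy / length
    a = (x - p[0]) * ux + (y - p[1]) * uy    # coordinate along u
    b = -(x - p[0]) * uy + (y - p[1]) * ux   # coordinate along v
    return -margin <= a <= length + margin and abs(b) <= half_width + margin

def snake_energy(image, p, q, w, beta):
    """Discrete E = (sum of f over the shell) - lambda * (sum of f over the core)."""
    core_sum = shell_sum = 0.0
    core_count = shell_count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if inside(x, y, p, q, w / 2, 0.0):
                core_sum += value
                core_count += 1
            elif inside(x, y, p, q, w / 2, beta):
                shell_sum += value
                shell_count += 1
    if core_count == 0:
        return 0.0
    # lambda = |R_out| / |R_in| - 1, with areas measured as pixel counts.
    lam = (core_count + shell_count) / core_count - 1
    return shell_sum - lam * core_sum
```

Since the energy is minimized, a bright rod covered by the core and surrounded by a dark shell gives a strongly negative value, whereas a snake that is shifted off the rod gives a value closer to zero.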

2.4. Extraction of the Chromosomes

Finally, we extract each chromosome within the region determined by the rectangular snakes. In some situations, long chromosomes may appear bent with respect to their central section. In these cases, two different rectangular snakes are used to capture each branch of the chromosome. In our software, we allow the user to connect or disconnect rectangular snakes in order to deal with bent chromosomes. The chromosome extraction is performed using cubic-spline interpolation at the resolution of the original image. To construct the karyotype image, we sort the extracted chromosomes by the combined length of the rectangular snakes that determine them.
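The ordering used for the karyotype image can be sketched in a few lines. The representation below is hypothetical: each chromosome is a list of one or two snakes, each snake is a pair of control points (p, q), and the sort is assumed to run from longest to shortest.

```python
import math

def snake_length(snake):
    """Length of a rectangular snake, i.e. the distance between p and q."""
    p, q = snake
    return math.hypot(q[0] - p[0], q[1] - p[1])

def combined_length(chromosome):
    """Sum of the lengths of the connected snakes describing one chromosome."""
    return sum(snake_length(s) for s in chromosome)

def karyotype_order(chromosomes):
    """Sort chromosomes by combined snake length (longest first, by assumption)."""
    return sorted(chromosomes, key=combined_length, reverse=True)

straight = [((0, 0), (6, 0))]                  # one snake, length 6
bent = [((0, 0), (3, 4)), ((3, 4), (3, 8))]    # two connected snakes, 5 + 4
ordered = karyotype_order([straight, bent])
```

In this toy case the bent chromosome comes first, because its two branches add up to a combined length of 9.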

3. INTERACTIVE ANALYSIS OF THE CHROMOSOMES

In the following, we describe the design of the analysis tool for interactively extracting quantitative information on the chromosomes.

3.1. Annotation of the Chromosomes

To initiate the annotation process, the user defines the two extremities of a chromosome by mouse-clicking on their location within the image. The two points serve as seeds to automatically draw a curve that follows the medial axis of the chromosome. This curve is obtained by optimizing an open active contour (or open snake). Our formulation differs from classical snakes, which are usually defined as closed curves. The open snake is parameterized by M ≥ 2 control points. The two user-defined extremities that delimit the chromosome are considered as fixed anchors. The additional M − 2 control points are defined with or without constraints. We define the parametric representation of the curve as

$$
\mathbf{r}(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \sum_{k=0}^{M-1} \mathbf{c}_k\,\varphi(t - k)
$$

with the natural cubic spline as basis function φ(t), as it has been demonstrated [13] that cubic splines are optimal for representing smooth parametric curves of low curvature. The open-snake energy is obtained by integrating the values of the Euclidean-distance-transformed image ($f_{\mathrm{EDM}}$, [14]) under the snake curve, i.e.,

$$
E = \int_{C} f_{\mathrm{EDM}}\,\mathrm{d}s = \int_{0}^{1} f_{\mathrm{EDM}}(\mathbf{r}(t))\,|\mathbf{r}'(t)|\,\mathrm{d}t.
$$
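The line integral that defines the open-snake energy can be approximated by sampling the curve. The sketch below is an illustration under stated assumptions: the curve is any callable on [0, 1], the cost image is any callable f(x, y) standing in for $f_{\mathrm{EDM}}$, and a quadratic Bézier curve through three control points is swapped in for the cubic-spline representation to keep the example short. All names are hypothetical.

```python
import math

def line_integral(f, curve, samples=200):
    """Approximate the integral of f along curve(t), t in [0, 1]."""
    total = 0.0
    prev = curve(0.0)
    for i in range(1, samples + 1):
        cur = curve(i / samples)
        ds = math.hypot(cur[0] - prev[0], cur[1] - prev[1])  # |r'(t)| dt
        mid = ((cur[0] + prev[0]) / 2, (cur[1] + prev[1]) / 2)
        total += f(mid[0], mid[1]) * ds
        prev = cur
    return total

def quadratic_curve(c0, c1, c2):
    """Quadratic Bezier curve; a stand-in for the spline curve r(t)."""
    def curve(t):
        a, b, c = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
        return (a * c0[0] + b * c1[0] + c * c2[0],
                a * c0[1] + b * c1[1] + c * c2[1])
    return curve

# Toy cost that vanishes on the line y = 0, playing the role of a medial axis.
cost = lambda x, y: abs(y)
on_axis = line_integral(cost, quadratic_curve((0, 0), (5, 0), (10, 0)))
off_axis = line_integral(cost, quadratic_curve((0, 0), (5, 4), (10, 0)))
```

With this toy cost, the curve lying on the axis has zero energy and the bowed curve has a strictly positive energy, so minimizing the energy over the middle control point would pull the curve back onto the axis.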

In order to optimize our open snake, we rely on a multiresolution strategy inspired by [13]. The open snake is first initialized with three control points: the two user-defined extremities and a point at mid-distance on the segment that joins them. The endpoints are considered as ground truth for the chromosome extremities and are not modified during the optimization process. The unconstrained optimization algorithm described in Section 2.3 is used to modify the position of the free control point and minimize the energy of the snake. At convergence, the optimal curve is used to spawn a snake offspring composed of five control points, the additional points being set halfway between each pair of parent control points. The three parent points are fixed, leaving only two to be modified. This two-step approach grants enough flexibility to detect S-shaped chromosomes without losing precision on U-shaped ones (Figure 3). It ensures that the first three control points are well positioned to match U-shaped objects, and then refines the curve to match objects with more complex geometry. In this way, all chromosome shapes can be accurately segmented in minimal computation time.
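The bookkeeping of this two-step scheme can be sketched as follows. This is a minimal illustration that omits the energy minimization itself. It assumes that the new points are placed at the midpoints of the straight segments between parent control points (the text could also be read as halfway along the optimized curve), and the function names are hypothetical.

```python
def midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

def initial_snake(first_end, second_end):
    """Three control points: the two clicked extremities and their midpoint."""
    points = [first_end, midpoint(first_end, second_end), second_end]
    free = [1]  # only the middle point may move; the extremities are anchors
    return points, free

def spawn_offspring(parents):
    """Insert a free control point between each pair of fixed parent points."""
    children, free = [], []
    for i, point in enumerate(parents):
        children.append(point)
        if i + 1 < len(parents):
            free.append(len(children))  # index of the newly inserted point
            children.append(midpoint(point, parents[i + 1]))
    return children, free

points, free = initial_snake((0.0, 0.0), (4.0, 0.0))
# Suppose the first optimization moved the middle point to (2.0, 4.0).
points[1] = (2.0, 4.0)
children, free = spawn_offspring(points)
```

After the second step, `children` holds five control points and `free` holds the indices 1 and 3, so that only the two new points are handed to the optimizer.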


Fig. 3. Variation of the number of control points. U-shaped chromosomes are best annotated using a snake with three control points (top left). Five free control points result in irregular curves (top right). On S-shaped chromosomes, a snake with three control points does not offer enough flexibility to precisely capture the chromosome shape (bottom left). With only five control points, a more precise trace is obtained (bottom right).

Once the medial axis of a chromosome is detected, the user can define the centromere position by clicking on the trace. A numerical label is assigned to each chromosome based on its length (1 being the longest), with automatic renumbering during image annotation. The curve, centromere, and extremities of any chromosome can be edited at any time. We emphasize that the centromere is only an optional annotation on the curve and is not related in any way to the position of the control points.

3.2. Chromosome Measurements

Chromosome annotations are saved in an XML file that can be reloaded in the software for further edits. The annotated image is exported as well, together with a comma-separated values file containing helpful measurements such as the total chromosome length, the centromere position, and the relative length of each chromosome arm (if the centromere was defined). It is worth noting that chromosome traces are recorded in the XML file as a collection of point coordinates. These data can hence readily be used to extract additional shape-based measurements from each object.
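As an example of such a measurement, the lengths of the two arms can be recomputed from the recorded point coordinates. The sketch below is hypothetical in several respects: it assumes that the trace has already been read into a list of (x, y) points, that the centromere is given as an index into that list, and that lengths are measured along the polyline. It does not reproduce the layout of the XML or CSV files produced by the software.

```python
import math

def polyline_length(points):
    """Sum of the distances between consecutive trace points."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def arm_measurements(points, centromere_index):
    """Total length, arm lengths, and relative centromere position."""
    total = polyline_length(points)
    first = polyline_length(points[:centromere_index + 1])
    second = total - first
    shorter, longer = min(first, second), max(first, second)
    return {
        "total": total,
        "arm_1": first,
        "arm_2": second,
        "centromere_position": first / total,  # fraction of the total length
        "arm_ratio": longer / shorter if shorter > 0 else float("inf"),
    }

trace = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
measures = arm_measurements(trace, centromere_index=1)
```

For this toy trace the arms measure 5 and 6 pixels, which places the centromere at 5/11 of the total length.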

4. CONCLUSIONS

Our contributions in this paper are two user-friendly, free, and open-source methods for image-based karyotyping and analysis of chromosomes (Figure 4). The analysis tool can be used independently or as a companion plug-in for the segmentation/karyotyping tool. Although built on rather simple image processing algorithms, our software speeds up processing, systematizes data extraction, and has already been shown to be useful for biologists. In particular, we point out that, while both tools were initially developed for chromosome analysis, they can also be used for the analysis of other rod-shaped objects, including bacteria and worms.

Fig. 4. Example of data analysis. Raw image (left), resulting karyotype (top right), and extracted chromosome lengths (bottom right).

5. REFERENCES

[1] S.A. Romanenko et al., “Karyotype evolution and phylogenetic relationships of hamsters (Cricetidae, Muroidea, Rodentia) inferred from chromosomal painting and banding comparison,” Chromosome Research, vol. 15, no. 3, pp. 283–298, April 2007.

[2] X. Li et al., “Chromosome size in diploid eukaryotic species centers on the average length with a conserved boundary,” Molecular Biology and Evolution, vol. 28, no. 6, pp. 1901–1911, January 2011.

[3] I. Titos, T. Ivanova, and M. Mendoza, “Chromosome length and perinuclear attachment constrain resolution of DNA intertwines,” The Journal of Cell Biology, vol. 206, no. 6, pp. 719–733, September 2014.

[4] T.T. Puck, “Action of radiation on mammalian cells III. Relationship between reproductive death and induction of chromosome anomalies by X-irradiation of euploid human cells in vitro,” PNAS, vol. 44, no. 8, p. 772, August 1958.

[5] D. Schoevaert-Brossault, C. Leonard, and J. Selva, “A new method for automatic metaphase finding adaptable to different chromosome preparations,” Computer Programs in Biomedicine, vol. 16, no. 3, pp. 195–201, June 1983.

[6] M.D. Abramoff, P.J. Magalhaes, and S.J. Ram, “Image processing with ImageJ,” Biophotonics International, vol. 11, no. 7, pp. 36–41, July 2004.

[7] P.O. Michel et al., “Assessment of chromosomal length variation in CHO cells,” in Proceedings of ESACT’13, Lille, France, June 23-26, 2013, p. 206.

[8] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.

[9] K. Suzuki, I. Horiba, and N. Sugie, “Linear-time connected-component labeling based on sequential local operations,” Computer Vision and Image Understanding, vol. 89, no. 1, pp. 1–23, January 2003.

[10] R. Delgado-Gonzalo, V. Uhlmann, D. Schmitter, and M. Unser, “Snakes on a plane: A perfect snap for bioimage analysis,” IEEE Signal Processing Magazine, vol. 32, no. 1, pp. 41–48, January 2015.

[11] P. Thevenaz, R. Delgado-Gonzalo, and M. Unser, “The ovuscule,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 2, pp. 382–393, February 2011.

[12] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, third edition, 1986.

[13] P. Brigger, J. Hoeg, and M. Unser, “B-Spline snakes: A flexible tool for parametric contour detection,” IEEE Transactions on Image Processing, vol. 9, no. 9, pp. 1484–1496, September 2000.

[14] A. Rosenfeld and J.L. Pfaltz, “Distance functions on digital pictures,” Pattern Recognition, vol. 1, no. 1, pp. 33–61, July 1968.
