Analyzing huge pathology images with open source software

Christophe Deroulers*1, David Ameisen2, Mathilde Badoual1, Chloé Gerin3,4, Alexandre Granier5 and Marc Lartaud5,6

1 Univ Paris Diderot, Laboratoire IMNC, UMR 8165 CNRS, Univ Paris-Sud, F-91405 Orsay, France
2 Univ Paris Diderot, Laboratoire de pathologie, Hôpital Saint-Louis APHP, INSERM UMR-S 728, F-75010 Paris, France
3 on leave from CNRS, UMR 8165, Laboratoire IMNC, Univ Paris-Sud, Univ Paris Diderot, F-91405 Orsay, France
4 now at CNRS, UMR 8148, Laboratoire IDES, Univ Paris-Sud, F-91405 Orsay, France and CNRS, UMR 8608, IPN, Univ Paris-Sud, F-91405 Orsay, France
5 MRI-Montpellier RIO Imaging, CRBM, F-34293 Montpellier, France
6 CIRAD, F-34398 Montpellier CEDEX 5, France

Email: Christophe Deroulers* - [email protected]; David Ameisen - [email protected]; Mathilde Badoual - [email protected]; Chloé Gerin - [email protected]; Alexandre Granier - [email protected]; Marc Lartaud - [email protected]

* Corresponding author

Abstract

Background: Digital pathology images are increasingly used both for diagnosis and research, because slide scanners are nowadays broadly available and because the quantitative study of these images yields new insights in systems biology. However, such virtual slides pose a technical challenge, since the images often occupy several gigabytes and cannot be fully opened in a computer's memory. Moreover, there is no standard format. Therefore, most common open source tools such as ImageJ fail to process them, and the others require expensive hardware while still being prohibitively slow.

Results: We have developed several cross-platform open source software tools to overcome these limitations. The NDPITools provide a way to transform microscopy images initially in the loosely supported NDPI format into one or several standard TIFF files, and to create mosaics (division of huge images into small ones, with or without overlap) in various TIFF and JPEG formats. They can be driven through ImageJ plugins. The LargeTIFFTools achieve similar functionality for huge TIFF images which do not fit into RAM. We test the performance of these tools on several digital slides and compare them, when applicable, to standard software. A statistical study of the cells in a tissue sample from an oligodendroglioma was performed on an average laptop computer to demonstrate the efficiency of the tools.

Conclusions: Our open source software makes it possible to deal with huge images with standard software on average computers. The tools are cross-platform, independent of proprietary libraries and very modular, allowing them to be used in other open source projects. They have excellent performance in terms of execution speed and RAM requirements. They open promising perspectives both to the clinician who wants to study a single slide and to the research team or data centre that analyses many slides on a computer cluster.

Keywords: Digital Pathology, Image Processing, Virtual Slides, Systems Biology, ImageJ, NDPI.



Background

Virtual microscopy has become routinely used over the last few years for the transmission of pathology images (the so-called virtual slides), for both telepathology and teaching [1,2]. In more and more hospitals, virtual slides are even attached to the patient's file [3, 4]. They also have great potential for research, especially in the context of multidisciplinary projects involving e.g. mathematicians and clinicians who do not work at the same location. Quantitative histology is a promising new field, involving computer-based morphometry or statistical analysis of tissues [5–9]. A growing number of works report the pertinence of such images for diagnosis and classification of diseases, e.g. tumours [10–14]. Databases of clinical cases [15] will include more and more digitized tissue images. This growing use of virtual microscopy is accompanied by the development of integrated image analysis systems offering both virtual slide scanning and automatic image analysis, which makes integration into the daily practice of pathologists easier. See Ref. [16] for a review of some of these systems.

Modern slide scanners produce high magnification microscopy images of excellent quality [1], for instance at the so-called "40x" magnification. They allow much better visualization and analysis than lower magnification images. As an example, Figure 1 shows two portions of a slide at different magnifications, 10x and 40x. The benefit of the high magnification for both diagnosis and automated image analysis is clear. For instance, the state of the chromatin inside the nucleus and the cell morphology, better seen at high magnification, are essential to help the clinician distinguish tumorous and non-tumorous cells. An accurate, non-pixelated determination of the perimeters of the cell nuclei is needed for morphometry and statistics.

However, this technique involves the manipulation of huge images (of the order of 10 billion pixels for a full-size slide at magnification 40x with a single focus level), for which the approach taken by most standard software, loading and decompressing the full image into RAM, is impossible (a single slice of a full-size slide needs of the order of 30 GiB of RAM). As a result, standard open-source software such as ImageJ [17], ImageMagick [18] or GraphicsMagick [19] completely fails or is prohibitively slow when used on these images. Of course, commercially available software exists [16], but it is usually quite expensive and very often restricted to a single operating system. It uses proprietary source code, which is a problem if one wants to control or check the algorithms and their parameters when doing image analysis for research.

In addition, many automated microscopes or slide scanners store the images which they produce in proprietary or poorly documented file formats, and the software provided by vendors is often specific to some operating system. This leads to several concerns. First, it makes research based on digital pathology technically more difficult. Even when a project is led on a single site, one often has to use clusters of computers to achieve large-scale studies of many full-size slides from several patients [20]. Since clusters of computers are typically run by open source software such as Linux, pathology images stored in non-standard file formats are a problem. Furthermore, research projects are now commonly performed in parallel at several sites, not to say in several countries, thanks to technology such as Grid [21], and there are ongoing efforts towards the interoperability of information systems used in pathology [3, 22]. Second, proprietary formats may hinder the development of shared clinical databases [15] and access of the general public to knowledge, whereas the citizen should receive the benefit of public investments. Finally, they may also raise financial concerns and conflicts of interest [23].

There have been recent attempts to define open, documented, vendor-independent software [24, 25], which partly address this problem. However, very large images stored in the NDPI file format, produced by some slide scanners manufactured by Hamamatsu such as the NanoZoomer, are not yet fully supported by such software. For instance, LOCI Bio-Formats [25] is presently unable to open images one dimension of which is larger than 65k, and does not deal properly with NDPI files of more than 4 GiB. OpenSlide [24] does not currently support the NDPI format. NDPI-Splitter [26] needs to be run on Windows and depends on a proprietary library.

To address these problems, we have developed open source tools which achieve two main goals: reading and converting images in the NDPI file format into standard open formats such as TIFF, and splitting a huge image, without decompressing it entirely into RAM, into a mosaic of much smaller pieces (tiles), each of which can be easily opened or processed by standard software. All this is achieved with high processing speed on all platforms.


Implementation

Overview

The main software is implemented in the C programming language as separate, command-line driven executables. It is independent of any proprietary library. This ensures portability on a large number of platforms (we have tested several versions of Mac OS X, Linux and Windows), modularity and ease of integration into scripts or other software projects.

It is complemented by a set of plugins for the public domain software ImageJ [17], implemented in Java, which call the main executables automatically to enable interactive use.

The LargeTIFFTools and NDPITools are based on the open source TIFF [27] and JPEG [28] or libjpeg-turbo [29] libraries. The NDPITools plugins for ImageJ are based on the Java API of ImageJ [17,30] and on the open source software Image-IO [31], and use the Java Advanced Imaging 1.1.3 library [32].

Basic functions

The basic functions are the following. They can be performed even on a computer with a modest amount of RAM (see below the "performance" discussion).

1. splitting a tiled TIFF file into multiple TIFF files, one for each of the tiles (tiffsplittiles program);

2. extracting ("cropping") quickly a given rectangle of a supposedly tiled TIFF file into a TIFF or JPEG file (tifffastcrop program);

3. splitting one or several TIFF file(s), possibly very large, into mosaic(s), without fully decompressing them in memory (tiffmakemosaic program);

4. converting a NDPI file into a standard multiple-image TIFF file, tiled if necessary, using upon request the BigTIFF format introduced in version 4.0.0 of the TIFF library [27, 33, 34], and encoding magnification and focus levels as TIFF "image description" fields (ndpi2tiff program);

5. creating a standard TIFF file for all or part of the magnification levels and focus levels present in the given NDPI file (the user can ask for specific magnification and focus levels and for a specific rectangular region of the image), and, upon request, creating a mosaic for each image which doesn't fit into RAM or for all images (ndpisplit program). The names of the created files are built on the name of the source file and incorporate the magnification and focus levels (and, in the case of mosaic pieces, the coordinates inside the mosaic).
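Fast cropping of a tiled TIFF (function 2 above) is possible because only the tiles that overlap the requested rectangle need to be read and decompressed. The following Python sketch shows that bookkeeping only. The function name and the 512 × 512 tile size are illustrative assumptions, and the code is not taken from tifffastcrop.

```python
def tiles_covering(x, y, width, height, tile_w, tile_h):
    """Return (column, row) indices of the tiles that intersect a rectangle.

    (x, y) is the top-left corner of the rectangle in pixels; tile indices
    count from 0, left to right and top to bottom.
    """
    if width <= 0 or height <= 0:
        return []
    first_col, last_col = x // tile_w, (x + width - 1) // tile_w
    first_row, last_row = y // tile_h, (y + height - 1) // tile_h
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# A 256 x 256 crop whose corner is at (1000, 700), with hypothetical
# 512 x 512 tiles: only two tiles have to be decoded.
needed = tiles_covering(1000, 700, 256, 256, 512, 512)
print(needed)
```

However large the whole image is, the number of tiles to decode depends only on the size of the crop and of the tiles.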

Mosaics

A mosaic is a set of TIFF or JPEG files (the pieces) which would reproduce the original image if reassembled together, but each of which has a size manageable by standard software. The user can either specify the maximum amount of RAM which a mosaic piece should need to be uncompressed (default: 1024 MiB), or directly specify the size of each piece. In the first case, the size of each piece is determined by the software. A given amount of overlap between mosaic pieces can be requested, either in pixels or as a percentage of the image size. This is useful e.g. for cell counting, so as not to miss cells which lie on the boundary between two adjacent pieces.
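To illustrate how a memory budget can be turned into a piece size, here is a minimal Python sketch. It assumes 3 bytes per pixel (8-bit RGB) and a simple rule that repeatedly cuts along the direction in which pieces are longest. The function `plan_mosaic` and this rule are assumptions made for illustration. They are not the algorithm implemented in tiffmakemosaic or ndpisplit, which may choose a different grid.

```python
import math

MIB = 1024 ** 2


def plan_mosaic(width, height, max_piece_mib=1024, overlap_px=0,
                bytes_per_pixel=3):
    """Pick a grid of pieces whose uncompressed size fits a memory budget.

    Returns (columns, rows, piece_width, piece_height) in pixels, where the
    piece dimensions include the requested overlap.
    """
    budget = max_piece_mib * MIB
    if (overlap_px + 1) ** 2 * bytes_per_pixel > budget:
        raise ValueError("overlap too large for the memory budget")
    cols = rows = 1
    while True:
        # Each piece covers its share of the image plus the overlap margin.
        piece_w = min(math.ceil(width / cols) + overlap_px, width)
        piece_h = min(math.ceil(height / rows) + overlap_px, height)
        if piece_w * piece_h * bytes_per_pixel <= budget:
            return cols, rows, piece_w, piece_h
        # Cut along the direction in which the pieces are currently longest.
        if piece_w >= piece_h:
            cols += 1
        else:
            rows += 1


cols, rows, piece_w, piece_h = plan_mosaic(103168, 63232, max_piece_mib=512)
print(cols, rows, piece_w, piece_h)
```

The overlap is added to the piece size before the budget is checked, so a piece never exceeds the requested amount of RAM once its margin is included.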

Usage

Standalone

Our tools can be used through the command line (POSIX-like shell or Windows command interpreter), and therefore can be very easily integrated into scripts or other programs. Depending on the tool, the paths and file names of one or several files in NDPI or TIFF format have to be provided. Options can be added, with their arguments, on the command line to modify the behavior of the programs from their defaults. They are explained in the messages printed by the programs when run without arguments, in Unix-style man pages, and on the web pages of the project (see below in the Availability and requirements Section).

Under the Windows OS, one can click-and-drag the NDPI file icon onto the icon of ndpi2tiff or ndpisplit. We provide precompiled binaries where frequently-used options are turned on by default, e.g. ndpisplit-mJ.exe produces a mosaic in JPEG format, as with option -mJ. The conversion result or mosaic can be found in the same directory as the original NDPI image.


ImageJ integration

In addition to command line use, the ndpisplit program can be driven through the NDPITools plugins in ImageJ with a point-and-click interface, so that previewing the content of a NDPI file at low resolution, selecting a portion, extracting it at high resolution and finally opening it in ImageJ to apply further treatments can be done in an easy and graphical way. Figure 2 shows a screen shot of ImageJ 1.47m after extraction of a rectangular zone from a NDPI file. Figure 3 explains what happens when the NDPI file contains several focus levels: the preview image is displayed as a stack.

When producing a mosaic, the user can request that pieces be JPEG files. Since the File > Open command of versions 1.x of ImageJ is unable to open TIFF files with JPEG compression (one has to use plugins), this is a way to produce mosaics which can be opened by click-and-drag onto the window or icon of ImageJ while still saving disk space thanks to efficient compression. Figure 4 shows how the mosaic production options can be set inside ImageJ through the NDPITools plugins.

Results and Discussion

Performance

We compare the performance of our tools on several fundamental tasks to standard, broadly available software, in representative examples and on broadly available computers.

Making a mosaic from a huge image

We chose an 8-bit RGB colour JPEG-compressed TIFF file of 103168 × 63232 pixels originating in the digitization of a pathology slide. The original file weighed 975.01 MiB. Loading this image entirely into RAM would need at least 3 × 103168 × 63232 bytes = 18.2 GiB and is presently intractable on most, if not all, desktop and laptop computers of reasonable cost.
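The memory estimate above can be checked with a few lines of Python. It assumes one byte per channel and three channels per pixel, as for any 8-bit RGB image. The helper name is illustrative.

```python
GIB = 1024 ** 3


def uncompressed_size_bytes(width, height, channels=3, bytes_per_channel=1):
    """Size of a fully decompressed raster held in memory, in bytes."""
    return width * height * channels * bytes_per_channel


size = uncompressed_size_bytes(103168, 63232)
print(f"{size} bytes = {size / GIB:.1f} GiB")
```

The same function applied to a 180224 × 70144 image gives about 35 GiB, consistent with the order of magnitude of 30 GiB quoted in the Background for a full-size slide.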

The task was to produce from this image a mosaic of 64 pieces, so that each one needs less than 512 MiB of RAM to open.

On a 3.2 GHz Intel Core i3 iMac computer with 16 GB of RAM, the convert command from ImageMagick (version 6.8.0-7 with quantum size 8 bits) was unable to complete the request. GraphicsMagick (gm convert -crop, version 1.3.17 with quantum size 8 bits) completed the request in 70 min, using 25 GiB of disk space. tiffmakemosaic from our LargeTIFFTools completed the request in 25 min.

To ascertain that this task can equally be achieved even on computers with a modest amount of RAM, we performed the same task on a 6-year-old 2.66 GHz Core2Duo Intel iMac with 2 GiB of RAM. The task was completed in 90 min.

Converting NDPI into TIFF

Splitting a NDPI file into TIFF files. A pathology sample (6.7 cm² of tissue) was scanned at magnification 40x and with 11 focus levels (every 2 microns) by a NanoZoomer, resulting in a 6.5 GiB file in proprietary NDPI format (called file a.ndpi hereafter). On a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GiB RAM, ndpisplit extracted all 55 images (11 focus levels and 5 magnifications) as independent, single-image TIFF files with JPEG compression in 7.11 min. The size of the largest images was 180224 × 70144. The speed was limited only by the rate of I/O transfers, since the CPU usage of this task was 1.38 min, out of which the system used 1.30 min. Executing the same task again straight after the first execution took only 0.57 min, because the NDPI file was still in the cache of the operating system.

To ascertain that this task can equally be achieved even on computers with a modest amount of RAM, we made a try on a 6-year-old 2.66 GHz Core2Duo Intel PC with 2 GiB RAM running 32-bit Windows XP Pro SP3. The original file shown in Figure 1, called b.ndpi and weighing 2.07 GiB (largest image: 103168 × 63232 pixels), was split into independent TIFF files in 22 min without swapping.

In comparison, the LOCI Bio-Formats plugin for ImageJ [25], in its version 4.4.6 with ImageJ 1.43m, was not able to open the images in file a.ndpi, even at low resolution.

Converting a NDPI file into a multiple-image TIFF file. Alternatively, the same proprietary-format file a.ndpi was converted into a multiple-image TIFF file with ndpi2tiff. On the same computer as before, the conversion time was 7.0 min. Here again, the speed of the process is limited only by the rate of I/O transfers, since the conversion took only 30 s if performed when the NDPI file was still in the cache of the operating system.

Since the resulting TIFF file could not store all 55 images in less than 4 GiB, we passed the option -8 on the command line to ndpi2tiff to request using the BigTIFF format extension. The specifications of this extension to the TIFF standard, discussed and published before 2008 [33,34], are supported by LibTIFF as of version 4.0.0 [27], and therefore by the abundant image viewing and manipulation software which relies on LibTIFF. If the use of the BigTIFF format extension had impeded the further exploitation of the produced TIFF file, we could have simply used ndpisplit as above. Or we could have called the ndpi2tiff command several times, each time requesting extraction of a subset of all images by specifying image numbers after the file name, separated with commas, as in a.ndpi,0,1,2,3,4.
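The two alternatives just described can be sketched as a small Python helper that only builds the argument list for ndpi2tiff. It uses nothing beyond the -8 option and the comma-separated image numbers mentioned in the text. The helper name and the `expected_output_bytes` parameter are hypothetical, and the caller has to estimate the output size itself. The 4 GiB threshold reflects the 32-bit file offsets of classic TIFF.

```python
FOUR_GIB = 2 ** 32  # classic TIFF stores file offsets on 32 bits


def ndpi2tiff_command(ndpi_path, expected_output_bytes=None,
                      image_numbers=None):
    """Build (but do not run) an argument list for ndpi2tiff."""
    args = ["ndpi2tiff"]
    if expected_output_bytes is not None and expected_output_bytes >= FOUR_GIB:
        args.append("-8")  # request the BigTIFF format extension
    target = ndpi_path
    if image_numbers:
        # Image numbers follow the file name, separated with commas.
        target += "," + ",".join(str(n) for n in image_numbers)
    args.append(target)
    return args


print(ndpi2tiff_command("a.ndpi", expected_output_bytes=7 * 1024 ** 3))
print(ndpi2tiff_command("a.ndpi", image_numbers=range(5)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` once the ndpi2tiff executable is on the search path.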

Extracting a small region from a huge image

This task can be useful to visualize at full resolution a region of interest which the user has selected on a low-magnification preview image. Therefore, it should be performed as quickly as possible.

From a TIFF file

The task was to extract a rectangular region of size 256 × 256 pixels situated at the bottom right corner of huge TIFF images and to save it as an independent file. The source images were single-image TIFF files using JPEG compression. Table 1 compares the time needed to complete the task with tifffastcrop from our LargeTIFFTools and with several software tools on increasingly large TIFF files. Tests were performed on a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GB of RAM and used GraphicsMagick 1.3.17, ImageMagick 6.8.0-7 and the utility tiffcrop from LibTIFF 4.0.3. Noticeably, when treating the largest image, GraphicsMagick needs 50 GiB of free disk space, whereas tifffastcrop doesn't need it.

From a NDPI file

The task was to extract a rectangular region of size 256 × 256 from one of the largest images of the file a.ndpi (size 180224 × 70144). On a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GB of RAM, the execution time was 0.12 s for one extract and on average 0.06 s per extract in a series of 20 extracts with locations drawn uniformly at random inside the whole image.
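A benchmark of this kind needs extract locations that keep the whole rectangle inside the image. The Python sketch below draws such locations uniformly at random. The function name and the fixed seed are illustrative assumptions, and the call that would pass each location to ndpisplit is deliberately left out, since the corresponding option syntax is not given here.

```python
import random


def random_crop_origins(image_w, image_h, crop_w, crop_h, count, seed=0):
    """Draw top-left corners so that every crop lies inside the image."""
    rng = random.Random(seed)  # fixed seed makes the series reproducible
    return [(rng.randint(0, image_w - crop_w),
             rng.randint(0, image_h - crop_h))
            for _ in range(count)]


origins = random_crop_origins(180224, 70144, 256, 256, count=20)
print(origins[:3])
```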

Applications

Integration in digital pathology image servers or virtual slide systems

The NDPITools are being used in several other software projects:

• in a system for automatic blur detection [2,4];

• in WIDE [22], to deal with NDPI files. WIDE is an open-source biological and digital pathology image archiving and visualization system, which allows the remote user to see images stored in a remote library in a browser. In particular, thanks to the feature of high-speed extraction of a rectangular region by ndpisplit, WIDE saves costly disk space, since it doesn't need to store TIFF files converted from NDPI files in addition to the latter.

Exploiting a large set of digital slides

In the framework of a study about invasive low-grade oligodendrogliomas reported elsewhere [8], we had to deal with 303 NDPI files occupying 122 GiB. On a 3.2 GHz Intel Core i3 iMac computer with 16 GB RAM, we used ndpisplit in a batch job to convert them into standard TIFF files, which took only a few hours. The experimental -s option of ndpisplit was used to remove the blank filling between scanned regions, resulting in an important disk space saving and in smaller TIFF files (one for each scanned region), which were easier to manipulate afterwards. Then, for each sample, Preview.app and ImageJ were used to inspect the resulting images and manually select the regions of clinical interest. The corresponding extracts of the high magnification images were the subject of automated cell counting and other quantitative analyses using ImageJ. In particular, we collected quantitative data about edema or tissue hyperhydration [8]. This quantity needed a specific image analysis procedure which is not offered by standard morphometry software and, unlike cell density estimates, could not be retrieved by sampling a few fields of view in the microscope. Therefore, virtual microscopy and our tools were essential in this study.
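A batch conversion of this kind can be scripted in a few lines. The Python sketch below is an assumption about how such a batch could be organised, not the script used in the study. It relies only on the -s option named above, and the function name and the `run` switch are hypothetical.

```python
from pathlib import Path
import subprocess


def ndpisplit_batch(directory, run=False):
    """Build one ndpisplit command per NDPI file found in a directory.

    With run=True each command is executed in turn; by default the commands
    are only returned, so the list can be inspected first.
    """
    commands = [["ndpisplit", "-s", str(path)]
                for path in sorted(Path(directory).glob("*.ndpi"))]
    if run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands


for command in ndpisplit_batch("."):
    print(" ".join(command))
```

Running the files one after the other keeps memory use bounded by a single conversion, which matters on a machine shared with interactive work.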

Study of a whole slide of brain tissue invaded by an oligodendroglioma

To demonstrate the possibility to do research on huge images even with a modest computer, we chose a 3-year-old MacBook Pro laptop computer with a 2.66 GHz Intel Core 2 Duo and 4 GiB of RAM. We used ImageJ and the NDPITools to perform statistics on the upper piece of tissue on the slide shown in Figure 1.

Since the digital slide b.ndpi weighed 2.07 GiB, with a high resolution image of 103168 × 63232 pixels, it was not possible to do the study in a straightforward way. We opened the file b.ndpi as a preview image with the command Plugins > NDPITools > Preview NDPI and selected on it the left tissue sample. Then we used the command Plugins > NDPITools > Custom extract to TIFF Mosaic and asked for extraction as a mosaic of 16 JPEG files, each one needing less than 1 GiB of RAM to open, and with an overlap of 60 pixels. This was completed within a few minutes. Then we applied an ImageJ macro to each of the 16 pieces to identify the dark cell nuclei (those with high chromatin content), based on thresholding the luminosity values of the pixels, as shown in Figure 1. It produced text files with the coordinates and size of each cell nucleus.
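The principle of the detection step, thresholding followed by grouping of connected dark pixels, can be shown on a toy example. The Python sketch below is not the ImageJ macro used in the study. The function name, the 4-connectivity and the threshold value are assumptions chosen for illustration.

```python
from collections import deque


def find_dark_blobs(gray, threshold):
    """Label 4-connected groups of pixels darker than the threshold.

    gray is a list of rows of luminosity values; the result holds one
    (centroid_x, centroid_y, area_in_pixels) tuple per group.
    """
    height, width = len(gray), len(gray[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or gray[y0][x0] >= threshold:
                continue
            # Breadth-first flood fill starting from a new dark pixel.
            queue, pixels = deque([(x0, y0)]), []
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and gray[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            area = len(pixels)
            blobs.append((sum(p[0] for p in pixels) / area,
                          sum(p[1] for p in pixels) / area,
                          area))
    return blobs


tiny = [
    [200, 200, 200, 200, 200],
    [200,  40,  50, 200, 200],
    [200,  45,  60, 200,  30],
    [200, 200, 200, 200,  35],
]
blobs = find_dark_blobs(tiny, threshold=100)
print(blobs)
```

Each tuple plays the role of one line of the text files mentioned above: a position and a size for one detected nucleus.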

Out of the 154240 identified nuclei, 1951 were positioned on the overlapping regions between pieces. Using the overlap feature of our tools made it possible to detect these nuclei properly, since they would have been cut by the boundary of the pieces of the mosaic in the absence of overlap. We avoided double counting by identifying the pairs of nuclei situated in the overlapping regions which were separated by a distance smaller than their radius.
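The double-counting rule can be sketched as follows in Python. The sketch assumes that each detection has already been converted to whole-image coordinates and tagged with the mosaic piece it came from, and it compares the distance with the smaller of the two radii. Both the data layout and that choice of radius are assumptions, since the text does not specify them.

```python
import math


def merge_duplicates(nuclei):
    """Drop detections that repeat a nucleus already seen in another piece.

    nuclei is a list of (x, y, radius, piece_id) tuples in whole-image
    coordinates; the first detection of each nucleus is the one kept.
    """
    kept = []
    for x, y, r, piece in nuclei:
        duplicate = any(
            piece != kept_piece
            and math.hypot(x - kx, y - ky) < min(r, kr)
            for kx, ky, kr, kept_piece in kept)
        if not duplicate:
            kept.append((x, y, r, piece))
    return kept


detections = [
    (100.0, 50.0, 6.0, 0),   # nucleus seen in piece 0 ...
    (101.5, 50.5, 6.2, 1),   # ... and again in the overlap of piece 1
    (130.0, 50.0, 5.0, 1),   # a different nucleus
    (102.0, 50.0, 5.0, 0),   # close neighbour from the same piece: kept
]
unique = merge_duplicates(detections)
print(len(unique))
```

The comparison is quadratic in the number of detections, so in practice it would be applied only to the nuclei lying in the overlapping regions, which are a small fraction of the total.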

As shown in earlier studies [7, 10, 11], these data can be used for research and diagnosis purposes. As an example, Figure 5 shows the distribution of the distance of each cell nucleus to its nearest neighbor. Thanks to the very high number of analyzed cell nuclei, this distribution is obtained with excellent precision.
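A nearest-neighbor distance distribution of this kind can be computed from the list of nucleus centres. The Python sketch below uses a brute-force search for clarity. The function names and the bin width are illustrative, and for a data set of the size reported here a spatial index such as a k-d tree would be needed to keep the running time reasonable.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        distances.append(best)
    return distances


def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = {}
    for value in values:
        index = int(value // bin_width)
        counts[index] = counts.get(index, 0) + 1
    return dict(sorted(counts.items()))


centres = [(0, 0), (3, 4), (3, 5), (10, 10)]
distances = nearest_neighbor_distances(centres)
print(histogram(distances, bin_width=2.0))
```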

Conclusions

The LargeTIFFTools, NDPITools and NDPITools plugins for ImageJ efficiently achieve some fundamental functions on large images, and in particular digital slides, for which standard open source software fails or performs badly. They enable both the clinician to examine a single slide and the bioinformatics research team to perform large-scale analysis of many slides, possibly on computer grids [20].

To date, the LargeTIFFTools have been downloaded from more than 388 different IP addresses, the NDPITools from more than 1361 addresses, and the ImageJ plugins from more than 235 addresses. Table 2 lists the distribution of the target platforms among the downloads of the binary files. It shows a broad usage of the different platforms by the community, emphasizing the importance of cross-platform, open source tools.

We have explained how the software was used to study some microscopic properties of brain tissue when invaded by an oligodendroglioma, and we have given an illustrative application to the analysis of a whole-size pathology slide. This suggests other promising applications.

Availability and requirements

a. LargeTIFFTools

• Project name: LargeTIFFTools

• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/largetifftools

• Operating system(s): Platform independent

• Programming language: C

• Other requirements: libjpeg, libtiff

• License: GNU GPLv3

b. NDPITools

• Project name: NDPITools

• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools

• Operating system(s): Platform independent

• Programming language: C

• Other requirements: none

• License: GNU GPLv3

For the convenience of users, precompiled binaries are provided for Windows (32 and 64 bits), Mac OS X and Linux.

c. NDPITools plugins for ImageJ

• Project name: NDPITools plugins for ImageJ

• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools

• Operating system(s): Platform independent

• Programming language: Java

• Other requirements: ImageJ 1.31s or higher, Ant, JAI 1.1.3

• License: GNU GPLv3

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

CD wrote the paper. ML conceived and implemented a first version of the integration into ImageJ as a toolset of macros. CD implemented the software and wrote the documentation. CG, AG and ML contributed suggestions to the software. CD, DA, AG and ML performed software tests. CD, MB, CG, AG and ML selected and provided histological samples. CD performed the statistical analysis of the sample slide. All authors reviewed the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We thank F. Bouhidel and P. Bertheau for their help with the slide scanner of the Pathology Laboratory of the Saint-Louis Hospital in Paris, and C. Klein (Imaging facility, Cordeliers Research Center – INSERM U872, Paris) for tests and suggestions.

The computer, CPU, operating system and programming language names quoted in this article are trademarks of their respective owners.

References

1. Diamond J, McCleary D: Virtual Microscopy. In Advanced techniques in diagnostic cellular pathology. Edited by Hannon-Fletcher M, Maxwell P. Chichester, UK: John Wiley & Sons, Ltd; 2009.

2. Ameisen D, Yunes JB, Deroulers C, Perrier V, Bouhidel F, Battistella M, Legres L, Janin A, Bertheau P: Stack or Trash? Fast quality assessment of virtual slides. Diagnostic Pathology 2013, in press.

3. García Rojo M, Castro AM, Gonçalves L: COST Action "EuroTelepath": digital pathology integration in electronic health record, including primary care centres. Diagnostic Pathology 2011, 6(Suppl 1):S6.

4. Ameisen D: Intégration des lames virtuelles dans le dossier patient électronique. PhD thesis, Univ Paris Diderot-Paris 7; 2013.

5. Collan Y, Torkkeli T, Personen E, Jantunen E, Kosma VM: Application of morphometry in tumor pathology. Analytical and quantitative cytology and histology 1987, 9(2):79–88.

6. Wolfe P, Murphy J, McGinley J, Zhu Z, Jiang W, Gottschall E, Thompson H: Using nuclear morphometry to discriminate the tumorigenic potential of cells: A comparison of statistical methods. Cancer epidemiology, biomarkers & prevention 2004, 13(6):976–988.

7. Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B: Histopathological Image Analysis: A Review. Biomedical Engineering, IEEE Reviews in 2009, 2:147–171.

8. Gerin C, Pallud J, Deroulers C, Varlet P, Oppenheim C, Roux FX, Chrétien F, Thomas SR, Grammaticos B, Badoual M: Quantitative characterization of the imaging limits of diffuse low-grade oligodendrogliomas. Neuro-Oncology 2013, in press.

9. Wienert S, Heim D, Kotani M, Lindequist B, Stenzinger A, Ishii M, Hufnagl P, Beil M, Dietel M, Denkert C, Klauschen F: CognitionMaster: an object-based image analysis framework. Diagnostic Pathology 2013, 8:34.

10. Gunduz C, Yener B, Gultekin SH: The cell graphs of cancer. Bioinformatics 2004, 20 Suppl 1:i145–i151.

11. Gunduz C, Gultekin SH, Yener B: Augmented cell-graphs for automated cancer diagnosis. Bioinformatics 2005, 21 Suppl 2:ii7–ii12.

12. West NP, Dattani M, McShane P, Hutchins G, Grabsch J, Mueller W, Treanor D, Quirke P, Grabsch H: The proportion of tumour cells is an independent predictor for survival in colorectal cancer patients. British Journal of Cancer 2010, 102:1519–1523.

13. Chang H, Han J, Borowsky A, Loss L, Gray JW, Spellman PT, Parvin B: Invariant Delineation of Nuclear Architecture in Glioblastoma Multiforme for Clinical and Molecular Association. IEEE Trans Med Imag 2013, 32(4):670–682.

14. Kayser K, Radziszowski D, Bzdyl P, Sommer R, Kayser G: Towards an automated virtual slide screening: theoretical considerations and practical experiences of automated tissue-based virtual diagnosis to be implemented in the Internet. Diagnostic Pathology 2006, 1:10.

15. PLGA Foundation: Meta Analysis Low Grade Glioma Database Project 2012. [http://www.fightplga.org/research/PLGA-Sponsored_Projects/MetaAnalysis]


16. García Rojo M, Bueno G, Slodkowska J: Review of imaging solutions for integrated quantitative immunohistochemistry in the Pathology daily practice. Folia histochemica et cytobiologica 2009, 47(3):349–354.

17. Rasband WS: ImageJ 1997–2012. [http://imagej.nih.gov/ij]

18. ImageMagick Studio LLC: ImageMagick 2013. [http://www.imagemagick.org]

19. GraphicsMagick Group: GraphicsMagick 2013. [http://www.graphicsmagick.org]

20. Kong J, Cooper LAD, Wang F, Chisolm C, Moreno CS, Kurc TM, Widener PM, Brat DJ, Saltz JH: A comprehensive framework for classification of nuclei in digital microscopy imaging: An application to diffuse gliomas. In Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on 2011:2128–2131.

21. Kayser K, Görtler J, Borkenfeld S, Kayser G: Grid computing in image analysis. Diagnostic Pathology 2011, 6(Suppl 1):S12.

22. Granier A, Olivier M, Laborie S, Vaudescal S, Baecker V, Tran-Aupiais C: WIDE (Web Images and Data Environment) 2013. [http://www.mri.cnrs.fr/index.php?m=81]

23. Kayser K: Introduction of virtual microscopy in routine surgical pathology - a hypothesis and personal view from Europe. Diagnostic Pathology 2012, 7:48.

24. Goode A, Satyanarayanan M: A Vendor-Neutral Library and Viewer for Whole-Slide Images. Tech. Rep. Technical Report CMU-CS-08-136, Computer Science Department, Carnegie Mellon University 2008. [http://reports-archive.adm.cs.cmu.edu/anon/2008/CMU-CS-08-136.pdf]

25. Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, Loranger B, Moore J, Neves C, MacDonald D, Tarkowska A, Sticco C, Hill E, Rossner M, Eliceiri KW, Swedlow JR: Metadata matters: access to image data in the real world. Journal of Cell Biology 2010, 198(5):777–782.

26. Khushi M, Edwards G, de Marcos DA, Carpenter JE, Graham JD, Clarke CL: Open source tools for management and archiving of digital microscopy data to allow integration with patient pathology and treatment information. Diagnostic Pathology 2013, 8:22.

27. Sam Leffler S, the authors of LibTIFF: LibTIFF – TIFF Library and Utilities 2012. [http://www.remotesensing.org/libtiff]

28. Lane TG, Vollbeding G: The Independent JPEG Group's JPEG software 2013. [http://www.ijg.org]

29. Lane TG, Vollbeding G, the authors of the libjpeg-turbo software: libjpeg-turbo 2012. [http://libjpeg-turbo.virtualgl.org]

30. Schneider CA, Rasband WS, Eliceiri KW: NIH Image to ImageJ: 25 years of image analysis. Nature Methods 2012, 9:671–675.

31. Sacha J: Image IO Plugin Bundle 2004. [http://ij-plugins.sourceforge.net/plugins/imageio]

32. Sun Microsystems Inc: Java Advanced Imaging Library 1.1.3 2006. [http://www.oracle.com/technetwork/java/current-142188.html]

33. BigTIFF Design 2012. [http://www.remotesensing.org/libtiff/bigtiffdesign.html]

34. The BigTIFF File Format Proposal 2008. [http://www.awaresystems.be/imaging/tiff/bigtiff.html]

8

Figures

Figure 1 - A sample slide
(a) Macroscopic view of the whole slide (the black rectangle on the left is 1x2 cm). (b, c) Influence of the magnification on the quality of results. (b) A portion of the slide scanned at magnification level 10x. The white contours show the result of an automatic detection of the dark cell nuclei with the ImageJ software. A significant fraction of the cell nuclei is missed and the contours are rather pixelated. (c) The same portion of the slide scanned at magnification 40x. The white contours show the result of the same automatic detection. Almost all cell nuclei are detected and the shapes of the contours are much more precise. Scale bar: 4 µm.

Figure 2 - A typical session using ImageJ and the NDPITools plugins
An NDPI file has been opened with the NDPITools plugins and is displayed as a preview image (the image at the largest resolution which still fits on the computer's screen) in the top window. A rectangular region has been selected, extracted as a TIFF image and then opened in the bottom window.

Figure 3 - Preview image of an NDPI file with several focalization levels in ImageJ
The NDPI file 08.ndpi contains images at 5 different focalization levels. Therefore, its preview image is displayed as a stack of 5 images.

Figure 4 - Dialog box for customized extraction in ImageJ from an NDPI file with production of a mosaic
The dialog box shows some options which can be customized while producing a mosaic from a rectangular selection of an NDPI file preview image (here using the file previewed in Figure 3).

Figure 5 - Statistical properties of the cell nuclei with high chromatin content in the tissue sample of Figure 1
The positions of the 154240 identified nuclei were obtained from the analysis with ImageJ of the digital slide on a laptop computer. Since the slide was too large to fit into the computer's memory, it was turned into a mosaic of 16 pieces with an overlap of 60 pixels, and each piece underwent automated analysis independently. Then the results were aggregated. The graph shows the probability density function of the distance of a cell nucleus to its nearest neighbor in the whole sample.

Tables

Table 1 - Speed comparison of software to extract a 256 × 256 rectangle from a huge TIFF image
Time needed (or indication of failure when the task was not completed) by several software tools to extract a rectangular region of size 256 × 256 pixels situated at the bottom right corner of huge TIFF images and to save it as an independent file. The input images were single-image tiled TIFF files using JPEG compression. Their dimensions are indicated in the top row. The computer used was a 2.6 GHz Intel Core i7 Mac Mini with 16 GiB of RAM and more than 100 GiB of free hard disk space. The tested software tools were, from top to bottom: tifffastcrop from our LargeTIFFTools, GraphicsMagick 1.3.17, ImageMagick 6.8.0-7, and the utility tiffcrop from LibTIFF 4.0.3.

| Image size (px) | 11264 × 4384 | 45056 × 17536 | 180224 × 70144 |
|-----------------|--------------|---------------|----------------|
| tifffastcrop    | 0.30 s       | 0.30 s        | 0.30 s         |
| GraphicsMagick  | 0.74 s       | 23.6 s        | > 80 min       |
| ImageMagick     | 1.18 s       | 23.6 s        | failed         |
| tiffcrop        | 0.50 s       | failed        | seg. fault     |

Table 2 - Downloads of the NDPITools
Distribution of the downloads (unique IP addresses) of the precompiled binaries of the NDPITools between March 2012 and April 2013.

| Windows (32 bits) | Windows (64 bits) | Linux | Mac OS X |
|-------------------|-------------------|-------|----------|
| 483               | 542               | 217   | 285      |


Background
Virtual microscopy has become routinely used over the last few years for the transmission of pathology images (the so-called virtual slides), for both telepathology and teaching [1,2]. In more and more hospitals, virtual slides are even attached to the patient's file [3,4]. They also have great potential for research, especially in the context of multidisciplinary projects involving, e.g., mathematicians and clinicians who do not work at the same location. Quantitative histology is a promising new field, involving computer-based morphometry or statistical analysis of tissues [5–9]. A growing number of works report the pertinence of such images for diagnosis and classification of diseases, e.g. tumours [10–14]. Databases of clinical cases [15] will include more and more digitized tissue images. This growing use of virtual microscopy is accompanied by the development of integrated image analysis systems offering both virtual slide scanning and automatic image analysis, which makes integration into the daily practice of pathologists easier. See Ref. [16] for a review of some of these systems.

Modern slide scanners produce high magnification microscopy images of excellent quality [1], for instance at the so-called "40x" magnification. They allow much better visualization and analysis than lower magnification images. As an example, Figure 1 shows two portions of a slide at different magnifications, 10x and 40x. The benefit of the high magnification for both diagnosis and automated image analysis is clear. For instance, the state of the chromatin inside the nucleus and the cell morphology, better seen at high magnification, are essential to help the clinician distinguish tumorous and non-tumorous cells. An accurate, non-pixelated determination of the perimeters of the cell nuclei is needed for morphometry and statistics.

However, this technique involves the manipulation of huge images (of the order of 10 billion pixels for a full-size slide at magnification 40x with a single focus level), for which the approach taken by most standard software, loading and decompressing the full image into RAM, is impossible (a single slice of a full-size slide needs of the order of 30 GiB of RAM). As a result, standard open source software such as ImageJ [17], ImageMagick [18] or GraphicsMagick [19] completely fails or is prohibitively slow when used on these images. Of course, commercially available software exists [16], but it is usually quite expensive and very often restricted to a single operating system. It uses proprietary source code, which is a problem if one wants to control or check the algorithms and their parameters when doing image analysis for research.

In addition, many automated microscopes or slide scanners store the images which they produce in proprietary or poorly documented file formats, and the software provided by vendors is often specific to some operating system. This leads to several concerns. First, it makes research based on digital pathology technically more difficult. Even when a project is led on a single site, one often has to use clusters of computers to achieve large-scale studies of many full-size slides from several patients [20]. Since clusters of computers are typically run by open source software such as Linux, pathology images stored in non-standard file formats are a problem. Furthermore, research projects are now commonly performed in parallel at several sites, not to say in several countries, thanks to technology such as Grid [21], and there are ongoing efforts towards the interoperability of information systems used in pathology [3,22]. Second, proprietary formats may hinder the development of shared clinical databases [15] and the access of the general public to knowledge, whereas citizens should benefit from public investments. Finally, they may also raise financial concerns and conflicts of interest [23].

There have been recent attempts to define open, documented, vendor-independent software [24,25], which partly address this problem. However, very large images stored in the NDPI file format, produced by some slide scanners manufactured by Hamamatsu such as the NanoZoomer, are not yet fully supported by such software. For instance, LOCI Bio-Formats [25] is presently unable to open images one dimension of which is larger than 65k and does not deal properly with NDPI files of more than 4 GiB. OpenSlide [24] does not currently support the NDPI format. NDPI-Splitter [26] needs to be run on Windows and depends on a proprietary library.

To address these problems, we have developed open source tools which achieve two main goals: reading and converting images in the NDPI file format into standard open formats such as TIFF, and splitting a huge image, without decompressing it entirely into RAM, into a mosaic of much smaller pieces (tiles), each of which can be easily opened or processed by standard software. All this is achieved with high processing speed on all platforms.

Implementation
Overview

The main software is implemented in the C programming language as separate, command-line driven executables. It is independent of any proprietary library. This ensures portability on a large number of platforms (we have tested several versions of Mac OS X, Linux and Windows), modularity, and ease of integration into scripts or other software projects.

It is complemented by a set of plugins for the public domain software ImageJ [17], implemented in Java, which call the main executables automatically to enable interactive use.

The LargeTIFFTools and NDPITools are based on the open source TIFF [27] and JPEG [28] or libjpeg-turbo [29] libraries. The NDPITools plugins for ImageJ are based on the Java API of ImageJ [17,30] and on the open source software Image-IO [31], and use the Java Advanced Imaging 1.1.3 library [32].

Basic functions

The basic functions are the following. They can be performed even on a computer with a modest amount of RAM (see the "Performance" discussion below).

1. splitting a tiled TIFF file into multiple TIFF files, one for each of the tiles (tiffsplittiles program);

2. quickly extracting ("cropping") a given rectangle of a supposedly tiled TIFF file into a TIFF or JPEG file (tifffastcrop program);

3. splitting one or several TIFF file(s), possibly very large, into mosaic(s), without fully decompressing them in memory (tiffmakemosaic program);

4. converting an NDPI file into a standard multiple-image TIFF file, tiled if necessary, using upon request the BigTIFF format introduced in version 4.0.0 of the TIFF library [27,33,34], and encoding magnification and focus levels as TIFF "image description" fields (ndpi2tiff program);

5. creating a standard TIFF file for all or part of the magnification levels and focus levels present in the given NDPI file (the user can ask for specific magnification and focus levels and for a specific rectangular region of the image), and, upon request, creating a mosaic for each image which doesn't fit into RAM or for all images (ndpisplit program). The names of the created files are built on the name of the source file and incorporate the magnification and focus levels (and, in the case of mosaic pieces, the coordinates inside the mosaic).

Mosaics
A mosaic is a set of TIFF or JPEG files (the pieces) which would reproduce the original image if reassembled together, but each of which is of a size manageable by standard software. The user can either specify the maximum amount of RAM which a mosaic piece should need to be uncompressed (default: 1024 MiB), or directly specify the size of each piece. In the first case, the size of each piece is determined by the software. A given amount of overlap between mosaic pieces can be requested, either in pixels or as a percentage of the image size. This is useful, e.g. for cell counting, so as not to miss cells which lie on the boundary between two adjacent pieces.
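The memory-driven choice of piece size can be illustrated with a short sketch. It is not the algorithm used by tiffmakemosaic or ndpisplit, which is not detailed here. It only assumes 3 bytes per pixel (8-bit RGB) and enlarges a grid, splitting the longer side of the pieces first, until one uncompressed piece (overlap included) fits under the memory limit. The function name plan_mosaic and its parameters are hypothetical.

```python
import math


def plan_mosaic(width, height, max_piece_bytes, overlap_px=0, bytes_per_pixel=3):
    """Return (columns, rows, piece_width, piece_height) for a mosaic.

    Hypothetical helper, not the tools' own algorithm: the grid grows
    until one uncompressed piece, overlap included, needs at most
    max_piece_bytes of memory.
    """
    cols, rows = 1, 1
    while cols <= width and rows <= height:
        # Each piece covers its share of the image plus the overlap margin,
        # but can never be larger than the image itself.
        piece_w = min(width, math.ceil(width / cols) + overlap_px)
        piece_h = min(height, math.ceil(height / rows) + overlap_px)
        if piece_w * piece_h * bytes_per_pixel <= max_piece_bytes:
            return cols, rows, piece_w, piece_h
        # Split along the longer side of the current piece.
        if piece_w >= piece_h:
            cols += 1
        else:
            rows += 1
    raise ValueError("memory limit too small for the requested overlap")


if __name__ == "__main__":
    # Image size taken from the text; 512 MiB limit and 60-pixel overlap.
    print(plan_mosaic(103168, 63232, 512 * 2**20, overlap_px=60))
```

The grid found by this sketch need not match the one chosen by the actual tools, which may use a different rule to pick the number of pieces.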

Usage
Standalone

Our tools can be used through the command line (POSIX-like shell or Windows command interpreter) and can therefore be very easily integrated into scripts or other programs. Depending on the tool, the paths and file names of one or several files in NDPI or TIFF format have to be provided. Options can be added, with their arguments, on the command line to modify the behavior of the programs from their defaults. They are explained in the messages printed by the programs when run without arguments, in Unix-style man pages, and on the web pages of the project (see the Availability and requirements section below).

Under the Windows OS, one can click-and-drag the NDPI file icon onto the icon of ndpi2tiff or ndpisplit. We provide precompiled binaries where frequently used options are turned on by default, e.g. ndpisplit-mJ.exe produces a mosaic in JPEG format, as with option -mJ. The conversion result or mosaic can be found in the same directory as the original NDPI image.

ImageJ integration

In addition to command line use, the ndpisplit program can be driven through the NDPITools plugins in ImageJ with a point-and-click interface, so that previewing the content of an NDPI file at low resolution, selecting a portion, extracting it at high resolution and finally opening it in ImageJ to apply further treatments can be done in an easy and graphical way. Figure 2 shows a screen shot of ImageJ 1.47m after extraction of a rectangular zone from an NDPI file. Figure 3 explains what happens when the NDPI file contains several levels of focalization: the preview image is displayed as a stack.

When producing a mosaic, the user can request that pieces be JPEG files. Since the File > Open command of versions 1.x of ImageJ is unable to open TIFF files with JPEG compression (one has to use plugins), this is a way to produce mosaics which can be opened by click-and-drag onto the window or icon of ImageJ while still saving disk space thanks to efficient compression. Figure 4 shows how the mosaic production options can be set inside ImageJ through the NDPITools plugins.

Results and Discussion
Performance

We compare the performance of our tools on several fundamental tasks with that of standard, broadly available software, in representative examples and on broadly available computers.

Making a mosaic from a huge image

We chose an 8-bit RGB colour JPEG-compressed TIFF file of 103168 × 63232 pixels, originating in the digitization of a pathology slide. The original file weighed 975.01 MiB. Loading this image entirely into RAM would need at least 3 × 103168 × 63232 bytes = 18.2 GiB and is presently intractable on most, if not all, desktop and laptop computers of reasonable cost.
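The arithmetic behind this estimate can be checked with a few lines. This is a minimal sketch assuming 3 bytes per pixel (8-bit RGB) and no per-image overhead; the function name uncompressed_size_gib is hypothetical.

```python
def uncompressed_size_gib(width, height, bytes_per_pixel=3):
    """Memory needed to hold a fully decompressed image, in GiB."""
    return width * height * bytes_per_pixel / 2**30


if __name__ == "__main__":
    # Image used for the mosaic test.
    print(round(uncompressed_size_gib(103168, 63232), 1))
    # Largest image of the NDPI file used in the conversion tests.
    print(round(uncompressed_size_gib(180224, 70144), 1))
```

The second call gives about 35 GiB for a 180224 × 70144 image, the same order of magnitude as the 30 GiB quoted in the Background section for a single slice of a full-size slide.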

The task was to produce from this image a mosaic of 64 pieces, so that each one needs less than 512 MiB of RAM to open.

On a 3.2 GHz Intel Core i3 iMac computer with 16 GB of RAM, the convert command from ImageMagick (version 6.8.0-7 with quantum size 8 bits) was unable to complete the request. GraphicsMagick (gm convert -crop, version 1.3.17 with quantum size 8 bits) completed the request in 70 min, using 25 GiB of disk space. tiffmakemosaic from our LargeTIFFTools completed the request in 25 min.

To ascertain that this task can equally be achieved even on computers with a modest amount of RAM, we performed the same task on a 6-year-old 2.66 GHz Core2Duo Intel iMac with 2 GiB of RAM. The task was completed in 90 min.

Converting NDPI into TIFF

Splitting an NDPI file into TIFF files. A pathology sample (6.7 cm² of tissue) was scanned at magnification 40x and with 11 focus levels (every 2 microns) by a NanoZoomer, resulting in a 6.5 GiB file in proprietary NDPI format (called file a.ndpi hereafter). On a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GiB of RAM, ndpisplit extracted all 55 images (11 focus levels and 5 magnifications) as independent, single-image TIFF files with JPEG compression in 7.11 min. The size of the largest images was 180224 × 70144. The speed was limited only by the rate of I/O transfers, since the CPU usage of this task was 1.38 min, out of which the system used 1.30 min. Executing the same task again straight after the first execution took only 0.57 min, because the NDPI file was still in the cache of the operating system.

To ascertain that this task can equally be achieved even on computers with a modest amount of RAM, we made a try on a 6-year-old 2.66 GHz Core2Duo Intel PC with 2 GiB of RAM running 32-bit Windows XP Pro SP3. The original file shown in Figure 1, called b.ndpi and weighing 2.07 GiB (largest image: 103168 × 63232 pixels), was split into independent TIFF files in 22 min without swapping.

In comparison, the LOCI Bio-Formats plugin for ImageJ [25], in its version 4.4.6 with ImageJ 1.43m, was not able to open the images in file a.ndpi, even at low resolution.

Converting an NDPI file into a multiple-image TIFF file. Alternatively, the same proprietary-format file a.ndpi was converted into a multiple-image TIFF file with ndpi2tiff. On the same computer as before, the conversion time was 70 min. Here again, the speed of the process is limited only by the rate of I/O transfers, since the conversion took only 30 s if performed when the NDPI file was still in the cache of the operating system.

Since the resulting TIFF file could not store all 55 images in less than 4 GiB, we passed the option -8 on the command line to ndpi2tiff to request use of the BigTIFF format extension. The specifications of this extension to the TIFF standard, discussed and published before 2008 [33,34], are supported by LibTIFF as of version 4.0.0 [27], and therefore by the abundant image viewing and manipulation software which relies on LibTIFF. If the use of the BigTIFF format extension had impeded the further exploitation of the produced TIFF file, we could simply have used ndpisplit as above. Or we could have called the ndpi2tiff command several times, each time requesting extraction of a subset of all images by specifying image numbers after the file name, separated with commas, as in a.ndpi,0,1,2,3,4.
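As an illustration of this last alternative, the following sketch builds the file arguments for successive calls, each covering a subset of the images. It relies only on the comma-separated syntax quoted above. It assumes that image numbers start at 0, as the example suggests, and the function name batched_image_args and the batch size are hypothetical.

```python
def batched_image_args(ndpi_path, image_count, batch_size):
    """Return one 'file,n1,n2,...' argument per batch of image numbers."""
    args = []
    for start in range(0, image_count, batch_size):
        # Image numbers of this batch; the last batch may be shorter.
        numbers = range(start, min(start + batch_size, image_count))
        args.append(",".join([ndpi_path] + [str(n) for n in numbers]))
    return args


if __name__ == "__main__":
    # 55 images, as in the file a.ndpi, in batches of 5.
    for arg in batched_image_args("a.ndpi", 55, 5):
        print(arg)
```

Each resulting string would then be passed as the file argument of one ndpi2tiff call, so that every output file holds only a subset of the images.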

Extracting a small region from a huge image

This task can be useful to visualize at full resolution a region of interest which the user has selected on a low-magnification preview image. Therefore, it should be performed as quickly as possible.

From a TIFF file

The task was to extract a rectangular region of size 256 × 256 pixels situated at the bottom right corner of huge TIFF images and to save it as an independent file. The source images were single-image TIFF files using JPEG compression. Table 1 compares the time needed to complete the task with tifffastcrop from our LargeTIFFTools and with several software tools on increasingly large TIFF files. Tests were performed on a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GB of RAM and used GraphicsMagick 1.3.17, ImageMagick 6.8.0-7, and the utility tiffcrop from LibTIFF 4.0.3. Noticeably, when treating the largest image, GraphicsMagick needs 50 GiB of free disk space, whereas tifffastcrop doesn't need it.

From a NDPI file

The task was to extract a rectangular region of size 256 × 256 from one of the largest images of the file a.ndpi (size 180224 × 70144). On a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GB of RAM, the execution time was 0.12 s for one extract, and on average 0.06 s per extract in a series of 20 extracts with locations drawn uniformly at random inside the whole image.
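A series of extract locations like the one used for this timing could be drawn as in the sketch below. It assumes that each extract is identified by its top-left corner and must lie entirely inside the image; the function name random_crop_origins and the seed parameter are hypothetical and are not part of the tools.

```python
import random


def random_crop_origins(image_width, image_height, crop_size=256, count=20, seed=None):
    """Draw top-left corners of square crops lying entirely inside the image."""
    max_x = image_width - crop_size
    max_y = image_height - crop_size
    if max_x < 0 or max_y < 0:
        raise ValueError("crop is larger than the image")
    rng = random.Random(seed)  # a fixed seed gives a reproducible series
    # randint is inclusive at both ends, so the crop may touch the border.
    return [(rng.randint(0, max_x), rng.randint(0, max_y)) for _ in range(count)]


if __name__ == "__main__":
    for x, y in random_crop_origins(180224, 70144, seed=0):
        print(x, y)
```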

Applications
Integration in digital pathology image servers or virtual slide systems

The NDPITools are being used in several other software projects:

• in a system for automatic blur detection [2,4];

• in WIDE [22], to deal with NDPI files. WIDE is an open source biological and digital pathology image archiving and visualization system, which allows the remote user to see images stored in a remote library in a browser. In particular, thanks to the high-speed extraction of a rectangular region by ndpisplit, WIDE saves costly disk space, since it doesn't need to store TIFF files converted from NDPI files in addition to the latter.

Exploiting a large set of digital slides

In the framework of a study about invasive low-grade oligodendrogliomas reported elsewhere [8], we had to deal with 303 NDPI files occupying 122 GiB. On a 3.2 GHz Intel Core i3 iMac computer with 16 GB of RAM, we used ndpisplit in a batch job to convert them into standard TIFF files, which took only a few hours. The experimental -s option of ndpisplit was used to remove the blank filling between scanned regions, resulting in an important saving of disk space and in smaller TIFF files (one for each scanned region), which were easier to manipulate afterwards. Then, for each sample, Preview.app and ImageJ were used to inspect the resulting images and manually select the regions of clinical interest. The corresponding extracts of the high magnification images were the subject of automated cell counting and other quantitative analyses using ImageJ. In particular, we collected quantitative data about edema or tissue hyperhydration [8]. This quantity needed a specific image analysis procedure which is not offered by standard morphometry software and, unlike cell density estimates, could not be retrieved by sampling a few fields of view in the microscope. Therefore, virtual microscopy and our tools were essential in this study.
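A batch job of this kind can be outlined as follows. This is a sketch, not the script used for the study. It only assumes that ndpisplit accepts its options before the file name, as described in the Usage section, and it uses no option other than the -s option mentioned above. The function name ndpisplit_commands is hypothetical.

```python
from pathlib import Path


def ndpisplit_commands(directory, options=("-s",)):
    """Build one ndpisplit command line per NDPI file found in directory."""
    commands = []
    for path in sorted(Path(directory).glob("*.ndpi")):
        # Assumed calling convention: program name, options, then the file.
        commands.append(["ndpisplit", *options, str(path)])
    return commands


if __name__ == "__main__":
    for command in ndpisplit_commands("."):
        print(" ".join(command))
```

Each command list could then be run in turn, for instance with subprocess.run(command, check=True), provided that ndpisplit is installed and found on the search path.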

Study of a whole slide of brain tissue invaded by anoligodendroglioma

To demonstrate the possibility to do research on huge images even with a modest computer, we chose a 3-year-old MacBook Pro laptop computer with a 2.66 GHz Intel Core 2 Duo and 4 GiB of RAM. We used ImageJ and the NDPITools to perform statistics on the upper piece of tissue on the slide shown in Figure 1.

Since the digital slide b.ndpi weighed 2.07 GiB, with a high resolution image of 103168 × 63232 pixels, it was not possible to do the study in a straightforward way. We opened the file b.ndpi as a preview image with the command Plugins > NDPITools > Preview NDPI and selected on it the left tissue sample. Then we used the command Plugins > NDPITools > Custom extract to TIFF Mosaic and asked for extraction as a mosaic of 16 JPEG files, each one needing less than 1 GiB of RAM to open, and with an overlap of 60 pixels. This was completed within a few minutes. Then we applied an ImageJ macro to each of the 16 pieces to identify the dark cell nuclei (those with high chromatin content), based on thresholding the luminosity values of the pixels, as shown in Figure 1. It produced text files with the coordinates and size of each cell nucleus.

Out of the 154240 identified nuclei, 1951 were positioned on the overlapping regions between pieces. Using the overlap feature of our tools made it possible to detect these nuclei properly, since they would have been cut by the boundary of the pieces of the mosaic in the absence of overlap. We avoided double counting by identifying the pairs of nuclei situated in the overlapping regions which were separated by a distance smaller than their radius.
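The removal of double counts can be sketched as follows. This is not the procedure actually used for the study, whose details are not given here. It assumes that the nuclei from all pieces have already been converted to whole-image coordinates, that each nucleus is described by a triple (x, y, radius), and that two detections are the same nucleus when their distance is smaller than the smaller of the two radii. The function name merge_duplicates is hypothetical.

```python
import math
from collections import defaultdict


def merge_duplicates(nuclei):
    """Drop detections lying closer to an already kept one than their radius.

    nuclei: iterable of (x, y, radius) in whole-image pixel coordinates.
    """
    nuclei = list(nuclei)
    if not nuclei:
        return []
    # Grid cells at least as large as the largest radius: a duplicate can
    # only sit in the same cell or in one of the 8 neighbouring cells.
    cell = max(1.0, max(r for _, _, r in nuclei))
    grid = defaultdict(list)
    kept = []
    for x, y, r in nuclei:
        cx, cy = int(x // cell), int(y // cell)
        duplicate = any(
            math.hypot(x - kx, y - ky) < min(r, kr)
            for i in (cx - 1, cx, cx + 1)
            for j in (cy - 1, cy, cy + 1)
            for kx, ky, kr in grid.get((i, j), ())
        )
        if not duplicate:
            kept.append((x, y, r))
            grid[(cx, cy)].append((x, y, r))
    return kept


if __name__ == "__main__":
    detections = [(100.0, 50.0, 5.0), (101.0, 50.5, 5.0), (300.0, 80.0, 4.0)]
    print(merge_duplicates(detections))
```

The grid avoids comparing every pair of nuclei, which matters for a sample with more than 100000 detections. Restricting the search to the overlapping regions, as described above, would reduce the work further.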

As shown in earlier studies [7,10,11], these data can be used for research and diagnosis purposes. As an example, Figure 5 shows the distribution of the distance of each cell nucleus to its nearest neighbor. Thanks to the very high number of analyzed cell nuclei, this distribution is obtained with excellent precision.
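A distribution of this kind can be computed along the following lines. The sketch uses a brute-force search over all pairs, which is only practical for small samples; for a data set of the size studied here, a spatial index would be needed instead. The function names and the bin width are hypothetical.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        distances.append(best)
    return distances


def density_histogram(values, bin_width):
    """Map each bin start to an estimated probability density."""
    counts = {}
    for v in values:
        k = int(v // bin_width)
        counts[k] = counts.get(k, 0) + 1
    total = len(values)
    # Dividing by total * bin_width makes the histogram integrate to 1.
    return {k * bin_width: c / (total * bin_width) for k, c in sorted(counts.items())}


if __name__ == "__main__":
    centers = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
    d = nearest_neighbor_distances(centers)
    print(d)
    print(density_histogram(d, 2.0))
```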

Conclusions
The LargeTIFFTools, NDPITools and NDPITools plugins for ImageJ efficiently achieve some fundamental functions on large images, and in particular on digital slides, for which standard open source software fails or performs badly. They enable both the clinician to examine a single slide and the bioinformatics research team to perform large-scale analysis of many slides, possibly on computer grids [20].

To date, the LargeTIFFTools have been downloaded from more than 388 different IP addresses, the NDPITools from more than 1361 addresses, and the ImageJ plugins from more than 235 addresses. Table 2 lists the distribution of the target platforms among the downloads of the binary files. It shows a broad usage of the different platforms by the community, emphasizing the importance of cross-platform, open source tools.

We have explained how the software was used to study some microscopic properties of brain tissue when invaded by an oligodendroglioma, and we have given an illustrative application to the analysis of a whole-size pathology slide. This suggests other promising applications.

Availability and requirements
a. LargeTIFFTools

• Project name: LargeTIFFTools
• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/largetifftools
• Operating system(s): Platform independent
• Programming language: C
• Other requirements: libjpeg, libtiff
• License: GNU GPLv3

b. NDPITools

• Project name: NDPITools
• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools
• Operating system(s): Platform independent
• Programming language: C
• Other requirements: none
• License: GNU GPLv3

For the convenience of users, precompiled binaries are provided for Windows (32 and 64 bits), Mac OS X and Linux.

c. NDPITools plugins for ImageJ

• Project name: NDPITools plugins for ImageJ
• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools
• Operating system(s): Platform independent
• Programming language: Java
• Other requirements: ImageJ 1.31s or higher, Ant, JAI 1.1.3
• License: GNU GPLv3

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
CD wrote the paper. ML conceived and implemented a first version of the integration into ImageJ as a toolset of macros. CD implemented the software and wrote the documentation. CG, AG and ML contributed suggestions to the software. CD, DA, AG and ML performed software tests. CD, MB, CG, AG and ML selected and provided histological samples. CD performed the statistical analysis of the sample slide. All authors reviewed the manuscript. All authors read and approved the final manuscript.

Acknowledgements
We thank F. Bouhidel and P. Bertheau for their help with the slide scanner of the Pathology Laboratory of the Saint-Louis Hospital in Paris, and C. Klein (Imaging facility, Cordeliers Research Center, INSERM U872, Paris) for tests and suggestions.

The computer, CPU, operating system and programming language names quoted in this article are trademarks of their respective owners.

References

1. Diamond J, McCleary D: Virtual Microscopy. In Advanced techniques in diagnostic cellular pathology. Edited by Hannon-Fletcher M, Maxwell P. Chichester, UK: John Wiley & Sons, Ltd; 2009.

2. Ameisen D, Yunes JB, Deroulers C, Perrier V, Bouhidel F, Battistella M, Legres L, Janin A, Bertheau P: Stack or Trash? Fast quality assessment of virtual slides. Diagnostic Pathology 2013, in press.

3. García Rojo M, Castro AM, Goncalves L: COST Action "EuroTelepath": digital pathology integration in electronic health record, including primary care centres. Diagnostic pathology 2011, 6(Suppl 1):S6.

4. Ameisen D: Intégration des lames virtuelles dans le dossier patient électronique. PhD thesis, Univ Paris Diderot-Paris 7, 2013.

5. Collan Y, Torkkeli T, Personen E, Jantunen E, Kosma VM: Application of morphometry in tumor pathology. Analytical and quantitative cytology and histology 1987, 9(2):79–88.

6. Wolfe P, Murphy J, McGinley J, Zhu Z, Jiang W, Gottschall E, Thompson H: Using nuclear morphometry to discriminate the tumorigenic potential of cells: A comparison of statistical methods. Cancer epidemiology, biomarkers & prevention 2004, 13(6):976–988.

7. Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B: Histopathological Image Analysis: A Review. Biomedical Engineering, IEEE Reviews in 2009, 2:147–171.

8. Gerin C, Pallud J, Deroulers C, Varlet P, Oppenheim C, Roux FX, Chretien F, Thomas SR, Grammaticos B, Badoual M: Quantitative characterization of the imaging limits of diffuse low-grade oligodendrogliomas. Neuro-Oncology 2013, in press.

9. Wienert S, Heim D, Kotani M, Lindequist B, Stenzinger A, Ishii M, Hufnagl P, Beil M, Dietel M, Denkert C, Klauschen F: CognitionMaster: an object-based image analysis framework. Diagnostic pathology 2013, 8:34.

10. Gunduz C, Yener B, Gultekin SH: The cell graphs of cancer. Bioinformatics 2004, 20 Suppl 1:i145–i151.

11. Gunduz C, Gultekin SH, Yener B: Augmented cell-graphs for automated cancer diagnosis. Bioinformatics 2005, 21 Suppl 2:ii7–ii12.

12. West NP, Dattani M, McShane P, Hutchins G, Grabsch J, Mueller W, Treanor D, Quirke P, Grabsch H: The proportion of tumour cells is an independent predictor for survival in colorectal cancer patients. British Journal of Cancer 2010, 102:1519–1523.

13. Chang H, Han J, Borowsky A, Loss L, Gray JW, Spellman PT, Parvin B: Invariant Delineation of Nuclear Architecture in Glioblastoma Multiforme for Clinical and Molecular Association. IEEE Trans Med Imag 2013, 32(4):670–682.

14. Kayser K, Radziszowski D, Bzdyl P, Sommer R, Kayser G: Towards an automated virtual slide screening: theoretical considerations and practical experiences of automated tissue-based virtual diagnosis to be implemented in the Internet. Diagnostic pathology 2006, 1:10.

15. PLGA Foundation: Meta Analysis Low Grade Glioma Database Project. 2012, [http://www.fightplga.org/research/PLGA-Sponsored_Projects/MetaAnalysis]

16. García Rojo M, Bueno G, Slodkowska J: Review of imaging solutions for integrated quantitative immunohistochemistry in the Pathology daily practice. Folia histochemica et cytobiologica 2009, 47(3):349–354.

17. Rasband WS: ImageJ. 1997–2012, [http://imagej.nih.gov/ij/]

18. ImageMagick Studio LLC: ImageMagick. 2013, [http://www.imagemagick.org/]

19. GraphicsMagick Group: GraphicsMagick. 2013, [http://www.graphicsmagick.org/]

20. Kong J, Cooper LAD, Wang F, Chisolm C, Moreno CS, Kurc TM, Widener PM, Brat DJ, Saltz JH: A comprehensive framework for classification of nuclei in digital microscopy imaging: An application to diffuse gliomas. In Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on, 2011:2128–2131.

21. Kayser K, Gortler J, Borkenfeld S, Kayser G: Grid computing in image analysis. Diagnostic pathology 2011, 6(Suppl 1):S12.

22. Granier A, Olivier M, Laborie S, Vaudescal S, Baecker V, Tran-Aupiais C: WIDE (Web Images and Data Environment). 2013, [http://www.mri.cnrs.fr/index.php?m=81]

23. Kayser K: Introduction of virtual microscopy in routine surgical pathology - a hypothesis and personal view from Europe. Diagnostic pathology 2012, 7:48.

24. Goode A, Satyanarayanan M: A Vendor-Neutral Library and Viewer for Whole-Slide Images. Tech. Rep. Technical Report CMU-CS-08-136, Computer Science Department, Carnegie Mellon University 2008, [http://reports-archive.adm.cs.cmu.edu/anon/2008/CMU-CS-08-136.pdf]

25. Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, Loranger B, Moore J, Neves C, MacDonald D, Tarkowska A, Sticco C, Hill E, Rossner M, Eliceiri KW, Swedlow JR: Metadata matters: access to image data in the real world. Journal of Cell Biology 2010, 198(5):777–782.

26. Khushi M, Edwards G, de Marcos DA, Carpenter JE, Graham JD, Clarke CL: Open source tools for management and archiving of digital microscopy data to allow integration with patient pathology and treatment information. Diagnostic pathology 2013, 8:22.

27. Sam Leffler S, the authors of LibTIFF: LibTIFF – TIFF Library and Utilities. 2012, [http://www.remotesensing.org/libtiff/]

28. Lane TG, Vollbeding G: The Independent JPEG Group's JPEG software. 2013, [http://www.ijg.org/]

29. Lane TG, Vollbeding G, the authors of the libjpeg-turbo software: libjpeg-turbo. 2012, [http://libjpeg-turbo.virtualgl.org/]

30. Schneider CA, Rasband WS, Eliceiri KW: NIH Image to ImageJ: 25 years of image analysis. Nature Methods 2012, 9:671–675.

31. Sacha J: Image IO Plugin Bundle. 2004, [http://ij-plugins.sourceforge.net/plugins/imageio/]

32. Sun Microsystems Inc: Java Advanced Library 1.1.3. 2006, [http://www.oracle.com/technetwork/java/current-142188.html]

33. BigTIFF Design. 2012, [http://www.remotesensing.org/libtiff/bigtiffdesign.html]

34. The BigTIFF File Format Proposal. 2008, [http://www.awaresystems.be/imaging/tiff/bigtiff.html]

Figures

Figure 1 - A sample slide
(a) Macroscopic view of the whole slide (the black rectangle on the left is 1x2 cm). (b, c) Influence of the magnification on the quality of results. (b) A portion of the slide scanned at magnification level 10x. The white contours show the result of an automatic detection of the dark cell nuclei with the ImageJ software. A significant fraction of the cell nuclei is missed and the contours are rather pixelated. (c) The same portion of the slide scanned at magnification 40x. The white contours show the result of the same automatic detection. Almost all cell nuclei are detected and the shapes of the contours are much more precise. Scale bar: 4 µm.

Figure 2 - A typical session using ImageJ and the NDPITools plugins
A NDPI file has been opened with the NDPITools plugins and is displayed as a preview image (the image at the largest resolution which still fits into the computer's screen) in the top window. A rectangular region has been selected, extracted as a TIFF image, and then opened in the bottom window.

Figure 3 - Preview image of a NDPI file with several focalization levels in ImageJ
The NDPI file 08.ndpi contains images at 5 different focalization levels. Therefore, its preview image is displayed as a stack of 5 images.

Figure 4 - Dialog box for customized extraction in ImageJ from an NDPI file with production of a mosaic
The dialog box shows some options which can be customized while producing a mosaic from a rectangular selection of a NDPI file preview image (here using the file previewed in Figure 3).

Figure 5 - Statistical properties of the cell nuclei with high chromatin content in the tissue sample of Figure 1
The positions of the 154240 identified nuclei were obtained from the analysis with ImageJ of the digital slide on a laptop computer. Since the slide was too large to fit into the computer's memory, it was turned into a mosaic of 16 pieces with an overlap of 60 pixels, and each piece underwent automated analysis independently. Then the results were aggregated. The graph shows the probability density function of the distance of a cell nucleus to its nearest neighbor in the whole sample.

Tables

Table 1 - Speed comparison of software to extract a 256 × 256 rectangle from a huge TIFF image
Time needed (or indication of failure when the task was not completed) by several software tools to extract a rectangular region of size 256 × 256 pixels situated at the bottom right corner of huge TIFF images, and to save it as an independent file. The input images were single-image tiled TIFF files using JPEG compression. Their dimensions are indicated in the top row. The computer used was a 2.6 GHz Intel Core i7 Mac Mini with 16 GiB of RAM and more than 100 GiB of free hard disk. The tested software tools were, from top to bottom: tifffastcrop from our LargeTIFFTools, GraphicsMagick 1.3.17, ImageMagick 6.8.0-7 and the utility tiffcrop from LibTIFF 4.0.3.

Image size (px)    11264 × 4384    45056 × 17536    180224 × 70144
tifffastcrop       0.30 s          0.30 s           0.30 s
GraphicsMagick     0.74 s          236 s            > 80 min
ImageMagick        118 s           236 s            failed
tiffcrop           0.50 s          failed           seg fault

Table 2 - Downloads of the NDPITools
Distribution of the downloads (unique IP address) of the precompiled binaries of the NDPITools between March 2012 and April 2013.

Windows (32 bits)    Windows (64 bits)    Linux    Mac OS X
483                  542                  217      285


Implementation
Overview

The main software is implemented in the C programming language as separate, command-line driven executables. It is independent of any proprietary library. This ensures portability on a large number of platforms (we have tested several versions of Mac OS X, Linux and Windows), modularity, and ease of integration into scripts or other software projects.

It is complemented by a set of plugins for the public domain software ImageJ [17], implemented in Java, which call the main executables automatically to enable interactive use.

The LargeTIFFTools and NDPITools are based on the open source TIFF [27] and JPEG [28] or libjpeg-turbo [29] libraries. The NDPITools plugins for ImageJ are based on the Java API of ImageJ [17, 30] and on the open source software Image-IO [31], and use the Java Advanced Imaging 1.1.3 library [32].

Basic functions

The basic functions are the following. They can be performed even on a computer with a modest amount of RAM (see below the "performance" discussion).

1. splitting a tiled TIFF file into multiple TIFF files, one for each of the tiles (tiffsplittiles program);

2. extracting ("cropping") quickly a given rectangle of a supposedly tiled TIFF file into a TIFF or JPEG file (tifffastcrop program);

3. splitting one or several TIFF file(s), possibly very large, into mosaic(s), without fully decompressing them in memory (tiffmakemosaic program);

4. converting a NDPI file into a standard multiple-image TIFF file, tiled if necessary, using upon request the BigTIFF format introduced in version 4.0.0 of the TIFF library [27, 33, 34], and encoding magnification and focus levels as TIFF "image description" fields (ndpi2tiff program);

5. creating a standard TIFF file for all or part of the magnification levels and focus levels present in the given NDPI file (the user can ask for specific magnification and focus levels and for a specific rectangular region of the image), and, upon request, creating a mosaic for each image which doesn't fit into RAM or for all images (ndpisplit program). The names of the created files are built on the name of the source file and incorporate the magnification and focus levels (and, in the case of mosaic pieces, the coordinates inside the mosaic).

Mosaics
A mosaic is a set of TIFF or JPEG files (the pieces) which would reproduce the original image if reassembled together, but each of which has a size that standard software can handle. The user can either specify the maximum amount of RAM which a mosaic piece should need to be uncompressed (default: 1024 MiB), or directly specify the size of each piece. In the first case, the size of each piece is determined by the software. A given amount of overlap between mosaic pieces can be requested, either in pixels or as a percentage of the image size. This is useful, e.g., in cell counting, so as not to miss cells which lie on the boundary between two adjacent pieces.
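The partition of an image into overlapping pieces can be sketched as follows. This is only an illustration in Python, not the algorithm used by tiffmakemosaic or ndpisplit: the function name `mosaic_rectangles`, the regular grid, and the choice of growing each piece to the right and downwards by the overlap are assumptions made for the example.

```python
def mosaic_rectangles(width, height, cols, rows, overlap=0):
    """Return (left, top, right, bottom) boxes of a cols x rows mosaic.

    Hypothetical layout: each piece is extended by `overlap` pixels towards
    its right and bottom neighbours, so that two adjacent pieces share a
    band of exactly `overlap` pixels. Right and bottom are exclusive bounds.
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            top = r * height // rows
            right = min(width, (c + 1) * width // cols + overlap)
            bottom = min(height, (r + 1) * height // rows + overlap)
            boxes.append((left, top, right, bottom))
    return boxes


# Small example: a 100 x 100 image cut into 2 x 2 pieces with 10 px overlap.
example_boxes = mosaic_rectangles(100, 100, 2, 2, overlap=10)
```

With this layout, an object lying across the nominal boundary between two pieces is fully contained in at least one of them as long as it is smaller than the overlap.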

Usage
Standalone

Our tools can be used through the command line (POSIX-like shell or Windows command interpreter) and can therefore be integrated very easily into scripts or other programs. Depending on the tool, the paths and file names of one or several files in NDPI or TIFF format have to be provided. Options can be added, with their arguments, on the command line to modify the default behavior of the programs. They are explained in the messages printed by the programs when run without arguments, in Unix-style man pages, and on the web pages of the project (see below in the Availability and requirements Section).

Under the Windows OS, one can click-and-drag the NDPI file icon onto the icon of ndpi2tiff or ndpisplit. We provide precompiled binaries where frequently-used options are turned on by default, e.g. ndpisplit-mJ.exe produces a mosaic in JPEG format, as with option -mJ. The conversion result or mosaic can be found in the same directory as the original NDPI image.


ImageJ integration

In addition to command line use, the ndpisplit program can be driven through the NDPITools plugins in ImageJ with a point-and-click interface, so that previewing the content of a NDPI file at low resolution, selecting a portion, extracting it at high resolution and finally opening it in ImageJ to apply further treatments can be done in an easy and graphical way. Figure 2 shows a screen shot of ImageJ 1.47m after extraction of a rectangular zone from a NDPI file. Figure 3 explains what happens when the NDPI file contains several focus levels: the preview image is displayed as a stack.

When producing a mosaic, the user can request that the pieces be JPEG files. Since the File > Open command of versions 1.x of ImageJ is unable to open TIFF files with JPEG compression (one has to use plugins), this is a way to produce mosaics which can be opened by click-and-drag onto the window or icon of ImageJ while still saving disk space thanks to efficient compression. Figure 4 shows how the mosaic production options can be set inside ImageJ through the NDPITools plugins.

Results and Discussion
Performance

We compare the performance of our tools on several fundamental tasks to that of standard, broadly available software, in representative examples and on broadly available computers.

Making a mosaic from a huge image

We chose an 8-bit RGB colour JPEG-compressed TIFF file of 103168 × 63232 pixels originating in the digitization of a pathology slide. The original file weighed 975.01 MiB. Loading this image entirely into RAM would need at least 3 × 103168 × 63232 bytes = 18.2 GiB, and is presently intractable on most, if not all, desktop and laptop computers of reasonable cost.

The task was to produce from this image a mosaic of 64 pieces, so that each one needs less than 512 MiB of RAM to open.
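The memory estimate above can be checked with a few lines of Python. The helper name `uncompressed_bytes` is introduced here for illustration only; the lower bound on the number of pieces ignores any overlap and any alignment of pieces on tile boundaries.

```python
import math

MIB = 2**20
GIB = 2**30


def uncompressed_bytes(width, height, channels=3, bytes_per_sample=1):
    """Size of a fully decompressed image (8-bit RGB by default)."""
    return width * height * channels * bytes_per_sample


slide_bytes = uncompressed_bytes(103168, 63232)
slide_gib = round(slide_bytes / GIB, 1)          # 18.2

# Fewest pieces such that each one stays below 512 MiB when uncompressed.
min_pieces = math.ceil(slide_bytes / (512 * MIB))

# Uncompressed size of one piece when the slide is cut into 64 equal pieces.
piece_mib_with_64 = slide_bytes / 64 / MIB
```

Under these assumptions at least 37 pieces are required, so the 64 pieces requested in the benchmark (about 292 MiB each when uncompressed) stay well within the 512 MiB budget.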

On a 3.2 GHz Intel Core i3 iMac computer with 16 GB of RAM, the convert command from ImageMagick (version 6.8.0-7 with quantum size 8 bits) was unable to complete the request. GraphicsMagick (gm convert -crop, version 1.3.17 with quantum size 8 bits) completed the request in 70 min, using 25 GiB of disk space. tiffmakemosaic from our LargeTIFFTools completed the request in 25 min.

To ascertain that this task can equally be achieved on computers with a modest amount of RAM, we performed the same task on a 6-year-old 2.66 GHz Core2Duo Intel iMac with 2 GiB of RAM. The task was completed in 90 min.

Converting NDPI into TIFF

Splitting a NDPI file into TIFF files. A pathology sample (6.7 cm² of tissue) was scanned at magnification 40x and with 11 focus levels (every 2 microns) by a NanoZoomer, resulting in a 6.5 GiB file in proprietary NDPI format (called file a.ndpi hereafter). On a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GiB of RAM, ndpisplit extracted all 55 images (11 focus levels and 5 magnifications) as independent, single-image TIFF files with JPEG compression in 7.11 min. The size of the largest images was 180224 × 70144 pixels. The speed was limited only by the rate of I/O transfers, since the CPU usage of this task was 1.38 min, out of which the system used 1.30 min. Executing the same task again straight after the first execution took only 0.57 min, because the NDPI file was still in the cache of the operating system.

To ascertain that this task can equally be achieved on computers with a modest amount of RAM, we made a try on a 6-year-old 2.66 GHz Core2Duo Intel PC with 2 GiB of RAM running 32-bit Windows XP Pro SP3. The original file shown in Figure 1, called b.ndpi and weighing 2.07 GiB (largest image: 103168 × 63232 pixels), was split into independent TIFF files in 22 min without swapping.

In comparison, the LOCI Bio-Formats plugins for ImageJ [25], in version 4.4.6 with ImageJ 1.43m, were not able to open the images in file a.ndpi, even at low resolution.

Converting a NDPI file into a multiple-images TIFF file. Alternatively, the same proprietary-format file a.ndpi was converted into a multiple-images TIFF file with ndpi2tiff. On the same computer as before, the conversion time was 7.0 min. Here again, the speed of the process is limited only by the rate of I/O transfers, since the conversion took only 30 s if performed when the NDPI file was still in the cache of the operating system.

Since the resulting TIFF file could not store all 55 images in less than 4 GiB, we passed the option -8 on the command line to ndpi2tiff to request the use of the BigTIFF format extension. The specifications of this extension to the TIFF standard, discussed and published before 2008 [33, 34], are supported by LibTIFF as of version 4.0.0 [27], and therefore by the abundant image viewing and manipulation software which relies on LibTIFF. If the use of the BigTIFF format extension had impeded further exploitation of the produced TIFF file, we could simply have used ndpisplit as above. Alternatively, we could have called the ndpi2tiff command several times, each time requesting extraction of a subset of all images by specifying image numbers after the file name, separated with commas, as in a.ndpi,0,1,2,3,4.
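The two ways of calling ndpi2tiff described above can be assembled in a script as follows. This is a minimal sketch: only the -8 option and the file,number,number,... syntax come from the text, while the helper names `ndpi2tiff_args` and `needs_bigtiff`, and the placement of the option before the file name, are assumptions. The commands are built as argument lists but not executed here.

```python
CLASSIC_TIFF_LIMIT = 4 * 2**30  # classic TIFF files cannot exceed 4 GiB


def needs_bigtiff(expected_output_bytes):
    """True when the output would not fit in a classic (non-Big) TIFF file."""
    return expected_output_bytes >= CLASSIC_TIFF_LIMIT


def ndpi2tiff_args(ndpi_path, image_numbers=None, bigtiff=False):
    """Build an argument list for ndpi2tiff (hypothetical helper).

    image_numbers: optional subset of images, appended to the file name
    and separated by commas, as in a.ndpi,0,1,2,3,4.
    """
    args = ["ndpi2tiff"]
    if bigtiff:
        args.append("-8")  # request the BigTIFF format extension
    target = ndpi_path
    if image_numbers:
        target += "," + ",".join(str(n) for n in image_numbers)
    args.append(target)
    return args


whole_file = ndpi2tiff_args("a.ndpi", bigtiff=needs_bigtiff(int(6.5 * 2**30)))
subset = ndpi2tiff_args("a.ndpi", image_numbers=[0, 1, 2, 3, 4])
```

Such a list could then be handed to `subprocess.run` on a machine where the NDPITools are installed.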

Extracting a small region from a huge image

This task can be useful to visualize, at full resolution, a region of interest which the user has selected on a low-magnification preview image. Therefore, it should be performed as quickly as possible.

From a TIFF file

The task was to extract a rectangular region of size 256 × 256 pixels situated at the bottom right corner of huge TIFF images, and to save it as an independent file. The source images were single-image TIFF files using JPEG compression. Table 1 compares the time needed to complete the task with tifffastcrop from our LargeTIFFTools and with several other software tools on increasingly large TIFF files. Tests were performed on a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GB of RAM and used GraphicsMagick 1.3.17, ImageMagick 6.8.0-7 and the utility tiffcrop from LibTIFF 4.0.3. Noticeably, when treating the largest image, GraphicsMagick needs 50 GiB of free disk space, whereas tifffastcrop doesn't need it.
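One plausible reason why the extraction time in Table 1 does not grow with the image size is that, in a tiled TIFF file, only the tiles intersecting the requested rectangle have to be decoded. The text does not spell out the algorithm of tifffastcrop, so the sketch below only illustrates this general idea; the function name `tiles_covering` and the 512 × 512 tile size are hypothetical.

```python
def tiles_covering(x, y, w, h, tile_w, tile_h):
    """List the (row, col) indices of the tiles that intersect a rectangle.

    (x, y) is the top-left corner of the rectangle and (w, h) its size,
    all in pixels; tiles are numbered from the top-left corner of the image.
    """
    first_col, last_col = x // tile_w, (x + w - 1) // tile_w
    first_row, last_row = y // tile_h, (y + h - 1) // tile_h
    return [(row, col)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# 256 x 256 rectangle at the bottom right corner of the largest test image,
# assuming (hypothetically) 512 x 512 tiles.
width, height = 180224, 70144
corner_tiles = tiles_covering(width - 256, height - 256, 256, 256, 512, 512)

# Total number of tiles in the image, for comparison.
total_tiles = -(-width // 512) * -(-height // 512)
```

Under these assumptions a single tile out of 48224 needs to be read, whatever the dimensions of the full image.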

From a NDPI file

The task was to extract a rectangular region of size 256 × 256 pixels from one of the largest images of the file a.ndpi (size 180224 × 70144 pixels). On a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GB of RAM, the execution time was 0.12 s for one extract, and on average 0.06 s per extract in a series of 20 extracts with locations drawn uniformly at random inside the whole image.

Applications
Integration in digital pathology image servers or virtual slide systems

The NDPITools are being used in several other software projects:

• in a system for automatic blur detection [2, 4];

• in WIDE [22], to deal with NDPI files. WIDE is an open-source biological and digital pathology image archiving and visualization system, which allows the remote user to see, in a browser, images stored in a remote library. In particular, thanks to the high-speed extraction of a rectangular region by ndpisplit, WIDE saves costly disk space, since it doesn't need to store TIFF files converted from NDPI files in addition to the latter.

Exploiting a large set of digital slides

In the framework of a study about invasive low-grade oligodendrogliomas reported elsewhere [8], we had to deal with 303 NDPI files, occupying 122 GiB. On a 3.2 GHz Intel Core i3 iMac computer with 16 GB of RAM, we used ndpisplit in a batch job to convert them into standard TIFF files, which took only a few hours. The experimental -s option of ndpisplit was used to remove the blank filling between scanned regions, resulting in an important disk space saving and in smaller TIFF files (one for each scanned region) which were easier to manipulate afterwards. Then, for each sample, Preview.app and ImageJ were used to inspect the resulting images and manually select the regions of clinical interest. The corresponding extracts of the high magnification images were the subject of automated cell counting and other quantitative analyses using ImageJ. In particular, we collected quantitative data about edema or tissue hyperhydration [8]. This quantity needed a specific image analysis procedure which is not offered by standard morphometry software and, unlike cell density estimates, could not be retrieved by sampling a few fields of view in the microscope. Therefore, virtual microscopy and our tools were essential in this study.

Study of a whole slide of brain tissue invaded by an oligodendroglioma

To demonstrate the possibility of doing research on huge images even with a modest computer, we chose a 3-year-old MacBook Pro laptop computer with a 2.66 GHz Intel Core 2 Duo and 4 GiB of RAM. We used ImageJ and the NDPITools to perform statistics on the upper piece of tissue on the slide shown in Figure 1.

Since the digital slide b.ndpi weighed 2.07 GiB, with a high resolution image of 103168 × 63232 pixels, it was not possible to do the study in a straightforward way. We opened the file b.ndpi as a preview image with the command Plugins > NDPITools > Preview NDPI and selected on it the left tissue sample. Then we used the command Plugins > NDPITools > Custom extract to TIFF Mosaic and asked for extraction as a mosaic of 16 JPEG files, each one needing less than 1 GiB of RAM to open, and with an overlap of 60 pixels. This was completed within a few minutes. Then we applied an ImageJ macro to each of the 16 pieces to identify the dark cell nuclei (those with high chromatin content), based on thresholding the luminosity values of the pixels, as shown in Figure 1. It produced text files with the coordinates and size of each cell nucleus.

Out of the 154240 identified nuclei, 1951 were positioned on the overlapping regions between pieces. Using the overlap feature of our tools made it possible to detect these nuclei properly, since they would have been cut by the boundary of the pieces of the mosaic in the absence of overlap. We avoided double counting by identifying the pairs of nuclei situated in the overlapping regions which were separated by a distance smaller than their radius.
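The aggregation step just described can be sketched as follows. This is an illustrative Python version, not the procedure actually used in the study: the function names are hypothetical, the comparison uses the smaller of the two radii (the text does not say which radius is meant), and the brute-force search is only reasonable when restricted to the nuclei lying in the overlapping regions.

```python
import math


def to_global(nuclei, piece_left, piece_top):
    """Shift (x, y, radius) tuples from piece coordinates to slide coordinates."""
    return [(x + piece_left, y + piece_top, r) for x, y, r in nuclei]


def merge_duplicates(nuclei):
    """Keep one nucleus out of each pair closer than their (smaller) radius.

    nuclei: list of (x, y, radius) in slide coordinates, e.g. the detections
    of all pieces that fall inside the overlapping bands.
    """
    kept = []
    for x, y, r in nuclei:
        duplicate = any(math.hypot(x - kx, y - ky) < min(r, kr)
                        for kx, ky, kr in kept)
        if not duplicate:
            kept.append((x, y, r))
    return kept


# Two pieces sharing a vertical band: the same nucleus is detected in both,
# at slightly different positions; a second nucleus is seen only once.
piece_a = to_global([(55.0, 10.0, 4.0), (20.0, 20.0, 3.0)], 0, 0)
piece_b = to_global([(5.5, 10.0, 4.0)], 50, 0)
merged = merge_duplicates(piece_a + piece_b)
```

In this toy example the nucleus seen at (55.0, 10.0) in the first piece and at (55.5, 10.0) in the second one is counted once.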

As shown in earlier studies [7, 10, 11], these data can be used for research and diagnosis purposes. As an example, Figure 5 shows the distribution of the distance of each cell nucleus to its nearest neighbor. Thanks to the very high number of analyzed cell nuclei, this distribution is obtained with excellent precision.
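A distribution of nearest-neighbor distances such as the one in Figure 5 can be computed along the following lines. This is a sketch with hypothetical function names; the quadratic brute-force search is meant for small examples, and a spatial index (grid or k-d tree) would be preferable for the roughly 150000 nuclei of the study.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each (x, y) point to its closest other point."""
    distances = []
    for i, (x, y) in enumerate(points):
        distances.append(min(math.hypot(x - u, y - v)
                             for j, (u, v) in enumerate(points) if j != i))
    return distances


def density_histogram(values, bin_width):
    """Estimate a probability density: {bin start: fraction / bin_width}."""
    counts = {}
    for v in values:
        b = int(v // bin_width)
        counts[b] = counts.get(b, 0) + 1
    n = len(values)
    return {b * bin_width: c / (n * bin_width) for b, c in sorted(counts.items())}


example_points = [(0, 0), (3, 4), (3, 0)]
example_distances = nearest_neighbor_distances(example_points)
example_density = density_histogram(example_distances, bin_width=1)
```

The histogram is normalized so that the sum of the bin heights multiplied by the bin width equals one, which makes it comparable to a probability density function.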

Conclusions
The LargeTIFFTools, NDPITools and NDPITools plugins for ImageJ efficiently achieve some fundamental functions on large images, and in particular digital slides, for which standard open source software fails or performs badly. They enable both the clinician to examine a single slide and the bioinformatics research team to perform large-scale analysis of many slides, possibly on computer grids [20].

To date, the LargeTIFFTools have been downloaded from more than 388 different IP addresses, the NDPITools from more than 1361 addresses, and the ImageJ plugins from more than 235 addresses. Table 2 lists the distribution of the target platforms among the downloads of the binary files. It shows a broad usage of the different platforms by the community, emphasizing the importance of cross-platform, open source tools.

We have explained how the software was used to study some microscopic properties of brain tissue when invaded by an oligodendroglioma, and we have given an illustrative application to the analysis of a whole-size pathology slide. This suggests other promising applications.

Availability and requirements
a. LargeTIFFTools

• Project name: LargeTIFFTools

• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/largetifftools/

• Operating system(s): Platform independent

• Programming language: C

• Other requirements: libjpeg, libtiff

• License: GNU GPLv3

b. NDPITools

• Project name: NDPITools

• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools/

• Operating system(s): Platform independent

• Programming language: C

• Other requirements: none

• License: GNU GPLv3

For the convenience of users, precompiled binaries are provided for Windows (32 and 64 bits), Mac OS X and Linux.

c. NDPITools plugins for ImageJ

• Project name: NDPITools plugins for ImageJ

• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools/

• Operating system(s): Platform independent

• Programming language: Java

• Other requirements: ImageJ 1.31s or higher, Ant, JAI 1.1.3

• License: GNU GPLv3

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
CD wrote the paper. ML conceived and implemented a first version of the integration into ImageJ as a toolset of macros. CD implemented the software and wrote the documentation. CG, AG and ML contributed suggestions to the software. CD, DA, AG and ML performed software tests. CD, MB, CG, AG and ML selected and provided histological samples. CD performed the statistical analysis of the sample slide. All authors reviewed the manuscript. All authors read and approved the final manuscript.

Acknowledgements
We thank F. Bouhidel and P. Bertheau for their help with the slide scanner of the Pathology Laboratory of the Saint-Louis Hospital in Paris, and C. Klein (Imaging facility, Cordeliers Research Center – INSERM U872, Paris) for tests and suggestions.

The computer, CPU, operating system, and programming language names quoted in this article are trademarks of their respective owners.

References
1. Diamond J, McCleary D: Virtual Microscopy. In Advanced techniques in diagnostic cellular pathology. Edited by Hannon-Fletcher M, Maxwell P. Chichester, UK: John Wiley & Sons, Ltd 2009.

2. Ameisen D, Yunes JB, Deroulers C, Perrier V, Bouhidel F, Battistella M, Legres L, Janin A, Bertheau P: Stack or Trash? Fast quality assessment of virtual slides. Diagnostic Pathology 2013, in press.

3. García Rojo M, Castro AM, Goncalves L: COST Action "EuroTelepath": digital pathology integration in electronic health record, including primary care centres. Diagnostic pathology 2011, 6(Suppl 1):S6.

4. Ameisen D: Intégration des lames virtuelles dans le dossier patient électronique. PhD thesis, Univ Paris Diderot-Paris 7 2013.

5. Collan Y, Torkkeli T, Personen E, Jantunen E, Kosma VM: Application of morphometry in tumor pathology. Analytical and quantitative cytology and histology 1987, 9(2):79–88.

6. Wolfe P, Murphy J, McGinley J, Zhu Z, Jiang W, Gottschall E, Thompson H: Using nuclear morphometry to discriminate the tumorigenic potential of cells: A comparison of statistical methods. Cancer epidemiology biomarkers & prevention 2004, 13(6):976–988.

7. Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B: Histopathological Image Analysis: A Review. Biomedical Engineering, IEEE Reviews in 2009, 2:147–171.

8. Gerin C, Pallud J, Deroulers C, Varlet P, Oppenheim C, Roux FX, Chretien F, Thomas SR, Grammaticos B, Badoual M: Quantitative characterization of the imaging limits of diffuse low-grade oligodendrogliomas. Neuro-Oncology 2013, in press.

9. Wienert S, Heim D, Kotani M, Lindequist B, Stenzinger A, Ishii M, Hufnagl P, Beil M, Dietel M, Denkert C, Klauschen F: CognitionMaster: an object-based image analysis framework. Diagnostic pathology 2013, 8:34.

10. Gunduz C, Yener B, Gultekin SH: The cell graphs of cancer. Bioinformatics 2004, 20 Suppl 1:i145–i151.

11. Gunduz C, Gultekin SH, Yener B: Augmented cell-graphs for automated cancer diagnosis. Bioinformatics 2005, 21 Suppl 2:ii7–ii12.

12. West NP, Dattani M, McShane P, Hutchins G, Grabsch J, Mueller W, Treanor D, Quirke P, Grabsch H: The proportion of tumour cells is an independent predictor for survival in colorectal cancer patients. British Journal of Cancer 2010, 102:1519–1523.

13. Chang H, Han J, Borowsky A, Loss L, Gray JW, Spellman PT, Parvin B: Invariant Delineation of Nuclear Architecture in Glioblastoma Multiforme for Clinical and Molecular Association. IEEE Trans Med Imag 2013, 32(4):670–682.

14. Kayser K, Radziszowski D, Bzdyl P, Sommer R, Kayser G: Towards an automated virtual slide screening: theoretical considerations and practical experiences of automated tissue-based virtual diagnosis to be implemented in the Internet. Diagnostic pathology 2006, 1:10.

15. PLGA Foundation: Meta Analysis Low Grade Glioma Database Project. 2012. [http://www.fightplga.org/research/PLGA-Sponsored_Projects/MetaAnalysis]

16 Garcıa Rojo M Bueno G Slodkowska J Review ofimaging solutions for integrated quantitative im-munohistochemistry in the Pathology daily prac-tice Folia histochemica et cytobiologica 2009 47(3)349ndash354

17 Rasband WS ImageJ 1997ndash2012 [httpimagejnihgovij]

18 ImageMagick Studio LLC ImageMagick 2013 [httpwwwimagemagickorg]

19 GraphicsMagick Group GraphicsMagick 2013 [httpwwwgraphicsmagickorg]

20 Kong J Cooper LAD Wang F Chisolm C MorenoCS Kurc TM Widener PM Brat DJ Saltz JH Acomprehensive framework for classification of nu-clei in digital microscopy imaging An applica-tion to diffuse gliomas In Biomedical Imaging FromNano to Macro 2011 IEEE International Symposium on20112128ndash2131

21 Kayser K Gortler J Borkenfeld S Kayser G Grid com-puting in image analysis Diagnostic pathology 20116(Suppl 1)S12

22 Granier A Olivier M Laborie S Vaudescal S Baecker VTran-Aupiais C WIDE (Web Images and Data En-vironment) 2013 [httpwwwmricnrsfrindexphpm=81]

23 Kayser K Introduction of virtual microscopy inroutine surgical pathology mdash a hypothesis andpersonal view from Europe Diagnostic pathology2012 748

24 Goode A Satyanarayanan M A Vendor-NeutralLibrary and Viewer for Whole-Slide ImagesTech Rep Technical Report CMU-CS-08-136 Com-puter Science Department Carnegie Mellon Univer-sity 2008 [httpreports-archiveadmcscmueduanon2008CMU-CS-08-136pdf]

25 Linkert M Rueden CT Allan C Burel JM Moore WPatterson A Loranger B Moore J Neves C MacDon-ald D Tarkowska A Sticco C Hill E Rossner M EliceiriKW Swedlow JR Metadata matters access to im-age data in the real world Journal of Cell Biology2010 198(5)777ndash782

26 Khushi M Edwards G de Marcos DA Carpenter JEGraham JD Clarke CL Open source tools for man-agement and archiving of digital microscopy datato allow integration with patient pathology andtreatment information Diagnostic pathology 2013822

27 Sam Leffler S the authors of LibTIFF LibTIFF ndashTIFF Library and Utilities 2012 [httpwwwremotesensingorglibtiff]

28 Lane TG Vollbeding G The Independent JPEGGrouprsquos JPEG software 2013 [httpwwwijgorg]

29 Lane TG Vollbeding G the authors of the libjpeg-turbosoftware libjpeg-turbo 2012 [httplibjpeg-turbovirtualglorg]

30 Schneider CA Rasband WS Eliceiri KW NIH Imageto ImageJ 25 years of image analysis Nature Meth-ods 2012 9671ndash675

31 Sacha J Image IO Plugin Bundle 2004 [httpij-pluginssourceforgenetpluginsimageio]

32 Sun Microsystems Inc Java Advanced Library113 2006 [httpwwworaclecomtechnetworkjavacurrent-142188html]

33 BigTIFF Design 2012 [httpwwwremotesensingorglibtiffbigtiffdesignhtml]

34 The BigTIFF File Format Proposal 2008 [httpwwwawaresystemsbeimagingtiffbigtiffhtml]

8

Figures

(a) (b) (c)

Figure 1 - A sample slide(a) macroscopic view of the whole slide (the black rectangle on the left is 1x2 cm) (bc) Influence of themagnification on the quality of results (b) a portion of the slide scanned at magnification level 10x Thewhite contours show the result of an automatic detection of the dark cell nuclei with the ImageJ software Asignificant fraction of the cell nuclei is missed and the contours are rather pixelated (c) the same portion ofthe slide scanned at magnification 40x The white contours show the result of the same automatic detectionAlmost all cell nuclei are detected and the shapes of the contours are much more precise Scale bar 4 microm

Figure 2 - A typical session using ImageJ and the NDPITools pluginsA NDPI file has been opened with the NDPITools plugins and it is displayed as a preview image (image atlargest resolution which still fits into the computerrsquos screen) mdash top window A rectangular region has beenselected and extracted as a TIFF image then opened mdash bottom window

9

Figure 3 - Preview image of a NDPI file with several focalization levels in ImageJThe NDPI file 08ndpi contains images at 5 different focalization levels Therefore its preview image isdisplayed as a stack of 5 images

10

Figure 4 - Dialog box for customized extraction in ImageJ from an NDPI file with production of amosaicThe dialog box shows some options which can be customized while producing a mosaic from a rectangularselection of a NDPI file preview image (here using the file previewed in Figure 3)

Figure 5 - Statistical properties of the cell nuclei with high chromatin content in the tissue sample ofFigure 1The positions of the 154240 identified nuclei were obtained from the analysis with ImageJ of the digital slideon a laptop computer Since the slide was too large to fit into the computerrsquos memory it was turned into amosaic of 16 pieces with overlap of 60 pixels and each piece underwent automated analysis independentlyThen the results were aggregated The graph shows the probability density function of the distance of a cellnucleus to its nearest neighbor in the whole sample

11

TablesTable 1 - Speed comparison of software to extract a 256 times 256 rectangle from a huge TIFF imageTime needed (or indication of failure when the task was not completed) by several software tools to extracta rectangular region of size 256times256 pixels situated at the bottom right corner of huge TIFF images and tosave it as an independent file The input images were single-image tiled TIFF files using JPEG compressionTheir dimensions are indicated in the top row The computer used was a 26 GHz Intel Core i7 Mac Miniwith 16 GiB of RAM and more than 100 GiB of free hard disk The tested software tools were from topto bottom tifffastcrop from our LargeTIFFTools GraphicsMagick 1317 ImageMagick 680-7 and theutility tiffcrop from LibTIFF 403

Image size (px) 11264 times 4384 45056 times 17536 180224 times 70144

tifffastcrop 030 s 030 s 030 sGraphicsMagick 074 s 236 s gt 80 min

ImageMagick 118 s 236 s failedtiffcrop 050 s failed seg fault

Table 2 - Downloads of the NDPIToolsDistribution of the downloads (unique IP address) of the precompiled binaries of the NDPITools betweenMarch 2012 and April 2013

Windows (32 bits) Windows (64 bits) Linux Mac OS X483 542 217 285

12

Page 4: Analyzing huge pathology images with open source … · Analyzing huge pathology images with open source software ... telepathology and teaching [1,2]. In more and more hospitals,

ImageJ integration

In addition to command line use the ndpisplit pro-gram can be driven through the NDPITools pluginsin ImageJ with a point-and-click interface so thatpreviewing the content of a NDPI file at low resolu-tion selecting a portion extracting it at high resolu-tion and finally opening it in ImageJ to apply furthertreatments can be done in an easy and graphical wayFigure 2 shows a screen shot of ImageJ 147m afterextraction of a rectangular zone from a NDPI fileFigure 3 explains what happens when the NDPI filecontains several levels of focalization the previewimage is displayed as a stack

When producing a mosaic the user can requestthat pieces be JPEG files Since the File gt Open

command of versions 1x of ImageJ is unable to openTIFF files with JPEG compression (one has to useplugins) this is way to produce mosaics which canbe opened by click-and-drag onto the window or iconof ImageJ while still saving disk space thanks to ef-ficient compression Figure 4 shows how the mosaicproduction options can be set inside ImageJ throughthe NDPITools plugins

Results and DiscussionPerformance

We compare the performance of our tools on severalfundamental tasks to standard broadly availablesoftware in representative examples and on broadlyavailable computers

Making a mosaic from a huge image

We chose an 8-bit RGB colour JPEG-compressedTIFF file of 103168times63232 pixels originating in thedigitization of a pathology slide The original fileweighted 97501 MiB Loading this image entirelyinto RAM would need at least 3times 103168times 63232 =182 GiB and is presently intractable on most if notall desktop and laptop computers of reasonable cost

The task was to produce from this image a mo-saic of 64 pieces so that each one needs less than512 MiB RAM to open

On a 3.2 GHz Intel Core i3 iMac computer with 16 GB of RAM, the convert command from ImageMagick (version 6.8.0-7 with quantum size 8 bits) was unable to complete the request. GraphicsMagick (gm convert -crop, version 1.3.17 with quantum size 8 bits) completed the request in 70 min, using 25 GiB of disk space. tiffmakemosaic from our LargeTIFFTools completed the request in 25 min.

To ascertain that this task can be equally achieved even on computers with a modest amount of RAM, we performed the same task on a 6-year-old 2.66 GHz Intel Core 2 Duo iMac with 2 GiB of RAM. The task was completed in 90 min.
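To make the mosaic task concrete, the following sketch computes the bounding boxes of a regular mosaic and checks the memory needed by the largest piece. It is not the algorithm of tiffmakemosaic: the 8 × 8 layout (one possible way to obtain 64 pieces), the function names and the overlap convention (each piece extended towards its right and bottom neighbours) are assumptions made for illustration.

```python
MIB = 2 ** 20

def split_axis(length, n, overlap=0):
    """Cut [0, length) into n nearly equal spans, returned as (start, size).
    Every span except the last is extended by `overlap` pixels into the next."""
    bounds = [round(i * length / n) for i in range(n + 1)]
    return [(bounds[i], min(bounds[i + 1] + overlap, length) - bounds[i])
            for i in range(n)]

def mosaic_boxes(width, height, nx, ny, overlap=0):
    """Boxes (x, y, w, h) of an nx-by-ny mosaic, listed row by row."""
    return [(x, y, w, h)
            for y, h in split_axis(height, ny, overlap)
            for x, w in split_axis(width, nx, overlap)]

# Assumed layout: 8 columns by 8 rows, no overlap, 3 bytes per RGB pixel.
boxes = mosaic_boxes(103168, 63232, 8, 8)
largest = max(w * h * 3 for _, _, w, h in boxes)
print(len(boxes), "pieces, largest:", round(largest / MIB, 1), "MiB")
```

With these assumptions each piece is 12896 × 7904 pixels and needs about 292 MiB once decoded, which is below the 512 MiB budget stated in the task.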

Converting NDPI into TIFF

Splitting an NDPI file into TIFF files. A pathology sample (6.7 cm² of tissue) was scanned at magnification 40x and with 11 focus levels (every 2 microns) by a NanoZoomer, resulting in a 6.5 GiB file in proprietary NDPI format (called file a.ndpi hereafter). On a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GiB of RAM, ndpisplit extracted all 55 images (11 focus levels and 5 magnifications) as independent, single-image TIFF files with JPEG compression in 7.11 min. The size of the largest images was 180224 × 70144 pixels. The speed was limited only by the rate of I/O transfers, since the CPU usage of this task was 1.38 min, out of which the system used 1.30 min. Executing the same task again straight after the first execution took only 0.57 min, because the NDPI file was still in the cache of the operating system.

To ascertain that this task can be equally achieved even on computers with a modest amount of RAM, we tried it on a 6-year-old 2.66 GHz Intel Core 2 Duo PC with 2 GiB of RAM, running 32-bit Windows XP Pro SP3. The original file shown in Figure 1, called b.ndpi and weighing 2.07 GiB (largest image: 103168 × 63232 pixels), was split into independent TIFF files in 22 min without swapping.

In comparison, the LOCI Bio-Formats plugin for ImageJ [25], in its version 4.4.6 with ImageJ 1.43m, was not able to open the images in file a.ndpi, even at low resolution.

Converting an NDPI file into a multiple-image TIFF file. Alternatively, the same proprietary-format file a.ndpi was converted into a multiple-image TIFF file with ndpi2tiff. On the same computer as before, the conversion time was 7.0 min. Here again, the speed of the process is limited only by the rate of I/O transfers, since the conversion took only 30 s if performed when the NDPI file was still in the cache of the operating system.

Since the resulting TIFF file could not store all 55 images in less than 4 GiB, we passed the option -8 on the command line to ndpi2tiff to request using the BigTIFF format extension. The specifications of this extension to the TIFF standard, discussed and published before 2008 [33, 34], are supported by LibTIFF as of version 4.0.0 [27], and therefore by the abundant image viewing and manipulation software which relies on LibTIFF. If the use of the BigTIFF format extension had impeded the further exploitation of the produced TIFF file, we could have simply used ndpisplit as above. Or we could have called the ndpi2tiff command several times, each time requesting extraction of a subset of all images by specifying image numbers after the file name, separated with commas, as in a.ndpi,0,1,2,3,4.
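The choice described above can be sketched as a small helper that assembles an ndpi2tiff command line. The 4 GiB limit comes from the 32-bit file offsets of classic TIFF; the -8 option and the comma-separated image numbers are those mentioned in the text. Everything else is an assumption made for illustration: the helper name, the idea of passing an estimate of the output size, and the placement of the option before the file name. Nothing is executed here.

```python
CLASSIC_TIFF_LIMIT = 2 ** 32  # classic TIFF stores file offsets on 32 bits (4 GiB)

def ndpi2tiff_command(ndpi_path, expected_output_bytes, image_numbers=None):
    """Build an argument list for ndpi2tiff (hypothetical helper)."""
    target = ndpi_path
    if image_numbers:
        # Subset of images: numbers appended to the file name, comma-separated.
        target += "," + ",".join(str(n) for n in image_numbers)
    args = ["ndpi2tiff"]
    if expected_output_bytes >= CLASSIC_TIFF_LIMIT:
        args.append("-8")  # request the BigTIFF format extension
    args.append(target)
    return args

print(ndpi2tiff_command("a.ndpi", 6.5 * 2 ** 30))
print(ndpi2tiff_command("a.ndpi", 2 ** 30, image_numbers=[0, 1, 2, 3, 4]))
```

The second call illustrates the alternative mentioned in the text: extracting a subset of images small enough to fit in a classic TIFF file.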

Extracting a small region from a huge image

This task can be useful to visualize at full resolution a region of interest which the user has selected on a low-magnification preview image. Therefore, it should be performed as quickly as possible.

From a TIFF file

The task was to extract a rectangular region of size 256 × 256 pixels situated at the bottom right corner of huge TIFF images and to save it as an independent file. The source images were single-image TIFF files using JPEG compression. Table 1 compares the time needed to complete the task with tifffastcrop from our LargeTIFFTools and with several software tools on increasingly large TIFF files. Tests were performed on a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GB of RAM and used GraphicsMagick 1.3.17, ImageMagick 6.8.0-7 and the utility tiffcrop from LibTIFF 4.0.3. Notably, when treating the largest image, GraphicsMagick needs 50 GiB of free disk space, whereas tifffastcrop needs none.
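One reason why such an extraction can take a time independent of the image size is that a tiled TIFF file lets a program decode only the tiles that intersect the requested region. The sketch below lists those tiles; it illustrates the general principle and is not taken from the source code of tifffastcrop. The tile size of 512 × 512 pixels is an assumption, and tiles are numbered from left to right and top to bottom, as in the TIFF specification.

```python
def tiles_covering(region, tile_w, tile_h, image_w, image_h):
    """Indices of the tiles that intersect region = (x, y, w, h)."""
    x, y, w, h = region
    if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > image_w or y + h > image_h:
        raise ValueError("region must be non-empty and lie inside the image")
    tiles_across = -(-image_w // tile_w)  # ceiling division
    first_col, last_col = x // tile_w, (x + w - 1) // tile_w
    first_row, last_row = y // tile_h, (y + h - 1) // tile_h
    return [row * tiles_across + col
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]

# Bottom right 256 x 256 corner of a 180224 x 70144 image, assumed 512 x 512 tiles.
corner = (180224 - 256, 70144 - 256, 256, 256)
print(tiles_covering(corner, 512, 512, 180224, 70144))  # [48223]
```

Under these assumptions the image contains 352 × 137 = 48224 tiles, of which a single one has to be read and decoded to produce the extract.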

From an NDPI file

The task was to extract a rectangular region of size 256 × 256 pixels from one of the largest images of the file a.ndpi (size 180224 × 70144). On a 2.6 GHz Intel Core i7 Mac Mini computer with 16 GB of RAM, the execution time was 0.12 s for one extract, and on average 0.06 s per extract in a series of 20 extracts with locations drawn uniformly at random inside the whole image.
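A series of random extracts such as the one described above could be prepared and timed as in the following sketch. The two function names, the fixed seed and the `extract` callable (which would wrap whatever tool performs the actual extraction) are hypothetical and serve only as an illustration of the protocol.

```python
import random
import time

def random_origins(image_w, image_h, crop_w, crop_h, count, seed=None):
    """Top-left corners drawn uniformly so that each crop fits inside the image."""
    rng = random.Random(seed)
    return [(rng.randint(0, image_w - crop_w), rng.randint(0, image_h - crop_h))
            for _ in range(count)]

def mean_seconds_per_extract(extract, origins):
    """Average wall-clock time of extract(x, y) over all origins."""
    start = time.perf_counter()
    for x, y in origins:
        extract(x, y)
    return (time.perf_counter() - start) / len(origins)

origins = random_origins(180224, 70144, 256, 256, count=20, seed=1)
```

Passing a function that calls the extraction tool as `extract` then yields the average time per extract for the series.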

Applications

Integration in digital pathology image servers or virtual slide systems

The NDPITools are being used in several other software projects:

• in a system for automatic blur detection [24];

• in WIDE [22], to deal with NDPI files. WIDE is an open-source biological and digital pathology image archiving and visualization system, which allows the remote user to see images stored in a remote library in a browser. In particular, thanks to the feature of high-speed extraction of a rectangular region by ndpisplit, WIDE saves costly disk space since it doesn't need to store TIFF files converted from NDPI files in addition to the latter.

Exploiting a large set of digital slides

In the framework of a study about invasive low-grade oligodendrogliomas reported elsewhere [8], we had to deal with 303 NDPI files, occupying 122 GiB. On a 3.2 GHz Intel Core i3 iMac computer with 16 GB of RAM, we used ndpisplit in a batch job to convert them into standard TIFF files, which took only a few hours. The experimental -s option of ndpisplit was used to remove the blank filling between scanned regions, resulting in an important disk space saving and in smaller TIFF files (one for each scanned region), which were easier to manipulate afterwards. Then, for each sample, Preview.app and ImageJ were used to inspect the resulting images and manually select the regions of clinical interest. The corresponding extracts of the high-magnification images were the subject of automated cell counting and other quantitative analyses using ImageJ. In particular, we collected quantitative data about edema or tissue hyperhydration [8]. This quantity needed a specific image analysis procedure which is not offered by standard morphometry software and, unlike cell density estimates, could not be retrieved by sampling a few fields of view in the microscope. Therefore, virtual microscopy and our tools were essential in this study.
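A batch conversion of this kind can be scripted in a few lines. The sketch below only prepares the command lines and prints them unless asked to run them. It assumes that the -s option takes no argument and precedes the file name; the function names and the folder layout (all .ndpi files in one directory) are also assumptions made for illustration.

```python
from pathlib import Path
import subprocess

def ndpisplit_commands(paths):
    """One ndpisplit command line per NDPI file, with the -s option."""
    return [["ndpisplit", "-s", str(p)] for p in paths]

def convert_folder(folder, dry_run=True):
    """Convert every .ndpi file of a folder; only print commands if dry_run."""
    files = sorted(Path(folder).glob("*.ndpi"))
    for cmd in ndpisplit_commands(files):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
```

Calling `convert_folder("slides")` on a hypothetical folder named slides would list the commands, and `convert_folder("slides", dry_run=False)` would execute them one after the other.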

Study of a whole slide of brain tissue invaded by an oligodendroglioma

To demonstrate the possibility to do research on huge images even with a modest computer, we chose a 3-year-old MacBook Pro laptop computer with a 2.66 GHz Intel Core 2 Duo and 4 GiB of RAM. We used ImageJ and the NDPITools to perform statistics on the upper piece of tissue on the slide shown in Figure 1.

Since the digital slide b.ndpi weighed 2.07 GiB, with a high-resolution image of 103168 × 63232 pixels, it was not possible to do the study in a straightforward way. We opened the file b.ndpi as a preview image with the command Plugins > NDPITools > Preview NDPI and selected on it the left tissue sample. Then we used the command Plugins > NDPITools > Custom extract to TIFF / Mosaic and asked for extraction as a mosaic of 16 JPEG files, each one needing less than 1 GiB of RAM to open, and with an overlap of 60 pixels. This was completed within a few minutes. Then we applied an ImageJ macro to each of the 16 pieces to identify the dark cell nuclei (those with high chromatin content), based on thresholding the luminosity values of the pixels, as shown in Figure 1. It produced text files with the coordinates and size of each cell nucleus.
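The principle of detection by thresholding can be illustrated with a short, self-contained sketch. It is not the ImageJ macro used in the study: the threshold value, the use of 4-connectivity to group dark pixels and the output format (centroid and size in pixels) are assumptions made for illustration.

```python
from collections import deque

def detect_dark_blobs(gray, threshold, min_size=1):
    """Group pixels darker than `threshold` into 4-connected blobs.
    `gray` is a list of rows of luminosity values.
    Returns a list of (centroid_x, centroid_y, size_in_pixels)."""
    height, width = len(gray), len(gray[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or gray[y0][x0] >= threshold:
                continue
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            sum_x = sum_y = size = 0
            while queue:  # breadth-first flood fill of one blob
                x, y = queue.popleft()
                sum_x, sum_y, size = sum_x + x, sum_y + y, size + 1
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and gray[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if size >= min_size:
                blobs.append((sum_x / size, sum_y / size, size))
    return blobs

image = [
    [200, 200, 200, 200, 200],
    [200,  50,  50, 200, 200],
    [200,  50,  50, 200,  40],
    [200, 200, 200, 200,  40],
]
print(detect_dark_blobs(image, threshold=100))
# [(1.5, 1.5, 4), (4.0, 2.5, 2)]
```

The `min_size` parameter gives a simple way to discard isolated dark pixels that are too small to be nuclei.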

Out of the 154240 identified nuclei, 1951 were positioned on the overlapping regions between pieces. Using the overlap feature of our tools made it possible to detect these nuclei properly, since they would have been cut by the boundary of the pieces of the mosaic in the absence of overlap. We avoided double counting by identifying the pairs of nuclei situated in the overlapping regions which were separated by a distance smaller than their radius.
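The removal of double counts can be sketched as follows. The code assumes that the positions found in each piece have already been converted to whole-slide coordinates (by adding the origin of the piece) and that only the nuclei lying in overlapping regions are passed in, so that a simple quadratic search is affordable. Interpreting "their radius" as the smaller of the two radii is also an assumption, as are the names used.

```python
import math

def remove_duplicates(nuclei):
    """nuclei: list of (x, y, radius, piece_id) in whole-slide coordinates.
    Two detections from different pieces closer than the smaller of their
    radii are taken to be the same nucleus; the first one is kept."""
    kept = []
    for x, y, r, piece in nuclei:
        duplicate = any(
            piece != kept_piece
            and math.hypot(x - kx, y - ky) < min(r, kept_r)
            for kx, ky, kept_r, kept_piece in kept)
        if not duplicate:
            kept.append((x, y, r, piece))
    return kept

detections = [(100, 100, 5, 0), (102, 101, 5, 1), (130, 100, 5, 1)]
print(len(remove_duplicates(detections)))  # 2
```

Detections coming from the same piece are never merged, since two distinct nuclei found in one image cannot be a double count.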

As shown in earlier studies [7, 10, 11], these data can be used for research and diagnosis purposes. As an example, Figure 5 shows the distribution of the distance of each cell nucleus to its nearest neighbor. Thanks to the very high number of analyzed cell nuclei, this distribution is obtained with an excellent precision.
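A nearest-neighbor distance distribution of this kind can be computed as in the sketch below, which is not the code used for Figure 5. With more than 10^5 nuclei, comparing all pairs would be slow, so the points are first binned on a regular grid and the search proceeds ring by ring around each point. The cell size and bin width are free parameters chosen by the user, and the function names are assumptions.

```python
import math
from collections import defaultdict

def nearest_neighbor_distances(points, cell):
    """Distance from each (x, y) point to its nearest other point."""
    if len(points) < 2:
        return []
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(i)
    cols = [c for c, _ in grid]
    rows = [r for _, r in grid]
    max_ring = max(max(cols) - min(cols), max(rows) - min(rows)) + 1
    result = []
    for i, (x, y) in enumerate(points):
        cx, cy = int(x // cell), int(y // cell)
        best = math.inf
        for ring in range(max_ring + 1):
            for gx in range(cx - ring, cx + ring + 1):
                for gy in range(cy - ring, cy + ring + 1):
                    if max(abs(gx - cx), abs(gy - cy)) != ring:
                        continue  # visit only the cells on the ring border
                    for j in grid.get((gx, gy), ()):
                        if j != i:
                            px, py = points[j]
                            best = min(best, math.hypot(x - px, y - py))
            # Points in cells beyond this ring are at least ring * cell away.
            if best <= ring * cell:
                break
        result.append(best)
    return result

def density_histogram(values, bin_width):
    """Map each bin start to an estimated probability density."""
    counts = defaultdict(int)
    for v in values:
        counts[int(v // bin_width)] += 1
    n = len(values)
    return {b * bin_width: c / (n * bin_width) for b, c in sorted(counts.items())}

distances = nearest_neighbor_distances([(0, 0), (3, 4), (10, 0)], cell=2)
histogram = density_histogram(distances, bin_width=5)
```

In this toy example the first two points are each other's nearest neighbor at distance 5, and the third point is at distance √65 ≈ 8.06 from the second one.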

Conclusions

The LargeTIFFTools, NDPITools and NDPITools plugins for ImageJ efficiently perform some fundamental functions on large images, and in particular digital slides, for which standard open source software fails or performs badly. They enable both the clinician to examine a single slide and the bioinformatics research team to perform large-scale analysis of many slides, possibly on computer grids [20].

To date, the LargeTIFFTools have been downloaded from more than 388 different IP addresses, the NDPITools from more than 1361 addresses, and the ImageJ plugins from more than 235 addresses. Table 2 lists the distribution of the target platforms among the downloads of the binary files. It shows a broad usage of the different platforms by the community, emphasizing the importance of cross-platform, open source tools.

We have explained how the software was used to study some microscopic properties of brain tissue when invaded by an oligodendroglioma, and we have given an illustrative application to the analysis of a whole-size pathology slide. This suggests other promising applications.

Availability and requirements

a. LargeTIFFTools

• Project name: LargeTIFFTools

• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/largetifftools/

• Operating system(s): Platform independent

• Programming language: C

• Other requirements: libjpeg, libtiff

• License: GNU GPLv3

b. NDPITools

• Project name: NDPITools

• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools/

• Operating system(s): Platform independent

• Programming language: C

• Other requirements: none

• License: GNU GPLv3

For the convenience of users, precompiled binaries are provided for Windows (32 and 64 bits), Mac OS X and Linux.

c. NDPITools plugins for ImageJ

• Project name: NDPITools plugins for ImageJ

• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools/

• Operating system(s): Platform independent

• Programming language: Java

• Other requirements: ImageJ 1.31s or higher, Ant, JAI 1.1.3

• License: GNU GPLv3

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

CD wrote the paper. ML conceived and implemented a first version of the integration into ImageJ as a toolset of macros. CD implemented the software and wrote the documentation. CG, AG and ML contributed suggestions to the software. CD, DA, AG and ML performed software tests. CD, MB, CG, AG and ML selected and provided histological samples. CD performed the statistical analysis of the sample slide. All authors reviewed the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We thank F. Bouhidel and P. Bertheau for their help with the slide scanner of the Pathology Laboratory of the Saint-Louis Hospital in Paris, and C. Klein (Imaging facility, Cordeliers Research Center – INSERM U872, Paris) for tests and suggestions.

The computer, CPU, operating system and programming language names quoted in this article are trademarks of their respective owners.

References

1. Diamond J, McCleary D: Virtual Microscopy. In Advanced techniques in diagnostic cellular pathology. Edited by Hannon-Fletcher M, Maxwell P, Chichester, UK: John Wiley & Sons, Ltd 2009.

2. Ameisen D, Yunes JB, Deroulers C, Perrier V, Bouhidel F, Battistella M, Legres L, Janin A, Bertheau P: Stack or Trash? Fast quality assessment of virtual slides. Diagnostic Pathology 2013, in press.

3. García Rojo M, Castro AM, Gonçalves L: COST Action "EuroTelepath": digital pathology integration in electronic health record, including primary care centres. Diagnostic Pathology 2011, 6(Suppl 1):S6.

4. Ameisen D: Intégration des lames virtuelles dans le dossier patient électronique. PhD thesis, Univ Paris Diderot-Paris 7 2013.

5. Collan Y, Torkkeli T, Personen E, Jantunen E, Kosma VM: Application of morphometry in tumor pathology. Analytical and quantitative cytology and histology 1987, 9(2):79–88.

6. Wolfe P, Murphy J, McGinley J, Zhu Z, Jiang W, Gottschall E, Thompson H: Using nuclear morphometry to discriminate the tumorigenic potential of cells: A comparison of statistical methods. Cancer epidemiology, biomarkers & prevention 2004, 13(6):976–988.

7. Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B: Histopathological Image Analysis: A Review. Biomedical Engineering, IEEE Reviews in 2009, 2:147–171.

8. Gerin C, Pallud J, Deroulers C, Varlet P, Oppenheim C, Roux FX, Chrétien F, Thomas SR, Grammaticos B, Badoual M: Quantitative characterization of the imaging limits of diffuse low-grade oligodendrogliomas. Neuro-Oncology 2013, in press.

9. Wienert S, Heim D, Kotani M, Lindequist B, Stenzinger A, Ishii M, Hufnagl P, Beil M, Dietel M, Denkert C, Klauschen F: CognitionMaster: an object-based image analysis framework. Diagnostic Pathology 2013, 8:34.

10. Gunduz C, Yener B, Gultekin SH: The cell graphs of cancer. Bioinformatics 2004, 20 Suppl 1:i145–i151.

11. Gunduz C, Gultekin SH, Yener B: Augmented cell-graphs for automated cancer diagnosis. Bioinformatics 2005, 21 Suppl 2:ii7–ii12.

12. West NP, Dattani M, McShane P, Hutchins G, Grabsch J, Mueller W, Treanor D, Quirke P, Grabsch H: The proportion of tumour cells is an independent predictor for survival in colorectal cancer patients. British Journal of Cancer 2010, 102:1519–1523.

13. Chang H, Han J, Borowsky A, Loss L, Gray JW, Spellman PT, Parvin B: Invariant Delineation of Nuclear Architecture in Glioblastoma Multiforme for Clinical and Molecular Association. IEEE Trans Med Imag 2013, 32(4):670–682.

14. Kayser K, Radziszowski D, Bzdyl P, Sommer R, Kayser G: Towards an automated virtual slide screening: theoretical considerations and practical experiences of automated tissue-based virtual diagnosis to be implemented in the Internet. Diagnostic Pathology 2006, 1:10.

15. PLGA Foundation: Meta Analysis Low Grade Glioma Database Project. 2012, [http://www.fightplga.org/research/PLGA-Sponsored_Projects/MetaAnalysis].

16. García Rojo M, Bueno G, Slodkowska J: Review of imaging solutions for integrated quantitative immunohistochemistry in the Pathology daily practice. Folia histochemica et cytobiologica 2009, 47(3):349–354.

17. Rasband WS: ImageJ. 1997–2012, [http://imagej.nih.gov/ij/].

18. ImageMagick Studio LLC: ImageMagick. 2013, [http://www.imagemagick.org/].

19. GraphicsMagick Group: GraphicsMagick. 2013, [http://www.graphicsmagick.org/].

20. Kong J, Cooper LAD, Wang F, Chisolm C, Moreno CS, Kurc TM, Widener PM, Brat DJ, Saltz JH: A comprehensive framework for classification of nuclei in digital microscopy imaging: An application to diffuse gliomas. In Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on 2011:2128–2131.

21. Kayser K, Görtler J, Borkenfeld S, Kayser G: Grid computing in image analysis. Diagnostic Pathology 2011, 6(Suppl 1):S12.

22. Granier A, Olivier M, Laborie S, Vaudescal S, Baecker V, Tran-Aupiais C: WIDE (Web Images and Data Environment). 2013, [http://www.mri.cnrs.fr/index.php?m=81].

23. Kayser K: Introduction of virtual microscopy in routine surgical pathology - a hypothesis and personal view from Europe. Diagnostic Pathology 2012, 7:48.

24. Goode A, Satyanarayanan M: A Vendor-Neutral Library and Viewer for Whole-Slide Images. Tech. Rep. Technical Report CMU-CS-08-136, Computer Science Department, Carnegie Mellon University 2008, [http://reports-archive.adm.cs.cmu.edu/anon/2008/CMU-CS-08-136.pdf].

25. Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, Loranger B, Moore J, Neves C, MacDonald D, Tarkowska A, Sticco C, Hill E, Rossner M, Eliceiri KW, Swedlow JR: Metadata matters: access to image data in the real world. Journal of Cell Biology 2010, 198(5):777–782.

26. Khushi M, Edwards G, de Marcos DA, Carpenter JE, Graham JD, Clarke CL: Open source tools for management and archiving of digital microscopy data to allow integration with patient pathology and treatment information. Diagnostic Pathology 2013, 8:22.

27. Sam Leffler S, the authors of LibTIFF: LibTIFF – TIFF Library and Utilities. 2012, [http://www.remotesensing.org/libtiff/].

28. Lane TG, Vollbeding G: The Independent JPEG Group's JPEG software. 2013, [http://www.ijg.org/].

29. Lane TG, Vollbeding G, the authors of the libjpeg-turbo software: libjpeg-turbo. 2012, [http://libjpeg-turbo.virtualgl.org/].

30. Schneider CA, Rasband WS, Eliceiri KW: NIH Image to ImageJ: 25 years of image analysis. Nature Methods 2012, 9:671–675.

31. Sacha J: Image IO Plugin Bundle. 2004, [http://ij-plugins.sourceforge.net/plugins/imageio/].

32. Sun Microsystems Inc: Java Advanced Library 1.1.3. 2006, [http://www.oracle.com/technetwork/java/current-142188.html].

33. BigTIFF Design. 2012, [http://www.remotesensing.org/libtiff/bigtiffdesign.html].

34. The BigTIFF File Format Proposal. 2008, [http://www.awaresystems.be/imaging/tiff/bigtiff.html].


Figures


Figure 1 - A sample slide
(a) Macroscopic view of the whole slide (the black rectangle on the left is 1 × 2 cm). (b-c) Influence of the magnification on the quality of results. (b) A portion of the slide scanned at magnification level 10x. The white contours show the result of an automatic detection of the dark cell nuclei with the ImageJ software. A significant fraction of the cell nuclei is missed and the contours are rather pixelated. (c) The same portion of the slide scanned at magnification 40x. The white contours show the result of the same automatic detection. Almost all cell nuclei are detected and the shapes of the contours are much more precise. Scale bar: 4 µm.

Figure 2 - A typical session using ImageJ and the NDPITools plugins
An NDPI file has been opened with the NDPITools plugins and is displayed as a preview image (the image at the largest resolution which still fits into the computer's screen) in the top window. A rectangular region has been selected and extracted as a TIFF image, then opened in the bottom window.


Figure 3 - Preview image of an NDPI file with several focus levels in ImageJ
The NDPI file 08.ndpi contains images at 5 different focus levels. Therefore, its preview image is displayed as a stack of 5 images.


Figure 4 - Dialog box for customized extraction in ImageJ from an NDPI file with production of a mosaic
The dialog box shows some options which can be customized while producing a mosaic from a rectangular selection of an NDPI file preview image (here using the file previewed in Figure 3).

Figure 5 - Statistical properties of the cell nuclei with high chromatin content in the tissue sample of Figure 1
The positions of the 154240 identified nuclei were obtained from the analysis with ImageJ of the digital slide on a laptop computer. Since the slide was too large to fit into the computer's memory, it was turned into a mosaic of 16 pieces with an overlap of 60 pixels, and each piece underwent automated analysis independently. Then the results were aggregated. The graph shows the probability density function of the distance of a cell nucleus to its nearest neighbor in the whole sample.


Tables

Table 1 - Speed comparison of software to extract a 256 × 256 rectangle from a huge TIFF image
Time needed (or indication of failure when the task was not completed) by several software tools to extract a rectangular region of size 256 × 256 pixels situated at the bottom right corner of huge TIFF images and to save it as an independent file. The input images were single-image tiled TIFF files using JPEG compression. Their dimensions are indicated in the top row. The computer used was a 2.6 GHz Intel Core i7 Mac Mini with 16 GiB of RAM and more than 100 GiB of free hard disk space. The tested software tools were, from top to bottom: tifffastcrop from our LargeTIFFTools, GraphicsMagick 1.3.17, ImageMagick 6.8.0-7 and the utility tiffcrop from LibTIFF 4.0.3.

Image size (px)    11264 × 4384    45056 × 17536    180224 × 70144

tifffastcrop       0.30 s          0.30 s           0.30 s
GraphicsMagick     0.74 s          236 s            > 80 min
ImageMagick        118 s           236 s            failed
tiffcrop           0.50 s          failed           seg. fault

Table 2 - Downloads of the NDPITools
Distribution of the downloads (unique IP addresses) of the precompiled binaries of the NDPITools between March 2012 and April 2013.

Windows (32 bits)    Windows (64 bits)    Linux    Mac OS X
483                  542                  217      285


31 Sacha J Image IO Plugin Bundle 2004 [httpij-pluginssourceforgenetpluginsimageio]

32 Sun Microsystems Inc Java Advanced Library113 2006 [httpwwworaclecomtechnetworkjavacurrent-142188html]

33 BigTIFF Design 2012 [httpwwwremotesensingorglibtiffbigtiffdesignhtml]

34 The BigTIFF File Format Proposal 2008 [httpwwwawaresystemsbeimagingtiffbigtiffhtml]

8

Figures

(a) (b) (c)

Figure 1 - A sample slide(a) macroscopic view of the whole slide (the black rectangle on the left is 1x2 cm) (bc) Influence of themagnification on the quality of results (b) a portion of the slide scanned at magnification level 10x Thewhite contours show the result of an automatic detection of the dark cell nuclei with the ImageJ software Asignificant fraction of the cell nuclei is missed and the contours are rather pixelated (c) the same portion ofthe slide scanned at magnification 40x The white contours show the result of the same automatic detectionAlmost all cell nuclei are detected and the shapes of the contours are much more precise Scale bar 4 microm

Figure 2 - A typical session using ImageJ and the NDPITools pluginsA NDPI file has been opened with the NDPITools plugins and it is displayed as a preview image (image atlargest resolution which still fits into the computerrsquos screen) mdash top window A rectangular region has beenselected and extracted as a TIFF image then opened mdash bottom window

9

Figure 3 - Preview image of a NDPI file with several focalization levels in ImageJThe NDPI file 08ndpi contains images at 5 different focalization levels Therefore its preview image isdisplayed as a stack of 5 images

10

Figure 4 - Dialog box for customized extraction in ImageJ from an NDPI file with production of amosaicThe dialog box shows some options which can be customized while producing a mosaic from a rectangularselection of a NDPI file preview image (here using the file previewed in Figure 3)

Figure 5 - Statistical properties of the cell nuclei with high chromatin content in the tissue sample ofFigure 1The positions of the 154240 identified nuclei were obtained from the analysis with ImageJ of the digital slideon a laptop computer Since the slide was too large to fit into the computerrsquos memory it was turned into amosaic of 16 pieces with overlap of 60 pixels and each piece underwent automated analysis independentlyThen the results were aggregated The graph shows the probability density function of the distance of a cellnucleus to its nearest neighbor in the whole sample

11

TablesTable 1 - Speed comparison of software to extract a 256 times 256 rectangle from a huge TIFF imageTime needed (or indication of failure when the task was not completed) by several software tools to extracta rectangular region of size 256times256 pixels situated at the bottom right corner of huge TIFF images and tosave it as an independent file The input images were single-image tiled TIFF files using JPEG compressionTheir dimensions are indicated in the top row The computer used was a 26 GHz Intel Core i7 Mac Miniwith 16 GiB of RAM and more than 100 GiB of free hard disk The tested software tools were from topto bottom tifffastcrop from our LargeTIFFTools GraphicsMagick 1317 ImageMagick 680-7 and theutility tiffcrop from LibTIFF 403

Image size (px) 11264 times 4384 45056 times 17536 180224 times 70144

tifffastcrop 030 s 030 s 030 sGraphicsMagick 074 s 236 s gt 80 min

ImageMagick 118 s 236 s failedtiffcrop 050 s failed seg fault

Table 2 - Downloads of the NDPIToolsDistribution of the downloads (unique IP address) of the precompiled binaries of the NDPITools betweenMarch 2012 and April 2013

Windows (32 bits) Windows (64 bits) Linux Mac OS X483 542 217 285

12

a 3-year-old MacBook Pro laptop computer, with a 2.66 GHz Intel Core 2 Duo and 4 GiB of RAM. We used ImageJ and the NDPITools to perform statistics on the upper piece of tissue on the slide shown in Figure 1.

Since the digital slide b.ndpi weighed 2.07 GiB, with a high-resolution image of 103168 × 63232 pixels, it could not be studied in a straightforward way. We opened the file b.ndpi as a preview image with the command Plugins > NDPITools > Preview NDPI and selected the left tissue sample on it. Then we used the command Plugins > NDPITools > Custom extract to TIFF / Mosaic and asked for extraction as a mosaic of 16 JPEG files, each one needing less than 1 GiB of RAM to open, with an overlap of 60 pixels. This was completed within a few minutes. Then we applied an ImageJ macro to each of the 16 pieces to identify the dark cell nuclei (those with high chromatin content) by thresholding the luminosity values of the pixels, as shown in Figure 1. The macro produced text files with the coordinates and size of each cell nucleus.
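The mosaic step can be illustrated with a short Python sketch that computes the pixel rectangles of a regular grid of overlapping pieces. This is not the NDPITools implementation: the function names, the 4 × 4 arrangement of the 16 pieces, and the convention that each piece extends 60 pixels past its right and bottom edges are assumptions made here for illustration only. The helper `uncompressed_bytes` simply shows why the full 103168 × 63232 image cannot be opened uncompressed with 4 GiB of RAM.

```python
def mosaic_tiles(width, height, nx, ny, overlap):
    """Return (x0, y0, x1, y1) rectangles for an nx-by-ny mosaic.

    Assumption (not taken from the NDPITools): each piece is extended by
    `overlap` pixels to the right and to the bottom, clipped to the image,
    so that two adjacent pieces share a band `overlap` pixels wide.
    """
    tiles = []
    for j in range(ny):
        for i in range(nx):
            # Core (non-overlapping) extent of piece (i, j).
            x0 = i * width // nx
            x1 = (i + 1) * width // nx
            y0 = j * height // ny
            y1 = (j + 1) * height // ny
            # Extend right and bottom edges by the overlap.
            tiles.append((x0, y0,
                          min(width, x1 + overlap),
                          min(height, y1 + overlap)))
    return tiles


def uncompressed_bytes(width, height, bytes_per_pixel=3):
    """Memory needed to hold a decoded 8-bit RGB image."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    # Hypothetical 4 x 4 layout applied to the whole high-resolution image.
    tiles = mosaic_tiles(103168, 63232, nx=4, ny=4, overlap=60)
    print(len(tiles), "pieces; first piece:", tiles[0])
    full = uncompressed_bytes(103168, 63232)
    print("whole image, decoded:", round(full / 2**30, 1), "GiB")
```

In the study above the mosaic was made from a selection of the slide rather than from the whole image, so the actual piece sizes were smaller than those of this hypothetical layout.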

Out of the 154240 identified nuclei, 1951 were located in the overlapping regions between pieces. The overlap feature of our tools made it possible to detect these nuclei properly: without overlap, they would have been cut by the boundaries between the pieces of the mosaic. We avoided double counting by identifying the pairs of nuclei that were situated in the overlapping regions and separated by a distance smaller than their radius.
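A minimal sketch of such a duplicate-removal step is given below in Python. It is an illustration under stated assumptions, not the procedure actually used: detections are assumed to be already expressed in whole-slide pixel coordinates as (x, y, radius) tuples, two detections are treated as the same nucleus when their distance is smaller than the larger of the two radii, and the test is applied to all detections rather than only to those in the overlapping regions. The function name `merge_duplicates` is hypothetical.

```python
import math


def _has_close_neighbor(grid, cell, x, y, r):
    """True if a kept detection lies closer than max(r, its radius)."""
    cx, cy = int(x // cell), int(y // cell)
    for gx in (cx - 1, cx, cx + 1):
        for gy in (cy - 1, cy, cy + 1):
            for kx, ky, kr in grid.get((gx, gy), ()):
                if math.hypot(x - kx, y - ky) < max(r, kr):
                    return True
    return False


def merge_duplicates(nuclei):
    """Drop detections that duplicate an already kept, larger detection.

    nuclei: iterable of (x, y, radius) in whole-slide pixel coordinates.
    """
    # Largest first, so that the more complete detection of a pair is kept.
    ordered = sorted(nuclei, key=lambda n: n[2], reverse=True)
    if not ordered:
        return []
    # Grid cells as wide as the largest radius: any duplicate of a point
    # must then lie in the same cell or in one of its 8 neighbours.
    cell = max(ordered[0][2], 1.0)
    grid = {}
    kept = []
    for x, y, r in ordered:
        if _has_close_neighbor(grid, cell, x, y, r):
            continue
        kept.append((x, y, r))
        grid.setdefault((int(x // cell), int(y // cell)), []).append((x, y, r))
    return kept
```

The grid of cells keeps the number of distance tests per detection small, which matters when more than a hundred thousand nuclei are aggregated.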

As shown in earlier studies [7, 10, 11], these data can be used for research and diagnosis purposes. As an example, Figure 5 shows the distribution of the distance of each cell nucleus to its nearest neighbor. Thanks to the very high number of analyzed cell nuclei, this distribution is obtained with excellent precision.
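The nearest-neighbor statistic can be sketched in a few lines of Python. This is only an illustration, not the analysis code used for Figure 5: the function names and the bin width are assumptions, and the sweep over points sorted by x coordinate is one simple way to avoid comparing every pair of nuclei.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each (x, y) point to its nearest other point.

    The result follows the x-sorted order of the points, which is
    sufficient for building a distribution.
    """
    pts = sorted(points)
    n = len(pts)
    result = []
    for i, (xi, yi) in enumerate(pts):
        best = math.inf
        # Scan to the right, then to the left, and stop as soon as the
        # gap in x alone is at least the best distance found so far.
        for step in (1, -1):
            j = i + step
            while 0 <= j < n and abs(pts[j][0] - xi) < best:
                best = min(best, math.hypot(pts[j][0] - xi, pts[j][1] - yi))
                j += step
        result.append(best)
    return result


def density_histogram(values, bin_width):
    """Estimate a probability density: {left bin edge: density}."""
    counts = {}
    for v in values:
        b = int(v // bin_width)
        counts[b] = counts.get(b, 0) + 1
    n = len(values)
    return {b * bin_width: c / (n * bin_width)
            for b, c in sorted(counts.items())}
```

Dividing each count by the number of values times the bin width makes the histogram integrate to one, so that it can be read as a probability density function.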

Conclusions
The LargeTIFFTools, NDPITools and NDPITools plugins for ImageJ efficiently perform some fundamental operations on large images, and in particular on digital slides, for which standard open source software fails or performs badly. They enable both the clinician to examine a single slide and the bioinformatics research team to perform large-scale analysis of many slides, possibly on computer grids [20].

To date, the LargeTIFFTools have been downloaded from more than 388 different IP addresses, the NDPITools from more than 1361 addresses, and the ImageJ plugins from more than 235 addresses. Table 2 lists the distribution of the target platforms among the downloads of the binary files. It shows a broad usage of the different platforms by the community, emphasizing the importance of cross-platform, open source tools.

We have explained how the software was used to study some microscopic properties of brain tissue when invaded by an oligodendroglioma, and we have given an illustrative application to the analysis of a whole-size pathology slide. This suggests other promising applications.

Availability and requirements

a. LargeTIFFTools

• Project name: LargeTIFFTools
• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/largetifftools/
• Operating system(s): Platform independent
• Programming language: C
• Other requirements: libjpeg, libtiff
• License: GNU GPLv3

b. NDPITools

• Project name: NDPITools
• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools/
• Operating system(s): Platform independent
• Programming language: C
• Other requirements: none
• License: GNU GPLv3

For the convenience of users, precompiled binaries are provided for Windows (32 and 64 bits), Mac OS X and Linux.

c. NDPITools plugins for ImageJ

• Project name: NDPITools plugins for ImageJ
• Project home page: http://www.imnc.in2p3.fr/pagesperso/deroulers/software/ndpitools/
• Operating system(s): Platform independent
• Programming language: Java
• Other requirements: ImageJ 1.31s or higher, Ant, JAI 1.1.3
• License: GNU GPLv3

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
CD wrote the paper. ML conceived and implemented a first version of the integration into ImageJ as a toolset of macros. CD implemented the software and wrote the documentation. CG, AG and ML contributed suggestions to the software. CD, DA, AG and ML performed software tests. CD, MB, CG, AG and ML selected and provided histological samples. CD performed the statistical analysis of the sample slide. All authors reviewed the manuscript. All authors read and approved the final manuscript.

Acknowledgements
We thank F. Bouhidel and P. Bertheau for their help with the slide scanner of the Pathology Laboratory of the Saint-Louis Hospital in Paris, and C. Klein (Imaging facility, Cordeliers Research Center – INSERM U872, Paris) for tests and suggestions.

The computer, CPU, operating system, and programming language names quoted in this article are trademarks of their respective owners.

References

1. Diamond J, McCleary D: Virtual Microscopy. In Advanced techniques in diagnostic cellular pathology. Edited by Hannon-Fletcher M, Maxwell P. Chichester, UK: John Wiley & Sons, Ltd; 2009.

2. Ameisen D, Yunès JB, Deroulers C, Perrier V, Bouhidel F, Battistella M, Legrès L, Janin A, Bertheau P: Stack or Trash? Fast quality assessment of virtual slides. Diagnostic pathology 2013, in press.

3. García Rojo M, Castro AM, Gonçalves L: COST Action "EuroTelepath": digital pathology integration in electronic health record, including primary care centres. Diagnostic pathology 2011, 6(Suppl 1):S6.

4. Ameisen D: Intégration des lames virtuelles dans le dossier patient électronique. PhD thesis, Univ Paris Diderot-Paris 7, 2013.

5. Collan Y, Torkkeli T, Personen E, Jantunen E, Kosma VM: Application of morphometry in tumor pathology. Analytical and quantitative cytology and histology 1987, 9(2):79–88.

6. Wolfe P, Murphy J, McGinley J, Zhu Z, Jiang W, Gottschall E, Thompson H: Using nuclear morphometry to discriminate the tumorigenic potential of cells: A comparison of statistical methods. Cancer epidemiology, biomarkers & prevention 2004, 13(6):976–988.

7. Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B: Histopathological Image Analysis: A Review. Biomedical Engineering, IEEE Reviews in 2009, 2:147–171.

8. Gerin C, Pallud J, Deroulers C, Varlet P, Oppenheim C, Roux FX, Chrétien F, Thomas SR, Grammaticos B, Badoual M: Quantitative characterization of the imaging limits of diffuse low-grade oligodendrogliomas. Neuro-Oncology 2013, in press.

9. Wienert S, Heim D, Kotani M, Lindequist B, Stenzinger A, Ishii M, Hufnagl P, Beil M, Dietel M, Denkert C, Klauschen F: CognitionMaster: an object-based image analysis framework. Diagnostic pathology 2013, 8:34.

10. Gunduz C, Yener B, Gultekin SH: The cell graphs of cancer. Bioinformatics 2004, 20(Suppl 1):i145–i151.

11. Gunduz C, Gultekin SH, Yener B: Augmented cell-graphs for automated cancer diagnosis. Bioinformatics 2005, 21(Suppl 2):ii7–ii12.

12. West NP, Dattani M, McShane P, Hutchins G, Grabsch J, Mueller W, Treanor D, Quirke P, Grabsch H: The proportion of tumour cells is an independent predictor for survival in colorectal cancer patients. British Journal of Cancer 2010, 102:1519–1523.

13. Chang H, Han J, Borowsky A, Loss L, Gray JW, Spellman PT, Parvin B: Invariant Delineation of Nuclear Architecture in Glioblastoma Multiforme for Clinical and Molecular Association. IEEE Trans Med Imag 2013, 32(4):670–682.

14. Kayser K, Radziszowski D, Bzdyl P, Sommer R, Kayser G: Towards an automated virtual slide screening: theoretical considerations and practical experiences of automated tissue-based virtual diagnosis to be implemented in the Internet. Diagnostic pathology 2006, 1:10.

15. PLGA Foundation: Meta Analysis Low Grade Glioma Database Project 2012. [http://www.fightplga.org/research/PLGA-Sponsored_Projects/MetaAnalysis]

16. García Rojo M, Bueno G, Slodkowska J: Review of imaging solutions for integrated quantitative immunohistochemistry in the Pathology daily practice. Folia histochemica et cytobiologica 2009, 47(3):349–354.

17. Rasband WS: ImageJ 1997–2012. [http://imagej.nih.gov/ij/]

18. ImageMagick Studio LLC: ImageMagick 2013. [http://www.imagemagick.org/]

19. GraphicsMagick Group: GraphicsMagick 2013. [http://www.graphicsmagick.org/]

20. Kong J, Cooper LAD, Wang F, Chisolm C, Moreno CS, Kurc TM, Widener PM, Brat DJ, Saltz JH: A comprehensive framework for classification of nuclei in digital microscopy imaging: An application to diffuse gliomas. In Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on 2011:2128–2131.

21. Kayser K, Görtler J, Borkenfeld S, Kayser G: Grid computing in image analysis. Diagnostic pathology 2011, 6(Suppl 1):S12.

22. Granier A, Olivier M, Laborie S, Vaudescal S, Baecker V, Tran-Aupiais C: WIDE (Web Images and Data Environment) 2013. [http://www.mri.cnrs.fr/index.php?m=81]

23. Kayser K: Introduction of virtual microscopy in routine surgical pathology - a hypothesis and personal view from Europe. Diagnostic pathology 2012, 7:48.

24. Goode A, Satyanarayanan M: A Vendor-Neutral Library and Viewer for Whole-Slide Images. Tech. Rep. Technical Report CMU-CS-08-136, Computer Science Department, Carnegie Mellon University 2008. [http://reports-archive.adm.cs.cmu.edu/anon/2008/CMU-CS-08-136.pdf]

25. Linkert M, Rueden CT, Allan C, Burel JM, Moore W, Patterson A, Loranger B, Moore J, Neves C, MacDonald D, Tarkowska A, Sticco C, Hill E, Rossner M, Eliceiri KW, Swedlow JR: Metadata matters: access to image data in the real world. Journal of Cell Biology 2010, 198(5):777–782.

26. Khushi M, Edwards G, de Marcos DA, Carpenter JE, Graham JD, Clarke CL: Open source tools for management and archiving of digital microscopy data to allow integration with patient pathology and treatment information. Diagnostic pathology 2013, 8:22.

27. Sam Leffler S, the authors of LibTIFF: LibTIFF – TIFF Library and Utilities 2012. [http://www.remotesensing.org/libtiff/]

28. Lane TG, Vollbeding G: The Independent JPEG Group's JPEG software 2013. [http://www.ijg.org/]

29. Lane TG, Vollbeding G, the authors of the libjpeg-turbo software: libjpeg-turbo 2012. [http://libjpeg-turbo.virtualgl.org/]

30. Schneider CA, Rasband WS, Eliceiri KW: NIH Image to ImageJ: 25 years of image analysis. Nature Methods 2012, 9:671–675.

31. Sacha J: Image IO Plugin Bundle 2004. [http://ij-plugins.sourceforge.net/plugins/imageio/]

32. Sun Microsystems Inc: Java Advanced Library 1.1.3 2006. [http://www.oracle.com/technetwork/java/current-142188.html]

33. BigTIFF Design 2012. [http://www.remotesensing.org/libtiff/bigtiffdesign.html]

34. The BigTIFF File Format Proposal 2008. [http://www.awaresystems.be/imaging/tiff/bigtiff.html]

Figures

Figure 1 - A sample slide
(a) Macroscopic view of the whole slide (the black rectangle on the left is 1x2 cm). (b, c) Influence of the magnification on the quality of results. (b) A portion of the slide scanned at magnification level 10x. The white contours show the result of an automatic detection of the dark cell nuclei with the ImageJ software. A significant fraction of the cell nuclei is missed and the contours are rather pixelated. (c) The same portion of the slide scanned at magnification 40x. The white contours show the result of the same automatic detection. Almost all cell nuclei are detected and the shapes of the contours are much more precise. Scale bar: 4 µm.

Figure 2 - A typical session using ImageJ and the NDPITools plugins
An NDPI file has been opened with the NDPITools plugins and is displayed as a preview image (the image at the largest resolution which still fits on the computer's screen) in the top window. A rectangular region has been selected and extracted as a TIFF image, then opened in the bottom window.

Figure 3 - Preview image of an NDPI file with several focalization levels in ImageJ
The NDPI file 08.ndpi contains images at 5 different focalization levels. Therefore, its preview image is displayed as a stack of 5 images.

Figure 4 - Dialog box for customized extraction in ImageJ from an NDPI file with production of a mosaic
The dialog box shows some options which can be customized while producing a mosaic from a rectangular selection of an NDPI file preview image (here using the file previewed in Figure 3).

Figure 5 - Statistical properties of the cell nuclei with high chromatin content in the tissue sample of Figure 1
The positions of the 154240 identified nuclei were obtained from the analysis with ImageJ of the digital slide on a laptop computer. Since the slide was too large to fit into the computer's memory, it was turned into a mosaic of 16 pieces with an overlap of 60 pixels, and each piece underwent automated analysis independently. Then the results were aggregated. The graph shows the probability density function of the distance of a cell nucleus to its nearest neighbor in the whole sample.

Tables

Table 1 - Speed comparison of software to extract a 256 × 256 rectangle from a huge TIFF image
Time needed (or indication of failure when the task was not completed) by several software tools to extract a rectangular region of size 256 × 256 pixels situated at the bottom right corner of huge TIFF images and to save it as an independent file. The input images were single-image tiled TIFF files using JPEG compression. Their dimensions are indicated in the top row. The computer used was a 2.6 GHz Intel Core i7 Mac Mini with 16 GiB of RAM and more than 100 GiB of free hard disk. The tested software tools were, from top to bottom: tifffastcrop from our LargeTIFFTools, GraphicsMagick 1.3.17, ImageMagick 6.8.0-7, and the utility tiffcrop from LibTIFF 4.0.3.

Image size (px)    11264 × 4384    45056 × 17536    180224 × 70144
tifffastcrop       0.30 s          0.30 s           0.30 s
GraphicsMagick     0.74 s          23.6 s           > 80 min
ImageMagick        1.18 s          23.6 s           failed
tiffcrop           0.50 s          failed           seg. fault

Table 2 - Downloads of the NDPITools
Distribution of the downloads (unique IP address) of the precompiled binaries of the NDPITools between March 2012 and April 2013.

Windows (32 bits)    Windows (64 bits)    Linux    Mac OS X
483                  542                  217      285

Page 8: Analyzing huge pathology images with open source … · Analyzing huge pathology images with open source software ... telepathology and teaching [1,2]. In more and more hospitals,

16 Garcıa Rojo M Bueno G Slodkowska J Review ofimaging solutions for integrated quantitative im-munohistochemistry in the Pathology daily prac-tice Folia histochemica et cytobiologica 2009 47(3)349ndash354

17 Rasband WS ImageJ 1997ndash2012 [httpimagejnihgovij]

18 ImageMagick Studio LLC ImageMagick 2013 [httpwwwimagemagickorg]

19 GraphicsMagick Group GraphicsMagick 2013 [httpwwwgraphicsmagickorg]

20 Kong J Cooper LAD Wang F Chisolm C MorenoCS Kurc TM Widener PM Brat DJ Saltz JH Acomprehensive framework for classification of nu-clei in digital microscopy imaging An applica-tion to diffuse gliomas In Biomedical Imaging FromNano to Macro 2011 IEEE International Symposium on20112128ndash2131

21 Kayser K Gortler J Borkenfeld S Kayser G Grid com-puting in image analysis Diagnostic pathology 20116(Suppl 1)S12

22 Granier A Olivier M Laborie S Vaudescal S Baecker VTran-Aupiais C WIDE (Web Images and Data En-vironment) 2013 [httpwwwmricnrsfrindexphpm=81]

23 Kayser K Introduction of virtual microscopy inroutine surgical pathology mdash a hypothesis andpersonal view from Europe Diagnostic pathology2012 748

24 Goode A Satyanarayanan M A Vendor-NeutralLibrary and Viewer for Whole-Slide ImagesTech Rep Technical Report CMU-CS-08-136 Com-puter Science Department Carnegie Mellon Univer-sity 2008 [httpreports-archiveadmcscmueduanon2008CMU-CS-08-136pdf]

25 Linkert M Rueden CT Allan C Burel JM Moore WPatterson A Loranger B Moore J Neves C MacDon-ald D Tarkowska A Sticco C Hill E Rossner M EliceiriKW Swedlow JR Metadata matters access to im-age data in the real world Journal of Cell Biology2010 198(5)777ndash782

26 Khushi M Edwards G de Marcos DA Carpenter JEGraham JD Clarke CL Open source tools for man-agement and archiving of digital microscopy datato allow integration with patient pathology andtreatment information Diagnostic pathology 2013822

27. Sam Leffler S, the authors of LibTIFF: LibTIFF – TIFF Library and Utilities 2012, [http://www.remotesensing.org/libtiff/]

28. Lane TG, Vollbeding G: The Independent JPEG Group's JPEG software 2013, [http://www.ijg.org/]

29. Lane TG, Vollbeding G, the authors of the libjpeg-turbo software: libjpeg-turbo 2012, [http://libjpeg-turbo.virtualgl.org/]

30. Schneider CA, Rasband WS, Eliceiri KW: NIH Image to ImageJ: 25 years of image analysis. Nature Methods 2012, 9:671–675.

31. Sacha J: Image IO Plugin Bundle 2004, [http://ij-plugins.sourceforge.net/plugins/imageio/]

32. Sun Microsystems Inc: Java Advanced Library 1.1.3 2006, [http://www.oracle.com/technetwork/java/current-142188.html]

33. BigTIFF Design 2012, [http://www.remotesensing.org/libtiff/bigtiffdesign.html]

34. The BigTIFF File Format Proposal 2008, [http://www.awaresystems.be/imaging/tiff/bigtiff.html]

Figures

Figure 1 - A sample slide
(a) Macroscopic view of the whole slide (the black rectangle on the left is 1x2 cm). (b, c) Influence of the magnification on the quality of results. (b) A portion of the slide scanned at magnification level 10x. The white contours show the result of an automatic detection of the dark cell nuclei with the ImageJ software. A significant fraction of the cell nuclei is missed and the contours are rather pixelated. (c) The same portion of the slide scanned at magnification 40x. The white contours show the result of the same automatic detection. Almost all cell nuclei are detected and the shapes of the contours are much more precise. Scale bar: 4 µm.

Figure 2 - A typical session using ImageJ and the NDPITools plugins
An NDPI file has been opened with the NDPITools plugins and is displayed as a preview image (the image at the largest resolution that still fits on the computer's screen) in the top window. A rectangular region has been selected, extracted as a TIFF image, and then opened in the bottom window.

Figure 3 - Preview image of an NDPI file with several focalization levels in ImageJ
The NDPI file 08.ndpi contains images at 5 different focalization levels. Therefore, its preview image is displayed as a stack of 5 images.

Figure 4 - Dialog box for customized extraction in ImageJ from an NDPI file with production of a mosaic
The dialog box shows some options which can be customized while producing a mosaic from a rectangular selection of an NDPI file preview image (here using the file previewed in Figure 3).

Figure 5 - Statistical properties of the cell nuclei with high chromatin content in the tissue sample of Figure 1
The positions of the 154240 identified nuclei were obtained from the analysis with ImageJ of the digital slide on a laptop computer. Since the slide was too large to fit into the computer's memory, it was turned into a mosaic of 16 pieces with an overlap of 60 pixels, and each piece underwent automated analysis independently. Then the results were aggregated. The graph shows the probability density function of the distance of a cell nucleus to its nearest neighbor in the whole sample.
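
The workflow summarized in the caption of Figure 5 (split into overlapping pieces, analyze each piece, aggregate, then measure nearest-neighbor distances) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: the function names, the 4 x 4 grid layout, the way the overlap is split around each cut, and the rule used to drop duplicate detections are all hypothetical and are not taken from the NDPITools or from the ImageJ analysis used for the figure.

```python
import math


def mosaic_boxes(width, height, nx, ny, overlap):
    """Return (left, top, right, bottom) boxes of an nx-by-ny mosaic.

    Assumption: adjacent pieces share `overlap` pixels, split around the
    nominal cut, so an object smaller than the overlap that lies across a
    cut is entirely contained in at least one piece.
    """
    half = overlap // 2
    boxes = []
    for j in range(ny):
        for i in range(nx):
            left = max(0, i * width // nx - half)
            top = max(0, j * height // ny - half)
            right = min(width, (i + 1) * width // nx + (overlap - half))
            bottom = min(height, (j + 1) * height // ny + (overlap - half))
            boxes.append((left, top, right, bottom))
    return boxes


def aggregate(detections_per_piece, boxes, width, height, nx, ny):
    """Merge per-piece centroids into global coordinates without duplicates.

    Hypothetical rule: a detection is kept only by the piece whose nominal
    (overlap-free) cell contains its centroid.
    """
    merged = []
    for index, (box, points) in enumerate(zip(boxes, detections_per_piece)):
        i, j = index % nx, index // nx
        x0, x1 = i * width // nx, (i + 1) * width // nx
        y0, y1 = j * height // ny, (j + 1) * height // ny
        for x, y in points:
            gx, gy = x + box[0], y + box[1]  # local -> global coordinates
            if x0 <= gx < x1 and y0 <= gy < y1:
                merged.append((gx, gy))
    return merged


def nearest_neighbor_distances(points):
    """Brute-force distance from each point to its nearest neighbor."""
    distances = []
    for a, (ax, ay) in enumerate(points):
        best = math.inf
        for b, (bx, by) in enumerate(points):
            if a != b:
                best = min(best, math.hypot(ax - bx, ay - by))
        distances.append(best)
    return distances


# Toy example: a 1000 x 800 image cut into 16 pieces with a 60-pixel overlap.
boxes = mosaic_boxes(1000, 800, 4, 4, 60)
# The same nucleus at global position (260, 100) is seen by pieces 0 and 1.
detections = [[(260, 100)], [(40, 100)]] + [[] for _ in range(14)]
nuclei = aggregate(detections, boxes, 1000, 800, 4, 4)
```

The brute-force distance computation is quadratic in the number of points, so for a sample with as many nuclei as in Figure 5 a spatial index such as a k-d tree or a grid of buckets would be a more practical choice. A histogram of the returned distances, normalized to unit area, gives an estimate of the probability density function.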

Tables

Table 1 - Speed comparison of software to extract a 256 × 256 rectangle from a huge TIFF image
Time needed (or indication of failure when the task was not completed) by several software tools to extract a rectangular region of size 256 × 256 pixels situated at the bottom right corner of huge TIFF images and to save it as an independent file. The input images were single-image tiled TIFF files using JPEG compression. Their dimensions are indicated in the top row. The computer used was a 2.6 GHz Intel Core i7 Mac Mini with 16 GiB of RAM and more than 100 GiB of free hard disk. The tested software tools were, from top to bottom: tifffastcrop from our LargeTIFFTools, GraphicsMagick 1.3.17, ImageMagick 6.8.0-7 and the utility tiffcrop from LibTIFF 4.0.3.

Image size (px)   11264 × 4384   45056 × 17536   180224 × 70144

tifffastcrop      0.30 s         0.30 s          0.30 s
GraphicsMagick    0.74 s         236 s           > 80 min
ImageMagick       118 s          236 s           failed
tiffcrop          0.50 s         failed          seg. fault
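
One plausible reading of the constant timing in the first row of Table 1 is that a tool working on a tiled TIFF only needs to decode the few tiles that intersect the requested rectangle, whatever the size of the whole image. The sketch below merely counts those tiles for the three image sizes of the table. It is an illustration under assumptions, not a description of how tifffastcrop is implemented: the function name `tiles_for_region` and the 512 × 512 tile size are hypothetical.

```python
def tiles_for_region(x, y, w, h, tile_w, tile_h, image_w, image_h):
    """Return (column, row) indices of the tiles intersecting a crop rectangle."""
    if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > image_w or y + h > image_h:
        raise ValueError("region must be non-empty and inside the image")
    first_col, last_col = x // tile_w, (x + w - 1) // tile_w
    first_row, last_row = y // tile_h, (y + h - 1) // tile_h
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


TILE = 512  # hypothetical tile size, not taken from the paper
CROP = 256  # size of the extracted square, as in Table 1

needed = {}
for image_w, image_h in [(11264, 4384), (45056, 17536), (180224, 70144)]:
    # Bottom right corner of the image, as in the benchmark of Table 1.
    tiles = tiles_for_region(
        image_w - CROP, image_h - CROP, CROP, CROP, TILE, TILE, image_w, image_h
    )
    total = -(-image_w // TILE) * -(-image_h // TILE)  # ceiling divisions
    needed[(image_w, image_h)] = (len(tiles), total)
    print(f"{image_w} x {image_h}: {len(tiles)} of {total} tiles needed")
```

Under these assumptions the number of tiles to decode stays at one or two while the total number of tiles grows by a factor of 16 from one column of the table to the next, whereas a tool that decodes the whole image first has to process every tile.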

Table 2 - Downloads of the NDPITools
Distribution of the downloads (unique IP address) of the precompiled binaries of the NDPITools between March 2012 and April 2013.

Windows (32 bits)   Windows (64 bits)   Linux   Mac OS X
483                 542                 217     285
