Migrating National Lung Screening Trial’s CT Image Library to a National Biomedical Imaging Archive
David Maffitt, Paul Koppel, Stephen Moore, Kenneth Clark, Lawrence Tarbox, David Gierada, Fred Prior
Mallinckrodt Institute of Radiology, Washington University, Saint Louis, Missouri, USA
== Support ==
National Cancer Institute Contract NO1-CN-25516 and Mallinckrodt Institute of Radiology, Washington University School of Medicine
The objective of the National Lung Screening Trial (NLST) is to determine whether screening with low-dose helical computed tomography (CT), compared with chest X-ray (CXR), reduces lung cancer-specific mortality in participants who are at high risk for developing lung cancer. From 2002 to 2004, the NLST enrolled 53,456 participants, aged 55-74, in two components: the American College of Radiology Imaging Network (ACRIN) component and the Lung Screening Study (LSS). Participants were randomized to CXR or CT arms and received up to 3 imaging screens at annual intervals. As of Fall 2010, screening is complete; final collection of participant follow-up data through 2009 is underway; an announcement of NLST primary endpoint results is anticipated in 2011.
The LSS enrolled 34,614 participants through 10 screening centers and 2 satellite centers (Figure 1). Available CT exams (48,547) were de-identified of protected health information and delivered to a CT Image Library (CTIL) at Washington University where stringent quality assurance measures (automated checks of DICOM headers and visual inspection of images) were applied before images were archived. Associated baseline medical histories, medical updates at screening, radiologist interpretations of images, and CT-exam and CT-scanner data are maintained at the LSS Coordinating Center, Westat, an independent research firm contracted to manage the LSS data. The CT exams are available, on a restricted basis, to clinical-research and imaging-science investigators.
Plans for the CT Image Library
Access to the CTIL is currently limited to research projects approved by the NLST leadership. To fill an approved image request, CTIL management copies the requested images to DVDs or to an external hard drive and ships them to the approved investigator. The CTIL itself resides inside a private network on no-longer-supported EMC Centera disk storage, with access through a commercial Merge Healthcare Fusion Server application. We are migrating the CTIL to updated storage controlled by a local BlueArc network storage system. As part of that migration, we plan to change the CTIL’s management software from the commercial Merge application to the National Biomedical Imaging Archive (NBIA) software. This change would prepare the CTIL for public access following the anticipated 2011 announcement of NLST primary endpoint results.
The National Biomedical Imaging Archive (NBIA) is an open-source information management software package developed by the caBIG Imaging Workspace. Many instances of NBIA have been deployed both at NCI and in many research institutions and cancer centers, including several instances at Washington University. NBIA enables the development of imaging resources that lead to improved clinical decision support, accelerated decision-making, and quantitative imaging assessment of drug response. NBIA provides Web-based and caGrid-based access to de-identified DICOM images and metadata using role-based security. In addition to the NCI hosted NBIA, institutions can adapt NBIA for data storage by standing up an instance of NBIA at the institution with assistance from caBIG® licensed service providers. The NBIA download package is a ZIP package that includes the NBIA application, supporting libraries, the RSNA Clinical Trial Processor (CTP) application (with NBIA modifications), documentation, and a sample NBIA database.
The National Biomedical Imaging Archive
Why NBIA is a Better Platform
• Public access through an NCI-supported portal.
• Password-protected access.
• Query-able database using shopping-cart selection.
• On-line curation.
• Portable. Can be located anywhere.
• Interfaces to caBIG through caGrid services.
History of the National Lung Screening Trial and the CT Image Library
Database Mapping
Example Image in Merge and NBIA ( before de-identification; after re-identification )
CTIL (current) & NBIA/CTIL (proposed)
Figure 2 shows the current CTIL configuration and the proposed NBIA/CTIL configuration; the details of the transfer of images are described in the section “CTIL→NBIA Conversion.”
When images become publicly available through NBIA, investigators may access them over the Internet using ordinary web services or through caGrid services via the cancer Bioinformatics Grid.
CT Image Library: Electronic Radiology Lab, MIR, Washington University (Saint Louis MO)
Coordinating Center: Westat (Rockville, MD)
Screening centers:
• Pacific Health Research Institute (Honolulu)
• Univ. of Utah [ satellite: St. Luke’s Meridian Medical Center (Boise) ]
• Univ. of Colorado Denver
• Univ. of Minnesota [ satellite: Center for Diagnostic Imaging (Indianapolis) ]
• Marshfield (WI) Clinic Research Foundation
• Henry Ford Health System
• Univ. of Pittsburgh
• Georgetown Univ.
• Washington Univ.
• Univ. of Alabama at Birmingham
Figure 1. LSS Screening Centers, Coordinating Center, and CT Image Library
References
• National Lung Screening Trial. What is NLST? (online) http://www.cancer.gov/nlst/what-is-nlst.
• National Biomedical Imaging Archive. (online) https://cabig.nci.nih.gov/tools/NCIA.
• Cancer Bioinformatics Grid (caBIG). (online) https://cabig.nci.nih.gov.
• caGrid. (online) http://cagrid.org/display/cagridhome/Home.
• User Provisioning Tool (UPT). (online) https://wiki.nci.nih.gov/display/caCORE/FAQs+-+CSM+-+UPT.
• CTP – The RSNA Clinical Trial Processor. (online) http://mircwiki.rsna.org/index.php?title=CTP-The_RSNA_Clinical_Trial_Processor.
• Moore SM, Maffitt DR, Blaine GJ, Bae KT. A Workstation Acquisition Node for Multi-Center Imaging Studies. Medical Imaging 2001, PACS Design and Evaluation: Engineering and Clinical Issues. 2001; 4323:271-277.
• Clark KW, Gierada DS, Marquez G, Moore SM, Maffitt DR, Moulton JD, Wolfsberger MA, Koppel P, Phillips SR, Prior FW. Collecting 48,000 CT Exams for the Lung Screen Study of the National Lung Screening Trial. J. Digital Imaging. 2009; 22(6):667-680.
Transfer Rate of CTIL Images into NBIA
The NBIA software records a timestamp for each converted CTIL image posted to it, and from this log the time between image postings can be derived. The interval includes conversion of the CTIL JPEG lossless-compressed pixel data to DICOM native format, removal of unwanted attributes from DICOM headers, insertion of clinical-trial attributes into those headers, and quality control checks. Figure 6 plots images per second, with each data point covering 1,000 images, for the first 2.3 million images posted to the system. The average posting rate was 2.71 images/second, including deliberate stoppages to check database integrity (seven time points near the horizontal axis). The oscillations at the left (the beginning of the transfer) are attributed to a misconfigured firewall and to an intermittent, intensive 200-node supercomputer job dominating bandwidth into the medical center’s shared BlueArc storage resource. The dips to the right of center also reflect BlueArc contention. At the average transfer rate, the 12.3 million-image CT Image Library will require about 60 days to convert to NBIA.
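The rate calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example timestamps are hypothetical, and the only figures taken from the poster are the 1,000-image batch size, the 2.71 images/second average, and the 12.3 million-image library size.

```python
from datetime import datetime, timedelta

BATCH_SIZE = 1000          # images per data point, as in Figure 6
SECONDS_PER_DAY = 86_400


def rates_per_batch(batch_end_times):
    """Return images/second for each consecutive pair of batch timestamps.

    Each timestamp marks when the last image of a 1,000-image batch was posted.
    """
    rates = []
    for earlier, later in zip(batch_end_times, batch_end_times[1:]):
        seconds = (later - earlier).total_seconds()
        rates.append(BATCH_SIZE / seconds)
    return rates


def days_to_convert(total_images, images_per_second):
    """Estimate the number of days needed to post total_images at a steady rate."""
    return total_images / images_per_second / SECONDS_PER_DAY


# Hypothetical timestamps: three batch boundaries spaced 400 seconds apart.
start = datetime(2010, 8, 19, 12, 0, 0)
example_times = [start + timedelta(seconds=400 * i) for i in range(3)]
example_rates = rates_per_batch(example_times)

# Figures quoted in the text: 12.3 million images at 2.71 images/second.
estimated_days = days_to_convert(12_300_000, 2.71)
print(example_rates, round(estimated_days, 1))
```

With the quoted figures the raw arithmetic gives between 52 and 53 days, which is of the same order as the stated estimate of about 60 days; any margin built into the stated estimate is not described in the text.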
Next Steps
Migration testing will continue through the third quarter of 2010. Full migration and evaluation are expected to take two to three months and to finish in late 2010.
Once the migration has been verified, web services will be installed. NLST investigators will test the NBIA shopping-cart method for downloading small sets of image studies and help determine an optimal range of studies to acquire by this method. CT Image Library staff will develop mechanisms for providing larger sets of image studies to investigators on portable media.
Meanwhile, access to the NBIA/CTIL via caGrid services will be implemented and tested. We will also investigate modifying the standard NBIA database to include NLST-specific variables against which investigators might pose queries.
== Contact Information ==
Fred Prior
T: 314.747.0331 C: 314.303.2485 F: 314.362.6971 E: [email protected]
Figure 3. File-conversion process (left), comments (right).
Host | Memory | Hard Drive | Virtual Hard Drive | Image Storage Location | Database Storage Location | Network | Physical Hardware
ncia4 | 1024 MB | 20 GB | Operating system file | Storage array | Storage array | Trunked VLAN – ERL/DMZ | Dell 2950
nbiatest | 1024 MB | 13 GB | Operating system file | Storage array | Storage array | Trunked VLAN – ERL/DMZ | Dell 2950
ctil-nbia1 | 2048 MB | 20 GB | LVM | Storage array | Storage array | Standard CTIL | Dell R510
Hosting the NBIA Software
Table 1. Hardware Configurations Hosting the NBIA Software
Installing NBIA requires two components: the NBIA software (current version, 4.4) and the User Provisioning Tool, UPT (4.2). A standard Java environment and a MySQL database server are also required (apache-ant-1.71, jdk-1.5.0.16, mysql-server-5.0.67, and mysql-client-5.0.67). NBIA was installed on three Xen virtual machine guests (ncia4, nbiatest, ctil-nbia1) distributed over several physical servers. Each virtual system runs CentOS 5.5 and is configured with conservative memory, a moderately sized hard drive, and a single CPU core (Table 1).
All three systems mount a large redundantly-mirrored disk array to store uploaded images. The MySQL database is also mounted on the disk array for the first two systems, but the third system (ctil-nbia1) uses locally mirrored storage for a possible speed improvement in processing images.
With respect to the network, the first two virtual machines are located on our DMZ network to allow connections from the public internet. The base operating system of the physical servers is also CentOS 5.5, but its network connection is on the Electronic Radiology Laboratory (ERL) private network for additional security. This configuration is accomplished with standard VLAN networking and modified Xen network scripts.
The last column shows that the first two virtual machines run as Xen guests on a Dell 2950; the third virtual machine guest runs on a Dell R510.
The first two servers are used to test the installation and operation of NBIA and UPT software, as well as test the update process to the NBIA software. The third server is on faster hardware that will be the backbone for the CT Image Library migration from the Merge system and will eventually host public access to the CT Image Library. These three Xen Virtual Machines have been instrumental in giving us the flexibility to test and re-test NBIA software.
Figure 6. Image Transfer Rates per 1,000 Images
Figure 4a shows a Merge/Centera-resident image appearing in the Merge eFilm Fusion PACS viewer. Figure 4b shows the same image, now in the local NBIA/BlueArc system. NBIA does not provide a DICOM viewer, and the image in Figure 4b is rendered with the DicomWorks viewer. Note the DICOM differences, before and after NBIA conversion, below Figure 4b, in Tables 3 and 4.
There are two aspects to mapping CTIL images to NBIA images: (1) values entered into the NBIA database via the normal image submission process, and (2) values that are placed in custom NBIA database tables via a custom channel.
(1) Because images are first run through the anonymizer, only those attributes left in the anonymized images are saved in the NBIA database. The anonymizer uses a “white list” that specifies which attributes are retained; all other attributes are discarded. For the retained attributes, the list further specifies which pass their values “as is”, which are changed to <blank>, which are not created if missing, and which are created if missing (and with what value). The list also specifies which attributes must be present; if an image is missing such an attribute, the study to which the image belongs fails the CTIL→NBIA conversion and is quarantined. This “white list” mechanism provides a high level of control and prevents unanticipated elements from slipping through.
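The white-list behavior described in (1) can be illustrated with a small Python sketch. It is not the CTP anonymizer script format; the rule names, the `QuarantineError` class, and the particular attributes listed are assumptions chosen only to show the keep, blank, create-if-missing, and required cases.

```python
class QuarantineError(Exception):
    """Raised when an image lacks an attribute the white list marks as required."""


# Hypothetical white list: attribute name -> rule. Anything not listed is discarded.
WHITE_LIST = {
    "Modality": {"action": "keep", "required": True},
    "StudyDate": {"action": "keep"},
    "PatientBirthDate": {"action": "blank"},
    "PatientSex": {"action": "blank"},
    "ClinicalTrialProtocolName": {"action": "keep", "create_if_missing": "NLST-LSS"},
}


def anonymize(header):
    """Return a new header containing only white-listed attributes."""
    cleaned = {}
    for name, rule in WHITE_LIST.items():
        if name not in header:
            if rule.get("required"):
                raise QuarantineError(f"required attribute missing: {name}")
            if "create_if_missing" in rule:
                cleaned[name] = rule["create_if_missing"]
            continue  # otherwise: do not create the attribute
        cleaned[name] = "" if rule["action"] == "blank" else header[name]
    return cleaned


example = anonymize({
    "Modality": "CT",
    "StudyDate": "19990102",
    "PatientBirthDate": "19400101",   # made-up value, blanked by the rule
    "InstitutionName": "Example Site",  # not on the white list, so discarded
})
print(example)
```

Because the function iterates over the white list rather than over the incoming header, an attribute that nobody anticipated can never reach the output, which is the property the text attributes to this mechanism.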
(2) Custom mapping. Three tables are created in the NBIA database to contain values in the CTIL database not available in the stock NBIA database. Example values are calculated CT scanner parameters such as calculated-mAs, calculated-effective-mAs, and table-speed. These tables are populated by custom SQL, run in a batch, independent of the normal submission process. These tables are invisible and inaccessible to NBIA users at this time.
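A batch load of this kind might look like the following Python sketch. It uses an in-memory SQLite database purely as a stand-in for the MySQL server; the table name, column names, and row values are all hypothetical, and `INSERT OR REPLACE` is SQLite syntax rather than the SQL actually used for the migration.

```python
import sqlite3

# Made-up rows standing in for values exported from the CTIL database:
# (series UID placeholder, calculated mAs, calculated effective mAs, table speed)
EXAMPLE_ROWS = [
    ("series-a", 80.0, 53.3, 30.0),
    ("series-b", 60.0, 40.0, 24.0),
]


def load_scanner_params(conn, rows):
    """Create the custom table if needed and load all rows in one transaction."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ctil_scanner_params (
               series_instance_uid TEXT PRIMARY KEY,
               calculated_mas REAL,
               calculated_effective_mas REAL,
               table_speed REAL
           )"""
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO ctil_scanner_params VALUES (?, ?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM ctil_scanner_params").fetchone()[0]


conn = sqlite3.connect(":memory:")
row_count = load_scanner_params(conn, EXAMPLE_ROWS)
print(row_count)
```

Keying the table on the series identifier makes the batch safe to re-run, which suits a load that happens outside the normal image submission path.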
Study, Series, and SOP Instance UIDs in DICOM headers of CTIL-resident images are unchanged from those provided by the NLST/LSS screening sites. As these images are converted to NBIA format, they are given Instance UIDs with roots unique to the CTIL collection in the NBIA.
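One way to give every image a new UID under a collection-specific root, while keeping the mapping repeatable, is to hash the original UID. The Python sketch below assumes that approach; the poster does not state how the new UIDs are derived, so the use of MD5 here is an assumption. The root value is the common prefix of the converted UIDs listed in Table 3.

```python
import hashlib

# Common prefix of the converted UIDs shown in Table 3.
CTIL_UID_ROOT = "1.2.840.113654.2.55"
MAX_UID_LENGTH = 64  # DICOM limit on UID length


def remap_uid(original_uid, root=CTIL_UID_ROOT):
    """Derive a repeatable replacement UID under `root` from the original UID."""
    digest = hashlib.md5(original_uid.encode("ascii")).digest()
    # A decimal integer has no leading zeros, as DICOM UID components require.
    suffix = str(int.from_bytes(digest, "big"))
    new_uid = f"{root}.{suffix}"
    if len(new_uid) > MAX_UID_LENGTH:
        raise ValueError("root is too long to leave room for the hashed suffix")
    return new_uid


example_uid = remap_uid("1.3.12.2.1107.5.1.3.24031.4.0.48935596658229")
print(example_uid)
```

A deterministic mapping means the Study, Series, and SOP Instance UIDs of one exam stay mutually consistent after conversion, because every image that shared an original Study UID receives the same replacement.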
All private DICOM groups in the CTIL images are removed before the images are committed to NBIA. Private DICOM group (0013) is then added, as required by the NBIA software; this private group specifies the project name, trial name, and de-identification method. Clinical Trial Identifier attributes, DICOM group (0012), are added to brand images with NLST/CTIL identifiers (Table 2).
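The header changes described above can be sketched in Python with a header modeled as a dictionary keyed by (group, element). The function names are hypothetical and the group (0013) elements are omitted because their layout is not given here; the group (0012) values are those listed in Table 2. The sketch relies on the DICOM convention that private groups have odd group numbers.

```python
# Fixed Clinical Trial attribute values from Table 2.
CLINICAL_TRIAL_ATTRIBUTES = {
    (0x0012, 0x0010): "National Cancer Institute",   # Sponsor Name
    (0x0012, 0x0020): "NCT00047385",                 # Protocol ID
    (0x0012, 0x0021): "NLST-LSS",                    # Protocol Name
    (0x0012, 0x0030): "",                            # Site ID (blank)
    (0x0012, 0x0031): "",                            # Site Name (blank)
    (0x0012, 0x0060): "Washington University (images); Westat (associated data)",
}
PATIENT_ID = (0x0010, 0x0020)
SUBJECT_ID = (0x0012, 0x0040)
TIME_POINT_ID = (0x0012, 0x0050)


def is_private(tag):
    """Private DICOM groups have odd group numbers."""
    return tag[0] % 2 == 1


def brand_header(header, time_point):
    """Drop private groups and add the Clinical Trial attributes."""
    if time_point not in ("T0", "T1", "T2"):
        raise ValueError(f"unexpected screening year: {time_point}")
    branded = {tag: value for tag, value in header.items() if not is_private(tag)}
    branded.update(CLINICAL_TRIAL_ATTRIBUTES)
    branded[SUBJECT_ID] = header[PATIENT_ID]  # Subject ID is the same as Patient ID
    branded[TIME_POINT_ID] = time_point
    return branded


example_header = brand_header(
    {
        PATIENT_ID: "027318",
        (0x0008, 0x0060): "CT",
        (0x0009, 0x0010): "VENDOR PRIVATE",  # hypothetical private element
    },
    "T0",
)
print(sorted(example_header))
```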
CTIL→NBIA Conversion
At the heart of migrating CTIL images to NBIA format is the Clinical Trials Processor (CTP). CTP is a stand-alone program that provides a wide variety of features for clinical trials in a highly configurable and extensible application. Among its key features are: support for multiple pipelines; processing pipelines that support multiple configurable stages; support for quarantines of data objects that are rejected during processing; pre-defined implementations for key components (HTTP Import, DICOM Import, DICOM Anonymizer, XML Anonymizer, File Storage, Database Export, HTTP Export, DICOM Export, FTP Export); and web-based monitoring of an application’s status (including configuration, logs, quarantines). The monitoring tools are proving invaluable to the migration-testing process.
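The idea of a multi-stage pipeline with a quarantine can be shown with a toy Python model. This is not CTP code or CTP configuration; the `Pipeline` class and both stage functions are hypothetical and only mirror the behavior described: objects pass through configurable stages in order, and an object rejected by any stage is set aside together with the reason.

```python
class Pipeline:
    """Run each data object through an ordered list of stages."""

    def __init__(self, stages):
        self.stages = list(stages)
        self.exported = []      # objects that passed every stage
        self.quarantined = []   # (object, stage name, reason) for rejects

    def process(self, obj):
        for stage in self.stages:
            try:
                obj = stage(obj)
            except ValueError as err:
                self.quarantined.append((obj, stage.__name__, str(err)))
                return False
        self.exported.append(obj)
        return True


def require_modality(obj):
    """Hypothetical checking stage: reject objects without a Modality."""
    if not obj.get("Modality"):
        raise ValueError("missing Modality")
    return obj


def add_protocol_name(obj):
    """Hypothetical editing stage: stamp the protocol name on the object."""
    return {**obj, "ClinicalTrialProtocolName": "NLST-LSS"}


pipeline = Pipeline([require_modality, add_protocol_name])
for item in [{"Modality": "CT"}, {"PatientID": "027318"}]:
    pipeline.process(item)
print(len(pipeline.exported), len(pipeline.quarantined))
```

Recording the stage name and reason alongside each quarantined object is what makes later inspection practical, in the same spirit as the web-based quarantine monitoring mentioned above.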
CTP is built into the NBIA software, and its flexibility permitted the custom building of new pipelines for walking through the Merge file system, un-tar-ing the image study files, and converting the Merge-formatted image data into NBIA-formatted image data. Both Merge and NBIA store images as DICOM files, but Merge stores the pixel data in a JPEG lossless-compression format. During conversion, the compressed pixel data are restored to original uncompressed pixel data that are saved with their headers in NBIA as ordinary DICOM files.
CTP migration pipelines are outlined in Figure 3.
Clinical Trial Attribute | DICOM Tag | Type | Value
Sponsor Name | (0012, 0010) | 1 | ‘National Cancer Institute’
Protocol ID | (0012, 0020) | 1 | ‘NCT00047385’
Protocol Name | (0012, 0021) | 2 | ‘NLST-LSS’
Site ID | (0012, 0030) | 2 | <blank>
Site Name | (0012, 0031) | 2 | <blank>
Subject ID | (0012, 0040) | 1C | [ same as Patient ID ]
Time Point ID | (0012, 0050) | 2 | ‘T0’ or ‘T1’ or ‘T2’ [ NLST screening year ]
Coordinating Center | (0012, 0060) | 2 | ‘Washington University (images); Westat (associated data)’
Table 2. DICOM Clinical Trial Attributes Added To NBIA Images
Type: 1 = required; 1C = required if condition(s) met; 2 = required, may be blank
Figure 2. CTIL Images Before and After Conversion to NBIA
DICOM Attribute | (Group, Element) | Merge Value | NBIA Value | Status
Transfer Syntax UID | (0002, 0010) | 1.2.840.10008.1.2 | 1.2.840.10008.1.2.1 | Changed
Study Date | (0008, 0020) | 19990102 | 19990102 |
Accession # | (0008, 0050) | 595596 | 595596 |
Modality | (0008, 0060) | CT | CT |
Manufacturer | (0008, 0070) | SIEMENS | SIEMENS |
Station Name | (0008, 1010) | <blank> | S08 | Changed
Mfr. Model Name | (0008, 1090) | Volume Zoom | Volume Zoom |
Patient Name | (0010, 0010) | CTIL^027318 | CTIL^027318 |
Patient ID | (0010, 0020) | 027318 | 027318 |
Patient Birth Date | (0010, 0030) | <blank> | <blank> |
Patient Sex | (0010, 0040) | <blank> | <blank> |
Clinical Trial Sponsor Name | (0012, 0010) | <not present> | National Cancer Institute | New
Clinical Trial Protocol ID | (0012, 0020) | <not present> | NCT00047385 | New
Clinical Trial Protocol Name | (0012, 0021) | <not present> | NLST-LSS | New
Clinical Trial Site ID | (0012, 0030) | <not present> | <blank> | New
Clinical Trial Site Name | (0012, 0031) | <not present> | <blank> | New
Clinical Trial Subject ID | (0012, 0040) | <not present> | 027318 | New
Clinical Trial Time Point ID | (0012, 0050) | <not present> | T0 | New
Clinical Trial Coord. Ctr. Name | (0012, 0060) | <not present> | Washington U (images), Westat (assoc. data) | New
Patient Identity Removed | (0012, 0062) | <not present> | YES | New
De-identification Method | (0012, 0063) | <not present> | CTP | New
SOP, Study, and Series Instance UIDs for Accession Number 595596, Patient CTIL^027318, Series 4, Image 1
Attribute | (Group, Element) | Value Before NBIA Conversion | Value After NBIA Conversion
SOP Instance UID | (0008, 0018) | 1.3.12.2.1107.5.1.3.24031.4.0.48935596658229 | 1.2.840.113654.2.55.113287286408384640902334327751067875120
Study Instance UID | (0020, 000D) | 1.2.124.113532.128.252.220.117.20021010.115157.9725570 | 1.2.840.113654.2.55.71705526669987271282715415317014984574
Series Instance UID | (0020, 000E) | 1.3.12.2.1107.5.1.3.24031.4.0.4893489429482352 | 1.2.840.113654.2.55.247263781161754510388644022688834301907
[Figure 6 plot. Title: CTIL→NBIA Transfer Rates (first 2.3 million images). Vertical axis: Images / Second (0.0 to 4.0). Horizontal axis: Time, each data point = 1,000 images.]
[Figure 5 plot. Title: CTIL→NBIA Cumulative Images by Date. Vertical axis: millions of images (0.00 to 2.50).]
Figure 5. CTIL→NBIA Migration Progress
Figure 5 shows cumulative migration progress from the start on August 19 into August 31, 2010. More than 2.3 million images have been converted.
CTIL→NBIA Migration Progress
Figure 4a. CTIL Image in Merge Fusion PACS Viewer
Figure 4b. NBIA Image After Re-identification, in DicomWorks Viewer
Table 4. Other DICOM Values Before and After Migration
Table 3. DICOM Instance UIDs Before and After Migration
== Acknowledgement ==
The authors thank Eric Kascic and Justin Kirby of the National Cancer Institute for their assistance in helping us understand, install, troubleshoot, and evaluate multiple versions of the NBIA Software. The authors also thank John Perry, independent consultant and author of the RSNA Clinical Trial Processor software, who helped us design efficient pipelines to convert the Merge-format CT Image Library into the NBIA format.