miccai - workshop on high performance and distributed computing for medical imaging - sept 22 2011
DESCRIPTION
Keynote talk at HP-MICCAI / MICCAI-DCI 2011, The Joint Workshop on High Performance and Distributed Computing for Medical Imaging, featured at MICCAI 2011, held on September 22nd 2011 in Toronto, CanadaTRANSCRIPT
Integrative Multi-Scale Biomedical Informatics
Joel Saltz MD, PhD
Director Center for Comprehensive Informatics
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
• Leverage exascale data and computer resources to squeeze the most out of image, sensor or simulation data
• Run lots of different algorithms to derive same features
• Run lots of algorithms to derive complementary features
• Data models and data management infrastructure to manage data products, feature sets and results from classification and machine learning algorithms
Squeezing Information from Spatial Datasets
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
INTEGRATIVE BIOMEDICAL INFORMATICS ANALYSIS
Reproducible anatomic/functional characterization at gross level (Radiology) and fine level (Pathology)Integration of anatomic/functional characterization with multiple types of “omic” informationCreate categories of jointly classified data to describe pathophysiology, predict prognosis, response to treatment
In Silico Center for Brain Tumor Research
Specific Aims:
1. Influence of necrosis/hypoxia on gene expression andgenetic classification.
2. Molecular correlates of highresolution nuclear morphometry.
3. Gene expression profiles that predict glioma progression.
4. Molecular correlates of MRIenhancement patterns.
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Integration of heterogeneous multiscale information
•Coordinated initiatives Pathology, Radiology, “omics”
•Exploit synergies between all initiatives to improve ability to forecast survival & response.
Radiology
Imaging
Patient Outco
me
Pathologic Features
“Omic”
Data
Lee Cooper Carlos Moreno
Example: Pathology and Gene Expression Joint Predictors of Recurrence/Survival
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
INTEGRATIVE ANALYSIS OF BRAIN TUMOR DATA
In Silico Center for Brain Tumor Research
Key Data Sets
REMBRANDT: Gene expression and genomics data set of all glioma subtypes
The Cancer Genome Atlas (TCGA): Rich “omics” set of GBM, digitized Pathology and Radiology
Pathology and Radiology Images from Henry Ford Hospital, Emory, Thomas Jefferson U, MD Anderson and others
TCGA Research Network
Digital Pathology
Neuroimaging
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
** Data obtained by generosity of Lisa Scarpace/Tom Mikkelsen @ HF***Thanks to Adam Flanders/Mark Curtis @ TJU
TCGA and REMBRANDT Datasets
*Some overlap with TCGA main repository but rescanned at 40X** Data obtained by generosity of Lisa Scarpace/Tom Mikkelsen @ HF***Thanks to Adam Flanders/Mark Curtis @ TJU
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
*Pending Upload to NBIA portal
** Data obtained by efforts of Lisa Scarpace/Tom Mikkelsen @ HF
Neuroimaging/MRI Data
Brain Tumor Radiology and Pathology
Anaplastic Astrocytoma(WHO grade III)
Glioblastoma(WHO grade IV)
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Morphological Tissue Classification
Nuclei Segmentation
Cellular Features
Jun Kong
Whole Slide Imaging
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Requirements for High Resolution Whole Slide Feature Characterization
• 1010 pixels for each whole slide image• 10 whole slide images per patient• 108 image features per whole slide
image• 10,000 brain tumor patients• 1015 pixels• 1013 features• Hundreds of algorithms• Annotations and markups from dozens
of humans
Distinguishing Characteristic in Gliomas
Use image analysis algorithms to segment and classify microanatomic features (Nuclei, Astrocytoma, Necrosis ...) in whole slide images
Represent the segmentation and classification in a well defined structured format that can be used to correlate the pathology with other data modalities
Oligodendroglioma Astrocytoma
Nuclear QualitiesRound shaped withsmooth regular texture
Elongated with rough, irregular texture
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Astrocytoma vs OligodendroglimaOverlap in genetics, gene expression, histology
Astrocytoma vs Oligodendroglima• Assess nuclear size (area and
perimeter), shape (eccentricity, circularity major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).
Whole slide scans from 14 TCGA GBMS (69 slides)7 purely astrocytic in morphology; 7 with 2+ oligo component399,233 nuclei analyzed for astro/oligo featuresCases were categorized based on ratio of oligo/astro cells
Machine-based Classification of TCGA GBMs (J Kong)
TCGA Gene Expression Query: c-Met overexpression
Manual: TCGA Neuropathology Attributes 120 TCGA specimens; 3 Reviewers
Presence and Degree of:
Microvascular hyperplasia Complex/glomeruloid Endothelial hyperplasia
Necrosis Pseudopalisading pattern Zonal necrosis
Inflammation Macrophages/histiocytes Lymphocytes Neutrophils
Differentiation: Small cell component Gemistocytes Oligodendroglial Multi-nucleated/giant cells Epithelial metaplasia Mesenchymal metaplasia
Other Features Perineuronal/perivascular
satellitosis Entrapped gray or white matter Micro-mineralization
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Clustergram of selected features used in consensus clustering
Fea
ture
Ind
ices
10
20
30
40
50
60
70
80
90
100
110
Nuclear Features Used to Classify GBMs
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
2 1 3 4
50 100 150
20
40
60
80
100
120
140
160
0 0.2 0.4 0.6 0.8 1
1
2
3
4
Silhouette Value
Clu
ste
r
Consensus clustering of morphological signatures
Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients.
Nuclear Features Used to Classify GBMs
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Survival of morphological clusters
0 500 1000 1500 2000 2500 30000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Days
Surv
ival
Cluster 1
Cluster 2Cluster 3
Cluster 4
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Survival of patients by molecular tumor subtype
0 500 1000 1500 2000 2500 30000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Days
Surv
ival
Proneural
NeuralClassical
Mesenchymal
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
What does this mean? Cluster 2: Gemistocytic Signature
• No association with• Transcriptional class• Age
• Molecular correlates:Wnt signaling, TP53 signaling,Oxidative phosphorylation, mRNA processing
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
High Resolution Methods
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Multicore/GPU Implementation
• Nuclear/cell segmentation• Feature extraction• Generate clusters of nuclear/cell
Features• Classify nuclei/cells by feature cluster• Classify patients by populations of
classified nuclei/cells
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Multicore/GPU Feature Extraction
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Multicore/GPU Feature Extraction
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Optimizing CPU/GPU Memory Management
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Optimizing CPU/GPU Irregular Problem Performance
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Scheduling for CPU/GPU System
• PRIORITY: assigns tasks to CPUs or GPUs based on estimate of relative performance gain of each task on GPU vs on CPU and load of the CPUs and GPUs
• Sorted queue of (task, data element) tuples based on the relative speedup expected for each tuple
• As more (task, data element) tuples are created, tuples are inserted into the queue such that the queue remains sorted
• First Come First Served: simple low overhead policy
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Aggregation to reduce kernel invocation and data movement overheads
• Objects are grouped based on which image tile they are segmented from. Each tuple consists of (objects in image tile, feature computation operation)
• Perform group of tasks that are applied to the same data element (or the same chunk of data elements)
• Performance Aware Task Fusion (PATF) -- groups tasks based on GPU speedup values. Two groups of tasks: one group containing tasks with good GPU speedup values and the other group containing the remaining tasks
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Multiscale Systems BiologyEmploy multi-resolution methods to
characterize necrosis, angiogenesis and correlate these with “omics”
No enhancementNormal VesselsStable lesion
?Rim-enhancementVascular ChangesRapid progression
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Correlation of Necrosis, Angiogenesis and “omics”
• GBMs display variable and regionally heterogeneous degrees of necrosis (asterisk) and angiogenesis
• These factors may impact gene expression profiles
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Genes Correlated with Necrosis include Transcription Factors Identified as Regulators of the Mesenchymal Transition
GeneSymbol
SAM q-value(Corrected p-value)
C/EBPB < 0.000001C/EBPD < 0.000001FOSL2 < 0.000001STAT3 0.0047RUNX1 0.0082
Carro MS, et al. Nature 263: 318-25, 2010
• Frozen sections from 88 GBM samples marked to identify regions of necrosis and angiogenesis
• Extent of both necrosis and angiogenesis calculated as a percentage of total tissue area
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
VASARI Feature Set
Chad HolderScott HwangAdam Flanders
Cen
ter
for
Com
pre
hen
sive In
form
ati
cs
Defining Rich Set of Qualitative and Quantitative Image Biomarkers• Community-driven ontology development
project; collaboration with ASNR• Imaging features (5 categories)
– Location of lesion
– Morphology of lesion margin (definition, thickness, enhancement, diffusion)
– Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio)
– Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling)
– Resection features (extent of nCE tissue, CE tissue, resected components)
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Emory TJU/CBIT/NCI UVA/Northwestern Henry Ford
David A Gutman1 Adam Flanders3 Max Wintermark8 Lisa Scarpace4
Lee Cooper1 Eric Huang2 Manal Jilwan8 Tom Mikkelsen4
Scott N Hwang1 Robert J Clifford2 Prashant Raghavan8
Chad A Holder1 Dina Hammoud3 Pat Mongkolwat9
Doris Gao1 John Freymann7
Carlos Moreno1 Justin Kirby7
Arun Krishnan1
Jun Kong1
Carl Jaffe6
Seena Dehkharghani1
Joel Saltz1
Dan Brat1
Imaging Predictors of survival and molecular profiles in the TCGA Glioblastoma Data set
The TCGA glioma working group
1Emory University Hospital, Atlanta, GA 2National Cancer Institute, Bethesda, MD. 3Thomas Jefferson University Hospital, Philadelphia, PA. 4Henry Ford University Hospital, Detroit, Michigan. 5National Institute of Health, Bethesda, MD. 6Boston University School of
Medicine, Boston, MA. 7SAIC-Frederick, Inc., Frederick, MD. 8University of Virginia, Charlottesville, VA. 9 Northwestern University Chicago, IL
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Correlative Imaging Results
• Minimal enhancing tumor (≤5%) strongly associated with Proneural classification (p=0.0006).
• >5% proportion of necrosis and the presence of microvascular hyperplasia in pathology slides (p=0.008).
• Greater maximum tumor dimension (T2 signal) associated with present/abundant microvascular hyperplasia (p=0.001).
< 5% Enhancement
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Correlative Imaging Results
• TP53 mutant tumors had a smaller mean tumor sizes (p=0.002) on T2-weighted or FLAIR images.
• EGFR mutant tumors were significantly larger than TP53 mutant tumors (p=0.0005).
• High level EGFR amplification was associated with >5% enhancement and >5% proportion of necrosis (p < 0.01).
> 5% Necrosis
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Automated feature extraction pipeline beingdeveloped for MRI
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Large Scale Data Management
Implemented with IBM DB2 for large scale pathology image metadata (~million markups per slide)
Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc.
Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships
Data Models to Represent Feature Sets and Experimental Metadata
PAIS |pās| : Pathology Analytical Imaging Standards
• Provide semantically enabled data model to support pathology analytical imaging
• Data objects, comprehensive data types, and flexible relationships
• Object-oriented design, easily extensible• Reuse existing standards
– Reuse relevant classes already defined in AIM– Follow DICOM WG 26 metadata specifications on WSI
reference– Specimen information in DICOM Supplement 122 and
caTissue– Use caDSR for CDE and NCI Thesaurus for ontology
concepts
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Pathology Imaging GIS
Segmentation
Feature extraction
Image analysis
class Domain Mo...
Annotation
GeometricShape
CalculationObservation
Specimen
ImageReference
Provenance
User
PAIS
EquipmentGroup
AnatomicEntity
Subject
Field
Project
MicroscopyImageReference
DICOMImageReference
TMAImageReference
Markup
Inference
Region
WholeSlideImageReferencePatient
Surface
Collection
AnnotationReference
10..1
1
0..1
0..*
0..*
1
0..*1
0..1
1 0..*
1
0..1
1
0..1
10..1
10..*
1
0..*
0..*
0..*
1 0..11
0..1
1
0..*
0..1
0..*
1
0..*
1
0..1
1
0..*
10..1
10..1
1
0..*
10..*
1 0..*
1
0..*
Modeling and management of markup and annotation for querying and sharing through parallel RDBMS + spatial DBMS
PAIS model PAIS data management
On the fly data processing for algorithm validation/algorithm sensitivity studies, or discovery of preliminary results
HDFS data staging MapReduce based queries
Workflow, Adaptivity and Quality of Service
• In-transit data processing using filter/stream systems
• Semantic Workflows• Hierarchical pipeline design with coarse
and fine grained components• Adaptivity and Quality of Service
Computerized Classification System for Grading Neuroblastoma
• Background Identification• Image Decomposition (Multi-
resolution levels)• Image Segmentation
(EMLDA)• Feature Construction (2nd
order statistics, Tonal Features)
• Feature Extraction (LDA) + Classification (Bayesian)
• Multi-resolution Layer Controller (Confidence Region)
No
YesImage Tile
InitializationI = L
Background? Label
Create Image I(L)
Segmentation
Feature Construction
Feature Extraction
Classification
Segmentation
Feature Construction
Feature Extraction
Classifier Training
Down-sampling
Training Tiles
Within ConfidenceRegion ?
I = I -1
I > 1?
Yes
Yes
No
No
TRAINING
TESTING
Slides’ Preparation
• 64990 x 59412 pixels in full resolution• Original Size: 10.8 Gb; Compressed Sized: ≈
833Mb
8x
40x
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Semantic Workflows (Wings)Collaborative Work with Yolanda Gil, Mary Hall
• A systematic strategy for composing application components into workflows
• Search for the most appropriate implementation of both components and workflows
• Component optimization– Select among implementation variants of the same
computation– Derive integer values of optimization parameters– Only search promising code variants and a restricted
parameter space
• Workflow optimization– Knowledge-rich representation of workflow properties
C
en
ter
for
Com
pre
hen
siv
e I
nfo
rmati
cs
Adaptivity
Framework
• Description Module (Wings): Describe application workflow using semantics of workflow components
• Execution module (Pegasus, DataCutter, Condor): Maps to resources, generates and places fine grained filter/stream pipelines
• Tradeoff Module: Schedules execution based on application level QOS
DataCutter Filter Stream Workflow
Impact
Conclusions
• Increasingly common task: Integrative analysis of complementary multi-resolution and “omic” data sources
• Elucidation of mechanisms, identify medically important disease sub-classes
• Cell level whole slide analysis of whole slide images requires innovations in parallelization, data management
• Integration of Radiology and Pathology has promise but pose additional challenges
• Role of quality of service/on demand computing
Thanks to:• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David
Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)
• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads
• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others
• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul
Pantalone• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony
Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)
• NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, Dima Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe
• ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-Ramos
• NSF Scientific Workflow Collaboration: Vijay Kumar, Yolanda Gil, Mary Hall, Ewa Deelman, Tahsin Kurc, P. Sadayappan, Gaurang Mehta, Karan Vahi
Thanks!