Applications of Machine Learning to Medical Imaging
Daniela S. Raicu, PhD
Associate Professor, CDM, DePaul University
Email: [email protected]
Lab URL: http://facweb.cs.depaul.edu/research/vc/
About me…
• BS in Mathematics from University of Bucharest, Romania
• MS in CS from Wayne State University, Michigan
• PhD in CS from Oakland University, Michigan
My dissertation work
• Research areas: Data Mining & Computer Vision
• Dissertation topic: Content-based image retrieval
• Research hypothesis: “A picture is worth thousands of words…”
• “There is enough information in the image content to perform image retrieval whose similarity results correspond to the human perceived similarity.”
My dissertation work (cont.)
• Research hypothesis: “There is enough information in the image content to perform image retrieval whose similarity results correspond to the human perceived similarity.”
• Methodology: 1) extract color image features, 2) define color-based similarity, 3) cluster images based on color, 4) retrieve similar images
• Output: Color-based CBIR for general-purpose image datasets
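Steps 1, 2 and 4 of the methodology can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature is a coarse joint RGB histogram, the similarity is histogram intersection, and the clustering step (3) is skipped. The function names and the toy "images" (lists of RGB tuples) are hypothetical and not taken from the dissertation.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with bins**3 cells.

    pixels is a list of (r, g, b) tuples with values in 0..255.
    """
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins * bins
               + (g * bins // 256) * bins
               + (b * bins // 256))
        hist[idx] += 1
    return [h / len(pixels) for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def retrieve(query_pixels, database, k=2):
    """Return the names of the k database images most similar to the query."""
    q = color_histogram(query_pixels)
    scored = [(histogram_intersection(q, color_histogram(px)), name)
              for name, px in database.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]


# Toy database: each "image" is just a short list of pixels (assumed data).
database = {
    "red": [(250, 10, 10)] * 4,
    "blue": [(10, 10, 250)] * 4,
    "mixed": [(250, 10, 10)] * 2 + [(10, 10, 250)] * 2,
}
query = [(240, 5, 5)] * 4
top = retrieve(query, database, k=2)
```

With a reddish query, the all-red image ranks first and the half-red image second, which is the behavior a color-based similarity is meant to capture.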
Proof of hypothesis: Google Similar Images: http://similar-images.googlelabs.com/
Towards an academic career
• Assistant Professor at DePaul, 2002-2008
• Associate Professor, 2008-Present
• Teaching areas & research interests:
data analysis, data mining, image processing, computer vision & medical informatics
• Co-director of the Intelligent Multimedia Processing, Medical Informatics lab & the NSF REU Program in Medical Informatics
Outline
Part I: Introduction to Medical Informatics
• Medical Informatics
• Clinical Decision Making
• Imaging Modalities and Medical Imaging
• Basic Concepts in Image Processing
Part II: Advances in Medical Imaging Research
• Computer-Aided Diagnosis
• Computer-Aided Diagnostic Characterization
• Texture-based Classification
• Content-based Image Retrieval
Medical informatics research
What is medical informatics?
Medical informatics is the application of computers, communications and information technology and systems to all fields of medicine:
- medical care
- medical education
- medical research.
MF Collen, MEDINFO '80, Tokyo
What is medical informatics?
Medical informatics is the branch of science concerned with the use of computers and communication technology to acquire, store, analyze, communicate, and display medical information and knowledge to facilitate understanding and improve the accuracy, timeliness, and reliability of decision-making.
Warner, Sorenson and Bouhaddou, Knowledge Engineering in Health Informatics, 1997
Clinical decision making
• Making sound clinical decisions requires:
– right information, right time, right format
• Clinicians face a surplus of information
– ambiguous, incomplete, or poorly organized
• Rising tide of information
– Expanding knowledge sources
– 40K new biomedical articles per month
– Publicly accessible online health info
– Hundreds of pictures per scan for one patient
Clinical decision making: What is the problem?
• Man is an imperfect data processor
– We are sensitive to the quantity and organization of information
• Army officers and pilots commit ‘fatal errors’ when given too many, too few, or poorly organized data
• The same is true for clinicians who ‘watch’ for events
• Clinicians are particularly susceptible to errors of omission
Clinical decision making: What is the problem?
• Humans are “non-perfectable” data processors
– Better performance requires more time to process
– Irony:
• Clinicians increasingly face productivity expectations
• Clinicians face increasing administrative tasks
Subdomains of medical informatics (by Wikipedia)
• imaging informatics
• clinical informatics
• nursing informatics
• consumer health informatics
• public health informatics
• dental informatics
• clinical research informatics
• bioinformatics
• pharmacy informatics
What is medical imaging (MI)?
The study of medical imaging is concerned with the interaction of all forms of radiation with tissue and the development of appropriate technology to extract clinically useful information (usually displayed in an image format) from observation of this technology.
Sources of images:
• Structural/anatomical information (CT, MRI, US): within each elemental volume, tissue-differentiating properties are measured.
• Information about function (PET, SPECT, fMRI).
Examples of medical images
The imaging “chain”
[Figure: stages of the imaging chain: signal acquisition, “raw data”, reconstruction, filtering, processing, analysis, quantitative output]
Image analysis: Turning an image into data
• User-extracted qualitative features
• User-extracted quantitative features
• Semi-automated
• Automated
Exam level: Feature 1, Feature 2, Feature 3, …
Finding: Feature 1, Feature 2, …
Major advances in medical imaging
• Image Segmentation
• Image Classification
• Computer-Aided Diagnosis Systems
• Computer-Aided Diagnostic Characterization
• Content-based Image Retrieval
• Image Annotation
These major advances can play a major role in early detection, diagnosis, and computerized treatment planning in cancer radiation therapy.
Computer-Aided Diagnosis
• Computer-Aided Diagnosis (CAD) is diagnosis made by a radiologist when the output of computerized image analysis methods has been incorporated into his or her medical decision-making process.
• CAD may be interpreted broadly to incorporate both:
– the detection task: finding the abnormality
– the classification task: estimating the likelihood that the abnormality represents a malignancy
Motivation for CAD systems
The amount of image data acquired during a CT scan is becoming overwhelming for human vision, and the overload of image data for interpretation may result in oversight errors.
Computer-Aided Diagnosis for:
• Breast Cancer
• Lung Cancer
– A thoracic CT scan generates about 240 section images for radiologists to interpret.
• Colon Cancer
– CT colonography (virtual colonoscopy) is being examined as a potential screening device (400-700 images)
CAD for Breast Cancer
A mammogram is an X-ray of breast tissue used as a screening tool searching for cancer when there are no symptoms of anything being wrong. A mammogram detects lumps, changes in breast tissue or calcifications when they're too small to be found in a physical exam.
• Abnormal tissue shows up as dense white on mammograms.
• The left scan shows a normal breast while the right one shows malignant calcifications.
CAD for Lung Cancer
• Identification of lung nodules in thoracic CT scans; the identification is complicated by the blood vessels
• Once a nodule has been detected, it may be quantitatively analyzed as follows:
– The classification of the nodule as benign or malignant
– The evaluation of the temporal change in the nodule size
CAD for Colon Cancer
• Virtual colonoscopy (CT colonography) is a minimally invasive imaging technique that combines volumetrically acquired helical CT data with advanced graphical software to create two- and three-dimensional views of the colon.
[Figure: three-dimensional endoluminal view of the colon showing the appearance of normal haustral folds and a small rounded polyp]
Role of Image Analysis & Machine Learning for CAD
• An overall scheme for computer-aided diagnosis systems:
1. Input images: breast images, thoracic images
2. Organ segmentation: breast boundary, lungs, colon
3. Lesion/abnormality segmentation: nodule, polyps
4. Feature extraction: texture, shape, geometrical properties
5. Classification: malignant, benign
6. Evaluation & interpretation
SoC Medical imaging research projects
1. Computer-aided characterization for lung nodules
Goal: establish the link between computer-based image features of lung nodules in CT scans and visual descriptors defined by human experts (semantic concepts) for automatic interpretation of lung nodules
Example: This lung nodule has a “solid” texture and has a “sharp” margin
Why computer-aided characterization?
Ratings and boundaries differ across radiologists!
[Figure: the same nodule as outlined by Reader 1, Reader 2, Reader 3, and Reader 4]
Ratings assigned to the nodule by the four readers:
• Lobulation=4; Malignancy=5 (“highly suspicious”); Sphericity=2
• Lobulation=1 (“marked”); Malignancy=5 (“highly suspicious”); Sphericity=4
• Lobulation=2; Malignancy=5 (“highly suspicious”); Sphericity=5 (“round”)
• Lobulation=5 (“none”); Malignancy=5 (“highly suspicious”); Sphericity=3 (“ovoid”)
Computer-aided characterization
• Research hypothesis
– “The working hypothesis is that certain radiologists’ assessments can be mapped to the most important low-level image features”.
• Methodology
– New semi-supervised probabilistic learning approaches that will deal with both the inter-observer variability and the small set of labeled data (annotated lung nodules).
– Our proposed learning approach will be based on an ensemble of classifiers (instead of a single classifier as with most CAD systems) built to emulate the LIDC ensemble (panel) of radiologists.
Computer-aided characterization (cont.)
• Expected outcome:
– an optimal set of quantitative diagnostic features linked to the visual descriptors (semantic concepts).
• Significance: the derived mappings can serve to
– show the computer interpretation of the corresponding radiologist rating in terms of a set of standard and objective image features,
– automatically annotate new images,
– and augment the lung nodule retrieval results with their probabilistic diagnostic interpretations.
Computer-aided characterization
• Preliminary results
– NIH Lung Image Database Consortium (LIDC):
- 149 distinct nodules from about 85 cases/patients;
- four radiologists marked the nodules using 9 semantic characteristics on a scale from 1 to 5, except for calcification (1 to 6) and internal structure (1 to 4)
Computer-aided characterization
• LIDC high-level concepts & ratings
Characteristic: Possible scores
Calcification: 1. Popcorn; 2. Laminated; 3. Solid; 4. Non-central; 5. Central; 6. Absent
Internal structure: 1. Soft Tissue; 2. Fluid; 3. Fat; 4. Air
Lobulation: 1. Marked; 2; 3; 4; 5. None
Malignancy: 1. Highly Unlikely; 2. Moderately Unlikely; 3. Indeterminate; 4. Moderately Suspicious; 5. Highly Suspicious
Margin: 1. Poorly Defined; 2; 3; 4; 5. Sharp
Sphericity: 1. Linear; 2; 3. Ovoid; 4; 5. Round
Spiculation: 1. Marked; 2; 3; 4; 5. None
Subtlety: 1. Extremely Subtle; 2. Moderately Subtle; 3. Fairly Subtle; 4. Moderately Obvious; 5. Obvious
Texture: 1. Non-Solid; 2; 3. Part Solid/(Mixed); 4; 5. Solid
Computer-aided characterization
• Low-level image features
Shape features: Circularity, Roughness, Elongation, Compactness, Eccentricity, Solidity, Extent, RadialDistanceSD
Size features: Area, ConvexArea, Perimeter, ConvexPerimeter, EquivDiameter, MajorAxisLength, MinorAxisLength
Intensity features: MinIntensity, MaxIntensity, SDIntensity, […], MaxIntensityBG, MeanIntensityBG, SDIntensityBG, IntensityDifference
Texture features: 11 Haralick features calculated from co-occurrence matrices, 24 Gabor features, 5 Markov Random Field features
Computer-aided characterization
• Accuracy results
Characteristic | Decision trees | Add instances predicted with high confidence (60%) | Add instances predicted with high confidence (60%) and instances with low margin (5%)
Lobulation | 27.44% | 81.00% | 69.66%
Malignancy | 42.22% | 96.31% | 96.31%
Margin | 35.36% | 98.68% | 96.83%
Sphericity | 36.15% | 91.03% | 90.24%
Spiculation | 36.15% | 63.06% | 58.84%
Subtlety | 38.79% | 93.14% | 92.88%
Texture | 53.56% | 97.10% | 97.36%
Average | 38.52% | 88.62% | 86.02%
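The "add instances predicted with high confidence" strategy is a form of self-training. The sketch below illustrates only the general idea and is not the authors' implementation. It swaps the decision trees for a tiny nearest-centroid classifier so that it runs with the standard library alone, and the confidence score (a softmax over negative distances) is an assumption made for illustration. The 60% threshold comes from the table; every name here is hypothetical.

```python
import math


def fit_centroids(X, y):
    """Return {label: centroid} for feature vectors X with labels y."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_proba(centroids, x):
    """Pseudo-probabilities: softmax over negative distances to each centroid."""
    scores = {lab: math.exp(-math.dist(x, c)) for lab, c in centroids.items()}
    total = sum(scores.values())
    return {lab: s / total for lab, s in scores.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.6, max_rounds=10):
    """Repeatedly move confidently predicted unlabeled samples into the training set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X_lab, y_lab)
        confident, rest = [], []
        for x in pool:
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            (confident if proba[label] >= threshold else rest).append((x, label))
        if not confident:
            break  # nothing new can be added; stop early
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in rest]
    return fit_centroids(X_lab, y_lab), X_lab, y_lab, pool


# Toy data (assumed): two labeled nodules and three unlabeled ones.
model, X_out, y_out, pool = self_train(
    [[0.0, 0.0], [4.0, 4.0]], ["benign", "malignant"],
    [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]],
)
```

In the toy run the two unlabeled points near a centroid are absorbed, while the ambiguous midpoint never reaches the threshold and stays unlabeled. This shows why the threshold matters when labeled data are scarce.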
Computer-aided characterization
• Challenges
– Small number of training samples and large number of features: the “curse of dimensionality” problem
– Nodule size
– Variation in the nodules’ boundaries
– Different types of imaging acquisition parameters
– Clinical evaluation: observer performance studies require collaboration with medical schools or hospitals
SoC Medical imaging research projects
2. Texture-based Pixel Classification
- tissue segmentation
- context-sensitive tools for radiology reporting
Pixel-level texture extraction: [d1, d2, …, dk] → Pixel-level classification: tissue label → Organ segmentation
Texture-based Pixel Classification
• Texture feature extraction: consider the texture around the pixel of interest.
• Capture the texture characteristics based on an estimate of the joint conditional probability of pixel-pair occurrences Pij(d, θ).
– Pij denotes the normalized co-occurrence matrix specified by the displacement vector (d) and angle (θ).
[Figure: neighborhood of a pixel]
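As a rough illustration of how a normalized co-occurrence matrix Pij(d, θ) can be computed for one pixel neighborhood, here is a minimal pure-Python sketch. It assumes the neighborhood is already quantized to a small number of gray levels and counts ordered (non-symmetric) pixel pairs. The function name, the quantization to 4 levels, and the toy patch are assumptions for illustration.

```python
import math


def cooccurrence_matrix(image, d=1, theta_deg=0, levels=4):
    """Normalized co-occurrence matrix P[i][j] for displacement d at angle theta.

    image is a list of rows of integer gray levels in 0..levels-1.
    P[i][j] is the fraction of pixel pairs whose first pixel has level i
    and whose second pixel, offset by (d, theta), has level j.
    """
    theta = math.radians(theta_deg)
    # Rows grow downward in image coordinates, so the row step is negated.
    dr = -int(round(d * math.sin(theta)))
    dc = int(round(d * math.cos(theta)))
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


# Toy 4x4 neighborhood with 4 gray levels (assumed data).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = cooccurrence_matrix(patch, d=1, theta_deg=0)
```

For this patch there are 12 horizontal pairs, so for example P[2][2] = 3/12. Texture descriptors are then computed as weighted sums over the entries of P.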
Haralick Texture Features
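Two of the descriptors named in this part of the deck, energy and cluster tendency, can be sketched directly from a normalized co-occurrence matrix. The formulas are assumptions, since definitions vary between papers: energy is taken as the sum of squared entries, and cluster tendency as the sum of (i + j − 2μ)^k · P(i, j) with k = 2, where μ is the mean gray level under P. The function names are hypothetical.

```python
def haralick_energy(P):
    """Energy (angular second moment): sum of squared matrix entries."""
    return sum(p * p for row in P for p in row)


def cluster_tendency(P, k=2):
    """Sum of (i + j - 2*mu)**k * P[i][j], with mu the mean gray level under P."""
    mu = sum(i * p for i, row in enumerate(P) for p in row)
    return sum((i + j - 2 * mu) ** k * p
               for i, row in enumerate(P)
               for j, p in enumerate(row))


# Toy 2-level normalized co-occurrence matrix (assumed data).
P_toy = [
    [0.5, 0.0],
    [0.0, 0.5],
]
energy = haralick_energy(P_toy)
tendency = cluster_tendency(P_toy)
```

Evaluating such a descriptor at every pixel, using the co-occurrence matrix of that pixel's neighborhood, produces a texture image of the kind shown in the examples that follow.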
Examples of Texture Images
Texture images: original image, energy, and cluster tendency, respectively.
M. Kalinin, D. S. Raicu, J. D. Furst, D. S. Channin, "A Classification Approach for Anatomical Regions Segmentation", The IEEE International Conference on Image Processing (ICIP), Genoa, Italy, September 11-14, 2005.
Texture Classification of Tissues in CT Chest/Abdomen
Example of Liver Segmentation
[Figure panels: Original Image; Initial Seed at 90%; Split & Merge at 85%; Split & Merge at 80%; Region growing at 70%; Region growing at 60%; Segmentation Result]
J.D. Furst, R. Susomboon, and D.S. Raicu, "Single Organ Segmentation Filters for Multiple Organ Segmentation", IEEE 2006 International Conference of the Engineering in Medicine and Biology Society (EMBS'06)
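The panel labels suggest a seed-then-grow strategy driven by per-pixel classifier confidence. The sketch below shows that idea under explicit assumptions: each pixel carries a probability of belonging to the organ, seeds are pixels at or above 90%, and the region grows through 4-connected neighbors at or above 70%. It omits the split & merge stages, and neither the function name nor the toy probability map comes from the cited paper.

```python
from collections import deque


def region_grow(prob, seed_thr=0.9, grow_thr=0.7):
    """Grow a binary mask from high-confidence seeds over 4-connected pixels."""
    rows, cols = len(prob), len(prob[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seeds: pixels the classifier assigns to the organ with high confidence.
    for r in range(rows):
        for c in range(cols):
            if prob[r][c] >= seed_thr:
                mask[r][c] = True
                queue.append((r, c))
    # Breadth-first growth into neighbors that pass the looser threshold.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not mask[nr][nc] and prob[nr][nc] >= grow_thr):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


# Toy probability map (assumed data): one seed in the top-left corner.
prob_map = [
    [0.95, 0.75, 0.20],
    [0.10, 0.72, 0.30],
    [0.80, 0.10, 0.10],
]
mask = region_grow(prob_map)
```

The pixel with probability 0.80 in the bottom-left corner is excluded even though it passes the growth threshold, because it is not connected to any seed. Connectivity is what separates region growing from plain thresholding.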
Classification models: challenges
(a) Optimal selection of an adequate set of textural features is a challenge, especially with the limited data we often have to deal with in clinical problems. Consequently, the effectiveness of any classification system will always be conditional on two things:
(i) how well the selected features describe the tissues
(ii) how well the study group reflects the overall target patient population for the corresponding diagnosis
(b) how other types of information can be incorporated into the classification models:
- metadata
- image features from other imaging modalities (need for image fusion)
(c) how stable and general the classification models are
Content-based medical image retrieval (CBMS) systems
Definition of Content-based Image Retrieval:
Content-based image retrieval is a technique for retrieving images on the basis of automatically derived image features such as texture and shape.
Applications of Content-based Image Retrieval:
• Teaching
• Research
• Diagnosis
• PACS and Electronic Patient Records
Diagram of a CBIR
[Diagram components: Image Database, Query Image, Feature Extraction, Image Features [D1, D2, …, Dn], Similarity Retrieval, Query Results, User Evaluation, Feedback Algorithm]
http://viper.unige.ch/~muellerh/demoCLEFmed/index.php
CBIR as a Diagnosis Aid
An image retrieval system can help when the diagnosis depends strongly on direct visual properties of images in the context of evidence-based medicine or case-based reasoning.
CBIR as a Teaching Tool
An image retrieval system will allow students/teachers to browse available data themselves in an easy and straightforward fashion by clicking on “show me similar images”.
Advantages:
- stimulate self-learning and a comparison of similar cases
- find optimal cases for teaching
Teaching files:
• Casimage: http://www.casimage.com
• myPACS: http://www.mypacs.net
CBIR as a Research Tool
Image retrieval systems can be used:
• to complement text-based retrieval methods
• for visual knowledge management, whereby the images and associated textual data can be analyzed together
• for multimedia data mining, to learn the unknown links between visual features and diagnosis or other patient information
• for quality control, to find images that might have been misclassified
CBIR as a tool for lookup and reference in CT chest/abdomen
• Case study: lung nodule retrieval
– Lung Imaging Database Resource for Imaging Research: http://imaging.cancer.gov/programsandresources/InformationSystems/LIDC/page7
– 29 cases, 5,756 DICOM images/slices, 1,143 nodule images
– 4 radiologists annotated the images using 9 nodule characteristics: calcification, internal structure, lobulation, malignancy, margin, sphericity, spiculation, subtlety, and texture
• Goals:
– Retrieve nodules based on image features: texture, shape, and size
– Find the correlations between the image features and the radiologists’ annotations
Choose a nodule
Choose an image feature & a similarity measure
M. Lam, T. Disney, M. Pham, D. Raicu, J. Furst, “Content-Based Image Retrieval for Pulmonary Computed Tomography Nodule Images”, SPIE Medical Imaging Conference, San Diego, CA, February 2007
Retrieved Images
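The choose-a-nodule, choose-a-feature, choose-a-similarity-measure workflow reduces to ranking stored feature vectors by their distance to the query's vector. The sketch below illustrates this with one point-based measure (Euclidean distance) and one distribution-based measure (chi-square distance between normalized histograms). The nodule names, feature values, and function names are made up for illustration and do not come from the cited system.

```python
import math


def euclidean(a, b):
    """Point-based distance between two feature vectors."""
    return math.dist(a, b)


def chi_square(p, q):
    """Distribution-based distance between two normalized histograms."""
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(p, q) if x + y > 0)


def rank(query, database, distance):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: distance(query, database[name]))


# Toy feature database (assumed data): 3-bin texture histograms per nodule.
nodules = {
    "n1": [0.5, 0.5, 0.0],
    "n2": [0.1, 0.1, 0.8],
    "n3": [0.4, 0.4, 0.2],
}
query = [0.6, 0.4, 0.0]
by_euclidean = rank(query, nodules, euclidean)
by_chi_square = rank(query, nodules, chi_square)
```

Both measures agree on this toy data. On real features they can produce different rankings, which is why the interface lets the user choose the measure.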
CBIR systems: challenges
• Types of features
– image features:
- texture features: statistical, structural, model-based, and filter-based
- shape features
– textual features (such as physician annotations)
• Similarity measures
– point-based and distribution-based metrics
• Retrieval performance:
– precision and recall
– clinical evaluation
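For reference, precision and recall for a single query can be computed as below. This is a minimal sketch: it assumes that the set of relevant images for the query is known in advance, for example from radiologist annotations, and the identifiers are placeholders.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one retrieval result.

    precision = relevant items retrieved / items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = list(retrieved), set(relevant)
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example (assumed data): 4 images retrieved, 3 relevant in the database.
precision, recall = precision_recall(["a", "b", "c", "d"], {"a", "c", "e"})
```

Here two of the four retrieved images are relevant (precision 0.5) and two of the three relevant images were found (recall 2/3). Retrieving more images usually raises recall at the cost of precision.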
Questions?