
Integration of Radiologists’ Feedback into Computer-Aided Diagnosis Systems

Sarah A. Jabon (a)

Daniela S. Raicu (b)

Jacob D. Furst (b)

(a) Rose-Hulman Institute of Technology, Terre Haute, IN 47803
(b) School of Computing, CDM, DePaul University, Chicago, IL 60604

Overview

• Introduction
• Related Work
• The Data
• Methodology
  ▫ Simple Distance Metrics
  ▫ Linear Regression
  ▫ Principal Component Analysis
• Results
  ▫ Simple Distance Metrics
  ▫ Linear Regression
  ▫ Principal Component Analysis
• Conclusions
• Future Work

Introduction

• The 2008 official estimate for lung cancer
  ▫ 215,020 cases will be diagnosed
  ▫ 161,840 deaths will occur

• Five-year relative survival rate (1996 – 2004): 15.2%

• Computer-aided diagnosis systems can help improve early detection

Related Work

• El-Naqa et al.
  ▫ mammography images
  ▫ neural networks and support vector machines
• Muramatsu et al.
  ▫ mammography images
  ▫ three-layered artificial neural network to predict the semantic similarity rating between two nodules
• Park et al.
  ▫ linear distance-weighted K-nearest neighbor algorithm to identify similar images

Related Work

• ASSERT by Purdue University
  ▫ Content-based features: co-occurrence, shape, Fourier transforms, global gray level statistics
  ▫ Radiologists also provide features
• BiasMap by Zhou and Huang
  ▫ Relevance feedback, content-based features
  ▫ Analysis: biased-discriminant analysis (BDA)

The Data

• Lung Image Database Consortium

• Reduced 1,989 images down to 149 (one for each nodule)

• Summarized the radiologists’ ratings (up to 4) into a single vector

• Each nodule has 7 semantic-based characteristics and 64 content-based characteristics
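The slides do not say how the up-to-four radiologist ratings were combined into one vector per nodule. The following is only a sketch of one possibility, assuming the per-characteristic mean is used; the function name and the example ratings are hypothetical and are not taken from the data set.

```python
from statistics import mean

# The seven semantic characteristics named elsewhere in the deck.
CHARACTERISTICS = ["lobulation", "malignancy", "margin", "sphericity",
                   "spiculation", "subtlety", "texture"]


def summarize_ratings(ratings):
    """Collapse 1 to 4 radiologists' rating vectors into a single vector.

    ASSUMPTION: the per-characteristic mean is used. The slides do not
    state which summary statistic the authors actually applied.
    """
    if not 1 <= len(ratings) <= 4:
        raise ValueError("expected between 1 and 4 rating vectors")
    if any(len(r) != len(CHARACTERISTICS) for r in ratings):
        raise ValueError("each rating vector needs 7 values")
    # zip(*ratings) groups the ratings characteristic by characteristic.
    return [mean(column) for column in zip(*ratings)]


# Hypothetical ratings from three radiologists for one nodule.
example = [
    [1, 3, 4, 4, 1, 5, 5],
    [2, 3, 5, 4, 1, 4, 5],
    [3, 3, 3, 4, 1, 3, 5],
]
summary = summarize_ratings(example)
```

A median or mode would be an equally plausible choice for ordinal ratings; swapping `mean` for `statistics.median` changes only one line.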

Methodology

Methodology: Simple Distance Metrics

Semantic-Based Similarity

Content-Based Similarity

Simple Distance Metrics

[Figure: histograms of Content-Based Similarity Values (Euclidean) and Semantic-Based Similarity Values (1 – Cosine)]

• VAR00001 (axis 0 to 1.0): Mean = 0.2840127, Std. Dev. = 0.154278896, N = 11,026
• VAR00002 (axis 0 to 0.40): Mean = 0.0766, Std. Dev. = 0.06374, N = 11,026
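The two measures named on this slide, Euclidean distance for content-based features and 1 – cosine for semantic-based ratings, can be sketched as follows. The function names are illustrative, and any normalization the authors applied is not shown on the slides, so none is applied here. Note that 149 nodules yield 149 × 148 / 2 = 11,026 unordered pairs, which matches the N reported with the histograms.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(vectors, metric):
    """Distance for every unordered pair (i, j) with i < j."""
    return {
        (i, j): metric(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


# Number of unordered pairs among 149 nodules.
n_pairs = math.comb(149, 2)
```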

Methodology: Linear Regression

Methodology: Principal Component Analysis

Semantic-Based Characteristics (correlation matrix):

              Lobulation  Malignancy  Margin  Sphericity  Spiculation  Subtlety  Texture
Lobulation       1.000       .199      .085     -.008        .815        .065     .101
Malignancy        .199      1.000      .346      .187        .155        .594     .351
Margin            .085       .346     1.000      .391        .109        .533     .717
Sphericity       -.008       .187      .391     1.000        .078        .300     .230
Spiculation       .815       .155      .109      .078       1.000        .156     .146
Subtlety          .065       .594      .533      .300        .156       1.000     .523
Texture           .101       .351      .717      .230        .146        .523    1.000

Content-Based Features:

• 77 pairs with a correlation > 0.9
• 136 pairs with a correlation > 0.8 or < -0.8
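Counts like these motivate PCA, since many features carry overlapping information. A minimal sketch of how such pairs could be counted is given below; it assumes Pearson correlation and uses toy columns rather than the 64 content-based features, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def count_correlated_pairs(columns, threshold):
    """Count feature pairs whose correlation is > threshold or < -threshold."""
    return sum(
        1 for a, b in combinations(columns, 2)
        if abs(pearson(a, b)) > threshold
    )


# Toy feature columns (one list per feature), not the study's data.
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.9],    # roughly twice the first column
    [5.0, 4.0, 3.0, 2.0, 1.0],    # the first column reversed
    [1.0, -1.0, 1.0, -1.0, 1.0],  # unrelated to the others
]
n_high = count_correlated_pairs(features, 0.8)
```

Here the first three columns are strongly correlated with one another in absolute value, so three of the six pairs exceed the threshold.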

Scree Plots: 5 – 9 Matches

[Figure: two scree plots, each with Component Number on the x-axis]

Methodology: Principal Component Analysis

• PCA on content-based features
  ▫ accounts for 99% of the variance
  ▫ 23 components
• PCA on semantic-based characteristics
  ▫ Method 1: accounts for 92% of the variance, 4 components
  ▫ Method 2: accounts for 98% of the variance, 6 components
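Choosing how many components to keep for a given share of variance can be sketched as below. This is an illustration under assumptions: it runs PCA on mean-centered, unstandardized data via NumPy's SVD, whereas the slides do not say which software or scaling was used, and the data here is synthetic.

```python
import numpy as np


def components_for_variance(X, target):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target` (for example 0.99).

    ASSUMPTION: covariance-based PCA on mean-centered data; the slides
    do not state whether features were standardized first.
    """
    Xc = X - X.mean(axis=0)
    # Squared singular values are proportional to component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First index where the cumulative ratio is >= target, as a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(s))


# Synthetic data: four features built from two independent signals.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
a, b = np.cos(t), np.sin(t)
X = np.column_stack([a, b, a + b, a - b])

k99 = components_for_variance(X, 0.99)
```

Because the four synthetic columns are combinations of only two signals with equal weight, two components are needed to reach 99% and one component alone covers about half.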

Results: Simple Distance Metric

Matches   Gabor   Markov   Co-Occurrence   Gabor, Markov, and Co-Occurrence   All Features
6 – 10      24      18          31                       36                        43
2 – 5      107     104          94                       98                        93
0 – 1       18      27          24                       15                        13
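Each column of the table sums to 149, so every nodule appears to be counted once according to its number of matches. The slides do not define a match; the sketch below assumes it means overlap between the 10 nearest nodules under content-based distance and the 10 nearest under semantic-based distance (k = 10 is inferred from the top bin and is an assumption). Function names and the toy matrices are hypothetical.

```python
def top_k(distances, query, k):
    """Indices of the k items nearest to `query`, excluding the query."""
    others = [j for j in range(len(distances)) if j != query]
    return sorted(others, key=lambda j: distances[query][j])[:k]


def count_matches(content_dist, semantic_dist, query, k=10):
    """Size of the overlap between the two top-k retrieval lists.

    ASSUMPTION: this overlap is what the slides call "matches".
    """
    content_top = set(top_k(content_dist, query, k))
    semantic_top = set(top_k(semantic_dist, query, k))
    return len(content_top & semantic_top)


# Toy symmetric distance matrices for four items.
content = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
semantic = [
    [0, 3, 1, 2],
    [3, 0, 2, 1],
    [1, 2, 0, 3],
    [2, 1, 3, 0],
]
matches_for_item_0 = count_matches(content, semantic, query=0, k=2)
```

For item 0 the content-based neighbors are items 1 and 2 and the semantic-based neighbors are items 2 and 3, giving one match.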

Matches: Nodule 117

[Figure panels: Simple Distance Metrics; 5 – 9 Matches: PCA and Linear Regression]

Results: Linear Regression

Data Set        No. of Nodule Pairs (≈ 2/3 Set)   Correlation: Euclidean vs. Semantic   R²      Adj. R²   Feature Set   Distance
6 – 9 Matches   166                               -0.016                                0.948   0.871     2             -
6 – 9 Matches   166                               -0.016                                0.802   0.679     1             dist3
5 – 9 Matches   218                               -0.006                                0.927   0.850     2             -
5 – 9 Matches   218                               -0.006                                0.733   0.624     1             dist3
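The gap between R² and adjusted R² in this table reflects the penalty for fitting many predictors to a few hundred nodule pairs. A short sketch of the standard adjustment follows; the example numbers are hypothetical, since the slides do not list how many predictors each model retained.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for the number of predictors."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical values, not taken from the table.
example = adjusted_r2(0.9, n_samples=100, n_predictors=10)
```

With few samples per predictor the penalty grows quickly, which is one reason to compare models on a held-out set as well.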

Results: Linear Regression

Data Set        No. of Nodule Pairs (≈ 1/3 Set)   Correlation: Euclidean vs. Semantic   RMSD Euclidean   Correlation: Predicted vs. Semantic   RMSD Predicted   Features
6 – 9 Matches   85                                -0.023                                0.2328           0.710                                 0.0242           128
6 – 9 Matches   85                                -0.023                                0.2328           0.748                                 0.0181           64
5 – 9 Matches   108                               -0.039                                0.1985           0.829                                 0.0136           128
5 – 9 Matches   108                               -0.039                                0.1985           0.733                                 0.0155           64
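The evaluation summarized in this table (fit on roughly 2/3 of the pairs, then report correlation and RMSD on the remaining 1/3) can be sketched as follows. This is a generic ordinary least squares illustration on synthetic data, not the authors' model; the feature construction for nodule pairs is not described on the slides, and all names below are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def rmsd(pred, target):
    """Root-mean-square deviation between predictions and targets."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic stand-in for pair features and semantic similarity values.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
true_w = np.array([0.5, -0.2, 0.1, 0.0, 0.3])
y = X @ true_w + 0.7 + 0.01 * rng.normal(size=300)

split = 200  # about 2/3 of the rows for training
coef = fit_linear(X[:split], y[:split])
pred = predict(coef, X[split:])

test_rmsd = rmsd(pred, y[split:])
test_corr = float(np.corrcoef(pred, y[split:])[0, 1])
```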

Results: Linear Regression

[Figure: Linear Regression versus Euclidean Distance (5 to 9 Matches with 128 Features). Axis label: Predicted Similarity Value (Calculated with Content-Based Features). Series: Linear Regression, Euclidean Distance.]

Results: Linear Regression

[Figure: Residual Plot: Linear Regression versus Euclidean Distance. Axis label: Semantic Similarity Value. Series: Linear Regression, Euclidean Distance.]

Results: PCA

Data Set        No. of Nodule Pairs (≈ 1/3 Set)   Correlation: Euclidean vs. Semantic   RMSD Euclidean   Correlation: Predicted vs. Semantic   RMSD Predicted   Features
6 – 9 Matches   85                                -0.115                                0.3043           0.787                                 0.0061           128
6 – 9 Matches   85                                -0.115                                0.3043           0.393                                 0.0114           64
5 – 9 Matches   108                               -0.094                                0.2664           0.570                                 0.0096           128
5 – 9 Matches   108                               -0.094                                0.2664           0.136                                 0.0112           64

Results: PCA

[Figure: No PCA versus PCA. Axis label: Predicted Similarity Value (Calculated with Content-Based Features). Series: Linear Regression with No PCA, Linear Regression with PCA.]

Results: PCA

[Figure: Residual Plot: No PCA versus PCA (5 to 9 Matches with 128 Features). Axis label: Semantic Similarity Value. Series: Linear Regression with No PCA, Linear Regression with PCA.]

RMSD – Percent of Range

                           Linear Regression: No PCA    Linear Regression: PCA
Data Set        Features   Euclidean    Predicted       Euclidean    Predicted
6 – 9 Matches   128        23.3%        17.3%           30.4%        6.7%
6 – 9 Matches   64         23.3%        12.9%           30.4%        12.5%
5 – 9 Matches   128        19.9%        9.7%            26.6%        10.1%
5 – 9 Matches   64         19.9%        11.1%           26.6%        11.8%
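Expressing RMSD as a percentage of the value range makes errors on differently scaled measures comparable. The sketch below shows the arithmetic; the ranges themselves are not given on the slides, so the 0 to 1 range used in the example is an assumption, chosen because it is consistent with the Euclidean RMSD of 0.2328 appearing as 23.3%.

```python
def rmsd_percent_of_range(rmsd, value_min, value_max):
    """RMSD expressed as a percentage of the range of the values."""
    if value_max <= value_min:
        raise ValueError("value_max must be greater than value_min")
    return 100.0 * rmsd / (value_max - value_min)


# ASSUMPTION: Euclidean similarity values span 0 to 1.
euclidean_pct = rmsd_percent_of_range(0.2328, 0.0, 1.0)
```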

Example: Nodule 37 and Nodule 38

[Figure: Nodule 38 and Nodule 37]

Euclidean Similarity Value: 0.549066
PCA Similarity: 0.004379

Nodule Number   Lobulation   Malignancy   Margin   Sphericity   Spiculation   Subtlety   Texture
37              5            3            5        5            5             4          5
38              5            3            5        5            5             5          5
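The two rating vectors differ only in subtlety (4 versus 5). Assuming semantic similarity is the 1 – cosine measure named on the earlier histogram slide, the value for this pair can be checked directly:

```python
import math

# Ratings from the table: lobulation, malignancy, margin, sphericity,
# spiculation, subtlety, texture.
RATINGS_37 = [5, 3, 5, 5, 5, 4, 5]
RATINGS_38 = [5, 3, 5, 5, 5, 5, 5]


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


semantic_distance = cosine_distance(RATINGS_37, RATINGS_38)
```

The result is roughly 0.0028, far closer to the PCA value of 0.004379 than to the Euclidean value of 0.549066, which may be the point this slide is making.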

Future Work

• Perform the analysis only on nodules on which all three radiologists agree

• To address the small size of the data set, perform the analysis using a leave-one-out technique (instead of 2/3 training and 1/3 testing)

• Incorporate relevance feedback into the system
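The leave-one-out idea proposed above could look like the following sketch: each sample is held out once, a model is fit on the rest, and the held-out prediction errors are pooled into one RMSD. It assumes an ordinary least squares model and uses synthetic data; it is not an implementation from the authors.

```python
import numpy as np


def leave_one_out_rmsd(X, y):
    """Leave-one-out RMSD for least squares regression with an intercept."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # boolean mask that drops sample i
        A = np.column_stack([np.ones(n - 1), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        prediction = np.concatenate(([1.0], X[i])) @ coef
        errors[i] = prediction - y[i]
    return float(np.sqrt(np.mean(errors ** 2)))


# Synthetic, noise-free data: held-out predictions should be near exact.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0

loo_error = leave_one_out_rmsd(X, y)
```

Compared with a single 2/3 and 1/3 split, every sample contributes to testing, at the cost of fitting the model once per sample.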

Questions?
