Integration of Radiologists’ Feedback into Computer-Aided Diagnosis Systems
Sarah A. Jabon (a)
Daniela S. Raicu (b)
Jacob D. Furst (b)
(a) Rose-Hulman Institute of Technology, Terre Haute, IN 47803
(b) School of Computing, CDM, DePaul University, Chicago, IL 60604
Overview
• Introduction
• Related Work
• The Data
• Methodology
▫ Simple Distance Metrics
▫ Linear Regression
▫ Principal Component Analysis
• Results
▫ Simple Distance Metrics
▫ Linear Regression
▫ Principal Component Analysis
• Conclusions
• Future Work
Introduction
• The 2008 official estimate for lung cancer
▫ 215,020 cases diagnosed
▫ 161,840 deaths will occur
• Five-year relative survival rate (1996 – 2004): 15.2%
• Computer-aided diagnosis systems can help improve early detection
Related Work
• El-Naqa et al.
▫ mammography images
▫ neural networks and support vector machines
• Muramatsu et al.
▫ mammography images
▫ three-layered artificial neural network to predict the semantic similarity rating between two nodules
• Park et al.
▫ linear distance-weighted K-nearest neighbor algorithm to identify similar images
Related Work
• ASSERT by Purdue University
▫ Content-based features: co-occurrence, shape, Fourier transforms, global gray-level statistics
▫ Radiologists also provide features
• BiasMap by Zhou and Huang
▫ Relevance feedback, content-based features
▫ Analysis: biased-discriminant analysis (BDA)
The Data
• Lung Image Database Consortium
• Reduced 1,989 images down to 149 (one for each nodule)
• Summarized the radiologists’ ratings (up to 4) into a single vector
• Each nodule has 7 semantic-based characteristics and 64 content-based characteristics
Methodology
Methodology: Simple Distance Metrics
Semantic-Based Similarity
Content-Based Similarity
Simple Distance Metrics
[Figure: frequency histograms of the content-based similarity values (Euclidean) and the semantic-based similarity values (1 – cosine). VAR00001: mean = 0.2840127, std. dev. = 0.154278896, N = 11,026. VAR00002: mean = 0.0766, std. dev. = 0.06374, N = 11,026.]
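The two metrics named on this slide can be sketched as follows. This is a minimal illustration, assuming each nodule is a plain numeric vector; the helper names (`euclidean_distance`, `one_minus_cosine`, `pairwise`) are hypothetical and not taken from the presentation, and any normalization the authors applied to the content-based features is not shown.

```python
import numpy as np

def euclidean_distance(u, v):
    """Euclidean distance between two content-based feature vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))

def one_minus_cosine(u, v):
    """1 - cosine similarity between two semantic rating vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pairwise(values, metric):
    """Apply `metric` to every unordered pair of rows: n rows give n*(n-1)/2 values."""
    n = len(values)
    return [metric(values[i], values[j]) for i in range(n) for j in range(i + 1, n)]

# Semantic ratings of nodules 37 and 38 as listed in the example slide
# (lobulation, malignancy, margin, sphericity, spiculation, subtlety, texture).
nodule_37 = [5, 3, 5, 5, 5, 4, 5]
nodule_38 = [5, 3, 5, 5, 5, 5, 5]
print(one_minus_cosine(nodule_37, nodule_38))
```

With 149 nodules, `pairwise` produces 149 × 148 / 2 = 11,026 values, which matches the N reported for both histograms.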
Methodology: Linear Regression
Methodology: Principal Component Analysis
Lobulation Malignancy Margin Sphericity Spiculation Subtlety Texture
Lobulation 1.000 .199 .085 -.008 .815 .065 .101
Malignancy .199 1.000 .346 .187 .155 .594 .351
Margin .085 .346 1.000 .391 .109 .533 .717
Sphericity -.008 .187 .391 1.000 .078 .300 .230
Spiculation .815 .155 .109 .078 1.000 .156 .146
Subtlety .065 .594 .533 .300 .156 1.000 .523
Texture .101 .351 .717 .230 .146 .523 1.000
Content-Based Features:
• 77 pairs with a correlation > 0.9
• 136 pairs with a correlation > 0.8 or < -0.8
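Counts like these motivate PCA, since strongly correlated features carry redundant information. A minimal sketch of how such pairs could be counted, assuming a matrix with one row per nodule and one column per feature; the function name and the synthetic data are illustrative only, and the threshold is applied here to the absolute correlation.

```python
import numpy as np

def count_correlated_pairs(features, threshold):
    """Count feature pairs whose absolute Pearson correlation exceeds `threshold`.

    `features` has one row per nodule and one column per feature.
    """
    corr = np.corrcoef(features, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)  # visit each unordered pair once
    return int(np.sum(np.abs(corr[upper]) > threshold))

# Synthetic stand-in data (not LIDC features): columns 0 and 1 are
# near-duplicates, column 2 is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(149, 1))
demo = np.hstack([base,
                  base + 0.01 * rng.normal(size=(149, 1)),
                  rng.normal(size=(149, 1))])
print(count_correlated_pairs(demo, 0.8))
```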
Scree Plots: 5 – 9 Matches
[Figure: two scree plots of eigenvalue against component number. The first covers components 1 to 7 (eigenvalue axis 0.0 to 3.0); the second covers components 1 to 63 (eigenvalue axis 0 to 20).]
Methodology: Principal Component Analysis
• PCA on content-based features
▫ accounts for 99% of the variance
▫ 23 components
• PCA on semantic-based characteristics
▫ Method 1: accounts for 92% of the variance, 4 components
▫ Method 2: accounts for 98% of the variance, 6 components
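Choosing the number of components from a variance target can be sketched as below. This is an illustration under assumptions rather than the authors' procedure: features are standardized first (so every column must have non-zero variance), the decomposition uses NumPy's SVD, and the name `pca_reduce` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_reduce(features, variance_target):
    """Project standardized features onto the fewest principal components
    whose cumulative explained variance reaches `variance_target`."""
    x = np.asarray(features, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)  # standardize each column
    _, singular_values, vt = np.linalg.svd(x, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    # First index where the cumulative ratio reaches the target, as a count.
    k = int(np.searchsorted(np.cumsum(explained), variance_target)) + 1
    k = min(k, len(explained))  # guard against floating-point shortfall
    return x @ vt[:k].T, explained[:k]

# Synthetic stand-in data: 6 observed features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(149, 2))
mixing = rng.normal(size=(2, 6))
demo = latent @ mixing + 1e-3 * rng.normal(size=(149, 6))
scores, explained = pca_reduce(demo, 0.99)
print(scores.shape, explained.sum())
```

Called with a target of 0.99 on the 64 content-based features, a routine of this kind would return however many components reach that target; the slide reports 23.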
Results: Simple Distance Metric
Matches | Gabor | Markov | Co-Occurrence | Gabor, Markov, and Co-Occurrence | All Features
6 – 10 | 24 | 18 | 31 | 36 | 43
2 – 5 | 107 | 104 | 94 | 98 | 93
0 – 1 | 18 | 27 | 24 | 15 | 13
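The slides do not define "matches". One plausible reading, stated here as an assumption, is that for each query nodule the ten most similar nodules are retrieved once with the content-based distance and once with the semantic distance, and the overlap between the two lists is counted. A sketch of that reading, with a hypothetical helper `top_k_matches`:

```python
import numpy as np

def top_k_matches(content_dist, semantic_dist, query, k=10):
    """Count nodules that appear in both top-k retrieval lists for `query`.

    Both arguments are square distance matrices over the same nodules.
    ASSUMPTION: this definition of a "match" is inferred, not stated in the slides.
    """
    def top_k(dist):
        order = np.argsort(dist[query])
        order = order[order != query]  # exclude the query nodule itself
        return set(order[:k].tolist())
    return len(top_k(content_dist) & top_k(semantic_dist))

# Synthetic check: identical distance matrices give a full overlap of k.
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 3))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
print(top_k_matches(dist, dist, query=0))
```

Under this reading, each column of the table would bin the 149 nodules by their overlap count; every column does sum to 149.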
Matches: Nodule 117
Simple Distance Metrics
5 – 9 Matches: PCA and Linear Regression
Results: Linear Regression
Data Set | No. of Nodule Pairs (≈ 2/3 Set) | Correlation: Euclidean vs. Semantic | R² | Adj. R² | Feature Set | Distance
6 – 9 Matches | 166 | -0.016 | 0.948 | 0.871 | 2 | -
6 – 9 Matches | 166 | -0.016 | 0.802 | 0.679 | 1 | dist3
5 – 9 Matches | 218 | -0.006 | 0.927 | 0.850 | 2 | -
5 – 9 Matches | 218 | -0.006 | 0.733 | 0.624 | 1 | dist3
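The R² and adjusted R² columns can be illustrated with an ordinary least-squares fit. This is a generic sketch on synthetic data, assuming one row of predictors per nodule pair and the semantic similarity as the target; how the presentation built its feature sets (the "Feature Set" and "Distance" columns) is not specified, so no attempt is made to reproduce it.

```python
import numpy as np

def fit_linear_regression(x, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, R^2, adjusted R^2), where coefficients[0] is the intercept.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    n, p = x.shape  # n observations, p predictors
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, r2, adj_r2

# Synthetic stand-in data with a known linear relationship.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 3))
y_demo = 1.0 + 2.0 * x_demo[:, 0] - x_demo[:, 1] + 0.1 * rng.normal(size=200)
coef, r2, adj_r2 = fit_linear_regression(x_demo, y_demo)
print(r2, adj_r2)
```

Adjusted R² penalizes the number of predictors, which is why it falls noticeably below R² when many features are fitted to a few hundred nodule pairs.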
Results: Linear Regression
Data Set | No. of Nodule Pairs (≈ 1/3 Set) | Correlation: Euclidean vs. Semantic | RMSD Euclidean | Correlation: Predicted vs. Semantic | RMSD Predicted | Features
6 – 9 Matches | 85 | -0.023 | 0.2328 | 0.710 | 0.0242 | 128
6 – 9 Matches | 85 | -0.023 | 0.2328 | 0.748 | 0.0181 | 64
5 – 9 Matches | 108 | -0.039 | 0.1985 | 0.829 | 0.0136 | 128
5 – 9 Matches | 108 | -0.039 | 0.1985 | 0.733 | 0.0155 | 64
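The held-out evaluation reported here (correlation with the semantic similarity, and root-mean-square deviation) can be sketched as follows. The split helper is an approximation of a 2/3 training and 1/3 testing partition; the exact partition used in the presentation is not given, and both function names are hypothetical.

```python
import numpy as np

def split_two_thirds(n_items, seed=0):
    """Randomly split indices into roughly 2/3 training and 1/3 testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    cut = (2 * n_items) // 3
    return order[:cut], order[cut:]

def evaluate(predicted, target):
    """Return (Pearson correlation, RMSD) between predictions and targets."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    corr = float(np.corrcoef(predicted, target)[0, 1])
    rmsd = float(np.sqrt(np.mean((predicted - target) ** 2)))
    return corr, rmsd

train_idx, test_idx = split_two_thirds(326)
print(len(train_idx), len(test_idx))
print(evaluate([0.1, 0.2, 0.4], [0.1, 0.25, 0.35]))
```

The same `evaluate` call applies to either predictor: the raw Euclidean distance or the regression output, each compared against the semantic similarity on the test pairs.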
Results: Linear Regression
[Figure: Linear Regression versus Euclidean Distance (5 to 9 Matches with 128 Features). Scatter plot of Semantic Similarity Value against Predicted Similarity Value (Calculated with Content-Based Features). Series: Linear Regression, Euclidean Distance.]
Results: Linear Regression
[Figure: Residual Plot: Linear Regression versus Euclidean Distance (5 to 9 Matches with 128 Features). Error plotted against Semantic Similarity Value. Series: Linear Regression, Euclidean Distance.]
Results: PCA
Data Set | No. of Nodule Pairs (≈ 1/3 Set) | Correlation: Euclidean vs. Semantic | RMSD Euclidean | Correlation: Predicted vs. Semantic | RMSD Predicted | Features
6 – 9 Matches | 85 | -0.115 | 0.3043 | 0.787 | 0.0061 | 128
6 – 9 Matches | 85 | -0.115 | 0.3043 | 0.393 | 0.0114 | 64
5 – 9 Matches | 108 | -0.094 | 0.2664 | 0.570 | 0.0096 | 128
5 – 9 Matches | 108 | -0.094 | 0.2664 | 0.136 | 0.0112 | 64
Results: PCA
[Figure: No PCA versus PCA (5 to 9 Matches with 128 Features). Scatter plot of Semantic Similarity Value against Predicted Similarity Value (Calculated with Content-Based Features). Series: Linear Regression with No PCA, Linear Regression with PCA.]
Results: PCA
[Figure: Residual Plot: No PCA versus PCA (5 to 9 Matches with 128 Features). Error plotted against Semantic Similarity Value. Series: Linear Regression with No PCA, Linear Regression with PCA.]
RMSD – Percent of Range
Data Set | Features | Linear Regression, No PCA: Euclidean | Linear Regression, No PCA: Predicted | Linear Regression, PCA: Euclidean | Linear Regression, PCA: Predicted
6 – 9 Matches | 128 | 23.3% | 17.3% | 30.4% | 6.7%
6 – 9 Matches | 64 | 23.3% | 12.9% | 30.4% | 12.5%
5 – 9 Matches | 128 | 19.9% | 9.7% | 26.6% | 10.1%
5 – 9 Matches | 64 | 19.9% | 11.1% | 26.6% | 11.8%
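Expressing RMSD as a percentage of the value range makes errors comparable across measures with different scales. A minimal sketch follows; the function name is hypothetical, and the range of 0 to 1 used in the example is an inference from the reported numbers (an RMSD of 0.2328 shown as 23.3%), not something the slides state.

```python
def rmsd_percent_of_range(rmsd, value_min, value_max):
    """Express an RMSD as a percentage of the range of the compared values."""
    return 100.0 * rmsd / (value_max - value_min)

# ASSUMPTION: the Euclidean values span roughly 0 to 1, inferred from the table.
print(round(rmsd_percent_of_range(0.2328, 0.0, 1.0), 1))
```

The predicted values span a much narrower range, so their small absolute RMSDs still translate into percentages of several points.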
Example: Nodule 37 and Nodule 38
Nodule 38, Nodule 37
Euclidean Similarity Value | PCA Similarity Value
0.549066 | 0.004379
Nodule Number | Lobulation | Malignancy | Margin | Sphericity | Spiculation | Subtlety | Texture
37 | 5 | 3 | 5 | 5 | 5 | 4 | 5
38 | 5 | 3 | 5 | 5 | 5 | 5 | 5
Future Work
• Perform the analysis only on nodules on which all three radiologists agree
• To address the small size of the data set, perform the analysis using a leave-one-out technique (instead of 2/3 training and 1/3 testing)
• Incorporate relevance feedback into the system
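The leave-one-out idea can be sketched as follows: each observation is held out in turn, the regression is refit on the rest, and the held-out prediction error is recorded. This is a generic illustration on synthetic data with a hypothetical function name; whether observations would be nodules or nodule pairs in the proposed analysis is left open by the slides.

```python
import numpy as np

def leave_one_out_rmsd(x, y):
    """RMSD of linear-regression predictions, each made with its own row held out."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # every row except row i
        design = np.column_stack([np.ones(keep.sum()), x[keep]])
        coef, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
        prediction = coef[0] + x[i] @ coef[1:]
        errors.append(prediction - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic, noise-free stand-in data: the held-out error should be near zero.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(30, 2))
y_demo = 0.5 + x_demo @ np.array([1.0, -2.0])
print(leave_one_out_rmsd(x_demo, y_demo))
```

Compared with a single 2/3 and 1/3 split, every observation serves as test data exactly once, at the cost of refitting the model once per observation.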
Questions?