
Integration of Radiologists’ Feedback into Computer-Aided Diagnosis Systems

Sarah A. Jabon (a), Daniela S. Raicu (b), Jacob D. Furst (b)

(a) Rose-Hulman Institute of Technology, Terre Haute, IN 47803
(b) School of Computing, CDM, DePaul University, Chicago, IL 60604

Overview
• Introduction
• Related Work
• The Data
• Methodology
  ▫ Simple Distance Metrics
  ▫ Linear Regression
  ▫ Principal Component Analysis
• Results
  ▫ Simple Distance Metrics
  ▫ Linear Regression
  ▫ Principal Component Analysis
• Conclusions
• Future Work

Introduction

• The 2008 official estimate for lung cancer
  ▫ 215,020 new cases will be diagnosed
  ▫ 161,840 deaths will occur
• Five-year relative survival rate (1996 – 2004): 15.2%
• Computer-aided diagnosis systems can help improve early detection

Related Work
• El-Naqa et al.
  ▫ mammography images
  ▫ neural networks and support vector machines
• Muramatsu et al.
  ▫ mammography images
  ▫ three-layered artificial neural network to predict the semantic similarity rating between two nodules
• Park et al.
  ▫ linear distance-weighted K-nearest-neighbor algorithm to identify similar images

Related Work

• ASSERT by Purdue University
  ▫ Content-based features: co-occurrence, shape, Fourier transforms, global gray-level statistics
  ▫ Radiologists also provide features
• BiasMap by Zhou and Huang
  ▫ Relevance feedback, content-based features
  ▫ Analysis: biased discriminant analysis (BDA)

The Data

• Lung Image Database Consortium (LIDC)
• Reduced 1,989 images down to 149 (one for each nodule)
• Summarized the radiologists' ratings (up to 4 per nodule) into a single vector (see the sketch below)
• Each nodule has 7 semantic-based characteristics and 64 content-based features
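The slides do not state how the up-to-four radiologists' ratings are combined into a single vector; the minimal sketch below assumes a per-characteristic mean, purely as an illustration (function and variable names are hypothetical).

```python
import numpy as np

# Hypothetical sketch: collapse up to four radiologists' ratings per nodule
# into a single 7-dimensional semantic vector. The aggregation rule (mean)
# is an assumption; the slides only say the ratings were "summarized".

CHARACTERISTICS = ["lobulation", "malignancy", "margin", "sphericity",
                   "spiculation", "subtlety", "texture"]

def summarize_ratings(ratings):
    """ratings: shape (n_radiologists, 7), one row of 1-5 scores per reader."""
    return np.asarray(ratings, dtype=float).mean(axis=0)

# Example: three radiologists rated the same nodule
nodule_ratings = [[5, 3, 5, 5, 5, 4, 5],
                  [5, 3, 4, 5, 5, 5, 5],
                  [4, 3, 5, 5, 5, 4, 5]]
print(dict(zip(CHARACTERISTICS, summarize_ratings(nodule_ratings))))
```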


Methodology

Methodology: Simple Distance Metrics

Semantic-Based Similarity

Content-Based Similarity

Simple Distance Metrics

[Histograms of pairwise similarity values over N = 11,026 nodule pairs:
Content-Based Similarity Values (Euclidean) – Mean = 0.284, Std. Dev. = 0.154;
Semantic-Based Similarity Values (1 – Cosine) – Mean = 0.0766, Std. Dev. = 0.0637]
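As a rough illustration of the two metrics named above, the sketch below computes a Euclidean distance over content-based feature vectors and (1 – cosine) over semantic rating vectors; any feature scaling used in the original study is not specified on these slides and is omitted here.

```python
import numpy as np

# Minimal sketch of the two "simple distance metrics":
# Euclidean distance between content-based feature vectors, and
# (1 - cosine similarity) between 7-dimensional semantic rating vectors.

def euclidean_distance(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sqrt(np.sum((x - y) ** 2))

def one_minus_cosine(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Example on the semantic vectors of nodules 37 and 38 from the later slide:
sem_37 = [5, 3, 5, 5, 5, 4, 5]
sem_38 = [5, 3, 5, 5, 5, 5, 5]
print(one_minus_cosine(sem_37, sem_38))   # small value -> very similar nodules
```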

Methodology: Linear Regression
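The slides do not detail the regression setup; one plausible reading, consistent with the results tables later in the deck, is a linear model that predicts a nodule pair's semantic similarity from its content-based differences, fit on roughly 2/3 of the pairs and tested on the remaining 1/3. The sketch below illustrates that reading with random stand-in data; it is not the authors' exact pipeline (for example, the "dist3" predictor is not reproduced).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pairs, n_features = 218, 64                 # sizes echo the "5 - 9 Matches" rows
X = rng.random((n_pairs, n_features))         # stand-in: per-pair content-based differences
y = rng.random(n_pairs)                       # stand-in: semantic similarity values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3,
                                                    random_state=0)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
rmsd = np.sqrt(np.mean((pred - y_test) ** 2))
print(f"training R^2: {model.score(X_train, y_train):.3f}, test RMSD: {rmsd:.4f}")
```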

Methodology: Principal Component Analysis

Correlation matrix of the semantic characteristics:

Characteristic | Lobulation | Malignancy | Margin | Sphericity | Spiculation | Subtlety | Texture
Lobulation | 1.000 | .199 | .085 | -.008 | .815 | .065 | .101
Malignancy | .199 | 1.000 | .346 | .187 | .155 | .594 | .351
Margin | .085 | .346 | 1.000 | .391 | .109 | .533 | .717
Sphericity | -.008 | .187 | .391 | 1.000 | .078 | .300 | .230
Spiculation | .815 | .155 | .109 | .078 | 1.000 | .156 | .146
Subtlety | .065 | .594 | .533 | .300 | .156 | 1.000 | .523
Texture | .101 | .351 | .717 | .230 | .146 | .523 | 1.000

Content-Based Features:
• 77 pairs with a correlation > 0.9
• 136 pairs with a correlation > 0.8 or < -0.8 (see the counting sketch below)
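The pair counts above could be obtained by tallying the upper triangle of the 64 × 64 feature correlation matrix; a sketch with random stand-in data:

```python
import numpy as np

# Count highly correlated content-based feature pairs.
rng = np.random.default_rng(1)
features = rng.random((149, 64))              # 149 nodules x 64 content-based features
corr = np.corrcoef(features, rowvar=False)    # 64 x 64 Pearson correlation matrix

iu = np.triu_indices_from(corr, k=1)          # each feature pair counted once
pairs = corr[iu]
print("pairs with r > 0.9:", np.sum(pairs > 0.9))
print("pairs with |r| > 0.8:", np.sum(np.abs(pairs) > 0.8))
```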

Scree Plots: 5 – 9 Matches

[Two scree plots of eigenvalue versus component number: one for the 7 semantic characteristics (components 1–7, eigenvalues up to about 3) and one for the content-based features (components 1–63, eigenvalues up to about 20)]

Methodology: Principal Component Analysis

• PCA on content-based features
  ▫ accounts for 99% of the variance
  ▫ 23 components
• PCA on semantic-based characteristics (see the sketch below)
  ▫ Method 1: accounts for 92% of the variance with 4 components
  ▫ Method 2: accounts for 98% of the variance with 6 components
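A minimal sketch of this step, assuming a standard PCA that keeps enough components to reach the quoted variance fractions (scikit-learn shown; the data below are random stand-ins, not the LIDC features):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
content = rng.random((149, 64))      # stand-in: 149 nodules x 64 content-based features

# A float n_components keeps the smallest number of components whose
# explained variance reaches that fraction (99% for the content features).
pca = PCA(n_components=0.99)
scores = pca.fit_transform(content)
print("components kept:", pca.n_components_)
print("variance explained:", pca.explained_variance_ratio_.sum())
```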


Results: Simple Distance Metrics

Number of nodules (out of 149) by number of matches achieved with each content-based feature set:

Matches | Gabor | Markov | Co-Occurrence | Gabor, Markov, and Co-Occurrence | All Features
6 – 10 | 24 | 18 | 31 | 36 | 43
2 – 5 | 107 | 104 | 94 | 98 | 93
0 – 1 | 18 | 27 | 24 | 15 | 13

Matches: Nodule 117

Simple Distance Metrics

5 – 9 Matches: PCA and Linear Regression

Results: Linear Regression

Data Set | No. of Nodule Pairs (≈ 2/3 Set) | Correlation: Euclidean vs. Semantic | R² | Adj. R² | Feature Set | Distance
6 – 9 Matches | 166 | -0.016 | 0.948 | 0.871 | 2 | -
6 – 9 Matches | 166 | -0.016 | 0.802 | 0.679 | 1 | dist3
5 – 9 Matches | 218 | -0.006 | 0.927 | 0.850 | 2 | -
5 – 9 Matches | 218 | -0.006 | 0.733 | 0.624 | 1 | dist3

Results: Linear Regression

Data Set | No. of Nodule Pairs (≈ 1/3 Set) | Correlation: Euclidean vs. Semantic | RMSD Euclidean | Correlation: Predicted vs. Semantic | RMSD Predicted | Features
6 – 9 Matches | 85 | -0.023 | 0.2328 | 0.710 | 0.0242 | 128
6 – 9 Matches | 85 | -0.023 | 0.2328 | 0.748 | 0.0181 | 64
5 – 9 Matches | 108 | -0.039 | 0.1985 | 0.829 | 0.0136 | 128
5 – 9 Matches | 108 | -0.039 | 0.1985 | 0.733 | 0.0155 | 64

Results: Linear Regression

[Scatter plot: Linear Regression versus Euclidean Distance (5 to 9 Matches with 128 Features). x-axis: Predicted Similarity Value (calculated with content-based features); y-axis: Semantic Similarity Value; series: Linear Regression, Euclidean Distance]

Results: Linear Regression

[Residual plot: Linear Regression versus Euclidean Distance (5 to 9 Matches with 128 Features). x-axis: Semantic Similarity Value; y-axis: Error; series: Linear Regression, Euclidean Distance]

Results: PCA

Data Set | No. of Nodule Pairs (≈ 1/3 Set) | Correlation: Euclidean vs. Semantic | RMSD Euclidean | Correlation: Predicted vs. Semantic | RMSD Predicted | Features
6 – 9 Matches | 85 | -0.115 | 0.3043 | 0.787 | 0.0061 | 128
6 – 9 Matches | 85 | -0.115 | 0.3043 | 0.393 | 0.0114 | 64
5 – 9 Matches | 108 | -0.094 | 0.2664 | 0.570 | 0.0096 | 128
5 – 9 Matches | 108 | -0.094 | 0.2664 | 0.136 | 0.0112 | 64

Results: PCA

[Scatter plot: No PCA versus PCA (5 to 9 Matches with 128 Features). x-axis: Predicted Similarity Value (calculated with content-based features); y-axis: Semantic Similarity Value; series: Linear Regression with No PCA, Linear Regression with PCA]

Results: PCA

[Residual plot: No PCA versus PCA (5 to 9 Matches with 128 Features). x-axis: Semantic Similarity Value; y-axis: Error; series: Linear Regression with No PCA, Linear Regression with PCA]

RMSD as a Percent of Range

Columns give the RMSD, expressed as a percent of the range of semantic similarity values, for the Euclidean-distance baseline and the regression predictions, without PCA and with PCA:

Data Set | Features | Euclidean (No PCA) | Predicted (No PCA) | Euclidean (PCA) | Predicted (PCA)
6 – 9 Matches | 128 | 23.3% | 17.3% | 30.4% | 6.7%
6 – 9 Matches | 64 | 23.3% | 12.9% | 30.4% | 12.5%
5 – 9 Matches | 128 | 19.9% | 9.7% | 26.6% | 10.1%
5 – 9 Matches | 64 | 19.9% | 11.1% | 26.6% | 11.8%
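Assuming the table reports the root-mean-square deviation expressed as a percentage of the range of the actual semantic similarity values (this reading of the header is an assumption), the figure of merit would be computed as in the short sketch below (values are illustrative):

```python
import numpy as np

def rmsd_percent_of_range(actual, predicted):
    """RMSD between predicted and actual values, as a percent of the actual range."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    rmsd = np.sqrt(np.mean((predicted - actual) ** 2))
    return 100.0 * rmsd / (actual.max() - actual.min())

print(rmsd_percent_of_range([0.02, 0.05, 0.10, 0.16], [0.03, 0.06, 0.09, 0.15]))
```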

Example: Nodule 37 and Nodule 38

[Images of Nodule 38 and Nodule 37]

Euclidean Similarity Value | PCA Similarity Value
0.549066 | 0.004379

Semantic ratings:

Nodule Number | Lobulation | Malignancy | Margin | Sphericity | Spiculation | Subtlety | Texture
37 | 5 | 3 | 5 | 5 | 5 | 4 | 5
38 | 5 | 3 | 5 | 5 | 5 | 5 | 5

Future Work

• Perform the analysis only on nodules on which all three radiologists agree
• To address the small size of the data set, perform the analysis using a leave-one-out technique instead of a 2/3 training and 1/3 testing split (see the sketch after this list)
• Incorporate relevance feedback into the system
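A brief sketch of how the proposed leave-one-out evaluation could look (scikit-learn shown, with random stand-in data; this is a suggestion, not part of the presented results):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Each nodule pair is predicted by a model trained on all remaining pairs,
# instead of a single 2/3 training / 1/3 testing split.
rng = np.random.default_rng(3)
X = rng.random((108, 64))        # stand-in: content-based predictors per pair
y = rng.random(108)              # stand-in: semantic similarity values

pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
print("leave-one-out RMSD:", np.sqrt(np.mean((pred - y) ** 2)))
```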

Questions?
