Post on 15-Jan-2016
![Page 1: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/1.jpg)
An Investigation into the Relationship between Semantic and Content Based Similarity
Using LIDC
Grace Dasovich
Robert Kim
Midterm Presentation
August 21 2009
![Page 2: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/2.jpg)
Outline
• Related Work
• Data
• Modeling Approach and Results
– Similarity Measures
– Artificial Neural Network
– Multivariate Linear Regression
• Conclusions
• Future Work
![Page 3: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/3.jpg)
• Computer-Aided Diagnosis (CADx) based on low-level image features
– Armato et al. developed a linear discriminant classifier using features of lung nodules
– Need to find the relationship between the image features and radiologists’ ratings
Related Work
![Page 4: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/4.jpg)
• Image features and the semantic ratings
– Lung Interpretations
• Barb et al. developed Evolutionary System for Semantic Exchange of Information in Collaborative Environments (ESSENCE)
• Raicu et al. used ensemble classifiers and decision trees to predict semantic ratings
• Samala et al. used several combinations of image features and the radiologists’ ratings to classify nodules
Related Work
![Page 5: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/5.jpg)
– Similarity
• Li et al. investigated four different methods to compute similarity measures for lung nodules
– Feature-based
– Pixel-value-difference
– Cross correlation
– ANN
Related Work
![Page 6: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/6.jpg)
Materials
• LIDC Dataset
• 149 Unique Nodules
– One slice per nodule, largest nodule area
• 9 Semantic Characteristics
– Calcification and Internal Structure had little variation, thus were not used
• 64 Content Features
– Shape, size, intensity, and texture
Data
![Page 7: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/7.jpg)
• Related Work
• Data
• Modeling Approach and Results
– Similarity Measures
– Artificial Neural Network
– Multivariate Linear Regression
• Conclusions
• Future Work
Outline
![Page 8: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/8.jpg)
• Cosine Similarity
• Jeffrey Divergence
• Euclidean Distance
Similarity Measures
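The three measures listed on this slide can be sketched as follows. This is a minimal illustration, not the presenters' code: the function names are hypothetical, and the Jeffrey divergence is written in the symmetric, smoothed form common in image-retrieval work, which is an assumption about the variant used here. Inputs to the Jeffrey divergence are normalized to sum to one.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two rating (or feature) vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.linalg.norm(a - b))

def jeffrey_divergence(p, q, eps=1e-12):
    """Symmetric divergence of p and q from their mean m = (p + q) / 2.

    Assumption: non-negative inputs, normalized here to sum to 1.
    eps only guards against log(0).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = (p + q) / 2
    return float(np.sum(p * np.log((p + eps) / (m + eps))
                        + q * np.log((q + eps) / (m + eps))))

# Two hypothetical nodules, each described by seven semantic ratings (1-5).
nodule_a = [3, 4, 2, 5, 1, 4, 3]
nodule_b = [2, 4, 3, 5, 1, 3, 3]
print(cosine_similarity(nodule_a, nodule_b))
print(euclidean_distance(nodule_a, nodule_b))
print(jeffrey_divergence(nodule_a, nodule_b))
```

Cosine similarity grows with resemblance, while the other two are distances that shrink with it, so their signs must be reconciled before any comparison.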
![Page 9: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/9.jpg)
Similarity Measures
[Figure: scatter plot; x-axis: Euclidean Distance (0 to 1), y-axis: Cosine Similarity (-0.1 to 0.6)]
![Page 10: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/10.jpg)
[Figure: scatter plot; x-axis: Euclidean Distance (0 to 1), y-axis: Jeffrey Divergence (0 to 4)]
Similarity Measures
![Page 11: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/11.jpg)
• Computed feature distance measures
Similarity Measures
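The slide does not say how the feature distance measures were computed. One plausible reading, sketched below purely as an assumption, is a per-feature absolute difference for every pair of nodules, which yields one distance per content feature (64 per pair for 64 features). The function name `feature_distances` is hypothetical.

```python
import numpy as np
from itertools import combinations

def feature_distances(features):
    """Return all nodule pairs and their per-feature absolute differences.

    features: array of shape (n_nodules, n_features).
    Assumption: |f_k(i) - f_k(j)| is the per-feature distance; the slides
    do not state the exact formula.
    """
    features = np.asarray(features, float)
    pairs = list(combinations(range(len(features)), 2))
    i, j = np.array(pairs).T
    return pairs, np.abs(features[i] - features[j])

# Four hypothetical nodules with three features each give six pairs.
pairs, dists = feature_distances([[1, 2, 3], [2, 2, 5], [0, 1, 3], [4, 4, 4]])
print(pairs[0], dists[0])
```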
![Page 12: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/12.jpg)
Outline
• Related Work
• Data
• Modeling Approach and Results
– Similarity Measures
– Artificial Neural Network
– Multivariate Linear Regression
• Conclusions
• Future Work
![Page 13: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/13.jpg)
• Two three-layer ANNs
– Input (64 neurons), hidden layer (5 neurons), output (1)
– Input (64 neurons), hidden layer (5 neurons), output (7)
• Input = 64 feature distances
• Output = Semantic similarity or difference in semantic ratings
• Hyperbolic tangent function, backpropagation algorithm, 200 iterations
Methods
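A three-layer network of the shape described above can be sketched in plain NumPy. This is only an illustration under stated assumptions: tanh is applied to both the hidden and output layers, training is full-batch gradient descent on mean squared error, and the learning rate, weight initialization, and the name `train_ann` are all hypothetical, since the slides give only the layer sizes, the activation, and the iteration count.

```python
import numpy as np

def train_ann(X, y, n_hidden=5, n_iter=200, lr=0.05, seed=0):
    """Train an input-hidden-output tanh network with backpropagation.

    X: (n_pairs, 64) feature distances. y: (n_pairs,) or (n_pairs, n_outputs).
    Returns a predict function and the per-iteration mean squared error.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    Y = np.asarray(y, float).reshape(len(X), -1)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    n = len(X)
    for _ in range(n_iter):
        # Forward pass.
        H = np.tanh(X @ W1 + b1)
        out = np.tanh(H @ W2 + b2)
        err = out - Y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass; the derivative of tanh(z) is 1 - tanh(z)^2.
        d_out = err * (1.0 - out ** 2)
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * H.T @ d_out / n
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / n
        b1 -= lr * d_hid.mean(axis=0)

    def predict(X_new):
        H_new = np.tanh(np.asarray(X_new, float) @ W1 + b1)
        return np.tanh(H_new @ W2 + b2)

    return predict, losses
```

Passing a one-column target gives the single-output network; passing a seven-column target gives the seven-output one. Because the output layer is tanh in this sketch, targets need to lie within (-1, 1).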
![Page 14: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/14.jpg)
• ANN with a single output
– 640 random pairs from all 109 nodules
– 231 pairs from nodules with malignancy > 3
– 496 pairs from nodules with area > 122 mm²
Methods
![Page 15: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/15.jpg)
Methods
• ANN with seven outputs
– 640 random pairs from all 109 nodules
![Page 16: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/16.jpg)
• Leave-one-out method
– Cosine similarity or Jeffrey divergence or difference in semantic ratings used as teaching data
– An ANN trained with entire dataset minus one image pair
– The pair left out used for testing
– Correlation between calculated radiologists’ similarity and ANN output calculated
Methods
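The leave-one-out procedure above can be sketched as a loop that holds out one pair at a time and then correlates the held-out predictions with the targets. To keep the sketch short, a least-squares linear model stands in for the ANN; that substitution, and every function name here, is an assumption for illustration only.

```python
import numpy as np

def leave_one_out_correlation(X, y, fit, predict):
    """Hold out each pair once, predict it, and return Pearson r.

    fit(X_train, y_train) -> model; predict(model, X_test) -> predictions.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    preds = np.empty(len(y))
    for k in range(len(y)):
        keep = np.arange(len(y)) != k          # every pair except pair k
        model = fit(X[keep], y[keep])
        preds[k] = predict(model, X[k:k + 1])[0]
    r = np.corrcoef(y, preds)[0, 1]
    return float(r), preds

# Stand-in model (NOT the ANN from the slides): ordinary least squares.
def fit_linear(X, y):
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic example: the target is an exact linear function of the inputs.
rng = np.random.default_rng(0)
X_demo = rng.random((30, 3))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + 0.5
r, _ = leave_one_out_correlation(X_demo, y_demo, fit_linear, predict_linear)
print(round(r, 3))
```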
![Page 17: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/17.jpg)
• ANN with a single output
– 640 random pairs from all 109 nodules
– 231 pairs from nodules with malignancy > 3
– 496 pairs from nodules with area > 122 mm²
• ANN with seven outputs
– 640 random pairs from all 109 nodules
Methods
![Page 18: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/18.jpg)
• ANN using 640 random pairs
Results
![Page 19: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/19.jpg)
• ANN using 231 pairs with malignancy rating > 3
Results
![Page 20: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/20.jpg)
• ANN using 496 pairs with area > 122 mm²
Results
![Page 21: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/21.jpg)
• ANN output vs. target values using Jeffrey divergence for the 640 pairs (r = 0.438)
Results
[Figure: scatter plot; x-axis: Output (0 to 0.8), y-axis: Target (0 to 1)]
![Page 22: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/22.jpg)
• ANN using random 640 pairs and the Jeffrey divergence with seven semantic ratings
Results
![Page 23: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/23.jpg)
Outline
• Related Work
• Data
• Modeling Approach and Results
– Similarity Measures
– Artificial Neural Network
– Multivariate Linear Regression
• Conclusions
• Future Work
![Page 24: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/24.jpg)
Methods
• Normalization of Features
– Min-Max Technique
– Z-Score Technique
• Pair Selection
– Looked for matches between the k most similar images based on semantic and content similarity
Methods
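The two normalization techniques and the pair-selection step can be sketched as below. The normalizations are standard. The matching rule is one interpretation of the slide and is an assumption: a pair (i, j) counts as a match when j is among the k nearest neighbours of i under both the semantic and the content distance matrices. All function names are hypothetical.

```python
import numpy as np

def min_max(features):
    """Rescale each feature column to the range [0, 1]."""
    f = np.asarray(features, float)
    lo, hi = f.min(axis=0), f.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero
    return (f - lo) / span

def z_score(features):
    """Shift each feature column to mean 0 and scale to unit variance."""
    f = np.asarray(features, float)
    sd = f.std(axis=0)
    return (f - f.mean(axis=0)) / np.where(sd > 0, sd, 1.0)

def matched_pairs(semantic_dist, content_dist, k):
    """Pairs whose members are k-nearest neighbours under both distances.

    Both arguments are square (n, n) distance matrices over the same images.
    """
    semantic_dist = np.asarray(semantic_dist, float)
    content_dist = np.asarray(content_dist, float)
    matches = set()
    for i in range(len(semantic_dist)):
        def nearest(row):
            order = [int(j) for j in np.argsort(row, kind="stable") if j != i]
            return set(order[:k])
        common = nearest(semantic_dist[i]) & nearest(content_dist[i])
        matches.update((min(i, j), max(i, j)) for j in common)
    return sorted(matches)
```

Sweeping k and counting the matched pairs at each value is one way to produce a pairs-versus-threshold curve.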
![Page 25: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/25.jpg)
Methods
• Multivariate Regression Analysis
– Select features with highest correlation coefficients
– Feature distance measures
Methods
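The regression step can be sketched as two pieces: rank the feature distances by their correlation with semantic similarity, then fit an ordinary least-squares model on the selected columns and report R². The function names, the choice of ten features as a default, and ranking by absolute correlation are assumptions for illustration, not details confirmed by the slides.

```python
import numpy as np

def select_top_features(D, s, n_keep=10):
    """Rank feature-distance columns of D by |Pearson r| with target s.

    Assumes no column of D is constant (its correlation would be undefined).
    """
    D, s = np.asarray(D, float), np.asarray(s, float)
    r = np.array([np.corrcoef(D[:, k], s)[0, 1] for k in range(D.shape[1])])
    order = np.argsort(-np.abs(r))[:n_keep]
    return order, r[order]

def fit_r_squared(D, s):
    """Least-squares fit of s on the columns of D plus an intercept."""
    D, s = np.asarray(D, float), np.asarray(s, float)
    A = np.column_stack([D, np.ones(len(D))])
    coef, *_ = np.linalg.lstsq(A, s, rcond=None)
    resid = s - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((s - s.mean()) ** 2)
    return coef, float(r2)

# Synthetic example: the target depends on columns 2 and 4 only.
rng = np.random.default_rng(0)
D_demo = rng.random((200, 6))
s_demo = 3.0 * D_demo[:, 2] - 2.0 * D_demo[:, 4] + 0.01 * rng.normal(size=200)
cols, _ = select_top_features(D_demo, s_demo, n_keep=2)
_, r2 = fit_r_squared(D_demo[:, cols], s_demo)
print(sorted(int(c) for c in cols), round(r2, 3))
```

A transformed distance can be tested by passing, for example, `D ** 2` or `np.exp(D)` in place of `D`.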
![Page 26: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/26.jpg)
• Nodule Analysis
– Determine differences between selected and non-selected nodules
– Define requirements for our model
Methods
![Page 27: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/27.jpg)
Results
[Figure: two panels plotted against Threshold (0 to 20); top: Correlation (0 to 1), bottom: Number of Pairs (0 to 2000)]
![Page 28: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/28.jpg)
Results
| | d(i, j) | d²(i, j) | exp(d(i, j)) |
|---|---|---|---|
| Cosine | 0.871 | 0.849 | 0.866 |
| Jeffrey | 0.647 | 0.633 | 0.608 |
![Page 29: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/29.jpg)
Results
| Correlation Coefficient | Feature |
|---|---|
| 0.1175 | Equivalent Diameter |
| 0.1085 | Energy (Haralick) |
| 0.0823 | Gabor Mean 135_05 |
| 0.0647 | Convex Area |
| 0.0467 | Gabor STD 135_04 |
| 0.0322 | Min Intensity BG |
| 0.0295 | Markov 4 |
| 0.0280 | Variance (Haralick) |
| 0.0265 | Gabor STD 45_05 |
| 0.0238 | SD Intensity |

R² = 0.871
![Page 30: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/30.jpg)
Results
[Figure: scatter plot; x-axis: Content (0 to 1), y-axis: Semantic (-0.1 to 0.6)]
![Page 31: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/31.jpg)
Results
[Figure: seven panels showing the distribution of ratings (1 to 5) for Lobulation, Malignancy, Margin, Sphericity, Spiculation, Subtlety, and Texture; legend: 79 Nodules, 70 Nodules]
![Page 32: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/32.jpg)
Results
[Figure: ten panels showing the distribution of Equivalent Diameter, Energy, Gabor Mean 135 5, Convex Area, Gabor SD 135 4, Min Intensity BG, Markov4, Variance, Gabor SD 45 5, and SD Intensity; legend: 79 nodules, 70 nodules]
![Page 33: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/33.jpg)
Results
[Figure: four panels, A to D, each comparing the 79-nodule and 70-nodule groups]
A. Equivalent Diameter, B. Standard Deviation of Intensity, C. Malignancy, D. Subtlety
![Page 34: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/34.jpg)
Preliminary Issues
• The ANN is also not yet sufficient to predict semantic similarity from content
– Best correlation 0.438
– Malignancy correlation 0.521
– Jeffrey divergence performed better, unlike in the linear model
• A semantic gap still exists
Conclusions
![Page 35: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/35.jpg)
Conclusions
• Our linear model applies to a specific type of nodule
– Characteristics: High malignancy, high texture, low lobulation, and low spiculation
– Features: Larger diameter, greater intensity
• Linear models are not sufficient for determination of similarities
– R² of 0.871 with chosen nodules
![Page 36: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/36.jpg)
Future Work
• Reduce variability among radiologists
– Use only nodules with radiologists’ agreement
• Find best combination of content features
– 64 may be too many
– Currently only using 2D
![Page 37: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/37.jpg)
• Different semantic distance measures
– Some ratings are ordinal; Jeffrey divergence is for categorical data
• Different methods of machine learning
– Incorporate radiologists’ feedback into training
– Ensemble of classifiers
Future Work
![Page 38: An Investigation into the Relationship between Semantic and Content Based Similarity Using LIDC Grace Dasovich Robert Kim Midterm Presentation August 21](https://reader036.vdocuments.us/reader036/viewer/2022062518/56649d595503460f94a392ea/html5/thumbnails/38.jpg)
Thanks for Listening
Any Questions?