Determining the importance of figures in journal articles to find representative images
DESCRIPTION
When physicians search for articles in the medical literature, the images in an article can help determine whether its content is relevant to a specific information need. A visual representation can improve both effectiveness (the quality of the articles found) and efficiency (the speed of judging relevance or irrelevance), as many articles can likely be excluded much more quickly by looking at a few representative images. In domains such as medical information retrieval, determining relevance quickly and accurately is an important criterion. It becomes even more important on small interfaces, as is frequently the case when mobile phones and tablets are used to access scientific data whenever information needs arise. Scientific articles use many figures, and particularly in the biomedical literature only a subset may be useful for determining the relevance of a specific article to an information need. In many cases clinical images can be seen as more important for visual appearance than graphs or histograms, which require looking at the context for interpretation. To get a clearer idea of image relevance in articles, a user test was performed in which a physician classified images from biomedical research articles into categories of importance; these categories can subsequently be used to evaluate algorithms that automatically select representative images. The manual sorting by importance of the images of 50 BioMedCentral journal articles, each containing more than 8 figures, also makes it possible to derive several rules for choosing images and for developing algorithms that choose the most representative images of specific texts.
This article describes the user tests and can be a first important step towards evaluating automatic tools that select representative images for articles, and potentially for other contexts as well, for example representing patient records or other medical concepts, such as selecting images to represent RadLex terms in tutorials or interactive interfaces. This can help make the image retrieval process more efficient and effective for physicians.
TRANSCRIPT
Determining the importance of figures in journal articles to find representative images
Henning Müller, Antonio Foncubierta-Rodriguez, Chang Lin, Ivan Eggel
Introduction
Images are essential for diagnosis and treatment planning
Diversity of modalities and protocols is growing
Medical images represent the largest amount of medical information produced
Medical literature carries much of the medical knowledge
Images are essential for medical articles
Images can help understand content of an article quickly
Much quicker than reading a text to determine relevance
Search in the literature is frequent
Motivation
Image retrieval and classification make medical visual content accessible for search
Images from PACS or from the medical literature (as in the ImageCLEF image retrieval campaign)
When searching, the image needs to be put into context
Text and visual data to understand context
Often more than a single image
Identification of important figures
Use of visual content to determine relevance quickly
Limit search to important figures
Examples for using figure importance
Interfaces and visualization require choices
Mobile information search has a small interface
Creation of ground truth and rules for importance ranking, to select figures in the best way
Often the first N are taken
Database used
ImageCLEF 2012 dataset
~75,000 articles of PubMed Central, ~300,000 images
Subset was used
~800 articles contained more than 8 images
Final choice of 50 articles containing 641 diagnostic images
Types of figure importance
Five categories – sorted by importance
Most important figure – 1 figure that best describes the article
Essential figures – 0-3 figures
Important figures – 0-3 figures
Figures of little importance – 0+ figures
Totally unimportant figures – 0+ figures
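The category limits above can be written down as a small consistency check for one article's labels. This is only a sketch: the names `Importance`, `LIMITS` and `validate_labels`, and the figure ids, are hypothetical and not taken from the study.

```python
from collections import Counter
from enum import IntEnum


class Importance(IntEnum):
    """The five categories from the slide, most important first."""
    MOST_IMPORTANT = 1
    ESSENTIAL = 2
    IMPORTANT = 3
    LITTLE_IMPORTANCE = 4
    UNIMPORTANT = 5


# Allowed (min, max) number of figures per category in one article.
# None means no upper limit ("0+" on the slide).
LIMITS = {
    Importance.MOST_IMPORTANT: (1, 1),
    Importance.ESSENTIAL: (0, 3),
    Importance.IMPORTANT: (0, 3),
    Importance.LITTLE_IMPORTANCE: (0, None),
    Importance.UNIMPORTANT: (0, None),
}


def validate_labels(labels):
    """Return a list of limit violations for one article.

    `labels` maps a figure id (hypothetical, e.g. "fig3") to an Importance.
    An empty list means the labels respect all category limits.
    """
    counts = Counter(labels.values())
    problems = []
    for category, (low, high) in LIMITS.items():
        n = counts.get(category, 0)
        if n < low or (high is not None and n > high):
            problems.append(f"{category.name}: {n} figures")
    return problems
```

Such a check could be run on each annotated article before the labels are used as ground truth.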
User test
User test participant profile:
45-year-old physician, surgeon
Working with medical images on a daily basis
User test scenario:
Searching for relevant articles on the topic of the text
Which figures contain the most important information about the article?
Requires reading and understanding the article
All systematic findings or rules for the ranking were written down
Results
All 50 articles were analyzed and all 641 images manually ranked by importance
Usually over 1 hour per article
Rules and findings were noted and discussed
Statistics on figure importance and places in the text were made
Compound figures were difficult to judge
Information content is high as they contain different subfigures and links between them
Compound figure separation is a new subtask in ImageCLEF
Rules for determining importance
Diagnostic images are more important than graphs/system overviews
Diversity in the results set is important
Not the same modality several times if others exist
Size/resolution matters
Image quality is important
Position in article
Most important in the conclusions, then in the results
Common vs. rare modality
Common modalities are often easier to understand
More rules for the importance
Magnification
Highest magnification most important
MRI for soft tissue diseases, CT for bone-related ones
Modality adapted to the disease, can be derived from RadLex terms
Compound vs. other
Compound figure contains much information
Among non-diagnostic images, statistical graphs are most important
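A few of these rules could be combined into a simple score, followed by a selection step that favours diversity of modalities. The sketch below is one possible reading of the rules, not the authors' method: the `Figure` fields, the labels, all weights and the pixel threshold are assumptions chosen only for illustration. Magnification and the disease-specific modality rule are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    fig_id: str
    kind: str      # "diagnostic", "graph" or "overview" (assumed labels)
    modality: str  # e.g. "MRI" or "CT"; empty for non-diagnostic figures
    section: str   # e.g. "conclusions", "results", "methods"
    pixels: int    # width * height, a stand-in for size/resolution
    is_compound: bool = False


# All weights are illustrative assumptions, not values from the study.
KIND_WEIGHT = {"diagnostic": 3.0, "graph": 1.0, "overview": 0.0}
SECTION_WEIGHT = {"conclusions": 2.0, "results": 1.0}


def score(fig):
    """Score one figure: higher means more likely to be representative."""
    s = KIND_WEIGHT.get(fig.kind, 0.0)         # diagnostic > graph > overview
    s += SECTION_WEIGHT.get(fig.section, 0.0)  # conclusions, then results
    if fig.is_compound:                        # compound figures carry much information
        s += 0.5
    if fig.pixels >= 500_000:                  # size/resolution matters
        s += 0.5
    return s


def select_representative(figures, n=3):
    """Pick up to n figures, avoiding repeated modalities while others exist."""
    ranked = sorted(figures, key=score, reverse=True)
    chosen, seen = [], set()
    # First pass: at most one diagnostic figure per modality.
    for fig in ranked:
        if len(chosen) == n:
            break
        if not fig.modality or fig.modality in seen:
            continue
        chosen.append(fig)
        seen.add(fig.modality)
    # Second pass: fill the remaining slots by score alone.
    for fig in ranked:
        if len(chosen) == n:
            break
        if fig not in chosen:
            chosen.append(fig)
    return chosen
```

With two MRI figures and one CT figure, a call with `n=2` would return the best MRI figure and the CT figure rather than both MRI figures. Annotated data of the kind described here could be used to tune or replace the hand-set weights.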
Results
Distribution of figure importance
Results
Importance vs. position of the figure in the article based on relative position in the text
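The slides do not say how the relative position was computed. One plausible definition, assumed here, is the character offset of the first mention of a figure divided by the length of the article text. The function names and the choice of ten bins are hypothetical.

```python
def relative_position(text, marker):
    """Return where `marker` first occurs in `text` as a fraction in [0, 1).

    Returns None if the marker (for example "Figure 3") is not found.
    """
    index = text.find(marker)
    if index < 0:
        return None
    return index / len(text)


def position_bin(rel, bins=10):
    """Map a relative position to a bin index in 0..bins-1 for a histogram."""
    return min(int(rel * bins), bins - 1)
```

Counting the importance categories per bin would give a table comparable to an importance-versus-position chart.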
Example for figure importance
Conclusions
Relative importance of visual content in scientific articles is important for many applications
First study to investigate this in a user test
Can improve IR system performance
Rank images based on position or remove some images to search a smaller sub space
Can improve visualization
For small interfaces such as mobile phones and tablets
To combine text and images in an optimal way
Future work
Repeat the same test with a second person to measure inter-rater disagreement
Quantify the subjectivity of the task
Model the rules and optimize an automatic selection strategy
Modality classification to filter out unwanted images
Clustering and use of cluster centers to increase diversity
Find the place of figures in the text and rank based on this
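The slides do not name a measure for the planned inter-rater comparison. Cohen's kappa is one common choice for two raters, and the sketch below assumes it. The function name and the integer category labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same figures.

    Both arguments are equal-length sequences of category labels, one per
    figure, in the same order. Returns 1.0 for perfect agreement and 0.0
    when agreement equals what chance alone would produce.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of figures with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

Because the five categories are ordered, a weighted variant of kappa that penalizes near misses less than distant ones may fit the task better.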
Thank you for your attention!
Further information:
http://medgift.hevs.ch/
http://www.khresmoi.eu/
http://www.mitime.org/width/