Expert Systems With Applications 44 (2016) 386-399
Contents lists available at ScienceDirect
Expert Systems With Applications
journal homepage: www.elsevier.com/locate/eswa
Evaluation of semantic similarity metrics applied to the automatic retrieval of medical documents: An UMLS approach
Israel Alonso, David Contreras
Department of Telematics and Computer Science, Comillas Pontifical University, C/ Alberto Aguilera, 25, 28015 Madrid, Spain
Article info
Keywords:
Semantic similarity
Information retrieval
Electronic Health Record
UMLS
Abstract
One promise of current information retrieval systems is the capability to identify risk groups for
certain diseases and pathologies based on the automatic analysis of vast Electronic Medical Record
repositories. However, the complexity and degree of specialization of the language used by experts in
this context make this task challenging. In this work, we introduce a novel experimental study to
evaluate the performance of two semantic similarity metrics (Path and Intrinsic IC-Path, both widely
accepted in the literature) in a real-life information retrieval situation. To achieve this goal, and
due to the lack of methodologies for this context in the literature, we propose a straightforward
information retrieval system for the biomedical field based on the UMLS Metathesaurus and on semantic
similarity metrics. In contrast with previous studies, which focus on testbeds with limited and
controlled sets of concepts, we use a large amount of information (101,712 medical documents extracted
from the TREC 2011 Medical Records Track). Our results show that in real-life cases both metrics display
similar performance: Path (F-Measure = 0.430) and Intrinsic IC-Path (F-Measure = 0.427). We therefore
suggest that the use of Intrinsic IC-Path is not justified in real scenarios.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
The exponential growth, in recent times, of the amount of biomedical information stored on purely
electronic supports (Electronic Health Records, or EHR, spring promptly to mind) has turned them into
an element of undeniable relevance to the most diverse fields of scientific research (Hoffman, 2010;
Prokosch & Ganslandt, 2009).
One of these fields is Information Retrieval, with its traditional challenge of identifying the records
that most efficiently answer a user's immediate information needs. To accomplish this task, it is
critical first to recognize patterns in medical histories, which would ultimately permit the early
detection of epidemic outbreaks, the prevention of disease, or the identification of cohort groups
(Roque et al., 2011). The main difficulty in undertaking this task arises from Natural Language
Processing, as natural language is not only complex but also highly context-sensitive. In a broad field
such as the English language, for instance, it becomes necessary to draw upon resources and ontologies
like WordNet to aid representation (Fellbaum, 1998).
Corresponding author. Tel.: +34 915422800; fax: +34 91 559 65 69.
E-mail addresses: [email protected] (I. Alonso), [email protected]
(D. Contreras).
Unfortunately, these tools are of limited use to more specialized disciplines such as biomedicine,
whose technical jargon is often as complex as it is ambiguous; the parsing of biomedical information
calls for very specific terminology (Friedman, Kra, & Rzhetsky, 2002) and, hence, for new search
strategies designed from the outset for the particular demands of this branch of science (Alpi, 2005).
In such cases, one must resort to specialist resources (dictionaries and thesauri like UMLS; McCray et
al., 1993) to give a semantic value to relevant information.
Our present work aims to bridge this gap by helping information retrieval systems based on Electronic
Health Records to operate on semantic content; in a nutshell, by interpreting the information needs of
any given query and then selecting the medical documents most relevant in terms of semantic proximity.
We believe this endeavor is much needed for the correct identification of patients in cohort studies,
given the complexity, variability, and lack of structure of the information traditionally contained in
such records. It requires defining and representing, through biomedical concepts, the information
contained in both health records and medical queries, in order to establish the semantic proximity
between them. Using the semantic relationships between these concepts in this fashion closely emulates
the analogous process by which the human mind establishes similarity between two given terms (Miller &
Charles, 1991; Rubenstein & Goodenough, 1965). It should be pointed out beforehand that previous works have
http://dx.doi.org/10.1016/j.eswa.2015.09.028
0957-4174/© 2015 Elsevier Ltd. All rights reserved.
shown interest in establishing metrics for determining the degree of semantic similarity between two
terms (Collins & Loftus, 1975) in a more general context such as the English language, based on the
WordNet infrastructure (Meng, Huang, & Gu, 2013). Unfortunately, these approaches do not always yield
satisfactory results when applied to the biomedical domain, since WordNet's coverage of this domain is
rather limited (Burgun & Bodenreider, 2001).
Later works have attempted to solve this by incorporating specific resources and ontologies (MeSH and
SNOMED CT) in the study of similarity metrics in the field of biomedicine, always in a theoretical
context and a controlled environment (Al-Mubaid & Nguyen, 2006; Batet, Sánchez, & Valls, 2011; Caviedes
& Cimino, 2004; Nguyen & Al-Mubaid, 2006; Pedersen, Pakhomov, Patwardhan, & Chute, 2007). These works
indicate that a specialised infrastructure, namely UMLS, is necessary to determine the similarity
between two concepts in the field of biomedicine with the degree of precision that a human expert would
expect to achieve.
In this work we propose an experimental study to evaluate the performance of two semantic similarity
metrics (Path and Intrinsic IC-Path, both widely accepted in the literature) in a real-life information
retrieval context. To perform this assessment, and given the lack of methodologies for this context in
the literature, we deploy a straightforward information retrieval system for the biomedical field based
on the UMLS Metathesaurus and on semantic similarity metrics.
Our paper will be structured as follows:
In Section 2, we describe the main components and characteristics of UMLS. In Section 3, we outline the
current state of the art, focusing on the tools and strategies currently used in the retrieval of
biomedical information, as well as the metrics used to calculate the semantic similarity between two
concepts in this particular field. In Section 4, we define our proposal, along with the materials used
in our work. In Section 5, we study the inner workings of the different sources and relationships
contained in UMLS, and how they are reflected in the results obtained by semantic similarity metrics in
a purely theoretical context; we later take as our reference the two main metrics based on the
Intrinsic IC and Path finding approaches, for their study and their application to a real-life context.
Section 6 describes the procedures involved in our proposal for an ad-hoc and straightforward
concept-based medical document retrieval system, and evaluates the efficacy of the two main semantic
similarity metrics when applied to a real-life context (reflected in TREC 2011). Section 7 covers the
analysis and interpretation of the results obtained. Last, Section 8 comments on the conclusions
derived from all conducted tests, the contributions obtained from their results, and the future lines
of research that would give continuity to our work.
2. UMLS
UMLS¹ (Unified Medical Language System) is an ongoing project started in 1986 by the National Library
of Medicine. It was envisioned as a common environment for the access and treatment of biomedical
information (Bodenreider, 2004; Humphreys, Lindberg, Schoolman, & Barnett, 1998; Lindberg, Humphreys, &
McCray, 1993). To this end, it structures this information as a series of concepts, with a set of
relationships between them. At its core, UMLS is made up of three components, all of which undergo
regular updates and revision: a Metathesaurus, a Semantic Network, and a Specialist Lexicon (lexical
information and tools for natural language processing). Of these elements, the Metathesaurus and the
Semantic Network are of particular interest to our work: the former for the concepts, sources, and
relationships it contains, and the latter for the semantic types it offers.
¹ http://www.nlm.nih.gov/research/umls/.
Table 1
Representation structure of UMLS concept C0018787.
CUI        LUI        SUI        AUI         Source   String
C0018787   L0018787   S0047194   A0066368    MeSH     Heart
C0018787   L0018787   S0047194   A16757661   NCI      Heart
C0018787   L0018787   S0047194   A2882201    SNOMED   Heart
C0018787   L0018787   S0375948   A16766657   NCI      HEART
C0018787   L0018787   S0419735   A0480532    CSP      heart
C0018787   L0018787   S0419735   A18628913   CHV      heart
C0018787   L0248647   S0324326   A12802806   NCI      Cardiac
C0018787   L0248647   S1344787   A1304355    CSP      Cardiac
C0018787   L0248647   S1344787   A18647556   CHV      Cardiac
The Metathesaurus is, in essence, a vast multipurpose and multi-language database covering more than
one million concepts, all of them represented under a common framework, and stored in over a hundred
different sources. These sources are grouped in several distinct perspectives of the biomedical
environment, such as scientific information (MeSH-CRISP), clinical terminology (SNOMED-CT),
administrative terminology (ICD-9-CM, CPT-4), or data exchange (HL7, LOINC), as well as general or
specific thesauri including anatomy (UWDA, NeuroNames), drugs (RxNorm, First Data Bank), medical
devices (UMD, SPN), nursing (NIC, NOC, NANDA), oncology (PDG), adverse reactions (COSTART, WHO) or gene
products (Gene Ontology-GO), to name a few.
The data compiled in these various sources is organized in the Metathesaurus following a unique
identifier structure, with a hierarchy of four significance levels: Concepts, Terms, Strings, and
Atoms. In this order:
CUI (Concept Unique Identifier): Each concept represents a distinct meaning, which encompasses, within
a unique code, all its synonym terms.
LUI (Lexical Unique Identifier): Identifies each of the known lexical variations or terms for any given
concept.
SUI (String Unique Identifier): Represents each descriptive string associated to a given term. One of
them is designated as its name, or preferred term. All predicted variations in the character sequence
of the string (upper and lower case, punctuation) are covered in separate identifiers.
AUI (Atom Unique Identifier): Corresponds to each individual occurrence of a given string in a specific
source.
Hence, for instance, the concept (C0018787), which represents the muscular organ that keeps blood
circulating, is grouped into a number of descriptive strings, a few of which are shown for the sake of
the example (Table 1).
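The four identifier levels can be pictured as a nested mapping. The following is a minimal sketch, assuming the rows of Table 1 as transcribed here; the function name `build_hierarchy` and the nested-dictionary layout are illustrative choices, not part of UMLS or of the system described in this paper.

```python
from collections import defaultdict

# Rows transcribed from Table 1: (CUI, LUI, SUI, AUI, source, string).
ROWS = [
    ("C0018787", "L0018787", "S0047194", "A0066368", "MeSH", "Heart"),
    ("C0018787", "L0018787", "S0047194", "A16757661", "NCI", "Heart"),
    ("C0018787", "L0018787", "S0047194", "A2882201", "SNOMED", "Heart"),
    ("C0018787", "L0018787", "S0375948", "A16766657", "NCI", "HEART"),
    ("C0018787", "L0018787", "S0419735", "A0480532", "CSP", "heart"),
    ("C0018787", "L0018787", "S0419735", "A18628913", "CHV", "heart"),
    ("C0018787", "L0248647", "S0324326", "A12802806", "NCI", "Cardiac"),
    ("C0018787", "L0248647", "S1344787", "A1304355", "CSP", "Cardiac"),
    ("C0018787", "L0248647", "S1344787", "A18647556", "CHV", "Cardiac"),
]


def build_hierarchy(rows):
    """Nest atoms as CUI -> LUI -> SUI -> list of (AUI, source, string)."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for cui, lui, sui, aui, source, string in rows:
        tree[cui][lui][sui].append((aui, source, string))
    return tree


tree = build_hierarchy(ROWS)
concept = tree["C0018787"]
n_terms = len(concept)                                   # distinct LUIs
n_strings = sum(len(strings) for strings in concept.values())  # distinct SUIs
n_atoms = sum(len(atoms) for strings in concept.values()
              for atoms in strings.values())             # individual AUIs
print(n_terms, n_strings, n_atoms)
```

Read this way, one concept fans out into terms, each term into strings, and each string into one atom per source in which it occurs.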
We must keep in mind that a given descriptive string (SUI) may be referenced in one or many concept
identifiers (CUIs). For example, the string Heart identifies the preferred term for concept (C0018787),
but it is also one of the synonym terms for concept (C1281570), Entire heart. We will now show the
series of descriptive strings for both these concepts, as well as the semantic type they belong to:
CUI: C0018787
SUI (Preferred term): Heart
Other SUIs (string terms): Hearts; Cardiac; coronary; cardiac structure; heart structure; structure of
heart, unspecified; corazón; estructura cardiaca; Cuore; herzen; Hart; etc.
Semantic Type: (bpoc) - Body Part, Organ, or Organ Component.
CUI: C1281570
SUI (Preferred term): Entire heart
Other SUIs (string terms): Heart; Entire heart (body structure); corazón; etc.
Semantic Type: (bpoc) - Body Part, Organ, or Organ Component.
The Metathesaurus includes different kinds of relationships between concepts, whether they are found in
the same source (intra-source) or in two different ones (inter-source). These relationships may be
hierarchical or non-hierarchical (Burgun & Bodenreider, 2001). Hierarchical relationships cover either
direct synonymical relations of the Parent/Child (PAR/CHD) type or indirect ones of the
Broader/Narrower (RB/RN) type. In turn, non-hierarchical relationships may belong to the Siblings
(SIB), Other (RO), Similar (RL), Source Asserted Synonymy (SY), Possible Synonymy (RQ), Allowed
Qualifier (AQ), or Can Be Qualifier (QB) types. One advantage of the UMLS Metathesaurus is that it
encompasses all known biomedical sources, which can then be used as a whole or independently.
As for the Semantic Network, it categorizes all concepts in UMLS into 133 semantic types, between which
54 relationships are established. This is done through a tree structure, stemming from two main
hierarchies: Entity and Event (Bodenreider, 2001; Bodenreider & McCray, 2003; Erdogan, Erdem, &
Bodenreider, 2010). Each concept (CUI) in the Metathesaurus belongs to at least one semantic type; the
deeper in the structure these types lie (that is, the closer to the tree leaves), the more specific
they will be. Thus, the concept (C0497327), associated to the term Dementia, belongs to the semantic
type Mental or Behavioral Dysfunction (mobd), which in turn is encompassed by the semantic group
Disorders (DISO), which is finally contained in the hierarchy Event.
The representation hence achieved of the biomedical knowledge contained in UMLS is the basis for its
application in a variety of strategies and tools for natural language processing, and the computation
of semantic similarity between concepts.
3. Related work
Information Retrieval Systems (IRS) based on natural language processing have been thoroughly studied
in the context of biomedical literature and, more recently, in that of clinical documentation. This
interdisciplinary research field, dubbed biomedical informatics (Jiang et al., 2013), is among the
fastest-growing in recent years.
3.1. Information Retrieval Systems (IRS) in the biomedical environment
IRS attempt to satisfy the information needs that users express through queries. These queries
intrinsically contain their search objectives, whose sensitivity proves to be a crucial factor in the
development of search algorithms and the retrieval of their results (Rose & Levinson, 2004). The main
difficulty in reaching these results is the lack of precision in the queries themselves, a problem only
made worse by the inherent complexity of the language and by the kind of information at hand. To
address this problem by completing and improving the information presented in the query, query
expansion techniques were developed (Efthimiadis, 1996). These techniques, widely used in IRS to
improve performance, add new terms to the original query to narrow down results. Broadly, query
expansion techniques can be separated into two general approaches.
A first group of techniques relies on the analysis of vast collections of documents, grouping them
through co-occurring vectors (Xu, Zhu, Zhang, Hu, & Song, 2006; Zhu, Wu, Carterette, & Liu, 2014) and
probabilistic models (Qi & Laquerre, 2012) of the most relevant terms. More recent works employ
semantic distribution models on linguistic elements in these collections in order to automatically
extract synonyms and abbreviations (Henriksson, Moen, Skeppstedt, Daudaravicius, & Duneld, 2014; Zeng,
Redd, Rindflesch, & Nebeker, 2012). Lastly, other works in this group analyze the application of
specific semantic similarity metrics on structures defined beforehand as containing the elements to be
evaluated, such as the Vector Space Model (Turney & Pantel, 2010) and the comparison of histogram
distances or cross-bin distances (Kurtz, Beaulieu, Napel, & Rubin, 2014).
A second group would instead focus on the use and analysis of structures based on existing knowledge of
the field of biomedicine, such as UMLS. The use of these resources calls for the disambiguation of the
original query's terms, so that they point at unique concepts within the ontology (Bhogal, Macfarlane,
& Smith, 2007; Voorhees, 1994). In this manner, the concepts related to the original search terms would
be used to expand the query.
One tool that, alongside the UMLS Metathesaurus, allows us to identify the concepts referred to in a
given text is Metamap (Aronson, 2001; Aronson & Lang, 2010). This tool provides the foundation for the
development of various query expansion techniques (Aronson & Rindflesch, 1997) that exploit the
semantic relations contained in UMLS. Within this approach, different efforts use defined UMLS
structures to develop solutions such as: the representation of texts from clinical documents via
semantic graphs, based on concepts and relationships (Plaza & Díaz, 2010); query expansion through
random walks based on the UMLS structure (Martinez, Otegi, Soroa, & Agirre, 2014); query expansion
through the creation of an ontology of the query itself, associated to closely related concepts
(Babashzadeh, Huang, & Daoud, 2013); and the use of relationships between concepts to reflect the
semantic distance between patients from stored information (Melton et al., 2006).
As a direct consequence of the need to improve techniques for exploiting the semantics of various
biomedical sources, several works have arisen that focus on evaluating the semantic similarity between
any given concepts in the field of biomedicine.
3.2. Semantic similarity
Over time, a large variety of metrics have been defined, analyzed, and implemented for the computation
of semantic similarity between concepts contained in biomedical sources such as SNOMED-CT, MeSH, UMLS,
etc. These metrics can be categorized according to two major strategies:
Based on the estimation of the semantic similarity between two terms, on account of the distance
between the links relating them within the ontology (Path finding).
Based on the semantic similarity between two concepts according to the information they contain
(Information Content).
3.2.1. Path finding similarity measures on taxonomical structure
These metrics attempt to measure semantic information across
concepts, based on the hierarchical relationships defined between
them in biomedical sources. The most important among these will
now be explained:
The first metric defines the semantic similarity (sim) between two concepts as the shortest path
between them (sp), according to their interrelationships. This metric, known as Concept Distance
(CDist), is defined by Rada, Mili, Bicknell, and Blettner (1989) as the number of nodes in the shortest
path between two concepts, c1 and c2, and is applied with (RB/RN) relationships on the MeSH vocabulary.
Caviedes & Cimino (2004) later evaluated it with (PAR/CHD) relationships on the MeSH, SNMI, and ICD9-CM
resources in the field of biomedicine.
$$\mathrm{sim}_{CDist}(c_1, c_2) = sp(c_1, c_2), \quad \text{where } sp \text{ is the shortest path between } c_1 \text{ and } c_2 \tag{1}$$
A later variation, called Path Measure or Path Length (Path), was defined by Pedersen et al. (2007) and
applied from is-a type relationships in SNOMED-CT. This variation corresponds to the inverse of the
distance between two concepts (CDist), hence normalising the similarity result to a value ranging from
0 to 1.
$$\mathrm{sim}_{Path}(c_1, c_2) = \frac{1}{sp(c_1, c_2)} \tag{2}$$
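Equations (1) and (2) can be illustrated with a breadth-first search over hierarchical links. The following is a minimal sketch on a hypothetical toy taxonomy: the concept names and the `PARENT` map are invented for illustration and are not taken from UMLS, and the path length is counted in nodes, as in the definition of CDist above.

```python
from collections import deque

# Hypothetical toy taxonomy (child -> parent), standing in for PAR/CHD links.
PARENT = {
    "heart": "organ",
    "lung": "organ",
    "organ": "body part",
    "finger": "body part",
}


def neighbours(parent):
    """Build an undirected adjacency map from child -> parent links."""
    adj = {}
    for child, par in parent.items():
        adj.setdefault(child, set()).add(par)
        adj.setdefault(par, set()).add(child)
    return adj


def shortest_path_nodes(adj, c1, c2):
    """Number of nodes on the shortest path between c1 and c2 (BFS)."""
    if c1 == c2:
        return 1
    seen, queue = {c1}, deque([(c1, 1)])
    while queue:
        node, count = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == c2:
                return count + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, count + 1))
    raise ValueError("concepts are not connected")


def sim_cdist(adj, c1, c2):
    """Equation (1): the shortest path itself."""
    return shortest_path_nodes(adj, c1, c2)


def sim_path(adj, c1, c2):
    """Equation (2): the inverse of the shortest path, in (0, 1]."""
    return 1.0 / shortest_path_nodes(adj, c1, c2)


ADJ = neighbours(PARENT)
print(sim_cdist(ADJ, "heart", "lung"), sim_path(ADJ, "heart", "finger"))
```

Because the path is counted in nodes, a concept compared with itself has sp = 1, so Path yields its maximum value of 1 for identical concepts.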
Later metrics introduce certain characteristics associated to the structure of the taxonomy which had
not been explored before, such as its depth or size, or the location of different concepts within it
(Fig. 1).
Fig. 1. Example of hierarchical relationships between concepts in UMLS Metathesaurus. The terms depth, path, and LCS are represented.
For instance, Leacock and Chodorow (1998) (lch) consider the shortest path (sp) between two concepts
(c1, c2), scaling it logarithmically to the total depth of the taxonomy (D). Thus, the deeper the
taxonomy (that is, the more complex and thorough), the larger the relative value of semantic proximity
between two terms would be. A proposal for the normalization of this metric to the unit interval can be
found in Garla and Brandt (2012).
$$\mathrm{sim}_{lch}(c_1, c_2) = 1 - \frac{\log(sp(c_1, c_2))}{\log(2D)} \tag{3}$$
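As a small worked illustration of Equation (3), the sketch below evaluates the normalized lch measure for given values of sp and D. The function name `sim_lch` and the numeric inputs are illustrative assumptions; the shortest path is taken as already computed.

```python
import math


def sim_lch(sp, depth):
    """Equation (3): 1 - log(sp) / log(2 * depth).

    sp    -- shortest path between the two concepts, counted in nodes (>= 1)
    depth -- total depth D of the taxonomy (>= 1)
    """
    if sp < 1 or depth < 1:
        raise ValueError("sp and depth must both be at least 1")
    return 1.0 - math.log(sp) / math.log(2 * depth)


# Same path length, two hypothetical taxonomies of different depth.
shallow = sim_lch(3, 5)
deep = sim_lch(3, 10)
print(shallow, deep)
```

For a fixed path length the value grows with D, which matches the observation that a deeper taxonomy yields a larger relative proximity between two terms.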
Other approaches introduce a new element in the hierarchy of both concepts, corresponding to their
closest common ancestor. The depth of both concepts will be established according to the depth of their
Least Common Ancestor (LCA), also known as Least Common Subsumer (LCS) (Fig. 1).
Wu and Palmer (1994) (wup) apply a measurement of the similarity between two concepts obtained by
scaling the depth (depth) of their Least Common Ancestor (LCS) to the depth of each of the two concepts
from the root of the taxonomy by way of their LCS. Garla and Brandt (2012) introduce a change including
the shortest path (sp) in the definition, to avoid the case (c1 = c2), which would result in
simwup(c1, c2) [...] > 0 and [...] > 0 are contribution factors of two features and k is a constant.
3.2.2. Information Content (IC) similarity measures
The following approaches to calculating semantic similarity are based on Shannon's Information Theory
(Shannon, 2001), by which the similarity between two concepts must be measured according to the amount
of information content they provide.
² http://search.cpan.org/dist/UMLS-Similarity/.
Information Content (IC) may be obtained from the distribution of a concept within a text corpus
alongside a taxonomy (Corpus IC) (Resnik, 1995), or from the structure of a taxonomy alone (Intrinsic
IC) (Sánchez, Batet, & Isern, 2011; Seco, Veale, & Hayes, 2004; Zhou, Wang, & Gu, 2008).
The Corpus IC of any concept c is defined as the negative log of the concept's frequency, where this
frequency is the probability of the concept occurring in a given corpus C, fq(c, C). The number of
times that the children concepts (cs) of c appear within the corpus is also added, so that the more
frequently the concept occurs, the lower its information content will be (Resnik, 1995).
$$\mathrm{IC}_{Corpus}(c) = -\log(fq(c)), \qquad fq(c) = fq(c, C) + \sum_{cs \in children(c)} fq(cs) \tag{8}$$
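Equation (8) can be illustrated with a recursive count over a taxonomy. The following is a minimal sketch, assuming a hypothetical three-concept taxonomy and invented corpus counts; normalising by the cumulative count of the root to obtain a probability is also an assumption of this sketch.

```python
import math

# Hypothetical toy data: taxonomy as parent -> children, plus raw corpus counts.
CHILDREN = {"organ": ["heart", "lung"], "heart": [], "lung": []}
COUNTS = {"organ": 2, "heart": 5, "lung": 3}
ROOT = "organ"


def cumulative_count(concept):
    """Occurrences of a concept plus those of all its descendants."""
    own = COUNTS.get(concept, 0)
    return own + sum(cumulative_count(ch) for ch in CHILDREN.get(concept, []))


def ic_corpus(concept):
    """IC = -log(fq(c)), with fq normalised by the root's cumulative count."""
    fq = cumulative_count(concept) / cumulative_count(ROOT)
    if fq == 0:
        raise ValueError("concept never occurs in the corpus")
    return -math.log(fq)


for name in ("organ", "heart", "lung"):
    print(name, ic_corpus(name))
```

The root accumulates every occurrence and therefore carries no information, while rarer concepts receive higher values.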
On the other hand, the Intrinsic IC of a concept c is a proposal defined in various works (Seco et al.,
2004; Zhou et al., 2008) and later adapted to the biomedical context (Batet et al., 2011). The
Intrinsic IC of a concept c is defined from the ratio of the number of its terminal concepts
(leaves(c)) to the number of its associated ancestors (subsumers(c)) (Sánchez & Batet, 2011). This
ratio is then normalized to the interval [0, 1] by the total number of leaves in the taxonomy
(max_leaves). Thus, the more terminal elements a concept has relative to its number of ancestors, the
lower its information content will be.
$$\mathrm{IC}_{Intrinsic}(c) = -\log\left(\frac{\frac{|leaves(c)|}{|subsumers(c)|} + 1}{max\_leaves + 1}\right) \tag{9}$$
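Equation (9) depends only on the shape of the taxonomy, which the following minimal sketch makes concrete. The toy taxonomy is hypothetical, and the sketch assumes that subsumers(c) includes c itself, so that the root has a non-zero denominator; the text above does not state this convention explicitly.

```python
import math

# Hypothetical toy taxonomy (child -> parent).
PARENT = {
    "organ": "body part",
    "finger": "body part",
    "heart": "organ",
    "lung": "organ",
}
ROOT = "body part"


def children_of(concept):
    return [c for c, p in PARENT.items() if p == concept]


def leaves(concept):
    """Leaf concepts strictly below `concept` (empty set for a leaf)."""
    found, stack = set(), children_of(concept)
    while stack:
        node = stack.pop()
        kids = children_of(node)
        if kids:
            stack.extend(kids)
        else:
            found.add(node)
    return found


def subsumers(concept):
    """The concept itself plus all of its ancestors (assumed convention)."""
    chain = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        chain.add(concept)
    return chain


MAX_LEAVES = len(leaves(ROOT))


def ic_intrinsic(concept):
    """Equation (9) evaluated on the toy taxonomy."""
    ratio = len(leaves(concept)) / len(subsumers(concept))
    return -math.log((ratio + 1) / (MAX_LEAVES + 1))


for name in ("body part", "organ", "heart"):
    print(name, ic_intrinsic(name))
```

Under these assumptions the root obtains an information content of zero and every leaf obtains the maximum, log(max_leaves + 1).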
These Information Content approaches lead to the semantic similarity measures defined by Lin (1998) and
Jiang and Conrath (1997). Lin's definition proposes a ratio of the common information content of a
given pair of concepts, IC(LCS(c1, c2)), to the information content that describes each concept
separately, IC(ci). In this approach, the higher the IC value of the LCS (that is, the more specific it
is), the greater the similarity between the concepts thus compared.
$$\mathrm{sim}_{Lin}(c_1, c_2) = \frac{2 \cdot \mathrm{IC}(LCS(c_1, c_2))}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)} \tag{10}$$
Jiang and Conrath, in turn, propose an analogous measure (opposite to similarity) based on the distance
between two concepts, which is evaluated as the difference between the information content of the two
concepts, IC(ci), and that of their common ancestor, IC(LCS(c1, c2)).
$$\mathrm{Dist}_{JC}(c_1, c_2) = \mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2 \cdot \mathrm{IC}(LCS(c_1, c_2)) \tag{11}$$
Metrics based on the Path finding approach have been redefined (Batet et al., 2011) in terms of
Information Content, and implemented for their evaluation (Garla & Brandt, 2012). To this end, the
shortest path (sp) between two concepts is redefined as the semantic distance (as proposed by Jiang and
Conrath), and maximum depth as the maximum Information Content of any concept (icmax).
All told, the metric (lch) based on Intrinsic IC (Intrinsic IC-lch) is
redefined as:
$$\mathrm{sim}_{Intrinsic\,IC\text{-}lch}(c_1, c_2) = 1 - \frac{\log(\mathrm{Dist}_{JC}(c_1, c_2) + 1)}{\log(2 \cdot ic_{max} + 1)} \tag{12}$$
and the metric (Path) based on Intrinsic IC (Intrinsic IC-Path) as:
$$\mathrm{sim}_{Intrinsic\,IC\text{-}Path}(c_1, c_2) = \frac{1}{\mathrm{Dist}_{JC}(c_1, c_2) + 1} \tag{13}$$
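Equations (10) to (13) all operate on three information content values: those of the two concepts and that of their LCS. The following is a minimal sketch of these four formulas; the function names and the numeric IC values in the usage lines are hypothetical and serve only as illustration.

```python
import math


def sim_lin(ic1, ic2, ic_lcs):
    """Equation (10). Undefined when both concepts have IC = 0."""
    return 2 * ic_lcs / (ic1 + ic2)


def dist_jc(ic1, ic2, ic_lcs):
    """Equation (11): Jiang and Conrath distance."""
    return ic1 + ic2 - 2 * ic_lcs


def sim_ic_lch(ic1, ic2, ic_lcs, ic_max):
    """Equation (12): lch with sp replaced by Dist_JC and depth by ic_max."""
    return 1 - math.log(dist_jc(ic1, ic2, ic_lcs) + 1) / math.log(2 * ic_max + 1)


def sim_ic_path(ic1, ic2, ic_lcs):
    """Equation (13): Path with sp replaced by Dist_JC + 1."""
    return 1 / (dist_jc(ic1, ic2, ic_lcs) + 1)


# Hypothetical IC values: two leaf concepts sharing a mid-level ancestor.
IC_LEAF, IC_LCS, IC_MAX = math.log(4), math.log(2), math.log(4)
print(sim_lin(IC_LEAF, IC_LEAF, IC_LCS))
print(sim_ic_path(IC_LEAF, IC_LEAF, IC_LCS))
print(sim_ic_lch(IC_LEAF, IC_LEAF, IC_LCS, IC_MAX))
```

The added 1 in Equations (12) and (13) keeps both measures defined when the distance is zero, in which case each of them returns 1.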
These metrics have been evaluated in various works and on different test benchmarks (Batet et al.,
2011; Garla & Brandt, 2012; Pedersen et al., 2007). These works reveal a better performance of metrics
based on Intrinsic IC over those based on Path finding.
4. Proposal and materials
As described in the previous section, metrics based on Intrinsic IC perform better than those based on
Path finding (Batet et al., 2011; Garla & Brandt, 2012) when working on testbeds with limited and
controlled sets. For this reason, our experimental study focuses on assessing the performance, in a
real-life context, of the Intrinsic IC-Path metric itself and of the simplest of the distance-based
metrics (chosen for its lower computational cost), Path. In order to perform this assessment, we have
deployed an information retrieval system for the biomedical field based on the UMLS Metathesaurus and
on semantic similarity metrics.
As we covered in the previous section, some earlier works focused on defining retrieval systems and
language processing supported by the UMLS resource (McCray et al., 1993), and others on the application
of semantic similarity metrics on defined structures independent from UMLS, such as the comparison of
histogram distances (Kurtz et al., 2014). None of these works integrates the use of the UMLS resource
with semantic similarity metrics into information retrieval systems for the context of biomedical
information.
Later on, in Section 5, we will assess the performance of the UMLS
Metathesaurus at calculating the semantic similarity between con-
cepts from a theoretical perspective. To this end, we will use previous
works as reference (Batet et al., 2011; Garla, & Brandt, 2012; McInnes
et al., 2009; Pedersen et al., 2007), and compare their results with
the ones attained in our own work, in order to validate our frame-
work. Said works evaluate the semantic similarity of several lists of
paired concepts, using different metrics, and compare the results to
those proposed by a team of medical coders and physicians. We will
also analyze the impact, in those results, of using different versions
of UMLS and new types of relationships. Lastly, we will highlight the
great diversity appreciable in the results of previous studies, which is
due to the lack of a single correlation coefficient.
In Section 6, we will analyze the results of the Path and Intrinsic IC-Path metrics in a real-life
information retrieval context based on semantic similarity. In this part of our paper, we will use the
test dataset from the 2011 Text Retrieval Conference (TREC) (Voorhees & Tong, 2011). This test dataset
is made up of three elements: a corpus of 101,712 de-identified documents or health records, comprising
17,265 visits or medical episodes of various patients (each visit can have between 1 and 415 documents
or reports); 35 queries representing information needs or inclusion criteria that must be fulfilled by
the retrieval of the most relevant visits or episodes; and lastly, a series of relevance judgements
defined by a team of experts, in which each individual visit is deemed relevant or not relevant
according to the information needs of each search query in a real-life context.
For the development of the solution proposed in this paper, we have used a range of different tools:
the UMLS Metathesaurus, 2010AB and 2011AB, as the base for medical knowledge; Metamap 2013³ for
concept-based representation (this version of Metamap allows for the identification of negative
statements, and the classification of concepts for any semantic type they may possess); and two
open-source tools for the semantic similarity computation between concepts: the first (McInnes et al.,
2009) is a framework composed of
³ http://metamap.nlm.nih.gov/Docs/MM_2013_ReleaseNotes.pdf.
Table 2
Semantic similarity correlation values using the Path metric, with PAR/CHD relationships, for SNOMED-CT
(UMLS versions 2008AB and 2010AB). Spearman correlations based on minimum rank values (results
reproduced from Pedersen and McInnes (McInnes et al., 2009; Pedersen et al., 2007) for version 2008AB)
and on average values (used in this work).
              SNOMED-CT 2008AB                    SNOMED-CT 2010AB
              Minimum values   Average values     Minimum values   Average values
Physicians    0.3500           0.3170             0.3134           0.2744
Coders        0.5000           0.4500             0.4596           0.4160
two packages (UMLS-Similarity⁴ and UMLS-Interface⁵) based on Perl modules available in CPAN (The
Comprehensive Perl Archive Network), and the second (Garla & Brandt, 2012) is one of the components of
the Ytex⁶ framework.
Finally, it is worth noting that the health records processed by the system are a series of XML files.
Although each document is hence structured in XML, the label containing the most important information
(the document itself) is written in natural language.
5. Evaluation of UMLS
Firstly, we will analyze the characteristics of the main tool used in this work, the UMLS
Metathesaurus, that can improve semantic similarity computation. These are the evaluation of sources
and the types of relationship between concepts present in the different UMLS versions used.
5.1. Versions of UMLS Metathesaurus
UMLS compiles the knowledge of the biomedical domain, and is thus undergoing constant evolution and
improvement. Small changes between versions, affecting concepts or relationships, can have a noticeable
impact on the results obtained in a tightly defined context, such as Pedersen's benchmark (29 pairs of
concepts) (McInnes et al., 2009; Pedersen et al., 2007).
To reflect this, we have reproduced the results obtained by Pedersen et al. (2007) and McInnes et al.
(2009) on the source SNOMED-CT of UMLS version 2008AB (used in their work), compared to those of
version 2010AB (Table 2).
This table shows Spearman's rank correlation coefficients for the
Path metric, compared to the estimates of physicians and medical
coders. On one side, we show the correlation coefficients based
on the minimum rank values for groups of repeated (tied) similarity
values (as used in McInnes et al., 2009; Pedersen et al., 2007); on
the other, the correlation coefficients based on the average values of
said ranks (employed in this paper, as we consider it to be the most
adequate approach in this context).
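The difference between the two tie-handling conventions can be illustrated with a minimal Python sketch. The similarity and expert values below are hypothetical (they are not taken from Pedersen's benchmark), and the helper names `rank`, `pearson`, and `spearman` are our own; the sketch only assumes that Spearman's coefficient is computed as the Pearson correlation of the ranks.

```python
from math import sqrt

def rank(values, ties="average"):
    """Rank values in descending order (1 = largest).

    Tied values share either the minimum rank of their group ("min")
    or the average rank of their group ("average").
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end (0-based) correspond to ranks pos+1..end+1.
        shared = pos + 1 if ties == "min" else (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y, ties="average"):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(rank(x, ties), rank(y, ties))

# Hypothetical data: Path similarities (with a tie) and expert scores.
metric = [1.0, 0.5, 0.5, 0.25]
expert = [4.0, 3.0, 2.0, 1.0]

rho_min = spearman(metric, expert, ties="min")
rho_avg = spearman(metric, expert, ties="average")
```

Because the Path metric produces many repeated values, the choice of convention alone shifts the coefficient even on this small example, which is the effect visible in the two column groups of Table 2.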
As we can see, the obtained correlations (using Spearman's
coefficient) vary significantly (from 6% to 13%) between versions of
SNOMED-CT. These results show some refinement of the relationships
between concepts in version 2010AB, which leads to lower values
(similarity relationships found in the earlier version are no longer
found). We must, then, bear this in mind when we compare the
results of different studies, since many of them may be comparing
the performance of metrics run on different versions of the UMLS
Metathesaurus.
In these results, and in others obtained throughout this work, we
can observe that the metrics are better adjusted to the similarity cri-
teria defined by medical coders than to those set by physicians.
4 http://search.cpan.org/dist/UMLS-Similarity/.
5 http://search.cpan.org/dist/UMLS-Interface/.
6 https://code.google.com/p/ytex/.
Table 3
Semantic similarity correlation values, using Spearman and Pearson, for a
number of metrics based on Path finding with PAR/CHD relationships,
for SNOMED-CT with UMLS 2010AB.
                        Path     lch      wup      nam
Spearman   Physicians   0.2744   0.2744   0.3377   0.4063
           Coders       0.4160   0.4156   0.4190   0.5578
Pearson    Physicians   0.5451   0.3348   0.3372   0.4301
           Coders       0.7170   0.4566   0.3840   0.4456
5.2. Impact of correlations used
Analyzing the results obtained, and comparing them with the
results of previous works, we observe the lack of a standard criterion
for the coefficient used. For instance, some works use Spearman's
correlation coefficient (Garla & Brandt, 2012; Pedersen et al., 2007)
while others use Pearson's linear coefficient (Batet et al., 2011). This
led us to study and interpret both kinds of correlation in the
analysis of various semantic similarity metrics. Pedersen himself,
who uses Spearman's coefficient in his results (Pedersen et al.,
2007), points to a maximum Pearson correlation of 0.85 between the
estimates of the evaluating experts (medical coders and physicians).
For this reason, and for the sake of a better interpretation, we calculate
similarity for the 29 paired concepts (McInnes et al., 2009) with the
main metrics based on Path finding, and observe that there is
significant variation in the results depending on the correlation
coefficient used (Table 3).
As in previous works (McInnes et al., 2009; Nguyen & Al-Mubaid,
2006), the nam metric (Nguyen & Al-Mubaid, 2006), applied to
SNOMED-CT sources, shows better correlation values under the Spearman
coefficient. Pearson's correlation, however, offers better results for
the Path metric (Table 3).
Far from joining the discussion over the kind of correlation that
should be used (Pearson correlates similarity values, while Spearman
correlates their order), our study reveals that the results of various
works are simply not comparable with each other, as was already
pointed out by Garla and Brandt (2012). For this reason, and to
further clarify the matter, from here on we show the results obtained
with both correlation coefficients.
5.3. Study of UMLS relationships and resources
The semantic similarity calculations in the earlier works of
Pedersen et al. (2007) and McInnes et al. (2009) were done through
direct hierarchical relationships (PAR/CHD), also known as "is-a"
semantic relationships, on a single source. Later works, such
as Garla and Brandt (2012) and Batet et al. (2011), do not specify the
kind of relationships used in the calculation of semantic similarity, so
it is not possible to determine the implications of their results.
For this reason, in the first part of our work, we have also
evaluated the impact that different kinds of relationships between
concepts can have on the calculation of semantic similarity. The kinds
of relationships we evaluate are: direct hierarchical relationships
(PAR/CHD), indirect hierarchical relationships (RB/RN), and the
non-hierarchical relationships existing in the UMLS Metathesaurus
(SIB, RO, RL, SY, RQ, AQ, and QB).
First, we run the similarity calculations for Pedersen's
benchmark, using the Path metric applied to the sources and
relationships contained in UMLS 2010AB. As shown in Table 4, there is
a significant improvement in the correlation coefficients for
hierarchical relationships. However, the combined use of all relationships
(both hierarchical and non-hierarchical) degrades the results
considerably. This is due to the fact that these non-hierarchical relationships
generate cycles that do not represent parent/child or sibling
relationships between concepts (synonymy) (Bodenreider, 2001; Erdogan
et al., 2010); we do not, then, recommend using them, as they add
Table 4
Semantic similarity correlations for the Path metric, with relationships
PAR/CHD, PAR/CHD+RB/RN, and ALL relationships for sources existing in
UMLS 2010AB.
                     PAR/CHD   PAR/CHD + RB/RN   ALL
Spearman   Phys.     0.6382    0.5761            0.4788
           Coders    0.6422    0.6495            0.4338
Pearson    Phys.     0.7059    0.6740            0.6168
           Coders    0.7982    0.8012            0.7046
Table 5
Table summarising results, extracted from Garla's work for UMLS
release 2011AB concept graph (Garla & Brandt, 2012).
Benchmarks                     Knowledge Based
                               Path Finding           Intrinsic IC
                               wup     Path / lch     Path
Pedersen Combined   N = 29     0.70    0.61           0.70
Mayo                N = 101    0.38    0.30           0.41
UMN relatedness     N = 430    0.33    0.36           0.36
UMN similarity      N = 566    0.39    0.40           0.43
UMN relatedness     N = 587    0.32    0.34           0.35
noise to the results. We can also observe how the application of the
entirety of the knowledge offered by the sources within the UMLS
Metathesaurus improves results as well.
The previous tests were also conducted on version 2011AB of the
UMLS Metathesaurus, obtaining similar results.
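As a rough illustration of why the set of relationships matters, the following Python sketch computes a Path-style similarity on a tiny graph. Everything in it is an assumption made for illustration: the concept names and edges are hypothetical (they are not UMLS content), and the similarity is taken to be the inverse of the number of nodes on the shortest path, consistent with the values quoted later in the text (1.0 for the same concept, 0.5 for a path of two nodes).

```python
from collections import deque

# Hypothetical toy edges: (concept_a, concept_b, relationship_type).
EDGES = [
    ("hearing_loss", "ear_disorder", "PAR"),
    ("deafness", "hearing_loss", "PAR"),
    ("tinnitus", "ear_disorder", "PAR"),
    ("deafness", "hearing_aid", "RO"),
    ("hearing_aid", "tinnitus", "RO"),
]

def build_graph(edges, allowed):
    """Build an undirected adjacency map using only the allowed relationship types."""
    graph = {}
    for a, b, rel in edges:
        if rel in allowed:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def path_similarity(graph, source, target):
    """Return 1 / (number of nodes on the shortest path), or 0.0 if unreachable."""
    if source == target:
        return 1.0
    seen = {source}
    queue = deque([(source, 1)])  # (node, nodes counted so far on the path)
    while queue:
        node, n_nodes = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return 1.0 / (n_nodes + 1)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, n_nodes + 1))
    return 0.0

hierarchical = build_graph(EDGES, allowed={"PAR", "CHD"})
everything = build_graph(EDGES, allowed={"PAR", "CHD", "RO"})

sim_hier = path_similarity(hierarchical, "deafness", "tinnitus")
sim_all = path_similarity(everything, "deafness", "tinnitus")
```

In this toy graph the non-hierarchical RO edges open a shorter route between the two concepts, so the similarity changes although the hierarchy itself is unchanged. This is only meant to show the mechanism by which additional relationship types can alter Path values.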
5.4. Comparison of Path and Intrinsic IC-Path metrics
Although many works have evaluated the performance of metrics
with different sets of paired concepts, Garla offers a definitive view of
the various existing frameworks and the various metrics defined (Garla
& Brandt, 2012). As we can see in Table 5 (a summary of the results
reached by Garla), the best overall results are given by the Intrinsic
IC-Path metric.
For this reason, our work evaluates the performance of
the metric yielding the best results (Intrinsic IC-Path) and the
computationally simplest metric (Path) in a real information retrieval
scenario, that is, working on large volumes of information.
6. Evaluation of metrics in a real information retrieval context
Now that we have shown the importance of using the latest
versions of the UMLS Metathesaurus (for an updated knowledge of the
biomedical domain) and of applying the right relationships (to
reduce noise), we focus in this section on the application
of these conclusions to a real environment. Also, in contrast with
earlier works, we evaluate the impact of using the Path and
Intrinsic IC-Path metrics in this real environment, namely the set of
electronic medical records found in TREC Medical Records Track 2011
(Voorhees & Tong, 2011).
In order to perform this evaluation, each of the medical reports
making up each visit for a given patient, along with the search topics,
will be represented via concepts contained within UMLS. This
representation will allow us to relate the topic's concepts semantically
with the contents of each report; the semantic similarity between
these will determine the relevance of each visit.
For the calculation of metrics of semantic similarity between con-
cepts, we have used Ytex, developed by Garla and Brandt (Garla, &
Brandt, 2012).
6.1. Processing the information to be used
In order to represent, treat, and evaluate the semantic similarity
described above, we must extract UMLS concepts from the search
topic, as well as from the report. That done, the semantic similarity
between the concepts extracted from both is calculated. Lastly, these
results are aggregated into a single similarity value, which
determines the relevance (or irrelevance) of the document for a given
search topic. We will now detail the process.
Pre-processing of reports and search topics: reports taken from the
Text Retrieval Conference (TREC) are in XML format, and contain a
series of headers, footers, codes, and labels that must be removed
before processing. Hence, in this stage, we remove the document's XML
tags, as well as any information that is not relevant to this study, such
as the report's checksum code (which identifies the visit it belongs to),
its signatures, and its ICD-9 codes. The result is a plain-text version
of the report, written in natural language with no codes or labels.
Topics, on the other hand, require no such processing, as they are
already a mere text string.
Processing of search topics: the topics are processed using the
tool MetaMap, which breaks them into simple strings, termed phrases,
that represent symptoms, parts of the body, illnesses, etc. After
this, we obtain the CUIs of each of the resulting phrases. Some of
these phrases or strings may generate more than one CUI (as was
described in Section 2), in which case we combine these CUIs,
giving each phrase a number of sub-phrases, and hence expanding the
query.
As an example of this method, we will now describe the processing
of Topic 104, which defines the search criterion "Patients diagnosed
with localized prostate cancer and treated with robotic surgery". The
strings or phrases that make up this topic are:
1. Patients.
2. diagnosed with localized prostate cancer.
3. treated with robotic surgery.
Following this, we extract the UMLS concepts (CUIs) associated
to each topic phrase, obtaining the 11 sub-phrases shown in Table 6.
For instance, phrase 3 ("treated with robotic surgery") generates
sub-phrases 1009, 1010, and 1011, while phrase 1 ("Patients") generates
only sub-phrase 1001.
When processing a simple search criterion, for example Topic 101
("Patients with hearing loss"), only one phrase is generated, with
the concept sub-phrases 1001, 1002, and 1003, as seen in Table 7.
In both examples, we can see how phrases given more than one
sub-phrase implicitly expand the original query, through the varia-
tions in the concepts (CUIs) they contain; all of them carry a meaning
that is unique, but common to that of the original query.
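The expansion of a phrase into sub-phrases can be sketched in Python as a combination of the candidate CUIs found for each concept of the phrase. This is a simplified sketch, not the system's actual implementation: the function name `expand_phrase` and the list-of-candidates representation are our own assumptions, and in the paper the candidates come from MetaMap (phrase 2 of Topic 104, for instance, yields seven sub-phrases that are not a plain product of alternatives). The CUIs below are those listed for phrase 3 of Topic 104 in Table 6.

```python
from itertools import product

# Candidate CUIs per concept of phrase 3 of Topic 104 (values from Table 6).
PHRASE_3 = [
    ["C0332293"],                          # "Treated with"
    ["C0035785"],                          # "Robotics"
    ["C0038894", "C0038895", "C0543467"],  # alternative mappings of "surgery"
]

def expand_phrase(candidates, first_id):
    """Return {sub_phrase_id: tuple_of_CUIs}, one entry per combination of candidates."""
    return {first_id + n: combo for n, combo in enumerate(product(*candidates))}

sub_phrases = expand_phrase(PHRASE_3, first_id=1009)
for sub_phrase_id, cuis in sub_phrases.items():
    print(sub_phrase_id, cuis)
```

With these inputs the sketch yields three sub-phrases numbered 1009 to 1011, one per alternative CUI of the last concept, which matches the rows of Table 6 for this phrase.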
Processing of medical reports: reports are processed in a
similar fashion to topics, identifying the UMLS concepts corresponding
to each phrase in the document, and generating all the possible
sub-phrases from the combination of CUIs of its different contextual
phrases.
As an example, we show a brief excerpt from a report (Fig. 2),
after being pre-processed in this stage to generate the corresponding
phrases. These phrases are expanded into different sub-phrases
through variations in the concepts (CUIs) that represent them
(Table 8). This way, we are able to combine and match them with
each of the sub-phrases defining the topic, and obtain the maximum
semantic proximity between topic and report (as will be explained in
detail in Section 6.3). It is also worth noting that those phrases
containing a negation (assigned code 1) are eliminated from the
similarity calculation process. In both cases, topic and report have been
conceptually expanded from the sub-phrases generated in both
processes.
CONGESTIVE HEART FAILURE. CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE OF THE URINARY BLADDER WHICH
CAUSES MASS EFFECT ON THE URINARY BLADDER AND ADJACENT TO UTERUS, DETECTED ON CT OF THE ABDO-
MEN. NO CHANGE IN 7 X 8 CM FOCAL CYSTIC STRUCTURE. HARD OF HEARING. IRON DEFICIENCY ANEMIA.
Fig. 2. Example pre-processed excerpt of a report (Report 90230).
Table 6
Phrase table (Topic 104).
SUBPHRASE  PHRASE  Topic 104: "Patients diagnosed with localized prostate cancer and treated with robotic surgery"
1001       1       CUI1 = (C0030705) : podg : "Patients"
1002       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0796563) : neop : "Localized Malignant Neoplasm"
                   CUI3 = (C0033572) : bpoc : "Prostate"
1003       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0796563) : neop : "Localized Malignant Neoplasm"
                   CUI3 = (C1278980) : bpoc : "Entire prostate"
1004       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C1334407) : neop : "Localized Carcinoma"
                   CUI3 = (C0033572) : bpoc : "Prostate"
1005       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C1334407) : neop : "Localized Carcinoma"
                   CUI3 = (C1278980) : bpoc : "Entire prostate"
1006       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0392752) : spco : "Localized"
                   CUI3 = (C0376358) : neop : "Malignant neoplasm of prostate"
1007       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0392752) : spco : "Localized"
                   CUI3 = (C0600139) : neop : "Prostate carcinoma"
1008       2       CUI1 = (C0011900) : fndg : "Diagnosis"
                   CUI2 = (C0392752) : spco : "Localized"
                   CUI3 = (C2984325) : ftcn : "Prostate Cancer Pathway"
1009       3       CUI1 = (C0332293) : topp : "Treated with"
                   CUI2 = (C0035785) : ocdi : "Robotics"
                   CUI3 = (C0038894) : bmod : "Surgery specialty"
1010       3       CUI1 = (C0332293) : topp : "Treated with"
                   CUI2 = (C0035785) : ocdi : "Robotics"
                   CUI3 = (C0038895) : ftcn : "Surgical aspects"
1011       3       CUI1 = (C0332293) : topp : "Treated with"
                   CUI2 = (C0035785) : ocdi : "Robotics"
                   CUI3 = (C0543467) : diap : "Operative Surgical Procedures"
Table 7
Phrase table (Topic 101).
SUBPHRASE  PHRASE  Topic 101: "Patients with hearing loss"
1001       1       CUI1 = (C0030705) : podg : "Patients"
                   CUI2 = (C0011053) : dsyn : "Deafness"
1002       1       CUI1 = (C0030705) : podg : "Patients"
                   CUI2 = (C0018772) : fndg : "Hearing Loss, Partial"
1003       1       CUI1 = (C0030705) : podg : "Patients"
                   CUI2 = (C1384666) : fndg : "hearing impairment"
6.2. Filtering by topic semantic types
The query expansion conducted in the previous step enhances
the information retrieval process, as it unveils new relationships
between concepts. Still, this expansion may generate relationships
between concepts belonging to semantic types with little semantic
specialization or specificity (Bodenreider, 2001; Bodenreider &
McCray, 2003; Erdogan et al., 2010; Plaza & Díaz, 2010). These
relationships may skew the accuracy of similarity results for those
concepts of greater semantic relevance to our current context.
For this reason, we have gone on to classify semantic types by their
importance, dividing them into generic and specific types.
Specific semantic types group concepts that carry more importance in
the biomedical domain, such as diseases, symptoms, procedures, and
Table 8
Example processed excerpt of a report (report 90230).
SUBPHRASE  PHRASE  Negation  Excerpt report 90230
190        254     0         C0018802 dsyn CONGESTIVE HEART FAILURE.
191        255     0         C0010709 dsyn CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
191        255     0         C0678594 spco CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
191        255     0         C0456856 spco CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
191        255     0         C0441987 spco CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
192        255     0         C0010709 dsyn CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
192        255     0         C0678594 spco CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
192        255     0         C0205095 spco CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
192        255     0         C0205091 spco CYSTIC STRUCTURE AT THE POSTERIOR LEFT SIDE
193        256     0         C0577559 fndg MASS EFFECT ON THE URINARY BLADDER
193        256     0         C1280500 qlco MASS EFFECT ON THE URINARY BLADDER
193        256     0         C0005682 bpoc MASS EFFECT ON THE URINARY BLADDER
194        256     0         C0577559 fndg MASS EFFECT ON THE URINARY BLADDER
194        256     0         C2348382 qlco MASS EFFECT ON THE URINARY BLADDER
194        256     0         C0005682 bpoc MASS EFFECT ON THE URINARY BLADDER
195        256     0         C1280500 qlco MASS EFFECT ON THE URINARY BLADDER
195        256     0         C0042027 bpoc MASS EFFECT ON THE URINARY BLADDER
195        256     0         C0238775 fndg MASS EFFECT ON THE URINARY BLADDER
196        256     0         C1280500 qlco MASS EFFECT ON THE URINARY BLADDER
196        256     0         C1524119 qlco MASS EFFECT ON THE URINARY BLADDER
196        256     0         C0238775 fndg MASS EFFECT ON THE URINARY BLADDER
197        256     0         C2348382 qlco MASS EFFECT ON THE URINARY BLADDER
197        256     0         C0042027 bpoc MASS EFFECT ON THE URINARY BLADDER
197        256     0         C0238775 fndg MASS EFFECT ON THE URINARY BLADDER
198        256     0         C2348382 qlco MASS EFFECT ON THE URINARY BLADDER
198        256     0         C1524119 qlco MASS EFFECT ON THE URINARY BLADDER
198        256     0         C0238775 fndg MASS EFFECT ON THE URINARY BLADDER
199        257     0         C0442726 fndg DETECTED ON CT
200        258     1         C0205234 spco NO CHANGE IN 7 X 8 CM FOCAL CYSTIC STRUCTURE
200        258     1         C1511605 fndg NO CHANGE IN 7 X 8 CM FOCAL CYSTIC STRUCTURE
200        258     1         C0678594 spco NO CHANGE IN 7 X 8 CM FOCAL CYSTIC STRUCTURE
201        259     0         C0018772 fndg HARD OF HEARING
202        259     0         C1384666 fndg HARD OF HEARING
203        260     0         C0162316 dsyn IRON DEFICIENCY ANEMIA.
medication. We now show, as an example, the semantic types which
appear in Topics 104 and 101:
-Generic
spco - Spatial Concept (CONC)
podg - Patient or Disabled Group (LIVB)
ftcn - Functional Concept (CONC)
-Specific
dsyn - Disease or Syndrome (DISO)
diap - Diagnostic Procedure (PROC)
neop - Neoplastic Process (DISO)
fndg - Finding (DISO)
bpoc - Body Part, Organ, or Organ Component (ANAT)
topp - Therapeutic or Preventive Procedure (PROC)
bmod - Biomedical Occupation or Discipline (OCCU)
ocdi - Occupation or Discipline (OCCU)
Concepts associated to generic semantic types are eliminated
from the phrase table generated in the previous step. For example, in
Tables 6 and 7, we show eliminated concepts in grey, for Topics 104
and 101. This way, we can identify the concepts that are not relevant
in the context of the specific phrase.
Note how, from sub-phrase 1008, the concepts "Localized" and
"Prostate Cancer Pathway" are eliminated, as they belong to the generic
types spco and ftcn respectively. Note also that, in Topic
104, phrase 1 is completely eliminated.
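The filtering step can be sketched in Python as follows. The data structure (a dictionary from sub-phrase identifier to phrase identifier and a list of CUI and semantic-type pairs) and the function name `filter_generic` are assumptions made for this illustration; the generic types and the CUIs are those listed above and in Table 6, restricted to three sub-phrases of Topic 104.

```python
# Semantic types classified as generic in the text.
GENERIC_TYPES = {"spco", "podg", "ftcn"}

# A few sub-phrases of Topic 104 (Table 6): id -> (phrase id, [(CUI, semantic type)]).
TOPIC_104 = {
    1001: (1, [("C0030705", "podg")]),
    1008: (2, [("C0011900", "fndg"), ("C0392752", "spco"), ("C2984325", "ftcn")]),
    1009: (3, [("C0332293", "topp"), ("C0035785", "ocdi"), ("C0038894", "bmod")]),
}

def filter_generic(sub_phrases, generic=GENERIC_TYPES):
    """Drop CUIs of generic semantic types; drop sub-phrases left with no CUIs."""
    filtered = {}
    for sub_phrase_id, (phrase_id, cuis) in sub_phrases.items():
        kept = [(cui, sem_type) for cui, sem_type in cuis if sem_type not in generic]
        if kept:
            filtered[sub_phrase_id] = (phrase_id, kept)
    return filtered

filtered_topic = filter_generic(TOPIC_104)
```

In this sketch, sub-phrase 1001 disappears entirely (its only concept is of type podg), sub-phrase 1008 keeps only "Diagnosis", and sub-phrase 1009 is left untouched, in line with the cases described above.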
6.3. Maximum semantic similarity matrix (topic vs report) and
relevance computation
In order to assess the semantic similarity between each topic and
each report, we perform a similarity evaluation at several aggregation
levels: CUIs, sub-phrases, and phrases. Similarity computation (Sim)
is achieved by a matrix that pairs the topic's CUIs with the report's
CUIs, for both of our chosen metrics: Path and Intrinsic IC-Path.
Afterwards, we select, for every CUI in the topic sub-phrase, the paired
concepts (topic-report) with the highest similarity value. This process
is then repeated for every topic sub-phrase within a phrase.
$$
\mathrm{Sim\_subphrase}_{subphrase_i,\,cui_j} = \max_{k} \, \mathrm{Sim}\left(CUI^{subphrase_i}_{j},\; CUI^{report}_{k}\right) \tag{15}
$$
where i is each of the topic sub-phrases, j each of the
sub-phrase's CUIs, and k each of the report's CUIs.
Later, for each individual phrase, we select the maximum
similarity value of each CUI present in its topic sub-phrases. In the individual
case of Topic 104 and phrase 2 (Table 6), we obtain the maximum
similarity value of CUI1, CUI2, and CUI3.
$$
\mathrm{Sim\_max\_phrase}_{cui_j} = \max_{i} \, \mathrm{Sim\_subphrase}_{subphrase_i,\,cui_j} \tag{16}
$$
This done, we average their values, obtaining a single
similarity value per phrase. In our example for Topic 104 and phrase 2,
Sim_avg_phrase = (CUI1 + CUI2 + CUI3)/3.
$$
\mathrm{Sim\_avg\_phrase} = \sum_{j=1}^{num\_cuis\_phrase} \frac{\mathrm{Sim\_max\_phrase}_{cui_j}}{num\_cuis\_phrase} \tag{17}
$$
Lastly, we average the similarity values of all the
phrases in the search, which yields the final relevance of the
report with respect to the topic. In the case of Topic 104, Sim_topicvsreport =
(Sim_avg_phrase1 + Sim_avg_phrase2 + Sim_avg_phrase3)/3.
It is interesting to point out that, in the particular case of Topic
104, the final relevance value is determined by the average
similarity value of the last two phrases. Phrase 1 ("Patients") is completely
eliminated from the result, since all the concepts (CUIs) that make it
up are associated to generic semantic types (podg).
$$
\mathrm{Sim\_topicvsreport} = \sum_{i=1}^{num\_phrases} \frac{\mathrm{Sim\_avg\_phrase}_{i}}{num\_phrases} \tag{18}
$$
We can then say that the final value (Sim_topicvsreport) of the
maximum similarity matrix of a report in relation to a topic
determines whether or not it is relevant for the terms defined by said topic.
The lower extreme (value 0) indicates maximum non-relevance, and
the upper extreme (value 1) indicates maximum relevance.
In order to compare the final value obtained by the semantic
similarity matrix to the relevance criteria offered by experts in each case,
it is necessary to establish a cut-off value (within the range
[0,1]), which determines whether a certain report is relevant or
not to a given topic. This will be studied and defined in the next
section.
Since a medical visit may be made up of more than one report, the
visit's relevance is determined by the maximum similarity value
of its reports.
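The aggregation described by Eqs. (15) to (18), together with the visit-level maximum, can be sketched in Python as below. This is a minimal sketch under several assumptions of ours: a topic is represented as a list of phrases, each phrase as a list of sub-phrases, and each sub-phrase as a list of CUIs of equal length within a phrase; the pairwise similarity is a lookup table in which identical CUIs score 1.0 and unlisted pairs default to 0.0. The only non-trivial pair value (0.5 under Path for Deafness versus Hearing Loss, Partial) is taken from Table 9; the report CUIs are a subset of Table 8, and the second report of the visit is hypothetical.

```python
def sim_lookup(table):
    """Build a pairwise similarity function from a {(cui_a, cui_b): value} table."""
    def sim(a, b):
        if a == b:
            return 1.0
        # Unlisted pairs default to 0.0 (an assumption of this sketch).
        return table.get((a, b), table.get((b, a), 0.0))
    return sim

def sim_subphrase(sub_phrase, report_cuis, sim):
    """Eq. (15): for each topic CUI j, the best match over the report CUIs k."""
    return [max(sim(cui, r) for r in report_cuis) for cui in sub_phrase]

def sim_avg_phrase(sub_phrases, report_cuis, sim):
    """Eqs. (16)-(17): per-position maximum over sub-phrases, then the average."""
    per_sub = [sim_subphrase(sp, report_cuis, sim) for sp in sub_phrases]
    per_position = [max(values) for values in zip(*per_sub)]
    return sum(per_position) / len(per_position)

def sim_topic_vs_report(phrases, report_cuis, sim):
    """Eq. (18): average of the phrase-level values."""
    scores = [sim_avg_phrase(p, report_cuis, sim) for p in phrases]
    return sum(scores) / len(scores)

def visit_relevance(phrases, reports, sim):
    """A visit takes the maximum similarity value of its reports."""
    return max(sim_topic_vs_report(phrases, r, sim) for r in reports)

# Topic 101 after semantic type filtering: one phrase, three sub-phrases (Table 7).
TOPIC_101 = [[["C0011053"], ["C0018772"], ["C1384666"]]]
# Non-negated CUIs of report 90230 (subset of Table 8).
REPORT_90230 = ["C0018802", "C0018772", "C1384666", "C0162316"]
OTHER_REPORT = ["C0018802"]  # hypothetical second report of the same visit

path_sim = sim_lookup({("C0011053", "C0018772"): 0.5})  # value from Table 9

report_score = sim_topic_vs_report(TOPIC_101, REPORT_90230, path_sim)
visit_score = visit_relevance(TOPIC_101, [REPORT_90230, OTHER_REPORT], path_sim)
```

For Topic 101 and report 90230 the sketch returns 1.0 under the Path metric, which agrees with the value reported for this pair in Table 11. How positions are aligned when filtering leaves sub-phrases of different lengths is not specified in the text, so the sketch simply assumes equal lengths.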
This method tries to preserve the informational uniqueness and
completeness of the query (topic) for its automated treatment,
without any input needed from the user. For this, it is necessary to include
each of the topic components by a process of aggregation of the
average of the maximum similarity values of the different phrases. In this
way, each sub-phrase, which is expanded from the phrases that make
up the topic, is measured with the same precision when the
aggregation of their averages takes place. However, what determines, in
the end, the relevance of each component is the maximum
semantic similarity of the topic concepts in relation to the report, along
with the semantic type they belong to.
Through this straightforward example (Table 9), we can observe
the importance of concept-based expansion, both of the topic and of
the report, between whose concepts we can establish maximum
similarity relationships, even when the terms or strings are different
in themselves. So, for example, the terms associated to the CUIs of the
topic ("Deafness"; "Hearing Loss, Partial"; "hearing impairment") are
different from the terms associated to the CUIs of the report ("Hard of
Hearing"), and yet we obtain the maximum possible similarity.
Tables 9 and 10 are composed of the following elements: the first
two columns (topic and report) are formed by the sub-phrase id,
phrase id, CUI, semantic type, and string of the topic and the
report respectively. The last two columns correspond to the
maximum similarity for each metric between pairs of topic-report CUIs.
7. Result analysis
In this section, we analyze the results obtained after
evaluating topics matched to reports, by the procedure described in the
previous section.
In order to contrast the relevance criteria set by the experts with
the results of the retrieval system we propose in this paper, we have
generated a histogram (Fig. 3) which reflects the similarity of each
visit (the report with the highest similarity value in each) to a search
topic. These reports are distributed along the X axis according to their
degree of relevance (0 being "Not relevant", and 1 "Relevant"). Lastly,
to ease the understanding of the histogram, we highlight in black
those reports which were deemed "Relevant" by the experts, and in
ochre those deemed "Not relevant".
7.1. Justification of topic semantic type filtering
First, we have carried out a series of experiments to validate
filtering by concepts associated to specific topic semantic types. Thus,
in Fig. 3, we show the results of evaluating the reports matched to
Topic 107 ("Patients with ductal carcinoma in situ (DCIS)"), both filtered
by semantic types (Fig. 3b) and unfiltered (Fig. 3a). We can easily see
Table 9
Example maximum similarity values matrix for each sub-phrase Sim_subphrase from Topic 101 - Report 90230.
Topic 101 | Report 90230 | Path Max. Sim | IC-Path Max. Sim
1001 1 C0030705 podg Patients | 168 52 C0030705 podg the patient on consultation | 1.0000 | 1.0000
1001 1 C0011053 dsyn Deafness | 201 73 C0018772 fndg HARD OF HEARING. | 0.5000 | 0.8042
1002 1 C0030705 podg Patients | 113 11 C0030705 podg the patient in consultation | 1.0000 | 1.0000
1002 1 C0018772 fndg Hearing Loss, Partial | 201 73 C0018772 fndg HARD OF HEARING. | 1.0000 | 1.0000
1003 1 C0030705 podg Patients | 111 9 C0030705 podg The patient apparently | 1.0000 | 1.0000
1003 1 C1384666 fndg hearing impairment | 202 73 C1384666 fndg HARD OF HEARING. | 1.0000 | 1.0000
Table 10
Example maximum similarity values matrix of each sub-phrase Sim_subphrase from Topic 104 - Report 51139.
Topic 104 | Report 51139 | Path Max. Sim | IC-Path Max. Sim
1001 1 C0030705 podg Patients | 43 15 C0030705 podg he patient | 1.0000 | 1.0000
1002 2 C0011900 fndg Diagnosis | 75 28 C0543467 diap DESCRIPTION OF OPERATION | 0.3333 | 0.7172
1002 2 C0796563 neop Localized Malignant Neoplasm | 28 12 C0796563 neop LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1002 2 C0033572 bpoc Prostate | 61 18 C0033572 bpoc now for removal of his prostate | 1.0000 | 1.0000
1003 2 C0011900 fndg Diagnosis | 40 14 C0376358 neop LOCALIZED PROSTATE CANCER. | 0.3333 | 0.7172
1003 2 C0796563 neop Localized Malignant Neoplasm | 28 12 C0796563 neop LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1003 2 C1278980 bpoc Entire prostate | 380 19 C1278980 bpoc The prostate | 1.0000 | 1.0000
1004 2 C0011900 fndg Diagnosis | 40 14 C0376358 neop LOCALIZED PROSTATE CANCER. | 0.3333 | 0.7172
1004 2 C1334407 neop Localized Carcinoma | 30 12 C1334407 neop LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1004 2 C0033572 bpoc Prostate | 171 75 C0033572 bpoc at the prostate. | 1.0000 | 1.0000
1005 2 C0011900 fndg Diagnosis | 63 19 C0184661 diap benefits of the procedure | 0.3333 | 0.7172
1005 2 C1334407 neop Localized Carcinoma | 30 12 C1334407 neop LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1005 2 C1278980 bpoc Entire prostate | 236 113 C1278980 bpoc sharp dissection until the prostate | 1.0000 | 1.0000
1006 2 C0011900 fndg Diagnosis | 15 7 C0184661 diap PROCEDURE | 0.3333 | 0.7172
1006 2 C0392752 spco Localized | 53 16 C0392752 spco 50s-year-old male with localized adenocarcinoma of | 1.0000 | 1.0000
1006 2 C0376358 neop Malignant neoplasm of prostate | 32 12 C0376358 neop LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1007 2 C0011900 fndg Diagnosis | 14 6 C0543467 diap SURGERY DATE | 0.3333 | 0.7172
1007 2 C0392752 spco Localized | 44 16 C0392752 spco 50s-year-old male with localized adenocarcinoma of | 1.0000 | 1.0000
1007 2 C0600139 neop Prostate carcinoma | 33 12 C0600139 neop LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1008 2 C0011900 fndg Diagnosis | 32 12 C0376358 neop LOCALIZED PROSTATE CANCER. | 0.3333 | 0.7172
1008 2 C0392752 spco Localized | 44 16 C0392752 spco 50s-year-old male with localized adenocarcinoma of | 1.0000 | 1.0000
1008 2 C2984325 ftcn Prostate Cancer Pathway | 42 14 C2984325 ftcn LOCALIZED PROSTATE CANCER. | 1.0000 | 1.0000
1009 3 C0332293 topp Treated with | 523 24 C0444667 qnco present for the entire procedure. | 0.0000 | 0.0000
1009 3 C0035785 ocdi Robotics | 17 8 C0035785 ocdi ROBOTIC-ASSISTED LAPAROSCOPIC RADICAL PROSTATECTOMY | 1.0000 | 1.0000
1009 3 C0038894 bmod Surgery specialty | 9 6 C0038894 bmod SURGERY DATE | 1.0000 | 1.0000
1010 3 C0332293 topp Treated with | 522 24 C0450011 topp present for the entire procedure. | 0.0000 | 0.0000
1010 3 C0035785 ocdi Robotics | 19 8 C0035785 ocdi ROBOTIC-ASSISTED LAPAROSCOPIC RADICAL PROSTATECTOMY | 1.0000 | 1.0000
1010 3 C0038895 ftcn Surgical aspects | 11 6 C0038895 ftcn SURGERY DATE | 1.0000 | 1.0000
1011 3 C0332293 topp Treated with | 522 24 C0450011 topp present for the entire procedure. | 0.0000 | 0.0000
1011 3 C0035785 ocdi Robotics | 19 8 C0035785 ocdi ROBOTIC-ASSISTED LAPAROSCOPIC RADICAL PROSTATECTOMY | 1.0000 | 1.0000
1011 3 C0543467 diap Operative Surgical Procedures | 75 28 C0543467 diap DESCRIPTION OF OPERATION | 1.0000 | 1.0000
how, after filtering, the most significant reports deemed Not rele-
vant (ochre) and Relevant (black) are displaced towards areas of
lower and higher relevance respectively.
These results highlight the necessity to perform a query expan-
sion by specific semantic types only, hence obtaining more accu-
rate results for a lower computational cost (as we eliminate the need
to calculate similarity for generic semantic types).
7.2. Behavior of Path and Intrinsic IC-Path metrics
To comparatively evaluate the performance (in terms of semantic
similarity) of the Path and Intrinsic IC-Path metrics in a real-life
context, we show a preliminary experiment on two search criteria. One
is a simple topic, Topic 101 ("Patients with hearing loss"), applied to
4073 reports grouped in 249 visits. The other is a complex topic, Topic
104 ("Patients diagnosed with localized prostate cancer and treated with
robotic surgery"), applied to 3439 reports grouped in 196 visits.
The results obtained from applying the Path metric to a simple
topic (Topic 101) show a discrete distribution of results, derived from
its definition, which is based on the inverse of the distances (Fig. 4a).
This makes for uncertainty zones, since some reports are located at
similarity values between 0.45 and 0.50 (27 non-relevant reports, and
9 relevant).
In the case of the Intrinsic IC-Path metric, the internal nature of
its calculation does away with this discrete character (Fig. 4b). The
global results compared to those of the Path metric are similar, but
distributed in a smoother fashion, more evenly distributed towards
both extremes.
Conversely, when processing complex topics (with multiple
phrases) such as Topic 104, calculations based on aggregated averages
of the maximum similarity values obtained (Section 6.3) counter the
discrete character of the Path metric. Also, for both metrics, the
similarity values of the reports tend to spread following a normal
distribution function (Fig. 5a and b), which removes the previously
mentioned discrepancies.
7.3. Choosing the cut-off value
From the report similarity distributions generated for each search
criterion, as shown in the previous section (Figs. 4 and 5), we must
establish a cut-off value to determine whether the report is relevant.
Based on that value, reports with an estimated similarity greater than or
equal to it are deemed relevant by the system, and the rest not
relevant. By doing this with reports that have already been assessed
by experts as "Relevant" or "Not relevant" for each topic, we can
estimate the accuracy of the retrieval system we propose in this work.
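The comparison against expert judgements can be sketched in Python as follows. The visit identifiers, similarity scores, expert judgements, and the cut-off of 0.7 are all hypothetical values chosen for illustration, and the function name `evaluate` is our own. The sketch also assumes the balanced F-Measure (the harmonic mean of precision and recall), which may differ from the exact variant used in the evaluation.

```python
def evaluate(scores, judged_relevant, cutoff):
    """Classify visits by a cut-off and compare against expert judgements.

    scores: {visit_id: similarity in [0, 1]}
    judged_relevant: set of visit ids deemed relevant by the experts
    Returns (precision, recall, f_measure).
    """
    # A visit is retrieved when its similarity is greater than or equal to the cut-off.
    retrieved = {visit for visit, score in scores.items() if score >= cutoff}
    true_positives = len(retrieved & judged_relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(judged_relevant) if judged_relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical similarity values and expert judgements.
scores = {"v1": 0.95, "v2": 0.72, "v3": 0.55, "v4": 0.30}
judged_relevant = {"v1", "v3"}

precision, recall, f_measure = evaluate(scores, judged_relevant, cutoff=0.7)
```

Raising the cut-off in this sketch retrieves fewer visits, which tends to trade recall for precision; that trade-off is what the choice of cut-off value has to balance.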
Fig. 3. (a) Histogram for Topic 107 without semantic type filtering. (b) Histogram for Topic 107 with semantic type filtering.
Fig. 4. Path vs Intrinsic IC-Path for a simple search topic (Topic 101).
Fig. 5. Path vs Intrinsic IC-Path for a complex search topic (Topic 104).
As the previous section shows, it is easy to determine the cut-off
point for simple topics, because their values are distributed
towards the extremes. However, for complex searches the decision is
harder, as well as more critical for the performance of the system.
Accordingly, to define the cut-off value, we adhere to the following
premises:
- The value must be common to both metrics and lie between 0
  and 1. It must be greater than 0.5, as this value represents a
  synonymy relationship between concepts under the Path metric,
  but is not sufficient in itself to establish relevance in complex
  searches.
Fig. 6. Documents evaluated by the proposed system for Topic 104, using Path and Intrinsic IC-Path metrics.
Table 11
Final relevance values for the examples in Section 6.3.
                             Path     Intrinsic IC-Path
Sim_topic101vsreport90230    1.000    1.000
Sim_topic104vsreport51139    0.7149   0.7783
- It must show a balance in classifying documents by relevance;
  that is, the higher the cut-off value is, the more documents it will
  classify as not relevant, to the detriment of relevant results.
From the stated premises, and for a simple maximum similarity
matrix of a mere two concepts, a report would only be deemed
relevant to a topic if the two concepts had a similarity value of 1.0
(distance equals 1 and represents the same concept) or 0.5
(distance equals 2 and represents a synonym). In a real-life context, with
complex phrases made of multiple pairs of concepts, applying an
average value to all similarities carries errors of variance that distort
the final results. For this reason, it is necessary to enact two additional
requirements to ensure the proper application of the average value:
that at least one of the pairs of concepts has a similarity value of 1.0,
and that at most one of the pairs has a value lower than 0.5 (values
lower than 0.5 represent a distant synonymy between concepts). If
these two additional criteria are not met, the report is deemed Not
Relevant.
Once this correction was applied, a test group of 1000 reports
showed that all Relevant documents presented values equal to or
greater than 0.6.
For this reason, we have set the cut-off value at 0.6, the
minimum value that meets all the requirements above.
Hence, the final result of the maximum similarity matrix
(Sim_topicvsreport) will reflect the relevance of a report in relation to
a topic in the following manner:
- If the value of (Sim_topicvsreport) is within the range [0.0; 0.6), the
  report is Not Relevant to the topic.
- If the value of (Sim_topicvsreport) is within the range [0.6; 1.0], the
  report is Relevant to the topic.
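The decision rule described above can be sketched as follows. This is a minimal illustration rather than the system's actual code: it assumes that the aggregated value is the plain arithmetic mean of the per-pair maximum similarities (Section 6.3 defines the real aggregation), and the names `is_relevant` and `max_similarities` are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence

CUTOFF = 0.6  # cut-off value chosen in this section


def is_relevant(max_similarities: Sequence[float], cutoff: float = CUTOFF) -> bool:
    """Classify a report from the maximum similarity of each concept pair.

    Assumption: the aggregate is the arithmetic mean of the maxima.
    """
    if not max_similarities:
        return False
    exact_matches = sum(1 for s in max_similarities if math.isclose(s, 1.0))
    distant_pairs = sum(1 for s in max_similarities if s < 0.5)
    # Additional requirements: at least one pair with similarity 1.0 and
    # at most one pair with similarity below 0.5.
    if exact_matches < 1 or distant_pairs > 1:
        return False
    # Relevant when the aggregate falls in [cutoff, 1.0].
    return mean(max_similarities) >= cutoff


# Hypothetical similarity vectors, for illustration only.
print(is_relevant([1.0, 0.5, 0.5]))    # one exact match, mean above the cut-off
print(is_relevant([1.0, 0.5, 0.25]))   # mean falls below the cut-off
print(is_relevant([0.5, 0.5]))         # no exact match
```

Checking the two additional requirements before averaging mirrors the order in the text: a report that fails either one is Not Relevant regardless of its average.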
In this way, the examples shown in Section 6.3 (Tables 9 and 10)
correspond to two reports that were deemed Relevant to the topics
they were evaluated for, under both metrics (Table 11).
Using Topic 104 as an example, for the Path metric with the
proposed cut-off value, we can observe (Fig. 6) how 9 reports assessed by
experts as not relevant are tagged as relevant by the system, while
1 deemed relevant by the experts turns out as not relevant. In the
case of the Intrinsic IC-Path metric, 4 reports tagged not relevant by
experts are seen as relevant by the system, and 2 relevant as not
relevant.
Table 12
Aggregated results.
             Path     Intrinsic IC-Path
Recall       0.753    0.639
Precision    0.364    0.392
F-Measure    0.430    0.427
All told, in the specific case of Topic 104, the results obtained
for the Path metric are Precision = 44.4%, Recall = 88.9%, and
F-Measure = 59.3%, and for the Intrinsic IC-Path metric,
Precision = 63.6%, Recall = 77.8%, and F-Measure = 70.0%. In this case, the
Intrinsic IC-Path metric shows a better performance than the Path metric.
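The Intrinsic IC-Path figures for Topic 104 can be reproduced from the counts given above. The sketch is illustrative only: the number of correctly retrieved reports (7) is not stated in the text and is inferred here from the 2 missed relevant reports together with the reported Recall of 77.8%; the function name is hypothetical.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, Recall and F-Measure from true/false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Intrinsic IC-Path, Topic 104: 4 non-relevant reports retrieved (fp),
# 2 relevant reports missed (fn), 7 relevant reports retrieved (tp, inferred).
p, r, f = precision_recall_f(tp=7, fp=4, fn=2)
print(f"Precision = {p:.1%}, Recall = {r:.1%}, F-Measure = {f:.1%}")
```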
7.4. Evaluation of Path and Intrinsic IC-Path metrics with the TREC dataset
In this section, we evaluate the performance of the two metrics
analyzed in our work (Path and Intrinsic IC-Path) in a real-life
information retrieval scenario. To do so, we use the 35 topics (or
search criteria) proposed in TREC 2011, with an information source of
101,712 reports (grouped into 17,265 visits).
The metrics used in this evaluation are the standard ones in the
field of information retrieval: Precision, Recall, and F-Measure. The
latter is best at reflecting a balance between the first two, since it is
defined as:
$$\text{F-Measure} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (19)$$
Table 12 shows the average of all the results obtained in the
retrieval of relevant reports for each of the proposed search topics.
As we can see, the F-Measure value of both metrics is very similar
(Path = 0.430, Intrinsic IC-Path = 0.427), with a slight edge for the
Path metric. Although these results suggest that, in terms of Recall,
Path is the superior metric, with Intrinsic IC-Path having the upper
hand in Precision, we cannot consider them to be conclusive, as both
indicators are complementary.
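A note on reading Table 12: applying Eq. (19) to the averaged Precision and Recall does not give the tabulated F-Measure. This is what one would expect if F-Measure is computed per topic and then averaged, which is an assumption here; the text only states that the table holds averages over the topics. The sketch below uses only the values in Table 12, and the helper name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Eq. (19): harmonic mean of Precision and Recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Values copied from Table 12.
table_12 = {
    "Path": {"recall": 0.753, "precision": 0.364, "f_reported": 0.430},
    "Intrinsic IC-Path": {"recall": 0.639, "precision": 0.392, "f_reported": 0.427},
}

for name, row in table_12.items():
    f_of_means = f_measure(row["precision"], row["recall"])
    print(f"{name}: F of averaged P/R = {f_of_means:.3f}, "
          f"reported average F = {row['f_reported']:.3f}")
```

For both metrics the harmonic mean of the averages is higher than the reported figure, so the tabulated F-Measure is better read as an average over topics than as a function of the other two rows.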
Digging deeper into the results and analyzing their dispersal
pattern, Figs. 7 and 8 show the detailed values of the previous indicators
for all the search topics, and for both metrics studied in this work.
These figures reveal the complexity of a number of topics (such as 116,
123, 124, 125, 126, 130, 133, or 134) for which the results, in terms
of F-Measure, lie below 20% for both metrics; good examples further
illustrating the level of complexity of these topics would be Topic 123
(Diabetic patients who received diabetic education in the hospital) or
Topic 133 (Patients admitted for care who take herbal products for
osteoarthritis).
Fig. 7. Results using Path for the 35 topics. Recall, Precision, and F-Measure shown.
Fig. 8. Results using Intrinsic IC-Path for the 35 topics. Recall, Precision, and F-Measure shown.
Topics 123 and 134 produce a completely anomalous result, due
to an error detected in the UMLS relationships for two particular
concepts. These concepts, C0241863 Diabetic for Topic 123, and
C1148454 Seizure activity for Topic 134, offer no similarity distance,
and are particularly important for said topics.
8. Conclusions
The extraction of information through natural language processing
in biomedical documents is both important and complex enough
to deserve very particular attention. For this reason, many works have
been published that address the matter by dealing with similarity
metrics in a theoretical context, using the UMLS resource; however,
none of them manage to fulfil the actual need for information
retrieval from medical documents.
It is for this reason that, in this paper, we have proposed a novel
experimental study for assessing the performance of the Intrinsic IC-Path
and Path metrics in a real-life context, that is, on real medical
reports. Also, in order to perform that study, we have deployed an
ad hoc framework to formalize the use of the UMLS Metathesaurus for
the retrieval of medical information from these actual reports (TREC
Medical Records Track 2011) through maximum semantic similarity
matrices.
The conclusion drawn from our work is that, in a real-life
context, both assessed metrics display similar performance: Path
(F-Measure = 0.430) and Intrinsic IC-Path (F-Measure = 0.427). Therefore,
the variations in performance obtained in theoretical contexts
disappear when the amount of data is increased, and real visits and
reports are used. So, these results do not justify the use of more complex
metrics (with their associated high computational cost) such as these
variations of the Path metric, particularly Intrinsic IC-Path in this case.
The justification for these results lies in the fact that, unlike the
comparison between isolated pairs of concepts conducted in previous
works, the information contained within a report or topic is
interrelated, extensive, and expressed in natural language.
The results of this work are applicable to any similarity search
process conducted on biomedical documents (patient histories, clinical
reports, diagnostic tests like CT scans, X-Rays, etc.) as long as they are
contained in text files.
Having determined that the improved performance of
these similarity metrics has no impact in a real-life context, it
becomes necessary to improve, in the future, the straightforward
retrieval system we have proposed to perform this assessment. In this
sense, it may prove beneficial to eliminate those sub-phrases within
a topic which, although syntactically correct, are not semantically
related to its meaning. Furthermore, the reports dealt with are
frequently ambiguous, as they refer to disparate subjects (symptoms or
illnesses) for the same patient, making automatic retrieval more
difficult. It would be appropriate to filter or separate these documents
so that each report covers one subject exclusively. By relating the
report's subject more closely with the search topic, we could exclude
secondary subjects from the results, which merely add noise and
increase the computational costs of the query.
References
Alpi, K. M. (2005). Expert searching in public health. Journal of the Medical Library As-sociation, 93(1), 97103.
Al-Mubaid, H., & Nguyen, H. (2006). A cluster-based approach for semantic similar-ity in the biomedical domain. In Engineering in Medicine and Biology Society, 2006.EMBS06. 28th Annual International Conference of the IEEE(pp. 27132717).
Aronson, A. R. (2001). Effective mapping of biomedical text to the UMLS Metathe-saurus: The MetaMap program. In Proceedings of the American Medical Informatics
Association Symposium 2001(pp. 1721).Aronson, A. R., & Lang, F. M. (2010). An overview of MetaMap: Historical perspective
and recent advances.Journal of the American Medical Informatics Association, 17(3),229236.
Aronson, A. R., & Rindflesch, T. C. (1997). Query expansion using the UMLS Metathe-saurus. In Proceedings of the American Medical Informatics Association Annual FallSymposium(p. 485).
Babashzadeh, A., Huang, J., & Daoud, M. (2013, July). Exploiting semantics for improv-ing clinical information retrieval. In Proceedingsof the 36thinternational Association
for Computing Machinerys Special Interest Group on Information Retrieval Confer-ence on Research and development in information retrieval (pp. 801804). ACM SIGIR2013.
Batet, M., Snchez, D., & Valls, A. (2011). An ontology-based measure to compute se-mantic similarity in biomedicine. Journal of biomedical informatics, 44(1), 118125.
Bhogal, J., Macfarlane, A., & Smith, P. (2007). A review of ontology based query expan-sion.Information processing & management, 43(4), 866886.
Bodenreider, O. (2001). Circular hierarchical relationships in the UMLS: Etiology, di-agnosis, treatment, complications and prevention. In Proceedings of the AmericanMedical Informatics Association Symposium(p. 57).
Bodenreider, O. (2004). The unified medical language system (UMLS): Integratingbiomedical terminology.Nucleic acids research, 32(suppl 1), D267D270.
Bodenreider, O., & McCray, A. T. (2003). Exploring semantic groups through visual ap-proaches.Journal of biomedical informatics, 36(6), 414432.
Burgun, A., & Bodenreider, O. (2001). Comparing terms, concepts and semantic classesin WordNet and the Unified Medical Language System. InProceedings of the North
American Chapter of th e Association for Computational Linguistics 2001; WorkshopWordNet and Other Lexical Resources: Applications, Extensions and Customiza-tions (pp. 7782).
Caviedes,J. E., & Cimino, J. J. (2004).Towards thedevelopment of a conceptual distance
metric for the UMLS.Journal of biomedical informatics, 37(2), 7785.