![Page 1: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/1.jpg)
+Experiences on integrating explicit knowledge on information access tools in the medical domain
Manuel de la Villa, Department of Information Technologies, University of Huelva
Extractive Summarization
User-defined query expansion
Post-retrieval clustering
Computer-aided summarization
![Page 2: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/2.jpg)
+Index
Brief CV
Why a research stay? In Wolverhampton?
Teaching
Integrating explicit knowledge on information access tools
Knowledge sources (UMLS & Freebase)
Automatic Text Summarization
Information Retrieval
Research Group in Computational Linguistics (Univ. Wolverhampton), June 20th 2011
![Page 3: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/3.jpg)
+Brief CV
![Page 4: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/4.jpg)
+Teaching experience
Software Engineering: Process and Methodologies, Metrics, Requirements analysis, Design, …
Software Engineering Lab (UML, NetBeans, Subversion, Java, JUnit, Persistence…)
Multimedia applications development: Adobe Director, Flash, Photoshop, Premiere, Sony Sound Forge, Audacity
![Page 5: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/5.jpg)
+Knowledge integration
![Page 6: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/6.jpg)
+ Specific Domain Knowledge source. UMLS (I)
ICD-10
MeSH
SNOMED-CT
DSM-IV
LOINC
UK-Clinical Terms
RxNorm
Gene Ontology
…
A saturation of different terminologies
UMLS aims to overcome a significant barrier: the variety of ways the same concepts are expressed in different machine-readable sources.
UMLS
A homogeneous group of terminologies
![Page 7: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/7.jpg)
+ Specific Domain Knowledge source. UMLS (II)
The NLM Unified Medical Language System (UMLS) project:
Aim: to develop tools that help researchers with knowledge representation, retrieval and integration of biomedical information.
UMLS Knowledge Sources
Software tools
Three main components:
SPECIALIST Lexicon: a compilation of lexical elements (>200,000) with grammatical information and linguistic variants.
“Anaesthetic” {base=anesthetic spelling_variant=anaesthetic entry=E0330018 cat=noun variants=reg variants=uncount }
“Anaesthetic” {base=anesthetic spelling_variant=anaesthetic entry=E0330019 cat=adj variants=inv position=attrib(3) position=pred stative }
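The two records above can be read into a simple structure. What follows is a minimal sketch that parses only the flattened one-line form shown on this slide; it is not a reader for the lexicon's distributed file format, and the function name `parse_lexicon_record` is hypothetical.

```python
from collections import defaultdict


def parse_lexicon_record(record):
    """Parse a flattened '{key=value ... flag}' record as shown on the slide.

    Returns (attributes, flags): attributes maps each key to the list of its
    values (keys such as 'variants' and 'position' can repeat), and flags
    collects bare tokens without '=' (such as 'stative').
    """
    body = record.strip().strip("{}").strip()
    attributes = defaultdict(list)
    flags = []
    for token in body.split():
        if "=" in token:
            key, value = token.split("=", 1)
            attributes[key].append(value)
        else:
            flags.append(token)
    return dict(attributes), flags


# The adjective entry copied from the slide.
adjective = ("{base=anesthetic spelling_variant=anaesthetic entry=E0330019 "
             "cat=adj variants=inv position=attrib(3) position=pred stative }")
attrs, flags = parse_lexicon_record(adjective)
```

Keeping repeated keys as lists preserves both `position` values instead of letting the second overwrite the first.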
![Page 8: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/8.jpg)
+ Specific Domain Knowledge source. UMLS (III)
Metathesaurus: a very large, multi-purpose, multilingual vocabulary database (it compiles more than 100 source vocabularies),
every term (>5M) is associated with a concept (>1.5M), and related terms (e.g., synonyms) are linked (16M relations),
each concept is assigned to one or more of the 135 existing semantic types
Different terms…
for a same concept…
Included in a semantic type
https://uts.nlm.nih.gov/metathesaurus.html
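The term, concept and semantic type layering described on this slide can be pictured with a toy in-memory index. This is only an illustrative sketch: the class names are hypothetical, the concept identifier `DEMO-0001` is a placeholder rather than a real UMLS identifier, and nothing here reflects how the Metathesaurus is actually stored or queried.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str                      # placeholder id, not a real UMLS CUI
    semantic_types: set                  # one or more semantic type names
    terms: set = field(default_factory=set)


class MiniThesaurus:
    """Toy index: different terms -> same concept -> semantic type(s)."""

    def __init__(self):
        self._concepts = {}
        self._by_term = {}

    def add(self, concept_id, semantic_types, terms):
        self._concepts[concept_id] = Concept(concept_id, set(semantic_types), set(terms))
        for term in terms:
            self._by_term.setdefault(term.lower(), set()).add(concept_id)

    def concepts_for(self, term):
        return sorted(self._by_term.get(term.lower(), set()))

    def semantic_types_for(self, term):
        types = set()
        for cid in self.concepts_for(term):
            types |= self._concepts[cid].semantic_types
        return sorted(types)

    def synonyms(self, term):
        found = set()
        for cid in self.concepts_for(term):
            found |= self._concepts[cid].terms
        return sorted(t for t in found if t.lower() != term.lower())


thesaurus = MiniThesaurus()
thesaurus.add("DEMO-0001", {"Disease or Syndrome"},
              {"heart attack", "myocardial infarction"})
```

Looking up either term reaches the same concept, which is the property the slide highlights: different terms, one concept, one or more semantic types.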
![Page 9: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/9.jpg)
+Specific Domain Knowledge source. UMLS (IV)
UMLS Semantic Network: an ontology with 135 semantic types and 54 types of relationships between types
https://uts.nlm.nih.gov/semanticnetwork.html
![Page 10: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/10.jpg)
+ General Domain Knowledge Source: Freebase (I)
Freebase is a large public database that collects three kinds of information (data, texts and media) that reference…
…entities or topics (≈ 12 million). An entity is a single person, place, or thing: a single concept or real-world thing.
A topic could also be called an entity, resource, element or thing; it is the fundamental unit in Freebase (/common/topic).
Each topic has a GUID, or globally unique ID:
http://www.freebase.com/view/en/barack_obama
http://www.freebase.com/guid/9202a8c04000641f800000000029c277
![Page 11: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/11.jpg)
+ General Domain Knowledge Source: Freebase (II)
Freebase connects entities together as a graph: it defines its data structure as a set of nodes and a set of links that establish relationships between the nodes.
Most topics are associated with one or more types (such as people, places, books, films, etc.) and may have additional properties like "date of birth" for a person or latitude and longitude for a location. These types, properties and related concepts are called the Schema.
![Page 12: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/12.jpg)
+ General Domain Knowledge Source: Freebase (III) The Schema
Schema (the way Freebase's data is laid out) is expressed through Types and Properties. Types are grouped together in Domains.
![Page 13: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/13.jpg)
![Page 14: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/14.jpg)
![Page 15: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/15.jpg)
![Page 16: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/16.jpg)
+ General Domain Knowledge Source: Freebase (IV) The Schema: Medicine
![Page 17: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/17.jpg)
+ General Domain Knowledge Source: Freebase (V) How can we use it…
As a reference or information source
Create interesting Views and Visualizations and share them with others
Embed Freebase data in your website
Use our API or Acre, our hosted app development platform, to build apps that use Freebase data
Download our Data dumps
Use Freebase's RDF for Semantic Web applications
![Page 18: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/18.jpg)
+ General Domain Knowledge Source: Freebase (VI) The Freebase approach
![Page 19: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/19.jpg)
+MQL (Metaweb Query Language)
• http://api.freebase.com/api/service/mqlread?query={"query":{"type":"/music/artist","name":"U2","album":[]}}
• http://api.freebase.com/api/service/mqlread?query={"query":[{"type":"/medicine/disease", "name":null, "symptoms":{"name":"Nausea"}}]}
• Query Editor
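The two example URLs wrap a JSON query in a `{"query": ...}` envelope and pass it as the `query` parameter. A minimal sketch of assembling such a URL with proper percent-encoding follows; the endpoint is the one shown on the slide, the helper names are hypothetical, and the code only builds and decodes the string without contacting any service.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint copied from the slide; it is never requested in this sketch.
MQLREAD_ENDPOINT = "http://api.freebase.com/api/service/mqlread"


def build_mqlread_url(query):
    """Wrap an MQL query in the {"query": ...} envelope and URL-encode it."""
    envelope = json.dumps({"query": query})
    return MQLREAD_ENDPOINT + "?" + urlencode({"query": envelope})


def decode_mqlread_url(url):
    """Recover the JSON envelope from a URL built by build_mqlread_url."""
    return json.loads(parse_qs(urlsplit(url).query)["query"][0])


# Second example from the slide: diseases that list "Nausea" as a symptom.
# In MQL, null (None here) marks a value the service is asked to fill in.
disease_query = [{"type": "/medicine/disease",
                  "name": None,
                  "symptoms": {"name": "Nausea"}}]
url = build_mqlread_url(disease_query)
```

Encoding the JSON through `urlencode` avoids the raw braces and quotes that appear in the hand-written URLs above.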
![Page 20: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/20.jpg)
+Knowledge integration
![Page 21: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/21.jpg)
+Experiences in Automatic summarization (I)
+ We developed a proposal with these main characteristics:
Sentence extraction
Document representation as a graph
Centered on biomedical concepts
Using concept frequency to measure relevance
![Page 22: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/22.jpg)
+Experiences in Automatic summarization (II)
+ Phase I: Graph generation. Identification of sentences and UMLS concepts
+ Phase II: Similarity algorithm. Concept overlap between sentences (edges) means “recommendation”
+ Phase III: Ranking algorithm. The weight associated with each edge depends on similarity
+ Phase IV: Summary building. Top-ranked sentences are selected
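The four phases can be sketched in a few lines. This is an illustration under stated assumptions, not the method evaluated in the talk: concept identification (Phase I) is taken as already done, so each sentence arrives as a set of concept identifiers; Jaccard overlap stands in for the similarity measure; and a sentence's score is simply the sum of the weights of its edges, since the slides do not name the ranking algorithm.

```python
from itertools import combinations


def jaccard(a, b):
    """Concept overlap between two sentences (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(sentence_concepts):
    """Phases I-II: sentences are nodes; an edge links sentences sharing concepts."""
    edges = {}
    for i, j in combinations(range(len(sentence_concepts)), 2):
        weight = jaccard(sentence_concepts[i], sentence_concepts[j])
        if weight > 0:
            edges[(i, j)] = weight
    return edges


def rank_sentences(n_sentences, edges):
    """Phase III: score each sentence by the total weight of its edges (assumption)."""
    scores = [0.0] * n_sentences
    for (i, j), weight in edges.items():
        scores[i] += weight
        scores[j] += weight
    return scores


def summarize(sentence_concepts, k):
    """Phase IV: keep the k best-scored sentences, returned in document order."""
    edges = build_graph(sentence_concepts)
    scores = rank_sentences(len(sentence_concepts), edges)
    best = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    return sorted(best)


# Hypothetical concept sets for a four-sentence document.
document = [{"c1", "c2"}, {"c2", "c3"}, {"c9"}, {"c1", "c2", "c3"}]
selected = summarize(document, 2)
```

A sentence that shares no concept with any other, like the third one here, receives no "recommendation" and scores zero.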
![Page 23: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/23.jpg)
![Page 24: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/24.jpg)
![Page 25: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/25.jpg)
![Page 26: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/26.jpg)
![Page 27: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/27.jpg)
+Automatic Summarization. Evaluation
Evaluation with ROUGE (based on n-grams) against generic summarizers.
Our method obtains good results, especially with small n-grams.
de la Villa, M., Maña, M. “Propuesta y evaluación de un método de generación de resúmenes extractivo basado en conceptos en el ámbito biomédico”. XXV edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2009 (SEPLN´09) San Sebastián (Sept-2009).
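For readers unfamiliar with n-gram based evaluation, here is a minimal sketch of ROUGE-N recall: the share of the reference summary's n-grams that also appear in the candidate, with counts clipped. It uses whitespace tokenization and a single reference, both simplifications; it is not the ROUGE toolkit used for the reported evaluation, and the example sentences are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram matches divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matches = sum(min(count, cand[gram]) for gram, count in ref.items())
    return matches / total


reference_summary = "the drug reduces pain"
candidate_summary = "the drug reduces fever"
unigram_recall = rouge_n_recall(candidate_summary, reference_summary, n=1)
bigram_recall = rouge_n_recall(candidate_summary, reference_summary, n=2)
```

In this toy pair the unigram score is higher than the bigram score, because longer n-grams are harder to match exactly.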
![Page 28: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/28.jpg)
+Knowledge integration
![Page 29: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/29.jpg)
+Experiences in Computer-aided summarization (I)
Computer-aided summarization combines automatic and human summarization.
The CAS system suggests an initial summary by selecting relevant sentences.
The human can change the sentence selection and manually edit the summary.
Purpose: construction of a Gold-Standard building assistant.
Novelty: Considering biomedical concepts distribution (Reeve et al., 2006)
![Page 30: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/30.jpg)
+Experiences in Computer-aided summarization (and II)
Experience in the design and construction of a Gold-Standard building assistant (or Computer-aided summarization)
Considering biomedical concepts distribution (Reeve et al., 2006)
- Client-server app
- Centralized repository
- Supports PDF, XML
![Page 31: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/31.jpg)
+Knowledge integration
![Page 32: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/32.jpg)
+Experiences in Information Retrieval and Post-retrieval clustering
Experience in the design and construction of an information retrieval system with:
• Post-retrieval clustering,
• orientation to biomedical documents and
• mobile devices
![Page 33: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/33.jpg)
Search and Information Retrieval: Our implementation
Document sources: BioMed Central (web crawling in progress)
Text processing: lowercasing, stemming, stop-words, …
Lucene for indexing…
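The text processing steps listed above (lowercasing, stop-word removal, stemming) can be sketched as follows. This is a toy illustration, not the Lucene analysis chain used in the system: the stop-word list is a tiny invented sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used in the system).
STOP_WORDS = {"a", "and", "in", "is", "of", "the", "to"}


def naive_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on alphanumerics, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


index_terms = preprocess("The Ligaments of the knee")
```

Applying the same function to documents at indexing time and to queries at search time keeps both sides in the same normalized vocabulary.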
![Page 34: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/34.jpg)
Search and Information Retrieval: Our implementation (and II)
… and Lucene for searching
![Page 35: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/35.jpg)
Clustering: Our implementation
Weka for clustering
Post-retrieval clustering groups the documents retrieved for a query into different subsets according to their similarity.
![Page 36: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/36.jpg)
Clustering: Why Simple-K-Means?
Clustering algorithm: Simple-K-Means vs Expectation Maximization (EM)
Time it takes to perform the grouping, in seconds:

| Query (documents) | Simple-K-Means | EM |
|---|---|---|
| Ligaments (10) | 1 | 2 |
| Cancer Skin (25) | 4 | 12 |
| Cancer (46) | 5 | 26 |
| Disease (62) | 8 | 57 |

K? It depends on the number of documents retrieved.
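As a reminder of what k-means does with the retrieved documents, here is a minimal sketch of the basic algorithm on small numeric vectors. It is plain Python rather than Weka's implementation, it initializes the centroids with the first k points for reproducibility (an assumption), and the document vectors are invented. The value of k is left as a parameter because the slide only says it depends on the number of documents retrieved.

```python
import math


def kmeans(points, k, iterations=20):
    """Basic k-means: returns the cluster index assigned to each point."""
    centroids = [list(p) for p in points[:k]]   # deterministic initialization
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical two-dimensional document vectors forming two obvious groups.
vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
clusters = kmeans(vectors, k=2)
```

Each iteration is a pass over the points and the centroids, which is one reason k-means tends to be cheap on small result sets.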
![Page 37: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/37.jpg)
Visualization on Mobile Devices: Our interface
Cancer skin
![Page 38: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/38.jpg)
+Knowledge integration
![Page 39: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/39.jpg)
+Experiences in Information Retrieval and Query user-defined expansion (I)
Users have problems defining their information needs in a query string (Jansen, Spink and Koshman, 2007).
Queries contained fewer than three terms (75.2%), and the majority of queries contained one (18.5%), two (32.2%)
Methods to improve (expand) the query: Relevance feedback. Local analysis or global analysis. Natural Language Processing resources.
Experiments with users show that they prefer to maintain control over how the query is reformulated (Belkin et al., 2001).
![Page 40: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/40.jpg)
+Experiences in Information Retrieval and Query user-defined expansion (II)
Experience in using ontologies to assist the definition of the search string… previously
![Page 41: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/41.jpg)
+Experiences in Information Retrieval and Query user-defined expansion (II)
How does it work?
Pre-retrieval construction of the graph
![Page 42: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/42.jpg)
+Research: Information Retrieval (and III)
… or using ontologies to build an enriched concept graph that assists the definition of the search string
http://www.uhu.es/manuel.villa/viewmed/
de la Villa, M., Garcia, S., Maña, M. “¿De verdad sabes lo que quieres buscar? Expansión guiada visualmente de la cadena de búsqueda usando ontologías y grafos de conceptos”. XXVII edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2011 (SEPLN´11) Huelva (Sept-2011).
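The idea of user-controlled expansion over a concept graph can be sketched as follows. This is a toy illustration and not the ViewMed implementation: the graph is an invented dictionary rather than one built from an ontology, the function names are hypothetical, and joining the chosen terms with OR in the style of Lucene's classic query syntax is an assumption about how the expanded string is consumed.

```python
def suggest_terms(concept_graph, term):
    """Return the concepts linked to the user's term, as candidate expansions."""
    return sorted(concept_graph.get(term.lower(), set()))


def expand_query(query, chosen_terms):
    """Build the expanded search string from the terms the user selected.

    Multi-word terms are quoted so they are treated as phrases.
    """
    parts = [query] + [t for t in chosen_terms if t != query]
    return " OR ".join(f'"{p}"' if " " in p else p for p in parts)


# Invented neighbourhood of one concept; a real graph would come from an ontology.
graph = {"melanoma": {"skin cancer", "nevus"}}

candidates = suggest_terms(graph, "Melanoma")          # shown to the user
expanded = expand_query("melanoma", ["skin cancer"])   # the user picked one
```

The system only proposes candidates; the final string contains nothing the user did not choose, which matches the preference for control reported earlier in the talk.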
![Page 43: Experiences on integrating explicit knowledge on information access tools in the medical domain](https://reader033.vdocuments.us/reader033/viewer/2022051612/54c2ee474a79594a428b45a4/html5/thumbnails/43.jpg)
+Known tools. Expectations.
UMLS: Metathesaurus, Semantic Network
Tools: MetaMap, MMTx API, SemRep, UTS Web Services, …
Freebase
MQL (Metaweb Query Language)
Newbie with UIMA & GATE
I offer my collaboration if you’re interested in using any of these resources
I’m open to collaborate on whatever task you consider related and…
… to receive some guidelines to improve the summarization method
Any questions?