Presentation at IDEAL 2008
Posted on 05-Dec-2014
Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies
Francisco Carrero (1,2); José Carlos Cortizo (1,2); José Mª Gómez (3)
1 Wipley, Social Gaming Platform, http://www.wipley.com
2 Universidad Europea de Madrid, http://www.esp.uem.es/gsi
3 Optenet, http://www.esp.uem.es/gsi
Francisco Carrero Garcia
Outline
The MIRCAT project
The challenge
English MetaMap, a big effort
Approaching a Spanish MetaMap
Experiments
Discussion of the Results and Future Work
The MIRCAT Project
The Interface
The MIRCAT Project
System’s Architecture
The Challenge: Our Goal
Medical record
English docs
Spanish docs
The Challenge: The Problem
We can extract UMLS concepts from English texts using MetaMap...
...but there is no Spanish version of MetaMap
Is it difficult to construct a tool like MetaMap?
English MetaMap: A Big Effort
∼3 years!!
Approaching a Spanish MetaMap: Two Main Approaches Considered
Approaching a Spanish MetaMap: Our Approach, Translation and Reuse
Optional
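The translate-and-reuse idea can be sketched as a two-step pipeline: translate the Spanish text into English, then run the existing English concept extractor on the result. This is a minimal sketch under stated assumptions: `translate` and `extract_concepts` are hypothetical placeholders (in the approach described here they would be a machine translation service such as Google Translate and English MetaMap), and the toy dictionary and concept IDs below are made up purely so the example runs.

```python
from typing import Callable, List

# Hypothetical interfaces, not real APIs: a translator (Spanish -> English)
# and an English concept extractor (standing in for MetaMap).
Translator = Callable[[str], str]
ConceptExtractor = Callable[[str], List[str]]


def spanish_concepts(text_es: str,
                     translate: Translator,
                     extract_concepts: ConceptExtractor) -> List[str]:
    """Translate Spanish text to English, then reuse the English extractor."""
    text_en = translate(text_es)
    return extract_concepts(text_en)


# Toy stand-ins so the sketch runs end to end (illustrative data only;
# the concept IDs are placeholders, not real UMLS CUIs).
TOY_DICTIONARY = {"dolor de cabeza": "headache", "fiebre": "fever"}
TOY_CONCEPTS = {"headache": "C-HEADACHE", "fever": "C-FEVER"}


def toy_translate(text: str) -> str:
    # Naive phrase-by-phrase substitution, standing in for real MT.
    for es, en in TOY_DICTIONARY.items():
        text = text.replace(es, en)
    return text


def toy_extract(text: str) -> List[str]:
    # Naive substring lookup, standing in for MetaMap.
    return [cid for term, cid in TOY_CONCEPTS.items() if term in text]


print(spanish_concepts("paciente con fiebre y dolor de cabeza",
                       toy_translate, toy_extract))
# ['C-HEADACHE', 'C-FEVER']
```

Passing the two components in as functions keeps the pipeline independent of any particular translator, which matches the plan to try other translation tools.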
Experimental Design: Text Collections
MedlinePlus medical news
http://www.nlm.nih.gov/medlineplus/newsbydate.html
Excellent online resource
2,000 news items, some in English and some in Spanish
600 of them available in both languages
Experiments: Experimental Design
MetaMap extracts concepts and allows multiple representations of them
A => compound concepts
B => simple concepts
1 => resolves ambiguity by adding all candidate concepts
2 => ignores ambiguity by choosing the first candidate
Four representations: A1, A2, B1, B2
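The four representations combine two independent choices, which the following sketch makes concrete. It assumes a simplified, hypothetical view of MetaMap output (not MetaMap's real format): each phrase has a ranked list of candidate mappings, and each mapping is a list of concept IDs. It also assumes that "compound" means one feature per whole mapping and "simple" means one feature per individual concept; the slides do not spell this out.

```python
from typing import List

# Hypothetical simplified MetaMap output: one entry per phrase; each phrase
# holds its candidate mappings in ranked order; each mapping is a list of
# concept IDs. This structure is an assumption, not MetaMap's real format.
Phrase = List[List[str]]


def represent(phrases: List[Phrase], compound: bool, keep_all: bool) -> List[str]:
    """Build one of the four representations (A1, A2, B1, B2).

    compound=True  -> A: each mapping becomes a single compound feature
    compound=False -> B: each concept in a mapping is its own feature
    keep_all=True  -> 1: every candidate mapping of an ambiguous phrase is kept
    keep_all=False -> 2: only the first candidate mapping is kept
    """
    features: List[str] = []
    for candidates in phrases:
        if not candidates:
            continue
        chosen = candidates if keep_all else candidates[:1]
        for mapping in chosen:
            if compound:
                features.append("+".join(mapping))
            else:
                features.extend(mapping)
    return features


# Toy document: the first phrase is ambiguous (two candidate mappings).
doc = [[["C1", "C2"], ["C3"]], [["C4"]]]
for name, compound, keep_all in [("A1", True, True), ("A2", True, False),
                                 ("B1", False, True), ("B2", False, False)]:
    print(name, represent(doc, compound, keep_all))
# A1 ['C1+C2', 'C3', 'C4']
# A2 ['C1+C2', 'C4']
# B1 ['C1', 'C2', 'C3', 'C4']
# B2 ['C1', 'C2', 'C4']
```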
Experiments: Filtering
Data representations with many features do not usually perform well in text tasks
Many classifiers lose prediction accuracy when faced with many irrelevant, redundant, or correlated features (the “curse of dimensionality”)
We apply Zipf’s Law to filter the attributes (concepts)
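The slides do not say which cut-offs were derived from Zipf's Law. A common reading, in the spirit of Luhn's upper and lower cut-offs on the Zipf curve, is to drop both the rarest and the most widespread terms. The sketch below does that on document frequency; `min_df` and `max_df_ratio` are hypothetical parameters chosen for illustration, not values from the experiments.

```python
from collections import Counter
from typing import List, Set


def zipf_filter(docs: List[List[str]], min_df: int = 2,
                max_df_ratio: float = 0.8) -> Set[str]:
    """Keep concepts whose document frequency falls in a middle band.

    Very rare concepts (df < min_df) and very common ones
    (df > max_df_ratio * number of documents) are discarded.
    The thresholds are illustrative assumptions.
    """
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # count each concept once per document
    max_df = max_df_ratio * len(docs)
    return {c for c, n in df.items() if min_df <= n <= max_df}


docs = [["a", "b", "c"], ["a", "b", "c"], ["a", "d"], ["a", "b", "e"], ["a", "c"]]
kept = zipf_filter(docs)
filtered = [[c for c in doc if c in kept] for doc in docs]
print(sorted(kept))  # ['b', 'c']
```

Here "a" appears in every document and carries no discriminative information, while "d" and "e" appear only once; both extremes are removed.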
Experimental Results: Number of Concepts for Each Representation
Experimental Results: Average Similarities
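The slides do not state which similarity measure was averaged. Assuming it compares the concepts extracted from each English news item with those extracted from its translated Spanish counterpart (the 600 items available in both languages), a minimal sketch using cosine similarity over bags of concepts could look like this. The choice of cosine is an assumption.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of concepts (count vectors)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def average_similarity(english: Sequence[List[str]],
                       translated: Sequence[List[str]]) -> float:
    """Mean cosine similarity over aligned (English, translated Spanish) pairs."""
    if len(english) != len(translated):
        raise ValueError("collections must be aligned pair by pair")
    sims = [cosine(Counter(e), Counter(t)) for e, t in zip(english, translated)]
    return sum(sims) / len(sims) if sims else 0.0


# Toy aligned collections: one identical pair, one pair with no overlap.
english = [["C1", "C2"], ["C3"]]
translated = [["C1", "C2"], ["C4"]]
print(average_similarity(english, translated))
```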
Experimental Results: Latest Experiments (not in the IDEAL paper)
Discussion of the Results: Translation
The worst results (similarity) come from the most complex representation, the one closest to human interpretation: A1
B1 is less complex and produces the best results
=> Our model seems better suited to a plain bag-of-concepts representation
Similar to the bag-of-words representation widely used in text processing tasks
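A bag-of-concepts representation works like bag-of-words, with concept IDs in place of words: each document becomes a count vector over a shared concept vocabulary, and order and structure are discarded. The sketch below builds such a document-by-concept matrix in plain Python; the function name and toy IDs are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_concepts(docs: List[List[str]]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Map each document to a count vector over a shared concept vocabulary."""
    # Sorted vocabulary gives every concept a stable column index.
    vocab = {c: i for i, c in enumerate(sorted({c for d in docs for c in d}))}
    matrix: List[List[int]] = []
    for doc in docs:
        row = [0] * len(vocab)
        for concept, count in Counter(doc).items():
            row[vocab[concept]] = count
        matrix.append(row)
    return vocab, matrix


vocab, matrix = bag_of_concepts([["C2", "C1", "C2"], ["C3"]])
print(vocab)   # {'C1': 0, 'C2': 1, 'C3': 2}
print(matrix)  # [[1, 2, 0], [0, 0, 1]]
```

A matrix like this is the usual input to standard text classifiers, which is why a plain bag-of-concepts slots into existing text processing tools.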
Discussion of the Results: Classification
All results are comparable to classification on the original English texts
In some cases, they are even better
Best results with A2 + Zipf filtering: +7.8% in AUC
UNMKD representations never achieve worse classification results than English
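Results are reported as AUC (area under the ROC curve). The slides do not say which tool computed it; the sketch below only makes the metric concrete, using its pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting as half. It is O(n·m) and meant for illustration, not for large collections.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```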
Conclusions and Future Work
The “easy way” to construct a Spanish MetaMap is promising
Google Translate seems a good tool for adapting English resources to other languages (such as Spanish)
We should try other translation tools
We are working on applying this approach to other text tasks (such as Information Retrieval and Filtering)
Ending...
Thank you very much for your attention
Any questions?