Presentation at IDEAL 2008

Post on 05-Dec-2014

DESCRIPTION

Presentation of our paper on translation + MetaMap at IDEAL 2008

TRANSCRIPT

Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies

Francisco Carrero (1,2); José Carlos Cortizo (1,2); José Mª Gómez (3)

(1) Wipley, Social Gaming Platform, http://www.wipley.com

(2) Universidad Europea de Madrid, http://www.esp.uem.es/gsi

(3) Optenet, http://www.esp.uem.es/gsi

Francisco Carrero Garcia

Outline

The MIRCAT project

The challenge

English MetaMap, a big effort

Approaching a Spanish MetaMap

Experiments

Discussion of the Results and Future Work

The MIRCAT Project

The Interface

The MIRCAT Project

System’s Architecture

The Challenge: Our Goal

Medical record

English docs

Spanish docs

The Challenge: The Problem

We can extract UMLS concepts from English texts using MetaMap...

...but there is no Spanish version of MetaMap

Is it difficult to construct a tool like MetaMap?

English MetaMap: A Big Effort

∼3 years!!

Approaching Spanish MetaMap: Two Main Approaches Considered

Approaching Spanish MetaMap: Our Approach, Translation and Reuse
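As a rough illustration of the translation-and-reuse idea, the sketch below chains a Spanish-to-English translation step with an English concept extractor. It is only a sketch under stated assumptions: the function names, the toy dictionary and the placeholder concept identifiers are all hypothetical stand-ins, not the real Google translation service, MetaMap/MMTx API or UMLS identifiers.

```python
from typing import Callable, List


def extract_concepts_from_spanish(
    spanish_text: str,
    translate_es_to_en: Callable[[str], str],
    extract_english_concepts: Callable[[str], List[str]],
) -> List[str]:
    """Translate Spanish text to English, then reuse an English extractor."""
    english_text = translate_es_to_en(spanish_text)
    return extract_english_concepts(english_text)


# Toy stand-ins (hypothetical) so the sketch runs without external services.
TOY_DICTIONARY = {"dolor de cabeza": "headache", "fiebre": "fever"}
TOY_CONCEPTS = {"headache": "X-HEADACHE", "fever": "X-FEVER"}  # placeholder ids


def toy_translate(text: str) -> str:
    """Very naive phrase substitution standing in for machine translation."""
    out = text.lower()
    for spanish, english in TOY_DICTIONARY.items():
        out = out.replace(spanish, english)
    return out


def toy_extract(text: str) -> List[str]:
    """Naive substring lookup standing in for an English concept extractor."""
    return [cid for term, cid in TOY_CONCEPTS.items() if term in text]


concepts = extract_concepts_from_spanish(
    "Fiebre y dolor de cabeza", toy_translate, toy_extract
)
```

Passing the two steps in as callables keeps the sketch independent of any particular translation tool, which matches the later remark that other translation tools should be tried.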

Optional

Experimental Design: Text Collections

MedlinePlus medical news

http://www.nlm.nih.gov/medlineplus/newsbydate.html

Excellent online resource

2000 news items, some in English, some in Spanish

600 available in both languages

Experiments: Experimental Design

MetaMap extracts concepts, allowing multiple representations

A => uses compound concepts

B => uses simple concepts

1 => resolves ambiguity by adding all the candidate concepts

2 => ignores ambiguity by choosing the first candidate

4 representations: A1, A2, B1, B2
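The four representations can be sketched roughly as follows. The input structure is an assumption made for illustration, not MetaMap's actual output format: each phrase carries one or more candidate mappings (the ambiguity), and each mapping is a list of concept ids. Reading "compound" as "one joined feature per mapping" and "simple" as "one feature per concept" is also an interpretation of the slide, not something the talk spells out.

```python
from typing import List

# Hypothetical structure: one entry per phrase; each phrase has one or more
# candidate mappings; each mapping is a list of concept ids.
Phrase = List[List[str]]


def represent(phrases: List[Phrase], compound: bool, keep_all: bool) -> List[str]:
    """Build a feature list for one document.

    compound=True  -> A: each mapping becomes one joined feature
    compound=False -> B: each concept becomes its own feature
    keep_all=True  -> 1: keep every candidate mapping of an ambiguous phrase
    keep_all=False -> 2: keep only the first candidate mapping
    """
    features: List[str] = []
    for candidates in phrases:
        chosen = candidates if keep_all else candidates[:1]
        for mapping in chosen:
            if compound:
                features.append("+".join(mapping))
            else:
                features.extend(mapping)
    return features


doc = [
    [["c1", "c2"], ["c3"]],  # ambiguous phrase with two candidate mappings
    [["c4"]],                # unambiguous phrase
]

A1 = represent(doc, compound=True, keep_all=True)
A2 = represent(doc, compound=True, keep_all=False)
B1 = represent(doc, compound=False, keep_all=True)
B2 = represent(doc, compound=False, keep_all=False)
```

On this toy document, A1 has the richest features and B2 the fewest distinct choices, which mirrors the complexity ordering discussed in the results.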

Experiments: Filtering

Representations with a very large number of features do not usually perform well in text tasks

Many classifiers lose prediction accuracy when faced with many irrelevant, redundant or correlated features (“curse of dimensionality”)

We apply Zipf’s Law to filter the attributes
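The slide does not say exactly how Zipf's Law is turned into a filter, so the following is only one plausible reading: since a few concepts are very frequent and most are very rare, drop both ends of the frequency ranking. The thresholds `min_count` and `drop_top` are hypothetical parameters chosen for the example.

```python
from collections import Counter
from typing import List, Set


def zipf_filter(docs: List[List[str]], min_count: int = 2, drop_top: int = 1) -> Set[str]:
    """Return the concepts kept after trimming both ends of the ranking.

    Concepts seen fewer than `min_count` times are dropped as too rare,
    and the `drop_top` most frequent concepts are dropped as too common.
    """
    counts = Counter(concept for doc in docs for concept in doc)
    ranked = [concept for concept, _ in counts.most_common()]
    too_common = set(ranked[:drop_top])
    return {
        concept
        for concept, n in counts.items()
        if n >= min_count and concept not in too_common
    }


docs = [["c1", "c2", "c3"], ["c1", "c2"], ["c1", "c4"]]
kept = zipf_filter(docs)
```

Here `c1` is removed as the most frequent concept and `c3`, `c4` as singletons, leaving only the mid-frequency concept.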

Experimental Results: Number of Concepts for Each Representation

Experimental Results: Average Similarities

Experimental Results: Last Experiments (not in IDEAL paper)

Discussion of the Results: Translation

The worst results (lowest similarity) are obtained with the most complex, most human-like representation: A1

B1 is less complex and produces the best results

=> Our model seems better suited to a plain bag-of-concepts representation

This is similar to the bag-of-words representation widely used in text processing tasks
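To make the bag-of-concepts idea concrete, the sketch below counts concept occurrences per document and compares two documents. The talk does not name the similarity measure it used, so cosine similarity is an assumption here, as is the scenario of comparing concepts from an English text with those from the translation of its Spanish counterpart.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: List[str], b: List[str]) -> float:
    """Cosine similarity between two bags of concepts (count vectors)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[c] * vb[c] for c in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(n * n for n in va.values()))
    norm_b = math.sqrt(sum(n * n for n in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty bag shares nothing with any other bag
    return dot / (norm_a * norm_b)


# Toy concept lists (hypothetical ids).
english = ["c1", "c2", "c2", "c3"]
translated = ["c1", "c2", "c4"]
sim = cosine_similarity(english, translated)
```

Because word order and phrase structure are ignored, two documents score high whenever they mention the same concepts with similar frequency, which is exactly the bag-of-words assumption applied to concepts.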

Discussion of the Results: Classification

All results are comparable to classification on the original English texts

In some cases they are even better

Best results using A2+Zipf, +7.8% in AUC

UNMKD representations never achieve worse classification than English
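For readers unfamiliar with AUC, the measure behind the +7.8% figure, the sketch below computes it from its pairwise definition: the fraction of (positive, negative) pairs in which the classifier scores the positive example higher, with ties counted as half. The labels and scores are made-up toy values, not data from the experiments.

```python
from typing import List


def auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = document belongs to the class, 0 = it does not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
value = auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so differences between representations can be read directly on that scale.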

Conclusions and Future Work

The “easy way” to construct a Spanish MetaMap is promising

Google Translate seems to be a good tool for adapting English resources to other languages (such as Spanish)

We should try other translation tools

We are working on applying this approach to other text tasks (like Information Retrieval and Filtering)

Ending...

Thank you very much for your attention

Any questions?
