Presentation at IDEAL 2008


Upload: jose-carlos-cortizo-perez

Post on 05-Dec-2014

DESCRIPTION

Presentation of our translation + MetaMap paper at IDEAL 2008

TRANSCRIPT

Page 1: Presentation at IDEAL 2008

Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies

Francisco Carrero 1,2; José Carlos Cortizo 1,2; José Mª Gómez 3

1 Wipley, Social Gaming Platform, http://www.wipley.com

2 Universidad Europea de Madrid, http://www.esp.uem.es/gsi

3 Optenet, http://www.esp.uem.es/gsi

Page 2: Presentation at IDEAL 2008

Francisco Carrero Garcia

Outline

The MIRCAT project

The challenge

English MetaMap, a big effort

Approaching a Spanish MetaMap

Experiments

Discussion of the Results and Future Work

Page 3: Presentation at IDEAL 2008

The MIRCAT Project: The Interface

Page 4: Presentation at IDEAL 2008

The MIRCAT Project: System Architecture

Page 5: Presentation at IDEAL 2008

The Challenge: Our Goal

Medical record

English docs

Spanish docs

Page 6: Presentation at IDEAL 2008

The Challenge: The Problem

We can extract UMLS concepts from English texts using MetaMap...

...but there is no Spanish version of MetaMap

Is it difficult to construct a tool like MetaMap?

Page 7: Presentation at IDEAL 2008

English MetaMap: A Big Effort

∼3 years!!

Page 8: Presentation at IDEAL 2008

Approaching a Spanish MetaMap: Two Main Approaches Considered

Page 9: Presentation at IDEAL 2008

Approaching a Spanish MetaMap: Our Approach, Translation and Reuse

(Diagram not reproduced in the transcript; one of its steps is labeled "Optional".)

Page 10: Presentation at IDEAL 2008

Experimental Design: Text Collections

MedlinePlus medical news

http://www.nlm.nih.gov/medlineplus/newsbydate.html

Excellent online resource

2,000 news items, some in English, some in Spanish

600 available in both languages

Page 11: Presentation at IDEAL 2008

Experiments: Experimental Design

MetaMap extracts concepts, allowing multiple representations

A => uses compound concepts

B => uses simple concepts

1 => resolves ambiguity by adding all the candidate concepts

2 => ignores ambiguity by choosing the first candidate

4 representations: A1, A2, B1, B2
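
The four representations can be illustrated with a minimal sketch. The slides do not show MetaMap's output format, so the data model here is an assumption: each phrase has one or more candidate mappings, and each candidate is a tuple of concept names. Reading "compound" as one joined feature per candidate and "simple" as one feature per concept is also an interpretation; `represent` and the toy document are hypothetical.

```python
from typing import List, Tuple

# Hypothetical data model (not MetaMap's real output format): a phrase maps to
# one or more candidate mappings; each candidate is a tuple of concept names.
Candidate = Tuple[str, ...]
Phrase = List[Candidate]


def represent(phrases: List[Phrase], compound: bool, keep_all: bool) -> List[str]:
    """Turn candidate mappings into a flat feature list.

    compound=True  -> A: one joined feature per candidate
    compound=False -> B: one feature per individual concept
    keep_all=True  -> 1: keep every ambiguous candidate
    keep_all=False -> 2: keep only the first candidate
    """
    features: List[str] = []
    for candidates in phrases:
        chosen = candidates if keep_all else candidates[:1]
        for candidate in chosen:
            if compound:
                features.append("+".join(candidate))
            else:
                features.extend(candidate)
    return features


# Toy document: the first phrase is ambiguous (two candidates), the second is not.
doc: List[Phrase] = [
    [("Heart", "Attack"), ("Myocardial Infarction",)],
    [("Aspirin",)],
]

REPRESENTATIONS = {
    "A1": represent(doc, compound=True, keep_all=True),
    "A2": represent(doc, compound=True, keep_all=False),
    "B1": represent(doc, compound=False, keep_all=True),
    "B2": represent(doc, compound=False, keep_all=False),
}
```

Under these assumptions, the "1" variants always contain at least as many features as the matching "2" variants, because they keep every ambiguous candidate.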

Page 12: Presentation at IDEAL 2008

Experiments: Filtering

Data representations with a large number of features do not usually perform well in text tasks

Many classifiers lose prediction accuracy when faced with many irrelevant, redundant, or correlated features (“curse of dimensionality”)

We apply Zipf’s Law to filter the attributes
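
The slides do not say which part of the Zipf frequency ranking is discarded, so the following is only a sketch of one common reading: drop features that appear in almost every document and features that appear in very few. The function name `zipf_filter` and the thresholds `min_docs` and `max_doc_ratio` are illustrative assumptions, not values from the presentation.

```python
from collections import Counter
from typing import Iterable, List, Set


def zipf_filter(docs: Iterable[List[str]], min_docs: int = 2,
                max_doc_ratio: float = 0.9) -> Set[str]:
    """Return the features kept after trimming both ends of the frequency ranking.

    min_docs and max_doc_ratio are illustrative thresholds, not values from the slides.
    """
    doc_list = list(docs)
    doc_freq: Counter = Counter()
    for features in doc_list:
        doc_freq.update(set(features))  # count each feature once per document
    upper = max_doc_ratio * len(doc_list)
    return {f for f, n in doc_freq.items() if min_docs <= n <= upper}


corpus = [
    ["patient", "aspirin", "fever"],
    ["patient", "aspirin", "rash"],
    ["patient", "insulin", "fever"],
    ["patient", "insulin"],
]
kept = zipf_filter(corpus)
```

In this toy corpus, "patient" is dropped because it occurs in every document and "rash" because it occurs in only one.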

Page 13: Presentation at IDEAL 2008

Experimental Results: Number of concepts for each representation

Page 14: Presentation at IDEAL 2008

Experimental Results: Average Similarities

Page 15: Presentation at IDEAL 2008

Experimental Results: Last Experiments (not in IDEAL paper)

Page 16: Presentation at IDEAL 2008

Discussion of the Results: Translation

The worst similarity results are obtained with the most complex representation (the one closest to human reading): A1

B1 is less complex and produces the best results

=> Our model seems to be more suitable as a plain bag-of-concepts representation

Similar to the bag-of-words representation, widely used in text processing tasks
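
A bag-of-concepts comparison can be sketched as follows. The transcript does not state which similarity measure was used, so cosine similarity over concept counts is an assumption chosen for illustration; the concept lists are made up.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: List[str], b: List[str]) -> float:
    """Cosine similarity between two bags of concepts (order is ignored)."""
    bag_a, bag_b = Counter(a), Counter(b)
    dot = sum(bag_a[c] * bag_b[c] for c in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty bag shares nothing with any other bag
    return dot / (norm_a * norm_b)


# Made-up concepts: one list from an English original, one from the
# translated Spanish version of the same news item.
english = ["Aspirin", "Fever", "Patient"]
translated = ["Aspirin", "Fever", "Headache"]
similarity = cosine_similarity(english, translated)
```

Because the bags ignore word order and phrase structure, two documents score high whenever they mention the same concepts, which matches the plain bag-of-concepts view described above.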

Page 17: Presentation at IDEAL 2008

Discussion of the Results: Classification

All results are comparable to classification on the original English texts

In some cases they are even better

Best results are obtained with A2+Zipf: +7.8% in AUC

UNMKD representations never achieve worse classification than English
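
For readers unfamiliar with the metric, AUC (area under the ROC curve) can be computed as the fraction of positive/negative pairs that the classifier ranks correctly, with ties counted as half. The sketch below is a generic pairwise implementation with made-up labels and scores; it is not the evaluation code behind the reported figures.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up example: three of the four positive/negative pairs are ordered correctly.
example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, so a gain of several points in AUC reflects a better ordering of documents by the classifier.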

Page 18: Presentation at IDEAL 2008

Conclusions and Future Work

The “easy way” to construct a Spanish MetaMap is promising

Google Translation seems to be a good tool for adapting English resources to other languages (such as Spanish)

We should try other translation tools

We are working on applying this approach to other text tasks (like Information Retrieval and Filtering)

Page 19: Presentation at IDEAL 2008

Ending...

Thank you very much for your attention

Page 20: Presentation at IDEAL 2008

Any questions?