Presentation at IDEAL 2008

Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies

Francisco Carrero 1,2; José Carlos Cortizo 1,2; José Mª Gómez 3

1 Wipley, Social Gaming Platform, http://www.wipley.com

2 Universidad Europea de Madrid, http://www.esp.uem.es/gsi

3 Optenet, http://www.esp.uem.es/gsi

Francisco Carrero Garcia

Outline

The MIRCAT project

The challenge

English MetaMap, a big effort

Approaching a Spanish MetaMap

Experiments

Discussion of the Results and Future Work

The MIRCAT Project

The Interface

The MIRCAT Project

System’s Architecture

The Challenge

Our Goal

Medical record

English docs

Spanish docs

The Challenge

The Problem

We can extract UMLS concepts from English texts using MetaMap...

...but there is no Spanish version of MetaMap

Is it difficult to construct a tool like MetaMap?

English MetaMap

A Big Effort

∼3 years!!

Approaching Spanish MetaMap

Two Main Approaches Considered

Approaching Spanish MetaMap

Our Approach: Translation and Reuse

Optional

Experimental Design

MedLine Plus medical News

http://www.nlm.nih.gov/medlineplus/newsbydate.html

Excellent online resource

2000 news items, some in English and some in Spanish

600 available in both languages

Text Collections

Experiments

MetaMap extracts concepts, allowing multiple representations

A => Using compound concepts

B => simple concepts

1 => resolves ambiguity by adding all the concepts

2 => ignores ambiguities by choosing the first possibility

4 representations: A1, A2, B1, B2
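The four representations can be sketched as two independent switches. This is only an illustration: the `Candidate` structure, the concept IDs and the `represent` function are hypothetical and do not reproduce MetaMap's actual output format. It assumes each phrase comes with a list of candidate mappings ordered best first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical candidate mapping for one phrase (not MetaMap's real format)."""
    compound: str     # one ID for the whole phrase, e.g. "lung cancer"
    parts: List[str]  # IDs of its simple components, e.g. "lung", "cancer"


def represent(phrases: List[List[Candidate]], compound: bool, keep_all: bool) -> List[str]:
    """Build a concept list for a document.

    compound=True  -> option A (compound concepts); False -> option B (simple concepts)
    keep_all=True  -> option 1 (add all ambiguous candidates); False -> option 2 (first only)
    """
    features: List[str] = []
    for candidates in phrases:
        chosen = candidates if keep_all else candidates[:1]
        for cand in chosen:
            if compound:
                features.append(cand.compound)
            else:
                features.extend(cand.parts)
    return features


# Toy document: one ambiguous phrase and one unambiguous phrase.
phrases = [
    [Candidate("C_LUNG_CANCER", ["C_LUNG", "C_CANCER"]),
     Candidate("C_LUNG_TUMOR", ["C_LUNG", "C_TUMOR"])],
    [Candidate("C_ASPIRIN", ["C_ASPIRIN"])],
]

a1 = represent(phrases, compound=True, keep_all=True)
a2 = represent(phrases, compound=True, keep_all=False)
b1 = represent(phrases, compound=False, keep_all=True)
b2 = represent(phrases, compound=False, keep_all=False)
```

On this toy input, A1 keeps both compound candidates for the ambiguous phrase, while B2 keeps only the simple parts of the first candidate.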

Experimental Design

Experiments

Data representations with a large number of features usually do not perform well in text tasks

Many classifiers degrade in prediction accuracy when faced with many irrelevant features or redundant/correlated ones (“curse of dimensionality”)

We apply Zipf’s Law to filter the attributes
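The slides do not say how the Zipf-based filter is parameterised. A common reading, assumed here, is to drop concepts that appear in too few documents (the long tail) and concepts that appear in almost every document. The function name `zipf_filter` and the thresholds `min_df` and `max_df_ratio` are illustrative choices, not values from the experiments.

```python
from collections import Counter
from typing import List, Set, Tuple


def zipf_filter(docs: List[List[str]], min_df: int = 2,
                max_df_ratio: float = 0.5) -> Tuple[List[List[str]], Set[str]]:
    """Keep concepts whose document frequency is neither too low nor too high.

    min_df and max_df_ratio are assumed thresholds, chosen only for illustration.
    """
    # Document frequency: count each concept once per document.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    n_docs = len(docs)
    keep = {c for c, k in df.items()
            if k >= min_df and k / n_docs <= max_df_ratio}
    filtered = [[c for c in doc if c in keep] for doc in docs]
    return filtered, keep


docs = [
    ["C1", "C2", "C3"],
    ["C1", "C2"],
    ["C1", "C4"],
    ["C1", "C3"],
]
filtered, kept = zipf_filter(docs)
```

Here `C1` is dropped because it occurs in every document and `C4` because it occurs in only one, leaving `C2` and `C3` as features.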

Filtering

Experimental Results

Number of concepts for each representation

Experimental Results

Average Similarities

Experimental Results

Last Experiments (not in IDEAL paper)

Discussion of the Results

The worst similarity results are obtained with the most complex representation, the one closest to how humans read the text: A1

B1 is less complex and produces the best results

=> Our model seems to be more suitable as a plain bag-of-concepts representation

Similar to bag-of-words representation, widely used in text processing tasks
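A bag-of-concepts vector can be built exactly like a bag-of-words vector, with concept IDs in place of tokens. The sketch below is a minimal illustration using made-up concept IDs; the function name `bag_of_concepts` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def bag_of_concepts(docs: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Turn lists of concept IDs into count vectors over a shared vocabulary."""
    vocab = sorted({c for doc in docs for c in doc})
    index = {c: i for i, c in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for concept, count in Counter(doc).items():
            vec[index[concept]] = count
        vectors.append(vec)
    return vocab, vectors


docs = [
    ["C_LUNG", "C_CANCER", "C_LUNG"],
    ["C_ASPIRIN", "C_CANCER"],
]
vocab, vectors = bag_of_concepts(docs)
```

Order and structure within the document are discarded; only how often each concept occurs is kept.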

Translation

Discussion of the Results

All results are comparable to classification on original English texts

In some cases they are even better

Best results using A2+Zipf, +7.8% in AUC

UNMKD representations never achieve worse classification results than English
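For reference, AUC (area under the ROC curve) can be computed as the fraction of positive/negative pairs that a classifier ranks correctly, with ties counted as half. The sketch below uses toy scores and labels, not data from these experiments, and the function name `auc` is an illustrative choice.

```python
from typing import List


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a positive example outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.3]  # toy classifier scores
labels = [1, 0, 1, 0]          # toy ground truth
result = auc(scores, labels)
```

With these toy values, three of the four positive/negative pairs are ranked correctly, so `result` is 0.75.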

Classification

Conclusions and Future Work

The “easy way” to construct a Spanish MetaMap is promising

Google Translation seems to be a good tool for adapting English resources to other languages (such as Spanish)

We should try other translation tools

We are working on applying this approach to other text tasks (like Information Retrieval and Filtering)

Ending...

Thank you very much for your attention

Any questions?
