Page 1: A few contributions of the SIFR project · 2015. 12. 21. · SIFR axes of research (4/8): Semantic distance framework Automatically compute existing (Rada, Wu&Palmer, Resnik ) semantic

A few contributions of the SIFR project

Semantic Indexing of French biomedical Resources

Data seminar – December 10th 2015 – LIRMM – University of Montpellier

Clement Jonquet, Mathieu Roche, Sandra Bringay et al.

Biologists have adopted ontologies

  To provide a canonical representation of scientific knowledge

  To annotate experimental data to enable interpretation, comparison, and discovery across databases

  To facilitate knowledge-based applications for decision support, natural language processing, and data integration

  But ontologies are spread out, in different formats, of different sizes, and with different structures

  UMLS (163 databases):

~ 9 000 000 terms in English / ~ 1 000 000 terms in Spanish

~ 330 000 terms in French

  Comparison of the approaches [IWBBIO'14]

Annotation challenge

  Explosion of biomedical data: diverse, distributed, unstructured… not linked to ontologies

  Hard for biomedical researchers to find the data they need

  Data integration problem

  Translational discoveries are hindered

  Good examples

  GO annotations

  PubMed (biomedical literature) indexed with MeSH headings

[Diagram: ONTOLOGIES and RESOURCES]

Semantic Indexing of French Biomedical Data Resources project

… in collaboration with…

Use biomedical ontology-based annotations in end-user applications

[Diagram: Researchers produce DATA; Ontologies (MeSH 2015, SNOMED, …); Annotation and Enrichment, labeled A and B]

SIFR axes of research (1/8): Design of the SIFR (French) Annotator service

  Deployment of a local instance of BioPortal at LIRMM

  16 French terminologies imported from UMLS, EHTOP & BioPortal

  http://bioportal.lirmm.fr/annotator

  New improvements to the annotation workflow

  Automatic term extraction measures (C-value, LIDF-value, etc.)

  Scoring of annotations & representation in RDF using the AO [SWAT4LS 2014]
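C-value, listed above among the term extraction measures, is a classic termhood score (Frantzi et al.) that rewards frequent multi-word candidates and discounts candidates that mostly occur nested inside longer ones. A minimal Python sketch, assuming candidate terms and their corpus frequencies have already been extracted (e.g. with part-of-speech patterns); the input dictionary and example terms are hypothetical, and this is not the SIFR/BioTex implementation:

```python
import math
from typing import Dict, Tuple


def _is_nested(shorter: Tuple[str, ...], longer: Tuple[str, ...]) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside a strictly longer term."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(freqs: Dict[str, int]) -> Dict[str, float]:
    """C-value for each candidate term, given term -> corpus frequency.

    C-value(a) = log2|a| * f(a)                                  if a is not nested
               = log2|a| * (f(a) - mean of f(b) for b in T_a)    otherwise
    where |a| is the length in words and T_a the candidates containing a.
    Note: log2(1) = 0, so single-word terms always score 0 in this original form.
    """
    words = {term: tuple(term.split()) for term in freqs}
    scores: Dict[str, float] = {}
    for a, wa in words.items():
        containers = [b for b, wb in words.items() if _is_nested(wa, wb)]
        weight = math.log2(len(wa))
        if containers:
            mean_container_freq = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = weight * (freqs[a] - mean_container_freq)
        else:
            scores[a] = weight * freqs[a]
    return scores


# Hypothetical candidate frequencies, for illustration only.
candidates = {
    "basal cell carcinoma": 4,
    "squamous cell carcinoma": 3,
    "cell carcinoma": 6,
}
for term, score in sorted(c_value(candidates).items(), key=lambda kv: -kv[1]):
    print(f"{score:6.2f}  {term}")
```

Here « cell carcinoma » is penalized because, on average, most of its occurrences belong to the two longer candidates that contain it.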

SIFR axes of research (2/8): Dealing with multilingualism within BioPortal

  Status of multilingualism in BioPortal: rather poor

  Set of propositions [MSW 2014]

  Representation of the natural language property of an ontology

  Representation of the distinction between ontologies

  Representation of relations between ontologies

  Representation of multilingual translation mappings

  Reconciliation of multilingual mappings

  Currently being tested/implemented within our local instance

SIFR axes of research (3/8): Automatic extraction of biomedical terminology from text

  Context of the PhD of Juan Antonio Lossio

[TALN 2014][PolTAL 2014][IRJ 2015]

  BioTex software: http://tubo.lirmm.fr/biotex [ISWC 2014]

  Works in French, English, and Spanish

  Motivations for automatic terminology extraction

  Experiment and validate approaches for French data

  Contribute to the ontology enrichment process

  Acquire some NLP expertise for the annotation workflow

SIFR axes of research (4/8): Semantic distance framework

  Automatically compute existing (Rada, Wu&Palmer, Resnik) semantic similarity measures over BioPortal ontologies

  For a given concept, get all semantically close concepts

  Get the semantic distance between 2 concepts

  Collaboration with LGI2P to reuse Semantic Measure Library (SML) within BioPortal

  1st prototype: http://tubo.lirmm.fr/BioMedicalSemantic/web/app_dev.php

  Include SML within the BioPortal backend to bring semantic distance services to the ontologies and the annotated data
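The three measures named above (Rada, Wu & Palmer, Resnik) can be computed from the is-a hierarchy alone. A minimal Python sketch over a hypothetical toy taxonomy with single inheritance (so the shortest path always goes through the least common subsumer); it does not use or mimic SML's API. Rada counts edges on the shortest path, Wu & Palmer compares depths relative to the least common subsumer, and for Resnik we swap in Seco et al.'s intrinsic information content instead of corpus-based IC, so that no annotation counts are needed:

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy is-a taxonomy (child -> parent); None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "disease": None,
    "neoplasm": "disease",
    "infection": "disease",
    "carcinoma": "neoplasm",
    "sarcoma": "neoplasm",
    "basal cell carcinoma": "carcinoma",
    "bacterial infection": "infection",
}


def ancestors(c: str) -> List[str]:
    """Path from c up to the root, c included."""
    path: List[str] = []
    node: Optional[str] = c
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(c: str) -> int:
    return len(ancestors(c))  # the root has depth 1


def lcs(c1: str, c2: str) -> str:
    """Least common subsumer: the deepest shared ancestor (first hit walking up from c1)."""
    shared = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in shared)


def rada_distance(c1: str, c2: str) -> int:
    """Rada et al.: number of is-a edges on the shortest path between c1 and c2."""
    common_depth = depth(lcs(c1, c2))
    return (depth(c1) - common_depth) + (depth(c2) - common_depth)


def wu_palmer(c1: str, c2: str) -> float:
    """Wu & Palmer: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


def intrinsic_ic(c: str) -> float:
    """Seco et al. intrinsic IC: 1 - log(hyponyms(c) + 1) / log(number of concepts)."""
    hyponyms = sum(1 for x in PARENT if x != c and c in ancestors(x))
    return 1 - math.log(hyponyms + 1) / math.log(len(PARENT))


def resnik(c1: str, c2: str) -> float:
    """Resnik: information content of the least common subsumer."""
    return intrinsic_ic(lcs(c1, c2))


def most_similar(c: str, k: int = 3) -> List[str]:
    """The k concepts closest to c according to Wu & Palmer."""
    others = [x for x in PARENT if x != c]
    return sorted(others, key=lambda x: wu_palmer(c, x), reverse=True)[:k]


print(rada_distance("basal cell carcinoma", "sarcoma"))
print(round(wu_palmer("basal cell carcinoma", "sarcoma"), 3))
print(round(resnik("basal cell carcinoma", "sarcoma"), 3))
print(most_similar("carcinoma", 2))
```

`most_similar` corresponds to the "all semantically close concepts for a given concept" use case; on a real ontology with multiple inheritance, the LCS and shortest-path computations need a graph search rather than a single walk up to the root.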

SIFR axes of research (5/8): Informal patient data analysis

  Dealing with public patient data on blogs, forums and tweets

  Detection of emotion [EGC 2014][eTELEMED 2014]

  Patient vocabulary (e.g., the lay French term « crabe », literally crab, vs. « cancer »)

  Project “Parlons de nous” (www.lirmm.fr/patient-mind)

  MSH-M

  A vocabulary currently being constructed

  Hosted and available in our local instance of BioPortal

  Used for annotation and indexing

SIFR axes of research (6/8): Semantic indexing of semantic Web data and social Web data - Viewpoint project

  Graph-based knowledge representation formalism

  Collaboration with P. Lemoisson (CIRAD)

  PhD project of Guillaume Surroca

  First prototype for semantic search over HAL-LIRMM publications [IC2014]

  Toward a model for Serendipity and collective intelligence [KEOD2015]

SIFR axes of research (7/8): Pharmacogenomics use case

  PGx studies how individual gene variations cause variability in drug responses

  Validation of state-of-the-art pharmacogenomics knowledge on the basis of practice-based evidence

  Compare pharmacogenomics literature (in English) and electronic health records (in French)

  EHRs from Paris (HEGP) & St Etienne hospitals

  Improvement of the Annotator in order to handle clinical data: negation, disambiguation, modularity, temporality

  ANR

  Collaborative action led by Adrien Coulet (LORIA)

  Stanford is in the loop (Russ, Mark, Michel, Nigam)
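Negation is the first of the clinical-data issues listed above for the Annotator. The sketch below is a NegEx-style rule (a concept is flagged as negated when a negation cue precedes it within a small token window), written in Python; the French cue list, the window size and the example sentences are hypothetical, and real systems also handle scope-terminating words (e.g. "mais") and post-negation cues, without which a cue can wrongly spill over onto later concepts:

```python
import re
from typing import List

# Hypothetical, deliberately tiny list of French pre-negation cues.
NEGATION_CUES = ["pas de", "absence de", "sans"]


def tokenize(text: str) -> List[str]:
    # \w is Unicode-aware in Python 3, so accented words such as "fièvre" stay whole.
    return re.findall(r"\w+", text.lower())


def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """True if some occurrence of `concept` is preceded by a cue within `window` tokens."""
    tokens = tokenize(sentence)
    target = tokenize(concept)
    n = len(target)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] != target:
            continue
        # Pad with spaces so cues only match on whole-token boundaries.
        preceding = " " + " ".join(tokens[max(0, start - window):start]) + " "
        if any(f" {cue} " in preceding for cue in NEGATION_CUES):
            return True
    return False


print(is_negated("Pas de fièvre ni de toux.", "fièvre"))               # True
print(is_negated("Douleur thoracique sans dyspnée.", "dyspnée"))        # True
print(is_negated("Douleur thoracique sans dyspnée.", "douleur thoracique"))  # False
```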

SIFR axes of research (8/8): AgroPortal project

  In collaboration with the Institute of Computational Biology of Montpellier

  Design of a semantic annotation workflow for plant data - collaboration with IBC project [CO-PDI 2014]

  AgroLD: building an RDF knowledge base to house plant data resources (SouthGreen, Gramene, OryGeneDB…) [RDA 2014]

  In collaboration with CIRAD/IRD, INRA, and Bioversity International

  Experiment with NCBO technologies for the plant community

  Help with the design and evolution of Cropontology.org

  1-year postdoc starting in June

  Interactions with NSF Planteome project (P. Jaiswal, L. Cooper)

Terminology extraction in Biomedicine (step 3)

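[Diagram: term extraction pipeline producing candidate terms term1, term2 … termn; step 1 (Ranking) with Linguistic and Statistic measures, step 2 (Re-ranking) with Graph and Web measures]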

LIDF-value

[Diagram: LIDF-value in the Ranking step (Linguistic, Statistic) of the pipeline, producing term1, term2 … termn, followed by the Graph and Web step]
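As we understand it, LIDF-value multiplies three factors: the probability of the candidate's linguistic (part-of-speech) pattern, its IDF, and its C-value. A minimal Python sketch under that assumption; the pattern probabilities, documents and C-value below are hypothetical inputs (in practice the pattern probabilities would be estimated from a reference terminology), and document frequency is approximated by whole-word substring matching:

```python
import math
from typing import Dict, List


def idf(term: str, documents: List[str]) -> float:
    """log(N / df), where df counts documents containing the term as whole words."""
    df = sum(1 for doc in documents if f" {term} " in f" {doc.lower()} ")
    return math.log(len(documents) / df) if df else 0.0


def lidf_value(term: str, pattern: str, pattern_prob: Dict[str, float],
               documents: List[str], c_values: Dict[str, float]) -> float:
    """P(pattern) * IDF(term) * C-value(term); unseen patterns get probability 0."""
    return pattern_prob.get(pattern, 0.0) * idf(term, documents) * c_values[term]


# Hypothetical inputs, for illustration only.
pattern_prob = {"ADJ NOUN NOUN": 0.2, "NOUN NOUN": 0.3}
documents = [
    "basal cell carcinoma of the skin",
    "squamous cell carcinoma treatment",
    "follow-up of basal cell carcinoma",
    "asthma in children",
]
c_values = {"basal cell carcinoma": 6.34}
print(lidf_value("basal cell carcinoma", "ADJ NOUN NOUN", pattern_prob, documents, c_values))
```

Because the three factors are multiplied, a candidate whose pattern never appears in the reference terminology scores 0 regardless of its frequency.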

TF-IDF and Okapi BM25

[Diagram: keywords (Keyword1, Keyword2, …) in the Ranking step (Linguistic, Statistic) of the pipeline, alongside term1, term2 … termn and the Graph and Web step]
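TF-IDF and Okapi BM25 score a candidate term with respect to one document, which is how keywords can be ranked per document. The Python sketch below uses the textbook TF-IDF, tf * log(N/df), and the common BM25 variant with a smoothed IDF, log((N - df + 0.5)/(df + 0.5) + 1), with default parameters k1 = 1.2 and b = 0.75; the slide does not say which variants or parameters were used, and the toy corpus is hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Doc = List[str]


def count(term: Sequence[str], doc: Doc) -> int:
    """Occurrences of a (possibly multi-word) term as a contiguous token sequence."""
    n = len(term)
    return sum(1 for i in range(len(doc) - n + 1) if tuple(doc[i:i + n]) == tuple(term))


def doc_freq(term: Sequence[str], corpus: List[Doc]) -> int:
    return sum(1 for d in corpus if count(term, d) > 0)


def tf_idf(term: Sequence[str], doc: Doc, corpus: List[Doc]) -> float:
    df = doc_freq(term, corpus)
    return count(term, doc) * math.log(len(corpus) / df) if df else 0.0


def bm25(term: Sequence[str], doc: Doc, corpus: List[Doc],
         k1: float = 1.2, b: float = 0.75) -> float:
    n_docs = len(corpus)
    df = doc_freq(term, corpus)
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = count(term, doc)
    # Term-frequency saturation (k1) and document-length normalization (b).
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))


def top_keywords(doc: Doc, corpus: List[Doc], candidates: List[Tuple[str, ...]],
                 k: int = 3) -> List[Tuple[str, ...]]:
    return sorted(candidates, key=lambda t: bm25(t, doc, corpus), reverse=True)[:k]


corpus = [
    ["asthma", "in", "children"],
    ["severe", "asthma", "attack"],
    ["basal", "cell", "carcinoma"],
]
print(top_keywords(corpus[2], corpus, [("cell", "carcinoma"), ("asthma",)], k=1))
```

Unlike TF-IDF, BM25 saturates repeated occurrences and normalizes by document length, which makes scores more comparable across documents of different sizes.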

Web-based: WAHI

[Diagram: Re-ranking step (Graph, Web) of the pipeline. Example: the candidate « buschke lowenstein tumor » is queried on the WEB as an exact phrase, as buschke lowenstein tumor without quotes, and word by word (buschke, lowenstein, tumor); nb = number of hits]
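The slide shows only the ingredients of the Web-based re-ranking: Web hit counts (nb) for the exact phrase, for the unquoted query, and for each word. The exact WAHI formula is not given here, so the Python sketch below swaps in a different, well-known association measure over hit counts, a generalized Dice coefficient n * nb("w1 … wn") / Σ nb(wi), to illustrate how such counts can re-rank candidates. It ignores the unquoted-query count; the hit counts would come from a search engine, and the numbers below are hypothetical:

```python
from typing import Dict, List


def hit_dice(phrase: str, hits: Dict[str, int]) -> float:
    """Generalized Dice over Web hit counts: n * nb('"w1 ... wn"') / sum_i nb(wi).

    A stand-in association score, NOT the published WAHI formula.
    `hits` maps a query string (exact-phrase queries in double quotes) to its hit count.
    """
    words = phrase.split()
    denominator = sum(hits.get(w, 0) for w in words)
    if denominator == 0:
        return 0.0
    return len(words) * hits.get(f'"{phrase}"', 0) / denominator


def rerank(candidates: List[str], hits: Dict[str, int]) -> List[str]:
    """Re-rank candidate terms by decreasing hit-based association."""
    return sorted(candidates, key=lambda c: hit_dice(c, hits), reverse=True)


# Hypothetical hit counts, for illustration only (not real search-engine figures).
hits = {
    '"buschke lowenstein tumor"': 3_000,
    "buschke": 20_000,
    "lowenstein": 60_000,
    "tumor": 100_000,
    '"tumor of the"': 50_000,
    "of": 1_000_000,
    "the": 2_000_000,
}
print(rerank(["tumor of the", "buschke lowenstein tumor"], hits))
```

A frequent but loosely bound sequence such as "tumor of the" is outranked because its words are far more frequent on their own than together.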

Experiments (quantitative evaluation)

Precision@k, with k = 100, 500, …, 20000
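Precision@k is the share of the k top-ranked candidates that are validated terms. A minimal Python sketch, assuming a reference set of known terms (for example, entries of an existing terminology) serves as the gold standard; the slide does not say which reference was used, and the ranked list below is hypothetical:

```python
from typing import Dict, Iterable, List, Set


def precision_at_k(ranked: List[str], reference: Set[str], k: int) -> float:
    """Fraction of the top-k ranked candidates found in the reference set.

    Dividing by k (not by the list length) counts missing ranks as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for term in ranked[:k] if term in reference) / k


def precision_curve(ranked: List[str], reference: Set[str],
                    ks: Iterable[int]) -> Dict[int, float]:
    return {k: precision_at_k(ranked, reference, k) for k in ks}


ranked = ["basal cell carcinoma", "of the skin", "asthma", "in children"]
reference = {"basal cell carcinoma", "asthma"}
print(precision_curve(ranked, reference, [1, 2, 4]))  # {1: 1.0, 2: 0.5, 4: 0.5}
```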

Experiments (qualitative evaluation)

Ontologos (http://www.ontologos-corp.com/corporate/index.php), VARAPP (http://www.varapp.org/)

500 candidate terms extracted from their documents

Objective: extraction of relevant biomedical terms (i.e., terms that can be added to a biomedical terminology)

Precision: true biomedical terms 74.6 %, false biomedical terms 25.4 %

http://tubo.lirmm.fr:8080/ontologos/

bêta-2 mimétiques (beta-2 mimetics)

bêta-2 agonistes (beta-2 agonists)

dosage des ige spécifiques (specific IgE assay)

suivi des maladies allergiques (follow-up of allergic diseases)

A few conclusions

Future work

  Continue to move different prototypes into production

  Release of the French Annotator

  Find more use cases

  Collaboration with the plant/agro community

  Continue reusing and contributing to NCBO technology

Online resources

  Web page: www.lirmm.fr/sifr

  To be turned into a proper small website

  Task & team: https://www.researchgate.net/projects

  Feature removed by ResearchGate (RG) in February (to be replaced)

  Code repository: https://github.com/sifrproject

  13 developers

  10 repositories

  Publications: http://bit.ly/194ImnR

  Direct link to the HAL-LIRMM platform, with advanced search features