How Bio Ontologies Enable Open Science
TRANSCRIPT
![Page 2: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/2.jpg)
Ontologies
By Pedro Beltrão
![Page 3: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/3.jpg)
Key Points
• Open science requires structured content.
• Structured content acquisition runs into a curation bottleneck.
  – And “controlled manual curation” will not scale.
• For “open science” to really take off:
  – collaborative curation platforms are going to be necessary, and
  – (semi-)automation of curation is going to be necessary.
• Researchers need to identify exactly what is being mentioned/discussed.
• NCBO provides services that support these needs.
![Page 4: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/4.jpg)
Currently, the main use of ontologies is for making sense of high-throughput data.
There are other uses, of course; see Rubin et al., Biomedical Ontologies: A functional perspective, Briefings in Bioinformatics, Dec 2007, Vol 9:1 75-90
![Page 5: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/5.jpg)
Ontologies and content acquisition
• First, start naming ‘things’
• Then name ‘relationships’
• Then comes the ‘logic of combining simple relationships’
• … realization that all this “structure” is hard to create manually and manual curation will not scale … lots of dead projects.
  – Leads to new-found love for text-mining!
![Page 6: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/6.jpg)
Emerging trends in content acquisition
• Increased Structure (in curation and annotations)
• Collaborative curation platforms
  – Knewco
  – SWAN
  – CBioC
  – …
• Integration of Text-mining in curation
  – Finding entities
    • BioLit by Phil Bourne’s group
  – Finding relations … facts.
    • Larry Hunter’s group
    • Biolink papers
    • EBI-MED
![Page 7: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/7.jpg)
Increasing Structure
• Until now, the predominant use of ontologies has been as a vocabulary to describe data … minimal structure in the descriptions.
• Precise capture of biomedical knowledge in structured form is now considered essential
  – Hits the manual curation bottleneck.
  – WA Baumgartner Jr. et al, Manual curation is not sufficient for annotation of genomic databases. Bioinformatics 2007 23(13):i41-i48. Presented at ISMB 2007
![Page 8: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/8.jpg)
Knewco: Concept Web and WikiProfessional
![Page 9: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/9.jpg)
The SWAN discourse ontology
Ciccarese P, Wu E, Clark T (2007) 'An Overview of the SWAN 1.0 Ontology of Scientific Discourse' at the 16th International World Wide Web Conference, Banff, Canada. May 8-12, 2007.
![Page 10: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/10.jpg)
Collaborative KB curation: SWAN Knowledge Workbench
Copyright 2007 Alzheimer Research Forum and Massachusetts General Hospital
![Page 11: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/11.jpg)
![Page 12: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/12.jpg)
![Page 13: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/13.jpg)
The SWAN Team and papers
• Harvard/MGH: Paolo Ciccarese, Marco Ocana, Tim Clark
• Alzforum: Elizabeth Wu, Gwen Wong, June Kinoshita www.alzforum.org
[1] Gao Y, Kinoshita J, Wu E, Miller E, Lee R, Seaborne A, Cayzer S, Clark T (2006) ‘SWAN: A Distributed Knowledge Infrastructure for Alzheimer Disease Research’. Journal of Web Semantics 4(3).
[2] Ciccarese P, Wu E, Clark T (2007) 'An Overview of the SWAN 1.0 Ontology of Scientific Discourse'. 16th International World Wide Web Conference (WWW2007). Banff, Canada. May 8-12, 2007.
[3] Clark T and Kinoshita J (2007) 'Alzforum and SWAN: The Present and Future of Scientific Web Communities'. Briefings in Bioinformatics 8(3).
[4] Ciccarese P, Wu E, Kinoshita J, Wong G, Ocana M, Ruttenberg A, Clark T (submitted for publication 9/4/2007) 'The SWAN Ontology of Scientific Discourse'.
![Page 14: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/14.jpg)
Integration of Text-mining + Curation
• Text mining works better if it uses appropriate ontologies.
• “Model” mismatch between needs of text mining and needs of KB builders.
• Text mining might work much better if:
  – It works in a loop with a curator
  – It leverages the wisdom of the masses
![Page 15: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/15.jpg)
Integration of Text-mining + Curation
![Page 16: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/16.jpg)
![Page 17: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/17.jpg)
![Page 18: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/18.jpg)
Quick recap
Use of ontologies in collaborative curation and content acquisition is not widespread, possibly because of:
1. Lack of a one-stop shop for bio-ontologies
2. Lack of tools to use ontologies for annotation
  • Manual will not scale
  • Automatic: can it be ‘good enough’?
3. Lack of a sustainable mechanism to create ontology-based annotations
![Page 19: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/19.jpg)
NCBO’s efforts
• The key ingredients needed for collaborative curation platforms to succeed:
  – Proper use of bio-ontologies (just enough ontology!)
  – Appropriate use of Natural Language Processing in the curation workflow.
• NCBO has created web services that allow use of ontologies in collaborative platforms: http://bioontology.org/tools.html
![Page 20: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/20.jpg)
NCBO ontology services
| Description | REST URL |
| --- | --- |
| List all ontologies | ./ontologies |
| Find a specific ontology | ./ontologies/{ontology version id} |
| Download ontology file | ./ontologies/download/{ontology version id} |
| Get versions of an ontology | ./ontologies/version/{ontology id} |
| Get concept | ./concepts/{ontology version id}/{concept id} |
| Search for concepts | ./search/concepts/{query}?ontologies={ids} |
| Get latest version of an ontology | ./virtual/{ontology_id} |
| Get concept for latest ontology version | ./virtual/{ontology id}/{concept id} |
| List all ontology categories | ./categories |

Base URL: http://rest.bioontology.org/rest
Documentation: www.bioontology.org/wiki/index.php/NCBO_REST_services
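The REST paths listed on this slide can be turned into full request URLs with a small helper. The following is a minimal sketch only: the base URL is read from the slide and may no longer be served, the endpoint keys and `build_url` are hypothetical names introduced for illustration, the numeric IDs in the usage note are made up, and no request is actually sent.

```python
from urllib.parse import quote

# Base URL as read from the slide; the service may no longer be reachable.
BASE_URL = "http://rest.bioontology.org/rest"

# Path templates copied from the slide. The dictionary keys and the
# underscore-style placeholder names are assumptions made for this sketch.
ENDPOINTS = {
    "list_ontologies": "/ontologies",
    "get_ontology": "/ontologies/{ontology_version_id}",
    "download_ontology": "/ontologies/download/{ontology_version_id}",
    "list_versions": "/ontologies/version/{ontology_id}",
    "get_concept": "/concepts/{ontology_version_id}/{concept_id}",
    "search_concepts": "/search/concepts/{query}?ontologies={ids}",
    "latest_ontology": "/virtual/{ontology_id}",
    "latest_concept": "/virtual/{ontology_id}/{concept_id}",
    "list_categories": "/categories",
}


def build_url(name, **params):
    """Fill a path template with percent-encoded values (commas are kept as-is)."""
    encoded = {key: quote(str(value), safe=",") for key, value in params.items()}
    return BASE_URL + ENDPOINTS[name].format(**encoded)
```

For example, `build_url("search_concepts", query="heart attack", ids="1032,1070")` encodes the space in the query and leaves the comma-separated ID list intact; the IDs here are placeholders, not real ontology identifiers.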
![Page 21: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/21.jpg)
NCBO annotation services
• Open Biomedical Annotator (OBA) web service
  – To automatically process textual metadata to recognize relevant ontology concepts and return the terms as annotations
• Open Biomedical Resource (OBR) index
  – To index the contents of a few biomedical resources with the biomedical concepts to which they relate … and allow programmatic access to the indexed data.
• URL: http://obs.bioontology.org
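The idea behind an index of resources by concept can be illustrated as an inverted index from concept IDs to the resource elements annotated with them. This is a toy sketch under stated assumptions, not the OBR implementation: the resource names, element IDs, and concept IDs below are invented, and `build_index` and `lookup` are hypothetical helpers.

```python
from collections import defaultdict


def build_index(annotated_elements):
    """Map each concept ID to the set of (resource, element ID) pairs annotated with it."""
    index = defaultdict(set)
    for resource, element_id, concept_ids in annotated_elements:
        for concept_id in concept_ids:
            index[concept_id].add((resource, element_id))
    return index


def lookup(index, concept_id):
    """Return the elements annotated with a concept, sorted for stable output."""
    return sorted(index.get(concept_id, ()))


# Invented example data: (resource, element ID, concept IDs found in that element).
ELEMENTS = [
    ("ResourceA", "a1", ["C:melanoma", "C:skin"]),
    ("ResourceB", "b7", ["C:melanoma"]),
    ("ResourceB", "b9", ["C:warfarin"]),
]
INDEX = build_index(ELEMENTS)
```

With this structure, `lookup(INDEX, "C:melanoma")` returns elements from both resources, which is the kind of cross-resource programmatic access the slide describes.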
![Page 22: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/22.jpg)
ANNOTATOR SERVICE
Using Ontologies to Annotate Your Data
![Page 23: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/23.jpg)
Annotator: The Basic Idea
Process textual metadata to automatically tag text with as many ontology terms as possible.
![Page 24: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/24.jpg)
Annotator: Usage
• Give your text as input
• Select your parameters (ontologies to use, semantic type to filter, semantic expansion…)
• Get your results… in text, tab-delimited, XML, or OWL
• Paper in AMIA STB 09
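The usage steps above (text in, parameters selected, annotations out) can be mimicked with a toy dictionary matcher. This is only a sketch of the general idea and does not reproduce the NCBO Annotator's algorithm or API: the term dictionary, the ontology name, the concept IDs, and the functions `annotate` and `to_tab_delimited` are all assumptions introduced for illustration.

```python
import re

# Invented mini-dictionary: lowercase term -> (ontology name, concept ID).
TERMS = {
    "melanoma": ("DemoOntology", "D:0001"),
    "skin": ("DemoOntology", "D:0002"),
    "skin cancer": ("DemoOntology", "D:0003"),
}


def annotate(text, terms=TERMS, ontologies=None):
    """Return sorted (start, end, matched text, ontology, concept ID) tuples.

    Longer terms are tried first; a match overlapping an earlier match is skipped.
    `ontologies` optionally restricts matching to the named ontologies.
    """
    annotations = []
    covered = set()  # character offsets already claimed by a match
    for term in sorted(terms, key=len, reverse=True):
        ontology, concept_id = terms[term]
        if ontologies is not None and ontology not in ontologies:
            continue
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            span = range(match.start(), match.end())
            if covered.isdisjoint(span):
                covered.update(span)
                annotations.append(
                    (match.start(), match.end(), match.group(0), ontology, concept_id)
                )
    return sorted(annotations)


def to_tab_delimited(annotations):
    """Serialize annotations as one tab-separated line per annotation."""
    return "\n".join("\t".join(str(field) for field in row) for row in annotations)
```

Trying longer terms first means "skin cancer" is tagged as one concept rather than as "skin" alone; a real annotator would also handle synonyms, semantic-type filters, and semantic expansion, which this sketch omits.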
![Page 25: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/25.jpg)
DATA SERVICE
Using Ontologies to Access and Analyze Public Data
![Page 26: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/26.jpg)
Open Biomedical Resources index
• The index can be used for:
  • Search (next few slides)
  • Data mining (Paper in AMIA STB 08 on mining relationships between drugs, diseases and genes from Medline)
![Page 28: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/28.jpg)
![Page 29: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/29.jpg)
![Page 30: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/30.jpg)
![Page 31: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/31.jpg)
NCBO services
• Ontology services (OBS)
  – UMLS services
  – BioPortal services
• Data service (OBR)
• Annotation service (OBA)
• Users: UCSF, Laboratree, CollabRx, PharmGKB, JAX, HGMD
• Users: BioPortal UI, PDB/PLoS, I2B2, NextBio, IO informatics
• Users: “Resources” tab, Knewco, IO informatics
![Page 32: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/32.jpg)
Uses of NCBO services
• For programmatic access to latest versions of ontologies
• For concept recognition from text
  – For annotation
  – For accelerating curation
• For data aggregation and summarization
![Page 33: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/33.jpg)
BioLit web resource: automated recognition of ontology terms and database IDs after publication http://biolit.ucsd.edu
![Page 34: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/34.jpg)
Word 2007 add-in: automated recognition of ontology terms and database IDs before publication, with manual curation by the author
![Page 35: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/35.jpg)
End
![Page 36: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/36.jpg)
Annotation: UCSF
• The task is to decide which trial is relevant for a particular patient.
  – Use the annotator service to map concepts in eligibility rules to UMLS CUIs
  – Use the annotations from the OBR index to create tag clouds in CTExplorer.
![Page 37: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/37.jpg)
Annotation: Laboratree
![Page 38: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/38.jpg)
Annotation: CollabRx
caTissue/TIES Specimen Banking
Specimen management is based on ontologies developed by NCI
Ontology-based integration to create a virtual specimen bank
![Page 39: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/39.jpg)
Curation: JAX, UCHSC, PDB/PLoS
• JAX – Use concepts recognized in the abstracts of publications to triage papers for curation.
• UCHSC – Wrap our annotator as a UIMA component and compare performance on full text
• PDB/PLoS – BioLit and Word-plugin
![Page 40: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/40.jpg)
Ontology Access: I2B2
• Needs a “source” for ontologies in their ontology cell
• Using our services, we export BioPortal Ontologies to the I2B2 format.
![Page 41: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/41.jpg)
Ontology Access: IO-informatics
![Page 42: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/42.jpg)
Ontology Access: NextBio
“Our collaboration with NCBO on adopting public biomedical ontologies throughout NextBio enabled us to create a platform dealing with heterogeneous biological data. These ontology-based search capabilities have resulted in a rapid adoption of NextBio by over 100,000 researchers around the world since our public debut in May of 2008.”
![Page 43: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/43.jpg)
Data Summarization: PharmGKB
Scored annotations shown in three groups (group sizes on the slide: 34, 5, and 20 scored annotations):
• 1. CYP2C9, 2. VKORC1, 3. CYP2A6, etc.
• 1. Hemorrhage, 2. Venous Thrombosis, etc.
• 1. warfarin, 2. coumarin, 3. phenprocoumon, etc.
![Page 44: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/44.jpg)
Data Summarization: Knewco
![Page 45: How Bio Ontologies Enable Open Science](https://reader033.vdocuments.us/reader033/viewer/2022060108/55506835b4c90524138b4586/html5/thumbnails/45.jpg)
Data Summarization: HGMD
• Use the disease hierarchy from SNOMED-CT to compute “enrichment” of mutation types in particular types of diseases
• … playing the GO-based microarray analysis game for disease mutations
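One common way to compute this kind of "enrichment" is a hypergeometric test after rolling each disease up to its ancestors in the hierarchy, as is done in GO-based microarray analysis. The sketch below assumes that approach; the slide does not say which statistic HGMD used. The tiny hierarchy is invented and is not SNOMED-CT content, and all function names are hypothetical.

```python
from math import comb

# Invented child -> parent links for illustration; NOT real SNOMED-CT content.
PARENT = {
    "melanoma": "skin disease",
    "psoriasis": "skin disease",
    "skin disease": "disease",
    "asthma": "disease",
}


def ancestors_and_self(term, parent=PARENT):
    """Return the term followed by all of its ancestors up to the root."""
    terms = [term]
    while term in parent:
        term = parent[term]
        terms.append(term)
    return terms


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n items without replacement from N, K of which are hits."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(mutations, mutation_type, disease_class):
    """Test whether `mutation_type` is over-represented under `disease_class`.

    `mutations` is a list of (mutation type, disease) pairs. A mutation counts
    toward the class if its disease is the class itself or any descendant of it.
    Returns (k, n, K, N, p_value).
    """
    N = len(mutations)
    K = sum(1 for mtype, _ in mutations if mtype == mutation_type)
    in_class = [m for m in mutations if disease_class in ancestors_and_self(m[1])]
    n = len(in_class)
    k = sum(1 for mtype, _ in in_class if mtype == mutation_type)
    return k, n, K, N, hypergeom_tail(k, n, K, N)
```

The roll-up step is what the hierarchy contributes: mutations annotated to "melanoma" also count toward "skin disease", so enrichment can be assessed at whatever level of the hierarchy has enough data. A real analysis would additionally correct for testing many disease classes.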