The Gene Ontology and its insertion into UMLS Jane Lomax

The Gene Ontology

Set of three structured vocabularies

Provide functional annotation of gene products

Dynamic

Cross-references to external databases

The vocabularies

Molecular function: elemental activity or task
• nuclease, DNA binding, microtubule motor

Biological process: broad objective or goal
• mitosis, signal transduction, metabolism

Cellular component: location or complex
• nucleus, ribosome

GO structure

Directed acyclic graph (DAG)
• allows multiple parentage
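
As a rough illustration of what a DAG with multiple parentage looks like in practice, the sketch below stores a toy graph as a child-to-parents mapping, checks that it has no cycles, and lists the terms with more than one parent. The term names and their arrangement are illustrative assumptions, not a copy of the real GO hierarchy, and `is_acyclic` and `multi_parent_terms` are hypothetical helper names.

```python
# Toy graph: each term maps to its list of parents.
# Illustrative only; not the actual GO hierarchy.
PARENTS = {
    "cellular component": [],
    "nucleus": ["cellular component"],
    "chromosome": ["cellular component"],
    "nuclear chromosome": ["nucleus", "chromosome"],  # two parents
}


def is_acyclic(parents):
    """Return True if following parent links never loops back on itself."""
    unvisited, in_progress, done = 0, 1, 2
    state = {term: unvisited for term in parents}

    def visit(term):
        if state[term] == in_progress:  # reached a term already on this path
            return False
        if state[term] == done:
            return True
        state[term] = in_progress
        for parent in parents[term]:
            if not visit(parent):
                return False
        state[term] = done
        return True

    return all(visit(term) for term in parents)


def multi_parent_terms(parents):
    """Terms with more than one parent, which a strict tree would not allow."""
    return sorted(term for term, ps in parents.items() if len(ps) > 1)


print(is_acyclic(PARENTS))
print(multi_parent_terms(PARENTS))
```

A tree would force "nuclear chromosome" under a single parent; the DAG lets it sit under both.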

True-path rule

Every path from a node back to the root must be biologically accurate
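
One practical consequence of the rule can be sketched in code: if every path to the root must hold, then annotating a gene product to a term implies every ancestor of that term as well. The graph below is a toy example with illustrative term names, and `ancestors` and `implied_terms` are hypothetical helpers, not part of any GO tool.

```python
# Toy child -> parents mapping; illustrative, not the real GO hierarchy.
PARENTS = {
    "cellular component": [],
    "nucleus": ["cellular component"],
    "chromosome": ["cellular component"],
    "nuclear chromosome": ["nucleus", "chromosome"],
}


def ancestors(term, parents):
    """Collect every term reachable from `term` along any path to the root."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def implied_terms(term, parents):
    """The term itself plus all ancestors: what an annotation to `term` asserts."""
    return {term} | ancestors(term, parents)


print(sorted(implied_terms("nuclear chromosome", PARENTS)))
```

If any term in the printed set would be biologically wrong for some gene product annotated to the child, the placement of the child breaks the rule.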

Relationship types

is_a
• subclass: a is a type of b

part_of
• physical part of (component)
• sub-process of (process)
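
The two relationship types can be pictured as labels on the edges of the graph. The sketch below is one possible encoding, assuming a simple list of (child, relationship, parent) records; the edges shown are illustrative and `parents_of` is a hypothetical helper.

```python
from typing import List, NamedTuple, Optional


class Relationship(NamedTuple):
    child: str
    rel_type: str  # "is_a" or "part_of"
    parent: str


# Illustrative edges only; not taken from the real GO files.
EDGES = [
    Relationship("nuclear chromosome", "is_a", "chromosome"),
    Relationship("nuclear chromosome", "part_of", "nucleus"),
    Relationship("nucleus", "is_a", "cellular component"),
]


def parents_of(term: str, edges: List[Relationship],
               rel_type: Optional[str] = None) -> List[str]:
    """Parents of `term`, optionally restricted to one relationship type."""
    return [
        e.parent
        for e in edges
        if e.child == term and (rel_type is None or e.rel_type == rel_type)
    ]


print(parents_of("nuclear chromosome", EDGES))
print(parents_of("nuclear chromosome", EDGES, "part_of"))
```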

What makes up a GO term?

• term name
• go_id
• definition and definition dbxref
• GO synonym
• general dbxref
• comment
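
The fields listed above map naturally onto a small record type. The sketch below is one way to hold them, not the consortium's own data model; the field names are assumptions, the example values are placeholders, and the check assumes identifiers of the form `GO:` followed by seven digits.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class GOTerm:
    name: str                      # term name
    go_id: str                     # go_id
    definition: str = ""           # definition
    definition_dbxrefs: List[str] = field(default_factory=list)  # definition dbxref
    synonyms: List[str] = field(default_factory=list)            # GO synonym
    dbxrefs: List[str] = field(default_factory=list)             # general dbxref
    comment: str = ""              # comment


# Assumed identifier shape: "GO:" plus seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def has_valid_id(term: GOTerm) -> bool:
    """Check only the shape of the identifier, not whether the term exists."""
    return GO_ID_PATTERN.fullmatch(term.go_id) is not None


# Placeholder values, not a real GO entry.
example = GOTerm(
    name="example term",
    go_id="GO:0000000",
    definition="Placeholder definition text.",
    definition_dbxrefs=["EXAMPLE:ref1"],
    synonyms=["example synonym"],
)
print(has_valid_id(example))
```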

GO cross-links

Cross-references within GO
• EC
• RESID
• MetaCyc

Mappings
• SWISS-PROT keywords

Links in other databases
• InterPro
• UMLS/MeSH – in progress

Why insert GO into UMLS?

A rich, widely used source for expanding UMLS
• Can be used to improve areas of MeSH

Potential for ‘non-fuzzy’ text mining using GO terms
• MeSH terms manually assigned to papers

Unified Medical Language System (UMLS)

Research project maintained by the National Library of Medicine (NLM)

Aims to
• allow computers to ‘understand’ biomedical meaning
• improve retrieval and integration of computer-readable info

Has three ‘Knowledge sources’:
• UMLS Metathesaurus
• SPECIALIST lexicon
• semantic network

Knowledge sources

UMLS Metathesaurus
• links multiple source vocabularies into unified concepts, includes MeSH (Medical Subject Headings)
• GO to become source vocabulary

SPECIALIST lexicon
• provides biomedical/English lexical info

semantic network
• for categorizing concepts

Inserting GO into UMLS

inversion
• converting GO to correct format for UMLS

insertion
• inserting GO using matching algorithms

editing
• all concepts containing GO term reviewed by hand
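
The slides do not say which matching algorithms the insertion step uses. As a stand-in, the sketch below shows the simplest kind of match: normalize each string (lower-case, punctuation collapsed to spaces) and look for an identical normalized string in another vocabulary. The vocabulary contents and the concept identifiers are made up for illustration, and `normalize` and `match_terms` are hypothetical names.

```python
import re


def normalize(term):
    """Lower-case and collapse anything that is not a letter or digit to one space."""
    return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


def match_terms(go_terms, other_vocab):
    """Map each GO term to a concept id from `other_vocab`, or None if unmatched.

    `other_vocab` maps a term string to a concept id.
    """
    index = {normalize(name): concept_id for name, concept_id in other_vocab.items()}
    return {term: index.get(normalize(term)) for term in go_terms}


# Made-up data: term names come from the slides, identifiers are placeholders.
GO_TERMS = ["DNA binding", "mitosis", "microtubule motor"]
OTHER_VOCAB = {"Mitosis": "X001", "DNA-binding": "X002"}

matches = match_terms(GO_TERMS, OTHER_VOCAB)
for term, concept_id in matches.items():
    print(term, "->", concept_id)
```

Terms left unmatched would become concepts of their own, and matches proposed this way are the ones the editing step reviews by hand.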

Statistics

Approximately 23% of GO terms ‘match’ something in another source vocabulary

• 23.03%: GO terms in concepts with other sources
• 76.97%: GO terms in concepts where they are the only source
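
A figure of this kind can be derived by counting GO atoms concept by concept. The sketch below assumes a concept is a bag of (source, term) atoms and reports the share of GO atoms that sit in a concept alongside at least one atom from another source. The data is invented for the example and does not reproduce the numbers above; `Concept` and `go_overlap_share` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Concept:
    concept_id: str  # placeholder identifier
    atoms: List[Tuple[str, str]] = field(default_factory=list)  # (source, term)


def go_overlap_share(concepts):
    """Percentage of GO atoms found in concepts that also hold a non-GO atom."""
    shared = 0
    total = 0
    for concept in concepts:
        go_atoms = [atom for atom in concept.atoms if atom[0] == "GO"]
        has_other_source = any(atom[0] != "GO" for atom in concept.atoms)
        total += len(go_atoms)
        if has_other_source:
            shared += len(go_atoms)
    return 100.0 * shared / total if total else 0.0


# Invented example: four GO atoms, one of which shares a concept with MeSH.
CONCEPTS = [
    Concept("X001", [("GO", "mitosis"), ("MeSH", "Mitosis")]),
    Concept("X002", [("GO", "microtubule motor")]),
    Concept("X003", [("GO", "signal transduction")]),
    Concept("X004", [("GO", "DNA binding")]),
]

share = go_overlap_share(CONCEPTS)
print(f"{share:.2f}% shared, {100 - share:.2f}% GO-only")
```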

Statistics

% of GO terms in concepts with other sources, by GO vocabulary

• biological process: 4.6%
• molecular function: 27.8%
• cellular component: 45.2%

Statistics

% of GO terms in concepts with other sources, by source

• CSP2002 (Computer Retrieval of Information on Scientific Projects Thesaurus): 7.34%
• MSH2003_2002_08_14 (Medical Subject Headings): 19.74%
• SNMI98 (Systematized Nomenclature of Human and Veterinary Medicine): 11.05%

[Figure labels: GO, CRISP, MeSH, SNOMED]

[Figure: example concept, labelled with concept name, concept id, GO atoms, MeSH atoms, EC number, contexts, relationships to other concepts, definition]

Challenges with insertion

GO synonyms
• As GO has evolved, not all GO synonyms are now truly synonymous

GO enzymes
• GO separates enzyme function from enzyme ‘complexes’; most vocabularies don’t

Semantic types
• What semantic types now apply to concepts with GO atoms?

Future of insertion

Hoped that GO can be released with UMLS early next year
• dependent on ironing out problems

Maintenance of insertion
• GO changing continually: large differences between UMLS releases

www.geneontology.org

• FlyBase & Berkeley Drosophila Genome Project
• Saccharomyces Genome Database
• PomBase (Sanger Institute)
• Rat Genome Database
• Genome Knowledge Base (CSHL)
• The Institute for Genomic Research
• Compugen, Inc
• The Arabidopsis Information Resource
• WormBase
• DictyBase
• Mouse Genome Informatics
• Swiss-Prot/TrEMBL/InterPro
• Pathogen Sequencing Unit (Sanger Institute)

• National Library of Medicine

• Alexa McCray
• Stuart Nelson
• Bill Hole

• Oak Ridge Institute for Science and Education
• National Library of Medicine
• U.S. Department of Energy

The Gene Ontology Consortium is supported by an R01 grant from the National Human Genome Research Institute (NHGRI) [grant HG02273]. SGD is supported by a P41, National Resources, grant from the NHGRI [grant HG01315]; MGD by a P41 from the NHGRI [grant HG00330]; GXD by the National Institute of Child Health and Human Development [grant HD33745]; FlyBase by a P41 from the NHGRI [grant HG00739] and by the Medical Research Council, London. TAIR is supported by the National Science Foundation [grant DBI-9978564]. WormBase is supported by a P41, National Resources, grant from the NHGRI [grant HG02223]; RGD is supported by an R01 grant from the NHLBI [grant HL64541]; DictyBase is supported by an R01 grant from the NIGMS [grant GM064426].