Gene ontology
John Pinney
TRANSCRIPT
Gene annotation
Goal: transfer knowledge about
the function of gene products from model organisms to other genomes
Gene Ontology (GO): a collection of terms
and their definitions
and the logical relationships between them
describing gene products
nucleus
“A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell's chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent.”
GO:0005634
[Figure: "is a" relationships among the terms pronucleus, nucleus, intracellular membrane-bounded organelle, intracellular organelle and membrane-bounded organelle]
A term may have more than one parent term
and more than one child term.
=> The gene ontology is not a tree
The gene ontology has a structure known as a Directed Acyclic Graph (DAG).
Directed: relationships are not symmetrical
Acyclic: there are no directed loops
Graph: mathematical term for a network
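A minimal sketch of the DAG idea in Python. The "is a" edges are read from the nucleus example in the slides and are illustrative only, not a copy of the current ontology; the names `IS_A`, `ancestors` and `is_acyclic` are made up for this sketch.

```python
# "is a" edges stored as child -> list of parents (illustrative, from the slide).
IS_A = {
    "pronucleus": ["nucleus"],
    "nucleus": ["intracellular membrane-bounded organelle"],
    # One term with two parents: this is why GO is not a tree.
    "intracellular membrane-bounded organelle": [
        "intracellular organelle",
        "membrane-bounded organelle",
    ],
    "intracellular organelle": [],
    "membrane-bounded organelle": [],
}


def ancestors(term, parents=IS_A):
    """Return every term reachable by following 'is a' edges upwards."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_acyclic(parents=IS_A):
    """A directed graph has no loops if no term is its own ancestor."""
    return all(term not in ancestors(term, parents) for term in parents)


print(sorted(ancestors("pronucleus")))
print(is_acyclic())
```

Because edges are directed, `ancestors("nucleus")` includes the organelle terms but not `pronucleus`.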
GO is actually made up of 3 different ontologies:
cellular component
molecular function
biological process
cellular component
“The part of a cell or its extracellular environment in which a gene product is located. A gene product may be located in one or more parts of a cell.”
cellular component examples:
cohesin core heterodimer
extracellular region
laminin-1 complex
replication fork
transcription factor complex
molecular function
“Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.”
molecular function examples:
transcription factor binding
enzyme activator activity
3'-nucleotidase activity
metallopeptidase activity
hexokinase activity
biological process
“Those processes specifically pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms. A process is a collection of molecular events with a defined beginning and end.”
biological process examples:
para-aminobenzoic acid biosynthetic process
protein localization
establishment of blood-nerve barrier
circadian rhythm
posterior midgut development
geneontology.org: download mappings from other databases
enzyme functions (EC, KEGG, MetaCyc)
protein domains (Pfam, SMART, PRINTS, …)
other controlled vocabularies of functions (E. coli functions, MIPS FunCat)
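A rough sketch of reading one line of such a mapping file. It assumes the plain-text layout `external ID > GO:term name ; GO:ID`, with comment lines starting with `!`; check the downloaded file before relying on this. The sample line is written for illustration and the function name `parse_mapping_line` is hypothetical.

```python
def parse_mapping_line(line):
    """Split one mapping line into (external ID, GO ID, GO term name).

    Assumed layout: 'EXTERNAL_ID > GO:term name ; GO:0000000'.
    Returns None for blank lines and '!' comment lines.
    """
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    external, go_part = line.split(" > ", 1)
    go_name, go_id = go_part.rsplit(" ; ", 1)
    go_name = go_name.strip()
    if go_name.startswith("GO:"):
        go_name = go_name[3:]  # drop the 'GO:' prefix in front of the name
    return external.strip(), go_id.strip(), go_name


# Illustrative line only (not copied from a downloaded file).
sample = "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022"
print(parse_mapping_line(sample))
```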
geneontology.org: download annotations for various genomes
NCBI_NP  NP_354299.2  lolD  GO:0043190  ISS  "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)"  taxon:176299  20070612  PAMGO_GAT
Fields: database, gene product ID, gene symbol, GO term ID, evidence code
evidence codes
Allow curators to indicate the type of evidence for each gene-term annotation.
experimental
e.g. IMP Inferred from mutant phenotype
     IDA Inferred from direct assay
computational
e.g. ISS Inferred from sequence or structural similarity
     IGC Inferred from genomic context
author statement
e.g. TAS Traceable author statement
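One way evidence codes might be used in practice is to keep only annotations with a given kind of support. The sketch below covers just the five codes named on this slide; `EVIDENCE_CATEGORY` and `filter_by_category` are hypothetical names, and the second annotation (`geneX`) is invented for illustration.

```python
# Evidence code -> category, limited to the codes shown on the slide.
EVIDENCE_CATEGORY = {
    "IMP": "experimental",
    "IDA": "experimental",
    "ISS": "computational",
    "IGC": "computational",
    "TAS": "author statement",
}


def filter_by_category(annotations, category):
    """Keep (gene symbol, GO ID, evidence code) tuples in one category."""
    return [
        ann for ann in annotations
        if EVIDENCE_CATEGORY.get(ann[2]) == category
    ]


annotations = [
    ("lolD", "GO:0043190", "ISS"),   # from the example record in the slides
    ("geneX", "GO:0005634", "IDA"),  # hypothetical annotation
]

print(filter_by_category(annotations, "experimental"))
```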
geneontology.org: download annotations for various genomes
NCBI_NP  NP_354299.2  lolD  GO:0043190  ISS  "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)"  taxon:176299  20070612  PAMGO_GAT
Fields: database, gene product ID, gene symbol, GO term ID, evidence code, description, organism (taxon) ID, date, annotation project ID
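A small sketch of splitting the example record into the nine fields labelled on the slide. It assumes the fields are tab-separated and that only these nine are present; real annotation files carry more columns, so treat the field list as a simplification. The names `Annotation` and `parse_record` are made up here.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """The nine fields labelled on the slide (a subset of a real file)."""
    database: str
    gene_product_id: str
    gene_symbol: str
    go_term_id: str
    evidence_code: str
    description: str
    taxon_id: str
    date: str
    project_id: str


def parse_record(line, sep="\t"):
    """Split one record on `sep` (assumed to be a tab) into an Annotation."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(Annotation._fields):
        raise ValueError(f"expected {len(Annotation._fields)} fields, got {len(fields)}")
    record = Annotation(*fields)
    # The description is quoted on the slide; drop the surrounding quotes.
    return record._replace(description=record.description.strip('"'))


# The example record from the slide, joined with tabs for this sketch.
line = "\t".join([
    "NCBI_NP", "NP_354299.2", "lolD", "GO:0043190", "ISS",
    '"ABC transporter, nucleotide binding/ATPase protein (lipoprotein)"',
    "taxon:176299", "20070612", "PAMGO_GAT",
])

record = parse_record(line)
print(record.gene_symbol, record.go_term_id, record.evidence_code)
```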
geneontology.org: repository of analysis tools that use GO
search, edit and browse ontologies / annotations
software libraries
statistical analysis
text mining
protein interactions
enrichment analysis
gene set, obtained from:
significant expression change in a microarray experiment
cluster from a protein interaction network
some other experiment / analysis
compared against: whole genome (annotated)
Which GO terms occur significantly more often than expected in this gene set?
BiNGO
GOstat
ArrayTrack
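The enrichment question can be framed as a one-sided hypergeometric test: if the gene set were drawn at random from the annotated genome, how likely is it to contain at least this many genes carrying a given term? This is one common choice, sketched with the standard library only; the tools listed above may use different statistics and multiple-testing corrections. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(genome_size, genome_with_term, set_size, set_with_term):
    """P(at least `set_with_term` annotated genes in a random gene set).

    genome_size      -- annotated genes in the whole genome
    genome_with_term -- genome genes annotated with the GO term
    set_size         -- genes in the gene set
    set_with_term    -- gene-set genes annotated with the GO term
    """
    total = comb(genome_size, set_size)
    upper = min(genome_with_term, set_size)
    tail = sum(
        comb(genome_with_term, k)
        * comb(genome_size - genome_with_term, set_size - k)
        for k in range(set_with_term, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 40 of 5000 genome genes carry the term,
# and 6 of the 100 genes in the gene set carry it.
print(enrichment_p_value(5000, 40, 100, 6))
```

When many terms are tested at once, the resulting p-values need a multiple-testing correction before they are interpreted.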
Advantages of GO
single set of terms to describe the function of gene products from all organisms.
DAG structure provides a logical framework to represent knowledge at whatever level of detail is available.
continually revised to reflect the state of current knowledge.
can quantify strength of relationships between terms (semantic similarity).
many statistical analysis tools available.
Limitations of GO
GO is limited in scope: it does not cover
processes that are not normal functions of gene products (e.g. oncogenesis).
sequence attributes (e.g. introns/exons)
protein structures or interactions
evolution
gene expression
Summary (1)
The gene ontology (GO) is a structured, controlled vocabulary to describe the function of gene products.
Terms in GO have logical relationships (“is a”, “part of”) with one another. Together these form a structure called a Directed Acyclic Graph (DAG).
GO is formed of 3 separate ontologies describing different aspects of gene function: cellular component, molecular function and biological process.
Summary (2)
geneontology.org is the central resource for downloading ontology, annotation and mapping files.
Evidence codes are used in annotations to show the experimental, computational or literature support for each function.