EcoliWiki and GONUTS: Wiki-based Systems for Community Annotation. Jim Hu, Dept. of Biochemistry and Biophysics, Texas A&M University


Post on 15-Jan-2016

Page 1: EcoliWiki and GONUTS

EcoliWiki and GONUTS

Wiki-based Systems for Community Annotation

Jim Hu

Dept. of Biochemistry and Biophysics

Texas A&M University

Page 2: EcoliWiki and GONUTS

Overview

• EcoliWiki and the central problem in genome annotation
• Gene Ontology and the Gene Ontology Normal Usage Tracking System (GONUTS)

• Live demos/Discussion

Page 3: EcoliWiki and GONUTS

Annotation

• Goals for annotation:
  – Coverage
  – Accuracy
  – Usefulness
    • for scientists (human-readable)
    • for machine inference generation (computer-understandable)

• Annotation is a moving target!

Page 4: EcoliWiki and GONUTS

The need for Annotation is growing

Page 5: EcoliWiki and GONUTS

People are limiting for annotation

• Major genome databases employ large numbers of people
• This model is problematic
  – Curators are expensive
    • NIH and NSF cannot afford to staff every organism at this level
  – Broad expertise across all areas is hard
    • Curators have to read papers in areas they were not trained in
    • Curators may not recognize the significance of papers in areas they were not trained in
• Can we make it:
  – cheaper?
  – faster?
  – better?

Role              MGI    WormBase   Gramene
Curation          31     19         6
Software          10     8          4
SysAdmin          4      0.25       0.25
User Support      3      0          1
Software QA       3      0          0
Administration*   3      1.5        1
Total             54     28.75      12.25

Page 6: EcoliWiki and GONUTS

The Wikipedia approach

• Get your user community to work for free!
• aka "Community annotation" or "Community curation"

Page 7: EcoliWiki and GONUTS

EcoliWiki

http://ecoliwiki.org or .net or .com

(most of our hits come from Google)

Page 8: EcoliWiki and GONUTS

“What is true of Escherichia coli is true of the elephant” - Jacques Monod

“Thanks to annotation creep, what’s false for E. coli is false for the elephant too”

- Jim Hu


http://www.pasteur.fr/infosci/archives/mon/im_ele.html

Page 9: EcoliWiki and GONUTS

EcoliWiki philosophy

• Any registered user can edit
• Any registered user can register new users
• Any registered user can create new pages
• It's easier to revise than to create new content
  – Seed content from other places, mostly EcoCyc


Page 10: EcoliWiki and GONUTS

But won't that invite chaos?

GenBank's managers are dead set against letting users into GenBank's files, however. They say there already are procedures to deal with errors in the database, and researchers themselves have created secondary databases that improve on what GenBank has to offer. "That we would wholesale start changing people's records goes against our idea of an archive," says David Lipman, director of the National Center for Biotechnology Information (NCBI), GenBank's home in Bethesda, Maryland. "It would be chaos."


Page 11: EcoliWiki and GONUTS

Correct compared to what?

NCBI RefSeq:

Wikipedia:


Page 14: EcoliWiki and GONUTS

Correct compared to what?

Page 15: EcoliWiki and GONUTS

This is how biology achieves fidelity

A collage of books I haven’t read

Page 16: EcoliWiki and GONUTS

Biology Wikis are proliferating

Page 17: EcoliWiki and GONUTS

Participation is the major challenge

• Anyone can edit ≠ Anyone will edit
• Wikipedia: a tiny fraction of the users edit anything
  – A tiny fraction of those do major editing
  – Really big denominator

• Outreach to increase our user base

Page 18: EcoliWiki and GONUTS

Participation is the major challenge

• Tools to make it easier to edit

Page 19: EcoliWiki and GONUTS

Participation is the major challenge

• Biggest difference from other systems:
  – Partial annotations are wanted
  – It doesn't matter if you don't know the wiki markup
  – It doesn't matter if what you're adding isn't fully worked out
    • Someone else can fix it
    • And you can fix what others write

Page 20: EcoliWiki and GONUTS

Making it machine-friendly: ontologies

• Ontology:
  – In philosophy: a metaphysical system for studying being
  – In biology/bioinformatics: a structured representation of biological knowledge
• NCBO = National Center for Biomedical Ontology
• OBO = Open Biological Ontologies
• Examples
  – MeSH
  – Sequence ontology = SO
  – Phenotype and trait ontology = PATO
  – Gene Ontology = GO
  – see the EBI ontology browser: http://www.ebi.ac.uk/ontology-lookup/

Page 21: EcoliWiki and GONUTS

What is an ontology?

• Controlled vocabulary with
  – Term identifiers
    • GO:0000075
  – Name
    • cell cycle checkpoint
  – Definitions
    • "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN:0815316194]
  – Relationships
    • is_a GO:0000074 ! regulation of progression through cell cycle
• Terms arranged in a Directed Acyclic Graph (DAG)
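The term structure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how GO or GONUTS stores terms: the `Term` class, the `ancestors` helper, and the `EX:` identifiers are hypothetical and exist only to show a term with more than one parent, which is what makes the structure a DAG rather than a tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Term:
    term_id: str                      # e.g. "GO:0000075"
    name: str                         # human-readable name
    definition: str = ""              # free-text definition
    is_a: List[str] = field(default_factory=list)  # parent term identifiers


def ancestors(term_id: str, terms: Dict[str, Term]) -> Set[str]:
    """Collect every term reachable by following is_a links upward."""
    seen: Set[str] = set()
    stack = list(terms[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue  # already visited via another path through the DAG
        seen.add(parent)
        if parent in terms:
            stack.extend(terms[parent].is_a)
    return seen


terms = {
    # Taken from the slide.
    "GO:0000075": Term(
        "GO:0000075",
        "cell cycle checkpoint",
        "A point in the eukaryotic cell cycle where progress through the "
        "cycle can be halted until conditions are suitable for the cell "
        "to proceed to the next stage.",
        is_a=["GO:0000074", "EX:0000001"],  # EX:0000001 is a made-up second parent
    ),
    "GO:0000074": Term(
        "GO:0000074",
        "regulation of progression through cell cycle",
        is_a=["EX:0000002"],
    ),
    # Hypothetical placeholder terms, not real GO identifiers.
    "EX:0000001": Term("EX:0000001", "hypothetical second parent", is_a=["EX:0000002"]),
    "EX:0000002": Term("EX:0000002", "hypothetical root"),
}

print(sorted(ancestors("GO:0000075", terms)))
```

Because a term can be reached by more than one path, the traversal tracks visited identifiers so that a shared ancestor is counted once.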

Page 22: EcoliWiki and GONUTS

Pros and Cons of Ontologies

• Pros– facilitate comparison across systems– facilitate computer based reasoning systems

• Good for data mining!

• Cons– Large and unwieldy– Difficult to understand– Difficult to use– May never capture knowledge accurately– Ontology development lags behind the field it tries to capture

• Example of a theme of genomics: imperfect tools can still be very powerful!

Page 23: EcoliWiki and GONUTS

GO = Gene Ontology

• 3 ontologies for gene products
  – Biological Process
  – Molecular Function
  – Cellular Component
• Used to make annotations
  – aka Gene associations
  – Term + qualifiers + evidence code + reference etc.

[Figure: example ontology graph with part_of and is_a relationships. From GO Consortium (GOC) presentations]
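The "term + qualifiers + evidence code + reference" recipe for a gene association can be written down as a small record type. The sketch below is an assumption about how one might model it, not a GO Consortium data model: the class name, field names, gene name, and the `PMID:0000000` reference are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GeneAssociation:
    gene_product: str                 # what is being annotated
    go_id: str                        # the ontology term
    evidence_code: str                # e.g. "IDA", "IMP", "IEA"
    reference: str                    # where the assertion comes from
    qualifiers: Tuple[str, ...] = ()  # optional modifiers such as "NOT"

    def summary(self) -> str:
        """Render the association as one readable line."""
        prefix = " ".join(self.qualifiers)
        prefix = f"{prefix} " if prefix else ""
        return (
            f"{self.gene_product}: {prefix}{self.go_id} "
            f"[{self.evidence_code}] ({self.reference})"
        )


# Entirely hypothetical annotation; only the GO identifier comes from the slides.
example = GeneAssociation(
    gene_product="exampleGene",
    go_id="GO:0000075",
    evidence_code="IMP",
    reference="PMID:0000000",
    qualifiers=("NOT",),
)
print(example.summary())
```

Keeping the evidence code and reference as required fields reflects the point of the slide: an association is a sourced assertion, not just a gene paired with a term.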

Page 24: EcoliWiki and GONUTS

Cellular Component

• where a gene product acts

Figure from GO Consortium (GOC) presentations

Page 25: EcoliWiki and GONUTS

Cellular Component

Figure from GO Consortium (GOC) presentations

Page 26: EcoliWiki and GONUTS

Molecular Function

• activities or “jobs” of a gene product

glucose-6-phosphate isomerase activity

Figure from GO Consortium (GOC) presentations

Page 27: EcoliWiki and GONUTS

Molecular Function

insulin binding

insulin receptor activity

Figure from GO Consortium (GOC) presentations

Page 28: EcoliWiki and GONUTS

Molecular Function

• A gene product may have several functions
• Sets of functions make up a biological process.

Figure from GO Consortium (GOC) presentations

Page 29: EcoliWiki and GONUTS

Biological Process

a commonly recognized series of events

cell division

Figure from GO Consortium (GOC) presentations

Page 30: EcoliWiki and GONUTS

Biological Process

transcription

Figure from GO Consortium (GOC) presentations

Page 31: EcoliWiki and GONUTS

GO annotation

• Find papers
• Read them
  – Find what genes are mentioned
  – What assertions are made about the product?
  – What GO terms are applicable?
    • GO term browsers
      – AmiGO http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
      – GONUTS http://gowiki.tamu.edu
    • New term needed?
  – What evidence code should be used to record the assertion?
• Record gene associations in the MOD database
• Send gene associations to GO consortium
• Downloadable files that users doing electronic analysis can parse
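As a rough idea of what parsing the downloadable files involves, here is a minimal sketch. The slides do not name the file format; the example assumes the tab-delimited GO Annotation File (GAF) layout, where lines starting with `!` are comments. The `read_gaf` helper and the two sample rows are made up for illustration and do not describe real genes.

```python
import io
from collections import Counter

# Column order assumed from the GAF layout; older files carry only the first 15.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def read_gaf(handle):
    """Yield one dict per annotation line, skipping comments and blank lines."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield dict(zip(GAF_COLUMNS, fields))


# Hypothetical sample data, not taken from any real annotation file.
rows = [
    ["ExampleDB", "EX0001", "geneA", "", "GO:0000075", "PMID:0000000",
     "IMP", "", "P", "", "", "gene", "taxon:0", "20090101", "ExampleDB"],
    ["ExampleDB", "EX0002", "geneB", "", "GO:0000075", "PMID:0000000",
     "IEA", "", "P", "", "", "gene", "taxon:0", "20090101", "ExampleDB"],
]
sample = "!gaf-version: 2.0\n" + "".join("\t".join(r) + "\n" for r in rows)

records = list(read_gaf(io.StringIO(sample)))
by_evidence = Counter(r["evidence_code"] for r in records)
print(by_evidence)
```

Tallying by evidence code is a common first step, since it separates curated assertions from automatically assigned ones before any further analysis.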

Page 32: EcoliWiki and GONUTS

Human vs Electronic GO annotations

• What is the basis for making a gene association?

• Human
  – Experimental Evidence Codes
    • EXP: Inferred from Experiment
    • IDA: Inferred from Direct Assay
    • IPI: Inferred from Physical Interaction
    • IMP: Inferred from Mutant Phenotype
    • IGI: Inferred from Genetic Interaction
    • IEP: Inferred from Expression Pattern
  – Computational Analysis Evidence Codes
    • ISS: Inferred from Sequence or Structural Similarity
    • ISO: Inferred from Sequence Orthology
    • ISA: Inferred from Sequence Alignment
    • ISM: Inferred from Sequence Model
    • IGC: Inferred from Genomic Context
    • RCA: Inferred from Reviewed Computational Analysis
  – Author Statement Evidence Codes
    • TAS: Traceable Author Statement
    • NAS: Non-traceable Author Statement
  – Curator Statement Evidence Codes
    • IC: Inferred by Curator
    • ND: No biological Data available
• Automatically-assigned Evidence Codes
  – IEA: Inferred from Electronic Annotation
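The grouping on this slide maps directly onto a lookup table, which is handy when filtering annotations by how they were made. The sketch below simply transcribes the slide's categories; the names `EVIDENCE_GROUPS`, `CODE_TO_GROUP`, and `is_human_curated` are hypothetical.

```python
# Evidence codes grouped exactly as listed on the slide.
EVIDENCE_GROUPS = {
    "Experimental": ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"],
    "Computational Analysis": ["ISS", "ISO", "ISA", "ISM", "IGC", "RCA"],
    "Author Statement": ["TAS", "NAS"],
    "Curator Statement": ["IC", "ND"],
    "Automatically assigned": ["IEA"],
}

# Invert the table so a code can be looked up directly.
CODE_TO_GROUP = {
    code: group
    for group, codes in EVIDENCE_GROUPS.items()
    for code in codes
}


def is_human_curated(code: str) -> bool:
    """True for every group except the automatically assigned one."""
    if code not in CODE_TO_GROUP:
        raise ValueError(f"unknown evidence code: {code}")
    return CODE_TO_GROUP[code] != "Automatically assigned"


for code in ("IDA", "ISS", "IEA"):
    print(code, CODE_TO_GROUP[code], is_human_curated(code))
```

Note that the computational analysis codes count as human annotations here, following the slide: a curator reviewed the analysis, whereas IEA is assigned with no human in the loop.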

Page 33: EcoliWiki and GONUTS

GONUTS (http://gowiki.tamu.edu)

• Started as a wiki-based usage guide
• Each ontology term is a MediaWiki (MW) Category
  – MW supports DAGs as Categories!
• Each term page has a notes area for user notes on usage
• Term pages list examples of genes that were annotated to this term
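Representing the ontology as wiki categories works because a MediaWiki category page can itself belong to several parent categories. The sketch below generates wikitext for one term page under that idea. It is an illustration only: the category naming scheme, the "Notes" heading, and both function names are assumptions, not the actual GONUTS page layout.

```python
from typing import List, Tuple


def category_name(term_id: str, name: str) -> str:
    # Hypothetical naming scheme: identifier, then the term name.
    return f"{term_id} ! {name}"


def term_page_wikitext(parents: List[Tuple[str, str]], notes: str = "") -> str:
    """Build wikitext for a term's category page.

    Each [[Category:...]] link places this page inside a parent category,
    so a term with two parents simply carries two links.
    """
    lines = ["== Notes ==", notes, ""]
    lines += [
        f"[[Category:{category_name(parent_id, parent_name)}]]"
        for parent_id, parent_name in parents
    ]
    return "\n".join(lines)


text = term_page_wikitext(
    parents=[("GO:0000074", "regulation of progression through cell cycle")],
    notes="User notes on when to use this term go here.",
)
print(text)
```

Pages for genes annotated to the term would carry the same kind of category link, which is how a term page can list example genes without anyone maintaining the list by hand.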

Page 34: EcoliWiki and GONUTS

MOD gene pages

• Gene pages from established Model Organism Databases provide examples of best practices

Page 35: EcoliWiki and GONUTS

Responding to community needs

Page 36: EcoliWiki and GONUTS

User-created gene pages

• Annotation pages based on UniProt IDs

Page 37: EcoliWiki and GONUTS

Supporting Annotation Jamborees in Cyberspace

• RefGenome subgroup of GO Consortium
  – collaboration on annotation consistency
  – Electronic Jamborees via teleconference
  – Uses GONUTS to collect and compare


Page 39: EcoliWiki and GONUTS

Thanks to

• EcoliWiki/GONUTS Team
  – Nathan Liles
  – Brenley McIntosh
  – Debby Siegele
  – Daniel Renfro
  – Anand Venkatraman
  – Adrienne Zweifel
• GO consortium
• EcoliHub Team Leaders
  – Barry Wanner, PI, Purdue
  – Walid Aref, co-PI, Purdue
  – Tyrell Conway, co-PI, Oklahoma
  – Mike Gribskov, co-PI, Purdue
  – Peter Karp, co-PI, SRI
  – Daisuke Kihara, co-PI, Purdue
• Funding: NIH U24-GM077905

URLs:
http://ecolihub.org
http://ecoliwiki.org
http://gowiki.tamu.edu