![Page 1: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/1.jpg)
The Gene Wiki: Crowdsourcing human gene annotation
Andrew Su, Ph.D.
Department of Molecular and Experimental Medicine
The Scripps Research Institute
Biocuration 2012
April 2, 2012
![Page 2: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/2.jpg)
The Long Tail is a prolific source of content
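Figure: content produced vs. contributors (sorted), divided into a Short Head and a Long Tail
News: newspapers (Short Head), blogs (Long Tail)
Video: TV/Hollywood (Short Head), YouTube (Long Tail)
Product reviews: Consumer reports (Short Head), Amazon reviews (Long Tail)
Food reviews: food critics (Short Head), Yelp (Long Tail)
Talent judging: Olympics (Short Head), American Idol (Long Tail)
Gene annotation: manual curation (Short Head), Gene Wiki (Long Tail)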
![Page 3: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/3.jpg)
We can harness the Long Tail of scientists to directly participate in
the gene annotation process.
![Page 4: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/4.jpg)
Wikipedia is reasonably accurate
![Page 5: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/5.jpg)
Wikipedia has breadth and depth
http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
Chart: articles and words (millions), Wikipedia vs. Britannica Online
![Page 6: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/6.jpg)
Filtering, extracting, and summarizing PubMed
Documents
Concepts
![Page 7: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/7.jpg)
Wiki success depends on a positive feedback loop
Gene wiki page utility
Number of users
Number of contributors
1001
2002
![Page 8: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/8.jpg)
10,000 gene “stubs” within Wikipedia
Protein structure
Symbols and identifiers
Tissue expression pattern
Gene Ontology annotations
Links to structured databases
Gene summary
Protein interactions
Linked references
Huss, PLoS Biol, 2008
Utility
Users
Contributors
![Page 9: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/9.jpg)
Gene Wiki has a critical mass of readers
Total: ~4.3 million views / month
Huss, PLoS Biol, 2008; Good, NAR, 2011
![Page 10: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/10.jpg)
Gene Wiki has a critical mass of editors
Good, NAR, 2011
Chart: cumulative edits (productive edits vs. vandalism)
~10,000 words added / month
4.3 million views / month
1000 edits / month
Total 1.42 million words ≈ 230 full-length articles
![Page 11: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/11.jpg)
A review article for every gene is powerful
Hyperlinks to related concepts
References to the literature
Reelin: 68 editors, 543 edits since July 2002
Heparin: 175 editors, 320 edits since June 2003
AMPK: 44 editors, 84 edits since March 2004
RNAi: 232 editors, 708 edits since October 2002
![Page 12: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/12.jpg)
Making the Gene Wiki more computable
Structured annotations
Free text
![Page 13: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/13.jpg)
Filling the gaps in gene annotation
Wikilink
GO exact synonym
Gene Wiki mapping
NCBI Entrez Gene: 3362
GO:0004993
Candidate assertion
![Page 14: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/14.jpg)
Filling the gaps in gene annotation
Wikilink
GO exact match
Gene Wiki mapping
NCBI Entrez Gene: 334
GO:0006897
Candidate assertion
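The two slides above combine three ingredients into a candidate assertion: a wikilink on a gene page, a GO term whose name or exact synonym equals the link target, and the Gene Wiki mapping from page to Entrez Gene ID. A minimal sketch of that join follows. The page titles, link targets, and synonym strings are made up for illustration; only the Entrez Gene IDs and GO IDs come from the slides, and the function name `candidate_assertions` is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def candidate_assertions(
    page_to_gene: Dict[str, int],
    page_links: Dict[str, List[str]],
    go_names: Dict[str, str],
) -> Set[Tuple[int, str]]:
    """Return (Entrez Gene ID, GO ID) pairs implied by wikilinks on gene pages."""
    # Lower-case GO names and exact synonyms once so matching ignores case.
    lookup = {name.lower(): go_id for name, go_id in go_names.items()}
    found: Set[Tuple[int, str]] = set()
    for page, links in page_links.items():
        gene_id = page_to_gene.get(page)
        if gene_id is None:
            continue  # page has no Gene Wiki mapping, so it is skipped
        for target in links:
            go_id = lookup.get(target.lower())
            if go_id is not None:
                found.add((gene_id, go_id))
    return found


# Toy inputs: titles, link targets, and synonym strings are placeholders.
page_to_gene = {"Example gene page A": 3362, "Example gene page B": 334}
page_links = {
    "Example gene page A": ["Serotonin receptor", "Neuron"],
    "Example gene page B": ["Endocytosis"],
}
go_names = {"serotonin receptor": "GO:0004993", "endocytosis": "GO:0006897"}

assertions = candidate_assertions(page_to_gene, page_links, go_names)
print(sorted(assertions))
```

Links that match no GO name (here "Neuron") produce nothing, which is why the output of this step is only a set of candidates that still needs checking against existing annotations.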
![Page 15: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/15.jpg)
Disease associations mined from the Gene Wiki
Gene Wiki Articles (10,271)
Filter out seeded text
NCBO Annotator
Matched Disease Ontology terms (2983)
Compare to DO database
23% exact match
5% match parent
2% match child
70% have no match
2147 candidate annotations
Good, BMC Genomics 2011, 12:603
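The "compare to database" step sorts each mined term into exact match, match parent, match child, or no match. The sketch below shows one way to do that against an ontology hierarchy. It assumes "match parent" means the mined term is a more general ancestor of an existing annotation and "match child" means it is a more specific descendant; the slides do not define these labels, and the term IDs and function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def ancestors(term: str, parents: Dict[str, Iterable[str]]) -> Set[str]:
    """All terms reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def classify(mined: str, known: Set[str], parents: Dict[str, Iterable[str]]) -> str:
    """Compare one mined term with the gene's existing annotations."""
    if mined in known:
        return "exact match"
    # Assumption: mined term is more general than an existing annotation.
    if any(mined in ancestors(k, parents) for k in known):
        return "match parent"
    # Assumption: mined term is more specific than an existing annotation.
    if ancestors(mined, parents) & known:
        return "match child"
    return "no match"


# Toy hierarchy with made-up IDs: T:3 is_a T:2 is_a T:1; T:9 is unrelated.
toy_parents = {"T:3": ["T:2"], "T:2": ["T:1"], "T:9": []}
existing = {"T:2"}

for term in ["T:2", "T:1", "T:3", "T:9"]:
    print(term, classify(term, existing, toy_parents))
```

Under this reading, a parent match adds no information beyond the existing annotation, while a child match or a non-match is potentially new.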
![Page 16: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/16.jpg)
Disease associations mined from the Gene Wiki
Expert curation
Correct: 86%
Maybe: 4%
Incorrect: 10%
Overall specificity: 90-93%
Good, BMC Genomics 2011, 12:603
![Page 17: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/17.jpg)
GO associations mined from the Gene Wiki
Gene Wiki Articles (10,271)
Filter out seeded text
NCBO Annotator
Matched Gene Ontology terms (11,022)
Compare to GO database
17% exact match
26% match parent
2% match child
55% have no match
6319 candidate annotations
Good, BMC Genomics 2011, 12:603
![Page 18: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/18.jpg)
GO associations mined from the Gene Wiki
Expert curation
Correct
Maybe
Incorrect 60%
Overall specificity: 48-64%
26%
14%
Good, BMC Genomics 2011, 12:603
![Page 19: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/19.jpg)
Common sources of error in GO associations
OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.”
1) Incorrect concept recognition
Transduction (GO:0009293)
The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector.
Signal transduction (GO:0007165)
The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process…
Good, BMC Genomics 2011, 12:603
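The OR2F1 error can be reproduced with a toy dictionary matcher. This is a sketch of generic longest-first phrase matching, not the NCBO Annotator's actual algorithm; the function name `recognize` and the two-entry label dictionary are assumptions for illustration. Because "signal transduction" never appears as a contiguous phrase in the sentence, only the bare word "transduction" matches, and it resolves to the bacteriophage term.

```python
import re
from typing import Dict, List, Tuple

# Only the two GO terms shown on the slide.
GO_LABELS = {
    "transduction": "GO:0009293",
    "signal transduction": "GO:0007165",
}


def recognize(text: str, labels: Dict[str, str]) -> List[Tuple[str, str]]:
    """Exact phrase matching, trying longer phrases before shorter ones."""
    lowered = text.lower()
    taken: List[Tuple[int, int]] = []  # character spans already matched
    hits: List[Tuple[str, str]] = []
    for phrase in sorted(labels, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, lowered):
            overlaps = any(m.start() < end and start < m.end() for start, end in taken)
            if not overlaps:
                taken.append((m.start(), m.end()))
                hits.append((phrase, labels[phrase]))
    return hits


or2f1 = ("Olfactory receptors ... are responsible for the recognition and "
         "G protein-mediated transduction of odorant signals.")
control = "This receptor participates in signal transduction."

print(recognize(or2f1, GO_LABELS))    # matches the phage-mediated term
print(recognize(control, GO_LABELS))  # contiguous phrase matches the intended term
```

The control sentence is invented to show that the same matcher behaves correctly when the full phrase is present, so the error comes from the wording of the source sentence rather than from the dictionary.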
![Page 20: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/20.jpg)
Common sources of error in GO associations
MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …”
2) Incorrect sentence context
Dephosphorylation, Excretion, Gene expression, Glycosylation, Localization, Methylation, Proteolysis, Secretion, Transport, Transcription, Translation
MEF2C
Myelination
Phosphorylation
Neurogenesis
Good, BMC Genomics 2011, 12:603
![Page 21: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/21.jpg)
Novel GO annotations – so what?
11,022 annotations mined from Gene Wiki
4703 (43%) match known annotations
~100,000 annotations from GO consortium
6319 “novel” annotations @ 48-64% specificity
![Page 22: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/22.jpg)
Gene Wiki content improves enrichment analysis
GO term
Gene list
Concept recognition
PubMed abstracts
Enrichment analysis
GO:0007411
axon guidance
(GO:0007411)
264 genes
Linked genes through PubMed
P = 1.55 E-20
811 articles
Yes No
Yes 13 2
No 251 12033
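The P value on this slide can be approximated from the 2×2 table with a one-sided hypergeometric (Fisher's exact) test. The sketch below assumes that is the test used, and that the "Yes" column holds the 264 genes annotated to GO:0007411 (13 + 251) while the "Yes" row holds the genes linked to the term through PubMed text; neither assumption is stated on the slide. The function name `enrichment_p` is hypothetical.

```python
from math import comb


def enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns the probability of seeing at least `a` in the top-left cell
    when the row and column totals are held fixed.
    """
    row1 = a + b            # size of the "Yes" row
    col1 = a + c            # size of the "Yes" column
    total = a + b + c + d   # all genes in the table
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, row1)


# Counts copied from the slide's table.
p_axon_guidance = enrichment_p(13, 2, 251, 12033)
print(f"{p_axon_guidance:.3g}")
```

With these counts the result should land close to the P = 1.55 E-20 shown on the slide, which is some evidence that the row and column reading assumed here is the intended one.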
![Page 23: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/23.jpg)
Gene Wiki content improves enrichment analysis
GO term
Gene list
Concept recognition
PubMed abstracts
Gene Wiki
+
Enrichment analysis
GO:0006936 GO:0006936
muscle contraction
(GO:0006936)
87 genes
Linked genes through PubMed: P = 1.0
Linked genes through PubMed + Gene Wiki: P = 1.22 E-09
251 articles
87 articles
![Page 24: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/24.jpg)
Gene Wiki content improves enrichment analysis
Scatter plot: p-value (PubMed only) vs. p-value (PubMed + GW)
Regions: more significant with PubMed + GW; more significant with PubMed only
Highlighted point: muscle contraction
![Page 25: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/25.jpg)
Challenges and future directions
• How to complement and integrate with traditional biocuration workflows?
• How to disseminate and utilize crowdsourced annotations?
![Page 26: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/26.jpg)
The Long Tail of scientists is a valuable source of
information on gene function
![Page 27: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/27.jpg)
Collaborators
Doug Howe, ZFIN
John Hogenesch, U Penn
Jon Huss, GNF
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Many Wikipedia editors
WP:MCB Project
Group members
Erik Clarke
Ben Good (*)
Salvatore Loguercio
Ian Macleod
Chunlei Wu
Funding and Support
(BioGPS: GM83924, Gene Wiki: GM089820)
Contact
http://sulab.org
[email protected]
@andrewsu
+Andrew Su
See poster # 30 for more on the Gene Wiki and
crowdsourcing in biology!
![Page 28: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/28.jpg)
Making the Gene Wiki more reliable
The company name is derived from old Greek, and means
"destroyer of birds".
Novartis is a multinational pharmaceutical company
based in Basel, Switzerland that manufactures drugs such
as clozapine (Clozaril), diclofenac (Voltaren), …
![Page 29: ISB2012: The Gene Wiki: Crowdsourcing human gene annotation](https://reader035.vdocuments.us/reader035/viewer/2022062319/554e80dbb4c9054a698b5463/html5/thumbnails/29.jpg)
Making the Gene Wiki more reliable
http://www.wikitrust.net/
The company name is derived from old Greek, and means
"destroyer of birds".
Novartis is a multinational pharmaceutical company
based in Basel, Switzerland that manufactures drugs such
as clozapine (Clozaril), diclofenac (Voltaren), …
High-trust author: 36211 total edits
Low-trust author: 36 total edits