ismb2012: the gene wiki: crowdsourcing human gene annotation
DESCRIPTION
Note: several slides use animation, so for best display please download and view in PowerPoint.
TRANSCRIPT
The Gene Wiki: Crowdsourcing human gene annotation
Andrew Su, Ph.D.
The Scripps Research Institute
ISMB Special Session: Harnessing community intelligence for bioinformatics
#ISMB #SS7
July 17, 2012
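The Long Tail is a prolific source of content
[Figure: content produced vs. contributors (sorted), divided into a Short Head and a Long Tail]
News: newspapers (Short Head), blogs (Long Tail)
Video: TV/Hollywood (Short Head), YouTube (Long Tail)
Product reviews: Consumer reports (Short Head), Amazon reviews (Long Tail)
Food reviews: food critics (Short Head), Yelp (Long Tail)
Talent judging: Olympics (Short Head), American Idol (Long Tail)
Gene annotation: manual curation (Short Head), Gene Wiki (Long Tail)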
We can harness the Long Tail of scientists to directly participate in the gene annotation process.
Wikipedia is reasonably accurate
Wikipedia has breadth and depth
[Figure: articles and words (millions), Wikipedia vs. Britannica Online. Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008]
Filtering, extracting, and summarizing PubMed
Documents
Concepts
Wiki success depends on positive feedback
Gene wiki page utility
Number of users
Number of contributors
1001
2002
10,000 gene “stubs” within Wikipedia
Protein structure
Symbols and identifiers
Tissue expression pattern
Gene Ontology annotations
Links to structured databases
Gene summary
Protein interactions
Linked references
Huss, PLoS Biol, 2008
Utility
Users
Contributors
Gene Wiki has a critical mass of readers
Total: ~4.3 million views / month
Huss, PLoS Biol, 2008; Good, NAR, 2011
Gene Wiki has a critical mass of editors
Good, NAR, 2011
[Figure: cumulative edits, showing productive edits and vandalism]
~10,000 words added / month
4.3 million views / month
1000 edits / month
Total 1.42 million words ≈ 230 full-length articles
A review article for every gene is powerful
References to the literature
Hyperlinks to related concepts
Reelin: 98 editors, 703 edits since July 2002
Heparin: 358 editors, 654 edits since June 2003
AMPK: 109 editors, 203 edits since March 2004
RNAi: 394 editors, 994 edits since October 2002
Making the Gene Wiki more computable
Structured annotations
Free text
Annotator
Filling the gaps in gene annotation
Wikilink
GO exact synonym
Gene Wiki mapping
NCBI Entrez Gene: 3362
GO:0004993
Candidate assertion
Good, BMC Genomics 2011, 12:603
Annotator
Filling the gaps in gene annotation
Wikilink
GO exact match
Gene Wiki mapping
NCBI Entrez Gene: 334
GO:0006897
Candidate assertion
Good, BMC Genomics 2011, 12:603
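The two slides above outline a mining step: a wikilink on a gene's page is matched against a GO term label (exact match) or an exact synonym, and the page's Entrez Gene mapping turns the hit into a candidate assertion. A minimal Python sketch of that idea follows. Only the Entrez Gene IDs and GO IDs come from the slides; the page titles, link texts, label strings, and the `mine_candidates` helper are hypothetical and purely illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical toy inputs. The page titles, wikilink texts, and GO label
# strings are illustrative assumptions, not data from the slides.
PAGE_TO_GENE: Dict[str, str] = {
    "Example gene page A": "3362",  # NCBI Entrez Gene ID shown on the slide
    "Example gene page B": "334",   # NCBI Entrez Gene ID shown on the slide
}
PAGE_WIKILINKS: Dict[str, List[str]] = {
    "Example gene page A": ["Serotonin receptor", "Cell membrane"],
    "Example gene page B": ["Endocytosis", "Protein"],
}
# Lowercased label -> (GO ID, how the label relates to the GO term).
GO_LABELS: Dict[str, Tuple[str, str]] = {
    "serotonin receptor": ("GO:0004993", "exact synonym"),
    "endocytosis": ("GO:0006897", "exact match"),
}


def mine_candidates(
    page_to_gene: Dict[str, str],
    page_wikilinks: Dict[str, List[str]],
    go_labels: Dict[str, Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (gene_id, go_id, match_type) for every wikilink that hits a GO label."""
    candidates = []
    for page, gene_id in page_to_gene.items():
        for link in page_wikilinks.get(page, []):
            hit = go_labels.get(link.strip().lower())
            if hit is not None:
                go_id, match_type = hit
                candidates.append((gene_id, go_id, match_type))
    return candidates


candidates = mine_candidates(PAGE_TO_GENE, PAGE_WIKILINKS, GO_LABELS)
for gene_id, go_id, match_type in candidates:
    print(f"Entrez Gene {gene_id} -> {go_id} ({match_type})")
```

Links that match no GO label are simply skipped, so the output contains only candidate assertions, which would still need review before being treated as annotations.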
Novel GO annotations – so what?
11,022 annotations mined from Gene Wiki
4703 (43%) match known annotations
~100,000 annotations from GO consortium
6319 “novel” annotations @ 48-64% specificity
Good, BMC Genomics 2011, 12:603
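The counts on this slide can be checked with a few lines of arithmetic. The sketch below recomputes the matched percentage and the number of novel annotations from the slide's figures. The last step is an assumption: it reads "48-64% specificity" as the share of novel annotations expected to be correct, which the slide does not spell out.

```python
# Figures taken from the slide.
mined = 11_022   # annotations mined from Gene Wiki
known = 4_703    # annotations matching known GO annotations

novel = mined - known
known_pct = round(100 * known / mined)
print(f"novel annotations: {novel}")
print(f"matching known annotations: {known_pct}%")

# ASSUMPTION: treat the 48-64% range as the fraction of novel annotations
# that are correct. This interpretation is not stated on the slide.
low, high = 0.48, 0.64
plausible_correct = (round(novel * low), round(novel * high))
print(f"novel annotations plausibly correct: {plausible_correct[0]} to {plausible_correct[1]}")
```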
Gene Wiki content improves enrichment analysis
GO term
Gene list
Concept recognition
PubMed abstracts
Enrichment analysis
GO:0007411
axon guidance
(GO:0007411)
264 genes
Linked genes through PubMed
P = 1.55 E-20
811 articles
        Yes     No
Yes      13      2
No      251  12033
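The slide does not name the statistical test behind the P value. One common choice for a 2x2 table like this is a one-sided Fisher's exact test, which is used here as an assumption. The sketch computes the hypergeometric upper tail with the standard library only; the function name is hypothetical. Which margin is "linked through PubMed" and which is "in the GO gene list" is not labeled on the slide, but the result does not depend on that choice because the test is symmetric under transposing the table.

```python
from fractions import Fraction
from math import comb


def fisher_upper_tail(a: int, b: int, c: int, d: int) -> Fraction:
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the table's margins fixed.
    """
    row1 = a + b          # size of the first row
    col1 = a + c          # size of the first column
    n = a + b + c + d     # grand total
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return Fraction(tail, comb(n, row1))


# Counts from the slide's contingency table (axon guidance, GO:0007411).
p_value = float(fisher_upper_tail(13, 2, 251, 12033))
print(f"P = {p_value:.3g}")
```

Under this assumption the result should land close to the P = 1.55 E-20 reported on the slide. Exact rational arithmetic avoids floating-point underflow in the intermediate binomial coefficients.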
Gene Wiki content improves enrichment analysis
GO term
Gene list
Concept recognition
PubMed abstracts
Gene Wiki
+
Enrichment analysis
GO:0006936
muscle contraction (GO:0006936)
87 genes
Linked genes through PubMed: P = 1.0
Linked genes through PubMed + Gene Wiki: P = 1.22 E-09
251 articles
87 articles
Gene Wiki content improves enrichment analysis
[Figure: p-value (PubMed only) vs. p-value (PubMed + GW), with muscle contraction highlighted and regions labeled "More significant with PubMed + GW" and "More significant with PubMed only"]
Gene Wiki+ for integrative queries
http://genewikiplus.org
mwsync
Dynamic queries across genes, diseases, SNPs
TOP 100 GENES
Gene Wiki+ for integrative queries
{{#ask: [[Category:Human_proteins]] [[is_associated_with:: <q>[[Category:Breast_cancer]]</q>]] [[HasSNP:: <q>[[is_associated_with:: <q>[[Category:Breast_cancer]]</q>]] </q>]]}}
…
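The query above nests `<q>…</q>` subqueries inside property conditions: it asks for human proteins associated with breast cancer that also have a SNP associated with breast cancer. The Python sketch below rebuilds the same query string from small parts to make that nesting easier to read. The helper names are hypothetical, and the code only builds the string; it does not contact genewikiplus.org.

```python
def in_category(name: str) -> str:
    """Condition: the page belongs to the given category."""
    return f"[[Category:{name}]]"


def has_property(prop: str, value: str) -> str:
    """Condition: the page has the property `prop` with the given value."""
    return f"[[{prop}::{value}]]"


def subquery(condition: str) -> str:
    """Wrap a condition so it can serve as a property value."""
    return f"<q>{condition}</q>"


# Pages in the breast cancer category, usable as a property value.
breast_cancer = subquery(in_category("Breast_cancer"))
# "... is associated with breast cancer"
linked_to_breast_cancer = has_property("is_associated_with", breast_cancer)
# "... has a SNP that is itself associated with breast cancer"
snp_linked_to_breast_cancer = has_property("HasSNP", subquery(linked_to_breast_cancer))

conditions = [
    in_category("Human_proteins"),
    linked_to_breast_cancer,
    snp_linked_to_breast_cancer,
]
query = "{{#ask: " + " ".join(conditions) + "}}"
print(query)
```

Apart from spacing, the printed string is the query shown on the slide.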
OMIM
PharmGKB
The Long Tail of scientists is a valuable source of information on gene function
Crowdsourcing a gene annotation portal
Collaborators
Doug Howe, ZFIN
John Hogenesch, U Penn
Jon Huss, GNF
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Many Wikipedia editors
WP:MCB Project
Group members
Erik Clarke
Ben Good
Salvatore Loguercio
Ian Macleod
Max Nanis
Chunlei Wu
Funding and Support
(BioGPS: GM83924, Gene Wiki: GM089820)
Contact
http://sulab.org
[email protected]
@andrewsu
+Andrew Su
ISMB travel support