A migration story for BioGPS
http://biogps.org
Gene-centric annotation data
A simple view:
Gene 1017 → Symbol: CDK2
          → Ensembl: ENSG00000123374
          → RefSeq: NM_001798
                    NM_052827
          → Reporter → U95A: 1792_g_at
                             1833_at
                     → U133A: 211804_s_at
                              204252_at
                              211803_at
A real example:
Symbol
Name
Alias
Summary
Ensembl
Refseq
UniGene
Homologene
GO
UniProt
InterPro
PDB
Prosite
IPI
And many more…
Relational database solutions
Solution 1: “star” schema

Master Table
GeneID  Symbol
1017    CDK2

Reporter Table
GeneID  Platform  Reporter
1017    U95A      1792_g_at
1017    U95A      1833_at
1017    U133A     211804_s_at
1017    U133A     204252_at
1017    U133A     211803_at

Ensembl Table
GeneID  EnsemblID
1017    ENSG00000123374

Refseq Table
GeneID  RefseqID
1017    NM_001798
1017    NM_052827
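
To make the cost of the star schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (master, ensembl, refseq, reporter) and the helper fetch_gene are illustrative assumptions, not the schema BioGPS actually used. The point it shows: every new annotation type needs its own table and its own query to reassemble one gene.

```python
import sqlite3

# In-memory database; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master   (gene_id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE ensembl  (gene_id INTEGER, ensembl_id TEXT);
    CREATE TABLE refseq   (gene_id INTEGER, refseq_id TEXT);
    CREATE TABLE reporter (gene_id INTEGER, platform TEXT, reporter TEXT);
""")
conn.execute("INSERT INTO master VALUES (1017, 'CDK2')")
conn.execute("INSERT INTO ensembl VALUES (1017, 'ENSG00000123374')")
conn.executemany(
    "INSERT INTO refseq VALUES (?, ?)",
    [(1017, "NM_001798"), (1017, "NM_052827")],
)
conn.executemany(
    "INSERT INTO reporter VALUES (?, ?, ?)",
    [
        (1017, "U95A", "1792_g_at"),
        (1017, "U95A", "1833_at"),
        (1017, "U133A", "211804_s_at"),
        (1017, "U133A", "204252_at"),
        (1017, "U133A", "211803_at"),
    ],
)


def fetch_gene(conn, gene_id):
    """Reassemble one gene by querying the master table and each satellite table."""
    symbol = conn.execute(
        "SELECT symbol FROM master WHERE gene_id = ?", (gene_id,)
    ).fetchone()[0]
    ensembl = [
        row[0]
        for row in conn.execute(
            "SELECT ensembl_id FROM ensembl WHERE gene_id = ? ORDER BY rowid",
            (gene_id,),
        )
    ]
    refseq = [
        row[0]
        for row in conn.execute(
            "SELECT refseq_id FROM refseq WHERE gene_id = ? ORDER BY rowid",
            (gene_id,),
        )
    ]
    reporters = {}
    for platform, reporter in conn.execute(
        "SELECT platform, reporter FROM reporter WHERE gene_id = ? ORDER BY rowid",
        (gene_id,),
    ):
        reporters.setdefault(platform, []).append(reporter)
    return {
        "GeneID": gene_id,
        "Symbol": symbol,
        "Ensembl": ensembl,
        "RefSeq": refseq,
        "Reporter": reporters,
    }


gene = fetch_gene(conn, 1017)
```

Adding, say, a UniProt annotation under this layout would mean one more CREATE TABLE and one more query inside fetch_gene.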
Relational database solutions
Solution 2: “weakly-typed” schema

Generic Data Table
ID  Type      Value            Parent  Root
1   GeneID    1017             NULL    1017
2   Symbol    CDK2             1017    1017
3   Ensembl   ENSG00000123374  1017    1017
4   RefSeq    NM_001798        1017    1017
5   RefSeq    NM_052827        1017    1017
6   Platform  U95A             1017    1017
7   Platform  U133A            1017    1017
8   Reporter  1792_g_at        U95A    1017
9   Reporter  1833_at          U95A    1017
10  Reporter  211804_s_at      U133A   1017
11  Reporter  204252_at        U133A   1017
12  Reporter  211803_at        U133A   1017
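
A minimal sketch of what reading from the weakly-typed table involves, assuming the rows have already been fetched as plain tuples. The helper build_gene is hypothetical. It illustrates the trade-off: the table itself never changes shape, but the structure (which Type nests under which Parent) has to be rebuilt in application code.

```python
from collections import defaultdict

# Rows of the generic data table: (ID, Type, Value, Parent, Root).
ROWS = [
    (1, "GeneID", "1017", None, "1017"),
    (2, "Symbol", "CDK2", "1017", "1017"),
    (3, "Ensembl", "ENSG00000123374", "1017", "1017"),
    (4, "RefSeq", "NM_001798", "1017", "1017"),
    (5, "RefSeq", "NM_052827", "1017", "1017"),
    (6, "Platform", "U95A", "1017", "1017"),
    (7, "Platform", "U133A", "1017", "1017"),
    (8, "Reporter", "1792_g_at", "U95A", "1017"),
    (9, "Reporter", "1833_at", "U95A", "1017"),
    (10, "Reporter", "211804_s_at", "U133A", "1017"),
    (11, "Reporter", "204252_at", "U133A", "1017"),
    (12, "Reporter", "211803_at", "U133A", "1017"),
]


def build_gene(rows, root):
    """Rebuild a nested gene record from the flat rows that share one Root.

    Every annotation type is returned as a list of values. Platform rows are
    skipped because each Reporter row already names its platform as Parent,
    so a platform without reporters would not appear in the result.
    """
    annotations = defaultdict(list)
    reporters = defaultdict(list)
    for _row_id, type_, value, parent, row_root in rows:
        if row_root != root or type_ in ("GeneID", "Platform"):
            continue
        if type_ == "Reporter":
            reporters[parent].append(value)
        else:
            annotations[type_].append(value)
    gene = dict(annotations)
    gene["Reporter"] = dict(reporters)
    return gene


gene = build_gene(ROWS, "1017")
```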
“Document”-based database solution
Gene 1017 (CDK2):
{
  "Symbol": "CDK2",
  "Ensembl": "ENSG00000123374",
  "RefSeq": [ "NM_001798", "NM_052827" ],
  "Reporter": {
    "U95A": [ "1792_g_at", "1833_at" ],
    "U133A": [ "211804_s_at", "204252_at", "211803_at" ]
  }
}
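
For comparison with the two relational layouts, a small sketch of the same gene handled as a single JSON document in Python. Keying the document by the gene ID "1017" follows the slide; the variable names are illustrative only. The whole record is read in one lookup, with no joins or reassembly.

```python
import json

# The CDK2 document from the slide, written as valid JSON.
GENE_DOC_JSON = """
{
  "Symbol": "CDK2",
  "Ensembl": "ENSG00000123374",
  "RefSeq": ["NM_001798", "NM_052827"],
  "Reporter": {
    "U95A": ["1792_g_at", "1833_at"],
    "U133A": ["211804_s_at", "204252_at", "211803_at"]
  }
}
"""

# A toy "database": one document per gene, keyed by gene ID.
docs = {"1017": json.loads(GENE_DOC_JSON)}

# Nested annotation is reached by plain key access.
symbol = docs["1017"]["Symbol"]
u133a_reporters = docs["1017"]["Reporter"]["U133A"]

# The document survives a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(docs["1017"]))
```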
What’s CouchDB?
Document-based (“schema-free”) database
Index and query data in MapReduce fashion using JavaScript
RESTful JSON API
Bi-directional replicator
Distributed
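
CouchDB views are written as JavaScript map (and optional reduce) functions, as noted above. The sketch below only emulates the idea in plain Python so the indexing model is visible; it is not CouchDB's API, and map_reporters, build_view, and query_view are hypothetical names. A map function emits key/value pairs per document, the pairs are kept sorted by key, and a query is a lookup on that sorted index.

```python
# Toy document store: one gene document keyed by gene ID.
DOCS = {
    "1017": {
        "Symbol": "CDK2",
        "Reporter": {
            "U95A": ["1792_g_at", "1833_at"],
            "U133A": ["211804_s_at", "204252_at", "211803_at"],
        },
    }
}


def map_reporters(doc_id, doc):
    """Map step: emit one (reporter, gene ID) pair for every reporter in a document."""
    for reporters in doc.get("Reporter", {}).values():
        for reporter in reporters:
            yield reporter, doc_id


def build_view(docs, map_fn):
    """Run the map function over every document and keep the emitted rows sorted by key."""
    rows = [
        (key, value)
        for doc_id, doc in docs.items()
        for key, value in map_fn(doc_id, doc)
    ]
    return sorted(rows)


def query_view(view, key):
    """Return all values emitted under one key (a linear scan, for simplicity)."""
    return [value for row_key, value in view if row_key == key]


reporter_view = build_view(DOCS, map_reporters)
matches = query_view(reporter_view, "204252_at")
```

With this index, looking up a gene by reporter ID no longer requires scanning whole documents.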
Load data into CouchDB
NCBI “gene_info” file → create a bare-bones gene “document” for each gene
Append more annotation to each “document”:
“gene2ensembl”
“gene2refseq”
“u95a_annot”, “u133a_annot”
And more …
Easy to add a new data type
Easy to update incrementally
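
The loading steps can be sketched as follows. This is a simplified illustration: the input rows are reduced to (gene ID, value) pairs rather than the real multi-column NCBI and Affymetrix file formats, the documents live in a Python dict instead of CouchDB, and all function names are hypothetical.

```python
def create_bare_docs(gene_info_rows):
    """Step 1: one bare-bones document per gene, from (gene_id, symbol) pairs."""
    return {gene_id: {"Symbol": symbol} for gene_id, symbol in gene_info_rows}


def set_field(docs, field, rows):
    """Attach a single-valued annotation; genes without a document are skipped."""
    for gene_id, value in rows:
        if gene_id in docs:
            docs[gene_id][field] = value


def append_to_list(docs, field, rows):
    """Attach a multi-valued annotation; re-running with the same rows adds nothing."""
    for gene_id, value in rows:
        if gene_id in docs:
            values = docs[gene_id].setdefault(field, [])
            if value not in values:
                values.append(value)


def append_reporters(docs, platform, rows):
    """Attach reporters for one array platform under the nested "Reporter" field."""
    for gene_id, reporter in rows:
        if gene_id in docs:
            by_platform = docs[gene_id].setdefault("Reporter", {})
            values = by_platform.setdefault(platform, [])
            if reporter not in values:
                values.append(reporter)


# Simplified stand-ins for the source files named on the slide.
gene_info = [("1017", "CDK2")]
gene2ensembl = [("1017", "ENSG00000123374")]
gene2refseq = [("1017", "NM_001798"), ("1017", "NM_052827")]
u95a_annot = [("1017", "1792_g_at"), ("1017", "1833_at")]
u133a_annot = [("1017", "211804_s_at"), ("1017", "204252_at"), ("1017", "211803_at")]

docs = create_bare_docs(gene_info)
set_field(docs, "Ensembl", gene2ensembl)
append_to_list(docs, "RefSeq", gene2refseq)
append_reporters(docs, "U95A", u95a_annot)
append_reporters(docs, "U133A", u133a_annot)

# An incremental re-run of one source leaves the document unchanged.
append_to_list(docs, "RefSeq", gene2refseq)
```

A new data type only needs one more call against the existing documents; nothing about the store itself has to change.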
What’s behind MyGene.info?
Gene Annotation as a Service
http://MyGene.Info
Gene annotation services go PUBLIC
Gene query service
http://mygene.info/query?q=<query>
Gene annotation service
http://mygene.info/gene/<geneid>
Gene Query Service
user query → matching gene IDs/symbols/names (JSON output)
http://mygene.info/query?q=<query>
Examples:
http://mygene.info/query?q=cdk2
http://mygene.info/query?q=cdk2+AND+species:human
http://mygene.info/query?q=cdk?
http://mygene.info/query?q=p*
http://mygene.info/query?q=entrezgene:1017
http://mygene.info/query?q=ensemblgene:ENSG00000123374
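
A minimal sketch of calling the query service from Python with only the standard library. The endpoint is the one shown on the slide and may have changed since the talk; build_query_url and run_query are hypothetical helper names, and the shape of the JSON response is not assumed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as given in the talk; treat it as an assumption, not a current API reference.
QUERY_ENDPOINT = "http://mygene.info/query"


def build_query_url(query):
    """Build a query URL, keeping ':', '*' and '?' readable as in the slide examples."""
    return QUERY_ENDPOINT + "?" + urlencode({"q": query}, safe=":*?")


def run_query(query, timeout=10):
    """Send the query and decode the JSON response (requires network access)."""
    with urlopen(build_query_url(query), timeout=timeout) as response:
        return json.load(response)


# URL construction only; nothing is sent over the network here.
example_urls = [
    build_query_url(q)
    for q in ("cdk2", "cdk2 AND species:human", "cdk?", "p*", "entrezgene:1017")
]
```

Calling run_query("cdk2") would perform the actual request; what comes back depends on the service version that is live.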
Gene Annotation Service
gene id → full or filtered gene annotation object (JSON output)
http://mygene.info/gene/<geneid>
Examples:
http://mygene.info/gene/1017
http://mygene.info/gene/ENSG00000123374
http://mygene.info/gene/1017?filter=name,symbol,summary
http://mygene.info/gene/1017?filter=name,symbol,refseq.rna
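
The annotation service can be called the same way. This sketch builds the URLs shown above, with an optional filter list; the endpoint and the filter parameter are taken from the slide and may differ in later versions of the service, and build_gene_url and fetch_gene are hypothetical helper names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Endpoint as given in the talk; treat it as an assumption, not a current API reference.
GENE_ENDPOINT = "http://mygene.info/gene"


def build_gene_url(gene_id, fields=None):
    """Build an annotation URL; fields, if given, become a comma-separated filter."""
    url = GENE_ENDPOINT + "/" + quote(str(gene_id), safe="")
    if fields:
        url += "?" + urlencode({"filter": ",".join(fields)}, safe=",")
    return url


def fetch_gene(gene_id, fields=None, timeout=10):
    """Fetch and decode one gene annotation object (requires network access)."""
    with urlopen(build_gene_url(gene_id, fields), timeout=timeout) as response:
        return json.load(response)


# URL construction only; nothing is sent over the network here.
full_url = build_gene_url(1017)
filtered_url = build_gene_url(1017, ["name", "symbol", "refseq.rna"])
```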
Species supported: human, mouse, rat, fruitfly, nematode, zebrafish, thale cress, frog
Demo and full documentation at http://mygene.info
Source code: https://bitbucket.org/newgene/genedoc/src
Targeted use case: quickly build a gene-centric online resource without having to maintain a local gene annotation database
Use it in a web application:
Server side:
Make direct HTTP calls
Client side:
Set up a server-side proxy
JSONP calls
Cross-domain AJAX calls via CORS (Cross-Origin Resource Sharing)
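
As one way to picture the server-side proxy option, here is a minimal WSGI sketch using only the standard library. It is an illustration under assumptions, not the setup used by BioGPS: the upstream URL is the query endpoint from the slides, and make_proxy_app and fetch_upstream are hypothetical names. The browser calls the proxy on its own origin, and the proxy makes the direct HTTP call to the service.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import urlopen

# Upstream endpoint as given in the talk (an assumption; it may have changed).
UPSTREAM = "http://mygene.info/query"


def fetch_upstream(url):
    """Make the direct server-side HTTP call and return the raw response body."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def make_proxy_app(fetch=fetch_upstream):
    """Return a WSGI app that forwards ?q=... to the upstream query service.

    The fetch argument is injectable so the app can be exercised without a network.
    """

    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        query = params.get("q", [""])[0]
        if not query:
            body = json.dumps({"error": "missing q parameter"}).encode("utf-8")
            status = "400 Bad Request"
        else:
            body = fetch(UPSTREAM + "?" + urlencode({"q": query}))
            status = "200 OK"
        start_response(
            status,
            [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return app


# To serve locally, one could run:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, make_proxy_app()).serve_forever()
```

Because the page and the proxy share an origin, the client needs neither JSONP nor CORS for this route.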
Acknowledgement
Group members: Andrew Su, Ian MacLeod, Benjamin Good, Eric Clarke
GNF collaborators: Camilo Orozco, Jon Huss
Past contributor: Marc Leglise
Funding and Support
NIH grant: R01GM083924
ISMB travel support
http://mygene.info