f01-cloud-mygene.info

MyGene.Info: Gene Annotation as a Service - GAaaS

Chunlei Wu

2011/07/16

A migration story for BioGPS

http://biogps.org

Gene-centric annotation data

A simple view:

Gene 1017 → Symbol: CDK2
          → Ensembl: ENSG00000123374
          → RefSeq: NM_001798
                    NM_052827
          → Reporter: → U95A: 1792_g_at
                              1833_at
                      → U133A: 211804_s_at
                               204252_at
                               211803_at

A real example:

Symbol

Name

Alias

Summary

Ensembl

Refseq

UniGene

Homologene

GO

UniProt

InterPro

PDB

Prosite

IPI

And many more…

Relational database solutions

Solution 1: “star” schema

Master Table

GeneID  Symbol
1017    CDK2

Reporter Table

GeneID  Platform  Reporter
1017    U95A      1792_g_at
1017    U95A      1833_at
1017    U133A     211804_s_at
1017    U133A     204252_at
1017    U133A     211803_at

Ensembl Table

GeneID  EnsemblID
1017    ENSG00000250560

Refseq Table

GeneID  RefseqID
1017    NM_001798
1017    NM_052827
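To make the cost of the “star” layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`master`, `refseq`, `reporter`, `gene_id`, and so on) and the `gene_annotation` helper are illustrative assumptions, not the actual BioGPS schema, and only a subset of the rows above is loaded.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master   (gene_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE refseq   (gene_id INTEGER, refseq_id TEXT);
CREATE TABLE reporter (gene_id INTEGER, platform TEXT, reporter TEXT);
""")
conn.execute("INSERT INTO master VALUES (1017, 'CDK2')")
conn.executemany("INSERT INTO refseq VALUES (?, ?)",
                 [(1017, "NM_001798"), (1017, "NM_052827")])
conn.executemany("INSERT INTO reporter VALUES (?, ?, ?)",
                 [(1017, "U95A", "1833_at"), (1017, "U133A", "204252_at")])


def gene_annotation(gene_id):
    """Assemble one gene's annotation: one query per satellite table."""
    symbol = conn.execute(
        "SELECT symbol FROM master WHERE gene_id = ?", (gene_id,)
    ).fetchone()[0]
    refseq = [row[0] for row in conn.execute(
        "SELECT refseq_id FROM refseq WHERE gene_id = ? ORDER BY refseq_id",
        (gene_id,))]
    reporters = {}
    for platform, reporter in conn.execute(
            "SELECT platform, reporter FROM reporter "
            "WHERE gene_id = ? ORDER BY rowid", (gene_id,)):
        reporters.setdefault(platform, []).append(reporter)
    return {"Symbol": symbol, "RefSeq": refseq, "Reporter": reporters}
```

Every new annotation type means another table and another query in `gene_annotation`, which is the maintenance burden this layout implies.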

Relational database solutions

Solution 2: “weakly-typed” schema

Generic Data Table

ID  Type      Value            Parent  Root
1   GeneID    1017             NULL    1017
2   Symbol    CDK2             1017    1017
3   Ensembl   ENSG00000123374  1017    1017
4   RefSeq    NM_001798        1017    1017
5   RefSeq    NM_052827        1017    1017
6   Platform  U95A             1017    1017
7   Platform  U133A            1017    1017
8   Reporter  1792_g_at        U95A    1017
9   Reporter  1833_at          U95A    1017
10  Reporter  211804_s_at      U133A   1017
11  Reporter  204252_at        U133A   1017
12  Reporter  211803_at        U133A   1017
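The generic-table idea can be sketched the same way with sqlite3. The table name `generic`, the lowercase column names, and the `children` helper are assumptions made for illustration; only a few of the twelve rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE generic "
    "(id INTEGER PRIMARY KEY, type TEXT, value TEXT, parent TEXT, root TEXT)"
)
conn.executemany("INSERT INTO generic VALUES (?, ?, ?, ?, ?)", [
    (1, "GeneID", "1017", None, "1017"),
    (2, "Symbol", "CDK2", "1017", "1017"),
    (4, "RefSeq", "NM_001798", "1017", "1017"),
    (5, "RefSeq", "NM_052827", "1017", "1017"),
    (6, "Platform", "U95A", "1017", "1017"),
    (9, "Reporter", "1833_at", "U95A", "1017"),
])


def children(root, parent, type_):
    """Return the values of one type that hang under `parent` for gene `root`."""
    cursor = conn.execute(
        "SELECT value FROM generic "
        "WHERE root = ? AND parent = ? AND type = ? ORDER BY id",
        (root, parent, type_),
    )
    return [row[0] for row in cursor]
```

New annotation types need no schema change here, but every value is stored as untyped text and the hierarchy has to be rebuilt from the Parent and Root columns at query time.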

“Document”-based database solution

1017 (CDK2):

{
  "Symbol": "CDK2",
  "Ensembl": "ENSG00000123374",
  "RefSeq": ["NM_001798", "NM_052827"],
  "Reporter": {
    "U95A": ["1792_g_at", "1833_at"],
    "U133A": ["211804_s_at", "204252_at", "211803_at"]
  }
}
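As a rough illustration of why the document form is convenient, the sketch below parses the CDK2 document with Python's json module and reads nested fields with a dotted path. The in-memory `store` dictionary and the `get_field` helper are hypothetical stand-ins for a real document database.

```python
import json

# The gene document, keyed by gene ID in a plain dict (stand-in for a database).
doc_json = """
{
  "Symbol": "CDK2",
  "Ensembl": "ENSG00000123374",
  "RefSeq": ["NM_001798", "NM_052827"],
  "Reporter": {
    "U95A": ["1792_g_at", "1833_at"],
    "U133A": ["211804_s_at", "204252_at", "211803_at"]
  }
}
"""
store = {"1017": json.loads(doc_json)}


def get_field(store, gene_id, path):
    """Walk a dotted path such as 'Reporter.U95A' inside one gene document."""
    value = store[gene_id]
    for key in path.split("."):
        value = value[key]
    return value
```

A single lookup by gene ID returns the whole nested annotation, with no joins and no reassembly step.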

What’s CouchDB

Document-based (“schema-free”) database

Index and query data in MapReduce fashion using JavaScript

RESTful JSON API

Bi-directional replicator

Distributed
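CouchDB views are written in JavaScript; the following is only a Python simulation of the map step, meant to show the shape of the idea rather than CouchDB's actual API. The function names `map_symbol` and `build_view`, and the second placeholder document, are hypothetical.

```python
def map_symbol(doc_id, doc):
    """Map step: emit one (key, value) row per document that has a symbol."""
    if "Symbol" in doc:
        yield doc["Symbol"].lower(), doc_id


def build_view(docs, map_fn):
    """Run the map function over every document and sort the rows by key."""
    rows = []
    for doc_id, doc in docs.items():
        rows.extend(map_fn(doc_id, doc))
    return sorted(rows)


docs = {
    "1017": {"Symbol": "CDK2"},
    "hypothetical-1": {"Symbol": "EXAMPLE"},  # placeholder, not a real gene
}
view = build_view(docs, map_symbol)

# Querying the view is a lookup on the emitted key.
hits = [doc_id for key, doc_id in view if key == "cdk2"]
```

The sorted list of emitted rows plays the role of the index: queries read the precomputed rows instead of scanning documents.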

Load data into “CouchDB”

NCBI “gene_info” file → create a bare-bones “document” for each gene

“gene2ensembl”, “gene2refseq”, “u95a_annot”, “u133a_annot”, and more … → append more annotation to each gene “document”

Easy to add new data types

Easy to update incrementally
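The loading flow can be sketched with plain dictionaries standing in for the database. The helper names `load_gene_info` and `append_annotation`, and the simplified `(gene_id, value)` row format, are assumptions; the real source files have more columns.

```python
def load_gene_info(store, rows):
    """Create a bare-bones document per gene from (gene_id, symbol) rows."""
    for gene_id, symbol in rows:
        store[gene_id] = {"Symbol": symbol}


def append_annotation(store, field, rows):
    """Append (gene_id, value) rows under `field`, skipping unknown genes."""
    for gene_id, value in rows:
        doc = store.get(gene_id)
        if doc is None:
            continue  # no bare-bones document for this gene
        values = doc.setdefault(field, [])
        if value not in values:  # re-running a load does not duplicate values
            values.append(value)


store = {}
load_gene_info(store, [("1017", "CDK2")])
append_annotation(store, "RefSeq", [("1017", "NM_001798"), ("1017", "NM_052827")])
append_annotation(store, "Ensembl", [("1017", "ENSG00000123374")])
```

Adding a new data type is one more `append_annotation` call with a new field name, and reloading a source file leaves existing documents unchanged where nothing is new.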

What’s behind MyGene.Info

Gene Annotation as a Service

http://MyGene.Info

Gene annotation services go PUBLIC

Gene query service

http://mygene.info/query?q=<query>

Gene annotation service

http://mygene.info/gene/<geneid>

Gene Query Service

user query → matching gene IDs/symbols/names (JSON output)

http://mygene.info/query?q=<query>

Examples:

http://mygene.info/query?q=cdk2

http://mygene.info/query?q=cdk2+AND+species:human

http://mygene.info/query?q=cdk?

http://mygene.info/query?q=p*

http://mygene.info/query?q=entrezgene:1017

http://mygene.info/query?q=ensemblgene:ENSG00000123374
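A small helper for building query URLs like the examples above might look as follows. It uses only the endpoint shown on the slide; the name `build_query_url` is hypothetical, and keeping ":" unescaped is a choice made here so the output matches the slide's examples.

```python
from urllib.parse import urlencode

QUERY_ENDPOINT = "http://mygene.info/query"  # endpoint as given on the slide


def build_query_url(query):
    """Encode a query string into the ?q= parameter (spaces become '+')."""
    return QUERY_ENDPOINT + "?" + urlencode({"q": query}, safe=":")


url = build_query_url("cdk2 AND species:human")
```

Wildcard characters such as "?" and "*" are percent-encoded by this helper, which a server decodes back to the same query text.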

Gene Annotation Service

gene id → full or filtered gene annotation object (JSON output)

http://mygene.info/gene/<geneid>

Examples:

http://mygene.info/gene/1017

http://mygene.info/gene/ENSG00000123374

http://mygene.info/gene/1017?filter=name,symbol,summary

http://mygene.info/gene/1017?filter=name,symbol,refseq.rna
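The annotation URLs follow an equally simple pattern. This sketch assumes only the endpoint and the `filter` parameter shown in the examples; `build_gene_url` is a hypothetical helper name.

```python
from urllib.parse import quote

GENE_ENDPOINT = "http://mygene.info/gene/"  # endpoint as given on the slide


def build_gene_url(gene_id, fields=None):
    """Build a gene annotation URL, optionally restricted to some fields."""
    url = GENE_ENDPOINT + quote(str(gene_id), safe="")
    if fields:
        # Field names are comma-separated; dotted names such as "refseq.rna"
        # select a nested field, as in the slide's last example.
        url += "?filter=" + ",".join(quote(field) for field in fields)
    return url


full_url = build_gene_url(1017)
filtered_url = build_gene_url(1017, ["name", "symbol", "refseq.rna"])
```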

Species supported: human, mouse, rat, fruitfly, nematode, zebrafish, thale cress, frog

Demo and full documentation at http://mygene.info

Source code: https://bitbucket.org/newgene/genedoc/src

Targeted use case: quickly build a gene-centric online resource without needing to maintain a local gene annotation database

Use it in a web application:

Server side:
  Make direct HTTP calls

Client side:
  Set up a server-side proxy
  JSONP calls
  Cross-domain AJAX calls via CORS (Cross-Origin Resource Sharing)
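For the server-side case, a direct HTTP call needs nothing beyond the standard library. The sketch below assumes the service returns a UTF-8 JSON body; `fetch_json` is a hypothetical helper, and its `opener` parameter exists only so the function can be exercised without network access.

```python
import json
from urllib.request import urlopen


def fetch_json(url, opener=urlopen):
    """GET a URL and decode the JSON body; `opener` is injectable for testing."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `fetch_json("http://mygene.info/gene/1017?filter=name,symbol,summary")` would return the decoded annotation object. The structure of that object is not specified here, so calling code should check for the fields it expects.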

Acknowledgement

Group members: Andrew Su, Ian MacLeod, Benjamin Good, Eric Clarke

GNF collaborators: Camilo Orozco, Jon Huss

Past contributor: Marc Leglise

Funding and Support

NIH grant: R01GM083924

ISMB travel support
