MyGene.Info: Gene Annotation as a Service - GAaaS Chunlei Wu [email protected] 2011/07/16

Upload: bioinformatics-open-source-conference


Page 1: F01-Cloud-Mygene.info

MyGene.Info: Gene Annotation as a Service - GAaaS

Chunlei Wu [email protected]

2011/07/16

Page 2: F01-Cloud-Mygene.info

A migration story for BioGPS

http://biogps.org

Page 5: F01-Cloud-Mygene.info

Gene-centric annotation data

A simple view:

http://biogps.org

Gene 1017
  → Symbol: CDK2
  → Ensembl: ENSG00000123374
  → RefSeq: NM_001798, NM_052827
  → Reporter:
      → U95A: 1792_g_at, 1833_at
      → U133A: 211804_s_at, 204252_at, 211803_at

Page 6: F01-Cloud-Mygene.info

A real example:

Symbol

Name

Alias

Summary

Ensembl

Refseq

UniGene

Homologene

GO

UniProt

InterPro

PDB

Prosite

IPI

And many more…

Page 7: F01-Cloud-Mygene.info

Relational database solutions

Solution 1: “star” schema

Master Table

GeneID  Symbol
1017    CDK2

Reporter Table

GeneID  Platform  Reporter
1017    U95A      1792_g_at
1017    U95A      1833_at
1017    U133A     211804_s_at
1017    U133A     204252_at
1017    U133A     211803_at

Ensembl Table

GeneID  EnsemblID
1017    ENSG00000123374

Refseq Table

GeneID  RefseqID
1017    NM_001798
1017    NM_052827

Page 8: F01-Cloud-Mygene.info

Relational database solutions

Solution 2: “weakly-typed” schema

Generic Data Table

ID  Type      Value            Parent  Root
1   GeneID    1017             NULL    1017
2   Symbol    CDK2             1017    1017
3   Ensembl   ENSG00000123374  1017    1017
4   RefSeq    NM_001798        1017    1017
5   RefSeq    NM_052827        1017    1017
6   Platform  U95A             1017    1017
7   Platform  U133A            1017    1017
8   Reporter  1792_g_at        U95A    1017
9   Reporter  1833_at          U95A    1017
10  Reporter  211804_s_at      U133A   1017
11  Reporter  204252_at        U133A   1017
12  Reporter  211803_at        U133A   1017

Page 9: F01-Cloud-Mygene.info

“Document”-based database solution

Gene 1017 (CDK2):

    {
      "Symbol": "CDK2",
      "Ensembl": "ENSG00000123374",
      "RefSeq": ["NM_001798", "NM_052827"],
      "Reporter": {
        "U95A": ["1792_g_at", "1833_at"],
        "U133A": ["211804_s_at", "204252_at", "211803_at"]
      }
    }
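The weakly-typed rows and the gene document describe the same data, so one can be folded into the other. The sketch below is only an illustration of that correspondence: the `ROWS` tuples are copied from the generic data table, while the function name `rows_to_document` and the tuple layout are assumptions made for this example, not part of the original slides.

```python
# Rows copied from the generic data table as (type, value, parent).
ROWS = [
    ("GeneID", "1017", None),
    ("Symbol", "CDK2", "1017"),
    ("Ensembl", "ENSG00000123374", "1017"),
    ("RefSeq", "NM_001798", "1017"),
    ("RefSeq", "NM_052827", "1017"),
    ("Platform", "U95A", "1017"),
    ("Platform", "U133A", "1017"),
    ("Reporter", "1792_g_at", "U95A"),
    ("Reporter", "1833_at", "U95A"),
    ("Reporter", "211804_s_at", "U133A"),
    ("Reporter", "204252_at", "U133A"),
    ("Reporter", "211803_at", "U133A"),
]


def rows_to_document(rows):
    """Fold weakly-typed rows for one gene into a single nested document."""
    doc = {"RefSeq": [], "Reporter": {}}
    for row_type, value, parent in rows:
        if row_type in ("Symbol", "Ensembl"):
            doc[row_type] = value                      # single-valued fields
        elif row_type == "RefSeq":
            doc["RefSeq"].append(value)                # multi-valued field
        elif row_type == "Platform":
            doc["Reporter"].setdefault(value, [])      # one list per platform
        elif row_type == "Reporter":
            doc["Reporter"].setdefault(parent, []).append(value)
    return doc


DOC = rows_to_document(ROWS)
```

Reading all reporters for a platform then becomes a plain nested lookup such as `DOC["Reporter"]["U133A"]`, with no joins or parent/root bookkeeping.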

Page 10: F01-Cloud-Mygene.info

What’s CouchDB

Document-based (“schema-free”) database

Index and query data in MapReduce fashion using JavaScript

RESTful JSON API

Bi-directional replicator

Distributed
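To make the MapReduce-style indexing concrete, here is a rough emulation in plain Python. CouchDB itself runs map functions written in JavaScript on the server; the names `map_reporters` and `build_view` below are hypothetical and only mimic the idea of emitting key/value rows per document and keeping them sorted by key.

```python
GENES = {
    "1017": {
        "Symbol": "CDK2",
        "Reporter": {
            "U95A": ["1792_g_at", "1833_at"],
            "U133A": ["211804_s_at", "204252_at", "211803_at"],
        },
    },
}


def map_reporters(doc_id, doc):
    """Map step: emit one (reporter, gene id) row per reporter in a document."""
    for reporters in doc.get("Reporter", {}).values():
        for reporter in reporters:
            yield reporter, doc_id


def build_view(docs, map_fn):
    """Apply the map function to every document and sort the rows by key."""
    rows = []
    for doc_id, doc in docs.items():
        rows.extend(map_fn(doc_id, doc))
    return sorted(rows)


def lookup(view, key):
    """Return the values of all rows whose key matches exactly."""
    return [value for row_key, value in view if row_key == key]


VIEW = build_view(GENES, map_reporters)
```

With such an index, `lookup(VIEW, "204252_at")` answers the question "which gene does this reporter belong to?" without scanning every document at query time.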

Page 11: F01-Cloud-Mygene.info

Load data into “CouchDB”

NCBI “gene_info” file → create a bare-bone gene “document” for each gene

Append more annotation to each “document”: “gene2refseq”, “gene2ensembl”, “u95a_annot”, “u133a_annot”, and more …

Easy to add new data types

Easy to update incrementally
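The loading steps can be sketched in a few lines of Python. This is an in-memory illustration only: the two tab-separated snippets are simplified stand-ins for the NCBI files (the real files have many more columns, and the column names used here are assumptions), and the documents are kept in a dict rather than written to CouchDB.

```python
import csv
import io

# Simplified stand-ins for the NCBI files; column names are hypothetical.
GENE_INFO = "GeneID\tSymbol\n1017\tCDK2\n"
GENE2REFSEQ = "GeneID\tRNA\n1017\tNM_001798\n1017\tNM_052827\n"


def create_documents(gene_info_text):
    """Step 1: create a bare-bone document for each gene."""
    docs = {}
    for row in csv.DictReader(io.StringIO(gene_info_text), delimiter="\t"):
        docs[row["GeneID"]] = {"Symbol": row["Symbol"]}
    return docs


def append_annotation(docs, mapping_text, field, column):
    """Step 2: append one more annotation type to the existing documents."""
    for row in csv.DictReader(io.StringIO(mapping_text), delimiter="\t"):
        doc = docs.get(row["GeneID"])
        if doc is None:
            continue  # no bare-bone document for this gene, skip the row
        values = doc.setdefault(field, [])
        if row[column] not in values:  # re-running does not duplicate values
            values.append(row[column])


DOCS = create_documents(GENE_INFO)
append_annotation(DOCS, GENE2REFSEQ, field="RefSeq", column="RNA")
```

Adding a new data type is one more `append_annotation` call with a different field name, and calling it again with the same input leaves the documents unchanged, which is the property that makes incremental updates straightforward.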

Page 12: F01-Cloud-Mygene.info

What’s behind Mygene.info

Page 13: F01-Cloud-Mygene.info

Gene Annotation as a Service

http://MyGene.Info

Gene annotation services go PUBLIC

Gene query service

http://mygene.info/query?q=<query>

Gene annotation service

http://mygene.info/gene/<geneid>

Page 14: F01-Cloud-Mygene.info

Gene Query Service

Input: user query. Output: matching gene IDs/symbols/names (JSON output)

http://mygene.info/query?q=<query>

Examples:

http://mygene.info/query?q=cdk2

http://mygene.info/query?q=cdk2+AND+species:human

http://mygene.info/query?q=cdk?

http://mygene.info/query?q=p*

http://mygene.info/query?q=entrezgene:1017

http://mygene.info/query?q=ensemblgene:ENSG00000123374
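The example URLs follow one pattern, which a small helper can reproduce. This is a minimal sketch assuming the URL scheme exactly as written on the slide (the live service may have changed since); the helper name `query_url` is hypothetical.

```python
from urllib.parse import quote_plus

BASE_URL = "http://mygene.info"  # as given on the slide


def query_url(query, base_url=BASE_URL):
    """Build a gene query URL; spaces become '+', as in the slide examples."""
    # ':', '?' and '*' are left unescaped so field prefixes and wildcards
    # stay readable, matching the URLs shown on the slide.
    return f"{base_url}/query?q={quote_plus(query, safe=':?*')}"


EXAMPLES = [
    query_url("cdk2"),
    query_url("cdk2 AND species:human"),
    query_url("cdk?"),
    query_url("entrezgene:1017"),
]
```

Building the URL through a helper rather than by string concatenation keeps user-typed queries containing spaces or other reserved characters from producing malformed requests.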

Page 15: F01-Cloud-Mygene.info

Gene Annotation Service

Input: gene id. Output: full or filtered gene annotation object (JSON output)

http://mygene.info/gene/<geneid>

Examples:

http://mygene.info/gene/1017

http://mygene.info/gene/ENSG00000123374

http://mygene.info/gene/1017?filter=name,symbol,summary

http://mygene.info/gene/1017?filter=name,symbol,refseq.rna

Species supported: human, mouse, rat, fruitfly, nematode, zebrafish, thale cress, frog
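A direct call to the annotation service might look like the following sketch. It assumes the `/gene/<geneid>` path and `filter` parameter exactly as shown on the slide, which may not match the service as deployed today; `gene_url` and `fetch_json` are hypothetical helper names, and `fetch_json` is defined but not called because it needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://mygene.info"  # as given on the slide


def gene_url(gene_id, fields=None, base_url=BASE_URL):
    """Build an annotation URL, optionally restricted to selected fields."""
    url = f"{base_url}/gene/{quote(str(gene_id), safe='')}"
    if fields:
        # Comma-separated field names; dotted names select nested fields,
        # as in the slide example "refseq.rna".
        url += "?filter=" + ",".join(fields)
    return url


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


FULL = gene_url(1017)
FILTERED = gene_url(1017, fields=["name", "symbol", "refseq.rna"])
```

A server-side caller would then do something like `fetch_json(FILTERED)` and work with the returned dictionary; requesting only the needed fields keeps responses small when many genes are looked up.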

Page 16: F01-Cloud-Mygene.info

Demo and full documentation at http://mygene.info

Source code: https://bitbucket.org/newgene/genedoc/src

Targeted use case: Quickly build a gene-centric online resource without the need to maintain a local gene annotation database

Use it in a web application:

Server side
  Making direct HTTP calls

Client side
  Set up a server-side proxy
  JSONP calls
  Cross-domain AJAX calls via CORS (Cross-Origin Resource Sharing)
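Of the client-side options, the server-side proxy is the easiest to illustrate outside a browser. The sketch below is a bare WSGI application that forwards the request path and query string to the upstream service, so browser code can call its own origin. It is an assumption-laden illustration, not the BioGPS implementation: `make_proxy_app` and `fake_fetch` are hypothetical names, the upstream call is injected as a function so the example runs without network access, and there is no error handling or path whitelisting.

```python
import json


def make_proxy_app(fetch, upstream="http://mygene.info"):
    """Return a WSGI app that forwards GET requests to the upstream service.

    `fetch` is any callable that takes a URL and returns the body as bytes.
    """
    def app(environ, start_response):
        url = upstream + environ.get("PATH_INFO", "/")
        query = environ.get("QUERY_STRING", "")
        if query:
            url += "?" + query
        body = fetch(url)
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app


def fake_fetch(url):
    """Stand-in for a real HTTP call: echoes the URL it was asked to fetch."""
    return json.dumps({"requested": url}).encode("utf-8")


APP = make_proxy_app(fake_fetch)
```

In a real deployment `fake_fetch` would be replaced by a function that performs the HTTP request, and the app would be mounted under a path of the web application's own domain so that no cross-domain call is needed from the browser.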

Page 17: F01-Cloud-Mygene.info

Acknowledgement

Group members: Andrew Su, Ian MacLeod, Benjamin Good, Eric Clarke

GNF collaborators: Camilo Orozco, Jon Huss

Past contributor: Marc Leglise

Funding and Support: NIH grant R01GM083924; ISMB travel support

http://mygene.info