Swertz Metadata Capture Symposium 2010-11-09 · Morris Swertz and the MOLGENIS team

Page 1:

EBI

Overview for Metadata Capture Symposium

November 10, Utrecht

Morris Swertz and the MOLGENIS team*

BBMRI-NL, EU-GEN2PHEN, EU-CASIMIR, EU-EURATRANS, LifeLines, EU-SYSGENET, EU-PANACEA, NBIC and other consortia

*Genomics Coordination Center Groningen

Page 2:

MOLGENIS mission

Grow a product family that supports all *omics experiments…

sharing models and software notwithstanding large variation

Page 3:

Genomics Coordination Center – UMC/University Groningen

Page 4:

[Figure: eQTL study workflow. Step labels: select, genome, cohorts, individuals, genotype, genotypes, markers, probes, microarrays, hybridize, expressions, preprocess, norm. exprs., map, eQTL profiles, correlate, network. Each step is annotated with its data volume, from 100 up to 1,000,000 or more.]

Our biologist challenges

Page 5:

Wanted:

infrastructures for

More biologist challenges:

NGS * 750

Dutch Control Cohort * 80K

Local biobanks * 200

[Figure: genotyping and sequencing workflow. Step labels: select, genome, panels, individuals, genotype, preprocess, probes, microarrays, hybridize, SNPs, call, phased, HapMapNL, impute, sequence, QC, etc. Each step is annotated with its data volume, from 100 up to 5,000,000 or more.]

Page 6:

Connect to annotation services

Plugin rich analysis tools

Connect to statistics

UML documentation of your model

Edit & trace your data

Import/export to Excel

find.investigation()
102 downloaded
obs <- find.observedvalue(...)
43,920 downloaded
#some calculation
add.inferredvalue(res)
36 added

Wanted: ‘dynamic’ data and processing infrastructure

Page 7:

MOLGENIS method

Page 8:

Our challenges

[Figure: QTL study workflow on inbred strains. Step labels: select, genome, strains, individuals, genotype, genotypes, markers, probes, microarrays, hybridize, expressions, preprocess, norm. exprs., map, QTL profiles, correlate, network, with data volumes from 10 up to 1,000,000 or more. Around the workflow: GUI (biologist); STORE, Logic, ANALYSE, ANNOTATE, Exchange services, etc. (bioinformatician).]

Page 9:

Use

Animal Observatory

NextGenSeq data

Mutation data

Model organisms data

Researcher needs

Work very hard

Situation before MOLGENIS

Page 10:

http://www.molgenis.org

Swertz et al (2010) BMC Bioinformatics, accepted

Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243

Swertz et al (2004) Bioinformatics 20(13), 2075-83

Page 11:

Using MOLGENIS

NextGenSeq

Mutation database

Model organisms

Model

Use generated software

Animal Observatory

Generator

repeat often

http://www.molgenis.org

Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243

Swertz et al (2004) Bioinformatics 20(13), 2075-83

Page 12:

End products on top of MOLGENIS

Mutations

Biobank

Sequencing

Proteo/Metabolomics

Animal LIMS

GWAS / QTL studies

And more …

Page 13:

MOLGENIS contributors

GCC/MOLGENIS contributors

Morris Swertz

Erik Roos

Joeri van der Velde

Robert Wagner

Joris Lops

Danny Arends

Despoina Antonakaki

Alex Kanterakis

Jessica Lundberg

Andre de Vries

George Byelas

Freerk van Dijk

And many others

External contributors

Tomasz Adamusiak

Juha Muilu

Gudmundur Thorisson

Sirisha Gollapudi

Helen Parkinson

Pedro Lopes

And many others

Emphasis on collaboration

BBMRI-NL biobanking (Hs)

EU-GEN2PHEN consortium (Hs)

EU-PANACEA consortium (Ce)

EU-EURATRANS consortium (Rn)

NL Brassica Nutr. consortium (At)

EU-CASIMIR consortium (Mm)

NBIC/BioAssist consortium (bioinfo)

And others


Page 14:

MOLGENIS family features

All MOLGENIS systems differ in their model but share the common toolchain

Page 15:

Data exchange and loading

http://www.xgap.org

Swertz, van der Velde et al (2010) Genome Biology 11(3): R27.

Page 16:

Data loading

http://www.xgap.org

Swertz, van der Velde et al (2010) Genome Biology 11(3): R27.

Page 17:

User interfacing

E.g. Pheno-OM

› 102 biobank studies (investigations)

› 2,042 features

› 42,939 individuals

› 287 panels

› 196 protocols

› 140 ontology terms

+170 dbGaP investigations

Demo: http://wwwdev.ebi.ac.uk/microarray-srv/pheno/

Source: https://svn.gene.le.ac.uk/gen2phen/pheno-model

Page 18:

http://www.xgap.org

Swertz, van der Velde et al (2010) Genome Biology 11(3): R27.

Data exploration

QTL studies

Page 19:

Programmatic interfaces R, REST, SOAP, …

Scientist A uploads raw data:

#connect to my XGAP database
source("http://aserver/xgap/api/R")
#upload my 'metanetwork' investigation
add.investigation(name="metanetwork")
#use 'metanetwork' investigation
use.investigation(name="metanetwork")
#upload subjects and traits
add.marker(name=rownames(markers), chr=markers$chr, cm=markers$cM)
add.metabolite(name=rownames(metabolites))
add.subject(name=colnames(genotypes))
#upload genotype and phenotype data matrices
add.datamatrix(geno, name="geno", rowtype="marker", coltype="subject", valuetype="text")
add.datamatrix(mpheno, name="mexpr", rowtype="metabolite", coltype="subject", valuetype="double")

Scientist B uploads analysis results:

#connect to XGAP database
source("http://aserver/xgap/api/R")
#use 'metanetwork' investigation
use.investigation(name="metanetwork")
#list available data sets
find.datasets()
#download genotype and phenotype datasets
geno <- find.datamatrix(name="geno")
mpheno <- find.datamatrix(name="mexpr")
markers <- find.markers()
#calculate & plot (Fu 2007, Nature Protocols)
mqtls <- qtlMapTwoPart(geno, mpheno, spike=4)
qtlPlot(markers, mqtls, 4)
#upload qtl result matrix
add.datamatrix(mqtls, name="qtlprofiles", rowtype="metabolite", coltype="marker", valuetype="double")

[Figure: XGAP in the centre, exchanging markers, subjects, genotypes, phenotype (a mass spectrum, m/z 100 to 1000) and QTLs with the two scripts.]

http://www.xgap.org

Swertz et al (2010) Genome Biology 11(3): R27.

Page 20:

Big QTL, GWAS, NGS, Proteomics computing

Page 21:

Data analysis using cloud/cluster

See poster Q01:

User friendly cluster computing for R/QTL analysis on XGAP

Page 22:

Current work

• Merge all data models

• Next session

Page 23:

Current work: more pipelines

• Galaxy tool defs?

• Taverna flows?

Genome of the Netherlands

http://www.bbmriwiki.nl

Page 24:

Current work: semantic tools

D2RQ, Lucene, OntoCAT, RDF/OWL

Panacea

GEN2PHEN

LifeLines

Deformed ears?

query expansion

HPO: Abnormally shaped ears, Auricular malformation, Deformed auricles

MP: Malformed auricles, Malformed ears, Malformed external ears

etc

Local ontologies (OWL or OBO)

CWA

BioPortal

OLS

OntoCAT – Ontology common API tasks

http://www.ontocat.org and http://precedings.nature.com/documents/4666

Abnormally shaped ears ☺
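
The query expansion on this slide can be sketched in a few lines of R, the language of the XGAP API examples earlier in the talk. This is a minimal illustration, not how OntoCAT or MOLGENIS implement it. The synonym table is typed in by hand from the terms on the slide, the grouping of synonyms under the HPO and MP terms is an assumption, and listing "Deformed ears" as a synonym of both terms is also an assumption. `expand_query` and `search_records` are hypothetical helper names. A real system would fetch synonyms from an ontology service rather than from a hard-coded list.

```r
# Hypothetical synonym table, typed in from the slide (grouping is assumed).
synonyms <- list(
  "HPO:Abnormally shaped ears" = c("Abnormally shaped ears", "Auricular malformation",
                                   "Deformed auricles", "Deformed ears"),
  "MP:Malformed auricles" = c("Malformed auricles", "Malformed ears",
                              "Malformed external ears", "Deformed ears")
)

# Expand a free-text query with every label of each ontology term
# that lists the query among its labels (case-insensitive match).
expand_query <- function(query, table = synonyms) {
  q <- tolower(trimws(query))
  hit <- vapply(table, function(labels) q %in% tolower(labels), logical(1))
  unique(c(query, unlist(table[hit], use.names = FALSE)))
}

# Return the records whose text equals any of the expanded terms.
search_records <- function(query, records) {
  terms <- tolower(expand_query(query))
  records[tolower(records) %in% terms]
}

# Toy phenotype annotations from different (hypothetical) databases.
records <- c("Malformed ears", "Abnormally shaped ears", "Short stature")
search_records("Deformed ears", records)
```

Without expansion, a literal search for "Deformed ears" would match none of the three toy records. With expansion, it finds the two ear-related ones.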

Page 25:

MOLGENIS summary

1. Flexible models

• From biobank to local researcher to community

• eXtensible Genotype And Phenotype model

2. Flexible software

• Get much more because of open source sharing

• Agile development: short cycles with ‘client’

3. Enabling modules

• From cluster backends and workflows to RDF/OWL

• Spinouts like OntoCAT

Page 26:

Web

• MOLGENIS: http://www.molgenis.org

• XGAP: http://www.xgap.org

• OntoCAT: http://www.ontocat.org

• BBMRI-NL wiki: http://www.bbmriwiki.nl

Read

• Swertz et al (2010) BMC Bioinformatics, due December.

• Swertz et al (2010) Genome Biology 11(3): R27.

• Smedley et al (2008) Briefings in Bioinformatics 9(6): 532-544

• Swertz & Jansen (2007) Nature Reviews Genetics 8, 235-243

• Swertz et al (2004) Bioinformatics 20(13), 2075-83

Thank you!
