Warm-up for a discussion on harmonizing similar initiatives in the NBIC sequencing, metabolomics, proteomics and biobanking task forces (plus friends such as NuGO, EBI, GEN2PHEN, BBMRI-NL, SysMO, EU-PANACEA and the Groningen Genomics Coordination Center).

LIMS (laboratory information management system), AKA study capturing framework, AKA sample treatment tracker, AKA investigation metadata annotator

Upload: clarissa-shields

Post on 31-Dec-2015



Outline

• What do we mean by LIMS / SCF?

• Ingredients for collaboration

• Suggestive discussion topics

LIMS/SCF is a portal for

• Sequencing, genotyping, microarrays, mass spec, …
• Individuals, samples, protocols, results, background info
• Peak finding, SNP analysis, GWAS, xQTL, ...

Examples DSP/NuGO

Courtesy Kees van Bochove & team, NMC & NuGO

Examples Proteomics

• CilairDB
• Corra
• OpenBIS
• OpenMS
• …

http://www.cisd.ethz.ch/software/openBIS/HCS

Examples Sequencing/Genotyping

• SequenceLIMS
• ChIPLIMS Nijmegen
• GenotypeLIMS
• IBIDAS?
• iSeq
• …

Courtesy Joris Lops, GCC & LifeLines

Examples Biobanking

• QTL XGAP/EU-PANACEA
• GWAS XGAP/LifeLines
• HGVbaseG2P 2.0
• …

Courtesy Joeri van der Velde & friends, GCC & LifeLines

Working hypotheses

1. Each platform has one or more study ‘portals’
• Captures all wet-lab and dry-lab flows
• Links to (or copies from) public annotations
• Provides value and data inputs for pipelines
• Stores provenance and results of all pipeline runs (as result files)

2. All tools developed in BioAssist will be connected to them
• Need to think about user interaction
• Need to think about data exchange (formats)
• i.e. what does the biologist want?

3. We can benefit greatly if we harmonize and share work
• Each domain has specific needs, but we can still share data models, user interfaces, back-ends, …
• Is coordinating this a task for CET?

Ingredients for collaboration

1. Conceptual model
• To capture all data, including variation/extension mechanisms

2. Exchange formats
• To exchange between public and private databases

3. User interfaces
• Data import wizards
• Extraction / query modules
• Platforms for analysis!!!

4. Backend engines
• Large scale binary data
• Automatic generation of services/pipelines

1. Conceptual model

• Targets: the thing being followed
AKA: Individuals, Samples, Panels/Groups, Material

• Features: an abstract property of a target
AKA: Characteristics, Comments

• Values: a concrete property of a target (at a certain time)
AKA: Data

• Protocols: description of an activity
AKA: EventType, Template

• ProtocolApplications: use of a protocol that produced (a) value
AKA: Events, Activity, Assay

• Investigation: some container of the above + contacts/publications
AKA: Study, Project, Laboratory, Partner
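
The six concepts above can be sketched as plain data classes. This is only an illustration of how they relate, not the schema of any of the tools mentioned in this talk; all class names, field names and example values are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Target:
    """The thing being followed, e.g. an individual, sample or panel."""
    name: str


@dataclass
class Feature:
    """An abstract property that can be observed on a target."""
    name: str
    unit: Optional[str] = None


@dataclass
class Protocol:
    """Description of an activity."""
    name: str


@dataclass
class ProtocolApplication:
    """One concrete use of a protocol at a certain time."""
    protocol: Protocol
    time: datetime


@dataclass
class Value:
    """A concrete property of a target, optionally tied to the activity that produced it."""
    target: Target
    feature: Feature
    value: str
    produced_by: Optional[ProtocolApplication] = None


@dataclass
class Investigation:
    """Container for values plus contacts (publications omitted for brevity)."""
    name: str
    contacts: List[str] = field(default_factory=list)
    values: List[Value] = field(default_factory=list)

    def values_for(self, target_name: str) -> List[Value]:
        """Return every value recorded for the named target."""
        return [v for v in self.values if v.target.name == target_name]


# Hypothetical example data, for illustration only.
height = Feature("Height", unit="cm")
ind1 = Target("Ind1")
measuring = ProtocolApplication(Protocol("Measure height"), datetime(2010, 1, 1))
study = Investigation("Example study")
study.values.append(Value(ind1, height, "179", produced_by=measuring))
```

Because a Value only points at a Target and a Feature, any feature can be recorded for any target without changing the model; domain-specific needs would be covered by adding Feature instances rather than new columns.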

‘Pheno-OM’ (generic variation mechanism)

Flexible: any feature, value, and target combo

[Diagram of the Pheno-OM classes: ObservedValue, ObservationTarget (Panel, Individual), ObservableFeature, Protocol, ProtocolApplication, ObservedRelation and InferredValue, several with a time attribute. Example instance: target Ind1, feature Height, value 179cm.]

XGAP (extension-based variation mechanism)

Swertz et al (2010) Genome Biology 11(3).

[Diagram: a DATA matrix holds DATA ELEMENTs; its rows and columns are dimension ELEMENTs, each either a TRAIT or a SUBJECT.]

TRAIT extensions
• PROBE: Name, Gene, Chromosome, Locus
• MARKER: Name, Allele, Chromosome, Locus
• MASSPEAK: Name, MZ, RetentionTime
• And so on…

SUBJECT extensions
• PANEL: Name, Type (CSS, RIL, ..), Parent Panels
• INDIVIDUAL: Name, Strain, Mother, Father, Sex
• SAMPLE: Name, Individual, Tissue
• And so on…
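
As a rough sketch of the extension-based idea: specific element types extend a generic trait or subject, and a data matrix only needs to know that its rows and columns are dimension elements. The class names follow the slide, but the Python code, the reduced field lists and the example values are assumptions for illustration, not the XGAP implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DimensionElement:
    """Anything that can label a row or column of a data matrix."""
    name: str


@dataclass(frozen=True)
class Trait(DimensionElement):
    """Generic trait; extended per domain (probe, marker, mass peak, ...)."""


@dataclass(frozen=True)
class Subject(DimensionElement):
    """Generic subject; extended per domain (panel, individual, sample, ...)."""


@dataclass(frozen=True)
class Marker(Trait):
    allele: str
    chromosome: str
    locus: str


@dataclass(frozen=True)
class Individual(Subject):
    strain: str
    sex: str


class DataMatrix:
    """Values indexed by (row element, column element)."""

    def __init__(self, rows: List[DimensionElement], columns: List[DimensionElement]):
        self.rows = rows
        self.columns = columns
        self._cells: Dict[Tuple[str, str], str] = {}

    def put(self, row: DimensionElement, column: DimensionElement, value: str) -> None:
        # Reject elements that are not part of this matrix's dimensions.
        if row not in self.rows or column not in self.columns:
            raise KeyError("row or column is not part of this matrix")
        self._cells[(row.name, column.name)] = value

    def get(self, row: DimensionElement, column: DimensionElement) -> str:
        return self._cells[(row.name, column.name)]


# Hypothetical example data, for illustration only.
m1 = Marker("m1", allele="A/B", chromosome="1", locus="12.5")
ind1 = Individual("Ind1", strain="RIL-1", sex="female")
genotypes = DataMatrix(rows=[m1], columns=[ind1])
genotypes.put(m1, ind1, "A")
```

A new domain would add another subclass of Trait or Subject; the matrix code stays unchanged, which is the point of the extension mechanism.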

ISA-TAB (generic model)

Differs from MAGE-TAB
• Nested investigations (as studies)
• Templates for assays
• More aligned to FuGE
• But some find it too difficult

ISA =
• Investigation
• Study (a component of Investigation)
• Assay (a component of Study)
• Data files

Still in testing phase though…

http://isatab.sf.net
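
The Investigation / Study / Assay / data files nesting can be pictured with a few nested classes. This sketch only shows the containment hierarchy; it does not read or write the ISA-TAB file format, and all names and file names in it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A component of a study that points at its data files."""
    name: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """A component of an investigation, holding one or more assays."""
    name: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """Top-level container of studies."""
    name: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        """Walk the nesting and collect every data file, in order."""
        return [
            path
            for study in self.studies
            for assay in study.assays
            for path in assay.data_files
        ]


# Hypothetical example data, for illustration only.
investigation = Investigation(
    "Example investigation",
    studies=[
        Study("Study A", assays=[Assay("Microarray", ["a1.txt", "a2.txt"])]),
        Study("Study B", assays=[Assay("Mass spec", ["b1.txt"])]),
    ],
)
```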

MIBBI

• MIBBI Minimum Information for Biological and Biomedical Investigations (total 31 areas)

http://mibbi.sourceforge.net

Taylor et al 2008 Nature Biotechnology 26(8), p 889

MIAME Minimum Information About a Microarray Experiment

MIAPA Minimum Information About a Phylogenetic Analysis

MIAPAR Minimum Information About a Protein Affinity Reagent

MIAPE Minimum Information About a Proteomics Experiment

MIARE Minimum Information About an RNAi Experiment

MIFlowCyt Minimum Information for a Flow Cytometry Experiment

MIGen Minimum Information about a Genotyping Experiment

MIGS Minimum Information about a Genome Sequence

MIMPP Minimal Information for Mouse Phenotyping Procedures

MINSEQE Minimum Information about a high-throughput SeQuencing Experiment

MIPFE Minimal Information for Protein Functional Evaluation

MIQAS Minimal Information for QTLs and Association Studies


2. Data formats

Basic

• CSV

• XML

• RDF/Atom

Specific

• MAGE-TAB

• MOLGENIS

• APML

• …
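
To make the 'basic' end of this list concrete, here is a minimal sketch of exchanging target/feature/value records as CSV using only the Python standard library. The column names and records are assumptions for illustration; none of the specific formats above (MAGE-TAB, MOLGENIS, APML) is implemented here.

```python
import csv
import io
from typing import Dict, List

COLUMNS = ["target", "feature", "value"]


def to_csv(records: List[Dict[str, str]]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text (with a header row) back into records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical example data, for illustration only.
records = [
    {"target": "Ind1", "feature": "Height", "value": "179cm"},
    {"target": "Ind2", "feature": "Height", "value": "165cm"},
]
text = to_csv(records)
round_tripped = from_csv(text)
```

A round trip like this is the simplest test drive of an exchange format: what one system exports, the other should import unchanged.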


• Connect to R statistics
• Workflow-ready web services
• UML documentation of your model
• Edit & trace your data
• Import/export to Excel
• Plug in your own scripts (OntBrowse)

Tech keywords: object-oriented data models, multi-platform Java, Tomcat/GlassFish web server, MySQL/PostgreSQL database, Eclipse/NetBeans IDE, Java API, WSDL/SOAP API, R-project API, MVC, FreeMarker templates and CSS for custom layout, open source.

find.investigation()
102 downloaded

obs <- find.observedvalue(…)
43,920 downloaded

# some calculation
add.inferredvalue(res)
36 added

3. User interfaces

3. User interfaces (import wizards)


http://www.obofoundry.org/
http://bioportal.bioontology.org/ (REST services)
http://www.ebi.ac.uk/ontology-lookup/ (SOAP services)
http://ontocat.sf.net (simple API around BioPortal)

ADD PICTURE OF GSCF

3. User interfaces (compute platform)

Courtesy Arends & van der Velde

Things to discuss as next steps?

Put all people/tools in this room on the table

• Agree on exchange formats & models (generic/specific)

• Test drive data exchange or even federation

Share the work

• Communicate requirements and plans

• Reuse each other's user interface components

• Share scalable back-ends (for high throughput data)

Invest in technology interoperation

• Invest in Galaxy callback to MOLGENIS/Grails (data chooser)?

• Invest in a MOLGENIS to Grails generator (must be easy)?

Something for NBIC mgmt team to think about

Extra

• XGAP wizard here

Acknowledgements

• Morris Swertz, Kees van Bochove, Erik Roos, Joris Lops, Joeri van der Velde, GEN2PHEN, MAGE-TAB, XGAP, ISA-TAB, FuGE, GSCF teams