Warm-up for discussion on the harmonization of similar initiatives in the NBIC sequencing, metabolomics, proteomics and biobanking task forces (+ friends like NuGO, EBI, GEN2PHEN, BBMRI-NL, SysMO, EU-PANACEA, Groningen Genomics Coordination Center).
LIMS: laboratory information management system, AKA study capturing framework, AKA sample treatment tracker, AKA investigation metadata annotator
Outline
• What do we mean by LIMS / SCS?
• Ingredients for collaboration
• Suggested discussion topics
LIMS/SCF is a portal for
• Peak finding, SNP analysis, GWAS, xQTL, ...
• Individuals, samples, protocols, results, background info
• Sequencing, genotyping, microarrays, mass spec, …
Examples: Proteomics
• CilairDB
• Corra
• OpenBIS (http://www.cisd.ethz.ch/software/openBIS/HCS)
• OpenMS
• …
Examples: Sequencing/Genotyping
• SequenceLIMS
• ChIPLIMS Nijmegen
• GenotypeLIMS
• IBIDAS?
• iSeq
• …
Courtesy Joris Lops, GCC & LifeLines
Examples: Biobanking
• QTL XGAP/EU-PANACEA
• GWAS XGAP/LifeLines
• HGVbaseG2P 2.0
• …
Courtesy Joeri van der Velde & friends, GCC & LifeLines
Working hypotheses
1. Each platform has one or more study ‘portals’
• Captures all wet-lab and dry-lab flows
• Links to (or copies from) public annotations
• Provides value and data inputs for pipelines
• Stores provenance and results of all pipeline runs (as result files)
2. All tools developed in BioAssist will be connected to them
• Need to think about user interaction
• Need to think about data exchange (formats)
• I.e. what does the biologist want?
3. We can benefit greatly if we harmonize and share work
• Each domain has specific needs but we can still share
• Data models, user interfaces, back-ends, …
• Is coordinating this a task for CET?
Ingredients for collaboration
1. Conceptual model
• To capture all data, including variation/extension mechanisms
2. Exchange formats
• To exchange between public and private databases
3. User interfaces
• Data import wizards
• Extraction / query modules
• Platforms for analysis!!!
4. Backend engines
• Large scale binary data
• Automatic generation of services/pipelines
1. Conceptual model
• Targets: the thing being followed
AKA: Individuals, Samples, Panels/Groups, Material
• Features: an abstract property of a target
AKA: Characteristics, Comments
• Values: a concrete property of a target (at a certain time)
AKA: Data
• Protocols: description of an activity
AKA: EventType, Template
• ProtocolApplications: use of a protocol that produced (a) value
AKA: Events, Activity, Assay
• Investigation: a container for the above, plus contacts/publications
AKA: Study, Project, Laboratory, Partner
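The six concepts above can be sketched as plain data classes. This is only an illustration of how they might relate, not the schema of any of the tools mentioned; all class and field names are assumptions chosen to mirror the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Target:
    """The thing being followed (individual, sample, panel, material)."""
    name: str


@dataclass
class Feature:
    """An abstract property of a target, e.g. 'Height'."""
    name: str
    unit: Optional[str] = None


@dataclass
class Protocol:
    """Description of an activity."""
    name: str


@dataclass
class Value:
    """A concrete property of a target, optionally at a certain time."""
    target: Target
    feature: Feature
    value: str
    time: Optional[datetime] = None


@dataclass
class ProtocolApplication:
    """One use of a protocol, together with the values it produced."""
    protocol: Protocol
    time: datetime
    values: List[Value] = field(default_factory=list)


@dataclass
class Investigation:
    """Container for protocol applications plus contacts."""
    name: str
    contacts: List[str] = field(default_factory=list)
    protocol_applications: List[ProtocolApplication] = field(default_factory=list)

    def values_for(self, target_name: str) -> List[Value]:
        """Collect every value recorded for one target across all activities."""
        return [
            v
            for pa in self.protocol_applications
            for v in pa.values
            if v.target.name == target_name
        ]


# Hypothetical usage: one height measurement on one individual.
ind1 = Target("Ind1")
height = Feature("Height", unit="cm")
measurement = ProtocolApplication(Protocol("Measure height"), datetime(2010, 1, 1))
measurement.values.append(Value(ind1, height, "179", measurement.time))
study = Investigation("Demo study", protocol_applications=[measurement])
```

Keeping values attached to the protocol application that produced them is one way to retain provenance; a real system might instead store a reference in the other direction.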
‘Pheno-OM’ (generic variation mechanism)
Flexible: any feature, value, and target combo
[Diagram of entities: ObservedValue, ObservationTarget (Panel, Individual), ObservableFeature, Protocol, ProtocolApplication, ObservedRelation, InferredValue; values are recorded over time. Example: Ind1, Height, 179cm.]
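The "any feature, value, and target combo" idea can be illustrated with a long table of observations that is pivoted on demand. This is a minimal sketch, not the Pheno-OM implementation; apart from the Ind1 / Height / 179cm example from the slide, the rows, the integer time points and the function name are made up for illustration.

```python
from collections import namedtuple

# One row per observation: no fixed columns per feature are needed.
ObservedValue = namedtuple("ObservedValue", ["target", "feature", "value", "time"])

observations = [
    ObservedValue("Ind1", "Height", "179cm", 1),  # example from the slide
    ObservedValue("Ind1", "Weight", "80kg", 1),   # hypothetical
    ObservedValue("Ind1", "Weight", "78kg", 2),   # hypothetical, later time point
    ObservedValue("Ind2", "Height", "165cm", 1),  # hypothetical
]


def latest_values(rows):
    """Pivot rows to {target: {feature: value}}, keeping the most recent value."""
    table = {}
    for row in sorted(rows, key=lambda r: r.time):
        # Later time points overwrite earlier ones for the same target and feature.
        table.setdefault(row.target, {})[row.feature] = row.value
    return table


snapshot = latest_values(observations)
```

Adding a new feature then only means adding rows, which is the flexibility the slide points at; the cost is that typing and validation have to be handled elsewhere.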
XGAP (extension based variation mechanism)
Swertz et al (2010) Genome Biology 11(3).
[Diagram: a DATA ELEMENT has rows and columns, each a dimension ELEMENT; a dimension element is a TRAIT or a SUBJECT, extended per data type:]
• PROBE: Name, Gene, Chromosome, Locus
• MARKER: Name, Allele, Chromosome, Locus
• MASSPEAK: Name, MZ, RetentionTime
• PANEL: Name, Type (CSS, RIL, ..), Parent Panels
• INDIVIDUAL: Name, Strain, Mother, Father, Sex
• SAMPLE: Name, Individual, Tissue
• And so on…
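The extension-based approach can be sketched as a small class hierarchy plus a matrix whose rows and columns are labelled by dimension elements. This is an illustration under assumed names and fields, not the XGAP code base; the marker and individual records are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DimensionElement:
    """Anything that can label a row or column of a data matrix."""
    name: str


@dataclass(frozen=True)
class Marker(DimensionElement):
    """An extension that adds marker-specific fields."""
    chromosome: str = ""
    locus: str = ""


@dataclass(frozen=True)
class Individual(DimensionElement):
    """An extension that adds individual-specific fields."""
    strain: str = ""
    sex: str = ""


class DataMatrix:
    """Values indexed by (row element name, column element name)."""

    def __init__(self, rows: List[DimensionElement], columns: List[DimensionElement]):
        self.rows = {r.name: r for r in rows}
        self.columns = {c.name: c for c in columns}
        self._cells: Dict[Tuple[str, str], str] = {}

    def set(self, row: str, column: str, value: str) -> None:
        """Store a value, rejecting labels that were not declared up front."""
        if row not in self.rows or column not in self.columns:
            raise KeyError(f"unknown row or column: {row!r}, {column!r}")
        self._cells[(row, column)] = value

    def column_values(self, column: str) -> Dict[str, str]:
        """Return {row name: value} for one column."""
        return {r: v for (r, c), v in self._cells.items() if c == column}


# Hypothetical genotype matrix: markers as rows, individuals as columns.
markers = [Marker("m1", chromosome="1", locus="10.5"), Marker("m2", chromosome="2", locus="3.0")]
individuals = [Individual("ind1", strain="RIL1", sex="F"), Individual("ind2", strain="RIL2", sex="M")]
genotypes = DataMatrix(rows=markers, columns=individuals)
genotypes.set("m1", "ind1", "A")
genotypes.set("m2", "ind1", "B")
genotypes.set("m1", "ind2", "B")
```

The matrix itself stays generic; only the row and column element types carry domain-specific fields, which is the point of the extension mechanism.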
ISA-TAB (generic model)
Differs from MAGE-TAB
• Nested investigations (as studies)
• Allows template assays
• More aligned to FuGE
• But some find it too difficult
ISA =
• Investigation
• Study (a component of Investigation)
• Assay (a component of Study)
• Data files
Still in testing phase though…
http://isatab.sf.net
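The Investigation / Study / Assay / data file nesting can be pictured with a few nested records. This sketch only mirrors the hierarchy named on the slide; it does not read or write the ISA-TAB file format, and the class names, method and example file names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Assay:
    """A component of a study, pointing at its data files."""
    name: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """A component of an investigation, grouping assays."""
    name: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """Top-level container grouping studies."""
    name: str
    studies: List[Study] = field(default_factory=list)

    def walk_data_files(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (study, assay, data file) for every file in the hierarchy."""
        for study in self.studies:
            for assay in study.assays:
                for data_file in assay.data_files:
                    yield (study.name, assay.name, data_file)


# Hypothetical content, for illustration only.
investigation = Investigation(
    "Demo investigation",
    studies=[
        Study("Study A", assays=[Assay("Microarray", ["a1.txt", "a2.txt"])]),
        Study("Study B", assays=[Assay("Mass spec", ["b1.mzML"])]),
    ],
)
all_files = list(investigation.walk_data_files())
```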
MIBBI
• MIBBI: Minimum Information for Biological and Biomedical Investigations (total 31 areas)
http://mibbi.sourceforge.net
Taylor et al. (2008) Nature Biotechnology 26(8), p. 889
MIAME Minimum Information About a Microarray Experiment
MIAPA Minimum Information About a Phylogenetic Analysis
MIAPAR Minimum Information About a Protein Affinity Reagent
MIAPE Minimum Information About a Proteomics Experiment
MIARE Minimum Information About a RNAi Experiment
MIFlowCyt Minimum Information for a Flow Cytometry Experiment
MIGen Minimum Information about a Genotyping Experiment
MIGS Minimum Information about a Genome Sequence
MIMPP Minimal Information for Mouse Phenotyping Procedures
MINSEQE Minimum Information about a high-throughput SeQuencing Experiment
MIPFE Minimal Information for Protein Functional Evaluation
MIQAS Minimal Information for QTLs and Association Studies
• Connect to R statistics
• Workflow ready web-services
• UML documentation of your model
• Edit & trace your data
• Import/export to Excel
• Plug in your own scripts (OntBrowse)
Tech keywords: object-oriented data models, multi-platform Java, Tomcat/GlassFish web server, MySQL/PostgreSQL database, Eclipse/NetBeans IDE, Java API, WSDL/SOAP API, R-project API, MVC, FreeMarker templates and CSS for custom layout, open source.
find.investigation()
102 downloaded
obs <- find.observedvalue(…)
43,920 downloaded
# some calculation
add.inferredvalue(res)
36 added
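The same download, calculate, upload pattern can be sketched in Python against a stand-in store. The `InMemoryStore` class and its method names are hypothetical and only imitate the shape of the R calls on the slide; they are not an existing client library, and the records are invented.

```python
from statistics import mean
from typing import Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for a remote data store."""

    def __init__(self) -> None:
        self.observed: List[Dict] = []
        self.inferred: List[Dict] = []

    def find_observedvalue(self, feature: Optional[str] = None) -> List[Dict]:
        """Return observed values, optionally filtered by feature name."""
        return [r for r in self.observed if feature is None or r["feature"] == feature]

    def add_inferredvalue(self, rows: List[Dict]) -> int:
        """Store derived values and report how many were added."""
        self.inferred.extend(rows)
        return len(rows)


store = InMemoryStore()
store.observed = [
    {"target": "Ind1", "feature": "Height", "value": 179.0},
    {"target": "Ind2", "feature": "Height", "value": 165.0},
    {"target": "Ind1", "feature": "Weight", "value": 80.0},
]

# 1. Download the observed values of interest.
obs = store.find_observedvalue(feature="Height")

# 2. Some calculation: deviation of each height from the mean height.
avg = mean(r["value"] for r in obs)
res = [
    {"target": r["target"], "feature": "HeightMinusMean", "value": r["value"] - avg}
    for r in obs
]

# 3. Upload the results as inferred values, kept separate from observations.
added = store.add_inferredvalue(res)
```

Storing results as inferred rather than observed values keeps derived data distinguishable from measurements, which matches the separation the slide makes between the two calls.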
3. User interfaces
3. User interfaces (import wizards)
• http://www.obofoundry.org/
• http://bioportal.bioontology.org/ (REST services)
• http://www.ebi.ac.uk/ontology-lookup/ (SOAP services)
• http://ontocat.sf.net (simple API around BioPortal)
ADD PICTURE OF GSCF
Things to discuss as next steps?
Put all people/tools in this room on the table
• Agree on exchange formats & models (generic/specific)
• Test drive data exchange or even federation
Share the work
• Communicate requirements and plans
• Reuse each other's user interface components
• Share scalable back-ends (for high throughput data)
Invest in technology interoperation
• Invest in Galaxy callback to MOLGENIS/Grails (data chooser)?
• Invest in a MOLGENIS to Grails generator (must be easy)?
Something for NBIC mgmt team to think about