SysMO-DB: A Community-Based Approach to Data Sharing
Dr Katy Wolstencroft, University of Manchester
SysMO-DB
A data access, model handling and data integration platform for Systems Biology
A web-based resource that promotes shared understanding
Using a common platform and common technologies
Started July 2008
SysMO-DB Dev Team
University of Stellenbosch, South Africa; University of Manchester, UK; Heidelberg Institute for Theoretical Studies (HITS), Germany
Jacky Snoep, Olga Krebs, Wolfgang Müller, Sergejs Aleksejevs, Carole Goble, Stuart Owen, Katy Wolstencroft, Finn Bacall, Franco B du Preez
SysMO: Systems Biology of Microorganisms
http://www.sysmo.net
Pan-European collaboration: eleven individual projects, 89 institutes
Different research outcomes across a cross-section of microorganisms, including bacteria, archaea and yeast
Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
Present these processes in the form of computerized mathematical models
Pool research capacities and know-how
Running since April 2007; runs for 3-5 years. This year, 2 new projects join and 6 leave.
Challenges
Heterogeneous data and models
Distributed groups of researchers
Modellers and experimentalists have different skills, training and experience
Scientists want to remain in control
Social and technical challenges
Social Challenge: Focus Group
Participants: DB team, Focus Group, Projects
Show what is there; suggest what is possible
Ask for requirements
Give requirements; tell priorities
Rate outcomes; suggest improvements
Double check; transmit
Disseminate
Collect answers
Focus Group: the SysMO-DB PALs
21 postdocs and PhD students: modellers, experimentalists and bioinformaticians
Design and technical collaboration team
Intense collaboration between the UK and Continental PALs chapters
Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs…
20 questions
Deployment into projects
Technical Challenge
Rapid and incremental development
Just enough and just in time, not just in case
No reinvention
Driven by the PALs
Sustainable and extensible
Migrate to standards
Fit in with normal lab practices
What do we share
All SysMO assets: Methods + Data + Results
Methods follow the Nature Protocols structure:
Protocol: Title, Authors, Keywords, Abstract, Materials (Reagents, Reagent Set Up, Equipment), Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References
Protocols for Models
Protocol: Title, Authors, Keywords, Description, Assumptions, Equations, Numerical Methods/Algorithms, Computational Tools, Parameter Estimation Techniques, Limitations, References
What do we share
All SysMO assets: Methods + Data + Results + Models
A Tree View of Assets
Investigation → Studies → Assays (e.g. Construction, Validation), with associated SOPs
ISA infrastructure provides a directory structure for experiments
http://isatab.sourceforge.net/
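The Investigation → Study → Assay hierarchy can be pictured as a simple tree. The sketch below is only an illustration, assuming hypothetical class names (`Investigation`, `Study`, `Assay`) and SOPs attached at the assay level; it is not the SysMO-SEEK or ISA-Tab data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    sops: List[str] = field(default_factory=list)  # titles of linked SOPs (hypothetical)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def tree_lines(inv: Investigation) -> List[str]:
    """Render an investigation as an indented tree, one node per line."""
    lines = [f"Investigation: {inv.title}"]
    for study in inv.studies:
        lines.append(f"  Study: {study.title}")
        for assay in study.assays:
            lines.append(f"    Assay: {assay.title}")
            for sop in assay.sops:
                lines.append(f"      SOP: {sop}")
    return lines


if __name__ == "__main__":
    inv = Investigation("Example investigation", [
        Study("Example study", [
            Assay("Construction", ["Example SOP 1"]),
            Assay("Validation", ["Example SOP 2"]),
        ]),
    ])
    print("\n".join(tree_lines(inv)))
```

Keeping each level as its own type makes the directory-like structure explicit: every assay belongs to exactly one study, and every study to one investigation.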
Expertise, tools
Coordinates, data
How do we share
The “Just Enough Results Model” (JERM) asks:
What type of data is it? Microarray, growth curve, enzyme activity…
What was measured? Gene expression, OD, metabolite concentration…
What do the values in the datasets mean? Units, time series, repeats…
Based on:
Minimum information models, e.g. MIAME, MIAPE, MIRIAM
Biological ontologies, e.g. Gene Ontology, MGED, SBO
The BioPortal web service is used in SysMO-SEEK for concept lookup and visualisation
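The three JERM questions can be thought of as a small metadata record attached to each data file. This is a minimal sketch, assuming hypothetical field names (`data_type`, `measured`, `units`, `time_series`, `replicates`) and a hypothetical `missing_fields` helper; the real JERM templates are spreadsheets and database schemas, not this class.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class JermAnnotation:
    """Hypothetical 'just enough' metadata attached to one data file."""
    data_type: Optional[str] = None     # what type of data is it?
    measured: Optional[str] = None      # what was measured?
    units: Optional[str] = None         # what do the values mean: units
    time_series: Optional[bool] = None  # ... is it a time series?
    replicates: Optional[int] = None    # ... how many repeats?


def missing_fields(annotation: JermAnnotation) -> List[str]:
    """List the fields that have not been filled in, in declaration order."""
    return [f.name for f in fields(annotation)
            if getattr(annotation, f.name) is None]


if __name__ == "__main__":
    ann = JermAnnotation(data_type="growth curve", measured="OD")
    print(missing_fields(ann))  # ['units', 'time_series', 'replicates']
```

Reporting what is missing, rather than rejecting an incomplete record, is one way to keep the annotation "just enough" while still nudging people towards completeness.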
How do we share
Share JERM templates developed by SysMO-DB, the PALs and the consortium: spreadsheet templates and database schemas
Encourage uptake throughout SysMO: transcriptomics, metabolomics, proteomics, etc.
Tools to help manage data: annotation standards by stealth
Controlled vocabulary plug-in: BioPortal
JERM Model
SysMO JERM: a ‘MIBBI’ for SysMO-SEEK
What do we need to help you find stuff? Title, person, filename, class
What is experiment specific?
What is experiment specific, but helps us map between experiments? Common biological elements: chemicals, genes, proteins, organisms, strains
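Common biological elements are what let otherwise experiment-specific records be linked. A minimal sketch, assuming each data file is tagged with a set of element identifiers; the file names and `type:name` tags below are hypothetical placeholders, and `shared_elements` is an illustrative helper, not a SEEK function.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shared_elements(tags: Dict[str, Set[str]]) -> List[Tuple[str, str, Set[str]]]:
    """Return (file_a, file_b, common elements) for every pair of files that overlap."""
    links = []
    for a, b in combinations(sorted(tags), 2):
        common = tags[a] & tags[b]
        if common:
            links.append((a, b, common))
    return links


if __name__ == "__main__":
    tags = {
        "transcriptomics.xls": {"organism:E. coli", "gene:pfkA"},
        "enzyme_activity.xls": {"organism:E. coli", "protein:PfkA"},
        "metabolites.xls": {"organism:B. subtilis", "chemical:glucose"},
    }
    for a, b, common in shared_elements(tags):
        print(a, "<->", b, sorted(common))
```

Note that `gene:pfkA` and `protein:PfkA` do not match as plain strings: linking a gene to its product needs the common naming schemes and controlled vocabularies the talk calls for.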
Identifying Biological Objects
What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites
Where/how do these objects interact? Pathways, flux, experimental conditions
What models describe these interactions?
This is possible when using common frameworks, naming schemes and controlled vocabularies
Following Standards
We recommend formats, but we do not enforce them:
Protocols and SOPs: Nature Protocols
Data: JERM models and community minimum information models
Models: SBML and related standards
Publications: PubMed and DOI
If you follow the prescribed formats, you get more out; if you don't, you can still participate
Lowering the adoption barrier
Access Permissions
Just Enough Sharing
...we don’t talk about security
Assets such as SOPs are either uploaded directly or fetched on request from the projects' own data stores: COSMIC, SysMOLab, MOSES, Alfresco, wikis and other datastores
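The two routes, direct upload and fetch on request, can be sketched as one asset record that either holds its content or knows how to ask the owning store for it. This is a minimal sketch under stated assumptions: the `Asset` class and its `fetch` callable are hypothetical, and the lambda stands in for whatever call a real lab store (COSMIC, Alfresco, a wiki…) would need.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Asset:
    """Hypothetical asset record: uploaded content or a way to fetch it."""
    title: str
    content: Optional[bytes] = None              # set for a direct upload
    fetch: Optional[Callable[[], bytes]] = None  # set for fetch on request

    def get(self) -> bytes:
        """Return the asset's bytes, asking the source store if not uploaded."""
        if self.content is not None:
            return self.content
        if self.fetch is None:
            raise ValueError(f"{self.title!r} has neither content nor a source")
        # Fetched afresh on every request; no copy is kept in the registry
        return self.fetch()


if __name__ == "__main__":
    uploaded = Asset("Growth medium SOP", content=b"...")
    remote = Asset("Strain list", fetch=lambda: b"fetched from the lab store")
    print(uploaded.get(), remote.get())
```

Not caching fetched content keeps the data in the owning group's store, in line with scientists remaining in control; a real system might trade that for caching when the source is slow.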
When do People Share
Data collection: your own group and maybe your project, for collaboration
Pre-publication: your project and maybe the consortium, for discussion and criticism
Post-publication: the consortium and the wider community, for advertising
• Suspicion and fear of scooping
• Reputation
SysMO aims: sharing sooner
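The stages above amount to a widening audience. A minimal sketch, assuming four hypothetical, ordered sharing levels and a default level per stage; this is an illustration of "just enough sharing", not SysMO-SEEK's actual permission model.

```python
from enum import IntEnum


class Audience(IntEnum):
    """Hypothetical sharing levels, ordered from narrowest to widest."""
    GROUP = 1
    PROJECT = 2
    CONSORTIUM = 3
    PUBLIC = 4


# Hypothetical default audience for each stage of the research cycle
DEFAULT_AUDIENCE = {
    "data collection": Audience.GROUP,
    "pre-publication": Audience.PROJECT,
    "post-publication": Audience.PUBLIC,
}


def can_view(shared_with: Audience, viewer: Audience) -> bool:
    """True if a viewer whose closest tie to the owner is `viewer`
    may see an asset shared with `shared_with`."""
    return viewer <= shared_with
```

For example, an asset shared with `Audience.PROJECT` is visible to the owner's group and project but not to the rest of the consortium. "Sharing sooner" then means moving an asset to a wider level earlier in the cycle.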
Incentives for sharing
Safe haven for data
Credit and attribution
Help with exporting to public repositories (e.g. one-click export to ArrayExpress, PRIDE, etc.)
A repository for “supplementary materials” in publications
Linking publications and data
Access to other resources through a SEEK gateway
SEEK as a Gateway
JWS Online plugin
• online simulator, runs in SysMO-SEEK
• upload models in SBML format
• SBGN schemas, with annotations and external links
Incentives for sharing
Credit and attribution: SEEK records who owns what. If data, models or protocols are reused, scientists get recognition.
Accountability: SEEK records who owns what. If you take credit for others' work, they will see it.
Data citation: formal credit for data published in SEEK
Data Citation
Persistent identifiers and URLs for the data
Linking people to the data
Safe haven for the data
Guarantees of sustainability
Data MUST be uploaded and archived
If cited, it must be public
SEEK as a Safe Haven
HITS can archive SysMO data for 10 years
All SysMO software is open source and available
Distinction between sustaining the service and the software
Governance and Policy
What is required of SysMO members?
When should they share during their projects?
How long after the project can they keep data private to finish publications?
If their data is stored locally, what is the archive process?
Policy comes from the DMG and the funding agencies, NOT from SysMO-DB
Governance and Policy
Proposals under discussion:
All data registered in SEEK should be uploaded and archived at the end of a SysMO project
All data from finished projects should be shared. How long after the end? 1 day, 6 months, 1 year?
Scientists can invoke “creator’s privilege” on SysMO assets produced near the end of the project: extra time to write up and publish before release to the general public, respecting publication cycles
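The proposals leave the grace period open. The sketch below computes when an asset would become public under each candidate period, assuming a hypothetical project end date, months approximated as 182 and 365 days, and an optional creator's-privilege extension expressed in days; the function name is illustrative, not part of SEEK.

```python
from datetime import date, timedelta

# Candidate grace periods from the proposals; months approximated in days (assumption)
GRACE_PERIODS = {"1 day": 1, "6 months": 182, "1 year": 365}


def public_release_date(project_end: date, grace_days: int,
                        creators_privilege_days: int = 0) -> date:
    """Date on which an asset from a finished project becomes public.

    creators_privilege_days is a hypothetical extra delay for assets
    produced near the end of the project.
    """
    if grace_days < 0 or creators_privilege_days < 0:
        raise ValueError("periods must be non-negative")
    return project_end + timedelta(days=grace_days + creators_privilege_days)


if __name__ == "__main__":
    end = date(2010, 3, 31)  # hypothetical project end date
    for label, days in GRACE_PERIODS.items():
        print(label, public_release_date(end, days))
```

Keeping the creator's privilege as a separate, additive term makes it explicit that it extends the consortium-wide grace period rather than replacing it.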
SysMO So Far…
People ARE sharing: over 300 assets in SEEK
SOPs: 102, Models: 17, DataFiles: 95, Investigations: 13, Studies: 26, Assays: 53
PALs: a network of young SysBio researchers
Training and education in data and metadata management spreading through the consortium
Modellers and experimentalists communicating
SysMO Methods Spreading
Virtual Liver (Mueller, via HITS)
Lungsys, SBCancer, EraSysBio+
Eukaryotic organisms, interactions between host and pathogen, human disease, multi-scale modelling
Why it works for us
A solution that fits in with current practices
Start simple, show benefits, add more
Engage with the people actually doing the work: PhD students, postdocs
Build to the PALs' requirements
Respect publication cycles
Respect cultural differences
Scientists stay in control
Acknowledgements
SysMO-DB Team, SysMO-PALs
myGrid, HITS and JWS Online, EMBL-EBI, MCISB
http://www.sysmo-db.org