Data Sharing, Management and Reuse
Why CyVerse helps open science
Dr. Robert Davey
Head of Research e-Infrastructure
@froggleston
www.earlham.ac.uk
What is data stewardship?
“Processes, policies, guidelines and responsibilities for administering organisations' entire data in compliance with policy and/or regulatory obligations”
Wikipedia
Who
• Custodians, curators, data scientists, etc.
What
• Data and metadata
How
• Standards
• Controls
• Data Entry
Data stewardship matters
We couldn’t do the science we do without access to data and analysis
• Data stewardship must have a well-defined purpose, or fitness
  – data recording policy
  – data access policy
  – computer interoperability policy
• Sounds simple enough
The data-driven process
[Figure labels: DATA STANDARDS, ASSEMBLY, VARIATION, FUNCTION, MODEL INTEGRATION, RESEARCHERS & COLLABORATORS, EXPERIMENTS & HYPOTHESES, INTERPRETATION, DATA RESOURCES]
Data is a mess
So what can we do about it?
• Put data where humans and computers can see it
• Describe data in a common way
• Be committed to reproducibility
• Connect up resources to build ecosystems
Managing your data
• Useful to think about project management
• What data are you going to collect?
• Where are you going to store it?
• Is anyone going to work on your data at the same time?
• How can you make sure your data hasn’t changed?
• Organise your project folders!
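One way to answer "How can you make sure your data hasn’t changed?" is to record a checksum for every file and compare it later. The following is a minimal sketch under assumptions, not a method from the talk: it uses SHA-256, and the function names `sha256_of`, `build_manifest` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file path (relative to the folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were added, removed or modified between two manifests."""
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

Saving the manifest next to the project folder when data collection finishes, and rebuilding it before analysis or sharing, gives a simple change check that does not depend on any particular storage system.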
Managing your metadata
• https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424
• What metadata are you going to record?
• Where are you going to keep it?
• What format will it be in?
• Are you going to share this with anyone else?
• How can you make sure others can understand your work?
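The questions above about what to record and in what format can be made concrete with a small, machine-readable metadata record kept alongside each data file. This is only an illustrative sketch: the `SampleMetadata` class and every field name in it are assumptions, not a standard or anything prescribed in the talk, and JSON is just one possible format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SampleMetadata:
    """Hypothetical minimal description of one data file."""
    file_name: str
    description: str
    organism: str
    collected_on: str  # ISO 8601 date, for example "2018-03-01"
    contact: str
    keywords: list[str] = field(default_factory=list)


def to_json(record: SampleMetadata) -> str:
    """Serialise the record as JSON with a stable key order."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_json(text: str) -> SampleMetadata:
    """Rebuild a record from JSON produced by to_json."""
    return SampleMetadata(**json.loads(text))
```

A plain-text format such as this can be read by collaborators and by software alike, which helps with the last two questions on the slide.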
What is metadata?
• Why is metadata useful?
• Where might you keep metadata?
• What are the possible problems with metadata?
• Are these problems to do with people or computers?
• Where might you find information about data you want?
• How might you start getting that data?
Data is a mess
So what can we do about it?
• Describe data in a common way using human terms
• Supply this data in computer-friendly formats
• Put data where humans and computers can see it
• Be FAIR!
FAIR data
• Findable, Accessible, Interoperable, Reusable
https://www.nature.com/articles/sdata201618
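As a rough illustration of the four letters, a metadata record can be checked for one field per principle. This is a deliberately simplified sketch: the field names in `FAIR_FIELDS` are hypothetical, and the real FAIR principles in the paper linked above are considerably richer than a four-item checklist.

```python
# Hypothetical field names, one per FAIR principle; not an official checklist.
FAIR_FIELDS = {
    "Findable": "identifier",     # e.g. a DOI or other persistent identifier
    "Accessible": "access_url",   # where the data can be retrieved
    "Interoperable": "format",    # an open, documented file format
    "Reusable": "licence",        # terms under which others may reuse the data
}


def missing_fair_fields(record: dict) -> dict[str, str]:
    """Return {principle: field} for every field that is absent or empty."""
    return {
        principle: name
        for principle, name in FAIR_FIELDS.items()
        if not str(record.get(name) or "").strip()
    }
```

An empty result only means the four illustrative fields are filled in; it says nothing about whether their values are correct or sufficient.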
Research Data Lifecycles
Managing your data
http://www.pnas.org/content/115/11/2584.short
Managing your data
• Journals often require that data and code are available
• Reviewers may reject papers with no FAIR outputs
  – especially in informatics / data science
• Specific scientific data repositories
  – EMBL-EBI / NCBI, custom databases
• General repos
  – Zenodo
  – Figshare
  – Data Dryad
  – GitHub / Bitbucket
Earlham Institute, Norwich Research Park, Norwich, Norfolk, NR4 7UG, UK
www.earlham.ac.uk
Put data where humans and computers can see it
• CyVerse is a huge and versatile research infrastructure
• Helps thousands of users collaborate on data and analysis
• Extensive compute and storage resources
• Committed to data stewardship practices
  – Data Store - iRODS data grid
  – Data Commons - descriptions/identifiers for public/user-curated data
  – Discovery Environment - GUI for end users
  – Agave API - programmatic interfaces to all of the above
• FAIR data compliance
• Potential to power huge federated services
• EI is the hardware and middleware site for CyVerse UK
• First international node outside the US
• Opened for use by UK community in November 2016
• New collaborations in Australia and Austria
• http://cyverseuk.org/
• http://www.cyverse.org/learning-center/manage-data
• Data allocations
  – 100GB per “standard” user (1TB on request)
  – Need a login to start storing private / collaborative data
• Data transfer tools
  – DE - web-based, slower transfers of 2GB max
  – Cyberduck - graphical interface (GUI) that runs on the desktop for fast transfers
  – iRODS icommands - specialised command line (CLI) tools for fast transfers
  – FUSE - slow transfers, appears as a drive or folder on your computer
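The allocation and transfer limits listed above can be turned into a small helper for choosing a transfer route. This is a sketch under stated assumptions: the function names are hypothetical, gigabytes are taken to be decimal (the slide does not say), and the limits are simply the figures quoted on the slide, which may have changed since the talk.

```python
GB = 1000 ** 3  # assumption: decimal gigabytes; the slide does not specify
DE_MAX_BYTES = 2 * GB          # "2GB max" for web-based transfers in the DE
DEFAULT_ALLOCATION = 100 * GB  # "100GB per standard user"


def suggest_transfer_tool(size_bytes: int, command_line: bool = False) -> str:
    """Pick a transfer route from the options listed on the slide."""
    if size_bytes <= DE_MAX_BYTES and not command_line:
        return "Discovery Environment"
    return "iRODS icommands" if command_line else "Cyberduck"


def fits_allocation(used_bytes: int, new_bytes: int,
                    allocation: int = DEFAULT_ALLOCATION) -> bool:
    """Check whether an upload would stay within the storage allocation."""
    return used_bytes + new_bytes <= allocation
```

FUSE is left out of the suggestions because the slide describes it as slow; it is more a convenience for browsing than a route for bulk transfer.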
Main purposes of the project are:
● give research groups access to life science cyberinfrastructure in the UK
● allow the analysis of data on UK-provisioned research HPC
● ensure reproducibility of analysis through application versioning / containerisation
● support data sharing
● distribute documentation and code as open source
CyVerse UK aims to give UK and EU users a geographical advantage, though it is available to users worldwide.
● Common authentication process
● Use of applications through the Discovery Environment
BUT… applications need to be duplicated to be run both in the US and in the UK.
Expansion and Community Support
● Now providing webservice VMs to:
  ○ SignaLink
  ○ COPO
  ○ Grassroots
Won a £400k BBSRC 16ALERT grant to expand the compute capacity of CyVerse UK
● Larger core count and RAM
● Better networking and switching
● Project-initiated VM requests to provide specific groups with cloud access
● Planning to include flexible analysis frontends such as the Genomics Virtual Laboratory