Sharing big data
15 June 2017
Bob Jones
CERN
Bob.Jones <at> cern.ch
Helix Nebula – The Science Cloud
Helix Nebula – The Science Cloud with Grant Agreement 687614 is a Pre-Commercial Procurement Action funded by H2020 Framework Programme
Accelerating Science and Innovation
Data in High-Energy Physics
Based on DPHEP Study Group (2009). Data Preservation in High Energy Physics. http://arxiv.org/abs/0912.0255
Patricia Herterich
EPFL & SDSC visit 2017-03-24
CERN Open Data Portal
• 2015: 40 TB of 2010 data
• 2016: 320 TB of 2011 data
• Curation and release of:
  • Simulated data (MC)
  • Trigger information
  • Configuration files
http://github.com/cernopendata
Barend Mons, Leiden University Medical Center
In the FAIR Data approach, data should be:
• Findable – Easy to find by both humans and computer systems and based on mandatory description of the metadata that allow the discovery of interesting datasets
• Accessible – Stored for long term such that they can be easily accessed and/or downloaded with well-defined license and access conditions (Open Access when possible), whether at the level of metadata, or at the level of the actual data content
• Interoperable – Ready to be combined with other datasets by humans as well as computer systems
• Reusable – Ready to be used for future research and to be processed further using computational methods.
https://www.dtls.nl/fair-data/
Peter Doorn, Director DANS
https://www.force11.org/group/fairgroup/fairprinciples
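The four FAIR properties above can be read as a checklist over a dataset's metadata. The following is a minimal sketch of that idea, not part of the original talk: the `DatasetRecord` fields, the `FAIR_FIELDS` mapping, and the example DOI and URL are all hypothetical, chosen only to illustrate how a repository might flag which principles a record does not yet support.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""    # persistent ID such as a DOI (Findable)
    description: str = ""   # metadata that lets people and machines discover it (Findable)
    access_url: str = ""    # where the data or metadata can be retrieved (Accessible)
    license: str = ""       # well-defined licence and access conditions (Accessible)
    data_format: str = ""   # documented format, so data can be combined (Interoperable)
    provenance: str = ""    # how the data were produced, for later reuse (Reusable)


# Assumed mapping from each FAIR principle to the fields that support it.
FAIR_FIELDS = {
    "Findable": ["identifier", "description"],
    "Accessible": ["access_url", "license"],
    "Interoperable": ["data_format"],
    "Reusable": ["provenance", "license"],
}


def fair_gaps(record):
    """Return {principle: [missing field names]} for principles with gaps."""
    gaps = {}
    for principle, names in FAIR_FIELDS.items():
        missing = [n for n in names if not getattr(record, n).strip()]
        if missing:
            gaps[principle] = missing
    return gaps


# Example: identifier, description, URL and licence are set; format and provenance are not.
record = DatasetRecord(
    identifier="doi:10.0000/example",       # placeholder, not a real DOI
    description="Example collision dataset",
    access_url="https://example.org/data",  # placeholder URL
    license="CC0-1.0",
)
print(fair_gaps(record))
# → {'Interoperable': ['data_format'], 'Reusable': ['provenance']}
```

A real repository would check far richer metadata (controlled vocabularies, machine-readable licences, community standards); the point here is only that each principle translates into concrete, checkable fields.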
27/06/2017
The Hybrid Cloud Model
Brings together
• research organisations,
• data providers,
• publicly funded e-infrastructures,
• commercial cloud service providers
in a hybrid cloud with procurement and governance approaches suitable for the dynamic cloud market
Data Commons is a Platform that fosters development of a digital Ecosystem
Treats products of research – data, software, methods, papers, training materials, etc. – as digital assets (objects)
Digital objects need to conform to FAIR principles
- Findable, Accessible, Interoperable, Reusable
Digital objects exist in a shared virtual space (initial)
- Find, Deposit, Manage, Share and Reuse digital assets
Enables interactions between Producers and Consumers of digital assets
Gives currency to digital assets and the people who develop and support them
Philip E. Bourne, Ph.D. FACMI
Associate Director for Data Science
National Institutes of Health, USA
Data Commons Pilot – connecting the pieces
Co-location of large and/or highly utilized NIH-funded data on the cloud, plus commonly used tools for analyzing and sharing digital objects, to create an interoperable resource for the research community.
Investigators will be able to collaborate and share digital objects within this environment and connect with others
Impact
Biggest issuer of DOIs for software in the world
Reference material for publications
F1000, Wiley, eLife, PLoS, Elsevier, Nature, etc.
Recommended by EC and National programmes
https://www.zenodo.org/
Summary
Sharing big data needs technology, processes & organisation, people
FAIR principles represent best practice
Findable, Accessible, Interoperable, Reusable
Research communities around the world are developing science commons to accelerate the sharing of digital assets