#DPHEP: Status and Outlook
Sustainable Strategies for Long-Term DP at the Exa-scale
LHCC Referees Meeting
International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics
Overview
• Sustainable Strategy
• Collaboration Agreement
• Research Data Alliance
• H2020 (NSF?) Prospects
2020 Vision for LT DP in HEP
• Long-term – e.g. LC timescales: disruptive change
– By 2020, all archived data (e.g. that described in the Blueprint, including LHC data) should be easily findable and fully usable by designated communities, with clear (Open) access policies and the possibility to annotate further
– Best practices, tools and services should be well run-in, fully documented and sustainable; built in common with other disciplines and based on standards
This vision is achievable, but we are far from it today
Data Preservation Maturity Model
Level | Metric | Implications
4 | Reproducible results by “citizen scientists” | Desired(?) by funding agencies: people able to reproduce an analysis should be awarded “a degree” – beyond what can realistically be afforded?
3 | Reproducible results where consumer ≠ producer and outside immediate community | Stronger demonstration of long-term preservation. Knowledge stored is sufficient for a physicist outside the immediate community to reproduce results
2 | Reproducible results where consumer ≠ producer but within same “larger community”, e.g. LHC (ATLAS / CMS) or Tevatron (CDF / D0), … | Highly desirable for “minimal” long-term preservation. “Knowledge” stored is sufficient for a physicist from a different collaboration (but within the same overall programme) to reproduce results
1 | Reproducible results where consumer = producer | Required during lifetime of collaboration
0 | N/A | Data is lost: logically or physically. This is probably the reality for the bulk of pre-DPHEP experiments (and even some of those??)
• Scale (complexity) is probably “exponential”
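The levels above form an ordered scale. As a minimal sketch only (the enum names, the reproducer labels "producer", "larger_community", "outside_community", "citizen" and the function `assess_level` are hypothetical, not part of any DPHEP tool), an assessment could record who has successfully reproduced a result and report the highest level reached, treating "no successful reproduction" as level 0:

```python
from enum import IntEnum


class PreservationLevel(IntEnum):
    """Levels of the maturity model, as listed in the table above."""
    LOST = 0               # data lost, logically or physically
    SAME_PRODUCER = 1      # consumer = producer
    SAME_PROGRAMME = 2     # consumer != producer, same "larger community"
    OUTSIDE_COMMUNITY = 3  # consumer != producer, outside immediate community
    CITIZEN_SCIENTIST = 4  # reproducible by "citizen scientists"


# Hypothetical reproducer labels, checked from the highest level downwards.
_LADDER = [
    ("citizen", PreservationLevel.CITIZEN_SCIENTIST),
    ("outside_community", PreservationLevel.OUTSIDE_COMMUNITY),
    ("larger_community", PreservationLevel.SAME_PROGRAMME),
    ("producer", PreservationLevel.SAME_PRODUCER),
]


def assess_level(reproductions: set) -> PreservationLevel:
    """Return the highest level demonstrated by successful reproductions.

    Assumption: with no successful reproduction at all, the data is
    treated as level 0 (logically lost).
    """
    for who, level in _LADDER:
        if who in reproductions:
            return level
    return PreservationLevel.LOST


if __name__ == "__main__":
    print(assess_level({"producer", "larger_community"}).name)
```

Using `IntEnum` keeps the levels comparable (level 3 > level 2), which matches the slide's reading of the model as a ladder of increasing (and probably "exponentially" more costly) demands.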
Software Preservation Maturity Model
Level | Metric | Implications
4 | Reproducible results by “citizen scientists” | Desired(?) by funding agencies: people able to reproduce an analysis should be awarded “a degree” – beyond what can realistically be afforded?
3 | Reproducible results where consumer ≠ producer and outside immediate community | Stronger demonstration of long-term preservation. Knowledge stored is sufficient for a physicist outside the immediate community to reproduce results
2 | Reproducible results where consumer ≠ producer but within same “larger community”, e.g. LHC (ATLAS / CMS) or Tevatron (CDF / D0), … | Highly desirable for “minimal” long-term preservation. “Knowledge” stored is sufficient for a physicist from a different collaboration (but within the same overall programme) to reproduce results
1 | Reproducible results where consumer = producer | Required during lifetime of collaboration
0 | N/A | Data is lost: logically or physically. This is probably the reality for the bulk of pre-DPHEP experiments (and even some of those??)
REPRODUCIBLE RESULTS AFTER “PORTING” TO NEW ENVIRONMENT!
Sustainable Strategy
• A document on a sustainable strategy for LTDP is available – discussed at DPHEP IB today
• This version focuses on CERN (IT) – presented yesterday (attached to agenda: doc, ppt)
• Some comments received (DESY, INFN)
– DESY comments included in current draft;
– INFN: stress need for standards, e.g. for outreach activities based on data from multiple experiments
• Intent is to update document to reflect activities of other “Collaboration Members”
Summary of Recommendations
ICFA Statement on LTDP
• The International Committee for Future Accelerators (ICFA) supports the efforts of the Data Preservation in High Energy Physics (DPHEP) study group on long-term data preservation and welcomes its transition to an active international collaboration with a full-time project manager. It encourages laboratories, institutes and experiments to review the draft DPHEP Collaboration Agreement with a view to joining by mid- to late-2013.
• ICFA notes the lack of effort available to pursue these activities in the short-term and the possible consequences on data preservation in the medium to long-term. We further note the opportunities in this area for international collaboration with other disciplines and encourage the DPHEP Collaboration to vigorously pursue its activities. In particular, the effort required to prepare project proposals must be prioritized, in addition to supporting on-going data preservation activities.
• ICFA notes the important benefits of long-term data preservation to exploit the full scientific potential of the, often unique, datasets. This potential includes not only future scientific publications but also educational outreach purposes, and the Open Access policies emerging from the funding agencies.
• 15 March 2013
DPHEP Collaboration Agreement
• A draft has been prepared by the CERN legal service, has been sent to ICFA and has been available to DPHEP since early 2013
• Some comments have been received and integrated
• AFAIK CERN, DESY, FNAL and SLAC are “ready” to sign
• Target: prior to CHEP 2013 (RDA-2 might be better!)
• Next steps: get legal services in touch with each other and complete the process
• CERN & DESY: defining activities as part of the Collaboration
RDA Preservation WG
• The RDA – strongly supported by EU, NSF, AU – is seen as an element in implementing the HLEG 2030 vision
• A WG on DP was approved in May
– Chair: David Giaretta (APA, SCIDIP-ES, author of “Advanced DP”, ex-DCC, ex-STFC)
– Co-chair: JDS
• The intent is to show progress by each RDA plenary (March, September) and co-ordinate international activities, identify candidate services for standardization, lobby for funding…
Component Breakdown
• This can be broken down into three distinct areas
– (The OAIS reference model is somewhat more complex: this is a zeroth iteration)
• “Archive issues”
• Digital Libraries & “Adding Value” to data
• “Knowledge retention” – the Crux of the Matter
Archive Issues
We (HEP) have significant experience of 100PB+ distributed data stores
Plan is to coordinate long-term “bit preservation” issues via HEPiX, and with other disciplines, e.g. via IEEE MSST
× Sustainable models for long-term multi-disciplinary data archives are still to be solved
H2020 funding targeted for this
Digital Libraries
Significant investment in this space, including multiple EU (and other) funded projects
No reason to believe that the issues will not be solved, nor that funding models will not exist, e.g. adapted from “traditional” libraries
Related topics: “linked data”, “adding value to data” – again with projects / communities
Should work closely with these projects / communities, not start new initiatives
Where to Invest – Summary
Tools and Services, e.g. Invenio: could be solved (2-3 years?)
Archival Storage Functionality: should be solved (i.e. “now”)
Support to the Experiments for DPHEP Levels 3-4: must be solved – but how?
Who Can Help?
• Mobilize resources through existing structures:
– Research Data Alliance:
• Funding / strong interest from EU, US, AU, others
• Part of roadmap to “Riding the Wave” 2030 Vision
• STFC and DCC personnel strongly involved in setup
– WLCG:
• Efforts on “software re-design” for new architectures
• Experiment efforts on Software Validation (to be coordinated via DPHEP), building on DESY & others
– DPHEP:
• Coordination within HEP and with other projects / disciplines
• National & International Projects
– H2020 / NSF funding lines
– National projects also play an important role
[Figure: Collaborative Data Infrastructure – Riding The Wave HLEG Report. Labels: Data Generators, Users, Trust, Data Curation, Community Support Services, Common Data Services. Functions shown: user functionalities, data capture & transfer, virtual research environments; data discovery & navigation, workflow generation, annotation, interoperability; persistent storage, identification, authenticity, workflow execution, mining]
H2020 Prospects
• According to Kostas Glinos (e-IRG meeting, Dublin), first calls: December 11 2013
• “Framework for action” (part of open consultation) has a “fiche” targeting DP
• DPHEP ICFA report (2020 vision) sent to Carlos MP
• “References to RDA are appreciated and I really hope that you take a leading role in bringing people and key players together around a global initiative to tackle the issue of ‘highly reliable and highly trusted infrastructures for research data preservation’.”
• IMHO: need to prepare now (collaboration, WP, tasks) – likely to be discussed at RDA Plenary, CHEP 2013, PV …
A Strategy for H2020?
• Front-end: collaborate with on-going efforts in Digital Libraries, Linked Data, PV etc.
– Significant effort (also HEP expertise): very high probability of further funding in H2020 (+RDA)
– DP(HEP) is already part of these projects: feed in requirements & collaborate (PRELIDA WS??)
• Back-end: collaborate through HEPiX & IEEE MSST
– Seek specific H2020 funding for CDIs, including TCO, long-term, sustainable inter-disciplinary archives
• Middle:
– Collaborative effort on Validation Frameworks, Virtualization, Training, Outreach etc.
• Includes institute / national funding
– Work on the “Concurrency Framework” and other efforts so that future migrations are less painful and more repeatable
– [ CERNLIB consortium ]
– Seek further funds (H2020, RDA) to further develop and generalize
• Several (all?) relevant “fiches” in “Call for Action” document
– fiche 01: community support data services
– fiche 02: infrastructure for Open Access
– fiche 03: storing, managing and preserving research data
– fiche 04: discovery and provenance of research data
– fiche 05: towards global data e-infrastructures
– fiche 06: global A&A e-infrastructures
– fiche 07: skills and new professions for research data
Other Activities
• Various project proposals in preparation / review
• On-going activities in the experiments: “DPHEP classic” as well as LHC
• Discussions with CMS on validation system – other LHC experiments expected to join
• DPHEP session at CHEP 2013 – outlook for CHEP 2015? (tighter integration into programme)
• Presentations accepted at numerous conferences / workshops – building more links with other disciplines
• DPHEP IB (modeled on WLCG) monthly call
What | When
Collaboration Agreement | Q3-Q4 2013
Preparation for H2020 | Now – Q3/Q4 2013
HEPiX WG in place | <Q4 2014
First H2020 calls open | Dec 2013
ICFA report (work plan, including sustainability plan) | DESY, Feb 20-21 2014
H2020 Proposal | End Q1 2014
DPHEP Portal Available | mid 2014
H2020 news | July 2014
LEP Data “recovery” (CERNLIB???) | End 2014?
Validation framework(s) | 2014 / 2015?
Long-term CDI #1 | 2015 – 2017
Full(?) understanding of costs | 2016/17?
Sustainable, repeatable LTDP | 201?
Summary
• Making good progress on multiple fronts
• “Sustainable strategy” being discussed (and then put in place)
• Good inter-disciplinary collaboration
• Optimistic regarding H2020 and also NSF(+) – but needs work!
• #DPHEP for news!