photon and neutron data infrastructure · • crystallography that reveals the structures of...

26
PaNdata Photon and Neutron Data Infrastructure 1st EUDAT User Forum Barcelona, 7-8 March 2012 Juan Bicarregui STFC

Upload: others

Post on 08-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

PaNdata Photon and Neutron

Data Infrastructure

1st EUDAT User Forum

Barcelona, 7-8 March 2012

Juan Bicarregui

STFC

Page 2: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

Overview

Page 3: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

The PaNdata Collaboration

• Established 2007 with 4 partners

• Expanded since to 11 (now 13) organisations

(see next slide)

• Aims:

– “...to construct and operate a shared data

infrastructure for Neutron and Photon laboratories...”

2007 2008 2009 2010 2011 2012 2013 2014

EDNS (4)

EDNP (10)

PaNdataEurope(11)

Pandata ODI(11)

Page 4: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

PaN-data bring together 11 major European Research Infrastructures

PaN-data is coordinated by the e-Science Department at the Rutherford Appleton Laboratory, UK

ISIS is the world’s leading pulsed spallation neutron source

ILL operates the most intense slow neutron source in the world

PSI operates the Swiss Light Source, SLS, and Neutron Spallation Source, SINQ, and is developing the SwissFEL Free Electron Laser

HZB operates the BER II research reactor the BESSY II synchrotron

CEA/LLB operates neutron scattering spectrometers from the Orphée fission reactor

ESRF is a third generation synchrotron light source jointly funded by 19 European countries

Diamond is new 3rd generation synchrotron funded by the UK and the Wellcome Trust

DESY operates two synchrotrons, Doris III and Petra III, and the FLASH free electron laser

Soleil is a 2.75 GeV synchrotron radiation facility in operation since 2007

ELETTRA operates a 2-2.4 GeV synchrotron and is building the FERMI Free Electron Laser

ALBA is a new 3 GeV synchrotron facility due to become operational in 2010

PaN-data Partners

JCNS Juelich Centre for Neutron Science MaxLab, Max IV Synchrotron

Page 5: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

The Science we do - Structure of materials

Fitting experimental

data to model

Bioactive glass for bone growth

Structure of cholesterol

in crude oil

Hydrogen storage for zero emission vehicles

Magnetic moments in electronic storage

• Over 30,000 user visitors each year:

– physics, chemistry, biology, medicine,

– energy, environmental, materials,

culture

– pharmaceuticals, petrochemicals,

microelectronics

Longitudinal strain in aircraft wing

Diffraction pattern

from sample

Visit facility on

research campus

Place sample in

beam

• Over 5.000 high impact publications per year

– But so far no integrated data repositories

– Lacking sustainability & traceability

Page 6: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

PaN-data Applications

The partners operate hundreds of instruments used by over 30,000 scientists each year

These instruments support scientific fields as varied as:• Physics, Chemistry, Biology, Material sciences, Energy technology,

Environmental science, Medical technology and Cultural heritage

Applications include:

• crystallography that reveals the structures of viruses and proteins important

for the development of new drugs

• neutron scattering that identifies stresses within engineering components

such as turbine blades

• tomography that can image microscopic details of the 3D-structure of the

brain

Industrial applications include pharmaceuticals, petrochemicals and microelectronics

PaN-data Europe – building a sustainable data infrastructure for Neutron and Photon laboratories

Page 7: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

www.pan-data.eu

PaNdata Facilities

Together represent a capital investment of over 3 Billion €

Page 8: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

Overview

Page 9: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

Science driver – Data Integration

Neutron diffraction X-ray diffraction NMR

High-qualitystructure refinement

Page 10: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

What is e-Infrastructure?

DataCreation

Archival

Access

Storage ComputeNetwork

Services

Curation

the researcher actsthrough ingest and access

Virtual Research Environment

the researcher shouldn’t have to worry about the information infrastructure

Information Infrastructure

Page 11: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

EDNS - European Data Infrastructure for

Neutron and Synchrotron Sources

PaNdata Vision

Single Infrastructure � Single User Experience

CapacityStorage

Publications Repositories

Data Repositories

Software Repositories

Raw Data

Data Analysis

Analysed Data

Publication Data

Publications

Facility 1

Raw Data

Data Analysis

Analysed Data

Publication Data

Publications

Facility 2

Raw Data

Data Analysis

Analysed Data

Publication Data

Publications

Facility 3

Different Infrastructures � Different User ExperiencesRaw Data Catalogue

Data Analysis

Analysed Data Catalogue

Publication Data Catalogue

Publications Catalogue

Page 12: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

In words:PANdata will provide our user communities with data repositories and data management tools

to:

• deal with large sets and large data rates from the experiments,

• enable easy and standardised annotation of data,

• allow transparent and secure remote access to data,

• establish sustainable and compatible data catalogues, allow long-term preservation of data, and

• provide compatible open source data analysis software.

This will have a major impact on our scientific user community because it will offer:

• cross facility and cross discipline data analysis,

• secure access to large data sets over the network instead of using portable media,

• maintaining the records of science by having properly annotated data,

• linking publications to data,

• allowing efficient software developments, and

• efficient scientific collaborations across Europe by providing compatible data formats and analysis software.

Page 13: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

Metadata and Digital Curation

Proposal

Approval

Scheduling

Experiment

Data

cleansing

Record

Publication

Scientist submits

application for

beamtime

Facility committee

approves application

Facility registers,

trains, and schedules

scientist’s visit

Scientists visits,

facility run’s

experiment

Subsequent

publication

registered with

facility

Raw data filtered and

cleansed

Data analysis

Tools for processing

made available

Page 14: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

Overview

Page 15: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

PaN-data Standardisation

PaN-data Europe is undertaking 5 standardisation activities:

1.Development of a common data policy framework

2.Agreement on protocols for shared user information exchange

3.Definition of standards for common scientific data formats

4.Strategy for the interoperation of data analysis software enabling the most appropriate software to be used independently of where the data is collected

5.Integration and cross-linking of research outputs completing the lifecycle of research, linking all information underpinning publications, and supporting the long-term preservation of the research outputs

PaN-data Europe – building a sustainable data infrastructure for Neutron and Photon laboratories

Page 16: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

PaN-data Europe TimelinePaN-data Europe runs from June 2010 until December 2011 with workshops in Spring and Autumn 2011.

PaN-data Europe – building a sustainable data infrastructure for Neutron and Photon laboratories

Workpackage (abbreviated title) Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov

Milestones M1 M2 W1 M3 M4 W2

WP1 Management D D D D WP1 Management

WP2 Common data policy framework D D D D WP2 Common data policy framework

WP3 Knowledge exchange/dissemination D D D D WP3 Knowledge exchange/dissemination

WP4 Common user information exchange D D D WP4 Common user information exchange

WP5 Scientific data D D D WP5 Scientific data

WP6 Data analysis software infrastructure D D D D WP6 Data analysis software infrastructure

WP7 Integration and cross-linking D D D WP7 Integration and cross-linking

KeyD - Deliverable

M - Milestone

W - Workshop

Workpackage (abbreviated title)

Workshops

Da

ta P

olicy

De

ve

lop

me

nt a

nd

de

live

ry

of th

e co

mm

on

da

ta p

olicy

Use

r an

d D

ata

Sta

nd

ard

s

De

live

ry o

f dra

ft stan

da

rds

for d

ata

an

d u

ser in

form

atio

n

Ba

selin

e fo

r inte

gra

tion

De

live

ry o

f po

licy o

n u

ser

info

rma

tion

, first rep

ort o

n

pu

blica

tion

s an

d in

teg

ratio

n

Inte

gra

tion

pro

po

sal

De

live

ry o

f po

licy a

nd

first pro

po

sal o

n in

teg

ratio

n

an

d o

n a

na

lysis so

ftwa

re

Fin

al W

ork

sho

p

Fin

al re

po

rts on

stan

da

rds

M1

M2

M3

M4

Page 17: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

2.1 Data

Policy

2.2Software

Policy

2.3User

Policy

2.4Integrated

Policy

4.1User

Proposal

4.2User

Workshop

4.3User

Revision

5.1Data

Proposal

5.2Data

Workshop

5.3Data

Revision

6.1SoftwareReview

6.2Software

Workshop

6.3SoftwareProposal

6.4Software Revision

7.1Integration

Report

7.2Integration Proposal

7.3Integration

Revision

3.4

Final

Workshop

Project Management, Knowledge Exchange and Dissemination Activities

Dependencies between the major project tasks

Dependencies

Page 18: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

Overview

Page 19: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

ERA Open Access Sharing Initiatives (examples, etc)

ERA Infrastructure Platform Initiatives (EGI, etc)

PaNdata

Support Action(Ends 30 Nov 11)

Policies and Standards

PaNdata

ODI(begins

end2011)

JRAs

Users

Data

Software

Integration

Provenance

Preservation

Scalability

PaNdata

ODI(begins

end2011)

ServicesUsers

Data

PaNdata

ODI

Virtual Labs

PoliciesPowder Diff

SAXS & SANS

Tomography

Page 20: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

ObjectivesObjective 2 – Users

To deploy, operate and evaluate a system for pan-European user identification across the participating facilities and

implement common processes for the joint maintenance of that system.

Objective 3 – Data

To deploy, operate and evaluate a generic catalogue of scientific data across the participating facilities and promote

its integration with other catalogues beyond the project.

Objective 4 – Provenance

To research and develop a conceptual framework, defined as a metadata model, which can record the analysis

process, and to provide a software infrastructure which implements that model to record analysis steps hence

enabling the tracing of the derivation of analysed data outputs.

Objective 5 – Preservation

To add to the PaNdata infrastructure extra capabilities oriented towards long-term preservation and to integrate

these within selected virtual laboratories of the project to demonstrate benefits. These capabilities should, as for

the developments in the provenance JRA, be integrated into the normal scientific lifecycle as far as possible. The

conceptual foundations will be the OAIS standard and the NeXus file format.

Objective 6 – Scalability

To develop a scalable data processing framework, combining parallel filesystems with a parallelized standard data

formats (pNexus pHDF5) to permit applications to make most efficient use of dedicated multi-core environments

and to permit simultaneous ingest of data from various sources, while maintaining the possibility for real-time

data processing.

Objective 7 – Demonstration

To deploy and operate the services and technology developed in the project in virtual laboratories for three specific

techniques providing a set of integrated end-to-end data services.

Page 21: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

PaNdata ODI Joint Research

Activities

PaNdata ODI Service Activities

PaNdata ODI Service ReleasesStandards

from

PaNdata

Support

Action

uCat

dCat

vLabs

Prov

Pres

Scale

Rel 1 Rel 2 Rel 3 Rel 4

users

data

s/w

Integ

Jun 2014Jun 2013 Dec 2013Dec 2012

Page 22: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

Overview

Page 23: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

Data

The Research Lifecycle

the researcher actsthrough ingest and access

Research Environment

Creation

Archival

Access

Storage ComputeNetwork

Data

Services

the researcher shouldn’t have to worry about the information infrastructure

Information Infrastructure

ICAT

TopCAT

EGI

GEANTLocal resources

User Info feed

DAQ feed

Data Analysis feed Provenanced Data

EUDAT

?

Page 24: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

The 7 C’s

Creation Collection

Capacity

Computation

Curation

Collaboration Communication

PaNdataEurope SA

PaNdata ODI

PaNdata VRE

DataCreation

Archival

Access

Storage ComputeNetworkServicesCurationEUDAT ?

Page 25: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

Overview

Page 26: Photon and Neutron Data Infrastructure · • crystallography that reveals the structures of viruses and proteins important ... Curation the researcher acts through ingest and access

www.pan-data.eu

Thank You