Large Scale Data Facility for Data Intensive Synchrotron Beamlines

Post on 27-Nov-2021

Page 1: Large Scale Data Facility for Data Intensive Synchrotron

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Institute for Data Processing and Electronics

www.kit.edu

Institute for Synchrotron Radiation

Steinbuch Centre for Computing

Large Scale Data Facility for Data Intensive Synchrotron Beamlines

R. Stotzka, W. Mexner, T. dos Santos Rolo, H. Pasic, J. van Wezel, V. Hartmann, T. Jejkal, A. Garcia, D. Haas, A. Streit

Page 2: Large Scale Data Facility for Data Intensive Synchrotron

ANKA – Synchrotron Light Source at KIT

14 Beamlines operational

Far infrared – hard X-ray

Methods:
• Topo-Tomo
• Spectroscopy
• Fluorescence
• Diffraction
• Lithography

Under construction:
• High-resolution X-ray diffraction
• High-resolution IMAGE

Page 3: Large Scale Data Facility for Data Intensive Synchrotron

Data Intensive Beamlines at ANKA: Topo-Tomo and IMAGE

[Diagram: Sample exchange robot (15 s); High speed camera (750 MB/s); DAQ; Online Analysis; Online visualization (50 GB); Pre-Archival; + Off-line processing (50 GB); Control System with three UIs; User. Arrows mark dataflow and control & monitoring.]

Storage requirements: several PB/a
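A rough estimate shows how the camera rate on this slide relates to the stated storage requirement. The sketch below assumes decimal units (1 PB = 10^9 MB) and a set of hypothetical duty cycles, meaning the fraction of the year during which the camera actually streams data. Neither assumption comes from the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
MB_PER_PB = 1e9  # assumption: decimal units, 1 PB = 10^9 MB


def annual_volume_pb(rate_mb_per_s, duty_cycle):
    """Data volume per year in PB for a given sustained rate and duty cycle."""
    return rate_mb_per_s * SECONDS_PER_YEAR * duty_cycle / MB_PER_PB


if __name__ == "__main__":
    camera_rate = 750.0  # MB/s, from the slide
    # Hypothetical duty cycles; the slides do not state the real value.
    for duty in (0.05, 0.10, 0.20, 1.00):
        volume = annual_volume_pb(camera_rate, duty)
        print(f"duty cycle {duty:4.0%}: {volume:6.2f} PB/a")
```

Under these assumptions, continuous recording would produce about 23.7 PB per year, so "several PB/a" is consistent with the camera streaming roughly 10 to 20 % of the time.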

Page 4: Large Scale Data Facility for Data Intensive Synchrotron

Now and the Future

                       Now          Near future
Sample exchange        Manual       Automated
DAQ                    Automated    Automated
Pre-processing         Automated    Automated
Online visualization   Manual       Automated
Offline processing     Manual       Manual/automated
Data management        Manual       Automated
Meta data              Incomplete   Full description
Storage and archival   Manual       Automated

• Fast changing experiment environments
• Need for a flexible automated data management

Page 5: Large Scale Data Facility for Data Intensive Synchrotron

Large Scale Data Facility (LSDF)

Research Programme SuperComputing of the Helmholtz Association

Aims:
• Provide storage
• Provide archives
• Provide data services
• Provide data management
• Provide support
• Provide data intensive computing
• Provide data analysis

For heterogeneous research communities with heterogeneous requirements

Page 6: Large Scale Data Facility for Data Intensive Synchrotron

LSDF Resources

LSDF Resources and Resource Services:
• T1 Storage: 4 PB disk online
• Compute farm: 60 nodes, 2 TB memory, 110 TB disks
• Databases: Oracle, MySQL, PostGres
• T2 Tape
• 10 GE networks

GridKA, Tier 1 of LHC Grid:
• 12,000 cores
• 10 PB disks
• 14 PB tape

Page 7: Large Scale Data Facility for Data Intensive Synchrotron

Software and Service Development

KIT Data Manager (KDM): software and service infrastructure

KDM DataBrowser
• Graphical user interface
• Data ingest
• Data access
• Meta data management
• Processing launch

LSDF Execution Framework for Data Intensive Applications
• Development
• Deployment
• Ingest

Page 8: Large Scale Data Facility for Data Intensive Synchrotron

Imaging Beamline Data Management

[Diagram: DAQ; Online Analysis; Pre-Archival; LSDF Resources and Resource Services (T1 Storage, Databases, Compute Farm, T2 Tape); Control System with three UIs; User; User DB (SMIS). Arrows mark dataflow and control & monitoring.]

• Fast changing experiment environments
• Need for a flexible automated data management

Page 9: Large Scale Data Facility for Data Intensive Synchrotron

Imaging Beamline Data Management

• Development of new ultrafast and intelligent camera systems
• Data management only on bit level

Page 10: Large Scale Data Facility for Data Intensive Synchrotron

Imaging Beamline Data Management

• Selection of relevant measurement data
• Annotation by meta data
• Local storage
• Unique data format

Page 11: Large Scale Data Facility for Data Intensive Synchrotron

Imaging Beamline Data Management

• GPU accelerated system
• Data pre-processing
• On-line reconstruction using filtered back-projection accelerated by GPUs (hm)
• User selects promising data sets
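As an illustration of the filtered back-projection named above, here is a minimal CPU sketch in pure Python. It is not the beamline's GPU implementation. It assumes parallel-beam geometry, unit detector spacing, a Ram-Lak ramp filter applied by direct convolution, and a synthetic disk phantom whose projections are known analytically. All function names are hypothetical.

```python
import math


def ram_lak_kernel(n_det):
    """Discrete Ram-Lak (ramp) filter taps for unit detector spacing."""
    taps = {}
    for n in range(-(n_det - 1), n_det):
        if n == 0:
            taps[n] = 0.25
        elif n % 2 == 0:
            taps[n] = 0.0
        else:
            taps[n] = -1.0 / (math.pi * n) ** 2
    return taps


def filter_projection(projection, taps):
    """Convolve one projection with the ramp filter (direct O(n^2) sum)."""
    n_det = len(projection)
    return [sum(taps[i - j] * projection[j] for j in range(n_det))
            for i in range(n_det)]


def backproject(filtered, angles, x, y):
    """Accumulate the filtered projections along all rays through (x, y)."""
    n_det = len(filtered[0])
    centre = (n_det - 1) / 2.0
    total = 0.0
    for q, theta in zip(filtered, angles):
        # Detector coordinate hit by the ray through (x, y) at this angle.
        pos = x * math.cos(theta) + y * math.sin(theta) + centre
        lo = math.floor(pos)
        if 0 <= lo < n_det - 1:
            frac = pos - lo
            total += (1.0 - frac) * q[lo] + frac * q[lo + 1]  # linear interp.
    return total * math.pi / len(angles)


def disk_sinogram(radius, n_det, n_angles):
    """Analytic projections of a centred unit-density disk (same at all angles)."""
    centre = (n_det - 1) / 2.0
    row = [2.0 * math.sqrt(max(radius ** 2 - (i - centre) ** 2, 0.0))
           for i in range(n_det)]
    return [list(row) for _ in range(n_angles)]


n_det, n_angles = 65, 180
angles = [k * math.pi / n_angles for k in range(n_angles)]
sinogram = disk_sinogram(10.0, n_det, n_angles)
taps = ram_lak_kernel(n_det)
filtered = [filter_projection(p, taps) for p in sinogram]

inside = backproject(filtered, angles, 0.0, 0.0)    # point inside the disk
outside = backproject(filtered, angles, 20.0, 0.0)  # point outside the disk
print(f"inside: {inside:.3f}, outside: {outside:.3f}")
```

Each angle is filtered independently and each pixel is back-projected independently, which is why the method maps well onto GPUs. A production system would typically filter in Fourier space rather than by direct convolution.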

Page 12: Large Scale Data Facility for Data Intensive Synchrotron

Imaging Beamline Data Management

• Data preparation for storage, archiving & processing
• Meta data:
  • Life cycle
  • Access rights
  • Experiment description
  • User information
  • Processing information
  • Data provenance
• Offline processing preparation
• Ingest
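The meta data categories listed on this slide could be grouped into one record that travels with each data set at ingest. The sketch below is only an illustration: the class, its field names, and the example values are hypothetical, and it does not reflect the KIT Data Manager schema or the NeXus format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class DatasetMetadata:
    """Hypothetical per-data-set record covering the categories on the slide."""
    dataset_id: str
    life_cycle: str                      # e.g. a retention policy label
    access_rights: List[str]             # accounts allowed to read the data
    experiment_description: str
    user_information: Dict[str, str]
    processing_information: List[str] = field(default_factory=list)
    data_provenance: List[str] = field(default_factory=list)

    def record_processing(self, step: str, source: str) -> None:
        """Log a processing step together with the data it was derived from."""
        self.processing_information.append(step)
        self.data_provenance.append(source)

    def to_json(self) -> str:
        """Serialise the record, e.g. to store it next to the raw data."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example values are invented for illustration only.
record = DatasetMetadata(
    dataset_id="scan-0001",
    life_cycle="archive",
    access_rights=["beamline-staff", "guest-user"],
    experiment_description="tomography scan, example sample",
    user_information={"name": "example user", "affiliation": "example institute"},
)
record.record_processing("filtered back-projection", "scan-0001/raw")
print(record.to_json())
```

Keeping processing steps and their sources in the same record is one simple way to make provenance available to later offline processing.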

Page 13: Large Scale Data Facility for Data Intensive Synchrotron

Imaging Beamline Data Management

• Data storage and archive
• Automatic data processing, e.g. reconstruction using computationally demanding algorithms
• Extending data sets by processing results

Page 14: Large Scale Data Facility for Data Intensive Synchrotron

Imaging Beamline Data Management

• User access via DataBrowser or Web interface
• Temporary accounts for external users

Page 15: Large Scale Data Facility for Data Intensive Synchrotron

Discussion

Imaging Beamline Data Management
• DAQ in production
• Online analysis in production
• Pre-archival in development
• Data format (NeXus) in development
• Common Data Model tools (SOLEIL) in preparation

Large Scale Data Facility
• Data infrastructure and basic services in production
• Data intensive computing in production
• DataBrowser, execution framework prototype, in production (biology)
• KIT Data Manager (KDM) in development

Page 16: Large Scale Data Facility for Data Intensive Synchrotron

Conclusion

KIT provides high level infrastructures and services

Large Scale Data Facility infrastructures:
• “Unlimited” resources
• Independent technology development
• Shared funding:
  • Research Programme SuperComputing
  • Communities

Cutting edge beamline research at KIT is not limited by infrastructure constraints