toward real-time analysis of large data volumes for diffraction studies by martin kunz, lbnl

Towards real-time analysis of large data volumes for synchrotron experiments
Martin Kunz, Nobumichi Tamura
Advanced Light Source, Lawrence Berkeley National Lab

Upload: earthcube

Post on 27-Jun-2015


DESCRIPTION

Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research. By Martin Kunz, Lawrence Berkeley National Laboratory

TRANSCRIPT

Page 1: Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies by Martin Kunz, LBNL

Towards real-time analysis of large data volumes for synchrotron experiments

Martin Kunz, Nobumichi Tamura

Advanced Light Source, Lawrence Berkeley National Lab

Page 2

Acknowledgements

- Jack Deslippe, David Skinner (NERSC)

- Abdelilah Essiari, Craig E. Tull (LBNL-CRD)

- Eli Dart (ESnet)

- Dula Parkinson (LBNL – ALS)


Page 3

X-rays and Earth-Sciences; the story of a moving bottle-neck:

X-ray Source

X-ray Detectors

Data Analysis

Publication

1960’s / 1970’s

Henry Levy with Picker 5-circle and PDP-5

Page 4

X-rays and Earth-Sciences; the story of a moving bottle-neck:

X-ray Source

X-ray Detectors

Data Analysis

Publication

1980’s / 1990’s

1995: “MD Storm”: Readout time: 45 minutes

Page 5

X-rays and Earth-Sciences; the story of a moving bottle-neck:

X-ray Source

X-ray Detectors

Data Analysis

Publication

2000’s / 2010’s

Page 6

X-rays and Earth-Sciences; the story of a moving bottle-neck:

X-ray Source

X-ray Detectors

Data Analysis

Publication

Future:

Interactive access to supercomputers

Page 7

Examples of mineral physics related experiments with high data rates:

1) In situ powder diffraction with automated P-T stepping:

http://www.ltp-oldenburg.de

ALS BL 12.2.2 with Perkin Elmer detector (~0 read-out delay)

Data rate on the order of thousands of frames per day (i.e. tens of GB/day)

Page 8

Examples of mineral physics related experiments with high data rates:

2) Micro-diffraction / phase/orientation/strain-mapping at high spatial resolution

Left: Distribution of Re3N (black) and Re (blue) grown in a laser-heated DAC.

Right: Relative orientation of Re3N grains.

Source: Friedrich et al. (2010), PRL (105), 085504.

Data rate on the order of tens of thousands of frames per day (i.e. hundreds of GB/day)

Micro-diffraction set-up at ALS beamline 12.3.2 with Pilatus-1M detector.

Page 9

Examples of mineral physics related experiments with high data rates:

3) Tomography: 3D mapping of geo-materials:

Data rate on the order of hundreds of thousands of frames per day (i.e. TBs/day)

Tomography set-up at ALS beamline 8.3.2 (figure labels: X-rays, scintillator)

Supercritical CO2 penetrating sandstone on ALS BL 8.3.2 (courtesy J Ajo-Franklin)

Distribution of Fe-alloy melt prepared at 64 GPa, measured at SSRL. Shi et al. (2013), Nature Geoscience. DOI: 10.1038/NGEO1956
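To compare the three experiment types, the following minimal sketch converts a daily data volume into the average transfer rate it implies. The function name and the round volumes (10 GB, 100 GB and 1 TB per day) are illustrative assumptions standing in for the orders of magnitude quoted on the slides; they are not measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_mbit_per_s(bytes_per_day: float) -> float:
    """Average network rate (Mbit/s) needed to move one day's data within one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


# Round, order-of-magnitude volumes (assumed for illustration only).
examples = {
    "powder diffraction (~10 GB/day)": 10e9,
    "micro-diffraction (~100 GB/day)": 100e9,
    "tomography (~1 TB/day)": 1e12,
}

for name, volume in examples.items():
    print(f"{name}: {sustained_mbit_per_s(volume):.1f} Mbit/s sustained")
```

Under these assumptions even the tomography case averages roughly 93 Mbit/s, although data acquired in bursts would need a higher peak rate than the daily average suggests.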

Page 10

How do we tackle this at the ALS?

1) Not-quite-real-time - local cluster for micro-diffraction analysis

- 24 dual-socket AMD Opteron 248 (2.2 GHz) nodes, i.e. 48 CPUs

- 48 GB aggregate memory

- 14 TB shared disk storage

- Gigabit Ethernet interconnect

- 212 GFLOPS (theoretical peak)
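The quoted peak can be roughly reproduced from the node count and clock speed. The sketch below assumes 2 floating-point operations per cycle per CPU; that factor is an assumption made here and does not appear on the slide.

```python
def theoretical_peak_gflops(n_cpus: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = CPUs x clock rate (GHz) x FLOPs issued per cycle."""
    return n_cpus * clock_ghz * flops_per_cycle


# 24 dual-socket nodes -> 48 CPUs at 2.2 GHz; flops_per_cycle=2 is an assumption.
peak = theoretical_peak_gflops(n_cpus=48, clock_ghz=2.2, flops_per_cycle=2)
print(f"{peak:.1f} GFLOPS")
```

With that assumption the result is 211.2 GFLOPS, close to the 212 GFLOPS stated above.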

Page 11

How do we tackle this at the ALS?

1) Not-quite-real-time - local cluster for micro-diffraction analysis

1) User tunes parameters manually on some ‘typical’ patterns

Page 12

How do we tackle this at the ALS?

1) Not-quite-real-time - local cluster for micro-diffraction analysis

1) Analysis parameters are written into an instruction file

Page 13

How do we tackle this at the ALS?

1) Not-quite-real-time - local cluster for micro-diffraction analysis

1) Analysis parameters are written into an instruction file

Page 14

How do we tackle this at the ALS?

1) Not-quite-real-time - local cluster for micro-diffraction analysis

2) Launch parsing script:

- reads the instruction file and distributes the data files across the available CPUs

- writes batch files which manage the individual CPUs

- launches the software on each node
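The first two steps of the parsing script can be sketched as follows. This is not the ALS script: the function names, the round-robin split, the batch-file naming, and the `analyze_frame` command with its `--instructions` flag are all hypothetical placeholders. Launching the batch files on the nodes is left out.

```python
from pathlib import Path


def partition(frames: list[str], n_cpus: int) -> list[list[str]]:
    """Deal the frames round-robin into one work list per CPU."""
    return [frames[i::n_cpus] for i in range(n_cpus)]


def write_batch_files(frames, n_cpus, instruction_file, out_dir,
                      program="analyze_frame"):
    """Write one shell batch file per CPU; `program` is a hypothetical command."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for cpu, chunk in enumerate(partition(frames, n_cpus)):
        path = out / f"batch_{cpu:03d}.sh"
        lines = ["#!/bin/sh"]
        # One analysis call per frame, all sharing the same instruction file.
        lines += [f"{program} --instructions {instruction_file} {frame}"
                  for frame in chunk]
        path.write_text("\n".join(lines) + "\n")
        paths.append(path)
    return paths


# Example: 10 frames dealt across 3 CPUs.
demo_frames = [f"frame_{i:04d}.tif" for i in range(10)]
print([len(chunk) for chunk in partition(demo_frames, 3)])
```

A round-robin split keeps the per-CPU work lists within one frame of each other in length, which is a reasonable default when frames take similar time to analyze.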

Page 15

How do we tackle this at the ALS?

1) Not-quite-real-time - local cluster for micro-diffraction analysis

3) Results are written to a single file, which can be viewed, further analyzed, and published:

Average intensity: gives the high-resolution fine structure of the grain.

Relative lattice orientation: gives the domain structure. The total color range from blue to red corresponds to 4° of rotation.

Page 16

How do we tackle this at the ALS?

2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC) (in development)

1) Data are sent directly to NERSC for analysis and storage during data collection

Data are packaged:

- after every n images a ‘trigger file’ is deposited in a directory which is monitored by NERSC.

- a SPADE web-app wraps the data (512 files at a time) with HDF5 (Hierarchical Data Format) and ships them to NERSC via a Gigabit line (to be upgraded to a 10G line).

- at NERSC the data are received by a SPADE instance, which places them in the target folder and on tape and sends an acknowledgment.
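The trigger-file idea can be illustrated with a minimal sketch. It does not use SPADE or its API: the function names, the trigger-file naming, and the choice of n = 512 (borrowed from the batch size quoted above) are assumptions, and the HDF5 wrapping and network transfer are not shown.

```python
from pathlib import Path

BATCH_SIZE = 512  # files per shipment, as quoted on the slide


def batches(files, size=BATCH_SIZE):
    """Yield consecutive groups of at most `size` files."""
    for start in range(0, len(files), size):
        yield files[start:start + size]


def drop_trigger(watch_dir, batch_index, batch):
    """Write a trigger file listing one batch into the watched directory."""
    watch = Path(watch_dir)
    watch.mkdir(parents=True, exist_ok=True)
    trigger = watch / f"batch_{batch_index:05d}.trigger"
    tmp = trigger.with_name(trigger.name + ".tmp")
    tmp.write_text("\n".join(batch) + "\n")
    # Rename at the end so a watcher never sees a half-written trigger file.
    tmp.replace(trigger)
    return trigger


# Example: 1100 images split into shipments of at most 512 files.
images = [f"img_{i:05d}.tif" for i in range(1100)]
print([len(group) for group in batches(images)])
```

Writing to a temporary name and renaming is a common way to make the appearance of the trigger file atomic from the point of view of whatever process monitors the directory.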

Page 17

How do we tackle this at the ALS?

2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC) (in development)

1) Data are sent directly to NERSC for analysis and storage during data collection: Up and running

Transfer control is web-based

Page 18

How do we tackle this at the ALS?

2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC) (in development)

1) Data are sent directly to NERSC for analysis and storage during data collection: Up and running

Transfer control is web-based

Page 19

How do we tackle this at the ALS?

2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC) (in development)

1) Data are sent directly to NERSC for analysis and storage during data collection: Up and running

Transfer control is web-based

Page 20

How do we tackle this at the ALS?

2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC) (in development)

2) Analysis parameters are set up with a web app (under development)

Page 21

How do we tackle this at the ALS?

2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC) (in development)

2) Analysis parameters are set up with a web app (under development)

Jobs are launched manually by the user via the same web page.

Test runs indicate an analysis time on the order of the data collection time; the analysis can in principle run synchronously with data collection.
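Whether analysis can keep pace with data collection comes down to comparing analysis throughput with the acquisition rate. The sketch below estimates the number of cores needed, assuming frames are independent and parallelize perfectly; the 60 s per frame and 10,000 frames per day are made-up numbers, not test-run results.

```python
import math

SECONDS_PER_DAY = 86_400


def cores_to_keep_pace(analysis_s_per_frame: float, frames_per_day: int) -> int:
    """Smallest core count whose combined throughput matches the acquisition rate."""
    return math.ceil(analysis_s_per_frame * frames_per_day / SECONDS_PER_DAY)


# Hypothetical inputs: 60 s of single-core analysis per frame, 10,000 frames/day.
print(cores_to_keep_pace(analysis_s_per_frame=60.0, frames_per_day=10_000))
```

With these hypothetical inputs the estimate is 7 cores; real frames vary in cost, so some headroom would be needed in practice.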

Page 22

How do we tackle this at the ALS?

2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC) (in development)

3) Analysis jobs are executed on Carver - under development

Carver is an IBM iDataPlex cluster:

- 1202 nodes with a total of 9984 processor cores

- 106 Tflop/sec peak performance

- largest allocated parallel job is 512 cores
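For scale, the sketch below estimates the peak performance available to the largest allowed job and compares it with the 212 GFLOPS local cluster described earlier. It assumes the quoted peak is spread evenly over all cores, which is a simplification made here rather than a statement about Carver's actual node mix.

```python
CARVER_PEAK_GFLOPS = 106_000      # 106 Tflop/s, as quoted above
CARVER_CORES = 9984
MAX_JOB_CORES = 512
LOCAL_CLUSTER_PEAK_GFLOPS = 212   # local micro-diffraction cluster


def job_peak_gflops(job_cores: int) -> float:
    """Share of the machine's peak for a job, assuming identical cores."""
    return CARVER_PEAK_GFLOPS * job_cores / CARVER_CORES


largest_job = job_peak_gflops(MAX_JOB_CORES)
ratio = largest_job / LOCAL_CLUSTER_PEAK_GFLOPS
print(f"{largest_job:.0f} GFLOPS, {ratio:.1f}x the local cluster")
```

Under that assumption a 512-core job has about 5.4 Tflop/s of theoretical peak, roughly 26 times the local cluster.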

Page 23

Summary:

- Data analysis is the new bottle-neck limiting progress in many aspects of experimental mineral physics

- Real-time analysis with immediate feed-back is increasingly important in experimental mineral physics

- These challenges cannot always be met with traditional desktop machines: software has to be automated and parallelized, and collaboration with super-computing centers is becoming important also for experimental scientists (at least for a few more iterations of Moore’s cycle).

- Data analysis on super-computers, remotely controlled with web applications, is a very promising avenue, allowing big-data methods to enter mineral physics.

- Future developments may (must?) evolve away from super-computers to highly parallelized local computers (GPUs) and/or cloud computing.