Toward Real-Time Analysis of Large Data Volumes for Diffraction Studies, by Martin Kunz, LBNL
DESCRIPTION
Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research. By Martin Kunz, Lawrence Berkeley National Laboratory.
TRANSCRIPT
Towards real-time analysis of large data volumes for synchrotron experiments
Martin Kunz, Nobumichi Tamura
Advanced Light Source, Lawrence Berkeley National Lab
Acknowledgements
- Jack Deslippe, David Skinner (NERSC)
- Abdelilah Essiari , Craig E. Tull (LBNL-CRD)
- Eli Dart (ESNET)
- Dula Parkinson (LBNL – ALS)
X-rays and Earth sciences: the story of a moving bottleneck:
X-ray Source
X-ray Detectors
Data Analysis
Publication
1960’s / 1970’s
Henry Levy with Picker 5-circle and PDP-5
1980’s / 1990’s
1995: “MD Storm”: Readout time: 45 minutes
2000’s / 2010’s
Future:
Interactive access to supercomputers
Examples of mineral physics related experiments with high data rates:
1) In situ powder diffraction with automated P-T stepping:
http://www.ltp-oldenburg.de
ALS BL 12.2.2 with Perkin Elmer detector (~0 read-out delay)
Data rate on the order of 1000s of frames per day (i.e. 10s of GB/day)
2) Micro-diffraction: phase-, orientation-, and strain-mapping at high spatial resolution
Left: Distribution of Re3N (black) and Re (blue) grown in a laser-heated DAC.
Right: Relative orientation of Re3N grains.
Source: Friedrich et al. (2010), PRL (105), 085504.
Data rate on the order of 10,000s of frames per day (i.e. 100s of GB/day)
Micro-diffraction set-up at ALS beamline 12.3.2 with Pilatus-1M detector.
3) Tomography: 3D mapping of geo-materials:
Data rate on the order of 100,000s of frames per day (i.e. TBs/day)
Tomography set-up at ALS beamline 8.3.2
Supercritical CO2 penetrating sandstone on ALS BL 8.3.2 (courtesy J Ajo-Franklin)
Distribution of Fe-alloy melt prepared at 64 GPa, measured at SSRL. Shi et al. (2013), Nature Geoscience. DOI: 10.1038/NGEO1956
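The per-day volumes quoted for the three examples follow from simple frame-count times frame-size arithmetic. A minimal sketch, where the frame counts are the orders of magnitude from the talk but the per-frame sizes are illustrative assumptions (not figures given by the speaker):

```python
# Back-of-envelope data rates for the three experiment types above.
# Per-frame sizes are assumed, e.g. ~8 MB for a 2048 x 2048 16-bit
# flat-panel frame; the frame counts follow the talk's orders of magnitude.
experiments = {
    "powder diffraction (Perkin Elmer)": (5_000, 8e6),    # frames/day, bytes/frame
    "micro-diffraction (Pilatus-1M)":    (50_000, 4e6),
    "tomography":                        (500_000, 8e6),
}

for name, (frames, size) in experiments.items():
    gb_per_day = frames * size / 1e9
    print(f"{name}: {frames} frames/day -> ~{gb_per_day:.0f} GB/day")
```

With these assumed sizes the three cases land at roughly 40 GB/day, 200 GB/day, and 4 TB/day, matching the "10s of GB", "100s of GB", and "TBs" per day quoted above.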
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
- 24 dual-socket AMD Opteron 248 2.2 GHz processor nodes (48 CPUs)
- 48 GB aggregate memory
- 14 TB shared disk storage
- Gigabit Ethernet interconnect
- 212 GFLOPS (theoretical peak)
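As a sanity check, the quoted theoretical peak is consistent with the listed hardware if one assumes 2 double-precision FLOPs per clock cycle per CPU (the per-cycle figure is an assumption about the Opteron 248, not stated in the talk):

```python
# Theoretical peak of the local cluster: CPUs x clock x FLOPs/cycle.
# flops_per_cycle = 2 is an assumed value for this CPU generation.
cpus = 48
clock_ghz = 2.2
flops_per_cycle = 2

peak_gflops = cpus * clock_ghz * flops_per_cycle
print(f"theoretical peak: ~{peak_gflops:.0f} GFLOPS")  # ~211, close to the 212 quoted
```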
1) User tunes parameters manually on some ‘typical’ patterns
2) Analysis parameters are written into an instruction file
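The talk does not show the instruction-file format, so the sketch below invents a plausible "key = value" layout; every parameter name here is a hypothetical placeholder, not the pipeline's actual vocabulary:

```python
# Hypothetical sketch of the instruction file written after manual tuning.
# All keys below are assumed placeholders for illustration only.
def build_instruction_file(params):
    """Render analysis parameters as simple 'key = value' lines."""
    return "".join(f"{key} = {value}\n" for key, value in params.items())

params = {
    "input_dir": "/data/scan_001",          # where the frames live (assumed)
    "output_file": "results_scan_001.txt",  # single results file (assumed)
    "peak_threshold": 50,                   # counts above background (assumed)
    "detector_distance_mm": 150.0,          # calibration parameter (assumed)
}
instruction_text = build_instruction_file(params)
```

The point is only that the manually tuned parameters become a machine-readable file the cluster-side scripts can consume.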
3) Launch parsing script:
-> reads the instruction file and distributes the data files across the available CPUs
-> writes batch files which manage the individual CPUs
-> launches the software on each node
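The parsing/dispatch step above might look roughly like the following sketch; the round-robin partitioning and batch-file naming are assumptions, not the actual ALS implementation:

```python
# Sketch of the dispatch step: split the data files across CPUs,
# write one batch file per CPU, ready to be launched on each node.
from pathlib import Path

def partition(files, n_cpus):
    """Round-robin the data files across the available CPUs."""
    chunks = [[] for _ in range(n_cpus)]
    for i, f in enumerate(files):
        chunks[i % n_cpus].append(f)
    return chunks

def write_batch_files(chunks, out_dir="batches"):
    """Write one batch file per CPU listing the frames it should process."""
    Path(out_dir).mkdir(exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        p = Path(out_dir) / f"cpu_{i:02d}.batch"
        p.write_text("\n".join(chunk) + "\n")
        paths.append(p)
    return paths

# The analysis software would then be started on each node with its
# batch file (e.g. via the cluster's queueing system; not shown here).
```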
4) Results are written to a single file which can be viewed, further analyzed, and published:
Average intensity: gives the high-resolution fine structure of the grain.
Relative lattice orientation: gives the domain structure. The total color range from blue to red corresponds to 4 degrees of rotation.
2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC) (in development)
1) Data are sent directly to NERSC for analysis and storage during data collection
Data are packaged as follows:
- after every n images, a 'trigger file' is deposited in a directory which is monitored by NERSC
- a SPADE web-app wraps the data (512 files at a time) with HDF5 (Hierarchical Data Format) and ships them to NERSC via a Gigabit line (to be upgraded to a 10G line)
- at NERSC, a SPADE instance receives the data, places them in the target folder and on tape, and sends an acknowledgment
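A minimal sketch of the trigger-file and batching logic described above; the 512-file batch size comes from the talk, while the polling loop, file patterns, and directory layout are assumptions:

```python
# Sketch of the beamline-side trigger/batching mechanism. The HDF5
# wrapping and network transfer done by SPADE are not reproduced here.
import time
from pathlib import Path

BATCH_SIZE = 512  # files per HDF5 package, per the talk

def pending_batches(data_dir):
    """Group collected images into batches of 512 for packaging."""
    images = sorted(Path(data_dir).glob("*.tif"))
    return [images[i:i + BATCH_SIZE] for i in range(0, len(images), BATCH_SIZE)]

def watch_for_trigger(trigger_dir, poll_s=5.0):
    """Block until a trigger file appears in the monitored directory."""
    while True:
        triggers = list(Path(trigger_dir).glob("*.trigger"))
        if triggers:
            return triggers[0]
        time.sleep(poll_s)

# On each trigger, SPADE would wrap the pending batches in HDF5
# containers and ship them to NERSC, where the receiving SPADE
# instance files them away and sends back an acknowledgment.
```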
This step is up and running; transfer control is web-based.
2) Analysis parameters are set up with a web-app - under development
Jobs are launched manually by the user via the same web-page. Test runs indicate an analysis time on the order of the data-collection time, so the analysis can in principle run synchronously with data collection.
3) Analysis jobs are executed on Carver - under development
Carver is an IBM iDataPlex cluster:
- 1202 nodes with a total of 9984 processor cores
- 106 Tflop/sec peak performance
- largest allocated parallel job is 512 cores
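Launching an analysis job on a Torque/PBS cluster such as Carver could be sketched as below; the `#PBS` directives are standard PBS syntax, but the resource values, queue behavior, and analysis command are illustrative assumptions, not the actual web-app's output:

```python
# Hypothetical sketch: build a PBS batch script that the web-app (or a
# user) would hand to `qsub`. The command and resource numbers are
# placeholders for illustration only.
def make_job_script(nodes, ppn, walltime, command):
    """Build a minimal PBS job script as a string."""
    return "\n".join([
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        "cd $PBS_O_WORKDIR",   # run from the submission directory
        command,
        "",
    ])

script = make_job_script(4, 8, "01:00:00",
                         "mpirun -np 32 ./analyze instructions.txt")
```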
Summary:
- Data analysis is the new bottleneck limiting progress in many areas of experimental mineral physics.
- Real-time analysis with immediate feedback is increasingly important in experimental mineral physics.
- These challenges cannot always be met with traditional desktop machines; software has to be automated and parallelized, and collaborations with supercomputing centers are becoming important for experimental scientists as well (at least for a few more iterations of Moore's cycle).
- Data analysis on supercomputers, remotely controlled with web applications, is a very promising avenue, allowing big-data methods to enter mineral physics.
- Future developments may (must?) evolve away from supercomputers toward highly parallelized (GPU-based) local computers and/or cloud computing.