Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen
Oct. 15, 2013 @ U-REaSON Seminar
Data-Intensive Scalable Computing Laboratory (DISCL)
Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Introduction
Scientific simulations now generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future.
Example: VPIC (Vector Particle-in-Cell), plasma physics: 26 bytes per particle, 30 TB.
Accessing and analyzing the data reveals poor I/O performance due to the mismatch between logical access and physical layout.
Introduction
Scientific datasets and scientific I/O libraries: PnetCDF, HDF5, ADIOS
PnetCDF
MPI-IO
Parallel File Systems
Scientific I/O libraries allow users to specify array-based logical input.
Logical-physical mismatch
Motivation
I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5):
Independent I/O: process collaboration: no; call collaboration: no
Collective I/O: process collaboration: yes; call collaboration: no
Nonblocking I/O: process collaboration: yes; call collaboration: yes
Motivation
Contention on storage servers without locality awareness
[Figure: two-phase collective I/O. Calls Call0, Call1, ..., Calli, with aggregators ag00-ag03, ag10-ag13, ..., agi0-agi3.]
Idea: High level I/O Aggregation
Logical input decomposition:
Call0: start{0,0,0} length{100,200,200}
  sub0: start{0,0,0} length{100,200,100}
  sub1: start{0,0,100} length{100,200,100}
Call1: start{10,20,100} length{10,300,400}
  sub2: start{10,20,100} length{10,150,400}
  sub3: start{10,170,100} length{10,150,400}
[Figure: physical layout of sub0 and sub2, and of sub1 and sub3.]
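The decomposition of Call0 and Call1 into sub-arrays can be sketched in a few lines. This is only an illustration: `split_request` is a hypothetical helper, not part of HiLa or any I/O library, and it assumes an even split along a single dimension, which is what the start/length values on the slide suggest.

```python
from typing import List, Tuple

# A request is a (start, length) pair, one entry per array dimension.
Request = Tuple[Tuple[int, ...], Tuple[int, ...]]


def split_request(start: Tuple[int, ...], length: Tuple[int, ...],
                  dim: int, parts: int) -> List[Request]:
    """Split one request into `parts` equal sub-arrays along dimension `dim`.

    Assumption for this sketch: the length along `dim` divides evenly.
    """
    if length[dim] % parts != 0:
        raise ValueError("length along dim must divide evenly in this sketch")
    step = length[dim] // parts
    subs = []
    for i in range(parts):
        sub_start = list(start)
        sub_length = list(length)
        sub_start[dim] = start[dim] + i * step  # shift the origin along dim
        sub_length[dim] = step                  # shrink the extent along dim
        subs.append((tuple(sub_start), tuple(sub_length)))
    return subs


# Call0 is split along the third dimension, Call1 along the second.
call0_subs = split_request((0, 0, 0), (100, 200, 200), dim=2, parts=2)
call1_subs = split_request((10, 20, 100), (10, 300, 400), dim=1, parts=2)
```

With these inputs the helper reproduces the four sub-array requests listed on the slide (sub0 and sub1 from Call0, sub2 and sub3 from Call1).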
Idea: High level I/O Aggregation
Basic idea:
Figure out the overlap among requests.
Eliminate the overlap before doing I/O.
Challenges:
How to decompose the requests.
How to aggregate the sub-arrays at a high level.
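The slides do not define "overlap" precisely. A minimal sketch, taking the simplest reading (overlap in logical index space between two start/length requests), follows. The function name `overlap` is an assumption for illustration; HiLa itself may reason about overlap on the storage side instead.

```python
from typing import Optional, Tuple

Request = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (start, length)


def overlap(a: Request, b: Request) -> Optional[Request]:
    """Return the hyper-rectangle shared by two requests, or None if disjoint."""
    (start_a, len_a), (start_b, len_b) = a, b
    start, length = [], []
    for s1, l1, s2, l2 in zip(start_a, len_a, start_b, len_b):
        lo = max(s1, s2)            # first index covered by both requests
        hi = min(s1 + l1, s2 + l2)  # one past the last shared index
        if hi <= lo:
            return None             # disjoint along this dimension
        start.append(lo)
        length.append(hi - lo)
    return tuple(start), tuple(length)


call0 = ((0, 0, 0), (100, 200, 200))
call1 = ((10, 20, 100), (10, 300, 400))
shared = overlap(call0, call1)
```

For Call0 and Call1 from the slide, the shared region starts at {10,20,100} with length {10,180,100}; reading it once instead of twice is the kind of saving the basic idea aims at.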
HiLa: High-Level I/O Aggregation
Way to figure out the physical layout:
Sub-correlation function
Sub-correlation set
Lustre striping: stripe size t; stripe count l
Dataset: dimension d; subset size m
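The slide does not give the sub-correlation function itself, so the following is not the authors' definition. It is a rough sketch of the underlying idea: with round-robin striping (stripe size t, stripe count l), a byte offset maps to a storage target, and a sub-array maps to the set of targets it touches. The sketch assumes a row-major array stored from file offset 0 with no header, which real PnetCDF or HDF5 files do not satisfy exactly; all function names are hypothetical.

```python
from itertools import product
from typing import Sequence, Set


def ost_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Storage target holding the byte at `offset` under round-robin striping."""
    return (offset // stripe_size) % stripe_count


def osts_for_subarray(dims: Sequence[int], start: Sequence[int],
                      length: Sequence[int], elem_size: int,
                      stripe_size: int, stripe_count: int) -> Set[int]:
    """Set of storage targets touched by a sub-array (row-major, offset 0 assumed)."""
    # Byte stride of each dimension in row-major order.
    strides = [elem_size] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]

    run_bytes = length[-1] * elem_size  # one contiguous run along the last dim
    lead_ranges = [range(s, s + n) for s, n in zip(start[:-1], length[:-1])]

    osts: Set[int] = set()
    for lead in product(*lead_ranges):
        off = sum(i * st for i, st in zip(lead, strides)) + start[-1] * strides[-1]
        first = off // stripe_size
        last = (off + run_bytes - 1) // stripe_size
        for stripe in range(first, last + 1):
            osts.add(ost_index(stripe * stripe_size, stripe_size, stripe_count))
    return osts


# Toy case: a 4 x 8 array of 1-byte elements, stripe size 8, stripe count 2,
# so each row is exactly one stripe and rows alternate between two targets.
one_row = osts_for_subarray((4, 8), (0, 0), (1, 8), 1, 8, 2)
two_rows = osts_for_subarray((4, 8), (0, 0), (2, 4), 1, 8, 2)
```

Two sub-arrays whose target sets coincide are candidates for being aggregated together; sub-arrays with disjoint sets can be served without contending for the same server.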
Conclusion and Future Work
Conclusion:
The mismatch between logical access and physical layout can lead to poor performance.
We propose a locality-driven high-level aggregation approach (HiLa) that facilitates existing I/O methods by eliminating the overlap among sub-array requests.
Future work:
Apply to write operations.
Integrate with file systems.
Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Thanks
Q&A
http://discl.cs.ttu.edu