Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen
Oct. 15, 2013 @ U-REaSON Seminar
Data-Intensive Scalable Computing Laboratory (DISCL)
Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Introduction
Scientific simulations nowadays generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future.
Example: VPIC (Vector Particle-in-Cell), plasma physics: 26 bytes per particle, 30 TB.
Accessing and analyzing the data reveals poor I/O performance due to the mismatch between logical access and physical layout.
Introduction
Scientific Datasets and Scientific I/O Libraries: PnetCDF, HDF5, ADIOS
PnetCDF
MPI-IO
Parallel File Systems
Scientific I/O libraries allow users to specify array-based logical input
Logical-physical mismatch
Motivation
I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5):
Independent I/O: collaboration among processes: No; collaboration among calls: No
Collective I/O: collaboration among processes: Yes; collaboration among calls: No
Nonblocking I/O: collaboration among processes: Yes; collaboration among calls: Yes
Motivation
Contention on Storage Servers without Locality Awareness
[Figure: each call (Call0, Call1, ..., Calli) goes through two-phase collective I/O with its own aggregators (ag00-ag03, ag10-ag13, ..., agi0-agi3)]
Performance with Overlapping Calls
Conclusion: Overlapping Should be Removed
Idea: High level I/O Aggregation
Logical Input Decomposition
Call0: start{0,0,0} length{100,200,200}, decomposed into start{0,0,0} length{100,200,100} and start{0,0,100} length{100,200,100}
Call1: start{10,20,100} length{10,300,400}, decomposed into start{10,20,100} length{10,150,400} and start{10,170,100} length{10,150,400}
Physical layout: the sub-arrays are labeled sub0 to sub3; sub0 and sub2 are placed together, as are sub1 and sub3, and each pair is aggregated (sub0sub2, sub1sub3)
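The decomposition on this slide can be reproduced with a small sketch. It assumes each request is a (start, length) pair and that a request is cut into equal contiguous pieces along one dimension; the function name `split_request` and the equal-split rule are illustrative assumptions, since HiLa itself derives the cut points from the physical layout.

```python
def split_request(start, length, dim, parts):
    """Split a hyperslab (start, length) into `parts` contiguous pieces along `dim`.

    Hypothetical helper: equal-size cuts are an assumption made for illustration.
    """
    base, extra = divmod(length[dim], parts)
    subs = []
    offset = start[dim]
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more parts than elements along this dimension
        sub_start = list(start)
        sub_length = list(length)
        sub_start[dim] = offset
        sub_length[dim] = size
        subs.append((tuple(sub_start), tuple(sub_length)))
        offset += size
    return subs


# The two calls from the slide, split along the dimension that was cut there.
call0_subs = split_request((0, 0, 0), (100, 200, 200), dim=2, parts=2)
call1_subs = split_request((10, 20, 100), (10, 300, 400), dim=1, parts=2)
```

With these inputs the pieces match the four start/length pairs listed on the slide.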
Idea: High level I/O Aggregation
Basic Idea
Figure out the overlapping among requests
Eliminate the overlapping before doing I/O
Challenges
How to decompose the requests
How to aggregate the sub-arrays at a high level
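The slides do not define "overlapping" precisely. As a minimal sketch, assuming the simplest reading (two requests cover some of the same array elements), the overlap of two (start, length) requests is their per-dimension intersection. The function name `overlap` is a hypothetical helper, and the overlap that HiLa removes may instead be defined at the storage-server level.

```python
def overlap(a, b):
    """Return the (start, length) region covered by both requests, or None.

    Each request is a (start, length) pair of equal-length tuples.
    """
    (a_start, a_len), (b_start, b_len) = a, b
    start, length = [], []
    for s1, l1, s2, l2 in zip(a_start, a_len, b_start, b_len):
        lo = max(s1, s2)
        hi = min(s1 + l1, s2 + l2)
        if hi <= lo:
            return None  # disjoint along this dimension, so no overlap at all
        start.append(lo)
        length.append(hi - lo)
    return tuple(start), tuple(length)


# Requests with the same shape as the two example calls in the slides.
call0 = ((0, 0, 0), (100, 200, 200))
call1 = ((10, 20, 100), (10, 300, 400))
shared = overlap(call0, call1)
```

Under this reading, the shared region could be read once and delivered to both calls instead of being fetched twice.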
HiLa: High-level I/O Aggregation
Way to figure out the physical layout:
Sub-correlation function
Sub-correlation set
Lustre striping: stripe size t, stripe count l
Dataset: dimension d, subset size m
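The sub-correlation function itself is not given in the transcript. As background for the striping parameters, the sketch below shows plain round-robin striping, which maps a file offset to a storage target using the stripe size and stripe count. It assumes a row-major array stored contiguously with no file header; the function names and the 4-byte element size are illustrative, not part of HiLa.

```python
def server_of_offset(offset, stripe_size, stripe_count):
    """Index of the storage target holding `offset` under round-robin striping."""
    return (offset // stripe_size) % stripe_count


def element_offset(index, dims, elem_size):
    """Byte offset of an array element, assuming row-major order and no header."""
    linear = 0
    for i, n in zip(index, dims):
        linear = linear * n + i
    return linear * elem_size


stripe_size = 1 << 20   # t: 1 MiB, an assumed value
stripe_count = 4        # l: an assumed value
dims = (100, 200, 200)  # array shape taken from the Call0 example

off = element_offset((1, 0, 0), dims, elem_size=4)
target = server_of_offset(off, stripe_size, stripe_count)
```

Two sub-arrays whose byte ranges fall on the same target are candidates for aggregation into one request.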
HiLa Algorithm: Prior Step
Prior step: calculate the sub-correlation set (one-time analysis)
HiLa Algorithm: Decomposition
Main steps: request decomposition and aggregation
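The aggregation step can be pictured as regrouping decomposed sub-requests by where they live rather than by which call issued them. The sketch below is a loose illustration under that assumption; the tuple layout, the server indices, and the assignment of sub0 to sub3 to calls are hypothetical and not taken from the HiLa algorithm.

```python
from collections import defaultdict


def aggregate_by_server(sub_requests):
    """Group (call_id, server, sub_array) tuples into one list per server."""
    groups = defaultdict(list)
    for call_id, server, sub in sub_requests:
        groups[server].append((call_id, sub))
    return dict(groups)


# Hypothetical placement: sub0 and sub2 share one server, sub1 and sub3 another.
subs = [
    (0, 0, "sub0"),
    (0, 1, "sub1"),
    (1, 0, "sub2"),
    (1, 1, "sub3"),
]
plan = aggregate_by_server(subs)
```

Each server is then visited once with a combined request, and the results are handed back to the originating calls by `call_id`.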
Improvement with HiLa
Performance Improved with HiLa
Improvement with HiLa
FASM Improved with HiLa
Conclusion and Future Work
Conclusion
The mismatch between logical access and physical layout can lead to poor performance.
We propose the locality-driven high-level aggregation approach (HiLa) to facilitate the existing I/O methods by eliminating the overlapping among sub-array requests.
Future Work
Apply to write operations.
Integrate with file systems.
Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Thanks
Q&A
http://discl.cs.ttu.edu