intel corporation - cug · cong xu, omkar kulkarni, vishwanath venkatesan, kalyana chadalavada...
TRANSCRIPT
![Page 1: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/1.jpg)
Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada
Intel Corporation
Shane Snyder, Philip Carns
Argonne National Laboratory
Suren Byna
Lawrence Berkeley National Laboratory
Robert Sisneros
National Center for Supercomputing Applications
![Page 2: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/2.jpg)
High Performance Data Division
Outline
2
• Motivation
• Darshan eXtended Tracing (DXT)
• Overhead Measurement
• Case Studies
• Future Work
![Page 3: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/3.jpg)
High Performance Data Division
Motivation
3
• Optimizing I/O is difficult
• Supercomputers evolve to exascale
• Increasingly complex I/O subsystems
• The Challenge: Profiling Tools
• Facilitate characterization of I/O activities
• Existing tools: Darshan, ScalaIOTrace, Breeze HPC, LIOProf, LMT, etc.
• Need for a comprehensive solution.
• More control over the resolution.
• Minimal runtime performance impact.
• Correlate data from multiple sources for complete picture
![Page 4: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/4.jpg)
High Performance Data Division
Darshan eXtended Tracing (DXT)
4
• What is Darshan?
• I/O profiling tool from ANL deployed on many large systems.
• Intercepts Application I/O and reports aggregate statistics
• What is “extended tracing”?
• Enhance Darshan to (optionally) report every intercepted call.
• Traces appear as a time series and can be post-processed offline.
• Provide tools for applying different types of analyses to the logs.
• Aggregate statistics and/or drill down to any level of granularity.
![Page 5: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/5.jpg)
High Performance Data Division
DXT Components
5
• Logging
• Records each intercepted I/O call.
• Request offset, length, start time, end time, MPI rank and the hostname.
• Can be switched on or off at runtime using an environment variable.
• Log buffer starts small and expands gradually as needed.
• Uses compression to limit the size of the output log file.
• Analysis
• Python script; basic analysis and visualization, but can be enhanced.
• Correlates traces with Lustre striping information.
• Group/filter requests by rank, host or Lustre OST.
• Detects outliers.
![Page 6: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/6.jpg)
High Performance Data Division
DXT Overhead: IOR on Cori
6
• Evaluation Environment
• Range from 1024 to 4096 processes
• Interleaved I/O on a single shared file
• FileSize: 4TB, BlockSize: 4MB, TransferSize: 4MB, Aggregators: 128
• Lustre – OSTs: 128, Clients: 128, Stripe Size: 4MB, Stripe Count: 128
![Page 7: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/7.jpg)
High Performance Data Division
DXT Overhead: Varying Transfer Size
7
• IOR Transfer Size
• Tunes the data transferred per I/O operation.
• For constant file size, smaller transfers result in more I/O operations.
• Larger DXT log file size due to more log entries.
![Page 8: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/8.jpg)
High Performance Data Division
Case Study: GCRM-IO
8
• I/O kernel of a climate code that models global atmospheric circulation.
• 256 processes write the pressure variable
• Grid: 10, Subdomain: 4, Timesteps: 64
![Page 9: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/9.jpg)
High Performance Data Division
Case Study: GCRM-IO
9
• Lock contention due to false sharing.
• Due to optimistic extent-based locks granted by LDLM.
![Page 10: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/10.jpg)
High Performance Data Division
Case Study: HACC-IO
10
• I/O kernel of Hardware Accelerated Cosmology Code.
• 256 Processes write 8 billion particles to single shared file
• ~300GB
• MPI Collective I/O performs much worse than POSIX
![Page 11: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/11.jpg)
High Performance Data Division
Aliasing
11
• Serialization of requests during communication phase.
P
P
P
P
A A A A
P
P
P
P
A
A
A
A
Contiguous (Worst Case)
Interleaved (Ideal Case)
![Page 12: Intel Corporation - CUG · Cong Xu, Omkar Kulkarni, Vishwanath Venkatesan, Kalyana Chadalavada Intel Corporation Shane Snyder, Philip Carns Argonne National Laboratory Suren Byna](https://reader030.vdocuments.us/reader030/viewer/2022040411/5edabfd0434f4178104f90d6/html5/thumbnails/12.jpg)
High Performance Data Division
Future Work
12
• DXT has landed and is available at: http://www.mcs.anl.gov/research/projects/darshan/
• First step to a comprehensive solution.
• Add more features to the analysis tool.
• Correlate data from multiple sources more effectively.
• Analysis over multiple jobs / system-wide.
• ML based adaptive caching/pre-fetching for reads.