Parallel I/O in CMAQ David Wong, C. E. Yang*, J. S. Fu*, K. Wong*, and Y. Gao** *University of Tennessee, Knoxville, TN, USA **now at: Pacific Northwest National Laboratory, Richland, WA, USA 1 CMAS Conference October 5, 2015


Parallel I/O in CMAQ

David Wong, C. E. Yang*, J. S. Fu*, K. Wong*, and Y. Gao**

*University of Tennessee, Knoxville, TN, USA

**now at: Pacific Northwest National Laboratory, Richland, WA, USA

CMAS Conference, October 5, 2015

Background

• IOAPI was developed over 20 years ago

• It was designed for serial operation (with OpenMP capability)

• CMAQ uses it for performing data input and output functions

• The IOAPI package provides a simple, convenient interface for handling data in netCDF format

• I/O has been reported as one of the performance bottlenecks in CMAQ

PARIO

• It was developed in 1998

• It was built on top of IOAPI with a “pseudo” parallel implementation

• Single read and single write
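
A "pseudo" parallel write can be pictured as follows: every processor computes on its own tile, but the tiles are collected on one designated processor, which then issues the single actual write. The sketch below simulates that idea in plain Python. It is not PARIO or IOAPI code; the function name `gather_to_root`, the row-major processor layout, and the reading of "single write" as "one processor writes for all" are assumptions made for illustration.

```python
def gather_to_root(tiles, nprow, npcol):
    """Stitch per-processor tiles into one full grid on a single 'root'.

    tiles[p] is the list of rows held by simulated processor p. Processors
    are laid out row-major on an nprow x npcol grid (hypothetical layout).
    """
    full = []
    for pr in range(nprow):
        # tiles that sit side by side in this processor row
        row_tiles = [tiles[pr * npcol + pc] for pc in range(npcol)]
        for r in range(len(row_tiles[0])):
            stitched = []
            for tile in row_tiles:
                stitched.extend(tile[r])
            full.append(stitched)
    return full


# Four simulated processors on a 2 x 2 grid, each holding a 2 x 2 tile
# filled with its own rank number.
tiles = [[[p] * 2 for _ in range(2)] for p in range(4)]
full_grid = gather_to_root(tiles, nprow=2, npcol=2)

# Only the root would touch the file, so one write is issued regardless of
# how many processors computed the data.
writes_issued = 1
```

In this picture the computation scales with the number of processors while the write does not, which is one way I/O can become a bottleneck.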

CMAQ I/O Scheme

CMAQ I/O Scheme (cont'd)

What is pnetCDF?

• Developed by Northwestern University and Argonne National Laboratory

• Built on top of MPI-2

• Based on netCDF format

• Requires a parallel file system (e.g., Lustre, GPFS)

• Publicly available free software

Parallel I/O Scheme

Amount of Data in a Processor
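
The figure for this slide is not in the transcript. As a rough illustration of how the per-processor data amount shrinks as processors are added, the sketch below splits a horizontal grid into contiguous blocks, as evenly as possible. The 4 x 4 processor grid, the helper names, and the treatment of 89 as the column count and 134 as the row count of the CA domain are assumptions for this example only.

```python
def block_extent(n, nparts, index):
    """Return (start, count) of block `index` when n cells are split as
    evenly as possible into nparts contiguous blocks."""
    base, extra = divmod(n, nparts)
    count = base + (1 if index < extra else 0)
    start = index * base + min(index, extra)
    return start, count


def cells_per_processor(ncols, nrows, npcol, nprow):
    """Horizontal cell count owned by each processor on an nprow x npcol grid."""
    return [
        [
            block_extent(ncols, npcol, pc)[1] * block_extent(nrows, nprow, pr)[1]
            for pc in range(npcol)
        ]
        for pr in range(nprow)
    ]


# CA domain (89 x 134) on a hypothetical 4 x 4 processor grid.
sizes = cells_per_processor(ncols=89, nrows=134, npcol=4, nprow=4)
total_cells = sum(sum(row) for row in sizes)
largest = max(max(row) for row in sizes)
smallest = min(min(row) for row in sizes)
```

Each processor holds only a small fraction of the domain, so with true parallel I/O each individual request becomes small as the processor count grows.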

With Data Aggregation

Aggregate along column (cc); aggregate along row (cr)
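
One plausible reading of the two aggregation modes is sketched below: processors on a 2-D processor grid are grouped by processor row or by processor column, and one processor per group collects the group's data and performs the I/O. The function name, the row-major rank numbering, and the choice of the first rank in each group as the I/O processor are assumptions for illustration, not details taken from the slides.

```python
def aggregation_groups(nprow, npcol, along):
    """Group row-major ranks on an nprow x npcol processor grid.

    along="row":    processors in the same processor row form a group.
    along="column": processors in the same processor column form a group.
    """
    if along == "row":
        return [[pr * npcol + pc for pc in range(npcol)] for pr in range(nprow)]
    if along == "column":
        return [[pr * npcol + pc for pr in range(nprow)] for pc in range(npcol)]
    raise ValueError("along must be 'row' or 'column'")


# Hypothetical 2 x 3 processor grid (ranks 0..5).
row_groups = aggregation_groups(2, 3, "row")
col_groups = aggregation_groups(2, 3, "column")

# Assumption: the first rank in each group does the I/O for the whole group.
row_io_ranks = [group[0] for group in row_groups]
col_io_ranks = [group[0] for group in col_groups]
```

Either way, fewer processors touch the file and each request carries more data than a single processor's tile.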

With Data Aggregation (cont'd)

Experiment Setup

• Three domain sizes: CA (89x134), Eastern US (279x299), and CONUS (459x299)

• 35 layers and 146 species

• Reading in or writing out three time steps of data, using various techniques: regular, pnetCDF, and data aggregation along a dimension

• Tested on two DOE machines, Edison and Kraken, both with the Lustre file system

• Experimented with various stripe counts (2, 5, 11) and stripe sizes (1M, 2M, 4M, 8M, 16M)
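
To give a feel for the data volumes behind this setup, the sketch below computes the size of one time step for each domain and shows a simplified round-robin model of file striping. The 4-byte value size is an assumption (the slides do not state the data type), as are the function names; the striping function is a simplified model, not a Lustre API.

```python
BYTES_PER_VALUE = 4  # assumption: 4-byte reals; not stated in the slides
LAYERS, SPECIES, TIME_STEPS = 35, 146, 3
DOMAINS = {"CA": (89, 134), "EUS": (279, 299), "CONUS": (459, 299)}
MIB = 2 ** 20


def bytes_per_time_step(nx, ny):
    """Size in bytes of one time step of a full 3-D, all-species field."""
    return nx * ny * LAYERS * SPECIES * BYTES_PER_VALUE


def stripe_target(offset, stripe_size, stripe_count):
    """0-based index, within the file's stripe set, of the storage target
    holding byte `offset` under simple round-robin striping."""
    return (offset // stripe_size) % stripe_count


# Per-domain volume for one time step and for the three steps in the test.
volumes = {
    name: (bytes_per_time_step(nx, ny), TIME_STEPS * bytes_per_time_step(nx, ny))
    for name, (nx, ny) in DOMAINS.items()
}

# With stripe size 2M and stripe count 11, the 12th stripe wraps around to
# the first storage target again.
wrapped = stripe_target(11 * 2 * MIB, 2 * MIB, 11)
```

Under the 4-byte assumption the CA domain is roughly 0.24 GB per time step, so a single request already spans many stripes at every stripe size tested.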

Simulation Domains

Experiment Code

Performance Metric

CA Case on Kraken

CA Case on Edison

EUS Case on Kraken

EUS Case on Edison

CONUS Case on Kraken

CONUS Case on Edison

pnetCDF Failure?!

With GPFS

• NOAA machine, 16x32 CONUS domain

[Figure: time (sec) for different I/O methods]

Test on CMAQ

• Tested with 120, 180, and 360 CPUs on Kraken and Edison with a one-day simulation on the EUS 4 km domain

• Write out VIS, WET_DEP, CONC, DRY_DEP, A_CONC and S_CGRID files only

• Stripe count is 11 and stripe size is 2M

Timing

Summary

• True parallel I/O using pnetCDF delivers excellent performance

• The data aggregation technique further improves the performance of pnetCDF

• The new technique has been applied to CMAQ output operations and has shown significant improvement

• Collaborating with Dr. Carlie Coats to implement this in the IOAPI_3 library

• The current implementation only works with gridded files of the same size as the model domain
