productivity and high performance, can we have both? an ...anaconda includes h5py package ² h5py...
TRANSCRIPT
Jialin Liu !Data Analytics & Service Group!
Productivity and High Performance, Can we have both? An Exploration of Parallel-H5py from I/O Perspective
-1-May26,2017
Outlines
-2-
Ø HDF5 and H5py Ø Productivity
Ø H5py Internal
Ø Performance
Ø Case Studies
² Warp
² H5Boss
QuinceyKoziol
HDF5
-3-
lat|lon|temp----|-----|-----12|23|3.115|24|4.217|21|3.6
Ø HDF5areamongthetop5librariesatNERSC,2015² 750+uniqueusers@NERSC,millionof
usersworldwideØ 1987,NCSA&UIUC.NASAsendHDF-
EOSto2.4millionsendusersØ HierarchicaldataorganizaWonØ ParallelI/O
HDF5
-4-
Integer: 32-bit, LE
HDF5 Datatype
Multi-dimensional array of identically typed data elements
Specifications for single data element and array dimensions
2 Rank Dimensions
Dim[0] = 4 Dim[1] = 5
HDF5 Dataspace
H5py
-5-
The h5py package is a Pythonic interface to the HDF5 binary data format. Ø H5py provides easy-to-use high level interface, which
allows you to store huge amounts of numerical data, Ø Easily manipulate that data from NumPy. Ø H5py uses straightforward NumPy and Python
metaphors, like dictionary and NumPy array syntax.
-6-
H5py: a Productive HDF5 Interface
Similar & Simpler Interface
-7-
Filename Mode
file_id=H5Fcreate(“output.h5”,H5F_ACC_TRUNC,H5P_DEFAULT,H5P_DEFAULT);
Similar & Simpler Interface
-8-
H5Py HDF5
w-orx H5F_ACC_EXCL
w H5F_ACC_TRUNC
r H5F_RDONLY
r+ H5F_ACC_RDWR
a(default) H5F_ACC_RDWR&H5F_ACC_EXCL
Everything is Object
-9-
FileObject
One Line to Enable Parallel I/O
-10-
-11-
CollecWveIO² ReducestheIOcontenWononserverside² AggregatessmallIOintolargerconWguousIO
IndependentIO CollecWveIO
Two-Phase Collective IO, NERSC Contributions
Jean-Luc.Vay,Remi.Lehe,LBNL
WARP
Looks Like Numpy Arrays
-12-
Pathtothedataset
Field
² Indices:anythingthatcanbeconvertedtoaPythonlong² Slices(i.e.[:]or[0:10])² Fieldnames,inthecaseofcompounddata² AtmostoneEllipsis(...)object² Limitedfancyslicing,e.g.,dset[1:6,[5,8,9]],usewithcau?on
Slicing
Beyond Numpy Arrays
-13-
Chunking
Compression
² Error-detecWon² Chunking² Compression
DatasetObject
Checksum
-14-
Coding Efforts
-15-
Coding Efforts: Implicit IO
NoDataIO
Yes
Yes,butparWal
Yes,butparWal
-16-
Exploring Interactively on Notebook
�
hCps://ipython.nersc.gov
-17-
Learning the Data Easily
�
HDF5DataLayerinCaffe
moduleloaddeeplearning
-18-
Ø H5py: Productivity ² Similar/Simpler Interface ² Everything is Object ² One Line to Parallel I/O ² Beyond Numpy ² Productive Coding ² Seamlessly Importable in Notebook, etc
Ø H5py: Performance ² ? ² ?
Productivity --> Performance?
-19-
h5py mpi4py numpy
Challenging: More than Single IO Layer
“H5py performance is slow, parallel IO is not as good as serial IO”
-20-
h5py mpi4py numpy
Views of Performance
VerGcalView:• Performancepenaltyofpython
layer.e.g.,H5py,Cython
HorizontalView:• Scalability.e.g.,mpi4py,srun
-21-
H5py Implementation (Vertical View)
HDF5 C API (libhdf5)
hsize_tH5Dget_storage_size(hid_tdset_id)
Cython (h5d.pyx)
cdefclassDatasetID(ObjectID):defget_storage_size(self):returnH5Dget_storage_size(self.id)
Python(_hl/dataset.py)
classDataset(HLObject):@propertydefstoragesize(self):returnself.id.get_storage_size()
DatasetID
Dataset
File
Group
FileID
-22-
OperaGon H5py HDF5 Details RaGo
1KFileCreaWon(s) 4.7 3.0 Createafilethenclose
thefile63.8%
1KObjectScanning(s) 4.5 2.7
Openagroupthenscanallobjects:group,dataset,link,etc
60.0%
H5py Metadata Performance
-23-
H5py vs. HDF5 Single Node Independent I/O
StrongScaling,800MB WeakScaling,800MB/Process
97.8%100%
-24-
H5py vs. HDF5 Multi-node Independent I/O
StrongScaling WeakScaling
100%97.1%
-25-
H5py vs. HDF5 Single Node Collective I/O
StrongScaling,800MB WeakScaling,800MB/Process
100%98.6%
-26-
H5py vs. HDF5 Multi-node Collective I/O
StrongScaling WeakScaling
88%,81%,99%AVG:90%
84%,101%,75%AVG:87%
-27-
H5py vs. HDF5 Performance
SingleNode MulG-nodes
Metadata1kFileCreaWon 63.8%
1kObjectScanning 60.0%
IndependentI/OWeakScaling 97.8% 100%
StrongScaling 100% 97.1%
CollecWveI/OWeakScaling 100% 90%
StrongScaling 98.6% 87%
H5PyPerformance/HDF5Performance
-28-
Case Study I: Warp ² ParWcle-in-cellsimulaWoncodes² AlexFriedman,DavidGrote,
1980s² LBNL,LLNL,andPPPL² Broadvarietyofintegrated
physicsmodelsandextensivediagnosWcs
² Laser-wakefieldRemiLehe,LBNL,GTC2017
-29-
Case Study I: Warp ² Laser-wakefield
RemiLehe,LBNL,GTC2017
500metersà9cm
-30-
Case Study I: Warp IO with H5py
iteraWon0
iteraWon1
-31-
Case Study I: Warp IO with H5py
² 172-600MBperfile
² Withparallel_output=False,thesimulaWonfinishedinlessthan20min,soittook
lessthan20mintowriteallthesefiles.
² Withparallel_output=True,thesimulaWononlyhadWmetowritethefirst2files(out
of80!)
-32-
Case Study I: Warp IO with H5py
32bit 64bit
“LettherebeParallelI/O”
“Sorry,youbrokemyrules”
-33-
Case Study I: Warp IO with H5py
² NumpyarrayinWarpisusing64bits² H5pydatasetinWarpiscreatedwith32bits
float,dtype=‘f’² HDF5internallychecksthetypeconsistency² RefusestousecollecWveI/Oincaseof
inconsistency
AlexSim,CRD/LBNL
² BeforeFix:527seconds² AwerFix:1.3seconds
-34-
Ø BOSSBaryonOscillaWonSpectroscopicSurvey–fromSDSS
Ø Performtypicalrandomlygeneratedquerytoextractsmallamountofstars/galaxiesfrommillions
Ø RunonfinalreleaseofSDSS-IIIcompleteBOSSdataset
Ø H5Boss:AH5pybasedpythonpackagefor:² ReformayngFitstoHDF5files² Querying/SubseyngFiberdatasets
Baryon acousWc oscillaWons in earlyuniverse, sWll can be seen in survey likeBOSS, (courtesy of Chris Blake and SamMoorfield)
Case Study II: H5Boss
JialinLiu,DebbieBard,QuinceyKoziol,StephenBailey,Prabhat,“H5Boss:AHDF5basedPythonPackageforBOSSSpectroscopicSurveyData”,InSubmission
-35-
H5boss.subset
User`cat1`islookingfor1millionfibers
DataAnalysis
....
ConvertedFitsFilesinH5
Read
Que
ry
plate mjd fiber
4322 55532 32
5892 55598 25
6298 55133 11
Case Study II: H5Boss IO
Write
SubsetFile
PostAnalysis
Share
-36-
BeforeopWmizaWon,with1kquery,strongscaling:² Scalableonsinglenode² NotscalableonmulWplenodes
Case Study II: H5Boss IO
-37-
1
2
Query
Read1 2
From1nodeto32nodes
1
2
Case Study II: H5Boss IO
-38-
EachProcess:Query1. Filesopen2. Plate/mjd/fiberscanning/searching3. Key-valueconstrucWon,neededforcreaWngthesharedfile
AllProcesses:CommunicaWon4.AlltoallreducWontoformaglobalsharedk-vlist
Checklist:1. Theoverheadinallreduce2. Thek,vstructure
Case Study II: H5Boss IO
Write
SubsetFile
PostAnalysis
Share
e.g.,Groups,Datasets
-39-
² Whympi4py’sallreducecouldbeanissue?
1.Allreducevsallreduce
• Lowercase:genericPythonobjects,send(),recv(),etc• Upper-case:bufferlikeobject,Send(),Recv(),etc
2.pickling&unpickling
sender receiver
pickling unpickling
dispatching allocaWon packing allocaWon translaWon unpacking
direct
Cmpi
Case Study II: H5Boss IO
-40-
K:• b[‘3666/55159/599/coadd’]V:• (([((WAVE, ‘<f4’), (FLUX, ‘<f4’), (IVAR, ‘<f4’),(AND_MAKS, ‘<i4’),
(OR_MASK, ‘<i4’), (‘WAVEDISP’, ‘<f4’), (SKY,‘<f4’), (MODEL,‘<f4’)]),• (4619,) • ‘/global/cscratch1/sd/jialin/h5boss/3666-55159.hdf5’)
Key:PathtoHDF5datasetValue:(type,shape,pathtofile)
Key:PathtoHDF5datasetValue:shape
K:(str)“3666/55159/599/coadd”V:(int)“4619”
² Re-designthekey,valuepairtobebufferlike
Case Study II: H5Boss IO
-41-
WithopWmized(k,v)structure,19Xfaster
Case Study II: H5Boss IO
-42-
WithopWmized(k,v)structureWeakscalingto1.6millionfiberqueryand1600processes
Case Study II: H5Boss IO
-43-
Ø H5py: Productivity ² Similar/simpler interface ² .... ² Seamlessly importable in notebook, …
Ø H5py: Performance ² H5py often reaches 90% of HDF5 performance in benchmarking ² In practice, case by case:
² Type consistency ² Object vs. Buffer ² …
Productivity --> Performance
-44-
1. Optimal HDF5 file creation
2.25X
Choosethemostmodernformat[opWonal]
H5py other Best Practice
-45-
2. Use low-level API in H5py
GetclosertotheHDF5Clibrary,finetuning
H5py other Best Practice
H5py at NERSC
-46-
SerialH5py
Anacondaincludesh5pypackage² H5py2.6.0² Built-inhdf5library,1.8.17² Easyusewithotherpackages² Noparallelsupport
WorksonbothEdisonandCori
H5py at NERSC
-47-
H5py-parallel@NERSCØ H5py2.6.0Ø Compiledwithcray-hdf5-parallel/1.8.16Ø Noconflictwithanaconda’sserialh5py
² Importh5py(perfectlyfine)² Canusetogetherwithanaconda
Ø Uptodatefeatures
RollinThomas
-48-
High Performance H5py with Sample Codes h~p://www.nersc.gov/users/data-analyWcs/data-management/i-o-libraries/hdf5-2/h5py/
Thanks
H5py at NERSC