TRANSCRIPT
snow.llnl.gov (bldg. 451)
A GPFS Performance & Bottleneck Analysis
William Loewe, Summer 2000
Lawrence Livermore National Laboratory
The Snow Report:
UCRL-VG-141522
Report Outline
! Background:
  - Snow Architecture in conjunction with GPFS 1.3
  - RAID Access Patterns
! File Creation/Removal Rates using NFS, JFS, GPFS 1.2 & GPFS 1.3:
  - Small Files
  - Large Files
  - Directories
! IOR_POSIX benchmark tests:
  Segmented Access, Varying:
  - Transfer Size
  - Node Number
  - File Size
  - Client-Node Ratio
  - Transfer Size w/ Multiple Clients
  - Random Transfer Size
  Strided Access, Varying:
  - Block Size
! Concluding Summary
Snow Hardware Specifications
! IBM RS/6000 SP System
! 4 frames
! 4 Nighthawk-1 nodes per frame
! 8 222MHz 64-bit IBM Power3 CPUs per node (16 nodes & 128 CPUs total)
! 4 GBytes RAM per node (64 GBytes total memory)
! Peak computing capability of ~114 GFLOPS
! 2 I/O nodes (may also be used as compute nodes -- not dedicated)
! 3 SSA adapters per I/O node
! 3 73-GByte RAID sets per SSA adapter (1.3 TBytes total disk space)
! 5 disks on each RAID set (4 + P)
  - parity information is distributed among drives (i.e., not a dedicated parity drive)
! ~250MB/sec maximum transfer across the eighteen RAID sets (14MB/sec per RAID set; see the check below)
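The ~250MB/sec figure follows directly from the configuration above; a quick arithmetic check (a minimal sketch added for illustration, not part of the original report):

    /* Aggregate disk bandwidth implied by the Snow configuration:
     * 2 I/O nodes x 3 SSA adapters x 3 RAID sets, 14 MB/sec each. */
    #include <stdio.h>

    int main(void)
    {
        int ioNodes = 2, adaptersPerNode = 3, raidsPerAdapter = 3;
        double mbPerRaidSet = 14.0;
        int raidSets = ioNodes * adaptersPerNode * raidsPerAdapter;   /* 18 */
        printf("%d RAID sets -> %.0f MB/sec peak\n",
               raidSets, raidSets * mbPerRaidSet);                    /* 252 */
        return 0;
    }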
Analysis of NH-1 Nodes w/ Santa Cruz Adapters
[Diagram: 2 Nighthawk-1 server nodes and 14 Nighthawk-1 compute-node clients, each built from 222 MHz Power3 CPUs with a Colony adapter, connected through the Colony switch to Santa Cruz adapters on the servers.]
! Each disk is an IBM SSA disk: max 8.5 MB/sec, typical 5.5 MB/sec, total capacity 18.2 GB.
! Santa Cruz disks are configured into a 4+P RAID set (73GB). Each RAID set is capable of 14 MB/sec.
! Each Santa Cruz adapter is capable of ~48 MB/sec writes and ~88 MB/sec reads.
! Colony adapter, point-to-point: ~367 MB/sec unidirectional (4 MB buffer); ~??? MB/sec bidirectional.
! Clients (VSDs) communicate with the LAPI protocol (~80% efficient?). Switch communication is the client bottleneck: ~294 MB/sec.
! 140 MB/sec max write per CPU?
[Diagram: RAID loops. Frame 1 holds nodes 1-4, frame 2 holds nodes 5-8, frame 3 holds nodes 9-12, frame 4 holds nodes 13-16. Nodes 5 and 6 are the I/O nodes, each attached through (4 + P) SSA adapters to RAID loops with 5 disks on each RAID set, and to the switch.]
! RAID loops are shared between I/O nodes (5 and 6) so that if a single I/O node fails, the other I/O node may cover all eighteen RAID sets.
! I/O nodes are not dedicated (i.e., a node may serve as both compute node and I/O node concurrently).
GPFS version 1.3
! IBM General Parallel File System for AIX (GPFS) version 1.3
! Mohonk PSSP 3.2 GA PTF 1 SubRelease #DV (GPFS)
The primary goal is to allow a single file accessed by multiple clients to be stored (or striped) across multiple disks. (Of course, several files may be accessed concurrently.)
[Diagram: Clients 1-3 access the string "As Pure As The Driven Snow." striped across multiple disks, shown in both strided and segmented patterns.]
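The two patterns differ only in how a client maps its i-th transfer to a file offset. A minimal sketch of one common formulation (hypothetical helper names; not taken from the actual ior_posix.c source):

    /* Offset computation for segmented vs. strided access.
     * Hypothetical sketch, not the actual ior_posix.c source. */
    #include <stdio.h>

    /* Segmented: client c owns one contiguous block of the file. */
    static long long segmented_offset(int client, long long blockSize,
                                      long long xferSize, long long xfer)
    {
        return client * blockSize + xfer * xferSize;
    }

    /* Strided: transfers from all clients interleave through the file. */
    static long long strided_offset(int client, int numClients,
                                    long long xferSize, long long xfer)
    {
        return (xfer * numClients + client) * xferSize;
    }

    int main(void)
    {
        /* Example: 3 clients, 4KB transfers, 16KB blocks per client. */
        for (int c = 0; c < 3; c++)
            printf("client %d, transfer 0: segmented @%lld, strided @%lld\n",
                   c, segmented_offset(c, 16384, 4096, 0),
                   strided_offset(c, 3, 4096, 0));
        return 0;
    }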
File Creation Rate on Snow and Blue
! Testing the creation time of 1000 files (or directories) created on NFS, JFS, and GPFS for Snow and Blue.
! Note: Blue runs GPFS 1.2, whereas Snow runs the newer GPFS 1.3.
For file tests:
  Machines: snow.llnl.gov, blue.llnl.gov
  Code used: csh script
    set nfils = 0   # counter must be initialized before the loop
    while ( $nfils < 1000 )
      echo "Sat Jan 01 00:00:00 PDT 2000" > file.$nfils
      @ nfils++
    end
*Nested #1 -- subdirectory containing one file and one subdirectory, recursively, for a 1000 file/directory total.
**Nested #2 -- subdirectory containing nine files and one subdirectory, recursively, for a 1000 file/directory total.
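For illustration, a hypothetical C sketch of the Nested #1 pattern (file contents match the csh test above; names and error handling are assumptions):

    /* Hypothetical sketch of the "Nested #1" pattern: each level holds
     * one file and one subdirectory, recursively, until 1000 entries. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int entries = 0;
        while (entries < 1000) {
            FILE *f = fopen("file", "w");            /* one file per level */
            if (f == NULL) { perror("fopen"); return 1; }
            fputs("Sat Jan 01 00:00:00 PDT 2000\n", f);
            fclose(f);
            if (++entries >= 1000)
                break;
            if (mkdir("sub", 0755) != 0 ||           /* one subdir per level */
                chdir("sub") != 0) { perror("sub"); return 1; }
            entries++;
        }
        return 0;
    }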
DF.xls
[Chart: Directory / Small File Creation -- files/sec (0-350) for NFS, JFS, and GPFS, grouped by Files, Directories, Nested #1*, and Nested #2**; series: Snow, Blue.]
File Removal Rate on Snow and Blue
! GPFS 1.3 is significantly faster for removing files or directories than version 1.2.
DF.xls
[Chart: Directory / Small File Removal -- files/sec (0-1200) for NFS, JFS, and GPFS, grouped by Files, Directories, Nested #1, and Nested #2; series: Snow, Blue.]
File Creation/Removal Rate Summary for Snow and Blue
! The Network File System (NFS) and Journaled File System (JFS) are comparable on Snow and Blue (both using the same version of these file systems). This shows that despite differences in hardware, the systems behave similarly. Therefore, the notable difference in GPFS performance is likely due to improvements between 1.2 and 1.3, not hardware considerations.
Terry.FILE.xls
Small File Creation Rate for GPFS 1.2, GPFS 1.3, NFS, JFS
[Chart: Small File Creation Rate -- files/sec (0-900) vs. # clients (1, 2, 4, 8); series: GPFS 1.3, GPFS 1.2, NFS, JFS.]
Note:
  GPFS 1.2 tests run on blue.llnl.gov
  GPFS 1.3, NFS, and JFS tests run on snow.llnl.gov
PARAMETERS:
! 1000 29-byte files created on all file systems
Terry.RM.xls
Large File Removal Rate for GPFS 1.2, GPFS 1.3, NFS, JFS
PARAMETERS:
! GPFS 1.2: 128 1G files
! GPFS 1.3: 128 1G files
! NFS: 1000 1M files
! JFS: 1000 _M files
Note:
  GPFS 1.2 tests run on blue.llnl.gov
  GPFS 1.3, NFS, and JFS tests run on snow.llnl.gov
[Chart: Large File Removal Rate -- KB/sec (0-450,000) vs. # clients (1, 2, 4, 8); series: GPFS 1.3, GPFS 1.2, NFS, JFS.]
Terry.DIR.xls
Directory Creation Rate for GPFS 1.2, GPFS 1.3, NFS, JFS
[Chart: Directory Creation Rate -- directories/sec (0-400) vs. # clients (1, 2, 4, 8); series: GPFS 1.3, GPFS 1.2, NFS, JFS.]
Note:
  GPFS 1.2 tests run on blue.llnl.gov
  GPFS 1.3, NFS, and JFS tests run on snow.llnl.gov
PARAMETERS:
! 1000 single-nested subdirectories created on all file systems
  (e.g., /1, /1/2, /1/2/3 would be three single-nested subdirectories, i.e., a unary tree structure)
Creation / Removal Rate Summary for Snow: GPFS 1.2, GPFS 1.3, NFS, JFS
! The file creation and file removal rates are excellent on GPFS 1.3. Further, they appear scalable for an increasing number of clients.
! With nested directories, the creation rate on GPFS 1.3 is below that of NFS.
Sensitivity Curve I-A: Transfer Size Variation (Segmented)
PARAMETERS:
! Client = 1
! Node = 1
! FileSize = 512MB
! TransferSize = 1KB to 512MB
For all subsequent tests:
  Code: ior_posix.c
  Machine: snow.llnl.gov
  PSSP: 3.2
  GPFS: 1.3
  Configuration: 2 * ( 3 * ( 3 * ( 4 + P ) ) )
  ~250MB/sec max transfer rate (14MB/sec per RAID set)
  (a sketch of such a sweep follows below)
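For reference, a minimal sketch of the kind of timed loop behind such a transfer-size sweep (hypothetical, not the actual ior_posix.c source; the file path and the use of fsync() are assumptions):

    /* Write a 512MB file with one client, doubling the transfer size
     * each pass, and report the observed MB/sec. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    #define FILE_SIZE (512LL * 1024 * 1024)   /* 512MB test file */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        for (long long xfer = 1024; xfer <= FILE_SIZE; xfer *= 2) {
            char *buf = malloc(xfer);
            if (buf == NULL) { perror("malloc"); return 1; }
            memset(buf, 'x', xfer);

            int fd = open("/gpfs/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }

            double t0 = now();
            for (long long off = 0; off < FILE_SIZE; off += xfer)
                if (write(fd, buf, xfer) != xfer) { perror("write"); return 1; }
            fsync(fd);                        /* count flush time in the rate */
            double t1 = now();

            close(fd);
            free(buf);
            printf("%9lld KB: %6.1f MB/sec write\n", xfer / 1024,
                   (FILE_SIZE / (1024.0 * 1024.0)) / (t1 - t0));
        }
        return 0;
    }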
I-A.out
[Chart: Transfer Size Variation -- transfer rate (MB/sec, 0-250) vs. transfer size (1KB to 262144KB); series: Read, Write.]
RERUN: 11/2000
Sensitivity Curve I-A: Transfer Size Variation (Segmented)
PARAMETERS:
! Client = 1
! Node = 1
! FileSize = 512MB
! TransferSize = 1KB to 512MB
queue_depth = 2 (previous run with queue_depth = 40)
I-A.out
[Chart: Transfer Size Variation (rerun) -- transfer rate (MB/sec, 0-250) vs. transfer size (1KB to 524288KB); series: Read, Write.]
Sensitivity Curve I-B: Transfer Size Variation (Segmented)
PARAMETERS:
! Client = 1
! Node = 1
! FileSize = 5GB
! TransferSize = 1KB to 4MB
A.4.out
[Chart: Transfer Size Variation -- transfer rate (MB/sec, 0-140) vs. transfer size (1KB to 4096KB); series: Read, Write.]
Transfer Size Variation Summary
! For reads, the read-ahead algorithm GPFS uses becomes inefficient for larger subblocks, causing the bell-shaped curve for reads: the filling of the L2 cache bumps out the instruction stream at larger transfer sizes.
! Writes can be expected to be slower due to the additional overhead associated with allocation (a buffer must be allocated before each write request). Nonetheless, for writes, the ceiling of 140MB/sec by one thread (CPU) per node supposedly causes the poor write performance. Later, we can show that this appears to be the case, but that another bottleneck is creeping up elsewhere.
Sensitivity Curve II-A: Node Number Variation (Segmented)
PARAMETERS:
! Client / Node = 1
! TransferSize = 256KB
! Block = 512MB
! FileSize = Client * 512MB
! Nodes = 1 to 14
B.2.out
[Chart: Node Number Variation -- transfer rate (MB/sec, 0-300) vs. # of nodes (1-14); series: Read, Write.]
RERUN: 11/2000
Sensitivity Curve II-A: Node Number Variation (Segmented)
PARAMETERS:
! Client / Node = 1
! TransferSize = 256KB
! Block = 512MB
! FileSize = Client * 512MB
! Nodes = 1 to 14
B.2.out
[Chart: Node Number Variation (rerun) -- transfer rate (MB/sec, 0-300) vs. # of nodes (1-14); series: Read, Write.]
queue_depth = 2 (previous run with queue_depth = 40)
RERUN: 11/2000
Sensitivity Curve II-A2: Node Number Variation (Segmented)
PARAMETERS:
! Client / Node = 8
! TransferSize = 256KB
! Block = 512MB
! FileSize = Client * 512MB
! Node = 1 to 14
C.3.out
[Chart: Client Number Variation -- transfer rate (MB/sec, 0-300) vs. 1-14 (x-axis labeled "# of Clients / Node"); series: Read, Write.]
queue_depth = 2 (previous run with queue_depth = 40)
[NOTE: No Summer run of this test for comparison]
Sensitivity Curve II-B: Node Number Variation (Segmented)
PARAMETERS:
! Client / Node = 1
! FileSize = 8400MB
! TransferSize = 256KB
! Block = FileSize / Clients
! Nodes = 1 to 8
C.4.out
[Chart: Node Number Variation -- transfer rate (MB/sec, 0-300) vs. # of nodes (1-8); series: Read, Write.]
Sensitivity Curve II-C: Node Number Variation (Segmented)
PARAMETERS:
! Client / Node = 1
! FileSize = 2100MB
! TransferSize = 256KB
! Block = FileSize / Clients
! Nodes = 1 to 8
C.7.out
[Chart: Node Number Variation -- transfer rate (MB/sec, 0-300) vs. # of nodes (1-8); series: Read, Write.]
Sensitivity Curve II-D: Node Number Variation (Segmented)
PARAMETERS:
! Client / Node = 1
! FileSize = 1470MB
! TransferSize = 256KB
! Block = FileSize / Clients
! Nodes = 1 to 6
C.8.out
[Chart: Node Number Variation -- transfer rate (MB/sec, 0-300) vs. # of nodes (1-6); series: Read, Write.]
Node Number Variation Summary
! It appears from observation of the activity of the I/O nodes that a "read-modify-write" is being performed when using more than 3 clients during this test. As the RAID stripe is 256KB but the LVM transfer size is 128KB, there is a failure to coalesce the transfer packet to 256KB before writing to disk. Consequently, a read-modify-write is performed, slowing down the write.
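To make the arithmetic concrete: a full stripe on a 4+P set is 256KB (presumably 64KB per data disk across the 4 data disks, an inference from the stated stripe size), so a 128KB LVM transfer covers only half a stripe and the old data and parity must be read back before new parity can be written. A hypothetical sketch of the alignment check:

    /* Does a write avoid read-modify-write on a 4+P RAID set with a
     * 256KB full stripe?  Only writes that start on a stripe boundary
     * and cover whole stripes let parity be computed from the new
     * data alone.  Illustrative sketch, not GPFS or LVM code. */
    #include <stdio.h>

    #define STRIPE (256 * 1024)   /* full stripe: 4 data disks x 64KB */

    static int full_stripe_write(long long offset, long long length)
    {
        return offset % STRIPE == 0 && length % STRIPE == 0;
    }

    int main(void)
    {
        printf("256KB coalesced write: %s\n",
               full_stripe_write(0, 256 * 1024) ? "full stripe"
                                                : "read-modify-write");
        printf("128KB LVM transfer:    %s\n",
               full_stripe_write(0, 128 * 1024) ? "full stripe"
                                                : "read-modify-write");
        return 0;
    }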
Sensitivity Curve III: File Size Variation (Segmented)
PARAMETERS:
! Nodes = 6
! Client / Node = 1
! TransferSize = 256KB
! Block = FileSize / 6
! FileSize = 6MB to 24GB
B.3.out
B.4.out
[Chart: File Size Variation -- transfer rate (MB/sec, 0-350) vs. file size (6MB to 24576MB); series: Read, Write.]
RERUN: 11/2000
Sensitivity Curve III: File Size Variation (Segmented)
PARAMETERS:
! Nodes = 6
! Client / Node = 1
! TransferSize = 256KB
! Block = FileSize / 6
! FileSize = 6MB to 24GB
B.3.out
B.4.out
[Chart: File Size Variation (rerun) -- transfer rate (MB/sec, 0-350) vs. file size (24MB to 24576MB); series: Read, Write.]
File Size Variation Summary
! Apparently clever use of caching allows for a better read/write rate. But past a certain blocksize, as the file gets larger, the read rate levels off to around 250MB/sec while the write rate plummets to 100MB/sec. These are the same bottlenecks we've been seeing.
! However, note that writing 32/64MB per node can help boost GPFS through use of caching. As the pagepool is set to 100MB, we do not see the effects of caching beyond this point.
Sensitivity Curve IV-A: Client / Node Variation (Segmented)
PARAMETERS:
! Nodes = 1
! FileSize = 512MB * Clients
! TransferSize = 256KB
! Block = 512MB
! Clients = 1 to 8
C.6.out
[Chart: Client Number Variation -- transfer rate (MB/sec, 0-250) vs. # of clients (1-8); series: Read, Write.]
Sensitivity Curve IV-B: Client / Node Variation (Segmented)
PARAMETERS:
! Nodes = 8
! TransferSize = 256KB
! Block = 512MB
! FileSize = Client * 512MB
! Client / Node = 1 to 8
C.3.out
[Chart: Client Number Variation -- transfer rate (MB/sec, 0-300) vs. # of clients / node (1-8); series: Read, Write.]
RERUN: 11/2000
Sensitivity Curve IV-B: Client / Node Variation (Segmented)
PARAMETERS:
! Nodes = 8
! TransferSize = 256KB
! Block = 512MB
! FileSize = Client * 512MB
! Client / Node = 1 to 8
C.3.out
[Chart: Client Number Variation (rerun) -- transfer rate (MB/sec, 190-250) vs. # of clients / node (1-8); series: Read, Write.]
Sensitivity Curve IV-C: Client / Node Variation (Segmented)
PARAMETERS:
! Nodes = 8
! FileSize = 8400MB
! TransferSize = 256KB
! Block = FileSize / (total) Clients
! Client / Node = 1 to 8
C.5.out
[Chart: Client Number Variation -- transfer rate (MB/sec, 0-300) vs. # of clients / node (1-8); series: Read, Write.]
Client / Node Variation Summary
! With one node, the write-rate max of 140MB/sec increases after one client.
! This does not hold true, however, with more nodes. Instead, the write rate stays low. Perhaps a different bottleneck is in effect here.
Test_V.2.out
Sensitivity Curve V-A: Multiple Clients with Transfer Size Variation (Segmented)
PARAMETERS:
! Clients = 2
! Nodes = 1
! FileSize = 1024MB
! Block = 512MB
! TransferSize = 16KB to 1024KB
[Chart: 1 Node, 2 Clients -- transfer rate (MB/sec, 0-250) vs. transfer size (16KB to 1024KB); series: Read, Write.]
Test_V.3.out
Sensitivity Curve V-B: Multiple Clients with Transfer Size Variation (Segmented)
PARAMETERS:
! Clients = 4
! Nodes = 2
! FileSize = 2048MB
! Block = 512MB
! TransferSize = 16KB to 1024KB
[Chart: 2 Nodes, 4 Clients -- transfer rate (MB/sec, 180-225) vs. transfer size (16KB to 1024KB); series: Read, Write.]
Test_V.4.out
Sensitivity Curve V-C: Multiple Clients with Transfer Size Variation (Segmented)
PARAMETERS:
! Clients = 6
! Nodes = 3
! FileSize = 3072MB
! Block = 512MB
! TransferSize = 16KB to 1024KB
[Chart: 3 Nodes, 6 Clients -- transfer rate (MB/sec, 200-230) vs. transfer size (16KB to 1024KB); series: Read, Write.]
Sensitivity Curve V-D: Multiple Clients with Transfer Size Variation (Segmented)
PARAMETERS:
! Clients = 8
! Nodes = 4
! FileSize = 4096MB
! Block = 512MB
! TransferSize = 16KB to 1024KB
Test_V.5.out
[Chart: 4 Nodes, 8 Clients -- transfer rate (MB/sec, 0-250) vs. transfer size (16KB to 1024KB); series: Read, Write.]
Sensitivity Curve V-E: Multiple Clients with Transfer Size Variation (Segmented)
PARAMETERS:
! Clients = 10
! Nodes = 5
! FileSize = 5120MB
! Block = 512MB
! TransferSize = 16KB to 1024KB
Test_V.6.out
[Chart: 5 Nodes, 10 Clients -- transfer rate (MB/sec, 0-300) vs. transfer size (16KB to 1024KB); series: Read, Write.]
Test_V.1.out
Sensitivity Curve V-F: Multiple Clients with Transfer Size Variation (Segmented)
PARAMETERS:
! Clients = 12
! Nodes = 6
! FileSize = 6144MB
! Block = 512MB
! TransferSize = 16KB to 1024KB
[Chart: 6 Nodes, 12 Clients -- transfer rate (MB/sec, 0-300) vs. transfer size (16KB to 1024KB); series: Read, Write.]
Multiple Clients with Transfer Size Variation Summary
! To determine the effect of the 140MB/sec-per-CPU bottleneck, the benchmark is run with an increasing number of nodes. With four or fewer nodes, the write rate is good for any transfer size. After that, however, something slows the write rate. It seems the "camel hump" from Curve II is consistent.
Sensitivity Curve VI-A: Random Transfer Size (Strided)
PARAMETERS:
! Client / Node = 1
! FileSize = Clients * 512MB
! TransferSize = 1KB to 32KB
! Nodes = 1 to 7
Random1-3.out
Random4-6.out
Random7-8.out
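A hypothetical sketch of how such a test might pick its transfer sizes (the RNG, seed, and trimming of the last transfer are assumptions; only the 1KB-32KB range comes from the test parameters):

    /* Pick a random transfer size between 1KB and 32KB for each write
     * until the client's 512MB block is filled.  Not the actual IOR code. */
    #include <stdio.h>
    #include <stdlib.h>

    #define KB    1024LL
    #define BLOCK (512LL * KB * KB)   /* 512MB block per client */

    int main(void)
    {
        srand(12345);                 /* fixed seed for repeatable runs */
        long long done = 0, count = 0;
        while (done < BLOCK) {
            long long xfer = (1 + rand() % 32) * KB;   /* 1KB .. 32KB */
            if (xfer > BLOCK - done)
                xfer = BLOCK - done;  /* trim the final transfer */
            /* ... the write(fd, buf, xfer) call would go here ... */
            done += xfer;
            count++;
        }
        printf("%lld transfers, average %.1f KB\n",
               count, (double)done / count / KB);
        return 0;
    }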
[Chart: Random Transfer Sizes (1KB - 32KB) -- transfer rate (MB/sec, 0-160) vs. # of nodes (1-7); series: Read, Write.]
Random Transfer Size Summary
! As suspected, IOR does not handle random transfer sizes very well. Hopefully MPI-IO will improve these rates.
Sensitivity Curve VII-A: Blocksize Variation (Strided)
PARAMETERS:
! Nodes = 2
! Client / Node = 1
! FileSize = Clients * 512MB
! Block = 1KB to 512MB
! (Note: Strided pattern)
Strided.2.out
[Chart: Blocksize Variation (Strided) -- transfer rate (MB/sec, 0-250) vs. blocksize (1KB to 512MB); series: Read, Write.]
Sensitivity Curve VII-B: Blocksize Variation (Strided)
PARAMETERS:
! Nodes = 4
! Client / Node = 1
! FileSize = Clients * 512MB
! Block = 1KB to 512MB
Strided.4.out
[Chart: Blocksize Variation (Strided) -- transfer rate (MB/sec, 0-250) vs. blocksize (1KB to 512MB); series: Read, Write.]
Sensitivity Curve VII-C: Blocksize Variation (Strided)
PARAMETERS:
! Nodes = 8
! Client / Node = 1
! FileSize = Clients * 512MB
! Block = 1KB to 512MB
Strided.8.out
[Chart: Blocksize Variation (Strided) -- transfer rate (MB/sec, 0-300) vs. blocksize (1KB to 512MB); series: Read, Write.]
Sensitivity Curve VII-D: Blocksize Variation (Strided)
PARAMETERS:
! Nodes = 14
! Client / Node = 1
! FileSize = Clients * 512MB
! Block = 8KB to 512MB
Strided.14.out
Strided.14.aa.out
[Chart: Blocksize Variation (Strided) -- transfer rate (MB/sec, 0-300) vs. blocksize (8KB to 512MB); series: Read, Write.]
RERUN: 11/2000
Sensitivity Curve VII-D: Blocksize Variation (Strided)
PARAMETERS:
! Nodes = 14
! Client / Node = 1
! FileSize = Clients * 512MB
! Block = 32KB to 512MB
Strided.14.out
Strided.14.aa.out
[Chart: Blocksize Variation (Strided, rerun) -- transfer rate (MB/sec, 0-300) vs. blocksize (32KB to 512MB); series: Read, Write.]
Blocksize Variation Summary
! For strided reads above a ~256KB block size (and corresponding transfer size), performance is excellent for any number of nodes. In fact, we've achieved the theoretical bottleneck of ~250MB/sec.
! Writes, on the other hand, tend to be better with fewer nodes and tend to improve with larger block sizes. Initially (below 256KB) this is the read-modify-write problem. Beyond that, however, the large number of clients is probably exposing the coalescing problem again.
Concluding Summary
META-DATA (File Create/Delete) Summary: Changes in GPFS 1.3 have improved metadata performance greatly. Small-file creation and large-file removal rates are above those of NFS, JFS, and GPFS 1.2. For directory creation, GPFS 1.3 is strong, but still half that of NFS.
IOR Benchmark Summary:
I. An optimal transfer buffer size seems to be ~256KB for reads and writes.
II. Read performance is excellent, with a ceiling of ~250MB/sec. Increasing the number of nodes only marginally improves this read performance.
   For writes, there is a coalescing bottleneck.
IOR Benchmark Summary (cont.):
III. The pagepool can improve read/write rates beyond the theoretical ceiling.
IV/V. Increasing the number of nodes while using more than one client per node improves the write rate, but again the >4-node bottleneck is present.
VI. Read/write rates for random transfer sizes are less than desirable. MPI-IO will better address this.
VII. Strided reads with blocks greater than 256KB are excellent across the board. Writes tend to ramp up slowly, with improvements due to increasing blocksize. Again, writes are better with fewer nodes.
Acknowledgements
! This Science & Technology Education Program summer project was completed under the guidance and supervision of Kim Yates and Terry Jones.
! Assistance and information from Dave Fox, Robin Goldstone, Mark Grondona, and Dan McNabb (of IBM) helped to clarify the results and offer general understanding of snow.llnl.gov and the GPFS system.
! This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory, under contract No. W-7405-Eng-48.