a gpfs performance & bottleneck analysis · pdf filea gpfs performance & bottleneck...

49
snow.llnl.gov (bldg. 451) ! ! ! ! ! ! ! ! " " " " " " " " A GPFS Performance & Bottleneck Analysis William Loewe Summer, 2000 Lawrence Livermore National Laboratory The Snow Report: UCRL-VG-141522

Upload: letram

Post on 10-Mar-2018

237 views

Category:

Documents


6 download

TRANSCRIPT

Page 1: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

snow.llnl.gov(bldg. 451)

!

!

!

!

!

!

!

!"

"

"

" " "

""

A GPFS Performance& Bottleneck Analysis

William LoeweSummer, 2000

Lawrence Livermore National Laboratory

The Snow Report:

UCRL-VG-141522

Page 2: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 2

Report Outline

! Background:- Snow Architecture in

conjunction with GPFS 1.3

- RAID Access Patterns

! File Creation/Removal Rates using NFS, JFS, GPFS 1.2 &

GPFS 1.3:- Small Files

- Large Files

- Directories

! IOR_POSIX benchmark tests:Segmented Access, Varying:

- Transfer Size

- Node Number

- File Size

- Client-Node Ratio

- Transfer Size w/Multiple Client

- Random Size Transfer

Strided Access, Varying:- Block Size

! Concluding Summary

Page 3: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 3

Snow HardwareSpecifications

! IBM RS/6000 SP System

! 4 frames

! 4 Nighthawk-1 nodes per frame

! 8 222MHz 64-bit IBM Power3CPUs per node

(16 Nodes & 128 CPUs total)

! 4 GBytes RAM per node

(64 GBytes total memory)

! Peak computing capability of

~114 GFLOPS

! 2 I/O nodes (may also be used ascompute nodes -- not dedicated)

! 3 SSA adapters per I/O node

! 3 73-GByte RAID sets per SSAadapter (1.3 TBytes total diskspace)

! 5 Disks on each RAID (4 + P)

– parity information is distributedamong drives (i.e., not adedicated parity drive)

! ~250MB/sec maximum transfer(14MB/sec per RAID)

Page 4: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 4

Analysis of NH-1 Nodesw/ Santa Cruz adapters

SERVERS 2 Nighthawk-1 Server Nodes

CLIENTS14 Nighthawk-1

Compute Nodes

222 MhzPower 3

222 MhzPower 3

222 MhzPower 3

ColonyAdapter

222 MhzPower 3

222 MhzPower 3

222 MhzPower 3

222 MhzPower 3

222 MhzPower 3

222 MhzPower 3

222 MhzPower 3222 MhzPower 3

222 MhzPower 3

222 MhzPower 3

222 MhzPower 3222 MhzPower 3

222 MhzPower 3

Santa CruzDisks areconfiguredinto a 4+pRaid Set (73GB). EachRaid Set iscapable of 14MB/sec.

Each disk is anIBM SSA disk,max 8.5 MB/sec,typical 5.5MB/sec,total capacity18.2 GB.

Each Santa Cruz Adapteris capable of: ~48 MB/sec writes ~88 MB/sec reads

Colony Switch

Clients (VSDs)communicate withLAPI protocol(~80% efficient?).Switch comm isclient bottleneck:~294 MB/sec

Colony Adapter:

Point-to-point:~367 MB/secUnidirectional4 MB buff

~??? MB/secbidirectional Santa Cruz

Santa Cruz

ColonyAdapter

140MB/sec maxwrite per CPU?

Page 5: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 5

Node 1-4Frame 1

Node 7-8Frame 2

Node 6

Node 5

Node 13-16Frame 4

Node 9-12Frame 3

4+P

RAID

Loop

RAID

Loop

RAID

Loop

5 Disks oneach RAID set

(4 + P) SSAAdapters

I/O NodeI/O Node

! RAID loops are sharedbetween I/O nodes (5and 6) so that if a singleI/O node fails, anotherI/O node may cover alleighteen RAID sets.

Switch

RAID Loops

I/O nodes arenot dedicated

(i.e., mayserve as bothcompute nodeand I/O nodeconcurrently.)

Page 6: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 6

GPFS version 1.3

! IBM General Parallel File System for AIX(GPFS) version 1.3

! Mohonk PSSP 3.2 GA PTF 1 SubRelease#DV (GPFS) Client 1 Client 2 Client 3

A s P u r e A s T h e D r i v e n S n o w .

A s P u r e A s T h e D r i v e n S n o w .

Primary goal is to allow a single fileaccessed by multiple clients to be stored(or striped) across multiple disks.

(Of course, several files may be

accessed concurrently.)

Strided

Segmented

Page 7: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 7

File Creation Rate onSnow and Blue

! Testing the creationtime of 1000 files (ordirectories) created onNFS, JFS, and GPFSfor Snow and Blue.

! Note: Blue runs GPFS1.2, whereas Snow runsthe newer GPFS 1.3

For file tests: Machines: snow.llnl.gov blue.llnl.gov Code used: csh-script while ( $nfils < 1000 ) echo "Sat Jan 01 00:00:00 PDT 2000" > file.$nfils @ nfils++ end

*Nested #1 -- Subdirectory containing one file and one subdirectory,

recursively, for a 1000 file/directory total.

**Nested #2 -- Subdirectory containing nine files and one subdirectory,

recursively, for a 1000 file/directory total.

DF.xls

Directory / Small File Creation

0

50

100

150

200

250

300

350

NFSJF

SGPFS

NFSJF

SGPFS

NFSJF

SGPFS

NFSJF

SGPFS

Files Directories Nested #1* Nested #2**

Fil

e/s

ec

Snow

Blue

Page 8: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 8

File Removal Rate onSnow and Blue

! GPFS 1.3 is significantlyfaster for removing files ordirectories than version 1.2

DF.xls

Directory / Small File Removal

0

200

400

600

800

1000

1200

NFSJF

SGPFS

NFSJF

SGPFS

NFSJF

SGPFS

NFSJF

SGPFS

Files Directories Nested #1 Nested #2

Fil

e/s

ec

Snow

Blue

Page 9: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 9

File Creation/Removal RateSummary for Snow and Blue

! The Network File System (NFS)and Journaled File System (JFS)are comparable on Snow andBlue (both using the sameversion of these file systems.)This shows that despitedifferences in hardware, thesystems behave similarly.Therefore, the notable differencein GPFS performance is likelydue to improvements between 1.2and 1.3, not hardwareconsiderations.

Page 10: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 10

Terry.FILE.xls

Small File Creation Rate forGPFS 1.2, GPFS 1.3, NFS, JFS

Small File Creation Rate

0

100

200

300

400

500

600

700

800

900

1 2 4 8

# Clients

Fil

e/s

GPFS 1.3

GPFS 1.2

NFS

JFS

Note:

GPFS 1.2 tests run on blue.llnl.gov

GPFS 1.3, NFS, and JFS tests run on snow.llnl.gov

PARAMETERS:! 1000 29-byte files

created on all file systems

Page 11: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 11

Terry.RM.xls

Large File Removal Rate forGPFS 1.2, GPFS 1.3, NFS, JFS

PARAMETERS:! GPFS 1.2: 128 1G files

! GPFS 1.3: 128 1G files

! NFS: 1000 1M files

! JFS: 1000 _M files

Note:

GPFS 1.2 tests run on blue.llnl.gov

GPFS 1.3, NFS, and JFS tests run on snow.llnl.gov

Large File Removal Rate

0

50000

100000

150000

200000

250000

300000

350000

400000

450000

1 2 4 8

# Clients

KB

/s

GPFS 1.3

GPFS 1.2

NFS

JFS

Page 12: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 12

Terry.DIR.xls

Directory Creation Rate forGPFS 1.2, GPFS 1.3, NFS, JFS

Directory Creation Rate

0

50

100

150

200

250

300

350

400

1 2 4 8

# Clients

Dir

/s

GPFS 1.3

GPFS 1.2

NFS

JFS

Note:

GPFS 1.2 tests run on blue.llnl.gov

GPFS 1.3, NFS, and JFS tests run on snow.llnl.gov

PARAMETERS:! 1000 single-nested

subdirectories created onall file systems

(e.g., /1, /1/2, 1/2/3 would be

three single-nested subdirectories,

i.e., a unary tree structure)

Page 13: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 13

Creation / Removal Rate for Snow forGPFS 1.2, GPFS 1.3, NFS, JFS Summary

! The file creation and file removalrates are excellent on GPFS 1.3.Further, they appear scalable forincreasing number of clients.

! With nested directories, thecreation rate on GPFS 1.3 isbelow that of NFS.

Page 14: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 14

Sensitivity Curve I-A:Transfer Size Variation (Segmented)

PARAMETERS:! Client = 1

! Node = 1

! FileSize = 512MB

! TransferSize =

1KB to 512MB

For all subsequent tests: Code: ior_posix.c Machine: snow.llnl.gov PSSP: 3.2 GPFS: 1.3 Configuration: 2 * ( 3 * ( 3 * ( 4 + p ))) ~250MB/sec Max Transfer Rate (14MB/sec per RAID set)

I-A.out

Transfer Size Variation

0.00

50.00

100.00

150.00

200.00

250.00

1 4

16 64 256

1024

4096

1638

4

6553

6

2621

44

Transfer Size (KB)

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 15: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 15

RERUN: 11/2000Sensitivity Curve I-A:

Transfer Size Variation (Segmented)

PARAMETERS:! Client = 1

! Node = 1

! FileSize = 512MB

! TransferSize =

1KB to 512MB

queue_depth = 2previous run withqueue_depth = 40

I-A.out

Transfer Size Variation

0.00

50.00

100.00

150.00

200.00

250.00

1 2 4 8 16 32 64 128

256

512

1024

2048 4096

8192

1638

432

768

6553

613

1072

2621

4452

4288

Transfer Size (KB)

Tra

nsfe

r R

ate

(M

B/s

ec) Read

Write

Page 16: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 16

Sensitivity Curve I-B:Transfer Size Variation (Segmented)

PARAMETERS:! Client = 1

! Node = 1

! FileSize = 5GB

! TransferSize =

1KB to 4MB

A.4.out

Transfer Size Variation

0.00

20.00

40.00

60.00

80.00

100.00

120.00

140.00

1 2 4 8 16 32 64 128 256 512 1024 2048 4096

Transfer Size (KB)

Tra

nsf

er R

ate

(MB

/sec

) Read

Write

Page 17: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 17

Transfer Size Variation Summary

! For reads, the read-aheadalgorithm GPFS uses becomesinefficient for larger subblocks,thus causing the bell-curve forreads. The bell-shaped curve iscaused by the filling of the L2cache which bumps out theinstruction stream with largertransfer sizes.

! Writes can be expected to beslower due to the additionaloverhead associated withallocation (buffer must beallocated before write request.)But nonetheless for writes, theceiling of 140MB/sec by onethread (CPU) per Nodesupposedly causes the poor writeperformance. Later, we canshow that this appears to be thecase, but that another bottleneckis creeping up elsewhere.

Page 18: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 18

Sensitivity Curve II-A:Node Number Variation (Segmented)

PARAMETERS:! Client / Node = 1

! TransferSize = 256KB

! Block = 512MB

! FileSize = Client * 512MB

! Nodes = 1 to 14

B.2.out

Node Number Variation

0

50

100

150

200

250

300

1 2 3 4 5 6 7 8 9 10 11 12 13 14

# of Nodes

Tra

ns

fer

Ra

te (

MB

/se

c) Read

Write

Page 19: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 19

RERUN: 11/2000Sensitivity Curve II-A:

Node Number Variation (Segmented)

PARAMETERS:! Client / Node = 1

! TransferSize = 256KB

! Block = 512MB

! FileSize = Client * 512MB

! Nodes = 1 to 14

B.2.out

Node Number Variation

0

50

100

150

200

250

300

1 2 3 4 5 6 7 8 9 10 11 12 13 14

# of Nodes

Tra

ns

fer

Ra

te (

MB

/se

c) Read

Write

queue_depth = 2previous run withqueue_depth = 40

Page 20: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 20

RERUN: 11/2000Sensitivity Curve II-A2:

Node Number Variation (Segmented)

PARAMETERS:! Client / Node = 8

! TransferSize = 256KB

! Block = 512MB

! FileSize = Client * 512MB

! Node = 1 to 14

C.3.out

Client Number Variation

0

50

100

150

200

250

300

1 2 3 4 5 6 7 8 9 10 11 12 13 14

# of Clients / Node

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

queue_depth = 2previous run withqueue_depth = 40

[NOTE: No Summer run of this test for comparision]

Page 21: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 21

Sensitivity Curve II-B:Node Number Variation (Segmented)

PARAMETERS:! Client / Node = 1

! FileSize = 8400MB

! TransferSize = 256KB

! Block = FileSize / Clients

! Nodes = 1 to 8

C.4.out

Node Number Variation

0

50

100

150

200

250

300

1 2 3 4 5 6 7 8

# of Nodes

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 22: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 22

Sensitivity Curve II-C:Node Number Variation (Segmented)

PARAMETERS:! Client / Node = 1

! FileSize = 2100MB

! TransferSize = 256KB

! Block = FileSize / Clients

! Nodes = 1 to 8

C.7.out

Node Number Variation

0

50

100

150

200

250

300

1 2 3 4 5 6 7 8

# of Nodes

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 23: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 23

Sensitivity Curve II-D: Node Number Variation (Segmented)

PARAMETERS:! Client / Node = 1

! FileSize = 1470MB

! TransferSize = 256KB

! Block = FileSize / Clients

! Nodes = 1 to 6

C.8.out

Node Number Variation

0

50

100

150

200

250

300

1 2 3 4 5 6

# of Node s

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 24: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 24

Node Number Variation Summary

! It appears from observation ofthe activity of the I/O nodes thata �read-modify-write� is beingperformed when using more than3 clients during this test. As theRAID stripe is 256KB, but theLVM transfer size is 128KB,there is a failure to coalesce thetransfer packet to 256KB beforewriting to disk. Consequently, aread-modify-write is performed,slowing down the write.

Page 25: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 25

Sensitivity Curve III:File Size Variation (Segmented)

PARAMETERS:! Nodes = 6

! Client / Node = 1

! TransferSize = 256KB

! Block = FileSize / 6

! FileSize =6 MB to 24GB

B.3.out

B.4.out

File Size Variation

0

50

100

150

200

250

300

350

6

12 24 48 96 192

384

768

1536

3076 6144

1228

824

576

File Size (MB)

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 26: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 26

RERUN: 11/2000Sensitivity Curve III:

File Size Variation (Segmented)

PARAMETERS:! Nodes = 6

! Client / Node = 1

! TransferSize = 256KB

! Block = FileSize / 6

! FileSize =6 MB to 24GBB.3.out

B.4.out

File Size Variation

0

50

100

150

200

250

300

350

24 48 96 192 384 768 1536 3076 6144 12288 24576

File Size (MB)

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 27: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 27

File Size Variation Summary

! Apparently clever use ofcaching allows for a betterread/write rate. But, aftera certain blocksize as thefile gets larger, the readlevels off to around250MB/sec, but the writerate plummets to100MB/sec. These are thesame bottlenecks we�vebeen seeing.

! However, note thatwriting 32/64MB per nodecan help boost GPFSthrough use of caching. Asthe Page Pool is set to100MB, we do not see theeffects of caching beyondthis point.

Page 28: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 28

Sensitivity Curve IV-A:Client / Node Variation (Segmented)

PARAMETERS:! Nodes = 1

! FileSize = 512MB * Clients

! TransferSize = 256KB

! Block = 512MB

! Clients = 1 to 8

C.6.out

Client Number Variation

0

50

100

150

200

250

1 2 3 4 5 6 7 8

# of Clients

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 29: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 29

Sensitivity Curve IV-B:Client / Node Variation (Segmented)

PARAMETERS:! Nodes = 8

! TransferSize = 256KB

! Block = 512MB

! FileSize = Client * 512MB

! Client / Node = 1 to 8

C.3.out

Client Number Variation

0

50

100

150

200

250

300

1 2 3 4 5 6 7 8

# of Client / Node

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 30: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 30

RERUN: 11/2000Sensitivity Curve IV-B:

Client / Node Variation (Segmented)

PARAMETERS:! Nodes = 8

! TransferSize = 256KB

! Block = 512MB

! FileSize = Client * 512MB

! Client / Node = 1 to 8C.3.out

Client Number Variation

190

200

210

220

230

240

250

1 2 3 4 5 6 7 8

# of Client / Node

Tra

nsf

er R

ate

(MB

/sec

)

Read

Write

Page 31: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 31

Sensitivity Curve IV-C:Client / Node Variation (Segmented)

PARAMETERS:! Nodes = 8! FileSize = 8400MB! TransferSize = 256KB! Block = FileSize /

(total) Clients! Client / Node = 1 to 8

C.5.out

Client Number Variation

0

50

100

150

200

250

300

1 2 3 4 5 6 7 8

# of Clients / Node

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 32: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 32

Client / Node Variation Summary

! With one node, the write ratemax of 140 MB/sec increasesafter one client.

! This does not hold true, however,with more nodes. Instead, thewrite rate stays low. Perhaps adifferent bottleneck is in effecthere.

Page 33: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 33

Test_V.2.out

Sensitivity Curve V-A:Multiple Clients with Transfer Size

Variation (Segmented)

PARAMETERS:! Clients = 2! Nodes = 1! FileSize = 1024MB! Block = 512MB! TransferSize =

16KB – 1024KB

1 Node, 2 Clients

0

50

100

150

200

250

16 32 64 128 256 512 1024

Transfer Size (Kb)

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 34: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 34

Test_V.3.out

Sensitivity Curve V-B:Multiple Clients with Transfer Size

Variation (Segmented)

PARAMETERS:! Clients = 4! Nodes = 2! FileSize = 2048MB! Block = 512MB! TransferSize =

16KB – 1024KB

2 Nodes, 4 Clients

180

185

190

195

200

205

210

215

220

225

16 32 64 128 256 512 1024

Transfer Size (Kb)

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 35: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 35

Test_V.4.out

Sensitivity Curve V-C:Multiple Clients with Transfer Size

Variation (Segmented)

PARAMETERS:! Clients = 6! Nodes = 3! FileSize = 3072MB! Block = 512MB! TransferSize =

16KB – 1024KB

3 Nodes, 6 Clients

200

205

210

215

220

225

230

16 32 64 128 256 512 1024

Transfer Size (Kb)

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 36: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 36

Sensitivity Curve V-D:Multiple Clients with Transfer Size

Variation (Segmented)

PARAMETERS:! Clients = 8! Nodes = 4! FileSize = 4096MB! Block = 512MB! TransferSize =

16KB – 1024KB

Test_V.5.out

4 Nodes, 8 Clients

0

50

100

150

200

250

16 32 64 128 256 512 1024

Transfer Size (Kb)

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 37: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 37

Sensitivity Curve V-E:Multiple Clients with Transfer Size

Variation (Segmented)

PARAMETERS:! Clients = 10! Nodes = 5! FileSize = 5120MB! Block = 512MB! TransferSize =

16KB – 1024KB

Test_V.6.out

5 Nodes, 10 Clients

0

50

100

150

200

250

300

16 32 64 128 256 512 1024

Transfer Size (Kb)

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 38: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 38

Test_V.1.out

Sensitivity Curve V-F:Multiple Clients with Transfer Size

Variation (Segmented)

PARAMETERS:! Clients = 12! Nodes = 6! FileSize = 6144MB! Block = 512MB! TransferSize =

16KB – 1024KB

6 Nodes, 12 Clients

0

50

100

150

200

250

300

16 32 64 128 256 512 1024

Transfer Size (Kb)

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 39: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 39

Multiple Clients with Transfer SizeVariation Summary

! To determine the effect of the140MB/sec per CPU bottleneck,the benchmark is run withincreasing the number of nodes.With four or few nodes, the writerate is good for any size transfer.After that, however, somethingslows the write rate. It seems the‘camel hump’ from Curve II isconsistent.

Page 40: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 40

Sensitivity Curve VI-A:Random Transfer Size (Strided)

PARAMETERS:! Client / Node = 1

! FileSize = Clients * 512MB

! TransferSize = 1KB to 32KB! Nodes = 1 to 7

Random1-3.out

Random4-6.out

Random7-8.out

Random Transfe r Sizes(1KB - 32KB)

0

20

40

60

80

100

120

140

160

1 2 3 4 5 6 7

# Nodes

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 41: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 41

Random Transfer Size Summary

! As suspected, IOR does notaddress the problems withrandom transfer sizes very well.Hopefully MPI-IO will improvethese rates.

Page 42: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 42

Sensitivity Curve VII-A:Blocksize Variation (Strided)

PARAMETERS:! Nodes = 2

! Client / Node = 1

! FileSize = Clients * 512MB

! Block = 1KB to 512MB! (Note: Strided pattern)

Strided.2.out

Blocksize Variation (Strided)

0

50

100

150

200

250

1KB 2K

B4K

B8K

B16

KB32

KB64

KB12

8KB

256K

B51

2KB

1MB

2MB

4MB

8MB

16MB

32MB64

MB12

8MB

256M

B51

2MB

Blocksize

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 43: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 43

Sensitivity Curve VII-B:Blocksize Variation (Strided)

PARAMETERS:! Nodes = 4

! Client / Node = 1

! FileSize = Clients * 512MB

! Block = 1KB to 512MB Strided.4.out

Blocksize Variation (Strided)

0

50

100

150

200

250

1KB 2K

B4K

B8K

B16

KB32

KB64

KB12

8KB

256K

B51

2KB

1MB

2MB

4MB

8MB

16MB

32MB64

MB12

8MB

256M

B51

2MB

Blocksize

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 44: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 44

Sensitivity Curve VII-C:Blocksize Variation (Strided)

PARAMETERS:! Nodes = 8

! Client / Node = 1

! FileSize = Clients * 512MB

! Block = 1KB to 512MB Strided.8.out

Blocksize Variation (Strided)

0

50

100

150

200

250

300

1KB 2K

B4K

B8K

B16

KB32

KB64

KB12

8KB

256K

B51

2KB

1MB

2MB

4MB

8MB

16MB

32MB64

MB12

8MB

256M

B51

2MB

Blocksize

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 45: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 45

Sensitivity Curve VII-D:Blocksize Variation (Strided)

PARAMETERS:! Nodes = 14

! Client / Node = 1

! FileSize = Clients * 512MB

! Block = 8KB to 512MB Strided.14.out

Strided.14.aa.out

Blocksize Variation (Strided)

0.00

50.00

100.00

150.00

200.00

250.00

300.00

8KB

16KB

32KB64

KB12

8KB

256K

B51

2KB

1MB

2MB

4MB

8MB

16MB

32MB64

MB12

8MB

256M

B51

2MB

Blocksize

Tra

ns

fer

Ra

te (

MB

/se

c)

Read

Write

Page 46: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 46

RERUN: 11/2000Sensitivity Curve VII-D:

Blocksize Variation (Strided)

PARAMETERS:! Nodes = 14

! Client / Node = 1

! FileSize = Clients * 512MB

! Block = 32KB to 512MB Strided.14.out

Strided.14.aa.out

Blocksize Variation (Strided)

0.00

50.00

100.00

150.00

200.00

250.00

300.00

32KB64

KB12

8KB

256K

B51

2KB

1MB

2MB

4MB

8MB

16MB

32MB64

MB12

8MB

256M

B51

2MB

Blocksize

Tra

nsf

er R

ate

(MB

/sec

)

Read

Write

Page 47: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 47

Blocksize Variation Summary

! For strided reads above ~256KBblock size (and correspondingtransfer size), the performance isexcellent for any number ofnodes. In fact, we�ve achievedthe theoretical bottleneck of~250MB/sec.

! Writes, on the other hand, tend tobe better with fewer nodes andtend to improve with larger blocksizes. Initially (below 256KB)this is the read-modify-writeproblem. Beyond it, however,probably the large number ofclient are showing the coalescingproblem again.

Page 48: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 48

Concluding Summary

META-DATA (File Create/Delete) Summary:Changes in GPFS 1.3 have improved metadata performance greatly. Small file creationand large file removal are above that of NFS,JFS, and GPFS 1.2. For directory creation,GPFS 1.3 is strong, but still half that of NFS.

IOR Benchmark Summary:

I. An optimal transfer buffer size seems be~256KB for reads and writes.

II. Read performance is excellent with a ceilingof ~250MB/sec. Increasing the number ofnodes only marginally improves this readperformance.

For writes, there is a coalescing bottleneck.

IOR Benchmark Summary (cont.):

III. The pagepool can improve read/write ratesbeyond the theoretical ceiling.

IV/V. Increasing the number of nodes using morethan one client per node improves write rate,but again the > 4 node bottleneck is present.

VI. Reads/writes for random transfer size is lessthan desirable. MPI-IO will better addressthis.

VII. Strided reads greater than 256KB blocks areexcellent across the board. Writes tend toramp us slowly with improvements due toincreasing blocksize. Again, writes are betterwith fewer nodes.

Page 49: A GPFS Performance & Bottleneck Analysis · PDF fileA GPFS Performance & Bottleneck Analysis William Loewe Summer, ... IBM General Parallel File System for AIX ... queue_depth = 2

William Loewe � Summer, 2000 Snow Report � Page 49

Acknowledgements

! This Science & Technology Education Program summer projectwas completed under the guidance and supervision of Kim Yatesand Terry Jones.

! Assistance and information from Dave Fox, Robin Goldstone, MarkGrondona, and Dan McNabb (of IBM) helped to clarify the resultsand offer general understanding of snow.llnl.gov and the GPFSsystem.

! This work was performed under the auspices of the U.S.Department of Energy by the University of California LawrenceLivermore National Laboratory under contract No. W-7405-Eng-48.