File System Benchmarking
File System Benchmarking
Advanced Research Computing
Outline
• IO benchmarks
  – What is benchmarked
  – Micro-benchmarks
  – Synthetic benchmarks
• Benchmark results for
  – Shelter NFS server, client on hokiespeed
  – NetApp FAS 3240 server, clients on hokiespeed and blueridge
  – EMC Isilon X400 server, client on blueridge
IO BENCHMARKING
IO Benchmarks
• Micro-benchmarks
  – Measure one basic operation in isolation
    • Read and write throughput: dd, IOzone, IOR
    • Metadata operations (file create, stat, remove): mdtest
  – Good for: tuning an operation, system acceptance
• Synthetic benchmarks
  – Mix of operations that model real applications
  – Useful if they are good models of real applications
  – Examples:
    • Kernel build, kernel tar and untar
    • NAS BT-IO
IO Benchmark pitfalls
• Not measuring what you want to measure
  – Results can be masked by various caching and buffering mechanisms
• Examples of different behaviors
  – Sequential bandwidth vs. random IO bandwidth
  – Direct IO bandwidth vs. bandwidth in the presence of the page cache (in the latter case an fsync is needed)
  – Caching of file attributes: stat-ing a file on the same node on which the file was written
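The page-cache pitfall can be seen directly with dd. The sketch below (file path and sizes are arbitrary placeholders) writes the same data twice: once letting the page cache absorb it, once forcing it to stable storage with conv=fsync before dd reports the rate:

```shell
#!/bin/sh
# Cached vs. flushed write bandwidth with dd.
F=/tmp/dd_cache_demo.$$

# Cached write: dd may exit as soon as the data is in the page
# cache, so the reported rate can far exceed the storage speed.
dd if=/dev/zero of=$F bs=1M count=64 2>&1 | tail -n 1

# Flushed write: conv=fsync makes dd call fsync() before it
# reports, so the timing covers the transfer to storage.
dd if=/dev/zero of=$F bs=1M count=64 conv=fsync 2>&1 | tail -n 1

# oflag=direct would bypass the page cache entirely, but direct
# IO is not supported on every filesystem, so it is left out here.
rm -f $F
```

On a fast local disk the two rates may be close; on an NFS mount the cached number is typically much larger than the flushed one.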
What is benchmarked
• What we measure is the combined effect of:
  – the native file system on the NFS server (shelter)
  – NFS server performance, which depends on factors such as enabling/disabling write-delay and the number of server threads
    • Too few threads: the client retries several times
    • Too many threads: server thrashing
  – the network between the compute cluster and the NFS server
  – NFS client and mount options
    • Synchronous or asynchronous
    • Enable/disable attribute caching
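These client-side choices are made through NFS mount options. A hypothetical /etc/fstab entry (server name, export path and option values are illustrative, not the actual Shelter configuration):

```
# sync      - writes reach the server before the call returns (async defers them)
# noac      - disable attribute caching entirely, or instead:
# actimeo=N - cache attributes for N seconds
shelter:/export/home  /home  nfs  rw,sync,actimeo=30,rsize=1048576,wsize=1048576  0 0

# Server side, the nfsd thread count (e.g. in /etc/sysconfig/nfs):
# RPCNFSDCOUNT=64
```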
Micro-benchmarks
• IOzone – measures read/write bandwidth
  – Long-established benchmark with the ability to test multiple readers/writers
• dd – measures read/write bandwidth
  – Tests file write/read
• mdtest – measures metadata operations per second
  – file/directory create/stat/remove
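As a sketch of how a dd micro-benchmark run is driven (path and sizes here are small placeholders, not the parameters used in these tests):

```shell
#!/bin/sh
# Minimal dd write/read bandwidth probe.
F=/tmp/dd_probe.$$

# Write test: stream zeros to the target file, fsync'd so the
# reported rate reflects storage rather than the page cache.
dd if=/dev/zero of=$F bs=1M count=32 conv=fsync 2>&1 | tail -n 1

# Read test: stream the file back to /dev/null. In a real run the
# cache would be dropped, or the read done from a different node.
dd if=$F of=/dev/null bs=1M 2>&1 | tail -n 1

rm -f $F
```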
Mdtest – metadata test
• Measures the rate of file/directory operations
  – create, stat, remove
• Mdtest creates a tree of files and directories
• Parameters used
  – tree depth: z = 1
  – branching factor: b = 3
  – number of files/directories per tree node: I = 256
  – stat performed by a different node than the one that created the file: N = 1
  – number of repeats of the run: i = 5
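The parameters above map directly onto mdtest flags. A sketch of the corresponding invocation (the mpirun launcher, rank count and target directory are assumptions, not the actual job script):

```shell
#!/bin/sh
# mdtest invocation matching the parameters on this slide:
#   -z 1    tree depth 1
#   -b 3    branching factor 3
#   -I 256  files/directories per tree node
#   -N 1    offset ranks so stat runs on a different node than create
#   -i 5    repeat the run 5 times
CMD="mpirun -np 8 mdtest -z 1 -b 3 -I 256 -N 1 -i 5 -d /nfs/scratch/mdtest"

# Run it only where mdtest is installed; otherwise just show it.
if command -v mdtest >/dev/null 2>&1; then
    $CMD
else
    echo "$CMD"
fi
```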
Synthetic benchmarks
• tar-untar-rm – measures time
  – Tests creation/deletion of a large number of small files
  – Tests filesystem metadata creation/deletion
• NAS BT-IO – bandwidth and time doing IO
  – Solves a block tri-diagonal linear system arising from the discretization of the Navier-Stokes equations
Kernel source tar-untar-rm
• Run on 1 to 32 nodes
• Tarball size: 890 MB
• Total directories: 4732
  – Max directory depth: 10
• Total files: 75984
  – Max file size: 919 kB
  – <= 1 kB: 14490
  – <= 10 kB: 40190
  – <= 100 kB: 20518
  – <= 1 MB: 786
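The tar-untar-rm test can be reproduced at any scale with a short script; the sketch below times the three phases over a tiny synthetic tree instead of the 890 MB kernel tarball:

```shell
#!/bin/sh
# Time tar creation, extraction and removal over a small file tree.
BASE=/tmp/tartest.$$
mkdir -p $BASE/src

# Build a synthetic tree: 10 directories x 20 one-kilobyte files.
for d in 0 1 2 3 4 5 6 7 8 9; do
    mkdir -p $BASE/src/dir$d
    for f in $(seq 1 20); do
        head -c 1024 /dev/zero > $BASE/src/dir$d/file$f
    done
done

T0=$(date +%s)
tar -cf $BASE/tree.tar -C $BASE src        # tar: pack the tree
T1=$(date +%s)
mkdir $BASE/out
tar -xf $BASE/tree.tar -C $BASE/out        # untar: extract it again
T2=$(date +%s)
rm -rf $BASE/out                           # rm: delete the extracted tree
T3=$(date +%s)

echo "tar: $((T1-T0))s  untar: $((T2-T1))s  rm: $((T3-T2))s"
rm -rf $BASE
```

Run on an NFS mount instead of /tmp, the rm phase in particular exercises the server's metadata path, one remove RPC per file.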
NAS BT-I/O
• Test mechanism
  – BT is a simulated CFD application that uses an implicit algorithm to solve the 3-dimensional compressible Navier-Stokes equations. The finite-difference solution is based on an Alternating Direction Implicit (ADI) approximate factorization that decouples the x, y and z dimensions. The resulting systems are block-tridiagonal with 5x5 blocks and are solved sequentially along each dimension.
  – BT-I/O tests different parallel I/O techniques in BT
  – Reference: http://www.nas.nasa.gov/assets/pdf/techreports/1999/nas-99-011.pdf
• What it measures
  – Multiple cores doing I/O to a single large file (blocking MPI calls MPI_File_write_at_all and MPI_File_read_at_all)
  – I/O timing percentage, total data written, I/O data rate
SHELTER NFS RESULTS
dd throughput (MB/sec)
• Run on 1 to 32 nodes
• Two block sizes – 1 MB and 4 MB
• Three file sizes per block size – 1G/5G/15G with 1M blocks, 4G/20G/60G with 4M blocks

Block size   File size   Average   Median   Stdev
1M           1G            8.01     6.10     4.58
1M           5G            7.75     5.95     4.52
1M           15G           5.74     5.60     0.34
4M           4G           11.17    11.80     2.87
4M           20G          15.71    12.70    10.68
4M           60G          14.60    10.50     9.22
[Chart: dd throughput (MB/sec) for 1M and 4M blocks at paired file sizes 1G/4G, 5G/20G, 15G/60G; bandwidth axis 0–18 MB/s]
IOZone write throughput
[Chart: average IOzone write throughput (KB/s) vs. number of threads (up to 14); Write Child and Write Parent curves, ~85,000–120,000 KB/s]
IOZone write vs. read (single thread)
Throughput (KB/s), 32 GB file, 1 thread:

         Write       Read
Child    112033.19   9973.44
Parent   112032.34   9973.43
Mdtest file/directory create rate
[Chart: IO operations per second for directory and file creation vs. number of nodes (up to 32), mdtest -z 1 -b 3 -I 256; 0–2500 ops/sec]
Mdtest file/directory remove rate
[Chart: IO operations per second for directory and file removal vs. number of nodes (up to 32), mdtest -z 1 -b 3 -I 256; 0–3000 ops/sec]
Mdtest file/directory stat rate
[Chart: IO operations per second for directory and file stat vs. number of nodes (up to 32), mdtest -z 1 -b 3 -I 256; 0–700,000 ops/sec]
Tar-untar-rm time (sec)

tar                   Real      User    Sys
Average                781.27   1.35   10.41
Median                1341.72   1.66   13.08
Standard deviation     644.16   0.44    3.39

untar                 Real      User    Sys
Average               1214.82   1.51   18.02
Median                1200.13   1.51   17.90
Standard deviation      99.03   0.06    0.62

rm                    Real      User    Sys
Average                227.48   0.22    3.91
Median                 216.28   0.22    3.87
Standard deviation      64.21   0.02    0.16
BT-IO Results

Attribute                                                       Class C           Class D
Problem size                                                    162 x 162 x 162   408 x 408 x 408
Iterations                                                      200               250
Number of processes                                             4                 361
I/O timing percentage                                           13.44             91.66
Total data written in a single file (MB)                        6802.44           135834.62
I/O data rate (MB/sec)                                          94.99             73.45
Data written or read per I/O instance per processor (MB/core)   42.5              7.5
NETAPP FAS 3240 RESULTS
Server and Clients
• NAS server: NetApp FAS 3240
• Clients running on two clusters
  – Hokiespeed
  – Blueridge
• Hokiespeed: Linux kernel compile, tar-untar and rm tests were run with:
  – nodes spread uniformly over racks, and
  – consecutive nodes (rack-packed)
• Blueridge: Linux kernel compile, tar-untar, and rm tests were run on consecutive nodes
IOzone read and write throughput (KB/s), Hokiespeed
[Charts: IOZone write throughput (Write Child and Write Parent, ~101,000–104,500 KB/s) and IOZone read throughput (Read Child and Read Parent, ~113,800–115,600 KB/s) vs. number of threads (1, 3, 6, 12)]
dd bandwidth (MB/sec)
• Two node placement policies
  – packed on a rack
  – spread across racks
• Direct IO was used
• Two operations: read and write
• Two block sizes – 1 MB and 4 MB
• Three file sizes – 1 GB, 5 GB, 15 GB
• Results show throughput in MB/s
dd read throughput (MB/sec), 1MB blocks
[Charts: throughput vs. number of nodes (1–64) for 1G, 5G and 15G files – Hokiespeed nodes spread (0–600 MB/s), Hokiespeed nodes packed (0–700 MB/s), BlueRidge nodes packed (0–700 MB/s)]
dd read throughput (MB/sec), 4MB blocks
[Charts: throughput vs. number of nodes (1–64) for 1G, 5G and 15G files – Hokiespeed nodes spread (0–400 MB/s), Hokiespeed nodes packed (0–700 MB/s), BlueRidge nodes packed (0–800 MB/s)]
dd write throughput (MB/sec), 1MB blocks
[Charts: throughput vs. number of nodes (1–64) for 1G, 5G and 15G files – Hokiespeed nodes spread, Hokiespeed nodes packed, BlueRidge nodes packed; all 0–400 MB/s]
dd write throughput (MB/sec), 4MB blocks
[Charts: throughput vs. number of nodes (1–64) for 1G, 5G and 15G files – Hokiespeed nodes spread, Hokiespeed nodes packed, BlueRidge nodes packed; all 0–400 MB/s]
Linux Kernel tests
• Two node placement policies
  – packed on a rack
  – spread across racks
• Operations
  – Compile: make -j 12
  – Tar creation and extraction
  – Remove directory tree
• Results show time in seconds
Linux Kernel compile time (sec)

Hokiespeed, nodes packed:
nodes   real    user   sys
1         733   5001   1116
2        1546   5086   1233
4        3189   5146   1273
8        6343   5219   1317
16       9476   5251   1366
32      10012   5255   1339

Hokiespeed, nodes spread:
nodes   real    user   sys
1         817   4968   1096
2         990   5014   1138
4         993   5223   1171
8         939   5143   1167
16       1318   5112   1198
32       2561   5087   1183
64       4985   5111   1209

BlueRidge, nodes packed:
nodes   real    user   sys
1         694   4589    951
2        1092   4572    993
4        2212   4631   1038
8        4451   4691   1073
16       5636   4716   1098
32       5999   4702   1111
64       6609   4699   1089
Tar extraction time (sec)

Hokiespeed, nodes packed:
nodes   real   user   sys
1        167   1.00    9.5
2        172   0.98    9.5
4        177   1.06    9.6
8        202   1.03    9.7
16       312   1.09   10.2
32       421   1.18   11.9

Hokiespeed, nodes spread:
nodes   real   user   sys
1        143   1.05    9.5
2        125   0.98    9.4
4        144   1.04    9.8
8        149   1.04    9.8
16       216   1.08   10.4
32       399   1.23   12.5
64       809   1.42   15.0

BlueRidge, nodes packed:
nodes   real   user   sys
1         98   0.6     6.6
2        103   0.6     6.6
4        106   0.6     6.5
8        130   0.7     7.1
16       217   0.8     9.1
32       406   1.2    13
64       818   1.1    14
Rm execution time (sec)

Hokiespeed, nodes packed:
nodes   real     user   sys
1         20     0.12   2.5
2         21     0.15   2.7
4         25     0.16   2.8
8         33     0.17   2.8
16       123     0.22   3.7
32       284     0.24   4.0
64       650     0.27   4.4

Hokiespeed, nodes spread:
nodes   real     user   sys
1         21     0.14   2.84
2         22     0.14   2.82
4         22     0.15   2.80
8         47     0.18   3.30
16       135     0.21   3.85
32       248     0.23   4.01
64       811     0.27   4.54

BlueRidge, nodes packed:
nodes   real     user   sys
1        19.21   0.07   1.69
2        19.14   0.10   1.69
4        26.68   0.11   1.98
8        63.75   0.16   3.16
16      152.59   0.22   4.24
32      324.90   0.26   4.98
64      699.04   0.25   5.06
Uplink switch traffic, runs on hokiespeed
[Charts: uplink switch traffic for nodes packed and for nodes spread]
Mdtest file/directory create rate
[Charts: IO ops/sec for directory and file creation vs. number of nodes (up to 64), mdtest -z 1 -b 3 -I 256 -i 10 -N 1; Hokiespeed 0–14,000 ops/sec, BlueRidge 0–4,500 ops/sec]
Mdtest file/directory remove rate
[Charts: IO ops/sec for directory and file removal vs. number of nodes (up to 64), mdtest -z 1 -b 3 -I 256 -i 10 -N 1; Hokiespeed and BlueRidge, 0–10,000 ops/sec]
Mdtest file/directory stat rate
[Charts: IO ops/sec for directory and file stat vs. number of nodes (up to 64), mdtest -z 1 -b 3 -I 256 -i 10 -N 1; Hokiespeed and BlueRidge, 0–700,000 ops/sec]
NAS BT IO results

Class D: 250 iterations (I/O after every 5 steps), 50 jobs, 6.5 TB total data written/read (50 files of 135 GB each)

System                                     HokieSpeed        BlueRidge
Nodes per job                              3                 4
Total number of cores                      1800              3200
Average I/O timing (hours)                 5.175 / 5.85      5.3 / 5.5
Average I/O timing (% of total time)       92.6 / 93.4       92.7 / 96.6
Average Mop/s/process                      80.6 / 72         79.6 / 44.5
Average I/O rate per node (MB/s)           2.44 / 2.15       2.34 / 1.71
Total I/O rate (MB/s)                      357.64 / 323.02   359.8 / 343.42
Uplink switch traffic for BT-IO on hokiespeed
• The boxes (1, 2, 3) indicate the three NAS BT IO runs
• Red is write; green is read
[Chart: uplink switch traffic during the three runs]
EMC ISILON X400 RESULTS
dd bandwidth (MB/sec)
• Runs on BlueRidge
  – no special node placement policy
• Direct IO was used
• Two operations: read and write
• Two block sizes – 1 MB and 4 MB
• Three file sizes – 1 GB, 5 GB, 15 GB
• Results show throughput in MB/s
dd read throughput (MB/sec), 1MB blocks
[Charts: throughput vs. number of nodes (1–64) for 1G, 5G and 15G files – EMC Isilon (0–700 MB/s) and NetApp (0–1400 MB/s)]
dd read throughput (MB/sec), 4MB blocks
[Charts: throughput vs. number of nodes (1–64) for 1G, 5G and 15G files – Isilon (0–800 MB/s) and NetApp (0–1400 MB/s)]
dd write throughput (MB/sec), 1MB blocks
[Charts: throughput vs. number of nodes (1–64) for 1G, 5G and 15G files – Isilon (0–400 MB/s) and NetApp (0–700 MB/s)]
dd write throughput (MB/sec), 4MB blocks
[Charts: throughput vs. number of nodes (1–64) for 1G, 5G and 15G files – Isilon (0–400 MB/s) and NetApp (0–900 MB/s)]
Linux Kernel tests
• Runs on BlueRidge
  – no special node placement policy
• Direct IO was used
• Operations
  – Compile: make -j 12
  – Tar creation and extraction
  – Remove directory tree
• Results show time in seconds
Linux Kernel compile time (sec)

NetApp:
nodes   real   user   sys
1        694   4589    951
2       1092   4572    993
4       2212   4631   1038
8       4451   4691   1073
16      5636   4716   1098
32      5999   4702   1111
64      6609   4699   1089

Isilon:
nodes   real   user   sys
1        701   4584    957
2       1094   4558    989
4       2228   4631   1038
8       4642   4713   1084
16      5860   4723   1107
32      6655   4754   1120
64      7181   4760   1113
Tar creation time (sec)

Isilon:
nodes   real   user   sys
1         32   0.50   4.45
2         32   0.51   4.54
4         32   0.47   4.39
8         32   0.48   4.38
16        33   0.49   4.28
32        35   0.49   4.19
64        57   0.51   4.20

NetApp:
nodes   real   user   sys
1         30   0.51   4.50
2         30   0.49   4.46
4         34   0.50   4.51
8         41   0.51   4.45
16        62   0.54   4.51
32       116   0.60   4.83
64       238   0.89   7.10
Tar extraction time (sec)

NetApp:
nodes   real   user   sys
1         98   0.6     6.6
2        103   0.6     6.6
4        106   0.6     6.5
8        130   0.7     7.1
16       217   0.8     9.1
32       406   1.2    13
64       818   1.1    14

Isilon:
nodes   real   user   sys
1        230   0.65   10.1
2        234   0.62   10.3
4        237   0.63   10.4
8        255   0.64   10.5
16       300   0.67   10.9
32       431   0.74   11.8
64       754   0.87   14.1
Rm execution time (sec)

NetApp:
nodes   real    user   sys
1        19.2   0.07   1.69
2        19.1   0.10   1.69
4        26.7   0.11   1.98
8        63.7   0.16   3.16
16      152     0.22   4.24
32      324     0.26   4.98
64      699     0.25   5.06

Isilon:
nodes   real    user   sys
1       110     0.23   4.76
2       113     0.24   4.80
4       124     0.24   4.82
8       158     0.24   4.85
16      234     0.25   4.93
32      340     0.26   4.99
64      655     0.26   5.27
IOZone write throughput (KB/s), Isilon
[Charts: Write Child and Write Parent throughput vs. number of threads (1, 3, 6, 12) – Direct IO on BlueRidge (0–140,000 KB/s) and buffered IO on BlueRidge (~85,000–120,000 KB/s)]
IOZone read throughput (KB/s), Isilon
[Charts: Read Child and Read Parent throughput vs. number of threads (1, 3, 6, 12) – Direct IO on BlueRidge (0–120,000 KB/s) and buffered IO on BlueRidge (0–8,000,000 KB/s)]
IOzone write throughput (KB/s)
[Charts: Write Child and Write Parent vs. number of threads (1, 3, 6, 12) – NetApp/HokieSpeed (~101,000–104,500 KB/s) and Isilon/BlueRidge (~85,000–120,000 KB/s)]
IOzone read throughput (KB/s)
[Charts: Read Child and Read Parent vs. number of threads (1, 3, 6, 12) – NetApp/HokieSpeed (~113,800–115,600 KB/s) and Isilon/BlueRidge (0–8,000,000 KB/s)]
Thank you.