TF-Storage Action Item
Measuring storage performance – theory and practice
Maciej Brzeźniak, PSNC Supercomputing Department
Presentation outline
• TF-Storage Action Item on measuring performance
• Motivations for benchmarking storage
• Benchmarking:
  • a bit of theory
  • a bit of practice (in the context of tenders)
• Some conclusions
What is the Action Item?
• TF-Storage aims:
  • ... to be a forum for exchanging and promoting ideas, experience and knowledge related to data storage, data management and cloud storage;
  • to identify and promote common standards, techniques, technology and procedures in the field of data storage...
  • to be a focal point for gathering storage expertise in the .edu area in Europe
• TF-Storage Action Items:
  • forum for storage and cloud
  • overview of (national) activities and deployments
  • file sharing
  • measuring storage performance
  • storage and AAI
• Example actions:
  • we try to contribute interesting material to the TF-Storage wiki
  • and let others use it
  • and present the results during meetings
How knowledge and experience transfer is implemented
• TF-Storage meetings
• TF-Storage Wiki at:
  https://wiki.terena.org/index.php/TF-Storage:Contents
Motivations for measuring performance (1)
• A lot of motivations:
  • curiosity ;)
  • the need to evaluate a storage system against the application
  • comparing with other products / people ;)
  • the lack of a standard, universal way of describing system performance
  • it's hard to re-use existing results or to trust vendor claims:
    • results may come from different, optimally chosen configurations
    • results might be... tuned ☺
    • specifications typically contain best cases
• NRENs running data storage/management infrastructures and services:
  • infrastructure development and planning
  • tenders
Motivations for measuring performance (2)
• MONEY: value vs price
  • having defined needs, we want to spend minimum money to address them
  • with a fixed budget, we want the best performance
• Realistic value-price assessment:
  • difficult without the proper tool (a benchmark)
• Vendors specify performance in tricky ways (greetings to those present in the room ☺):
  • maximum bandwidth/IOPS:
    • cache-to-host, not disks-to-host
    • peak/maximum, but what about sustained?
  • 'catalogue' values may be true only for the best configurations:
    • maximum number of disk drives, FC 15k rpm drives, SSDs...
    • full cache configuration
• Tenders: a formalised methodology producing re-usable results is needed
Motivations for measuring performance (3)
• 'Independent' benchmarks:
  • SPC, ESG, others...
  • only a support for your own methodology
• SPC/ESG limitations:
  • not every vendor/product is there
  • not every access pattern is represented:
    • SPC-1 – random patterns – OLTP, database operations and mail servers
    • SPC-2 – sequential patterns – large file processing, large DB queries, VoD
  • for complex/application scenarios you must do the benchmarking yourself
• Tested configurations are 'optimised' by vendors:
  • results show the performance of the best configs, perhaps not the one you will get
  • FC disks (15k), SSDs, tuning...
• You might be unable to reproduce the results, since:
  • you don't know the product that well
  • you're not able to tune the system as the vendors are
• Hard to compare different configurations...
http://www.storageperformance.org
http://www.enterprisestrategygroup.com
Some theory (1)
How to make a solid benchmark?
[Figure: 'Current GB/s vs time vs tests' – throughput (GB/s) and normalised cache-hit % over time (s) for 12 interleaved write/read test runs]
Some theory (2)
How to make a solid benchmark?
• Be aware of the system complexity
• Apply a top-down or bottom-up approach depending on the test purpose:
  • top-down – quickly check what is possible; if you're not satisfied, go down to the lower level
  • bottom-up – if you want to be sure of each layer's efficiency and its overhead
• Theory: do tuning on all levels!
• Reality: the system is complex and tuning takes time, so use 'heuristics':
  • trust your intuition & experience when choosing the set of tuned attributes and the values you examine
Source: Stijn Eeckhaut's presentation during the 1st TF-Storage Meeting in Espoo, Finland
Some theory (3)
How to make a solid benchmark?
• Try to reflect the real-life system load:
  • run a real application if possible (costly, difficult)
  • if a real test is impossible, use benchmarking tools (iozone, xdd, hdparm...)
• Be sure you're measuring what you want, e.g.:
  • disks-to-host performance, not cache-to-host
• Beware of caching – some methods to avoid its influence (see the sketch below):
  • use large-enough data sets (3-4x the RAM in hosts + storage caches)
  • force syncs and flush caches, re-mount or re-create filesystems
  • use direct-I/O mount flags or appropriate benchmark options...
• BUT avoid artificial setups – examples:
  • switching caching off on all levels typically does not make sense
  • switching cache mirroring off speeds up the test, but will you do that in production?
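A minimal sketch of these cache-avoidance steps on a Linux host; the paths, sizes and thread counts are assumptions for illustration, not the tender settings discussed later:

    sync                                     # flush dirty pages to storage
    echo 3 > /proc/sys/vm/drop_caches        # drop the host page cache (root required)
    umount /mnt/test && mount /mnt/test      # re-mount to invalidate filesystem caches
    # direct I/O bypasses the host page cache entirely:
    dd if=/dev/zero of=/mnt/test/f bs=256k count=40960 oflag=direct   # ~10 GB sequential write
    iozone -I -t 4 -i 0 -i 1 -s 8g -r 256k   # -I makes iozone use O_DIRECT where supported

Note that these steps address host-side caching only; the array cache can still absorb a small test, which is why the large data-set rule above matters.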
Some theory (4)
How to make a solid benchmark?
• Avoid unwanted bottlenecks – try to detect them (a few quick checks are sketched after the figure):
  • host-side performance issues:
    • use host-state monitoring tools
    • optimise kernel settings and the I/O subsystem (e.g. queues)
  • communication network issues:
    • watch the network switches; check MTUs, drops, etc.
    • examine link speeds with different tools (e.g. iperf for IP)
  • storage system issues:
    • explore system traces, logs and performance stats
    • analyse the storage back-ends
• Avoid configuration faults – use enough:
  • hosts, HBAs, network links
  • disks, if you want to examine the controller's features
  • protocol peers (e.g. NFS clients and servers)
• Some tricky errors:
  • LUN swapping between array controllers may kill your test
[Figures: performance in kB/s and in IOPS vs record size (0–256 kB) for sequential and random writers/readers]
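A few of the bottleneck checks above, as they might look on a Linux host (standard utilities; the device and interface names are placeholders):

    iostat -x 5                              # host side: device utilisation, queue length, await
    cat /sys/block/sdb/queue/nr_requests     # block-layer queue depth of one disk
    ip -s link show eth0                     # interface drops/errors; verify the MTU while here
    iperf -s                                 # on one host, and on the other:
    iperf -c <server> -P 4 -t 60             # raw IP throughput, 4 parallel streams, 60 s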
Some theory (4) – continued
An example of a tricky host-side configuration fault – an SSD attached through a SATA controller running in IDE mode:
  00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [IDE mode]
  # cat /proc/scsi/scsi
  Attached devices:
  Host: scsi0 Channel: 00 Id: 00 Lun: 00
    Vendor: ATA   Model: KINGSTON SVP100S   Rev: CJRA....
Some theory (5)
How to make a solid benchmark?
• Use tools appropriate for the layer you want to examine:
  • application level: SPC, SPECsfs2008, TPC, the actual application...
  • file system: iozone, xdd...
  • block level: xdd, dd ☺, iometer...
  • network: iperf...
• Use the appropriate test depending on what is critical for your application (examples below):
  • throughput (backup, large-file storage etc.):
    • sequential read/write tests, possibly multi-host & multi-threaded
  • IOPS (databases):
    • random workloads
  • response time (databases, real-time systems):
    • random workloads with response-time measurement
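For instance, the two test styles map onto iozone roughly like this (thread counts, file and record sizes are illustrative assumptions):

    # throughput: multi-threaded sequential write + read, large records
    iozone -t 8 -i 0 -i 1 -s 16g -r 1m
    # IOPS: random read/write with small records, results reported in ops/s (-O)
    iozone -t 8 -i 0 -i 2 -s 4g -r 4k -O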
More theory (☺)
How to make a solid benchmark?
• Methodology – some information is available on the Internet:
  • HOWTOs, blogs etc.
  • the SPC test specifications – very interesting
• TF-Storage benchmarking HOWTO:
  https://wiki.terena.org/index.php/TF_Storage_Benchmarking_Howto
  • this is an early version
  • if you can/want to contribute, please contact me
  • I would be happy to get any feedback (even critical)
Some practice (1)
• Example 1: PLATON project tender for Popular Archival Service disk arrays and file servers
• Problem: we wanted to buy equipment we had never touched before...
• Application: backup/archive
  • large files, sequential pattern
  • we need throughput and capacity
  • IOPS not critical; no 'crazy features' (thin provisioning, snapshots etc.)
  • costs -> minimum
• Requirements for the storage systems:
  • SATA 1 TB drives – reliability, a high number of spindles, short sparing time; many drives per RAID
  • Disk arrays:
    • 8 Gbit FC front-end, 200 TB (expandable)
    • 2 GB/s under a mixed (50%/50%) read/write workload
  • File servers:
    • SATA 1 TB drives, 10 Gbit front-end, 200 TB (expandable)
    • 1 GB/s under a mixed (50%/50%) read/write workload, or 1.2 GB/s for 100% writes and 100% reads measured independently
  • performance must not degrade when the filesystem is full...
Some practice (2)
• Example 1: PLATON disk arrays & file servers
• Solution:
  • a performance benchmark procedure written into the tender RFP
  • open-source benchmarks: iozone or xdd throughput tests
  • the average result is evaluated (calculated from 5 rounds of tests)
  • workload: 50%/50% read/write with a large block size (256 kB)
  • the vendor can choose the optimal configuration, i.e.:
    • number of testing hosts
    • physical connectivity details
    • LUN configuration and mapping to hosts
  • Why so much freedom?
    • vendors know their systems best
    • there are many architectures/solutions on the market
      -> hard to define all possible low-level details in an RFP
  • BUT: test files must be 4x bigger than the caches + RAM in the hosts (a quick sizing example below)
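A quick sizing check for that last rule, with made-up numbers:

    HOSTS=16; RAM_GB=8; CACHE_GB=32                  # assumed host count, per-host RAM, array cache
    MIN_GB=$(( 4 * (HOSTS * RAM_GB + CACHE_GB) ))
    echo "minimum total test data: ${MIN_GB} GB"     # -> 640 GB for these values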
Some practice (3)
• Example 1: PLATON disk arrays & file servers
• Solution – some additional (formal) requirements:
  • the full test output must be included in the offer
    -> possible 'vendor tricks' can be detected while evaluating the offers
  • tests must be run on a configuration identical to the delivered one
    -> this is a slippery area...
    • setting up the test-bed is costly for the vendor (some of them complained)
    • BUT this way we end up with offers from robust, 'operative' vendors/partners (able to provide support later)
  • we can verify the test results when the system is delivered:
    • vendors will/should help you at this stage
    • if verification fails, we can reject the delivery in accordance with the RFP statements
    • BUT be careful: the vendor is not the only one with a problem in case of failure
Some practice (4)
• Example 1: PLATON disk arrays & file servers tenders
• How did it work? We got:
  • disk arrays: 8 offers from 4 vendors (OEMs), 2 types of disk arrays
  • filers: 4 offers for 2 different file servers (OEMs again)
  • 1 important vendor+product (within the budget) didn't appear
• The best offers were:
  • disk array: IBM DS5300 (LSI 7900 OEM) – declared 2.08 GB/s (confirmed)
  • file server: SGI Infinite Storage NAS with SGI IS220 arrays (BlueArc and LSI OEM) – declared 1.17 GB/s (confirmed)
• Conclusion:
  • this is the right way
  • BUT did it work only thanks to the size of the order? (2 PB of disks in 5 arrays and 5 filers)
Some practice (5)
• Some details of the benchmarks we made (1):
• Example 1: PLATON's disk array – tender benchmark requirements:
  • RFP criterion – mixed read/write (50%/50%) performance of min. 2 GB/s
  • additional requirements:
    • test file size – at least 2x (hosts' RAM + array cache)
    • iozone tool: iozone -t mm -i 0 -i 1 -s nnM -r 256k
      • mm – number of test threads (test files)
      • nnM – test file size
      • -r 256k (I/O request size) – the array is to be used under GPFS
    • no file system specified
    • number of LUNs or hosts not fixed (but >2)
    • no RAID level specified (however, we will use RAID5 or RAID6)
    • the tested configuration must be the same as the delivered one
      • that was difficult for vendors
  • 50%/50% read/write mixture implementation (sketched below):
    • read and write benchmarks are run in parallel
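One possible realisation of the parallel mixture, with invented host names, thread counts and file sizes; iozone's -w flag keeps the test files so a later read-only pass can reuse them:

    # beforehand, on the read hosts: create the files that will be read, and keep them (-w)
    iozone -t 6 -i 0 -s 65536M -r 256k -w
    # the measured run: start writers and readers at the same moment
    ssh write-host 'iozone -t 6 -i 0 -s 65536M -r 256k' &
    ssh read-host  'iozone -t 6 -i 1 -s 65536M -r 256k -w' &
    wait   # both groups must finish before the results are read off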
Some practice (6)
� Example 1: PLATON’s disk array – verification benchmark:
� Array configuration:
� 12 LUNs over 12 RAIDs (each RAID5 composed from 16 drives) = 192 drives in total
� LUNs for writing exported through controller A, LUN for reading through B
� 4 FC ports/controller used for hosts connectivity, theoret. bandwidth ~6.4 GBit/s
� Hosts & SAN configuration:
� A lot of hosts: 16 blades with 1x 4Gbis FC HBA each
� Hosts divided into two groups: reading and writing ones
� 2 separate FC networks (blade FC switches, with 8 Gbit/s and 4 Gbit/s ports)
Some practice (7)
• Example 1: PLATON project's disk array – verification benchmark:
• Issues/problems we had:
  • it is quite a lot of work to configure the testbed:
    • setup – hardware, wiring, LUNs, mapping, hosts etc. – takes about 1 week
    • tuning took some days – hosts, array <- bottlenecks
    • problem analysis and solving – another 2 days <- bottlenecks,
      e.g. performance issues in an FC switch (firmware issues limited performance)
Some practice (8)
• Example 1: PLATON's disk array – verification benchmark:
• Issues/problems we had:
  • difficult to synchronise write/read tests running on 16 machines:
    • iozone supports cluster mode, but only for single workloads (e.g. 100% read)
    • in non-cluster mode the report gives only an 'overall' throughput
    • the particular tests are not really parallel
[Figure: 'Current GB/s vs time vs tests' – throughput (GB/s) and normalised cache-hit % over the 1800 s of the 12 interleaved write/read runs; annotations mark the performance criterion and the benchmark threads that finished faster]
Some practice (9)
• Example 1: PLATON's disk array – verification benchmark:
• Issues/problems we had:
  • difficult to synchronise write/read tests running on 16 machines
    => calculations needed: average performance over the total time of the test (see the sketch below)
[Figure: zoomed fragment of the throughput timeline (3.50–5.00 GB/s range) showing the first write and read runs against the cache-hit curve]
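The correction itself is simple arithmetic – total data moved divided by the wall-clock time of the whole run, so threads that finish early do not inflate the average; the numbers below are invented:

    TOTAL_GB=$(( 12 * 64 ))   # assumed: 12 test threads, a 64 GB file each
    WALL_S=1800               # elapsed time of the slowest thread, in seconds
    echo "scale=2; $TOTAL_GB / $WALL_S" | bc   # average GB/s over the whole test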
Some practice (10)
• Example 1: PLATON disk array – verification benchmark:
  • difficult to synchronise write/read tests running on 16 machines
    -> monitoring of the storage was required to verify the results – the array's performance monitor
    -> it lets us see read and write performance over time
[Figure: array read and write performance (kB/s) over the 1800 s of the test – WRITE, READ and TOTAL curves, with the performance criterion and the calculated average marked]
Some practice (11)
• Example 2: PL-GRID project tender for a fast disk array for HPC (Lustre)
• Problem:
  • again, the real performance was a big question mark
  • application: HPC, computing-job scratch space
    • we need throughput for large blocks (IOPS not critical)
    • redundancy – RAID5 or 6 is OK
• Disk array:
  • SATA 1 TB drives, 8 Gbit FC front-end, 300 TB (expandable)
  • the RFP required 4.7 GB/s write performance
    (we assumed that writes are more difficult to implement efficiently)
• Solution:
  • again, the benchmark was defined in the RFP
  • procedure similar to the one presented before (but only writes to be measured)
  • formalities:
    • the benchmark result must be included in the offer
    • the right to verify the benchmark after delivery (and reject the equipment if it does not meet the requirements)
Some practice (12)
• Example 2: PL-GRID tender for a disk array for an HPC cluster:
• How did it work?
  • we got 8 offers from 4 different vendors (OEMs), 2 types of disk arrays
  • the best offer was:
    • DDN S2A 9900 – declared 4.8 GB/s – confirmed in practice (even exceeded)
• Conclusion:
  • this is the right way
  • BUT it worked perhaps thanks to the fact that the vendor wanted to enter the market (so it was ready to make the effort to perform the tests)
Some practice (13)
• Some details of the benchmarks we made (2):
• Example 2: PL-GRID's disk array for an HPC cluster – tender benchmark:
  • RFP criterion – 100% write performance of min. 4.8 GB/s
  • additional requirements:
    • test file size – at least 2x (hosts' RAM + array cache)
    • iozone tool: similarly as before
    • number of LUNs or hosts not fixed (but >2)
    • no RAID level specified (however, RAID5 or RAID6 expected)
    • the tested configuration must be the same as the delivered one
      • that was again difficult for vendors
Some practice (14)
• Example 2: PL-GRID's HPC disk array – verification benchmark:
• Array configuration:
  • 29 LUNs over 29 RAIDs (each a RAID6 of 10 drives) = 290 drives in total
  • LUNs distributed equally among the array controllers
  • 4 FC ports per controller used for host connectivity, theoretical bandwidth ~6.4 GB/s
• Hosts configuration:
  • hosts divided into two groups, each group using a different array controller
  • 16 blade hosts, each with 1x 4 Gbit/s FC HBA
  • 2 separate FC networks (blade FC switches, with 8 Gbit/s and 4 Gbit/s ports)
Some practice (15)
• Example 2: PL-GRID's HPC disk array – verification benchmark:
• Issues/problems we had:
  • the performance required in the RFP is high...
    -> a lot of hosts and LUNs needed (15 hosts, 29 LUNs, 29 threads)
    -> fortunately we could exploit iozone's cluster mode, which simplified the test (sketch below)
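iozone's cluster mode is driven by a client-list file passed with -+m; the host names, paths and sizes below are invented for illustration (iozone reaches clients over rsh by default; the RSH environment variable can point it at ssh):

    export RSH=ssh
    # one line per client: <hostname> <working directory> <path to iozone binary>
    cat > clients.txt <<EOF
    blade01 /mnt/lun01 /usr/bin/iozone
    blade02 /mnt/lun02 /usr/bin/iozone
    EOF
    iozone -+m clients.txt -t 29 -i 0 -s 65536M -r 256k   # 29 write threads spread over the clients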
Some practice (16)
• Example 2: PL-GRID's HPC disk array – verification benchmark:
  • RFP: the performance had to be measured with the filesystem 60% full:
    -> this required some additional tools (scripts; a sketch follows the figure)
    -> you have to spend time preparing them
[Figure: 'Performance vs FS usage' (iozone without -c -e flags) – writer, re-writer, reader and re-reader throughput at 0–80% filesystem usage, with the 60% tender requirement marked]
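A fill-up helper could look like the sketch below; the mount point, chunk size and reliance on GNU df are assumptions:

    TARGET=60                                # required filesystem usage, in percent
    MNT=/mnt/test                            # assumed mount point of the tested filesystem
    while [ "$(df --output=pcent $MNT | tail -1 | tr -dc '0-9')" -lt "$TARGET" ]; do
        # write 10 GB chunks with direct I/O until the usage threshold is reached
        dd if=/dev/zero of=$MNT/fill.$RANDOM bs=1M count=10240 oflag=direct
    done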
Some practice (17)
• Example 2: PL-GRID's HPC disk array – verification benchmark:
  • again, we wanted to verify what the benchmark says:
    • array-side measurements were used (the S2A monitoring tool)
    • plus SAN network monitoring
[Figure: array-side and FC network measurements during the test]
Some practice (18)
• Example 3: PLATON project tender for a fast file server (NFS):
• Problem:
  • as always... the real performance was a big question mark
• Requirements:
  • application: backup/archive
    • large files, sequential pattern; throughput and capacity needed
    • IOPS not critical; no 'crazy features' (de-dup, snapshots etc.)
  • performance:
    • SATA 1 TB drives, 10 Gbit front-end, 200 TB (expandable)
    • 1 GB/s under a mixed (50%/50%) read/write workload, or 1.2 GB/s for 100% writes and 100% reads measured independently
    • performance must not degrade when the filesystem is full...
• Solution:
  • again, the benchmark was defined in the RFP + the right to verify the results
  • procedure: 100% write + 100% read with iozone, or a 50%/50% read/write test with xdd (sketched below)
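A loose sketch of the xdd variant, assuming xdd's -rwratio option for the read share of a mixed workload; the target path, sizes and pass count are invented (-reqsize counts blocks of -blocksize bytes, so 256 x 1024 B = 256 kB requests):

    xdd -rwratio 50 -targets 1 /mnt/nfs/testfile \
        -blocksize 1024 -reqsize 256 -mbytes 262144 -passes 5 -dio -verbose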
Some practice (19)
• Example 3: PLATON project tender for a fast file server (NFS):
• How did it work? We got:
  • 4 offers for 2 different file servers (OEMs again)
  • 1 important vendor+product (within the budget) didn't appear
• The best offer was:
  • SGI Infinite Storage NAS with SGI IS220 arrays (BlueArc and LSI OEM)
  • declared 1.17 GB/s (confirmed)
• Conclusion:
  • again – RFP & offers – OK
  • now the verification...
Some practice (20)
• Example 3: PLATON file server – verification test:
• Array configuration:
  • 26 LUNs over 224 drives (RAID5s), 3 disk pools (defined in the filer)
  • 38 filesystems (!) (proprietary FS within the filer)
• Hosts configuration:
  • 6 physical machines with 2x 10 Gbit Ethernet
  • 38 virtual hosts (!) – KVM
• Network:
  • 2 separate 10 Gbit Ethernet networks
Some practice (21)
• Example 3: PLATON file server – verification test:
• Issues/problems we had:
  • expected:
    • tuning needed; a lot of settings can influence the results
    • a tight schedule: only 2 weeks for everything, hard deadline
  • unexpected:
    • installation, wiring, setup, definition of RAIDs, LUNs, pools etc. – 4 days (24 h/day)
    • after a week of testing & tuning we realised that 6 client-server pairs could not generate the declared 1.17 GB/s; more were needed
    • we decided to use virtual hosts (4 days before the deadline)
    • 10 Gbit Ethernet driver issues in KVM prevented reaching maximum performance (Friday, 2 days to the deadline)
    • we were working 24 h/day (2 benchmarking teams)
  • in the end:
    • we finished with 38 client-server pairs (run on VMs)
    • iozone's cluster mode helped to manage the benchmark
    • we met the criteria and the deadline:
      • writes: 1.24 GB/s, reads: 1.42 GB/s (average from 10 runs)
      • the expected results arrived at 6:00 AM on Monday – 2 hours before the deadline
Some practice (22)
• Example 3: PLATON file server – verification test:
  • again, we wanted to verify what the benchmark says
[Figure: filer-side monitoring during the test – around 22,50 IOPS/s per NAS node; 1 NFS operation = 32 kB]
Our benchmark conclusions
• Some conclusions for tenders...
• You may want to measure the performance of the system you plan to purchase/use
• This is difficult:
  • it complicates the RFP, the offers and the whole process (more paper, risk of faults, protests...)
  • some effort/resources are needed on the vendor side
  • effort and resources are needed on your side:
    • procedure preparation and verification (+ formalisation)
    • time, knowledge & equipment needed to verify the results
• Some specific knowledge & experience is necessary:
  • be aware of possible tricks and eliminate them in the RFP
  • verification test:
    • get the vendor to assist you in testing (at least in tuning)
    • additional monitoring is useful to verify the results reported by the benchmark
• Plan your tender schedule accordingly (!!!)
  • testing takes time & resources – it must be included in the schedule, or you have a problem
  • leave some spare time for unexpected problems
Presentation conclusions:
• Benchmarking is needed:
  • you can't trust catalogue values, and independent benchmark results can rarely be re-used
  • you want to know what a given configuration may offer you
• It's difficult, as:
  • it takes time and human/hardware resources
  • some knowledge and experience is needed, including:
    • tuning (storage, network, OS, protocols...)
    • benchmarking methods
    • auxiliary tools and techniques (automation, monitoring)
• We try to help you & want to learn something from you:
  • 'best practices' are described on the TF-Storage Wiki – please review and contribute (or at least criticise)
  • TF-Storage benchmarking HOWTO:
    https://wiki.terena.org/index.php/TF_Storage_Benchmarking_Howto
Contact: Maciej Brzeźniak, [email protected]
Measuring storage performance – theory and practice
Maciej Brzeźniak, PSNC
TF-Storage Action Item
Note: most of the pictures used in this presentation come from the sxc.hu service.