Technical Doc: Storage QA and Testing



Measuring storage performance


Introduction

General information

HOWTO content

This HOWTO describes the methodology of (networked) storage performance measurements and gives practical tips. As members of TERENA's TF-Storage task force, we try to collect general information on how to prepare and perform storage performance testing, based on our own experience and knowledge as well as material available on the Internet.

This HOWTO re-uses the presentation "Storage Benchmarking Cookbook" prepared by IBBT/Ghent University.

HOWTO limitations

Obviously, the world is too complicated to be described in a single work. However, we hope that both the background and the practical, hands-on information included in this HOWTO can be useful for the community. The HOWTO does not cover all possible aspects, problems and approaches to the storage benchmarking process. We try to describe the most important issues and the most popular approaches and tools used to evaluate storage systems.

HOWTO organisation

The organisation of the sections of this HOWTO follows the layered layout of real-world, networked storage systems (described layer by layer in the subsections below).

We provide general and layer-by-layer comments on how to perform storage benchmarking, along with background information, practical benchmarking tips and examples, as well as links to external sources.

    System layers

A typical storage environment is composed of the layers discussed in the following subsections. It is important to understand that each of these layers can have an important influence on data access performance.

In the following subsections we briefly discuss the layers and mention the main factors that have the most important impact on the efficiency of the whole storage system.

Clients and applications layer

Client applications run on the client hosts. These hosts are the NAS clients. NAS services include file system-level services, basic user authentication and access control.


NAS services can be provided, for example, by:

a NAS gateway that implements the file-level data services on top of the block-level data services provided by an external block storage box; the gateway itself can be built as:
a dedicated, specialised NAS gateway appliance, or
a software NAS gateway running on a typical operating system;
a complete NAS appliance that includes the NAS engine and the block-level storage modules.

Regardless of the physical implementation, the logic remains similar.

The physical and logical characteristics of the implementation of NAS services have a very important impact on the performance of the networked storage.

Therefore, their features must be carefully tested as part of the networked storage benchmarking evaluation.

The NAS elements may constitute a bottleneck, among other reasons due to:

a lack of scalability of data processing performance,
overhead of the file-level services compared to block-level storage,
specifics of the NAS gateway hardware and software, including:
file system settings: caching, journaling, prefetching, redundancy,
operating system limits on NAS gateways,
transport protocol parameters in both the client and the storage front-end network.

    Block-level storage clients

Note that, while being NAS servers, NAS gateways act as block-level services clients at the same time. Block-level services are provided by storage controllers accessible through the storage front-end network, e.g. an FC-based SAN.

The performance of file system-level services strongly depends on the performance of the block-level volumes. Therefore, the evaluation of the performance of the networked storage environment should include examining the block storage components.

This is possible in open systems, where we have direct access to the block-level storage.

In typical networked storage setups the application/client systems can also be connected to the storage network directly, exploiting the block-level storage. In such a case, storage benchmarking can be focused on the performance evaluation of SAN components, SAN network interfaces (so-called HBAs, Host Bus Adapters) and SAN driver characteristics.

Storage front-end connectivity (SAN or DAS)

The block-level storage can be served directly, locally (DAS - Direct Attached Storage) or by a Storage Area Network. In both cases, the performance evaluation of the block-level storage can be performed using similar methods and tools, as the virtualisation mechanisms implemented in modern operating systems hide the fact that the SAN storage is not local.

The SAN network may constitute a performance bottleneck, depending on:

    network topology and technology,

  • 7/31/2019 Technical Doc Storage Qa and Testing

    4/28

link speed,
latency,
multipathing support (active-active vs active-passive).

Besides FC technology, the iSCSI, FCIP, iFCP and SRP (over InfiniBand) protocols are used to implement data transport in front-end storage networks.

    Storage controllers and RAID structures

Storage controllers implement the basic virtualisation functions such as RAID structures, volumes (LUNs), volume mapping to hosts and volume access control. They are crucial elements of the storage system, so their features may determine its performance characteristics.

Advanced functions of modern controllers can have a significant impact on data access efficiency.

They should be taken into account while planning storage controller testing. Examples are:

hardware-aided RAID implementation,
front-end ports load balancing,
advanced data caching:
reads: read-ahead caching, sequential access pattern detection,
writes: write-back and write-through caching modes, and RAID stripe write optimisation,
LUN-in-cache functionality.

Solid benchmarking of the storage controllers should include (but should not be limited to):

sustained mode operation testing,
cached mode operation (cache efficiency) testing,
evaluation of the influence of the processes that take place in the RAID system (e.g. sparing of failed disks, snapshot management etc.) on the performance observed by the clients.

RAID structures are the basic virtualisation technique used in storage controllers. The techniques and tricks used to optimise particular RAID implementations can be a differentiating factor between the various storage controllers available on the market.

Therefore, controller settings that relate to RAID structures should be carefully examined as part of storage system testing. Possible configuration settings include:

RAID level,
RAID chunk and stripe size,
read-ahead and write cache operation modes.

The logical volumes can also be configured in different ways. The settings that should be taken into account while performing storage testing are:

LUN distribution over RAID structures,
LUN-specific cache settings,
support for access to the volume through multiple paths.

Particular storage controllers may include advanced configuration parameters that can be used for performance tuning. Examples of such parameters are:

cache mirroring and cache coherency settings,
active-active vs active-passive controller operation,
and many others.

A significant amount of time should be spent in order to get a realistic picture of a given controller's performance characteristics.

Storage back-end connectivity

The back-end storage network interconnects the storage controllers with the disk drives. Various types of technologies are used for that purpose. Traditionally, Fibre Channel loops span disk drives and storage controller back-end ports. In enterprise solutions, switched back-end topologies are used. Currently, SAS technology is often used for the back-end connectivity, as it provides high bandwidth at optimal cost.

The storage back-end links can be a bottleneck, especially in sustained transfer operations. Therefore their characteristics, such as latency and bandwidth, should be taken into account while running the performance tests.

Disk drives

As the disk drives hold the actual data, their characteristics are a very important component of storage system performance.

    The efficiency of accessing data held on disk drives may depend on:

drive technology (FC, SATA/FATA, SSD),
disk rotational speed (typically 7.2 krpm, 10 krpm or 15 krpm),
internal cache size and efficiency,
additional techniques that can be implemented in the drives (e.g. NCQ - Native Command Queueing, Tagged Command Queueing),
disk segment size,
physical layout/placement of the data on the disk cylinders, e.g. outer sectors - higher throughput, middle sectors - lower seek latency.

As the data processed in storage systems ultimately goes to the disk drives, it might be useful to examine the performance of the individual disk drives (if possible), in order to determine the realistic performance that can be achieved after combining multiple disks into RAID arrays.

Storage system layers summary

In the above sections we showed that networked storage systems are typically composed of multiple layers. Each of these layers has its own features, complexity and performance characteristics. Solid storage benchmarking should take these facts into account. An appropriate testing methodology should be used in order to reveal the real characteristics of particular components, or to find the source of an actual or potential bottleneck observed in the system.

An overview of the benchmarking methodologies is provided in the next section of the HOWTO.

Storage benchmarking methodologies overview

The complexity of the networked storage system makes solid storage benchmarking a difficult task. One of the basic decisions that must be taken while preparing the benchmarking procedure is the PURPOSE of running the benchmark and the EVALUATION CRITERIA for the test results.

Benchmarking purposes and evaluation criteria

    There are many possible benchmarking purposes. Here are some examples of them:

evaluation of the actual storage system efficiency (without any expectations formulated a priori):
in a default setup,
in a tuned configuration,
determining the source of an observed bottleneck (makes sense especially if the performance is lower than expected),
tuning the storage system for a given application.

In order to evaluate the benchmark results, the EVALUATION CRITERIA must be set up. The criteria can differ significantly, depending on the TARGET APPLICATION of the storage system.

For instance, when considering home directory storage, both sequential and random access pattern performance are important. For high-performance computing, rich-media and backup/archive applications, sequential data access performance matters the most. Database storage requires good IOPS characteristics of the storage system.

The benchmarking procedure and tools should be able to reveal those characteristics of the examined storage system that match the defined criteria.

Benchmarking tactics and organisation

Depending on the purpose, the benchmarking process should be organised in different ways.

For instance, the evaluation of the storage system efficiency in a tuned configuration requires running multiple rounds of tests, changing configuration parameters on single or multiple storage system layers between rounds and doing an analysis phase after each test.

While planning the benchmarking process we should try to:

    plan the correct sequence of the benchmarking actions, e.g.:

work bottom-up to learn the efficiency of each system layer and the overhead it introduces compared to the lower layers;

work top-down to determine the limit at the highest level and try to find its source in the lower layers;

choose the appropriate set of examined configuration parameters and determine the set of values used for them,

eliminate the influence of unwanted optimisation mechanisms (if we want to), e.g.:

    storage controller-side caching,

NAS gateway filesystem-side caching,
client-level, filesystem-side and application-side caching,

make sure that the benchmarking load reflects the real load of the target applications and the target environment.

    Benchmarking tactics: bottom up or top-down?

Both approaches have some advantages and disadvantages. The accepted tactics should fit the purpose of the test. If we want to quickly determine the overall performance of the storage system, we can start from the top, e.g. by running a file-system level benchmark (more on benchmarks in the BENCHMARKING TOOLS section), and go down in case the performance is lower than expected. This is most probably the less time-consuming approach; however, we may get an incomplete picture of the actual system efficiency. Going bottom-up is more costly, but gives us knowledge about the efficiency of each system layer and the overhead it introduces compared to the lower layers. This approach is suitable if we want to find a bottleneck in the system or we want to determine the realistic maximum performance in a tuned setup.

Benchmarking parameters selection

Having in mind the complexity of a multi-layer storage system, we may be forced to:

use heuristics,
trust our experience and intuition (smile)

in order to determine (and limit!) the list of configuration parameters modified during the benchmarking process and their examined values. Choosing the correct parameters and their values is a difficult task and should be performed based on deep knowledge of storage systems, networks and computer system architectures and characteristics.

Measuring the correct bottleneck

When benchmarking a multi-layered system, we have to make sure that we are measuring the correct component of the system.

Data caching influence

A typical issue faced while benchmarking at the file-system level is the influence of data caching, which can take place on the client side (file system level, operating system level), on the NAS gateway side, or on the storage controller.

In order to measure a selected storage system element, e.g. the storage back-end network bandwidth, we may want to disable caching or plan the benchmark in a way where the influence of caching is eliminated.

Some practical methods to achieve that are listed below (a short command-level sketch follows the list):

using test files big enough not to fit into the cache memory,
using different files between multiple test rounds,
making sure you do not read a file that was recently written (if it is not big enough, it can still reside in the cache),
forcing filesystem sync'ing between test rounds (data from the previous round still residing in the cache can disturb the current round of tests),
allocating the buffer cache before running the benchmark (on the client side),
restarting NFS servers, filesystem daemons, or even client machines and storage controllers between the test rounds,
re-mounting the filesystems,
using flags that force direct I/O processing while mounting the filesystem,
using a benchmark that supports direct I/O,
using block-level benchmarks instead of file-level ones.
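
In practice, the cache influence can be reduced for example like this (a minimal sketch; /proc/sys/vm/drop_caches and the GNU dd flags are standard Linux facilities, while the file and mount point names are purely illustrative):

sync                                                        # flush dirty buffers to stable storage
echo 3 > /proc/sys/vm/drop_caches                           # drop page cache, dentries and inodes (Linux 2.6.16+)
dd if=/mnt/md1/test_file of=/dev/null bs=1M iflag=direct    # read a test file bypassing the page cache
mount -o remount /mnt/md1                                   # or simply re-mount the filesystem between rounds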

Another side effect of data caching is that the speed of writing to the buffer can be measured instead of the actual transmission speed. This can be avoided by using a data set large enough to fill up the buffers and transmit enough data over the real communication channels. We may also use external monitoring tools, such as Gigabit Ethernet network or SAN switch performance analysers and host port usage monitoring tools, to analyse the actual data traffic taking place in the network.

Monitoring the system components

While doing the benchmark we have to make sure that we are able to determine the reason for the performance bottleneck visible in the results. On the other hand, we should try to avoid incorrect interpretation of the results in case some caching/buffering effect makes the results look better than the actual system characteristics. We can use monitoring tools in addition to the benchmarking tools for that purpose.

Example monitoring tools we could use while benchmarking storage systems are:

network interface monitoring tools: ntop, ethereal, tcpdump,
CPU monitoring tools: top, dstat, vmstat,
system statistics collectors: vmstat, sar, dstat,
the /proc filesystem in Linux,
virtual machine monitoring tools: xm top, virt-top, virt-manager for Xen;
and many others.

    Avoiding the artificial setup

Note, however, that the decision to switch off or eliminate the effects of caching should be taken carefully. Disabling the cache can make the operation of some optimisation technologies implemented in storage system elements impossible. For instance, if we disable the write cache in the RAID controller, the full-stripe write technique cannot be exploited.

By switching caching off, we also make the testing environment a bit artificial; the results we get in such a setup can be useless for the real-life configuration.

Benchmarking workloads

Getting a realistic picture of the storage system performance requires that we apply a benchmark that generates a workload that matches our needs. Again, the kind of testing workload should fit the purpose of performing the benchmark.

    Possible purposes of the storage testing are:

to evaluate the storage performance for a specific application,
to compare storage systems without a specific application in mind.

If we have a specific application in mind, we should make sure that we are able to prepare a benchmark that reflects the application characteristics.

This can be achieved in several ways:

by finding a benchmark that generates a workload similar to the application data access pattern (microbenchmarks or macrobenchmarks); the similarity should include:
access pattern: sequential, random or mixed,
read/write ratio,
temporal and spatial locality of storage access requests,
number of simultaneous access requests (concurrent application/benchmark threads);
by choosing tests that focus on the storage characteristics that are crucial for the application:
throughput-intensive (performance measured e.g. in MB/s),
I/O-intensive (performance measured e.g. in IOPS or in request serving time).

    Workload generators and benchmark types

There are a lot of benchmark types available, for free or under a licence. They differ in:

the level on which they run, e.g.:
application level,
filesystem level,
block level and device level;
the storage system component they examine, e.g.:
the complete storage system,
the client network,
storage controller elements.

The correct benchmark should generate a workload that is similar to the target application's needs.

Some application-level benchmarks simulate the behaviour of typical computing system applications.

The table below provides some examples of benchmarks along with links to Internet resources related to them.

Level: application level
Workload generator / benchmark: real application (FTP, NFS client, DBMS), SPC (seq/random R/W), SPECsfs2008 (CIFS, NFS), DVDstore (SQL), TPC (transactions)
Auxiliary monitoring tools: top

Level: network level
Workload generator / benchmark: iperf, smartbits appliance, optiview link analyser
Auxiliary monitoring tools: dstat, ethereal/wireshark, ntop

Level: filesystem level
Workload generator / benchmark: dd, xdd, iozone, bonnie/bonnie++

Level: device level
Workload generator / benchmark: dd, xdd, iometer, diskspeed, hdtune, hdtach, zcav
Auxiliary monitoring tools: dstat, iostat, vmstat, Linux's procfs directories, own tools

    Selected benchmarks discussion

In this section we discuss some benchmarks in detail. We selected them based on our experience and interests; therefore, the selection may not be optimal for every situation.

SPC benchmarks

The Storage Performance Council (SPC) tries to standardise storage system evaluation. The organisation defines industry-standard storage workloads. This "forces" vendors to publish standardised performance figures for their storage systems.

SPC-1 and SPC-2 benchmarks evaluate complete storage systems, while SPC-1C and SPC-2C evaluate storage subsystems, e.g. individual disk drives, HBAs, storage software (e.g. LVM, ...).

    The table below summarizes SPC-1 and SPC-2 specifics.

Typical applications - SPC-1: database operations, mail servers, OLTP; SPC-2: large file processing, large database queries, video on demand

Workload - SPC-1: random I/O; SPC-2: sequential I/O (1+ streams)

Workload variations - SPC-1: address request distribution (uniform + sequential), R/W ratio; SPC-2: transfer size, R/W ratio, number of outstanding I/O requests

Reported metrics - SPC-1: I/O rate (IOPS), total storage capacity, price-performance; SPC-2: data rate (MB/s), total storage capacity, price-performance

Good practices and tips for benchmarking

Storage system components benchmarking

If we decide to perform layer-by-layer testing of the storage system, or we want to find the source of a bottleneck observed in the system, we may want to examine a single element of the system and avoid the influence of the other elements.

Network-only benchmarking

An example problem is to examine the network's ability to carry data traffic using a given protocol, without the influence of disk access latency. To do so, we may use some tricks, for instance:

measure the network link features with a dedicated tool that performs RAM-to-RAM transfers, e.g. iperf,
configure a RAM disk and export it, e.g. using NFS (say we want to test NFS transmission efficiency); a short sketch of this approach is given below.
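
For example, a minimal sketch of the RAM disk approach could look like this (assuming a RHEL-era NFS service and purely illustrative paths, sizes and host names):

mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk                 # create a RAM-backed filesystem
echo "/mnt/ramdisk *(rw,sync,no_root_squash)" >> /etc/exports
exportfs -ra                                                 # re-read /etc/exports
service nfs restart                                          # make sure the NFS server picks up the export
# on the client:
mount -t nfs server:/mnt/ramdisk /mnt/nfs_ram

Reading and writing files under /mnt/nfs_ram then exercises the network and NFS stack without touching any disks on the server.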

    Network benchmarking tips

As mentioned before, the client network can be a source of significant performance bottlenecks. However, the degree to which the network parameters impact the performance observed by the application depends on what the application is sensitive to. If the application is bandwidth-intensive, we should examine mainly the bandwidth of the network, while for an I/O-intensive application the network delay should be carefully tested.

The parameters of the network link we should evaluate also depend on the kind of protocol used for data transmission. For instance, in the case of NFS, both the bandwidth and the delay of the network link matter, as NFS typically performs synchronous data transmission. Another example is the GridFTP protocol, which can exploit multiple parallel transmission streams. Therefore, the network benchmark should be able to examine the network bandwidth using multiple parallel transmission streams (for instance, we can use the iperf benchmark with multiple TCP/IP streams).

Benchmarking the storage system components

This section of the HOWTO contains more detailed information about benchmarking selected storage system components. Each of the subsections contains some background information as well as practical information on testing the given storage system element.

Clients network

This part of the HOWTO describes how to perform client network testing. Some background information is provided along with practical information related to TCP network testing using the iperf tool. Iperf is an open-source benchmark working under both Linux and Windows operating systems. In this HOWTO we focus on testing the network under Linux, i.e. between two machines running Red Hat Enterprise Linux.

Background information

As mentioned in the introduction, the efficiency of data access performed over the client network may depend on network topology, technology, transmission protocol and network delay, which in turn can result from both the physical distance between the communicating parties and the features of the communication equipment that processes the network traffic.

As the most popular client network in storage systems is IP connectivity implemented over Gigabit Ethernet, we focus on this technology in this part of the HOWTO. We will show how to examine the data transmission delays present in the network and the bandwidth available on the link. As some data transmission protocols are able to exploit multiple parallel data streams (TCP connections), we also show how to examine the influence of the number of TCP streams used in parallel on the actual bandwidth observed by the application.

    Test preparation

Because iperf is a client-server application, you have to install iperf on both machines involved in the tests. Make sure that you use iperf 2.0.2 with pthreads or newer, due to some multi-threading issues with older versions. You can check the version of the installed tool with the following command:

[user@hostname ~]$ iperf -v
iperf version 2.0.2 (03 May 2005) pthreads

Because in many cases low network performance is caused by high CPU load, you should measure CPU usage at both link ends during every test round. In this HOWTO we use the open-source vmstat tool, which you probably already have installed on your machines.

Link properties

Before we start the tests, we should take a look at our network link setup. First, we should check whether we can use an MTU larger than the standard Ethernet MTU; we should try to use MTU 9000. Using jumbo frames is recommended especially in reliable and fast networks, as bigger frames boost network performance due to a better header-to-payload ratio. But we should remember that it is possible to use MTU 9000 only if all network hardware between the tested hosts (routers, switches, NICs etc.) supports jumbo frames.

In order to enable MTU 9000 on the machine's network interfaces you may use the ifconfig command.

    [root@hostname ~]$ ifconfig eth1 mtu 9000

Alternatively, you can put this setting into the interface configuration scripts, e.g. /etc/sysconfig/network-scripts/ifcfg-eth1 (on RHEL, CentOS, Fedora etc.).
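
For example (a hedged fragment; MTU is the parameter name used by the RHEL-family network scripts):

# /etc/sysconfig/network-scripts/ifcfg-eth1 (fragment)
DEVICE=eth1
MTU=9000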

If jumbo frames are working properly, you should be able to ping one host from another using a large packet size:

    [root@hostname ~]$ ping 10.0.0.1 -s 8960

In the example above we use 8960 instead of 9000 because the ping option -s specifies the ICMP payload size, so it has to be smaller than the MTU by at least the length of the IP and ICMP headers (28 bytes); 8960 leaves a comfortable margin. If you cannot use jumbo frames, set the MTU to the default value of 1500.
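
To make sure that the large packets are really sent unfragmented, you can additionally forbid fragmentation (a hedged example; -M do is the path-MTU discovery option of the Linux iputils ping):

[root@hostname ~]# ping -M do -s 8972 10.0.0.1

Here 8972 = 9000 - 20 (IP header) - 8 (ICMP header); if the command reports that the message is too long, some device on the path does not accept jumbo frames.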

To tune your link you should measure the average Round Trip Time (RTT) between the machines; the time reported by the ping command is already the round-trip time. When you have the RTT measured, you can set the TCP read and write buffer sizes.

There are three values you can set: minimum, initial and maximum buffer size. The theoretical value (in bytes) for the initial buffer size is BPS / 8 * RTT, where BPS is the link bandwidth in bits/second and RTT is expressed in seconds.
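
As a worked example (with illustrative numbers): for a 1 Gbit/s link with an RTT of 4 ms, the initial buffer size would be 1,000,000,000 / 8 * 0.004 = 500,000 bytes, which is the initial value used in the sysctl commands below.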

Example commands that set these values for the whole operating system are:

[root@hostname ~]# sysctl -w net.ipv4.tcp_rmem="4096 500000 1000000"
[root@hostname ~]# sysctl -w net.ipv4.tcp_wmem="4096 500000 1000000"

It is probably best if you start with values computed using the formula mentioned above and then tune them according to the test results.

    You can also experiment with maximum socket buffer sizes:

[root@hostname ~]# sysctl -w net.core.rmem_max=1000000
[root@hostname ~]# sysctl -w net.core.wmem_max=1000000

Other options that may boost performance are:

[root@hostname ~]# sysctl -w net.ipv4.tcp_no_metrics_save=1
[root@hostname ~]# sysctl -w net.ipv4.tcp_moderate_rcvbuf=1
[root@hostname ~]# sysctl -w net.ipv4.tcp_window_scaling=1
[root@hostname ~]# sysctl -w net.ipv4.tcp_sack=1
[root@hostname ~]# sysctl -w net.ipv4.tcp_fack=1
[root@hostname ~]# sysctl -w net.ipv4.tcp_dsack=1

COMMENT: the meaning of these parameters is explained in the Linux documentation (sysctl command).
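
To keep such settings across reboots, they can also be placed in /etc/sysctl.conf and loaded with sysctl -p (a hedged note; the exact file may differ between distributions), for example:

net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1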

Iperf tool description

After setting up the network link parameters, we are ready to run the test. Obviously, configuring the network settings can be an iterative process, where we check different settings by running the tests and evaluating their results.

To perform the test, we should run iperf in server mode on one host:

    [root@hostname ~]# iperf -s -M $mss

On the other host we should run a command like this:

[root@hostname ~]# iperf -c $serwer -M $mss -P $threads -w ${window} -i $interval -t $test_time

Here is a description of the names and symbols used in the command line:

-s - run as server,
-c - run as client,
$serwer - address of the machine on which the iperf server is running,
-M - MSS (MSS = MTU - 40),
-P - number of threads sending data through the tested link simultaneously,
-w - TCP initial buffer (window) size,
-i - interval between two reports in seconds,
-t - test time in seconds.
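
For instance, with purely illustrative values (MTU 9000, four parallel streams, a 1 MB window and a 60-second test), the client command could be:

[root@hostname ~]# iperf -c 10.0.0.1 -M 8960 -P 4 -w 1000000 -i 1 -t 60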

    Testing methodology

We want to perform several actions before, during and after a single test. These actions include:

starting vmstat on both machines - in order to collect system statistics,
running the iperf server and the iperf client - to perform the actual test,
stopping vmstat and the iperf server - after the test round.

We may want to automate the process. To achieve this, we can write some bash scripts to run the tests.

To simplify remote machine access (i.e. to avoid providing the password each time we want to execute a command on a remote host), we may set up SSH keys properly.

To generate a key pair we use the following command:

    [root@hostname ~]# ssh-keygen -t dsa

Then we copy the public key to the remote server and add it to the authorized keys file:

[root@hostname ~]# cat ~/.ssh/id_dsa.pub >> /home/sarevok/.ssh/authorized_keys

Now we can log in to the remote server without a password, and we can also run programs there remotely, e.g. from a bash script.

Below, we provide some simple scripts to run iperf and vmstat on both machines. iperf and vmstat run in rounds for various numbers of threads. Note that after each round both vmstat and iperf are killed and restarted to obtain an identical testing environment.

The test is performed for different numbers of threads (up to 128), because the worse the link RTT, the greater the performance boost that can be achieved by using multiple transmission streams. We may expect that, if the RTT of the link is poor, we will achieve a performance boost when many threads are used.

Here is a simple shell script to run the iperf test:

#!/bin/sh
file_size=41
dst_path=/home/stas/iperf_results
script_path=/root
curr_date=`date +%m-%d-%y-%H-%M-%S`
serwer="10.0.1.1"
user="root"
test_time=60
interval=1
mss=1460
window=1000000
min_threads=1
max_threads=128

for threads in 1 2 4 8 16 32 64 80 96 112 128 ; do
    # start the iperf server and a vmstat collector on the remote machine
    ssh $user@$serwer $script_path/run_iperf.sh -s -w ${window} -M $mss &
    ssh $user@$serwer $script_path/run_vmstat.sh 1 vmstat-$window-$threads-$mss-$curr_date &
    # collect local system statistics
    vmstat 1 > $dst_path/vmstat-$window-$threads-$mss-$curr_date &

    # run the actual test round
    iperf -c $serwer -M $mss -P $threads -w ${window} -i $interval -t $test_time >> $dst_path/iperf-$window-$threads-$mss-$curr_date

    # clean up local vmstat and the remote iperf/vmstat processes
    ps ax | grep vmstat | grep -v grep | awk '{print $1}' | xargs -i kill {} 2>/dev/null
    ssh $user@$serwer $script_path/kill_iperf_vmstat.sh

    done

    Script run_iperf.sh can look like this:

#!/bin/sh
# pass all the arguments through to iperf and run it in the background
iperf "$@" &

    run_vmstat.sh script can contain:

#!/bin/sh
vmstat $1 > $2 &

    kill_iperf_vmstat.sh may look like this:

#!/bin/sh
ps -elf | egrep "iperf" | egrep -v "egrep" | awk '{print $4}' | xargs -i kill -9 {}
ps -elf | egrep "vmstat" | egrep -v "egrep" | awk '{print $4}' | xargs -i kill -9 {}

To start the test script so that it ignores hangup signals, you can use the nohup command:

[stas@worm ~]$ nohup ./script.sh &

This command keeps the test running when you close the session with the server.

To obtain reliable and repeatable test results, make sure that both machines are not performing any other tasks when the test is running. Run a number of tests and average the values acquired from all of them.

To present the obtained results you may use the open-source gnuplot program, as graphical presentation of the results may help you to interpret them. Generating the plots can also be automated.

RAID structures

...

RAID background information

One of the best known and most important storage virtualisation techniques is RAID (Redundant Array of Independent Disks) technology. It allows independent disk resources to be combined into structures that can provide advanced reliability and performance features, which are not possible to deliver using individual disk resources (e.g. individual drives). In this section we describe standard RAID levels including RAID 0, 1, 5 and 6 as well as nested RAID structures such as 10, 50 and 60. At the end of the section we summarise the fault tolerance, performance and storage efficiency characteristics of particular RAID structures.

Standard RAID structures

RAID0 (striping) does not provide any data redundancy. The main purpose of using RAID0 structures is to distribute the data traffic load over the RAID components. Each file is split into blocks of a certain size and those are distributed over the various drives; such a block is called a chunk. The set of chunks for which one parity chunk is calculated is called a stripe. The chunk size is a user-defined parameter characteristic of the array. The chunks are sent to all disks in the array in such a way that one chunk is written to only one disk. Distributing the I/O operations among multiple drives allows the performance of the particular RAID components to be accumulated.

RAID1 is implemented as mirroring. The data written to the two RAID components (disks) are exact copies of each other. This means that every write operation performed on the array must be done on both RAID components (drives). A variant of this technology is duplexing, where two independent RAID controllers are used to perform parallel writes to both mirrors.

The parity mechanism provides a way to write redundant data in a more sophisticated way than in the case of mirroring. This mechanism is used to implement RAID5 and RAID6 structures. If there are N drives in the array, the controller splits the data into N-1 chunk-sized pieces which are written simultaneously to N-1 disks. Additionally, the controller computes an extra chunk-sized block called parity and writes it to the N-th disk. The parity is calculated using the XOR operation. While recovering broken data in the array (e.g. when one chunk is damaged), the XOR operation is performed on the parity chunk and the N-2 valid chunks; the result equals the broken chunk.
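
As a worked example (with illustrative values): for three data chunks D1 = 1010, D2 = 0110 and D3 = 1100 (binary), the parity is P = D1 XOR D2 XOR D3 = 0000. If D2 is lost, it can be recovered as D2 = D1 XOR D3 XOR P = 1010 XOR 1100 XOR 0000 = 0110.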

Parity information can be written to one dedicated disk (RAID3 and RAID7 structures) or can be spread across all the drives in the array (RAID5 and RAID6). The latter approach has some performance advantages, as the parity-writing load is distributed over all RAID components, as opposed to the former case in which writing the parity can become a bottleneck. A known limitation of the parity mechanism is that calculating the parity data can affect write performance. RAID5 uses a single parity chunk per stripe, while RAID6 uses double parity, which provides extra fault tolerance - the array can deal with two broken drives. However, the double-parity calculation can further affect write performance.

Nested RAID structures

Standard RAID structures have contradictory performance and redundancy features, e.g. mirroring provides high data redundancy while limiting write performance. In order to provide both redundancy and performance, nested RAID structures are used. RAID10 combines a number of RAID1 structures by striping the data over them (RAID0). In that way, superior fault tolerance can be achieved: the array can deal with 50% of the drives broken if, for every broken drive, its mirror drive is still working. RAID10 can achieve performance similar to or even better than (in the random read case) RAID0. Another commonly used nested RAID structure is RAID50. It is a RAID0 made of a number of RAID5 structures. RAID50 improves the performance of RAID5 thanks to the fact that the I/O traffic is distributed over the particular RAID5 structures. This approach is effective especially for write operations. It also provides better fault tolerance than the single RAID level does. The drawback of nested RAID structures is that they require a relatively high number of drives to implement a given storage space.

Taking into account all the theoretical information presented above, we summarise the storage efficiency information in the table below.

The table below contains a RAID level comparison. The scores range from 0 (the worst) to 5 (the best). The scores are based on http://www.pcguide.com/ref/hdd/perf/raid/levels/comp.htm and modified according to our gathered experience and the fact that we are considering the same number of drives in every RAID structure. To compare performance we assume that we build each RAID structure using the same number of drives and we use one thread to read or write data from or to the RAID structure. In the table, S denotes the capacity of a single drive and N the number of drives; for RAID50, N0 is the number of RAID5 groups and N5 the number of drives per group.

RAID0  - capacity S*N,          storage efficiency 100%,       fault tolerance 0,   sequential read perf 5,   sequential write perf 5
RAID1  - capacity S*N/2,        storage efficiency 50%,        fault tolerance 4,   sequential read perf 2,   sequential write perf 2
RAID5  - capacity S*(N-1),      storage efficiency (N-1)/N,    fault tolerance 3,   sequential read perf 4,   sequential write perf 3
RAID6  - capacity S*(N-2),      storage efficiency (N-2)/N,    fault tolerance 4.5, sequential read perf 4,   sequential write perf 2.5
RAID10 - capacity S*N/2,        storage efficiency 50%,        fault tolerance 4,   sequential read perf 3,   sequential write perf 4
RAID50 - capacity S*N0*(N5-1),  storage efficiency (N5-1)/N5,  fault tolerance 3.5, sequential read perf 3,   sequential write perf 3.5

RAID benchmarking assumptions

This part of the HOWTO explains the methodology and gives tips for measuring the performance of a client's block-level storage configured on certain types of RAID storage.

Note that the RAID storage used in this section is software RAID implemented using the Linux MD mechanism, over disk drives directly attached to the server (DAS storage).

    The purpose of this part of the HOWTO is to:

show how to examine block-level storage features using both block-level and filesystem-level tests (dd, iozone),

provide examples of configuring RAID using the Linux MD mechanism, including simple and nested RAID structures.

For the purpose of this HOWTO we also assume that:

we have the same number of drives to build each RAID structure we examine,
we create all tested RAID structures using the same pool of disks,
we use the same type and size of filesystem for filesystem-level testing.

In that way we ensure that we measure and compare only the performance differences between the RAID structures.

    RAID structures preparation

Here we present how to make a software RAID structure using the Linux md tools. To create a simple RAID level from devices sda1, sda2, sda3 and sda4 you should use the following command:

mdadm --create --verbose /dev/md1 --spare-devices=0 --level=0 --raid-devices=4 /dev/sda1 /dev/sda2 /dev/sda3 /dev/sda4

    Where:

/dev/md1 - the name of the created RAID device,
--spare-devices - the number of drives to be used as spares,
--level - the RAID level you want to create (currently Linux supports LINEAR (disk concatenation) md devices, RAID0 (striping), RAID1 (mirroring), RAID4, RAID5, RAID6 and RAID10),
--raid-devices - the number of devices you want to use to make the RAID structure.

When you have created a RAID structure, you should be able to see RAID details similar to the information shown below:


[root@sarevok bin]# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Mon Apr 6 17:41:43 2009
     Raid Level : raid0
     Array Size : 6837337472 (6520.59 GiB 7001.43 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Mon Apr 6 17:41:43 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

 Rebuild Status : 10% complete

           UUID : 19450624:f6490625:aa77982e:0d41d013
         Events : 0.1

    Number   Major   Minor   RaidDevice   State
       0       65      16        0        active sync   /dev/sda1
       1       65      32        1        active sync   /dev/sda2
       2       65      48        2        active sync   /dev/sda3
       3       65      64        3        active sync   /dev/sda4

When performance is considered, the chunk size of the md device may be an important parameter to tune. There is the -c (--chunk) option of the mdadm command, which can be used to specify the chunk size in kilobytes. The default is 64 kB; however, it should be set according to factors such as:

the physical segment size of the disk drives,
the planned block size of the filesystem.

Important! When you want to run any benchmark on a new RAID structure, you should wait until the (re)building of the RAID structure is finished. If you run a test while the RAID is being (re)built, you will get poor performance results, because system resources and disks are involved in (re)building the RAID structure.
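
The rebuild progress can be checked before starting the tests, for example (a minimal sketch; the device name matches the example above):

cat /proc/mdstat                             # shows a progress indicator while an md device is resyncing
mdadm --detail /dev/md1 | grep -i rebuild    # prints the 'Rebuild Status' line, if any
watch -n 10 cat /proc/mdstat                 # re-check every 10 seconds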

Because the iozone benchmark is used in the next part of the HOWTO, we should first make a file system on the newly created md device.

The file system choice is not the topic of this part of the HOWTO; however, we assume that the ext2 or ext3 file system is a good choice for our test. When using ext2 or ext3 we may be interested in tuning some basic file system parameters, such as the journaling type. We can do this using the tune2fs Linux command.

To create the file system on the md device and then mount it on some directory we use commands like these:

mkfs.ext3 /dev/md1
mount /dev/md1 /mnt/md1

Again, there are some filesystem parameters that are interesting from the performance point of view. One of them is the block size. It should be set taking into account the application features and the underlying storage components. One rule of thumb is to match the block size to the RAID stripe layout. You can set the block size using mkfs's -b parameter. It is also possible to influence the filesystem behaviour by using mount command options.
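
As an illustration (a hedged sketch; the -E stride/stripe-width extended options are available in reasonably recent e2fsprogs, and the numbers assume a 4 kB filesystem block, the 64 kB md chunk used above and 4 data-bearing disks):

mkfs.ext3 -b 4096 -E stride=16,stripe-width=64 /dev/md1
# stride       = chunk size / block size       = 64 kB / 4 kB = 16
# stripe-width = stride * number of data disks = 16 * 4       = 64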

Methods and tools for RAID levels benchmarking

To examine the performance of the md device we normally use the iozone tool. However, for a quick test (for example, to get fast results) we may use the dd tool.

dd tool

The idea of dd is to copy a file from the 'if' location to the 'of' location. Using this tool to measure disk devices requires a small trick. To measure write speed you read data from /dev/zero and write it to a file on the tested device. To measure read performance you read the data from a file on the tested device and write it to /dev/zero. In that way we avoid measuring more than one storage system at a time. To measure the time of reading or writing the file we use the time tool. The example commands to write and read 32 GB of data are:

    for writing performance:

[root@sarevok ~]# time dd if=/dev/zero of=/mnt/md1/test_file.txt bs=1024M count=32

and for reading performance:

[root@sarevok ~]# time dd if=/mnt/md1/test_file.txt of=/dev/zero bs=1024M count=32

    where:

if - input file/device path,
of - output file/device path,
bs - size of a chunk of data to copy,
count - how many times a chunk defined by bs is copied.
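
Note that, without extra precautions, the write test above largely measures the page cache. As a hedged variant (conv=fdatasync and iflag=direct are standard GNU dd flags; the sizes are illustrative), the cache influence can be reduced like this:

[root@sarevok ~]# time dd if=/dev/zero of=/mnt/md1/test_file.txt bs=1M count=32768 conv=fdatasync
[root@sarevok ~]# time dd if=/mnt/md1/test_file.txt of=/dev/null bs=1M iflag=direct

The first command includes the final flush to disk in the measured time; the second bypasses the page cache on reads.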

    iozone tool

For more precise tests we use the iozone tool. Iozone allows tests to be run in many modes, including:

read, re-read,
random read,
write, rewrite,
random write.

In order to evaluate storage system features you should use the mode that best fits the application data access pattern. However, for all tests you have to run the write/rewrite benchmark first (in order to generate the files for reading).

Another interesting parameter is the number of POSIX threads used for simultaneous read or write operations. Thanks to that feature we can see how the performance changes when using more than one benchmark thread.

We can also use the block-size parameter to set the read/write chunk size (compare with the md and file system chunk sizes).

Other important options allow us to decide whether the file close and flush operations are included in the measured time. Including the flush time is especially significant, because this is the time when the cache buffers are written to disk.

To perform one round of the test we can use a command like:

iozone -T -t $threads -r ${blocksize}k -s ${file_size}G -i 0 -i 1 -i 2 -c -e

    where:

-T - use POSIX pthreads for throughput tests,
-t - how many threads to use for the test,
-r - chunk size used in the test,
-s - test file size. Important! This is the file size PER THREAD, because each thread writes to or reads from its own file,
-i - test modes; we choose 0 - write/rewrite, 1 - read/re-read and 2 - random write/read,
-c - include close() in the timing calculations,
-e - include flush (fsync, fflush) in the timing calculations.
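
For instance, with illustrative values (4 threads, 64 kB records and 8 GB per thread), one round could look like this:

iozone -T -t 4 -r 64k -s 8G -i 0 -i 1 -i 2 -c -e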

It is also vital to realise that we are measuring the performance of software RAID. This means that all tasks of a RAID controller (for example, parity calculation and checking) are performed by the machine's CPU. So it is necessary to measure the CPU load when you perform an md device test, since the CPU can be the performance bottleneck in such a system. To measure the CPU load we use the vmstat tool.

To automate the testing we can write a simple shell script like this:

#!/bin/sh
dst_path=/home/sarevok/wyniki_test_iozone
curr_date=`date +%m-%d-%y-%H-%M-%S`

file_size=128
min_blocksize=1
max_blocksize=32

min_queuedepth=1
max_queuedepth=16

mkdir $dst_path
cd /mnt/sdaw/

blocksize=$min_blocksize
while [ $blocksize -le $max_blocksize ]; do

    queuedepth=$min_queuedepth
    while [ $queuedepth -le $max_queuedepth ]; do

        # collect system statistics in the background during the test round
        vmstat 1 > $dst_path/vmstat-$blocksize-$queuedepth-$curr_date &

        /root/iozone -T -t $queuedepth -r ${blocksize}k -s ${file_size}G -i 0 -i 1 -c -e > $dst_path/iozone-$blocksize-$queuedepth-$curr_date

        # stop vmstat after the round
        ps ax | grep vmstat | grep -v grep | awk '{print $1}' | xargs -i kill {} 2>/dev/null

        queuedepth=`expr $queuedepth \* 2`
        file_size=`expr $file_size \/ 2`
    done

    blocksize=`expr $blocksize \* 2`
done

To start the test script so that it ignores hangup signals, you can use the nohup command:

[root@sarevok ~]# nohup ./script.sh &

This command keeps the test running when we close the session with the server. To present the obtained results we use the open-source gnuplot program.

Remarks

When you perform any disk device or file system benchmark you should bear in mind that there are many levels of cache in the system, e.g. the file system cache, operating system cache, disk drive cache etc. The simplest way to avoid the cache influence is to use an amount of data large enough to fill all cache-level buffers. To do this we use an amount of data that is at least equal to double the machine's RAM size. Such a data size should successfully eliminate the caching influence on the measured md device performance.
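
As a worked example (with illustrative numbers): on a machine with 16 GB of RAM and 4 iozone threads, each thread's file (the -s parameter, which is given per thread) should be at least 2 * 16 GB / 4 = 8 GB.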

After every round of tests, when we want to change some RAID (md) or file system parameters, it is recommended to make a fresh file system on the md device, in order to avoid the influence of the filesystem state on the test results.

Benchmarking tools discussion

This section of the HOWTO discusses the details of particular benchmarking tools and provides practical information about their usage, automation, interpretation of the results and so on.

TO BE EXPANDED.

Links:

Practical information:

File system benchmarking examples:

Filesystem contest
Benchmarking Filesystems

Independent storage benchmarking organisations:

Storage Performance Council (SPC)

Storage Performance Council
SPC-1 Benchmark Results
SPC-2 Benchmark Results

    Enterprise Strategy Group (ESG)


    Enterprise Strategy Group

HOWTO authors:

The text re-uses the material presented by Stijn Eeckhaut in Espoo, Finland, during the 1st TF-Storage Meeting, 8 April 2008.

It also includes material prepared by Maciej Brzezniak from Poznan Supercomputing and Networking Centre, Poland, and Stanislaw Jankowski, a student at Poznan University of Technology.

If you want to contact the HOWTO authors directly, send an email to:

maciekb -=at=- man.poznan.pl
staszek -=at=- man.poznan.pl

    The latest released version of the Storage Driver Test Kit is always available at

    http://developer.novell.com/devres/storage/SAS.RLS

    in the "Test Kit version" link.

    What kind of configuration should I use for testing my product?

The configurations supported by the Storage Driver Test Kit are displayed graphically in a PDF file at

    http://developer.novell.com/devres/storage/stortest.pdf

    in the chapter titled "Setting Up the Test Configuration."

If the exact configuration I want to certify does not appear on one of your diagrams, how can I get my particular setup certified?

Our certification tests are meant to verify the compatibility of the item under test with NetWare in a fashion that best or most easily manifests problems, not necessarily the way you sell it or present it to your customers. In order to obtain certification of your particular product, you need to conform your configuration to our diagrams as closely as you can. As we are aware that there will always be exceptions, if you feel your configuration should qualify for one, please contact us at [email protected] to address your particular situation.

When I power down (or up) the drive under test during the Drive Fault Test, why does a) another of my drives deactivate, or b) my system reboot or hang, or c) my SYS: volume dismount due to drive failure?


Often these symptoms are the result of a ground loop caused by the test drive sharing the same power supply with another drive or with the system. The inductive "kick" that results from suddenly applying or terminating power to a device will cause a power spike to propagate throughout the power system. For safe drive fault testing, we recommend very strongly that you supply power to the test drive with a power supply independent of the system's power supply. When using an isolated power supply, remember to provide the minimum load to the supply sufficient to maintain the specified voltage.