robertsmith operating system storage analysis v1-1

Posted on 06-Jul-2018

  • 8/17/2019 RobertSmith Operating System Storage Analysis v1-1

    Operating System Storage Performance Analysis

    Author: Robert M. Smith, Microsoft Corporation

    © 2012 Storage Networking Industry Association. All Rights Reserved.

    SNIA Legal Notice

    The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.

    Member companies and individual members may use this material in presentations and literature under the following conditions:

    Any slide or slides used must be reproduced in their entirety without modification

    The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.

    This presentation is a project of the SNIA Education Committee.

    Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.

    The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

    Abstract

    OS Storage Performance Analysis

    Analyzing and dealing with storage performance at the OS level can be challenging in many respects. This tutorial covers aspects of performance with respect to storage. This tutorial will also cover tools that can be used to assist in the analysis of operating system performance.

    This presentation will include the following:

    Factors affecting storage performance

    Examples of tools to monitor storage performance

    Recommendations to improve storage performance

    SAN I/O Path, 1000 ft. View

    [Figure: SAN I/O path, 1000 ft. view]

    OS I/O: Closer View

    [Figure: OS I/O stack. User mode: Application. Kernel mode: File System, Volume / Partition, Device Class, Command Port. Below: Storage]

    Disk Drive Factors

    Rotational Drives

    “Capacity Optimized” drives

    Size (TB): 0.5, 1, 2, 3

    *IOPS: >= 120 (worst case, random “full-stroke” workloads)

    SAS or SATA: regardless of size, same performance, same IOPS

    ~8.5 ms latency (½ platter seek); worst case 16 to 19 ms (on average across manufacturers)

    “Performance Optimized” drives

    Size (GB): 72, 144, 450, 600, 900

    *IOPS: 200-400 (worst case)

    SAS, FC (some SATA)

    2-4 ms latency (on average across manufacturers)
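    You can cross-check the latency and IOPS figures above with a rough rule. A drive that services one request at a time completes about 1000 / (average latency in ms) requests per second. The sketch below makes three assumptions: a queue depth of 1, no cache hits, and a quoted latency that already covers seek, rotation, and transfer. `estimate_iops` is a hypothetical helper, not a vendor formula.

```python
def estimate_iops(avg_latency_ms: float) -> float:
    """Approximate IOPS for a single drive with one outstanding request.

    Assumes queue depth 1, no cache hits, and that avg_latency_ms is the
    full per-request service time (seek + rotational delay + transfer).
    """
    if avg_latency_ms <= 0:
        raise ValueError("avg_latency_ms must be positive")
    return 1000.0 / avg_latency_ms


if __name__ == "__main__":
    # Latencies quoted on the slide above.
    print(f"capacity-optimized, 8.5 ms: {estimate_iops(8.5):.0f} IOPS")
    for ms in (2, 4):
        print(f"performance-optimized, {ms} ms: {estimate_iops(ms):.0f} IOPS")
```

    The 8.5 ms case lands near the ~120 IOPS quoted for capacity-optimized drives. Command queuing and caching can let real drives exceed this single-request estimate.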

    Storage Hardware Factors

    Controller cache configurations

    How much cache?

    What is the read/write ratio of the cache?

    How effective is the cache?

    Enterprise storage usually has onboard performance-measuring capability

    What happens when a threshold is reached (i.e., a flush)?

    Idle flushing: does not interrupt; I/O continues

    Low and high watermark flushing: triggers flushing, minor performance impact

    Forced flushing: to free cache pages, all I/O is temporarily halted
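    The three flushing behaviors can be modeled as a small decision function. This minimal sketch assumes a single dirty-page fraction and two hypothetical watermarks (50% and 80%). Real controllers use their own, often vendor-specific, thresholds and destage policies. Watermark flushing starts at the high mark and continues until the low mark is reached, which avoids rapid on/off toggling.

```python
from enum import Enum

# Hypothetical thresholds, expressed as the fraction of cache pages that are dirty.
LOW_WATERMARK = 0.50
HIGH_WATERMARK = 0.80


class FlushMode(Enum):
    NONE = "no flush"
    IDLE = "idle flush: I/O continues"
    WATERMARK = "watermark flush: minor performance impact"
    FORCED = "forced flush: I/O temporarily halted"


def choose_flush_mode(dirty_fraction: float, host_idle: bool, flushing: bool) -> FlushMode:
    """Pick a flush behavior.

    flushing: True if a watermark flush is already in progress, so it keeps
    running until the dirty fraction drops to the low watermark (hysteresis).
    """
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must be between 0 and 1")
    if dirty_fraction >= 1.0:
        # No clean pages left to accept new writes.
        return FlushMode.FORCED
    if dirty_fraction >= HIGH_WATERMARK or (flushing and dirty_fraction > LOW_WATERMARK):
        return FlushMode.WATERMARK
    if host_idle and dirty_fraction > 0.0:
        # Opportunistic destage while the host is quiet.
        return FlushMode.IDLE
    return FlushMode.NONE
```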

    Storage Hardware Factors (2)

    Is cache “mirroring” involved? If so, is there a performance impact?

    Are there other workloads on the storage device?

    What hardware is between initiator and target?

    If SAN, how many and what types of switches?

    Virtualization appliances

    Some take the “LUNs” presented and virtualize those

    Some have onboard storage

    Storage Hardware Virtualization

    Virtual disks (AKA LUNs): composed of a group of physical disks, or a “chunk” of such a group, and presented by a storage device

    Possibly indicated by:

    Non-standard size

    Device interrogations returning the storage vendor rather than the drive vendor

    Virtualize to consolidate

    Aggregation of underlying LUNs (virtualization appliance)

    Adds complexity

    Troubleshooting is more difficult (for example, “hot spots” are very tough to find)

    Storage Layout Factors: Disk Configuration

    RAID level

    e.g., 1, 5, 6, 1+0, 0+1, 5+0, 0+5, 6+0, etc.

    Number of physical disk drives backing each LUN

    Levels of virtualization between server(s) and disks?

    Any storage pool sharing involved? Dedicated disks or shared storage pools?

    What is the backup schedule for ALL connected hosts?

    LUN snapshots, database table scans, etc.
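    RAID level matters largely because of the write penalty. Each small random host write turns into several back-end I/Os: commonly quoted as 2 for RAID 1 and 1+0, 4 for RAID 5, and 6 for RAID 6. The minimal sketch below assumes uncached small random I/O and identical drives. `host_iops` is a hypothetical helper, and nested levels other than 1+0 are left out.

```python
# Commonly quoted back-end I/Os per host write for small random writes,
# ignoring controller write-back cache and full-stripe optimizations.
WRITE_PENALTY = {"0": 1, "1": 2, "1+0": 2, "5": 4, "6": 6}


def host_iops(raid_level: str, disks: int, iops_per_disk: float, write_fraction: float) -> float:
    """Estimate the host-visible IOPS a RAID group can sustain."""
    if raid_level not in WRITE_PENALTY:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw_backend_iops = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid_level]
    # Each read costs one back-end I/O; each write costs `penalty` back-end I/Os.
    return raw_backend_iops / ((1.0 - write_fraction) + penalty * write_fraction)


if __name__ == "__main__":
    # Hypothetical group: 8 drives at 180 IOPS each, 30% writes.
    for level in ("1+0", "5", "6"):
        print(level, round(host_iops(level, 8, 180, 0.3)))
```

    For the same eight drives and write mix, the estimate falls from about 1100 host IOPS (RAID 1+0) to under 600 (RAID 6).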

    Storage Layout Factors: Design Decisions

    What decisions affected design?

    Cost

    Consolidation

    Migration

    Risk

    RAID types versus performance

    Power and cooling

    Expansion

    Manageability

    Storage Layout Factors (2)

    What happens to a storage group if a disk drive fails? What is the performance impact?

    How long does a rebuild take?

    Data could be vulnerable during the rebuild

    Is anyone notified of a failure?
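    Rebuild time can be estimated as drive capacity divided by the rebuild rate actually achieved. That rate drops when the rebuild competes with host I/O. This is a back-of-the-envelope sketch, and the 100 MB/s maximum rate and 50% host share in the example are hypothetical. Real arrays may rebuild only allocated data or throttle differently.

```python
def rebuild_hours(capacity_tb: float, max_rebuild_mb_s: float, host_share: float = 0.0) -> float:
    """Estimate hours to rebuild one failed drive.

    capacity_tb uses decimal units (1 TB = 1,000,000 MB), as drive vendors do.
    host_share is the fraction of the rebuild bandwidth consumed by host I/O.
    """
    if max_rebuild_mb_s <= 0:
        raise ValueError("max_rebuild_mb_s must be positive")
    if not 0.0 <= host_share < 1.0:
        raise ValueError("host_share must be in [0, 1)")
    effective_mb_s = max_rebuild_mb_s * (1.0 - host_share)
    return capacity_tb * 1_000_000 / effective_mb_s / 3600


if __name__ == "__main__":
    # A 3 TB drive: idle array vs. an array where host I/O takes half the bandwidth.
    print(f"{rebuild_hours(3, 100):.1f} h idle, {rebuild_hours(3, 100, 0.5):.1f} h under load")
```

    The group runs degraded for that entire window. For single-parity RAID, a second drive failure during the rebuild can lose data.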

    Storage Controller Factors

    Mass-storage controllers: range from on-board to add-in

    Some have battery backup ability in either case

    Basic controllers report limited diagnostic information

    Advanced controllers have diagnostics available through vendor-supplied tools

    Capable of sending events to the operating system through extended logging

    Enterprise storage may have multiple controllers with shared cache

    Fibre Channel or SAS HBAs

    Host-Bus Adapter (HBA): 8 Gb and 16 Gb available today

    SCSI command interface to OS

    Often synonymous with Fibre Channel SAN

    Offloads packet assembly and disassembly

    Provides the OS a view into the SAN (though most activity is abstracted by default)

    Vendor-provided diagnostics and performance tools

    No software capture tools

    Multiple HBAs, or multi-port HBAs, enable Multipath I/O (MPIO)

    Most operating systems have native support for MPIO

    Ethernet Adapters

    Ethernet Network Interface Card (NIC)

    10 GbE

    TCP/IP and Chimney offloads

    Hardware parity, CRC, ECC

    Converged Network Adapter (CNA)

    Combines functionality of HBA and NIC

    Fibre Channel over Ethernet (FCoE)

    CPU offloads for FCoE and iSCSI

    Can present NIC, FCoE, or iSCSI function to host

    Teaming software for throughput and availability

    Software analyzers likely unable to capture all traffic

    Latency

    Rotational Disks

    Millisecond latency

    Sequential writing to rotational drives is the most efficient

    Sequential and/or “full-stripe” writes to RAID disks are the most efficient

    Latency occurs as heads have to move position across the rotating platter

    Operating system logical address may be different from physical location on disk device

    SSD

    Microsecond latency

    Small random writes are slowest (flash block)

    Flushing

    Firmware keeps improving performance and availability
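    Whether a write to a parity RAID set is “full-stripe” depends only on its offset and length relative to the stripe size. The stripe size is the stripe unit times the number of data disks. This minimal sketch assumes a 64 KiB stripe unit and 4 data disks (for example, RAID 5 across 5 drives). Any write that is not full-stripe forces the controller to read old data or parity before it can write new parity.

```python
KIB = 1024


def is_full_stripe_write(offset: int, length: int, stripe_unit: int, data_disks: int) -> bool:
    """True if the write covers whole stripes exactly, so parity can be computed
    from the new data alone (no read-modify-write)."""
    stripe_size = stripe_unit * data_disks
    return length > 0 and offset % stripe_size == 0 and length % stripe_size == 0


def stripes_touched(offset: int, length: int, stripe_size: int) -> int:
    """Number of stripes a write overlaps; each one needs its parity updated."""
    if length <= 0:
        return 0
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return last - first + 1


if __name__ == "__main__":
    unit, data_disks = 64 * KIB, 4          # 256 KiB stripe
    print(is_full_stripe_write(0, 256 * KIB, unit, data_disks))        # aligned, full stripe
    print(is_full_stripe_write(4 * KIB, 4 * KIB, unit, data_disks))    # small random write
    print(stripes_touched(128 * KIB, 256 * KIB, unit * data_disks))    # straddles two stripes
```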

    Queuing

    The art of keeping the I/O pipeline populated, but not congested

    Can happen at many levels

    Operating system can build up thousands of I/O requests

    Can build up at switch ports (buffer credits)

    Can build up at back-end storage ports (inbound queue)

    Can build up in storage controllers (HBA, NIC, etc.)

    I/O throttling via queue depth setting

    Individual disk devices

    Native Command Queuing (NCQ) for SATA AHCI
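    Little's Law lets you quantify keeping the pipeline “populated, but not congested”. The average number of outstanding I/Os equals throughput times latency. This minimal sketch assumes steady state, and the 5,000 IOPS target and 2 ms latency are hypothetical. Once the device saturates, a deeper queue raises latency instead of IOPS.

```python
import math


def required_queue_depth(target_iops: float, latency_ms: float) -> int:
    """Outstanding I/Os needed to sustain target_iops at a given per-I/O latency
    (Little's Law: L = lambda * W)."""
    if target_iops <= 0 or latency_ms <= 0:
        raise ValueError("inputs must be positive")
    return math.ceil(target_iops * latency_ms / 1000.0)


def iops_ceiling(queue_depth: int, latency_ms: float) -> float:
    """Best-case IOPS when queue_depth I/Os are always in flight at latency_ms each."""
    if queue_depth <= 0 or latency_ms <= 0:
        raise ValueError("inputs must be positive")
    return queue_depth * 1000.0 / latency_ms


if __name__ == "__main__":
    print(required_queue_depth(5000, 2.0))   # I/Os that must be in flight
    print(iops_ceiling(1, 8.5))              # a single-request stream to a rotational drive
```

    A host queue depth well below the first result cannot keep the disks busy. One far above it mostly makes requests wait longer in some queue along the path.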

    “Short-Stroking” to Reduce Latency

    Forcing the use of a smaller area of a rotating disk to reduce seek distance, and thus latency

    Also a result of “areal density”

    Outer tracks hold more sectors than inner tracks, so more data passes under the head per revolution

    Outer edge of disk may get 150 MB/s while inner tracks get 80 MB/s

    Less latency means more IOPS

    Penalty is under-utilized storage space

    “Advanced Format” (AF) Technology

    AF refers to physical disk sector size and/or block architecture

    Previous limits

    Physical disk sector size: 512 bytes

    Master Boot Record (MBR) structure sizes: approximately 2 TB maximum disk size

    New capabilities:

    Physical sector size: 4096 bytes (4 KB)

    512e: a 4 KB physical sector presented as 512-byte logical sectors

    More space for error correction (ECC)

    More storage space available in same or less physical space

    No corresponding increase in performance capability
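    The ~2 TB MBR ceiling follows from the partition table storing sector addresses and lengths as 32-bit values. That allows at most 2^32 sectors of the logical sector size. 512e drives still present 512-byte logical sectors, so they keep the same limit. The sketch also flags writes that do not cover whole 4 KB physical sectors, which a 512e drive must handle with an internal read-modify-write. Both helper names are hypothetical.

```python
MBR_MAX_SECTORS = 2**32   # MBR partition entries hold 32-bit LBA start and length
TIB = 2**40


def mbr_limit_bytes(logical_sector_size: int) -> int:
    """Largest span addressable through MBR for a given logical sector size."""
    return MBR_MAX_SECTORS * logical_sector_size


def needs_read_modify_write(offset: int, length: int, physical_sector: int = 4096) -> bool:
    """True if a write only partially covers some physical sector (512e case)."""
    return offset % physical_sector != 0 or length % physical_sector != 0


if __name__ == "__main__":
    print(mbr_limit_bytes(512) / TIB)    # 512-byte native and 512e drives
    print(mbr_limit_bytes(4096) / TIB)   # 4 KB logical sectors (4Kn), if the OS supports it
    print(needs_read_modify_write(512, 512))     # one 512-byte logical sector
    print(needs_read_modify_write(8192, 4096))   # whole physical sector
```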

    Partition Alignment

    Previously a problem requiring manual steps to mitigate; current operating systems align by default

    Check the partition starting sector to confirm

    Using a management interface (e.g., WMI)

    Look for a starting offset of 2048 blocks (1 MiB with 512-byte blocks)

    Cannot easily change

    Can automate during OS installation

    Affects legacy and AF drive technology

    512e AF blocks can suffer from misalignment
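    Checking alignment reduces to a modulo test on the partition's starting offset in bytes. On Windows, for example, WMI's Win32_DiskPartition class reports this value as StartingOffset. In this minimal sketch, the 4096-byte default assumes AF physical sectors. You can pass a stripe-unit size instead when aligning to RAID stripes.

```python
def is_aligned(starting_offset_bytes: int, alignment_bytes: int = 4096) -> bool:
    """True if the partition starts on an alignment_bytes boundary."""
    if alignment_bytes <= 0:
        raise ValueError("alignment_bytes must be positive")
    return starting_offset_bytes % alignment_bytes == 0


if __name__ == "__main__":
    modern = 2048 * 512   # 2048-block offset used by current OSes: 1 MiB
    legacy = 63 * 512     # older 63-sector offset: 32,256 bytes
    print(is_aligned(modern), is_aligned(legacy))                          # against 4 KB sectors
    print(is_aligned(modern, 64 * 1024), is_aligned(legacy, 64 * 1024))    # against a 64 KiB stripe unit
```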

    Understanding the Workload

    Request size

    Burstiness

    “Hot” data

    Concurrency

    Inter-arrival time (time from the arrival of one request to the next)

    Locality (matters more on rotational than SSD)

    Few tools can faithfully reproduce a “live” workload
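    You can summarize inter-arrival time and burstiness from a trace's request timestamps. This minimal sketch uses the coefficient of variation of the gaps (standard deviation over mean). It is about 1 for random Poisson arrivals, near 0 for a steady stream, and well above 1 for bursty traffic. The timestamps in the example are made up.

```python
import statistics
from typing import Sequence, Tuple


def inter_arrival_stats(arrival_times: Sequence[float]) -> Tuple[float, float]:
    """Return (mean gap, coefficient of variation of gaps) for arrival times in seconds."""
    if len(arrival_times) < 3:
        raise ValueError("need at least three arrivals")
    ordered = sorted(arrival_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = statistics.fmean(gaps)
    if mean_gap == 0:
        raise ValueError("all requests arrived at the same instant")
    return mean_gap, statistics.stdev(gaps) / mean_gap


if __name__ == "__main__":
    steady = [0.000, 0.010, 0.020, 0.030, 0.040]
    bursty = [0.000, 0.001, 0.002, 1.000, 1.001, 1.002]
    print(inter_arrival_stats(steady))
    print(inter_arrival_stats(bursty))
```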

    Performance Counters

    I/O Transfer Time (Latency)

    Avg. disk sec/read

    Avg. disk sec/write

    Queuing

    Avg. disk read queue length

    Avg. disk write queue length

    Throughput

    Avg. disk bytes/read

    Avg. disk bytes/write

    Network

    Output queue length

    Transfers/sec (IOPS)

    Disk transfers/sec

    Disk reads/sec

    Disk writes/sec

    %Idle Time

    Can be misleading

    Split I/O

    Fragmentation

    Large requests

    OS CPU

    OS Memory
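    Several of these counters combine into the figures most often needed in an investigation: total IOPS, throughput, read/write mix, and average transfer size. In this minimal sketch, the `DiskSample` field names are hypothetical stand-ins for Disk reads/sec, Disk writes/sec, Avg. disk bytes/read, and Avg. disk bytes/write. The sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    reads_per_sec: float
    writes_per_sec: float
    bytes_per_read: float
    bytes_per_write: float


def summarize(s: DiskSample) -> dict:
    """Derive IOPS, MB/s, read fraction, and average transfer size from one sample."""
    iops = s.reads_per_sec + s.writes_per_sec
    bytes_per_sec = s.reads_per_sec * s.bytes_per_read + s.writes_per_sec * s.bytes_per_write
    return {
        "iops": iops,
        "mb_per_sec": bytes_per_sec / 1_000_000,
        "read_fraction": s.reads_per_sec / iops if iops else 0.0,
        "avg_transfer_bytes": bytes_per_sec / iops if iops else 0.0,
    }


if __name__ == "__main__":
    # 300 reads/s of 8 KiB and 100 writes/s of 64 KiB
    print(summarize(DiskSample(300, 100, 8192, 65536)))
```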

    Performance Analysis Tools

    Sampling tools

    Samples may be instantaneous or counters

    Good for long-term analysis

    Real-time tools

    Software tracing

    Kernel

    Drivers

    Hardware tracing

    Nothing abstracted

    Can be difficult to see everything in between initiator and target

    Transport security may be a factor (IPsec, encryption)
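    Sampling tools usually read cumulative counters and report the difference between two samples divided by the interval. In this minimal sketch, the counter names are hypothetical. A counter that went backwards (wrap or reset, for example after a reboot) is skipped rather than reported as a large negative rate.

```python
def rates(prev: dict, curr: dict, interval_s: float) -> dict:
    """Per-second rates from two snapshots of cumulative counters."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    result = {}
    for name, value in curr.items():
        if name not in prev:
            continue                      # new counter: no baseline yet
        delta = value - prev[name]
        if delta < 0:
            continue                      # wrapped or reset; drop this interval
        result[name] = delta / interval_s
    return result


if __name__ == "__main__":
    t0 = {"reads_completed": 120_000, "writes_completed": 45_000}
    t1 = {"reads_completed": 126_000, "writes_completed": 45_600}
    print(rates(t0, t1, 60.0))   # 100 reads/s and 10 writes/s
```

    Longer intervals smooth out bursts. That suits long-term analysis but can hide short spikes that real-time tracing would catch.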

    Vendor Provided Tools (1)

    Vendor-provided tools

    Provide information about devices that may not all be reported up to the OS

    Provide adapter-wide performance statistics

    Allow for adapter tests

    Settings changes for tuning

    Fabric software

    End-to-end visibility

    Sometimes bundled with devices

    Ability to easily view fabric devices, including stats

    Help identify “hot spots”

    May require device clock sync for accuracy

    Vendor Provided Tools (2)

    [Figure: sample from an HBA vendor-provided tool]

    Vendor Provided Tools (3)

    Some common FC error counters

    Link Failure: link down, zoning change (isolation)

    Sync Loss: can be caused by OS reboot

    Signal Loss: can be caused by OS reboot

    Invalid CRC: not normal

    Primitive Sequence
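    One way to act on these counters is to compare successive readings and grade any increase using the guidance above. Sync and signal loss around a host reboot are expected, but any new invalid CRC is not. In this minimal sketch, the counter keys are hypothetical, since names differ between HBA and switch vendors.

```python
# Hypothetical counter keys; map them to whatever your HBA or switch tool reports.
EXPECTED_ON_REBOOT = {"sync_loss", "signal_loss"}
NEVER_NORMAL = {"invalid_crc"}


def classify(prev: dict, curr: dict, host_rebooted: bool) -> list:
    """Return (level, counter, increase) for every counter that went up."""
    findings = []
    for name in sorted(curr):
        increase = curr[name] - prev.get(name, curr[name])   # unseen counter: no increase
        if increase <= 0:
            continue
        if name in NEVER_NORMAL:
            level = "ALERT"
        elif name in EXPECTED_ON_REBOOT and host_rebooted:
            level = "INFO"
        else:
            level = "WARN"   # e.g. link_failure: check for link down or zoning changes
        findings.append((level, name, increase))
    return findings
```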

    Other Error Factors

    iSCSI

    CRC

    Digest

    TCP/IP

    CRC

    Checksum

    Fibre Channel

    Primitive Sequence

    Buffer_0

    ED_TOV

    RA_TOV

    Virtualization Factors: Hosts

    Measure overall workload over time

    Try to provision storage to meet workload

    Stripe-unit size

    Number of disks per storage pool or LUN

    If latency becomes apparent, monitor queue depth

    If queue depth is too low, disks may not be fully utilized

    If queue depth is too high, disks might be queuing, or I/O might be delayed in transit

    Adapter (FC, iSCSI, CNA, etc.)

    Consult with vendor for recommendations

    Queue depth: determine whether a change is needed based on performance; a value that is too high could saturate the link or cause stalling in transit

    Onboard: add disks, add controllers and disks, spread load

    Keep up with host software updates and firmware

    Virtualization: General

    Fixed-size disks for intensive performance needs

    Over-provisioned disks; SSD or hybrid if possible

    Pass-through disks: very little overhead, good performance

    Additions/Integrations

    Emulated SCSI or FC controllers may yield better performance

    Add additional emulated controllers with fewer disks per controller

    Monitor memory within VM

    Low free memory could lead to excessive paging or trimming

    Patch guests as you would physicals:

    Proactively look for and apply performance- and stability-related OS and application updates

    Performance Recommendations

    Update software and drivers running in the storage stack

    Anti-Virus

    Firewall

    Other Security

    File Screening

    HBA, CNA, NIC

    Multipath (MPIO) software

    Teaming software

    Discover all software in the storage stack (trace tools)

    Remove any non-vital software in the storage stack

    Utilize the appropriate tier of storage per workload

    Performance Recommendations (2)

    Tune cache on storage controllers

    Based on observed workload over time

    Based on cache effectiveness counters (cache hits, etc.)

    Look for hot spots

    Can be hard to find

    Visual trace tools may help

    Symptom: optimally configured storage performing poorly for no other apparent reason

    Be proactive with alerting

    SMI-S

    SNMP

    Start with a baseline, periodic snapshot

    Runbooks

    Performance Recommendations (3)

    Optimize fan-in and/or fan-out ratios

    Avoid congestion points

    Monitor fabric for Buffer_0 and other errors (set alerts; automate as much as possible)

    Follow best practices for iSCSI

    VLAN or dedicated hardware

    Limit protocols in use

    Limit or remove sharing

    Optimize hardware per vendor recommendations

    Avoid unplanned changes and track in detail if made

    Snapshot before and after if possible, and keep logs

    Chart all storage-related tasks, look for overlap
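    Fan-out planning comes down to how much host-side bandwidth converges on each storage port. This minimal sketch computes that ratio from port speeds. The 24-host, 4-port, 8 Gb example is hypothetical, and acceptable ratios depend on the workload and vendor guidance.

```python
def oversubscription(host_ports_gbps: list, storage_ports_gbps: list) -> float:
    """Ratio of aggregate host-port bandwidth to aggregate storage-port bandwidth."""
    storage_total = sum(storage_ports_gbps)
    if storage_total <= 0:
        raise ValueError("need at least one storage port with non-zero speed")
    return sum(host_ports_gbps) / storage_total


if __name__ == "__main__":
    ratio = oversubscription([8] * 24, [8] * 4)
    print(f"{ratio:g}:1")
```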

    Performance Recommendations (4)

    Keep historical data about workload

    Take traces periodically (automate if possible)

    Provides for trending and lifecycle planning

    Use monitoring software and keep data for a year or two

    Have data readily available for engineering and vendor staff

    Plan the workload as much as possible

    Keep charts, graphs, spreadsheets, databases

    Exercise new storage layouts before production

    Ask vendors for help if needed with load simulation tools

    Also ask for help if needed with performance tools

    Simulate failure(s) in test environment

    Familiarize yourself with support model

    Can analysis services be made available (with analyzer)?
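    Historical data pays off when each new reading can be compared against it automatically. This minimal sketch assumes latency samples in milliseconds. It applies a hypothetical rule that flags readings more than 1.5 times the historical 95th percentile. Choose thresholds from your own baseline.

```python
import statistics
from typing import Sequence


def baseline_p95(history_ms: Sequence[float]) -> float:
    """95th percentile of historical latency samples."""
    if len(history_ms) < 2:
        raise ValueError("need at least two historical samples")
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    return statistics.quantiles(history_ms, n=100, method="inclusive")[94]


def deviates(current_ms: float, history_ms: Sequence[float], tolerance: float = 1.5) -> bool:
    """True if current latency exceeds the baseline 95th percentile by the tolerance factor."""
    return current_ms > baseline_p95(history_ms) * tolerance


if __name__ == "__main__":
    history = [4.0, 5.0, 5.5, 6.0, 4.5, 5.0, 7.0, 5.2, 4.8, 6.5]
    print(deviates(6.8, history), deviates(15.0, history))
```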

    Q&A / Feedback

    Many thanks to the following individuals for their contributions to this tutorial:

    SNIA Education Committee

    Chris Lionetti

    Flavio Muratore

    Bruce Worthington

    Joseph White, Juniper

    Send any questions or comments on this presentation to SNIA: [email protected]