data protection and recovery in small mid-size


  • 8/3/2019 Data Protection and Recovery in Small Mid-Size

    Data Protection and Recovery in the Small and Mid-sized Business (SMB)

    An Outlook Report from Storage Strategies NOW

    By Deni Connor, Patrick H. Corrigan and James E. Bagley

    Intern: Emily Hernandez

    October 11, 2010

    Storage Strategies NOW

    8815 Mountain Path Circle

    Austin, Texas 78759

    Note: The information and recommendations made by Storage Strategies NOW, Inc. are based upon public information and sources and may also include personal opinions both of Storage Strategies NOW and others, all of which we believe are accurate and reliable. However, as market conditions change and are not within our control, the information and recommendations are made without warranty of any kind. All product names used and mentioned herein are the trademarks of their respective owners. Storage Strategies NOW, Inc. assumes no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or reliance upon, the information and recommendations presented herein, nor for any inadvertent errors which may appear in this document.

    This report is purchased by Nexsan, who understands and agrees that the report is furnished solely for its use and may be distributed in whole to partners, prospects and customers.

    Copyright 2010. All rights reserved. Storage Strategies NOW, Inc.

    Sponsored By

    Table of Contents

    Sponsored By
    Introduction
    The Small and Medium Business Market
        Size of Market by Revenue and IT Spending
        Importance of Data Protection and Business Continuity Software
        SMB Unique Requirements
        Growing Data Retention Demands
        Technology Availability
        US SMB Businesses and Revenue by Size (SBA, 2007 data)
        The North American Industry Classification System (NAICS)
        How To Reach the SMB Market
        SMB Sectors Requiring Large Amounts of Data
            Energy exploration and operations for oil and natural gas
            Mining operations other than oil and gas
            Motion picture and video production
            Data processing, hosting and related services
            Software publishers
            The financial industry
            Legal services
            Accounting, tax preparation, bookkeeping and payroll services
            Architectural, engineering and related services
            Computer systems design and related services
            Research and development in physics, engineering and life sciences
            Healthcare
        Managed Service Providers (MSPs)
    Data Protection Technologies
        Backup to Tape
        Virtual Tape Library (VTL)
        Tape vs. Disk - What to Choose and Why
        Disk-to-Disk-to-Tape (D2D2T)
            Tape isn't dead, the mission for tape changed
            Tape vs. Disk - Reliability
            Tape vs. Disk - Performance
            Tape vs. Disk - Management
            Tape vs. Disk - Availability
            Tape vs. Disk - Power Efficiency
            Conclusion
        Backup to Removable Disk
        On-Line Backup
            Online Backup Issues
        Methods for Backing up Data
        File Synchronization
        Remote Data Replication
        Images, Clones and Snapshot Images
        Continuous Data Protection and Near Continuous Data Protection
        Agent vs. Agentless Backup
        Windows Volume Shadow Copy Service (VSS)
        Encryption and Password Protection of Backup Media
            Tape Drive-based Encryption
            Encryption Issues
            Backup Data Compression
        Data Deduplication
            File Mode and Block Mode
            In-Line or Post-Processing Deduplication
            Backup Performance
            Restore Performance
            Power and Cooling Consideration
            ECO-Matters
            Source or Target Deduplication
            The Downsides of Data Deduplication
        Application-Specific Backup
        Virtual Machine (VM) Backup
        Backing Up Virtual Machines
        Hypervisor-specific Backup Methods
            Microsoft Hyper-V
            KVM, VirtualBox, Xen, XenServer and Others
    Tips and Best Practices for Effective Backups
    Use case profile
        Customer Name: Clark Enersen
        Vendor Name: Nexsan
    Table 1.1 Vendor/Product Name

    Introduction

    The data protection and recovery space is exploding as more businesses recognize that protecting their assets -- their information -- is key to business survival. Small and mid-sized businesses are a market that has been underserved by data protection software, appliances and online backup services until the last few years. Yet these organizations have the same needs as large enterprises to protect their data. Unlike large enterprises, though, SMBs face a number of unique challenges.

    Providing full-time dedicated IT resources may be beyond their means, and paying for that IT help and for the software to manage their data may quickly overwhelm them. They often turn to managed service providers or value-added resellers to manage their infrastructures or to supplement the IT skills they have.

    Now, there are many software packages, appliances, target arrays (which have integrated snapshot and replication capabilities) and services available to SMB customers that provide data protection and recovery. This survey addresses most of them.

    The Small and Medium Business Market

    For this survey we analyzed companies with at least one paid employee but fewer than one thousand employees. In the United States alone, there are approximately 5.75 million firms in this category and only about 13,000 firms with one thousand or more employees. Worldwide, SSG-NOW estimates there are more than eight million firms in this category. In addition, many governmental units, be they departments of larger organizations or typical municipalities, have IT requirements similar to those of the SMB.

    The SMB market can't be characterized solely by the number of employees an organization has. We talked to many SMBs that, while they have few employees, have storage capacities under management that may surprise the casual observer. Their storage consumption varies widely, from 500GB at the low end to 100TB at the high end. In some instances, such as video post-production, the data can grow into the petabyte range in the making of a single movie. The amount of data SMBs manage is doubling every 18 months.
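    To put the 18-month doubling rate in concrete terms, the short Python sketch below projects storage under management a few years out. The function name and the sample horizons are illustrative assumptions, not part of the survey; only the 500GB and 100TB starting points and the 18-month doubling period come from the text above.

```python
def projected_capacity_tb(current_tb: float, months: float,
                          doubling_months: float = 18.0) -> float:
    """Return capacity after `months`, assuming it doubles every `doubling_months`."""
    return current_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Starting points taken from the range quoted in the text: 500GB and 100TB.
    for start_tb in (0.5, 100.0):
        for years in (1, 3, 5):
            future_tb = projected_capacity_tb(start_tb, years * 12)
            print(f"{start_tb:>6.1f} TB today -> {future_tb:>8.1f} TB in {years} year(s)")
```

    Under this assumption capacity quadruples every three years, which is why a backup target sized only for today's data is quickly outgrown.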

    Size of Market by Revenue and IT Spending

    All US companies had revenue of about $30 trillion in 2007, the most recent year for which data is available. SMBs accounted for $13 trillion of that revenue. Despite the effects of global recession, worldwide IT spending by SMBs was about $575 billion in 2009 and is estimated to grow to $630 billion by 2014.

    Importance of Data Protection and Business Continuity Software

    Data protection has become the highest priority for IT spending in the SMB according to surveys conducted in 2010. This represents a shift as SMB executives realize how computer-centric their organizations have become. In recent times, data protection was viewed as expensive insurance against events that could not easily be predicted and costs of data loss were unknown. But organizations of all sizes now realize that the loss of access to data directly affects their ability to operate. The recommended allocation of IT budget to this critical function ranges from 5% to 10%, depending on the type of business. Worldwide, SMB spending on data protection is estimated at $30 billion to $60 billion in 2010.
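    The spending figures above can be cross-checked with simple arithmetic. The Python sketch below applies the 5% to 10% allocation to the $575 billion worldwide SMB IT spend and derives the annual growth rate implied by the 2009 and 2014 figures. Using the 2009 spend as a stand-in for 2010 is an assumption made here for illustration, and the function names are hypothetical.

```python
def protection_budget_range(it_spend_billion: float,
                            low_share: float = 0.05,
                            high_share: float = 0.10) -> tuple[float, float]:
    """Return the (low, high) data protection budget for a given IT spend."""
    return it_spend_billion * low_share, it_spend_billion * high_share


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Assumption: the 2009 worldwide SMB IT spend is used as a proxy for 2010.
    low, high = protection_budget_range(575.0)
    print(f"Data protection budget: ${low:.1f}B to ${high:.1f}B")

    growth = implied_annual_growth(575.0, 630.0, years=5)
    print(f"Implied IT spending growth, 2009-2014: {growth:.2%} per year")
```

    The resulting range of roughly $29 billion to $58 billion lines up with the $30 billion to $60 billion estimate quoted in the text.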

    SMB Unique Requirements

    SMBs have unique requirements and challenges when compared to their larger counterparts. First, IT resources are minimal, and IT duties are often performed by the proprietor or by staff members who have many other responsibilities in the firm. We talked to one SMB whose administrator was also responsible for human resources and finance, as well as a myriad of other miscellaneous responsibilities.

    Further, while infrastructure is often limited to a number of desktop or laptop clients and perhaps a few dozen servers, the technology available to these organizations is second to none. Adopting new equipment, often at lower cost and with better performance, is usually easier than in large organizations, which are less nimble because they need to move technology forward en masse. Purchasing decisions are likely to be quicker because fewer people are involved. And one data loss experience is usually enough to justify acquisition of data protection and business continuity products. With the reliance upon technology within virtually every business endeavor, data loss experiences happen at an ever-increasing rate.

    Growing Data Retention Demands

    One thing SMBs have in common with their larger counterparts is the explosive growth in data storage requirements. Certain business segments have higher capacity requirements, for example, healthcare providers, law firms and financial organizations. But all organizations have data retention requirements for accounting information and increasing governmental reporting demands. The data must be retained and available for long periods, often, as in the case of electronic health records, forever.

    Technology Availability

    Changing technology can be rapidly adopted by SMBs. Simple tape backup systems, the norm of a decade ago, are now being replaced by low-cost drive arrays, and even small organizations are adopting virtualization, replication, mirroring and deduplication technologies. Low-cost bandwidth supplied by cable and telecommunication providers allows the delivery of cloud-based data protection services to home and small offices. Local data storage appliances that automatically back up to cloud storage are becoming widely deployed as organizations realize that offsite data retention can be transparent and automatic to their operations, as opposed to an expensive, problem-prone effort.

    US SMB Businesses and Revenue by Size (SBA, 2007 data)

    Size         Firms  Establishments    Employees  Revenue (x1000)
    Total    6,049,655       7,705,018  120,604,265  $29,746,741,904
    0-4      3,705,275       3,710,700    6,139,463   $1,434,680,823
    5-9      1,060,250       1,073,875    6,974,591   $1,144,930,232
    10-14      425,914         444,721    4,981,758     $791,709,665
    15-19      218,928         237,689    3,674,424     $603,788,766
    20-24      134,254         152,547    2,928,296     $489,530,870
    25-29       89,643         106,623    2,405,637     $402,007,359
    30-34       64,753          81,086    2,063,987     $364,392,992
    35-39       47,641          62,878    1,754,582     $304,339,758
    40-44       38,221          51,847    1,600,913     $293,476,569
    45-49       29,705          43,325    1,391,754     $249,407,544
    50-74       86,364         139,864    5,195,105     $979,545,562
    75-99       41,810          85,215    3,582,686     $710,220,323
    100-149     39,316         102,135    4,749,055     $967,245,234
    150-199     18,620          66,602    3,205,201     $674,337,913
    200-299     17,780          87,923    4,309,143     $897,848,746
    300-399      8,155          55,515    2,808,347     $595,711,397
    400-499      4,715          43,678    2,101,982     $476,906,931
    500-749      6,094          71,702    3,695,682     $800,475,934
    750-999      2,970          45,990    2,561,972     $636,199,229
    0-999    6,040,408       6,663,915   66,124,578  $12,816,755,847
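    As a rough illustration of how the totals in the table relate, the Python sketch below computes the share of firms, establishments, employees and revenue that falls in the 0-999 employee band. The two rows are copied from the table; the dictionary layout and function name are assumptions made for this example.

```python
# Rows copied from the SBA 2007 table; revenue is in thousands of dollars.
TOTAL = {"firms": 6_049_655, "establishments": 7_705_018,
         "employees": 120_604_265, "revenue": 29_746_741_904}
SMB_0_999 = {"firms": 6_040_408, "establishments": 6_663_915,
             "employees": 66_124_578, "revenue": 12_816_755_847}


def smb_shares(smb: dict[str, int], total: dict[str, int]) -> dict[str, float]:
    """Return the SMB fraction of each total, keyed by measure."""
    return {key: smb[key] / total[key] for key in total}


if __name__ == "__main__":
    for measure, share in smb_shares(SMB_0_999, TOTAL).items():
        print(f"{measure:<15}{share:.1%}")
```

    The revenue share works out to about 43%, consistent with the $13 trillion of roughly $30 trillion cited earlier in the report.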

    The North American Industry Classification System (NAICS)

    The NAICS is maintained by the Census Bureau as a way to classify businesses into sectors. The following are major classifications with subsectors defined under each. Sectors marked with an asterisk are included in this report as containing SMBs with the largest amount of data per employee.

    Sector   Description
    11       Forestry, Fishing, Hunting and Agriculture Support
    21*      Mining
    22       Utilities
    23       Construction
    31-33    Manufacturing
    42       Wholesale Trade
    44-45    Retail Trade
    48-49    Transportation and Warehousing
    51*      Information
    52*      Finance and Insurance
    53       Real Estate and Rental and Leasing
    54*      Professional, Scientific and Technical Services
    55       Management of Companies and Enterprises
    56       Administrative and Support and Waste Management and Remediation Services
    61       Educational Services
    62*      Health Care and Social Assistance
    71       Arts, Entertainment and Recreation
    72       Accommodation and Food Services
    81       Other Services (except Public Administration)
    99       Unclassified

    How To Reach the SMB Market

    With upwards of $60 billion in annual data protection spending, many technology companies are specifically targeting SMBs with data protection and business continuity products. Because of the large number of organizations, marketing to millions of SMBs is distinctly different from targeting the Global 5000 enterprises. Mass media marketing by software-as-a-service providers MozyPro and Carbonite is an example. Much of the delivery of technology to this market is performed by managed service providers (MSPs), which are effectively outsourced IT groups. Data protection software is often bundled with storage systems, ranging from small network attached storage (NAS) filers to large, purpose-built systems with all of the advanced features used by large enterprises. How do these systems apply to SMBs? The answer is simple. While the SMB market is defined as fewer than 1,000 employees, the amount of storage that needs protection varies wildly based on the type of business. As a result, vertical marketing, along with channel partner recruitment, is a critical factor in reaching the gold in the SMB market. A small radiology practice may have only a dozen employees but will often be managing terabytes of data that double every 18 months and need to be retained forever.

    SMB Sectors Requiring Large Amounts of Data

    As we focus our research on businesses with fewer than 1,000 employees, we find a disconnect between the amount of storage being protected and the number of employees. This is because the sector the business serves is a bigger predictor of storage needs than the number of employees. We find that the small business segments that have large amounts of data include energy exploration and extraction, engineering, healthcare practices, law firms and motion picture/video production, to mention a few.

    Energy exploration and operations for oil and natural gas

    6,430 firms are involved in oil and natural gas extraction (NAICS 211111). 1,950 firms perform oil and natural gas drilling (NAICS 213111). 6,880 firms are involved in support services for oil and gas operations (NAICS 213112). These SMBs consume large amounts of data in the analysis of geologic information, engineering and equipment design data, well design and mapping, production volume and management of flows and depletion. Regulations require significant data retention periods, in many cases permanent retention, as well as continuous reporting to various governmental entities for safety, revenue, environmental and mapping purposes.

    Mining operations other than oil and gas

    4,440 firms are involved in mineral extraction other than oil and gas (NAICS 212). Another 680 firms are involved in support of mining operations (NAICS 213113, 213114, 213115). These SMBs have similar data set analysis, retention and reporting requirements to the oil and gas industries.

    Motion picture and video production

    Motion picture and video production, perhaps surprisingly, is dominated by SMBs. Some 12,300 firms have fewer than 1,000 employees while only 45 firms employ more than 1,000 (NAICS 51211). Another 2,015 firms are involved in post-production (NAICS 51219), which has the highest storage requirement per employee because these firms are editing the full content. The advent of high-definition video and three-dimensional productions has multiplied the amount of storage required during production and post-production. Additionally, the cutover from analog to nearly 100% digital content has hit this sector like a tidal wave.

    Data processing, hosting and related services

    7,280 firms in this sector are SMBs (NAICS 5182). These include the Managed Service Providers who, in many cases, are the IT departments for the majority of SMBs. These organizations are further critical to the data protection market as resellers, recommenders or providers of data protection equipment and services. The growing number of cloud-based storage services is accessed through this sector.

    Software publishers

    6,000 software publishers (NAICS 5112) with fewer than 1,000 employees have special data protection requirements including revision control and maintenance of large test case databases.

    The financial industry

    The financial industry, while including many huge banking institutions, is made up of many thousands of SMBs. Because of transactional data protection and retention requirements, these organizations all have very large data sets that must be protected from data loss, corruption and theft.

    Credit intermediation and related services

    This large segment includes all banking, savings and lending services. Approximately 67,500 firms fall into the SMB category (NAICS 522). Transactional data is critical to all companies. It is not acceptable to have any loss of data in this area. Inability to access transactional data may be acceptable for brief periods, but loss of data is fatal. In addition, all data must be retained for indefinite periods, and most firms keep all transactional data permanently. Support data such as e-mails and other communications are also subject to regulatory immutability -- that is, any communication must be maintained in case of inquiry or litigation. Finally, all documentation related to lending has come under additional regulation during the last year. This creates a tremendous data protection requirement for all firms in this sector.

    Securities intermediation and related services

    54,500 firms involved in the brokerage of stocks, bonds and other financial instruments are included in this sector (NAICS 523). Since the Sarbanes-Oxley Act of 2002, and with recent enactments under extended financial regulation, transactions and communications have come under increasingly strict regulatory control. Additional scrutiny extends to executive compensation and communications regarding all security transactions. This creates additional stress in this sector in terms of data protection and retention.

    Insurance carriers and related services

    67,000 SMBs are involved in the insurance business (NAICS 524). From simple insurance agents to large-scale fiduciary activity, the sector has come under significant new reporting and financial allocation requirements. The new regulations in health insurance create additional reporting and data management requirements. Unless significant changes to existing legislation occur, this segment will be adding storage and protection at levels an order of magnitude above prior periods. Large data sets also include research functions and actuarial data.

    Funds, trusts and other financial vehicles

    2,100 SMBs are involved in the management of trusts, mutual funds and other financial instruments (NAICS 525). These firms have similar transactional, data retention and regulatory reporting requirements.

    Legal services

    185,000 SMBs are involved in legal services (NAICS 5411). Data protection and security are extremely critical in law firms. Extensive access to online research has replaced the traditional law library. Recent enhancements to tool sets related to electronic discovery and immutable archiving have increased the amount of data managed by law firms.

    Accounting, tax preparation, bookkeeping and payroll services

    106,200 SMBs are involved in accounting and payroll services (NAICS 5412). Data protection is critical to these organizations, which have regulatory requirements for long-term record retention.

    Architectural, engineering and related services

    100,000 SMBs are involved in architectural and civil engineering (NAICS 5413). Permanent data retention and fast access to the data are critical in this field. Often multiple offices of these firms require simultaneous access to this information. Computer aided design (CAD) data represents very large data sets with access and revision control as major application requirements.

    Computer systems design and related services

    99,600 SMBs are involved in computer systems design and services (NAICS 5415). These firms have extensive data requirements for CAD files, computer program source files, test and simulation data and development support data.

    Research and development in physics, engineering and life sciences

    10,900 SMBs perform research in these areas (NAICS 54171). Huge databases, including genomes, pharmaceuticals and theoretical simulations, are required for basic research. This data is often modified and updated with associated revision control and results derivatives. Data protection is critical to the field, with many regulatory aspects related to field testing and trial results.

    Healthcare

    The American Recovery and Reinvestment Act of 2009 (ARRA), often referred to as the stimulus plan, created a fund in excess of $35 billion for new technology for healthcare providers of all types. Along with the large source of funding came new requirements for data retention and for securing personal data against breach. The Affordable Care Act of 2010 added a number of new regulations that directly affect information technology in this sector. New diagnostic equipment generates huge data sets that must be retained within electronic health records (EHR) permanently. The net result is an exponential increase in the amount of data that must be retained and protected.

    Offices of physicians

    190,500 SMBs make up the vast majority of physician practices in the US (NAICS 6211). These firms are under the same regulations, incentives and data retention requirements as the hospital system, generally without the benefit of information technology employees. In addition to an array of specialized medical equipment that generates large amounts of data, physicians are becoming more computer-centric in all areas, including EHR, billing and even prescription writing. A major requirement of EHR compliance under the ARRA is computerized physician order entry (CPOE), which will require automation far beyond the simple scribbling of a prescription onto a piece of paper and sending it off with the patient.

    Outpatient care centers other than family planning and substance abuse

    Some 8,200 firms are involved in this sector (NAICS 62149), which includes surgical, HMO, dialysis and emergency care centers.

    Medical and diagnostic laboratories

    7,500 medical and diagnostic laboratories are SMBs (NAICS 6215). Huge data sets are generated by these organizations and are subject to the same type of regulatory considerations as all organizations involved in EHR generation.

    Acute care hospitals

    We do not consider the 4,000 Acute Care Hospitals to be in the SMB market (NAICS 622). While some individual hospitals may fall into the category, even small hospitals are generally managed by larger organizations with IT staffing and centralized support.

    Managed Service Providers (MSPs)

    Managed Service Providers are major resources for IT support to SMBs. Ranging from a few employees to large regional and national entities, MSPs provide hardware and software recommendations, resell equipment and software, and often provide hosted data center support. They are a major channel for data protection services to the SMB market.

    Data Protection Technologies

    Backup to Tape

    The traditional method of system backup has been file-by-file backup to tape. Typically a tape rotation scheme is used that provides backups at different points in time. Tape has suffered from a number of problems:

    A multiplicity of tape and data formats. Numerous tape formats of varying capacities have been and are being used. These formats are incompatible with each other, so moving from one tape technology to another over time, or from one tape library to another, can be an expensive process. In addition, backup software vendors have often used their own proprietary logical data formats (similar to the way different word processors, such as Word and WordPerfect, use different formats), which further compounds the problem. In both cases you must have the same type of tape drive, and often the same software, to restore a tape. The recent development of the Linear Tape File System (LTFS), a standardized file system for LTO-5 [1] tape, should help alleviate the compatibility issues to some degree, assuming the standard is widely accepted.

    Reliability issues. Over the years tape has suffered from reliability issues, both with drives and media. It is not uncommon for a tape drive to require repair or replacement within three years. Although newer tape technologies, such as LTO, have improved tape reliability, media issues are still all too common. In addition, tape drive read/write heads must be cleaned on a regular basis to maintain reliability. Tape libraries add mechanical components that can also fail and require replacement.

    Cost. The cost per megabyte of tape media has dropped considerably over the years, but it has not kept pace with the drop in the cost of disk media. In addition, the cost of the tape drive itself is relatively high. Internet pricing on LTO-4 drives (800GB native capacity) is about $2,500-$4,000, while LTO-5 drives (1.5TB native capacity) sell for about $3,500-$5,000.

    Performance. Although tape drive performance and tape capacity have both increased significantly in recent years, the amount of data most organizations need to back up has increased dramatically as well. Even with faster backups, many organizations cannot perform full backups to tape in their available backup window without using multiple tape drives and multiple backup servers, further increasing the cost and complexity of tape backup.

    Recovery. Recovering data from tape can be time-consuming. For effective recovery, tapes must be labeled properly and the backup system must maintain a catalog of tapes in a database. If the database is lost or corrupted, tapes must be re-cataloged, which can itself be a very time-consuming process. Since most tape rotation schemes include some offsite storage of tapes, if the data that needs to be recovered is on a tape stored off site, that tape must be retrieved to recover the data. Since tape is a linear format, accessing and restoring a file or files usually takes significantly longer than restoring the same file or files from disk. Automated tape libraries and bar coding of tapes can alleviate some of these issues, but automated libraries add mechanical and electronic components that can fail, so in some circumstances they can create additional problems.

    In spite of these issues many SMBs still use tape backup. In some organizations it is the only backup method employed, while in others it is used in addition to or in conjunction with another backup method.

    1 Linear Tape Open (LTO) is a tape format created by the LTO Consortium, which was initiated by Seagate, HP and IBM. LTO is an open standard created in the late 1990s as an alternative to the numerous proprietary tape formats then in existence. LTO-5 is the latest incarnation of the standard. LTO-5 tape cartridges have a native capacity of 1.5 TB. Linear Tape File System (LTFS) is a standardized file system for LTO-5 and above. Data written in LTFS format can be used independently of any particular storage application. Since LTO is an open standard, LTO drives and media are available from many manufacturers.

    Virtual Tape Library (VTL)

    Virtual Tape Libraries solve one of the major problems of tape: the difficulty of completing a backup within the available backup window. A VTL appears to the system to which it is connected as a tape library with multiple tapes. This means that an organization can use its existing legacy tape backup software to back up to a much faster disk-based system. With a VTL, the virtual tapes are stored on the system for a period of time to allow file restorations, if necessary. Sophisticated VTLs can also export data to tape for archiving purposes. Vendors of VTLs include IBM, SEPATON, Quantum, FalconStor Software, Data Domain, Overland Storage and Hitachi Data Systems. Employing a VTL might make sense for SMBs that are trying to extend the life of their existing backup software, but a disk-to-disk-to-tape approach [2] (see below) probably makes more sense if the software is being upgraded or already supports disk-to-disk-to-tape.

    Tape vs. Disk - What to Choose and Why

    Disk-to-Disk-to-Tape (D2D2T)

    Tape isn't dead, the mission for tape changed

    The continual announcements of the death of tape are certainly premature. They do, however, mark the beginning of the inevitable. Storage technologies do die, as witnessed by the demise of Hollerith cards, paper tape, floppy disks and round reel tape, to name a few. They all shared a common fate: an emerging technology stored data more reliably at a better cost and performance. Cartridge tape replaced round reel, but does disk threaten the future of tape?

    The answer is yes in some areas and no in others, at least for the moment.

    When looking at the benefit comparison, some IT professionals who choose tape for their backup environment cite reasons such as tape still being fast enough to meet their backup window, or their organization being able to handle extended periods of downtime while waiting on a restore. However, the most commonly given reason for deploying tape over disk is simply that tape is cheaper. With the performance, management and reliability benefits clearly belonging to disk, the outstanding issue seems to be a perceived cost issue.

    When making a direct cost comparison of media, it is true that the cost-per-byte is slightly lower for a tape cartridge than it is for an equivalent disk. However, when you consider the larger costs of media upgrades, new transports, management and remastering, the overall costs will likely favor the new generations of dense disk arrays using MAID energy savings.

    2 VTL is a form of disk-to-disk-to-tape, but it is usually not recognized as such by backup software. Most backup software sees VTL as an actual tape library.

    For some IT professionals, that's where they draw the line and make a decision. For them, the cost of media is the race. The problem, however, is that the race really isn't about the cost of media; it's about the associated cost of several other factors: downtime, reliability, management, availability, data growth and the cost of the backup system itself. In other words, it's about the big picture.

    It should be noted that some IT professionals have circumvented the whole tape vs. disk dilemma and have implemented tiered solutions that use both in concert, otherwise known as Disk-to-Disk-to-Tape (D2D2T). With this approach, IT professionals write backups directly to a disk array, where the data remains for 90 days before being passed on to tape for deep archiving and off-site portability. Organizations thereby leverage the many benefits of online disk storage while maintaining the portability and long-term retention they are used to receiving with tape.

    Since the cost comparison between tape and disk is obviously far more complex than the media itself, this section outlines all the considerations necessary for a more complete understanding of the benefit and true cost comparison between tape and disk to help IT professionals choose and justify their backup environment.

    Whereas the decision was more controversial just a few short years ago, it has never been clearer or easier to understand than now, given new technologies, capacities and market prices.

    Tape vs. Disk - Reliability

    The primary benefit of tape is to offer adequate data protection at low cost. Analysts estimate that one in ten recovery images on tape is unrecoverable. Data up to one year old has a 10-15% failure rate, and the failure rate of data five or more years old is 40-45%.

    Other studies have revealed that much of this goes unnoticed: Storage Magazine reported that 34% of companies that back up their data to tape never test their backups. It went on to say that 77% of the companies that did test their backups found restore failures.

    Boston Computing Network's Data Loss Statistics found that 7 out of 10 small firms that experience a major data loss are out of business within a year. Paradoxically, all of this risk assumes you have completed a backup and have the option of a restore. Ironically, backups to tape are frequently not completed within the defined backup window. If you have no backup, you have no option for restore.

    If a 10% failure rate with tape is a best-case scenario for a data center, that means it is still 10% more than an organization can afford. The best-case scenario for tape reliability is still a data center's worst operational risk. For this reason, organizations must make multiple copies of every tape backup to increase the reliability of their protection architecture. Although the cost of media might be cheaper for tape than disk on a one-for-one comparison, after one includes the number of copies it takes for tape to achieve acceptable levels of reliability, the cost-per-byte protected far exceeds disk.

    That's why cost comparisons shouldn't revolve around bytes stored but rather bytes protected. Tape only outperforms disk on outright media costs when organizations accept the associated reliability risk and don't make multiple copies of each tape backup.
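
    The distinction between bytes stored and bytes protected can be illustrated with a short calculation. The sketch below is not from the report: it assumes that each tape copy fails independently of the others (real failures may be correlated), and the function name and the target loss probability are hypothetical. The 10% and 45% failure rates are the figures quoted above.

```python
def copies_needed(failure_rate, target_loss):
    """Smallest number of copies whose combined failure probability
    is at or below target_loss.

    Assumption (not from the report): copies fail independently, so
    the chance that all n copies are unreadable is failure_rate ** n.
    """
    if not 0 < failure_rate < 1 or not 0 < target_loss < 1:
        raise ValueError("rates must be strictly between 0 and 1")
    n = 1
    while failure_rate ** n > target_loss:
        n += 1
    return n


# With a 10% per-copy failure rate, getting the risk of total loss
# down to 0.5% or less takes three copies of every backup.
print(copies_needed(0.10, 0.005))
# For five-year-old data at a 45% failure rate and a 1% target,
# six copies are needed.
print(copies_needed(0.45, 0.01))
```

    Under these assumptions the media cost per byte protected is the per-cartridge cost multiplied by the number of copies, which is the comparison the paragraph above argues for.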

    Disk, unlike tape, has a multitude of built-in and commonly used reliability and protection elements, such as RAID and automated error checking. There is no such thing as RAID with tape. If one tape out of a backup job group fails, the integrity of the whole restore collapses. By utilizing RAID 6, organizations are protected against the most extreme circumstances, such as dual drive failures on the latest large-capacity drives. With tape, there is no system or architecture of built-in redundancy.

    Reliability versus cost is one of the key determinants in the choice between disk and tape. Unless a user has the latest high-end tape transport and library targeted at the enterprise, serious reliability issues are to be expected under heavy use, quite apart from the complexity of tape management and the risk from handling. Additionally, users need many transports and libraries to stand a chance of getting nightly backups done on time. Then there is the media. Anyone using tape understands that the cost of the media is the big expense. Every couple of years, as new transports are announced, users discover the old media will no longer work, and all of it has to be replaced. The process of replacement is not only expensive, it is also quite disruptive.

    When taking into consideration the reliability exposures and the necessary retention of multiple copies for a single byte, the cost-per-byte as related to tape is far more expensive than the base measurement. And that still doesn't take into consideration what has always been seen as the necessary evil: performance, networks, resource conflicts, scheduling, media management and more.

    Using disk as a library offers a flexible, ultra-reliable, high-performance and operationally efficient solution. Features from backup software vendors have made backup to disk the logical choice for simple and flexible backup and recovery.

    When considering applications like VMware, Exchange and SharePoint, protection, recoverability and performance are key. While tape is still used, it is rare to see it used exclusively today. The benefits of Disk-to-Disk (D2D) are too great, which is why at least 70% of all backups are written to disk first.

    Tape vs. Disk - Performance

    For many organizations, the only cost that really matters is the business cost of downtime. The fundamental question to ask when looking at tape or disk is, "What are your recovery objectives?" After all, it's not really about the backup, it's about the restore. It has to work and it has to work on time.

    The architecture used to gain backup performance with tape has a backlash for restore. The technique that is used to achieve acceptable performance levels with tape is called multi-threading. In multi-threading, a backup application will start many backup streams (typically around 15), which will be interleaved to a single tape transport. A typical user may need to run 300 threads over the course of a night. The reason this is an issue is that while multi-threading allows for the best possible performance of a backup to tape, it also ensures performance issues in restore, and actually increases the probability of a data loss failure. Here is why.

    While multi-threading allows the backup to run multiple backup job streams interleaved onto one transport to achieve high levels of performance for today's fast transports, as soon as one of the multiple streams completes, because it has a small file size, performance of the backup degrades. This degradation continues over time until eventually the transport has to slow down because it can't receive data fast enough. We are left running a transport so slowly that it actually goes into a start-stop mode, rather than having enough data arriving to keep the transport in full streaming mode. From a reliability point of view, start-stop mode can stress the media, which can lead to media failures. Beyond start-stop mode, 80 passes are required to completely write an LTO-5 cartridge, which alone is cause for concern about media reliability. Each pass causes wear on the media and heads.

    The impact from a restore point of view is slow performance. To recover an individual file, directory, user or application, the read-back requires reading all blocks across all tape cartridges used for the backup. The problem is that the system must read all of the tape(s), with 14/15 of all the data read back thrown away. This results in seriously slow performance, if you can read it back at all. If there is a single uncorrectable read error, the entire backup may be lost.
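
    The 14/15 figure follows directly from the number of interleaved streams. The sketch below makes the arithmetic explicit; it assumes, as a simplification not stated in the report, that all streams are the same size and evenly interleaved. The function name is hypothetical.

```python
from fractions import Fraction


def discarded_fraction(streams):
    """Fraction of the data read from tape that is thrown away when
    restoring one stream out of `streams` interleaved streams.

    Assumption: every stream is the same size and the streams are
    evenly interleaved, so only 1/streams of what is read is wanted.
    """
    if streams < 1:
        raise ValueError("streams must be at least 1")
    return Fraction(streams - 1, streams)


# With the typical 15 interleaved streams, 14/15 of the data read
# back belongs to other streams and is discarded.
print(discarded_fraction(15))
```

    Put another way, under these assumptions the drive reads 15 times as much data as the restore actually needs.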

    Protection objectives, measured as Recovery Point Objectives (the amount of data at risk) and Recovery Time Objectives (the amount of downtime you can tolerate), are a concern with tape. LTO-5 tape can easily take nearly 8 hours to recover 10TB, as compared to 2.5 hours for a high-performance disk array used as a protection library. What is the cost of downtime for you?

    When using disk as a protection library, the problem is solved. Backup software such as Veritas NetBackup indexes all the data as it is stored directly to disk. Whether there is a need to recover a single sub-object to VMware, an email message to Exchange or a SharePoint document, users can recall them individually with a simple point and click. Due to the nature of random access recovery, performance is unparalleled. Many use the NDMP protocol to write directly to disk for easy configuration and management of network-based backups. With NDMP, network congestion is minimized because the data path and control path are separated.

    With disk used as a protection library, backup can occur locally, from file servers direct to disk, while management can occur from a central location. Operation is simple because data is indexed by the backup application directly to disk. The decreased infrastructure complexity makes everything easier and far more operationally efficient. With disk, backups are faster to restore and much easier to manage.

    Tape vs. Disk - Management

    Backup to tape has always been an administrative challenge because of the amount of manual intervention needed to perform backups. Tape backup must be closely supervised, equipment needs to be regularly maintained, heads have to be cleaned, and tapes must be loaded, replaced, labeled and transported. While multiple tapes and monitoring are required for a single backup to tape, backup to disk is a completely automated procedure: just set and forget.

    Backup to tape typically uses a Grandfather-Father-Son (GFS) managed retention plan. The GFS scheme uses daily (Son), weekly (Father), and monthly (Grandfather) backup media sets. Four backup media sets are each labeled for a day of the week. Typically, incremental backups are performed on the Son media, which is reused each week on the day matching its label. The Father media is reused monthly, and the Grandfather media records full backups on the last business day of each month. As a result, each 1TB of primary disk can require up to 25TB of archival tape storage.
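
    The GFS rotation described above can be sketched as a small function that decides which media set a given day's backup belongs to. This is a minimal illustration, not any vendor's implementation. It assumes backups run Monday to Friday, that the weekly (Father) backup runs on Friday (the report only says four daily sets are labeled by weekday), and it ignores public holidays when finding the last business day. All function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def last_business_day(year, month):
    """Last Monday-to-Friday day of the month (holidays ignored)."""
    day = date(year, month, calendar.monthrange(year, month)[1])
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


def gfs_media_set(day, weekly_day=4):
    """Return 'grandfather', 'father', 'son', or None for a date.

    weekly_day=4 (Friday) is an assumption made for this sketch.
    """
    if day.weekday() >= 5:
        return None  # assumption: no backups on weekends
    if day == last_business_day(day.year, day.month):
        return "grandfather"  # monthly full backup
    if day.weekday() == weekly_day:
        return "father"  # weekly backup, reused monthly
    return "son"  # daily incremental, reused weekly


for d in (date(2010, 10, 12), date(2010, 10, 15), date(2010, 10, 29)):
    print(d.isoformat(), gfs_media_set(d))
```

    With these assumptions the four Son sets cover Monday through Thursday, which matches the four weekday-labeled sets mentioned above.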

    The cost to implement, maintain and manage this level of protection can be overwhelming. Here's one example to illustrate the capital expense: if an organization were backing up 42TB of primary disk, it would need 1,575 LTO-4 tapes over the course of a year. This assumes 80% usage efficiency for each cartridge. At $38 per cartridge, the cost is $59,850. Using GFS, the cost of storing 25 copies of the data would rise to $1,496,250. By comparison, the cost of a second 42TB array as a backup target is in the range of $45,000.
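
    The arithmetic behind this example can be checked with a few lines of code. The sketch below uses only the figures quoted in the report (42TB of primary data, 800GB LTO-4 native capacity, 80% fill efficiency, 1,575 cartridges, $38 per cartridge, a 25x GFS multiplier) and assumes decimal units (1TB = 1,000GB). The variable names are hypothetical.

```python
import math

PRIMARY_GB = 42_000            # 42TB of primary disk, decimal units assumed
LTO4_NATIVE_GB = 800           # LTO-4 native capacity
FILL_EFFICIENCY_PERCENT = 80   # usable share of each cartridge
CARTRIDGES_PER_YEAR = 1_575    # figure quoted in the report
PRICE_PER_CARTRIDGE = 38       # dollars
GFS_MULTIPLIER = 25            # copies kept under GFS, per the report

# Usable capacity per cartridge: 800GB * 80% = 640GB.
usable_gb = LTO4_NATIVE_GB * FILL_EFFICIENCY_PERCENT // 100
# Cartridges needed to hold one full copy of the primary data.
cartridges_per_full_copy = math.ceil(PRIMARY_GB / usable_gb)

annual_media_cost = CARTRIDGES_PER_YEAR * PRICE_PER_CARTRIDGE
gfs_media_cost = annual_media_cost * GFS_MULTIPLIER

print(f"Cartridges per full copy: {cartridges_per_full_copy}")
print(f"Annual media cost:        ${annual_media_cost:,}")
print(f"Media cost with GFS:      ${gfs_media_cost:,}")
```

    This covers media cost only; drives, libraries, labor and off-site storage are not included.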

    A restore of a single user or application can easily require loading and reading 10 to 30 cartridges or more. Finding the right cartridges and having each one of them work without failure is a major concern. The manpower required to manage a tape library is far beyond the manpower needed to manage disk as a protection library.

    A tape library is typically a serialized resource. Backup jobs are scheduled by priority; resources are switched and allocated to a job. When that job completes, resources are switched again, and the process goes on. The associated monitoring and administration of complex processes places a heavy load on the IT department and easily leads to operational failures.

    Using disk for a protection library allows users to share resources among multiple servers simultaneously, whether on a SAN or through the network by way of iSCSI: no monitoring, no switching, no hassles. Backup jobs run simultaneously, avoiding the requirement imposed by tape to wait until the previous job is complete and resources are switched before starting the next backup job. With a disk array, multiple streams can run at the same time. Users can also easily collect or move data offsite over a WAN for geographically protected data.

    With disk used as a protection library, backups are routed through a centralized backup infrastructure. By leveraging deduplication, users can expect up to 20x savings in stored data with significant improvements in backup and restore performance. Where tape requires 1,575 cartridges (times 25 for GFS) to protect 42 TB over the course of a year, a deduplicating disk storage system would need only about 2 TB. And with backup data reduced to its raw essentials, data is even more easily transferred over a network to a disk system at a disaster recovery site.

    Tape vs. Disk - Availability

    It is well understood that magnetic tape degrades over time. Temperature and humidity have a dramatic impact on shelf life. Ten degrees of temperature change can change the life of a tape by ten years or more. If an administrator loads a cart of tapes and takes them to a non-raised-floor room, there is a great danger that temperature and humidity changes will accelerate the effects of thermal decay which, in turn, will destroy data in as little as five years.

    The Library of Congress and the National Media Lab recommend that, for data having permanent value, storage areas be kept at a constant 45 to 50 °F or colder (do not store magnetic tapes below 46 °F, as it may cause lubrication separation from the tape binder) and 20 to 30% relative humidity (RH) for magnetic tapes (open reel and cassette) and 45 to 50% RH for all others. Environmental conditions must not fluctuate more than 5 °F or 5% RH over a 24-hour period. Tape should be stored in dark areas except when being accessed, and recordings should be kept away from UV sources (unshielded fluorescent tubes and sunlight). (Source: The National Media Lab)

    Widely fluctuating temperature or RH severely shortens the life span of all tape. This is one of the main reasons why tape is only viable for the large enterprise that can afford a library large enough to maintain tape on a raised floor, handled exclusively by a robot.

    The design of the cartridge and the transport are critical to tape reliability as well. The enterprise-class transports used today are in the 400,000-hour range. A well-managed cartridge (correctly controlled temperature and humidity) that is also a stagnant cartridge (i.e. a cartridge that has not been used) has a shelf life of around 20 years.

    Over a shelf life of 20 years, transports would go through at least 6 generations of change. Without the transport that wrote the cartridge, along with the application software, operating system, computer hardware, operations manuals, ample spare parts and the recorded media itself, data cannot be retrieved. Even with all of those moving parts in harmony and perfect environmental conditions, chances of getting data back are about 23%. If anything goes wrong with any of the cartridges used for backup, there is no redundancy, which means an organization is unable to retrieve its data. IT organizations deal with this by re-mastering data onto new transports and new media with every generation change, which is a very expensive process.

    The mechanisms for reading and writing tape are far more complicated than those of disk. With a disk, there is a flat, stable surface that spins without flexing in a hermetically sealed and contaminant-free enclosure. Beyond the disk itself, disk arrays offer complete data redundancy with RAID technology and 99.999% availability with hot-swappable components, redundant controllers, power supplies, etc.
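
    To put the 99.999% availability figure in perspective, it can be converted into downtime per year. The sketch below is a simple unit conversion; the function name is hypothetical and a 365-day year is assumed.

```python
def annual_downtime_minutes(availability_percent, days_per_year=365):
    """Minutes per year a system is unavailable at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * days_per_year * 24 * 60


# 99.999% availability leaves a little over five minutes of
# downtime per year.
print(round(annual_downtime_minutes(99.999), 2))
```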

    Tape vs. Disk - Power Efficiency

    Tape has long been considered the most power-efficient media, since a cartridge can be stored without power. However, disk has made huge advances in power efficiency with spin-down technology like Nexsan's AutoMAID, which enables highly cost-efficient long-term data retention by progressively putting disks into deeper sleep modes while offering near-instantaneous response. With advanced power savings, Nexsan disk arrays allow the performance and management simplicity of disk backup with greatly reduced power consumption.

    With an easy-to-use power configuration manager, the user can create policies for desired power savings after user-defined periods of idle time. When idle thresholds are met, AutoMAID progressively reduces disk drive power consumption.

    The first I/O request will wake the array up to full power. Once the array is awake, it performs at 100% performance until enough idle time has passed to satisfy the energy savings policy, which places the array into increasingly deep levels of sleep. All of this happens automatically and provides great response performance as well.

    Conclusion

    Although the cost-per-byte stored on a single tape cartridge is less than disk, it is an isolated figure that gives a very incomplete look at a much larger picture. Grandfather-Father-Son produces about 25 times as many copies on tape as on disk. That alone makes tape much more expensive. The choice is even clearer when adding the cost of labor to manage tape, the risk of data loss and downtime, performance limitations and the inconvenience of data that is offline. Protection, performance, reliability, management and cost all favor disk storage. And with AutoMAID power intelligence, online retention of rarely accessed data is justified.

    From the early 1950s until the late 1990s, the volume of data made sense for tape technology. But with the explosion of the digital universe, tape can't reasonably sustain the role it once held. For most organizations, that threshold has already been reached, as they can't even back up all their data within the necessary window, let alone restore data fast enough to meet business requirements.

    As the pioneer of disk-to-disk backup, Nexsan was the first to understand and deliver the benefit of low-cost disk for the backup environment. As such, Nexsan's unique position in the marketplace has been delivering unparalleled value and leadership to enterprises of every size for over ten years. Small to large, Nexsan has the disk library for all your backup and archiving needs.

    Backup to Removable Disk

    This approach uses removable disks in a manner similar to tape. One or more backup sets are written to multiple removable disks, which are then periodically rotated using a scheme similar to a tape rotation scheme. With this approach the cost of a tape drive is eliminated and the speed of backup and restore is increased. Hard disks still cost more than tape, however. Also, they are more susceptible to damage from dropping than tape, and their ability to retain data while sitting on the shelf is still relatively unknown, although a spokesperson for one vendor said the shelf life should be at least five years, and periodic refreshing by powering up and rereading and rewriting the data should extend the data retention period another five years. One vendor of cartridge systems claims thirty years of archival storage. Both tape and disk appear to be susceptible to damage from temperature extremes, but hard disks appear to be less susceptible to damage from high humidity than tape.

    There are many ways to mount removable disks:

    External disk drives using USB, FireWire or eSATA interfaces.

    Internal cartridge dock and cartridges, such as the RDX system developed by ProStor Systems. The dock is installed in a 5 1/4" drive bay. Internal RDX docks use a USB or SATA interface.

    External cartridge dock and cartridges, such as RDX. External RDX docks use a USB interface.

    Internal tray-less hot-swap rack. This device allows the swapping of bare SATA drives and requires an available hot-swap SATA port.

    External tray-less hot-swap rack. This typically requires a USB or eSATA port.

    The tray-less drives are the least expensive, since the racks for them only cost approximately $20-$75 and you are not paying for a case or cartridge for each drive. They are, however, the most susceptible to damage from dropping and static electricity. The cartridge systems are probably least susceptible to damage. The damage resistance of the standard external drives is difficult to determine and to a great extent depends on the construction of the enclosure.

    On-Line Backup

    Increased Internet access speeds, combined with ever-decreasing disk storage costs, have made across-the-Internet backup viable. Known as both online backup and cloud backup, the use of these services has increased dramatically over the last few years. Numerous companies are providing online backup services, software and even dedicated backup appliances. Some systems combine online backup with more traditional disk-to-tape or disk-to-disk backup. Some online systems provide for maintaining multiple versions or revisions of files and some do not.

    Because of the low transfer speed of online backup when compared with disk-to-disk or disk-to-tape, most organizations do not rely on it for primary backup. This is not true in all cases, however. Some backup systems, for example, provide for online mounting of virtual machine images, allowing users to access their server resources while local virtual machines are being rebuilt.

    On-line backup is usually used in conjunction with some method of local backup. Increasingly, backup systems that provide local backup are providing online backup as well.

    Most online services have a fixed base monthly or yearly cost plus data transfer and storage costs. Low-end services can cost as little as $4-5 base monthly, while the base cost of some services can be in the hundreds of dollars per month. Transfer costs and storage costs can vary from a low of about $0.15 per gigabyte to $3.00 per gigabyte or more. Some vendors charge for data transfer and some do not. The types of services provided vary as well. For example, some services are strictly backup and restore, while others provide shared remote access and/or remote drive mapping, so that multiple users can access online data as they would from local storage. Some provide remote application support as well. Some backup services compress and deduplicate your data before uploading to reduce network traffic, data transfer costs and storage costs.
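
    The pricing structure described above (a fixed base fee plus per-gigabyte storage and, with some vendors, transfer charges) can be turned into a rough monthly estimate. The sketch below is illustrative only: the function name and parameters are hypothetical, the rates in the example are taken from the low end of the ranges quoted above, and real vendors may use tiered or bundled pricing.

```python
def monthly_online_backup_cost(stored_gb, transferred_gb, base_fee,
                               storage_rate_per_gb,
                               transfer_rate_per_gb=0.0):
    """Estimate one month's bill for an online backup service.

    transfer_rate_per_gb defaults to zero because some vendors do
    not charge for data transfer.
    """
    if min(stored_gb, transferred_gb, base_fee,
           storage_rate_per_gb, transfer_rate_per_gb) < 0:
        raise ValueError("all inputs must be non-negative")
    return (base_fee
            + stored_gb * storage_rate_per_gb
            + transferred_gb * transfer_rate_per_gb)


# Example: 500GB stored at $0.15/GB with a $5 base fee and no
# transfer charge.
cost = monthly_online_backup_cost(500, 50, 5.00, 0.15)
print(f"${cost:.2f} per month")
```

    Running the same estimate at $3.00 per gigabyte shows how widely the total can vary between services for the same amount of data.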

    Online Backup Issues

    The following issues should be considered when selecting an online backup service:

    1. Data transfer rate. When large amounts of data need to be backed up, high-speed internet connections are required.

    2. Security. Most online backup services provide 256-bit SSL connections, but in some cases secure connections are optional. Also, some service providers encrypt your data and some do not.

    3. Protection of your data. Some online providers have redundant data sites, while some store all your data in a single location. It is important to know how your provider protects your data.

    4. Retention policies. Can you set a policy for retention of multiple versions of your data? How flexible can your retention policy be? Can you set different policies for different classes of data? What is the service provider's retention policy if a billing issue or dispute should arise? Is your data immediately deleted? Is there a grace period before access is cut off, and an additional grace period before data is deleted?

    5. Emergency data access. How do you access your data if the systems being backed up are unavailable? Are there alternate access methods? What if you need a large amount of data quickly? Some services can arrange to ship your data to you on disk, if necessary. Also, is the data stored in a proprietary format or can it be accessed by multiple applications?

    6. Appropriateness of service. Are the services provided optimal for your organization? For example, if you would like online access to a virtual machine image in an emergency, can your software and online service provide that?

    7. Costs versus benefits. Price per gigabyte of data stored or transferred is not the only measurement of online service costs and benefits. Make sure the services provided fit your organization's needs in a cost-effective fashion.
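To make the first issue in the list above (data transfer rate) concrete, here is a minimal Python sketch that estimates how long an initial upload would take. The function name, the 500 GB data set, the 10 Mbps uplink and the 80 percent link efficiency are all assumptions for illustration; real throughput depends on the provider and the connection.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.8):
    """Estimate hours needed to move data_gb over a link of link_mbps.

    Uses decimal units (1 GB = 8,000 megabits). efficiency is an assumed
    fraction of the nominal link speed actually available to the backup.
    """
    megabits = data_gb * 8 * 1000
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical first full backup: 500 GB over a 10 Mbps uplink.
ideal_hours = transfer_hours(500, 10, efficiency=1.0)
realistic_hours = transfer_hours(500, 10)
print(f"{ideal_hours:.0f} hours at full line rate, "
      f"{realistic_hours:.0f} hours at 80% efficiency")
```

Running the same calculation for the restore direction is a useful check on issue 5, since a download that takes days may justify a service that can ship data on disk.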

    Methods for Backing up Data

    There are several means for backing up and protecting data.

    Traditional File-based Backup

    The traditional file-based backup approach backs up a system's files and directories, along with file attributes, as discrete items. Some systems can back up directory service information (Active Directory, eDirectory, etc.) as well, usually as a separate backup. The big advantage of this approach is that it is relatively quick and easy to restore a file or group of files, or a directory object or objects, from any available backup medium. Most file backup systems maintain a catalog of the files and directory entries of all tapes (or other media) in the backup rotation. As tapes are overwritten, those entries are removed from the catalog.


    File backup doesn't lend itself to quick bare metal recovery, so a number of backup software vendors have provided add-ons that perform disk imaging of the basic system, including boot sector and operating system. This approach, although greatly improved in recent years, has been problematic, especially if the recovery image has not been kept up to date or if a system was being restored to a different server or dissimilar hardware.

    Another problem with traditional file backup systems is that if the catalog becomes unavailable, due to problems with the backup server for example, backup media need to be re-imported into a new catalog, which can be a time-consuming process with multiple sets of backup media.

    File Synchronization

    File synchronization refers to the periodic or continuous copying of files and directories from a source location to one or more destination locations in order to maintain duplicate file sets. This technique is often used to make sure the most recent versions of files are available elsewhere if a primary system fails. When implemented with a versioning system, this approach can maintain multiple revisions of files. File synchronization, with or without versioning, is often used in cloud (on-line) backup systems. It is also used between systems within an organization, commonly between sites to make sure data is quickly available in case of a site-related disaster. File synchronization is often used in addition to traditional backup systems since it can provide immediate access to data. Most file synchronization approaches are unidirectional, meaning they synchronize in one direction only. Bidirectional or multi-directional approaches also exist, but they are much more complex to implement and often require manual intervention to avoid version conflicts. When updating files that have previously been replicated, some programs re-replicate entire files while some use delta encoding to only replicate file changes. Delta encoding can significantly reduce both network traffic and replication time. Data compression and data deduplication can also be employed to optimize performance across WAN links.
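The idea behind delta encoding can be sketched in a few lines of Python. This is a simplified illustration only: it compares fixed-size blocks at the same positions, whereas real synchronization tools generally use more elaborate schemes. The function names and the tiny 4-byte block size are assumptions chosen to keep the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, for demonstration only


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def compute_delta(old, new, block_size=BLOCK_SIZE):
    """Return ({block_index: new_block}, new_length) for changed blocks."""
    old_hashes = block_hashes(old, block_size)
    delta = {}
    for index, offset in enumerate(range(0, len(new), block_size)):
        block = new[offset:offset + block_size]
        is_new = index >= len(old_hashes)
        if is_new or hashlib.sha256(block).digest() != old_hashes[index]:
            delta[index] = block
    return delta, len(new)


def apply_delta(old, delta, new_length, block_size=BLOCK_SIZE):
    """Rebuild the new file on the destination from the old copy plus delta."""
    blocks = []
    for index, offset in enumerate(range(0, new_length, block_size)):
        blocks.append(delta.get(index, old[offset:offset + block_size]))
    return b"".join(blocks)


old_copy = b"AAAABBBBCCCC"
new_copy = b"AAAAXXXXCCCCDD"
delta, length = compute_delta(old_copy, new_copy)
rebuilt = apply_delta(old_copy, delta, length)
sent_bytes = sum(len(block) for block in delta.values())
```

Here only the changed second block and the appended tail cross the network, rather than the whole file, which is the traffic saving the paragraph above describes.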

    Remote Data Replication

    Remote data replication is the process of duplicating data between remote sites. With replication, data is written to both a local, or primary, storage system and one or more remote, or secondary, storage systems. It is usually employed to guarantee data currency and availability in the event of a site disaster. Remote data replication can be conducted across the Internet or private networks.

    Remote data replication can be synchronous, asynchronous, semi-synchronous or point-in-time.

    Synchronous replication assures that each write operation is completed to both primary and secondary storage before a host system or application is notified that the operation is complete. This method assures that identical data is written to both primary and secondary storage, but, because of the timing issues involved, it can affect application performance. Effective synchronous replication requires extremely reliable, high-speed networks. Typically Fibre Channel over IP is used. Synchronous replication is usually employed where real-time replication with the highest level of reliability is a greater concern than cost. This method is often used by financial institutions, where the loss of even a few minutes of data can cost millions of dollars.

    Because of network performance requirements, synchronous replication over long distances typically employs Fibre Channel over IP with channel extenders. As distance increases, latency also increases, which can affect application performance. Typically, distances of less than 150-200 miles are recommended, but under some circumstances greater distances can be achieved.
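The relationship between distance and latency can be estimated with simple arithmetic. The Python sketch below assumes that signals travel through optical fiber at roughly 200,000 km per second (about two-thirds of the speed of light in a vacuum) and ignores switching, protocol and storage delays, so real figures will be higher. The function names are hypothetical.

```python
FIBER_KM_PER_SECOND = 200_000.0  # assumed propagation speed in fiber
KM_PER_MILE = 1.609344


def round_trip_ms(distance_km):
    """Minimum round-trip propagation delay in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_SECOND * 1000


def max_serial_writes_per_second(distance_km):
    """Upper bound on back-to-back synchronous writes in a single stream,
    if each write must wait one full round trip for its acknowledgment."""
    return 1000 / round_trip_ms(distance_km)


for miles in (150, 200, 1000):
    km = miles * KM_PER_MILE
    print(f"{miles:>5} miles: {round_trip_ms(km):6.2f} ms round trip, "
          f"at most {max_serial_writes_per_second(km):6.0f} serial writes/s")
```

Under these assumptions every synchronous write at 200 miles waits a little over 3 ms for propagation alone, which helps explain why the recommended distances are modest.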


    With asynchronous replication, data is written to primary storage and then to secondary storage some time later. The host system or application is notified that the operation is complete when the write to the primary system is complete. Data is then passed to secondary storage when network bandwidth is available. Typically this happens within seconds, but it can sometimes take several hours. Asynchronous replication is a good choice when relatively slow or unreliable networks are employed.

    With semi-synchronous replication, a transaction is considered to be complete when it is acknowledged by the primary storage system and the secondary storage system has received the data into memory or to a log file. The actual write to secondary storage is performed asynchronously. This results in better performance than a synchronous system, but it does increase the chance of failure of the secondary system write.
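The difference between the synchronous, semi-synchronous and asynchronous modes is mainly a question of when the host is told that a write is complete. The following Python sketch models that with in-memory lists; the class and attribute names are hypothetical, and there is no real network or storage involved.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of acknowledgment timing for three replication modes."""

    def __init__(self, mode):
        if mode not in ("sync", "semi-sync", "async"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.primary = []             # blocks written at the primary site
        self.secondary = []           # blocks written at the secondary site
        self.secondary_log = deque()  # received remotely, not yet written
        self.in_flight = deque()      # not yet sent to the secondary

    def write(self, block):
        """Write a block and return once the host would be acknowledged."""
        self.primary.append(block)
        if self.mode == "sync":
            self.secondary.append(block)      # remote write finishes first
        elif self.mode == "semi-sync":
            self.secondary_log.append(block)  # remote receipt is enough
        else:
            self.in_flight.append(block)      # sent when bandwidth allows
        return "complete"

    def background_sync(self):
        """Finish any deferred transfers and remote writes."""
        while self.in_flight:
            self.secondary_log.append(self.in_flight.popleft())
        while self.secondary_log:
            self.secondary.append(self.secondary_log.popleft())


volumes = {mode: ReplicatedVolume(mode) for mode in ("sync", "semi-sync", "async")}
for volume in volumes.values():
    volume.write("block-1")
    volume.write("block-2")
```

After the two writes, only the synchronous volume has identical primary and secondary contents; the other two catch up when `background_sync()` runs, which is the window in which data could be lost in a site disaster.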

    Point-in-time replication uses snapshots to periodically update data changes, usually on a scheduled basis. This is the least reliable approach, but can be more effectively performed over low-speed links.

    Asynchronous, semi-synchronous and point-in-time replication can span virtually any distance, so they are good choices when storage systems are great distances apart. Because these approaches do not require immediate write acknowledgment from secondary storage, they also have less potential performance impact on the host.

    Images, Clones and Snapshot Images

    Another method of backup is to replicate a disk or volume to another device. The methods to do this are known as imaging, cloning and snapshotting. The descriptions here are representative and do not reflect all methods used by various software vendors to create images, clones or snapshots.

    Imaging software creates a replica of a disk, volume or multiple volumes as a file or set of files that can be used to restore a system to its state at the time the image was created. An image file is similar in function to a CD/DVD ISO file. There are no standards for disk and volume image file formats and most are proprietary to a particular software package. Older imaging software only allowed the restoration of complete images, but many current systems allow the restoration of specific files and folders.

    Cloning creates replicas of disks, including bootable replicas of system disks. While imaging requires restoring the image file to a disk, a clone can be used as is in place of a failed disk.

    Snapshotting is a term that refers to the process of capturing the state of a system at a particular point in time. Disk imaging and cloning are both forms of snapshotting. There are two primary forms of snapshots: full and differential. A full snapshot captures an entire volume, disk or system, while a differential snapshot only captures changes made since the last full snapshot. By creating and maintaining multiple differential snapshots along with a full snapshot, a system can be restored to different points in time.
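The relationship between a full snapshot and its differentials can be sketched as follows. This Python example models a volume as a dictionary that maps block numbers to contents, which is an assumption made purely for illustration; the function names are hypothetical and no vendor's format is implied.

```python
def take_differential(full, current):
    """Capture only what changed since the full snapshot."""
    changed = {block: data for block, data in current.items()
               if full.get(block) != data}
    deleted = [block for block in full if block not in current]
    return {"changed": changed, "deleted": deleted}


def restore(full, differential=None):
    """Rebuild a volume from the full snapshot plus one differential."""
    volume = dict(full)
    if differential is not None:
        volume.update(differential["changed"])
        for block in differential["deleted"]:
            volume.pop(block, None)
    return volume


full_snapshot = {0: "boot", 1: "os", 2: "data-v1"}
monday = {0: "boot", 1: "os", 2: "data-v2", 3: "new"}
tuesday = {0: "boot", 1: "os", 3: "new-v2"}

monday_diff = take_differential(full_snapshot, monday)
tuesday_diff = take_differential(full_snapshot, tuesday)
```

Because each differential is taken against the same full snapshot, restoring to Monday or Tuesday needs only the full snapshot and the one differential for that day.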

    Early image and cloning software, as well as some current software, requires the system that is being imaged to be shut down and booted with a floppy disk, CD or USB device that hosts the imaging software in order to create or restore the image or clone. A number of current products, however, allow imaging or cloning of a live system. In the Windows environment most products use Microsoft's Volume Shadow Copy Service (VSS), also called Volume Snapshot Service, for this function. VSS is a set of services that are designed to provide consistent copies of Windows systems and applications such as Microsoft SQL Server and Exchange.

    There are live imaging systems for Macintosh OS and Linux as well. Apple's Time Machine, included with Macintosh OS X, can be used to create bootable backups, and there are several third-party products that do this as well. For Linux, Acronis Backup and Recovery 10 and the open source package Mondo Rescue can be used for live imaging.


    Continuous Data Protection and Near Continuous Data Protection

    When data is written to disk, a continuous data protection system saves that new or updated data to a backup system. A near continuous data protection system will capture changed data every few seconds or at pre-defined intervals instead of immediately upon disk write. For most purposes the effect of the two approaches is the same: data can be restored from nearly any point in time. Both approaches can have some effect on system performance and both generally consume more backup media space than more traditional approaches. Some CDP packages allow administrators to set event-driven points such as the monthly closing of the books.
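One way to picture continuous data protection is as a journal of every write that can be replayed up to any chosen moment. The Python sketch below is a simplified model with hypothetical class and method names; it assumes writes are recorded in timestamp order and keeps everything in memory.

```python
class ChangeJournal:
    """Toy CDP journal: replay writes up to a chosen point in time."""

    def __init__(self):
        self.entries = []   # (timestamp, path, data), in write order
        self.markers = {}   # named, event-driven recovery points

    def record_write(self, timestamp, path, data):
        self.entries.append((timestamp, path, data))

    def mark(self, name, timestamp):
        """Label a moment, such as a month-end close, for later recovery."""
        self.markers[name] = timestamp

    def restore_as_of(self, timestamp):
        """Return the contents of every file as of the given time."""
        state = {}
        for written_at, path, data in self.entries:
            if written_at > timestamp:
                break
            state[path] = data
        return state


journal = ChangeJournal()
journal.record_write(100, "ledger.db", "v1")
journal.record_write(200, "ledger.db", "v2")
journal.record_write(250, "notes.txt", "draft")
journal.mark("month-end close", 250)
journal.record_write(300, "ledger.db", "v3")

at_close = journal.restore_as_of(journal.markers["month-end close"])
```

The journal keeps every version rather than only the latest, which is why these approaches tend to consume more backup space than periodic backups.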

    Agent vs. Agentless Backup

    When the backup server or service is not running on the system being backed up, some method of data transfer must be employed. This can be accomplished by installing a special piece of software, an agent, that is written to specifically communicate with the backup system, or by using software that is already installed on the computer. This often means using standard communication protocols such as CIFS (SMB) or NFS. The agentless approach usually simplifies the rollout of a backup system and can also reduce overall costs. Agents, on the other hand, can often provide better communication between the backup server and client, allowing, for example, a client to tell the server about changes that need to be implemented in the backup. In agentless systems, as well as some agent-based systems, backup control is generally handled at the backup server. Agents are also used for application backup. An agent can make sure a database is in a consistent state for backup, for example.

    Windows Volume Shadow Copy Service (VSS)

    Volume Shadow Copy Service (VSS) is a set of services that are designed to provide consistent copies of Windows systems and Windows applications such as Microsoft SQL Server and Exchange. VSS has been included with Windows since Windows Server 2003. VSS allows the backup of open files, locked files and open databases. Backups created with VSS are called shadow copies. VSS can back up full volumes and, with the use of application-aware components, back up specific applications, such as Microsoft SQL Server and Exchange.

    For volumes, VSS can create clones, or complete volume copies, and differential copies, which are copies of data changed since the last full or clone backup. For databases such as SQL Server and Exchange, VSS can be used to create full backups, copy backups, incremental backups and differential backups.

    A full backup includes all selected databases but deletes transaction log files older than the start of the backup.

    A copy backup does not delete log files and will consume more disk space, but it does allow the ability to restore data from points in time prior to the backup, if that data is in the transaction logs.

    An incremental backup only backs up database changes since the last full or incremental backup and then deletes logs older than the start of the backup. When using incremental backups, to restore a database, you must have a full or copy backup and all subsequent incrementals. Generally, differential backups are preferred over incremental backups.

    A differential backup only backs up changes since the last full or copy backup but it does not delete pre-backup logs. When using differential backups you only need the full or copy backup and the last differential.
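The practical difference between incremental and differential schedules shows up at restore time. The following Python sketch works out which backup sets are needed; it is not VSS code, the function name and day labels are hypothetical, and it assumes a schedule uses either incrementals or differentials after the base backup, not a mixture.

```python
def restore_chain(history):
    """Return the labels of the backups needed to restore the latest state.

    history is a list of (label, kind) tuples, oldest first, where kind is
    "full", "copy", "incremental" or "differential".
    """
    base_positions = [i for i, (_, kind) in enumerate(history)
                      if kind in ("full", "copy")]
    if not base_positions:
        raise ValueError("no full or copy backup to restore from")
    base = base_positions[-1]
    chain = [history[base][0]]
    later = history[base + 1:]
    kinds = {kind for _, kind in later}
    if kinds == {"incremental"}:
        chain.extend(label for label, _ in later)  # every incremental
    elif kinds == {"differential"}:
        chain.append(later[-1][0])                 # only the newest one
    elif kinds:
        raise ValueError("mixed schedules are outside this sketch")
    return chain


incremental_week = [("Sun", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]

needed_incremental = restore_chain(incremental_week)
needed_differential = restore_chain(differential_week)
```

With incrementals, a single missing or damaged set in the middle of the week breaks the chain, which is one reason differentials are often preferred.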

    Some backup systems provide for transaction log backup through VSS as well.


    A VSS requester, which is usually a component of the backup software, starts the creation of the backup, or shadow copy. A VSS writer, usually using copy on write, will make sure the data being backed up is in a consistent state. A VSS provider creates the copy.

    Most current software that backs up Windows uses VSS to some degree.

    Encryption and Password Protection of Backup Media

    Encryption and password protection are often used for media that will be physically transported from one site to another or will be stored in an unsecured location. In some industries legal and/or regulatory compliance may require encryption of such media. Accidental disclosure of personal health records or financial data can have severe repercussions, even if specific laws or regulations are not violated.

    Some backup programs, such as older versions of Symantec Backup Exec and EMC Networker, provide password protection but not encryption. This makes unauthorized restoration of data difficult but not impossible.

    Advanced Encryption Standard (AES)

    Advanced Encryption Standard (AES) is an encryption standard adopted by the U.S. government as Federal Information Processing Standard (FIPS) 197 in 2001. AES is the encryption standard used by most enterprise-level backup systems. AES supports key sizes of 128, 192 and 256 bits.

    Tape Drive-based Encryption

    As of version 4, the Linear Tape Open (LTO) tape format supports hardware-based encryption at the tape drive. Although encryption is available for LTO-4 and LTO-5 tape drives, it is not implemented in all drives, so if it is used, both the backup drive and the restore drive, if different, must support encryption.

    Encryption Issues

    There are several issues to consider when you decide to encrypt data.

    Performance. Encryption uses CPU cycles, so it will affect performance of the system doing the encryption.

    Key Management. In simplest terms, an encryption key is a randomly generated piece of information that determines the output of a cryptographic process or algorithm. Once data is encrypted with a particular key, the appropriate key (with AES it is the same key) is required for decryption. When encryption is used for backup systems it is absolutely critical to make sure the key is available when data restoration is necessary. Effective key management procedures must be in place to make sure keys are properly generated, stored, used and replaced if necessary. Keys and key management procedures must be stored and backed up outside the systems to which they apply so that they are available in an emergency.

    Encryption and Backup Data Compression. If both compression and encryption are used on a backup system, the data should be compressed before it is encrypted, because encrypted data does not compress well. If software encryption is used then compression should be disabled on the backup device. If LTO hardware encryption is being employed then both compression and encryption can be performed by the tape drive.
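The advice to compress before encrypting can be demonstrated with a short experiment. The Python sketch below uses `zlib` for compression and a toy XOR stream cipher built from SHA-256 as a stand-in for AES. The toy cipher is an assumption made so that the example needs only the standard library; it is not secure and must not be used to protect real data.

```python
import hashlib
import zlib


def toy_keystream(key, length):
    """Derive a pseudo-random byte stream from key (demonstration only)."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(stream[:length])


def toy_encrypt(key, data):
    """XOR data with the keystream; applying it twice decrypts."""
    return bytes(a ^ b for a, b in zip(data, toy_keystream(key, len(data))))


key = b"demo key, not for real use"
data = b"customer record; " * 4000  # highly repetitive sample data

compress_then_encrypt = toy_encrypt(key, zlib.compress(data))
encrypt_then_compress = zlib.compress(toy_encrypt(key, data))

print(len(data), len(compress_then_encrypt), len(encrypt_then_compress))

# Restoring reverses the steps: decrypt first, then decompress.
restored = zlib.decompress(toy_encrypt(key, compress_then_encrypt))
```

Compressing first shrinks the repetitive sample to a small fraction of its size, while compressing the encrypted output saves essentially nothing, because the cipher has removed the patterns that compression relies on.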

    Backup Data Compression

    Data compression is the process of encoding data so it uses less media space. Standard compression algorithms usually operate on the bit level by removing redundant bits of data and replacing them with codes that can be used to restore that data on read. Data compression is supported by most backup systems. Compression can be provided by the backup application, the tape drive, or, if backing up to disk, the operating system of the backup disk. Application-based compression is sometimes proprietary, so the compressed data can only be read by that software. Tape drive-based compression is transparent to the backup software and does not affect readability of the tape. Current Windows operating systems provide for transparent or on-write compression. Transparent compression is not native to any of the current production Linux file systems. Compression and decompression both affect system performance, since the processes use CPU cycles, RAM and disk space.

    Data Deduplication

    Data deduplication eliminates redundant data to reduce storage requirements. Pointers are used to reference the single unique instance of the data retained on the storage system. Depending on the type of data, deduplication can significantly reduce storage requirements. For example, an e-mail system might maintain copies of a file attachment in multiple users' mailboxes. With data deduplication only one copy is maintained. Currently deduplication is used primarily for backup and archiving systems. Although data deduplication can be used on primary file systems, system overhead, lack of standards and lack of direct operating system support³ make this less attractive.

    File Mode and Block Mode

    There are two primary modes of data deduplication: file mode and block mode. File mode looks for duplicate files while block mode looks for duplicate blocks of data within files.

    Block deduplication can be either fixed block deduplication or variable block deduplication. Fixed block deduplication looks for identical data blocks, while variable block deduplication uses more intelligent, thus more processor-intensive, algorithms to look for identical data within blocks.

    The effectiveness of the three modes varies with the type of data being stored. Generally, file deduplication is the least effective in terms of data reduction but has the least system overhead, variable block deduplication is the most effective, but with the greatest system overhead, and fixed block deduplication falls somewhere in the middle.
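Fixed block deduplication, the middle option described above, can be sketched in Python as a store that keeps one copy of each unique block plus a list of pointers per file. The class name, the 4,096-byte block size and the mailbox file names are assumptions for illustration, and the sketch ignores practical concerns such as persistence and reference counting.

```python
import hashlib


class BlockStore:
    """Toy fixed-block deduplicating store."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # SHA-256 digest -> the single stored copy
        self.files = {}   # file name -> ordered list of digests (pointers)

    def put(self, name, data):
        pointers = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if new
            pointers.append(digest)
        self.files[name] = pointers

    def get(self, name):
        """Reassemble a file by following its pointers."""
        return b"".join(self.blocks[digest] for digest in self.files[name])

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = BlockStore()
attachment = b"".join(bytes([i]) * 4096 for i in range(4))  # 4 distinct blocks

# The same attachment lands in three hypothetical mailboxes.
for user in ("alice", "bob", "carol"):
    store.put(f"{user}/report.pdf", attachment)
after_copies = store.stored_bytes()

# A fourth user saves a version whose last block differs.
edited = attachment[:-4096] + b"\xff" * 4096
store.put("dave/report.pdf", edited)
after_edit = store.stored_bytes()
```

File mode would have stored the edited version as a complete second file, whereas block mode adds only the one changed block, which illustrates the difference in data reduction between the two modes.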

    In-Line or Post-Processing Deduplication

    In a backup or archiving environment, deduplication can be applied in two ways: in-line or post-process. In-line deduplication operates as data is being written to a target device. If a new block (or file) is the same as an existing block, the new block is not written to the storage device. Instead, a pointer is set to the existing block (or file). With post-process deduplication, data is written to disk as it is received and then analyzed and deduplicated after the fact. In-line deduplication uses RAM instead of disk space, but it can affect performance while data is being written to disk.

    Post-process deduplication is part of an intelligent disk target, which is associated with a disk library. In post-process deduplication, backup data is written to a disk staging area, where the dedupe process works on the data at a later point in time. Post-process deduplication allows the use of most backup software choices, certainly all of the mainstream options.

    While sufficient storage must exist to hold a complete first copy in the scratch pool, the low implementation cost of a SATA disk library will offset that cost when compared with an in-line, server-based solution. Specifically, the in-line approach to deduplication takes a great deal of computing capacity, is still a slower performer, and carries greater risk than the post-process approach. Servers are also expensive from a CAPEX point of view, and they are more power hungry than an efficient SATA array. More power produces more heat, and that results in greater costs for cooling.

    ³ Sun Microsystems' (now Oracle's) ZFS file system includes block deduplication support. ZFS is supported on current versions of Oracle Solaris, OpenIndiana (formerly OpenSolaris) and FreeBSD.

    Another major benefit of post-processing is that data is moved to the safety of the disk library without being slowed by deduplication processes in a server, as happens with in-line deduplication. As a result, post-processing systems will accept data and perform at much faster rates than the in-line approach.

    In-line deduplication requires the use of specific backup software clients, which is the first objection to its implementation. Not everyone wants to abandon backup software that is currently in use and that people are happy with. This forced replacement is seen as disruptive and will require training and new processes. The specific client software associated with the in-line approach talks to a server that also runs specific backup software, in order to identify files and the hash, or fingerprint, that has been created for each file. This is done to determine what action, if any, should be taken to deduplicate the file.

    In-line deduplication suppliers will tout the benefits of reduced file traffic across LAN/WANs and decreased storage capacity, since they don't use a scratch pool. However, the in-line approach is in the data path and can slow down the incoming backup and other applications that are trying to use the same SAN ports.

    This is why in-line dedupe devices slow down application performance. Although the largest performance risks are associated with applications moving large streams of data, any application can be impacted. A major risk arises if a restore is required while a backup is underway. If a disaster happens with in-line deduplication, the server's capacity is already consumed by the very CPU-intensive operations involved in deduplicating the backups that are running. If the in-line server is then asked to restore deduplicated data, an additional load is placed on the server to rehydrate that data, which is also very CPU intensive. As a result, not only do the backup jobs slow down, but the restore will be slow as well. Most organizations find the cost of downtime to be a critical concern to the health of the company, and slowing down a restore could have significant economic ramifications for the business. It is better to avoid this potential risk; after all, Murphy's Law prevails.

    By deduplicating in-line, performance is limited by the speed of the deduplication engine and scalability is typically limited as well. Building out an infrastructure to hit desired performance levels in large environments can be quite expensive.

    What suppliers of in-line deduplication solutions won't talk about are the disadvantages of having to change backup software, the slower overall performance, and the potential increase in TCO caused by the expensive, power-hungry servers needed to run this approach.

    One final approach is a hybrid called concurrent processing. Concurrent processing still moves data to a disk staging area first, but doesn't wait for backups to finish before deduping.

    Backup Performance

    In backing up, performance is a function of two things:

    1) Capture
        a. Number of network port connections and their performance abilities, available bandwidth, and congestion
        b. The Intelligent Disk Target (IDT) (a.k.a. Disk Library) performance abilities
            i. The network and IDT work together to transport data from a backup server's backup application and capture it to the IDT cache.

    2) Post Processing
        a. The performance of the IDT's back end, which is used to read data from the backup repository, to analyze the cached data using a hashing algorithm, and then to deduplicate the data down to the block level to create a block-level repository. It is reasonable to expect that the capture speed to the cache will be slightly faster than the creation of the block-level repository.

    Restore Performance

    Restoring data to recover an application is always critically sensitive to speed. When restoring from deduplicated data, the volume being restored must be complete. While it is possible to optimize the backup by eliminating redundancy, the restore volume must be re-inflated to restore all the duplicate copies. Therefore, while the total amount of data representing a backup of a data space can be 20 or more times smaller than the original data it backed up, when doing a restore the entire data volume, including duplicated data, must be written back as a part of the restore.

    Even though time is spent in processing pointers to re-inflate the data, overall restore speed will be approximately the same as the spee