TRANSCRIPT
Third Intelligent Storage Consortium, University of Minnesota
IBM Labs in Haifa, © 2005 IBM Corporation
Supporting continuous availability
Alain Azagury, [email protected]
IBM Labs in Haifa
Agenda
Background and motivation
Key building blocks
Characteristics of continuous remote copy
Deep dive
  Synchronous remote copy: Metro Mirror (PPRC)
  Asynchronous remote copy: Extended Remote Copy (XRC), Global Mirror
The latest “hot” technology in the market…
Summary
Scope of this Talk (or Disclaimer)
Focus on (block) storage controllers
  As opposed to host-based, switch-based, or file-system solutions
Focus on IBM solutions
  The topic is not widely described in the literature
Focus on core technologies in storage controllers
  An end-to-end view of business continuity would include:
    Application- and database-specific automation of core technologies for business continuity
    Integration of core technologies into server environments and automation to enable business continuity
    Core technologies for backup and restore, disaster recovery and continuous operations
    Hardware infrastructure
Trends
By 2008, 45% of Global 2000 users will utilize two data centers to deliver continuous availability; of these, 25% will support real-time recovery.
By 2006, more than 60% of G2000 data centers will utilize capacity on demand to satisfy less critical recovery services.
Through 2008, more than 50% of G2000 users will utilize a single "hardened" data center augmented by third-party services to deliver traditional, cost-effective disaster recovery services (48- to 72-hour recovery).
META Trend, 3/8/04
Some lessons from September 11
Recovery requires less of a dependency on people and a greater dependency upon automation
The "rolling disaster" scenario was validated
Disasters may cause multiple companies to recover at once, which puts stress on the commercial business recovery services
Recovery of data from distributed systems and desktops ranged from grade "A" to grade "F"
Tape, as well as disk, is a crucial part of the recovery capability
Rethinking of distance between data centers
Rethinking of synchronous versus asynchronous
D/R plan after successful recovery from disaster
Recovery Metrics
[Chart: the seven recovery tiers plotted by time to recover against cost (TCO of servers, network bandwidth and storage)]
Time to Recover: how quickly is the application recovered after a disaster? (15 min., 1-4 hr., 4-8 hr., 8-12 hr., 12-16 hr., 24 hr., days)
Recovery Point Objective: amount of lost data
Tier 7 - RPO = near zero, RTO < 1 hr.: server/workload/network/data automatic site switch (active secondary site)
Tier 6 - RPO = near zero, RTO = manual: disk or tape data mirroring
Tier 5 - RPO > 15 min., RTO = manual: PiT or SW data replication
Tier 4 - Database log replication and host log apply at remote
Tier 3 - Electronic tape vaulting
Tier 2 - PTAM and hot site; point-in-time backup to tape
Tier 1 - PTAM*
RPO/RTO bands on the chart: RPO = 24+ hours, RTO = days; RPO = 4+ hrs., RTO = 4+ hrs.; RPO = 0 to seconds, RTO = < 1 hr. to 4 hrs.
*PTAM = Pickup Truck Access Method
Key Building Blocks
Point-in-Time Copy
  The ability to create a consistent snapshot of large volumes of data, potentially spanning multiple controllers
  IBM’s FlashCopy, EMC’s TimeFinder, HDS’s ShadowImage
Continuous replication
  Synchronous: IBM Metro Mirror, EMC SRDF
  Asynchronous, no consistency guarantees: IBM Global Copy, SRDF/Adaptive
  Asynchronous, consistency guarantees: IBM Global Mirror, EMC SRDF/A, HDS TrueCopy
  zSeries only: Global Mirror for zSeries (XRC)
Point-in-Time Copy
Three major techniques
  Split mirror
  Changed block
  Concurrent
Current expectations
  Consistent across thousands of volumes and multiple controllers
  Production I/O cannot be disrupted for more than 100’s of milliseconds
  Target copy needs to survive failures
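The "changed block" technique above can be sketched as copy-on-write: taking the snapshot moves no data, and a block's original contents are preserved only when it is first overwritten. This is a minimal illustration with hypothetical names, not the FlashCopy implementation.

```python
# Minimal copy-on-write sketch of the "changed block" point-in-time
# copy technique: snapshot creation is instantaneous; the old contents
# of a block are saved to the snapshot area only on first overwrite.
class Volume:
    def __init__(self, nblocks):
        self.blocks = [b"\0"] * nblocks
        self.snapshot = None          # block index -> original contents

    def take_snapshot(self):
        self.snapshot = {}            # no data moved at snapshot time

    def write(self, idx, data):
        if self.snapshot is not None and idx not in self.snapshot:
            self.snapshot[idx] = self.blocks[idx]  # preserve old data once
        self.blocks[idx] = data

    def read_snapshot(self, idx):
        # Snapshot view: saved original if the block changed, else current.
        if self.snapshot is not None and idx in self.snapshot:
            return self.snapshot[idx]
        return self.blocks[idx]

vol = Volume(4)
vol.write(0, b"v1")
vol.take_snapshot()
vol.write(0, b"v2")               # triggers copy-on-write of block 0
assert vol.read_snapshot(0) == b"v1" and vol.blocks[0] == b"v2"
```

Because nothing is copied up front, production I/O sees only the brief first-write overhead, which is why such copies can span thousands of volumes with sub-second disruption.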
Continuous Remote Copy – Characteristics
Consistency – does the remote copy reflect a consistent view of the data as seen at the source site at some point in time?
No guarantee
  With enough time without modifications, a consistent state will be reached
Power-failure consistency
  Dependent-writes consistency
Application consistency
  Assumes application-specific knowledge or application control
  Application performs regular checkpoints
  Allows faster restarts, at the expense of currency
Continuous Remote Copy – Characteristics (cont.)
Currency – how out of date are the data at the remote site?
Synchronous
  No data loss
  However – what about rolling disasters such as virus contamination or other data corruption issues?
Asynchronous
  Minimal impact on application performance
  Hot-spot data transfer reduction
  Better bandwidth utilization
Continuous Remote Copy – Characteristics (cont.)
Latency impact – what is the impact on application response time?
Impact felt on application writes
For synchronous solutions, a function of the distance between sites
  At least 5 µsec per kilometer (10 µsec round trip), based on the speed of light in glass
  Compare to < 1 msec for local writes
For asynchronous solutions, a function of the overhead of bookkeeping
  Can be as simple as setting a bit in a bitmap, or more complex, such as queuing a message to transfer the data
Continuous Remote Copy – Characteristics (cont.)
Bandwidth requirements – what network bandwidth is required for the solution?
For synchronous solutions, it is determined by the peak write load
For asynchronous solutions, it is determined by the average write load and the tolerance to lag in currency
Additional considerations
  Transfer of modified bytes only
  Level of granularity for modified-data bookkeeping
  Hot-spot elimination
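The peak-versus-average distinction can be made concrete with a toy queueing model (illustrative only; the sample write rates are hypothetical):

```python
# Link-sizing sketch: a synchronous link must carry the peak write
# rate, while an asynchronous link sized near the average rate absorbs
# bursts as replication backlog, i.e. as currency lag.
def sync_link_mbps(write_samples_mbps):
    return max(write_samples_mbps)          # peak write load

def max_lag_seconds(write_samples_mbps, link_mbps, sample_period_s=1.0):
    """Simulate the replication backlog; return the worst currency lag."""
    backlog_mb, worst_lag = 0.0, 0.0
    for w in write_samples_mbps:
        backlog_mb = max(0.0, backlog_mb + (w - link_mbps) * sample_period_s)
        worst_lag = max(worst_lag, backlog_mb / link_mbps)
    return worst_lag

writes = [100, 100, 800, 100, 100, 100]     # Mb/s, one write burst
assert sync_link_mbps(writes) == 800        # sync link sized for the peak
assert max_lag_seconds(writes, 200) == 3.0  # async 200 Mb/s link lags 3 s
```

If a 3-second lag is within the tolerated RPO, the asynchronous link can be a quarter of the synchronous one in this example.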
Synchronous Replication – ESS Metro Mirror (PPRC)
Ensures that the data written will be applied to the secondary before the application host is notified
Consists of two major phases
  Full-track asynchronous transfer mode
    During initial establishment or during resynchronization
  Changed-sector synchronous transfer mode
    Sends only modified sectors to the secondary volume
Supports distances of up to 300 km
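The steady-state (changed-sector) write path can be sketched as follows. The `Controller` class and function names are hypothetical stand-ins, not the PPRC interface; the point is the ordering of the acknowledgment:

```python
# Sketch of the synchronous write path: the host is acknowledged only
# after the secondary confirms the changed data, so the acknowledged
# state can never be lost if the primary site fails.
class Controller:
    def __init__(self):
        self.store = {}

    def write(self, volume, sector, data):
        self.store[(volume, sector)] = data   # fast write into cache/NVS

def sync_write(primary, secondary, volume, sector, data):
    primary.write(volume, sector, data)       # write to primary
    secondary.write(volume, sector, data)     # send only the modified sector
    return "write complete"                   # ack to host happens last

primary, secondary = Controller(), Controller()
assert sync_write(primary, secondary, "vol1", 0, b"x") == "write complete"
assert secondary.store[("vol1", 0)] == b"x"   # secondary matches on ack
```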
Synchronous Replication – ESS Metro Mirror (PPRC): Characteristics
No data loss
Has a direct impact on write processing time
  Processing time in primary ESS to send modified blocks
  Processing time at secondary site (fast write)
  Network delay time
Consistency
  Single-volume consistency guaranteed due to the synchronous nature of the transfer
  Consistency groups allow consistency across volumes (and controllers) in the event of volume suspension
    When a volume pair becomes "suspended", changed tracks are recorded in a bitmap
    However, other volume pairs must be prevented from continuing to receive updates
    PPRC provides a message to the host processors, and commands to freeze all secondary volumes upon detection of the first failing volume
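The freeze idea above can be sketched in a few lines. The classes are hypothetical (the real freeze is driven by host commands across controllers); what matters is that after the first pair fails, no pair keeps updating its secondary, and changes are only recorded for later resynchronization:

```python
# Sketch of consistency-group "freeze": on the first suspended volume
# pair, mirroring stops for every pair in the group, so the secondary
# site stays at a single dependent-write-consistent instant.
class VolumePair:
    def __init__(self, name):
        self.name = name
        self.mirroring = True
        self.changed_tracks = set()   # bitmap of tracks to resync later

    def write(self, track):
        if self.mirroring:
            pass                      # normally: synchronous send to secondary
        else:
            self.changed_tracks.add(track)  # suspended: just record the change

def freeze_group(pairs):
    for p in pairs:
        p.mirroring = False           # no pair may keep updating its secondary

pairs = [VolumePair("db-log"), VolumePair("db-data")]
freeze_group(pairs)                   # triggered on the first failing pair
pairs[1].write(42)                    # update recorded, not mirrored
assert all(not p.mirroring for p in pairs)
assert pairs[1].changed_tracks == {42}
```

Without the group-wide freeze, a database's data volume could keep mirroring after its log volume suspended, leaving the secondary with writes that depend on log records it never received.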
Synchronous Replication – ESS Metro Mirror (PPRC): Performance
[Chart: local writes (no PPRC) vs. synchronous copy (PPRC at 75 km) vs. asynchronous copy with no consistency guarantees (PPRC-XD)]
Performance measurements from the test lab of the IBM Storage Systems Group, March 22, 2002
Asynchronous Replication – Extended Remote Copy (XRC)
Supports a single zSeries or a zSeries Parallel Sysplex
The controller puts information about the write operations in a "side file"
  zSeries I/O operations include a timestamp provided by the host; the zSeries Parallel Sysplex has a cluster-wide timer facility
  The side file also holds a pointer to the modified data
An external data-mover process issues commands to the primary control unit to read the host's modifications
  The timestamps allow the data-mover process to ensure causal consistency among the writes
Note: if a write is received for data that are referenced from the side file before those data are transferred to the secondary site, the control unit cannot allow the data to be overwritten
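The mechanism above can be sketched as a timestamp-ordered log drained by an external mover. The data structures are illustrative, not the XRC implementation; the key property is that the secondary only ever reflects a causally consistent prefix of the writes:

```python
# Sketch of the XRC idea: each write's host-supplied timestamp goes
# into a side file; the data mover drains it in timestamp order, so
# applying everything up to some time T yields a consistent state.
import heapq

side_file = []                 # min-heap of (timestamp, volume, sector, data)

def log_write(ts, volume, sector, data):
    heapq.heappush(side_file, (ts, volume, sector, data))

def data_mover_drain(secondary, up_to_ts):
    """Apply all logged writes with timestamp <= up_to_ts, in order."""
    while side_file and side_file[0][0] <= up_to_ts:
        ts, volume, sector, data = heapq.heappop(side_file)
        secondary[(volume, sector)] = data

secondary = {}
log_write(2, "v1", 0, b"new")
log_write(1, "v1", 0, b"old")   # arrives out of order, sorts by timestamp
data_mover_drain(secondary, up_to_ts=1)
assert secondary[("v1", 0)] == b"old"
data_mover_drain(secondary, up_to_ts=2)
assert secondary[("v1", 0)] == b"new"
```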
Asynchronous Replication – Extended Remote Copy (XRC)
Asynchronous, continuous remote copy solution for zSeries data
Supported by the disk subsystem
Driven by software running on a zSeries host
Asynchronous Replication – ESS Global Mirror
Asynchronous solution for zSeries, iSeries and open systems
Consistency always maintained at the mirrored site
Mirror lags in currency by as little as 5 seconds
A tertiary copy is required to preserve consistency
Data loss limited to data in queue or in transit
Consistent Asynchronous Mirroring
1. Write
2. Write acknowledgment (channel end / device end)
3. Write to secondary
4. Write acknowledged by secondary
Asynchronous Replication – ESS Global Mirror
[Diagram: volumes A at the local site are mirrored over the SAN to volumes B at the remote site; FlashCopy creates the tertiary volumes C]
Copy consistency managed by Master Control Server
Uses tertiary copy to ensure consistency
Applies a point-in-time copy every 5 seconds
Asynchronous Replication – ESS Global Mirror: Consistency Group Formation
1. Coordinate ESSs; record new writes at the local site
2. Let consistent data drain to the remote site
3. Prepare FlashCopy
4. Commit FlashCopy
The cycle repeats every CG Interval Time*
*Consistency Group Interval may be set from 0 seconds (consistency continuously formed) up to 18 hours
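The cycle above can be sketched end to end. The classes and method names are hypothetical stand-ins for the controllers' roles; the point is that new writes are segregated at coordination time, drained to the B volumes, and only then captured into the C volumes:

```python
# Sketch of one Global Mirror consistency-group cycle: coordinate,
# drain, then FlashCopy B -> C so C always holds a consistent group.
class LocalESS:
    def __init__(self):
        self.current, self.pending = [], []

    def write(self, data):
        self.pending.append(data)       # host writes accumulate here

    def start_new_group(self):
        # Coordinate: freeze the group; new writes recorded separately.
        self.current, self.pending = self.pending, []

    def drain_to(self, remote):
        remote.b.extend(self.current)   # consistent data reaches volumes B
        self.current = []

class RemoteESS:
    def __init__(self):
        self.b, self.c = [], []

    def prepare_flashcopy(self):
        self._staged = list(self.b)

    def commit_flashcopy(self):
        self.c = self._staged           # volumes C: the new consistent set

def form_consistency_group(local_ess_list, remote):
    for ess in local_ess_list:
        ess.start_new_group()
    for ess in local_ess_list:
        ess.drain_to(remote)
    remote.prepare_flashcopy()
    remote.commit_flashcopy()

local, remote = LocalESS(), RemoteESS()
local.write("w1"); local.write("w2")
form_consistency_group([local], remote)
local.write("w3")                       # recorded for the next group
assert remote.c == ["w1", "w2"]
```

If a disaster strikes mid-drain, B may be inconsistent, but C still holds the last committed group, which is why the tertiary copy is required.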
Asynchronous Replication – ESS Global Mirror: Characteristics
Minimal impact on write processing time
  Consistency group formation will inhibit writes from completing until all controllers have acknowledged
Minimal data loss
  Can create a consistency group every 5 seconds
Consistency
  Maintains power-failure consistency across heterogeneous sets of volumes (zSeries, iSeries, open volumes)
  Volumes "C" are the consistent set
    Except for the case where the FlashCopy from "B" to "C" succeeds partially, in which case volumes "B" are consistent
Continuous Data Protection
What is it?
  A new paradigm in data protection
  A storage mechanism that keeps a time-ordered history of application writes
  Granularity of history varies between products, from every write to every few minutes
  State of storage can be quickly reverted to any previous point in time
Major issues
  Space efficiency
  Management of, and adaptation to, current applications
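At its finest granularity (every write), CDP is essentially a time-ordered journal that can be replayed to any instant. A minimal sketch of that idea (illustrative structures only; real products index and prune this history, which is exactly the space-efficiency issue above):

```python
# Minimal CDP sketch: journal every write with its timestamp, then
# reconstruct the storage state as of any earlier point in time.
journal = []                        # time-ordered list of (ts, block, data)

def cdp_write(ts, block, data):
    journal.append((ts, block, data))   # appends arrive in time order

def state_as_of(ts):
    """Reconstruct block -> data as it stood at time ts."""
    state = {}
    for t, block, data in journal:
        if t > ts:
            break                   # later writes are ignored
        state[block] = data
    return state

cdp_write(1, "b0", b"v1")
cdp_write(2, "b0", b"v2")           # e.g. a corrupting overwrite
cdp_write(3, "b1", b"x")
assert state_as_of(1) == {"b0": b"v1"}   # revert to before the overwrite
assert state_as_of(3) == {"b0": b"v2", "b1": b"x"}
```

This is what distinguishes CDP from mirroring for rolling disasters: a virus-corrupted write is mirrored faithfully, but with a write history the pre-corruption state remains recoverable.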
Summary
Continuous availability is moving from a Fortune 500 requirement to the masses
  Small and medium businesses are requiring continuous availability
  New regulations impose additional requirements
    Additional copies (synchronous and asynchronous)
    Longer distances
Advanced controllers offer sophisticated infrastructure to support continuous availability
  Point-in-time copy, and synchronous and asynchronous remote copy
Requirements are becoming more stringent
  Better support for rolling disasters
  Enhanced resiliency to failures