Is It Time to Reassess Your Availability Approach?

1 © Copyright 2013 EMC Corporation. All rights reserved. Is it Time to Reassess Your Availability Approach? David Edborg Chief Architect EMCC Assured Availability Services

Post on 25-Feb-2016

TRANSCRIPT

Page 1: Is it  Time  to  Reassess Your Availability Approach ?

Is it Time to Reassess Your Availability Approach?

David Edborg
Chief Architect, EMCC Assured Availability Services

Page 2: Is it  Time  to  Reassess Your Availability Approach ?

Abstract

Breakout C: Is It Time to Reassess Your Availability Approach?

Information availability is in the middle of a period of intense change. What were acceptable business continuity and disaster recovery strategies even a year ago are now being questioned, and organizations must constantly re-examine information availability capabilities. For IT infrastructure, data center and disaster recovery managers, it has become a challenge to keep up. But this presentation can help. We will address the following questions:

1. What's causing the current shift in information availability?
2. What are the emerging technologies that IT managers should be discussing with their teams?
3. Is near-zero recovery time a realistic expectation?
4. How can companies deliver always-on IT while reducing cost and risk?

David Edborg
Chief Architect, EMC Availability Services
EMC

Page 3: Is it  Time  to  Reassess Your Availability Approach ?

Which Philosophy Do You Subscribe To?

Moore’s Law
Gordon Moore: “… the number of transistors on a microchip would double every two years.”

Or

Gretzky’s Rule
Gretzky: “I skate to where the puck is going to be, not where it has been.”

Page 4: Is it  Time  to  Reassess Your Availability Approach ?

Availability Challenges Today

Pressure to Reduce Spend and Expand Services

HA and DR Increased Complexity and Cost

Users Expect Zero Downtime

Planned Outage Approval Difficult
Unplanned Outages Unacceptable

Page 5: Is it  Time  to  Reassess Your Availability Approach ?

Availability Impact of Event Types

1. Data Center Move (< 1%)
• Relocations
• Natural Disaster
• Business Change, Merger or Acquisition

2. Unscheduled Events (15%)
• Technical Failure
• Operational Failure

3. Scheduled Events (~85%)
• Maintenance, migrations, backups/restores, batch jobs, installations or upgrades
• Data warehouse extracts, builds, and loads

Page 6: Is it  Time  to  Reassess Your Availability Approach ?

Typical IT Availability Charter

External
• Maximize Uptime

Internal
1. Provide Scheduled Outages
2. Handle Unscheduled Outages
3. Accommodate Data Center Moves

Page 7: Is it  Time  to  Reassess Your Availability Approach ?

Traditional DR vs. Continuous Availability

Traditional DR: Utility company has a power failure at primary Data Center; the backup DC, with Tier-1 standby equipment, is 40 miles away and has power
• Downtime: 16 hours
• Decision: Wait until power is restored
• Reason: Too long to fail over and fail back; critical apps have DR, non-critical do not

Continuous Availability: Hospital has a power failure at primary Data Center, with a Continuous Availability (CA) Architecture
• Downtime: 0 hours
• Decision: No event, no decision
• Reason: CA environment with load-balanced production enabled in both centers; if one site goes down, the processing load automatically migrates to the other

Page 8: Is it  Time  to  Reassess Your Availability Approach ?

Why Rethink DR Traditions?
• Expensive to Implement
• Expensive to Maintain
• Expensive to Test
• … Unreliable
• And it isn’t useful for most Availability Events

During your last major disruption, did you even consider using your DR solution?

Page 9: Is it  Time  to  Reassess Your Availability Approach ?

So, … how did we get here?

Two Different Disciplines & Technologies to Deal with SPOFs

Traditional Disaster Recovery: Tape Backup and Offsite Rotation
Advanced Recovery: Replication to Second Site
High Availability: In-data-center Application Restart

Page 10: Is it  Time  to  Reassess Your Availability Approach ?

Where is the Technology Going?

Convergence

Continuous Availability: Application continues without disruption (0-Downtime)
Traditional Disaster Recovery: Tape Backup and Offsite Rotation
High Availability: In-data-center Application Restart
Advanced Recovery: Replication to Second Site

Page 11: Is it  Time  to  Reassess Your Availability Approach ?

The Journey to Continuous Availability

Convergence

Continuous Availability: Application continues without disruption (0-Downtime)

One Common Discipline & Set of Technologies to Deal with SPOFs

Page 12: Is it  Time  to  Reassess Your Availability Approach ?

Why Make the Journey?

Single availability solution:
+ Eliminate downtime for multiple scenarios
+ Eliminate idle assets
+ Reduce the costs of DR/HA testing

Make Verification/Auditing easy

Potential 28-50% Reduction in Compute Cost …

Page 13: Is it  Time  to  Reassess Your Availability Approach ?

Continuous Availability Characteristics

1. Two-site Parallel Transaction Processing Architecture
• Off-the-Shelf Technology
• Non-invasive application adaptation

2. Continuous Availability (CA) Service Level
• Always-On
• App or Service always available in at least one site
• Able to sustain all single failures including site loss
• Transactions automatically re-routed

3. CA Apps maintained with little to no disruption

[Diagram: Site A and Site B, each with Distribution, Presentation, Application, DB / File, and Storage layers; the sites are joined by Layer-2 Adjacency over a DCI and share a Distributed Virtual Volume. Transactions Flow to Either Site; Transactions Redirect on Site Failure.]
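The routing behavior named on this slide (transactions flow to either site, and redirect on site failure) can be pictured with a toy model. This is only a sketch under stated assumptions, not EMC's implementation: the `Site` class, the `route` function, and the round-robin policy are hypothetical names and choices made for illustration; a real deployment would rely on a global load balancer.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical stand-in for one data center in a two-site CA pair."""
    name: str
    online: bool = True


def route(transaction_id: int, sites: list[Site]) -> str:
    """Pick a site for a transaction, skipping any site that is offline.

    Round-robin over the healthy sites is an assumed policy for illustration.
    """
    healthy = [s for s in sites if s.online]
    if not healthy:
        raise RuntimeError("no site available; CA assumes at least one site is up")
    return healthy[transaction_id % len(healthy)].name


site_a, site_b = Site("Site A"), Site("Site B")
sites = [site_a, site_b]

# Both sites up: transactions flow to either site.
normal = [route(i, sites) for i in range(4)]

# Site A is lost: every transaction is redirected to Site B.
site_a.online = False
after_failure = [route(i, sites) for i in range(4)]

print(normal)
print(after_failure)
```

The point of the sketch is that site loss changes only the set of healthy targets; no fail-over decision is modeled, which mirrors the "no event, no decision" framing earlier in the deck.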

Page 14: Is it  Time  to  Reassess Your Availability Approach ?

Availability Architecture Transformation

Use | Production | 5x9s HA | Tier-1 DR | Total | Over-Provision
Traditional Tier-1 DB (DB Cluster) | Svr 100% | HA+1 100% | DR 100% | Svr 300% | △ 200%
Traditional Tier-1 Web | Svr 100% | HA+.2 20% | DR 100% | Svr 220% | △ 120%
EMC CA (Fractional Provisioning) | Site A 60% | Site B 60% | 0 | 120% | △ 20%

Reducing the Cost Curve by Putting Idle Assets to Use
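The totals and over-provision figures on this slide follow from simple addition, which the short sketch below reproduces. The percentages are taken from the slide; the dictionary layout and the function names `total_percent` and `over_provision` are assumptions made for illustration.

```python
# Provisioning per use case, as a percentage of production need (slide figures).
provisioning = {
    "Traditional Tier-1 DB": {"production": 100, "ha": 100, "dr": 100},
    "Traditional Tier-1 Web": {"production": 100, "ha": 20, "dr": 100},
    # For CA the two components are the two sites at 60% each; no separate DR.
    "EMC CA (fractional)": {"site_a": 60, "site_b": 60, "dr": 0},
}


def total_percent(parts: dict[str, int]) -> int:
    """Sum of all provisioned capacity, as a percentage of need."""
    return sum(parts.values())


def over_provision(parts: dict[str, int], need: int = 100) -> int:
    """Capacity provisioned beyond the production need, in percentage points."""
    return total_percent(parts) - need


summary = {
    name: (total_percent(parts), over_provision(parts))
    for name, parts in provisioning.items()
}

for name, (total, extra) in summary.items():
    print(f"{name}: total {total}%, over-provision {extra}%")
```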

Page 15: Is it  Time  to  Reassess Your Availability Approach ?

Fractional Compute Model Concepts

• Provision each site with average compute
– Presumption: most servers are modeled to run at 55% - 70%
– Headroom is used for peaking
• Aggregate in pool provides: need, peaking, and loss protection (traditional HA and DR)

Fractional Computing Math:

Traditional Allocation
Need 5
HA 1
DR 5
Total 11

60% Fractional Allocation
Need = 5
Site-A 60%: .6 * 5 = 3
Site-B 60%: .6 * 5 = 3
Total = 6
Savings: 11 – 6 = 5

80% Fractional Allocation
Need = 5
Site-A 80%: .8 * 5 = 4
Site-B 80%: .8 * 5 = 4
Total = 8
Savings: 11 – 8 = 3
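The fractional computing math above can be written as a small calculation. This is a sketch under assumptions: the function names are hypothetical, the DR copy is taken to equal the full need (as in the slide's example), and per-site counts are rounded up to whole servers, which the slide does not state.

```python
import math


def traditional_allocation(need: int, ha: int) -> int:
    """Servers for production, HA spares, and a full DR copy of the need."""
    return need + ha + need


def fractional_allocation(need: int, site_percent: int, sites: int = 2) -> int:
    """Servers when each site is provisioned at site_percent of the need.

    Rounding up to whole servers is an assumption for this sketch.
    """
    per_site = math.ceil(need * site_percent / 100)
    return per_site * sites


need, ha = 5, 1
traditional = traditional_allocation(need, ha)

results = {}
for percent in (60, 80):
    fractional = fractional_allocation(need, percent)
    results[percent] = (fractional, traditional - fractional)
    print(f"{percent}%: total {fractional}, savings {traditional - fractional}")
```

With the slide's inputs (need 5, HA 1) this gives a traditional total of 11, and totals of 6 and 8 for the 60% and 80% cases, matching the savings of 5 and 3 shown above.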

Page 16: Is it  Time  to  Reassess Your Availability Approach ?

Fractional Compute Provision v. Traditional DR
Production/HA & DR Assuming Internal Recovery

Availability | Tier-0 | Tier-1 | Tier-2 | Tier-3
Service Level | Active/Active | Active/Fail-Over | Active/Repurpose | Active/Recover
Typical Provisioning | Active with spares | Active/Idle Standby | Active/Reuse Dev/Test | Shared Risk Provider
RTO | 0 | minutes | hours | days
RPO | 0 | 0 | 0 | 24-48h

[Chart: CPU provisioning (100% to 300%) for Traditional, All 2SHA, CA & 2SHA, and Full CA; legend: CA, 2-Site HA.]

Page 17: Is it  Time  to  Reassess Your Availability Approach ?

Site Loss – DR v CA Alternative

Disaster Recovery:
a. Evaluation
▪ Outage / Disaster extent / estimate of duration of outage
b. Make decision to fail-over; i.e. declare a disaster
c. Invoke BC process
d. Initiate fail-over process; push the “big red buttons”
e. Handle outage calls
f. Figure out how to come home

Continuous Availability:
a. Immediately have average compute available – do nothing
b. Evaluation
▪ Outage / Disaster extent / estimate of duration of outage
c. Triage
▪ Determine any workloads to defer
▪ Take down low use/low priority apps / reallocate virtual CPU
d. Open load balancer when site back online

Which scenario would you rather deal with?

Page 18: Is it  Time  to  Reassess Your Availability Approach ?

Why Take The Journey to CA?

Reduce Cost

Save CAPEX
– Fewer Servers
– Less Storage (Fewer Copies)

Save OPEX
– Tech Refresh Seldom Requires Outage
– Impact From Test
– Headcount/Labor, Licenses, Space / Power / Cooling, Maintenance and Patches…

Page 19: Is it  Time  to  Reassess Your Availability Approach ?

A Model for Continuous Availability

1. Stretch Farms and Clusters between sites
2. Stretch an A/A DB with a locking mechanism between sites
3. Add SAN storage
4. Add Networking
a. Upgrade Local Load Balancing to Global
b. Data Center Interconnect (DCI)
c. Spanned VLAN (VPLS/OTV)
d. WAN Connections
5. Add Data Coherency Mechanism (e.g. EMC VPLEX)

The Application now is abstracted and spanned between sites

[Diagram: Site A and Site B, each with Distribution, Presentation, Application, DB / File, and Storage layers; the sites are joined by Layer-2 Adjacency over a DCI and share a VPLEX Distributed Virtual Volume.]

Page 20: Is it  Time  to  Reassess Your Availability Approach ?

Different Layers Can Be Independent

CA at the Distribution Layer
CA at the Presentation and App Logic Layer (Static)
CA at the DB and File System Layers
CA at the Storage Layer

VPLEX Witness

Page 21: Is it  Time  to  Reassess Your Availability Approach ?

What if Existing Sites Exceed Metro* Distance?

Options:
– Create a small presence at a CoLo within metro distance for one leg
– Establish the two sides in an existing data center by creating independent pods
– Establish two sides in buildings on a campus
▪ Added value: VMware FT – no in-flight transaction loss

* The Metro Distance Requirement is 5ms RTT, or roughly 60 miles or 100km.

Page 22: Is it  Time  to  Reassess Your Availability Approach ?

What About Metro Distance Limits?

Earthquake and Hurricane Considerations

• Historically, the major impact span of US continental earthquakes has been under 43 miles

• Nearly all US Coastal areas are susceptible to Hurricanes; but the impact dissipates as it comes inland

USGS Earthquake Map 1900-2002
Red Span 0 – 69KM
Green Span 70-299KM

http://www.nhc.noaa.gov/breakpoints.shtml

Page 23: Is it  Time  to  Reassess Your Availability Approach ?

What About Metro Distance Limits?

http://blogs.ei.columbia.edu/wp-content/uploads/2012/11/storm_surge_map_final.jpg

Page 24: Is it  Time  to  Reassess Your Availability Approach ?

What About Metro Distance Limits?

• 2N+1 solution deployed out-of-region

• SRM deployed to automate fail-over

Maximum disaster radius for last 100 yrs. has been ~25 miles (40km)

[Diagram: Site C (out-of-region) with Development, Anchor, and Test/QA; an RPA at each end of a Replication Network; VPLEX Witness.]

Page 25: Is it  Time  to  Reassess Your Availability Approach ?

And Use Geographically Dispersed Pods to Back Up Each Other
Or Break Processing Down Between Pods

Page 26: Is it  Time  to  Reassess Your Availability Approach ?

Leveraging CA Constructs to Reduce Outages

Take a site offline for maintenance

Take an app offline in a site for rolling maintenance

Encapsulate site configuration
Encapsulate human errors
Data Warehouse Loads/Extracts

Scheduled Outages

Un-Scheduled Outages

Page 27: Is it  Time  to  Reassess Your Availability Approach ?

Availability Testing In Lieu of DR Testing

1. Take App Down in Site-A
App Continues to run in Site-B

2. Trace a transaction thru Site-B

3. Bring App Up in Site-A
Trace a transaction thru Site-A

Fed Requirement: Regular DR Testing or Use Capability Regularly

Page 28: Is it  Time  to  Reassess Your Availability Approach ?

Case Study – Power & Gas Utility

How Continuous Availability Reduces Costs & Required Resources

Current State & Requirements
• Studied 4 applications with various recovery requirements
• Current recovery uses repurposed QA systems
– Current state has 73 servers (Need-60, HA-13, DR-Dedicated-0)
– DB Replication, file systems not protected
• Requirements
– Improve recovery & availability posture
– DR solution needs to be scalable

Solution Options
• Traditional DR Solution (replication & standby equipment):
– Need-56, HA-22, DR-56 = Total 132 Servers
• Converged Prod/HA/DR Solution:
– Site-A 44, Site-B 44 = Total 88 Servers

Benefits
• Reduce Server Count by 33%
• Improved Availability Posture
• $3.1M Cost Avoidance

Traditional Prod/HA/DR vs. CA
[Chart: server counts (0 to 150) for Planned State vs. Potential CA, stacked by DR, HA, Need, CA/Site-B, and CA/Site-A. Potential to Reduce Server Count from 132 to 88.]

Page 29: Is it  Time  to  Reassess Your Availability Approach ?

Case Study – Global Life Sciences Firm

How CA Improved Availability, Eliminated DR, & Reduced Cost

Issues
• SAP ERP & Critical IT has 352 servers
– Need-173, HA-57, DR-122 = 203% of need
– RTO < 24 hrs, RPO < 5 min, SLA 3x9’s
• Concerns:
– DR plan in place, but no coming home plan
– Failover of top DR tiers strands other apps
– Idle and out-of-sync assets

Continuous Availability Solution
• Converged HA & DR Architectures
– VPLEX / RAC / OTV / vSphere MSC
• SAP stays up regardless of failure scenario
– Most App Transactions under CA
– Low use Apps deployed as 2-Site HA
• Improved confidence in availability

Benefits
• Reduced Server Count by 29%
• Reduced Cost $18M over 3 years
• Eliminate RPO & RTO
• Eliminate Idle Assets, DR & Fail-Over Time

Traditional Prod/HA/DR vs. CA
[Chart: server counts (0 to 400) for Current State vs. Potential CA, stacked by DR, HA, Need, CA/Site-B, and CA/Site-A. Reduced Server Count from 352 to 250.]
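The headline percentages in the two case studies can be checked from the server totals quoted on the slides. The sketch below uses only those totals; the function and variable names are hypothetical, chosen for illustration.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage drop in server count going from before to after."""
    return (before - after) / before * 100


def percent_of_need(total: int, need: int) -> float:
    """Total provisioned servers expressed as a percentage of production need."""
    return total / need * 100


# Power & gas utility: traditional option vs. converged option (slide totals).
utility_traditional = 132
utility_converged = 44 + 44  # Site-A + Site-B

# Global life sciences firm: current state vs. CA (slide figures).
life_sci_need, life_sci_ha, life_sci_dr = 173, 57, 122
life_sci_current = life_sci_need + life_sci_ha + life_sci_dr
life_sci_ca = 250

print(f"Utility reduction: {reduction_percent(utility_traditional, utility_converged):.0f}%")
print(f"Life sciences provisioning: {percent_of_need(life_sci_current, life_sci_need):.0f}% of need")
print(f"Life sciences reduction: {reduction_percent(life_sci_current, life_sci_ca):.0f}%")
```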

Page 30: Is it  Time  to  Reassess Your Availability Approach ?

Forrester Consulting Study Results*
Active-Active Data Centers Provide Operational And Financial Benefits

Unite HA and DR into a single approach
– 89% of respondents agreed or strongly agreed

Leverage off-the-shelf technology
– 69% of organizations agreed or strongly agreed

Reduce DR capital expenditures
– 67% of organizations agreed or strongly agreed that they were able to reduce capital expenditures by combining HA and DR.

Reduce the downtime for all IT services & applications
– 86% of organizations agreed or strongly agreed that AA DC reduced downtime for all IT services and apps.

*Question Details In Appendix

Page 31: Is it  Time  to  Reassess Your Availability Approach ?

Assured Availability Services

Continuous Availability

Back-up and Recovery

Disaster Recovery

Managed Availability

Advisory Service
Implementation Service
Management Service
Readiness Service

Back-up and Recovery

Disaster Recovery

Page 32: Is it  Time  to  Reassess Your Availability Approach ?

Summary

Continuous Availability Technologies can:
• Increase Availability
• Reduce:
– Cost
– Complexity
• Be built with off-the-shelf technology with little to no invasive application changes

Page 33: Is it  Time  to  Reassess Your Availability Approach ?

Summary Quiz: Can you Find the Arrow in the FedEx Logo?

• Moral: Sometimes solutions are in front of us and we just can’t see them

Page 34: Is it  Time  to  Reassess Your Availability Approach ?

What is unique about EMC?

We leverage technology to provide certainty in availability.

Page 35: Is it  Time  to  Reassess Your Availability Approach ?
Page 36: Is it  Time  to  Reassess Your Availability Approach ?

APPENDIX

Page 37: Is it  Time  to  Reassess Your Availability Approach ?

Forrester Consulting Study Results
Active-Active Data Centers Provide Operational And Financial Benefits

Source: A commissioned study conducted by Forrester Consulting on behalf of EMC Corporation, January 2013
Full Report @ http://tinyurl.com/owsjxyg