ultan kinahan dr - minasi 2010

MINASI INTERNET MEETING 2010 Disaster Recovery Planning Are you covered? Ultan Kinahan [email protected]

Upload: nathan-winters

Post on 28-Nov-2014


DESCRIPTION

Ultan Kinahan

TRANSCRIPT

Page 1: Ultan kinahan   dr - minasi 2010

MINASI INTERNET MEETING 2010

Disaster Recovery Planning
Are you covered?

Ultan Kinahan [email protected]

Page 2: Ultan kinahan   dr - minasi 2010

Who am I?

• Ultan Kinahan
• Born and raised in Ireland; moved stateside in 1992, the week after I finished college, and started working in a bar in Greenwich Village, NYC…
• Landed first role in IT here in late ’92
• Currently: Regional IT Director of Brown & Brown Insurance Co.
• Previous roles included:
    • AOL – “Mapquest, Digital City & Moviefone”
    • Network Admin for a realty firm in NYC
    • Multiple consulting positions

Page 3: Ultan kinahan   dr - minasi 2010

What constitutes a disaster?

Disaster is Natural, Recovery is Superhuman!

• A disaster is an unplanned event that interrupts normal business operations.
• Types of events:
    • “Acts of God”: floods, earthquakes, volcanoes, snowstorms, etc.
    • Man-made: acts of terrorism

Page 4: Ultan kinahan   dr - minasi 2010

A few phrases we've all come across

• Risk Management
• Business Continuity
• Disaster Recovery
• Recovery Time Objective (RTO): the target time for making an application available
• Recovery Point Objective (RPO): the age of the data recovered
• Data Center: where we store all our toys.
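To make the RTO and RPO definitions concrete, here is a minimal Python sketch that measures both after an outage and compares them with the objectives. The function names, timestamps and objective values are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_good_copy: datetime, outage_start: datetime) -> timedelta:
    """Age of the newest recoverable data at the moment the outage began."""
    return outage_start - last_good_copy


def achieved_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Time the application was unavailable."""
    return service_restored - outage_start


def meets_objective(achieved: timedelta, objective: timedelta) -> bool:
    """An objective is met when the measured value does not exceed it."""
    return achieved <= objective


# Hypothetical incident: nightly copy at 02:00, outage at 09:30, restored at 19:30.
last_copy = datetime(2010, 1, 1, 2, 0)
outage = datetime(2010, 1, 1, 9, 30)
restored = datetime(2010, 1, 1, 19, 30)

rpo = achieved_rpo(last_copy, outage)   # 7.5 hours of data at risk
rto = achieved_rto(outage, restored)    # 10 hours of downtime

print("RPO met:", meets_objective(rpo, timedelta(hours=4)))
print("RTO met:", meets_objective(rto, timedelta(hours=12)))
```

In this made-up incident the 12-hour RTO is met but the 4-hour RPO is not, which is the kind of gap a more frequent copy or continuous replication is meant to close.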

Page 5: Ultan kinahan   dr - minasi 2010

What is DRP (Disaster Recovery Planning) ?

Essentially the breakdown goes:

• RM (Risk Management) drives BCP (Business Continuity Planning)
• BCP (Business Continuity Planning) drives DRP (Disaster Recovery Planning)
• DRP is mainly where IT comes into play, and involves planning and implementation of procedures and facilities for use when essential systems are not available for a prolonged period of time

Page 6: Ultan kinahan   dr - minasi 2010

What is BCP (Business Continuity Planning) ?

Business continuity planning is the creation and validation of a practiced logistical plan for how an organization will recover and restore partially or completely interrupted critical (urgent) functions within a predetermined time after a disaster or extended disruption. The logistical plan is called a business continuity plan.

In plain language, BCP is working out how to stay in business in the event of disaster. Incidents include local incidents like building fires, regional incidents like earthquakes, or national incidents like pandemic illnesses.

Page 7: Ultan kinahan   dr - minasi 2010

A few Statistics

In 2009 Symantec released the results of its fifth annual Global IT Disaster Recovery survey. According to the report:

• 93% of organizations have had to execute their DR plans; the average cost:
    • USA: $287,000
    • Canada: $496,500
• The average budget for disaster recovery initiatives WORLDWIDE is $50 million USD. Not a lot really!
• The average time it takes to "achieve skeleton operations after an outage" is 3 hours.
• The average time to be fully "up and running after an outage" is 4 hours, states the report.

Page 8: Ultan kinahan   dr - minasi 2010

A few Statistics - continued

• Executive-level involvement in DR plans is rising.
    • In 2007, 55% of respondents reported that DR committees involved the CIO, CTO or IT director
    • This dropped to 33% in 2008
    • The number rose to 67% in 2009
• DR is "becoming a competitive differentiator" within organizations, especially in the financial sector
• Also driven by budgets: upper management is making sure IT spends wisely in the current economy

Page 9: Ultan kinahan   dr - minasi 2010

Disaster Recovery Planning Objectives

Develop the ability to recover key business functions following a disaster

Recover systems based on a timeline as defined by an internal audit of operations:

• IT Infrastructure (12-24 hours)
• Processing System (24-48 hours)
• Email (4 days)
• Financials (5 days)

Page 10: Ultan kinahan   dr - minasi 2010

What is mainly affected?

• A Business Impact Analysis is essential to determine what core business functions would be most critical to restore following a disaster.
• Internal audits are one of the best options. Typical findings:
    • Financials (accounts receivable, accounts payable, etc.)
    • HR/Payroll (payroll, benefits processing, etc.)
    • IT Infrastructure (to support the business applications above)

Page 11: Ultan kinahan   dr - minasi 2010

What Systems or Services are Required?

Outline all business-critical needs for restoration of services within the timeline determined by the analysis.

• Servers
• Desktops
• Laptops
• Networking gear
• Data lines
• Software
    • Home-grown apps
    • Vendor-based apps
• Vendor contact list
• Etc…

Page 12: Ultan kinahan   dr - minasi 2010

What mediums are available?

• Portable & low-cost options (these have their benefits):
    • CDs, DVDs
    • USB sticks or drives
    • Tape
    • Disk
    • Local replication
• Other options:
    • Application replication (local site or site to site)
    • Site-to-site replication (branch site or colocation)
    • Cloud storage or hosted applications
    • Thin client computing – VDI

Page 13: Ultan kinahan   dr - minasi 2010

Mmm, What do I need to start?

• First things first… funding approval!!!
• Purchase new storage, servers, software, licensing, etc.
    • Can you use existing (non-production) systems at a second data center?
• Data lines
• Configuration of servers, software & storage
• Set up external access ports for failover
    • MX records, Terminal Server, Citrix, VPN, etc.
• Testing failover

Page 14: Ultan kinahan   dr - minasi 2010

Bandwidth – The Replication Challenge

If you're lucky enough to get 70% of the bandwidth usable, you're likely to see transfer rates for a dedicated connection similar to those in the table below.

Technology       Mb/s     Theoretical GB/h   Expected GB/h
T1               1.536    0.66               0.46
10Base-T LAN     10       4.39               3.08
DS3 (T3)         43.2     18.98              13.29
100Base-T LAN    100      43.95              30.76
OC3              155      68.12              47.68
OC12             622      273                191.34
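The table's figures can be reproduced with a short calculation. The sketch below is an inference from the numbers, not something the slide states: it assumes megabits are converted to megabytes by dividing by 8, that 1 GB is counted as 1024 MB, and that the "expected" column is 70% of the theoretical one. Under those assumptions it matches every row except T1, whose theoretical value appears to have been computed from 1.5 Mb/s rather than 1.536. The function names are hypothetical.

```python
def theoretical_gb_per_hour(mbps: float) -> float:
    """Raw link capacity in GB/h (Mb/s -> MB/s -> MB/h -> GB/h, 1 GB = 1024 MB)."""
    megabytes_per_hour = mbps / 8 * 3600
    return megabytes_per_hour / 1024


def expected_gb_per_hour(mbps: float, usable_fraction: float = 0.70) -> float:
    """Capacity after overhead, assuming only a fraction of the link is usable."""
    return theoretical_gb_per_hour(mbps) * usable_fraction


def hours_to_replicate(data_gb: float, mbps: float, usable_fraction: float = 0.70) -> float:
    """Rough time to push an initial copy of data_gb across the link."""
    return data_gb / expected_gb_per_hour(mbps, usable_fraction)


links = {"10Base-T LAN": 10, "DS3 (T3)": 43.2, "100Base-T LAN": 100, "OC3": 155, "OC12": 622}
for name, mbps in links.items():
    print(f"{name:15s} {theoretical_gb_per_hour(mbps):8.2f} {expected_gb_per_hour(mbps):8.2f}")
```

The same helper gives a quick feel for the replication challenge: `hours_to_replicate(500, 43.2)` estimates how long an initial 500 GB seed would take over a DS3 at 70% efficiency.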

Page 15: Ultan kinahan   dr - minasi 2010

Bandwidth - Continued

Reasons for loss of bandwidth:

• The provider. In most cases with your standard T1 at 1.5 Mb/s you're lucky to get 1.3 Mb/s. Then you have…
• The asynchronous replication engine, which is often based on a protocol, protocol converter or application running on top of IP or TCP
• Transport protocol overhead
• Replication protocol overhead

Page 16: Ultan kinahan   dr - minasi 2010

Option – Double-Take

• System-friendly asynchronous replication
    • Low CPU overhead
    • Defined memory and disk usage
    • Write-order-intact replication
• Bandwidth-friendly data movement
    • User-definable replication sets
    • Compression, scheduling, scheduled bandwidth throttling capabilities
• Point-in-time recovery
    • Integration w/ Volume Shadow Copy Services

Page 17: Ultan kinahan   dr - minasi 2010

Application Failover

[Diagram: Source and Target servers, linked by Replication and Failover Monitoring]

• IP ICMP or Heartbeat Monitoring

• Detect Failure in Seconds or Minutes

• Users can reconnect within minutes of failure

• Failover can occur across a LAN, WAN and even NAT

• Failover more than one Server Identity to the Same Target Server

• Failover Scripting for Custom Configurations
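As a rough illustration of heartbeat monitoring in general, the Python sketch below declares the source failed after a configurable number of consecutive missed heartbeats. This is a generic technique, not a description of how Double-Take implements its monitoring; the function names, the threshold of three and the 5-second interval are all assumptions made for the example.

```python
from typing import Iterable, Optional


def first_failure_index(heartbeats: Iterable[bool], max_missed: int = 3) -> Optional[int]:
    """Return the index of the probe at which the source is declared failed.

    heartbeats yields True for an answered probe and False for a missed one.
    A single answered probe resets the counter, so brief blips do not trigger failover.
    """
    missed = 0
    for index, answered in enumerate(heartbeats):
        if answered:
            missed = 0
        else:
            missed += 1
            if missed >= max_missed:
                return index
    return None  # the source never crossed the failure threshold


def detection_time_seconds(interval_seconds: float, max_missed: int) -> float:
    """Worst-case delay between the source dying and failure being declared."""
    return interval_seconds * max_missed


probes = [True, False, True, False, False, False]
print(first_failure_index(probes))       # failure declared on the sixth probe
print(detection_time_seconds(5, 3))      # 5 s interval, 3 misses
```

The probe interval and the miss threshold together decide whether failure is detected "in seconds or minutes": a short interval with a low threshold reacts quickly but is more likely to fail over on a transient network glitch.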

Page 18: Ultan kinahan   dr - minasi 2010

Application Failback

[Diagram: Source and Target servers, linked by Restoration, Replication and Failover Monitoring]

• Recover or Replace Source Server

• Restore Data to the Source Server

• Failback Source Identity

• Bring Source Application Online

• Users Reconnect Within Minutes

• Start Replication and Resume Failover Monitoring for Continued Protection

Page 19: Ultan kinahan   dr - minasi 2010

Value of Automated Site Recovery

VMware Site Recovery Manager provides cost savings from:

• Reduced recovery infrastructure requirements
• Fewer hours spent creating and maintaining DR plans and processes
• Significantly reduced cost of DR tests; eliminates IT staff overtime and application impact
• Recovery in a matter of hours, not days or weeks, greatly reducing the financial exposure a company faces during a major outage

The following captures an estimate of the cost savings provided by SRM when used to recover from a major outage or disaster. A company that does $25M in revenue a year loses ~$96k per weekday. Assume that SRM can achieve an RTO of 12 hours instead of the 72 hours of a traditional DR plan, i.e. 2.5 days of faster recovery.

Value of lost revenue         = 2.5 (days of faster recovery) × $96,153 (lost revenue per workday)
Value of lost time by workers = 2.5 (days of faster recovery) × 500 (number of workers) × $300/day (cost of worker wages)
Total                         = value of lost revenue + value of lost time by workers = $615,385 (per disaster)
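The estimate on this slide can be checked with a few lines of Python. The sketch assumes 260 workdays per year (52 weeks × 5 weekdays); the slide does not state that figure, but it is what yields the quoted ~$96k per weekday. The function names are hypothetical.

```python
WORKDAYS_PER_YEAR = 260  # assumption: 52 weeks x 5 weekdays, not stated on the slide


def faster_recovery_days(traditional_rto_hours: float, automated_rto_hours: float) -> float:
    """Days of outage avoided by the shorter RTO."""
    return (traditional_rto_hours - automated_rto_hours) / 24


def outage_savings(annual_revenue: float, workers: int,
                   wage_per_day: float, days_saved: float) -> float:
    """Lost revenue avoided plus idle wages avoided, per disaster."""
    lost_revenue_per_workday = annual_revenue / WORKDAYS_PER_YEAR
    revenue_saved = days_saved * lost_revenue_per_workday
    wages_saved = days_saved * workers * wage_per_day
    return revenue_saved + wages_saved


days = faster_recovery_days(72, 12)                  # 2.5 days
total = outage_savings(25_000_000, 500, 300, days)
print(f"${total:,.0f}")                              # prints "$615,385"
```

Roughly $240k of the total is avoided revenue loss and $375k is avoided idle wages, so in this example the headcount matters more than the revenue figure.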

Page 20: Ultan kinahan   dr - minasi 2010

Failover Automation

• Detect site failures
    • Raise alert when heartbeat lost
• Initiate failover
    • User confirmation of outage
    • Granular failover initiation
• Manage replication failover
    • Break replication
    • Make replica visible to recovery hosts
• Execute recovery process
    • Use pre-programmed plan
    • Provide visibility into progress

Page 21: Ultan kinahan   dr - minasi 2010

Testing

• Testing is key to the success of any DR plan, no matter how big or small the environment.
• According to Symantec’s annual Global IT Disaster Recovery survey, one in four DR tests fails. This marks an improvement, however, when compared to previous years:
    • In 2007, 50% of DR tests failed
    • In 2008, 30% of DR tests failed
    • In 2009, 25% of DR tests failed
• Symantec goes on to say that only 15% report that they have never experienced a failure in a test.

Page 22: Ultan kinahan   dr - minasi 2010

Q&A

Any questions?

Page 23: Ultan kinahan   dr - minasi 2010

Contact Information

Ultan Kinahan
Business Continuity & Disaster Recovery

[email protected]