Disaster Recovery on a Limited Budget

Stephen Rosenfeld, University of New Brunswick
[email protected]
CANHEIT, June 28, 2005

Background

The auditors’ “Management Letter Points” to the Executive noted that no formal DR plan existed, three years in a row.

The last complete IT DR plan was mainframe-based (done in 1987 and revised in 1996).

Some staff had been sent for DR training over the years, but we could never afford to dedicate them to the task.

We had engaged in talks with the Provincial Government and the NB Electric Power Commission, but it was not a good fit.

If we were to take the situation seriously, we needed outside help.

Xwave Proposal

We had a $25,000 budget.

We solicited a proposal from Xwave, an Aliant affiliate. They provided Monique Thébeau, a Certified Business Continuity Consultant.

Two-stage approach:
- Discovery Phase (June 2004)
  - Assess risks and exposures
  - Determine the current level of ITS DR preparedness
  - Assess the DR strategies currently in use
  - Lay out the roadmap
- Plan Development Phase (September to December 2004)
  - Xwave to provide the project management

Discovery Findings

Why would you build your data centre above your Chemical Engineering department’s storeroom?

49 recommendations were made in 4 categories:
- Prevention: 33
- Response: 1
- Recovery: 13
- Restoration: 2

Sprinkler Control and Photocopier

Datatel Server

Aftermath of Fire

Missing Tiles

Solution to Water Leaks in Ceiling

Compare to Xwave’s Data Centre

ITS Response to Findings

All Discovery Phase recommendations were accepted.

ITS took advantage of a scheduled building power outage to do a major reorganization of the equipment in our machine room:
- Server connections to the UPS were rationalized
- EPO (Emergency Power Off) and fire alarms were tested and found wanting; repairs were made
- A major cleanup was done

Proposed Recovery Options

Data replication at another UNB location (SAN): expensive, but network bandwidth is available

Alternate recovery sites:
- External DR vendors (estimate for 15 servers)
  - Hot site: $10,000/month
  - Cold site: $5,000/month
  - Quick ship: $2,500/month
- UNB self-provided site (VMware): $150,000
- Contract with Xwave’s Marysville Data Centre: $150,000 as above, plus $900 to $2,300/month
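As a rough illustration of how these options compare over time, the sketch below computes when the one-time $150,000 self-provided site would cost less than renting vendor capacity. It assumes the $150,000 is a single up-front outlay with no further monthly charge and ignores staff time, maintenance, and hardware refresh, none of which the slide quantifies; the constant and function names are hypothetical.

```python
import math

# Figures from the "Proposed Recovery Options" slide (vendor estimates are for 15 servers).
VENDOR_MONTHLY = {"hot site": 10_000, "cold site": 5_000, "quick ship": 2_500}
SELF_PROVIDED_CAPITAL = 150_000   # ASSUMPTION: one-time cost, no monthly fee
MARYSVILLE_MONTHLY = (900, 2_300)  # low and high ends of the quoted monthly range


def cumulative_cost(upfront: int, monthly: int, months: int) -> int:
    """Total spend after `months` months: one upfront payment plus a flat monthly fee."""
    return upfront + monthly * months


def break_even_months(extra_upfront: int, monthly_saved: int) -> int:
    """Smallest whole number of months after which paying `extra_upfront` once
    is no more expensive than paying `monthly_saved` more every month."""
    if monthly_saved <= 0:
        raise ValueError("the up-front option never breaks even")
    return math.ceil(extra_upfront / monthly_saved)


if __name__ == "__main__":
    for name, fee in VENDOR_MONTHLY.items():
        n = break_even_months(SELF_PROVIDED_CAPITAL, fee)
        print(f"Self-provided site vs {name}: break-even after {n} months")
    for fee in MARYSVILLE_MONTHLY:
        # Marysville pays the same capital cost plus its own monthly fee,
        # so the monthly saving against a hot site is the difference in fees.
        n = break_even_months(SELF_PROVIDED_CAPITAL, VENDOR_MONTHLY["hot site"] - fee)
        print(f"Marysville (${fee}/month) vs hot site: break-even after {n} months")
```

Under these assumptions the self-provided site breaks even against a hot site after 15 months, against a cold site after 30, and against quick ship after 60; the Marysville contract breaks even against a hot site after 17 to 20 months, depending on where in the quoted range its fee falls.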

ITS’s Chosen DR Strategy

Use alternate space available in D’Avray Hall (3,600 ft away by tunnel; 2,600 ft as the crow flies):
- House decommissioned servers (powered up and idling) in the wiring closet in this building
- House live redundant servers here as well, e.g. another Novell NDS replica, secondary DNS and DHCP servers, Webmail2
- Upgrade the electrical panel, rack, and UPS
- Negotiate for more space in D’Avray Hall

DR Plan Features

Integrated with UNB’s Critical Incident Plan for Fire, Police, and PR coordination

Music Room will hold quick-shipped replacement servers in the event of disaster

Student computer lab in building available for ITS use in case of disaster

Conference Room also available to use as a Command Center

D’Avray Hall Space

UNB Departments Involved

- Nearly everyone in ITS
- Physical Plant
- Security
- SRIM (Public Relations)
- Purchasing
- Environmental Health & Safety

DR Plan Sections

- Incident Management Team Plan
- ESS Team Plan
- Communications & Network Team Plan
- Operations Team Plan
- Applications Recovery Team Plan
- Help Desk Team Plan
- Disaster Recovery Test Program
- Disaster Recovery Maintenance Program

DR Team Structure

Incident Management Team
- Incident Commander: Stephen Rosenfeld; Alternate IC: Janice El-Bayoumi
- Purchasing Leader: Doug Beairsto; Purchasing Alternate: Mary-Lou Veerkamp
- Administration: Wilma Gilchrist; Admin Alternate: Pat Smith

ESS Team
- Leader: Peter Ruddock; Alternate: David Lancaster

DR Coordinator
- Leader: Brian Kaye; Alternate: Lori Murray-Hawkins
- Admin Coordination: Doug Swift/Terry Arnold

Damage Assessment Team
- ITS Leaders: Peter Jacobs & Brian Kaye
- Physical Plant: Mike Carter/Terry Koch
- UNB Security: Reg Jerrett/Bob MacLean

Comm & Network Team
- Leader: Peter Jacobs; Alternate: Sterling Gallan

Operations Team
- Leader: John Jackson; Alternate: Fred Webber

Applications Recovery Team
- Leader: Lori Murray-Hawkins; Alternate: Rik Hall

Help Desk
- Leader: Kim Washburn; Alternate: Scott Chamberlain
- Members: SHDC & TSS

Routing, Switching, Cabling
- Leader: Sterling Gallan; Alternate: Paul Prowse

Network Servers & Applications
- Leader: Mike Jewett; Alternate: Matt Ashfield

Storage/Backups
- Leader: Tracy Allen; Alternate: Doug Swift

UNIX
- Leader: Tony Fitzgerald; Alternate: Rob Murray
- Members: Unix Group

Novell & Backups
- Leader: Brian Cassidy; Alternate: Fred Webber
- Members: Novell Group

Datatel
- Leader: Phil Parent; Alternate: Sean McDougall

Email
- Leader: David Lancaster; Alternate: Rob Murray
- Member: Tracy Allen

Web
- Leader: Shawn McGinn; Alternate: Megan Stewart

WebCT
- Leader: Rik Hall; Alternate: Rock Leung

UNB Security Team
- Leader: Reg Jerrett; Alternate: Bob MacLean

Restoration of Services Ranking

High: 1 to 7 days (14 machines)
- Backup server & basic network connectivity (DNS, DHCP)
- Directory services (PH, LDAP)
- E-mail & Webmail
- Datatel
- Emergency Web presence (Unix)

Medium: 7 to 21 days (25 machines)
- Library Catalog service
- Novell file systems, printing, and GroupWise
- E-Services portal & WebAdvisor
- Web & WebCT
- Footprints

Low: 21 days to 2 months (24 machines)
- Everything else, mostly lab support and monitoring software
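One way to make this ranking operational, for example when drawing up a recovery checklist, is to encode each tier's deadline and look services up against it. The sketch below is hypothetical: the `Tier` class, the function names, the shortened service labels, and the treatment of "2 months" as 60 days are assumptions for illustration, not part of UNB's plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_days: int                # upper bound of the restoration window, in days
    machines: int                # machine count given on the slide
    services: tuple[str, ...]    # services named on the slide for this tier


# ASSUMPTION: "2 months" is approximated as 60 days; service labels are shortened.
TIERS = (
    Tier("High", 7, 14, ("Backup server", "DNS", "DHCP", "PH", "LDAP",
                         "E-mail", "Webmail", "Datatel", "Emergency Web presence")),
    Tier("Medium", 21, 25, ("Library Catalog", "Novell file systems", "Novell printing",
                            "GroupWise", "E-Services portal", "WebAdvisor", "Web",
                            "WebCT", "Footprints")),
    Tier("Low", 60, 24, ()),     # "everything else"
)


def tier_for(service: str) -> Tier:
    """Return the tier that names `service`; anything unlisted falls into Low."""
    for tier in TIERS:
        if service in tier.services:
            return tier
    return TIERS[-1]


def recovery_order(services: list[str]) -> list[str]:
    """Sort services by restoration deadline; sorted() is stable, so the
    original order is kept among services in the same tier."""
    return sorted(services, key=lambda s: tier_for(s).max_days)


if __name__ == "__main__":
    print("Total machines:", sum(t.machines for t in TIERS))
    print(recovery_order(["WebCT", "Lab monitoring", "DNS", "Datatel"]))
```

Keying the sort on the deadline rather than a tier index means a new tier can be added just by appending it to `TIERS` with its own `max_days`.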

Next Steps

We now have a DR plan, but only a minimal recovery solution.

Find more space in the recovery building.

Explore options for a shared DR site with other universities in the region.

Consolidate operating systems for ease of recovery.

Explore VMware and Solaris 10.

Conclusion

The plan gives us a foot in the door to do a University-wide Business Continuity Review by raising awareness of DR outside of ITS:
- For example, the Library hosts 7 servers in its own building, which do not fall under our DR plan
- A Risk Assessment / Business Impact Analysis is required
- Our Recovery Time Objectives are an eye-opener

The model of using outside consultants was very successful and will be used for the BCP

DR considerations must be a factor in new purchases

Hardware vendor consolidation required for regional collaboration
