IT Disaster Recovery Planning Guide
Bacula Systems User’s Guide

Bacula Systems White Paper

This document is intended to provide insight into the considerations and processes required to establish, test, and implement disaster recovery procedures for crucial IT services at your company.

LEGAL DISCLAIMER
Bacula Systems has taken, and will continue to take, proper care in the development, preparation and maintenance of the content of this document which is intended for general information purposes only. Notwithstanding the preceding, Bacula Systems makes no representation and gives no warranty, whether express or implied, as to the accuracy, reliability or completeness of any information, content or materials contained within the document. Bacula Systems will accept no responsibility or liability whatsoever for any material or services contained on any application, website or software not under the direct control of Bacula Systems.

Version 1.0, April 5, 2018
Copyright ©2008-2018, Bacula Systems S.A.
All rights reserved.



Contents

1 Overview
    What is an IT Disaster Recovery Plan?
    Who Is Involved in IT Disaster Recovery Planning?
    Disaster Recovery Planning Process
    Glossary
    Disclaimer

2 IT Disaster Recovery Planning Guide
    1 Obtain Authorization and Commitment
        1.1 Gather Background Information (Optional)
        1.2 Determine How to Proceed
    2 Define Priorities
        2.1 Identify Critical Services
        2.2 Assess Impact of Service Outage
        2.3 Risk Assessment
        2.4 Prioritize
        2.5 Decide extent of action
    3 Decide on Technical Methodology
        3.1 Determining a technical methodology for each service
        3.2 Developing Facility and Infrastructure Plan
        3.3 Estimating Costs and Developing a Schedule
    4 Develop and Implement the Plan
        4.1 Roles and Responsibilities
        4.2 Determine disaster response process
        4.3 Develop detailed service recovery plans
    5 Test the Plan
        5.1 How to Test?

3 IT Disaster Recovery Plan Template
    1 Authorization
        1.1 Policies and Administrative Regulation
        1.2 Objectives
    2 Services and Their Priorities
        2.1 Services List
        2.2 Assess Impact of Service Outage
        2.3 Assess Risks
        2.4 Prioritize
        2.5 Set Scope
    3 Facility and Infrastructure Plan
        3.1 Determining technical approach for each service
        3.2 Facility Plan
        3.3 Infrastructure Plan
        3.4 Estimating Costs and Developing a Schedule
    4 Plan Implementation
        4.1 Roles and Responsibilities
        4.2 Disaster Response Processes
        4.3 IT Services Recovery Plans
    5 Testing the DR Plan

IT Disaster Recovery — Planning Guide
Copyright © April 2018 Bacula Systems SA
www.baculasystems.com/contactus

All trademarks are the property of their respective owners


Overview

Developing an IT disaster recovery plan involves choosing the right people to be involved, assigning appropriate roles, selecting the technologies to use, as well as developing, implementing, testing, and documenting the recovery process. This document shows you how to create a simple disaster recovery plan for your company that can be expanded further, based on your company’s needs. This document contains two sections:

◾ IT Disaster Recovery Planning Guide – Walks you through the process of obtaining the required authorization, establishing planning priorities, determining the technical approach, as well as developing, implementing, and testing the disaster recovery plan.

◾ IT Disaster Recovery Plan Template – Provides sample content that you can use while developing a disaster recovery plan for your company.

What is an IT Disaster Recovery Plan?

An IT disaster recovery plan documents:

◾ the objectives of the company’s leadership for disaster recovery

◾ members of the recovery team and their roles and responsibilities

◾ detailed procedures for protecting and recovering required technical services after a disruptive event such as a flood or fire

An IT disaster recovery plan aims to:

◾ provide critical IT services after an incident

◾ ensure that critical business functions continue within an acceptable period of time

Who Is Involved in IT Disaster Recovery Planning?

The company’s IT manager should lead the planning. He or she usually works with the IT department to determine specific steps within the disaster recovery process and to develop and test the resulting recovery plan.

It is also important to involve stakeholders outside of the IT department, including senior leaders, CTO and CEO office representatives, and board members (if applicable), to ensure the entire organization’s needs are met.

Disaster Recovery Planning Process

Disaster recovery planning is an ongoing and iterative process. Each stage includes several activities to be performed. During initial development of the disaster recovery plan, stages 4 and 5 are repeated several times, each time focusing on developing and testing recovery plans for a different service or set of services.

After obtaining leadership commitment to the disaster recovery planning program in stage 1, stages 2 through 5 are repeated periodically. IT services are dynamic: new services are created and obsolete services are retired. Priorities and disaster recovery plans must be reviewed and revised periodically to ensure that they are current.


Glossary

Criticality Period: Any period during which the identified process is critical and may affect the recovery time objective (RTO). A service or process may have multiple criticality periods or none at all, depending on the nature of the process. Criticality periods may be cyclical or one-offs and may range from hours to months in length. Examples of criticality periods include year-end processing, regulatory deadlines, payroll processing, and scheduled events.

Recovery time objective (RTO): The goal for how soon the service needs to be recovered after a disruption, based on the acceptable amount of downtime and level of performance. For example, an RTO of 24 hours with local accessibility for payroll services means that the payroll application must be up and running within 24 hours as well as accessible locally.

Recovery point objective (RPO): The goal for how much data or information can be lost after a disruption, based on the acceptable amount of data or information loss. For example, an RPO of 6 hours for payroll services means that the payroll data must be backed up at least every 6 hours so that no more than 6 hours of data entered into the payroll application is lost after a disruption.

Service Priority: The logical grouping of services to be recovered, such as Prio 1, Prio 2, and so on. Core infrastructure services need to be recovered first and would be included in Prio 1.

Steering Committee: A group of people that makes decisions, authorizes time and resources, and provides oversight.

Working Group: A group of people that defines the technical approach, develops recovery plans for individual services, and tests and implements those plans.

Disclaimer

The information contained in this guide should not be considered as legal advice. The content may be inaccurate and incomplete and may not be fully applicable to your situation. This guide should not be your only reference when developing a disaster recovery plan. Please consider using multiple sources of information while working on such a plan to make sure all aspects are covered.


IT Disaster Recovery Planning Guide

Stage 1: Obtain Authorization and Commitment

Commitment from all business areas and all levels of management is crucial for effective IT disaster recovery planning. Senior leaders, top management, and other stakeholders need to understand why disaster recovery planning is important, so they can budget time, attention, and resources for disaster recovery activities as necessary. This stage not only commits the company to having a disaster recovery plan. With every stakeholder bought into the project, it also helps you develop the plan by making it easier to get time and resources from other areas of the organization.

Stage 1.1: Gather Background Information (Optional)

If there are no pre-existing disaster recovery procedures at your company, it may be helpful to collect the following information before you proceed:

◾ What is the administrative procedure or policy that regulates how an IT disaster recovery plan should be maintained (if one exists)?

◾ What is the location of the current IT disaster recovery plan (if one exists)?

◾ When was the disaster recovery plan last updated?

◾ Who is responsible for maintaining the disaster recovery plan?

Stage 1.2: Determine How to Proceed

After you have collected the necessary background information, evaluate what you need to do to ensure management support and to acquire any needed authorization. For example, you can:

◾ Ascertain the degree of mindshare and support among senior leaders for putting a disaster recovery plan in place.

◾ Communicate clearly with senior leaders, for example by delivering a presentation that outlines why a disaster recovery plan is important, what has already been done, and what steps the recovery process includes.

◾ Work with your colleagues to build two groups:

  – A steering committee that makes decisions, authorizes time and resources, and provides oversight.

  – A working group that defines the technical approach, develops recovery plans for individual services, and tests and implements those plans.

  When determining who should participate in each group, consider decision-making authority, resources available to the person, and their skills and knowledge.

◾ Develop or revise administrative procedures or policy regulations on how the disaster recovery plan should be maintained (if applicable).


Stage 2: Define Priorities

Establishing planning priorities involves the following steps:

1 Identify critical services and the applications and other IT elements used by these services.

2 Quantify the impact of service outage and identify maximum allowable data loss and service downtime.

3 Assess vulnerabilities, risks, and threats that could cause disruptions to IT services, processes, and applications.

4 Prioritize IT services according to requirements and criticality.

5 Decide extent of action: determine which services should be the first to have recovery plans.

Additional questions to discuss:

◾ Does the impact vary by time of month or year?

◾ Are there any specific deadlines that organization stakeholders must meet?

◾ What is the maximum allowable outage time for each critical business or education service?

◾ How feasible are manual workarounds?

◾ How soon does each technology service need to be available? (See the recovery time objective definition in the Glossary.)

◾ How much data or information can we afford to lose? (See the recovery point objective definition in the Glossary.)

Stage 2.1: Identify Critical Services

Work with each department in your organization to determine which services are the most critical to them. Identify all the underlying technologies and other contributing services that each critical service needs in order to function.

Create a list of critical services, providing information about the departments they are used by, their location, criticality periods, manual workarounds (if available), maximum allowable outage, as well as underlying services or applications they depend upon.
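As a minimal sketch, the list of critical services can be kept as structured records so that later stages can sort and filter it. The `CriticalService` class, its field names, and the `urgency` helper are hypothetical. The payroll values come from the sample row in the plan template, while the email row is invented purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional, Tuple


@dataclass
class CriticalService:
    """One row of the critical services list (field names are illustrative)."""
    department: str
    service: str
    location: str
    criticality_periods: List[str]
    manual_workaround: Optional[str]  # None when no workaround exists
    max_allowable_outage: timedelta
    depends_on: List[str] = field(default_factory=list)


services = [
    # Hypothetical row, added only so that the sort below has something to order.
    CriticalService("IT", "Email", "Central Office", [],
                    "Phone", timedelta(hours=72), ["Network"]),
    # Sample row from the plan template.
    CriticalService("Financial Services", "Payroll", "Central Office",
                    ["Last week of the month"], None,
                    timedelta(hours=24), ["SAP Simple Finance"]),
]


def urgency(s: CriticalService) -> Tuple[timedelta, bool]:
    """Shortest allowable outage first; services without a workaround win ties."""
    return (s.max_allowable_outage, s.manual_workaround is not None)


for s in sorted(services, key=urgency):
    print(s.service, s.max_allowable_outage, s.depends_on)
```

Keeping the dependencies on each record makes it easier to see which underlying services must be covered by the plan as well.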


Stage 2.2: Assess Impact of Service Outage

Determine the impact of service outages and their time sensitivity. A business disruption can impact an organization in several ways. Five main categories are used to measure impact:

◾ Safety and human life

◾ Financial

◾ Reputation

◾ Operations

◾ Regulatory and legal

For each of the critical services and processes you identified, determine the impact in each of the applicable categories based on the following values:

Minor: The consequences would threaten the efficiency or effectiveness of some services and processes but would be dealt with at the business unit or department level. The consequences may include low monetary losses.

Moderate: The consequences would not threaten the provision of services and processes, but would mean the business operations could be subject to significant review or changed ways of operating. Executive involvement would likely be required. The consequences may include moderate monetary losses.

Major: The consequences would threaten continued effective provision of services and processes and require executive involvement. The consequences may include significant damage or destruction, some minor injuries or threat to human safety with no loss of life, and high monetary losses.

Catastrophic: The consequences would threaten the provision of essential services and processes, cause major problems for customers, and require immediate executive involvement and action. The consequences may include major damage or destruction, imminent threat to human safety, loss of life or major injuries, and extreme monetary losses.
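The four impact values form an ordered scale, which can be sketched as an enumeration. The ratings below are the sample payroll values from the plan template. The `Impact` class, the dictionary keys, and the rule that a service is rated by its worst category are assumptions made for illustration; the guide does not prescribe how to combine categories.

```python
from enum import IntEnum
from typing import Dict


class Impact(IntEnum):
    """Impact values from this stage, ordered from least to most severe."""
    MINOR = 1
    MODERATE = 2
    MAJOR = 3
    CATASTROPHIC = 4


# One rating per impact category for a single service (sample payroll values).
payroll_impact = {
    "safety_and_human_life": Impact.MINOR,
    "financial": Impact.MAJOR,
    "operations": Impact.MAJOR,
    "reputation": Impact.MODERATE,
    "regulatory_and_legal": Impact.MAJOR,
}


def overall_impact(ratings: Dict[str, Impact]) -> Impact:
    """Assumption: a service is as critical as its worst-rated category."""
    return max(ratings.values())


print(overall_impact(payroll_impact).name)  # MAJOR
```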

The time sensitivity of service outages can be determined by calculating the required recovery time objective and recovery point objective.

Recovery time objective (RTO): The goal for how soon the service needs to be recovered after a disruption, based on the acceptable amount of downtime and level of performance. For example, an RTO of 24 hours with local accessibility for payroll services means that the payroll application must be up and running within 24 hours as well as accessible locally.

Recovery point objective (RPO): The goal for how much data or information can be lost after a disruption, based on the acceptable amount of data or information loss. For example, an RPO of 6 hours for payroll services means that the payroll data must be backed up at least every 6 hours so that no more than 6 hours of data entered into the payroll application is lost after a disruption.
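The arithmetic behind these two objectives can be sketched as follows, using the payroll figures from the definitions above (RTO of 24 hours, RPO of 6 hours). The function names are hypothetical, and the sketch assumes that the worst-case data loss equals the time between two consecutive backups.

```python
import math
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """A disruption just before the next backup loses one full interval of data."""
    return backup_interval <= rpo


def meets_rto(measured_recovery_time: timedelta, rto: timedelta) -> bool:
    """Compare a recovery time measured in a test against the objective."""
    return measured_recovery_time <= rto


def min_backups_per_day(rpo: timedelta) -> int:
    """Fewest evenly spaced backups per day that keep the interval within the RPO."""
    return math.ceil(timedelta(days=1) / rpo)


payroll_rto = timedelta(hours=24)
payroll_rpo = timedelta(hours=6)

print(meets_rpo(timedelta(hours=6), payroll_rpo))   # True
print(meets_rpo(timedelta(hours=12), payroll_rpo))  # False
print(min_backups_per_day(payroll_rpo))             # 4
print(meets_rto(timedelta(hours=30), payroll_rto))  # False
```

A check like `meets_rto` is only meaningful with a recovery time that was actually measured, for example during the tests described in Stage 5.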


Stage 2.3: Risk Assessment

Record the risks that could cause disruptions to IT services, processes, and applications. For example, a server room without a backup power source leaves the services hosted in that room unavailable during a power outage. For each risk that is present, determine its implications and whether a strategy to address it is required.

Stage 2.4: Prioritize

Prioritize the IT services based on need and criticality, along with their dependencies. Each service can be classified as follows:

| Service Priority | Recovery Time Objective |
|------------------|-------------------------|
| Critical         | e.g. within 24 hours |
| Vital            | e.g. within 72 hours |
| Necessary        | e.g. within 2 weeks |
| Desired          | e.g. longer than 2 weeks but necessary to return to normal operating conditions |

For better budgeting, it is important to prioritize risk reduction and recovery efforts based on how critical the service is to the company.
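The classification above can be sketched as a simple threshold lookup. The thresholds are the example values from the table ("e.g. within 24 hours" and so on) and should be replaced with your own. The names `PRIORITY_THRESHOLDS` and `classify_priority` are hypothetical.

```python
from datetime import timedelta

# Example thresholds taken from the table above; adjust them to your organization.
PRIORITY_THRESHOLDS = [
    (timedelta(hours=24), "Critical"),
    (timedelta(hours=72), "Vital"),
    (timedelta(weeks=2), "Necessary"),
]


def classify_priority(rto: timedelta) -> str:
    """Return the first priority whose threshold the RTO fits within."""
    for limit, label in PRIORITY_THRESHOLDS:
        if rto <= limit:
            return label
    # Longer than the last threshold, but still needed to return to normal.
    return "Desired"


print(classify_priority(timedelta(hours=24)))  # Critical
print(classify_priority(timedelta(days=10)))   # Necessary
print(classify_priority(timedelta(days=30)))   # Desired
```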

Stage 2.5: Decide extent of action

Based on the service priorities identified in Stage 2.4, decide which services will be in scope for this round of disaster recovery planning and which will be earmarked for a later date. It is advisable to first document the core IT services and infrastructure used by other applications and services, in order of their criticality.

Stage 3: Decide on Technical Methodology

At this stage you need to determine how much to invest in proactive disaster prevention, as opposed to dealing with the consequences and recovering after a disaster. After you establish investment priorities, determine the facilities and technologies needed as well as their cost and what the implementation schedule should be. Determining the technical approach involves the following:

1 Determine technical approach for each service (whether to prevent risks or focus on recovery options).

2 Develop Facility Plan and Infrastructure Plan: find an alternative site with sufficient power, infrastructure, and space.

3 Develop cost estimates and schedule: prepare labor and technology cost estimates and proposed schedule for implementation.

Stage 3.1: Determining a technical methodology for each service

Using the list of priorities you identified in Stage 2, determine what your technical approach should be: would you prefer to focus on preventing outages (for example, through implementing additional components or equipment) or on recovery options, including manual workarounds and alternate sites to restore technology services and applications within the required timeframe? When deciding on the best technical approach, you may need to consider the following aspects:

◾ Recovery time objective and recovery point objective (see the definitions in the Glossary): Services that have shorter recovery time and/or recovery point objectives may be better suited to a methodology based on prevention rather than recovery. It could be hard to recover these services in a short time frame.

◾ Risks and risk mitigation: Each service will face a variety of risks, which may take a lot of time just to identify, let alone mitigate. Therefore, it may be preferable to develop a strategy for dealing with grouped risk scenarios, designed to cover most of the risks to a service. For example, these groups could be:

| IT Service | Risk Type | Risk Mitigation |
|------------|-----------|-----------------|
| (e.g. CRM, or HR Systems, or ERP Systems, etc.) | Facility down due to Non-destructive Event | Implement network/servers/software at a mirror data centre. Restore data from backup until the main facility has been recovered. |
| | Facility down due to Destructive Event | Implement network/servers/software at a mirror data centre. Recover data from backup. Mirror facility becomes primary facility. |
| | Network Loss | Take preventative approach through setting up redundant network components and agreement with alternate internet provider. |
| | Application Loss | Have application set up on standby server in data centre. Restore application data from backups. |
| | Employee Loss | Create more detailed documentation due to criticality of service. If needed, bring in temporary qualified staff during a disaster. |
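The first consideration above can be sketched as a rule of thumb: if recovery alone cannot meet the objectives, lean towards prevention. This is an illustrative simplification, not a rule stated by the guide. The function `choose_approach` and its inputs `estimated_recovery_time` and `smallest_backup_interval` are hypothetical, as are the figures for the second example.

```python
from datetime import timedelta


def choose_approach(rto: timedelta, rpo: timedelta,
                    estimated_recovery_time: timedelta,
                    smallest_backup_interval: timedelta) -> str:
    """Suggest "prevention" when restoring from backup cannot meet RTO or RPO.

    Assumption: the estimates come from your own tests or vendor figures.
    """
    too_slow = estimated_recovery_time > rto
    too_much_loss = smallest_backup_interval > rpo
    return "prevention" if (too_slow or too_much_loss) else "recovery"


# Payroll sample: restoring in about 8 hours from 6-hourly backups is enough.
print(choose_approach(timedelta(hours=24), timedelta(hours=6),
                      timedelta(hours=8), timedelta(hours=6)))  # recovery

# Hypothetical service with a 1-hour RTO: an 8-hour restore cannot meet it.
print(choose_approach(timedelta(hours=1), timedelta(minutes=15),
                      timedelta(hours=8), timedelta(hours=6)))  # prevention
```

In practice the decision also depends on cost and on the grouped risk scenarios in the table above, so treat the result as a starting point for discussion.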

Stage 3.2: Developing Facility and Infrastructure Plan

Work together with your team to find an alternate site that meets the power, infrastructure, and physical space requirements needed to recover your IT services. This site should not be too close to your primary site, to ensure that a disaster does not affect them both. Use the following sections of the template to document your facility and infrastructure plans: Stage 3.2 (Facility Plan) and Stage 3.3 (Infrastructure Plan).

Stage 3.3: Estimating Costs and Developing a Schedule

Prepare estimates for the labor and technology costs required to implement the Facility Plan (template Stage 3.2) and the Infrastructure Plan (template Stage 3.3). Develop a schedule for implementation. Discuss the estimates and schedule with the IT disaster recovery plan steering committee and obtain its approval.

Stage 4: Develop and Implement the Plan

Stage 4.1: Roles and Responsibilities

Developing an IT disaster recovery plan requires involvement of the following two groups:

◾ A steering committee that makes decisions, authorizes time and resources, and provides oversight.

◾ A working group that defines the technical approach, develops recovery plans for individual services, and tests and implements those plans.

While determining who should participate in each group, consider decision-making authority, resources available to the person, and their skills and knowledge.

Establish roles and responsibilities for recovery efforts. Designate alternates in case not everyone is available (specific names, contact info, and so on).

Stage 4.2: Determine disaster response process

Responding to a disaster consists of a number of phases:

Tip: Handle minor incidents causing service outage via incident response procedures. Escalate severe incidents or events, such as loss of all communications, loss of power, flood or fire, or loss of the building, to appropriate personnel.


Stage 4.3: Develop detailed service recovery plans

Developing detailed service recovery plans involves the following steps:

1 Gather detailed requirements:

  ◾ Gather information about current configuration and network and security requirements.

  ◾ Determine server and storage configuration, network and security requirements, as well as application configuration details.

2 Analyze:

  ◾ Determine recommended technical recovery approach and level of documentation.

  ◾ Gather requirements for restoring the service in your recovery facility if needed (for example, space, power, telecommunications, and so on). Cloud-based services may not need this step.

  ◾ Analyze dependencies.

3 Document current setup and process for recovering the service. Provide more documentation for complex and critical services.

4 Test the process to identify gaps and areas for improvement.
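The dependency analysis in step 2 can be sketched as a topological sort, which yields an order in which underlying services are recovered before the services that rely on them. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The Payroll and SAP Simple Finance entries come from the template sample data, while the "Database" and "Network" entries are hypothetical.

```python
from graphlib import TopologicalSorter

# Map each service to the set of services it depends on.
# "Database" and "Network" are hypothetical examples of core infrastructure.
dependencies = {
    "Payroll": {"SAP Simple Finance"},
    "SAP Simple Finance": {"Database", "Network"},
    "Database": {"Network"},
}

# static_order() yields every service after all of its dependencies.
recovery_order = list(TopologicalSorter(dependencies).static_order())
print(recovery_order)
# ['Network', 'Database', 'SAP Simple Finance', 'Payroll']
```

If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding: two services that each need the other to start cannot be recovered by simply following the plan in sequence.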

Stage 5: Test the Plan

Testing is the most important part of developing a disaster recovery plan. The only way to know if your disaster recovery plan will work in a real disaster is to test it thoroughly before it is actually needed. There are different types of tests, varying in complexity and in the amount of time and resources needed to complete them. Tabletop walkthroughs, where team members verbally go through the steps in the plan, tend to be the least time-consuming. Disaster simulations and full failover testing require more time and resources. Initial testing and tabletop walkthroughs may only involve the disaster recovery team, but simulations or full failover testing require additional people to be involved to make them more realistic. The IT leader and/or the steering committee need to be involved in determining when to run a disaster simulation or full failover testing, due to the potential impact and the need to involve other areas of the organization.

Stage 5.1: How to Test?

◾ Tabletop Walkthrough – Team members gather in a meeting room and verbally go through the specific steps as documented in the plan to confirm effectiveness and identify gaps, bottlenecks, or other weaknesses. This test provides the opportunity to review a plan with the full team and familiarize staff with procedures, equipment, and offsite facilities.

◾ Disaster Simulation – A mock disaster is simulated so that normal operations are not interrupted. A simulation involves testing hardware, software, personnel, communications, procedures, supplies and forms, documentation, transportation, utilities, and alternate site processing. If possible, test against production data. Analyze the results to capture lessons learned and update the plan as appropriate. Disaster simulation could include:

  1 Component testing

    (a) Test individual parts of the environment.
    (b) Execute tests at different times throughout the year.
    (c) Include participation of a limited number of business areas.
    (d) Only test connectivity.

  2 Environment segment testing

    (a) Test segments of the environment together (for example, groups of services like routers and firewalls).
    (b) Execute a limited number of tests per year.
    (c) Execute limited functional testing.

  3 Real time testing

    (a) Test all aspects of the environment within scope.
    (b) Test all applications on one day.
    (c) Execute connectivity and some functional testing.
    (d) Isolate production.

◾ Full Failover Testing – A full failover test exercises the total disaster recovery plan. The test is likely to be costly and involves risk to normal operations. If your focus is on resiliency and failing over automatically, these tests are required to ensure successful failovers during a disaster.


IT Disaster Recovery Plan Template

This template provides sample content that can be adapted to your context.

How to use this template: Red text contains instructions and is intended to be deleted. Grey text is used for sample content that can be adapted as you develop your disaster recovery plan.

Stage 1: Authorization

Stage 1.1: Policies and Administrative Regulation

Instructions:

◾ Document why the IT disaster recovery plan was developed.

◾ List applicable legislation or company policies or administrative regulations that specify requirements to create and maintain an IT disaster recovery plan. For example, under the General Data Protection Regulation, businesses that have customers in Europe are required to safeguard copies of personal information (including metadata) from unauthorized access, use, disclosure, or destruction. An IT disaster recovery plan can help mitigate the risk of disclosure or destruction of such private data as the result of an event or disaster.

This plan has been created as per the requirements of the following administrative regulations:

◾ Data Security Compliance Act
  Policy code: DSCA

◾ General Data Protection Regulation
  Policy code: GDPR

Stage 1.2: Objectives

Instructions:

◾ Document the main objectives of the IT disaster recovery plan.

The IT Department has developed this IT disaster recovery plan to be used in the event of a significant disruption to critical IT services at [your company name]. The goal of this plan is to outline the key recovery steps to be performed during and after a disruption so that critical IT and telecommunication services continue within an appropriate period of time after an incident has occurred.

Stage 2: Services and Their Priorities

Stage 2.1: Services List

Instructions:

◾ Create a list of critical services, providing information about the departments they are used by, their location, criticality periods, manual workarounds (if available), maximum allowable outage, as well as underlying services or applications they depend upon.


◾ Fill out the table below, adjusting sample data as needed.

| Department | Service or Process | Service location | Criticality Period | Manual Workaround | Maximum Allowable Outage | Underlying IT Services or Applications |
|------------|--------------------|------------------|--------------------|-------------------|--------------------------|----------------------------------------|
| Financial Services | Payroll | Central Office | Last week of the month | None | 24 hours | SAP Simple Finance |
| ... | ... | ... | ... | ... | ... | ... |

Stage 2.2: Assess Impact of Service Outage

Instructions:

◾ Determine the impact of service outages and the required recovery time objective (RTO) (how soon the service needs to be recovered) and recovery point objective (RPO) (how much data can be lost).

◾ Fill out the table below, adjusting sample data as needed.

The "Impact" columns record the impact if the business service is unavailable; the RTO and RPO columns record time sensitivity.

| Department | Service | Underlying Services / Applications | Impact: Safety / Human Life | Impact: Financial | Impact: Operations | Impact: Reputation | Impact: Regulatory / Legal / Contractual | RTO | RPO |
|---|---|---|---|---|---|---|---|---|---|
| Finances | Payroll | SAP Simple Finance | Minor | Major | Major | Moderate | Major | 24 hours | 24 hours |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
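The two objectives in this stage can be checked against a real or simulated outage with simple date arithmetic. The following is a minimal sketch, not part of the original guide: the `Objectives` class, the `check_recovery` function and the timestamps are all hypothetical, and the 24-hour values are taken from the sample Payroll row.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    """Recovery objectives for one service (hypothetical structure)."""
    rto: timedelta  # how soon the service needs to be recovered
    rpo: timedelta  # how much data can be lost


def check_recovery(objectives, last_backup, outage_start, restored_at):
    """Return (rto_met, rpo_met) for a real or simulated outage."""
    downtime = restored_at - outage_start   # time the service was unavailable
    data_loss = outage_start - last_backup  # work done since the last usable backup
    return downtime <= objectives.rto, data_loss <= objectives.rpo


# Sample Payroll objectives from the table: RTO 24 hours, RPO 24 hours.
payroll = Objectives(rto=timedelta(hours=24), rpo=timedelta(hours=24))

# Illustrative timestamps only: 18 hours of downtime, 11 hours of lost data.
rto_met, rpo_met = check_recovery(
    payroll,
    last_backup=datetime(2018, 4, 4, 22, 0),
    outage_start=datetime(2018, 4, 5, 9, 0),
    restored_at=datetime(2018, 4, 6, 3, 0),
)
print(rto_met, rpo_met)  # True True
```

A check like this is most useful after a test run (see Stage 5), when measured downtime and backup age can be compared with the values recorded in the table.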

Stage 2.3: Assess Risks

Instructions:

◾ Document the risks that could cause disruptions to IT services, applications, and processes.

◾ Specify the implications if the risk materializes and whether a strategy needs to be developed to address it.


◾ Fill out the table below, adjusting sample data as needed.

| Department | Service | Underlying Services / Applications | Known Risks | Implications | Need Strategy to Address? |
|---|---|---|---|---|---|
| Finances | Payroll | SAP Simple Finance | No backup power supply | Finance information unavailable during a power outage | Yes, consider installing a backup generator |
| ... | ... | ... | ... | ... | ... |

Stage 2.4: Prioritize

Instructions:

◾ Prioritize the IT services based on need and criticality, along with their dependencies.

◾ Fill out the table below, adjusting sample data as needed.

| Department | Service | Underlying Services / Applications | Recovery Time Objective | Recovery Point Objective | Service Classification | Service Priority |
|---|---|---|---|---|---|---|
| Finances | Payroll | SAP Simple Finance | 24 hours | 24 hours | Critical | Prio 2 |
| ... | ... | ... | ... | ... | ... | ... |
| Dependencies | | | | | | |
| ... | ... | Network Infrastructure | 12 hours | 12 hours | Critical | Prio 1 |
| ... | ... | Network Connectivity | 12 hours | 12 hours | Critical | Prio 1 |
| ... | ... | ... | ... | ... | ... | ... |
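Prioritizing services together with their dependencies amounts to ordering them so that each dependency is recovered before the services that rely on it. The sketch below illustrates this with Python's standard `graphlib` module (Python 3.9 or later). The service names and RTO values come from the sample table; the exact dependency edges and the helper names `recovery_order` and `rto_conflicts` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Recovery time objective in hours, from the sample table above.
rto_hours = {
    "Network Infrastructure": 12,
    "Network Connectivity": 12,
    "SAP Simple Finance": 24,
    "Payroll": 24,
}

# service -> services it depends on. These edges are an assumption.
depends_on = {
    "Network Connectivity": {"Network Infrastructure"},
    "SAP Simple Finance": {"Network Connectivity"},
    "Payroll": {"SAP Simple Finance"},
}


def recovery_order(rto_hours, depends_on):
    """Order services so every dependency is recovered before its dependents."""
    sorter = TopologicalSorter()
    for name in rto_hours:
        sorter.add(name, *depends_on.get(name, ()))
    return list(sorter.static_order())


def rto_conflicts(rto_hours, depends_on):
    """List (service, dependency) pairs where the dependency has a longer RTO."""
    return [
        (name, dep)
        for name, deps in depends_on.items()
        for dep in sorted(deps)
        if rto_hours[dep] > rto_hours[name]
    ]


print(recovery_order(rto_hours, depends_on))
# ['Network Infrastructure', 'Network Connectivity', 'SAP Simple Finance', 'Payroll']
print(rto_conflicts(rto_hours, depends_on))  # []
```

An empty conflict list suggests the objectives are consistent: no service is expected back sooner than something it depends on. `TopologicalSorter` raises `CycleError` if the dependency data contains a cycle, which is itself worth finding before a disaster.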

Stage 2.5: Set Scope

Instructions:

◾ List the IT services covered in the scope of this IT disaster recovery plan, along with their recovery time objectives, recovery point objectives, and the order in which these services should be recovered.

| Service Priority | Service or Application Name | Recovery Time Objective | Recovery Point Objective |
|---|---|---|---|
| Prio 1 | Network Infrastructure | 12 hours | 12 hours |
| Prio 1 | Network Connectivity | 12 hours | 12 hours |
| Prio 1 | Storage Services | 12 hours | 24 hours |
| Prio 1 | Firewall Services | 12 hours | 24 hours |
| Prio 1 | Active Directory | 12 hours | 24 hours |
| ... | ... | ... | ... |
| Prio 2 | SAP Simple Finance | 24 hours | 24 hours |
| Prio 2 | Payroll | 24 hours | 24 hours |
| Prio 1 | Email | 24 hours | 24 hours |

Stage 3: Facility and Infrastructure Plan

Instructions:

◾ Document plans for recovering IT services in an alternate facility (if required) and plans for recovering infrastructure.

◾ Answer the following key questions:

– Where will we go when a disaster occurs?
– How will we restore our infrastructure services?

Stage 3.1: Determining technical approach for each service

Instructions:

◾ Using the list of priorities you identified in Stage 2, determine what your technical approach should be. Would you prefer to focus on preventing outages (for example, by implementing additional components or equipment), or on recovery options, including manual workarounds and alternate sites, to recover technology services and applications within the required timeframe?

◾ Fill out the table below, adjusting sample data as needed.


| IT Service | Risk Type | Risk Mitigation |
|---|---|---|
| (e.g. CRM, HR systems, ERP systems, etc.) | Facility down due to Non-destructive Event | Implement network/servers/software at a mirror data centre. Restore data from backup until the main facility has been recovered. |
| | Facility down due to Destructive Event | Implement network/servers/software at a mirror data centre. Recover data from backup. Mirror facility becomes primary facility. |
| | Network Loss | Take preventative approach through setting up redundant network components and agreement with alternate internet provider. |
| | Application Loss | Have application set up on standby server in data centre. Restore application data from backups. |
| | Employee Loss | Create more detailed documentation due to criticality of service. If needed, bring in temporary qualified staff during a disaster. |

Stage 3.2: Facility Plan

Instructions: Consider documenting items such as:

◾ the power, infrastructure, and space requirements for a recovery facility

◾ the circumstances under which a recovery facility will be used

◾ who is authorized to make the decision to use it

◾ who will be involved in setting up the recovery facility

◾ where the recovery facility is located and plans to identify an alternate facility if needed

Facility Requirements

| Requirement | Description |
|---|---|
| Power | |
| Infrastructure | |
| Space | |


If the primary facility can no longer support normal business operations, the team will be instructed to recover systems at the recovery facility. Once this decision has been made, the facilities team should start bringing the alternate facility to a functional state. It is also important to coordinate travel and logistics so that the team can operate out of the alternate site.

Stage 3.3: Infrastructure Plan

Instructions:

◾ Focus on recovering the minimum core infrastructure required to recover mission-critical IT services. Create a separate section in the document for each core service that includes detailed recovery procedures.

Sample List of Critical Infrastructure Services:

System

Local Area Network (LAN)
Wide Area Network (WAN)
Servers
Core Network
Firewall
Remote Connectivity
. . .

Local Area Network (LAN) Recovery Plan

People Responsible:

◾ Thomas Bauer, IT Manager, [contact details]

◾ [backup, if available]

Priority

◾ Critical

Recovery Strategy and Location

. . .

Network Diagram

. . .

Assumptions

◾ Racks and power are available

◾ Other

Recovery Time Objective (RTO) and Recovery Point Objective (RPO)

◾ RTO: 6 hours.

◾ RPO: 6 hours.


Recovery Platform

. . .

Recovery Procedure

◾ Overview of major steps:

1. Rack gear
2. . . .
3. . . .
4. . . .
5. Patch to switches
6. Configure router

◾ Details for each step:

1. Rack gear
– Mount gear
– Confirm power
– Patch to servers
– Connect to WAN
– Log in and update switch
– Configure rules
– . . .

Stage 3.4: Estimating Costs and Developing a Schedule

Instructions:

◾ Prepare cost estimates for the labor and technology required to implement Stages 3.2 and 3.3. Develop a schedule for implementation. Discuss both with the IT disaster recovery plan steering committee and obtain its approval.

Stage 4: Plan Implementation

Stage 4.1: Roles and Responsibilities

Instructions:

◾ Document roles, responsibilities and contact information for the disaster recovery team so that it can respond effectively to an incident or disaster.

Depending on the size and organization of your team, some roles may be combined.

Disaster Recovery Team Org Chart (Optional)

[Instructions: Consider adding an org chart to show the team roles and how they are interrelated.]

The following chart shows the key roles involved in preparing for and responding to a disaster.

Incident Manager (IT Lead)


The disaster recovery incident manager is responsible for making all decisions related to the IT disaster recovery efforts. This person's primary role is to guide the disaster recovery process. The entire IT recovery team reports to this person during an incident.

Responsibilities:

◾ Initiate the IT disaster recovery call tree.

◾ Provide senior leaders with status updates and the information they need to make decisions.

◾ Coordinate communications.

Facilities Team

The facilities team is responsible for all issues related to the physical facilities that house IT systems, including both the primary and recovery facilities. They are also responsible for assessing the damage and overseeing repairs to the primary location if it is damaged or destroyed.

Responsibilities:

◾ Ensure that the recovery facility is maintained in working order.

◾ Ensure that transportation, sufficient supplies, food and water, and sleeping arrangements are provided for all employees working at the recovery facility.

◾ Assess physical damage to the primary facility.

◾ Ensure that measures are taken to prevent further damage to the primary facility and that appropriate resources are provisioned to rebuild or repair the main facilities if necessary.

Network Team

The network team is responsible for assessing damage to network infrastructure and for providing data and voice network connectivity during a disaster.

Responsibilities:

◾ Assess damage to network infrastructure at the primary facility and prioritize the recovery of services in the manner and order that has the least impact.

◾ Communicate and coordinate with third parties to ensure recovery of connectivity.


◾ Ensure that needed network services are available at the recovery facility (if needed).

◾ Restore network services at the primary facility.

Server and Storage Team

The server and storage team is responsible for providing the physical server and storage infrastructure required to run IT operations and applications.

Responsibilities:

◾ Assess damage to servers and storage and prioritize the recovery of servers and storage devices in the manner and order that has the least impact.

◾ Ensure that servers and storage services are kept up-to-date with patches and copies of data.

◾ Ensure appropriate backups.

◾ Install and implement required tools, hardware and systems in the facilities.

Applications and Processes Team

The applications and processes team is responsible for ensuring that all applications operate as required to meet organizational objectives, and for managing the IT processes that are fundamental to supporting the recovery of IT services and applications.

Responsibilities:

◾ Assess impact to applications and prioritize the recovery of applications in the manner and order that has the least impact.

◾ Ensure that the following IT processes are followed when managing applications:

– incident management
– change management
– access provisioning
– security
– other

◾ Ensure that servers in the facilities are kept up-to-date with application patches and copies of data.

◾ Install and implement any tools, software and patches required in the facilities as appropriate.

Call List

[Instructions: Document the names, roles and contact information of leaders and team members responsible for responding to an incident and handling recovery efforts.]

| Name | Role / Title | Phone Number |
|---|---|---|
| ... | ... | ... |
| ... | ... | ... |


Stage 4.2: Disaster Response Processes

Instructions:

◾ This section records the main processes involved in responding to a disaster. Having them documented is likely to improve both the response time and the effectiveness of actions.

◾ The information below may be needed as part of each Disaster Response process:

– process name and description;
– steps;
– inputs/outputs; and
– personnel responsibilities and roles (what is expected of the different people involved in carrying out procedures?).

Disaster Response occurs in several phases as per the diagram below. When an event happens, the team first assesses the event and decides whether declaring a disaster is necessary. If a disaster has occurred, the team starts procedures for recovery of the IT service(s), using an alternate location if necessary. Once the critical and required services are back up and running, the team can work on getting back to normal operations. The final stage is to hold a review and analysis of the event up to the point at which normal services were resumed.

Processes for Assess Phase:

Process to Assess Severity of Incident or Event

Instructions:

◾ Document the procedure for deciding how severe the incident is, and give clear criteria and procedures for escalation.

◾ Additional Advice:

– Handle minor incidents that lead to service outage through incident response procedures. Severe incidents such as fire, flood, significant communications loss, power loss or unavailability of the building should be escalated to the appropriate personnel.

– Service desk process links should be documented and kept up to date.
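Escalation criteria are easier to apply under pressure when they are written as explicit rules. The sketch below is one hypothetical way to encode the advice above. The list of severe events comes from the text; the function name `route_incident` and the second rule (escalating when the expected outage exceeds the maximum allowable outage from Stage 2.1) are assumptions, not part of the guide.

```python
# Event types the guide names as severe; extend to suit your organization.
SEVERE_EVENTS = {
    "fire",
    "flood",
    "significant communications loss",
    "power loss",
    "building unavailable",
}


def route_incident(event_type, expected_outage_hours, max_allowable_outage_hours):
    """Return "escalate" or "incident-response" for a reported event."""
    if event_type.lower() in SEVERE_EVENTS:
        return "escalate"
    # Assumed extra criterion: escalate when the outage is expected to
    # exceed the maximum allowable outage recorded in Stage 2.1.
    if expected_outage_hours > max_allowable_outage_hours:
        return "escalate"
    return "incident-response"


print(route_incident("Flood", 1, 24))         # escalate
print(route_incident("disk failure", 2, 24))  # incident-response
```

Real criteria will depend on the organization; the point of the sketch is that each rule is stated once, in one place, and can be reviewed alongside the plan.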

Severe Incident Escalation

Directions:

◾ The escalation procedures for alerting senior leadership and IT Management to assess the impact of the incident need to be fully documented.


How to perform Impact Assessment

Directions:

◾ Clearly document what information must be collected in order to declare, or not declare, a disaster. Examples include assessing the degree of damage and calculating the anticipated recovery time. Include advice on whether to recover in situ or begin recovery at an alternate site.

Process to Declare Disaster

Directions:

◾ Clearly define the criteria for how and when a disaster should be confirmed and communicated. A procedure for delegation of authority should be in place, so that IT and other infrastructure team members are empowered to act if the usual authorities are unavailable.

Recover Phase Procedures:

Team Notification Procedures

Directions:

◾ Clearly document call-out procedures to ensure fast response from the disaster recovery team.

Recovery and Data Restore Initiation

Directions:

◾ Outline clearly the process for launching the disaster recovery plan, enabling the recovery site and recovering each system by order of priority.

Progress Communication

Directions:

◾ Identify preferred channels of communication, document who the important stakeholders are and suggest the frequency of communication.

Recovery Team Support

Directions:

◾ Record plans that ensure the recovery team will receive enough water, food, rest and resources to complete their tasks successfully.

◾ Give direction for handling employees' personal requirements, for example family emergencies, illness or injury.

Resume Services Phase:

Resuming Normal Operations Procedure

Directions:

◾ Identify and record the process for resuming normal operations, including checking readiness to resume and communicating intentions and status to stakeholders.

Review Phase Procedures:

Review Procedures

Directions:

◾ Record the review procedure, including analysis, assessment and clarification of what was learned and how future procedures can be improved. Communicate the findings of the post-event analysis to a broad group of the organization's stakeholders.


Stage 4.3: IT Services Recovery Plans

Instructions:

◾ This section documents plans for recovering the IT services listed in Stage 2.5.

◾ It is recommended that you include the following information relevant to each service:

– Responsibility – Specify who is responsible for managing this service, as well as any backup contacts in the event they are not available.

– Service Context – Specify who uses the service, what the criticality periods are, and contact information for vendors and other personnel such as database administrators and application owners.

– Service Classification – Specify the classification of this service (critical, vital, necessary or desired) as determined in Stage 2.4.

– Recovery Strategy and Location – Specify the overall strategy for recovering this service as well as where the service will be recovered.

– Assumptions – Specify any assumptions required to follow the recovery procedure, such as the ability to restore from backups.

– Recovery Time Objective (RTO) and Recovery Point Objective (RPO) – Specify the recovery time objective and recovery point objective for this service as determined in Stage 2.2.

– Recovery Platform – Specify the technology platform required to restore this service. For example, virtualized Windows servers configured similarly to the current production environment.

– Recovery Procedure – Consider providing an overview of the major steps of recovery before providing detailed recovery procedures. Select the minimum level of documentation that reduces risk to an acceptable level, as more detailed documentation requires more time to create and maintain.

– Test Procedure – Specify how the service can be tested to ensure that it is working correctly.

– Resume Procedure – Specify how to resume the service after the event has been addressed.

Example list of IT Services

Payroll
Marketing CRM
Email
Financial Services
Delivery Monitoring
Legal Records


Stage 5: Testing the DR Plan

Instructions:

◾ Document and communicate the reasons that plan testing and review are critical, the nature of the tests to be done, and the frequency of disaster recovery plan testing.

Disaster recovery plan checks are one of the most important factors in building the best plan for your organization. An effective, high-standard IT disaster recovery plan depends on good team cohesion, so drills and checks are mandatory to achieve the end goal of reliable disaster recovery.

These reviews must take place periodically, especially because changes that are not connected to the technology can have a significant impact on the DR plan. During each review:

◾ Update the plan to reflect organizational changes and new priorities and aims.

◾ Be sure to update the call lists.

◾ Be sure to update the team lists.

◾ Check that the plan has been updated to reflect any changes to configurations in the environment.

Remember: a good disaster recovery plan is one that can be carried out effectively and smoothly whenever needed. Each person and each system involved in the plan should take full part in the practice.

Every six months, the disaster recovery plan should be checked and given a run-through. On a more frequent basis, Bacula Systems recommends doing a walk-through, a disaster simulation and even a complete failover test.
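The six-month cadence can be tracked with a small date calculation. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the dates are illustrative, and the six-month default simply mirrors the interval given above.

```python
import calendar
from datetime import date


def add_months(day, months):
    """Return the date `months` later, clamped to the end of shorter months."""
    index = day.month - 1 + months
    year = day.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def next_review_due(last_review, interval_months=6):
    """Date by which the next plan check and run-through should happen."""
    return add_months(last_review, interval_months)


def is_overdue(last_review, today, interval_months=6):
    """True when the review interval has elapsed without a new review."""
    return today > next_review_due(last_review, interval_months)


print(next_review_due(date(2018, 4, 5)))               # 2018-10-05
print(is_overdue(date(2018, 4, 5), date(2018, 11, 1)))  # True
```

The same helper can be reused with a shorter interval for any more frequent exercises an organization schedules.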

Further Reading:

Bacula Enterprise Edition has a large number of features that make it especially suitable for use as part of a Disaster Recovery Plan. Because of its deep technical features, wide choice of storage destinations (including Cloud), and broad customizability, Bacula Systems recommends you consider this software as integral to your Disaster Recovery plans. Download the Bacula Enterprise Edition technical whitepaper “How to implement disaster recovery strategy and high availability” here:

https://www.baculasystems.com/how-to-implement-disaster-recovery-strategy-and-high-availability-the-bacula-systems-whitepaper

This whitepaper considers Disaster Recovery from a software data recovery perspective. You can also contact Bacula Systems to find out more about how this unique software can help you.


For More Information

For more information on Bacula Enterprise Edition, or any part of the broad Bacula Systems services portfolio, visit www.baculasystems.com.
