Introduction guide to Business Continuity


Upload: alan-trup-mbci

Post on 12-Apr-2017


Page 1: Introduction guide to Business Continuity


“One thing harder than planning for an emergency is explaining why you did not!”

www.adtbusinesscontinuity.com


Table of Contents

1. About the Author
2. Introduction
3. Why you should be doing Business Continuity
4. Information Gathering
5. Risk Assessment:
6. Business Impact Analysis:
7. Business Continuity Plan (BCP)
8. Training and BC awareness
9. IT Disaster Recovery/IT Service Continuity
10. Exercise/Test
11. Business and Services Linkage
12. Governance
13. Life Cycle
14. Tools:
15. Terminology
16. Other Options
16. Want to find out more or need some help?


1. About the Author

Alan Trup MBCI (Member of the Business Continuity Institute) joined the BCI in 2006. He is a passionate and successful Business Continuity and Disaster Recovery consultant who has worked in a wide range of businesses across many industries. He is Managing Director and founder of Adtbusinesscontinuity Ltd, independent BC and DR consultants.

This guide draws on over 15 years of continuous experience in Business Continuity and Disaster Recovery, helping major corporations including Mercedes Benz, Whitbread, Compass PLC, John Lewis Partnership and BP, as well as a huge range of small businesses.

Alan certainly has got the T-shirt! In this guide he sets out his hands-on approach to business continuity, using real-life experiences and the challenges that anyone implementing Business Continuity will face.


2. Introduction

This guide is an introduction to Business Continuity (BC) suitable for those new to Business Continuity. You might be the owner of a small business, or a senior person in a corporation who has been given the responsibility but does not know where to start. The principles are the same irrespective of the size of the organisation; the difference is in the budget and the scope.

Some people think:

BC is all about planning for a huge incident, such as a building collapse or a bomb, which is so unlikely to happen at their site that there is no point in spending time and money planning for it.

Why this is the wrong approach

Although the above example is the ultimate disaster, it is not the only

type of event that could render your critical processes unrecoverable,

thus threatening the existence of the entire business or at the very

least influencing the bottom line.

A relatively common occurrence, such as a small fire or flood in a sales location or the loss of a key system, could have the same devastating effect on the organisation.

What happens in the real world?

Business Continuity gets very little attention in the day-to-day running of a business. The priority changes very quickly only when an external party (e.g. an auditor, customer or potential customer) asks to see evidence of the plans, or when an actual incident happens that the organisation was able to cope with by sheer luck.

What would happen if, for example, the primary office was suddenly not accessible, or if an external event that could jeopardise the reputation of the organisation was developing? How would the management team react and communicate with staff and customers to head off or mitigate the impact?

It is often just assumed that it will be fine, or the question is ignored because “it will never happen to us”. However, when an incident occurs, panic and frantic work begin.

That word “assumption” is the most dangerous thing in Business Continuity. Each time I hear it in this space it raises a red flag, because assumptions turn out to be false when everyone thinks someone else is responsible and not them.


3. Why you should be doing Business Continuity

Put simply, it is common sense to protect your business against all the different existential risks that threaten it.

The benefits are:

• Minimise interruption of normal business operations

• Limit disruption and damage

• Minimise any financial and reputational repercussions

• Establish alternative means of operations

• Opportunity to train employees and owners in emergency procedures

• Provide smooth and rapid service restoration

• Competitive advantage


4. Information Gathering

There are several different methods that can be used to gather information:

• Interviewing/speaking directly to individuals

• Sending out questionnaires

• Reviewing documentation

• Carrying out a traditional audit

In most cases a combination of the above will be used to ensure all bases are covered and nothing key is missed. The mix will depend on geography and on access to individuals and sites.

Sample Questions for a questionnaire

a) What are the secondary location(s) called that employees can work from?

b) How far away is each of the above?

c) Has the business unit confirmed its RTO/RPO requirements (see the Terminology section)?

d) Are there any existing business continuity documents in place?

e) Can critical IT systems be accessed remotely, e.g. in the cloud from home?

f) Has any existing plan ever been tested?


5. Risk Assessment:

You need to understand what the current situation is before you know what needs to be done. This involves creating a risk register listing all the realistic threats to the organisation’s ability to continue in business.

E.g. key customer bankruptcy, terrorist attack in the area, fire, employee/ex-employee sabotage, flood, IT security breach, physical security breach.

It is also useful to look at the history of any previous incidents, how

they were resolved and how close they were to getting out of control

and causing an existential threat.

It is essential to walk around the site and key locations, such as server rooms, to assess their status and risk. If for any reason you cannot do this yourself, you should nominate someone to take multiple photographs that you can then analyse.


Below is a picture of something I discovered on a survey of a computer room. It is a smoke detector that has been covered by a plastic bag, rendering it totally useless for its intended purpose, which is of course to detect fires before they get out of hand.

The reason given was that it was covered by workmen doing some building maintenance, and they forgot to remove the bag!

When reviewing, you need to imagine that you are looking at the site after an incident has happened and you are trying to work out the cause.

A covered smoke alarm is a very serious thing!

Mitigation of this risk costs nothing, but it still needs to be done. Other items may involve expensive remediation, which needs to be budgeted and project managed.

Another example of something I discovered and highlighted is cabling that looks like spaghetti.


The risks in the picture are health and safety, as the cabling is a trip hazard, and continuity: if a wire comes out, the difficulty of tracing it in that mess of tangled wires will delay recovery, and other systems could also be affected and go down in an uncontrolled manner.

It is also a very good idea to team up with HSE specialists, as many areas will overlap. HSE regulations also tend to be legally binding, and failing to follow them can be a criminal offence, which adds extra weight to the business case for getting things fixed.

So, in the case of an engineer tripping over the cables and seriously

injuring themselves, or even dying from hitting their head on a cabinet

or perhaps getting electrocuted, any of these would have a negative

impact on the reputation of the company, plus if the trip meant a wire

was pulled out of a system, it would stop it running.

The assessment needs to include the likelihood of each risk happening and any mitigation in place for it. If the risk did materialise, the process would need to be able to manage the incident to minimise any effect on the company.

An example of bad handling of a fatal incident is that of BP CEO Tony Hayward who, during his company's battle to contain the massive Gulf of Mexico oil spill, said to the press “I’d like my life back”, not once thinking or paying attention to the fact that 11 workers had died and the Gulf of Mexico was now heavily polluted.


Had he previously attended a Business Continuity exercise playing through a serious incident, he would have been much more aware of the consequences and how to react, and generally better prepared. His response might then not have had quite the negative impact on BP’s reputation that it did.

The risk register is normally done in a spreadsheet:

Sample template

Rather like profit and loss, risks can be described as gross and net. The gross risk is what could happen with no mitigation; the net risk is what remains once the mitigation in place is considered. An example is a fire: the gross risk is total loss, but if a detection and sprinkler system is in place, the net risk is minimal water damage from the sprinklers.
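As a rough illustration of how gross and net risk might sit side by side in a register, here is a minimal Python sketch. The 1 to 5 scales, the "likelihood times impact" score and every figure in the sample rows are assumptions made for the example; the guide does not prescribe a scoring method, and a spreadsheet with the same columns would do the same job.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One row of a hypothetical risk register."""
    threat: str
    likelihood: int    # assumed scale: 1 (rare) to 5 (almost certain)
    gross_impact: int  # assumed scale: 1 (minor) to 5 (existential), no mitigation
    mitigation: str    # what is already in place, "" if nothing
    net_impact: int    # impact once the mitigation is taken into account

    @property
    def gross_score(self) -> int:
        return self.likelihood * self.gross_impact

    @property
    def net_score(self) -> int:
        return self.likelihood * self.net_impact


# Illustrative rows only; the scores are invented for the example.
register = [
    Risk("Fire", 2, 5, "Detection and sprinkler system", 2),
    Risk("Covered smoke detector in computer room", 2, 5, "", 5),
    Risk("Tangled cabling in server room", 3, 3, "", 3),
]

# Highest net risk first: these are the candidates for remediation.
for risk in sorted(register, key=lambda r: r.net_score, reverse=True):
    print(f"{risk.threat}: gross {risk.gross_score}, net {risk.net_score}")
```

Sorting on the net score rather than the gross score keeps attention on risks that have no effective mitigation yet, such as the covered smoke detector.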


6. Business Impact Analysis:

If a process were to fail or be unavailable, it is very important to establish what the monetary, legal, reputational and any other serious impacts on the business would be for any given period.

For some businesses, it can be very difficult to put a value on this. For example, if the building where the Payroll team is sited were out of action, salaries could not be paid, but because no sales work is done there the value of the impact is hard to establish. The loss of a sales outlet is usually more straightforward, as figures for how much income and profit it generates are available from existing data.

To work this out, it is a good idea to look at previous incidents and failures to see what the actual impact was. Even if the scenario is different, the important thing in this case is the actual impact.

To represent it graphically you can create a heat map showing the relative position of each system, service or asset.


Sample heat map
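A heat map of this kind can also be sketched in text form. The example below is only an illustration: the choice of likelihood and impact as the two axes, the three-level scale and all of the item names and scores are assumptions, not taken from the guide's own heat map.

```python
# Hypothetical items and scores (1 = low, 3 = high); none come from the guide.
items = {
    "Customer sales system": (2, 3),  # (likelihood, impact)
    "Accounts system": (1, 2),
    "Payroll building": (1, 3),
    "Sales outlet": (3, 2),
}

LEVELS = {1: "Low", 2: "Medium", 3: "High"}


def build_heat_map(scored_items):
    """Group item names into a grid keyed by (likelihood, impact)."""
    grid = {(lk, im): [] for lk in LEVELS for im in LEVELS}
    for name, (likelihood, impact) in scored_items.items():
        grid[(likelihood, impact)].append(name)
    return grid


def render(grid):
    """Return the grid as text, highest likelihood first."""
    lines = []
    for likelihood in sorted(LEVELS, reverse=True):
        for impact in sorted(LEVELS):
            names = ", ".join(grid[(likelihood, impact)]) or "-"
            lines.append(
                f"likelihood {LEVELS[likelihood]:<6} | "
                f"impact {LEVELS[impact]:<6} | {names}"
            )
    return "\n".join(lines)


heat_map = build_heat_map(items)
print(render(heat_map))
```

In practice the same grid is usually coloured in a spreadsheet, with the high-likelihood, high-impact corner in red.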

7. Business Continuity Plan (BCP)

This is the document that details the processes, procedures and contacts needed in the event of any type of incident that threatens the continuity of the organisation. It is advisable to keep the plan generic so that the same document can be used in multiple different scenarios.


It is the one document that is most likely to be requested by any

auditor, customer or potential customer.

There are multiple options for this document. However, one size or template does not fit all, and each one needs to be adapted for each organisation.

Key elements that must be included in a BCP as a minimum:

a) Document Control

This is where you specify the document date, version number and last revision. It is vital for someone auditing the plan to be able to establish the age of the document.

b) Invocation Process

A flow chart and description of how the invocation would work.

c) Objectives

Details of what the invocation and plan are expected to achieve. A key part will be what is in scope and what is out of scope.

d) Key Roles and Responsibilities

List of who is responsible for which element of the invocation and recovery, together with deputies.

e) Incident Management Team (IMT)

The core team that would be directing the invocation and recovery. This may not necessarily mirror the standard business-as-usual (BAU) management structure.

f) Activation Procedures

The method and tasks used to activate the plan, and any safeguards in place to prevent an invocation by an unauthorised person, which could cause bigger issues.


g) Alternative Locations

A list of locations other than the primary, where people can meet and/or work, initially and/or on a medium-term basis.

h) Communications Plan

Contains different messages for different stakeholders, e.g. customers, staff and media, and states when and how the messages will be communicated.
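The minimum elements a) to h) listed above can be treated as a simple checklist when reviewing a draft plan. The following Python sketch shows one possible way to do that; the function name, the idea of holding a plan as a dictionary of section titles and the sample draft are all hypothetical.

```python
# The minimum elements a) to h) listed above.
REQUIRED_ELEMENTS = [
    "Document Control",
    "Invocation Process",
    "Objectives",
    "Key Roles and Responsibilities",
    "Incident Management Team (IMT)",
    "Activation Procedures",
    "Alternative Locations",
    "Communications Plan",
]


def missing_elements(plan_sections):
    """Return the required elements with no non-empty section in the plan."""
    return [
        element
        for element in REQUIRED_ELEMENTS
        if not plan_sections.get(element, "").strip()
    ]


# Hypothetical draft plan: section title -> section text.
draft_plan = {
    "Document Control": "Version 0.3",
    "Invocation Process": "See flow chart",
    "Objectives": "",
    "Communications Plan": "Messages for customers, staff and media",
}

gaps = missing_elements(draft_plan)
print(gaps)
```

An empty section counts as missing here, so a heading with nothing under it is still reported as a gap.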

Benchmarking and Gap Analysis.

8. Training and BC awareness

It is important for all staff to be aware of their part in Business Continuity, even if it is passive. They may not realise that, following a major incident, they do not need to do anything and must go home, and that if they stay they will cause more issues by being there.

Training and awareness can take many forms. The highest level for most is the exercise (see section 10), but other methods I have used in the past include:

• Adding a BCP agenda item and introduction to regular team meetings

• Publishing a scenario with a quiz in newsletters

• Putting notices up to provoke thoughts and discussions, e.g. “What would you do tomorrow if this office was on fire?”


9. IT Disaster Recovery/IT Service Continuity

A management process that supports the overall Business Continuity Management process by ensuring that the IT service provider can always provide the minimum agreed business-continuity-related IT service levels.

The traditional name is Disaster Recovery (DR), although some organisations and professional bodies now refer to it as Service Continuity, as it is no longer concerned just with dealing with a disaster affecting IT systems.

Whatever it is called within your organisation, I interpret it as fundamentally the same. The only difference is that Service Continuity can cover additional scenarios, such as cyber-attacks, which were not traditionally part of DR.

The key questions for IT teams to answer are whether it is even possible for your IT systems to be recovered to another location following loss of the primary location, how long it would take, and whether this has been proved by recent testing.

There are several different technical solutions that achieve different capabilities. In most cases, the shorter the recovery time and the lower the data loss, the more expensive the solution.

Solution 1

The system replicates constantly between two data centres. If one data centre goes offline for any reason, the other takes over automatically with virtually no downtime or data loss for users. This is normally used where downtime and/or data loss would be extremely risky and costly to the organisation, which justifies its large expense.

Solution 2

The system is backed up overnight to another data centre that houses all the same systems, and it can be recovered within 24 hours.

Solution 3

The system is backed up each night and the tapes are taken off site. There is no spare equipment, so if the tapes had to be restored following loss of the primary site, new premises and equipment would need to be purchased and operating systems and communications set up before the tapes could be retrieved from the offsite store and restored. This recovery could easily take months, potentially threatening the viability of the organisation.

The key document for this process is the Disaster Recovery Plan (DRP), which may also be known as the Service Continuity Plan (SCP). It details the processes and technical tasks that the IT teams need to execute to complete the recovery, or to use when testing.

It is essential to test this process. In my experience, although systems may appear on paper to be recoverable within the requirements, this is often found to be seriously wanting when tested. It is much more sensible to discover this in a test than in a real invocation, which could jeopardise the existence of the organisation.
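The trade-off between the three solutions above can be sketched as picking the cheapest option that still meets the business's recovery targets. This is only an illustration: the hour figures and the relative cost ranking are assumptions (the guide gives "virtually no" loss for solution 1, "within 24 hours" for solution 2 and "months" for solution 3), and `cheapest_meeting` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    name: str
    recovery_hours: float   # time to recover after loss of the primary site
    data_loss_hours: float  # worst-case age of the data that is restored
    relative_cost: int      # assumed ranking: 1 = cheapest, 3 = most expensive


# Illustrative figures only; see the assumptions stated in the lead-in.
SOLUTIONS = [
    Solution("1: continuous replication", 0.1, 0.1, 3),
    Solution("2: overnight backup to second data centre", 24, 24, 2),
    Solution("3: nightly tapes taken off site", 24 * 60, 24, 1),
]


def cheapest_meeting(rto_hours, rpo_hours):
    """Return the cheapest solution within both targets, or None."""
    candidates = [
        s for s in SOLUTIONS
        if s.recovery_hours <= rto_hours and s.data_loss_hours <= rpo_hours
    ]
    return min(candidates, key=lambda s: s.relative_cost, default=None)


choice = cheapest_meeting(rto_hours=24, rpo_hours=24)
print(choice.name if choice else "No solution meets the targets")
```

Whatever figures go into such a comparison, they remain paper figures until a test has proved them.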


10. Exercise/Test

There will be many assumptions (that word again!) in people’s heads about what they would do in any given serious scenario, what they expect others to do, how processes will be followed and so on. Very often, however, these ideas would only work in their heads, and in reality there would be mass confusion, with everyone expecting someone else to take responsibility.

One way to establish whether these assumptions will work would be to wait for the incident to occur. Although that is a valid test of the ideas, if they fail then the impact on the organisation could well be terminal!

The best way to find out is to run what is known as a table-top or desktop exercise. This involves bringing all the key stakeholders into a meeting room, placing a scenario in front of them and seeing how they react, checking whether the steps in the BCP would work, whether they are in a logical order or even valid, and what items are missing altogether. During the exercise, as with any meeting, someone needs to note action items. A remediation plan is then drawn up to update the documentation, process and training and to close any other gaps that came out of the exercise, and possibly to run another exercise with a different scenario.


i) Standard Exercise Objectives

• Improve future response

• Safeguard our employees

• Minimise disruption to trade

• Minimise physical, financial and reputational damage

• Ensure prompt and open communication with those affected and the media

• Restore operations to ‘business as usual’ rapidly and effectively

You need to set the scene so that participants know what they can and can’t expect, and what they can do, in the role play they need to undertake.

j) The exercise process

• Scenario is described

• BCP is talked through step by step

• It is an opportunity to train individuals in their role and how it links into the big picture

• Different “curve balls” can be thrown in at any point

• Risk free (virtually)

• Process gaps identified for later remediation

• Many useful lessons will be learnt and participants will be more prepared


• Sample Scenario

• These exercises should normally be done on an annual basis

Sample Scenario

• 9th October 09.32hrs

• Location Head Office

• You are working in HQ when the evacuation siren sounds

• You leave the building without your computer and head for the muster point

• A fire starts in the kitchens of the canteen and quickly spreads throughout the basement, triggering the automatic alarm

• Evacuation is swift and thorough; all staff appear to be accounted for

• Suppressant systems activate; however, an undiagnosed fault in the pipework means that the sprinklers prove ineffective, and the fire soon takes hold of the basement and spreads to the ground floor

Adding pictures that you can find on the Internet to the slides and handouts can really help people focus and imagine the scenario better.


You can extend the props further, for example by downloading news web sites and changing them, on your own computer only, to show your own headlines, or by recording phone conversations in advance. The only limits are your imagination, time and budget.


11. Business and Services Linkage

Standard Blockers and Issues, each followed by its Remediation Options (some stick, some carrot)

• Blocker: Lack of support from top management
  Remediation: Engagement and direct involvement with exercises to make them realise their own responsibilities

• Blocker: Unrealistic business assumptions exceed actual capability
  Remediation: Capability should be established and tested, and then assumptions can be realigned with reality

• Blocker: Limited budget and will to improve and implement recommendations
  Remediation: The BIA will help put BC into perspective as a business case, e.g. the implementation costs are usually very small relative to the financial impact. Probable production benefits

• Blocker: Lack of knowledge and interest at all levels
  Remediation: Crisis Management/BC desktop exercises. Add a BC agenda item to regular meetings. Communication of past incidents and near incidents

• Blocker: “We accept the risk!”
  Remediation: Not an option if BC is a regulatory requirement. If it is not, accepting the risk can be an option provided the consequences are fully accepted


12. Governance

The purpose of governance is to develop, promote, assure and maintain the resilience capability of the organisation.

Governance activities include:

• Define the terms and scope of Business Continuity

• Set direction for Business Continuity for the organisation and assure its implementation

• Establish and agree the mechanisms to support the delivery and assurance of BC

• Establish the benchmark methodologies for measuring resilience

• Be responsible for the maintenance of, and use of good practice in, approaches

Accountability

The Governance group is accountable to the owner/directors and should report and escalate any issues and risks.


13. Life Cycle

Business Continuity is a never-ending process, as organisations, risks and environments change and evolve constantly. It is therefore not something that can be done once and left alone for years; it needs to be reviewed and exercised at least annually.

14. Tools:

There is a range of different tools on the market to assist with Business Continuity. For small businesses many will be too expensive, complex and unsuitable, and Business Continuity is often easier to implement with a document. For larger organisations with multiple facets, tools can make the management much easier.


a) Document management systems

These are generic systems meant for all types of documents, but they can also be used successfully to manage updates, version control and storage of Business Continuity documents.

b) Dedicated BC Systems

Systems designed specifically to cover all aspects of Business Continuity. They tend to be very feature rich, expensive and complex to install. In my experience, they often end up being used as glorified document management systems, so if choosing one of these make sure thorough due diligence is carried out to confirm that it is fit for your purposes.

c) Caller alert systems

Systems designed to notify multiple people in a pre-determined order, also known as a call tree. The administrator(s) pre-populate the system with individuals’ names, emails and phone numbers. When people need to be notified, either pre-written or ad hoc messages can be compiled in the system once and sent to different people depending on the groups they are in. Messages can go out by SMS, automated voice call or email, or any combination of these. This type of system is very efficient when large numbers of people need to be notified at the same time, and it does not necessarily have to be used just for emergencies.
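The group-based fan-out described for caller alert systems can be sketched in a few lines of Python. This is not how any particular product works: the `Contact` structure, the `build_notifications` helper, the group names and the contact details are all hypothetical, and the sketch only builds the list of messages rather than sending anything.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    email: str = ""
    phone: str = ""
    groups: set = field(default_factory=set)


def build_notifications(contacts, group, message, channels=("sms", "email")):
    """Return (channel, address, message) tuples for everyone in the group.

    Contacts are processed in list order, which stands in for the
    pre-determined calling order.
    """
    notifications = []
    for contact in contacts:
        if group not in contact.groups:
            continue
        if "sms" in channels and contact.phone:
            notifications.append(("sms", contact.phone, message))
        if "email" in channels and contact.email:
            notifications.append(("email", contact.email, message))
    return notifications


# Made-up contacts for illustration.
contacts = [
    Contact("A. Example", "a@example.com", "07700 900001", {"IMT", "All staff"}),
    Contact("B. Example", "b@example.com", "", {"All staff"}),
]

outbox = build_notifications(contacts, "IMT", "Head office closed; do not travel in.")
print(outbox)
```

A real system would hand each tuple to an SMS, voice or email gateway and record who has acknowledged the message.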


15. Terminology

The key terms

Term: Description

Disaster: An event or series of events where the impacts exceed the recovery capability of the standard IT incident management function for that service.

RPO - Recovery Point Objective: The target set for the status and availability of data at the start of a recovery process. It is a point in time at which the data or capacity of a process is in a known, valid state and can safely be restored from.

RTO - Recovery Time Objective: The target time for resuming the delivery of a product or service to an acceptable level following its disruption.

Recovery Point Capability (RPC): The actual “age” of the data that can be achieved in the event of invocation. This can be ascertained initially from the design but should be verified by testing.

Recovery Time Capability (RTC): The actual time taken to restore the system to a working status following the decision to invoke. This can be ascertained initially from the design but should be verified by testing.


The following diagram provides an illustration of RTO and RPO:

The RPO and RTO could be different for each system and application. For example, the accounts system could have an RPO/RTO of 72 hours/1 week, but the customer sales system might have an RPO/RTO of 1 hour/24 hours.

The figures are usually arrived at after lots of healthy debate and need to be set in a pragmatic way, bearing in mind that the lower the RTO/RPO, the more expensive the systems and infrastructure will usually be. For example, 15 mins/15 mins will nearly always be much more expensive than 24 hrs/24 hrs, and the amount spent needs to be justified by the Business Impact Analysis.
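To show how the objectives (RPO/RTO) relate to the capabilities (RPC/RTC), here is a small Python sketch that flags any system whose tested capability falls short of its objective. The objectives follow the examples in the text; the capability figures, the class name and the `gaps` method are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SystemTargets:
    name: str
    rpo_hours: float  # Recovery Point Objective
    rto_hours: float  # Recovery Time Objective
    rpc_hours: float  # Recovery Point Capability, as measured in a test
    rtc_hours: float  # Recovery Time Capability, as measured in a test

    def gaps(self):
        """Return a description of each objective the capability misses."""
        found = []
        if self.rpc_hours > self.rpo_hours:
            found.append(f"RPC {self.rpc_hours}h exceeds RPO {self.rpo_hours}h")
        if self.rtc_hours > self.rto_hours:
            found.append(f"RTC {self.rtc_hours}h exceeds RTO {self.rto_hours}h")
        return found


# Objectives follow the examples in the text (1 week = 168 hours);
# the capability figures are made up.
systems = [
    SystemTargets("Accounts", rpo_hours=72, rto_hours=168, rpc_hours=24, rtc_hours=96),
    SystemTargets("Customer sales", rpo_hours=1, rto_hours=24, rpc_hours=24, rtc_hours=36),
]

for system in systems:
    print(system.name, system.gaps() or "meets objectives")
```

Any gap reported this way means either investing to improve the capability or going back to the business to agree a more realistic objective.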


16. Other Options

Can I decide not to do anything about it?

If your business is regulated by a governing body, for example the FCA or Health and Safety, has obligations to shareholders or is subject to other regulations, then you will be required to put something in place. For other types of business, however, the choice can be yours.

It’s equivalent to the difference between car insurance (which is a legal requirement for everyone driving on public roads in the UK) and house insurance, which is optional.

You can decide to do nothing at all and just accept the risk. However, by doing this you are also implicitly signing up to what I call

“The Hope and Pray Method”

If it is optional, there is a commercial consideration: you can often gain a competitive advantage by having BC in place. Would you choose a company that has no provision over one that has? Well, your customers might well be thinking the same.


16. Want to find out more or need some help?

We can help you with any part of the above, no company too small or

too large, from a half day consultancy to a 6-month project.

• Been asked to carry out a business impact assessment

• Need a Business Continuity Plan (BCP)

• Need a Disaster Recovery Plan (DRP)

• Have you just had an audit and need to resolve your DR/BC issues fast?

• Do you want to test whether your systems can recover in the event of an incident or disaster?

• Need facilitation of testing or running an exercise

• Your IT team do not have the time and/or the skills to carry out DR/BC work

You probably don't like doing this stuff! Well we love it!!!

www.Adtbusinesscontinuity.com

Call 0800 9993374


Copyright notice

This document is copyright of Add Business Continuity Ltd, all rights reserved.

Any redistribution or reproduction of part or all of the contents in any form is prohibited other than the following: you may print or download to a local hard disk extracts for your use only.

You may not, except with our express written permission, distribute or

commercially exploit the content. Nor may you transmit it or store it in

any other website or other form of electronic retrieval system.