Business Continuity Management for Data Centres. Dr Robert M Cachia, BSc, Dott Sc (Milan), FCQI (UK); Director, ISACA Malta Chapter; IT Governance Manager, Government of Malta; Visiting Senior Lecturer, University of Malta. Joint Event: BCS Malta Section and ISACA Malta Chapter, 13th October 2009.


Business Continuity Management

for Data Centres

Dr Robert M Cachia BSc Dott Sc (Milan) FCQI (UK)

Director ISACA Malta Chapter

IT Governance Manager

Government of Malta

Visiting Senior Lecturer

University of Malta

Joint Event

BCS Malta Section

ISACA Malta Chapter

13th October 2009

About myself

20+ years' experience

Positions: Programmer, Software Engineer, IT Project Manager, IT Quality Manager, Principal Consultant, IT Governance Manager

IT experience
• Software House environment
• IT Service Desk environment
• Data Centre environment

University of Malta teaching
• e-SCM, ERP, e-CRM
• Service Quality Assurance
• Sustainable Logistics & Transportation
(to MBA/BAccty, Dipl Logistics & Transportation students)

Director, ISACA Malta Chapter

robertmcachia at umedumt

Today's agenda (1)

BCM for industrial-strength, co-located commercial IT providers, e.g. Data Centres offering SaaS using Cloud Computing technologies

Today we discuss:
• e-services, e-business, data centres & IT call centres
• BCM for Data Centres
• business e-services & trust
• BCM & IT - economic context
• threats, vulnerabilities, risk, assets
• Data Centre outages: business impacts
• BCM - in Data Centres, for Data Centres

Today's agenda (2)

Today we discuss (continued):
• a BCM initiative for DCs: its output
• BCP for DCs: management choices
• BCM Project/Programme
• business impact analysis
• risk assessment
• BCM DC recovery - business & technical
• an example disaster - mass cyber-attack & BCM response
• structure and content of BS25999
• some takeaways

e-Services, e-Business - all from Data Centres & IT Call Centres

BCM for Data Centres: what?

BCM
i. is a management & business capability to protect the value-creating activities of the DC
ii. supports the Board's objective of staying in business
iii. points i & ii make BCM immediately a Governance issue

BCM - planned processes to
• assess, reduce & manage vulnerability to risk
• plan DC responses in case of adversity
• exercise preparedness for continued IT delivery during disruption
• restore normality and protect DC reputation and goodwill

"The time to repair the roof is when the sun is shining" (J F Kennedy)

Business e-Services & trust

Is she
• a Logistics Manager?
• a Banker?
• a PA to the CEO?
• a CEO?
in
• an Airport
• a Bourse
• a Telecoms provider
• iGaming
• e-payment
• a Retail chain …
… doing B2B or B2C
… doing e-CRM or e-SCM

e-services need trust

BCM and IT - economic context

Economy 2.0 & Economy 3.0: digital, networked, global, mobile, locational, 24x7x365
1.1 billion Internet users, 2.7 billion mobile phone users
ubiquitous VoIP/Skype, e-mail & calendaring software
e-Business, e-Banking, e-Payment, e-Government
e-SCM, e-CRM, B2B, B2C - across supply chains, geographies, time-zones, economies & jurisdictions

BCM in DCs is mandated by economic logic

BCM and IT - what not to do

human error: IT management
human error: technical

DC BCM terminology: Threats, Vulnerabilities, Risk, Assets

Threat: the intent and capacity to cause loss or disruption and create adverse consequences, e.g. to IT services, data, Data Centres, IT Call Centres, etc.

Vulnerability: the susceptibility of a service provider, service, data or infrastructure to damage, impairment or exposure by a threat

Risk: a measure of the potential consequences of a contingency against the likelihood of its occurring (threats + vulnerabilities = risk)

Impact: the consequence/effect of a threat, expressed in terms of reduction of DC capability or loss of business, service, data, etc.

Asset: an IT service element, e.g. software, hardware, IT people, data
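A note on the risk definition above: weighing consequence against likelihood can be sketched as an expected annual loss. This is a minimal illustration and not part of the original slides; the `Risk` class, the yearly likelihoods and the cost figures are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood_per_year: float  # assumed: expected occurrences per year
    impact_cost: float          # assumed: loss per occurrence, any currency

    @property
    def exposure(self) -> float:
        """Expected annual loss: consequence weighed against likelihood."""
        return self.likelihood_per_year * self.impact_cost


def rank_by_exposure(risks):
    """Return risks ordered from highest to lowest expected annual loss."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Hypothetical figures, for illustration only.
risks = [
    Risk("burst pipe floods server room", 0.05, 400_000),
    Risk("failed software patch", 2.0, 15_000),
    Risk("loss of telecom connectivity", 0.5, 30_000),
]
ranked = rank_by_exposure(risks)
for r in ranked:
    print(f"{r.name}: {r.exposure:,.0f} per year")
```

A frequent, cheap failure can outrank a rare, expensive one on this measure, which is one reason a single score should be read alongside the impact figure itself.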

Threats - some examples

Some data centre threats:
• human error (management/technical, accidental/malicious)
• technical failure (mechanical, electrical, hardware, software, etc.)
• software failure (even software patches themselves contain bugs)
• fire, explosion, smoke
• floods (natural, burst pipes)
• toxic hazards (effluents, emissions)
• structural collapse
• crime (vandalism, organized crime, white-collar crime, etc.)
• mass cyber-attack, digital terrorism, cyber-war

Established threats may diminish and new threats emerge - stay alert

(Figure: Oct 2006 - Marsascala)

Threats - general & IT-specific

Explosion, fire, smoke, emissions, effluents in or close to the DC
Malicious IT threat

IT Vulnerabilities - some examples (1)

DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software

Some IT vulnerabilities:
• Poor Data Centre site selection or Data Centre design
• SPOFs (people, hardware, databases, software)
• Poor Data Centre procedures or compliance with procedure (SFIA HR framework for IT, ITIL)
• Poor physical access control
• Poor software patching

(Figure: good physical access control)

IT Vulnerabilities - some examples (2)

• Poor software testing, especially Web testing (recall CMMi, ISEB)
• Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
• Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
• Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
• Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
• Poor fire protection

Known vulnerabilities sometimes diminish & new ones emerge - stay alert

(Figure: wiring from hell)

Risks - general & IT-specific

Some Data Centre risks:
• loss of facilities (buildings, roads leading to building, etc.)
• loss of data (data integrity, data availability; recall ISO 27001)
• loss of power
• loss of water supply (cooling); the opposite - flooding
• computer infection outbreaks - viruses, worms
• physical access compromised, intrusion with intent
• asset seizure
• supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
• loss of Telecom connectivity (data, voice)
• loss of people (people can be SPOFs)

Changing threats, changing vulnerabilities - a shifting scenario: stay alert

(Figures: your IT vendor supply chain; burst pipe; DC flooding)

IT Risks - some examples

Physical DC Risks; Logical DC Risks

DC outages: business impacts - large & immediate

Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover

Recall: many DCs are in premises that were never designed to be DC-ready

Q: How to protect value-creation capability?
A: BCM

BCM in Data Centres, BCM for Data Centres

360° e-continuity & e-assurance

• Business functions of commercial IT providers (e.g. CEO's office, Procurement, Marketing & Sales, HR, Facilities Planning, QA)
• IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
• Data
• Employees, including IT people
• Physical: Facilities/Building, Networks, VoIP
• Computational and Storage: Servers, SAN/NAS
• Software stack: OS, DBMS, middleware, applications
• Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail

BCM for data centres: 360°

BCM DC initiative output: a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:
• We have/are executing plans by DC client/market segment …
• We have/are executing plans by Application/by Service …
• We have/are executing plans by DC Department …
• We have/are executing plans by Building/by Floor …

BC Plan, BC Planning: (i) What (ii) How (iii) Who (iv) When

DC Business Continuity Planning: management choices

Which DC clients/services/applications get restored first, and who will wait longer? … a recovery sequence for each client

Which services/applications get restored completely, and which clients will initially get only limited service?

MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for the DC, for each client)

RTO (recovery time objective): the period of time within which assets/services must be recovered after disruption (for the DC, for each client)
RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for the DC, for each client)

RPO (recovery point objective): the point in time to which assets/services must be restored after disruption (for the DC, for each client)
RPO affects the volume of data that may need to be restored
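The three objectives above can be checked against the timeline of an incident. The sketch below is an illustration only: the function name and timestamps are hypothetical, and it assumes RPO is expressed as the longest acceptable gap between the last recoverable copy of the data and the moment of disruption, which is one common reading of "point in time".

```python
from datetime import datetime, timedelta


def objective_breaches(rto, rpo, mtpd, last_recovery_point, incident, restored):
    """Return a list of breached objectives (an empty list means none)."""
    breaches = []
    # A plan whose RTO is longer than the MTPD cannot be acceptable.
    if rto > mtpd:
        breaches.append("RTO is longer than MTPD")
    # Time taken to recover, measured from the moment of disruption.
    if restored - incident > rto:
        breaches.append("RTO missed")
    # Data written after the last recovery point is lost (assumed reading of RPO).
    if incident - last_recovery_point > rpo:
        breaches.append("RPO missed")
    return breaches


# Hypothetical incident: recovery took 3 hours, last good copy was 1.5 hours old.
result = objective_breaches(
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    mtpd=timedelta(hours=24),
    last_recovery_point=datetime(2009, 10, 13, 8, 30),
    incident=datetime(2009, 10, 13, 10, 0),
    restored=datetime(2009, 10, 13, 13, 0),
)
print(result)  # ['RPO missed']
```

In practice each client or service would carry its own RTO and RPO, so the same check would run once per client.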

BCM initiative requires a dedicated Project/Programme

A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:

• BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities, approach, awareness, and BC project outcomes/deliverables)
• Business Impact Analysis Phase
• Risk Assessment Phase
• BC Planning & Design Phase
• Exercise & Testing Phase
• Maintenance & Review Phase
• Awareness & Training Phase

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities

• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews

BIA answers the questions: which DC activities & assets create most value? … of all potential DC losses, which impaired activities & assets would be the greatest loss?

cascading impacts, multiple DC failure
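As a rough illustration of how BIA quantifies consequences without estimating probability, the sketch below ranks activities by the loss an outage of a given length would cause. The `Activity` fields, the loss formula and every figure are hypothetical assumptions; a real BIA would take its numbers from the questionnaires and interviews mentioned above, and would also have to account for reputational loss, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    revenue_loss_per_hour: float  # assumed figure
    sla_penalty_per_hour: float   # assumed figure
    recovery_cost: float          # assumed one-off cost to restore

    def loss(self, outage_hours: float) -> float:
        """Consequence of an outage of the given length (no probability involved)."""
        hourly = self.revenue_loss_per_hour + self.sla_penalty_per_hour
        return hourly * outage_hours + self.recovery_cost


def rank_by_loss(activities, outage_hours):
    """Order activities from greatest to smallest loss for one outage length."""
    return sorted(activities, key=lambda a: a.loss(outage_hours), reverse=True)


# Hypothetical activities and figures, for illustration only.
activities = [
    Activity("hosted e-mail", 500, 100, 5_000),
    Activity("e-payment gateway", 5_000, 2_000, 20_000),
    Activity("client intranets", 1_000, 0, 10_000),
]
ranking = rank_by_loss(activities, outage_hours=8)
for a in ranking:
    print(f"{a.name}: {a.loss(8):,.0f}")
```

Running the ranking for several outage lengths shows which activities become critical first as a disruption drags on.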

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square: unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance

Contingency Risks to DC - top-left square: very rare events that will extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance

High-incidence, low-impact risks & minor risks: the order of the day - operational matters
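The quadrants described above can be expressed as a small classification function. This is a sketch under stated assumptions: likelihood and impact are taken to be scores normalised to the range 0 to 1, the 0.5 cut-off is arbitrary, and the short labels are hypothetical names for the slide's quadrants.

```python
def classify(likelihood: float, impact: float, threshold: float = 0.5) -> str:
    """Place a risk in one quadrant of a likelihood-vs-impact matrix.

    Both scores are assumed to lie between 0 and 1; the threshold is arbitrary.
    """
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"          # top-right: unacceptable, needs management ownership + BCM
    if high_impact:
        return "contingency"    # top-left: rare, but could extinguish the business; BCM
    if high_likelihood:
        return "operational"    # frequent but low impact: day-to-day operations
    return "minor"              # rare and low impact: day-to-day operations


print(classify(likelihood=0.1, impact=0.9))  # contingency
```

The "contingency" quadrant is the one the slide warns about: a low likelihood score makes it easy to overlook even though the impact is the highest on the matrix.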

Risk Assessment Phase

RA considers the what/why/where/who can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets.

• RA takes a primarily external focus: a world/political/economic, IT sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC and plan to protect value-creating capability

BC Planning & Design Phase - business recovery

DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery sequence:

1. Set the MTPD objective for the DC/for each client
2. Assign RTOs for individual client/services/applications/data on the basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval (ii) budget (iii) tasking individuals

"Failure to prepare is preparing to fail" - John Wooden

BC Planning & Design Phase - business recovery (2)

Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision - On-shore, Near-shore, Offshore fallback/redundancy options

A typical recovery sequence:
1. Initial emergency response
2. Resume mission-critical Applications/Services
3. Resume non-critical Applications/Services
4. Restoration to primary site: full services, verify stability

Ongoing communication - DC employees, clients, regulators

BC Planning & Design - technical recovery

DC recovery options, by increasing MTPD/RTO & decreasing cost:
• Mirror: full redundancy; short-distance/widely-separated Data Centres; latency considerations
• Hot: fully equipped fallback Data Centre
• Warm: fallback Data Centre missing key components
• Cold: empty fallback Data Centre

Note: business decision - On-shore, Near-shore, Offshore
Note: Mirror, Hot, Warm, Cold - (i) own (ii) shared-space (iii) reciprocal agreements (iv) outsourced: a business issue
Note: high-end BC for DCs - active-active … hot failover …
Note: entry-point BC for DCs - electronic vaulting, remote journaling, transaction logging

(Figure: mirroring)
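Because the options above trade recovery time against cost, choosing between them can be sketched as picking the cheapest option whose achievable recovery time still meets the required RTO. Everything numeric below is a hypothetical assumption: the slides give only the ordering of the four options, not hours or costs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    achievable_rto_hours: float  # assumed figure
    annual_cost: float           # assumed figure, any currency


# Ordered as on the slide: recovery time increases while cost decreases.
OPTIONS = (
    RecoveryOption("Mirror", 0.1, 1_000_000),
    RecoveryOption("Hot", 4, 400_000),
    RecoveryOption("Warm", 48, 150_000),
    RecoveryOption("Cold", 168, 50_000),
)


def cheapest_option(
    required_rto_hours: float,
    options: Sequence[RecoveryOption] = OPTIONS,
) -> Optional[RecoveryOption]:
    """Return the lowest-cost option that meets the RTO, or None if none does."""
    feasible = [o for o in options if o.achievable_rto_hours <= required_rto_hours]
    return min(feasible, key=lambda o: o.annual_cost, default=None)


choice = cheapest_option(required_rto_hours=8)
print(choice.name if choice else "no option meets this RTO")  # Hot
```

This covers only the cost side of the cost/benefit analysis; whether the site is owned, shared, reciprocal or outsourced remains a business decision, as the notes above say.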

Phases: Exercise & Testing, Maintain & Review, Awareness & Training

Exercise & Testing Phase
Give yourself an intentional DC disaster:
1. Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios

Maintain & Review Phase
Feedback from testing/audit; broaden/deepen the DC BC plan

Awareness & Training Phase
It's never enough: institutionalize BC practice, embed it into DC culture

The proof of the "pudding" is in the eating

An example disaster: mass cyber-attack

A mass cyber-attack may involve prior sniffing … social engineering & widespread attack

An attack may be blended:
• denial-of-service and/or
• widespread virus/worm infection and/or
• penetration (data theft, data corruption) and/or
• fraud/blackmail

Incident/Problem Management (recall ISO 20000, ITIL)

Source: M Benyoucef, University of Ottawa

An example DC disaster: BCM response to cyber-attack

When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse

1. Interdict: actions to block the attack and halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data, applications/services to normality
4. Analyse: investigation/audit - examination of evidence

This sequence applies to DoS/DDoS, infection & penetration cyber-attacks

Note: After the BCM response, seek prevention: (i) Lessons Learnt (ii) improve DC process/systems resilience (iii) heighten DC emergency preparedness
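The four-step response is an ordered sequence, so an incident log can refuse to record a stage until the earlier ones are done. The sketch below is one hypothetical way to enforce that ordering; the `Stage` and `IncidentResponse` names are assumptions, not part of any standard or tool named in these slides.

```python
from enum import IntEnum


class Stage(IntEnum):
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair and bring systems, data and services to normality
    ANALYSE = 4    # investigation/audit, examination of evidence


class IncidentResponse:
    """Records completed stages and rejects any stage taken out of order."""

    def __init__(self):
        self.completed = []

    def next_stage(self):
        """Return the stage due next, or None when all four are complete."""
        done = len(self.completed)
        return Stage(done + 1) if done < len(Stage) else None

    def complete(self, stage: Stage) -> None:
        expected = self.next_stage()
        if stage is not expected:
            raise ValueError(f"expected {expected!r}, got {stage!r}")
        self.completed.append(stage)


response = IncidentResponse()
response.complete(Stage.INTERDICT)
response.complete(Stage.CONTAIN)
print(response.next_stage().name)  # RECOVER
```

A real response may overlap stages, for example starting evidence capture during containment, so a strict gate like this suits a checklist rather than a full workflow.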

Structure & Content of BS25999

A DC may go beyond organising a BCP project. It may decide to formalise its BCM assurance process by embracing the BS25999 standard.

BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture

Some takeaways (1)

BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned

The Swiss cheese model of business resilience
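A rough numerical reading of the Swiss cheese model: if each defensive layer has some chance of letting a threat through, and the layers fail independently (the "holes never aligned" condition), the chance that a threat passes every layer is the product of those chances. Independence and the probabilities below are assumptions made for illustration only.

```python
from math import prod


def breach_probability(layer_failure_probs):
    """Probability that every layer fails at once, assuming independent layers."""
    return prod(layer_failure_probs)


# Hypothetical figures: three layers, each letting 10% of threats through.
baseline = breach_probability([0.1, 0.1, 0.1])
# "Decrease the size of the holes": halve each layer's failure chance.
smaller_holes = breach_probability([0.05, 0.05, 0.05])

print(f"baseline: {baseline:.6f}, smaller holes: {smaller_holes:.6f}")
```

When holes do line up, for instance because one person is the single point of failure behind several layers, the independence assumption breaks and the real figure can be far worse than the product suggests.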

Some takeaways (2)

BCM is a concern of:
CIOs, CTOs, Data Centre Managers
Quality Managers, Infosec Managers
IT Architects, DBAs
IT Auditors
CEOs, CFOs - because it's Governance

BCM addresses broad, non-specific risks: it is a "catch all"
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org

Questions and Discussion

Thank You

Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" umedumt

About myself

20+ years experience

Positions

Programmer Software Engineer IT Project Manager IT Quality Manager Principal Consultant IT Governance Manager

IT experience

bull Software House environment

bull IT Service Desk environment

bull Data Centre environment

University of Malta teaching

bull e-SCM ERP e-CRM

bull Service Quality Assurance

bull Sustainable Logistics amp Transportation

(to MBABAccty Dipl Logistics amp Transportation students)

Director ISACA Malta Chapter

robertmcachia at umedumt

Todayrsquos agenda (1)

BCM for industrial strength co-located commercial IT providers

eg Data Centres offering SaaS using Cloud Computing technologies

Today we discuss

bull e-services e-business data centres amp IT call centres

bull BCM for Data Centres

bull business e-services amp trust

bull BCM amp IT - economic context

bull threats vulnerabilities risk assets

bull Data Centre outages business impacts

bull BCM - in Data Centres for Data Centres

robertmcachia at umedumt

Todayrsquos agenda (2)

Today we discuss (continued)

bull a BCM initiative for DCs its output

bull BCP for DCs management choices

bull BCM ProjectProgramme

bull business impact analysis

bull risk assessment

bull BCM DC recovery - business amp technical

bull an example disaster - mass cyber-attack amp BCM response

bull structure and content of BS25999

bull some takeaways

robertmcachia at umedumt

e-Services e-Business - all from Data Centres amp IT Call Centres

robertmcachia at umedumt

BCM for Data Centres what

BCM

i is a management amp business capability to protect value-creating

activities of the DC

ii supports the Boardrsquos objective of staying in business

iii points i amp ii make BCM immediately a Governance issue

BCM - planned processes to

bull assess reduce amp manage vulnerability to risk

bull plan DC responses in case of adversity

bull exercise preparedness for continued IT delivery during disruption

bull restore normality and protect DC reputation and goodwill

ldquoThe time to repair the roof is when the sun is shiningrdquo (J F Kennedy)

robertmcachia at umedumt

Business e-Services amp trust

Is she

bull a Logistics Manager

bull a Banker

bull a PA to the CEO

bull a CEO

in

bull an Airport

bull a Bourse

bull a Telecoms provider

bull iGaming

bull e-payment

bull a Retail chain hellip

hellip doing B2B or B2C

hellip doing e-CRM or e-SCMe-services need trust

robertmcachia at umedumt

BCM and IT - economic context

Economy 20 amp Economy 30

digital networked global mobile locational 24x7x365

11 billion Internet users 27 billion mobile phone users

ubiquitous VoIPSkype e-mail amp calendaring software

e-Business e-Banking e-Payment e-Government

e-SCM e-CRM B2B B2C -across supply chains geographies time-zones

economies amp jurisdictions

BCM in DCs is mandated by economic logic

robertmcachia at umedumt

BCM and IT - what not to do

human error IT management human error technical

robertmcachia at umedumt

DC BCM terminology Threats Vulnerabilities Risk Assets

Threat the intent and capacity to cause loss or disruption

and create adverse consequences - eg to IT

services data Data Centers IT Call Centers etc

Vulnerability the susceptibility of a service provider service data

or infrastructure to damage impairment or exposure

by a threat

Risk a measure of the potential consequences of a

contingency against the likelihood of its occurring

threats + vulnerabilities = risk

Impact the consequenceeffect of a threat expressed in

terms of reduction of DC capability or loss of

business service data etc

Asset an IT service element eg software hardware IT

people data

robertmcachia at umedumt

Threats - some examples

Some data centre threats

human error (managementtechnical accidentalmalicious)

technical failure (mechanical electrical hardware software

etc)

software failure (even software patches themselves contain

bugs)

fire explosion smoke

floods (natural burst pipes)

toxic hazards (effluents emissions)

structural collapse

crime (vandalism organized crime white-collar crime etc)

mass cyber-attack digital terrorism cyber-war

Established threats may diminish and new threats emerge - alertness

Oct 2006 - Marsascala

robertmcachia at umedumt

Threats - general amp IT-specific

-

Explosion fire smoke emissions effluents

in or close to DC

malicious IT Threat

robertmcachia at umedumt

IT Vulnerabilities - some examples (1)

DC vulnerabilities may be technical or non-technical

eg in organizational design in business processes systems or software

Some IT vulnerabilities

Poor Data Centre site selection or Data Centre design

SPOFs (people hardware databases software)

Poor Data Centre procedures or compliance to

procedure (SFIA HR framework for IT ITIL)

Poor physical access control

Poor software patchinggood physical access

control

robertmcachia at umedumt

IT Vulnerabilities - some examples (2)

Poor software testing especially Web

testing (recall CMMi ISEB)

Poor Configuration ManagementVersion

Control (recall ISO 20000 ITIL)

Poor Infosec (passwords biometrics

encryption anti-virusmalware)

(recall ISO 27001 CobIT RiskIT)

Poor Capacity ManagementAvailability

Management (recall ISO 20000 ITIL)

Poor procurementsupplier management

(ISO 9000 ITIL BS25999)

Poor fire protection

Known vulnerabilities sometimes diminish amp new ones emerge - alertness

wiring from hell

robertmcachia at umedumt

Risks - general amp IT-specific

loss of facilities (buildings roads leading to building etc)

loss of data (data integrity data availability recall ISO 27001)

loss of power

loss of water supply (cooling) the opposite - flooding

computer infection outbreaks - viruses worms

physical access compromised intrusion with intent

asset seizure

Changing threats changing vulnerabilities - shifting scenario alertness

Some Data Centres risks

supply chain disruption (software vendor collapse

hardware vendor collapse ISP collapse)

loss of Telecom connectivity (data voice)

loss of people (people can be SPOFs)Your IT vendor supply

chain

Burst pipe

DC flooding

robertmcachia at umedumt

IT Risks - some examples

Physical DC Risks Logical DC Risks

DC outages business impacts - large amp immediate

Impact of DC outages

bull down-time

bull breached SLAs

bull lost revenue

bull reputational damage

clients and prospects lost

forever

bull cost to recover

bull legal liabilities

bull some DCs never recover

Recall many DCs are in

premises that were never

designed to be DC-ready

Q How to protect value-creation capability

A BCM

robertmcachia at umedumt

BCM in Data Centres BCM for Data Centres

360 e-continuity amp e-assurance

business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities

Planning QA)

IT processes eg IncidentProblem ConfigurationChange

AvailabilityCapacity

Data

Employees including IT people

Physical FacilitiesBuilding Networks VoIP

Computational and Storage

Servers SANNAS

Software stackOS DBMS middleware applications

ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail

BCM for data centres 360

robertmcachia at umedumt

BCM DC initiative output a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan the CIOCTOData Centre Manager can tell the

CEO

We haveare executing plans by DC clientmarket segment hellip

We haveare executing plans by Applicationby Service hellip

We haveare executing plans by DC Departmenthellip

We haveare executing plans by Buildingby Floor hellip

BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When

robertmcachia at umedumt

DC Business Continuity Planning management choices

Which DC clientsservicesapplications get restored first and who will

wait longer hellip recovery sequence for each client

Which servicesapplications get restored completely which clients will

initially only get limited service

MTPD maximum tolerable period of disruption the period of time by

which all identified systemsapplicationsservices and data

must be restored to normality (for DC for each client)

RTO recovery time objective the period in time within which

assetsservices must be recovered after disruption (for DC for

each client)

RTO affects the recovery option shorter RTO -gt more difficult

more expensive (for DC for each client)

RPO recovery point objective the point in time to which

assetsservices must be restored after disruption (for DC for

each client)

RPO affects the volume of data that may need to be restored

robertmcachia at umedumt

BCM initiative requires a dedicated ProjectProgramme

A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases

BC Project Initiation Phase

(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)

Business Impact Analysis Phase

Risk Assessment Phase

BC Planning amp Design Phase

Exercise amp Testing Phase

Maintenance amp Review Phase

Awareness amp Training Phase

robertmcachia at umedumt

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC its users activities

servicesrevenue streams applications data liabilities

bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets

bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)

bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences

bull BIA techniques questionnaires + interviews

BIA answers the questions which DC activities amp

assets create most value hellip of all potential DC losses

which impaired activities amp assets would be the

greatest losscascading impacts

multiple DC failure

robertmcachia at umedumt

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square

unacceptable Eliminate or Postpone

Transfer Monitor amp Mitigate +

management ownership + DO BCM -Governance

Contingency Risks to DC - top-left

square very rare events will

extinguish the business complacency +

overconfidence can distract

management from the consequences

DO BCM - Governance

High incidence and low impact risksamp

minor risks order of the day -

operational matters

robertmcachia at umedumt

Risk Assessment Phase

RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets

bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes

bull RA identifies threat scenarios to DC value-creating activities ampassets

bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)

Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability

robertmcachia at umedumt

BC Planning amp Design Phase - business recovery

DC recovery options driven by business requirements

regulation contracts amp SLAs A typical recovery sequence

1 Set MTPD objective for the DCfor each

client

2 Assign RTOs for individual

clientservicesapplicationsdata on

basis of prioritySLA

3 Evaluate alternative management amp

technical recovery options

4 Perform costbenefit analysis for each

option (re-planiterate)

5 Decide CEOCFO (i) approval (ii)

budget (iii) tasking individualsFailure to prepare is preparing

to fail - John Wooden

robertmcachia at umedumt

Some business recovery options for the DC

bull an Emergency Operations OfficeCentre

bull virtual workplaceteleworking from home for selected DC employees

bull alternative IT Call CentreData Centre

bull business decision - On-shore Near-shore Offshore

fallbackredundancy options

A typical recovery sequence

1 Initial emergency response

2 Resume mission-critical ApplicationsServices

3 Resume non-critical ApplicationsServices

4 Restoration to primary site full services verify stability

Ongoing communication - DC employees clients regulators

BC Planning amp Design Phase - business recovery (2)

robertmcachia at umedumt

BC Planning amp Design - technical recovery

Note business decision On-shore Near-shore Offshore

Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal

agreements (iv) outsourced a business issue

Note high-end BC for DCs active-active hellip hot failover hellip

Note entry-point BC for DCs electronic vaulting remote journaling

transaction logging

Mirror full redundancy short-

distancewidely-separated Data

Centers latency considerations

Hot fully equipped fallback Data Centre

Warm fallback Data Center missing key

components

Cold empty fallback Data Center

DC Recovery options by increasing MTPD RTO amp decreasing cost

mirroring

robertmcachia at umedumt

Phases Exercise amp Testing Maintain amp Review Awareness amp Training

Maintain amp Review Phase

feedback from testingaudit broadendeepen DC BC plan

Awareness amp Training Phase

its never enough institutionalize BC practice embed into

DC culture

The proof of the bdquopudding‟

is in the eating

Exercise amp Testing Phase

Give yourself an intentional DC disaster

1 Re-test BCP regularly

(walk-through checklist role-

playingsimulation full interruption amp

rehearsal)

2 Test alternative risk (threat +

vulnerability) scenarios

robertmcachia at umedumt

An example disaster mass cyber-attack

A mass cyber-attack may

involve

prior sniffing hellip

hellip social engineering amp

widespread attack

An attack may be blended

denial-of-service andor

widespread virusworm

infection andor

penetration (data theft

data corruption) andor

fraudblackmail

IncidentProblem Management

(recall ISO 20000 ITIL)source M Benyoucef

University of Ottawa

robertmcachia at umedumt

An example DC disaster: BCM response to cyber-attack

When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse

1. Interdict: actions to block the attack, halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data, applications/services to normality
4. Analyse: investigation/audit - examination of evidence

This sequence applies to DoS/DDoS, infection & penetration cyber-attacks

Note: After the BCM response, seek prevention: (i) Lessons Learnt, (ii) improve DC process/systems resilience, (iii) heighten DC emergency preparedness
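The four-step response is strictly ordered, which a small sketch can make explicit. The names `ResponsePhase`, `next_phase` and `run_response` are hypothetical and exist only for this illustration; the handlers stand in for whatever actions a DC actually takes in each phase.

```python
from enum import IntEnum


class ResponsePhase(IntEnum):
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair damage, bring services back to normality
    ANALYSE = 4    # investigation/audit, examination of evidence


def next_phase(current: ResponsePhase):
    """Return the phase that follows `current`, or None after ANALYSE."""
    if current is ResponsePhase.ANALYSE:
        return None
    return ResponsePhase(current + 1)


def run_response(handlers):
    """Call one handler per phase, in order, and return (phase name, result) pairs."""
    return [(phase.name, handlers[phase]()) for phase in ResponsePhase]
```

Iterating over the enum yields the phases in definition order, so a handler for Recover can never run before Contain has been attempted.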

Structure & Content of BS25999

A DC may go beyond organising a BCP project. It may decide to formalise its BCM assurance process by embracing the BS25999 standard.

BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture

Some takeaways (1)

The Swiss cheese model of business resilience

BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned
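One way to put numbers on the Swiss cheese picture is a minimal sketch that assumes each defensive layer fails independently of the others. The independence assumption is ours (it roughly corresponds to holes that "keep moving"); the function name `breach_probability` and the probabilities are hypothetical.

```python
from math import prod


def breach_probability(hole_probabilities):
    """Probability that a threat passes every layer, assuming independent layers.

    Each entry is the chance (0..1) that a threat finds a hole in that layer.
    """
    if not hole_probabilities:
        raise ValueError("at least one layer is required")
    for p in hole_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return prod(hole_probabilities)
```

Under this assumption, shrinking any single hole or adding another layer both reduce the product, which mirrors the first two bullets.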

Some takeaways (2)

BCM is a concern of:
• CIOs, CTOs, Data Centre Managers
• Quality Managers, Infosec Managers
• IT Architects, DBAs
• IT Auditors
• CEOs, CFOs, because it's Governance

BCM addresses broad, non-specific risks: it is a "catch all"
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick

Sources in and around BCM for Data Centres

• The Business Continuity Institute (BCI, UK): http://www.thebci.org
• Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
• A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
• Agility & alertness ... continuitycentral: http://www.continuitycentral.com/uk.htm
• BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
• RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
• SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
• SEI (USA): http://www.sei.cmu.edu
• Uptime Institute (USA): http://www.uptimeinstitute.org

Questions and Discussion

Thank You

Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt


e-Services, e-Business - all from Data Centres & IT Call Centres

BCM for Data Centres: what

BCM
i. is a management & business capability to protect value-creating activities of the DC
ii. supports the Board's objective of staying in business
iii. points i & ii make BCM immediately a Governance issue

BCM - planned processes to:
• assess, reduce & manage vulnerability to risk
• plan DC responses in case of adversity
• exercise preparedness for continued IT delivery during disruption
• restore normality and protect DC reputation and goodwill

"The time to repair the roof is when the sun is shining" (J F Kennedy)

Business e-Services & trust

Is she:
• a Logistics Manager
• a Banker
• a PA to the CEO
• a CEO

in:
• an Airport
• a Bourse
• a Telecoms provider
• iGaming
• e-payment
• a Retail chain ...

... doing B2B or B2C
... doing e-CRM or e-SCM

e-services need trust

BCM and IT - economic context

Economy 2.0 & Economy 3.0: digital, networked, global, mobile, locational, 24x7x365
1.1 billion Internet users, 2.7 billion mobile phone users
ubiquitous VoIP/Skype, e-mail & calendaring software
e-Business, e-Banking, e-Payment, e-Government
e-SCM, e-CRM, B2B, B2C - across supply chains, geographies, time-zones, economies & jurisdictions

BCM in DCs is mandated by economic logic

BCM and IT - what not to do

[Images: human error (IT management); human error (technical)]

DC BCM terminology: Threats, Vulnerabilities, Risk, Assets

Threat: the intent and capacity to cause loss or disruption and create adverse consequences - e.g. to IT services, data, Data Centres, IT Call Centres, etc.
Vulnerability: the susceptibility of a service provider, service, data or infrastructure to damage, impairment or exposure by a threat
Risk: a measure of the potential consequences of a contingency against the likelihood of its occurring; threats + vulnerabilities = risk
Impact: the consequence/effect of a threat, expressed in terms of reduction of DC capability or loss of business, service, data, etc.
Asset: an IT service element, e.g. software, hardware, IT people, data
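The definition of risk as consequences weighed against likelihood is often turned into a single figure by multiplying the two. The sketch below uses that common convention as an assumption; the talk itself does not prescribe a formula, and `risk_exposure` is a hypothetical name.

```python
def risk_exposure(likelihood: float, impact_cost: float) -> float:
    """Expected loss: probability of the contingency times its consequence.

    likelihood  -- estimated probability of occurrence in the period (0..1)
    impact_cost -- estimated loss if it occurs, in any consistent currency unit
    """
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must lie between 0 and 1")
    if impact_cost < 0:
        raise ValueError("impact cost cannot be negative")
    return likelihood * impact_cost
```

A single number of this kind is convenient for ranking, but it hides the difference between frequent small losses and rare catastrophic ones, so it complements a likelihood-versus-impact view and does not replace it.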

Threats - some examples

Some data centre threats:
• human error (management/technical, accidental/malicious)
• technical failure (mechanical, electrical, hardware, software, etc.)
• software failure (even software patches themselves contain bugs)
• fire, explosion, smoke
• floods (natural, burst pipes)
• toxic hazards (effluents, emissions)
• structural collapse
• crime (vandalism, organized crime, white-collar crime, etc.)
• mass cyber-attack, digital terrorism, cyber-war

Established threats may diminish and new threats emerge - alertness

[Image: Oct 2006 - Marsascala]

Threats - general & IT-specific

[Images: explosion, fire, smoke, emissions, effluents in or close to DC; malicious IT threat]

IT Vulnerabilities - some examples (1)

DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software

Some IT vulnerabilities:
• Poor Data Centre site selection or Data Centre design
• SPOFs (people, hardware, databases, software)
• Poor Data Centre procedures or compliance with procedure (SFIA HR framework for IT, ITIL)
• Poor physical access control
• Poor software patching

[Image: good physical access control]

IT Vulnerabilities - some examples (2)

• Poor software testing, especially Web testing (recall CMMi, ISEB)
• Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
• Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
• Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
• Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
• Poor fire protection

Known vulnerabilities sometimes diminish & new ones emerge - alertness

[Image: wiring from hell]

Risks - general & IT-specific

Some Data Centre risks:
• supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
• loss of Telecom connectivity (data, voice)
• loss of people (people can be SPOFs)
• loss of facilities (buildings, roads leading to building, etc.)
• loss of data (data integrity, data availability; recall ISO 27001)
• loss of power
• loss of water supply (cooling); the opposite - flooding
• computer infection outbreaks - viruses, worms
• physical access compromised, intrusion with intent
• asset seizure

Changing threats, changing vulnerabilities - shifting scenario; alertness

[Images: your IT vendor supply chain; burst pipe; DC flooding]

IT Risks - some examples

[Images: Physical DC Risks; Logical DC Risks]

DC outages: business impacts - large & immediate

Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover

Recall: many DCs are in premises that were never designed to be DC-ready

Q: How to protect value-creation capability?
A: BCM

BCM in Data Centres, BCM for Data Centres

360° e-continuity & e-assurance:
• business functions of commercial IT providers (e.g. CEO's office, Procurement, Marketing & Sales, HR, Facilities Planning, QA)
• IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
• Data
• Employees, including IT people
• Physical Facilities/Building, Networks, VoIP
• Computational and Storage: Servers, SAN/NAS
• Software stack: OS, DBMS, middleware, applications
• Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail

BCM DC initiative output: a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan the CIO/CTO/Data Centre Manager can tell the CEO:
• We have/are executing plans by DC client/market segment ...
• We have/are executing plans by Application/by Service ...
• We have/are executing plans by DC Department ...
• We have/are executing plans by Building/by Floor ...

BC Plan, BC Planning: (i) What, (ii) How, (iii) Who, (iv) When

DC Business Continuity Planning: management choices

• Which DC clients/services/applications get restored first and who will wait longer ... recovery sequence for each client
• Which services/applications get restored completely; which clients will initially only get limited service

MTPD: maximum tolerable period of disruption: the period of time by which all identified systems/applications/services and data must be restored to normality (for DC, for each client)

RTO: recovery time objective: the period in time within which assets/services must be recovered after disruption (for DC, for each client)
RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for DC, for each client)

RPO: recovery point objective: the point in time to which assets/services must be restored after disruption (for DC, for each client)
RPO affects the volume of data that may need to be restored
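RTO and RPO measure different things: RTO bounds the downtime after the disruption, RPO bounds how far back the restored data may lie. A minimal sketch of checking one incident against both, with the hypothetical helper name `check_objectives` and made-up timestamps:

```python
from datetime import datetime, timedelta


def check_objectives(disrupted_at, last_recovery_point, restored_at, rto, rpo):
    """Compare an incident's downtime and data-loss window with the RTO and RPO."""
    downtime = restored_at - disrupted_at
    data_loss_window = disrupted_at - last_recovery_point
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Illustrative incident: last backup 11:30, disruption 12:00, service back 15:00.
result = check_objectives(
    disrupted_at=datetime(2009, 10, 13, 12, 0),
    last_recovery_point=datetime(2009, 10, 13, 11, 30),
    restored_at=datetime(2009, 10, 13, 15, 0),
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
)
```

In this made-up incident the three-hour downtime is inside the four-hour RTO, while the thirty-minute gap since the last recovery point exceeds the fifteen-minute RPO.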

BCM initiative requires a dedicated Project/Programme

A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:
• BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness, and BC project outcomes/deliverables)
• Business Impact Analysis Phase
• Risk Assessment Phase
• BC Planning & Design Phase
• Exercise & Testing Phase
• Maintenance & Review Phase
• Awareness & Training Phase

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities

• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews

BIA answers the questions: which DC activities & assets create most value ... of all potential DC losses, which impaired activities & assets would be the greatest loss

[Images: cascading impacts; multiple DC failure]
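The quantification step of a BIA can be sketched as summing the loss components per asset and ranking the totals. The function name `rank_by_loss`, the asset names and every figure below are hypothetical; in practice the numbers would come from the questionnaires and interviews. Note that no probabilities appear, in line with the BIA's focus on consequences.

```python
def rank_by_loss(losses):
    """Rank assets by total estimated loss, largest first.

    `losses` maps an asset name to a dict of loss components
    (e.g. lost revenue, cost of recovery), all in the same currency unit.
    """
    totals = {asset: sum(parts.values()) for asset, parts in losses.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up estimates for a one-day disruption.
example = {
    "intranet": {"lost_revenue": 10, "cost_of_recovery": 20},
    "e-payment": {"lost_revenue": 500, "cost_of_recovery": 100},
    "e-mail": {"lost_revenue": 50, "cost_of_recovery": 30},
}
ranking = rank_by_loss(example)
```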

The Risk Matrix - likelihood vs impact

• Major Risks to DC - top-right square: unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance
• Contingency Risks to DC - top-left square: very rare events will extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance
• High incidence and low impact risks & minor risks: order of the day - operational matters
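Placing a risk in one of the four squares is a simple two-way split. The sketch below assumes likelihood and impact have both been normalised to a 0..1 scale with a 0.5 cut-off; that scale, the cut-off and the name `classify_risk` are illustrative assumptions, with impact on the vertical axis and likelihood on the horizontal one.

```python
def classify_risk(likelihood: float, impact: float, threshold: float = 0.5) -> str:
    """Return the risk-matrix square for a risk scored on a 0..1 scale."""
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"            # top-right: unacceptable, do BCM
    if high_impact:
        return "contingency"      # top-left: rare but business-ending, do BCM
    if high_likelihood:
        return "high-incidence"   # bottom-right: operational matter
    return "minor"                # bottom-left: operational matter
```

The two top squares are the ones the slide assigns to BCM and Governance; the bottom two are left to day-to-day operations.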

Risk Assessment Phase

RA considers the what/why/where/who can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets

• RA takes a primarily external focus, a world/political/economic IT sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC and plan to protect value-creating capability

BC Planning & Design Phase - business recovery

DC recovery options: driven by business requirements, regulation, contracts & SLAs. A typical recovery sequence:
1. Set MTPD objective for the DC/for each client
2. Assign RTOs for individual client/services/applications/data on basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval, (ii) budget, (iii) tasking individuals

"Failure to prepare is preparing to fail" - John Wooden
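Steps 1 and 2 can be sketched as a small consistency check followed by a sort. The sketch assumes the common convention that no RTO may exceed the MTPD; the function name `recovery_order`, the service names and the hours are hypothetical.

```python
def recovery_order(rto_hours_by_service, mtpd_hours):
    """Return service names ordered by ascending RTO (most urgent first).

    Raises ValueError if any RTO exceeds the MTPD, since such a plan
    could not restore everything within the tolerable period.
    """
    for service, rto in rto_hours_by_service.items():
        if rto > mtpd_hours:
            raise ValueError(f"RTO for {service!r} exceeds the MTPD")
    return sorted(rto_hours_by_service, key=lambda name: rto_hours_by_service[name])


# Made-up RTOs (hours) assigned on the basis of priority/SLA.
order = recovery_order({"e-mail": 24, "e-payment": 1, "intranet": 48}, mtpd_hours=72)
```

The resulting order is one input to the cost/benefit step: the services at the front of the list are the ones that drive the choice of recovery option.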

BC Planning & Design Phase - business recovery (2)

Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision - On-shore, Near-shore, Offshore fallback/redundancy options

A typical recovery sequence:
1. Initial emergency response
2. Resume mission-critical Applications/Services
3. Resume non-critical Applications/Services
4. Restoration to primary site, full services, verify stability

Ongoing communication - DC employees, clients, regulators


Todayrsquos agenda (2)

Today we discuss (continued)

bull a BCM initiative for DCs its output

bull BCP for DCs management choices

bull BCM ProjectProgramme

bull business impact analysis

bull risk assessment

bull BCM DC recovery - business amp technical

bull an example disaster - mass cyber-attack amp BCM response

bull structure and content of BS25999

bull some takeaways

robertmcachia at umedumt

e-Services e-Business - all from Data Centres amp IT Call Centres

robertmcachia at umedumt

BCM for Data Centres what

BCM

i is a management amp business capability to protect value-creating

activities of the DC

ii supports the Boardrsquos objective of staying in business

iii points i amp ii make BCM immediately a Governance issue

BCM - planned processes to

bull assess reduce amp manage vulnerability to risk

bull plan DC responses in case of adversity

bull exercise preparedness for continued IT delivery during disruption

bull restore normality and protect DC reputation and goodwill

ldquoThe time to repair the roof is when the sun is shiningrdquo (J F Kennedy)

robertmcachia at umedumt

Business e-Services amp trust

Is she

bull a Logistics Manager

bull a Banker

bull a PA to the CEO

bull a CEO

in

bull an Airport

bull a Bourse

bull a Telecoms provider

bull iGaming

bull e-payment

bull a Retail chain hellip

hellip doing B2B or B2C

hellip doing e-CRM or e-SCMe-services need trust

robertmcachia at umedumt

BCM and IT - economic context

Economy 20 amp Economy 30

digital networked global mobile locational 24x7x365

11 billion Internet users 27 billion mobile phone users

ubiquitous VoIPSkype e-mail amp calendaring software

e-Business e-Banking e-Payment e-Government

e-SCM e-CRM B2B B2C -across supply chains geographies time-zones

economies amp jurisdictions

BCM in DCs is mandated by economic logic

robertmcachia at umedumt

BCM and IT - what not to do

human error IT management human error technical

robertmcachia at umedumt

DC BCM terminology Threats Vulnerabilities Risk Assets

Threat the intent and capacity to cause loss or disruption

and create adverse consequences - eg to IT

services data Data Centers IT Call Centers etc

Vulnerability the susceptibility of a service provider service data

or infrastructure to damage impairment or exposure

by a threat

Risk a measure of the potential consequences of a

contingency against the likelihood of its occurring

threats + vulnerabilities = risk

Impact the consequenceeffect of a threat expressed in

terms of reduction of DC capability or loss of

business service data etc

Asset an IT service element eg software hardware IT

people data

robertmcachia at umedumt

Threats - some examples

Some data centre threats

human error (managementtechnical accidentalmalicious)

technical failure (mechanical electrical hardware software

etc)

software failure (even software patches themselves contain

bugs)

fire explosion smoke

floods (natural burst pipes)

toxic hazards (effluents emissions)

structural collapse

crime (vandalism organized crime white-collar crime etc)

mass cyber-attack digital terrorism cyber-war

Established threats may diminish and new threats emerge - alertness

Oct 2006 - Marsascala

robertmcachia at umedumt

Threats - general amp IT-specific

-

Explosion fire smoke emissions effluents

in or close to DC

malicious IT Threat

robertmcachia at umedumt

IT Vulnerabilities - some examples (1)

DC vulnerabilities may be technical or non-technical

eg in organizational design in business processes systems or software

Some IT vulnerabilities

Poor Data Centre site selection or Data Centre design

SPOFs (people hardware databases software)

Poor Data Centre procedures or compliance to

procedure (SFIA HR framework for IT ITIL)

Poor physical access control

Poor software patchinggood physical access

control

robertmcachia at umedumt

IT Vulnerabilities - some examples (2)

Poor software testing especially Web

testing (recall CMMi ISEB)

Poor Configuration ManagementVersion

Control (recall ISO 20000 ITIL)

Poor Infosec (passwords biometrics

encryption anti-virusmalware)

(recall ISO 27001 CobIT RiskIT)

Poor Capacity ManagementAvailability

Management (recall ISO 20000 ITIL)

Poor procurementsupplier management

(ISO 9000 ITIL BS25999)

Poor fire protection

Known vulnerabilities sometimes diminish amp new ones emerge - alertness

wiring from hell

robertmcachia at umedumt

Risks - general amp IT-specific

loss of facilities (buildings roads leading to building etc)

loss of data (data integrity data availability recall ISO 27001)

loss of power

loss of water supply (cooling) the opposite - flooding

computer infection outbreaks - viruses worms

physical access compromised intrusion with intent

asset seizure

Changing threats changing vulnerabilities - shifting scenario alertness

Some Data Centres risks

supply chain disruption (software vendor collapse

hardware vendor collapse ISP collapse)

loss of Telecom connectivity (data voice)

loss of people (people can be SPOFs)Your IT vendor supply

chain

Burst pipe

DC flooding

robertmcachia at umedumt

IT Risks - some examples

Physical DC Risks Logical DC Risks

DC outages business impacts - large amp immediate

Impact of DC outages

bull down-time

bull breached SLAs

bull lost revenue

bull reputational damage

clients and prospects lost

forever

bull cost to recover

bull legal liabilities

bull some DCs never recover

Recall many DCs are in

premises that were never

designed to be DC-ready

Q How to protect value-creation capability

A BCM

robertmcachia at umedumt

BCM in Data Centres BCM for Data Centres

360 e-continuity amp e-assurance

business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities

Planning QA)

IT processes eg IncidentProblem ConfigurationChange

AvailabilityCapacity

Data

Employees including IT people

Physical FacilitiesBuilding Networks VoIP

Computational and Storage

Servers SANNAS

Software stackOS DBMS middleware applications

ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail

BCM for data centres 360

robertmcachia at umedumt

BCM DC initiative output a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan the CIOCTOData Centre Manager can tell the

CEO

We haveare executing plans by DC clientmarket segment hellip

We haveare executing plans by Applicationby Service hellip

We haveare executing plans by DC Departmenthellip

We haveare executing plans by Buildingby Floor hellip

BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When

robertmcachia at umedumt

DC Business Continuity Planning management choices

Which DC clientsservicesapplications get restored first and who will

wait longer hellip recovery sequence for each client

Which servicesapplications get restored completely which clients will

initially only get limited service

MTPD maximum tolerable period of disruption the period of time by

which all identified systemsapplicationsservices and data

must be restored to normality (for DC for each client)

RTO recovery time objective the period in time within which

assetsservices must be recovered after disruption (for DC for

each client)

RTO affects the recovery option shorter RTO -gt more difficult

more expensive (for DC for each client)

RPO recovery point objective the point in time to which

assetsservices must be restored after disruption (for DC for

each client)

RPO affects the volume of data that may need to be restored

robertmcachia at umedumt

BCM initiative requires a dedicated ProjectProgramme

A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases

BC Project Initiation Phase

(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)

Business Impact Analysis Phase

Risk Assessment Phase

BC Planning amp Design Phase

Exercise amp Testing Phase

Maintenance amp Review Phase

Awareness amp Training Phase

robertmcachia at umedumt

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC its users activities

servicesrevenue streams applications data liabilities

bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets

bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)

bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences

bull BIA techniques questionnaires + interviews

BIA answers the questions which DC activities amp

assets create most value hellip of all potential DC losses

which impaired activities amp assets would be the

greatest losscascading impacts

multiple DC failure

robertmcachia at umedumt

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square

unacceptable Eliminate or Postpone

Transfer Monitor amp Mitigate +

management ownership + DO BCM -Governance

Contingency Risks to DC - top-left

square very rare events will

extinguish the business complacency +

overconfidence can distract

management from the consequences

DO BCM - Governance

High incidence and low impact risksamp

minor risks order of the day -

operational matters

robertmcachia at umedumt

Risk Assessment Phase

RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets

bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes

bull RA identifies threat scenarios to DC value-creating activities ampassets

bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)

Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability

robertmcachia at umedumt

BC Planning amp Design Phase - business recovery

DC recovery options driven by business requirements

regulation contracts amp SLAs A typical recovery sequence

1 Set MTPD objective for the DCfor each

client

2 Assign RTOs for individual

clientservicesapplicationsdata on

basis of prioritySLA

3 Evaluate alternative management amp

technical recovery options

4 Perform costbenefit analysis for each

option (re-planiterate)

5 Decide CEOCFO (i) approval (ii)

budget (iii) tasking individualsFailure to prepare is preparing

to fail - John Wooden

robertmcachia at umedumt

Some business recovery options for the DC

bull an Emergency Operations OfficeCentre

bull virtual workplaceteleworking from home for selected DC employees

bull alternative IT Call CentreData Centre

bull business decision - On-shore Near-shore Offshore

fallbackredundancy options

A typical recovery sequence

1 Initial emergency response

2 Resume mission-critical ApplicationsServices

3 Resume non-critical ApplicationsServices

4 Restoration to primary site full services verify stability

Ongoing communication - DC employees clients regulators

BC Planning amp Design Phase - business recovery (2)

robertmcachia at umedumt

BC Planning amp Design - technical recovery

Note business decision On-shore Near-shore Offshore

Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal

agreements (iv) outsourced a business issue

Note high-end BC for DCs active-active hellip hot failover hellip

Note entry-point BC for DCs electronic vaulting remote journaling

transaction logging

Mirror full redundancy short-

distancewidely-separated Data

Centers latency considerations

Hot fully equipped fallback Data Centre

Warm fallback Data Center missing key

components

Cold empty fallback Data Center

DC Recovery options by increasing MTPD RTO amp decreasing cost

mirroring

robertmcachia at umedumt

Phases Exercise amp Testing Maintain amp Review Awareness amp Training

Maintain amp Review Phase

feedback from testingaudit broadendeepen DC BC plan

Awareness amp Training Phase

its never enough institutionalize BC practice embed into

DC culture

The proof of the bdquopudding‟

is in the eating

Exercise amp Testing Phase

Give yourself an intentional DC disaster

1 Re-test BCP regularly

(walk-through checklist role-

playingsimulation full interruption amp

rehearsal)

2 Test alternative risk (threat +

vulnerability) scenarios

robertmcachia at umedumt

An example disaster mass cyber-attack

A mass cyber-attack may

involve

prior sniffing hellip

hellip social engineering amp

widespread attack

An attack may be blended

denial-of-service andor

widespread virusworm

infection andor

penetration (data theft

data corruption) andor

fraudblackmail

IncidentProblem Management

(recall ISO 20000 ITIL)source M Benyoucef

University of Ottawa

robertmcachia at umedumt

An example DC disaster: BCM response to cyber-attack

When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse.

1. Interdict: actions to block the attack and halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data and applications/services back to normality
4. Analyse: investigation/audit - examination of evidence

This sequence applies to DoS/DDoS, infection & penetration cyber-attacks.

Note: After the BCM response, seek prevention: (i) Lessons Learnt, (ii) improve DC process/systems resilience, (iii) heighten DC emergency preparedness.
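The four-step response can be sketched as a tiny state tracker. This is an assumption-laden illustration: the class and method names are hypothetical, and enforcing a strict order is one reading of the slide, not a requirement it states.

```python
from enum import IntEnum


class Phase(IntEnum):
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair damage, bring systems back to normality
    ANALYSE = 4    # investigation/audit of the evidence


class IncidentResponse:
    """Tracks one incident through the four phases, strictly in order."""

    def __init__(self):
        self.completed = []

    def complete(self, phase):
        # Phase(n) raises ValueError once all four phases are done.
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            raise ValueError(f"expected {expected.name}, got {phase.name}")
        self.completed.append(phase)

    @property
    def closed(self):
        return len(self.completed) == len(Phase)


response = IncidentResponse()
for phase in Phase:
    response.complete(phase)
print(response.closed)
```

Skipping straight to `Phase.RECOVER` on a fresh incident raises `ValueError`, which is the point of the sketch: recovery before containment risks restoring into a still-active attack.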

Structure & Content of BS25999

A DC may go beyond organising a BCP project. It may decide to formalise its BCM assurance process by embracing the BS25999 standard.

BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture

Some takeaways (1)

The Swiss cheese model of business resilience

BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned
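One simple probabilistic reading of the Swiss cheese model, offered as a sketch rather than as part of the original talk: give each defensive layer a probability that a threat finds a hole in it (a figure that lumps together the number and size of holes). If the layers are independent, the chance of a full breach is the product; if the holes line up, it is no better than the weakest single layer. All probabilities below are invented.

```python
from math import prod


def breach_probability(hole_probabilities):
    """Chance that a threat passes every layer, assuming independent layers."""
    return prod(hole_probabilities)


def aligned_breach_probability(hole_probabilities):
    """Worst case when holes line up: passing one layer means passing all."""
    return min(hole_probabilities)


baseline = breach_probability([0.2, 0.2, 0.2])          # three leaky layers
fewer_smaller = breach_probability([0.1, 0.1, 0.1])     # fewer/smaller holes
aligned = aligned_breach_probability([0.2, 0.2, 0.2])   # holes never moved

print(f"{baseline:.4f} {fewer_smaller:.4f} {aligned:.4f}")
```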

Some takeaways (2)

BCM is a concern of:
CIOs, CTOs, Data Centre Managers
Quality Managers, Infosec Managers
IT Architects, DBAs
IT Auditors
CEOs, CFOs, because it's Governance

BCM addresses broad, non-specific risks: it is a "catch all".
BCM is probably the most cost-effective security control of all.
BCM: start now, start small, scale fast; broaden/deepen BCM scope incrementally.

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org

Questions and Discussion

Thank You

Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt

e-Services, e-Business - all from Data Centres & IT Call Centres

BCM for Data Centres: what

BCM:
i. is a management & business capability to protect value-creating activities of the DC
ii. supports the Board's objective of staying in business
iii. points i & ii make BCM immediately a Governance issue

BCM - planned processes to:
• assess, reduce & manage vulnerability to risk
• plan DC responses in case of adversity
• exercise preparedness for continued IT delivery during disruption
• restore normality and protect DC reputation and goodwill

"The time to repair the roof is when the sun is shining" (J F Kennedy)

Business e-Services & trust

Is she:
• a Logistics Manager
• a Banker
• a PA to the CEO
• a CEO
in:
• an Airport
• a Bourse
• a Telecoms provider
• iGaming
• e-payment
• a Retail chain …
… doing B2B or B2C
… doing e-CRM or e-SCM

e-services need trust

BCM and IT - economic context

Economy 2.0 & Economy 3.0
digital, networked, global, mobile, locational, 24x7x365
1.1 billion Internet users, 2.7 billion mobile phone users
ubiquitous VoIP/Skype, e-mail & calendaring software
e-Business, e-Banking, e-Payment, e-Government
e-SCM, e-CRM, B2B, B2C - across supply chains, geographies, time-zones, economies & jurisdictions

BCM in DCs is mandated by economic logic

BCM and IT - what not to do

[Images: human error (IT management); human error (technical)]

DC BCM terminology: Threats, Vulnerabilities, Risk, Assets

Threat: the intent and capacity to cause loss or disruption and create adverse consequences - e.g. to IT services, data, Data Centres, IT Call Centres, etc.
Vulnerability: the susceptibility of a service provider, service, data or infrastructure to damage, impairment or exposure by a threat
Risk: a measure of the potential consequences of a contingency against the likelihood of its occurring (threats + vulnerabilities = risk)
Impact: the consequence/effect of a threat, expressed in terms of reduction of DC capability or loss of business, service, data, etc.
Asset: an IT service element, e.g. software, hardware, IT people, data

Threats - some examples

Some data centre threats:
human error (management/technical, accidental/malicious)
technical failure (mechanical, electrical, hardware, software, etc.)
software failure (even software patches themselves contain bugs)
fire, explosion, smoke
floods (natural, burst pipes)
toxic hazards (effluents, emissions)
structural collapse
crime (vandalism, organized crime, white-collar crime, etc.)
mass cyber-attack, digital terrorism, cyber-war

Established threats may diminish and new threats emerge - alertness

[Image: Oct 2006 - Marsascala]

Threats - general & IT-specific

Explosion, fire, smoke, emissions, effluents in or close to the DC
Malicious IT threat

IT Vulnerabilities - some examples (1)

DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software

Some IT vulnerabilities:
Poor Data Centre site selection or Data Centre design
SPOFs (people, hardware, databases, software)
Poor Data Centre procedures or compliance with procedure (SFIA HR framework for IT, ITIL)
Poor physical access control
Poor software patching

[Image: good physical access control]

IT Vulnerabilities - some examples (2)

Poor software testing, especially Web testing (recall CMMi, ISEB)
Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
Poor fire protection

Known vulnerabilities sometimes diminish & new ones emerge - alertness

[Image: wiring from hell]

Risks - general & IT-specific

Some Data Centre risks:
loss of facilities (buildings, roads leading to building, etc.)
loss of data (data integrity, data availability; recall ISO 27001)
loss of power
loss of water supply (cooling); the opposite - flooding
computer infection outbreaks - viruses, worms
physical access compromised, intrusion with intent
asset seizure
supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
loss of Telecom connectivity (data, voice)
loss of people (people can be SPOFs)

Changing threats, changing vulnerabilities - shifting scenario: alertness

[Images: your IT vendor supply chain; burst pipe; DC flooding]

IT Risks - some examples

[Images: physical DC risks; logical DC risks]

DC outages: business impacts - large & immediate

Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover

Recall: many DCs are in premises that were never designed to be DC-ready

Q: How to protect value-creation capability?
A: BCM

BCM in Data Centres, BCM for Data Centres

360° e-continuity & e-assurance

Business functions of commercial IT providers (e.g. CEO's office, Procurement, Marketing & Sales, HR, Facilities, Planning, QA)
IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
Data
Employees, including IT people
Physical Facilities/Building, Networks, VoIP
Computational and Storage: Servers, SAN/NAS
Software stack: OS, DBMS, middleware, applications
Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail

BCM DC initiative output: a BC Plan + organizational capability

A good BC plan is articulated by components and views.

With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:
We have/are executing plans by DC client/market segment …
We have/are executing plans by Application/by Service …
We have/are executing plans by DC Department …
We have/are executing plans by Building/by Floor …

BC Plan, BC Planning: (i) What, (ii) How, (iii) Who, (iv) When

DC Business Continuity Planning: management choices

Which DC clients/services/applications get restored first, and who will wait longer … recovery sequence for each client
Which services/applications get restored completely; which clients will initially only get limited service

MTPD: maximum tolerable period of disruption; the period of time by which all identified systems/applications/services and data must be restored to normality (for DC, for each client)
RTO: recovery time objective; the period of time within which assets/services must be recovered after disruption (for DC, for each client)
RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for DC, for each client)
RPO: recovery point objective; the point in time to which assets/services must be restored after disruption (for DC, for each client)
RPO affects the volume of data that may need to be restored
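The RTO and RPO definitions above can be checked mechanically for a given incident. The sketch below is a minimal illustration with invented timestamps; it assumes the RPO is expressed as a maximum data-loss window before the disruption, which is a common convention but not something the slide spells out.

```python
from datetime import datetime, timedelta


def meets_objectives(disruption, last_backup, restored, rto, rpo):
    """Check one recovery against its RTO and RPO.

    downtime  = restored - disruption     (compared with the RTO)
    data loss = disruption - last_backup  (compared with the RPO)
    """
    downtime = restored - disruption
    data_loss_window = disruption - last_backup
    return downtime <= rto, data_loss_window <= rpo


# Hypothetical incident: backup 30 minutes old, service back after 5 hours.
disruption = datetime(2009, 10, 13, 9, 0)
rto_met, rpo_met = meets_objectives(
    disruption=disruption,
    last_backup=disruption - timedelta(minutes=30),
    restored=disruption + timedelta(hours=5),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
print(rto_met, rpo_met)
```

Here the RPO is met but the RTO is missed, so the chosen recovery option is too slow for this client even though its backup regime is adequate.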

BCM initiative requires a dedicated Project/Programme

A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:

BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities, approach, awareness and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities.

• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews

BIA answers the questions: which DC activities & assets create most value … of all potential DC losses, which impaired activities & assets would be the greatest loss?

[Image: cascading impacts, multiple DC failure]
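The BIA ranking question can be sketched in a few lines. The asset names and loss figures are hypothetical stand-ins for questionnaire and interview results; in line with the slide, the sketch sums consequences only and deliberately uses no probabilities.

```python
# Hypothetical BIA results: estimated loss per hour of disruption, by category.
bia_estimates = {
    "e-payment gateway": {"lost_revenue": 5000, "recovery_cost": 800, "reputational": 2000},
    "client e-mail hosting": {"lost_revenue": 600, "recovery_cost": 200, "reputational": 400},
    "internal intranet": {"lost_revenue": 0, "recovery_cost": 100, "reputational": 50},
}


def rank_by_loss(estimates):
    """Rank activities/assets by total estimated loss, greatest first."""
    totals = {name: sum(parts.values()) for name, parts in estimates.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_loss(bia_estimates)
for name, loss in ranking:
    print(f"{name}: {loss}")
```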

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square: unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance

Contingency Risks to DC - top-left square: very rare events that will extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance

High-incidence, low-impact risks & minor risks: order of the day - operational matters
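Placing a risk in the matrix can be sketched as a four-way classification. This assumes likelihood runs along the horizontal axis and impact along the vertical axis (consistent with "top-left" meaning rare but severe), scores both from 0 to 1, and uses an arbitrary 0.5 threshold; none of these numbers come from the talk.

```python
def classify_risk(likelihood, impact, threshold=0.5):
    """Place a risk in one of the four squares of the matrix.

    likelihood and impact are scores between 0 and 1; the threshold is arbitrary.
    """
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"        # top-right square: unacceptable
    if high_impact:
        return "contingency"  # top-left square: rare but severe
    if high_likelihood:
        return "high-incidence, low-impact"
    return "minor"


print(classify_risk(0.8, 0.9))
print(classify_risk(0.05, 0.95))
print(classify_risk(0.7, 0.2))
print(classify_risk(0.1, 0.1))
```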

Risk Assessment Phase

RA considers the what/why/where/who that can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets.

• RA takes a primarily external focus, a world/political/economic/IT sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC and plan to protect value-creating capability

BC Planning & Design Phase - business recovery

DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery sequence:

1. Set MTPD objective for the DC/for each client
2. Assign RTOs for individual client/services/applications/data on the basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval, (ii) budget, (iii) tasking individuals

"Failure to prepare is preparing to fail" - John Wooden
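Step 4, the cost/benefit analysis, can be sketched with one formula: loss avoided by an option minus the cost of that option. The figures are hypothetical, and the sketch assumes exactly one disruption in the planning period; a fuller analysis would weight the loss by the likelihood estimated in the Risk Assessment phase.

```python
def net_benefit(option_cost, loss_per_hour, downtime_hours_without, downtime_hours_with):
    """Loss avoided by a recovery option minus what the option costs."""
    loss_avoided = loss_per_hour * (downtime_hours_without - downtime_hours_with)
    return loss_avoided - option_cost


# Hypothetical figures for one client service over one planning period.
options = {
    "hot site": net_benefit(60_000, 2_000, 72, 4),
    "warm site": net_benefit(25_000, 2_000, 72, 48),
}
best = max(options, key=options.get)
print(options, best)
```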

Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision - On-shore, Near-shore, Offshore fallback/redundancy options



A typical recovery sequence

1 Initial emergency response

2 Resume mission-critical ApplicationsServices

3 Resume non-critical ApplicationsServices

4 Restoration to primary site full services verify stability

Ongoing communication - DC employees clients regulators

BC Planning amp Design Phase - business recovery (2)

robertmcachia at umedumt

BC Planning amp Design - technical recovery

Note business decision On-shore Near-shore Offshore

Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal

agreements (iv) outsourced a business issue

Note high-end BC for DCs active-active hellip hot failover hellip

Note entry-point BC for DCs electronic vaulting remote journaling

transaction logging

Mirror full redundancy short-

distancewidely-separated Data

Centers latency considerations

Hot fully equipped fallback Data Centre

Warm fallback Data Center missing key

components

Cold empty fallback Data Center

DC Recovery options by increasing MTPD RTO amp decreasing cost

mirroring

robertmcachia at umedumt

Phases Exercise amp Testing Maintain amp Review Awareness amp Training

Maintain amp Review Phase

feedback from testingaudit broadendeepen DC BC plan

Awareness amp Training Phase

its never enough institutionalize BC practice embed into

DC culture

The proof of the bdquopudding‟

is in the eating

Exercise amp Testing Phase

Give yourself an intentional DC disaster

1 Re-test BCP regularly

(walk-through checklist role-

playingsimulation full interruption amp

rehearsal)

2 Test alternative risk (threat +

vulnerability) scenarios

robertmcachia at umedumt

An example disaster mass cyber-attack

A mass cyber-attack may

involve

prior sniffing hellip

hellip social engineering amp

widespread attack

An attack may be blended

denial-of-service andor

widespread virusworm

infection andor

penetration (data theft

data corruption) andor

fraudblackmail

IncidentProblem Management

(recall ISO 20000 ITIL)source M Benyoucef

University of Ottawa

robertmcachia at umedumt

An example DC disaster BCM response to cyber-attack

This sequence applies to DoSDDoS infection amp penetration cyber-

attacks

Note After the BCM response seek prevention (i) Lessons Learnt (ii)

improve DC processsystems resilience (iii) heighten DC emergency

preparedness

1 Interdict actions to block attack halt its progress

2 Contain actions to prevent further degradation

amp regain control

3 Recover repair fix damage bring all systems

data applicationsservices to normality

4 Analyse investigationaudit - examination of

evidence

When prevention fails and an attack is under way the BCM

response is

Interdict Contain

Recover Analyse

robertmcachia at umedumt

Structure & Content of BS25999

A DC may go beyond organising a BCP project.
It may decide to formalise its BCM assurance process by embracing the BS25999 standard.

BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture


Some takeaways (1)

The Swiss cheese model of business resilience

BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned


Some takeaways (2)

BCM is a concern of:
CIOs, CTOs, Data Centre Managers
Quality Managers, Infosec Managers
IT Architects, DBAs
IT Auditors
CEOs, CFOs, because it's Governance

BCM addresses broad, non-specific risks: it is a "catch all"
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast; broaden/deepen BCM scope incrementally

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick


Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org


Questions and Discussion


Thank You

Dr Robert M Cachia

http://www.linkedin.com/in/robertmcachia

robertmcachia "at" um.edu.mt

Business e-Services & trust

Is she
• a Logistics Manager
• a Banker
• a PA to the CEO
• a CEO
in
• an Airport
• a Bourse
• a Telecoms provider
• iGaming
• e-payment
• a Retail chain …
… doing B2B or B2C
… doing e-CRM or e-SCM

e-services need trust


BCM and IT - economic context

Economy 2.0 & Economy 3.0
digital, networked, global, mobile, locational, 24x7x365
1.1 billion Internet users, 2.7 billion mobile phone users
ubiquitous VoIP/Skype, e-mail & calendaring software
e-Business, e-Banking, e-Payment, e-Government
e-SCM, e-CRM, B2B, B2C - across supply chains, geographies, time-zones, economies & jurisdictions

BCM in DCs is mandated by economic logic


BCM and IT - what not to do

Human error: IT management. Human error: technical.


DC BCM terminology: Threats, Vulnerabilities, Risk, Assets

Threat: the intent and capacity to cause loss or disruption and create adverse consequences - e.g. to IT services, data, Data Centres, IT Call Centres, etc.
Vulnerability: the susceptibility of a service provider, service, data or infrastructure to damage, impairment or exposure by a threat
Risk: a measure of the potential consequences of a contingency against the likelihood of its occurring
threats + vulnerabilities = risk
Impact: the consequence/effect of a threat, expressed in terms of reduction of DC capability or loss of business, service, data, etc.
Asset: an IT service element, e.g. software, hardware, IT people, data


Threats - some examples

Some data centre threats:
human error (management/technical, accidental/malicious)
technical failure (mechanical, electrical, hardware, software, etc.)
software failure (even software patches themselves contain bugs)
fire, explosion, smoke
floods (natural, burst pipes)
toxic hazards (effluents, emissions)
structural collapse
crime (vandalism, organized crime, white-collar crime, etc.)
mass cyber-attack, digital terrorism, cyber-war

Established threats may diminish and new threats emerge - alertness

Oct 2006 - Marsascala


Threats - general & IT-specific

Explosion, fire, smoke, emissions, effluents in or close to DC
Malicious IT threat


IT Vulnerabilities - some examples (1)

DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software

Some IT vulnerabilities:
Poor Data Centre site selection or Data Centre design
SPOFs, i.e. single points of failure (people, hardware, databases, software)
Poor Data Centre procedures or compliance with procedure (SFIA HR framework for IT, ITIL)
Poor physical access control
Poor software patching

good physical access control


IT Vulnerabilities - some examples (2)

Poor software testing, especially Web testing (recall CMMi, ISEB)
Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
Poor fire protection

Known vulnerabilities sometimes diminish & new ones emerge - alertness

wiring from hell


Risks - general & IT-specific

Some Data Centre risks:
loss of facilities (buildings, roads leading to building, etc.)
loss of data (data integrity, data availability; recall ISO 27001)
loss of power
loss of water supply (cooling); the opposite - flooding
computer infection outbreaks - viruses, worms
physical access compromised; intrusion with intent
asset seizure
supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
loss of Telecom connectivity (data, voice)
loss of people (people can be SPOFs)

Changing threats, changing vulnerabilities - shifting scenario, alertness

Your IT vendor supply chain
Burst pipe
DC flooding


IT Risks - some examples

Physical DC Risks; Logical DC Risks

DC outages: business impacts - large & immediate

Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover

Recall: many DCs are in premises that were never designed to be DC-ready

Q: How to protect value-creation capability?
A: BCM


BCM in Data Centres, BCM for Data Centres

360° e-continuity & e-assurance

Business functions of commercial IT providers (e.g. CEO's office, Procurement, Marketing & Sales, HR, Facilities, Planning, QA)
IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
Data
Employees, including IT people
Physical Facilities/Building, Networks, VoIP
Computational and Storage: Servers, SAN/NAS
Software stack: OS, DBMS, middleware, applications
Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail

BCM for data centres: 360°


BCM DC initiative output: a BC Plan + organizational capability

A good BC plan is articulated by components and views.
With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:
We have/are executing plans by DC client/market segment …
We have/are executing plans by Application/by Service …
We have/are executing plans by DC Department …
We have/are executing plans by Building/by Floor …

BC Plan, BC Planning: (i) What (ii) How (iii) Who (iv) When


DC Business Continuity Planning: management choices

Which DC clients/services/applications get restored first, and who will wait longer … recovery sequence for each client
Which services/applications get restored completely; which clients will initially only get limited service

MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for the DC, for each client)
RTO (recovery time objective): the period of time within which assets/services must be recovered after disruption (for the DC, for each client)
RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for the DC, for each client)
RPO (recovery point objective): the point in time to which assets/services must be restored after disruption (for the DC, for each client)
RPO affects the volume of data that may need to be restored
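
To make the RTO and RPO definitions above concrete, the following is a minimal sketch that checks one service against its objectives. The `Service` record, the backup interval and the measured restore time are illustrative assumptions, not figures from the presentation; the only rule taken from the definitions is that recovery must fit within the RTO and that worst-case data loss (here assumed to equal the gap between two backups) must fit within the RPO.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Service:
    """Hypothetical per-client service record (all fields are illustrative)."""
    name: str
    rto: timedelta                  # recovery time objective
    rpo: timedelta                  # recovery point objective
    backup_interval: timedelta      # time between successive backups/replications
    tested_restore_time: timedelta  # restore duration measured in the last exercise


def meets_rto(svc: Service) -> bool:
    # The service is within its RTO if the tested restore fits inside the objective.
    return svc.tested_restore_time <= svc.rto


def meets_rpo(svc: Service) -> bool:
    # Assumption: worst-case data loss equals the gap between two backups.
    return svc.backup_interval <= svc.rpo


billing = Service(
    name="e-payment gateway",
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
    backup_interval=timedelta(hours=1),
    tested_restore_time=timedelta(hours=3),
)

print(meets_rto(billing))  # True: a 3 h restore fits a 4 h RTO
print(meets_rpo(billing))  # False: hourly backups cannot meet a 15 min RPO
```

In this made-up case the service would need more frequent replication (or a relaxed RPO agreed with the client) before the plan could be called adequate.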


BCM initiative requires a dedicated Project/Programme

A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:

BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities, approach, awareness, and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase


Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities

• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews

BIA answers the questions: which DC activities & assets create most value … of all potential DC losses, which impaired activities & assets would be the greatest loss?

cascading impacts
multiple DC failure
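
As a rough illustration of how a BIA might rank assets by consequence alone (no probabilities), the sketch below multiplies an assumed loss per hour by an assumed outage duration and sorts the result. The asset names and figures are invented for the example; a real BIA would draw them from questionnaires and interviews.

```python
def rank_by_loss(hourly_loss: dict[str, float], outage_hours: float) -> list[tuple[str, float]]:
    """Return (asset, estimated loss) pairs, largest loss first.

    `hourly_loss` maps an asset or service name to an assumed loss per hour
    of outage; likelihood is deliberately left out, as in a BIA.
    """
    losses = {name: per_hour * outage_hours for name, per_hour in hourly_loss.items()}
    return sorted(losses.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures, in an arbitrary currency unit per hour.
hourly_loss = {"e-mail": 500.0, "e-payment": 12000.0, "intranet": 200.0}
ranking = rank_by_loss(hourly_loss, outage_hours=8)

for asset, loss in ranking:
    print(f"{asset}: {loss:.0f}")
```

The top of the ranking is where recovery priority and tighter RTOs would normally be considered first.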


The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square: unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance

Contingency Risks to DC - top-left square: very rare events that will extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance

High incidence and low impact risks & minor risks: order of the day - operational matters
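
The squares of the risk matrix can be sketched as a small classifier. This is only an illustration: the 0 to 1 scoring scale, the 0.5 threshold and the label strings are assumptions made for the example, not part of the presentation.

```python
def classify(likelihood: float, impact: float, threshold: float = 0.5) -> str:
    """Place a risk in one square of a likelihood-vs-impact matrix.

    Both scores are assumed to be normalised to the range [0, 1].
    """
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"           # top-right: unacceptable, needs BCM and ownership
    if high_impact:
        return "contingency"     # top-left: rare but business-ending, needs BCM
    if high_likelihood:
        return "high-incidence"  # frequent, low impact: operational matter
    return "minor"               # rare, low impact: operational matter


print(classify(0.9, 0.9))  # major
print(classify(0.1, 0.9))  # contingency
print(classify(0.9, 0.1))  # high-incidence
print(classify(0.1, 0.1))  # minor
```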


Risk Assessment Phase

RA considers the what/why/where/who that can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets.

• RA takes a primarily external focus, a world/political/economic, IT sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC and plan to protect value-creating capability


BC Planning & Design Phase - business recovery

DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery sequence:

1. Set MTPD objective for the DC/for each client
2. Assign RTOs for individual client/services/applications/data on basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval (ii) budget (iii) tasking individuals

"Failure to prepare is preparing to fail" - John Wooden


BC Planning & Design Phase - business recovery (2)

Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision - On-shore, Near-shore, Offshore fallback/redundancy options

A typical recovery sequence:
1. Initial emergency response
2. Resume mission-critical Applications/Services
3. Resume non-critical Applications/Services
4. Restoration to primary site: full services, verify stability

Ongoing communication - DC employees, clients, regulators


BC Planning & Design - technical recovery

DC recovery options, by increasing MTPD, RTO & decreasing cost:
Mirror: full redundancy; short-distance/widely-separated Data Centres; latency considerations
Hot: fully equipped fallback Data Centre
Warm: fallback Data Centre missing key components
Cold: empty fallback Data Centre

Note: business decision: On-shore, Near-shore, Offshore
Note: Mirror, Hot, Warm, Cold: (i) own (ii) shared-space (iii) reciprocal agreements (iv) outsourced - a business issue
Note: high-end BC for DCs: active-active … hot failover …
Note: entry-point BC for DCs: electronic vaulting, remote journaling, transaction logging

mirroring
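
The trade-off between recovery time and cost across the mirror, hot, warm and cold options can be sketched as a simple selection rule: pick the cheapest option whose recovery time still fits the RTO. The recovery times and relative costs below are placeholder assumptions chosen only to preserve the ordering described above; real values depend on the site and the contracts involved.

```python
from datetime import timedelta
from typing import Optional

# (option, assumed time to resume service, assumed relative yearly cost)
OPTIONS = [
    ("mirror", timedelta(minutes=5), 100),
    ("hot", timedelta(hours=4), 60),
    ("warm", timedelta(days=2), 25),
    ("cold", timedelta(days=14), 10),
]


def cheapest_option(rto: timedelta) -> Optional[str]:
    """Return the cheapest option whose recovery time fits within the RTO, or None."""
    feasible = [(cost, name) for name, recovery, cost in OPTIONS if recovery <= rto]
    return min(feasible)[1] if feasible else None


print(cheapest_option(timedelta(hours=8)))    # hot
print(cheapest_option(timedelta(days=30)))    # cold
print(cheapest_option(timedelta(minutes=1)))  # None: no listed option is fast enough
```

A shorter RTO shrinks the feasible set toward the expensive end, which is the cost/benefit tension the planning phase has to resolve.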





BCM and IT - what not to do

human error IT management human error technical

robertmcachia at umedumt

DC BCM terminology: Threats, Vulnerabilities, Risk, Assets

Threat: the intent and capacity to cause loss or disruption and create adverse consequences, e.g. to IT services, data, Data Centers, IT Call Centers, etc.

Vulnerability: the susceptibility of a service provider, service, data or infrastructure to damage, impairment or exposure by a threat

Risk: a measure of the potential consequences of a contingency against the likelihood of its occurring

threats + vulnerabilities = risk

Impact: the consequence/effect of a threat, expressed in terms of reduction of DC capability or loss of business, service, data, etc.

Asset: an IT service element, e.g. software, hardware, IT people, data
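The definition of risk as consequences weighed against likelihood can be illustrated with a small risk register. This is only a sketch: the `Risk` class, the `exposure` measure (likelihood multiplied by impact) and all figures are assumptions made for illustration, not part of the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One entry in a hypothetical DC risk register."""
    name: str
    likelihood: float   # assumed annual probability of occurrence, 0.0 to 1.0
    impact: float       # assumed loss if it occurs, in any consistent currency unit

    def exposure(self) -> float:
        # Consequence weighed against likelihood: expected annual loss.
        return self.likelihood * self.impact


def rank_by_exposure(risks):
    """Return risks sorted from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure(), reverse=True)


# Illustrative figures only; they are not taken from the presentation.
register = [
    Risk("loss of power", likelihood=0.5, impact=200_000),
    Risk("DC flooding", likelihood=0.02, impact=4_000_000),
    Risk("loss of telecom connectivity", likelihood=0.25, impact=120_000),
]

for risk in rank_by_exposure(register):
    print(f"{risk.name}: {risk.exposure():,.0f}")
```

A single expected-loss figure hides rare but ruinous events, so a ranking like this is best read alongside the likelihood and impact values themselves.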

Threats - some examples

Some data centre threats:
human error (management/technical, accidental/malicious)
technical failure (mechanical, electrical, hardware, software, etc.)
software failure (even software patches themselves contain bugs)
fire, explosion, smoke
floods (natural, burst pipes)
toxic hazards (effluents, emissions)
structural collapse
crime (vandalism, organized crime, white-collar crime, etc.)
mass cyber-attack, digital terrorism, cyber-war

Established threats may diminish and new threats emerge - alertness

[Photo: Oct 2006 - Marsascala]

Threats - general & IT-specific

Explosion, fire, smoke, emissions, effluents in or close to DC
Malicious IT threat

IT Vulnerabilities - some examples (1)

DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software

Some IT vulnerabilities:
Poor Data Centre site selection or Data Centre design
SPOFs (people, hardware, databases, software)
Poor Data Centre procedures or compliance with procedure (SFIA HR framework for IT, ITIL)
Poor physical access control
Poor software patching

[Photo: good physical access control]

IT Vulnerabilities - some examples (2)

Poor software testing, especially Web testing (recall CMMi, ISEB)
Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
Poor fire protection

Known vulnerabilities sometimes diminish & new ones emerge - alertness

[Photo: wiring from hell]

Risks - general & IT-specific

Some Data Centre risks:
supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
loss of Telecom connectivity (data, voice)
loss of people (people can be SPOFs)
loss of facilities (buildings, roads leading to building, etc.)
loss of data (data integrity, data availability; recall ISO 27001)
loss of power
loss of water supply (cooling); the opposite - flooding
computer infection outbreaks - viruses, worms
physical access compromised, intrusion with intent
asset seizure

Changing threats, changing vulnerabilities - shifting scenario, alertness

[Images: your IT vendor supply chain; burst pipe; DC flooding]

IT Risks - some examples

Physical DC Risks; Logical DC Risks

DC outages: business impacts - large & immediate

Impact of DC outages
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover

Recall: many DCs are in premises that were never designed to be DC-ready

Q: How to protect value-creation capability?
A: BCM

BCM in Data Centres, BCM for Data Centres

360° e-continuity & e-assurance

Business functions of commercial IT providers (e.g. CEO's office, Procurement, Marketing & Sales, HR, Facilities Planning, QA)
IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
Data
Employees, including IT people
Physical Facilities: Building, Networks, VoIP
Computational and Storage: Servers, SAN/NAS
Software stack: OS, DBMS, middleware, applications
Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail

[Diagram: BCM for data centres 360°]

BCM DC initiative output: a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:
We have/are executing plans by DC client/market segment …
We have/are executing plans by Application/by Service …
We have/are executing plans by DC Department …
We have/are executing plans by Building/by Floor …

BC Plan, BC Planning: (i) What (ii) How (iii) Who (iv) When

DC Business Continuity Planning: management choices

Which DC clients/services/applications get restored first, and who will wait longer … recovery sequence for each client

Which services/applications get restored completely; which clients will initially only get limited service

MTPD: maximum tolerable period of disruption; the period of time by which all identified systems/applications/services and data must be restored to normality (for DC, for each client)

RTO: recovery time objective; the period of time within which assets/services must be recovered after disruption (for DC, for each client)

RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for DC, for each client)

RPO: recovery point objective; the point in time to which assets/services must be restored after disruption (for DC, for each client)

RPO affects the volume of data that may need to be restored
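The RTO and RPO definitions can be checked against an incident timeline. The sketch below treats the RPO as a maximum tolerated age of the last recoverable data copy, which is a common reading but an assumption here; the function names, timestamps and targets are all hypothetical.

```python
from datetime import datetime, timedelta


def meets_rto(disruption_start, service_restored, rto):
    """True if the service came back within the RTO window."""
    return service_restored - disruption_start <= rto


def meets_rpo(disruption_start, last_recoverable_copy, rpo):
    """True if the newest recoverable data is no older than the RPO allows."""
    return disruption_start - last_recoverable_copy <= rpo


# Hypothetical incident timeline for one client service.
outage = datetime(2009, 10, 13, 9, 0)
last_backup = datetime(2009, 10, 13, 8, 30)
restored = datetime(2009, 10, 13, 13, 30)

rto = timedelta(hours=4)   # assumed target
rpo = timedelta(hours=1)   # assumed target

print("RTO met:", meets_rto(outage, restored, rto))
print("RPO met:", meets_rpo(outage, last_backup, rpo))
```

In this made-up timeline the service is back after four and a half hours, so the four-hour RTO is missed, while the half-hour-old backup satisfies the one-hour RPO.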

BCM initiative requires a dedicated Project/Programme

A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:

BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities

• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews

BIA answers the questions: which DC activities & assets create most value? … of all potential DC losses, which impaired activities & assets would be the greatest loss?

[Image: cascading impacts, multiple DC failure]

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square: unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance

Contingency Risks to DC - top-left square: very rare events will extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance

High incidence and low impact risks & minor risks: order of the day - operational matters
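The squares of the risk matrix can be expressed as a simple classification. This is a minimal sketch assuming likelihood and impact have been normalised to scores between 0 and 1 and that a single threshold splits low from high; the function name, the threshold value and the label "minor" for the remaining square are illustrative assumptions.

```python
def classify(likelihood, impact, threshold=0.5):
    """Place a risk in one of four squares of a 2x2 likelihood/impact matrix.

    Both inputs are assumed to be normalised scores between 0 and 1.
    """
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"          # top-right: unacceptable
    if high_impact:
        return "contingency"    # top-left: rare but severe
    if high_likelihood:
        return "operational"    # frequent, low impact
    return "minor"


print(classify(0.05, 0.95))
```

The call above places a rare but severe event in the contingency square, the one the slide warns management not to neglect.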

Risk Assessment Phase

RA considers the what/why/where/who can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets

• RA takes a primarily external focus, a world/political/economic, IT sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC, and plan to protect value-creating capability

BC Planning & Design Phase - business recovery

DC recovery options: driven by business requirements, regulation, contracts & SLAs

A typical recovery sequence:
1. Set MTPD objective for the DC/for each client
2. Assign RTOs for individual client/services/applications/data on basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval (ii) budget (iii) tasking individuals

"Failure to prepare is preparing to fail" - John Wooden

BC Planning & Design Phase - business recovery (2)

Some business recovery options for the DC
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision - On-shore, Near-shore, Offshore fallback/redundancy options

A typical recovery sequence:
1. Initial emergency response
2. Resume mission-critical Applications/Services
3. Resume non-critical Applications/Services
4. Restoration to primary site: full services, verify stability

Ongoing communication - DC employees, clients, regulators
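The rule of resuming mission-critical services before non-critical ones can be sketched as a sort. The `Service` class, the tie-break on shortest RTO and the sample services are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """A hypothetical application or service hosted in the DC."""
    name: str
    mission_critical: bool
    rto_hours: float   # assumed per-service RTO


def recovery_order(services):
    """Mission-critical services first; within each group, shortest RTO first."""
    return sorted(services, key=lambda s: (not s.mission_critical, s.rto_hours))


# Illustrative portfolio; names and RTOs are made up.
portfolio = [
    Service("intranet", mission_critical=False, rto_hours=72),
    Service("e-CRM", mission_critical=True, rto_hours=8),
    Service("e-mail", mission_critical=False, rto_hours=24),
    Service("B2C storefront", mission_critical=True, rto_hours=2),
]

for service in recovery_order(portfolio):
    print(service.name)
```

In practice the order would also reflect contractual SLAs and dependencies between services, which this sketch ignores.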

BC Planning & Design - technical recovery

Note: business decision: On-shore, Near-shore, Offshore
Note: Mirror, Hot, Warm, Cold: (i) own (ii) shared-space (iii) reciprocal agreements (iv) outsourced; a business issue
Note: high-end BC for DCs: active-active … hot failover …
Note: entry-point BC for DCs: electronic vaulting, remote journaling, transaction logging

DC recovery options, by increasing MTPD, RTO & decreasing cost:
Mirror: full redundancy; short-distance/widely-separated Data Centers; latency considerations
Hot: fully equipped fallback Data Centre
Warm: fallback Data Center missing key components
Cold: empty fallback Data Center

[Image: mirroring]
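The trade-off between recovery time and cost across Mirror, Hot, Warm and Cold sites can be sketched as picking the cheapest option that still meets a required RTO. The recovery times and annual costs below are invented placeholders, and `SiteOption` and `cheapest_option` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteOption:
    name: str
    achievable_rto_hours: float   # assumed typical time to resume service
    annual_cost: float            # assumed yearly cost, arbitrary currency units


# Ordered as on the slide: recovery time goes up as cost goes down.
OPTIONS = [
    SiteOption("mirror", 0.1, 1_000_000),
    SiteOption("hot", 4, 400_000),
    SiteOption("warm", 48, 150_000),
    SiteOption("cold", 240, 40_000),
]


def cheapest_option(required_rto_hours, options=OPTIONS) -> Optional[SiteOption]:
    """Return the lowest-cost option whose recovery time fits the RTO, if any."""
    feasible = [o for o in options if o.achievable_rto_hours <= required_rto_hours]
    return min(feasible, key=lambda o: o.annual_cost) if feasible else None


choice = cheapest_option(8)
print(choice.name if choice else "no option meets this RTO")
```

A shorter required RTO narrows the feasible set towards the expensive end, which is the point the slide makes about RTO driving the recovery option.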

Phases: Exercise & Testing, Maintain & Review, Awareness & Training

Exercise & Testing Phase
Give yourself an intentional DC disaster
1. Re-test BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios

Maintain & Review Phase
feedback from testing/audit; broaden/deepen DC BC plan

Awareness & Training Phase
it's never enough; institutionalize BC practice, embed into DC culture

The proof of the "pudding" is in the eating

An example disaster: mass cyber-attack

A mass cyber-attack may involve prior sniffing … social engineering & widespread attack

An attack may be blended: denial-of-service and/or widespread virus/worm infection and/or penetration (data theft, data corruption) and/or fraud/blackmail

Incident/Problem Management (recall ISO 20000, ITIL)

[Diagram source: M Benyoucef, University of Ottawa]

An example DC disaster: BCM response to cyber-attack

When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse

1. Interdict: actions to block attack, halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data, applications/services to normality
4. Analyse: investigation/audit - examination of evidence

This sequence applies to DoS/DDoS, infection & penetration cyber-attacks

Note: After the BCM response, seek prevention: (i) Lessons Learnt (ii) improve DC process/systems resilience (iii) heighten DC emergency preparedness
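The four response steps run strictly in order, which can be sketched as an enumeration with a small driver. The `Phase` enum follows the slide; `next_phase`, `run_response` and the handler callables are hypothetical helpers, not an established incident-response API.

```python
from enum import Enum


class Phase(Enum):
    INTERDICT = 1   # block the attack, halt its progress
    CONTAIN = 2     # prevent further degradation, regain control
    RECOVER = 3     # repair damage, restore systems and data
    ANALYSE = 4     # investigate and examine evidence


def next_phase(current):
    """Return the phase that follows `current`, or None after the last one."""
    phases = list(Phase)
    index = phases.index(current)
    return phases[index + 1] if index + 1 < len(phases) else None


def run_response(handlers):
    """Call one handler per phase, strictly in order, and log what ran."""
    log = []
    phase = Phase.INTERDICT
    while phase is not None:
        handlers[phase]()          # hypothetical callable supplied by the DC team
        log.append(phase.name)
        phase = next_phase(phase)
    return log


# Placeholder handlers that do nothing; real ones would trigger DC procedures.
print(run_response({phase: (lambda: None) for phase in Phase}))
```

Keeping the order in one place makes it harder for a rushed team to start recovery before the attack has been contained.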

Structure & Content of BS25999

A DC may go beyond organising a BCP project. It may decide to formalise its BCM assurance process by embracing the BS25999 standard

BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture

Some takeaways (1)

BCM thinking & action
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned

The Swiss cheese model of business resilience
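The Swiss cheese idea can be made numeric under a strong simplifying assumption: if each defensive layer fails independently with some probability, a disruption gets through only when every layer fails at once. The function name and the probabilities are illustrative, and real controls are rarely fully independent.

```python
from math import prod


def breach_probability(layer_failure_probs):
    """Probability that every defensive layer fails at once.

    Assumes the layers fail independently, which real controls rarely do exactly.
    """
    return prod(layer_failure_probs)


three_layers = breach_probability([0.1, 0.1, 0.1])
four_layers = breach_probability([0.1, 0.1, 0.1, 0.1])      # one more layer
smaller_holes = breach_probability([0.05, 0.05, 0.05])      # smaller holes per layer

print(three_layers, four_layers, smaller_holes)
```

Under this assumption, adding a layer or shrinking the holes in each layer both lower the chance that the holes line up.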

Some takeaways (2)

BCM is a concern of:
CIOs, CTOs, Data Centre Managers
Quality Managers, Infosec Managers
IT Architects, DBAs
IT Auditors
CEOs, CFOs, because it's Governance

BCM addresses broad, non-specific risks: it is a "catch all"
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org

Questions and Discussion

Thank You

Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt


Threats - some examples

Some data centre threats

human error (managementtechnical accidentalmalicious)

technical failure (mechanical electrical hardware software

etc)

software failure (even software patches themselves contain

bugs)

fire explosion smoke

floods (natural burst pipes)

toxic hazards (effluents emissions)

structural collapse

crime (vandalism organized crime white-collar crime etc)

mass cyber-attack digital terrorism cyber-war

Established threats may diminish and new threats emerge - alertness

Oct 2006 - Marsascala

robertmcachia at umedumt

Threats - general amp IT-specific

-

Explosion fire smoke emissions effluents

in or close to DC

malicious IT Threat

robertmcachia at umedumt

IT Vulnerabilities - some examples (1)

DC vulnerabilities may be technical or non-technical

eg in organizational design in business processes systems or software

Some IT vulnerabilities

Poor Data Centre site selection or Data Centre design

SPOFs (people hardware databases software)

Poor Data Centre procedures or compliance to

procedure (SFIA HR framework for IT ITIL)

Poor physical access control

Poor software patchinggood physical access

control

robertmcachia at umedumt

IT Vulnerabilities - some examples (2)

Poor software testing especially Web

testing (recall CMMi ISEB)

Poor Configuration ManagementVersion

Control (recall ISO 20000 ITIL)

Poor Infosec (passwords biometrics

encryption anti-virusmalware)

(recall ISO 27001 CobIT RiskIT)

Poor Capacity ManagementAvailability

Management (recall ISO 20000 ITIL)

Poor procurementsupplier management

(ISO 9000 ITIL BS25999)

Poor fire protection

Known vulnerabilities sometimes diminish amp new ones emerge - alertness

wiring from hell

robertmcachia at umedumt

Risks - general amp IT-specific

loss of facilities (buildings roads leading to building etc)

loss of data (data integrity data availability recall ISO 27001)

loss of power

loss of water supply (cooling) the opposite - flooding

computer infection outbreaks - viruses worms

physical access compromised intrusion with intent

asset seizure

Changing threats changing vulnerabilities - shifting scenario alertness

Some Data Centres risks

supply chain disruption (software vendor collapse

hardware vendor collapse ISP collapse)

loss of Telecom connectivity (data voice)

loss of people (people can be SPOFs)Your IT vendor supply

chain

Burst pipe

DC flooding

robertmcachia at umedumt

IT Risks - some examples

Physical DC Risks Logical DC Risks

DC outages business impacts - large amp immediate

Impact of DC outages

bull down-time

bull breached SLAs

bull lost revenue

bull reputational damage

clients and prospects lost

forever

bull cost to recover

bull legal liabilities

bull some DCs never recover

Recall many DCs are in

premises that were never

designed to be DC-ready

Q How to protect value-creation capability

A BCM

robertmcachia at umedumt

BCM in Data Centres BCM for Data Centres

360 e-continuity amp e-assurance

business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities

Planning QA)

IT processes eg IncidentProblem ConfigurationChange

AvailabilityCapacity

Data

Employees including IT people

Physical FacilitiesBuilding Networks VoIP

Computational and Storage

Servers SANNAS

Software stackOS DBMS middleware applications

ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail

BCM for data centres 360

robertmcachia at umedumt

BCM DC initiative output a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan the CIOCTOData Centre Manager can tell the

CEO

We haveare executing plans by DC clientmarket segment hellip

We haveare executing plans by Applicationby Service hellip

We haveare executing plans by DC Departmenthellip

We haveare executing plans by Buildingby Floor hellip

BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When

robertmcachia at umedumt

DC Business Continuity Planning: management choices

• Which DC clients/services/applications get restored first, and who will wait longer … the recovery sequence for each client

• Which services/applications get restored completely, and which clients will initially only get limited service

• MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for the DC, for each client)

• RTO (recovery time objective): the period of time within which assets/services must be recovered after disruption (for the DC, for each client). The RTO affects the recovery option: shorter RTO -> more difficult, more expensive.

• RPO (recovery point objective): the point in time to which assets/services must be restored after disruption (for the DC, for each client). The RPO affects the volume of data that may need to be restored.
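The three objectives can be made concrete with a small check. The sketch below is illustrative only: the `RecoveryObjectives` class, its field names and the sample figures are assumptions, not part of the slides. It treats the RPO as a maximum data-loss window (a common convention that the slides do not state) and assumes a plan is only consistent when the RTO does not exceed the MTPD.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Hypothetical per-client objectives; names and units are assumptions."""
    mtpd: timedelta  # maximum tolerable period of disruption
    rto: timedelta   # recovery time objective
    rpo: timedelta   # recovery point objective, as a maximum data-loss window

    def is_consistent(self) -> bool:
        # Assumed rule: a planned recovery time beyond the MTPD is not acceptable.
        return self.rto <= self.mtpd


def recovery_met_objectives(obj, disrupted_at, restored_at, last_good_copy_at):
    """Return (rto_met, rpo_met) for one recovery."""
    downtime = restored_at - disrupted_at
    data_loss_window = disrupted_at - last_good_copy_at
    return downtime <= obj.rto, data_loss_window <= obj.rpo


# Placeholder figures for one client.
objectives = RecoveryObjectives(
    mtpd=timedelta(hours=24), rto=timedelta(hours=4), rpo=timedelta(minutes=15)
)
outage = datetime(2009, 10, 13, 9, 0)
result = recovery_met_objectives(
    objectives,
    disrupted_at=outage,
    restored_at=outage + timedelta(hours=3),
    last_good_copy_at=outage - timedelta(minutes=30),
)
print(objectives.is_consistent(), result)
```

In this example the service is back within the RTO, but the last usable copy is 30 minutes old, so the 15-minute RPO is missed. That is the kind of gap that would push a DC towards more frequent replication.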

BCM initiative requires a dedicated Project/Programme

A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:

• BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness, and BC project outcomes/deliverables)

• Business Impact Analysis Phase

• Risk Assessment Phase

• BC Planning & Design Phase

• Exercise & Testing Phase

• Maintenance & Review Phase

• Awareness & Training Phase

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities.

• BIA takes an internal focus - it identifies critical DC value-creating activities and assets

• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)

• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences

• BIA techniques: questionnaires + interviews

BIA answers the questions: which DC activities & assets create most value … and, of all potential DC losses, which impaired activities & assets would be the greatest loss?

(Diagram labels: cascading impacts; multiple DC failure)
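As a rough illustration of how BIA output might be tabulated, the sketch below sums the loss categories named on the slide and ranks activities by total loss. The `ImpactEstimate` class, the activity names and all figures are hypothetical. Putting a money value on reputational damage is itself an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactEstimate:
    """Hypothetical BIA record for one DC activity or asset."""
    activity: str
    lost_revenue: float
    recovery_cost: float
    reputational_cost: float  # assumed to be expressed in money terms

    @property
    def total_loss(self) -> float:
        return self.lost_revenue + self.recovery_cost + self.reputational_cost


def rank_by_loss(estimates):
    """Largest consequence first; probability is deliberately not considered."""
    return sorted(estimates, key=lambda e: e.total_loss, reverse=True)


# Placeholder figures, e.g. gathered from questionnaires and interviews.
estimates = [
    ImpactEstimate("internal e-mail", 10_000, 5_000, 0),
    ImpactEstimate("client-facing SaaS platform", 500_000, 80_000, 200_000),
    ImpactEstimate("billing system", 300_000, 40_000, 50_000),
]
ranked = [e.activity for e in rank_by_loss(estimates)]
print(ranked)
```

The ranking uses consequences only, in line with the slide: likelihood enters later, in the Risk Assessment phase.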

The Risk Matrix - likelihood vs impact

• Major Risks to DC - top-right square (high likelihood, high impact): unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance

• Contingency Risks to DC - top-left square (low likelihood, high impact): very rare events that would extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance

• High-incidence, low-impact risks & minor risks: the order of the day - operational matters
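The squares of the matrix can be sketched as a simple classification. This is a minimal illustration, assuming likelihood and impact are both scored from 0 to 1 and split at an arbitrary threshold of 0.5. The function name, the labels and the sample risks are hypothetical.

```python
def classify_risk(likelihood: float, impact: float, threshold: float = 0.5) -> str:
    """Place a risk in one of four squares of a likelihood/impact matrix.

    Both scores are assumed to be normalised to the range 0..1; the
    threshold of 0.5 is an arbitrary illustrative cut-off.
    """
    if not (0.0 <= likelihood <= 1.0 and 0.0 <= impact <= 1.0):
        raise ValueError("scores must lie between 0 and 1")
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"        # top-right: unacceptable, management ownership
    if high_impact:
        return "contingency"  # top-left: rare, but could end the business
    if high_likelihood:
        return "operational"  # frequent, low impact: the order of the day
    return "minor"


# Placeholder (likelihood, impact) scores for illustration only.
risks = {
    "regional power failure": (0.7, 0.9),
    "building fire": (0.05, 1.0),
    "failed disk": (0.9, 0.1),
    "spare-parts delay": (0.2, 0.2),
}
squares = {name: classify_risk(l, i) for name, (l, i) in risks.items()}
print(squares)
```

Risks that land in the "major" or "contingency" squares are the ones the slide marks "DO BCM".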

Risk Assessment Phase

RA considers the what/why/where/who that can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets.

• RA takes a primarily external focus: a world/political/economic, IT sector focus - it looks at sources and causes

• RA identifies threat scenarios to DC value-creating activities & assets

• RA estimates likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC, and plan to protect value-creating capability

BC Planning & Design Phase - business recovery

DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery sequence:

1. Set the MTPD objective for the DC/for each client

2. Assign RTOs for individual client/services/applications/data on the basis of priority/SLA

3. Evaluate alternative management & technical recovery options

4. Perform cost/benefit analysis for each option (re-plan/iterate)

5. Decide: CEO/CFO (i) approval, (ii) budget, (iii) tasking individuals

"Failure to prepare is preparing to fail" - John Wooden

BC Planning & Design Phase - business recovery (2)

Some business recovery options for the DC:

• an Emergency Operations Office/Centre

• virtual workplace/teleworking from home for selected DC employees

• alternative IT Call Centre/Data Centre

• business decision - On-shore, Near-shore, Offshore fallback/redundancy options

A typical recovery sequence:

1. Initial emergency response

2. Resume mission-critical Applications/Services

3. Resume non-critical Applications/Services

4. Restoration to primary site: full services, verify stability

Ongoing communication - DC employees, clients, regulators
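Steps 2 and 3 of the recovery sequence amount to an ordering rule. The sketch below shows one possible rule: mission-critical services first, then by shortest RTO within each group. The `Service` type, the service names and the RTO values are hypothetical, and a real plan would also weigh SLAs and dependencies between services.

```python
from typing import NamedTuple


class Service(NamedTuple):
    """Hypothetical record for one application/service hosted by the DC."""
    name: str
    mission_critical: bool
    rto_hours: float


def recovery_order(services):
    """Mission-critical services first; within each group, shortest RTO first."""
    return sorted(services, key=lambda s: (not s.mission_critical, s.rto_hours))


# Placeholder services and RTOs for illustration only.
services = [
    Service("intranet", False, 48),
    Service("client ERP", True, 4),
    Service("B2C web shop", True, 1),
    Service("BI reporting", False, 24),
]
order = [s.name for s in recovery_order(services)]
print(order)
```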

BC Planning & Design - technical recovery

DC recovery options, by increasing MTPD/RTO & decreasing cost:

• Mirror: full redundancy; short-distance/widely-separated Data Centres; latency considerations

• Hot: fully equipped fallback Data Centre

• Warm: fallback Data Centre missing key components

• Cold: empty fallback Data Centre

Note: business decision - On-shore, Near-shore, Offshore

Note: Mirror, Hot, Warm, Cold - (i) own, (ii) shared-space, (iii) reciprocal agreements, (iv) outsourced - a business issue

Note: high-end BC for DCs - active-active … hot failover …

Note: entry-point BC for DCs - electronic vaulting, remote journaling, transaction logging

(Diagram label: mirroring)
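Because the four options trade recovery time against cost, choosing one can be sketched as "the cheapest option that still meets the required RTO". The recovery times below are placeholder assumptions, not figures from the slides, and the function name is hypothetical.

```python
from datetime import timedelta

# Ordered as on the slide: recovery time increases, cost decreases.
# The recovery times are placeholder assumptions for illustration only.
SITE_OPTIONS = [
    ("mirror", timedelta(minutes=5)),
    ("hot", timedelta(hours=4)),
    ("warm", timedelta(days=2)),
    ("cold", timedelta(days=14)),
]


def cheapest_site_option(required_rto: timedelta) -> str:
    """Return the last (cheapest) listed option that still meets the RTO."""
    candidates = [name for name, time in SITE_OPTIONS if time <= required_rto]
    if not candidates:
        raise ValueError("no listed option can meet this RTO")
    return candidates[-1]


choice_8h = cheapest_site_option(timedelta(hours=8))
choice_30d = cheapest_site_option(timedelta(days=30))
print(choice_8h, choice_30d)
```

A client with an 8-hour RTO is served by a hot site under these assumptions, while a 30-day RTO can be met by a cold site. Tightening the RTO moves the choice up the list and up in cost.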

Phases: Exercise & Testing, Maintain & Review, Awareness & Training

Exercise & Testing Phase

Give yourself an intentional DC disaster:

1. Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)

2. Test alternative risk (threat + vulnerability) scenarios

"The proof of the pudding is in the eating"

Maintain & Review Phase

Feedback from testing/audit; broaden/deepen the DC BC plan

Awareness & Training Phase

It's never enough: institutionalize BC practice, embed it into DC culture

An example disaster: mass cyber-attack

A mass cyber-attack may involve prior sniffing … social engineering & widespread attack.

An attack may be blended:

• denial-of-service and/or

• widespread virus/worm infection and/or

• penetration (data theft, data corruption) and/or

• fraud/blackmail

Incident/Problem Management (recall ISO 20000, ITIL)

Source: M. Benyoucef, University of Ottawa

An example DC disaster: BCM response to cyber-attack

When prevention fails and an attack is under way, the BCM response is Interdict, Contain, Recover, Analyse:

1. Interdict: actions to block the attack, halt its progress

2. Contain: actions to prevent further degradation & regain control

3. Recover: repair, fix damage, bring all systems, data, applications/services to normality

4. Analyse: investigation/audit - examination of evidence

This sequence applies to DoS/DDoS, infection & penetration cyber-attacks.

Note: After the BCM response, seek prevention: (i) Lessons Learnt, (ii) improve DC process/systems resilience, (iii) heighten DC emergency preparedness

Structure & Content of BS25999

A DC may go beyond organising a BCP project: it may decide to formalise its BCM assurance process by embracing the BS25999 standard.

BS25999 - holistic structure & content:

• overview of business continuity management (BCM)

• business continuity management policy

• BCM programme management

• understanding the organization

• determining business continuity strategy

• developing and implementing a BCM response

• exercising, maintaining and reviewing BCM arrangements

• embedding BCM in the organization's culture

Some takeaways (1)

The Swiss cheese model of business resilience. BCM thinking & action:

• decrease the number of holes, both management & technical

• decrease the size of the holes, and

• keep them moving so they are never aligned

Some takeaways (2)

BCM is a concern of:

• CIOs, CTOs, Data Centre Managers

• Quality Managers, Infosec Managers

• IT Architects, DBAs

• IT Auditors

• CEOs, CFOs, because it's Governance

BCM addresses broad, non-specific risks: it is a "catch all"

BCM is probably the most cost-effective security control of all

BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick

Sources in and around BCM for Data Centres

• The Business Continuity Institute (BCI, UK): http://www.thebci.org

• Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office

• A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf

• Agility & alertness … continuitycentral: httpwwwcontinuitycentralcomukhtm

• BS25999 (British Standards Institution): httpwwwbsigroupcomensectorsandservicesDisciplinesBusiness-Continuity

• RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org

• SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org

• SEI (USA): http://www.sei.cmu.edu

• Uptime Institute (USA): http://www.uptimeinstitute.org

Questions and Discussion

Thank You

Dr Robert M Cachia

http://www.linkedin.com/in/robertmcachia

robertmcachia "at" umedumt

Threats - general & IT-specific

Explosion, fire, smoke, emissions, effluents in or close to the DC

(Diagram label: malicious IT threat)

IT Vulnerabilities - some examples (1)

DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software.

Some IT vulnerabilities:

• Poor Data Centre site selection or Data Centre design

• SPOFs (people, hardware, databases, software)

• Poor Data Centre procedures or compliance with procedure (SFIA HR framework for IT, ITIL)

• Poor physical access control

• Poor software patching

(Image caption: good physical access control)

IT Vulnerabilities - some examples (2)

• Poor software testing, especially Web testing (recall CMMi, ISEB)

• Poor Configuration Management/Version Control (recall ISO 20000, ITIL)

• Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)

• Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)

• Poor procurement/supplier management (ISO 9000, ITIL, BS25999)

• Poor fire protection

Known vulnerabilities sometimes diminish & new ones emerge - alertness

(Image caption: wiring from hell)

Risks - general & IT-specific

Some Data Centre risks:

• loss of facilities (buildings, roads leading to building, etc.)

• loss of data (data integrity, data availability; recall ISO 27001)

• loss of power

• loss of water supply (cooling); the opposite - flooding

• computer infection outbreaks - viruses, worms

• physical access compromised, intrusion with intent

• asset seizure

• supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)

• loss of Telecom connectivity (data, voice)

• loss of people (people can be SPOFs)

Changing threats, changing vulnerabilities - shifting scenario, alertness

(Image captions: your IT vendor supply chain; burst pipe; DC flooding)

IT Risks - some examples

Physical DC Risks; Logical DC Risks

DC outages: business impacts - large & immediate

Impact of DC outages:

• down-time

• breached SLAs

• lost revenue

• reputational damage: clients and prospects lost forever

• cost to recover

• legal liabilities

• some DCs never recover

Recall: many DCs are in premises that were never designed to be DC-ready

Q: How to protect value-creation capability?

A: BCM

BCM in Data Centres BCM for Data Centres

360 e-continuity amp e-assurance

business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities

Planning QA)

IT processes eg IncidentProblem ConfigurationChange

AvailabilityCapacity

Data

Employees including IT people

Physical FacilitiesBuilding Networks VoIP

Computational and Storage

Servers SANNAS

Software stackOS DBMS middleware applications

ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail

BCM for data centres 360

robertmcachia at umedumt

BCM DC initiative output a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan the CIOCTOData Centre Manager can tell the

CEO

We haveare executing plans by DC clientmarket segment hellip

We haveare executing plans by Applicationby Service hellip

We haveare executing plans by DC Departmenthellip

We haveare executing plans by Buildingby Floor hellip

BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When

robertmcachia at umedumt

DC Business Continuity Planning management choices

Which DC clientsservicesapplications get restored first and who will

wait longer hellip recovery sequence for each client

Which servicesapplications get restored completely which clients will

initially only get limited service

MTPD maximum tolerable period of disruption the period of time by

which all identified systemsapplicationsservices and data

must be restored to normality (for DC for each client)

RTO recovery time objective the period in time within which

assetsservices must be recovered after disruption (for DC for

each client)

RTO affects the recovery option shorter RTO -gt more difficult

more expensive (for DC for each client)

RPO recovery point objective the point in time to which

assetsservices must be restored after disruption (for DC for

each client)

RPO affects the volume of data that may need to be restored

robertmcachia at umedumt

BCM initiative requires a dedicated ProjectProgramme

A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases

BC Project Initiation Phase

(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)

Business Impact Analysis Phase

Risk Assessment Phase

BC Planning amp Design Phase

Exercise amp Testing Phase

Maintenance amp Review Phase

Awareness amp Training Phase

robertmcachia at umedumt

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC its users activities

servicesrevenue streams applications data liabilities

bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets

bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)

bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences

bull BIA techniques questionnaires + interviews

BIA answers the questions which DC activities amp

assets create most value hellip of all potential DC losses

which impaired activities amp assets would be the

greatest losscascading impacts

multiple DC failure

robertmcachia at umedumt

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square

unacceptable Eliminate or Postpone

Transfer Monitor amp Mitigate +

management ownership + DO BCM -Governance

Contingency Risks to DC - top-left

square very rare events will

extinguish the business complacency +

overconfidence can distract

management from the consequences

DO BCM - Governance

High incidence and low impact risksamp

minor risks order of the day -

operational matters

robertmcachia at umedumt

Risk Assessment Phase

RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets

bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes

bull RA identifies threat scenarios to DC value-creating activities ampassets

bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)

Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability

robertmcachia at umedumt

BC Planning amp Design Phase - business recovery

DC recovery options driven by business requirements

regulation contracts amp SLAs A typical recovery sequence

1 Set MTPD objective for the DCfor each

client

2 Assign RTOs for individual

clientservicesapplicationsdata on

basis of prioritySLA

3 Evaluate alternative management amp

technical recovery options

4 Perform costbenefit analysis for each

option (re-planiterate)

5 Decide CEOCFO (i) approval (ii)

budget (iii) tasking individualsFailure to prepare is preparing

to fail - John Wooden

robertmcachia at umedumt

Some business recovery options for the DC

bull an Emergency Operations OfficeCentre

bull virtual workplaceteleworking from home for selected DC employees

bull alternative IT Call CentreData Centre

bull business decision - On-shore Near-shore Offshore

fallbackredundancy options

A typical recovery sequence

1 Initial emergency response

2 Resume mission-critical ApplicationsServices

3 Resume non-critical ApplicationsServices

4 Restoration to primary site full services verify stability

Ongoing communication - DC employees clients regulators

BC Planning amp Design Phase - business recovery (2)

robertmcachia at umedumt

BC Planning amp Design - technical recovery

Note business decision On-shore Near-shore Offshore

Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal

agreements (iv) outsourced a business issue

Note high-end BC for DCs active-active hellip hot failover hellip

Note entry-point BC for DCs electronic vaulting remote journaling

transaction logging

Mirror full redundancy short-

distancewidely-separated Data

Centers latency considerations

Hot fully equipped fallback Data Centre

Warm fallback Data Center missing key

components

Cold empty fallback Data Center

DC Recovery options by increasing MTPD RTO amp decreasing cost

mirroring

robertmcachia at umedumt

Phases Exercise amp Testing Maintain amp Review Awareness amp Training

Maintain amp Review Phase

feedback from testingaudit broadendeepen DC BC plan

Awareness amp Training Phase

its never enough institutionalize BC practice embed into

DC culture

The proof of the bdquopudding‟

is in the eating

Exercise amp Testing Phase

Give yourself an intentional DC disaster

1 Re-test BCP regularly

(walk-through checklist role-

playingsimulation full interruption amp

rehearsal)

2 Test alternative risk (threat +

vulnerability) scenarios

robertmcachia at umedumt

An example disaster mass cyber-attack

A mass cyber-attack may

involve

prior sniffing hellip

hellip social engineering amp

widespread attack

An attack may be blended

denial-of-service andor

widespread virusworm

infection andor

penetration (data theft

data corruption) andor

fraudblackmail

IncidentProblem Management

(recall ISO 20000 ITIL)source M Benyoucef

University of Ottawa

robertmcachia at umedumt

An example DC disaster BCM response to cyber-attack

This sequence applies to DoSDDoS infection amp penetration cyber-

attacks

Note After the BCM response seek prevention (i) Lessons Learnt (ii)

improve DC processsystems resilience (iii) heighten DC emergency

preparedness

1 Interdict actions to block attack halt its progress

2 Contain actions to prevent further degradation

amp regain control

3 Recover repair fix damage bring all systems

data applicationsservices to normality

4 Analyse investigationaudit - examination of

evidence

When prevention fails and an attack is under way the BCM

response is

Interdict Contain

Recover Analyse

robertmcachia at umedumt

Structure & Content of BS25999

A DC may go beyond organising a BCP project. It may decide to formalise its BCM assurance process by embracing the BS25999 standard.

BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture

BS25999: Holistic


Some takeaways (1)

BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned

The Swiss cheese model of business resilience
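One way to read the Swiss cheese model numerically (a rough sketch, not something the slides state): give each control layer a probability that a threat finds a hole in it, and assume the layers fail independently, which is roughly what "keep them moving so they are never aligned" aims for. The function name and all figures below are hypothetical.

```python
from math import prod


def breach_probability(hole_probabilities):
    """Chance that a threat passes every layer, assuming independent layers."""
    return prod(hole_probabilities)


# Hypothetical figures: three control layers (management and technical).
baseline = breach_probability([0.10, 0.20, 0.50])
# Fewer or smaller holes in one layer lower that layer's probability.
improved = breach_probability([0.10, 0.20, 0.25])
# An extra layer multiplies in another factor below 1.
extra_layer = breach_probability([0.10, 0.20, 0.50, 0.30])

print(baseline, improved, extra_layer)
```

If the holes do line up (the layers share a common weakness), the independence assumption fails and the product understates the real exposure.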


Some takeaways (2)

BCM is a concern of:
• CIOs, CTOs, Data Centre Managers
• Quality Managers, Infosec Managers
• IT Architects, DBAs
• IT Auditors
• CEOs, CFOs, because it's Governance

BCM addresses broad, non-specific risks: it is a "catch-all"

BCM is probably the most cost-effective security control of all

BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick


Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI, UK): http://www.thebci.org

Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office

A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf

Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm

BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity

RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org

SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org

SEI (USA): http://www.sei.cmu.edu

Uptime Institute (USA): http://www.uptimeinstitute.org


Questions and Discussion


Thank You

Dr Robert M Cachia

http://www.linkedin.com/in/robertmcachia

robertmcachia "at" um.edu.mt

IT Vulnerabilities - some examples (2)

• Poor software testing, especially Web testing (recall CMMi, ISEB)
• Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
• Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
• Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
• Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
• Poor fire protection

Known vulnerabilities sometimes diminish & new ones emerge - alertness

[Image: wiring from hell]


Risks - general & IT-specific

• loss of facilities (buildings, roads leading to building, etc.)
• loss of data (data integrity, data availability; recall ISO 27001)
• loss of power
• loss of water supply (cooling); the opposite - flooding
• computer infection outbreaks - viruses, worms
• physical access compromised, intrusion with intent
• asset seizure

Changing threats, changing vulnerabilities - shifting scenario: alertness

Some Data Centre risks:
• supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
• loss of Telecom connectivity (data, voice)
• loss of people (people can be SPOFs)

[Images: Your IT vendor supply chain; Burst pipe; DC flooding]


IT Risks - some examples

Physical DC Risks | Logical DC Risks

DC outages: business impacts - large & immediate

Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover

Recall: many DCs are in premises that were never designed to be DC-ready

Q: How to protect value-creation capability?

A: BCM


BCM in Data Centres, BCM for Data Centres

360° e-continuity & e-assurance

• Business functions of commercial IT providers (e.g. CEO's office, Procurement, Marketing & Sales, HR, Facilities, Planning, QA)
• IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
• Data
• Employees, including IT people
• Physical: Facilities/Building, Networks, VoIP
• Computational and Storage: Servers, SAN/NAS
• Software stack: OS, DBMS, middleware, applications
• Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail

BCM for data centres: 360°


BCM DC initiative output: a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:
• "We have/are executing plans by DC client/market segment …"
• "We have/are executing plans by Application/by Service …"
• "We have/are executing plans by DC Department …"
• "We have/are executing plans by Building/by Floor …"

BC Plan, BC Planning: (i) What, (ii) How, (iii) Who, (iv) When


DC Business Continuity Planning: management choices

Which DC clients/services/applications get restored first, and who will wait longer … recovery sequence for each client

Which services/applications get restored completely; which clients will initially only get limited service

MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for DC, for each client)

RTO (recovery time objective): the period of time within which assets/services must be recovered after disruption (for DC, for each client)

RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for DC, for each client)

RPO (recovery point objective): the point in time to which assets/services must be restored after disruption (for DC, for each client)

RPO affects the volume of data that may need to be restored
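The three objectives can be made concrete with a small sketch. It assumes, beyond what the slide says, that each objective is recorded per service as a duration, that the RPO is expressed as a window before the moment of disruption, and that services are restored shortest-RTO first (one possible prioritisation rule among many). All names and figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ServiceObjectives:
    """Recovery objectives for one client service (hypothetical values)."""
    name: str
    mtpd: timedelta  # maximum tolerable period of disruption
    rto: timedelta   # recovery time objective
    rpo: timedelta   # assumption: RPO given as a window before the disruption


def rto_within_mtpd(s):
    # An RTO longer than the MTPD would plan for an intolerable outage.
    return s.rto <= s.mtpd


def recovery_sequence(services):
    # Assumed rule: the service with the shortest RTO is restored first.
    return [s.name for s in sorted(services, key=lambda s: s.rto)]


def oldest_acceptable_restore_point(disruption, s):
    # Data written after this moment must survive the recovery.
    return disruption - s.rpo


services = [
    ServiceObjectives("e-mail", timedelta(hours=48), timedelta(hours=24), timedelta(hours=12)),
    ServiceObjectives("B2C payments", timedelta(hours=4), timedelta(hours=1), timedelta(minutes=5)),
    ServiceObjectives("intranet", timedelta(hours=72), timedelta(hours=48), timedelta(hours=24)),
]

print(recovery_sequence(services))
print(oldest_acceptable_restore_point(datetime(2009, 10, 13, 12, 0), services[1]))
```

A shorter RPO window means less data to re-enter after recovery, but it demands more frequent backups or replication.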


BCM initiative requires a dedicated Project/Programme

A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:

BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness and BC project outcomes/deliverables)

Business Impact Analysis Phase

Risk Assessment Phase

BC Planning & Design Phase

Exercise & Testing Phase

Maintenance & Review Phase

Awareness & Training Phase


Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities

• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews

BIA answers the questions: which DC activities & assets create most value … of all potential DC losses, which impaired activities & assets would be the greatest loss?

[Image: cascading impacts, multiple DC failure]
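As a rough illustration of the ranking question BIA answers, the sketch below sums loss categories of the kind named above for each asset and sorts the assets by total estimated loss. The asset names and figures are invented for the example; a real BIA would draw them from the questionnaires and interviews.

```python
# Estimated loss per day of disruption, by category (all figures hypothetical).
estimated_losses = {
    "B2C web shop": {"reputational": 20_000, "lost_revenue": 50_000, "recovery": 5_000},
    "e-mail": {"reputational": 2_000, "lost_revenue": 1_000, "recovery": 1_500},
    "ERP": {"reputational": 5_000, "lost_revenue": 30_000, "recovery": 8_000},
}


def rank_by_loss(losses):
    """Return (asset, total loss) pairs, greatest loss first."""
    totals = {asset: sum(parts.values()) for asset, parts in losses.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# No probabilities appear here: BIA quantifies consequences only.
ranking = rank_by_loss(estimated_losses)
for asset, total in ranking:
    print(f"{asset}: {total}")
```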


The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square: unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance

Contingency Risks to DC - top-left square: very rare events that will extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance

High-incidence, low-impact risks & minor risks: order of the day - operational matters
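The quadrants described above can be sketched as a small classifier. It assumes impact on the vertical axis and likelihood on the horizontal axis (consistent with "top-right" and "top-left" as used on the slide), scores normalised to the range 0 to 1, and a cut-off of 0.5; the scale, the cut-off and the function name are illustrative choices, not part of the original material.

```python
def classify_risk(likelihood, impact, threshold=0.5):
    """Place a risk in a matrix quadrant; scores are assumed to lie in [0, 1]."""
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"        # top-right: unacceptable, needs management ownership
    if high_impact:
        return "contingency"  # top-left: rare but potentially fatal to the business
    return "operational"      # low impact: handled as day-to-day operations


# Hypothetical scenarios and scores.
print(classify_risk(0.8, 0.9))   # e.g. frequent power failures with no backup supply
print(classify_risk(0.05, 0.95)) # e.g. total loss of the building
print(classify_risk(0.7, 0.1))   # e.g. single disk failures in a redundant array
```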


Risk Assessment Phase

RA considers the what/why/where/who that can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets

• RA takes a primarily external focus, a world/political/economic, IT sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC, and plan to protect value-creating capability


BC Planning & Design Phase - business recovery

DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery sequence:

1. Set MTPD objective for the DC/for each client
2. Assign RTOs for individual client/services/applications/data on basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval, (ii) budget, (iii) tasking individuals

"Failure to prepare is preparing to fail" - John Wooden


BC Planning & Design Phase - business recovery (2)

Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision - On-shore, Near-shore, Offshore fallback/redundancy options

A typical recovery sequence:

1. Initial emergency response
2. Resume mission-critical Applications/Services
3. Resume non-critical Applications/Services
4. Restoration to primary site: full services, verify stability

Ongoing communication - DC employees, clients, regulators


BC Planning & Design - technical recovery

DC recovery options, by increasing MTPD/RTO & decreasing cost:

Mirror: full redundancy; short-distance/widely-separated Data Centres; latency considerations

Hot: fully equipped fallback Data Centre

Warm: fallback Data Centre missing key components

Cold: empty fallback Data Centre

Note: business decision - On-shore, Near-shore, Offshore

Note: Mirror, Hot, Warm, Cold: (i) own, (ii) shared-space, (iii) reciprocal agreements, (iv) outsourced - a business issue

Note: high-end BC for DCs: active-active … hot failover …

Note: entry-point BC for DCs: electronic vaulting, remote journaling, transaction logging

[Image: mirroring]
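The trade-off between recovery time and cost can be sketched as a simple selection rule: among the options that can meet a given RTO, pick the cheapest. The recovery times and relative costs below are placeholders chosen only to respect the ordering stated above (Mirror fastest and dearest, Cold slowest and cheapest); they are not figures from the presentation.

```python
# (option, typical recovery time in hours, relative yearly cost) - hypothetical values
RECOVERY_OPTIONS = [
    ("mirror", 0.1, 100),
    ("hot", 4, 60),
    ("warm", 48, 25),
    ("cold", 168, 10),
]


def cheapest_option(rto_hours):
    """Return the cheapest option whose recovery time fits the RTO, or None."""
    feasible = [opt for opt in RECOVERY_OPTIONS if opt[1] <= rto_hours]
    if not feasible:
        return None  # no listed option is fast enough; the RTO needs revisiting
    return min(feasible, key=lambda opt: opt[2])[0]


for rto in (0.5, 6, 72, 200):
    print(rto, "->", cheapest_option(rto))
```

In practice the choice also depends on the ownership model (own, shared-space, reciprocal, outsourced) and on location, which this sketch leaves out.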


Phases: Exercise & Testing, Maintain & Review, Awareness & Training

Exercise & Testing Phase

Give yourself an intentional DC disaster

1. Re-test BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios

Maintain & Review Phase

Feedback from testing/audit; broaden/deepen DC BC plan

Awareness & Training Phase

It's never enough: institutionalize BC practice, embed it into DC culture

The proof of the "pudding" is in the eating

IT Risks - some examples

Physical DC Risks Logical DC Risks

DC outages business impacts - large amp immediate

Impact of DC outages

bull down-time

bull breached SLAs

bull lost revenue

bull reputational damage

clients and prospects lost

forever

bull cost to recover

bull legal liabilities

bull some DCs never recover

Recall many DCs are in

premises that were never

designed to be DC-ready

Q How to protect value-creation capability

A BCM

robertmcachia at umedumt

BCM in Data Centres BCM for Data Centres

360 e-continuity amp e-assurance

business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities

Planning QA)

IT processes eg IncidentProblem ConfigurationChange

AvailabilityCapacity

Data

Employees including IT people

Physical FacilitiesBuilding Networks VoIP

Computational and Storage

Servers SANNAS

Software stackOS DBMS middleware applications

ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail

BCM for data centres 360

robertmcachia at umedumt

BCM DC initiative output a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan the CIOCTOData Centre Manager can tell the

CEO

We haveare executing plans by DC clientmarket segment hellip

We haveare executing plans by Applicationby Service hellip

We haveare executing plans by DC Departmenthellip

We haveare executing plans by Buildingby Floor hellip

BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When

robertmcachia at umedumt

DC Business Continuity Planning management choices

Which DC clientsservicesapplications get restored first and who will

wait longer hellip recovery sequence for each client

Which servicesapplications get restored completely which clients will

initially only get limited service

MTPD maximum tolerable period of disruption the period of time by

which all identified systemsapplicationsservices and data

must be restored to normality (for DC for each client)

RTO recovery time objective the period in time within which

assetsservices must be recovered after disruption (for DC for

each client)

RTO affects the recovery option shorter RTO -gt more difficult

more expensive (for DC for each client)

RPO recovery point objective the point in time to which

assetsservices must be restored after disruption (for DC for

each client)

RPO affects the volume of data that may need to be restored

robertmcachia at umedumt

BCM initiative requires a dedicated ProjectProgramme

A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases

BC Project Initiation Phase

(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)

Business Impact Analysis Phase

Risk Assessment Phase

BC Planning amp Design Phase

Exercise amp Testing Phase

Maintenance amp Review Phase

Awareness amp Training Phase

robertmcachia at umedumt

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC its users activities

servicesrevenue streams applications data liabilities

bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets

bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)

bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences

bull BIA techniques questionnaires + interviews

BIA answers the questions which DC activities amp

assets create most value hellip of all potential DC losses

which impaired activities amp assets would be the

greatest losscascading impacts

multiple DC failure

robertmcachia at umedumt

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square

unacceptable Eliminate or Postpone

Transfer Monitor amp Mitigate +

management ownership + DO BCM -Governance

Contingency Risks to DC - top-left

square very rare events will

extinguish the business complacency +

overconfidence can distract

management from the consequences

DO BCM - Governance

High incidence and low impact risksamp

minor risks order of the day -

operational matters

robertmcachia at umedumt

Risk Assessment Phase

RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets

bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes

bull RA identifies threat scenarios to DC value-creating activities ampassets

bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)

Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability

robertmcachia at umedumt

BC Planning amp Design Phase - business recovery

DC recovery options driven by business requirements

regulation contracts amp SLAs A typical recovery sequence

1 Set MTPD objective for the DCfor each

client

2 Assign RTOs for individual

clientservicesapplicationsdata on

basis of prioritySLA

3 Evaluate alternative management amp

technical recovery options

4 Perform costbenefit analysis for each

option (re-planiterate)

5 Decide CEOCFO (i) approval (ii)

budget (iii) tasking individualsFailure to prepare is preparing

to fail - John Wooden

robertmcachia at umedumt

Some business recovery options for the DC

bull an Emergency Operations OfficeCentre

bull virtual workplaceteleworking from home for selected DC employees

bull alternative IT Call CentreData Centre

bull business decision - On-shore Near-shore Offshore

fallbackredundancy options

A typical recovery sequence

1 Initial emergency response

2 Resume mission-critical ApplicationsServices

3 Resume non-critical ApplicationsServices

4 Restoration to primary site full services verify stability

Ongoing communication - DC employees clients regulators

BC Planning amp Design Phase - business recovery (2)

robertmcachia at umedumt

BC Planning & Design - technical recovery

DC recovery options, by increasing MTPD/RTO & decreasing cost

Mirror (mirroring): full redundancy; short-distance/widely-separated Data Centres; latency considerations
Hot: fully equipped fallback Data Centre
Warm: fallback Data Centre missing key components
Cold: empty fallback Data Centre

Note: business decision - On-shore, Near-shore, Offshore
Note: Mirror, Hot, Warm, Cold - (i) own (ii) shared-space (iii) reciprocal agreements (iv) outsourced - a business issue
Note: high-end BC for DCs - active-active … hot failover …
Note: entry-point BC for DCs - electronic vaulting, remote journaling, transaction logging
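The trade-off between recovery time and cost across Mirror, Hot, Warm and Cold sites can be sketched as a simple lookup: pick the cheapest option that still meets the RTO. The resume times and relative costs in `TIERS` are invented placeholders for illustration, not figures from the presentation; a real DC would substitute its own estimates.

```python
# Hypothetical figures, for illustration only:
# (tier, typical hours to resume service, relative annual cost)
TIERS = [
    ("cold", 168.0, 1),
    ("warm", 48.0, 3),
    ("hot", 4.0, 8),
    ("mirror", 0.1, 20),
]


def cheapest_tier(rto_hours):
    """Return the lowest-cost tier whose typical resume time fits within the RTO."""
    candidates = [t for t in TIERS if t[1] <= rto_hours]
    if not candidates:
        return None  # no tier in this table can meet the RTO
    return min(candidates, key=lambda t: t[2])[0]


print(cheapest_tier(72))  # warm
print(cheapest_tier(1))   # mirror
```

A shorter RTO pushes the choice towards the expensive end of the table, which is the point the slide makes.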

Phases: Exercise & Testing, Maintain & Review, Awareness & Training

Exercise & Testing Phase

"The proof of the 'pudding' is in the eating"

Give yourself an intentional DC disaster

1. Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios

Maintain & Review Phase

feedback from testing/audit; broaden/deepen the DC BC plan

Awareness & Training Phase

it's never enough; institutionalize BC practice; embed it into DC culture

An example disaster: mass cyber-attack

A mass cyber-attack may involve prior sniffing … social engineering & widespread attack

An attack may be blended: denial-of-service and/or widespread virus/worm infection and/or penetration (data theft, data corruption) and/or fraud/blackmail

Incident/Problem Management (recall ISO 20000, ITIL)

source: M Benyoucef, University of Ottawa

An example DC disaster: BCM response to cyber-attack

When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse

1. Interdict: actions to block the attack, halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data, applications/services to normality
4. Analyse: investigation/audit - examination of evidence

This sequence applies to DoS/DDoS, infection & penetration cyber-attacks

Note: After the BCM response, seek prevention: (i) Lessons Learnt (ii) improve DC process/systems resilience (iii) heighten DC emergency preparedness
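The four-step response is strictly ordered, and a runbook tool might enforce that order. The sketch below is one possible way to do so; the `Phase` and `IncidentResponse` names are hypothetical and do not come from the presentation or from any standard.

```python
from enum import IntEnum


class Phase(IntEnum):
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair damage, bring systems back to normality
    ANALYSE = 4    # investigation/audit of the evidence


class IncidentResponse:
    """Tracks completed phases and rejects any phase taken out of order."""

    def __init__(self):
        self.completed = []

    def complete(self, phase):
        # Phase(n) raises ValueError once all four phases are already done.
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            raise ValueError(f"expected {expected.name}, got {phase.name}")
        self.completed.append(phase)

    @property
    def finished(self):
        return len(self.completed) == len(Phase)
```

Calling `complete(Phase.RECOVER)` before `CONTAIN` raises `ValueError`, which mirrors the rule that recovery should not start while the attack is still spreading.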

Structure & Content of BS25999

A DC may go beyond organising a BCP project. It may decide to formalise its BCM assurance process by embracing the BS25999 standard.

BS25999: holistic structure & content

• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture

Some takeaways (1)

The Swiss cheese model of business resilience

BCM thinking & action:

• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned
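A rough numerical reading of the Swiss cheese model: if each defensive layer lets an incident through with some probability (the "hole"), and the layers fail independently, the chance that all holes line up is the product of those probabilities. Independence and the sample probabilities are assumptions made for this sketch; real controls often share failure causes.

```python
from math import prod


def breach_probability(layer_failure_probs):
    """Probability that an incident passes every layer, assuming independent layers."""
    return prod(layer_failure_probs)


baseline = breach_probability([0.1, 0.1])          # two layers, each with a 10% "hole"
smaller_holes = breach_probability([0.05, 0.05])   # same two layers, holes halved
extra_layer = breach_probability([0.1, 0.1, 0.1])  # a third layer added

print(baseline > smaller_holes, baseline > extra_layer)  # True True
```

Both shrinking the holes and adding an independent layer reduce the overall probability, which is the effect the takeaways describe.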

Some takeaways (2)

BCM is a concern of:

CIOs, CTOs, Data Centre Managers
Quality Managers, Infosec Managers
IT Architects, DBAs
IT Auditors
CEOs, CFOs - because it's Governance

BCM addresses broad, non-specific risks: it is a "catch all"
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org

Questions and Discussion

Thank You

Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt

DC outages: business impacts - large & immediate

Impact of DC outages

• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover

Recall: many DCs are in premises that were never designed to be DC-ready

Q: How to protect value-creation capability?
A: BCM

BCM in Data Centres, BCM for Data Centres

360° e-continuity & e-assurance

Business functions of commercial IT providers (e.g. CEO's office, Procurement, Marketing & Sales, HR, Facilities Planning, QA)
IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
Data
Employees, including IT people
Physical Facilities/Building, Networks, VoIP
Computational and Storage: Servers, SAN/NAS
Software stack: OS, DBMS, middleware, applications
Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail

BCM DC initiative output: a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:

We have/are executing plans by DC client/market segment …
We have/are executing plans by Application/by Service …
We have/are executing plans by DC Department …
We have/are executing plans by Building/by Floor …

BC Plan / BC Planning: (i) What (ii) How (iii) Who (iv) When

DC Business Continuity Planning: management choices

Which DC clients/services/applications get restored first, and who will wait longer … recovery sequence for each client

Which services/applications get restored completely; which clients will initially only get limited service

MTPD: maximum tolerable period of disruption - the period of time by which all identified systems/applications/services and data must be restored to normality (for DC, for each client)

RTO: recovery time objective - the period of time within which assets/services must be recovered after disruption (for DC, for each client)

RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for DC, for each client)

RPO: recovery point objective - the point in time to which assets/services must be restored after disruption (for DC, for each client)

RPO affects the volume of data that may need to be restored
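The three objectives can be checked against each other with a small sketch. Two assumptions go beyond the slide: RPO is expressed as a duration (hours of data that may be lost before the disruption), and an RTO is treated as valid only if it does not exceed the MTPD, which is a common convention but not stated in the text. The `Objectives` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objectives:
    mtpd_hours: float  # maximum tolerable period of disruption
    rto_hours: float   # recovery time objective
    rpo_hours: float   # recovery point objective, expressed as a duration (assumption)


def rto_within_mtpd(o):
    """An RTO longer than the MTPD would plan for an intolerable outage."""
    return o.rto_hours <= o.mtpd_hours


def backup_meets_rpo(o, backup_interval_hours):
    """Worst-case data loss is the time since the last backup, so the interval must not exceed the RPO."""
    return backup_interval_hours <= o.rpo_hours


client = Objectives(mtpd_hours=24, rto_hours=4, rpo_hours=1)
print(rto_within_mtpd(client))       # True
print(backup_meets_rpo(client, 6))   # False: six-hourly backups can lose up to 6 hours of data
```

The same checks would be repeated for the DC as a whole and for each client, since the slide defines every objective at both levels.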

BCM initiative requires a dedicated Project/Programme

A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:

BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness, and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities

• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews

BIA answers the questions: which DC activities & assets create most value … of all potential DC losses, which impaired activities & assets would be the greatest loss?

Cascading impacts; multiple DC failure

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square: unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance

Contingency Risks to DC - top-left square: very rare events will extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance

High incidence and low impact risks & minor risks: order of the day - operational matters
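The quadrants of the matrix can be expressed as a small classification function. The sketch assumes likelihood runs along the horizontal axis and impact along the vertical one (consistent with "top-left" being rare but severe), that both are normalised scores between 0 and 1, and that 0.5 divides low from high; the labels and the threshold are illustrative choices.

```python
def classify(likelihood, impact, threshold=0.5):
    """Place a risk in a 2x2 matrix; both inputs are normalised scores in [0, 1]."""
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"        # top-right: unacceptable, needs management ownership
    if high_impact:
        return "contingency"  # top-left: rare but able to extinguish the business
    if high_likelihood:
        return "operational"  # high incidence, low impact: order of the day
    return "minor"


print(classify(0.05, 0.95))  # contingency
print(classify(0.8, 0.9))    # major
```

Both "major" and "contingency" results are the ones the slide flags for BCM and Governance attention.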

Risk Assessment Phase

RA considers the what/why/where/who can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets

• RA takes a primarily external focus, a world/political/economic/IT sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates the likelihood (probability) of occurrence of the DC losses & asset impairment identified in the BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC, and plan to protect value-creating capability
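One simple way to combine the two phases is to multiply the loss quantified in BIA by the likelihood estimated in RA, giving an expected annual loss per threat scenario. The scenarios and figures below are invented for illustration, and the expected-value formula is an assumption of this sketch, not a method prescribed by the presentation.

```python
def expected_annual_loss(loss_per_event, events_per_year):
    """BIA supplies the loss per event; RA supplies the estimated yearly frequency."""
    return loss_per_event * events_per_year


# Hypothetical scenarios: (loss per event, estimated events per year)
scenarios = {
    "power failure": (200_000, 0.5),
    "mass cyber-attack": (1_500_000, 0.1),
    "flood": (5_000_000, 0.01),
}

ranked = sorted(
    scenarios,
    key=lambda name: expected_annual_loss(*scenarios[name]),
    reverse=True,
)
print(ranked)  # ['mass cyber-attack', 'power failure', 'flood']
```

A ranking like this is only a starting point: a rare event with a very large loss can rank low on expected value and still threaten the survival of the DC.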

BCM DC initiative output a BC Plan + organizational capability

A good BC plan is articulated by components and views

With a good BC plan the CIOCTOData Centre Manager can tell the

CEO

We haveare executing plans by DC clientmarket segment hellip

We haveare executing plans by Applicationby Service hellip

We haveare executing plans by DC Departmenthellip

We haveare executing plans by Buildingby Floor hellip

BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When

robertmcachia at umedumt

DC Business Continuity Planning management choices

Which DC clientsservicesapplications get restored first and who will

wait longer hellip recovery sequence for each client

Which servicesapplications get restored completely which clients will

initially only get limited service

MTPD maximum tolerable period of disruption the period of time by

which all identified systemsapplicationsservices and data

must be restored to normality (for DC for each client)

RTO recovery time objective the period in time within which

assetsservices must be recovered after disruption (for DC for

each client)

RTO affects the recovery option shorter RTO -gt more difficult

more expensive (for DC for each client)

RPO recovery point objective the point in time to which

assetsservices must be restored after disruption (for DC for

each client)

RPO affects the volume of data that may need to be restored

robertmcachia at umedumt

BCM initiative requires a dedicated ProjectProgramme

A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases

BC Project Initiation Phase

(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)

Business Impact Analysis Phase

Risk Assessment Phase

BC Planning amp Design Phase

Exercise amp Testing Phase

Maintenance amp Review Phase

Awareness amp Training Phase

robertmcachia at umedumt

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC its users activities

servicesrevenue streams applications data liabilities

bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets

bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)

bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences

bull BIA techniques questionnaires + interviews

BIA answers the questions which DC activities amp

assets create most value hellip of all potential DC losses

which impaired activities amp assets would be the

greatest losscascading impacts

multiple DC failure

robertmcachia at umedumt

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square

unacceptable Eliminate or Postpone

Transfer Monitor amp Mitigate +

management ownership + DO BCM -Governance

Contingency Risks to DC - top-left

square very rare events will

extinguish the business complacency +

overconfidence can distract

management from the consequences

DO BCM - Governance

High incidence and low impact risksamp

minor risks order of the day -

operational matters

robertmcachia at umedumt

Risk Assessment Phase

RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets

bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes

bull RA identifies threat scenarios to DC value-creating activities ampassets

bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)

Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability

robertmcachia at umedumt

BC Planning amp Design Phase - business recovery

DC recovery options driven by business requirements

regulation contracts amp SLAs A typical recovery sequence

1 Set MTPD objective for the DCfor each

client

2 Assign RTOs for individual

clientservicesapplicationsdata on

basis of prioritySLA

3 Evaluate alternative management amp

technical recovery options

4 Perform costbenefit analysis for each

option (re-planiterate)

5 Decide CEOCFO (i) approval (ii)

budget (iii) tasking individualsFailure to prepare is preparing

to fail - John Wooden

robertmcachia at umedumt

Some business recovery options for the DC

bull an Emergency Operations OfficeCentre

bull virtual workplaceteleworking from home for selected DC employees

bull alternative IT Call CentreData Centre

bull business decision - On-shore Near-shore Offshore

fallbackredundancy options

A typical recovery sequence

1 Initial emergency response

2 Resume mission-critical ApplicationsServices

3 Resume non-critical ApplicationsServices

4 Restoration to primary site full services verify stability

Ongoing communication - DC employees clients regulators

BC Planning amp Design Phase - business recovery (2)

robertmcachia at umedumt

BC Planning amp Design - technical recovery

Note business decision On-shore Near-shore Offshore

Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal

agreements (iv) outsourced a business issue

Note high-end BC for DCs active-active hellip hot failover hellip

Note entry-point BC for DCs electronic vaulting remote journaling

transaction logging

Mirror full redundancy short-

distancewidely-separated Data

Centers latency considerations

Hot fully equipped fallback Data Centre

Warm fallback Data Center missing key

components

Cold empty fallback Data Center

DC Recovery options by increasing MTPD RTO amp decreasing cost

mirroring

robertmcachia at umedumt

Phases Exercise amp Testing Maintain amp Review Awareness amp Training

Maintain amp Review Phase

feedback from testingaudit broadendeepen DC BC plan

Awareness amp Training Phase

its never enough institutionalize BC practice embed into

DC culture

The proof of the bdquopudding‟

is in the eating

Exercise amp Testing Phase

Give yourself an intentional DC disaster

1 Re-test BCP regularly

(walk-through checklist role-

playingsimulation full interruption amp

rehearsal)

2 Test alternative risk (threat +

vulnerability) scenarios

robertmcachia at umedumt

An example disaster mass cyber-attack

A mass cyber-attack may

involve

prior sniffing hellip

hellip social engineering amp

widespread attack

An attack may be blended

denial-of-service andor

widespread virusworm

infection andor

penetration (data theft

data corruption) andor

fraudblackmail

IncidentProblem Management

(recall ISO 20000 ITIL)source M Benyoucef

University of Ottawa

robertmcachia at umedumt

An example DC disaster: BCM response to cyber-attack

This sequence applies to DoS/DDoS, infection & penetration cyber-attacks

Note: After the BCM response, seek prevention: (i) Lessons Learnt, (ii) improve DC process/systems resilience, (iii) heighten DC emergency preparedness

When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse

1. Interdict: actions to block the attack and halt its progress

2. Contain: actions to prevent further degradation & regain control

3. Recover: repair, fix damage, bring all systems, data, applications/services to normality

4. Analyse: investigation/audit - examination of evidence

Structure & Content of BS25999

A DC may go beyond organising a BCP project. It may decide to formalise its BCM assurance process by embracing the BS25999 standard.

BS25999: holistic structure & content

• overview of business continuity management (BCM)

• business continuity management policy

• BCM programme management

• understanding the organization

• determining business continuity strategy

• developing and implementing a BCM response

• exercising, maintaining and reviewing BCM arrangements

• embedding BCM in the organization's culture

Some takeaways (1)

BCM thinking & action:

• decrease the number of holes, both management & technical

• decrease the size of the holes, and

• keep them moving so they are never aligned

The Swiss cheese model of business resilience

Some takeaways (2)

BCM is a concern of:

CIOs, CTOs, Data Centre Managers

Quality Managers, Infosec Managers

IT Architects, DBAs

IT Auditors

CEOs, CFOs - because it's Governance

BCM addresses broad, non-specific risks: it is a "catch all"

BCM is probably the most cost-effective security control of all

BCM: start now, start small, scale fast; broaden/deepen BCM scope incrementally

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI, UK): http://www.thebci.org

Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office

A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf

Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm

BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity

RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org

SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org

SEI (USA): http://www.sei.cmu.edu

Uptime Institute (USA): http://www.uptimeinstitute.org

Questions and Discussion

Thank You

Dr Robert M Cachia

http://www.linkedin.com/in/robertmcachia

robertmcachia "at" um.edu.mt

DC Business Continuity Planning: management choices

Which DC clients/services/applications get restored first, and who will wait longer … recovery sequence for each client

Which services/applications get restored completely; which clients will initially only get limited service

MTPD: maximum tolerable period of disruption - the period of time by which all identified systems/applications/services and data must be restored to normality (for the DC, for each client)

RTO: recovery time objective - the period of time within which assets/services must be recovered after disruption (for the DC, for each client)

RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for the DC, for each client)

RPO: recovery point objective - the point in time to which assets/services must be restored after disruption (for the DC, for each client)

RPO affects the volume of data that may need to be restored
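To make the three objectives concrete, here is a minimal Python sketch that compares one outage (real or exercised) against MTPD, RTO and RPO. It assumes that RPO is expressed as a tolerable data-loss window measured back from the moment of disruption, which is a common convention but not something the slide states. The names Objectives and evaluate_recovery, and all times shown, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objectives:
    mtpd: timedelta  # maximum tolerable period of disruption
    rto: timedelta   # recovery time objective
    rpo: timedelta   # recovery point objective, as a data-loss window (assumption)


def evaluate_recovery(obj, last_good_copy, disrupted_at, restored_at):
    """Compare one outage against the agreed objectives."""
    downtime = restored_at - disrupted_at            # how long service was lost
    data_loss_window = disrupted_at - last_good_copy  # work that must be redone
    return {
        "rto_met": downtime <= obj.rto,
        "mtpd_met": downtime <= obj.mtpd,
        "rpo_met": data_loss_window <= obj.rpo,
    }


# Hypothetical client: 24 h MTPD, 4 h RTO, 15 min RPO.
obj = Objectives(
    mtpd=timedelta(hours=24), rto=timedelta(hours=4), rpo=timedelta(minutes=15)
)
result = evaluate_recovery(
    obj,
    last_good_copy=datetime(2024, 1, 1, 8, 50),
    disrupted_at=datetime(2024, 1, 1, 9, 0),
    restored_at=datetime(2024, 1, 1, 14, 30),
)
```

In this made-up outage the service was down for five and a half hours, so the RTO is missed although the MTPD still holds, and the ten-minute data-loss window stays within the RPO.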

BCM initiative requires a dedicated Project/Programme

A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:

BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness, and BC project outcomes/deliverables)

Business Impact Analysis Phase

Risk Assessment Phase

BC Planning & Design Phase

Exercise & Testing Phase

Maintenance & Review Phase

Awareness & Training Phase

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities

• BIA takes an internal focus - it identifies critical DC value-creating activities and assets

• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)

• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences

• BIA techniques: questionnaires + interviews

BIA answers the questions: which DC activities & assets create most value … of all potential DC losses, which impaired activities & assets would be the greatest loss?

Cascading impacts, multiple DC failure

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square: unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance

Contingency Risks to DC - top-left square: very rare events that would extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance

High incidence and low impact risks & minor risks: order of the day - operational matters

Risk Assessment Phase

RA considers the what/why/where/who can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets

• RA takes a primarily external focus, a world/political/economic/IT sector focus - it looks at sources and causes

• RA identifies threat scenarios to DC value-creating activities & assets

• RA estimates likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC and plan to protect value-creating capability
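Locating a risk in the matrix amounts to combining the impact from BIA with the likelihood from RA. The sketch below shows one possible way to do it in Python. It assumes both scores are normalised to the range 0..1 and uses an arbitrary 0.5 threshold; the function name classify_risk and the quadrant labels are illustrative choices, not terms defined in the presentation.

```python
def classify_risk(likelihood: float, impact: float, threshold: float = 0.5) -> str:
    """Place a risk in one quadrant of a 2x2 likelihood-vs-impact matrix.

    Both scores are assumed to lie in 0..1; the default threshold is an
    illustrative assumption.
    """
    if not (0.0 <= likelihood <= 1.0 and 0.0 <= impact <= 1.0):
        raise ValueError("likelihood and impact must be within 0..1")
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"        # top-right: unacceptable, needs management ownership
    if high_impact:
        return "contingency"  # top-left: rare, but could extinguish the business
    if high_likelihood:
        return "operational"  # high incidence, low impact: order of the day
    return "minor"            # low on both axes: also an operational matter
```

Under these assumptions a rare but devastating event such as the loss of a whole site (likelihood 0.05, impact 0.95) lands in the contingency quadrant, which is where BCM governance is needed most.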

BC Planning & Design Phase - business recovery

DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery sequence:

1. Set MTPD objective for the DC/for each client

2. Assign RTOs for individual client/services/applications/data on the basis of priority/SLA

3. Evaluate alternative management & technical recovery options

4. Perform cost/benefit analysis for each option (re-plan/iterate)

5. Decide: CEO/CFO (i) approval, (ii) budget, (iii) tasking individuals

"Failure to prepare is preparing to fail" - John Wooden

BC Planning & Design Phase - business recovery (2)

Some business recovery options for the DC

• an Emergency Operations Office/Centre

• virtual workplace/teleworking from home for selected DC employees

• alternative IT Call Centre/Data Centre

• business decision - on-shore, near-shore, offshore fallback/redundancy options

A typical recovery sequence

1. Initial emergency response

2. Resume mission-critical applications/services

3. Resume non-critical applications/services

4. Restoration to primary site: full services, verify stability

Ongoing communication: DC employees, clients, regulators
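The resume order in steps 2 and 3 can be expressed as a simple sort. The Python sketch below puts mission-critical services first and, within each group, the shortest RTO first. The tie-breaking by RTO is an assumption added for the example, and the Service class, the recovery_order function and the sample services are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    mission_critical: bool
    rto_hours: float  # agreed recovery time objective for this service


def recovery_order(services):
    """Mission-critical services first; within each group, shortest RTO first."""
    # False sorts before True, so "not mission_critical" puts critical ones first.
    return sorted(services, key=lambda s: (not s.mission_critical, s.rto_hours))


# Hypothetical service catalogue for one client.
services = [
    Service("reporting", False, 72),
    Service("payments", True, 2),
    Service("intranet wiki", False, 24),
    Service("client portal", True, 8),
]
order = [s.name for s in recovery_order(services)]
```

For this sample catalogue the sequence is payments, client portal, intranet wiki, reporting.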

BC Planning amp Design - technical recovery

Note business decision On-shore Near-shore Offshore

Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal

agreements (iv) outsourced a business issue

Note high-end BC for DCs active-active hellip hot failover hellip

Note entry-point BC for DCs electronic vaulting remote journaling

transaction logging

Mirror full redundancy short-

distancewidely-separated Data

Centers latency considerations

Hot fully equipped fallback Data Centre

Warm fallback Data Center missing key

components

Cold empty fallback Data Center

DC Recovery options by increasing MTPD RTO amp decreasing cost

mirroring

robertmcachia at umedumt

Phases Exercise amp Testing Maintain amp Review Awareness amp Training

Maintain amp Review Phase

feedback from testingaudit broadendeepen DC BC plan

Awareness amp Training Phase

its never enough institutionalize BC practice embed into

DC culture

The proof of the bdquopudding‟

is in the eating

Exercise amp Testing Phase

Give yourself an intentional DC disaster

1 Re-test BCP regularly

(walk-through checklist role-

playingsimulation full interruption amp

rehearsal)

2 Test alternative risk (threat +

vulnerability) scenarios

robertmcachia at umedumt

An example disaster mass cyber-attack

A mass cyber-attack may

involve

prior sniffing hellip

hellip social engineering amp

widespread attack

An attack may be blended

denial-of-service andor

widespread virusworm

infection andor

penetration (data theft

data corruption) andor

fraudblackmail

IncidentProblem Management

(recall ISO 20000 ITIL)source M Benyoucef

University of Ottawa

robertmcachia at umedumt

An example DC disaster BCM response to cyber-attack

This sequence applies to DoSDDoS infection amp penetration cyber-

attacks

Note After the BCM response seek prevention (i) Lessons Learnt (ii)

improve DC processsystems resilience (iii) heighten DC emergency

preparedness

1 Interdict actions to block attack halt its progress

2 Contain actions to prevent further degradation

amp regain control

3 Recover repair fix damage bring all systems

data applicationsservices to normality

4 Analyse investigationaudit - examination of

evidence

When prevention fails and an attack is under way the BCM

response is

Interdict Contain

Recover Analyse

robertmcachia at umedumt

Structure amp Content of BS25999

A DC may go beyond organising a BCP project

It may decide to formalise its BCM assurance process by embracing the

BS25999 standard

BS25999 holistic structure amp content

bull overview of business continuity management (BCM)

bull business continuity management policy

bull BCM programme management

bull understanding the organization

bull determining business continuity strategy

bull developing and implementing a BCM response

bull exercising maintaining and reviewing BCM arrangements

bull embedding BCM in the organizationrsquos culture

BS25999 Holistic

robertmcachia at umedumt

Some takeaways (1)

BCM thinking amp action

bull decrease the number of

holes both management amp

technical

bull decrease the size of the

holes and

bull keep them moving so they

are never aligned

The Swiss cheese model of business resilience

robertmcachia at umedumt

Some takeaways (2)

BCM is a concern of

CIOs CTOs Data Centre Managers

Quality Managers Infosec Managers

IT Architects DBAs

IT Auditors

CEOs CFOs because itrsquos Governance

BCM addresses broad non-specific risks it is a ldquocatch allrdquo

BCM is probably the most cost-effective security control of all

BCM start now start small scale fast broadendeepen BCM scope

incrementally

You must be able to respond to your circumstances as they exist - not as you

would like them to be - Brian Billick

robertmcachia at umedumt

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI UK) httpwwwthebciorg

Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK

Cabinet Office

A primer on ITIL ISO 20000 - IT Service Management

httpwwwisaca-

maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02

_2009pdf

Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm

BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines

s-Continuity

RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg

SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg

SEI (USA) httpwwwseicmuedu

Uptime Institute (USA) httpwwwuptimeinstituteorg

robertmcachia at umedumt

Questions and Discussion

robertmcachia atldquo umedumt

Thank You

Dr Robert M Cachia

httpwwwlinkedincominrobertmcachia

robertmcachia ldquoatrdquo umedumt

BCM initiative requires a dedicated ProjectProgramme

A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases

BC Project Initiation Phase

(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)

Business Impact Analysis Phase

Risk Assessment Phase

BC Planning amp Design Phase

Exercise amp Testing Phase

Maintenance amp Review Phase

Awareness amp Training Phase

robertmcachia at umedumt

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC its users activities

servicesrevenue streams applications data liabilities

bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets

bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)

bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences

bull BIA techniques questionnaires + interviews

BIA answers the questions which DC activities amp

assets create most value hellip of all potential DC losses

which impaired activities amp assets would be the

greatest losscascading impacts

multiple DC failure

robertmcachia at umedumt

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square

unacceptable Eliminate or Postpone

Transfer Monitor amp Mitigate +

management ownership + DO BCM -Governance

Contingency Risks to DC - top-left

square very rare events will

extinguish the business complacency +

overconfidence can distract

management from the consequences

DO BCM - Governance

High incidence and low impact risksamp

minor risks order of the day -

operational matters

robertmcachia at umedumt

Risk Assessment Phase

RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets

bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes

bull RA identifies threat scenarios to DC value-creating activities ampassets

bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)

Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability

robertmcachia at umedumt

BC Planning amp Design Phase - business recovery

DC recovery options driven by business requirements

regulation contracts amp SLAs A typical recovery sequence

1 Set MTPD objective for the DCfor each

client

2 Assign RTOs for individual

clientservicesapplicationsdata on

basis of prioritySLA

3 Evaluate alternative management amp

technical recovery options

4 Perform costbenefit analysis for each

option (re-planiterate)

5 Decide CEOCFO (i) approval (ii)

budget (iii) tasking individualsFailure to prepare is preparing

to fail - John Wooden

robertmcachia at umedumt

Some business recovery options for the DC

bull an Emergency Operations OfficeCentre

bull virtual workplaceteleworking from home for selected DC employees

bull alternative IT Call CentreData Centre

bull business decision - On-shore Near-shore Offshore

fallbackredundancy options

A typical recovery sequence

1 Initial emergency response

2 Resume mission-critical ApplicationsServices

3 Resume non-critical ApplicationsServices

4 Restoration to primary site full services verify stability

Ongoing communication - DC employees clients regulators

BC Planning amp Design Phase - business recovery (2)

robertmcachia at umedumt

BC Planning amp Design - technical recovery

Note business decision On-shore Near-shore Offshore

Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal

agreements (iv) outsourced a business issue

Note high-end BC for DCs active-active hellip hot failover hellip

Note entry-point BC for DCs electronic vaulting remote journaling

transaction logging

Mirror full redundancy short-

distancewidely-separated Data

Centers latency considerations

Hot fully equipped fallback Data Centre

Warm fallback Data Center missing key

components

Cold empty fallback Data Center

DC Recovery options by increasing MTPD RTO amp decreasing cost

mirroring

robertmcachia at umedumt

Phases Exercise amp Testing Maintain amp Review Awareness amp Training

Maintain amp Review Phase

feedback from testingaudit broadendeepen DC BC plan

Awareness amp Training Phase

its never enough institutionalize BC practice embed into

DC culture

The proof of the bdquopudding‟

is in the eating

Exercise amp Testing Phase

Give yourself an intentional DC disaster

1 Re-test BCP regularly

(walk-through checklist role-

playingsimulation full interruption amp

rehearsal)

2 Test alternative risk (threat +

vulnerability) scenarios

robertmcachia at umedumt

An example disaster mass cyber-attack

A mass cyber-attack may

involve

prior sniffing hellip

hellip social engineering amp

widespread attack

An attack may be blended

denial-of-service andor

widespread virusworm

infection andor

penetration (data theft

data corruption) andor

fraudblackmail

IncidentProblem Management

(recall ISO 20000 ITIL)source M Benyoucef

University of Ottawa

robertmcachia at umedumt

An example DC disaster BCM response to cyber-attack

This sequence applies to DoSDDoS infection amp penetration cyber-

attacks

Note After the BCM response seek prevention (i) Lessons Learnt (ii)

improve DC processsystems resilience (iii) heighten DC emergency

preparedness

1 Interdict actions to block attack halt its progress

2 Contain actions to prevent further degradation

amp regain control

3 Recover repair fix damage bring all systems

data applicationsservices to normality

4 Analyse investigationaudit - examination of

evidence

When prevention fails and an attack is under way the BCM

response is

Interdict Contain

Recover Analyse

robertmcachia at umedumt

Structure amp Content of BS25999

A DC may go beyond organising a BCP project

It may decide to formalise its BCM assurance process by embracing the

BS25999 standard

BS25999 holistic structure amp content

bull overview of business continuity management (BCM)

bull business continuity management policy

bull BCM programme management

bull understanding the organization

bull determining business continuity strategy

bull developing and implementing a BCM response

bull exercising maintaining and reviewing BCM arrangements

bull embedding BCM in the organizationrsquos culture

BS25999 Holistic

robertmcachia at umedumt

Some takeaways (1)

BCM thinking amp action

bull decrease the number of

holes both management amp

technical

bull decrease the size of the

holes and

bull keep them moving so they

are never aligned

The Swiss cheese model of business resilience

robertmcachia at umedumt

Some takeaways (2)

BCM is a concern of

CIOs CTOs Data Centre Managers

Quality Managers Infosec Managers

IT Architects DBAs

IT Auditors

CEOs CFOs because itrsquos Governance

BCM addresses broad non-specific risks it is a ldquocatch allrdquo

BCM is probably the most cost-effective security control of all

BCM start now start small scale fast broadendeepen BCM scope

incrementally

You must be able to respond to your circumstances as they exist - not as you

would like them to be - Brian Billick

robertmcachia at umedumt

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI UK) httpwwwthebciorg

Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK

Cabinet Office

A primer on ITIL ISO 20000 - IT Service Management

httpwwwisaca-

maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02

_2009pdf

Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm

BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines

s-Continuity

RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg

SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg

SEI (USA) httpwwwseicmuedu

Uptime Institute (USA) httpwwwuptimeinstituteorg

robertmcachia at umedumt

Questions and Discussion

robertmcachia atldquo umedumt

Thank You

Dr Robert M Cachia

httpwwwlinkedincominrobertmcachia

robertmcachia ldquoatrdquo umedumt

Business Impact Analysis (BIA) Phase

BIA achieves understanding of the DC its users activities

servicesrevenue streams applications data liabilities

bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets

bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)

bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences

bull BIA techniques questionnaires + interviews

BIA answers the questions which DC activities amp

assets create most value hellip of all potential DC losses

which impaired activities amp assets would be the

greatest losscascading impacts

multiple DC failure

robertmcachia at umedumt

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square

unacceptable Eliminate or Postpone

Transfer Monitor amp Mitigate +

management ownership + DO BCM -Governance

Contingency Risks to DC - top-left

square very rare events will

extinguish the business complacency +

overconfidence can distract

management from the consequences

DO BCM - Governance

High incidence and low impact risksamp

minor risks order of the day -

operational matters

robertmcachia at umedumt

Risk Assessment Phase

RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets

bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes

bull RA identifies threat scenarios to DC value-creating activities ampassets

bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)

Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability

robertmcachia at umedumt

BC Planning amp Design Phase - business recovery

DC recovery options driven by business requirements

regulation contracts amp SLAs A typical recovery sequence

1 Set MTPD objective for the DCfor each

client

2 Assign RTOs for individual

clientservicesapplicationsdata on

basis of prioritySLA

3 Evaluate alternative management amp

technical recovery options

4 Perform costbenefit analysis for each

option (re-planiterate)

5 Decide CEOCFO (i) approval (ii)

budget (iii) tasking individualsFailure to prepare is preparing

to fail - John Wooden

robertmcachia at umedumt

Some business recovery options for the DC

bull an Emergency Operations OfficeCentre

bull virtual workplaceteleworking from home for selected DC employees

bull alternative IT Call CentreData Centre

bull business decision - On-shore Near-shore Offshore

fallbackredundancy options

A typical recovery sequence

1 Initial emergency response

2 Resume mission-critical ApplicationsServices

3 Resume non-critical ApplicationsServices

4 Restoration to primary site full services verify stability

Ongoing communication - DC employees clients regulators

BC Planning amp Design Phase - business recovery (2)

robertmcachia at umedumt

BC Planning amp Design - technical recovery

Note business decision On-shore Near-shore Offshore

Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal

agreements (iv) outsourced a business issue

Note high-end BC for DCs active-active hellip hot failover hellip

Note entry-point BC for DCs electronic vaulting remote journaling

transaction logging

Mirror full redundancy short-

distancewidely-separated Data

Centers latency considerations

Hot fully equipped fallback Data Centre

Warm fallback Data Center missing key

components

Cold empty fallback Data Center

DC Recovery options by increasing MTPD RTO amp decreasing cost

mirroring

robertmcachia at umedumt

Phases Exercise amp Testing Maintain amp Review Awareness amp Training

Maintain amp Review Phase

feedback from testingaudit broadendeepen DC BC plan

Awareness amp Training Phase

its never enough institutionalize BC practice embed into

DC culture

The proof of the bdquopudding‟

is in the eating

Exercise amp Testing Phase

Give yourself an intentional DC disaster

1 Re-test BCP regularly

(walk-through checklist role-

playingsimulation full interruption amp

rehearsal)

2 Test alternative risk (threat +

vulnerability) scenarios

robertmcachia at umedumt

An example disaster mass cyber-attack

A mass cyber-attack may

involve

prior sniffing hellip

hellip social engineering amp

widespread attack

An attack may be blended

denial-of-service andor

widespread virusworm

infection andor

penetration (data theft

data corruption) andor

fraudblackmail

IncidentProblem Management

(recall ISO 20000 ITIL)source M Benyoucef

University of Ottawa

robertmcachia at umedumt

An example DC disaster BCM response to cyber-attack

This sequence applies to DoSDDoS infection amp penetration cyber-

attacks

Note After the BCM response seek prevention (i) Lessons Learnt (ii)

improve DC processsystems resilience (iii) heighten DC emergency

preparedness

1 Interdict actions to block attack halt its progress

2 Contain actions to prevent further degradation

amp regain control

3 Recover repair fix damage bring all systems

data applicationsservices to normality

4 Analyse investigationaudit - examination of

evidence

When prevention fails and an attack is under way the BCM

response is

Interdict Contain

Recover Analyse

robertmcachia at umedumt

Structure amp Content of BS25999

A DC may go beyond organising a BCP project

It may decide to formalise its BCM assurance process by embracing the

BS25999 standard

BS25999 holistic structure amp content

bull overview of business continuity management (BCM)

bull business continuity management policy

bull BCM programme management

bull understanding the organization

bull determining business continuity strategy

bull developing and implementing a BCM response

bull exercising maintaining and reviewing BCM arrangements

bull embedding BCM in the organizationrsquos culture

BS25999 Holistic

robertmcachia at umedumt

Some takeaways (1)

BCM thinking amp action

bull decrease the number of

holes both management amp

technical

bull decrease the size of the

holes and

bull keep them moving so they

are never aligned

The Swiss cheese model of business resilience

robertmcachia at umedumt

Some takeaways (2)

BCM is a concern of

CIOs CTOs Data Centre Managers

Quality Managers Infosec Managers

IT Architects DBAs

IT Auditors

CEOs CFOs because itrsquos Governance

BCM addresses broad non-specific risks it is a ldquocatch allrdquo

BCM is probably the most cost-effective security control of all

BCM start now start small scale fast broadendeepen BCM scope

incrementally

You must be able to respond to your circumstances as they exist - not as you

would like them to be - Brian Billick

robertmcachia at umedumt

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI UK) httpwwwthebciorg

Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK

Cabinet Office

A primer on ITIL ISO 20000 - IT Service Management

httpwwwisaca-

maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02

_2009pdf

Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm

BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines

s-Continuity

RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg

SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg

SEI (USA) httpwwwseicmuedu

Uptime Institute (USA) httpwwwuptimeinstituteorg

robertmcachia at umedumt

Questions and Discussion

robertmcachia atldquo umedumt

Thank You

Dr Robert M Cachia

httpwwwlinkedincominrobertmcachia

robertmcachia ldquoatrdquo umedumt

The Risk Matrix - likelihood vs impact

Major Risks to DC - top-right square

unacceptable Eliminate or Postpone

Transfer Monitor amp Mitigate +

management ownership + DO BCM -Governance

Contingency Risks to DC - top-left

square very rare events will

extinguish the business complacency +

overconfidence can distract

management from the consequences

DO BCM - Governance

High incidence and low impact risksamp

minor risks order of the day -

operational matters

robertmcachia at umedumt

Risk Assessment Phase

RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets

bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes

bull RA identifies threat scenarios to DC value-creating activities ampassets

bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)

Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability

robertmcachia at umedumt

BC Planning amp Design Phase - business recovery

DC recovery options driven by business requirements

regulation contracts amp SLAs A typical recovery sequence

1 Set MTPD objective for the DCfor each

client

2 Assign RTOs for individual

clientservicesapplicationsdata on

basis of prioritySLA

3 Evaluate alternative management amp

technical recovery options

4 Perform costbenefit analysis for each

option (re-planiterate)

5 Decide CEOCFO (i) approval (ii)

budget (iii) tasking individualsFailure to prepare is preparing

to fail - John Wooden

robertmcachia at umedumt

Some business recovery options for the DC

bull an Emergency Operations OfficeCentre

bull virtual workplaceteleworking from home for selected DC employees

bull alternative IT Call CentreData Centre

bull business decision - On-shore Near-shore Offshore

fallbackredundancy options

A typical recovery sequence

1 Initial emergency response

2 Resume mission-critical ApplicationsServices

3 Resume non-critical ApplicationsServices

4 Restoration to primary site full services verify stability

Ongoing communication - DC employees clients regulators

BC Planning amp Design Phase - business recovery (2)

robertmcachia at umedumt

BC Planning amp Design - technical recovery

Note business decision On-shore Near-shore Offshore

Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal

agreements (iv) outsourced a business issue

Note high-end BC for DCs active-active hellip hot failover hellip

Note entry-point BC for DCs electronic vaulting remote journaling

transaction logging

Mirror full redundancy short-

distancewidely-separated Data

Centers latency considerations

Hot fully equipped fallback Data Centre

Warm fallback Data Center missing key

components

Cold empty fallback Data Center

DC Recovery options by increasing MTPD RTO amp decreasing cost

mirroring

robertmcachia at umedumt

Phases: Exercise & Testing, Maintain & Review, Awareness & Training

Exercise & Testing Phase

The proof of the 'pudding' is in the eating

Give yourself an intentional DC disaster

1. Re-test BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios

Maintain & Review Phase

Feedback from testing/audit; broaden/deepen DC BC plan

Awareness & Training Phase

It's never enough: institutionalize BC practice, embed into DC culture

An example disaster: mass cyber-attack

A mass cyber-attack may involve prior sniffing … social engineering & widespread attack

An attack may be blended:
• denial-of-service and/or
• widespread virus/worm infection and/or
• penetration (data theft, data corruption) and/or
• fraud/blackmail

Incident/Problem Management (recall ISO 20000, ITIL)

Source: M Benyoucef, University of Ottawa

An example DC disaster: BCM response to cyber-attack

When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse

1. Interdict: actions to block the attack, halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data, applications/services to normality
4. Analyse: investigation/audit - examination of evidence

This sequence applies to DoS/DDoS, infection & penetration cyber-attacks

Note: After the BCM response seek prevention: (i) Lessons Learnt, (ii) improve DC process/systems resilience, (iii) heighten DC emergency preparedness
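The four-step response can be sketched as a fixed sequence of phases, each with its own handler. This is a minimal illustration, not a prescribed implementation: the `Phase` enum, `next_phase` and `run_response` are hypothetical names, and real handlers would wrap the DC's own incident procedures.

```python
from enum import Enum


class Phase(Enum):
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair damage, return services to normality
    ANALYSE = 4    # investigation/audit of the evidence


def next_phase(current):
    """Return the phase that follows `current`, or None after ANALYSE."""
    phases = list(Phase)  # Enum members iterate in definition order
    idx = phases.index(current)
    return phases[idx + 1] if idx + 1 < len(phases) else None


def run_response(handlers):
    """Call one handler per phase, in slide order; return the phases completed."""
    completed = []
    phase = Phase.INTERDICT
    while phase is not None:
        handlers[phase]()
        completed.append(phase)
        phase = next_phase(phase)
    return completed


if __name__ == "__main__":
    # Placeholder handlers that only print the phase name.
    demo = {p: (lambda p=p: print("running", p.name)) for p in Phase}
    run_response(demo)
```

Keeping the order in one place makes it harder to skip straight to recovery before the attack has been contained.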

Structure & Content of BS25999

A DC may go beyond organising a BCP project. It may decide to formalise its BCM assurance process by embracing the BS25999 standard.

BS25999: holistic structure & content

• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture

Some takeaways (1)

The Swiss cheese model of business resilience

BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned
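The Swiss cheese picture can be given rough numbers. Assuming each defensive layer lets a threat through with some probability, and assuming the layers fail independently (one way to read "never aligned"), the chance that a threat passes every layer is the product of the per-layer probabilities. The function name and all figures below are illustrative assumptions, not data from the presentation.

```python
import math


def breach_probability(hole_probs):
    """Probability that a threat passes every layer, assuming independent layers."""
    for p in hole_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie between 0 and 1")
    return math.prod(hole_probs)


if __name__ == "__main__":
    baseline = breach_probability([0.1, 0.1, 0.1])          # three layers
    extra_layer = breach_probability([0.1, 0.1, 0.1, 0.1])  # add a fourth layer
    smaller_holes = breach_probability([0.05, 0.05, 0.05])  # shrink every hole
    print(baseline, extra_layer, smaller_holes)
```

Under these assumptions, adding a layer or shrinking the holes both reduce the overall probability multiplicatively, which is why modest improvements to several controls can outweigh a large improvement to one.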

Some takeaways (2)

BCM is a concern of:
• CIOs, CTOs, Data Centre Managers
• Quality Managers, Infosec Managers
• IT Architects, DBAs
• IT Auditors
• CEOs, CFOs, because it's Governance

BCM addresses broad, non-specific risks: it is a "catch all"

BCM is probably the most cost-effective security control of all

BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally

"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org

Questions and Discussion

Thank You

Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt

Risk Assessment Phase

RA considers the what/why/where/who can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets.

• RA takes a primarily external focus: a world/political/economic, IT sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)

Note: Need to locate DC activities & assets in the slots of the Risk Matrix for the DC, and plan to protect value-creating capability
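Locating a threat scenario in a slot of the Risk Matrix can be sketched as scoring likelihood against impact. The 1 to 5 scales, the band thresholds, the function name `risk_slot` and the sample threats below are all assumptions for illustration; a real DC would use its own scales and the impact figures from its BIA.

```python
def risk_slot(likelihood, impact):
    """Place a (likelihood, impact) pair, each scored 1-5, in a risk-matrix band."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("scores must be between 1 and 5")
    score = likelihood * impact
    if score >= 15:  # assumed threshold
        return "high"
    if score >= 6:   # assumed threshold
        return "medium"
    return "low"


# Hypothetical threat scenarios: name -> (likelihood, impact)
THREATS = {
    "power outage": (3, 5),
    "DDoS": (4, 3),
    "flood": (1, 4),
}


def ranked_threats(threats):
    """Return threat names ordered by likelihood x impact, highest first."""
    return sorted(threats, key=lambda t: threats[t][0] * threats[t][1], reverse=True)


if __name__ == "__main__":
    for name in ranked_threats(THREATS):
        likelihood, impact = THREATS[name]
        print(name, risk_slot(likelihood, impact))
```

Since the likelihoods are, as the slide says, estimates or guesstimates, the bands are more useful for ranking scenarios against each other than as absolute measures.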

BC Planning & Design Phase - business recovery

DC recovery options are driven by business requirements, regulation, contracts & SLAs

A typical recovery sequence

1. Set MTPD objective for the DC/for each client
2. Assign RTOs for individual client/services/applications/data on basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval, (ii) budget, (iii) tasking individuals

"Failure to prepare is preparing to fail" - John Wooden
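Steps 1 and 2 above can be sketched as a consistency check: each service's RTO should not exceed its MTPD, and services are then recovered in priority order. The `Service` class, the convention that priority 1 is the most critical, and the sample services and hours are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    priority: int      # 1 = most critical (assumed convention)
    rto_hours: float   # recovery time objective
    mtpd_hours: float  # maximum tolerable period of disruption


def rto_violations(services):
    """Return the names of services whose RTO exceeds their MTPD."""
    return [s.name for s in services if s.rto_hours > s.mtpd_hours]


def recovery_order(services):
    """Order services by priority, then by tighter RTO first."""
    ordered = sorted(services, key=lambda s: (s.priority, s.rto_hours))
    return [s.name for s in ordered]


# Hypothetical services and figures.
SERVICES = [
    Service("archive", 3, 72.0, 48.0),
    Service("billing", 1, 2.0, 8.0),
    Service("email", 2, 24.0, 48.0),
]

if __name__ == "__main__":
    print("RTO exceeds MTPD:", rto_violations(SERVICES))
    print("Recovery order:", recovery_order(SERVICES))
```

A violation such as the hypothetical archive service is the cue for the re-plan/iterate loop in step 4: either fund a faster recovery option or renegotiate the tolerable disruption with the client.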

An example disaster mass cyber-attack

A mass cyber-attack may

involve

prior sniffing hellip

hellip social engineering amp

widespread attack

An attack may be blended

denial-of-service andor

widespread virusworm

infection andor

penetration (data theft

data corruption) andor

fraudblackmail

IncidentProblem Management

(recall ISO 20000 ITIL)source M Benyoucef

University of Ottawa

robertmcachia at umedumt

An example DC disaster BCM response to cyber-attack

This sequence applies to DoSDDoS infection amp penetration cyber-

attacks

Note After the BCM response seek prevention (i) Lessons Learnt (ii)

improve DC processsystems resilience (iii) heighten DC emergency

preparedness

1 Interdict actions to block attack halt its progress

2 Contain actions to prevent further degradation

amp regain control

3 Recover repair fix damage bring all systems

data applicationsservices to normality

4 Analyse investigationaudit - examination of

evidence

When prevention fails and an attack is under way the BCM

response is

Interdict Contain

Recover Analyse

robertmcachia at umedumt

Structure amp Content of BS25999

A DC may go beyond organising a BCP project

It may decide to formalise its BCM assurance process by embracing the

BS25999 standard

BS25999 holistic structure amp content

bull overview of business continuity management (BCM)

bull business continuity management policy

bull BCM programme management

bull understanding the organization

bull determining business continuity strategy

bull developing and implementing a BCM response

bull exercising maintaining and reviewing BCM arrangements

bull embedding BCM in the organizationrsquos culture

BS25999 Holistic

robertmcachia at umedumt

Some takeaways (1)

BCM thinking amp action

bull decrease the number of

holes both management amp

technical

bull decrease the size of the

holes and

bull keep them moving so they

are never aligned

The Swiss cheese model of business resilience

Some takeaways (2)

BCM is a concern of:

• CIOs, CTOs, Data Centre Managers
• Quality Managers, Infosec Managers
• IT Architects, DBAs
• IT Auditors
• CEOs, CFOs, because it’s Governance

BCM addresses broad, non-specific risks: it is a “catch all”

BCM is probably the most cost-effective security control of all

BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally

“You must be able to respond to your circumstances as they exist - not as you would like them to be” - Brian Billick

Sources in and around BCM for Data Centres

The Business Continuity Institute (BCI, UK): http://www.thebci.org

Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office

A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf

Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm

BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity

RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org

SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org

SEI (USA): http://www.sei.cmu.edu

Uptime Institute (USA): http://www.uptimeinstitute.org

Questions and Discussion

Thank You

Dr Robert M Cachia

http://www.linkedin.com/in/robertmcachia

robertmcachia “at” um.edu.mt
