Business Continuity Management
for Data Centres
Dr Robert M Cachia BSc Dott Sc (Milan) FCQI (UK)
Director ISACA Malta Chapter
IT Governance Manager
Government of Malta
Visiting Senior Lecturer
University of Malta
Joint Event
BCS Malta Section
ISACA Malta Chapter
13th October 2009
About myself
20+ years experience
Positions
Programmer, Software Engineer, IT Project Manager, IT Quality Manager, Principal Consultant, IT Governance Manager
IT experience
• Software House environment
• IT Service Desk environment
• Data Centre environment
University of Malta teaching
• e-SCM, ERP, e-CRM
• Service Quality Assurance
• Sustainable Logistics & Transportation
(to MBA, B.Accty and Dipl. Logistics & Transportation students)
Director ISACA Malta Chapter
robertmcachia at um.edu.mt
Today's agenda (1)
BCM for industrial-strength, co-located commercial IT providers,
e.g. Data Centres offering SaaS using Cloud Computing technologies
Today we discuss:
• e-services, e-business, data centres & IT call centres
• BCM for Data Centres
• business e-services & trust
• BCM & IT: economic context
• threats, vulnerabilities, risk, assets
• Data Centre outages: business impacts
• BCM in Data Centres, BCM for Data Centres
Today's agenda (2)
Today we discuss (continued):
• a BCM initiative for DCs: its output
• BCP for DCs: management choices
• BCM Project/Programme
• business impact analysis
• risk assessment
• BCM DC recovery: business & technical
• an example disaster: mass cyber-attack & BCM response
• structure and content of BS25999
• some takeaways
e-Services, e-Business: all from Data Centres & IT Call Centres
BCM for Data Centres: what?
BCM
(i) is a management & business capability to protect the value-creating activities of the DC
(ii) supports the Board's objective of staying in business
(iii) points (i) & (ii) make BCM immediately a Governance issue
BCM: planned processes to
• assess, reduce & manage vulnerability to risk
• plan DC responses in case of adversity
• exercise preparedness for continued IT delivery during disruption
• restore normality and protect DC reputation and goodwill
“The time to repair the roof is when the sun is shining” (J. F. Kennedy)
Business e-Services & trust
Is she
• a Logistics Manager
• a Banker
• a PA to the CEO
• a CEO
in
• an Airport
• a Bourse
• a Telecoms provider
• iGaming
• e-payment
• a Retail chain …
… doing B2B or B2C,
… doing e-CRM or e-SCM?
e-services need trust
BCM and IT: economic context
Economy 2.0 & Economy 3.0
digital, networked, global, mobile, locational, 24x7x365
1.1 billion Internet users, 2.7 billion mobile phone users
ubiquitous VoIP/Skype, e-mail & calendaring software
e-Business, e-Banking, e-Payment, e-Government
e-SCM, e-CRM, B2B, B2C across supply chains, geographies, time-zones, economies & jurisdictions
BCM in DCs is mandated by economic logic
BCM and IT: what not to do
human error (IT management); human error (technical)
DC BCM terminology: Threats, Vulnerabilities, Risk, Assets
Threat: the intent and capacity to cause loss or disruption and create adverse consequences, e.g. to IT services, data, Data Centers, IT Call Centers etc.
Vulnerability: the susceptibility of a service provider, service, data or infrastructure to damage, impairment or exposure by a threat
Risk: a measure of the potential consequences of a contingency against the likelihood of its occurring
threats + vulnerabilities = risk
Impact: the consequence/effect of a threat, expressed in terms of reduction of DC capability or loss of business, service, data etc.
Asset: an IT service element, e.g. software, hardware, IT people, data
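The slide's shorthand "threats + vulnerabilities = risk" and its definition of risk (consequences weighed against likelihood) can be illustrated with a minimal scoring sketch. The slides do not prescribe the 1-to-5 ordinal scales or the likelihood × impact product. They are assumptions taken from a common risk-matrix convention, and the asset and threat names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exposure:
    """One threat meeting (or not meeting) a vulnerability of an asset."""
    asset: str
    threat: str
    vulnerability: Optional[str]  # None: no known weakness for this threat to exploit
    likelihood: int               # assumed ordinal scale, 1 (rare) to 5 (almost certain)
    impact: int                   # assumed ordinal scale, 1 (negligible) to 5 (severe)


def risk_score(e: Exposure) -> int:
    """Weigh impact against likelihood; a threat with no vulnerability carries no risk."""
    if e.vulnerability is None:
        return 0
    return e.likelihood * e.impact


# Hypothetical exposures for a single SAN asset.
flood = Exposure("SAN", "burst pipe", "water pipes routed above the racks", 2, 5)
fenced = Exposure("SAN", "burst pipe", None, 2, 5)
print(risk_score(flood), risk_score(fenced))  # 10 0
```

Scoring zero when there is no vulnerability is what the "threats + vulnerabilities" shorthand implies: a threat alone does not create risk.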
Threats: some examples
Some data centre threats:
human error (management/technical, accidental/malicious)
technical failure (mechanical, electrical, hardware, software etc.)
software failure (even software patches themselves contain bugs)
fire, explosion, smoke
floods (natural, burst pipes)
toxic hazards (effluents, emissions)
structural collapse
crime (vandalism, organized crime, white-collar crime etc.)
mass cyber-attack, digital terrorism, cyber-war
Established threats may diminish and new threats emerge - alertness
Oct 2006 - Marsascala
Threats: general & IT-specific
Explosion, fire, smoke, emissions, effluents in or close to DC
Malicious IT threat
IT Vulnerabilities: some examples (1)
DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software
Some IT vulnerabilities:
Poor Data Centre site selection or Data Centre design
SPOFs, single points of failure (people, hardware, databases, software)
Poor Data Centre procedures or compliance with procedure (SFIA HR framework for IT, ITIL)
Poor physical access control
Poor software patching
good physical access control
IT Vulnerabilities: some examples (2)
Poor software testing, especially Web testing (recall CMMI, ISEB)
Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish & new ones emerge - alertness
wiring from hell
Risks: general & IT-specific
Some Data Centre risks:
loss of facilities (buildings, roads leading to building etc.)
loss of data (data integrity, data availability; recall ISO 27001)
loss of power
loss of water supply (cooling); the opposite: flooding
computer infection outbreaks: viruses, worms
physical access compromised: intrusion with intent
asset seizure
supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
loss of Telecom connectivity (data, voice)
loss of people (people can be SPOFs)
Changing threats, changing vulnerabilities: shifting scenario, alertness
Your IT vendor supply chain
Burst pipe: DC flooding
IT Risks: some examples
Physical DC Risks
Logical DC Risks
DC outages: business impacts, large & immediate
Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover
Recall: many DCs are in premises that were never designed to be DC-ready
Q: How to protect value-creation capability?
A: BCM
BCM in Data Centres, BCM for Data Centres
360° e-continuity & e-assurance
business functions of commercial IT providers (e.g. CEO's office, Procurement, Marketing & Sales, HR, Facilities, Planning, QA)
IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
Data
Employees, including IT people
Physical: Facilities/Building, Networks, VoIP
Computational and Storage: Servers, SAN/NAS
Software stack: OS, DBMS, middleware, applications
Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail
BCM for data centres: 360°
BCM DC initiative output: a BC Plan + organizational capability
A good BC plan is articulated by components and views.
With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:
We have/are executing plans by DC client/market segment …
We have/are executing plans by Application/by Service …
We have/are executing plans by DC Department …
We have/are executing plans by Building/by Floor …
BC Plan, BC Planning: (i) What, (ii) How, (iii) Who, (iv) When
DC Business Continuity Planning: management choices
Which DC clients/services/applications get restored first, and who will wait longer? … recovery sequence for each client
Which services/applications get restored completely, and which clients will initially only get limited service?
MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for DC, for each client)
RTO (recovery time objective): the period of time within which assets/services must be recovered after disruption (for DC, for each client)
RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for DC, for each client)
RPO (recovery point objective): the point in time to which assets/services must be restored after disruption (for DC, for each client)
RPO affects the volume of data that may need to be restored
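The MTPD, RTO and RPO definitions above translate into simple checks a planner might run for each client. This is a minimal sketch under stated assumptions. All figures are invented and all times are in hours. Treating worst-case data loss as one full backup interval is a simplification, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClientObjectives:
    """Per-client continuity targets, in hours (illustrative values)."""
    client: str
    mtpd_h: float  # maximum tolerable period of disruption
    rto_h: float   # recovery time objective
    rpo_h: float   # recovery point objective (max. age of the restored data)


@dataclass
class RecoveryOption:
    """A candidate recovery arrangement with hypothetical estimates."""
    name: str
    est_recovery_h: float     # estimated time to bring services back
    backup_interval_h: float  # time between consecutive restorable copies of the data


def meets_objectives(obj: ClientObjectives, opt: RecoveryOption) -> bool:
    # The RTO must fit inside the MTPD, the option must recover within the RTO,
    # and the worst-case data loss (one full backup interval) must not exceed the RPO.
    return (obj.rto_h <= obj.mtpd_h
            and opt.est_recovery_h <= obj.rto_h
            and opt.backup_interval_h <= obj.rpo_h)


def worst_case_data_to_recreate_mb(change_rate_mb_per_h: float,
                                   opt: RecoveryOption) -> float:
    # Everything written since the last restorable copy may have to be re-created.
    return change_rate_mb_per_h * opt.backup_interval_h


payments = ClientObjectives("e-payment client", mtpd_h=24, rto_h=4, rpo_h=0.25)
nightly = RecoveryOption("warm site + nightly backup", 12, 24)
journaled = RecoveryOption("hot site + remote journaling", 2, 0.1)
print(meets_objectives(payments, nightly))    # False
print(meets_objectives(payments, journaled))  # True
```

The sketch mirrors the slide's point that a shorter RTO and a tighter RPO rule out the cheaper options.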
A BCM initiative requires a dedicated Project/Programme
A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:
BC Project Initiation Phase
(mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness, and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase
Business Impact Analysis (BIA) Phase
BIA achieves an understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities
• BIA takes an internal focus: it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents; it quantifies the consequences
• BIA techniques: questionnaires + interviews
BIA answers the questions: which DC activities & assets create most value … and, of all potential DC losses, which impaired activities & assets would be the greatest loss?
cascading impacts: multiple DC failure
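BIA quantifies consequences rather than probabilities. Its core question ("which impaired activities & assets would be the greatest loss?") can therefore be sketched as a ranking that also counts cascading impacts through dependencies. All loss figures and dependency links below are invented for illustration. A real BIA would derive them from its questionnaires and interviews.

```python
from typing import Dict, List, Set

# Hypothetical BIA output: estimated loss per day of disruption (EUR) for each
# activity or asset, combining lost revenue, cost of recovery and reputation.
DAILY_LOSS: Dict[str, float] = {
    "SAN": 5_000,
    "DBMS": 8_000,
    "e-payment service": 120_000,
    "e-CRM service": 20_000,
    "intranet": 1_000,
}

# Hypothetical dependencies: item -> the items it needs in order to run.
DEPENDS_ON: Dict[str, List[str]] = {
    "SAN": [],
    "DBMS": ["SAN"],
    "e-payment service": ["DBMS"],
    "e-CRM service": ["DBMS"],
    "intranet": [],
}


def impaired_by(failed: str) -> Set[str]:
    """Everything impaired when `failed` is lost, following dependencies transitively."""
    impaired = {failed}
    changed = True
    while changed:  # keep propagating until no new dependents are found
        changed = False
        for item, needs in DEPENDS_ON.items():
            if item not in impaired and any(n in impaired for n in needs):
                impaired.add(item)
                changed = True
    return impaired


def cascading_daily_loss(failed: str) -> float:
    """Consequence of losing one item, including its cascading impacts (no probabilities)."""
    return sum(DAILY_LOSS[i] for i in impaired_by(failed))


# Rank items by the greatest loss their impairment would cause.
ranking = sorted(DAILY_LOSS, key=cascading_daily_loss, reverse=True)
print(ranking[:2])  # ['SAN', 'DBMS']
```

The SAN tops the ranking even though its own loss figure is small. Its impairment cascades to every service built on it.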
The Risk Matrix: likelihood vs impact
Major Risks to DC (top-right square): unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM (Governance)
Contingency Risks to DC (top-left square): very rare events that would extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM (Governance)
High-incidence, low-impact risks & minor risks: order of the day, operational matters
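The four regions of the matrix can be captured in a small classification function, for instance to sort a risk register. This sketch rests on several assumptions: 1-to-5 scales, a single cut-off of 3 separating "low" from "high", and the labels "operational" and "minor", which are illustrative names rather than terms from the slides.

```python
def matrix_cell(likelihood: int, impact: int, threshold: int = 3) -> str:
    """Place a risk in a 2x2 likelihood/impact matrix.

    Assumes 1-to-5 scales and treats scores >= threshold as "high"; both are
    illustrative choices, not part of the source slides.
    """
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_likelihood and high_impact:
        return "major"        # top-right: unacceptable, DO BCM
    if high_impact:
        return "contingency"  # top-left: very rare but business-ending, DO BCM
    if high_likelihood:
        return "operational"  # high incidence, low impact: order of the day
    return "minor"            # low incidence, low impact


# Management response per cell, paraphrasing the slide.
RESPONSE = {
    "major": "eliminate or postpone, transfer, monitor & mitigate, management ownership, do BCM",
    "contingency": "guard against complacency and overconfidence, do BCM",
    "operational": "handle as day-to-day operational matters",
    "minor": "handle as day-to-day operational matters",
}

print(matrix_cell(1, 5), "->", RESPONSE[matrix_cell(1, 5)])
```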
Risk Assessment Phase
RA considers the what/why/where/who that can and would cause the DC the disastrous losses identified in the BIA. RA seeks to understand threats to valued service/DC assets.
• RA takes a primarily external focus (a world/political/economic/IT sector focus): it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates the likelihood (probability) of occurrence of the DC losses & asset impairment identified in the BIA (estimates or guesstimates)
Note: DC activities & assets need to be located in the slots of the Risk Matrix for the DC, with plans to protect value-creating capability
BC Planning & Design Phase: business recovery
DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery sequence:
1. Set the MTPD objective for the DC/for each client
2. Assign RTOs for individual client services/applications/data on the basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform a cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval, (ii) budget, (iii) tasking individuals
“Failure to prepare is preparing to fail” - John Wooden
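Steps 3 and 4 (evaluate options, then weigh cost against benefit) might be sketched as follows. The expected-loss formula, the incident rate and every figure are assumptions made for illustration. In practice the loss figures would come from the BIA and the likelihoods from the risk assessment, and step 5 still rests with the CEO/CFO.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    """A candidate recovery option with hypothetical cost and timing estimates."""
    name: str
    annual_cost: float     # yearly running cost of the arrangement (EUR)
    est_recovery_h: float  # estimated time to recover with this option (hours)


def expected_annual_loss(opt: Option, loss_per_h: float, incidents_per_year: float) -> float:
    # Simple expectation: incidents per year x hours down per incident x loss per hour.
    return incidents_per_year * opt.est_recovery_h * loss_per_h


def choose_option(options: List[Option], rto_h: float, loss_per_h: float,
                  incidents_per_year: float) -> Optional[Option]:
    """Among options meeting the RTO, pick the lowest running cost + expected loss."""
    feasible = [o for o in options if o.est_recovery_h <= rto_h]
    if not feasible:
        return None  # nothing meets the RTO: re-plan / iterate (step 4)
    return min(feasible,
               key=lambda o: o.annual_cost + expected_annual_loss(o, loss_per_h, incidents_per_year))


options = [Option("cold site", 20_000, 120),
           Option("warm site", 80_000, 24),
           Option("hot site", 250_000, 2)]
choice = choose_option(options, rto_h=48, loss_per_h=5_000, incidents_per_year=0.5)
print(choice.name if choice else "re-plan")  # warm site
```

Raising the hourly loss (a more valuable client) or tightening the RTO shifts the choice toward the hot site. Returning `None` corresponds to the slide's "re-plan/iterate".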
BC Planning & Design Phase: business recovery (2)
Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision: On-shore, Near-shore, Offshore
fallback/redundancy options
A typical recovery sequence:
1. Initial emergency response
2. Resume mission-critical Applications/Services
3. Resume non-critical Applications/Services
4. Restoration to primary site, full services, verify stability
Ongoing communication: DC employees, clients, regulators
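The four-step recovery sequence can be turned into an ordered restoration plan. This sketch makes three assumptions: each service is tagged mission-critical or not, each carries an RTO, and shorter RTOs are resumed first within a phase. The service names are hypothetical. Ongoing communication with employees, clients and regulators runs alongside every phase and is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Service:
    name: str
    mission_critical: bool
    rto_h: float  # recovery time objective in hours (illustrative)


def recovery_plan(services: List[Service]) -> List[Tuple[int, str]]:
    """Order restoration into the four phases of the typical recovery sequence.

    Within phases 2 and 3, services with the shortest RTO are resumed first.
    """
    plan = [(1, "initial emergency response")]
    critical = sorted((s for s in services if s.mission_critical), key=lambda s: s.rto_h)
    others = sorted((s for s in services if not s.mission_critical), key=lambda s: s.rto_h)
    plan += [(2, f"resume {s.name}") for s in critical]
    plan += [(3, f"resume {s.name}") for s in others]
    plan.append((4, "restore to primary site, full services, verify stability"))
    return plan


services = [Service("intranet", False, 72), Service("e-payment", True, 2),
            Service("e-CRM", True, 8), Service("e-mail", False, 24)]
for phase, step in recovery_plan(services):
    print(phase, step)
```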
BC Planning & Design: technical recovery
Note: business decision: On-shore, Near-shore, Offshore
Note: Mirror, Hot, Warm and Cold sites may be (i) own, (ii) shared-space, (iii) reciprocal agreements, (iv) outsourced: a business issue
Note: high-end BC for DCs: active-active … hot failover …
Note: entry-point BC for DCs: electronic vaulting, remote journaling, transaction logging
Mirror: full redundancy, short-distance/widely-separated Data Centers, latency considerations
Hot: fully equipped fallback Data Centre
Warm: fallback Data Center missing key components
Cold: empty fallback Data Center
DC Recovery options by increasing MTPD, RTO & decreasing cost
mirroring
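The entry-point techniques (electronic vaulting, remote journaling, transaction logging) share one idea: restore the last vaulted copy, then replay logged changes up to a chosen recovery point. The sketch below shows that idea with transactions simplified to key/value overwrites. Real DBMS journaling is considerably more involved, and the names are hypothetical. Restoring the snapshot alone would lose entries 11 to 13, which is exactly the gap the RPO has to bound.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class JournalEntry:
    seq: int     # monotonically increasing sequence number
    key: str     # e.g. an account identifier
    value: int   # new value written by the transaction


def recover(snapshot: Dict[str, int], snapshot_seq: int,
            journal: List[JournalEntry], recover_to_seq: int) -> Dict[str, int]:
    """Rebuild state from the last vaulted snapshot plus the remote journal.

    Entries after the snapshot are replayed in sequence order up to and including
    recover_to_seq, which lets the operator stop just before a known-bad entry.
    """
    state = dict(snapshot)  # never mutate the vaulted copy itself
    for entry in sorted(journal, key=lambda e: e.seq):
        if snapshot_seq < entry.seq <= recover_to_seq:
            state[entry.key] = entry.value
    return state


vault = {"acct-1": 100, "acct-2": 50}      # snapshot taken at sequence 10
journal = [JournalEntry(11, "acct-1", 80),
           JournalEntry(12, "acct-2", 70),
           JournalEntry(13, "acct-1", 0)]  # suppose entry 13 is a corrupting write
print(recover(vault, 10, journal, recover_to_seq=12))  # {'acct-1': 80, 'acct-2': 70}
```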
Phases: Exercise & Testing, Maintain & Review, Awareness & Training
Maintain & Review Phase
feedback from testing/audit: broaden/deepen DC BC plan
Awareness & Training Phase
it's never enough: institutionalize BC practice, embed into DC culture
The proof of the “pudding” is in the eating
Exercise & Testing Phase
Give yourself an intentional DC disaster:
1. Re-test BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios
An example disaster: mass cyber-attack
A mass cyber-attack may involve
prior sniffing …
… social engineering & widespread attack
An attack may be blended:
denial-of-service and/or
widespread virus/worm infection and/or
penetration (data theft, data corruption) and/or
fraud/blackmail
Incident/Problem Management (recall ISO 20000, ITIL)
source: M. Benyoucef, University of Ottawa
An example DC disaster: BCM response to cyber-attack
This sequence applies to DoS/DDoS, infection & penetration cyber-attacks.
When prevention fails and an attack is under way, the BCM response is Interdict, Contain, Recover, Analyse:
1. Interdict: actions to block the attack, halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data, applications/services to normality
4. Analyse: investigation/audit, examination of evidence
Note: after the BCM response, seek prevention: (i) Lessons Learnt, (ii) improve DC process/systems resilience, (iii) heighten DC emergency preparedness
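The Interdict, Contain, Recover, Analyse sequence could be tracked in an incident log as a simple ordered state machine. This is an illustrative sketch only, and the class and identifiers are hypothetical. In a real response, evidence would be preserved from the first phase onward, even though formal analysis comes last.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair damage, bring systems, data and services to normality
    ANALYSE = 4    # investigation/audit: examine the evidence


class IncidentResponse:
    """Tracks one incident through the four phases, enforcing the slide's order."""

    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        self.completed: List[Phase] = []

    @property
    def closed(self) -> bool:
        return len(self.completed) == len(Phase)

    def complete(self, phase: Phase) -> None:
        if self.closed:
            raise ValueError(f"{self.incident_id}: response already closed")
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            raise ValueError(f"{self.incident_id}: expected {expected.name}, got {phase.name}")
        self.completed.append(phase)


ddos = IncidentResponse("INC-001")
ddos.complete(Phase.INTERDICT)
try:
    ddos.complete(Phase.RECOVER)  # skipping containment is rejected
except ValueError as err:
    print(err)  # INC-001: expected CONTAIN, got RECOVER
```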
Structure & Content of BS25999
A DC may go beyond organising a BCP project.
It may decide to formalise its BCM assurance process by embracing the BS25999 standard.
BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture
BS25999 Holistic
Some takeaways (1)
BCM thinking & action:
• decrease the number of holes, both management & technical,
• decrease the size of the holes, and
• keep them moving so they are never aligned
The Swiss cheese model of business resilience
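The three takeaways map onto a simple probability model, offered here as an illustration rather than anything the slides state. Fewer or smaller holes mean a lower failure probability per layer. Keeping the holes "moving" approximates independence between layers, so the chance that a hazard gets through is the product of the layer probabilities. If layers share a common cause (aligned holes), the true figure is higher than this product. `math.prod` requires Python 3.8 or later.

```python
from math import prod
from typing import Sequence


def breach_probability(hole_probs: Sequence[float]) -> float:
    """Chance that a hazard passes through every defensive layer.

    Each entry is the probability that one layer (e.g. patching, access control,
    backups) fails when tested. Multiplying them assumes the layers fail
    independently, i.e. the holes are "kept moving" and never line up.
    """
    for p in hole_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each layer's failure probability must be in [0, 1]")
    return prod(hole_probs)


three_layers = breach_probability([0.1, 0.2, 0.05])
four_layers = breach_probability([0.1, 0.2, 0.05, 0.5])  # one more (imperfect) layer
print(four_layers < three_layers)  # True
```

Even a weak extra layer (failing half the time) halves the breach probability, provided it fails independently of the others.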
Some takeaways (2)
BCM is a concern of:
CIOs, CTOs, Data Centre Managers,
Quality Managers, Infosec Managers,
IT Architects, DBAs,
IT Auditors,
CEOs, CFOs, because it's Governance
BCM addresses broad, non-specific risks: it is a “catch all”
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally
“You must be able to respond to your circumstances as they exist, not as you would like them to be” - Brian Billick
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 (IT Service Management): http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org
Questions and Discussion
Thank You
Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia “at” um.edu.mt
About myself
20+ years experience
Positions
Programmer Software Engineer IT Project Manager IT Quality Manager Principal Consultant IT Governance Manager
IT experience
bull Software House environment
bull IT Service Desk environment
bull Data Centre environment
University of Malta teaching
bull e-SCM ERP e-CRM
bull Service Quality Assurance
bull Sustainable Logistics amp Transportation
(to MBABAccty Dipl Logistics amp Transportation students)
Director ISACA Malta Chapter
robertmcachia at umedumt
Todayrsquos agenda (1)
BCM for industrial strength co-located commercial IT providers
eg Data Centres offering SaaS using Cloud Computing technologies
Today we discuss
bull e-services e-business data centres amp IT call centres
bull BCM for Data Centres
bull business e-services amp trust
bull BCM amp IT - economic context
bull threats vulnerabilities risk assets
bull Data Centre outages business impacts
bull BCM - in Data Centres for Data Centres
robertmcachia at umedumt
Todayrsquos agenda (2)
Today we discuss (continued)
bull a BCM initiative for DCs its output
bull BCP for DCs management choices
bull BCM ProjectProgramme
bull business impact analysis
bull risk assessment
bull BCM DC recovery - business amp technical
bull an example disaster - mass cyber-attack amp BCM response
bull structure and content of BS25999
bull some takeaways
robertmcachia at umedumt
e-Services e-Business - all from Data Centres amp IT Call Centres
robertmcachia at umedumt
BCM for Data Centres what
BCM
i is a management amp business capability to protect value-creating
activities of the DC
ii supports the Boardrsquos objective of staying in business
iii points i amp ii make BCM immediately a Governance issue
BCM - planned processes to
bull assess reduce amp manage vulnerability to risk
bull plan DC responses in case of adversity
bull exercise preparedness for continued IT delivery during disruption
bull restore normality and protect DC reputation and goodwill
ldquoThe time to repair the roof is when the sun is shiningrdquo (J F Kennedy)
robertmcachia at umedumt
Business e-Services amp trust
Is she
bull a Logistics Manager
bull a Banker
bull a PA to the CEO
bull a CEO
in
bull an Airport
bull a Bourse
bull a Telecoms provider
bull iGaming
bull e-payment
bull a Retail chain hellip
hellip doing B2B or B2C
hellip doing e-CRM or e-SCMe-services need trust
robertmcachia at umedumt
BCM and IT - economic context
Economy 20 amp Economy 30
digital networked global mobile locational 24x7x365
11 billion Internet users 27 billion mobile phone users
ubiquitous VoIPSkype e-mail amp calendaring software
e-Business e-Banking e-Payment e-Government
e-SCM e-CRM B2B B2C -across supply chains geographies time-zones
economies amp jurisdictions
BCM in DCs is mandated by economic logic
robertmcachia at umedumt
BCM and IT - what not to do
human error IT management human error technical
robertmcachia at umedumt
DC BCM terminology Threats Vulnerabilities Risk Assets
Threat the intent and capacity to cause loss or disruption
and create adverse consequences - eg to IT
services data Data Centers IT Call Centers etc
Vulnerability the susceptibility of a service provider service data
or infrastructure to damage impairment or exposure
by a threat
Risk a measure of the potential consequences of a
contingency against the likelihood of its occurring
threats + vulnerabilities = risk
Impact the consequenceeffect of a threat expressed in
terms of reduction of DC capability or loss of
business service data etc
Asset an IT service element eg software hardware IT
people data
robertmcachia at umedumt
Threats - some examples
Some data centre threats
human error (managementtechnical accidentalmalicious)
technical failure (mechanical electrical hardware software
etc)
software failure (even software patches themselves contain
bugs)
fire explosion smoke
floods (natural burst pipes)
toxic hazards (effluents emissions)
structural collapse
crime (vandalism organized crime white-collar crime etc)
mass cyber-attack digital terrorism cyber-war
Established threats may diminish and new threats emerge - alertness
Oct 2006 - Marsascala
robertmcachia at umedumt
Threats - general amp IT-specific
-
Explosion fire smoke emissions effluents
in or close to DC
malicious IT Threat
robertmcachia at umedumt
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical
eg in organizational design in business processes systems or software
Some IT vulnerabilities
Poor Data Centre site selection or Data Centre design
SPOFs (people hardware databases software)
Poor Data Centre procedures or compliance to
procedure (SFIA HR framework for IT ITIL)
Poor physical access control
Poor software patchinggood physical access
control
robertmcachia at umedumt
IT Vulnerabilities - some examples (2)
Poor software testing especially Web
testing (recall CMMi ISEB)
Poor Configuration ManagementVersion
Control (recall ISO 20000 ITIL)
Poor Infosec (passwords biometrics
encryption anti-virusmalware)
(recall ISO 27001 CobIT RiskIT)
Poor Capacity ManagementAvailability
Management (recall ISO 20000 ITIL)
Poor procurementsupplier management
(ISO 9000 ITIL BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish amp new ones emerge - alertness
wiring from hell
robertmcachia at umedumt
Risks - general amp IT-specific
loss of facilities (buildings roads leading to building etc)
loss of data (data integrity data availability recall ISO 27001)
loss of power
loss of water supply (cooling) the opposite - flooding
computer infection outbreaks - viruses worms
physical access compromised intrusion with intent
asset seizure
Changing threats changing vulnerabilities - shifting scenario alertness
Some Data Centres risks
supply chain disruption (software vendor collapse
hardware vendor collapse ISP collapse)
loss of Telecom connectivity (data voice)
loss of people (people can be SPOFs)Your IT vendor supply
chain
Burst pipe
DC flooding
robertmcachia at umedumt
IT Risks - some examples
Physical DC Risks Logical DC Risks
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure amp Content of BS25999
A DC may go beyond organising a BCP project
It may decide to formalise its BCM assurance process by embracing the
BS25999 standard
BS25999 holistic structure amp content
bull overview of business continuity management (BCM)
bull business continuity management policy
bull BCM programme management
bull understanding the organization
bull determining business continuity strategy
bull developing and implementing a BCM response
bull exercising maintaining and reviewing BCM arrangements
bull embedding BCM in the organizationrsquos culture
BS25999 Holistic
robertmcachia at umedumt
Some takeaways (1)
BCM thinking amp action
bull decrease the number of
holes both management amp
technical
bull decrease the size of the
holes and
bull keep them moving so they
are never aligned
The Swiss cheese model of business resilience
robertmcachia at umedumt
Some takeaways (2)
BCM is a concern of:
• CIOs, CTOs, Data Centre Managers
• Quality Managers, Infosec Managers
• IT Architects, DBAs
• IT Auditors
• CEOs, CFOs, because it's Governance
BCM addresses broad, non-specific risks: it is a "catch all".
BCM is probably the most cost-effective security control of all.
BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally.
"You must be able to respond to your circumstances as they exist - not as you would like them to be." - Brian Billick
Sources in and around BCM for Data Centres
• The Business Continuity Institute (BCI, UK): http://www.thebci.org
• Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
• A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
• Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
• BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
• RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
• SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
• SEI (USA): http://www.sei.cmu.edu
• Uptime Institute (USA): http://www.uptimeinstitute.org
Questions and Discussion
Thank You
Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt
Todayrsquos agenda (1)
BCM for industrial strength co-located commercial IT providers
eg Data Centres offering SaaS using Cloud Computing technologies
Today we discuss
bull e-services e-business data centres amp IT call centres
bull BCM for Data Centres
bull business e-services amp trust
bull BCM amp IT - economic context
bull threats vulnerabilities risk assets
bull Data Centre outages business impacts
bull BCM - in Data Centres for Data Centres
robertmcachia at umedumt
Todayrsquos agenda (2)
Today we discuss (continued)
bull a BCM initiative for DCs its output
bull BCP for DCs management choices
bull BCM ProjectProgramme
bull business impact analysis
bull risk assessment
bull BCM DC recovery - business amp technical
bull an example disaster - mass cyber-attack amp BCM response
bull structure and content of BS25999
bull some takeaways
robertmcachia at umedumt
e-Services e-Business - all from Data Centres amp IT Call Centres
robertmcachia at umedumt
BCM for Data Centres what
BCM
i is a management amp business capability to protect value-creating
activities of the DC
ii supports the Boardrsquos objective of staying in business
iii points i amp ii make BCM immediately a Governance issue
BCM - planned processes to
bull assess reduce amp manage vulnerability to risk
bull plan DC responses in case of adversity
bull exercise preparedness for continued IT delivery during disruption
bull restore normality and protect DC reputation and goodwill
ldquoThe time to repair the roof is when the sun is shiningrdquo (J F Kennedy)
robertmcachia at umedumt
Business e-Services amp trust
Is she
bull a Logistics Manager
bull a Banker
bull a PA to the CEO
bull a CEO
in
bull an Airport
bull a Bourse
bull a Telecoms provider
bull iGaming
bull e-payment
bull a Retail chain hellip
hellip doing B2B or B2C
hellip doing e-CRM or e-SCMe-services need trust
robertmcachia at umedumt
BCM and IT - economic context
Economy 20 amp Economy 30
digital networked global mobile locational 24x7x365
11 billion Internet users 27 billion mobile phone users
ubiquitous VoIPSkype e-mail amp calendaring software
e-Business e-Banking e-Payment e-Government
e-SCM e-CRM B2B B2C -across supply chains geographies time-zones
economies amp jurisdictions
BCM in DCs is mandated by economic logic
robertmcachia at umedumt
BCM and IT - what not to do
human error IT management human error technical
robertmcachia at umedumt
DC BCM terminology Threats Vulnerabilities Risk Assets
Threat the intent and capacity to cause loss or disruption
and create adverse consequences - eg to IT
services data Data Centers IT Call Centers etc
Vulnerability the susceptibility of a service provider service data
or infrastructure to damage impairment or exposure
by a threat
Risk a measure of the potential consequences of a
contingency against the likelihood of its occurring
threats + vulnerabilities = risk
Impact the consequenceeffect of a threat expressed in
terms of reduction of DC capability or loss of
business service data etc
Asset an IT service element eg software hardware IT
people data
robertmcachia at umedumt
Threats - some examples
Some data centre threats
human error (managementtechnical accidentalmalicious)
technical failure (mechanical electrical hardware software
etc)
software failure (even software patches themselves contain
bugs)
fire explosion smoke
floods (natural burst pipes)
toxic hazards (effluents emissions)
structural collapse
crime (vandalism organized crime white-collar crime etc)
mass cyber-attack digital terrorism cyber-war
Established threats may diminish and new threats emerge - alertness
Oct 2006 - Marsascala
robertmcachia at umedumt
Threats - general amp IT-specific
-
Explosion fire smoke emissions effluents
in or close to DC
malicious IT Threat
robertmcachia at umedumt
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical
eg in organizational design in business processes systems or software
Some IT vulnerabilities
Poor Data Centre site selection or Data Centre design
SPOFs (people hardware databases software)
Poor Data Centre procedures or compliance to
procedure (SFIA HR framework for IT ITIL)
Poor physical access control
Poor software patchinggood physical access
control
robertmcachia at umedumt
IT Vulnerabilities - some examples (2)
Poor software testing especially Web
testing (recall CMMi ISEB)
Poor Configuration ManagementVersion
Control (recall ISO 20000 ITIL)
Poor Infosec (passwords biometrics
encryption anti-virusmalware)
(recall ISO 27001 CobIT RiskIT)
Poor Capacity ManagementAvailability
Management (recall ISO 20000 ITIL)
Poor procurementsupplier management
(ISO 9000 ITIL BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish amp new ones emerge - alertness
wiring from hell
robertmcachia at umedumt
Risks - general amp IT-specific
loss of facilities (buildings roads leading to building etc)
loss of data (data integrity data availability recall ISO 27001)
loss of power
loss of water supply (cooling) the opposite - flooding
computer infection outbreaks - viruses worms
physical access compromised intrusion with intent
asset seizure
Changing threats changing vulnerabilities - shifting scenario alertness
Some Data Centres risks
supply chain disruption (software vendor collapse
hardware vendor collapse ISP collapse)
loss of Telecom connectivity (data voice)
loss of people (people can be SPOFs)Your IT vendor supply
chain
Burst pipe
DC flooding
robertmcachia at umedumt
IT Risks - some examples
Physical DC Risks Logical DC Risks
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure amp Content of BS25999
A DC may go beyond organising a BCP project
It may decide to formalise its BCM assurance process by embracing the
BS25999 standard
BS25999 holistic structure amp content
bull overview of business continuity management (BCM)
bull business continuity management policy
bull BCM programme management
bull understanding the organization
bull determining business continuity strategy
bull developing and implementing a BCM response
bull exercising maintaining and reviewing BCM arrangements
bull embedding BCM in the organizationrsquos culture
BS25999 Holistic
robertmcachia at umedumt
Some takeaways (1)
BCM thinking amp action
bull decrease the number of
holes both management amp
technical
bull decrease the size of the
holes and
bull keep them moving so they
are never aligned
The Swiss cheese model of business resilience
robertmcachia at umedumt
Some takeaways (2)
BCM is a concern of
CIOs CTOs Data Centre Managers
Quality Managers Infosec Managers
IT Architects DBAs
IT Auditors
CEOs CFOs because itrsquos Governance
BCM addresses broad non-specific risks it is a ldquocatch allrdquo
BCM is probably the most cost-effective security control of all
BCM start now start small scale fast broadendeepen BCM scope
incrementally
You must be able to respond to your circumstances as they exist - not as you
would like them to be - Brian Billick
robertmcachia at umedumt
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI UK) httpwwwthebciorg
Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK
Cabinet Office
A primer on ITIL ISO 20000 - IT Service Management
httpwwwisaca-
maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02
_2009pdf
Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm
BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines
s-Continuity
RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg
SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg
SEI (USA) httpwwwseicmuedu
Uptime Institute (USA) httpwwwuptimeinstituteorg
robertmcachia at umedumt
Questions and Discussion
robertmcachia atldquo umedumt
Thank You
Dr Robert M Cachia
httpwwwlinkedincominrobertmcachia
robertmcachia ldquoatrdquo umedumt
Todayrsquos agenda (2)
Today we discuss (continued)
bull a BCM initiative for DCs its output
bull BCP for DCs management choices
bull BCM ProjectProgramme
bull business impact analysis
bull risk assessment
bull BCM DC recovery - business amp technical
bull an example disaster - mass cyber-attack amp BCM response
bull structure and content of BS25999
bull some takeaways
robertmcachia at umedumt
e-Services e-Business - all from Data Centres amp IT Call Centres
robertmcachia at umedumt
BCM for Data Centres what
BCM
i is a management amp business capability to protect value-creating
activities of the DC
ii supports the Boardrsquos objective of staying in business
iii points i amp ii make BCM immediately a Governance issue
BCM - planned processes to
bull assess reduce amp manage vulnerability to risk
bull plan DC responses in case of adversity
bull exercise preparedness for continued IT delivery during disruption
bull restore normality and protect DC reputation and goodwill
ldquoThe time to repair the roof is when the sun is shiningrdquo (J F Kennedy)
robertmcachia at umedumt
Business e-Services amp trust
Is she
bull a Logistics Manager
bull a Banker
bull a PA to the CEO
bull a CEO
in
bull an Airport
bull a Bourse
bull a Telecoms provider
bull iGaming
bull e-payment
bull a Retail chain hellip
hellip doing B2B or B2C
hellip doing e-CRM or e-SCMe-services need trust
robertmcachia at umedumt
BCM and IT - economic context
Economy 20 amp Economy 30
digital networked global mobile locational 24x7x365
11 billion Internet users 27 billion mobile phone users
ubiquitous VoIPSkype e-mail amp calendaring software
e-Business e-Banking e-Payment e-Government
e-SCM e-CRM B2B B2C -across supply chains geographies time-zones
economies amp jurisdictions
BCM in DCs is mandated by economic logic
robertmcachia at umedumt
BCM and IT - what not to do
human error IT management human error technical
robertmcachia at umedumt
DC BCM terminology Threats Vulnerabilities Risk Assets
Threat the intent and capacity to cause loss or disruption
and create adverse consequences - eg to IT
services data Data Centers IT Call Centers etc
Vulnerability the susceptibility of a service provider service data
or infrastructure to damage impairment or exposure
by a threat
Risk a measure of the potential consequences of a
contingency against the likelihood of its occurring
threats + vulnerabilities = risk
Impact the consequenceeffect of a threat expressed in
terms of reduction of DC capability or loss of
business service data etc
Asset an IT service element eg software hardware IT
people data
robertmcachia at umedumt
Threats - some examples
Some data centre threats
human error (managementtechnical accidentalmalicious)
technical failure (mechanical electrical hardware software
etc)
software failure (even software patches themselves contain
bugs)
fire explosion smoke
floods (natural burst pipes)
toxic hazards (effluents emissions)
structural collapse
crime (vandalism organized crime white-collar crime etc)
mass cyber-attack digital terrorism cyber-war
Established threats may diminish and new threats emerge - alertness
Oct 2006 - Marsascala
robertmcachia at umedumt
Threats - general amp IT-specific
-
Explosion fire smoke emissions effluents
in or close to DC
malicious IT Threat
robertmcachia at umedumt
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical
eg in organizational design in business processes systems or software
Some IT vulnerabilities
Poor Data Centre site selection or Data Centre design
SPOFs (people hardware databases software)
Poor Data Centre procedures or compliance to
procedure (SFIA HR framework for IT ITIL)
Poor physical access control
Poor software patchinggood physical access
control
robertmcachia at umedumt
IT Vulnerabilities - some examples (2)
Poor software testing especially Web
testing (recall CMMi ISEB)
Poor Configuration ManagementVersion
Control (recall ISO 20000 ITIL)
Poor Infosec (passwords biometrics
encryption anti-virusmalware)
(recall ISO 27001 CobIT RiskIT)
Poor Capacity ManagementAvailability
Management (recall ISO 20000 ITIL)
Poor procurementsupplier management
(ISO 9000 ITIL BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish amp new ones emerge - alertness
wiring from hell
robertmcachia at umedumt
Risks - general amp IT-specific
loss of facilities (buildings roads leading to building etc)
loss of data (data integrity data availability recall ISO 27001)
loss of power
loss of water supply (cooling) the opposite - flooding
computer infection outbreaks - viruses worms
physical access compromised intrusion with intent
asset seizure
Changing threats changing vulnerabilities - shifting scenario alertness
Some Data Centres risks
supply chain disruption (software vendor collapse
hardware vendor collapse ISP collapse)
loss of Telecom connectivity (data voice)
loss of people (people can be SPOFs)Your IT vendor supply
chain
Burst pipe
DC flooding
robertmcachia at umedumt
IT Risks - some examples
Physical DC Risks Logical DC Risks
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure amp Content of BS25999
A DC may go beyond organising a BCP project
It may decide to formalise its BCM assurance process by embracing the
BS25999 standard
BS25999 holistic structure amp content
bull overview of business continuity management (BCM)
bull business continuity management policy
bull BCM programme management
bull understanding the organization
bull determining business continuity strategy
bull developing and implementing a BCM response
bull exercising maintaining and reviewing BCM arrangements
bull embedding BCM in the organizationrsquos culture
BS25999 Holistic
robertmcachia at umedumt
Some takeaways (1)
BCM thinking amp action
bull decrease the number of
holes both management amp
technical
bull decrease the size of the
holes and
bull keep them moving so they
are never aligned
The Swiss cheese model of business resilience
robertmcachia at umedumt
Some takeaways (2)
BCM is a concern of:
• CIOs, CTOs, Data Centre Managers
• Quality Managers, Infosec Managers
• IT Architects, DBAs
• IT Auditors
• CEOs, CFOs, because it’s Governance
BCM addresses broad, non-specific risks: it is a “catch-all”.
BCM is probably the most cost-effective security control of all.
BCM: start now, start small, scale fast; broaden/deepen BCM scope incrementally.
“You must be able to respond to your circumstances as they exist, not as you would like them to be.” (Brian Billick)
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK) and the UK Cabinet Office
A primer on ITIL & ISO 20000, IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness… continuitycentral: http://www.continuitycentral.com/uk.htm
BS 25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org
Questions and Discussion
Thank You
Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia “at” um.edu.mt
e-Services, e-Business: all from Data Centres & IT Call Centres
BCM for Data Centres: what?
BCM:
i. is a management & business capability to protect the value-creating activities of the DC;
ii. supports the Board’s objective of staying in business;
iii. points i & ii make BCM immediately a Governance issue.
BCM: planned processes to
• assess, reduce & manage vulnerability to risk
• plan DC responses in case of adversity
• exercise preparedness for continued IT delivery during disruption
• restore normality and protect DC reputation and goodwill
“The time to repair the roof is when the sun is shining.” (J F Kennedy)
Business e-Services & trust
Is she
• a Logistics Manager
• a Banker
• a PA to the CEO
• a CEO
in
• an Airport
• a Bourse
• a Telecoms provider
• iGaming
• e-payment
• a Retail chain…
…doing B2B or B2C…
…doing e-CRM or e-SCM?
e-services need trust.
BCM and IT: economic context
Economy 2.0 & Economy 3.0:
• digital, networked, global, mobile, locational, 24x7x365
• 1.1 billion Internet users, 2.7 billion mobile phone users
• ubiquitous VoIP/Skype, e-mail & calendaring software
• e-Business, e-Banking, e-Payment, e-Government
• e-SCM, e-CRM, B2B, B2C across supply chains, geographies, time-zones, economies & jurisdictions
BCM in DCs is mandated by economic logic.
BCM and IT: what not to do
Human error (IT management)
Human error (technical)
DC BCM terminology: Threats, Vulnerabilities, Risk, Assets
Threat: the intent and capacity to cause loss or disruption and create adverse consequences, e.g. to IT services, data, Data Centres, IT Call Centres, etc.
Vulnerability: the susceptibility of a service provider, service, data or infrastructure to damage, impairment or exposure by a threat.
Risk: a measure of the potential consequences of a contingency against the likelihood of its occurring.
threat + vulnerability = risk (a risk arises where a threat can exploit a vulnerability)
Impact: the consequence/effect of a threat, expressed in terms of reduction of DC capability or loss of business, service, data, etc.
Asset: an IT service element, e.g. software, hardware, IT people, data.
Threats: some examples
Some data centre threats:
• human error (management/technical, accidental/malicious)
• technical failure (mechanical, electrical, hardware, software, etc.)
• software failure (even software patches may themselves contain bugs)
• fire, explosion, smoke
• floods (natural, burst pipes)
• toxic hazards (effluents, emissions)
• structural collapse
• crime (vandalism, organized crime, white-collar crime, etc.)
• mass cyber-attack, digital terrorism, cyber-war
Established threats may diminish and new threats emerge: hence the need for alertness.
Oct 2006, Marsascala
Threats: general & IT-specific
Explosion, fire, smoke, emissions, effluents in or close to the DC
Malicious IT threat
IT Vulnerabilities: some examples (1)
DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software.
Some IT vulnerabilities:
• Poor Data Centre site selection or Data Centre design
• SPOFs, single points of failure (people, hardware, databases, software)
• Poor Data Centre procedures, or poor compliance with procedure (SFIA HR framework for IT, ITIL)
• Poor physical access control
• Poor software patching
Good physical access control
IT Vulnerabilities: some examples (2)
• Poor software testing, especially Web testing (recall CMMI, ISEB)
• Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
• Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
• Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
• Poor procurement/supplier management (ISO 9000, ITIL, BS 25999)
• Poor fire protection
Known vulnerabilities sometimes diminish & new ones emerge: hence the need for alertness.
“Wiring from hell”
Risks: general & IT-specific
• loss of facilities (buildings, roads leading to the building, etc.)
• loss of data (data integrity, data availability; recall ISO 27001)
• loss of power
• loss of water supply (cooling), or the opposite: flooding
• computer infection outbreaks: viruses, worms
• physical access compromised: intrusion with intent
• asset seizure
Changing threats, changing vulnerabilities: a shifting scenario, hence the need for alertness.
Some Data Centre risks:
• supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
• loss of Telecom connectivity (data, voice)
• loss of people (people can be SPOFs)
Your IT vendor supply chain
Burst pipe
DC flooding
IT Risks: some examples
Physical DC risks; logical DC risks
DC outages: business impacts are large & immediate
Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover
Recall: many DCs are in premises that were never designed to be DC-ready.
Q: How to protect value-creation capability?
A: BCM.
BCM in Data Centres, BCM for Data Centres
360° e-continuity & e-assurance:
• business functions of commercial IT providers (e.g. CEO’s office, Procurement, Marketing & Sales, HR, Facilities, Planning, QA)
• IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
• Data
• Employees, including IT people
• Physical: Facilities/Building, Networks, VoIP
• Computational and Storage: Servers, SAN/NAS
• Software stack: OS, DBMS, middleware, applications
• Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail
BCM for data centres: 360°
BCM DC initiative output: a BC Plan + organizational capability
A good BC plan is articulated by components and views. With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:
“We have (and are executing) plans by DC client/market segment…”
“We have (and are executing) plans by Application/by Service…”
“We have (and are executing) plans by DC Department…”
“We have (and are executing) plans by Building/by Floor…”
BC Plan, BC Planning: (i) What, (ii) How, (iii) Who, (iv) When
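One way to read "articulated by components and views" is a single list of plan items, each recording what, how, who and when, and tagged with client, service, department and building, so the same plan can be sliced whichever way the CEO asks. A minimal Python sketch; the field names and example values are hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanItem:
    """One BC plan component (what/how/who/when), tagged with its views."""
    what: str
    how: str
    who: str
    when: str
    client: str
    service: str
    department: str
    building: str


VIEWS = ("client", "service", "department", "building")


def view(items: list[PlanItem], dimension: str) -> dict[str, list[PlanItem]]:
    """Group plan items by one view, e.g. by client or by building."""
    if dimension not in VIEWS:
        raise ValueError(f"unknown view {dimension!r}; expected one of {VIEWS}")
    grouped = defaultdict(list)
    for item in items:
        grouped[getattr(item, dimension)].append(item)
    return dict(grouped)
```

For example, `view(items, "building")` answers the "by Building" question and `view(items, "client")` the "by DC client" one, both from the same underlying plan.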
DC Business Continuity Planning: management choices
• Which DC clients/services/applications get restored first, and who will wait longer? (the recovery sequence for each client)
• Which services/applications get restored completely, and which clients will initially get only limited service?
MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for the DC and for each client).
RTO (recovery time objective): the period of time within which assets/services must be recovered after a disruption (for the DC and for each client); each RTO must be shorter than the corresponding MTPD.
RTO affects the recovery option: a shorter RTO means a more difficult and more expensive recovery (for the DC and for each client).
RPO (recovery point objective): the point in time to which assets/services must be restored after a disruption (for the DC and for each client).
RPO affects the volume of data that may need to be restored.
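The relationships between MTPD, RTO and RPO can be checked mechanically. The Python sketch below assumes a backup (or replication) interval per client, which the slide does not mention: the most recent recovery point can be at most one interval old, so the interval must not exceed the RPO, and the data written during one interval is the worst-case volume that may need to be restored or re-entered. Class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ContinuityObjectives:
    """Hypothetical per-client targets; field names are illustrative."""
    client: str
    mtpd: timedelta             # maximum tolerable period of disruption
    rto: timedelta              # recovery time objective
    rpo: timedelta              # recovery point objective
    backup_interval: timedelta  # assumed: time between off-site copies


def check(obj: ContinuityObjectives) -> list[str]:
    """Return the inconsistencies found (an empty list means none)."""
    problems = []
    if obj.rto >= obj.mtpd:
        problems.append(f"{obj.client}: RTO must be shorter than MTPD")
    if obj.backup_interval > obj.rpo:
        problems.append(f"{obj.client}: backups too infrequent to meet RPO")
    return problems


def data_at_risk_gb(obj: ContinuityObjectives, write_rate_gb_per_hour: float) -> float:
    """Worst-case data written since the last copy, i.e. what may be lost or need re-entry."""
    hours = obj.backup_interval.total_seconds() / 3600
    return hours * write_rate_gb_per_hour
```

For a client with a 15-minute backup interval writing 2 GB per hour, `data_at_risk_gb` returns 0.5.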
BCM initiative requires a dedicated Project/Programme
A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:
• BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness and BC project outcomes/deliverables)
• Business Impact Analysis Phase
• Risk Assessment Phase
• BC Planning & Design Phase
• Exercise & Testing Phase
• Maintenance & Review Phase
• Awareness & Training Phase
Business Impact Analysis (BIA) Phase
BIA achieves an understanding of the DC: its users, activities, services/revenue streams, applications, data and liabilities.
• BIA takes an internal focus: it identifies critical DC value-creating activities and assets.
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery).
• BIA does not estimate the probability of types of DC incidents; it quantifies the consequences.
• BIA techniques: questionnaires + interviews.
BIA answers the questions: which DC activities & assets create most value, and, of all potential DC losses, which impaired activities & assets would be the greatest loss?
Cascading impacts; multiple DC failure
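Because the BIA quantifies consequences rather than probabilities, its output can be reduced to a ranking of activities by estimated loss for a given outage duration. A minimal Python sketch, assuming the questionnaire and interview results have already been turned into hourly lost revenue, a one-off recovery cost and a 1 to 5 reputational score (all hypothetical fields):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityImpact:
    """BIA estimates for one DC activity, gathered by questionnaire/interview."""
    name: str
    lost_revenue_per_hour: float
    recovery_cost: float
    reputation: int  # assumed scale: 1 (negligible) to 5 (severe)


def financial_loss(a: ActivityImpact, outage_hours: float) -> float:
    """Consequence only: no probability is applied, matching the BIA's scope."""
    return a.lost_revenue_per_hour * outage_hours + a.recovery_cost


def rank_by_impact(activities: list[ActivityImpact], outage_hours: float) -> list[ActivityImpact]:
    """Greatest loss first; the reputational score breaks ties."""
    return sorted(
        activities,
        key=lambda a: (financial_loss(a, outage_hours), a.reputation),
        reverse=True,
    )
```

The ranking depends on outage length: over a short outage an activity with a high fixed recovery cost can outrank one with a higher hourly revenue loss, while over a longer outage the order can reverse.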
The Risk Matrix: likelihood vs impact
Major risks to the DC (top-right square: high likelihood, high impact): unacceptable. Eliminate or postpone, transfer, monitor & mitigate + management ownership + DO BCM (Governance).
Contingency risks to the DC (top-left square: low likelihood, high impact): very rare events that would extinguish the business. Complacency + overconfidence can distract management from the consequences. DO BCM (Governance).
High-incidence, low-impact risks & minor risks: the order of the day, i.e. operational matters.
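The matrix can be expressed as a small classification function, taking likelihood from the Risk Assessment and impact from the BIA. The Python sketch assumes both are normalised to 0..1 and uses an arbitrary 0.5 cut-off; many organisations use finer grids (e.g. 5 x 5), and the labels simply echo the slide.

```python
def classify_risk(likelihood: float, impact: float, threshold: float = 0.5) -> str:
    """Place a risk in the 2x2 matrix (likelihood across, impact up).

    Inputs are assumed normalised to 0..1; the 0.5 cut-off is illustrative.
    """
    if not (0.0 <= likelihood <= 1.0 and 0.0 <= impact <= 1.0):
        raise ValueError("likelihood and impact must be between 0 and 1")
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        return "major"  # top-right: eliminate/postpone, transfer, mitigate; do BCM
    if high_impact:
        return "contingency"  # top-left: rare but business-ending; do BCM
    if high_likelihood:
        return "high incidence, low impact"  # operational matter
    return "minor"  # operational matter
```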
Risk Assessment Phase
RA considers the what/why/where/who that can, and would, cause the DC the disastrous losses identified in the BIA. RA seeks to understand threats to valued service/DC assets.
• RA takes a primarily external focus (world, political, economic, IT sector): it looks at sources and causes.
• RA identifies threat scenarios to DC value-creating activities & assets.
• RA estimates the likelihood (probability) of occurrence of the DC losses & asset impairment identified in the BIA (estimates or guesstimates).
Note: DC activities & assets need to be located in the slots of the DC’s Risk Matrix, and plans made to protect value-creating capability.
BC Planning & Design Phase: business recovery
DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery planning sequence:
1. Set the MTPD objective for the DC/for each client.
2. Assign RTOs for individual client services/applications/data on the basis of priority/SLA.
3. Evaluate alternative management & technical recovery options.
4. Perform a cost/benefit analysis for each option (re-plan/iterate).
5. Decide: CEO/CFO (i) approval, (ii) budget, (iii) tasking individuals.
“Failure to prepare is preparing to fail.” (John Wooden)
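Steps 1 and 2 of this sequence can be sketched as follows: given the MTPD, each service receives an RTO from its SLA priority tier, every RTO is checked against the MTPD, and the result is the restoration order. The tier table and function name below are hypothetical; real values come from each client's SLA.

```python
from datetime import timedelta

# Hypothetical tier-to-RTO table; real values come from each client's SLA.
TIER_RTO = {
    1: timedelta(hours=4),
    2: timedelta(hours=24),
    3: timedelta(hours=72),
}


def plan_restoration(services: dict[str, int], mtpd: timedelta) -> list[tuple[str, timedelta]]:
    """Assign each service an RTO from its priority tier, checked against the MTPD.

    Returns (service, RTO) pairs in restoration order, shortest RTO first.
    """
    plan = []
    for name, tier in services.items():
        rto = TIER_RTO[tier]  # an unknown tier raises KeyError
        if rto >= mtpd:
            raise ValueError(f"{name}: tier {tier} RTO of {rto} does not fit inside MTPD {mtpd}")
        plan.append((name, rto))
    # Sort by RTO; ties fall back to the service name for a stable, readable order.
    return sorted(plan, key=lambda item: (item[1], item[0]))
```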
BC Planning & Design Phase: business recovery (2)
Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision: on-shore, near-shore or offshore fallback/redundancy options
A typical recovery sequence:
1. Initial emergency response
2. Resume mission-critical Applications/Services
3. Resume non-critical Applications/Services
4. Restoration to primary site: full services, verify stability
Ongoing communication: DC employees, clients, regulators
BC Planning & Design: technical recovery
Note: business decision: on-shore, near-shore or offshore.
Note: Mirror, Hot, Warm or Cold sites may be (i) own, (ii) shared-space, (iii) reciprocal agreements or (iv) outsourced: a business issue.
Note: high-end BC for DCs: active-active… hot failover…
Note: entry-point BC for DCs: electronic vaulting, remote journaling, transaction logging.
DC recovery options, by increasing MTPD/RTO & decreasing cost:
• Mirror: full redundancy; short-distance/widely-separated Data Centres (latency considerations)
• Hot: fully equipped fallback Data Centre
• Warm: fallback Data Centre missing key components
• Cold: empty fallback Data Centre
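The trade-off between the four site types can be sketched as "the cheapest option whose typical recovery time still meets the RTO". The recovery times and relative costs below are made-up placeholders chosen only to respect the ordering on the slide (recovery time rising as cost falls); they are not benchmarks, and the names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    typical_recovery: timedelta  # illustrative, not a benchmark
    relative_cost: int           # illustrative ranking: higher means dearer


# Ordered as on the slide: recovery time rises as cost falls. Figures are made up.
OPTIONS = (
    RecoveryOption("mirror", timedelta(minutes=5), 100),
    RecoveryOption("hot", timedelta(hours=4), 60),
    RecoveryOption("warm", timedelta(days=2), 25),
    RecoveryOption("cold", timedelta(weeks=2), 5),
)


def cheapest_option(rto: timedelta, options=OPTIONS) -> RecoveryOption:
    """Pick the least expensive option whose typical recovery fits within the RTO."""
    feasible = [o for o in options if o.typical_recovery <= rto]
    if not feasible:
        raise ValueError(f"no option recovers within {rto}")
    return min(feasible, key=lambda o: o.relative_cost)
```

Under these placeholder figures, a one-day RTO selects the hot site and a one-hour RTO forces mirroring, which is the slide's point that a shorter RTO means a more expensive recovery option.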
Phases: Exercise & Testing, Maintain & Review, Awareness & Training
Maintain & Review Phase
Feedback from testing/audit: broaden/deepen the DC BC plan.
Awareness & Training Phase
It’s never enough: institutionalize BC practice and embed it into DC culture.
The proof of the “pudding” is in the eating.
Exercise & Testing Phase
Give yourself an intentional DC disaster:
1. Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal).
2. Test alternative risk (threat + vulnerability) scenarios.
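Testing alternative (threat + vulnerability) scenarios suggests enumerating every pairing and working through those not yet exercised. A minimal Python sketch; the function names and the example threats and vulnerabilities are hypothetical:

```python
from itertools import product


def untested_scenarios(threats: list[str], vulnerabilities: list[str],
                       exercised: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Every (threat, vulnerability) pairing not yet exercised, in a stable order."""
    return [pair for pair in product(threats, vulnerabilities) if pair not in exercised]


def next_exercise(threats: list[str], vulnerabilities: list[str],
                  exercised: set[tuple[str, str]], batch_size: int = 2) -> list[tuple[str, str]]:
    """Pick the next few untested scenarios for the coming exercise."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return untested_scenarios(threats, vulnerabilities, exercised)[:batch_size]
```

For instance, with threats `["fire", "DDoS"]` and vulnerabilities `["SPOF: single ISP", "poor patching"]`, and one pairing already rehearsed, three scenarios remain, and `next_exercise` schedules the first two of them.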
An example disaster: mass cyber-attack
A mass cyber-attack may involve prior sniffing… social engineering & a widespread attack.
An attack may be blended: denial-of-service and/or widespread virus/worm infection and/or penetration (data theft, data corruption) and/or fraud/blackmail.
Incident/Problem Management (recall ISO 20000, ITIL)
Source: M Benyoucef, University of Ottawa
Business e-Services amp trust
Is she
bull a Logistics Manager
bull a Banker
bull a PA to the CEO
bull a CEO
in
bull an Airport
bull a Bourse
bull a Telecoms provider
bull iGaming
bull e-payment
bull a Retail chain hellip
hellip doing B2B or B2C
hellip doing e-CRM or e-SCMe-services need trust
robertmcachia at umedumt
BCM and IT - economic context
Economy 20 amp Economy 30
digital networked global mobile locational 24x7x365
11 billion Internet users 27 billion mobile phone users
ubiquitous VoIPSkype e-mail amp calendaring software
e-Business e-Banking e-Payment e-Government
e-SCM e-CRM B2B B2C -across supply chains geographies time-zones
economies amp jurisdictions
BCM in DCs is mandated by economic logic
robertmcachia at umedumt
BCM and IT - what not to do
human error IT management human error technical
robertmcachia at umedumt
DC BCM terminology: Threats, Vulnerabilities, Risk, Assets
Threat: the intent and capacity to cause loss or disruption and create adverse consequences, e.g. to IT services, data, Data Centres, IT Call Centres etc.
Vulnerability: the susceptibility of a service provider, service, data or infrastructure to damage, impairment or exposure by a threat
Risk: a measure of the potential consequences of a contingency weighed against the likelihood of its occurring; in short, threats + vulnerabilities = risk
Impact: the consequence/effect of a threat, expressed in terms of reduction of DC capability or loss of business, service, data etc.
Asset: an IT service element, e.g. software, hardware, IT people, data
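The Risk definition above weighs consequences against likelihood. A common way to put that into practice is a simple exposure score. The sketch below is a hypothetical illustration, not the author's method. It assumes likelihood and impact are each rated on a 1-5 ordinal scale and multiplies them. It also scores a threat as zero when there is no matching vulnerability for it to exploit, which loosely mirrors "threats + vulnerabilities = risk". The `Risk` class, the `exposure` method and the example ratings are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Risk:
    """Hypothetical risk record: a threat acting on a vulnerability of an asset."""
    asset: str
    threat: str
    vulnerability: Optional[str]  # None: no known vulnerability for this threat
    likelihood: int  # assumed ordinal scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed ordinal scale: 1 (minor) to 5 (business-ending)

    def exposure(self) -> int:
        """Likelihood x impact, or 0 when the threat has nothing to exploit."""
        for value in (self.likelihood, self.impact):
            if not 1 <= value <= 5:
                raise ValueError("ratings must be on the assumed 1-5 scale")
        if self.vulnerability is None:
            return 0
        return self.likelihood * self.impact


risks = [
    Risk("core database", "burst pipe", "servers sited below water pipes", 2, 5),
    Risk("web front end", "DDoS attack", "no upstream traffic filtering", 4, 4),
    Risk("backup tapes", "fire", None, 2, 4),
]

# Review the highest exposures first.
for r in sorted(risks, key=Risk.exposure, reverse=True):
    print(f"{r.exposure():>2}  {r.asset}: {r.threat}")
```

A multiplicative score is only an ordering aid. Ordinal ratings are not true measurements, so the scores say which risks to look at first, not how much loss to expect.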
Threats - some examples
Some data centre threats:
human error (management/technical, accidental/malicious)
technical failure (mechanical, electrical, hardware, software etc.)
software failure (even software patches themselves may contain bugs)
fire, explosion, smoke
floods (natural, burst pipes)
toxic hazards (effluents, emissions)
structural collapse
crime (vandalism, organized crime, white-collar crime etc.)
mass cyber-attack, digital terrorism, cyber-war
Established threats may diminish and new threats emerge: alertness
Oct 2006 - Marsascala
Threats - general & IT-specific
Explosion, fire, smoke, emissions, effluents in or close to DC
malicious IT threat
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software.
Some IT vulnerabilities:
Poor Data Centre site selection or Data Centre design
SPOFs (single points of failure: people, hardware, databases, software)
Poor Data Centre procedures or poor compliance with procedure (SFIA HR framework for IT, ITIL)
Poor physical access control
Poor software patching
good physical access control
IT Vulnerabilities - some examples (2)
Poor software testing, especially Web testing (recall CMMI, ISEB)
Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish & new ones emerge: alertness
wiring from hell
Risks - general & IT-specific
loss of facilities (buildings, roads leading to building etc.)
loss of data (data integrity, data availability; recall ISO 27001)
loss of power
loss of water supply (cooling); the opposite: flooding
computer infection outbreaks: viruses, worms
physical access compromised: intrusion with intent
asset seizure
Changing threats, changing vulnerabilities: a shifting scenario, alertness
Some Data Centre risks:
supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
loss of Telecom connectivity (data, voice)
loss of people (people can be SPOFs)
Your IT vendor supply chain
Burst pipe
DC flooding
IT Risks - some examples
Physical DC risks; Logical DC risks
DC outages: business impacts are large & immediate
Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover
Recall: many DCs are in premises that were never designed to be DC-ready.
Q: How to protect value-creation capability?
A: BCM
BCM in Data Centres, BCM for Data Centres
360° e-continuity & e-assurance:
business functions of commercial IT providers (e.g. CEO’s office, Procurement, Marketing & Sales, HR, Facilities, Planning, QA)
IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
Data
Employees, including IT people
Physical Facilities/Building, Networks, VoIP
Computational and Storage: Servers, SAN/NAS
Software stack: OS, DBMS, middleware, applications
Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail
BCM for data centres: 360°
BCM DC initiative output: a BC Plan + organizational capability
A good BC plan is articulated by components and views.
With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:
“We have (or are executing) plans by DC client/market segment …”
“We have (or are executing) plans by Application/by Service …”
“We have (or are executing) plans by DC Department …”
“We have (or are executing) plans by Building/by Floor …”
BC Plan, BC Planning: (i) What (ii) How (iii) Who (iv) When
DC Business Continuity Planning: management choices
Which DC clients/services/applications get restored first, and who will wait longer? … the recovery sequence for each client
Which services/applications get restored completely, and which clients will initially only get limited service?
MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for the DC, for each client)
RTO (recovery time objective): the period of time within which assets/services must be recovered after disruption (for the DC, for each client); an RTO cannot exceed the corresponding MTPD
RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for the DC, for each client)
RPO (recovery point objective): the point in time to which assets/services must be restored after disruption, i.e. how much recent data loss is tolerable (for the DC, for each client)
RPO affects the volume of data that may need to be restored
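The hypothetical check below makes the three objectives concrete. It tests one client's targets against a proposed recovery arrangement under three assumptions:

1. The RTO must not exceed the MTPD.
2. Worst-case data loss equals the interval between backup or replication points, and that interval must not exceed the RPO.
3. The expected recovery time of the chosen option must fit inside the RTO.

The names `Objectives` and `check_plan` and all figures are illustrative only.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class Objectives:
    mtpd: timedelta  # maximum tolerable period of disruption
    rto: timedelta   # recovery time objective
    rpo: timedelta   # recovery point objective (tolerable data-loss window)


def check_plan(obj: Objectives, backup_interval: timedelta,
               expected_recovery: timedelta) -> List[str]:
    """Return a list of problems; an empty list means the plan meets the targets."""
    problems = []
    if obj.rto > obj.mtpd:
        problems.append("RTO exceeds MTPD")
    # Worst case: disruption strikes just before the next backup/replication point.
    if backup_interval > obj.rpo:
        problems.append("backup interval exceeds RPO (too much data could be lost)")
    if expected_recovery > obj.rto:
        problems.append("expected recovery time exceeds RTO")
    return problems


client = Objectives(mtpd=timedelta(hours=48), rto=timedelta(hours=8),
                    rpo=timedelta(hours=1))
# Nightly backups cannot meet a one-hour RPO, even if recovery itself is fast enough.
print(check_plan(client, backup_interval=timedelta(hours=24),
                 expected_recovery=timedelta(hours=6)))
```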
BCM initiative requires a dedicated Project/Programme
A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:
BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities, approach, awareness and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities.
• BIA takes an internal focus: it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents; it quantifies the consequences
• BIA techniques: questionnaires + interviews
BIA answers the questions: which DC activities & assets create most value … and, of all potential DC losses, which impaired activities & assets would be the greatest loss?
cascading impacts
multiple DC failure
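BIA quantifies consequences, not probabilities. One hypothetical way to compare activities is to estimate each impact category for a given outage length and then rank activities by the total. The categories follow the slide: financial, lost revenue, cost of recovery and reputational. The per-hour figures, the activity names and the assumption that losses grow linearly with outage time are invented for illustration. Real BIA figures come from the questionnaires and interviews mentioned above, and impacts often grow non-linearly as an outage drags on.

```python
from typing import Dict, List, Tuple

# Hypothetical BIA results: estimated loss per hour of outage, per impact category
# (arbitrary cost units).
bia: Dict[str, Dict[str, float]] = {
    "e-payment gateway": {"lost_revenue": 20_000, "financial_penalties": 5_000,
                          "recovery_cost": 1_000, "reputational": 8_000},
    "client e-mail hosting": {"lost_revenue": 2_000, "financial_penalties": 500,
                              "recovery_cost": 300, "reputational": 1_500},
    "internal intranet": {"lost_revenue": 0, "financial_penalties": 0,
                          "recovery_cost": 100, "reputational": 50},
}


def rank_by_impact(data: Dict[str, Dict[str, float]],
                   outage_hours: float) -> List[Tuple[str, float]]:
    """Rank activities by estimated total loss for an outage of the given length.

    Assumes losses accrue linearly with time, which is a simplification.
    """
    totals = [(name, sum(costs.values()) * outage_hours)
              for name, costs in data.items()]
    return sorted(totals, key=lambda item: item[1], reverse=True)


for name, loss in rank_by_impact(bia, outage_hours=4):
    print(f"{name}: {loss:,.0f}")
```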
The Risk Matrix: likelihood vs impact
Major Risks to DC (top-right square): unacceptable. Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM: Governance
Contingency Risks to DC (top-left square): very rare events that could extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM: Governance
High-incidence, low-impact risks & minor risks: the order of the day, operational matters
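The four regions of the matrix can be written as a simple classification. This sketch rests on three assumptions:

1. Ratings run from 1 to 5, and a rating of 4 or above counts as "high". The threshold is arbitrary.
2. Impact is on the vertical axis and likelihood on the horizontal. Top-right is therefore high/high, and top-left is rare but severe.
3. The returned strings paraphrase the treatment the slide attaches to each region.

The function name and threshold are hypothetical.

```python
def classify(likelihood: int, impact: int, high: int = 4) -> str:
    """Place a risk in the matrix: likelihood on the x-axis, impact on the y-axis.

    Ratings are assumed to be 1-5; `high` is an arbitrary cut-off for "high".
    """
    high_likelihood = likelihood >= high
    high_impact = impact >= high
    if high_impact and high_likelihood:   # top-right square
        return "major: eliminate/postpone, transfer, monitor & mitigate; do BCM"
    if high_impact:                       # top-left square
        return "contingency: rare but could extinguish the business; do BCM"
    if high_likelihood:                   # bottom-right square
        return "high incidence, low impact: day-to-day operational matter"
    return "minor: day-to-day operational matter"  # bottom-left square


for name, likelihood, impact in [("DDoS on web front end", 4, 4),
                                 ("regional flood", 1, 5),
                                 ("failed disk in RAID set", 5, 2)]:
    print(f"{name}: {classify(likelihood, impact)}")
```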
Risk Assessment Phase
RA considers the what/why/where/who that could cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued services/DC assets.
• RA takes a primarily external focus (world, political, economic, IT sector): it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates the likelihood (probability) of occurrence of the DC losses & asset impairment identified in BIA (estimates or guesstimates)
Note: DC activities & assets need to be located in the slots of the DC's Risk Matrix, and plans made to protect value-creating capability.
BC Planning & Design Phase - business recovery
DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery sequence:
1. Set the MTPD objective for the DC/for each client
2. Assign RTOs for individual client services/applications/data on the basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval (ii) budget (iii) tasking individuals
“Failure to prepare is preparing to fail.” - John Wooden
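Steps 1 and 2 of the sequence can be sketched as follows: set an MTPD per client, then derive RTOs for that client's services from a priority tier. The tier-to-fraction mapping is a hypothetical policy. Under it, tier 1 must recover within 25% of the MTPD, tier 2 within 50% and tier 3 within the full MTPD. The mapping is there only to show that every RTO is bounded by the MTPD. In practice the values would come from SLAs and the BIA.

```python
from datetime import timedelta
from typing import Dict

# Hypothetical policy: fraction of the client's MTPD allowed for each priority tier.
# All fractions are <= 1.0, so no derived RTO can exceed the MTPD.
TIER_FRACTION = {1: 0.25, 2: 0.5, 3: 1.0}


def assign_rtos(mtpd: timedelta, service_tiers: Dict[str, int]) -> Dict[str, timedelta]:
    """Derive an RTO per service from its priority tier."""
    return {service: mtpd * TIER_FRACTION[tier]
            for service, tier in service_tiers.items()}


rtos = assign_rtos(timedelta(hours=24),
                   {"online payments": 1, "reporting portal": 2, "archive search": 3})
for service, rto in sorted(rtos.items(), key=lambda kv: kv[1]):
    print(f"{service}: RTO {rto}")
```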
BC Planning & Design Phase - business recovery (2)
Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision: On-shore, Near-shore or Offshore fallback/redundancy options
A typical recovery sequence:
1. Initial emergency response
2. Resume mission-critical Applications/Services
3. Resume non-critical Applications/Services
4. Restoration to primary site: full services, verify stability
Ongoing communication: DC employees, clients, regulators
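The four-step recovery sequence can be modelled as an ordered list of steps, with services grouped by criticality. This is a hypothetical sketch. The step wording follows the slide, but the boolean `mission_critical` flag and the service names are assumptions. Listing mission-critical services before non-critical ones reflects steps 2 and 3. Ongoing communication runs throughout the recovery and is not modelled here.

```python
from typing import List, Tuple

Service = Tuple[str, bool]  # (service name, mission_critical flag)


def recovery_plan(services: List[Service]) -> List[str]:
    """Return the recovery steps in order, following the four-step sequence."""
    steps = ["1. Initial emergency response"]
    steps += [f"2. Resume mission-critical service: {name}"
              for name, critical in services if critical]
    steps += [f"3. Resume non-critical service: {name}"
              for name, critical in services if not critical]
    steps.append("4. Restore to primary site, full services, verify stability")
    return steps


plan = recovery_plan([("staff intranet", False), ("e-banking front end", True),
                      ("payment switch", True)])
print("\n".join(plan))
```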
BC Planning & Design - technical recovery
Note: business decision: On-shore, Near-shore, Offshore
Note: Mirror/Hot/Warm/Cold sites may be (i) own (ii) shared-space (iii) reciprocal agreements (iv) outsourced: a business issue
Note: high-end BC for DCs: active-active … hot failover …
Note: entry-point BC for DCs: electronic vaulting, remote journaling, transaction logging
Mirror: full redundancy; short-distance/widely-separated Data Centres; latency considerations
Hot: fully equipped fallback Data Centre
Warm: fallback Data Centre missing key components
Cold: empty fallback Data Centre
DC recovery options, by increasing MTPD/RTO & decreasing cost: Mirror, Hot, Warm, Cold
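The trade-off above, in which faster recovery costs more, suggests a simple selection rule: choose the cheapest option whose expected recovery time still meets the RTO. The recovery times and relative costs below are placeholder assumptions, not industry figures. Only the ordering comes from the slide: Mirror is fastest and Cold slowest, with cost falling in the same order.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class SiteOption(NamedTuple):
    name: str
    expected_recovery: timedelta  # assumed placeholder value
    relative_cost: int            # assumed placeholder value (higher = dearer)


OPTIONS = [
    SiteOption("Mirror", timedelta(minutes=5), 100),
    SiteOption("Hot", timedelta(hours=4), 60),
    SiteOption("Warm", timedelta(days=2), 25),
    SiteOption("Cold", timedelta(weeks=2), 10),
]


def cheapest_meeting(rto: timedelta) -> Optional[SiteOption]:
    """Cheapest option whose expected recovery time fits within the RTO, if any."""
    feasible = [o for o in OPTIONS if o.expected_recovery <= rto]
    return min(feasible, key=lambda o: o.relative_cost, default=None)


print(cheapest_meeting(timedelta(hours=8)))
```

Returning `None` when nothing qualifies makes the gap visible. That outcome feeds back into step 4 of the planning sequence (re-plan/iterate).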
Phases: Exercise & Testing, Maintain & Review, Awareness & Training
Maintain & Review Phase: feedback from testing/audit; broaden/deepen the DC BC plan
Awareness & Training Phase: it’s never enough; institutionalize BC practice, embed it into DC culture
“The proof of the pudding is in the eating”
Exercise & Testing Phase: give yourself an intentional DC disaster
1. Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios
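Point 2, testing alternative threat + vulnerability scenarios, can be supported by enumerating combinations so that exercises do not keep rehearsing the same one. The minimal sketch below uses `itertools.product`. The threat and vulnerability lists are drawn loosely from the earlier examples. The seeded random choice and the sample size of three are assumptions, made so that the selection is repeatable.

```python
import itertools
import random

threats = ["burst pipe", "loss of power", "DDoS attack", "loss of key staff"]
vulnerabilities = ["single point of failure", "untested restore procedure",
                   "poor physical access control"]

# Every threat + vulnerability pairing is a candidate exercise scenario.
scenarios = list(itertools.product(threats, vulnerabilities))

# Pick a different subset for each exercise cycle; seeded here only for repeatability.
rng = random.Random(2009)
for threat, vulnerability in rng.sample(scenarios, k=3):
    print(f"Exercise: {threat} exploiting {vulnerability}")
```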
An example disaster: mass cyber-attack
A mass cyber-attack may involve prior sniffing … social engineering & a widespread attack.
An attack may be blended: denial-of-service and/or widespread virus/worm infection and/or penetration (data theft, data corruption) and/or fraud/blackmail.
Incident/Problem Management (recall ISO 20000, ITIL)
Source: M Benyoucef, University of Ottawa
An example DC disaster: BCM response to cyber-attack
This sequence applies to DoS/DDoS, infection & penetration cyber-attacks.
Note: After the BCM response, seek prevention: (i) Lessons Learnt (ii) improve DC process/systems resilience (iii) heighten DC emergency preparedness
1. Interdict: actions to block the attack, halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data, applications/services to normality
4. Analyse: investigation/audit, examination of evidence
When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse
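Incident-handling tooling can enforce the Interdict, Contain, Recover, Analyse order with a small ordered state machine. This stops recovery, for example, from being declared before containment. This is a hypothetical sketch. The class names, and the rule that phases can be neither skipped nor reversed, are assumptions and do not come from the slide.

```python
from enum import IntEnum


class Phase(IntEnum):
    DETECTED = 0
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair damage, return systems/data/services to normality
    ANALYSE = 4    # investigation/audit, examination of evidence


class CyberIncident:
    """Hypothetical incident record that only moves forward one phase at a time."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.phase = Phase.DETECTED

    def advance(self, to: Phase) -> None:
        """Move to the next phase only; skipping or going backwards is refused."""
        if to != self.phase + 1:
            raise ValueError(f"cannot go from {self.phase.name} to {to.name}")
        self.phase = to


incident = CyberIncident("DDoS on client portal")
incident.advance(Phase.INTERDICT)
incident.advance(Phase.CONTAIN)
print(incident.name, incident.phase.name)
```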
Structure & Content of BS25999
A DC may go beyond organising a BCP project: it may decide to formalise its BCM assurance process by embracing the BS25999 standard.
BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization’s culture
Some takeaways (1)
BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned
The Swiss cheese model of business resilience
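The Swiss cheese takeaways have a simple probabilistic reading, offered here as an interpretation rather than the author's own model. Suppose each defensive layer (e.g. access control, patching, backups, the BC plan) independently lets an incident through with some probability, its "hole". The chance that an incident passes every layer is then the product of those probabilities. Fewer or smaller holes shrink the factors. "Keeping the holes moving" roughly corresponds to keeping layer failures independent: when the holes are aligned, one weakness gets through several layers at once, and the product rule no longer holds. The per-layer probabilities below are invented for illustration.

```python
import math
from typing import Sequence


def breach_probability(hole_probs: Sequence[float]) -> float:
    """P(incident passes every layer), assuming the layers fail independently."""
    for p in hole_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    return math.prod(hole_probs)


layers = [0.1, 0.2, 0.05]  # hypothetical per-layer failure probabilities
print(breach_probability(layers))          # three layers
print(breach_probability(layers + [0.5]))  # even a leaky fourth layer lowers the total
```

`math.prod` requires Python 3.8 or later.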
Some takeaways (2)
BCM is a concern of:
CIOs, CTOs, Data Centre Managers
Quality Managers, Infosec Managers
IT Architects, DBAs
IT Auditors
CEOs, CFOs, because it’s Governance
BCM addresses broad, non-specific risks: it is a “catch all”
BCM is probably the most cost-effective security control of all
BCM start now start small scale fast broadendeepen BCM scope
incrementally
You must be able to respond to your circumstances as they exist - not as you
would like them to be - Brian Billick
robertmcachia at umedumt
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI UK) httpwwwthebciorg
Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK
Cabinet Office
A primer on ITIL ISO 20000 - IT Service Management
httpwwwisaca-
maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02
_2009pdf
Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm
BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines
s-Continuity
RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg
SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg
SEI (USA) httpwwwseicmuedu
Uptime Institute (USA) httpwwwuptimeinstituteorg
robertmcachia at umedumt
Questions and Discussion
robertmcachia atldquo umedumt
Thank You
Dr Robert M Cachia
httpwwwlinkedincominrobertmcachia
robertmcachia ldquoatrdquo umedumt
BCM and IT - economic context
Economy 20 amp Economy 30
digital networked global mobile locational 24x7x365
11 billion Internet users 27 billion mobile phone users
ubiquitous VoIPSkype e-mail amp calendaring software
e-Business e-Banking e-Payment e-Government
e-SCM e-CRM B2B B2C -across supply chains geographies time-zones
economies amp jurisdictions
BCM in DCs is mandated by economic logic
robertmcachia at umedumt
BCM and IT - what not to do
human error IT management human error technical
robertmcachia at umedumt
DC BCM terminology Threats Vulnerabilities Risk Assets
Threat the intent and capacity to cause loss or disruption
and create adverse consequences - eg to IT
services data Data Centers IT Call Centers etc
Vulnerability the susceptibility of a service provider service data
or infrastructure to damage impairment or exposure
by a threat
Risk a measure of the potential consequences of a
contingency against the likelihood of its occurring
threats + vulnerabilities = risk
Impact the consequenceeffect of a threat expressed in
terms of reduction of DC capability or loss of
business service data etc
Asset an IT service element eg software hardware IT
people data
robertmcachia at umedumt
Threats - some examples
Some data centre threats
human error (managementtechnical accidentalmalicious)
technical failure (mechanical electrical hardware software
etc)
software failure (even software patches themselves contain
bugs)
fire explosion smoke
floods (natural burst pipes)
toxic hazards (effluents emissions)
structural collapse
crime (vandalism organized crime white-collar crime etc)
mass cyber-attack digital terrorism cyber-war
Established threats may diminish and new threats emerge - alertness
Oct 2006 - Marsascala
robertmcachia at umedumt
Threats - general amp IT-specific
-
Explosion fire smoke emissions effluents
in or close to DC
malicious IT Threat
robertmcachia at umedumt
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical
eg in organizational design in business processes systems or software
Some IT vulnerabilities
Poor Data Centre site selection or Data Centre design
SPOFs (people hardware databases software)
Poor Data Centre procedures or compliance to
procedure (SFIA HR framework for IT ITIL)
Poor physical access control
Poor software patchinggood physical access
control
robertmcachia at umedumt
IT Vulnerabilities - some examples (2)
Poor software testing especially Web
testing (recall CMMi ISEB)
Poor Configuration ManagementVersion
Control (recall ISO 20000 ITIL)
Poor Infosec (passwords biometrics
encryption anti-virusmalware)
(recall ISO 27001 CobIT RiskIT)
Poor Capacity ManagementAvailability
Management (recall ISO 20000 ITIL)
Poor procurementsupplier management
(ISO 9000 ITIL BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish amp new ones emerge - alertness
wiring from hell
robertmcachia at umedumt
Risks - general amp IT-specific
loss of facilities (buildings roads leading to building etc)
loss of data (data integrity data availability recall ISO 27001)
loss of power
loss of water supply (cooling) the opposite - flooding
computer infection outbreaks - viruses worms
physical access compromised intrusion with intent
asset seizure
Changing threats changing vulnerabilities - shifting scenario alertness
Some Data Centres risks
supply chain disruption (software vendor collapse
hardware vendor collapse ISP collapse)
loss of Telecom connectivity (data voice)
loss of people (people can be SPOFs)Your IT vendor supply
chain
Burst pipe
DC flooding
robertmcachia at umedumt
IT Risks - some examples
Physical DC Risks Logical DC Risks
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure amp Content of BS25999
A DC may go beyond organising a BCP project
It may decide to formalise its BCM assurance process by embracing the
BS25999 standard
BS25999 holistic structure amp content
bull overview of business continuity management (BCM)
bull business continuity management policy
bull BCM programme management
bull understanding the organization
bull determining business continuity strategy
bull developing and implementing a BCM response
bull exercising maintaining and reviewing BCM arrangements
bull embedding BCM in the organizationrsquos culture
BS25999 Holistic
robertmcachia at umedumt
Some takeaways (1)
BCM thinking amp action
bull decrease the number of
holes both management amp
technical
bull decrease the size of the
holes and
bull keep them moving so they
are never aligned
The Swiss cheese model of business resilience
robertmcachia at umedumt
Some takeaways (2)
BCM is a concern of
CIOs CTOs Data Centre Managers
Quality Managers Infosec Managers
IT Architects DBAs
IT Auditors
CEOs CFOs because itrsquos Governance
BCM addresses broad non-specific risks it is a ldquocatch allrdquo
BCM is probably the most cost-effective security control of all
BCM start now start small scale fast broadendeepen BCM scope
incrementally
You must be able to respond to your circumstances as they exist - not as you
would like them to be - Brian Billick
robertmcachia at umedumt
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI UK) httpwwwthebciorg
Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK
Cabinet Office
A primer on ITIL ISO 20000 - IT Service Management
httpwwwisaca-
maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02
_2009pdf
Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm
BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines
s-Continuity
RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg
SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg
SEI (USA) httpwwwseicmuedu
Uptime Institute (USA) httpwwwuptimeinstituteorg
robertmcachia at umedumt
Questions and Discussion
robertmcachia atldquo umedumt
Thank You
Dr Robert M Cachia
httpwwwlinkedincominrobertmcachia
robertmcachia ldquoatrdquo umedumt
BCM and IT - what not to do
human error IT management human error technical
robertmcachia at umedumt
DC BCM terminology Threats Vulnerabilities Risk Assets
Threat the intent and capacity to cause loss or disruption
and create adverse consequences - eg to IT
services data Data Centers IT Call Centers etc
Vulnerability the susceptibility of a service provider service data
or infrastructure to damage impairment or exposure
by a threat
Risk a measure of the potential consequences of a
contingency against the likelihood of its occurring
threats + vulnerabilities = risk
Impact the consequenceeffect of a threat expressed in
terms of reduction of DC capability or loss of
business service data etc
Asset an IT service element eg software hardware IT
people data
robertmcachia at umedumt
Threats - some examples
Some data centre threats
human error (managementtechnical accidentalmalicious)
technical failure (mechanical electrical hardware software
etc)
software failure (even software patches themselves contain
bugs)
fire explosion smoke
floods (natural burst pipes)
toxic hazards (effluents emissions)
structural collapse
crime (vandalism organized crime white-collar crime etc)
mass cyber-attack digital terrorism cyber-war
Established threats may diminish and new threats emerge - alertness
Oct 2006 - Marsascala
robertmcachia at umedumt
Threats - general amp IT-specific
-
Explosion fire smoke emissions effluents
in or close to DC
malicious IT Threat
robertmcachia at umedumt
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical
eg in organizational design in business processes systems or software
Some IT vulnerabilities
Poor Data Centre site selection or Data Centre design
SPOFs (people hardware databases software)
Poor Data Centre procedures or compliance to
procedure (SFIA HR framework for IT ITIL)
Poor physical access control
Poor software patchinggood physical access
control
robertmcachia at umedumt
IT Vulnerabilities - some examples (2)
Poor software testing especially Web
testing (recall CMMi ISEB)
Poor Configuration ManagementVersion
Control (recall ISO 20000 ITIL)
Poor Infosec (passwords biometrics
encryption anti-virusmalware)
(recall ISO 27001 CobIT RiskIT)
Poor Capacity ManagementAvailability
Management (recall ISO 20000 ITIL)
Poor procurementsupplier management
(ISO 9000 ITIL BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish amp new ones emerge - alertness
wiring from hell
robertmcachia at umedumt
Risks - general amp IT-specific
loss of facilities (buildings roads leading to building etc)
loss of data (data integrity data availability recall ISO 27001)
loss of power
loss of water supply (cooling) the opposite - flooding
computer infection outbreaks - viruses worms
physical access compromised intrusion with intent
asset seizure
Changing threats changing vulnerabilities - shifting scenario alertness
Some Data Centres risks
supply chain disruption (software vendor collapse
hardware vendor collapse ISP collapse)
loss of Telecom connectivity (data voice)
loss of people (people can be SPOFs)Your IT vendor supply
chain
Burst pipe
DC flooding
robertmcachia at umedumt
IT Risks - some examples
Physical DC Risks Logical DC Risks
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output: a BC Plan + organizational capability
A good BC plan is articulated by components and views.
With a good BC plan the CIO/CTO/Data Centre Manager can tell the CEO:
"We have (or are executing) plans by DC client/market segment …"
"We have (or are executing) plans by application/by service …"
"We have (or are executing) plans by DC department …"
"We have (or are executing) plans by building/by floor …"
BC Plan, BC Planning: (i) What (ii) How (iii) Who (iv) When
DC Business Continuity Planning: management choices
Which DC clients/services/applications get restored first, and who will wait longer? … the recovery sequence for each client
Which services/applications get restored completely, and which clients will initially get only limited service?
MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for the DC, for each client)
RTO (recovery time objective): the period of time within which assets/services must be recovered after disruption (for the DC, for each client)
RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for the DC, for each client)
RPO (recovery point objective): the point in time to which the data behind assets/services must be restored after disruption (for the DC, for each client)
RPO affects the volume of data that may be lost, and so may have to be recreated or re-entered after restoration
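The three objectives constrain one another, and a plan can be checked mechanically. Below is a minimal sketch, assuming each client service is described by hypothetical fields (`mtpd_h`, `rto_h`, `rpo_h`, `backup_interval_h`, `tested_recovery_h`, all in hours). It encodes three commonly used rules: an RTO should not exceed the MTPD, an RTO is only credible if an exercise has met it, and with periodic backups roughly one full backup interval of data can be lost, so that interval should not exceed the RPO.

```python
from dataclasses import dataclass


@dataclass
class ServiceObjectives:
    """Continuity objectives for one client service; all times in hours (field names are hypothetical)."""
    name: str
    mtpd_h: float             # maximum tolerable period of disruption
    rto_h: float              # recovery time objective
    rpo_h: float              # recovery point objective: tolerable data-loss window
    backup_interval_h: float  # time between backups or replication points
    tested_recovery_h: float  # recovery time achieved in the last exercise


def check_objectives(svc: ServiceObjectives) -> list[str]:
    """Return the problems found; an empty list means the objectives hang together."""
    problems = []
    # The RTO must not exceed the MTPD.
    if svc.rto_h > svc.mtpd_h:
        problems.append(f"{svc.name}: RTO {svc.rto_h}h exceeds MTPD {svc.mtpd_h}h")
    # An RTO is only credible if an exercise has actually met it.
    if svc.tested_recovery_h > svc.rto_h:
        problems.append(f"{svc.name}: tested recovery {svc.tested_recovery_h}h misses RTO {svc.rto_h}h")
    # With periodic backups, up to roughly one full interval of data can be lost.
    if svc.backup_interval_h > svc.rpo_h:
        problems.append(f"{svc.name}: backup interval {svc.backup_interval_h}h exceeds RPO {svc.rpo_h}h")
    return problems


if __name__ == "__main__":
    billing = ServiceObjectives("billing", mtpd_h=24, rto_h=4, rpo_h=1,
                                backup_interval_h=6, tested_recovery_h=3)
    for problem in check_objectives(billing):
        print(problem)
```

Running the same check per client and per service gives the DC a simple list of gaps to close before the plan is signed off.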
BCM initiative requires a dedicated Project/Programme
A BC initiative may adopt PRINCE2 or PMBOK project-management methods. The typical BCM phases:
BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness, and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase
Business Impact Analysis (BIA) Phase
BIA achieves an understanding of the DC: its users, activities, services/revenue streams, applications, data and liabilities.
• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews
BIA answers the questions: which DC activities & assets create most value … and, of all potential DC losses, which impaired activities & assets would be the greatest loss?
(image: cascading impacts - multiple DC failure)
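A minimal sketch of the quantification step, assuming the questionnaire and interview results have already been reduced to hypothetical per-hour loss rates, a one-off recovery cost and a 1-5 reputational score for each activity. In line with the slide, it computes consequences only and deliberately uses no probabilities; the activity names and figures are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """One DC value-creating activity with BIA loss estimates (hypothetical units)."""
    name: str
    lost_revenue_per_h: float
    sla_penalty_per_h: float
    recovery_cost: float      # one-off cost to restore the activity
    reputational_score: int   # 1 (minor) .. 5 (severe), from interviews


def financial_impact(act: Activity, outage_h: float) -> float:
    """Direct financial loss for an outage of the given length (no likelihood involved)."""
    return (act.lost_revenue_per_h + act.sla_penalty_per_h) * outage_h + act.recovery_cost


def rank_by_impact(activities: list[Activity], outage_h: float) -> list[tuple[str, float, int]]:
    """Order activities by financial impact, then reputational score, largest first."""
    rows = [(a.name, financial_impact(a, outage_h), a.reputational_score) for a in activities]
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


if __name__ == "__main__":
    acts = [
        Activity("hosted e-commerce", 5000, 2000, 10000, 5),
        Activity("internal e-mail", 200, 0, 1000, 2),
        Activity("client backups", 1000, 1500, 5000, 4),
    ]
    for name, loss, rep in rank_by_impact(acts, outage_h=8):
        print(f"{name}: {loss:,.0f} (reputation {rep}/5)")
```

Repeating the ranking for several outage lengths shows which activities become critical first as a disruption drags on.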
The Risk Matrix - likelihood vs impact
Major risks to the DC (top-right square): unacceptable. Eliminate or postpone, transfer, monitor & mitigate + management ownership + DO BCM - Governance
Contingency risks to the DC (top-left square): very rare events that would extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM - Governance
High-incidence, low-impact risks & minor risks: the order of the day - operational matters
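The quadrant logic above can be sketched as a small classifier. This assumes likelihood and impact are each scored on a hypothetical 1-5 scale, with 3 or more counting as "high". The threshold, the scale and the split of the lower half into "operational" and "minor" are illustrative choices. The quadrant meanings follow the slide, with likelihood on the horizontal axis and impact on the vertical.

```python
def classify_risk(likelihood: int, impact: int, threshold: int = 3) -> str:
    """Place a risk in a quadrant of the likelihood-vs-impact matrix (scores 1..5, hypothetical)."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("scores must be between 1 and 5")
    high_l = likelihood >= threshold
    high_i = impact >= threshold
    if high_l and high_i:
        return "major"        # top-right: unacceptable; eliminate/transfer/mitigate + DO BCM
    if high_i:
        return "contingency"  # top-left: rare but business-ending; DO BCM
    if high_l:
        return "operational"  # bottom-right: high incidence, low impact
    return "minor"            # bottom-left: also handled as an operational matter


if __name__ == "__main__":
    risks = {"DDoS on hosted services": (4, 4), "total site loss by fire": (1, 5),
             "single disk failure": (5, 1), "printer jam": (2, 1)}
    for name, (l, i) in risks.items():
        print(f"{name}: {classify_risk(l, i)}")
```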
Risk Assessment Phase
RA considers the what/why/where/who that can and would cause the DC the disastrous losses identified in the BIA. RA seeks to understand threats to valued service/DC assets.
• RA takes a primarily external focus, a world/political/economic/IT-sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates the likelihood (probability) of occurrence of the DC losses & asset impairment identified in the BIA (estimates or guesstimates)
Note: DC activities & assets need to be located in the slots of the Risk Matrix for the DC, with plans to protect value-creating capability.
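One common way to combine the two phases, not named on the slides, is the annualised loss expectancy (ALE). In this technique the BIA supplies the loss from a single occurrence, the RA supplies an estimated yearly rate of occurrence, and their product gives an expected yearly loss for prioritisation. The sketch below uses hypothetical scenarios and figures. As the slide warns, the rates are estimates or guesstimates. ALE also averages away rare, business-ending contingency risks, which is why those still call for BCM rather than a low ranking.

```python
from dataclasses import dataclass


@dataclass
class AssessedRisk:
    """A threat scenario combining a BIA impact with an RA likelihood estimate (hypothetical values)."""
    scenario: str
    single_loss: float  # BIA: loss if the scenario happens once
    annual_rate: float  # RA: expected occurrences per year (estimate/guesstimate)

    @property
    def ale(self) -> float:
        """Annualised loss expectancy = single loss x annual rate of occurrence."""
        return self.single_loss * self.annual_rate


def prioritise(risks: list[AssessedRisk]) -> list[AssessedRisk]:
    """Highest expected yearly loss first."""
    return sorted(risks, key=lambda r: r.ale, reverse=True)


if __name__ == "__main__":
    risks = [
        AssessedRisk("burst pipe floods server room", 250_000, 0.1),
        AssessedRisk("virus outbreak", 40_000, 2.0),
        AssessedRisk("ISP collapse", 500_000, 0.02),
    ]
    for r in prioritise(risks):
        print(f"{r.scenario}: ALE {r.ale:,.0f}")
```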
BC Planning & Design Phase - business recovery
DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery-planning sequence:
1. Set the MTPD objective for the DC/for each client
2. Assign RTOs for individual client services/applications/data on the basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform a cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval (ii) budget (iii) tasking of individuals
"Failure to prepare is preparing to fail." - John Wooden
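Steps 3 and 4 can be sketched as follows, assuming each candidate option is summarised by a hypothetical yearly cost and the recovery time it can achieve. An option is kept only if it meets the RTO assigned in step 2. Among those, the benefit is taken as the outage loss avoided compared with a do-nothing baseline, weighted by an estimated yearly incident rate. The figures and this benefit formula are illustrative assumptions, not a prescribed method. A `None` result corresponds to the slide's "re-plan/iterate".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryOption:
    """A candidate recovery option (hypothetical figures)."""
    name: str
    annual_cost: float
    recovery_h: float  # recovery time the option can achieve


def net_benefit(opt: RecoveryOption, loss_per_h: float, baseline_h: float,
                incidents_per_year: float) -> float:
    """Yearly outage loss avoided compared with the baseline, minus the option's yearly cost."""
    avoided_h = max(baseline_h - opt.recovery_h, 0.0)
    return avoided_h * loss_per_h * incidents_per_year - opt.annual_cost


def choose_option(options: list[RecoveryOption], rto_h: float, loss_per_h: float,
                  baseline_h: float, incidents_per_year: float) -> Optional[RecoveryOption]:
    """Best net benefit among options that meet the RTO; None means re-plan/iterate."""
    feasible = [o for o in options if o.recovery_h <= rto_h]
    if not feasible:
        return None
    return max(feasible, key=lambda o: net_benefit(o, loss_per_h, baseline_h, incidents_per_year))


if __name__ == "__main__":
    options = [
        RecoveryOption("reciprocal agreement", annual_cost=10_000, recovery_h=72),
        RecoveryOption("warm site", annual_cost=60_000, recovery_h=24),
        RecoveryOption("hot site", annual_cost=200_000, recovery_h=2),
    ]
    best = choose_option(options, rto_h=24, loss_per_h=10_000, baseline_h=168, incidents_per_year=0.5)
    print(best.name if best else "no option meets the RTO: re-plan")
```

With these made-up numbers the warm site wins. Raising the incident rate tips the balance towards the hot site, which is the kind of trade-off the CEO/CFO decision in step 5 has to weigh.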
BC Planning & Design Phase - business recovery (2)
Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision - on-shore, near-shore, offshore fallback/redundancy options
A typical recovery sequence:
1. Initial emergency response
2. Resume mission-critical applications/services
3. Resume non-critical applications/services
4. Restore to the primary site: full services, verify stability
Ongoing communication - DC employees, clients, regulators
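The ordering in steps 2 and 3 can be derived from the BIA and RTO work, assuming each service is tagged with a hypothetical `mission_critical` flag and its RTO. In the minimal sketch below, mission-critical services come first, and within each group the tightest RTO is restored first. The tie-break by name is an arbitrary assumption to keep the order stable.

```python
from typing import NamedTuple


class Service(NamedTuple):
    name: str
    mission_critical: bool
    rto_h: float


def restoration_order(services: list[Service]) -> list[str]:
    """Mission-critical services first; within each group, shortest RTO first, then by name."""
    ordered = sorted(services, key=lambda s: (not s.mission_critical, s.rto_h, s.name))
    return [s.name for s in ordered]


if __name__ == "__main__":
    services = [
        Service("intranet", False, 72),
        Service("payments gateway", True, 2),
        Service("client e-mail", False, 24),
        Service("ERP hosting", True, 8),
    ]
    print(restoration_order(services))
```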
BC Planning & Design - technical recovery
Note: business decision: on-shore, near-shore, offshore
Note: Mirror, Hot, Warm, Cold: (i) own (ii) shared-space (iii) reciprocal agreements (iv) outsourced - a business issue
Note: high-end BC for DCs: active-active … hot failover …
Note: entry-point BC for DCs: electronic vaulting, remote journaling, transaction logging
DC recovery options, by increasing MTPD/RTO & decreasing cost:
• Mirror: full redundancy (mirroring); short-distance/widely-separated Data Centres; latency considerations
• Hot: fully equipped fallback Data Centre
• Warm: fallback Data Centre missing key components
• Cold: empty fallback Data Centre
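Because the options are ordered by increasing recovery time and decreasing cost, a first-cut technical choice is simply the cheapest tier whose recovery time still fits the RTO. The sketch below assumes illustrative recovery times and relative costs for each tier. Real figures depend on the provider and on whether the site is owned, shared, reciprocal or outsourced.

```python
from typing import Optional

# Hypothetical figures: (tier, typical recovery time in hours, relative yearly cost),
# listed from most to least expensive, as on the slide.
TIERS = [
    ("mirror", 0.1, 100),
    ("hot", 4, 60),
    ("warm", 48, 25),
    ("cold", 336, 5),
]


def cheapest_tier(rto_h: float) -> Optional[str]:
    """Cheapest tier whose recovery time fits within the RTO, or None if none does."""
    feasible = [(cost, name) for name, recovery_h, cost in TIERS if recovery_h <= rto_h]
    return min(feasible)[1] if feasible else None


if __name__ == "__main__":
    for rto in (0.05, 1, 24, 72, 1000):
        print(rto, "->", cheapest_tier(rto))
```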
Phases: Exercise & Testing, Maintain & Review, Awareness & Training
Exercise & Testing Phase
Give yourself an intentional DC disaster:
1. Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios
Maintain & Review Phase
Feedback from testing/audit: broaden/deepen the DC BC plan
Awareness & Training Phase
It's never enough: institutionalize BC practice, embed it into DC culture
The proof of the "pudding" is in the eating.
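One way to test alternative risk (threat + vulnerability) scenarios systematically is to enumerate threat/vulnerability pairs and rotate them through the exercise types listed above. The minimal sketch below takes hypothetical short lists drawn from the threat and vulnerability examples in this deck and uses a simple round-robin schedule. In practice, pairs that cannot meet would be filtered out first.

```python
from itertools import cycle, product

EXERCISE_TYPES = ["walk-through", "checklist", "role-playing/simulation", "full interruption & rehearsal"]


def exercise_plan(threats: list[str], vulnerabilities: list[str]) -> list[tuple[str, str, str]]:
    """Pair every threat with every vulnerability and assign exercise types round-robin."""
    kinds = cycle(EXERCISE_TYPES)
    return [(t, v, next(kinds)) for t, v in product(threats, vulnerabilities)]


if __name__ == "__main__":
    plan = exercise_plan(["fire", "mass cyber-attack"], ["SPOF: single DBA", "poor software patching"])
    for threat, vuln, kind in plan:
        print(f"{kind}: {threat} exploiting {vuln}")
```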
An example disaster: mass cyber-attack
A mass cyber-attack may involve prior sniffing … social engineering & a widespread attack.
An attack may be blended: denial-of-service and/or widespread virus/worm infection and/or penetration (data theft, data corruption) and/or fraud/blackmail.
Incident/Problem Management (recall ISO 20000, ITIL)
Source: M. Benyoucef, University of Ottawa
An example DC disaster: BCM response to cyber-attack
When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse.
1. Interdict: actions to block the attack and halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data and applications/services back to normality
4. Analyse: investigation/audit - examination of evidence
This sequence applies to DoS/DDoS, infection & penetration cyber-attacks.
Note: after the BCM response, seek prevention: (i) lessons learnt (ii) improve DC process/systems resilience (iii) heighten DC emergency preparedness
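The four-step response can be modelled as an ordered state machine, so that an incident record cannot, for example, declare recovery before containment. The minimal sketch below uses hypothetical class and method names. The phase order is the one given on the slide. Each call records what was done, which later feeds the Analyse step and the lessons learnt.

```python
from enum import IntEnum


class Phase(IntEnum):
    DETECTED = 0
    INTERDICT = 1
    CONTAIN = 2
    RECOVER = 3
    ANALYSE = 4


class IncidentResponse:
    """Tracks a cyber-attack incident through Interdict -> Contain -> Recover -> Analyse."""

    def __init__(self, incident_id: str):
        self.incident_id = incident_id
        self.phase = Phase.DETECTED
        self.log: list[str] = []

    def advance(self, note: str) -> Phase:
        """Move to the next phase, recording what was done; phases cannot be skipped."""
        if self.phase is Phase.ANALYSE:
            raise RuntimeError("response complete; feed lessons learnt into prevention")
        self.phase = Phase(self.phase + 1)
        self.log.append(f"{self.phase.name}: {note}")
        return self.phase


if __name__ == "__main__":
    ir = IncidentResponse("INC-001")
    ir.advance("block attacking address ranges at the network edge")
    ir.advance("isolate infected segment, reset compromised credentials")
    ir.advance("rebuild servers and restore data from clean backups")
    ir.advance("preserve logs and examine evidence")
    print("\n".join(ir.log))
```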
Structure & Content of BS25999
A DC may go beyond organising a BCP project. It may decide to formalise its BCM assurance process by embracing the BS25999 standard.
BS25999 holistic structure & content:
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture
Some takeaways (1)
The Swiss cheese model of business resilience - BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned
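The Swiss cheese model can be made concrete with a little arithmetic. If each defensive layer (management or technical) independently lets an incident through with probability p_i, the chance that an incident passes every layer is the product of the p_i. Fewer and smaller holes lower each p_i. Keeping the holes moving is what keeps the layers close to independent. If the holes line up (a common cause affecting every layer), the breach probability can rise to the smallest single p_i. The sketch below uses hypothetical layer probabilities.

```python
from math import prod


def _validate(hole_probs: list[float]) -> None:
    if any(not 0.0 <= p <= 1.0 for p in hole_probs):
        raise ValueError("probabilities must lie in [0, 1]")


def breach_probability(hole_probs: list[float]) -> float:
    """Chance an incident passes every layer, assuming the layers fail independently."""
    _validate(hole_probs)
    return float(prod(hole_probs))


def aligned_breach_probability(hole_probs: list[float]) -> float:
    """Upper bound when the holes line up (perfectly correlated failures): only the strongest layer helps."""
    _validate(hole_probs)
    return min(hole_probs) if hole_probs else 1.0


if __name__ == "__main__":
    layers = [0.1, 0.2, 0.05]  # e.g. patching, access control, tested BC plan (hypothetical)
    print("independent layers:", breach_probability(layers))
    print("aligned holes:     ", aligned_breach_probability(layers))
```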
Some takeaways (2)
BCM is a concern of CIOs, CTOs, Data Centre Managers, Quality Managers, Infosec Managers, IT Architects, DBAs, IT Auditors, and CEOs and CFOs, because it's Governance.
BCM addresses broad, non-specific risks: it is a "catch all".
BCM is probably the most cost-effective security control of all.
BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally.
"You must be able to respond to your circumstances as they exist - not as you would like them to be." - Brian Billick
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org
Questions and Discussion
Thank You
Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt
DC BCM terminology: Threats, Vulnerabilities, Risk, Assets
Threat: the intent and capacity to cause loss or disruption and create adverse consequences, e.g. to IT services, data, Data Centres, IT Call Centres, etc.
Vulnerability: the susceptibility of a service provider, service, data or infrastructure to damage, impairment or exposure by a threat
Risk: a measure of the potential consequences of a contingency against the likelihood of its occurring
threats + vulnerabilities = risk
Impact: the consequence/effect of a threat, expressed in terms of reduction of DC capability or loss of business, service, data, etc.
Asset: an IT service element, e.g. software, hardware, IT people, data
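The definitions above can be read as a small data model: a risk arises only where a threat meets a vulnerability of an asset, and it is measured by weighing impact against likelihood. The minimal sketch below works under those assumptions. The class names, the 1-5 scales and the use of likelihood x impact as the "measure" are illustrative choices, not definitions from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str  # an IT service element: software, hardware, IT people, data


@dataclass(frozen=True)
class Vulnerability:
    name: str
    asset: Asset  # the asset whose susceptibility this describes


@dataclass(frozen=True)
class Threat:
    name: str
    exploits: frozenset[str]  # names of vulnerabilities this threat can act through


@dataclass(frozen=True)
class Risk:
    threat: Threat
    vulnerability: Vulnerability
    likelihood: int  # 1..5, hypothetical scale
    impact: int      # 1..5: reduction of DC capability or loss of business/service/data

    @property
    def score(self) -> int:
        """Weigh consequences against likelihood (one simple choice of 'measure')."""
        return self.likelihood * self.impact


def identify_risks(threats: list[Threat], vulns: list[Vulnerability],
                   likelihood: dict[tuple[str, str], int],
                   impact: dict[str, int]) -> list[Risk]:
    """threats + vulnerabilities = risk: one Risk per threat/vulnerability pair that can meet."""
    return [
        Risk(t, v, likelihood[(t.name, v.name)], impact[v.asset.name])
        for t in threats
        for v in vulns
        if v.name in t.exploits
    ]


if __name__ == "__main__":
    db_spof = Vulnerability("single database server", Asset("customer data"))
    no_patch = Vulnerability("poor software patching", Asset("web servers"))
    fire = Threat("fire", frozenset({"single database server"}))
    worm = Threat("worm outbreak", frozenset({"poor software patching"}))
    risks = identify_risks(
        [fire, worm], [db_spof, no_patch],
        likelihood={("fire", "single database server"): 1,
                    ("worm outbreak", "poor software patching"): 4},
        impact={"customer data": 5, "web servers": 3},
    )
    for r in risks:
        print(f"{r.threat.name} x {r.vulnerability.name}: score {r.score}")
```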
Threats - some examples
Some data centre threats:
human error (management/technical, accidental/malicious)
technical failure (mechanical, electrical, hardware, software, etc.)
software failure (even software patches themselves can contain bugs)
fire, explosion, smoke
floods (natural, burst pipes)
toxic hazards (effluents, emissions)
structural collapse
crime (vandalism, organized crime, white-collar crime, etc.)
mass cyber-attack, digital terrorism, cyber-war
Established threats may diminish and new threats emerge - alertness
(image: Oct 2006 - Marsascala)
Threats - general & IT-specific
(images: explosion, fire, smoke, emissions, effluents in or close to the DC; malicious IT threat)
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software.
Some IT vulnerabilities:
Poor Data Centre site selection or Data Centre design
SPOFs (people, hardware, databases, software)
Poor Data Centre procedures or compliance with procedure (SFIA HR framework for IT, ITIL)
Poor physical access control
Poor software patching
(image: good physical access control)
IT Vulnerabilities - some examples (2)
Poor software testing, especially Web testing (recall CMMi, ISEB)
Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish & new ones emerge - alertness
(image: wiring from hell)
robertmcachia at umedumt
Risks - general amp IT-specific
loss of facilities (buildings roads leading to building etc)
loss of data (data integrity data availability recall ISO 27001)
loss of power
loss of water supply (cooling) the opposite - flooding
computer infection outbreaks - viruses worms
physical access compromised intrusion with intent
asset seizure
Changing threats changing vulnerabilities - shifting scenario alertness
Some Data Centres risks
supply chain disruption (software vendor collapse
hardware vendor collapse ISP collapse)
loss of Telecom connectivity (data voice)
loss of people (people can be SPOFs)Your IT vendor supply
chain
Burst pipe
DC flooding
robertmcachia at umedumt
IT Risks - some examples
Physical DC Risks Logical DC Risks
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure amp Content of BS25999
A DC may go beyond organising a BCP project
It may decide to formalise its BCM assurance process by embracing the
BS25999 standard
BS25999 holistic structure amp content
bull overview of business continuity management (BCM)
bull business continuity management policy
bull BCM programme management
bull understanding the organization
bull determining business continuity strategy
bull developing and implementing a BCM response
bull exercising maintaining and reviewing BCM arrangements
bull embedding BCM in the organizationrsquos culture
BS25999 Holistic
robertmcachia at umedumt
Some takeaways (1)
BCM thinking amp action
bull decrease the number of
holes both management amp
technical
bull decrease the size of the
holes and
bull keep them moving so they
are never aligned
The Swiss cheese model of business resilience
robertmcachia at umedumt
Some takeaways (2)
BCM is a concern of
CIOs CTOs Data Centre Managers
Quality Managers Infosec Managers
IT Architects DBAs
IT Auditors
CEOs CFOs because itrsquos Governance
BCM addresses broad non-specific risks it is a ldquocatch allrdquo
BCM is probably the most cost-effective security control of all
BCM start now start small scale fast broadendeepen BCM scope
incrementally
You must be able to respond to your circumstances as they exist - not as you
would like them to be - Brian Billick
robertmcachia at umedumt
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI UK) httpwwwthebciorg
Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK
Cabinet Office
A primer on ITIL ISO 20000 - IT Service Management
httpwwwisaca-
maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02
_2009pdf
Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm
BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines
s-Continuity
RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg
SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg
SEI (USA) httpwwwseicmuedu
Uptime Institute (USA) httpwwwuptimeinstituteorg
robertmcachia at umedumt
Questions and Discussion
robertmcachia atldquo umedumt
Thank You
Dr Robert M Cachia
httpwwwlinkedincominrobertmcachia
robertmcachia ldquoatrdquo umedumt
Threats - some examples
Some data centre threats
human error (managementtechnical accidentalmalicious)
technical failure (mechanical electrical hardware software
etc)
software failure (even software patches themselves contain
bugs)
fire explosion smoke
floods (natural burst pipes)
toxic hazards (effluents emissions)
structural collapse
crime (vandalism organized crime white-collar crime etc)
mass cyber-attack digital terrorism cyber-war
Established threats may diminish and new threats emerge - alertness
Oct 2006 - Marsascala
robertmcachia at umedumt
Threats - general amp IT-specific
-
Explosion fire smoke emissions effluents
in or close to DC
malicious IT Threat
robertmcachia at umedumt
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical
eg in organizational design in business processes systems or software
Some IT vulnerabilities
Poor Data Centre site selection or Data Centre design
SPOFs (people hardware databases software)
Poor Data Centre procedures or compliance to
procedure (SFIA HR framework for IT ITIL)
Poor physical access control
Poor software patchinggood physical access
control
robertmcachia at umedumt
IT Vulnerabilities - some examples (2)
Poor software testing especially Web
testing (recall CMMi ISEB)
Poor Configuration ManagementVersion
Control (recall ISO 20000 ITIL)
Poor Infosec (passwords biometrics
encryption anti-virusmalware)
(recall ISO 27001 CobIT RiskIT)
Poor Capacity ManagementAvailability
Management (recall ISO 20000 ITIL)
Poor procurementsupplier management
(ISO 9000 ITIL BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish amp new ones emerge - alertness
wiring from hell
robertmcachia at umedumt
Risks - general amp IT-specific
loss of facilities (buildings roads leading to building etc)
loss of data (data integrity data availability recall ISO 27001)
loss of power
loss of water supply (cooling) the opposite - flooding
computer infection outbreaks - viruses worms
physical access compromised intrusion with intent
asset seizure
Changing threats changing vulnerabilities - shifting scenario alertness
Some Data Centres risks
supply chain disruption (software vendor collapse
hardware vendor collapse ISP collapse)
loss of Telecom connectivity (data voice)
loss of people (people can be SPOFs)Your IT vendor supply
chain
Burst pipe
DC flooding
robertmcachia at umedumt
IT Risks - some examples
Physical DC Risks Logical DC Risks
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases: Exercise & Testing, Maintain & Review, Awareness & Training
Maintain & Review Phase
Feedback from testing/audit: broaden/deepen the DC BC plan
Awareness & Training Phase
It's never enough: institutionalize BC practice, embed it into the DC culture
"The proof of the pudding is in the eating"
Exercise & Testing Phase
Give yourself an intentional DC disaster
1 Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2 Test alternative risk (threat + vulnerability) scenarios
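The two Exercise & Testing items, re-testing regularly and varying the threat + vulnerability scenario, suggest a rotating schedule. The Python sketch below is one hypothetical way to build it; the threat and vulnerability lists are placeholders that would in practice be drawn from the Risk Assessment phase.

```python
from itertools import product

# The four exercise types, in the order listed on the slide.
EXERCISE_TYPES = [
    "walk-through",
    "checklist",
    "role-playing/simulation",
    "full interruption & rehearsal",
]


def exercise_schedule(threats, vulnerabilities, rounds):
    """Rotate through every (threat, vulnerability) pair and every exercise type."""
    scenarios = list(product(threats, vulnerabilities))
    if not scenarios:
        return []
    schedule = []
    for i in range(rounds):
        threat, vulnerability = scenarios[i % len(scenarios)]
        schedule.append((EXERCISE_TYPES[i % len(EXERCISE_TYPES)], threat, vulnerability))
    return schedule


if __name__ == "__main__":
    # Hypothetical threat and vulnerability lists for illustration.
    plan = exercise_schedule(["DDoS", "flooding"], ["single ISP link", "no off-site backup"], rounds=4)
    for exercise, threat, vulnerability in plan:
        print(f"{exercise}: {threat} + {vulnerability}")
```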
An example disaster: mass cyber-attack
A mass cyber-attack may involve prior sniffing … social engineering & a widespread attack
An attack may be blended: denial-of-service and/or widespread virus/worm infection and/or penetration (data theft, data corruption) and/or fraud/blackmail
Incident/Problem Management (recall ISO 20000, ITIL)
Source: M Benyoucef, University of Ottawa
An example DC disaster: BCM response to cyber-attack
When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse
1 Interdict: actions to block the attack and halt its progress
2 Contain: actions to prevent further degradation & regain control
3 Recover: repair/fix damage; bring all systems, data and applications/services back to normality
4 Analyse: investigation/audit - examination of evidence
This sequence applies to DoS/DDoS, infection & penetration cyber-attacks
Note: after the BCM response, seek prevention: (i) lessons learnt (ii) improve DC process/systems resilience (iii) heighten DC emergency preparedness
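The Interdict, Contain, Recover, Analyse sequence behaves like an ordered checklist. As a hypothetical illustration (not a real incident-management tool), the Python sketch below refuses to mark a phase complete before the preceding one; a real response may need to overlap phases, so treat the strict ordering as an assumption.

```python
from enum import IntEnum


class Phase(IntEnum):
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair damage, return systems/data/services to normality
    ANALYSE = 4    # investigation/audit, examination of evidence


class AttackResponse:
    """Tracks a cyber-attack response and enforces the slide's phase order."""

    def __init__(self):
        self.completed = []

    @property
    def done(self) -> bool:
        return len(self.completed) == len(Phase)

    def complete(self, phase: Phase) -> None:
        if self.done:
            raise ValueError("response already complete; move on to lessons learnt")
        expected = Phase(len(self.completed) + 1)
        if phase != expected:
            raise ValueError(f"cannot complete {phase.name} before {expected.name}")
        self.completed.append(phase)


if __name__ == "__main__":
    response = AttackResponse()
    for phase in Phase:  # iterates in definition order
        response.complete(phase)
    print(response.done)
```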
Structure & Content of BS25999
A DC may go beyond organising a BCP project: it may decide to formalise its BCM assurance process by embracing the BS25999 standard
BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture
Some takeaways (1)
The Swiss cheese model of business resilience
BCM thinking & action:
• decrease the number of holes (both management & technical)
• decrease the size of the holes, and
• keep them moving so they are never aligned
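The Swiss cheese model has a simple quantitative reading: if each layer of defence (a management or technical control) lets an event through with some probability, the chance that an event passes every layer is the product of those probabilities, provided the layers fail independently. Fewer and smaller holes lower each factor, and keeping the holes moving is what keeps the independence assumption plausible. The Python sketch below uses hypothetical per-layer figures.

```python
from math import prod


def breach_probability(hole_probabilities):
    """Chance that a single event passes through every defensive layer.

    Assumes the layers fail independently (their holes do not line up);
    correlated holes make the real figure higher than this product.
    """
    for p in hole_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod(hole_probabilities)  # an empty list (no layers) gives 1


if __name__ == "__main__":
    layers = [0.1, 0.2, 0.05]          # hypothetical per-layer failure rates
    print(breach_probability(layers))  # about 0.001
    smaller_holes = [0.05, 0.1, 0.05]  # each hole made smaller
    print(breach_probability(smaller_holes))  # about 0.00025
```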
Some takeaways (2)
BCM is a concern of CIOs, CTOs, Data Centre Managers, Quality Managers, Infosec Managers, IT Architects, DBAs, IT Auditors, and CEOs and CFOs, because it's Governance
BCM addresses broad, non-specific risks: it is a "catch all"
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast; broaden/deepen the BCM scope incrementally
"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org
Questions and Discussion
Thank You
Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt
Threats - general & IT-specific
Explosion, fire, smoke, emissions, effluents in or close to the DC
Malicious IT threat
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical, e.g. in organizational design, in business processes, systems or software
Some IT vulnerabilities:
Poor Data Centre site selection or Data Centre design
SPOFs (people, hardware, databases, software)
Poor Data Centre procedures or compliance with procedure (SFIA HR framework for IT, ITIL)
Poor physical access control
Poor software patching
Figure: good physical access control
IT Vulnerabilities - some examples (2)
Poor software testing, especially Web testing (recall CMMi, ISEB)
Poor Configuration Management/Version Control (recall ISO 20000, ITIL)
Poor Infosec (passwords, biometrics, encryption, anti-virus/malware) (recall ISO 27001, CobIT, RiskIT)
Poor Capacity Management/Availability Management (recall ISO 20000, ITIL)
Poor procurement/supplier management (ISO 9000, ITIL, BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish & new ones emerge - alertness
Figure: wiring from hell
Risks - general & IT-specific
Some Data Centre risks:
loss of facilities (buildings, roads leading to the building, etc.)
loss of data (data integrity, data availability; recall ISO 27001)
loss of power
loss of water supply (cooling); the opposite - flooding
computer infection outbreaks - viruses, worms
physical access compromised: intrusion with intent
asset seizure
supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
loss of Telecom connectivity (data, voice)
loss of people (people can be SPOFs)
Changing threats, changing vulnerabilities - a shifting scenario: alertness
Figures: your IT vendor supply chain; burst pipe; DC flooding
IT Risks - some examples
Physical DC risks / Logical DC risks
DC outages: business impacts - large & immediate
Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover
Recall: many DCs are in premises that were never designed to be DC-ready
Q: How to protect value-creation capability?
A: BCM
BCM in Data Centres, BCM for Data Centres
360° e-continuity & e-assurance
Business functions of commercial IT providers (e.g. CEO's office, Procurement, Marketing & Sales, HR, Facilities, Planning, QA)
IT processes, e.g. Incident/Problem, Configuration/Change, Availability/Capacity
Data
Employees, including IT people
Physical Facilities/Building, Networks, VoIP
Computational and Storage: Servers, SAN/NAS
Software stack: OS, DBMS, middleware, applications
Applications/Services: B2B/B2C, e-SCM/ERP/e-CRM, BI/KM, Intranets/Extranets, e-mail
Figure: BCM for data centres, 360°
BCM DC initiative output: a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan, the CIO/CTO/Data Centre Manager can tell the CEO:
"We have (or are executing) plans by DC client/market segment …"
"We have (or are executing) plans by Application/by Service …"
"We have (or are executing) plans by DC Department …"
"We have (or are executing) plans by Building/by Floor …"
BC Plan, BC Planning: (i) What (ii) How (iii) Who (iv) When
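A plan that supports several views can be held as one list of tasks, each tagged with the dimensions the CIO/CTO/Data Centre Manager may be asked about. The Python sketch below is a hypothetical data model, not a prescribed BC plan format: each item records the What/How (the task), Who (owner) and When (hours after invocation), and a view is simply a grouping by client, service, department or building.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanItem:
    what: str          # the recovery task (what/how)
    owner: str         # who is responsible
    due_hours: float   # when: hours after the plan is invoked
    client: str
    service: str
    department: str
    building: str


def view(plan, dimension):
    """Group plan tasks by one dimension: 'client', 'service', 'department' or 'building'."""
    grouped = defaultdict(list)
    for item in plan:
        grouped[getattr(item, dimension)].append(item.what)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical plan items for illustration.
    plan = [
        PlanItem("Fail over payments DB", "DBA on call", 2, "Client B", "payments", "Operations", "Building 1"),
        PlanItem("Reroute call centre lines", "Telecoms lead", 4, "Client A", "IT call centre", "Service Desk", "Building 2"),
        PlanItem("Restore e-mail from backup", "Sysadmin on call", 24, "Client A", "e-mail", "Operations", "Building 1"),
    ]
    print(view(plan, "client"))
    print(view(plan, "building"))
```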
DC Business Continuity Planning: management choices
Which DC clients/services/applications get restored first, and who will wait longer … the recovery sequence for each client
Which services/applications get restored completely, and which clients will initially get only limited service
MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for the DC, for each client)
RTO (recovery time objective): the period of time within which assets/services must be recovered after disruption (for the DC, for each client)
RTO affects the recovery option: shorter RTO -> more difficult, more expensive (for the DC, for each client)
RPO (recovery point objective): the point in time to which assets/services must be restored after disruption (for the DC, for each client)
RPO affects the volume of data that may need to be restored
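These definitions imply two consistency checks that are easy to automate: a service's RTO should not exceed the MTPD it sits under, and the interval between data copies should not exceed the RPO, since with periodic copies roughly one interval of recent data can be lost. The Python sketch below applies both; the `backup_interval_hours` field and all figures are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceObjectives:
    name: str
    rto_hours: float              # must be back in service within this time
    rpo_hours: float              # at most this much recent data may be lost
    backup_interval_hours: float  # hypothetical: how often data is copied to the fallback


def check_objectives(service, mtpd_hours):
    """Return a list of inconsistencies between a service's objectives and the MTPD."""
    problems = []
    if service.rto_hours > mtpd_hours:
        problems.append(f"{service.name}: RTO {service.rto_hours}h exceeds MTPD {mtpd_hours}h")
    # With periodic copies, the worst-case data loss is roughly one full interval.
    if service.backup_interval_hours > service.rpo_hours:
        problems.append(
            f"{service.name}: copies every {service.backup_interval_hours}h cannot meet RPO {service.rpo_hours}h"
        )
    return problems


if __name__ == "__main__":
    print(check_objectives(ServiceObjectives("payments", 4, 0.25, 0.25), mtpd_hours=24))
    print(check_objectives(ServiceObjectives("e-CRM", 36, 1, 24), mtpd_hours=24))
```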
BCM initiative requires a dedicated Project/Programme
A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:
BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness, and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase
Business Impact Analysis (BIA) Phase
BIA achieves an understanding of the DC: its users, activities, services/revenue streams, applications, data, liabilities
• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews
BIA answers the questions: which DC activities & assets create most value … of all potential DC losses, which impaired activities & assets would be the greatest loss?
Figures: cascading impacts; multiple DC failure
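Once questionnaires and interviews have produced estimates, the BIA question "which impaired assets would be the greatest loss" can be answered with a simple loss model. The Python sketch below is hypothetical: it assumes a per-hour loss (lost revenue, SLA penalties) and a one-off recovery cost per activity, plus the set of assets each activity depends on. An asset shared by several valuable activities ranks highest, which is one way cascading impacts show up. Only direct dependencies are modelled, reputational damage is left out, and likelihood is deliberately ignored, as in the BIA itself.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    loss_per_hour: float   # estimated lost revenue + SLA penalties per hour of outage
    recovery_cost: float   # estimated one-off cost to restore the activity
    depends_on: set = field(default_factory=set)  # assets the activity cannot run without


def activity_loss(activity, outage_hours):
    return activity.loss_per_hour * outage_hours + activity.recovery_cost


def asset_impact(activities, outage_hours):
    """Rank assets by the total loss if each one is impaired for outage_hours."""
    impact = {}
    for act in activities:
        for asset in act.depends_on:
            impact[asset] = impact.get(asset, 0.0) + activity_loss(act, outage_hours)
    return sorted(impact.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical activities and estimates for illustration.
    activities = [
        Activity("SaaS hosting", 5_000, 20_000, {"SAN", "core network"}),
        Activity("IT call centre", 1_000, 5_000, {"core network", "VoIP"}),
        Activity("backup service", 500, 2_000, {"SAN"}),
    ]
    for asset, loss in asset_impact(activities, outage_hours=10):
        print(asset, loss)
```

With these figures the core network ranks first (85,000 over a 10-hour outage) because two activities depend on it, ahead of the SAN (77,000) and VoIP (15,000).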
The Risk Matrix - likelihood vs impact
Major Risks to DC (top-right square): unacceptable - Eliminate or Postpone, Transfer, Monitor & Mitigate + management ownership + DO BCM - Governance
Contingency Risks to DC (top-left square): very rare events that will extinguish the business; complacency + overconfidence can distract management from the consequences - DO BCM - Governance
High incidence and low impact risks & minor risks: the order of the day - operational matters
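To locate activities and assets in the matrix, each risk needs a likelihood score (from Risk Assessment) and an impact score (from the BIA). The Python sketch below is a hypothetical classifier using scores between 0 and 1 and an assumed 0.5 cut-off; organisations often use finer grids such as 3x3 or 5x5. Impact is the vertical axis and likelihood the horizontal one, matching the slide's top-left "very rare but business-ending" quadrant.

```python
def risk_quadrant(likelihood: float, impact: float, threshold: float = 0.5) -> str:
    """Place a risk in the likelihood-vs-impact matrix.

    likelihood and impact are scores in [0, 1]; the 0.5 cut-off is an assumption.
    """
    if not (0.0 <= likelihood <= 1.0 and 0.0 <= impact <= 1.0):
        raise ValueError("scores must be between 0 and 1")
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_impact and high_likelihood:
        # top-right: eliminate/postpone, transfer, monitor & mitigate, management ownership, DO BCM
        return "major"
    if high_impact:
        # top-left: very rare but business-ending; do not let rarity breed complacency, DO BCM
        return "contingency"
    # bottom row: high-incidence/low-impact and minor risks, handled as day-to-day operations
    return "operational"


if __name__ == "__main__":
    for name, l, i in [("DDoS", 0.7, 0.8), ("DC flooding", 0.05, 0.95), ("disk failure", 0.9, 0.1)]:
        print(name, risk_quadrant(l, i))
```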
Risk Assessment Phase
RA considers the what/why/where/who that can and would cause the DC the disastrous losses identified in BIA. RA seeks to understand threats to valued service/DC assets
• RA takes a primarily external focus: a world/political/economic/IT sector focus - it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates the likelihood (probability) of occurrence of DC losses & asset impairment identified in BIA (estimates or guesstimates)
Note: need to locate DC activities & assets in the slots of the Risk Matrix for the DC, and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure amp Content of BS25999
A DC may go beyond organising a BCP project
It may decide to formalise its BCM assurance process by embracing the
BS25999 standard
BS25999 holistic structure amp content
bull overview of business continuity management (BCM)
bull business continuity management policy
bull BCM programme management
bull understanding the organization
bull determining business continuity strategy
bull developing and implementing a BCM response
bull exercising maintaining and reviewing BCM arrangements
bull embedding BCM in the organizationrsquos culture
BS25999 Holistic
robertmcachia at umedumt
Some takeaways (1)
BCM thinking amp action
bull decrease the number of
holes both management amp
technical
bull decrease the size of the
holes and
bull keep them moving so they
are never aligned
The Swiss cheese model of business resilience
robertmcachia at umedumt
Some takeaways (2)
BCM is a concern of
CIOs CTOs Data Centre Managers
Quality Managers Infosec Managers
IT Architects DBAs
IT Auditors
CEOs CFOs because itrsquos Governance
BCM addresses broad non-specific risks it is a ldquocatch allrdquo
BCM is probably the most cost-effective security control of all
BCM start now start small scale fast broadendeepen BCM scope
incrementally
You must be able to respond to your circumstances as they exist - not as you
would like them to be - Brian Billick
robertmcachia at umedumt
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI UK) httpwwwthebciorg
Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK
Cabinet Office
A primer on ITIL ISO 20000 - IT Service Management
httpwwwisaca-
maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02
_2009pdf
Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm
BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines
s-Continuity
RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg
SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg
SEI (USA) httpwwwseicmuedu
Uptime Institute (USA) httpwwwuptimeinstituteorg
robertmcachia at umedumt
Questions and Discussion
robertmcachia atldquo umedumt
Thank You
Dr Robert M Cachia
httpwwwlinkedincominrobertmcachia
robertmcachia ldquoatrdquo umedumt
IT Vulnerabilities - some examples (1)
DC vulnerabilities may be technical or non-technical
eg in organizational design in business processes systems or software
Some IT vulnerabilities
Poor Data Centre site selection or Data Centre design
SPOFs (people hardware databases software)
Poor Data Centre procedures or compliance to
procedure (SFIA HR framework for IT ITIL)
Poor physical access control
Poor software patchinggood physical access
control
robertmcachia at umedumt
IT Vulnerabilities - some examples (2)
Poor software testing especially Web
testing (recall CMMi ISEB)
Poor Configuration ManagementVersion
Control (recall ISO 20000 ITIL)
Poor Infosec (passwords biometrics
encryption anti-virusmalware)
(recall ISO 27001 CobIT RiskIT)
Poor Capacity ManagementAvailability
Management (recall ISO 20000 ITIL)
Poor procurementsupplier management
(ISO 9000 ITIL BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish amp new ones emerge - alertness
wiring from hell
robertmcachia at umedumt
Risks - general amp IT-specific
loss of facilities (buildings roads leading to building etc)
loss of data (data integrity data availability recall ISO 27001)
loss of power
loss of water supply (cooling) the opposite - flooding
computer infection outbreaks - viruses worms
physical access compromised intrusion with intent
asset seizure
Changing threats changing vulnerabilities - shifting scenario alertness
Some Data Centres risks
supply chain disruption (software vendor collapse
hardware vendor collapse ISP collapse)
loss of Telecom connectivity (data voice)
loss of people (people can be SPOFs)Your IT vendor supply
chain
Burst pipe
DC flooding
robertmcachia at umedumt
IT Risks - some examples
Physical DC Risks Logical DC Risks
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure amp Content of BS25999
A DC may go beyond organising a BCP project
It may decide to formalise its BCM assurance process by embracing the
BS25999 standard
BS25999 holistic structure amp content
bull overview of business continuity management (BCM)
bull business continuity management policy
bull BCM programme management
bull understanding the organization
bull determining business continuity strategy
bull developing and implementing a BCM response
bull exercising maintaining and reviewing BCM arrangements
bull embedding BCM in the organizationrsquos culture
BS25999 Holistic
robertmcachia at umedumt
Some takeaways (1)
BCM thinking amp action
bull decrease the number of
holes both management amp
technical
bull decrease the size of the
holes and
bull keep them moving so they
are never aligned
The Swiss cheese model of business resilience
robertmcachia at umedumt
Some takeaways (2)
BCM is a concern of
CIOs CTOs Data Centre Managers
Quality Managers Infosec Managers
IT Architects DBAs
IT Auditors
CEOs CFOs because itrsquos Governance
BCM addresses broad non-specific risks it is a ldquocatch allrdquo
BCM is probably the most cost-effective security control of all
BCM start now start small scale fast broadendeepen BCM scope
incrementally
You must be able to respond to your circumstances as they exist - not as you
would like them to be - Brian Billick
robertmcachia at umedumt
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI UK) httpwwwthebciorg
Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK
Cabinet Office
A primer on ITIL ISO 20000 - IT Service Management
httpwwwisaca-
maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02
_2009pdf
Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm
BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines
s-Continuity
RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg
SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg
SEI (USA) httpwwwseicmuedu
Uptime Institute (USA) httpwwwuptimeinstituteorg
robertmcachia at umedumt
Questions and Discussion
robertmcachia atldquo umedumt
Thank You
Dr Robert M Cachia
httpwwwlinkedincominrobertmcachia
robertmcachia ldquoatrdquo umedumt
IT Vulnerabilities - some examples (2)
Poor software testing especially Web
testing (recall CMMi ISEB)
Poor Configuration ManagementVersion
Control (recall ISO 20000 ITIL)
Poor Infosec (passwords biometrics
encryption anti-virusmalware)
(recall ISO 27001 CobIT RiskIT)
Poor Capacity ManagementAvailability
Management (recall ISO 20000 ITIL)
Poor procurementsupplier management
(ISO 9000 ITIL BS25999)
Poor fire protection
Known vulnerabilities sometimes diminish amp new ones emerge - alertness
wiring from hell
robertmcachia at umedumt
Risks - general amp IT-specific
loss of facilities (buildings roads leading to building etc)
loss of data (data integrity data availability recall ISO 27001)
loss of power
loss of water supply (cooling) the opposite - flooding
computer infection outbreaks - viruses worms
physical access compromised intrusion with intent
asset seizure
Changing threats changing vulnerabilities - shifting scenario alertness
Some Data Centres risks
supply chain disruption (software vendor collapse
hardware vendor collapse ISP collapse)
loss of Telecom connectivity (data voice)
loss of people (people can be SPOFs)Your IT vendor supply
chain
Burst pipe
DC flooding
robertmcachia at umedumt
IT Risks - some examples
Physical DC Risks Logical DC Risks
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
BCM initiative requires a dedicated Project/Programme
A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:
BC Project Initiation Phase
(mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness, and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase
Business Impact Analysis (BIA) Phase
BIA achieves an understanding of the DC: its users, activities, services/revenue streams, applications, data and liabilities
• BIA takes an internal focus: it identifies the critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents; it quantifies the consequences
• BIA techniques: questionnaires + interviews
BIA answers the questions: which DC activities & assets create most value … and, of all potential DC losses, which impaired activities & assets would cause the greatest loss?
Cascading impacts
Multiple DC failure
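BIA quantifies consequences, not probabilities. This is a minimal sketch of that arithmetic. It assumes each activity's loss grows linearly with outage length (lost revenue per hour), plus a one-off recovery cost and an SLA penalty. Reputational damage and cascading impacts are harder to express per hour and are not modelled here. The `Activity` fields and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical BIA inputs for one DC activity, gathered by questionnaire/interview."""
    name: str
    revenue_per_hour: float  # revenue lost per hour of disruption
    recovery_cost: float     # one-off cost of recovery
    sla_penalty: float       # contractual penalty triggered by the disruption


def impact(activity, outage_hours):
    """Estimated financial loss for one activity over an outage of the given length."""
    return (activity.revenue_per_hour * outage_hours
            + activity.recovery_cost
            + activity.sla_penalty)


def rank_by_impact(activities, outage_hours):
    """Activities ordered from greatest to smallest estimated loss."""
    return sorted(activities, key=lambda a: impact(a, outage_hours), reverse=True)


activities = [
    Activity("Hosted e-CRM", revenue_per_hour=2000.0, recovery_cost=5000.0, sla_penalty=10000.0),
    Activity("Internal intranet", revenue_per_hour=50.0, recovery_cost=1000.0, sla_penalty=0.0),
    Activity("B2C web shop", revenue_per_hour=5000.0, recovery_cost=8000.0, sla_penalty=2000.0),
]
for activity in rank_by_impact(activities, outage_hours=8):
    print(activity.name, impact(activity, 8))
```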
The Risk Matrix: likelihood vs impact
Major risks to DC (top-right square): unacceptable. Eliminate or postpone, transfer, monitor & mitigate + management ownership + DO BCM (governance)
Contingency risks to DC (top-left square): very rare events that will extinguish the business. Complacency + overconfidence can distract management from the consequences. DO BCM (governance)
High-incidence, low-impact risks & minor risks: the order of the day, operational matters
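The quadrants can be expressed as a simple classification. This sketch assumes likelihood runs along the x-axis and impact up the y-axis, each scored 1 to 5, with a score of 3 or more counting as "high". The scale and threshold are assumptions and are not stated on the slide. The response texts paraphrase the slide.

```python
def quadrant(likelihood, impact, threshold=3):
    """Place a risk in the likelihood (x-axis) vs impact (y-axis) matrix.

    Both scores are assumed to run from 1 (lowest) to 5 (highest);
    a score at or above `threshold` counts as high.
    """
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("scores must be between 1 and 5")
    if impact >= threshold:
        return "major" if likelihood >= threshold else "contingency"
    return "high-incidence" if likelihood >= threshold else "minor"


# Responses paraphrased from the slide.
RESPONSE = {
    "major": "unacceptable: eliminate/postpone, transfer, monitor & mitigate; management ownership; do BCM",
    "contingency": "rare but business-ending: guard against complacency; do BCM",
    "high-incidence": "order of the day: operational matter",
    "minor": "order of the day: operational matter",
}

print(quadrant(4, 5), "->", RESPONSE[quadrant(4, 5)])
print(quadrant(1, 5), "->", RESPONSE[quadrant(1, 5)])
```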
Risk Assessment Phase
RA considers the what/why/where/who that can and would cause the DC the disastrous losses identified in the BIA. RA seeks to understand threats to valued services/DC assets
• RA takes a primarily external focus, a world/political/economic/IT-sector focus: it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates the likelihood (probability) of occurrence of the DC losses & asset impairment identified in the BIA (estimates or guesstimates)
Note: DC activities & assets need to be located in the slots of the Risk Matrix for the DC, with plans to protect value-creating capability
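Here is how the two phases can be combined: BIA supplies a loss estimate per asset, and RA supplies an estimated annual probability. This minimal sketch places each asset in a matrix slot and orders the list by probability-weighted loss. The "high" cut-offs, the asset names and the figures are hypothetical. Weighting by probability is one common convention, not the slide's prescription. It tends to understate rare, business-ending events, which is why the contingency slot is reported separately.

```python
def locate(assets, high_loss=100_000.0, high_probability=0.1):
    """Place each asset in the risk matrix and sort by probability-weighted loss.

    assets maps a name to (loss_from_bia, annual_probability_from_ra).
    The two cut-offs for "high" are illustrative assumptions.
    """
    placed = []
    for name, (loss, probability) in assets.items():
        if loss >= high_loss:
            slot = "major" if probability >= high_probability else "contingency"
        else:
            slot = "high-incidence" if probability >= high_probability else "minor"
        placed.append((name, slot, loss * probability))
    return sorted(placed, key=lambda row: row[2], reverse=True)


assets = {
    "SAN array failure": (500_000.0, 0.02),
    "Primary building flood": (2_000_000.0, 0.01),
    "Web front-end DDoS": (150_000.0, 0.3),
    "Desktop PC failure": (5_000.0, 0.5),
}
for name, slot, weighted in locate(assets):
    print(f"{name:25} {slot:15} {weighted:>10,.0f}")
```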
BC Planning & Design Phase - business recovery
DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical sequence for planning recovery:
1. Set the MTPD objective for the DC/for each client
2. Assign RTOs for individual clients/services/applications/data on the basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform a cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval (ii) budget (iii) tasking individuals
"Failure to prepare is preparing to fail" - John Wooden
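Steps 1 and 2 can be sketched as follows. The sketch assumes a hypothetical mapping from SLA tier to promised RTO. The rule it enforces is that no client's RTO may exceed the DC-wide MTPD set in step 1. It then returns clients in restoration order, shortest RTO first. The tier names, durations and client names are illustrative only.

```python
from datetime import timedelta

# Hypothetical SLA tiers and the RTO each tier is promised.
RTO_BY_TIER = {
    "platinum": timedelta(hours=1),
    "gold": timedelta(hours=4),
    "silver": timedelta(hours=24),
    "bronze": timedelta(hours=72),
}


def assign_rtos(clients, mtpd):
    """Return (client, rto) pairs in restoration order, shortest RTO first.

    clients maps a client name to its SLA tier. Raises ValueError if a
    tier promises a longer RTO than the DC-wide MTPD allows.
    """
    plan = []
    for client, tier in clients.items():
        rto = RTO_BY_TIER[tier]
        if rto > mtpd:
            raise ValueError(f"{client}: RTO {rto} exceeds MTPD {mtpd}")
        plan.append((client, rto))
    return sorted(plan, key=lambda pair: pair[1])


clients = {"Bank X": "platinum", "Retailer Y": "silver", "Insurer Z": "gold"}
for client, rto in assign_rtos(clients, mtpd=timedelta(hours=48)):
    print(client, rto)
```

A tier that fails the MTPD check points back to step 1. Either the MTPD or the SLA has to be renegotiated before steps 3 to 5 can proceed.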
BC Planning & Design Phase - business recovery (2)
Some business recovery options for the DC
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision: on-shore, near-shore, offshore fallback/redundancy options
A typical recovery sequence
1. Initial emergency response
2. Resume mission-critical applications/services
3. Resume non-critical applications/services
4. Restoration to the primary site, full services; verify stability
Ongoing communication: DC employees, clients, regulators
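The typical recovery sequence restores mission-critical services before non-critical ones. This minimal sketch assumes services are restored one at a time after the initial emergency response, and that restore durations are known estimates. It reports when each service would be back, measured from the start of the disruption, and whether that meets its RTO. Service names, durations and RTOs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    critical: bool
    restore_hours: float  # estimated time to bring this service back
    rto_hours: float      # agreed recovery time objective


def schedule(services, emergency_response_hours):
    """Restore critical services first (tightest RTO first), then the rest.

    Returns (name, ready_at_hours, rto_met) tuples. Restores are assumed
    to run one after another, which is a conservative simplification.
    """
    # False sorts before True, so critical services come first.
    order = sorted(services, key=lambda s: (not s.critical, s.rto_hours))
    clock = emergency_response_hours
    result = []
    for service in order:
        clock += service.restore_hours
        result.append((service.name, clock, clock <= service.rto_hours))
    return result


services = [
    Service("Intranet", critical=False, restore_hours=2, rto_hours=48),
    Service("Payments API", critical=True, restore_hours=1, rto_hours=2),
    Service("e-mail", critical=True, restore_hours=2, rto_hours=8),
]
for row in schedule(services, emergency_response_hours=0.5):
    print(row)
```

Real recoveries usually run several restores in parallel. The sequential model overstates the later completion times, which errs on the safe side when checking RTOs.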
BC Planning & Design - technical recovery
Note: business decision: on-shore, near-shore, offshore
Note: Mirror, Hot, Warm, Cold: (i) own (ii) shared-space (iii) reciprocal agreements (iv) outsourced: a business issue
Note: high-end BC for DCs: active-active … hot failover …
Note: entry-point BC for DCs: electronic vaulting, remote journaling, transaction logging
DC recovery options, by increasing MTPD/RTO & decreasing cost:
Mirror: full redundancy; short-distance/widely-separated Data Centres; latency considerations
Hot: fully equipped fallback Data Centre
Warm: fallback Data Centre missing key components
Cold: empty fallback Data Centre
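The trade-off on this slide, where shorter recovery costs more, can be framed as choosing the cheapest option whose achievable recovery time still meets the RTO. This is a minimal sketch. The recovery times and annual costs are placeholder figures, not benchmarks. A real comparison would also weigh RPO, latency and the ownership model (own, shared-space, reciprocal, outsourced).

```python
# Placeholder figures only: (option, achievable recovery time in hours, annual cost).
OPTIONS = [
    ("Mirror", 0.1, 1_000_000),
    ("Hot", 4, 400_000),
    ("Warm", 48, 120_000),
    ("Cold", 168, 30_000),
]


def cheapest_meeting(rto_hours, options=OPTIONS):
    """Return the lowest-cost option whose recovery time is within the RTO, or None."""
    feasible = [opt for opt in options if opt[1] <= rto_hours]
    return min(feasible, key=lambda opt: opt[2]) if feasible else None


print(cheapest_meeting(8))     # ('Hot', 4, 400000)
print(cheapest_meeting(72))    # ('Warm', 48, 120000)
print(cheapest_meeting(0.01))  # None
```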
Phases: Exercise & Testing, Maintain & Review, Awareness & Training
Maintain & Review Phase
Feedback from testing/audit: broaden/deepen the DC BC plan
Awareness & Training Phase
It's never enough: institutionalize BC practice, embed it into the DC culture
"The proof of the 'pudding' is in the eating"
Exercise & Testing Phase
Give yourself an intentional DC disaster:
1. Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios
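"Re-test regularly" becomes checkable once each exercise type has a target interval. This minimal sketch lists the exercise types that are overdue, given the date each was last run. The intervals are hypothetical, since the slide does not prescribe any.

```python
from datetime import date, timedelta

# Hypothetical target intervals between exercises of each type.
INTERVALS = {
    "walk-through": timedelta(days=90),
    "checklist": timedelta(days=90),
    "role-playing/simulation": timedelta(days=180),
    "full interruption & rehearsal": timedelta(days=365),
}


def overdue(last_run, today):
    """Exercise types never run, or last run longer ago than their interval."""
    due = []
    for kind, interval in INTERVALS.items():
        last = last_run.get(kind)
        if last is None or today - last > interval:
            due.append(kind)
    return due


last_run = {
    "walk-through": date(2009, 9, 1),
    "checklist": date(2009, 3, 1),
    "role-playing/simulation": date(2009, 6, 1),
}
print(overdue(last_run, today=date(2009, 10, 13)))
```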
An example disaster: mass cyber-attack
A mass cyber-attack may involve prior sniffing … social engineering & a widespread attack.
An attack may be blended: denial-of-service and/or widespread virus/worm infection and/or penetration (data theft, data corruption) and/or fraud/blackmail.
Incident/Problem Management (recall ISO 20000, ITIL)
Source: M Benyoucef, University of Ottawa
An example DC disaster: BCM response to a cyber-attack
When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse.
1. Interdict: actions to block the attack and halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair, fix damage, bring all systems, data and applications/services back to normality
4. Analyse: investigation/audit, examination of evidence
This sequence applies to DoS/DDoS, infection & penetration cyber-attacks.
Note: after the BCM response, seek prevention: (i) lessons learnt (ii) improve DC process/systems resilience (iii) heighten DC emergency preparedness
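The Interdict, Contain, Recover, Analyse response is strictly ordered. This minimal sketch models an incident record that lets the phases advance only in that order. It timestamps each step, so the record can later serve as evidence for the Analyse phase. The class and method names are hypothetical.

```python
from datetime import datetime

PHASES = ["interdict", "contain", "recover", "analyse"]


class CyberIncident:
    """Tracks one incident through Interdict -> Contain -> Recover -> Analyse."""

    def __init__(self, description):
        self.description = description
        self.log = []  # (phase, timestamp) pairs, kept as evidence for analysis

    @property
    def current_phase(self):
        return self.log[-1][0] if self.log else None

    def advance(self, phase, at=None):
        """Enter the next phase; refuse to skip, repeat or reorder phases."""
        expected = PHASES[len(self.log)] if len(self.log) < len(PHASES) else None
        if phase != expected:
            raise ValueError(f"expected {expected!r}, got {phase!r}")
        self.log.append((phase, at if at is not None else datetime.now()))


incident = CyberIncident("DDoS against hosted e-commerce clients")
incident.advance("interdict")
incident.advance("contain")
print(incident.current_phase)  # contain
```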
Structure & Content of BS 25999
A DC may go beyond organising a BCP project: it may decide to formalise its BCM assurance process by embracing the BS 25999 standard.
BS 25999 holistic structure & content:
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture
Some takeaways (1)
BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned
The Swiss cheese model of business resilience
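The Swiss cheese model has a simple quantitative reading: an incident gets through only if every layer of defence fails at once. If the layers fail independently, that probability is the product of the per-layer failure probabilities. Fewer or smaller holes lower each factor, and independence ("keep them moving") keeps the factors multiplying. This minimal sketch also shows what aligned holes cost. It assumes one shared dependency whose failure defeats every layer at once. The layer names and probabilities are hypothetical.

```python
from math import prod


def breach_probability(layer_failure_probs):
    """Chance that every layer fails together, assuming layers fail independently."""
    return prod(layer_failure_probs)


def with_common_cause(layer_failure_probs, common_cause_prob):
    """Same, but with one shared dependency whose failure defeats every layer at once.

    Either the shared dependency fails (all holes aligned), or it holds and the
    layers must still all fail independently.
    """
    return common_cause_prob + (1 - common_cause_prob) * prod(layer_failure_probs)


layers = {"perimeter firewall": 0.1, "patch management": 0.2, "offsite backups": 0.05}
print(f"{breach_probability(layers.values()):.4f}")       # 0.0010
print(f"{with_common_cause(layers.values(), 0.01):.4f}")  # 0.0110
```

A 1% common cause raises the combined figure roughly tenfold, which is why aligned holes matter more than any single layer's quality.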
Some takeaways (2)
BCM is a concern of:
CIOs, CTOs, Data Centre Managers
Quality Managers, Infosec Managers
IT Architects, DBAs
IT Auditors
CEOs, CFOs, because it's Governance
BCM addresses broad, non-specific risks: it is a "catch-all"
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally
"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … Continuity Central: http://www.continuitycentral.com/uk.htm
BS 25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
Risk IT & COBIT frameworks (ISACA): www.isaca-malta.org
SFIA (HR framework for IT), ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org
Questions and Discussion
Thank You
Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt
Risks - general & IT-specific
loss of facilities (buildings, roads leading to the building, etc.)
loss of data (data integrity, data availability; recall ISO 27001)
loss of power
loss of water supply (cooling), or the opposite: flooding
computer infection outbreaks: viruses, worms
physical access compromised: intrusion with intent
asset seizure
Changing threats, changing vulnerabilities: a shifting scenario calls for alertness
Some Data Centre risks
supply chain disruption (software vendor collapse, hardware vendor collapse, ISP collapse)
loss of Telecom connectivity (data, voice)
loss of people (people can be SPOFs)
(Images: your IT vendor supply chain; burst pipe; DC flooding)
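"People can be SPOFs", and so can a single ISP, vendor, carrier or power feed. This minimal sketch shows a single-point-of-failure check. It assumes each service's dependencies are recorded as groups of interchangeable alternatives, any one of which keeps the service running. A group with only one member is then a SPOF. The services, resources and groupings are hypothetical.

```python
def single_points_of_failure(dependencies):
    """Map each service to the resources it depends on with no alternative.

    dependencies maps a service to a list of groups; each group is a set of
    interchangeable resources, any one of which keeps the service running.
    """
    spofs = {}
    for service, groups in dependencies.items():
        single = sorted(next(iter(group)) for group in groups if len(group) == 1)
        if single:
            spofs[service] = single
    return spofs


dependencies = {
    "Hosted ERP": [{"ISP A", "ISP B"}, {"Lead DBA"}, {"UPS 1", "UPS 2"}],
    "e-mail": [{"ISP A", "ISP B"}, {"Mail server 1", "Mail server 2"}],
    "Call centre": [{"Telecom carrier X"}, {"Grid power"}],
}
print(single_points_of_failure(dependencies))
# {'Hosted ERP': ['Lead DBA'], 'Call centre': ['Grid power', 'Telecom carrier X']}
```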
IT Risks - some examples
Physical DC Risks, Logical DC Risks
DC outages business impacts - large & immediate
Impact of DC outages:
• down-time
• breached SLAs
• lost revenue
• reputational damage: clients and prospects lost forever
• cost to recover
• legal liabilities
• some DCs never recover
Recall: many DCs are in premises that were never designed to be DC-ready
DC outages business impacts - large amp immediate
Impact of DC outages
bull down-time
bull breached SLAs
bull lost revenue
bull reputational damage
clients and prospects lost
forever
bull cost to recover
bull legal liabilities
bull some DCs never recover
Recall many DCs are in
premises that were never
designed to be DC-ready
Q How to protect value-creation capability
A BCM
robertmcachia at umedumt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning: management choices
• Which DC clients/services/applications get restored first, and who will wait longer? … the recovery sequence for each client
• Which services/applications get restored completely, and which clients will initially get only limited service?
• MTPD (maximum tolerable period of disruption): the period of time by which all identified systems/applications/services and data must be restored to normality (for the DC, for each client)
• RTO (recovery time objective): the period of time within which assets/services must be recovered after a disruption (for the DC, for each client). RTO drives the recovery option: the shorter the RTO, the more difficult and expensive the recovery (for the DC, for each client).
• RPO (recovery point objective): the point in time to which assets/services must be restored after a disruption (for the DC, for each client). RPO affects the volume of data that may need to be restored or re-created.
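These definitions can be checked mechanically. The sketch below assumes, as a reasonable reading of the definitions rather than a rule stated on the slide, that each service RTO must fit inside the MTPD, and that the worst-case data loss is the interval between backups or replication points, which should not exceed the RPO. The class, function names, clients and figures are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ServiceObjectives:
    # Hypothetical per-client, per-service recovery objectives
    client: str
    service: str
    mtpd: timedelta             # maximum tolerable period of disruption
    rto: timedelta              # recovery time objective
    rpo: timedelta              # recovery point objective
    backup_interval: timedelta  # assumed time between backups/replication points


def check_objectives(obj: ServiceObjectives) -> list[str]:
    """Return the inconsistencies found in one service's objectives (empty if none)."""
    problems = []
    # Recovery must complete within the tolerable disruption period
    if obj.rto > obj.mtpd:
        problems.append(f"{obj.client}/{obj.service}: RTO exceeds MTPD")
    # Worst case: a failure just before the next backup loses a whole interval of data
    if obj.backup_interval > obj.rpo:
        problems.append(f"{obj.client}/{obj.service}: backup interval exceeds RPO")
    return problems


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Data written in this window is not in the last backup and must be re-created."""
    return failure - last_backup


erp = ServiceObjectives("Client B", "ERP", mtpd=timedelta(hours=24),
                        rto=timedelta(hours=4), rpo=timedelta(hours=1),
                        backup_interval=timedelta(hours=6))
print(check_objectives(erp))
print(data_loss_window(datetime(2009, 10, 13, 12, 0), datetime(2009, 10, 13, 14, 30)))
```

Here the ERP service meets its MTPD but not its RPO: with six-hourly backups, up to six hours of data could be lost against a one-hour objective.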
BCM initiative requires a dedicated Project/Programme
A BC initiative may adopt PRINCE2 or PMBOK project-management methods. The typical BCM phases:
• BC Project Initiation Phase (mandate, BCM scope, DC BCM objectives, responsibilities/approach, awareness, and BC project outcomes/deliverables)
• Business Impact Analysis Phase
• Risk Assessment Phase
• BC Planning & Design Phase
• Exercise & Testing Phase
• Maintenance & Review Phase
• Awareness & Training Phase
Business Impact Analysis (BIA) Phase
BIA achieves an understanding of the DC: its users, activities, services/revenue streams, applications, data and liabilities.
• BIA takes an internal focus: it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents; it quantifies the consequences
• BIA techniques: questionnaires + interviews
BIA answers the questions: which DC activities & assets create most value … and, of all potential DC losses, which impaired activities & assets would be the greatest loss (cascading impacts, multiple DC failure)?
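Because the BIA quantifies consequences without estimating probability, its output can be reduced to a ranking of activities and assets by estimated loss. The sketch below assumes the loss categories named on the slide (reputational, financial, lost revenue, cost of recovery) have already been expressed in one currency from questionnaire and interview answers; the activities and figures are invented for illustration.

```python
# Hypothetical BIA results: estimated loss per impact category (same currency units)
bia = {
    "e-mail hosting": {"reputational": 20_000, "financial": 5_000,
                       "lost_revenue": 10_000, "recovery_cost": 2_000},
    "ERP hosting": {"reputational": 50_000, "financial": 30_000,
                    "lost_revenue": 80_000, "recovery_cost": 15_000},
    "intranet hosting": {"reputational": 2_000, "financial": 1_000,
                         "lost_revenue": 0, "recovery_cost": 1_000},
}


def total_impact(losses: dict) -> int:
    """Sum the estimated losses across all impact categories."""
    return sum(losses.values())


def rank_by_impact(results: dict) -> list:
    """Order activities/assets from greatest to smallest estimated loss.

    No likelihood is used: estimating probability is left to risk assessment.
    """
    return sorted(results, key=lambda name: total_impact(results[name]), reverse=True)


print(rank_by_impact(bia))
```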
The Risk Matrix: likelihood vs impact
• Major risks to the DC (top-right square): unacceptable. Eliminate or postpone, transfer, monitor & mitigate + management ownership + DO BCM (governance)
• Contingency risks to the DC (top-left square): very rare events that would extinguish the business; complacency + overconfidence can distract management from the consequences. DO BCM (governance)
• High-incidence, low-impact risks & minor risks: the order of the day, operational matters
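The regions of the risk matrix can be expressed as a simple classification rule. The sketch assumes likelihood and impact are each scored from 1 (low) to 5 (high) and that a score of 3 or more counts as "high"; the slide defines neither the scale nor the threshold. Impact is taken as the vertical axis, so "top-left" means high impact and low likelihood. The function name and response wording are illustrative.

```python
RESPONSES = {
    "major": "unacceptable: eliminate or postpone, transfer, monitor & mitigate; "
             "management ownership; do BCM",
    "contingency": "very rare but business-ending: do BCM, guard against complacency",
    "high-incidence": "order of the day: handle as an operational matter",
    "minor": "handle as an operational matter",
}


def classify(likelihood: int, impact: int, threshold: int = 3) -> str:
    """Return the risk-matrix region for scores on an assumed 1-5 scale."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("scores must be between 1 and 5")
    high_likelihood = likelihood >= threshold
    high_impact = impact >= threshold
    if high_likelihood and high_impact:
        return "major"           # top-right
    if high_impact:
        return "contingency"     # top-left: rare but severe
    if high_likelihood:
        return "high-incidence"  # bottom-right: frequent, low impact
    return "minor"               # bottom-left


for l, i in [(4, 5), (1, 5), (5, 1), (1, 1)]:
    region = classify(l, i)
    print(f"likelihood={l}, impact={i}: {region} -> {RESPONSES[region]}")
```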
Risk Assessment Phase
RA considers what, why, where and who can and would cause the DC the disastrous losses identified in the BIA. RA seeks to understand threats to valued service/DC assets.
• RA takes a primarily external focus (world, political, economic, IT-sector): it looks at sources and causes
• RA identifies threat scenarios to DC value-creating activities & assets
• RA estimates the likelihood (probability) of occurrence of the DC losses & asset impairment identified in the BIA (estimates or guesstimates)
Note: DC activities & assets need to be located in the slots of the Risk Matrix for the DC, so that protection of the value-creating capability can be planned.
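Risk assessment adds the likelihood that the BIA deliberately leaves out. One common way to combine the two, used here as an assumption and not as the method on the slide, is to rank threat scenarios by expected loss (estimated annual likelihood × BIA impact). The scenario names, probabilities and losses below are hypothetical guesstimates.

```python
from dataclasses import dataclass


@dataclass
class ThreatScenario:
    name: str
    asset: str
    annual_likelihood: float  # estimated probability of occurrence per year (0-1)


# Hypothetical BIA output: total estimated loss if each asset is impaired
bia_impact = {"ERP hosting": 175_000, "e-mail hosting": 37_000}

scenarios = [
    ThreatScenario("DDoS on customer portal", "ERP hosting", 0.20),
    ThreatScenario("flood in server room", "ERP hosting", 0.01),
    ThreatScenario("worm infection", "e-mail hosting", 0.50),
]


def expected_loss(s: ThreatScenario) -> float:
    """Likelihood (from RA) multiplied by consequence (from BIA)."""
    return s.annual_likelihood * bia_impact[s.asset]


for s in sorted(scenarios, key=expected_loss, reverse=True):
    print(f"{s.name}: {expected_loss(s):,.0f}")
```

A pure expected-loss ranking pushes rare but business-ending events (the flood here) to the bottom, which is why the risk matrix keeps a separate contingency region for them rather than relying on a single score.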
BC Planning & Design Phase: business recovery
DC recovery options are driven by business requirements, regulation, contracts & SLAs. A typical recovery-planning sequence:
1. Set the MTPD objective for the DC/for each client
2. Assign RTOs for individual clients/services/applications/data on the basis of priority/SLA
3. Evaluate alternative management & technical recovery options
4. Perform a cost/benefit analysis for each option (re-plan/iterate)
5. Decide: CEO/CFO (i) approval, (ii) budget, (iii) tasking individuals
"Failure to prepare is preparing to fail." - John Wooden
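Steps 2 to 4 amount to matching each RTO with the cheapest option able to meet it. The sketch assumes each candidate option has been characterised by an achievable recovery time and an annual cost; the option names follow the common mirror/hot/warm/cold site tiers, but the hours, costs and service names are invented placeholders, not benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    recovery_hours: float  # achievable time to restore service
    annual_cost: float     # cost of maintaining the option


# Hypothetical figures for illustration only
OPTIONS = [
    RecoveryOption("mirror", 0.25, 500_000),
    RecoveryOption("hot site", 4, 200_000),
    RecoveryOption("warm site", 48, 60_000),
    RecoveryOption("cold site", 168, 15_000),
]


def cheapest_option(rto_hours: float, options=OPTIONS):
    """Return the lowest-cost option whose recovery time meets the RTO, or None."""
    feasible = [o for o in options if o.recovery_hours <= rto_hours]
    return min(feasible, key=lambda o: o.annual_cost, default=None)


# Hypothetical RTOs assigned by priority/SLA (step 2)
rtos = {"Client A / ERP": 2, "Client B / e-mail": 24, "Client C / archive": 240}
for service, rto in rtos.items():
    choice = cheapest_option(rto)
    print(service, "->", choice.name if choice else "no option meets RTO")
```

In practice the cost/benefit step would also weigh each option's cost against the BIA losses it avoids, and the CEO/CFO decision in step 5 may send the result back for re-planning.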
BC Planning & Design Phase: business recovery (2)
Some business recovery options for the DC:
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• an alternative IT Call Centre/Data Centre
• business decision: on-shore, near-shore or offshore fallback/redundancy options
A typical recovery sequence:
1. Initial emergency response
2. Resume mission-critical applications/services
3. Resume non-critical applications/services
4. Restore to the primary site: full services, verify stability
Ongoing communication with DC employees, clients and regulators
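The recovery sequence can be turned into an ordered work list. The sketch assumes each application/service is flagged as mission-critical or not and carries an RTO; mission-critical items come first and, within each group, the shortest RTO goes first. The tie-break rule and the sample services are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Hypothetical service record; priority flag and RTO come from BIA/SLA work
    name: str
    mission_critical: bool
    rto_hours: float


def recovery_order(services):
    """Mission-critical services first; within each group, shortest RTO first."""
    return sorted(services, key=lambda s: (not s.mission_critical, s.rto_hours))


services = [
    Service("intranet", False, 72),
    Service("payments API", True, 2),
    Service("e-mail", False, 24),
    Service("ERP", True, 8),
]

steps = ["initial emergency response"]
steps += [f"resume {s.name}" for s in recovery_order(services)]
steps += ["restore to primary site, verify stability"]
for number, step in enumerate(steps, start=1):
    print(number, step)
```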
BC Planning & Design: technical recovery
DC recovery options, by increasing MTPD/RTO & decreasing cost:
• Mirror: full redundancy; short-distance/widely-separated Data Centres; latency considerations
• Hot: fully equipped fallback Data Centre
• Warm: fallback Data Centre missing key components
• Cold: empty fallback Data Centre
Note: on-shore, near-shore or offshore is a business decision.
Note: Mirror/Hot/Warm/Cold sites may be (i) owned, (ii) shared-space, (iii) reciprocal agreements, (iv) outsourced: a business issue.
Note: high-end BC for DCs: active-active … hot failover …
Note: entry-point BC for DCs: electronic vaulting, remote journaling, transaction logging
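The entry-point techniques (electronic vaulting, remote journaling, transaction logging) share one idea: restore the last vaulted copy, then replay the logged transactions made after it. The in-memory sketch below assumes a simple key-value store and a journal of timestamped writes shipped off-site; the function name and data are hypothetical, and real DBMS journaling and log-shipping mechanisms are product-specific and far more involved.

```python
from datetime import datetime


def recover(vault_copy: dict, vault_time: datetime, journal: list, recover_to: datetime) -> dict:
    """Rebuild state: start from the vaulted copy, then replay journaled writes.

    journal: list of (timestamp, key, value) tuples, assumed shipped off-site.
    Only writes after the vault copy and up to `recover_to` are applied.
    """
    state = dict(vault_copy)  # never modify the vaulted copy itself
    for ts, key, value in sorted(journal, key=lambda entry: entry[0]):
        if vault_time < ts <= recover_to:
            state[key] = value
    return state


vault_time = datetime(2009, 10, 13, 0, 0)
vault = {"acct-1": 100, "acct-2": 50}
journal = [
    (datetime(2009, 10, 13, 9, 0), "acct-1", 120),
    (datetime(2009, 10, 13, 11, 0), "acct-2", 75),
    (datetime(2009, 10, 13, 15, 0), "acct-1", 90),  # after the chosen recovery point
]

restored = recover(vault, vault_time, journal, recover_to=datetime(2009, 10, 13, 12, 0))
print(restored)
```

The journal, not the vault copy alone, determines how close to the moment of failure the data can be brought back, which is how these techniques help meet a tight RPO cheaply.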
Phases: Exercise & Testing, Maintain & Review, Awareness & Training
Maintain & Review Phase
Feedback from testing/audit: broaden/deepen the DC BC plan
Awareness & Training Phase
It's never enough: institutionalize BC practice, embed it into DC culture
"The proof of the pudding is in the eating"
Exercise & Testing Phase
Give yourself an intentional DC disaster:
1. Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios
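Testing "alternative risk (threat + vulnerability) scenarios" can start from a simple enumeration of threat and vulnerability pairs, rotated through the exercise types listed above. The sketch below is one hypothetical way to draft such a programme; the threats and vulnerabilities are placeholders that would come from the risk assessment, and a planner would prune implausible pairs.

```python
from itertools import cycle, product

# Hypothetical inputs; a real list would come from the risk assessment
threats = ["power failure", "DDoS", "flooding"]
vulnerabilities = ["single UPS", "untested failover"]
test_types = ["walk-through", "checklist", "role-playing/simulation", "full interruption"]


def build_exercise_programme(threats, vulnerabilities, test_types):
    """Pair every threat with every vulnerability and rotate through test types."""
    scenarios = product(threats, vulnerabilities)
    return [
        (test, f"{threat} + {vuln}")
        for test, (threat, vuln) in zip(cycle(test_types), scenarios)
    ]


for test, scenario in build_exercise_programme(threats, vulnerabilities, test_types):
    print(f"{test}: {scenario}")
```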
An example disaster: mass cyber-attack
A mass cyber-attack may involve prior sniffing … social engineering & a widespread attack.
An attack may be blended: denial-of-service and/or widespread virus/worm infection and/or penetration (data theft, data corruption) and/or fraud/blackmail.
Incident/Problem Management (recall ISO 20000, ITIL)
Source: M. Benyoucef, University of Ottawa
An example DC disaster: BCM response to a cyber-attack
When prevention fails and an attack is under way, the BCM response is: Interdict, Contain, Recover, Analyse.
1. Interdict: actions to block the attack and halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair/fix damage; bring all systems, data and applications/services back to normality
4. Analyse: investigation/audit, examination of evidence
This sequence applies to DoS/DDoS, infection & penetration cyber-attacks.
Note: after the BCM response, seek prevention: (i) lessons learnt, (ii) improve DC process/systems resilience, (iii) heighten DC emergency preparedness.
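The four-step response is a fixed sequence. The sketch below models it as a small state machine that refuses to skip or repeat a step, for example starting Recover before Contain is complete; the class and method names are hypothetical and not taken from any incident-management tool.

```python
from enum import Enum


class Phase(Enum):
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair damage, return systems/data/services to normality
    ANALYSE = 4    # investigation/audit, examination of evidence


class IncidentResponse:
    """Hypothetical tracker that enforces Interdict -> Contain -> Recover -> Analyse."""

    def __init__(self) -> None:
        self.completed: list = []

    @property
    def closed(self) -> bool:
        return len(self.completed) == len(Phase)

    def complete(self, phase: Phase) -> None:
        """Mark a phase done; refuse to skip or repeat phases."""
        if self.closed:
            raise RuntimeError("response already closed")
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            raise RuntimeError(f"cannot complete {phase.name} before {expected.name}")
        self.completed.append(phase)


ir = IncidentResponse()
ir.complete(Phase.INTERDICT)
try:
    ir.complete(Phase.RECOVER)  # skipping containment is rejected
except RuntimeError as err:
    print(err)
```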
Structure & Content of BS25999
A DC may go beyond organising a BCP project: it may decide to formalise its BCM assurance process by embracing the BS25999 standard.
BS25999: holistic structure & content
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture
Some takeaways (1)
The Swiss cheese model of business resilience
BCM thinking & action:
• decrease the number of holes, both management & technical,
• decrease the size of the holes, and
• keep them moving so they are never aligned
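The Swiss cheese model can be made concrete with a little probability. If each defensive layer (management or technical) lets an incident through with some probability, and the holes in different layers are independent ("never aligned"), the chance that an incident passes every layer is the product of those probabilities. If the holes line up perfectly, the chance is only as low as the best single layer. The independence assumption, function names and numbers below are illustrative, not from the talk.

```python
from math import prod


def breach_probability_independent(hole_probabilities):
    """Chance an incident passes every layer when holes are independent."""
    return prod(hole_probabilities)


def breach_probability_aligned(hole_probabilities):
    """Chance an incident passes every layer when holes line up perfectly."""
    return min(hole_probabilities)


layers = [0.1, 0.2, 0.3]          # hypothetical per-layer pass-through probabilities
smaller_holes = [0.05, 0.1, 0.3]  # after shrinking the holes in two layers

print(breach_probability_independent(layers))
print(breach_probability_independent(smaller_holes))
print(breach_probability_aligned(layers))
```

Shrinking holes lowers each factor, adding layers adds factors, and keeping holes "moving" is what justifies multiplying at all instead of falling back to the aligned case.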
Some takeaways (2)
BCM is a concern of:
• CIOs, CTOs, Data Centre Managers
• Quality Managers, Infosec Managers
• IT Architects, DBAs
• IT Auditors
• CEOs, CFOs, because it's governance
BCM addresses broad, non-specific risks: it is a "catch-all".
BCM is probably the most cost-effective security control of all.
BCM: start now, start small, scale fast; broaden/deepen the BCM scope incrementally.
"You must be able to respond to your circumstances as they exist, not as you would like them to be." - Brian Billick
Sources in and around BCM for Data Centres
• The Business Continuity Institute (BCI, UK): http://www.thebci.org
• Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
• A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
• Agility & alertness … Continuity Central: http://www.continuitycentral.com/uk.htm
• BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
• Risk IT & COBIT frameworks (ISACA): www.isaca-malta.org
• SFIA (HR framework for IT), ISEB (BCS): http://www.bcs.org
• SEI (USA): http://www.sei.cmu.edu
• Uptime Institute (USA): http://www.uptimeinstitute.org
Questions and Discussion
Thank You
Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt
BCM in Data Centres BCM for Data Centres
360 e-continuity amp e-assurance
business functions of commercial IT providers(eg CEOrsquos office Procurement Marketing amp Sales HR Facilities
Planning QA)
IT processes eg IncidentProblem ConfigurationChange
AvailabilityCapacity
Data
Employees including IT people
Physical FacilitiesBuilding Networks VoIP
Computational and Storage
Servers SANNAS
Software stackOS DBMS middleware applications
ApplicationsServicesB2BB2C e-SCMERPe-CRM BIKM IntranetsExtranets e-mail
BCM for data centres 360
robertmcachia at umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure amp Content of BS25999
A DC may go beyond organising a BCP project
It may decide to formalise its BCM assurance process by embracing the
BS25999 standard
BS25999 holistic structure amp content
bull overview of business continuity management (BCM)
bull business continuity management policy
bull BCM programme management
bull understanding the organization
bull determining business continuity strategy
bull developing and implementing a BCM response
bull exercising maintaining and reviewing BCM arrangements
bull embedding BCM in the organizationrsquos culture
BS25999 Holistic
robertmcachia at umedumt
Some takeaways (1)
BCM thinking amp action
bull decrease the number of
holes both management amp
technical
bull decrease the size of the
holes and
bull keep them moving so they
are never aligned
The Swiss cheese model of business resilience
robertmcachia at umedumt
Some takeaways (2)
BCM is a concern of
CIOs CTOs Data Centre Managers
Quality Managers Infosec Managers
IT Architects DBAs
IT Auditors
CEOs CFOs because itrsquos Governance
BCM addresses broad non-specific risks it is a ldquocatch allrdquo
BCM is probably the most cost-effective security control of all
BCM start now start small scale fast broadendeepen BCM scope
incrementally
You must be able to respond to your circumstances as they exist - not as you
would like them to be - Brian Billick
robertmcachia at umedumt
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI UK) httpwwwthebciorg
Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK
Cabinet Office
A primer on ITIL ISO 20000 - IT Service Management
httpwwwisaca-
maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02
_2009pdf
Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm
BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines
s-Continuity
RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg
SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg
SEI (USA) httpwwwseicmuedu
Uptime Institute (USA) httpwwwuptimeinstituteorg
robertmcachia at umedumt
Questions and Discussion
robertmcachia atldquo umedumt
Thank You
Dr Robert M Cachia
httpwwwlinkedincominrobertmcachia
robertmcachia ldquoatrdquo umedumt
BCM DC initiative output a BC Plan + organizational capability
A good BC plan is articulated by components and views
With a good BC plan the CIOCTOData Centre Manager can tell the
CEO
We haveare executing plans by DC clientmarket segment hellip
We haveare executing plans by Applicationby Service hellip
We haveare executing plans by DC Departmenthellip
We haveare executing plans by Buildingby Floor hellip
BC Plan BC Planning (i) What (ii) How (iii) Who (iv) When
robertmcachia at umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure amp Content of BS25999
A DC may go beyond organising a BCP project
It may decide to formalise its BCM assurance process by embracing the
BS25999 standard
BS25999 holistic structure amp content
bull overview of business continuity management (BCM)
bull business continuity management policy
bull BCM programme management
bull understanding the organization
bull determining business continuity strategy
bull developing and implementing a BCM response
bull exercising maintaining and reviewing BCM arrangements
bull embedding BCM in the organizationrsquos culture
BS25999 Holistic
robertmcachia at umedumt
Some takeaways (1)
BCM thinking amp action
bull decrease the number of
holes both management amp
technical
bull decrease the size of the
holes and
bull keep them moving so they
are never aligned
The Swiss cheese model of business resilience
robertmcachia at umedumt
Some takeaways (2)
BCM is a concern of
CIOs CTOs Data Centre Managers
Quality Managers Infosec Managers
IT Architects DBAs
IT Auditors
CEOs CFOs because itrsquos Governance
BCM addresses broad non-specific risks it is a ldquocatch allrdquo
BCM is probably the most cost-effective security control of all
BCM start now start small scale fast broadendeepen BCM scope
incrementally
You must be able to respond to your circumstances as they exist - not as you
would like them to be - Brian Billick
robertmcachia at umedumt
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI UK) httpwwwthebciorg
Guidancestudies on BCM by ENISA (EU) ECB (EU) CMI (UK) FSA (UK) UK
Cabinet Office
A primer on ITIL ISO 20000 - IT Service Management
httpwwwisaca-
maltaorgdocumentspresentationsIT_Service_Management_Robert_M_Cachia_26_02
_2009pdf
Agility amp alertness hellipcontinuitycentral httpwwwcontinuitycentralcomukhtm
BS25999 (British Standards Institution) httpwwwbsigroupcomensectorsandservicesDisciplinesBusines
s-Continuity
RiskIT amp CobIT frameworks (ISACA) wwwisaca-maltaorg
SFIA HR framework for IT ISEB (BCS) httpwwwbcsorg
SEI (USA) httpwwwseicmuedu
Uptime Institute (USA) httpwwwuptimeinstituteorg
robertmcachia at umedumt
Questions and Discussion
robertmcachia atldquo umedumt
Thank You
Dr Robert M Cachia
httpwwwlinkedincominrobertmcachia
robertmcachia ldquoatrdquo umedumt
DC Business Continuity Planning management choices
Which DC clientsservicesapplications get restored first and who will
wait longer hellip recovery sequence for each client
Which servicesapplications get restored completely which clients will
initially only get limited service
MTPD maximum tolerable period of disruption the period of time by
which all identified systemsapplicationsservices and data
must be restored to normality (for DC for each client)
RTO recovery time objective the period in time within which
assetsservices must be recovered after disruption (for DC for
each client)
RTO affects the recovery option shorter RTO -gt more difficult
more expensive (for DC for each client)
RPO recovery point objective the point in time to which
assetsservices must be restored after disruption (for DC for
each client)
RPO affects the volume of data that may need to be restored
robertmcachia at umedumt
BCM initiative requires a dedicated ProjectProgramme
A BC initiative may adopt Prince 2 or PMBOK projectmanagement methods The typical BCM phases
BC Project Initiation Phase
(mandate BCM scope DC BCM objectives responsibilitiesapproach awareness and BC project outcomesdeliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning amp Design Phase
Exercise amp Testing Phase
Maintenance amp Review Phase
Awareness amp Training Phase
robertmcachia at umedumt
Business Impact Analysis (BIA) Phase
BIA achieves understanding of the DC its users activities
servicesrevenue streams applications data liabilities
bull BIA takes an internal focus - it identifies critical DC value-creatingactivities and assets
bull BIA quantifies DC loss due to disruptionimpairment of assets(reputational financial lost revenue cost of recovery)
bull BIA does not estimate the probability of types of DC incidents - itquantifies the consequences
bull BIA techniques questionnaires + interviews
BIA answers the questions which DC activities amp
assets create most value hellip of all potential DC losses
which impaired activities amp assets would be the
greatest losscascading impacts
multiple DC failure
robertmcachia at umedumt
The Risk Matrix - likelihood vs impact
Major Risks to DC - top-right square
unacceptable Eliminate or Postpone
Transfer Monitor amp Mitigate +
management ownership + DO BCM -Governance
Contingency Risks to DC - top-left
square very rare events will
extinguish the business complacency +
overconfidence can distract
management from the consequences
DO BCM - Governance
High incidence and low impact risksamp
minor risks order of the day -
operational matters
robertmcachia at umedumt
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
BC Planning & Design Phase - business recovery (2)
Some business recovery options for the DC
• an Emergency Operations Office/Centre
• virtual workplace/teleworking from home for selected DC employees
• alternative IT Call Centre/Data Centre
• business decision - on-shore, near-shore or offshore fallback/redundancy options
A typical recovery sequence
1. Initial emergency response
2. Resume mission-critical applications/services
3. Resume non-critical applications/services
4. Restoration to primary site, full services, verify stability
Ongoing communication - DC employees, clients, regulators
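The four-step sequence can be turned into an ordered runbook. The sketch below assumes each service is already tagged as mission-critical or not and carries an RTO (both hypothetical inputs here); ordering services by RTO within each group is an assumption, not something the slide prescribes.

```python
from typing import NamedTuple


class Svc(NamedTuple):
    name: str
    critical: bool
    rto_hours: float


# Stakeholders for ongoing communication, as listed on the slide.
STAKEHOLDERS = ("DC employees", "clients", "regulators")


def recovery_sequence(services: list[Svc]) -> list[str]:
    """Order recovery work: emergency response, then mission-critical
    services, then non-critical ones, then restoration of the primary site.
    Within each group, services with the shortest RTO come first."""
    by_rto = sorted(services, key=lambda s: s.rto_hours)
    steps = ["1. initial emergency response"]
    steps += [f"2. resume mission-critical: {s.name}" for s in by_rto if s.critical]
    steps += [f"3. resume non-critical: {s.name}" for s in by_rto if not s.critical]
    steps.append("4. restore primary site, full services, verify stability")
    return steps


if __name__ == "__main__":
    services = [Svc("web portal", True, 4), Svc("mail", False, 24), Svc("core database", True, 1)]
    for step in recovery_sequence(services):
        # Communication runs alongside every step rather than as a step of its own.
        print(f"{step} | inform: {', '.join(STAKEHOLDERS)}")
```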
BC Planning & Design - technical recovery
Note: business decision - on-shore, near-shore, offshore
Note: Mirror/Hot/Warm/Cold may be (i) own, (ii) shared-space, (iii) reciprocal agreements, (iv) outsourced - a business issue
Note: high-end BC for DCs - active-active … hot failover …
Note: entry-point BC for DCs - electronic vaulting, remote journaling, transaction logging
DC recovery options, by increasing MTPD/RTO & decreasing cost:
• Mirror (mirroring): full redundancy; short-distance/widely-separated Data Centres (latency considerations)
• Hot: fully equipped fallback Data Centre
• Warm: fallback Data Centre missing key components
• Cold: empty fallback Data Centre
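Entry-point options such as remote journaling and transaction logging rest on one idea: record every change in order, ship the record off-site, and rebuild state at the fallback site by replaying it. The toy sketch below shows that idea only; `JournaledStore` and `replay` are hypothetical names, the journal is kept in memory, and "shipping" is simulated by copying a list.

```python
import json


class JournaledStore:
    """Toy key-value store that journals each change before applying it."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.journal: list[str] = []  # stands in for the remotely shipped log

    def put(self, key: str, value: str) -> None:
        # Log first, then apply, so the journal never lags the data.
        self.journal.append(json.dumps({"op": "put", "key": key, "value": value}))
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.journal.append(json.dumps({"op": "delete", "key": key}))
        self.data.pop(key, None)


def replay(journal: list[str]) -> dict[str, str]:
    """Rebuild state at the fallback site by replaying journal entries in order."""
    state: dict[str, str] = {}
    for line in journal:
        entry = json.loads(line)
        if entry["op"] == "put":
            state[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            state.pop(entry["key"], None)
        else:
            raise ValueError(f"unknown operation {entry['op']!r}")
    return state


if __name__ == "__main__":
    primary = JournaledStore()
    primary.put("order:1", "paid")
    primary.put("order:2", "pending")
    primary.delete("order:1")
    remote_copy = list(primary.journal)  # simulated shipment to the fallback site
    print(replay(remote_copy) == primary.data)
```

Any entries not yet shipped when disaster strikes are lost, so how often the journal leaves the primary site matters as much as the journaling itself.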
Phases: Exercise & Testing, Maintain & Review, Awareness & Training
Exercise & Testing Phase
Give yourself an intentional DC disaster:
1. Re-test the BCP regularly (walk-through, checklist, role-playing/simulation, full interruption & rehearsal)
2. Test alternative risk (threat + vulnerability) scenarios
"The proof of the pudding is in the eating"
Maintain & Review Phase
Feedback from testing/audit: broaden/deepen the DC BC plan
Awareness & Training Phase
It's never enough: institutionalize BC practice, embed it into DC culture
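Step 2 of Exercise & Testing suggests enumerating scenarios rather than rehearsing a single favourite one. A minimal way to do that, assuming hypothetical lists of threats and vulnerabilities, is to pair every threat with every vulnerability and rotate through the exercise types in the order the slide lists them. `build_exercise_plan` is an illustrative name, and the rotation is only one possible scheduling choice.

```python
from itertools import cycle, product

# Exercise types in the order listed on the slide.
EXERCISE_TYPES = (
    "walk-through",
    "checklist",
    "role-playing/simulation",
    "full interruption & rehearsal",
)


def build_exercise_plan(threats: list[str],
                        vulnerabilities: list[str]) -> list[tuple[str, str, str]]:
    """Return (threat, vulnerability, exercise type) triples covering
    every threat + vulnerability combination once."""
    scenarios = product(threats, vulnerabilities)
    # zip stops when the finite scenario list runs out; cycle repeats the types.
    return [(threat, vuln, kind)
            for (threat, vuln), kind in zip(scenarios, cycle(EXERCISE_TYPES))]


if __name__ == "__main__":
    plan = build_exercise_plan(
        ["flood", "DDoS attack"],
        ["single power feed", "no off-site backup"],
    )
    for threat, vuln, kind in plan:
        print(f"{kind}: {threat} + {vuln}")
```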
An example disaster: mass cyber-attack
A mass cyber-attack may involve
prior sniffing … social engineering &
a widespread attack.
An attack may be blended: denial-of-service and/or
widespread virus/worm infection and/or
penetration (data theft, data corruption) and/or
fraud/blackmail.
Incident/Problem Management (recall ISO 20000, ITIL)
Source: M Benyoucef, University of Ottawa
An example DC disaster: BCM response to cyber-attack
When prevention fails and an attack is under way, the BCM response is:
Interdict, Contain, Recover, Analyse
1. Interdict: actions to block the attack and halt its progress
2. Contain: actions to prevent further degradation & regain control
3. Recover: repair/fix damage; bring all systems, data and applications/services back to normality
4. Analyse: investigation/audit - examination of evidence
This sequence applies to DoS/DDoS, infection & penetration cyber-attacks.
Note: after the BCM response, seek prevention: (i) lessons learnt, (ii) improve DC process/systems resilience, (iii) heighten DC emergency preparedness
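The four-step response can be modelled as a strictly ordered checklist that refuses to skip a step, for example recovering before the attack is contained. The sketch assumes a strict order for clarity, although phases may overlap in practice; `Phase` and `IncidentResponse` are hypothetical names, not part of ISO 20000 or ITIL.

```python
from enum import IntEnum


class Phase(IntEnum):
    INTERDICT = 1  # block the attack, halt its progress
    CONTAIN = 2    # prevent further degradation, regain control
    RECOVER = 3    # repair damage, return systems/data/services to normality
    ANALYSE = 4    # investigation/audit, examination of evidence


class IncidentResponse:
    """Tracks a BCM response and enforces Interdict, Contain, Recover, Analyse."""

    def __init__(self) -> None:
        self.completed: list[Phase] = []

    def complete(self, phase: Phase) -> None:
        if self.closed:
            raise RuntimeError("response already closed")
        expected = Phase(len(self.completed) + 1)
        if phase != expected:
            raise RuntimeError(f"cannot {phase.name.lower()} before {expected.name.lower()}")
        self.completed.append(phase)

    @property
    def closed(self) -> bool:
        return len(self.completed) == len(Phase)


if __name__ == "__main__":
    response = IncidentResponse()
    try:
        response.complete(Phase.RECOVER)
    except RuntimeError as err:
        print(err)  # recovering before interdiction is rejected
    for phase in Phase:
        response.complete(phase)
    print("closed:", response.closed)
```

Closing the response is the natural trigger for the prevention work in the note: lessons learnt, resilience improvements and preparedness.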
Structure & Content of BS25999
A DC may go beyond organising a BCP project:
it may decide to formalise its BCM assurance process by embracing the
BS25999 standard.
BS25999 holistic structure & content:
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture
Some takeaways (1)
The Swiss cheese model of business resilience
BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned
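The Swiss cheese picture has a simple quantitative reading: if each defensive layer independently lets a hazard through with some probability, a breach needs every layer to fail, so the probabilities multiply. Fewer and smaller holes lower each factor. In this sketch, keeping the holes moving so they never align corresponds to the independence assumption; aligned (correlated) holes would make the true breach chance higher than the product. The layer probabilities are hypothetical.

```python
from math import prod


def breach_probability(layer_pass_probs: list[float]) -> float:
    """Probability that a hazard passes every layer.

    Each value is a (hypothetical) chance that one layer's holes let the
    hazard through. Multiplying assumes the layers fail independently.
    """
    for p in layer_pass_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability {p} outside [0, 1]")
    return prod(layer_pass_probs)


if __name__ == "__main__":
    # Hypothetical layers, e.g. a process control, a technical control, a people control.
    baseline = [0.1, 0.2, 0.05]
    smaller_holes = [p / 2 for p in baseline]
    print(f"baseline:      {breach_probability(baseline):.6f}")
    print(f"smaller holes: {breach_probability(smaller_holes):.6f}")
```

Halving every factor in a three-layer stack cuts the product by a factor of eight, which is why many modest improvements across layers can pay off.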
Some takeaways (2)
BCM is a concern of:
CIOs, CTOs, Data Centre Managers,
Quality Managers, Infosec Managers,
IT Architects, DBAs,
IT Auditors,
CEOs, CFOs - because it's Governance
BCM addresses broad, non-specific risks: it is a "catch all"
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast; broaden/deepen BCM scope incrementally
"You must be able to respond to your circumstances as they exist - not as you would like them to be" - Brian Billick
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL, ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA (HR framework for IT), ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org
Questions and Discussion
Thank You
Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt
BCM initiative requires a dedicated Project/Programme
A BC initiative may adopt PRINCE2 or PMBOK project management methods. The typical BCM phases:
BC Project Initiation Phase
(mandate, BCM scope, DC BCM objectives, responsibilities, approach, awareness and BC project outcomes/deliverables)
Business Impact Analysis Phase
Risk Assessment Phase
BC Planning & Design Phase
Exercise & Testing Phase
Maintenance & Review Phase
Awareness & Training Phase
Business Impact Analysis (BIA) Phase
BIA achieves an understanding of the DC: its users, activities,
services/revenue streams, applications, data and liabilities
• BIA takes an internal focus - it identifies critical DC value-creating activities and assets
• BIA quantifies DC loss due to disruption/impairment of assets (reputational, financial, lost revenue, cost of recovery)
• BIA does not estimate the probability of types of DC incidents - it quantifies the consequences
• BIA techniques: questionnaires + interviews
BIA answers the questions: which DC activities & assets create most value … of all potential DC losses,
which impaired activities & assets would be the greatest loss (cascading impacts, multiple DC failure)?
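The BIA's ranking question can be sketched as summing the loss categories listed above for each activity or asset and sorting by the total. Consistent with the slide, no probability is applied: this ranks consequences only. `ActivityImpact` and `rank_by_loss` are hypothetical names, all figures are invented, and treating reputational loss as a money figure that can be added to the others is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class ActivityImpact:
    """Hypothetical BIA record for one DC activity or asset.

    Loss categories follow the slide: reputational, financial,
    lost revenue and cost of recovery. All figures are illustrative.
    """
    name: str
    reputational: float
    financial: float
    lost_revenue: float
    recovery_cost: float

    def total_loss(self) -> float:
        # BIA quantifies consequences only; no likelihood is applied here.
        return self.reputational + self.financial + self.lost_revenue + self.recovery_cost


def rank_by_loss(impacts: list[ActivityImpact]) -> list[tuple[str, float]]:
    """Return (name, total loss) pairs, greatest loss first."""
    ranked = sorted(impacts, key=lambda a: a.total_loss(), reverse=True)
    return [(a.name, a.total_loss()) for a in ranked]


if __name__ == "__main__":
    impacts = [
        ActivityImpact("client billing database", 50_000, 120_000, 200_000, 30_000),
        ActivityImpact("internal wiki", 1_000, 0, 0, 2_000),
        ActivityImpact("hosted e-commerce platform", 80_000, 60_000, 300_000, 40_000),
    ]
    for name, loss in rank_by_loss(impacts):
        print(f"{name}: {loss:,.0f}")
```

The top of this list feeds the Risk Assessment Phase, which adds the likelihood dimension the BIA deliberately leaves out.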
Risk Assessment Phase
RA considers the whatwhywherewho can and would cause the DCthe disastrous losses identified in BIA RA seeks to understand threatsto valued serviceDC assets
bull RA takes a primarily external focus a worldpoliticaleconomic IT sector focus - it looks at sources and causes
bull RA identifies threat scenarios to DC value-creating activities ampassets
bull RA estimates likelihood (probability) of occurrence of DC lossesamp asset impairment identified in BIA (estimates orguesstimates)
Note Need to locate DC activities amp assets in the slots of the RiskMatrix for the DC and plan to protect value-creating capability
robertmcachia at umedumt
BC Planning amp Design Phase - business recovery
DC recovery options driven by business requirements
regulation contracts amp SLAs A typical recovery sequence
1 Set MTPD objective for the DCfor each
client
2 Assign RTOs for individual
clientservicesapplicationsdata on
basis of prioritySLA
3 Evaluate alternative management amp
technical recovery options
4 Perform costbenefit analysis for each
option (re-planiterate)
5 Decide CEOCFO (i) approval (ii)
budget (iii) tasking individualsFailure to prepare is preparing
to fail - John Wooden
robertmcachia at umedumt
Some business recovery options for the DC
bull an Emergency Operations OfficeCentre
bull virtual workplaceteleworking from home for selected DC employees
bull alternative IT Call CentreData Centre
bull business decision - On-shore Near-shore Offshore
fallbackredundancy options
A typical recovery sequence
1 Initial emergency response
2 Resume mission-critical ApplicationsServices
3 Resume non-critical ApplicationsServices
4 Restoration to primary site full services verify stability
Ongoing communication - DC employees clients regulators
BC Planning amp Design Phase - business recovery (2)
robertmcachia at umedumt
BC Planning amp Design - technical recovery
Note business decision On-shore Near-shore Offshore
Note Mirror Hot Warm Cold (i) own (ii) shared-space (iii) reciprocal
agreements (iv) outsourced a business issue
Note high-end BC for DCs active-active hellip hot failover hellip
Note entry-point BC for DCs electronic vaulting remote journaling
transaction logging
Mirror full redundancy short-
distancewidely-separated Data
Centers latency considerations
Hot fully equipped fallback Data Centre
Warm fallback Data Center missing key
components
Cold empty fallback Data Center
DC Recovery options by increasing MTPD RTO amp decreasing cost
mirroring
robertmcachia at umedumt
Phases Exercise amp Testing Maintain amp Review Awareness amp Training
Maintain amp Review Phase
feedback from testingaudit broadendeepen DC BC plan
Awareness amp Training Phase
its never enough institutionalize BC practice embed into
DC culture
The proof of the bdquopudding‟
is in the eating
Exercise amp Testing Phase
Give yourself an intentional DC disaster
1 Re-test BCP regularly
(walk-through checklist role-
playingsimulation full interruption amp
rehearsal)
2 Test alternative risk (threat +
vulnerability) scenarios
robertmcachia at umedumt
An example disaster mass cyber-attack
A mass cyber-attack may
involve
prior sniffing hellip
hellip social engineering amp
widespread attack
An attack may be blended
denial-of-service andor
widespread virusworm
infection andor
penetration (data theft
data corruption) andor
fraudblackmail
IncidentProblem Management
(recall ISO 20000 ITIL)source M Benyoucef
University of Ottawa
robertmcachia at umedumt
An example DC disaster BCM response to cyber-attack
This sequence applies to DoSDDoS infection amp penetration cyber-
attacks
Note After the BCM response seek prevention (i) Lessons Learnt (ii)
improve DC processsystems resilience (iii) heighten DC emergency
preparedness
1 Interdict actions to block attack halt its progress
2 Contain actions to prevent further degradation
amp regain control
3 Recover repair fix damage bring all systems
data applicationsservices to normality
4 Analyse investigationaudit - examination of
evidence
When prevention fails and an attack is under way the BCM
response is
Interdict Contain
Recover Analyse
robertmcachia at umedumt
Structure & Content of BS25999
A DC may go beyond organising a BCP project.
It may decide to formalise its BCM assurance process by embracing the BS25999 standard.
BS25999 holistic structure & content:
• overview of business continuity management (BCM)
• business continuity management policy
• BCM programme management
• understanding the organization
• determining business continuity strategy
• developing and implementing a BCM response
• exercising, maintaining and reviewing BCM arrangements
• embedding BCM in the organization's culture
Some takeaways (1)
BCM thinking & action:
• decrease the number of holes, both management & technical
• decrease the size of the holes, and
• keep them moving so they are never aligned
The Swiss cheese model of business resilience
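One rough way to read the Swiss cheese model numerically: if each defensive layer fails independently with some probability, a disruption gets through only when the holes in every layer line up, so the chance is the product of the per-layer failure probabilities. Fewer and smaller holes lower each layer's probability; independence is a simplifying assumption that corresponds loosely to "keep them moving so they are never aligned". The layers and probabilities below are hypothetical.

```python
from math import prod


def breach_probability(layer_failure_probs: list[float]) -> float:
    """Chance that every layer fails at once, assuming layers fail independently."""
    for p in layer_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid probability: {p}")
    # With independent layers, the holes line up only if all layers fail together.
    # An empty list means no defences at all, so the result is 1 (certain breach).
    return prod(layer_failure_probs)


if __name__ == "__main__":
    # Hypothetical per-layer failure probabilities (management, technical, fallback site).
    baseline = [0.1, 0.2, 0.3]
    smaller_holes = [p / 2 for p in baseline]  # decrease the size of the holes
    for label, layers in [("baseline", baseline), ("smaller holes", smaller_holes)]:
        print(f"{label}: {breach_probability(layers):.5f}")
```

Halving every layer's failure probability cuts the product by a factor of 2³ = 8 for three layers. If the holes are correlated, so that layers tend to fail together, the product understates the real risk.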
Some takeaways (2)
BCM is a concern of:
CIOs, CTOs, Data Centre Managers,
Quality Managers, Infosec Managers,
IT Architects, DBAs,
IT Auditors,
CEOs, CFOs, because it's Governance
BCM addresses broad, non-specific risks: it is a "catch all"
BCM is probably the most cost-effective security control of all
BCM: start now, start small, scale fast, broaden/deepen BCM scope incrementally
"You must be able to respond to your circumstances as they exist - not as you would like them to be." - Brian Billick
Sources in and around BCM for Data Centres
The Business Continuity Institute (BCI, UK): http://www.thebci.org
Guidance/studies on BCM by ENISA (EU), ECB (EU), CMI (UK), FSA (UK), UK Cabinet Office
A primer on ITIL/ISO 20000 - IT Service Management: http://www.isaca-malta.org/documents/presentations/IT_Service_Management_Robert_M_Cachia_26_02_2009.pdf
Agility & alertness … continuitycentral: http://www.continuitycentral.com/uk.htm
BS25999 (British Standards Institution): http://www.bsigroup.com/en/sectorsandservices/Disciplines/Business-Continuity
RiskIT & CobIT frameworks (ISACA): www.isaca-malta.org
SFIA HR framework for IT, ISEB (BCS): http://www.bcs.org
SEI (USA): http://www.sei.cmu.edu
Uptime Institute (USA): http://www.uptimeinstitute.org
Questions and Discussion
Thank You
Dr Robert M Cachia
http://www.linkedin.com/in/robertmcachia
robertmcachia "at" um.edu.mt
robertmcachia ldquoatrdquo umedumt