Business Continuity: Ensuring Survival
Ron LaPedis, CBCP, CISSP
Sr. Product Manager, Compaq
Agenda
Continuity planning? I thought it was called disaster recovery…
Why?
Professional practices
Continuity planning model
Step by step
Horror stories
Food for thought
Some people never learn…
11/30/89: Crane Collapse Closes Buildings (over 1 month after the Loma Prieta earthquake)
…for 10 minutes…her job was to race through work areas and scoop up appointment books, payroll records and Rolodexes needed to carry on business elsewhere… Many tenants’ main concern was getting payroll checks…phone lists and calendars
Source: San Francisco Chronicle
Something happens
[Chart: productivity (single department or multiple departments) over time; a disaster event occurs, followed by business process loss.]
Source: DRII
Disaster recovery
[Chart: productivity over time; a disaster event occurs, followed by business process loss.]
Source: DRII
Continuity planning
[Chart: productivity over time; a disaster event occurs, followed by business process loss.]
Source: DRII
Why?
Downtime is lost revenue
Industry | Application | Average cost per hour of downtime (US$)
Financial | Brokerage operations | $7,840,000
Financial | Credit card sales | $3,160,000
Media | Pay-per-view | $183,000
Retail | Home shopping (TV) | $137,000
Retail | Catalog sales | $109,000
Transportation | Airline reservations | $108,000
Entertainment | Tele-ticket sales | $83,000
Shipping | Package shipping | $34,000
Financial | ATM fees | $18,000
Source: Contingency Planning Research, 2000
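To make the hourly figures above concrete, here is a minimal Python sketch that estimates the loss for an outage of a given length. It assumes, purely for illustration, that cost accrues linearly with downtime; the name `outage_cost` is hypothetical, and only the dollar figures come from the slide.

```python
# Average cost per hour of downtime (US$), copied from the slide above.
HOURLY_COST_USD = {
    "Brokerage operations": 7_840_000,
    "Credit card sales": 3_160_000,
    "Pay-per-view": 183_000,
    "Home shopping (TV)": 137_000,
    "Catalog sales": 109_000,
    "Airline reservations": 108_000,
    "Tele-ticket sales": 83_000,
    "Package shipping": 34_000,
    "ATM fees": 18_000,
}


def outage_cost(application: str, minutes: float) -> float:
    """Estimate the loss for an outage, assuming cost accrues linearly (an assumption)."""
    return HOURLY_COST_USD[application] * minutes / 60


# A 30-minute outage of credit card sales.
print(outage_cost("Credit card sales", 30))  # 1580000.0
```

Real losses are rarely linear (a short outage at peak time can cost more than a long one overnight), so treat the result as an order-of-magnitude estimate.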
Downtime is not acceptable
Time zones are no longer a barrier for conducting business
If your site is down, your competition is one click away
– Utility failure
– Communications failure
– System failure
– Application failure
– OS failure
– Utility upgrade
– Communications upgrade
– System upgrade
– Application upgrade
– OS upgrade
And what about system and database maintenance?
Downtime is controllable
System and network architecture
– High-availability systems
– Redundant network
– Hardened primary site
– Remote backup site
Continuity planning
– Know what you will do before you need to do it
Continuity planning perspective
Ensures that an event doesn’t become a disaster
Covers a broad spectrum of business and technology issues
The key goal:
– Required business process availability
Disaster Recovery Institute International (DRII)
Mission
DRII’s mission is to provide the leadership and best practices that serve as a base of common knowledge for all business continuity and disaster recovery planners and organizations in the industry.
DRII’s professional practices
Pre-planning
1. Project initiation and management
2. Risk evaluation and control
3. Business impact analysis
Planning
4. Developing business continuity strategies
5. Emergency response and operations
6. Developing and implementing business continuity plans
Post-planning
7. Awareness and training programs
8. Maintaining and exercising business continuity plans
9. Public relations and crisis communication
10. Coordination with public authorities
DRII’s business continuity planning model
1. Project initiation phase
2. Functional requirements phase
3. Design and development phase
4. Implementation phase
5. Testing and exercise phase
6. Maintenance and update phase
7. Execution phase
It’s a process
[Diagram of the business continuity process: start, project initiation, functional requirements, design and development, implementation, testing and exercising, maintenance and updating. Other labels: procedures, required availability times.]
Source: DRII
Project initiation phase
Management commitment and policies
Objectives and requirements
Baseline assumptions
Project management
Teams
– Delphi – Business function knowledge
– Corporate team – Infrastructure / common activities
– EMT – Emergency Management Team ‘the workers’
– CMT – Crisis Management Team ‘the decision makers’
Project initiation phase
Project management
CP is a process consisting of programs and projects
It does not take a subject matter expert to manage projects; it takes a project manager
Use your CP experts to perform CP activities, not to manage projects.
[Diagram: the business continuity process, with a “You are here” marker on the current phase.]
Source: DRII
Functional requirements phase
Fact gathering, alternatives and decisions
Risk analysis and controls
Business impact analysis
– RTO – Recovery Time Objective – How fast
– RPO – Recovery Point Objective – How much
Alternative strategies
Cost benefit analysis and budgeting
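As an illustration of the two objectives above, here is a small Python sketch that checks an incident against them. It assumes that "how fast" means the time from failure to restored service, and "how much" means the window of data written since the last recoverable copy. The function names and the example times are hypothetical.

```python
from datetime import datetime, timedelta


def meets_rto(failure: datetime, restored: datetime, rto: timedelta) -> bool:
    """RTO (how fast): was service restored within the allowed time?"""
    return restored - failure <= rto


def meets_rpo(last_copy: datetime, failure: datetime, rpo: timedelta) -> bool:
    """RPO (how much): is the window of lost data within the allowed amount?"""
    return failure - last_copy <= rpo


# Hypothetical incident: nightly backup at 02:00, failure at 14:00, service back at 17:00.
last_copy = datetime(2001, 6, 1, 2, 0)
failure = datetime(2001, 6, 1, 14, 0)
restored = datetime(2001, 6, 1, 17, 0)

print(meets_rto(failure, restored, rto=timedelta(hours=4)))   # True: 3 hours of downtime
print(meets_rpo(last_copy, failure, rpo=timedelta(hours=4)))  # False: 12 hours of data lost
```

In this example the recovery is fast enough, but a nightly backup cannot satisfy a four-hour RPO, which is the kind of gap the business impact analysis is meant to surface.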
Functional requirements phase
Risk analysis
[Diagram labels: asset inventory and definition; evaluation of controls; decision; vulnerability and threat assessment; communication and monitoring.]
Functional requirements phase
Risk analysis
Quantitative – Facts and figures, hard
– Statistical
– Actuarial
– Annualized Loss Exposure (ALE)
– Objective
Qualitative – Not calculable, soft
– Reputation
– Future market share
– Subjective
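The slide names Annualized Loss Exposure without giving a formula. A common way to compute it, assumed here, is the expected loss per event multiplied by the expected number of events per year. The function name and the dollar figures below are hypothetical.

```python
def annualized_loss_exposure(loss_per_event: float, events_per_year: float) -> float:
    """ALE as loss per event times yearly frequency (a common definition, assumed here)."""
    return loss_per_event * events_per_year


# Hypothetical threat: a $400,000 loss expected once every four years.
before = annualized_loss_exposure(400_000, 0.25)

# Hypothetical control that halves the loss per event; the threat frequency is unchanged.
after = annualized_loss_exposure(200_000, 0.25)

print(before, after)  # 100000.0 50000.0
```

The qualitative items on the slide (reputation, future market share) do not fit this calculation, which is why they are listed separately.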
Functional requirements phase
Risk analysis
Controls do not reduce the threat, they reduce the exposure (and hence, the risk)
Functional requirements phase
Business impact analysis
[Chart: money versus time to recover, with COST and LOSS curves; marked points: maximum cost of control and acceptable downtime.]
[Diagram: the business continuity process, with a “You are here” marker on the current phase.]
Source: DRII
Design and development phase
Scope and objectives
Recovery teams
Cookbook
Key disaster scenario
Escalation, notification, and activation
Design and development phase
Recovery teams
Evaluation and declaration
Notification
Emergency response
Interim processing
Salvage
Relocation/reentry
Design and development phase
Key disaster scenario
“A fire broke out in the computer room. We are unsure of the state of the computers and data stored there. The building has been shut down by the fire department until they are sure that it is safe to enter. They are estimating that we will not have access to the building for a couple of days.”
Design and development phase
Escalation, notification, and activation
Who activates the EMT?
How does the EMT get activated?
Who decides to activate the CMT?
How does the CMT get activated?
How does the CMT decide to activate the plan?
What happens if certain members of the CMT are unavailable?
[Diagram: the business continuity process, with a “You are here” marker on the current phase.]
Source: DRII
Implementation phase
Emergency response
Command and control
Designation of authority
Scripts
Vendors and resources
Implementation phase
Designation of authority
Who is in charge?
– If they are not available, who is in charge?
If they are not available, who is in charge?
– If they are not available, who is in charge?
Committees cannot be in charge!
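The repeated question above amounts to an ordered line of succession that always resolves to exactly one person. A minimal Python sketch, with hypothetical role names, might look like this:

```python
from typing import Optional, Sequence, Set


def person_in_charge(succession: Sequence[str], available: Set[str]) -> Optional[str]:
    """Return the first available person in the ordered succession list, or None."""
    for person in succession:
        if person in available:
            return person
    return None  # nobody reachable: the plan needs a longer list


# Hypothetical succession list; the roles are illustrative, not from the slides.
succession = ["Site director", "Operations manager", "IT manager", "Shift supervisor"]

print(person_in_charge(succession, {"IT manager", "Shift supervisor"}))  # IT manager
```

The function returns a single name, never a group, which matches the point that committees cannot be in charge.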
Implementation phase
Scripts
Step by step listing of activities to be performed every step of the way
– In a disaster situation, people do not think rationally
Scripts can be tested, tuned, and tested again
– The person who follows a script does not need to be the person who developed the script
Automate as much as possible
– One company has 800 automated scripts just for recovering their database!
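One way to picture an automated script is as an ordered list of small, testable steps, run in sequence with a log of what happened. The Python sketch below is only an illustration under that assumption; the step names are hypothetical and the actions are placeholders that always succeed.

```python
from typing import Callable, List, Tuple

# A step is a description plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_script(steps: List[Step]) -> List[str]:
    """Run steps in order, logging each one; stop at the first failure."""
    log = []
    for number, (description, action) in enumerate(steps, start=1):
        ok = action()
        log.append(f"{number}. {description}: {'done' if ok else 'FAILED'}")
        if not ok:
            break  # hand over to a person rather than continue blindly
    return log


# Hypothetical recovery script; each lambda stands in for a real, tested action.
demo_steps: List[Step] = [
    ("Mount backup volumes", lambda: True),
    ("Restore database files", lambda: True),
    ("Start database and verify", lambda: True),
]

for line in run_script(demo_steps):
    print(line)
```

Because each step is explicit and logged, the script can be exercised repeatedly and followed by someone who did not write it.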
Implementation phase
Vendors and resources
Hot site, warm site, cold site, off-site records storage
Equipment replacement
Rent-a-guard
Salvage experts
Catering
Hotel rooms, rental cars
Local authorities
– Police, fire, hospitals, hazmat teams
[Diagram: the business continuity process, with a “You are here” marker on the current phase.]
Source: DRII
Testing and exercise phase
Training and awareness
Exercise program objectives
Exercise plans, scenarios and exercises
Evaluation and modification
Testing and exercise phase
Exercise program objectives
Practice makes perfect
– Some companies spend hundreds of hours tweaking parts of their plans to decrease recovery time
Every second counts
Testing and exercise phase
Evaluation and modification
What went wrong and how do we fix it for next time?
Do not find someone to blame. A fault found now could save your company later
Were any of our assumptions wrong?
Do we need to revisit a previous phase?
[Diagram: the business continuity process, with a “You are here” marker on the current phase.]
Source: DRII
Maintenance and update phase
Remember to budget for this phase. An untested, stale plan is worse than no plan at all!
Review criteria – still current?
Status, reporting, and audits
Distribution and security
– Your plan is a competitive asset
Execution phase
If an event becomes a disaster
– Decide
– Declare
– Notify
– Execute
Not just an IT problem
IT recovery is part of a complete contingency plan
Like Cheerios are part of a complete breakfast…
IT can recover computers and applications, not business processes
The computers are humming, the applications are loaded…
…and no one is around to use them
Horror Stories
Horror stories
Your backup site is in Atlantic City. You declare during the Miss America pageant (Hurricane Andrew)
Your computer room is in the basement and there’s a fire in the building (Bell Canada)
Will the generators be safe? Do you have a way to refuel them? (Tropical Storm Allison)
Horror stories
1. You power up the generators and nothing happens
2. You power up the generators and the power surge blows out your systems
3. You power up the generators and realize that your air conditioning isn’t on backup power
Hint: Exercise your plan!
Food for thought
Tapes
Where is your tape backup hardware?
Where are tapes stored until they go offsite?
How quickly do your tapes go offsite?
Are multiple tape copies sent via different routes?
Do you do tape retrieval / restore tests?
For recovery, do you ship tapes in ‘waves’?
Food for thought
Replicated enterprise storage
Vendors guarantee disk integrity
– Backup disk = primary disk at a bit level
Database integrity is not guaranteed
Your database software needs to recover the database to a consistent state before you can begin processing on the backup system
Physical disk does not equal logical database
[Diagram: a source system and a target system at the moment of a site failure, each with Disk 1, Disk 2, and an audit log disk. The audit log holds records such as T1B D1 T1C; a legend marks disk cache flushes. Captions:]
– Audit disk cache flushed at transaction commit for safety
– Database disk cache flushed infrequently for performance
– Not flushed to disk but transaction committed and log flushed
– On disk, but not committed
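The recovery step described above (the database software bringing bit-identical replicated disks back to a consistent state) can be sketched as a simplified redo/undo pass over the audit log. This is an illustrative model only, not any vendor's actual algorithm; the record layout, the `recover` function, and the keys and values are all assumptions.

```python
def recover(disk: dict, audit_log: list) -> dict:
    """Rebuild a consistent state from what is on disk plus the audit log.

    Log records (hypothetical layout):
      ("begin", txn), ("write", txn, key, old_value, new_value), ("commit", txn)
    """
    committed = {rec[1] for rec in audit_log if rec[0] == "commit"}
    state = dict(disk)

    # Redo: reapply writes of committed transactions that may not have reached disk.
    for rec in audit_log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, key, _, new_value = rec
            state[key] = new_value

    # Undo: roll back writes of uncommitted transactions, newest first.
    for rec in reversed(audit_log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, key, old_value, _ = rec
            state[key] = old_value

    return state


# Hypothetical state at site failure, mirroring the two cases in the diagram:
# T2's write is on disk but not committed; T3 committed but its write was not flushed.
audit_log = [
    ("begin", "T1"), ("write", "T1", "a", 100, 90), ("commit", "T1"),
    ("begin", "T2"), ("write", "T2", "b", 50, 70),
    ("begin", "T3"), ("write", "T3", "c", 10, 20), ("commit", "T3"),
]
disk_at_failure = {"a": 90, "b": 70, "c": 10}

print(recover(disk_at_failure, audit_log))  # {'a': 90, 'b': 50, 'c': 20}
```

The replicated disk was a faithful copy (`b` = 70, `c` = 10), yet the consistent database state differs from it, which is why processing cannot begin on the backup system until recovery has run.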
Food for thought
Check your third party site contract
– How many other companies in the same threat area use the same vendor?
– How soon do you have to vacate? Where will you go?
– Have you included workstations and space for them?
Remember that building?
One year later, the tornado-scarred Bank One tower in Ft. Worth, Texas is still closed.
2000/03/30 2001/02/10
Thank you!