
Business Continuity Planning in an Organisation

Smartha Guha Thakurta

EMC Proven Professional Knowledge Sharing 2009

Smartha Guha Thakurta, EMC


TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION

CHAPTER 2: BUSINESS CONTINUITY & DATA PROTECTION OPTIONS

CHAPTER 3: BUSINESS CONTINUITY PLANNING OBJECTIVES

CHAPTER 4: DEFINING DISASTER TYPES

CHAPTER 5: BEST PRACTICES AND TRENDS IN BC PLANNING

CHAPTER 6: CASE STUDY

CHAPTER 7: CHANGE MANAGEMENT & DECISION MAKING

CHAPTER 8: RECOMMENDATIONS

CHAPTER 9: TCO AND ROI ANALYSIS

CHAPTER 10: CONCLUSIONS

APPENDIX A: SAMPLE RISK ANALYSIS

APPENDIX B: REFERENCES AND BIBLIOGRAPHY

BIOGRAPHY

Disclaimer: The views, processes, or methodologies published in this compilation are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.


List of Figures

Fig. 1 Cost of downtime by Industry segments

Fig. 2 Market growth in BC/ DR worldwide estimates

Fig. 3 Obstacles to availability

Fig. 4 Perspectives of various stakeholders in Business Continuance

Fig. 5 Components of a Business continuity model

Fig. 6 The Foundation Pillars of Business Continuity Planning

List of Tables

Table 1: Components of BC and corresponding plans

Table 2: Classification of Business Process Service Levels



Chapter 1: Introduction

Preface

On the morning of 26th December 2004, people were enjoying the beaches in Indonesia,

Malaysia, Sri Lanka and India. No one was aware that a tsunami was going to destroy everything

– people, industries and resources. Disaster struck.

On 11th September 2001, people working at the World Trade Centre never thought that their world

would end. Many of the organizations with offices in those buildings could not recover.

These and many more disasters, like the power outage in Europe and North America (Black

Friday), have significantly changed people’s lives. Technology managers, CEOs, and CIOs were

affected not only personally but also by the business challenges that these disasters

provoked.

Today’s marketplace is largely driven by a single principle: we want it, and we want it now. When

we turn the spigot, we expect water. When we pick up the phone, we expect a dial tone. When we

turn on the television, we expect programming. When services are normal, they are expected;

when the unexpected happens, we continue to expect services.

Successful organizations realize that these basic but unrelenting expectations are translated into

business demands. When we launch an application, we expect full and immediate functionality.

When we click the “buy” button, we expect a successful and secure transaction. When we

connect, we expect full and expeditious access to the information we require.

The Evolution of Business Continuity

Too often, IT organizations (ITOs) view business continuity planning from a technical rather than a

business perspective that is aligned to user requirements. Availability has evolved into a business

issue with many nuances, including data protection and disaster recovery.

Traditional Approaches

Traditional approaches to disaster recovery reacted to outage events by returning systems to the

status quo. Purely an IT function, traditional disaster recovery cobbled together a series of

hardware redundancies, outsourcing arrangements, tape backups, and often impractical ideas on

geographical dispersion.


Many ITOs questioned the affordability, scalability, and reliability of disaster recovery plans.

During an outage, traditional disaster recovery involved vulnerabilities such as unreliable manual

processes, untested third-party interventions, exposure of corporate data and applications to

external and untrained parties, and a host of other failure opportunities. Most disturbing, traditional

disaster recovery was exorbitantly expensive in that most outsourcing arrangements offered a

‘one size fits all’ approach, protecting all systems and applications with a single service level.

Business requirements demand a shift from reactive disaster recovery to a proactive, high

availability state. Coupled with disaster preparedness, traditional high availability sought to

provide near fault tolerance by using multiple locations and recovery architectures such as high-

speed networking, high-performance servers, and automation. Again, this approach failed to

provide a realistic balance between availability and cost effectiveness. No significant focus was

placed on mission-critical activities, user perspectives, or loss impacts. High availability is fine, but

business requires an appropriate level of availability. Traditional high availability assumes an

unlimited budget. The cost of downtime associated with each industry is stunning.

The Paradigm Shift

Business continuity planning has emerged as the new availability benchmark by adopting the

benefits and shedding the shortcomings of traditional approaches. This has blurred the distinction

between business as usual and always available. Proactive, testable, and pragmatic business

continuity requires a holistic approach to address availability from user and business

perspectives, rather than from infrastructure or recovery perspectives. Taking availability to a

higher level, business continuity planning does not focus on a specific deployment or project.

Instead, it sets availability management as a corporate mentality and measures all deployments.


The Need for Business Continuity

Many organizations have a flawed perception of business continuity. Users don’t care about

availability, disaster recovery, and associated processes and policies. Users care about

seamless, uninterrupted, and rapid access. When business continuity is effective, availability

occurs under the covers, ultimately providing the best service levels to all stakeholders.

The need for business continuity in the Asia-Pacific market can be visualized by looking at IDC

data that forecasts 100% YoY growth in the domain.

Fig. 2 Market growth in BC/ DR worldwide estimates

Planned or Unplanned - It's All Downtime

Research shows that the majority of users’ resources for availability initiatives (time, money, and

effort) are dedicated to addressing unplanned downtime. Furthermore, many organizational

stakeholders view downtime in the context of a tragedy or disaster.

This approach is flawed for several reasons. First, planned activities account for as much as 90%

of downtime.

Fig. 3 Obstacles to availability


Second, planned downtime is necessary due to mandatory IT functions such as maintenance,

configurations, backups, etc. Downtime must be addressed with planning and control.
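
As a rough illustration of why planned downtime deserves attention, the sketch below converts an availability target into annual downtime and splits it between planned and unplanned causes. Only the 90% planned share comes from the text above; the 99.9% target and the function names are assumptions chosen for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours(availability):
    """Annual downtime implied by an availability fraction (e.g. 0.999)."""
    return HOURS_PER_YEAR * (1.0 - availability)


def split_downtime(total_hours, planned_share):
    """Split total downtime into (planned, unplanned) hours."""
    planned = total_hours * planned_share
    return planned, total_hours - planned


# Hypothetical 99.9% target; 0.90 is the upper bound quoted in the text.
total = downtime_hours(0.999)
planned, unplanned = split_downtime(total, 0.90)
print(f"total {total:.2f} h, planned {planned:.2f} h, unplanned {unplanned:.2f} h")
```

Under these assumptions, most of the annual downtime budget is consumed by planned work, which is why an availability plan that targets only disasters misses most of the problem.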

The Perspectives

Fig. 4 Perspectives of various stakeholders in Business Continuance

The Perspective of the Internal User

Internal users (employees, contractors, other stakeholders) are customers who contribute to the

bottom line through productivity and efficiency. Providing continuous availability to internal users

has become increasingly difficult due to many rapid marketplace evolutions, including mobile,

wireless, and distributed computing. Given the benefits of global connectedness, ITOs realize that

complex computing must be adequately supported and available.

The Perspective of the External User - The Supply Chain

While affording many benefits, the practice of connecting supply chains across enterprises

creates vulnerabilities via dependency. Research shows that organizations are not fully protecting

themselves against supply chain failure through enforcement of delivery or performance

requirements. In an outage, this breakdown in accountability will affect the bottom line by

disabling the supply chain that often feeds mission-critical activities. Ultimately, organizations

should look to business continuity planning to avoid jeopardizing trust or breaching contracts with

partners in the supply chain.


Most Importantly, the Perspective of Customers

Everyone remembers the customer service tenets of the brick-and-mortar world. Successful

organizations realize that external users expect unfettered access and availability without glitches,

abnormal pauses, reconnections, or error messages. Fickle and unforgiving, customers will

abandon a user session mid-transaction, never to return. In a world of instantaneous information,

a competitor or substitute is but one click away.

Business continuity decision-makers must place themselves in the shoes of the customer to

assess experience, tolerance, flexibility, forgiveness levels, and switching costs. Business

functions in some industries enjoy more forgiveness than others. If the switching costs are low,

the consumer encounters no barriers to exit and little to no pain in changing providers. Even with

generous forgiveness and high switching costs, a consumer will not remain loyal to an

inconsistent provider.

The Perspective of Law and Governance

Companies are under increasing pressure to gain control of core business processes to comply

with current and emerging legislation. Three United States regulatory agencies - the Federal

Reserve System, Department of the Treasury, and Securities and Exchange Commission - are

driving the new surge toward corporate governance by issuing business continuity and disaster

recovery guidelines. Organizations should begin planning for future legislative requirements to

minimize corporate exposure and drive business continuity principles.

For instance, the Sarbanes-Oxley Act requires control over material business processes and

related information. Under the threat of personal indictment, executives of publicly traded

companies must report and validate that their financial information is complete and the underlying

controls secure. Compliance is further complicated since much of the new legislation is industry

specific. Guidelines are in their infancy and often lack specific instruction on how to achieve

compliance. The holistic approach, proactively managing business processes and related

information with technology through effective business continuity planning, is the only viable way

to achieve compliance with Sarbanes-Oxley.


Chapter 2: Business Continuity & Data Protection Options

Once an autonomous and largely technical consideration, the concept and implementation of data

backup systems has evolved from a ‘nice to have’ IT add-on into a strategic investment in ensuring

availability. Many tradeoffs force the ITO to prioritize based on cost, business impact, scalability/capability,

data accuracy/coherency, and elapsed time for copy and restoration. Although many

different data protection variations and configurations exist, data protection is a critical component

in ensuring seamless and expeditious availability.

Tape Backup

Most organizational downtime is attributed to planned activities. The goal of effective backup in

the context of business continuity planning should tend toward reducing the impact of mandatory

backup processes on business operations while minimizing associated costs. As always, the

needs of users remain an underlying consideration.

Offline backups often create unnecessary risk and uncertainty, and their duration depends highly

on the volume of data and I/O throughput capacity variables. While guaranteeing that data is in a

static and defined state, an offline approach engenders significant obstacles such as

unavailability, bandwidth, capacity, and scalability issues.

Disk Backup

Although tape backup is not completely obsolete, disk backup is emerging as a low-cost, space-saving

alternative or complement that ultimately provides speedier backup and recovery. In

addition, disk backup allows for an effective allocation of resources because excessive workloads

can be offloaded to tape systems. This is particularly beneficial in archiving situations. However,

disk backup still fails to cost effectively address the storage objective of long-term retention

because of the associated costs of cooling, power, and handling.

Furthermore, tape media better serves long-term off-site retention, since transferring disk

drives to safe locations is less practical.


Replication

In conjunction with backup, replication strategies complement traditional approaches by providing

alternative levels of data protection and integrity while minimizing user disruptions. Replication

creates a point-in-time copy of data to be used as the backup source. Thus, replication addresses

the shortcomings of both offline and online backup by reducing downtime intervals while

minimizing application disruption and drag during the backup and synchronization processes.

However, replication involves many different approaches based on data protection needs.

Appropriate use of replication techniques is essential to a sound business continuity plan. If the

primary data is corrupted or unavailable, replication processes enable the organization to call up a

viable, current, and accurate secondary data copy to support mission-critical business

applications.

Asynchronous Versus Synchronous

Replication occurs in two different modes, synchronous and asynchronous. In real time,

synchronous replication writes and confirms data to both the primary and secondary storage

before the application continues. This approach entails a two-phase commit; no transaction may

occur until the secondary source copies and acknowledges the valid copy with the primary

source.

Because of this committed approach, synchronous replication results in near-zero loss of data,

and rapid recovery times in case of failure. This means that systems and operations could be

transferred to a different location with little user disruption. As the premium and highly costly

solution, synchronous replication may not meet some geographical dispersion requirements due

to Fibre Channel limitations. Nevertheless, synchronous replication is often considered mandatory for

mission-critical applications or applications for which delay is unacceptable.

Providing this protection substantially increases cost due to high-bandwidth network connectivity,

robust communication infrastructure and storage performance requirements.

Conversely, asynchronous replication writes and confirms data to the primary source only, or to a

temporary staging area. The write and confirm process to the secondary storage occurs when the

application resumes normal operation, at some pre-programmed interval or whenever an update

takes place. A common drawback of asynchronous replication is that the copy may not be as current as the

production original.
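
The difference between the two modes can be sketched with a toy model. This is a minimal illustration only: the class and method names are hypothetical, Python dictionaries stand in for storage arrays, and no vendor's replication product is being modelled.

```python
from collections import deque


class ReplicatedVolume:
    """Toy primary/secondary pair; dicts stand in for storage arrays."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # staging area, used only in asynchronous mode

    def write(self, key, value):
        """Store a write and replicate it according to the chosen mode."""
        self.primary[key] = value
        if self.synchronous:
            # The write completes only once the secondary also holds it.
            self.secondary[key] = value
        else:
            # The write completes immediately; the update is shipped later.
            self.pending.append((key, value))

    def drain(self):
        """Apply staged updates to the secondary (asynchronous catch-up)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value

    def lag(self):
        """Number of completed writes the secondary has not yet received."""
        return len(self.pending)


sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write("order-1", "paid")
print(sync_vol.lag())   # 0: the secondary is always current

async_vol = ReplicatedVolume(synchronous=False)
async_vol.write("order-1", "paid")
print(async_vol.lag())  # 1: one write would be lost if the primary failed now
async_vol.drain()
print(async_vol.lag())  # 0: the secondary has caught up
```

In this model, the value returned by `lag()` just before a failure is the data that would be lost. Synchronous mode keeps it at zero at the cost of touching the secondary on every write.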


Hardware versus Software

Replication can be hardware- or software-based. Hardware replication is generally performed by

specialized controllers, freeing server bandwidth for other functions. Available from most

hardware storage vendors, hardware-based replication solutions require specific disk arrays

accompanied by replication software.

While hardware-based replication typically represents a synchronous approach, many vendors

are responding to consumer demand for more robust functionality through automation, data

transfer modes, and other add-ons. Unfortunately, the effective use of these add-ons may require

additional advanced programming and configuration. Some of these costs are offset since the

initial configuration is typically straightforward.

Conversely, software-based replication works at the application or DBMS level, alleviating some

of the challenges of hardware-based replication. Software-based replication solutions are not

bound to any particular hardware vendor or brand, providing more flexibility and versatility. Thus,

a software-based approach enables data replication from any type of storage system to any other,

over any type of IP network connection. Application-level replication is transaction-aware,

assisting in advanced disaster recovery initiatives. However, software-based replication leverages

server cycles that can affect performance.

Database-Level Replication

Database-level replication is the most cost-effective option, addressing both planned and

unplanned downtime as well as disaster recovery functionality. Because replication takes place at

the database level, organizations can switch users to a second node in near real time avoiding a

complete database restart. Furthermore, since the replicated copy is a different data set it can be

remote.

Block/Storage-Level Replication

Block/storage-level replication is best for applications that can tolerate up to 30 minutes of

downtime. It copies storage blocks as they change on the primary site, unaware of the

application or database that runs above them. Thus, the target site is a complete copy of the

source, replicating all changes including applications and agents. Block-level replication usually

occurs in synchronous mode, resulting in a high level of data consistency. Asynchronous

replication can be enabled when high transaction volumes are present or distances are great.


Because replication is done at the block level, no applications can be running on the target site.

This necessitates a cold restart that introduces manual processes and time delays. Furthermore,

because the replication occurs at the block level, errors on the primary site, such as database

corruptions, could be repeated at the recovery site.

Mirroring

In conjunction with backup, mirroring is a real-time approach to data protection. Mirroring

technology continuously copies data and mirrors it to a secondary server unless specifically

instructed to stop. Unlike a traditional point-in-time backup, a mirrored copy keeps no record of

the original source, simply replacing the original with updated data.

Mirroring can either be synchronous or asynchronous. Synchronous mirroring results in multiple

exact duplicates, but suffers from latency issues that limit its effectiveness over geographical

distances. Although not error free, asynchronous mirroring deployments are effective over distance,

reducing network costs and providing low latency, but at the price of greater time delays. Mirroring technology

proves most effective when responding to a local component failure.

Furthermore, aging mirrors produce diminished returns for use as a fallback point. Due to cost

considerations, especially in storage capacity, mirroring implementations (particularly

synchronous) are suggested for mission-critical applications and data categories only.

Business Continuity Planning

Given a choice, stakeholders will opt for 100% availability. However, decision makers realize this

is not possible. Therefore, organizations must approach an availability management plan with

prudence through careful planning, testing, and preparation. A sound, enterprise-wide business

continuity plan is an enormous undertaking.

Roles and Responsibilities

As with any enterprise-wide deployment, organizational acceptance of the need, plan, and

approach is mandatory from the boardroom to the mailroom. To avoid competing agendas, the

first step is to establish roles and responsibilities. This will instill a sense of participation and

encourage collaboration. The IT organization is a significant player in the availability management

process. IT retains responsibility for technical planning, implementation, and operational details of

the business continuity plan.


In conjunction, lines of business (LOBs) are responsible for:

• communicating mission-critical functionality

• understanding and executing business continuity procedures and policies

• planning and executing contingency plans

Lastly, corporate leadership must define and implement policies regarding a business continuity

plan by actively participating and dedicating resources.

Definition of Losses

We can predict tangible losses with some degree of accuracy using consistent or known metrics.

Although this approach may appear straightforward, organizations are naive to think that bottom-

line impacts due to outages are simple to predict and quantify. Predicting and quantifying

intangible losses proves much more difficult since they require a clear understanding of business

functions, user expectations, market position, and brand impact. For instance, corporate credibility

and consumer trust are not debited or credited on a balance sheet or income statement. However,

the associated losses affect the bottom line, causing considerable and justifiable concern.

In addition, several loss categories behave differently in the short and long terms. Unavailability of

a particular service may be acceptable in the short term, but cause significant pain long term.

Loss Potential

For each system and application, business leaders must define appropriate and acceptable

recovery time objectives (RTOs) and recovery point objectives (RPOs) to understand lost

business opportunity and the resulting financial impacts.

RTO identifies the point in time when application (and associated business process) technology

components are functional to the extent that transactions, business functions, etc. can be

resumed. RTO does not mean 100% recovered; it usually indicates a degraded processing mode

(e.g., less capacity, lower performance).

RPO defines the point in time during the recovery cycle at which consistent data recovery can

begin. It can incorporate the time needed to make the disaster declaration (i.e., activate the plan),

stage equipment and personnel resources, transport backup media, and install software

infrastructure (e.g., operating systems, middleware), applications, network switchovers, etc.

Many organizations consider advanced recovery options (e.g., standby operating system,

electronic vaulting of data to the recovery site) to reduce RPO windows (sometimes to zero, i.e.,

consistent data recovery begins immediately).


Categorization Framework

The categorization framework seeks to set base-level availability minimums that are aligned with

most major business requirements. Best practices dictate a weighted and tiered approach based

on the metrics described above: RPO, RTO, loss of core business functions, and both tangible

and intangible financial impacts.

Pinnacle systems and applications are classified as platinum, meaning mission-critical

functionality and high loss potential. Platinum systems demand high availability and a recovery

time as close to zero as possible, minimizing risk, vulnerability, and repercussion. Systems and

applications in a gold classification withstand outages better, with recovery times up to an hour.

Accordingly, the potential losses from a gold-level availability failure have far less impact than those

associated with platinum systems, and so on.
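
The tiering rule described above can be sketched as a simple lookup on the required recovery time. Only the platinum band (as close to zero as possible) and the gold band (up to an hour) come from the text. The five-minute platinum cutoff, the silver and bronze tiers, and all names are hypothetical.

```python
from datetime import timedelta

# (tier name, maximum tolerated recovery time), ordered strictest first.
# Only "gold = up to one hour" is stated in the text; the rest are assumptions.
TIERS = [
    ("platinum", timedelta(minutes=5)),  # assumed stand-in for "close to zero"
    ("gold", timedelta(hours=1)),
    ("silver", timedelta(hours=24)),     # hypothetical tier
]


def classify(required_recovery_time):
    """Return the first (strictest) tier whose limit covers the requirement."""
    for name, limit in TIERS:
        if required_recovery_time <= limit:
            return name
    return "bronze"  # hypothetical catch-all for everything slower


print(classify(timedelta(minutes=1)))   # platinum
print(classify(timedelta(minutes=45)))  # gold
print(classify(timedelta(days=3)))      # bronze
```

A real framework would weigh RPO, loss of core business functions, and financial impact alongside recovery time, as the text notes. A single-metric lookup is used here only to keep the sketch short.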

Business Impact Analysis

With unlimited funds, availability management would prioritize all systems and applications at the

platinum level. Although possible in an ideal world, the categorization framework assumes a finite

pool of resources. Because of this, business impact analysis assists in allocating the appropriate

level of scarce funds to the systems that impact the business most during an outage. A business

impact analysis enables organizations to identify and assign costs to key business processes.
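
One way to picture this step is to assign each process an estimated hourly loss and an expected number of outage hours per year, then rank by the product. The sketch below assumes that simple cost model; the process names and all figures are invented for illustration, and intangible losses are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    hourly_loss: float            # estimated tangible loss per outage hour
    expected_outage_hours: float  # estimated outage hours per year

    @property
    def annual_exposure(self):
        """Estimated tangible loss per year under this simple model."""
        return self.hourly_loss * self.expected_outage_hours


def rank_by_exposure(processes):
    """Sort processes so the largest estimated annual loss comes first."""
    return sorted(processes, key=lambda p: p.annual_exposure, reverse=True)


# Hypothetical figures, for illustration only.
processes = [
    BusinessProcess("intranet", 500.0, 20.0),
    BusinessProcess("order entry", 50_000.0, 4.0),
    BusinessProcess("payroll", 5_000.0, 8.0),
]
for p in rank_by_exposure(processes):
    print(f"{p.name}: {p.annual_exposure:,.0f}")
```

The ranking shows where scarce funds would go first under this model. It should be adjusted by hand for intangible impacts such as brand damage that the model does not capture.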

Process and Testing

Processes that support the business continuity plan are established and documented as the

business impact analysis tests and solidifies priorities. Policies must be created regarding process

improvement, sustainability, organizational resiliency, and conflict resolution. Gartner research

shows that organizations are poorly positioned and woefully unprepared to execute on their

business continuity plans. Only 25% of organizations include training and education about the

business continuity plan, leaving key personnel to act independently during a crisis. Fewer than

50% of organizations plan for transportation logistics and telecommunications/network outages.

Rehearsals are the best way to prepare for uncertainty and test the business continuity plan.

Announced or unannounced rehearsals should confirm time objectives, staff preparedness and

awareness, duplication or overcommitment of resources, and the responsiveness and

effectiveness of external parties.


Chapter 3: Business Continuity Planning Objectives

Today, a Business Continuity Management (BCM) strategy ensures that an organization can

survive during and after any disaster causing data loss. It’s also a key tool to build a

comprehensive emergency management system to sustain critical business processes.

BCM/BCP is often treated as synonymous with disaster recovery, but it encompasses more than just DR. It ties in

all the essential components needed to deal with disaster, and ensures continuous provision of

critical business operations and services even during a total system collapse.

Business continuity planning is an integrated, enterprise-wide process that should include business impact analysis, resumption planning, business recovery, contingency planning, crisis communication systems, disaster recovery, information security, risk management, and software management.

Business continuity is moving from a reactive to a proactive investment. Most organizations think that they

already have Business Continuity Plans in place under other names, e.g. crisis plans, emergency

evacuation plans, disaster recovery plans, communication plans, recovery plans or just Plan B.

Business Continuity Model: How Business Continuity Works

Issue: Availability | Reliability | Recoverability

Solution: Enterprise High Availability | Service Level Mgmt. | Business Continuity Planning

Objective: Achieve and maintain the chosen availability level of the enterprise’s IT infrastructure. | Effectively manage and control the IT infrastructure to improve the overall operational reliability. | Provide an effective plan to minimise downtime of key processes in the event of a major disruption.

Emphasis: Technology | Processes | People

Focus: Proactive and preventive | Response and recovery

Fig. 5 Components of a Business Continuity Model


The Business Continuity Institute (BCI) and the Disaster Recovery Institute International

(DRII) have agreed on ten (10) standards for Business Continuity Management. The ten

certification standards for business continuity practitioners are:

1. Project initiation and management: Establish the need for a business continuity plan, including management support and the elements of organizing and managing the project to completion.

2. Risk identification, analysis and control: Determine the risks that can adversely affect an organization, analyze the results and determine the controls needed to prevent or minimize risks.

3. Business impact analysis (BIA): Quantify the risks identified in item 2. Establish critical functions, their recovery priorities, and interdependencies so that recovery time objectives can be set.

4. Developing continuity strategies: Determine and guide the selection of alternative recovery operating strategies to maintain the organization’s critical functions.

5. Emergency response and operations: Develop and implement procedures to respond to and stabilize an incident or event.

6. Developing and implementing Business Continuity Plans: Design, develop, deliver and implement the continuity plans.

7. Awareness and training programmes: Prepare a program to create awareness and enhance the skills required to develop, implement, maintain and execute the continuity plan.

8. Maintaining and exercising continuity plans: Coordinate, evaluate, test and exercise the continuity plan; document results. Develop processes to maintain the continuity capabilities and the plan document, in accordance with the strategic direction of the organization.

9. Public relations and media communication: Develop, coordinate, evaluate, implement, and exercise plans to handle the media during a crisis. Communicate with employees and their families, key customers, critical suppliers and other suppliers, owners/stockholders, and corporate management during the crisis to ensure that all stakeholders are informed.

10. Co-ordination with public authorities: Establish applicable procedures and policies for coordinating continuity and restoration activities with other emergency management agencies as required by statutes or regulations.

Note: Details of item 10 vary from country to country, and from industry to industry.


The first five (5) steps are critical before BC Plans are produced.

How many BC Plans should an organization have? There should be at least one for each location

and then separate plans within each location for mission critical business units or divisions at that

location.

A Management Recovery Team Plan binds the BC plans together. The Management Recovery

Team includes senior representatives of all the mission critical functions and specialists that have

company wide responsibilities including human resources, legal, finance, public relations,

telecommunications, systems, and strategic planning. Keep in mind that the business units that

these key people come from will also have their own separate Business Continuity Plans.

Disaster recovery planning is not limited to IT. It is a business issue.

When developing the Plan, address the following points:

• Senior management must understand the level of effort needed to research, define,

construct, and test the Plan. There needs to be support and commitment from the top!

• Management must support the planning effort and ensure its success both on a short-term

and an ongoing basis. This means allocating resources to manage tasks such as

documentation and testing on an ongoing basis.

• Select a project team to ensure an adequate balance between IT and business community

members. This will ensure that the resulting Plan will cover the requirements of both the

IT and business communities.

• Define and agree upon the recovery requirements of the business and IT communities.

Furthermore, they should be posted where everyone in the organization can access them (such

as on the company intranet). Visibility ensures that people realize the importance of the

effort, and their role in its success.

• Design solutions to fit the requirements of the business and the IT communities, including

risk mitigation.

• The final Plan, incorporating those solutions, must be easy to understand, put into

practice, and maintain.


• Develop a process to keep the Plan current, representing the business and

computing environments at all times.

Disaster recovery planning is a highly complex and time-consuming activity that requires a firm

commitment from management to allocate the hours and funds necessary to achieve success. In

addition, implementing solutions designed to mitigate risk often requires major expenditures.


Chapter 4: Defining Disaster Types

The word ‘disaster’ is derived from the Latin word for “evil star”, a metaphor for a comet, once

thought to be a harbinger of impending doom. Today, IT is a vital part of the Value Chain. It has

evolved from a support function to the heart-line of the entire business. Disaster, in this context,

means the unplanned interruption of normal business processes resulting from the interruption of

the IT infrastructure components used to support them.

Disasters happen. Diligent planning and preparation help us control our disaster response.

Defining Disasters

The first step in planning to recover from a disaster is to define what types of disaster may occur.

There are three categories of disaster:

Category I, the least serious category, may include events such as an electrical failure, rolling blackouts, or an accident that severs a major power line. These are the most common types of disaster, and they can still be serious if an organization is not prepared.

Category II, localized man-made or natural disasters of a more serious nature, such as a flooded

or fire damaged computer room, require more extensive planning. Since downtime caused by

these disasters can last for days or weeks, they can devastate an unprepared company.

Mitigation of risk for these may include contracting with an outside agency for a mobile computer

center, or maintaining a hot site in another location.

Category III, widespread natural disasters such as earthquakes or floods, require the most

planning and can be the most difficult to recover from. Although these events do not happen

often, they do happen. Without adequate planning, they can drive a company out of business.

Mitigating Risk

Disasters in the first category are relatively easy to mitigate, and many businesses will have

recognized the need for this during their initial business planning processes. Cost is often a factor

in determining what, if anything, is done to prepare for disasters. The impact of data loss and the

inability to continue business for an extended period of time may be enough to put any company

out of business! While a cost / benefit analysis should be performed, it is important to view risk

mitigation as an insurance policy.


Define What Types of Disasters Need to Be Planned For

Disasters in categories II and III are more difficult to mitigate and require extensive preparation

and planning. Every organization that relies on its IT community in the day-to-day operations of

business should assess its risk of disaster.

As part of risk mitigation, many companies face the question of whether it is better to maintain a hot site of their own or to contract with an outside agency for recovery services, delivered either at the agency’s own recovery site(s) or through a mobile recovery center. Cost, rather than ability to recover, is often the key consideration.

A company's location is the first determining factor. If the company is in an area that is not subject to

widespread natural disasters, such as are described in category III, a mobile solution may be

acceptable. If a company resides in an area that is subject to such disasters, it would be better to

have the remote recovery site in an area that is not subject to those forces of nature. Having a

hot site that resides in another geographic location, and possibly a more rural setting, also makes

sense when mitigating the risk of terrorism.

Size is another determining factor. A company with relatively modest computing requirements

may find, depending on location, that either a mobile or fixed recovery site supplied by an outside

agency is adequate. A company that has large scale computing needs may find that, depending

on the size and number of systems being restored, a mobile recovery solution is not practical.

Availability is another factor. Many agencies that sell recovery services sell the same services to

a large number of clients. This is perfectly reasonable business practice since supplying a

dedicated site and system to each of the companies contracting for these services would be

financially impractical, both for the agency and the client. In the event a company has a fire in its

computer room and requires recovery services until their own facility can be rebuilt, this may work

well. However, in the event of a widespread regional disaster, a company may find itself sharing

computing systems with other clients, or worse, waiting until another client is finished with the

facilities. This aspect of recovery in the event of an actual disaster must be understood,

acknowledged, and managed using contractual agreements that provide for priority access to one

or more recovery sites.
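The location, size, and availability considerations above can be read as a small set of decision rules. The following is a minimal sketch of those rules only; the function names and option labels are hypothetical, cost is not modeled, and a real assessment would weigh many more factors.

```python
def recommend_recovery_options(prone_to_regional_disaster, large_scale_computing):
    """Return candidate recovery arrangements for a site profile.

    Encodes only the rules of thumb from this chapter (a simplification).
    """
    options = []
    if prone_to_regional_disaster:
        # A Category III event could affect a nearby site as well,
        # so the recovery site should sit outside the exposed region.
        options.append("fixed recovery site outside the region")
    else:
        options.append("fixed recovery site (own hot site or agency site)")
        if not large_scale_computing:
            # Mobile centers are treated as practical only for modest needs.
            options.append("mobile recovery center")
    return options


def contract_notes(uses_shared_agency_site):
    """Contractual point raised in the text for shared agency facilities."""
    if uses_shared_agency_site:
        return ["negotiate priority access to one or more recovery sites"]
    return []


if __name__ == "__main__":
    print(recommend_recovery_options(prone_to_regional_disaster=False,
                                     large_scale_computing=False))
    print(contract_notes(uses_shared_agency_site=True))
```

Keeping the rules in one place makes it easier to revisit them whenever the company's location, computing footprint, or recovery contract changes.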


Chapter 5: Best Practices and Trends in BC Planning

Strategic Imperative: Real-time enterprises cannot afford to accept the risks associated with

business continuity (BC) vulnerabilities — the consequences could be fatal.

Real-Time Enterprise and BCP: A Collision Course

Business is moving faster than ever before, with a focus on:

• Real-time enterprise business process integration

• Significant reliance on partners in the value chain

• Faster flow and immediate responses expected

• You are only as strong as the weakest link

Historically, business continuity (BC) was focused on protecting the enterprise against unlikely but

large events — fire, flood, and natural disaster. With the real-time enterprise (RTE), however,

even the smallest of interruptions — minutes or hours outage of a critical business system,

interruption in service from a critical supplier or outside service provider, or the potential business

impact caused by the economy and its effects on critical customers/suppliers — can have serious

business consequences.

It is estimated that less than 25% of large enterprises have comprehensive business continuity

planning programs, and just 50% have comprehensive disaster recovery programs. Those that do

not are on a collision course with destruction. Those that have done BC planning are confident

in their ability to adapt and survive, whatever the incident/situation facing them.


BC Components

Disaster Recovery
• Objective: Mission critical applications
• Focus: Site or component outage (external)
• Deliverable: Disaster Recovery Plan
• Sample event(s): Fire at the datacenter; critical server failure
• Sample solution: Recovery site in a different location

Business Recovery
• Objective: Mission critical business processing (workspace)
• Focus: Site outage (external)
• Deliverable: Business Recovery Plan
• Sample event(s): Electrical outage in the building
• Sample solution: Recovery site in a different power grid

Business Resumption
• Objective: Business process workarounds
• Focus: Application outage (internal)
• Deliverable: Alternate Processing Plan
• Sample event(s): Credit authorization system down
• Sample solution: Manual procedure

Contingency Planning
• Objective: External events
• Focus: External behavior forcing change to internal
• Deliverable: Business Contingency Plan
• Sample event(s): Main supplier cannot ship due to its own problem
• Sample solution: 25% backup of vital products; backup suppliers

Crisis Management: applies across all four components.

Table 1: Components of BC and Corresponding Plans

The shift from disaster recovery planning to BCP recognizes that IT services are just one

essential component of a business process. Planning and mitigation for all critical resources (IT, people, facilities, and specialized equipment) are required to recover effectively from a disaster. BCP

is a top-level concern and is vital to maintain financial confidence and the reputation of the

business.

BCP includes five components:

• disaster recovery

• business recovery

• business resumption

• contingency planning

• crisis management


The crisis management component addresses managing the event, protecting employees, and maintaining confidence in the business despite the business interruption. This chapter focuses on best practices in BCP, emphasizing the RTE impact. RTE does not change the five components of BCP; rather, it places more importance on the enterprise’s contingency and crisis management plans because of the public nature of outages and the increasing reliance on external service providers (ESPs) for processing. It also pushes recovery point and time objectives toward real time, that is, 24x7 continuous availability.

How has BC evolved, and what is the impact of real-time enterprise?

BCP has evolved significantly during the past 20 years. In the early 1990s, BCP was IT

disaster recovery, providing protection from natural disasters and critical component failure by

enabling recovery in another data center in about 72 hours. In the mid-1990s, enterprises added

business process protection, and recovery plans were developed (e.g., those for customer call

centers). In the late 1990s, as enterprises re-engineered their business processes and assessed them from a Year 2000 remediation perspective, it became apparent that

traditional recovery plans with 72-hour recovery periods were not good enough. Thus, enterprises

significantly increased spending to achieve recovery times of between 4 and 24 hours. The

evolution toward e-commerce and RTE resulted in yet another discontinuity affecting BCP.

For many RTEs, a 4- to 24-hour site outage would cause irreparable damage to the enterprise.

Consequently, many enterprises are incorporating BCP into their business process, application

and technology architecture designs — and building in continuous 24x7 availability. Furthermore,

the risks are greater with RTE, so the BC plan must address new scenarios — and BC processes

must integrate with a greater number of enterprise processes.

One of the most important lessons learned from recent disasters is that people issues need to

take center stage in planning — safety, communication and resiliency in workspace and process

issues. As a result, crisis management plans and call trees are being created or updated as are

contingency plans regarding availability of outside service providers and partners. New scenarios

are being developed to address new vulnerabilities.

Business processes (and their integration with external constituents) are rapidly transforming with

enterprise investment in RTE applications and infrastructures. The new risks and the integration

of continuous availability into the business process affect the business in new ways. The boundary between “business as usual” and an emergency event, so easily drawn before RTE, can no longer be maintained; the two operating environments are no longer distinct. The cost of operating the RTE application environment increases because the decision for BC is pushed up into the design phase of the project. The risk of RTE application service downtime reduces the level of risk that business management can accept, so those risks must be addressed with recovery solutions and reviewed by an integrated business/IT team. An outage is public knowledge; therefore, the reaction to it must be immediate and well managed. Outages take on many faces: the application itself might be running smoothly while the operating processes around the application environment cause the outage.

What best practices should enterprises pursue in striving for BC program excellence?

Fig. 6 The Foundation Pillars of Business Continuity Planning

(Figure labels: Business Continuity Planning Initiation; Policy, Organisation, Resources, Scope; Business Impact Analysis; Risk Analysis; Recovery Strategy; Create Planning Organisation; Group Plans & Procedures; Risk Reduction; Implement Standby Facilities; Testing; Process; Change Management, Education, Testing, Review.)

Senior management sponsorship and participation are the foundations of BC excellence. Build BC

into the enterprise culture by weaving BC processes into the life cycle of every project and change

management process. In the requirements phase, the business impact analysis (BIA) identifies

what the enterprise has at risk and which business processes are most critical, thereby prioritizing

risk management and recovery investments. The direct/indirect impact of business


interruptions is assessed over time, resulting in requirements for recovery time and point

objectives. Risk analysis identifies the enterprise’s vulnerability to risks so that they can be

mitigated in the project design phase. Recovery strategies and processes are developed in the

architecture and design phase. When cost of recovery is outside the project budget, enterprises

often must revert to business requirements to re-justify investments or change requirements.

During construction, detailed plans and procedures are created by those responsible for the daily

operation of the processes. The recovery process must be tested prior to implementation to

ensure that requirements can be met. Establish a process to keep the plan current by initiating a

review of every change to business processes or systems.

Action Item: Enterprises need to formalize business continuity processes, starting with the creation of a BC organization responsible for setting policy, governance and reporting.

To determine appropriate availability investments, enterprises need to understand the

consequences of downtime to justify investments for operational availability and BC. A first step

in developing a BC plan is to perform a Business Impact Analysis (BIA). Identify and prioritize

critical business processes and evaluate costs of downtime.

Key goals of the BIA:

1) agree on the cost of business downtime over varying time periods,

2) identify business process availability and recovery time objectives, and

3) identify business process recovery point objectives.

The BIA results feed into the recovery strategy and process.
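The first two BIA goals can be illustrated with a short sketch. It assumes a linear cost-of-downtime model, and the process names, hourly loss figures, and tolerable-loss thresholds are hypothetical; a real BIA would normally use costs that grow non-linearly with outage length and would also capture recovery point objectives.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    loss_per_hour: float   # hypothetical cost of downtime per hour
    tolerable_loss: float  # hypothetical maximum loss the business accepts


def downtime_cost(process, hours):
    """Cost of an outage of the given length (linear model, an assumption)."""
    return process.loss_per_hour * hours


def implied_rto_hours(process):
    """Longest outage whose cost stays within the tolerable loss."""
    return process.tolerable_loss / process.loss_per_hour


def prioritize(processes):
    """Order processes from most to least costly per hour of downtime."""
    return sorted(processes, key=lambda p: p.loss_per_hour, reverse=True)


processes = [
    BusinessProcess("payroll", loss_per_hour=2_000, tolerable_loss=144_000),
    BusinessProcess("order entry", loss_per_hour=50_000, tolerable_loss=100_000),
]

if __name__ == "__main__":
    for p in prioritize(processes):
        costs = {h: downtime_cost(p, h) for h in (1, 8, 24, 72)}
        print(p.name, costs, "implied RTO (hrs):", implied_rto_hours(p))
```

Tabulating the cost at several outage lengths mirrors the first goal (cost over varying time periods), while the implied recovery time gives business and IT a starting point for agreeing on the objective.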

Action Item: Integrate BCP into the enterprise project life cycle to ensure that recovery needs are identified in the initial phases of new projects, or when business processes and systems change.

Tactical Guideline: Even a failed disaster recovery test is useful. BC plans require frequent testing

to ensure the support of critical business requirements. Every plan must be tested to ensure

credible recovery preparedness. Testing familiarizes all BC team members with the experience of

a sudden, unexpected business processing interruption and exposes potential problems and

unforeseen situations. Continuously testing and modifying plans is the key to recovery

preparedness, maximizing the chances of surviving a disaster.


Action Item: Testing RTE recovery plans requires an integrated effort by all parties involved with the transaction. The participation of all owners is critical to the success of the recovery process. When it is not possible to conduct a live test of a BC plan or a component plan, conduct tabletop testing to ensure that external dependencies are addressed.

Tactical Guideline: Annual assessment of recovery capabilities in light of changing requirements

is necessary to ensure the BC plan meets changing business requirements.

Critical Success Factor: Evaluate Capabilities vs. Goals, and Act. Business requirements change

over time; reassess recovery capabilities frequently to ensure they meet business requirements.

This reassessment may be a formal process (such as a mini-BIA). Often, a failed disaster

recovery (DR) test will propel an enterprise into conducting a more detailed analysis.

Action Item: Know your enterprise’s recovery requirements and capabilities; frequently synchronize them with changing business requirements.

What to Focus on When BC Funds Are Limited

• Crisis management plan — ensure the safety of employees, continuity of decision making,

and view from the outside world (includes employee call-tree and facilities diagrams)

• Asset list and key supplier contact information

• Secure, offsite backup tape storage

• Prioritize spending on most critical business processes — perform a BIA to determine

priorities

• Work-at-home programs for workspace recovery

• Contingency planning — mitigate the risks of external events

The most important activity is ensuring that, regardless of the level of spending, it is spent in the

right place — to protect the most-critical business processes. Performing a BIA will aid in

identifying business process and resource criticality, priority and dependencies so that spending

can be prioritized accordingly.



Classifying Business Process Service Levels in Project Life Cycle

Class 1 (RTE)
• Business process services: Customer/partner facing; functions critical to revenue production
• Service levels: 24 x 7 scheduled; 99.9% availability; RTO = 2 hrs., RPO = 0 hrs.

Class 2
• Business process services: Less-critical revenue-producing functions; supply chain
• Service levels: 24 x 6-3/4 scheduled; 99.5% availability; RTO = 8-24 hrs., RPO = 4 hrs.

Class 3
• Business process services: Enterprise back-office functions
• Service levels: 18 x 7 scheduled; 99% availability; RTO = 3 days, RPO = 1 day

Class 4
• Business process services: Departmental functions
• Service levels: 24 x 6-1/2 scheduled; 98% availability; RTO = 5 days, RPO = 1 day

Table 2: Classification of Business Process Service Levels

Define business requirements for application service availability and DR during the business requirements phase. Ignoring these requirements often results in a solution that requires significant re-architecture to improve service. Service-level definitions should include scheduled uptime, percent availability during scheduled uptime, and recovery time and point objectives. Availability is the day-to-day availability of the service. Recovery means the time to recover from a significant event (a rolling hardware failure or natural disaster) affecting the business process. In this example, Class 1 application services are those with RTE requirements.
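To make the service-level definitions concrete, the sketch below turns the Table 2 figures into hours of unplanned downtime permitted per year. It assumes that "24 x 6-3/4" means 24 hours a day for 6.75 days a week, that a year has 52 weeks, and that the upper bound of the Class 2 RTO range applies; the class and field names are hypothetical.

```python
from dataclasses import dataclass

WEEKS_PER_YEAR = 52  # simplifying assumption for the arithmetic


@dataclass(frozen=True)
class ServiceClass:
    name: str
    hours_per_day: float
    days_per_week: float
    availability_pct: float
    rto_hours: float
    rpo_hours: float

    def scheduled_hours_per_year(self):
        return self.hours_per_day * self.days_per_week * WEEKS_PER_YEAR

    def allowed_downtime_hours_per_year(self):
        """Unplanned downtime permitted inside the scheduled window."""
        unavailable_fraction = 1 - self.availability_pct / 100
        return self.scheduled_hours_per_year() * unavailable_fraction


# Values transcribed from Table 2; Class 2 RTO uses the upper bound (24 hrs).
SERVICE_CLASSES = [
    ServiceClass("Class 1 (RTE)", 24, 7, 99.9, rto_hours=2, rpo_hours=0),
    ServiceClass("Class 2", 24, 6.75, 99.5, rto_hours=24, rpo_hours=4),
    ServiceClass("Class 3", 18, 7, 99.0, rto_hours=72, rpo_hours=24),
    ServiceClass("Class 4", 24, 6.5, 98.0, rto_hours=120, rpo_hours=24),
]

if __name__ == "__main__":
    for sc in SERVICE_CLASSES:
        hours = sc.allowed_downtime_hours_per_year()
        print(f"{sc.name}: {hours:.1f} h/year of unplanned downtime allowed")
```

Under these assumptions, Class 1 permits roughly 8.7 hours of unplanned downtime per year, against about 162 hours for Class 4, which shows how much the availability percentage alone separates the classes.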

Action Item: Develop a service-level classification system with associated development, infrastructure and operations architecture requirements. A repeatable process is a process that works.

Technologies to Reduce RTO/RPO

Traditional BC plans provide 24- to 72-hour application/business process recoverability. Many

enterprises need shorter recovery times for critical applications. High-availability techniques are

escalating (especially for RTE applications), enabling enterprises to achieve RTOs and RPOs in

minutes rather than days. With RTE, hot standby (an idle standby application environment that

waits for a disaster affecting the primary physical site) often isn’t good enough, and many enterprises design application architectures that span several active physical sites. Thus, if one data center has an

outage, the others continue processing requests.


Action Item: Although rapid RTE recovery is expensive, the alternative (recovery in three or more days) could jeopardize an enterprise’s survival. A BIA will help assess the recovery ROI.

How are technologies and service providers evolving to meet BC’s needs?

Multi-site architectures are used for application services with Class 1 or 2 (short RTO/RPO).

Often, a new RTE application service starts with single-site architecture and migrates to multiple

sites as its risks grow. Multiple sites complicate applications architecture design (load balancing,

database partitioning, database replication and site synchronization must be designed into the

architecture).

For non-transaction processing applications, multiple sites run concurrently to connect users to

the closest or least-used site. To reduce complexity, most transaction processing (TP)

applications replicate databases (or disks) to an alternative site, but the alternative databases are

idle unless a disaster occurs. A switch to the alternative site can be accomplished in 15 to 30

minutes. Some enterprises prefer to partition databases and split the TP load between sites, and

consolidate data later for decision support and reporting. This reduces the impact of a site outage,

affecting only a portion of the user base. Others prefer more-complex architectures with

bidirectional replication between sites to maintain a single database image. All application

services require end-to-end data backup and offsite storage as a component of the DR strategy.
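The partitioned approach described above can be sketched as follows. This is only an illustration of why a site outage then affects a portion of the user base: the site names are hypothetical, and the hash-based assignment is an assumption, since the text does not say how enterprises choose their partitioning key.

```python
import hashlib

SITES = ["site-a", "site-b"]  # hypothetical site names


def home_site(user_id, sites=SITES):
    """Assign a user to one site using a stable hash of the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return sites[int.from_bytes(digest[:8], "big") % len(sites)]


def affected_users(user_ids, failed_site, sites=SITES):
    """Users whose transaction processing partition lives at the failed site."""
    return [u for u in user_ids if home_site(u, sites) == failed_site]


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    down = affected_users(users, "site-a")
    print(f"{len(down)} of {len(users)} users affected by a site-a outage")
```

A stable hash (rather than Python's built-in `hash`, which is randomized per process for strings) keeps each user on the same partition across restarts, which is what allows the data to be consolidated later for decision support and reporting.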


Chapter 6: Case study

Methodology

Since our expertise is primarily in the core domain of storage, we shared BCP best practices with the customer. At the same time, we helped them understand the value to be gained by engaging our expert consultancy services.

Tailor each disaster recovery-planning project to the individual organization. For this organization,

we had a specific charter designed for the various critical processes and the teams involved. The

project phases can be broadly classified as follows:

Disaster Recovery Planning for ABC Project Phases

Phase I Project Initiation

The objectives of this phase are to gain an understanding of the organization’s existing and

planned future IT environment, define the scope of the project, develop the project schedule, and

identify project risks.

We established a Steering Committee during this phase. As with any project, the Steering

Committee provided guidance to the project team. The Committee included key personnel from

the business and IT communities.

Phase II Assessment of Disaster Risk

This should include, but not be limited to, an assessment of geographical location, building

composition, computing environment/physical plant security, installed security devices (including

automated fire extinguishers and automated shut-down devices), computing environment/physical

plant access control systems and software, personnel practices, operating practices, and backup

practices. This is a good time to perform an IT Assessment, Practices and Procedures Audit, and

Single Points of Failure Analysis.

Phase III Business Impact Analysis

We conducted an analysis of all key business units supported by the IT team to identify which

systems and functions were critical to the continuation of business, and to determine the length of

time those units could survive without the critical systems.


Phase IV Definition of Requirements

This was the most difficult and time-consuming part of the project. All requirements of, and

relating to, the Plan were defined and detailed. These included the recovery requirements of the

business and IT communities, the requirements generated by the business impact analysis, and

the requirements generated by the assessment of disaster risk and the mitigation of disaster risk.

Phase V Project Planning

It is important here to distinguish between the Project Plan and the Disaster Recovery Plan. The

Project Plan defines the project that is being executed. One of its objectives is to develop the

Disaster Recovery Plan.


Chapter 7: Change Management & Decision Making

Conducting BC Plan exercises & “scenario planning”

I recommend the term “exercise” rather than “test.” This is because the word test suggests either

success or failure; that is not what BCM is all about. Indeed “failure” is not an option for most

businesses today.

The organization should expect to find flaws in BC Plans during each exercise but that does not

equate to failure. When reporting results, report outcomes and recommended actions.

Example of a BCP “Exercising Policy”

Exercising Business Continuity Plans should occur annually or as scheduled by the BCP

Committee. It is important to consider the opinions and recommendations of all Business Unit

Recovery Team members.

The purpose of the exercise is to:

1. Confirm the validity, accuracy and workability of the Business Continuity Plans

2. Ensure that all required resources, including personnel, are available in time

3. Validate that Recovery Team personnel are trained and understand the Business Continuity

Plans

The BCP Committee or Management Recovery Team is responsible for BC Plan exercises.

Results of individual Business Unit exercises must be reported to the BCP Committee in a prescribed format and should define the actions taken by the business unit to correct the BC Plan issues identified by the exercise.

Suggested Draft policy statement - Exercising of all Business Continuity Plans

“Exercising Business Continuity Plans is an integral and critical part of Business Continuity Management and Business Continuity Plans.”

This is a typical statement of exercising objectives for a BCP Committee:

“Provide exercise platforms to ensure business unit confidence in the recovery process for technology & systems, people & planning, business systems, and continuity of operations.”


Scenario Planning or WHAT IF?

You can either be surprised by the future or be prepared for it. With markets and technology becoming less predictable, scenario planning will help exercise your organization’s business continuity strategy and BC Plans. Consultants, academics and corporate planners agree that rapid change in markets and technology makes reliance on traditional planning increasingly dangerous.

Documentation and Exercising Output

It is imperative that all parties review their BCP documentation to ensure it is current prior to initiating the exercise. After the exercise is complete, produce a report summarizing results, any action items (with expected completion dates), and a table showing expected recovery times for each system. Distribute this report to all stakeholders, scope document signatories, and exercise participants.
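The recovery-time table in such a report could be produced along the following lines. This is a minimal sketch: the system names, objectives, and measured times are hypothetical, and outcomes are worded as actions rather than as pass or fail, in keeping with the "exercise, not test" stance of this chapter.

```python
from dataclasses import dataclass


@dataclass
class ExerciseResult:
    system: str
    rto_hours: float       # agreed recovery time objective
    measured_hours: float  # recovery time observed during the exercise

    @property
    def within_objective(self):
        return self.measured_hours <= self.rto_hours


def summarize(results):
    """Build a plain-text table of recovery times for the exercise report."""
    lines = [f"{'System':<22}{'RTO (h)':>9}{'Measured (h)':>14}  Outcome"]
    for r in results:
        outcome = "objective met" if r.within_objective else "action item raised"
        lines.append(
            f"{r.system:<22}{r.rto_hours:>9.1f}{r.measured_hours:>14.1f}  {outcome}"
        )
    return "\n".join(lines)


def action_items(results):
    """Systems whose measured recovery time exceeded the objective."""
    return [r.system for r in results if not r.within_objective]


results = [
    ExerciseResult("order entry", rto_hours=2, measured_hours=1.5),
    ExerciseResult("credit authorization", rto_hours=8, measured_hours=11),
    ExerciseResult("payroll", rto_hours=72, measured_hours=30),
]

if __name__ == "__main__":
    print(summarize(results))
    print("Follow up on:", action_items(results))
```

Each flagged system would then be assigned an owner and an expected completion date before the report is distributed.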

Reviewing the exercise plan, timing, cost, objectives

This is an ongoing task for the BC Committee or Management Recovery Team.

Consideration should be given to:

• Change management

• New business units

• Results of previous exercises

• New scenarios identified from risk analysis and identification

• Critical staff views on regularity of exercises

• Incident register recordings

• Cost identification recorded for analysis

Audit the exercise regularly to evaluate different elements of BCP

Based on a recent Gartner audit, business units in all SE Asia regions carry out BCP exercises annually. A few performed additional, smaller-scale tests on a monthly or quarterly basis.

A business case would be a standard part of the deliverable. It would include how much it will

cost and how much it would save in operations after implementation.


Chapter 8: Recommendations

Exercising BC Plans is the most important continuing function of BC management. It requires regular and consistent planning, documenting, communicating, and analyzing of incidents that require BC Plans to be invoked unexpectedly. Therefore, the element of surprise should be a feature of all BC Plan exercises. If you rely solely on internal resources, exercises may become too routine.

Recommendations

1. Consolidate server-attached storage to a Networked Storage Solution

2. Establish a Hot/Warm DR Site with remote replication

3. Improve the Backup/Restore Process

4. Increase Availability

5. Implement Storage Resource Management (SRM) Tools

This allows you to plan your IT purchases to optimally meet scalability requirements.


Chapter 9: TCO and ROI Analysis

Disaster Recovery Strategies that have evolved as part of the exercise

Now that all the Disaster Recovery requirements have been documented, they should be rolled

into viable Disaster Recovery strategies.

These Disaster Recovery strategies include the recovery of critical infrastructure and services but

not the recovery of Desktop platforms or their supporting network infrastructure. The proposed

network infrastructure will only support host and backup platforms at the proposed alternate site.

This doesn’t mean that the networking requirements for desktop or notebook access aren’t critical; rather, because desktops are distributed and covered by the DRP of each individual Business Unit, it would be hard to consolidate the deployment of the required switches at any single location.

Assumptions:

1. Based on currently available information and established scope, pricing is expected to be

within a 20% range of the indicative pricing (plus or minus). During the subsequent Stage 2

engagement a fully costed proposal, technical rollout plan, and timeframe for rollout would be

delivered.

2. This pricing doesn’t consider the ABC resources required during the scoping, solutioning, and

delivery stages of the DRP solution.

3. The pricing presented (in the CAPEX portion) is based on the purchase of the infrastructure.

Other financial alternatives (including leasing and asset management options) are available

and will be covered in the subsequent Stage 2 engagement.

The preferred strategy follows.


Strategy 1: Rapid Recovery Solution with Advanced Functionality - The Preferred Strategy

FOCUS: Rapid Recovery

Target for recovery of service:

2hrs – 8 hrs

Primary Site Requirements:

• Storage Arrays and SAN deployed to consolidate information.

• Real Time Bi-Directional Replication Infrastructure (DWDM / Replicating Software)

• Possibly consolidate all production systems at a single site or distribute between Primary & Secondary sites

Between Sites Requirements:

• Telecommunications Links (Dark Fiber)

• WAN links (existing)

Secondary Site Requirements: Hot DR site with

• Secondary site power & environmental facilities

• Storage Arrays and SAN deployed

• Real Time Bi-Directional Replication Infrastructure (DWDM / Replicating Software)

• Dedicated Network infrastructure for fail-over. Details of Network failover worked

• Dedicated Host & Backup Infrastructure

Advanced Functionality: (Incremental Value over Strategy 2)

Automation and Integration:

• Discovery, configuration, operation of multiple disk replicas

• Automation of source data restoration to production servers or alternate hosts

• Automation of provisioning and related reporting

• Automation of policy-based activities and system administration tasks

• Integrated Performance Monitoring

Scalability & Performance:

• Solution can scale to include Enterprise Business Application across BUs

• Ensure minimal performance degradation as these platforms are integrated into the deployed solution

DRP Process Integration Consulting:

• Professional Services engagements during the implementation process to ensure features implemented during DRP implementation are integrated into BCP plans of other BUs.


Consideration: Management
• Pro: 1. Better control of future DRP requirements; 2. Least Business Impact; 3. More options for replication and distribution of data
• Con: 1. Some resource redeployment and training required; 2. Needs commitment from other Business Units

Consideration: Ease of Recoverability
• Pro: 1. Most available and scalable of options; 2. Frequent Testing of DR procedure possible

Consideration: Price
• Pro: 1. Strong TCO + Strongest ROI
• Con: 1. Higher initial + maintenance cost

Solution: 2 site synchronous storage based replication using EMC Enterprise Storage

Indicative Pricing:

INFRASTRUCTURE COMPONENTS: PRICING:

CAPEX (INCLUSIVE OF 1 YR. MAINTENANCE)

Storage Arrays and SAN deployed at Primary & Secondary sites + SRM Software + Replicating Software $ x million

Dedicated Host Infrastructure at Secondary site $ x million

Dedicated Backup Infrastructure at Secondary site $ x million

Dedicated Network infrastructure for fail-over at Secondary site $ x million

Sub Total: $ 3x

REVEX (ONE-TIME IMPLEMENTATION EXPENSE)

One-time Professional Services to deploy and integrate infrastructure $ x million

OPEX (ON-GOING OPERATING EXPENSE)

Secondary site facilities (per/annum) 12 x $ x/ 1.78 = $ x million

Real Time Replication Infrastructure (DWDM / Telecommunications Links) as a service 2 x $ x/ 1.78 = $ x million

Maintenance from second year $ x million

Tapes bunkered at new site No additional cost

Sub Total: (Per Year) $ 1x


Synchronous versus Asynchronous Replication (Intermediate Recovery) Comparative Pricing:

Based on earlier pricing presented to ABC, we established that the incremental CAPEX for asynchronous replication (the networking equipment required to support it) negated some of the OPEX benefit of its lower telecom operating costs. We decided that the incremental benefits of a predominantly synchronous solution outweighed the marginal additional costs.

Pricing Summary for ABC Requirements:

Based on Strategy 1 and the data provided to us, the following pricing changes should be considered when planning for ABC’s DRP requirements:


| INFRASTRUCTURE COMPONENTS | PRICING |
| --- | --- |
| CAPEX (INCLUSIVE OF 1 YR. MAINTENANCE) | |
| Storage Arrays and SAN deployed at Primary & Secondary sites + SRM Software + Replicating Software | $ x million |
| Dedicated Host Infrastructure at Secondary site (ABC Hosts) | $ x million |
| Dedicated Backup Infrastructure at Secondary site | $ x million |
| Dedicated Network infrastructure for fail-over at Secondary site | $ x million |
| Sub Total | $ 3.3x |
| REVEX (ONE-TIME IMPLEMENTATION EXPENSE) | |
| One-time Professional Services to deploy and integrate infrastructure | $ x million |
| OPEX (ON-GOING OPERATING EXPENSE) | |
| Secondary site facilities (per annum) | 12 x $ x / 1.78 = $ x million |
| Real Time Replication Infrastructure (DWDM / Telecommunications Links) as a service | 2 x $ x / 1.78 = $ x million |
| Maintenance from second year | $ x million |
| Tapes bunkered at new site | No additional cost |
| Sub Total (Per Year) | $ 1x |


Strategy 2: Standard Rapid Recovery Solution

FOCUS: Rapid Recovery

Target for recovery of service: 2 hrs – 8 hrs

Primary Site Requirements:

• Storage Arrays and SAN deployed to consolidate information.

• Real Time Bi-Directional Replication Infrastructure (DWDM / Replicating Software)

• Possibly consolidate all production systems at a single site or distribute between Primary & Secondary sites

Between Sites Requirements:

• Telecommunications Links (Dark Fiber)

• WAN links (existing)

Secondary Site Requirements: Hot DR site with

• Secondary site power & environmental facilities

• Storage Arrays and SAN deployed

• Real Time Bi-Directional Replication Infrastructure (DWDM / Replicating Software)

• Dedicated Network infrastructure for fail-over. Details of Network failover worked

• Dedicated Host & Backup Infrastructure

| Consideration | Pro | Con |
| --- | --- | --- |
| Management | 1. Better control of future DRP requirements; 2. Some Business Impact; 3. Not as option rich for replication and distribution of data | 1. Less resource redeployment and training required; 2. Needs commitment from other Business Units |
| Ease of Recoverability | 1. Still high availability; 2. Frequent Testing of DR procedure possible | 1. Not the most scalable options |
| Price | 1. Stronger TCO + Strong ROI; 2. Lower initial + maintenance cost | 1. Incremental functionality has to be negotiated on a case by case basis with incremental cost |


Solution: EMC Enterprise Storage based Synchronous Replication

Indicative Pricing:

| INFRASTRUCTURE COMPONENTS | PRICING |
| --- | --- |
| CAPEX (INCLUSIVE OF 1 YR. MAINTENANCE) | |
| Storage Arrays and SAN deployed at Primary & Secondary sites + SRM Software + Replicating Software | $ x million |
| Dedicated Host Infrastructure at Secondary site | $ x million |
| Dedicated Backup Infrastructure at Secondary site | $ x million |
| Dedicated Network infrastructure for fail-over at Secondary site | $ x million |
| Sub Total | $ 2.5x |
| REVEX (ONE-TIME IMPLEMENTATION EXPENSE) | |
| One-time Professional Services to deploy and integrate infrastructure | $ x million |
| OPEX (ON-GOING OPERATING EXPENSE) | |
| Secondary site facilities (per annum) | 12 x $ x / 1.78 = $ x million |
| Real Time Replication Infrastructure (DWDM / Telecommunications Links) as a service | 2 x $ x / 1.78 = $ x million |
| Maintenance from second year | $ x million |
| Tapes bunkered at new site | No additional cost |
| Sub Total (Per Year) | $ 0.75x |
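As a rough illustration of how the indicative subtotals of the two strategies compare over time, the sketch below adds each strategy's CAPEX subtotal to its yearly OPEX subtotal. It rests on assumptions the source does not confirm: that x denotes the same amount in every subtotal, that OPEX stays flat from year to year, and that the first year can be treated like any other even though CAPEX already includes one year of maintenance. REVEX is left out because no multiple is given for it. The function and variable names are illustrative only.

```python
# Cumulative cost in multiples of the placeholder unit x (an assumed common unit).

def cumulative_cost(capex: float, opex_per_year: float, years: int) -> float:
    """Return CAPEX plus flat yearly OPEX over the given number of years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return capex + opex_per_year * years


# (CAPEX subtotal, OPEX subtotal per year) from the indicative pricing tables.
strategies = {
    "Strategy 1": (3.0, 1.0),
    "Strategy 2": (2.5, 0.75),
}

for years in (1, 3, 5):
    for name, (capex, opex) in strategies.items():
        total = cumulative_cost(capex, opex, years)
        print(f"{name}, {years} yr: {total:.2f}x")
```

Under these assumptions the gap between the strategies widens by 0.25x for every year of operation, which is one way to weigh the richer feature set of Strategy 1 against its cost.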


Chapter 10: Conclusions

People have begun to understand that Business Continuity Planning is, in essence, a pragmatic undertaking. BCP is well within the grasp of common-sense individuals possessing sound analytical and communication skills.

Jargon aside, the “new focus” of BCP is to recover critical business operations as expeditiously as possible following an unplanned interruption. Now is the time to develop Business Continuity capability. New technologies will keep evolving and BCP will keep changing with them, but for organizations that want to sustain their competitive edge, this is the way to go. BCP has to be driven by top management, and every individual in the organization MUST understand his or her role and value. Change management is the most CRITICAL SUCCESS FACTOR for the Business to continue after a disaster. If you have a plan in place but no change guidelines, it is better to have no plan.

The Art of War suggests that the victor survives not through luck or chance but through a clearly outlined strategy executed at all levels within the ranks, with each individual's objectives aimed at achieving the team's collective goal.


Appendix A: Sample Risk Analysis

Likelihood (Rate 1-5) x Impact (Rate 1-5) = Relative Weight (W)

Likelihood: 1 = Very Low, 2 = Low, 3 = Medium, 4 = High, 5 = Very High

Impact: 1 = Negligible, 2 = Some, 3 = Moderate, 4 = Significant, 5 = Severe

Risk Category: A => W < 8, B => 8 ≤ W < 12, C => 12 ≤ W

| S.No. | Threat or Trigger | Likelihood | Impact | Relative Weight (W) | Risk Category |
| --- | --- | --- | --- | --- | --- |
| 1 | Earthquake | 2 | 4 | 8 | B |
| 2 | Power Failure | 4 | 3 | 12 | C |
| 3 | Fire | 2 | 2 | 4 | A |
| 4 | Hurricane | 1 | 4 | 4 | A |
| 5 | Tsunami | 3 | 3 | 9 | B |
| 6 | Flood | 1 | 5 | 5 | A |
| 7 | Bombing | 4 | 4 | 16 | C |
| 8 | NBC* Attack at Site | 2 | 5 | 10 | C |
| 9 | NBC* Attack within 50 kms | 4 | 4 | 16 | C |
| 10 | Cyber Attack | 5 | 3 | 15 | C |
| 11 | Kidnapping | 2 | 3 | 6 | A |
| 12 | Sabotage | 3 | 2 | 6 | A |
| 13 | Hazardous Accident | 3 | 3 | 9 | B |
| 14 | Product Recall | 2 | 2 | 4 | A |
| 15 | Public Health | 2 | 2 | 4 | A |
| 16 | Work Stoppage | 3 | 2 | 6 | A |

*NBC = Nuclear, Biological, Chemical
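The scoring in this appendix can be sketched in a few lines of Python. This is a minimal illustration, not part of the original analysis. It takes 8 and 12 as the lower edges of categories B and C, as the Earthquake (W = 8, B) and Power Failure (W = 12, C) rows suggest. The table rates "NBC Attack at Site" (W = 10) as C, which the thresholds alone do not reproduce, and the sketch does not try to model that. All function and variable names are assumptions made for the example.

```python
# Sketch of the Appendix A scoring: weight = likelihood x impact, then a band.
# Band edges are inferred from the table rows (8 -> B, 12 -> C).

def relative_weight(likelihood: int, impact: int) -> int:
    """Multiply the two 1-5 ratings to get the relative weight W."""
    for rating in (likelihood, impact):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
    return likelihood * impact


def risk_category(weight: int) -> str:
    """Map a relative weight to risk category A, B or C."""
    if weight < 8:
        return "A"
    if weight < 12:
        return "B"
    return "C"


# A few rows copied from the table: (threat, likelihood, impact).
threats = [
    ("Earthquake", 2, 4),
    ("Power Failure", 4, 3),
    ("Fire", 2, 2),
    ("Cyber Attack", 5, 3),
]

# Highest weight first, so the most pressing threats head the list.
ranked = sorted(
    ((name, relative_weight(likelihood, impact)) for name, likelihood, impact in threats),
    key=lambda row: row[1],
    reverse=True,
)

for name, weight in ranked:
    print(f"{name}: W={weight}, category {risk_category(weight)}")
```

Keeping the ratings and the banding rule in separate functions makes it easy to re-rate a threat or move a band edge without touching the rest of the analysis.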




Biography

Smartha Guha Thakurta is a Technology Consultant and Product Marketing Manager who holds the following credentials:

• EMC Technology Architect NAS Expert

• EMC Technology Architect CAS Specialist

• EMC Technology Architect Storage and Infrastructure Specialist

• Symmetrix Speed

• NAS Speed

He has also achieved the following certifications:

• Brocade: Brocade Certified SAN Designer

• Network Appliance: Network Appliance Storage Associate

• Hitachi Data Systems: Hitachi 9500 Presales Certified Professional

• Sun Solaris

• Sun Certified Cluster 3.x Installer

• Sun Certified Field Engineer

• Sun Certified Network Administrator for Solaris 8

• Sun Certified System Administrator for Solaris 8, Level 2 & 1

• Veritas: Veritas Certified Presales Professional

• Microsoft: Microsoft Certified Professional
