REGIS University ARNe Network
Data Backups and Disaster Recovery Plans for the ARNe Network
by
Anthony O. Ayodele
An Operational Guideline and Procedure for Academic Research Network
Practicum Paper submitted in partial fulfillment of the requirements for the degree
of Master of Science in Computer Information Technology
School for Professional Studies
Regis University
Denver, Colorado
07-30-05
School for Professional Studies
Regis University
MSCIT Program
Certification of Authorship of Professional Project Work
Submitted to Dan Likarish
Student’s Name: Anthony Ayodele
Date of Submission:
Title of Submission: Data Backups and Disaster Recovery Plans for ARNe
Network
Certification of Authorship: I hereby certify that I am the author of this document
and that any assistance I received in its preparation is fully acknowledged and
disclosed in the document. I have also cited all sources from which I obtained
data, ideas, or words that are copied directly or paraphrased in the document.
Sources are properly credited according to accepted standards for professional
publications. I also certify that this paper was prepared by me for the purpose of
partial fulfillment of requirements for the MSCIT degree.
Student’s Signature:
School for Professional Studies
Regis University
MSCIT program
IT Section: Data Access Group
Procedure Name: Data Backups and Disaster Recovery Plans
Created by: Anthony Ayodele
Approval by:
Document Library #
Date Created: 07-30-2005
Date Approved:
Introduction: This procedure walks the reader through the backup and
disaster recovery plans for the ARNe Network.
Precedence or Reference:
School for Professional Studies
Regis University
MSCIT program
Advisor/MSC 696 Faculty Approval Forum
Student’s Name: Anthony Ayodele
Professional Project Title: Data Backups and Disaster Recovery Plan for the
ARNe Network
Advisor’s Declaration: I have advised this student through the Professional
Project process and approve of this final document as acceptable to be
submitted as fulfillment of partial completion of requirements for the MSC 696
course. The student has received project approval from the Advisory Board and
has followed due process in the completion of the project and subsequent
documentation.
ADVISOR Dan Likarish. Asst. Professor
Signature Date
Abstract
Data Backup and Disaster Recovery Plan for the ARNe Network
This research paper will serve as a safeguard and best-practice procedure plan for
the ARNe network with respect to data backups and disaster recovery, so that we
can be fully prepared when disaster strikes.
This paper details the methodology that ARNe will use to implement its data
backup and disaster recovery plan.
Acknowledgment
I would like to start by thanking the Author of Life, God the creator of all things.
My special thanks go to Asst. Professor Dan Likarish for his invaluable
comments and constructive feedback throughout the course of writing this paper.
Dan laid out the standards and leadership direction that served as a guide through
the practicum class.
My special appreciation goes to Dr. James Lupo for his assistance on this project.
As a co-coordinator for the practicum lab, Dr. Lupo provided direction and
useful insight into this project.
I offer my heartfelt thanks to all my course mates for their peer review of the
Review of Literature and Research section of this paper.
Finally, my thanks go to my father, the late Bishop Joseph Ayodele, for his moral
support and contribution towards my education.
Table of Contents
1.0 Introduction
2.0 Review of Literature and Research
2.1 Microsoft Operations Framework (MOF)
2.2 Service-oriented architecture (SOA)
I. Literature and research that is specific/relevant to the project
II. What is known and unknown about the project topic
III. Contribution this project will make to the Academic Research Network
3.0 Methodology
4.0 Project History
4.1 Data Backup and Recovery
4.2 Disaster Recovery
5.0 Lessons Learned and Next Evolution of the Project
5.1 Conclusion
Practicum Support Documentation
List of Tables
Table 4.1 Data Backup and Recovery Support Matrix
Table 4.2 Backup and Recovery Support Responsibilities
Table 4.3 Data Backup Configuration and Management
Table 4.4 Data Daily Monitoring and Failure Notification
Table 4.5 (file restoration and Recovery of Corrupt or deleted files)
Table 4.6 Media Labels
List of Figures
Figure 2.1 MOF (Microsoft Operational framework) Quadrant
Figure 2.2 (SMF service Management function of each MOF Quadrant)
Figure 2.3 (SOA) Service Oriented Architecture
Figure 3.0 SDLC (System Development Life Cycle)
Bibliography
References
Definition of Terms
Chapter 1
1.0 Introduction
This project describes the methods and procedures to be used by ARNe for data
backup and restore. It also acts as a safeguard procedure in the event of a disaster.
Data corruption, viruses, hard disk failure, power failure, accidental or malicious
data deletion, theft and natural disasters are all situations that call for a
meaningful disaster recovery policy.
Security risk analysis, otherwise known as risk assessment, is fundamental to the
security of any organization. It is essential in ensuring that controls and
expenditure are fully commensurate with the risks to which the organization is
exposed.
A critical part of handling any serious emergency situation is the management
of the Disaster Recovery Phase. By definition, the Disaster Recovery Phase is
likely to involve, to a significant degree, external emergency services. The priority
during this phase is the safety and well-being of the employees and other
involved persons, the minimization of the emergency itself, the removal or
minimization of the threat of further injury or damage, and the re-establishment of
external services such as power, communications and water. A significant task
during this phase is also the completion of Damage Assessment Forms.
The Disaster Recovery Phase may involve different personnel depending upon the
type of emergency, and a Disaster Recovery Team should be nominated
according to the requirements of each specific crisis.
Today, business continuity planning and disaster recovery planning are
generally acknowledged as vital elements of an organization's business activity
plans. However, the creation and maintenance of a sound business continuity
and disaster recovery plan is a complex undertaking, involving a series of steps.
An organization must analyze what needs to be achieved in order to carry on as
though the disaster never happened. Data and assets must be identified for
restoration, documentation and preservation to reduce loss.
Prior to creation of the plan itself, it is essential to consider the potential impacts
of disaster and to understand the underlying risks: these are the foundations
upon which a sound business continuity plan or disaster recovery plan should be
built. Following these activities, the plan itself must be constructed. The plan
must then be maintained, tested and audited to ensure that it remains
appropriate to the needs of the organization.
Chapter 2
2.0 Review of Literature and Research
In supporting and managing the ARNe network, the Microsoft Operations Framework
(MOF) and Service-Oriented Architecture (SOA) will be used as the standard of
best practice for the System Engineering and Application Development
Practicum (SEADP). The strategic nature of the ARNe network calls for an
operational framework that can stand the test of time, in order to achieve high
2.1 Microsoft Operations Framework (MOF)
The framework is divided into four quadrants namely:
1) Optimizing
2) Supporting
3) Operating
4) Changing
See figure 2.1 (MOF) Overview of MOF Quadrant
2.1.1 Optimizing: delivering the best service possible
a) Service Level Management - All service providers of the network (such as Qwest)
will be required to meet an agreed level of service based on the needs of the
network, so that it can serve its purpose. Business-focused service levels will be
created, managed, met and improved.
b) Capacity Management - Meet demands on services by controlling capacity
requirements.
c) Availability Management – The ARNe network will be up and running 24/7,
except when maintenance is being carried out.
d) Financial Management – The running cost of 100k will be maintained.
e) Workforce Management - Students in the practicum will support and
maintain the network within the budget constraint. MSCIT faculty members will
be in charge of the daily operation of the network.
There will be new improvements to service and delivery as the network continues
to grow. To accommodate any proposed changes to services, the approval
process includes confirming business priority, cost/benefit analysis and release
plans.
2.1.2 Changing: Managing changes in the enterprise
a) Change Management – All hardware and software changes to the network will
be recorded, tracked, assessed and monitored.
b) Configuration Management – All configuration and updates on any network
infrastructure will conform to standard procedures and business rules.
c) Release Management - All software and hardware releases into the ARNe
network will be deployed in the most efficient manner, without any disruption of
service.
There will be adequate planning and a release readiness review before the release
of any new product into the network, to ensure that changes happen smoothly with
minimal disruption to the IT services.
2.1.3 Supporting: Responsive high quality support
a) Service Desk - Practicum MSCIT students with the required skills will serve as
the first point of contact in problem resolution.
b) Incident Management – The Track-it service desk will be used to report, monitor
and escalate all incidents in conformity with the SLA (Service Level Agreement)
with all parties.
c) Problem Management - All problems will be identified, managed, resolved and
documented in a centralized knowledge management database, with the aim of
proactively preventing problems from happening.
This will ensure customer satisfaction by reviewing the IT performance delivered
for the services against the targets documented within the Service Level
Agreement (SLA).
2.1.4 Operating: Successful, reliable and predictable day-to-day IT
Operations
a) System Administration – MSCIT faculty will provide day-to-day
administrative services and be responsible for providing direction for operations.
b) Security Administration – The implementation of Single Sign-On and
firewall security will ensure that IT is safe, confidential, accurate and available.
c) Service Monitoring and Control – All network resources will be monitored for
optimization, availability and efficiency. Notifications will be sent so that the
right people know what is going on.
d) Network Administration - Access to the server and all physical components of
the network will be restricted to authorized students.
e) Directory Services Administration - Application delivery will be through the
Citrix server; this will ensure that all students and faculty have access to the right
information and applications whenever they need them.
f) Storage Management - It is important to have high-performance SAN
storage with outstanding scalability (IBM TotalStorage DS4300 and the Hitachi
storage). ARNe will make use of these systems as the practicum grows.
See figure 2.2 (MOF) Overview of Service Management Functions of each
Quadrant
2.2 Service-oriented architecture (SOA)
Successful integration for today’s business must accommodate a high level of
variety and change involving a large number of systems, applications, data
formats, standards and connectivity for both legacy systems and new applications.
Driven by business and technical factors, this growing volatility makes the goal of
enterprise integration a complex, hard-to-reach moving target for today’s
professionals.
Service-Oriented Architecture offers a fresh approach to business integration
that provides more flexibility through technologies such as Web Services,
Asynchronous Messaging, Business Process Management (BPM) and the Enterprise
Service Bus (ESB).
A service-oriented architecture is essentially a collection of services. These
services communicate with each other. The communication can involve either
simple data passing or it could involve two or more services coordinating some
activity. Some means of connecting services to each other is needed.
Service-oriented architectures are not a new thing. SOA and its related
technologies are being adopted across a range of industries by both large and
small to medium-sized businesses. For many people, the first service-oriented
architecture they encountered was built with DCOM or with Object Request Brokers
(ORBs) based on the CORBA specification.
1) Services
If a service-oriented architecture is to be effective, we need a clear
understanding of the term service. A service is a function that is well-defined,
self-contained, and does not depend on the context or state of other services.
2) Connections
The technology of Web services is the most likely connection technology of
service-oriented architectures. Web services essentially use XML to create a
robust connection.
The following figure (see figure 2.3) illustrates a basic service-oriented
architecture. It shows a service consumer at the right sending a service request
message to a service provider at the left. The service provider returns a response
message to the service consumer. The request and subsequent response
connections are defined in some way that is understandable to both the service
consumer and the service provider. How those connections are defined is explained
in Web Services explained. A service provider can also be a service
consumer.
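The request/response exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message names (`backupStatusRequest`, `backupStatusResponse`), the function names and the status data are hypothetical and are not part of the ARNe network; the provider is called directly rather than over a network so that the example stays self-contained.

```python
import xml.etree.ElementTree as ET


def backup_status_service(request_xml):
    """Hypothetical service provider: answers an XML request with an XML response."""
    request = ET.fromstring(request_xml)
    server = request.findtext("server", default="")
    # Illustrative data only; a real provider would query the backup system.
    known_status = {"DTCBACK01": "ok"}
    response = ET.Element("backupStatusResponse")
    ET.SubElement(response, "server").text = server
    ET.SubElement(response, "status").text = known_status.get(server, "unknown")
    return ET.tostring(response, encoding="unicode")


def request_backup_status(server):
    """Hypothetical service consumer: builds the request and reads the response."""
    request = ET.Element("backupStatusRequest")
    ET.SubElement(request, "server").text = server
    response_xml = backup_status_service(ET.tostring(request, encoding="unicode"))
    return ET.fromstring(response_xml).findtext("status")
```

The consumer and provider share only the agreed XML message format, which is the sense in which the connection is "understandable to both" parties.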
With integration processes as the key building blocks of a flexible integration
strategy, SOA can accommodate variety and change, thereby fully delivering on
the promises of agile enterprise.
I. Literature and research that is specific/relevant to the project
The main focus of this project is to develop an operational procedure for data
backup and disaster recovery on the ARNe network.
The areas to be examined are:
1) Storage management
2) Continuity management
Each area of the Regis ARN will be analyzed and then suggested guidelines will
be created to use in changing the network into a more structured format.
a) Storage management. The purpose of storage management is to properly
maintain, monitor, and develop policies for storing, backing up, and restoring
data. The roles involved with the functions of storage management are storage
manager, media librarian, and capacity manager. The storage manager has total
responsibility for ensuring proper storage management processes are being
followed. The practicum data access groups are responsible for tracking all
media used for backup operations. The practicum faculty will be responsible for
ensuring that current storage capacities and processes are meeting the
requirements of the organization, and for projecting changes to such
requirements based upon foreseeable storage growth.
Microsoft “best-practices” will be used to ensure proper storage configuration
management. Media sets, off-site rotation schedules, scheduled
restoration tests, and server space storage checklists will be used to ensure that
proper storage management is being conducted.
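As one illustration of what a server space storage checklist item might look like, the following is a minimal Python sketch and not part of the ARNe toolset. The function names and the 20% free-space threshold are assumptions chosen for illustration; the practicum would set its own threshold.

```python
import shutil

# Assumed threshold for illustration only: flag volumes with under 20% free.
MIN_FREE_FRACTION = 0.20


def free_space_ok(total_bytes, free_bytes, min_free_fraction=MIN_FREE_FRACTION):
    """Return True when the free share of a volume meets the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes >= min_free_fraction


def check_volume(path, min_free_fraction=MIN_FREE_FRACTION):
    """Checklist item: read disk usage for `path` and apply the threshold."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free
    return free_space_ok(usage.total, usage.free, min_free_fraction)
```

Calling `check_volume` for each backup volume and recording the result would give the checklist a repeatable pass/fail entry.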
b) Continuity management: This is concerned with ensuring that critical
services remain available to customers. Continuity management is usually
associated with disaster recovery procedures and maintaining high availability of
services.
The major focus of this project is disaster recovery for the purpose of business
continuity and operation in the face of any failure defined within other functional
areas.
Each new NLP participant will be required to become familiar with the processes
laid out in this project. The facilitators of the NLP must ensure that exiting
students have updated all processes and that entering students understand the
goals, design, structure, and processes for the entire NLP and each site/domain.
II. What is known and unknown about the project topic
The current Data Backup is not fully operational, though we have all the
resources needed to put it into operation. Faculty and practicum students are
aware of the existence of all the tools. Since all operations of the practicum
need appropriate documentation, it is necessary to have a laid-down procedure to
serve as a guide for future usage and improvement, particularly with the advent
of SharePoint (a repository for all documentation and communication).
One unknown factor for this project is the budget available for its
implementation. I am aware that NLP faculty are always sourcing funds to make the
NLP a success. The goals and guidelines proposed by this project may never be
accepted by Regis administration as a cost-effective resource for student
learning.
Another unknown factor is the general acceptance of the procedures and plans
described in this project by the NLP faculty. Student participation at the various
campuses is not equal, and this might hamper the implementation of this project.
III. Contribution this project will make to the Academic Research
Network
This project is to serve as a standard procedure for Data Backup and Disaster
Recovery activities on the ARNe. Since there is no previous procedure for
Data Backup and Disaster Recovery, this project will serve not just as a
starting point but as a foundation that subsequent practicum students can
build on. The procedures presented in this project will serve as a guide for ARN
practicum students who will be involved in the actual Data Backup activities. The
objectives and guidelines presented in this project offer standards to be followed
at each location. In addition, this project will lay out fundamental procedures for
maintaining continuity between cycles of NLP students. This project will give
practicum students a basic understanding of Data Backup and Disaster
Recovery.
Chapter 3
3.0 Methodology
A system development methodology provides guidelines to follow for completing
every activity in the systems development life cycle, including specific models,
tools, and techniques. These methodologies examine the need for, and the risk
associated with, the implementation of the proposed Data Backup and Disaster
Recovery Plan for the ARNe Network.
The development phases for this project will follow the standard format of
any systems development life cycle (SDLC): planning, analysis, design,
implementation, and support. The real-world implementation of this project may
or may not occur. The implementation of this project rests solely on ARN NLP
management and the support of practicum students.
Figure 3.0 System Development Life Cycle for the Project (diagram elements:
Planning; Analysis; Design; Implementation; Support/Maintenance; Update and
Review documentation; Suspend Project; Technology Vendors/Service Provider)
1) Planning phase: The planning phase involves justifying the feasibility of
the project and whether it is worth investing time and energy in. The need to have
a standardized procedure of operation in the ARN NLP with respect to data backup
and disaster recovery is very important. The structure for determining guidelines
(i.e. the Microsoft Operational Framework) was already chosen by ARN NLP
management before this project was constructed. Therefore a need and an
operating template were already chosen. The only thing that required planning
was limiting the scope of the project to an area that was manageable. This
was accomplished by examining only nine areas of the Microsoft Operational
Framework that need to be addressed concerning operational control guidelines,
change control guidelines, and continuity planning.
2) Analysis phase: After examining the rapid growth and expansion of the
ARNe network, it is paramount to have a procedure for data backup
and disaster recovery to safeguard against the loss of valuable data and assets.
MOF (Microsoft Operations Framework) and SOA (Service Oriented Architecture)
best practices were used as the baseline for the analysis of the ARNe network.
The tools used for this project are Microsoft Visio and the MOF and SOA instruction
guides. The instruction guides were examined to determine the “best-practices”
to be used within the ARN NLP. Visio was used to create logical,
physical, and organizational diagrams.
Life-cycle models to be followed
3) Design phase: This will involve listing all assets and other valuable data
to be protected and backed up. Software and hardware upgrades will also be part of
this process. It might require changing or replacing some of the systems to
accommodate any anticipated capacity. With the full participation of future ARN NLP
students, this phase will be an opportunity to get hands-on experience.
4) Implementation phase: This will involve actual execution of the Backup and
Disaster Recovery Procedure in steps, to ensure that it meets all its intended
purposes. This phase will be the easiest to carry out once all resources have
been put in place. However, this phase can also be daunting because of the
same factors that inhibit the third phase. Participation of all ARN NLP practicum
members, especially the Data Access group, will be needed. Funding is
another important factor here, as lack of funding will make implementation
unrealistic. Effective training and constant update and review of documentation are
also important.
5) Support phase: This is not an actual phase on its own. After the implementation
phase, the support phase will involve maintenance of the systems. Creating a
knowledge base during the support phase will provide a repository for
troubleshooting, and this will help improve the procedures. Data
Access group members should be fully involved in support.
Chapter 4
4.0 Project History
I. How the project began
The Data Access group of the Regis System Engineering and Application Development
Practicum is saddled with the responsibility of maintaining the security
and integrity of data.
To accomplish this task, the group is required to:
a) monitor server health and performance
b) perform backups and disaster recovery
c) verify and adjust the security configuration of servers and desktops
d) maintain the storage devices
e) create and maintain UNIX and Windows user accounts.
This project is one of the core assignments of the Data Access group. At the
very beginning of this project, it was realized that no specific procedure or
standard for Data Backups and Recovery was in place, though there was
previous documentation on Data Backups and Recovery. Adam Brennen’s
project “Combine_6_12_04.doc” was one reference for this project. Through my
participation in various group meetings and consultation with practicum faculty, I
was able to establish the basis for this project.
After due consultation with SEADP faculty (Dan Likarish and Dr. Jim Lupo), it
became necessary to have a written procedure on best practices for Data
Backup and Disaster Recovery on the ARNe network.
Implementation
An appreciable knowledge of the ARNe network, its assets and distributed
applications helped in the creation and documentation of this project.
II. How the project was managed
The first phase of this project was to define its goals and objectives.
A proposal was then submitted to the faculty to establish the rationale
behind the project.
A project outline with an anticipated timeline was also included in the proposal.
Another vigorous phase of this project was the research and fact-finding
activity, which was done through various consultations with the SEADP faculty and
practicum group members. After thorough research and consultation with
other group members, and with input from SEADP faculty, a guideline for this
project was established.
III. Significant events/milestones in the project
The most significant event of this project was the change of direction of the
project. The project began with one focus and evolved into a much broader focus.
The realization of a need for organizational guidelines, rather than immediate
documentation, was significant to the development of this project.
Another major milestone was the research itself. Almost all of the processes
that need to be implemented in a smoothly run IT organization are already defined
within the Service Management Functions (SMFs) of the Microsoft Operational
Framework. The processes just needed to be modified to meet the organizational
goals of the NLP.
The separation of the ARN development network from the ARN production
network reduced the coverage area of the project. The decisions from NLP
management were not to change the structure or operations of the ARN
production network. The current operations of the production network did not
need to be changed. Therefore, considerations for how the development network
interacted with the production network could be eliminated from the scope of the
project.
Interviews with NLP management demonstrated that not every single process
needs to be defined according to the recommended Microsoft “best-practices”.
Because the ARN development network lacks a true business environment, certain
service level agreements (SLAs) are not required. Without those SLAs, implementing
all of the SMFs defined within the MOF (e.g. high availability management) is not
justified. This realization also reduced the area of concern for
this project.
IV. Changes to the project plan
My initial intention was to start doing daily and weekly data backups on the
ARNe network, but I realized that there was no specific written procedure as to
how and when backup activities should be carried out on the ARNe network.
Because of this, I decided to concentrate on developing a procedure
which subsequent practicum students can follow and update accordingly.
V. Evaluation of whether or not the project met project goals
This project met its goals and objectives:
(1) An operational framework was established
(2) A basic procedure for Data Backup was established
(3) Disaster Recovery plans were formulated
All goals and objectives of this project were structured to accommodate future
inputs and expansion.
VI. Discussion of what went right and what went wrong in the project
The area that went wrong with the project has to do with the lack of current
documentation of the existing network structure that is being used at each
campus of the DTC. The needs identified by NLP management may not be the
needs of each individual campus. High-level ARN management defined the
needs for the entire organization. Therefore the guidelines that were developed
are going to be downward directed. This may cause unrest and contention
amongst members of mid-level ARN management.
In addition, the DTC NLP participants were making changes to the ARN that
were not aligned with the guidelines being developed in this project. The
implementations that the DTC, and possibly other campuses, were making to the
ARN during the development of these guidelines may have to be scrapped and
redesigned if management actually follows through with the implementation of
the guidelines proposed within this project.
The major area that went right in this project is that the chosen operational
framework fits well with the goals defined within the project. The fact that the
Microsoft Operational Framework was already decided upon by ARN
management made the research and development of guidelines very easy to
facilitate.
VII. Discussion of project variables and their impact on the project
The greatest project variable was the decisions of ARN management. The
guidelines presented in this project offer a template for “best practices” in an
environment that has funding and everyday administration. Many responsibilities
fall to high-level and mid-level management to ensure that guidelines are followed.
Management may not view such highly “expensive” guidelines as necessary.
However, the guidelines should be used as a “golden state” template, and
management will tweak the guidelines to meet the needs of the ARN accordingly.
VIII. Findings / analysis results
The analyses of the results of this project are:
(1) An organizational structure was defined;
(2) Needs were identified and prioritized;
(3) a logical/physical network structure was designed; and
(4) Operational guidelines were created for daily network operations.
IX. Summary of results
The analyses of the results of this project are:
(1) an organizational structure was defined;
(2) needs were identified and prioritized;
(3) a logical/physical network structure was designed; and
(4) operational guidelines were created for daily network operations.
4.1 Data Backup and Recovery
(DTCBACK01 Server as a case study)
Backup Procedure: Log in to DTCBACK01
1) From your desktop, select Programs >> Control Panel >> Remote Desktop
Connection, enter www.arn-regis.org as shown below, then connect.
2) You should then see the screen below.
3) After entering your password, you should see the screen below.
4) Select Remote Desktop Connection from the desktop, then log in
again with your username and password. After you successfully log in, you should
be looking at the screen below.
5) You can now select VERITAS Backup Exec from the desktop or go
through the Start menu as shown below.
Finally, you should be looking at the Veritas Overview screen and the available
options as shown below.
See table 4.1 (Data Backup and Recovery Support Matrix)
See table 4.2 (Data Backup and Recovery Support Responsibilities)
4.1.1 Provide Documentation
The Backup and Recovery Team will be responsible for providing documentation
on the installation and configuration guidelines of the backup toolsets. The
documentation will be posted on the SEAD practicum SharePoint site
(www.arn-regis.org). The documentation will be reviewed and updated on a 6-month
review cycle.
4.1.2 Provide Access to the Toolset
The Backup and Recovery Team in conjunction with practicum faculty will be
responsible for providing all files needed for the installation of each supported
version of backup tool, its agents, and software build updates. The locations of
the files are found in the documentation.
4.1.3 Install Software and Patches
The faculty and Backup and Recovery Team will coordinate software installations
on new servers using the documentation and toolset provided by the Backup and
Recovery Team.
The Backup and Recovery Team will install software patches, updates and fixes
as necessary.
4.1.4 Request and Provide Licenses
The Faculty lead will provide appropriate Veritas backup licenses to the Backup
& Recovery Team for new servers being brought into production, hardware
upgrades, database installations, or database upgrades.
Different versions of backup software may not be compatible when used in the
same backup scheme.
4.1.5 Create Change Control
The change implementer will normally submit changes using the MOF change
management approach. Backup and Recovery Team support will submit changes
for software and update installations. Each change must include the Faculty and the
Backup and Recovery Team Lead as approvers.
See table 4.3 (Data Backup and Configuration and Management)
4.1.6 Configure and Verify Backups
The Backup and Recovery Team will be responsible for configuring server and
file backups. The backup configuration will be checked and verified every 4-6
months and whenever a backup failure occurs.
4.1.7 Coordinate Database Backups
The Faculty and the Data Access Team lead will coordinate the entire backup
process to ensure that data is properly secured.
4.1.8 Save and Document Backup Scheme
Backup configurations are saved automatically on a regular basis as part of the
normal log collection process.
4.1.9 Check Failures and Update Reports
The Backup and Recovery Team will investigate the failure reports first thing in
the morning Monday-Friday. After failure detection and resolution the failures are
logged.
4.1.10 Failure Notification and Resolution
The Backup and Recovery Team will try to identify and resolve any failure.
In the case of an unresolved or second consecutive failure, the Backup and
Recovery Team will notify their group team lead and the faculty in charge to
coordinate the best solution.
The Backup and Recovery Team Lead will work with faculty to determine the
root cause of the problem. The knowledge base and other means of
troubleshooting will be utilized. A hardware or network failure that impacts a
backup will be treated as an Unresolved or Second Consecutive Failure if a
workaround cannot be established.
The Backup and Recovery Team will notify the faculty in charge to
determine whether the backup should be moved to other available resources or
whether the risk of subsequent failures should be accepted.
See table 4.4 (Daily Monitoring and Failure Notification)
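The escalation rule in section 4.1.10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions and is not part of the Veritas toolset: the `BackupResult` record, the `needs_escalation` function, and the idea of keeping a per-job history list ordered oldest to newest are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackupResult:
    """One nightly backup outcome (hypothetical record, not a Veritas structure)."""
    job: str
    succeeded: bool
    resolved: bool = True  # for a failed run: did the team resolve it?


def needs_escalation(history: List[BackupResult]) -> bool:
    """Return True when the team lead and faculty should be notified.

    Mirrors section 4.1.10: escalate on an unresolved failure or on a
    second consecutive failure. `history` is ordered oldest to newest.
    """
    if not history:
        return False
    latest = history[-1]
    if latest.succeeded:
        return False
    if not latest.resolved:
        return True
    # Second consecutive failure: the previous run also failed.
    return len(history) >= 2 and not history[-2].succeeded
```

A single failure that the team resolves is only logged; the function returns True as soon as the same job fails twice in a row or a failure remains unresolved.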
4.1.11 Handling Loss of Drives
The Backup and Recovery Team (Data Access group) will notify the faculty in charge of
the practicum immediately after detecting that a drive is missing media.
The faculty, together with the Backup and Recovery Team lead, will take appropriate action.
4.1.12 Team Knowledge Database
The Data Backup team will create a knowledge base document so that known
problems can be resolved easily. The knowledge base will be updated
periodically. The Backup and Recovery “Team Knowledge Base” is to be used
to:
Track progress on large backup issues
Provide a central location for viewing progress
Allow updates by multiple parties
Provide technical reference for similar issues.
The Backup and Recovery Team Lead will create and monitor the entry when the
above needs cannot be met using standard failure notification.
4.1.13 File Restoration and Recovery of Corrupt or Deleted Files
The Backup and Recovery Team will carry out file restoration. The team will
liaise with the faculty to determine which files to restore and will
make sure the appropriate media has been received and mounted.
The Backup and Recovery Team will coordinate and perform the restore.
See table 4.5 (File Restoration and Recovery of Corrupt or Deleted Files)
4.1.14 Label Media
The Backup and Recovery Team will accurately label all backup media according to
the media rotation and retention defined by the faculty. Media requirements
should be communicated to the Backup and Recovery Team by the faculty. Refer to
“Sample Media Labeling” (table 4.6) in the appendix of this document.
NetBackup uses barcode media labels in the robotic devices. These can be any
6-character alphanumeric combinations. Labels longer than six characters
are truncated from the left. Typically, cleaning tapes are designated with
a CLN### label. Bar codes should not be transferred from a tape with usable
data; otherwise the NetBackup database will become corrupted and data recovery
will be jeopardized.
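The labeling rules in section 4.1.14 can be checked before a tape is loaded. The following Python sketch is illustrative only and does not call any NetBackup interface; the function names are hypothetical, and it assumes that "truncated from the left" means the leading characters are dropped and the last six are kept.

```python
import re

LABEL_LENGTH = 6  # barcode label limit stated in section 4.1.14
CLEANING_PATTERN = re.compile(r"^CLN\d{3}$")  # the CLN### convention


def effective_label(barcode: str) -> str:
    """Return the label as it would be read after the 6-character limit.

    Assumption: truncation from the left drops the leading characters
    and keeps the last six.
    """
    cleaned = barcode.strip()
    if not cleaned.isalnum():
        raise ValueError(f"label must be alphanumeric: {barcode!r}")
    return cleaned[-LABEL_LENGTH:]


def is_cleaning_tape(barcode: str) -> bool:
    """True when the effective label follows the CLN### cleaning convention."""
    return bool(CLEANING_PATTERN.match(effective_label(barcode)))
```

Under this assumption, two long labels that share their last six characters would be read as the same tape, which is one reason to keep every label at exactly six characters.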
4.1.15 Maintain Cleaning Schedule
Running a cleaning tape proactively at least once every 30-45 days, or as
needed, is recommended unless the tape device manufacturer specifies otherwise. The
Backup and Recovery Team will maintain a cleaning schedule similar to the
“Sample Cleaning Tracking Report” found in the appendix of this document.
4.1.16 Change Media and Acquire Off-Site Media
Onsite support will mount the correct media in accordance with the backup
retention and rotation prior to the start time of the nightly backup.
When the backup has exceeded the capacity of the media or when the current
media has become damaged, the Backup and Recovery Team will follow up with
the faculty.
4.1.17 Server Access and Administration
The faculty on site will give the Backup and Recovery Team appropriate access
to a server whenever the need arises.
4.1.18 Sample Cleaning Tracking Report
The name of the tape device is usually associated with the server to which it is
connected.
TAPE DEVICE JANUARY FEBRUARY MARCH APRIL
DTCBACK01XXX Monday 1st Monday 5th Monday 5th
After approximately 12-15 uses the cleaning tape should be replaced.
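The cleaning guidance in sections 4.1.15 and 4.1.18 (clean every 30-45 days, replace the cleaning tape after approximately 12-15 uses) can be tracked with a small helper. This is a sketch under stated assumptions: the function and constant names are hypothetical, and the replacement threshold uses the conservative end of the 12-15 range.

```python
from datetime import date, timedelta

CLEAN_INTERVAL_MIN_DAYS = 30  # lower bound from section 4.1.15
CLEAN_INTERVAL_MAX_DAYS = 45  # upper bound from section 4.1.15
CLEANING_TAPE_MAX_USES = 12   # assumption: conservative end of the 12-15 range


def next_cleaning_window(last_cleaned: date) -> tuple:
    """Return (earliest, latest) dates for the next proactive cleaning."""
    return (
        last_cleaned + timedelta(days=CLEAN_INTERVAL_MIN_DAYS),
        last_cleaned + timedelta(days=CLEAN_INTERVAL_MAX_DAYS),
    )


def cleaning_tape_needs_replacement(uses_so_far: int) -> bool:
    """True once the cleaning tape has reached the conservative use limit."""
    return uses_so_far >= CLEANING_TAPE_MAX_USES
```

For a drive last cleaned on 3 January 2005, the helper places the next cleaning between 2 February and 17 February 2005.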
Figure: Data Backup Process Flowchart (Data Backup and Restore Procedure). Flowchart steps: problem detected; check for problem; open failure report; failure notice received; contact faculty support; troubleshoot with other Data Access group members; problem resolved (Yes/No); update the daily problem log / knowledge base; verify that jobs are now ready for backup or restore; follow escalation; return.
4.2 Disaster Recovery
4.2.1 Perform Risk Assessment and Audit
To have an effective Disaster Recovery plan for the ARNe network, this project
considers the following threats as potential disasters that can
affect the network:
a) Accidental: loss of power
b) Natural: floods, earthquakes, hurricanes, tornadoes
c) Internal: sabotage, theft, etc.
Inventory of Assets on the ARNe Network and Severity
CS | Severity Level | ILB | Severity Level | DTC | Severity Level
Primary W2K3 DC Server | Mission Critical | Broomfield Ghost Server = ILOFS03 | Important | Firewall | Mission Critical
Secondary W2K3 DC/Ghost Server | Important | Broomfield Server = ILOFS04 = 192.168.X.X | Mission Critical | Citrix 03 | Mission Critical
W2K3 Citrix Server | Mission Critical | SaintMary (Citrix Server = 192.168.X.X) | Mission Critical | VMware Servers | Mission Critical
Solaris 10 x86 Server | Important | SQL server (NLP-XXXXXXXX) = Win2000 | Important | Sun Solaris | Important
NETGEAR fast Ethernet switch (fs116) 16 port | Mission Critical | Saintluke (DC/AD) | Mission Critical
NETGEAR fast Ethernet switch (fs108) | Mission Critical | NetIQ | Mission Critical
Cisco router 2500 | Mission Critical | DTCBACKUP01 (Backup Server) | Mission Critical
NLP T1 Red Switch | Mission Critical | Acadunix.regis.xxxx | Mission Critical
SEVERITY BASED ON IMPACT IN CASE OF DISRUPTIVE EVENT
Mission Critical: This will cause extreme disruption to the network, and practicum
students won't be able to function.
Important: This will cause a moderate disruption to the network.
4.2.2 Disaster recovery plan for the ARNe Network
The plan will entail the following:
1) Purpose and scope: The recovery scope will cover all assets on the
ARNe network in all three locations (DTC, ILB and CS). All detailed
documentation (software and hardware), physical safeguards, insurance
considerations, and contingencies are considered. Computer services and
telecommunication links to all three sites are also taken into consideration to
avoid undue disruption.
2) Creating, maintaining and protecting backups: All data backups made at the
DTC location through the DTCBACK01 server will be kept off-site. Up-to-date
backups of all applications and data will be maintained. This step ensures
that data can be recovered in case of any loss and helps maintain data
integrity in case of any disparity. All tape backups will be protected against
strong magnetic fields, which can destroy the tapes.
3) Disaster Recovery Team (DRT): ARNe Management will set up a recovery
team comprising the various practicum groups (Data Access group, Development
group, Operation group and System Network group) on the ARNe network for
effective response to any incident. This team will be organized into various areas
of responsibility, such as collecting and analyzing evidence, containing and
preventing further intrusions, and updating the recovery plan. The recovery
team will have an established line of communication (email, phones, etc.).
5) Disaster Recovery Procedure: In case of any disaster on the ARNe
network, the disaster recovery team needs to be assembled. The team will
decide which of the alternate sites (CS, ILB or DTC) to utilize
for business continuity purposes, depending on which sites are affected by the
disaster. Personnel (for example, practicum students) who are onsite
at the time of the incident must be evacuated immediately. On completion of the
Initial Disaster Recovery Phase the
DRT leader(s) should prepare a report on the activities undertaken. The report
should contain information on the emergency, who was notified and when, and the actions
taken by members of the DRT together with the outcomes arising from those actions.
The report will also contain an assessment of the impact on normal business
operations. The report should be given to the Practicum faculty management
Business Recovery Team (beyond this project), with a copy to any Management
team, as appropriate.
6) Recovery procedure: After the initial response has been put in place and
operations have shifted to an unaffected site, the recovery team will start using
the backup data that has been kept off-site. The procedures for business
continuity (beyond the scope of this project) and restoration of data will be
observed. How to fully recover from the disaster and promptly return to
normal business operation also needs to be fully addressed.
7) Preparing for a disaster and testing the plan: The procedures and safeguard
measures in place will be reviewed constantly to reduce the risk of a disaster
and evaluate the level of impact. This includes general procedures and software
safeguards. Also, all systems (desktops and servers), network devices,
communication links, and office facilities will be tested on a periodic basis to
ascertain readiness for recovery from a disaster. A realistic test of the
components of a business continuity plan should be conducted and analyzed so
that modifications can be made as necessary.
event of serious injury or even death of an employee, it would be beneficial if the
person notifying had access to counseling service contact numbers in order to be
able to offer this type of support and advice.
8) Assessing Potential Business Impact of the Emergency
Assessments need to be made at various stages during the recovery process as
to the potential scale of the emergency from a business perspective. During the
Disaster Recovery Process, these will include a preliminary damage assessment.
The initial assessments will normally be carried out by the Disaster Recovery
Team who may call on other specialists to help them with this process as
appropriate.
The assessments will be based on the particular circumstances applying and the
following five point scale may be considered appropriate.
9) Maintaining Event Log during Disaster Recovery Phase
It is important that all key events during the disaster recovery phase are
recorded. An event log should be maintained by the leader of the Disaster
Recovery Team. This Event Log should be started at the commencement of the
emergency and a copy of the log passed on to the Business Recovery Team
once the initial dangers have been controlled.
The format should include the date, time, title of the event, brief description of the
event and outcomes. It should also include follow up action needed, as
appropriate.
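The event log format described above (date, time, title, brief description, outcomes, follow-up action) maps naturally onto a small record type. The Python sketch below is one possible layout, not a prescribed tool; the class names, field names, and the pipe-separated export format are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class EventLogEntry:
    """One entry with the fields listed for the disaster recovery event log."""
    timestamp: datetime  # date and time of the event
    title: str
    description: str
    outcome: str
    follow_up: str = ""  # follow-up action needed, if any

    def as_line(self) -> str:
        """Render the entry as a single pipe-separated text line."""
        parts = [
            self.timestamp.strftime("%Y-%m-%d"),
            self.timestamp.strftime("%H:%M"),
            self.title,
            self.description,
            self.outcome,
            self.follow_up or "none",
        ]
        return " | ".join(parts)


@dataclass
class EventLog:
    """Append-only log kept by the Disaster Recovery Team leader."""
    entries: List[EventLogEntry] = field(default_factory=list)

    def record(self, entry: EventLogEntry) -> None:
        self.entries.append(entry)

    def export(self) -> str:
        """Return the full log in chronological order, one entry per line."""
        ordered = sorted(self.entries, key=lambda e: e.timestamp)
        return "\n".join(e.as_line() for e in ordered)
```

Exporting in chronological order keeps the copy handed to the Business Recovery Team readable even when entries were recorded out of sequence during the emergency.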
Chapter 5
5.0 Lessons Learned and Next Evolution of the Project
No disaster recovery plan is a static document, including this one. This document
represents the starting point for the ongoing maintenance necessary to keep any
such plan current and to assist those who will be responsible for the maintenance
and safeguarding of the ARNe network. Each student exiting the NLP is
responsible for ensuring that any relevant information pertaining to Data Backup
and Disaster Recovery is well documented as an ongoing process.
5.1 Conclusion
Data backup and disaster recovery are critical for any organization.
Business continuity planning and disaster recovery planning are now generally
acknowledged as vital elements of an organization's business activity plans.
In conclusion, a sound Data Backup and Disaster Recovery plan is essential to
protect the well-being of an organization.
Practicum Support Documentation
List of Tables
Table 4.1 Data Backup and Recovery Support Matrix
Action | Data Access Group | Approval (Faculty)
Provide Software Documentation
Provide Access to Toolset
Install Software and Patches (Staffed Sites)
Install Software and Patches
Request Licenses
Order Licenses
Create Change Control for Software Installation
Configure Backup Jobs
Verify Backup Configuration
Coordinate Database Backups
Save and Document Backup Configuration
Check Failure Report
Failure Notification
Update Failure Logs
Check Failure on Web Reports
Hardware Failure
Create Issue in Tracking Database
Update and Monitor Issue in Tracking Database
Recover Corrupt or Deleted Files
Mount Correct Media
Attain Tapes from Off-Site Storage Location
Complete Test Restores
Label Media
Maintain Cleaning Schedule
Change Media for Backups
Communicate Additional Media Needs
Provide Appropriate Server Access
Table 4.2 Backup and Recovery Support Responsibilities
Action | Implementation Group (Data Access group) | Approval
Provide documentation
Provide access to the toolset
Install software and patches
Install software and patches
Request appropriate licenses
Acquire and provide appropriate licenses
Create Change Control
Table 4.3 Data Backup Configuration and Management
Action | Implementation (Data Access group) | Approval (Faculty)
Configure backup jobs
Verify backup configuration
Coordinate specific backup for databases
Save and document backup
Table 4.4 Daily Monitoring and Failure Notification
Action | Practicum Student (Data Access group) | Faculty
Check failure reports
Immediate failure notification
2 or more failure notification
Update the failure logs
Check reports on web site
Hardware failure
Table 4.5 File Restoration and Recovery of Corrupt or Deleted Files
Action | Implementation (Data Access group) | Faculty
Recover corrupt or deleted files
Mount correct media
Attain tapes from off-site storage location
Monitor and report
Complete test restores
Table 4.6 Media Labels
Daily Media
“SERVERNAME” Monday
“SERVERNAME” Tuesday
“SERVERNAME” Wednesday
“SERVERNAME” Thursday
Weekly Media
“SERVERNAME” Week 1
“SERVERNAME” Week 2
“SERVERNAME” Week 3
“SERVERNAME” Week 4
Monthly Media
“SERVERNAME” January
“SERVERNAME” February
“SERVERNAME” March
“SERVERNAME” April …..
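The label scheme in Table 4.6 can be generated from a date once a rotation is chosen. The table lists the labels but does not state the rotation, so the Python sketch below assumes one: Monday to Thursday use daily tapes, Friday uses a weekly tape, the last Friday of the month uses the monthly tape, and no tape is changed at weekends. The function name `media_label` and this rotation are hypothetical and should be replaced by the rotation the faculty defines.

```python
import calendar
from datetime import date
from typing import Optional

DAILY = {0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday"}


def media_label(server: str, day: date) -> Optional[str]:
    """Pick the Table 4.6 label for the tape used on `day`.

    Assumed rotation (not stated in this document): Monday-Thursday use
    daily tapes, Friday uses a weekly tape, the last Friday of the month
    uses the monthly tape, and no tape is changed at weekends.
    """
    weekday = day.weekday()  # Monday is 0
    if weekday in DAILY:
        return f"{server} {DAILY[weekday]}"
    if weekday != 4:
        return None  # Saturday or Sunday
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    if day.day + 7 > days_in_month:
        # No later Friday exists in this month: use the monthly tape.
        return f"{server} {calendar.month_name[day.month]}"
    return f"{server} Week {(day.day - 1) // 7 + 1}"
```

With this rotation the weekly labels never go beyond Week 4, which matches the four weekly tapes listed in the table.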
List of Figures
Figure 2.1 MOF (Microsoft Operations Framework) Quadrant
Figure 2.2 SMF (Service Management Function) of each MOF Quadrant
Figure 2.3 SOA (Service Oriented Architecture)
Figure 3.0 SDLC (System Development Life Cycle). Diagram labels: Planning; Analysis; Design; Support/Maintenance; Update and review documentation; Suspend project; Technology vendors/service provider; Implementation.
Bibliography
1) Mayo, Sophie. "Service Oriented Architecture: The Services
Opportunity." An IDC Report Series. 21 Jul. 2005
<http://www.idc.com/getdoc.jsp?containerId=IDC_P262>.
This IDC Report series discusses how the emergence of service-oriented
architectures (SOA) and Web services promises to be a
crucial enabler in the dynamic and on-demand IT and business
computing journey. The report series, Service Oriented
Architecture: The Services Opportunity, examines the most pertinent
topics in this area to help services providers enhance their services
portfolio and guide their strategic direction in this rapidly evolving
area.
2) "MOF Process Model for Operations." Microsoft Operations
Framework (MOF). 17 2005. Microsoft Inc. 21 Jul. 2005
<http://www.microsoft.com/technet/itsolutions/cits/mo/mof/mofpm.mspx#ECAA>.
This paper describes the Microsoft Operations Framework
(MOF) Process Model, one of the two core MOF models.
(The other is the MOF Team Model.) The MOF Process
Model describes Microsoft's approach to the IT operations
and service management life cycle. The Process Model
organizes the life cycle into quadrants, with each quadrant
having a specific focus and set of tasks that are carried out
through its corresponding set of service management
functions (SMFs).
3) "Storage Area Network: An Approach to Data Backup and
Recovery." Storage Environment. Brocade Communication
Systems Inc. 21 Jul. 2005 <http://ftp.us.dell.com/app/ps-broca.pdf>.
This article highlights some of the advantages of
implementing SANs. As enterprise data becomes an
increasingly essential business asset, ensuring its stability
and protection is more critical than ever.
Many organizations have faced the challenge of having to
back up growing data within shrinking backup windows. The
backup server receives data from other servers across a
LAN or wide area network (WAN), then stores that data on
centrally owned disk and tape resources. SANs improve
storage resource management through centralization, even
within distributed information technology (IT) architectures.
References
1 Ciampa, Mark. Security+ Guide to Network Security. 2nd ed.
Boston: Thomson Course Technology, 2005.
2 Holden, Greg. Guide to Network Defense and Countermeasures. 2nd
ed. Boston: Thomson Course Technology, 2003.
3 Hoskins, Micheal. "Developing SOA Solutions To Accommodate Variety
and Change." Pervasive Software. 21 Jul. 2005
<http://www.pervasive.com/documentation/whitepapers/pdf/wp_soa_solutions.pdf>.
4 Johnson, Judith J. "Disaster Recovery Planning With a Focus On
Data Backup/Recovery." 26 2001. SANS Institute. 18 Jul. 2005
<http://www.giac.org/certified_professionals/practicals/gsec/0424.php>.
5 Web Services and Service-Oriented Architectures. Barry &
Associates, Inc. 19 Jul. 2005 <http://www.service-architecture.com/>.
6 "Developing an effective data backup/recovery procedure." The
do's and don'ts of backup. 10 Jul. 2005
<http://www.willowstarcom.co.uk/index.php/d1_data_backup_procedure.pdf>.
7 "Data Center Contingency Management / Disaster Recovery Plan." 25
1998. Disaster Recovery Plan. 12 Jul. 2005
<http://helpnet.ut.cc.va.us/NOC/Mianframe.htm>.
8 "Contingency Planning & Business Continuity Plan Development:
Disaster Recovery Plans." 2005. Contingency Planning
Technologies. 12 Jul. 2005 <http://www.business-continuity-world.com/>.
9 "Disaster Recovery: Best Practices White Paper." Cisco Systems,
Inc. 17 Jul. 2005 <http://www.cisco.com/warp/public/63/disrec.pdf>.
10 Disaster Recovery Journal. 19 Jul. 2005
<http://www.drj.com/glossary/glossary.htm>.
Definition of Terms
Definition of a Disaster:
“An event that creates an inability on an organization’s part to provide critical
business functions for some predetermined period of time” [10]
Definition of a Disaster Recovery plan:
“The Document that defines the resources, actions, tasks and data required to
manage the business recovery process in the event of a business interruption.
The plan is designed to assist in restoring the business process within the stated
disaster” [10]
Definition of Disaster Recovery:
“The ability to respond to an interruption in services by implementing a disaster
recovery plan to restore an organization’s critical business functions.” [10]
Definition of Data Backups:
“The backup of system, application, program and/or production files to media that
can be stored both on and/or offsite. Data backups can be used to restore
corrupted or lost data or to recover entire systems and databases in the event of
a disaster. Data backups should be considered confidential and should be kept
secure from physical damage and theft” [10]
Definition of Backups (Data):
“A process to copy electronic or paper based data in some form to be available if
the original data is lost, destroyed or corrupted”. [10]
Definition of Data Recovery:
“The restoration of computer files from backup media to restore programs and
production data to the state that existed at the time of the last safe backup”. [10]
Definition of Business Continuity:
“The ability of an organization to ensure continuity of service and support for its
customers and to maintain its viability before, after and during an event”.[10]
Definition of Risk Assessment / Analysis:
“Process of identifying the risks to an organization, assessing the critical
functions necessary for an organization to continue business operations, defining
the controls in place to reduce organization exposure and evaluating the cost for
such controls. Risk analysis often involves an evaluation of the probabilities of a
particular event”. [10]