
SCHEDULE A

to the master services agreement between MT Services Limited Partnership and Paper Interactive, Inc.

> Disaster Recovery Plan

This document assists Athennian in preparing for and recovering from a disaster.


Creativity. Ambition. Quirky. Community.

Table of Contents

Record of Changes
Review Cycle
Overview
Purpose
Scope
Policy
    Contingency Plans
        Preparation
        Disaster Criteria
        Disaster Declaration
        Disaster Recovery
        Disaster Recovery Termination
        Plan Maintenance
Appendix
    Disaster Recovery Plan
        Role Assignments
        Contact Information (Call Tree)


Record of Changes

Ver. # | Implemented by | Revision Date | Approved by | Approval Date | Reason
0.1 | Sean Gowing | 2018-08-14 | Shane Fast | 2018-08-14 | Initial document creation.
0.2 | Adrian Camara | 2018-10-28 | | 2018-10-28 | Updating with operational information.

Review Cycle

This Disaster Recovery Plan should be reviewed annually. The review should verify the information within and remove any references or individuals that are no longer relevant to Athennian’s business. All changes to this plan must be reviewed and approved by Athennian’s CTO, Shane Fast.


Overview

Disasters are infrequent, yet having a contingency plan in place for when one occurs gives Athennian a competitive advantage. This policy requires management to financially support and diligently attend to disaster contingency planning efforts. Disasters are not limited to adverse weather conditions; any event that could cause an extended delay of service should be considered.

Purpose

This policy defines the requirement for a baseline disaster recovery plan, to be developed and implemented by Athennian, that describes the process for recovering IT systems, applications and data from any type of disaster that causes a major outage.

Scope

This policy is directed to the Athennian management staff, who are accountable for ensuring the plan is developed, tested and kept up to date. This policy does not provide requirements for what goes into the plan or its subplans.

Policy

Contingency Plans

The following contingency plans address the multitude of disaster recovery actions that can be taken to mitigate negative consequences to the business:

Preparation

An organization’s ability to accelerate the recovery process starts with pre-disaster preparation, which can greatly enhance both recovery and mitigation. Preparation may include various planning activities such as:

• Identifying criticality of business services.
• Setting backup and recovery policies.
• Defining key roles and responsibilities.

Criticality of Service List
This section lists all services provided, in order of importance.

Tier I
Tier I services are vital to the running of the organization. The Recovery Time Objective (RTO) for Tier I services is 24 hours. Athennian’s Tier I services are:

• Webserver
• Database server

Tier II
Tier II services are important to the organization but not vital. The RTO for Tier II services is 3 days. Athennian’s Tier II services are:

• Amazon S3 Storage

Tier III
Tier III services can be offline with no impact on business operations. The RTO for Tier III services is 2 weeks. Athennian’s Tier III services are:

• Patching server

Backup and Recovery Policy

Purpose
This policy is designed to protect data in the organization by ensuring it can be recovered in the event of an equipment failure, intentional destruction of data or a disaster scenario.

Scope
This policy applies to all data and data repositories owned and operated by Athennian.

Backup Identification
The IT Manager is responsible for identifying all systems, data and data repositories required for business operations. These can include, but are not limited to, operating systems, applications, source code, policies, procedures and contact information for vendors or business partners. Exceptions are at the discretion of the IT Manager and must be recorded.

Backup Schedule
This section outlines the backup schedule for data according to its service tier or criticality.

Tier I
With a Recovery Point Objective (RPO) of 24 hours, backups must be performed at least daily, ideally overnight after the business day has ended. Backups can be performed more frequently during changes to support back-out procedures.

Tier II
With an RPO of 1 week, backups must be performed at least weekly, ideally at the end of the week to capture a full week of business. Backups can be performed more frequently during changes to support back-out procedures.

Tier III
Backups are not required, but monthly backups are recommended, ideally at the end of the month to capture a full month of business. Backups can be performed more frequently during changes to support back-out procedures.
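The schedule above amounts to a simple freshness rule for each tier. As an illustration only, here is a minimal Python sketch of a check the delegated staff member might run. The function name `backup_within_rpo` and the tier labels are hypothetical and are not part of any existing Athennian tooling. The sketch assumes backup timestamps are available as `datetime` values.

```python
from datetime import datetime, timedelta
from typing import Optional

# Recovery Point Objectives taken from the Backup Schedule.
# Tier III backups are recommended but not required, so it has no RPO (None).
RPO_BY_TIER: dict[str, Optional[timedelta]] = {
    "I": timedelta(hours=24),
    "II": timedelta(weeks=1),
    "III": None,
}


def backup_within_rpo(tier: str, last_backup: datetime, now: datetime) -> bool:
    """Return True if the newest backup is recent enough to meet the tier's RPO.

    Hypothetical helper for illustration only.
    """
    rpo = RPO_BY_TIER[tier]
    if rpo is None:
        return True  # Tier III: no backup is required
    return now - last_backup <= rpo


now = datetime(2018, 10, 28, 9, 0)
print(backup_within_rpo("I", datetime(2018, 10, 27, 23, 0), now))   # True: 10 hours old
print(backup_within_rpo("II", datetime(2018, 10, 19, 18, 0), now))  # False: over 1 week old
```

Tier III always passes by design, because this plan recommends monthly backups but does not require them.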

Backup Storage
Backups will be stored in the cloud as replicas.

Responsibility
The IT Manager shall delegate a staff member to ensure that backups are being performed and to test the ability to restore data on a quarterly basis.

Testing
The ability to restore data from backups shall be tested on a quarterly basis. This is critical, as a backup that cannot be restored has no value and only adds cost to the organization.

Key Roles and Responsibilities
This section defines the roles and responsibilities required to respond effectively in a crisis. See Appendix: Role Assignments for a detailed list of the individuals responsible for the roles described below.

Crisis Manager
Responsible for declaring a disaster and managing the recovery of business operations. Has been granted full authority to make all decisions relating to recovery efforts.

Reviewing Senior Executive
Responsible for providing oversight to protect the business interests of Athennian.


Recovery Management Team
Responsible for implementing their respective portions of the recovery plan for their functional areas. Granted authority to do so by the Crisis Manager.

IT Manager
Responsible for assisting the recovery operations and maintaining the IT operations unaffected by the crisis.

Media Relations Manager
Responsible for engaging and maintaining communication channels with media organizations during recovery operations.

Vendor Manager
Responsible for engaging and maintaining communication channels with vendors during recovery operations and for working to establish alternate arrangements to resume business operations.

Customer Service Manager
Responsible for engaging and maintaining communication channels with customers during recovery operations and for working to establish alternate arrangements to resume business operations.

Suppliers and Contractors
Suppliers and contractors who have agreed to provide resources and/or services during a crisis. Responsible for providing the agreed-upon resources and/or services in a timely fashion to restore business operations.

Critical Operations Support Staff
Key staff considered critical for the continuation of business operations after a disaster. Responsible for providing assistance as required to preserve business operations during recovery operations.

Disaster Criteria

The criteria for determining whether a disaster is predicted or has occurred should be based on the following factors:

Type
• Natural
  o Blizzard
  o Earthquake
  o Fire
  o Flood
  o Hurricane
  o Tornado
  o Tsunami
  o Any other serious adverse weather conditions may apply.
• Manmade
  o Bioterrorism
  o Civil unrest
  o Fire
  o Hazardous material spills
  o Nuclear and radiation accidents
  o Power failure

Impact
1. Minimal impact, situation contained, e.g. a single non-critical application failure. Minimal involvement required, timing is flexible; recovery can’t come at the expense of production business operations.

2. Minimal impact, ongoing situation, e.g. multiple non-critical application failures. Increased involvement required, timing still flexible; recovery can’t come at the expense of production business operations.

3. Major impact, situation contained, e.g. a single critical application failure. Medium involvement required, timing becomes urgent; recovery may come at the expense of production business operations.

4. Major impact, ongoing situation, e.g. multiple critical application failures. Large involvement required, timing becomes very urgent; recovery will most likely come at the expense of production business operations.

5. Major impact, ongoing and expanding situation, e.g. a datacenter failure. All hands on deck, including relevant vendors and contractors. Urgency is paramount. All other operations must immediately be put on hold until the disaster is resolved.

Duration
While the type and impact of a disaster can be easy to determine, its duration is likely to be an estimate. The estimate should be based on the worst-case scenario. An estimated duration longer than the RTO for the affected service should initiate the appropriate recovery plan.

Disaster Declaration

A formal declaration by the Crisis Manager that a disaster or severe outage is predicted or has occurred. The decision to activate the Disaster Recovery Plan is based on an estimate of whether the disruption of business operations caused by the incident will last longer than the RTO for the affected service.
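The declaration threshold can be illustrated with a short Python sketch. It compares a worst-case outage estimate against the RTOs from the Criticality of Service List. The tier labels and the function `should_declare_disaster` are hypothetical. The sketch encodes only the comparison; the decision to declare remains with the Crisis Manager.

```python
from datetime import timedelta

# Recovery Time Objectives taken from the Criticality of Service List.
RTO_BY_TIER = {
    "I": timedelta(hours=24),   # Webserver, Database server
    "II": timedelta(days=3),    # Amazon S3 Storage
    "III": timedelta(weeks=2),  # Patching server
}


def should_declare_disaster(tier: str, worst_case_outage: timedelta) -> bool:
    """Return True when the worst-case outage estimate exceeds the tier's RTO.

    Hypothetical helper: it encodes only the threshold comparison; the
    declaration itself is made by the Crisis Manager.
    """
    return worst_case_outage > RTO_BY_TIER[tier]


# A database (Tier I) outage estimated at up to 30 hours exceeds its 24-hour RTO.
print(should_declare_disaster("I", timedelta(hours=30)))   # True
# The same estimate for Amazon S3 Storage (Tier II) is within its 3-day RTO.
print(should_declare_disaster("II", timedelta(hours=30)))  # False
```

The comparison is strict ("larger than the RTO"). An estimate exactly equal to the RTO does not trigger a declaration.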

Disaster Notification
Initial disaster notification is the process of notifying relevant parties of a crisis currently impacting the organization. While this notification will likely be sent by email, email may itself be one of the systems impacted or unavailable, so alternate communication channels may be required. The initial notification should contain the following information:

DISASTER NOTIFICATION

The parties listed below are responding to a disaster scenario:

Date/time:
Crisis Manager POC:
Data/Service Impacted:
Technical impact:
Business impact:
Next Steps:
Obstacles (if any):
Next update and Means:

Additional Notifications
Additional disaster notifications may also be required for the media, customers and vendors. The Media Relations Manager, Customer Service Manager and Vendor Manager can send notifications to relevant parties with details pertinent to the disaster situation.

Disaster Recovery

Initiate the recovery plan for the disrupted service. Recovery plans are listed in order of their priority level.

Tier I
• Migrate the service to a warm server in a secondary cloud datacenter.
• Launch a new server/container from a templated build and restore data to recover the service.
• Launch a new server/container in a new cloud datacenter, restoring data if possible.

Tier II
• Launch a new server/container and restore data to recover the service.
• Launch a new server/container in a new cloud datacenter, restoring data if possible.

Tier III
• Build a new server and restore data as required, if possible.

Disaster Recovery Termination

Before termination of disaster recovery can occur, it is imperative that all due diligence activities have been completed. The termination phase is a crucial time to ensure that every required action has been performed and that all applicable standards have been met in resolving the crisis.

Criteria for Terminating an Incident
The criteria for terminating an incident are as follows:

• Has the source of the disaster been identified and contained, and has service been restored?
• Have we confirmed that other Athennian systems and services were not affected?
• Have all the required notifications occurred?
• Are there any significant activities outstanding that require the immediate attention of the Disaster Recovery teams to resolve?
• Have the appropriate changes been made, wherever possible, to prevent future occurrences of the crisis from affecting Athennian systems?
• Has a discussion with members of the Disaster Recovery teams occurred, and was there consensus that the crisis has been resolved?

Disaster Recovery Termination can be performed through a formal declaration by the Crisis Manager that a disaster or severe outage has been resolved and all business-critical services have been restored.

Lessons Learned
All individuals involved in the Disaster Recovery activities should review their actions, responsibilities and notes to identify documentation that was incomplete or that may prove beneficial in future incidents. All lessons learned should be forwarded to the Crisis Manager, who will consider incorporating them into the Disaster Recovery Plan.

Plan Maintenance

Overview
This Disaster Recovery Plan only has value if the information it contains is current and relevant. Because the goal of the Disaster Recovery Plan is to provide guidance for immediate execution in response to a disaster, every individual called upon to act within the plan is also responsible for keeping their own information current and relevant.

Regular Updates

Verification Updates of Perishable Data
As changes are made to the environment, applications or organization, all individuals listed within the Disaster Recovery Plan document are responsible for keeping the components of this plan current and relevant.


Incorporation of Previous Lessons Learned
Lessons Learned feedback from the disaster recovery termination should be collected and distributed via email to the Disaster Recovery Teams for their approval and incorporation into the Disaster Recovery Plan. The Crisis Manager has final approval for implementing a Lessons Learned recommendation. Entry of the Lessons Learned items will be recorded as a change and approved by the Crisis Manager for inclusion in the Disaster Recovery Plan document.

Annual Testing of the Plan

Requirement
Athennian mandates that all disaster recovery plans be tested on an annual basis. Plan validation requires the participation of all anticipated parties to ensure that, if the plan were executed, it would contain current information, work within the current structure and staffing of Athennian, and be immediately executable as written.

Exercise Mechanics
The annual test should focus on the following objectives:

• Is the information in the plan current and relevant?
• Are the processes listed within the plan effective within the current Athennian environment?
• Are the proper individuals involved in the execution of the plan?
• Are the processes listed in the plan consistent with current industry standards and best practices?
• Have we exposed the plan to all relevant parties within Athennian and to outside partners/participants for their validation?

Lessons Learned
During the annual testing of the Disaster Recovery Plan, the summary of Lessons Learned from the previous year should be presented to assure all members that all Lessons Learned have been accounted for and implemented where applicable.

Records Retention
All records relating to the maintenance of the Disaster Recovery Plan should be retained by Athennian to demonstrate annual testing of the plan, integration of Lessons Learned from previous incidents and the overall commitment to being prepared for a disaster.


Appendix

Disaster Recovery Plan

Role Assignments

The following table lists the individuals responsible for the roles described within this Disaster Recovery Plan.

Role | Individual or Group Responsible
Crisis Manager | Shane Fast (Chief Technology Officer); Katie McLean (Chief Growth Officer)
Reviewing Senior Executive | Adrian Camara (Chief Executive Officer)
Recovery Management Team | Shane Fast
IT Manager | Nav Malhorta (IT & DevOps Manager)
Media Relations Manager | Katie McLean
Vendor/Contract Manager | Katie McLean
Customer Service Manager | Katie McLean; Mark Tinana (Senior Developer)
Suppliers and Contractors | Adrian Camara
Critical Operations Support Staff | Katie McLean


Contact Information (Call Tree)

A call tree is typically used to notify staff outside of business hours. A common arrangement is that one person will call a small group of staff members with a message, then those persons will phone other staff and pass on the message, until finally all relevant members of staff have received the message.

To ensure that a call tree is effective, it should be regularly tested: missing or changed phone numbers can severely degrade the performance of a call tree.

Name | Work Phone | Mobile Phone | Team | Work Email | Time Called | Contacted (Y/N)
Adrian Camara | 306 385 9024 | 306 385 9024 | Executive | [email protected] | |
Katie McLean | 403 617 3103 | 403 617 3103 | Executive | [email protected] | |
Nav Malhorta | 780 292 3481 | 780 292 3481 | IT & Development | [email protected] | |
Mark Tinana | 403 889 6858 | 403 889 6858 | IT & Development | [email protected] | |
Andrew Dravucz | 403 470 3092 | 403 470 3092 | IT & Development | [email protected] | |
Spencer Greff | 587 897 1214 | 587 897 1214 | IT & Development | [email protected] | |
Shane Fast | 403 862 3300 | 403 862 3300 | Executive | [email protected] | |
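The fan-out described above, and why a single stale number matters, can be illustrated with a small breadth-first simulation. This is a hypothetical Python sketch. The plan lists contacts but does not define who calls whom, so the tree structure and the placeholder names A to F are assumptions for illustration.

```python
from collections import deque


def run_call_tree(tree: dict[str, list[str]], root: str,
                  answers: set[str]) -> tuple[list[str], set[str]]:
    """Simulate a call tree level by level (breadth-first).

    tree    -- who calls whom (hypothetical; this plan defines no calling order)
    root    -- the person who starts the call tree
    answers -- people whose listed number is current and who pick up

    Returns the people reached, in call order, and the people never reached,
    either because their own call failed or because the person assigned to
    call them was not reached.
    """
    everyone = {root} | {p for callees in tree.values() for p in callees}
    contacted: list[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        person = queue.popleft()
        if person != root and person not in answers:
            continue  # call failed: nobody further down this branch is called
        contacted.append(person)
        for callee in tree.get(person, []):
            if callee not in seen:  # guard against someone listed twice
                seen.add(callee)
                queue.append(callee)
    return contacted, everyone - set(contacted)


# Hypothetical tree: A calls B and C; B calls D and E; C calls F.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
# C's mobile number is out of date, so C never answers.
contacted, missed = run_call_tree(tree, "A", answers={"A", "B", "D", "E", "F"})
print(contacted)       # ['A', 'B', 'D', 'E']
print(sorted(missed))  # ['C', 'F']
```

Each person calls only their own branch, so one unreachable caller (C in this example) silently cuts off everyone below them. This is why the call tree should be tested regularly.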
