disaster recovery planningjschauma/615a/... · 2012. 4. 26. · a presentation by vincent lipoma...
Disaster Recovery Planning
A Presentation by Vincent Lipoma
Stevens Institute of Technology
CS 615 April 2012
Agenda
1. What is and When Should You Have a DRP?
2. How is a DRP Made?
3. Real Life Disaster Stories
4. DR Tools – To the Cloud!
A Disaster Recovery Plan is…
• A plan to follow in the event of an emergency
• A list of contact information
• Documentation – instructions on how to keep the systems running and how to bring them back.
• Normally a physical and well-distributed document
When should a DRP be made?
• It should always exist, even in an extremely small project or business.
• The DRP should be taken seriously when the business starts to grow, or when there are legal obligations to protect certain data (HIPAA)
• There should always be a DRP when the resources exist to make a backup.
How is a Disaster Recovery Plan Created?
Understand the Scope
• A DRP is a top-level document.
– Upper management needs a plan before you do
• The System Administrator’s DRP does not exist if the entire company has no DRP.
• A DRP is not just for terrorist attacks and hurricanes – it’s for any ‘big problem’ you could face.
• When management has a plan, you can go ahead with your DRP.
• Company Size and Budget affect your DRP
The DRP Document
• The document generally has this outline:
– Contact Information
– System Assessment
– Risk Assessment
• Vulnerabilities
• Probabilities
– Risk Mitigation
Contact Information
• More than a Big List of Important People
• A chain of command – Who’s in charge of what, and who’s their backup?
• “If the data center blew up, who do I call first? Who’s second?”
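A chain of command can be written down as data as well as prose. The sketch below is one possible layout, not a prescribed format: the `Contact` class, the `call_order` helper, and every name and phone number are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    role: str                            # what this person is in charge of
    name: str
    phone: str
    backup: Optional["Contact"] = None   # who to call if they cannot be reached


def call_order(chain: list[Contact]) -> list[str]:
    """Flatten the chain of command: each person, then their backup(s), in order."""
    order = []
    for contact in chain:
        person = contact
        while person is not None:
            order.append(f"{person.role}: {person.name} ({person.phone})")
            person = person.backup
    return order


# Hypothetical entries; names and numbers are placeholders.
head_of_it = Contact("Head of IT", "A. Example", "555-0100",
                     backup=Contact("Deputy Head of IT", "B. Example", "555-0101"))
drp_author = Contact("DRP Author", "C. Example", "555-0102")

for line in call_order([head_of_it, drp_author]):
    print(line)
```

Keeping the backup attached to the person it covers answers "who's second?" without a separate list to maintain.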
The Important People
• The CEO or Senior Management
– Your Boss’ boss
• Head of IT
• Chief of IT Security
– (If you have it)
• The DRP Author
• Whoever is needed to make the DRP work!
System Assessment
• Objective: “What constitutes my system?”
– Hardware, Software, Data
System Assessment
• Aside from components, there is data
• Biggest challenge is in understanding what data is important
• …then understanding what data is *most* important
• Finally, what else keeps the system running?
List of System Components
Item            | Serial # | Description    | Manufacturer    | Responsibility
1 TB Hard Drive | 012345   | 1 TB RAID 0 HD | Western Digital | Bob Williams
Load Balancer   | 678910   | Model 240      | Barracuda       | Shaun Jones
…               | …        | …              | …               | …
Acer Inspire    | 246810   | Netbook        | Acer            | Vincent Lipoma

Item            | Serial Key | Description                | Distributor | Responsibility
Apache 2.2      | None       | Webserver                  | Apache      | Bob Williams
MS SQL 2000     | 678910     | Model 240                  | Microsoft   | Shaun Jones
…               | …          | …                          | …           | …
Company website | None       | Version 2.0 of the website | Company     | Vincent Lipoma
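A component list like this is easier to keep current when it is also machine-readable. The following is a minimal sketch, assuming a flat record per component; the `Component` class and `by_owner` helper are illustrative names, and the rows are a subset of the sample entries in the tables above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    item: str
    serial: str
    description: str
    source: str        # manufacturer or distributor
    responsible: str


# A few rows from the sample tables.
INVENTORY = [
    Component("1 TB Hard Drive", "012345", "1 TB RAID 0 HD", "Western Digital", "Bob Williams"),
    Component("Load Balancer", "678910", "Model 240", "Barracuda", "Shaun Jones"),
    Component("Apache 2.2", "None", "Webserver", "Apache", "Bob Williams"),
    Component("Company website", "None", "Version 2.0 of the website", "Company", "Vincent Lipoma"),
]


def by_owner(components):
    """Group item names by the person responsible for them."""
    owners = defaultdict(list)
    for c in components:
        owners[c.responsible].append(c.item)
    return dict(owners)


for owner, items in by_owner(INVENTORY).items():
    print(f"{owner}: {', '.join(items)}")
```

Grouping by owner shows at a glance who has to be reachable when a given component fails.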
Risk Assessment
• The fun part – thinking of every possible thing that can break the system
• Risk assessment encompasses all risks – fire, natural disaster, malicious intent, accident, etc
Terminology
• Risk: A threat to an asset
• Likelihood: The probability of a risk affecting an asset
• Impact: The damage a risk may have
• Control: Preventative measure to minimize risk
Risk Assessment Chart
Threat               | Description                               | Actions                                             | Probability | Impact
Hackers              | Wants to break the system for a challenge | Social Engineering; System Intrusion                | 5           | High
Terrorist            | Wants money or destruction                | Bombing/Physical; System Attack; System Penetration | 3           | High
Disgruntled Employee | Wants revenge                             | System Integrity; Abuse                             | 6           | Medium
Fire                 | Partial or complete property destruction  | Natural Disaster; Accident                          | 8           | High
Zombies [Bot]        | Cyber Crime                               | Denial of Service                                   | 4           | Medium
Fraud                | Crime                                     | Theft of Money; Theft of Product                    | 6           | Low
Zombies [Real]       | Eat Brains                                | Physical Attack                                     | 1           | High
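One way to turn a chart like this into a priority list is to multiply probability by a numeric weight for impact. This is only a sketch: the chart gives Low/Medium/High with no numeric scale, so the 1/2/3 weights in `IMPACT_WEIGHT` are an assumption, and `ranked` is an illustrative helper. The probabilities and impacts are the ones in the chart.

```python
# Assumed weights: the chart gives Low/Medium/High but no numeric scale.
IMPACT_WEIGHT = {"Low": 1, "Medium": 2, "High": 3}

# (threat, probability, impact) taken from the chart.
RISKS = [
    ("Hackers", 5, "High"),
    ("Terrorist", 3, "High"),
    ("Disgruntled Employee", 6, "Medium"),
    ("Fire", 8, "High"),
    ("Zombies [Bot]", 4, "Medium"),
    ("Fraud", 6, "Low"),
    ("Zombies [Real]", 1, "High"),
]


def ranked(risks):
    """Return (threat, score) pairs, highest score first."""
    scored = [(threat, prob * IMPACT_WEIGHT[impact]) for threat, prob, impact in risks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for threat, score in ranked(RISKS):
    print(f"{score:>3}  {threat}")
```

With these weights Fire ranks first (8 × 3 = 24) and Zombies [Real] last (1 × 3 = 3); a different weighting would reorder the middle of the list.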
External Risks
• Risks can also come from external factors such as:
– Number of Data Centers
– Data Center Geography
– Vendor Risks
– ISP
Risk Mitigation
• How you handle a disaster
• Assumption – Ignore it
• Avoidance – Try not to let it happen
• Limitation – Confine the damage
• Planning – Establish a procedure to follow
• Research – Understand the threat
• Transference…
Risk Mitigation Cont’d
• Transference - “Failure is an option”
• Cost-Benefit analysis
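A cost-benefit analysis can be as simple as comparing what a control costs per year with the loss it is expected to prevent per year. The sketch below assumes that simple model; the function names and all figures are hypothetical.

```python
def expected_annual_loss(loss_per_incident: float, incidents_per_year: float) -> float:
    """Expected yearly loss: cost of one incident times how often it is expected."""
    return loss_per_incident * incidents_per_year


def worth_buying(loss_per_incident, incidents_before, incidents_after, control_cost_per_year):
    """True when the yearly loss avoided exceeds the yearly cost of the control."""
    avoided = (expected_annual_loss(loss_per_incident, incidents_before)
               - expected_annual_loss(loss_per_incident, incidents_after))
    return avoided > control_cost_per_year


# Hypothetical figures: a 50,000 outage expected once every 5 years (0.2/year),
# reduced to once every 20 years (0.05/year) by a control costing 4,000/year.
print(worth_buying(50_000, 0.2, 0.05, 4_000))
```

When the control costs more than the loss it avoids, accepting or transferring the risk may be the cheaper option.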
DRP Maintenance
• Ensure everyone has access to the DRP
• Keep physical copies
• Know where it is.
– “I can’t seem to find it at the moment…” (CWIE employee)
• As the company grows, maintain the DRP.
Sample DRP: Contact Info
Sample DRP: External Contacts
Sample DRP: Define Important Data
Sample DRP: Risk Assessment
DRP in Action
• Hurricane Irene – Veeam Technology
DR Stories: Veeam
• Hurricane Irene vs Veeam Main Office (and 4 smaller offices)
• Small offices contain VoIP services that route to the main office (data center)
• …and the data center was right next to a levee.
DR Stories: Veeam
• “I loaded up a few core servers into my Jeep that I didn’t have located at DR site … Having this hardware with me was one of the best decisions I made during the entire ordeal. The desktop guys got all of our disaster workstations loaded up in a van and prepped for the ride to DR site which thankfully is only 25 minutes away, on top of a mountain.”
DR Stories: Veeam
• “I learned very quickly where our weaknesses were in our disaster plan, it was an area that is often overlooked, communication”
• “Not a single transaction was lost and the company continued to function normally, a true testament to a successful disaster recovery operation.”
DR Stories: Melinda Martin
• Hurricane Ike – Melinda Martin of TFI Resources
• “They had a few laptops and a tower that were to be used in deployment but there was nothing on paper.”
DR Stories
• After some DR maintenance…
• “TFI had leased a small office in Austin to deploy to.”
• “We were able to connect to databases and files at the colo but we had no email. Our email replication solution had failed. Plan B was we did have a website TFIEmergency.com that we broadcast to so we posted updates for mass information…”
DR Stories
• It was also a learning experience:
– “We learned a lot. The first thing was to lease a bigger space. TFI now has two colo facilities”
• Prepare to make friends too:
– “…the team bonded and those of us who deployed for IKE have a special respect for each other.”
When the DRP fails…
“We had to go into New Orleans under armed guard to regain access to documents and email that had not yet been captured by the tape backup system prior to Katrina’s landfall.”
- Yehuda Cagen from Xvand Technology Corporation
When there is no DRP…
• 1996 Docklands Bombing
Tools
1. What is and When Should You Have a DRP?
2. How is a DRP Made?
3. Real Life Disaster Stories
4. Disaster Recovery – To the Cloud!
The Cloud
• Remember that every DR is unique – therefore the tools used will be unique
• The cloud is attractive to small businesses – it offsets the costs of Disaster Recovery Planning
– No need to buy a datacenter, backup servers, desktop… just purchase a service.
The Cloud’s Big Advantages
• Provides different recovery options:
– Send data to and retrieve from the cloud
– Go straight to and use cloud instances
• Very fast recovery
Cloud Shortcomings
• Where is my data?
– “According to continual updates from Japan’s Ministry of Internal Affairs and Communication and the Japanese office of news outlet ZDNet, about one dozen major data centers and cloud facilities had reported back with varying degrees of problems, though no loss of life.” – Kern
– (On a related note, all major data centers from Yahoo and Amazon were back online within a matter of hours after the Japanese Earthquake)
• Cloud Reliability
– Major EBS Outages of 2011
• Backing up from the cloud can require time (and bandwidth!)
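The time cost is easy to estimate before a disaster. Here is a rough sketch of the arithmetic, assuming decimal units and a fixed fraction of the nominal bandwidth; `restore_hours`, the 0.8 efficiency default, and the example sizes are assumptions for illustration.

```python
def restore_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move data_gb gigabytes over a link_mbps megabit-per-second link.

    efficiency is an assumed fraction of the nominal bandwidth actually achieved.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 1000 * 8          # decimal GB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical: 2 TB of backups over a 100 Mbps line.
print(f"{restore_hours(2000, 100):.1f} hours")
```

Under these assumptions, 2 TB over a 100 Mbps line takes more than two days, which is worth knowing before choosing a recovery option.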
Verdict
• Just like any tool, know how to use it
• It’s not a magic pill, but it does provide flexibility and possible cost savings
• It’s only useful if you know how to use it
3 Kinds of Backup Sites
• A backup site will boil down to the cloud, a data center, or a private server. All three, however, can be in one of the following states:
• Cold – built but not up to date
• Warm – up to date but idle
• Hot – up to date files and already serving traffic normally
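The three states reduce to two yes/no questions: is the site up to date, and is it serving traffic? A minimal sketch of that mapping follows; `SiteState` and `classify` are illustrative names, and treating any out-of-date site as cold is an assumption.

```python
from enum import Enum


class SiteState(Enum):
    COLD = "built but not up to date"
    WARM = "up to date but idle"
    HOT = "up to date and already serving traffic"


def classify(up_to_date: bool, serving_traffic: bool) -> SiteState:
    """Map the two properties from the slide onto cold, warm, or hot."""
    if not up_to_date:
        # Assumption: a stale site counts as cold even if it is reachable.
        return SiteState.COLD
    return SiteState.HOT if serving_traffic else SiteState.WARM


print(classify(up_to_date=True, serving_traffic=False).name)
```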
Backing Up and Synching Files
• Dropbox
• VMware
• Subversion
• Sometimes tools have built-in backups
– Windows System Restore
– SQL Dumps
• Snapshots
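As an illustration of a built-in SQL dump, the sketch below uses Python's standard `sqlite3` module as a stand-in; it is not the dump tooling of any particular database server. The `orders` table, its rows, and the helper names are hypothetical.

```python
import sqlite3


def dump_sql(conn: sqlite3.Connection) -> str:
    """Return the whole database as a text script of SQL statements."""
    return "\n".join(conn.iterdump())


def restore_sql(script: str) -> sqlite3.Connection:
    """Rebuild a fresh in-memory database from a dump script."""
    restored = sqlite3.connect(":memory:")
    restored.executescript(script)
    return restored


# Hypothetical table and rows, for illustration only.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
live.executemany("INSERT INTO orders (item) VALUES (?)", [("widget",), ("gadget",)])
live.commit()

backup_script = dump_sql(live)
copy = restore_sql(backup_script)
print(copy.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

Restoring the dump into a fresh database, as the last lines do, is the part most often skipped: a backup that has never been restored is untested.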
Backing Up Files
• The point is that tools exist for organizations of every size – look for them! Options are not limited.
Summary
• Have a Plan
– Trouble making one? Look up National Institute of Standards and Technology (NIST) document SP 800-30
• Make Everyone Aware of the Plan
• Practice Your Plan
• Know Your Tools
Links and Sources
http://www.linuxtopia.org/online_books/redhat_linux_sysadmin_intro/s1-disaster-recovery.html
DR Story 1: http://www.virtualizationimpact.com/?p=1854
DR Story 2: http://enterprisefeatures.com/2011/11/real-life-disaster-recovery-stories/
DR Story 3: http://ezinearticles.com/?The-Day-an-IRA-Bomb-Took-Out-the-Data-Center-of-a-Major-Japanese-Bank-and-the-Turmoil-That-Followed&id=2327438
NIST: http://csrc.nist.gov/publications/nistpubs/800-12/800-12-html/index.html
Cool Info: http://www.information-management.com/news/Japan_earthquake_tsunami_data_center_cloud-10019922-1.html
Some Information from Gene Super, former IT VP of “Totsy” child clothes/parental items distributor
Some information from Cave Creek Webhosting (CWIE of Tempe Arizona)
More Links in PPT Annotations