EMC Proven Professional Knowledge Sharing 2010

SharePoint Backup and Recovery: A Discussion of Different Options Including the Cloud

Douglas Collimore, Principal Technical Consultant, EMC Corporation

Jean Weintraub, Sr. Practice Manager – BRS, EMC Corporation

Contents

Introduction
What is SharePoint?
Backup and Recovery of SharePoint
    Recycle Bin
    Versioning
    VSS-Enabled Backups and Recoveries
    Granular Backups
    Enterprise Backup of SharePoint
    Deduplication to Support SharePoint Backup
    Backup Infrastructure for SharePoint
    SharePoint Backup in the Cloud
    Backup Cloned to Cloud
    Backup Infrastructure Moved to Cloud
    Hybrid Backup with Data Externalization
Summary

Disclaimer: The views, processes, or methodologies published in this compilation are those of the authors. They do not necessarily reflect EMC Corporation’s views, processes, or methodologies.

Introduction

Microsoft’s collaboration application, SharePoint, has taken the enterprise by storm. The

proliferation of SharePoint sites throughout enterprises has created exponential data growth that

needs to be controlled. However, unlike most applications – Exchange or SQL, for example –

there seems to be no real owner of SharePoint. Sure, there is the SharePoint group that builds

the farms or develops the portal applications, but are they the real owners? What about the SQL

team that manages the content databases, or the storage team that supports the hardware?

Let’s not forget security or the backup/recovery teams tasked with maintaining the environment

as well.

Most applications, such as e-mail or databases, tend to have a primary group that controls them

and is ultimately responsible for administering them. SharePoint does not seem to follow that same

structure. That is because SharePoint is not a single application. It is a development platform

that supports team collaboration and company-wide information sharing and document storage.

This means many organizations own data within the environment and

ultimately share some level of responsibility for it.

One of the most complicated areas in support of a true enterprise-size SharePoint farm revolves

around backup and recovery. Figure 1 depicts a typical enterprise level SharePoint farm. As you

can see, there are many components that require backup to be able to offer a farm level

recovery in case of an outage. You have the Web front-end servers (WFE), the content

databases (SQL), the application servers, the search, index, and query servers, and most

importantly, the configuration database, to name a few. While 95% of the farm resides in the

data that is part of the content databases, meaning a SQL backup may be good enough, the

other 5% is actually required as well if you want to do something with the data in the event of a

failure. In other words, just having the data without the supporting platform to read the data does

not offer the resiliency expected by an enterprise customer.

That is why understanding and deploying an enterprise backup/recovery solution can be so

perplexing. Do I need to back up consistently? What changes if I virtualize some or ALL of the

farm? How do my SLAs change if I don’t back up everything? How do I restore at a granular

level? How does moving all or part of my farm to the cloud impact these other questions? These

are all good and important questions that will be covered throughout the remainder of this

article.

Figure 1. Typical Enterprise-Size SharePoint farm

What is SharePoint?

It is important to understand what SharePoint is, what it isn’t, the components of SharePoint,

and the built-in features that assist in the ability to protect data. Whether it is a backup, an

item sent to a recycle bin, or versioning used to protect multiple copies of a file, each has its

place and is required to build a SharePoint farm with complete recovery capability.

So what is SharePoint? The quick answer – and the answer most people give – is that it is a

document sharing and collaboration application. However, SharePoint is not an application, but

a collection of elements – both applications and products – that provide web-based

collaboration, process management functionality, and document management capabilities. It is

this collection, together with the web part tools, that lets users easily build personal websites

with sharing capabilities, application portals, wikis, blogs, etc. Out of the box, SharePoint puts

this type of control directly into the end user’s hands – one of the primary reasons for its

explosive growth, exponential storage usage, and issues surrounding security, backup and

recovery, and compliance.

In the hands of a trained developer, SharePoint is a powerful toolset capable of taking disparate custom

business applications and consolidating them on a single platform with a common, recognizable

user interface (UI). It is this powerful capability that makes business continuance, disaster

recovery, and infrastructure consolidation key areas that demand the attention of IT

departments. Unfortunately, each is slightly more complicated than maintaining the standard file

server. This is due to the different server types and databases that make up a SharePoint farm

including:

• Configuration Database

• Content Database(s)

• Search Server

• Index Server

• Query Server

• Web Front-End Server

• Application Servers (for example, Excel, Project, Security)

• Shared Service Provider (SSP) Databases

Configuration Database - the database that maintains all the information about the entire

SharePoint farm. Most backup applications that do not use VSS cannot back up this database

without shutting down the farm. It is required for disaster recovery of the SharePoint farm

and for restoring a SharePoint farm to its original configuration.

Content Database - the repository for much of the information used in SharePoint such as web

applications, site collections, sites, documents, lists, etc. The content databases represent most

of the SharePoint farm. Without this data, you simply have an empty farm. Many utilities are

available to back up and recover the content databases as they are SQL-based. These

applications/utilities include, but are not limited to, stsadm, Central Administration, third-party VSS

backup applications, SQL dumps, or streaming backups.

Index Server - The index server maintains the index file. It manages the crawling and indexing

of SharePoint Server content to maintain this file. You have the ability to search SharePoint

content through this service in coordination with the Search service. Correct index restorability

is important in large farms so that re-indexing, a time-consuming and performance-impacting procedure, will

not have to take place during a content database recovery.

Search/Query Servers – Working in conjunction with the Index service, the Query service

allows users to search the content stored within the SharePoint farm. While Index server

recovery is important to limit any re-indexing that may need to occur, the same does not hold

true for the Query server. Server level or VSS backup of this server is sufficient to meet even

the most stringent SLA.

Web Front End (WFE) – Also referred to as Web servers, this server (or servers in the case of

a large farm or a farm with built-in redundancy) contains the files and settings that are required

to, for example, respond to user requests for Web pages/content or search requests. For

backup purposes, the primary WFE should be a robust server to properly handle the backup

and recovery of the farm. It is the WFE that VSS coordinates with to properly back up the farm

during a VSS-enabled backup. While you could perform backups across multiple WFE hosts, it

is not recommended for administrative reasons, as your restores would need to be performed on

the WFE that performed the original backup. With today’s high-performing servers, a single WFE can

handle the backups, so the administration needed to track which WFE performed each backup in a

large farm is unnecessary.

Shared Service Provider (SSP) database - a set of services that can be shared across Office

SharePoint Server Web applications. This greatly reduces the resources required to provide

these services across multiple sites.

Application Server - supported by SSP, Application servers include Excel Services, Project

Server 2007, Office SharePoint Server Search, Business Data Catalog, Personalization

Services, and Portal/Search Reporting.

Backup and Recovery of SharePoint

Designing a backup/recovery solution for any application is impossible without understanding

the requirements you are designing against. These requirements come from the lines of

business the application(s) support. In the case of SharePoint, you may have multiple portals

that operate applications for the company. Examples are a human resources portal for

employees to check their profiles or an ordering system for drugs for external customers.

You may also have many team sites with multiple sub-sites that are used for internal team

collaboration. These may or may not be as important as the portals and hence may not require

the same Service Level Agreement (SLA), possibly resulting in a different level of restorability. It

is a matter of understanding your SLA to the business. Without a clear understanding of what

your business is expecting for recovery capability and time objectives, trying to design a

backup/recovery solution is impossible. Once the objectives are understood, a backup/recovery

design can be started.

There are many schools of thought when it comes to planning and implementing a backup and

recovery solution for SharePoint. As seen in the previous section, SharePoint has many

components that require backup, some more important than others. As all of the content users

place in the farm is stored in the content databases, these SQL databases may appear to be

the most important. However, without the configuration database, recovering your farm to the

state it was in BEFORE an outage or data corruption would be almost impossible. Then, there are

the other servers: index, query, application, etc.

This causes us to question how one can maintain a consistent backup of all these servers

without having to perform multiple backup processes. Is it actually NECESSARY to back up

everything or is just backing up the content databases good enough? While the last question

may seem a little farfetched for an enterprise administrator, the fact is there are tools within

SharePoint that make data recovery easier and less reliant upon backups than ever before

regardless of whether your data resides locally or in a cloud infrastructure. Understanding and using

these features are critical in easing the burden of multi-terabyte backups and recoveries.

SharePoint comes with a few features that allow for some simplistic levels of data recovery.

These recovery features allow users and administrators the luxury of restoring deleted objects

and files WITHOUT having to go to a backup of the content database and use a lengthy

restore process to recover the data. These features are the multi-tiered recycle bin and

document versioning. Both must be considered when developing an SLA as they each bring

unique availability levels for the less important data in your SharePoint farm.

Recycle Bin

The recycle bin, not unlike the recycle bin that is available on your Windows desktop and

server systems, has two levels of file recovery. The first tier, the user-administered recycle bin,

allows the end-user to actually recover deleted items without having to resort to a data restore

from a content database backup.

The recycle bin allows users to recover:

• deleted files

• documents

• list items

• lists

• document libraries

Note that the recycle bin does NOT allow for restoration of deleted sites, sub-sites, or web parts.

These deletions still require restoration from a completed backup. The default retention is 30

days for deleted data. This value can be modified based on storage availability and site quota

size. It is important to remember the amount of data stored in the Tier 1 recycle bin will be

deducted from the total capacity of the site. In other words, the Tier 1 recycle bin capacity is

included in the quota for the site.

Another important point to understand about the recycle bin is it will only maintain a single

named copy of a file or item. If you have a file and save it and delete it multiple times, you can

only get back the last deleted copy. If you want the ability to have access to ALL versions of the

file, then you must turn on versioning which will be discussed later in this section. After the file

has reached the end of the recycle bin retention policy, it will enter the Tier 2 recycle bin.

The Tier 2 recycle bin is managed by the SharePoint administrator. It is also referred to as the

Site-Collection Recycle Bin. This is a site collection administrator-only recycle bin, meaning

end-users cannot see or administer deleted objects in this recycle bin. This recycle bin is used

either after an item has been deleted from the Tier 1 recycle bin or after it has reached the

retention policy limit. Unlike the Tier 1 recycle bin, this one is NOT included in the Site quota but

is included in the total capacity of the site. You may allocate up to 100% of the site capacity size

to the Tier 2 recycle bin. For instance, if you have a 50 GB site collection quota, you may

allocate up to an additional 50 GB for the Tier 2 recycle bin capacity for a total of 100 GB of

capacity for the site.
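
The capacity arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the calculation described in the text; the function name and parameters are hypothetical and are not part of any SharePoint API.

```python
def site_collection_capacity_gb(quota_gb: float, tier2_percent: float) -> float:
    """Total capacity a site collection may consume: the site quota plus
    the Tier 2 recycle bin allowance, expressed as a percentage of the quota.

    Assumption (from the text above): the Tier 2 allowance may be set
    anywhere from 0% to 100% of the site quota.
    """
    if not 0 <= tier2_percent <= 100:
        raise ValueError("tier2_percent must be between 0 and 100")
    return quota_gb + quota_gb * tier2_percent / 100


# The example from the text: a 50 GB quota with a 100% Tier 2 allowance.
print(site_collection_capacity_gb(50, 100))  # 100.0
```

When sizing storage, it is this total, not the quota alone, that needs to be planned for each site collection.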

Versioning

Versioning is another feature of SharePoint 2007 that allows for user recovery of files and

objects. As with the recycle bin, it does NOT allow for recovery of sites, sub-sites, or web parts.

Unlike the recycle bin, enabling versioning allows you to recover different copies of files that

may have been previously overwritten. For instance, you created a Word document that was

revised by another employee. It is decided that the new version is not accurate and the team

wants to go back to your original copy. The recycle bin would not help you in this instance as it

only maintains the most recent deleted copy. Versioning keeps a copy of your file and would

allow you to restore it without having to restore a recently created backup.

Versioning, much like the recycle bin, has two levels: major and minor versioning. You can

choose not to enable versioning, enable only major versioning, or enable both major and minor

versioning. Versioning is not a farm-level setting. Rather, versioning is controlled at the site level

for each list or library. Microsoft has described the different levels of versioning and when to

apply each:

• No versioning: Previous document versions and the history (such as comments) associated with each version are not retrievable. This is the default setting.

• Create major versions: Each iteration becomes a full copy of the document with the versions numbered sequentially (1, 2, 3, and so on). All users with permissions to the document library are able to view every updated version. Use this option if you do not need to differentiate between draft versions and published versions. To control the effect on storage space, you can specify how many previous versions to retain based on the current version.

• Create major and minor versions: Versions ending with a zero extension (.0) are major versions and versions ending with a non-zero extension are minor versions. Only major versions can be published. Additional permission levels can be configured for working with minor versions. In most scenarios, users who can edit major versions are also allowed to edit minor versions, but read-only users can only view major versions. As with the previous option, you can specify how many previous versions to keep based on the current version. You can also specify how many minor versions are kept per major version.

Care must be taken when enabling versioning as sites can become quite large, very quickly, as

multiple copies of files are stored. A discussion of versioning must take place when discussing

SLAs with your business units as this can severely impact the capacity levels of storage for

each site.
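
To get a feel for how quickly versioning can grow a site, the arithmetic can be sketched as below. This is a rough worst-case estimate, assuming every retained version is a full copy of the same size as the current document (as described for major versions) and that every document reaches its retention limit. The function name, parameters, and sample figures are illustrative and do not come from SharePoint itself.

```python
def versioned_library_size_mb(doc_size_mb: float,
                              doc_count: int,
                              retained_versions: int) -> float:
    """Worst-case library size when each retained version is a full copy.

    Assumptions (hypothetical): all documents have the same size and each
    one keeps the current version plus `retained_versions` previous versions.
    """
    if retained_versions < 0:
        raise ValueError("retained_versions cannot be negative")
    return doc_size_mb * doc_count * (1 + retained_versions)


# Illustrative figures only: 1,000 documents of 2 MB each.
print(versioned_library_size_mb(2, 1000, 0))   # no versioning: 2000 MB
print(versioned_library_size_mb(2, 1000, 10))  # 10 retained versions: 22000 MB
```

Even with modest document sizes, the retention limit acts as a direct multiplier on capacity, which is why it belongs in the SLA discussion.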

Backup Methodologies

Now that we have covered the aspects of Microsoft features that will help end users support

deleted item retention and recovery, the discussion needs to address the backup and more

importantly, the restore methods available to SharePoint administrators for those times when

the user can’t retrieve deleted items, has accidentally deleted a site, or is experiencing a corrupt

database. As mentioned previously, there are many applications and utilities that can back up

and recover SharePoint. Microsoft provides a few methods to back up and recover SharePoint.

They are not very extensive and do not offer the flexibility of third-party applications, but are included

with SharePoint (hence, free) and can provide enough restore capability for many installations.

For an enterprise installation, these tools are usually not robust enough to support the SLAs set

forth by the business. The Recovery Point Objectives (RPO) and Recovery Time Objectives

(RTO) are usually stringent and some form of automated backup and recovery application is

required. There is also the risk factor involved during a backup or restore when multiple

applications are required to obtain a complete and consistent copy of data. The two most

popular methods for backing up SharePoint farms are VSS-enabled backups and granular

backups.

VSS-Enabled Backups and Recoveries

Microsoft developed the Volume Shadow Copy Service (VSS) to support quick, consistent

copies, or point-in-time (PIT) copies of volumes. By allowing for timely images of data created

almost instantly to disk (or tape), you can complete backups of large production volumes

quickly, reducing the performance impact to your production server that a streaming solution might

cause. It also allows for quick restoration of files or volumes using the same VSS technology.

There are many advantages to using VSS-enabled backups (dependent on backup application

used) including uninterrupted backups with little production impact, non-production impacting

backup workflows, quick disk-based restores, database-level granularity of restores, SharePoint

rollback, and automatic database synchronization after a restore. Most importantly, VSS-

enabled backups are what Microsoft prefers and supports.

One key decision when thinking about VSS-enabled backups is whether to use snapshots or

clones. A snapshot tends to be a smaller, partial copy of the original volume whose size depends on

change rate. If a consistency check is required against that volume to establish its consistency

before declaring it a valid backup, then you must take into account the impact to production you

will experience. Remember, a snapshot points back to the production volume and only stores

changes since the last snapshot creation. During a consistency check, you may find you are inducing

the same impact you were trying to eliminate by using VSS in the first place. Clones

eliminate this issue by creating a full copy of data that can be checked on a separate mount

host. Of course, there is the additional disk cost of having to purchase 2X the amount of

production capacity to use this method. Figure 2 depicts an example of the amount of disk

capacity required using each technology to back up four disk volumes.
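
The capacity trade-off between the two approaches can be sketched with a short calculation. This is an illustration only: the volume sizes are hypothetical, and the 10% to 20% snapshot range is taken from the annotation that accompanies Figure 2, interpreted here (as an assumption) as the space used by each snapshot.

```python
def clone_capacity_gb(production_gb: float, clone_copies: int = 1) -> float:
    """Extra disk needed for clones: one full copy of production per clone."""
    return production_gb * clone_copies


def snapshot_capacity_gb(production_gb: float,
                         snapshot_count: int,
                         percent_per_snapshot: float) -> float:
    """Extra disk needed for snapshots, assuming each snapshot consumes a
    fixed percentage of production capacity (driven by change rate)."""
    return production_gb * percent_per_snapshot / 100 * snapshot_count


# Hypothetical example: four production volumes of 500 GB each.
production_gb = sum([500, 500, 500, 500])

print("1 clone copy:", clone_capacity_gb(production_gb), "GB")
print("2 snapshots at 10%:", snapshot_capacity_gb(production_gb, 2, 10), "GB")
print("2 snapshots at 20%:", snapshot_capacity_gb(production_gb, 2, 20), "GB")
```

Under these assumptions, two snapshots need a fraction of the disk that a single clone does, at the price of the production impact during consistency checks described above.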

Figure 2. Snap vs. Clone Comparison

[Figure 2 annotation: 2 copies of snaps versus 1 copy using clones. Snaps may use only 10% to 20% of the actual production capacity, depending upon the frequency of snaps and the change rate of the data.]

Granular Backups

Also known as brick-level backups, granular backups crawl the content databases and back up

each item within them individually, allowing for down-to-the-item level of restorability. As you

can imagine, this is a slower, production-impacting process. However, it allows for single item

recovery, which is not available with VSS-enabled backups without the use of secondary farms or

third-party applications.

Most granular backup applications can be installed into a current backup infrastructure without

too much complication and add the valuable site level recovery not supported by the recycle bin

or versioning. It is important to note you must size both the total capacity of the SharePoint farm

AND the change rate before deciding if you can leverage granular backups. As it is slower than

VSS, too large a farm with a high change rate may not be able to complete a full granular

backup within your established backup window. On the other hand, you may choose to utilize

granular backup for only the most important sites based on the SLA you were given.

Granular backups, when used with the recycle bin and versioning mentioned previously, can

supplement your data recovery plan without having to add the complexity or expense of extra

disks that VSS requires.

For many SharePoint farms, this type of SLA to a business unit may be good enough. The

question is not whether to use VSS or granular backups to back up your SharePoint farm. The

question should be, “Do I use one or the other or a combination of the two?” Your business

requirements will dictate the answer for you.

Enterprise Backup of SharePoint

So now you understand the different types of backup available for SharePoint and the level of

recovery you get from each. You also know there are features within SharePoint that allow you a

degree of restorability at a user and administrator level. You must now understand the service

level that the business requires to sustain the business without overpricing the solution. You

don’t want to use a fire hose if a bucket of water will do.

Today’s typical SharePoint farm can contain five terabytes or more of data. This data can be

files, documents, scanned images, or any of hundreds of other objects that will be stored within

the content databases. It is also important to know that the typical change rate of a SharePoint

farm is approximately 5%. That would mean a data change of 250 GB per day. This should be a

manageable capacity for incremental backups even with a small backup window of four hours.

However, what happens when your farm is 20 TB and your change rate is an abnormal 10%

and you must back up two TB of data instead of 250 GB? You need to look at what comprises

your farm.
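
The sizing arithmetic above can be checked with a small script. It is a back-of-the-envelope sketch under stated assumptions: decimal units (1 TB = 1000 GB, matching the 5 TB and 250 GB figures in the text), a constant sustained transfer rate, and no deduplication or compression. The function names are illustrative.

```python
def daily_change_gb(farm_tb: float, change_percent: float) -> float:
    """Data changed per day, in GB, for a farm of the given size."""
    return farm_tb * 1000 * change_percent / 100


def required_throughput_mb_per_s(change_gb: float, window_hours: float) -> float:
    """Sustained rate needed to move `change_gb` within the backup window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return change_gb * 1000 / (window_hours * 3600)


# The two scenarios from the text, each with a four-hour backup window.
for farm_tb, change_percent in [(5, 5), (20, 10)]:
    change = daily_change_gb(farm_tb, change_percent)
    rate = required_throughput_mb_per_s(change, 4)
    print(f"{farm_tb} TB at {change_percent}%: "
          f"{change:.0f} GB/day, about {rate:.0f} MB/s sustained")
```

The 5 TB farm needs roughly 17 MB/s sustained over four hours, while the 20 TB farm at 10% needs roughly 139 MB/s, an eightfold increase that may not fit the same infrastructure.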

The typical enterprise farm is composed of portals, team collaboration sites, Wikis, blogs, and

other less important sites. What if 50% of the data change rate suggested previously was

from team and personal sites within the farm? Do you have to offer the same granular recovery

you might offer the Human Resources portal that operates the Tier 1 and 2 HR applications for

the business? You might say maybe, but for most team sites, the answer is probably no. What

this means is you need to decide what type of site gets what type of SLA.

Maybe portal sites are treated as Tier 1 sites and have granular restore capability with remote

disaster recovery and a restore time of less than one hour. You would treat general team sites

as Tier 2 and have site restore capability, but only to a recovery point of 24 hours and no remote

recovery capability. These are important questions to answer because each will add

infrastructure and cost to your solution.
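
One way to make such a tiering decision explicit is to record it as data that backup policies can be derived from. The sketch below is a hypothetical illustration in Python, not a feature of SharePoint or any backup product. It encodes only the two example tiers described above and leaves unset any objective the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceLevel:
    tier: int
    restore_granularity: str            # "item" (granular) or "site"
    remote_dr: bool                     # remote disaster recovery offered?
    rto_hours: Optional[float] = None   # upper bound on restore time, if stated
    rpo_hours: Optional[float] = None   # recovery point, if stated


# Tier 1: granular restore, remote DR, restore time of less than one hour.
TIER_1 = ServiceLevel(tier=1, restore_granularity="item", remote_dr=True, rto_hours=1)
# Tier 2: site restore, 24-hour recovery point, no remote recovery.
TIER_2 = ServiceLevel(tier=2, restore_granularity="site", remote_dr=False, rpo_hours=24)

# Hypothetical mapping of site types to tiers, following the example above.
SLA_BY_SITE_TYPE = {"portal": TIER_1, "team": TIER_2}


def sla_for(site_type: str) -> ServiceLevel:
    """Look up the service level for a site type; unknown types are an error
    so that no site silently ends up without a defined SLA."""
    try:
        return SLA_BY_SITE_TYPE[site_type]
    except KeyError:
        raise ValueError(f"no SLA defined for site type {site_type!r}") from None


print(sla_for("portal"))
print(sla_for("team"))
```

Raising an error for unmapped site types is a deliberate choice in this sketch: it forces the conversation with the business before a new class of site goes into production.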

Deduplication to Support SharePoint Backup

Data residing in a SharePoint farm is made up of documents, files, images, etc. Many of these

items are perfect candidates for deduplication, especially if versioning is turned on. Maintaining

duplicate copies of objects that may have as little as a single change within the document is a

perfect example of where you can leverage deduplication.

There are two types of deduplication methodologies: source-based and target-based. With

source-based deduplication, the client agent will identify duplicate data at the source. Only the

unique data will be sent across the LAN, reducing network utilization to perform the backup and

also the amount of backup data stored on disk/tape. This may also create a shorter backup,

shrinking the window needed to complete it.
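
The idea behind source-based deduplication can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not how any particular product works: it uses fixed-size chunks and SHA-256 fingerprints, a plain dictionary stands in for the backup target, and all names are hypothetical. Commercial products typically use more sophisticated chunking and indexing.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size chunks of the data."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def backup(data: bytes, store: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Back up `data`, sending only chunks the store has not seen before.

    Returns the list of chunk fingerprints needed to rebuild the data and
    the number of bytes that actually had to be sent.
    """
    recipe: List[str] = []
    bytes_sent = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:          # unique data: send and store it
            store[digest] = chunk
            bytes_sent += len(chunk)
        recipe.append(digest)            # duplicates cost only a reference
    return recipe, bytes_sent


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the original data from its chunk fingerprints."""
    return b"".join(store[digest] for digest in recipe)


# Two versions of a document that differ only in the final chunk,
# similar to what versioning produces after a small edit.
store: Dict[str, bytes] = {}
version_1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
version_2 = version_1[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE

recipe_1, sent_1 = backup(version_1, store)
recipe_2, sent_2 = backup(version_2, store)
print(sent_1, sent_2)  # 40960 4096
```

The second backup sends only the one changed chunk, which is the behavior that reduces both network utilization and stored capacity when many near-identical versions exist.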

Target-based deduplication fits into your current backup infrastructure. The dedupe appliance

replaces your current tape or virtual tape library. Once the backup data reaches the appliance

(or in-flight in some cases), the data is deduplicated and stored. The appliance and dedupe

process are completely transparent to the backup application. This makes target-based dedupe a

simple plug-and-play solution that requires no change to your current infrastructure to install or

use.

Data commonality is the key to using deduplication effectively in your SharePoint environment.

This is an important variable to understand as it can also help in your quest to size your farm

backup correctly. Assessment tools are available from many dedupe vendors, including EMC,

and are unobtrusive to run against your SharePoint databases. Running a commonality

assessment will show you how much of your data can be deduplicated and what type of savings

you can expect when performing your backup to such a device.

Many deduplication solutions enable you to replicate your backups offsite as well, which will

add a level of disaster recovery to your design without the need for complex replication

solutions. Whether you need this will depend on your SLA. It is outside the scope of this document to get into the

details of data deduplication, but just know, if you leverage it for SharePoint, you should be able

to reduce the amount of data you back up, store, and replicate offsite.

Backup Infrastructure for SharePoint

To understand how the cloud can be leveraged for SharePoint backup, a quick review of the

physical backup of SharePoint is required. Figure 3 depicts a typical SharePoint infrastructure

for an enterprise-size account. Note the VTL is SAN-based but could be easily moved to the

LAN as well. Also note the amount of hardware and associated manpower and administration

required to support such an environment–storage, switches, network, and backup.

Figure 3. Typical Enterprise SharePoint Infrastructure

Whether you decide to use a VSS-enabled backup with some form of secondary backup (VTL,

disk, or dedupe appliance), granular backup, or one of the other Microsoft methods of backup,

the infrastructure doesn’t really change.

If you were to take the servers within the farm and virtualize them with either Microsoft Hyper-V

or VMware vSphere, the number of servers may be reduced as you consolidate services, but

the types and quantities of services would remain constant. Figure 4 depicts the same

infrastructure after virtualizing the SharePoint farm. Physical servers are represented by the

blue borders placed around the servers. The servers now represent virtual instances. It is a

sizable consolidation savings. However, care needs to be taken to make sure the Web Front-

End and Content Database SQL servers can support their associated workload when virtualized

and sharing resources between each other.

Figure 4. Typical Enterprise SharePoint Infrastructure (Virtualized)

Now that we have an understanding of the typical layout for a SharePoint infrastructure, let’s

discuss how we can leverage the cloud to support our SharePoint farm, reduce costs, and

create a robust infrastructure that is flexible. The savings include real dollars from

consolidation and reduced hardware and software purchases, reduced

manpower required to support the hardware, software, and day-to-day support responsibilities,

and reduced real estate, power, and cooling requirements.


SharePoint Backup in the Cloud

There are three specific scenarios we will discuss when we talk about SharePoint backup in the

cloud. These scenarios are:

• Backup cloned to the cloud

• Backup target moved to the cloud

• Hybrid Backup with Data Externalization

You can move the entire farm to the cloud (Figure 5) leaving users accessing a SharePoint site

via the Internet, but that takes all control away from you and is beyond the scope of this

document. An example of this scenario is Microsoft Azure services for SharePoint.

Figure 5. Entire SharePoint Farm in Cloud


Backup Cloned to Cloud

Why would you keep your current infrastructure and simply take your backup and copy it to a

cloud infrastructure? You would do this for a couple of reasons. First, taking a completed

backup and moving it to the cloud will fulfill the requirement for taking your backups offsite.

Many enterprise customers have this requirement forced upon them by government regulations.

The backup is accessible from anywhere there is an IP connection. You do not need to move

your backup to another form of media, put it on a truck, and pay for someone to store it.

Accessibility to that backup data is available immediately when required. You don’t have to call

your archive vendor, have them locate your backup, put it on a truck, and deliver it to you.

Furthermore, as the copy sent to the cloud is a secondary copy, you have a longer backup window

in which to get it copied to the cloud resource.

Simply put, cloning or replicating your backup to the cloud provides these benefits:

• Ease of use for long term retention–no infrastructure changes to current

environment

• Primary backup stays onsite for immediate restoration if required–most

restore requests occur within two weeks of the original backup


• Offsite replication is immediate upon completion of backup–no manual

intervention required

• Immediate access over IP connection of offsite backup when required

• Clone replication runs outside backup window

• Deduplication/inline compression reduces secondary capacity requirements

and data transfer

Backup Infrastructure Moved to Cloud

If you can keep your infrastructure and move your backup offsite requirement to the cloud, why

would you want to take your entire backup process to the cloud (Figure 6)? First, you save the

infrastructure and administration costs of handling your backup process. With high-speed IP

backbones available, why worry about upgrading your hardware and network infrastructure in

support of backup? Move it to the cloud and you not only reduce your capital expenditures, but

you eliminate the personnel cost at the same time.


Figure 6. Backup Infrastructure Moved to Cloud

This sounds great on the surface. However, just because you can back up via IP to an entity

that is not part of your internal infrastructure does not mean you want to do it. In addition, if you

currently have a SAN-based backup solution, you may not be pleased with the performance

you get when moving to a LAN-based solution. Let’s explore why you may or may not want to

make this move. Remember, this is now your primary backup and you must meet your backup

window SLA.

Unlike scenario 1, where you moved your secondary copy to the cloud, this change would take

your primary copy of backup offsite as well. If your RTO is short, then not having a local

recovery copy may cause you to miss your objective. This scenario may or may not be

advantageous for you depending on your SLA to the business. As stated over and over again,

your solution is designed to the SLA. This does not mean you have NO input as an IT admin or

solution architect. It means you need to have educated discussions with the business owners


about what your current capabilities are, what your recovery capabilities would be if you move backup

to the cloud, and what the costs are if you move or don’t move.

Moving backup off-premise also means you don’t control the physical security of your data. Your

cloud vendor is responsible for it. This can have implications if your company has governance

and compliance policies to meet. Your legal department now needs to be part of an IT

discussion when it may not have had any input before.

One feature you can take advantage of when moving backup off-premise is disaster recovery. If

off-premise backups are your preferred method of restoring a site in the event of a disaster,

moving to the cloud makes that process simpler. All you need to do is connect the DR

environment to the IP link of the backup cloud and perform your standard DR restores. You

remove the necessity of supporting expensive WAN links, data replication, or a secondary site

including hardware, software, power, and cooling.

Hopefully, you are starting to see that moving infrastructure out of the data center to another

location not in your control may be appealing, but it is not without its disadvantages. There is a

level of control you give up for the perceived cost savings. Other business groups will now have

a say in how infrastructure is controlled, which may lengthen development time and make

you less flexible as a company. Here are some topic areas you need to discuss before an

educated decision can be made to move to the cloud for backup.

• Costs

    • savings - real estate

    • power and cooling

    • personnel reduction

    • licensing and maintenance of backup software and hardware

    • disaster recovery replication can possibly be replaced by offsite backup

    • new hardware/software included in flat fee, per usage payment

• Control

    • data no longer under your control - raises compliance and governance questions

    • theft of data - physical and network security needs to be addressed

    • service level to business is now under control of vendor - no different than outsourcing data center


• Administration

    • If your backup is already LAN-based, nothing changes with the backup process, as cloud backup is IP-based

    • Moving a SAN-based backup to the LAN will require detailed assessments to validate that IP throughput can continue to support the backup SLA

    • Risk of being unable to meet performance or backup window requirements
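The throughput assessment mentioned in the list above can start with simple arithmetic. The following Python sketch is only a first-pass estimate under stated assumptions: the function names are hypothetical, the 70% link efficiency is an assumed figure rather than a measured one, and the dataset size, link speed, and window are illustrative values that do not come from a real environment.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move data_gb (decimal gigabytes) over an IP link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is the assumed fraction of nominal bandwidth actually
    achieved after protocol overhead and contention.
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, link speed, or efficiency")
    megabits = data_gb * 8 * 1000            # gigabytes -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def fits_window(data_gb: float, link_mbps: float, window_hours: float,
                efficiency: float = 0.7) -> bool:
    """True if the transfer is estimated to finish inside the window."""
    return transfer_hours(data_gb, link_mbps, efficiency) <= window_hours


# Illustrative figures only: a 5,000 GB full backup, a 1 Gbps link,
# and an 8-hour backup window.
full_backup_hours = transfer_hours(5000, 1000)
print(f"Estimated full backup time: {full_backup_hours:.1f} hours")
print("Fits 8-hour window:", fits_window(5000, 1000, 8))
```

The same calculation applies to restores: substitute your RTO for the backup window to see whether a cloud-only recovery copy is realistic for your SLA.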

Hybrid Backup with Data Externalization

The main issue with large SharePoint farms and backup has been the size of the datasets that

must be backed up. Full backups of 10 TB or more are not manageable, and taking multiple

incremental backups requires extensive playback during a restoration. But what if you were able

to take the unstructured data out of the database (files, objects, etc.), move it to less expensive

storage or the cloud, and leave only structured metadata within the SQL content databases?

That would significantly reduce the size of the content databases (down to 100 GB for a

5 TB farm, for example). Suddenly, cloud-based backups and restores become realistic.

The external binary large object (BLOB) store provider, otherwise known as EBS, working in

parallel with the SQL Server content databases, moves the unstructured data out of SQL and

onto less expensive storage - even to storage that exists in the cloud. This externalization of

data means the backup paradigm has changed (Figure 7).

When a document is created and the BLOB moved to external storage, metadata is left within

the content database: the structured data for which SQL was designed. The actual file data, a Word

document for example, resides as a BLOB on some other form of storage, such as NAS or

CAS. This secondary storage is typically much less expensive than typical Fibre Channel SAN

storage. When the document is modified, a new BLOB is created. The original BLOB is NOT

overwritten. The metadata is updated to point to the new BLOB.
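The write behavior described above can be illustrated with a small in-memory model. This Python sketch is not the EBS provider interface. The class `ExternalizedStore` and its methods are hypothetical, with one dictionary standing in for the content database and another for the external BLOB storage.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ExternalizedStore:
    """Toy model of externalized storage (hypothetical, for illustration)."""
    # blob_id -> file bytes; stands in for NAS, CAS, or cloud storage
    blobs: dict = field(default_factory=dict)
    # document name -> current blob_id; stands in for the content database
    metadata: dict = field(default_factory=dict)

    def save(self, name: str, content: bytes) -> str:
        """Write a new BLOB and repoint the metadata; never overwrite."""
        blob_id = uuid.uuid4().hex
        self.blobs[blob_id] = content
        self.metadata[name] = blob_id
        return blob_id

    def read(self, name: str) -> bytes:
        """Follow the metadata pointer to the current BLOB."""
        return self.blobs[self.metadata[name]]

    def orphans(self) -> set:
        """BLOBs that no metadata entry points to any longer."""
        return set(self.blobs) - set(self.metadata.values())


store = ExternalizedStore()
first = store.save("plan.docx", b"draft 1")
second = store.save("plan.docx", b"draft 2")   # modification creates a new BLOB

print(store.read("plan.docx"))        # current content follows the new pointer
print(first in store.blobs)           # the original BLOB is still on storage
print(store.orphans() == {first})     # and it is now an orphan
```

Note how small the metadata dictionary stays compared with the BLOB dictionary: that size difference is what makes the content database so much easier to back up.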


Figure 7. SharePoint with Externalized BLOB Storage (EBS)

Now, if you modify the same documents over and over, you can see where the storage

utilization can be quite extensive - many terabytes can be used storing all these files. There is a

utility that performs a garbage collection of these orphaned files and eventually removes them

from the EBS.

Having the orphaned files available for a period of time actually is an advantage. You could

restore the metadata from a previous week and NOT have to restore the associated BLOB data

if garbage collection hadn’t been performed and the BLOB was actually still stored on the

secondary device. The advantage of this is quick restores, even from granular backups and,

because the BLOBs reside on inexpensive storage, you could have a retention period much

greater than you normally have on Tier 1 storage.

Additionally, because the EBS is detached from the content database, you can choose to back it

up or replicate it for recovery purposes. Your daily backup would back up the metadata and

manage log truncation while replication can be set up between storage devices to move the

actual files, which is the time-consuming portion of any backup. This design brings us to the

cloud.
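The interplay between garbage collection and metadata-only restores can be sketched as follows. This Python example is an illustration under assumptions, not the actual garbage collection utility: the function `collect_garbage`, the grace-period mechanism, and the 14-day retention value are all hypothetical.

```python
DAY = 86400  # seconds


def collect_garbage(blobs: dict, metadata: dict, orphaned_at: dict,
                    now: float, retention_seconds: float) -> list:
    """Remove BLOBs that have been unreferenced for the whole retention period.

    blobs:        blob_id -> bytes (external storage)
    metadata:     document name -> blob_id (content database)
    orphaned_at:  blob_id -> time the BLOB was first seen orphaned (updated in place)
    Returns the list of blob_ids removed on this pass.
    """
    referenced = set(metadata.values())
    removed = []
    for blob_id in list(blobs):
        if blob_id in referenced:
            # Referenced again (for example after a metadata restore).
            orphaned_at.pop(blob_id, None)
            continue
        first_seen = orphaned_at.setdefault(blob_id, now)
        if now - first_seen >= retention_seconds:
            del blobs[blob_id]
            del orphaned_at[blob_id]
            removed.append(blob_id)
    return removed


blobs = {"v1": b"draft 1", "v2": b"draft 2"}
metadata = {"plan.docx": "v2"}          # v1 is orphaned
orphaned_at = {}

# First pass: v1 is noted as orphaned but kept for the retention period.
removed_early = collect_garbage(blobs, metadata, orphaned_at, now=0,
                                retention_seconds=14 * DAY)

# Within the retention period, last week's metadata can be restored
# without restoring any BLOB data, because v1 is still on storage.
last_week_metadata = {"plan.docx": "v1"}
restorable = last_week_metadata["plan.docx"] in blobs

# Once the retention period has passed, the orphan is removed.
removed_late = collect_garbage(blobs, metadata, orphaned_at, now=14 * DAY,
                               retention_seconds=14 * DAY)
print(removed_early, restorable, removed_late)
```

The retention period you choose is effectively the window in which a metadata-only restore is possible, so it should be set with your restore SLA in mind.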


Instead of paying for all this storage which needs data center space, cooling, power,

maintenance, administration, etc., why not just keep your metadata local in your SQL database

and move the BLOB to cloud storage? This way, your storage is available from anywhere. If you lose

a data center, point your DR servers to the cloud from a DR site. Replicating only the metadata

is a huge time and cost savings compared with having to move terabytes of data. It could be something as

simple as log shipping.

You might even want to move the entire backup/replication process to the cloud (Figure 8),

leaving just your metadata storage local. A well-connected WAN infrastructure to the cloud

should be more than ample to support this backup process, as the metadata backup will be

hundreds of gigabytes and not terabytes. A good cloud provider should be able to offer data

replication, so instead of backing up the large quantity of BLOB capacity, replication to a

secondary storage system may be enough to meet your SLAs and compliance policies.


Figure 8. Externalized Data Infrastructure

I am not suggesting that you should rush out and implement EBS in your SharePoint farm

tomorrow. All the questions discussed previously remain a concern.

• Can you meet the SLA to the business?

• How will your SharePoint applications perform with data externalized?

• Is moving your data out of the data center and onto the cloud allowed from a

governance and compliance perspective?

• What is the user experience when data is externalized?

• Does the design meet your disaster recovery requirements?

• What security is in place at your provider to secure your data both physically and

from hackers?

• Is there a back-out plan in case the vendor can’t provide the level of service

promised?

These questions must be thought through carefully before you can truly make an educated

decision on whether to externalize your SharePoint data to the cloud, implement a hybrid

design, or keep it local to your data center.

Summary

SharePoint backup and recovery is difficult in the enterprise because of the uncontrolled,

exponential growth within the farm. SharePoint is not your typical database application, but a

toolset that allows for the building of portals and collaboration sites, and that even serves as a repository for

data files, a by-product of the application.

There are many components within a SharePoint farm that need to be backed up including the

configuration database, the content databases, web servers, and a host of other server types.

Without a consistent backup across the farm, supported by VSS-capable backup applications,

recovering from a disaster without manually rebuilding the farm is impossible.


Externalizing the data files that reside within the SQL content databases unburdens the

database from storing unstructured data. It also greatly reduces the

size of the content databases, making them more manageable and better performing. While

moving the actual data files onto less expensive storage or moving them completely out of the data

center to the cloud means cost savings, it can come at the expense of performance. It also may

break compliance or governance rules and regulations. Finally, remember that the SLA to the

business is the first and foremost priority of any design. It does no one any good to save money

if your first outage costs the business more money than you saved.