
BACKUP OPTIMIZATION ‘NETWORKER INSIDE’

Shareef Bassiouny, EMC

Mohamed Sohail, EMC

Giovanni Gobbo, Senior IT Consultant


2014 EMC Proven Professional Knowledge Sharing 2

Table of Contents

Executive summary

Introduction

Part 1

How much Data Storage could be gained? How could it be maximized?

What is the penalty of this gain?

Classic design example

Advantages/disadvantages of the new DD Boost over Fibre Channel (DFC)

Part II

Journey to an optimized backup environment

The Journey

Steps to the solution

NetWorker

Data Domain

Avamar

“Virtualized Environments”

Appendix

Biography

Disclaimer: The views, processes, or methodologies published in this article are those of the

authors. They do not necessarily reflect EMC Corporation’s views, processes, or

methodologies.


Executive summary

Do you need to speed up your backups by up to 50%? Do you need to reduce your bandwidth use by up to 99%? Do you want to reduce the backup server workload by up to 40%? Do you want to increase your backup success rate?

The answer is Data Domain® Boost (DD Boost), which enables you to finish backups within backup windows and provides breathing room for data growth. With performance of up to 31 TB/hr, it is 3 times faster than any other solution, enabling you to use your existing network infrastructure more efficiently.

In this Knowledge Sharing article we illustrate how we optimized our backup processes and leveraged current resources by integrating NetWorker® backup management software and the new DD Boost over Fibre Channel feature to enhance backup system performance.

The major component of EMC backup and recovery software solutions, NetWorker is a cornerstone of the backup solutions of large infrastructure customers. This article targets backup administrators, support engineers, and stakeholders interested in the DD Boost over Fibre Channel feature and how to use it to improve the backup success rate. The goal of this article is to help you:

speed up backups

avoid congestion that slows down large critical backups through bandwidth utilization reduction

minimize workloads on backup hosts (NetWorker server and Storage nodes)


Introduction

In Part 1, we follow a dialogue we had with a customer while promoting Data Domain for his backup environment, which led us to promote NetWorker as one of the products best integrated with Data Domain appliances. Part 1 is a series of questions and answers that explore the why and the how. We concentrate on the basic concepts and leave the details to the referenced documents, primarily the “NetWorker and Data Domain Devices Integration Guide” version 8.1.

Part 2 is the outcome of the customer conversation from Part 1, coupled with the data from the customer requirements documents. From these we produced a solution proposal that relies on the concepts built in Part 1, along with details on how those products fit into the customer environment.


Part 1

While deduplicated storage as a backup media target is not a new concept in Backup and

Recovery Solutions architecture (BRSa), the technology used for this deduplicated storage is

one of the major factors that affects backup performance and success rates.

A well-known example is the EMC DL3D, which integrated multiple storage technologies to achieve Backup to Disk (B2D) performance through a Virtual Tape Library interface, coupled with back-end storage deduplication. However, since the deduplication process ran offline, appliance performance was known to deteriorate beyond 70-80% disk utilization.

Data Domain emerged as a cutting-edge deduplicated storage technology for backup-to-disk solutions. Its “in-line” deduplication (data is deduplicated before being written to disk, as soon as it reaches the storage host) and high performance made it one of the best-selling products in the EMC Data Protection and Availability Delivery portfolio. Perhaps the main reasons for its market appeal are the sustainable performance that it delivers (minimal performance degradation beyond 95% utilization) and the diverse storage connectivity options it provides. Further integration with backup solutions led to DD Boost, one of the most interesting features provided with Data Domain appliances.

DD Boost comprises Distributed Segment Processing (DSP) coupled with the DD API. DSP is a mechanism that enables client-side deduplication to be integrated into virtually any application that writes data to secondary storage backup media. The DD API is the Data Domain programming interface that enables applications and hosts to communicate with the Data Domain Operating System (DD OS). Applications use this integration interface to “boost” performance, minimize the backup window and bandwidth utilization, and enhance backup success rates.

Basic concepts mentioned in the following discussion include:

Brief Blueprint on Deduplication Technologies

Deduplication and compression have the same aim: to remove redundancies from data patterns. While the scope of compression is a file or an archive of files, the scope of deduplication is the file system used to store backup data, also called a Storage Unit (SU) in Data Domain jargon. Here, we are not talking about file-level deduplication (which hashes the contents of every file on the file system, detects duplicate content, and removes the duplicate copies, replacing them with stub pointers to the original content).


Figure 1: File-based deduplication

We are talking about sub-file deduplication technology, which splits every file into chunks using a segmentation algorithm (variable-length segmentation has been found to be the most efficient). Those chunks are identified by their hash fingerprints, so if a duplicate chunk is found it is replaced by a pointer to the original chunk (the first one found to be unique). This is the technology used for Data Domain deduplication, with an added layer of compression applied after new/unique chunks are identified.


Figure 2: Sub-file, variable length chunks deduplication
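
As a rough illustration of the sub-file approach described above, the following Python sketch splits a byte stream at content-defined boundaries, fingerprints each chunk with SHA-256, and stores each unique chunk once in compressed form. The rolling-sum boundary test, the size constants, and the `ChunkStore` class are assumptions made for this example only; they are not the Data Domain segmentation algorithm or any product API.

```python
import hashlib
import zlib

# Illustrative constants, not Data Domain values.
WINDOW = 16      # bytes covered by the rolling sum
MASK = 0x3F      # a boundary is declared when (rolling sum & MASK) == 0
MIN_CHUNK = 32   # never cut a chunk shorter than this
MAX_CHUNK = 256  # always cut once a chunk reaches this length


def split_chunks(data: bytes) -> list[bytes]:
    """Split data into variable-length chunks at content-defined boundaries."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= WINDOW:
            rolling -= data[i - WINDOW]  # slide the window forward
        length = i - start + 1
        if (length >= MIN_CHUNK and (rolling & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Toy deduplicating store: one compressed copy per unique chunk."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}  # fingerprint -> compressed chunk

    def write(self, data: bytes) -> list[str]:
        """Store data and return its recipe (ordered list of fingerprints)."""
        recipe = []
        for chunk in split_chunks(data):
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.chunks:   # new/unique chunk
                self.chunks[fingerprint] = zlib.compress(chunk)
            recipe.append(fingerprint)           # duplicates become pointers
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rehydrate data from its recipe."""
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in recipe)
```

Writing the same data to a `ChunkStore` twice yields a second recipe but no new stored chunks, which is the storage saving that deduplication aims for.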

How much Data Storage could be gained? How could it be maximized?

While deduplication efficiency varies with several factors, a 20x reduction in disk space is typical for plain uncompressed file system data. The main factors that affect deduplication efficiency include:

Data type or nature: some types of data are much more compressible (text files, spreadsheets, etc.) than types that are already compressed (audio/video files, graphics), so recompressing the latter will not produce a significant benefit. Because deduplication relies on file segmentation and chunk identification, any transformation applied to incoming files (such as compression and/or encryption) will produce new patterns of chunks, even after minor changes to those files, and thus reduce the gain from deduplication.

Change rate: storage savings increase with each subsequent backup of the save set because a deduplication backup writes to disk only those data blocks not already in its catalogue; thus, data with a high change rate will produce a lower gain than data with a low change rate.


Data retention: the amount of time data is kept available for recovery affects the size of the data catalogue (imagine a database of hashes that represent every stored chunk). If you retain the data for a longer period, your catalogue is larger and thus your deduplication efficiency increases (as there is a higher probability of finding similar chunks).

For more information, check page 27 of the “NetWorker and Data Domain Devices Integration Guide”.
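
The effect of change rate and retention can be made concrete with some simple arithmetic. The Python sketch below uses a deliberately simplified model that is an assumption made for illustration, not a Data Domain sizing formula: every retained backup is a full backup of the same logical size, the first one is stored in full, each later one stores only its changed fraction, and all stored chunks shrink by a fixed local compression factor.

```python
def dedup_ratio(retained_fulls: int, change_rate: float,
                local_compression: float = 2.0) -> float:
    """Estimate the logical-to-physical ratio under the simplified model.

    retained_fulls    -- number of full backups kept (driven by retention)
    change_rate       -- fraction of data that differs between backups (0..1)
    local_compression -- assumed compression factor applied to unique chunks
    """
    if retained_fulls < 1:
        raise ValueError("at least one backup must be retained")
    logical = float(retained_fulls)                    # in units of one full backup
    unique = 1.0 + (retained_fulls - 1) * change_rate  # first full plus later changes
    physical = unique / local_compression
    return logical / physical


if __name__ == "__main__":
    for fulls, rate in [(7, 0.02), (30, 0.02), (30, 0.10)]:
        print(f"{fulls:>2} fulls, {rate:.0%} change: {dedup_ratio(fulls, rate):.1f}x")
```

Under this model, longer retention and a lower change rate both raise the estimated ratio, in line with the factors listed above; real results depend on the data and should be measured.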

What is the penalty of this gain?

While not really a penalty, as with any compression algorithm, uncompressing (rehydrating) the data consumes time and effort. However, the DD appliance is engineered to make data rehydration as painless as possible. Each DD Boost device supports up to 60 concurrent sessions (multiple recoveries will not impair each other, nor will any application supporting parallel recovery), and each session can easily reach 50 MB/s on a good network infrastructure supporting Gigabit Ethernet. Backup performance becomes very high following the first full backup (how high depends on the change rate), but recovery performance will be comparable to the performance of the first full backup because the data is rebuilt in its plain format and then sent to the recovering host.

Classic design example

When configuring an environment for Backup to Disk, there are many alternatives for the type of target media (local disk, SAN connected, or even NAS attached).

One could settle for the simplest option and export a NAS (Network Attached Storage) file system (CIFS or NFS, as per your client platform preference) as a backup-to-disk target that can be mounted on a backup server, any of its Storage Nodes (SNs), or even a Dedicated Storage Node (DSN). A DSN is a client application host that is used as an SN only for its own data. With a DSN, the data goes from the application host directly to the backup storage instead of having to pass through a generic storage node. This optimization avoids having the backup data flow (client to SN and then SN to NAS appliance) traverse the LAN twice. It is needed when network access is a bottleneck.


A Data Domain host can be configured to export a NAS file system. Once your target disk is ready, configure your disk device. In NetWorker, the Advanced File Type Device (AFTD) is the most commonly used type.

Even without backup software, this NAS file system can still be used as a target for an Oracle RMAN backup script or an MSSQL backup script, and the Data Domain host will deduplicate the resulting backup files. DD Boost demonstrates its added value once you discover that backup performance is limited by network bandwidth.

To tackle network congestion at the target Data Domain host, the Data Domain appliance can be configured for NIC (link) aggregation, which will certainly help. Still, even with link aggregation deployed without any problems, any LAN has physical limitations that aggregation cannot bypass.

The Data Domain configuration options offer different aggregation protocols and hashing methods. It is important to mention that link aggregation is a point-to-point mechanism, not end-to-end. In other words, it aggregates the switch ports connected to the Data Domain NICs into a single virtual interface, but the clients are not aware of this mechanism. Details are available in the Data Domain OS administration guide.
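
To see why aggregation helps many clients more than a single one, consider how a hash-based distribution policy picks a physical link. The sketch below is a generic illustration in Python; the CRC32-over-address-pair hash, the addresses, and the function name `pick_link` are assumptions for this example and do not reproduce any specific Data Domain or switch hashing method.

```python
import zlib


def pick_link(src_ip: str, dst_ip: str, link_count: int) -> int:
    """Map a source/destination address pair to one member link of the bundle."""
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % link_count


if __name__ == "__main__":
    dd_host = "10.0.0.50"  # hypothetical address of the aggregated interface
    for client in ("10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"):
        print(client, "-> link", pick_link(client, dd_host, 4))
```

Under a policy like this, the same address pair always maps to the same member link, so one client's stream cannot exceed the bandwidth of a single link; the aggregate pays off only when many clients are spread across the links.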

If you do not favor backup to disk for any reason, e.g. it saturates your LAN links or strains your LAN infrastructure, you can use your hosts' SAN connectivity to connect to the DD virtual tape library (VTL). This allows data to travel on the SAN through Fibre Channel (FC) connectivity without any data transport overhead on the LAN, which is still used in this case but only for metadata transport to the backup server.

What if the above options are not enough? What if we have a tighter backup window and

need more optimizations?

DD Boost is the answer. As the size of data targeting the Data Domain host—sent over the wire

as plain data—scales up, duties increase for your LAN and pressures rise on your backup

system especially with more backup-to-disk clients and storage nodes added in your data

center.

In such situations, client-side deduplication or, in Data Domain parlance, Distributed Segment

Processing is your solution, as it enables identification of file chunks to take place on the

deduplication client side (the host that sends data to the Data Domain appliance). Thus, there is


no need to send all plain data on the network; only new chunks need to be sent to the Data

Domain host.

In other words, DD Boost, with its DSP feature working through the Data Domain API, ensures that the host sending data to the Data Domain appliance does not send redundant data over the network. Any DD Boost-enabled application computes the hashes of the chunks of data that it wants to send to the Data Domain host for storage, then asks the Data Domain host: do you already have those fingerprints (the identifier of each file chunk) in your catalogue? If so, the Data Domain host does not need to receive redundant data; it just creates the pointer. If not (this is a new data chunk), the chunk is compressed and then sent to the Data Domain host for storage.

Figure 3

In this way, data redundancy checking becomes a mutual effort between the deduplication client

(host sending data using DD Boost functionality) and the Data Domain host appliance, which

optimizes the network usage in exchange for a minimal CPU and memory penalty on the client

side.
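
The exchange described above can be sketched in a few lines of Python. This is a conceptual model only: `DedupTarget`, `client_backup`, and the fixed 4 KiB chunking are hypothetical names and simplifications chosen for the example, not the DD Boost API (the product relies on variable-length segmentation, which the text above notes is more efficient).

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # fixed-size chunks keep the example short


class DedupTarget:
    """Stand-in for the deduplicating storage host."""

    def __init__(self) -> None:
        self.catalogue: dict[str, bytes] = {}  # fingerprint -> compressed chunk
        self.files: dict[str, list[str]] = {}  # file name -> ordered fingerprints

    def missing(self, fingerprints: list[str]) -> set[str]:
        """Answer the client's question: which fingerprints are new to me?"""
        return {fp for fp in fingerprints if fp not in self.catalogue}

    def store_chunk(self, fingerprint: str, compressed: bytes) -> None:
        self.catalogue[fingerprint] = compressed

    def commit(self, name: str, recipe: list[str]) -> None:
        """Record the file as a list of pointers to stored chunks."""
        self.files[name] = list(recipe)

    def restore(self, name: str) -> bytes:
        return b"".join(zlib.decompress(self.catalogue[fp]) for fp in self.files[name])


def client_backup(target: DedupTarget, name: str, data: bytes) -> int:
    """Back up data; return the number of payload bytes sent over the 'network'."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
    wanted = target.missing(recipe)
    sent = 0
    for fingerprint, chunk in zip(recipe, chunks):
        if fingerprint in wanted:
            payload = zlib.compress(chunk)  # compress on the client side
            target.store_chunk(fingerprint, payload)
            wanted.discard(fingerprint)     # send each new chunk only once
            sent += len(payload)
    target.commit(name, recipe)
    return sent
```

In this model a repeat backup of unchanged data sends no chunk payload at all, only fingerprints, while the client pays for hashing and compression. That is the trade of network usage for a small CPU and memory cost described above.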

Projecting the above concept onto NetWorker operations, all that is needed is to convert NetWorker backup-to-disk devices into DD Boost-enabled devices. We then do not have to worry about which NAS protocol to use for network file access (DD Boost handles that part through its native NFS). Even device directory creation is managed through DD Boost, as NetWorker can talk directly to the Data Domain OS through the DD Boost API. Details on DD Boost device creation are found in the “NetWorker and Data Domain Devices Integration Guide”. Migration from old tape devices and backup-to-disk devices to DD Boost-optimized devices is discussed in Chapter 3 of the same document.


The bandwidth and time gains are quite astonishing, and NetWorker makes further use of the DD Boost API through the “client direct” configuration option (added in NetWorker 8.0), which enables backup clients to send their data “directly” to the Data Domain host instead of having to pass through their configured Storage Node. This optimizes network usage and accelerates backup execution, as the clients no longer send any plain data to their SNs over the wire. Though the SN is still used for metadata processing, it is not burdened with data storage, which increases the likelihood of backup success.

Figure 4

This is not the only gain from buying DD Boost, but it is the example we chose to introduce its utility. Two great gains arise from the fact that the backup application can talk to Data Domain OS and see the deduplication catalogue:

1. Clone Controlled Replication

2. Virtual Synthetic Full

How does DD Boost enhance cloning? Does that include cloning to all types of media?

Cloning is copying a saveset from one storage media to another. A common example is cloning

savesets from disk devices to tape devices for long term retention. Thus, the cloning operation


includes reading the saveset (similar to recovering it) and then writing it to another medium (similar to a backup).

DD Boost cloning applies only to Data Domain storage. If you are cloning your savesets between storage media that include anything other than Data Domain hosts, you will be running conventional cloning (recover the saveset from media A + write it to media B). However, if you are cloning your saveset between two Data Domain hosts, this is your chance to leverage DD Boost Clone Controlled Replication (CCR).

Figure 5

How does it work?

When both source and target storage pools are Data Domain devices, DD Boost saves backup system resources, i.e. CPU, memory, and network bandwidth, through the Managed File Replication (MFR) feature. How this happens is an interesting story. A conventional cloning operation reads the saveset that it wants to clone from DD Boost device A on Data Domain host A (like recovering it), then writes it to DD Boost device B on Data Domain host B. But this saveset is stored in the form of one or more files, so why not tell Data Domain host A to replicate those files to Data Domain host B? This saves the effort of reading the whole file (a rehydration operation) and writing it back (a dehydration operation) to a different host. The bandwidth that would be used to read the plain data saveset is also saved, because Data Domain


replication copies only the chunks missing (to construct that file) from source to destination. This

makes CCR a great candidate for cloning to the Disaster Recovery (DR) site, satisfying

legislative requirements to archive your backups offsite for financial auditing, corporate internal

auditing requirements, DR planning, and contingencies without the need to clone to tape.

Figure 6
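
The difference between conventional cloning and chunk-level replication can be illustrated with a toy model. In the Python sketch below, `DedupHost`, `replicate_file`, and `MediaDatabase` are hypothetical names invented for this example; they model the idea of copying only missing chunks and recording the copy with its own retention, not the actual MFR or NetWorker interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class DedupHost:
    name: str
    chunks: dict = field(default_factory=dict)  # fingerprint -> chunk bytes
    files: dict = field(default_factory=dict)   # file name -> ordered fingerprints


def replicate_file(source: DedupHost, target: DedupHost, file_name: str) -> int:
    """Copy one file host-to-host; return bytes of chunk data actually moved."""
    recipe = source.files[file_name]
    moved = 0
    for fingerprint in recipe:
        if fingerprint not in target.chunks:    # only missing chunks travel
            target.chunks[fingerprint] = source.chunks[fingerprint]
            moved += len(source.chunks[fingerprint])
    target.files[file_name] = list(recipe)      # pointers are rebuilt remotely
    return moved


@dataclass
class MediaDatabase:
    """What the backup server knows: every copy and its retention."""
    copies: list = field(default_factory=list)

    def record(self, saveset: str, host: str, retention_days: int) -> None:
        self.copies.append((saveset, host, retention_days))


if __name__ == "__main__":
    site_a = DedupHost("dd-a")
    site_a.chunks = {"c1": b"x" * 100, "c2": b"y" * 100, "c3": b"z" * 100}
    site_a.files = {"monday": ["c1", "c2"], "tuesday": ["c1", "c2", "c3"]}
    site_b = DedupHost("dd-b")
    db = MediaDatabase()

    for saveset in ("monday", "tuesday"):
        db.record(saveset, site_a.name, retention_days=30)
        moved = replicate_file(site_a, site_b, saveset)
        db.record(saveset, site_b.name, retention_days=365)  # clone kept longer
        print(saveset, "moved", moved, "bytes")
```

Because the backup server drives each copy in this model, it can record the clone alongside the original and give it a different retention, while the hosts exchange only the chunks the target lacks.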

What advantage does CCR have over the conventional Data Domain replication?

Quite a few. Data Domain conventional replication has three limitations:

1. The backup server will not be aware that replication took place, so manual intervention will be needed to create and mount the required device if recovery from the DR site is needed.

2. You should not use conventional replication with DD Boost devices (the replicated devices cannot be used as a source for further replication). For more information, see “Data Domain native replication considerations” in the “NetWorker and Data Domain Devices Integration Guide”.

3. There is no way to enforce a different retention on cloned savesets, as the backup server is not aware of the replication. Consequently, this cannot be used for long-term archiving.


Any other cloning enhancements?

It is important to mention that NetWorker 8.1 added a new enhancement to the cloning operation called “immediate cloning”. This enables a saveset to be cloned as soon as its backup is done, as opposed to group cloning, which runs a clone process for all savesets backed up during the group run, and scheduled cloning, which runs the clone process on a schedule independent of the backup run.

For more information on how to configure and run CCR clones, refer to the “NetWorker and Data Domain Devices Integration Guide”.

What is the Virtual Synthetic Full feature added in NetWorker version 8.1? How does it

leverage DD Boost for further optimizations of backup operation?

First, let’s define what a Synthetic Full (SF) backup is. Suppose that you need to do a full backup before rolling out a critical system patch or cumulative update, but you don’t have enough time in your backup window for a full backup. The alternatives are either to take the time needed out of production time (typically not an option) or to postpone the critical update. This is when SF comes to the rescue.

SF runs an incremental backup, then uses that incremental and the earlier incrementals back to the last full backup, together with that full backup, to construct a new full backup without actually running a full backup. Hence the name, Synthetic Full. Introduced in NetWorker version 8, SF is not supported for NDMP

backups. For a list of SF requirements consult KB article 168411:

https://support.emc.com/kb/169411. Also, more details can be found in the NetWorker

Administration guide.

Figure 7


Virtual Synthetic Full (VSF) backup is a new feature introduced in NetWorker 8.1 (it requires DD Boost 2.6 and DD OS 5.3 or higher). It is the same as a synthetic full backup, except that it is performed on a single Data Domain system (all full and incremental backups must reside on the same Data Domain host). Like Synthetic Full, VSF uses full and partial backups to create a new full backup. However, since the backups reside on a Data Domain system and use the new DD Boost APIs, the backup does not require saveset data to be sent over the wire (no need to read the savesets), resulting in improved performance over synthetic full and traditional backups.

What actually happens is this: since NetWorker is constructing a synthetic full from savesets stored on the Data Domain host, and since DD Boost allows NetWorker to see the file-chunk catalogue, NetWorker can use that catalogue to construct the new SF (or VSF, in this case) without having to read all the savesets off the Data Domain host and write the new saveset. For more details on VSF backup execution, refer to page 88 of the NetWorker Administration guide.
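
A minimal Python sketch of the idea follows. It assumes, purely for illustration, that each saveset is a mapping from file path to a list of chunk fingerprints and that an incremental marks deleted files with `None`; NetWorker's real saveset and catalogue formats are not described in this article. The point is that the new full is assembled from pointers alone, without touching chunk data.

```python
from typing import Optional

# Hypothetical representation: path -> chunk fingerprints (None = file deleted).
Saveset = dict[str, Optional[list[str]]]


def synthesize_full(full: Saveset, incrementals: list[Saveset]) -> Saveset:
    """Merge a full backup with later incrementals (oldest first) into a new full."""
    merged = dict(full)
    for incremental in incrementals:
        for path, recipe in incremental.items():
            if recipe is None:
                merged.pop(path, None)  # file was deleted since the last backup
            else:
                merged[path] = recipe   # newest version of the file wins
    return merged


if __name__ == "__main__":
    full = {"/etc/hosts": ["a1"], "/var/db": ["b1", "b2"], "/tmp/x": ["c1"]}
    monday = {"/var/db": ["b1", "b3"]}
    tuesday = {"/tmp/x": None, "/home/new": ["d1"]}
    print(synthesize_full(full, [monday, tuesday]))
```

Only fingerprint lists are copied here; the chunk `b1` shared by the old and new versions of `/var/db` is referenced again rather than read and rewritten.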

Do DD Boost and client direct work for module backups as well as for file system backups?

Yes. Consult your module documentation to confirm that your version has the proper support.

Of course, your client must have direct network access to the Data Domain host.

How many DD Boost devices can I configure on a SN?

You can configure as many as you need. Keep in mind that a single device can accommodate

60 sessions so there should be no problem sharing the same device through multiple SNs as

long as backups directed to this device are targeting the same pool.

Advantages/disadvantages of the new DD Boost over Fibre Channel (DFC)

The prerequisites are DD OS 5.3 or above, coupled with NetWorker 8.1 or above (the version that has DD Boost 2.6). DFC-enabled clients and SNs must be zoned to the target LUNs on the Data Domain host HBAs that represent the DD Boost devices. For the complete deployment procedure, refer to the DD OS Administration guide and the NetWorker 8.1 “NetWorker and Data Domain Devices Integration Guide”.

DFC is another way to work around an overloaded LAN during the backup window. DFC

enables DD Boost-enabled clients and SNs to access DD Boost devices through SAN,

minimizing bandwidth pressures over LAN to the Data Domain host during the busy backup


window. SAN-connected clients and SNs send the data to the DD Boost devices through the SAN, partially relieving the LAN for other tasks. Client Direct configuration still applies with DFC devices; clients are configurable for the connectivity type required (either IP or FC).

The only disadvantage is a minor performance penalty. When sending data over the SAN, performance has been found to be up to 20% lower than with DD Boost over LAN in the worst cases. It is worth mentioning that the client HBA settings (specifically the HBA queue depth) have an impact on performance, as mentioned in this DD article:

https://my.datadomain.com/download/kb/all/boostfc_client_qdepth.htm

DFC is currently available for Windows and Linux hosts only, but further platform support is on the way.

Part II

This is the solution proposal that we developed in answer to the backup requirements of an anonymous customer. Some figures and concepts shown below repeat material from Part 1, but we decided to present the document in full to preserve its integrity.

Journey to an optimized backup environment

EMC takes pride in helping to design and implement enterprise-class backup and recovery solutions for its customers, based on powerful, sustainable, world-class products. EMC invests in key infrastructure-related initiatives that deliver on a strategic long-term vision. With this vision in mind, we recently implemented a consolidation project for a large customer to leverage their current infrastructure and consolidate their data center with EMC solutions.

The design put forth aimed to solve these business challenges:

Cost Competitiveness

Highest Levels of Reliability

Ease of Management

High Performance

Compatibility


Cost Competitiveness

Being cost competitive is paramount in order to build and maintain business. Whether facing economic turmoil or economic boom times, we must ensure that the solutions we offer fit our customers’ budgets.

Highest Levels of Reliability

Offering cost-effective solutions is meaningless if the solution is plagued by outages. Backup

infrastructures must be capable of performing even after suffering multiple component failures.

Customer loyalty will be lost if we fail to meet our availability obligations. We strive to offer

products and services that remain operational 24 x Forever.

An added benefit of highly reliable systems is the cost savings realized the longer the systems

remain operational. Systems capable of remaining in production 7 or more years can yield

significant long-term savings and/or profit over those capable of running production workloads

for 3 to 5 years.

Ease of Management

Hand-in-hand with being cost-effective and reliable, systems need to be as automated and easy

to use as possible—being able to do more with less. The more complex the solution, the more

resources it takes to maintain and operate over its lifecycle driving overall cost up while driving

reliability down. Ensuring staffing levels remain stable in the face of unabated growth is

essential in cost containment and is the main reason ease of management remains a key

requirement.

High Performance

The overall solution must be capable of delivering during periods of high usage and must be

designed to eliminate congestion points. Delivering solutions that suffer from poor performance

frustrates customers and wastes precious time and resources tracking down and resolving

performance-related issues.

Compatibility

Gone are the days of implementing independent computing silos. It’s expensive and difficult to

maintain solutions designed in isolation. To meet aggressive growth objectives, we need to

ensure all of the systems being deployed are compatible and work with one another. Everything

needs to work together and scale in order to keep the overall solution as simple and

manageable as possible.


The Journey

The solution was designed to meet and exceed our customer’s expectations. We emphasized leveraging their current infrastructure and preserved the ability to upgrade in the future by relying on scalable solutions and powerful products that can support all aspects of the business.

Where we were

The customer has a complex environment; the data center has many topologies for performing

backups.

Current infrastructure

An analysis of the infrastructure uncovered points of possible improvement and the importance

of having a single backup tool for improving management and reducing administration time.

Existing infrastructure

4 backup servers (Data Zones) running 3 different backup products:

o HP Data Protector - main backup infrastructure

o Dell NetVault - NDMP backup for the DMZ and Fernord

o Symantec Backup Exec - Trenord site

o Symantec Backup Exec - infrastructure at the Iseo site

Backup repository: Fujitsu CS800 S2 (backup-to-disk storage)

Analysis based on data collected from EMC staff for ABC customer.

Legend for the table (which is written in Italian):

Ambiente = Environment

Ambito aziendale = Name of the company

Tipo di backup = Type of backup

Mezzo trasmisivo attuale = Current method of transmission

TB giorno = TB per day, Mese = Month, and Anno = Year

Points of improvement of the future infrastructure

• Shorten the backup of the Exchange infrastructure by using LAN-free backup mode.

• Shorten the backup of NetApp storage by increasing the number of drives used for backup.

• Shorten the backup of the Oracle database by using LAN-free backup mode.

• Reduce the backup window of the SAP infrastructure, where possible, by increasing the number of drives used by the server.

• Provide a single backup tool, giving a single point of management and a uniform management methodology. NetWorker was selected.

• Increase the performance and disk space of the VTL for the longer term, enough to support the performance improvements listed above. Data Domain was the candidate.

ABC backup architecture 1

[Diagram: City 1 site. Components shown include a NetWorker Server 8.1, a Data Domain system with DD Boost, vStore API proxies, FAS 2040 and FAS 3140 storage, a 10 GbE link, the TREN and FERN SANs, and the TREN, FERN, ISEO, and DMZ LANs. Protected clients include VMware (38 VMs), bare-metal servers, SAP DWH and SAP DWH development systems, 5 transactional SAP servers on AIX, an Oracle cluster, and Oracle on Windows. Agent types: VE, SAP, DB, MAIL, LAN-free, TSM EE.]

Figure 8

ABC backup architecture 2

[Diagram: City 2 site. Components shown include NetWorker Server 8.1, a 10 GbE link, and the FERN, TREN, ISEO, and DMZ LANs. Protected clients include VMware (38 VMs), Fernord and Trenord VMs, 5 SAP development and 5 SAP test systems, SAP DWH development, Exchange servers, SQL Server, an Oracle server, and Windows and Linux servers. Agent types: VE, SAP, DB, MAIL, LAN-free, TSM EE.]

Figure 9

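ABC backup architecture 3

[Diagram: City 3 site. Components shown include NetWorker Server 8.1, FAS2030 storage, the Novate SAN with 8x FC links, a 10 GbE link, the LAN connection to City 2, a Windows server, and bidirectional DD Boost replication over IP at 1 GbE. Agent types: VE, SAP, DB, MAIL, LAN-free, TSM EE.]

Figure 10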

Customer’s challenges vs. solutions

The customer faced many challenges in their environment, including:

• Load on the LAN

• Inefficient backups

• Low level of integration with the virtual environment

• Inability to perform a tech refresh on the fiber network

• Distributed management of the backup environment

Steps to the solution

NetWorker

The first phase was implementing NetWorker as management software to centralize the

customer’s backup, recovery, and archiving environment. The integration features of NetWorker

enabled us to integrate it with the major components of the backup & recovery environment

(databases and application servers).

New features we were able to use after implementing NetWorker 8.1 as a central platform for

backup and recovery included:

• Greater backup efficiency, spanning integration with EMC array snapshot management, further integration with Data Domain, and new support for Block-Based Backup for Windows systems

• Optimized support for VMware backup and recovery, built on a new underlying VMware backup engine

• Enhanced NetWorker management on several fronts, along with expanded support for enterprise applications and new features that maximize efficiency

Snapshot management

The customer wished to simplify snapshot management and remove overhead components by integrating the solutions. We used the integrated snapshot management feature, which eliminated the need for a separate proxy server to move the snapshots. The administrator can now use the NetWorker Storage Node as the proxy in the workflow.

Use of snapshots as part of an overall data protection strategy not only enables fast operational

backup and recovery, but also allows backup to disk or tape to happen offline without impact to

the mission critical application server. This process is often referred to as “Live Backup”. Tapes

can be created and sent offsite for disaster recovery purposes. At any time, recovery can be

accomplished from a snapshot or from disk or tape as needed.

NetWorker Snapshot Management will catalog all snapshot activities, enabling quick search and

recovery for restore purposes. NetWorker software provides lifecycle policies for snapshot save

sets. Snapshot policies specify the following:

• Time interval between snapshots

• Maximum number of snapshots retained, above which the older snapshots are

recycled

• Which snapshots will be backed up to traditional storage

• Type of snapshot that will be created

• Expiration policy of the snapshot

• Number of active snapshots that will be retained on the storage array
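To make the retention rule in the list above concrete, here is a minimal sketch of how a maximum-retained count and a rollover rule could be evaluated. It is not NetWorker code: the `SnapshotPolicy` class, its field names, and the "every Nth snapshot is rolled over" rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SnapshotPolicy:
    """Hypothetical policy record; field names are illustrative only."""
    interval: timedelta   # time between snapshots
    max_retained: int     # snapshots kept on the array before recycling
    rollover_every: int   # assumed rule: every Nth snapshot goes to traditional storage


def snapshots_to_recycle(taken_at: list[datetime], policy: SnapshotPolicy) -> list[datetime]:
    """Return the oldest snapshots that exceed the retention count."""
    ordered = sorted(taken_at)
    excess = len(ordered) - policy.max_retained
    return ordered[:excess] if excess > 0 else []


def is_rollover(sequence_number: int, policy: SnapshotPolicy) -> bool:
    """True when the snapshot with this 1-based sequence number is rolled over."""
    return sequence_number % policy.rollover_every == 0


policy = SnapshotPolicy(interval=timedelta(hours=6), max_retained=4, rollover_every=4)
start = datetime(2014, 1, 1)
history = [start + i * policy.interval for i in range(6)]  # six snapshots, 6 hours apart
recycled = snapshots_to_recycle(history, policy)           # the two oldest snapshots
```

With six snapshots and a limit of four, the two oldest are recycled, which is the behavior the "maximum number of snapshots retained" item describes.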

Snapshots for DB2, Oracle, and SAP are also managed via the NetWorker Snapshot

Management feature. Configuration Wizard support for these applications will be added in a

later release.

NetWorker Snapshot Management operations for each NetWorker client can be monitored

through NMC reporting features. Monitored operations cover snapshots that are successfully

created or in progress, as well as snapshots that are mounted, in the process of being rolled

over, and deleted. Reports include details of licensed capacities consumed. NMC also provides

a detailed log of snapshot operations.

Snapshot Management is included with a NetWorker capacity-based license.

The Client Configuration Wizard for the NetWorker Snapshot Management feature enables

automatic discovery of the environment that has been configured for snapshots by the Storage

Administrator. The Wizard accommodates the common NetWorker Snapshot Management

workflows associated with snapshot and rollover configurations.

Snapshot validation will verify whether a backup as configured by the Wizard is likely to be

successful.

Simplify the process

No scripting is required. The Configuration Wizard will ensure that the proper commands are

executed for the associated snapshot operations, that the LUNs are paired appropriately, and

that all NetWorker resources are properly assigned. Basically the Wizard will take care of

configuring, end-to-end, the client snapshot/rollover policy.

Data Domain

DD Boost inside

Thanks to its ease of management and its deeper integration with NetWorker, Data Domain enabled us to eliminate the need to tape out via the new Data Domain Boost over Fibre Channel feature.

Support for the Fibre Channel protocol has now been added to DD Boost and NetWorker 8.1

leverages it for customers who have standardized on Fibre Channel as their backup protocol of

choice. This support not only optimizes the customers’ existing investment in their Fibre

Channel infrastructure, but with DD Boost client-side deduplication, the customer can now enjoy

50% faster backups over their traditional VTL-based model and 2.5x faster recovery.

Data Domain systems reduce the bandwidth required on the network, as well as the disk

capacity required. Since this support offers both client-side deduplication and support of the

Fibre Channel protocol using a backup-to-disk workflow, the old VTL ‘tape-based’ management

can be eliminated. This results in greater reliability and less complexity. This support also brings to Fibre Channel all the features that Data Domain and DD Boost offer, including virtual synthetic full backups, clone-controlled replication, global deduplication, and more.

DD Boost over Fibre Channel is supported for Windows and Linux environments.

When performing full backups previously, all data had to be sent from the backup server to the

Data Domain system. With DD Boost, only unique data is sent from the backup server or the

client to the DD system. This means up to 99% less data to be moved across the already

loaded network, even for full backup.

This enabled us to use the current LAN/SAN infrastructure resources more efficiently. In fact, when DD Boost can be leveraged at the client level (EMC NetWorker, Avamar®, and Oracle RMAN), this bandwidth advantage spans the entire backup path, all the way from the client to the Data Domain system.
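The idea of sending only unique data can be illustrated with a small sketch. This is not the DD Boost protocol or API: the fixed 8-byte segments, the SHA-256 fingerprints, and the `DedupTarget` class are simplifying assumptions chosen to keep the example readable.

```python
import hashlib

SEGMENT_SIZE = 8  # bytes; unrealistically small so the example is easy to follow


def segments(data: bytes, size: int = SEGMENT_SIZE):
    """Split data into fixed-size segments (an assumption made for simplicity)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class DedupTarget:
    """Stand-in for the storage system: keeps one copy of each segment."""

    def __init__(self):
        self.store = {}

    def missing(self, fingerprints):
        """Tell the client which fingerprints are not stored yet."""
        return {fp for fp in fingerprints if fp not in self.store}

    def write(self, fingerprint, segment):
        self.store[fingerprint] = segment


def backup(data: bytes, target: DedupTarget) -> int:
    """Send only segments the target lacks; return the bytes that crossed the network."""
    fingerprinted = [(hashlib.sha256(s).hexdigest(), s) for s in segments(data)]
    needed = target.missing(fp for fp, _ in fingerprinted)
    sent = 0
    for fp, seg in fingerprinted:
        if fp in needed:
            target.write(fp, seg)
            needed.discard(fp)  # do not resend a duplicate within the same backup
            sent += len(seg)
    return sent


target = DedupTarget()
first_full = backup(b"AAAAAAAABBBBBBBBCCCCCCCC", target)   # every segment is new
second_full = backup(b"AAAAAAAABBBBBBBBDDDDDDDD", target)  # only the last segment changed
```

In this toy run the second full backup moves one segment instead of three. The client spends some CPU on segmenting and fingerprinting, but far less data crosses the network.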

Figure 11

Our backup environment was experiencing bandwidth choking during full backups. DD Boost provided significant performance improvements and helped avoid infrastructure upgrades.

Figure 12

Reduce the workload on the backup servers

We were restricted from adding new components to the current environment, so the solution we proposed needed to use some components of the backup servers already in use.

Though we expected that moving some of the deduplication work from the Data Domain system to the backup server would negatively impact the backup server, the good news was that this was not the case. This might seem counterintuitive but, as it turns out, sending less data significantly

reduces the load on the server. In other words, it takes fewer CPU cycles to assist with two steps of the deduplication process than it takes to push full backups over Ethernet.

Virtual Synthetics

Figure 13

Virtual Synthetic Full backups are an out-of-the-box integration with NetWorker, making it ‘self-aware.’ Because our customer now uses a Data Domain system as the backup target, NetWorker uses Virtual Synthetic Full backups as the default workflow whenever a synthetic full backup is scheduled, thus optimizing incremental backups for file systems.

Virtual synthetics reduce the processing overhead associated with traditional synthetic full

backups by using metadata on the Data Domain system to synthesize a full backup without

moving data across the network. Unlike other vendors, no Storage Node/Media server is

required, and there is no rehydration during the recovery.

In this workflow, a full backup is sent to Data Domain, taking full advantage of Data Domain value-add features, namely DD Boost. Incremental backups then run daily, as usual. When the next full backup is due, instead of initiating a new full backup, NetWorker runs another incremental backup and then creates a Virtual Full.

In a Virtual Synthetic Full backup, NetWorker sends commands to the Data Domain system specifying which regions are required to create a full backup, but no data is transferred over the network. Instead, the regions of the full backup are synthesized from the previous full and incremental backups already on the system by using pointers. This process eliminates the data that needs to be

gathered from the file server, reducing system overhead, time to complete the process, and

required network bandwidth.
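The pointer-based synthesis described above can be sketched as follows. This is a conceptual model only, not how the Data Domain file system is implemented: a backup is represented as a dictionary mapping a region number to a reference to a segment already on the target, and the names `synthesize_full`, `full_week1`, and the `seg-*` references are hypothetical. The sketch also ignores deleted regions.

```python
def synthesize_full(previous_full: dict[int, str],
                    incrementals: list[dict[int, str]]) -> dict[int, str]:
    """Build a new full as a map of region -> segment reference.

    Only references (pointers) are copied; no segment data is read or moved.
    Incrementals are applied oldest first, so the newest reference wins.
    """
    new_full = dict(previous_full)
    for incremental in incrementals:
        new_full.update(incremental)
    return new_full


# Region number -> reference to a segment already stored on the target.
full_week1 = {0: "seg-a", 1: "seg-b", 2: "seg-c"}
monday = {1: "seg-d"}                # region 1 changed
tuesday = {2: "seg-e", 3: "seg-f"}   # region 2 changed, region 3 is new

full_week2 = synthesize_full(full_week1, [monday, tuesday])
```

The resulting `full_week2` references four regions, yet nothing was re-read from the client, which is why the operation avoids both the network and the file server.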

This workflow is repeated over the following weeks, with a new traditional full backup

recommended only after every 8-10 Virtual Full backups have been completed. Therefore, the

use of Virtual Synthetic Full backups also reduces the number of traditional full backups from 52

to 6 per year – up to 90% reduction in full backups annually.
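The figures above can be checked with a short calculation. The sketch assumes a weekly full-backup schedule (52 per year) and that each traditional full is followed by a fixed number of weekly Virtual Fulls; the function names are ours.

```python
import math

WEEKS_PER_YEAR = 52


def traditional_fulls_per_year(virtual_fulls_between: int) -> int:
    """Traditional fulls needed when each one is followed by N weekly Virtual Fulls."""
    cycle_weeks = virtual_fulls_between + 1
    return math.ceil(WEEKS_PER_YEAR / cycle_weeks)


def reduction(virtual_fulls_between: int) -> float:
    """Fractional drop in traditional fulls compared with one per week."""
    return 1 - traditional_fulls_per_year(virtual_fulls_between) / WEEKS_PER_YEAR


fulls_at_8 = traditional_fulls_per_year(8)    # 9-week cycle
fulls_at_10 = traditional_fulls_per_year(10)  # 11-week cycle
```

With 8 Virtual Fulls per cycle this gives 6 traditional fulls per year, a reduction of about 88%. With 10 per cycle it gives 5, a reduction of about 90%, consistent with the "up to 90%" figure.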

Avamar

As illustrated in the initial diagram of the customer’s environment, the customer needed to add two remote sites to support the business, and these sites also had to be backed up. We looked for the solution that could best support this new structure without requiring major changes to the network design.

We suggested using the capabilities of DD Boost under the umbrella of NetWorker.

Here is the design we proposed to support all the data center activities.

Figure 14

Our proposal offers the maximum benefits of the marriage between Avamar and Data Domain,

where Avamar clients send the data directly to the Data Domain system. Specifically, this

integration will provide the Data Domain system scalability and performance advantages for the

most challenging backup workloads including VMware image backups, NDMP, FS, and

enterprise applications, such as Oracle, DB2, MS Exchange, MS SQL databases, and MS SharePoint. This greatly optimizes LAN bandwidth and multiplies the advantage of distributing some of the deduplication effort to hundreds of clients, improving the performance of the overall backup cycle.

Additionally, DD Boost supports Avamar instant access to the virtual machines stored on the

Data Domain system, which is of great benefit during system restores.

Backing up Oracle and SAP databases

As referenced at the beginning of the article, the customer has Oracle DB and SAP applications. While developing the design, we remained sensitive to the considerations specific to these applications. We asked the DBAs how they preferred to back up their data and found that they preferred to perform a full backup every day, and sometimes more than one full backup each day, depending on the criticality of the database.

The challenge here is whether the system and the current infrastructure can support such a workload. Under normal conditions, the backup window for a full backup can exceed 12 hours, and data growth over time will lengthen it further.

In testing how much DD Boost could reduce this issue, we found that the backup window for full backups can be reduced to 8 hours. It also gives DBAs the ability to administer everything through RMAN, eliminating the need to rely on the backup administrators. Additionally, this enables DBAs to have a full RMAN catalog of both the local and DR sites.

Figure 15

“Virtualized Environments”

VMware

Optimizing VMware Backup and Recovery

Integrating with the industry-leading Avamar technology for backup of VMware environments is a major feature of EMC NetWorker. VMware has chosen Avamar technology to power its recently announced vSphere Data Protection (VDP) and vSphere Data Protection Advanced (VDP-A) support. Now, that same technology has been leveraged in NetWorker, enabling Changed Block Tracking technology for both backup and recovery of data, as well as a multi-streaming centralized proxy that load-balances jobs between proxy servers for increased VM backup performance, and many other features. Because the changed blocks are combined with the unchanged blocks already on the backup target, every backup is essentially a full backup.
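The sketch below shows, in a simplified form, why sending only changed blocks can still yield a complete image on every run. It does not use the VMware or NetWorker APIs: `TrackedDisk` and `apply_backup` are hypothetical names, and the disk is modeled as a short list of blocks.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""

    def __init__(self, block_count: int):
        self.blocks = [b""] * block_count
        self.changed = set(range(block_count))  # the first backup must send everything

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.changed.add(index)

    def take_changed_blocks(self) -> dict[int, bytes]:
        """Return the changed blocks and reset tracking for the next backup."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


def apply_backup(previous_image: dict[int, bytes],
                 delta: dict[int, bytes]) -> dict[int, bytes]:
    """Merge changed blocks into the previous image; the result is a complete image."""
    image = dict(previous_image)
    image.update(delta)
    return image


disk = TrackedDisk(4)
for i in range(4):
    disk.write(i, b"v1")
first_image = apply_backup({}, disk.take_changed_blocks())   # all four blocks sent

disk.write(2, b"v2")
delta = disk.take_changed_blocks()                            # only block 2 is sent
second_image = apply_backup(first_image, delta)               # still a full image
```

The second backup transfers a single block, but the image recorded on the target contains all four, so a restore never has to replay a chain of incrementals.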

NetWorker uses a software-based VMware Backup Appliance (VBA). The VBA stores the

metadata, sending changed blocks during the backup workflow to a Data Domain System

target. This support is specific to, and optimized by, Data Domain. Therefore, the customer

enjoys all the features and value from a Data Domain solution including DD Boost support,

clone to tape for retention and compliance, and global deduplication, to name a few. Each VBA

is capable of protecting hundreds of virtual machines ensuring protection for the largest virtual

environments.

Now our customer has the option to clone from Data Domain to tape or other external media for

extended retention and compliance purposes.

In-guest protection is enabled by NetWorker Modules for application consistency. The new VMware engine can coexist with the VMware support in NetWorker 8.0 and earlier, primarily for customers who continue to have a requirement to back up directly to tape using physical proxies.

Managing VMware backup and recovery

Through direct integration with VMware vCenter, we offered a collaborative approach to backup

management that empowers the VMware Administrator to manage their own backups, while the

Backup (NetWorker) Administrator maintains visibility and control of corporate SLAs through

policy-setting, monitoring, and reporting.

Both VMware and Backup Administrators are empowered with visibility and control of the

environment. Protection is based on policies, as defined by the Backup Administrator, and

selected for each virtual machine, or group of virtual machines, by the VMware Administrator.

Virtual machines are auto-discovered and automatically protected based on the policies

assigned to the group where they are created.
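As a rough illustration of group-based protection, the sketch below resolves a policy for each discovered virtual machine from the group it belongs to. The group names, policy names, and functions are all hypothetical; the real assignment is done through vCenter and NetWorker, not through code like this.

```python
from typing import Optional


def resolve_policies(vm_groups: dict[str, str],
                     group_policies: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each VM to the policy of its group (None if the group has no policy)."""
    return {vm: group_policies.get(group) for vm, group in vm_groups.items()}


def unprotected(assignments: dict[str, Optional[str]]) -> list[str]:
    """List VMs that ended up without a policy, for the Backup Administrator to review."""
    return sorted(vm for vm, policy in assignments.items() if policy is None)


# Policies are defined by the Backup Administrator, one per group (names are made up).
group_policies = {"Production": "daily-30d", "Test": "weekly-14d"}

# Discovered VMs and the group each one was created in.
vm_groups = {"sap-prod-01": "Production", "sql-test-02": "Test", "scratch-03": "Sandbox"}

assignments = resolve_policies(vm_groups, group_policies)
needs_review = unprotected(assignments)
```

A new VM created in the Production group picks up that group's policy with no extra step, while a VM in a group without a policy is flagged rather than silently skipped.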

Both image and file level recovery are supported. Since this feature support is enabled by

integration with VMware vCenter, management is virtual-centric, with information on the

VMware environment presented as VM groups and folders. File level recovery is supported for

both Windows and Linux.

Enhanced Management and Enterprise Applications Support

Figure 16

The last thing we offered our customer was the new NetWorker plugin that provides a panoramic view of the backup environment. At no additional cost, we introduced the new EMC Backup and Recovery Manager, an intuitive management interface for monitoring and reporting on NetWorker and Avamar through a single pane of glass. While primarily used for NetWorker

and Avamar, it will also support monitoring of Data Domain Systems from the backup

administrator’s perspective. Operators and administrators can monitor alerts, activities, and

systems. It also monitors events, which are informational messages, useful for troubleshooting

and auditing. Reporting features enable customers to confirm that client systems are being

properly protected and also track system usage and capacity. It offers a dashboard approach

providing all key information on a single screen, including alerts and warnings. Other key

usability features include filters, grouping, search, and color-coded tracking for system capacity.

New core NetWorker features focus on management simplicity and usability. They include an integrated, wizard-based recovery graphical user interface available directly from the NetWorker Management Console. This GUI walks the Administrator through every step of the recovery process, including recovery of snapshots, file systems, and the new Block Based Backups. It enables a recovery operation to be scheduled and can also perform multiple recovery operations at once.

The NetWorker server DR process has been simplified and replaces the previous manual multi-step process used when a NetWorker server goes down. Features include self-awareness: if the bootstrap save set ID is unknown, the system will initiate a scanner process. The Backup Administrator is now stepped through the recovery process without having to follow a complicated manual.

The command line wizard program automates the recovery of the NetWorker server’s media

database, resource files, and client file indexes. The administrator can choose to recover just

the media database, the resource files, the client file indexes – or all of the above.

Appendix

1- http://nsrd.moab.be/2013/07/12/networker-8-1-countdown-2/

2- Why EMC DD series doc number h11755

3- V to the MAX by John Bowling- knowledge sharing article 2012

Biography

Mohamed Sohail

Mohamed has over 9 years of IT experience in operations, implementation, and support; 4 of them with EMC. Mohamed previously worked as a Technical Support Engineer at Oracle Egypt and was a Technical Trainer at Microsoft. Mohamed holds a B.Sc. in Computer Science from Sapienza University of Rome, Italy, and a B.A. in Italian from Ain Shams University, Egypt. Mohamed holds the EMC Proven Professional Backup Recovery certification.

Shareef Bassiouny

A Backup Recovery NetWorker Specialist, Shareef is a Technical Support engineer in the GTS

Organization at EMC. Shareef has over 12 years of experience in IT operations,

implementation, and support; more than 3 of those spent with EMC NetWorker support. Shareef

holds a B.Sc. in Telecommunication Engineering from Cairo University. His previous role was

leading a Dedicated IT Customer Support Desk that handled Data Center Operation and

Change Management at Orange Business Services.

Giovanni Gobbo

With 20 years of experience in the IT field ranging from Microsoft, Linux, Unix, VMware,

Storage, and Backup environments, Giovanni has solid hands-on experience implementing,

managing, and planning physical and virtualized computer infrastructure.

Giovanni has worked for Atlantica, Terasystem, Getronics, and Olivetti.

EMC believes the information in this publication is accurate as of its publication date. The

information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION

MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO

THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED

WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an

applicable software license.