deduplication and single instance storage

Upload: interop

Post on 18-Nov-2014

Page 1: Deduplication and single instance storage

© Copyright 2009-2010, Cambridge Computer Services, Inc. – All Rights Reserved

www.CambridgeComputer.com – 781-250-3000

Presented by:

Deduplication and Single Instance Storage

Practical Applications for Backups, Archiving, and Primary Storage

Jacob Farmer Cambridge Computer

Page 2: Deduplication and single instance storage

About Your Lecturer

Jacob Farmer, CTO, Cambridge Computer

• Cambridge Computer, founded in 1991, provides training, integration, sales, and consulting in the fields of storage management, data protection, and digital archiving.

Been working in data protection and storage management for almost 20 years.

• Lecturer on storage technologies for Usenix for the past 10 years.

Hybrid of industry analyst and consultant to end-users.

• Spend 25% of my time working in the industry, going to conferences, meeting with vendors.

• 75% of my time customer-facing, helping the sales and services departments design solutions for end users.

Email: [email protected]

Deduplication and Single Instance Storage – Interop – Las Vegas – April 27, 2010

© Copyright 2009-2010, Cambridge Computer Services, Inc. All rights reserved.

Page 3: Deduplication and single instance storage

Follow Me on Twitter

My personal activities:

•@JacobAFarmer

–Note the “A” – my middle initial

My educational activities

•@Cambridge_EDU

Page 4: Deduplication and single instance storage

Agenda / Topics

Dedupe basics

• What is it, how does it work, and what is all the fuss about?

• Hashing, segmenting, indexing, etc.

Dedupe for backup systems

• Basic benefits

• Different approaches for scaling backups and how they relate back to dedupe

– Front end bottlenecks

– Backup data-movers

– Back-end bottlenecks and scalable deduping

Dedupe for primary storage

• Virtual servers, physical servers, VDI

• Rich media dedupe

WAN Accelerators

Questions as time permits

Page 5: Deduplication and single instance storage

What is Deduplication?

A term that refers to a number of different methods and techniques for reducing multiple instances of identical data down to a single (or at least fewer) instances.

• Common data is replaced with pointers or tokens that refer back to the actual data.
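
The pointer-or-token idea can be sketched in a few lines. This is a toy illustration only, not any vendor's design: the class name `SingleInstanceStore`, the use of SHA-256 as the token, and the file paths are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: each unique blob is kept exactly once."""

    def __init__(self):
        self.blobs = {}   # token (hex digest) -> the actual bytes
        self.names = {}   # logical name -> token pointing at the bytes

    def put(self, name, data):
        token = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(token, data)   # store the bytes only the first time
        self.names[name] = token             # every later copy is just a pointer
        return token

    def get(self, name):
        return self.blobs[self.names[name]]


store = SingleInstanceStore()
attachment = b"quarterly-report" * 1000
for user in ("alice", "bob", "carol"):
    store.put(f"/home/{user}/report.doc", attachment)

logical_bytes = sum(len(store.get(name)) for name in store.names)
physical_bytes = sum(len(blob) for blob in store.blobs.values())
print(logical_bytes, physical_bytes)
```

Three logical files share one physical copy; reads follow the token back to the stored bytes.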

Other terms for deduplication

• Data Reduction

• Commonality Factoring

• Capacity Optimization

• Single Instancing or Single-Instance Storage (SIS)

Page 6: Deduplication and single instance storage

Is Deduplication a Form of Compression?

Yes, and No.

YES – Deduplication results in data taking up less storage space or consuming less bandwidth on a network circuit.

• Note that dedupe is often used in conjunction with conventional compression.

NO – Deduplication could work on data types that are not compressible.

• If you have 10 identical JPEG files stored in an uncompressible format, they could be reduced to a single instance, thus freeing up 90% of your capacity.
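
A small sketch of the distinction, under stated assumptions: seeded pseudo-random bytes stand in for already-compressed JPEG data, and `zlib` stands in for "conventional compression". Neither is taken from the talk.

```python
import random
import zlib

rng = random.Random(0)
# Stand-in for one JPEG: random bytes behave like already-compressed data.
photo = bytes(rng.getrandbits(8) for _ in range(100_000))
copies = [photo] * 10                      # ten identical files

# Conventional compression gains essentially nothing on a single copy.
compressed_len = len(zlib.compress(photo))

# Single-instancing keeps one copy of the ten.
logical = sum(len(c) for c in copies)
physical = sum(len(c) for c in set(copies))
saved = (logical - physical) / logical

print(compressed_len, len(photo), saved)
```

The compressor cannot shrink the file, yet keeping one instance out of ten frees 90% of the capacity, which is the slide's point.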

Page 7: Deduplication and single instance storage

Where Do You Find Dedupe Solutions?

Deduplication solutions come to market wherever cost savings or efficiencies can be achieved by eliminating redundancy.

• Backups

– Conventional backup systems generate tons of redundant data

• Email systems (at rest and in flight)

– I send an email with the same attachment to everyone in the company.

– Then everyone stores it in his/her personal home directory

– Everyone in the branch offices pulls it over the WAN

• File traffic over a WAN

• Application and O.S. binaries across multiple systems

– Virtual Servers and Virtual Desktops

– Backups over a WAN

• Very large collections of rich media files

Page 8: Deduplication and single instance storage

Hashing / Fingerprinting

Hashing (aka fingerprints, digests, signatures)

• Generates an effectively unique number (typically 128 to 256 bits) based on content

• Hash acts as a proxy for content

• Given a hash, not computationally feasible to generate content

Common Hashing Algorithms

• MD5

• SHA-1

• SHA-256

• AES (strictly a block cipher rather than a hash function)

Hash Size and the Birthday Paradox

• The size of the hash needs to be suitable to the task at hand
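
To make "suitable to the task at hand" concrete, here is a hedged sketch using Python's standard `hashlib` plus the usual birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2^(bits+1)). The workload (1 PB of unique data in 8 KB segments) is a made-up figure for illustration, and the estimate covers accidental collisions only; MD5 and SHA-1 are also known to be vulnerable to deliberately constructed collisions.

```python
import hashlib
import math


def fingerprint(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest that stands in for the content."""
    return hashlib.new(algorithm, data).hexdigest()


def collision_probability(num_segments: int, hash_bits: int) -> float:
    """Birthday-bound estimate of at least one accidental collision."""
    exponent = -num_segments * (num_segments - 1) / 2.0 ** (hash_bits + 1)
    return -math.expm1(exponent)   # 1 - exp(exponent), accurate for tiny values


a = fingerprint(b"The quick brown fox")
b = fingerprint(b"The quick brown fox")
c = fingerprint(b"The quick brown horse")
print(a == b, a == c)

# Hypothetical workload: 1 PB of unique data cut into 8 KB segments.
segments = 10**15 // (8 * 1024)
for name, bits in (("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)):
    print(name, bits, collision_probability(segments, bits))
```

Identical content always yields the same fingerprint, and each extra bit of hash roughly halves the estimated collision probability for a given number of segments.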

Page 9: Deduplication and single instance storage

Hash Collisions - Are they real?

[Figure: events placed along a probability scale marked 10^-10, 10^-20, and 10^-30. Events shown: hit by lightning; win the lottery; Fibre Channel bit error rate; simultaneous triple disk fault on RAID-6; Cretaceous extinction meteor hitting in the next second; cryptographic hash collision.]

Page 10: Deduplication and single instance storage

What Makes Deduplication and SIS Technology Difficult to Engineer?

Hash Processing

• Modern CPUs make this much easier

– 100+ MB/sec/core

• Hardware co-processor cards can hash at rates north of 1.5 GB/sec.

Disk performance

• Deduped data often ends up getting fragmented on disk

– This can hurt performance especially for backup systems

Alignment of de-dupe segments

Indexing

Page 11: Deduplication and single instance storage

Indexing Can Be Hard

[Figure: log-log chart of # lookups/sec (10^0 to 10^6) against # records (10^1 to 10^9). Regions: human lookup rates; software database technology; purpose-built hardware. Plotted examples: iPod; NYC phonebook; router; large database; fine-grained content tracking.]

Page 12: Deduplication and single instance storage

Parsing / Segmenting / Chunking

Data needs to be “chopped up” in a consistent way in order to get optimal dedupe ratios

Without any kind of special segmenting strategies, backup streams and complex file types do not dedupe effectively

Large files are almost always changed with overstrike semantics

• Databases, structured data, .vmdk, .pst files

Small files are almost always changed with insert semantics

• Office apps, editors, etc.

If there are large files (e.g. database tables, virtual machine images) in the backup mix, their treatment usually will dominate any data reduction strategy.

• Don’t sweat the small stuff!

Different vendors may have strengths with one type or another
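
One way to chop data "in a consistent way" is content-defined chunking, where cut points depend on the bytes themselves rather than on fixed offsets. The sketch below is an assumption-laden illustration, not a description of any product: `WINDOW` and `DIVISOR` are arbitrary, and it recomputes SHA-1 over every window for clarity where a real implementation would more likely use a cheap rolling hash.

```python
import hashlib
import random

WINDOW = 16    # bytes examined at each position (assumed)
DIVISOR = 64   # gives an average chunk length of very roughly 64 bytes (assumed)


def content_defined_chunks(data: bytes) -> list:
    """Cut after any position whose trailing WINDOW bytes hash to 0 mod DIVISOR."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        digest = hashlib.sha1(data[end - WINDOW:end]).digest()
        if int.from_bytes(digest[:4], "big") % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])   # trailing chunk with no cut point
    return chunks


rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(4000))
edited = b"INSERTED HEADER " + original   # insert at the front shifts every byte

orig_chunks = content_defined_chunks(original)
new_chunks = content_defined_chunks(edited)
reused = sum(chunk in set(orig_chunks) for chunk in new_chunks)
print(len(orig_chunks), len(new_chunks), reused)
```

Because each cut depends only on the preceding `WINDOW` bytes, every original chunk after the first reappears unchanged in the edited stream even though all the offsets moved.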

Page 13: Deduplication and single instance storage

Change Types: Insert v. Overstrike

Insert:

The quick brown fox jumped over the lazy dog.

The quick brown horse jumped over the lazy dog.

• Identical data (may be) misaligned

Overstrike:

[Diagram: employee records Joe and Sue, before and after “Fred” is added to the employee database]

• Identical data doesn’t move
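
The two change types can be compared with naive fixed-size segments. This is a sketch with assumptions: the 8-byte segment size is unrealistically small so the result is easy to read, and the overstrike case assumes "Fred" overwrites a blank fixed-width record slot, a detail the slide does not specify.

```python
import hashlib

CHUNK = 8  # fixed segment size in bytes (assumed, tiny for readability)


def fixed_chunks(data: bytes) -> list:
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the new version's segments whose hash already exists in the old one."""
    old_hashes = {hashlib.sha256(c).digest() for c in fixed_chunks(old)}
    new_chunks = fixed_chunks(new)
    hits = sum(hashlib.sha256(c).digest() in old_hashes for c in new_chunks)
    return hits / len(new_chunks)


# Insert: "horse" is two bytes longer than "fox", so everything after it shifts.
insert_old = b"The quick brown fox jumped over the lazy dog."
insert_new = b"The quick brown horse jumped over the lazy dog."

# Overstrike: fixed-width records; a blank slot is overwritten in place.
overstrike_old = b"Joe".ljust(CHUNK) + b"Sue".ljust(CHUNK) + b"".ljust(CHUNK)
overstrike_new = b"Joe".ljust(CHUNK) + b"Sue".ljust(CHUNK) + b"Fred".ljust(CHUNK)

print(shared_fraction(insert_old, insert_new))          # 2 of 6 segments match
print(shared_fraction(overstrike_old, overstrike_new))  # 2 of 3 segments match
```

After the insert, only the segments before the edit still match, even though most of the sentence is unchanged; after the overstrike, every untouched record still matches.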

Page 14: Deduplication and single instance storage

NetBackup OST (open storage option)

API and framework that makes it very easy for a dedupe target device vendor to parse the data stream.

• Pre-segments content

• Enables more efficient dedupe solutions

• Allows for smart copy between systems of only changed data

Page 15: Deduplication and single instance storage

Deduplication for Backups

Page 16: Deduplication and single instance storage

Backup Systems Have a Lot of Redundant Data

Conventional backup solutions generate a ton of redundant data

• Assuming weekly full backups, a file that has not changed in 5 years still gets backed up 260 times!

• Assuming daily full backups of email, a message you received 5 years ago gets backed up 1825 times.

– Similarly, a record in a database from 5 years ago might be backed up 1825 times!
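
The counts above are simple multiplication, shown here for completeness. The helper name `times_backed_up` is made up for the example, and leap days are ignored, as they are on the slide.

```python
def times_backed_up(years: int, fulls_per_year: int) -> int:
    """Number of full backups that include an item that never changes."""
    return years * fulls_per_year


weekly = times_backed_up(5, 52)    # weekly fulls for 5 years
daily = times_backed_up(5, 365)    # daily fulls for 5 years
print(weekly, daily)
```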

There are really two problems to solve:

• Minimizing the amount of redundant data that gets repeatedly transferred

• Minimizing the amount of redundant data that gets stored.

Page 17: Deduplication and single instance storage

Most of the Buzz on Dedupe is from Backup Target Vendors

A “target” is a backup storage device

Dedupe disk targets generally come packaged as

• NAS

– File server (NFS or CIFS) interface

• Virtual tape library

– A disk device that emulates a tape library

– Fibre Channel or iSCSI interface

– NAS vs VTL outside the scope of this talk

Page 18: Deduplication and single instance storage

When Do Dedupe Disk Targets Shine?

When you are backing up a lot of redundant data

• Files that never or seldom change between backups

• Duplicated files

• Databases and email repositories that are receptive to commonality factoring

When you are retaining backup data for a decent amount of time

• Ideally you are keeping several weeks of backups

When you seek to replicate a conventional backup system over a WAN.

Page 19: Deduplication and single instance storage

Example: NYC Law Firm with NetBackup

6+ Terabytes

Full backups every day!

• Why? Because someone had a bad experience in the past with incremental backups and has trust issues

90 day retention period

Most files seldom change

• Many files are scanned images that never change

Several large databases

Several TB of MS Exchange

Average result – 102x capacity optimization!

Page 20: Deduplication and single instance storage

Try to Visualize 100x Capacity Optimization

OR

One 3U cabinet v. 7 full racks full of gear!

Page 21: Deduplication and single instance storage

Backup Vaulting – Another Use Case for Dedupe

Page 22: Deduplication and single instance storage

Why Replicate the Backup System?

Relatively easy DR solution

• Does not require additional software for the hosts

• Does not require storage devices with replication capabilities

• One system that replicates all of your hosts

– Platform-independent

Eliminate the need to ship tapes off site

• Eliminate the need to encrypt tapes

Page 23: Deduplication and single instance storage

Example: Defense Contractor Replicating ERP System

Problem: CIO does not want employee data being sent off site without encrypting the tapes.

• IT staff wants to avoid tape encryption.

Solution:

• Full backup of 800GB+ Oracle database to deduplicating disk target every day.

• Retain backups for 60 days on disk.

– 60 x 800GB = 48TB

• Vault backups to remote site over T1

Outcome

• Dedupe ratio of about 70:1

• 800GB backup job traverses the T1 in a few hours
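
A back-of-the-envelope check of these figures, offered as a sketch. The nominal T1 line rate of 1.544 Mbit/s is standard; taking "a few hours" to mean 3 hours is an assumption, and protocol overhead and compression are ignored.

```python
T1_BITS_PER_SEC = 1_544_000   # nominal T1 line rate
BACKUP_GB = 800               # daily full backup size from the slide
RETENTION_DAYS = 60
DEDUPE_RATIO = 70             # "about 70:1" from the slide
TRANSFER_HOURS = 3            # assumption standing in for "a few hours"
GB = 10**9

logical_tb = BACKUP_GB * RETENTION_DAYS / 1000            # 48.0 TB of backups retained
stored_gb = BACKUP_GB * RETENTION_DAYS / DEDUPE_RATIO      # roughly 686 GB actually on disk
t1_gb_per_hour = T1_BITS_PER_SEC / 8 * 3600 / GB           # roughly 0.69 GB per hour
sent_gb = t1_gb_per_hour * TRANSFER_HOURS                  # roughly 2 GB crosses the wire
undeduped_days = BACKUP_GB / t1_gb_per_hour / 24           # roughly 48 days without dedupe

print(logical_tb, stored_gb, t1_gb_per_hour, sent_gb, undeduped_days)
```

Under these assumptions a T1 moves only a couple of gigabytes in a few hours, so the vaulting works only because a small fraction of each 800GB full is new data; sending the whole job undeduplicated would take weeks.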

Page 24: Deduplication and single instance storage

But, Before We Get all Hot and Bothered . . .

Let’s review how backup systems actually work!

Page 25: Deduplication and single instance storage

Common Backup Bottlenecks

Backup Clients

• You have to get data off the host and transfer it

Network

• Seldom the real bottleneck, except over a WAN

Backup Servers

• I/O processing is the most common bottleneck

Storage Devices

• Storage devices can be a bottleneck, but are seldom the whole problem.

Page 26: Deduplication and single instance storage

Front-end and Network – Minimize Duplication in the First Place

Backups generate a lot of redundant data, so what if we had smarter client software that did not generate redundant data?

• Incremental Forever

– After the first full backup, only do incremental backups

– This is what IBM TSM does, for instance

• Synthetic Full Backup

– Last week’s full backup is merged with this week’s incremental backups to “synthesize” this week’s full backup.

– No need to transfer redundant data
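
The synthetic full merge can be sketched with plain dictionaries. This is a simplified model, not how any particular backup product stores its catalog: backups are modeled as path-to-content maps, and the convention that `None` marks a deleted file is an assumption of the example.

```python
def synthesize_full(previous_full: dict, incrementals: list) -> dict:
    """Merge incrementals (oldest first) into a copy of the previous full backup.

    Each incremental maps path -> new content, or None for a deleted path
    (an assumed convention for this sketch).
    """
    full = dict(previous_full)
    for incremental in incrementals:
        for path, content in incremental.items():
            if content is None:
                full.pop(path, None)     # file was deleted since the last backup
            else:
                full[path] = content     # file was added or changed
    return full


last_week_full = {"/etc/hosts": "v1", "/data/report.doc": "v1", "/tmp/scratch": "v1"}
monday = {"/data/report.doc": "v2"}
tuesday = {"/tmp/scratch": None, "/data/new.xls": "v1"}

this_week_full = synthesize_full(last_week_full, [monday, tuesday])
print(this_week_full)
```

Only the Monday and Tuesday changes ever crossed the network; the new full is assembled on the backup side from data it already holds.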

Page 27: Deduplication and single instance storage

Example: Energy Firm using IBM Tivoli Storage Manager

TSM only backs up files that have changed.

• It does not generate a lot of duplicate files

Most of the 15TB of capacity are documentation and images that do not change – ever.

• Relatively little of it is database.

• Images don’t compress

• Utilizing compression on TSM client for compressible files

Overall deduplication ratio: about 2:1

• Can’t justify the cost of dedupe across the board

• Resolution: Set up dedupe tier for database and email

– Do the file backups to conventional disk and tape

Page 28: Deduplication and single instance storage

Synthetic Full Backups – An Approach that Creates a Need for Dedupe

Synthetic Full Backups

• “Poor man’s incremental forever”

• Combine subsequent incremental backups with the previous full backup to “synthesize” the next week’s full backup.

• Great technique for minimizing networking traffic from backups.

Synthetic full backups require that at least two weeks of backups be available on disk.

• Dedupe disk targets tend to be a big win for synthetic full backups

Page 29: Deduplication and single instance storage

Example: Research Firm with 6 Week Retention and Synthetic Fulls

60TB+

• Mix of large file systems, content management systems, email, and database

Using Commvault with heavy use of synthetic full backups

6 week retention on disk

Dedupe ratios between 8x and 16x

• NOTE: Their backup data could not fit in one dedupe box, so they are managing 4 separate dedupe appliances in each of their locations.

Page 30: Deduplication and single instance storage

Theoretical v. Actual Capacity: Your Mileage May Vary

YMMV – one customer’s mileage

• 48 TB raw disk

• 36 TB with RAID-6

• 35 +/- TB for unique capacity

• 3-5 TB deliberately left empty for headroom

Might hold

• As much as 500 TB of backups

• Or as little as 50 TB.
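
The spread between those two outcomes is easier to see as ratios. A rough sketch: the 8-drive RAID-6 group is an assumption chosen because it reproduces the 36 TB figure, and the function names are made up for the example.

```python
def raid6_usable_tb(raw_tb: float, drives_per_group: int) -> float:
    """RAID-6 gives up two drives' worth of capacity per group to parity."""
    return raw_tb * (drives_per_group - 2) / drives_per_group


def effective_ratio(backup_tb: float, unique_tb: float) -> float:
    """How many TB of backups each TB of unique capacity ends up holding."""
    return backup_tb / unique_tb


usable = raid6_usable_tb(48, 8)     # 8-drive groups are an assumption that matches 36 TB
best = effective_ratio(500, 35)     # the same box at its most favorable
worst = effective_ratio(50, 35)     # and at its least favorable
print(usable, round(best, 1), round(worst, 1))
```

The same hardware spans roughly a tenfold range in effective capacity depending on how well the data dedupes.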

Page 31: Deduplication and single instance storage

Dedupe on the Backup Client

Host-side dedupe is a form of sub-file-level incremental

• Instead of catching block-level changes, the file system changes are hashed and compared with the back-end storage repository.

• Alternative to block-based CDP

• Unique data segments are then transferred to the backup service.

Host-side deduping is very valuable over the WAN.

• Minimizes data that needs to be transferred

• Typically it will dedupe across hosts, reducing files that are common to multiple hosts

– Such as application and operating system binaries
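
The hash-then-compare exchange can be sketched as follows. Everything here is hypothetical: `BackupServer`, `backup`, the 4 KB fixed segment size, and the in-memory "network" are illustrative stand-ins, not any product's protocol.

```python
import hashlib

SEGMENT = 4096  # assumed fixed segment size


class BackupServer:
    """Toy back-end repository shared by all clients."""

    def __init__(self):
        self.segments = {}        # digest -> segment bytes
        self.bytes_received = 0   # how much data actually crossed the wire

    def missing(self, digests):
        return [d for d in digests if d not in self.segments]

    def upload(self, digest, data):
        self.segments[digest] = data
        self.bytes_received += len(data)


def backup(server: BackupServer, data: bytes) -> list:
    """Hash segments on the client and send only those the server lacks."""
    pieces = [data[i:i + SEGMENT] for i in range(0, len(data), SEGMENT)]
    recipe = [hashlib.sha256(p).hexdigest() for p in pieces]
    by_digest = dict(zip(recipe, pieces))
    for digest in server.missing(by_digest):
        server.upload(digest, by_digest[digest])
    return recipe   # ordered digests are enough to rebuild the data later


# Four distinct segments stand in for OS binaries common to both hosts.
os_binaries = b"".join(bytes([i]) * SEGMENT for i in range(4))
host_a = os_binaries + bytes([100]) * SEGMENT
host_b = os_binaries + bytes([101]) * SEGMENT

server = BackupServer()
recipe_a = backup(server, host_a)
after_a = server.bytes_received
recipe_b = backup(server, host_b)
after_b = server.bytes_received
print(after_a, after_b - after_a)
```

The second host transfers only its one unique segment, because the shared binaries are already in the repository.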

Page 32: Deduplication and single instance storage

WAN Backup Software with Dedupe in the Client

[Diagram labels: WAN; LAN; New York; Jersey City; London; Hong Kong; Dedupe Client; Local USB; Backup Server(s); Shared Client & Local Recovery]

Page 33: Deduplication and single instance storage

Backup System Network I/O Processing Bottlenecks

Moving Backup Data Through the Network

Page 34: Deduplication and single instance storage

Backup Server I/O Processing is a Major Bottleneck

In most enterprise backup systems a single backup server would be a major performance bottleneck

• Unless you were doing incremental forever or sub-file-level backups

• Add a dedupe process to that and it becomes that much harder

A common practice for scaling out backup server performance is to add network “data movers”

• Also known as: storage nodes, media servers, media agents, etc.

Page 35: Deduplication and single instance storage

Interesting Idea – Add Deduplication to the Network Data Movers

[Diagram labels: Dedicated Storage Network; Network]

Page 36: Deduplication and single instance storage

I/O Processing Bottlenecks: Network Data Movers and “LAN-Free”

[Diagram labels: Dedicated Storage Network; Network data movers; “LAN-Free” Clients]

Page 37: Deduplication and single instance storage

LAN-Free Backup Clients and NDMP Backups

In larger enterprise-class backup systems it is common to have larger servers move data directly to storage devices over Fibre Channel.

The fastest way to back up a large NAS server is to do NDMP dumps over Fibre Channel.

Page 38: Deduplication and single instance storage

“LAN-FREE” – End-run Around the Backup Server

SAN Clients work like slave servers. They back up directly to the storage media, while reporting metadata over the LAN to the backup server.

Presumably all of these tape drives are part of a tape library.

[Diagram labels: GigE; Storage Area Network; Tape; Robot Arm]

Page 39: Deduplication and single instance storage

Dedupe with LAN-Free Backup Clients

With LAN-Free backup you get no benefit from dedupe processing residing on the data movers.

• The dedupe logic needs to sit on the target storage device

This is where VTLs shine

• VTL works just like tape

– Network data movers work fine

– LAN-Free clients work fine

• VTLs offer higher throughput than CIFS or NFS

– Common to see total throughput in excess of 1GB/Sec

• VTLs might offer tighter integration with tape

Many VTLs do dedupe as a post-process

Page 40: Deduplication and single instance storage

Back End Bottlenecks

Can your dedupe appliance keep pace with the backup system?

Page 41: Deduplication and single instance storage

Back-End Bottlenecks: Can the Dedupe Storage Devices Hack It?

If you open up the flood gates, you might find that a single dedupe box on the LAN cannot hack it.

Some solutions:

• Buy lots of individual dedupe devices

• Maybe use a VTL implementation of dedupe

– Sorry, out of the scope of this lecture

• Post-process deduping instead of deduping on-the-fly

– Less efficient from a capacity standpoint, but should be able to achieve considerably better performance

• New grid-based architectures that offer parallel processing for deduplication

• Newer dedupe devices that are up to the task

Page 42: Deduplication and single instance storage

Stand-alone Deduplication Servers

Single server dedupe solutions are often constrained by:

• RAM and processing power

• The size of the index they can manage

• Disk performance

When you max out the box, you need to buy another one

• Very painful incremental upgrade

• No dedupe across multiple boxes

• Make sure that you buy a big enough box!
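
To see why index size constrains a single box, here is a rough sizing sketch. Every parameter is an assumption for illustration (35 TB of unique data, 8 KB or 64 KB segments, a 20-byte hash plus an 8-byte disk locator per entry); real products use their own segment sizes and index layouts.

```python
def index_size_gb(unique_tb: float, segment_kb: int,
                  hash_bytes: int = 20, locator_bytes: int = 8) -> float:
    """Estimate index size: one (hash, locator) entry per unique segment."""
    entries = unique_tb * 10**12 / (segment_kb * 1024)
    return entries * (hash_bytes + locator_bytes) / 10**9


small_segments = index_size_gb(35, 8)    # fine-grained segments
large_segments = index_size_gb(35, 64)   # coarser segments, smaller index
print(round(small_segments, 1), round(large_segments, 1))
```

Under these assumptions the fine-grained index runs to over a hundred gigabytes, well beyond what fits comfortably in RAM on a single server, while coarser segments shrink the index at the cost of finding less commonality.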

Page 43: Deduplication and single instance storage

Object-Based File System with Grid Architecture and Global Dedupe

Conventional File System Consumers

• CIFS/NFS Clients

• Backup System Data Movers

Front-End Nodes

• Export File Systems

• Scale-out performance into GBs/Sec

Back-End Nodes

• Manage disk, dedupe, and redundancy

• Scale-deep to Petabytes of capacity

Page 44: Deduplication and single instance storage

VTL with Scalable Deduplication

[Diagram labels: GigE; Storage Network; VTL; Single Instance Repository; De-Duplication Processors]

Page 45: Deduplication and single instance storage

Summary: Alternative Technologies to Dedupe Disk Targets

Don’t Duplicate in the First Place

• Incremental Forever Backups

• WAN-enabled backups, perhaps with dedupe on the client

Throw disk at it

• Bulk SATA arrays typically cost less than $1K per TB

– Capacities up to 2PB

– Densities on the order of 1PB / rack

– MAID – power management to spin down inactive drives

Replicate Your SAN or NAS

• Use optimized file backup or archive solution to provide file recovery and to meet retention requirements

Page 46: Deduplication and single instance storage

Other Examples of Dedupe Technology

Primary Storage, VDI, Rich Media

Archiving

Page 47: Deduplication and single instance storage

Block-Level Dedupe for Primary Storage

Most dedupe solutions are designed specifically for backup and archival data.

A limited number of products can dedupe on live data.

• One day perhaps, dedupe for primary storage will be a way of life

Great applications – those with redundant data!

• Desktop virtualization (VDI)

– A number of very interesting solutions are coming to market

• VMDK backup, dedupe, and fail-over on one platform

• Boot image servers

Reclamation of empty disk space

• Blank space deduplicates very nicely!

Page 48: Deduplication and single instance storage

Single-Instance Storage for Virtual Desktops

Storage is a big deal-breaker for many VDI use cases

• Replaces desktop storage and desktop personnel with SAN storage and highly specialized storage managers

New techniques for VDI storage break the desktop down into elements and find commonality across all desktops

Virtual desktop file systems are “stitched together” from common elements:

• Operating system

• Applications or sets of applications

• Variable elements

– For example: anti-virus signatures

• Personal elements

– Screen savers and background images

– Google toolbar

– Personal applications

– Personal files

Page 49: Deduplication and single instance storage

Dedupe Across Large Collections of Rich Media Files

Many types of files have content-level commonality across a large collection of files.

• TIFF

• JPG

• PNG

• OpenEXR

• DICOM

• MS Office Documents

• PDFs

A high level of commonality can be detected and de-duplicated, assuming a large enough sample set of data.

• Capacity optimization (depending on file type) on the order of 2x to 10x and beyond.

Page 50: Deduplication and single instance storage

Dedupe in WAN Accelerators

Page 51: Deduplication and single instance storage

MS Exchange Branch Office: Example of the Need for Dedupe over the WAN

Message with attachment sent to all staff.

Single instance message storage, but the same message crosses the WAN multiple times

[Diagram labels: MS Exchange Server; WAN; New York; Chicago; Atlanta; San Fran]

Page 52: Deduplication and single instance storage

WAN Accelerator with Inline Dedupe

[Diagram labels: WAN Accelerators / WAFS Gateways; File Servers or NAS Appliance; Site A; Site B]
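
Inline dedupe between a pair of accelerators can be sketched as two ends that build up the same chunk dictionary. This is a hypothetical model: the class `WanAccelerator`, the 1 KB chunk size, and the `("raw", ...)` / `("ref", ...)` wire format are inventions for the example, not a real appliance's protocol.

```python
import hashlib

CHUNK = 1024  # assumed chunk size


class WanAccelerator:
    """One end of a hypothetical accelerator pair sharing a chunk dictionary."""

    def __init__(self):
        self.cache = {}   # digest -> chunk bytes seen so far

    def encode(self, payload: bytes) -> list:
        """Replace chunks the far side has already seen with short references."""
        wire = []
        for i in range(0, len(payload), CHUNK):
            chunk = payload[i:i + CHUNK]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.cache:
                wire.append(("ref", digest))
            else:
                self.cache[digest] = chunk
                wire.append(("raw", chunk))
        return wire

    def decode(self, wire: list) -> bytes:
        """Rebuild the payload, learning any new raw chunks along the way."""
        out = []
        for kind, value in wire:
            if kind == "raw":
                self.cache[hashlib.sha256(value).digest()] = value
                out.append(value)
            else:
                out.append(self.cache[value])
        return b"".join(out)


def wire_bytes(wire: list) -> int:
    return sum(len(value) for _, value in wire)


site_a, site_b = WanAccelerator(), WanAccelerator()
attachment = b"".join(bytes([i]) * CHUNK for i in range(10))   # ten distinct chunks

first = site_a.encode(attachment)    # first user at Site B pulls the attachment
second = site_a.encode(attachment)   # second user pulls the same attachment
print(wire_bytes(first), wire_bytes(second))
restored = (site_b.decode(first), site_b.decode(second))
```

The first transfer carries the full payload; the repeat carries only 32-byte references, which Site B expands from its own cache.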

Page 53: Deduplication and single instance storage

Questions – If Time Permits