
Posted on 24-Dec-2015


IBM ProtecTIER Deduplication Solutions

Stanislav Dzúrik, IBM FTSS Storage, stanislav_ [email protected]

Protect More. Store Less.®

Got too much data?

And not enough ______ to store it all?

Time? Money? People? Floor space? Electricity? Air conditioning?


The tidal wave of data continues …

• The amount of digital information continues to grow exponentially
• And we need to keep more of it, longer
• And the costs of losing data are increasingly unacceptable
  o Lost revenues
  o Lost customer confidence
  o Embarrassment in the market
  o Fines from contracts and government agencies
  o CEO and CFO could go to jail
• But budgets are not increasing

[Chart, 2005 to 2010: data created and copied is expected to grow at 48% CAGR through 2010. Source: various external consultant reports]

We need to do more with less, and we need to do it smarter.


Survey: what are your two biggest storage pain points?

Source: TheInfoPro Storage Study, F1000 sample, n=149 (Other n=14); multiple responses recorded.


Storage efficiency strategies and best practices

• Move data to the right place
• Store more with what’s on the floor
• Stop storing so much


A set of essential technologies enables storage efficiency

• Storage Virtualization
• Thin Provisioning
• Data Compression
• Data Deduplication
• Automated Tiering
• Automated Data Migration



The pressures on backup administrators are growing

• Backup takes longer
• Recovery takes longer
• Can’t buy more storage
• More new data coming

[Diagram labels: Growth, Backup, Manage, Recover]


Using the right balance of high-density tape and high-performance disk will help:

• Short-term retention
  o Use disk for daily backup & restore operations
• Performance
  o Fast backups
  o Even faster restores
  o Meet “backup windows”
• Long-term retention
  o Cost-effective capacity
  o Removable & transportable
• Compliance
  o Meet financial & regulatory requirements
  o Data encryption, WORM


Compression and Deduplication use less physical storage

• Store data more efficiently
• Lower operating expenses: power, cooling, floor space
• Keep more data online for analytics and fast restores


And data deduplication is the key to using more disk, more cost-effectively!

Data Deduplication Overview


Deduplication Architectures

[Diagram: clients, servers, and storage devices connected via LAN and SAN]

• Client side
  o Reduces load on server
  o Reduces bandwidth on LAN
  o Adds load to client
  o No cross-correlation among multiple clients
• Server side
  o Allows cross-correlation of data among multiple clients
  o Adds load to server
• Block storage device
  o Transparent to clients and servers
  o Reduces load on server and client
  o Adds load to storage device
  o No file or format awareness


Data Deduplication Process (simplified)

1. Assume a data object or stream as the subject for deduplication
2. The data object is split into chunks (fixed or variable size)
3. For each chunk, an identity characteristic is determined
4. Duplicate chunks are identified
  o Identical chunks are replaced by pointers (references)
  o Non-identical chunks, or single instances, are stored
  o Compression may be performed in addition

Result: the required disk cache is reduced.

[Diagram: the data stream A B C D A E F F D B A F contains only six distinct chunks, A to F, and only those need to be stored]
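The simplified process can be sketched in a few lines of Python. This is a generic illustration, not ProtecTIER's HyperFactor method (which the deck describes as not hash-based): it assumes fixed-size chunks and uses SHA-256 from the standard library as the identity characteristic, trusting the digest without a collision check. The function names `dedupe_fixed` and `rebuild` are hypothetical.

```python
import hashlib


def dedupe_fixed(data: bytes, chunk_size: int):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk.

    Returns (store, refs): `store` maps each chunk's SHA-256 digest to its bytes,
    and `refs` is the ordered list of digests needed to rebuild the stream.
    Note: this trusts the digest alone (no collision check).
    """
    store: dict[str, bytes] = {}
    refs: list[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()  # identity characteristic
        if digest not in store:
            store[digest] = chunk  # single instance: keep the bytes
        refs.append(digest)        # every chunk position becomes a reference
    return store, refs


def rebuild(store: dict[str, bytes], refs: list[str]) -> bytes:
    """Reassemble the original stream by following the references in order."""
    return b"".join(store[digest] for digest in refs)


if __name__ == "__main__":
    stream = b"ABCDAEFFDBAF"  # the example stream from the slide, one letter per chunk
    store, refs = dedupe_fixed(stream, chunk_size=1)
    print(len(refs), "chunks referenced,", len(store), "chunks stored")  # 12 chunks referenced, 6 chunks stored
    assert rebuild(store, refs) == stream
```

With one-byte chunks the 12-letter stream needs only six stored chunks; real systems use chunks of kilobytes and may also compress the stored chunks.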


Methods for Data Chunking

1. File based
  o One chunk is one file; most appropriate for file systems
  o E.g. TSM incremental-forever backup helps eliminate redundant data
2. Fixed block
  o The data object is split into fixed-size blocks
  o Used by block storage devices
3. Format aware
  o Understands explicit data formats and chunks the data object according to its format
  o Example: breaking a PowerPoint deck into separate slides
4. Format agnostic
  o Chunking is based on an algorithm that looks for logical breaks or similar elements within a data object

• The chunking method influences the deduplication ratio
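The influence of the chunking method can be seen in a small experiment. This is a minimal sketch, assuming a format-agnostic, content-defined scheme: a boundary is placed wherever the hash of the last 16 bytes (`zlib.crc32`, recomputed at each position for clarity rather than rolled) is divisible by 64. `WINDOW`, `DIVISOR`, and the helper names are illustrative choices, not taken from any product. Inserting one byte at the front shifts every fixed block, whereas content-defined boundaries move with the data.

```python
import random
import zlib

WINDOW = 16   # bytes of context that decide a boundary (illustrative value)
DIVISOR = 64  # a boundary occurs on average every ~DIVISOR bytes (illustrative value)


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Fixed-block chunking: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Format-agnostic chunking: cut after position i when the hash of the
    WINDOW bytes ending at i is divisible by DIVISOR, so boundaries depend
    only on nearby content, not on absolute offsets."""
    chunks, start = [], 0
    for i in range(WINDOW - 1, len(data)):
        window = data[i - WINDOW + 1:i + 1]
        if zlib.crc32(window) % DIVISOR == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def reused_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of chunks in `new` that already exist in `old`."""
    seen = set(old)
    return sum(chunk in seen for chunk in new) / len(new)


if __name__ == "__main__":
    rng = random.Random(42)
    original = bytes(rng.getrandbits(8) for _ in range(20_000))
    edited = b"X" + original  # insert a single byte at the front

    fixed = reused_fraction(fixed_chunks(original, 64), fixed_chunks(edited, 64))
    cdc = reused_fraction(content_defined_chunks(original),
                          content_defined_chunks(edited))
    print(f"fixed-block chunks reused:     {fixed:.0%}")
    print(f"content-defined chunks reused: {cdc:.0%}")
```

Production chunkers typically also enforce minimum and maximum chunk sizes and use a true rolling hash, so each position costs constant time.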


Methods for Determining Duplicates

1. Hashing
  o Computes a hash (MD5, SHA-256) for each data chunk
  o Compares the hash with the hashes of all existing data
  o An identical hash means the data is most likely identical
  o Potential (small) risk of hash collisions: identical hash but non-identical data
  o Collisions must be prevented through a secondary comparison (additional metadata, a second hash method, binary comparison)
2. Binary comparison
  o Compares all bits of similar chunks
3. Delta differencing
  o Computes a “delta” between two “similar” chunks of data, where one chunk is the baseline and the second is the delta
  o Since each delta is unique, there is no possibility of collision
  o To reconstruct the original chunk, the delta(s) have to be re-applied to the baseline chunk
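Hashing and binary comparison can be combined so that a hash match is trusted only after the bytes have been compared. The sketch below is a generic illustration, assuming chunks are grouped by SHA-256 digest; the class `VerifyingChunkStore` is hypothetical. The usage example deliberately plugs in a weak "hash" of only the first byte to force a collision.

```python
import hashlib


class VerifyingChunkStore:
    """Chunk store that trusts a hash match only after a byte-for-byte check.

    Chunks are grouped into buckets by digest. If two different chunks ever
    share a digest (a hash collision), both are kept, so neither is lost.
    """

    def __init__(self, hash_fn=hashlib.sha256):
        self.hash_fn = hash_fn  # any callable returning an object with .digest()
        self.buckets: dict[bytes, list[bytes]] = {}

    def put(self, chunk: bytes) -> tuple[bytes, int]:
        """Store `chunk` if it is new; return a reference (digest, slot)."""
        digest = self.hash_fn(chunk).digest()
        bucket = self.buckets.setdefault(digest, [])
        for slot, existing in enumerate(bucket):
            if existing == chunk:      # secondary check: compare every byte
                return digest, slot    # verified duplicate: just reference it
        bucket.append(chunk)           # new data, or a colliding chunk
        return digest, len(bucket) - 1

    def get(self, ref: tuple[bytes, int]) -> bytes:
        digest, slot = ref
        return self.buckets[digest][slot]


if __name__ == "__main__":
    # Deliberately weak "hash" over only the first byte, to force a collision.
    weak = VerifyingChunkStore(hash_fn=lambda data: hashlib.sha256(data[:1]))
    r1 = weak.put(b"apple")
    r2 = weak.put(b"avocado")  # same first byte: digest collides, kept separately
    r3 = weak.put(b"apple")    # true duplicate: same reference as r1
    print(r1 == r3, r1 == r2, weak.get(r2))  # True False b'avocado'
```

The extra comparison costs a read of the stored chunk on every hash match, which is the price of ruling out silent data loss from a collision.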


In-Line Deduplication

• Data is deduplicated before it is actually stored: deduplication is performed as data flows into the secondary storage system
• Advantages
  o Processes data once, eliminating additional post-processing tasks
• Disadvantages
  o The CPU-intensive deduplication process can create a performance bottleneck
  o One process per I/O stream

[Diagram labels: primary storage, backup, deduplication, VTL (secondary storage)]


Out-of-Band Deduplication (Post-Processing)

• Data is first stored, then deduplicated in the background
• Advantages
  o Deduplication CPU overhead no longer affects the backup window
  o Supports multiple I/O streams
  o Potentially faster restore of the first version (not deduplicated)
• Disadvantages
  o Data is written, read, and written again, so it is more I/O intensive
  o The deduplication window must be coordinated with the backup window, as it typically takes longer than in-line processing
  o Requires larger secondary storage, because the first version is not deduplicated

[Diagram labels: primary storage, backup, VTL (secondary storage), deduplication, secondary storage]
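The difference in disk activity between the two approaches can be modeled with simple accounting. This is a simplified sketch, assuming in-line deduplication writes only unique data, while post-processing writes the full stream, reads it back, and then writes the unique part (the "written, read and written" pattern above). The 10:1 reduction in the usage example is an assumption, not a measured figure, and the functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiskActivity:
    written_tb: float        # data written to disk
    read_tb: float           # data read back from disk
    peak_capacity_tb: float  # space needed before deduplication completes

    @property
    def total_io_tb(self) -> float:
        return self.written_tb + self.read_tb


def inline_dedupe(ingest_tb: float, unique_fraction: float) -> DiskActivity:
    """Deduplicate as data arrives: only unique data ever reaches disk."""
    unique = ingest_tb * unique_fraction
    return DiskActivity(written_tb=unique, read_tb=0.0, peak_capacity_tb=unique)


def post_process_dedupe(ingest_tb: float, unique_fraction: float) -> DiskActivity:
    """Write everything first, read it back, then write only the unique data."""
    unique = ingest_tb * unique_fraction
    return DiskActivity(
        written_tb=ingest_tb + unique,        # first write, plus deduplicated write
        read_tb=ingest_tb,                    # background read for deduplication
        peak_capacity_tb=ingest_tb + unique,  # undeduplicated copy is held too
    )


if __name__ == "__main__":
    for name, model in [("in-line", inline_dedupe), ("post-process", post_process_dedupe)]:
        activity = model(10.0, 0.1)  # assumed: 10 TB ingested, 10:1 reduction
        print(f"{name:>12}: {activity.total_io_tb:.1f} TB of disk I/O, "
              f"peak {activity.peak_capacity_tb:.1f} TB")
```

Under these assumptions, ingesting 10 TB costs 1 TB of disk I/O in-line versus 21 TB post-processed; the real gap depends on the reduction ratio and on how each product reads and verifies data.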


3 × Deduplication in the IBM Portfolio

• ProtecTIER (TS7650G)
• A-SIS (N series Gateway)
• TSM R6 (TSM API)

[Diagram labels: Tape, File, LUN]

ProtecTIER Overview


ProtecTIER reduces the required backup disk capacity by up to 25 times!


IBM ProtecTIER Deduplication Innovation and Leadership

[Timeline, 2003 to 2011]

• 6 PhDs begin researching massively scalable deduplication algorithms
• First non-hash deduplication algorithm developed, designed for 100% data integrity
• First single-node system to store over 1 PB of deduplicated data
• First deduplication Virtual Tape Library deployed into production
• First to deliver VTL solutions for both Open and Mainframe environments
• Fastest single-node inline deduplication solution
• IBM acquires Diligent
• The only “true” enterprise-class deduplication solution on the market today
• First true clustered system with Global Deduplication
• First deduplication solution for System z
• IBM’s first midrange solution released
• First to deliver Many-to-Many replication
• Fastest restore speed: up to 2800 MB/sec!

• Installed in all major industries
  o Over 1,400 ProtecTIER systems sold to date
  o Production systems range in size from 5 TB to over 700 TB
  o Over 90 PB of physical disk capacity behind ProtecTIER servers in production, protecting thousands of PBs of backup data


IBM’s Virtual Tape Deduplication Software Products

• ProtecTIER VT: a scalable and robust virtual tape solution that emulates tape libraries, enabling existing backup applications to send data to the ProtecTIER disk-based platform rather than directly to tape.
• HyperFactor: a revolutionary deduplication solution that eliminates redundant data, enabling customers to increase their effective capacity by up to 25 times. ProtecTIER is powered by HyperFactor and can radically reduce both physical disk capacity and total storage costs.


How ProtecTIER works

[Diagram: backup servers send a new data stream to the ProtecTIER™ server, where HyperFactor™ uses a memory-resident index so that only “filtered” data is written to the repository]

• Memory-resident index: only 4 GB needed to map 1 PB of physical disk!
• Backup with inline deduplication: up to 1400 MB/sec per server, or 2000 MB/sec with a two-node cluster!
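To put the index claim in proportion, here is the arithmetic, assuming decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes). The comparison figure for a conventional hash index is hypothetical: it assumes 8 KiB (8192-byte) chunks and a 20-byte digest per chunk, and ignores any per-entry overhead.

```latex
\frac{1\,\mathrm{PB}}{4\,\mathrm{GB}}
  = \frac{10^{15}\,\mathrm{B}}{4 \times 10^{9}\,\mathrm{B}}
  = 2.5 \times 10^{5}
  \quad\Rightarrow\quad \text{about 1 byte of index per 250 KB of physical disk}

\text{Hypothetical hash index: }
  \frac{10^{15}\,\mathrm{B}}{8192\,\mathrm{B/chunk}} \approx 1.22 \times 10^{11}\ \text{chunks},
  \qquad 1.22 \times 10^{11} \times 20\,\mathrm{B} \approx 2.4 \times 10^{12}\,\mathrm{B} \approx 2.4\,\mathrm{TB}
```

Under these assumptions a per-chunk hash index would be several hundred times larger than 4 GB, which is why it would not fit in server memory.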


• Backup application writes data to ProtecTIER as it would to tape
• Only unique data is stored; existing duplicate data is referenced
• When data objects expire, references are removed and free space is reclaimed and reused

[Diagram: backups 1 to 5 referencing stored data blocks A to J]
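Reference counting is one common way to implement the reuse-and-reclaim behavior in these bullets. The following is a toy sketch under that assumption, not a description of ProtecTIER internals: each virtual cartridge is a list of chunk references, and a chunk's space is reclaimed when its last reference expires. `RefCountedRepository` and its methods are hypothetical.

```python
import hashlib


class RefCountedRepository:
    """Toy repository: unique chunks are stored once and reference-counted.

    Each backup (here, a virtual cartridge) is a list of chunk digests.
    Expiring a cartridge drops its references; chunks whose count reaches
    zero are freed so the space can be reused.
    """

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.refcount: dict[str, int] = {}
        self.cartridges: dict[str, list[str]] = {}

    def write(self, name: str, chunks: list[bytes]) -> None:
        if name in self.cartridges:
            raise ValueError(f"cartridge {name!r} already exists")
        refs = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk  # only unique data is stored
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            refs.append(digest)              # duplicates become references
        self.cartridges[name] = refs

    def expire(self, name: str) -> int:
        """Remove a cartridge's references and return the bytes reclaimed."""
        reclaimed = 0
        for digest in self.cartridges.pop(name):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:  # last reference gone: free the chunk
                reclaimed += len(self.chunks.pop(digest))
                del self.refcount[digest]
        return reclaimed

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    repo = RefCountedRepository()
    repo.write("monday", [b"A", b"B", b"C"])
    repo.write("tuesday", [b"A", b"B", b"D"])
    print(repo.stored_bytes())    # 4
    print(repo.expire("monday"))  # 1 (only C was unique to Monday)
    print(repo.stored_bytes())    # 3
```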

ProtecTIER Deduplication Operation and Results Example

| Backup event | Amount received | Amount stored | Dedupe ratio (cumulative) |
|---|---|---|---|
| First full backup | 1 TB | 250 GB | 4:1 |
| Incremental backup | 100 GB | 10 GB | 4.2:1 |
| Incremental backup | 100 GB | 10 GB | 4.4:1 |
| Second full backup | 1 TB | 10 GB | 7.8:1 |
| Incremental backup | 100 GB | 10 GB | 8:1 |
| Third full backup | 1 TB | 10 GB | 11:1 |
| After two months . . . | 7.8 TB | 350 GB | 22:1 |
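The ratios in the table are cumulative: total data received so far divided by total data stored so far. The check below recomputes them, assuming 1 TB = 1000 GB; the results agree with the table to within rounding (for example, 2200 GB / 280 GB ≈ 7.86 after the second full backup). `EVENTS` and `cumulative_ratios` are illustrative names.

```python
# Amounts from the example table, in GB (assuming 1 TB = 1000 GB).
EVENTS = [
    ("First full backup", 1000, 250),
    ("Incremental backup", 100, 10),
    ("Incremental backup", 100, 10),
    ("Second full backup", 1000, 10),
    ("Incremental backup", 100, 10),
    ("Third full backup", 1000, 10),
]


def cumulative_ratios(events):
    """Yield (event, received_so_far / stored_so_far) after each backup."""
    received = stored = 0
    for name, got, kept in events:
        received += got
        stored += kept
        yield name, received / stored


if __name__ == "__main__":
    for name, ratio in cumulative_ratios(EVENTS):
        print(f"{name:<20} {ratio:5.2f}:1")
    print(f"{'After two months':<20} {7800 / 350:5.2f}:1")  # 7.8 TB received, 350 GB stored
```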


Storage Impact from ProtecTIER Deduplication

[Diagram: master server, backup server, and ProtecTIER server; represented capacity compared with physical capacity]

• Store up to 25 times more backup data on a given physical storage capacity


Significantly Reduces Replication Bandwidth

[Diagram: at the primary site, backup servers write to a ProtecTIER Gateway (represented capacity vs. physical capacity), which replicates over an IP-based WAN link to a ProtecTIER Gateway at the secondary site with a backup server, physical capacity, and a tape library]

• Deduplication enables large amounts of data to be replicated with significantly less bandwidth
• Virtual cartridges can be cloned to tape at the DR site
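Replication can reuse deduplication: only chunks that the target does not already hold need to cross the WAN link. This sketch assumes the source can learn which chunk fingerprints the target holds; it is a generic illustration, not ProtecTIER's native replication protocol, and `plan_replication` and `chunk_digests` are hypothetical helpers.

```python
import hashlib


def chunk_digests(chunks: list[bytes]) -> list[str]:
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def plan_replication(source_chunks: list[bytes], target_has: set[str]):
    """Return (chunks that must be sent, bytes avoided by sending references).

    A chunk is skipped if the target already holds it, or if it was already
    queued earlier in this same transfer.
    """
    to_send, queued, saved = [], set(), 0
    for chunk, digest in zip(source_chunks, chunk_digests(source_chunks)):
        if digest in target_has or digest in queued:
            saved += len(chunk)  # target can rebuild it from a reference
        else:
            to_send.append(chunk)
            queued.add(digest)
    return to_send, saved


if __name__ == "__main__":
    yesterday = [b"block-%d" % i for i in range(100)]
    today = yesterday[:95] + [b"new-%d" % i for i in range(5)]  # 5% changed
    target_has = set(chunk_digests(yesterday))  # already replicated last night
    to_send, saved = plan_replication(today, target_has)
    print(len(to_send), "of", len(today), "chunks sent;", saved, "bytes avoided")
```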


ProtecTIER Many-to-One Replication Overview

• Up to 12 branch offices (spokes): Gateways and/or Appliances
• 1 target (hub): Appliance, Gateway, single or two-node cluster
• Virtual cartridges can be cloned to tape by the main-site backup server

[Diagram: branch-office ProtecTIER Gateways and backup servers replicate over IP-based NR links to the central / DR site, which has a tape library]


ProtecTIER Many-to-Many Native Replication Grid

• Up to 4 hubs in a grid
• Supports any combination of Gateways, Appliances, single or two-node clusters

[Diagram: replication among Sites A, B, C, and D (ProtecTIER Gateway, backup server, physical capacity)]


ProtecTIER Support for Symantec OpenStorage (OST)

• OST API separates the backup logic from the storage appliance logic and implementation

[Diagram: NetBackup server (NetBackup policy and control), ProtecTIER OST plugin, OpenStorage API, and ProtecTIER server. IBM ProtecTIER: backup storage appliance with deduplication and native replication]


IBM ProtecTIER® Deduplication Family

• TS7610 ProtecTIER Appliance Express: good performance, entry level, easy to install
  o Up to 100 MB/sec
  o 4 TB and 5.4 TB usable capacity
• TS7650 ProtecTIER Appliances: better performance, larger capacity, scalable
  o Up to 500 MB/sec
  o 7 TB to 36 TB usable capacity
• TS7650G & TS7680 ProtecTIER Gateways: highest performance, largest capacity, high availability
  o Backup: up to 2000 MB/sec; restore: up to 2800 MB/sec
  o Up to 1 PB usable capacity

[Chart axis: scalable capacity and performance]

ProtecTIER Differentiation


ProtecTIER Advantage: Data Integrity

• Unique and patented HyperFactor® deduplication technology
• The only production-proven deduplication solution not based on a hash algorithm
• Designed for 100% data integrity
  o Bit-for-bit comparison of data to ensure data is a duplicate
  o Can NEVER lose data due to a hash collision

Although the chance of losing data from a hash collision is low, it is NOT ZERO, unlike with a ProtecTIER solution.
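The "low but not zero" risk can be quantified with the birthday bound, p ≈ 1 - exp(-n^2 / 2^(b+1)) for n chunks and a b-bit hash whose outputs are assumed uniformly random. The sketch assumes 1 PB split into 8 KiB chunks purely for illustration; real systems differ in chunk size and hash choice, and `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(num_chunks: float, hash_bits: int) -> float:
    """Birthday-bound approximation p ≈ 1 - exp(-n^2 / 2^(b+1)).

    Assumes uniformly distributed hash values. math.expm1 keeps precision
    when the probability is astronomically small.
    """
    exponent = num_chunks * num_chunks / 2.0 ** (hash_bits + 1)
    return -math.expm1(-exponent)


if __name__ == "__main__":
    chunks = 1e15 / 8192  # hypothetical: 1 PB split into 8 KiB chunks
    for name, bits in [("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)]:
        print(f"{name:8} {collision_probability(chunks, bits):.1e}")
```

The probabilities are tiny but strictly positive, which is the distinction the slide draws; a byte-for-byte check removes the risk entirely at the cost of reading the stored data.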


ProtecTIER Advantage: Restore Performance

• Restoring data from a ProtecTIER solution is even FASTER than backing up
• ProtecTIER can easily restore at 2800 MB/sec!
• High restore performance is not limited to certain backup applications or specific data sets, unlike with other vendors
• High restore performance is achieved on real data with a realistic 20% change rate in production environments
• Never requires agents on backup servers

Other vendors’ “CPU-centric” architectures are optimized for processing hashes, not moving data.


ProtecTIER Advantage: Scalability

• A single ProtecTIER system can support up to 1 petabyte of usable capacity
• ProtecTIER supports the use of any IBM storage system (DS8000, DS5000, XIV, etc.) and most third-party storage systems for the repository
• IBM has hundreds of ProtecTIER systems with over 100 TB of usable capacity in production environments throughout the world
• IBM always states “usable capacity” and never uses the deceptive “raw capacity” terms, as other vendors do

The hidden costs associated with managing, maintaining, powering, and cooling multiple appliances are significant and should not be ignored!


ProtecTIER Advantage: Global Deduplication

Many vendors claim to have Global Deduplication but create multiple separate repositories that may contain redundant data!

• ProtecTIER Cluster with true Global Deduplication has been Generally Available and in production since 2008
• Supported with all major backup applications and available for all Open Systems, System z, and System i platforms
• No agents or backup server upgrades required
• Other vendors’ Global Deduplication capabilities are immature and incomplete, with very few if any systems in production
• Other vendors’ Global Dedupe is restricted to certain models, works only with NetBackup OST, and requires agents to be installed
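The difference between separate repositories and a single global index is easy to count. A minimal sketch, assuming each node records the chunk fingerprints written to it as strings; the node contents and the helper `stored_chunks` are hypothetical.

```python
def stored_chunks(repositories: list[list[str]], global_dedupe: bool) -> int:
    """Count chunks physically stored across all nodes.

    With separate repositories, each node deduplicates only against itself;
    with global deduplication, all nodes share one index.
    """
    if global_dedupe:
        return len(set().union(*repositories))
    return sum(len(set(repo)) for repo in repositories)


if __name__ == "__main__":
    # Hypothetical: two nodes receive backups that overlap heavily.
    node_a = ["os", "app", "db-mon", "db-tue"]
    node_b = ["os", "app", "db-wed", "db-thu"]
    print("separate repositories:", stored_chunks([node_a, node_b], False))  # 8
    print("global deduplication: ", stored_chunks([node_a, node_b], True))   # 6
```

The shared "os" and "app" chunks are stored twice when each node deduplicates on its own; that redundancy grows with the number of nodes.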


ProtecTIER Advantage: Inline Deduplication

Example: Disk activity needed to ingest and deduplicate 10 TB of backup data

• Post-process approach (hash-based): deduplicate after storing
  o 10 TB of data: write 10 TB, then read 10 TB (2× disk activity)
  o Requires: more storage, more I/Os, more time, more effort, more administration
• ProtecTIER inline approach (HyperFactor): deduplicate before storing
  o 10 TB of data: read or write 10 TB (1× disk activity)
  o Results: simple, faster, easier, cheaper, efficient


ProtecTIER Advantage: Inline Deduplication

[Timeline diagram, 8:00 PM to 8:00 AM (2:00 AM marked), comparing inline processing (backup server → ProtecTIER VT → tape library → truck; SLA is met) with post-processing (backup server → VTL → tape library → truck, with dedupe phases and a dedupe overlap)]


With an IBM ProtecTIER Solution you can . . .

• Store up to 25 times more data on disk
  o Up to 25:1 reduction with 100% data integrity
• Reduce backup and restore times
  o Fast inline deduplication up to 2000 MB/sec
  o Even faster restores up to 2800 MB/sec
• Improve the reliability of backup operations
  o Eliminates mechanical & handling failures
• Drive the cost of disk-based backup down
  o Reduces energy, cooling, and space required
• Increase data retention
  o Store more backup data on disk for a longer time with very little additional cost


IBM Customers

For More Information on IBM’s ProtecTIER

• The main ProtecTIER web page: www.ibm.com/systems/storage/tape/protectier


Trademarks and Disclaimers

© IBM Corporation 1994-2011. All rights reserved.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Information is provided "AS IS" without warranty of any kind.

The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Photographs shown may be engineering prototypes. Changes may be incorporated in production models.


Thank you for your attention