
Data Domain Protection for Emerging Big Data Platforms

Yatin Patil – Product Management
Jeff St. Cyr – Manager, Global Technology Office


The Industry Challenge

© Copyright 2017 Dell Inc.

A new breed of applications spreads into enterprises

Racing towards production use

How will you protect your critical data in these apps?


Often with transformative business outcomes

• Major credit card issuer – $2B in potential fraud incidents identified before any money was lost

• Electric car manufacturer – Growth that outperforms the market; can proactively identify & fix issues

• Weather forecaster – Accurate forecasts saved $$$; fewer insurance claims; optimal energy use

• Leading grocery chain – 45 straight quarters of growth; increased repeat business


But customers struggle with data protection and they are speaking with one voice

"… expecting 10x data growth, we need a proper backup & DR strategy for Hadoop & Greenplum systems, just like how we backup our other systems"
- a Wall Street bank

"(for Hadoop) we want a backup strategy involving daily incrementals & rollback to the last known good point."
- a consumer electronics company

"Hadoop needs a backup & DR story. We need a way to verifiably delete customer data after the mandated retention time"
- a large multi-national accounting firm

"Current homegrown utilities are too slow for weekly data ingests of 15-40TB"
- a data analytics & software-as-a-service company

"Backup and DR are critical for our Hadoop & Cassandra systems"
- an electric utility


Data lakes aggregate new & existing data sources

[Diagram labels: The Enterprise; Data Lake; Transactional / In-memory; Analytical Systems; Content / File Shares; NAS; NoSQL & other databases; Data warehouses; Event Streams; Unstructured & Semi-structured; Business Insights]


Big data applications lack a robust backup solution

[Diagram labels: The Enterprise; Data Lake; Transactional / In-memory; Analytical Systems; Content / File Shares; NAS; NoSQL Databases; Data warehouses; Event Streams; Unstructured & Semi-structured; Business Insights]

Diagram callouts:

• Crude or no backup story

• Snapshots & replication aren’t really a backup strategy

• Needs enterprise grade data protection

• Protected by existing backup solutions


Dell EMC database protection strategy

• ANY APPLICATION – Covering a broad range of mission-critical applications

• ANY SPEED, ANY SLO – Stay within your backup window and recover your data from any point in time

• ANY ADMINISTRATOR – Leverage your entire IT team to complete the backups needed for your organization

• ANY LOCATION – No matter where your data lives


The Industry Solution


Efficient, flexible, cloud-enabled protection

• EFFICIENT
 – Reduce storage requirements by 10–30x with variable-length deduplication
 – Gain industry-leading speed, scalability, and reliability

• FLEXIBLE
 – Back up directly from enterprise apps or primary storage
 – Deploy protection storage however you want it

• CLOUD-ENABLED
 – Natively tier deduped data to the cloud for modern long-term retention
 – Deliver data protection as a service with logical data isolation

… Powered by Data Domain software
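The variable-length deduplication mentioned on this slide can be illustrated with a toy sketch. This is not Data Domain's algorithm: the rolling value, the boundary mask, the chunk-size limits, and the names `split_chunks` and `dedup_store` are all assumptions chosen for illustration. The idea shown is that chunk boundaries are picked from the content itself, so content that repeats across backups tends to produce identical chunks, which need to be stored only once.

```python
import hashlib
import random


def split_chunks(data: bytes, mask: int = 0x3F, min_size: int = 16, max_size: int = 256):
    """Split data into variable-length chunks at content-defined boundaries.

    Toy scheme (an assumption, not a vendor algorithm): a boundary is declared
    when the low bits of a simple rolling value match the mask.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the masked low bits, so the boundary
        # decision depends only on the most recent few bytes of content.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (rolling & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_store(backups):
    """Return (logical_bytes, stored_bytes) after chunk-level deduplication."""
    store = {}
    logical = 0
    for data in backups:
        for chunk in split_chunks(data):
            logical += len(chunk)
            # Identical chunks share one fingerprint and are stored once.
            store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    stored = sum(len(chunk) for chunk in store.values())
    return logical, stored


if __name__ == "__main__":
    rng = random.Random(0)
    monday = bytes(rng.getrandbits(8) for _ in range(8192))
    # Tuesday's backup is Monday's data with a few bytes inserted near the front.
    tuesday = monday[:100] + b"new rows" + monday[100:]
    logical, stored = dedup_store([monday, tuesday])
    print(f"logical={logical} stored={stored} ratio={logical / stored:.2f}x")
```

Because boundaries follow the content rather than fixed offsets, a small insertion usually disturbs only the chunks near the change; fixed-size blocks would shift every later block and defeat the comparison.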


DD Boost – industry leading protocol

Application integration with Data Domain Boost

When you want:

• Faster backup performance

• Network bandwidth reduction

• Increased database availability

• Reduced impact on database server

• Simplified configuration management

                      DD6800                DD9800
Speed (DD Boost)      32 TB/hr              68 TB/hr
Speed (other)         14 TB/hr              31 TB/hr
Logical capacity      2.8–14.4 PB (1)       10–50 PB (1)
                      8.4–43.2 PB (2)       30–150 PB (2)
Usable capacity       Up to 288 TB (1)      Up to 1 PB (1)
                      Up to 864 TB (2)      Up to 3 PB (2)

(1) Total capacity on Active Tier only

(2) Total capacity with DD Cloud Tier software for long-term retention
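As a quick arithmetic check, the logical and usable capacity figures quoted for the DD6800 and DD9800 imply a data-reduction range of roughly 10x to 50x. The sketch below assumes decimal units (1 PB = 1000 TB) and assumes each logical range is quoted against the maximum usable capacity; `implied_ratio` and `CAPACITY` are illustrative names, not part of any product.

```python
TB_PER_PB = 1000  # assumption: decimal units (1 PB = 1000 TB)

# (model, tier) -> (logical capacity range in PB, maximum usable capacity in TB),
# transcribed from the capacity figures on this slide.
CAPACITY = {
    ("DD6800", "active tier"): ((2.8, 14.4), 288),
    ("DD6800", "with DD Cloud Tier"): ((8.4, 43.2), 864),
    ("DD9800", "active tier"): ((10, 50), 1000),
    ("DD9800", "with DD Cloud Tier"): ((30, 150), 3000),
}


def implied_ratio(logical_pb: float, usable_tb: float) -> float:
    """Logical capacity divided by usable capacity, both expressed in TB."""
    return logical_pb * TB_PER_PB / usable_tb


if __name__ == "__main__":
    for (model, tier), ((low_pb, high_pb), usable_tb) in CAPACITY.items():
        low = implied_ratio(low_pb, usable_tb)
        high = implied_ratio(high_pb, usable_tb)
        print(f"{model} {tier}: about {low:.1f}x to {high:.1f}x")
```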


DD Boost File System Plugin

DD Boost/BoostFS are ideal for large, clustered databases & Hadoop

• Application agnostic agent
 – Integrate any application without an SDK
 – Supported for MongoDB and Mongo Ops Manager, and MySQL

• Simple: file system interface
 – Access via a file system mount point: cp file /mount_point/ddboost

• Efficient: DD Boost efficiency
 – Sends data to Data Domain via DD Boost
 – With the network & storage efficiencies of DD Boost and Data Domain
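The mount-point access described on this slide means a backup script needs nothing beyond ordinary file operations. A minimal sketch, assuming the BoostFS mount already exists; `backup_to_mount` and its `require_mount` parameter are hypothetical names for illustration, and the copy is a plain file copy with no BoostFS-specific API involved.

```python
import os
import shutil


def backup_to_mount(src_file: str, mount_point: str, require_mount: bool = True) -> str:
    """Copy src_file into mount_point and return the destination path."""
    if require_mount and not os.path.ismount(mount_point):
        # Guard against silently writing to local disk when the mount is missing.
        raise RuntimeError(f"{mount_point} is not a mounted file system")
    # An ordinary file copy; the mounted file system carries the data onward.
    return shutil.copy2(src_file, mount_point)
```

Calling `backup_to_mount("file", "/mount_point/ddboost")` is the scripted equivalent of the `cp` command on the slide, with the added check that the target really is a mount.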


Platform Integration


Introducing the Hadoop application agent

True point-in-time backup and recovery of Hadoop data to Data Domain

Supports leading commercial distros: Cloudera & Hortonworks

Backup & recovery operations controlled by Hadoop admins

Linux CLI facilitates scripting & extensibility


Protecting Hadoop with Data Domain

Hadoop Application Agent: a backup application for Hadoop

[Diagram labels: Hadoop Cluster; Name Node; Data Nodes; HDFS; DAS or Shared Storage; Hadoop Admin; Hadoop App Agent; DD Boost File System Plugin; Backup data path; Data Domain; Backup Solution]

• An HDFS-integrated backup app for Hadoop
 – Point-in-time backup & recovery
 – Back up HDFS directories & HBase tables

• Empowers Hadoop admins to back up their data
 – Using Hadoop native tooling (MapReduce, distcp)

• Storage agnostic: DAS and NAS configurations

• Add multiple Data Domains to grow capacity

• Most efficient data path to backup storage

• Supported on:
 – Cloudera Enterprise 5.4 – 5.9
 – Hortonworks Data Platform 2.2 – 2.5
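The slide does not show the Hadoop application agent's own command line, so the sketch below does not attempt to reproduce it. It only illustrates the general idea of a point-in-time copy using Hadoop native tooling: take an HDFS snapshot, then copy the frozen view with distcp. The function name, the example paths, and the assumption that the backup target is reachable as a `file://` URI are all hypothetical.

```python
def snapshot_backup_commands(hdfs_dir: str, snapshot_name: str, target_uri: str):
    """Return the command lines for a snapshot-then-copy backup (not executed)."""
    # HDFS exposes each snapshot under <dir>/.snapshot/<name>.
    source = f"{hdfs_dir.rstrip('/')}/.snapshot/{snapshot_name}"
    return [
        # Freeze a read-only, point-in-time view of the directory.
        ["hdfs", "dfs", "-createSnapshot", hdfs_dir, snapshot_name],
        # Copy the frozen view in parallel with a MapReduce job.
        ["hadoop", "distcp", source, target_uri],
    ]


if __name__ == "__main__":
    # Hypothetical directory, snapshot name, and target path.
    for cmd in snapshot_backup_commands(
        "/data/warehouse", "nightly", "file:///mnt/ddboost/hadoop/nightly"
    ):
        print(" ".join(cmd))
```

Two caveats apply to this generic approach: the directory must first be made snapshottable by an administrator (`hdfs dfsadmin -allowSnapshot`), and a `file://` target has to be mounted on every node that runs copy tasks.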


Protecting MongoDB with Data Domain

File system ease of use with the power of DD Boost

[Diagram labels: MongoDB Server; Linux mount point /backup; DD Boost]

• Supported on:
 – MongoDB Ops Manager 2.6.3.0, 3.3, 3.4

• Dump MongoDB to Data Domain (via BoostFS)
 – Get storage efficiency due to deduplication
 – Network bandwidth efficiency due to Boost
 – Use parallel connections
 – mongodump --db testdb --numParallelCollections 5 --out /backup/

• Simple to deploy
 – Install BoostFS on the MongoDB server/Ops Manager server
 – Create a mount point /backup
 – Mount the Data Domain storage unit using BoostFS

• Supports WiredTiger & MMAPv1 storage engines

• Best practices
 – Use Ops Manager v3.4 (larger file size writes)
 – Up to 63 streams per BoostFS plug-in
 – mongodump writes backup files uncompressed from WiredTiger or MMAPv1 storage engines
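For scripted backups, the mongodump invocation from this slide can be assembled and run from a small wrapper. This is a minimal sketch: `build_mongodump_cmd` and `run_backup` are hypothetical helper names, and treating each parallel collection as one BoostFS stream (and therefore capping it at 63) is an assumption made for illustration, not a documented rule.

```python
import subprocess

MAX_STREAMS = 63  # "up to 63 streams per BoostFS plug-in", per the best practices


def build_mongodump_cmd(db: str, out_dir: str = "/backup/", parallel_collections: int = 5):
    """Build the mongodump argument list used on the slide."""
    # Assumption: one parallel collection is counted as one BoostFS stream.
    if not 1 <= parallel_collections <= MAX_STREAMS:
        raise ValueError(f"parallel_collections must be between 1 and {MAX_STREAMS}")
    return [
        "mongodump",
        "--db", db,
        "--numParallelCollections", str(parallel_collections),
        "--out", out_dir,
    ]


def run_backup(db: str, out_dir: str = "/backup/") -> None:
    """Run the dump; raises CalledProcessError if mongodump exits non-zero."""
    subprocess.run(build_mongodump_cmd(db, out_dir), check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` makes a failed dump stop the calling script instead of passing silently.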


Pivotal Greenplum data protection

DD Boost integrates into Greenplum’s database backup commands

[Diagram labels: Greenplum cluster; DD Boost]

• Supported versions:
 – 4.2.1 through 4.3.10

• gpcrondump utility
 – Wrapper utility around gp_dump
 – Dumps to a Data Domain storage unit via DD Boost
 – Compressed by default; with object consistency

• Use gpcrondump to back up
 – Databases, schemas, & tables
 – gpcrondump -x mydatabase -z -v --ddboost

• Use gpdbrestore for recovery
 – GP_RESTORE and GPDBRESTORE
 – Greenplum database system is online and running
 – Has the same primary segment instances as the system backed up
 – Database being restored exists but is empty, or use -e to drop & create
 – gpdbrestore -t backup_timestamp -v --ddboost

• Incremental backup and restore (GPDB 4.2.5)
 – AO tables and partitions
 – ALTER TABLE, INSERT, TRUNCATE, DROP & RECREATE
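The two Greenplum commands on this slide can be wrapped for use in scheduled jobs. A minimal sketch using only the options that appear on the slide; the helper names and the example timestamp are hypothetical. Note that, as far as the Greenplum 4.3 utility reference describes it, `-z` turns off gpcrondump's default compression, which is the usual choice when the target storage deduplicates.

```python
def build_gpcrondump_cmd(database: str, compress: bool = False, verbose: bool = True):
    """Build a gpcrondump argument list that writes to Data Domain via DD Boost."""
    cmd = ["gpcrondump", "-x", database]
    if not compress:
        cmd.append("-z")  # -z disables the default compression of dump files
    if verbose:
        cmd.append("-v")
    cmd.append("--ddboost")
    return cmd


def build_gpdbrestore_cmd(timestamp: str, drop_and_create: bool = False, verbose: bool = True):
    """Build a gpdbrestore argument list for the backup set named by timestamp."""
    cmd = ["gpdbrestore", "-t", timestamp]
    if drop_and_create:
        cmd.append("-e")  # drop and re-create the target database before restoring
    if verbose:
        cmd.append("-v")
    cmd.append("--ddboost")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_gpcrondump_cmd("mydatabase")))
    # Hypothetical timestamp key for illustration only.
    print(" ".join(build_gpdbrestore_cmd("20170508010000", drop_and_create=True)))
```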


Protecting MySQL Databases with Data Domain

Integration and qualification of 3 different applications

[Diagram label: DD Boost]

• Supported versions:
 – 5.6 & 5.7

• Dump MySQL to Data Domain (via BoostFS)
 – Get storage efficiency due to deduplication
 – Network bandwidth efficiency due to Boost
 – mysqlbackup command

• MyDumper

• Percona XtraBackup

Backup Application    MySQL Enterprise Backup    Mydumper    XtraBackup
Single                30%                        35%         25%
Multiple              N/A                        50%         40%


Protecting EnterpriseDB with Data Domain

DD Boost

Integration and qualification of 3 different applications

• Dump EnterpriseDB to Data Domain (via BoostFS)
 – Get storage efficiency due to deduplication
 – Network bandwidth efficiency due to Boost

• Backup tools qualified
 – BART – Enterprise
 – pg_dump – Standard (Community)
 – pg_rman – Open Source

• Supported distros
 – Standard v9.5 & 9.6
 – Enterprise v9.5 & 9.6


DD Boost for everyone

• Expanding the benefits of DD Boost to even more applications with DD Boost File System Plug-in

• Can be deployed in minutes to reduce backup windows and storage capacity

• Same advanced DD Boost features in a file system format


What does AVT do? Why do I need it?

• AVT saves you time and money by ensuring your application will benefit from using BoostFS before going into production.

• AVT is a POC in a box that measures the benefits of BoostFS with your workload:
 – Shorter backup windows compared to NFS
 – Greater storage efficiencies with data deduplication

• Recommended for any application or workload that uses NFS for data protection and is NOT listed in the Integration Guide.

What guide? This one: https://community.emc.com/docs/DOC-55465




Learn more: join the conversation

@DellEMCProtect

Dell EMC Storage and Data Protection

Dell EMC Data Protection Community

Data Protection on EMC.com

Mozy.com

Spanning.com


You may also be interested in these sessions …

Session – Breakout session title – First session, second session

• dps.01 – Enterprise Copy Data Management: Primary & Protection Copy Management Best Practices – Mon 01:30

• dps.06 – Dell EMC Data Domain: What's New For 2017 – Mon 08:30, Wed 01:30

• dps.07 – Dell EMC Data Protection Suite: What's New For 2017 – Tue 03:00, Wed 12:00

• dps.13 – Data Domain Protection For Microsoft Applications: SQL, SharePoint & Exchange – Wed 03:00, Thu 11:30

• dps.14 – Data Domain Protection For Large Enterprise Databases: Oracle, SAP & IBM – Mon 12:00, Thu 08:30

• bof.14 – Birds Of A Feather: Data Domain Ask The Experts – Tue 01:30
