scale-out data deduplication architecture
TRANSCRIPT
![Page 1: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/1.jpg)
Scale-out Data
Deduplication Architecture
Gideon Senderov
Product Management & Technical Marketing
NEC Corporation of America
![Page 2: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/2.jpg)
Outline
• Data Growth and Retention
• Deduplication Methods
• Legacy Architecture Limitations
• Adaptive Platform for Long-term Data
• Enhanced Scale-out Attributes
– Scalability
– Performance
– Resiliency
Page 2
![Page 3: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/3.jpg)
Long-term Data
Source: IDC 2010
0
10
20
30
40
50
60
70
80
90
2008 2009 2010 2011 2012 2013 2014
Content Depots & CloudUnstructured dataReplicated dataStructured data
Consumption of Enterprise
Disk Capacity by Type
CAGR
23.6%
54.8%
(EB)
24.2%
75.6%
88%
• According to SNIA, 80% of files are dormant, unchanged.
• According to Contoural, average file shares grow 40% annually.
• Non-mission critical data filling costly Tier 1 storage
Page 3
![Page 4: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/4.jpg)
Increasing Enterprise Size
Data Outlasts Its Container
Page 4
Months Years Decades Centuries
DATA
MINING
FILE
SHARING
FILE
SHARING
WWW
DATA
MINING
MIGRATE
DATA
MIGRATE
DATA
MIGRATE
DATA
![Page 5: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/5.jpg)
Data Migration Costs
• In 2011 there will be 1775 Exabytes of information
• Archive data outlasts a storage container, data migrate
costs $1K – $1.8K per TB
• 15% of Storage Admin’s time spent on data migration
Page 5
Source: IDC, Gartner, Contoural Consulting
![Page 6: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/6.jpg)
Deduplication Granularity
• File (SIS)
– Data deduplication performed at a file or object granularity
• Sub-file (Block)
– Data deduplication performed at a sub-file granularity
• Fixed size chunk
• Variable size chunk
• Sub-file variable size chunk deduplication benefits
– Greater efficiency by deduplicating data at finer granularity
– Automatic adjustments (“slide”) across inserted data
Page 6
![Page 7: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/7.jpg)
Inline vs. Post-process
• Inline
– Data deduplication performed before writing the deduplicated data
• Post-Process
– Data deduplication performed after the data to be deduplicated has been initially stored
• Inline deduplication benefits
– Eliminate need for disk buffer for un-deduplicated data
– Eliminate need for subsequent “idle time” for processing
– Immediate replication and shorter RTO for DR
Page 7
![Page 8: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/8.jpg)
Generic vs. Application-aware
• Generic
– Deduplication with same algorithm against all data streams
• Application-aware
– Custom deduplication for specific data streams to account for metadata inserted by the corresponding applications
• Application-aware deduplication benefits
– Greater efficiency by eliminating negative impact of application metadata on deduplication ratio
– Cross-application deduplication for greater scope and higher deduplication efficiency
Page 8
![Page 9: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/9.jpg)
Local vs. Global
• Local
– Deduplication across a single node or sub-node, requiring separate deduplication repositories for multiple nodes
• Global
– Deduplication spanning multiple nodes with a single shared deduplication repository across all nodes
• Global deduplication benefits
– Greater scalability of a single deduplication repository
– Greater efficiency with a broader scope spanning multiple nodes
Page 9
![Page 10: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/10.jpg)
Legacy Scalability Limitations
• Inadequate scalability of capacity & performance
– Cannot scale performance to keep up with growth
– Multiple products with different architectures
– More siloed capacity to manage
• Limited deduplication scope
– Limited scalability proliferates duplicate data across appliances
– Lower deduplication ratio for large environments
Page 10
![Page 11: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/11.jpg)
Local vs. Global Scope
• Single deduplication repository across entire solution
• Data deduplication across ALL data from ALL nodes
– Cross-node dedupe for greater efficiency
– Cross-application dedupe with application-awareness
Page 11
Local – Volume
Dedupe
Repository
Dedupe
Repository
Dedupe
Repository
Local – Node
Dedupe
Repository
Dedupe
Repository
Global
Dedupe
Repository
![Page 12: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/12.jpg)
Legacy Resiliency Limitations
Page 12
5 6 7 8
2 3 41
Q: What data
can be restored
if block #1
lost?
A: NONE!
14 6 2 1
7 1 4
6
1
3
6
4 6
1
7
6
8
1 4
7
5 1
Day 1: FullDay 2:
Incremental…Day 7: Full
Day 3:
Incremental
1
Traditional RAID is not sufficient for deduped data
![Page 13: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/13.jpg)
Non-disruptive
RemoveNon-disruptive
Add, Upgrade
V1 Grid + V1 + V2 + Vx = 1 System
Page 13Page 13
Adaptive Scale-out ArchitectureOne system becomes Greener, Faster, Denser
• Concurrently support multiple HW generations
• Non-disruptive resource scaling
• Zero provisioning, dynamic self-management
• Automatically balance system workload
• In-place technology refresh – No migration
![Page 14: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/14.jpg)
Page 14Page 14
Independent Linear Scalability
• Customized performance and capacity per
application
• Dynamically adjust to changing data growth
Uniquely Scale
Performance and Capacity
to Meet Current and
Future Needs
Capacity
Pe
rfo
rma
nce
![Page 15: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/15.jpg)
SAFE DEDUPEProtection
SCALABLE DEDUPEPerformance
GLOBAL DEDUPEScope
AttributesCriteria
Page 15
Enhanced Scale-out Attributes
![Page 16: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/16.jpg)
Scope
• Multiple applications and data sources across a single scalable deduplication repository
• Broader scope leads to greater efficiencies
– Higher dedupe ratios
– Improved capacity utilization
– Easier management
– Longer retention periods
– Lower costs
• Single scalable system vs. multiple disparate silos
– Cross-application deduplication for greater efficiency
– Global deduplication for entire environment
Page 16
![Page 17: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/17.jpg)
Performance
• Linear performance scalability to keep up with data growth
• Scalability
– Linear scalability vs. high degradation
– Independent performance scalability
– Beyond the physical boundaries of the system
• Inline dedupe vs. post-process
– Keeping up with the workload
• Effects of deduplication rates
– Higher performance with higher dedupe
Page 17
![Page 18: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/18.jpg)
Scale-out Dedupe Architecture
• Multi-node architecture
– Inline data routing
– Processing and memory scales with capacity
• Distributed hash table
• Linear performance scalability
• Global deduplication across ALL nodes
Page 18
![Page 19: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/19.jpg)
Linear Performance Scalability
Page 19
![Page 20: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/20.jpg)
Resiliency
• Enhanced data resiliency, especially with fewer copies due to deduplication
• The greater the dedupe ratio, the greater the exposure
• Maximum protection against multiple failures
– Component failures
– Appliance or node failures
• Upgradeability and technology refresh
– In-place technology refresh vs. forklift upgrade
– Online versus downtime
– Configurable resiliency for different applications
– Investment protection
Page 20
![Page 21: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/21.jpg)
Erasure-coded Resiliency
• “User dialable” disk/node protection
– Intermix of multiple resiliency levels
– Dynamically allocated protection
• Greater protection with less overhead
– No idle spare drives
• Faster healing while maximizing I/O performance
– Only data is reconstructed
– Reconstruction across multiple spindles/processors
Page 21
![Page 22: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/22.jpg)
Long-term Evolution
• Functionality
– Application-specific requirements (Security, Resiliency)
– Regulatory compliance requirements
• Performance
– Linear scalability throughput HW/SW enhancements
– Multiple I/O profiles
• Attributes
– Flexibility
– Efficiency
– Granularity
Page 22
![Page 23: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/23.jpg)
Page 23
Long-term Data Protection
Sto
rag
e S
pe
nd
ing
TimeMonths Years Decades Centuries
Clones/Snapshots
WAN-Optimized Replication
Erasure-Coded Resiliency
Inline Global Deduplication
Automatic Load Balancing
Dynamic Provisioning
Massive ScalabilityFeatures drive
savings
![Page 24: Scale-out Data Deduplication Architecture](https://reader030.vdocuments.us/reader030/viewer/2022012920/61c7ad825809fb120a421947/html5/thumbnails/24.jpg)