-
8/9/2019 Deep Store
1/25
Deep Store: An Archival Storage System Architecture
Lawrence L. You, Kristal T. Pollack, Darrell D. E. Long
Slides by: Brian Madden
Friday, May 7, 2010
-
Motivation
Estimated that over five exabytes of data were produced in 2002
Over 30% increase from the previous year
37% of stored data is immutable
Expected to grow to more than 50% in the next year (2003)
Federal regulations (Sarbanes-Oxley, etc.)
-
Challenges
Cost - Must be space efficient to reduce cost
Scalability - Must scale to accommodate new data
Reliability - Data has to be there later
Retrieval - Data is useless if it can't be found
-
Desired Properties
Reduced storage cost
Immutability of data
Dynamically scalable
Highly reliable
Archival compliance
-
Architecture Overview
Primary Abstractions:
Storage objects
Physical storage components
Software architecture
Storage interface
-
Primary Abstractions
Storage objects
File - single contiguous stream of binary data
Identified by content; hash used as content address
Metadata - filename, length, etc.
Also identified by content address
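
A minimal sketch of content addressing, assuming SHA-256 as the hash (the slides do not name one). The ContentAddressableStore class and its put/get methods are hypothetical names used only for illustration. The point it shows: identical data always maps to the same address, so a second copy takes no extra space.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex digest used as the content address (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ContentAddressableStore:
    """Hypothetical in-memory store keyed by content address."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        addr = content_address(data)
        # Storing the same bytes twice is a no-op: the address already exists.
        self._blobs.setdefault(addr, data)
        return addr

    def get(self, addr: str) -> bytes:
        return self._blobs[addr]

    def __len__(self) -> int:
        return len(self._blobs)
```

File data and metadata can both go through the same put call, since each is just a stream of bytes identified by its own hash.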
-
Storage Nodes
Nodes - the primary storage unit:
Contains a processor, memory, and low-cost disk
Nodes connect to form a storage cluster
-
Software Architecture
Consists of:
Archival storage service
Temporary storage buffer
Content analyzer
Content addressable store
-
PRESIDIO
Progressive Redundancy Elimination of Similar and Identical Data In Objects
Uses various compression and delta encoding schemes to reduce on-disk data size
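
Delta encoding stores a new object as a list of differences against a similar object that is already archived. The sketch below uses Python's difflib.SequenceMatcher as a stand-in matcher; it is not the encoder PRESIDIO uses, and the copy/insert operation format is an assumption made for illustration.

```python
import difflib


def delta_encode(base: bytes, target: bytes) -> list:
    """Describe target as copies from base plus literal inserts."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes shared with the base are referenced, not stored again.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Only bytes that are new in the target are stored literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no operation: those base bytes are simply skipped.
    return ops


def delta_decode(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base and the operation list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += base[start:start + length]
        else:
            out += op[1]
    return bytes(out)
```

The space saving comes from the insert operations being small when the two objects are similar; for unrelated objects the delta degenerates into one large insert.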
-
PRESIDIO
Virtual Object:
Handle - contains content address
Constant data block - Binary data
Virtual data block - Polymorphically constructed data block
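
One way to picture these three pieces, as a hedged sketch: a handle names an object by content address, a constant block holds bytes directly, and a virtual block rebuilds its bytes from other stored blocks. ChunkListBlock is a hypothetical example of a virtual block; the slides do not enumerate the actual block types, and SHA-256 and the dict-based store are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class Handle:
    """Names an object by its content address (SHA-256 is an assumption)."""
    address: str


class DataBlock(ABC):
    @abstractmethod
    def read(self, store: dict) -> bytes:
        """Return the bytes this block represents."""


@dataclass
class ConstantDataBlock(DataBlock):
    data: bytes  # literal binary data

    def read(self, store: dict) -> bytes:
        return self.data


@dataclass
class ChunkListBlock(DataBlock):
    """Hypothetical virtual block: concatenation of other stored blocks."""
    parts: list  # list of Handle

    def read(self, store: dict) -> bytes:
        return b"".join(store[h].read(store) for h in self.parts)


def put_constant(store: dict, data: bytes) -> Handle:
    handle = Handle(hashlib.sha256(data).hexdigest())
    store[handle] = ConstantDataBlock(data)
    return handle


def put_chunk_list(store: dict, parts: list) -> Handle:
    block = ChunkListBlock(parts)
    # The address is the hash of the reconstructed content, not of the recipe.
    handle = Handle(hashlib.sha256(block.read(store)).hexdigest())
    store[handle] = block
    return handle
```

Callers only ever call read on whatever block a handle resolves to, which is what makes the construction polymorphic.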
-
PRESIDIO
Objects are encapsulated by...
Group: contains a number of megablocks
Megablock: 16 MB to 4 GB groups of data
Maximizes contiguous writes
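
A rough sketch of how append-only megablocks keep writes contiguous. The Megablock and Group classes, their method names, and the (index, offset, length) location tuple are all hypothetical; only the 16 MB lower bound on megablock size comes from the slide.

```python
class Megablock:
    """Append-only container with a fixed capacity in bytes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = bytearray()

    def free(self) -> int:
        return self.capacity - len(self.data)

    def append(self, payload: bytes) -> int:
        # New data always lands at the current end, so writes stay sequential.
        offset = len(self.data)
        self.data += payload
        return offset


class Group:
    """A list of megablocks; only the last one accepts new writes."""

    def __init__(self, megablock_capacity: int = 16 * 1024 * 1024):
        self.megablock_capacity = megablock_capacity
        self.megablocks = [Megablock(megablock_capacity)]

    def append(self, payload: bytes) -> tuple:
        if len(payload) > self.megablock_capacity:
            raise ValueError("object larger than a megablock")
        if self.megablocks[-1].free() < len(payload):
            # Current megablock is full enough: seal it and start a new one.
            self.megablocks.append(Megablock(self.megablock_capacity))
        index = len(self.megablocks) - 1
        offset = self.megablocks[index].append(payload)
        return index, offset, len(payload)

    def read(self, location: tuple) -> bytes:
        index, offset, length = location
        return bytes(self.megablocks[index].data[offset:offset + length])
```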
-
PRESIDIO
Each group stored on a node
Groups can have varying levels of replication and coding for reliability
A DHT is maintained mapping nodes to group numbers
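
The slides do not say how the DHT is organized. As a stand-in, the sketch below uses rendezvous (highest-random-weight) hashing, which lets any client compute which nodes hold a group without consulting a central table. The function names, the node names in the usage, and the replicas parameter are hypothetical.

```python
import hashlib


def _score(node: str, group: int) -> int:
    """Deterministic pseudo-random weight for a (node, group) pair."""
    digest = hashlib.sha256(f"{node}:{group}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def nodes_for_group(group: int, nodes: list, replicas: int = 1) -> list:
    """Pick the nodes with the highest scores for this group number."""
    ranked = sorted(nodes, key=lambda n: _score(n, group), reverse=True)
    return ranked[:replicas]
```

For example, nodes_for_group(42, ["node-a", "node-b", "node-c"], replicas=2) returns the two nodes that should hold copies of group 42. A per-group replicas value is one simple way to express varying replication levels; removing a node that does not hold a group leaves that group's placement unchanged.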
-
Metadata
Current metadata is kind of lame...
Rich, extensible metadata is much more useful!
For search
To help future consumers understand the data, formats, etc.
-
Metadata
Extended/rich metadata is counterproductive to space efficiency
Don't want to compress it
Instead, store metadata according to its role in the system
-
Metadata
Search metadata stored in query-enabled structure
System metadata stored in fast-lookup structure
Archival metadata stored in system itself
Versioned, compressed losslessly
-
Metadata
To enable space-efficient versioning
Delta compression
XML tree merging
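
To illustrate why delta compression suits versioned metadata, the sketch below compresses each new version against the previous one using zlib's preset-dictionary feature. This is a stand-in technique, not the scheme Deep Store uses, and it does not cover XML tree merging. The function names are hypothetical.

```python
import zlib


def compress_version(new: bytes, previous: bytes = b"") -> bytes:
    """Compress a metadata version, priming zlib with the previous version."""
    if previous:
        # zlib only looks back 32 KB, so this suits small metadata records.
        comp = zlib.compressobj(level=9, zdict=previous)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(new) + comp.flush()


def decompress_version(blob: bytes, previous: bytes = b"") -> bytes:
    """Recover a version; the same previous version must be supplied."""
    if previous:
        decomp = zlib.decompressobj(zdict=previous)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()
```

When two versions differ in only a field or two, the output compressed against the previous version is typically far smaller than compressing the new version on its own, and the result is still lossless.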
-
Evaluation
Feature selection program ran at 19.7 MB/s on a 2.66 GHz P4
Delta encoding ran at 8.7 MB/s
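
To put these rates in perspective, a back-of-the-envelope calculation of processing time at each rate. It assumes a single processor running at the quoted sustained rate with no I/O stalls; the helper name and the 1000 MB example size are made up for illustration.

```python
FEATURE_SELECTION_MB_S = 19.7  # rate quoted on the slide
DELTA_ENCODING_MB_S = 8.7      # rate quoted on the slide


def seconds_to_process(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to push size_mb through a stage running at rate_mb_per_s."""
    return size_mb / rate_mb_per_s


if __name__ == "__main__":
    for name, rate in [("feature selection", FEATURE_SELECTION_MB_S),
                       ("delta encoding", DELTA_ENCODING_MB_S)]:
        print(f"{name}: {seconds_to_process(1000, rate):.1f} s per 1000 MB")
```

At these rates, 1000 MB takes roughly 51 seconds for feature selection and roughly 115 seconds for delta encoding.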
-
Bottom Line
Using the compression technique that best fits the data yields better results
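
A small sketch of the "best fit" idea: try several methods on each object and keep whichever output is smallest. The standard-library codecs here are stand-ins for PRESIDIO's own methods, and the CODECS table and function names are hypothetical.

```python
import bz2
import lzma
import zlib

# name -> (encode, decode); "raw" means store the bytes unchanged.
CODECS = {
    "raw": (lambda d: d, lambda d: d),
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_best(data: bytes) -> tuple:
    """Return (codec name, encoded bytes) for the smallest encoding."""
    best_name, best_blob = "raw", data
    for name, (encode, _) in CODECS.items():
        blob = encode(data)
        if len(blob) < len(best_blob):
            best_name, best_blob = name, blob
    return best_name, best_blob


def decompress(name: str, blob: bytes) -> bytes:
    """Decode using the codec name recorded at write time."""
    return CODECS[name][1](blob)
```

The codec name has to be stored alongside the data so the object can be decoded later. Trying every method costs extra CPU time at write time, which is usually an acceptable trade in an archive that is written once and kept for a long time.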
-
Conclusions
Deep Store...
is an archival storage framework consisting of abstractions for data objects
includes content analysis and PRESIDIO
proposes rich, extensible metadata
proposes value-based chunk redundancy