deep store

  • 8/9/2019 Deep Store

    1/25

    Deep Store: An Archival Storage System Architecture

    Lawrence L. You, Kristal T. Pollack, Darrell D. E. Long

    Slides by: Brian Madden

    Friday, May 7, 2010

    Motivation

    Estimated that over five exabytes of data were produced in 2002

    Over 30% increase from previous year

    37% of stored data is immutable

    Expected to grow to more than 50% in the next year (2003)

    Federal regulations (Sarbanes-Oxley, etc.)

    Challenges

    Cost - Must be space efficient to reduce cost

    Scalability - Must scale to accommodate new data

    Reliability - Data has to be there later

    Retrieval - Data is useless if it can't be found

    Desired Properties

    Reduced storage cost

    Immutability of data

    Dynamically scalable

    Highly reliable

    Archival compliance

    Architecture Overview

    Primary Abstractions:

    Storage objects

    Physical storage components

    Software architecture

    Storage interface

    Primary Abstractions

    Storage objects

    File - single contiguous stream of binary data

    Identified by content; hash used as content address

    Metadata - filename, length, etc.

    Also identified by content address
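
To make content addressing concrete, here is a minimal sketch in which the address of an object is simply a hash of its bytes. The choice of SHA-1, the `ContentAddressableStore` class, and the sample file and metadata are assumptions for illustration; the slides only say that a hash is used as the content address.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex digest that serves as the content address of `data`."""
    # SHA-1 is an assumption here; the slides only say "hash".
    return hashlib.sha1(data).hexdigest()


class ContentAddressableStore:
    """Toy in-memory store keyed by content address (hypothetical)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        # Identical content maps to the same address, so it is kept once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentAddressableStore()
file_addr = store.put(b"archived report contents")
dup_addr = store.put(b"archived report contents")  # same bytes, same address
# Metadata is just another object with its own content address.
meta_addr = store.put(b'{"filename": "report.txt", "length": 24}')
```

Storing the same bytes twice yields the same address and a single stored copy, which is what makes identical-file elimination fall out of the addressing scheme.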

    Storage Nodes

    Nodes - the primary storage unit:

    Contains a processor, memory, and low-cost disk

    Nodes connect to form a storage cluster

    Software Architecture

    Consists of:

    Archival storage service

    Temporary storage buffer

    Content analyzer

    Content addressable store

    PRESIDIO

    Progressive Redundancy Elimination of Similar and Identical Data In Objects

    Uses various compression and delta encoding schemes to reduce on-disk data size
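
One simple way to eliminate identical data across objects is to split each object into chunks and store every distinct chunk once. The sketch below uses fixed-size chunks and SHA-1 addresses purely as assumptions; the slides do not say how PRESIDIO chunks data, and `store_chunks` and `rebuild` are hypothetical helper names.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size, chosen only for illustration


def store_chunks(data: bytes, chunk_store: dict) -> list:
    """Split `data` into fixed-size chunks and store each distinct chunk once.

    Returns the list of chunk addresses needed to rebuild `data`.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        address = hashlib.sha1(chunk).hexdigest()
        chunk_store.setdefault(address, chunk)  # duplicate chunks are skipped
        recipe.append(address)
    return recipe


def rebuild(recipe: list, chunk_store: dict) -> bytes:
    """Concatenate the stored chunks named by `recipe`."""
    return b"".join(chunk_store[address] for address in recipe)


chunks = {}
first = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
second = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # shares its first chunk
first_recipe = store_chunks(first, chunks)
second_recipe = store_chunks(second, chunks)
```

Here the two objects total four chunks, but only three distinct chunks end up in `chunks`, because the shared leading chunk is stored once.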

    PRESIDIO

    Virtual Object:

    Handle - contains content address

    Constant data block - Binary data

    Virtual data block - Polymorphically constructed data block
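
The virtual object structure can be sketched as a small class hierarchy: a handle holding the content address, plus a data block that is either constant bytes or something that is reconstructed on demand. The class names, the `materialize` method, and the concatenation-based virtual block are all assumptions for illustration; the slides do not specify which reconstruction methods exist.

```python
import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


class DataBlock(ABC):
    """Anything that can produce the bytes it represents."""

    @abstractmethod
    def materialize(self) -> bytes:
        ...


@dataclass
class ConstantBlock(DataBlock):
    """Constant data block: holds the binary data directly."""
    data: bytes

    def materialize(self) -> bytes:
        return self.data


@dataclass
class ConcatBlock(DataBlock):
    """One hypothetical kind of virtual data block: built from other blocks."""
    parts: List[DataBlock]

    def materialize(self) -> bytes:
        return b"".join(part.materialize() for part in self.parts)


@dataclass
class VirtualObject:
    handle: str       # content address of the reconstructed data
    block: DataBlock  # constant or virtual; callers do not need to know which

    def read(self) -> bytes:
        data = self.block.materialize()
        # Verify the reconstruction against the handle (SHA-1 is assumed).
        if hashlib.sha1(data).hexdigest() != self.handle:
            raise ValueError("reconstructed data does not match handle")
        return data


payload = b"HEADbody bytes"
composite = ConcatBlock([ConstantBlock(b"HEAD"), ConstantBlock(b"body bytes")])
obj = VirtualObject(handle=hashlib.sha1(payload).hexdigest(), block=composite)
```

Because `read` only calls `materialize`, new reconstruction methods (for example a delta against another object) could be added as further `DataBlock` subclasses without changing the caller.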

    PRESIDIO

    Objects are encapsulated by...

    Group: contains a number of megablocks

    Megablock: 16 MB to 4 GB groups of data

    Maximizes contiguous writes

    PRESIDIO

    Each group stored on a node

    Groups can have varying levels of replication and coding for reliability

    A DHT is maintained mapping nodes to group numbers
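
As a rough illustration of locating a group's node by hashing, the sketch below places nodes and group numbers on a consistent-hash ring. This is a single-process stand-in, not a distributed hash table, and the ring construction, the `GroupLocator` class, and the node names are assumptions; the slides do not describe how the DHT is built.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a position on the hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class GroupLocator:
    """Toy consistent-hash lookup from group number to storage node."""

    def __init__(self, nodes):
        # Sort nodes by ring position so lookups can use binary search.
        self._ring = sorted((_point(node), node) for node in nodes)
        self._points = [point for point, _ in self._ring]

    def node_for(self, group_number: int) -> str:
        # The owning node is the first one clockwise from the group's position.
        index = bisect.bisect_right(self._points, _point(f"group-{group_number}"))
        return self._ring[index % len(self._ring)][1]


nodes = ["node-a", "node-b", "node-c"]
locator = GroupLocator(nodes)
owner = locator.node_for(7)
```

Any client that knows the node list computes the same owner for a given group number, so no central lookup table is needed in this simplified picture.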

    Metadata

    Current metadata is kind of lame...

    Rich, extensible metadata is much more useful!

    For search

    To help future consumers understand the data, formats, etc.

    Metadata

    Extended/rich metadata is counterproductive to space efficiency

    Don't want to compress it

    Instead, store metadata according to its role in the system

    Metadata

    Search metadata stored in query-enabled structure

    System metadata stored in fast-lookup structure

    Archival metadata stored in system itself

    Versioned, compressed losslessly

    Metadata

    To enable space efficient versioning

    Delta compression

    XML tree merging
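
To show why delta compression suits versioned metadata, the sketch below compresses a new version using the previous version as a zlib preset dictionary, so only the differences cost space. This is a stand-in technique rather than the scheme used in Deep Store: zlib dictionaries only cover a 32 KB window, and the sample XML records and the helper names `delta_encode` and `delta_decode` are hypothetical.

```python
import zlib


def delta_encode(base: bytes, new: bytes) -> bytes:
    """Compress `new`, using `base` as a preset dictionary."""
    compressor = zlib.compressobj(zdict=base)
    return compressor.compress(new) + compressor.flush()


def delta_decode(base: bytes, delta: bytes) -> bytes:
    """Recover the new version from `delta` and the same `base`."""
    decompressor = zlib.decompressobj(zdict=base)
    return decompressor.decompress(delta) + decompressor.flush()


# Two versions of a made-up metadata record that differ in one field.
v1 = (
    b"<metadata><filename>report-2002.txt</filename><length>48213</length>"
    b"<owner>archive-team</owner><format>text/plain; charset=utf-8</format>"
    b"<retention>seven years</retention>"
    b"<keywords>storage, archival, compliance</keywords></metadata>"
)
v2 = v1.replace(b"48213", b"48377")

delta = delta_encode(v1, v2)
standalone = zlib.compress(v2)
restored = delta_decode(v1, delta)
```

The delta is smaller than compressing `v2` on its own because almost all of `v2` can be expressed as references into `v1`; the trade-off is that `v1` must be available to read `v2`.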

    Evaluation

    Feature selection program ran at 19.7 MB/s on a 2.66 GHz P4

    Delta encoding ran at 8.7 MB/s

    Bottom Line

    Using a compression technique that best fits the data yields better results
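
The idea of picking the technique that best fits the data can be sketched by trying several codecs and keeping the smallest result. The codec set (raw, zlib, bz2, lzma from the Python standard library) and the function names are assumptions for illustration; they are not the methods evaluated in the slides.

```python
import bz2
import lzma
import zlib

# Each entry maps a codec name to its (compress, decompress) pair.
CODECS = {
    "raw": (lambda data: data, lambda data: data),  # store as-is
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def best_fit_compress(data: bytes):
    """Try every codec and return (name, payload) for the smallest output."""
    candidates = {name: codec[0](data) for name, codec in CODECS.items()}
    best = min(candidates, key=lambda name: len(candidates[name]))
    return best, candidates[best]


def restore(name: str, payload: bytes) -> bytes:
    """Undo `best_fit_compress` given the recorded codec name."""
    return CODECS[name][1](payload)


repetitive = b"archival " * 1000
text_codec, text_payload = best_fit_compress(repetitive)

incompressible = bytes(range(256))  # no repeated bytes to exploit
raw_codec, raw_payload = best_fit_compress(incompressible)
```

Because storing the data unchanged is one of the candidates, the chosen payload is never larger than the input, and the codec name has to be recorded alongside the payload so it can be restored later.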

    Conclusions

    Deep Store...

    is an archival storage framework consisting of abstractions for data objects

    includes content analysis and PRESIDIO

    proposes rich, extensible metadata

    proposes value based chunk redundancy