The HP AutoRAID Hierarchical Storage System John Wilkes, Richard Golding, Carl Staelin, and Tim Sullivan Presented by Arthur Strutzenberg


Page 1: The HP AutoRAID Hierarchical Storage System

The HP AutoRAID Hierarchical Storage System

John Wilkes, Richard Golding, Carl Staelin, and Tim Sullivan

Presented by Arthur Strutzenberg

Page 2: The HP AutoRAID Hierarchical Storage System

What is RAID

RAID stands for (R)edundant (A)rray of (I)ndependent (D)isks. RAID can be configured according to many different levels; for this paper we will focus on Level 3 and Level 5.

You have two goals with a RAID array:
- Performance boosting
- Data redundancy

Page 3: The HP AutoRAID Hierarchical Storage System

Data Redundancy (Mirroring)

Mirroring keeps a complete copy of all of the data. It provides reliability; the cost is a high storage overhead.

Page 4: The HP AutoRAID Hierarchical Storage System

Mirroring at the different levels

RAID 5 works by interleaving blocks of data and parity information across all of the disks. This alleviates the performance bottleneck of having a single disk devoted to parity.

RAID 3 works by interleaving data across a set of disks and storing parity information on a separate "parity" disk, which becomes a bottleneck.
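In both levels the parity is a simple XOR across the data blocks of a stripe, which is what makes single-disk recovery possible. A minimal sketch (block contents and helper name are illustrative, not from the paper):

```python
from functools import reduce

def parity_block(blocks):
    """XOR all blocks of a stripe, column by column, to produce parity.

    Any single lost block can be rebuilt by XOR-ing the parity
    with the surviving blocks.
    """
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Three data blocks in one stripe
stripe = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
p = parity_block(stripe)  # parity for the stripe

# Simulate losing the second disk and rebuilding its block
rebuilt = parity_block([stripe[0], stripe[2], p])
assert rebuilt == stripe[1]
```

The same XOR is used whether the parity lives on one dedicated disk (Level 3) or rotates across all disks (Level 5); only its placement differs.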

Page 5: The HP AutoRAID Hierarchical Storage System

The Challenges of RAID

This technology is difficult to use:
- Different RAID levels have different performance characteristics
- Typically there are a LOT of configuration parameters
- Wrong choices in the configuration usually result in poor performance or having to rebuild the array (a daunting task)

Page 6: The HP AutoRAID Hierarchical Storage System

Combining Level 5 & Mirroring

This paper explores what happens when you combine elements of mirroring with those of Level 5: you gain the write performance of mirroring for active data, and the storage efficiency of RAID 5 for inactive data.

Assumption #1: It is possible to divide the filesystem's data into active and inactive subsets

Assumption #2: The active subset must change relatively slowly

Page 7: The HP AutoRAID Hierarchical Storage System

Implementation Levels

As with any OS design, managed storage can occur at many different levels:
- Manually, by the sysadmin
- In the file system
- At the hardware (controller) level

Page 8: The HP AutoRAID Hierarchical Storage System

Manually by the Sysadmin

Advantage of bringing the human element into the problem

Disadvantage of bringing the human into the problem

Page 9: The HP AutoRAID Hierarchical Storage System

Implementation: FileSystem Level

Advantage of offering knowledge about the file system

Disadvantage that there are many file system implementations out there

Page 10: The HP AutoRAID Hierarchical Storage System

At the Controller Level

Advantage is that this is easily deployable; in the case of HP AutoRAID, the system looks to the host like a POD (Plain 'ol Drive)

Disadvantage that you lose knowledge about the physical location of data on disk

Page 11: The HP AutoRAID Hierarchical Storage System

Features of HP AutoRAID

In the design of AutoRAID the authors attempted to provide a system with:
- (Transparent) mapping, mirroring, and adaptation to storage changes
- Hot-pluggable storage expansion (adding a new disk) and disk upgrades (switching out a disk)
- Controller failover (redundancy)
- Simple administration & setup
- Log-structured RAID 5 writes

In other words, they wanted a product that would be easy to convert into something that was eventually salable!

Page 12: The HP AutoRAID Hierarchical Storage System

Bringing in other Papers

Part of the approach for AutoRAID uses an idea from most virtual memory management systems: you present a single "address space" to the file system, while a lot of complicated calculation and work goes on in the background.

Page 13: The HP AutoRAID Hierarchical Storage System

Looking at this in layers

[Diagram: a layered view. The host talks to what appears to be a single SCSI device; behind that interface sits the AutoRAID logic with its cache, a mirrored level, and a RAID 5 level, with migration between the two; below that are the back-end SCSI devices and the physical drives.]

If we were to look at this in a layered approach, there are two ways to examine the system.

Page 14: The HP AutoRAID Hierarchical Storage System

HP AutoRAID Structure

[Diagram: controller hardware. The host processor connects over the bus to a SCSI controller; inside the controller are the processor with RAM & control logic, parity logic, speed-matching RAM, a DRAM read cache, an NVRAM write cache, other RAM, and the back-end SCSI controllers.]

Page 15: The HP AutoRAID Hierarchical Storage System

Data Layout

The AutoRAID system is organized into the following logical units:
- PEX ((P)hysical (EX)tent): the large-granularity data chunk that divides each disk
- PEG ((P)hysical (E)xtent (G)roup): several PEXes combined together; each PEG is in one of three states: mirrored, RAID 5, or unallocated
- Segment: a unit of contiguous space on a drive included in a stripe or mirrored pair; it is the stripe unit in RAID 5, or the unit of duplication in a mirrored pair
- RB ((R)elocation (B)lock): the unit of migration used by the system
- LUN ((L)ogical (U)nit): the virtual disk visible to the host
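These units nest: PEXes make PEGs, PEGs hold RBs, and a LUN is the flat address space of RBs the host sees. A rough sketch as data structures (field names are illustrative; the sizes shown are roughly the paper's defaults):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class PegState(Enum):          # each PEG is in exactly one state
    MIRRORED = "mirrored"
    RAID5 = "raid5"
    UNALLOCATED = "unallocated"

@dataclass
class PEX:                     # Physical EXtent: a fixed chunk of one disk
    disk: int
    offset: int                # byte offset on that disk
    size: int = 1 << 20        # roughly 1 MB by default

@dataclass
class PEG:                     # Physical Extent Group: PEXes from several disks
    pexes: List[PEX]
    state: PegState = PegState.UNALLOCATED
    rbs: List[int] = field(default_factory=list)   # ids of RBs stored here

@dataclass
class RB:                      # Relocation Block: the unit of migration
    id: int
    size: int = 64 << 10       # roughly 64 KB by default

# A mirrored PEG built from extents on two different disks:
peg = PEG(pexes=[PEX(0, 0), PEX(1, 0)], state=PegState.MIRRORED, rbs=[0, 1])
```

A LUN is then just an ordered mapping from host block addresses onto RB ids; the host never sees PEXes or PEGs at all.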

Page 16: The HP AutoRAID Hierarchical Storage System

Data Layout at the various layers

RAID 5 uses a log-structured approach. Because of this, it needs a lot of free space to write out its logs, which necessitates a cleaner that works similarly to the system presented in the paper last week.

This system does add some complexity to the process, due to the nature of the parity data and when it is written.

Page 17: The HP AutoRAID Hierarchical Storage System

Data Layout

[Diagram: PEGs are built out of PEXes, which map to disk addresses spread across the disks.]

Page 18: The HP AutoRAID Hierarchical Storage System

Data Layout Cont

[Diagram: a mirrored PEG lays out relocation blocks as duplicated segments (0/0', 1/1', 2/2', 3/3', ..., X/X') across mirrored pairs of PEXes; a striped PEG lays out relocation blocks (0, 1, 2, 3, ... 28, 29, 30) in stripes, each stripe carrying a parity block (P0 ... P7).]

Page 19: The HP AutoRAID Hierarchical Storage System

Mapping Structure

AutoRAID makes use of a virtual device table, which maps RBs to the PEGs in which they reside. The PEG table holds the list of RBs in each PEG and the PEXes used to store them. Finally you have the PEX tables (one per drive). This all goes on behind the scenes, just like a VMM system.
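The lookup chain can be sketched as a pair of dictionaries, much like walking page tables in a VMM (the table contents and function name here are made up for illustration):

```python
# Hypothetical tables mirroring the paper's multi-level mapping.
virtual_device_table = {0: "peg_a", 1: "peg_a", 2: "peg_b"}  # RB id -> PEG
peg_table = {                                                # PEG -> contents
    "peg_a": {"rbs": [0, 1], "pexes": [(0, 0), (1, 0)]},     # (disk, extent)
    "peg_b": {"rbs": [2],    "pexes": [(2, 0), (3, 0)]},
}

def locate_rb(rb_id):
    """Resolve a host-visible RB down to the physical extents backing it."""
    peg = virtual_device_table[rb_id]    # which PEG holds this RB?
    return peg_table[peg]["pexes"]       # which disk extents back that PEG?

assert locate_rb(2) == [(2, 0), (3, 0)]
```

Migration between the mirrored and RAID 5 levels then only has to update these tables; the host's addresses never change.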

Page 20: The HP AutoRAID Hierarchical Storage System

Reading and Writing

Here is where it gets interesting. The AutoRAID system makes use of a cache, like most standard computer systems; however, this system divides the cache between standard DRAM (for the read cache) and NVRAM (for the write cache).

Reads first check the cache to see if the data resides there. A cache miss results in a read request dispatched to the back-end storage.

Using NVRAM for the write cache allows for several things:
- The host system can consider a write request "complete" once a copy of the data has been written into this memory.
- This allows the AutoRAID system to ameliorate the cost of multiple writes by combining them into one monolithic write.
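The write path above can be sketched as a toy model: a write is acknowledged as soon as it lands in NVRAM, repeated writes to the same address coalesce, and a later flush pushes everything to the back end in one combined operation (class and method names are invented for illustration, not the controller's actual code):

```python
class WriteCache:
    """Toy model of an NVRAM write cache: early acknowledgement,
    coalesced flushes."""

    def __init__(self):
        self.nvram = {}               # block address -> pending data

    def write(self, addr, data):
        self.nvram[addr] = data       # an overwrite replaces the old pending data
        return "complete"             # host sees the write as done here

    def flush(self, backend):
        backend.update(self.nvram)    # one combined back-end write
        count = len(self.nvram)
        self.nvram.clear()
        return count

disk = {}
wc = WriteCache()
wc.write(10, b"a")
wc.write(10, b"b")                    # overwrites the pending copy of block 10
wc.write(11, b"c")
assert wc.flush(disk) == 2            # two blocks flushed, not three writes
assert disk[10] == b"b"
```

A real NVRAM cache must also survive power loss and controller failover, which this sketch ignores entirely.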


Page 21: The HP AutoRAID Hierarchical Storage System

Mirrored Reads & Writes

Both reads and writes to the mirrored storage are straightforward: a read picks up a copy of the data from one of the disks; a write request generates a write to both disks, and the request only returns once both disks have been updated.

THIS IS INDEPENDENT OF A WRITE PERFORMED BY A HOST
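The mirrored-pair rule can be sketched in a few lines (illustrative code, not the firmware): a write completes only after both copies are updated, while a read is free to pick either disk.

```python
import random

def mirrored_write(pair, addr, data):
    """Write to both members of a mirrored pair; complete only
    once both copies are on disk."""
    for disk in pair:                 # both copies must succeed
        disk[addr] = data
    return "complete"

def mirrored_read(pair, addr):
    """Either copy is valid, so a read may pick either disk
    (e.g. whichever has the shorter queue)."""
    return random.choice(pair)[addr]

pair = [{}, {}]
mirrored_write(pair, 0, b"x")
assert pair[0][0] == pair[1][0] == b"x"
assert mirrored_read(pair, 0) == b"x"
```

The freedom to read from either disk is what lets mirrored storage roughly double read throughput relative to a single drive.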

Page 22: The HP AutoRAID Hierarchical Storage System

RAID 5 Reads & Writes

Reads from RAID 5 are just as straightforward; writes are where things get a tad more complex.

The AutoRAID RAID 5 layer makes use of a log approach, and writes happen either per RB or as a batch.

There are also times where the RAID 5 implementation makes use of in-place writes instead of the logged approach (more about this later).
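The batched, logged write can be sketched as appending whole RBs to fresh stripe space and computing parity once per batch, so no old data or old parity has to be read back (a simplification of the paper's scheme; names are illustrative):

```python
from functools import reduce

def log_write_batch(free_stripe_disks, rbs):
    """Append a batch of RBs to a previously unused stripe and
    write its parity once.

    Because the stripe is fresh, there is no read-modify-write:
    parity is computed purely from the new data.
    """
    for disk, rb in zip(free_stripe_disks, rbs):
        disk.append(rb)               # one sequential append per disk
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*rbs))
    return parity

disks = [[], [], []]
p = log_write_batch(disks, [b"\x01", b"\x02", b"\x04"])
assert p == b"\x07"                   # 0x01 ^ 0x02 ^ 0x04
```

Contrast this with an in-place RAID 5 update, which must read the old data and old parity before writing, costing four I/Os per small write.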

Page 23: The HP AutoRAID Hierarchical Storage System

Background Operations

Because of the nature of this system, several background operations occur to keep the array healthy and balanced:

Compaction, Cleaning & Hole Plugging: Both the mirrored and RAID 5 storage suffer from fragmentation. This process identifies the holes and compacts the system in order to generate fresh, unfragmented resources for use.

Migration: The crux of this system involves moving older data that has not been modified into RAID 5. This is done as a housecleaning step, and also because of the "bursty" nature of most writes; the system ensures that a minimum free-space threshold is kept within the mirrored space.

Balancing: This is the process of migrating PEXes between disks to equalize the data load. It is necessary because of the dynamic nature of the disk system.
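The migration step can be sketched as a background loop that demotes the least-recently-written RBs to RAID 5 whenever free mirrored space drops below a threshold (the watermark value and all names here are invented for illustration):

```python
def migrate(mirrored, raid5, capacity, low_watermark=0.1):
    """Demote least-recently-written RBs from mirrored space to RAID 5
    until a minimum fraction of mirrored space is free again."""
    moved = 0
    while capacity - len(mirrored) < capacity * low_watermark and mirrored:
        victim = min(mirrored, key=lambda rb: mirrored[rb])  # oldest write
        raid5[victim] = mirrored.pop(victim)                 # demote it
        moved += 1
    return moved

# mirrored maps RB id -> last-write timestamp; capacity is 4 RB slots
mirrored = {1: 100, 2: 50, 3: 200, 4: 150}
raid5 = {}
moved = migrate(mirrored, raid5, capacity=4)
assert 2 in raid5          # the least-recently-written RB was demoted first
```

Demoting by write age is what lets the hot working set stay in mirrored storage while cold data settles into the cheaper RAID 5 level.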

Page 24: The HP AutoRAID Hierarchical Storage System

Does it work? Test Results

Test results were garnered from a combination of Prototype testing Simulation

When possible comparisons were made against two basic controls A standard RAID setup that used RAID Level 5 Individual Disks (aka JBOD or Just a Bunch of

Disks)

Page 25: The HP AutoRAID Hierarchical Storage System

Transaction Rates (Macrobenchmarks)

The test was a database workload on an OLTP database comprising a series of medium-weight transactions.

Graph one compares AutoRAID to the control groups; graph two compares what happens when you add more disks to the system.

Page 26: The HP AutoRAID Hierarchical Storage System

Microbenchmarks

Page 27: The HP AutoRAID Hierarchical Storage System

Simulation Results

In this case the simulations compared several systems: cello, an HP-UX time-sharing system; snake, a clustered file server; OLTP benchmark tests; and hplajw, a personal workstation.

Overall there were hundreds of experiments that the authors could simulate. What follows is only a few:
- Consequences of adjusting disk speed
- Consequences of adjusting the RB size
- Consequences of poor data layout
- Consequences of various read-disk policies
- Write cache overwrites
- RB demotion via the standard (log) approach vs. demotion using hole plugging

Page 28: The HP AutoRAID Hierarchical Storage System

Simulation Results

Page 29: The HP AutoRAID Hierarchical Storage System

Data Layout & Read Disk Selection

Page 30: The HP AutoRAID Hierarchical Storage System

Write Cache Overwrites and Hole Plugging

The normal mode of operation for AutoRAID is to log RB demotions using a standard log process. Changing the demotion process to hole plugging had the effect of:
- Reducing the RBs moved by the cleaner by 93% for the cello/usr workload and 96% for the snake workload
- Improving the mean I/O time for user I/Os by 8.4% and 3.2% respectively

Page 31: The HP AutoRAID Hierarchical Storage System

Conclusions

The authors indicated that what they had was a working prototype.

They were able to show that such a system is workable, and demonstrated that it approached the performance of the standard JBOD system.

This disk-system approach hearkens back to the days of the Commodore: the drive systems on Commodore-brand machines could generally be considered computer systems in and of themselves.