journal-guided resynchronization for software raid
Post on 04-Jan-2016
Journal-guided Resynchronization for Software RAID
Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau
University of Wisconsin, Madison
RAID Consistent Update Problem
• RAID task is to maintain consistency
• Challenging in the face of crashes
– Updates must be applied to more than one disk
• Inconsistency means window of vulnerability
– Disk failure may lead to data loss
[Figure: RAID array layout with parity blocks (P)]
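The consistent update problem can be illustrated with a small parity sketch. This is a minimal example, assuming a RAID-5 style stripe with XOR parity; the block contents and the helper `xor_blocks` are hypothetical and are not taken from the talk.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data blocks plus one parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
original_block0 = data[0]
parity = xor_blocks(data)

# Crash scenario: the new data block reaches disk, the parity update does not.
data[1] = b"\xaa\xbb"
stripe_consistent = xor_blocks(data) == parity

# If disk 0 now fails, block 0 is rebuilt from the survivors and the stale parity.
reconstructed = xor_blocks([data[1], data[2], parity])
data_lost = reconstructed != original_block0

print("stripe consistent:", stripe_consistent)
print("block 0 lost after disk failure:", data_lost)
```

The stripe stays vulnerable until its parity is recomputed, which is why the length of the resynchronization matters.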
High-end RAID Solution
• Consistent update with non-volatile memory
– Logs writes in NVRAM until they reach disk
• Performance – logging to NVRAM is fast
• Reliability – data is safe in NVRAM
• Availability – recovery is fast
• But, enterprise systems are expensive
Software RAID Solutions
• Consistent update is challenging
– Performance versus reliability trade-off
• Performance: resynchronization after crash
– Scan entire volume to fix inconsistencies
– Extremely slow, hours for 100s of GBs to days for TBs
– Reliability: lengthens window of vulnerability
– Availability: consumes array bandwidth
• Reliability: log intentions to a bitmap
– Performance: extra writes to maintain bitmap
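The bitmap alternative can be sketched as follows. This is a simplified illustration of intent logging in general, not the Linux implementation; the class `IntentBitmap`, its methods, and the region size are assumptions made for the example.

```python
class IntentBitmap:
    """Hypothetical write-intent bitmap: one bit per fixed-size region."""

    def __init__(self, num_blocks, region_blocks):
        self.region_blocks = region_blocks
        num_regions = (num_blocks + region_blocks - 1) // region_blocks
        self.dirty = [False] * num_regions
        self.bitmap_writes = 0  # extra I/Os spent persisting the bitmap

    def before_write(self, block):
        """Mark the region dirty; the bit must be on disk before the data write."""
        region = block // self.region_blocks
        if not self.dirty[region]:
            self.dirty[region] = True
            self.bitmap_writes += 1

    def regions_to_resync(self):
        """After a crash, only dirty regions need their redundancy checked."""
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]

bitmap = IntentBitmap(num_blocks=1000, region_blocks=100)
for block in (5, 7, 250):
    bitmap.before_write(block)

print(bitmap.regions_to_resync())  # regions 0 and 2 instead of all 10
print(bitmap.bitmap_writes)        # the performance cost named on the slide
```

The sketch omits clearing bits once writes complete; it only shows where the extra writes come from and why resynchronization shrinks.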
Cooperative Software RAID Solution
• Journaling file systems perform logging
– Maintain file system data structure consistency
– ext3, ReiserFS, JFS, NTFS
• Journal-guided resynchronization
– New ext3 mode: declared mode
– New software RAID interface: verify read
– Achieves performance, reliability, availability
Journal-guided Resync Overview
• Crash: What writes were outstanding?
– Narrow the range of possible inconsistencies
– Obtain information from journal (declared mode)
• Restart: journal-guided resynchronization
– Use journal to identify outstanding writes
– Communicate locations to RAID (verify read)
– Check redundancy and repair inconsistencies
– Greatly reduce the time for resynchronization
Outline
• Problem
• ext3 Background and Analysis
• ext3 Declared Mode and RAID Verify Read
• Journal-guided Resynchronization
• Evaluation
• Conclusion
ext3 Modes
• Data-journaling mode
– All data and metadata is written to the journal
• Ordered mode (default)
– Only metadata is written to the journal
– Strict ordering between data and metadata
• Writeback mode
– Only metadata is written to the journal
– No ordering between data and metadata
ext3 Transactions
• Updates are grouped into transactions
• Transaction states
– Running – collect updates in memory
– Commit – write updates to journal
– Checkpoint – write updates to home locations
ext3 Journal Structures
• Journal superblock
– Head and tail pointers into journal file
– Transaction sequence number
• Descriptor block
– List of home locations for upcoming blocks
• Commit block
– Marks the end of a transaction
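The role of descriptor and commit blocks can be sketched as a scan over journal records. This is a minimal model, assuming a simplified in-memory record layout; the classes `Descriptor`, `JournaledBlock`, and `Commit` and the function `committed_home_locations` are hypothetical and do not mirror the on-disk ext3 format.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Descriptor:
    txn: int
    home_locations: List[int]  # where the upcoming journaled blocks belong

@dataclass
class JournaledBlock:
    payload: bytes

@dataclass
class Commit:
    txn: int  # marks the end of transaction `txn`

Record = Union[Descriptor, JournaledBlock, Commit]

def committed_home_locations(journal: List[Record]) -> Dict[int, List[int]]:
    """Map each committed transaction to the home locations it will checkpoint."""
    pending: Dict[int, List[int]] = {}
    committed: Dict[int, List[int]] = {}
    for record in journal:
        if isinstance(record, Descriptor):
            pending.setdefault(record.txn, []).extend(record.home_locations)
        elif isinstance(record, Commit) and record.txn in pending:
            committed[record.txn] = pending.pop(record.txn)
    return committed

journal = [
    Descriptor(11, [40, 41]), JournaledBlock(b"meta"), JournaledBlock(b"data"),
    Commit(11),
    Descriptor(12, [90]), JournaledBlock(b"meta"),  # crash before commit
]
print(committed_home_locations(journal))
```

Transaction 12 has no commit block in this example, so it is not treated as complete.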
Data-journaling Write Analysis
[Figure: array layout with superblock (Super), journal, and parity blocks (P). In-memory transaction: METADATA, DATA, DATA, DATA. Journal contents: DESC 11, METADATA, DATA, DATA, DATA, COMM 11. Transaction states: Running, Committing, Checkpointing]
• Running: collect file system updates in memory
• Commit:
– write desc, meta, data to journal, wait (bounded)
– write commit to journal, wait (bounded)
• Checkpoint:
– write journaled blocks to home, wait (known)
– update superblock (known)
Data-journaling Summary
• Provides a record of all outstanding writes
– Suitable for journal-guided resynchronization
• Offers poor performance
Block Type       Write Location
superblock       known, fixed
journal          bounded, fixed
home metadata    known, descriptors
home data        known, descriptors
Ordered Write Analysis
[Figure: array layout with superblock (Super), journal, and parity blocks (P). In-memory transaction: METADATA, DATA, DATA, DATA. Journal contents: DESC 11, METADATA, COMM 11. Transaction states: Running, Committing]
• Running:
– collect file system updates in memory
– pdflush may write data to home (unknown)
• Commit:
– write data to home, wait (unknown)
– write desc and meta to journal, wait (bounded)
– write commit to journal, wait (bounded)
Ordered Summary
• Does not provide outstanding write record
– Unsuitable for journal-guided resynchronization
Block Type       Write Location
superblock       known, fixed
journal          bounded, fixed
home metadata    known, descriptors
home data        unknown
Declared Mode
• Variation of ordered mode
– Only metadata is journaled, strict ordering
• Declares its intent to write to home locations
• New journal structure: declare block
– List of home data locations for the transaction
• Space and performance overheads
Declared Write Analysis
[Figure: array layout with superblock (Super), journal, and parity blocks (P). In-memory transaction: METADATA, DATA, DATA, DATA. Journal contents: DECL 11, DESC 11, METADATA, COMM 11. Transaction states: Running, Committing]
• Running:
– collect file system updates in memory
– pdflush may write data to home (unknown)
• Commit:
– write declare to journal, wait (bounded)
– write data to home, wait (known)
– write desc and meta to journal, wait (bounded)
– write commit to journal, wait (bounded)
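The commit ordering of declared mode can be written out as a sequence of I/O steps. This is a sketch only, assuming a hypothetical function `declared_commit` and a tuple encoding of steps; it models the ordering listed on the slide, not the actual ext3 code.

```python
def declared_commit(txn, data_locations, metadata_locations):
    """Return the ordered I/O steps of one commit; ("wait",) marks a barrier."""
    steps = []
    # 1. The declare block lists every home data location before any is written.
    steps.append(("journal", f"DECL {txn}", list(data_locations)))
    steps.append(("wait",))
    # 2. Data goes to its home locations, which are now recorded in the journal.
    steps += [("home", "DATA", location) for location in data_locations]
    steps.append(("wait",))
    # 3. Descriptor and metadata blocks go to the journal.
    steps.append(("journal", f"DESC {txn}", list(metadata_locations)))
    steps += [("journal", "METADATA", location) for location in metadata_locations]
    steps.append(("wait",))
    # 4. The commit block ends the transaction.
    steps.append(("journal", f"COMM {txn}"))
    steps.append(("wait",))
    return steps

for step in declared_commit(11, data_locations=[500, 501], metadata_locations=[8]):
    print(step)
```

Because the declare block is durable before the first home data write, a crash at any later step leaves a record of which data locations may be inconsistent.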
Software RAID Verify Read
• File system must communicate possible inconsistencies to the software RAID layer
• New interface: verify read request
– Read block and verify its redundant information
– Repair redundant information if inconsistent
[Figure: verify read checks whether the parity block P equals the XOR of the stripe's data blocks]
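A verify read can be sketched against a single stripe. This is a minimal illustration, assuming XOR parity and an in-memory stripe; the class `Stripe` and the method name `verify_read` are hypothetical and are not the interface added to the Linux software RAID driver.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

class Stripe:
    """Hypothetical RAID-5 stripe: data blocks plus one parity block."""

    def __init__(self, data):
        self.data = [bytes(block) for block in data]
        self.parity = xor_blocks(self.data)

    def verify_read(self, index):
        """Read data block `index`, check the parity, and repair it if stale."""
        expected = xor_blocks(self.data)
        repaired = expected != self.parity
        if repaired:
            self.parity = expected  # data is treated as authoritative
        return self.data[index], repaired

stripe = Stripe([b"\x01", b"\x02", b"\x04"])
stripe.data[1] = b"\xff"  # simulate a data write whose parity update was lost

block, repaired = stripe.verify_read(1)
print(block, repaired)        # first call finds and repairs the stale parity
print(stripe.verify_read(1))  # second call finds the stripe consistent
```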
Journal-guided Resynchronization
[Figure: array layout with superblock (Super), journal, and parity blocks (P). Journal contents at recovery: DECL 11, DESC 11, METADATA, COMM 11, DECL 12]
Recovery and Resynchronization:
– superblock write: verify read for superblock
– checkpointing: verify reads for descriptor home locations
– committing: verify reads for head of the journal
– home data writes: verify reads for declared home locations
– checkpoint committed transactions
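The recovery steps above can be sketched as collecting a list of locations and issuing one verify read per location. This is a simplified model; the functions `locations_to_verify` and `resync`, their parameters, and the block numbers are assumptions for illustration only.

```python
def locations_to_verify(superblock_loc, journal_head_locs, descriptors, declares):
    """Collect block addresses whose redundancy may be stale after a crash.

    descriptors / declares: dicts mapping a transaction number to the home
    locations named in its descriptor or declare blocks (hypothetical layout).
    """
    targets = [superblock_loc]                    # superblock write
    for home_locations in descriptors.values():   # checkpointing
        targets.extend(home_locations)
    targets.extend(journal_head_locs)             # committing
    for home_locations in declares.values():      # home data writes
        targets.extend(home_locations)
    return list(dict.fromkeys(targets))           # drop duplicates, keep order

def resync(targets, verify_read):
    """Issue one verify read per target; return how many stripes were repaired."""
    return sum(1 for location in targets if verify_read(location))

targets = locations_to_verify(
    superblock_loc=0,
    journal_head_locs=[20, 21],
    descriptors={11: [40, 41]},
    declares={11: [500, 501], 12: [501, 777]},
)
stale = {41, 777}  # pretend these stripes were caught mid-update
repairs = resync(targets, verify_read=lambda location: location in stale)
print(targets, repairs)
```

The work is proportional to the number of locations named in the journal rather than to the size of the array.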
Declared Mode Evaluation
• Microbenchmarks (versus ordered mode)
– Random write (3% slowdown)
– Sequential write (5% slowdown)
– Sprite create, read, unlink (4% slowdown)
• Macrobenchmarks
– ssh Benchmark (3% speedup for unpack)
– Postmark (40% speedup - 5% slowdown)
  • Speedup from globally sorted write order
– TPC-B (20% - 5% slowdown)
  • Small transaction size increases declare overhead
Implementation Complexity
• Cooperative approach reduces complexity
Module             Original Lines   Modified Lines   Change
Journal-guided Resynchronization
Software RAID-5    3475             18               0.5 %
ext3               8621             69               0.8 %
Journaling         3472             308              8.9 %
Total              15568            395              2.5 %
Linux RAID-1 Intent Bitmap Logging
Software RAID-1    3116             1193             38.3 %
Resynchronization Experiment
• Five disk, 1 GB RAID-5 array
• Foreground process reading a set of files
• After 30 seconds, crash and restart machine
– Resynchronization begins
– Foreground process restarts
• Monitor foreground bandwidth and resync
Resynchronization Results
• Availability: foreground BW from 29.6 to 34.1 MB/s
• Reliability: vulnerability from 254 to 0.21 seconds
– Reduced from O(array size) to O(journal size)
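The relative size of these improvements follows from simple arithmetic. The sketch below assumes that the first number in each pair is the default full-volume resynchronization and the second is journal-guided resynchronization; the variable names are illustrative.

```python
baseline_bw, guided_bw = 29.6, 34.1           # MB/s, from the slide
baseline_window, guided_window = 254.0, 0.21  # seconds, from the slide

bw_gain_pct = (guided_bw - baseline_bw) / baseline_bw * 100
window_reduction = baseline_window / guided_window

print(f"foreground bandwidth gain: {bw_gain_pct:.1f}%")                 # 15.2%
print(f"vulnerability window shrinks by about {window_reduction:.0f}x")  # 1210x
```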
Conclusion
• RAID consistent updates are challenging
• Analyzed ext3 journaling, declared mode
– Identifies outstanding writes after a crash
• Software RAID verify read interface
• Journal-guided Resynchronization
– Leverages functionality, reducing complexity
– Provides performance, reliability, and availability
• Cooperation between layers is the key
Questions?
http://www.cs.wisc.edu/adsl/