1
SQCK: A Declarative File System Checker
Haryadi S. Gunawi, Abhishek Rajimwale,
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
University of Wisconsin – Madison
OSDI ’08 – December 9th, 2008
2/25
Corrupt file systems
File systems
  Store massive amounts of data
  Must be reliable
Corrupted file system images
  Due to hardware errors, file system bugs, etc.
  Need to be repaired as soon as possible
3/25
Who should repair?
Does journaling (write-ahead log) help?
  No, only for crashes
Does the file system repair itself online?
  No, not enough machinery
Fsck: the last line of defense
  It’s a “must have” utility
    − XFS: “no need fsck ever”, but deploys fsck in the end
  Must be fully reliable
4/25
But … fsck is complex
Fsck has a big task
  Turn any corrupt image into a consistent image
  E.g. check if a data block is shared by two inodes
How are they implemented?
  Written in C → hard to reason about
  Large and complex
    − Ext2 fsck: 150 checks in 16 KLOC
    − XFS fsck: 340 checks in 22 KLOC
  Hundreds of cluttered if-check statements
Bottom line: fsck code is “untouchable”
5/25
Two Questions
Are current checkers really reliable?
If not, how should we build robust checkers?
6/25
e2fsck is unreliable
Analyze e2fsck (ext2 file system checker)
Findings:
  Inconsistent repair
    − The file system becomes unreadable
  Consistent but not “correct”
    − Fsck deletes valid directory entries
    − Fsck loses a huge number of files
7/25
SQCK
Lesson: Complexity is the enemy of reliability
  Big task + bad design → complexity → unreliability
  Need a higher-level approach for simplicity
SQCK (SQL-based Fsck)
  Use a declarative query language to write checks
  Put simply: write fewer lines of code
Evaluation
  Simple and reliable: e2fsck in 150 queries (vs. 16 KLOC of C)
  More: Great flexibility and reasonable performance
8/25
Outline
Introduction
Analysis of e2fsck
SQCK Design
SQCK Evaluation
Conclusion
9/25
Methodology
E2fsck task: cross-check all ext2 metadata
  An indirect pointer should not point to the superblock
  A subdir should only be accessible from one directory
Inject single corruption
  Observe how e2fsck repairs a single corruption
  Only corrupt on-disk pointers
    − Corrupt an indirect pointer to point to the superblock
    − Corrupt a directory entry to point to another directory
Usually, a corrupt pointer is simply cleared to zero
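A single-pointer injection of the kind described on this slide might be sketched as follows. This is only an illustration, not the authors' injection tool: the toy image, the 1 KB block size, the superblock block number and the offset of the "indirect pointer" are all assumptions made up for the example.

```python
import struct

BLOCK_SIZE = 1024        # assumed block size of the toy image
SUPERBLOCK_BLK = 1       # assumed block number of the superblock

def inject_pointer_corruption(image: bytearray, offset: int, new_target: int) -> int:
    """Overwrite one 32-bit little-endian block pointer and return the old value."""
    (old_target,) = struct.unpack_from("<I", image, offset)
    struct.pack_into("<I", image, offset, new_target)
    return old_target

# Toy image: 8 zeroed blocks with a made-up indirect pointer stored in block 4.
image = bytearray(8 * BLOCK_SIZE)
IND_PTR_OFFSET = 4 * BLOCK_SIZE
struct.pack_into("<I", image, IND_PTR_OFFSET, 6)   # healthy pointer: block 6

# Inject exactly one corruption: the pointer now refers to the superblock.
old = inject_pointer_corruption(image, IND_PTR_OFFSET, SUPERBLOCK_BLK)
(new,) = struct.unpack_from("<I", image, IND_PTR_OFFSET)
print(f"indirect pointer changed from block {old} to block {new}")
```

Changing one pointer per run keeps each experiment attributable to a single fault, which is the point of the methodology.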
10/25
Inconsistent (Out-of-order) Repair
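[Figure: an inode's indirect pointer (*ind), which should reference an indirect block holding pointers such as 850, 851, 853, has been corrupted to point to the superblock.
Ideal fsck: 1. Check bad indirect pointer, then 2. Check indirect content. The pointer is cleared to 0.
e2fsck: 2. Check indirect content runs before 1. Check bad indirect pointer, so entries in the superblock are zeroed before the pointer is cleared.]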
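The effect of running these two checks in the wrong order can be reproduced with a toy model. This is a sketch under stated assumptions, not e2fsck code: the dictionary-based image, the `VALID_DATA` range and the superblock field values are hypothetical, and the only repair policy modeled is "clear a bad pointer to zero".

```python
SUPERBLOCK = 1                   # assumed superblock block number
VALID_DATA = range(100, 1000)    # assumed range of legal data block numbers

def make_image():
    # The inode's indirect pointer was corrupted to point at the superblock.
    return {
        "inode": {"ind": SUPERBLOCK},
        "blocks": {SUPERBLOCK: [61267, 8192, 1024, 4]},  # made-up superblock fields
    }

def check_indirect_pointer(img):
    # A pointer that is not a legal data block is cleared to zero.
    if img["inode"]["ind"] not in VALID_DATA:
        img["inode"]["ind"] = 0

def check_indirect_content(img):
    # Every entry of the indirect block that is not a legal data block is cleared.
    ind = img["inode"]["ind"]
    if ind == 0:
        return
    block = img["blocks"][ind]
    for i, ptr in enumerate(block):
        if ptr not in VALID_DATA:
            block[i] = 0

ideal = make_image()
check_indirect_pointer(ideal)          # 1. pointer first
check_indirect_content(ideal)          # 2. content second: nothing left to scan

out_of_order = make_image()
check_indirect_content(out_of_order)   # 2. content first: scans the superblock
check_indirect_pointer(out_of_order)   # 1. pointer is cleared too late

print("ideal:       ", ideal["blocks"][SUPERBLOCK])
print("out of order:", out_of_order["blocks"][SUPERBLOCK])
```

Both runs end with the pointer cleared, but only the correctly ordered run leaves the superblock intact.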
11/25
Consistent but Incorrect Repair (1)
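[Figure: directory trees rooted at / with directories a1 and b1 and subdirectories a2 and b2, comparing the repair made by an ideal fsck with the repair made by e2fsck. X marks a removed directory entry; LF presumably denotes lost+found.]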
Kidnapping problem!
E2fsck does not use all available information
12/25
Result Summary
Four problems
  Inconsistent
  Information-incomplete
  Policy-inconsistent
  Insecure
E2fsck does not handle all corruptions
  “Warning: Programming bug in e2fsck! Or some bonehead (you) is checking a mounted (live) filesystem.”
Not simple implementation bugs
  Difficult to combine available information
  Difficult to ensure correct ordering
13/25
Outline
Introduction
Analysis
SQCK Design
SQCK Evaluation
Conclusion
14/25
Fsck Properties
Hundreds of checks
Complex cross-checks
Must be ordered correctly
Taxonomy of checks in e2fsck:

                        Single instance   Multiple instances
  Same structure              63                 11
  Different structures        12                 35
[Figure: each category illustrated with structures A { x, y } and B { m, n }: a single A; several instances of A; one A with one B; several instances of A and B.]
15/25
A Declarative Approach
Lesson: Complexity is the enemy of reliability
SQCK
  Use a declarative query language (e.g. SQL). Why?
    − It is declarative: high-level intent is clear
    − Fit for cross-checking massive information
Goals achieved
  Simple: e2fsck in 150 queries (vs. 16 KLOC of C)
  Reliable: Each check/query is easy to understand
  Flexible: Plug in/out different queries
16/25
Using SQCK
Take a fs image
Load metadata to db tables
  Temporary tables
  Ex: InodeTable, GroupDescTable, DirEntryTable
Run checks and repairs (in the form of queries)
Flush any modification, and delete tables
[Figure: File system image → Scanner / Loader → Database tables → Checks + Repairs → Flush back to the file system image]
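The load, check/repair, flush cycle can be sketched end to end on a toy image. This is a minimal sketch, not SQCK itself: it uses Python's built-in sqlite3 rather than MySQL, and the 8-byte descriptor layout, the column names and the helper functions `load` and `flush` are assumptions made up for illustration.

```python
import sqlite3
import struct

# Toy image: four group descriptors, each an assumed pair of 32-bit
# little-endian fields (block bitmap location, inode bitmap location).
image = bytearray(struct.pack("<8I", 3, 4, 8195, 8196, 16387, 16388, 24579, 24580))

def load(conn, image):
    """Scanner/loader: parse the image into a temporary table."""
    conn.execute(
        "CREATE TEMP TABLE GroupDescTable ("
        "grp INTEGER PRIMARY KEY, blockBitmap INTEGER, inodeBitmap INTEGER, "
        "dirty INTEGER DEFAULT 0)"
    )
    for grp in range(len(image) // 8):
        bb, ib = struct.unpack_from("<2I", image, grp * 8)
        conn.execute(
            "INSERT INTO GroupDescTable (grp, blockBitmap, inodeBitmap) "
            "VALUES (?, ?, ?)",
            (grp, bb, ib),
        )

def flush(conn, image):
    """Write back only the rows marked dirty, then delete the temporary table."""
    rows = conn.execute(
        "SELECT grp, blockBitmap, inodeBitmap FROM GroupDescTable WHERE dirty = 1"
    ).fetchall()
    for grp, bb, ib in rows:
        struct.pack_into("<2I", image, grp * 8, bb, ib)
    conn.execute("DROP TABLE GroupDescTable")
    return len(rows)

conn = sqlite3.connect(":memory:")
load(conn, image)
# Stand-in for "checks + repairs": a query that changes a row also marks it dirty.
conn.execute("UPDATE GroupDescTable SET inodeBitmap = 0, dirty = 1 WHERE grp = 2")
flushed = flush(conn, image)
print(f"{flushed} modified row(s) written back to the image")
```

Tracking a dirty flag per row means the flush step only touches the parts of the image that a repair actually changed.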
17/25
Declarative check (example 1)
Cross-checking a single instance of a structure
“Find block bitmap that is not located within its block group”

e2fsck (C):

    first_block = sb->s_first_data_block;
    last_block = first_block + blocks_per_group;
    for (i = 0, gd = fs->group_desc; i < fs->group_desc_count; i++, gd++) {
        if (i == fs->group_desc_count - 1)
            last_block = sb->s_blocks_count;
        if ((gd->bg_block_bitmap < first_block) ||
            (gd->bg_block_bitmap >= last_block)) {
            px.blk = gd->bg_block_bitmap;
            if (fix_problem(BB_NOT_GROUP, ...))
                gd->bg_block_bitmap = 0;
        }
        ...
    }

SQCK (SQL):

    SELECT *
    FROM GroupDescTable G
    WHERE G.blockBitmap NOT BETWEEN G.start AND G.end
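The query from this slide can be tried on a few hand-made rows. This is a sketch using Python's built-in sqlite3, not the MySQL setup of the talk; the `grp` column and the sample block numbers are assumptions, and `start` and `end` are assumed to hold the first and last block of each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema. "end" is quoted because END is an SQL keyword.
conn.execute(
    'CREATE TABLE GroupDescTable (grp INTEGER, blockBitmap INTEGER, '
    '"start" INTEGER, "end" INTEGER)'
)
conn.executemany(
    "INSERT INTO GroupDescTable VALUES (?, ?, ?, ?)",
    [
        (0, 3, 1, 8192),
        (1, 8195, 8193, 16384),
        (2, 42, 16385, 24576),   # corrupted: the bitmap lies outside group 2
    ],
)

# Same check as on the slide: bitmap location outside the group's block range.
bad_groups = conn.execute(
    'SELECT * FROM GroupDescTable G '
    'WHERE G.blockBitmap NOT BETWEEN G."start" AND G."end"'
).fetchall()

for row in bad_groups:
    print("block bitmap outside its group:", row)
```

`BETWEEN` is inclusive on both sides, so the `end` column has to hold the last block of the group rather than the first block of the next one.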
18/25
Declarative check (example 2)
Cross-checking multiple instances of the same structure
“Find false parents (i.e. directory entries that point to a subdirectory that already belongs to another directory)”
  Must read all directory entries in dir data blocks
  Wrong implementation in e2fsck (the kidnapping problem)
19/25
Declarative check (example 2)

e2fsck (C):

    if ((dot_state > 1) &&
        (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode))) {
        // e2fsck_get_dir_info is 20 lines long
        subdir = e2fsck_get_dir_info(dirent->inode);
        ...
        if (subdir->parent) {
            if (fix_problem(LINK_DIR,..)) {
                dirent->inode = 0;
                goto next;
            }
        } else {
            subdir->parent = ino;
        }
    }
20/25
Declarative check (example 2)

SQCK (SQL):

    SELECT F.*    -- returns the false parent(s)
    FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
    WHERE
      -- P says C is its child
      P.entry_num >= 3 AND P.entry_ino = C.ino AND
      -- and C says P is its parent
      C.entry_num = 2 AND C.entry_ino = P.ino AND
      -- F also says C is its child
      F.entry_num >= 3 AND F.entry_ino = C.ino AND
      F.ino <> P.ino

[Figure: directories F and P both point to child C]
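The false-parent query can be exercised on a small directory tree. This is a sketch with Python's built-in sqlite3; the `name` column, the inode numbers and the tree itself are made up, and the numbering of entries (1 for ".", 2 for "..", 3 and up for regular entries) is an assumption inferred from how the query uses `entry_num`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per directory entry of directory "ino".
conn.execute(
    "CREATE TABLE DirEntryTable (ino INTEGER, entry_num INTEGER, "
    "entry_ino INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO DirEntryTable VALUES (?, ?, ?, ?)",
    [
        (2, 1, 2, "."), (2, 2, 2, ".."), (2, 3, 11, "a1"), (2, 4, 12, "b1"),
        (11, 1, 11, "."), (11, 2, 2, ".."),
        (11, 3, 14, "a2"),              # corrupted: should point to inode 13
        (12, 1, 12, "."), (12, 2, 2, ".."), (12, 3, 14, "b2"),
        (13, 1, 13, "."), (13, 2, 11, ".."),
        (14, 1, 14, "."), (14, 2, 12, ".."),
    ],
)

false_parents = conn.execute(
    """
    SELECT F.*
    FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
    WHERE P.entry_num >= 3 AND P.entry_ino = C.ino      -- P says C is its child
      AND C.entry_num = 2 AND C.entry_ino = P.ino       -- C says P is its parent
      AND F.entry_num >= 3 AND F.entry_ino = C.ino      -- F also claims C
      AND F.ino <> P.ino
    """
).fetchall()

for ino, entry_num, entry_ino, name in false_parents:
    print(f"entry {name!r} in directory {ino} falsely claims inode {entry_ino}")
```

Because the query consults the child's ".." entry, it flags the entry in directory 11 and leaves the true parent (directory 12) alone, regardless of the order in which the directories are scanned.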
21/25
Declarative Repairs
Running declarative checks is only part of the problem
  Must also perform the declarative repairs
A repair = An update query
  Some repairs simply update a few fields
  ... SET T.field = newValue, T.dirty = 1
A repair = A series of queries
  Ex: Reconnect an orphan directory to the lost+found directory
  Combine a series of queries with C code
    − All repairs are written in SQL
    − C code is only used for connecting them
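A repair expressed as a single update query might look like the following. This is a sketch with Python's built-in sqlite3 under an assumed schema; it reuses the "clear the bad pointer to zero" policy from the e2fsck code of example 1 and is not taken from SQCK's actual repair set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema with a dirty flag, following the SET ... dirty = 1 pattern.
conn.execute(
    'CREATE TABLE GroupDescTable (grp INTEGER, blockBitmap INTEGER, '
    '"start" INTEGER, "end" INTEGER, dirty INTEGER DEFAULT 0)'
)
conn.executemany(
    'INSERT INTO GroupDescTable (grp, blockBitmap, "start", "end") '
    'VALUES (?, ?, ?, ?)',
    [(0, 3, 1, 8192), (1, 99999, 8193, 16384)],   # group 1 is corrupted
)

# Check and repair in one statement: clear the bad pointer, mark the row dirty.
cur = conn.execute(
    'UPDATE GroupDescTable SET blockBitmap = 0, dirty = 1 '
    'WHERE blockBitmap NOT BETWEEN "start" AND "end"'
)
repaired = cur.rowcount

# Only dirty rows would need to be flushed back to the image.
dirty_rows = conn.execute(
    "SELECT grp, blockBitmap FROM GroupDescTable WHERE dirty = 1"
).fetchall()
print(f"repaired {repaired} row(s); rows to flush: {dirty_rows}")
```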
22/25
Outline
Introduction
Analysis
SQCK Design
SQCK Evaluation
Conclusion
23/25
SQCK Evaluation
Complexity
  150 queries in 1100 lines of SQL statements (compared to 16,000 lines of C in e2fsck)
Reliability
  Pass hundreds of corruption scenarios
Flexibility
  Add new checks/repairs
  Enable different versions of e2fsck
Performance
  Introduce some optimizations
24/25
SQCK vs. e2fsck
Reasonable performance
  First generation of SQCK (with MySQL)
  Within 1.5x of e2fsck
Future optimizations
  Hierarchical checks
  Concurrent queries
25/25
Conclusion
Complexity is the enemy of reliability
Recovery code is complex
SQCK: Build recovery tools with a higher-level approach
26
Thank you! Questions?
ADvanced Systems Laboratory www.cs.wisc.edu/adsl