recon: verifying file system consistency at runtime · 2019. 2. 25. · daniel fryer, jack (kuei)...
October 4, 2011
Recon: Verifying File System Consistency at Runtime
Daniel Fryer, Jack (Kuei) Sun, Rahat Mahmood, TingHao Cheng, Shaun Benjamin, Angela Demke Brown and Ashvin Goel
University of Toronto
Metadata Integrity is Crucial
You don’t know what you’ve got ’til it’s gone…
[Figure: storage stack with the file system and block layer in the kernel, above storage; blocks are labeled D (data) and M (metadata)]
File Systems Have Bugs
Why can’t existing solutions handle this problem?
| Bugs in Linux Ext3 File System | Closed |
|---|---|
| panic/ext3 fs corruption with RHEL4-U6-re20070927.0 | 2007-11 |
| Re: [2.6.27] filesystem (ext3) corruption (access beyond end) | 2008-06 |
| linux-2.6: ext3 filesystem corruption | 2008-09 |
| linux-image-2.6.29-2-amd64: occasional ext3 filesystem corruption | 2009-06 |
| ENOSPC during fsstress leads to filesystem corruption on ext2, ext3, and ext4 | 2010-03 |
| ext3: Fix fs corruption when make_indexed_dir() fails | 2011-06 |
| Data corruption: resume from hibernate always ends up with EXT3 fs errors | Not yet |
“Solutions”
None of these protect against bugs in file systems
Existing approaches assume file systems are correct
[Figure: storage stack (file system and block layer in the kernel, above storage) annotated with candidate defenses: RAID? Checksums? Journals?]
Offline Checking
• Check consistency offline, e.g., with fsck
• Consistency properties are necessary for correctness
[Figure: two example properties, with M = metadata block and D = data block. FS1: no double allocation. FS2: refcount-based sharing (a shared data block with Ref: 2)]
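To make FS1 and FS2 concrete, here is a minimal Python sketch of the kind of full-scan check an offline checker performs. It is not fsck's code and does not read ext3's on-disk format: it assumes a toy model in which a hypothetical `pointers` map lists, for every metadata block, the block numbers it points to, and a hypothetical `refcounts` map gives the declared reference count of intentionally shared blocks. What it illustrates is that both properties can only be verified by visiting every pointer in the file system.

```python
from collections import Counter
from typing import Dict, List


def offline_check(pointers: Dict[str, List[int]],
                  refcounts: Dict[int, int]) -> List[str]:
    """Full-scan check of two toy consistency properties.

    pointers:  hypothetical map from each metadata block to the block numbers
               it points to; the checker must scan all of it.
    refcounts: hypothetical map from each intentionally shared block to its
               declared reference count (FS2); other blocks are unshared.
    Returns the violations found (an empty list means consistent).
    """
    # Count references to every block across ALL metadata: this global scan
    # is what makes offline checking slow on large disks.
    seen = Counter(b for targets in pointers.values() for b in targets)
    errors: List[str] = []
    for block in sorted(set(seen) | set(refcounts)):
        count, declared = seen[block], refcounts.get(block)
        if declared is None and count > 1:
            # FS1: an unshared block must not be allocated twice.
            errors.append(f"block {block}: double allocation ({count} refs)")
        elif declared is not None and declared != count:
            # FS2: a shared block's refcount must match its actual references.
            errors.append(f"block {block}: refcount {declared}, found {count}")
    return errors


if __name__ == "__main__":
    fs = {"inode 12": [501, 502], "inode 13": [502]}
    print(offline_check(fs, refcounts={}))        # FS1 violation on block 502
    print(offline_check(fs, refcounts={502: 2}))  # no violations: sharing is declared
```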
Problems with Offline Checking
• Slow, and getting slower with larger disks
• Requires taking the file system offline
• Runs after the fact, and repair is error-prone
Outline
• Problem
• Metadata can be corrupted by bugs, and existing techniques are inadequate
• Our Solution: Recon
• A system for protecting metadata from bugs
• Key idea
• Runtime consistency checking
• Design
• Evaluation
Runtime Consistency Checking
• Ensure every update results in a consistent file system
• Makes repair unnecessary!
• “What happens in DRAM stays in DRAM”
BUT
• Consistency properties are global
• Checking global properties requires a full scan
• We can’t run fsck at every write
Consistency Invariants
• We transform global consistency properties into fast, local consistency invariants
• Assume an initial consistent state
• A new file system is clean
• Use checksums/redundancy to handle errors below the file system
• At runtime, check only what is changing
• Do so before changes become persistent
• The resulting new state is consistent
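The reasoning on this slide is an induction over committed transactions. One way to state it, in our own notation rather than the authors': let $S_0, S_1, \dots$ be the on-disk states after successive transactions, $C(S)$ the global consistency property that fsck checks, and $I(S, S')$ the local invariants, evaluated only on the metadata that differs between $S$ and $S'$. Assuming the invariants are designed so that they preserve $C$:

```latex
\begin{align*}
& C(S_0) && \text{new file system is clean} \\
& C(S_i) \land I(S_i, S_{i+1}) \implies C(S_{i+1}) && \text{invariants preserve consistency} \\
& I(S_i, S_{i+1}) \text{ holds for every committed } S_{i+1} && \text{checked before it becomes persistent} \\
& \therefore\ C(S_n) \text{ for all } n \ge 0 && \text{by induction on } n
\end{align*}
```

Because $I$ only inspects what changed, the cost of each check scales with the size of the transaction rather than the size of the disk.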
Example: Block Allocation in Ext3
• Ext3 maintains a block bitmap: every allocated block is marked in the bitmap
[Figure: block bitmap covering blocks 5 to 9; an inode (size, time, block pointers) pointing to block 7; block 8 is shown as the updated block]
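A block bitmap is one bit per block. The Python sketch below models one with a `bytearray`, assuming block numbers are relative to the start of the bitmap and that bit `n % 8` of byte `n // 8` (least significant bit first) represents block `n`, the bit order ext2/ext3 bitmaps use. The `BlockBitmap` class and its methods are hypothetical illustrations, not a kernel API.

```python
from typing import Tuple


class BlockBitmap:
    """Toy block bitmap: bit n is set iff block n is allocated."""

    def __init__(self, nblocks: int) -> None:
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)  # one bit per block

    def _locate(self, n: int) -> Tuple[int, int]:
        """Return (byte index, bit mask) for block n, LSB-first in each byte."""
        if not 0 <= n < self.nblocks:
            raise IndexError(f"block {n} out of range")
        return n // 8, 1 << (n % 8)

    def is_allocated(self, n: int) -> bool:
        byte, mask = self._locate(n)
        return bool(self.bits[byte] & mask)

    def allocate(self, n: int) -> None:
        byte, mask = self._locate(n)
        if self.bits[byte] & mask:
            raise ValueError(f"block {n} is already allocated")
        self.bits[byte] |= mask

    def free(self, n: int) -> None:
        byte, mask = self._locate(n)
        self.bits[byte] &= ~mask & 0xFF  # clear only this block's bit


if __name__ == "__main__":
    bm = BlockBitmap(16)
    bm.allocate(7)
    bm.allocate(8)                            # the update allocates block 8
    print(bm.is_allocated(8), bm.bits.hex())  # True 8001
```

Allocating block 8 sets a bit in a different byte than block 7, which is why a single logical allocation shows up as a one-bit change somewhere inside a bitmap block.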
Example: Block Allocation in Ext3
• Consistency invariant: bitmap bit X flips from “0” to “1” if and only if a block pointer is set to X
• The invariant fails if either update is missing
• Should not mark a block allocated without setting a block pointer to it
• Should not set a block pointer without marking the block allocated
• Can any consistency property be transformed this way?
• File systems must maintain consistency efficiently, so they already do so through local updates
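The invariant can be checked using only what changed in one transaction. The sketch below assumes two hypothetical inputs extracted from that transaction: the blocks whose bitmap bit flipped from 0 to 1, and the block numbers newly written into block pointers (0 to X). Comparing them catches both failure cases above, and counting pointers also flags one new block handed out twice. It illustrates the idea; it is not Recon's code, and it ignores blocks freed and reallocated within the same transaction.

```python
from collections import Counter
from typing import Iterable, List


def check_allocation(bits_set: Iterable[int],
                     pointers_set: Iterable[int]) -> List[str]:
    """Check: bitmap bit X flips 0 -> 1  <=>  a block pointer is set to X.

    bits_set:     blocks whose bitmap bit flipped from 0 to 1 in the transaction.
    pointers_set: block numbers newly written into block pointers (0 -> X).
    Simplification: ignores blocks freed and re-allocated in one transaction.
    """
    marked = set(bits_set)
    pointed = Counter(pointers_set)
    errors: List[str] = []
    for x in sorted(marked - set(pointed)):
        errors.append(f"block {x} marked allocated but no pointer set to it")
    for x in sorted(set(pointed) - marked):
        errors.append(f"pointer set to block {x} without marking it allocated")
    for x, n in sorted(pointed.items()):
        if n > 1:  # one newly allocated block handed out twice
            errors.append(f"block {x} set in {n} block pointers")
    return errors
```

For example, `check_allocation([501], [501])` returns an empty list, while `check_allocation([501], [])` reports that block 501 was marked allocated with no pointer to it.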
When to Check Invariants
• Invariants involve changes to multiple blocks
• When should they be consistent?
• Transactions are used for crash consistency
• Consistency can be checked at transaction boundaries
[Figure: a transaction moving from memory to disk; the transaction must be checked just before its commit block reaches disk]
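One way to picture "check just before the commit block reaches disk" is a block-layer write filter that passes journal blocks through but holds back each commit block until the transaction it closes has been checked. The sketch assumes the JBD journal header used by ext3 (a big-endian magic number 0xC03B3998 followed by a block type, where type 2 marks a commit block). `CommitGate`, `run_invariants`, and `submit` are hypothetical stand-ins, and the filter only models the journal's write stream.

```python
import struct
from typing import Callable, List

JBD_MAGIC = 0xC03B3998   # JBD journal block magic (big-endian on disk)
JBD_COMMIT_BLOCK = 2     # header block type for a commit block


def is_commit_block(block: bytes) -> bool:
    """True if the block starts with a JBD header whose type is 'commit'."""
    if len(block) < 12:
        return False
    magic, blocktype, _sequence = struct.unpack(">III", block[:12])
    return magic == JBD_MAGIC and blocktype == JBD_COMMIT_BLOCK


class CommitGate:
    """Hypothetical filter on the journal write stream: a commit block is
    only passed to disk if the transaction it closes passes the checks."""

    def __init__(self, run_invariants: Callable[[List[bytes]], bool],
                 submit: Callable[[bytes], None]) -> None:
        self.run_invariants = run_invariants  # checks the buffered transaction
        self.submit = submit                  # actually sends a block to disk
        self.pending: List[bytes] = []        # blocks of the open transaction

    def write(self, block: bytes) -> None:
        if is_commit_block(block):
            if not self.run_invariants(self.pending):
                # Without its commit block, journal recovery will not replay
                # this transaction, so the bad update never becomes durable.
                raise RuntimeError("consistency invariant violated; commit blocked")
            self.pending.clear()
        else:
            self.pending.append(block)
        self.submit(block)
```

The design point is that journal blocks may reach disk early; only the commit block makes a transaction durable, so it is the single write that needs to wait for the check.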
Outline
• Problem
• Metadata corruption caused by bugs
• Solution
• Recon
• Key idea
• Runtime checking
• Design
• Metadata interpretation
• Logical change generation
• Evaluation
The Recon Design
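[Figure: Recon design. The file system issues requests through the block layer to the disk (“Ye Olde Disk”). Recon sits in the block layer with a metadata write cache and a metadata read cache, and calls per-file-system modules (Ext3_Recon, Btrfs_Recon) through the FS Recon interface for metadata interpretation and logical change generation.]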
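The split between the generic framework and per-file-system modules can be sketched as an abstract interface. The method names, the `ChangeRecord` tuple, `transaction_changes`, and the toy module below are illustrative assumptions, not Recon's actual interface; they only show which responsibilities the diagram assigns to a module such as Ext3_Recon or Btrfs_Recon, and how the generic side might drive them from the two caches.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, NamedTuple, Optional


class ChangeRecord(NamedTuple):
    type: str      # structure type, e.g. "inode", "Bitmap", "BGD"
    id: int        # which instance of that structure
    field: str     # which field changed
    old: object
    new: object


class FSRecon(ABC):
    """Hypothetical per-file-system module behind the FS Recon interface."""

    @abstractmethod
    def interpret(self, blocknr: int, data: bytes) -> None:
        """Metadata interpretation: learn block types from a block just read."""

    @abstractmethod
    def block_type(self, blocknr: int) -> Optional[str]:
        """Return the known type of a block, or None if it is not metadata."""

    @abstractmethod
    def changes(self, blocknr: int, old: bytes, new: bytes) -> List[ChangeRecord]:
        """Logical change generation: diff old and new copies of one block."""


def transaction_changes(fs: FSRecon, write_cache: Dict[int, bytes],
                        read_cache: Dict[int, bytes]) -> List[ChangeRecord]:
    """Generic side: diff every metadata block written in the transaction."""
    records: List[ChangeRecord] = []
    for blocknr, new in sorted(write_cache.items()):
        if fs.block_type(blocknr) is None:
            continue  # not a metadata block, so nothing to check
        # Assumption: a block never read before is treated as all zeroes.
        old = read_cache.get(blocknr, bytes(len(new)))
        records.extend(fs.changes(blocknr, old, new))
    return records


class ToyFS(FSRecon):
    """Toy module: each registered block holds one counter byte. A real module
    would type a block's children when the parent is read."""

    def __init__(self) -> None:
        self.types: Dict[int, str] = {}

    def interpret(self, blocknr: int, data: bytes) -> None:
        self.types[blocknr] = "counter"

    def block_type(self, blocknr: int) -> Optional[str]:
        return self.types.get(blocknr)

    def changes(self, blocknr: int, old: bytes, new: bytes) -> List[ChangeRecord]:
        if old[0] == new[0]:
            return []
        return [ChangeRecord("counter", blocknr, "value", old[0], new[0])]
```

Keeping the diffing loop generic and pushing all format knowledge into the module is what lets the same framework serve more than one file system.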
Metadata Interpretation
• To check invariants, we need to determine the type of a block on a read or write
• Take advantage of the tree structure of metadata
• The superblock is the root of the tree
• Parents are read before children
• For example, an inode is read before its indirect blocks
• We see the pointer to a block before the block itself, and
• the pointer within the parent determines the type of the child block
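For ext3 inodes this rule is concrete: an inode holds 15 block pointers, the first 12 direct, then one single-, one double- and one triple-indirect pointer, so the slot a pointer occupies fixes the type of the block it names. The sketch below records those types as parents are seen, assuming pointers arrive already decoded as integers. `TypeTracker` and its methods are hypothetical; it starts at inodes (omitting the superblock, group descriptors, and inode tables above them) and does not handle blocks that are freed and reused.

```python
from typing import Dict, List, Optional

N_DIRECT = 12                                          # direct pointers per inode
SLOT_TYPES = {12: "indirect", 13: "dindirect", 14: "tindirect"}
CHILD_TYPES = {"dindirect": "indirect", "tindirect": "dindirect"}


class TypeTracker:
    """Infer block types from parent pointers (simplified ext3-like layout)."""

    def __init__(self) -> None:
        self.types: Dict[int, str] = {}
        self.leaf: Dict[int, str] = {}  # pointer block -> type of leaves below it

    def type_of(self, blocknr: int) -> Optional[str]:
        return self.types.get(blocknr)

    def on_inode(self, block_ptrs: List[int], is_dir: bool) -> None:
        """Called when an inode is read; its 15 pointers type its children."""
        leaf = "dir" if is_dir else "data"  # directory blocks are metadata too
        for slot, ptr in enumerate(block_ptrs):
            if ptr == 0:
                continue  # unused pointer slot
            if slot < N_DIRECT:
                self.types[ptr] = leaf
            else:
                self.types[ptr] = SLOT_TYPES[slot]
                self.leaf[ptr] = leaf

    def on_pointer_block(self, blocknr: int, pointers: List[int]) -> None:
        """Called when a block is read; if it is an indirect block of some
        level, its decoded entries type the next level down."""
        kind = self.types.get(blocknr)
        if kind not in ("indirect", "dindirect", "tindirect"):
            return  # not a pointer block, nothing to learn
        leaf = self.leaf[blocknr]
        child = CHILD_TYPES.get(kind, leaf)  # children of "indirect" are leaves
        for ptr in pointers:
            if ptr == 0:
                continue
            self.types[ptr] = child
            if child != leaf:
                self.leaf[ptr] = leaf
```

Because a parent is always read before its children, every block's type is known by the time the block itself is read or written, which is what the invariants need.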
Logical Change Generation
• Invariants are expressed in terms of logical changes to structures, e.g., bitmaps and pointers
• Recon generates these changes based on
• Block types
• Comparing the blocks in the write cache and the read cache
• Logical changes to metadata structures are represented as a set of change records: [type, id, field, old, new]
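For an inode-table block, change generation amounts to decoding each inode from the read-cache (old) and write-cache (new) copies and emitting a record for every field that differs. This sketch decodes only a few fields and assumes the classic ext2/ext3 inode layout for them (`i_size` at byte 4, `i_blocks` at byte 28, fifteen little-endian 32-bit `i_block` pointers starting at byte 40) and 128-byte inodes; treat these offsets as assumptions to verify against the kernel's inode structure. `inode_block_changes` is a hypothetical name.

```python
import struct
from typing import Dict, List, Tuple

INODE_SIZE = 128                          # assumed classic ext3 inode size
Record = Tuple[str, int, str, int, int]   # (type, id, field, old, new)


def decode_inode(raw: bytes) -> Dict[str, int]:
    """Decode the fields this sketch tracks from one on-disk inode."""
    fields = {
        "i_size": struct.unpack_from("<I", raw, 4)[0],
        "i_blocks": struct.unpack_from("<I", raw, 28)[0],
    }
    for i, ptr in enumerate(struct.unpack_from("<15I", raw, 40)):
        fields[f"blockptr[{i}]"] = ptr
    return fields


def inode_block_changes(first_ino: int, old: bytes, new: bytes) -> List[Record]:
    """Diff two copies of an inode-table block into change records.

    first_ino: inode number of the first inode stored in this block.
    """
    records: List[Record] = []
    for slot in range(len(new) // INODE_SIZE):
        off = slot * INODE_SIZE
        before = decode_inode(old[off:off + INODE_SIZE])
        after = decode_inode(new[off:off + INODE_SIZE])
        for field, value in after.items():
            if before[field] != value:
                records.append(("inode", first_ino + slot, field,
                                before[field], value))
    return records
```

Unchanged fields produce no records, so the invariants downstream see only the handful of values a transaction actually touched.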
Checking with Change Records
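Transaction appends a new block to inode 12:

| type | id | field | oldval | newval |
|---|---|---|---|---|
| inode | 12 | blockptr[1] | 0 | 501 |
| inode | 12 | i_size | 4096 | 8192 |
| inode | 12 | i_blocks | 8 | 16 |
| Bitmap | 501 | -- | 0 | 1 |
| BGD | 0 | free_blocks | 1500 | 1499 |

Invariant checked: bitmap bit X flips from “0” to “1” if and only if a block pointer is set to X (here X = 501).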
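The same records support further invariants beyond the pointer/bitmap one. As an illustration in our own formulation (not necessarily how Recon states its invariants), the sketch checks two accounting rules over this transaction, assuming a single block group, 4 KiB blocks, and ext3's convention that `i_blocks` counts 512-byte sectors: the group descriptor's free-block count must fall by exactly the number of bitmap bits set, and an inode's `i_blocks` must grow by 8 sectors per block pointer that goes from 0 to non-zero. It ignores indirect blocks, which also count toward `i_blocks`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int, str, int, int]   # (type, id, field, old, new)
SECTORS_PER_BLOCK = 4096 // 512           # assumes 4 KiB blocks


def check_accounting(records: List[Record]) -> List[str]:
    errors: List[str] = []
    # Free-block counter vs. bitmap flips (single block group assumed).
    bits_delta = sum(new - old for t, _, _, old, new in records if t == "Bitmap")
    free_delta = sum(new - old for t, _, f, old, new in records
                     if t == "BGD" and f == "free_blocks")
    if free_delta != -bits_delta:
        errors.append(f"free_blocks changed by {free_delta}, bitmap by {bits_delta}")
    # i_blocks vs. block pointers that went from 0 to non-zero (or back).
    ptr_delta: Dict[int, int] = defaultdict(int)
    blocks_delta: Dict[int, int] = defaultdict(int)
    for t, ino, f, old, new in records:
        if t != "inode":
            continue
        if f.startswith("blockptr["):
            ptr_delta[ino] += (new != 0) - (old != 0)
        elif f == "i_blocks":
            blocks_delta[ino] += new - old
    for ino in sorted(set(ptr_delta) | set(blocks_delta)):
        if blocks_delta[ino] != ptr_delta[ino] * SECTORS_PER_BLOCK:
            errors.append(f"inode {ino}: i_blocks changed by {blocks_delta[ino]}")
    return errors


# The change records from the table above.
SLIDE_RECORDS: List[Record] = [
    ("inode", 12, "blockptr[1]", 0, 501),
    ("inode", 12, "i_size", 4096, 8192),
    ("inode", 12, "i_blocks", 8, 16),
    ("Bitmap", 501, "--", 0, 1),
    ("BGD", 0, "free_blocks", 1500, 1499),
]

if __name__ == "__main__":
    print(check_accounting(SLIDE_RECORDS))  # []
```

Dropping the `Bitmap` record from the list would make the first check fail, which is the kind of missing update the invariants are meant to catch.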
Outline
• Problem
• Metadata corruption caused by bugs
• Solution
• Recon
• Key idea
• Runtime checking
• Design
• Evaluation
• Complexity
• Corruption detection
• Performance overhead
Complexity
• Much simpler than file system code
• Only needs to verify the results of file system operations
• Each invariant can be checked independently
• Code is divided into three parts
• Generic Recon framework: 1.5 kLOC
• Ext3 metadata interpretation: 1.5 kLOC
• 31 Ext3 invariants: 800 LOC
Corruption Detection
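[Figure: stacked bar chart of corruptions caught (0% to 100%) for each corrupted structure: inode (stat), inode (blk ptr), inode (others), dir, bgd (block group descriptor), bbm (block bitmap), ibm (inode bitmap), and random. Each bar is split into corruptions detected by both, by e2fsck only, and by Recon only.]
Recon matches e2fsck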
Performance Evaluation
• Used the Linux port of Sun’s FileBench
• Used 5 different emulated workloads
• webserver, webproxy, varmail, fileserver, ms_nfs
• ms_nfs configured to match the metadata characteristics reported in a Microsoft study (FAST ’11)
• 3 GHz dual-core Xeon CPUs, 2 GB RAM
• 1 TB ext3 file system
Performance Evaluation
[Figure: results for the webserver, webproxy, varmail, fileserver, and ms_nfs workloads; cache size = 128 MB]
For reasonable cache sizes, the performance impact is modest
Handling Violations
Several options:
• Prevent all writes and remount read-only
• Preserves correctness
• Reduces availability
• Take a snapshot of the file system and continue
• Minimal availability impact; the snapshot is correct
• Requires repair afterwards
• Micro-reboot the file system or kernel
• Transparent to applications
• Overcomes transient failures
Conclusion
• All consistency properties checked by fsck can be enforced on updates without a full disk scan
• Checking can be done outside the file system, entirely at the block layer
• Preventing corruption from being committed is a huge win over after-the-fact repair!
Thanks!
• To our anonymous reviewers
• To our shepherd, Junfeng Yang
• To the Systems Software Reading Group @ U of T
For their many insightful comments & suggestions!
• To Vivek Lakshmanan
For early insights that helped start the project!
This work was supported by NSERC through the Discovery Grants program