Why panic()? Improving Reliability through Restartable File Systems

Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift

Applications require data and use file systems to store it reliably.

Both hardware and software can fail.

Typical solution: large clusters for availability, with reliability through replication.

[Figure: GFS cluster showing the GFS master, slave nodes, OS, and FS]

Replication is infeasible for desktop environments.

Wouldn’t RAID work? It can only tolerate hardware failures.

FS crashes are more severe: services and applications are killed, requiring an OS reboot and recovery.

Need: better reliability in the event of file system failures.

[Figure: three applications, a RAID controller, and its disks]

Outline: Motivation, Background, Restartable file systems, Advantages and limitations, Conclusions

int journal_mark_dirty(….)
{
    struct reiserfs_journal_cnode *cn = NULL;

    if (!cn) {
        cn = get_cnode(p_s_sb);
        if (!cn) {
            /* The failure is detected here, then escalated to a panic. */
            reiserfs_panic(p_s_sb, "get_cnode failed!\n");
        }
    }
}

void reiserfs_panic(struct super_block *sb, ...)
{
    BUG();
    /* this is not actually called, but makes reiserfs_panic() "noreturn" */
    panic("REISERFS: panic %s\n", error_buf);
}

ReiserFS

File systems already detect failures

Recovery: simplified by generic recovery mechanism

1. Code to recover from all failures: not feasible in reality.

2. Restart on failure: previous work has taken this approach.

File systems need stateful and lightweight recovery.

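Existing restart-based approaches (heavyweight vs. lightweight, stateless vs. stateful):

Stateless, heavyweight: Nooks/Shadow, Xen, Minix, L4, Nexus

Stateless, lightweight: SafeDrive, Singularity

Stateful, heavyweight: CuriOS, EROS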

Goal: build lightweight & stateful solution to tolerate file-system failures

Solution: single generic recovery mechanism for any file system failure

1. Detect failures through assertions

2. Clean up resources used by the file system

3. Restore file-system state before the crash

4. Continue to service new file system requests
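The four steps above can be illustrated with a small user-space sketch. It is only an analogy, not the kernel mechanism from the talk: it assumes `setjmp`/`longjmp` as the way to leave a failed operation, and every name in it (`fs_assert`, `fs_operation`, `fs_request`, `restart_count`, `open_resources`) is hypothetical. The state-restore step is elided, and the injected fault is assumed to be transient so that re-execution succeeds.

```c
#include <assert.h>
#include <setjmp.h>

/* All names here are hypothetical; this is a user-space analogy only. */
jmp_buf recovery_point;   /* where control resumes after a detected failure */
int restart_count = 0;    /* number of restarts performed so far */
int open_resources = 0;   /* stand-in for locks and memory held by the FS */

/* Step 1: detect the failure, but jump to recovery instead of panicking. */
#define fs_assert(cond) \
    do { if (!(cond)) longjmp(recovery_point, 1); } while (0)

/* A toy file-system operation that fails when a fault is injected. */
int fs_operation(int inject_fault)
{
    open_resources++;           /* acquire a resource */
    fs_assert(!inject_fault);   /* integrity check that may fail */
    open_resources--;           /* normal release path */
    return 0;
}

/* Runs one request and restarts the toy file system on failure. */
int fs_request(int inject_fault)
{
    volatile int fault = inject_fault;

    if (setjmp(recovery_point) != 0) {
        open_resources = 0;     /* step 2: clean up what the FS held */
        restart_count++;        /* step 3: restore state (elided here) */
        fault = 0;              /* assumption: the fault was transient */
    }
    return fs_operation(fault); /* step 4: service the request (again) */
}
```

Calling `fs_request(1)` in this sketch returns 0 to the caller after one restart, so the caller never observes the failure.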


FS Failures: completely transparent to applications


Transparency: multiple applications may be using the FS at the time of a crash, with intertwined execution.

Fault tolerance: handle a gamut of failures by transforming them into fail-stop failures.

Consistency: the OS and FS could be left in an inconsistent state.

FS consistency required to prevent data loss


Not all file systems support crash consistency, and FS state is constantly modified by applications.

Periodically checkpoint FS state: mark dirty blocks as copy-on-write and ensure each checkpoint is written atomically.

On crash: revert to the last checkpoint.
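A minimal in-memory sketch of the checkpoint-and-revert idea follows. It is a simplified analogue, not the page-cache mechanism described in the talk: the first write to a block after a checkpoint saves the old contents, and a rollback restores them. The `cow_store` structure, the function names, and the block sizes are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSZ   8

/* Hypothetical in-memory store: live blocks plus copies from the checkpoint. */
struct cow_store {
    char live[NBLOCKS][BLKSZ];   /* blocks seen by the running file system */
    char saved[NBLOCKS][BLKSZ];  /* contents as of the last checkpoint */
    int  copied[NBLOCKS];        /* 1 if the block was copied this epoch */
};

/* Start a new epoch: the current live contents become the checkpoint. */
void cow_checkpoint(struct cow_store *s)
{
    memset(s->copied, 0, sizeof s->copied);
}

/* Write a string to a block, preserving the checkpointed copy first. */
void cow_write(struct cow_store *s, int blk, const char *data)
{
    if (!s->copied[blk]) {
        memcpy(s->saved[blk], s->live[blk], BLKSZ);
        s->copied[blk] = 1;
    }
    strncpy(s->live[blk], data, BLKSZ - 1);
    s->live[blk][BLKSZ - 1] = '\0';
}

/* On crash: put every modified block back to its checkpointed contents. */
void cow_rollback(struct cow_store *s)
{
    for (int blk = 0; blk < NBLOCKS; blk++) {
        if (s->copied[blk]) {
            memcpy(s->live[blk], s->saved[blk], BLKSZ);
            s->copied[blk] = 0;
        }
    }
}
```

Only blocks touched since the checkpoint are copied, which is what keeps periodic checkpoints cheap in this sketch.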

[Figure: timeline of an application, the VFS, and the file system across Epoch 0 and Epoch 1, showing open("file"), write(), read(), and close() calls marked as completed or in-progress at the time of the crash]

1. Periodically create checkpoints

2. File system crash

3. Unwind in-flight processes

4. Move to recent checkpoint

5. Replay completed operations

6. Re-execute unwound process

File systems are constantly modified, so it is hard to identify a consistent recovery point.

Naïve solution: prevent any new FS operation and call sync. This is inefficient and has unacceptable overhead.

[Figure: applications, VFS, file system, page cache, and disk]

File Systems write to disk through Page Cache

All requests go through the VFS layer

Control requests to the FS (e.g., ext3, VFAT) and dirty pages to disk.

[Figure: the App / VFS / File System / Page Cache / Disk stack shown three times; surviving labels are "Regular", "STOP", and "Membrane"]

Some file systems have a built-in crash-consistency mechanism: journaling or snapshotting.

Seamlessly integrate with these mechanisms: file systems need to indicate the beginning and end of a transaction. This works for data and ordered journaling modes; writeback mode needs to be combined with COW.


Log operations at the VFS level, so existing file systems need not be modified.

Operations: open, close, read, write, symlink, unlink, seek, etc. Read:

Logs are thrown away after each checkpoint

What about logging writes?

Mainly used for replaying writes.

Goal: reduce the overhead of logging writes.

Solution: grab data from the page cache during recovery.
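The idea of logging writes without their data can be sketched as follows. This is a toy model under stated assumptions, not the actual implementation: the page cache is assumed to survive the file-system restart, the log is discarded at each checkpoint, and a write record keeps only the descriptor, offset, and count. All names (`page_cache`, `fs_state`, `checkpoint_state`, `logged_write`, `log_checkpoint`, `crash_and_replay`) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 64
#define LOG_MAX    16

/* Hypothetical toy model; none of these names come from a real kernel. */
char page_cache[CACHE_SIZE];        /* assumed to survive the FS restart */
char fs_state[CACHE_SIZE];          /* file-system state, lost on a crash */
char checkpoint_state[CACHE_SIZE];  /* state as of the last checkpoint */

/* A write record stores where the data went, not the data itself. */
struct write_record { int fd; int offset; int count; };
struct write_record op_log[LOG_MAX];
int log_len = 0;

/* Perform a write and log only its metadata. */
int logged_write(int fd, const char *buf, int offset, int count)
{
    if (log_len == LOG_MAX || offset < 0 || count < 0 ||
        offset + count > CACHE_SIZE)
        return -1;
    memcpy(page_cache + offset, buf, (size_t)count);
    memcpy(fs_state + offset, buf, (size_t)count);
    op_log[log_len++] = (struct write_record){ fd, offset, count };
    return count;
}

/* Take a checkpoint and throw the log away. */
void log_checkpoint(void)
{
    memcpy(checkpoint_state, fs_state, CACHE_SIZE);
    log_len = 0;
}

/* Simulate a crash, revert to the checkpoint, then replay logged writes
 * using data taken from the page cache. */
void crash_and_replay(void)
{
    memcpy(fs_state, checkpoint_state, CACHE_SIZE);
    for (int i = 0; i < log_len; i++) {
        const struct write_record *r = &op_log[i];
        memcpy(fs_state + r->offset, page_cache + r->offset,
               (size_t)r->count);
    }
}
```

Because each record is three integers regardless of how many bytes were written, the cost of logging a write in this sketch does not grow with the write size.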

[Figure: three snapshots of the VFS / File System / Page Cache stack handling Write (fd, buf, offset, count)]

Setup


Restart ext2 during random-read micro benchmark

Data (MB)    Recovery Time (ms)
10           12.9
20           13.2
40           16.1


Improves tolerance to file system failures: builds trust in new file systems (e.g., ext4, btrfs).

Quick-fix bug patching: developers can transform corruptions into restarts, restarting instead of extensively restructuring code.

Encourages more integrity checks in FS code: assertions can be seamlessly transformed into restarts, making file systems more robust to failures and crashes.


Only tolerates fail-stop failures.

Not address-space based: faults could corrupt other kernel components.

FS restart may be visible to applications, e.g., inode numbers could change after a restart.

[Figure: within Epoch 0, the application runs create("file1"), stat("file1"), and write("file1", 4k) before the crash, and the same operations run again after crash recovery. file1 has inode# 15 in one run and inode# 12 in the other, so the inode number seen by the application no longer matches.]

Failures are inevitable in file systems: learn to cope with them rather than hope to avoid them.

Generic recovery mechanism for FS failures: improves FS reliability and availability of data.

Users: install new file systems with confidence.

Developers: ship file systems faster, as not all exception cases are now show-stoppers.

Questions and Comments

Advanced Systems Lab (ADSL)

University of Wisconsin-Madison

http://www.cs.wisc.edu/adsl
