nooks - nyu computer science · os kernel programmer makes to insert nooks into a par-ticular os....
TRANSCRIPT
Nooks
Robert GrimmNew York University
The Three Questions
What is the problem?
What is new or different?
What are the contributions and limitations?
Design and Implementation
Nooks Overview
An isolation and recovery service
Manages kernel-space extensions
Targeted at commodity kernels
Implemented in Linux, should be easily portable to other OS's
Detects, removes, and restarts misbehaving extensions
But not malicious ones
With shadow drivers, also hides recovery from applications
Why Safe Extensionsfor Today's Commodity Kernels?
Cost of failures continues to rise
Downtime of mission-critical systems
Staffing for help-desk
Extensions are common-place
70% of Linux code
35,000 different drivers with 120,000 versions for Windows XP
Extensions are leading cause of failures
85% of failures for Windows XP
7 times more bugs in drivers than rest of kernel for Linux
Why Not Use X?
Capabilities, segments
Need specialized hardware, no support for recovery
Micro-, pico-, exo-kernels
No support for recovery, some performance concerns
Transactions
Sloooooooooooooooowwwwwwwwwwww
Type-safe languages and runtimes
Not backwards compatible
Software fault isolation
No support for recovery
Why Not Use X? (cont.)
Virtual machines
Still have drivers in VMM
Insight: Virtualize interface between kernel and extensions but not harware
I.e., we don't need to be perfect, just good enough
Nooks Principles and Goals
Two principles
Design for fault resistance, not fault tolerance
Design for mistakes, not abuse
Three goals
Isolation
Recovery
Backwards compatibility
Nooks Functionality
Isolation
Lightweight protection domainsfor extensions
Extension procedure call (XPC)
Interposition
Wrappers for all kernel / extension crossings
Manage control and data flow
Object-tracking
List of kernel data structures modified by extension
Recovery
Removal and restarting of extensions
!"#$%&'%(
$%&'%(#)*+%',-.',
/,.(0+-.'
/'+%&1.,-+-.'
!23%4+
5&046-'7
8%4.9%&: ;..6,
/,.(0+-.'
<0'07%&
Figure 1: The Nooks Isolation Manager, a transparentOS layer inserted between the kernel and kernel exten-sions.
failures in the extension before they infect other partsof the kernel.
2. Recovery. The architecture must support automaticrecovery after failure to permit applications that de-pend on a failing extension to continue executing.
3. Backward Compatibility. The architecture must applyto existing systems and existing extensions, with min-imal changes to either.
Achieving all three goals in an existing operating systemis challenging. In particular, the need for backward com-patibility rules out certain otherwise appealing technologies,such as type safety and capability-based hardware. Further-more, backward compatibility implies that the performanceof a system using Nooks should not be significantly worsethan a system without it.
3.2 Functions
We achieve the preceding goals by creating a new operatingsystem reliability layer that is inserted between the exten-sions and the OS kernel. The reliability layer intercepts allinteractions between the extensions and the kernel to facili-tate isolation and recovery. A crucial property of this layer istransparency, i.e., to meet our backward compatibility goals,it must be largely invisible to existing components.
Figure 1 shows this new layer, which we call the NooksIsolation Manager (NIM). Above the NIM is the operatingsystem kernel. The NIM function lines jutting up into thekernel represent kernel-dependent modifications, if any, theOS kernel programmer makes to insert Nooks into a par-ticular OS. These modifications need only be made once.Underneath the NIM is the set of isolated extensions. Thefunction lines jutting down below the NIM represent thechanges, if any, the extension writer makes to interface aspecific extension or extension class to Nooks. In general,no modifications should be required at this level, since ourmajor objective is transparency for existing extensions.
The NIM provides four major architectural functions, asshown in Figure 1: Isolation, Interposition, Object Tracking,and Recovery. We describe each function below.
3.2.1 Isolation
The Nooks isolation mechanisms prevent extension errorsfrom damaging the kernel (or other isolated extensions). Ev-ery extension in Nooks executes within its own lightweightkernel protection domain. This domain is an execution con-text with the same processor privilege as the kernel butwith write access to a limited portion of the kernel’s addressspace.
The major task of the isolation mechanism, then, isprotection-domain management. This involves the creation,manipulation, and maintenance of lightweight protection do-mains. The secondary task is inter-domain control transfer.Isolation services support the control flow in both directionsbetween extension domains and the kernel domain.
Unlike system calls, which are always initiated by an ap-plication, the kernel frequently calls into extensions. Thesecalls may generate callbacks into the kernel, which may thengenerate a call into the extension, and so on. This com-plex communication style is handled by a new kernel service,called the Extension Procedure Call (XPC) – a control trans-fer mechanism specifically tailored to isolating extensionswithin the kernel. This mechanism resembles LightweightRemote Procedure Call (LRPC) [2] and Protected Proce-dure Call (PPC) in capability systems [12]. However, LRPCand PPC handle control and data transfer between mutu-ally distrustful peers. XPC occurs between trusted domainsbut is asymmetric (i.e., the kernel has more rights to theextension’s domain than vice versa).
3.2.2 Interposition
The Nooks interposition mechanisms transparently inte-grate existing extensions into the Nooks environment. Inter-position code ensures that: (1) all extension-to-kernel andkernel-to-extension control flow occurs through the XPCmechanism, and (2) all data transfer between the kernel andextension is viewed and managed by Nooks’ object-trackingcode (described below).
The interface between the extension, the NIM, and thekernel is provided by a set of wrapper stubs that are part ofthe interposition mechanism. Wrappers resemble the stubsin an RPC system [4] that provide transparent control anddata transfer across address space (and machine) bound-aries. Nooks’ stubs provide transparent control and datatransfer between the kernel domain and extension domains.Thus, from the extension’s viewpoint, the stubs appear tobe the kernel’s extension API. From the kernel’s point ofview, the stubs appear to be the extension’s function entrypoints.
3.2.3 Object Tracking
The NIM’s object-tracking functions oversee all kernel re-sources used by extensions. In particular, object-trackingcode: (1) maintains a list of kernel data structures thatare manipulated by an extension, (2) controls all modifica-tions to those structures, and (3) provides object informa-tion for cleanup when an extension fails. An extension’sprotection domain cannot modify kernel data structures di-rectly. Therefore, object-tracking code must copy kernelobjects into an extension domain so they can be modifiedand copy them back after changes have been applied. Whenpossible, object-tracking code verifies the type and accessi-
210
Nooks ImplementationAdditional layer for Linux 2.4.18
Same privilege for all code (ring 0)
Memory protection through page tables
Interposition
Interposition
Application Daemon
Device Device Device
Driver Driver
Driver
Driver
Linux Kernel
Nooks Isolation Manager
ApplicationApplications
DaemonDaemons
Interposition
Kernel
Service
Kernel
Service
Driver
Nooks Recovery
Agent
Kernel
Service
Kernel
Service
Figure 2: The Nooks layer (shaded) inside the LinuxOS, showing wrapped Linux extensions executing in iso-lated protection domains. It is not necessary to wrap allextensions, as indicated by the unshaded extensions onthe right.
bility of each parameter that passes between the extensionand kernel. Kernel routines can then avoid scrutinizing pa-rameters, executing checks only when called from unreliableextensions.
3.2.4 Recovery
Nooks’ recovery functions detect and recover from a varietyof extension faults. Nooks detects a software fault when anextension invokes a kernel service improperly (e.g., with in-valid arguments) or when an extension consumes too manyresources. In this case, recovery policy determines whetherNooks triggers recovery or returns control to the extension,with an error code when possible. Nooks detects a hard-ware fault when the processor raises an exception duringextension execution, e.g., when an extension attempts toread unmapped memory or to write memory outside of itsprotection domain. Hardware faults can be recovered fromonly outside the faulting extension, so Nooks always triggersrecovery in this case.
Faulty behavior may also be detected from outside Nooksby a user or a program. The user or program can thentrigger Nooks recovery explicitly.
Extensions executing in a Nooks domain only accessdomain-local memory directly. All extension access to kernelresources is managed and tracked through wrappers. There-fore, Nooks can successfully release extension-held kernelstructures, such as memory objects or locks, during the re-covery process.
4. IMPLEMENTATION
We implemented Nooks inside the Linux 2.4.18 kernel onthe Intel x86 architecture.2 We chose Linux as our platformbecause of its popularity and its wide support for kernel ex-tensions in the form of loadable modules. Our experience
2Updating to newer versions is straightforward, e.g., portingfrom version 2.4.10 to 2.4.18 required no changes to Nooksand only a few minor adjustments where Linux kernel func-tions had moved between files.
Source Components # Lines
Memory Management 1,882Object Tracking 1,454Extension Procedure Call 770Wrappers 14,396Recovery 1,136Linux Kernel Changes 924Miscellaneous 2,074
Total number of lines of code 22,266
Table 2: The number of non-comment lines of sourcecode in Nooks.
working inside other operating systems, though, includingWindows NT, DEC OSF/1, VMS, NetBSD, and Mach, sug-gests that Linux may be a worst-case Nooks target.3 TheLinux kernel provides over 700 functions callable by exten-sions and more than 650 extension-entry functions callableby the kernel. Moreover, few data types are abstracted, andextensions directly access fields in many kernel data struc-tures. Despite these challenges, one developer brought thesystem from concept to function in about 18 months.
The Linux kernel supports standard interfaces for manyextension classes. For example, there is a generic interfacefor block and character devices, and another one for file sys-tems. The interfaces are implemented as C language struc-tures containing a set of function pointers.
Most interactions between the kernel and extensions takeplace through function calls, either from the kernel into ex-tensions or from extensions into exported kernel routines.Some global data structures, such as the current task struc-ture, are directly accessed by extensions. Fortunately, ex-tensions modify few of these structures, and frequently doso through preprocessor macros and inline functions. As aresult, Nooks can interpose on most extension/kernel inter-actions by intercepting the function calls between the exten-sions and kernel.
Figure 2 shows the Nooks layer inside of Linux. Underthe Nooks Isolation Manager are isolated kernel extensions:a single device driver, three stacked drivers, and a kernel ser-vice. These extensions are wrapped by Nooks wrapper stubs,as indicated by the shaded boxes surrounding them. Eachwrapped box, containing one or more extensions, representsa single Nooks protection domain. Figure 2 also shows un-wrapped kernel extensions that continue to interface directlyto the Linux kernel.
The NIM exists as a Linux layer that implements thefunctions described in the previous section. To facilitateportability, we do not use the Intel x86 protection rings ormemory segmentation mechanisms. Instead, extensions ex-ecute at the same (ring 0) privilege level as the rest of thekernel. Memory protection is provided through the conven-tional page table architecture and can be implemented bothwith hardware- and software-filled TLBs.
Table 2 shows the size of the Nooks implementation.Nooks is composed of about 22,000 lines of code. In con-trast, the kernel itself has 2.4 million lines, and all of thecode that ships as part of the Linux 2.4 distribution kit ac-
3While we developed Nooks on Linux, we expect that thearchitecture and design could be ported to other operatingsystems, such as Windows XP or Solaris.
211
Compared to2.4 million lines
in the Linux kernel
Isolation
Lightweight protection domains
Private memory structures for each extension
Heap, stacks, memory-mapped I/O regions, buffers
Different page tables for kernel and each extension (why?)
Kernel can read and write all memory
Each extension can only write its own memory
XPC
Saves caller's context, finds stack, changes page tables
May be deferred
Amortize cost over several logical transfers
Interposition and Wrappers
How to interpose?
Explicitly interpose on extension initialization call
Replace function pointers with wrapped versions
What about kernel objects (i.e., data structures)?
Some are read only ➨ done
Some are written by extensions
Non-performance-critical updates through XPC
Performance-critical updates on shadow copy,synchronized through a deferred XPC on next regular XPC
Call-by-what?
More on Wrappers
Check parameters for validity
Implement call-by-value-result for kernel objects
Perform XPC
Skeleton generatedby tool
Body writtenby hand
!"#$"% &'("$)*+$
,-.
&'("$)*+$
,-.
!"#$"%
/#011"#
/#011"#
.+12342$56
.7"58.7"58
.+12342$56
Figure 4: Control flow of extension and kernel wrappers.
of all control and data transfers in both directions. Inter-position required two changes to Linux kernel code. First,we modified the standard module loader to bind extensionsto wrappers instead of kernel functions when the extensionsare loaded. We also modified the kernel’s module initial-ization code to explicitly interpose on the initialization callinto an extension, enabling the extension to execute withinits lightweight protection domain. Following initialization,all function pointers passed from the extension to the kernelare replaced by wrapper pointers. This causes the kernel tocall wrappers rather than extension procedures directly.
In addition to interposing on control transfers, Nooksmust interpose on some data references. The Linux ker-nel exports many objects that are only read by extensions(e.g., the current time). These objects are linked directlyinto the extension so they can be freely read. Other ker-nel objects are directly written by extensions. We changedmacros and inline functions that directly modify kernel ob-jects into wrapped function calls. For object modificationsthat are not performance critical, Nooks converts the objectaccess into an XPC into the kernel. For performance-criticaldata structures, we create a shadow copy of the kernel objectwithin the extension’s domain. The contents of the kernelobject and the shadow object are synchronized before andafter XPCs into the extension. This technique is used, forexample, for the softnet data structure, which contains aqueue of the packets sent and received by a network device.
Nooks uses deferred XPC to synchronize extension modifi-cations to objects explicitly passed from the kernel to exten-sions. In Linux, the kernel often returns a kernel structurepointer to an extension for structure modification, with noexplicit synchronization of the update. The kernel assumesthat the modification is atomic and that the extension willupdate it “in time.” In such cases, the wrapper queues adeferred function call to copy the modified object back tothe kernel at the extension’s next XPC return to the kernel.
4.3 WrappersAs noted above, Nooks inserts wrapper stubs between ker-nel and extension functions. There are two types of wrap-pers: kernel wrappers are called by extensions to executekernel-supplied functions; extension wrappers are called bythe kernel to execute extension-supplied functions. In eithercase, a wrapper functions as an XPC stub that appears to
the caller as if it were the target procedure in the calleddomain.
Both wrapper types perform the body of their work withinthe kernel’s protection domain. Therefore, the domainchange occurs at a di!erent point depending on the directionof transfer, as shown in Figure 4. When an extension calls akernel wrapper, the wrapper performs an XPC on entry sothat the body of the wrapper (i.e., object checking, copying,etc.) can execute in the kernel’s domain. Once the wrap-per’s work is done, it calls the target kernel function directlywith a (local) procedure call. In the opposite direction, whenthe kernel calls an extension wrapper, the wrapper executeswithin the kernel’s domain. When it is done, the wrapperperforms an XPC to transfer to the target function withinthe extension.
Wrappers perform three basic tasks. First, they check pa-rameters for validity by verifying with the object tracker andmemory manager that pointers are valid. Second, object-tracker code within wrappers implements call-by-value-resultsemantics for XPC, by creating a copy of kernel objects onthe local heap or stack within the extension’s protection do-main. No marshalling or unmarshalling is necessary, becausethe extension and kernel share the kernel address space. Forsimple objects, the synchronization code is placed directlyin the wrappers. For more complex objects, such as file sys-tem inodes or directory entries that have many pointers toother structures, we wrote explicit synchronization routinesto copy objects between the kernel and an extension. Third,wrappers perform an XPC into the kernel or extension toexecute the desired function, as shown in Figure 4.
Wrappers are relatively straightforward to write and in-tegrate into the kernel. We developed a tool that auto-matically generates wrapper entry code and the skeleton ofwrapper bodies from Linux kernel header files. To createthe wrappers for exported kernel functions, the tool takesa list of kernel function names and generates wrappers thatimplement function interposition through XPC. Similarly,for the kernel-to-extension interface, the tool takes a list ofinterfaces (C structures containing function pointers) andgenerates wrappers for the kernel to call.
We wrote the main wrapper body functions by hand. Thisis a one-time task required to support the kernel-extensioninterface for a specific OS. This code verifies that parame-ters are correct and moves parameters between protectiondomains. Once written, wrappers are automatically usableby all extensions that use the kernel’s interface. Writinga wrapper requires knowing how parameters are used: Arethey alive across calls? Can they be passed to other threads?What parameters or fields of parameters may be modified?We performed this task by hand, but metacompilation [14]could be used to determine the characteristics of extensionsby analyzing the set of existing drivers.
4.3.1 Wrapper Code Sharing
Table 2 showed that the Nooks implementation includes 14Klines of wrapper code, over half of the Nooks code base.We implemented 248 wrappers, which we use to isolate 463imported and exported functions. That is, wrapper codeis often shared among multiple drivers in a class or acrossclasses.
Section 5 describes the eight extensions we isolated for ourNooks experiments: two sound-card drivers (sb and es1371),four Ethernet drivers (pcnet32, e1000, 3c59x, and 3c90x), a
213
Even More on WrappersWrappers Used By Extensions
0
10
20
30
40
50
60
70
80
90
100
sb
es13
71
e100
0
pcne
t323c
59x
3c90
xvfa
t
khttp
d
Extension
Num
ber
of W
rapp
ers
Use
d
Unique to thisextensionShared with at leastone other extensionCommon to thisdriver classCommon to alldriversCommon to allextensions
Figure 5: Code sharing among wrappers for di!erentextensions.
file system (VFAT), and an in-kernel Web server (kHTTPd).Figure 5 shows the total number of wrappers (both kerneland extension wrappers) used by each of these extensions.Each bar gives a breakdown of the number of wrappersunique to that extension and the number of wrappers sharedin various ways. For example, of the 44 wrappers used bythe pcnet32 Ethernet driver (31 kernel wrappers and 13 ex-tension wrappers), 27 are shared among the four networkdrivers. Similarly, 39 wrappers are shared between the twosound-card drivers. Overall, of the 159 wrappers that arenot shared, 114 are in the one-of-a-kind extensions VFATand kHTTPd.
4.4 Object TrackingThe Nooks object-tracking mechanism manages the manip-ulation of kernel objects by extensions. An object is a kerneldata structure accessed through a pointer.
The object tracker records all kernel objects and typesin use by extensions. Our Nooks implementation currentlysupports 43 di!erent kernel object types, such as tasklets,PCI devices, inodes, and memory pages. To determine theset of objects to track, we inspected the interfaces betweenthe kernel and our supported extensions and noted everyobject that passed through those interfaces. We then wroteobject-tracking procedures for each of the 43 object typesthat we saw. For each object type, there is a unique typeidentifier and code to release instances of that type duringrecovery.
The Nooks object tracker in Nooks performs two tasks.First, it records the addresses of all objects in use by anextension. Objects used only for the duration of a singleXPC call are recorded in a table attached to the currenttask structure. Objects with long lifetimes are recorded in aper-protection-domain hash table. Second, for objects thatmay be written by an extension, the object tracker recordsan association between the kernel and extension versions ofthe object. This association is used by wrappers to passparameters between the extension’s protection domain andthe kernel’s protection domain.
The object tracker must know the lifetimes of objects toperform garbage collection, when necessary, or to prevent
extensions from using dangling references. Currently, thiscode can be written only by examining the kernel-extensioninterface. There are several common paradigms. For exam-ple, some objects are accessible to the extension only duringthe lifetime of a single XPC call from the kernel. In thiscase, we add the object to the tracker’s database when thecall begins and remove it on return. Other objects are ex-plicitly allocated and deallocated by the extension, in whichcase we know their lifetimes exactly. In still other cases, wego by the semantics of the object and its use. For exam-ple, extensions allocate the timer data structure to suspenda task. We add this object to the object tracker when anextension calls add timer and remove it when the timer exe-cutes, at which point we know that it can no longer be used.In some cases, it may be necessary to modify the kernel tonotify Nooks when an object is deleted.
Complex objects may be handled in di!erent ways. Insome cases, Nooks copies objects into the application do-main, following embedded pointers as appropriate. In othercases, Nooks avoids copying, for example, by mapping net-work packets and disk blocks into and out of an extension.A “page tracker” mechanism within the object tracker re-members the state of these mapped pages and grants andrevokes extension access to the pages.
4.5 RecoveryRecovery in Nooks consists of two parts. After a fault oc-curs, the recovery manager releases resources in use by theextension. The user-mode agent coordinates recovery anddetermines what course of action to take.
Nooks triggers recovery when it detects a failure throughsoftware checks (e.g., parameter validation or livelock de-tection), processor exceptions, or explicit external signals.After a failure, Nooks suspends the running extension andnotifies the recovery manager.
The Nooks recovery manager is tasked with returningthe system, including the extension, to a clean state fromwhich it can continue. The recovery manager executes inphases to ensure that resources are not used after they arereleased. The first phase of recovery is specific to devicedrivers: Nooks disables interrupt processing for the devicecontrolled by the extension, preventing livelock that couldoccur if device interrupts are not properly dismissed. It thenstarts the user-mode recovery agent, which controls the sub-sequent phases of recovery.
The user-mode recovery agent facilitates flexible recoveryby consulting configuration files that set recovery policy forspecific extensions or classes of extensions. The agent canperform extension-specific recovery actions as well as notifysystem managers of the fault. It can also change config-uration parameters, replace the extension, or even disablerecovery if the extension fails too frequently. The agent re-quires that many kernel components, such as a file systemand disk driver, function properly.
By default, the recovery agent initiates full recovery offaulting extensions by unloading the extension, releasing allof its kernel and physical resources, and then reloading andrestarting the extension. The agent first calls the recoverymanager to release any resources that may be safely reusedby the kernel.
The recovery manager signals tasks that are currently ex-ecuting within the extension, or have called through the ex-tension, to unwind. For a task in a non-interruptible state
214
Code sharedbetween extensions!
Object Tracker
Currently supports 43 types
E.g., tasklets, PCI devices, inodes
Records addresses of all objects
If object used for one XPC, table attached to task structure
If object used across several XPCs, hash table
Keeps mapping between kernel and extension versions
Tracks object lifetimes
Single XPC call
Explicit allocation and deallocation
Semantics of object (e.g., timer data structure)
Recovery
Triggered by
Parameter validation, livelock detection, exceptions, signals
Performed by
Recovery manager
Cleans up after extension
User-mode agent
Determines recovery policy
Broken into several stages
Disable interrupts, unwind tasks, release resources, unload extension, reload and init extension, re-enable interrupts
Limitations
Extensions run in kernel mode
May execute privileged instructions
May loop forever (but Nooks detects livelock)
Parameter checking is incomplete
Recovery safe only for dynamically loaded extensions
Evaluation
Evaluation Criteria
The two eff's
Effectiveness (reliability)
Inject faults into extensions
By hand
Automatically based on common bugs
Efficiency (performance)
Measure latency/throughput with and w/o Nooksbut exactly same code (why?)
EffectivenessSystem Crashes
0
40
80
120
160
200
sb e1000 pcnet32 VFAT kHTTPd
Extension under test
Num
ber
of c
rash
es
Native Nooks
Figure 6: The reduction in system crashes in 2000 fault-injection trials (400 for each extension) observed usingNooks. In total, there were 317 system crashes in thenative configuration and only four system crashes withNooks.
5.2.2 Non-Fatal Extension Failures
The preceding subsection makes clear that Nooks is e!ec-tive in achieving its goal – preventing system crashes dueto extension failures. While Nooks is designed to protectthe OS from misbehaving extensions, it is not designed todetect erroneous extension behavior. For example, the net-work could disappear because the device driver corrupts thedevice registers, or a mounted file system might simply be-come non-responsive due to a bug. Neither of these failuresis fatal to the system in its own right, and Nooks does not de-tect (nor is it intended to detect) such problems. If Nookscould detect such problems, however, its recovery servicescould safely restart the faulty extensions.
Our fault-injection trials cause a number of non-fatal ex-tension failures, allowing us to examine Nooks’ e!ectivenessin dealing with these cases, as well. Figure 7 shows theextent to which Nooks reduces non-fatal extension failuresthat occurred in native Linux. In reality, these results aresimply a reflection of the Linux handling of process- andinterrupt-oriented extension code, as previously described.That is, Nooks can trap exceptions in process-oriented ex-tensions and can recover the extensions to bring them to aclean state in many cases.
For the two interrupt-oriented Ethernet drivers (e1000and pcnet32), Nooks already eliminated all system crashesresulting from extension exceptions. The remaining non-crash failures are those that leave the device in a non-functional state, e.g., unable to send or receive packets.Nooks cannot remove these failures for e1000 and pcnet32,since it cannot detect them. The one or two extension fail-ures it eliminated occurred when the device was being ma-nipulated by process-oriented code.
For VFAT and the sb sound card driver, Nooks did reducethe number of non-fatal extension failures, because kernelexceptions in process-oriented code caused Linux to termi-nate the calling process and leave the extension in an ill-defined state. Nooks could detect the processor exceptionsand perform an extension recovery, thereby allowing the ap-plication to continue. The remaining non-fatal extension
Non-fatal Extension Failures
0
50
100
150
200
250
sb e1000 pcnet32 VFAT kHTTPd
Extension under test
Num
ber
of fa
ilure
s
Native Nooks
Figure 7: The reduction in non-fatal extension failuresobserved using Nooks. In total, there were 512 suchfailures in the native configuration and 212 with Nooks.
failures, which occurred under native Linux and Nooks, wereserious enough to leave the extension in a non-functioningstate but not serious enough to generate a processor excep-tion that could be trapped by Nooks.
kHTTPd is similar to the interrupt-oriented drivers be-cause it causes corruption that leads to interrupt-level faults.However, a small number of injected faults caused an excep-tions within the kHTTPd process-oriented code. These werecaught by Nooks and an extension failure was avoided.
In general, the remaining non-fatal extension failures un-der Nooks were the result of deadlock or data structure cor-ruption within the extension itself. Fortunately, such fail-ures were localized to the extension and could usually berecovered from once discovered. It is straightforward to de-velop a “nanny” service that probes for disabled extensionsand invokes Nooks’ recovery procedures, as appropriate. Al-ternatively, the failure could be detected by the user, whocan then invoke Nooks to initiate a manual recovery.
5.2.3 Recovery Errors
The Nooks recovery procedure is straightforward – a faultingextension is unloaded, reloaded, and restarted. For network,sb, and kHTTPd extensions, this process improves reliabil-ity directly. For VFAT, however, which deals with persistentstate stored on disk, there is some chance that the extensiondamaged critical on-disk structures before Nooks detected anerror condition.
In practice, we found that in 90% of the cases, VFAT re-covery resulted in on-disk corruption (i.e., lost or corruptfiles or directories). Since fault injection occurs after manyfiles and directories have been created, the abrupt shutdownand restart of the file system leaves them in a corruptedstate. As an experiment, we caused Nooks to synchronizethe disks with the in-memory disk cache before releasing re-sources on a VFAT recovery. This reduced the number ofcorruption cases from 90% to 10%. While we would not ex-pect Nooks to do this automatically, it suggests that theremay be straightforward extensions to Nooks that could im-prove recovery through the use of application-specific recov-ery services.
217
System Crashes
0
40
80
120
160
200
sb e1000 pcnet32 VFAT kHTTPd
Extension under test
Num
ber
of c
rash
es
Native Nooks
Figure 6: The reduction in system crashes in 2000 fault-injection trials (400 for each extension) observed usingNooks. In total, there were 317 system crashes in thenative configuration and only four system crashes withNooks.
5.2.2 Non-Fatal Extension Failures
The preceding subsection makes clear that Nooks is e!ec-tive in achieving its goal – preventing system crashes dueto extension failures. While Nooks is designed to protectthe OS from misbehaving extensions, it is not designed todetect erroneous extension behavior. For example, the net-work could disappear because the device driver corrupts thedevice registers, or a mounted file system might simply be-come non-responsive due to a bug. Neither of these failuresis fatal to the system in its own right, and Nooks does not de-tect (nor is it intended to detect) such problems. If Nookscould detect such problems, however, its recovery servicescould safely restart the faulty extensions.
Our fault-injection trials cause a number of non-fatal ex-tension failures, allowing us to examine Nooks’ e!ectivenessin dealing with these cases, as well. Figure 7 shows theextent to which Nooks reduces non-fatal extension failuresthat occurred in native Linux. In reality, these results aresimply a reflection of the Linux handling of process- andinterrupt-oriented extension code, as previously described.That is, Nooks can trap exceptions in process-oriented ex-tensions and can recover the extensions to bring them to aclean state in many cases.
For the two interrupt-oriented Ethernet drivers (e1000and pcnet32), Nooks already eliminated all system crashesresulting from extension exceptions. The remaining non-crash failures are those that leave the device in a non-functional state, e.g., unable to send or receive packets.Nooks cannot remove these failures for e1000 and pcnet32,since it cannot detect them. The one or two extension fail-ures it eliminated occurred when the device was being ma-nipulated by process-oriented code.
For VFAT and the sb sound card driver, Nooks did reducethe number of non-fatal extension failures, because kernelexceptions in process-oriented code caused Linux to termi-nate the calling process and leave the extension in an ill-defined state. Nooks could detect the processor exceptionsand perform an extension recovery, thereby allowing the ap-plication to continue. The remaining non-fatal extension
Non-fatal Extension Failures
0
50
100
150
200
250
sb e1000 pcnet32 VFAT kHTTPd
Extension under test
Num
ber
of fa
ilure
s
Native Nooks
Figure 7: The reduction in non-fatal extension failuresobserved using Nooks. In total, there were 512 suchfailures in the native configuration and 212 with Nooks.
failures, which occurred under native Linux and Nooks, wereserious enough to leave the extension in a non-functioningstate but not serious enough to generate a processor excep-tion that could be trapped by Nooks.
kHTTPd is similar to the interrupt-oriented drivers be-cause it causes corruption that leads to interrupt-level faults.However, a small number of injected faults caused an excep-tions within the kHTTPd process-oriented code. These werecaught by Nooks and an extension failure was avoided.
In general, the remaining non-fatal extension failures un-der Nooks were the result of deadlock or data structure cor-ruption within the extension itself. Fortunately, such fail-ures were localized to the extension and could usually berecovered from once discovered. It is straightforward to de-velop a “nanny” service that probes for disabled extensionsand invokes Nooks’ recovery procedures, as appropriate. Al-ternatively, the failure could be detected by the user, whocan then invoke Nooks to initiate a manual recovery.
5.2.3 Recovery Errors
The Nooks recovery procedure is straightforward – a faultingextension is unloaded, reloaded, and restarted. For network,sb, and kHTTPd extensions, this process improves reliabil-ity directly. For VFAT, however, which deals with persistentstate stored on disk, there is some chance that the extensiondamaged critical on-disk structures before Nooks detected anerror condition.
In practice, we found that in 90% of the cases, VFAT re-covery resulted in on-disk corruption (i.e., lost or corruptfiles or directories). Since fault injection occurs after manyfiles and directories have been created, the abrupt shutdownand restart of the file system leaves them in a corruptedstate. As an experiment, we caused Nooks to synchronizethe disks with the in-memory disk cache before releasing re-sources on a VFAT recovery. This reduced the number ofcorruption cases from 90% to 10%. While we would not ex-pect Nooks to do this automatically, it suggests that theremay be straightforward extensions to Nooks that could im-prove recovery through the use of application-specific recov-ery services.
217
Nooks recoversform 99% of
system crashes
But catches only a fractionof non-fatal failures
Efficiency
XPC rate serves as performance indicator
Three broad categories
Benchmark Extension XPC Nooks Native NooksRate Relative CPU CPU
(per sec) Performance Util. (%) Util. (%)
Play-mp3 sb 150 1 4.8 4.6Receive-stream e1000 (receiver) 8,923 0.92 15.2 15.5Send-stream e1000 (sender) 60,352 0.91 21.4 39.3Compile-local VFAT 22,653 0.78 97.5 96.8
Serve-simple-web-page kHTTPd (server) 61,183 0.44 96.6 96.8Serve-complex-web-page e1000 (server) 1,960 0.97 90.5 92.6
Table 4: The relative performance of Nooks compared to native Linux for six benchmark tests. CPU utilization is accu-rate to only a few percent. Relative performance is determined either by comparing latency (Play-mp3, Compile-local)or throughput (Send-stream, Receive-stream, Serve-simple-web-page, Serve-complex-web-page). The data reflects theaverage of three trials with a standard deviation of less than 2%.
Time spent in kernel mode for Compile-local benchmark
0
20
40
60
80
100
120
140
160
180
Native NooksConfiguration under test
Ker
nel t
ime
(sec
onds
) Misc. NooksPage trackingObject trackingXPCVFATOther kernel
Figure 8: Comparative times spent in kernel mode forthe Compile-local (VFAT) benchmark. During the runwith Nooks, the system performed 10,936,572 XPCs intothe kernel and 4,316,338 XPCs into the extension. Timein user mode (not shown) was identical for both config-urations (480 seconds).
This benchmark provides a good opportunity to analyzethe performance impact of our approach. We used statis-tical profiling of the kernel to compare the execution timesfor the VFAT benchmark under the native and Nooks con-figurations. The results are shown in Figure 8. User time(not shown) for both configurations was identical, about 480seconds; however, kernel time is significantly di!erent. Fornative Linux, about 29.5 seconds were spent in the kernel asa whole, of which about 1.5 seconds were spent within theVFAT code. In contrast, Nooks spent about 165 seconds inthe kernel for the same benchmark.
At a high level, Figure 8 illustrates two points about theperformance of Nooks compared to native Linux: there’smore code to run, and the code runs more slowly. The firstpoint is shown by the additional execution components thatappear with Nooks, which reflect isolation and recovery sup-port. The isolation costs manifest themselves in terms ofXPC overhead and in terms of the subsequent TLB missesthat result from changing protection domains. Nooks’ recov-ery support is reflected in terms of the time spent trackingpages and objects. Fundamentally, page and object trackingoccur on every XPC in which a pointer parameter is passed.They allow the Nooks recovery manager to correctly recoverkernel resources in the event of an extension failure. These
measurements demonstrate that isolation and recovery haveroughly equivalent costs, or, more generally, that Nooks’support for recovery doubles the cost of its cross-domaincommunication mechanism for this workload.
The second point – that code runs more slowly – is demon-strated by the fact that the “Other kernel” and “VFAT”components slow down with Nooks. Since the code in bothconfigurations is the same, we speculated that the di!erencewas due to the TLB flushes that occurred during each XPC.Using the Pentium 4 performance counters, we monitoredTLB behavior during runs of the compilation benchmark.Of the 167 kernel-mode seconds spent in Nooks, 127 arespent handling TLB misses. By comparison, native Linuxspends just 5.5 seconds in TLB miss handling. The TLBtherefore causes most of the increase in execution time forthe kernel and VFAT and much of the execution time ofNooks itself.
While other systems have used segmentation to avoidthese additional TLB misses [31, 49], our goal of back-ward compatibility precluded rewriting drivers for a seg-mented addressing model. We have done little to optimizeNooks, but we believe that significant speedup is possiblethrough software techniques that have been demonstratedby other researchers, such as selective TLB flushing andreloading, handcoding critical paths in assembler, and us-ing more finely tuned data structures [42, 13].
6.4 Web Server Benchmarks
The final two benchmarks illustrate the impact on serverperformance of transactional workloads. Serve-simple-web-page uses a high XPC-frequency extension (kHTTPd) onthe server to deliver static content cached in memory. Weused httperf [33] to generate a workload that repeatedly re-quested a single kilobyte-sized Web page and measured themaximum possible request rate. kHTTPd on native Linuxcan serve over 15,000 pages per second. With Nooks, it canserve about 6,000, representing a 60% decrease in through-put.
Two elements of the benchmark’s behavior conspire toproduce such poor performance. First, the kHTTPd serverprocessor is the system bottleneck. For example, whenrun natively, the server’s CPU utilization is nearly 96%.Consequently, the high XPC rate slows the server substan-tially. Second, since the workload is transactional and non-bu!ered, the client’s request rate drops as a function of theserver’s slowdown. By comparison, the Send-stream bench-mark, which exhibits roughly the same rate of XPCs but
219
Efficiency (cont.)
There's more code to run
The code runs more slowly
Benchmark Extension XPC Nooks Native NooksRate Relative CPU CPU
(per sec) Performance Util. (%) Util. (%)
Play-mp3 sb 150 1 4.8 4.6Receive-stream e1000 (receiver) 8,923 0.92 15.2 15.5Send-stream e1000 (sender) 60,352 0.91 21.4 39.3Compile-local VFAT 22,653 0.78 97.5 96.8
Serve-simple-web-page kHTTPd (server) 61,183 0.44 96.6 96.8Serve-complex-web-page e1000 (server) 1,960 0.97 90.5 92.6
Table 4: The relative performance of Nooks compared to native Linux for six benchmark tests. CPU utilization is accu-rate to only a few percent. Relative performance is determined either by comparing latency (Play-mp3, Compile-local)or throughput (Send-stream, Receive-stream, Serve-simple-web-page, Serve-complex-web-page). The data reflects theaverage of three trials with a standard deviation of less than 2%.
Time spent in kernel mode for Compile-local benchmark
0
20
40
60
80
100
120
140
160
180
Native NooksConfiguration under test
Ker
nel t
ime
(sec
onds
) Misc. NooksPage trackingObject trackingXPCVFATOther kernel
Figure 8: Comparative times spent in kernel mode forthe Compile-local (VFAT) benchmark. During the runwith Nooks, the system performed 10,936,572 XPCs intothe kernel and 4,316,338 XPCs into the extension. Timein user mode (not shown) was identical for both config-urations (480 seconds).
This benchmark provides a good opportunity to analyzethe performance impact of our approach. We used statis-tical profiling of the kernel to compare the execution timesfor the VFAT benchmark under the native and Nooks con-figurations. The results are shown in Figure 8. User time(not shown) for both configurations was identical, about 480seconds; however, kernel time is significantly di!erent. Fornative Linux, about 29.5 seconds were spent in the kernel asa whole, of which about 1.5 seconds were spent within theVFAT code. In contrast, Nooks spent about 165 seconds inthe kernel for the same benchmark.
At a high level, Figure 8 illustrates two points about theperformance of Nooks compared to native Linux: there’smore code to run, and the code runs more slowly. The firstpoint is shown by the additional execution components thatappear with Nooks, which reflect isolation and recovery sup-port. The isolation costs manifest themselves in terms ofXPC overhead and in terms of the subsequent TLB missesthat result from changing protection domains. Nooks’ recov-ery support is reflected in terms of the time spent trackingpages and objects. Fundamentally, page and object trackingoccur on every XPC in which a pointer parameter is passed.They allow the Nooks recovery manager to correctly recoverkernel resources in the event of an extension failure. These
measurements demonstrate that isolation and recovery haveroughly equivalent costs, or, more generally, that Nooks’support for recovery doubles the cost of its cross-domaincommunication mechanism for this workload.
The second point – that code runs more slowly – is demon-strated by the fact that the “Other kernel” and “VFAT”components slow down with Nooks. Since the code in bothconfigurations is the same, we speculated that the di!erencewas due to the TLB flushes that occurred during each XPC.Using the Pentium 4 performance counters, we monitoredTLB behavior during runs of the compilation benchmark.Of the 167 kernel-mode seconds spent in Nooks, 127 arespent handling TLB misses. By comparison, native Linuxspends just 5.5 seconds in TLB miss handling. The TLBtherefore causes most of the increase in execution time forthe kernel and VFAT and much of the execution time ofNooks itself.
While other systems have used segmentation to avoidthese additional TLB misses [31, 49], our goal of back-ward compatibility precluded rewriting drivers for a seg-mented addressing model. We have done little to optimizeNooks, but we believe that significant speedup is possiblethrough software techniques that have been demonstratedby other researchers, such as selective TLB flushing andreloading, handcoding critical paths in assembler, and us-ing more finely tuned data structures [42, 13].
6.4 Web Server Benchmarks
The final two benchmarks illustrate the impact on serverperformance of transactional workloads. Serve-simple-web-page uses a high XPC-frequency extension (kHTTPd) onthe server to deliver static content cached in memory. Weused httperf [33] to generate a workload that repeatedly re-quested a single kilobyte-sized Web page and measured themaximum possible request rate. kHTTPd on native Linuxcan serve over 15,000 pages per second. With Nooks, it canserve about 6,000, representing a 60% decrease in through-put.
Two elements of the benchmark’s behavior conspire toproduce such poor performance. First, the kHTTPd serverprocessor is the system bottleneck. For example, whenrun natively, the server’s CPU utilization is nearly 96%.Consequently, the high XPC rate slows the server substan-tially. Second, since the workload is transactional and non-bu!ered, the client’s request rate drops as a function of theserver’s slowdown. By comparison, the Send-stream bench-mark, which exhibits roughly the same rate of XPCs but
219
All Is Good, Is It?
Problem and Requirements
Kernel recovers driver (most of the time)
But applications using failed driver still crash
Need to conceal crash and recovery from applications
Need to centralize generic recovery logic
Need to ensure low overhead
Drivers in Action
Organized into classes
Handle requests
To access hardware
To change configuration
May crash
Due to request stream, hardware interaction,kernel environment
Either deterministically or transiently
Either right away (fail-stop) or some time later
Enter Shadow Drivers
Conceal transient and fail-stop failuresfor entire class of drivers
Monitor driver in passive mode
Replicate procedure calls
Log requests and responses
Impersonate driver/kernel in active mode
Respond to kernel requests
Respond to driver requests during re-initialization
Restore state in driver
Enabling MechanismTaps: T-junction between kernel and drivers
Requires ability to interpose on all communications
Sound Card Device Driver
Shadow Sound Driver
Kern
elIn
terfa
ceSo
und
Drive
rCl
ass
Inte
rface
OS Kernel
Kernel Interface
Sound DriverClass Interface
Sound Card
TapsCopies
Figure 2: A sample shadow driver operating in passivemode. Taps inserted between the kernel and sound driverensure that all communication between the two is passivelymonitored by the shadow driver.
Sound Card Device Driver
Shadow Sound Driver
Kern
elIn
terfa
ceSo
und
Drive
rCl
ass
Inte
rface
OS Kernel
Kernel Interface
Sound DriverClass Interface
Sound Card
Taps
Figure 3: A sample shadow driver operating in active mode.The taps redirect communication between the kernel andthe failed driver directly to the shadow driver.
kernel video drivers often communicate with usermodeapplications through shared memory regions [22].
3.4 The Shadow Manager
Recovery is supervised by the shadow manager, which isa kernel agent that interfaces with and controls all shadowdrivers. The shadow manager instantiates new shadowdrivers and injects taps into the call interfaces betweenthe device driver and kernel. It also receives notifica-tion from the fault-isolation subsystem that a driver hasstopped due to a failure.
When a driver fails, the shadow manager transitionsits taps and shadow driver to active mode. In this mode,
requests for the driver’s services are redirected to an ap-propriately prepared shadow driver. The shadow managerthen initiates the shadow driver’s recovery sequence torestore the driver. When recovery ends, the shadow man-ager returns the shadow driver and taps to passive-modeoperation so the driver can resume service.
3.5 Summary
Our design simplifies the development and integration ofshadow drivers into existing systems. Each shadow driveris a single module written with knowledge of the behav-ior (interface) of a class of device drivers, allowing it toconceal a driver failure and restart the driver after a fault.A shadow driver, normally passive, monitors communi-cation between the kernel and the driver. It becomes anactive proxy when a driver fails and then manages its re-covery.
4 Shadow Driver ImplementationThis section describes the implementation of shadowdrivers in the Linux operating system [6]. We have imple-mented shadow drivers for three classes of device drivers:sound card drivers, network interface drivers, and IDEstorage drivers.
4.1 General Infrastructure
All shadow drivers rely on a generic service infrastructurethat provides three functions. An isolation service pre-vents driver errors from corrupting the kernel by stoppinga driver on detecting a failure. A transparent redirectionmechanism implements the taps required for transparentshadowing and recovery. Lastly, an object tracking ser-vice tracks kernel resources created or held by the driverso as to facilitate recovery.
Our shadow driver implementation uses Nooks to pro-vide these functions. Through its fault isolation subsys-tem, Nooks [33] isolates drivers within separate kernelprotection domains. The domains use memory protec-tion to trap driver faults and ensure the integrity of kernelmemory. Nooks interposes proxy procedures on all com-munication between the device driver and kernel. We in-sert our tap code into these Nooks proxies to replicateand redirect communication. Finally, Nooks tracks ker-nel objects used by drivers to perform garbage collectionof kernel resources during recovery.
Our implementation adds a shadow manager to theLinux operating system. In addition to receiving failurenotifications from Nooks, the shadow manager handlesthe initial installation of shadow drivers. In coordina-tion with the kernel’s module loader, which provides thedriver’s class, the shadow manager creates a new shadowdriver instance for a driver. Because a single shadowdriver services a class of device drivers, there may be
OSDI ’04: 6th Symposium on Operating Systems Design and ImplementationUSENIX Association 5
Sound Card Device Driver
Shadow Sound Driver
Kern
elIn
terfa
ceSo
und
Drive
rCl
ass
Inte
rface
OS Kernel
Kernel Interface
Sound DriverClass Interface
Sound Card
TapsCopies
Figure 2: A sample shadow driver operating in passivemode. Taps inserted between the kernel and sound driverensure that all communication between the two is passivelymonitored by the shadow driver.
Sound Card Device Driver
Shadow Sound Driver
Kern
elIn
terfa
ceSo
und
Drive
rCl
ass
Inte
rface
OS Kernel
Kernel Interface
Sound DriverClass Interface
Sound Card
Taps
Figure 3: A sample shadow driver operating in active mode.The taps redirect communication between the kernel andthe failed driver directly to the shadow driver.
kernel video drivers often communicate with usermodeapplications through shared memory regions [22].
3.4 The Shadow Manager
Recovery is supervised by the shadow manager, which isa kernel agent that interfaces with and controls all shadowdrivers. The shadow manager instantiates new shadowdrivers and injects taps into the call interfaces betweenthe device driver and kernel. It also receives notifica-tion from the fault-isolation subsystem that a driver hasstopped due to a failure.
When a driver fails, the shadow manager transitionsits taps and shadow driver to active mode. In this mode,
requests for the driver’s services are redirected to an ap-propriately prepared shadow driver. The shadow managerthen initiates the shadow driver’s recovery sequence torestore the driver. When recovery ends, the shadow man-ager returns the shadow driver and taps to passive-modeoperation so the driver can resume service.
3.5 Summary
Our design simplifies the development and integration ofshadow drivers into existing systems. Each shadow driveris a single module written with knowledge of the behav-ior (interface) of a class of device drivers, allowing it toconceal a driver failure and restart the driver after a fault.A shadow driver, normally passive, monitors communi-cation between the kernel and the driver. It becomes anactive proxy when a driver fails and then manages its re-covery.
4 Shadow Driver ImplementationThis section describes the implementation of shadowdrivers in the Linux operating system [6]. We have imple-mented shadow drivers for three classes of device drivers:sound card drivers, network interface drivers, and IDEstorage drivers.
4.1 General Infrastructure
All shadow drivers rely on a generic service infrastructurethat provides three functions. An isolation service pre-vents driver errors from corrupting the kernel by stoppinga driver on detecting a failure. A transparent redirectionmechanism implements the taps required for transparentshadowing and recovery. Lastly, an object tracking ser-vice tracks kernel resources created or held by the driverso as to facilitate recovery.
Our shadow driver implementation uses Nooks to pro-vide these functions. Through its fault isolation subsys-tem, Nooks [33] isolates drivers within separate kernelprotection domains. The domains use memory protec-tion to trap driver faults and ensure the integrity of kernelmemory. Nooks interposes proxy procedures on all com-munication between the device driver and kernel. We in-sert our tap code into these Nooks proxies to replicateand redirect communication. Finally, Nooks tracks ker-nel objects used by drivers to perform garbage collectionof kernel resources during recovery.
Our implementation adds a shadow manager to theLinux operating system. In addition to receiving failurenotifications from Nooks, the shadow manager handlesthe initial installation of shadow drivers. In coordina-tion with the kernel’s module loader, which provides thedriver’s class, the shadow manager creates a new shadowdriver instance for a driver. Because a single shadowdriver services a class of device drivers, there may be
OSDI ’04: 6th Symposium on Operating Systems Design and ImplementationUSENIX Association 5
Passive mode Active mode
Implementation
General infrastructure
Isolation service to prevent kernel corruption
Redirection service to implement taps
Object tracking service to facilitate recovery
Luckily, we got Nooks
Lightweight protection domains, wrappers, object tracking
Passive Monitoring
Track all requests
Either: state of each active connection
Or: log of pending commands
Record configuration parameters
Log ioctl calls
Track all kernel objects (used by driver)
In reality: many calls require no work
Recovery: Stop Failed Driver
Disable execution of driver (e.g., any tasks in driver)
Disable hardware device (e.g., interrupts)
Also remove I/O mappings (for memory mapped I/O)
Garbage collect resources held by driver
However, not those used by kernel to request driver services
Recovery: Re-initialization
Goal: reboot driver from clean slate
Need to keep copy of device driver's clean data section
No need to access disk, whose driver may have crashed
What about driver's code?
Re-initialization in action
Initialize driver's internal state
Repeat kernel's initialization sequence
Reattach driver to pre-failure kernel resources
Recovery: Transfer State
Goal: Restore driver to just before time of failure
Restore configuration state
Connections and ioctl's, depending on class
Handle outstanding requests as well as new arrivals
Strategy depends on driver class
Disk and network devices, drivers, and kernel stack tolerate duplicate requests ➙ reissue requests
Printers do not like duplicates ➙ drop requests
Recovery: Proxy Requests
Responses depend on driver/request semantics:
Respond with recorded information
Silently drop request
Queue request for later processing
Block request until recovery is complete
Report driver as busy
"Please call again during regular business hours"
Implementation depends on interface specbut not drivers themselves
Another Paper,Another Evaluation
Criteria
Performance: what is the overhead of shadow drivers?
Fault tolerance: can applications continue to run?
Limitations: how realistic is fail-stop assumption?
Code size: how hard is it to implement?
Not hard: 700 lines for sound, 200 for network, 300 for IDE
Performance
My slightly pessimistic take: Nooks already is so slow that shadowing doesn't matter
Figure 5: Comparative application performance, relativeto Linux-Native, for three configurations. The X-axis crossesat 80%.
Figure 6: Absolute CPU utilization by application for threeconfigurations.
alone. Across all nine applications, performance of thesystem with shadow drivers averaged 99% of the systemwithout, and was never worse than 97%.
The low overhead of shadow drivers can be explainedin terms of its two constituents: fault isolation and theshadowing itself. As mentioned previously, fault isola-tion runs each driver in its own domain, leading to over-head caused by domain crossings. Each domain crossingtakes approximately 3000 cycles, mostly to change pagetables and execution stacks. As a side effect of chang-ing page tables, the Pentium 4 processor flushes the TLB,resulting in TLB misses that can noticeably slow downdrivers [33].
For example, the kernel calls the driver approximately1000 times per second when running audio recorder.Each invocation executes only a small amount of code.As a result, isolating the sound driver adds only negligi-bly to CPU utilization, because there are not many cross-
ings and not much code to slow down. For the most disk-intensive of the IDE storage applications, the databasebenchmark, the kernel and driver interact only 290 timesper second. However, each call into the ide-disk driverresults in substantial work to process a queue of disk re-quests. The TLB-induced slowdown doubles the timedatabase spent in the driver relative to Linux-Native andincreases the application’s CPU utilization from 21% to27%. On the other hand, the network send benchmarktransmits 45,000 packets per second, causing 45,000 do-main crossings. The driver does little work for eachpacket, but the overall impact is visible in Figure 6, whereCPU utilization for this benchmark increases from 28%to 57% with driver fault isolation.
In the case the actual shadowing, we see from a com-parison of the Linux-Nooks and Linux-SD bars in Fig-ures 5 and 6 that the cost is small or negligible. As notedin Section 4.2, many passive-mode shadow-driver func-tions are no-ops. As a result, the incremental passive-mode performance cost over basic fault isolation is lowor unmeasurable in many cases.
In summary, then, the overall performance penalty ofshadow drivers during failure-free operation is low, sug-gesting that shadow drivers could be used across a widerange of applications and environments.
5.2 Fault-Tolerance
Regardless of performance, the crucial question forshadow drivers is whether an application can continuefunctioning following the failure of a device driver onwhich it relies. To answer this question, we tested 10applications on the three configurations, Linux-Native,Linux-Nooks, and Linux-SD. For the disk and sounddrivers, we again ran the applications shown in Table 2.Because we were interested in the response to, not per-formance, we substituted network file copy, remote win-dow manager, and network analyzer for the networkingbenchmarks.
We simulated common bugs by injecting a softwarefault into a device driver while an application usingthat driver was running. Because both Linux-Nooksand Linux-SD depend on the same isolation and failure-detection services, we differentiate their recovery capa-bilities by simulating failures that are easily isolated anddetected. To generate realistic synthetic driver bugs, weanalyzed patches posted to the Linux Kernel MailingList [24]. We found 31 patches that contained the strings“patch,” “driver,” and “crash” or “oops” (the Linux termfor a kernel fault) in their subject lines. Of the 31 patches,we identified 11 that fix transient bugs (i.e., bugs that oc-cur occasionally or only after a long delay from the trig-gering test). The most common cause of failure (three in-stances) was a missing check for a null pointer, often witha secondary cause of missing or broken synchronization.
OSDI ’04: 6th Symposium on Operating Systems Design and Implementation USENIX Association10
Fault-Tolerance
Why does Nooks crash for IDE driver crashes?
Why do speech synthesizer and some network toolscontinue to function with Nooks?
Application BehaviorDevice Driver Application Activity Linux-Native Linux-Nooks Linux-SD
Sound mp3 player CRASH MALFUNCTION!
(audigy driver) audio recorder CRASH MALFUNCTION!
speech synthesizer CRASH! !
strategy game CRASH MALFUNCTION!
Network network file transfer CRASH! !
(e1000 driver) remote window manager CRASH! !
network analyzer CRASH MALFUNCTION!
IDE compiler CRASH CRASH!
(ide-disk driver) encoder CRASH CRASH!
database CRASH CRASH!
Table 3: The observed behavior of several applications following the failure of the device drivers on which they rely. Thereare three behaviors: a checkmark (
!) indicates that the application continued to operate normally; CRASH indicates that
the application failed completely (i.e., it terminated); MALFUNCTION indicates that the application continued to run, butwith abnormal behavior.
We also found missing pointer initialization code (two in-stances) and bad calculations (two instances) that led toendless loops and buffer overruns. Because these faultsare detected by Nooks, they cause fail-stop failures onLinux-Nooks and Linux-SD.
We injected a null-pointer dereference bug derivedfrom these patches into our three drivers. We ensuredthat the synthetic bug was transient by inserting the buginto uncommon execution paths, such as code that han-dles unusual hardware conditions. These paths are rarelyexecuted, so we accelerated the occurrence of faults byalso executing the bug at random intervals. The fault coderemains active in the driver during and after recovery.
Table 3 shows the three application behaviors weobserved. When a driver failed, each application ei-ther continued to run normally (
!), failed completely
(“CRASH”), or continued to run but behaved abnormally(“MALFUNCTION”). In the latter case, manual inter-vention was typically required to reset or terminate theprogram.
This table demonstrates that shadow drivers (Linux-SD) enable applications to continue running normallyeven when device drivers fail. In contrast, all applica-tions on Linux-Native failed when drivers failed. Mostprograms running on Linux-Nooks failed or behaved ab-normally, illustrating that Nooks’ kernel-focused recov-ery system does not extend to applications. For example,Nooks isolates the kernel from driver faults and reboots(unloads, reloads, and restarts) the driver. However, itlacks two key features of shadow drivers: (1) it does notadvance the driver to its pre-fail state, and (2) it has nocomponent to “pinch hit” for the failed driver during re-covery. As a result, Linux-Nooks handles driver failuresby returning an error to the application, leaving it to re-cover by itself. Unfortunately, few applications can dothis.
Some applications on Linux-Nooks survived the driverfailure but in a degraded form. For example, mp3 player,audio recorder and strategy game continued running, butthey lost their ability to input or output sound until theuser intervened. Similarly, network analyzer, which in-terfaces directly with the network device driver, lost itsability to receive packets once the driver was reloaded.
A few applications continued to function properlyafter driver failure on Linux-Nooks. One application,speech synthesizer, includes the code to reestablish itscontext within an unreliable sound card driver. Two of thenetwork applications survived on Linux-Nooks becausethey access the network device driver through kernel ser-vices (TCP/IP and sockets) that are themselves resilientto driver failures.
Linux-SD recovers transparently from disk driver fail-ures. Recovery is possible because the IDE storageshadow driver instance maintains the failing driver’s ini-tial state. During recovery the shadow copies back theinitial data and reuses the driver code, which is alreadystored read-only in the kernel. In contrast, Linux-Nooksillustrates the risk of circular dependencies from reboot-ing drivers. Following these failures, Nooks, which hadunloaded the ide-disk driver, was then required to reloadthe driver off the IDE disk. The circularity could only beresolved by a system reboot. While a second (non-IDE)disk would mitigate this problem, few machines are con-figured this way.
In general, programs that directly depend on driverstate but are unprepared to deal with its loss benefit themost from shadow drivers. In contrast, those that do notdirectly depend on driver state or are able to reconstructit when necessary benefit the least. Our experience sug-gests that few applications are as fault-tolerant as speechsynthesizer. Were future applications to be pushed in thisdirection, software manufacturers would either need to
OSDI ’04: 6th Symposium on Operating Systems Design and ImplementationUSENIX Association 11
Limits to Recovery
Automatic detection is incomplete, but recovery really good
develop custom recovery solutions on a per-applicationbasis or find a general solution that could protect any ap-plication from the failure of a kernel device driver. Costis a barrier to the first approach. Shadow drivers are apath to the second.
Application Behavior During Driver Recovery
Although shadow drivers can prevent application failure,they are not “real” device drivers and do not provide com-plete device services. As a result, we often observed aslight timing disruption while the driver recovered. Atbest, output was queued in the shadow driver. At worst,input was lost by the device. The length of the delay wasprimarily determined by the recovering device driver it-self, which, on initialization, must first discover, and thenconfigure, the hardware.
Few device drivers implement fast reconfiguration,which can lead to brief recovery delays. For example,the temporary loss of the e1000 network device driverprevented applications from receiving packets for aboutfive seconds.2 Programs using files stored on the diskmanaged by the ide-disk driver stalled for about four sec-onds during recovery. In contrast, the normally smoothsounds produced by the audigy sound card driver wereinterrupted by a pause of about one-tenth of one second,which sounded like a slight click in the audio stream.
Of course, the significance of these delays dependson the application. Streaming applications may becomeunacceptably “jittery” during recovery. Those processinginput data in real-time might become lossy. Others maysimply run a few seconds longer in response to a diskthat appears to be operating more sluggishly than usual.In any event, a short delay during recovery is best con-sidered in light of the alternative: application and systemfailure.
5.3 Limits to Recovery
The previous section assumed that failures were fail-stop.However, driver failures experienced in deployed systemsmay exhibit a wider variety of behaviors. For example, adriver may corrupt state in the application, kernel, or de-vice without the failure being detected. In this situation,shadow drivers may not be able to recover or mask fail-ures from applications. This section uses fault injectionexperiments in an attempt to generate faults that may notbe fail-stop.
Non-fail-stop Failures
If driver failures are not fail stop, then shadow driversmay not be useful. To evaluate whether device driver fail-
2This driver is particularly slow at recovery. The other networkdrivers we tested recovered in less than a second.
Figure 7: Results of fault-injection experiments on Linux-SD. We show (1) the percentage of failures that are automat-ically detected by the fault isolation subsystem, and (2) thepercentage of failures that shadow drivers successfully re-covered. The total number of failures experienced by eachapplication is shown at the top of the chart.
ures are indeed fail-stop, we performed large-scale fault-injection tests of our drivers and applications running onLinux-SD. For each driver and application combination,we ran 350 fault-injection trials.3 In total, we ran 2100trials across the three drivers and six applications. Be-tween trials, we reset the system and reloaded the driver.For each trial, we injected five random errors into thedriver while the application was using it. We ensured theerrors were transient by removing them during recovery.After injection, we visually observed the impact on theapplication and the system to determine whether a fail-ure or recovery had occurred. For each driver, we testedtwo applications with significantly different usage scenar-ios. For example, we chose one sound-playing applica-tion (mp3 player) and one sound-recording application(audio recorder).
If we observed a failure, we then assessed the trial ontwo criteria: whether the fault was detected, and whetherthe shadow driver could mask the failure and subsequentrecovery from the application. For undetected failures,we triggered recovery manually. Note that a user mayobserve a failure that an application does not, for exam-ple, by testing the application’s responsiveness.
Figure 7 shows the results of our experiments. Foreach application, we show the percentage of failuresthat the Nooks subsystem detected and the percentage offailures from which shadow drivers correctly recovered.Only 18% of the injected faults caused a visible failure.
In our tests, 390 failures occurred across all applica-tions. The sytem automatically detected 65% of the fail-ures. In every one of these cases, shadow drivers wereable to mask the failure and facilitate driver recovery. The
3For details on the fault injector see [33].
OSDI ’04: 6th Symposium on Operating Systems Design and Implementation USENIX Association12
Pulling Back
Some Issues
If only we had a software/tagged TLB...
What about end-to-end benchmarking?
All/most drivers managed by Nooks
Typical application mix
How many wrappers is enough?
How general is Nooks?
Supports only one communication pattern
Kernel / extension, but not between extensions
What about deterministic faults?
What Do You Think?