Efficient Software-Based Fault Isolation
By Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham
Presented by Pehr Collins
Background: Tanenbaum-Torvalds Debate
Linus Torvalds, Andrew Tanenbaum
Monolithic versus Microkernel
• Monolithic OS
  – Co-locates modules in the same address space
  – Faults in extension code could bring down the whole OS or corrupt data
  – Not safe!
  – Many developers choose performance over safety
• Microkernel OS
  – Core functions handled by the microkernel
  – Additional functionality added by means of modules in separate address spaces
  – Faults are isolated
  – Safe, but has a performance cost
  – Calls between modules require a full context switch
  – Three orders of magnitude more expensive than a normal procedure call within the same address space
Resolving Conflict Between Safety and Performance
• Last week we looked at “Improving IPC by Kernel Design” by J. Liedtke
  – Optimization techniques could reduce the context-switch performance penalty to two orders of magnitude for microkernel IPC
  – Published at the same time as this paper, but still not enough to tip the balance
• Enter software-based fault isolation
  – No more conflict: OS extension code can be both safe and efficient
How to Resolve Conflict? Sandboxing
• Fault domains are contiguous memory segments used for untrusted modules, each distinguished by a unique identifier
• Protection is handled by software, in the same address space for all modules
Isolating the Fault Domain
• Distrusted module code in a fault domain is modified to prevent it from writing or jumping to outside addresses
• This prevents a distrusted module from harming other domains
• Two ways to accomplish this
  – Segment matching, which pinpoints fault locations
  – Address sandboxing, which provides no information on the source of faults
Segment Matching
• Most control transfer instructions can be statically verified, as the target address is known at compile time
• Checks are added to all other potentially unsafe instructions
  – Jumps to a register address
  – Stores to a register address
• Illegal addresses are prevented via segment matching
  – Check whether the unsafe instruction’s target address has the correct segment identifier
  – If the check fails, trap to a system error routine outside the distrusted module’s fault domain
[Figure: the upper bits of the target address are compared (=) with the segment ID]
Segment Matching
• Requires four dedicated registers
  1. Holds addresses in the code segment
  2. Holds addresses in the data segment
  3. Holds the segment shift amount
  4. Holds the segment identifier
• These registers are used only by inserted code and are never modified by distrusted module code
• Dedicated registers are used to perform the checks on untrusted code
• The performance impact of dedicating some registers is minimal on a RISC system
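The segment matching check can be sketched in a few lines. This is only an illustration under stated assumptions: a 32-bit address whose upper 8 bits name the segment, a hypothetical segment identifier `0x12`, a Python dict standing in for memory, and an exception standing in for the trap to the system error routine. The names `SEGMENT_SHIFT`, `DATA_SEGMENT_ID`, `FaultIsolationTrap`, and `checked_store` are invented for this sketch; the real technique inserts a few machine instructions that use the dedicated registers.

```python
# Assumed layout: the upper 8 bits of a 32-bit address identify the segment.
SEGMENT_SHIFT = 24          # plays the role of the "segment shift amount" register
DATA_SEGMENT_ID = 0x12      # hypothetical value of the "segment identifier" register


class FaultIsolationTrap(Exception):
    """Stands in for the trap to the system error routine."""


def checked_store(memory, target_address, value):
    """Perform a store only if the target lies in the fault domain's data segment."""
    # Inserted check: compare the upper address bits with the segment identifier.
    if (target_address >> SEGMENT_SHIFT) != DATA_SEGMENT_ID:
        # The offending address is known at this point, which is why
        # segment matching can pinpoint where a fault came from.
        raise FaultIsolationTrap(f"illegal store to {target_address:#010x}")
    memory[target_address] = value
```

A store to `0x12000010` goes through, while a store to `0x13000010` raises the trap before memory is touched.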
Address Sandboxing
• Even better performance than segment matching
• Cost: loses the information about the source of faults
• Before each unsafe instruction, insert code that sets the upper bits of the target address to the correct segment identifier
• Does not catch illegal addresses
• Prevents illegal addresses from affecting any other fault domain
• But what happens when there is an illegal address?
  – It just jumps/writes to a garbage location within the fault domain
[Figure: the upper bits of the target address are overwritten with the segment ID]
Address Sandboxing
• Requires five dedicated registers
  1. Holds the segment mask
  2. Holds the code segment identifier
  3. Holds the data segment identifier
  4. Holds the sandboxed code address
  5. Holds the sandboxed data address
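Address sandboxing can be sketched the same way. The layout is again an assumption (32-bit addresses, upper 8 bits as the segment, hypothetical identifier `0x12`), and `OFFSET_MASK`, `DATA_SEGMENT_BITS`, `sandbox_address`, and `sandboxed_store` are names invented for this illustration. Instead of checking the address, the inserted code forces its upper bits to the segment identifier:

```python
SEGMENT_SHIFT = 24                           # assumed: upper 8 bits name the segment
OFFSET_MASK = (1 << SEGMENT_SHIFT) - 1       # plays the role of the "segment mask" register
DATA_SEGMENT_BITS = 0x12 << SEGMENT_SHIFT    # hypothetical data segment identifier, in place


def sandbox_address(target_address):
    """Clear the segment bits, then overwrite them with the correct identifier."""
    return (target_address & OFFSET_MASK) | DATA_SEGMENT_BITS


def sandboxed_store(memory, target_address, value):
    """No check and no trap: an illegal address is silently redirected."""
    memory[sandbox_address(target_address)] = value
```

A legal address such as `0x12000010` is unchanged. An illegal one such as `0x13000010` is rewritten to `0x12000010`, so the write lands at a garbage location inside the fault domain and nothing records where it was originally aimed.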
Both Techniques Require Dedicated Registers
• Segment matching: 4 dedicated registers
• Address sandboxing: 5 dedicated registers
• What happens if all registers are already allocated by the compiler?
Trust/Performance Tradeoff
• Only distrusted modules incur performance penalty
• Trusted modules can run at full speed
• We have covered write and jump, but what about load?
• Security can be ramped up to prevent distrusted modules from reading data outside their fault domain
  – Increases execution time overhead considerably (from 4.3% on average to 21.8%)
Resource Protection
• Fault domains share the same virtual address space
• Problem: if a fault domain makes system calls, it can close or delete files needed by other code in the address space
• Could cause a crash
• Potential solution: modify the OS to know about fault domains
  – Not portable
• Their solution: resource arbitration
Resource Arbitration
• Require distrusted modules to access resources through cross-fault-domain RPC
• Reserve a fault domain to hold trusted arbitration code
• Arbiter determines whether system calls made by other fault domains are safe
• System calls in the object code of distrusted modules are transformed to use the arbiter RPC call
• Trusted modules make system calls as normal and share a fault domain with the arbiter
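The arbitration idea can be illustrated with a toy model. Everything here is hypothetical: the `Arbiter` class, its `sys_open` and `sys_close` methods, and the owner-only policy are invented for this sketch, and no real system calls are made. The slides say only that the arbiter decides whether a call is safe, not which policy it applies.

```python
class Arbiter:
    """Trusted arbitration code, imagined as living in its own fault domain."""

    def __init__(self):
        self._owner_of = {}   # descriptor -> fault domain that opened it
        self._next_fd = 3     # simulated descriptor numbers only

    def sys_open(self, domain_id, path):
        """Stand-in for a transformed open call arriving via cross-fault-domain RPC."""
        fd = self._next_fd
        self._next_fd += 1
        self._owner_of[fd] = domain_id
        return fd

    def sys_close(self, domain_id, fd):
        """Assumed policy: a domain may close only descriptors it opened itself."""
        if self._owner_of.get(fd) != domain_id:
            return False      # refused: the call could harm another fault domain
        del self._owner_of[fd]
        return True
```

With this policy, a descriptor opened by domain 1 cannot be closed by domain 2, which addresses the problem of one fault domain closing files needed by other code in the address space.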
How Do Modules Communicate? Cross-fault-domain RPC
• Since the whole idea of fault domains is to provide better IPC performance, this is essential
• Trusted stubs are used for fault domains to call outside their domain
• Stubs run unprotected, outside both the caller and callee domains
• Stubs copy (marshal) cross-domain arguments and manage machine state
• Trustworthiness of the stubs allows caller and callee to communicate via a shared buffer
• This gives an LRPC-style call, as only a single shared copy of the data is necessary
• Stubs are created manually for now
Cross-fault-domain RPC
Jump Table
• Allows the untrusted module to call into a stub outside its fault domain
• Each entry in the jump table is a legal entry point to a stub outside the untrusted fault domain
• Is read-only to the untrusted module
• Is written by trusted modules to set the entry point addresses
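The jump table's role can be modeled loosely in Python. This is an analogy rather than the actual mechanism, which works on code addresses in read-only memory. Here `make_jump_table` and `cross_domain_call` are hypothetical names, Python callables stand in for stub entry points, and `types.MappingProxyType` stands in for read-only protection.

```python
from types import MappingProxyType


def make_jump_table(stubs):
    """Trusted side: fill in the entry points, then expose a read-only view."""
    return MappingProxyType(dict(stubs))


def cross_domain_call(jump_table, entry, *args):
    """Untrusted side: leave the fault domain only through a table entry."""
    stub = jump_table[entry]   # KeyError if `entry` is not a legal entry point
    return stub(*args)         # the trusted stub copies arguments and switches state
```

The untrusted caller can look up and invoke an entry, but assigning to the table raises `TypeError`, so it cannot redirect an entry to an arbitrary address.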
Performance Testing
• Prototype running on DEC-MIPS and DEC-ALPHA
• Considered:
  1. How much overhead is incurred by software encapsulation?
  2. How fast is cross-fault-domain RPC?
  3. What is the performance impact of using software-enforced fault isolation on an application?
Encapsulation Overhead
RPC and Fault Isolation Costs
[Tables: Fault Domain RPC Cost; Fault Isolation Overhead in POSTGRES]
Results Analysis
• Savings can be represented by a formula
• Function of:
  – Time spent in distrusted code (td)
  – Percentage of time spent crossing fault domains (tc)
  – Overhead of encapsulation (h)
  – Ratio (r) of fault domain crossing time to the crossing time of competing hardware-based RPC
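The formula itself did not survive in this transcript. One form consistent with the four quantities listed is savings = (1 - r)·tc - h·td: the time saved on cheaper crossings minus the time lost to encapsulation. The sketch below assumes that form, so check it against the paper before relying on it. The function name `savings` and the sample value r = 0.1 are illustrative assumptions; h = 0.043 is the 4.3% average overhead quoted in the conclusion.

```python
def savings(tc, td, h, r):
    """Assumed form: fraction of execution time saved versus hardware-based RPC.

    tc: fraction of time spent crossing fault domains
    td: fraction of time spent in distrusted code
    h:  encapsulation overhead (e.g. 0.043 for 4.3%)
    r:  software crossing time divided by hardware RPC crossing time
    """
    return (1.0 - r) * tc - h * td


# Illustrative inputs only: 30% of time crossing, all code distrusted,
# 4.3% encapsulation overhead, crossings ten times cheaper than hardware RPC.
example = savings(tc=0.30, td=1.0, h=0.043, r=0.1)
```

Under this reading the result is positive whenever the crossing savings (1 - r)·tc exceed the encapsulation cost h·td. With no crossings at all (tc = 0) it is negative, because encapsulation is then pure cost.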
Performance Analysis with Entire Application Encapsulated
Performance Analysis with 50% of Application Encapsulated
Conclusion
• Results are impressive at first glance
• Suggest that software-based fault isolation is the way to go in many cases where crossing time is sufficiently faster than standard RPC
• However, security, security, security!
• When security for reads is desired, overhead shoots way up
  – from 4.3% on average to 21.8%!
• Errors from address sandboxing are difficult to track
  – Could generate a garbage address inside the fault domain
• Stubs are manually generated
• Requirement to dedicate 4 or 5 registers could be problematic
  – Solution is geared towards RISC architectures
  – Authors mention that CISC systems like 8086 would suffer performance penalties due to dedicated register requirements
Thanks for your attention!
• Diagrams on Monolithic/Microkernel from Wikipedia
• Photos of Linus Torvalds and Andrew Tanenbaum from Wikipedia
• Segment Matching and Address Sandboxing figures from Tony Bock’s presentation on the same paper (Winter 2006)
Quick Recap
Fault Isolation in cooperating modules: what is the problem?
• Existing schemes place each module in its own address space
• This isolates faults
• Major context switch overhead for tightly-coupled modules
Quick Recap
A Solution in Two Parts
1. Load code and data for the distrusted module into its own fault domain
2. Modify the object code of this module to prevent writing or jumping to addresses outside the fault domain
• Portable and language-agnostic solution
• Cost is a slight increase in execution time for distrusted modules
• Yields a significant boost in inter-fault-domain performance and hence overall performance