Oct 26, 2008 DynAMOS -- KOCSEA '08
Runtime Mutation of Commodity Operating System Kernels
or
Please, No More Rebooting!
Kyung Dong Ryu <[email protected]>, IBM T.J. Watson Research Center
Kristis Makris <[email protected]>, Arizona State University

IBM Research Worldwide
2700 researchers in eight labs around the world

Culture of Innovation: External Recognition
5 National Medals of Science
5 Nobel Laureates
8 National Medals of Technology
6 Turing Awards
59 Members in National Academy of Engineering
21 Members in National Academy of Sciences
More than 300 Professional Society Fellows
10 Inductees in National Inventors Hall of Fame
Scanning Tunneling Microscope
Electron Tunneling Effect
High Temperature Superconductivity
Nuclear Magnetic Resonance Techniques (basis for MRI today)
High Performance Computing
First woman recipient in the history of this prestigious ACM award
DRAM
SiGe
Silicon-on-Insulator
Copper Chip Technology
• AAAS
• ACM
• ACS
• APS
• AVS
• ECS
• IEEE
• IOP
• OSA

IBM Patent Leadership – 2006
(Bar chart of patent counts per company; vertical axis from 0 to 4,000. Values are paired with companies in the order both appear on the slide.)
IBM 3,621
Samsung 2,451
Canon 2,366
Matsushita 2,228
HPQ 2,110
Intel 1,959
Sony 1,771
Hitachi 1,731
Toshiba 1,671
Micron 1,610

Overview
Motivation
Dynamic Kernel Updates Categorization
System Architecture
Adaptive Function Cloning
Synchronized Updates
Applications
Conclusion

Motivation
Dynamic kernel updates are essential
Existing updating methods are inadequate
Two approaches
– Build adaptable OS
    Specially crafted (K42, VINO, Synthetix)
    Requires OS and application restructuring
– Dynamic code instrumentation
    No kernel source modification (KernInst, GILK)
    Basic block code interposition
    Currently limited
    – No procedure replacement
    – No autonomous kernel adaptability
    – No safe, complete subsystem update guarantees

Dynamic Updates Categorization (1)
Updating variable values
– Update an entry in system call table
– Update owner (uid) of an inode
    Needs synchronized update
– Count number of system calls of a process
    Needs state tracking
Updating datatypes
– Add new fields in Linux PCB for process checkpointing
    Update all functions that use the old datatype, or
    Maintain new fields in separate data structure
    – Does not need state transfer

Dynamic Updates Categorization (2)
Updating single function
– Correct a defect
Updating kernel threads
– Update memory paging subsystem
    Needs update during infinite loop
Updating function groups
– Update pipefs subsystem
    Needs synchronized update

Our Approach
DynAMOS
– Prototype for i386 Linux 2.2-2.6
Dynamic code instrumentation
– No kernel source modification or reboot
– Procedure replacement
Adaptive updates
– Concurrent execution of multiple versions
– State tracking
– Autonomous kernel adaptability
Safe updates of complete subsystems
– Quiescence detection
– Update synchronization (non-quiescent subsystems)
– Datatype updates
– State transfer

DynAMOS System Architecture
(Diagram built up over six slides. Only its labels survive; they are listed per build.)
Build 1: update source and kernel source; gcc, ld; vmlinux; make object file; insert module. The unmodified kernel in memory holds the original function images, next to the new function images.
Build 2: load DynAMOS. The DynAMOS kernel module appears next to the new and original function images.
Build 3: initiate update. The update tool reaches the version manager in the DynAMOS kernel module through /dev/dynamos.
Build 4: prepare update. Disassembler, image relocation, copy; result: cloned new function images.
Build 5: the cloned new function images sit alongside the original and new function images.
Build 6: activate update. The version manager sets up redirection to the cloned new function images.

Execution Flow Redirection
(Diagram built up over six slides; boxes: caller, schedule, trampoline, redirection handler, adaptation handler, schedule_clone, schedule_LL_clone.)
Example: apply Linger-Longer scheduler
– Unobtrusive fine-grain cycle stealing
– Implemented in schedule_LL as a scheduling policy
Step 1: the caller executes "call schedule".
Step 2: the trampoline (jmp *) at the start of schedule leads to the redirection handler.
    Trampoline installation
    – Disable processor interrupts
    – Flush I-cache
    Indirect jump
    – Don't modify page permissions
    Redirection handler: preserve state, perform bookkeeping, execute adaptation handler (call / ret), restore state
    Bookkeeping
    – Maintain use counters
    User-defined adaptation handler
    – Execute if available
    – Select active version of function
Step 3: the redirection handler jumps (jmp *) to the active function: schedule_clone or schedule_LL_clone.
Step 4: the active function jumps back to the redirection handler.
Step 5: the redirection handler preserves state, performs bookkeeping, restores state, and executes ret to return to the caller.
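
The five steps above can be modeled at a high level in Python. This is only a sketch of the control flow, not DynAMOS code: the real mechanism patches machine code, while here a callable object stands in for the trampoline plus redirection handler. The class name `RedirectionHandler`, the version labels, and the rule used by `pick_scheduler` are all hypothetical.

```python
from collections import defaultdict


class RedirectionHandler:
    """Stands in for the trampoline and redirection handler of one function."""

    def __init__(self, original):
        self.versions = {"original": original}  # version label -> callable
        self.active = "original"
        self.in_use = defaultdict(int)          # use counters per version
        self.adaptation_handler = None          # optional user-defined hook

    def add_version(self, label, func):
        self.versions[label] = func

    def __call__(self, *args):
        # Step 2: control arrives here; run the adaptation handler if present.
        if self.adaptation_handler is not None:
            self.active = self.adaptation_handler(self, *args)
        label = self.active
        self.in_use[label] += 1                 # bookkeeping on entry
        try:
            # Step 3: transfer control to the active version.
            return self.versions[label](*args)
        finally:
            # Steps 4-5: the version comes back here before the caller resumes.
            self.in_use[label] -= 1             # bookkeeping on exit


def schedule(idle_fraction):
    return "default policy"


def schedule_LL(idle_fraction):
    return "linger-longer policy"


def pick_scheduler(handler, idle_fraction):
    # Hypothetical rule, for illustration only.
    return "LL" if idle_fraction > 0.5 else "original"


sched = RedirectionHandler(schedule)
sched.add_version("LL", schedule_LL)
sched.adaptation_handler = pick_scheduler
```

Callers keep invoking `sched(...)` as before; which version runs is decided per call, and the use counters return to zero once no call is in flight.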

Adaptive Function Cloning Benefits
No processor state saved on stack
– Function arguments accessed directly
Autonomous kernel determination of update timeliness
– Using adaptation handler
Function-level updates
– Basic blocks can be bypassed (no control-flow graph needed)
– Function modifications developed in original source language

Function Relocation Issues
Replace ret (1-byte) with jmp * (6-byte) back to handler
– Adjust inbound (jmp) and outbound (call) relative offsets
Safely detect
– Backward branches: jmp to code overwritten by trampoline
– Outbound branches: jmp to code outside function image
– Indirect outbound branches: jmp * from indirection table
– Data-in-code
Need user verification
– Multiple entry-points: e.g. produced by Intel C Compiler
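
The offset adjustment for an outbound call can be illustrated with a little byte arithmetic. The sketch below assumes the standard i386 encodings (`E8 rel32` for a relative call, `FF 25 disp32` for an indirect jump through an absolute address) and handles only one case: a call whose target lies outside the function image. The function names and the sample image are made up for illustration and are not taken from DynAMOS.

```python
import struct

CALL_REL32 = 0xE8   # x86: call with a 32-bit relative displacement
CALL_LEN = 5        # opcode byte plus four displacement bytes


def indirect_jmp(slot_addr: int) -> bytes:
    """Encode i386 `jmp *slot_addr` (FF 25 disp32), six bytes long."""
    return struct.pack("<BBI", 0xFF, 0x25, slot_addr)


def relocate_call(image: bytes, offset: int, old_base: int, new_base: int) -> bytes:
    """Copy `image`, fixing the call at `offset` so it reaches the same target."""
    if image[offset] != CALL_REL32:
        raise ValueError("no call rel32 at this offset")
    (rel,) = struct.unpack_from("<i", image, offset + 1)
    next_insn = offset + CALL_LEN
    # A relative call is resolved against the address of the next instruction.
    target = old_base + next_insn + rel
    patched = bytearray(image)
    struct.pack_into("<i", patched, offset + 1, target - (new_base + next_insn))
    return bytes(patched)


# Hypothetical function image: nop; call +0x100; ret
image = bytes([0x90, CALL_REL32]) + struct.pack("<i", 0x100) + bytes([0xC3])
clone = relocate_call(image, offset=1, old_base=0x1000, new_base=0x2000)
```

Moving the image up by 0x1000 bytes shrinks the displacement by the same amount, so the call in the clone still lands on the original target. Replacing the one-byte `ret` with the six-byte jump changes the image length, which is why branches inside the image need the same kind of fix-up.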

Overhead
Small memory footprint (42k)
Indirect addressing (jmp *) hurts branch prediction
– Can use direct addressing (jmp)
– Overhead not correlated to path length
– Mostly 1-8%

Quiescence Detection
Needed to
– Atomically update function groups
    e.g. Count number of processes using a filesystem
– Safely reverse updates
Implemented by
– Usage counters
    On entry and exit
– Stack walk-through
    For non-returning calls (do_exit in Linux; no ret instruction)
    Examine stack and program counter of all processes
    Default kernel compilation (works without frame pointers)
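
The two checks listed above can be combined as in the following sketch. It assumes a conservative stack scan that treats every stack word as a possible return address, which is one way to cope without frame pointers; whether DynAMOS scans stacks exactly this way is not stated on the slide. The function name and the data layout are hypothetical.

```python
def is_quiescent(func_ranges, use_counters, process_stacks):
    """
    func_ranges:    {name: (start, end)} address ranges of the functions to update
    use_counters:   {name: count} incremented on entry, decremented on exit
    process_stacks: {pid: [pc, word, word, ...]} program counter followed by
                    the raw words found on that process's kernel stack
    """
    # Check 1: no function in the group has an unmatched entry.
    if any(use_counters.get(name, 0) > 0 for name in func_ranges):
        return False
    # Check 2: no program counter or stack word points into the group.
    for words in process_stacks.values():
        for word in words:
            if any(start <= word < end for start, end in func_ranges.values()):
                return False
    return True


# Hypothetical snapshot: process 2 has a return address inside pipe_read.
ranges = {"pipe_read": (0x1000, 0x1100)}
counters = {"pipe_read": 0}
stacks = {1: [0x5000, 0x5010], 2: [0x6000, 0x1050]}
```

A scan like this can report a false "in use" when a stack word merely looks like an address in range, which delays an update but never makes it unsafe.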

Non-quiescent Subsystems
Adaptively enlarge pipefs 4k copy buffer during large data transfers
Reader and writer are synchronized with each other

pipe_read()
{
    ...
    acquire Sem
    while (buffer_empty) {      /* wait for new data in buffer */
        ...
        release Sem
L1:     sleep
        acquire Sem
    }
    read from data buffer
    release Sem
    return
}

pipe_write()
{
    ...
    acquire Sem
    while (buffer_full) {       /* wait for more room in buffer */
        ...
        release Sem
L2:     sleep
        acquire Sem
    }
    write in data buffer
    release Sem
    return
}

Non-quiescent Subsystems (continued)
(Same pipe_read() and pipe_write() code, marked with the legend: quiescent; non-quiescent, sleeping.)
Subsystem may never quiesce
Cannot update atomically

Synchronized update of pipefs
Phase 1

pipe_read() {
    acquire Sem
    while (4k_buffer_empty) {
        release Sem
L1:     sleep
        acquire Sem
    }
    read data from 4k_buffer
    release Sem
    return
}

pipe_read_v3() {
    acquire Sem
    while (1mb_buffer_empty) {
        release Sem
L1:     sleep
        acquire Sem
    }
    read data from 1mb_buffer
    release Sem
    return
}

Synchronized update of pipefs
Phase 2
pipe_read() and pipe_read_v3() are unchanged from Phase 1.
Semantically equivalent version at source code level
Wait for pipe_read to become inactive

pipe_read_v2() {
    acquire Sem
    while (4k_buffer_empty) {
        release Sem
L1:     sleep
        acquire Sem
        if (must_update) {
            phase = 3
            STATE TRANSFER
            goto new
        }
    }
    read data from 4k_buffer
    release Sem
    return
new:
}

Synchronized update of pipefs
Phase 2 (continued)
pipe_read() and pipe_read_v3() are unchanged from Phase 1.
Inline updated version

pipe_read_v2() {
    acquire Sem
    while (4k_buffer_empty) {
        release Sem
L1:     sleep
        acquire Sem
        if (must_update) {
            phase = 3
            STATE TRANSFER
            goto new
        }
    }
    read data from 4k_buffer
    release Sem
    return
    while (1mb_buffer_empty) {
        release Sem
        sleep
        acquire Sem
new:
    }
    read data from 1mb_buffer
    release Sem
    return
}

Synchronized update of pipefs
Phase 3
pipe_read(), pipe_read_v2() (with the updated version inlined) and pipe_read_v3() are unchanged from Phase 2.

pipe_read_adaptation_handler() {
    if (phase == 3)
        activate pipe_read_v3
    else
        activate pipe_read_v2
    if (this process read more than 64k)
        must_update = 1
}

Sleep in original version
Awake in new version
Multi-phase approach
Adaptive update
30-90% improvement in Linux 2.6
3.2% overhead when not adapting
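
The phase logic can be traced with a small Python model. It is a simplification under stated assumptions: one `PipeState` object lumps together what the slides keep as separate variables (`phase`, `must_update`, and the per-process read count), semaphores and sleeping are left out, and the state transfer is only a comment. The names `PipeState`, `adaptation_handler` and `pipe_read_v2_wakeup` are made up for this sketch.

```python
from dataclasses import dataclass

THRESHOLD = 64 * 1024  # "read more than 64k" from the adaptation handler


@dataclass
class PipeState:
    phase: int = 2             # phase 2: pipe_read_v2 is the active version
    must_update: bool = False
    bytes_read: int = 0


def adaptation_handler(state: PipeState) -> str:
    """Choose the version for this call, then flag the update if warranted."""
    active = "pipe_read_v3" if state.phase == 3 else "pipe_read_v2"
    if state.bytes_read > THRESHOLD:
        state.must_update = True
    return active


def pipe_read_v2_wakeup(state: PipeState) -> str:
    """Model what pipe_read_v2 does after waking at L1 and re-acquiring Sem."""
    if state.must_update:
        state.phase = 3        # later calls go straight to pipe_read_v3
        # STATE TRANSFER would move data from the 4k buffer to the 1mb buffer
        return "1mb_buffer"    # continue in the inlined new code ("goto new")
    return "4k_buffer"
```

A reader that went to sleep in the old loop therefore wakes up, sees `must_update`, and finishes its call in the new code, which is the "sleep in original version, awake in new version" behavior named on the slide.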

Adaptive Memory Paging For Efficient Gang Scheduling
Kernel thread update (kswapd), Linux 2.2
– Infinite loop
– Awakened by other subsystems
– Goes back to sleep
    e.g. calls interruptible_sleep_on in Linux
To update
– Activate interruptible_sleep_on_v2
    Save state, exit
    Start new version of kernel thread, restore state

Kernel-Assisted Process Checkpointing
Datatype update for EPCKPT in Linux 2.4
– Compact datatypes in commodity kernel. No extra room
    struct task_struct: semaphores, pipes, memory mapped files
    struct file: checkpoint filename
Shadow data structures
– Instantiation (do_fork, sys_open): map memory address of original variable to shadow using hash table
– Removal (do_exit, fput): free shadow too
– Already instantiated variables
    Shadow missing: idempotent use of new fields
– Update only functions that use new fields
    No state transfer needed
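
A shadow table of this kind can be sketched with a Python dict, which is itself a hash table. The class `ShadowTable`, its method names, and the example field `checkpoint_filename` are hypothetical. How the updated functions treat a missing shadow is an assumption here: the sketch simply returns `None` so the caller can skip the new fields.

```python
class ShadowTable:
    """Maps the address of an original object to its shadow (the new fields)."""

    def __init__(self, make_shadow):
        self._shadows = {}               # address -> shadow object
        self._make_shadow = make_shadow  # builds an empty set of new fields

    def instantiate(self, addr):
        # Called where the original is created (do_fork, sys_open on the slide).
        self._shadows[addr] = self._make_shadow()
        return self._shadows[addr]

    def remove(self, addr):
        # Called where the original is freed (do_exit, fput on the slide).
        # Safe to call for objects that never had a shadow.
        self._shadows.pop(addr, None)

    def lookup(self, addr):
        # Objects created before the update have no shadow; report that as None.
        return self._shadows.get(addr)


# Hypothetical use: a shadow for a file object holding a checkpoint filename.
files = ShadowTable(lambda: {"checkpoint_filename": None})
shadow = files.instantiate(0xC0001000)
shadow["checkpoint_filename"] = "ckpt.img"
```

Because the original structures are never resized, code that does not know about the new fields keeps working unchanged, which matches the "update only functions that use new fields" point above.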

Related Work
K42
– Specially designed with hot-swappable capabilities
– Guarantees quiescence
Ginseng
– User-level software updates; requires recompilation
KernInst, GILK, Detours, ATOM, EEL
– Do not facilitate adaptive execution
– Do not safely replace complete subsystems

On-going and Future Work
Automatically produce updates given a patch
– Apply MOSIX, Superpages: parallel applications
– Apply Nooks: OS reliability
– Upgrade Linux kernel
Multiprocessor support
– Safely install trampoline: freeze other processors using single-byte trap instruction (ud2)
Kernel module port
– FreeBSD, OpenSolaris

Conclusion
Dynamic Kernel Updates
– Dynamic code instrumentation
– Commodity operating system (prototype for i386 Linux 2.2-2.6)
Adaptive function cloning
– Concurrent execution of multiple function versions
Safe updates of non-quiescent subsystems
– Scheduler, kernel threads, synchronized updates
Datatype updates
Demonstrated updates
– Synchronized pipefs adaptation, process checkpointing, adaptive memory paging for efficient gang-scheduling, unobtrusive fine-grain cycle stealing, public security fixes
Small memory footprint (42k), 1-8% overhead

Questions?