Copyright 2013 – Noah Mendelsohn
The Life and Death of a Process
Noah Mendelsohn
Tufts University
http://www.cs.tufts.edu/~noah
Based on a presentation by Professor Alva Couch
COMP 111: Operating Systems (Fall 2013)
© 2010 Noah Mendelsohn
Today
How processes are created, managed and terminated
Sharing the computer (redux)
From file.c to a.out to running image
Library routines and shared libraries
Operating systems do two things for us:
• They make the computer easier to use
• They facilitate sharing of the computer by multiple programs and users
…actually, Unix & Linux have one more goal:
• To facilitate running the same program (and OS!) on different types of computer
The protected OS “Kernel”
[Figure: the CPU and main memory; the operating system kernel and multiple programs running at once (Angry Birds, Play Video, Browser) share main memory]
The operating system is a special, privileged program,
with its own code and data. We call the protected, shared part
of the OS the “kernel”.
We need help from the hardware to protect the kernel!
[Figure: the CPU and main memory, holding the operating system kernel and programs (Angry Birds, Play Video, Browser)]
The hardware has memory mapping features that the OS can use to:
• Hide the kernel from other programs
• Hide programs from each other
• Convince each program it’s got its own private memory starting at address zero
Privileged instructions only the OS can use
[Figure: the CPU and main memory, holding the operating system kernel and programs (Angry Birds, Play Video, Browser)]
The hardware has special instructions that only the kernel can use to:
• Initiate I/O
• Set clocks and timers
• Control memory mapping
The kernel runs in “privileged” or “kernel” or “supervisor” state.
Ordinary programs run in “user mode”.
If a user program tries a privileged operation, the hardware will tell the kernel!
A process is an instance of a running program
[Figure: three processes (Angry Birds, Play Video, Browser) sharing the CPU and memory; the operating system kernel manages the disk, printer, keyboard, mouse and display]
How does this process get started?
How does the OS know what code to run?
Process creation in Unix/Linux
Each process starts life as a clone of its parent
– Use the fork() system call to create a clone
When it’s born, each process inherits from its parent
– Open files (related processes share a file pointer)
– Environment variables
– Many other things: e.g. current directory
– An exact copy of all memory segments from the parent
– The same code running at the same place, i.e. dropping through the fork!
Each copy can tell whether it is parent or child
– Parent gets the child’s process ID as the return value from fork
– Child gets zero
Example of fork() system call
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
    pid_t child_pid;   /* child’s process id or zero */

    fprintf(stderr, "PARENT: Parent has started\n");
    child_pid = fork();   /* fork into two processes: a parent and a child */

    if (child_pid < 0) {              /* fork failed: there is no child */
        perror("fork");
        exit(EXIT_FAILURE);
    } else if (child_pid > 0) {       /* parent: fork returned the child's pid */
        fprintf(stderr, "PARENT: my pid is %d and my parent is %d\n",
                getpid(), getppid());
        waitpid(child_pid, NULL, 0);  /* wait for (reap) this child */
        fprintf(stderr, "PARENT: my child with pid=%d has died\n", child_pid);
    } else {                          /* child: fork returned zero */
        fprintf(stderr, "CHILD: my pid is %d and my parent is %d\n",
                getpid(), getppid());
    }
    return 0;
}
Using stderr instead of stdout because it’s unbuffered…output is never delayed
• Print startup message and fork into two processes – a parent and a child
• The parent gets the child’s process id…the child gets zero (fork returns -1 if no child could be created)
• In the parent: print a message, wait for the child to complete its work, announce that the child has been “reaped”, exit (drop through)
• In the child: print a message, exit
Remember: on a multi-core machine, the parent and the child may really be running at the same time!
OUTPUT
$ fork1
PARENT: Parent has started
PARENT: my pid is 26928 and my parent is 24979
CHILD: my pid is 26930 and my parent is 26928
PARENT: my child with pid=26930 has died
$
Question: what’s the parent’s parent?
$ ps
  PID TTY      TIME     CMD
24979 pts/22   00:00:00 tcsh
26834 pts/22   00:00:00 ps
Most commands have the shell as a parent!
Some things to note about fork
Code for parent and child is the same
…we haven’t learned to run another program yet
Parent and child run in parallel
Child gets a copy of variables – changes are not seen by the parent
There is a tree of processes rooted at a special system “init” process, which is always pid=1 … every process except init has a parent!
Parent must wait for the child to die; otherwise the dead child remains a “zombie”
Zombies, process IDs and the process table
Every process has an ID (you’ve seen that)
Inside the kernel, there is a data structure known as a process descriptor for each process that hasn’t been “reaped” by a wait call from the parent
On 32 bit Linux systems, there are typically 32767 process IDs…if they get used up, the system can’t make new processes
Process IDs and descriptors can’t be reused until the parent has waited
Therefore: be sure to wait for the death of every process you create!
What happens if the parent dies without waiting?
– Children get inherited by init, which will reap them
– The big problem is if you keep running and don’t reap your children!
Killing a process
Send the process a kill signal to ask/tell it to die
How?
– From another process: kill(victim_pid, SIGKILL) system call
– From the console: kill -9 victim_pid
SIGKILL (signal number 9, hence kill -9) is a magic signal that the victim cannot intercept: it will kill the process immediately
Other signals are used for many purposes...covered next week
The parent still must wait/reap or the result is a zombie…the shell will reap processes of commands it launches
Review: How can your process “call” the kernel?
[Figure: programs (Angry Birds, Play Video, Browser) running above the operating system kernel, which provides the filesystem, graphics system, window system, TCP/IP networking, etc., and manages the CPU, memory, disk, printer, keyboard, mouse and display]
Your program can use system calls to ask the kernel for
service (e.g. read, kill, etc.)
Signals: How the kernel can call your program!
[Figure: parent and child processes in memory, with the operating system kernel below them]
The kernel can cause a preset signal handler in your program to run…this alerts your program to some news from the kernel
Before a signal can be caught, the program must call signal() to identify the handler function:
signal(SIGNAME, handler_function)
If no handler, OS supplies default behavior
Signal handling example
/* sigalarm.c */
#include <stdio.h>
#include <stdbool.h>
#include <signal.h>
#include <unistd.h>

volatile sig_atomic_t sleeping = false;

void timeisup(int sig)
{
    /* write() is async-signal-safe; fprintf() is not */
    static const char msg[] = "Quit bothering me, I was sleeping!\n";
    (void)sig;
    write(STDERR_FILENO, msg, sizeof msg - 1);
    sleeping = false;
}

int main(void)
{
    signal(SIGALRM, timeisup);   /* call timeisup when SIGALRM arrives */
    alarm(5);                    /* Please wake me in 5 seconds */

    sleeping = true;
    while (sleeping) {
        printf("zzz...\n");
        sleep(1);
    }
    fprintf(stderr, "Dang, you woke me up!\n");
    return 0;
}
signal(SIGALRM, timeisup) tells the OS to call the timeisup function when the alarm signal arrives: timeisup is the signal handler
alarm(5) tells the OS to send SIGALRM in 5 seconds
The while loop prints “zzz...” once a second
QUESTION: does sleeping ever become false?
YES!! When the signal arrives!
Advanced topic: why is sleeping declared volatile sig_atomic_t?
On most machines, this will work fine if you declare sleeping as an int, however…
There are two issues in principle:
1) The compiler working on main() needs to know that sleeping could get updated by code it’s not seeing (the handler); volatile warns it
2) For some data types, updating a value takes multiple instructions, and the alarm could ring while the data is in an inconsistent state. Very unlikely for an int, but sig_atomic_t is guaranteed to be updated atomically
Glad you asked?
Lesson: asynchronous programming is tricky, and OSs do it all the time!
One process can ask kernel to signal another
[Figure: parent and child processes above the operating system kernel; the parent’s kill() request goes through the kernel, which calls handler_function() in the child]
Before a signal can be caught, the child must issue:
signal(SIG_XXXX, handler_function)
If no handler, OS supplies default behavior
The parent sends the signal with kill(child_pid, SIG_XXXX), e.g. kill(child_pid, SIGTSTP)
Oddly, kill is used not just for kill signals, but for all signals!
By the way, the shell has a kill command you can use to send signals to any of your processes (or other people’s processes if you have permission). Use “man 1 kill” for more info on the shell command, and “man 2 kill” & “man 2 signal” for the system calls.
Background and suspended processes
From most shells:
– emacs myfile.txt : shell (parent process) stays busy while Emacs runs
– emacs myfile.txt & : the & says run in background: let shell run while Emacs runs
If you start a program in the foreground and want to do something else:
– emacs myfile.txt : shell (parent process) stays busy while Emacs runs
– CTRL-Z: suspend Emacs and let shell run
– Choices after CTRL-Z: fg puts job back in foreground; bg resumes it in background
To find out about running background jobs:
– Run the jobs command
– Each job is named: %1, %2, etc.
– You can do things like kill -9 %1 or fg %2
Most of this is implemented with signals (e.g. CTRL-Z sends SIGTSTP, which by default stops the process), so you can do this from a program as well as from the shell
Summary from Prof. Couch: “typing ./a.out in the shell is an explicit wait. typing ./a.out & in the shell is a background execution.”
Using exec to launch a new program
Fork creates parallel copies of the same program
Exec replaces the code for a process with a brand new program and calls its “main” function
Common idiom: to run a new program
– fork() /* to create a child process */
– exec() /* have the child replace itself with the program to be run */
– [ optional: continue to do work in the parent while the child runs ]
– wait(): /* in the parent for the new program to complete */
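Ever wondered where your return values from exit() go?
– They are available to the parent via: pid = wait(&child_exit_status), where child_exit_status is an int; WEXITSTATUS(child_exit_status) extracts the value the child passed to exit()
– So, the parent can find out if the child returned success or an error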
Examples of fork/exec
Example: running a "cat" command in the foreground:
– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/wait1.c
Example: running a "cat" command in the foreground with explicit wait:
– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/wait2.c
Example: running a "cat" command in the background with implicit wait:
– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/wait3.c
Example: running a user-typed command in the foreground without arguments:
– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/shell.c
Example: running a user-typed command in the background without arguments:
– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/shell2.c
These examples are from Prof. Couch’s lecture
Some things to watch with exec
Read the man page to find out the arguments it takes – there are several flavors
As with fork, the new program retains:
– Open files, environment variables, current working directory, owner, etc., etc.
All data and variables from the caller are replaced – if you have buffered I/O, that will be lost
Consider the following:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("this won't get seen at all...");
    fprintf(stderr, "this will get seen, because stderr "
                    "flushes buffers on each write...");
    execl("/bin/cat", "cat", "/dev/null", (char *)NULL);
    return 1;   /* reached only if execl fails */
}
/* prints "this will get seen, because stderr flushes buffers on each write..." */
/* the execl erases the unwritten stdout buffer, but the stderr output is already done */
Without the fprintf line, the program prints nothing at all, because the execl erases the unwritten line buffer!
By default, stderr does not buffer its output…as soon as you print, it goes out. Try it!
Sharing the CPU
[Figure: multiple programs running at once (Angry Birds, Play Video, Browser) and the operating system in main memory, all sharing one CPU]
CPU is shared…can only do one thing at a time*
*Modern multi-core CPUs can schedule one process per core at a time
Process scheduling
If processes are ready to run, the OS picks one and runs it
– The chosen process is in the running state
– Processes have priority and high priority processes are run more often
– The others are marked as ready (or runnable), i.e. they’d like to run but need to wait their turn
Some processes are healthy but waiting for something
– Reasons: sleep(), waiting for I/O, wait(), select(), page fault
– These processes are in the blocked state and they aren’t scheduled until that changes
Multicore CPUs: exactly the same, but we can have one running process on each core!
(For now, assume we have a simple one core CPU)
Designing process schedulers is an art. The strategy that gives good interactive response on a shared server may not be what you need for a massive database system!
The five state process model
States: New, Ready, Running, Blocked, Exit
– New → Ready: Admit
– Ready → Running: Dispatch
– Running → Ready: Timeout
– Running → Blocked: Event wait
– Blocked → Ready: Event occurs
– Running → Exit: Release
The Ready state is the “run queue” (processes in line for the CPU).
See: Stallings 7th Edition Page 118
In some OSs, a blocked process can die without running any cleanup.
Sharing Memory
[Figure: multiple programs running at once (Angry Birds, Play Video, Browser) and the operating system, all in main memory alongside the CPU]
All programs share memory
Memory shortage
What if we need more memory than we have?
[Figure: main memory already holding the operating system, Angry Birds, Play Video and Browser, with a C compiler, another Browser and Emacs also needing space]
Swapping
The OS can “swap” some process memory to disk
[Figure: memory for some programs (e.g. Play Video, Browser) moved from main memory out to disk, making room for others such as Angry Birds, the C compiler and Emacs]
The seven state process model
States: New, Ready, Running, Blocked, Exit, plus Ready / Suspend and Blocked / Suspend
– New → Ready: Admit
– Ready → Running: Dispatch
– Running → Ready: Timeout
– Running → Blocked: Event wait
– Blocked → Ready: Event occurs
– Running → Exit: Release
– Blocked / Suspend → Ready / Suspend: Event occurs
The Ready state is the “run queue” (processes in line for the CPU).
See: Stallings 7th Edition Page 118
Processes that have been swapped to disk aren’t scheduled to run even if they’re otherwise ready
Summary of paging and swapping
30 years ago, systems swapped whole processes
Today, page-size chunks of memory are moved and mapped individually
– A process is often partially resident
– A process is blocked (not scheduled) while a page it needs is brought in from disk
– On our systems, pagesize = 4096 bytes (try command: “getconf PAGESIZE”)
Special hardware is needed to make this work
– “Real” memory pages can be mapped to arbitrary locations in one or more virtual memories
– The hardware faults (tells the kernel) if a needed page is referenced but not mapped
– Pages can be mapped “read-only”: fault if write is attempted
The system tends to run well if the “working sets” of pages that programs reference a lot fit together in memory
Historical reference: Denning, P.J. (1968). The working set model for program behavior. Communications of the ACM, 11(5), pp. 323-333
Each process has its own virtual memory
[Figure: the Angry Birds virtual memory, addresses 0 to 0xFFF…FFFF, holding its text (code), static initialized data, static uninitialized data, heap (malloc’d), stack (call stack), and argv/environ; the CPU, main memory and operating system are shown alongside, with Play Video and Browser also in memory]
Consider what happens when we fork…
…we’ll need two complete copies of the whole process memory!
Fork needs to copy the virtual memory
[Figure: two complete copies of the Angry Birds virtual memory (text, static data, heap, stack, argv/environ), one for the parent and one for the child]
Can we find a way to make the copy cheap? Yes!
Map all the pages in both VMs to the same actual pages in memory…
…and mark them all read only…
If either process changes data, the OS will get the write fault and copy just the updated page!
Shared executables
By the way: the same mapping tricks allow us to share a single copy of each executable (e.g. Emacs, gcc, firefox)
This works even if they are launched using unrelated exec calls
The system typically loads at most one copy of a given executable…any later exec calls just map it!
We can also share copies of libraries by using something called “shared libraries”…we’ll study those later today
Summary of copy on write
A classic optimization used in many systems
– Hardware must support read-only mapping at the page level
The time to clone a process or other data is small and often independent of the size
– Some work is necessary to set up the new maps: depends on the hardware
Read-only data is never copied (e.g. the program code!)
Writable data is still not copied until it’s actually updated
Sharing executable programs
A similar trick is used to share the code for programs
– We’ve seen how that works with fork(), but…
Even if lots of copies of a program are loaded independently using exec(), there’s typically only one copy in memory or swapped to disk
This is a huge savings. Think of how many times our Halligan “homework” server runs:
– bash, tcsh, gcc, make, emacs, vim
– There’s typically just one loaded copy of each, no matter how many users
– Launching a new copy is very quick
From source code to executable
two_plus_one.c:
#include <stdio.h>

int sum(int a, int b);   /* defined in arith.c */

int main(int argc, char *argv[])
{
    printf("The sum is %d\n", sum(1, 2));
    return 0;
}
gcc -c two_plus_one.c produces two_plus_one.o: relocatable object code for main()
arith.c:
int sum(int a, int b)
{
    return a + b;
}
gcc -c arith.c produces arith.o: relocatable object code for sum()
Relocatable .o files
• Contain machine code
• References within the file are resolved
• References to other files are not resolved
• Some address fields may need adjusting depending on final location in executable program
Linking .o files to create executable
gcc -o two_plus_one two_plus_one.o arith.o
[Figure: two_plus_one.o (relocatable object code for main()) and arith.o (relocatable object code for sum()) are linked into the executable program two_plus_one]
gcc actually runs a program named “ld” to create the executable.
The executable contains all the code, with references resolved. It is ready to be invoked using the exec() family of system calls.
The default name for an executable is a.out, so programmers sometimes informally refer to any executable as an “a.out”.
Ooops! Where does printf come from?
gcc -o two_plus_one two_plus_one.o arith.o
[Figure: two_plus_one.o (relocatable object code for main()) and arith.o (relocatable object code for sum()) are linked into the executable program two_plus_one]
Routines like printf live in libraries.
These are created with the “ar” command, which packages up several .o files together into a “.a” archive or library. You can list the .a along with your separate .o files and ld will pull from it any .o files it needs.
printf used to live in the system library named libc.a, which the compiler links automatically into the executable (so you don’t have to list it).
Why shared libraries?
Problem: if printf is linked from libc.a, every program that uses printf gets its own separate copy
Idea: what if we kept one copy in memory and used memory mapping to make it appear in every program that needs it?
Challenges:
– We can’t link it in when ld builds the rest of the executable; we can only note that we need it
– The same copy is likely to be mapped at different addresses in different programs
Solution: compiler, linker and OS work together to support shared libraries
– gcc -c -fPIC printf.c generates “position-independent code” (printf.o) that can load at any address
– gcc -shared -o libc.so printf.o xxx.o obj3.o creates a shared library
– gcc -o two_plus_one two_plus_one.o arith.o libc.so
We’ll use printf as an example even though it’s built into the system…
Compile the source with -c -fPIC to make a position-independent .o file.
Link printf.o and any other .o files with the -shared option to create a shared library (.so) file.
The linker recognizes .so files: instead of including their code, it leaves a little stub that tells the OS to find and map the shared copy of the .so file when exec loads the program. On Linux, exec starts the dynamic loader (ld.so), which maps the .so files before the program’s own code runs.
(Actually, libc.so is so widely used that it is linked automatically, so you don’t need to list it as you would your own .so libraries.)
Memory mapping allows sharing of .so libraries
[Figure: main memory holding the operating system and three programs (Angry Birds, Play Video, Browser). The address spaces of Angry Birds and Browser each contain a stack, text (code), static initialized data, static uninitialized data, a malloc’d heap, argv and environ, and libc.so.]
libc.so (with printf code) shows up at different locations in the two programs.
Memory mapping allows sharing of .so libraries
[Figure: the same memory layout, now showing a single libc.so in main memory mapped into both Angry Birds and Browser.]
Only one copy lives in memory… everyone shares it!
Memory mapping hardware can do this… Code must be position-independent!
Summary of today’s topics
Processes are cloned using fork
To run a new program: fork then exec
Kernel-to-program communication:
– A process calls the kernel using a system call (trap)
– The kernel calls a process using a signal
Processes form a tree, and dead processes must be “reaped”
The OS scheduler chooses high-priority processes to run
Process memory can be swapped or paged to disk when memory is tight
Memory mapping & copy-on-write are used:
– To make fork quick
– To save memory by sharing executables and shared libraries across processes