10 Virtual Memory
Contents
- Background
- Demand Paging
- Process Creation
- Page Replacement
- Allocation of Frames
- Thrashing
- Operating System Examples
- Other Considerations
10.1 Background
- all memory-management schemes outlined so far require that the entire process be kept in physical memory
  - contiguous allocation
  - paging
  - segmentation
- overlays are an exception, but they require special effort from the programmer: explicit memory allocation and free operations in the application program
Background
- real programs, in many cases, do not need the entire program kept in memory
  - code to handle unusual error conditions
  - arrays, lists, and tables are often sparse
  - even when the entire program is needed, it may not all be needed at the same time (as with overlays)
Background
- conclusion: if we can keep a program only partially in memory
  - we can run a program larger than physical memory
  - more programs can run at the same time
Virtual Memory
- virtual memory: separation of user logical memory from physical memory; an extremely large virtual memory can be provided to programmers even when only a smaller physical memory is available
  - only part of the program needs to be in memory for execution
  - logical address space can therefore be much larger than physical address space
  - allows address spaces to be shared by several processes
  - allows for more efficient process creation
virtual memory is larger than physical memory
common implementation
- virtual memory can be implemented via:
  - demand paging
  - demand segmentation
- several systems provide a paged segmentation scheme, i.e., the user view is segmentation, but the OS implements this view with demand paging
10.2 Demand Paging
- a demand-paging system is similar to a paging system with swapping
- lazy swapper: a swapper that never swaps a page into memory unless that page will be needed
- compare:
  - a swapper manipulates entire processes
  - a pager is concerned with the individual pages of a process
10.2.1 Basic Concepts
- use disk space to emulate memory
  - a special partition/volume, e.g. Unix, Linux
  - a special file, e.g. the Windows family
Transfer of a Paged Memory to Contiguous Disk Space
valid-invalid bit in page table
- hardware support: is a specific page in memory?
  - if it is in memory, the valid bit is set; otherwise it is reset (i.e. invalid)
Page Table When Some Pages Are Not in Main Memory
- 6 logical pages, with three in memory and three not
Page Fault
- if all references access pages already in memory, the process runs exactly as though we had brought in all pages
- the first reference to a page not in memory traps to the OS: a page fault
Page Fault Handling
- the OS looks at another table to decide:
  - invalid reference: abort
  - just not in memory:
    1. find a free frame
    2. swap the page into the frame
    3. reset tables, set the valid bit to 1
    4. restart the instruction
Steps in Handling a Page Fault
two types of demand paging
- pure demand paging: never bring a page into memory until it is required
- demand paging with anticipation: pre-load some pages in anticipation of use
10.2.2 Performance of Demand Paging
- page-fault rate: 0 <= p <= 1.0
  - if p = 0, no page faults
  - if p = 1, every reference is a fault
- Effective Access Time (EAT):
  EAT = (1 - p) * memory access
      + p * (page-fault interrupt service + [swap the page out] + swap the page in + restart overhead)
- (see p.326 for detail)
Performance Example
- memory access time = 100 nanoseconds
- page swap time = 25 milliseconds
- effective access time:
  EAT = (1 - p) * 100 + p * 25,000,000 = 100 + 24,999,900 * p (nanoseconds)
- performance depends heavily on the page-fault rate (probability)
  - e.g. for less than 10% performance loss, we need p < 0.000 000 4 (see p.327 for detail)
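The EAT arithmetic above can be checked directly; a small sketch using the slide's numbers (100 ns access, 25 ms fault service):

```python
# Effective access time for the demand-paging example above:
# memory access = 100 ns, page-fault service time = 25 ms.
MEMORY_ACCESS_NS = 100
FAULT_SERVICE_NS = 25_000_000  # 25 ms expressed in nanoseconds

def eat(p):
    """Effective access time in ns for page-fault rate p."""
    return (1 - p) * MEMORY_ACCESS_NS + p * FAULT_SERVICE_NS

# With no faults, EAT equals the plain memory access time.
print(eat(0.0))   # 100.0

# Keeping the slowdown under 10% (EAT < 110 ns) bounds p:
p_max = (110 - MEMORY_ACCESS_NS) / (FAULT_SERVICE_NS - MEMORY_ACCESS_NS)
print(p_max)      # about 4e-07, i.e. at most 1 fault per ~2,500,000 accesses
```

This is why demand paging only pays off when faults are extremely rare: a single disk access costs hundreds of thousands of memory accesses.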
10.3 Process Creation
- in the extreme case, a process may be started with no pages in memory: fast start
- virtual memory enables other benefits during process creation:
  - copy-on-write
  - memory-mapped files
10.3.1 Copy-on-Write
- copy-on-write allows both parent and child processes to initially share the same pages in memory
- only if either process modifies a shared page is the page copied
  - Unix, Linux: fork( ) followed by exec( )
- copy-on-write allows more efficient process creation, since only modified pages are copied
- currently used by Windows 2000, Linux, and Solaris 2
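A toy model of the copy-on-write idea (this is an illustration, not any real kernel's API; `Frame`, `cow_fork`, and `cow_write` are hypothetical names): after a fork, both page tables point at the same frames, and a frame is copied only on the first write while it is still shared.

```python
# Toy copy-on-write model: pages are shared after "fork" and copied lazily.
class Frame:
    def __init__(self, data):
        self.data = data
        self.refcount = 1          # how many page tables point at this frame

def cow_fork(parent_table):
    """Child gets a page table pointing at the parent's frames (no copying)."""
    child_table = {}
    for page, frame in parent_table.items():
        frame.refcount += 1
        child_table[page] = frame
    return child_table

def cow_write(table, page, value):
    """Copy the frame first if it is still shared, then perform the write."""
    frame = table[page]
    if frame.refcount > 1:         # shared: break the sharing for this page only
        frame.refcount -= 1
        frame = Frame(frame.data)
        table[page] = frame
    frame.data = value

parent = {0: Frame("code"), 1: Frame("data")}
child = cow_fork(parent)
cow_write(child, 1, "data'")       # only page 1 is copied
print(parent[1].data)              # "data"  -- the parent is unaffected
print(parent[0] is child[0])       # True   -- the unmodified page is still shared
```

This mirrors why fork()+exec() is cheap: pages the child never writes before exec() are never copied at all.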
Copy-on-Write
- free pages are allocated from a pool of zeroed-out pages, as for the stack or heap
- the vfork( ) system call exists in some versions of UNIX (e.g. Solaris 2)
  - study the manual page of vfork( )
10.3.2 Memory-Mapped Files
- memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory
- a file is initially read using demand paging: a page-sized portion of the file is read from the file system into a physical page; subsequent reads and writes of the file are treated as ordinary memory accesses
- simplifies file access by treating file I/O through memory rather than read( ) and write( ) system calls
- also allows several processes to map the same file, letting them share the pages in memory
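Memory mapping is directly visible from user code; a minimal sketch with Python's standard `mmap` module (the file name is arbitrary):

```python
# Memory-mapped file I/O: after mapping, file contents are read and written
# like an ordinary byte array, with no read()/write() calls.
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"hello, mapped world")

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:   # map the whole file
        print(m[0:5])                     # b'hello' -- a plain memory read
        m[0:5] = b"HELLO"                 # a memory write, no write() call

with open(path, "rb") as f:
    print(f.read())                       # the write went through to the file
```

The first access to each mapped page faults it in on demand, exactly as the slide describes.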
memory-mapped files
10.4 Page Replacement
- one process may be bigger than physical memory
- the total memory of all processes may be bigger than physical memory: over-allocation of memory
- solutions:
  - swapping
  - page replacement: handle over-allocation by modifying the page-fault service routine to include page replacement
Need for page replacement
10.4.1 Basic Scheme
1. find the location of the desired page on disk
2. find a free frame:
   - if there is a free frame, use it
   - if there is no free frame, use a page-replacement algorithm to select a victim frame, and write the victim page to disk
3. read the desired page into the (newly) free frame; update the page and frame tables
4. restart the process
Page Replacement
- if no frame is free, two page transfers (one out, one in) are required per fault
- use a modify (dirty) bit to reduce this overhead: only modified pages are written back to disk
  - the modify bit for a page is set by the hardware whenever any word or byte in the page is written
Page Replacement Algorithms
- two problems must be solved to implement demand paging:
  - frame-allocation algorithm: how many frames to allocate to each process
  - page-replacement algorithm: how to select the frames to be replaced
- expensive disk I/O makes good design of these algorithms important: we want the lowest page-fault rate
Reference String
- there are many algorithms; how do we evaluate them? by lowest page-fault rate
- we evaluate an algorithm by running it on a particular string of memory references
  - (page) reference string: page numbers only, with adjacent duplicates eliminated
reference string example
- original memory-access sequence:
  0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0102, 0103, 0104, 0101, 0609, 0102, 0105
- page reference sequence:
  1, 4, 1, 6, 1, 1, 1, 1, 6, 1, 1, 1, 1, 6, 1, 1, 1, 1, 6, 1, 1
- reference string (adjacent duplicates eliminated):
  1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1
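The derivation above can be sketched in a few lines: with a page size of 100 bytes, each address maps to page addr // 100, and adjacent duplicates are then dropped.

```python
# Turn a raw address trace into a page reference string.
def reference_string(addresses, page_size=100):
    pages = [addr // page_size for addr in addresses]
    reduced = []
    for p in pages:
        if not reduced or reduced[-1] != p:   # drop adjacent duplicates
            reduced.append(p)
    return reduced

trace = [100, 432, 101, 612, 102, 103, 104, 101, 611, 102,
         103, 104, 101, 610, 102, 103, 104, 101, 609, 102, 105]
print(reference_string(trace))   # [1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1]
```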
page faults versus number of frames
- for the reference string 1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1:
  - with three (or more) frames: 3 page faults
  - with only one frame: 11 page faults
10.4.2 FIFO Page Replacement
- First-In-First-Out algorithm:
  - use a FIFO queue to hold all pages in memory
  - when a page is brought into memory, insert it at the tail
  - when a free frame is needed, replace the page at the head of the queue
- for the textbook's example reference string with 3 frames: 15 page faults in total
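A short simulation of FIFO replacement, assuming the textbook's 20-reference example string and 3 frames (which produces the 15 faults quoted above); it also reproduces the Belady's anomaly example from the next slide:

```python
# FIFO page replacement: evict the page that entered memory first.
from collections import deque

def fifo_faults(refs, nframes):
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:          # no free frame: evict the oldest
                frames.discard(queue.popleft())
            frames.add(page)
            queue.append(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3))   # 15

# Belady's anomaly: adding a frame *increases* the fault count.
belady = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(belady, 3), fifo_faults(belady, 4))   # 9 10
```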
Example 2: Belady's Anomaly
- reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
- 3 frames (3 pages can be in memory at a time per process): 9 page faults
- 4 frames: 10 page faults
- FIFO replacement exhibits Belady's anomaly: we expect more frames to mean fewer page faults, but here adding a frame increases them
10.4.3 Optimal Page Replacement
- replace the page that will not be used for the longest period of time
- difficult to implement, because it requires future knowledge of the reference string
- for the example reference string with 3 frames: 9 page faults in total
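The optimal (OPT) policy can be simulated offline, since a recorded reference string gives us the "future knowledge" the online algorithm lacks; a sketch:

```python
# Optimal (OPT) replacement: evict the resident page whose next use lies
# farthest in the future. Unrealizable online, but a useful baseline.
def opt_faults(refs, nframes):
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            def next_use(p):
                try:
                    return refs.index(p, i + 1)     # index of p's next reference
                except ValueError:
                    return float("inf")             # never used again
            frames.discard(max(frames, key=next_use))
        frames.add(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(opt_faults(refs, 3))   # 9 -- compare FIFO's 15 on the same string
```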
Example 2
- 4-frame example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 gives 6 page faults
- the optimal algorithm is used as a yardstick for measuring how well other algorithms perform
10.4.4 Least Recently Used (LRU)
- LRU replacement associates with each page the time of that page's last use
- when a page must be replaced, LRU chooses the page that has not been used for the longest period of time
- for the example reference string with 3 frames: 12 page faults in total
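The same simulation for LRU, confirming the 12 faults above and the 8-fault result of the 4-frame example on the next slide:

```python
# LRU page replacement: evict the page unused for the longest time.
def lru_faults(refs, nframes):
    stack, faults = [], 0            # most recently used page at the end
    for page in refs:
        if page in stack:
            stack.remove(page)       # re-appended below as most recent
        else:
            faults += 1
            if len(stack) == nframes:
                stack.pop(0)         # least recently used sits at the front
        stack.append(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(lru_faults(refs, 3))                                   # 12
print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))   # 8
```

LRU lands between OPT (9 faults) and FIFO (15 faults) on this string, which is the usual ordering in practice.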
Example 2
- reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with 4 frames: 8 page faults
LRU implementation
- counter implementation:
  - every page-table entry has a time-of-use field
  - a logical clock is incremented on every memory reference
  - every time a page is referenced, copy the clock into its time-of-use field
  - when a page must be replaced, look for the page with the smallest time-of-use field
- difficulties:
  - requires a search of the page table
  - a write to memory (the time-of-use field in the page table) for each memory access
  - page-table maintenance on context switch
  - overflow of the clock
LRU implementation
- stack implementation:
  - keep a stack of page numbers in a doubly linked list
  - when a page is referenced, move it to the top (requires 6 pointers to be changed)
  - always replace the page at the bottom: no search for replacement
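The stack scheme above can be sketched with Python's `OrderedDict` standing in for the doubly linked list, so "move to top" and "replace the bottom" are both O(1) (the class name `LRUStack` is mine):

```python
# LRU via the stack implementation: an ordered dict keeps pages in
# recency order, last key = most recently used (top of the stack).
from collections import OrderedDict

class LRUStack:
    def __init__(self, nframes):
        self.nframes = nframes
        self.stack = OrderedDict()

    def reference(self, page):
        """Record one reference; return True if it caused a page fault."""
        if page in self.stack:
            self.stack.move_to_end(page)    # move to the top of the stack
            return False
        if len(self.stack) == self.nframes:
            self.stack.popitem(last=False)  # replace the page at the bottom
        self.stack[page] = True
        return True

s = LRUStack(3)
refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(sum(s.reference(p) for p in refs))   # 12, matching the LRU count
```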
stack implementation example
Hardware support needed!
- neither optimal replacement nor LRU replacement suffers from Belady's anomaly
- both implementations of LRU need special hardware support; done in software (e.g. via interrupt), performance would suffer by a factor of at least ten
  - the update of the clock fields or stack must be done for every memory reference
10.4.5 LRU Approximation
- reference bit:
  - with each page, associate a bit, initially 0
  - when the page is referenced, the bit is set to 1
  - replace a page whose bit is 0 (if one exists)
  - we do not know the order of use, however
- additional reference bits:
  - record the reference bits at regular intervals, say every 100 ms
  - keep a right-shifting history byte for each page; the current reference bit is shifted into the leftmost bit
  - choose the page with the lowest number
- (see Section 10.4.5.1 on page 341 for detail)
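The additional-reference-bits (aging) scheme in one function, a sketch: each timer interval shifts the history byte right and feeds the current reference bit into the high-order bit, so recently referenced pages end up with larger values.

```python
# One timer interval of the aging scheme for a single page.
def age_tick(history, referenced):
    """history is an 8-bit int; referenced is this interval's reference bit."""
    history >>= 1
    if referenced:
        history |= 0x80            # reference bit enters the leftmost position
    return history & 0xFF

h = 0
h = age_tick(h, True)    # referenced this interval  -> 0b10000000
h = age_tick(h, False)   # not referenced            -> 0b01000000
h = age_tick(h, True)    # referenced again          -> 0b10100000
print(bin(h))            # 0b10100000

# A page touched in the latest interval outranks one touched only earlier,
# so the replacement victim is the page with the smallest history value.
print(age_tick(0, True) > age_tick(age_tick(0, True), False))   # True
```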
LRU Approximation
- Second-Chance Algorithm:
  - basically a FIFO algorithm plus a reference bit
  - all frames form a circular queue
  - inspect the current frame:
    - if the reference bit is set, reset it and skip to the next frame
    - otherwise, replace it
  - a frame that is referenced frequently enough will never be replaced
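A sketch of the second-chance (clock) algorithm as a fault-counting simulation: the hand sweeps the circular queue, clearing set reference bits, and evicts the first frame whose bit is already 0.

```python
# Second-chance (clock) page replacement.
def second_chance_faults(refs, nframes):
    frames = []                 # circular queue of [page, reference_bit]
    hand, faults = 0, 0
    for page in refs:
        for entry in frames:
            if entry[0] == page:
                entry[1] = 1            # hit: hardware sets the reference bit
                break
        else:
            faults += 1
            if len(frames) < nframes:
                frames.append([page, 1])
            else:
                while frames[hand][1]:  # referenced pages get a second chance
                    frames[hand][1] = 0
                    hand = (hand + 1) % nframes
                frames[hand] = [page, 1]
                hand = (hand + 1) % nframes
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(second_chance_faults(refs, 3))   # 14 -- between LRU (12) and FIFO (15)
```

The inner while loop always terminates: in the worst case it clears every bit and returns to the starting frame, which then has bit 0.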
second-chance algorithm
LRU Approximation
- Enhanced Second-Chance Algorithm:
  - a FIFO circular queue with a reference bit and a modify (dirty) bit
- four classes:
  - (0, 0) neither recently used nor modified
  - (0, 1) not recently used, but modified (dirty)
  - (1, 0) recently used, but clean
  - (1, 1) recently used and modified
- examine the class to which each page belongs; replace the first page encountered in the lowest nonempty class
  - drawback: may have to scan the circular queue several times
  - used in the Macintosh
10.4.6 Counting-Based Algorithms
- keep a counter of the number of references to each page
- least-frequently-used (LFU) algorithm:
  - the page with the smallest count is replaced
  - counters are shifted right at regular intervals, giving an exponentially decaying average
- most-frequently-used (MFU) algorithm:
  - based on the argument that the page with the smallest count was probably just brought in and has yet to be used
10.5 Allocation of Frames
- each process needs a minimum number of frames
  - performance: fewer frames => more page faults
  - architectural limit: machine instructions' requirements
- example: the IBM 370 needs at least 6 frames to handle the MVC instruction:
  - the instruction is 6 bytes and might span 2 pages
  - 2 pages to handle the from operand
  - 2 pages to handle the to operand
- two major allocation schemes:
  - fixed allocation
  - priority allocation
10.5.2 Allocation Algorithms
- equal allocation: each process gets an equal share of the total frames
- proportional allocation: allocate available memory to each process according to its size
  - s_i = size of process p_i
  - S = sum of all s_i
  - m = total number of frames
  - allocation for p_i: a_i = (s_i / S) * m
- priority-based allocation
- in all schemes, the allocation to each process may vary with the multiprogramming level
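Proportional allocation in code, a sketch using the textbook's example (processes of 10 and 127 pages sharing 62 frames). Since a_i = (s_i / S) * m is rarely an integer, this version floors each share and hands leftover frames to the largest remainders; that tie-breaking rule is my choice, not the textbook's.

```python
# Proportional frame allocation: a_i = (s_i / S) * m, rounded to integers.
def proportional_allocation(sizes, m):
    S = sum(sizes)
    alloc = [s * m // S for s in sizes]       # floored share for each process
    # Distribute frames lost to rounding, largest fractional remainder first.
    by_remainder = sorted(range(len(sizes)),
                          key=lambda i: sizes[i] * m % S, reverse=True)
    for i in by_remainder[:m - sum(alloc)]:
        alloc[i] += 1
    return alloc

# Processes of size 10 and 127 pages, 62 free frames:
# floors are 4 and 57; the one leftover frame goes to the larger remainder.
print(proportional_allocation([10, 127], 62))   # [5, 57]
```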
10.5.3 Global vs. Local Allocation
- global replacement: a process selects a replacement frame from the set of all frames
  - one process can take a frame from another, e.g. a high-priority process can take a frame from a lower-priority process
  - problem: a process cannot control its own page-fault rate
  - results in greater system throughput
- local replacement: each process selects only from its own set of allocated frames
  - the number of frames allocated to a process does not change
10.6 Thrashing
- if a process does not have "enough" frames, its page-fault rate is very high; this leads to:
  - low CPU utilization
  - the operating system thinks it needs to increase the degree of multiprogramming
  - another process is added to the system, worsening the condition
- thrashing: a process is busy swapping pages in and out
10.6.1 Cause of Thrashing
- why does paging work? the locality model
  - a process migrates from one locality to another
  - localities may overlap
Locality in a Memory-Reference Pattern
- as a process executes, it moves from locality to locality
- a locality is a set of pages that are actively used together
- e.g. calling a subroutine defines a new locality
- why does thrashing occur? the size of the locality exceeds the total memory size
10.6.2 Working-Set Model
- working-set window Δ: a fixed number of page references
  - example: Δ = 10,000 references
- working-set size WSS_i (working set of process P_i) = total number of distinct pages referenced in the most recent Δ references (varies in time)
  - if Δ is too small, it will not encompass the entire locality
  - if Δ is too large, it will encompass several localities
  - if Δ = ∞, it will encompass the entire program
Working-Set Model
- D = sum of all WSS_i = total demand for frames
- if D > m, thrashing occurs
- policy: if D > m, suspend one of the processes (and swap it out completely)
- the working-set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible: it optimizes CPU utilization
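The working-set bookkeeping above fits in a few lines; a sketch with made-up reference strings (`trace_a`, `trace_b` are hypothetical):

```python
# WSS_i = number of distinct pages among the most recent delta references.
def wss(refs, delta):
    return len(set(refs[-delta:]))

trace_a = [1, 2, 5, 6, 7, 7, 7, 7, 5, 1]   # hypothetical reference strings
trace_b = [3, 4, 3, 4, 3, 4]
print(wss(trace_a, 10))   # 5 -> pages {1, 2, 5, 6, 7}
print(wss(trace_a, 4))    # 3 -> only {7, 5, 1} fall inside a smaller window

# D = sum of all WSS_i; if D exceeds the m available frames, suspend a process.
def total_demand(traces, delta):
    return sum(wss(t, delta) for t in traces)

m = 6
d = total_demand([trace_a, trace_b], 10)   # 5 + 2 = 7
print(d > m)   # True -> thrashing risk: swap one process out completely
```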
Keeping Track of the Working Set
- approximate with an interval timer plus a reference bit
- example: Δ = 10,000 references
  - timer interrupts about every 5,000 references
  - keep 2 history bits in memory for each page
  - whenever the timer interrupts, copy each reference bit into the history bits, then reset all reference bits to 0
  - if one of the bits in memory is 1, the page is in the working set
- why is this not completely accurate? (we cannot tell where within a 5,000-reference interval a reference occurred)
- improvement: 10 history bits, and an interrupt every 1,000 references
10.6.3 Page-Fault Frequency Scheme
- establish an "acceptable" page-fault rate
  - if the actual rate is too low, the process loses frames
  - if the actual rate is too high, the process gains frames
10.7 Case Study
- Windows NT
- Solaris 2
- Linux
10.7.1 Windows NT
- uses demand paging with clustering; clustering brings in the pages surrounding the faulting page
- processes are assigned a working-set minimum and a working-set maximum
  - the working-set minimum is the minimum number of pages the process is guaranteed to have in memory
  - a process may be assigned as many pages as its working-set maximum allows
- when the amount of free memory in the system falls below a threshold, automatic working-set trimming is performed to restore free memory
  - working-set trimming removes pages from processes that have more pages than their working-set minimum
Windows NT
- local page-replacement policy
  - on a single x86 processor: a variation of the clock algorithm
  - on multiprocessors and the Alpha: a variation of FIFO
- how are the working-set minimum and maximum determined? a mystery not stated by the textbook
10.7.2 Solaris 2
- maintains a list of free pages to assign to faulting processes
- lotsfree: the threshold parameter at which to begin paging
- paging is performed by the pageout process
  - pageout scans pages using a modified clock algorithm
  - scanrate is the rate at which pages are scanned; it ranges from slowscan to fastscan
  - pageout is called more frequently as the amount of free memory shrinks
Solaris Page Scanner
Supplementary: Linux Memory Management (20.6)
- Linux's physical memory-management system deals with allocating and freeing pages, groups of pages, and small blocks of memory
- it has additional mechanisms for handling virtual memory: memory mapped into the address space of running processes
Splitting of Memory in a Buddy Heap
Managing Physical Memory
- the page allocator allocates and frees all physical pages; it can allocate ranges of physically contiguous pages on request
- the allocator uses a buddy-heap algorithm to keep track of available physical pages
  - each allocatable memory region is paired with an adjacent partner
  - whenever two allocated partner regions are both freed, they are combined to form a larger region
  - if a small memory request cannot be satisfied from an existing small free region, a larger free region is subdivided into two partners to satisfy it
- memory allocations in the Linux kernel occur either statically (drivers reserve a contiguous area of memory at boot time) or dynamically (via the page allocator)
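A minimal buddy-heap sketch (far simpler than Linux's real allocator, and the class name is mine): free blocks are kept per order, a block of order k covers 2^k pages, and a freed block merges with its buddy whenever the buddy is also free.

```python
# Toy buddy allocator over a region of 2**max_order pages.
class BuddyAllocator:
    def __init__(self, max_order):
        self.max_order = max_order
        self.free = {k: set() for k in range(max_order + 1)}
        self.free[max_order].add(0)            # one big block at offset 0

    def alloc(self, order):
        """Return the page offset of a free block of 2**order pages."""
        for k in range(order, self.max_order + 1):
            if self.free[k]:
                offset = self.free[k].pop()
                while k > order:               # split into two partner halves
                    k -= 1
                    self.free[k].add(offset + (1 << k))
                return offset
        raise MemoryError("no free block large enough")

    def free_block(self, offset, order):
        """Free a block, coalescing with its buddy whenever the buddy is free."""
        while order < self.max_order:
            buddy = offset ^ (1 << order)      # partner differs in one bit
            if buddy not in self.free[order]:
                break
            self.free[order].discard(buddy)
            offset = min(offset, buddy)
            order += 1
        self.free[order].add(offset)

b = BuddyAllocator(4)              # 16 pages total
a1 = b.alloc(0)                    # 1 page
a2 = b.alloc(1)                    # 2 pages
b.free_block(a1, 0)
b.free_block(a2, 1)                # everything coalesces back into one block
print(sorted(b.free[4]))           # [0]
```

The XOR trick (`offset ^ (1 << order)`) is the standard way to find a block's buddy, since partners of order k differ only in bit k of their offset.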
Virtual Memory
- the VM system maintains the address space visible to each process: it creates pages of virtual memory on demand, and manages loading those pages from disk and swapping them back out as required
- the VM manager maintains two separate views of a process's address space:
  - a logical view describing the layout of the address space: a set of non-overlapping regions, each representing a contiguous, page-aligned subset of the address space
  - a physical view of each address space, stored in the hardware page tables for the process
Virtual Memory (Cont.)
- on executing a new program, the process is given a new, completely empty virtual address space; the program-loading routines populate it with virtual-memory regions
- creating a new process with fork involves creating a complete copy of the existing process's virtual address space
  - the kernel copies the parent process's VMA descriptors, then creates a new set of page tables for the child
  - the parent's page tables are copied directly into the child's, with the reference count of each covered page incremented
  - after the fork, parent and child share the same physical pages of memory in their address spaces
Virtual Memory (Cont.)
- the VM paging system relocates pages of memory from physical memory out to disk when that memory is needed for something else
- the VM paging system can be divided into two sections:
  - the pageout-policy algorithm decides which pages to write out to disk, and when
  - the paging mechanism carries out the transfer, and pages data back into physical memory as needed
10.9 Summary
- virtual memory makes it possible to execute a process whose logical address space is larger than the available physical address space
- virtual memory increases the multiprogramming level, and thus CPU utilization and throughput
Summary
- pure demand paging
  - backing store
  - page fault
  - page table
  - the OS's internal frame table
- a low page-fault rate => acceptable performance
Summary
- page-replacement algorithms
  - FIFO (suffers from Belady's anomaly)
  - optimal
  - Least Recently Used (LRU)
    - reference bit
    - additional reference bits
    - second-chance algorithm (clock algorithm)
    - enhanced second-chance algorithm
  - counting-based: Least Frequently Used (LFU)
Summary
- frame-allocation policy
  - fixed (i.e. equal share)
  - proportional (to program size)
  - priority-based
- static, or local, page replacement, supported by the working-set model
- dynamic, or global, page replacement
Summary
- working-set model
  - locality
  - the working set is the set of pages in the current locality
- thrashing
  - if a process does not have enough memory for its working set, it will thrash
Homework
- paper: 2, 4, 5, 8, 11, 17, 20
- oral: 1, 6, 7, 9, 16, 18
- lab: 21
- supplementary material: Intel 80x86 protected-mode operation