Memory Management – UTRGV faculty · faculty.utrgv.edu/david.egle/csci4334/ch11… · PPT file
TRANSCRIPT
MEMORY MANAGEMENT
2
Basic OS Organization
[Diagram: the Operating System (Process, Thread & Resource Manager; Memory Manager; Device Manager; File Manager) sits above the computer hardware (processors, main memory, devices)]
3
The Basic Memory Hierarchy
[Diagram: CPU registers at the top, primary (executable) memory, e.g. RAM, below, and secondary memory, e.g. disk or tape, at the bottom; more frequently used information is kept higher in the hierarchy, less frequently used information lower. von Neumann architecture]
4
Memory System
Primary memory
Holds programs and data while they are being used by the CPU
Referenced by byte; fast access; volatile
Secondary memory
Collection of storage devices
Referenced by block; slow access; nonvolatile
5
Primary & Secondary Memory
[Diagram: the CPU can load/store directly from primary (executable) memory, e.g. RAM, where the control unit executes code; this is transient storage. Secondary memory, e.g. disk or tape, is accessed using I/O operations and is persistent storage]
Information can be loaded statically or dynamically
6
Classical Memory Manager Tasks
Memory management technology has evolved:
Early multiprogramming systems: a resource manager for space-multiplexed primary memory
As the popularity of multiprogramming grew: provide robust isolation mechanisms
Still later: provide mechanisms for shared memory
7
Contemporary Memory Manager
Performs the classic functions required to manage primary memory
Attempts to use primary memory efficiently:
Keep programs/data in primary memory only while they are being used by the CPU
Store/restore data to secondary memory soon after it has been used or created
Exploits storage hierarchies
Virtual memory manager
8
Requirements on Memory Designs
The primary memory access time must be as small as possible
The perceived primary memory must be as large as possible
The memory system must be cost effective
9
Functions of Memory Manager
Allocate primary memory space to processes
Map the process address space into the allocated portion of primary memory
Minimize access times using a cost-effective amount of primary memory
May use static or dynamic techniques
10
Memory Manager
Only a small number of interface functions is provided, usually calls to:
Request/release primary memory space
Load programs
Share blocks of memory
Provides the following:
Memory abstraction
Allocation/deallocation of memory
Memory isolation
Memory sharing
11
Memory Abstraction
Process address space
Allows a process to use an abstract set of addresses to reference physical primary memory
[Diagram: the process address space is mapped to hardware primary memory; some addresses may be mapped to objects other than memory]
12
Address Space
A program must be brought into memory and placed within a process for it to be executed
A program is a file on disk
The CPU reads instructions from main memory and reads/writes data to main memory
Address binding of instructions and data to memory addresses is determined by the computer architecture
13
Creating an Executable Program
[Diagram: source code is compiled (C) into relocatable object code; link editing combines it with library code and other objects into the process address space on secondary memory; the loader copies it into primary memory]
Compile time: translate elements
Link time: combine elements
Load time: allocate primary memory; adjust addresses in the address space (relocation); copy the address space from secondary to primary memory
14
Bindings
Compiler
Binds static variables to storage locations relative to the start of the data segment
Binds automatic variables to storage locations relative to the bottom of the stack
Linker
Combines data segments and adjusts bindings accordingly
Same for the stack
15
Bindings – cont.
Loader
Binds logical addresses used by the program to physical memory locations (address binding)
This type of binding is called static address binding
The last stage of address binding can be deferred to run time: dynamic address binding
16
Dynamic Memory
Static and automatic variables are assigned addresses in the data or stack segments at compile time
Dynamic memory allocation (e.g., new or malloc) is done at run time
This is not handled by the memory manager; it merely binds parts of the process’s address space to dynamic data structures
The memory manager gets involved if the process runs out of address space
17
Variations in program linking/loading
18
Normal linking and loading
19
Load-time dynamic linking
20
Run-time dynamic linking
21
Data Storage Allocation
Static variables: stored in the program’s data segment
Automatic variables: stored on the stack
Dynamically allocated space (new or malloc): taken from heap storage – no system call
Note: if the heap is exhausted, the kernel memory manager is invoked to get more memory for the process
22
C Style Memory Layout
From low address to high address:
Text segment
Initialized part of the data segment
Uninitialized part of the data segment
Heap storage
Stack segment
Environment variables, …
23
Program and Process Address Spaces
[Diagram: the absolute program address space is mapped into the process address space and then to hardware primary memory; in the 4 GB example shown, addresses 0 to 3 GB form the user process address space and 3 to 4 GB the supervisor process address space]
24
Overview of Memory Management Techniques
Memory allocation strategies
View the process address space and the primary memory as contiguous address spaces
Paging- and segmentation-based techniques
View the process address space and the primary memory as a set of pages/segments
Map an address in process space to a memory address
Virtual memory
Extension of paging/segmentation-based techniques
To run a program, only the currently used pages/segments need to be in primary memory
25
Memory Allocation Strategies
There are two different levels of memory allocation
26
Two levels of memory management
27
Memory Management System Calls
In Unix, the system call is brk
Increases the amount of memory allocated to a process
28
Malloc and New Functions
These are user-level memory allocation functions, not system calls
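To illustrate the distinction, here is a hedged sketch (not how any real allocator is implemented): a user-level allocator grabs a large region via a brk-like call once, then satisfies many malloc-style requests from that region without further system calls. `fake_brk` and the chunk size are invented for the example.

```python
# Sketch of the malloc-vs-brk distinction (illustrative only).
# The "system call" is modeled as a function that extends the process break.

HEAP_SIZE = 0

def fake_brk(increment):
    """Stand-in for the Unix brk/sbrk system call: grow the heap."""
    global HEAP_SIZE
    old_break = HEAP_SIZE
    HEAP_SIZE += increment
    return old_break

class UserLevelAllocator:
    """malloc-style allocator: calls fake_brk only when its pool runs out."""
    CHUNK = 4096  # grab memory from the "kernel" in big chunks

    def __init__(self):
        self.pool_start = 0
        self.pool_free = 0
        self.syscalls = 0

    def malloc(self, n):
        if n > self.pool_free:          # pool exhausted: one "system call"
            self.pool_start = fake_brk(self.CHUNK)
            self.pool_free = self.CHUNK
            self.syscalls += 1
        addr = self.pool_start + (self.CHUNK - self.pool_free)
        self.pool_free -= n             # carve the request out of the pool
        return addr

alloc = UserLevelAllocator()
for _ in range(100):
    alloc.malloc(32)                    # 100 small requests...
print(alloc.syscalls)                   # ...but only 1 call to fake_brk
```

One hundred 32-byte requests fit in a single 4096-byte chunk, so the "kernel" is entered only once.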
29
Memory Management
30
Issues in a memory allocation algorithm
Memory layout/organization: how to divide the memory into blocks for allocation?
Fixed partition method: divide the memory once, before any bytes are allocated
Variable partition method: divide it up as you are allocating the memory
Memory allocation: select which piece of memory to allocate to a request
Memory organization and memory allocation are closely related
It is a very general problem; variations of it occur in many places, for example, disk space management
31
Static Memory Allocation
[Diagram: primary memory holds the Operating System and processes 0–3, with some regions in use and some unused]
Issue: need a mechanism/policy for loading pi’s address space into primary memory
32
Fixed-Partition Memory Allocation
Statically divide the primary memory into fixed-size regions
Regions can have different sizes or the same size
A process/request can be allocated to any region that is large enough
33
Fixed-Partition Memory Allocation – cont.
Advantages:
Easy to implement
Good when the sizes of memory requests are known
Disadvantage:
Cannot handle variable-size requests effectively
Might need to use a large block to satisfy a request for a small size
Internal fragmentation: the difference between the request and the allocated region size; space allocated to a process but not used
It can be significant if the requests vary considerably in size
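A small worked example of internal fragmentation (the region sizes and request size below are invented for illustration):

```python
# Internal fragmentation in a fixed-partition scheme: a request is placed
# in a region that is large enough, and the leftover space inside that
# region is wasted.

regions = [100, 200, 400, 800]   # fixed partition sizes (illustrative)

def internal_fragmentation(request, region_size):
    """Space allocated to the process but not used by it."""
    assert request <= region_size
    return region_size - request

# A 130-unit request cannot use the 100-unit region; the smallest region
# that fits is 200 units, wasting 70 units inside the partition.
fit = min(r for r in regions if r >= 130)
print(fit, internal_fragmentation(130, fit))   # 200 70
```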
34
Fixed-Partition Memory Mechanism
[Diagram: memory holds the Operating System plus regions 0–3 with sizes N0–N3; process pi needs ni units and must be placed in a region whose size is at least ni]
35
Which Free Block to Allocate
How to satisfy a request of size n from a list of free blocks:
First-fit: allocate the first hole that is big enough
Next-fit: resume the search where the previous one ended and choose the next block that is large enough
Best-fit: allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size; produces the smallest leftover hole
Worst-fit: allocate the largest hole; must also search the entire list; produces the largest leftover hole
36
Fixed-Partition Memory -- Best-Fit
[Diagram: pi is placed in the smallest region that holds it, leaving internal fragmentation]
The loader must adjust every address in the absolute module when it is placed in memory
37
Fixed-Partition Memory -- Worst-Fit
[Diagram: pi is placed in the largest region]
38
Fixed-Partition Memory -- First-Fit
[Diagram: pi is placed in the first region large enough to hold it]
39
Fixed-Partition Memory -- Next-Fit
[Diagram: pi is placed in the next suitable region after the previous allocation; pi+1 continues the search from there]
40
Variable Partition Memory Allocation
Grant only the size requested
Example, with 512 bytes total:
allocate(r1, 100), allocate(r2, 200), allocate(r3, 200), free(r2), allocate(r4, 10), free(r1), allocate(r5, 200)
External fragmentation: memory is divided up into small blocks, none of which can satisfy any request
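The request sequence above can be traced with a small simulator; first-fit placement and no hole coalescing are assumptions made for the sketch (the slide does not say which policy is used):

```python
# Trace the slide's request sequence in a 512-byte memory with a simple
# first-fit free list, to watch external fragmentation appear.

MEM = 512
free = [(0, MEM)]          # list of (start, size) holes
allocated = {}

def allocate(name, size):
    for i, (start, hole) in enumerate(free):
        if hole >= size:                       # first fit
            allocated[name] = (start, size)
            if hole == size:
                del free[i]
            else:
                free[i] = (start + size, hole - size)
            return True
    return False                               # no single hole is big enough

def release(name):
    start, size = allocated.pop(name)
    free.append((start, size))                 # (no coalescing, kept simple)
    free.sort()

allocate("r1", 100); allocate("r2", 200); allocate("r3", 200)
release("r2")
allocate("r4", 10)
release("r1")
ok = allocate("r5", 200)
total_free = sum(size for _, size in free)
print(ok, total_free)   # False 302: enough total space, but no hole fits
```

302 bytes are free in total, yet the 200-byte request fails because the free space is split into holes of 100, 190, and 12 bytes: external fragmentation.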
41
Issues in Variable Partition Memory Allocation
Where are the free memory blocks? Keeping track of the memory blocks: list method and bitmap method
Which memory blocks to allocate? There may be multiple free memory blocks that can satisfy a request; which block to use? Fragmentation must be minimized
How to keep track of free and allocated memory blocks?
42
Variable Partition Memory Mechanism
[Diagram: (a) the Operating System plus processes 0–4 fill memory; (b) processes finish and others arrive (processes 0, 6, 2, 5, 4), leaving holes; (c) external fragmentation appears; (d) compaction moves programs in memory to coalesce the holes]
The loader adjusts every address in every absolute module when it is placed in memory
43
Cost of Moving Programs
Consider the instruction load R1, 0x02010
With the program loaded at 0x01000, it assembles to 3F013010; loaded at 0x04000, it must become 3F016010
Must run the loader over the program again!
Compaction requires that a program be moved
Consider dynamic techniques
44
Dynamic Memory Allocation
Could use dynamically allocated memory
A process may want to change the size of its address space:
Smaller: creates an external fragment
Larger: may have to move/relocate the program
Allocate “holes” in memory according to best-/worst-/first-/next-fit
45
Contemporary Memory Allocation
Use some form of variable partitioning
Usually allocate memory in fixed-size blocks (pages)
Simplifies management of the free list
Greatly complicates the binding problem
46
Dynamic Address Space Binding
Recall: in static binding
Symbols are first bound to relative addresses in a relocatable module at compile time
Then to addresses in an absolute module at link time
Then to primary memory addresses at load time
Dynamic binding
Wait to bind absolute program addresses until run time
The simplest mechanism is dynamic relocation
Usually implemented by the processor
47
Dynamic Address Relocation
[Diagram: for load R1, 0x02010 the CPU issues relative address 0x02010; the relocation register (0x10000) is added to it, placing 0x12010 in the MAR]
Program loaded at 0x10000: relocation register = 0x10000
Program loaded at 0x04000: relocation register = 0x04000
We never have to change the load module addresses!
Performed automagically by the processor
48
Dynamic Address Relocation – cont.
The same holds for multiple segment registers
[Diagram: the CPU-generated relative address is added to the code, stack, or data register to form the physical address in the MAR, which accesses primary memory]
49
Runtime Bound Checking
[Diagram: the relative address is compared with the limit register; if it is within bounds it is added to the relocation register to form the address in the MAR, otherwise an interrupt is raised]
Bound checking is inexpensive to add
Provides excellent memory protection
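The relocation-plus-limit scheme can be sketched in a few lines. The relative address and the relocation-register values are taken from the slides; the limit value 0x8000 is an assumption added for the example:

```python
# Dynamic address relocation with a limit register: every relative
# address is bound-checked, then offset by the relocation register.

class MemoryFault(Exception):
    """Stands in for the hardware interrupt on a bounds violation."""
    pass

def translate(relative_addr, relocation_reg, limit_reg):
    """Return the physical address, or raise on an out-of-bounds access."""
    if relative_addr >= limit_reg:        # runtime bound check
        raise MemoryFault(hex(relative_addr))
    return relocation_reg + relative_addr

# Program loaded at 0x10000, relocation register = 0x10000:
print(hex(translate(0x02010, relocation_reg=0x10000, limit_reg=0x8000)))
# -> 0x12010

# The same relative address works unchanged if the program is instead
# loaded at 0x04000 -- the load module is never rewritten:
print(hex(translate(0x02010, relocation_reg=0x04000, limit_reg=0x8000)))
# -> 0x6010
```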
50
Memory Mgmt Strategies
Fixed-partition: used only in batch systems
Variable-partition: used everywhere (except in virtual memory)
Swapping systems
Popularized in timesharing
Rely on dynamic address relocation
Dynamic loading (virtual memory)
Exploits the memory hierarchy
Paging: mainstream in contemporary systems
Shared-memory multiprocessors
51
Swapping
A special case of dynamic memory allocation
Suppose there is high demand for executable memory
An equitable policy might be to time-multiplex processes into the memory (as well as space-multiplex)
This means a process can have its address space unloaded while it still needs memory
Usually this happens only when the process is blocked
52
Swapping – cont.
Objective: optimize system performance by removing a process from memory when it is blocked, allowing that memory to be used by other processes
Blocking may be caused by a request for a resource, or by the memory manager
Swapping only becomes necessary when processes are being denied access to memory
53
Swapping – cont.
[Diagram: the image for pi is swapped out to secondary memory while the image for pj is swapped in]
54
Cost of Swapping
Need to consider the time to copy the execution image from primary to secondary memory, and back; this is the major part of the swap time
In addition, there is the time required by the memory manager, plus the usual context-switching time
55
Swapping Systems
Standard swapping is used in few systems; it requires too much swapping time and provides too little execution time
Most systems do use some modified version of swapping
In UNIX, swapping is normally disabled, but it is enabled if memory usage reaches some threshold; when usage drops below the threshold, swapping is disabled again
56
Virtual Memory
Allows a process to execute when only part of its address space is loaded in primary memory; the rest is in secondary memory
Need to be able to partition the address space into parts that can be loaded into primary memory when needed
57
Virtual Memory – cont.
A characteristic of programs that is very important to the strategy used by virtual memory systems is spatial reference locality
Refers to the implicit partitioning of code and data segments due to the functioning of the program (a portion for initializing data, another for reading input, others for computation, etc.)
Can be used to select which parts of the process should be loaded into primary memory
58
Virtual Memory Barriers
Must be able to treat the address space in parts that correspond to the various localities that will exist during the program’s execution
Must be able to load a part anywhere in physical memory and dynamically bind the addresses appropriately
More on this in the next chapter
59
Shared-memory Multiprocessors
Several processors share an interconnection network to access a set of shared memory modules
Any CPU can read/write any memory unit
[Diagram: multiple CPUs and multiple memory modules connected by an interconnection network]
60
Shared-memory Multiprocessors – cont.
The goal is to use processes or threads to implement units of computation on different processors while sharing information via common primary memory locations
One technique would be to have the address spaces of two processes overlap
Another would split the address space of a process into a private part and a public part
61
Sharing a Portion of the Address Space
[Diagram: the address spaces of process 1 and process 2 both map a shared region of primary memory]
62
Figure 11‑26: Multiple Segments
[Diagram: CPUs executing process 1 and process 2 each use relocation/limit register pairs; primary memory holds a region private to process 1, a shared region, and a region private to process 2]
63
Shared-memory Multiprocessors – cont.
A major problem is synchronization: how can one process detect when the other process has written or read information?
Interprocess communication is needed to handle the synchronization
Another problem is overloading the interconnection network
Cache memories can be used to decrease the load on the network
64
VIRTUAL MEMORY
65
Virtual Memory Manager
Provides an abstraction of physical memory
Creates a virtual address space in secondary memory and then “automatically” determines which part of the address space to load into primary memory at any given time
Allows application programmers to think that they have a very large address space in which to write programs
66
Virtual Memory Organization
[Diagram: the memory image for pi resides in secondary memory; fragments of it are loaded into primary memory]
67
Locality
Programs do not access their address space uniformly; they access the same locations over and over
Spatial locality: processes tend to access locations near those they just accessed
Because of sequential program execution
Because data for a function is grouped together
Temporal locality: processes tend to access the same data over and over again
Because of program loops
Because data is processed repeatedly
68
Spatial Reference Locality
The address space is logically partitioned
Text, data, stack; initialization, main, error handling
Different parts have different reference patterns
[Diagram: the address space for pi, showing initialization code (used once), three main code regions, three error-handling regions, and data & stack; execution time is divided roughly 30%, 20%, 35%, and 15% across the heavily used regions, with under 1% in the error handlers]
69
Virtual Memory
Every process has code and data locality
Dynamically load/unload currently used address space fragments as the process executes
Uses dynamic address relocation/binding
A generalization of base-limit registers
The physical address corresponding to a compile-time address is not bound until run time
70
Virtual Memory – cont.
Since the binding changes with time, use a dynamic virtual address map, Yt
[Diagram: the virtual address space is mapped through Yt]
71
Virtual Memory – cont.
[Diagram: virtual address spaces for pi, pj, and pk reside in secondary memory; the physical address space (addresses 0 to n-1) is in primary memory]
The complete virtual address space is stored in secondary memory
Fragments of the virtual address space are dynamically loaded into primary memory at any given time
Each address space is fragmented
72
Address Translation
Virtual memory systems distinguish among symbolic name, virtual address, and physical address spaces
Need to map symbolic names to virtual addresses, and then to physical addresses
The compiler/assembler and link editor handle the mapping from symbolic names in the name space to virtual addresses
When the program is executed, the virtual addresses are mapped to physical addresses
73
Names, Virtual Addresses & Physical Addresses
[Diagram: the source program (name space) becomes an absolute module (pi’s virtual address space), which becomes an executable image (physical address space)]
Yt: virtual address space → physical address space
74
Address Formation
The translation system creates an address space, but its addresses are virtual instead of physical
A virtual address, x:
Is mapped to physical address y = Yt(x) if x is loaded at physical address y
Is mapped to Ω (the null address) if x is not loaded
The map, Yt, changes as the process executes; it is “time varying”
Yt: virtual address → physical address ∪ {Ω}
75
Translation Process
If Yt(k) = Ω at time t and the process references location k, then:
The virtual memory manager stops the process
The referenced location is loaded at some location (say m)
The manager changes the map so that Yt(k) = m
The manager lets the process continue execution
Note that the referenced element was found to be missing after an instruction started executing; the CPU needs to be able to “back out” of the instruction and re-execute it once the translation mapping is updated
76
Size of Blocks of Memory
The virtual memory system transfers “blocks” of the address space to/from primary memory
Fixed-size blocks: system-defined pages are moved back and forth between primary and secondary memory
Variable-size blocks: programmer-defined segments, corresponding to logical fragments, are the unit of movement
Paging is the commercially dominant form of virtual memory today
77
Paging
A page is a fixed-size block of 2^h virtual addresses
A page frame is a fixed-size block of 2^h physical memory (the same size as a page)
When a virtual address, x, in page i is referenced by the CPU:
If page i is loaded at page frame j, the virtual address is relocated to page frame j
If the page is not loaded, the OS interrupts the process and loads the page into a page frame
78
Practicality of Paging
Paging only works because of locality: at any one point in time programs don’t need most of their pages
Page fault rates must be very, very low for paging to be practical, like one page fault per 100,000 or more memory references
79
Addresses
Suppose there are G = 2^g · 2^h = 2^(g+h) virtual addresses and H = 2^(j+h) physical addresses assigned to a process
Each page/page frame holds 2^h addresses
There are 2^g pages in the virtual address space
2^j page frames are allocated to the process
Rather than mapping individual addresses, Yt maps the 2^g pages to the 2^j page frames
That is, page_frame_j = Yt(page_i)
Address k in page_i corresponds to address k in page_frame_j
80
Page-Based Address Translation
Let N = {d0, d1, …, dn-1} be the pages
Let M = {b0, b1, …, bm-1} be the page frames
A virtual address, i, satisfies 0 ≤ i < G = 2^(g+h)
A physical address is k = U·2^h + V (0 ≤ V < 2^h), where U is the page frame number and V is the line number within the page
Yt: [0 : G-1] → {<U, V>} ∪ {Ω}
Since every page has size c = 2^h:
page number U = floor(i / c)
line number V = i mod c
81
Address Translation (cont.)
[Diagram: the CPU issues a virtual address split into a page number (g bits) and a line number (h bits); Yt, implemented as a “page table”, maps the page number to a frame number (j bits), which is combined with the line number to form the physical address in the MAR; a missing page raises a fault]
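The split-lookup-recombine step can be sketched directly from the formulas above. The page size (h = 10, i.e. 1 KB pages) and the page-table contents are assumptions made up for the example:

```python
# Page-based address translation: split a virtual address into a page
# number and a line (offset), look the page up in a page table, and
# recombine with the frame number.

H = 10                 # assume h = 10, so pages hold 2**10 addresses
PAGE_SIZE = 2 ** H

# A toy page table Yt: page number -> page frame number (None = missing).
page_table = {0: 5, 1: 2, 2: None, 3: 7}

def translate(virtual_addr):
    page = virtual_addr // PAGE_SIZE      # U = floor(i / c)
    line = virtual_addr % PAGE_SIZE       # V = i mod c
    frame = page_table.get(page)
    if frame is None:
        raise LookupError("page fault on page %d" % page)
    return frame * PAGE_SIZE + line

# Address 1027 is line 3 of page 1; page 1 is loaded in frame 2:
print(translate(1027))   # 2*1024 + 3 = 2051
```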
82
Paging Algorithms
Two basic types of paging algorithms:
Static allocation
Dynamic allocation
Three basic policies define any paging algorithm:
Fetch policy: when a page should be loaded
Replacement policy: which page is unloaded
Placement policy: where a page should be loaded
83
Fetch Policy
Determines when a page should be brought into primary memory
We usually don’t have prior knowledge about which pages will be needed
The majority of paging mechanisms use a demand fetch policy
A page is loaded only when the process references it
84
Demand Paging Algorithm
1. Page fault occurs
2. Process with the missing page is interrupted
3. Memory manager locates the missing page
4. A page frame is unloaded (replacement policy)
5. The page is loaded into the vacated page frame
6. The page table is updated
7. The process is restarted
85
Page References
Processes continually reference memory and so generate a stream of page references
The page reference stream tells us everything about how a process uses memory
For a given page size, we only need to consider the page number
If we have a reference to a page, then immediately following references to the same page will never generate a page fault
Suppose the page size is 100 bytes; what is the page reference stream for these addresses?
0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0103, 0104, 0101, 0609, 0102, 0105
We use page reference streams to evaluate paging algorithms
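The question above can be answered mechanically: divide each address by the page size to get a page number, then drop immediate repeats (a back-to-back reference to the same page cannot fault):

```python
# Convert the slide's byte addresses into a page reference stream,
# with a page size of 100 bytes.

addresses = [100, 432, 101, 612, 102, 103, 104, 101, 611, 102, 103,
             104, 101, 610, 103, 104, 101, 609, 102, 105]

pages = [a // 100 for a in addresses]   # page number = address / page size

stream = [pages[0]]
for p in pages[1:]:
    if p != stream[-1]:                 # keep only page *changes*
        stream.append(p)

print(pages)    # raw page numbers, one per reference
print(stream)   # [1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1]
```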
86
Modeling Page Behavior
Let ω = r1, r2, r3, …, ri, … be a page reference stream
ri is the ith page number referenced by the process
The subscript is the virtual time for the process
Given a page frame allocation of m, the memory state at time t, St(m), is the set of pages loaded:
St(m) = St-1(m) ∪ Xt - Yt
Xt is the set of pages fetched at time t
Yt is the set of pages replaced at time t
87
More on Demand Paging
If rt was loaded at time t-1, St(m) = St-1(m)
If rt was not loaded at time t-1 and there were empty page frames, St(m) = St-1(m) ∪ {rt}
If rt was not loaded at time t-1 and there were no empty page frames, St(m) = St-1(m) ∪ {rt} - {y}, where y is the page unloaded
88
Replacement Policy
When there is no empty page frame in memory, we need to find one to replace
Write the page out to the swap area if it has been changed since it was read in (tracked with a dirty bit or modified bit)
Pages that have been changed are referred to as “dirty”
These pages must be written out to disk because the disk version is out of date; this is called “cleaning” the page
Which page should be removed from memory to make room for a new page?
We need a page replacement algorithm
89
Page replacement algorithms
The goal of a page replacement algorithm is to produce the fewest page faults
We can compare two algorithms on a range of page reference streams
Or we can compare an algorithm to the best possible algorithm
We will start by considering static page replacement algorithms
90
Static Paging Algorithms
A fixed number of page frames is allocated to each process when it is created
The paging policy defines how these page frames will be loaded and unloaded
The placement policy is fixed: the page frame holding the new page is always the one vacated by the page selected for replacement
91
Static Allocation, Demand Paging
The number of page frames is static over the life of the process
The fetch policy is demand
Since St(m) = St-1(m) ∪ {rt} - {y}, the replacement policy must choose y, which uniquely identifies the paging policy
92
Random Page Replacement
Algorithm: replace a page randomly
Theory: we cannot predict the future at all
Implementation: easy
Performance: poor, though the best case, worst case, and average case are all the same
93
Random Replacement
The replaced page, y, is chosen from the m loaded page frames with probability 1/m
Let the page reference stream be ω = 2031203120316457
[Table: one sample run of random replacement with m = 3 frames over the 16 references]
No knowledge of ω means random replacement doesn’t perform well: this run produced 13 page faults
94
Belady’s Optimal Algorithm
The one that produces the fewest possible page faults on all page reference sequences
Algorithm: replace the page that will not be used for the longest time in the future
Problem: it requires knowledge of the future
Not realizable in practice, but it is used to measure the effectiveness of realizable algorithms
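As a sketch (possible only because the whole stream is known in advance), the OPT policy can be simulated directly; the stream and frame count are the slides’ example:

```python
# Belady's optimal (OPT) replacement: on a fault with no free frame,
# evict the loaded page whose next use is farthest in the future.
# Pages never used again have infinite forward distance.

def opt_faults(stream, m):
    loaded, faults = set(), 0
    for t, page in enumerate(stream):
        if page in loaded:
            continue
        faults += 1
        if len(loaded) == m:
            def fwd(x):
                future = stream[t + 1:]
                return future.index(x) if x in future else float("inf")
            loaded.remove(max(loaded, key=fwd))   # maximal forward distance
        loaded.add(page)
    return faults

stream = [int(c) for c in "2031203120316457"]
print(opt_faults(stream, 3))   # 10, matching the slides' table
```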
95
Belady’s Optimal Algorithm
Replace the page with maximal forward distance: yt = max over x in St-1(m) of FWDt(x)
Let the page reference stream be ω = 2031203120316457, with m = 3 page frames
Example: at t = 4 (reference 1, a fault), FWD4(2) = 1, FWD4(0) = 2, FWD4(3) = 3, so page 3 is replaced
Working through the whole stream (the slides build this table step by step):

time    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
ref     2  0  3  1  2  0  3  1  2  0  3  1  6  4  5  7
frame 0 2  2  2  2  2  2  2  2  2  0  0  0  0  4  4  4
frame 1    0  0  0  0  0  3  3  3  3  3  3  6  6  6  7
frame 2       3  1  1  1  1  1  1  1  1  1  1  1  5  5

Perfect knowledge of ω gives perfect performance: 10 page faults
Impossible to implement
103
Theories of Program Behavior
All replacement algorithms try to predict the future and act like Belady’s optimal algorithm
All replacement algorithms have a theory of how programs behave
They use it to predict the future, that is, when pages will be referenced
Then they replace the page that they think won’t be referenced for the longest time
104
LRU Page Replacement
Least recently used (LRU)
Algorithm: remove the page that hasn’t been referenced for the longest time
Theory: the future will be like the past; page accesses tend to be clustered in time
Implementation: hard; requires hardware assistance (and even then it is not easy)
Performance: very good, within 30%–40% of optimal
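The LRU policy is easy to simulate in software (the hard part in practice is the hardware tracking of recency, which this sketch simply records directly):

```python
# LRU replacement: on a fault with no free frame, evict the loaded page
# with the maximal backward distance (the least recently referenced one).

def lru_faults(stream, m):
    loaded, last_use, faults = set(), {}, 0
    for t, page in enumerate(stream):
        if page not in loaded:
            faults += 1
            if len(loaded) == m:
                victim = min(loaded, key=lambda x: last_use[x])
                loaded.remove(victim)
            loaded.add(page)
        last_use[page] = t
    return faults

stream = [int(c) for c in "2031203120316457"]
# This cyclic stream is LRU's worst case with 3 frames: each eviction
# removes exactly the page that is about to be referenced.
print(lru_faults(stream, 3))   # 16 (every reference faults)
print(lru_faults(stream, 4))   # 8
```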
105
LRU model of the future
106
Least Recently Used (LRU)
Replace the page with maximal backward distance: yt = max over x in St-1(m) of BKWDt(x)
Let the page reference stream be ω = 2031203120316457
Example: at t = 4 (reference 1, a fault), BKWD4(2) = 3, BKWD4(0) = 2, BKWD4(3) = 1, so page 2 is replaced
With m = 3 frames every reference faults (16 page faults); each eviction removes the page that is about to be referenced:

time    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
ref     2  0  3  1  2  0  3  1  2  0  3  1  6  4  5  7
frame 0 2  2  2  1  1  1  3  3  3  0  0  0  6  6  6  7
frame 1    0  0  0  2  2  2  1  1  1  3  3  3  4  4  4
frame 2       3  3  3  0  0  0  2  2  2  1  1  1  5  5

With m = 4 frames the same stream produces only 8 faults:

time    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
ref     2  0  3  1  2  0  3  1  2  0  3  1  6  4  5  7
frame 0 2  2  2  2  2  2  2  2  2  2  2  2  6  6  6  6
frame 1    0  0  0  0  0  0  0  0  0  0  0  0  4  4  4
frame 2       3  3  3  3  3  3  3  3  3  3  3  3  5  5
frame 3          1  1  1  1  1  1  1  1  1  1  1  1  7

Backward distance is a good predictor of forward distance: locality
112
LFU Page Replacement
Least frequently used (LFU)
Algorithm: remove the page that has been used least often in the past
Theory: an actively used page should have a large reference count
Implementation: hard; also requires hardware assistance (and even then it is not easy)
Performance: not very good
113
Least Frequently Used (LFU)
Replace the page with minimum use: yt = min over x in St-1(m) of FREQt(x)
Let the page reference stream be ω = 2031203120316457, with m = 3 page frames
At t = 4 (reference 1, a fault), FREQ4(2) = FREQ4(0) = FREQ4(3) = 1: a tie, broken here by replacing page 0
At t = 6 (reference 0, a fault), FREQ6(2) = 2, FREQ6(1) = 1, FREQ6(3) = 1, so page 3 is replaced:

time    1  2  3  4  5  6
ref     2  0  3  1  2  0
frame 0 2  2  2  2  2  2
frame 1    0  0  1  1  1
frame 2       3  3  3  0

At t = 7 (reference 3): FREQ7(2) = ?, FREQ7(1) = ?, FREQ7(0) = ? (left as an exercise)
117
FIFO Page Replacement
Algorithm: replace the oldest page
Theory: pages are used for a while and then stop being used
Implementation: easy
Performance: poor, because old pages are often still being accessed; that is, the theory behind FIFO is not correct
118
First In First Out (FIFO)
Replace the page that has been in memory the longest: yt = max over x in St-1(m) of AGEt(x)
Let the page reference stream be ω = 2031203120316457, with m = 3 page frames
At t = 4 (reference 1, a fault), AGE4(2) = 3, AGE4(0) = 2, AGE4(3) = 1, so page 2 is replaced:

time    1  2  3  4
ref     2  0  3  1
frame 0 2  2  2  1
frame 1    0  0  0
frame 2       3  3

At t = 5 (reference 2): AGE5(1) = ?, AGE5(0) = ?, AGE5(3) = ? (left as an exercise)
122
Belady’s Anomaly
Let the page reference stream be ω = 012301401234
FIFO with m = 3 frames has 9 faults:

time    1  2  3  4  5  6  7  8  9 10 11 12
ref     0  1  2  3  0  1  4  0  1  2  3  4
frame 0 0  0  0  3  3  3  4  4  4  4  4  4
frame 1    1  1  1  0  0  0  0  0  2  2  2
frame 2       2  2  2  1  1  1  1  1  3  3

FIFO with m = 4 frames has 10 faults:

time    1  2  3  4  5  6  7  8  9 10 11 12
ref     0  1  2  3  0  1  4  0  1  2  3  4
frame 0 0  0  0  0  0  0  4  4  4  4  3  3
frame 1    1  1  1  1  1  1  0  0  0  0  4
frame 2       2  2  2  2  2  2  1  1  1  1
frame 3          3  3  3  3  3  3  2  2  2
123
Belady’s Anomaly
The paging algorithm has worse performance when the amount of primary memory allocated to the process increases
Problem arises because the set of pages loaded with the smaller memory allocation is not necessarily also loaded with the larger memory allocation
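The anomaly can be checked directly on the slide's reference stream (a self-contained Python sketch, not from the slides):

```python
def fifo_faults(refs, m):
    """Count page faults under FIFO replacement with m page frames."""
    resident, queue, faults = set(), [], 0
    for r in refs:
        if r not in resident:
            faults += 1
            if len(resident) == m:
                resident.discard(queue.pop(0))  # evict the oldest page
            resident.add(r)
            queue.append(r)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]  # stream from the slide
faults_3 = fifo_faults(refs, 3)  # 9 faults
faults_4 = fifo_faults(refs, 4)  # 10 faults: more memory, more faults
```

Adding a fourth page frame raises the fault count from 9 to 10 — exactly Belady's anomaly.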
124
Avoiding Belady’s Anomaly
Inclusion Property
 The set of pages loaded with an allocation of m frames is always a subset of the set of pages loaded with an allocation of m+1 frames
 FIFO does not satisfy the inclusion property
 LRU and LFU do
 Algorithms that satisfy the inclusion property are called stack algorithms
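The inclusion property can be checked mechanically (a Python sketch, not from the slides; it relies on the fact that under LRU the resident set is exactly the m most recently used pages):

```python
def lru_resident_sets(refs, m):
    """Yield the resident-page set after each reference under LRU."""
    stack = []                    # most recently used page at the end
    for r in refs:
        if r in stack:
            stack.remove(r)
        stack.append(r)
        del stack[:-m]            # keep only the top m of the LRU stack
        yield set(stack)

def fifo_resident_sets(refs, m):
    """Yield the resident-page set after each reference under FIFO."""
    resident, queue = set(), []
    for r in refs:
        if r not in resident:
            if len(resident) == m:
                resident.discard(queue.pop(0))
            resident.add(r)
            queue.append(r)
        yield set(resident)

def satisfies_inclusion(sets_fn, refs, m):
    """True if every resident set with m frames is a subset of the
    resident set with m+1 frames at the same time step."""
    return all(a <= b for a, b in zip(sets_fn(refs, m),
                                      sets_fn(refs, m + 1)))

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
lru_ok = satisfies_inclusion(lru_resident_sets, refs, 3)     # holds
fifo_ok = satisfies_inclusion(fifo_resident_sets, refs, 3)   # violated
```

On Belady's stream, LRU passes the subset check at every step while FIFO fails it, which is why FIFO is not a stack algorithm.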
125
Stack Algorithms
 Some algorithms are well-behaved
 Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3
Frame 1:   1 1 1
Frame 2:     2 2

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0
Frame 1:   1 1 1
Frame 2:     2 2
Frame 3:       3
126
Stack Algorithms
• Some algorithms are well-behaved
• Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3
Frame 1:   1 1 1 0
Frame 2:     2 2 2

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0
Frame 1:   1 1 1 1
Frame 2:     2 2 2
Frame 3:       3 3
127
Stack Algorithms
• Some algorithms are well-behaved
• Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3 3
Frame 1:   1 1 1 0 0
Frame 2:     2 2 2 1

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0 0
Frame 1:   1 1 1 1 1
Frame 2:     2 2 2 2
Frame 3:       3 3 3
128
Stack Algorithms
• Some algorithms are well-behaved
• Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3 3 4
Frame 1:   1 1 1 0 0 0
Frame 2:     2 2 2 1 1

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0 0 0
Frame 1:   1 1 1 1 1 1
Frame 2:     2 2 2 2 4
Frame 3:       3 3 3 3
129
Stack Algorithms
• Some algorithms are well-behaved
• Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3 3 4 4 4 2 2 2
Frame 1:   1 1 1 0 0 0 0 0 0 3 3
Frame 2:     2 2 2 1 1 1 1 1 1 4

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0 0 0 0 0 0 0 4
Frame 1:   1 1 1 1 1 1 1 1 1 1 1
Frame 2:     2 2 2 2 4 4 4 4 3 3
Frame 3:       3 3 3 3 3 3 2 2 2
130
Stack Algorithms
• Some algorithms are not well-behaved
• Inclusion Property violated: pages loaded at time t with m frames aren’t all loaded at time t with m+1 frames

FIFO, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3 3 4 4 4 4 4 4
Frame 1:   1 1 1 0 0 0 0 0 2 2 2
Frame 2:     2 2 2 1 1 1 1 1 3 3

FIFO, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0 0 4 4 4 4 3 3
Frame 1:   1 1 1 1 1 1 0 0 0 0 4
Frame 2:     2 2 2 2 2 2 1 1 1 1
Frame 3:       3 3 3 3 3 3 2 2 2
131
Implementation
 LRU has become the preferred algorithm
 Difficult to implement exactly
  Must record when each page was referenced
  Difficult to do in hardware
 Approximate LRU with a reference bit
  Periodically reset
  Set for a page when it is referenced
 Dirty bit
  Pages that have been changed are referred to as “dirty”
  These pages must be written out to disk, because the disk version is out of date; this is called “cleaning” the page
132
First LRU approximation
 When you get a page fault
  Replace any page whose reference bit is off
  Then turn off all the reference bits
 Two classes of pages
  Pages referenced since the last page fault
  Pages not referenced since the last page fault
   The least recently used page is in this class, but you don’t know which one it is
 A crude approximation of LRU
133
Second LRU approximation
 Algorithm:
  Keep a counter for each page
  Have a daemon wake up every 500 ms and:
   Add one to the counter of each page that has not been referenced
   Zero the counter of pages that have been referenced
   Turn off all reference bits
  When you get a page fault, replace the page whose counter is largest
 Divides pages into 256 classes
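The counter scheme above can be sketched as follows (a Python illustration with hypothetical class and method names; the 500 ms daemon is modeled as an explicit `tick()` call, and the hardware reference bits as a plain set):

```python
class AgingApproxLRU:
    """Sketch of the counter-based LRU approximation from the slides."""

    COUNTER_MAX = 255        # 8-bit counter -> 256 classes of pages

    def __init__(self, pages):
        self.counter = {p: 0 for p in pages}
        self.referenced = set()

    def touch(self, page):
        """Model a memory reference: hardware sets the reference bit."""
        self.referenced.add(page)

    def tick(self):
        """Daemon pass (every ~500 ms): age unreferenced pages,
        zero the counters of referenced pages, clear all bits."""
        for p in self.counter:
            if p in self.referenced:
                self.counter[p] = 0
            else:
                self.counter[p] = min(self.counter[p] + 1,
                                      self.COUNTER_MAX)
        self.referenced.clear()

    def victim(self):
        """On a page fault, evict the page with the largest counter."""
        return max(self.counter, key=self.counter.get)
```

For example, a page never touched across two ticks ends up with the largest counter and becomes the victim.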
134
Dynamic Paging Algorithms
Static page replacement algorithms assume that a process is allocated a fixed amount of primary memory
But, the amount of physical memory – the number of page frames – varies as the process executes
How much memory should be allocated? Fault rate must be “tolerable” Will change according to the phase of process
Need to define a placement & replacement policy
Contemporary models based on working set
135
Working Set
 Intuitively, the working set is the set of pages in the process’s locality
  Somewhat imprecise
  Time varying
 Given k processes in memory, let m_i(t) be the number of page frames allocated to p_i at time t
  m_i(0) = 0
  Σ_{i=1}^{k} m_i(t) ≤ |primary memory|
 Also have S_t(m_i(t)) = S_{t−1}(m_i(t−1)) ∪ X_t − Y_t
 Or, more simply, S(m_i(t)) = S(m_i(t−1)) ∪ X_t − Y_t
136
Placed/Replaced Pages
 S(m_i(t)) = S(m_i(t−1)) ∪ X_t − Y_t
 For the missing page
  Allocate a new page frame
  X_t = {r_t}, placed in the new page frame
 How should Y_t be defined?
 Consider a parameter, ω, called the window size
  Determine BKWD_t(y) for every y ∈ S(m_i(t−1))
  if BKWD_t(y) ≥ ω, unload y and deallocate its frame
  if BKWD_t(y) < ω, do not disturb y
137
Working Set Principle

Process p_i should only be loaded and active if it can be allocated enough page frames to hold its entire working set
 The size of the working set is estimated using ω
 Unfortunately, a “good” value of ω depends on the size of the locality
 Empirically this works with a fixed ω
138
Working set algorithm Algorithm
Keep track of the working set of each running process
Only run a process if its entire working set fits in memory – called working set principle
139
Working set algorithm example
• With ω = 3, there are 16 page faults
• With ω = 4, there are 8 – the minimum, since there are 8 distinct pages
140
Working set algorithm example – cont.
• Letting ω = 9 does not reduce the number of page faults
• In fact, not all the page frames are used
141
Working set algorithm example – cont.
• Here the page frame allocation changes dynamically, increasing and decreasing
142
Implementing the Working Set
 Global LRU will behave similarly to a working set algorithm
  Page fault
   Add a page frame to one process
   Take away a page frame from another process
  Use the LRU implementation idea
   Reference bit for every page frame
   Cleared periodically, set with each reference
   Change the allocation of some page frame with a clear reference bit
 Clock algorithms use this technique by searching for cleared reference bits in a circular fashion
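A single victim-selection pass of such a clock algorithm might look like (a Python sketch with hypothetical names; `refbit` models the per-frame reference bits):

```python
def clock_replace(frames, refbit, hand):
    """Select a victim frame with the clock (second-chance) technique.

    frames: resident page numbers; refbit: parallel list of 0/1 bits;
    hand: current clock-hand index. Returns (victim_index, new_hand).
    """
    n = len(frames)
    while True:
        if refbit[hand] == 0:
            return hand, (hand + 1) % n   # found an unreferenced frame
        refbit[hand] = 0                  # give the page a second chance
        hand = (hand + 1) % n
```

The loop always terminates: if every bit is set, the first sweep clears them all and the hand's original frame is chosen on the second visit.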
143
Performance of Demand Paging
Page Fault Rate probability: 0 ≤ p ≤ 1.0 if p = 0, no page faults if p = 1, every reference is a fault
Effective Access Time (EAT)
 EAT = (1 − p) × memory access time
     + p × (page fault overhead + [swap page out] + swap page in + restart overhead)
144
Demand Paging Performance Example
Assume memory access time = 100 nanosecond
Assume fault service time = 25 ms = 25,000,000 ns
Then EAT = (1 – p) x 100 + p (25,000,000)= 100 + 24,999,900 p (in ns)
So, if one out of 1000 accesses causes a page fault, then EAT = 100+24,999,900x0.001=25,099.9 ns ≈ 25 microseconds
145
Demand Paging Performance Example
So, if one access out of 1000 causes a page fault, the computer would be slowed down by a factor of 250 because of demand paging!
Can calculate that if we want less than 10% degradation, we need to allow only one access out of 2,500,000 to page fault
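Both numbers can be reproduced with a quick check (a Python sketch of the slides' arithmetic):

```python
def eat_ns(p, mem_ns=100, fault_ns=25_000_000):
    """Effective access time in ns given page-fault probability p."""
    return (1 - p) * mem_ns + p * fault_ns

eat = eat_ns(0.001)              # one fault per 1000 accesses: 25,099.9 ns
slowdown = eat / 100             # ~251x slower than a plain memory access
# For at most 10% degradation: p * (fault_ns - mem_ns) <= 0.1 * mem_ns
p_max = 10 / (25_000_000 - 100)  # about 1 fault per 2,500,000 accesses
```

Solving EAT ≤ 1.1 × 100 ns for p gives p ≤ 10 / 24,999,900, i.e. roughly one fault in 2.5 million accesses.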
146
Evaluating paging algorithms
 Mathematical modeling
  Powerful where it works, but most real algorithms cannot be analyzed
 Measurement
  Implement it on a real system and measure it
  Extremely expensive
 Simulation
  Test on page reference traces
  Reasonably efficient; effective
147
Performance of paging algorithms
148
Thrashing VM allows more processes in memory, so
several processes are more likely to be ready to run at the same time
If CPU usage is low, it seems logical to bring more processes into memory
 But low CPU use may be due to too many page faults because there are too many processes competing for memory
 Bringing in more processes then makes it worse, and leads to thrashing
149
Thrashing Diagram
 There are too many processes in memory and no process has enough memory to run. As a result, the page-fault rate is very high and the system spends all of its time handling page-fault interrupts.
150
Load control
 Load control: deciding how many processes should be competing for page frames
  Too many leads to thrashing
  Too few means that memory is underused
 Load control determines which processes are running at a point in time
  The others have no page frames and cannot run
 CPU load is a bad load control measure
 Page fault rate is a good load control measure
151
Load control and page replacement
152
Two levels of scheduling
153
Load control algorithms A load control algorithm measures
memory load and swaps processes in and out depending on the current load
Load control measures rotational speed of the clock hand average time spent in the standby list page fault rate
154
Page fault frequency load control
L = mean time between page faults S = mean time to service a page fault Try to keep L = S
if L < S, then swap a process out if L > S, then swap a process in
If L = S, then the paging system can just keep up with the page faults
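The decision rule on this slide can be stated as a tiny sketch (a Python illustration; the function name is hypothetical):

```python
def load_control_action(L, S):
    """Page-fault-frequency load control decision.

    L: mean time between page faults; S: mean time to service one.
    Keep L close to S: faults should arrive about as fast as they
    can be serviced.
    """
    if L < S:
        return "swap a process out"  # faults arrive faster than service
    if L > S:
        return "swap a process in"   # memory is underused
    return "no change"               # paging just keeps up
```

In a real kernel L and S would be running averages measured by the fault handler, not exact values.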
155
Windows NT Paging System
Primary Memory | Virtual Address Space (Supv space / User space) | Paging Disk (Secondary Memory)

1. Reference to Address k in Page i (User space)
2. Lookup (Page i, Addr k)
3. Translate (Page i, Addr k) to (Page Frame j, Addr k)
4. Reference (Page Frame j, Addr k)
156
Windows Address Translation
Virtual address: Page Directory index | Page Table index | Byte Index
 (the first two fields form the virtual page number; the byte index is the line number)
 Translation path: Page Directory → Page Table → Target Page → Target Byte
157
Linux Virtual Address Translation
Virtual address fields: j.pgd | j.pmd | j.pte | j.offset
 Translation path: Page Directory Base → Page Directory → Page Middle Directory → Page Table → Page
158
Segmentation Unit of memory movement is:
Variably sized Defined by the programmer
Two component addresses, <Seg#, offset> Seg # is reference to a base location Offset is offset of target within segment
Address translation is more complex than paging
159
Segment Address Translation
• Y_t: segments × offsets → physical addresses {W}
• Y_t(i, j) = k; i = segment, j = offset, k = physical address
• Segment names are typically symbolic; bound at runtime
• s: segment names → segment addresses
• Y_t(s(segName), j) = k
• Offset may also not be bound until runtime
• l: offset names → offset addresses
• So, the address map could be as complex as Y_t(s(segName), l(offsetName)) = k
• Address translation is more complex than paging
160
Segment Address Translation
The task of designing a segmentation system to handle such general address translation is very challenging
Each memory reference is theoretically a pair of symbols to be translated when the reference occurs
In addition, the mappings are time-varying The segment could be anywhere in primary
and/or secondary memory
161
Address Translation
[Diagram: a <segmentName, offsetName> pair is mapped by s and l to a (segment #, offset) pair; the segment table entry supplies Limit, Base, and P (present) fields; the offset is checked against the limit, added to the base (relocation), and sent to the memory address register; a missing segment raises a fault]
162
Address Translation – cont.
System maintains segment table for each process (which is a segment itself)
Table contains a set of entries – called segment descriptors
Descriptors contain fields to support relocation; also indicates if not loaded Base: relocation register for segment Limit: length of segment Protection: allowable forms of access
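A descriptor-based translation step might be sketched as (a Python illustration with hypothetical names; the present and limit checks model the missing-segment and protection fields described above):

```python
from dataclasses import dataclass

@dataclass
class SegmentDescriptor:
    base: int        # relocation register: where the segment starts
    limit: int       # length of the segment
    present: bool    # is the segment loaded in primary memory?

def translate(seg_table, seg, offset):
    """Map (segment #, offset) to a physical address via the table."""
    d = seg_table[seg]
    if not d.present:
        raise RuntimeError("missing-segment fault")  # would load from disk
    if not (0 <= offset < d.limit):
        raise RuntimeError("protection violation: offset out of range")
    return d.base + offset
```

For example, with a descriptor of base 1000 and limit 200, offset 10 translates to physical address 1010, while offset 200 would fault.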
163
Implementation
 Most implementations do not fully implement the address translation model
 Segmentation requires special hardware
  Segment descriptor support
  Segment base registers (segment, code, stack)
  Translation hardware
 Some of the translation can be static
  No dynamic offset name binding
  Limited protection
164
Multics
 Designed in the late 1960s
 Old, but still state-of-the-art segmentation
 Uses linkage segments to support sharing
 Uses dynamic offset name binding
 Required a sophisticated memory management unit
 See pp 500–502