Posted on 21-Dec-2015
CS 519: Lecture 3
Memory Management
CS 519 Operating System Theory
Memory Management
Requirements for a memory management strategy:
- Consistency: all address spaces look "basically the same"
- Relocation: processes can be loaded at any physical address
- Protection: a process cannot maliciously access memory belonging to another process
- Sharing: may allow sharing of physical memory (must implement access control)
Basic Concepts: Memory Partitioning
- Static: a process must be loaded into a partition of equal or greater size => internal fragmentation
- Dynamic: each process is loaded into a partition of exactly its size => external fragmentation
[Figure: a new job being loaded into statically vs. dynamically partitioned memory]
Basic Concepts: Pure Paging and Segmentation
- Paging: memory is divided into equal-sized frames. Process pages are loaded into frames that are not necessarily contiguous
- Segmentation: each process is divided into variable-sized segments. Process segments are loaded into dynamic partitions that are not necessarily contiguous
- More details in the context of virtual memory
Memory Hierarchy
[Figure: memory hierarchy: registers, cache, memory]
Question: What if we want to support programs that require more memory than what’s available in the system?
Memory Hierarchy

[Figure: memory hierarchy extended: registers, cache, memory, virtual memory]
Answer: Pretend we had something bigger => Virtual Memory
Virtual Memory
Virtual memory is the OS abstraction that gives the programmer the illusion of an address space that may be larger than the physical address space
Virtual memory can be implemented using either paging or segmentation but paging is most common
Virtual memory is motivated by both:
- Convenience: the programmer does not have to deal with the fact that individual machines may have very different amounts of physical memory
- Higher degree of multiprogramming: processes are not loaded as a whole; rather, they are loaded on demand
Virtual Memory: Paging
- A page is a cacheable unit of virtual memory
- The OS controls the mapping between pages of VM and physical memory
- More flexible (at a cost)
[Figure: pages of VM mapped to page frames in memory, analogous to memory blocks cached in the hardware cache]
Virtual Memory: Segmentation
[Figure: segments of Job 0 and Job 1 placed at different locations in memory]
Hardware Translation
- Translation from virtual to physical addresses can be done in software
- However, hardware support is needed to ensure protection and to perform translation faster
- Simplest solution: two registers, base and size
[Figure: the processor sends virtual addresses through a translation box (MMU) to physical memory]
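A minimal sketch of this base-and-size scheme (the register values below are made up for illustration, not from the slides):

```python
# Illustrative base + size translation; BASE and SIZE are hypothetical
# register values loaded by the OS for the current process.
BASE = 0x4000   # where the process was loaded in physical memory
SIZE = 0x1000   # size of the process's address space

def translate(vaddr):
    if vaddr >= SIZE:                     # protection check
        raise MemoryError("protection fault")
    return BASE + vaddr                   # relocation

print(hex(translate(0x0123)))             # -> 0x4123
```

Note how the two requirements fall out directly: the size register gives protection, and the base register gives relocation.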
Segmentation Hardware
- Segments are of variable size
- Translation is done through a set of (base, size, state) registers: the segment table
- State: valid/invalid, access permission, reference, and modified bits
- Segments may be visible to the programmer and can be used as a convenience for organizing programs and data (e.g., a code segment or data segments)
[Figure: virtual address split into segment number and offset; the segment table supplies a base that is added to the offset to form the physical address]
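To make the lookup concrete, here is a sketch with a made-up two-entry segment table; the bases, sizes, and validity bits are assumptions, not values from the lecture:

```python
# Hypothetical segment table: segment number -> (base, size, valid)
SEG_TABLE = {
    0: (0x8000, 0x2000, True),   # e.g., a code segment
    1: (0xC000, 0x0800, True),   # e.g., a data segment
}

def seg_translate(segment, offset):
    base, size, valid = SEG_TABLE[segment]
    if not valid or offset >= size:      # state and bounds check
        raise MemoryError("segmentation fault")
    return base + offset                 # physical = base + offset

print(hex(seg_translate(1, 0x10)))       # -> 0xc010
```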
Paging hardware
- Pages are of fixed size
- The physical memory corresponding to a page is called a page frame
- Translation is done through a page table indexed by page number
- Each entry in the page table contains the physical frame number that the virtual page is mapped to and the state of the page in memory
- State: valid/invalid, access permission, reference, modified, and caching bits
- Paging is transparent to the programmer
[Figure: virtual address split into page number and offset; the page table supplies the frame number, which is combined with the offset to form the physical address]
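A sketch of the page-table lookup; the 4 KB page size and table contents below are illustrative assumptions:

```python
PAGE_SIZE = 4096   # fixed page size: 12 offset bits

# Hypothetical page table: virtual page number -> (frame number, valid)
PAGE_TABLE = {0: (5, True), 1: (2, True), 2: (0, False)}

def page_translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # split page # and offset
    frame, valid = PAGE_TABLE[vpn]
    if not valid:
        raise MemoryError("page fault")      # invalid: page not in memory
    return frame * PAGE_SIZE + offset        # frame # replaces page #
```

Unlike segmentation, the offset needs no bounds check: it can never exceed the fixed page size.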
Combined Paging and Segmentation
- Some MMUs combine paging with segmentation
- Virtual address: segment number + page number + offset
- Segmentation translation is performed first: the segment entry points to a page table for that segment
- The page number is used to index the page table and look up the corresponding page frame number
- Segmentation is not used much anymore, so we'll focus on paging
- UNIX has a simple form of segmentation but does not require any hardware support
Paging: Address Translation
[Figure: the CPU issues virtual address (p, d); p indexes the page table in memory to find frame f; the physical address is (f, d)]
Translation Lookaside Buffers
- Translation on every memory access must be fast
- What to do? Caching, of course...
- Why does caching work? Temporal locality!
- Same as a normal memory cache: the cache is smaller, so we can spend more $$ to make it faster
- The cache for page table entries is called the Translation Lookaside Buffer (TLB)
- Typically fully associative, with no more than 64 entries
- On every memory access, we look for the page-to-frame mapping in the TLB
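A toy model of the lookup order (TLB first, page table only on a miss); the two-entry capacity and table contents are made up, and eviction here just drops an arbitrary entry:

```python
PAGE_SIZE = 4096
PAGE_TABLE = {0: 5, 1: 2, 7: 9}   # vpn -> frame (all valid, for simplicity)
TLB = {}                          # the small, fast cache: vpn -> frame
TLB_CAPACITY = 2                  # real TLBs: up to ~64 entries

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in TLB:                        # TLB miss
        if len(TLB) >= TLB_CAPACITY:          # full: evict an entry
            TLB.pop(next(iter(TLB)))          # (replacement policy?)
        TLB[vpn] = PAGE_TABLE[vpn]            # bring in from page table
    return TLB[vpn] * PAGE_SIZE + offset      # hit path: no PT access

print(hex(translate(0x1004)))                 # -> 0x2004 (vpn 1 -> frame 2)
```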
Paging: Address Translation
[Figure: the CPU issues virtual address (p, d); the TLB is checked for the p-to-f mapping before the page table; the physical address is (f, d)]
TLB Miss
- What if the TLB does not contain the right PT entry? TLB miss
- Evict an existing entry if the TLB does not have any free ones; replacement policy?
- Bring in the missing entry from the page table
- TLB misses can be handled in hardware or software
- Software handling allows the application to assist in replacement decisions
Where to Store Address Space?
Virtual address space may be larger than physical memory
Where do we keep it? Where do we keep the page table?
Where to Store Address Space?
On the next device down our storage hierarchy, of course …
[Figure: VM pages backed by disk, the next level below memory in the hierarchy]
Where to Store Page Table?
In memory, of course …
[Figure: physical memory holding the OS, a process's code, globals, stack, and heap, and the P0 and P1 page tables]
- Interestingly, we use memory to "enlarge" the view of memory, leaving LESS physical memory
- This kind of overhead is common
- Gotta know what the right trade-off is:
  - Have to understand common application characteristics
  - Have to be common enough!
Page table structure
The page table can become huge. What to do?
- Two-level PT: saves memory but requires two lookups per access
- Page the page tables
- Inverted page tables (one entry per page frame in physical memory): translation through hash tables
[Figure: a master PT pointing to second-level PTs; the kernel PT is non-pageable, while the P0 and P1 PTs are pageable within the OS segment]
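A sketch of the two lookups, assuming the classic 32-bit 10/10/12 address split; the table contents are invented:

```python
# Master page table: top 10 bits -> second-level table.
# Second-level table: middle 10 bits -> frame number.
MASTER = {1: {2: 42}}    # hypothetical: one 2nd-level table, one mapping

def two_level_translate(vaddr):
    top = (vaddr >> 22) & 0x3FF      # lookup 1: index the master PT
    mid = (vaddr >> 12) & 0x3FF      # lookup 2: index the 2nd-level PT
    off = vaddr & 0xFFF
    second_level = MASTER[top]       # a missing entry here means the
    frame = second_level[mid]        # 2nd-level table itself is absent
    return (frame << 12) | off
```

The memory saving comes from sparseness: second-level tables for unused regions of the address space are simply never allocated.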
Demand Paging
- To start a process (program), just load the code page where the process will start executing
- As the process references memory (instructions or data) outside of the loaded pages, bring them in as necessary
- How do we represent the fact that a page of VM is not yet in memory?
[Figure: demand paging: VM pages A, B, C; the page table marks A as valid (in memory) while B and C are marked invalid (still on disk)]
Page Fault
What happens when a process references a page marked as invalid in the page table?
- Page fault trap
- Check that the reference is valid
- Find a free memory frame
- Read the desired page from disk
- Change the valid bit of the page to v
- Restart the instruction that was interrupted by the trap

Is it easy to restart an instruction? What happens if there is no free frame?
Page Fault (Cont’d)
So, what can happen on a memory access?
1. TLB miss => read page table entry
2. TLB miss => read kernel page table entry
3. Page fault for the necessary page of the process page table
4. All frames are used => need to evict a page => modify a process page table entry
   1. TLB miss => read kernel page table entry
   2. Page fault for the necessary page of the process page table
   3. Go back
5. Read in the needed page, modify the page table entry, fill the TLB
Cost of Handling a Page Fault
- Trap, check page table, find a free memory frame (or find a victim)... about 200-600 µs
- Disk seek and read... about 10 ms
- Memory access... about 100 ns
- A page fault degrades performance by a factor of ~100,000!!!!!
- And this doesn't even count all the additional things that can happen along the way
- Better not have too many page faults!
- If we want no more than 10% degradation, we can have only 1 page fault for every 1,000,000 memory accesses
- The OS must do a great job of managing the movement of data between secondary storage and main memory
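The arithmetic behind these numbers, using the costs quoted on this slide:

```python
mem_access_ns = 100                 # memory access ~ 100 ns
fault_cost_ns = 10 * 1000 * 1000    # disk seek + read ~ 10 ms

# One fault costs as much as ~100,000 ordinary memory accesses.
slowdown = fault_cost_ns // mem_access_ns
print(slowdown)                     # 100000

# For <= 10% degradation, a fault may add at most 10 ns per access on
# average, so we need at least fault_cost / 10 ns accesses per fault.
accesses_per_fault = fault_cost_ns // (mem_access_ns // 10)
print(accesses_per_fault)           # 1000000
```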
Page Replacement
What if there's no free frame left on a page fault? Free a frame that's currently being used:
1. Select the frame to be replaced (the victim)
2. Write the victim back to disk
3. Change the page table to reflect that the victim is now invalid
4. Read the desired page into the newly freed frame
5. Change the page table to reflect that the new page is now valid
6. Restart the faulting instruction

Optimization: we do not need to write the victim back if it has not been modified (needs a dirty bit per page).
Page Replacement
- We are highly motivated to find a good replacement policy
- That is, when evicting a page, how do we choose the best victim in order to minimize the page fault rate?
- Is there an optimal replacement algorithm? If yes, what is it?
- Let's look at an example: suppose we have 3 memory frames and are running a program that has the following reference pattern: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3
Page Replacement
- Suppose we know the access pattern in advance: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3
- The optimal algorithm is to replace the page that will not be used for the longest period of time
- What's the problem with this algorithm?
- Realistic policies try to predict future behavior on the basis of past behavior
- Works because of locality
FIFO
First-in, First-out
- Be fair: let every page live in memory for about the same amount of time, then toss it
- What's the problem? Is this compatible with what we know about the behavior of programs?
- How does it do on our example? 7, 0, 1, 2, 0, 3, 0, 4, 2, 3
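A small simulation answering that question for the reference string above with 3 frames:

```python
from collections import deque

def fifo_faults(refs, nframes):
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:                  # page fault
            faults += 1
            if len(frames) == nframes:
                frames.remove(order.popleft())  # evict the oldest page
            frames.add(page)
            order.append(page)
    return faults

print(fifo_faults([7, 0, 1, 2, 0, 3, 0, 4, 2, 3], 3))   # 9 faults
```

FIFO faults on 9 of the 10 references here; only the two hits on page 0 (while it is still resident) are free.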
LRU
Least Recently Used
- On access to a page, timestamp it
- When we need to evict a page, choose the one with the oldest timestamp
- What's the motivation here?
- Is LRU optimal? In practice, LRU is quite good for most programs
- Is it easy to implement?
Not Frequently Used Replacement
- Keep a reference bit and a software counter for each page frame
- At each clock interrupt, the OS adds the reference bit of each frame to its counter and then clears the reference bit
- When we need to evict a page, choose the frame with the lowest counter
- What's the problem?
  - It never forgets and has no sense of time: it is hard to evict a page that was referenced long in the past but is no longer relevant
  - Updating counters is expensive, especially since memory is getting rather large these days
- Can be improved with an aging scheme: counters are shifted right before adding the reference bit, and the reference bit is added to the leftmost bit (rather than to the rightmost one)
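A sketch of one aging tick; the 8-bit counter width and the frame contents are assumptions for illustration:

```python
COUNTER_BITS = 8

def age_tick(counters, ref_bits):
    for frame in counters:
        # Shift right, then add the reference bit at the leftmost position.
        counters[frame] = (counters[frame] >> 1) | (
            ref_bits[frame] << (COUNTER_BITS - 1))
        ref_bits[frame] = 0      # clear the hardware bit for the next tick

counters = {0: 0b00000100, 1: 0b10000000}
ref_bits = {0: 1, 1: 0}          # only frame 0 was referenced this tick
age_tick(counters, ref_bits)
print(counters[0], counters[1])  # 130 64: recent use now outweighs old use
```

This is how aging "forgets": frame 1's single old reference halves in weight every tick, while frame 0's fresh reference lands in the high-order bit.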
Clock (Second-Chance)
- Arrange physical pages in a circle, with a clock hand
- The hardware keeps 1 use bit per frame and sets the use bit on a memory reference to the frame
- If the bit is not set, the frame hasn't been used for a while
- On a page fault:
  1. Advance the clock hand
  2. Check the use bit: if 1, the frame has been used recently, so clear the bit and go on; if 0, this is our victim
- Can we always find a victim?
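A sketch of the victim search (the use-bit values are made up). It also answers the question above: yes, we can always find a victim, because the sweep clears bits as it goes, so within one full revolution some frame must show a 0:

```python
def clock_victim(use_bits, hand):
    # Sweep until a frame with use bit 0 is found.
    while use_bits[hand]:
        use_bits[hand] = 0                  # second chance: clear the bit
        hand = (hand + 1) % len(use_bits)
    return hand                             # this frame is the victim

print(clock_victim([1, 1, 0, 1], hand=0))   # frames 0, 1 get a 2nd chance
```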
Nth-Chance
Similar to Clock, except we maintain a counter as well as a use bit
- On a page fault:
  1. Advance the clock hand
  2. Check the use bit: if 1, clear it and set the counter to 0; if 0, increment the counter; if counter < N, go on; otherwise, this is our victim
- What's the problem if N is too large?
A Different Implementation of Second-Chance
- Always keep a free list of some size n > 0
- On a page fault, if the free list has more than n frames, get a frame from the free list
- If the free list has only n frames, get a frame from the list, then choose a victim from the frames currently being used and put it on the free list
- On a page fault, if the page is still on a frame in the free list, we don't have to read the page back in
- Implemented on the VAX... works well, gets performance close to true LRU
Virtual Memory and Cache Conflicts
- Assume an architecture with direct-mapped caches (first-level caches are often direct-mapped)
- The VM page size partitions a direct-mapped cache into a set of cache-pages
- Page frames are colored (partitioned into equivalence classes), where pages with the same color map to the same cache-page
- Cache conflicts can occur only between pages with the same color; no conflicts can occur within a single page
VM Mapping to Avoid Cache Misses
- Goal: assign active virtual pages to different cache-pages
- A mapping is optimal if it avoids conflict misses
- A mapping that assigns two or more active pages to the same cache-page can induce cache conflict misses
- Example: a program with 4 active virtual pages and a 16 KB direct-mapped cache
  - A 4 KB page size partitions the cache into four cache-pages
  - There are 256 mappings of virtual pages into cache-pages, but only 4! = 24 are optimal
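The counting in this example, spelled out:

```python
import math

cache_size = 16 * 1024
page_size = 4 * 1024
colors = cache_size // page_size        # cache-pages (colors) per cache
active_pages = 4

total = colors ** active_pages          # each page independently mapped
optimal = math.factorial(colors)        # all four pages in distinct colors

print(colors, total, optimal)           # 4 256 24
```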
Page Re-coloring
- With a bit of hardware, we can detect conflicts at runtime: count cache misses on a per-page basis
- Conflicts can be solved by re-mapping one or more of the conflicting virtual pages into new page frames of a different color: re-coloring
- For the limited set of applications that has been studied, only a small performance gain (~10-15%)
Multi-Programming Environment
- Why? Better utilization of resources (CPU, disks, memory, etc.)
- Problems?
  - Mechanism: TLB, caches?
  - How to guarantee fairness?
  - Overcommitment of memory
- What's the potential problem? Each process needs its working set in memory in order to perform well
- If too many processes are running, the system can thrash
Support for Multiple Processes
- More than one address space may be loaded in memory
- A register points to the current page table; the OS updates the register when context switching between threads from different processes
- Most TLBs can cache more than one PT: they store the process id to distinguish between virtual addresses belonging to different processes
- If there are no pids, then the TLB must be flushed at process switch time
Sharing
[Figure: the virtual address spaces of processes p1 and p2, with v-to-p memory mappings sharing a page of physical memory]
Copy-on-Write
[Figure: p1 and p2 sharing pages before and after a write; the written page is copied]
Resident Set Management
- How many pages of a process should be brought in?
- The resident set size can be fixed or variable
- The replacement scope can be local or global
- Most common schemes implemented in OSes:
  - Variable allocation with global scope: simple; the resident set size of some process is modified at replacement time
  - Variable allocation with local scope: more complicated; the resident set size is modified periodically to approximate the working set size
Working Set
- The set of pages that have been referenced in the last window of time
- The size of the working set varies during the execution of the process, depending on the locality of accesses
- If the number of pages allocated to a process covers its working set, then the number of page faults is small
- Schedule a process only if there is enough free memory to load its working set
- How can we determine/approximate the working set size?
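A sketch of the definition: the working set at time t with window w is simply the distinct pages touched in the last w references (the reference string below is made up):

```python
def working_set(refs, t, window):
    # Pages referenced in the last `window` accesses, ending at time t.
    return set(refs[max(0, t - window + 1): t + 1])

refs = [1, 2, 1, 3, 1, 2, 4, 4, 4, 4]     # hypothetical reference string
print(working_set(refs, t=5, window=4))   # {1, 2, 3}
print(working_set(refs, t=9, window=4))   # {4}: tight locality, small set
```

The shrinking set at the end illustrates the point about locality: the same window can cover many pages or just one, depending on the access pattern.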
Page-Fault Frequency
- A counter per page stores the virtual time between page faults
- An upper threshold for the virtual time is defined: if the amount of time since the last page fault is less than the threshold (frequent faults), then the page is added to the resident set
- A lower threshold can be used to discard pages from the resident set: if the time between faults is higher than the lower threshold (infrequent faults), then discard the LRU page of this process
Application-Controlled Paging
- The OS kernel provides the mechanism and implements the global policy: it chooses the process that has to evict a page when a free frame is needed
- The application decides the local replacement: it chooses the particular page that should be evicted
- Basic protocol for an external memory manager:
  - At a page fault, the kernel upcalls the manager, asking it to pick a page to be evicted
  - The manager provides the info, and the kernel re-maps it as appropriate
Summary
- Virtual memory is a way of introducing another level in our memory hierarchy in order to abstract away the amount of memory actually available on a particular system
- This is incredibly important for "ease of programming": imagine having to explicitly check for the size of physical memory and manage it in each and every one of your programs
- Can be implemented using paging (sometimes segmentation)
- A page fault is expensive, so we can't have too many of them; it is important to implement a good page replacement policy
- Have to watch out for thrashing!!
Single Address Space
What's the point? The virtual address space is currently used for three purposes:
- Provide the illusion of a (possibly) larger address space than physical memory
- Provide the illusion of a contiguous address space while allowing for non-contiguous storage in physical memory
- Protection

Protection, provided through private address spaces, makes sharing difficult. Yet there is no inherent reason why protection should be provided by private address spaces.
Private Address Spaces vs. Sharing
- A shared physical page may be mapped to different virtual pages in the sharing processes (BTW, what happens if we want to page the shared page out?)
- This variable mapping makes sharing of pointer-based data structures difficult
- Storing these data structures on disk is also difficult
[Figure: the virtual address spaces of p1 and p2, whose v-to-p mappings place a shared physical page at different virtual addresses]
Private Address Space vs. Sharing (Cont’d)
- Most complex data structures are pointer-based
- Various techniques have been developed to deal with this:
  - Linearization
  - Pointer swizzling: translation of pointers; small/large address spaces
  - OS support for multiple processes mapping the same physical page to the same virtual page
- All of the above techniques are either expensive (linearization and swizzling) or have shortcomings (mapping to the same virtual page requires previous agreement)
Opal: The Basic Idea
- Provide a single virtual address space to all processes in the system, whether on a single machine or a set of machines on a LAN
- ... but won't we run out of address space? Enough for 500 years if allocated at 1 gigabyte per second
- A virtual address means the same thing to all processes
- Share, and save to secondary storage, data structures as is
Opal: The Basic Idea
[Figure: a single address space holding the OS alongside the code and data of P0 and P1, each with its own stack, heap, and globals]
Opal: Basic Mechanisms
- Protection domain (the analog of a process): a container for all resources allocated to a running instantiation of a program
  - Contains the identity of the "user", which can be used for access control (protection)
- The virtual address space is allocated in chunks: segments
  - Segments can be persistent, meaning they might be stored on secondary storage, and so cannot be garbage collected
- Segments (and other resources) are named by capabilities
  - A capability is an "unforgeable" set of rights to access a resource (we'll learn more about this later)
  - Access control + identity: capabilities
  - A process can attach to a segment once it has a capability for it
- Portals: protection domain entry points (RPC)
Opal: Issues
- Naming of segments: capabilities
- Recycling of addresses: reference counting
- Non-contiguity of the address space: segments cannot grow, so one must request segments that are large enough for data structures that assume contiguity
- Private static data: must use register-relative addressing