Posted on 21-Dec-2015
CS 519: Lecture 3
Memory Management
CS 519 Operating System Theory
Memory Management
Requirements for a memory management strategy:
- Consistency: all address spaces look "basically the same"
- Relocation: processes can be loaded at any physical address
- Protection: a process cannot maliciously access memory belonging to another process
- Sharing: may allow sharing of physical memory (must implement access control)
Basic Concepts: Memory Partitioning
- Static: a process must be loaded into a partition of equal or greater size => internal fragmentation
- Dynamic: each process is loaded into a partition of exactly its size => external fragmentation
[Figure: a new job being loaded into statically vs. dynamically partitioned memory]
Basic Concepts: Pure Paging and Segmentation
- Paging: memory is divided into equal-sized frames. Process pages are loaded into frames that are not necessarily contiguous
- Segmentation: each process is divided into variable-sized segments. Process segments are loaded into dynamic partitions that are not necessarily contiguous
- More details in the context of virtual memory
Memory Hierarchy
[Figure: memory hierarchy: registers, cache, memory]
Question: What if we want to support programs that require more memory than what’s available in the system?
Memory Hierarchy

[Figure: memory hierarchy extended: registers, cache, memory, virtual memory]
Answer: Pretend we had something bigger => Virtual Memory
Virtual Memory
Virtual memory is the OS abstraction that gives the programmer the illusion of an address space that may be larger than the physical address space
Virtual memory can be implemented using either paging or segmentation but paging is most common
Virtual memory is motivated by both:
- Convenience: the programmer does not have to deal with the fact that individual machines may have very different amounts of physical memory
- Higher degree of multiprogramming: processes are not loaded as a whole; rather, they are loaded on demand
Virtual Memory: Paging
- A page is a cacheable unit of virtual memory
- The OS controls the mapping between pages of VM and physical memory
- More flexible (at a cost)
[Figure: pages of VM mapped to page frames in memory, analogous to memory blocks cached in the hardware cache]
Virtual Memory: Segmentation
[Figure: segments of Job 0 and Job 1 placed at different locations in memory]
Hardware Translation
- Translation from virtual to physical addresses can be done in software
- However, hardware support is needed to ensure protection and to perform translation faster
- Simplest solution: two registers, base and size
[Figure: the processor sends virtual addresses through a translation box (MMU) to physical memory]
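A minimal sketch of this base-and-size scheme (the register values below are made up for illustration, not from the slides):

```python
# Illustrative base + size translation; BASE and SIZE are hypothetical
# register values loaded by the OS for the current process.
BASE = 0x4000   # where the process was loaded in physical memory
SIZE = 0x1000   # size of the process's address space

def translate(vaddr):
    if vaddr >= SIZE:                     # protection check
        raise MemoryError("protection fault")
    return BASE + vaddr                   # relocation

print(hex(translate(0x0123)))             # -> 0x4123
```

Note how the two requirements fall out directly: the size register gives protection, and the base register gives relocation.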
Segmentation Hardware
- Segments are of variable size
- Translation is done through a set of (base, size, state) registers: the segment table
- State: valid/invalid, access permission, reference, and modified bits
- Segments may be visible to the programmer and can be used as a convenience for organizing programs and data (e.g., a code segment or data segments)
[Figure: virtual address split into segment number and offset; the segment table supplies a base that is added to the offset to form the physical address]
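To make the lookup concrete, here is a sketch with a made-up two-entry segment table; the bases, sizes, and validity bits are assumptions, not values from the lecture:

```python
# Hypothetical segment table: segment number -> (base, size, valid)
SEG_TABLE = {
    0: (0x8000, 0x2000, True),   # e.g., a code segment
    1: (0xC000, 0x0800, True),   # e.g., a data segment
}

def seg_translate(segment, offset):
    base, size, valid = SEG_TABLE[segment]
    if not valid or offset >= size:      # state and bounds check
        raise MemoryError("segmentation fault")
    return base + offset                 # physical = base + offset

print(hex(seg_translate(1, 0x10)))       # -> 0xc010
```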
Paging hardware
- Pages are of fixed size
- The physical memory corresponding to a page is called a page frame
- Translation is done through a page table indexed by page number
- Each entry in the page table contains the physical frame number that the virtual page is mapped to and the state of the page in memory
- State: valid/invalid, access permission, reference, modified, and caching bits
- Paging is transparent to the programmer
[Figure: virtual address split into page number and offset; the page table supplies the frame number, which is combined with the offset to form the physical address]
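A sketch of the page-table lookup; the 4 KB page size and table contents below are illustrative assumptions:

```python
PAGE_SIZE = 4096   # fixed page size: 12 offset bits

# Hypothetical page table: virtual page number -> (frame number, valid)
PAGE_TABLE = {0: (5, True), 1: (2, True), 2: (0, False)}

def page_translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # split page # and offset
    frame, valid = PAGE_TABLE[vpn]
    if not valid:
        raise MemoryError("page fault")      # invalid: page not in memory
    return frame * PAGE_SIZE + offset        # frame # replaces page #
```

Unlike segmentation, the offset needs no bounds check: it can never exceed the fixed page size.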
Combined Paging and Segmentation
- Some MMUs combine paging with segmentation
- Virtual address: segment number + page number + offset
- Segmentation translation is performed first: the segment entry points to a page table for that segment
- The page number is used to index the page table and look up the corresponding page frame number
- Segmentation is not used much anymore, so we'll focus on paging
- UNIX has a simple form of segmentation but does not require any hardware support
Paging: Address Translation
[Figure: the CPU issues virtual address (p, d); p indexes the page table in memory to find frame f; the physical address is (f, d)]
Translation Lookaside Buffers
- Translation on every memory access must be fast
- What to do? Caching, of course...
- Why does caching work? Temporal locality!
- Same as a normal memory cache: the cache is smaller, so we can spend more $$ to make it faster
- The cache for page table entries is called the Translation Lookaside Buffer (TLB)
- Typically fully associative, with no more than 64 entries
- On every memory access, we look for the page-to-frame mapping in the TLB
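A toy model of the lookup order (TLB first, page table only on a miss); the two-entry capacity and table contents are made up, and eviction here just drops an arbitrary entry:

```python
PAGE_SIZE = 4096
PAGE_TABLE = {0: 5, 1: 2, 7: 9}   # vpn -> frame (all valid, for simplicity)
TLB = {}                          # the small, fast cache: vpn -> frame
TLB_CAPACITY = 2                  # real TLBs: up to ~64 entries

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in TLB:                        # TLB miss
        if len(TLB) >= TLB_CAPACITY:          # full: evict an entry
            TLB.pop(next(iter(TLB)))          # (replacement policy?)
        TLB[vpn] = PAGE_TABLE[vpn]            # bring in from page table
    return TLB[vpn] * PAGE_SIZE + offset      # hit path: no PT access

print(hex(translate(0x1004)))                 # -> 0x2004 (vpn 1 -> frame 2)
```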
Paging: Address Translation
[Figure: the CPU issues virtual address (p, d); the TLB is checked for the p-to-f mapping before the page table; the physical address is (f, d)]
TLB Miss
- What if the TLB does not contain the right PT entry? TLB miss
- Evict an existing entry if the TLB does not have any free ones; replacement policy?
- Bring in the missing entry from the page table
- TLB misses can be handled in hardware or software
- Software handling allows the application to assist in replacement decisions
Where to Store Address Space?
Virtual address space may be larger than physical memory
Where do we keep it? Where do we keep the page table?
Where to Store Address Space?
On the next device down our storage hierarchy, of course …
[Figure: VM pages backed by disk, the next level below memory in the hierarchy]
Where to Store Page Table?
In memory, of course …
[Figure: physical memory holding the OS, a process's code, globals, stack, and heap, and the P0 and P1 page tables]
- Interestingly, we use memory to "enlarge" the view of memory, leaving LESS physical memory
- This kind of overhead is common
- Gotta know what the right trade-off is:
  - Have to understand common application characteristics
  - Have to be common enough!
Page table structure
The page table can become huge. What to do?
- Two-level PT: saves memory but requires two lookups per access
- Page the page tables
- Inverted page tables (one entry per page frame in physical memory): translation through hash tables
[Figure: a master PT pointing to second-level PTs; the kernel PT is non-pageable, while the P0 and P1 PTs are pageable within the OS segment]
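A sketch of the two lookups, assuming the classic 32-bit 10/10/12 address split; the table contents are invented:

```python
# Master page table: top 10 bits -> second-level table.
# Second-level table: middle 10 bits -> frame number.
MASTER = {1: {2: 42}}    # hypothetical: one 2nd-level table, one mapping

def two_level_translate(vaddr):
    top = (vaddr >> 22) & 0x3FF      # lookup 1: index the master PT
    mid = (vaddr >> 12) & 0x3FF      # lookup 2: index the 2nd-level PT
    off = vaddr & 0xFFF
    second_level = MASTER[top]       # a missing entry here means the
    frame = second_level[mid]        # 2nd-level table itself is absent
    return (frame << 12) | off
```

The memory saving comes from sparseness: second-level tables for unused regions of the address space are simply never allocated.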
Demand Paging
- To start a process (program), just load the code page where the process will start executing
- As the process references memory (instructions or data) outside of the loaded pages, bring them in as necessary
- How do we represent the fact that a page of VM is not yet in memory?
[Figure: demand paging: VM pages A, B, C; the page table marks A as valid (in memory) while B and C are marked invalid (still on disk)]
Page Fault
What happens when a process references a page marked as invalid in the page table?
- Page fault trap
- Check that the reference is valid
- Find a free memory frame
- Read the desired page from disk
- Change the valid bit of the page to v
- Restart the instruction that was interrupted by the trap

Is it easy to restart an instruction? What happens if there is no free frame?
Page Fault (Cont’d)
So, what can happen on a memory access?
1. TLB miss => read page table entry
2. TLB miss => read kernel page table entry
3. Page fault for the necessary page of the process page table
4. All frames are used => need to evict a page => modify a process page table entry
   1. TLB miss => read kernel page table entry
   2. Page fault for the necessary page of the process page table
   3. Go back
5. Read in the needed page, modify the page table entry, fill the TLB
Cost of Handling a Page Fault
- Trap, check page table, find a free memory frame (or find a victim)... about 200-600 µs
- Disk seek and read... about 10 ms
- Memory access... about 100 ns
- A page fault degrades performance by a factor of ~100,000!!!!!
- And this doesn't even count all the additional things that can happen along the way
- Better not have too many page faults!
- If we want no more than 10% degradation, we can have only 1 page fault for every 1,000,000 memory accesses
- The OS must do a great job of managing the movement of data between secondary storage and main memory
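The arithmetic behind these numbers, using the costs quoted on this slide:

```python
mem_access_ns = 100                 # memory access ~ 100 ns
fault_cost_ns = 10 * 1000 * 1000    # disk seek + read ~ 10 ms

# One fault costs as much as ~100,000 ordinary memory accesses.
slowdown = fault_cost_ns // mem_access_ns
print(slowdown)                     # 100000

# For <= 10% degradation, a fault may add at most 10 ns per access on
# average, so we need at least fault_cost / 10 ns accesses per fault.
accesses_per_fault = fault_cost_ns // (mem_access_ns // 10)
print(accesses_per_fault)           # 1000000
```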
Page Replacement
What if there's no free frame left on a page fault? Free a frame that's currently being used:
1. Select the frame to be replaced (the victim)
2. Write the victim back to disk
3. Change the page table to reflect that the victim is now invalid
4. Read the desired page into the newly freed frame
5. Change the page table to reflect that the new page is now valid
6. Restart the faulting instruction

Optimization: we do not need to write the victim back if it has not been modified (needs a dirty bit per page).
Page Replacement
- We are highly motivated to find a good replacement policy
- That is, when evicting a page, how do we choose the best victim in order to minimize the page fault rate?
- Is there an optimal replacement algorithm? If yes, what is it?
- Let's look at an example: suppose we have 3 memory frames and are running a program that has the following reference pattern: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3
Page Replacement
- Suppose we know the access pattern in advance: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3
- The optimal algorithm is to replace the page that will not be used for the longest period of time
- What's the problem with this algorithm?
- Realistic policies try to predict future behavior on the basis of past behavior
- Works because of locality
FIFO
First-in, First-out
- Be fair: let every page live in memory for about the same amount of time, then toss it
- What's the problem? Is this compatible with what we know about the behavior of programs?
- How does it do on our example? 7, 0, 1, 2, 0, 3, 0, 4, 2, 3
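A small simulation answering that question for the reference string above with 3 frames:

```python
from collections import deque

def fifo_faults(refs, nframes):
    frames, order, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:                  # page fault
            faults += 1
            if len(frames) == nframes:
                frames.remove(order.popleft())  # evict the oldest page
            frames.add(page)
            order.append(page)
    return faults

print(fifo_faults([7, 0, 1, 2, 0, 3, 0, 4, 2, 3], 3))   # 9 faults
```

FIFO faults on 9 of the 10 references here; only the two hits on page 0 (while it is still resident) are free.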
LRU
Least Recently Used
- On access to a page, timestamp it
- When we need to evict a page, choose the one with the oldest timestamp
- What's the motivation here?
- Is LRU optimal? In practice, LRU is quite good for most programs
- Is it easy to implement?
Not Frequently Used Replacement
- Keep a reference bit and a software counter for each page frame
- At each clock interrupt, the OS adds the reference bit of each frame to its counter and then clears the reference bit
- When we need to evict a page, choose the frame with the lowest counter
- What's the problem?
  - It never forgets and has no sense of time: it is hard to evict a page that was referenced long in the past but is no longer relevant
  - Updating counters is expensive, especially since memory is getting rather large these days
- Can be improved with an aging scheme: counters are shifted right before adding the reference bit, and the reference bit is added to the leftmost bit (rather than to the rightmost one)
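A sketch of one aging tick; the 8-bit counter width and the frame contents are assumptions for illustration:

```python
COUNTER_BITS = 8

def age_tick(counters, ref_bits):
    for frame in counters:
        # Shift right, then add the reference bit at the leftmost position.
        counters[frame] = (counters[frame] >> 1) | (
            ref_bits[frame] << (COUNTER_BITS - 1))
        ref_bits[frame] = 0      # clear the hardware bit for the next tick

counters = {0: 0b00000100, 1: 0b10000000}
ref_bits = {0: 1, 1: 0}          # only frame 0 was referenced this tick
age_tick(counters, ref_bits)
print(counters[0], counters[1])  # 130 64: recent use now outweighs old use
```

This is how aging "forgets": frame 1's single old reference halves in weight every tick, while frame 0's fresh reference lands in the high-order bit.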
Clock (Second-Chance)
- Arrange physical pages in a circle, with a clock hand
- The hardware keeps 1 use bit per frame and sets the use bit on a memory reference to the frame
- If the bit is not set, the frame hasn't been used for a while
- On a page fault:
  1. Advance the clock hand
  2. Check the use bit: if 1, the frame has been used recently, so clear the bit and go on; if 0, this is our victim
- Can we always find a victim?
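A sketch of the victim search (the use-bit values are made up). It also answers the question above: yes, we can always find a victim, because the sweep clears bits as it goes, so within one full revolution some frame must show a 0:

```python
def clock_victim(use_bits, hand):
    # Sweep until a frame with use bit 0 is found.
    while use_bits[hand]:
        use_bits[hand] = 0                  # second chance: clear the bit
        hand = (hand + 1) % len(use_bits)
    return hand                             # this frame is the victim

print(clock_victim([1, 1, 0, 1], hand=0))   # frames 0, 1 get a 2nd chance
```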
Nth-Chance
Similar to Clock, except we maintain a counter as well as a use bit
- On a page fault:
  1. Advance the clock hand
  2. Check the use bit: if 1, clear it and set the counter to 0; if 0, increment the counter; if counter < N, go on; otherwise, this is our victim
- What's the problem if N is too large?
A Different Implementation of Second-Chance
- Always keep a free list of some size n > 0
- On a page fault, if the free list has more than n frames, get a frame from the free list
- If the free list has only n frames, get a frame from the list, then choose a victim from the frames currently being used and put it on the free list
- On a page fault, if the page is still on a frame in the free list, we don't have to read the page back in
- Implemented on the VAX... works well, gets performance close to true LRU
Virtual Memory and Cache Conflicts
- Assume an architecture with direct-mapped caches (first-level caches are often direct-mapped)
- The VM page size partitions a direct-mapped cache into a set of cache-pages
- Page frames are colored (partitioned into equivalence classes), where pages with the same color map to the same cache-page
- Cache conflicts can occur only between pages with the same color; no conflicts can occur within a single page
VM Mapping to Avoid Cache Misses
- Goal: assign active virtual pages to different cache-pages
- A mapping is optimal if it avoids conflict misses
- A mapping that assigns two or more active pages to the same cache-page can induce cache conflict misses
- Example: a program with 4 active virtual pages and a 16 KB direct-mapped cache
  - A 4 KB page size partitions the cache into four cache-pages
  - There are 256 mappings of virtual pages into cache-pages, but only 4! = 24 are optimal
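The counting in this example, spelled out:

```python
import math

cache_size = 16 * 1024
page_size = 4 * 1024
colors = cache_size // page_size        # cache-pages (colors) per cache
active_pages = 4

total = colors ** active_pages          # each page independently mapped
optimal = math.factorial(colors)        # all four pages in distinct colors

print(colors, total, optimal)           # 4 256 24
```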
Page Re-coloring
- With a bit of hardware, we can detect conflicts at runtime: count cache misses on a per-page basis
- Conflicts can be solved by re-mapping one or more of the conflicting virtual pages into new page frames of a different color: re-coloring
- For the limited set of applications that has been studied, only a small performance gain (~10-15%)
Multi-Programming Environment
- Why? Better utilization of resources (CPU, disks, memory, etc.)
- Problems?
  - Mechanism: TLB, caches?
  - How to guarantee fairness?
  - Overcommitment of memory
- What's the potential problem? Each process needs its working set in memory in order to perform well
- If too many processes are running, the system can thrash
Support for Multiple Processes
- More than one address space may be loaded in memory
- A register points to the current page table; the OS updates the register when context switching between threads from different processes
- Most TLBs can cache more than one PT: they store the process id to distinguish between virtual addresses belonging to different processes
- If there are no pids, then the TLB must be flushed at process switch time
Sharing
[Figure: the virtual address spaces of processes p1 and p2, with v-to-p memory mappings sharing a page of physical memory]
Copy-on-Write
[Figure: p1 and p2 sharing pages before and after a write; the written page is copied]
Resident Set Management
- How many pages of a process should be brought in?
- The resident set size can be fixed or variable
- The replacement scope can be local or global
- Most common schemes implemented in OSes:
  - Variable allocation with global scope: simple; the resident set size of some process is modified at replacement time
  - Variable allocation with local scope: more complicated; the resident set size is modified periodically to approximate the working set size
Working Set
- The set of pages that have been referenced in the last window of time
- The size of the working set varies during the execution of the process, depending on the locality of accesses
- If the number of pages allocated to a process covers its working set, then the number of page faults is small
- Schedule a process only if there is enough free memory to load its working set
- How can we determine/approximate the working set size?
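A sketch of the definition: the working set at time t with window w is simply the distinct pages touched in the last w references (the reference string below is made up):

```python
def working_set(refs, t, window):
    # Pages referenced in the last `window` accesses, ending at time t.
    return set(refs[max(0, t - window + 1): t + 1])

refs = [1, 2, 1, 3, 1, 2, 4, 4, 4, 4]     # hypothetical reference string
print(working_set(refs, t=5, window=4))   # {1, 2, 3}
print(working_set(refs, t=9, window=4))   # {4}: tight locality, small set
```

The shrinking set at the end illustrates the point about locality: the same window can cover many pages or just one, depending on the access pattern.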
Page-Fault Frequency
- A counter per page stores the virtual time between page faults
- An upper threshold for the virtual time is defined: if the amount of time since the last page fault is less than the threshold (frequent faults), then the page is added to the resident set
- A lower threshold can be used to discard pages from the resident set: if the time between faults is higher than the lower threshold (infrequent faults), then discard the LRU page of this process
Application-Controlled Paging
- The OS kernel provides the mechanism and implements the global policy: it chooses the process that has to evict a page when a free frame is needed
- The application decides the local replacement: it chooses the particular page that should be evicted
- Basic protocol for an external memory manager:
  - At a page fault, the kernel upcalls the manager, asking it to pick a page to be evicted
  - The manager provides the info, and the kernel re-maps it as appropriate
Summary
- Virtual memory is a way of introducing another level in our memory hierarchy in order to abstract away the amount of memory actually available on a particular system
- This is incredibly important for "ease of programming": imagine having to explicitly check for the size of physical memory and manage it in each and every one of your programs
- Can be implemented using paging (sometimes segmentation)
- A page fault is expensive, so we can't have too many of them; it is important to implement a good page replacement policy
- Have to watch out for thrashing!!
Single Address Space
What's the point? The virtual address space is currently used for three purposes:
- Provide the illusion of a (possibly) larger address space than physical memory
- Provide the illusion of a contiguous address space while allowing for non-contiguous storage in physical memory
- Protection

Protection, provided through private address spaces, makes sharing difficult. Yet there is no inherent reason why protection should be provided by private address spaces.
Private Address Spaces vs. Sharing
- A shared physical page may be mapped to different virtual pages in the sharing processes (BTW, what happens if we want to page the shared page out?)
- This variable mapping makes sharing of pointer-based data structures difficult
- Storing these data structures on disk is also difficult
[Figure: the virtual address spaces of p1 and p2, whose v-to-p mappings place a shared physical page at different virtual addresses]
Private Address Space vs. Sharing (Cont’d)
- Most complex data structures are pointer-based
- Various techniques have been developed to deal with this:
  - Linearization
  - Pointer swizzling: translation of pointers; small/large address spaces
  - OS support for multiple processes mapping the same physical page to the same virtual page
- All of the above techniques are either expensive (linearization and swizzling) or have shortcomings (mapping to the same virtual page requires previous agreement)
Opal: The Basic Idea
- Provide a single virtual address space to all processes in the system, whether on a single machine or a set of machines on a LAN
- ... but won't we run out of address space? Enough for 500 years if allocated at 1 gigabyte per second
- A virtual address means the same thing to all processes
- Share, and save to secondary storage, data structures as is
Opal: The Basic Idea
[Figure: a single address space holding the OS alongside the code and data of P0 and P1, each with its own stack, heap, and globals]
Opal: Basic Mechanisms
- Protection domain (the analog of a process): a container for all resources allocated to a running instantiation of a program
  - Contains the identity of the "user", which can be used for access control (protection)
- The virtual address space is allocated in chunks: segments
  - Segments can be persistent, meaning they might be stored on secondary storage, and so cannot be garbage collected
- Segments (and other resources) are named by capabilities
  - A capability is an "unforgeable" set of rights to access a resource (we'll learn more about this later)
  - Access control + identity: capabilities
  - A process can attach to a segment once it has a capability for it
- Portals: protection domain entry points (RPC)
Opal: Issues
- Naming of segments: capabilities
- Recycling of addresses: reference counting
- Non-contiguity of the address space: segments cannot grow, so one must request segments that are large enough for data structures that assume contiguity
- Private static data: must use register-relative addressing