CS 5460: Operating Systems
Lecture 16: Page Replacement (Ch. 9)
Last Time: Demand Paging

Key idea: RAM is used as a cache for disk
– Don't give a process a page of RAM until it is needed
– When running short on RAM, take pages away from processes
– This only works if accesses to memory pages have high temporal locality
  » Why don't we care about spatial locality?

Three basic kinds of page table entries
– Valid mapping – the OS is not involved; translation performed entirely by the CPU
– Invalid mapping – trap, then the kernel does something special, such as killing the process
– Valid but not present – trap and do demand paging

Demand paging makes the exec() system call fast
Timeline of a Page Fault

1. Trap to operating system
2. Save state in PCB
3. Vector to page fault handler
4. If invalid, send SIGSEGV
5. If valid, find or create a free page
   a. Possibly involves a disk write
6. Issue disk read for page
   a. Wait until request is queued at disk controller
   b. Wait for seek/rotational latency
   c. Wait for data transfer (DMA)
   d. Wait for completion interrupt
7. (Optional) Schedule another process while waiting
8. Take disk interrupt
9. Update page table
10. Add process to run queue
11. Wait for process to be scheduled next
12. Restore state from PCB
13. Return from OS
14. Re-execute faulting instruction
Effective Access Times

What is the average access latency?
– L1 cache: 2 cycles
– L2 cache: 10 cycles
– Main memory: 150 cycles
– Disk: 10 ms → 30,000,000 cycles on a 3.0 GHz processor
– Assume accesses have the following characteristics:
  » 98% handled by L1 cache
  » 1% handled by L2 cache
  » 0.99% handled by DRAM
  » 0.01% cause a page fault
– Average access latency:
  » (0.98 × 2) + (0.01 × 10) + (0.0099 × 150) + (0.0001 × 30,000,000)
    = 1.96 + 0.1 + 1.485 + 3,000 ≈ 3,000 cycles per access

Moral: need LOW fault rates to sustain performance!
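The slide's weighted-average calculation can be checked directly. This sketch just reproduces the arithmetic above; the 30,000,000-cycle disk cost comes from 0.010 s × 3×10⁹ cycles/s as assumed on the slide.

```python
# Effective access time from the slide's numbers: 10 ms on a 3.0 GHz
# processor costs 0.010 * 3e9 = 30,000,000 cycles.
levels = [  # (fraction of accesses, cost in cycles)
    (0.98,   2),           # L1 cache hit
    (0.01,   10),          # L2 cache hit
    (0.0099, 150),         # DRAM access
    (0.0001, 30_000_000),  # page fault serviced from disk
]
eat = sum(frac * cost for frac, cost in levels)
print(f"effective access time: {eat:.3f} cycles")  # ≈ 3003.545 cycles
```

Note how the 0.01% of accesses that fault contribute 3,000 of the ~3,003 cycles: the disk term completely dominates.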
Issues in Demand Paging

Page selection policy
– When do we load a page?

Page replacement policy
– What page(s) do we swap to disk to make room for new pages?
– When do we swap pages out to disk?

How do we handle thrashing?
Page Selection Policy

Demand paging:
– Load page in response to access (page fault)
– Predominant selection policy

Pre-paging (prefetching):
– Predict what pages will be accessed in the near future
– Prefetch pages in advance of access
– Problems:
  » Hard to predict accurately (trace cache)
  » Mispredictions can cause useful pages to be replaced

Overlays:
– Application controls when pages are loaded/replaced
– Only really relevant now for embedded/real-time systems
Page Replacement Policies

Optimal
– Throw out the page used farthest in the future

Random
– Works surprisingly well

FIFO (first in, first out)
– Throw out the oldest page

LRU (least recently used)
– Throw out the page not used in the longest time

NRU (not recently used)
– Approximation to LRU → do not throw out recently used pages

How should we evaluate page replacement policies?
FIFO Page Replacement

FIFO: replace the oldest page (first loaded)

Example:
– Memory system with three page frames → all initially free
– Reference string: A B C A B D A D B C B

Ref:     A  B  C  A  B  D  A  D  B  C  B
Frame1:  A  A  A  A  A  D  D  D  D  C  C
Frame2:  -  B  B  B  B  B  A  A  A  A  A
Frame3:  -  -  C  C  C  C  C  C  B  B  B
Fault?   A  B  C  √  √  D  A  √  B  C  √    (√ = hit)

Result: 7 page faults
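The FIFO trace above is easy to check with a few lines of simulation (a sketch; function names are illustrative):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Simulate FIFO replacement; return the number of page faults."""
    frames = deque()          # front = oldest page
    faults = 0
    for page in refs:
        if page in frames:
            continue          # hit
        faults += 1
        if len(frames) == nframes:
            frames.popleft()  # evict the oldest page
        frames.append(page)
    return faults

print(fifo_faults("ABCABDADBCB", 3))  # 7, matching the slide
```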
Optimal Page Replacement

Optimal: replace the page used farthest in the future

Example:
– Memory system with three page frames → all initially free
– Reference string: A B C A B D A D B C B

Ref:     A  B  C  A  B  D  A  D  B  C  B
Frame1:  A  A  A  A  A  A  A  A  A  C  C
Frame2:  -  B  B  B  B  B  B  B  B  B  B
Frame3:  -  -  C  C  C  D  D  D  D  D  D
Fault?   A  B  C  √  √  D  √  √  √  C  √    (√ = hit)

Result: 5 page faults
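Optimal replacement needs to look at the rest of the reference string, which only a simulator (or an oracle) can do. A minimal sketch:

```python
def opt_faults(refs, nframes):
    """Belady's optimal (clairvoyant) replacement; return the fault count."""
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            # Evict the resident page whose next use is farthest away
            # (or that is never used again).
            def next_use(p):
                rest = refs[i + 1:]
                return rest.index(p) if p in rest else len(refs)
            frames.remove(max(frames, key=next_use))
        frames.append(page)
    return faults

print(opt_faults("ABCABDADBCB", 3))  # 5, matching the slide
```

Optimal is unimplementable in a real kernel (it requires knowing the future), but it gives the lower bound that real policies are measured against.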
LRU Page Replacement

LRU: replace the least recently used page

Example:
– Memory system with three page frames → all initially free
– Reference string: A B C A B D A D B C B

Ref:     A  B  C  A  B  D  A  D  B  C  B
Frame1:  A  A  A  A  A  A  A  A  A  C  C
Frame2:  -  B  B  B  B  B  B  B  B  B  B
Frame3:  -  -  C  C  C  D  D  D  D  D  D
Fault?   A  B  C  √  √  D  √  √  √  C  √    (√ = hit)

Result: 5 page faults

How would you implement…
– Random
– FIFO
– Optimal
– LRU
– NRU

Which ones are efficient?
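One way to implement LRU in software is an ordered map used as a recency list (a sketch; real kernels cannot afford this per-access bookkeeping, which is exactly the point of the NRU slides that follow):

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    """LRU replacement: the ordered dict keeps the least recently used
    page at the front and the most recently used page at the end."""
    frames = OrderedDict()
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)      # hit: mark as most recent
            continue
        faults += 1
        if len(frames) == nframes:
            frames.popitem(last=False)    # evict least recently used
        frames[page] = None
    return faults

print(lru_faults("ABCABDADBCB", 3))  # 5, matching the slide
```

Random and FIFO need no per-access work at all; LRU as written needs an update on every reference, which is why hardware-assisted approximations win in practice.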
NRU Page Replacement

Observations:
– LRU is a pretty good approximation of OPT
  » Past behavior is often a reasonable predictor of future behavior
  » Captures "phase" behavior in many (but not all) applications
– Implementing true LRU requires far too much overhead
  » Logically, we would need to update the "sort order" on every memory access

How can we approximate LRU efficiently?
– Exploit the "referenced" bit in modern page tables
– Only replace pages that have not been recently referenced (NRU)
– Periodically clear referenced bits → enforces "recently"
  » Optionally: maintain a recent history of referenced bits per page
  » Example: 10010101 → records whether the page was referenced in each of the last 8 sweeps
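The per-page history byte can be maintained by shifting in the referenced bit on every sweep (often called "aging"). This is a sketch assuming an 8-bit history per page and a hypothetical read-and-clear of the hardware referenced bit:

```python
def sweep(history, referenced):
    """Shift each page's history right and put this sweep's referenced
    bit in the most significant position; a larger value means the page
    was used more recently."""
    for page in history:
        bit = 0x80 if referenced.get(page) else 0
        history[page] = bit | (history[page] >> 1)
        referenced[page] = False   # clear the bit for the next sweep

history = {"A": 0, "B": 0}
sweep(history, {"A": True, "B": False})   # A referenced this sweep
sweep(history, {"A": False, "B": True})   # B referenced this sweep
print(f"{history['A']:08b} {history['B']:08b}")  # 01000000 10000000
```

Comparing history bytes as integers then picks a victim: B (10000000) was used more recently than A (01000000), so A would be replaced first.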
NRU Page Replacement (cont.)

This is a modified version of FIFO: check whether the page at the head of the FIFO queue has its referenced bit set
– Yes? Clear the bit, put the page at the back of the queue, and look at the next page
– No? Select this page for replacement

Is this fast? What is the worst case?

This is called the "second chance" algorithm
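The second chance loop described above can be sketched directly (names are illustrative):

```python
from collections import deque

def second_chance_evict(queue, referenced):
    """Second chance: examine the FIFO head; a page with its referenced
    bit set gets the bit cleared and a second chance at the tail.
    Worst case: every page is referenced, so we cycle through the whole
    queue once and the choice degenerates to plain FIFO."""
    while True:
        page = queue.popleft()
        if referenced.get(page):
            referenced[page] = False   # clear bit, give a second chance
            queue.append(page)
        else:
            return page                # victim

queue = deque(["A", "B", "C"])         # A is oldest
referenced = {"A": True, "B": False, "C": True}
victim = second_chance_evict(queue, referenced)
print(victim)  # B: A was referenced and moved to the back, B was not
```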
Clock Algorithm

This is basically an optimized version of second chance

Maintains a "next" pointer
– Sweep starts there and continues until done
– Pointer persists across invocations

While (need more pages):
– Check the referenced bit
– If 0 → add page to free pool
– If 1 → reset the bit

Between sweeps:
– If a process accesses a page, its referenced bit gets set
– The TLB helps here!
[Figure: page frames arranged in a clock; the "next" hand sweeps around, freeing pages whose referenced bit is 0 and clearing bits that are 1]
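The sweep can be sketched over a circular array of frames, with the hand position persisting across calls (a sketch; a real kernel also has to unmap the freed pages):

```python
def clock_sweep(frames, referenced, hand, needed):
    """One sweep of the clock algorithm over a circular list of frames.
    Returns (freed_pages, new_hand); the caller keeps 'hand' so the
    next invocation resumes where this one stopped."""
    freed = []
    while len(freed) < needed:
        page = frames[hand]
        if referenced[page]:
            referenced[page] = False   # give the page another pass
        else:
            freed.append(page)         # bit is 0: reclaim this frame
        hand = (hand + 1) % len(frames)
    return freed, hand

frames = ["A", "B", "C", "D"]
referenced = {"A": True, "B": False, "C": True, "D": False}
freed, hand = clock_sweep(frames, referenced, hand=0, needed=2)
print(freed, hand)  # ['B', 'D'] 0 -- A and C got their bits cleared instead
```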
BSD Page Replacement (NRU)

Goal: maintain a pool of free pages at all times
– Avoid waiting for the replacement algorithm/disk write during a page fault
– Typical goal: ~5% of main memory in the free page pool

Sweeper process
– Privileged (kernel) process
– Scheduled whenever the free page pool drops below a threshold
  » Low watermark (start sweeping) vs. high watermark (goal)
– Sweeps through the list of allocated pages doing second chance

Nth chance: like second chance, but…
– If the page is referenced, clear its counter and move on
– If the page is not referenced, increment its counter
  » If the new counter == N, select this page
  » Otherwise move on
– If N is big, we have a really good LRU approximation
  » But we spend a lot of time looking for pages
– If N == 1 we have second chance
– If N == 0 we have FIFO

Lots more work exists on page replacement…
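The Nth chance counter logic above can be sketched as follows (illustrative names; a real sweeper interleaves this with other work rather than spinning):

```python
def nth_chance_evict(pages, referenced, counters, n):
    """Nth chance sketch: cycle over pages; a referenced page has its
    counter cleared, an unreferenced page's counter is incremented, and
    a page is selected once its counter reaches n."""
    while True:
        for page in pages:
            if referenced.get(page):
                referenced[page] = False
                counters[page] = 0          # referenced: reset, move on
            else:
                counters[page] += 1
                if counters[page] >= n:     # unreferenced n times: victim
                    return page

pages = ["A", "B"]
victim = nth_chance_evict(pages, {"A": True, "B": True},
                          {"A": 0, "B": 0}, n=2)
print(victim)  # A: both bits cleared on pass 1, A's counter reaches 2 first
```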
Belady's Anomaly

For some replacement algorithms…
– MORE page frames in main memory can lead to…
– MORE page faults!

This phenomenon is known as "Belady's Anomaly"

Example:
– FIFO replacement policy
– Reference string: A B C D A B E A B C D E
– Three frames → 9 faults
– Four frames → 10 faults!

Interesting, since we would expect that adding more memory always helps
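The anomaly is easy to reproduce with a FIFO simulator (same sketch as on the FIFO slide, repeated here so it runs standalone):

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Simulate FIFO replacement; return the number of page faults."""
    frames, faults = deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.popleft()   # evict the oldest page
            frames.append(page)
    return faults

refs = "ABCDABEABCDE"
print(fifo_faults(refs, 3), fifo_faults(refs, 4))  # 9 10
```

LRU and optimal are "stack algorithms" (the pages resident with k frames are always a subset of those resident with k+1), so they never exhibit the anomaly; FIFO is not.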
Thrashing

Working set: the collection of memory currently being used by a process

If all working sets do not fit in memory → thrashing
– One "hot" page replaces another
– The percentage of accesses that generate page faults skyrockets

Typical solution: "swap out" entire processes
– The scheduler needs to get involved
– Two-level scheduling policy → runnable vs. memory-available
– Need to be fair
– Invoked when the page fault rate exceeds some bound

When swap devices are full, Linux invokes the "OOM killer"
Who Should We Compete Against for Memory?

Global replacement:
– All pages for all processes come from a single shared pool
– Advantage: very flexible → can globally "optimize" memory usage
– Disadvantages: thrashing is more likely, and it can often do just the wrong thing (e.g., replace the pages of a process about to be scheduled)
– Many OSes, including Linux, do this

Per-process replacement:
– Each process has a private pool of pages → competes with itself
– Alleviates inter-process problems, but not every process is equal
– Need to know the working set size for each process
– The Windows kernel does this
  » There are Win32 API calls to set a process's minimum and maximum working set sizes
Important From Today

Demand paging
– What is it? What is the "effective access time"?

Page replacement policies
– Random, FIFO, Optimal, LRU, NRU, …
– Belady's anomaly

Thrashing

Global vs. local allocation
– Concept of a process's "working set"