Memory Management – UTRGV faculty · faculty.utrgv.edu/david.egle/csci4334/ch11… · PPT file
TRANSCRIPT
MEMORY MANAGEMENT
2
Basic OS Organization
[Diagram: the Operating System (Process, Thread & Resource Manager; Memory Manager; Device Manager; File Manager) sits above the computer hardware (processors, main memory, devices)]
3
The Basic Memory Hierarchy
[Diagram: CPU registers at the top, primary (executable) memory, e.g. RAM, below, and secondary memory, e.g. disk or tape, at the bottom; more frequently used information is kept higher in the hierarchy, less frequently used information lower. von Neumann architecture]
4
Memory System
Primary memory
Holds programs and data while they are being used by the CPU
Referenced by byte; fast access; volatile
Secondary memory
Collection of storage devices
Referenced by block; slow access; nonvolatile
5
Primary & Secondary Memory
[Diagram: the CPU can load/store directly from primary (executable) memory, e.g. RAM, where the control unit executes code; this is transient storage. Secondary memory, e.g. disk or tape, is accessed using I/O operations and is persistent storage]
Information can be loaded statically or dynamically
6
Classical Memory Manager Tasks
Memory management technology has evolved:
Early multiprogramming systems: a resource manager for space-multiplexed primary memory
As the popularity of multiprogramming grew: provide robust isolation mechanisms
Still later: provide mechanisms for shared memory
7
Contemporary Memory Manager
Performs the classic functions required to manage primary memory
Attempts to use primary memory efficiently:
Keep programs/data in primary memory only while they are being used by the CPU
Store/restore data to secondary memory soon after it has been used or created
Exploits storage hierarchies
Virtual memory manager
8
Requirements on Memory Designs
The primary memory access time must be as small as possible
The perceived primary memory must be as large as possible
The memory system must be cost effective
9
Functions of Memory Manager
Allocate primary memory space to processes
Map the process address space into the allocated portion of primary memory
Minimize access times using a cost-effective amount of primary memory
May use static or dynamic techniques
10
Memory Manager
Only a small number of interface functions is provided, usually calls to:
Request/release primary memory space
Load programs
Share blocks of memory
Provides the following:
Memory abstraction
Allocation/deallocation of memory
Memory isolation
Memory sharing
11
Memory Abstraction
Process address space
Allows a process to use an abstract set of addresses to reference physical primary memory
[Diagram: the process address space is mapped to hardware primary memory; some addresses may be mapped to objects other than memory]
12
Address Space
A program must be brought into memory and placed within a process for it to be executed
A program is a file on disk
The CPU reads instructions from main memory and reads/writes data to main memory
Address binding of instructions and data to memory addresses is determined by the computer architecture
13
Creating an Executable Program
[Diagram: source code is compiled (C) into relocatable object code; link editing combines it with library code and other objects into the process address space on secondary memory; the loader copies it into primary memory]
Compile time: translate elements
Link time: combine elements
Load time: allocate primary memory; adjust addresses in the address space (relocation); copy the address space from secondary to primary memory
14
Bindings
Compiler
Binds static variables to storage locations relative to the start of the data segment
Binds automatic variables to storage locations relative to the bottom of the stack
Linker
Combines data segments and adjusts bindings accordingly
Same for the stack
15
Bindings – cont.
Loader
Binds logical addresses used by the program to physical memory locations (address binding)
This type of binding is called static address binding
The last stage of address binding can be deferred to run time: dynamic address binding
16
Dynamic Memory
Static and automatic variables are assigned addresses in the data or stack segments at compile time
Dynamic memory allocation (e.g., new or malloc) is done at run time
This is not handled by the memory manager; it merely binds parts of the process’s address space to dynamic data structures
The memory manager gets involved if the process runs out of address space
17
Variations in program linking/loading
18
Normal linking and loading
19
Load-time dynamic linking
20
Run-time dynamic linking
21
Data Storage Allocation
Static variables: stored in the program’s data segment
Automatic variables: stored on the stack
Dynamically allocated space (new or malloc): taken from heap storage – no system call
Note: if the heap is exhausted, the kernel memory manager is invoked to get more memory for the process
22
C Style Memory Layout
From low address to high address:
Text segment
Initialized part of the data segment
Uninitialized part of the data segment
Heap storage
Stack segment
Environment variables, …
23
Program and Process Address Spaces
[Diagram: the absolute program address space is mapped into the process address space and then to hardware primary memory; in the 4 GB example shown, addresses 0 to 3 GB form the user process address space and 3 to 4 GB the supervisor process address space]
24
Overview of Memory Management Techniques
Memory allocation strategies
View the process address space and the primary memory as contiguous address spaces
Paging- and segmentation-based techniques
View the process address space and the primary memory as a set of pages/segments
Map an address in process space to a memory address
Virtual memory
Extension of paging/segmentation-based techniques
To run a program, only the currently used pages/segments need to be in primary memory
25
Memory Allocation Strategies
There are two different levels of memory allocation
26
Two levels of memory management
27
Memory Management System Calls
In Unix, the system call is brk
Increases the amount of memory allocated to a process
28
Malloc and New Functions
These are user-level memory allocation functions, not system calls
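To illustrate the distinction, here is a hedged sketch (not how any real allocator is implemented): a user-level allocator grabs a large region via a brk-like call once, then satisfies many malloc-style requests from that region without further system calls. `fake_brk` and the chunk size are invented for the example.

```python
# Sketch of the malloc-vs-brk distinction (illustrative only).
# The "system call" is modeled as a function that extends the process break.

HEAP_SIZE = 0

def fake_brk(increment):
    """Stand-in for the Unix brk/sbrk system call: grow the heap."""
    global HEAP_SIZE
    old_break = HEAP_SIZE
    HEAP_SIZE += increment
    return old_break

class UserLevelAllocator:
    """malloc-style allocator: calls fake_brk only when its pool runs out."""
    CHUNK = 4096  # grab memory from the "kernel" in big chunks

    def __init__(self):
        self.pool_start = 0
        self.pool_free = 0
        self.syscalls = 0

    def malloc(self, n):
        if n > self.pool_free:          # pool exhausted: one "system call"
            self.pool_start = fake_brk(self.CHUNK)
            self.pool_free = self.CHUNK
            self.syscalls += 1
        addr = self.pool_start + (self.CHUNK - self.pool_free)
        self.pool_free -= n             # carve the request out of the pool
        return addr

alloc = UserLevelAllocator()
for _ in range(100):
    alloc.malloc(32)                    # 100 small requests...
print(alloc.syscalls)                   # ...but only 1 call to fake_brk
```

One hundred 32-byte requests fit in a single 4096-byte chunk, so the "kernel" is entered only once.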
29
Memory Management
30
Issues in a memory allocation algorithm
Memory layout/organization: how to divide the memory into blocks for allocation?
Fixed partition method: divide the memory once, before any bytes are allocated
Variable partition method: divide it up as you are allocating the memory
Memory allocation: select which piece of memory to allocate to a request
Memory organization and memory allocation are closely related
It is a very general problem; variations of it occur in many places, for example, disk space management
31
Static Memory Allocation
[Diagram: primary memory holds the Operating System and processes 0–3, with some regions in use and some unused]
Issue: need a mechanism/policy for loading pi’s address space into primary memory
32
Fixed-Partition Memory Allocation
Statically divide the primary memory into fixed-size regions
Regions can have different sizes or the same size
A process/request can be allocated to any region that is large enough
33
Fixed-Partition Memory Allocation – cont.
Advantages:
Easy to implement
Good when the sizes of memory requests are known
Disadvantage:
Cannot handle variable-size requests effectively
Might need to use a large block to satisfy a request for a small size
Internal fragmentation: the difference between the request and the allocated region size; space allocated to a process but not used
It can be significant if the requests vary considerably in size
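A small worked example of internal fragmentation (the region sizes and request size below are invented for illustration):

```python
# Internal fragmentation in a fixed-partition scheme: a request is placed
# in a region that is large enough, and the leftover space inside that
# region is wasted.

regions = [100, 200, 400, 800]   # fixed partition sizes (illustrative)

def internal_fragmentation(request, region_size):
    """Space allocated to the process but not used by it."""
    assert request <= region_size
    return region_size - request

# A 130-unit request cannot use the 100-unit region; the smallest region
# that fits is 200 units, wasting 70 units inside the partition.
fit = min(r for r in regions if r >= 130)
print(fit, internal_fragmentation(130, fit))   # 200 70
```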
34
Fixed-Partition Memory Mechanism
[Diagram: memory holds the Operating System plus regions 0–3 with sizes N0–N3; process pi needs ni units and must be placed in a region whose size is at least ni]
35
Which Free Block to Allocate
How to satisfy a request of size n from a list of free blocks:
First-fit: allocate the first hole that is big enough
Next-fit: resume the search where the previous one ended and choose the next block that is large enough
Best-fit: allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size; produces the smallest leftover hole
Worst-fit: allocate the largest hole; must also search the entire list; produces the largest leftover hole
36
Fixed-Partition Memory -- Best-Fit
[Diagram: pi is placed in the smallest region that holds it, leaving internal fragmentation]
The loader must adjust every address in the absolute module when it is placed in memory
37
Fixed-Partition Memory -- Worst-Fit
[Diagram: pi is placed in the largest region]
38
Fixed-Partition Memory -- First-Fit
[Diagram: pi is placed in the first region large enough to hold it]
39
Fixed-Partition Memory -- Next-Fit
[Diagram: pi is placed in the next suitable region after the previous allocation; pi+1 continues the search from there]
40
Variable Partition Memory Allocation
Grant only the size requested
Example, with 512 bytes total:
allocate(r1, 100), allocate(r2, 200), allocate(r3, 200), free(r2), allocate(r4, 10), free(r1), allocate(r5, 200)
External fragmentation: memory is divided up into small blocks, none of which can satisfy any request
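The request sequence above can be traced with a small simulator; first-fit placement and no hole coalescing are assumptions made for the sketch (the slide does not say which policy is used):

```python
# Trace the slide's request sequence in a 512-byte memory with a simple
# first-fit free list, to watch external fragmentation appear.

MEM = 512
free = [(0, MEM)]          # list of (start, size) holes
allocated = {}

def allocate(name, size):
    for i, (start, hole) in enumerate(free):
        if hole >= size:                       # first fit
            allocated[name] = (start, size)
            if hole == size:
                del free[i]
            else:
                free[i] = (start + size, hole - size)
            return True
    return False                               # no single hole is big enough

def release(name):
    start, size = allocated.pop(name)
    free.append((start, size))                 # (no coalescing, kept simple)
    free.sort()

allocate("r1", 100); allocate("r2", 200); allocate("r3", 200)
release("r2")
allocate("r4", 10)
release("r1")
ok = allocate("r5", 200)
total_free = sum(size for _, size in free)
print(ok, total_free)   # False 302: enough total space, but no hole fits
```

302 bytes are free in total, yet the 200-byte request fails because the free space is split into holes of 100, 190, and 12 bytes: external fragmentation.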
41
Issues in Variable Partition Memory Allocation
Where are the free memory blocks? Keeping track of the memory blocks: list method and bitmap method
Which memory blocks to allocate? There may be multiple free memory blocks that can satisfy a request; which block to use? Fragmentation must be minimized
How to keep track of free and allocated memory blocks?
42
Variable Partition Memory Mechanism
[Diagram: (a) the Operating System plus processes 0–4 fill memory; (b) processes finish and others arrive (processes 0, 6, 2, 5, 4), leaving holes; (c) external fragmentation appears; (d) compaction moves programs in memory to coalesce the holes]
The loader adjusts every address in every absolute module when it is placed in memory
43
Cost of Moving Programs
Consider the instruction load R1, 0x02010
With the program loaded at 0x01000, it assembles to 3F013010; loaded at 0x04000, it must become 3F016010
Must run the loader over the program again!
Compaction requires that a program be moved
Consider dynamic techniques
44
Dynamic Memory Allocation
Could use dynamically allocated memory
A process may want to change the size of its address space:
Smaller: creates an external fragment
Larger: may have to move/relocate the program
Allocate “holes” in memory according to best-/worst-/first-/next-fit
45
Contemporary Memory Allocation
Use some form of variable partitioning
Usually allocate memory in fixed-size blocks (pages)
Simplifies management of the free list
Greatly complicates the binding problem
46
Dynamic Address Space Binding
Recall: in static binding
Symbols are first bound to relative addresses in a relocatable module at compile time
Then to addresses in an absolute module at link time
Then to primary memory addresses at load time
Dynamic binding
Wait to bind absolute program addresses until run time
The simplest mechanism is dynamic relocation
Usually implemented by the processor
47
Dynamic Address Relocation
[Diagram: for load R1, 0x02010 the CPU issues relative address 0x02010; the relocation register (0x10000) is added to it, placing 0x12010 in the MAR]
Program loaded at 0x10000: relocation register = 0x10000
Program loaded at 0x04000: relocation register = 0x04000
We never have to change the load module addresses!
Performed automagically by the processor
48
Dynamic Address Relocation – cont.
The same holds for multiple segment registers
[Diagram: the CPU-generated relative address is added to the code, stack, or data register to form the physical address in the MAR, which accesses primary memory]
49
Runtime Bound Checking
[Diagram: the relative address is compared with the limit register; if it is within bounds it is added to the relocation register to form the address in the MAR, otherwise an interrupt is raised]
Bound checking is inexpensive to add
Provides excellent memory protection
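The relocation-plus-limit scheme can be sketched in a few lines. The relative address and the relocation-register values are taken from the slides; the limit value 0x8000 is an assumption added for the example:

```python
# Dynamic address relocation with a limit register: every relative
# address is bound-checked, then offset by the relocation register.

class MemoryFault(Exception):
    """Stands in for the hardware interrupt on a bounds violation."""
    pass

def translate(relative_addr, relocation_reg, limit_reg):
    """Return the physical address, or raise on an out-of-bounds access."""
    if relative_addr >= limit_reg:        # runtime bound check
        raise MemoryFault(hex(relative_addr))
    return relocation_reg + relative_addr

# Program loaded at 0x10000, relocation register = 0x10000:
print(hex(translate(0x02010, relocation_reg=0x10000, limit_reg=0x8000)))
# -> 0x12010

# The same relative address works unchanged if the program is instead
# loaded at 0x04000 -- the load module is never rewritten:
print(hex(translate(0x02010, relocation_reg=0x04000, limit_reg=0x8000)))
# -> 0x6010
```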
50
Memory Mgmt Strategies
Fixed-partition: used only in batch systems
Variable-partition: used everywhere (except in virtual memory)
Swapping systems
Popularized in timesharing
Rely on dynamic address relocation
Dynamic loading (virtual memory)
Exploits the memory hierarchy
Paging: mainstream in contemporary systems
Shared-memory multiprocessors
51
Swapping
A special case of dynamic memory allocation
Suppose there is high demand for executable memory
An equitable policy might be to time-multiplex processes into the memory (as well as space-multiplex)
This means a process can have its address space unloaded while it still needs memory
Usually this happens only when the process is blocked
52
Swapping – cont.
Objective: optimize system performance by removing a process from memory when it is blocked, allowing that memory to be used by other processes
Blocking may be caused by a request for a resource, or by the memory manager
Swapping only becomes necessary when processes are being denied access to memory
53
Swapping – cont.
[Diagram: the image for pi is swapped out to secondary memory while the image for pj is swapped in]
54
Cost of Swapping
Need to consider the time to copy the execution image from primary to secondary memory, and back; this is the major part of the swap time
In addition, there is the time required by the memory manager, plus the usual context-switching time
55
Swapping Systems
Standard swapping is used in few systems; it requires too much swapping time and provides too little execution time
Most systems do use some modified version of swapping
In UNIX, swapping is normally disabled, but it is enabled if memory usage reaches some threshold; when usage drops below the threshold, swapping is disabled again
56
Virtual Memory
Allows a process to execute when only part of its address space is loaded in primary memory; the rest is in secondary memory
Need to be able to partition the address space into parts that can be loaded into primary memory when needed
57
Virtual Memory – cont.
A characteristic of programs that is very important to the strategy used by virtual memory systems is spatial reference locality
Refers to the implicit partitioning of code and data segments due to the functioning of the program (a portion for initializing data, another for reading input, others for computation, etc.)
Can be used to select which parts of the process should be loaded into primary memory
58
Virtual Memory Barriers
Must be able to treat the address space in parts that correspond to the various localities that will exist during the program’s execution
Must be able to load a part anywhere in physical memory and dynamically bind the addresses appropriately
More on this in the next chapter
59
Shared-memory Multiprocessors
Several processors share an interconnection network to access a set of shared memory modules
Any CPU can read/write any memory unit
[Diagram: multiple CPUs and multiple memory modules connected by an interconnection network]
60
Shared-memory Multiprocessors – cont.
The goal is to use processes or threads to implement units of computation on different processors while sharing information via common primary memory locations
One technique would be to have the address spaces of two processes overlap
Another would split the address space of a process into a private part and a public part
61
Sharing a Portion of the Address Space
[Diagram: the address spaces of process 1 and process 2 both map a shared region of primary memory]
62
Figure 11‑26: Multiple Segments
[Diagram: CPUs executing process 1 and process 2 each use relocation/limit register pairs; primary memory holds a region private to process 1, a shared region, and a region private to process 2]
63
Shared-memory Multiprocessors – cont.
A major problem is synchronization: how can one process detect when the other process has written or read information?
Interprocess communication is needed to handle the synchronization
Another problem is overloading the interconnection network
Cache memories can be used to decrease the load on the network
64
VIRTUAL MEMORY
65
Virtual Memory Manager
Provides an abstraction of physical memory
Creates a virtual address space in secondary memory and then “automatically” determines which part of the address space to load into primary memory at any given time
Allows application programmers to think that they have a very large address space in which to write programs
66
Virtual Memory Organization
[Diagram: the memory image for pi resides in secondary memory; fragments of it are loaded into primary memory]
67
Locality
Programs do not access their address space uniformly; they access the same locations over and over
Spatial locality: processes tend to access locations near those they just accessed
Because of sequential program execution
Because data for a function is grouped together
Temporal locality: processes tend to access the same data over and over again
Because of program loops
Because data is processed repeatedly
68
Spatial Reference Locality
The address space is logically partitioned
Text, data, stack; initialization, main, error handling
Different parts have different reference patterns
[Diagram: the address space for pi, showing initialization code (used once), three main code regions, three error-handling regions, and data & stack; execution time is divided roughly 30%, 20%, 35%, and 15% across the heavily used regions, with under 1% in the error handlers]
69
Virtual Memory
Every process has code and data locality
Dynamically load/unload currently used address space fragments as the process executes
Uses dynamic address relocation/binding
A generalization of base-limit registers
The physical address corresponding to a compile-time address is not bound until run time
70
Virtual Memory – cont.
Since the binding changes with time, use a dynamic virtual address map, Yt
[Diagram: the virtual address space is mapped through Yt]
71
Virtual Memory – cont.
[Diagram: virtual address spaces for pi, pj, and pk reside in secondary memory; the physical address space (addresses 0 to n-1) is in primary memory]
The complete virtual address space is stored in secondary memory
Fragments of the virtual address space are dynamically loaded into primary memory at any given time
Each address space is fragmented
72
Address Translation
Virtual memory systems distinguish among symbolic name, virtual address, and physical address spaces
Need to map symbolic names to virtual addresses, and then to physical addresses
The compiler/assembler and link editor handle the mapping from symbolic names in the name space to virtual addresses
When the program is executed, the virtual addresses are mapped to physical addresses
73
Names, Virtual Addresses & Physical Addresses
[Diagram: the source program (name space) becomes an absolute module (pi’s virtual address space), which becomes an executable image (physical address space)]
Yt: virtual address space → physical address space
74
Address Formation
The translation system creates an address space, but its addresses are virtual instead of physical
A virtual address, x:
Is mapped to physical address y = Yt(x) if x is loaded at physical address y
Is mapped to Ω (the null address) if x is not loaded
The map, Yt, changes as the process executes; it is “time varying”
Yt: virtual address → physical address ∪ {Ω}
75
Translation Process
If Yt(k) = Ω at time t and the process references location k, then:
The virtual memory manager stops the process
The referenced location is loaded at some location (say m)
The manager changes the map so that Yt(k) = m
The manager lets the process continue execution
Note that the referenced element was found to be missing after an instruction started executing; the CPU needs to be able to “back out” of the instruction and re-execute it once the translation mapping is updated
76
Size of Blocks of Memory
The virtual memory system transfers “blocks” of the address space to/from primary memory
Fixed-size blocks: system-defined pages are moved back and forth between primary and secondary memory
Variable-size blocks: programmer-defined segments, corresponding to logical fragments, are the unit of movement
Paging is the commercially dominant form of virtual memory today
77
Paging
A page is a fixed-size block of 2^h virtual addresses
A page frame is a fixed-size block of 2^h physical memory (the same size as a page)
When a virtual address, x, in page i is referenced by the CPU:
If page i is loaded at page frame j, the virtual address is relocated to page frame j
If the page is not loaded, the OS interrupts the process and loads the page into a page frame
78
Practicality of Paging
Paging only works because of locality: at any one point in time programs don’t need most of their pages
Page fault rates must be very, very low for paging to be practical, like one page fault per 100,000 or more memory references
79
Addresses
Suppose there are G = 2^g · 2^h = 2^(g+h) virtual addresses and H = 2^(j+h) physical addresses assigned to a process
Each page/page frame holds 2^h addresses
There are 2^g pages in the virtual address space
2^j page frames are allocated to the process
Rather than mapping individual addresses, Yt maps the 2^g pages to the 2^j page frames
That is, page_frame_j = Yt(page_i)
Address k in page_i corresponds to address k in page_frame_j
80
Page-Based Address Translation
Let N = {d0, d1, …, dn-1} be the pages
Let M = {b0, b1, …, bm-1} be the page frames
A virtual address, i, satisfies 0 ≤ i < G = 2^(g+h)
A physical address is k = U·2^h + V (0 ≤ V < 2^h), where U is the page frame number and V is the line number within the page
Yt: [0 : G-1] → {<U, V>} ∪ {Ω}
Since every page has size c = 2^h:
page number U = floor(i / c)
line number V = i mod c
81
Address Translation (cont.)
[Diagram: the CPU issues a virtual address split into a page number (g bits) and a line number (h bits); Yt, implemented as a “page table”, maps the page number to a frame number (j bits), which is combined with the line number to form the physical address in the MAR; a missing page raises a fault]
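The split-lookup-recombine step can be sketched directly from the formulas above. The page size (h = 10, i.e. 1 KB pages) and the page-table contents are assumptions made up for the example:

```python
# Page-based address translation: split a virtual address into a page
# number and a line (offset), look the page up in a page table, and
# recombine with the frame number.

H = 10                 # assume h = 10, so pages hold 2**10 addresses
PAGE_SIZE = 2 ** H

# A toy page table Yt: page number -> page frame number (None = missing).
page_table = {0: 5, 1: 2, 2: None, 3: 7}

def translate(virtual_addr):
    page = virtual_addr // PAGE_SIZE      # U = floor(i / c)
    line = virtual_addr % PAGE_SIZE       # V = i mod c
    frame = page_table.get(page)
    if frame is None:
        raise LookupError("page fault on page %d" % page)
    return frame * PAGE_SIZE + line

# Address 1027 is line 3 of page 1; page 1 is loaded in frame 2:
print(translate(1027))   # 2*1024 + 3 = 2051
```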
82
Paging Algorithms
Two basic types of paging algorithms:
Static allocation
Dynamic allocation
Three basic policies define any paging algorithm:
Fetch policy: when a page should be loaded
Replacement policy: which page is unloaded
Placement policy: where a page should be loaded
83
Fetch Policy
Determines when a page should be brought into primary memory
We usually don’t have prior knowledge about which pages will be needed
The majority of paging mechanisms use a demand fetch policy
A page is loaded only when the process references it
84
Demand Paging Algorithm
1. Page fault occurs
2. Process with the missing page is interrupted
3. Memory manager locates the missing page
4. A page frame is unloaded (replacement policy)
5. The page is loaded into the vacated page frame
6. The page table is updated
7. The process is restarted
85
Page References
Processes continually reference memory and so generate a stream of page references
The page reference stream tells us everything about how a process uses memory
For a given page size, we only need to consider the page number
If we have a reference to a page, then immediately following references to the same page will never generate a page fault
Suppose the page size is 100 bytes; what is the page reference stream for these addresses?
0100, 0432, 0101, 0612, 0102, 0103, 0104, 0101, 0611, 0102, 0103, 0104, 0101, 0610, 0103, 0104, 0101, 0609, 0102, 0105
We use page reference streams to evaluate paging algorithms
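The question above can be answered mechanically: divide each address by the page size to get a page number, then drop immediate repeats (a back-to-back reference to the same page cannot fault):

```python
# Convert the slide's byte addresses into a page reference stream,
# with a page size of 100 bytes.

addresses = [100, 432, 101, 612, 102, 103, 104, 101, 611, 102, 103,
             104, 101, 610, 103, 104, 101, 609, 102, 105]

pages = [a // 100 for a in addresses]   # page number = address / page size

stream = [pages[0]]
for p in pages[1:]:
    if p != stream[-1]:                 # keep only page *changes*
        stream.append(p)

print(pages)    # raw page numbers, one per reference
print(stream)   # [1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1]
```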
86
Modeling Page Behavior
Let ω = r1, r2, r3, …, ri, … be a page reference stream
ri is the ith page number referenced by the process
The subscript is the virtual time for the process
Given a page frame allocation of m, the memory state at time t, St(m), is the set of pages loaded:
St(m) = St-1(m) ∪ Xt - Yt
Xt is the set of pages fetched at time t
Yt is the set of pages replaced at time t
87
More on Demand Paging
If rt was loaded at time t-1, St(m) = St-1(m)
If rt was not loaded at time t-1 and there were empty page frames, St(m) = St-1(m) ∪ {rt}
If rt was not loaded at time t-1 and there were no empty page frames, St(m) = St-1(m) ∪ {rt} - {y}, where y is the page unloaded
88
Replacement Policy
When there is no empty page frame in memory, we need to find one to replace
Write the page out to the swap area if it has been changed since it was read in (tracked with a dirty bit or modified bit)
Pages that have been changed are referred to as “dirty”
These pages must be written out to disk because the disk version is out of date; this is called “cleaning” the page
Which page should be removed from memory to make room for a new page?
We need a page replacement algorithm
89
Page replacement algorithms
The goal of a page replacement algorithm is to produce the fewest page faults
We can compare two algorithms on a range of page reference streams
Or we can compare an algorithm to the best possible algorithm
We will start by considering static page replacement algorithms
90
Static Paging Algorithms
A fixed number of page frames is allocated to each process when it is created
The paging policy defines how these page frames will be loaded and unloaded
The placement policy is fixed: the page frame holding the new page is always the one vacated by the page selected for replacement
91
Static Allocation, Demand Paging
The number of page frames is static over the life of the process
The fetch policy is demand
Since St(m) = St-1(m) ∪ {rt} - {y}, the replacement policy must choose y, which uniquely identifies the paging policy
92
Random Page Replacement
Algorithm: replace a page randomly
Theory: we cannot predict the future at all
Implementation: easy
Performance: poor, though the best case, worst case, and average case are all the same
93
Random Replacement
The replaced page, y, is chosen from the m loaded page frames with probability 1/m
Let the page reference stream be ω = 2031203120316457
[Table: one sample run of random replacement with m = 3 frames over the 16 references]
No knowledge of ω means random replacement doesn’t perform well: this run produced 13 page faults
94
Belady’s Optimal Algorithm
The one that produces the fewest possible page faults on all page reference sequences
Algorithm: replace the page that will not be used for the longest time in the future
Problem: it requires knowledge of the future
Not realizable in practice, but it is used to measure the effectiveness of realizable algorithms
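As a sketch (possible only because the whole stream is known in advance), the OPT policy can be simulated directly; the stream and frame count are the slides’ example:

```python
# Belady's optimal (OPT) replacement: on a fault with no free frame,
# evict the loaded page whose next use is farthest in the future.
# Pages never used again have infinite forward distance.

def opt_faults(stream, m):
    loaded, faults = set(), 0
    for t, page in enumerate(stream):
        if page in loaded:
            continue
        faults += 1
        if len(loaded) == m:
            def fwd(x):
                future = stream[t + 1:]
                return future.index(x) if x in future else float("inf")
            loaded.remove(max(loaded, key=fwd))   # maximal forward distance
        loaded.add(page)
    return faults

stream = [int(c) for c in "2031203120316457"]
print(opt_faults(stream, 3))   # 10, matching the slides' table
```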
95
Belady’s Optimal Algorithm
Replace the page with maximal forward distance: yt = max over x in St-1(m) of FWDt(x)
Let the page reference stream be ω = 2031203120316457, with m = 3 page frames
Example: at t = 4 (reference 1, a fault), FWD4(2) = 1, FWD4(0) = 2, FWD4(3) = 3, so page 3 is replaced
Working through the whole stream (the slides build this table step by step):

time    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
ref     2  0  3  1  2  0  3  1  2  0  3  1  6  4  5  7
frame 0 2  2  2  2  2  2  2  2  2  0  0  0  0  4  4  4
frame 1    0  0  0  0  0  3  3  3  3  3  3  6  6  6  7
frame 2       3  1  1  1  1  1  1  1  1  1  1  1  5  5

Perfect knowledge of ω gives perfect performance: 10 page faults
Impossible to implement
103
Theories of Program Behavior
All replacement algorithms try to predict the future and act like Belady’s optimal algorithm
All replacement algorithms have a theory of how programs behave
They use it to predict the future, that is, when pages will be referenced
Then they replace the page that they think won’t be referenced for the longest time
104
LRU Page Replacement
Least recently used (LRU)
Algorithm: remove the page that hasn’t been referenced for the longest time
Theory: the future will be like the past; page accesses tend to be clustered in time
Implementation: hard; requires hardware assistance (and even then it is not easy)
Performance: very good, within 30%–40% of optimal
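The LRU policy is easy to simulate in software (the hard part in practice is the hardware tracking of recency, which this sketch simply records directly):

```python
# LRU replacement: on a fault with no free frame, evict the loaded page
# with the maximal backward distance (the least recently referenced one).

def lru_faults(stream, m):
    loaded, last_use, faults = set(), {}, 0
    for t, page in enumerate(stream):
        if page not in loaded:
            faults += 1
            if len(loaded) == m:
                victim = min(loaded, key=lambda x: last_use[x])
                loaded.remove(victim)
            loaded.add(page)
        last_use[page] = t
    return faults

stream = [int(c) for c in "2031203120316457"]
# This cyclic stream is LRU's worst case with 3 frames: each eviction
# removes exactly the page that is about to be referenced.
print(lru_faults(stream, 3))   # 16 (every reference faults)
print(lru_faults(stream, 4))   # 8
```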
105
LRU model of the future
106
Least Recently Used (LRU)
Replace the page with maximal backward distance: yt = max over x in St-1(m) of BKWDt(x)
Let the page reference stream be ω = 2031203120316457
Example: at t = 4 (reference 1, a fault), BKWD4(2) = 3, BKWD4(0) = 2, BKWD4(3) = 1, so page 2 is replaced
With m = 3 frames every reference faults (16 page faults); each eviction removes the page that is about to be referenced:

time    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
ref     2  0  3  1  2  0  3  1  2  0  3  1  6  4  5  7
frame 0 2  2  2  1  1  1  3  3  3  0  0  0  6  6  6  7
frame 1    0  0  0  2  2  2  1  1  1  3  3  3  4  4  4
frame 2       3  3  3  0  0  0  2  2  2  1  1  1  5  5

With m = 4 frames the same stream produces only 8 faults:

time    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
ref     2  0  3  1  2  0  3  1  2  0  3  1  6  4  5  7
frame 0 2  2  2  2  2  2  2  2  2  2  2  2  6  6  6  6
frame 1    0  0  0  0  0  0  0  0  0  0  0  0  4  4  4
frame 2       3  3  3  3  3  3  3  3  3  3  3  3  5  5
frame 3          1  1  1  1  1  1  1  1  1  1  1  1  7

Backward distance is a good predictor of forward distance: locality
112
LFU Page Replacement
Least frequently used (LFU)
Algorithm: remove the page that has been used least often in the past
Theory: an actively used page should have a large reference count
Implementation: hard; also requires hardware assistance (and even then it is not easy)
Performance: not very good
113
Least Frequently Used (LFU)
Replace the page with minimum use: yt = min over x in St-1(m) of FREQt(x)
Let the page reference stream be ω = 2031203120316457, with m = 3 page frames
At t = 4 (reference 1, a fault), FREQ4(2) = FREQ4(0) = FREQ4(3) = 1: a tie, broken here by replacing page 0
At t = 6 (reference 0, a fault), FREQ6(2) = 2, FREQ6(1) = 1, FREQ6(3) = 1, so page 3 is replaced:

time    1  2  3  4  5  6
ref     2  0  3  1  2  0
frame 0 2  2  2  2  2  2
frame 1    0  0  1  1  1
frame 2       3  3  3  0

At t = 7 (reference 3): FREQ7(2) = ?, FREQ7(1) = ?, FREQ7(0) = ? (left as an exercise)
117
FIFO Page Replacement
Algorithm: replace the oldest page
Theory: pages are used for a while and then stop being used
Implementation: easy
Performance: poor, because old pages are often still being accessed; that is, the theory behind FIFO is not correct
118
First In First Out (FIFO)
Replace the page that has been in memory the longest: yt = max over x in St-1(m) of AGEt(x)
Let the page reference stream be ω = 2031203120316457, with m = 3 page frames
At t = 4 (reference 1, a fault), AGE4(2) = 3, AGE4(0) = 2, AGE4(3) = 1, so page 2 is replaced:

time    1  2  3  4
ref     2  0  3  1
frame 0 2  2  2  1
frame 1    0  0  0
frame 2       3  3

At t = 5 (reference 2): AGE5(1) = ?, AGE5(0) = ?, AGE5(3) = ? (left as an exercise)
122
Belady’s Anomaly
Let the page reference stream be ω = 012301401234
FIFO with m = 3 frames has 9 faults:

time    1  2  3  4  5  6  7  8  9 10 11 12
ref     0  1  2  3  0  1  4  0  1  2  3  4
frame 0 0  0  0  3  3  3  4  4  4  4  4  4
frame 1    1  1  1  0  0  0  0  0  2  2  2
frame 2       2  2  2  1  1  1  1  1  3  3

FIFO with m = 4 frames has 10 faults:

time    1  2  3  4  5  6  7  8  9 10 11 12
ref     0  1  2  3  0  1  4  0  1  2  3  4
frame 0 0  0  0  0  0  0  4  4  4  4  3  3
frame 1    1  1  1  1  1  1  0  0  0  0  4
frame 2       2  2  2  2  2  2  1  1  1  1
frame 3          3  3  3  3  3  3  2  2  2
123
Belady’s Anomaly
The paging algorithm has worse performance when the amount of primary memory allocated to the process increases
Problem arises because the set of pages loaded with the smaller memory allocation is not necessarily also loaded with the larger memory allocation
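The anomaly can be checked directly on the slide's reference stream (a self-contained Python sketch, not from the slides):

```python
def fifo_faults(refs, m):
    """Count page faults under FIFO replacement with m page frames."""
    resident, queue, faults = set(), [], 0
    for r in refs:
        if r not in resident:
            faults += 1
            if len(resident) == m:
                resident.discard(queue.pop(0))  # evict the oldest page
            resident.add(r)
            queue.append(r)
    return faults

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]  # stream from the slide
faults_3 = fifo_faults(refs, 3)  # 9 faults
faults_4 = fifo_faults(refs, 4)  # 10 faults: more memory, more faults
```

Adding a fourth page frame raises the fault count from 9 to 10 — exactly Belady's anomaly.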
124
Avoiding Belady’s Anomaly
Inclusion Property
 The set of pages loaded with an allocation of m frames is always a subset of the set of pages loaded with an allocation of m+1 frames
 FIFO does not satisfy the inclusion property
 LRU and LFU do
 Algorithms that satisfy the inclusion property are called stack algorithms
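The inclusion property can be checked mechanically (a Python sketch, not from the slides; it relies on the fact that under LRU the resident set is exactly the m most recently used pages):

```python
def lru_resident_sets(refs, m):
    """Yield the resident-page set after each reference under LRU."""
    stack = []                    # most recently used page at the end
    for r in refs:
        if r in stack:
            stack.remove(r)
        stack.append(r)
        del stack[:-m]            # keep only the top m of the LRU stack
        yield set(stack)

def fifo_resident_sets(refs, m):
    """Yield the resident-page set after each reference under FIFO."""
    resident, queue = set(), []
    for r in refs:
        if r not in resident:
            if len(resident) == m:
                resident.discard(queue.pop(0))
            resident.add(r)
            queue.append(r)
        yield set(resident)

def satisfies_inclusion(sets_fn, refs, m):
    """True if every resident set with m frames is a subset of the
    resident set with m+1 frames at the same time step."""
    return all(a <= b for a, b in zip(sets_fn(refs, m),
                                      sets_fn(refs, m + 1)))

refs = [0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4]
lru_ok = satisfies_inclusion(lru_resident_sets, refs, 3)     # holds
fifo_ok = satisfies_inclusion(fifo_resident_sets, refs, 3)   # violated
```

On Belady's stream, LRU passes the subset check at every step while FIFO fails it, which is why FIFO is not a stack algorithm.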
125
Stack Algorithms
 Some algorithms are well-behaved
 Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3
Frame 1:   1 1 1
Frame 2:     2 2

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0
Frame 1:   1 1 1
Frame 2:     2 2
Frame 3:       3
126
Stack Algorithms
• Some algorithms are well-behaved
• Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3
Frame 1:   1 1 1 0
Frame 2:     2 2 2

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0
Frame 1:   1 1 1 1
Frame 2:     2 2 2
Frame 3:       3 3
127
Stack Algorithms
• Some algorithms are well-behaved
• Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3 3
Frame 1:   1 1 1 0 0
Frame 2:     2 2 2 1

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0 0
Frame 1:   1 1 1 1 1
Frame 2:     2 2 2 2
Frame 3:       3 3 3
128
Stack Algorithms
• Some algorithms are well-behaved
• Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3 3 4
Frame 1:   1 1 1 0 0 0
Frame 2:     2 2 2 1 1

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0 0 0
Frame 1:   1 1 1 1 1 1
Frame 2:     2 2 2 2 4
Frame 3:       3 3 3 3
129
Stack Algorithms
• Some algorithms are well-behaved
• Inclusion Property: pages loaded at time t with m frames are also loaded at time t with m+1 frames

LRU, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3 3 4 4 4 2 2 2
Frame 1:   1 1 1 0 0 0 0 0 0 3 3
Frame 2:     2 2 2 1 1 1 1 1 1 4

LRU, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0 0 0 0 0 0 0 4
Frame 1:   1 1 1 1 1 1 1 1 1 1 1
Frame 2:     2 2 2 2 4 4 4 4 3 3
Frame 3:       3 3 3 3 3 3 2 2 2
130
Stack Algorithms
• Some algorithms are not well-behaved
• Inclusion Property violated: pages loaded at time t with m frames aren’t all loaded at time t with m+1 frames

FIFO, m = 3:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 3 3 3 4 4 4 4 4 4
Frame 1:   1 1 1 0 0 0 0 0 2 2 2
Frame 2:     2 2 2 1 1 1 1 1 3 3

FIFO, m = 4:
Ref:     0 1 2 3 0 1 4 0 1 2 3 4
Frame 0: 0 0 0 0 0 0 4 4 4 4 3 3
Frame 1:   1 1 1 1 1 1 0 0 0 0 4
Frame 2:     2 2 2 2 2 2 1 1 1 1
Frame 3:       3 3 3 3 3 3 2 2 2
131
Implementation
 LRU has become the preferred algorithm
 Difficult to implement exactly
  Must record when each page was referenced
  Difficult to do in hardware
 Approximate LRU with a reference bit
  Periodically reset
  Set for a page when it is referenced
 Dirty bit
  Pages that have been changed are referred to as “dirty”
  These pages must be written out to disk, because the disk version is out of date; this is called “cleaning” the page
132
First LRU approximation
 When you get a page fault
  Replace any page whose reference bit is off
  Then turn off all the reference bits
 Two classes of pages
  Pages referenced since the last page fault
  Pages not referenced since the last page fault
   The least recently used page is in this class, but you don’t know which one it is
 A crude approximation of LRU
133
Second LRU approximation
 Algorithm:
  Keep a counter for each page
  Have a daemon wake up every 500 ms and:
   Add one to the counter of each page that has not been referenced
   Zero the counter of pages that have been referenced
   Turn off all reference bits
  When you get a page fault, replace the page whose counter is largest
 Divides pages into 256 classes
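The counter scheme above can be sketched as follows (a Python illustration with hypothetical class and method names; the 500 ms daemon is modeled as an explicit `tick()` call, and the hardware reference bits as a plain set):

```python
class AgingApproxLRU:
    """Sketch of the counter-based LRU approximation from the slides."""

    COUNTER_MAX = 255        # 8-bit counter -> 256 classes of pages

    def __init__(self, pages):
        self.counter = {p: 0 for p in pages}
        self.referenced = set()

    def touch(self, page):
        """Model a memory reference: hardware sets the reference bit."""
        self.referenced.add(page)

    def tick(self):
        """Daemon pass (every ~500 ms): age unreferenced pages,
        zero the counters of referenced pages, clear all bits."""
        for p in self.counter:
            if p in self.referenced:
                self.counter[p] = 0
            else:
                self.counter[p] = min(self.counter[p] + 1,
                                      self.COUNTER_MAX)
        self.referenced.clear()

    def victim(self):
        """On a page fault, evict the page with the largest counter."""
        return max(self.counter, key=self.counter.get)
```

For example, a page never touched across two ticks ends up with the largest counter and becomes the victim.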
134
Dynamic Paging Algorithms
Static page replacement algorithms assume that a process is allocated a fixed amount of primary memory
But, the amount of physical memory – the number of page frames – varies as the process executes
How much memory should be allocated? Fault rate must be “tolerable” Will change according to the phase of process
Need to define a placement & replacement policy
Contemporary models based on working set
135
Working Set
 Intuitively, the working set is the set of pages in the process’s locality
  Somewhat imprecise
  Time varying
 Given k processes in memory, let m_i(t) be the number of page frames allocated to p_i at time t
  m_i(0) = 0
  Σ_{i=1}^{k} m_i(t) ≤ |primary memory|
 Also have S_t(m_i(t)) = S_{t−1}(m_i(t−1)) ∪ X_t − Y_t
 Or, more simply, S(m_i(t)) = S(m_i(t−1)) ∪ X_t − Y_t
136
Placed/Replaced Pages
 S(m_i(t)) = S(m_i(t−1)) ∪ X_t − Y_t
 For the missing page
  Allocate a new page frame
  X_t = {r_t}, placed in the new page frame
 How should Y_t be defined?
 Consider a parameter, ω, called the window size
  Determine BKWD_t(y) for every y ∈ S(m_i(t−1))
  if BKWD_t(y) ≥ ω, unload y and deallocate its frame
  if BKWD_t(y) < ω, do not disturb y
137
Working Set Principle

Process p_i should only be loaded and active if it can be allocated enough page frames to hold its entire working set
 The size of the working set is estimated using ω
 Unfortunately, a “good” value of ω depends on the size of the locality
 Empirically this works with a fixed ω
138
Working set algorithm Algorithm
Keep track of the working set of each running process
Only run a process if its entire working set fits in memory – called working set principle
139
Working set algorithm example
• With ω = 3, there are 16 page faults
• With ω = 4, there are 8 – the minimum, since there are 8 distinct pages
140
Working set algorithm example – cont.
• Letting ω = 9 does not reduce the number of page faults
• In fact, not all the page frames are used
141
Working set algorithm example – cont.
• Here the page frame allocation changes dynamically, increasing and decreasing
142
Implementing the Working Set
 Global LRU will behave similarly to a working set algorithm
  Page fault
   Add a page frame to one process
   Take away a page frame from another process
  Use the LRU implementation idea
   Reference bit for every page frame
   Cleared periodically, set with each reference
   Change the allocation of some page frame with a clear reference bit
 Clock algorithms use this technique by searching for cleared reference bits in a circular fashion
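A single victim-selection pass of such a clock algorithm might look like (a Python sketch with hypothetical names; `refbit` models the per-frame reference bits):

```python
def clock_replace(frames, refbit, hand):
    """Select a victim frame with the clock (second-chance) technique.

    frames: resident page numbers; refbit: parallel list of 0/1 bits;
    hand: current clock-hand index. Returns (victim_index, new_hand).
    """
    n = len(frames)
    while True:
        if refbit[hand] == 0:
            return hand, (hand + 1) % n   # found an unreferenced frame
        refbit[hand] = 0                  # give the page a second chance
        hand = (hand + 1) % n
```

The loop always terminates: if every bit is set, the first sweep clears them all and the hand's original frame is chosen on the second visit.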
143
Performance of Demand Paging
Page Fault Rate probability: 0 ≤ p ≤ 1.0 if p = 0, no page faults if p = 1, every reference is a fault
Effective Access Time (EAT)
 EAT = (1 − p) × memory access time
     + p × (page fault overhead + [swap page out] + swap page in + restart overhead)
144
Demand Paging Performance Example
Assume memory access time = 100 nanosecond
Assume fault service time = 25 ms = 25,000,000 ns
Then EAT = (1 – p) x 100 + p (25,000,000)= 100 + 24,999,900 p (in ns)
So, if one out of 1000 accesses causes a page fault, then EAT = 100+24,999,900x0.001=25,099.9 ns ≈ 25 microseconds
145
Demand Paging Performance Example
So, if one access out of 1000 causes a page fault, the computer would be slowed down by a factor of 250 because of demand paging!
Can calculate that if we want less than 10% degradation, we need to allow only one access out of 2,500,000 to page fault
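Both numbers can be reproduced with a quick check (a Python sketch of the slides' arithmetic):

```python
def eat_ns(p, mem_ns=100, fault_ns=25_000_000):
    """Effective access time in ns given page-fault probability p."""
    return (1 - p) * mem_ns + p * fault_ns

eat = eat_ns(0.001)              # one fault per 1000 accesses: 25,099.9 ns
slowdown = eat / 100             # ~251x slower than a plain memory access
# For at most 10% degradation: p * (fault_ns - mem_ns) <= 0.1 * mem_ns
p_max = 10 / (25_000_000 - 100)  # about 1 fault per 2,500,000 accesses
```

Solving EAT ≤ 1.1 × 100 ns for p gives p ≤ 10 / 24,999,900, i.e. roughly one fault in 2.5 million accesses.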
146
Evaluating paging algorithms
 Mathematical modeling
  Powerful where it works, but most real algorithms cannot be analyzed
 Measurement
  Implement it on a real system and measure it
  Extremely expensive
 Simulation
  Test on page reference traces
  Reasonably efficient; effective
147
Performance of paging algorithms
148
Thrashing VM allows more processes in memory, so
several processes are more likely to be ready to run at the same time
If CPU usage is low, it seems logical to bring more processes into memory
 But low CPU use may be due to too many page faults because there are too many processes competing for memory
 Bringing in more processes then makes it worse, and leads to thrashing
149
Thrashing Diagram
 There are too many processes in memory and no process has enough memory to run. As a result, the page-fault rate is very high and the system spends all of its time handling page-fault interrupts.
150
Load control
 Load control: deciding how many processes should be competing for page frames
  Too many leads to thrashing
  Too few means that memory is underused
 Load control determines which processes are running at a point in time
  The others have no page frames and cannot run
 CPU load is a bad load control measure
 Page fault rate is a good load control measure
151
Load control and page replacement
152
Two levels of scheduling
153
Load control algorithms A load control algorithm measures
memory load and swaps processes in and out depending on the current load
Load control measures rotational speed of the clock hand average time spent in the standby list page fault rate
154
Page fault frequency load control
L = mean time between page faults S = mean time to service a page fault Try to keep L = S
if L < S, then swap a process out if L > S, then swap a process in
If L = S, then the paging system can just keep up with the page faults
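The decision rule on this slide can be stated as a tiny sketch (a Python illustration; the function name is hypothetical):

```python
def load_control_action(L, S):
    """Page-fault-frequency load control decision.

    L: mean time between page faults; S: mean time to service one.
    Keep L close to S: faults should arrive about as fast as they
    can be serviced.
    """
    if L < S:
        return "swap a process out"  # faults arrive faster than service
    if L > S:
        return "swap a process in"   # memory is underused
    return "no change"               # paging just keeps up
```

In a real kernel L and S would be running averages measured by the fault handler, not exact values.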
155
Windows NT Paging System
Primary Memory | Virtual Address Space (Supv space / User space) | Paging Disk (Secondary Memory)

1. Reference to Address k in Page i (User space)
2. Lookup (Page i, Addr k)
3. Translate (Page i, Addr k) to (Page Frame j, Addr k)
4. Reference (Page Frame j, Addr k)
156
Windows Address Translation
Virtual address: Page Directory index | Page Table index | Byte Index
 (the first two fields form the virtual page number; the byte index is the line number)
 Translation path: Page Directory → Page Table → Target Page → Target Byte
157
Linux Virtual Address Translation
Virtual address fields: j.pgd | j.pmd | j.pte | j.offset
 Translation path: Page Directory Base → Page Directory → Page Middle Directory → Page Table → Page
158
Segmentation Unit of memory movement is:
Variably sized Defined by the programmer
Two component addresses, <Seg#, offset> Seg # is reference to a base location Offset is offset of target within segment
Address translation is more complex than paging
159
Segment Address Translation
• Y_t: segments × offsets → physical addresses {W}
• Y_t(i, j) = k; i = segment, j = offset, k = physical address
• Segment names are typically symbolic; bound at runtime
• s: segment names → segment addresses
• Y_t(s(segName), j) = k
• Offset may also not be bound until runtime
• l: offset names → offset addresses
• So, the address map could be as complex as Y_t(s(segName), l(offsetName)) = k
• Address translation is more complex than paging
160
Segment Address Translation
The task of designing a segmentation system to handle such general address translation is very challenging
Each memory reference is theoretically a pair of symbols to be translated when the reference occurs
In addition, the mappings are time-varying The segment could be anywhere in primary
and/or secondary memory
161
Address Translation
[Diagram: a <segmentName, offsetName> pair is mapped by s and l to a (segment #, offset) pair; the segment table entry supplies Limit, Base, and P (present) fields; the offset is checked against the limit, added to the base (relocation), and sent to the memory address register; a missing segment raises a fault]
162
Address Translation – cont.
System maintains segment table for each process (which is a segment itself)
Table contains a set of entries – called segment descriptors
Descriptors contain fields to support relocation; also indicates if not loaded Base: relocation register for segment Limit: length of segment Protection: allowable forms of access
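A descriptor-based translation step might be sketched as (a Python illustration with hypothetical names; the present and limit checks model the missing-segment and protection fields described above):

```python
from dataclasses import dataclass

@dataclass
class SegmentDescriptor:
    base: int        # relocation register: where the segment starts
    limit: int       # length of the segment
    present: bool    # is the segment loaded in primary memory?

def translate(seg_table, seg, offset):
    """Map (segment #, offset) to a physical address via the table."""
    d = seg_table[seg]
    if not d.present:
        raise RuntimeError("missing-segment fault")  # would load from disk
    if not (0 <= offset < d.limit):
        raise RuntimeError("protection violation: offset out of range")
    return d.base + offset
```

For example, with a descriptor of base 1000 and limit 200, offset 10 translates to physical address 1010, while offset 200 would fault.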
163
Implementation
 Most implementations do not fully implement the address translation model
 Segmentation requires special hardware
  Segment descriptor support
  Segment base registers (segment, code, stack)
  Translation hardware
 Some of the translation can be static
  No dynamic offset name binding
  Limited protection
164
Multics
 Designed in the late 1960s
 Old, but still state-of-the-art segmentation
 Uses linkage segments to support sharing
 Uses dynamic offset name binding
 Required a sophisticated memory management unit
 See pp 500–502