Download - CENG 334 – Operating Systems 06- Memory Asst. Prof. Yusuf Sahillioğlu Computer Eng. Dept,, Turkey

CENG 334 – Operating Systems

06- Memory

Asst. Prof. Yusuf Sahillioğlu

Computer Eng. Dept, , Turkey

Memory Management

Program must be brought (from disk) into memory to run

Main memory and registers are only storage CPU can access directly

Register access in one CPU clock cycle (perform multiplication a * b)

Main memory can take many cycles (read the operands from memory, or write result back to memory)

Cache sits between main memory and CPU registers (2-3 cycles) and holds the instructions that are executed and the data that is operated on

Protection of memory required to ensure correct operation


Overview


When code is generated (or an assembly program is written) we use memory addresses for variables, functions, and branching/jumping

Those addresses can be physical or logical (= virtual) memory addresses

Physical: actual (possibly discontinuous) locations in main memory.

Logical: each process is given its own continuous logical memory space = its own view of memory with its own address space. Logical addresses are divided into fixed-size pages.


Key advantage of logical addressing (= paging): it eliminates the issue of external fragmentation. Since the CPU translates logical page-based addresses to physical frame-based addresses, there is no need for the physical frames to be continuous


Physical address of a variable is 0x0734432. That variable has to sit there while the program is executing: no relocation.

Logical address of a variable is 7 for myArray[7]. It does not have to sit at physical address 0x0000007.


Assume physical addresses in use: no problem


Assume physical addresses in use for 2 parallel programs: problem


We cannot have a multiprogramming environment. We cannot load a program to an arbitrary position. Early systems had this physical addressing idea. Thank god we don't.


Logical address space concept. A program uses logical addresses. Logical address space has to be mapped somewhere in physical (main) memory.


Logical addresses provide
  Multiprogramming environment
  Relocatable code

Binding: mapping logical addresses to physical addresses.
  physicalAddr = logicalAddr + base
  Logical address space is bound to a physical address space


An example

Memory Management Unit (MMU) converts logical address 28 into a physical address (28 + 24 = 52, i.e., M[52]) at execution time.


Hardware device that at run time maps virtual (logical) to physical address

In the previous simple example we used 1 relocation register: base. More complicated schemes exist

The user program deals with logical addresses; it never sees the real physical addresses

Execution-time binding occurs when reference is made to location in memory

Logical address bound to physical addresses


Dynamic relocation using a relocation register


Another memory management idea: Swapping

Assume 10 programs are loaded into memory and memory is filled up

A process can be swapped temporarily out of memory to a backing store (disk), and then brought back into memory for continued execution

Started if more than a threshold amount of memory is allocated

Disabled again once memory demand is reduced below the threshold


Contiguous allocation (continuous): allocate physical space that is equal to the process' logical address space

Main memory is usually divided into two partitions:
  Resident operating system, usually held in low memory with the interrupt vector
  User processes, then held in high memory

Relocation registers are used to protect user processes from each other, and from changing operating-system code and data
  Base register contains the value of the smallest physical address
  Limit register contains the range of logical addresses (size of the program): each logical address must be less than the limit register

MMU maps logical address dynamically
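The base/limit check described above can be sketched in a few lines. This is only an illustration of what the MMU does in hardware; the function name `translate`, the exception type, and the limit value 100 are assumptions made for the example, while base = 24 and logical address 28 come from the earlier slide.

```python
class AddressError(Exception):
    """Raised when a logical address is outside the process's limit (hypothetical name)."""


def translate(logical_addr: int, base: int, limit: int) -> int:
    """Mimic the MMU: check against the limit register, then relocate by base."""
    # Protection: every logical address must be less than the limit register.
    if not 0 <= logical_addr < limit:
        raise AddressError(f"logical address {logical_addr} outside [0, {limit})")
    # Relocation: physicalAddr = logicalAddr + base
    return logical_addr + base


# Logical address 28 with base 24 maps to M[52]; limit=100 is an assumed program size.
print(translate(28, base=24, limit=100))  # 52
```

An out-of-range address such as `translate(100, base=24, limit=100)` raises instead of touching another process's memory, which is the protection role of the limit register.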


HW support for relocation and limit registers


Contiguous allocation

After a while we see partitions, some of which are empty (hole).


Contiguous allocation Multiple-partition allocation

Degree of multiprogramming limited by number of partitions

Hole: block of available memory; holes of various size are scattered throughout memory

When a process arrives, it is allocated memory from a hole large enough to accommodate it

An exiting process frees its partition; adjacent free partitions are combined

Operating system maintains information about: a) allocated partitions b) free partitions (holes)


Contiguous allocation

How to satisfy a request of size n from a list of free holes?

First-fit: Allocate the first hole that is big enough

Best-fit: Allocate the smallest hole that is big enough; must search the entire list, unless ordered by size. Produces the smallest leftover hole

Worst-fit: Allocate the largest hole; must also search the entire list. Produces the largest leftover hole
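A minimal sketch of the three placement policies, assuming the free list is just a Python list of hole sizes. The function names and the sample hole sizes are made up for illustration; each function returns the index of the chosen hole, or None if no hole is big enough.

```python
from typing import List, Optional


def first_fit(holes: List[int], n: int) -> Optional[int]:
    # Take the first hole that is big enough; the scan stops early.
    for i, size in enumerate(holes):
        if size >= n:
            return i
    return None


def best_fit(holes: List[int], n: int) -> Optional[int]:
    # Smallest hole that still fits: scans the whole (unsorted) list.
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return min(fits)[1] if fits else None


def worst_fit(holes: List[int], n: int) -> Optional[int]:
    # Largest hole that fits: also scans the whole list.
    fits = [(size, i) for i, size in enumerate(holes) if size >= n]
    return max(fits)[1] if fits else None


holes = [100, 500, 200, 300, 600]   # assumed free-hole sizes
print(first_fit(holes, 212))  # 1 (the 500 hole)
print(best_fit(holes, 212))   # 3 (the 300 hole, leftover 88)
print(worst_fit(holes, 212))  # 4 (the 600 hole, leftover 388)
```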


Fragmentation: There will be useless holes that cannot accommodate any process continuously

External Fragmentation: external to allocated partitions you have unused space
  Total memory space exists to satisfy a request, but it is not contiguous
  Reduce external fragmentation by compaction: shuffle memory contents to place all free memory together in 1 large block

Internal Fragmentation: allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition, but not being used


More advanced idea for memory management: Paging

Used for implementing virtual memory, which allows a program whose size is > physical memory size to be run

Also good for eliminating external fragmentation

Allows logical and physical address spaces to be noncontiguous

High utilization of memory space


More advanced idea for memory management: Paging

Divide physical memory into fixed-sized blocks called frames
  Size is a power of 2, between 512 bytes and 16 Mbytes

Divide logical memory into blocks of the same size called pages

Keep track of all free frames

To run a program of size N pages, need to find N free frames and load the program

Set up a page table to translate logical to physical addresses

Not a simple translation anymore: phyAddr != logAddr + base


More advanced idea for memory management: Paging

Divide physical memory into fixed-size (4K) blocks, called (page) frames.

A page frame is a container that can hold a page as its content.

Frame0 has the address space from 0 to 4095, frame1 has 4096 to 8191, ...

Divide the logical address space into fixed-size blocks, called pages, whose size is equal to the page frame size (4K)

When a program is loaded into memory, the allocation does not have to be contiguous

The page-to-frame info is kept in a Page Table, determined by the OS, for each process

Conversion logical->physical done by HW (CPU)

More advanced idea for memory management: Paging

Conversion logical->physical done by HW (CPU)

Assume pageSize = 4 bytes

LA for h = 7; PA = ?
  pageNumber = 1 & offset = 3 for h
  Page 1 is in frame 6, so PA = 4*6 + 3 = 27

4-bit logical address: _ _ _ _
  First _ _ for page number
  Next _ _ for offset (displacement) inside page
  Logical address of h is 0 1 1 1

LA for f = 5; PA = ?
  Logical address is 0 1 0 1 (5 in binary)
  Page number = 01 = 1 in decimal
  PA = 110 01 (Frame number = 6)
  Offset will not change because it is a relative position; copy it from LA
  110 01 = 25 in decimal, which is the PA for f

LA for l = 11; PA = ?
  Logical address is 1 0 1 1 (11 in binary)
  Page number = 10 = 2 in decimal
  PA = 001 11 (Frame number = 1)
  Offset will not change because it is a relative position; copy it from LA
  001 11 = 7 in decimal, which is the PA for l

LA for n = 13; PA = ?
  Logical address is 1 1 0 1 (13 in binary)
  Page number = 11 = 3 in decimal
  PA = 010 01 (Frame number = 2)
  Offset will not change because it is a relative position; copy it from LA
  010 01 = 9 in decimal, which is the PA for n
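The worked examples can be replayed with a short sketch. The page-to-frame pairs (1 -> 6, 2 -> 1, 3 -> 2) are the ones implied by the results 27, 25, 7 and 9 above; the frame of page 0 is not given in the slides, so it is left out. The names `PAGE_TABLE` and `to_physical` are illustrative.

```python
PAGE_SIZE = 4  # bytes, as assumed on the slides

# page number -> frame number, inferred from the worked examples.
# Page 0's frame is not stated in the slides, so it is deliberately omitted.
PAGE_TABLE = {1: 6, 2: 1, 3: 2}


def to_physical(logical_addr: int) -> int:
    # Split the logical address into page number (high bits) and offset (low bits).
    page, offset = divmod(logical_addr, PAGE_SIZE)
    frame = PAGE_TABLE[page]  # KeyError here would correspond to an unmapped page
    # The offset is copied unchanged; only the page number is replaced by the frame number.
    return frame * PAGE_SIZE + offset


for name, la in [("h", 7), ("f", 5), ("l", 11), ("n", 13)]:
    print(name, la, "->", to_physical(la))
# h 7 -> 27
# f 5 -> 25
# l 11 -> 7
# n 13 -> 9
```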


More advanced idea for memory management: Paging

Conversion logical->physical done by HW (CPU)

In general, an address generated by the CPU is divided into:

Page number (p): used as an index into a page table which contains the base address of each page in physical memory

Page offset (d): combined with the base address to define the physical memory address that is sent to the memory unit

For a given logical address space of size 2^m and page size 2^n, the page number is the high-order m - n bits and the offset is the low-order n bits


More advanced idea for memory management: Paging

Conversion logical->physical done by HW (CPU)
  Must be very fast because it is done for every memory reference
  At least 1 memory access to fetch the instruction
  Plus potential memory operation(s) for that instruction (LOAD)

Setting up the page table done by SW (OS)
  When a program is loaded into memory, the OS knows into which frames the pages of the program are loaded


More advanced idea for memory management: Paging We will have free and used frames in memory at any time t


More advanced idea for memory management: Paging

8-byte long program (each byte is an instruction, not a char, but anyway)


Implementation of Page Table

Page table is kept in main memory, per process

Page-table base register (PTBR) points to the page table (loaded at context switch)

Page-table length register (PTLR) indicates the size of the page table (loaded at context switch)

In this scheme every data/instruction access requires two memory accesses: 1 for the page table (because the table is in memory) and 1 for the data/instruction (by physical address)


Implementation of Page Table

Access to Page Table in memory (for logical->physical conversion)

Access to that physical address in memory (2nd access)


Implementation of Page Table

The two memory access problem can be solved by the use of a special fast-lookup hardware cache called associative memory or translation look-aside buffer (TLB)

Some TLBs store address-space identifiers (ASIDs) in each TLB entry: an ASID uniquely identifies each process to provide address-space protection for that process

After we learn the page frame, we store this association in 1 entry of the TLB. A page has 4096 instructions so it is likely that I'll access the same page again soon (in the next instruction); keep that page->frame mapping in the cache.

Without ASIDs you have to flush (erase) the TLB at every context switch (the mapping 0 -> 17 of P1 may not hold for P2).

TLBs are typically small (64 to 1,024 entries)


TLB associative memory

Associative memory: parallel search

Address translation (p, d)
  If p is in an associative register, get frame # out
  Otherwise get frame # from the page table in memory


Paging HW with TLB


Effective memory access time w/ Paging HW with TLB

Associative lookup = e time units //e = epsilon

Assume memory access (cycle) time is 1 msec (>> e)

Hit ratio = alpha: percentage of times that a page number is found in the TLB

Effective Access Time (EAT)

EAT = HIT + MISS
    = (1 + e) alpha + (2 + e)(1 - alpha)
    = 2 + e - alpha msecs //e << alpha so ignore it
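As a quick numeric check of the formula, the sketch below evaluates the hit and miss terms separately. The function name and the sample values (alpha = 0.8 and 0.98, e = 0.02) are assumptions chosen for illustration; times are in units of one memory access.

```python
def eat_with_tlb(alpha: float, e: float = 0.0, mem: float = 1.0) -> float:
    """Effective access time with a TLB, in the same unit as `mem`."""
    hit = (mem + e) * alpha             # TLB lookup + 1 memory access
    miss = (2 * mem + e) * (1 - alpha)  # TLB lookup + page-table access + data access
    return hit + miss


# With mem = 1 this equals 2 + e - alpha.
print(round(eat_with_tlb(0.80), 6))          # 1.2
print(round(eat_with_tlb(0.98, e=0.02), 6))  # 1.04
```

A higher hit ratio pushes the cost toward a single memory access, which is why even a small TLB pays off when programs show locality.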


Memory protection with paging scheme

Memory protection is implemented by associating a protection bit with each frame to indicate if read-only or read-write access is allowed

Valid-invalid bit attached to each entry in the page table:
  "valid" indicates that the associated page is in the process' logical address space, and is thus a legal page
  "invalid" indicates that the page is not in the process' logical address space


Memory protection with paging scheme


Shared pages

Shared code
  One copy of read-only (reentrant) code shared among processes (e.g., text editors, compilers, window systems)
  Similar to multiple threads sharing the same process space
  Also useful for interprocess communication if sharing of read-write pages is allowed

Private code and data
  Each process keeps a separate copy of the code and data
  The pages for the private code and data can appear anywhere in the logical address space


Shared pages


Structure of the page table is important

A 1D page table can grow to a large size if you have a large address space
  4GB of logical memory (32-bit systems can have up to 2^32 = 4GB of address space)
  Each page is 4KB
  Then you have 4GB / 4KB = ~1M (million) pages
  1M entries needed in a 1D page table
  Each entry 4 bytes -> 4MB page table per process; too large!


Solutions to 1D page table problem
  Hierarchical paging
  Hashed page tables
  Inverted page tables


Hierarchical page tables

Break up the logical address space into multiple page tables

Some portion of the logical address space will be mapped by some page table, some portion by another page table, and so on

This idea replaces the single page table responsible for mapping the whole logical address space

You usually use a small portion of your logical address space -> need only small page tables to map those portions.

Unused portion stored on disk (brought to memory when necessary)

A simple technique is a two-level page table


Hierarchical page tables

Two-level page table scheme (2D): page the page table

[Figure: page table split into level 1 and level 2]


Hierarchical page tables

Two-level page table scheme (2D): page the page table

32 bit logical address:

d = 10 bits -> page size 2^10 = 1K (each page stores 1K data)

p2 = 10 bits -> second-level page table can have at most 1K entries

p1 = 12 bits -> first-level page table can have at most 4K entries

p1 part is used as an index to the outer page table; p2 to the 2nd level page table

page number       | page offset
p1      | p2      | d
12 bits | 10 bits | 10 bits


Hierarchical page tables

Two-level page table scheme (2D): page the page table

From the outer page table (idx: p1), get the address of the 2nd level page table.

From the 2nd level page table (idx: p2), get the page frame number in PA.

From the page frame in physical memory (idx: d), get the desired content.
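The three lookup steps can be sketched with bit operations, assuming the 12/10/10 split of a 32-bit logical address shown earlier. Nested dictionaries stand in for the outer and 2nd-level tables; the table contents and the function names are made up for illustration.

```python
P1_BITS, P2_BITS, D_BITS = 12, 10, 10  # the 12/10/10 split of a 32-bit address


def split(logical_addr: int):
    """Return (p1, p2, d) for a 32-bit logical address."""
    d = logical_addr & ((1 << D_BITS) - 1)                # low 10 bits: offset
    p2 = (logical_addr >> D_BITS) & ((1 << P2_BITS) - 1)  # next 10 bits
    p1 = logical_addr >> (D_BITS + P2_BITS)               # top 12 bits
    return p1, p2, d


def translate(logical_addr: int, outer_table: dict) -> int:
    p1, p2, d = split(logical_addr)
    second_level = outer_table[p1]  # step 1: outer table gives the 2nd-level table
    frame = second_level[p2]        # step 2: 2nd-level table gives the frame number
    return (frame << D_BITS) | d    # step 3: offset d selects the byte inside the frame


# Hypothetical tables: outer entry 3 points to a 2nd-level table mapping page 5 to frame 42.
outer = {3: {5: 42}}
la = (3 << 20) | (5 << 10) | 7
print(split(la))             # (3, 5, 7)
print(translate(la, outer))  # 43015, i.e. 42 * 1024 + 7
```

A missing key in either dictionary plays the role of an invalid entry: no 2nd-level table exists for the unused part of the address space.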


Hierarchical page tables

Two-level page table scheme (2D): page the page table

Benefit: reduce page table space needed for a program.

Logical address length = 32 bits

Page size = 4K (4096 bytes)

Logical address division: 10, 10, 12

Program has 4GB (2^32) address space
  But only uses bottom and top portions (20 MB)
  Don't need a page table for the unused part
  Need only 1 top-level page table
  Need ?? second-level page tables


Hierarchical page tables

Two-level page table scheme (2D): page the page table

2nd-level page table has 2^10 entries

Each entry can map a page of 4K size

2^10 * 2^12 = 2^22 = 4MB of logical address space can be mapped by a single 2nd-level page table!

Use this fact to find the answer for the previous slide:

Each 4MB can be mapped by 1 2nd-level page table -> need 20/4 = 5 second-level page tables


Hierarchical page tables

Two-level page table scheme (2D): page the page table

Benefit: the 1D case must map the whole address space (4GB per process), which costs 4MB of page table per process; this reduces to only 24KB per process using the 2-level page table scheme (1 top-level + 5 second-level tables, 4KB each).

Assume each entry (in both top-level and 2nd-level tables) is 4 bytes.
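The 4MB versus 24KB comparison can be reproduced with a few lines of arithmetic. The sketch assumes the 10/10/12 split, 4-byte entries, and that the used 20 MB falls on 4MB-aligned regions so that exactly 20/4 second-level tables suffice; the function names are illustrative.

```python
ENTRY_BYTES = 4          # each page-table entry is 4 bytes
PAGE_SIZE = 1 << 12      # 4K pages (12 offset bits)
P1_ENTRIES = 1 << 10     # top-level table: 10 index bits
P2_ENTRIES = 1 << 10     # each second-level table: 10 index bits


def one_level_bytes(address_space: int) -> int:
    # One entry per page of the whole logical address space.
    return (address_space // PAGE_SIZE) * ENTRY_BYTES


def two_level_bytes(used_bytes: int) -> int:
    span = P2_ENTRIES * PAGE_SIZE      # one 2nd-level table maps 4MB
    second_level = -(-used_bytes // span)  # ceiling division, assumes aligned regions
    return (P1_ENTRIES + second_level * P2_ENTRIES) * ENTRY_BYTES


print(one_level_bytes(2**32) // 2**20, "MB")  # 4 MB
print(two_level_bytes(20 * 2**20) // 2**10, "KB")  # 24 KB
```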


Hierarchical page tables

Three-level page table scheme (3D) for 64 bit logical address space.

Same idea but have 2 outer page tables now.

Every process has to have its outer page table in memory: 2^42 too big.

2^32 still huge for top-level page table (4GB per process).


Hashed page tables

Solves the huge top-level page table problem of hierarchical page tables

The problem arises once the address space length exceeds a threshold

Hash a large page number p into a number in, say, [0, 1K]

Virtual Memory

Utilize memory management techniques to implement Virtual Memory

Again separation of the logical memory and physical memory

Virtual memory size can be much bigger than the physical memory

Idea
  Just bring a small portion of the program into memory initially
  Bring the rest whenever that part is needed (and evict the unused part)
  Initially bring the part that includes main()
  Then jump to a long function and bring it in

Benefit
  Execute more programs in parallel because each needs less storage in physical memory
  Logical address space can be much larger than physical memory


Virtual memory that is larger than physical memory


Typical virtual address space layout

Compiler sets this up

It places the code/instructions

Then data (global variables)

Then heap section to allocate storage for pointers, e.g. malloc returns space to you from the heap section (then you can write/read something to/from there)

Then run-time stack for called functions: function A calls B, which calls C, ...; the stack grows


Virtual memory can be implemented in the following way

Demand Paging: bring pages into memory when they are used, i.e., allocate memory for pages when they are used


Demand Paging: bring a page into memory only when it is needed
  Less I/O needed, no unnecessary I/O
  Less memory needed
  Faster response
  More users (each user program needs less space in physical memory)

Page is needed -> reference to it
  invalid reference -> abort
  not-in-memory -> bring to memory

Lazy swapper: never swap a page into memory unless the page will be needed

A swapper that deals with pages is a pager


Valid-invalid bit: who is currently in memory?

Initially all invalid: i

Page[2] is in memory but Page[n-1] is not

During address translation, if the bit is i -> page fault


Page table when some pages not in main memory

All pages are on disk all the time (must fit, otherwise can't run)

Instructions in Page[0] may be using an operand stored in Page[4], which causes a page fault! Suspend the program and bring the page in; schedule another program during the suspension


Page fault: not being able to find a page in memory

When triggered?
  While CPU is executing an instruction (w/ a memory operand/address at a different page not in physical memory)
  While CPU is fetching an instruction (the page containing the next instruction to execute is not in main/physical memory)

Handling?
  Get an empty frame to load the new page into
  Load the page to that empty frame (disk I/O)
  Reset page table (new index, validation bit)
  Restart the instruction that caused the page fault


Handling a page fault


Performance of demand paging

Page fault rate 0 <= p <= 1
  p = 0 means no page fault
  p = 1 means every reference is a page fault (page not in memory)

Effective Access Time to Memory: EAT

EAT = (1-p) memoryAccess + p(pageFaultOverhead + swap page out + swap page in + restartOverhead)
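To see how strongly the page fault rate dominates, here is a small sketch of the formula. The four fault-related terms are lumped into one `fault_service` parameter; the function name and the sample numbers (200 ns memory access, 8 ms fault service) are illustrative assumptions, not values from the slides.

```python
def eat_demand_paging(p: float, memory_access: float, fault_service: float) -> float:
    """EAT = (1 - p) * memoryAccess + p * (total page-fault service time).

    `fault_service` stands for pageFaultOverhead + swap out + swap in + restartOverhead.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("page fault rate p must be in [0, 1]")
    return (1 - p) * memory_access + p * fault_service


# Assumed numbers: 200 ns memory access, 8 ms (8,000,000 ns) to service a fault.
print(eat_demand_paging(0.0, 200, 8_000_000))              # 200.0
print(round(eat_demand_paging(0.001, 200, 8_000_000), 1))  # 8199.8
```

With these assumed values, one fault per 1,000 references already slows memory access by a factor of about 40.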


Performance of demand paging


COW: Copy-on-Write

Just another benefit of Virtual Memory

Used during process creation (fork()) for fast child creation


COW: Copy-on-Write

After fork()
  Child has its own address space that's duplicated from the parent
  Child has its own memory whose content is initially the same as the parent's

Since they're the same, initially we can have the child share the pages of the parent

No need to copy everything as long as child & parent are just reading those pages

Do the copying when 1 of the processes modifies/writes a page
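The sharing-then-copying behaviour can be mimicked with a toy model. This is a simulation only, not how a kernel implements fork(): the `Frame` and `Process` classes, the reference count, and the page contents are all assumptions made for illustration.

```python
class Frame:
    """A physical frame with a reference count (toy model)."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.refs = 1


class Process:
    def __init__(self, pages):
        self.pages = pages  # page number -> Frame, kept as a list

    def fork(self) -> "Process":
        # The child shares every frame with the parent; nothing is copied yet.
        for frame in self.pages:
            frame.refs += 1
        return Process(list(self.pages))

    def read(self, page: int, offset: int) -> int:
        return self.pages[page].data[offset]

    def write(self, page: int, offset: int, value: int) -> None:
        frame = self.pages[page]
        if frame.refs > 1:
            # Copy-on-write: give this process a private copy of the shared frame.
            frame.refs -= 1
            frame = Frame(bytes(frame.data))
            self.pages[page] = frame
        frame.data[offset] = value


parent = Process([Frame(b"AAAA"), Frame(b"BBBB"), Frame(b"CCCC")])
child = parent.fork()
child.write(2, 0, ord("X"))                 # the writer modifies page C
print(child.pages[0] is parent.pages[0])    # True: untouched pages stay shared
print(child.pages[2] is parent.pages[2])    # False: page C was copied on write
```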


COW: Copy-on-Write

Before Process 1 modifies Page C


COW: Copy-on-Write

After Process 1 modifies Page C


Page replacement: what happens if there is no free frame to put the new page in?

Find some not-in-use page in memory and swap it out

Which page to remove? An algorithm which results in the min # of page faults is preferable

With page replacement the same page may be brought into memory more than once


Prevent over-allocation by 1 process by giving each process a fixed # of frames to play with

Modify the page-fault service routine to include page replacement over those frames

Only modified pages are written back to disk (I/O) while being removed/replaced

Attach a modify (dirty) bit to each page in memory to do this, i.e., to reduce the overhead of page transfers

Separation b/w logical and physical memory is achieved w/ page replacement
  Compiler writers or assembly coders can now use a large virtual memory on a smaller physical memory


Need for page replacement


Basic page replacement

OS finds the location of the desired page on disk

Find a free frame
  If there is one, use it
  If there is none, use a page replacement policy to select a victim frame; if the victim is modified, write it back to disk
  See policies at Onur hoca's cool paging.js demo: Optimum, FIFO, Second chance, Least recently used (LRU)

Bring the desired page into the free frame; update page table

Restart the process at the instruction that caused the page fault


Basic page replacement


Expect fewer page faults as # frames increases


Page replacement policy: FIFO

Better policies prevent such anomalies.


FIFO Illustrating Belady’s Anomaly
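Belady's anomaly is easy to reproduce with a small FIFO simulator. The reference string below is the classic textbook one and is an assumption here, since the slide's figure did not survive; the function name is illustrative.

```python
from collections import deque


def fifo_faults(refs, n_frames: int) -> int:
    """Count page faults under FIFO replacement with n_frames frames."""
    frames = deque()  # oldest loaded page sits at the left
    faults = 0
    for page in refs:
        if page in frames:
            continue              # hit: FIFO order is not updated on a hit
        faults += 1
        if len(frames) == n_frames:
            frames.popleft()      # evict the page that was loaded first
        frames.append(page)
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]  # assumed classic reference string
print(fifo_faults(REFS, 3))  # 9
print(fifo_faults(REFS, 4))  # 10: more frames, yet more faults
```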


Other page replacement algorithms to know: Optimum, FIFO, Second chance, Least recently used (LRU). See Onur hoca's demo: paging.js
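For comparison, here is a minimal LRU sketch, which evicts the page whose last use is furthest in the past. The reference string is the classic textbook one (an assumption, not taken from the slides), and `lru_faults` is an illustrative name; this is not the paging.js demo's code.

```python
from collections import OrderedDict


def lru_faults(refs, n_frames: int) -> int:
    """Count page faults under LRU replacement with n_frames frames."""
    frames = OrderedDict()  # least recently used page sits at the front
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)    # hit: mark as most recently used
            continue
        faults += 1
        if len(frames) == n_frames:
            frames.popitem(last=False)  # evict the least recently used page
        frames[page] = None
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]  # assumed classic reference string
print(lru_faults(REFS, 3))  # 10
print(lru_faults(REFS, 4))  # 8: adding a frame does not hurt here
```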


Allocation of frames

So far we learnt which page to remove/replace in case of a page fault

Now learn how many frames we should allocate to a process

If the process doesn't have enough pages in memory then the page fault rate is high, which leads to
  Low CPU utilization (more I/O to retrieve pages from disk)
  OS thinks that it needs to increase the degree of multiprogramming (because CPU seems to be underutilized)
  Another process is added to the system, which makes it even worse

Thrashing: a process is busy doing I/O (swapping pages in and out)


Thrashing

Initially 1 process utilizes half of the CPU (half I/O)

As # processes increases, utilization increases as one or more of them need the CPU

After a while, not enough frames in memory causes page faults


To prevent thrashing, use demand paging: don't bring the whole program into memory at once. Just bring the needed pages.

This works because we have locality in program execution
  Some set of instructions (loops) is executed repeatedly
  Pages storing those instructions are heavily accessed for a while

When does thrashing occur? SUM size-of-locality > total memory size

No thrashing: guarantee SUM size-of-locality < total memory size


Working-set model

An algorithm to decide how many frames to give to each process

Can also be used as a page replacement algorithm

Look Delta references back to learn the heavily used pages: WS

Allocate |WS| frames as those pages are likely to be used again (locality)


Working-set model

How to use it as a page replacer?

If you have only 2 frames allocated and in those frames you have a page p1 in WS and another page p2 not in WS: remove p2


Working-set model

WSSi (working set size of Process Pi) = total # of pages referenced in the most recent Delta (varies in time)
  if Delta too small, it will not encompass/cover the entire locality
  if Delta too large, it will encompass several localities
  if Delta = INF, it will encompass the entire program

D = SUM WSSi = total demand for frames (approximation of locality)

if D > m (m: total # of available frames) -> Thrashing

Policy: if D > m, then suspend or swap out one of the processes
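A minimal sketch of the working-set computation, assuming Delta counts the most recent memory references (ending at time index t) and that m is the total number of available frames. The function names and the sample reference string are made up for illustration.

```python
def working_set(refs, t: int, delta: int) -> set:
    """Pages referenced in the last `delta` references ending at index t."""
    start = max(0, t - delta + 1)
    return set(refs[start:t + 1])


def thrashing_expected(wss_sizes, m: int) -> bool:
    """D = SUM WSSi; thrashing is expected when D exceeds the m available frames."""
    return sum(wss_sizes) > m


refs = [1, 2, 1, 3, 4, 4, 4, 3, 4, 4]  # assumed page reference string of one process
print(working_set(refs, t=9, delta=4))   # {3, 4}: the current locality
print(working_set(refs, t=9, delta=10))  # {1, 2, 3, 4}: Delta too large, several localities
print(thrashing_expected([2, 5, 4], m=10))  # True: D = 11 > m = 10
```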


Page-Fault Frequency (PFF) scheme

Alternative to the Working-set model for frame allocation

Dynamically tune the memory size of a process based on # page faults

Monitor page fault rate for each process (faults per sec)

If page fault rate above threshold, give process more memory
  Should cause process to fault less
  Doesn't always work! Recall Belady's Anomaly

If page fault rate below threshold, reduce memory allocation


You can understand from the fault curve where a locality starts and ends

# page faults increases as pages (of this locality) are brought in

When the needed pages (of the locality) are in memory, # page faults decreases


Memory-mapped files: treat file disk blocks as memory pages

Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory

A file is initially read using demand paging (map some blocks of the file into pages of virtual memory). A page-sized portion of the file is read from the file system into a physical page. Subsequent reads/writes to/from the file are treated as ordinary memory accesses

Simplifies and speeds up file access by driving file I/O through memory rather than read() and write() system calls

Also allows several processes to map the same file, allowing the pages in memory to be shared


Memory-mapped files: treat file disk blocks as memory pages

Process maps file to some portion of its logical addr space via mmap()
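As an illustration, Python's standard `mmap` module wraps the same mechanism. The sketch below maps a whole file and modifies it through ordinary slice assignment instead of read()/write() calls; the helper name `uppercase_first_bytes` and the temporary file contents are made up for the example.

```python
import mmap
import os
import tempfile


def uppercase_first_bytes(path: str, n: int) -> None:
    """Upper-case the first n bytes of a (non-empty) file through a memory mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the mapping is writable by default.
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[:n] = mm[:n].upper()  # plain memory access, no read()/write() calls
            mm.flush()               # push the modified page(s) back to the file


# Usage on a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")
os.close(fd)
uppercase_first_bytes(path, 5)
with open(path, "rb") as f:
    print(f.read())  # b'HELLO world'
os.remove(path)
```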


Memory-mapped files: treat file disk blocks as memory pages

Shared memory can be implemented this way (no disk access at all)


Allocating kernel memory

Allocate memory to dynamic kernel objects/structures

Static allocation during loading/booting is easy

So far we learnt allocating memory to processes: allocate frames in physical memory to the pages of the processes

Dynamic kernel objects?
  Creation of a process initiates a PCB structure in the kernel
  Creation of a semaphore, queues of condition variables


Allocating kernel memory

Why is dynamic kernel memory allocation a problem?

Because kernel objects are much smaller than processes' objects
  Semaphore: 8/16 bytes
  PCB: 1200 bytes
  Much less than the page size

Can't give a whole page to a semaphore (wasteful)

Some frames are reserved for dynamically allocated kernel objects

Solution
  We could use first/best-fit heap management but it causes external fragmentation
  Better techniques: Buddy system allocator, Slab allocator
