Operating Systems & Memory Systems: Address Translation CPS 220 Professor Alvin R. Lebeck Fall 2001


Operating Systems & Memory Systems: Address Translation

CPS 220
Professor Alvin R. Lebeck

Fall 2001


CPS 220 © Alvin R. Lebeck 2001

Outline

• Address Translation
  – basics
  – 64-bit Address Space
• Managing memory
• OS Performance
Throughout
• Review Computer Architecture
• Interaction with Architectural Decisions


System Organization
[Figure: block diagram. Labels: Processor, Cache, Core Chip Set, Main Memory, I/O Bus, Disk Controller (two Disks), Graphics Controller (Graphics), Network Interface (Network), interrupts.]


Computer Architecture

• Interface Between Hardware and Software
[Figure: layered diagram. Software: Applications, Compiler, Operating System. Hardware: CPU, Memory, I/O, Multiprocessor, Networks. The boundary between the two is marked "This is IT".]


Memory Hierarchy 101

[Figure: memory hierarchy, fastest level first]
• P (processor): very fast, 1ns clock, multiple instructions per cycle
• $ (cache): SRAM, fast, small, expensive
• Memory: DRAM, slow, big, cheap (called physical or main)
• Magnetic storage: really slow, really big, really cheap
=> Cost Effective Memory System (Price/Performance)


Virtual Memory: Motivation

• Process = Address Space + thread(s) of control
• Address space = PA
  – programmer controls movement from disk
  – protection?
  – relocation?
• Linear Address space
  – larger than physical address space
    » 32, 64 bits vs. 28-bit physical (256MB)
• Automatic management
[Figure: Virtual address space mapped to Physical address space]


Virtual Memory

• Process = virtual address space + thread(s) of control
• Translation
  – VA -> PA
  – What physical address does virtual address A map to?
  – Is VA in physical memory?
• Protection (access control)
  – Do you have permission to access it?


Virtual Memory: Questions

• How is data found if it is in physical memory?

• Where can data be placed in physical memory? Fully Associative, Set Associative, Direct Mapped

• What data should be replaced on a miss? (Take CPS210 …)


Segmented Virtual Memory

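• Virtual address (2^32, 2^64) to Physical Address mapping (2^30)
• Variable size, base + offset, contiguous in both VA and PA
[Figure: variable-size segments in the Virtual address space mapped to regions of the Physical address space; address labels shown: 0x0000, 0x1000, 0x2000, 0x6000, 0x9000, 0x11000]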
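The base + offset scheme above can be sketched in a few lines of Python. This is only an illustration: the `Segment` record, the `translate_segmented` helper, and the sample segment table are assumptions made for the example, not something the slides define. The limit check stands in for the bounds protection a variable-size segment needs.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    base: int    # physical address where the segment starts
    limit: int   # segment length in bytes (segments are variable size)

def translate_segmented(table, seg_id, offset):
    """Return base + offset, or raise if the offset falls outside the segment."""
    seg = table[seg_id]
    if not 0 <= offset < seg.limit:
        raise MemoryError(f"offset {offset:#x} is outside segment {seg_id}")
    return seg.base + offset

# Hypothetical table: segment 0 is 0x1000 bytes at 0x6000,
# segment 1 is 0x2000 bytes at 0x9000.
segments = {0: Segment(base=0x6000, limit=0x1000),
            1: Segment(base=0x9000, limit=0x2000)}
```

Because each segment is contiguous in both address spaces, one addition translates any address inside it; the cost is that physical memory must have a free hole large enough for the whole segment.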


Intel Pentium Segmentation

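[Figure: a Logical Address consists of a Seg Selector and an Offset. The selector picks a Segment Descriptor from the Global Descriptor Table (GDT); the descriptor's Segment Base Address plus the Offset locates the data in the Physical Address Space.]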


Pentium Segmentation (Continued)
• Segment Descriptors
  – Local and Global
  – base, limit, access rights
  – Can define many
• Segment Registers
  – contain segment descriptors (faster than load from mem)
  – Only 6
• Must load segment register with a valid entry before segment can be accessed
  – generally managed by compiler, linker, not programmer


Paged Virtual Memory

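• Virtual address (2^32, 2^64) to Physical Address mapping (2^28)
  – virtual page to physical page frame
• Fixed Size units for access control & translation
[Figure: fixed-size pages in the Virtual address space mapped to page frames in the Physical address space; address labels shown: 0x0000, 0x1000, 0x2000, 0x6000, 0x9000, 0x11000. A virtual address is split into Virtual page number and Offset.]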


Page Table

• Kernel data structure (per process)• Page Table Entry (PTE)

  – VA -> PA translations (if none, page fault)
  – access rights (Read, Write, Execute, User/Kernel, cached/uncached)
  – reference, dirty bits
• Many designs
  – Linear, Forward mapped, Inverted, Hashed, Clustered
• Design Issues
  – support for aliasing (multiple VA to single PA)
  – large virtual address space
  – time to obtain translation
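To make the PTE fields concrete, here is a minimal Python sketch of a per-process page table lookup. The 8KB page size, the `PTE` field names, the exception classes, and the use of a dict in place of a real linear table are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

PAGE_SHIFT = 13                       # assumed 8KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

class PageFault(Exception):
    pass

class AccessViolation(Exception):
    pass

@dataclass
class PTE:
    frame: int                 # physical page frame number
    writable: bool = False     # access rights (subset, for brevity)
    user: bool = True
    referenced: bool = False   # reference bit
    dirty: bool = False        # dirty bit

def translate(page_table, va, write=False, user=True):
    """Translate va through a dict that maps virtual page number -> PTE."""
    vpn, offset = va >> PAGE_SHIFT, va & PAGE_MASK
    pte = page_table.get(vpn)
    if pte is None:
        raise PageFault(hex(va))              # no translation: page fault
    if (write and not pte.writable) or (user and not pte.user):
        raise AccessViolation(hex(va))
    pte.referenced = True
    pte.dirty = pte.dirty or write
    return (pte.frame << PAGE_SHIFT) | offset
```

Aliasing falls out naturally in this layout: two virtual page numbers may hold PTEs with the same `frame`.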


Alpha VM Mapping (Forward Mapped)

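• “64-bit” address divided into 3 segments
  – seg0 (bit 63 = 0) user code/heap
  – seg1 (bit 63 = 1, 62 = 1) user stack
  – kseg (bit 63 = 1, 62 = 0) kernel segment for OS
• Three level page table, each one page
  – Alpha 21064 only 43 unique bits of VA
  – (future min page size up to 64KB => 55 bits of VA)
• PTE bits: valid, kernel & user read & write enable (No reference, use, or dirty bit)
  – What do you do for replacement?
[Figure: virtual address fields, high to low: seg 0/1 (21 bits), L1 (10 bits), L2 (10 bits), L3 (10 bits), PO (13 bits). The page table base plus L1 selects a first-level entry; that entry plus L2 selects a second-level entry; that entry plus L3 selects the PTE holding the phys page frame number.]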
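The three-level walk can be sketched as follows. This is a simplified Python model, not the Alpha hardware or PALcode: nested dicts stand in for page-sized tables, the seg bits are ignored, and the 8-byte PTE size used by `va_bits` is an assumption that happens to reproduce the 43-bit and 55-bit figures quoted above.

```python
LEVEL_BITS = 10      # one 8KB page of (assumed) 8-byte PTEs holds 1024 entries
OFFSET_BITS = 13     # 8KB pages

def split_va(va):
    """Split the low 43 bits of a virtual address into (L1, L2, L3, page offset)."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    vpn = va >> OFFSET_BITS
    mask = (1 << LEVEL_BITS) - 1
    l3 = vpn & mask
    l2 = (vpn >> LEVEL_BITS) & mask
    l1 = (vpn >> (2 * LEVEL_BITS)) & mask
    return l1, l2, l3, offset

def walk(root, va):
    """Follow three table levels; return the physical address, or None if unmapped."""
    l1, l2, l3, offset = split_va(va)
    table = root
    for index in (l1, l2):
        table = table.get(index)
        if table is None:
            return None
    frame = table.get(l3)
    if frame is None:
        return None
    return (frame << OFFSET_BITS) | offset

def va_bits(page_bytes, pte_bytes=8, levels=3):
    """Virtual address bits covered by `levels` one-page tables plus the page offset."""
    offset_bits = page_bytes.bit_length() - 1
    index_bits = (page_bytes // pte_bytes).bit_length() - 1
    return levels * index_bits + offset_bits
```

Under these assumptions `va_bits(8 * 1024)` gives 3 x 10 + 13 = 43 and `va_bits(64 * 1024)` gives 3 x 13 + 16 = 55. Each level of the walk costs one memory reference, which is why a TLB miss on a forward-mapped table is expensive.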


Inverted Page Table (HP, IBM)

• One PTE per page frame

– only one VA per physical frame

• Must search for virtual address

• More difficult to support aliasing

• Force all sharing to use the same VA

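[Figure: the Virtual page number of the address (Virtual page number | Offset) is hashed to index the Hash Anchor Table (HAT), which points into the Inverted Page Table (IPT); IPT entries hold VA, PA, and ST fields.]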
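A rough Python sketch of the HAT + IPT search follows. The class name, the chaining through a `next_frame` field, and the use of Python's built-in `hash` are assumptions for illustration; real designs from HP and IBM differ in the details.

```python
class InvertedPageTable:
    """One entry per physical frame; a hash anchor table (HAT) heads each chain."""

    def __init__(self, num_frames, hat_size=None):
        self.entries = [None] * num_frames            # frame -> (vpn, next_frame)
        self.hat = [None] * (hat_size or num_frames)  # bucket -> first frame in chain

    def _bucket(self, vpn):
        return hash(vpn) % len(self.hat)

    def map(self, vpn, frame):
        """Install vpn in `frame`, linking it at the head of its hash chain."""
        assert self.entries[frame] is None, "frame already holds a page"
        b = self._bucket(vpn)
        self.entries[frame] = (vpn, self.hat[b])
        self.hat[b] = frame

    def lookup(self, vpn):
        """Search the chain for vpn; return its frame number or None (page fault)."""
        frame = self.hat[self._bucket(vpn)]
        while frame is not None:
            entry_vpn, next_frame = self.entries[frame]
            if entry_vpn == vpn:
                return frame
            frame = next_frame
        return None
```

Since each frame stores exactly one virtual page number, the structure cannot directly record two virtual addresses for one frame, which is the aliasing difficulty noted above.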


Intel Pentium Segmentation + Paging

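[Figure: a Logical Address (Seg Selector | Offset) selects a Segment Descriptor from the Global Descriptor Table (GDT); the Segment Base Address plus the Offset gives an address in the Linear Address Space, split into Dir | Table | Offset. Dir indexes the Page Dir, Table indexes the Page Table, and the result plus Offset locates the data in the Physical Address Space.]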


The Memory Management Unit (MMU)

• Input
  – virtual address
• Output
  – physical address
  – access violation (exception, interrupts the processor)
• Access Violations
  – not present
  – user vs. kernel
  – write
  – read
  – execute


Translation Lookaside Buffers (TLB)

• Need to perform address translation on every memory reference
  – 30% of instructions are memory references
  – 4-way superscalar processor
  – at least one memory reference per cycle
• Make Common Case Fast, others correct
• Throw HW at the problem
• Cache PTEs


Fast Translation: Translation Buffer

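• Cache of translated addresses
• Alpha 21164 TLB: 48 entry fully associative
[Figure: the virtual address is split into Page Number and Page offset. Each of the 48 entries holds v, r, w bits, a tag, and a phys frame. The page number is compared against every tag, a 48:1 mux selects the matching phys frame, and the frame is combined with the page offset. Numbers 1 to 4 in the figure mark these steps.]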


TLB Design

• Must be fast, not increase critical path
• Must achieve high hit ratio
• Generally small, highly associative
• Mapping change
  – page removed from physical memory
  – processor must invalidate the TLB entry
• PTE is per process entity
  – Multiple processes with same virtual addresses
  – Context Switches?
• Flush TLB
• Add ASID (PID)
  – part of processor state, must be set on context switch
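The two context-switch options can be contrasted with a small Python model of a fully associative TLB. The class and method names are hypothetical, and the LRU replacement policy is an assumption of this sketch; the slides do not say which policy a given TLB uses.

```python
from collections import OrderedDict

class TLB:
    """Small fully associative TLB whose entries are tagged with an ASID."""

    def __init__(self, entries=48):
        self.capacity = entries
        self.map = OrderedDict()        # (asid, vpn) -> frame, oldest first

    def lookup(self, asid, vpn):
        key = (asid, vpn)
        if key not in self.map:
            return None                 # TLB miss
        self.map.move_to_end(key)       # LRU bookkeeping (assumed policy)
        return self.map[key]

    def insert(self, asid, vpn, frame):
        self.map[(asid, vpn)] = frame
        self.map.move_to_end((asid, vpn))
        if len(self.map) > self.capacity:
            self.map.popitem(last=False)    # evict the least recently used entry

    def invalidate(self, asid, vpn):
        """Called when a mapping changes, e.g. the page leaves physical memory."""
        self.map.pop((asid, vpn), None)

    def flush(self):
        """Alternative to ASIDs: drop every entry on a context switch."""
        self.map.clear()
```

With the ASID in the tag, two processes can use the same virtual page number without a flush, because `(1, vpn)` and `(2, vpn)` are distinct keys.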


Hardware Managed TLBs

• Hardware Handles TLB miss

• Dictates page table organization

• Complicated state machine to “walk page table”

– Multiple levels for forward mapped

– Linked list for inverted

• Exception only if access violation

[Figure: block diagram with CPU, TLB, Control, and Memory]


Software Managed TLBs

• Software Handles TLB miss

• Flexible page table organization

• Simple Hardware to detect Hit or Miss

• Exception if TLB miss or access violation

• Should you check for access violation on TLB miss?

[Figure: block diagram with CPU, TLB, Control, and Memory]


Mapping the Kernel
• Digital Unix Kseg
  – kseg (bit 63 = 1, 62 = 0)
• Kernel has direct access to physical memory
• One VA->PA mapping for entire Kernel
• Lock (pin) TLB entry
  – or special HW detection
[Figure: virtual address space from 0 to 2^64-1 containing User Code/Data, Kernel, and User Stack regions; the Kernel region maps onto Physical Memory.]


Considerations for Address Translation

Large virtual address space
• Can map more things
  – files
  – frame buffers
  – network interfaces
  – memory from another workstation
• Sparse use of address space
• Page Table Design
  – space
  – less locality => TLB misses
OS structure
• microkernel => more TLB misses


Address Translation for Large Address Spaces

• Forward Mapped Page Table
  – grows with virtual address space
    » worst case 100% overhead not likely
  – TLB miss time: memory reference for each level
• Inverted Page Table
  – grows with physical address space
    » independent of virtual address space usage
  – TLB miss time: memory reference to HAT, IPT, list search


Hashed Page Table (HP)

• Combine Hash Table and IPT [Huck96]
  – can have more entries than physical page frames
• Must search for virtual address
• Easier to support aliasing than IPT
• Space
  – grows with physical space
• TLB miss
  – one less memory ref than IPT
[Figure: the Virtual page number of the address (Virtual page number | Offset) is hashed to index the Hashed Page Table (HPT) directly; HPT entries hold VA, PA, and ST fields.]


Clustered Page Table (SUN)

• Combine benefits of HPT and Linear [Talluri95]

• Store one base VPN (TAG) and several PPN values

– virtual page block number (VPBN)

– block offset

[Figure: the address is split into VPBN | Boff | Offset. The VPBN is hashed to select a chain of entries; each entry holds a VPBN tag, a next pointer, and PA/attrib pairs (PA0 to PA3), one of which is selected by Boff.]
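A compact Python sketch of the clustered lookup is shown below. The block factor of 4 mirrors the PA0 to PA3 slots in the figure; the class name, bucket count, and `attrib` strings are assumptions, and Python lists stand in for the linked chains.

```python
BLOCK_PAGES = 4   # pages per block; the figure shows PA0..PA3

class ClusteredPageTable:
    def __init__(self, buckets=64):
        self.buckets = [[] for _ in range(buckets)]   # each bucket is a chain of nodes

    def _chain(self, vpbn):
        return self.buckets[vpbn % len(self.buckets)]

    def map(self, vpn, frame, attrib="rw"):
        """Store the mapping in the node tagged with vpn's block number."""
        vpbn, boff = divmod(vpn, BLOCK_PAGES)
        for tag, slots in self._chain(vpbn):
            if tag == vpbn:
                slots[boff] = (frame, attrib)
                return
        slots = [None] * BLOCK_PAGES
        slots[boff] = (frame, attrib)
        self._chain(vpbn).append((vpbn, slots))

    def lookup(self, vpn):
        """Return (frame, attrib) for vpn, or None if the page is unmapped."""
        vpbn, boff = divmod(vpn, BLOCK_PAGES)
        for tag, slots in self._chain(vpbn):
            if tag == vpbn:
                return slots[boff]
        return None
```

One tag and one next pointer are amortized over several neighboring pages, so dense regions cost less space per page than a hashed table with one entry per page.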


Reducing TLB Miss Handling Time

• Problem
  – must walk Page Table on TLB miss
  – usually incur cache misses
  – big problem for IPC in microkernels
• Solution
  – build a small second-level cache in SW
  – on TLB miss, first check SW cache
    » use simple shift and mask index to hash table
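The software second-level cache might look roughly like this in Python. The table size, the 8KB page size, and the `walk_page_table` callback are assumptions for illustration; a real handler would be a few instructions of kernel or PAL code rather than Python.

```python
PAGE_SHIFT = 13            # assumed 8KB pages
STLB_ENTRIES = 1024        # power of two, so the index is a simple mask

stlb = [None] * STLB_ENTRIES     # each slot holds (vpn, frame) or None

def stlb_index(va):
    return (va >> PAGE_SHIFT) & (STLB_ENTRIES - 1)   # shift and mask

def handle_tlb_miss(va, walk_page_table):
    """Return the frame for va, checking the software cache before the page table."""
    vpn = va >> PAGE_SHIFT
    slot = stlb[stlb_index(va)]
    if slot is not None and slot[0] == vpn:
        return slot[1]                        # hit: no page table walk needed
    frame = walk_page_table(vpn)              # slow path: full walk
    if frame is not None:
        stlb[stlb_index(va)] = (vpn, frame)   # direct-mapped: overwrite the slot
    return frame
```

The direct-mapped layout keeps the hit path to one index computation and one compare, which is the point of using shift and mask rather than a general hash.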


Next Time

• More TLB issues
• Virtual Memory & Caches
• Multiprocessor Issues


Operating Systems & Memory Systems: Managing the Memory System

CPS 220
Professor Alvin R. Lebeck


Review: Address Translation

• Map from virtual address to physical address
• Page Tables, PTE
  – va->pa, attributes
  – forward mapped, inverted, hashed, clustered
• Translation Lookaside Buffer
  – hardware cache of most recent va->pa translation
  – misses handled in hardware or software
• Implications of larger address space
  – page table size
  – possibly more TLB misses
• OS Structure
  – microkernels -> lots of IPC -> more TLB misses


Cache Memory 102

• Block 7 placed in 4 block cache:

– Fully associative, direct mapped, 2-way set associative

– S.A. Mapping = Block Number Modulo Number Sets

– DM = 1-way Set Assoc

• Cache Frame
  – location in cache
• Bit-selection
[Figure: Main Memory block 7 placed in a cache with frames 0 to 3. FA: any frame. DM: 7 mod 4 = frame 3. SA: 7 mod 2 = Set 1 (of Set 0 and Set 1).]
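The placement rule is easy to check with a short Python function. The function name and the convention that set `s` occupies frames `s*ways` through `s*ways + ways - 1` are assumptions of this sketch.

```python
def candidate_frames(block, num_frames, ways):
    """Frames where `block` may be placed in a cache of num_frames frames."""
    num_sets = num_frames // ways
    s = block % num_sets          # set = block number modulo number of sets
    return list(range(s * ways, (s + 1) * ways))
```

For block 7 in a 4-frame cache, `ways=1` (direct mapped) yields one frame, `ways=2` yields the two frames of set 1, and `ways=4` (fully associative) yields all four frames, matching DM = 1-way set associative at one extreme and a single set at the other.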


Cache Indexing

• Tag on each block
  – No need to check index or block offset
• Increasing associativity shrinks index, expands tag
  – Fully Associative: No index
  – Direct-Mapped: Large index
[Figure: address layout: Block Address (TAG | Index) followed by Block offset]
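The trade between index and tag bits can be computed directly. In this Python sketch the function names, the 32-bit address width, and the example cache parameters are assumptions; sizes are expected to be powers of two.

```python
def field_widths(cache_bytes, block_bytes, ways, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache."""
    offset_bits = block_bytes.bit_length() - 1
    num_sets = cache_bytes // (block_bytes * ways)
    index_bits = num_sets.bit_length() - 1
    return addr_bits - index_bits - offset_bits, index_bits, offset_bits

def split_address(addr, index_bits, offset_bits):
    """Bit-selection: carve an address into (tag, index, block offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

For a hypothetical 8KB cache with 32-byte blocks, going from direct mapped to 2-way moves one bit from the index to the tag, and the fully associative case (256 ways) has no index bits at all.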


Address Translation and Caches

• Where is the TLB wrt the cache?
• What are the consequences?
• Most of today’s systems have more than 1 cache
  – Digital 21164 has 3 levels
  – 2 levels on chip (8KB-data, 8KB-inst, 96KB-unified)
  – one level off chip (2-4MB)

• Does the OS need to worry about this?

Definition: page coloring = careful selection of va->pa mapping


TLBs and Caches

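[Figure: three organizations of CPU, TLB, cache ($), and memory (MEM)]
• Conventional Organization: CPU sends VA to the TLB; the TLB sends PA to the $; the $ sends PA to MEM
• Virtually Addressed Cache: CPU sends VA to the $ (VA Tags); the TLB sits between the $ and MEM
  – Translate only on miss
  – Alias (Synonym) Problem
• Overlapped: CPU sends VA to the $ (PA Tags) and the TLB in parallel; PA goes to the L2 $ and MEM
  – Overlap $ access with VA translation: requires $ index to remain invariant across translation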


Virtual Caches

• Send virtual address to cache. Called Virtually Addressed Cache or just Virtual Cache vs. Physical Cache or Real Cache

• Avoid address translation before accessing cache
  – faster hit time to cache
• Context Switches?
  – Just like the TLB (flush or pid)
  – Cost is time to flush + “compulsory” misses from empty cache
  – Add process identifier tag that identifies process as well as address within process: can’t get a hit if wrong process

• I/O must interact with cache


I/O and Virtual Caches
[Figure: block diagram. Labels: Processor, Virtual Cache, Memory Bus, Main Memory, I/O Bridge, I/O Bus, Disk Controller (two Disks), Graphics Controller (Graphics), Network Interface (Network), interrupts, Physical Addresses.]
I/O is accomplished with physical addresses
DMA
• flush pages from cache
• need pa->va reverse translation
• coherent DMA


Aliases and Virtual Caches

• Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address

• But, but... the virtual address is used to index the cache

• Could have data in two different locations in the cache

[Figure: virtual address space from 0 to 2^64-1 containing User Code/Data, Kernel, and User Stack regions; the Kernel region maps onto Physical Memory.]


Index with Physical Portion of Address
• If the index comes from the physical (untranslated) part of the address, the tag access can start in parallel with translation, and the result is compared against the physical tag
• Limits cache to page size: what if we want bigger caches and the same trick?
  – Higher associativity
  – Page coloring
[Figure: two views of the same address: Page Address | Page Offset, and Address Tag | Index | Block Offset]
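The size limit can be stated as a one-line check: the index and block offset bits must fit inside the page offset, so each way of the cache can be at most one page. The Python helpers below are an illustrative sketch with assumed names.

```python
def max_overlapped_cache_bytes(page_bytes, ways):
    """Largest cache whose index + block offset bits all lie inside the page offset."""
    return page_bytes * ways

def index_is_untranslated(cache_bytes, ways, page_bytes):
    """True if cache indexing uses only page-offset bits, so it can overlap the TLB."""
    return cache_bytes // ways <= page_bytes
```

With 8KB pages, a direct-mapped cache tops out at 8KB; a 32KB cache needs at least 4 ways, which is the "higher associativity" option. Page coloring is the other option: the OS keeps the extra index bits identical in the virtual and physical page numbers.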


Page Coloring for Aliases

• HW that guarantees that every cache frame holds unique physical address

• OS guarantee: lower n bits of virtual & physical page numbers must have same value; if direct-mapped, then aliases map to same cache frame

– one form of page coloring

[Figure: two views of the same address: Page Address | Page Offset, and Address Tag | Index | Block Offset]


Virtual Memory and Physically Indexed Caches

• Notion of bin
  – region of cache that may contain cache blocks from a page
• Random vs careful mapping
• Selection of physical page frame dictates cache index
• Overall goal is to minimize cache misses
[Figure: Page frames mapped onto bins of the Cache]


Careful Page Mapping

[Kessler92, Bershad94]
• Select a page frame such that cache conflict misses are reduced
  – only choose from available pages (no replacement induced)
• static
  – “smart” selection of page frame at page fault time
• dynamic
  – move pages around


Page Coloring

• Make physical index match virtual index
• Behaves like virtual index cache
  – no conflicts for sequential pages
• Possibly many conflicts between processes
  – address spaces all have same structure (stack, code, heap)
  – modify to xor PID with address (MIPS used variant of this)
• Simple implementation
• Pick arbitrary page if necessary
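A minimal Python sketch of the page coloring policy follows. The helper names, the list used as a free-frame pool, and the choice of "color = page number modulo number of colors" are assumptions for illustration; the PID xor variant is not modeled.

```python
def num_colors(cache_bytes, ways, page_bytes):
    """Number of page-sized bins (colors) in a physically indexed cache."""
    return max(1, cache_bytes // (ways * page_bytes))

def color_of(page_number, colors):
    return page_number % colors          # low-order bits of the page number

def pick_frame_page_coloring(vpn, free_frames, colors):
    """Prefer a free frame whose color matches the virtual page; else take any frame."""
    want = color_of(vpn, colors)
    for frame in free_frames:
        if color_of(frame, colors) == want:
            free_frames.remove(frame)
            return frame
    return free_frames.pop() if free_frames else None   # arbitrary page if necessary
```

Consecutive virtual pages get consecutive colors, so they land in different bins and cannot conflict with each other; two processes with the same layout, however, compete for the same colors.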


Bin Hopping

• Allocate sequentially mapped pages (time) to sequential bins (space)

• Can exploit temporal locality– pages mapped close in time will be accessed close in time

• Search from last allocated bin until bin with available page frame

• Separate search list per process
• Simple implementation
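Bin hopping can be sketched as a round-robin search that resumes from the last bin used. The `BinHopper` class and the `free_by_bin` structure (one list of free frames per bin) are assumptions made for this Python example.

```python
class BinHopper:
    """Per-process allocator that places successively mapped pages in successive bins."""

    def __init__(self, num_bins):
        self.num_bins = num_bins
        self.last_bin = -1                 # search position is kept per process

    def pick_frame(self, free_by_bin):
        """free_by_bin[b] lists the free frames in the system that fall in bin b."""
        for step in range(1, self.num_bins + 1):
            b = (self.last_bin + step) % self.num_bins
            if free_by_bin[b]:
                self.last_bin = b
                return free_by_bin[b].pop()
        return None                        # no free frame in any bin
```

Unlike page coloring, the bin depends on the order in which pages are faulted in rather than on the virtual address, so pages mapped close together in time end up in different bins.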


Best Bin

• Keep track of two counters per bin
  – used: # of pages allocated to this bin for this address space
  – free: # of available pages in the system for this bin
• Bin selection is based on low values of used and high values of free
• Low used value
  – reduce conflicts within the address space
• High free value
  – reduce conflicts between address spaces
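One way to combine the two counters is shown in the Python sketch below. The slides only say that low `used` and high `free` are preferred; ranking by `used` first and breaking ties with `free` is an assumption of this example, as are the function name and the list-based counters.

```python
def pick_best_bin(used, free):
    """used[b]: pages this address space has in bin b; free[b]: free frames in bin b."""
    candidates = [b for b in range(len(free)) if free[b] > 0]
    if not candidates:
        return None
    # Assumed ranking: lowest `used` first, ties broken by highest `free`.
    best = min(candidates, key=lambda b: (used[b], -free[b]))
    used[best] += 1
    free[best] -= 1
    return best
```

The scan over all bins is linear in the number of bins, which is the cost the hierarchical variant is meant to reduce.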


Hierarchical

• Best bin could be linear in # of bins
• Build a tree

– internal nodes contain sum of child <used,free> values

• Independent of cache size
  – simply stop at a particular level in the tree


Benefit of Static Page Coloring

• Reduces cache misses by 10% to 20%
• Multiprogramming

– want to distribute mapping to avoid inter-address space conflicts


Dynamic Page Coloring

• Cache Miss Lookaside (CML) buffer [Bershad94]
  – proposed hardware device
• Monitor # of misses per page
• If # of misses >> # of cache blocks in page
  – must be conflict misses
  – interrupt processor
  – move a page (recolor)

• Cost of moving page << benefit


Outline

• Page Coloring
• Page Size


A Case for Large Pages

• Page table size is inversely proportional to the page size

– memory saved

• Fast cache hit time is easy when cache <= page size (VA caches)
  – bigger page makes it feasible as cache size grows
• Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
• Number of TLB entries is restricted by clock cycle time
  – larger page size maps more memory
  – reduces TLB misses
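Two of these arguments are simple arithmetic, sketched here in Python. The function names and the parameter values in the note below (32-bit virtual addresses, 4-byte PTEs) are assumptions for illustration; the 48-entry TLB is the Alpha 21164 figure quoted earlier in the deck.

```python
def linear_page_table_bytes(va_bits, page_bytes, pte_bytes=8):
    """Size of a flat page table covering the whole virtual address space."""
    return (2 ** va_bits // page_bytes) * pte_bytes

def tlb_reach_bytes(entries, page_bytes):
    """Memory mapped by a TLB when every entry holds a distinct page."""
    return entries * page_bytes
```

Doubling the page size from 4KB to 8KB halves a flat table for a 32-bit address space with 4-byte PTEs, from 4MB to 2MB. A 48-entry TLB reaches 384KB with 8KB pages but 3MB with 64KB pages.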


A Case for Small Pages

• Fragmentation
  – large pages can waste storage
  – data must be contiguous within page

• Quicker process start for small processes(??)


Superpages

• Hybrid solution: multiple page sizes
  – 8KB, 16KB, 32KB, 64KB pages
  – 4KB, 64KB, 256KB, 1MB, 4MB, 16MB pages
• Need to identify candidate superpages
  – Kernel
  – Frame buffers
  – Database buffer pools
• Application/compiler hints
• Detecting superpages
  – static, at page fault time
  – dynamically create superpages
• Page Table & TLB modifications