CS152 – Computer Architecture and Engineering
Lecture 15 – Virtual Memory
2004-10-21
Dave Patterson (www.cs.berkeley.edu/~patterson)
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L15 Virtual Memory, UC Regents Fall 2004 © UCB



Last Time: How to Design a Cache

[Figure: cache block diagram. Tags and Blocks arrays with Addr, Din and Dout ports facing the CPU and the lower-level memory, sequenced by a control State Machine.]

Most design errors come from incorrect specification of state machine behavior!

Common bugs: stalls, block replacement, write buffer
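The control box in the diagram can be sketched as a next-state function. The four states (Idle, Compare Tag, Write-Back, Allocate) are an assumption borrowed from a common textbook write-back, write-allocate cache controller, not the lab's required design, and the signal names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    COMPARE_TAG = auto()
    WRITE_BACK = auto()
    ALLOCATE = auto()


def next_state(state, request_valid, hit, dirty, memory_ready):
    """One transition of a simple write-back, write-allocate cache controller.

    request_valid: the CPU has a pending load or store
    hit:           the tag check matched a valid block
    dirty:         the block being replaced has been written
    memory_ready:  the lower-level memory finished the current transfer
    """
    if state is State.IDLE:
        return State.COMPARE_TAG if request_valid else State.IDLE
    if state is State.COMPARE_TAG:
        if hit:
            return State.IDLE  # hit: complete the access, CPU proceeds
        return State.WRITE_BACK if dirty else State.ALLOCATE
    if state is State.WRITE_BACK:
        # CPU stays stalled until the old block is in lower-level memory
        return State.ALLOCATE if memory_ready else State.WRITE_BACK
    if state is State.ALLOCATE:
        # once the new block arrives, repeat the tag check (it now hits)
        return State.COMPARE_TAG if memory_ready else State.ALLOCATE
    raise ValueError(f"unknown state {state}")
```

Write-Back and Allocate are the states in which the CPU stays stalled; every transition out of them is a place for the stall and block-replacement bugs listed above to hide.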


Today’s Lecture - Virtual Memory

Virtual address spaces

Page table layout

TLB design options

DRAM technology


DRAM Technology


Why DRAM over SRAM? Density!

Volatile Memory Comparison
• SRAM Cell
  • Larger cell ⇒ lower density, higher cost/bit
  • No refresh required
  • Simple read ⇒ faster access
  • Standard IC process ⇒ natural for integration with logic
• DRAM Cell
  • Smaller cell ⇒ higher density, lower cost/bit
  • Needs periodic refresh, and refresh after read
  • Complex read ⇒ longer access time
  • Special IC process ⇒ difficult to integrate with logic circuits
[Figure: SRAM and DRAM cells, each drawn at a word line / bit line intersection]
The primary difference between different memory types is the bit cell.

SRAM Cell: Large
  6 transistors, nFET and pFET
  3 interface wires
  Vdd and Gnd
DRAM Cell: Small
  1 transistor + capacitor, nFET only
  2 interface wires
  no Vdd

(Embedded slides from 4/12/04 ©UCB Spring 2004 CS152 / Kubiatowicz, Lec19.21 to Lec19.24)

1-Transistor Memory Cell (DRAM)
° Write:
  • 1. Drive bit line
  • 2. Select row
° Read:
  • 1. Precharge bit line to Vdd/2
  • 2. Select row
  • 3. Cell and bit line share charges
    - Very small voltage changes on the bit line
  • 4. Sense (fancy sense amp)
    - Can detect changes of ~1 million electrons
  • 5. Write: restore the value
° Refresh:
  • 1. Just do a dummy read to every cell.
[Figure: 1-T cell at the intersection of a row select line and a bit line]
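Read steps 1 to 5 can be made concrete with charge conservation: once the row is selected, the cell capacitor and the precharged bit line settle to a capacitance-weighted average voltage. The sketch below assumes illustrative values that are not from the lecture (cell capacitance 30 fF, bit-line capacitance 300 fF, Vdd = 2.5 V) and models the sense amp as an ideal comparator against Vdd/2.

```python
ELECTRON = 1.602e-19  # coulombs


def bitline_swing(v_cell, vdd, c_cell, c_bit):
    """Bit-line voltage change after the cell shares charge with it.

    Step 1 precharges the bit line to vdd/2; step 2 connects the cell;
    step 3: total charge is conserved, so both settle to one voltage.
    """
    v_pre = vdd / 2
    v_shared = (c_cell * v_cell + c_bit * v_pre) / (c_cell + c_bit)
    return v_shared - v_pre


def sense(swing):
    """Idealized sense amp (step 4): a positive swing reads as 1."""
    return 1 if swing > 0 else 0


def read_bit(cell_value, vdd=2.5, c_cell=30e-15, c_bit=300e-15):
    """Read one cell; returns (bit, restored cell voltage).

    The read is destructive, so the full value is driven back into the
    cell (step 5). Refresh is this same read with the bit discarded.
    """
    v_cell = vdd if cell_value else 0.0
    bit = sense(bitline_swing(v_cell, vdd, c_cell, c_bit))
    return bit, (vdd if bit else 0.0)


# Charge stored for a 1 with the assumed values: roughly half a million
# electrons, the same order of magnitude as the slide's figure.
stored_electrons = 30e-15 * 2.5 / ELECTRON
```

With a bit line ten times the cell capacitance, the swing is only about 0.11 V, which is why the slide stresses the "very small voltage changes" and the need for a sensitive sense amp.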


DRAM Capacitors: more capacitance in a small area
° Trench capacitors:
  • Logic ABOVE capacitor
  • Gain in surface area of capacitor
  • Better Scaling properties
  • Better Planarization
° Stacked capacitors:
  • Logic BELOW capacitor
  • Gain in surface area of capacitor
  • 2-dim cross-section quite small


Classical DRAM Organization (square)
[Figure: RAM cell array. A row decoder, driven by the row address, asserts one word (row) select line; the bit (data) lines run to the column selector & I/O circuits, which use the column address to select the data bit. Each intersection represents a 1-T DRAM cell.]
° Row and Column Address together:
  • Select 1 bit at a time


DRAM logical organization (4 Mbit)
° Square root of bits per RAS/CAS
[Figure: Memory Array (2,048 x 2,048) with a Column Decoder and Sense Amps & I/O; 11 address lines A0…A10; data in D and data out Q; each Storage Cell sits on a Word Line.]
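The "square root of bits per RAS/CAS" rule means a 4 Mbit part (2,048 x 2,048 = 2^22 cells) needs only 11 address pins: the controller sends one half of the address with RAS and the other half with CAS. A sketch, assuming (as is common, but not stated on the slide) that the row comes from the high-order bits; the function names are hypothetical.

```python
ROW_BITS = 11                    # 2,048 rows
COL_BITS = 11                    # 2,048 columns
PINS = max(ROW_BITS, COL_BITS)   # A0..A10, shared by row and column


def ras_cas(cell_address):
    """Split a 22-bit cell address into (row, column) pin values."""
    if not 0 <= cell_address < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address outside a 4 Mbit (2^22-cell) array")
    row = cell_address >> COL_BITS                # driven first, with RAS
    col = cell_address & ((1 << COL_BITS) - 1)    # driven second, with CAS
    return row, col


def cell_address(row, col):
    """Inverse of ras_cas: rebuild the linear cell address."""
    return (row << COL_BITS) | col
```

Multiplexing halves the pin count at the cost of two address phases per access.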

Density advantage: 3X to 10X, depends on metric


DRAM: Reading, Writing, Refresh
Writing DRAM:
Drive data on bit line
Select row


Reading DRAM

Select row
Sense bit line (~1 million electrons)
Write value back


Refresh: a dummy read
Capacitor holds state for 60 ms, then must do “refresh”
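The 60 ms retention time sets a refresh budget. Activating a row restores every cell in it, so a controller can spread the row refreshes evenly across the retention window. The arithmetic below assumes the 2,048-row array from the 4 Mbit example and a hypothetical 100 ns row cycle; neither figure is given for this slide.

```python
def refresh_interval_s(retention_s, rows):
    """Time between row refreshes if all rows are refreshed evenly within retention_s."""
    return retention_s / rows


def refresh_overhead(retention_s, rows, row_cycle_s):
    """Fraction of time the array is busy refreshing instead of serving accesses."""
    return rows * row_cycle_s / retention_s


interval = refresh_interval_s(60e-3, 2048)         # about 29.3 microseconds
overhead = refresh_overhead(60e-3, 2048, 100e-9)   # about 0.34 percent
```

Under these assumptions refresh costs well under one percent of the array's time, but any access that collides with a refresh still sees the extra delay.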

To learn more: DRAM Circuit Design: A Tutorial, Brent Keeth and R. Jacob Baker, Wiley-IEEE Press, November 2000, ISBN 0-7803-6014-1


Synchronous DRAM (SDRAM) Interface

128Mb: x4, x8, x16 SDRAM (Micron Technology, Inc., 128MSDRAM_E.p65, Rev. E, Pub. 1/02, ©2001)

Operating Mode
The normal operating mode is selected by setting M7 and M8 to zero; the other combinations of values for M7 and M8 are reserved for future use and/or test modes. The programmed burst length applies to both READ and WRITE bursts.

Test modes and reserved states should not be used because unknown operation or incompatibility with future versions may result.

Write Burst Mode
When M9 = 0, the burst length programmed via M0-M2 applies to both READ and WRITE bursts; when M9 = 1, the programmed burst length applies to READ bursts, but write accesses are single-location (nonburst) accesses.

CAS Latency
The CAS latency is the delay, in clock cycles, between the registration of a READ command and the availability of the first piece of output data. The latency can be set to two or three clocks.

If a READ command is registered at clock edge n, and the latency is m clocks, the data will be available by clock edge n + m. The DQs will start driving as a result of the clock edge one cycle earlier (n + m - 1), and provided that the relevant access times are met, the data will be valid by clock edge n + m. For example, assuming that the clock cycle time is such that all relevant access times are met, if a READ command is registered at T0 and the latency is programmed to two clocks, the DQs will start driving after T1 and the data will be valid by T2, as shown in Figure 2. Table 2 below indicates the operating frequencies at which each CAS latency setting can be used.

Reserved states should not be used as unknown operation or incompatibility with future versions may result.

Figure 2: CAS Latency
[Timing diagrams for CAS Latency = 2 and CAS Latency = 3: CLK over T0 to T4; COMMAND is READ at T0 followed by NOPs; DQ shows DOUT, with timing parameters tLZ, tAC and tOH; legend: DON’T CARE, UNDEFINED.]

Table 2: CAS Latency

        ALLOWABLE OPERATING FREQUENCY (MHz)
SPEED   CAS LATENCY = 2   CAS LATENCY = 3
-7E     ≤ 133             ≤ 143
-75     ≤ 100             ≤ 133
-8E     ≤ 100             ≤ 125

Note! This example is best-case! For a random access, DRAM takes many more than 2 cycles!

A clocked bus protocol (ex: 100 MHz)
Cache controller puts commands on bus
Data comes out several cycles later.
From Micron 128 Mb SDRAM data sheet (on “resources” web page)
(CAS = Column Address Strobe)
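The datasheet's timing rule and Table 2 fit in a few lines. This is a sketch: the frequency limits are transcribed from Table 2, the function names are hypothetical, and, as the note says, it models only the best case of a READ to a row that is already open.

```python
# Maximum clock (MHz) per speed grade and CAS latency, from Table 2.
MAX_FREQ_MHZ = {
    "-7E": {2: 133, 3: 143},
    "-75": {2: 100, 3: 133},
    "-8E": {2: 100, 3: 125},
}


def read_timing(read_edge, cas_latency):
    """(edge where the DQs start driving, edge by which data is valid)."""
    if cas_latency not in (2, 3):
        raise ValueError("the latency can be set to two or three clocks")
    return read_edge + cas_latency - 1, read_edge + cas_latency


def min_cas_latency(speed_grade, freq_mhz):
    """Smallest allowed CAS latency at this clock, or None if the part is too slow."""
    for latency in (2, 3):
        if freq_mhz <= MAX_FREQ_MHZ[speed_grade][latency]:
            return latency
    return None


def first_data_ns(freq_mhz, cas_latency):
    """Best-case delay from READ to first data word."""
    return cas_latency * 1000.0 / freq_mhz
```

At the slide's 100 MHz example, a -75 part can run at CAS latency 2, so the first word is valid 20 ns after the READ is registered.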


Administrivia - Lab 3, HW 3, Lab 4

Homework 3 due 10/26 (Tuesday), 283 Soda, in CS 152 box at 5 PM

Lab 3 final demo on 10/22 (Friday)

Lab 3 report due: Monday 10/25 11:59 PM

Lab 4 to be posted on 10/22 (Friday) (if all goes well).


Virtual Addressing


The Limits of Physical Addressing

[Figure: CPU connected directly to Memory by address lines A0-A31 and data lines D0-D31; the addresses are the “physical addresses” of memory locations.]

All programs share one address space: The physical address space

No way to prevent a program from accessing any machine resource

Machine language programs must be aware of the machine organization

Where we are in CS 152 ...


Solution: Add a Layer of Indirection

[Figure: CPU issues “virtual addresses” on A0-A31 to the Address Translation hardware, which drives “physical addresses” on A0-A31 to Memory; data travels on D0-D31.]
User programs run in a standardized virtual address space
Address Translation hardware managed by the operating system (OS) maps virtual address to physical memory
Hardware supports “modern” OS features: Protection, Translation, Sharing


MIPS R4000: Address Space Model
[Figure: Process A (ASID = 12) and Process B (ASID = 13) side by side. In each, addresses 0 to 2^31 - 1 form a 2 GB space the process can use; addresses 2^31 to 2^32 - 1 may only be accessed by kernel/supervisor, and a user access there is an Address Error.]
Process A and B have independent address spaces
ASID = Address Space Identifier
When Process A writes its address 9, it writes to a different physical memory location than Process B’s address 9
To let Process A and B share memory, OS maps parts of ASID 12 and ASID 13 to the same physical memory locations.
All address spaces “translated” to standard map
Still works (slowly!) if a process accesses more virtual memory than the machine has physical memory
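The address-space model above can be sketched as a per-ASID map from virtual page to physical frame. Everything here is an illustrative assumption: 4 KB pages, a Python dict standing in for the OS's page tables, and made-up frame numbers. User addresses at or above 2^31 raise an address error, as in the figure.

```python
PAGE_BITS = 12          # assume 4 KB pages
USER_LIMIT = 1 << 31    # user programs see 0 .. 2^31 - 1


class AddressError(Exception):
    pass


class PageFault(Exception):
    pass


# page_tables[asid][virtual_page] = physical_frame  (hypothetical contents)
page_tables = {
    12: {0: 100, 1: 101, 5: 300},   # Process A
    13: {0: 200, 1: 201, 7: 300},   # Process B; frame 300 is shared with A
}


def translate(asid, vaddr):
    """Map a user virtual address in address space `asid` to a physical address."""
    if not 0 <= vaddr < USER_LIMIT:
        raise AddressError(hex(vaddr))
    vpn, offset = vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)
    frame = page_tables[asid].get(vpn)
    if frame is None:
        raise PageFault((asid, vpn))   # the OS would load or allocate the page
    return (frame << PAGE_BITS) | offset
```

Address 9 lands in frame 100 for ASID 12 and frame 200 for ASID 13, while virtual page 5 of A and virtual page 7 of B reach the same shared frame.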


MIPS R4000: Who’s Running on the CPU?
System Control Registers

MIPS R4000 Microprocessor User’s Manual, Chapter 4 (p. 80)
4.3 System Control Coprocessor
The System Control Coprocessor (CP0) is implemented as an integral part of the CPU, and supports memory management, address translation, exception handling, and other privileged operations. CP0 contains the registers shown in Figure 4-7 plus a 48-entry TLB. The sections that follow describe how the processor uses the memory management-related registers†.

Each CP0 register has a unique number that identifies it; this number is referred to as the register number. For instance, the Page Mask register is register number 5.

Figure 4-7 CP0 Registers and the TLB
† For a description of CP0 data dependencies and hazards, please see Appendix F.
[Figure: CP0 registers, with register numbers: Index (0), Random (1), EntryLo0 (2), EntryLo1 (3), Context (4), PageMask (5), Wired (6), BadVAddr (8), Count (9), EntryHi (10), Compare (11), Status (12), Cause (13), EPC (14), PRId (15), Config (16), LLAddr (17), WatchLo (18), WatchHi (19), XContext (20), ECC (26), CacheErr (27), TagLo (28), TagHi (29), ErrorEPC (30); and the 48-entry TLB (entries 0 to 47), whose low “safe” entries are set by the Wired register (see Random Register, contents of TLB Wired). The figure marks some registers as used with the memory management system and others as used with exception processing (see Chapter 5 for details).]

User cannot write supervisor/kernel bits. Supervisor cannot write kernel bit.

User cannot change address translation configuration

Status (12): Indicates user, supervisor, or kernel mode

EntryHi (10): 8-bit ASID field codes virtual address space ID.
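Decoding the two fields the slide points at is plain bit extraction. The bit positions below are assumed from the R4000 User's Manual (Status: KSU in bits 4:3, EXL in bit 1, ERL in bit 2; EntryHi: ASID in bits 7:0) and are worth checking against Chapters 4 and 5 before relying on them; the helper names are hypothetical.

```python
KSU_SHIFT, KSU_MASK = 3, 0b11   # Status bits 4:3
EXL_BIT = 1 << 1                # exception level
ERL_BIT = 1 << 2                # error level
MODES = {0b00: "kernel", 0b01: "supervisor", 0b10: "user"}


def current_mode(status):
    """Privilege mode encoded in a Status (CP0 register 12) value."""
    if status & (EXL_BIT | ERL_BIT):
        return "kernel"         # exception and error handling run in kernel mode
    ksu = (status >> KSU_SHIFT) & KSU_MASK
    if ksu not in MODES:
        raise ValueError("KSU = 0b11 is undefined")
    return MODES[ksu]


def current_asid(entry_hi):
    """8-bit address space ID from an EntryHi (CP0 register 10) value."""
    return entry_hi & 0xFF
```

Because both registers live in CP0, only privileged code can change them, which is what keeps a user program from switching its own mode or ASID.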


MIPS Address Translation: How does it work?
[Figure: CPU issues “virtual addresses” on A0-A31 to the Translation Look-Aside Buffer (TLB), which drives “physical addresses” on A0-A31 to Memory; data travels on D0-D31.]
TLB also contains ASID and kernel/supervisor bits for virtual address
Translation Look-Aside Buffer (TLB): A small fully-associative cache of mappings from virtual to physical addresses

Fast common case: Virtual address is in TLB, process has permission to read/write it.

What is the table of mappings that it caches?
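A fully-associative TLB compares the virtual page number and ASID against every entry at once; in software that parallel compare becomes a loop. A sketch, assuming a simplified entry format (one 4 KB page per entry, a writable bit, a global bit that matches any ASID); the real R4000 entry maps an even/odd page pair and carries more fields.

```python
from dataclasses import dataclass

PAGE_BITS = 12  # assume 4 KB pages


@dataclass
class TlbEntry:
    vpn: int
    asid: int
    pfn: int
    valid: bool = True
    writable: bool = True
    is_global: bool = False   # global entries match any ASID


class TlbMiss(Exception):
    pass


class ProtectionFault(Exception):
    pass


def tlb_translate(entries, asid, vaddr, is_write):
    """Fast path: return the physical address, or raise on a miss or a protection error."""
    vpn, offset = vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)
    for e in entries:   # hardware checks every entry in parallel
        if e.valid and e.vpn == vpn and (e.is_global or e.asid == asid):
            if is_write and not e.writable:
                raise ProtectionFault(hex(vaddr))
            return (e.pfn << PAGE_BITS) | offset
    raise TlbMiss((asid, vpn))
```

The common case is the `return` inside the loop: the page is present and the access is permitted. Both exceptions hand control to the OS.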


Page tables encode virtual address spaces

A machine usually supports pages of a few sizes (MIPS R4000):

MIPS R4000 Microprocessor User’s Manual, p. 87: Memory Management

EntryLo0 (2), and EntryLo1 (3) Registers
The EntryLo register consists of two registers that have identical formats:
• EntryLo0 is used for even virtual pages.
• EntryLo1 is used for odd virtual pages.
The EntryLo0 and EntryLo1 registers are read / write registers. They hold the physical page frame number (PFN) of the TLB entry for even and odd pages, respectively, when performing TLB read and write operations. Figure 4-10 shows the format of these registers.

PageMask Register (5)
The PageMask register is a read / write register used for reading from or writing to the TLB; it holds a comparison mask that sets the variable page size for each TLB entry, as shown in Table 4-9.

TLB read and write operations use this register as either a source or a destination; when virtual addresses are presented for translation into physical address, the corresponding bits in the TLB identify which virtual address bits among bits 24:13 are used in the comparison. When the Mask field is not one of the values shown in Table 4-9, the operation of the TLB is undefined.

Table 4-9 Mask Field Values for Page Sizes

Page Size   Bit 24 23 22 21 20 19 18 17 16 15 14 13
4 Kbytes         0  0  0  0  0  0  0  0  0  0  0  0
16 Kbytes        0  0  0  0  0  0  0  0  0  0  1  1
64 Kbytes        0  0  0  0  0  0  0  0  1  1  1  1
256 Kbytes       0  0  0  0  0  0  1  1  1  1  1  1
1 Mbyte          0  0  0  0  1  1  1  1  1  1  1  1
4 Mbytes         0  0  1  1  1  1  1  1  1  1  1  1
16 Mbytes        1  1  1  1  1  1  1  1  1  1  1  1

[Figure: a virtual address space divided into pages; a page table (one per ASID), indexed by the virtual address, points each valid page at a frame in the physical memory space.]
A virtual address space is divided into blocks of memory called pages
A page table is indexed by a virtual address
A valid page table entry codes physical memory “frame” address for the page
OS manages the page table for each ASID
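Table 4-9 follows a simple pattern: each step up in page size (a factor of 4) sets two more mask bits, starting at bit 13. The helpers below (hypothetical names) encode that pattern; the list of legal sizes is taken from the table.

```python
# Page sizes listed in Table 4-9, smallest to largest.
PAGE_SIZES = [4 << 10, 16 << 10, 64 << 10, 256 << 10, 1 << 20, 4 << 20, 16 << 20]
MASK_SHIFT = 13                     # the Mask field occupies bits 24:13
MASK_FIELD = 0xFFF << MASK_SHIFT


def page_mask(page_size):
    """PageMask register value for a supported page size."""
    if page_size not in PAGE_SIZES:
        raise ValueError("page size not in Table 4-9")
    # e.g. 16 KB is 4x the 4 KB base size, so bits 14:13 hold 4 - 1 = 0b11
    return (page_size // (4 << 10) - 1) << MASK_SHIFT


def page_size(mask):
    """Inverse of page_mask; rejects values whose TLB behavior is undefined."""
    size = (((mask & MASK_FIELD) >> MASK_SHIFT) + 1) * (4 << 10)
    if mask & ~MASK_FIELD or size not in PAGE_SIZES:
        raise ValueError("undefined PageMask value")
    return size
```

Every extra mask bit removes one address bit from the TLB comparison and moves it into the page offset.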


(Embedded slides from 4/19/04 ©UCB Spring 2004 CS152 / Kubiatowicz, Lec21.21 to Lec21.28)

Recall: Levels of the Memory Hierarchy

Level           Capacity     Access Time    Cost
CPU Registers   100s Bytes   <10s ns
Cache           K Bytes      10-100 ns      $.01-.001/bit
Main Memory     M Bytes      100ns-1us      $.01-.001
Disk            G Bytes      ms             10^-3 - 10^-4 cents
Tape            infinite     sec-min        10^-6

Staging/transfer unit between adjacent levels (managed by):
Registers <-> Cache: Instr. Operands, 1-8 bytes (prog./compiler)
Cache <-> Memory: Blocks, 8-128 bytes (cache cntl)
Memory <-> Disk: Pages, 512-4K bytes (OS)
Disk <-> Tape: Files, Mbytes (user/operator)

Upper levels are faster; lower levels are larger.


What is virtual memory?
° Virtual memory => treat memory as a cache for the disk
° Terminology: blocks in this cache are called “Pages”
  • Typical size of a page: 1K to 8K
° Page table maps virtual page numbers to physical frames
  • “PTE” = Page Table Entry
[Figure: a virtual address (V page no. | 10-bit offset) from the virtual address space is translated to a physical address (P page no. | 10-bit offset) in the physical address space. The V page no. is an index into the page table, which is located in physical memory at the address held in the Page Table Base Reg; each entry holds V (valid), Access Rights and PA.]
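The lookup in the figure, page-table base plus virtual page number, can be sketched with a flat list standing in for the table in physical memory. Assumptions: a 10-bit offset (1 KB pages, as drawn), a 16-bit virtual address to keep the table small, and a hypothetical PTE layout of (valid, access rights, physical page number).

```python
from typing import NamedTuple

OFFSET_BITS = 10   # 1 KB pages, as in the figure
VADDR_BITS = 16    # assumption: keeps the table at 2^6 = 64 entries


class Pte(NamedTuple):
    valid: bool
    rights: str    # e.g. "r" or "rw"
    ppn: int       # physical page number


class PageFault(Exception):
    pass


def walk(page_table, vaddr, access):
    """Translate vaddr with a single-level page table.

    page_table plays the role of memory starting at the Page Table Base Reg:
    entry i describes virtual page i.
    """
    vpn = vaddr >> OFFSET_BITS                  # index into page table
    offset = vaddr & ((1 << OFFSET_BITS) - 1)   # passed through unchanged
    pte = page_table[vpn]
    if not pte.valid:
        raise PageFault(vpn)                    # V=0: the OS must bring the page in
    if access not in pte.rights:
        raise PermissionError(f"{access!r} not allowed on page {vpn}")
    return (pte.ppn << OFFSET_BITS) | offset


table = [Pte(False, "", 0)] * (1 << (VADDR_BITS - OFFSET_BITS))
table[3] = Pte(True, "r", 17)   # virtual page 3 -> physical page 17, read-only
```

Every load or store pays one extra memory access for `page_table[vpn]`, which is the cost the TLB is there to hide.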


Three Advantages of Virtual Memory
° Translation:
  • Program can be given consistent view of memory, even though physical memory is scrambled
  • Makes multithreading reasonable (now used a lot!)
  • Only the most important part of program (“Working Set”) must be in physical memory.
  • Contiguous structures (like stacks) use only as much physical memory as necessary yet still grow later.
° Protection:
  • Different threads (or processes) protected from each other.
  • Different pages can be given special behavior (Read Only, Invisible to user programs, etc).
  • Kernel data protected from User programs
  • Very important for protection from malicious programs => Far more “viruses” under Microsoft Windows
° Sharing:
  • Can map same physical page to multiple users (“Shared memory”)

Issues in Virtual Memory System Design
What is the size of information blocks that are transferred from secondary to main storage (M)? ⇒ page size (Contrast with physical block size on disk, i.e. sector size)
Which region of M is to hold the new block ⇒ placement policy
How do we find a page when we look for it? ⇒ block identification
Block of information brought into M, and M is full, then some region of M must be released to make room for the new block ⇒ replacement policy
What do we do on a write? ⇒ write policy
Missing item fetched from secondary memory only on the occurrence of a fault ⇒ demand load policy
[Figure: pages move between disk and frames of memory in the reg / cache / mem / disk hierarchy]

V=0 pages either reside on disk or have not yet been allocated.
OS handles V=0: “Page fault”
Physical and virtual pages must be the same size!
The TLB caches page table entries
[Figure: the page number of a virtual address (page | off) is looked up in the TLB, which caches page table entries for the current ASID; each holds a physical frame address, and the physical address is frame | off.]
MIPS handles TLB misses in software (random replacement). Other machines use hardware.
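On a MIPS TLB miss, an OS handler reads the page table entry and writes it into a randomly chosen TLB slot. The sketch below imitates that policy under stated assumptions: a 48-entry TLB like the R4000's, a number of low "wired" slots that random replacement never picks (mirroring the Wired and Random registers in Figure 4-7), and a dict standing in for the page table. It is not the R4000's actual refill handler.

```python
import random

TLB_SIZE = 48


class Tlb:
    def __init__(self, wired=8, seed=None):
        self.entries = [None] * TLB_SIZE   # each slot: (asid, vpn, pfn) or None
        self.wired = wired                 # slots 0..wired-1 are never replaced
        self.rng = random.Random(seed)
        self.misses = 0

    def lookup(self, asid, vpn):
        for entry in self.entries:
            if entry is not None and entry[:2] == (asid, vpn):
                return entry[2]
        return None

    def translate(self, asid, vpn, page_table):
        pfn = self.lookup(asid, vpn)
        if pfn is None:                    # TLB miss: trap to software
            self.misses += 1
            pfn = page_table[(asid, vpn)]  # KeyError stands in for a page fault
            slot = self.rng.randrange(self.wired, TLB_SIZE)  # random victim
            self.entries[slot] = (asid, vpn, pfn)
        return pfn
```

Random replacement needs no per-entry usage state, which keeps both the hardware and the miss handler short.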


Page tables may not fit in memory!
A table for 4KB pages for a 32-bit address space has 1M entries
Each process needs its own address space!

Two-level Page Tables
32 bit virtual address: P1 index (bits 31-22) | P2 index (bits 21-12) | Page Offset (bits 11-0)
Top-level table wired in main memory
Subset of 1024 second-level tables in main memory; rest are on disk or unallocated


How big is the translation (page) table?
° Simplest way to implement “fully associative” lookup policy is with large lookup table.
° Each entry in table is some number of bytes, say 4
° With 4K pages, 32-bit address space, need: 2^32/4K = 2^20 = 1 Meg entries x 4 bytes = 4MB
° With 4K pages, 64-bit address space, need: 2^64/4K = 2^52 entries = BIG!
° Can’t keep whole page table in memory!
[Figure: virtual address = Virtual Page Number | Page Offset]
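The slide's arithmetic generalizes to any address width and page size. A small helper, assuming (as the slide does) 4-byte entries and one flat table per address space:

```python
def flat_table_entries(va_bits, page_bytes):
    """Number of PTEs in a single-level table: one per virtual page."""
    offset_bits = page_bytes.bit_length() - 1
    if 1 << offset_bits != page_bytes:
        raise ValueError("page size must be a power of two")
    return 1 << (va_bits - offset_bits)


def flat_table_bytes(va_bits, page_bytes, pte_bytes=4):
    """Size of that table, per process."""
    return flat_table_entries(va_bits, page_bytes) * pte_bytes


size_32 = flat_table_bytes(32, 4096)   # 2**22 bytes = 4 MB
size_64 = flat_table_bytes(64, 4096)   # 2**54 bytes = 16 PB
```

The 64-bit case is why the next slides turn to multi-level and inverted tables.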


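Large Address Spaces
Two-level Page Tables
32-bit address: P1 index (10 bits) | P2 index (10 bits) | page offset (12 bits)
[Figure: the P1 index selects one of 1K first-level PTEs (4 bytes each); that PTE points to a second-level table whose 4-byte PTEs map 4KB pages.]
° 2 GB virtual address space
° 4 MB of PTE2
  – paged, holes
° 4 KB of PTE1
What about a 48-64 bit address space?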
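The 10/10/12 split can be sketched as a two-level walk. Only the second-level tables that cover mapped regions exist, which is how the holes in a sparse address space cost nothing. Assumptions: dicts model the tables (a missing key means not allocated), and the field widths are those on the slide.

```python
P1_BITS, P2_BITS, OFFSET_BITS = 10, 10, 12


class PageFault(Exception):
    pass


def split(vaddr):
    """Split a 32-bit virtual address into (P1 index, P2 index, page offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    p2 = (vaddr >> OFFSET_BITS) & ((1 << P2_BITS) - 1)
    p1 = vaddr >> (OFFSET_BITS + P2_BITS)
    return p1, p2, offset


def walk(top_level, vaddr):
    """top_level maps P1 index -> second-level table (a dict P2 index -> frame)."""
    p1, p2, offset = split(vaddr)
    second = top_level.get(p1)
    if second is None:
        raise PageFault(f"no second-level table for P1 index {p1}")
    frame = second.get(p2)
    if frame is None:
        raise PageFault(f"page not mapped: P1={p1}, P2={p2}")
    return (frame << OFFSET_BITS) | offset


# A sparse address space: code near address 0 and a stack near the top.
top = {0: {0: 7, 1: 8}, 1023: {1023: 9}}
```

Here only two of the 1,024 possible second-level tables exist, while a flat table for the same space would need all 2^20 entries.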


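Inverted Page Tables
[Figure: the virtual page number is hashed to index a table of (V.Page, P.Frame) entries; the stored V.Page is compared (=) with the virtual page to confirm a match.]
IBM System 38 (AS400) implements 64-bit addresses.
  48 bits translated
  start of object contains a 12-bit tag
=> TLBs or virtually addressed caches are critical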
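An inverted page table has one entry per physical frame rather than one per virtual page, so its size tracks physical memory. Finding a mapping means hashing the virtual page number (plus an address-space ID) and checking the stored tag, the "=" compare in the figure. A sketch, assuming chained hash buckets and Python's built-in `hash`; real designs such as the System/38's differ in detail.

```python
class InvertedPageTable:
    def __init__(self, num_frames, num_buckets=None):
        self.frames = [None] * num_frames    # frame -> (asid, vpn) or None
        self.chain = [None] * num_frames     # next frame in the same bucket
        self.buckets = [None] * (num_buckets or num_frames)

    def _bucket(self, asid, vpn):
        return hash((asid, vpn)) % len(self.buckets)

    def map(self, asid, vpn, frame):
        if self.frames[frame] is not None:
            raise ValueError("frame already in use")
        b = self._bucket(asid, vpn)
        self.frames[frame] = (asid, vpn)
        self.chain[frame] = self.buckets[b]  # push onto the bucket's chain
        self.buckets[b] = frame

    def lookup(self, asid, vpn):
        """Return the frame holding (asid, vpn), or None (page fault)."""
        frame = self.buckets[self._bucket(asid, vpn)]
        while frame is not None:
            if self.frames[frame] == (asid, vpn):   # the "=" tag check
                return frame
            frame = self.chain[frame]
        return None
```

The table never grows with the virtual address width, but every lookup may follow a hash chain, which is why the slide concludes that TLBs become critical.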


Virtual Address and a Cache: Step backward???
° Virtual memory seems to be really slow:
  • Must access memory on load/store -- even cache hits!
  • Worse, if translation not completely in memory, may need to go to disk before hitting in cache!
° Solution: Caching! (surprise!)
  • Keep track of most common translations and place them in a “Translation Lookaside Buffer” (TLB)
[Figure: CPU sends VA to Translation, which sends PA to the Cache; a hit returns data, a miss goes to Main Memory.]


VM and Disk: Page replacement policy

[Figure: page table entries, each with used and dirty bits; the set of all pages in memory is swept by two pointers, and reclaimed pages go to a free list of free pages.]
Tail pointer: Clear the used bit in the page table
Head pointer: Place pages on free list if used bit is still clear. Schedule pages with dirty bit set to be written to disk.
Dirty bit: page written.
Used bit: set to 1 on any reference
Architect’s role: support setting dirty and used bits
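The two-pointer sweep is a form of the clock algorithm. The sketch below is one way to write it, under assumptions not on the slide: both pointers advance together a fixed distance apart, pages are numbered 0..n-1, and a scheduled write-back is assumed to finish before the head pointer comes around again.

```python
from dataclasses import dataclass


@dataclass
class Page:
    used: bool = False
    dirty: bool = False
    free: bool = False


def clock_step(pages, tail, head, free_list, write_queue):
    """Advance both pointers by one page; return the new (tail, head).

    tail clears the used bit; head, which trails behind, frees pages whose
    used bit is still clear. Dirty pages are queued for disk first.
    """
    pages[tail].used = False
    page = pages[head]
    if not page.free and not page.used:
        if page.dirty:
            write_queue.append(head)
            page.dirty = False   # assume the write completes before the next pass
        else:
            page.free = True
            free_list.append(head)
    n = len(pages)
    return (tail + 1) % n, (head + 1) % n
```

A page survives only if it is referenced (used bit set again) in the gap between the two pointers, which is how the hardware-maintained used and dirty bits let the OS approximate LRU cheaply.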


TLB Design Concepts


MIPS R4000 TLB: A closer look ...
[Figure: CPU issues “virtual addresses” on A0-A31 to the Translation Look-Aside Buffer (TLB), which drives “physical addresses” on A0-A31 to the Memory System; data travels on D0-D31.]

MIPS R4000 Microprocessor User’s Manual, p. 65: Memory Management

32-bit Mode Address Translation
Figure 4-2 shows the virtual-to-physical-address translation of a 32-bit mode address.
• The top portion of Figure 4-2 shows a virtual address with a 12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 20 bits of the address represent the VPN, and index the 1M-entry page table.
• The bottom portion of Figure 4-2 shows a virtual address with a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining 8 bits of the address represent the VPN, and index the 256-entry page table.

Figure 4-2 32-bit Mode Virtual Address Translation
Virtual Address with 1M (2^20) 4-Kbyte pages: ASID (8 bits, 39:32) | VPN (20 bits, 31:12) | Offset (12 bits, 11:0)
Virtual Address with 256 (2^8) 16-Mbyte pages: ASID (8 bits, 39:32) | VPN (8 bits, 31:24) | Offset (24 bits, 23:0)
Bits 31, 30 and 29 of the virtual address select user, supervisor, or kernel address spaces.
Virtual-to-physical translation in TLB; the ASID is checked against the CP0 ASID.
Offset passed unchanged to physical memory.
36-bit Physical Address: PFN | Offset (bits 35:0)
Physical space larger than virtual space!
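The field layout in Figure 4-2 can be sketched as bit manipulation. The segment boundaries used in `region` (bit 31 clear: user space; bits 31:29 = 110: supervisor space; otherwise kernel) are an assumption based on the R4000 32-bit address map and are worth checking against the manual. The PFN values in the examples are made up.

```python
PA_BITS = 36


def region(vaddr):
    """Address space selected by bits 31:29 of a 32-bit virtual address."""
    top3 = (vaddr >> 29) & 0b111
    if top3 < 0b100:
        return "user"          # 0x00000000-0x7FFFFFFF (bit 31 = 0)
    if top3 == 0b110:
        return "supervisor"    # 0xC0000000-0xDFFFFFFF
    return "kernel"


def split_va(vaddr, page_bits=12):
    """(VPN, offset): 20 + 12 bits for 4 KB pages, 8 + 24 bits for 16 MB pages."""
    return vaddr >> page_bits, vaddr & ((1 << page_bits) - 1)


def physical_address(pfn, offset, page_bits=12):
    """Concatenate the PFN from the TLB with the untranslated offset."""
    pa = (pfn << page_bits) | offset
    if pa >> PA_BITS:
        raise ValueError("PFN too large for a 36-bit physical address")
    return pa
```

With 4 KB pages the PFN has 24 bits, so a translated address can land above 4 GB, which is the sense in which physical space is larger than virtual space.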


Can TLB and caching be overlapped?

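[Figure: the page offset of the virtual address supplies the cache Index and Byte Select, so the Cache Tags and Cache Data are read while the TLB translates the Virtual Page Number; the physical page number from the TLB is compared (=) with the Cache Tag to produce Hit, and Data out comes from the selected Cache Block.]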

This works, but ...

Q. What is the downside?

A. Inflexibility. VPN size locked to cache tag size.
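The inflexibility is easy to quantify. Overlapping only works if the cache index and byte-select bits come entirely from the page offset, so each way of the cache can be at most one page. A sketch of the check, with hypothetical parameter names:

```python
def log2_exact(n):
    """log2 of a power of two; rejects anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def can_overlap(cache_bytes, block_bytes, ways, page_bytes):
    """True if index + byte-select bits fit inside the page offset."""
    sets = cache_bytes // (block_bytes * ways)
    index_bits = log2_exact(sets)
    byte_bits = log2_exact(block_bytes)
    return index_bits + byte_bits <= log2_exact(page_bytes)


def max_overlapped_cache(page_bytes, ways):
    """Largest cache that can be indexed in parallel with the TLB lookup."""
    return page_bytes * ways
```

With 4 KB pages, a direct-mapped 8 KB cache fails the check but a 2-way 8 KB cache passes: growing the cache means raising associativity (or page size), not just adding sets.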


Can we cache virtual addresses?
[Figure: CPU sends “virtual addresses” to a virtual cache, which returns data on D0-D31; only on a cache miss does the TLB translate to “physical addresses” on A0-A31 for Main Memory.]
Only use TLB on a cache miss!
Downside: a subtle, fatal problem. What is it?
A. Synonym problem. If two address spaces share a physical frame, data may be in cache twice. Maintaining consistency is a nightmare.
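The synonym problem can be reproduced with a toy virtually-indexed, virtually-tagged, write-through cache. Assumptions (all hypothetical): 4 KB pages, a 16 KB direct-mapped cache of one-word lines, a tag made of (process, virtual address) in place of an ASID, and two processes that map different virtual pages to the same physical frame. After one process writes, the other still hits on its stale copy.

```python
OFFSET_BITS = 12                     # assume 4 KB pages
WORD_BYTES = 4
LINES = 4096                         # 16 KB of one-word lines: larger than a page,
                                     # so some index bits come from the virtual page number

memory = {}                                       # physical address -> word
page_tables = {"A": {0x10: 7}, "B": {0x22: 7}}    # both processes map frame 7
cache = {}                                        # line -> (process, virtual address, word)


def physical(proc, va):
    frame = page_tables[proc][va >> OFFSET_BITS]
    return (frame << OFFSET_BITS) | (va & ((1 << OFFSET_BITS) - 1))


def line_of(va):
    return (va // WORD_BYTES) % LINES             # indexed by the virtual address


def load(proc, va):
    entry = cache.get(line_of(va))
    if entry is not None and entry[:2] == (proc, va):
        return entry[2]                           # hit: no translation at all
    word = memory.get(physical(proc, va), 0)      # miss: translate, read memory
    cache[line_of(va)] = (proc, va, word)
    return word


def store(proc, va, word):
    cache[line_of(va)] = (proc, va, word)         # update this process's copy only
    memory[physical(proc, va)] = word             # write through to memory


VA_A = (0x10 << OFFSET_BITS) | 0x40
VA_B = (0x22 << OFFSET_BITS) | 0x40

load("B", VA_B)              # B caches the word (0)
store("A", VA_A, 42)         # A writes the same physical word
stale = load("B", VA_B)      # B hits its own copy: still 0, not 42
```

Memory holds 42, yet B keeps reading 0 until its line is evicted. Fixing this requires finding and updating every alias of a physical word, which is the consistency nightmare the slide describes.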


Conclusions

VM: Uniform memory models, protection, sharing.

Synchronous DRAM: flexible bus protocol for array access

Operating systems manage the page table and (often) the TLB

A TLB acts as a fast cache for recent address translations.
