The Memory Hierarchy II
CPSC 321, Andreas Klappenecker

Posted on 20-Dec-2015

TRANSCRIPT

Page 1:

The Memory Hierarchy II
CPSC 321
Andreas Klappenecker

Page 2:

Today’s Menu

• Cache
• Virtual Memory
• Translation Lookaside Buffer

Page 3:

Caches

Why? How?

Page 4:

Memory

• Users want large and fast memories
• SRAM is too expensive for main memory
• DRAM is too slow for many purposes
• Compromise: build a memory hierarchy

[Figure: pyramid of memory levels, Level 1 through Level n, with the CPU at the top; access time increases and the size of the memory at each level grows with distance from the CPU]

Page 5:

Locality

• If an item is referenced, then
  • it will be referenced again soon (temporal locality)
  • nearby data will be referenced soon (spatial locality)
• Why does code have locality? Loops re-execute the same instructions, and instructions are fetched sequentially.
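The answer to the slide's question can be made concrete with a short loop. This is an illustrative sketch of our own, not from the slides:

```python
# Illustrative sketch (not from the slides): a simple loop exhibits
# both kinds of locality.
def sum_array(a):
    total = 0            # 'total' is touched every iteration: temporal locality
    for i in range(len(a)):
        total += a[i]    # a[0], a[1], ... are adjacent in memory: spatial locality
    return total         # the loop body itself is re-fetched on every pass:
                         # temporal locality in the instruction stream

print(sum_array([1, 2, 3, 4]))  # 10
```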

Page 6:

Direct Mapped Cache

• Mapping: address modulo the number of blocks in the cache, x -> x mod B

[Figure: an 8-block cache (indices 000–111) below memory; memory blocks 00001, 01001, 10001, 11001 all map to cache index 001, and blocks 00101, 01101, 10101, 11101 all map to cache index 101]
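A quick sketch of the slide's mapping rule x mod B for the 8-block cache of the figure (the helper name is ours, not the slides'):

```python
# Sketch of the mapping rule x -> x mod B for an 8-block direct-mapped cache.
B = 8  # number of cache blocks

def cache_index(block_number):
    return block_number % B

# The memory blocks from the figure: those ending in 001 share cache
# index 001, those ending in 101 share cache index 101.
print([cache_index(x) for x in (0b00001, 0b01001, 0b10001, 0b11001)])  # [1, 1, 1, 1]
print([cache_index(x) for x in (0b00101, 0b01101, 0b10101, 0b11101)])  # [5, 5, 5, 5]
```

Because only the low-order bits pick the slot, all of these blocks compete for the same two cache entries.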

Page 7:

Direct Mapped Cache

• Cache with 1024 = 2^10 words
• The tag stored in the cache is compared against the upper portion of the address
• If the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss

What kind of locality are we taking advantage of?

[Figure: a 32-bit address split into a 20-bit tag (bits 31–12), a 10-bit index (bits 11–2), and a 2-bit byte offset (bits 1–0); the index selects one of 1024 entries, each holding a valid bit, a 20-bit tag, and a 32-bit data word; a comparator on the tag and valid bit produces the hit signal]

The index is determined by (word address) mod 1024
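The address split on this slide can be computed with shifts and masks. A sketch under the slide's parameters (the function names are ours):

```python
# Sketch: decompose a 32-bit byte address for this slide's cache
# (1024 one-word entries: 20-bit tag, 10-bit index, 2-bit byte offset).
def split_address(addr):
    byte_offset = addr & 0x3       # bits 1-0
    index = (addr >> 2) & 0x3FF    # bits 11-2: word address mod 1024
    tag = addr >> 12               # bits 31-12
    return tag, index, byte_offset

def lookup(cache, addr):
    # cache: list of 1024 (valid, tag, data) entries
    tag, index, _ = split_address(addr)
    valid, stored_tag, data = cache[index]
    return data if (valid and stored_tag == tag) else None  # None => miss

print(split_address(0x0000_1004))  # (1, 1, 0)
```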

Page 8:

Direct Mapped Cache

• Taking advantage of spatial locality: each block holds four words

[Figure: a 32-bit address split into a 16-bit tag (bits 31–16), a 12-bit index (bits 15–4), a 2-bit block offset (bits 3–2), and a 2-bit byte offset (bits 1–0); the index selects one of 4K entries, each with a valid bit, a 16-bit tag, and a 128-bit (four-word) block; a multiplexor driven by the block offset selects one of the four 32-bit words]
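With four-word blocks, two extra bits select the word within the block (the multiplexor in the figure). A sketch with names of our own choosing:

```python
# Sketch: address fields for the 4K-entry cache with four-word (16-byte) blocks.
def split_address_4word(addr):
    byte_offset = addr & 0x3          # bits 1-0
    block_offset = (addr >> 2) & 0x3  # bits 3-2: selects one of 4 words (the mux)
    index = (addr >> 4) & 0xFFF       # bits 15-4: one of 4K entries
    tag = addr >> 16                  # bits 31-16
    return tag, index, block_offset, byte_offset

print(split_address_4word(0x0002_0018))  # (2, 1, 2, 0)
```

Sequential word addresses share a block for four accesses in a row, which is exactly the spatial locality the slide mentions.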

Page 9:

Bits in a Cache

How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?
• 16 KB = 4K words = 2^12 words
• Block size of 4 words => 2^10 blocks
• Each block has 4 x 32 = 128 bits of data + tag + valid bit
• tag + valid bit = (32 - 10 - 2 - 2) + 1 = 19 bits
• Total cache size = 2^10 x (128 + 19) = 2^10 x 147 bits
Therefore, 147 Kbits (about 18.4 KB) are needed for the cache
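The slide's arithmetic can be checked directly; a sketch (variable names are ours):

```python
# Check the slide's cache-size arithmetic.
address_bits = 32
data_bits = 16 * 1024 * 8        # 16 KB of data
words = data_bits // 32          # 4K words
blocks = words // 4              # 4-word blocks -> 1K blocks
index_bits = blocks.bit_length() - 1           # 10 (blocks is a power of two)
tag_bits = address_bits - index_bits - 2 - 2   # block offset 2, byte offset 2 -> 18
bits_per_entry = 4 * 32 + tag_bits + 1         # data + tag + valid bit = 147
total_bits = blocks * bits_per_entry           # 150528 bits = 147 Kbits ~ 18.4 KB
print(tag_bits, bits_per_entry, total_bits)  # 18 147 150528
```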

Page 10:

Cache Block Mapping

• Direct mapped cache
  • a block goes in exactly one place in the cache
• Fully associative cache
  • a block can go anywhere in the cache
  • it is difficult to find a block
  • parallel comparison hardware speeds up the search
• Set associative cache
  • a block can go to a (small) number of places
  • a compromise between the two extremes above

Page 11:

Cache Types

Page 12:

Set Associative Caches

• Each block maps to a unique set
• The block can be placed into any element of that set
• The set is given by (Block number) modulo (# of sets in cache)
• If the sets contain n elements, then the cache is called n-way set associative
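A sketch of the set-mapping rule above, showing how associativity interpolates between the two extremes (function and variable names are ours):

```python
# Sketch: where a block may go in an n-way set associative cache.
def candidate_slots(block_number, num_blocks, associativity):
    num_sets = num_blocks // associativity
    s = block_number % num_sets   # (Block number) modulo (# of sets in cache)
    return [s * associativity + way for way in range(associativity)]

# An 8-block cache under three organizations:
print(candidate_slots(12, 8, 1))  # [1] ... wait: direct mapped, one place
print(candidate_slots(12, 8, 2))  # 2-way: set 0, either of two ways
print(candidate_slots(12, 8, 8))  # fully associative: anywhere
```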

Pages 13–17: [figure-only slides]

Summary: Where can a Block be Placed?

Name                Number of Sets                        Blocks per Set
Direct mapped       # Blocks in cache                     1
Set associative     (# Blocks in cache) / Associativity   Associativity (typically 2-8)
Fully associative   1                                     Number of blocks in cache

Page 18:

Summary: How is a Block Found?

Associativity       Location method                        # Comparisons
Direct mapped       Index                                  1
Set associative     Index the set, search among elements   Degree of associativity
Fully associative   Search all cache entries               Size of the cache
Fully associative   Separate lookup table                  0

Page 19:

Virtual Memory

Page 20:

Virtual Memory

• The processor generates virtual addresses
• Memory is accessed using physical addresses
• Virtual and physical memory are broken into blocks of memory, called pages
• A virtual page may be
  • absent from main memory, residing on the disk
  • or mapped to a physical page

Page 21:

Virtual Memory

• Main memory can act as a cache for the secondary storage (disk)
• Virtual addresses generated by the processor are mapped by address translation to physical addresses
• Advantages:
  • illusion of having more physical memory
  • program relocation
  • protection

[Figure: virtual addresses (left) pass through address translation (middle) to physical addresses in main memory or to disk addresses (right)]

Page 22:

Pages: virtual memory blocks

• Page faults: if data is not in memory, retrieve it from disk
  • huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
  • reducing page faults is important (LRU is worth the price)
  • the faults can be handled in software instead of hardware
  • write-through takes too long, so we use write-back
• Example: page size 2^12 = 4 KB; 2^18 physical pages; main memory <= 1 GB; virtual memory <= 4 GB

[Figure: a 32-bit virtual address splits into a 20-bit virtual page number (bits 31–12) and a 12-bit page offset (bits 11–0); translation yields a 30-bit physical address with an 18-bit physical page number and the same 12-bit page offset]
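Using the slide's numbers (4 KB pages, so a 12-bit offset), translation can be sketched as follows; the dictionary page table and all names here are hypothetical, for illustration only:

```python
# Sketch: virtual-to-physical translation with 4 KB (2^12-byte) pages.
PAGE_OFFSET_BITS = 12

def translate(vaddr, page_table):
    vpn = vaddr >> PAGE_OFFSET_BITS               # 20-bit virtual page number
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)  # offset passes through unchanged
    ppn = page_table[vpn]                         # hypothetical dict: VPN -> PPN
    return (ppn << PAGE_OFFSET_BITS) | offset

page_table = {0x00005: 0x2A}                      # toy mapping, for illustration
print(hex(translate(0x0000_5123, page_table)))    # 0x2a123
```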

Page 23:

Page Faults

• Incredibly high penalty for a page fault
• Reduce the number of page faults by optimizing page placement
• Use fully associative placement
  • a full search of pages is impractical
  • pages are located by a full table that indexes the memory, called the page table
  • the page table resides within the memory

Page 24:

Page Tables

[Figure: a page table indexed by virtual page number; each entry holds a valid bit and a physical page number or disk address — entries with valid bit 1 point into physical memory, entries with valid bit 0 point to disk storage]

The page table maps each page to either a page in main memory or to a page stored on disk

Page 25:

Page Tables

[Figure: the page table register points to the page table; the 20-bit virtual page number (bits 31–12) indexes the table, whose entries hold a valid bit and an 18-bit physical page number; if the valid bit is 0, the page is not present in memory; the 12-bit page offset (bits 11–0) is appended unchanged to form the physical address]

Page 26:

Making Memory Access Fast

• Page tables slow us down
• Memory access will take at least twice as long
  • access the page table in memory
  • access the page
• What can we do?

Memory access is local => use a cache that keeps track of recently used address translations, called the translation lookaside buffer (TLB)
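The idea on this slide, caching recent translations so that most accesses skip the in-memory page table, can be sketched as follows. This is a toy model with names of our own; real TLBs are set associative hardware structures:

```python
# Sketch: a TLB as a small cache of recent VPN -> PPN translations.
class TLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}          # VPN -> PPN; real TLBs are set associative

    def translate(self, vpn, page_table):
        if vpn in self.entries:    # TLB hit: no page-table access needed
            return self.entries[vpn]
        ppn = page_table[vpn]      # TLB miss: consult the page table in memory
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # crude eviction stand-in
        self.entries[vpn] = ppn
        return ppn

tlb = TLB()
pt = {7: 42}
print(tlb.translate(7, pt))  # 42, via the page table (miss)
print(tlb.translate(7, pt))  # 42, via the TLB (hit)
```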

Page 27:

Making Address Translation Fast

A cache for address translations: the translation lookaside buffer

[Figure: the TLB holds a small number of entries, each with a valid bit, a tag (the virtual page number), and a physical page address; on a TLB miss, the full page table — indexed by virtual page number, each entry holding a valid bit and a physical page or disk address — supplies the translation from physical memory or disk storage]

Page 28:

Translation Lookaside Buffer

• Some typical values for a TLB
  • TLB size: 32-4096 entries
  • Block size: 1-2 page table entries (4-8 bytes each)
  • Hit time: 0.5-1 clock cycle
  • Miss penalty: 10-30 clock cycles
  • Miss rate: 0.01%-1%
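These figures imply an average translation cost. A sketch of the standard effective-access-time calculation, with one value picked from each of the slide's typical ranges:

```python
# Effective translation time = hit time + miss rate x miss penalty,
# using one sample value from each range on the slide.
hit_time = 1.0       # cycles
miss_penalty = 20.0  # cycles
miss_rate = 0.01     # 1%
effective = hit_time + miss_rate * miss_penalty
print(effective)  # 1.2
```

With a 1% miss rate the TLB adds only 0.2 cycles on average, which is why it makes virtual memory practical.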

Page 29:

TLBs and Caches

[Flowchart: handling a memory access]
• Start with the virtual address and perform a TLB access
• TLB hit?
  • No: TLB miss exception
  • Yes: form the physical address
• Write?
  • No: try to read the data from the cache
    • Cache hit? Yes: deliver the data to the CPU; No: cache miss stall
  • Yes: write access bit on?
    • No: write protection exception
    • Yes: write the data into the cache, update the tag, and put the data and the address into the write buffer

Page 30:

More Modern Systems

• Very complicated memory systems:

Characteristic     Intel Pentium Pro                          PowerPC 604
Virtual address    32 bits                                    52 bits
Physical address   32 bits                                    32 bits
Page size          4 KB, 4 MB                                 4 KB, selectable, and 256 MB
TLB organization   A TLB for instructions and a TLB for data  A TLB for instructions and a TLB for data
                   Both four-way set associative              Both two-way set associative
                   Pseudo-LRU replacement                     LRU replacement
                   Instruction TLB: 32 entries                Instruction TLB: 128 entries
                   Data TLB: 64 entries                       Data TLB: 128 entries
                   TLB misses handled in hardware             TLB misses handled in hardware

Characteristic       Intel Pentium Pro                  PowerPC 604
Cache organization   Split instruction and data caches  Split instruction and data caches
Cache size           8 KB each for instructions/data    16 KB each for instructions/data
Cache associativity  Four-way set associative           Four-way set associative
Replacement          Approximated LRU replacement       LRU replacement
Block size           32 bytes                           32 bytes
Write policy         Write-back                         Write-back or write-through

Page 31:

Some Issues

• Processor speeds continue to increase very fast, much faster than either DRAM or disk access times
• Design challenge: dealing with this growing disparity
• Trends:
  • synchronous SRAMs (provide a burst of data)
  • redesign DRAM chips to provide higher bandwidth or processing
  • restructure code to increase locality
  • use prefetching (make the cache visible to the ISA)