csce 212 chapter 7 memory hierarchy instructor: jason d. bakos

Post on 19-Dec-2015


CSCE 212 Chapter 7

Memory Hierarchy

Instructor: Jason D. Bakos

Memory Hierarchy

• Programmers want more memory and faster memory

• Problems:
  – Denser memories require longer access times
    • Example: papers on your desk vs. papers in your filing cabinet
  – Fast memories are extremely expensive per unit capacity
    • Examples:
      – SRAM: 0.5 – 5 ns access time, $1K/GB
      – DRAM: 50 – 70 ns access time, $100/GB
      – Magnetic disk: 5 – 20 ms access time, $0.10/GB

Locality

• Goal:
  – Achieve the access time of smaller memories but have the effective capacity of larger memories
• Solution: exploit locality
  – Temporal locality
    • Memory locations are accessed more than once
  – Spatial locality
    • When a memory location is accessed, there’s a good chance a nearby location will be accessed in the near future

Memory Hierarchy

• Each level of the hierarchy stores a subset of the level below it
• Each level can only communicate with the level below it
• For now, assume a 2-level hierarchy
  – CPU-cache-RAM
  – Cache is usually on-chip
• Sometimes the data we need is not in the cache
  – Hit rate: the fraction of accesses that are found in the cache
• Block or line
  – The unit moved between levels; exploits spatial locality
• Miss penalty
  – Time required to move a line to the top of the hierarchy (may vary)

Figure: CPU, cache, main memory

Caches

• Questions:

1. How do we know if the requested location is in the cache?

2. How do we find it?

Cache Organization

• Fully associative
  – Too many tags to compare!

Figure: each line stores n words and a tag; tag = address(31 downto (log2 n + 2))

Direct Mapped Cache

• Direct mapped – each memory location maps to only one location in the cache

Figure: eight cache lines, indexed 000 through 111 by addr(7:5); each line holds 8 words and a tag, addr(31:8)
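The lookup in a direct-mapped cache can be sketched in a few lines. This is a minimal illustration, not part of the original slides: it assumes the geometry shown above (8 lines of 8 words, index = addr(7:5), tag = addr(31:8)), and the class name and address trace are made up for the example.

```python
# Sketch of a direct-mapped cache with 8 lines of 8 words (32 bytes per line).
# Index = addr(7:5), tag = addr(31:8). Names and the trace are illustrative only.
NUM_LINES = 8


class DirectMappedCache:
    def __init__(self):
        # One stored tag per line; None means the line is still invalid.
        self.tags = [None] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Return True on a hit; on a miss, install the line and return False."""
        index = (addr >> 5) & 0b111   # addr(7:5) selects one of the 8 lines
        tag = addr >> 8               # addr(31:8) identifies which block is stored
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag        # the previous occupant, if any, is evicted
        self.misses += 1
        return False


cache = DirectMappedCache()
# 0x000 and 0x100 share index 000 but have different tags, so they evict each other.
trace = [0x000, 0x004, 0x100, 0x000, 0x020]
results = [cache.access(a) for a in trace]
print(results, cache.hits, cache.misses)
```

Only the second access hits: 0x004 falls in the line just loaded for 0x000, while 0x100 maps to the same line and replaces it before 0x000 is requested again.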

Addresses

• The memory address can be partitioned:

  field         width                   selects
  tag           remaining upper bits
  index         log2(lines) bits        which line in each set?
  word offset   log2(line_size) bits    which word in the line?
  byte offset   2 bits                  which byte in the word?

• Example: 128 lines, 16-word lines:

  tag      index    word offset    byte offset
  31:13    12:6     5:2            1:0
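The partitioning can be checked with a short sketch. It assumes 32-bit byte addresses, 4-byte words, and power-of-two sizes; `split_address` is a hypothetical helper name, and the defaults match the 128-line, 16-word example.

```python
# Hypothetical helper: split a 32-bit byte address into tag, index,
# word offset and byte offset. Assumes power-of-two sizes and 4-byte words.
def split_address(addr, num_lines=128, words_per_line=16):
    index_bits = num_lines.bit_length() - 1        # log2(128) = 7
    word_bits = words_per_line.bit_length() - 1    # log2(16)  = 4
    byte_bits = 2                                  # 4 bytes per word

    byte_offset = addr & 0b11
    word_offset = (addr >> byte_bits) & (words_per_line - 1)
    index = (addr >> (byte_bits + word_bits)) & (num_lines - 1)
    tag = addr >> (byte_bits + word_bits + index_bits)
    return tag, index, word_offset, byte_offset


tag, index, word, byte = split_address(0x12345678)
print(hex(tag), index, word, byte)
```

With the defaults, the byte offset occupies 2 bits, the word offset 4 bits, and the index 7 bits, which leaves the upper 19 bits of the address for the tag.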

The Three C’s

• Three different kinds of misses:

  – Compulsory (cold-start) misses
    • First access to a block
  – Capacity misses
    • A replaced block is needed again
    • Because the cache capacity isn’t sufficient for the program
  – Conflict (collision) misses
    • Multiple blocks compete for the same set

Associativity

• 2-way set associative:
  – Two choices where to store a given line
• Replacement policy (e.g. LRU)

Figure: two ways (tags 0 and tags 1), each with eight lines of 8 words indexed 000 through 111 by addr(7:5); tags are addr(31:8)
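To see why associativity removes conflict misses, the following sketch counts misses for the same address trace with one way and with two ways. It is an illustration under assumptions: the function name, the default geometry (8 sets, 32-byte lines) and the trace are chosen for the example, and LRU is used as the replacement policy.

```python
from collections import OrderedDict


def count_misses(trace, ways, num_sets=8, line_bytes=32):
    """Count misses for a set-associative cache with LRU replacement."""
    # One OrderedDict per set, holding tags from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
    return misses


# Two blocks that map to the same set, accessed alternately.
trace = [0x000, 0x100, 0x000, 0x100, 0x000, 0x100]
print(count_misses(trace, ways=1), count_misses(trace, ways=2))
```

With one way (direct mapped) the two blocks keep evicting each other, so every access misses. With two ways both blocks fit in the set, and only the two compulsory misses remain.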

Associative Cache Organization

Cache Behavior

• Hits at the top-level cache can usually be performed in one (or a few) clock cycles

• Misses stall the processor

• Writes can be handled using
  – Write-through (write allocate, write no-allocate)
    • When cache data is changed, the lower level memory is updated immediately
    • Use a write buffer
  – Write-back
    • When cache data is changed, the lower level memory isn’t updated until the cache line containing the changes is replaced
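The difference in traffic to the lower level can be illustrated with a toy model. This sketch assumes a cache holding a single line and allocation on every miss (a simplification, not something the slides specify); `memory_writes` and the operation trace are hypothetical.

```python
def memory_writes(ops, policy):
    """Count writes that reach the lower level for a toy one-line cache.

    ops is a list of ("read", block) or ("write", block) tuples.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    cached_block = None   # the toy cache holds exactly one line
    dirty = False
    writes_to_memory = 0
    for op, block in ops:
        if block != cached_block:
            # Miss: replace the line (allocating on write misses is an assumption).
            if policy == "write-back" and dirty:
                writes_to_memory += 1      # modified line is written back on replacement
            cached_block, dirty = block, False
        if op == "write":
            if policy == "write-through":
                writes_to_memory += 1      # lower level is updated immediately
            else:
                dirty = True               # defer the update until replacement
    return writes_to_memory


ops = [("write", 0)] * 4 + [("read", 1)]
print(memory_writes(ops, "write-through"), memory_writes(ops, "write-back"))
```

Four stores to the same line cost four memory writes under write-through but only one under write-back, which is paid when the line is replaced by the read of block 1.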

Memory Systems

• Main memory is DRAM, designed for density (not access time)

• How to reduce miss penalty?

Average Memory Access Time

• AMAT = hit_time + miss_rate * miss_penalty

• Reduce miss rate:
  – Larger cache (capacity misses)
  – Increase associativity (conflict misses)
  – Replacement policy
  – Each of these may increase hit time and miss penalty
• Reduce miss penalty:
  – Wider or banked memory bus
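A quick calculation shows how the AMAT terms trade off against each other. The numbers below are illustrative assumptions (cycle counts and miss rates are not from the slides), and `amat` is just the formula above written as a function.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


# Illustrative numbers only: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
baseline = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)
# Higher associativity: assume fewer misses but a slightly slower hit.
more_ways = amat(hit_time=1.2, miss_rate=0.03, miss_penalty=100)
print(baseline, more_ways)
```

Under these assumed numbers the lower miss rate more than pays for the slower hit (about 4.2 cycles versus about 6), but the balance depends entirely on the actual hit time, miss rate, and miss penalty.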

Virtual Memory

• Main memory acts as a cache to secondary storage
  – Allows memory to be shared
  – Makes memory appear to be larger than it physically is
• Each program has its own address space
  – Enforces protection
• A virtual memory block is called a page; a miss is called a page fault
• Virtual addresses are translated into physical addresses
  – Address mapping / address translation
  – Combination of hardware and software
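Address translation can be sketched as a table lookup on the virtual page number. The 4 KB page size, the dictionary-based page table, the `PageFault` exception and the mappings below are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


class PageFault(Exception):
    """Raised when the virtual page has no valid mapping in main memory."""


def translate(vaddr, page_table):
    """Map a virtual address to a physical address via a per-process page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number, offset within the page
    ppn = page_table.get(vpn)                # physical page number, or None if not resident
    if ppn is None:
        raise PageFault(f"virtual page {vpn:#x} is not in main memory")
    return ppn * PAGE_SIZE + offset          # the page offset is not translated


page_table = {0x0: 0x7, 0x1: 0x3}            # hypothetical mappings
print(hex(translate(0x1ABC, page_table)))
```

Only the page number is mapped; the offset within the page passes through unchanged. An access to a page that is missing from the table raises `PageFault`, which stands in for the trap to the operating system.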

Page Faults

• Main memory is 100,000 times faster than disk
  – Page faults are expensive
• Reduce page fault rate
  – Fully associative placement of pages in memory
• Each process has a page table that maps virtual addresses to physical addresses
• OS creates space on disk for all the process’s pages
  – Swap space
• OS maintains another table that keeps track of each page in main memory
  – During a page fault, the OS must decide which page to replace
  – Least recently used (LRU)
  – Write-back used for writes
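The replacement decision can be sketched as a small simulation of fully associative placement with LRU replacement and write-back. Real operating systems usually approximate LRU rather than track it exactly; the function name, frame count and access trace here are hypothetical.

```python
from collections import OrderedDict


def run_paging(accesses, num_frames):
    """Simulate LRU page replacement; return (page_faults, pages_written_back).

    accesses is a list of (virtual_page, is_write) pairs.
    """
    resident = OrderedDict()   # page -> dirty flag, least recently used first
    faults = writebacks = 0
    for page, is_write in accesses:
        if page in resident:
            resident.move_to_end(page)          # now the most recently used page
        else:
            faults += 1
            if len(resident) == num_frames:
                _, dirty = resident.popitem(last=False)   # evict the LRU page
                if dirty:
                    writebacks += 1             # only modified pages go back to swap space
            resident[page] = False
        if is_write:
            resident[page] = True               # mark the page dirty
    return faults, writebacks


accesses = [(0, True), (1, False), (0, False), (2, False), (1, False), (3, False)]
print(run_paging(accesses, num_frames=2))
```

With two frames this trace causes five page faults but only one write to disk, because page 0 is the only page that was modified before being evicted.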

Page Table

TLB

• Page lookups must be performed in hardware
  – Page table is cached on-chip
  – Translation-lookaside buffer (TLB)
  – Small fully associative or large limited associative
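A TLB can be modeled as a small fully associative cache sitting in front of the page table. This is a sketch under assumptions: the class name, the entry count and the LRU replacement policy are chosen for illustration (real TLBs may instead replace a random entry), and a Python dictionary stands in for the in-memory page table.

```python
from collections import OrderedDict


class TLB:
    """Small fully associative cache of page table entries (LRU is an assumption)."""

    def __init__(self, page_table, entries=4):
        self.page_table = page_table      # vpn -> ppn, stands in for the in-memory table
        self.entries = entries
        self.cache = OrderedDict()        # vpn -> ppn, least recently used first
        self.hits = self.misses = 0

    def lookup(self, vpn):
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)   # mark as most recently used
            return self.cache[vpn]
        self.misses += 1
        ppn = self.page_table[vpn]        # slow path: read the page table in memory
        if len(self.cache) == self.entries:
            self.cache.popitem(last=False)
        self.cache[vpn] = ppn
        return ppn


tlb = TLB(page_table={vpn: vpn + 100 for vpn in range(8)}, entries=2)
ppns = [tlb.lookup(vpn) for vpn in [0, 0, 1, 0, 2, 1]]
print(ppns, tlb.hits, tlb.misses)
```

Repeated references to the same page are served from the TLB without touching the page table, which is why a handful of entries can cover most translations when the program has good locality.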

Integrating Cache and VM

• Data cannot be in the cache unless it is present in main memory

• Cache can be
  – physically addressed (TLB in critical path)
  – virtually addressed (TLB out of critical path)

• Cache miss requires TLB access

• TLB miss means:
  – page is in memory but we need the TLB entry, or
  – page is not in memory (page fault)
  – (both handled by OS software)

TLB Misses and Page Faults

• When a virtual address causes a page fault…
  1. Look up page table entry and find location on disk
  2. Choose a physical page to replace, write-back if dirty
  3. Read page from disk into chosen physical page (allow another process to run)

• TLB miss in MIPS
  – BadVAddr set, special exception triggered (8000 0000), go to TLB miss handler
  – Context register:
    • bits 31:20 base of the page table
    • bits 19:2 virtual address of the missing page
  – Use Context register directly to load missing entry
    • If the page table entry is invalid, a page fault exception occurs at the normal handler (8000 0180)
  – Move missing entry to EntryLo register
  – Execute tlbwr to move EntryLo to TLB at address stored in Random register (free running counter)
  – Execute eret to return

• TLB miss exception doesn’t save process state (fast) while page fault does (slow)
