csce 212 chapter 7 memory hierarchy instructor: jason d. bakos
Post on 19-Dec-2015
CSCE 212 Chapter 7
Memory Hierarchy
Instructor: Jason D. Bakos
Memory Hierarchy
• Programmers want more memory and faster memory
• Problems:
  – Denser memories require longer access times
    • Example: papers on your desk vs. papers in your filing cabinet
  – Fast memories are extremely expensive per unit capacity
    • Examples:
      – SRAM: 0.5 – 5 ns access time, $1K/GB
      – DRAM: 50 – 70 ns access time, $100/GB
      – Magnetic disk: 5 – 20 ms access time, $0.10/GB
Locality
• Goal:
  – Achieve the access time of smaller memories but have the effective capacity of larger memories
• Solution: exploit locality
  – Temporal locality
    • memory locations are accessed more than once
  – Spatial locality
    • when a memory location is accessed, there’s a good chance a nearby location will be accessed in the near future
Memory Hierarchy
• Each level of the hierarchy stores a subset of the level below it
• Each level can only communicate with the level below it
• For now, assume 2-level hierarchy
  – CPU-cache-RAM
  – cache is usually on-chip
• Sometimes the data we need is not in cache
  – hit rate: the fraction of accesses that are found in the cache
• Block or line
  – the unit of data moved between levels; exploits spatial locality
• miss penalty
  – time required to move a line to the top of the hierarchy (may vary)
[Figure: CPU, cache, main memory]
Caches
• Questions:
1. How do we know if the requested location is in the cache?
2. How do we find it?
Cache Organization
• Fully associative
  – Lines of n words; tag = address(31 downto (log2 n + 2))
  – Too many tags to compare!
Direct Mapped Cache
• Direct mapped – each memory location maps to only one location in the cache
[Figure: direct mapped cache with 8 lines (000 through 111) of 8 words each; tag = addr(31:8), index = addr(7:5)]
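The lookup in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design from the lecture: it assumes the geometry shown above (8 lines, index = addr(7:5), tag = addr(31:8)), and the class name `DirectMappedCache` and the address trace are made up for the example.

```python
NUM_LINES = 8  # 8 lines, selected by addr(7:5)


class DirectMappedCache:
    """Tracks only valid bits and tags; the data words are not modeled."""

    def __init__(self):
        self.valid = [False] * NUM_LINES
        self.tags = [0] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        index = (addr >> 5) & 0x7   # addr(7:5) picks the one possible line
        tag = addr >> 8             # addr(31:8) identifies which block is there
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the new block replaces whatever was in that line.
        self.valid[index] = True
        self.tags[index] = tag
        self.misses += 1
        return False


cache = DirectMappedCache()
trace = [0x000, 0x004, 0x100, 0x000]  # hypothetical byte addresses
results = [cache.access(a) for a in trace]
print(results, cache.hits, cache.misses)
```

Addresses 0x000 and 0x100 share index 000 but have different tags, so in this trace they keep evicting each other even though the other seven lines are empty.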
Addresses
• The memory address can be partitioned:
  – tag: the remaining upper bits
  – index: log2(lines) bits (which line in each set?)
  – word offset: log2(line_size) bits (which word in the line?)
  – byte offset: 2 bits (which byte in the word?)
• Example: 128 lines, 16 word lines:
  – tag = bits 31:13, index = bits 12:6, word offset = bits 5:2, byte offset = bits 1:0
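The partitioning can be checked with a short Python sketch. It assumes 32-bit byte addresses, 4-byte words, and power-of-two sizes; the function name `split_address` and the sample address are illustrative only.

```python
def split_address(addr, num_lines=128, words_per_line=16):
    """Split a byte address into (tag, index, word offset, byte offset)."""
    # For powers of two, bit_length() - 1 equals log2.
    index_bits = num_lines.bit_length() - 1       # 7 bits for 128 lines
    word_bits = words_per_line.bit_length() - 1   # 4 bits for 16-word lines
    byte_offset = addr & 0x3                      # 2 bits: byte within the word
    word_offset = (addr >> 2) & (words_per_line - 1)
    index = (addr >> (2 + word_bits)) & (num_lines - 1)
    tag = addr >> (2 + word_bits + index_bits)
    return tag, index, word_offset, byte_offset


tag, index, word_offset, byte_offset = split_address(0x12345678)
print(hex(tag), index, word_offset, byte_offset)
```

With the default sizes the fields are 19, 7, 4, and 2 bits wide, which matches the 128-line, 16-word example.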
[Figure: cache organization]
The Three C’s
• Three different kinds of misses:
  – Compulsory (cold-start) misses
    • First access to a block
  – Capacity misses
    • Replaced block is needed again
    • Because… cache capacity isn’t sufficient for the program
  – Conflict (collision) misses
    • Multiple blocks compete for the same set
Associativity
• 2-way set associative:
  – Two choices where to store a given line
• Replacement policy (ex. LRU)
[Figure: 2-way set associative cache; two banks (tags 0 and tags 1) of 8 lines (000 through 111) with 8 words each; tag = addr(31:8), index = addr(7:5)]
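A rough Python sketch of a 2-way set associative lookup with LRU replacement follows. It assumes the same index and tag fields as the figure (index = addr(7:5), tag = addr(31:8)); the class name `TwoWayLRUCache` and the trace are hypothetical, and real hardware tracks LRU with a bit per set rather than an ordered list.

```python
class TwoWayLRUCache:
    """Each set holds up to two tags, ordered least to most recently used."""

    def __init__(self, num_sets=8):
        self.num_sets = num_sets
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        index = (addr >> 5) % self.num_sets  # addr(7:5) selects the set
        tag = addr >> 8                      # addr(31:8)
        ways = self.sets[index]
        if tag in ways:
            # Hit: move the tag to the most-recently-used position.
            ways.remove(tag)
            ways.append(tag)
            return True
        if len(ways) == 2:
            ways.pop(0)                      # evict the least recently used way
        ways.append(tag)
        return False


cache = TwoWayLRUCache()
trace = [0x000, 0x100, 0x000, 0x200, 0x100]  # all map to set 000
results = [cache.access(a) for a in trace]
print(results)
```

The third access hits here because 0x000 and 0x100 can share set 000; in a direct mapped cache with the same index bits they would evict each other.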
[Figure: associative cache organization]
Cache Behavior
• Hits at the top-level cache can usually be performed in one (or a few) clock cycles
• Misses stall the processor
• Writes can be handled using
  – Write-through (write allocate, write no-allocate)
    • When cache data is changed, the lower level memory is updated immediately
    • Use a write buffer
  – Write-back
    • When cache data is changed, the lower level memory isn’t updated until the cache line containing the changes is replaced
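To make the difference between the two write policies concrete, the sketch below counts how often the lower level memory is written for a sequence of stores. It is a simplified model under stated assumptions: a small direct mapped cache, write allocate on a miss for both policies, no write buffer, and lines that are still dirty at the end are not flushed. The function name `count_memory_writes` and the store trace are made up.

```python
def count_memory_writes(stores, policy, num_lines=8, line_bytes=32):
    """Count writes to the lower level for a list of store byte addresses.

    policy is "write-through" or "write-back".
    """
    tags = [None] * num_lines
    dirty = [False] * num_lines
    memory_writes = 0
    for addr in stores:
        line = addr // line_bytes
        index = line % num_lines
        tag = line // num_lines
        if tags[index] != tag:
            # Miss: allocate the line; write-back must first flush a dirty victim.
            if policy == "write-back" and dirty[index]:
                memory_writes += 1
            tags[index] = tag
            dirty[index] = False
        if policy == "write-through":
            memory_writes += 1      # every store also goes to memory
        else:
            dirty[index] = True     # defer the update until eviction
    return memory_writes


stores = [0x00, 0x04, 0x08, 0x100, 0x0C]  # hypothetical store addresses
through = count_memory_writes(stores, "write-through")
back = count_memory_writes(stores, "write-back")
print(through, back)
```

Write-through pays one memory write per store, while write-back pays one line write per dirty eviction, so repeated stores to the same line are cheaper under write-back.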
Memory Systems
• Main memory is DRAM, designed for density (not access time)
• How to reduce miss penalty?
Average Memory Access Time
• AMAT = hit_time + miss_rate * miss_penalty
• Reduce miss rate:
  – Larger cache (capacity misses)
  – Increase associativity (conflict misses)
  – Replacement policy
  – Each of these may increase hit time and miss penalty
• Reduce miss penalty:
  – Wider or banked memory bus
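A quick worked example of the AMAT formula, with made-up numbers, shows the trade-off described above: lowering the miss rate can pay off even if the hit time grows slightly. The specific hit times, miss rates, and penalty are assumptions chosen only for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical baseline: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
baseline = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)

# Assumed effect of higher associativity: fewer misses, slightly slower hit.
improved = amat(hit_time=1.2, miss_rate=0.03, miss_penalty=100)

print(baseline, improved)
```

Under these assumptions the baseline works out to about 6 cycles per access and the more associative design to about 4.2 cycles.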
Virtual Memory
• Main memory acts as a cache to secondary storage
  – Allows memory to be shared
  – Makes memory appear to be larger than it physically is
• Each program has its own address space
  – Enforces protection
• Virtual memory block is called a page, a miss is called a page fault
• Virtual addresses are translated into physical addresses
  – Address mapping / address translation
  – Combination of hardware and software
Page Faults
• Main memory is about 100,000 times faster than disk
  – Page faults are expensive
• Reduce page fault rate
  – Fully associative placement of pages in memory
• Each process has a page table that maps virtual addresses to physical addresses
• OS creates space on disk for all the process’s pages
  – Swap space
• OS maintains another table that keeps track of each page in main memory
  – During a page fault, the OS must decide which page to replace
  – Least recently used (LRU)
  – Write-back used for writes
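The translation and replacement behavior above can be sketched as a toy model in Python. It is a simplification under stated assumptions: 4 KB pages, a single process, a page table that only lists resident pages, exact LRU (real operating systems usually approximate it), and a counter standing in for disk writes. The class name `PagedMemory` and all addresses are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size


class PagedMemory:
    def __init__(self, num_frames):
        # virtual page number -> [physical frame, dirty]; least recently used first
        self.page_table = OrderedDict()
        self.free_frames = list(range(num_frames))
        self.page_faults = 0
        self.disk_writes = 0

    def translate(self, vaddr, write=False):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.page_table:
            self.page_table.move_to_end(vpn)        # mark as most recently used
        else:
            self.page_faults += 1
            if self.free_frames:
                frame = self.free_frames.pop(0)
            else:
                # Replace the LRU page; write it back to disk only if dirty.
                _victim, (frame, dirty) = self.page_table.popitem(last=False)
                if dirty:
                    self.disk_writes += 1
            self.page_table[vpn] = [frame, False]   # page read in from disk
        entry = self.page_table[vpn]
        if write:
            entry[1] = True
        return entry[0] * PAGE_SIZE + offset


mem = PagedMemory(num_frames=2)
physical = [
    mem.translate(0x0000, write=True),
    mem.translate(0x1004),
    mem.translate(0x2008),   # evicts page 0, which is dirty
    mem.translate(0x1004),
]
print([hex(p) for p in physical], mem.page_faults, mem.disk_writes)
```

Because placement is fully associative, any virtual page can land in any free frame; only the page offset is carried over unchanged.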
[Figure: page table]
TLB
• Page lookups must be performed in hardware
  – Page table is cached on-chip
  – Translation-lookaside buffer (TLB)
  – Small fully associative or large limited associative
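As a rough illustration of the TLB acting as a cache of the page table, here is a small fully associative model in Python. The LRU replacement, the 2-entry capacity, the class name `TLB`, and the sample page table are assumptions made for the sketch; a missing page table entry would correspond to a page fault and is not handled here.

```python
from collections import OrderedDict


class TLB:
    """Small fully associative cache of page table entries (vpn -> frame)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()   # least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
            self.hits += 1
            return self.entries[vpn]
        # TLB miss: fall back to the (slower) page table, then cache the entry.
        self.misses += 1
        frame = page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # drop the LRU entry
        self.entries[vpn] = frame
        return frame


page_table = {0: 7, 1: 3, 2: 9}   # hypothetical vpn -> frame mapping
tlb = TLB(capacity=2)
frames = [tlb.lookup(vpn, page_table) for vpn in [0, 1, 0, 2, 1]]
print(frames, tlb.hits, tlb.misses)
```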
Integrating Cache and VM
• Data cannot be in the cache unless it is present in main memory
• Cache can be
  – physically addressed (TLB in critical path)
  – virtually addressed (TLB out of critical path)
• Cache miss requires TLB access
• TLB miss means:
  – page is in memory but we need the TLB entry, or
  – page is not in memory (page fault)
  – (both handled by OS software)
TLB Misses and Page Faults
• When a virtual address causes a page fault…
  1. Look up page table entry and find location on disk
  2. Choose a physical page to replace, write-back if dirty
  3. Read page from disk into chosen physical page (allow another process to run)
• TLB miss in MIPS
  – BadVAddr set, special exception triggered (8000 0000), go to TLB miss handler
  – Context register:
    • bits 31:20 base of the page table
    • bits 19:2 virtual address of the missing page
  – Use Context register directly to load missing entry
    • If the page table entry is invalid, a page fault exception occurs at the normal handler (8000 0180)
  – Move missing entry to EntryLo register
  – Execute tlbwr to move EntryLo to TLB at address stored in Random register (free running counter)
  – Execute eret to return
• TLB miss exception doesn’t save process state (fast) while page fault does (slow)
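The reason the handler can use the Context register directly is that, with the bit layout listed above, its value is already the address of the 4-byte page table entry. The Python sketch below models that idea only. The function names, the dictionary standing in for memory, and the sample values are hypothetical; the valid-bit check and the return via eret are not modeled, and actual MIPS implementations may place the fields at slightly different bit positions.

```python
def make_context(pte_base, vpn):
    """Combine the page table base (bits 31:20) with the missing page (bits 19:2)."""
    return (pte_base & 0xFFF00000) | ((vpn & 0x3FFFF) << 2)


def tlb_miss_handler(context, memory, tlb, random_index):
    """Model of the fast path: load the entry, then write it into the TLB."""
    entry_lo = memory[context]      # Context is used directly as the load address
    tlb[random_index] = entry_lo    # tlbwr: slot chosen by the Random register
    return entry_lo


# Hypothetical page table at 0x80100000 with one entry for virtual page 5.
context = make_context(0x80100000, 5)
memory = {0x80100014: 0xABC}
tlb = [None] * 4
tlb_miss_handler(context, memory, tlb, random_index=2)
print(hex(context), tlb)
```

Shifting the page number left by 2 multiplies it by the 4-byte entry size, so no extra address arithmetic is needed in the handler.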