CSCE 212 Chapter 7
Memory Hierarchy
Instructor: Jason D. Bakos
Memory Hierarchy
• Programmers want more memory and faster memory
• Problems:
  – Denser memories require longer access times
    • Example: papers on your desk vs. papers in your filing cabinet
  – Fast memories are extremely expensive per unit capacity
    • Examples:
      – SRAM: 0.5 – 5 ns access time, $1K/GB
      – DRAM: 50 – 70 ns access time, $100/GB
      – Magnetic disk: 5 – 20 ms access time, $0.10/GB
Locality
• Goal:
  – Achieve the access time of smaller memories but have the effective capacity of larger memories
• Solution:
  – Temporal locality
    • memory locations are accessed more than once
  – Spatial locality
    • when a memory location is accessed, there’s a good chance a nearby location will be accessed in the near future
![Page 4: CSCE 212 Chapter 7 Memory Hierarchy](https://reader036.vdocuments.us/reader036/viewer/2022062314/568145bb550346895db2c477/html5/thumbnails/4.jpg)
Memory Hierarchy
Memory Hierarchy
• Each level of the hierarchy stores a subset of the level below it
• Each level can only communicate with the level below it
• For now, assume 2-level hierarchy
  – CPU-cache-RAM
  – cache is usually on-chip
• Sometimes the data we need is not in cache
  – hit rate: fraction of accesses found in the cache
• Block or line
  – unit of transfer between levels; exploits spatial locality
• miss penalty
  – time required to move a line to the top of the hierarchy (may vary)
[Figure: CPU, cache, main memory]
Caches
• Questions:
1. How do we know if the requested location is in the cache?
2. How do we find it?
Cache Organization
• Fully associative
  – Too many tags to compare!
[Figure: cache of n words, each with a tag; tag = address(31 downto (log2 n + 2))]
![Page 8: CSCE 212 Chapter 7 Memory Hierarchy](https://reader036.vdocuments.us/reader036/viewer/2022062314/568145bb550346895db2c477/html5/thumbnails/8.jpg)
Direct Mapped Cache
Direct Mapped Cache
• Direct mapped – each memory location maps to only one location in the cache
[Figure: 8 cache lines (index 000 to 111), each holding a tag and 8 words; tag = addr(31:8), index = addr(7:5)]
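A minimal sketch of a direct-mapped lookup, assuming the layout on this slide: 8 lines of 8 words (32 bytes per line), index = addr(7:5), tag = addr(31:8). The class name `DirectMappedCache` and its `access` method are illustrative and do not come from the slides.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: 8 lines, 32 bytes (8 words) per line."""

    def __init__(self, num_lines=8, line_bytes=32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # None marks an invalid (empty) line

    def access(self, addr):
        """Return True on a hit; on a miss, install the line and return False."""
        block = addr // self.line_bytes   # drop the word and byte offsets
        index = block % self.num_lines    # addr(7:5) for the default sizes
        tag = block // self.num_lines     # addr(31:8) for the default sizes
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag            # replace whatever was in this line
        return False


cache = DirectMappedCache()
# 0x000 and 0x100 share index 0 but have different tags, so they evict each other.
results = [cache.access(a) for a in (0x000, 0x004, 0x100, 0x000)]
print(results)  # [False, True, False, False]
```

The last access misses even though the cache is nearly empty, because each address has exactly one line it can occupy.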
Addresses
• The memory address can be partitioned:
  – tag: the remaining high-order bits
  – index: log2(lines) bits (which line in each set?)
  – word offset: log2(line_size) bits (which word in the line?)
  – byte offset: 2 bits (which byte in the word?)
• Example: 128 lines, 16-word lines (7 index bits, 4 word offset bits):
  – tag: bits 31:13
  – index: bits 12:6
  – word offset: bits 5:2
  – byte offset: bits 1:0
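The partition described above can be sketched as a small function. This assumes 32-bit byte addresses, 4-byte words, and power-of-two sizes; the name `split_address` and its parameters are hypothetical.

```python
def split_address(addr, num_lines=128, words_per_line=16):
    """Split a 32-bit byte address into (tag, index, word_offset, byte_offset).

    Assumes num_lines and words_per_line are powers of two.
    """
    index_bits = num_lines.bit_length() - 1        # log2(128) = 7
    word_bits = words_per_line.bit_length() - 1    # log2(16) = 4
    byte_offset = addr & 0x3                       # bits 1:0
    word_offset = (addr >> 2) & ((1 << word_bits) - 1)
    index = (addr >> (2 + word_bits)) & ((1 << index_bits) - 1)
    tag = addr >> (2 + word_bits + index_bits)     # everything above the index
    return tag, index, word_offset, byte_offset


fields = split_address(0x12345678)
print(fields)  # (37282, 89, 14, 0), i.e. tag 0x91A2, index 89, word 14, byte 0
```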
![Page 11: CSCE 212 Chapter 7 Memory Hierarchy](https://reader036.vdocuments.us/reader036/viewer/2022062314/568145bb550346895db2c477/html5/thumbnails/11.jpg)
Cache Organization
The Three C’s
• Three different kinds of misses:
  – Compulsory (cold-start) misses
    • First access to a block
  – Capacity misses
    • Replaced block is needed again
    • Because… cache capacity isn’t sufficient for the program
  – Conflict (collision) misses
    • Multiple blocks compete for the same set
Associativity
• 2-way set associative:
  – Two choices where to store a given line
• Replacement policy (e.g. LRU)
[Figure: two ways (0 and 1), each with 8 lines (index 000 to 111) holding a tag and 8 words; tag = addr(31:8), index = addr(7:5)]
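A minimal sketch of 2-way set-associative lookup with LRU replacement, assuming 8 sets of 32-byte lines as in the figure. The class `TwoWayCache` is illustrative only; real hardware tracks LRU with a bit per set rather than an ordered list.

```python
class TwoWayCache:
    """Toy 2-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets=8, line_bytes=32):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        # Each set holds up to two tags, ordered least to most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then installed)."""
        block = addr // self.line_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        hit = tag in ways
        if hit:
            ways.remove(tag)   # re-appended below as most recently used
        elif len(ways) == 2:
            ways.pop(0)        # evict the least recently used tag
        ways.append(tag)
        return hit


cache = TwoWayCache()
trace = (0x000, 0x100, 0x000, 0x200, 0x100)
results = [cache.access(a) for a in trace]
print(results)  # [False, False, True, False, False]
```

The third access hits here, whereas in a direct-mapped cache of the same geometry 0x100 would already have evicted 0x000.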
![Page 14: CSCE 212 Chapter 7 Memory Hierarchy](https://reader036.vdocuments.us/reader036/viewer/2022062314/568145bb550346895db2c477/html5/thumbnails/14.jpg)
Associative Cache Organization
Cache Behavior
• Hits at the top-level cache can usually be performed in one (or a few) clock cycles
• Misses stall the processor
• Writes can be handled using
  – Write-through (write allocate, write no-allocate)
    • When cache data is changed, the lower level memory is updated immediately
    • Use a write buffer
  – Write-back
    • When cache data is changed, the lower level memory isn’t updated until the cache line containing the changes is replaced
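To compare the two write policies, here is a rough sketch that counts writes reaching the lower-level memory. It assumes a toy cache holding a single 32-byte line, write allocate for both policies, and a final flush of the dirty line under write-back; `count_memory_writes` is a hypothetical helper, not something from the slides.

```python
def count_memory_writes(write_addrs, policy, line_bytes=32):
    """Count lower-level memory writes for a one-line toy cache.

    policy must be "write-through" or "write-back".
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy: " + policy)
    resident = None   # block number of the single cached line
    dirty = False     # only meaningful for write-back
    writes = 0
    for addr in write_addrs:
        block = addr // line_bytes
        if block != resident:
            if policy == "write-back" and dirty:
                writes += 1                  # evicted dirty line is written back
            resident, dirty = block, False   # write allocate: bring the line in
        if policy == "write-through":
            writes += 1                      # every store goes to memory
        else:
            dirty = True                     # defer the update until eviction
    if policy == "write-back" and dirty:
        writes += 1                          # assumed final flush
    return writes


stores = [0, 4, 8, 64, 0]
through = count_memory_writes(stores, "write-through")
back = count_memory_writes(stores, "write-back")
print(through, back)  # 5 3
```

The units differ: write-through counts word writes, while write-back counts whole-line writes, so the comparison is about traffic frequency rather than bytes moved.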
Memory Systems
• Main memory is DRAM, designed for density (not access time)
• How to reduce miss penalty?
Average Memory Access Time
• AMAT = hit_time + miss_rate * miss_penalty
• Reduce miss rate:
  – Larger cache (capacity misses)
  – Increase associativity (conflict misses)
  – Replacement policy
  – Each of these may increase hit time and miss penalty
• Reduce miss penalty:
  – Wider or banked memory bus
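A short worked example of the AMAT formula. The numbers (1-cycle hit time, 100-cycle miss penalty, miss rates of 5% and 2%) are made up for illustration and are not measurements from the slides.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as hit_time."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical cache: 1-cycle hit, 100-cycle miss penalty.
base = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)
improved = amat(hit_time=1, miss_rate=0.02, miss_penalty=100)
print(base, improved)  # about 6 cycles vs. about 3 cycles
```

With a large miss penalty, even a few percentage points of miss rate dominate the average, which is why the slide focuses on reducing misses.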
Virtual Memory
• Main memory acts as a cache to secondary storage
  – Allows memory to be shared
  – Makes memory appear to be larger than it physically is
• Each program has its own address space
  – Enforces protection
• Virtual memory block is called a page, a miss is called a page fault
• Virtual addresses are translated into physical addresses
  – Address mapping / address translation
  – Combination of hardware and software
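Address translation can be sketched as follows, assuming 4 KiB pages (the slides do not state a page size) and a per-process page table modeled as a dictionary from virtual page number to physical page number. `translate`, `PageFault`, and the sample mappings are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size; not given in the slides


class PageFault(Exception):
    """Raised when the virtual page has no mapping in main memory."""


def translate(vaddr, page_table):
    """Map a virtual address to a physical address via the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number, page offset
    ppn = page_table.get(vpn)
    if ppn is None:
        raise PageFault("virtual page %d is not resident" % vpn)
    return ppn * PAGE_SIZE + offset          # the offset passes through unchanged


page_table = {0: 5, 1: 2}                    # made-up mappings
paddr = translate(0x1234, page_table)
print(hex(paddr))  # 0x2234
```

Only the page number is translated; the offset within the page is the same in both address spaces.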
![Page 19: CSCE 212 Chapter 7 Memory Hierarchy](https://reader036.vdocuments.us/reader036/viewer/2022062314/568145bb550346895db2c477/html5/thumbnails/19.jpg)
Virtual Memory
![Page 20: CSCE 212 Chapter 7 Memory Hierarchy](https://reader036.vdocuments.us/reader036/viewer/2022062314/568145bb550346895db2c477/html5/thumbnails/20.jpg)
Virtual Memory
Page Faults
• Main memory is 100,000 times faster than disk
  – Page faults are expensive
• Reduce page fault rate
  – Fully associative placement of pages in memory
• Each process has a page table that maps virtual addresses to physical addresses
• OS creates space on disk for all the process’s pages
  – Swap space
• OS maintains another table that keeps track of each page in main memory
  – During a page fault, the OS must decide which page to replace
  – Least recently used (LRU)
  – Write-back used for writes
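A minimal sketch of LRU page replacement, counting page faults for a sequence of page references. It models exact LRU with an ordered dictionary; real operating systems usually approximate LRU. The function name and the reference string are illustrative.

```python
from collections import OrderedDict


def count_page_faults(page_refs, num_frames):
    """Count page faults under exact LRU with num_frames physical pages."""
    resident = OrderedDict()   # resident pages, least recently used first
    faults = 0
    for page in page_refs:
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
        else:
            faults += 1
            if len(resident) == num_frames:
                resident.popitem(last=False)  # evict the LRU page
            resident[page] = True
    return faults


faults = count_page_faults([1, 2, 3, 1, 4, 2], num_frames=3)
print(faults)  # 5
```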
![Page 22: CSCE 212 Chapter 7 Memory Hierarchy](https://reader036.vdocuments.us/reader036/viewer/2022062314/568145bb550346895db2c477/html5/thumbnails/22.jpg)
Page Table
![Page 23: CSCE 212 Chapter 7 Memory Hierarchy](https://reader036.vdocuments.us/reader036/viewer/2022062314/568145bb550346895db2c477/html5/thumbnails/23.jpg)
Page Table
TLB
• Page lookups must be performed in hardware
  – Page table is cached on-chip
  – Translation-lookaside buffer
  – Small fully associative or large limited associative
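The TLB can be pictured as a small, fully associative cache in front of the page table. The sketch below assumes a 4-entry TLB with LRU replacement purely for determinism (hardware may use other policies); the class `TLB` and the page-table contents are hypothetical.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB caching page-table entries."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # dict: virtual page -> physical page
        self.capacity = capacity
        self.entries = OrderedDict()   # cached translations, LRU first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        """Return the physical page number for vpn, refilling on a miss."""
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        self.misses += 1
        ppn = self.page_table[vpn]     # slow path; KeyError stands in for a page fault
        if len(self.entries) == self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = ppn
        return ppn


tlb = TLB({0: 5, 1: 2, 2: 9}, capacity=2)
ppns = [tlb.lookup(v) for v in (0, 1, 0, 2, 1)]
print(ppns, tlb.hits, tlb.misses)  # [5, 2, 5, 9, 2] 1 4
```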
Integrating Cache and VM
• Data cannot be in the cache unless it is present in main memory
• Cache can be
  – physically addressed (TLB in critical path)
  – virtually addressed (TLB out of critical path)
• Cache miss requires TLB access
• TLB miss means:
  – page is in memory but we need the TLB entry, or
  – page is not in memory (page fault)
  – (both handled by OS software)
TLB Misses and Page Faults
• When a virtual address causes a page fault…
  1. Look up page table entry and find location on disk
  2. Choose a physical page to replace, write back if dirty
  3. Read page from disk into chosen physical page (allow another process to run)
• TLB miss in MIPS
  – BadVAddr set, special exception triggered (8000 0000), go to TLB miss handler
  – Context register:
    • bits 31:20 base of the page table
    • bits 19:2 virtual address of the missing page
  – Use Context register directly to load missing entry
    • If the page table entry is invalid, a page fault exception occurs at the normal handler (8000 0180)
  – Move missing entry to EntryLo register
  – Execute tlbwr to move EntryLo to TLB at address stored in Random register (free running counter)
  – Execute eret to return
• TLB miss exception doesn’t save process state (fast) while page fault does (slow)
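Why the handler can use the Context register "directly" becomes clearer with a small sketch. It models only the field layout stated on this slide (page table base in bits 31:20, missing virtual page in bits 19:2, low two bits zero) and assumes 4-byte page table entries; actual MIPS implementations may lay the register out differently. `context_value` and the sample base address are hypothetical.

```python
def context_value(page_table_base, vpn):
    """Build the Context value for the layout on this slide.

    Bits 31:20 hold the page table base, bits 19:2 the missing virtual page.
    """
    if page_table_base & 0x000FFFFF:
        raise ValueError("base must be aligned so that bits 19:0 are zero")
    if not 0 <= vpn < (1 << 18):
        raise ValueError("vpn must fit in the 18-bit field")
    return page_table_base | (vpn << 2)   # << 2 scales by the 4-byte entry size


ctx = context_value(0x80400000, 3)        # made-up base address
print(hex(ctx))  # 0x8040000c
```

Because the page number already sits two bits up, the register value is the byte address of the page table entry, so the handler can load the entry without any extra arithmetic.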