ece232: hardware organization and design · 2019. 8. 29. · adapted from computer organization and...
TRANSCRIPT
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
ECE232: Hardware Organization and Design
Lecture 28: More Virtual Memory
ECE232: More Virtual Memory 2
Overview
Virtual memory used to protect applications from each other
Portions of application located both in main memory and on disk
Need to speed up access for virtual memory
Idea: use a small cache to store translation for frequently used pages
ECE232: More Virtual Memory 3
How to Translate Fast?
Problem: Virtual Memory requires two memory accesses!
• one to translate Virtual Address into Physical Address (page table lookup) - Page Table is in physical memory
• one to transfer the actual data (hopefully cache hit)
• VM hierarchy only or Cache-memory-disk hierarchy
Why not create a cache of virtual to physical address translations to make translation fast? (smaller is faster)
For historical reasons, such a “page table cache” is called a Translation Lookaside Buffer, or TLB
CPU
Memory
ECE232: More Virtual Memory 4
Translation-Lookaside Buffer (TLB)
Physical Page N-1
Physical Page 0
Physical Page 1
Main Memory
of page 1
H. Stone, “High Performance Computer Architecture,” AW 1993
ECE232: More Virtual Memory 5
TLB and Page Table
ECE232: More Virtual Memory 6
Translation Look-Aside Buffers
TLB is usually small, typically 32-512 entries
Like any other cache, the TLB can be fully associative, set associative, or direct mapped
TLB Cache Main
Memory
misshit
data
hit
miss
Disk
MemoryOS FaultHandler
page fault/protection violation
Page
Table
data
virtualaddr.
physicaladdr.
Processor
ECE232: More Virtual Memory 7
Steps in Memory Access - Example
TLB Cache Main
Memory
misshit
data
hit
miss
Disk
MemoryOS FaultHandler
Page
Table
data
virtualaddr.
physicaladdr.
CPU
ECE232: More Virtual Memory 8
Valid Tag Data
Page offset
Page offset
Virtual page number
Physical page numberValid
1220
20
16 14
Cache index
32
DataCache hit
2Byte
offset
Dirty Tag
TLB hit
Physical page number
Physical address tag
31 30 29 15 14 13 12 11 10 9 8 3 2 1 0 DECStation 3100/MIPS R2000Virtual Address
TLB
Cache
64 entries,
fully
associative
Physical Address
16K entries,
direct
mapped
ECE232: More Virtual Memory 9
Real Stuff: Pentium Pro Memory Hierarchy
Address Size: 32 bits (VA, PA)
VM Page Size: 4 KB
TLB organization: separate i,d TLBs(i-TLB: 32 entries, d-TLB: 64 entries)4-way set associativeLRU approximatedhardware handles miss
L1 Cache: 8 KB, separate i,d 4-way set associativeLRU approximated32 byte blockwrite back
L2 Cache: 256 or 512 KB
ECE232: More Virtual Memory 10
Each processor has: private 32-KB instruction and 32-KB data
caches and a 512-KB L2 cache. The four cores share an 8-MB L3
cache. Each core also has a two-level TLB.
Intel “Nehalim” quad-core processor
13.519.6 mm die;
731 million
transistors;
Two 128-bit
memory channels
ECE232: More Virtual Memory 11
Intel Nehalem AMD Opteron X4
Virtual addr 48 bits 48 bits
Physical addr
44 bits 48 bits
Page size 4KB, 2/4MB 4KB, 2/4MB
L1 TLB(per core)
L1 I-TLB: 128 entries
L1 D-TLB: 64 entries
Both 4-way, LRU replacement
L1 I-TLB: 48 entries
L1 D-TLB: 48 entries
Both fully associative, LRU replacement
L2 TLB(per core)
Single L2 TLB: 512 entries
4-way, LRU replacement
L2 I-TLB: 512 entries
L2 D-TLB: 512 entries
Both 4-way, round-robin LRU
TLB misses Handled in hardware Handled in hardware
Comparing Intel’s Nehalim to AMD’s Opteron
ECE232: More Virtual Memory 12
Intel Nehalem AMD Opteron X4
L1 caches(per core)
L1 I-cache: 32KB, 64-byte blocks, 4-way, approx LRU, hit time n/a
L1 D-cache: 32KB, 64-byte blocks, 8-way, approx LRU, write-back/allocate, hit time n/a
L1 I-cache: 32KB, 64-byte blocks, 2-way, LRU, hit time 3 cycles
L1 D-cache: 32KB, 64-byte blocks, 2-way, LRU, write-back/allocate, hit time 9 cycles
L2 unified cache(per core)
256KB, 64-byte blocks, 8-way, approx LRU, write-back/allocate, hit time n/a
512KB, 64-byte blocks, 16-way, approx LRU, write-back/allocate, hit time n/a
L3 unified cache (shared)
8MB, 64-byte blocks, 16-way, write-back/allocate, hit time n/a
2MB, 64-byte blocks, 32-way, write-back/allocate, hit time 32 cycles
Further Comparison
ECE232: More Virtual Memory 13
Summary
Virtual memory allows the appearance of a main memory that is larger than what is physically present
Virtual memory can be shared by multiple applications
Page table indicates how to translate from virtual to physical address
TLB speeds up access to virtual memory
• Generally set associative or fully associative
• Much smaller than main memory
Next time: Putting it all together (cache, TLB, virtual memory)