COMP25212 Cache 4
Caches in Systems
Learning Objectives
• To understand:
– “3 x C’s” model of cache performance
– Time penalties for starting with an empty cache
– Systems interconnect issues with caching, and solutions!
– Caching and Virtual Memory
Describing Cache Misses
• Compulsory Misses
• Capacity Misses
• Conflict Misses
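The three categories can be told apart by replaying the same address trace through a fully associative LRU cache of the same size. The sketch below assumes a direct-mapped cache and block-granular addresses; the function name `classify_misses` and the traces are illustrative, not from the slides. A miss is compulsory on the first reference to a block, capacity if the fully associative cache would also miss, and conflict otherwise.

```python
from collections import OrderedDict

def classify_misses(block_addresses, num_lines):
    """Classify misses of a direct-mapped cache with num_lines lines."""
    direct = {}            # index -> block currently held (direct-mapped cache)
    fully = OrderedDict()  # fully associative LRU cache of the same size
    seen = set()           # every block referenced so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_addresses:
        # Reference the fully associative LRU cache.
        fa_hit = block in fully
        if fa_hit:
            fully.move_to_end(block)
        else:
            if len(fully) >= num_lines:
                fully.popitem(last=False)  # evict the least recently used block
            fully[block] = True

        # Reference the direct-mapped cache.
        index = block % num_lines
        dm_hit = direct.get(index) == block
        direct[index] = block

        if dm_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1   # first ever reference
        elif not fa_hit:
            counts["capacity"] += 1     # would miss even without index conflicts
        else:
            counts["conflict"] += 1     # only missed because of the mapping
        seen.add(block)
    return counts

# Blocks 0 and 4 share index 0 in a 4-line cache and keep evicting each other.
print(classify_misses([0, 4, 0, 4], num_lines=4))
# Five distinct blocks do not fit in four lines, so block 0 is lost to capacity.
print(classify_misses([0, 1, 2, 3, 4, 0], num_lines=4))
```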
Cache Performance again
• Today’s caches, how long does it take:
a) to fill L3 cache? (8MB)
b) to fill L2 cache? (256KB)
c) to fill L1 D cache? (32KB)
• Number of lines = (cache size) / (line size)
• Number of lines = 32K/64 = 512
• 512 x memory access times at 20 ns ≈ 10 µs
• ≈ 20,000 clock cycles at 2GHz
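The same arithmetic can be repeated for the other two cache sizes. This is a rough sketch that assumes every level has 64-byte lines and that each line costs one 20 ns memory access with no overlap; the slide states these figures only for the L1 case, so the L2 and L3 numbers are extrapolations.

```python
def fill_time(cache_bytes, line_bytes=64, access_ns=20, clock_ghz=2.0):
    """Return (lines, fill time in ns, clock cycles) for an initially empty cache."""
    lines = cache_bytes // line_bytes
    time_ns = lines * access_ns        # one memory access per line
    cycles = time_ns * clock_ghz       # cycles = ns x GHz
    return lines, time_ns, cycles

for name, size in [("L1 D", 32 * 1024), ("L2", 256 * 1024), ("L3", 8 * 1024 * 1024)]:
    lines, time_ns, cycles = fill_time(size)
    print(f"{name}: {lines} lines, {time_ns / 1000:.2f} us, {cycles:.0f} cycles")
```

Under these assumptions the L1 case gives 512 lines and 10.24 µs, matching the rounded figures on the slide.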
Caches in Systems
[Diagram: on-chip, a CPU with an L1 Inst Cache (fetch), an L1 Data Cache (data) and an L2; off-chip, interconnect stuff linking RAM Memory and Input/Output (e.g. disk, network). Annotation: how often?]
Cache Consistency Problem 1
• Problem:
– I/O writes to memory; the cached copy is now out of date
[Diagram: the Caches in Systems layout with item “I” marked at three points]
Cache Consistency Problem 2
[Diagram: the Caches in Systems layout with item “I” marked at three points]
• Problem:
– I/O reads memory; the cache holds newer data
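Both consistency problems can be reproduced with a toy model. The sketch below assumes a write-back cache and uses a plain dictionary as RAM; the class `WriteBackCache` and the address `0x100` are made up for illustration, and the I/O device is modelled simply as code that touches `memory` directly.

```python
class WriteBackCache:
    """Toy write-back cache in front of a dict that stands in for RAM."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses written by the CPU but not yet in RAM

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from RAM
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value              # RAM is not updated yet
        self.dirty.add(addr)

memory = {0x100: "old"}
cache = WriteBackCache(memory)

# Problem 1: I/O writes to memory; the cached copy is out of date.
cache.read(0x100)                       # CPU caches "old"
memory[0x100] = "from disk"             # device writes straight to RAM
stale_cpu_view = cache.read(0x100)      # CPU still sees the old value

# Problem 2: I/O reads memory; the cache holds newer data.
cache.write(0x100, "new from CPU")      # stays in the cache
stale_io_view = memory[0x100]           # device sees what is in RAM

print(stale_cpu_view, "|", stale_io_view)
```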
Cache Consistency Software Solutions
• O/S knows where I/O takes place in memory
– Mark I/O areas as non-cacheable (how?)
• O/S knows when I/O starts and finishes
– Clear caches before and after I/O?
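A minimal sketch of both software approaches, using the same toy style of model: a set of addresses treated as non-cacheable, plus explicit `flush` and `invalidate` operations that the O/S would call around an I/O transfer. All names here are hypothetical. On real hardware, non-cacheable regions are commonly marked through per-page attributes in the page tables, which this sketch does not model.

```python
class SoftwareManagedCache:
    """Toy write-back cache whose consistency is managed by software."""

    def __init__(self, memory, uncached=()):
        self.memory = memory
        self.uncached = set(uncached)  # addresses the O/S marked non-cacheable
        self.lines = {}
        self.dirty = set()

    def read(self, addr):
        if addr in self.uncached:              # bypass the cache entirely
            return self.memory[addr]
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        if addr in self.uncached:
            self.memory[addr] = value
            return
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self):
        """Write dirty lines back to RAM (call before a device reads memory)."""
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()

    def invalidate(self):
        """Discard all lines (call after a device has written memory).
        Dirty data is lost, so flush first if it matters."""
        self.lines.clear()
        self.dirty.clear()

memory = {0x100: "old", 0x200: "status0"}
cache = SoftwareManagedCache(memory, uncached=[0x200])

cache.write(0x100, "cpu data")
cache.flush()                      # before output: RAM now holds the new data
io_reads = memory[0x100]

memory[0x100] = "device data"      # input: device writes RAM
cache.invalidate()                 # after input: drop the stale copy
cpu_reads = cache.read(0x100)

memory[0x200] = "status1"          # device updates a non-cacheable location
uncached_read = cache.read(0x200)  # always comes from RAM
```

Clearing the whole cache is simple but throws away useful lines as well, which is one reason hardware solutions are attractive.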
Hardware Solutions: 1
Unfortunately: tends to slow down cache
[Diagram: the Caches in Systems layout with item “I” marked at three points]
Hardware Solutions: 2 - Snooping
[Diagram: the Caches in Systems layout with item “I” marked at three points and a “snoop” label]
Snoop logic in cache observes every memory cycle
L2 keeps track of L1 contents
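The snooping idea can be sketched as a cache that registers itself with the bus and is told about every memory cycle issued by another bus master. This is a toy model under stated assumptions: the `Bus` and `SnoopingCache` classes are hypothetical, the policy is invalidate on a snooped write and write back on a snooped read, and the L2 tracking of L1 contents is not modelled.

```python
class Bus:
    """Toy memory bus that lets registered caches observe every cycle."""

    def __init__(self, memory):
        self.memory = memory
        self.snoopers = []

    def write(self, addr, value, source):
        self.memory[addr] = value
        for snooper in self.snoopers:
            if snooper is not source:
                snooper.snoop_write(addr)

    def read(self, addr, source):
        for snooper in self.snoopers:
            if snooper is not source:
                snooper.snoop_read(addr)   # may write newer data back first
        return self.memory[addr]


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        self.dirty = set()
        bus.snoopers.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.bus.read(addr, self)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def snoop_write(self, addr):
        # Another bus master wrote this address: our copy is stale, drop it.
        self.lines.pop(addr, None)
        self.dirty.discard(addr)

    def snoop_read(self, addr):
        # Another bus master is reading: make sure RAM has our newer data.
        if addr in self.dirty:
            self.bus.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)

memory = {0x100: "old"}
bus = Bus(memory)
cache = SnoopingCache(bus)

cache.read(0x100)
bus.write(0x100, "from disk", source="io")    # Problem 1: I/O writes memory
after_input = cache.read(0x100)

cache.write(0x100, "new from CPU")
after_output = bus.read(0x100, source="io")   # Problem 2: I/O reads memory
```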
Caches and Virtual Addresses
• CPU addresses – virtual
• Memory addresses – physical
• Recap – use TLB to translate v-to-p
• What addresses in cache?
Option 1: Cache by Physical Addresses
[Diagram: on-chip, CPU → TLB → cache ($) → RAM Memory, with address and data paths]
• BUT:
– Address translation in series with cache: SLOW
Option 2: Cache by Virtual Addresses
[Diagram: on-chip, CPU → cache ($) → TLB → RAM Memory, with address and data paths]
• BUT:
– Snooping?
– Aliasing?
More Functional Difficulties
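The aliasing difficulty can be illustrated with two virtual pages that map to the same physical page. In the sketch below the page table contents, the 4 KB page size and the helper names are all assumptions for illustration; the cache is keyed purely by virtual address, so it ends up holding two independent copies of one physical location.

```python
PAGE = 4096                            # assumed page size
page_table = {0x10: 0x7, 0x20: 0x7}    # two virtual pages, one physical page

def translate(vaddr):
    """Virtual-to-physical translation via the toy page table."""
    return page_table[vaddr // PAGE] * PAGE + vaddr % PAGE

a = 0x10 * PAGE + 8                    # two different virtual addresses...
b = 0x20 * PAGE + 8                    # ...for the same physical location
memory = {translate(a): "old"}
vcache = {}                            # virtual address -> cached value

def cached_read(vaddr):
    if vaddr not in vcache:
        vcache[vaddr] = memory[translate(vaddr)]
    return vcache[vaddr]

def cached_write(vaddr, value):
    vcache[vaddr] = value              # only the copy under this virtual address

cached_read(a)
cached_read(b)                         # second copy of the same physical data
cached_write(a, "new")
alias_view = cached_read(b)            # the alias still returns the old value
```

Snooping is awkward for a related reason: bus cycles carry physical addresses, while this cache can only be searched by virtual address.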
Option 3: Translate in parallel with Cache Lookup
• Translation only affects high-order bits of address
• Address within page remains unchanged
Low-order bits of Physical Address = low-order bits of Virtual Address
Select “index” field of cache address from within low-order bits
Only “Tag” bits changed by translation
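A short sketch of why this works, assuming 4 KB pages and 64-byte lines (neither figure is given on this slide). With 6 within-line bits and 6 index bits, index plus within-line occupy exactly the 12 page-offset bits, so they are identical in the virtual and physical address. The helper `index_fits_in_page` checks the resulting size constraint: the bytes covered by one way of the cache must not exceed the page size. The page numbers used are hypothetical.

```python
PAGE_BITS = 12   # assumed 4 KB pages
LINE_BITS = 6    # assumed 64-byte lines

def split(addr, index_bits):
    """Return (tag, index, within_line) for a cache with 2**index_bits sets."""
    within_line = addr & ((1 << LINE_BITS) - 1)
    index = (addr >> LINE_BITS) & ((1 << index_bits) - 1)
    tag = addr >> (LINE_BITS + index_bits)
    return tag, index, within_line

def index_fits_in_page(cache_bytes, ways):
    """True if the index and within-line bits all lie inside the page offset."""
    return cache_bytes // ways <= (1 << PAGE_BITS)

# Hypothetical mapping: virtual page 0x12345 -> physical page 0x00777.
vaddr = 0x12345ABC
paddr = (0x00777 << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))

v_tag, v_index, v_offset = split(vaddr, index_bits=6)
p_tag, p_index, p_offset = split(paddr, index_bits=6)
print(hex(v_tag), hex(p_tag))                        # tags differ
print(v_index == p_index, v_offset == p_offset)      # low-order fields agree

print(index_fits_in_page(32 * 1024, ways=8))   # 4 KB per way: fits
print(index_fits_in_page(32 * 1024, ways=1))   # 32 KB per way: index spills into page number
```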
Option 3 in operation:
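[Diagram: the virtual address is divided into virtual page no, index and within line fields. The index selects a tag line and a data line; the TLB translates the virtual page no into the physical address, which is compared (= ?) with the tag to give Hit?; a multiplexer uses the within line bits to select Data.]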
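The lookup can be sketched in software, although real hardware performs the cache indexing and the TLB translation at the same time rather than one after the other. The field widths below (32-byte lines, 128 sets, 4 KB pages), the direct-mapped organisation and the TLB contents are assumptions chosen so that index plus within-line bits equal the page offset; the slide's own bit positions did not survive extraction.

```python
LINE_BITS = 5     # assumed 32-byte lines
INDEX_BITS = 7    # assumed 128 sets: 5 + 7 = 12 page-offset bits
PAGE_BITS = 12    # assumed 4 KB pages

tlb = {0x12345: 0x00777, 0x54321: 0x00888}   # hypothetical translations
tags = [None] * (1 << INDEX_BITS)            # one physical tag per line
data = [None] * (1 << INDEX_BITS)            # one data line per line

def index_of(vaddr):
    return (vaddr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)

def fill(vaddr, line):
    """Install a line, tagging it with the physical page number."""
    tags[index_of(vaddr)] = tlb[vaddr >> PAGE_BITS]
    data[index_of(vaddr)] = line

def lookup(vaddr):
    within_line = vaddr & ((1 << LINE_BITS) - 1)
    # Step A (uses only untranslated low-order bits): read tag and data lines.
    tag_line, data_line = tags[index_of(vaddr)], data[index_of(vaddr)]
    # Step B (independent of step A): translate the virtual page number.
    physical_page = tlb[vaddr >> PAGE_BITS]
    # Compare, then let the "multiplexer" pick the byte within the line.
    hit = tag_line == physical_page
    return hit, (data_line[within_line] if hit else None)

fill(0x12345ABC, bytes(range(32)))
print(lookup(0x12345ABC))   # same page, same line: hit
print(lookup(0x54321ABC))   # same index, different physical page: miss
```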
The Last Word on Caching?
[Diagram: RAM Memory and Input/Output shared by three chips. On each chip there are two CPUs, each with its own L1 Inst Cache (fetch), L1 Data Cache (data) and L2, and an L3 on the chip.]
You ain’t seen nothing yet!