![Page 1: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/1.jpg)
M E M O R Y
![Page 2: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/2.jpg)
Computer Performance
• It depends in large measure on the interface between processor and memory.
• CPI (or IPC) is affected
CPI = Cycles per instructionIPC = Instructions per cycle
![Page 3: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/3.jpg)
Program locality
• Temporal locality (Data)– Recently referenced information
• Spatial locality– Same address space will be referenced
• Sequential locality (instructions)– (spatial locality) next address will be used
![Page 4: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/4.jpg)
Caches
• Instruction cache
• Data cache
• Unified cache
• Split cache: I-cache & D-cache
![Page 5: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/5.jpg)
Tradeoffs
• Usually cache (L1 and L2) is within CPU chip.
• Thus, area is a concern
• Design alternatives:– Large I-cache
– Equal area: I- and D-cache
– Large unified cache
– Multi-level split
![Page 6: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/6.jpg)
Some terms
• Read: Any read access (by the processor) to cache –instruction (inst. fetch) or data
• Write: Any write access (by the processor) into cache
• Fetch: read to main mem. to load a block
• Access: any reference to memory
![Page 7: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/7.jpg)
Miss Ratio
Number of references to cache not found in cache/total references to cache
Number of I-cache references not found in cache/ instructions executed
totalrefAnymiss
refAnyMissRatio
_
_
numberInst
missrefInst
MissRatio CacheI
_
![Page 8: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/8.jpg)
Placement Problem
Main Memory
Cache Memory
![Page 9: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/9.jpg)
Placement Policies
• WHERE to put a block in cache
• Mapping between main and cache memories.
• Main memory has a much larger capacity than cache memory.
![Page 10: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/10.jpg)
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Memory
Blo
ck n
umbe
r
0
1
2
3
4
5
6
7
Fully Associative Cache
Block can be placed in any location in cache.
![Page 11: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/11.jpg)
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Memory
Blo
ck n
umbe
r
0
1
2
3
4
5
6
7
Direct Mapped Cache
(Block address) MOD (Number of blocks in cache)
12 MOD 8 = 4
Block can be placed ONLY in a single location in cache.
![Page 12: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/12.jpg)
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Memory
Blo
ck n
umbe
r
0
1
2
3
4
5
6
7
Set Associative Cache
(Block address) MOD (Number of sets in cache)
12 MOD 4 = 0
0
1
2
3
Set no.
Blo
ck n
um
be
r
Block can be placed in one of n locations in n-way set associative cache.
![Page 13: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/13.jpg)
Mem. Addr. cache addr.
CPU
Name trans.Cache address
![Page 14: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/14.jpg)
Cache organization
Tag Index blk<21> <8> <5>
:.
<1> <21> <256>Valid Tag Data
CPUaddress
Data
= MUX
![Page 15: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/15.jpg)
TAG
• Contains the “sector name” of block
Datatag
=
Block
Main mem addr
![Page 16: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/16.jpg)
Valid bit(s)
• Indicates if block contains valid data– initial program load into main memory.
• No valid data
– Cache coherency in multiprocessors• Data in main memory could have been chanced by
other processor (or process).
![Page 17: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/17.jpg)
Dirty bit(s)
• Indicates if the block has been written to.– No need in I-caches.– No need in write through D-cache.– Write back D-cache needs it.
![Page 18: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/18.jpg)
Write back
C P U
Main memory
cacheD
![Page 19: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/19.jpg)
Write through
C P U
Main memory
cache
![Page 20: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/20.jpg)
Cache Performance
Access time (in cycles)
Tea = Hit time + Miss rate * Miss penalty
![Page 21: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/21.jpg)
Associativity (D-cache misses per 1000 instructions)
2-way 4-way 8-way
Size (KB) LRU Rdm FIFO LRU Rdm FIFO LRU Rdm FIFO
16 114.1 117.3 115.5 111.7 115.1 113.3 109.0 111.8 110.4
64 103.4 104.9 103.9 102.4 102.3 103.1 99.7 100.5 100.3
256 92.2 92.1 92.5 92.1 92.1 92.5 92.1 92.1 92.5
![Page 22: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/22.jpg)
CPU
Cache memory
Main
memory
![Page 23: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/23.jpg)
Replacement Policies
• FIFO (First-In First-Out)
• Random
• Least-recently used (LRU)– Replace the block that has been unused for the
longest time.
![Page 24: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/24.jpg)
Reducing Cache Miss Penalties
• Multilevel cache
• Critical word first
• Priority to Read over Write
• Victim cache
![Page 25: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/25.jpg)
Multilevel caches
CPU L1 L2 L3Main
![Page 26: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/26.jpg)
Multilevel caches
• Average access time
=Hit timeL1 + Miss rateL1 * Miss penaltyL1
Hit timeL2 + Miss rateL2 * Miss penaltyL2
![Page 27: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/27.jpg)
Critical word first
• The missed word is requested to be sent first from next level.
• Writes are not as critical as reads.
• CPU cannot continue if data or instruction is not read.
Priority to Read over Write misses
![Page 28: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/28.jpg)
Victim cache
• Small fully associative cache.
• Contains blocks that have been discarded (“victims”)
• Four entry cache removed 20-90% of conflict misses (4KB direct-mapped).
![Page 29: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/29.jpg)
![Page 30: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/30.jpg)
Pseudo Associative Cache
• Direct-mapped cache hit time
• If miss – Check other entry in cache– (change the most significant bit)
• If found no long wait
![Page 31: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/31.jpg)
Reducing Cache Miss Rate
• Larger block size
• Larger caches
• Higher Associativity
• Pseudo-associative cache
• Prefetching
![Page 32: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/32.jpg)
Larger Block Size
• Large block take advantage of spatial locality.
• On the other hand, less blocks in cache
![Page 33: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/33.jpg)
0
5
10
15
20
25
16 32 64 128 256
Block size
Mis
s ra
te (
%)
1K
2K
16K
64K
256K
![Page 34: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/34.jpg)
Miss rate versus block size
Block size
Cache size
4K 16K 64K 256K
16 8.57% 3.94% 2.04% 1.09%
32 7.24% 2.87% 1.35% 0.70%
64 7.00% 2.64% 1.06% 0.51%
128 7.78% 2.77% 1.02% 0.49%
256 9.51% 3.29% 1.15% 0.49%
![Page 35: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/35.jpg)
Example
• Memory takes 80 clock cycles of overhead
• Delivers 16 bytes every 2 clock cycles (cc)
Mem. System supplies 16 bytes in 82 cc
32 bytes in 84 cc
1 cycle (miss) + 1 cycle (read)
![Page 36: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/36.jpg)
Example (cont’d)
Average access time
=Hit timeL1 + Miss rateL1 * Miss penaltyL1
For a 16-byte block in a 4 KB cache
Ave. access time = 1 + (8.57% * 82)=8.0274
![Page 37: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/37.jpg)
Example (cont’d)
Average access time
=Hit timeL1 + Miss rateL1 * Miss penaltyL1
For a 64-byte block in a 256 KB cache (miss rate 0.51%)
Ave. access time = 1 + (0.51% * 88)= 1.449
![Page 38: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/38.jpg)
Ave. mem. access time versus block size
Block size
Miss penalty
Cache size
4K 16K 64K 256K
16 82 8.027 4.231 2.673 1.894
32 84 7.082 3.411 2.134 1.588
64 88 7.160 3.323 1.933 1.449
128 96 8.469 3.659 1.979 1.470
256 112 11.651 4.685 2.288 1.549
![Page 39: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/39.jpg)
Two-way cache (Alpha)
![Page 40: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/40.jpg)
![Page 41: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/41.jpg)
Higher Associativity
• Having higher associativity is NOT for free
• Slower clock may be required
• Thus, there is a trade-off between associativity (with higher hit rate) and faster clocks (direct-mapped).
![Page 42: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/42.jpg)
Associativity example
• If clock cycle time (cct) increases as follows:– cct2-way = 1.1cct1-way
– cct4-way = 1.12cct1-way
– cct8-way = 1.14cct1-way
• Miss penalty is 50 cycles
• Hit time is 1 cycle
![Page 43: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/43.jpg)
![Page 44: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/44.jpg)
Pseudo Associative Cache
• Direct-mapped cache hit time
• If miss – Check other entry in cache– (change the most significant bit)
• If found no long wait
![Page 45: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/45.jpg)
Pseudo Associative
time
Hit time
Pseudo hit time Miss penalty
![Page 46: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/46.jpg)
Hardware prefetching
• What about fetching two blocks in a miss.– Alpha AXP 21064
• Problem: real misses may be delayed due to prefetching
![Page 47: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/47.jpg)
Fetch Policies
• Memory references:– 90% reads– 10% writes
• Thus, read has higher priority
![Page 48: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/48.jpg)
Fetch
• Fetch on miss– Demand fetching
• Prefetching– Instructions (I-Cache)
![Page 49: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/49.jpg)
Hardware prefetching
• What about fetching two blocks in a miss.– Alpha AXP 21064
• Problem: real misses may be delayed due to prefetching
![Page 50: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/50.jpg)
Compiler Controlled Prefetching
• Register prefetching– Load values in regs before they are needed
• Cache prefetching – Loads data in cache only
This is only possible if we have nonblocking or lockup-free caches.
![Page 51: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/51.jpg)
Order Main Memory to Cache
• Needed AU is in the middle of the block• Block load
– The complete block is loaded
• Load forward– The needed AU is forwarded
– AUs behind the miss are not loaded
• Fetch bypass– Start with needed AU and AUs behind are loaded later
AU=Addressable Unit
![Page 52: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/52.jpg)
Compiler optimization:Merging Arrays
• The goal is to improve locality
• Rather than having two independent arrays– we could have only one.
![Page 53: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/53.jpg)
Compiler optimization: Loop interchange
• Nested loops with nonsequential memory access.
A11 A12 A13 A21
A22
A23A31 A32 A33
Array
A11
A12
A13
A21 A22 A23
A31
A32
A33
![Page 54: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/54.jpg)
A11 A12 A13 A21
A22
A23A31 A32 A33
Array
A11
A12
A13
A21 A22 A23
A31
A32
A33
Memory
![Page 55: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/55.jpg)
Priority to Read over Write
• If two (or more) misses occur give priority to read.
M[512] R3
R1 M[1024]
R1 M[512]
• Write buffer
![Page 56: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/56.jpg)
Non-blocking caches
• On a miss the cache should not block any access
• This is important for an out-of-order execution machine (Tomasulo approach).
![Page 57: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/57.jpg)
Access time
Mem_ access_timeL1 = Hit_rateL1 + Miss_rateL1 X Miss_penaltyL1
Mem_ access_time = Hit_rate + Miss_rate X Miss_penalty
Miss_penaltyL1 = Hit_rateL2 + Miss_rateL2 X Miss_penaltyL2
![Page 58: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/58.jpg)
Reducing Hit Time
• Critical in increasing the processor clock frequency.
• “smaller” hardware is faster– Less sequential operations
• Direct mapping is simpler
Small & simple caches
![Page 59: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/59.jpg)
Avoid address translation
• Virtual address Physical address.
![Page 60: M E M O R Y. Computer Performance It depends in large measure on the interface between processor and memory. CPI (or IPC) is affected CPI = Cycles per](https://reader035.vdocuments.us/reader035/viewer/2022062519/5697bff31a28abf838cbc9c8/html5/thumbnails/60.jpg)
Example
Cache Size I-cache D-cacheUnified Cache
8 KB 0.82% 12.22% 4.63%
16 KB 0.38% 11.36% 3.75%
32 KB 0.16% 11.73% 3.18%
64 KB 0.065% 10.25% 2.89%
128 KB 0.03% 9.81% 2.66%
256 KB 0.002% 9.06% 2.42%