machine structures lecture 25 – caches iii
DESCRIPTION
Machine Structures Lecture 25 – Caches III. 100 Gig Eth?! Yesterday Infinera demoed a system that took a 100 GigE signal, broke it up into ten 10GigE signals and sent it over existing optical networks. Fast!!!. gigaom.com/2006/11/14/100gbe. What to do on a write hit?. Write-through - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/1.jpg)
L25 Caches III (1)
Machine Structures
Lecture 25 – Caches III
gigaom.com/2006/11/14/100gbe
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
100 Gig Eth?! Yesterday Infinera
demoed a system that took a 100 GigE signal, broke it up into ten 10GigE signals and sent it over
existing optical networks. Fast!!!
![Page 2: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/2.jpg)
L25 Caches III (2)
What to do on a write hit?•Write-through
• 同时更新 cache 块和对应的内存字•Write-back
• 更新 cache 块中的字• 允许内存中的字为“ stale”
add ‘dirty’ bit to each block indicating that memory needs to be updated when block is replaced
OS flushes cache before I/O…
•Performance trade-offs?
![Page 3: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/3.jpg)
L25 Caches III (3)
Block Size Tradeoff (1/3)•Larger Block Size 优点
• 空间局部性 : 如果访问给定字 , 可能很快会访问附近的其他字
• 适用于存储程序概念 : 如果执行一条指令 , 极有可能会执行接下来的几条指令
• 对于线性数组的访问也是合适的
![Page 4: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/4.jpg)
L25 Caches III (4)
Block Size Tradeoff (2/3)•Larger Block Size 缺点
• 大块意味着更大的 miss penalty 在 miss 时 , 需要花更长的时间从内存中装入新
块• 如果块太大,块的数量就少
结果 : miss 率上升
•总的来说,应最小化平均访问时间 Average Memory Access Time (AMAT)
= Hit Time + Miss Penalty x Miss Rate
![Page 5: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/5.jpg)
L25 Caches III (5)
Block Size Tradeoff (3/3)•Hit Time = 从 cache 中找到并提取数据
所花的时间•Miss Penalty = 在当前层 miss 时,取数
据所花的平均时间 ( 包括在下一层也出现misses 的可能性 )
•Hit Rate = % of requests that are found in current level cache
•Miss Rate = 1 - Hit Rate
![Page 6: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/6.jpg)
L25 Caches III (6)
Extreme Example: One Big Block
•Cache Size = 4 bytes Block Size = 4 bytes• Only ONE entry (row) in the cache!
• If item accessed, likely accessed again soon• But unlikely will be accessed again immediately!
•The next access will likely to be a miss again• Continually loading data into the cache butdiscard data (force out) before use it again
• Nightmare for cache designer: Ping Pong Effect
Cache DataValid BitB 0B 1B 3
TagB 2
![Page 7: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/7.jpg)
L25 Caches III (7)
Block Size Tradeoff ConclusionsMissPenalty
Block Size
Increased Miss Penalty& Miss Rate
AverageAccess
Time
Block Size
Exploits Spatial Locality
Fewer blocks: compromisestemporal locality
MissRate
Block Size
![Page 8: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/8.jpg)
L25 Caches III (8)
Types of Cache Misses (1/2)
•“ Three Cs” Model of Misses
•1st C: Compulsory Misses• occur when a program is first started
• cache does not contain any of that program’s data yet, so misses are bound to occur
• can’t be avoided easily, so won’t focus on these in this course
![Page 9: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/9.jpg)
L25 Caches III (9)
Types of Cache Misses (2/2)•2nd C: Conflict Misses
• 因为两个不同的内存地址映射到相同的cache 地址而发生的 miss
• 两个块可能不断互相覆盖 ( 不幸地映射到同一位置 )
• big problem in direct-mapped caches• 如何减小这个效果 ?
•处理 Conflict Misses• Solution 1: 增加 cache
Fails at some point
• Solution 2: 多个不同的块映射到同样的cache 索引 ?
![Page 10: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/10.jpg)
L25 Caches III (10)
完全关联( Fully Associative ) Cache (1/3)•内存地址字段 :
• Tag: same as before
• Offset: same as before
• Index: non-existant
•What does this mean?• 没有 “ rows” 索引 : 每个块都可以映射到cache 的任一行
• 必须要比较整个 cache 的所有行来发现数据是否在 cache 中
![Page 11: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/11.jpg)
L25 Caches III (11)
Fully Associative Cache (2/3)•Fully Associative Cache (e.g., 32 B block)
• compare tags in parallel
Byte Offset
:
Cache DataB 0
0431
:
Cache Tag (27 bits long)
Valid
:
B 1B 31 :
Cache Tag=
==
=
=:
![Page 12: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/12.jpg)
L25 Caches III (12)
Fully Associative Cache (3/3)•Fully Assoc Cache 优点
• No Conflict Misses ( 因数据可在任何地方 )
•Fully Assoc Cache 缺点• 需要硬件比较来比较所有入口 : 如果在cache 中有 64KB 数据 , 4B 偏移 , 需要16K 比较器 : infeasible
![Page 13: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/13.jpg)
L25 Caches III (13)
Accessing data in a Fully assoc cache•6 Addresses:•0x00000014, 0x0000001C, 0x00000034, 0x00008014, 0x00000030, 0x0000001c
•6 地址分解为 ( 为方便起见 ) Tag, 字节偏移域0000000000000000000000000001 01000000000000000000000000000001 11000000000000000000000000000011 01000000000000000000100000000001 01000000000000000000000000000011 00000000000000000000000000000001 1100 Tag Offset
![Page 14: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/14.jpg)
L25 Caches III (14)
16 KB Fully Asso. Cache, 16B blocks• Valid bit: 确定该行是否存有数据 ( 当计算
机初始化时 , 所有行都为 invalid)
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
Index00000000
00
![Page 15: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/15.jpg)
L25 Caches III (15)
1. Read 0x00000014
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
• 0000000000000000000000000001 0100
Index
Tag field Offset
00000000
00
![Page 16: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/16.jpg)
L25 Caches III (16)
Find the block to match tag (00…001)
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
• 0000000000000000000000000001 0100Tag field Offset
00000000
00
![Page 17: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/17.jpg)
L25 Caches III (17)
No. So we find block to fill 0 (0000000000)
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
• 0000000000000000000000000001 0100Tag field Offset
00000000
00
![Page 18: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/18.jpg)
L25 Caches III (18)
So load that data into cache, setting tag, valid
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
1 1 a b c d
Tag field Offset
00
00000
00
• 0000000000000000000000000001 0100
![Page 19: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/19.jpg)
L25 Caches III (19)
get that data in the offset (0100)
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
1 1 a b c d
Tag field Offset
00
00000
00
• 0000000000000000000000000001 0100
![Page 20: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/20.jpg)
L25 Caches III (20)
2. Read 0x0000001C = 0…00 0..001 1100
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
1 1 a b c d
• 0000000000000000000000000001 1100Tag field Offset
0000000
00
![Page 21: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/21.jpg)
L25 Caches III (21)
Find the tag in cache
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
01 a b c d
• 0000000000000000000000000001 1100
Index
Tag field Offset
1
000000
00
![Page 22: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/22.jpg)
L25 Caches III (22)
The data have found in Cache
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
01 a b c d
• 0000000000000000000000000001 1100
Index
Tag field Offset
1
000000
00
![Page 23: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/23.jpg)
L25 Caches III (23)
Get the data from Offset (1100)
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
01 a b c d
• 0000000000000000000000000001 1100
Index
Tag field Offset
1
000000
00
![Page 24: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/24.jpg)
L25 Caches III (24)
3. Read 0x00000034 = 0…000..011 0100
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
1 1 a b c d
• 0000000000000000000000000011 0100
Index
Tag field Offset
0000000
00
![Page 25: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/25.jpg)
L25 Caches III (25)
Find the tag 3 (0011)
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
01 a b c d
• 0000000000000000000000000011 0100
Index
Tag field Offset
1
000000
00
![Page 26: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/26.jpg)
L25 Caches III (26)
Not Find (0011)
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
01 a b c d
• 0000000000000000000000000011 0100
Index
Tag field Offset
1
000000
00
![Page 27: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/27.jpg)
L25 Caches III (27)
So load that data into cache, setting tag, valid
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
1 a b c d
• 0000000000000000000000000011 0100
Index
Tag field Offset
1
000000
00
3 e f g h1
![Page 28: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/28.jpg)
L25 Caches III (28)
Get the data from Offset (0100)
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
1 a b c d
• 0000000000000000000000000011 0100
Index
Tag field Offset
1
000000
00
3 e f g h1
![Page 29: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/29.jpg)
L25 Caches III (29)
4. Read 0x00008014 = 0…10 0..001 0100
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
• 0000000000000000100000000001 0100
Index
Tag field Offset
1 a b c d
Offset
1
000000
00
3 e f g h1
![Page 30: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/30.jpg)
L25 Caches III (30)
Find Tag(0x801), Not Found
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
• 0000000000000000100000000001 0100
Index
Tag field Offset
1 a b c d
Offset
1
000000
00
3 e f g h1
![Page 31: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/31.jpg)
L25 Caches III (31)
So load that data into cache, setting tag, valid
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
1 a b c d
• 0000000000000000100000000001 0100
Index
Tag field Offset
1
100000
00
3 e f g h1i j k l801
![Page 32: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/32.jpg)
L25 Caches III (32)
Get data from cache
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f
01234567
10221023
...
1 a b c d
• 0000000000000000100000000001 0100
Index
Tag field Offset
1
100000
00
3 e f g h1i j k l801
![Page 33: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/33.jpg)
L25 Caches III (33)
Do an example yourself. What happens?• Chose from: Cache: Hit, Miss, Miss w. replace
Values returned: a ,b, c, d, e, ..., k, l• Read address 0x00000030 ? 000000000000000000 0000000011 0000• Read address 0x0000001c ? 000000000000000000 0000000001 1100
...
ValidTag 0x0-3 0x4-7 0x8-b 0xc-f01234567...
11
i j k l0
3 e f g h
Index1
1
0000
Cache
801
a b c d
![Page 34: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/34.jpg)
L25 Caches III (34)
Answers•0x00000030 a hit
tag = 3, Tag found, Offset = 0, value = e
•0x0000001c a hittag = 1, Tag found, Offset = 0xc, value = d
•Since reads, values must = memory values whether or not cached:•0x00000030 = e•0x0000001c = d
Address Value of WordMemory
0000001000000014000000180000001c
abcd
... ...
0000003000000034000000380000003c
efgh
0000801000008014000080180000801c
ijkl
... ...
... ...
... ...
![Page 35: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/35.jpg)
L25 Caches III (35)
Third Type of Cache Miss•Capacity Misses
• miss that occurs because the cache has a limited size
• miss that would not occur if we increase the size of the cache
• sketchy definition, so just get the general idea
•This is the primary type of miss for Fully Associative caches.
![Page 36: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/36.jpg)
L25 Caches III (36)
N-Way Set Associative Cache (1/3)
•Memory address fields:• Tag: same as before
• Offset: same as before
• Index: 指向正确的行“ row” ( 此处称为集合 )
•有什么区别 ?• 每个 set 包含多个块• 当找到正确的块后 , 需要比较该 set 中的所有 tag 以查找数据
![Page 37: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/37.jpg)
L25 Caches III (37)
Associative Cache Example
• Here’s a simple 2 way set associative cache.
MemoryMemory Address
0123456789ABCDEF
Cache Index
0011
![Page 38: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/38.jpg)
L25 Caches III (38)
N-Way Set Associative Cache (2/3)
• Basic Idea• cache is direct-mapped w/respect to sets
• each set is fully associative
• basically N direct-mapped caches working in parallel: each has its own valid bit and data
•给定内存地址 :• 使用 Index 找到正确的 set.
• 在所找到的 set 中比较所有的 Tag.
• 如果找到 hit!, 否则 miss.
• 最后 , 和前面一样使用偏移字段在块中找到所需的数据 .
![Page 39: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/39.jpg)
L25 Caches III (39)
N-Way Set Associative Cache (3/3)
•What’s so great about this?• even a 2-way set assoc cache avoids a lot of conflict misses
• hardware cost isn’t that bad: only need N comparators
• In fact, for a cache with M blocks,• it’s Direct-Mapped if it’s 1-way set assoc
• it’s Fully Assoc if it’s M-way set assoc
• so these two are just special cases of the more general set associative design
![Page 40: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/40.jpg)
L25 Caches III (40)
4-Way Set Associative Cache Circuit
tagindex
![Page 41: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/41.jpg)
L25 Caches III (41)
Block Replacement Policy• Direct-Mapped Cache: index completely
specifies position which position a block can go in on a miss
• N-Way Set Assoc: index specifies a set, but block can occupy any position within the set on a miss
• Fully Associative: block can be written into any position
•Question: 如果有多个选择 , 我们该写到何处 ?• 如果有空位 , 通常总是写到第一个空位 .
• If all possible locations already have a valid block, we must pick a replacement policy: rule by which we determine which block gets “cached out” on a miss.
![Page 42: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/42.jpg)
L25 Caches III (42)
Block Replacement Policy: LRU•LRU (Least Recently Used)
• Idea: cache out block which has been accessed (read or write) least recently
• Pro: temporal locality recent past use implies likely future use: in fact, this is a very effective policy
• Con: with 2-way set assoc, easy to keep track (one LRU bit); with 4-way or greater, requires complicated hardware and much time to keep track of this
![Page 43: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/43.jpg)
L25 Caches III (43)
Block Replacement Example•We have a 2-way set associative cache
with a four word total capacity and one word blocks. We perform the following word accesses (ignore bytes for this problem):
0, 2, 0, 1, 4, 0, 2, 3, 5, 4
How many hits and how many misses will there be for the LRU block replacement policy?
![Page 44: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/44.jpg)
L25 Caches III (44)
Block Replacement Example: LRU•Addresses 0, 2, 0, 1, 4, 0, ... 0 lru
2
1 lru
loc 0 loc 1
set 0
set 1
0 2lruset 0
set 1
0: miss, bring into set 0 (loc 0)
2: miss, bring into set 0 (loc 1)
0: hit
1: miss, bring into set 1 (loc 0)
4: miss, bring into set 0 (loc 1, replace 2)
0: hit
0set 0
set 1
lrulru
0 2set 0
set 1
lru lru
set 0
set 1
01
lru
lru24lru
set 0
set 1
0 41
lru
lru lru
![Page 45: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/45.jpg)
L25 Caches III (45)
Big Idea
•How to choose between associativity, block size, replacement & write policy?
•Design against a performance model• Minimize: Average Memory Access Time
= Hit Time + Miss Penalty x Miss Rate
• influenced by technology & program behavior
•Create the illusion of a memory that is large, cheap, and fast - on average
•How can we improve miss penalty?
![Page 46: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/46.jpg)
L25 Caches III (46)
Improving Miss Penalty
•When caches first became popular, Miss Penalty ~ 10 processor clock cycles
•Today 2400 MHz Processor (0.4 ns per clock cycle) and 80 ns to go to DRAM 200 processor clock cycles!
Proc $2
DR
AM
$
MEM
Solution: another cache between memory and the processor cache: Second Level (L2) Cache
![Page 47: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/47.jpg)
L25 Caches III (47)
Analyzing Multi-level cache hierarchy
Proc $2
DR
AM
$
L1 hit time
L1 Miss RateL1 Miss Penalty
Avg Mem Access Time = L1 Hit Time + L1 Miss Rate * L1 Miss Penalty
L1 Miss Penalty = L2 Hit Time + L2 Miss Rate * L2 Miss Penalty
Avg Mem Access Time = L1 Hit Time + L1 Miss Rate * (L2 Hit Time + L2 Miss Rate * L2 Miss Penalty)
L2 hit time L2 Miss Rate
L2 Miss Penalty
![Page 48: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/48.jpg)
L25 Caches III (48)
And in Conclusion…• We’ve discussed memory caching in detail.
Caching in general shows up over and over in computer systems
• Filesystem cache• Web page cache• Game databases / tablebases• Software memoization• Others?
• Big idea: if something is expensive but we want to do it repeatedly, do it once and cache the result.
• Cache design choices:• Write through v. write back• size of cache: speed v. capacity• direct-mapped v. associative• for N-way set assoc: choice of N• block replacement policy• 2nd level cache?• 3rd level cache?
• Use performance model to pick between choices, depending on programs, technology, budget, ...
![Page 49: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/49.jpg)
L25 Caches III (49)
BONUS SLIDES
![Page 50: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/50.jpg)
L25 Caches III (50)
Example
•Assume • Hit Time = 1 cycle
• Miss rate = 5%
• Miss penalty = 20 cycles
• Calculate AMAT…
•Avg mem access time = 1 + 0.05 x 20
= 1 + 1 cycles
= 2 cycles
![Page 51: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/51.jpg)
L25 Caches III (51)
Ways to reduce miss rate
•Larger cache• limited by cost and technology
• hit time of first level cache < cycle time (bigger caches are slower)
•More places in the cache to put each block of memory – associativity
• fully-associative any block any line
• N-way set associated N places for each block direct map: N=1
![Page 52: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/52.jpg)
L25 Caches III (52)
Typical Scale
•L1 • size: tens of KB• hit time: complete in one clock cycle
• miss rates: 1-5%
•L2:• size: hundreds of KB• hit time: few clock cycles
• miss rates: 10-20%
•L2 miss rate is fraction of L1 misses that also miss in L2
• why so high?
![Page 53: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/53.jpg)
L25 Caches III (53)
Example: with L2 cache
•Assume • L1 Hit Time = 1 cycle
• L1 Miss rate = 5%
• L2 Hit Time = 5 cycles
• L2 Miss rate = 15% (% L1 misses that miss)
• L2 Miss Penalty = 200 cycles
•L1 miss penalty = 5 + 0.15 * 200 = 35
•Avg mem access time = 1 + 0.05 x 35= 2.75 cycles
![Page 54: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/54.jpg)
L25 Caches III (54)
Example: without L2 cache
•Assume • L1 Hit Time = 1 cycle
• L1 Miss rate = 5%
• L1 Miss Penalty = 200 cycles
•Avg mem access time = 1 + 0.05 x 200= 11 cycles
•4x faster with L2 cache! (2.75 vs. 11)
![Page 55: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/55.jpg)
L25 Caches III (55)
An actual CPU – Early PowerPC• Cache
• 32 KByte Instructions and 32 KByte Data L1 caches
• External L2 Cache interface with integrated controller and cache tags, supports up to 1 MByte external L2 cache
• Dual Memory Management Units (MMU) with Translation Lookaside Buffers (TLB)
• Pipelining• Superscalar (3
inst/cycle)• 6 execution units (2
integer and 1 double precision IEEE floating point)
![Page 56: Machine Structures Lecture 25 – Caches III](https://reader036.vdocuments.us/reader036/viewer/2022070401/56813742550346895d9ed623/html5/thumbnails/56.jpg)
L25 Caches III (56)
An Actual CPU – Pentium M
32KB I$
32KB D$