inside out of your computer...
TRANSCRIPT
![Page 1: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/1.jpg)
Inside out of your computer memories
Hung-Wei Tseng
![Page 2: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/2.jpg)
Announcement• Pick up your midterm• Homework #4 due next Monday• Reading quiz due tomorrow
2
![Page 3: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/3.jpg)
Outline• Memory Hierarchy
• The CPU-memory gap problems• Locality
• Cache organization• The structure of a cache• Hung-Wei’s secret formula of cache structures
3
![Page 4: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/4.jpg)
The memory gap problem
4
![Page 5: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/5.jpg)
Stored-program computerProcessor
PC
120007a30: 0f00bb27 ldah gp,15(t12) 120007a34: 509cbd23 lda gp,-25520(gp)120007a38: 00005d24 ldah t1,0(gp)120007a3c: 0000bd24 ldah t4,0(gp)120007a40: 2ca422a0 ldl t0,-23508(t1)120007a44: 130020e4 beq t0,120007a94120007a48: 00003d24 ldah t0,0(gp)120007a4c: 2ca4e2b3 stl zero,-23508(t1)120007a50: 0004ff47 clr v0120007a54: 28a4e5b3 stl zero,-23512(t4)120007a58: 20a421a4 ldq t0,-23520(t0)120007a5c: 0e0020e4 beq t0,120007a98120007a60: 0204e147 mov t0,t1120007a64: 0304ff47 clr t2120007a68: 0500e0c3 br 120007a80
instruction memory
5
![Page 6: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/6.jpg)
The memory space
6
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
![Page 7: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/7.jpg)
The memory space
6
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
![Page 8: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/8.jpg)
The memory space
6
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
![Page 9: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/9.jpg)
The memory space
6
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
![Page 10: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/10.jpg)
Why memory hierarchy?
CPU
main memory
lw $t2, 0($a0)add $t3, $t2, $a1addi $a0, $a0, 4subi $a1, $a1, 1bne $a1, LOOPlw $t2, 0($a0)add $t3, $t2, $a1
The access time of DDR3-1600 DRAM is around 50ns
100x to the cycle time of a 2GHz processor!SRAM is as fast as the processor, but $$$
7
![Page 11: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/11.jpg)
Memory’s impact
8
• Considering that you have a processor with base CPI (including instruction fetch) of 1. The latency of DRAM is 100 cycles. If the application contains 20% memory operations, what’s the slowdown comparing with a perfect processor with CPI=1? (Choose the closest one)A. 15%B. 35%C. 55%D. 75%E. 95%
average CPI = 1 + 0.2*100 = 21slowdown = 1/21 = 4.76% (95% performance drop)
![Page 12: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/12.jpg)
Islands (long-term memory)
9
Short-term Memory
The memory hierarchy in “inside out”
Core Memory
![Page 13: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/13.jpg)
Memory hierarchy
10
CPU
Main Memory
Secondary Storage
Fastest, Most Expensive
Biggest
Access time
< 1ns
50-60ns
10,000,000ns
$ < 1ns ~20 nsCache
![Page 14: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/14.jpg)
Memory hierarchy
10
CPU
Main Memory
Secondary Storage
Fastest, Most Expensive
Biggest
Access time
< 1ns
50-60ns
10,000,000ns
$ < 1ns ~20 nsCache
![Page 15: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/15.jpg)
The memory space
11
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
![Page 16: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/16.jpg)
The memory space
11
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
$
![Page 17: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/17.jpg)
The memory space
11
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
$block
![Page 18: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/18.jpg)
The memory space
11
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
$block
![Page 19: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/19.jpg)
The memory space
11
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
$block
![Page 20: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/20.jpg)
The memory space
11
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
$block
![Page 21: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/21.jpg)
The memory space
11
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
$block
![Page 22: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/22.jpg)
The memory space
11
0x0
0x2000
0x1000
0xFFF
0x1FFF
0x2FFF
0x8000
0x4000
0x3000
0x6000
0x5000
0x7000
0x3FFF
0x4FFF
0x5FFF
0x6FFF
0x7FFF
0x8FFF
Processor
PC
$block
![Page 23: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/23.jpg)
Why building memory hierarchy would help?
• How many of the following descriptions about memory hierarchy/caching is/are correct?
I. Existing programs can take advantage from memory hierarchy without any change.
II. Memory hierarchy can capture frequently used data/instructions in faster/more expensive memory.
III. Memory hierarchy can capture data/instructions that will be referenced in the near future in faster/more expensive memory.
IV. Memory hierarchy exists because we cannot build large, fast memories.
12
localitylocality
A. 0B. 1C. 2D. 3E. 4
![Page 24: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/24.jpg)
Locality
CPU
$
Main Memory
Secondary Storage
Fastest, Most Expensive
Biggest
• Temporal Locality• Referenced item tends to
be referenced again soon.• Spatial Locality
• Items close by referenced item tends to be referenced soon. • example: consecutive
instructions, arrays
13
• Let’s see how to build a memory hierarchy with “cache” that exploits “both” locality
![Page 25: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/25.jpg)
Where in our code has locality?• Which description about locality of arrays sum and A
in the following code is the most accurate?for(i = 0; i< 100000; i++){ sum[i%10] += A[i];}
A. Access of A has temporal locality, sum has spatial localityB. Both A and sum have temporal locality, and sum also has
spatial localityC. Access of A has spatial locality, sum has temporal localityD. Both A and sum have spatial localityE. Both A and sum have spatial locality, and sum also has
temporal locality14
spatial locality: A[0], A[1], A[2], A[3], ....sum[0], sum[1], ... , sum[9]temporal locality:reuse of sum[0], sum[1], ... , sum[9]
![Page 26: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/26.jpg)
Demo revisited• Why the left performs a lot better than the right one?
A. The left one has fewer instruction countsB. The left one exploits spatial locality betterC. The left one exploits temporal locality betterD. The left one exploits both spatial and temporal locality better
15
for(i = 0; i < ARRAY_SIZE; i++){ for(j = 0; j < ARRAY_SIZE; j++) { c[i][j] = a[i][j] + b[i][j]; }}
for(j = 0; j < ARRAY_SIZE; j++){ for(i = 0; i < ARRAY_SIZE; i++) { c[i][j] = a[i][j] + b[i][j]; }}
Array_size = 1024, 0.048s(5.25X faster)
Array_size = 1024, 0.252s
![Page 27: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/27.jpg)
Demo revisited
16
for(i = 0; i < ARRAY_SIZE; i++){ for(j = 0; j < ARRAY_SIZE; j++) { c[i][j] = a[i][j] + b[i][j]; }}
for(j = 0; j < ARRAY_SIZE; j++){ for(i = 0; i < ARRAY_SIZE; i++) { c[i][j] = a[i][j] + b[i][j]; }}
Array_size = 1024, 0.048s(5.25X faster)
Array_size = 1024, 0.252s
![Page 28: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/28.jpg)
Cache organization
17
![Page 29: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/29.jpg)
Cache• Like a cheat-sheet for the processor• For a cheat-sheet, you may need to put
• Most frequently asked concepts (temporal locality)• Problems, key points related to the frequently asked topics
(spatial locality)
18
![Page 30: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/30.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT
![Page 31: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/31.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT2. Amdahl’s lawAmdahl’s law ET_after = ET_affected/Speedup
+ ET_unaffected
![Page 32: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/32.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT2. Amdahl’s lawAmdahl’s law ET_after = ET_affected/Speedup
+ ET_unaffected3. MIPS
MIPS MIPS = IC/(ET*106)
![Page 33: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/33.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT2. Amdahl’s lawAmdahl’s law ET_after = ET_affected/Speedup
+ ET_unaffected3. MIPS
Power consumption P = aCV2f
4. Power consumptionMIPS MIPS = IC/(ET*106)
![Page 34: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/34.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT2. Amdahl’s lawAmdahl’s law ET_after = ET_affected/Speedup
+ ET_unaffected3. MIPS
Power consumption P = aCV2f
4. Power consumptionMIPS MIPS = IC/(ET*106)5. Performance equation
![Page 35: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/35.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT2. Amdahl’s lawAmdahl’s law ET_after = ET_affected/Speedup
+ ET_unaffected3. MIPS
Power consumption P = aCV2f
4. Power consumptionMIPS MIPS = IC/(ET*106)5. Performance equation
6. Amdahl’s law
![Page 36: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/36.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT2. Amdahl’s lawAmdahl’s law ET_after = ET_affected/Speedup
+ ET_unaffected3. MIPS
Power consumption P = aCV2f
4. Power consumptionMIPS MIPS = IC/(ET*106)5. Performance equation
6. Amdahl’s law7. MFLOPS
![Page 37: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/37.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT2. Amdahl’s lawAmdahl’s law ET_after = ET_affected/Speedup
+ ET_unaffected3. MIPS
Power consumption P = aCV2f
4. Power consumption5. Performance equation6. Amdahl’s law7. MFLOPS
MFLOPS MIPS = No_FP_Ops/(ET*106)
![Page 38: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/38.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT2. Amdahl’s lawAmdahl’s law ET_after = ET_affected/Speedup
+ ET_unaffected3. MIPS
Power consumption P = aCV2f
4. Power consumption5. Performance equation6. Amdahl’s law7. MFLOPS
MFLOPS MIPS = No_FP_Ops/(ET*106)
Cacheline/block: data with the same prefix in their addresses
![Page 39: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/39.jpg)
How do you make a cheatsheet?• Go through your homework• Write down the topic and content• If running out of space: kick out the least recently
used content
19
1. Performance equationPerformance equation ET=IC*CPI*CT2. Amdahl’s lawAmdahl’s law ET_after = ET_affected/Speedup
+ ET_unaffected3. MIPS
Power consumption P = aCV2f
4. Power consumption5. Performance equation6. Amdahl’s law7. MFLOPS
MFLOPS MIPS = No_FP_Ops/(ET*106)
Cacheline/block: data with the same prefix in their addresses
Tag: the address prefix of data in the cacheline/block
![Page 40: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/40.jpg)
Let’s make memory great again!• Spatial locality
• Each hash entry contains a block of data • We bring a “block” of data each time• Cache blocks are a power of 2 in size. • Usually between 16B-128Bs• Tag: help us identify what’s in the block
• Temporal locality• LRU-like polices keeps the most frequently used data
20
![Page 41: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/41.jpg)
A simple cache: a block can go anywhere
21
tag data1. 0x4 0b000001000b0000 0b00000000 - 0b000011112. 0x48 0b01001000
0b0100 0b01000000 - 0b010011113. 0xC4 0b11000100
0b1111 0b11110000 - 0b11111111
4. 0xFC 0b11111100
0b1100 0b11000000 - 0b11001111
• Assume each block contains 16B data• A total of 4 blocks• LRU
![Page 42: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/42.jpg)
A simple cache: a block can go anywhere
21
tag data1. 0x4 0b000001000b0000 0b00000000 - 0b000011112. 0x48 0b01001000
0b0100 0b01000000 - 0b010011113. 0xC4 0b11000100
0b1111 0b11110000 - 0b11111111
4. 0xFC 0b11111100
0b1100 0b11000000 - 0b110011115. 0x12 0b00001100
• Assume each block contains 16B data• A total of 4 blocks• LRU
![Page 43: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/43.jpg)
A simple cache: a block can go anywhere
21
tag data1. 0x4 0b000001000b0000 0b00000000 - 0b000011112. 0x48 0b01001000
0b0100 0b01000000 - 0b010011113. 0xC4 0b11000100
0b1111 0b11110000 - 0b11111111
4. 0xFC 0b11111100
0b1100 0b11000000 - 0b110011115. 0x12 0b000011006. 0x44 0b01000100
• Assume each block contains 16B data• A total of 4 blocks• LRU
![Page 44: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/44.jpg)
A simple cache: a block can go anywhere
21
tag data1. 0x4 0b000001000b0000 0b00000000 - 0b000011112. 0x48 0b01001000
0b0100 0b01000000 - 0b010011113. 0xC4 0b11000100
0b1111 0b11110000 - 0b11111111
4. 0xFC 0b11111100
0b1100 0b11000000 - 0b110011115. 0x12 0b000011006. 0x44 0b010001007. 0x68 0b01100100
• Assume each block contains 16B data• A total of 4 blocks• LRU
![Page 45: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/45.jpg)
A simple cache: a block can go anywhere
21
tag data1. 0x4 0b000001000b0000 0b00000000 - 0b000011112. 0x48 0b01001000
0b0100 0b01000000 - 0b010011113. 0xC4 0b11000100
0b1111 0b11110000 - 0b11111111
4. 0xFC 0b111111005. 0x12 0b000011006. 0x44 0b010001007. 0x68 0b01100100
0b0110 0b01100000 - 0b01101111
• Assume each block contains 16B data• A total of 4 blocks• LRU
![Page 46: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/46.jpg)
A simple cache: a block can go anywhere
21
tag data1. 0x4 0b000001000b0000 0b00000000 - 0b000011112. 0x48 0b01001000
0b0100 0b01000000 - 0b010011113. 0xC4 0b11000100
0b1111 0b11110000 - 0b11111111
4. 0xFC 0b111111005. 0x12 0b000011006. 0x44 0b010001007. 0x68 0b01100100
0b0110 0b01100000 - 0b01101111
• Assume each block contains 16B data• A total of 4 blocks• LRU
• Too slow if the number of entries/blocks/cachelines is huge
![Page 47: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/47.jpg)
Let’s make memory great again!• Spatial locality
• Each hash entry contains a block of data • We bring a “block” of data each time• Cache blocks are a power of 2 in size. • Usually between 16B-128Bs• Tag: help us identify what’s in the block
• Temporal locality• LRU-like polices keeps the most frequently used data
• Performance needs to be better than linear search• Make cache a hardware hash table!• The hash function takes memory addresses as inputs
22
![Page 48: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/48.jpg)
valid tag datadirty
valid tag datadirty
The structure of a cache
23
1000 0000 0000 0000 00001000 0001 0000 1000 0000 11 10
![Page 49: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/49.jpg)
valid tag datadirty
valid tag datadirty
The structure of a cache
23
1000 0000 0000 0000 00001000 0001 0000 1000 0000 11
Block / Cacheline: The basic unit of data storage in cache. Contains all data with the same tag/prefix and index in their memory addresses
10
![Page 50: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/50.jpg)
valid tag datadirty
valid tag datadirty
The structure of a cache
23
1000 0000 0000 0000 00001000 0001 0000 1000 0000 11
Block / Cacheline: The basic unit of data storage in cache. Contains all data with the same tag/prefix and index in their memory addressesTag:
the high order address bits stored along with the data in a block to identify the actual address of the cache line.
10
![Page 51: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/51.jpg)
valid tag datadirty
valid tag datadirty
The structure of a cache
23
Set: cache blocks/lines sharing the same index.A cache is called N-way set associative cache if N blocks share the same set/index (this one is a 2-way set cache)
1000 0000 0000 0000 00001000 0001 0000 1000 0000 11
Block / Cacheline: The basic unit of data storage in cache. Contains all data with the same tag/prefix and index in their memory addressesTag:
the high order address bits stored along with the data in a block to identify the actual address of the cache line.
10
![Page 52: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/52.jpg)
valid tag datadirty
valid tag datadirty
The structure of a cache
23
Set: cache blocks/lines sharing the same index.A cache is called N-way set associative cache if N blocks share the same set/index (this one is a 2-way set cache)
1000 0000 0000 0000 00001000 0001 0000 1000 0000 11
Block / Cacheline: The basic unit of data storage in cache. Contains all data with the same tag/prefix and index in their memory addressesTag:
the high order address bits stored along with the data in a block to identify the actual address of the cache line.
10
valid: if the data is meaningfuldirty: if the block is modified
![Page 53: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/53.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
24
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
![Page 54: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/54.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
24
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 55: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/55.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
24
1000 0000 0000 0000 0000 0001 0101 1000memory address:
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 56: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/56.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
24
tag offsetindex1000 0000 0000 0000 0000 0001 0101 1000memory address:
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 57: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/57.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
24
tag offsetindex1000 0000 0000 0000 0000 0001 0101 1000memory address:
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 58: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/58.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
24
tag offsetindex1000 0000 0000 0000 0000 0001 0101 1000memory address:
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 59: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/59.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
=?=?
24
tag offsetindex1000 0000 0000 0000 0000 0001 0101 1000memory address:
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 60: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/60.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
=?=?
24
tag offsetindex1000 0000 0000 0000 0000 0001 0101 1000memory address:
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 61: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/61.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
hit? miss?hit? miss?
=?=?
24
tag offsetindex1000 0000 0000 0000 0000 0001 0101 1000memory address:
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 62: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/62.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
hit? miss?hit? miss?
=?=?
24
tag offsetindex1000 0000 0000 0000 0000 0001 0101 1000memory address:
Hit: The data was found in the cacheMiss: The data was not found in the cache
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 63: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/63.jpg)
valid tag datadirty
valid tag datadirty
Accessing the cache
hit? miss?hit? miss?
=?=?
24
tag offsetindex1000 0000 0000 0000 0000 0001 0101 1000memory address:
Offset:The position of the requesting word in a cache block
Hit: The data was found in the cacheMiss: The data was not found in the cache
1000 0001 0000 1000 00001 0 1000 0000 0000 0000 00001 1
memory address: 0x8 0 0 0 0 1 5 8
![Page 64: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/64.jpg)
How many bits in each field?
tag index offset
hit?
block / cacheline
hit?
=?=?
25
valid tag datadirty
valid tag datadirty
![Page 65: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/65.jpg)
How many bits in each field?
tag index offset
hit?
block / cacheline
hit?
=?=?
25
lg(block size)
valid tag datadirty
valid tag datadirty
![Page 66: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/66.jpg)
How many bits in each field?
tag index offset
hit?
block / cacheline
hit?
=?=?
25
lg(block size)lg(number of sets)
valid tag datadirty
valid tag datadirty
![Page 67: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/67.jpg)
C = ABS• C: Capacity in data arrays• A: Way-Associativity
• N-way: N blocks in a set, A = N• 1 for direct-mapped cache
• B: Block Size (Cacheline)• How many bytes in a block
• S: Number of Sets:• A set contains blocks sharing the same index• 1 for fully associate cache
26
![Page 68: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/68.jpg)
Corollary of C = ABS
• offset bits: lg(B)• index bits: lg(S)• tag bits: address_length - lg(S) - lg(B)
• address_length is 32 bits for 32-bit machine
• (address / block_size) % S = set index
tag index offset
27
![Page 69: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/69.jpg)
AMD Phenom II• L1 data (D-L1) cache configuration of AMD Phenom II
• Size 64KB, 2-way set associativity, 64B block• Assume 64-bit memory address
Which of the following is correct? A. Tag is 49 bitsB. Index is 8 bitsC. Offset is 7 bitsD. The cache has 1024 setsE. None of the above
C = ABS
28
64KB = 2 * 64 * SS = 512
offset = lg(64) = 6 bitsindex = lg(512) = 9 bits
tag = 64 - lg(512) - lg(64) = 49 bits
![Page 70: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/70.jpg)
Core i7• L1 data (D-L1) cache configuration of Core i7
• Size 32KB, 8-way set associativity, 64B block• Assume 64-bit memory address• Which of the following is NOT correct?A. Tag is 52 bitsB. Index is 6 bitsC. Offset is 6 bitsD. The cache has 128 sets
C = ABS32KB = 8 * 64 * S
S = 64offset = lg(64) = 6 bitsindex = lg(64) = 6 bits
tag = 64 - lg(64) - lg(64) = 52 bits29
![Page 71: Inside out of your computer memoriescseweb.ucsd.edu/classes/su16/cse141-a/slides/Cache_20160719.pdf · Outline • Memory Hierarchy • The CPU-memory gap problems • Locality •](https://reader035.vdocuments.us/reader035/viewer/2022062505/5bb4f38509d3f2b4158b53c6/html5/thumbnails/71.jpg)
Array of structures or structure of arrays
30
Array of objects object of arraysstruct grades{ int id; double *homework; double average;};
struct grades{ int *id; double **homework; double *average;};
average of each student
average of each
homework