![Page 1: Yuejian Xie, Gabriel H. Loh. Core0 IL1 DL1 Core1 IL1 DL1 Last Level Cache (LLC) Core1s Data 2 Core0s Data](https://reader035.vdocuments.us/reader035/viewer/2022062619/55171b4d550346f5558b578b/html5/thumbnails/1.jpg)
PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches
Yuejian Xie, Gabriel H. Loh
Last Level Cache In Multi-Core
Core0: IL1, DL1
Core1: IL1, DL1
Last Level Cache (LLC): Core1’s Data, Core0’s Data
Previous Work and Motivation
• Capacity Management
– Considering each core's different cache space needs, allocate the proper amount of space to each core.
– Guo-MICRO07, Kim-PACT04, Srikantaiah-ASPLOS09, Qureshi-MICRO06 (UCP), …
• Dead Time Management
– Evict dead lines (blocks with no reuse) sooner.
– Kaxiras-ISCA01, Qureshi-ISCA07, Jaleel-PACT07 (TADIP), …
PIPP: Do both CAPACITY and DEAD TIME management better AT THE SAME TIME!
UCP Technique
Core1 Core0
Core 0 gets 5 ways
Core 1 gets 3 ways
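A minimal sketch of how a strict way partition such as the 5/3 split above could be enforced on a miss. It assumes the commonly described UCP replacement rule: a core that is under its quota takes the LRU block belonging to another core, otherwise it replaces its own LRU block. The names `choose_victim`, `stack`, and `quota` are hypothetical, and the set is assumed to be full.

```python
def choose_victim(stack, quota, core):
    """Return the index of the block to evict for a miss from `core`.

    stack: list of (owner_core, tag), ordered from LRU (index 0) to MRU.
    quota: dict mapping core id -> number of ways allotted to that core.
    Assumes the set is full, so a candidate always exists.
    """
    owned = sum(1 for owner, _ in stack if owner == core)
    if owned < quota[core]:
        # Under quota: take the least recently used block of another core.
        candidates = [i for i, (owner, _) in enumerate(stack) if owner != core]
    else:
        # At or over quota: replace this core's own least recently used block.
        candidates = [i for i, (owner, _) in enumerate(stack) if owner == core]
    return candidates[0]  # lowest index = closest to LRU


# 8-way set: Core 0 is allotted 5 ways, Core 1 is allotted 3 ways.
quota = {0: 5, 1: 3}
stack = [(1, "x"), (0, "a"), (0, "b"), (0, "c"),
         (1, "y"), (0, "d"), (0, "e"), (0, "f")]
# Core 1 holds only 2 blocks, so its miss evicts Core 0's LRU block.
print(stack[choose_victim(stack, quota, core=1)])
```

Under this rule each core's occupancy converges to its quota and stays there, which is what makes the partition strict.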
TADIP Technique
MRU LRU
Incoming Block
Occupies one cache block for a long time with no benefit!
TADIP Technique
MRU LRU
Useless Block Evicted at next eviction
Useful Block Moved to MRU position
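The behavior on this slide can be sketched as a recency stack in which new blocks enter at the LRU position and a hit moves the block straight to MRU. This is a simplified illustration only: it models just the LRU-insertion mode shown here and omits TADIP's adaptive choice of insertion policy. The class name `LruInsertSet` is hypothetical.

```python
class LruInsertSet:
    """One cache set: new blocks enter at LRU, hits jump straight to MRU."""

    def __init__(self, ways):
        self.ways = ways
        self.stack = []  # index 0 = LRU, last index = MRU

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.stack:
            # Useful block: move it to the MRU position.
            self.stack.remove(tag)
            self.stack.append(tag)
            return True
        if len(self.stack) == self.ways:
            # Evict the block currently in the LRU position.
            self.stack.pop(0)
        # The incoming block is inserted at the LRU position.
        self.stack.insert(0, tag)
        return False


s = LruInsertSet(ways=4)
for tag in ["a", "b", "c", "d"]:
    s.access(tag)
s.access("e")   # miss: "d" sat at LRU without reuse and is evicted
s.access("f")   # miss: "e" was never reused, so it is evicted next
s.access("f")   # hit: "f" proved useful and moves to MRU
print(s.stack)
```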
PIPP: Novel scheme for Promotion and Insertion
Break “Replacement” Into Three Pieces
• Eviction
– When replacing a block in a set, which should be evicted?
• Insertion
– For new blocks, where to insert the new block?
• Promotion
– When there is a hit in the cache, how to adjust the block’s position/priority?
Our Scheme: PIPP
• What’s PIPP?
– Promotion/Insertion Pseudo-Partitioning
– Achieves both capacity and dead-time management.
• Eviction
– The LRU block is the victim.
• Insertion
– The new block is inserted the core’s quota worth of blocks away from LRU.
• Promotion
– On a hit, move toward MRU by only one position.
MRU LRU
To Evict
Promote
Hit
Insert Position = 3 (Target Allocation)
New
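The three rules on this slide can be sketched as a single recency stack per set. This is an illustrative reading, not the authors' implementation: it assumes positions are counted from LRU with LRU as position 1, so a core with quota 3 inserts with two blocks below it. The class name `PippSet` and its fields are hypothetical.

```python
class PippSet:
    """One cache set managed with PIPP-style eviction, insertion, promotion."""

    def __init__(self, ways, quota):
        self.ways = ways
        self.quota = quota   # core id -> target allocation in ways
        self.stack = []      # index 0 = LRU, last index = MRU

    def access(self, core, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.stack:
            i = self.stack.index(tag)
            if i < len(self.stack) - 1:
                # Promotion: swap with the neighbour one step closer to MRU.
                self.stack[i], self.stack[i + 1] = self.stack[i + 1], self.stack[i]
            return True
        if len(self.stack) == self.ways:
            # Eviction: the LRU block is the victim.
            self.stack.pop(0)
        # Insertion: the quota-th position counted from LRU (LRU = position 1).
        pos = min(self.quota[core] - 1, len(self.stack))
        self.stack.insert(pos, tag)
        return False


# 8-way set with target allocation Core0 = 5, Core1 = 3.
s = PippSet(ways=8, quota={0: 5, 1: 3})
s.stack = list("CB5432A1")   # hypothetical starting contents, LRU first
s.access(1, "D")             # Core1 miss: evict "C", insert "D" at position 3
print(s.stack)
```

Because a core with a small quota inserts close to LRU, its blocks are evicted quickly unless they are reused, while a core with a large quota gets more time for its blocks to be promoted.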
PIPP Example
Core0 quota: 5 blocks
Core1 quota: 3 blocks
1 A 2 3 4 5 B C
Core0’s Block
Core1’s Block
Request
MRU
LRU
Core1’s quota=3
D
PIPP Example
Core0 quota: 5 blocks
Core1 quota: 3 blocks
1 A 2 53 4 D B
Core0’s Block
Core1’s Block
Request
MRU
LRU
6
Core0’s quota=5
PIPP Example
Core0 quota: 5 blocks
Core1 quota: 3 blocks
1 A 2 6 3 4 D B
Core0’s Block
Core1’s Block
Request
MRU
LRU
Core0’s quota=5
7
PIPP Example
Core0 quota: 5 blocks
Core1 quota: 3 blocks
1 A 2 6 3 4 D
Core0’s Block
Core1’s Block
Request
MRU
LRU
D
7
PIPP Example
Core0 quota: 5 blocks
Core1 quota: 3 blocks
1 A 2 7 6 4
Core0’s Block
Core1’s Block
Request
MRU
LRU
Core1’s quota=3
D3
E
PIPP Example
Core0 quota: 5 blocks
Core1 quota: 3 blocks
1 A 2 7 6 D
Core0’s Block
Core1’s Block
Request
MRU
LRU
3E
2
How PIPP Does Both Managements
|       | Core0 | Core1 | Core2 | Core3 |
|-------|-------|-------|-------|-------|
| Quota | 6     | 4     | 4     | 2     |

MRU LRU
Insert closer to LRU position
Pseudo-Partition Benefit
Strict Partition
Core0 quota: 5 blocks
Core1 quota: 3 blocks
Core0’s Block
Core1’s Block
Request
MRU0 LRU0, MRU1 LRU1
New
Pseudo-Partition Benefit
MRU
LRU
Core0 quota: 5 blocks
Core1 quota: 3 blocks
Core0’s Block
Core1’s Block
Request
New
Pseudo Partition
Single Reuse Block
Directly to MRU (TADIP)
MRU LRU
New
Promote By One (PIPP)
MRU LRU
New
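A small sketch of the contrast on this slide: where a block ends up after a single reuse under each promotion rule. The function `promote` and the 8-block stack are hypothetical, and the block is placed at the LRU position in both cases purely to make the comparison easy to read.

```python
def promote(stack, tag, to_mru):
    """Return a new LRU-to-MRU ordered list after a hit on `tag`."""
    i = stack.index(tag)
    new = list(stack)
    if to_mru:
        # Directly to MRU: the block jumps to the top of the stack.
        new.append(new.pop(i))
    elif i < len(new) - 1:
        # Promote by one: the block swaps with its neighbour toward MRU.
        new[i], new[i + 1] = new[i + 1], new[i]
    return new


stack = ["new", "a", "b", "c", "d", "e", "f", "g"]  # index 0 = LRU
after_mru = promote(stack, "new", to_mru=True)
after_one = promote(stack, "new", to_mru=False)
print(after_mru.index("new"))  # prints 7: now the farthest block from eviction
print(after_one.index("new"))  # prints 1: still close to the LRU position
```

If the block is never touched again, the first policy lets it hold a way far longer than the second.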
Algorithm Comparison
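| Algorithm | Capacity Management | Dead-time Management | Note |
|-----------|---------------------|----------------------|------|
| LRU | No | No | Baseline, no explicit management |
| UCP | Yes | No | Strict partitioning |
| TADIP | No | Yes | Insert at LRU and promote to MRU on hit |
| PIPP | Yes | Yes | Pseudo-partitioning and incremental promotion |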
Evaluation Methodology
• Simulation environment
– SimpleScalar-Zesto, Out-Of-Order, Intel Core2-like
– 32KB, 8-way DL1 and IL1; 4MB, 16-way LLC; 1.6GHz DDR2
• Workloads Classification
– “UCP2-5”
  • UCP-friendly, 2-core, 5th workload
– “DIP4-3”
  • TADIP-friendly, 4-core, 3rd workload
$$\text{Weighted Speedup} = \sum_i \frac{IPC[i]}{IPC_{\text{stand alone}}[i]}$$
TADIP Friendly, UCP Friendly
Dual-Core Weighted Speedup
PIPP outperforms LRU by 19.0%, UCP by 10.6%, and TADIP by 10.1%.
PIPP is too cautious here.
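The weighted speedup metric used on this slide sums, over all cores, each program's IPC when sharing the cache divided by its IPC when running stand-alone. A minimal sketch follows; the function name and the IPC values are made up for illustration and are not measurements from the evaluation.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-core IPC ratios: shared-cache IPC over stand-alone IPC."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one stand-alone IPC per core")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


# Hypothetical dual-core workload.
ipc_shared = [0.5, 1.5]   # IPC of each program when both share the LLC
ipc_alone = [1.0, 2.0]    # IPC of each program running by itself
print(weighted_speedup(ipc_shared, ipc_alone))  # prints 1.25
```

For an N-core workload the value is at most N when no program runs faster shared than alone, so a dual-core result of 1.25 means the pair achieves 1.25 programs' worth of stand-alone progress.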
TADIP Friendly, UCP Friendly
Quad-Core Weighted Speedup
PIPP outperforms LRU by 21.9%, UCP by 12.1%, and TADIP by 17.5%.
PIPP Behavior Analysis
Occupancy Control
Insertion Behavior: TADIP inserts no-reuse lines at position 1.7, while PIPP inserts them at position 1.3 (the LRU position is 0).
Pseudo-Partition Benefit
Conclusion
• Novel proposal on Insertion and Promotion
• A single unified mechanism provides both capacity and dead time management
• Outperforms prior UCP and TADIP
• In the full paper:
– Special version of PIPP for streaming applications
– Reducing hardware overhead
– Sensitivity analysis
BACKUP SLIDES
Hardware Cost
Total IPC Throughput
Fair Speedup
Occupancy Control
E.g., Target Partition {5,3} – Actual Occupancy {6,2} = 1
Stealing Benefit
Streaming-Sensitive PIPP
• Streaming Application Detection
– #Accesses, #Misses, MissRate > threshold
• Insertion
– At a fixed position (independent of quota)
– #Streaming Apps blocks away from LRU position
• Promotion
– Promote by 1 with probability p_stream
– p_stream ≪ 1
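The streaming variant above might be sketched as follows. This is an illustration under stated assumptions, not the design from the full paper: detection here uses only the miss rate, the threshold `0.9` and the probability `0.01` are placeholder values, positions are counted from LRU with LRU as position 1, and all function names are hypothetical.

```python
import random


def is_streaming(accesses, misses, miss_rate_threshold=0.9):
    """Flag an application whose miss rate exceeds the (assumed) threshold."""
    if accesses == 0:
        return False
    return misses / accesses > miss_rate_threshold


def insert_index(core, quota, streaming_cores):
    """0-based insertion index counted from LRU (index 0 = LRU)."""
    if core in streaming_cores:
        # Fixed position, independent of quota: the number of streaming
        # applications, counted from LRU with LRU as position 1.
        return len(streaming_cores) - 1
    return quota[core] - 1


def promote_on_hit(stack, tag, p, rng):
    """Promote `tag` by one position with probability p; return True if moved."""
    i = stack.index(tag)
    if i < len(stack) - 1 and rng.random() < p:
        stack[i], stack[i + 1] = stack[i + 1], stack[i]
        return True
    return False


quota = {0: 6, 1: 4, 2: 4, 3: 2}
streaming = {c for c, (acc, miss) in
             {0: (1000, 100), 1: (1000, 200), 2: (1000, 950), 3: (1000, 990)}.items()
             if is_streaming(acc, miss)}
print(sorted(streaming), insert_index(2, quota, streaming))

rng = random.Random(0)
stack = ["s", "a", "b", "c"]               # index 0 = LRU
promote_on_hit(stack, "s", p=0.01, rng=rng)  # rarely moves a streaming block
```

Keeping the promotion probability small means a streaming application's blocks rarely climb away from the LRU end, so they are evicted soon after insertion.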
Importance of Components
Sensitivity of Promotion Prob
Promotion Prob for General App
Promotion Prob for Streaming App
In-Cache UMON
In-Cache UMON Performance