pipp: promotion/insertion pseudo-partitioning of multi-core shared caches yuejian xie, gabriel h....
PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches
Yuejian Xie, Gabriel H. Loh
Georgia Institute of Technology
Presented by: Yingying Tian
36th ACM/IEEE International Symposium on Computer Architecture (ISCA ’09)
Manage Shared Caches
Last Level Caches (LLCs) are shared by all cores in Chip Multi-Processors (CMPs).
Multiple cores compete for the limited LLC capacity.
[Figure: Core0 and Core1, each with its own L1I and L1D caches, share the Last Level Cache (LLC), which holds both Core0's data and Core1's data.]
As a sharing-oblivious cache management policy, LRU leads to poor performance and fairness.
Previous works tried to allocate LLC resources fairly via:
Capacity management: way-partitioning (UCP)
Dead-time management: LRU insertion (TADIP)
PIPP: Do both capacity and dead-time management better at the same time!
Outline
Background and Motivation
Previous Work
PIPP
Evaluation
Conclusion
UCP (Utility-based Cache Partitioning)
[Figure: the ways of a cache set divided between Core0 and Core1]
Core 0 gets 5 ways
Core 1 gets 3 ways
*Some materials are taken from original presentation slides.
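The slide only states the outcome of the partitioning (5 ways for Core 0, 3 ways for Core 1). As a rough illustration of how a utility-driven split like this could arise, here is a minimal Python sketch that hands out ways greedily by marginal hit gain. The function name `greedy_partition` and the hit curves are made up for illustration, and the greedy loop is a simplified stand-in, not UCP's own partitioning algorithm.

```python
def greedy_partition(hit_curves, ways):
    """Hand out ways one at a time to the core with the largest marginal gain.

    hit_curves[c][k] is the (hypothetical) number of hits core c would get
    with k ways. This is a simplified greedy stand-in, not UCP's own algorithm.
    """
    alloc = [0] * len(hit_curves)
    for _ in range(ways):
        # Extra hits each core would gain from one more way.
        gains = [curve[a + 1] - curve[a] for curve, a in zip(hit_curves, alloc)]
        winner = gains.index(max(gains))
        alloc[winner] += 1
    return alloc


# Made-up hit curves for an 8-way set (index = number of ways, 0..8).
core0_hits = [0, 30, 55, 75, 90, 100, 104, 106, 107]
core1_hits = [0, 40, 58, 67, 70, 72, 73, 73, 73]

print(greedy_partition([core0_hits, core1_hits], ways=8))  # [5, 3]
```

With these invented curves, Core 0 keeps benefiting from extra ways for longer than Core 1, so it ends up with the larger share.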
DIP (Dynamic Insertion Policy)
[Figure: recency stack from MRU to LRU with an incoming block]
[Figure: recency stack from MRU to LRU]
Occupies one cache block for a long time with no benefit!
[Figure: recency stack from MRU to LRU]
Useless block: evicted at the next eviction.
Useful block: moved to the MRU position.
Cache Replacement Policy
Eviction: Which block should be replaced when a cache miss occurs?
    LRU block
Insertion: For an incoming block, where should it be inserted in the corresponding set?
    MRU insertion (default LRU replacement policy)
    LRU insertion (dead-on-arrival blocks)
Promotion: If a block is re-referenced, how should its position be adjusted?
    Move to MRU position
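The three decisions above (eviction, insertion, promotion) can be sketched for a single cache set in a few lines of Python. This is a minimal illustration, not a cache simulator: the function name `simulate`, the 4-way set, and the access trace (three reused blocks mixed with a stream of blocks that are never touched again) are all assumptions chosen to show why LRU insertion helps with dead-on-arrival blocks.

```python
def simulate(trace, ways, insert_at_mru=True):
    """Count hits for one set; stack[0] is MRU, stack[-1] is LRU."""
    stack, hits = [], 0
    for block in trace:
        if block in stack:
            hits += 1
            stack.remove(block)       # Promotion: re-referenced block moves to MRU
            stack.insert(0, block)
            continue
        if len(stack) == ways:
            stack.pop()               # Eviction: the LRU block is the victim
        if insert_at_mru:
            stack.insert(0, block)    # MRU insertion (plain LRU replacement)
        else:
            stack.append(block)       # LRU insertion
    return hits


# Hypothetical trace: blocks a, b, c are reused every round,
# while x0, x1, ... are streamed through once and never reused.
hot = ["a", "b", "c"]
trace = []
for i in range(10):
    trace += hot + [f"x{2 * i}", f"x{2 * i + 1}"]

print(simulate(trace, ways=4, insert_at_mru=True))   # 0
print(simulate(trace, ways=4, insert_at_mru=False))  # 27
```

With MRU insertion the streaming blocks keep pushing the reused blocks out before they are touched again; with LRU insertion each streaming block is replaced by the next one and the reused blocks stay resident.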
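PIPP: Promotion/Insertion Pseudo-Partitioning
Insertion: Target partitioning $\Pi = \{\pi_1, \pi_2, \ldots, \pi_n\}$ with $\sum_{i=1}^{n} \pi_i = w$, where $w$ is the associativity of the cache. On insertion, core $i$ inserts its incoming block at position $\pi_i$, counting from the LRU end. (The partitioning is computed dynamically via UCP monitors or other ways.)
Promotion: On a hit, the block moves one step toward the MRU position with probability $P$ and stays unchanged with probability $1 - P$.
[Figure: recency stack from MRU to LRU; a hit promotes a block, a new block is inserted at position 3 (the target allocation), and the block at the LRU end is the one to evict]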
PIPP Example
Core0 quota: 5 blocks
Core1 quota: 3 blocks
Legend: numbered blocks are Core0's, lettered blocks are Core1's. Each row shows the set (MRU on the left, LRU on the right) at the moment the request arrives.
Request D (Core1's quota = 3): 1 A 2 3 4 B 5 C
Request 6 (Core0's quota = 5): 1 A 2 3 4 D B 5
Request 7 (Core0's quota = 5): 1 A 2 6 3 4 D B
Request D: 1 A 2 7 6 3 4 D
Request E (Core1's quota = 3): 1 A 2 7 6 3 D 4
Request 2: 1 A 2 7 6 E 3 D
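The example can be replayed with a short Python sketch of one PIPP-managed set. Several things here are assumptions rather than details from the slides: the helper names `pipp_access` and `replay`, the starting order `1 A 2 3 4 B 5 C` (one reading of the first figure), the convention that numbered blocks belong to Core0 and lettered blocks to Core1, and a promotion probability of 1 so that the run is deterministic.

```python
import random


def pipp_access(stack, block, core, quotas, ways, p_prom=1.0, rng=random):
    """Apply one access to a set ordered from MRU (index 0) to LRU (last)."""
    if block in stack:
        i = stack.index(block)
        # Promotion: one step toward MRU with probability p_prom.
        if i > 0 and rng.random() < p_prom:
            stack[i - 1], stack[i] = stack[i], stack[i - 1]
        return "hit"
    if len(stack) >= ways:
        stack.pop()  # Eviction: the block in the LRU position is the victim
    # Insertion: the new block ends up quotas[core] positions from the LRU end.
    index = max(0, len(stack) + 1 - quotas[core])
    stack.insert(index, block)
    return "miss"


def replay(initial, requests, quotas, ways=8):
    """Replay single-character requests; digits are assumed to be Core0's blocks."""
    stack = list(initial)
    for block in requests:
        core = 0 if block.isdigit() else 1
        result = pipp_access(stack, block, core, quotas, ways)
        print(f"request {block} ({result}): {' '.join(stack)}")
    return "".join(stack)


# Assumed initial contents and the request order from the example slides.
final = replay("1A234B5C", "D67DE2", quotas={0: 5, 1: 3})
print(final)  # 12A76E3D
```

Passing a smaller `p_prom` makes promotion probabilistic, as in the PIPP definition; the value 1.0 is used here only to keep the trace repeatable.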
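Pseudo-Partition Benefit
Core0 quota: 5 blocks
Core1 quota: 3 blocks
Strict Partition
[Figure: each core has its own recency stack (MRU0 to LRU0 for Core0, MRU1 to LRU1 for Core1); a request for a new block arrives; legend: Core0's Block, Core1's Block]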
Pseudo Partition
[Figure: a single recency stack from MRU to LRU holding both cores' blocks; a request for a new block arrives; legend: Core0's Block, Core1's Block]
Methodology
SimpleScalar simulator for x86 Intel Core 2 processor
32KB, 8-way, 3-cycle L1I and L1D for each core
A shared 4MB, 16-way, 11-cycle LLC
Multi-programmed workloads from SPEC CPU benchmarks (2-core and 4-core workloads)
500M instructions of warmup, 250M instructions of simulation
Evaluation
2-Core Weighted Speedup
[Figure: weighted speedup chart for the 2-core workloads, with regions labeled TADIP Friendly and UCP Friendly]
PIPP outperforms LRU by 19.0%, UCP by 10.6%, TADIP by 10.1%
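The slides report weighted speedup without spelling out the formula. Assuming the commonly used definition (each program's IPC when sharing the cache divided by its IPC when running alone, summed over cores), a minimal Python sketch looks like this. The function name and the IPC values are invented for illustration and are not results from the evaluation.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum over cores of IPC while sharing the LLC divided by IPC when alone.

    Assumed definition; the inputs below are made-up numbers, not measured data.
    """
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


# Hypothetical 2-core workload: each program runs at half its stand-alone IPC.
print(weighted_speedup([1.0, 0.5], [2.0, 1.0]))  # 1.0
```

Under this definition, an n-core workload with no interference at all would score n, so higher values mean the programs lose less performance to sharing.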
4-Core Weighted Speedup
[Figure: weighted speedup chart for the 4-core workloads, with regions labeled TADIP Friendly and UCP Friendly]
PIPP outperforms LRU by 21.9%, UCP by 12.1%, TADIP by 17.5%
Occupancy Control
For most workloads, the partitioning deviation is within 1.0 of the target allocation, similar to UCP.
Conclusion
Novel proposal on Insertion and Promotion
A single unified mechanism provides both capacity and dead time management
Outperforms prior UCP and TADIP
Thank you!
Questions?