Paging for Multi-Core Shared Caches
Alejandro López-Ortiz, Alejandro Salinger
ITCS, January 8th, 2012
Multi-Core Challenges
• Access to data is a key factor
• Cache efficiency is a determining factor for:
  – Algorithms
  – Schedulers
  – Paging strategies
• Extensively studied for the sequential case
• Almost no previous theory for the multi-core case
Sequential Paging
• Slow memory, cache of size K
• Page request sequence: … p6 p3 p2 p4 p4 p2 p10 p11 p5 p4 …
• Is pi in the cache?
  – Yes: do nothing (hit)
  – No: fetch pi from slow memory and evict one page from the cache (fault)
• Goal: minimize the number of faults
Sequential Paging
Common eviction policies:
  – Least-Recently-Used (LRU)
  – First-In-First-Out (FIFO)
  – Flush-When-Full (FWF)
  – Furthest-In-The-Future (FITF) (offline)
• An online algorithm A is c-competitive if for all request sequences R, A(R) ≤ c · OPT(R) + β for some constant β
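As a concrete illustration of the sequential model, here is a minimal LRU fault counter (a sketch; the function name and example sequence are illustrative, not from the talk):

```python
from collections import OrderedDict

def lru_faults(requests, k):
    """Count faults when serving `requests` with LRU on a cache of size k."""
    cache = OrderedDict()          # keys ordered least- to most-recently used
    faults = 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)   # hit: p becomes most recently used
        else:
            faults += 1            # fault: fetch p from slow memory
            if len(cache) >= k:
                cache.popitem(last=False)  # evict least-recently-used page
            cache[p] = True
    return faults

# Cycling through k+1 distinct pages makes LRU fault on every request:
print(lru_faults([1, 2, 3, 4, 1, 2, 3, 4], k=3))  # -> 8
```

The cyclic example is the classic worst case for LRU: with k+1 pages requested round-robin, the page needed next is always the one just evicted.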
Multi-Core Paging

Core 1, Core 2, Core 3, Core 4 — shared L2/L3 cache — RAM

t:   1   2   3   4   5   6   7   8   9   10  11  12
R1:  p2  p8  p1  p4  p3  p4  p10 p5  …
R2:  p9  p1  _   _   _   p8  p2  p1  p1  p4  p7  …
R3:  p3  p18 p17 p8  p2  p3  p2  p9  …
Multi-Core Paging
• p request sequences
• Shared cache of size K
• Total length n (n ≫ K, p)
• Hit = 1 unit of time
• Fault = α units of time

t:   1   2   3   4   5   6   7   8   9   10  11  12
R1:  p2  p8  p1  p4  p3  p4  p10 p5  …
R2:  p9  p1  p8  p2  p1  p1  p4  p7  …
R3:  p3  p18 p17 p8  p2  p3  p2  p9  …

Fault at t=2 on p1: R2 stalls for α time units, delaying its remaining requests.
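A sketch of this timing model with a shared LRU cache (Python; the event-driven loop and breaking time ties by core index are my assumptions — the model itself only fixes that a hit takes 1 unit and a fault α units):

```python
import heapq
from collections import OrderedDict

def shared_lru_faults(sequences, K, alpha):
    """Serve p request sequences concurrently from one shared LRU cache of
    size K. A hit completes in 1 time unit, a fault in alpha units; each
    core issues its next request when the previous one completes.
    Returns per-core fault counts."""
    cache = OrderedDict()
    faults = [0] * len(sequences)
    pos = [0] * len(sequences)
    # event queue of (time a core issues its next request, core index);
    # ties broken by core index -- an assumption, not fixed by the model
    events = [(0, i) for i in range(len(sequences))]
    heapq.heapify(events)
    while events:
        t, i = heapq.heappop(events)
        p = sequences[i][pos[i]]
        if p in cache:
            cache.move_to_end(p)              # hit: most recently used
            done = t + 1
        else:
            faults[i] += 1                    # fault: fetch p
            if len(cache) >= K:
                cache.popitem(last=False)     # evict LRU page
            cache[p] = True
            done = t + alpha
        pos[i] += 1
        if pos[i] < len(sequences[i]):
            heapq.heappush(events, (done, i))
    return faults
```

Because a fault stalls only the faulting core, α shifts the relative alignment of the sequences, which is exactly what makes this model harder than p independent paging problems.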
Related Models
• Multiple applications or threads
• Multi-Core model [Hassidim, ICS '10]
  – Makespan
  – LRU is not competitive
  – Scheduling
• Our model:
  – No scheduling of requests
  – Separates scheduling and paging
  – Minimize faults
Natural Strategies
• Share the cache
  – Eviction policy
• Partition the cache among cores
  – Partition function (static, dynamic)
  – Eviction policy
• Examples:
  – Shared-LRU
  – Optimal Static Partition with LRU
Partition vs. Shared

OptStaticPartition / Shared-LRU = Ω(n)
Shared-LRU / OptStaticPartition ≤ K

For any online dynamic partition that changes o(n) times:
Partition / Shared-LRU = ω(1)

Partitions that don't change enough are not competitive.
Shared Strategies

Theorem: Competitive ratio of (Shared) LRU =
when the offline algorithm has a cache of size h ≈ K/2.

The same applies to FIFO, CLOCK, and FWF.
Proof Idea
• Faults(LRU) ≥ n/2
• Faults(Offline) ≤ initial faults + αK per coloured phase
• Competitive ratio of LRU follows

Obs: Furthest-In-The-Future is not optimal.
The Offline Problem

PARTIAL-INDIVIDUAL-FAULTS (PIF):
Given sequences R = {R1, …, Rp}, a time t, and fault bounds f1, …, fp, can R be served such that at time t the number of faults on each Ri is at most fi?
t:   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
R1:  p1  _   _   p2  p8  p1  p4  _   _   p10 p5  p1  p4  p2  p9  p9  p5  p2  p3  p7
R2:  p2  p9  p1  _   _   p4  p8  _   _   p1  p4  p7  p2  _   _   p3  p4  _   _   p1
R3:  p3  p4  _   _   p8  p2  p3  p2  p9  p5  p1  p4  p2  p9  p9  p1  _   _   p4  p2
R4:  p2  _   _   p3  p8  p1  p1  p3  p9  _   _   p10 p5  p1  p8  _   _   p1  p4  p2

E.g. at t = 18: is (f1, f2, f3, f4) ≤ (2, 3, 3, 4)?
PARTIAL-INDIVIDUAL-FAULTS (PIF):

Theorem: PIF is NP-complete.

• Optimization version (MAX-PIF): given an instance of PIF, maximize the number of sequences that stay within their fault bound

Theorem: MAX-PIF is APX-hard.

• Unless P = NP, there is no PTAS for MAX-PIF
PIF vs. Min Faults
• PARTIAL-INDIVIDUAL-FAULTS remains NP-hard even when α = 1
• If α = 1, minimizing the total number of faults can be solved by FITF
• Achieving a fair fault distribution is harder than minimizing the total number of faults
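When hits and faults take equal time, the interleaving of the sequences is fixed, so minimizing total faults reduces to sequential paging on the merged sequence, where the Furthest-In-The-Future (Belady) rule is optimal. A sketch of FITF fault counting (names are illustrative):

```python
def fitf_faults(requests, k):
    """Belady's Furthest-In-The-Future rule on a single request sequence:
    on a fault with a full cache, evict the page whose next request is
    furthest away (or never occurs). Optimal for sequential paging."""
    # next-occurrence index of each request, computed right-to-left
    nxt, last = [0] * len(requests), {}
    for i in range(len(requests) - 1, -1, -1):
        nxt[i] = last.get(requests[i], float('inf'))
        last[requests[i]] = i
    cache = {}   # page -> index of its next request
    faults = 0
    for i, p in enumerate(requests):
        if p in cache:
            cache[p] = nxt[i]                       # hit: refresh next use
        else:
            faults += 1
            if len(cache) >= k:
                victim = max(cache, key=cache.get)  # furthest next request
                del cache[victim]
            cache[p] = nxt[i]
    return faults
```

On the cyclic sequence [1, 2, 3, 4, 1, 2, 3, 4] with k = 3, FITF faults 5 times where LRU faults 8 times.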
The Offline Problem
• The offline algorithm can align sequences properly by means of faults
• The algorithm could "force faults" for this sake: evict a page just before it is requested, so that the request faults and its sequence stalls

(Figure: cache snapshots and two request timelines contrasting a regular execution with forcing a fault on p1; the forced fault delays the first sequence's remaining requests by α time units, shifting its alignment with the other sequence.)
The Offline Problem
• However, forcing faults gives no advantage over an honest offline algorithm:

Theorem: Let A be an offline algorithm that forces faults. There exists an offline algorithm A′ that does not force faults such that for all disjoint R, A(R) = A′(R).
The Offline Problem
• For minimizing faults:

Theorem: There exists an optimal offline algorithm that upon each fault evicts a page whose next request time is maximal in R_j, for some j = 1..p.

• This yields a search over at most p candidate evictions per fault
• It can be improved using dynamic programming (recall n ≫ p)
• This algorithm extends to PARTIAL-INDIVIDUAL-FAULTS
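The theorem restricts the search at each fault to at most p candidates: for each sequence R_j, the cached page whose next request in R_j is furthest in the future. A sketch of computing that candidate set (Python; the representation of cache state and sequence positions is my assumption):

```python
def eviction_candidates(cached_pages, sequences, positions):
    """For each sequence R_j (with its current position), pick the cached
    page whose next request in R_j is furthest away; pages never requested
    again in R_j count as infinitely far. Returns the <= p pages that an
    optimal offline algorithm may restrict its evictions to."""
    candidates = set()
    for seq, pos in zip(sequences, positions):
        best_page, best_dist = None, -1.0
        for page in cached_pages:
            try:
                dist = seq.index(page, pos) - pos   # next occurrence in R_j
            except ValueError:
                dist = float('inf')                 # never requested again
            if dist > best_dist:
                best_page, best_dist = page, dist
        if best_page is not None:
            candidates.add(best_page)
    return candidates
```

Branching only over these candidates at each fault is what makes an exact algorithm feasible when the number of sequences p is a constant.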
Conclusions
• Multi-core paging is significantly different from sequential paging
• Traditional paging strategies are not competitive
• Serving a set of request sequences while limiting the faults in each sequence is hard
• Multi-core paging is in P when the number of cores is constant
Open Problems
• What are good online strategies?
• What are good measures of performance?
  – Fairness?
• What is the complexity of minimizing the number of faults?
• Can we obtain more efficient offline algorithms (exact or approximate)?
Thank you