Paging for Multi-Core Shared Caches
Alejandro López-Ortiz, Alejandro Salinger
ITCS, January 8th, 2012
Multi-Core Challenges
• Access to data is a key factor
• Cache efficiency is a determining factor for:
  – Algorithms
  – Schedulers
  – Paging strategies
• Extensively studied for the sequential case
• Almost no previous theory for the multi-core case
Sequential Paging
• Slow memory, cache of size K
• Page request sequence: … p6 p3 p2 p4 p4 p2 p10 p11 p5 p4 …
• Is pi in the cache?
  – Yes: do nothing (hit)
  – No: fetch pi from slow memory and evict one page from the cache (fault)
• Goal: minimize the number of faults
Sequential Paging
Common eviction policies:
  – Least-Recently-Used (LRU)
  – First-In-First-Out (FIFO)
  – Flush-When-Full (FWF)
  – Furthest-In-The-Future (FITF) (offline)
• An online algorithm A is c-competitive if for all request sequences R, A(R) ≤ c · OPT(R) + β for some constant β
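As a concrete illustration of the sequential model, here is a minimal LRU fault counter (a sketch; the function name and example sequence are illustrative, not from the talk):

```python
from collections import OrderedDict

def lru_faults(requests, k):
    """Count faults when serving `requests` with LRU on a cache of size k."""
    cache = OrderedDict()          # keys ordered least- to most-recently used
    faults = 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)   # hit: p becomes most recently used
        else:
            faults += 1            # fault: fetch p from slow memory
            if len(cache) >= k:
                cache.popitem(last=False)  # evict least-recently-used page
            cache[p] = True
    return faults

# Cycling through k+1 distinct pages makes LRU fault on every request:
print(lru_faults([1, 2, 3, 4, 1, 2, 3, 4], k=3))  # -> 8
```

The cyclic example is the classic worst case for LRU: with k+1 pages requested round-robin, the page needed next is always the one just evicted.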
Multi-Core Paging

Core 1, Core 2, Core 3, Core 4 — shared L2/L3 cache — RAM

t:   1   2   3   4   5   6   7   8   9   10  11  12
R1:  p2  p8  p1  p4  p3  p4  p10 p5  …
R2:  p9  p1  _   _   _   p8  p2  p1  p1  p4  p7  …
R3:  p3  p18 p17 p8  p2  p3  p2  p9  …
Multi-Core Paging
• p request sequences
• Shared cache of size K
• Total length n (n ≫ K, p)
• Hit = 1 unit of time
• Fault = α units of time

t:   1   2   3   4   5   6   7   8   9   10  11  12
R1:  p2  p8  p1  p4  p3  p4  p10 p5  …
R2:  p9  p1  p8  p2  p1  p1  p4  p7  …
R3:  p3  p18 p17 p8  p2  p3  p2  p9  …

Fault at t=2 on p1: R2 stalls for α time units, delaying its remaining requests.
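A sketch of this timing model with a shared LRU cache (Python; the event-driven loop and breaking time ties by core index are my assumptions — the model itself only fixes that a hit takes 1 unit and a fault α units):

```python
import heapq
from collections import OrderedDict

def shared_lru_faults(sequences, K, alpha):
    """Serve p request sequences concurrently from one shared LRU cache of
    size K. A hit completes in 1 time unit, a fault in alpha units; each
    core issues its next request when the previous one completes.
    Returns per-core fault counts."""
    cache = OrderedDict()
    faults = [0] * len(sequences)
    pos = [0] * len(sequences)
    # event queue of (time a core issues its next request, core index);
    # ties broken by core index -- an assumption, not fixed by the model
    events = [(0, i) for i in range(len(sequences))]
    heapq.heapify(events)
    while events:
        t, i = heapq.heappop(events)
        p = sequences[i][pos[i]]
        if p in cache:
            cache.move_to_end(p)              # hit: most recently used
            done = t + 1
        else:
            faults[i] += 1                    # fault: fetch p
            if len(cache) >= K:
                cache.popitem(last=False)     # evict LRU page
            cache[p] = True
            done = t + alpha
        pos[i] += 1
        if pos[i] < len(sequences[i]):
            heapq.heappush(events, (done, i))
    return faults
```

Because a fault stalls only the faulting core, α shifts the relative alignment of the sequences, which is exactly what makes this model harder than p independent paging problems.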
Related Models
• Multiple applications or threads
• Multi-Core model [Hassidim, ICS '10]
  – Makespan
  – LRU is not competitive
  – Scheduling
• Our model:
  – No scheduling of requests
  – Separates scheduling and paging
  – Minimize faults
Natural Strategies
• Share the cache
  – Eviction policy
• Partition the cache among cores
  – Partition function (static, dynamic)
  – Eviction policy
• Examples:
  – Shared-LRU
  – Optimal Static Partition with LRU
Partition vs. Shared

OptStaticPartition / Shared-LRU = Ω(n)
Shared-LRU / OptStaticPartition ≤ K

For any online dynamic partition that changes o(n) times:
Partition / Shared-LRU = ω(1)

Partitions that don't change enough are not competitive.
Shared Strategies

Theorem: Competitive ratio of (Shared) LRU =
when the offline algorithm has a cache of size h ≈ K/2.

The same applies to FIFO, CLOCK, and FWF.
Proof Idea
• Faults(LRU) ≥ n/2
• Faults(Offline) ≤ initial faults + αK per coloured phase
• Competitive ratio of LRU follows

Obs: Furthest-In-The-Future is not optimal.
The Offline Problem

PARTIAL-INDIVIDUAL-FAULTS (PIF):
Given sequences R = {R1, …, Rp}, a time t, and fault bounds f1, …, fp, can R be served such that at time t the number of faults on each Ri is at most fi?
t:   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
R1:  p1  _   _   p2  p8  p1  p4  _   _   p10 p5  p1  p4  p2  p9  p9  p5  p2  p3  p7
R2:  p2  p9  p1  _   _   p4  p8  _   _   p1  p4  p7  p2  _   _   p3  p4  _   _   p1
R3:  p3  p4  _   _   p8  p2  p3  p2  p9  p5  p1  p4  p2  p9  p9  p1  _   _   p4  p2
R4:  p2  _   _   p3  p8  p1  p1  p3  p9  _   _   p10 p5  p1  p8  _   _   p1  p4  p2

E.g. at t = 18: is (f1, f2, f3, f4) ≤ (2, 3, 3, 4)?
PARTIAL-INDIVIDUAL-FAULTS (PIF):

Theorem: PIF is NP-complete.

• Optimization version (MAX-PIF): given an instance of PIF, maximize the number of sequences that stay within their fault bound

Theorem: MAX-PIF is APX-hard.

• Unless P = NP, there is no PTAS for MAX-PIF
PIF vs. Min Faults
• PARTIAL-INDIVIDUAL-FAULTS remains NP-hard even when α = 1
• If α = 1, minimizing the total number of faults can be solved by FITF
• Achieving a fair fault distribution is harder than minimizing the total number of faults
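When hits and faults take equal time, the interleaving of the sequences is fixed, so minimizing total faults reduces to sequential paging on the merged sequence, where the Furthest-In-The-Future (Belady) rule is optimal. A sketch of FITF fault counting (names are illustrative):

```python
def fitf_faults(requests, k):
    """Belady's Furthest-In-The-Future rule on a single request sequence:
    on a fault with a full cache, evict the page whose next request is
    furthest away (or never occurs). Optimal for sequential paging."""
    # next-occurrence index of each request, computed right-to-left
    nxt, last = [0] * len(requests), {}
    for i in range(len(requests) - 1, -1, -1):
        nxt[i] = last.get(requests[i], float('inf'))
        last[requests[i]] = i
    cache = {}   # page -> index of its next request
    faults = 0
    for i, p in enumerate(requests):
        if p in cache:
            cache[p] = nxt[i]                       # hit: refresh next use
        else:
            faults += 1
            if len(cache) >= k:
                victim = max(cache, key=cache.get)  # furthest next request
                del cache[victim]
            cache[p] = nxt[i]
    return faults
```

On the cyclic sequence [1, 2, 3, 4, 1, 2, 3, 4] with k = 3, FITF faults 5 times where LRU faults 8 times.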
The Offline Problem
• The offline algorithm can align sequences properly by means of faults
• The algorithm could "force faults" for this sake: evict a page just before it is requested, so that the request faults and its sequence stalls

(Figure: cache snapshots and two request timelines contrasting a regular execution with forcing a fault on p1; the forced fault delays the first sequence's remaining requests by α time units, shifting its alignment with the other sequence.)
The Offline Problem
• However, forcing faults gives no advantage over an honest offline algorithm:

Theorem: Let A be an offline algorithm that forces faults. There exists an offline algorithm A′ that does not force faults such that for all disjoint R, A(R) = A′(R).
The Offline Problem
• For minimizing faults:

Theorem: There exists an optimal offline algorithm that upon each fault evicts a page whose next request time is maximal in R_j, for some j = 1..p.

• This yields a search over at most p candidate evictions per fault
• It can be improved using dynamic programming (recall n ≫ p)
• This algorithm extends to PARTIAL-INDIVIDUAL-FAULTS
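The theorem restricts the search at each fault to at most p candidates: for each sequence R_j, the cached page whose next request in R_j is furthest in the future. A sketch of computing that candidate set (Python; the representation of cache state and sequence positions is my assumption):

```python
def eviction_candidates(cached_pages, sequences, positions):
    """For each sequence R_j (with its current position), pick the cached
    page whose next request in R_j is furthest away; pages never requested
    again in R_j count as infinitely far. Returns the <= p pages that an
    optimal offline algorithm may restrict its evictions to."""
    candidates = set()
    for seq, pos in zip(sequences, positions):
        best_page, best_dist = None, -1.0
        for page in cached_pages:
            try:
                dist = seq.index(page, pos) - pos   # next occurrence in R_j
            except ValueError:
                dist = float('inf')                 # never requested again
            if dist > best_dist:
                best_page, best_dist = page, dist
        if best_page is not None:
            candidates.add(best_page)
    return candidates
```

Branching only over these candidates at each fault is what makes an exact algorithm feasible when the number of sequences p is a constant.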
Conclusions
• Multi-core paging is significantly different from sequential paging
• Traditional paging strategies are not competitive
• Serving a set of request sequences while limiting the faults in each sequence is hard
• Multi-core paging is in P when the number of cores is constant
Open Problems
• What are good online strategies?
• What are good measures of performance?
  – Fairness?
• What is the complexity of minimizing the number of faults?
• Can we obtain more efficient offline algorithms (exact or approximate)?
Thank you