RATHIJIT SEN DAVID A. WOOD Reuse-based Online Models for Caches 6/20/2013 ACM SIGMETRICS 2013 @ CMU, Pittsburgh, PA 1



Page 2: The Problem

Caches: power vs. performance

Reconfigurable caches, e.g., Ivy Bridge

The problem: which configuration should be selected, e.g., to get the best energy efficiency?

[Figure: eight cores, eight LLC banks and DRAM; LLC misses fetch from DRAM.]

Page 3: Cache Performance Prediction

We propose a framework: h = (r · B) · φ

h: hit ratio
r: reuse-distance distribution (novel hardware support)
B: stochastic binomial matrix
φ: hit function (LRU, PLRU, RANDOM, NMRU)

Case study: Energy-Delay Product (EDP) within 7% of minimum

Page 4: Agenda

The Problem

Framework: Locality (r), Matrix transformations (B), Hit functions (φ); h = (r · B) · φ

Hardware support

Case Study

Page 5: Cache Overview

Limited storage: sets of (usually 64-byte) blocks; #blocks/set = associativity (#ways). The set index plus the address tag identify data.

[Figure: a grid of S sets by A ways (associativity A). The address tag is compared against the tags in the indexed set: a match is a hit, no match is a miss.]
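The following minimal Python sketch shows how an address selects a set and is then matched by tag. It assumes power-of-two sizes and plain bit-select (modulo) indexing with no hashing. The function names are hypothetical, and nothing here is the authors' code.

```python
BLOCK_BYTES = 64  # "usually 64-byte" blocks


def num_sets(capacity_bytes: int, assoc: int, block_bytes: int = BLOCK_BYTES) -> int:
    """S = capacity / (A * block size)."""
    return capacity_bytes // (assoc * block_bytes)


def split_address(addr: int, sets: int, block_bytes: int = BLOCK_BYTES):
    """Split a byte address into (tag, set_index, block_offset).

    Assumes `sets` and `block_bytes` are powers of two and that the set
    index is taken directly from address bits (no hashing)."""
    offset_bits = block_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    offset = addr & (block_bytes - 1)
    set_index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


def is_hit(cache_tags, addr: int) -> bool:
    """cache_tags[s] holds the (up to A) tags resident in set s."""
    tag, set_index, _ = split_address(addr, len(cache_tags))
    return tag in cache_tags[set_index]  # tag match -> hit, otherwise miss
```

For example, an 8 MB, 16-way cache with 64-byte blocks has num_sets(8 * 2**20, 16) = 8192 = 2^13 sets.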

Page 6: Last-Level Cache (LLC) Workload Variation

[Figure: LLC misses per 1000 instructions (0 to 30) vs. LLC size (2 MB to 32 MB). Curves: swim; ammp, blackscholes, bodytrack, fluidanimate, freqmine, swaptions; equake, gafort, wupwise; apache; mgrid; zeus; oltp, jbb; fma3d.]

Page 7: Bad configurations hurt!

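[Figure: EDP (energy-delay product) of the minimum- and maximum-EDP configurations, relative to the minimum EDP (y-axis 1 to 3.5), for blackscholes, bodytrack, fluidanimate, freqmine, swaptions, ammp, equake, fma3d, gafort, mgrid, swim, wupwise, apache, jbb, oltp, zeus. Maximum-EDP annotations: 27% worse and 218% worse.]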
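The plot normalizes EDP to the minimum over configurations. A value of 1.27 therefore corresponds to "27% worse" and 3.18 to "218% worse". The sketch below shows only that normalization. The configuration names and energy/delay values in the usage note are made up, and this is not the paper's energy model.

```python
def edp(energy: float, delay: float) -> float:
    """Energy-delay product: lower is better."""
    return energy * delay


def edp_relative_to_min(configs: dict) -> dict:
    """Map each configuration name to its EDP divided by the minimum EDP.

    `configs` maps a (hypothetical) configuration name to an
    (energy, delay) pair in consistent units."""
    edps = {name: edp(e, d) for name, (e, d) in configs.items()}
    best = min(edps.values())
    return {name: v / best for name, v in edps.items()}
```

With made-up inputs {"4MB-8way": (2.0, 1.0), "16MB-16way": (3.0, 0.5)}, the second configuration has the minimum EDP (ratio 1.0). The first has a ratio of about 1.33, i.e., 33% worse.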

Page 8: Problem Summary

Reconfigurable caches

Multiple replacement policies

Goal: Online miss-ratio prediction

[Figure: cache organized as S sets by A ways (associativity).]

Page 9: Indexing Assumption

Mapping of unique addresses to cache sets. Assumption: independent, uniform [Smith, 1978]

Unique accesses as Bernoulli trials: each maps to a given set with probability 1/S

(Partial) hashing in POWER4, POWER5, POWER6, Xeon; simple XOR-based function [similar to Cypher, 2008]
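The slide does not spell out the hash. As a labeled assumption, the sketch below uses XOR folding, which is one simple XOR-based scheme. It is not necessarily the function from [Cypher, 2008] or the one used in the POWER or Xeon parts, and the function name is hypothetical.

```python
def xor_set_index(block_addr: int, index_bits: int) -> int:
    """Hypothetical XOR-folding set index (illustration only).

    Splits the block address into index_bits-wide chunks and XORs them
    together, so upper address bits also influence the chosen set."""
    if index_bits <= 0:
        raise ValueError("index_bits must be positive")
    mask = (1 << index_bits) - 1
    index = 0
    while block_addr:
        index ^= block_addr & mask  # fold in the next chunk
        block_addr >>= index_bits
    return index
```

With bit-select indexing, block addresses that differ only above the index bits (e.g., a stride of 2^index_bits blocks) all land in one set. XOR folding spreads them across sets, which is closer to the independent, uniform mapping the model assumes.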

Page 10: Agenda


Page 11: Temporal Locality Metrics

Unique Reuse Distance (URD): #unique intervening addresses. For x y z z y x, URD(x) = 2. URD = Stack Distance [Mattson, 1970] − 1. Large cache → large distances to track.

Absolute Reuse Distance (ARD): #intervening addresses. For x y z z y x, ARD(x) = 4.

[Figure: r, the histogram of P(URD = i) over i. Size?]
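Below is a direct, quadratic-time Python sketch of the two definitions. It is written for clarity rather than speed; a practical profiler would use a stack or tree structure instead. It reproduces the slide's example, and the function name is hypothetical.

```python
def reuse_distances(trace: list) -> list:
    """Return (address, URD, ARD) for every reuse in the trace.

    URD counts unique intervening addresses; ARD counts all of them."""
    last_pos = {}  # address -> index of its previous access
    out = []
    for t, addr in enumerate(trace):
        if addr in last_pos:
            window = trace[last_pos[addr] + 1 : t]  # accesses in between
            out.append((addr, len(set(window)), len(window)))
        last_pos[addr] = t
    return out
```

On x y z z y x this yields ("z", 0, 0), ("y", 1, 2) and ("x", 2, 4). Mattson's stack distance for each reuse is the URD plus 1.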

Page 12: Per-set Locality, r(S)

r(S) is “compressed” as S (#sets) increases: less of the tail is important.

[Figure: probability (left) and cumulative probability (right) vs. per-set URD (unique reuse distance, 0 to 32) for S = 2^10, 2^11, 2^12, 2^13, 2^14. Inset: the same reuse of x in a cache with S₁ sets and in one with S₂ > S₁ sets.]
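To see the compression empirically, the hypothetical helper below measures r(S) from a block-address trace. It counts unique intervening addresses only within the reused block's set, and it assumes plain modulo set indexing.

```python
from collections import Counter


def per_set_urd_distribution(block_trace, sets: int) -> dict:
    """Empirical r(S): URD counted only among blocks in the reused block's set.

    Assumes set index = block address mod S (no hashing)."""
    seen_in_set = {}  # set index -> sequence of blocks accessed in that set
    last_pos = {}     # block -> position of its last access in its set's sequence
    counts = Counter()
    for block in block_trace:
        seq = seen_in_set.setdefault(block % sets, [])
        if block in last_pos:
            window = seq[last_pos[block] + 1:]  # same-set accesses since last use
            counts[len(set(window))] += 1
        last_pos[block] = len(seq)
        seq.append(block)
    total = sum(counts.values())
    return {urd: c / total for urd, c in sorted(counts.items())} if total else {}
```

On the trace [0, 1, 2, 0], the reuse of block 0 has URD 2 with S = 1. With S = 2 its URD drops to 1, because block 1 maps to the other set.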

Page 13: Agenda


Page 14: Estimating per-set locality

Generalized stochastic binomial matrices [Strum, 1977]: r(S) = r(1) · B(1 − 1/S, 1/S)

Composition: r(S₂) = r(S₁) · B(1 − S₁/S₂, S₁/S₂), for S₂ > S₁

[Figure: r, the histogram of P(URD = i), multiplied by the lower-triangular matrix B. Entry (i, k) of B is P(k successes in i trials), i.e., P(k of the i unique intervening addresses map to the same set); B(0, 0) = 1.]
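The binomial transformation can be sketched in plain Python. Row i of B(1 − q, q) is the Binomial(i, q) distribution: the probability that k of i unique intervening addresses fall in the reused block's set, when each does so independently with probability q = 1/S. Because B is lower triangular, truncating r and B to the first n entries is exact for URDs below n. The function names are hypothetical.

```python
from math import comb


def binomial_matrix(n: int, q: float) -> list:
    """n x n truncation of B(1 - q, q): B[i][k] = C(i, k) q^k (1 - q)^(i - k)."""
    p = 1.0 - q
    return [[comb(i, k) * q**k * p ** (i - k) if k <= i else 0.0
             for k in range(n)]
            for i in range(n)]


def vec_mat(r: list, B: list) -> list:
    """Row vector times matrix: (r . B)[k] = sum_i r[i] * B[i][k]."""
    n = len(B)
    return [sum(r[i] * B[i][k] for i in range(n)) for k in range(n)]


def per_set_from_global(r1: list, sets: int) -> list:
    """r(S) = r(1) . B(1 - 1/S, 1/S)."""
    return vec_mat(r1, binomial_matrix(len(r1), 1.0 / sets))


def compose(r_s1: list, s1: int, s2: int) -> list:
    """r(S2) = r(S1) . B(1 - S1/S2, S1/S2), for S2 > S1."""
    return vec_mat(r_s1, binomial_matrix(len(r_s1), s1 / s2))
```

Composition holds because thinning a Binomial(i, 1/S₁) count again with probability S₁/S₂ yields Binomial(i, 1/S₂). For example, deriving r(4) from r(2) gives the same vector, up to rounding, as deriving it from r(1) directly.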

Page 15: Computation reuse & speedup

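“Shorter” tail → smaller matrices

[Figure: deriving each of r(2^14), r(2^13), r(2^12), r(2^11), r(2^10) directly from r(1), versus obtaining r(2^10) once and deriving the larger-S distributions from it by composition. r(2^10): now computed from r(1), later provided by hardware support. Labels: “Size?”, “Poisson Approximation”; r is the histogram of P(URD = i).]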

Page 16: Size of r(2^10)?

Prediction with r(2^10) limited to URD < n

[Figure: miss ratio (0 to 0.3) for 2-, 4-, 8-, 16- and 32-way caches of 2 MB, 4 MB, 8 MB, 16 MB and 32 MB, predicted with n = 32, 64, 128, 256, 512, and actual.]

Page 17: Agenda


Page 18: Hit Function, φ

φₖ = P(x will hit | URD(x) = k)

Monotonically non-increasing model. Intuition: larger URD → same or larger eviction probability

φ₀ = 1, φₖ ≤ φₖ₋₁

φ∞ = 0

[Figure: an access to x, intervening accesses to addresses other than x, then the reuse of x.]

Page 19: Hit Function, φ

Example: A = 8

[Figure: hit probability (0 to 1) vs. unique reuse distance (0 to 32) for LRU, PLRU, NMRU and RANDOM.]

Page 20: Formulating φ

φ(LRU): step function; (r · B) · φ(LRU) [Smith, 1978], [Hill & Smith, 1989]

φ(PLRU): assumes that, on average, traffic is evenly divided between subtrees

φ(RANDOM): estimates the number of intervening misses using ARD

φ(NMRU): similar to φ(RANDOM), except φ₁ = 1
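This is a sketch of two of the hit functions. φ(LRU) is the exact step function: a block survives in an A-way LRU set iff fewer than A unique blocks intervene. The RANDOM/NMRU version is a simplification of our own, not the paper's model. It takes a hypothetical per-URD estimate m[k] of the intervening misses in the set (which the paper derives from ARD), and it assumes each miss evicts x with probability 1/A.

```python
def phi_lru(n: int, assoc: int) -> list:
    """LRU: hit iff fewer than A unique blocks intervened in the set."""
    return [1.0 if k < assoc else 0.0 for k in range(n)]


def phi_random_sketch(misses: list, assoc: int, nmru: bool = False) -> list:
    """Simplified RANDOM/NMRU hit function (assumption, not the paper's).

    misses[k] is a hypothetical estimate of intervening misses given
    URD = k (misses[0] should be 0). Each miss is assumed to evict x
    with probability 1/A, so x survives with (1 - 1/A) ** misses[k]."""
    phi = [(1.0 - 1.0 / assoc) ** m for m in misses]
    if nmru and len(phi) > 1:
        # With one unique intervening block, its miss occurs while x is
        # MRU, and NMRU never evicts the MRU block, so x survives.
        phi[1] = 1.0
    return phi
```

For A = 4 and m = [0, 1, 2, 3], the RANDOM sketch gives φ = [1, 0.75, 0.5625, 0.421875], and the NMRU variant raises φ₁ to 1. Both keep φ₀ = 1 and are non-increasing whenever m is non-decreasing.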

Page 21: Agenda


Page 22: Prediction Accuracy

LRU, PLRU (A=2), NMRU (A=2): exact per-set model. Others: approximate per-set model.

[Figure: cumulative probability (0 to 1) of the relative miss-ratio error abs((predicted − actual)/actual), x-axis −1% to 6%, for LRU, PLRU, RANDOM and NMRU.]

Page 23: Overheads

r = r · B: 6 80 μsec (Binomial → Poisson approximation for each row of B)

h = (r · B) · φ: 20 30 μsec (average over 24 configurations; B applied 8 times)
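This sketch shows the binomial-to-Poisson substitution for one row of B. Binomial(i, q) is replaced by Poisson(λ = i·q), computed with the recurrence P(k) = P(k−1)·λ/k, which avoids large binomial coefficients. The function names are hypothetical, and the paper's implementation may differ.

```python
from math import comb, exp


def binomial_row(i: int, q: float, n: int) -> list:
    """Exact row i of B(1 - q, q), truncated to n columns."""
    return [comb(i, k) * q**k * (1.0 - q) ** (i - k) if k <= i else 0.0
            for k in range(n)]


def poisson_row(i: int, q: float, n: int) -> list:
    """Poisson(lambda = i * q) approximation of the same row (n >= 1)."""
    lam = i * q
    row = [exp(-lam)]
    for k in range(1, n):
        row.append(row[-1] * lam / k)  # P(k) = P(k-1) * lambda / k
    return row
```

The approximation is good when q = 1/S is small, as it is for LLC set counts: the total variation distance between the two rows is at most q. For i = 2000 and q = 1/1024, every entry agrees to within 10⁻³.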

Page 24: Agenda


Page 25: Computation reuse & speedup

“Shorter” tail → smaller matrices

[Figure: same scheme as Page 15, highlighting r(2^10): computed now (later: hardware support), size = 512, i.e., URD < 512. Label: “Poisson Approximation”.]

Page 26: Insights

x y z z y x: URD(x) = 2

Unique addresses must be “remembered”, but only their cardinality is needed, not the full addresses

Bloom filter for a compact (approximate) representation

r(2^10) is seen by any set of a cache with S = 2^10: filter the address stream

[Figure: r, the histogram of P(URD = i).]

Page 27: Hardware Support for estimating r(2^10)

[Figure: block diagram. A set filter passes only accesses to the reference set (filtered access). Control logic loads the reference address register at the start of a sample, inserts unique addresses into a 1024-bit Bloom filter (2 hash functions), and increments a 9-bit counter for each one. On an address match (hit), it increments the counter's entry in a 512-entry histogram array and resets.]

Flow: Start Sample → Addr match? If Y: End Sample. If N: Unique? If Y (not a Bloom-filter hit): Remember.
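The following is a software model of the sampling hardware, meant to make the data path concrete. Sizes follow the slide: a 1024-bit Bloom filter with 2 hash functions, a 9-bit counter, a 512-entry histogram and S = 2^10. The two hash functions and the class and method names are hypothetical. The sketch does not model how a sample is abandoned if the reference address never recurs.

```python
class URDSampler:
    """Hypothetical software model of the r(2^10) sampling hardware."""

    FILTER_BITS = 1024  # 1024-bit Bloom filter
    COUNTER_MAX = 511   # 9-bit saturating counter
    HIST_ENTRIES = 512  # 512-entry histogram

    def __init__(self, sets: int = 1024):
        self.sets = sets
        self.hist = [0] * self.HIST_ENTRIES
        self.ref = None  # reference address register (None: no active sample)
        self.ref_set = 0
        self.bits = [False] * self.FILTER_BITS
        self.counter = 0

    def _hashes(self, block: int):
        # Two hypothetical multiplicative hashes into the filter.
        h1 = ((block * 0x9E3779B1) >> 16) % self.FILTER_BITS
        h2 = ((block * 0x85EBCA6B) >> 13) % self.FILTER_BITS
        return h1, h2

    def start_sample(self, ref_block: int) -> None:
        self.ref = ref_block
        self.ref_set = ref_block % self.sets
        self.bits = [False] * self.FILTER_BITS  # reset filter
        self.counter = 0

    def access(self, block: int) -> None:
        if self.ref is None or block % self.sets != self.ref_set:
            return  # set filter: only the reference set is observed
        if block == self.ref:  # address match: the reuse is observed
            self.hist[self.counter] += 1
            self.ref = None  # end sample
            return
        h1, h2 = self._hashes(block)
        if not (self.bits[h1] and self.bits[h2]):  # unique (filter miss)
            self.bits[h1] = self.bits[h2] = True   # remember
            self.counter = min(self.counter + 1, self.COUNTER_MAX)

    def distribution(self) -> list:
        """Normalized histogram: an estimate of r(2^10), URD < 512."""
        total = sum(self.hist)
        return [c / total for c in self.hist] if total else list(self.hist)
```

Bloom-filter false positives make an address look already seen, so the URD can be undercounted. The 9-bit counter saturates at 511, so the last histogram entry collects every URD of 511 or more. Repeating samples and normalizing the histogram estimates r(2^10).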

Page 28: Agenda

Case Study + way counters

Page 29: LRU Way Counters [Suh, et al. 2002]

One counter per logical way (stack position). Determining the logical position is hard:

blocks are not totally (re-)ordered with every access → heuristics, e.g., for PLRU [Kedzierski, et al. 2010]

Other limitations: inclusion property; fixed #sets

S₂ = S₁: special case of the reuse framework. S₂ > S₁? Use B,

provided enough of the tail of r(S₁) is available
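For comparison, here is a sketch of LRU way counters. Each set keeps a true LRU stack, and a hit at stack position j increments counter j. By the inclusion property, an A-way LRU cache with the same number of sets hits exactly at positions below A. One pass therefore yields miss ratios for every A ≤ max_assoc, but not for a different S. The names are hypothetical, and this is not the [Suh, et al. 2002] hardware.

```python
def lru_way_counters(block_trace, sets: int, max_assoc: int):
    """Return (per-stack-position hit counters, misses) for true LRU."""
    counters = [0] * max_assoc
    misses = 0
    stacks = [[] for _ in range(sets)]  # index 0 = MRU, last = LRU
    for block in block_trace:
        stack = stacks[block % sets]
        if block in stack:
            pos = stack.index(block)
            counters[pos] += 1  # hit at logical way (stack position) pos
            stack.pop(pos)
        else:
            misses += 1
            if len(stack) == max_assoc:
                stack.pop()  # evict the LRU block
        stack.insert(0, block)  # accessed block becomes MRU
    return counters, misses


def miss_ratio_for_assoc(counters: list, misses: int, assoc: int) -> float:
    """Miss ratio of an assoc-way LRU cache with the same #sets."""
    total = sum(counters) + misses
    return 1.0 - sum(counters[:assoc]) / total
```

For the cyclic trace 1, 2, 3, 1, 2, 3 in one set, every reuse hits at position 2. A 2-way LRU cache therefore misses every access, while a 3-way cache hits half of them.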

Page 30: Min. EDP configuration

EDP within 7% of minimum. Reuse models outperform PLRU way counters in most cases.

[Figure: EDP of the configuration selected by the reuse model and by PLRU way counters, relative to the minimum EDP (y-axis 1 to 1.08), for blackscholes, bodytrack, fluidanimate, freqmine, swaptions, ammp, equake, fma3d, gafort, mgrid, swim, wupwise, apache, jbb, oltp, zeus.]

Page 31: Summary

The Problem: Online miss-rate estimation for reconfigurable caches

We propose a framework: h = (r · B) · φ

h: hit ratio
r: reuse-distance distribution (novel hardware support)
B: stochastic binomial matrix
φ: hit function (LRU, PLRU, RANDOM, NMRU)

Case study: EDP within 7% of minimum

Future work: More policies, applications/case studies

Page 32: Also in the paper

r: lossy summarization of the address trace

Estimation for ARD

Optimizations for LRU

Conditions for PLRU eviction

More details on models & evaluation

Page 33: Questions?

Page 34: Example LLC performance

OLTP (TPC-C + IBM DB2)

[Figure: miss ratio (0 to 0.4) for 2-, 4-, 8-, 16- and 32-way caches of 2 MB to 32 MB under RANDOM, NMRU, PLRU and LRU.]

Page 35: Estimating cache performance

Hit ratio = hits/access = ∑ᵢ P(URD = i) · P(hit | URD = i) = r · φ

Miss ratio = misses/access = 1 − hit ratio

Miss rate = misses/instruction = miss ratio × accesses/instruction

[Figure: vectors r (entries P(URD = i)) and φ (entries P(hit | URD = i)), indexed by i.]
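The three ratios translate directly into code. This is a minimal sketch with hypothetical names, in which r and φ are plain lists indexed by URD.

```python
def hit_ratio(r: list, phi: list) -> float:
    """h = sum_i P(URD = i) * P(hit | URD = i).

    Probability mass missing from r (e.g., cold accesses or a truncated
    tail) contributes nothing, i.e., it is counted as misses. zip stops
    at the shorter list, so entries of r beyond phi count as phi = 0."""
    return sum(r_i * phi_i for r_i, phi_i in zip(r, phi))


def miss_ratio(r: list, phi: list) -> float:
    """Misses per access."""
    return 1.0 - hit_ratio(r, phi)


def misses_per_kilo_instruction(miss_ratio_value: float,
                                accesses_per_instruction: float) -> float:
    """Miss rate scaled to misses per 1000 instructions."""
    return miss_ratio_value * accesses_per_instruction * 1000.0
```

Take made-up inputs r = [0.5, 0.2, 0.1, 0.1], where the remaining 0.1 is cold accesses, and an LRU step function for A = 2. These give h = 0.7 and a miss ratio of 0.3. At 0.02 accesses per instruction, that is 6 misses per 1000 instructions.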

Page 36: URD vs ARD

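[Figure: a reuse of x with k unique intervening addresses z0, z1, z2, …, zk−1, each first seen in turn. Between first occurrences, only already-seen addresses repeat: {z0}*, {z0,z1}*, {z0,z1,z2}*, …, {z0,z1,z2,…,zk−1}*.]

Approximation: dk = dk−1 + 1/rik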