

Learning Cache Models by Measurements

Jan Reineke

joint work with Andreas Abel

Uppsala University

December 20, 2012

Jan Reineke, Saarland 2

The Timing Analysis Problem

Embedded Software + Microarchitecture → Timing Requirements?


What does the execution time of a program depend on?

Input-dependent control flow

Microarchitectural state: pipeline, memory hierarchy, interconnect


Example of Influence of Microarchitectural State

Motorola PowerPC 755


Typical Structure of Timing Analysis Tools

ISA-specific analysis parts

Microarchitecture-specific part


Construction of Timing Models

Needs to accurately model all aspects of the microarchitecture that influence execution time.


Construction of Timing Models Today


Construction of Timing Models Tomorrow


Cache Models

The cache model is an important part of the timing model. Caches have a simple interface: loads + stores. They can be characterized by a few parameters:

ABC: associativity, block size, capacity
Replacement policy


High-level Approach

1) Generate memory access sequences
2) Determine the number of cache misses on these sequences, using performance counters or by measuring execution times
3) Infer the property from the measurement results


Warm-up I: Inferring the Cache Capacity

Notation: Preparatory accesses

Accesses to measure

Basic idea: Access a set of memory blocks A once. Then access A again and count the cache misses. If A fits into the cache, no misses will occur.

Measure with sets A of increasing size.
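The measurement above can be sketched in code. On real hardware the miss counts would come from performance counters; here a small software-simulated LRU cache stands in for the hardware, and the parameters (32 KB, 8-way, 64-byte blocks) are illustrative assumptions:

```python
# Sketch of the capacity measurement. A simulated set-associative LRU cache
# stands in for real hardware; on a real machine the miss counts would come
# from performance counters. All parameters are illustrative assumptions.
class SimpleCache:
    def __init__(self, capacity, assoc, block):
        self.assoc, self.block = assoc, block
        self.sets = capacity // (assoc * block)
        self.state = [[] for _ in range(self.sets)]  # per set: tags, MRU first
        self.misses = 0

    def access(self, addr):
        idx = (addr // self.block) % self.sets
        tag = addr // self.block
        s = self.state[idx]
        if tag in s:
            s.remove(tag)            # hit: move tag to the MRU position
        else:
            self.misses += 1         # miss: load tag, evict LRU if set is full
            if len(s) == self.assoc:
                s.pop()
        s.insert(0, tag)

def misses_for_size(size, capacity=32 * 1024, assoc=8, block=64):
    cache = SimpleCache(capacity, assoc, block)
    blocks = range(0, size, block)
    for a in blocks:                 # preparatory accesses: touch A once
        cache.access(a)
    cache.misses = 0
    for a in blocks:                 # accesses to measure: touch A again
        cache.access(a)
    return cache.misses

for kb in (8, 16, 32, 64):
    print(kb, "KB:", misses_for_size(kb * 1024))  # misses appear beyond 32 KB
```

As on the next slide, the miss count stays at zero until |A| exceeds the capacity.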


Example: Intel Core 2 Duo E6750, L1 Data Cache

Capacity = 32 KB

Way Size = 4 KB

(Plot: |Misses| as a function of |Size|.)


Warm-up II: Inferring the Block Size

Given: way size W and associativity A

Wanted: block size B
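The slide leaves the method open. One possible approach, sketched below on a simulated LRU cache (W = 4 KB, A = 8, and a hidden block size of 64 bytes are assumed parameters): addresses i·W all map to set 0 for any block size dividing W, while address A·W + δ maps to set 0 exactly when δ < B, so the conflict misses it causes disappear once δ reaches the block size.

```python
# One way to infer the block size B from way size W and associativity A,
# sketched on a simulated LRU cache standing in for hardware measurements.
# W, A, and the hidden block size B are assumptions for illustration.
W, A, B = 4096, 8, 64
SETS = W // B

class SimpleCache:
    def __init__(self, assoc, block, sets):
        self.assoc, self.block, self.sets = assoc, block, sets
        self.state = [[] for _ in range(sets)]  # per set: tags, MRU first
        self.misses = 0

    def access(self, addr):
        idx = (addr // self.block) % self.sets
        tag = addr // self.block
        s = self.state[idx]
        if tag in s:
            s.remove(tag)
        else:
            self.misses += 1
            if len(s) == self.assoc:
                s.pop()
        s.insert(0, tag)

def conflict_misses(delta):
    cache = SimpleCache(A, B, SETS)
    fill = [i * W for i in range(A)]      # all A addresses map to set 0
    for a in fill:
        cache.access(a)
    cache.access(A * W + delta)           # conflicts in set 0 iff delta < B
    cache.misses = 0
    for a in fill:                        # re-access and count the misses
        cache.access(a)
    return cache.misses

inferred = next(d for d in (4, 8, 16, 32, 64, 128, 256)
                if conflict_misses(d) == 0)
print(inferred)  # 64: the smallest delta without conflict misses
```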


Inferring the Replacement Policy

There are infinitely many conceivable replacement policies… For any set of observations, multiple policies remain possible. We need to make some assumptions on the possible policies to render inference possible.


A Class of Replacement Policies: Permutation Policies

Permutation policies:
Maintain a total order on the blocks in each cache set
Evict the greatest block in the order
Update the order based on the position of the accessed block in the order

Can be specified by A+1 permutations:
associativity-many “hit” permutations
one “miss” permutation

Examples: LRU, FIFO, PLRU, …
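A minimal executable sketch of this definition, under one assumed encoding: a cache set is a list of blocks, the block at the last position is evicted on a miss, and each “hit” permutation maps every new position to the old position that fills it. The LRU and FIFO permutations below match the next two slides.

```python
# Permutation policies, minimal sketch. Encoding (an assumption): a cache
# set is a list; position A-1 is evicted on a miss; a hit permutation p
# yields the new state [old[p[0]], old[p[1]], ...].
A = 4

# LRU: a hit on position i moves that block to the front.
lru_hit = {i: [i] + [j for j in range(A) if j != i] for i in range(A)}
# FIFO: hits do not reorder anything (identity permutations).
fifo_hit = {i: list(range(A)) for i in range(A)}
# Shared "miss" permutation of LRU/FIFO: shift down, insert the new block
# at the front (None marks the newly loaded block).
miss_perm = [None] + list(range(A - 1))

def access(state, hit_perms, block):
    if block in state:
        p = hit_perms[state.index(block)]
        return [state[j] for j in p]
    return [block if j is None else state[j] for j in miss_perm]

s = ["a", "b", "c", "d"]
for policy, perms in (("LRU", lru_hit), ("FIFO", fifo_hit)):
    t = access(s, perms, "c")   # hit on c ("hit" permutation 3)
    t = access(t, perms, "e")   # miss on e ("miss" permutation)
    print(policy, t)            # LRU: e c a b; FIFO: e a b c
```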


Permutation Policy LRU

LRU (least recently used):

order on blocks based on recency of last access

a b c d  --access c (“hit” permutation 3)-->  c a b d  --access e (“miss” permutation)-->  e c a b

(States ordered from most-recently-used to least-recently-used.)


Permutation Policy FIFO

FIFO (first-in first-out):

order on blocks based on order of cache misses

a b c d  --access c (“hit” permutation 3)-->  a b c d  --access e (“miss” permutation)-->  e a b c

(States ordered from last-in to first-in; a hit leaves the order unchanged.)


Inferring Permutation Policies

Strategy to infer permutation i:

1) Establish a known cache state s

2) Trigger permutation i

3) “Read out” the resulting cache state s’. Deduce permutation from s and s’.
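The three steps can be sketched against a black-box cache set. A hidden LRU set stands in for the hardware here (on a real machine the accesses would target one cache set and miss counts would come from performance counters); the read-out uses the rule that a block at position j is evicted by A−j+1 misses but not by A−j. Names and parameters are illustrative.

```python
# Sketch of the three-step inference against a black box that only exposes
# accesses and a miss counter. The hidden policy is LRU; on hardware the
# misses would come from performance counters.
A = 4

def make_black_box():
    state, misses = [], [0]
    def access(block):
        if block in state:
            state.remove(block)
        else:
            misses[0] += 1
            if len(state) == A:
                state.pop()
        state.insert(0, block)
    return access, lambda: misses[0]

def position(block, i, blocks=("a", "b", "c", "d")):
    # Find the smallest number of misses m that evicts `block` after
    # permutation i was triggered; then its position is j = A - m + 1.
    for m in range(1, A + 1):
        access, miss_count = make_black_box()
        for b in reversed(blocks):      # 1) establish known state a b c d
            access(b)
        access(blocks[i - 1])           # 2) trigger "hit" permutation i
        for k in range(m):              # 3) read out: evict with m misses
            access(("fresh", k))
        before = miss_count()
        access(block)
        if miss_count() > before:       # block was evicted by m misses
            return A - m + 1
    return None

# State after permutation 3 (an access to c), deduced block by block:
print([next(b for b in "abcd" if position(b, 3) == j)
       for j in range(1, A + 1)])      # ['c', 'a', 'b', 'd']
```

Comparing the deduced state c a b d with the known state a b c d yields the permutation, as in the LRU example two slides below.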


1) Establishing a known cache state

Assume “miss” permutation of FIFO, LRU, PLRU:

Access d, c, b, a:

? ? ? ?  --d-->  d ? ? ?  --c-->  c d ? ?  --b-->  b c d ?  --a-->  a b c d


2) Triggering permutation i

Simply access ith block in cache state.

E.g., for i = 3, access c:

a b c d  --access c-->  ? ? ? ?


3) Reading out a cache state

Exploit “miss” permutation to determine position of each of the original blocks:

If position is j then A-j+1 misses will evict the block, but A-j misses will not.

? ? c ?  --1 miss-->  ? ? ? c  --2nd miss-->  c evicted

After 1 miss, c is still cached. After 2 misses, c is not cached.


Inferring Permutation Policies:Example of LRU, Permutation 3

1) Establish known state: a b c d (most-recently-used to least-recently-used)
2) Trigger permutation 3: access c
3) Read out the resulting state: c a b d


Implementation Challenges

Interference
Prefetching
Instruction caches
Virtual memory
L2+L3 caches: strictly-inclusive, non-inclusive, exclusive
Shared caches: coherence


Experimental Results

Undocumented variant of LRU

Nehalem introduced“L0” micro-op buffer

Exclusive hierarchy

Surprising measurements ;-)


L2 Caches on some Core 2 Duos

(Plot: |misses| as a function of n, the number of cache sets; up to 4096 sets.)

Associativities:

Core 2 Duo E6300, 2 MB, 8 ways

Core 2 Duo E6750, 4 MB, 16 ways

Core 2 Duo E8400, 6 MB, 24 ways


Replacement Policy of the Intel Atom D525

Discovered a, to our knowledge, undocumented “hierarchical” policy:

[a b] [c d] [e f]  --access d (hit)-->  [d c] [a b] [e f]  --access x (miss)-->  [x e] [d c] [a b]

ATOM = LRU(3, LRU(2))

PLRU(2k) = LRU(2, PLRU(k))
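The policy LRU(3, LRU(2)) can be sketched as follows: 3 groups of 2 blocks each, with LRU over the groups and LRU inside each group. The encoding (MRU group and MRU block first) is an assumption; the trace reproduces the example above.

```python
# Sketch of the hierarchical policy LRU(3, LRU(2)) reported for the Atom
# D525: LRU over 3 groups, LRU within each group of 2 blocks. The list
# encoding (MRU group/block first) is an assumption.
def access(groups, block):
    for g, grp in enumerate(groups):
        if block in grp:                  # hit: promote block within group,
            grp.remove(block)             # then promote the whole group
            grp.insert(0, block)
            groups.insert(0, groups.pop(g))
            return groups
    lru = groups.pop()                    # miss: take the LRU group,
    lru.pop()                             # evict its LRU block,
    lru.insert(0, block)                  # insert the new block,
    groups.insert(0, lru)                 # and make the group MRU
    return groups

state = [["a", "b"], ["c", "d"], ["e", "f"]]
state = access(state, "d")   # hit on d
state = access(state, "x")   # miss on x (evicts f)
print(state)                 # [['x', 'e'], ['d', 'c'], ['a', 'b']]
```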


Conclusions and Future Work

Inference of cache models:
More general classes of replacement policies, e.g. by inferring canonical register automata
Shared caches in multicores, coherency protocols, etc.
Deal with other architectural features: translation lookaside buffers, branch predictors, prefetchers
Infer abstractions instead of “concrete” models


Questions?

Measurement-based Modeling of the Cache Replacement Policy. A. Abel and J. Reineke. RTAS, 2013 (to appear).
