MadCache: A PC-aware Cache Insertion Policy

Page 1: MadCache: A PC-aware Cache Insertion Policy

MadCache: A PC-aware Cache Insertion Policy

Andrew Nere, Mitch Hayenga, and Mikko Lipasti
PHARM Research Group

University of Wisconsin – Madison

June 20, 2010

Page 2: MadCache: A PC-aware Cache Insertion Policy

• Problem: Changing hardware and workloads encourage investigation of cache replacement/insertion policy designs

• Proposal: MadCache uses PC history to choose cache insertion policy
– Last level cache granularity
– Individual PC granularity

• Performance improvements over LRU
– 2.5% IPC improvement (single thread)
– 4.5% weighted speedup and 6% throughput improvement (multithreaded)

Executive Summary

Page 3: MadCache: A PC-aware Cache Insertion Policy

• Importance of investigating cache insertion policies
– Direct effect on performance
– LRU dominated hardware designs for many years
– Changing workloads, levels of caches

• Shared last-level cache
– Cache behavior now depends on multiple running applications
– One streaming thread can ruin the cache for everyone

Motivation

Page 4: MadCache: A PC-aware Cache Insertion Policy

• Dynamic insertion policies
– DIP – Qureshi et al. – ISCA '07
  • Dueling sets select the best of multiple policies
  • Bimodal Insertion Policy (BIP) offers thrash protection (sketched below)
– TADIP – Jaleel et al. – PACT '08
  • Awareness of other threads' workloads

• Utilizing program counter information
– PCs exhibit a useful amount of predictable behavior
– Dead-block prediction and prefetching – ISCA '01
– PC-based load miss prediction – MICRO '95

Previous Work
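
For reference, a minimal C++ sketch of the BIP insertion decision used by DIP follows; the function name and the epsilon value are illustrative assumptions, not the DIP authors' implementation.

    #include <cstdlib>

    // Sketch of Bimodal Insertion Policy (BIP): most incoming lines are inserted
    // at the LRU position so streaming data is evicted quickly; a small fraction
    // is inserted at MRU, preserving some reuse while protecting against thrashing.
    enum class InsertPos { MRU, LRU };

    InsertPos bipInsertPosition() {
        const int kEpsilonPercent = 3;  // assumed small MRU-insertion probability
        return (std::rand() % 100) < kEpsilonPercent ? InsertPos::MRU : InsertPos::LRU;
    }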

Page 5: MadCache: A PC-aware Cache Insertion Policy

• Problem: With changing hardware and workloads, caches are subject to suboptimal insertion policies

• Solution: Use PC information to create a better policy
– Adaptive default cache insertion policy
– Track PCs to determine the policy on a finer grain than DIP
– Filter out streaming PCs

Introducing MadCache!

MadCache Proposal

Page 6: MadCache: A PC-aware Cache Insertion Policy

• Tracker Sets
– Sample the behavior of the cache
– Enter the PCs into the PC-Predictor Table
– Determine the default policy of the cache
  • Uses set dueling – Qureshi et al. – ISCA '07 (sketched below)
  • LRU and Bypassing Bimodal Insertion Policy (BBIP)

• Follower Sets
– Majority of the last-level cache
– Typically follow the default policy
– Can override the default cache policy (via the PC-Predictor Table)

MadCache Design
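
A rough C++ sketch of how the tracker sets' set dueling could pick the default policy; the counter width and method names are assumptions rather than MadCache's exact parameters.

    #include <cstdint>

    // Set dueling: misses in the LRU tracker sets push the counter toward BBIP,
    // and misses in the BBIP tracker sets push it toward LRU. Follower sets adopt
    // whichever policy the counter currently favors as the cache-wide default.
    struct PolicySelector {
        int16_t psel = 0;                      // saturating selection counter (width assumed)
        static constexpr int16_t kMax =  511;
        static constexpr int16_t kMin = -512;

        void missInLruTracker()  { if (psel < kMax) ++psel; }
        void missInBbipTracker() { if (psel > kMin) --psel; }
        bool defaultIsBbip() const { return psel > 0; }
    };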

Page 7: MadCache: A PC-aware Cache Insertion Policy

Tracker and Follower Sets

[Diagram: the last-level cache is divided into BBIP tracker sets, LRU tracker sets, and follower sets; each tracker-set line carries a reuse bit and an index into the PC-Predictor table]

• Tracker Sets overhead
– 1 bit to indicate if the line was accessed again
– 10/11 bits to index the PC-Predictor table (per-line metadata sketched below)

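
A small sketch of the per-line tracker-set metadata listed above; the struct name and the 11-bit index width (the slide allows 10 or 11 bits) are illustrative.

    #include <cstdint>

    // Metadata kept only for lines in the tracker sets: one reuse bit plus an
    // index back to the PC-Predictor entry of the PC that inserted the line.
    struct TrackerLineMeta {
        uint16_t reused       : 1;   // set when the line is accessed again after insertion
        uint16_t pcPredictIdx : 11;  // index into the PC-Predictor table
    };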

Page 8: MadCache: A PC-aware Cache Insertion Policy

• PC-Predictor Table
– Stores PCs that have accessed the Tracker Sets
– Tracks behavior history using a counter (update rule sketched below)
  • Decrement if an address is used many times in the LLC
  • Increment if a line is evicted and was never reused
– Per-PC default policy override
  • LRU (default) plus BBIP override
  • BBIP (default) plus LRU override

MadCache Design
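
The counter update described above might look like the following sketch; the 6-bit width matches the entry format on the next slide, while the initial value and bypass threshold are assumptions.

    #include <algorithm>

    // Per-PC reuse counter: reuse in the LLC decrements it (keep this PC's lines,
    // LRU-style), eviction without reuse increments it (bypass/BBIP for this PC).
    struct PcCounter {
        int value = 32;   // 6-bit saturating counter, assumed to start at the midpoint

        void onReuse()              { value = std::max(value - 1, 0); }
        void onEvictWithoutReuse()  { value = std::min(value + 1, 63); }
        bool predictsBypass() const { return value >= 32; }   // assumed threshold
    };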

Page 9: MadCache: A PC-aware Cache Insertion Policy

PC-Predictor Table

Entry format: Policy + PC (MSB) – 1 + 64 bits; Counter – 6 bits; # Entries – 9-bit index (512 entries)

• In parallel with a cache miss, the PC + current policy index the PC-Predictor (sketched below)
• If it hits in the table, follow the PC's override policy
• If it misses in the table, follow the global default policy

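
A sketch of the miss-time lookup just described, with a hash map standing in for the CAM and illustrative key construction; the names and bit positions are assumptions.

    #include <cstdint>
    #include <unordered_map>

    enum class Policy { LRU, BBIP };
    struct PcOverride { bool useBbip; };   // decoded from the entry's counter MSB

    // In parallel with the LLC miss, the current default policy plus the PC's
    // upper bits form the lookup key. A hit follows that PC's override policy;
    // a miss falls back to the global default chosen by set dueling.
    Policy insertionPolicyOnMiss(uint64_t pc, Policy defaultPolicy,
                                 const std::unordered_map<uint64_t, PcOverride>& table) {
        const uint64_t key = (pc >> 3) | (static_cast<uint64_t>(defaultPolicy) << 63);
        auto it = table.find(key);
        if (it == table.end()) return defaultPolicy;
        return it->second.useBbip ? Policy::BBIP : Policy::LRU;
    }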

Page 10: MadCache: A PC-aware Cache Insertion Policy

• Thread-aware MadCache
– Similar structures to single-threaded MadCache
– Tracks based on the current policy of other threads

• Multithreaded MadCache extensions
– Separate tracker sets for each thread
  • Each thread still tracks LRU and BBIP
– PC-Predictor table
  • Extended number of entries
  • Indexed by thread ID, policy, and PC (key sketched below)

– Set dueling PER THREAD

Multi-Threaded MadCache
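
A sketch of how the extended table could be indexed by thread ID, the current policy of each thread, and the PC, per the entry format on the next slide; the exact bit packing here is an assumption.

    #include <cstdint>

    // Combine the 2-bit thread ID, one current-policy bit per thread (4 threads),
    // and the PC's upper bits into a single key for the shared PC-Predictor table.
    uint64_t mtPredictorKey(uint8_t tid, uint8_t perThreadPolicies, uint64_t pc) {
        uint64_t key = pc >> 6;                                       // PC MSBs (shift amount assumed)
        key ^= static_cast<uint64_t>(tid & 0x3) << 58;                // thread ID
        key ^= static_cast<uint64_t>(perThreadPolicies & 0xF) << 54;  // <P0,P1,P2,P3> policy bits
        return key;
    }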

Page 11: MadCache: A PC-aware Cache Insertion Policy

Multi-threaded MadCache

Entry format: TID + <P0,P1,P2,P3> + PC (MSB) – 2 + 4 + 64 bits; Counter – 6 bits; # Entries – 9-bit index (512 entries)

[Diagram: per-thread default-policy selection (TID-0 through TID-3, 10 bits each) feeds the PC-Predictor lookup, as in the single-threaded design; the last-level cache holds TID-0 BBIP tracker sets, TID-0 LRU tracker sets, the other threads' tracker sets, and follower sets]

Page 12: MadCache: A PC-aware Cache Insertion Policy

• Deep Packet Inspection¹

– Large match tables (1MB+) commonly used for DFA/XFA regular expression matching

– Incoming byte stream from packets causes different table traversals
  • Table exhibits reuse between packets
  • Packets mostly streaming (backtracking implementation dependent)

MadCache – Example Application

¹ Evaluating GPUs for Network Packet Signature Matching – ISPASS '09

Page 13: MadCache: A PC-aware Cache Insertion Policy

MadCache – Example Application

– Packets mostly streaming
– Frequently accessed Match Table contents held in L1/L2

• Less frequently accessed elements in LLC/memory

[Diagram: incoming packets stream to the current processing element, which traverses the Match Table]

Page 14: MadCache: A PC-aware Cache Insertion Policy

MadCache – Example Application

• DIP
– Would favor the BIP policy due to packet data streaming
– LLC becomes a mixture of Match Table and useless packet data

• MadCache
– Would identify PCs associated with the Match Table as useful
– LLC populated almost entirely by the Match Table

[Diagram: the DIP LLC holds a mix of packet data and table data; the MadCache LLC holds almost entirely table data]

Page 15: MadCache: A PC-aware Cache Insertion Policy

15

Experimentation

Processor: 8-stage, 4-wide pipeline
Instruction window size: 128 entries
Branch predictor: perfect
L1 inst. cache: 32KB, 64B line size, 4-way SA, LRU, 1-cycle hit
L1 data cache: 32KB, 64B line size, 8-way SA, LRU, 1-cycle hit
L2 cache: 32KB, 64B line size, 8-way SA, LRU, 10-cycle hit
L3 cache (1 thread): 1MB, 64B line size, 30-cycle hit
L3 cache (4 threads): 4MB, 64B line size, 30-cycle hit
Main memory: 200 cycles

– 15 benchmarks from SPEC CPU2006
– 15 workload mixes for multithreaded experiments
– 200 million cycle simulations

Page 16: MadCache: A PC-aware Cache Insertion Policy

IPC normalized to LRU
– 2.5% improvement across benchmarks tested
– Slight improvement over DIP

Results – Single-threaded

[Chart: IPC normalized to LRU (y-axis 0.88–1.10) for benchmarks including astar, gcc, and hmmer, plus the geomean, comparing RAND, DIP, and MAD]

Page 17: MadCache: A PC-aware Cache Insertion Policy

Results – Multithreaded

[Chart: throughput normalized to LRU (y-axis 0.95–1.20) across workload mixes, comparing DIP and MAD]

Throughput normalized to LRU
– 6% improvement across mixes tested
– DIP performs similarly to LRU

Page 18: MadCache: A PC-aware Cache Insertion Policy

Results

Weighted speedup normalized to LRU
– 4.5% improvement across benchmarks tested
– DIP performs similarly to LRU

[Chart: weighted speedup normalized to LRU (y-axis 0.96–1.14), comparing DIP and MAD]

Page 19: MadCache: A PC-aware Cache Insertion Policy

Future Work

• MadderCache?
– Optimize size of structures
  • PC-Predictor Table size
  • Replace CAM with Hashed PC & Tag
– Detailed analysis of benchmarks with MadCache
– Extend PC Predictions

• Don’t take into account sharers

Page 20: MadCache: A PC-aware Cache Insertion Policy

Conclusions

• Cache behavior still evolving
– Changing cache levels, sharing, workloads

• MadCache insertion policy uses PC information
– PCs exhibit a useful amount of predictable behavior

• MadCache performance
– 2.5% IPC improvement for single-threaded
– 4.5% weighted speedup, 6% throughput improvement for 4 threads
– Sized to the competition bit budget

• Preliminary investigations show little impact from reducing structure sizes

Page 21: MadCache: A PC-aware Cache Insertion Policy

Questions?