Prefetch-Aware Shared-Resource Management for Multi-Core Systems Eiman Ebrahimi * Chang Joo Lee* + Onur Mutlu Yale N. Patt * * HPS Research Group The University of Texas at Austin ‡ Computer Architecture Laboratory Carnegie Mellon University + Intel Corporation Austin


Page 1

Prefetch-Aware Shared-Resource Management for Multi-Core Systems

Eiman Ebrahimi*, Chang Joo Lee*+, Onur Mutlu‡, Yale N. Patt*

* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation, Austin

Page 2

Background and Problem

[Figure: block diagram of a multi-core system. On chip, Core 0 through Core N, each with its own prefetcher, connect to a shared cache and a memory controller. Off chip, beyond the chip boundary, are DRAM Bank 0 through DRAM Bank K. A "Shared Memory Resources" label marks the components shared among the cores.]

Page 3

Background and Problem

Understand the impact of prefetching on previously proposed shared resource management techniques

Page 4

Background and Problem

Understand the impact of prefetching on previously proposed shared resource management techniques
- Fair cache management techniques
- Fair memory controllers
- Fair management of on-chip interconnect
- Fair management of multiple shared resources

Page 5

Background and Problem

Understand the impact of prefetching on previously proposed shared resource management techniques
- Fair cache management techniques
- Fair memory controllers
  - Network Fair Queuing (Nesbit et al., MICRO’06)
  - Parallelism Aware Batch Scheduling (Mutlu et al., ISCA’08)
- Fair management of on-chip interconnect
- Fair management of multiple shared resources
  - Fairness via Source Throttling (Ebrahimi et al., ASPLOS’10)

Page 6

Background and Problem

Fair memory scheduling technique: Network Fair Queuing (NFQ)
- Improves fairness and performance with no prefetching
- Significant degradation of performance and fairness in the presence of prefetching

[Figure: two bar-chart panels, No Prefetching and Aggressive Stream Prefetching, each plotting performance (Perf.) and max slowdown for FR-FCFS and NFQ.]
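The max slowdown metric plotted in the chart can be computed as in the sketch below. It assumes the commonly used definition, where an application's slowdown is its IPC when running alone divided by its IPC when sharing the system, and max slowdown is the largest such value across applications (lower is fairer). The slides do not spell the definition out, and the function names and IPC numbers are hypothetical.

```python
def slowdowns(ipc_alone, ipc_shared):
    """Per-application slowdown, assuming slowdown = IPC alone / IPC shared."""
    return [alone / shared for alone, shared in zip(ipc_alone, ipc_shared)]


def max_slowdown(ipc_alone, ipc_shared):
    """Largest slowdown across all applications (lower means fairer)."""
    return max(slowdowns(ipc_alone, ipc_shared))


# Hypothetical IPC values for a 4-core workload (not taken from the slides).
ipc_alone = [2.0, 1.0, 0.5, 1.5]
ipc_shared = [1.0, 0.8, 0.4, 1.2]
print(max_slowdown(ipc_alone, ipc_shared))  # prints 2.0
```

In this made-up workload the first application runs at half its stand-alone speed, so it determines the max slowdown even though the other three are slowed far less.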

Page 7

Background and Problem

Understanding the impact of prefetching on previously proposed shared resource management techniques
- Fair cache management techniques
- Fair memory controllers
- Fair management of on-chip interconnect
- Fair management of multiple shared resources

Goal: Devise general mechanisms for taking into account prefetch requests in fairness techniques

Page 8

Background and Problem

Prior work addresses inter-application interference caused by prefetches
- Hierarchical Prefetcher Aggressiveness Control (Ebrahimi et al., MICRO’09)
  - Dynamically detects interference caused by prefetches and throttles down overly aggressive prefetchers

Even with controlled prefetching, fairness techniques should be made prefetch-aware

Page 9

Outline

- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

Page 10

Parallelism-Aware Batch Scheduling (PAR-BS) [Mutlu & Moscibroda ISCA’08]

Principle 1: Parallelism-awareness
- Schedules requests from each thread to different banks back to back
- Preserves each thread’s bank parallelism

Principle 2: Request Batching
- Marks a fixed number of oldest requests from each thread to form a “batch”
- Eliminates starvation & provides fairness

[Figure: request queues for Bank 0 and Bank 1 holding requests from threads T0 through T3, with the marked requests grouped into a batch.]
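The two principles can be sketched in a few lines of Python. This is a simplified illustration rather than the scheduler from the cited paper: the `Request` tuple, the `marking_cap` parameter, and the rule used to rank threads inside a batch (fewer marked requests first, ties broken by thread id) are assumptions made for the example. Batching marks the oldest requests of each thread, and parallelism-awareness is approximated by applying one thread ranking at every bank so that a thread's requests to different banks are serviced together.

```python
from collections import defaultdict, namedtuple

# A memory request: issuing thread, target bank, arrival time (unique here).
Request = namedtuple("Request", "thread bank arrival")


def form_batch(queue, marking_cap):
    """Mark up to marking_cap of the oldest requests from each thread."""
    marked, per_thread = set(), defaultdict(int)
    for req in sorted(queue, key=lambda r: r.arrival):
        if per_thread[req.thread] < marking_cap:
            marked.add(req)
            per_thread[req.thread] += 1
    return marked


def per_bank_order(queue, marked):
    """Service order per bank: batch first, then one thread ranking, then age."""
    load = defaultdict(int)
    for req in marked:
        load[req.thread] += 1

    def priority(req):
        if req in marked:
            # Assumed ranking: fewer marked requests first, ties by thread id.
            return (0, load[req.thread], req.thread, req.arrival)
        return (1, 0, 0, req.arrival)

    banks = defaultdict(list)
    for req in sorted(queue, key=priority):
        banks[req.bank].append(req)
    return dict(banks)


# Hypothetical queue contents: (thread, bank, arrival).
queue = [Request(0, 0, 0), Request(1, 0, 1), Request(1, 1, 2),
         Request(1, 0, 3), Request(0, 1, 4)]
marked = form_batch(queue, marking_cap=2)
order = per_bank_order(queue, marked)
print([r.arrival for r in order[0]])  # prints [0, 1, 3]
print([r.arrival for r in order[1]])  # prints [4, 2]
```

In Bank 1 the request from thread 0 is serviced before an older request from thread 1, so thread 0 is served first in both banks at once. The unmarked request (arrival 3) waits until the batch is done, which bounds how long any thread can be delayed.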

Page 11

Impact of Prefetching on Parallelism-Aware Batch Scheduling

Policy (a): Include prefetches and demands alike when generating a batch

Policy (b): Prefetches are not included alongside demands when generating a batch

Page 12

Impact of Prefetching on Parallelism-Aware Batch Scheduling

[Figure: two panels, policy (a) "Mark Prefetches in PAR-BS" and policy (b) "Don’t Mark Prefetches in PAR-BS". Each panel shows the batch in the Bank 1 and Bank 2 request queues, holding demands (D1, D2) and prefetches (P1, P2) from Core 1 and Core 2, the resulting DRAM service order, and each core's compute and stall periods. The policy (a) timeline shows hits on P2; the policy (b) timeline shows misses. Annotations mark saved cycles and accurate prefetches that arrive too late. Legend: accurate prefetch, inaccurate prefetch.]

Page 13

Impact of Prefetching on Parallelism-Aware Batch Scheduling

Policy (a): Include prefetches and demands alike when generating a batch
- Pros: Accurate prefetches will be more timely
- Cons: Inaccurate prefetches from one thread can unfairly delay demands and accurate prefetches of others

Policy (b): Prefetches are not included alongside demands when generating a batch
- Pros: Inaccurate prefetches cannot unfairly delay demands of other cores
- Cons: Accurate prefetches will be less timely
  - Less performance benefit from prefetching


Page 15

Prefetch-Aware Shared Resource Management

Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source-throttling technique: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications


Page 17

Prefetch-aware PARBS (P-PARBS)

[Figure: policy (a), "Mark Prefetches in PAR-BS". The batch in the Bank 1 and Bank 2 request queues contains demands (D1, D2) and prefetches (P1, P2) from Core 1 and Core 2. The timeline shows the DRAM service order, each core's compute and stall periods, and hits on P2. Legend: accurate prefetch, inaccurate prefetch.]

Page 18

Prefetch-aware PARBS (P-PARBS)

[Figure: two panels, each showing the batch in the Bank 1 and Bank 2 request queues, the DRAM service order, and the compute and stall periods of Core 1 and Core 2. Policy (b), "Don’t Mark Prefetches in PAR-BS": the timeline shows misses, with annotations for saved cycles and accurate prefetches that arrive too late. "Our Policy: Mark Accurate Prefetches": the timeline shows hits on P2. Legend: accurate prefetch, inaccurate prefetch.]

Underlying prioritization policies need to distinguish between prefetches based on accuracy
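The three marking policies compared on these slides can be contrasted with a short sketch. It is a minimal illustration under stated assumptions, not the P-PARBS hardware mechanism: the `Request` class, the per-core accuracy estimate (useful prefetches divided by issued prefetches), the 0.5 threshold, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    core: int
    is_prefetch: bool
    arrival: int


def prefetch_accuracy(useful, issued):
    """Assumed estimate: fraction of issued prefetches later used by a demand."""
    return useful / issued if issued else 0.0


def mark_batch(queue, accuracy, policy, threshold=0.5):
    """Return the requests that would be marked into the batch.

    policy "a":        mark demands and all prefetches
    policy "b":        mark demands only
    policy "accurate": mark demands and prefetches from accurate cores
    """
    if policy not in ("a", "b", "accurate"):
        raise ValueError(f"unknown policy: {policy}")

    def eligible(req):
        if not req.is_prefetch or policy == "a":
            return True
        if policy == "b":
            return False
        # Hypothetical threshold separating accurate from inaccurate cores.
        return accuracy[req.core] >= threshold

    return [req for req in queue if eligible(req)]


# Hypothetical state: core 1 prefetches inaccurately, core 2 accurately.
accuracy = {1: prefetch_accuracy(1, 10), 2: prefetch_accuracy(9, 10)}
queue = [Request(1, True, 0), Request(1, False, 1),
         Request(2, True, 2), Request(2, False, 3)]
for policy in ("a", "b", "accurate"):
    print(policy, len(mark_batch(queue, accuracy, policy)))
```

With these numbers, policy (a) marks all four requests, policy (b) marks only the two demands, and the accuracy-based policy marks the two demands plus core 2's prefetch, leaving core 1's inaccurate prefetch outside the batch.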


Page 20

[Figure: service order at Bank 1 and Bank 2, from serviced first to serviced last, shown without demand boosting and with demand boosting. Legend: Core 1 demands, Core 2 demands, Core 2 prefetches. Core 1 is memory non-intensive; Core 2 is memory intensive.]

Demand boosting eliminates starvation of memory non-intensive applications
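A rough sketch of the idea follows. The slide does not give the exact rule, so everything concrete here is an assumption made for illustration: a core counts as memory non-intensive when it has at most `max_outstanding` demands queued, boosted demands go to the front, and all other requests are served oldest first.

```python
from collections import Counter, namedtuple

Request = namedtuple("Request", "core is_prefetch arrival")


def service_order(queue, boost, max_outstanding=2):
    """Order requests for service, optionally boosting non-intensive demands."""
    # Outstanding demand count per core (assumed measure of memory intensity).
    demands = Counter(r.core for r in queue if not r.is_prefetch)

    def boosted(req):
        return (boost and not req.is_prefetch
                and demands[req.core] <= max_outstanding)

    # Boosted demands first, then everything else by age.
    return sorted(queue, key=lambda r: (not boosted(r), r.arrival))


# Hypothetical queue: core 2 floods the queue, core 1 issues a single demand.
queue = [Request(2, False, 0), Request(2, True, 1), Request(2, False, 2),
         Request(2, True, 3), Request(2, False, 4), Request(2, True, 5),
         Request(1, False, 6)]
print(service_order(queue, boost=False)[-1].core)  # prints 1
print(service_order(queue, boost=True)[0].core)    # prints 1
```

Without boosting, core 1's lone demand waits behind every demand and prefetch from core 2. With boosting it is serviced first, at the cost of one extra request ahead of core 2's traffic.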



Page 23

Evaluation Methodology

x86 cycle-accurate simulator

Baseline processor configuration
- Per-core
  - 4-wide issue, out-of-order, 256-entry ROB
- Shared (4-core system)
  - 128 MSHRs
  - 2MB, 16-way L2 cache
- Main Memory
  - DDR3 1333 MHz
  - Latency of 15ns per command (tRP, tRCD, CL)
  - 8B wide core to memory bus
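The memory parameters above imply some simple back-of-the-envelope numbers, worked out in the sketch below. Two assumptions go beyond the slide: the row-buffer cases follow the usual DRAM command sequences (column access only for a row hit, activate plus column access for a closed row, precharge plus activate plus column access for a row conflict), and "1333 MHz" is read as 1333 million transfers per second on the 8B bus. The names are illustrative.

```python
# Timing parameters from the slide: 15 ns for each of tRP, tRCD, and CL.
T_RP = T_RCD = CL = 15.0


def access_latency_ns(row_state):
    """DRAM command latency for a given row-buffer state (assumed sequences)."""
    commands = {
        "hit": [CL],                    # row already open: column access only
        "closed": [T_RCD, CL],          # activate, then column access
        "conflict": [T_RP, T_RCD, CL],  # precharge, activate, column access
    }
    return sum(commands[row_state])


# Assumption: "1333 MHz" means 1333 million transfers per second.
TRANSFERS_PER_SEC = 1333e6
BUS_BYTES = 8  # 8B wide bus, from the slide
peak_gb_per_s = TRANSFERS_PER_SEC * BUS_BYTES / 1e9

for state in ("hit", "closed", "conflict"):
    print(state, access_latency_ns(state), "ns")
print("peak bandwidth (GB/s):", round(peak_gb_per_s, 3))
```

Under these assumptions a row conflict costs 45 ns of command latency against 15 ns for a row hit, and the bus peaks at roughly 10.7 GB/s, which is the bandwidth the four cores and their prefetchers compete for.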

Page 24

System Performance Results

[Figure: three bar-chart panels, NFQ, PARBS, and FST (Core Throttling), each with a vertical axis from 0 to 1.2, comparing four configurations: No Prefetching, Aggressive Prefetching, HPAC, and Prefetch-Aware. Percentage labels on the chart, in order of appearance: 11%, 10.9%, 11.3%.]

Page 25

Max Slowdown Results

[Figure: three bar-chart panels, NFQ, PARBS, and FST (Core Throttling), each with a vertical axis from 0 to 1.2, comparing four configurations: No Prefetching, Aggressive Prefetching, HPAC, and Prefetch-Aware. Percentage labels on the chart, in order of appearance: 9.9%, 18.4%, 14.5%.]

Page 26

Conclusion

- State-of-the-art fair shared resource management techniques can be harmful in the presence of prefetching
  - Their underlying prioritization techniques need to be extended to differentiate prefetches based on accuracy
  - Core and prefetcher throttling should be coordinated with source-based resource management techniques
- Demand boosting eliminates starvation of memory non-intensive applications
- Our mechanisms improve both fair memory schedulers and source throttling in both system performance and fairness by >10%
