Prefetch-Aware Shared-Resource Management for Multi-Core Systems


Page 1: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Prefetch-Aware Shared-Resource Management for Multi-Core Systems

Eiman Ebrahimi*    Chang Joo Lee*+    Onur Mutlu‡    Yale N. Patt*

* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation, Austin

Page 2: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Background and Problem

[System diagram: Cores 0 through N, each with its own prefetcher, share on-chip resources (shared cache, memory controller) and, beyond the chip boundary, off-chip DRAM Banks 0 through K. These are the shared memory resources.]

Page 3: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Background and Problem
- Understand the impact of prefetching on previously proposed shared resource management techniques

Page 4: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Background and Problem
- Understand the impact of prefetching on previously proposed shared resource management techniques
  - Fair cache management techniques
  - Fair memory controllers
  - Fair management of on-chip interconnect
  - Fair management of multiple shared resources

Page 5: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Background and Problem
- Understand the impact of prefetching on previously proposed shared resource management techniques
  - Fair cache management techniques
  - Fair memory controllers
    - Network Fair Queuing (Nesbit et al., MICRO’06)
    - Parallelism-Aware Batch Scheduling (Mutlu et al., ISCA’08)
  - Fair management of on-chip interconnect
  - Fair management of multiple shared resources
    - Fairness via Source Throttling (Ebrahimi et al., ASPLOS’10)

Page 6: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Background and Problem
- Fair memory scheduling technique: Network Fair Queuing (NFQ)
  - Improves fairness and performance with no prefetching
  - Significant degradation of performance and fairness in the presence of prefetching

[Charts: normalized performance and maximum slowdown for FR-FCFS and NFQ, with no prefetching and with aggressive stream prefetching]

Page 7: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Background and Problem
- Understanding the impact of prefetching on previously proposed shared resource management techniques
  - Fair cache management techniques
  - Fair memory controllers
  - Fair management of on-chip interconnect
  - Fair management of multiple shared resources
- Goal: Devise general mechanisms for taking prefetch requests into account in fairness techniques

Page 8: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Background and Problem
- Prior work addresses inter-application interference caused by prefetches
  - Hierarchical Prefetcher Aggressiveness Control (HPAC) (Ebrahimi et al., MICRO’09)
    - Dynamically detects interference caused by prefetches and throttles down overly aggressive prefetchers
- Even with controlled prefetching, fairness techniques should be made prefetch-aware
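To make the throttling idea concrete, here is a rough feedback-loop sketch in Python. It is not HPAC itself: the aggressiveness levels, the 0.40/0.75 thresholds, and the interference signal are illustrative assumptions, and the hierarchical combination of per-core and system-wide feedback used by HPAC is omitted.

# Hypothetical aggressiveness levels: (prefetch distance, prefetch degree)
LEVELS = [(4, 1), (8, 1), (16, 2), (32, 4), (64, 4)]

class PrefetcherThrottle:
    def __init__(self):
        self.level = len(LEVELS) - 1  # start at the most aggressive setting

    def update(self, accuracy, caused_interference):
        # Called once per sampling interval.
        #   accuracy: useful_prefetches / issued_prefetches in the interval
        #   caused_interference: True if this core's prefetches were observed
        #   polluting other cores' cache lines or delaying their DRAM requests
        if caused_interference or accuracy < 0.40:             # assumed threshold
            self.level = max(0, self.level - 1)                # throttle down
        elif accuracy > 0.75:                                  # assumed threshold
            self.level = min(len(LEVELS) - 1, self.level + 1)  # throttle up
        return LEVELS[self.level]  # (distance, degree) for the next interval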

Page 9: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

Page 10: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Parallelism-Aware Batch Scheduling (PAR-BS) [Mutlu & Moscibroda, ISCA’08]
- Principle 1: Parallelism-awareness
  - Schedules requests from each thread to different banks back to back
  - Preserves each thread’s bank parallelism
- Principle 2: Request Batching
  - Marks a fixed number of oldest requests from each thread to form a “batch”
  - Eliminates starvation and provides fairness

[Diagram: request queues for Bank 0 and Bank 1 holding requests from threads T0–T3; the oldest requests of each thread are marked to form the current batch]
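The batching and ranking rules above can be condensed into a small scheduling sketch. This is a simplified Python illustration rather than the hardware mechanism: the MemRequest fields, the marking cap value, and the exact ranking rule (threads with the smallest maximum per-bank marked load ranked first) are assumptions chosen to mirror the two principles.

from collections import defaultdict
from dataclasses import dataclass

MARKING_CAP = 5  # assumed cap on marked requests per thread per bank

@dataclass
class MemRequest:
    thread_id: int
    bank: int
    row: int
    arrival_time: int
    marked: bool = False  # True once the request belongs to the current batch

def form_batch(bank_queues):
    # Principle 2 (request batching): mark the oldest MARKING_CAP requests of
    # each thread in every bank queue; marked requests are serviced before any
    # unmarked ones, which bounds how long a thread can be starved.
    for reqs in bank_queues.values():
        marked_per_thread = defaultdict(int)
        for r in sorted(reqs, key=lambda r: r.arrival_time):
            if marked_per_thread[r.thread_id] < MARKING_CAP:
                r.marked = True
                marked_per_thread[r.thread_id] += 1

def rank_threads(bank_queues):
    # Principle 1 (parallelism-awareness): all banks use the same thread
    # ranking, so one thread's requests tend to be serviced back to back in
    # different banks, preserving its bank-level parallelism. Threads with the
    # smallest maximum per-bank marked load are ranked highest.
    load = defaultdict(lambda: defaultdict(int))
    for bank, reqs in bank_queues.items():
        for r in reqs:
            if r.marked:
                load[r.thread_id][bank] += 1
    return sorted(load, key=lambda t: max(load[t].values()))

def next_request(bank_queue, open_row, thread_rank):
    # Per-bank selection: marked requests first, then row-buffer hits, then
    # requests from higher-ranked threads, then older requests.
    pos = {t: i for i, t in enumerate(thread_rank)}
    return min(bank_queue,
               key=lambda r: (not r.marked,
                              r.row != open_row,
                              pos.get(r.thread_id, len(pos)),
                              r.arrival_time))

A new batch is formed (and the threads re-ranked) once all currently marked requests have been serviced.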

Page 11: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Impact of Prefetching on Parallelism-Aware Batch Scheduling
- Policy (a): Include prefetches and demands alike when generating a batch
- Policy (b): Prefetches are not included alongside demands when generating a batch

Page 12: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Impact of Prefetching on Parallelism-Aware Batch Scheduling

[Service-order example: two cores share DRAM Banks 1 and 2; Core 1 issues an inaccurate prefetch plus demands, Core 2 issues accurate prefetches plus demands. With Policy (a), mark prefetches in PAR-BS, Core 2’s accurate prefetches are serviced early enough that its later P2 accesses hit and cycles are saved, but Core 1’s inaccurate prefetch is serviced inside the batch and extends other requests’ stall time. With Policy (b), don’t mark prefetches in PAR-BS, demands are protected from the inaccurate prefetch, but Core 2’s accurate prefetches are serviced too late, its accesses miss, and it stalls.]

Page 13: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Impact of Prefetching on Parallelism-Aware Batch Scheduling
- Policy (a): Include prefetches and demands alike when generating a batch
  - Pros: Accurate prefetches will be more timely
  - Cons: Inaccurate prefetches from one thread can unfairly delay demands and accurate prefetches of others
- Policy (b): Prefetches are not included alongside demands when generating a batch
  - Pros: Inaccurate prefetches cannot unfairly delay demands of other cores
  - Cons: Accurate prefetches will be less timely
    - Less performance benefit from prefetching

Page 14: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

Page 15: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Prefetch-Aware Shared Resource Management
- Three key ideas:
  - Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
  - Fairness via source-throttling technique: Coordinate core and prefetcher throttling decisions
  - Demand boosting for memory non-intensive applications

Page 16: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Prefetch-Aware Shared Resource Management
- Three key ideas:
  - Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
  - Fairness via source-throttling technique: Coordinate core and prefetcher throttling decisions
  - Demand boosting for memory non-intensive applications

Page 17: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Prefetch-Aware PAR-BS (P-PARBS)

[Recap of Policy (a), mark prefetches in PAR-BS: the batch over Banks 1 and 2 contains both Core 1’s inaccurate prefetch and Core 2’s accurate prefetches; Core 2’s subsequent P2 accesses hit, at the cost of servicing the inaccurate prefetch within the batch.]

Page 18: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Prefetch-Aware PAR-BS (P-PARBS)

[Comparison over Banks 1 and 2 for Cores 1 and 2: with Policy (b), don’t mark prefetches in PAR-BS, Core 2’s accurate prefetches are serviced too late, its P2 accesses miss, and it stalls, while Core 1 saves cycles because its demands are not delayed by the inaccurate prefetch; with our policy, mark accurate prefetches only, Core 2’s accurate prefetches are batched and serviced in time, so its P2 accesses hit, while Core 1’s inaccurate prefetch is left out of the batch.]

Underlying prioritization policies need to distinguish between prefetches based on accuracy.
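In code, the three batching policies differ only in which requests are eligible to be marked when a new batch is formed. A minimal sketch, assuming a per-core, interval-based accuracy estimate and a 90% cutoff (both illustrative, not the paper’s exact parameters):

from dataclasses import dataclass

ACCURACY_THRESHOLD = 0.90  # assumed cutoff for treating a core's prefetches as accurate

@dataclass
class Request:
    thread_id: int
    is_prefetch: bool

def estimate_accuracy(useful_prefetches, issued_prefetches):
    # Per-core accuracy over the last interval: fraction of issued prefetches
    # that were demanded by the core before being evicted.
    return useful_prefetches / issued_prefetches if issued_prefetches else 0.0

def eligible_for_batch(req, core_accuracy, policy):
    # policy "a":              demands and all prefetches are marked
    # policy "b":              demands only
    # policy "prefetch-aware": demands plus prefetches from cores whose current
    #                          prefetch accuracy estimate is high
    if not req.is_prefetch:
        return True
    if policy == "a":
        return True
    if policy == "b":
        return False
    return core_accuracy[req.thread_id] >= ACCURACY_THRESHOLD

Everything else in PAR-BS (thread ranking, per-bank selection) is left unchanged; only the marking step consults the accuracy estimate.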

Page 19: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Prefetch-Aware Shared Resource Management
- Three key ideas:
  - Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
  - Fairness via source-throttling technique: Coordinate core and prefetcher throttling decisions (see the decision sketch below)
  - Demand boosting for memory non-intensive applications
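One way to read the “coordinate core and prefetcher throttling decisions” idea is the decision sketch below: when source throttling identifies a core as interfering with others, check whether the interference comes mostly from its prefetches and whether those prefetches are accurate, and throttle the prefetcher rather than the core when inaccurate prefetches are the culprit. The counters, threshold, and return values are hypothetical, not the paper’s exact mechanism.

def coordinated_throttle_decision(core_id,
                                  interference_by_prefetches,
                                  interference_by_demands,
                                  prefetch_accuracy,
                                  accuracy_threshold=0.60):  # assumed
    # interference_by_prefetches / interference_by_demands: how often this
    # core's prefetches vs. demands delayed other cores' requests in the last
    # interval; prefetch_accuracy: this core's estimated prefetch accuracy.
    prefetch_dominated = interference_by_prefetches > interference_by_demands
    if prefetch_dominated and prefetch_accuracy < accuracy_threshold:
        # Inaccurate prefetches are the main culprit: throttle the prefetcher
        # down and leave the core's demand injection rate alone.
        return {"core": core_id, "prefetcher": "down", "demands": "keep"}
    # Otherwise slow the core's request rate; an accurate prefetcher keeps
    # running so its benefit is not lost.
    return {"core": core_id, "prefetcher": "keep", "demands": "down"}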

Page 20: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

[Example over Banks 1 and 2 comparing the service order without and with demand boosting. Core 1 is memory non-intensive and issues only a few demands; Core 2 is memory intensive and issues many demands and prefetches. Without boosting, Core 1’s demands are serviced last, behind Core 2’s requests; with boosting, they are serviced first.]

Demand boosting eliminates starvation of memory non-intensive applications (see the sketch below).
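A minimal sketch of the boosting rule, assuming a per-core memory-intensity estimate (e.g. last-level-cache misses per kilo-instruction) and a hypothetical intensity cutoff; the unbounded boosting and the base_key fallback are simplifications, not the paper’s exact mechanism.

from dataclasses import dataclass

BOOST_MPKI_THRESHOLD = 1.5  # assumed cutoff: cores below this MPKI are "non-intensive"

@dataclass
class Request:
    thread_id: int
    is_prefetch: bool
    arrival_time: int

def is_boosted(req, core_mpki):
    # Boost only the demand requests of memory non-intensive cores, so a flood
    # of an intensive core's requests and prefetches cannot starve them.
    return (not req.is_prefetch) and core_mpki[req.thread_id] < BOOST_MPKI_THRESHOLD

def service_order(bank_queue, core_mpki, base_key):
    # Boosted demands are serviced first; all other requests keep the order
    # given by the underlying policy (e.g. P-PARBS prioritization).
    return sorted(bank_queue,
                  key=lambda r: (not is_boosted(r, core_mpki), base_key(r)))

For example, service_order(queue, mpki, base_key=lambda r: r.arrival_time) falls back to oldest-first among requests that are not boosted.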

Page 21: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Prefetch-Aware Shared Resource Management
- Three key ideas:
  - Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
  - Fairness via source-throttling technique: Coordinate core and prefetcher throttling decisions
  - Demand boosting for memory non-intensive applications

Page 22: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

Page 23: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Evaluation Methodology
- x86 cycle-accurate simulator
- Baseline processor configuration
  - Per-core
    - 4-wide issue, out-of-order, 256-entry ROB
  - Shared (4-core system)
    - 128 MSHRs
    - 2MB, 16-way L2 cache
  - Main memory
    - DDR3 1333 MHz
    - Latency of 15 ns per command (tRP, tRCD, CL)
    - 8B-wide core-to-memory bus

Page 24: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

System Performance Results

[Charts: normalized system performance for NFQ, PARBS, and FST (core throttling), each evaluated with no prefetching, aggressive prefetching, HPAC, and the prefetch-aware mechanisms. The prefetch-aware versions show gains of 11%, 10.9%, and 11.3% for NFQ, PARBS, and FST, respectively.]

Page 25: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Max Slowdown Results

[Charts: maximum slowdown for NFQ, PARBS, and FST (core throttling), each evaluated with no prefetching, aggressive prefetching, HPAC, and the prefetch-aware mechanisms. The prefetch-aware versions reduce maximum slowdown by 9.9%, 18.4%, and 14.5% for NFQ, PARBS, and FST, respectively.]

Page 26: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Conclusion
- State-of-the-art fair shared-resource management techniques can be harmful in the presence of prefetching
- Their underlying prioritization techniques need to be extended to differentiate prefetches based on accuracy
- Core and prefetcher throttling should be coordinated with source-based resource management techniques
- Demand boosting eliminates starvation of memory non-intensive applications
- Our mechanisms improve both fair memory schedulers and source throttling in both system performance and fairness by more than 10%

Page 27: Prefetch-Aware  Shared-Resource Management for Multi-Core Systems

Prefetch-Aware Shared-Resource Management for Multi-Core Systems

Eiman Ebrahimi*    Chang Joo Lee*+    Onur Mutlu‡    Yale N. Patt*

* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation, Austin