TRANSCRIPT
Prefetch-Aware Shared-Resource Management
for Multi-Core Systems
Eiman Ebrahimi*
Chang Joo Lee*+
Onur Mutlu‡
Yale N. Patt*
* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation, Austin
Background and Problem
[Diagram: Cores 0 through N, each with its own prefetcher, sit on one side of the chip boundary and share an on-chip cache and memory controller; off-chip DRAM Banks 0 through K complete the shared memory resources.]
Background and Problem
Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
  - Network Fair Queuing (Nesbit et al., MICRO'06)
  - Parallelism-Aware Batch Scheduling (Mutlu et al., ISCA'08)
- Fair management of the on-chip interconnect
- Fair management of multiple shared resources
  - Fairness via Source Throttling (Ebrahimi et al., ASPLOS'10)
Background and Problem
Fair memory scheduling technique: Network Fair Queuing (NFQ)
- Improves fairness and performance with no prefetching
- Significant degradation of performance and fairness in the presence of prefetching
[Chart: performance and max slowdown of FR-FCFS vs. NFQ, with no prefetching and with aggressive stream prefetching.]
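The "fair queueing" idea behind NFQ can be sketched in a few lines. This is a simplified, illustrative model of virtual-finish-time scheduling, not the actual hardware scheduler; the unit service cost and the share table are assumptions:

```python
# Illustrative fair-queueing sketch in the spirit of NFQ (not the real
# hardware scheduler): each thread accrues virtual time in proportion to
# the service it receives divided by its bandwidth share; the scheduler
# services the request whose thread has the earliest virtual finish time.

SERVICE_TIME = 1.0  # assume unit service cost per request (simplification)

class FairQueueScheduler:
    def __init__(self, shares):
        self.shares = shares                     # thread_id -> bandwidth share
        self.vfinish = {t: 0.0 for t in shares}  # per-thread virtual finish time

    def pick(self, queue):
        """queue: list of (thread_id, request) tuples. Returns the chosen tuple."""
        if not queue:
            return None
        choice = min(queue, key=lambda tr: self.vfinish[tr[0]])
        tid = choice[0]
        self.vfinish[tid] += SERVICE_TIME / self.shares[tid]
        return choice
```

With equal shares, a thread that has already been serviced falls behind in virtual time, so a backlogged thread cannot monopolize the banks.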
Background and Problem
Understanding the impact of prefetching on previously proposed shared-resource management techniques: fair cache management techniques, fair memory controllers, fair management of the on-chip interconnect, and fair management of multiple shared resources
Goal: Devise general mechanisms for taking prefetch requests into account in fairness techniques
Background and Problem
Prior work addresses inter-application interference caused by prefetches:
- Hierarchical Prefetcher Aggressiveness Control (Ebrahimi et al., MICRO'09) dynamically detects interference caused by prefetches and throttles down overly aggressive prefetchers
Even with controlled prefetching, fairness techniques should be made prefetch-aware
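As a rough illustration of accuracy-driven prefetcher throttling, here is a simplified sketch in the spirit of such schemes, not HPAC's actual mechanism; the thresholds and the degree ladder are invented values:

```python
# Illustrative sketch (not HPAC's exact policy): a feedback loop that
# lowers a prefetcher's aggressiveness ("degree") when its measured
# accuracy drops, and raises it again when accuracy is high.

DEGREES = [0, 1, 2, 4, 8, 16]      # prefetch degrees, least to most aggressive
HIGH_ACC, LOW_ACC = 0.75, 0.40     # hypothetical accuracy thresholds

class PrefetcherThrottle:
    def __init__(self):
        self.level = len(DEGREES) - 1  # start fully aggressive
        self.useful = 0                # prefetches later hit by a demand
        self.issued = 0

    def record(self, was_useful):
        self.issued += 1
        self.useful += was_useful

    def interval_end(self):
        """Called at the end of each sampling interval; returns new degree."""
        acc = self.useful / self.issued if self.issued else 1.0
        if acc < LOW_ACC and self.level > 0:
            self.level -= 1            # throttle down: inaccurate prefetcher
        elif acc > HIGH_ACC and self.level < len(DEGREES) - 1:
            self.level += 1            # throttle up: prefetches are useful
        self.useful = self.issued = 0
        return DEGREES[self.level]
```

A real scheme would also weigh bandwidth and cache-pollution interference with other cores, not accuracy alone.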
Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion
Parallelism-Aware Batch Scheduling (PAR-BS) [Mutlu & Moscibroda ISCA’08]
Principle 1: Parallelism-awareness
- Schedules requests from each thread to different banks back to back
- Preserves each thread's bank parallelism
Principle 2: Request Batching
- Marks a fixed number of oldest requests from each thread to form a "batch"
- Eliminates starvation and provides fairness
[Diagram: requests from threads T0 through T3 queued at Bank 0 and Bank 1; the oldest requests from each thread are marked to form a batch.]
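The batching principle can be sketched in software as follows. This is my own simplification, not the PAR-BS hardware; the `MARKING_CAP` value and the request representation are assumptions:

```python
# Simplified software sketch of PAR-BS request batching (not the actual
# hardware): mark up to MARKING_CAP oldest outstanding requests from each
# thread; the scheduler then always prefers marked (batched) requests,
# which bounds how long any thread's requests can be deferred.

MARKING_CAP = 5  # max marked requests per thread per batch (assumed value)

def form_batch(queue):
    """queue: list of dicts with 'thread', 'arrival', 'marked' keys."""
    per_thread = {}
    for req in sorted(queue, key=lambda r: r["arrival"]):  # oldest first
        n = per_thread.get(req["thread"], 0)
        if n < MARKING_CAP:
            req["marked"] = True
            per_thread[req["thread"]] = n + 1

def pick_next(queue):
    """Prefer marked requests; break ties by age (oldest first)."""
    if not queue:
        return None
    return min(queue, key=lambda r: (not r["marked"], r["arrival"]))
```

Within a batch, the real scheduler additionally ranks threads to preserve each thread's bank-level parallelism; that ranking step is omitted here.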
Impact of Prefetching on Parallelism-Aware Batch Scheduling
Policy (a): Include prefetches and demands alike when generating a batch
Policy (b): Prefetches are not included alongside demands when generating a batch
Impact of Prefetching on Parallelism-Aware Batch Scheduling
[Diagram: DRAM service order at Banks 1 and 2 for Cores 1 and 2. Under Policy (a), "mark prefetches in PAR-BS," accurate prefetches are serviced in time, so the core hits on P2 and saves cycles, but an inaccurate prefetch (P1) in the batch delays the other core's demands. Under Policy (b), "don't mark prefetches in PAR-BS," inaccurate prefetches cannot delay demands, but accurate prefetches arrive too late: the would-be hits on P2 become misses and both cores stall longer.]
Impact of Prefetching on Parallelism-Aware Batch Scheduling
Policy (a): Include prefetches and demands alike when generating a batch
- Pros: Accurate prefetches will be more timely
- Cons: Inaccurate prefetches from one thread can unfairly delay demands and accurate prefetches of others
Policy (b): Prefetches are not included alongside demands when generating a batch
- Pros: Inaccurate prefetches cannot unfairly delay demands of other cores
- Cons: Accurate prefetches will be less timely, so there is less performance benefit from prefetching
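The two policies differ only in which requests are eligible for marking. A minimal sketch, in my own formulation of the slide's options (the request representation is assumed):

```python
# Minimal sketch of the two baseline batch-marking policies. A request
# is a dict with an 'is_prefetch' flag; marking decides what enters the
# batch and thus what can be prioritized over other cores' requests.

def mark_policy_a(req):
    """Policy (a): include prefetches and demands alike in the batch."""
    return True

def mark_policy_b(req):
    """Policy (b): demands only; prefetches are never batched."""
    return not req["is_prefetch"]

def form_batch(queue, mark_fn):
    """Return the subset of queued requests that joins the batch."""
    return [req for req in queue if mark_fn(req)]
```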
Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion
Prefetch-Aware Shared Resource Management
Three key ideas:
- Fair memory controllers: extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications
Prefetch-Aware PAR-BS (P-PARBS)
[Diagram recap of Policy (a), "mark prefetches in PAR-BS": prefetches enter the batch, so accurate prefetches hit (Hit P2), but an inaccurate prefetch (P1) is serviced within the batch ahead of the other core's demands.]
Prefetch-Aware PAR-BS (P-PARBS)
[Diagram: under Policy (b), "don't mark prefetches in PAR-BS," accurate prefetches are serviced too late and the cores miss. Under our policy, "mark accurate prefetches," only prefetches estimated to be accurate join the batch: accurate prefetches stay timely (Hit P2) while inaccurate prefetches cannot delay other cores' demands, saving cycles on both cores.]
Underlying prioritization policies need to distinguish between prefetches based on accuracy
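The proposed policy, marking only accurate prefetches, can be sketched as a marking predicate driven by a per-core accuracy estimate. The threshold and the bookkeeping below are illustrative assumptions, not the paper's exact parameters:

```python
# Illustrative sketch of accuracy-based batch marking (parameters are
# assumptions): a prefetch joins the batch only if its core's prefetcher
# has recently been accurate enough; demands always join.

ACC_THRESHOLD = 0.6  # hypothetical cutoff for "accurate" prefetching

class AccuracyTracker:
    """Tracks, per core, the fraction of prefetches later used by a demand."""
    def __init__(self):
        self.useful = {}
        self.issued = {}

    def record(self, core, was_useful):
        self.issued[core] = self.issued.get(core, 0) + 1
        self.useful[core] = self.useful.get(core, 0) + was_useful

    def accuracy(self, core):
        n = self.issued.get(core, 0)
        return self.useful.get(core, 0) / n if n else 0.0

def mark_accurate_prefetches(req, tracker):
    """Demands always batch; prefetches batch only if the issuing core's
    measured prefetch accuracy is above the threshold."""
    if not req["is_prefetch"]:
        return True
    return tracker.accuracy(req["core"]) >= ACC_THRESHOLD
```

This keeps Policy (a)'s timeliness for cores whose prefetchers are accurate while retaining Policy (b)'s protection against inaccurate prefetchers.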
Prefetch-Aware Shared Resource Management
Three key ideas:
- Fair memory controllers: extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications
[Diagram: service order at Banks 1 and 2 without and with demand boosting. Legend: Core 1 demands, Core 2 demands, Core 2 prefetches; Core 1 is memory non-intensive, Core 2 is memory-intensive. Without boosting, Core 1's few demands are serviced last, behind Core 2's demands and prefetches; with boosting, Core 1's demands are serviced first.]
Demand boosting eliminates starvation of memory non-intensive applications
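Demand boosting can be sketched as an extra top-priority rule in the scheduler's comparator. The MPKI-based intensity classification and its threshold below are assumptions for illustration, not the paper's exact mechanism:

```python
# Illustrative sketch of demand boosting (parameters are assumptions):
# demands from cores classified as memory non-intensive are raised above
# everything else; otherwise marked/batched requests win, then older ones.

MPKI_THRESHOLD = 1.0  # hypothetical misses-per-kilo-instruction cutoff

def is_non_intensive(core_mpki):
    return core_mpki < MPKI_THRESHOLD

def priority_key(req, mpki):
    """Lower tuple = serviced earlier. req has 'core', 'is_prefetch',
    'marked', 'arrival'; mpki maps core -> measured MPKI."""
    boosted = (not req["is_prefetch"]) and is_non_intensive(mpki[req["core"]])
    return (not boosted, not req["marked"], req["arrival"])

def pick_next(queue, mpki):
    return min(queue, key=lambda r: priority_key(r, mpki)) if queue else None
```

Because a non-intensive core issues few requests, boosting them costs the intensive core little bandwidth while preventing its starvation.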
Prefetch-Aware Shared Resource Management
Three key ideas:
- Fair memory controllers: extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications
Outline
- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion
Evaluation Methodology
- x86 cycle-accurate simulator
- Baseline processor configuration:
  - Per-core: 4-wide issue, out-of-order, 256-entry ROB
  - Shared (4-core system): 128 MSHRs; 2MB, 16-way L2 cache
  - Main memory: DDR3, 1333 MHz; latency of 15 ns per command (tRP, tRCD, CL); 8B-wide core-to-memory bus
System Performance Results
[Chart: normalized system performance for NFQ, PARBS, and FST (core throttling), each evaluated with No Prefetching, Aggressive Prefetching, HPAC, and the proposed Prefetch-Aware mechanisms. Annotated gains: 11% (NFQ), 10.9% (PARBS), 11.3% (FST).]
Max Slowdown Results
[Chart: maximum slowdown for NFQ, PARBS, and FST (core throttling), each evaluated with No Prefetching, Aggressive Prefetching, HPAC, and the proposed Prefetch-Aware mechanisms. Annotated reductions: 9.9% (NFQ), 18.4% (PARBS), 14.5% (FST).]
Conclusion
- State-of-the-art fair shared-resource management techniques can be harmful in the presence of prefetching
- Their underlying prioritization techniques need to be extended to differentiate prefetches based on accuracy
- Core and prefetcher throttling should be coordinated with source-based resource management techniques
- Demand boosting eliminates starvation of memory non-intensive applications
- Our mechanisms improve both fair memory schedulers and source throttling in both system performance and fairness by more than 10%