rfvp: rollback-free value prediction with safe to ... · virtual reality data analytics robotics...

33
RFVP: Rollback-Free Value Prediction with Safe to Approximate Loads Amir Yazdanbakhsh, Bradley Thwaites, Hadi Esmaeilzadeh Gennady Pekhimenko, Onur Mutlu, Todd C. Mowry Georgia Institute of Technology Carnegie Mellon University

Upload: others

Post on 24-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

RFVP: Rollback-Free Value Prediction with Safe to

Approximate LoadsAmir Yazdanbakhsh,

Bradley Thwaites,

Hadi Esmaeilzadeh

Gennady Pekhimenko, Onur Mutlu,

Todd C. Mowry

Georgia Institute of TechnologyCarnegie Mellon University

Page 2: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Executive Summary

2

• Problem: Performance of modern GPUs significantly limited by the available off-chip bandwidth

• Observations: – Many GPU applications are amenable to approximation

– Data value similarity allows to efficiently predict values of cache misses

• Key Idea: Use simple value prediction mechanisms to avoid accesses to main memory when it is safe

• Results:– Higher speedup (36% on average) with less than 10%

quality loss

– Lower energy consumption (27% on average)

Page 3: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

3

VirtualReality

DataAnalytics

Robotics Multimedia

GPU

Page 4: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

4

VirtualReality

DataAnalytics

Robotics Multimedia

Many GPU applications arelimited by the off-chip

bandwidth

Page 5: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Motivation: Bandwidth Bottleneck

5

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2

Sp

eed

up

0.5x 1x 2x 4x 8x Perfect Memory13.7 2.5 13.5 2.6 4.0 2.6

Off-chip bandwidth is a major performance bottleneck

Page 6: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Only Few Loads Matters

6

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

1 2 3 4 5 6 7 8 9 10

backprop fastwalsh gaussian heartwall

matrixmul particlefilter reduce similarityscore

srad2 stringmatch

Pe

rce

nta

ge o

f Lo

ad M

isse

s

Number of LoadsFew GPU instructions generate most of the cache misses

Page 7: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

7

VirtualReality

DataAnalytics

Robotics Multimedia

Many GPU applications are alsoamenable to approximation

Page 8: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Rollback-Free Value Prediction

Key idea:

Predict values for safe-to-approximate loads when they miss in the cache

Design principles:

1. No rollback/recovery, only value prediction

2. Drop rate is a tuning knob

3. Other requests are serviced normally

4. Providing safety guarantees

8

Page 9: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

GPURFVP: Diagram

9

MemoryHierarchy

Cores

L1Data Cache

RFVPValue

Predictor

Page 10: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Code Example to Support Intuition

10

float newVal = 0; for (int i=0; i<N; i++) {

float4 v1 = matrix1[i];float4 v2 = matrix2[i];newVal += v1.x * v2.x;newVal += v1.y * v2.y;newVal += v1.z * v2.z;newVal += v1.w * v2.w;

}

Matrixmul:

Page 11: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Code Example to Support Intuition (2)

int d_cN, d_cS, d_cW, d_cE;

d_cN = d_c[ei];

d_cS = d_c[d_iS[row] + d_Nr * col];

d_cW = d_c[ei];

d_cE = d_c[row + d_Nr * d_jE[col]];

11

s.srad2:

Page 12: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Outline

12

• Motivation

• Key Idea

• RFVP Design and Operation

• Evaluation

• Conclusion

Page 13: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

RFVP Architecture Design

• Instruction annotations by the programmer

• ISA changes

– Approximate load instruction

– Instruction for setting the drop rate

• Defining approximate load semantics

• Microarchitecture Integration

13

Page 14: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Programmer Annotations

• Safety is a semantic property of the program

• We rely on the programmer to annotate the code

14

Page 15: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

ISA Support

• Approximate Loads

load.approx Reg<id>, MEMORY<address>

is a probabilistic load - can assign precise or imprecise value to Reg<id>

• Drop rate

set.rate DropRateReg

sets the fraction (e.g., 50%) of the approximate cache misses that do not initiate memory requests

15

Page 16: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Microarchitecture Integration

16

MemoryHierarchy

Cores

RFVPValue

Predictor

L1Data Cache

Cores

RFVPValue

Predictor

L1Data Cache

Cores

RFVPValue

Predictor

L1Data Cache

Cores

RFVPValue

Predictor

L1Data Cache

SM

Page 17: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Microarchitecture Integration

17

MemoryHierarchy

Cores

RFVPValue

Predictor

L1Data Cache

Cores

RFVPValue

Predictor

L1Data Cache

Cores

RFVPValue

Predictor

L1Data Cache

Cores

RFVPValue

Predictor

L1Data Cache

Memory Request

All the L1 misses are sent to the memory subsystem

DATA

SM

Page 18: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Microarchitecture Integration

18

MemoryHierarchy

Cores

RFVPValue

Predictor

L1Data Cache

Cores

RFVPValue

Predictor

L1Data Cache

Cores

RFVPValue

Predictor

L1Data Cache

Cores

RFVPValue

Predictor

L1Data Cache

Memory Request

A fraction of the requests will be handled by RFVP

DATA

SM

Page 19: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Language and Software Support

• Targeting performance critical loads

– Only a few critical instructions matter for value prediction

• Providing safety guarantees

– Programmer annotations and compiler passes

• Drop-rate selection

– A new knob that allows to control quality vs. performance tradeoffs

19

Page 20: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Base Value Predictor: Two-Delta Stride

20

+

Last Value Stride1 Stride2

Hash (PC)

Predicted Value

Page 21: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Designing RFVP predictor for GPUs

+

Last Value Stride1 Stride2

Hash (PC)

Predicted Value

How to design a predictor for GPUs with, for example, 32 threads per warp?

21

Page 22: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

GPU Predictor Design and Operation

22

+

Last Value Stride1 Stride2

Hash (PC)

++ +

1 2 … 16 17 18 … 32

Last Value Stride1 Stride2

PredictedValue

PredictedValue

warp(32 threads)

Page 23: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Outline

23

• Motivation

• Key Idea

• RFVP Design and Operation

• Evaluation

• Conclusion

Page 24: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Methodology

24

• Simulator

GPGPU-Sim simulator (cycle-accurate) ver. 3.1

• Workloads

GPU benchmarks from Rodinia, Nvidia SDK, and Marsbenchmark suites

• System Parameters

GPU with 15 SMs, 32 threads/warp, 6 memory channels,

48 warps/SM, 32KB shared memory, 768KB LLC, GDDR5

177.4 GB/sec off-chip bandwidth

Page 25: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

RFVP Performance

25

Sp

ee

du

p

1.0

1.1

1.2

1.3

1.4

1.5

1.6

Error < 1% Error < 3% Error < 5% Error < 10%2.2 2.4

Significant speedup for various acceptable quality rates

Page 26: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

1.01.11.21.31.41.51.61.71.8

Error < 1% Error < 3% Error < 5% Error < 10%

RFVP Bandwidth Consumption

26

BW

Co

ns

um

pti

on

Red

uc

tio

n

1.9 2.0

Reduction in consumed bandwidth (up to 1.5X average)

2.3

Page 27: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

1.0

1.1

1.2

1.3

1.4

Error < 1% Error < 3% Error < 5% Error < 10%

RFVP Energy Reduction

27

En

erg

y R

ed

ucti

on

1.9 2.0

Reduction in consumed energy (27% on average)

1.6

Page 28: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Sensitivity to the Value Prediction

28Two-Delta predictor was the best option

1

1.1

1.2

1.3

1.4

1.5

1.6

Null Predictor Last-Value Predictor Two-Delta Predictor

Sp

eed

up

2.2 2.4

Page 29: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Other Results and Analyses in the Paper

• Sensitivity to the drop rate (energy and quality)

• Precise vs. imprecise value distributions

• RFVP for memory latency wall

– CPU performance

– CPU energy reduction

– CPU quality vs. performance tradeoff

29

Page 30: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Conclusion

30

• Problem: Performance of modern GPUs significantly limited by the available off-chip bandwidth

• Observations: – Many GPU applications are amenable to approximation

– Data value similarity allows to efficiently predict values of cache misses

• Key Idea: Use simple rollback-free value prediction mechanism to avoid accesses to main memory

• Results:– Higher speedup (36% on average) with less than 10%

quality loss

– Lower energy consumption (27% on average)

Page 31: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

RFVP: Rollback-Free Value Prediction with Safe to

Approximate LoadsAmir Yazdanbakhsh,

Bradley Thwaites,

Hadi Esmaeilzadeh

Gennady Pekhimenko, Onur Mutlu,

Todd C. Mowry

Georgia Institute of TechnologyCarnegie Mellon University

Page 32: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

1.0

1.2

1.4

1.6

1.8

2.0

2.2

Drop Rate = 12.5% Drop Rate = 25% Drop Rate = 50%

Drop Rate = 75% Drop Rate = 80% Drop Rate = 90%

Sensitivity to the Drop Rate

32

Sp

ee

du

p

Speedup varies significantly with different drop rates

Page 33: RFVP: Rollback-Free Value Prediction with Safe to ... · Virtual Reality Data Analytics Robotics Multimedia Many GPU applications are limited by the off-chip bandwidth. Motivation:

Pareto Analysis

33

Pareto-optimal is the configuration with 192 entries and 2 independent predictors for 32 threads