

SAGE: Self-Tuning Approximation for Graphics Engines

Approximation is Acceptable in Many Domains

Approximation
– Machine Learning
– Image Processing
– Video Processing
– …

Less work
– Higher performance
– Lower power consumption

Quality: 100%, 96%, 90%

Improving Performance While Quality is Acceptable

[Figure: SAGE workflow. Labels: Evaluation Metric, Target Output Quality (TOQ), CUDA Code; Atomic Operation, Data Packing, Thread Fusion; Approximate Kernels, Tuning Parameters.]
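The evaluation metric and the TOQ named above can be illustrated with a small sketch. The poster does not say which metric is used, so the mean-relative-error formula and the names `output_quality` and `meets_toq` below are assumptions chosen for illustration only.

```python
def output_quality(exact, approx):
    """Return a quality percentage: 100 minus the mean relative error.

    Hypothetical metric, not taken from the poster. A zero reference
    value counts as a full error unless the approximation is also zero.
    """
    errors = []
    for e, a in zip(exact, approx):
        if e != 0:
            errors.append(abs(e - a) / abs(e))
        else:
            errors.append(0.0 if a == 0 else 1.0)
    mean_error = sum(errors) / len(errors)
    # Clamp at 0 so very bad approximations do not go negative.
    return max(0.0, 100.0 * (1.0 - mean_error))


def meets_toq(exact, approx, toq):
    """True when the approximate output satisfies the target output quality."""
    return output_quality(exact, approx) >= toq


# Illustrative values only: one of two outputs is off by 10%.
exact_out = [10.0, 20.0]
approx_out = [9.0, 20.0]
print(output_quality(exact_out, approx_out))
print(meets_toq(exact_out, approx_out, toq=90.0))
```

Any metric that maps an exact and an approximate output to a percentage could be substituted here; the only property the rest of the flow relies on is that the result is comparable to the TOQ.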

SAGE Generates Multiple Approximate Kernels

SAGE Monitors the Output Quality During Runtime

[Figure: runtime timeline across CPU and GPU. Phases: Preprocessing, Tuning, Execution, Calibration. Time axis marked Tuning, T, T + C, T + 2C. Quality and Speedup are shown against the TOQ.]
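The timeline above, with calibration recurring at T, T + C, and T + 2C, suggests a loop like the following. This is a minimal sketch under stated assumptions, not SAGE's runtime: the back-off policy (step to a less aggressive kernel when quality falls below the TOQ), the scalar kernels, and the names `percent_quality` and `run_with_calibration` are all hypothetical.

```python
def percent_quality(reference, approx):
    # Hypothetical metric: 100% minus the relative error of a scalar output.
    return 100.0 * (1.0 - abs(reference - approx) / abs(reference))


def run_with_calibration(inputs, kernels, toq, interval):
    """Run approximate kernels and re-check quality every `interval` calls.

    kernels[0] is the exact kernel; higher indices are more aggressive.
    Returns the outputs and a log of (invocation, level, measured quality).
    """
    level = len(kernels) - 1  # assumption: start from the most aggressive kernel
    outputs, log = [], []
    for n, x in enumerate(inputs):
        out = kernels[level](x)
        outputs.append(out)
        if n % interval == 0:
            # Calibration: also run the exact kernel and compare the results.
            q = percent_quality(kernels[0](x), out)
            log.append((n, level, q))
            if q < toq and level > 0:
                level -= 1  # back off to a less aggressive kernel
    return outputs, log


# Stand-in "kernels" that scale their input; real kernels would run on the GPU.
kernels = [lambda x: x, lambda x: 0.97 * x, lambda x: 0.85 * x]
outputs, log = run_with_calibration([10.0] * 9, kernels, toq=90.0, interval=3)
for n, level, q in log:
    print(f"invocation {n}: level {level}, quality {q:.1f}%")
```

The interval plays the role of C in the figure: a larger value lowers the cost of running the exact kernel for comparison, at the price of reacting more slowly when quality drifts below the TOQ.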

[Figure: Tuning K-Means Runtime. Output Quality (axis ticks 80 to 100) and Accumulative Speedup (axis ticks 0 to 2) plotted over ticks 1 to 115, with the Target Output Quality marked.]

Performance

[Figure: Speedup (axis ticks 1 to 4, one bar labeled 6.4) of SAGE and Loop Perforation at TOQ = 95% and at TOQ = 90% for K-MEANS, NB, HISTOGRAM, SVM, FUZZY, MEANSHIFT, BINARIZATION, DYNAMIC, MEAN FILTER, GAUSSIAN, and GEOMEAN.]

Conclusion

SAGE enables the programmer to implement a program once

It automatically generates approximate kernels with different parameters

Runtime system uses tuning parameters to control the output quality during execution

2.5x speedup with less than 10% quality loss compared to the accurate execution

[Figure: kernel tuning tree with the Tuning Path and the Final Choice marked, TOQ = 90%.]

K(0,0): Quality = 100%
K(1,0): Quality = 94%, Speedup = 1.15X
K(0,1): Quality = 96%, Speedup = 1.5X
K(1,1): Quality = 94%, Speedup = 2.5X
K(0,2): Quality = 95%, Speedup = 1.6X
K(2,1): Quality = 88%, Speedup = 3X
K(1,2): Quality = 87%, Speedup = 2.8X
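The tuning tree above can be read as a greedy search: from each kernel, move to the child with the best speedup that still meets the TOQ, and stop when no child does. The sketch below replays that reading with the numbers from the figure. The greedy rule, the pairing of each Quality/Speedup label with its K(i,j) node, the baseline speedup of 1.0 for K(0,0), and the name `greedy_tune` are assumptions; the poster shows only the tree and the final choice.

```python
# (quality %, speedup) per kernel K(i, j), as read from the figure.
# The 1.0 speedup for the exact kernel K(0,0) is an assumed baseline.
KERNELS = {
    (0, 0): (100.0, 1.0),
    (1, 0): (94.0, 1.15),
    (0, 1): (96.0, 1.5),
    (1, 1): (94.0, 2.5),
    (0, 2): (95.0, 1.6),
    (2, 1): (88.0, 3.0),
    (1, 2): (87.0, 2.8),
}


def greedy_tune(kernels, toq, start=(0, 0)):
    """Walk the kernel tree greedily; return (tuning_path, final_choice)."""
    path = [start]
    current = start
    while True:
        i, j = current
        # A child raises exactly one tuning parameter by one step.
        children = [c for c in ((i + 1, j), (i, j + 1)) if c in kernels]
        # Keep only the children whose quality still meets the TOQ.
        acceptable = [c for c in children if kernels[c][0] >= toq]
        if not acceptable:
            return path, current
        # Among acceptable children, follow the one with the highest speedup.
        current = max(acceptable, key=lambda c: kernels[c][1])
        path.append(current)


path, final = greedy_tune(KERNELS, toq=90.0)
print("tuning path:", path)
print("final choice:", final, "speedup:", KERNELS[final][1])
```

Under these assumptions, a TOQ of 90% ends at K(1,1), whose 2.5X speedup agrees with the figure quoted in the conclusion. Exploring only the children of the current best kernel keeps tuning cost proportional to the depth of the tree instead of the number of kernels.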

Mehrzad Samadi¹, Janghaeng Lee¹, D. Anoushe Jamshidi¹, Amir Hormati², and Scott Mahlke¹

University of Michigan¹, Google Inc.²

cccp.eecs.umich.edu

Quality: 100%, 95%, 90%, 85%

Simplify or skip processing on the input data
– Computationally expensive for GPU
– Have the lowest impact on the output quality

SAGE
– Write the program once
– Automatic approximation
– Dynamic self-tuning