Pairing Up CNNs for High Throughput Deep Learning
cccp.eecs.umich.edu/papers/babak-pact19.pdf


Posted on 24-Jul-2020

TRANSCRIPT


Pairing Up CNNs for High Throughput Deep Learning
Babak Zamirai, Salar Latifi, Scott Mahlke

Input Variation

• There is no single best CNN for all inputs
• Combine multiple CNNs
  - Lower computational complexity
  - Higher accuracy

[Figure: breakdown of inputs by which CNN classifies them correctly, AlexNet (57% top-1) vs. ResNet-152 (79% top-1): Common Correct 54%, Complex Correct 25%, Odd Correct 3%, Common Wrong 18%.]
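The four categories above follow directly from per-input correctness of the two models. A small illustrative sketch (the function and toy data are hypothetical, not from the poster):

```python
from collections import Counter

def categorize(little_correct, big_correct):
    """Classify each input by which of the two CNNs got it right.
    little_correct / big_correct: parallel lists of booleans, one per input."""
    labels = []
    for l, b in zip(little_correct, big_correct):
        if l and b:
            labels.append("common correct")   # both CNNs right
        elif b:
            labels.append("complex correct")  # only the big CNN right
        elif l:
            labels.append("odd correct")      # only the little CNN right
        else:
            labels.append("common wrong")     # both wrong
    return Counter(labels)

# Toy run over four inputs, one per category
counts = categorize([True, True, False, False], [True, False, True, False])
print(dict(counts))
```

Odd corrects are the interesting slice: inputs the cheap model gets right but the expensive one gets wrong, which is why no single CNN dominates.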

DNN Complexity Trend

• Computational complexity grows fast
  ✓ Accuracy improvement
  ✗ Input-invariant accelerations

[Figure: Top-1 Accuracy (%) vs. MAC Operations (GFLOPs) for ResNet-50 (2016), Inception-v3 (2016), Xception (2017), Inception-v4 (2017), ResNeXt-101 (2017), PolyNet (2017), SENet (2018); accuracy axis 75–84%, cost axis 0–45 GFLOPs.]

Pairing Up CNNs

[Figure: runtime architecture. User requests (Req.) enter the Service, which schedules models from a Model Pool onto HW Resources (GPUs, CPUs, FPGAs) and returns the Output. The Traditional Service and the Dynamic Duo (DD) Service run on the same GPU/CPU/GPU resources; DD adds a Confidence Probe at runtime.]

Datacenter Throughput

• Same response time as baseline (1.8x throughput, latency unchanged)
• Exhaustive search results in only 5% additional gains

[Chart label: 69%]

Synergistic Pairs

• Higher accuracy means more room for savings
• Odd corrects and peak accuracy are correlated
• More odd corrects, better synergy
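One way to read "more odd corrects, better synergy": rank candidate little CNNs for a given big CNN by how many odd corrects they contribute. A hypothetical sketch of that heuristic (the scoring and selection functions are assumptions for illustration, not the paper's exact selection algorithm):

```python
def synergy_score(little_correct, big_correct):
    """Count odd corrects: inputs only the little CNN classifies correctly.
    These are inputs the pair gets right without ever paying for the big CNN."""
    return sum(1 for l, b in zip(little_correct, big_correct) if l and not b)

def pick_little(candidates, big_correct):
    """Choose the candidate little CNN with the most odd corrects.
    candidates: dict mapping model name -> per-input correctness list."""
    return max(candidates, key=lambda name: synergy_score(candidates[name], big_correct))

# Toy example: candidate "b" contributes two odd corrects, "a" only one
big = [True, False, True, False]
cands = {"a": [True, True, False, False], "b": [False, True, True, True]}
print(pick_little(cands, big))
```

Scoring a handful of candidates this way avoids the exhaustive pair search, which the poster reports adds only 5% over the selected pairs.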

Confidence Probe

• Recovery rate = 26%
• Odd corrects maintain the accuracy

[Chart legend: Complex corrects, Common corrects]

Softmax Layer

  Class    Output layer    Normalized output (softmax)
  ✗ Cat        0.9             0.016
  ✗ Fish       0.2             0.008
  ✗ Bird       1.3             0.024
  ✓ Dog        4.98            0.952

• An estimation of confidence
• Sum of the elements = 1.0
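The normalized outputs in the table can be reproduced with a few lines of standard-library Python; this is the generic softmax function, not code from the poster:

```python
import math

def softmax(logits):
    """Normalize raw output-layer scores into a probability distribution."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Output-layer scores for Cat, Fish, Bird, Dog from the example above
probs = softmax([0.9, 0.2, 1.3, 4.98])
print([round(p, 3) for p in probs])  # -> [0.016, 0.008, 0.024, 0.952]
```

The elements always sum to 1.0, which is what lets the top probability serve as a confidence estimate for the probe.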

• Run everything on the little CNN
• Detect and recover unreliable outputs

[Figure: Input → Little CNN → Confidence Probe; Reliable outputs go straight to the User, Unreliable ones are recovered by the Big CNN.]
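A minimal sketch of this little/big dispatch, assuming the probe simply thresholds the little CNN's top softmax probability (the threshold value, model interfaces, and function name are illustrative, not from the poster):

```python
def dynamic_duo(x, little_cnn, big_cnn, threshold=0.9):
    """Run the little CNN on every input; invoke the big CNN only when
    the confidence probe flags the little CNN's output as unreliable."""
    probs = little_cnn(x)              # softmax vector from the little CNN
    if max(probs) >= threshold:        # probe: top probability as confidence
        return probs.index(max(probs)) # reliable -> cheap prediction wins
    big_probs = big_cnn(x)             # unreliable -> recover with the big CNN
    return big_probs.index(max(big_probs))

# Toy models: the little CNN is confident on "easy", hedges on "hard"
little = lambda x: [0.05, 0.95] if x == "easy" else [0.6, 0.4]
big = lambda x: [0.9, 0.1]
print(dynamic_duo("easy", little, big))  # -> 1 (big CNN never runs)
print(dynamic_duo("hard", little, big))  # -> 0 (recovered by the big CNN)
```

Because most inputs fall in the "common correct" bucket, the expensive model runs only on the unreliable minority, which is where the throughput gain comes from.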