deluceva: delta-based neural network inference …...deluceva: delta-based neural network inference...

Post on 22-May-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Deluceva:Delta-BasedNeuralNetworkInferenceforFastVideoAnalytics

Jingjing WangandMagdalenaBalazinskaUniversity of Washington

1

Deluceva:Delta-BasedNeuralNetworkInferenceforFastVideoAnalytics

• Largevolumeofimages/videoswithvaluableinformation• Alargesetofneuralnetworkmodelsforimages• Objectclassification,detection,…

• Nextstep:videoanalytics• Largervolume• Efficiencycritical,liveoutput

2

VideoAnalyticsUsingNeuralNetworks

3

… …

DeepNeuralNetworkRichsetofestablishedimagemodels

KeyObservation:TemporalRedundancy

4

- =

ProcessDeltasInsteadofFullFrames

5

… …

State

Query

“IncrementalQueryEvaluation”

ProcessDeltasInsteadofFullFrames

6

DataStream

State

Query

“IncrementalQueryEvaluation”

…newboatdiscoveredwithboundingboxB

…pixel[x,y,z]has

changedby(0,0,1)…

Delta-BasedInferenceforVideos

• Problem:• Input: a videostream,areferencemodel• Output: similartothereferencemodel’sresult

• Approach:• Acceleratemodelinferencebyperforminglesscomputation

7

Delta-BasedInferenceforVideos:Overview

• Modifyneuralnetworktotakedeltasasinputs• Decidewhichdeltasaresignificantenoughtoprocess• Generate a network of mixed-type (dense or delta-based) operators

8

Delta-BasedInferenceforVideos:Overview

• Modifyneuralnetworktotakedeltasasinputs• Decidewhichdeltasaresignificantenoughtoprocess• Generate a network of mixed-type (dense or delta-based) operators

9

Frame2– Frame1

Frame1

Neural Network with SparseDeltas

10

=

+

Model(original)

Frame2

Model(delta)

Model(delta)

ExampleNeuralNetwork Model

11

Model(original)

OpOp ……

• Exampleoperators:convolution,maxpooling,ReLU,…

ExampleNeuralNetwork Model

12

Model(delta)

OpOp ……

• Modifyoperatorstooperateondeltas

DeltaOperatorUnit

FilterOperator

DeltaOperatorUnit

• Sparseoperator:takessparsedeltas&outputsdelta• Saves#ofFLOPsbyprocessingdeltascalarsonly

• Filter:sendonlysignificantdeltastotheoperator• Buildshistogram,keepssmalldeltas&outputslargedeltas

13

Indices,Values

abs

Accum.deltas

Kernel

・ +

SparseHow much to filterforthecurrentframe?

Delta-BasedInferenceforVideos

• Modifyneuralnetworktotakedeltasasinputs• Decidewhichdeltasaresignificantenoughtoprocess• Generate a network of mixed-type (dense or delta-based) operators

14

DeltaImpactsOutputQuality

15

Conv …

Conv …

Frame1 Frame2

Frame2– Frame1

Groundtruths:

1s

5s

90%

20%

DynamicFilteringPercentage

16

• Filteringpercentage• Higherisfasterbutrisky• Lowerissafer butslower

• Targetfilteringpercentage:largestpercentagethatgeneratesgoodresult• Appliestoalloperators• Two approaches: PI controller / Machine learning

Safer,slower Risky,faster

abs(delta)

90%

Delta-BasedInferenceforVideos

• Modifyneuralnetworktotakedeltasasinputs• Decidewhichdeltasaresignificantenoughtoprocess• Generate a network of mixed-type (dense or delta-based) operators

17

MixedNetwork

18

OpOp ……

OpOp ……

Filteringpercentageis:low(e.g.0)high(e.g.99)

medium(e.g.50)

OpOp ……

DeltaOpUnit

MixedNetwork• Logical plan: a DAG of operators• Physical plan: choose between delta opunit/ originaldenseimplementation• Profileeachoperatorwithdifferentfilteringpercentages• Pickthefastervariant

19

Op2(sparse)Filter

Op2(dense)Filteringpercentage=99

Op1… …Op3

Evaluation:Setup

20

• Threeobjectdetectionmodels

• Six10-minutevideosfromthreeYouTubelivestreams– Takenatdifferenttimes(e.g.day/night)for eachstream– Typicalobjects:people,cars,buses,boats,…– One frame per second

• TensorFlow,oneCPUthread, AmazonEC2r3.2xlarge

Model Abbrv. #ofFLOPs

SSD-VGG16 ssd-vgg 123B

FRCNN-RESNET101 frcnn-res 550B

FRCNN-INCEPTION-RESNET-V2 frcnn-incep 1395B

Time

3s

16s

41s

Evaluation:End-to-EndComparison

21

• HighestruntimesavingsbyPIcontroller– Whenerrorlessthanathreshold

Model / Datasetssd-vgg

je jn kd kn vd vnfrcnn-res

je jn kd kn vd vnfrcnn-incep

je jn kd kn vd vn

0

10

20

30

40

50

60

Avg.

Pct

. of S

avin

g

videos

Deluceva:Conclusion

• Observerichtemporalredundancyinvideos• Acceleratemodelinferencebyprocessingsignificantdeltasonly

• ModifyNNmodelstoconsistofsparse&denseops• Adjustthefilteringgranularityadaptively• Generateanetwork of mixed-type operatorsbasedoncostmodels

• Improveruntimeupto67%with lowerror• Appliestoconvolutionalneuralnetworkmodels

• Ongoing:GPUimplementation,comparetootherwork

22

top related