uncertainty in bayesian neural nets · bnn •variational inference •maximize lower bound on the...

41
Uncertainty in Bayesian Neural Nets August 4 2017

Upload: others

Post on 18-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

UncertaintyinBayesianNeuralNets

August42017

Page 2: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Overview

• BNNreview• Visualizationexperiments• BNNresults

Page 3: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

BNN

Prior:p(W)Likelihood:p(Y|X,W)ApproximatePosterior:q(W)PosteriorPredictive:𝐸"($)[𝑝(𝑦|𝑥,𝑊)]

Page 4: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

BNN

• VariationalInference• Maximizelowerboundonthemarginallog-likelihood

log 𝑝 𝑌 𝑋 ≥ 𝐸" $ [log 𝑝 𝑌 𝑋,𝑊 + log 𝑝 𝑊 − log 𝑞 𝑊 ]

Prior PosteriorApprox

Likelihood

Y

X W

Dependentonthenumberofdatapoints

1𝑀9 log 𝑝 𝑌: 𝑋:,𝑊

;

:<=

+1𝑁 log

𝑝(𝑊)𝑞(𝑊)

Page 5: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Differentpriorsandposteriorapproximations

• Priorsp(W):• 𝑁(0, 𝜎A)• Scale-mixturesofNormals• SparsityInducing

• PosteriorApproximationsq(W):• Deltapeak q W = 𝛿𝑊• FullyFactorizedGaussiansq W =∏𝑁(𝑤I|𝜇I, 𝜎IA)�

�• BernoulliDropout• GaussianDropout• MNF

Page 6: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

MultiplicativeNormalizingFlows(MNF)

• Augmentmodelwithauxiliaryvariable

ChristosLouizos,MaxWellingICML2017

Y

X W

Z

GenerativeModel

InferenceModel

W

Z

𝑧~𝑞 𝑧 𝑊~𝑞 𝑊 𝑧

𝑞 𝑊 = N𝑞 𝑊 𝑧 𝑞 𝑧 𝑑𝑧�

𝑞 𝑊 𝑧 =PP𝑁(𝑧I𝜇IQ, 𝜎IQA)RSTU

Q<=

RVW

I<=

log 𝑝 𝑌 𝑋 ≥ 𝐸" $ [log 𝑝 𝑌 𝑋,𝑊 + log 𝑝 𝑊 − log 𝑞 𝑊|𝑧 + log 𝑟 𝑧 𝑤 − log 𝑞(𝑧)]

NewlowerboundNormalizingFlows

Page 7: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

PredictiveDistributions

Page 8: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Uncertainties

• Modeluncertainty(Epistemicuncertainty)• Capturesignoranceaboutthemodelthatismostsuitabletoexplainthedata• Reducesastheamountofobserveddataincreases• Summarizedbygeneratingfunctionrealizationsfromourdistribution

• MeasurementNoise(Aleatoric uncertainty)• Noiseinherentintheenvironment,capturedinlikelihoodfunction

• Predictiveuncertainty• Entropyofprediction=H[p(y|x)]

Page 9: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

VisualizationExperiments

• 1Dregression• ClassificationofMNIST(visualizein2D)

• Questions:• Activations• Numberofsamples• Heldoutclasses• Typeofuncertainties

Page 10: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Sigmoid:(1+e-x)-1 Tanh

Softplus:ln(1+ex) ReLU:max(0,x)

BNNswithDifferentActivationFunctions

Page 11: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

UncertaintyofDecisionBoundaries

• Setup:• ClassificationofMNIST• Train:50000Test:10000

784-100-2-100-10

BNN:FFG,N(0,1)Activations:Softplus

NN BNN

Page 12: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

DecisionBoundaries– 3Samples

PlotofArgmax p(y|x)ateachpoint

Page 13: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

UncertaintyofDecisionBoundaries:HeldOutClasses• Setup:• Classificationofdigits0to4(5to9heldout)

784-100-100-2-100-100-10

BNN:FFG,N(0,1)Activations:Softplus

NN BNN

Page 14: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Wheredoyouthinktheheldoutclasseswillgo?

InsideorOutsidetheCircle?

Page 15: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Wheredoyouthinktheheldoutclasseswillgo?

Page 16: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

HeldOutClasses

Unseenclassesdon’tgetencodedassomethingfaraway,insteadencodednearmean

Page 17: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

ConfidenceofPredictions?MaybelargeareashavehighentropyArgmax vsMax

Page 18: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

ClassBoundaries- Confidences

SharptransitionsThereisn’tmuchuncertainspace:mostlyuniform,highconfidence

Page 19: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

EntropyArgmax

Max

Entropy

Page 20: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

AffectofChoiceofActivationFunction

• Softplus• ReLU• Tanh

Page 21: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Softplus

Sample1 Sample2 Sample3 Meanofq(W) 𝐸"($)[𝑝(𝑦|𝑥, 𝑤)]

Page 22: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

ReLU

Sample1 Sample2 Sample3 Meanofq(W) 𝐸"($)[𝑝(𝑦|𝑥, 𝑤)]

Page 23: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

TanhSample1 Sample2 Sample3 Meanofq(W) 𝐸"($)[𝑝(𝑦|𝑥, 𝑤)]

Page 24: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Mix(Softplus,ReLu,Tanh)Sample1 Sample2 Sample3 Meanofq(W) 𝐸"($)[𝑝(𝑦|𝑥)]

Page 25: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

NumberofDatapoints25000 10000 1000 100

Argmax

Max

Entropy

𝐸"($)[𝑝(𝑦|𝑥)]

Page 26: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

ModelvsOutputUncertainty

• PredictiveUncertainty=𝐻[𝑝(𝑦|𝑥)]

OutputUncertainty

ModelUncertainty

𝐻[𝑝(𝑦|𝑥, 𝑤Z)]where𝑤Z=meanofq(w)

𝐻[𝐸"($)[𝑝(𝑦|𝑥, 𝑤)]]

Outputhighentropy(ondecisionboundary)

Highvariancepredictions

Page 27: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

ModelvsOutputUncertainty

Train Test HeldOut

ModelUncertainty .07 .26 .43

OutputUncertainty .03 .15 .25

Train Test HeldOut

ModelUncertainty .06 .06 .43

OutputUncertainty .05 .05 .36

100trainingdatapoints

25000trainingdatapoints

Smalldata:modeluncertainty

Largedata:outputuncertainty

Page 28: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

NN BNN GP+NN

AdversarialExamples,Uncertainty,andTransferTestingRobustnessinGaussianProcessHybridDeepNetworks(July2017)

Page 29: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Visualizelandscapeoflikelihood

w1

w2

p(ytrain|xtrain,W)

DimensionofWislarge,sousean2Dauxiliaryvariable

Page 30: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Visualizelandscapeoflikelihood

• AuxiliaryVariableModel

Y

X W

Z

GenerativeModel

InferenceModel

W

Z

784-100-100-2-10-10-10NN BNN

(2D)𝑧~𝑞 𝑧

𝑞 𝑊 = N𝛿 𝑊 𝑧 𝑞 𝑧 𝑑𝑧�

𝑊~𝑞 𝑊 𝑧r 𝑧 𝑊

log 𝑝 𝑌 𝑋 ≥ 𝐸" $ [log 𝑝 𝑌 𝑋,𝑊 + log 𝑝 𝑊 − log 𝑞 𝑊|𝑧 + log 𝑟 𝑧 𝑤 − log 𝑞(𝑧)]

𝑞 𝑊 𝑧 = 𝛿(𝑊|𝑧)

hyper-network hypo-network

Page 31: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

DecisionBoundariesz1 z2 z3 𝐸"([)[𝑝(𝑦|𝑥, 𝑧)]

Page 32: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

LikelihoodLandscape

Logp(ytrain|xtrain,W,z) Logp(ytest|xtest,W,z)

z1

z2 z2

Page 33: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

LikelihoodLandscape

logp(ytrain|xtrain,W,z) logp(ytest|xtest,W,z)

logp(ytrain|xtrain,W,z)+logr(z|W)- logq(z)

z1

z2

Page 34: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

LikelihoodLandscape

Logp(ytrain|xtrain,W,z) Logp(ytest|xtest,W,z)

logp(ytrain|xtrain,W,z)+logr(z|W)- logq(z)

z1

z2

Page 35: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

LikelihoodLandscape

Logp(ytrain|xtrain,W,z) Logp(ytest|xtest,W,z)

logp(ytrain|xtrain,W,z)+logr(z|W)- logq(z)

z1

z2

Page 36: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

RecentBNNPapers

• MultiplicativeNormalizingFlowsforVariationalBayesianNeuralNetworks(2017)• VariationalDropoutSparsifies DeepNeuralNetworks(2017)• BayesianCompressionforDeepLearning(2017)

• AdversarialPerturbations• Compression

Page 37: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Adversarialperturbations

MNIST CIFAR10

Page 38: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

CompressionvsUncertainty

H[P]

Page 39: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

Conclusion

• UsedvisualizationstohelpunderstanduncertaintyinBNNs• Goal:improveuncertaintyestimatesandgeneralization

Applications• Activelearning• BayesOpt• RL

• Safety• Efficiency

Page 40: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

References

• WeightUncertaintyinNeuralNetworks(2015)

• VariationalDropoutandtheLocalReparameterization Trick(2015)

• DropoutasaBayesianApproximation:RepresentingModelUncertaintyinDeepLearning(2016)

• VariationalDropoutSparsifies DeepNeuralNetworks(2017)

• OnCalibrationofModernNeuralNetworks(2017)

• MultiplicativeNormalizingFlowsforVariationalBayesianNeuralNetworks(2017)

Page 41: Uncertainty in Bayesian Neural Nets · BNN •Variational Inference •Maximize lower bound on the marginal log-likelihood log’12 ≥!"$[log’12,,+log’,−log6,] Prior Posterior

ThankYou