![Page 1: Lecture 12: VariationalInference and Mean Field · Computing Mean Parameter: Bernoulli 10 •A single Bernoulli random variable ... •Minkowski-Weyl Theorem: any non -empty ... •For](https://reader033.vdocuments.us/reader033/viewer/2022041809/5e56fc027977993248651bff/html5/thumbnails/1.jpg)
CS839: Probabilistic Graphical Models
Lecture 12: Variational Inference and Mean Field
Theo Rekatsinas

Summary
• Variational inference (approximate inference):
  • Loopy BP (Bethe free energy)
  • Mean-field approximation
• What is common to the two?
  • Loopy BP: outer approximation of the marginal polytope
  • Mean-field: inner approximation of the marginal polytope

Variational Methods
• "Variational" means an optimization-based formulation:
  • Represent a quantity of interest as the solution to an optimization problem
  • Approximate the desired solution by relaxing/approximating the intractable optimization problem
• Example:

Inference Problems in Graphical Models
• Consider an undirected graphical model (MRF)
• The quantities of interest (for inference):
  • Marginal distributions
  • Normalization constant Z
• How to represent these quantities in a variational form?
  • Exponential families and convex analysis

Exponential Families
• Canonical parameterization
• Log normalization constant
  • This is a convex function
• Space of canonical parameters
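The formulas for this slide did not survive in the transcript. In the standard notation for exponential families they would read roughly as follows. The symbols $\phi$ (sufficient statistics), $\nu$ (base measure), $A$ (log normalizer) and $\Omega$ (parameter space) are assumptions based on the usual textbook conventions, not a copy of the slide.

```latex
\begin{align*}
p_\theta(x) &= \exp\{\langle \theta, \phi(x)\rangle - A(\theta)\}
  && \text{(canonical parameterization)} \\
A(\theta) &= \log \int \exp\langle \theta, \phi(x)\rangle \,\nu(dx)
  && \text{(log normalization constant, convex in } \theta\text{)} \\
\Omega &= \{\theta \in \mathbb{R}^d : A(\theta) < +\infty\}
  && \text{(space of canonical parameters)}
\end{align*}
```

For discrete $x$ the integral is a sum over all configurations.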

Graphical Models as Exponential Families
• Undirected graphical model (MRF)
• MRF in exponential form:

Example: Gaussian MRF
• Zero-mean multivariate Gaussian distribution that respects the Markov property of a graph
• Gaussian MRF in exponential form

Example: Discrete MRF

Why Exponential Families
• Computing the expectation of the sufficient statistics (the mean parameters) given the canonical parameters yields the marginals
• Computing the normalizer yields the log partition function (or log-likelihood function)

Computing Mean Parameter: Bernoulli
• A single Bernoulli random variable
• Inference = computing the mean parameter
• In a variational manner: cast the procedure of computing the mean in an optimization-based formulation
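As a small sanity check of "inference = computing the mean parameter", the sketch below assumes the usual exponential-family form of a Bernoulli variable, p(x; θ) ∝ exp(θx) for x in {0, 1}, with log partition function A(θ) = log(1 + e^θ). The function names are illustrative only. It computes the mean by direct summation and compares it with the logistic function and with a finite-difference derivative of A.

```python
import math


def log_partition(theta):
    """A(theta) = log(1 + exp(theta)) for a Bernoulli variable with x in {0, 1}."""
    return math.log1p(math.exp(theta))


def mean_parameter(theta):
    """mu = E[X], computed by explicit summation over the two states."""
    z = sum(math.exp(theta * x) for x in (0, 1))
    return sum(x * math.exp(theta * x) for x in (0, 1)) / z


def sigmoid(theta):
    """Closed form of the Bernoulli mean: exp(theta) / (1 + exp(theta))."""
    return 1.0 / (1.0 + math.exp(-theta))


theta = 0.7  # arbitrary canonical parameter chosen for the example
mu = mean_parameter(theta)

# Central finite difference of A; it should agree with the mean parameter.
eps = 1e-6
dA = (log_partition(theta + eps) - log_partition(theta - eps)) / (2 * eps)

print(mu, sigmoid(theta), dA)
```

The three printed numbers should agree up to numerical error, which illustrates that the derivative of the log partition function gives the mean parameter.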

Conjugate Dual Function
• Given any function f(θ), its conjugate dual function is
• The conjugate dual is always a convex function: it is the pointwise supremum of a class of linear functions
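The definition itself is missing from the transcript. The standard definition of the conjugate (Fenchel) dual, which is presumably what the slide showed, is:

```latex
\[
f^*(\mu) \;=\; \sup_{\theta} \,\bigl\{ \langle \theta, \mu \rangle - f(\theta) \bigr\}
\]
```

For each fixed $\theta$, the map $\mu \mapsto \langle \theta, \mu \rangle - f(\theta)$ is linear (affine) in $\mu$, and a pointwise supremum of such functions is convex, whether or not $f$ is.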

Dual of the Dual is the Original
• Under some technical conditions on f (convex and lower semi-continuous), the dual of the dual is f itself.
• For the log partition function

Computing Mean Parameter: Bernoulli
• The conjugate
• Stationary condition
• If
• If
• We have:

Computing Mean Parameter: Bernoulli
• The conjugate
• Stationary condition
• We have:
• The variational form:
• The optimum is achieved at
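The variational form for the Bernoulli case can be checked numerically. The sketch below assumes the standard result that the conjugate dual of A(θ) = log(1 + e^θ) is the negative entropy A*(μ) = μ log μ + (1 − μ) log(1 − μ) on [0, 1], so that A(θ) = max over μ in [0, 1] of μθ − A*(μ). The ternary search is just one convenient way to maximize a concave function of one variable; it is not part of the lecture.

```python
import math


def neg_entropy(mu):
    """A*(mu) = mu log mu + (1 - mu) log(1 - mu), with 0 log 0 taken as 0."""
    return sum(p * math.log(p) for p in (mu, 1.0 - mu) if p > 0.0)


def objective(mu, theta):
    """The quantity maximized in the variational form: mu * theta - A*(mu)."""
    return mu * theta - neg_entropy(mu)


def maximize(theta, iters=200):
    """Ternary search on [0, 1]; valid because the objective is concave in mu."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1, theta) < objective(m2, theta):
            lo = m1
        else:
            hi = m2
    mu = 0.5 * (lo + hi)
    return mu, objective(mu, theta)


theta = -1.3  # arbitrary canonical parameter chosen for the example
mu_star, value = maximize(theta)

print(mu_star, 1.0 / (1.0 + math.exp(-theta)))   # maximizer vs. logistic(theta)
print(value, math.log1p(math.exp(theta)))        # optimal value vs. A(theta)
```

Solving the optimization returns both quantities of interest at once: the maximizer is the mean parameter and the optimal value is the log partition function.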

Remarks
• The last few identities rely on a deep theory in general exponential families:
  • The dual function is the negative entropy function
  • The mean parameter is restricted
  • Solving the optimization returns the mean parameter and the log partition function
• Extend this to general exponential families/graphical models.
• However,
  • Computing the conjugate dual (the entropy) is in general intractable
  • The constraint set of mean parameters is hard to characterize
  • We need to approximate

Compute the Conjugate Dual
• Given an exponential family
• The dual function
• Stationary condition
• Derivatives of A yield the mean parameters
• The stationary condition becomes
• For which μ do we have a solution θ(μ)?

Compute the Conjugate Dual
• Let's assume there is a solution θ(μ) such that
• The dual has the form
• The entropy is defined as
• So the dual is A*(μ) = −H(p_θ(μ)) when there is a solution θ(μ)
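The intermediate equations are missing from the transcript. A sketch of the standard argument, assuming the exponential-family form $p_\theta(x) = \exp\{\langle\theta,\phi(x)\rangle - A(\theta)\}$ and writing $\theta(\mu)$ for a solution of the stationary condition, is:

```latex
\begin{align*}
\mu &= \nabla A(\theta(\mu)) = \mathbb{E}_{\theta(\mu)}[\phi(X)]
  && \text{(stationary condition)} \\
A^*(\mu) &= \langle \theta(\mu), \mu \rangle - A(\theta(\mu)) \\
  &= \mathbb{E}_{\theta(\mu)}\bigl[\langle \theta(\mu), \phi(X) \rangle - A(\theta(\mu))\bigr] \\
  &= \mathbb{E}_{\theta(\mu)}\bigl[\log p_{\theta(\mu)}(X)\bigr] \\
  &= -H\bigl(p_{\theta(\mu)}\bigr),
  \qquad H(p) = -\int p(x) \log p(x)\,\nu(dx).
\end{align*}
```

The second line uses the fact that the supremum in the definition of $A^*$ is attained at $\theta(\mu)$; the third substitutes $\mu = \mathbb{E}_{\theta(\mu)}[\phi(X)]$.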

Complexity of Computing Conjugate Dual
• The dual function is implicitly defined:
• Solving the inverse mapping is non-trivial
• Evaluating the negative entropy requires high-dimensional integration (summation)
• For which μ does it have a solution θ(μ)? What is the domain of A*(μ)?

Marginal Polytope
• For any distribution p(x) and a set of sufficient statistics φ(x), define a vector of mean parameters
  • p(x) is not necessarily an exponential family
• The set of all realizable mean parameters is a convex set
• For discrete exponential families this is called the marginal polytope.

Convex Polytope
• Convex hull representation
• Half-plane representation
  • Minkowski-Weyl Theorem: any non-empty convex polytope can be characterized by a finite collection of linear inequality constraints

Example: Two-node Ising Model
• Sufficient statistics
• Mean parameters
• Two-node Ising model
  • Convex hull representation
  • Half-plane representation
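The two representations can be compared concretely. The sketch below assumes the {0, 1} encoding with sufficient statistics φ(x) = (x1, x2, x1·x2), so the mean parameters are (μ1, μ2, μ12). Under that assumption the convex hull has the four vertices φ(x), and the half-plane representation consists of four inequalities, each stating that one joint probability is nonnegative. All function names are hypothetical.

```python
import itertools
import random

STATES = list(itertools.product((0, 1), repeat=2))


def phi(x):
    """Sufficient statistics (x1, x2, x1*x2) for the two-node model."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def mean_parameters(p):
    """Mean parameters of a joint distribution p: state -> probability."""
    return tuple(sum(p[x] * phi(x)[k] for x in STATES) for k in range(3))


def in_marginal_polytope(mu, tol=1e-12):
    """Half-plane representation: four linear inequalities."""
    m1, m2, m12 = mu
    return (m12 >= -tol
            and m1 - m12 >= -tol
            and m2 - m12 >= -tol
            and 1.0 + m12 - m1 - m2 >= -tol)


def joint_from_mean(mu):
    """Invert the mean map: each inequality above is one joint probability."""
    m1, m2, m12 = mu
    return {(0, 0): 1.0 + m12 - m1 - m2,
            (1, 0): m1 - m12,
            (0, 1): m2 - m12,
            (1, 1): m12}


# A random joint distribution (convex combination of the vertices phi(x)).
rng = random.Random(0)
weights = [rng.random() for _ in STATES]
p = {x: w / sum(weights) for x, w in zip(STATES, weights)}
mu = mean_parameters(p)
q = joint_from_mean(mu)

print(mu, in_marginal_polytope(mu))
print(in_marginal_polytope((0.5, 0.5, 0.6)))  # violates mu12 <= mu1
```

Every mean vector produced from a valid joint distribution satisfies the inequalities, and any vector satisfying them can be mapped back to a valid joint distribution, which is why the two descriptions define the same set.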

Marginal Polytope for General Graphs
• Still doable for connected binary graphs with 3 nodes: 16 constraints
• For tree graphical models, the number of half-planes grows only linearly in the graph size
• General graphs?
  • Extremely hard to characterize the marginal polytope.

Variational Principle
• The dual function takes the form
• The log partition function has the variational form
• For all θ, the optimum of the above problem is attained uniquely at the μ(θ) that satisfies
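The three formulas are missing from the transcript. In the standard statement of the variational principle, with $\mathcal{M}$ the set of realizable mean parameters and $\mathcal{M}^\circ$ its interior (notation assumed here), they would read roughly:

```latex
\begin{align*}
A^*(\mu) &=
  \begin{cases}
    -H\bigl(p_{\theta(\mu)}\bigr) & \text{if } \mu \in \mathcal{M}^\circ \\
    +\infty & \text{if } \mu \notin \overline{\mathcal{M}}
  \end{cases} \\
A(\theta) &= \sup_{\mu \in \mathcal{M}} \bigl\{ \langle \theta, \mu \rangle - A^*(\mu) \bigr\} \\
\mu(\theta) &= \mathbb{E}_\theta[\phi(X)]
\end{align*}
```

The last line is the condition satisfied by the unique maximizer: it is the vector of mean parameters of $p_\theta$.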

Example: Two-node Ising Model
• The distribution
• Sufficient statistics
• The marginal polytope is characterized by
• The dual has an explicit form
• The variational problem is
• The optimum is attained at

Variational Principle
• Exact variational formulation
• Mean field method: non-convex inner bound and exact form of entropy
• Bethe approximation and loopy BP: polyhedral outer bound and non-convex Bethe approximation

Belief Propagation Algorithm
• Message passing rule:
• Marginals
• Exact for trees but approximate for loopy graphs
• How does this relate to the variational principle? For trees/generic graphs?
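To make "exact for trees" concrete, here is a minimal sum-product sketch on a small hypothetical tree with binary variables. It assumes the usual message rule, in which the message from t to s sums over x_t the product of the edge potential, the node potential of t, and all messages into t except the one from s. The graph, the potentials, and the function names are made up for illustration. The result is compared against brute-force enumeration.

```python
import itertools
import math

# Hypothetical 4-node tree with binary states and arbitrary positive potentials.
EDGES = [(0, 1), (1, 2), (1, 3)]
N, K = 4, 2
node_pot = [[1.0, 2.0], [1.5, 0.5], [1.0, 1.0], [0.3, 1.2]]
edge_pot = {e: [[2.0, 1.0], [1.0, 3.0]] for e in EDGES}  # psi_st[x_s][x_t]

neighbors = {s: [] for s in range(N)}
for s, t in EDGES:
    neighbors[s].append(t)
    neighbors[t].append(s)


def psi(s, t, xs, xt):
    """Edge potential, looked up regardless of the order in which (s, t) is stored."""
    if (s, t) in edge_pot:
        return edge_pot[(s, t)][xs][xt]
    return edge_pot[(t, s)][xt][xs]


def belief_propagation(iters=10):
    """Synchronous sum-product; msgs[(t, s)] is the message from t to s."""
    msgs = {(t, s): [1.0] * K for s in range(N) for t in neighbors[s]}
    for _ in range(iters):
        new = {}
        for (t, s) in msgs:
            m = []
            for xs in range(K):
                total = 0.0
                for xt in range(K):
                    prod = node_pot[t][xt] * psi(s, t, xs, xt)
                    for u in neighbors[t]:
                        if u != s:
                            prod *= msgs[(u, t)][xt]
                    total += prod
                m.append(total)
            z = sum(m)
            new[(t, s)] = [v / z for v in m]  # normalize for numerical stability
        msgs = new
    marginals = []
    for s in range(N):
        b = [node_pot[s][xs] * math.prod(msgs[(t, s)][xs] for t in neighbors[s])
             for xs in range(K)]
        z = sum(b)
        marginals.append([v / z for v in b])
    return marginals


def brute_force_marginals():
    """Exact node marginals by summing over all K**N configurations."""
    marg = [[0.0] * K for _ in range(N)]
    for x in itertools.product(range(K), repeat=N):
        w = math.prod(node_pot[s][x[s]] for s in range(N))
        w *= math.prod(psi(s, t, x[s], x[t]) for s, t in EDGES)
        for s in range(N):
            marg[s][x[s]] += w
    return [[v / sum(row) for v in row] for row in marg]


bp_marginals = belief_propagation()
exact_marginals = brute_force_marginals()
print(bp_marginals)
print(exact_marginals)
```

On a tree the two results agree up to rounding. Adding an edge that closes a cycle (for example (2, 3)) would generally make the BP beliefs differ from the exact marginals.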

Tree Graphical Models
• Discrete variables on a tree T = (V, E)
• Sufficient statistics
• Exponential representation of distribution?
• Mean parameters are marginal probabilities:

Marginal Polytope for Trees
• Marginal polytope for general graphs
• By junction tree we have:
• If then

Decomposition of Entropy for Trees
• For trees the entropy decomposes as (this is also our dual!):
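The decomposition formula is not in the transcript. The standard result for tree-structured distributions is that the joint entropy equals the sum of the node entropies minus the sum of the mutual informations over the edges. The sketch below checks this numerically on a small hypothetical tree; the graph and potentials are arbitrary choices for illustration.

```python
import itertools
import math

# Hypothetical 4-node tree with binary states and arbitrary positive potentials.
EDGES = [(0, 1), (1, 2), (1, 3)]
N, K = 4, 2
node_pot = [[1.0, 2.0], [1.5, 0.5], [1.0, 1.0], [0.3, 1.2]]
edge_pot = [[2.0, 1.0], [1.0, 3.0]]  # same symmetric potential on every edge

# Exact joint distribution by enumeration.
states = list(itertools.product(range(K), repeat=N))
weights = [math.prod(node_pot[s][x[s]] for s in range(N))
           * math.prod(edge_pot[x[s]][x[t]] for s, t in EDGES)
           for x in states]
Z = sum(weights)
joint = {x: w / Z for x, w in zip(states, weights)}


def entropy(dist):
    """Shannon entropy (in nats) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def marginal(dist, idx):
    """Marginal distribution over the variables with indices in idx."""
    out = {}
    for x, p in dist.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


node_entropy = sum(entropy(marginal(joint, (s,))) for s in range(N))
mutual_info = sum(entropy(marginal(joint, (s,)))
                  + entropy(marginal(joint, (t,)))
                  - entropy(marginal(joint, (s, t)))
                  for s, t in EDGES)

tree_entropy = node_entropy - mutual_info
exact_entropy = entropy(joint)
print(tree_entropy, exact_entropy)
```

The identity depends only on node and edge marginals, which is what makes the dual function explicit for trees.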

Exact Variational Principle for Trees
• Variational formulation
• Assign a Lagrange multiplier for the normalization constraint and each marginalization constraint

Lagrangian Derivation
• Taking the derivatives of the Lagrangian with respect to μ_s and μ_st
• Setting them to zero yields

BP on Arbitrary Graphs
• Two main difficulties of the variational formulation
• The marginal polytope is hard to characterize, so let's use the tree-based outer bound
• The exact entropy lacks an explicit form, so let's approximate it using the exact expression for trees

Bethe Variational Problem
• Combining the two gives us the Bethe variational problem
• What is happening?
  • Tree-based outer bound
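The optimization problem itself is missing from the transcript. In the standard formulation, with pseudomarginals $\tau$ and the locally consistent polytope $\mathbb{L}(G)$ as the tree-based outer bound (notation assumed here), it would read roughly:

```latex
\begin{align*}
\mathbb{L}(G) &= \Bigl\{ \tau \ge 0 \;:\; \sum_{x_s} \tau_s(x_s) = 1, \;\;
   \sum_{x_t} \tau_{st}(x_s, x_t) = \tau_s(x_s) \Bigr\} \\
H_{\text{Bethe}}(\tau) &= \sum_{s \in V} H_s(\tau_s) - \sum_{(s,t) \in E} I_{st}(\tau_{st}) \\
&\max_{\tau \in \mathbb{L}(G)} \bigl\{ \langle \theta, \tau \rangle + H_{\text{Bethe}}(\tau) \bigr\}
\end{align*}
```

On a tree, $\mathbb{L}(G)$ coincides with the marginal polytope and $H_{\text{Bethe}}$ is the exact entropy, so the problem is exact. On a graph with cycles both pieces are approximations.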

Mean Field Approximation

Tractable Subgraphs
• For an exponential family with sufficient statistics φ defined on graph G, the set of realizable mean parameters is
• Idea: restrict p to a subset of distributions associated with a tractable subgraph

Mean Field Methods
• For a given tractable subgraph F, a subset of canonical parameters is
• Inner approximation
• Mean field solves the relaxed problem
• is the exact dual function restricted to

Example: Naïve Mean Field for Ising Model
• Ising model in {0,1} representation
• Mean parameters
• For fully disconnected graph F
• The dual decomposes into a sum, one term for each node

Example: Naïve Mean Field for Ising Model
• Mean field problem
• The same objective function as in the free-energy-based approach
• The naïve mean field update equations
• Lower bound on the log partition function
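The update equations and the bound can be tried on a tiny example. The sketch below assumes the {0, 1} Ising form p(x) ∝ exp(Σ θ_s x_s + Σ θ_st x_s x_t) and the standard naïve mean field objective Σ θ_s μ_s + Σ θ_st μ_s μ_t + Σ H(μ_s), whose coordinate-wise maximizer is μ_s = sigmoid(θ_s + Σ_t θ_st μ_t). The 3-node cycle and its parameters are made up for illustration.

```python
import itertools
import math

# Hypothetical Ising model on a 3-node cycle, x_s in {0, 1}.
theta_node = [0.5, -0.3, 0.2]
theta_edge = {(0, 1): 1.0, (1, 2): -0.8, (0, 2): 0.6}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_entropy(m):
    """Entropy of a Bernoulli(m) variable, with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in (m, 1.0 - m) if p > 0.0)


def objective(mu):
    """Naive mean field objective: expected energy plus sum of node entropies."""
    val = sum(th * m for th, m in zip(theta_node, mu))
    val += sum(w * mu[s] * mu[t] for (s, t), w in theta_edge.items())
    return val + sum(binary_entropy(m) for m in mu)


def naive_mean_field(sweeps=200):
    """Coordinate ascent: update one mu_s at a time, holding the others fixed."""
    mu = [0.5] * len(theta_node)
    for _ in range(sweeps):
        for s in range(len(mu)):
            field = theta_node[s]
            for (a, b), w in theta_edge.items():
                if a == s:
                    field += w * mu[b]
                elif b == s:
                    field += w * mu[a]
            mu[s] = sigmoid(field)
    return mu


def log_partition():
    """Exact log Z by enumerating all 2**3 configurations."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(theta_node)):
        energy = sum(th * xi for th, xi in zip(theta_node, x))
        energy += sum(w * x[s] * x[t] for (s, t), w in theta_edge.items())
        total += math.exp(energy)
    return math.log(total)


mu_mf = naive_mean_field()
lower_bound = objective(mu_mf)
log_z = log_partition()
print(mu_mf)
print(lower_bound, log_z)
```

The mean field value never exceeds the exact log partition function, because the optimization is over a subset (product distributions) of the full marginal polytope with the exact entropy.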

Geometry of Mean Field
• Mean field optimization is always non-convex for any exponential family in which the state space is finite
• The marginal polytope is a convex hull
• The mean field inner approximation contains all the extreme points (if it is a strict subset, then it must be non-convex)
• Example: two-node Ising
  • Parabolic cross section along τ_1 = τ_2
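The non-convexity can be seen directly in the two-node example. The sketch below assumes the {0, 1} encoding with mean parameters (τ1, τ2, τ12) and a fully disconnected subgraph F, so the mean field set is {(τ1, τ2, τ1·τ2)}. Two vertices of the marginal polytope belong to this set, but their midpoint does not. The function names are hypothetical.

```python
def in_marginal_polytope(tau, tol=1e-12):
    """Half-plane description of the two-node marginal polytope."""
    t1, t2, t12 = tau
    return (t12 >= -tol
            and t1 - t12 >= -tol
            and t2 - t12 >= -tol
            and 1.0 + t12 - t1 - t2 >= -tol)


def in_mean_field_set(tau, tol=1e-12):
    """Naive mean field set: product distributions, i.e. tau12 = tau1 * tau2."""
    t1, t2, t12 = tau
    return 0.0 <= t1 <= 1.0 and 0.0 <= t2 <= 1.0 and abs(t12 - t1 * t2) <= tol


# Two extreme points of the marginal polytope (x = (0, 0) and x = (1, 1)).
a = (0.0, 0.0, 0.0)
b = (1.0, 1.0, 1.0)
mid = tuple(0.5 * (u + v) for u, v in zip(a, b))

print(in_mean_field_set(a), in_mean_field_set(b))       # both vertices are product distributions
print(in_marginal_polytope(mid), in_mean_field_set(mid))  # midpoint is realizable but not a product

# Cross section along tau1 = tau2 = t: the mean field set is the parabola tau12 = t**2.
cross_section = [(t / 4.0, (t / 4.0) ** 2) for t in range(5)]
print(cross_section)
```

Since the midpoint of two points in the mean field set falls outside it, the set is not convex, even though it contains every extreme point of the marginal polytope.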

Summary
• Variational methods in general turn inference into an optimization problem via exponential families and convex duality
• The exact variational principle is intractable to solve; two approximations:
  • Either an inner or an outer bound to the marginal polytope
  • Various approximations to the entropy function
• Mean-field: non-convex inner bound and exact form of entropy
• BP: polyhedral outer bound and non-convex Bethe approximation