Meta Particle Flow for Sequential Bayesian Inference
TRANSCRIPT
![Page 1: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/1.jpg)
Meta Particle Flow for Sequential Bayesian Inference
Le Song
Associate Professor, CSE
Associate Director, Machine Learning Center
Georgia Institute of Technology
Joint work with Xinshi Chen and Hanjun Dai
![Page 2: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/2.jpg)
Bayesian Inference
Infer the posterior distribution of unknown parameter 𝒙 given
• Prior distribution 𝜋(𝑥)
• Likelihood function 𝑝(𝑜|𝑥)
• Observations 𝑜1, 𝑜2, … , 𝑜𝑚
$$p(x \mid o_{1:m}) = \frac{1}{z}\,\pi(x)\prod_{i=1}^{m} p(o_i \mid x), \qquad z = \int \pi(x)\prod_{i=1}^{m} p(o_i \mid x)\,dx$$
A challenging computational problem for high-dimensional $x$
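As a concrete illustration of the formula above, here is a minimal numerical sketch (not from the talk) that evaluates the unnormalized posterior and the normalizer $z$ on a grid for a 1D toy model; the prior, likelihood, and observations are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of the Bayes formula above for a 1D toy model:
# prior pi(x) = N(0, 1), likelihood p(o|x) = N(x, 1), evaluated on a grid.
grid = np.linspace(-5, 5, 1001)
dx = grid[1] - grid[0]

def gauss(v, mean, var):
    return np.exp(-(v - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

prior = gauss(grid, 0.0, 1.0)
obs = np.array([0.5, -0.2, 0.1])           # observations o_1, ..., o_m

# Unnormalized posterior: pi(x) * prod_i p(o_i | x)
unnorm = prior * np.prod([gauss(o, grid, 1.0) for o in obs], axis=0)
z = np.sum(unnorm) * dx                    # numerical integral for z
posterior = unnorm / z
print("posterior mean:", np.sum(grid * posterior) * dx)
```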
![Page 3: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/3.jpg)
Gaussian Mixture Model
• prior $(x_1, x_2) \sim \pi(x) = \mathcal{N}(0, I)$
• observations $o \mid x_1, x_2 \sim p(o \mid x_1, x_2) = \frac{1}{2}\mathcal{N}(x_1, 1) + \frac{1}{2}\mathcal{N}(x_1 + x_2, 1)$
• With $(x_1, x_2) = (1, -2)$, the resulting posterior will have two modes: $(1, -2)$ and $(-1, 2)$
Challenges in Bayesian Inference
Fitting even a single posterior $p(x \mid o_{1:m})$ is already hard.
[Results reported by Dai et al. (2016)]
[Figure: posterior density estimates. (a) True posterior, (b) Stochastic Variational Inference, (c) Stochastic Gradient Langevin Dynamics, (d) Gibbs Sampling, (e) One-pass SMC]
![Page 4: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/4.jpg)
Fundamental Principle for Machine Learning
Lots of applications in machine learning
• Hidden Markov model
• Topic modeling
• Uncertainty quantification
• $o_{t+1} = P\,o_{t-\tau}\exp(-o_{t-\tau}/y_0)\,e_t + o_t \exp(-\delta\,\epsilon_t)$
• $e_t \sim \Gamma(\sigma_p^{-2}, \sigma_p^2)$, $\epsilon_t \sim \Gamma(\sigma_d^{-2}, \sigma_d^2)$
• parameters $x = (P, y_0, \sigma_p^2, \sigma_d^2, \tau, \delta)$
[Graphical models: LDA plate diagram with topic $z$ and word $o$; hidden Markov model with true locations $x_1, x_2, \ldots, x_m$ and sensor measurements $o_1, o_2, \ldots, o_m$; static parameter $x$ with observations]
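For readers who want to see the population model above in motion, below is an illustrative simulation; the parameter values are assumptions chosen for demonstration, not values from the talk.

```python
import numpy as np

# Illustrative simulation of the population dynamics above.
rng = np.random.default_rng(0)
P, y0, sig_p2, sig_d2, tau, delta = 6.5, 400.0, 0.1, 0.1, 14, 0.16

T = 200
o = np.full(T, 180.0)                       # constant initial history
for t in range(tau, T - 1):
    e_t = rng.gamma(1.0 / sig_p2, sig_p2)   # e_t   ~ Gamma(sigma_p^-2, sigma_p^2)
    eps = rng.gamma(1.0 / sig_d2, sig_d2)   # eps_t ~ Gamma(sigma_d^-2, sigma_d^2)
    o[t + 1] = (P * o[t - tau] * np.exp(-o[t - tau] / y0) * e_t
                + o[t] * np.exp(-delta * eps))
print(o[-5:])
```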
![Page 5: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/5.jpg)
Sequential Bayesian Inference
[Diagram: the prior $\pi(x)$ is updated through observations $o_1, o_2, \ldots, o_m$ into the sequence of posteriors $p(x \mid o_1), p(x \mid o_{1:2}), \ldots, p(x \mid o_{1:m})$]
$$\underbrace{p(x \mid o_{1:m})}_{\text{updated posterior}} \;\propto\; \underbrace{p(x \mid o_{1:m-1})}_{\text{current posterior}}\;\underbrace{p(o_m \mid x)}_{\text{likelihood}}$$
Online Bayesian Inference
• Observations 𝑜1, 𝑜2, … , 𝑜𝑚 arrive sequentially
An ideal algorithm should:
• Efficiently update $p(x \mid o_{1:m})$ to $p(x \mid o_{1:m+1})$ when $o_{m+1}$ is observed
• Without storing all historical observations $o_1, o_2, \ldots, o_m$
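A minimal sketch of this recursion on a 1D grid, assuming a Gaussian prior and likelihood: only the current posterior is kept in memory, and each new observation triggers one multiply-and-renormalize step.

```python
import numpy as np

# Sketch of the recursion p(x|o_{1:m}) ∝ p(x|o_{1:m-1}) p(o_m|x) on a grid:
# only the current posterior is stored, never the past observations.
grid = np.linspace(-5, 5, 1001)
dx = grid[1] - grid[0]
posterior = np.exp(-grid**2 / 2)            # start from prior N(0,1), unnormalized

def likelihood(o, x):                       # p(o|x) = N(o; x, 1), up to a constant
    return np.exp(-(o - x) ** 2 / 2)

for o_m in [0.5, -0.2, 0.1, 0.8]:           # observations arrive one at a time
    posterior = posterior * likelihood(o_m, grid)   # multiply in new likelihood
    posterior /= posterior.sum() * dx               # renormalize
print("online posterior mean:", (grid * posterior).sum() * dx)
```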
![Page 6: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/6.jpg)
Related Work
• MCMC
– requires a complete scan of the data
• Variational Inference (VI)
– requires re-optimization for every new observation
• Stochastic approximate inference
– prescribed algorithms to optimize the final posterior $p(x \mid o_{1:M})$
– cannot exploit the structure of the sequential inference problem
![Page 7: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/7.jpg)
Related Work
Sequential Monte Carlo (Doucet et al., 2001; Balakrishnan & Madigan, 2006)
– the state of the art for online Bayesian inference
– but suffers from the path degeneracy problem in high dimensions
– rejuvenation steps can help but violate the online constraints (Canini et al., 2009)
Can we learn to perform efficient and effective sequential Bayesian updates?
![Page 8: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/8.jpg)
Operator View
Kernel Bayes’ Rule (Fukumizu et al., 2012)
– the posterior is represented as an embedding $\mu_m = \mathbb{E}_{p(x \mid o_{1:m})}[\phi(x)]$
– the update maps the current embedding to the updated embedding: $\mu_{m+1} = \mathcal{K}(\mu_m, o_{m+1})$
– views the Bayes update as an operator in a reproducing kernel Hilbert space (RKHS)
– conceptually appealing but limited in practice
![Page 9: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/9.jpg)
Our Approach: Bayesian Inference as Particle Flow
[Diagram: $\mathcal{X}_0 = \{x_0^1, \ldots, x_0^N\} \sim \pi(x)$ is transported via $x_1^n = x_0^n + \int_0^T f(\mathcal{X}_0, o_1, x(t))\,dt$ into $\mathcal{X}_1 = \{x_1^1, \ldots, x_1^N\} \sim p(x \mid o_1)$]
Particle Flow
• Start with $N$ particles $\mathcal{X}_0 = \{x_0^1, \ldots, x_0^N\}$, sampled i.i.d. from the prior $\pi(x)$
• Transport the particles to the next posterior via the solution of an initial value problem (IVP):
$$\frac{dx}{dt} = f(\mathcal{X}_0, o_1, x(t)), \quad \forall t \in [0, T] \;\text{ and }\; x(0) = x_0^n \;\Longrightarrow\; \text{solution } x_1^n = x(T)$$
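A minimal sketch of the transport step, assuming 1D particles and a hand-written placeholder velocity `f` (the actual MPF velocity is a learned network): particles are pushed through the IVP with explicit Euler steps.

```python
import numpy as np

# Sketch of the IVP transport: N particles from the prior are pushed through
# dx/dt = f(X0, o1, x(t)) with Euler steps.
rng = np.random.default_rng(0)
N, T, steps = 500, 1.0, 100
X0 = rng.normal(0.0, 1.0, size=N)          # particles ~ prior N(0,1)

def f(X0, o1, x):
    # placeholder velocity: drift toward the observation (illustration only)
    return (o1 - x) * 0.5

o1, dt = 0.8, T / steps
x = X0.copy()
for _ in range(steps):
    x = x + dt * f(X0, o1, x)              # one Euler step of the IVP
X1 = x                                      # particles approximating p(x|o1)
print("particle mean after flow:", X1.mean())
```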
![Page 10: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/10.jpg)
Flow Property
• The Continuity Equation expresses the law of local conservation of mass:
– mass can neither be created nor destroyed
– nor can it 'teleport' from one place to another
$$\frac{\partial q(x, t)}{\partial t} = -\nabla_x \cdot (q f)$$
• Theorem. If $\frac{dx}{dt} = f$, then the change in log-density follows the differential equation
$$\frac{d \log q(x, t)}{dt} = -\nabla_x \cdot f$$
• Notation
– $\frac{dq}{dt}$ is the material derivative, the rate of change of $q$ for a given particle as it moves along its trajectory $x = x(t)$
– $\frac{\partial q}{\partial t}$ is the partial derivative, the rate of change of $q$ at a particular point $x$
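The theorem can be checked numerically in 1D. In the sketch below (an illustration, not from the talk), $f(x) = -x$ with $q(\cdot, 0) = \mathcal{N}(0, 1)$, so $q(\cdot, t) = \mathcal{N}(0, e^{-2t})$ in closed form; integrating $d\log q/dt = -\nabla_x \cdot f$ along a trajectory reproduces the exact log-density.

```python
import numpy as np

# Numerical check: for f(x) = -x, the flow x(t) = x0*e^{-t} has density
# q(., t) = N(0, e^{-2t}), and along each trajectory d log q/dt = -div f = 1.
x0 = 1.3
dt, steps = 1e-3, 1000
x, logq = x0, -0.5 * np.log(2 * np.pi) - 0.5 * x0**2   # log N(x0; 0, 1)

def f(x):
    return -x

for _ in range(steps):
    h = 1e-5
    div = (f(x + h) - f(x - h)) / (2 * h)   # divergence by finite differences
    x = x + dt * f(x)                        # Euler step of dx/dt = f
    logq = logq - dt * div                   # d log q/dt = -div f

t = dt * steps
var_t = np.exp(-2 * t)
exact = -0.5 * np.log(2 * np.pi * var_t) - x**2 / (2 * var_t)
print(logq, exact)                           # the two should nearly agree
```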
![Page 11: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/11.jpg)
Particle Flow for Sequential Bayesian Inference
$$x_{m+1}^n = x_m^n + \int_0^T f(\mathcal{X}_m, o_{m+1}, x(t))\,dt$$
$$-\log p_{m+1}^n = -\log p_m^n + \int_0^T \nabla_x \cdot f(\mathcal{X}_m, o_{m+1}, x(t))\,dt$$
[Diagram: $\mathcal{X}_0 = \{x_0^1, \ldots, x_0^N\} \xrightarrow{\int_0^T f(\mathcal{X}_0, o_1, x(t))\,dt} \mathcal{X}_1 = \{x_1^1, \ldots, x_1^N\} \xrightarrow{\int_0^T f(\mathcal{X}_1, o_2, x(t))\,dt} \mathcal{X}_2 = \{x_2^1, \ldots, x_2^N\} \xrightarrow{\int_0^T f(\mathcal{X}_2, o_3, x(t))\,dt} \cdots$]
• Other ODE approaches (e.g., the Neural ODE of Chen et al., 2018) are not designed for the sequential case.
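A sketch of the sequential recursion, again with a hand-written placeholder velocity standing in for the learned network: the same `f` is reused for every Bayes update, conditioned on the current particle set $\mathcal{X}_m$ and the new observation.

```python
import numpy as np

# Sketch of the sequential recursion: the SAME flow f is reused for every
# update, with the current particle set X_m as its conditioning input.
rng = np.random.default_rng(1)
N, steps, dt = 500, 50, 0.02

def f(X_m, o, x):
    # placeholder: pull particles toward a compromise of o and X_m's mean
    return 0.5 * (o - x) + 0.1 * (X_m.mean() - x)

X = rng.normal(0.0, 1.0, size=N)            # X_0 ~ prior
for o in [0.8, 0.3, 1.1]:                   # o_1, o_2, o_3 arrive in sequence
    X_m = X.copy()                          # freeze the conditioning set
    for _ in range(steps):
        X = X + dt * f(X_m, o, X)           # Euler integration of the flow
    print("posterior mean estimate:", X.mean())
```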
![Page 12: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/12.jpg)
Does a Shared Flow Velocity 𝒇 Exist?
A simple Gaussian Example
• Prior $\pi(x) = \mathcal{N}(0, \sigma_0)$, likelihood $p(o \mid x) = \mathcal{N}(x, \sigma)$, observation $o = 0$
• $\Longrightarrow$ posterior $p(x \mid o = 0) = \mathcal{N}\!\left(0, \frac{\sigma \cdot \sigma_0}{\sigma + \sigma_0}\right)$
• Does a shared $f$ exist for priors with different $\sigma_0$? What form does it take?
– E.g., an $f$ of the form $f(o, x(t))$ won't be able to handle different $\sigma_0$.
$$x(T) = x(0) + \int_0^T f(\mathit{inputs})\,dt$$
[Diagram: $x(0) \sim \pi(x)$ flows through $x(t)$ to $x(T) \sim p(x \mid o_1)$]
Does a shared flow velocity 𝑓 exist for different Bayesian inference tasks involving different priors and different observations?
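The Gaussian example can be checked in a few lines. For zero-mean Gaussians, an exact transport is the scaling $x \mapsto \sqrt{\text{posterior var}/\sigma_0}\,x$, which depends on $\sigma_0$; this is the sense in which a velocity of the form $f(o, x(t))$ alone cannot serve all priors. (The scaling map is a standard fact used for illustration, not the MPF construction.)

```python
import numpy as np

# With prior N(0, s0), likelihood N(x, s), and o = 0, the posterior is
# N(0, s*s0/(s+s0)). The exact transport below is s0-dependent.
rng = np.random.default_rng(0)
s, N = 1.0, 200_000
for s0 in [0.5, 1.0, 2.0]:
    post_var = s * s0 / (s + s0)
    x = rng.normal(0.0, np.sqrt(s0), size=N)        # samples from the prior
    x = np.sqrt(post_var / s0) * x                  # s0-dependent scaling
    print(f"s0={s0}: sample var={x.var():.3f}  target={post_var:.3f}")
```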
![Page 13: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/13.jpg)
Existence: Connection to Stochastic Flow
• Langevin dynamics is a stochastic process
$$dx(t) = \nabla_x \log\big(\pi(x)\,p(o \mid x)\big)\,dt + \sqrt{2}\,dw(t),$$
where $dw(t)$ is a standard Brownian motion.
• Property. If the potential function $\Psi(x) := -\log\big(\pi(x)\,p(o \mid x)\big)$ is smooth and $e^{-\Psi} \in L^1(\mathbb{R}^d)$, the Fokker-Planck equation has a unique stationary solution in the form of the Gibbs distribution,
$$q(x, \infty) = \frac{e^{-\Psi}}{Z} = \frac{\pi(x)\,p(o \mid x)}{Z} = p(x \mid o)$$
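For concreteness, here is a minimal Euler-Maruyama discretization of the Langevin dynamics above, assuming the 1D Gaussian example from earlier (prior $\mathcal{N}(0,1)$, likelihood $\mathcal{N}(x,1)$, $o = 0$), whose posterior is $\mathcal{N}(0, 1/2)$.

```python
import numpy as np

# Euler-Maruyama simulation of dx = grad log(pi(x) p(o|x)) dt + sqrt(2) dw.
rng = np.random.default_rng(0)
o, dt, steps, N = 0.0, 1e-3, 20_000, 2000

def grad_log_post(x):
    # grad_x log(pi(x) p(o|x)) = -x + (o - x) for unit-variance Gaussians
    return -x + (o - x)

x = rng.normal(0.0, 1.0, size=N)
for _ in range(steps):
    x = x + dt * grad_log_post(x) + np.sqrt(2 * dt) * rng.normal(size=N)
print("stationary variance ~", x.var(), "(target 0.5)")
```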
![Page 14: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/14.jpg)
Existence: Connection to Stochastic Flow
• The probability density $q(x, t)$ of $x(t)$ follows a deterministic evolution according to the Fokker-Planck equation
$$\frac{\partial q}{\partial t} = -\nabla_x \cdot \big(q\,\nabla_x \log(\pi(x)\,p(o \mid x))\big) + \Delta_x q(x, t) = -\nabla_x \cdot \big(q\,\underbrace{(\nabla_x \log(\pi(x)\,p(o \mid x)) - \nabla_x \log q(x, t))}_{f}\big),$$
which is in the form of the Continuity Equation.
• Theorem. When the deterministic transformation of the random variable $x(t)$ follows
$$\frac{dx}{dt} = \nabla_x \log\big(\pi(x)\,p(o \mid x)\big) - \nabla_x \log q(x, t),$$
its probability density $q(x, t)$ converges to the posterior $p(x \mid o)$ as $t \to \infty$.
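A sketch of this deterministic counterpart for the same 1D Gaussian example: since $q(x, t)$ remains Gaussian here, $\nabla_x \log q$ can be estimated from the particles' sample mean and variance; this Gaussian-family estimate is an illustration device, not the MPF parameterization.

```python
import numpy as np

# Deterministic flow dx/dt = grad log(pi p) - grad log q, with grad log q
# estimated under a Gaussian assumption on q(., t).
rng = np.random.default_rng(0)
o, dt, steps, N = 0.0, 1e-2, 2000, 5000
x = rng.normal(0.0, 1.0, size=N)            # q(., 0) = prior N(0, 1)

for _ in range(steps):
    mu, var = x.mean(), x.var()
    grad_log_q = -(x - mu) / var            # Gaussian-family estimate
    grad_log_pi_p = -x + (o - x)            # same potential as the SDE
    x = x + dt * (grad_log_pi_p - grad_log_q)
print("flow variance ~", x.var(), "(target 0.5)")
```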
![Page 15: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/15.jpg)
Existence: Closed-Loop to Open-Loop Conversion
Closed loop to open loop
• The Fokker-Planck equation leads to a closed-loop flow, which depends not only on $\pi(x)$ and $p(o \mid x)$ but also on the flow state $q(x, t)$.
• Is there an equivalent form, independent of $q(x, t)$, that achieves the same flow?
Optimization problem
$$\min_w\; d\big(q(x, \infty),\, p(x \mid o)\big) \quad \text{s.t.} \quad \frac{dx}{dt} = \nabla_x \log\big(\pi(x)\,p(o \mid x)\big) - w$$
• Positive answer: there exists a fixed and deterministic flow velocity $f$ of the form
$$\frac{dx}{dt} = \nabla_x \log\big(\pi(x)\,p(o \mid x)\big) - w^*\big(\pi(x),\, p(o \mid x),\, x,\, t\big)$$
![Page 16: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/16.jpg)
Parameterization
• $\pi(x) \Longrightarrow \mathcal{X}$
– use samples $\mathcal{X}$ as surrogates, via a feature-space embedding
– ideally, $\mu_{\mathcal{X}}$ is an injective mapping from the space of probability measures
• $p(o \mid x) \Longrightarrow (o, x(t))$ for a fixed likelihood function
• With two neural networks $\phi$ and $h$, the overall parameterization is
$$\mu_{\mathcal{X}}(p) := \int_{\mathcal{X}} \phi(x)\,p(x)\,dx \approx \frac{1}{N}\sum_{n=1}^{N} \phi(x^n), \quad x^n \sim \pi$$
$$f(\mathcal{X}, o, x(t), t) = h\!\left(\frac{1}{N}\sum_{n=1}^{N} \phi(x^n),\; o,\; x(t),\; t\right)$$
$$\frac{dx}{dt} = \nabla_x \log\big(\pi(x)\,p(o \mid x)\big) - w^*\big(\pi(x),\, p(o \mid x),\, x,\, t\big)$$
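A shape-level sketch of this parameterization, with tiny random MLPs standing in for the learned networks $\phi$ and $h$ (all sizes and weights are illustrative assumptions):

```python
import numpy as np

# phi embeds each particle, the mean embedding summarizes the current
# particle set, and h maps (mean embedding, o, x(t), t) to a velocity.
rng = np.random.default_rng(0)
d, d_emb, d_hid = 2, 16, 32                  # particle dim, embedding, hidden
W_phi = rng.normal(size=(d, d_emb)) * 0.1

def phi(x):                                  # particle-wise feature map
    return np.tanh(x @ W_phi)                # (N, d) -> (N, d_emb)

W1 = rng.normal(size=(d_emb + d + d + 1, d_hid)) * 0.1
W2 = rng.normal(size=(d_hid, d)) * 0.1

def f(X, o, x, t):
    mu = phi(X).mean(axis=0)                 # (1/N) sum_n phi(x^n)
    inp = np.concatenate([mu, o, x, [t]])    # h's inputs: (mu_X, o, x(t), t)
    return np.tanh(inp @ W1) @ W2            # velocity dx/dt in R^d

X = rng.normal(size=(100, d))                # current particles
print(f(X, o=np.zeros(d), x=X[0], t=0.3))    # velocity for one particle
```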
![Page 17: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/17.jpg)
Multi-task Framework
• Training set $\mathcal{D}_{train}$ for meta learning
– containing multiple inference tasks, with diverse priors and observations
• Each task $\mathcal{T} \in \mathcal{D}_{train}$ consists of a prior, a likelihood, and $M$ observations:
$$\mathcal{T} := \big(\pi(x),\; p(\cdot \mid x),\; (o_1, o_2, \ldots, o_M)\big)$$
Meta Learning
$$x_{m+1}^n = x_m^n + \int_0^T f(\mathcal{X}_m, o_{m+1}, x(t), t)\,dt$$
$$-\log q_{m+1}^n = -\log q_m^n + \int_0^T \nabla_x \cdot f(\mathcal{X}_m, o_{m+1}, x(t), t)\,dt$$
Loss Function
• Minimize the negative evidence lower bound (ELBO) for each task $\mathcal{T}$:
$$\mathcal{L}(\mathcal{T}) = \sum_{m=1}^{M} KL\big(q_m(x)\,\|\,p(x \mid o_{1:m})\big) = \sum_{m=1}^{M} \sum_{n=1}^{N} \big[\log q_m^n - \log p(x_m^n, o_{1:m})\big] + \text{const.}$$
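A sketch of how the loss could be accumulated from the quantities the flow already produces, assuming access to the particles $x_m^n$ and densities $\log q_m^n$ after each update and a task-specific `log_joint` for $\log p(x, o_{1:m})$; it averages rather than sums over particles, and is an illustration rather than the talk's implementation.

```python
import numpy as np

def task_loss(particles, log_q, observations, log_joint):
    # particles[m]: (N, d) after m+1 updates; log_q[m]: (N,) matching densities
    loss = 0.0
    for m in range(len(observations)):
        o_hist = observations[: m + 1]
        loss += np.mean(log_q[m] - log_joint(particles[m], o_hist))
    return loss                              # negative ELBO up to a constant

# Toy instantiation: 1D Gaussian model, two observations, N = 4 particles.
def log_joint(x, o_hist):                    # log pi(x) + sum_i log p(o_i|x)
    return -0.5 * x[:, 0] ** 2 - sum(0.5 * (o - x[:, 0]) ** 2 for o in o_hist)

parts = [np.ones((4, 1)) * 0.4, np.ones((4, 1)) * 0.5]
lq = [np.zeros(4), np.zeros(4)]
print(task_loss(parts, lq, [0.5, 0.3], log_joint))
```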
![Page 18: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/18.jpg)
Experiments: Benefit for High Dimension
Multivariate Gaussian Model
• prior $x \sim \mathcal{N}(\mu_x, \Sigma_x)$
• observations conditioned on $x$: $o \mid x \sim \mathcal{N}(x, \Sigma_o)$
Experiment Setting
• The training set contains only sequences of 10 observations, but a diverse set of prior distributions.
• The testing set contains 25 different sequences of 100 observations.
Result
• As the dimension of the model increases, our MPF operator has a larger advantage.
![Page 19: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/19.jpg)
Gaussian Mixture Model
• prior $x_1, x_2 \sim \mathcal{N}(0, 1)$
• observations $o \mid x_1, x_2 \sim \frac{1}{2}\mathcal{N}(x_1, 1) + \frac{1}{2}\mathcal{N}(x_1 + x_2, 1)$
• With $(x_1, x_2) = (1, -2)$, the resulting posterior will have two modes: $(1, -2)$ and $(-1, 2)$
Experiments: Benefit for Multimodal Posterior
Fitting even a single posterior $p(x \mid o_{1:m})$ is already hard.
[Results reported by Dai et al. (2016)]
[Figure: posterior density estimates. (a) True posterior, (b) Stochastic Variational Inference, (c) Stochastic Gradient Langevin Dynamics, (d) Gibbs Sampling, (e) One-pass SMC]
![Page 20: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/20.jpg)
Gaussian Mixture Model
• prior $x_1, x_2 \sim \mathcal{N}(0, 1)$
• observations $o \mid x_1, x_2 \sim \frac{1}{2}\mathcal{N}(x_1, 1) + \frac{1}{2}\mathcal{N}(x_1 + x_2, 1)$
• With $(x_1, x_2) = (1, -2)$, the resulting posterior will have two modes: $(1, -2)$ and $(-1, 2)$
Experiments: Benefit for Multimodal Posterior
Our more challenging experimental setting:
• The learned MPF operator is tested on sequences that are not observed in the training set.
• It needs to fit all intermediate posteriors $p(x \mid o_1), p(x \mid o_{1:2}), \ldots, p(x \mid o_{1:m})$.
[Figure: visualization of the evolution of the posterior density from left to right]
![Page 21: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/21.jpg)
Hidden Markov Model – Linear Dynamical System
Experiments: Hidden Markov Model
[Diagram: HMM with hidden states $x_1, x_2, \ldots, x_m$ and observations $o_1, o_2, \ldots, o_m$]
• $x_m = A\,x_{m-1} + \epsilon_m$, $\epsilon_m \sim \mathcal{N}(0, \Sigma_1)$
• $o_m = B\,x_m + \delta_m$, $\delta_m \sim \mathcal{N}(0, \Sigma_2)$
Transition sampling + MPF operator:
1. $x_m^n = A\,x_{m-1}^n + \epsilon_m$
2. $x_m^n = \mathcal{F}(\mathcal{X}_m, x_m^n, o_m)$: a Bayesian update from $p(x_m \mid o_{1:m-1})$ to $p(x_m \mid o_{1:m})$
Marginal posterior updates: $p(x_1 \mid o_1) \to p(x_2 \mid o_{1:2}) \to \cdots \to p(x_m \mid o_{1:m})$
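A sketch of the filtering recursion for the linear dynamical system, with a crude placeholder `mpf_update` standing in for the learned operator $\mathcal{F}$ (the matrices and noise levels are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2
A, B = 0.9 * np.eye(d), np.eye(d)
S1, S2 = 0.1 * np.eye(d), 0.1 * np.eye(d)

def mpf_update(X, o):
    return X + 0.5 * (o - X @ B.T)           # placeholder, not the learned F

N, M = 300, 20
X = rng.normal(size=(N, d))                  # particles for the initial state
x_true = np.zeros(d)
for m in range(M):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(d), S1)
    o_m = B @ x_true + rng.multivariate_normal(np.zeros(d), S2)
    # 1. transition sampling: p(x_{m-1}|o_{1:m-1}) -> p(x_m|o_{1:m-1})
    X = X @ A.T + rng.multivariate_normal(np.zeros(d), S1, size=N)
    # 2. Bayes update with o_m: p(x_m|o_{1:m-1}) -> p(x_m|o_{1:m})
    X = mpf_update(X, o_m)
print("final tracking error:", np.linalg.norm(X.mean(axis=0) - x_true))
```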
![Page 22: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/22.jpg)
Experiments: Generalization across Multitasks
Bayesian Logistic Regression on the MNIST dataset, 8 vs 6
• About 1.6M training images and 1932 testing images
• Each data point $o_m := (\phi_m, c_m)$ (feature, label)
• Logistic regression $y = \sigma(\theta^\top \phi_m)$
• Likelihood function $p(o_m \mid \theta) = y^{c_m}(1 - y)^{1 - c_m}$
Multi-task Environment
• We reduce the dimension of the images to 50 by PCA
• We rotate the first two components by an angle $\psi$ sampled from $[-15°, 15°]$
– The first two components account for more variability in the data
• Different rotation angle 𝜓 ⟹ different decision boundary ⟹ different tasks
Multi-task Training
• MPF operator will be learned from multiple training tasks with different 𝜓
• Use the learned MPF as an online-learning algorithm during testing
![Page 23: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/23.jpg)
Experiments: Generalization across Multitasks
Testing as Online learning:
(1) All algorithms start with a set of particles sampled from the prior;
(2) Each algorithm makes predictions for the encountered batch of 32 images;
(3) All algorithms observe the true labels of the encountered batch;
(4) Each algorithm updates its particles and then makes predictions for the next batch;
(predict → observe true labels → update particles → predict → observe true labels → update ……)
[Figures: accuracy $r_1$ and accuracy $r_2$]
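A sketch of this predict-observe-update loop, with stand-in features/labels and a crude gradient nudge as a placeholder for the learned MPF update:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, batch = 50, 100, 32
theta = rng.normal(size=(N, d))              # particles over the weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update_particles(theta, feats, labels):
    y = sigmoid(theta @ feats.T)             # (N, batch) predictions
    grad = (labels - y) @ feats              # gradient of the log-likelihood
    return theta + 0.05 * grad               # placeholder Bayes update

accs = []
for _ in range(10):                          # stream of batches
    feats = rng.normal(size=(batch, d))      # stand-in features
    labels = (feats[:, 0] > 0).astype(float) # stand-in labels
    probs = sigmoid(theta @ feats.T).mean(axis=0)   # 1. predict (posterior avg)
    accs.append(((probs > 0.5) == labels).mean())   # accuracy on this batch
    theta = update_particles(theta, feats, labels)  # 2./3. observe + update
print("per-batch accuracy:", np.round(accs, 2))
```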
![Page 24: Meta Particle Flow for Sequential Bayesian Inference](https://reader033.vdocuments.us/reader033/viewer/2022052723/5f0e40487e708231d43e565e/html5/thumbnails/24.jpg)
Conclusion and Future Work
• An ODE-based Bayes operator for sequential Bayesian inference
– Existence
– Parameterization
– Meta-learning framework
• Future work
– Architecture
– Stable flow
– Improve training
[Diagram: $\mathcal{X}_0 = \{x_0^1, \ldots, x_0^N\} \sim \pi(x)$ is transported via $x_1^n = x_0^n + \int_0^T f(\mathcal{X}_0, o_1, x(t), t)\,dt$ into $\mathcal{X}_1 = \{x_1^1, \ldots, x_1^N\} \sim p(x \mid o_1)$]