TRANSCRIPT
John Cunningham and David Knowles
Machine Learning RCC, 08 December 2011
Approximate Inference
Outline
• Motivation
• Taxonomy
• Summations
• Estimators
• Easier Integrals
• Summary
Probabilistic Inference
• Bayes Rule: p(x | y) = p(y | x) p(x) / p(y)
• (frequentist/statistical inference)
• (Bayesian/non-Bayesian distinction)
• (conjugate models)
• (enumerable simple cases)
Just an Integral
• Many (most?) problems of interest in inference can be written as an integral of the type reconstructed below.
• Examples:
• Posterior mean and moments
• Data likelihood and model selection
• Prediction
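The integrals on this slide were rendered as images in the original deck; the following is a hedged reconstruction of the standard forms being referred to (the notation x for latent variables, y for data, and M for a model is an assumption, not the slides' own):

```latex
% Generic object of interest: an expectation under the posterior
\mathcal{I} = \int f(x)\, p(x \mid y)\, dx

% Posterior mean and moments
\mathbb{E}[x \mid y] = \int x\, p(x \mid y)\, dx

% Data likelihood and model selection (the evidence for model M)
p(y \mid \mathcal{M}) = \int p(y \mid x, \mathcal{M})\, p(x \mid \mathcal{M})\, dx

% Prediction
p(y^{*} \mid y) = \int p(y^{*} \mid x)\, p(x \mid y)\, dx
```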
Central Object of Focus
• Why not...
• ...message passing on the factor graph?
• explains {BP, VB, EP, Gibbs, etc.} nicely
• abstracts approximate inference to message calculation
• mechanistic, not actually the problem we are trying to solve
• ...the posterior?
• pretty much the same thing, but again not often the core problem
Fool’s Errand
• A huge field
• Bishop, PRML: ~100 pages
• MacKay, Information Theory, Inference, ...: ~180 pages
• Murphy, ML: A Probabilistic Perspective: ~110 pages
• MLSS: ~half a day
• Scope of this talk:
• tutorial view of the field
• incorporate by reference where possible
• details where (hopefully) valuable
Outline
• Motivation
• Taxonomy
• Summations
• Estimators
• Easier Integrals
• Summary
Approximate Inference Taxonomy
• “Replace hard integrals with easier integrals”
• Message Passing on Factor Graph
• Central problem: how to find the approximating distribution
• VB, EP, etc.
• Note (cheat): also “replace hard sums with easier sums”: BP, LBP, etc.
• “Replace hard integrals with summations”
• Sampling methods
• Central problem: how to sample
• Monte Carlo, MCMC, Gibbs, etc.
• “Replace hard integrals with estimators”
• “Non-Bayesian” methods
• Central problem: how to find the point estimate
• MAP, ML, Laplace, Nested Laplace, etc.
Deterministic Methods vs. Random Methods
Outline
• Motivation
• Taxonomy
• Summations
• Estimators
• Easier Integrals
• Summary
Approximate Inference Taxonomy
• “Replace hard integrals with summations”
• Sampling methods
• Central problem: how to sample
• Monte Carlo, MCMC, Gibbs, etc.
Summations
• Two basic types: Sampling and MCMC
• “Instead of choosing [points] randomly, then weighting them..., we choose [points] with a probability... and weight them evenly.” - Metropolis et al (1953).
Summations
• Sampling:
• “pick an arbitrary point and weight it by what you care about.”
• MC, importance, rejection.
• MH/MCMC:
• “pick a point from what you care about and weight it evenly.”
• MH, MCMC, AIS, Gibbs, HMC, Slice Sampling, ESS, Hamiltonian MCMC, RML, ...
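To make the two strategies concrete, here is a minimal Python sketch (not from the talk; the bimodal target, the Gaussian proposal, and f(x) = x² are invented for illustration) estimating the same expectation by importance sampling ("pick an arbitrary point and weight it by what you care about") and by Metropolis-Hastings ("pick a point from what you care about and weight it evenly"):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target (unnormalised): a two-component Gaussian mixture.
# Everything below is an illustrative sketch, not code from the talk.
def log_p_tilde(x):
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def f(x):            # quantity whose expectation E_p[f(x)] we want
    return x ** 2

# --- Sampling view: weight arbitrary points by what you care about.
# Importance sampling with a broad Gaussian proposal q(x) = N(0, 4^2).
def importance_sampling(n=20000):
    x = rng.normal(0.0, 4.0, size=n)
    log_q = -0.5 * (x / 4.0) ** 2 - np.log(4.0 * np.sqrt(2 * np.pi))
    log_w = log_p_tilde(x) - log_q
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                      # self-normalised weights
    return np.sum(w * f(x))

# --- MCMC view: draw points from what you care about, weight them evenly.
# Random-walk Metropolis-Hastings targeting p(x) ∝ exp(log_p_tilde(x)).
def metropolis_hastings(n=20000, step=1.0):
    x = 0.0
    samples = np.empty(n)
    for i in range(n):
        prop = x + step * rng.normal()
        if np.log(rng.uniform()) < log_p_tilde(prop) - log_p_tilde(x):
            x = prop                  # accept
        samples[i] = x
    return f(samples).mean()          # samples weighted evenly

print("importance sampling:", importance_sampling())
print("metropolis-hastings:", metropolis_hastings())
```

For this toy target both estimates should land near E[x²] = 5; the practical contrast is in scaling, since a fixed proposal degrades quickly in higher dimensions while MCMC concentrates effort where the target has mass.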
Big Topic, Incorporated by Reference
• Iain Murray’s MLSS lectures: http://videolectures.net/mlss09uk_murray_mcmc/
Outline
• Motivation
• Taxonomy
• Summations
• Estimators
• Easier Integrals
• Summary
Approximate Inference Taxonomy
• “Replace hard integrals with estimators”
• “Non-Bayesian” methods
• Central problem: how to find the point estimate
• MAP, ML, Laplace, Nested Laplace, etc.
• Laplace
• MAP
• Nested Laplace
• Rue and Martino (2009), “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations”, JRSSB.
• ...but see Cseke and Heskes (2011), “Approximate marginals in latent Gaussian models”, JMLR.
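To illustrate the estimator family, here is a minimal sketch (my own toy example, a one-dimensional Bayesian logistic-regression weight; not from the talk) of MAP followed by a Laplace approximation, including the Laplace estimate of the evidence integral:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch of MAP + Laplace (assumed toy model, not from the talk).
# Model: 1-D logistic regression weight w with a N(0, 1) prior.
x_data = np.array([-2.0, -1.0, 0.5, 1.5, 2.0])
y_data = np.array([0, 0, 1, 1, 1])          # binary labels

def neg_log_joint(w):
    w = np.atleast_1d(w)[0]
    logits = w * x_data
    log_lik = np.sum(y_data * logits - np.log1p(np.exp(logits)))
    log_prior = -0.5 * w ** 2 - 0.5 * np.log(2 * np.pi)
    return -(log_lik + log_prior)

# 1. MAP: replace the integral over w with a point estimate (the mode).
opt = minimize(neg_log_joint, x0=np.zeros(1))
w_map = opt.x[0]

# 2. Laplace: Gaussian centred at the mode, variance = inverse Hessian of the
#    negative log joint (computed here by a finite difference).
eps = 1e-4
hess = (neg_log_joint(w_map + eps) - 2 * neg_log_joint(w_map)
        + neg_log_joint(w_map - eps)) / eps ** 2
var_laplace = 1.0 / hess

# 3. Laplace estimate of the evidence p(y) = ∫ p(y|w) p(w) dw:
#    log p(y) ≈ -E(w_MAP) + 0.5*log(2π) - 0.5*log|H|, with E = neg_log_joint.
log_evidence = -neg_log_joint(w_map) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(hess)

print(f"w_MAP = {w_map:.3f}, Laplace variance = {var_laplace:.3f}")
print(f"Laplace log-evidence ≈ {log_evidence:.3f}")
```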
Outline
• Motivation
• Taxonomy
• Summations
• Estimators
• Easier Integrals
• Summary
Approximate Inference Taxonomy
• “Replace hard integrals with easier integrals”
• Message Passing on Factor Graph
• Central problem: how to find the approximating distribution
• VB, EP, etc.
• Note (cheat): also “replace hard sums with easier sums”: BP, LBP, etc.
Message Passing on Factor Graph
Belief Propagation / Sum-product
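The belief-propagation slides were figures; as a stand-in, here is a minimal sum-product computation on a hypothetical three-variable chain (all factor values invented) showing how one large sum over the joint becomes small sums passed as messages:

```python
import numpy as np

# Minimal sum-product sketch (all factor values invented for illustration):
# a three-variable chain p(x1, x2, x3) ∝ f12(x1, x2) * f23(x2, x3),
# each variable with 3 states. BP turns one big sum over (x1, x2) into
# two small sums (messages) passed along the chain.
K = 3
rng = np.random.default_rng(0)
f12 = rng.uniform(0.5, 2.0, size=(K, K))   # pairwise factor on (x1, x2)
f23 = rng.uniform(0.5, 2.0, size=(K, K))   # pairwise factor on (x2, x3)

# Message from x1 towards x2: sum out x1.
m_12 = f12.sum(axis=0)                      # shape (K,)
# Message from x2 towards x3: weight f23 by the incoming message, sum out x2.
m_23 = m_12 @ f23                           # shape (K,)

p_x3 = m_23 / m_23.sum()                    # marginal of x3

# Brute-force check against the full joint.
joint = np.einsum('ij,jk->ijk', f12, f23)
assert np.allclose(p_x3, joint.sum(axis=(0, 1)) / joint.sum())
print("p(x3) =", p_x3)
```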
Approximate messages
Expectation Propagation (EP)
Instead, EP does this...
• A - form the “cavity”
• B - add a true factor and “project” (moment match)
At convergence, we have approximately this (the moment-matched approximation).
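To make the cavity/project step concrete, here is a one-factor sketch (a toy example of my own, not the talk's): a N(0, 1) prior multiplied by a step-function factor 1[x > 0], where the "project" step is a closed-form truncated-Gaussian moment match:

```python
import numpy as np
from scipy.stats import norm

# Illustrative EP-style update for a single factor (a sketch, not the talk's example).
# Model: prior p0(x) = N(0, 1), one "hard" factor t(x) = 1[x > 0].
# EP keeps a Gaussian approximation q(x) and a Gaussian approximate factor t~(x).

# Current approximation q(x) = N(mu, var); start from the prior.
mu, var = 0.0, 1.0
# Current approximate factor t~(x) (initially uniform: zero precision).
t_prec, t_prec_mean = 0.0, 0.0

# A: form the cavity q\i(x) = q(x) / t~(x)  (subtract natural parameters).
cav_prec = 1.0 / var - t_prec
cav_mean = (mu / var - t_prec_mean) / cav_prec
cav_var = 1.0 / cav_prec

# B: add the TRUE factor and PROJECT: moment-match a Gaussian to the
#    tilted distribution q\i(x) * 1[x > 0]  (a truncated Gaussian).
z = cav_mean / np.sqrt(cav_var)
alpha = norm.pdf(z) / norm.cdf(z)               # standard truncated-Gaussian results
new_mu = cav_mean + np.sqrt(cav_var) * alpha
new_var = cav_var * (1.0 - alpha * (alpha + z))

# Update the approximate factor so that q = cavity * t~ has the matched moments.
t_prec = 1.0 / new_var - cav_prec
t_prec_mean = new_mu / new_var - cav_mean / cav_var
mu, var = new_mu, new_var

print(f"moment-matched q: mean {mu:.3f}, var {var:.3f}")   # ≈ 0.798, 0.363
```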
Beyond Simple EP
Variational Bayes / Variational Message Passing
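The VB/VMP slides were largely figures; as a stand-in, here is a minimal mean-field coordinate-ascent sketch in the spirit of Bishop §10.1.3 (the Gaussian model with unknown mean and precision, and all hyperparameter values, are assumptions for illustration):

```python
import numpy as np

# Minimal mean-field VB sketch (assumed example, in the spirit of Bishop §10.1.3):
# data y_i ~ N(mu, 1/tau), priors mu | tau ~ N(0, 1/(lam0*tau)), tau ~ Gamma(a0, b0).
# Factorised approximation q(mu, tau) = q(mu) q(tau), updated by coordinate ascent.
rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=50)
N, ybar = len(y), y.mean()

lam0, a0, b0 = 1.0, 1.0, 1.0          # prior hyperparameters
E_tau = a0 / b0                        # initial guess for E[tau]

for _ in range(50):                    # iterate the two update (message) equations
    # q(mu) = N(m_N, 1/kappa_N)
    kappa_N = (lam0 + N) * E_tau
    m_N = N * ybar / (lam0 + N)
    E_mu, E_mu2 = m_N, m_N ** 2 + 1.0 / kappa_N

    # q(tau) = Gamma(a_N, b_N)
    a_N = a0 + (N + 1) / 2.0
    b_N = b0 + 0.5 * (lam0 * E_mu2 +
                      np.sum(y ** 2) - 2.0 * E_mu * np.sum(y) + N * E_mu2)
    E_tau = a_N / b_N

print(f"q(mu):  mean {m_N:.3f}, var {1 / kappa_N:.4f}")
print(f"q(tau): mean {E_tau:.3f}")
```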
Summary of Message Passing Perspective
Things to be aware of
• Exclusive (VB, mode-seeking) vs. inclusive (EP) KL, and the consequences for multimodality (see the sketch below)
• Damping for EP
• Power EP
• More structured approximations (GBP, tree EP, structured VB)
• Connection to EM
• Infer.NET
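A small sketch of the first bullet above (toy bimodal target of my choosing, not from the talk): fitting one Gaussian by minimising the inclusive KL(p||q) reduces to moment matching and covers both modes, while minimising the exclusive KL(q||p) numerically locks onto a single mode:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Illustrative sketch: fit a single Gaussian q = N(m, s^2) to a bimodal p,
# minimising either direction of the KL divergence.
# p(x) = 0.5 N(x; -3, 1) + 0.5 N(x; 3, 1)
def p_pdf(x):
    return 0.5 * norm.pdf(x, -3, 1) + 0.5 * norm.pdf(x, 3, 1)

grid = np.linspace(-10, 10, 4001)
dx = grid[1] - grid[0]
p = p_pdf(grid)

# Inclusive KL(p || q): minimised by moment matching (the EP-style projection).
m_inc = np.sum(grid * p) * dx
v_inc = np.sum((grid - m_inc) ** 2 * p) * dx

# Exclusive KL(q || p): minimise numerically (the VB-style objective).
def kl_q_p(params):
    m, log_s = params
    q = norm.pdf(grid, m, np.exp(log_s))
    mask = q > 1e-12
    return np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask]))) * dx

opt = minimize(kl_q_p, x0=np.array([2.5, 0.0]))   # start near one mode
m_exc, s_exc = opt.x[0], np.exp(opt.x[1])

print(f"inclusive (EP-like): mean {m_inc:.2f}, std {np.sqrt(v_inc):.2f}")  # broad, covers both modes
print(f"exclusive (VB-like): mean {m_exc:.2f}, std {s_exc:.2f}")           # locks onto one mode
```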
Outline
• Motivation
• Taxonomy
• Summations
• Estimators
• Easier Integrals
• Summary
Approximate Inference Taxonomy
• “Replace hard integrals with easier integrals”
• Message Passing on Factor Graph
• Central problem: how to find the approximating distribution
• VB, EP, etc.
• Note (cheat): also “replace hard sums with easier sums”: BP, LBP, etc.
• “Replace hard integrals with summations”
• Sampling methods
• Central problem: how to sample
• Monte Carlo, MCMC, Gibbs, etc.
• “Replace hard integrals with estimators”
• “Non-Bayesian” methods
• Central problem: how to find the point estimate
• MAP, ML, Laplace, Nested Laplace, etc.
Summary of Features
• Summations (sampling): exact (eventually); fast/efficient in big-huge cases (at times the only option); poor for model selection; slow error convergence.
• Easier integrals (message passing): analytically useful; fits into many ML schemes (bounds); fast/efficient in small-medium cases; no exactness (ignores some features of the true integral).
• Estimators: quick and dirty (local; ignores many features of the true integral); often works well.
Conclusion
• Has many names and duplicate fields, but in the end is just numerical integration
• Disappointingly (necessarily?) fractured field
• Inherently problem-specific
Resources
• Books
• Bishop (2006), “Pattern Recognition and Machine Learning”, Chapters 10-11.
• Murphy (2012), “Machine Learning: A Probabilistic Perspective”, Chapters 18-22.
• Rasmussen and Williams (2006), “Gaussian Processes for Machine Learning”, Chapter 3 (for EP and Laplace).
• MacKay (2003), “Information Theory, Inference, and Learning Algorithms”, Part IV.
• Video
• MLSS 09: Murray (MCMC): http://videolectures.net/mlss09uk_murray_mcmc/
• MLSS 09: Minka (Min Divergence): http://videolectures.net/mlss09uk_minka_ai/
• Papers
• Wainwright and Jordan (2008), “Graphical Models, Exponential Families, and Variational Inference”, Foundations and Trends in Machine Learning.
• Winn and Bishop (2005), “Variational Message Passing”, JMLR.
• Minka and Winn (2009), “Gates”, NIPS.
• Hennig (2011), “Approximate Inference in Graphical Models” (PhD thesis), Chapter 2.
• Kuss and Rasmussen (2005), “Assessing Approximate Inference for Binary Gaussian Process Classification”, JMLR.