![Page 1: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/1.jpg)
Extending Expectation Propagation on Graphical Models
Yuan (Alan) Qi
MIT Media Lab
![Page 2: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/2.jpg)
Motivation• Graphical models are widely used in real-world
applications, such as human behavior recognition and wireless digital communications.
• Inference on graphical models: infer hidden variables– Previous approaches often sacrifice efficiency for accuracy
or sacrifice accuracy for efficiency.> Need methods that better balance the trade-off between
accuracy and efficiency.
• Learning graphical models : learning model parameters– Overfitting problem: Maximum likelihood approaches > Need efficient Bayesian training methods
![Page 3: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/3.jpg)
Outline• Background:
– Graphical models and expectation propagation (EP)
• Inference on graphical models– Extending EP on Bayesian dynamic networks
• Fixed lag smoothing: wireless signal detection
• Different approximation techniques: Poisson tracking
– Combining EP with local propagation on loopy graphs
• Learning conditional graphical models– Extending EP classification to perform feature selection
• Gene expression classification
– Training Bayesian conditional random fields
• Handwritten ink analysis
• Conclusions
![Page 4: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/4.jpg)
Outline
• Background on expectation propagation (EP) – 4 kinds of graphical models– EP in a nutshell
• Inference on graphical models
• Learning conditional graphical models
• Conclusions
![Page 5: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/5.jpg)
Graphical Models
Bayesian networks
Markov networks
conditional classification conditional random fields
x1 x2
y1 y2
j
jpaji
ipai ppp )|()|()( )()( xyxxyx,
x1 x2
y1 y2
a
aZp )(
1)( yx,yx,
i
iitpp )|()( w,xwx,|t a
aZp )(
)(
1)( ty,x,
wxw,|t
x1 x2
t1 t2
w x1 x2
t1 t2
w
INFERENCE
LEARNING
![Page 6: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/6.jpg)
Expectation Propagation in a Nutshell
• Approximate a probability distribution by simpler parametric terms (Minka 2001):
– For Bayesian networks:– For Markov networks:
– For conditional classification:– For conditional random fields:
Each approximation term or lives in an exponential family (such as Gaussian & Multinomial)
a
afp )()( xx a
afq )(~
)( xx
)(~
xaf
)|()( ajaia xxpf x
)()( yx,x aaf
)|()( w,xw aaa tpf
),,()( wxw aaa tf
)(~
waf
a
aa
a fqfp )(~
)()()( wwwxt,|w
![Page 7: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/7.jpg)
EP in a Nutshell (2)• The approximate term minimizes the
following KL divergence by moment matching:
))()(~
||)()((minarg \\
)(~
xxxxx
aa
aa
f
qfqfD
a
)(~
)()(
~)(\
x
xxx
aabb
a
f
qfq
Where the leave-one-out approximation is
)(~
xaf
![Page 8: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/8.jpg)
EP in a Nutshell (3)Three key steps:• Deletion Step: approximate the “leave-one-out”
predictive posterior for the ith point:
• Minimizing the following KL divergence by moment matching (Assumed Density filtering):
• Inclusion:
ij
jii ffqq )(
~)(
~/)()(\ xxxx
))()(~
||)()((minarg \\
)(~
xxxxx
ii
ii
f
qfqfKL
i
)()(~
)( \ xxx ii qfq
![Page 9: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/9.jpg)
Limitations of Plain EP• Batch processing of terms: not online• Can be difficult or expensive to analytically compute
ADF step• Can be expensive to compute and maintain a valid
approximation distribution q(x), which is coherent under marginalization – Tree-structured q(x):
• EP classification degenerates in the presence of noisy features.
• Cannot incorporate denominators
)()( iji xq,xxqj
x
![Page 10: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/10.jpg)
Four Extensions on Four Types of Graphical Models
• Fixed-lag smoothing and embedding different approximation techniques for dynamic Bayesian networks
• Allow a structured approximation to be globally non-coherent, while only maintaining local consistency during inference on loopy graphs.
• Combine EP with ARD for classification with noisy features
• Extend EP to train conditional random fields with a denominator (partition function)
![Page 11: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/11.jpg)
Inference on dynamic Baysian networks
Bayesian networks
Markov networks
conditional classification conditional random fields
x1 x2
y1 y2
j
jpaji
ipai ppp )|()|()( )()( xyxxyx,
x1 x2
y1 y2
a
aZp )(
1)( yx,yx,
i
iitpp )|()( w,xwx,|t a
aZp )(
)(
1)( ty,x,
wxw,|t
x1 x2
t1 t2
w x1 x2
t1 t2
w
![Page 12: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/12.jpg)
Outline
• Background • Inference on graphical models
– Extending EP on Bayesian dynamic networks • Fixed lag smoothing: wireless signal detection
• Different approximation techniques: Poisson tracking
– Combining EP with junction tree algorithm on loopy graphs
• Learning conditional graphical models• Conclusions
![Page 13: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/13.jpg)
Object Tracking
Guess the position of an object given noisy observations
1y
4y
Object
1x2x
3x
4x
2y
3y
![Page 14: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/14.jpg)
Bayesian Network
ttt νxx 1
noise tt xy
(random walk)e.g.
want distribution of x’s given y’s
x1 x2 xT
y1 y2 yT
![Page 15: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/15.jpg)
Approximation
1
1111 )|()|()|()(),(t
tttt xypxxpxypxpp yx
1
111111 )(~)(~)(~)(~)()(t
tttttttt xoxpxpxoxpq x
Factorized and Gaussian in x
![Page 16: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/16.jpg)
Message Interpretation
)(~)(~)(~)( 11 tttttttt xpxoxpxq
= (forward msg)(observation msg)(backward msg)
xt
yt
Forward Message
Backward Message
Observation Message
![Page 17: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/17.jpg)
Extensions of EP• Instead of batch iterations, use fixed-lag smoothing
for online processing.• Instead of assumed density filtering, use any method
for approximate filtering.– Examples: unscented Kalman filter (UKF)
Turn a deterministic filtering method into a smoothing method!
All methods can be interpreted as finding linear/Gaussian approximations to original terms.
– Use quadrature or Monte Carlo for term approximations
![Page 18: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/18.jpg)
Bayesian network for Wireless Signal Detection
x1 x2 xT
y1 y2 yT
s1 s2 sT
si: Transmitted signals
xi: Channel coefficients for digital wireless communications
yi: Received noisy observations
![Page 19: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/19.jpg)
Experimental Results
EP outperforms particle smoothers in efficiency with comparable accuracy.
(Chen, Wang, Liu 2000)
Signal-Noise-Ratio Signal-Noise-Ratio
![Page 20: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/20.jpg)
Computational ComplexityAlgorithm Complexity
Extended EP O(nLd2)
Stochastic mixture of Kalman filters O(MLd2)
Rao-blackwised particle smoothers O(MNLd2)
L: Length of fixed-lag smooth window
d: Dimension of the parameter vector
n: Number of EP iterations (Typically, 4 or 5)
M: Number of samples in filtering (Often larger than 500 or 100)
N: Number of samples in smoothing (Larger than 50)
000,505
50100 2
2
2
n
MN
nLd
MNLdExtended EP is about than
Rao-blackwised particle smoothers!
![Page 21: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/21.jpg)
Example: Poisson Tracking
• is an integer valued Poisson variate with mean )exp( tx
ty
![Page 22: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/22.jpg)
Accuracy/Efficiency Tradeoff
(TIME)
![Page 23: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/23.jpg)
Inference on markov networks
Bayesian networks
Markov networks
conditional classification conditional random fields
x1 x2
y1 y2
j
jpaji
ipai ppp )|()|()( )()( xyxxyx,
x1 x2
y1 y2
a
aZp )(
1)( yx,yx,
i
iitpp )|()( w,xwx,|t a
aZp )(
)(
1)( ty,x,
wxw,|t
x1 x2
t1 t2
w x1 x2
t1 t2
w
![Page 24: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/24.jpg)
Outline
• Background on expectation propagation (EP) • Inference on graphical models
– Extending EP on Bayesian dynamic networks • Fixed lag smoothing: wireless signal detection
• Different approximation techniques: poisson tracking
– Combining EP with junction tree algorithm on loopy graphs
• Learning conditional graphical models• Conclusions
![Page 25: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/25.jpg)
Inference on Loopy Graphs
Problem: estimate marginal distributions of the variables indexed by the nodes in a loopy graph, e.g., p(xi), i = 1, . . . , 16.
X1 X2 X3 X4
X5 X6 X7 X8
X9 X10 X11 X12
X13 X14 X15 X16
![Page 26: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/26.jpg)
4-node Loopy Graph
4x
2x 3x
1x
a
afp )()( xx
Joint distribution is product of pairwise potentials for all edges:
Want to approximate by a simpler distribution)(xp
![Page 27: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/27.jpg)
BP vs. TreeEP
4x
2x 3x
1x
4x
2x 3x
4x
2x 3x
1x1xBP TreeEP
projectionprojection
![Page 28: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/28.jpg)
Junction Tree Representation
p(x) q(x) Junction tree
p(x) q(x) Junction tree
![Page 29: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/29.jpg)
Two Kinds of Edges
• On-tree edges, e.g., (x1,x4): exactly incorporated into the junction tree
• Off-tree edges, e.g., (x1,x2): approximated by projecting them onto the tree structure
4x
2x 3x
1x
![Page 30: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/30.jpg)
KL Minimization • KL minimization moment matching
• Match single and pairwise marginals of
4x
2x 3x
1x
4x
2x 3x
1x
and
![Page 31: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/31.jpg)
Matching Marginals on Graph
(1) Incorporate edge (x3 x4)
x3 x4
x5 x7
x1 x2
x1 x3 x1 x4
x3 x5
x3 x6
(2) Incorporate edge (x6 x7)
x5 x7
x1 x2
x1 x3 x1 x4
x3 x5
x3 x6
x6 x7
x5 x7
x1 x2
x1 x3 x1 x4
x3 x5
x3 x6x5 x7
x1 x2
x1 x3 x1 x4
x3 x5
x3 x6x5 x7
x1 x2
x1 x3 x1 x4
x3 x5
x3 x6
![Page 32: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/32.jpg)
Drawbacks of Global Propagation by Regular EP
• Update all the cliques even when only incorporating one off-tree edge – Computationally expensive
• Store each off-tree data message as a whole tree– Require large memory size
![Page 33: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/33.jpg)
Solution: Local Propagation
• Allow q(x) be non-coherent during the iterations. It only needs to be coherent in the end.
• Exploit the junction tree representation: only locally propagate information within the minimal loop (subtree) that is directly connected to the off-tree edge.– Reduce computational complexity
– Save memory
![Page 34: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/34.jpg)
x5 x7
x1 x2
x1 x3 x1 x4
x3 x5
x3 x6
x3 x4
x1 x2
x1 x3 x1 x4
x5 x7
x1 x2
x1 x3 x1 x4
x3 x5
x3 x6
x5 x7
x1 x2
x1 x3 x1 x4
x3 x5
x3 x6
x3 x5
x6 x7
x5 x7
x3 x5
x3 x6
(1) Incorporate edge(x3 x4)
(3) Incorporate edge (x6 x7)
(2) Propagate evidence
On this simple graph, local propagation runs roughly 2 times faster and uses 2 times less memory to store messages than plain EP
![Page 35: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/35.jpg)
Tree-EP
• Combine EP with junction algorithm
• Can perform efficiently over hypertrees and hypernodes
![Page 36: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/36.jpg)
Fully-connected graphs
Results are averaged over 10 graphs with randomly generated potentials• TreeEP performs the same or better than all other methods in both accuracy and efficiency!
![Page 37: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/37.jpg)
Learning Conditional Classification Models
Bayesian networks
Markov networks
conditional classification conditional random fields
x1 x2
y1 y2
j
jpaji
ipai ppp )|()|()( )()( xyxxyx,
x1 x2
y1 y2
a
aZp )(
1)( yx,yx,
i
iitpp )|()( w,xwx,|t a
aZp )(
)(
1)( ty,x,
wxw,|t
x1 x2
t1 t2
w x1 x2
t1 t2
w
![Page 38: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/38.jpg)
Outline
• Background on expectation propagation (EP) • Inference on graphical models• Learning conditional graphical models
– Extending EP classification to perform feature selection
• Gene expression classification
– Training Bayesian conditional random fields
• Handwritten ink analysis
• Conclusions
![Page 39: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/39.jpg)
Conditional Bayesian Classification Model
Prior of the classifier w:
Labels: t inputs: X parameters: w
Likelihood for the data set:
Where is a cumulative distribution function for a standard Gaussian.
x1 x2
t1 t2
w
![Page 40: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/40.jpg)
Evidence and Predictive Distribution
The evidence, i.e., the marginal likelihood of the hyperparameters :
The predictive posterior distribution of the label for a new input :
![Page 41: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/41.jpg)
Limitation of EP classifications
• In the presence of noisy features, the performance of classical conditional Bayesian classifiers, e.g., Bayes Point Machines trained by EP, degenerates.
![Page 42: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/42.jpg)
Automatic Relevance Determination (ARD)
• Give the classifier weight independent Gaussian priors whose variance, , controls how far away from zero each weight is allowed to go:
• Maximize , the marginal likelihood of the model, with respect to .
• Outcome: many elements of go to infinity, which naturally prunes irrelevant features in the data.
1
![Page 43: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/43.jpg)
Two Types of Overfitting
• Classical Maximum likelihood:– Optimizing the classifier weights w can directly fit
noise in the data, resulting in a complicated model.
• Type II Maximum likelihood (ARD):– Optimizing the hyperparameters corresponds to
choosing which variables are irrelevant. Choosing one out of exponentially many models can also overfit if we maximize the model marginal likelihood.
![Page 44: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/44.jpg)
Risk of Optimizing
X: Class 1 vs O: Class 2
Evd-ARD-1
Evd-ARD-2
Bayes Point
![Page 45: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/45.jpg)
Predictive-ARD
• Choosing the model with the best estimated predictive performance instead of the most probable model.
• Expectation propagation (EP) estimates the leave-one-out predictive performance without performing any expensive cross-validation.
![Page 46: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/46.jpg)
Estimate Predictive Performance• Predictive posterior given a test data point
• EP can estimate predictive leave-one-out error probability
• where q( w| t\i) is the approximate posterior of leaving out the ith label.
• EP can also estimate predictive leave-one-out error count
1Nx
wtwwxtx d)|(),|(),|( 1111 ptptp NNNN
N
iiii
N
iiii qtp
Ntp
N 1\
1\ d)|(),|(1
1),|(1
1wtwwxtx
N
1i21
\ )),|(I(1
iiitpN
LOO tx
![Page 47: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/47.jpg)
Comparison of different model selection criteria for ARD training
• 1st row: Test error• 2nd row: Estimated leave-one-out error probability• 3rd row: Estimated leave-one-out error counts• 4th row: Evidence (Model marginal likelihood)• 5th row: Fraction of selected features
The estimated leave-one-out error probabilities and counts are better correlated with the test error than evidence and sparsity level.
![Page 48: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/48.jpg)
Gene Expression Classification
Task: Classify gene expression datasets into different categories, e.g., normal v.s. cancer
Challenge: Thousands of genes measured in the micro-array data. Only a small subset of genes are probably correlated with the classification task.
![Page 49: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/49.jpg)
Classifying Leukemia Data
• The task: distinguish acute myeloid leukemia (AML) from acute lymphoblastic leukemia (ALL).
• The dataset: 47 and 25 samples of type ALL and AML respectively with 7129 features per sample.
• The dataset was randomly split 100 times into 36 training and 36 testing samples.
![Page 50: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/50.jpg)
Classifying Colon Cancer Data
• The task: distinguish normal and cancer samples
• The dataset: 22 normal and 40 cancer samples with 2000 features per sample.
• The dataset was randomly split 100 times into 50 training and 12 testing samples.
• SVM results from Li et al. 2002
![Page 51: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/51.jpg)
Learning Conditional Random Fields
Bayesian networks
Markov networks
conditional classification conditional random fields
x1 x2
y1 y2
j
jpaji
ipai ppp )|()|()( )()( xyxxyx,
x1 x2
y1 y2
a
aZp )(
1)( yx,yx,
i
iitpp )|()( w,xwx,|t a
aZp )(
)(
1)( ty,x,
wxw,|t
x1 x2
t1 t2
w x1 x2
t1 t2
w
![Page 52: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/52.jpg)
Outline
• Background on expectation propagation (EP) • Inference on graphical models• Learning conditional graphical models
– Extending EP classification to perform feature selection
• Gene expression classification
– Training Bayesian conditional random fields
• Handwritten ink analysis
• Conclusions
![Page 53: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/53.jpg)
x1 x2
t1 t2
w
• Generalize traditional classification model by introducing correlation between labels.
Potential functions:
• Model data with independence and structure, e.g, web pages, natural languages, and multiple visual objects in a picture.
• Information at one location propagates to other locations.
Conditional random fields (CRFs)
![Page 54: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/54.jpg)
Learning the parameter w by ML/MAP
Maximum likelihood (ML) : Maximize the data likelihood
where
Maximum a posterior (MAP):Gaussian prior on w
ML/MAP problem: Overfitting to the noise in data.
![Page 55: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/55.jpg)
Bayesian Conditional Networks• Bayesian training to avoid overfitting
• Need efficient training:– The exact posterior of w
– The Gaussian approximate posterior of w
![Page 56: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/56.jpg)
Two Difficulties for Bayesian Training
• the partition function appears in the denominator– Regular EP does not apply
• the partition function is a complex function of w
![Page 57: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/57.jpg)
Turn Denominator to Numerator (1)• Inverting approximation term:
– Deletion:
– ADF:
– Inclusion:
q q\Z
q new
q =
KL projection
One step forward; two step backward
![Page 58: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/58.jpg)
Turn Denominator to Numerator (2)• Minka’s approach:
– Deletion:
– ADF:
– Inclusion:
Two step backward; one step forward
![Page 59: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/59.jpg)
Approximating the partition function
• The parameters w and the labels t are intertwined in Z(w):
where k = {i, j} is the index of edges.
The joint distribution of w and t:
• Factorized approximation:
![Page 60: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/60.jpg)
Remove the intermediate level
Flatten Approximation Structure
Increased efficiency, stability, and accuracy!
Iterations
Iterations
Iterations
![Page 61: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/61.jpg)
Results on Synthetic Data
• Data generation: first, randomly sample input x, fixed true parameters w, and then sample the labels t
• Graphical structure: Four nodes in a simple loop• Comparing maximum likelihood trained CRF with Bayesian
conditional networks: 10 Trials. 100 training examples and 1000 test examples.
BCNs significantly outperformed ML CRFs.
![Page 62: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/62.jpg)
Ink Application: analyzing handwritten organization charts
• Parsing a graph into different components: containers vs. connectors
![Page 63: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/63.jpg)
Ink Application: compare BCNs with i.i.d conditional Bayesian classifiers
Results: conditional Bayesian classifiers BCNs (early version)
![Page 64: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/64.jpg)
Ink Application: compare ML CRFs with BCNs
• Comparing maximum likelihood trained CRFs with Bayesian conditional networks (BCNs): 15 Trials, 14 graphs for training and 9 graphs for testing in each trials.
• BCNs significantly outperformed ML CRFs.
![Page 65: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/65.jpg)
4 types of graphical models
Bayesian networks
Markov networks
conditional classification conditional random fields
x1 x2
y1 y2
j
jpaji
ipai ppp )|()|()( )()( xyxxyx,
x1 x2
y1 y2
a
aZp )(
1)( yx,yx,
i
iitpp )|()( w,xwx,|t a
aZp )(
)(
1)( ty,x,
wxw,|t
x1 x2
t1 t2
w x1 x2
t1 t2
w
![Page 66: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/66.jpg)
Outline
• Background on expectation propagation (EP) • Inference on graphical models• Learning conditional graphical models• Conclusions
– 4 extensions to EP on 4 types of graphical models & 3 real-world applications
– Inference: better trade-off between accuracy and efficiency
– Learning: better generalization than the state of the art.
![Page 67: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/67.jpg)
Conclusion: 4 extensions & 3 applications
1. Extending EP on dynamic models by fixed-lag smoothing and embedding different approximation techniques:
– Wireless signal detection: Much less computation, and comparable or superior accuracy to sequential Monte Carlo
2. Combining EP with local propagation on loopy graphs:– Outperformed belief propagation, naïve mean field, and
structure variational methods3. Extending EP classification to perform feature selection:
– Gene expression classification: Outperformed traditional ARD, SVM with feature selection
4. Training Bayesian conditional random fields to deal with denominator and flattened approximation structure
– Ink analysis: Beats ML CRFs
![Page 68: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/68.jpg)
Extended EP algorithms for inference and learning
Computational time
Infe
renc
e er
ror
Outperform state-of-art inference techniques in the trade-off between accuracy and efficiency
Extended EP
State-of-artInference tech.
0
Lea
rnin
g te
ch.
Outperform state-of-art techs on sparse learning and CRFs in generalization performance
Test error with error bar0
Extended EP
State-oft-art learning tech.
![Page 69: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/69.jpg)
Acknowledgement• My advisor Roz Picard
• Tom Minka
• Tommi and Zoubin
• Rgrads: Ashish, Carson, Karen, Phil, Win, Raul, etc
• Researchers at MSR: Martin Szummer, Chris Bishop, Ralf Herbrich, Thore Graepel, Andrew Blake
• Folks at UCL: Chu Wei, Jaz Kandola, Fernando, Ed, Iain, Katherine, and Mark
• Peter Gorniak and Brian Whitman
![Page 71: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/71.jpg)
a
aZp )(
)(
1)( ty,x,
wxw,|t
x1 x2
t1 t2
wx1 x2
t1 t2
w
![Page 72: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/72.jpg)
![Page 73: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/73.jpg)
Conclusions
• Extend EP on graphical models:– Instead of minimizing KL divergence, use other
sensible criteria to generate messages. Effectively turn any deterministic filtering method into a smoothing method.
– Use quadrature to approximate messages.– Local propagation to save the computation and
memory in tree structured EP.
![Page 74: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/74.jpg)
Conclusions
• Extended EP algorithms outperform state-of-art inference methods on graphical models in the trade-off between accuracy and efficiency
Computational Time
Err
or
Extended EP
State-of-artTechniques
![Page 75: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/75.jpg)
Future Work
• More extensions of EP:– How to choose a sensible approximation family (e.g.
which tree structure)
– More flexible approximation: mixture of EP?
– Error bound?
– Bayesian conditional random fields
– EP for optimization (generalize max-product)
• More real-world applications, e.g., classification of gene expression data.
![Page 76: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/76.jpg)
Motivation
Task 1: Classify high dimensional datasets with many irrelevant features, e.g., normal v.s. cancer microarray data.
Task 2: Sparse Bayesian kernel classifiers for fast test performance.
![Page 77: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/77.jpg)
Outline
• Background on expectation propagation (EP)
• Extending EP on Bayesian dynamic networks – Fixed lag smoothing: wireless signal detection
– Different approximation techniques: poisson tracking
• Combining EP with junction tree algorithm on loopy graphs
• Extending EP classification to perform feature selection
– Gene expression classification
• Training Bayesian conditional random fields
– Handwritten ink analysis
• Conclusions and future work
![Page 78: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/78.jpg)
Outline• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
• Experiments
• Conclusions
![Page 79: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/79.jpg)
Outline• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
– Sequential update
• Experiments
• Conclusion
![Page 80: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/80.jpg)
Outline• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
• Experiments
• Conclusions
![Page 81: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/81.jpg)
Outline• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
– Sequential update
• Experiments
• Conclusions
![Page 82: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/82.jpg)
Conclusions
• Maximizing marginal likelihood can lead to overfitting in the model space if there are a lot of features.
• We propose Predictive-ARD based on EP for – feature selection
– sparse kernel learning
• In practice Predictive-ARD works better than traditional ARD.
![Page 83: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/83.jpg)
Three Extensions1. Instead of choosing the approximate term to
minimize the following KL divergence:
))()(~
||)()((minarg \\
)(~
xxxxx
aa
aa
f
qfqfD
a
)(~
xaf
use other criteria.2. Use numerical approximation to compute moments: Quadrature or Monte Carlo.
3. Allow the tree-structured q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.
![Page 84: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/84.jpg)
Motivation
Computational Time
Err
orCurrentTechniques
What we want
![Page 85: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/85.jpg)
Efficiency vs. Accuracy
Computational Time
Err
or
Extended EP ?
Monte Carlo
Loopy BP (Factorized EP)
![Page 86: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/86.jpg)
Outline• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
– Sequential update
• Experiments
• Conclusions
![Page 87: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/87.jpg)
Conclusions
• Maximizing marginal likelihood can lead to overfitting in the model space if there are a lot of features.
• We propose Predictive-ARD based on EP for – feature selection
– sparse kernel learning
• In practice Predictive-ARD works better than traditional ARD.
![Page 88: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/88.jpg)
Outline• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
– Sequential update
• Experiments
• Conclusions
![Page 89: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/89.jpg)
Conclusions
• Maximizing marginal likelihood can lead to overfitting in the model space if there are a lot of features.
• We propose Predictive-ARD based on EP for – feature selection
– sparse kernel learning
• In practice Predictive-ARD works better than traditional ARD.
![Page 90: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/90.jpg)
Outline• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
– Sequential update
• Experiments
• Conclusions
![Page 91: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/91.jpg)
Conclusions
• Maximizing marginal likelihood can lead to overfitting in the model space if there are a lot of features.
• We propose Predictive-ARD based on EP for – feature selection
– sparse kernel learning
• In practice Predictive-ARD works better than traditional ARD.
![Page 92: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/92.jpg)
• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
– Sequential update
• Experiments
• Conclusion
Outline
![Page 93: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/93.jpg)
Motivation
Computational Time
Err
orCurrentTechniques
What we want
![Page 94: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/94.jpg)
Inference on Graphical Models
• Bayesian inference techniques:
– Belief propagation (BP): Kalman filtering /smoothing, forward-backward algorithm
– Monte Carlo: Particle filter/smoothers, MCMC
• Loopy BP: typically efficient, but not accurate on general loopy graphs
• Monte Carlo: accurate, but often not efficient
![Page 95: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/95.jpg)
Extended EP vs. Monte Carlo: Accuracy
Variance
Mean
![Page 96: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/96.jpg)
Poisson Tracking Model
)01.0,(~)|( 11 ttt xNxxp
)100,0(~)( 1 Nxp
!/)exp()|( tx
tttt yexyxyp t
![Page 97: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/97.jpg)
Extended-EP Joint Signal Detection and Channel Estimation
• Turn mixture of Kalman filters into a smoothing method
• Smoothing over the last observations
• Observations before act as prior for the current estimation
)( t
![Page 98: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/98.jpg)
Bayesian Networks for Adaptive Decoding
x1 x2 xT
y1 y2 yT
e1 e2 eT
The information bits et are coded by a convolutional
error-correcting encoder.
![Page 99: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/99.jpg)
EP Outperforms Viterbi Decoding
Signal-Noise-Ratio
![Page 100: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/100.jpg)
Combine Tree-structured Approximation with Junction Tree algorithm
• Combine EP with junction algorithm
• Can perform efficiently over hypertrees and hypernodes
![Page 101: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/101.jpg)
8x8 grids, 10 trials
Method FLOPS Error
Exact 30,000 0
TreeEP 300,000 0.149
BP/double-loop 15,500,000 0.358
GBP 17,500,000 0.003
![Page 102: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/102.jpg)
4-node Graph
TreeEP = the proposed method GBP = generalized belief propagation on triangles TreeVB = variational tree BP = loopy belief propagation = Factorized EP MF = mean-field
![Page 103: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/103.jpg)
Efficiency vs. Accuracy
Computational Time
Err
or
Extended EP ?
Monte Carlo
Loopy BP (Factorized EP)
![Page 104: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/104.jpg)
Outline
• Background on expectation propagation (EP) • Extending EP on Bayesian dynamic networks
– Fixed lag smoothing: wireless signal detection– Different approximation techniques: poisson tracking
• Combining EP with junction tree algorithm on loopy graphs• Extending EP classification to perform feature selection
– Gene expression classification• Training Bayesian conditional random fields
– Handwritten ink analysis• Conclusions and future work
![Page 105: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/105.jpg)
Outline• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
• Experiments
• Conclusions
![Page 106: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/106.jpg)
Outline: Extending EP classification to perform feature selection
• Background
– Bayesian classification model
– Automatic relevance determination (ARD)
• Risk of Overfitting by optimizing hyperparameters
• Predictive ARD by expectation propagation (EP):
– Approximate prediction error
– EP approximation
• Experiments
![Page 107: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/107.jpg)
Approximate Leave-One-Out ErrorThree key steps:• Deletion Step: approximate the “leave-one-out”
predictive posterior for the ith point:
• Minimizing the following KL divergence by moment matching:
• Inclusion:
ij
jii ffqq )(
~)(
~/)()(\ wwww
))()(~
||)()((minarg \\
)(~
wwwww
ii
ii
f
qfqfKL
i
)()(~
)( \ www ii qfq
The key observation: we can use the approximate predictive posterior, obtained in the deletion step, for model selection. No extra computation!
![Page 108: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/108.jpg)
Bayesian Sparse Kernel Classifiers
• Using feature/kernel expansions defined on training data points:
• Predictive-ARD-EP trains a classifier that depends on a small subset of the training set.
• Fast test performance.
![Page 109: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/109.jpg)
Test error rates and numbers of relevance or support vectors on breast cancer dataset.
50 partitionings of the data were used. All these methods use the same Gaussian kernel with kernel width = 5. The trade-off parameter C in SVM is chosen via 10-fold cross-validation for each partition.
![Page 110: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/110.jpg)
Test error rates on diabetes data
100 partitionings of the data were used. Evidence and Predictive ARD-EPs use the Gaussian kernel with kernel width = 5.
![Page 111: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/111.jpg)
Ink application: using graphical models
• Three steps:– Subdivision of pen strokes into fragments,
– Construction of a conditional random field that only contains pairwise features based on the fragments,
– Training and inference on the network.
![Page 112: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/112.jpg)
Low rank matrix computation
• Explore the structure of the problem
• Observation: each potential function only constraints the posterior in a subspace
• More efficiency with low-rank matrix computation
![Page 113: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/113.jpg)
Compare to Belief Propagation in ML training
• Similarity: Both propagate probabilistic information between nodes in a graph
• Difference: Bayesian training averages the belief q(t) over the potential parameters w, while belief propagation does not.
![Page 114: Extending Expectation Propagation on Graphical Models](https://reader036.vdocuments.us/reader036/viewer/2022062518/56814a9a550346895db7a922/html5/thumbnails/114.jpg)
TreeEP versus BP and GBP
• TreeEP is always more accurate than BP and is often faster
• TreeEP is much more efficient than GBP and more accurate on some problems
• TreeEP converges more often than BP and GBP