Introduction to Machine Learning
Undirected Graphical Models
Barnabás Póczos
Credits
Many of these slides are taken from
Ruslan Salakhutdinov, Hugo Larochelle, & Eric Xing
• http://www.dmi.usherb.ca/~larocheh/neural_networks
• http://www.cs.cmu.edu/~rsalakhu/10707/
• http://www.cs.cmu.edu/~epxing/Class/10708/
Reading material:
• http://www.cs.cmu.edu/~rsalakhu/papers/Russ_thesis.pdf
• Section 30.1 of Information Theory, Inference, and Learning Algorithms by David MacKay
• http://www.stat.cmu.edu/~larry/=sml/GraphicalModels.pdf
Undirected Graphical Models = Markov Random Fields
Probabilistic graphical models are a powerful framework for representing the dependency structure between random variables.
A Markov network (or undirected graphical model) is a set of random variables whose dependency structure is described by an undirected graph.
[Figure: example application, semantic labeling of an image]
Cliques
Clique: a subset of nodes such that there exists a link between every pair of nodes in the subset.
Maximal clique: a clique such that it is not possible to include any other node in the set without it ceasing to be a clique.
[Figure: an example graph with its two maximal cliques and several smaller cliques highlighted]
Undirected Graphical Models = Markov Random Fields
Directed graphs are useful for expressing causal relationships between random variables, whereas undirected graphs are useful for expressing dependencies between random variables.
The joint distribution defined by the graph is given by a product of non-negative potential functions over the maximal cliques (fully connected subsets of nodes).
In the example graph, the joint distribution factorizes as a product of potentials over its maximal cliques; the general form is shown below.
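In general form, for maximal cliques C with potentials ψ_C, the factorization reads

p(x) = (1/Z) ∏_C ψ_C(x_C),   Z = Σ_x ∏_C ψ_C(x_C),

where Z is the normalization constant (partition function) and each ψ_C is a non-negative potential function.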
Markov Random Fields (MRFs)
Each potential function is a mapping from the joint configurations of the random variables in a maximal clique to the non-negative real numbers.
The choice of potential functions is not restricted to those having a specific probabilistic interpretation.
A convenient choice is to write each potential as an exponential, ψ_C(x_C) = exp( −E(x_C) ), where E(x) is called an energy function.
Conditional Independence
Theorem: if every path between node set A and node set B passes through node set C, then x_A and x_B are conditionally independent given x_C.
It follows that the undirected graphical structure directly represents conditional independence relations among the variables.
MRFs with Hidden Variables
For many interesting problems, we need to introduce hidden (latent) variables.
Our random variables then contain both visible and hidden variables, x = (v, h).
• Computing the partition function Z is intractable.
• Computing the summation over the hidden variables is intractable.
• Parameter learning is very challenging.
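Concretely, with x = (v, h) and clique potentials ψ_C (notation as above), the quantities involved are

p(v, h) = (1/Z) ∏_C ψ_C(x_C),   Z = Σ_{v,h} ∏_C ψ_C(x_C),   p(v) = Σ_h p(v, h).

Both Z and the sum over h range over exponentially many configurations, which is why they are intractable in general.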
Boltzmann Machines
Definition [Boltzmann machines]: MRFs with maximal clique size two [pairwise (edge) potentials] on binary-valued nodes are called Boltzmann machines.
The joint probabilities are given by
p(x) = (1/Z) exp( Σ_{i<j} θ_ij x_i x_j ).
The parameter θ_ij measures the dependence of x_i on x_j, conditioned on the other nodes.
Boltzmann Machines
Theorem: In a Boltzmann machine, the conditional distribution of one node given the others is a logistic function:
p(x_i = 1 | x_{-i}) = σ( Σ_{j≠i} θ_ij x_j ),   where σ(z) = 1 / (1 + e^{−z}).
Proof: With x_i ∈ {0, 1},
p(x_i = 1 | x_{-i}) = p(x_i = 1, x_{-i}) / [ p(x_i = 1, x_{-i}) + p(x_i = 0, x_{-i}) ].
All factors of the joint that do not involve x_i cancel between numerator and denominator, and the terms involving x_i vanish when x_i = 0, leaving
p(x_i = 1 | x_{-i}) = exp( Σ_{j≠i} θ_ij x_j ) / [ 1 + exp( Σ_{j≠i} θ_ij x_j ) ] = σ( Σ_{j≠i} θ_ij x_j ).
Q.E.D.
Example: Image Denoising
Let us look at the example of noise removal from a binary image.
The image is an array of {−1, +1} pixel values.
We take the original noise-free image (x) and randomly flip the sign of each pixel with a small probability. This process creates the noisy image (y).
Our goal is to estimate the original image x from the noisy observations y.
We model the joint distribution p(x, y) with a pairwise MRF over the pixel grid.
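A standard choice of potentials for this example (as in Bishop, PRML, Sec. 8.3.3; the exact constants here are illustrative) is the energy

E(x, y) = h Σ_i x_i − β Σ_{(i,j) neighbours} x_i x_j − η Σ_i x_i y_i,   p(x, y) = (1/Z) exp( −E(x, y) ),

where β > 0 encourages neighbouring pixels to agree and η > 0 ties each pixel to its noisy observation.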
Inference: Iterated Conditional Modes (ICM)
Goal: find the most probable image given the observations, x̂ = argmax_x p(x | y), or equivalently minimize the energy E(x, y).
Solution: coordinate-wise descent on the energy. Visit the pixels one at a time and set x_i to whichever of {−1, +1} gives the lower energy, keeping all other pixels fixed; repeat until no flip improves the energy. A sketch follows.
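A minimal sketch of ICM denoising under the energy above (the coupling constants h, beta, eta are illustrative, not taken from the slides):

```python
import numpy as np

def icm_denoise(y, h=0.0, beta=1.0, eta=2.0, n_sweeps=10):
    """Iterated Conditional Modes for binary image denoising.

    y : 2-D array of observed pixels in {-1, +1}.
    Energy: E(x, y) = h*sum(x) - beta*sum_neighbours(x_i x_j) - eta*sum(x_i y_i).
    """
    x = y.copy()                      # initialise with the noisy image
    rows, cols = x.shape
    for _ in range(n_sweeps):
        for i in range(rows):
            for j in range(cols):
                # sum over the 4-neighbourhood (missing neighbours contribute 0)
                nb = 0.0
                if i > 0:        nb += x[i - 1, j]
                if i < rows - 1: nb += x[i + 1, j]
                if j > 0:        nb += x[i, j - 1]
                if j < cols - 1: nb += x[i, j + 1]
                # choose the value of x[i, j] with the lower local energy
                local_field = beta * nb + eta * y[i, j] - h
                x[i, j] = 1 if local_field >= 0 else -1
    return x
```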
Gaussian MRFs
• In a Gaussian MRF the information (precision) matrix J = Σ⁻¹ is sparse (zeros correspond to missing edges, i.e. conditional independencies), but the covariance matrix Σ itself is generally dense.
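A small numerical illustration of this point (the tridiagonal precision matrix below is an example, not taken from the slides):

```python
import numpy as np

# Precision (information) matrix of a 5-node chain MRF:
# only neighbouring nodes interact, so J is tridiagonal (sparse).
n = 5
J = 2.0 * np.eye(n)
for i in range(n - 1):
    J[i, i + 1] = J[i + 1, i] = -0.9

Sigma = np.linalg.inv(J)   # covariance matrix

print(np.count_nonzero(J))                    # 13 non-zero entries (sparse)
print(np.count_nonzero(np.round(Sigma, 6)))   # all 25 entries non-zero (dense)
```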
Restricted Boltzmann Machines
Restricted = no connections within the hidden layer and no connections within the visible layer; only connections between visible and hidden units (a bipartite graph).
The partition function is intractable.
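For reference, the standard binary-binary RBM energy and joint distribution, writing W for the weights, b for the visible biases, and c for the hidden biases (this notation is assumed here and used in what follows):

E(v, h) = − Σ_{i,j} v_i W_ij h_j − Σ_i b_i v_i − Σ_j c_j h_j,
p(v, h) = (1/Z) exp( −E(v, h) ),   Z = Σ_{v,h} exp( −E(v, h) )   (the partition function, intractable in general).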
Gaussian-Bernoulli RBM
In a Gaussian-Bernoulli RBM the visible units are real-valued and the hidden units are binary; the energy is quadratic in v and linear in h.
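A common parameterization of the Gaussian-Bernoulli energy, sketched here with unit-variance visible units (the slides may use per-unit variances σ_i):

E(v, h) = Σ_i (v_i − b_i)² / 2 − Σ_j c_j h_j − Σ_{i,j} v_i W_ij h_j,

so p(v | h) is a Gaussian with mean b + W h, and p(h_j = 1 | v) = σ( c_j + Σ_i W_ij v_i ) is still logistic.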
Possible Tasks with RBM
Tasks:
• Inference: compute the conditionals p(h | v) and p(v | h)
• Evaluate the likelihood function p(v)
• Sampling from the RBM
• Training the RBM
Inference
Theorem: Inference in an RBM is simple: the conditional distributions are products of logistic functions:
p(h_j = 1 | v) = σ( c_j + Σ_i W_ij v_i ).
Similarly,
p(v_i = 1 | h) = σ( b_i + Σ_j W_ij h_j ).
Proof: Given v, the joint p(v, h) factorizes over the hidden units, since the energy is a sum of terms each involving a single h_j:
p(h | v) ∝ exp( Σ_j h_j ( c_j + Σ_i W_ij v_i ) ) = ∏_j exp( h_j ( c_j + Σ_i W_ij v_i ) ).
Normalizing each factor over h_j ∈ {0, 1} gives
p(h_j = 1 | v) = exp( c_j + Σ_i W_ij v_i ) / ( 1 + exp( c_j + Σ_i W_ij v_i ) ) = σ( c_j + Σ_i W_ij v_i ).
The argument for p(v_i = 1 | h) is symmetric. Q.E.D.
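A minimal numpy sketch of these conditionals (W, b, c are assumed parameter arrays, not values from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_v(v, W, c):
    """p(h_j = 1 | v) for every hidden unit j. W has shape (n_visible, n_hidden)."""
    return sigmoid(c + v @ W)

def p_v_given_h(h, W, b):
    """p(v_i = 1 | h) for every visible unit i."""
    return sigmoid(b + h @ W.T)
```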
Evaluating the Likelihood
Calculating the Likelihood of an RBM
Theorem: Calculating the likelihood is simple in an RBM (apart from the partition function):
p(v) = (1/Z) exp( −F(v) ),
where F(v) = −Σ_i b_i v_i − Σ_j log( 1 + exp( c_j + Σ_i W_ij v_i ) ) is called the free energy.
Proof:
p(v) = Σ_h p(v, h) = (1/Z) Σ_h exp( Σ_i b_i v_i + Σ_j h_j ( c_j + Σ_i W_ij v_i ) ).
The sum over h factorizes over the hidden units, and each h_j ∈ {0, 1} contributes a factor ( 1 + exp( c_j + Σ_i W_ij v_i ) ), so
p(v) = (1/Z) exp( Σ_i b_i v_i ) ∏_j ( 1 + exp( c_j + Σ_i W_ij v_i ) ) = (1/Z) exp( −F(v) ).
Q.E.D.
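A small numpy sketch of the free energy and the unnormalized log-likelihood (again under the W, b, c notation assumed above):

```python
import numpy as np

def free_energy(v, W, b, c):
    """F(v) = -b.v - sum_j log(1 + exp(c_j + (v W)_j))."""
    pre_activation = c + v @ W
    return -v @ b - np.sum(np.logaddexp(0.0, pre_activation))

# log p(v) = -free_energy(v, W, b, c) - log Z, where log Z is intractable
# but cancels when comparing the relative likelihood of two configurations.
```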
Sampling
Sampling from p(v, h) in an RBM
Sampling is tricky; it is much easier in directed graphical models.
Here we will use Gibbs sampling: we alternate between sampling the hidden units from p(h | v) and the visible units from p(v | h), both of which are products of logistic conditionals (as shown above).
Goal: generate samples from p(v, h).
Gibbs Sampling: The Problem
Our goal is to generate samples from a joint distribution p(x_1, ..., x_d).
Suppose that we can generate samples from each full conditional p(x_i | x_1, ..., x_{i−1}, x_{i+1}, ..., x_d).
Gibbs Sampling: Pseudo Code
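A minimal sketch of the generic Gibbs sampler in Python; sample_conditional is an assumed user-supplied function that draws x_i from p(x_i | x_{-i}):

```python
import numpy as np

def gibbs_sampler(x_init, sample_conditional, n_steps):
    """Generic Gibbs sampling.

    x_init             : initial configuration (1-D array of length d)
    sample_conditional : function (i, x) -> a sample of x_i from p(x_i | x_{-i})
    n_steps            : number of full sweeps over the variables
    """
    x = np.array(x_init, dtype=float).copy()
    samples = []
    for _ in range(n_steps):
        for i in range(len(x)):
            x[i] = sample_conditional(i, x)   # resample one coordinate at a time
        samples.append(x.copy())              # keep one sample per full sweep
    return samples
```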
Gibbs Sampling
[Figure: illustration of the Gibbs sampling chain]
Training
RBM Training
Training is complicated…
To train an RBM, we would like to minimize the negative log-likelihood function
L(W, b, c) = − (1/N) Σ_{n=1..N} log p(v^(n)).
To solve this, we use stochastic gradient descent.
Theorem: for any parameter θ of the energy,
∂( −log p(v) ) / ∂θ = E_{h|v}[ ∂E(v, h)/∂θ ] − E_{v,h}[ ∂E(v, h)/∂θ ].
The first expectation is the positive phase (under the data-clamped conditional); the second is the negative phase (under the model distribution, which is hard to compute).
RBM Training
Proof: Write p(v) = (1/Z) Σ_h exp( −E(v, h) ), so that
−log p(v) = log Z − log Σ_h exp( −E(v, h) ).
We differentiate the two terms with respect to a parameter θ of the energy.
First term:
∂ log Z / ∂θ = (1/Z) Σ_{v,h} exp( −E(v, h) ) ( −∂E(v, h)/∂θ ) = −E_{v,h}[ ∂E(v, h)/∂θ ].
This expectation under the model distribution is difficult to calculate: it is the negative phase.
Second term:
∂/∂θ [ −log Σ_h exp( −E(v, h) ) ] = Σ_h p(h | v) ∂E(v, h)/∂θ = E_{h|v}[ ∂E(v, h)/∂θ ].
This expectation is easy, because given v the conditionals p(h_j | v) are independent logistic distributions: it is the positive phase.
Adding the two terms gives the gradient in the theorem. Q.E.D.
RBM Training
Since ∂E(v, h)/∂W_ij = −v_i h_j, for the positive phase we need to calculate
E_{h|v}[ v_i h_j ] = v_i p(h_j = 1 | v),   where p(h_j = 1 | v) = σ( c_j + Σ_i W_ij v_i ).
The second term (the negative phase) is more tricky.
We approximate the expectation with a single sample obtained by running the Gibbs chain:
E_{v,h}[ v_i h_j ] ≈ ṽ_i h̃_j,   where (ṽ, h̃) is a sample from the chain.
RBM Training
[Figure: the alternating Gibbs chain between v and h; both transition kernels p(h | v) and p(v | h) are logistic]
CD-k (Contrastive Divergence) Pseudocode
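A minimal CD-k sketch in numpy, under the W, b, c notation assumed above (the learning rate and the single-example update are illustrative choices, not taken from the slides):

```python
import numpy as np

def cd_k_update(v0, W, b, c, k=1, lr=0.01, rng=None):
    """One CD-k parameter update for a binary-binary RBM on a single example v0.

    W : (n_visible, n_hidden) weight matrix, b : visible biases, c : hidden biases.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Positive phase: hidden-unit probabilities with the data clamped
    ph0 = sigmoid(c + v0 @ W)

    # Negative phase: k steps of block Gibbs sampling, starting from the data
    v = v0.astype(float).copy()
    for _ in range(k):
        h = (rng.random(len(c)) < sigmoid(c + v @ W)).astype(float)
        v = (rng.random(len(b)) < sigmoid(b + h @ W.T)).astype(float)
    phk = sigmoid(c + v @ W)

    # Gradient step: positive statistics minus negative statistics
    W += lr * (np.outer(v0, ph0) - np.outer(v, phk))
    b += lr * (v0 - v)
    c += lr * (ph0 - phk)
    return W, b, c
```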
Results
http://deeplearning.net/tutorial/rbm.html
RBM Training Results
Samples generated by the RBM after training. Each row represents a mini-batch of negative particles (samples from independent Gibbs chains); 1000 steps of Gibbs sampling were taken between each of those rows.
[Figures: original images and the learned filters]
Summary
Tasks:
• Inference: the conditionals p(h | v) and p(v | h) are products of logistic functions.
• Evaluating the likelihood function: easy up to the partition function, via the free energy.
• Sampling from the RBM: block Gibbs sampling, alternating between h and v.
• Training the RBM: stochastic gradient descent with the contrastive divergence (CD-k) approximation.
Thanks for your Attention!
RBM Training Results
Gaussian-Bernoulli RBM Training Results
Each document (story) is represented with a bag of words coming from a multinomial distribution with parameters given by the hidden topics h. After training, we can generate words from these topics.