Deep Boltzmann Machines. Paper by: R. Salakhutdinov, G. Hinton. Presenter: Roozbeh Gholizadeh

Post on 22-Dec-2015










Problems with some other methods!

Energy based models

Boltzmann machine

Restricted Boltzmann machine

Deep Boltzmann machine

Problems with other methods!

Supervised learning needs labeled data.

The amount of information is restricted by the labels!

Abnormalities must be detected before they have ever been seen, such as certain conditions in a nuclear power plant.

So instead of learning p(label | data), learn p(data).

Energy Based Models

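An energy function $E(x)$ is defined. It assigns a score (a scalar value) to each configuration $x$.

Example: the Boltzmann (Gibbs) distribution, $p(x) = e^{-E(x)} / Z$.

$Z = \int e^{-E(x)}\,dx$ is the integral of the numerator over all observations.

Parameters that give the observed data lower energy are desired, since lower energy means higher probability.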
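As a minimal sketch of the Boltzmann (Gibbs) distribution above, assuming a small finite set of configurations so that the integral becomes a sum, the energies can be turned into probabilities as follows. The function name and the example energies are hypothetical.

```python
import numpy as np

def boltzmann_distribution(energies):
    """Map energies E(x) to probabilities p(x) = exp(-E(x)) / Z."""
    energies = np.asarray(energies, dtype=float)
    # Shift by the minimum energy for numerical stability; the shift cancels in the ratio.
    unnormalized = np.exp(-(energies - energies.min()))
    # Normalizer of the shifted weights (plays the role of Z for a finite state space).
    Z = unnormalized.sum()
    return unnormalized / Z

# Hypothetical energies for three configurations.
probs = boltzmann_distribution([0.0, 1.0, 2.0])
print(probs)
```

The configuration with the lowest energy receives the highest probability, which is why learning pushes the energy of observed data down.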

Boltzmann machine

Markov random field (MRF) with hidden variables.

Undirected edges represent dependencies between units. Each edge can carry a weight.

Conditional distributions over hidden and visible units:
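For reference, the energy and the conditionals can be written as below. The notation is an assumption taken from the Salakhutdinov and Hinton paper rather than from the slide: binary visible units $v$ and hidden units $h$, visible-hidden weights $W$, visible-visible weights $L$, hidden-hidden weights $J$ (both with zero diagonal), logistic function $\sigma(x) = 1/(1+e^{-x})$, and biases omitted.

```latex
\begin{align}
E(v, h; \theta) &= -\tfrac{1}{2} v^\top L v - \tfrac{1}{2} h^\top J h - v^\top W h \\
p(h_j = 1 \mid v, h_{-j}) &= \sigma\Big(\sum_i W_{ij} v_i + \sum_{m \neq j} J_{jm} h_m\Big) \\
p(v_i = 1 \mid h, v_{-i}) &= \sigma\Big(\sum_j W_{ij} h_j + \sum_{k \neq i} L_{ik} v_k\Big)
\end{align}
```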


Learning process

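Parameter update: proportional to the difference between a data-dependent expectation and the model's expectation.

Exact maximum likelihood learning is intractable.

Use Gibbs sampling to approximate both expectations.

Run two separate Markov chains, one to approximate the data-dependent expectation and one to approximate the model's expectation.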
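Written out, the update rule takes the following form. The symbols are assumptions taken from the paper's notation, not from the slide: $\alpha$ is the learning rate, $W$, $L$ and $J$ are the visible-hidden, visible-visible and hidden-hidden weights, $E_{P_{\text{data}}}$ is the data-dependent expectation and $E_{P_{\text{model}}}$ is the model's expectation.

```latex
\begin{align}
\Delta W &= \alpha \left( E_{P_{\text{data}}}[v h^\top] - E_{P_{\text{model}}}[v h^\top] \right) \\
\Delta L &= \alpha \left( E_{P_{\text{data}}}[v v^\top] - E_{P_{\text{model}}}[v v^\top] \right) \\
\Delta J &= \alpha \left( E_{P_{\text{data}}}[h h^\top] - E_{P_{\text{model}}}[h h^\top] \right)
\end{align}
```

Both expectations sum over exponentially many configurations, which is why each is approximated with its own Markov chain.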

Restricted Boltzmann Machine

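Setting the visible-visible and hidden-hidden weights to zero (L = 0 and J = 0 in the paper's notation) gives the restricted Boltzmann machine.

No visible-visible and no hidden-hidden connections!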

Learning can be carried out efficiently using Contrastive Divergence (CD)

or the stochastic approximation procedure (SAP).

A variational approach is used to estimate the data-dependent expectations.
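To make Contrastive Divergence concrete, here is a minimal CD-1 sketch for a binary RBM. It is an illustration under stated assumptions, not the paper's implementation: biases are omitted, the learning rate and the toy data are arbitrary, and the names `sigmoid` and `cd1_update` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, v_data, rng, lr=0.1):
    """One CD-1 step for a bias-free binary RBM; W has shape (n_visible, n_hidden)."""
    # Positive phase: hidden probabilities given the data.
    h_prob = sigmoid(v_data @ W)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
    # Negative phase: one Gibbs step down to the visibles and up again.
    v_prob = sigmoid(h_sample @ W.T)
    v_sample = (rng.random(v_prob.shape) < v_prob).astype(float)
    h_prob_neg = sigmoid(v_sample @ W)
    # Gradient estimate: data correlations minus reconstruction correlations.
    n = v_data.shape[0]
    grad = (v_data.T @ h_prob - v_sample.T @ h_prob_neg) / n
    return W + lr * grad

rng = np.random.default_rng(0)
v_data = rng.integers(0, 2, size=(8, 6)).astype(float)  # toy binary batch
W = 0.01 * rng.standard_normal((6, 4))
W_new = cd1_update(W, v_data, rng)
```

Because an RBM has no within-layer connections, all hidden units can be sampled in parallel given the visibles, and vice versa, which is what makes each step cheap.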

Stochastic approximation procedure (SAP)

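$\theta_t$ and $X^t$: current parameters and state.

$X^t$ and $\theta_t$ are updated sequentially as follows:

Given $X^t$, a new state $X^{t+1}$ is sampled from a transition operator $T_{\theta_t}(X^{t+1}; X^t)$ that leaves $p_{\theta_t}$ invariant.

A new parameter $\theta_{t+1}$ is then obtained by replacing the intractable model's expectation by the expectation with respect to $X^{t+1}$.

The learning rate has to decrease with time, for example by setting $\alpha_t = 1/t$.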
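The two alternating steps of SAP can be sketched as a loop with persistent Markov chains. This is a simplified illustration for a bias-free binary RBM, not the paper's code: the Gibbs sweep plays the role of the transition operator, the chain average replaces the model's expectation, and the function name, chain count and toy data are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sap_train(v_data, n_hidden, n_steps, n_chains, rng):
    """SAP-style learning for a bias-free binary RBM (illustrative only)."""
    n_visible = v_data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # initial parameters
    # Initial state of the persistent chains.
    v_chain = rng.integers(0, 2, size=(n_chains, n_visible)).astype(float)
    for t in range(1, n_steps + 1):
        # State update: one Gibbs sweep under the current parameters.
        h_prob = sigmoid(v_chain @ W)
        h_chain = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_prob = sigmoid(h_chain @ W.T)
        v_chain = (rng.random(v_prob.shape) < v_prob).astype(float)
        # Data-dependent expectation (exact for the hidden units of an RBM).
        pos = v_data.T @ sigmoid(v_data @ W) / v_data.shape[0]
        # Model expectation replaced by an average over the chain states.
        neg = v_chain.T @ sigmoid(v_chain @ W) / n_chains
        # Parameter update with a learning rate that decreases as 1/t.
        alpha_t = 1.0 / t
        W = W + alpha_t * (pos - neg)
    return W, v_chain

rng = np.random.default_rng(0)
v_data = rng.integers(0, 2, size=(10, 6)).astype(float)  # toy binary data
W, chain = sap_train(v_data, n_hidden=4, n_steps=20, n_chains=5, rng=rng)
```

Unlike CD, the chains are not restarted at the data on every step; they continue from their previous state while the parameters change underneath them.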

Why go deep?


Deep architectures are representationally efficient: they need fewer computational units to represent the same function.

They allow a hierarchy of representations to be expressed.

Non-local generalization.

It is easier to monitor what is being learned and to guide the machine.

Deep Boltzmann Machine

Undirected connections between all adjacent layers.

Conditional distributions over visible and hidden units:
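For a DBM with two hidden layers, the energy and the conditionals can be written as below. The notation is an assumption taken from the paper rather than from the slide: visible units $v$, hidden layers $h^1$ and $h^2$, weight matrices $W^1$ (visible to first hidden layer) and $W^2$ (first to second hidden layer), logistic function $\sigma$, and biases omitted.

```latex
\begin{align}
E(v, h^1, h^2; \theta) &= -v^\top W^1 h^1 - (h^1)^\top W^2 h^2 \\
p(h^1_j = 1 \mid v, h^2) &= \sigma\Big(\sum_i W^1_{ij} v_i + \sum_m W^2_{jm} h^2_m\Big) \\
p(h^2_m = 1 \mid h^1) &= \sigma\Big(\sum_j W^2_{jm} h^1_j\Big) \\
p(v_i = 1 \mid h^1) &= \sigma\Big(\sum_j W^1_{ij} h^1_j\Big)
\end{align}
```

The middle layer $h^1$ receives input from both the layer below and the layer above, which is what distinguishes the DBM from a stack of independently used RBMs.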

Pretraining (greedy layerwise)

MNIST dataset


Misclassification error rates (reported in the paper for the NORB dataset):

DBM: 10.8%

SVM: 11.6%

Logistic regression: 22.5%

K-nearest neighbors: 18.4%

Thank you!
