
Deep Belief Networks

The New Generation of Neural Networks1

José Miguel Hernández Lobato and Daniel Hernández Lobato

Universidad Autónoma de Madrid, Computer Science Department

May 5, 2008

1 This presentation is mainly based on the work by Geoffrey E. Hinton.


Outline

1 Boltzmann Machines

2 Restricted Boltzmann Machines

3 Deep Belief Networks

4 Applications of Deep Belief Networks

5 Deep Belief Networks and the Human Brain


Boltzmann Machines

Networks of stochastic binary units with an associated energy

E(x) = -\frac{1}{2} x^t W x \qquad (1)

and an associated probability distribution

P(x \mid W) = \frac{1}{Z(W)} \exp\left[ \frac{1}{2} x^t W x \right]. \qquad (2)

The activity rule of the net implements Gibbs sampling from P(x \mid W):

P(x_i = 1 \mid W) = \frac{1}{1 + e^{-a_i}}, \qquad (3)

where a_i = \sum_j w_{ij} x_j.
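The following is a minimal sketch (not taken from the slides) of the activity rule in equation (3) as one Gibbs-sampling sweep over all units; it assumes W is a symmetric weight matrix with a zero diagonal and x is a vector of 0/1 unit states.

```python
import numpy as np

def gibbs_sweep(x, W, rng):
    """One Gibbs-sampling sweep over the units of a Boltzmann machine.

    x   : binary state vector (0/1 floats), shape (n,)
    W   : symmetric weight matrix with zero diagonal, shape (n, n)
    rng : a numpy Generator, e.g. np.random.default_rng()
    """
    x = x.copy()
    for i in range(len(x)):
        a_i = W[i] @ x                       # a_i = sum_j w_ij x_j  (w_ii = 0)
        p_on = 1.0 / (1.0 + np.exp(-a_i))    # P(x_i = 1 | rest), equation (3)
        x[i] = float(rng.random() < p_on)    # resample unit i
    return x
```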


Learning in Boltzmann Machines

Given a set of examples \{x^{(n)}\}_{n=1}^N we want to adjust W so that P(x \mid W) is a good generative model. For this, we maximize

\log\left[ \prod_{n=1}^N P(x^{(n)} \mid W) \right] = \sum_{n=1}^N \left[ \frac{1}{2} [x^{(n)}]^t W x^{(n)} - \log Z(W) \right]. \qquad (4)

The gradient ascent learning rule is

\Delta w_{ij} = \eta N \left( E_{\text{Data}}[x_i x_j] - E_{P(x|W)}[x_i x_j] \right), \qquad (5)

where \eta is the learning rate. The rule has a wake step and a sleep step.
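A hedged sketch of one application of rule (5), reusing the gibbs_sweep helper above; using a short chain of Gibbs sweeps in place of true equilibrium samples in the sleep step is an assumption of this sketch, and it is exactly the expensive part discussed on the next slides.

```python
def boltzmann_update(X, W, eta, n_sweeps, rng):
    """One application of rule (5) to a batch X of binary data, shape (N, n)."""
    N = X.shape[0]
    data_corr = X.T @ X / N                              # wake step: E_Data[x_i x_j]

    samples = np.empty_like(X)
    for n in range(N):                                   # sleep step: approximate samples from P(x|W)
        x = rng.integers(0, 2, size=X.shape[1]).astype(float)
        for _ in range(n_sweeps):
            x = gibbs_sweep(x, W, rng)
        samples[n] = x
    model_corr = samples.T @ samples / N                 # E_{P(x|W)}[x_i x_j]

    W = W + eta * N * (data_corr - model_corr)           # equation (5)
    np.fill_diagonal(W, 0.0)                             # keep the diagonal at zero
    return W
```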


Learning in Boltzmann Machines with hidden units

x denotes the visible units.

h denotes the hidden units.

y_i denotes an arbitrary neuron, where y = (x, h) collects the visible and hidden units.

The likelihood of W given a single data example x^{(n)} is

\sum_h P(x^{(n)}, h \mid W) = \sum_h \frac{1}{Z(W)} \exp\left[ \frac{1}{2} [y^{(n)}]^t W y^{(n)} \right]. \qquad (6)

The learning rule given a sample \{x^{(n)}\}_{n=1}^N is

\Delta w_{ij} = \eta \sum_{n=1}^N \left( E_{P(h|x^{(n)},W)}[y_i y_j] - E_{P(x,h|W)}[y_i y_j] \right) \qquad (7)

and again has a wake step and a sleep step.


Why are Boltzmann Machines not in widespread use?

Training depends on computing the gradient by Monte Carlo methods (Gibbs sampling).

A Boltzmann machine with many units requires a huge number of samples to approximate the equilibrium distribution.

The origin of the problem is that the conditional distributions of the hidden units and the visible units do not factorize, due to the visible-to-visible and hidden-to-hidden connections.


Restricted Boltzmann Machines

They are Boltzmann Machines where learning is feasible.

There are no visible-to-visible or hidden-to-hidden connections.

The distributions P(h | x, W) and P(x | h, W) now factorize:

P(h \mid x, W) = \prod_i P(h_i \mid x, W) \qquad (8)

P(x \mid h, W) = \prod_i P(x_i \mid h, W). \qquad (9)

The learning rule given a sample \{x^{(n)}\}_{n=1}^N is still the same:

\Delta w_{ij} = \eta \sum_{n=1}^N \left( E_{P(h|x^{(n)},W)}[x_i h_j] - E_{P(x,h|W)}[x_i h_j] \right). \qquad (10)
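As an illustrative sketch (not part of the slides), the factorized conditionals (8) and (9) reduce to independent logistic units; here W is the visible-by-hidden weight matrix and biases are omitted, as in the equations above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def p_h_given_x(x, W):
    """P(h_j = 1 | x, W) for every hidden unit j, equation (8)."""
    return sigmoid(x @ W)                  # shape (n_hidden,)

def p_x_given_h(h, W):
    """P(x_i = 1 | h, W) for every visible unit i, equation (9)."""
    return sigmoid(W @ h)                  # shape (n_visible,)
```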


Learning in RBMs

In an RBM we can compute E_{P(h|x^{(n)},W)}[x_i h_j] exactly.

Contrastive divergence (an approximation to Gibbs sampling) is used to estimate E_{P(x,h|W)}[x_i h_j] in the sleep step.

For each data point x^{(n)} we

1 Sample h^{(1)} from P(h \mid x^{(n)}).

2 Sample x'^{(n)} from P(x \mid h^{(1)}).

3 Sample h^{(2)} from P(h \mid x'^{(n)}).

Then \frac{1}{N} \sum_{n=1}^N x'^{(n)}_i h^{(2)}_j approximates E_{P(x,h|W)}[x_i h_j].
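A hedged sketch of one CD-1 update built from these three steps, reusing the sigmoid helper above; averaging over the batch (rather than summing as in rule (10)) and using probabilities for the final hidden statistics are common choices assumed here, not something specified on the slide.

```python
def cd1_update(X, W, eta, rng):
    """One contrastive-divergence (CD-1) update of W on a batch X, shape (N, n_visible)."""
    N = X.shape[0]

    # Wake statistics: E_{P(h|x^(n),W)}[x_i h_j], computable exactly in an RBM.
    ph_data = sigmoid(X @ W)                                   # P(h_j = 1 | x^(n), W)
    positive = X.T @ ph_data / N

    # Sleep statistics from the three sampling steps above.
    h1 = (rng.random(ph_data.shape) < ph_data).astype(float)  # 1: sample h^(1)
    px = sigmoid(h1 @ W.T)
    x_prime = (rng.random(px.shape) < px).astype(float)       # 2: sample x'^(n)
    ph_model = sigmoid(x_prime @ W)                            # 3: P(h^(2) = 1 | x'^(n), W)
    negative = x_prime.T @ ph_model / N

    # Batch-averaged version of rule (10); the factor N is folded into eta.
    return W + eta * (positive - negative)
```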


Deep Belief Network

A deep generative model can be obtained by stacking RBMs. Each layer of RBMs models more and more abstract features.

Each time an RBM is added to the stack, a lower bound on the likelihood increases.

(Diagram: RBMs stacked one at a time: first an RBM between x and h1; then an RBM between h1 and h2 is added on top; then an RBM between h2 and h3.)


Greedy algorithm for stacking RBMs

The deep architecture is initially empty.

1 Learn an RBM and put it on the top.

2 Filter the data through the current deep architecture.

3 Learn an RBM using the filtered data and put it on the top.

4 Filter the data through the current deep architecture.

5 Repeat 3 and 4 until n RBMs have been stacked.

To filter the data through the deep architecture we just propagate expectations up through the RBMs, conditioning on the original data.

We sample from the deep architecture by sampling from the top RBM and then sampling down through the remaining RBMs.
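A hedged sketch of the greedy stacking loop, assuming the cd1_update and sigmoid helpers defined earlier; layer_sizes lists the hidden-layer widths, and the number of epochs and learning rate are illustrative choices rather than values from the slides.

```python
def train_dbn(X, layer_sizes, n_epochs, rng):
    """Greedy stacking: learn an RBM, filter the data through it, repeat."""
    data, stack = X, []
    for n_hidden in layer_sizes:
        W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
        for _ in range(n_epochs):              # steps 1 and 3: learn an RBM on the current data
            W = cd1_update(data, W, 0.1, rng)
        stack.append(W)
        data = sigmoid(data @ W)               # steps 2 and 4: propagate expectations up
    return stack
```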


Fine-tuning DBNs

After the greedy algorithm, the recognition and generative weights can be fine-tuned by means of a wake-sleep method or a back-propagation technique.

Wake-Sleep

The original data is propagated up, the top RBM iterates a few times, and then a sample is propagated down. The weights are updated so that the sample of the DBN matches the original data.

Back-Propagation

The network is unfolded to produce encoder and decoder networks. Stochastic activities are replaced by deterministic probabilities, and the weights are updated by back-propagation for optimal reconstruction.
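As an illustrative sketch of the unrolling idea only (assuming the stack returned by train_dbn above, with deterministic probabilities in place of stochastic activities), the encoder and decoder share transposed weights; the actual gradient-based fine-tuning of the weights is omitted.

```python
def reconstruct(x, stack):
    """Deterministic encoder/decoder pass through the unrolled DBN."""
    code = x
    for W in stack:                      # encoder: propagate probabilities up
        code = sigmoid(code @ W)
    recon = code
    for W in reversed(stack):            # decoder: mirror of the stack with transposed weights
        recon = sigmoid(recon @ W.T)
    return recon                         # back-propagation would minimize the reconstruction error
```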


Applications of DBNs

High-level feature extraction

Non-linear dimensionality reduction: MNIST and Olivetti faces.

Digit Recognition

MNIST example from Hinton’s web page.


Feature Extraction: Olivetti faces

Description

400 faces in 24 × 24 bitmap images.

Grayscale images.

Pixel intensities (0 to 255) are normalized to lie in the [0, 1] interval.

5 transformations increase the set size to 1600 images.


Feature Extraction: Olivetti faces.

Comparison of original images, DBN reconstructions, and PCA reconstructions.


Feature Extraction: MNIST.

Description

60,000 handwritten digit images.

28 × 28 grayscale pixels.

Normalized to lie in [0, 1].

DBN architecture: 1000, 500, 250, 30, 2.


Feature Extraction: MNIST. PCA results.

(Scatter plot of the MNIST digits projected onto the first two principal components, PC1 vs. PC2.)


Feature Extraction: MNIST. DBN results.

(Scatter plot of the MNIST digits in the two-dimensional code space of the DBN, h1 vs. h2.)


MNIST: digit recognition.

See the web application.


DBNs and the Human Brain

The human brain could be a huge DBN with a temporal dimension.

The memory-prediction framework is a theory of brain functioning which has many elements in common with DBNs:

A hierarchy of recognition with higher levels representing more and more abstract and invariant features.

The predictions propagated down in the memory-prediction framework are similar to the sleep phase in DBNs.

DBNs have a top-level associative memory. The memory-prediction framework places the hippocampus at the top of its hierarchy. The hippocampus is essential for the formation of long-term memory.


A Cube


References

David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

Hinton, G. E., Osindero, S. and Teh, Y. A fast learning algorithm for deep belief nets. Neural Computation 18, pp. 1527-1554, 2006.

Hinton, G. E. and Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science, Vol. 313, no. 5786, pp. 504-507, 28 July 2006.

Jeff Hawkins and Sandra Blakeslee. On Intelligence. Times Books, 2004.


Questions?
