
Page 1: Feedback Networks and Hopfield Networks - KTH


Feedback Networks and Hopfield Networks

Erik Fransén, Daniel Gillblad

CB, KTH

Page 2: Feedback Networks and Hopfield Networks - KTH

Outline

1 Reducing the Boltzmann machine

2 Hopfield Networks

3 Hebbian Learning

4 Beyond the Boltzmann machine

Some examples and images from MacKay (2003): Information Theory, Inference, and Learning Algorithms, a good introduction to machine learning and information theory.

Page 3: Feedback Networks and Hopfield Networks - KTH

Reducing the Boltzmann machine

Problem of the Boltzmann machine: the number of computations grows exponentially with problem size.

Reducing the topology: the restricted Boltzmann machine (no connections between hidden units).
Replacing the stochastic variables with their mean: the Hopfield network.

Page 4: Feedback Networks and Hopfield Networks - KTH

Replacing stochastic variables

The activation (input to a neuron) from other neurons is stochastic. Replace $a_i = \sum_j w_{ij} x_j$ with $\langle a_i \rangle$.
The state of a neuron is stochastic. Replace $x_j$ with $\langle x_j \rangle$.
To do this, mean field theory is used. Simply stated: the average of the function value is approximated by the function value of the average.
+ much faster
- only first-order (and with some tricks also second-order) correlations can be handled
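A minimal numerical illustration of the mean-field idea, assuming a single tanh unit with Gaussian-distributed input (all values here are illustrative): compare the sampled average of the function value with the function value of the average.

```python
import numpy as np

rng = np.random.default_rng(0)

mean_a, std_a = 0.8, 0.5                     # assumed input statistics (illustrative)
a_samples = rng.normal(mean_a, std_a, size=100_000)

# Average of the function value, estimated by sampling
avg_of_f = np.tanh(a_samples).mean()

# Mean-field approximation: function value of the average
f_of_avg = np.tanh(mean_a)

print(f"<tanh(a)>  ~ {avg_of_f:.3f}")
print(f"tanh(<a>)  = {f_of_avg:.3f}")
```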

Page 5: Feedback Networks and Hopfield Networks - KTH

Hopfield Networks

A fully connected feedback network.
The weights are constrained to be symmetric.
Can be used:

as nonlinear associative memories or content-addressable memories;
to solve optimization problems.

Page 6: Feedback Networks and Hopfield Networks - KTH

Discrete Hopfield Networks, Notation

We will denote the weight from neuron i to neuron j as $w_{ij}$.
The network consists of $I$ fully connected neurons through symmetric connections, i.e. $w_{ij} = w_{ji}$.
There are no self-connections, thus $w_{ii} = 0$.
Biases $w_{i0}$ may be included.
The activity of a neuron is written as $x_i$.

Page 7: Feedback Networks and Hopfield Networks - KTH

Discrete Hopfield Networks, Activities

A Hopfield network’s activity rule is for each neuron to update its state in accordance with a thresholding activation function,

$$x(a) = \Theta(a) \equiv \begin{cases} 1 & a \geq 0 \\ -1 & a < 0 \end{cases}$$

As there is feedback, we need to define an order for the updates:

Synchronous updates. All neurons compute their activations $a_i = \sum_j w_{ij} x_j$, then update their states simultaneously using $x_i = \Theta(a_i)$.
Asynchronous updates. One neuron at a time updates its activation and state. The sequence can be fixed or random.

The properties of the network may be sensitive to the update strategy.
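A minimal sketch of the two update strategies for a discrete Hopfield network (NumPy; W and x are assumed to be a symmetric weight matrix and a ±1 state vector):

```python
import numpy as np

def theta(a):
    """Thresholding activation: +1 if a >= 0, else -1."""
    return np.where(a >= 0, 1, -1)

def update_synchronous(W, x):
    """All neurons compute a_i = sum_j w_ij x_j, then update simultaneously."""
    return theta(W @ x)

def update_asynchronous(W, x, order=None, rng=None):
    """One neuron at a time; the sequence may be fixed or random."""
    x = x.copy()
    if order is None:
        order = (rng or np.random.default_rng()).permutation(len(x))
    for i in order:
        x[i] = theta(W[i] @ x)
    return x
```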

Page 8: Feedback Networks and Hopfield Networks - KTH

Discrete Hopfield Networks, Convergence

Let us update one neuron at a time according to the update rule.
The network state will converge within a finite number of steps if

$$w_{ij} = w_{ji}, \qquad w_{ii} = 0$$

Define an energy measure,

$$E = -\frac{1}{2} \sum_{i,j} w_{ij} x_i x_j$$

The change in energy for each update is

$$\Delta E_{x_k \to x_k^*} = -\Big(\sum_i w_{ki} x_i x_k^* - \sum_i w_{ki} x_i x_k\Big) = -(x_k^* - x_k) \sum_i w_{ki} x_i \leq 0$$
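A small numerical check of the convergence argument, assuming symmetric weights with zero diagonal: the energy is non-increasing under asynchronous updates (the random weights and sizes are illustrative).

```python
import numpy as np

def energy(W, x):
    """E = -1/2 * sum_ij w_ij x_i x_j."""
    return -0.5 * x @ W @ x

rng = np.random.default_rng(1)
I = 20
W = rng.normal(size=(I, I))
W = (W + W.T) / 2            # enforce w_ij = w_ji
np.fill_diagonal(W, 0.0)     # enforce w_ii = 0

x = rng.choice([-1, 1], size=I)
for _ in range(5 * I):                        # asynchronous updates
    i = rng.integers(I)
    e_before = energy(W, x)
    x[i] = 1 if W[i] @ x >= 0 else -1
    assert energy(W, x) <= e_before + 1e-12   # energy never increases
```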

Page 9: Feedback Networks and Hopfield Networks - KTH

Hebbian Learning

Hebb’s postulate of learning is the oldest and most famous of all learning rules (Hebb, 1949):
When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.

We can reformulate this into:
1 If two neurons on either side of a connection are activated simultaneously, the strength of the connection is increased.
2 If two neurons on either side of a connection are activated asynchronously, the connection is weakened or eliminated.

Page 10: Feedback Networks and Hopfield Networks - KTH

Hebbian Connections

Four key properties:
1 Time-dependent mechanism. Modification depends on the time of occurrence.
2 Local mechanism. Only uses locally available information.
3 Interactive mechanism. A change depends on activity levels on both sides of the connection.
4 Correlational mechanism. Connection change depends on the correlation between activities.

Page 11: Feedback Networks and Hopfield Networks - KTH

Hebbian Learning and Correlation

Hebbian learning can be described in terms of correlation. Positively correlated activities are increased:

$$\frac{dw_{ij}}{dt} \sim \mathrm{Correlation}(x_i, x_j)$$

For example,

$$\frac{dw_{ij}}{dt} = \eta\, \mathrm{cov}(x_i, x_j) = \eta\, E[(x_i - \bar{x}_i)(x_j - \bar{x}_j)]$$

If two stimuli co-occur, the Hebbian learning rule will increase the weights.
Unsupervised learning.
Can be used to provide pattern completion.
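A sketch of a covariance-based Hebbian update estimated from a batch of activity patterns (the learning rate and the batch interface are illustrative assumptions):

```python
import numpy as np

def hebbian_covariance_step(W, X, eta=0.01):
    """One Hebbian step: dW ~ eta * cov(x_i, x_j), estimated from data.

    X has shape (n_samples, n_units); W has shape (n_units, n_units).
    """
    centered = X - X.mean(axis=0)             # (x - x_bar)
    cov = centered.T @ centered / len(X)      # E[(x_i - x_bar_i)(x_j - x_bar_j)]
    W_new = W + eta * cov
    np.fill_diagonal(W_new, 0.0)              # keep w_ii = 0, as in the Hopfield setting
    return W_new
```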

Page 12: Feedback Networks and Hopfield Networks - KTH

Associative Networks

Heteroassociation: mapping from one pattern to another, e.g.
$y = \mathrm{sign}(Wx)$ (thresholding)
$y = Wx$ (linear mapping)
Autoassociation: mapping to the same pattern, e.g.
$x = \mathrm{sign}(Wx)$
$x = Wx$
Can be performed with a recurrent network.
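A minimal sketch of both thresholded mappings (NumPy; helper names are illustrative):

```python
import numpy as np

def heteroassociate(W, x):
    """Map an input pattern x to an associated output pattern y = sign(Wx)."""
    return np.sign(W @ x)

def autoassociate(W, x, n_iter=10):
    """Repeatedly apply x = sign(Wx): a recurrent network settling on a pattern."""
    for _ in range(n_iter):
        x = np.sign(W @ x)
    return x
```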

Page 13: Feedback Networks and Hopfield Networks - KTH

Discrete Hopfield Networks, Learning

The learning rule is intended to make a set of desired memories $\{x^{(n)}\}$ be stable states of the activity rule.
Each memory is a binary pattern, $x_i \in \{-1, 1\}$.
The weights are set using the sum of outer products (Hebbian learning),

$$w_{ij} = \eta \sum_n x_i^{(n)} x_j^{(n)}$$

where $\eta$ is an unimportant constant. To prevent the weights from growing with the number of patterns, $\eta$ is often set to $\frac{1}{N}$.
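A sketch of this learning rule with η = 1/N, where N is the number of patterns: the weight matrix is the scaled sum of outer products, with the diagonal zeroed.

```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian outer-product rule: w_ij = (1/N) * sum_n x_i^(n) x_j^(n), w_ii = 0.

    patterns: array of shape (N, I) with entries in {-1, +1}.
    """
    N, I = patterns.shape
    W = patterns.T @ patterns / N   # sum of outer products, scaled by eta = 1/N
    np.fill_diagonal(W, 0.0)        # no self-connections
    return W
```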

Page 14: Feedback Networks and Hopfield Networks - KTH

Continuous Hopfield Networks

Using the same architecture and learning rule as in the discrete Hopfield network, we can define a continuous Hopfield network.
Activities are real numbers between -1 and 1.
Update activities as single neurons with sigmoid activation functions:

Synchronous or asynchronous updates. Activations are again calculated as $a_i = \sum_j w_{ij} x_j$, but neurons use the activation function $x_i = \tanh(a_i)$.

Although the learning rule is the same as before, the value of $\eta$ now becomes important.
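A sketch of the continuous variant: the structure matches the discrete updates above, with tanh as the activation function (function names are illustrative):

```python
import numpy as np

def continuous_update_synchronous(W, x):
    """Continuous Hopfield step: a_i = sum_j w_ij x_j, x_i = tanh(a_i)."""
    return np.tanh(W @ x)

def continuous_update_asynchronous(W, x, rng=None):
    """One randomly chosen neuron at a time updates with tanh."""
    x = np.asarray(x, dtype=float).copy()
    rng = rng or np.random.default_rng()
    for i in rng.permutation(len(x)):
        x[i] = np.tanh(W[i] @ x)
    return x
```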

Page 15: Feedback Networks and Hopfield Networks - KTH

Attractors

Use local minima to store patterns.
The input of the network is the initial state.
The output of the network is the closest local minimum.
There is an area of attraction around each local minimum.

Page 16: Feedback Networks and Hopfield Networks - KTH

Attractors, Example 1

Page 17: Feedback Networks and Hopfield Networks - KTH

Attractors, Example 2

Page 18: Feedback Networks and Hopfield Networks - KTH

Attractor Types

“Normal” attractors:
Point attractors.
Limit cycles.

Strange attractors (the system itself is chaotic):
Sensitive dependence on initial conditions.
Deterministic, but can exhibit a behaviour so complicated that it looks random.

Page 19: Feedback Networks and Hopfield Networks - KTH

Associative Memories

For simple Hopfield networks, it often takes just one iteration to converge from a randomly perturbed stored pattern.
The network often has more stable states in addition to the desired memories:
The inverse of a stable state.
Mixtures of the memories.

Introducing “brain damage” by setting a subset of the learned weights to zero often still allows the network to complete the patterns.
Patterns can usually be added up to a certain point, at which the network fails catastrophically.
Network properties are not robust when changing from asynchronous to synchronous updates.

Page 20: Feedback Networks and Hopfield Networks - KTH

Examples, auto associative memory


Page 21: Feedback Networks and Hopfield Networks - KTH

Examples, hetero associative memory

moscow------russia

lima----------peru

london-----england

tokyo--------japan

edinburgh-scotland

ottawa------canada

oslo--------norway

stockholm---sweden

paris-------france

moscow---????????? ⇒ moscow------russia

?????????---canada ⇒ ottawa------canada

otowaa------canada ⇒ ottawa------canada

egindurrh-scotland ⇒ edinburgh-scotland

Page 22: Feedback Networks and Hopfield Networks - KTH

Hopfield Networks and Optimization

Since a Hopfield network minimizes an energy function, we can map some optimization problems onto Hopfield networks.
Travelling Salesman:

[Figure: candidate tours over cities A, B, C, D represented on a city-by-position (1-4) grid of neuron activities.]

Page 23: Feedback Networks and Hopfield Networks - KTH

Capacity of Hopfield Networks

The standard learning method gives us local minima in the correct places.
However, retrieving patterns can fail in numerous ways:

1 Individual bits in some memories might be corrupted. A stable state of the network is displaced a little from the desired memory.
2 Entire memories might be absent from the set of attractors.
3 Spurious additional memories unrelated to the desired memories might be present.
4 Spurious additional memories derived from the desired memories through operations such as mixing and inversion may be present.

These are in general all undesirable, although failure type 4 may be regarded as generalization.

Page 24: Feedback Networks and Hopfield Networks - KTH

Examples on capacity

moscow------russia

lima----------peru

london-----england

tokyo--------japan

edinburgh-scotland

ottawa------canada

oslo--------norway

stockholm---sweden

paris-------france

→W→

moscow------russia

lima----------peru

londog-----englard

tonco--------japan

edinburgh-scotland

lostoslo--------norway

stockholm---sweden

paris-------france

wrpkmh---xqpqwqxpq

paris-------sweden

ecnarf-------sirap

Page 25: Feedback Networks and Hopfield Networks - KTH

More Examples on capacity, 1


Page 26: Feedback Networks and Hopfield Networks - KTH

More examples on capacity, 2


Page 27: Feedback Networks and Hopfield Networks - KTH

Consequences of Capacity Calculations, 1

If we try to store $N \simeq 0.18\, I$ patterns, then about 1% of the bits will be unstable after the first iteration, starting from a stored pattern.
When N/I is large, unstable bits may cause an avalanche effect of bits becoming unstable during iteration.
There is a sharp discontinuity at

$$N_{\mathrm{critical}} = 0.138\, I$$

When N/I exceeds 0.138, the system only has spurious states.

Page 28: Feedback Networks and Hopfield Networks - KTH

Consequences of Capacity Calculations, 2


Page 29: Feedback Networks and Hopfield Networks - KTH

Transition Properties

For all N/I, stable states uncorrelated with the desired memories exist.

For $N/I \in (0, 0.138)$ there are stable states close to the desired memories.

For $N/I \in (0, 0.05)$ the desired memories have lower energy than the uncorrelated states.

For $N/I \in (0.05, 0.138)$ the uncorrelated states dominate.

For $N/I \in (0, 0.03)$ there are additional mixture states.

Page 30: Feedback Networks and Hopfield Networks - KTH

Capacity of Random Patterns, 1

Assume that the patterns we want to store are random binary patterns.
Let us study the stability of a single bit, assuming that the state of the network is set to the desired pattern $x^{(n)}$.
The activation of a particular neuron is

$$a_i = \sum_j w_{ij} x_j^{(n)}$$

and the weights are (for $i \neq j$)

$$w_{ij} = x_i^{(n)} x_j^{(n)} + \sum_{m \neq n} x_i^{(m)} x_j^{(m)}$$

Page 31: Feedback Networks and Hopfield Networks - KTH

Capacity of Random Patterns, 2

We split W into two terms, one representing a “signal” reinforcing the desired memory, and the second a “noise” term. The activation becomes

$$a_i = \sum_{j \neq i} x_i^{(n)} x_j^{(n)} x_j^{(n)} + \sum_{j \neq i} \sum_{m \neq n} x_i^{(m)} x_j^{(m)} x_j^{(n)} = (I-1)\, x_i^{(n)} + \sum_{j \neq i} \sum_{m \neq n} x_i^{(m)} x_j^{(m)} x_j^{(n)}$$

The first term is $(I-1)$ times the desired state $x_i^{(n)}$.

The second term is a sum of $(I-1)(N-1)$ random quantities $x_i^{(m)} x_j^{(m)} x_j^{(n)}$. These are independent random variables with mean 0 and variance 1.

Page 32: Feedback Networks and Hopfield Networks - KTH

Capacity of Random Patterns, 3

We can conclude that $a_i$ has mean $(I-1)\, x_i^{(n)}$ and variance $(I-1)(N-1)$.
Assume that I and N are large enough that the distinction between $I$ and $I-1$ (and between $N$ and $N-1$) is negligible.
This means that $a_i$ is approximately Gaussian distributed with mean $I x_i^{(n)}$ and variance $IN$.
The probability that bit i will flip is

$$P(i \text{ flips}) = \Phi\!\left(-\frac{I}{\sqrt{IN}}\right) = \Phi\!\left(-\frac{1}{\sqrt{N/I}}\right)$$

where

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$
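A quick numerical check of this formula, using SciPy's normal CDF for Φ; N/I = 0.18 reproduces the roughly 1% flip probability mentioned earlier in the deck.

```python
import numpy as np
from scipy.stats import norm

def p_flip(n_over_i):
    """P(bit flips) = Phi(-1 / sqrt(N/I)) for random +/-1 patterns."""
    return norm.cdf(-1.0 / np.sqrt(n_over_i))

for ratio in [0.05, 0.138, 0.18]:
    print(f"N/I = {ratio:5.3f}:  P(flip) = {p_flip(ratio):.4f}")
```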

Page 33: Feedback Networks and Hopfield Networks - KTH

Increasing Capacity in Hopfield Networks

We can increase the capacity of the network if we abandon the Hebbian learning rule and instead use an objective function that measures how well the patterns are stored, and minimize it.
For all patterns $x^{(n)}$, if all other neurons are set correctly, the activation of neuron i should be such that $x_i = x_i^{(n)}$:

$$\mathrm{Cost}(W) = -\sum_i \sum_n \Big[ t_i^{(n)} \ln\big(y_i^{(n)}\big) + \big(1 - t_i^{(n)}\big) \ln\big(1 - y_i^{(n)}\big) \Big]$$

$$t_i^{(n)} = \begin{cases} 1 & x_i^{(n)} = 1 \\ 0 & x_i^{(n)} = -1 \end{cases} \qquad y_i^{(n)} = \frac{1}{1 + e^{-a_i^{(n)}}}$$

Parameters can be found using gradient descent.
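A sketch of one gradient-descent step on this cost, treating each neuron as a logistic regression on the other bits of each pattern and re-imposing symmetry and the zero diagonal afterwards (learning rate and interface are illustrative):

```python
import numpy as np

def capacity_training_step(W, patterns, lr=0.05):
    """One gradient-descent step on Cost(W), using all patterns at once.

    patterns: shape (N, I), entries in {-1, +1}.
    """
    T = (patterns + 1) / 2              # t = 1 if x = +1, else 0
    A = patterns @ W.T                  # a_i^(n) with the other bits set to x^(n)
    Y = 1.0 / (1.0 + np.exp(-A))        # y_i^(n) = sigmoid(a_i^(n))
    grad = (Y - T).T @ patterns         # dCost/dW for the cross-entropy cost
    W = W - lr * grad
    W = (W + W.T) / 2                   # keep weights symmetric
    np.fill_diagonal(W, 0.0)            # keep w_ii = 0
    return W
```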

Page 34: Feedback Networks and Hopfield Networks - KTH

Beyond the Boltzmann machine

The Boltzmann machine is a special case of an undirected graphical model; it belongs to a class called Markov Random Fields.
More about (statistical) belief networks in a later lecture.
