learning rules 1 computational neuroscience 03 lecture 8
TRANSCRIPT
![Page 1: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/1.jpg)
Learning Rules 1
Computational Neuroscience 03
Lecture 8
![Page 2: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/2.jpg)
Last week we showed how to model neurons and networks of neurons using firing rate models:
$$\tau_r \frac{dv}{dt} = -v + F(\mathbf{w}\cdot\mathbf{u})$$
We then discussed how to add them together to form networks of neurons.
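As an illustration of the single-neuron firing rate model, here is a minimal sketch that integrates $\tau_r\, dv/dt = -v + F(\mathbf{w}\cdot\mathbf{u})$ with the forward Euler method. The rectifying choice of F, the parameter values and the name `simulate_rate_neuron` are assumptions made for this example, not taken from the lecture.

```python
import numpy as np

def simulate_rate_neuron(w, u, tau_r=10.0, dt=0.1, steps=2000):
    """Euler-integrate tau_r dv/dt = -v + F(w.u) from v = 0.

    F is assumed to be simple rectification, F(x) = max(0, x).
    """
    drive = max(0.0, float(np.dot(w, u)))   # F(w.u), constant input
    v = 0.0
    trace = np.empty(steps)
    for k in range(steps):
        v += (dt / tau_r) * (-v + drive)
        trace[k] = v
    return trace

# Hypothetical weights and presynaptic rates.
w = np.array([0.5, 0.2, 0.1])
u = np.array([10.0, 5.0, 20.0])
trace = simulate_rate_neuron(w, u)
steady_state = max(0.0, float(np.dot(w, u)))
```

For constant input the rate relaxes exponentially, with time constant `tau_r`, towards `steady_state`.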
![Page 3: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/3.jpg)
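Feedforward and Recurrent networks
$$\tau_r \frac{d\mathbf{v}}{dt} = -\mathbf{v} + F(\mathbf{W}\cdot\mathbf{u} + \mathbf{M}\cdot\mathbf{v}) = -\mathbf{v} + F(\mathbf{h} + \mathbf{M}\cdot\mathbf{v})$$
Where the weight vector is replaced by a matrix W and M is the matrix of recurrent weights. The feedforward input W.u is also often replaced with a vector h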
![Page 4: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/4.jpg)
Recurrent networks can also do this but have much more complex dynamics than feedforward nets. Also more difficult to analyse
Much analysis focuses on looking at the eigenvectors of the matrix M
Can show for instance that networks can exhibit selective amplification if there is one dominant eigenvector (cf PCA)
![Page 5: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/5.jpg)
Or, if one eigenvalue is exactly equal to 1 and the others are < 1, can get integration of inputs and therefore persistent activity, as activity does not stop when the input stops
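The integration argument can be checked numerically. The sketch below assumes a linear recurrent network, $\tau_r\, d\mathbf{v}/dt = -\mathbf{v} + \mathbf{h} + \mathbf{M}\cdot\mathbf{v}$, with a hypothetical symmetric M built to have eigenvalues 1 and 0.5; the matrix, the pulse input and the function name are illustrative choices only.

```python
import numpy as np

def simulate_linear_recurrent(M, h, tau_r=10.0, dt=0.1):
    """Euler-integrate tau_r dv/dt = -v + h(t) + M v; one row of h per time step."""
    v = np.zeros(M.shape[0])
    trace = np.empty_like(h)
    for k in range(h.shape[0]):
        v = v + (dt / tau_r) * (-v + h[k] + M @ v)
        trace[k] = v
    return trace

# Hypothetical M with eigenvalue 1 along e1 and 0.5 along e2.
e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([1.0, -1.0]) / np.sqrt(2)
M = 1.0 * np.outer(e1, e1) + 0.5 * np.outer(e2, e2)

# Input is switched on for the first 1000 steps only.
h = np.zeros((4000, 2))
h[:1000] = [1.0, 0.0]

trace = simulate_linear_recurrent(M, h)
along_e1 = trace @ e1   # eigenvalue 1: integrates the input, then holds
along_e2 = trace @ e2   # eigenvalue 0.5: decays once the input stops
```

The component along `e1` keeps the value it reached when the input was switched off, while the component along `e2` decays back to zero.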
But how are we to generate such precise weight changes? Need some synaptic modification rules
![Page 6: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/6.jpg)
Hebb's postulate
Hebb's postulate of learning (or simply Hebb's rule) (1949) is the following:
"When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth processes or metabolic changes take place in one or both cells such that A's efficiency as one of the cells firing B, is increased".
This rule forms the basis of much of the research on the role of synaptic plasticity in memory and learning
It has been generalised to include decreases of strength when neuron A repeatedly fails to be involved in the activation of B, and generally looks at the correlation or covariance of the activities of the pre- and postsynaptic neurons
![Page 7: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/7.jpg)
Learning in the hippocampus
Here see examples of long-term potentiation and depression (LTP and LTD).
High frequency stimulation induces LTP while long-lasting low frequency stimulation leads to LTD
‘Long Term’ refers to changes > 10 mins
![Page 8: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/8.jpg)
Learning Rules
We will consider 3 types of learning and will focus on Hebbian-type learning, though others exist (modifications based on pre/postsynaptic activity only etc)
• Unsupervised learning: Network responds during training solely as a result of its connections and intrinsic dynamics. Net self-organises in a manner dependent on inputs and synaptic plasticity rule
• Supervised learning: Here the network also has a “teacher” in the form of a set of desired ‘target’ outputs for each input. Not especially biologically plausible, but good for existence proofs
• Reinforcement learning: Somewhere in between the 2. The net does not know the target output but gets feedback via reward/punishment
![Page 9: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/9.jpg)
Unsupervised Learning
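Start with a single postsynaptic neuron and a linear activation function:
$$\tau_r \frac{dv}{dt} = -v + \mathbf{w}\cdot\mathbf{u}$$
As synaptic changes take place on a much longer time scale than these dynamics, the firing rate equation reduces to:
$$v = \mathbf{w}\cdot\mathbf{u} = \sum_{b=1}^{N_u} w_b u_b$$
Start with the simplest Hebbian-style plasticity for a single neuron:
$$\tau_w \frac{d\mathbf{w}}{dt} = v\,\mathbf{u}$$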
![Page 10: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/10.jpg)
As u and v denote firing rates, the product vu can be interpreted as a probability that the pre- and postsynaptic neurons fire together.
Each different set of u is known as an input pattern. As weight changes are slow, rather than summing all changes separately can average the input patterns and thus compute the average change.
To do this use < > to denote averages over the ensemble of input patterns. Thus get:
$$\tau_w \frac{d\mathbf{w}}{dt} = \langle v\,\mathbf{u} \rangle$$
Remembering that v = w.u this gives the correlation-based rule
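$$\tau_w \frac{d\mathbf{w}}{dt} = \mathbf{Q}\cdot\mathbf{w} \qquad\text{or:}\qquad \tau_w \frac{dw_b}{dt} = \sum_{b'=1}^{N_u} Q_{bb'}\, w_{b'}$$
Where Q is the correlation matrix: $Q_{bb'} = \langle u_b u_{b'} \rangle$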
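The step from the averaged Hebb rule to the correlation-based rule can be checked numerically. This is a minimal sketch with two made-up input patterns and an arbitrary weight vector; the helper name `correlation_matrix` is an assumption for the example.

```python
import numpy as np

def correlation_matrix(patterns):
    """Q[b, b'] = <u_b u_b'>, averaging over the rows of `patterns`."""
    U = np.asarray(patterns, dtype=float)
    return U.T @ U / U.shape[0]

# Hypothetical input patterns (one per row) and weights.
patterns = np.array([[1.0, 1.0],
                     [1.0, 0.0]])
w = np.array([0.1, 0.1])

Q = correlation_matrix(patterns)

# Left-hand side: average of v*u over patterns, with v = w.u for each pattern.
avg_vu = np.mean([(w @ u) * u for u in patterns], axis=0)

# Right-hand side of the correlation-based rule.
Qw = Q @ w
```

Both `avg_vu` and `Qw` give the same weight change direction, which is why the rule only depends on the inputs through Q.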
![Page 11: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/11.jpg)
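Eg for 2 patterns $\mathbf{u}^1$ and $\mathbf{u}^2$, $Q_{ij} = \tfrac{1}{2}(u^1_i u^1_j + u^2_i u^2_j)$
So suppose $\mathbf{u}^1 = (1, 0)$ and $\mathbf{u}^2 = (0, 1)$ then
$$Q = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}$$
Alternatively $\mathbf{u}^1 = (1, 1)$ and $\mathbf{u}^2 = (1, 0)$ then
$$Q = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}$$
Suppose $\tau_w = 10$ and $\mathbf{w} = (0.1, 0.1)$ and we have the 2nd matrix then:
$$\Delta\mathbf{w} = \frac{1}{\tau_w} Q\,\mathbf{w} = 0.1 \begin{pmatrix} 1 & 0.5 \\ 0.5 & 0.5 \end{pmatrix} \begin{pmatrix} 0.1 \\ 0.1 \end{pmatrix} = \begin{pmatrix} 0.015 \\ 0.01 \end{pmatrix}$$
As $\mathbf{w}_{n+1} = \mathbf{w}_n + \Delta\mathbf{w}$, $\mathbf{w}_1 = (0.115, 0.11)$, $\mathbf{w}_2 \approx (0.13, 0.12)$ ...
Which leads to instability and uncontrolled growth of w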
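The worked example can be iterated further to see the instability. The sketch below uses the second Q matrix and the same $\tau_w = 10$ and starting weights, recomputing $\Delta\mathbf{w}$ from the current weights at every step; the function name `hebb_steps` and the number of steps are arbitrary choices.

```python
import numpy as np

Q = np.array([[1.0, 0.5],
              [0.5, 0.5]])
tau_w = 10.0

def hebb_steps(w0, n_steps):
    """Iterate w_{n+1} = w_n + Q w_n / tau_w and return all iterates."""
    w = np.array(w0, dtype=float)
    history = [w.copy()]
    for _ in range(n_steps):
        w = w + Q @ w / tau_w
        history.append(w.copy())
    return np.array(history)

history = hebb_steps([0.1, 0.1], 200)

# Compare the final direction with the principal eigenvector of Q.
eigvals, eigvecs = np.linalg.eigh(Q)      # eigenvalues in ascending order
e1 = eigvecs[:, -1]
cosine = abs(history[-1] @ e1) / np.linalg.norm(history[-1])
```

Here `history[1]` is (0.115, 0.11) and `history[2]` is (0.132, 0.12125). The length of w keeps growing while its direction lines up with the principal eigenvector of Q.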
![Page 12: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/12.jpg)
[Figure: weight trajectories for initial conditions w = (0.1, 0.1) and w = (0.1, -0.3)]
Outcome is dependent on the eigenvectors of Q and the initial conditions, but is unstable
![Page 13: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/13.jpg)
To avoid unbounded growth can impose a saturation constraint.
However, this means all weights go to max or min and thus we have no competition between different synapses
This means that the neuron cannot distinguish between presynaptic inputs
Also since u, v are firing rates and therefore positive, Hebb rule only describes LTP.
However, earlier figure showed that synapses can depress in strength if presynaptic activity is accompanied by a low level of postsynaptic activity
Can also get results where opposite is true
![Page 14: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/14.jpg)
Analysis focuses on looking at the eigenvectors of Q, the correlation matrix of the input vectors
For instance, can show that after training the Hebbian rule leads to $v \propto \mathbf{e}_1\cdot\mathbf{u}$ for an arbitrary vector u, and that the weight vector, expressed as a sum of eigenvectors, is dominated by $\mathbf{e}_1$, ie $\mathbf{w} \propto \mathbf{e}_1$
That is, v is the projection of u onto the principal eigenvector of Q
Eg for a Q with principal eigenvector (1, -1)/sqrt(2) we would expect w to end up as (wmax, 0) or (0, wmax).
However, because of saturation constraints can get (wmax, wmax)
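The effect of the saturation constraint can be sketched as follows. The matrix Q below is a made-up example whose principal eigenvector is (1, -1)/sqrt(2), and weights are clipped to the range [0, wmax] after every step; the step size, starting points and function name are assumptions for illustration.

```python
import numpy as np

def hebb_with_saturation(w0, Q, w_max=1.0, rate=0.01, steps=2000):
    """Iterate w <- clip(w + rate * Q w, 0, w_max)."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = np.clip(w + rate * (Q @ w), 0.0, w_max)
    return w

# Hypothetical correlation matrix: eigenvalue 1.4 along (1, -1)/sqrt(2),
# eigenvalue 0.6 along (1, 1)/sqrt(2).
Q = np.array([[1.0, -0.4],
              [-0.4, 1.0]])

w_selective = hebb_with_saturation([0.3, 0.2], Q)    # ends at (w_max, 0)
w_saturated = hebb_with_saturation([0.5, 0.45], Q)   # ends at (w_max, w_max)
```

In the second run the weights start close to the (1, 1) direction, so the first weight saturates before the second has begun to shrink, and both end at the upper bound.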
![Page 15: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/15.jpg)
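Covariance rule
To get LTD can introduce a postsynaptic or presynaptic threshold:
$$\tau_w \frac{d\mathbf{w}}{dt} = (v - \theta_v)\,\mathbf{u} \qquad\text{or}\qquad \tau_w \frac{d\mathbf{w}}{dt} = v\,(\mathbf{u} - \boldsymbol{\theta}_u)$$
Below the thresholds get depression, above potentiation. A convenient choice for the thresholds is the average pre/postsynaptic activity $\langle\mathbf{u}\rangle$ or $\langle v\rangle$. If we now replace v with w.u we get the covariance rule
$$\tau_w \frac{d\mathbf{w}}{dt} = \mathbf{C}\cdot\mathbf{w}$$
Where C is the covariance matrix of the input data, ie:
$$\mathbf{C} = \langle(\mathbf{u}-\langle\mathbf{u}\rangle)(\mathbf{u}-\langle\mathbf{u}\rangle)^T\rangle = \langle\mathbf{u}\mathbf{u}^T\rangle - \langle\mathbf{u}\rangle\langle\mathbf{u}\rangle^T = E[(\mathbf{u}-\langle\mathbf{u}\rangle)(\mathbf{u}-\langle\mathbf{u}\rangle)^T]$$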
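A quick numerical check that both threshold rules average to the covariance rule when the thresholds are set to the mean activities. The input data and weights below are randomly generated placeholders, not data from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(0.0, 2.0, size=(500, 3))   # hypothetical non-negative input rates
w = np.array([0.2, -0.1, 0.4])             # arbitrary weight vector

u_mean = U.mean(axis=0)                    # presynaptic threshold <u>
v = U @ w                                  # v = w.u for every pattern
v_mean = v.mean()                          # postsynaptic threshold <v>

# Covariance matrix of the inputs.
C = (U - u_mean).T @ (U - u_mean) / U.shape[0]

# Pattern averages of the two threshold rules.
post_rule = ((v - v_mean)[:, None] * U).mean(axis=0)   # <(v - theta_v) u>
pre_rule = (v[:, None] * (U - u_mean)).mean(axis=0)    # <v (u - theta_u)>
```

Both `post_rule` and `pre_rule` equal `C @ w` up to rounding, even though the two rules differ on individual patterns.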
![Page 16: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/16.jpg)
Note that as <v> keeps changing we need to keep updating the postsynaptic threshold while the presynaptic one is independent of the weights
Although both average to the same thing, they do have differences.
The postsynaptic threshold means that weights are only modified for non-zero presynaptic activities. If v is below threshold then this results in homosynaptic depression
Alternatively, the presynaptic threshold reduces the strength of inactive synapses for v>0: heterosynaptic depression
Although the covariance rule allows LTD, it is still unstable due to positive feedback
Also we do not have competition, but this can be introduced by allowing the threshold to slide as follows
![Page 17: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/17.jpg)
BCM rule
As the covariance rule allows LTD without postsynaptic/presynaptic activity, Bienenstock, Cooper and Munro (1982) proposed an alternative, for which there is experimental evidence, where the postsynaptic threshold is dynamic
$$\tau_w \frac{d\mathbf{w}}{dt} = v\,\mathbf{u}\,(v - \theta_v)$$
![Page 18: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/18.jpg)
This is again unstable if $\theta_v$ is fixed.
However, if we allow the threshold to grow faster than v we get stability. For instance, use $\theta_v$ as a low pass filtered version of $v^2$
$$\tau_\theta \frac{d\theta_v}{dt} = v^2 - \theta_v$$
Usually $\tau_\theta$ is set to be less than $\tau_w$ so that changes in $\theta_v$ are faster than changes in v
Now get competition between synapses since strengthening some synapses results in threshold increasing meaning that it is harder for others to be strengthened
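The stabilising and competitive effect of the sliding threshold can be sketched with a small simulation. This example makes several assumptions not in the lecture: two orthogonal input patterns, updates averaged over both patterns at each step, and time constants of 100 and 10 steps for the weights and the threshold.

```python
import numpy as np

def bcm_sliding_threshold(w0, patterns, tau_w=100.0, tau_theta=10.0, steps=20000):
    """Euler steps (dt = 1) of the pattern-averaged BCM rule with sliding threshold."""
    w = np.array(w0, dtype=float)
    theta = 0.0
    for _ in range(steps):
        v = patterns @ w                                   # output for each pattern
        # <v u (v - theta)> averaged over patterns
        dw = np.mean((v * (v - theta))[:, None] * patterns, axis=0) / tau_w
        # threshold low-pass filters <v^2>
        dtheta = (np.mean(v ** 2) - theta) / tau_theta
        w, theta = w + dw, theta + dtheta
    return w, theta

# Hypothetical inputs: each pattern drives one synapse only.
patterns = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
w_final, theta_final = bcm_sliding_threshold([0.6, 0.4], patterns)
```

With these settings the initially stronger synapse wins: the weights settle near (2, 0) with the threshold near 2, rather than growing without bound.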
![Page 19: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/19.jpg)
Synaptic normalisation
A more direct way of enforcing competition is through synaptic normalisation
Idea is that the postsynaptic neuron can only support a certain amount of total synaptic weight, so strengthening one synapse leads to weakening others
Can either hold the sum of weights constant if all are +ve or –ve, or can constrain the sum of squares of the weights (cf ANN network pruning)
2 types: subtractive normalisation and multiplicative normalisation
![Page 20: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/20.jpg)
Subtractive normalisation
$$\tau_w \frac{d\mathbf{w}}{dt} = v\,\mathbf{u} - \frac{v\,(\mathbf{n}\cdot\mathbf{u})\,\mathbf{n}}{N_u}$$
Where n is a vector of ones of length $N_u$, so n.u is the sum of all the inputs u.
Thus the second term is simply a vector of length $N_u$ with identical entries, ie (k, k, ..., k), whose sum over all the elements is equal to the sum over all the elements of vu. Thus the total increase in the weights is 0
This rule must be augmented by a saturation constraint to prevent the weights becoming negative, so that if a weight becomes zero, it is not moved downwards
Also, without upper saturation this often leads to all weights bar one being zero. Note also that the rule involves global knowledge of weights
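A minimal sketch of one subtractive normalisation step, showing that the sum of the weights is unchanged. The inputs are random placeholders, and the saturation constraint is deliberately left out here so that the conservation of the total weight is exact.

```python
import numpy as np

def subtractive_step(w, u, tau_w=100.0):
    """One Euler step of tau_w dw/dt = v u - v (n.u) n / N_u (no saturation applied)."""
    v = w @ u
    n = np.ones_like(u)
    dw = (v * u - v * (n @ u) * n / u.size) / tau_w
    return w + dw

rng = np.random.default_rng(1)
w = np.array([0.3, 0.2, 0.5])              # hypothetical starting weights
total_before = w.sum()
for _ in range(50):
    u = rng.uniform(0.0, 1.0, size=3)      # hypothetical input rates
    w = subtractive_step(w, u)
total_after = w.sum()
```

Individual weights move apart, but `total_after` matches `total_before` up to rounding error.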
![Page 21: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/21.jpg)
Multiplicative normalisation
$$\tau_w \frac{d\mathbf{w}}{dt} = v\,\mathbf{u} - \alpha v^2\,\mathbf{w}$$
Where $\alpha$ is a +ve constant: known as Oja's rule (1982)
This rule is more local than the previous one, as it only involves the weight in question and the pre- and postsynaptic activities.
However, its form is based on theoretical arguments rather than experimental data
The previous rule was rigid, as it had to be satisfied at all times, whereas this is more dynamic, with $|\mathbf{w}|^2$ gradually relaxing to $1/\alpha$
This induces competition: if one weight increases, the maintenance of constant length of the weight vector forces others to decrease
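The relaxation of the weight vector length under Oja's rule can be illustrated with the input-averaged form of the rule, $\tau_w\, d\mathbf{w}/dt = \mathbf{Q}\cdot\mathbf{w} - \alpha(\mathbf{w}\cdot\mathbf{Q}\cdot\mathbf{w})\mathbf{w}$, which follows from substituting $v = \mathbf{w}\cdot\mathbf{u}$ and averaging. The matrix Q, the value of alpha and the step size below are example choices.

```python
import numpy as np

def oja_averaged(w0, Q, alpha=0.5, rate=0.01, steps=5000):
    """Euler steps of the input-averaged Oja rule: dw ~ Q w - alpha (w.Q.w) w."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w + rate * (Q @ w - alpha * (w @ Q @ w) * w)
    return w

Q = np.array([[1.0, 0.5],
              [0.5, 0.5]])                 # example correlation matrix
alpha = 0.5
w = oja_averaged([0.1, 0.1], Q, alpha=alpha)

eigvals, eigvecs = np.linalg.eigh(Q)       # eigenvalues in ascending order
e1 = eigvecs[:, -1]                        # principal eigenvector
squared_length = w @ w
alignment = abs(w @ e1) / np.linalg.norm(w)
```

Unlike the plain Hebb rule, the weights stay bounded: `squared_length` settles at 1/alpha and w points along the principal eigenvector of Q.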
![Page 22: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/22.jpg)
Both the Hebbian and Oja's rules, run for long enough, generate weight vectors parallel to the principal eigenvector of the correlation matrix, as in A
This is basically principal component analysis (PCA), which is theoretically the optimal way, in terms of retaining information, to encode high dimensional information onto a lower dimensional subspace
However, B shows what happens if the input vectors don't have zero mean (as in real systems), but this problem is alleviated by using covariance-based rules
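The difference between correlation-based and covariance-based rules for inputs with non-zero mean can be seen by comparing principal eigenvectors. The data below are constructed by hand for illustration: points spread along the (1, -1) direction around a mean of (2, 2).

```python
import numpy as np

def principal_eigenvector(A):
    """Eigenvector of a symmetric matrix with the largest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return eigvecs[:, -1]

# Hypothetical inputs: mean (2, 2), variation only along (1, -1)/sqrt(2).
mean = np.array([2.0, 2.0])
direction = np.array([1.0, -1.0]) / np.sqrt(2)
scales = np.array([-1.0, -0.5, 0.5, 1.0])
U = mean + scales[:, None] * direction

Q = U.T @ U / len(U)                              # correlation matrix
centred = U - U.mean(axis=0)
C = centred.T @ centred / len(U)                  # covariance matrix

e_Q = principal_eigenvector(Q)
e_C = principal_eigenvector(C)
mean_dir = mean / np.linalg.norm(mean)
```

The principal eigenvector of Q points along the mean of the data, whereas that of C points along the direction of greatest variance.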
![Page 23: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/23.jpg)
Timing based rules
Previous rules don't take timing into account
This can be crucial, since if the presynaptic spike occurs after the postsynaptic one get LTD rather than LTP
![Page 24: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/24.jpg)
Therefore need to integrate over time as in the following:
$$\tau_w \frac{d\mathbf{w}}{dt} = \int_0^\infty d\tau \left( H(\tau)\, v(t)\, \mathbf{u}(t-\tau) + H(-\tau)\, v(t-\tau)\, \mathbf{u}(t) \right)$$
Where H is a function like the solid line in the previous figure
Such functions still require saturation constraints but timing can generate competition
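A discretised sketch of the timing-based rule for a single synapse. The exponential shape of H, its amplitudes and time constants, and the function names are assumptions for this example (the lecture only specifies H via the figure), and $\tau_w$ is set to 1.

```python
import numpy as np

def stdp_window(tau, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0):
    """Hypothetical window H(tau): positive for tau > 0 (pre before post), negative otherwise."""
    tau = np.asarray(tau, dtype=float)
    return np.where(tau > 0,
                    a_plus * np.exp(-tau / tau_plus),
                    -a_minus * np.exp(tau / tau_minus))

def total_weight_change(u, v, dt=1.0, max_lag=100):
    """Sum over time of the discretised integral over tau in the timing-based rule."""
    dw = 0.0
    for k in range(1, max_lag + 1):
        tau = k * dt
        ltp = np.sum(v[k:] * u[:-k])   # v(t) u(t - tau): pre leads post
        ltd = np.sum(v[:-k] * u[k:])   # v(t - tau) u(t): post leads pre
        dw += (stdp_window(tau) * ltp + stdp_window(-tau) * ltd) * dt * dt
    return float(dw)

# Brief pre- and postsynaptic activity pulses 10 time steps apart.
early = np.zeros(200); early[50] = 1.0
late = np.zeros(200);  late[60] = 1.0

dw_pre_first = total_weight_change(u=early, v=late)    # pre before post
dw_post_first = total_weight_change(u=late, v=early)   # post before pre
```

Presynaptic activity that precedes postsynaptic activity gives a positive weight change, and the reverse order gives a negative one.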
![Page 25: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/25.jpg)
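Multiple Postsynaptic Neurons
Can extend the rules defined previously to nets with multiple postsynaptic neurons. In these networks the output rates are:
$$\mathbf{v} = \mathbf{W}\cdot\mathbf{u} + \mathbf{M}\cdot\mathbf{v}$$
Thus
$$\mathbf{v} = \mathbf{K}\cdot\mathbf{W}\cdot\mathbf{u} \qquad\text{where}\qquad \mathbf{K} = (\mathbf{I} - \mathbf{M})^{-1}$$
And the Hebbian rule becomes:
$$\tau_w \frac{d\mathbf{W}}{dt} = \langle \mathbf{v}\,\mathbf{u} \rangle = \mathbf{K}\cdot\mathbf{W}\cdot\mathbf{Q}$$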
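The relations above can be verified numerically for a small network. All matrices and input patterns in this sketch are arbitrary illustrative values (two output neurons with mutual inhibition, three inputs).

```python
import numpy as np

# Hypothetical feedforward weights W (2 outputs x 3 inputs) and recurrent weights M.
W = np.array([[0.5, 0.1, 0.0],
              [0.2, 0.3, 0.4]])
M = np.array([[0.0, -0.3],
              [-0.3, 0.0]])
K = np.linalg.inv(np.eye(2) - M)           # K = (I - M)^-1

# Steady-state output for one input.
u = np.array([1.0, 2.0, 0.5])
v = K @ W @ u

# Averaged Hebbian term over a small set of input patterns (one per row).
patterns = np.array([[1.0, 2.0, 0.5],
                     [0.0, 1.0, 1.0],
                     [2.0, 0.0, 1.0]])
Q = patterns.T @ patterns / len(patterns)  # input correlation matrix
V = patterns @ (K @ W).T                   # output for each pattern
avg_vu = V.T @ patterns / len(patterns)    # <v u^T>
```

The output `v` satisfies v = W.u + M.v, and the pattern average `avg_vu` equals `K @ W @ Q`.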
![Page 26: Learning Rules 1 Computational Neuroscience 03 Lecture 8](https://reader035.vdocuments.us/reader035/viewer/2022062318/55160064550346d46f8b5b07/html5/thumbnails/26.jpg)
Can also use feature-based models where the net is indexed by input features rather than by individual inputs
Network models can have adaptive feedforward weights and fixed recurrent ones, or vice versa, or both layers can be adaptive
Can get competition through mainly inhibitory recurrent connections
Can then get eg self-organising maps and elastic nets where K can have forms like: