A Biologically Inspired Visual Working Memory for Deep Networks
Ethan Harris
Dr Jonathon Hare, Professor Mahesan Niranjan
Vision Learning and Control Group, Department of Electronics and Computer Science
University of Southampton
https://arxiv.org/abs/1901.03665
https://github.com/ethanwharris/STAWM
March 20, 2019
Ethan Harris (VLC) STAWM March 20, 2019 1 / 28
Neural Networks and the Brain
Neural networks don’t look like the brain
Or do they?
We need to be clear about how we characterise the brain and neural networks
Should they be similar?
The Scope of Neuroscience
Neuroscience can be used to mean many things
Best described as a spectrum: from cellular dynamics, to neuroanatomy, to psychology
What are we interested in? Psychophysics, which applies across the spectrum and is concerned with understanding our perception of the world around us
In This Lecture
We will look at a particular class of deep network which tries to model visual attention
We will consider relevant concepts from the science of perception to bring these models closer to visual processing in nature
Through experimentation we will demonstrate the viability of this alternative approach to deep learning research
An Attention Primer
In a deep network we can construct layers which, rather than process a particular signal, alter or warp it in some fashion to focus on the relevant information
We call this attention
Our function needs to be differentiable (almost everywhere, with a non-zero gradient at least somewhere) for this to work
Interpolation (an aside)
Estimate a value by considering its location among known points
Can take the nearest neighbour or use a linear estimate (bilinear for 2D data like images)
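A minimal sketch of bilinear interpolation (the function name and the toy 2×2 image are illustrative assumptions): the sampled value is a linear blend of the four pixels surrounding the query point.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img (H x W) at continuous coordinates (x, y) by
    linearly weighting the four surrounding pixels."""
    H, W = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    # Fractional offsets within the cell
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom

img = np.array([[0.0, 1.0], [2.0, 3.0]])
print(bilinear_sample(img, 0.5, 0.5))  # centre of the 2x2 grid -> 1.5
```

Nearest-neighbour sampling would instead just round (x, y) to the closest integer pixel.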
Spatial Transformer Networks (Jaderberg et al., 2015)
Interpolation schemes are differentiable
Bilinear is just a linear combination of pixel values; nearest neighbour is piece-wise differentiable (the gradient is 1 for each pixel I choose and 0 elsewhere; think dropout)
We can differentiably make a flow field from 6 affine co-ordinates (other such transforms are available), giving a network the ability to learn to spatially transform an image based on some observations
No explicit supervision is required to learn the attention policy, just backprop through the whole network
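A sketch of the spatial transformer forward pass in plain numpy (function names and the toy 4×4 image are illustrative assumptions): build a flow field from the 6 affine parameters over a normalised grid, then bilinearly sample the input at those positions. In PyTorch the same two steps are `affine_grid` and `grid_sample`.

```python
import numpy as np

def affine_grid(theta, H, W):
    """Build a flow field from the 6 affine parameters theta (2x3),
    over normalised coordinates in [-1, 1]."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # 3 x HW
    grid = theta @ coords                                        # 2 x HW
    return grid.reshape(2, H, W)

def sample(img, grid):
    """Bilinearly sample img at the grid's (x, y) positions."""
    H, W = img.shape
    # Map normalised coordinates back to pixel indices
    x = (grid[0] + 1) * (W - 1) / 2
    y = (grid[1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 1)
    x1, y1 = np.clip(x0 + 1, 0, W - 1), np.clip(y0 + 1, 0, H - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bot = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bot

img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
out = sample(img, affine_grid(identity, 4, 4))
print(np.allclose(out, img))  # the identity transform reproduces the image
```

Because both steps are built from linear combinations, gradients flow from the sampled output back to the 6 parameters, so the attention policy can be learned with ordinary backprop.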
Recurrent Models of Visual Attention (Mnih et al., 2014)
RNNs are a thing (and sometimes they work!)
Combine them with an attention mechanism and you have a model that can process an image through a series of glimpses
Called a Recurrent Attention Model (RAM)
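A minimal sketch of such a glimpse loop (all sizes, weight initialisations, and the hard square crop are illustrative assumptions, not the exact RAM architecture): the recurrent state accumulates what has been seen, and each step emits the next glimpse location.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 28
img = rng.random((H, W))              # stand-in input image

d_h, S_g, N_g = 32, 8, 5              # hidden size, glimpse size, glimpse count
Wh = rng.standard_normal((d_h, d_h)) * 0.1
Wg = rng.standard_normal((d_h, S_g * S_g)) * 0.1
Wl = rng.standard_normal((2, d_h)) * 0.1

h = np.zeros(d_h)
loc = np.array([0.5, 0.5])            # start at the image centre
for _ in range(N_g):
    # Extract an S_g x S_g glimpse at the current location
    r, c = (loc * (np.array([H, W]) - S_g)).astype(int)
    g = img[r:r + S_g, c:c + S_g].ravel()
    h = np.tanh(Wh @ h + Wg @ g)      # recurrent state carries what was seen
    loc = 1 / (1 + np.exp(-(Wl @ h))) # emit the next location in [0, 1]^2

print(h.shape)
```

The hard crop here is exactly the non-differentiable step the next slide discusses: you cannot backprop through integer indexing, which is why the original RAM needed reinforcement learning.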
Enriched DRAM (Ablavatski et al., 2017)
Original RAMs used non-differentiable hard attention to pixels and had to be trained with reinforcement learning
Enriched DRAM replaces this with a spatial transformer, so now we don't need RL!
Attention in Nature
When humans attend to a visual scene we rely on our memory (not an RNN)
Every time we glimpse at something, this triggers a short term change to the strength (efficacy) of connections (synapses) in the brain
These changes help to alter our perception of future glimpses for a short period of time
Plasticity
In ‘The Organization of Behavior’, Hebb (1949) described an observable learning procedure in neurons which became known as Hebb’s postulate:
‘When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.’
Unsupervised learning from inputs based on associations between them
Held to be the basis of learning and memory within the brain
A Differentiable Memory Mechanism
Suppose we want to model plasticity in the brain
We can use Hebb’s rule: ‘Neurons that fire together wire together’
Rosenblatt proposed a (continuous time, discrete activation) formulation along these lines (Rosenblatt, 1962)
Can we make a discrete time, continuous activation relaxation of it?
The Hebb-Rosenblatt Memory
Learning rule based on Rosenblatt's perceptron:

$$W(t + \Delta t) = W(t) + \Delta t \operatorname{diag}(\eta)\left(y^{(n)}(t)\, x^{(n)\top}\right) - \Delta t \operatorname{diag}(\delta)\, W(t) \tag{1}$$
[Figure: the Hebb-Rosenblatt memory as a two-layer network. Units i in the input layer L^I feed units j in the output layer L^II; each connection combines a fixed weight θ_i with a plastic memory weight W(t)_ij applied to the activations x^(n).]
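A sketch of one discrete-time update from Equation (1) (the function name, sizes, and the example rates η and δ are illustrative assumptions): a Hebbian term strengthens weights where pre- and post-synaptic activations co-fire, and a decay term pulls all weights back towards zero.

```python
import numpy as np

def memory_update(W, x, y, eta, delta, dt=1.0):
    """One Hebb-Rosenblatt step from Equation (1):
    W(t+dt) = W(t) + dt*diag(eta)(y x^T) - dt*diag(delta) W(t)."""
    hebbian = np.diag(eta) @ np.outer(y, x)   # 'fire together, wire together'
    decay = np.diag(delta) @ W                # forgetting term
    return W + dt * hebbian - dt * decay

rng = np.random.default_rng(0)
W = np.zeros((3, 3))             # memory starts empty
x = rng.standard_normal(3)       # pre-synaptic activations
y = rng.standard_normal(3)       # post-synaptic activations
eta = np.full(3, 0.1)            # per-unit learning rates
delta = np.full(3, 0.05)         # per-unit decay rates
W = memory_update(W, x, y, eta, delta)
```

The decay term is what makes this a *short-term* memory: without fresh co-activations, the stored associations fade over repeated steps.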
Hebbian Learning vs Backprop (an aside)
Backpropagation is used to update weights on a backward pass according to the expected outputs
With Hebbian learning we only make observations about our inputs, looking for structure in the data from the ground up
This has a strong connection to component analysis (e.g. Oja’s rule(Oja, 1982))
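The connection to component analysis can be seen directly: Oja's rule is a Hebbian update with a normalising decay, and the weight vector of a single linear neuron trained with it converges to the first principal component of the data. A sketch (the toy data distribution, learning rate, and sample count are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data whose dominant variance lies along the (1, 1) diagonal
X = rng.standard_normal((5000, 2)) * np.array([1.0, 0.1])
X = X @ np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)

w = rng.standard_normal(2)
lr = 0.02
for x in X:
    y = w @ x                    # linear neuron output
    w += lr * y * (x - y * w)    # Hebbian term y*x minus decay y^2*w

w /= np.linalg.norm(w)
print(np.abs(w))                 # aligned with the (1, 1) diagonal
```

The `y * x` part is pure Hebb; the `y**2 * w` decay keeps the weights bounded and is what turns unsupervised association into principal component extraction.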
STAWM: Short-Term Attentive Working Memory
Combine our Hebb-Rosenblatt memory with the EDRAM from before
Minor changes to the ‘what’ and ‘where’ pathways
[Figure: STAWM architecture. A context CNN feeds an emission RNN which proposes glimpses; for each of the N_g glimpses, a glimpse CNN and aggregator RNN produce an update to the memory network. The hidden state is initialised with zeros and the memory output forms the model output.]
Using the Memory
Now we have a memory, how do we use it?
Need some kind of query signal to project through the weights
For this we use the output of the context CNN followed by a single linear layer
Once we have this representation we can attach networks that use it in different ways
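A hypothetical sketch of the read path (the sizes, the linear-layer weights, and variable names are all assumptions for illustration): the context features are mapped to a query by one linear layer, and the query is projected through the memory weights to produce the representation downstream networks consume.

```python
import numpy as np

rng = np.random.default_rng(0)
d_ctx, d_mem = 8, 16

context = rng.standard_normal(d_ctx)            # output of the context CNN
Wq = rng.standard_normal((d_mem, d_ctx))        # single linear layer
bq = np.zeros(d_mem)
query = Wq @ context + bq                       # query signal

W_mem = rng.standard_normal((d_mem, d_mem))     # Hebb-Rosenblatt weights
representation = W_mem @ query                  # project through the memory
print(representation.shape)
```

Because the read is just a matrix product, gradients pass through it to both the query network and (via the update rule) the glimpse pathway.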
Classification
Classification is easy: just add some linear layers
[Figure: classification head. The context CNN's projection through the terminal state of the memory network is fed to a classifier.]
What about something more complicated? Maybe even recurrent?
Drawing
For drawing we can use transpose convolutions to output a sketch after each glimpse
[Figure: drawing model. The encoder (parameters φ), the context CNN plus memory network, yields an intermediate projection parameterising µ and σ² of N(µ, σ²); a transpose-CNN decoder (parameters θ) then produces a sketch for each of the N_g glimpses.]
We just need a way to combine the sketches
Alpha Compositing: the ‘over’ operator (an aside)
We can use some old school computer graphics
Sequentially (and differentiably) combine images using an alpha mask
$$C_N = C_{N-1} \odot (1 - B) + t_{A^{-1}}(\sigma(U_N)) \odot B, \qquad B \sim \operatorname{Concrete}\!\left(t_{A^{-1}}(\sigma(P)),\, \tau\right) \tag{2}$$

where $C_0 \in \{0\}^{H_i \times W_i}$, $t_{A^{-1}}$ is an inverse glimpse transform, $\sigma$ is the sigmoid function, $U_N$ is a sketch or update and $P$ are the alpha probabilities
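A simplified sketch of one 'over' step from Equation (2) (two deliberate simplifications, both assumptions: the soft mask B is taken as σ(P) directly rather than a Concrete sample, and the inverse glimpse transform is the identity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def composite(canvas, update_logits, alpha_logits):
    """One simplified 'over' step: blend the new sketch onto the
    canvas under a per-pixel soft alpha mask."""
    B = sigmoid(alpha_logits)                  # per-pixel alpha mask
    return canvas * (1 - B) + sigmoid(update_logits) * B

canvas = np.zeros((2, 2))                      # C_0 = all zeros
# A fully opaque mask (large positive logits) paints the update everywhere
canvas = composite(canvas, np.full((2, 2), 10.0), np.full((2, 2), 10.0))
print(canvas.round(3))
```

Sampling B from a Concrete (relaxed Bernoulli) distribution with temperature τ, as in the actual equation, keeps the mask differentiable while pushing it towards hard 0/1 decisions.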
Classification Experiments
Supervised classification performance on MNIST (LeCun, 1998)
| Model | Error |
| --- | --- |
| RAM, Sg = 8, Ng = 5 | 1.34% |
| RAM, Sg = 8, Ng = 6 | 1.12% |
| RAM, Sg = 8, Ng = 7 | 1.07% |
| STAWM, Sg = 8, Ng = 8 | 0.41% ± 0.03 |
| STAWM, Sg = 28, Ng = 10 | 0.35% ± 0.02 |
Classification Experiments
Self-supervised classification performance on MNIST
| Model | Error |
| --- | --- |
| DBM, Dropout (Srivastava et al., 2014) | 0.79% |
| Adversarial (Goodfellow et al., 2014) | 0.78% |
| Virtual Adversarial (Miyato et al., 2015) | 0.64% |
| Ladder (Rasmus et al., 2015) | 0.57% |
| STAWM, Sg = 6, Ng = 12 | 0.77% |
Classification Experiments
Self-supervised classification performance on CIFAR-10 (Krizhevsky and Hinton, 2009)
| Model | Error |
| --- | --- |
| Baseline β-VAE (Higgins et al., 2016) | 63.44% ± 0.31 |
| STAWM, Sg = 16, Ng = 8 | 55.40% ± 0.63 |
Painting CIFAR-10 (Krizhevsky and Hinton, 2009)
We can view the update sequence for the drawing model
What if the update sequence becomes interesting?
CelebA (Liu et al., 2015) Emergent Segmentation
Top: the drawn results. Middle: the target images. Bottom: the learnedmask from the last glimpse, pointwise multiplied with the target image.
Interpretable Classifiers
What if we allow STAWM to use multiple query vectors through the memory to perform a series of different tasks, like classification and drawing?
[Figure: a sample of misclassifications. Top: sketchpad results and associated predictions. Bottom: associated input images and target classes.]
Summary / Conclusions
We constructed a model which has a memory much more similar to that found in nature
We used that model to perform tasks, such as drawing, in a way that is much more natural for humans
Not only does this model work much better than previous attentional models, it also has some emergent properties
STAWM learns the structure in images of faces from CelebA for segmentation without supervision, and learns to draw and classify in tandem so that we (humans) can view and understand its mistakes
Should neural networks look more like the brain? We think so
References
Ablavatski, A., Lu, S., and Cai, J. (2017). Enriched deep recurrent visual attention model for multiple object recognition. In Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on, pages 971–978. IEEE.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.
Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2016). beta-VAE: Learning basic visual concepts with a constrained variational framework.
Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. In Advances in Neural Information Processing Systems, pages 2017–2025.
Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images.
LeCun, Y. (1998). The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV).
Miyato, T., Maeda, S., Koyama, M., Nakae, K., and Ishii, S. (2015). Distributional smoothing with virtual adversarial training. stat, 1050:2.
Mnih, V., Heess, N., Graves, A., et al. (2014). Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pages 2204–2212.
Oja, E. (1982). Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15(3):267–273.
Rasmus, A., Berglund, M., Honkala, M., Valpola, H., and Raiko, T. (2015). Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pages 3546–3554.
Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan Books.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.