Machine Learning and AI via Brain simulations

Andrew Ng, Stanford University & Google

Thanks to:
Stanford: Adam Coates, Quoc Le, Honglak Lee, Andrew Saxe, Andrew Maas, Chris Manning, Jiquan Ngiam, Richard Socher, Will Zou
Google: Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Marc’Aurelio Ranzato, Paul Tucker, Kay Le
This talk
The idea of “deep learning.” Using brain simulations, we hope to:
- Make learning algorithms much better and easier to use.
- Make revolutionary advances in machine learning and AI.
This vision is not mine alone; it is shared with many researchers:
E.g., Samy Bengio, Yoshua Bengio, Tom Dean, Jeff Dean, Nando de
Freitas, Jeff Hawkins, Geoff Hinton, Quoc Le, Yann LeCun, Honglak
Lee, Tommy Poggio, Ruslan Salakhutdinov, Josh Tenenbaum, Kai
Yu, Jason Weston, ….
I believe this is our best shot at progress towards real AI.
What do we want computers to do with our data?
Images/video: label (“Motorcycle”), suggest tags, image search, …
Audio: speech recognition, music classification, speaker identification, …
Text: web search, anti-spam, machine translation, …
Computer vision is hard!
[Many images in varied poses, lighting, and backgrounds — all labeled “Motorcycle.”]
What do we want computers to do with our data?
Machine learning performs well on many of these problems, but applying it is a lot of work. What is it about machine learning that makes it so hard to use?
Machine learning for image classification
“Motorcycle”
This talk: Develop ideas using images and audio.
Ideas apply to other problems (e.g., text) too.
Why is this hard?
You see this:
But the camera sees this:
Machine learning and feature representations
Input: raw image → Learning algorithm
[Plot: motorbikes vs. “non”-motorbikes in (pixel 1, pixel 2) space]
What we want
Input: raw image → Feature representation (e.g., does it have handlebars? wheels?) → Learning algorithm
[Plot: motorbikes vs. “non”-motorbikes, now in (Handlebars, Wheels) feature space]
Computing features in computer vision

But… we don’t have a handlebars detector. So researchers try to hand-design features to capture various statistical properties of the image. For example: find edges at four orientations, sum up the edge strength in each quadrant, and concatenate into the final feature vector.
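The hand-designed pipeline just described (edges at four orientations, summed per quadrant) can be sketched as a toy HoG-like feature extractor. The binning and quadrant layout here are illustrative assumptions, not the exact features used in practice:

```python
import numpy as np

def edge_features(img):
    """Toy hand-designed feature: edge strength at four orientations,
    summed over each quadrant of the image (illustration only)."""
    gy, gx = np.gradient(img.astype(float))   # vertical/horizontal gradients
    mag = np.hypot(gx, gy)                    # edge strength
    ang = np.arctan2(gy, gx) % np.pi          # orientation in [0, pi)
    h, w = img.shape
    quadrants = [(slice(0, h // 2), slice(0, w // 2)),
                 (slice(0, h // 2), slice(w // 2, w)),
                 (slice(h // 2, h), slice(0, w // 2)),
                 (slice(h // 2, h), slice(w // 2, w))]
    feats = []
    for qs in quadrants:
        m, a = mag[qs], ang[qs]
        for k in range(4):                    # four orientation bins
            lo, hi = k * np.pi / 4, (k + 1) * np.pi / 4
            feats.append(m[(a >= lo) & (a < hi)].sum())
    return np.array(feats)                    # final 16-d feature vector

f = edge_features(np.eye(8) * 255)            # image with a diagonal edge
```

The point of the later slides is that this kind of feature is exactly what we would rather learn than design by hand.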
Feature representations

Input → Feature Representation → Learning algorithm
How is computer perception done?

Images/video: image → low-level features → grasp point; image → vision features → detection
Audio: audio → audio features → speaker ID
Text: text → text features → text classification, machine translation, information retrieval, ...
Computer vision features
SIFT, HoG, Textons, Spin image, RIFT, GLOH
Audio features
Spectrogram, MFCC, ZCR, Rolloff, Flux
NLP features
Parser features, named-entity recognition, stemming, part of speech, anaphora, ontologies (WordNet)

Coming up with features is difficult, time-consuming, and requires expert knowledge. When working on applications of learning, we spend a lot of time tuning the features.
The “one learning algorithm” hypothesis
Auditory cortex learns to see. [Roe et al., 1992]
The “one learning algorithm” hypothesis
Somatosensory cortex learns to see. [Metin & Frost, 1989]
Sensor representations in the brain
[BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]
Seeing with your tongue; human echolocation (sonar); haptic belt for direction sense; implanting a 3rd eye.
On two approaches to computer perception
The adult visual system computes an incredibly complicated function of
the input.
We can try to directly implement most of this incredibly complicated
function (hand-engineer features).
Can we learn this function instead?
A trained learning algorithm (e.g., neural network, boosting, decision
tree, SVM,…) is very complex. But the learning algorithm itself is
usually very simple. The complexity of the trained algorithm comes
from the data, not the algorithm.
Learning input representations
Find a better way to represent images than pixels.
Learning input representations
Find a better way to represent audio.
Feature learning problem

• Given a 14x14 image patch x, we can represent it using 196 real numbers (e.g., 255, 98, 93, 87, 89, 91, 48, …).
• Problem: Can we learn a better feature vector to represent this?
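The raw 196-number representation can be sketched as follows (the pixel values are hypothetical):

```python
import numpy as np

# A hypothetical 14x14 grayscale patch with values in [0, 255].
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(14, 14))

# The raw representation x is just the 196 pixel intensities, flattened.
x = patch.flatten()
assert x.shape == (196,)

# The feature-learning question: can we map these 196 numbers to a
# better (sparser, higher-level) feature vector than raw pixels?
```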
Self-taught learning (Unsupervised Feature Learning)

Training: unlabeled images …, plus labeled examples of motorcycles and not-motorcycles.
Testing: What is this?

[This uses unlabeled data. One can learn the features from labeled data too.]
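The two-stage pipeline can be sketched as: learn a feature mapping from unlabeled data only, then train a supervised classifier on the learned features. PCA and least squares stand in here for sparse coding and a real classifier, purely to show the structure; all data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: lots of unlabeled patches, a small labeled set.
unlabeled = rng.standard_normal((500, 196))
X_lab = rng.standard_normal((40, 196))
y_lab = rng.integers(0, 2, size=40)

# Stage 1: learn a feature mapping from UNLABELED data only
# (PCA stands in for sparse coding here, just to show the pipeline).
mu = unlabeled.mean(axis=0)
_, _, Vt = np.linalg.svd(unlabeled - mu, full_matrices=False)

def features(X):
    return (X - mu) @ Vt[:20].T        # top-20 learned directions

# Stage 2: train a supervised classifier on the learned features.
Phi = features(X_lab)
A = np.c_[Phi, np.ones(len(Phi))]      # features plus bias column
w, *_ = np.linalg.lstsq(A, y_lab, rcond=None)
pred = (A @ w) > 0.5
```

The key property is that stage 1 never looks at the labels, so any amount of unlabeled data can be used to improve the representation.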
First stage of visual processing: V1
V1 is the first stage of visual processing in the brain.
Neurons in V1 are typically modeled as edge detectors:

Neuron #1 of visual cortex (model)
Neuron #2 of visual cortex (model)
Feature Learning via Sparse Coding
Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection).

Input: images x(1), x(2), …, x(m) (each in R^{n×n})

Learn: a dictionary of bases f1, f2, …, fk (also in R^{n×n}), so that each input x can be approximately decomposed as

    x ≈ Σ_{j=1}^{k} a_j f_j    s.t. the a_j’s are mostly zero (“sparse”)

[NIPS 2006, 2007]
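The sparse decomposition above can be sketched with ISTA (iterative soft-thresholding), which minimizes ||x − F a||² + λ||a||₁. Note the assumptions: the original work learns the dictionary too, whereas here F is fixed and random, and the indices 36/42/63 simply echo the slides’ example:

```python
import numpy as np

def sparse_code(x, F, lam=0.1, n_iter=200):
    """Find sparse coefficients a minimizing ||x - F a||^2 + lam*||a||_1
    via ISTA. Columns of F are the bases f_j."""
    a = np.zeros(F.shape[1])
    step = 1.0 / np.linalg.norm(F, 2) ** 2     # 1/L, L = Lipschitz const.
    for _ in range(n_iter):
        grad = F.T @ (F @ a - x)               # gradient of the quadratic
        z = a - step * grad
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrink
    return a

rng = np.random.default_rng(0)
F = rng.standard_normal((64, 128))             # 128 candidate bases
F /= np.linalg.norm(F, axis=0)                 # unit-norm columns
x = 0.8 * F[:, 36] + 0.3 * F[:, 42] + 0.5 * F[:, 63]
a = sparse_code(x, F)
print(int(np.sum(np.abs(a) > 1e-3)), "active coefficients out of", a.size)
```

Most entries of a come out exactly zero because of the soft-thresholding step, which is what “sparse” means here.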
Sparse coding illustration
Natural images → learned bases (f1, …, f64): “edges”

Test example: x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63

[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0] (feature representation)

A more succinct, higher-level representation.
More examples
0.6 * f15 + 0.8 * f28 + 0.4 * f37 → represent as [a15 = 0.6, a28 = 0.8, a37 = 0.4]
1.3 * f5 + 0.9 * f18 + 0.3 * f29 → represent as [a5 = 1.3, a18 = 0.9, a29 = 0.3]

• The method “invents” edge detection.
• It automatically learns to represent an image in terms of the edges that appear in it, giving a more succinct, higher-level representation than the raw pixels.
• Quantitatively similar to primary visual cortex (area V1) in brain.
Sparse coding applied to audio
[Evan Smith & Mike Lewicki, 2006]
Image shows 20 basis functions learned from unlabeled audio.
Sparse coding applied to touch data
Collect touch data using a glove, following the distribution of grasps used by animals in the wild.
Grasps used by animals
[Macfarlane & Graziano, 2009]
Example learned representations: Sparse Autoencoder, Sparse RBM, ICA, and K-Means sample bases.

[Histograms: the biological data distribution (number of neurons vs. log(excitatory/inhibitory area)) closely matches the model distribution (number of bases); PDF comparison, p = 0.5872.]

[Andrew Saxe]
Learning feature hierarchies
Input image (pixels)
“Sparse coding”
(edges; cf. V1)
Higher layer
(Combinations of edges;
cf. V2)
[Lee, Ranganath & Ng, 2007]
[Technical details: Sparse autoencoder or sparse version of Hinton’s DBN.]
Learning feature hierarchies
Input image
Model V1
Higher layer
(Model V2?)
Higher layer
(Model V3?)
[Lee, Ranganath & Ng, 2007]
[Technical details: Sparse autoencoder or sparse version of Hinton’s DBN.]
Hierarchical Sparse coding (Sparse DBN): Trained on face images
Feature hierarchy: pixels → edges → object parts (combinations of edges) → object models.

Training set: aligned images of faces.
[Honglak Lee]
Hierarchical Sparse coding (Sparse DBN)

Features learned from training on different object classes: Faces, Cars, Elephants, Chairs.
[Honglak Lee]
Machine learning applications
Video Activity recognition (Hollywood 2 benchmark)
Method Accuracy
Hessian + ESURF [Williems et al 2008] 38%
Harris3D + HOG/HOF [Laptev et al 2003, 2004] 45%
Cuboids + HOG/HOF [Dollar et al 2005, Laptev 2004] 46%
Hessian + HOG/HOF [Laptev 2004, Williems et al 2008] 46%
Dense + HOG / HOF [Laptev 2004] 47%
Cuboids + HOG3D [Klaser 2008, Dollar et al 2005] 46%
Unsupervised feature learning (our method) 52%
Unsupervised feature learning significantly improves
on the previous state-of-the-art.
[Le, Zhou & Ng, 2011]
Sparse coding on audio (speech)
Spectrogram: x ≈ 0.9 * f36 + 0.7 * f42 + 0.2 * f63
Dictionary of bases fi learned for speech
[Honglak Lee]
Many bases seem to correspond to phonemes.
Hierarchical Sparse coding (sparse DBN) for audio
Spectrogram
[Honglak Lee]
Phoneme Classification (TIMIT benchmark)
Method Accuracy
Clarkson and Moreno (1999) 77.6%
Gunawardana et al. (2005) 78.3%
Sung et al. (2007) 78.5%
Petrov et al. (2007) 78.6%
Sha and Saul (2006) 78.9%
Yu et al. (2006) 79.2%
Unsupervised feature learning (our method) 80.3%
Unsupervised feature learning significantly improves
on the previous state-of-the-art.
[Lee et al., 2009]
Stanford feature learning vs. prior art:

Images
• CIFAR object classification: prior art (Ciresan et al., 2011) 80.5%; Stanford feature learning 82.0%
• NORB object classification: prior art (Scherer et al., 2010) 94.4%; Stanford feature learning 95.0%
• Galaxy

Video
• Hollywood2 classification: prior art (Laptev et al., 2004) 48%; Stanford feature learning 53%
• KTH: prior art (Wang et al., 2010) 92.1%; Stanford feature learning 93.9%
• UCF: prior art (Wang et al., 2010) 85.6%; Stanford feature learning 86.5%
• YouTube: prior art (Liu et al., 2009) 71.2%; Stanford feature learning 75.8%

Multimodal (audio/video)
• AVLetters lip reading: prior art (Zhao et al., 2009) 58.9%; Stanford feature learning 65.8%

Text/NLP
• Paraphrase detection: prior art (Das & Smith, 2009) 76.1%; Stanford feature learning 76.4%
• Sentiment (MR/MPQA data): prior art (Nakagawa et al., 2010) 77.3%; Stanford feature learning 77.7%

Other unsupervised feature learning records: pedestrian detection (Yann LeCun), speech recognition (Geoff Hinton), PASCAL VOC object classification (Kai Yu).
Technical challenge: scaling up
Supervised Learning
• Choices of learning algorithm:
– Memory based
– Winnow
– Perceptron
– Naïve Bayes
– SVM
– ….
• What matters the most?
[Banko & Brill, 2001]
[Plot: test accuracy vs. training set size (millions) for several algorithms]

“It’s not who has the best algorithm that wins. It’s who has the most data.”
Scaling and classification accuracy (CIFAR-10)
Large numbers of features are critical. The specific learning algorithm is important, but algorithms that can scale to many features have a big advantage.
[Adam Coates]
Attempts to scale up
Significant effort spent on algorithmic tricks to get algorithms to run faster.
• Efficient sparse coding. [LeCun, Ng, Yu]
• Efficient posterior inference [Bengio, Hinton]
• Convolutional Networks. [Bengio, de Freitas, LeCun, Lee, Ng]
• Tiled Networks. [Hinton, Ng]
• Randomized/fast parameter search. [DiCarlo, Ng]
• Massive data synthesis. [LeCun, Schmidhuber]
• Massive embedding models [Bengio, Collobert, Hinton, Weston]
• Fast decoder algorithms. [LeCun, Lee, Ng, Yu]
• GPU, FPGA and ASIC implementations. [Dean, LeCun, Ng, Olukotun]
[Raina, Madhavan and Ng, 2008]
Scaling up: Discovering object classes
[Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga,
Greg Corrado, Matthieu Devin, Kai Chen, Jeff Dean]
Training procedure
What features can we learn if we train a massive model on a massive amount of data? Can we learn a “grandmother cell”?
• Train on 10 million images (YouTube)
• 1000 machines (16,000 cores) for 1 week.
• 1.15 billion parameters
• Test on novel images
Training set (YouTube) Test set (FITW + ImageNet)
Face neuron
[Le et al., 2012]
Top stimuli from the test set; optimal stimulus by numerical optimization.
Random distractors
Faces
Invariance properties

[Plots: feature response vs. horizontal shift and vertical shift (±15 pixels), 3D rotation angle (up to 90°), and scale factor (up to 1.6x)]
Cat neuron
[Le et al., 2012]
Top stimuli from the test set; optimal stimulus by numerical optimization.
Cat face neuron
Random distractors
Cat faces
Visualization
Top stimuli from the test set; optimal stimulus by numerical optimization.
Pedestrian neuron
Random distractors
Pedestrians
Weaknesses & Criticisms
Weaknesses & Criticisms
• “You’re learning everything. It’s better to encode prior knowledge about the structure of images (or audio, or text).”
A: Wasn’t there a similar machine learning vs. linguists debate in NLP ~20 years ago?

• “Unsupervised feature learning cannot currently do X,” where X is: go beyond Gabor (1-layer) features; work on temporal data (video); learn hierarchical representations (compositional semantics); get state-of-the-art in activity recognition; get state-of-the-art on image classification; get state-of-the-art on object detection; learn variable-size representations.
A: Many of these were true, but are not anymore (they were not fundamental weaknesses). There’s still work to be done though!

• “We don’t understand the learned features.”
A: True. Though many vision/audio/etc. features also suffer from this (e.g., concatenations/combinations of different features).
Conclusion
• Deep Learning and Self-Taught Learning: let’s learn our features rather than manually design them.
• Can we discover the fundamental computational principles that underlie perception?
• Sparse coding and its deep versions are very successful on vision and audio tasks. Other variants exist for learning recursive representations.
• To get this to work for yourself, see the online tutorial: http://deeplearning.stanford.edu/wiki
Advanced topics + Research philosophy
Andrew Ng Stanford University & Google
Learning Recursive Representations
Feature representations of words

Imagine taking each word and computing an n-dimensional feature vector for it.
[Distributional representations, or Bengio et al., 2003; Collobert & Weston, 2008. See also LSA (Landauer & Dumais, 1997) and distributional clustering (Brown et al., 1992; Pereira et al., 1993).]
A 2-d embedding example is shown below, but in practice use ~100-d embeddings.

Each word starts as a one-hot vector (e.g., [0 0 0 0 1 0 0 0]) and is mapped to its embedding:

On → (8, 5)
Monday → (2, 4)
Tuesday → (2.1, 3.3)
Britain → (9, 2)
France → (9.5, 1.5)

“On Monday, Britain ….” is then represented as the sequence (8, 5), (2, 4), (9, 2).
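The lookup can be sketched as a one-hot vector times an embedding matrix; the 2-d values below are the ones from the illustration (real systems use ~100-d vectors learned from data):

```python
import numpy as np

vocab = ["on", "Monday", "Britain", "Tuesday", "France"]
# 2-d embeddings matching the illustration above.
E = np.array([[8.0, 5.0],
              [2.0, 4.0],
              [9.0, 2.0],
              [2.1, 3.3],
              [9.5, 1.5]])

def embed(word):
    # A one-hot vector times the embedding matrix is just a row lookup.
    one_hot = np.eye(len(vocab))[vocab.index(word)]
    return one_hot @ E

# "On Monday, Britain ..." becomes a sequence of feature vectors:
sentence = np.stack([embed(w) for w in ["on", "Monday", "Britain"]])

# Similar words end up nearby: Monday is closer to Tuesday than to France.
d = np.linalg.norm
assert d(embed("Monday") - embed("Tuesday")) < d(embed("Monday") - embed("France"))
```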
“Generic” hierarchy on text doesn’t make sense
A node would have to represent the sentence fragment “cat sat on,” which doesn’t make sense.

The cat sat on the mat.
(Each word carries its feature vector: The → (9,1), cat → (5,3), sat → (7,1), on → (8,5), the → (9,1), mat → (4,3).)
What we want (illustration)

The cat sat on the mat. — parsed as S → NP (“The cat”) + VP, with the VP containing a PP and an NP; for example, one node’s job is to represent “on the mat.”
What we want (illustration)

The same parse, now with a vector at every node: e.g., “The cat” → (5,2), “the mat” → (3,3), “on the mat” → (8,3), the VP → (5,4), and the sentence node → (7,3).
What we want (illustration)

[Plot: in the same 2-d embedding space, whole phrases get vectors too. “The day after my birthday, …” maps near Monday and Tuesday; “The country of my birth…” maps near Britain and France.]
Learning recursive representations
The cat sat on the mat.
(Word vectors: on → (8,5), the → (9,1), mat → (4,3). Merging “the” and “mat” gives (3,3); merging that with “on” gives (8,3), whose job is to represent “on the mat.”)
Learning recursive representations
Basic computational unit: Neural Network
that inputs two candidate children’s
representations, and outputs:
• Whether we should merge the two nodes.
• The semantic representation if the two
nodes are merged.
Example: inputs (8,5) and (3,3) → Neural Network → “Yes” (merge), with merged representation (8,3).
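The basic unit can be sketched as a single-layer network over the concatenated children. The weights here are random and the embedding size is 2 purely for illustration; a real system uses ~100-d vectors and trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2                                    # toy embedding size (illustration)

# Hypothetical parameters of the merging unit.
W = rng.standard_normal((n, 2 * n))      # composition weights
b = np.zeros(n)
w_score = rng.standard_normal(n)         # scoring weights

def merge(c1, c2):
    """Input: two children's representations. Output:
    (merged parent representation, scalar 'should we merge?' score)."""
    parent = np.tanh(W @ np.concatenate([c1, c2]) + b)
    score = float(w_score @ parent)
    return parent, score

# E.g., combining "on" (8,5) with "the mat" (3,3):
parent, score = merge(np.array([8.0, 5.0]), np.array([3.0, 3.0]))
```

A high score plays the role of the “Yes” decision; the parent vector is the candidate semantic representation of the merged phrase.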
![Page 79: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/79.jpg)
Parsing a sentence

[Diagram: the words of "The cat sat on the mat." as leaf vectors. The neural network scores every adjacent pair of nodes: most pairs get "No"; "The cat" and "the mat" get "Yes" and are merged into parent nodes.]
![Page 80: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/80.jpg)
Parsing a sentence

[Diagram: parsing continues. With "The cat" and "the mat" merged, the network scores the remaining adjacent pairs; "on" + "the mat" gets "Yes" and is merged, while the other pairs get "No".]
![Page 81: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/81.jpg)
Parsing a sentence

[Diagram: the completed parse tree for "The cat sat on the mat.", with a feature vector at every node, including the root.]
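The parsing procedure in these slides — score every adjacent pair with the network, merge the highest-scoring pair, and repeat until one node remains — can be sketched greedily as follows (the parameters here are untrained random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
W = rng.normal(scale=0.1, size=(d, 2 * d))
u = rng.normal(scale=0.1, size=d)

def merge(c1, c2):
    """Merge two child vectors; return the parent vector and a merge score."""
    parent = np.tanh(W @ np.concatenate([c1, c2]))
    return parent, float(u @ parent)

def greedy_parse(vectors):
    """Repeatedly merge the adjacent pair with the highest score."""
    nodes = list(vectors)
    while len(nodes) > 1:
        scored = [merge(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        best = max(range(len(scored)), key=lambda j: scored[j][1])
        nodes[best:best + 2] = [scored[best][0]]   # replace the pair with its parent
    return nodes[0]                                # root vector for the whole sentence

words = [rng.normal(size=d) for _ in range(6)]     # e.g. "The cat sat on the mat"
root = greedy_parse(words)
print(root.shape)   # (50,)
```

The actual system is trained so that high-scoring merges reproduce correct parse trees; this sketch only shows the control flow.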
![Page 82: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/82.jpg)
Finding Similar Sentences

• Each sentence has a feature vector representation.
• Pick a sentence ("center sentence") and list its nearest neighbor sentences.
• Neighbors are often either semantically or syntactically similar. (Digits are all mapped to 2.)

| Similarities | Center sentence | Nearest neighbor sentences (most similar feature vector) |
| --- | --- | --- |
| Bad news | Both took further hits yesterday | 1. We're in for a lot of turbulence ... 2. BSN currently has 2.2 million common shares outstanding 3. This is panic buying 4. We have a couple or three tough weeks coming |
| Something said | I had calls all night long from the States, he said | 1. Our intent is to promote the best alternative, he says 2. We have sufficient cash flow to handle that, he said 3. Currently, average pay for machinists is 22.22 an hour, Boeing said 4. Profit from trading for its own account dropped, the securities firm said |
| Gains and good news | Fujisawa gained 22 to 2,222 | 1. Mochida advanced 22 to 2,222 2. Commerzbank gained 2 to 222.2 3. Paris loved her at first sight 4. Profits improved across Hess's businesses |
| Unknown words which are cities | Columbia, S.C. | 1. Greenville, Miss. 2. UNK, Md. 3. UNK, Miss. 4. UNK, Calif. |
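One simple way such a neighbor list can be computed, assuming each sentence has already been mapped to a feature vector (cosine similarity is an assumption on my part; the slides do not name the distance metric):

```python
import numpy as np

def nearest_neighbors(center, vectors, k=4):
    """Rank rows of `vectors` by cosine similarity to `center`."""
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    c = center / np.linalg.norm(center)
    sims = V @ c
    return np.argsort(-sims)[:k]            # indices of the k most similar rows

rng = np.random.default_rng(0)
sent_vecs = rng.normal(size=(100, 50))      # one feature vector per sentence
idx = nearest_neighbors(sent_vecs[0], sent_vecs)
print(idx[0])    # the center sentence is its own nearest neighbor: 0
```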
![Page 83: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/83.jpg)
Finding Similar Sentences

| Similarities | Center sentence | Nearest neighbor sentences (most similar feature vector) |
| --- | --- | --- |
| Declining to comment = not disclosing | Hess declined to comment | 1. PaineWebber declined to comment 2. Phoenix declined to comment 3. Campeau declined to comment 4. Coastal wouldn't disclose the terms |
| Large changes in sales or revenue | Sales grew almost 2 % to 222.2 million from 222.2 million | 1. Sales surged 22 % to 222.22 billion yen from 222.22 billion 2. Revenue fell 2 % to 2.22 billion from 2.22 billion 3. Sales rose more than 2 % to 22.2 million from 22.2 million 4. Volume was 222.2 million shares, more than triple recent levels |
| Negation of different types | There's nothing unusual about business groups pushing for more government spending | 1. We don't think at this point anything needs to be said 2. It therefore makes no sense for each market to adopt different circuit breakers 3. You can't say the same with black and white 4. I don't think anyone left the place UNK UNK |
| People in bad situations | We were lucky | 1. It was chaotic 2. We were wrong 3. People had died 4. They still are |
![Page 84: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/84.jpg)
Application: Paraphrase Detection

• Task: decide whether or not two sentences are paraphrases of each other (MSR Paraphrase Corpus).

| Method | F1 |
| --- | --- |
| Baseline | 57.8 |
| Tf-idf + cosine similarity (from Mihalcea, 2006) | 75.3 |
| Kozareva and Montoyo (2006) (lexical and semantic features) | 79.6 |
| RNN-based Model (our work) | 79.7 |
| Mihalcea et al. (2006) (word similarity measures: WordNet, dictionaries, etc.) | 81.3 |
| Fernando & Stevenson (2008) (WordNet-based features) | 82.4 |
| Wan et al. (2006) (many features: POS, parsing, BLEU, etc.) | 83.0 |

| Method | F1 |
| --- | --- |
| Baseline | 79.9 |
| Rus et al. (2008) | 80.5 |
| Mihalcea et al. (2006) | 81.3 |
| Islam et al. (2007) | 81.3 |
| Qiu et al. (2006) | 81.6 |
| Fernando & Stevenson (2008) (WordNet-based features) | 82.4 |
| Das et al. (2009) | 82.7 |
| Wan et al. (2006) (many features: POS, parsing, BLEU, etc.) | 83.0 |
| Stanford Feature Learning | 83.4 |
![Page 85: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/85.jpg)
Parsing sentences and parsing images
A small crowd
quietly enters the
historic church.
Each node in the hierarchy has a “feature vector” representation.
![Page 86: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/86.jpg)
Nearest neighbor examples for image patches
• Each node (e.g., set of merged superpixels) in the hierarchy has a feature vector.
• Select a node (“center patch”) and list nearest neighbor nodes.
• I.e., what image patches/superpixels get mapped to similar features?
[Figure: each selected patch shown alongside its nearest neighbor patches.]
![Page 87: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/87.jpg)
Multi-class segmentation (Stanford background dataset)
| Method | Accuracy |
| --- | --- |
| Pixel CRF (Gould et al., ICCV 2009) | 74.3 |
| Classifier on superpixel features | 75.9 |
| Region-based energy (Gould et al., ICCV 2009) | 76.4 |
| Local labelling (Tighe & Lazebnik, ECCV 2010) | 76.9 |
| Superpixel MRF (Tighe & Lazebnik, ECCV 2010) | 77.5 |
| Simultaneous MRF (Tighe & Lazebnik, ECCV 2010) | 77.5 |
| Stanford Feature Learning (our method) | 78.1 |
![Page 88: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/88.jpg)
Multi-class Segmentation (MSRC dataset, 21 classes)

| Method | Accuracy |
| --- | --- |
| TextonBoost (Shotton et al., ECCV 2006) | 72.2 |
| Framework over mean-shift patches (Yang et al., CVPR 2007) | 75.1 |
| Pixel CRF (Gould et al., ICCV 2009) | 75.3 |
| Region-based energy (Gould et al., IJCV 2008) | 76.5 |
| Stanford Feature Learning (our method) | 76.7 |
![Page 89: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/89.jpg)
Analysis of feature learning algorithms

Adam Coates, Honglak Lee
![Page 90: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/90.jpg)
Supervised Learning
• Choices of learning algorithm:
– Memory based
– Winnow
– Perceptron
– Naïve Bayes
– SVM
– ….
• What matters the most?
[Banko & Brill, 2001]
[Plot: accuracy vs. training set size for each learning algorithm.]
“It’s not who has the best algorithm that wins.
It’s who has the most data.”
![Page 91: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/91.jpg)
Unsupervised Feature Learning
• Many choices in feature learning algorithms:
– Sparse coding, RBM, autoencoder, etc.
– Pre-processing steps (whitening)
– Number of features learned
– Various hyperparameters.
• What matters the most?
![Page 92: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/92.jpg)
Unsupervised feature learning
Most algorithms learn Gabor-like edge detectors.
Sparse auto-encoder
![Page 93: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/93.jpg)
Unsupervised feature learning

Weights learned with and without whitening.

[Figure: learned filters for the sparse auto-encoder, sparse RBM, K-means, and Gaussian mixture model, each shown with and without whitening.]
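Whitening is the pre-processing step these comparisons vary. A minimal ZCA-whitening sketch (the smoothing constant `eps` and the function name are illustrative choices, not values from the talk):

```python
import numpy as np

def zca_whiten(X, eps=1e-2):
    """ZCA-whiten rows of X: decorrelate features and equalize their variance."""
    X = X - X.mean(axis=0)                          # zero-mean each feature
    cov = X.T @ X / X.shape[0]
    vals, vecs = np.linalg.eigh(cov)                # eigendecomposition of covariance
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T   # ZCA transform
    return X @ W

rng = np.random.default_rng(0)
patches = rng.normal(size=(1000, 64))               # e.g. flattened 8x8 image patches
white = zca_whiten(patches)
cov = white.T @ white / white.shape[0]
print(np.allclose(cov, np.eye(64), atol=0.1))       # covariance ≈ identity
```

Unlike PCA whitening, the ZCA transform keeps the whitened patches in the original pixel basis, which is why the learned filters remain interpretable as image patches.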
![Page 94: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/94.jpg)
Scaling and classification accuracy (CIFAR-10)
![Page 95: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/95.jpg)
Results on CIFAR-10 and NORB (old result)

• K-means achieves state of the art.
• Scalable, fast, and almost parameter-free, K-means does surprisingly well.

| NORB | Test accuracy (error) |
| --- | --- |
| Convolutional Neural Networks | 93.4% (6.6%) |
| Deep Boltzmann Machines | 92.8% (7.2%) |
| Deep Belief Networks | 95.0% (5.0%) |
| Jarrett et al., 2009 | 94.4% (5.6%) |
| Sparse auto-encoder | 96.9% (3.1%) |
| Sparse RBM | 96.2% (3.8%) |
| K-means (Hard) | 96.9% (3.1%) |
| K-means (Triangle) | 97.0% (3.0%) |

| CIFAR-10 | Test accuracy |
| --- | --- |
| Raw pixels | 37.3% |
| RBM with back-propagation | 64.8% |
| 3-Way Factored RBM (3 layers) | 65.3% |
| Mean-covariance RBM (3 layers) | 71.0% |
| Improved Local Coordinate Coding | 74.5% |
| Convolutional RBM | 78.9% |
| Sparse auto-encoder | 73.4% |
| Sparse RBM | 72.4% |
| K-means (Hard) | 68.6% |
| K-means (Triangle, 1600 features) | 77.9% |
| K-means (Triangle, 4000 features) | 79.6% |
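The "triangle" rows above refer to a soft K-means encoding (as described in Coates et al.'s analysis): with z_k the distance from a patch to centroid k and μ(z) the mean distance, the feature is f_k = max(0, μ(z) − z_k). A sketch (random data and centroids; the function name is illustrative):

```python
import numpy as np

def triangle_encode(X, centroids):
    """'Triangle' K-means features: f_k = max(0, mean(z) - z_k), where z_k is
    the distance from a sample to centroid k. Features for centroids farther
    than average are exactly zero, giving a sparse, soft assignment."""
    # pairwise distances between rows of X and the centroids
    z = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    mu = z.mean(axis=1, keepdims=True)
    return np.maximum(0.0, mu - z)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 64))          # e.g. whitened 8x8 patches
C = rng.normal(size=(16, 64))          # 16 centroids (would come from K-means)
F = triangle_encode(X, C)
print(F.shape)                         # (10, 16)
```

"Hard" assignment is the limiting case where only the single closest centroid fires; the triangle variant's softer, sparse code is what closes the gap with the learned-feature methods in the table.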
![Page 96: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/96.jpg)
Tiled Convolutional Neural Networks

Quoc Le, Jiquan Ngiam
![Page 97: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/97.jpg)
Learning Invariances

• We want to learn invariant features.
• Convolutional networks use weight tying to:
  – Reduce the number of weights that need to be learned, allowing scaling to larger images/models.
  – Hard-code translation invariance, which makes it harder to learn other, more complex types of invariance.
• Goal: preserve the computational scaling advantage of convolutional nets, but learn more complex invariances.
![Page 98: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/98.jpg)
Fully Connected Topographic ICA

[Diagram: input → simple units (square) → pooling units (sqrt), fully connected. Doesn't scale to large images.]
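The square-then-square-root wiring of TICA can be sketched as follows (random filters, and contiguous pooling groups of 4, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_simple, pool = 64, 32, 4        # input size; 32 simple units in groups of 4

W = rng.normal(scale=0.1, size=(n_simple, n_in))   # simple-unit filters

def tica_forward(x):
    """Simple units square their filter responses; each pooling unit takes the
    square root of the sum over its group of simple units."""
    simple = (W @ x) ** 2                           # square nonlinearity
    pooled = np.sqrt(simple.reshape(-1, pool).sum(axis=1))
    return pooled

x = rng.normal(size=n_in)
p = tica_forward(x)
print(p.shape)    # (8,)
```

Because each pooling unit sums the energy of a group of filters, its output is invariant to sign and to phase-like changes within the group; that is the invariance the later tiled networks aim to keep while scaling up.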
![Page 99: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/99.jpg)
Fully Connected Topographic ICA

[Diagram: input → simple units (square) → pooling units (sqrt), with the first-layer weights orthogonalized. Doesn't scale to large images.]
![Page 100: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/100.jpg)
Local Receptive Fields

[Diagram: input → simple units (square) → pooling units (sqrt); each simple unit now sees only a local patch of the input.]
![Page 101: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/101.jpg)
Convolutional Neural Networks (Weight Tying)

[Diagram: local receptive fields with all simple units sharing the same weights (full weight tying).]
![Page 102: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/102.jpg)
Tiled Networks (Partial Weight Tying)

[Diagram: local receptive fields with tile size k = 2 — units two steps apart share weights.]

Local pooling can capture complex invariances (not just translation), but the total number of parameters is small.
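A minimal 1-D sketch of partial weight tying, assuming k weight banks that repeat every k units (the names `banks` and `tiled_conv` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, rf, k = 16, 4, 2            # input size, receptive field size, tile size

# k separate weight banks; units i and i+k share bank i % k.
# A plain convolution is the special case k = 1 (one bank everywhere).
banks = rng.normal(scale=0.1, size=(k, rf))

def tiled_conv(x):
    out = []
    for i in range(len(x) - rf + 1):
        w = banks[i % k]                      # weights repeat every k units
        out.append(w @ x[i:i + rf])
    return np.array(out)

x = rng.normal(size=n_in)
y = tiled_conv(x)
print(y.shape)   # (13,)
```

With k banks the parameter count grows only by a factor of k over a convolutional net, while pooling over units with different (untied) weights lets the network learn invariances beyond translation.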
![Page 103: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/103.jpg)
Tiled Networks (Partial Weight Tying)
![Page 104: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/104.jpg)
Tiled Networks (Partial Weight Tying)

[Diagram: tile size k = 2 with number of maps l = 3 — three separate sets of tiled feature maps over the same input.]
![Page 105: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/105.jpg)
Tiled Networks (Partial Weight Tying)

[Diagram: tile size k = 2, number of maps l = 3, with local orthogonalization of the weights.]
![Page 106: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/106.jpg)
NORB and CIFAR-10 results

| Algorithm | NORB accuracy |
| --- | --- |
| Deep Tiled CNNs [this work] | 96.1% |
| CNNs [Huang & LeCun, 2006] | 94.1% |
| 3D Deep Belief Networks [Nair & Hinton, 2009] | 93.5% |
| Deep Boltzmann Machines [Salakhutdinov & Hinton, 2009] | 92.8% |
| TICA [Hyvarinen et al., 2001] | 89.6% |
| SVMs | 88.4% |

| Algorithm | CIFAR-10 accuracy |
| --- | --- |
| Improved LCC [Yu et al., 2010] | 74.5% |
| Deep Tiled CNNs [this work] | 73.1% |
| LCC [Yu et al., 2010] | 72.3% |
| mcRBMs [Ranzato & Hinton, 2010] | 71.0% |
| Best of all RBMs [Krizhevsky, 2009] | 64.8% |
| TICA [Hyvarinen et al., 2001] | 56.1% |
![Page 107: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/107.jpg)
Summary/Big ideas
![Page 108: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/108.jpg)
Summary/Big ideas

• Large-scale brain simulations as a revisiting of the big "AI dream."
• "Deep learning" has had two big ideas:
  – Learning multiple layers of representation
  – Learning features from unlabeled data
• Has worked well so far in two regimes (confusing to outsiders):
  – Lots of labeled data: "train the heck out of the network."
  – Unsupervised feature learning/self-taught learning
• Scalability is important.
• Detailed tutorial: http://deeplearning.stanford.edu/wiki
![Page 109: Machine Learning and AI via Brain simulations](https://reader030.vdocuments.us/reader030/viewer/2022020703/61fb41972e268c58cd5c03b9/html5/thumbnails/109.jpg)