![Page 1: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/1.jpg)
Andrew Ng
Machine Learning and AI via Brain simulations
Andrew Ng, Stanford University
Thanks to:
Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning Jiquan Ngiam Richard Socher Will Zou
Google: Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Andrea Frome, Rajat Monga, Marc’Aurelio Ranzato, Paul Tucker, Kay Le
![Page 2: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/2.jpg)
Coursera
400 | 100,000
![Page 3: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/3.jpg)
Coursera: Courses from Top Universities
![Page 4: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/4.jpg)
• 30 of the top 60 universities worldwide (Academic Ranking of World Universities)
• The #1 or #2 ranked university in 14 countries.
Coursera: Courses from Top Universities
![Page 5: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/5.jpg)
This talk: Deep Learning
Using brain simulations:
- Make learning algorithms much better and easier to use.
- Make revolutionary advances in machine learning and AI.
Vision shared with many researchers:
E.g., Samy Bengio, Yoshua Bengio, Tom Dean, Jeff Dean, Nando de Freitas, Jeff Hawkins, Geoff Hinton, Quoc Le, Yann LeCun, Honglak Lee, Tommy Poggio, Marc’Aurelio Ranzato, Ruslan Salakhutdinov, Yoram Singer, Josh Tenenbaum, Kai Yu, Jason Weston, ….
I believe this is our best shot at progress towards real AI.
![Page 6: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/6.jpg)
What do we want computers to do with our data?
Images/video: Label “Motorcycle”, suggest tags, image search, …
Audio: Speech recognition, music classification, speaker identification, …
Text: Web search, anti-spam, machine translation, …
![Page 7: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/7.jpg)
Computer vision is hard!
[Figure: nine example images, each labeled “Motorcycle”.]
![Page 8: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/8.jpg)
What do we want computers to do with our data?
Images/video: Label “Motorcycle”, suggest tags, image search, …
Audio: Speech recognition, speaker identification, music classification, …
Text: Web search, anti-spam, machine translation, …
Machine learning performs well on many of these problems, but is a lot of work. What is it about machine learning that makes it so hard to use?
![Page 9: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/9.jpg)
Machine learning and feature representations
Learning algorithm
Input
![Page 10: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/10.jpg)
Machine learning and feature representations
Input
Learning algorithm
Feature representation
![Page 11: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/11.jpg)
How is computer perception done?
Images/video: Image → Vision features → Detection
Audio: Audio → Audio features → Speaker ID
Text: Text → Text features → Text classification, machine translation, information retrieval, …
![Page 12: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/12.jpg)
Feature representations
Learning algorithm
Feature Representation
Input
![Page 13: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/13.jpg)
Computer vision features
SIFT Spin image
HoG RIFT
Textons GLOH
![Page 14: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/14.jpg)
Audio features
ZCR
Spectrogram MFCC
Rolloff Flux
![Page 15: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/15.jpg)
NLP features
Parser features, Named entity recognition, Stemming
Part of speech, Anaphora, Ontologies (WordNet)
Coming up with features is difficult, time-consuming, requires experts. “Applied machine learning” is basically feature engineering.
![Page 16: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/16.jpg)
Feature representations
Input
Learning algorithm
Feature Representation
![Page 17: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/17.jpg)
The “one learning algorithm” hypothesis
[Roe et al., 1992]
Auditory cortex learns to see
Auditory Cortex
![Page 18: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/18.jpg)
The “one learning algorithm” hypothesis
[Metin & Frost, 1989]
Somatosensory cortex learns to see
Somatosensory Cortex
![Page 19: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/19.jpg)
Feature learning problem
• Given a 14x14 image patch x, can represent it using 196 real numbers.
• Problem: Can we learn a better feature vector to represent this?
[Figure: the patch shown as a column of raw pixel intensities: 255, 98, 93, 87, 89, 91, 48, …]
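As a minimal sketch of the raw-pixel representation described above, the snippet below flattens a 14x14 patch into a vector of 196 numbers. The patch contents are random placeholder values, and `patch_to_vector` is a hypothetical helper name, not something from the talk.

```python
import numpy as np


def patch_to_vector(patch):
    """Flatten a 2-D grayscale patch into a 1-D raw-pixel feature vector."""
    patch = np.asarray(patch, dtype=np.float64)
    return patch.reshape(-1)  # row-major order: row 0 first, then row 1, ...


# Hypothetical 14x14 patch with random 8-bit intensities (not data from the talk).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(14, 14))

x = patch_to_vector(patch)
print(x.shape)  # (196,)
```

Every learning algorithm downstream sees only this vector, which is why the choice of representation matters so much.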
![Page 20: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/20.jpg)
First stage of visual processing: V1
V1 is the first stage of visual processing in the brain.
Neurons in V1 typically modeled as edge detectors:
Neuron #1 of visual cortex (model)
Neuron #2 of visual cortex (model)
![Page 21: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/21.jpg)
Learning sensor representations
Sparse coding (Olshausen & Field, 1996)
Input: Images $x^{(1)}, x^{(2)}, \dots, x^{(m)}$ (each in $\mathbb{R}^{n \times n}$)
Learn: Dictionary of bases $f_1, f_2, \dots, f_k$ (also in $\mathbb{R}^{n \times n}$), so that each input $x$ can be approximately decomposed as:
$$x \approx \sum_{j=1}^{k} a_j f_j$$
s.t. the $a_j$’s are mostly zero (“sparse”)
Use to represent 14x14 image patch succinctly, as [a7=0.8, a36=0.3, a41 = 0.5]. I.e., this indicates which “basic edges” make up the image.
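The decomposition above can be sketched in code. The snippet infers sparse coefficients for one input given a fixed dictionary, using ISTA (iterative soft-thresholding) on an L1-penalized least-squares objective. This is an illustrative stand-in, not necessarily the procedure used by Olshausen & Field or in the talk: the dictionary here is random rather than learned, and `lam`, `n_iter`, and the function names are assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal step for the L1 penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code(x, F, lam=0.1, n_iter=200):
    """Approximately minimize 0.5*||x - F a||^2 + lam*||a||_1 over a, via ISTA.

    x: flattened input, shape (d,). F: dictionary, shape (d, k), one basis per column.
    """
    lipschitz = np.linalg.norm(F, 2) ** 2  # largest eigenvalue of F^T F
    a = np.zeros(F.shape[1])
    for _ in range(n_iter):
        grad = F.T @ (F @ a - x)  # gradient of the squared-error term
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a


# Hypothetical setup: 64 random unit-norm bases for flattened 14x14 patches.
rng = np.random.default_rng(0)
d, k = 196, 64
F = rng.standard_normal((d, k))
F /= np.linalg.norm(F, axis=0)

# Synthetic input built from three bases (0-based indices 6, 35, 40 = a7, a36, a41).
a_true = np.zeros(k)
a_true[[6, 35, 40]] = [0.8, 0.3, 0.5]
x = F @ a_true

a = sparse_code(x, F)
print(np.argsort(-np.abs(a))[:3])  # indices of the three largest coefficients
```

The L1 penalty shrinks the recovered coefficients somewhat below their true values; that is the price paid for driving most entries to zero. Learning the dictionary itself would alternate a step like this with updates to `F`.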
![Page 22: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/22.jpg)
Sparse coding illustration
Natural images. Learned bases ($f_1, \dots, f_{64}$): “Edges”
[Figure: natural image panels and the 64 learned bases; axis ticks run from 50 to 500.]
$x \approx 0.8 \cdot f_{36} + 0.3 \cdot f_{42} + 0.5 \cdot f_{63}$
$[a_1, \dots, a_{64}] = [0, 0, \dots, 0, 0.8, 0, \dots, 0, 0.3, 0, \dots, 0, 0.5, 0]$ (feature representation)
Test example
More succinct, higher-level, representation.
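To make the feature representation concrete, here is a small sketch that builds the 64-entry coefficient vector from the three non-zero weights on this slide and forms the weighted sum of bases. The bases are random placeholders (the learned edge filters are not available in this transcript), and the variable names are assumptions.

```python
import numpy as np

k = 64
# Non-zero coefficients from the slide, keyed by 1-based basis index.
coeffs = {36: 0.8, 42: 0.3, 63: 0.5}

a = np.zeros(k)
for j, value in coeffs.items():
    a[j - 1] = value  # convert 1-based slide index to 0-based array index

# Hypothetical bases: 64 random 14x14 arrays standing in for learned edges.
rng = np.random.default_rng(0)
F = rng.standard_normal((k, 14, 14))

# x ≈ sum_j a_j f_j, contracting the coefficient axis against the basis axis.
x_approx = np.tensordot(a, F, axes=1)
print(x_approx.shape, np.count_nonzero(a))
```

Only 3 of the 64 entries are non-zero, so the patch is described by three (index, weight) pairs instead of 196 pixel values.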
![Page 23: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/23.jpg)
More examples
$x \approx 0.6 \cdot f_{15} + 0.8 \cdot f_{28} + 0.4 \cdot f_{37}$. Represent as: [a15=0.6, a28=0.8, a37=0.4].
$x \approx 1.3 \cdot f_{5} + 0.9 \cdot f_{18} + 0.3 \cdot f_{29}$. Represent as: [a5=1.3, a18=0.9, a29=0.3].
• Method “invents” edge detection.
• Automatically learns to represent an image in terms of the edges that appear in it. Gives a more succinct, higher-level representation than the raw pixels.
• Quantitatively similar to primary visual cortex (area V1) in brain.
![Page 24: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/24.jpg)
Sparse coding applied to audio
[Evan Smith & Mike Lewicki, 2006]
Image shows 20 basis functions learned from unlabeled audio.
![Page 25: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/25.jpg)
![Page 26: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/26.jpg)
Learning feature hierarchies
Input image (pixels)
“Sparse coding” (edges; cf. V1)
Higher layer (Combinations of edges; cf. V2)
[Lee, Ranganath & Ng, 2007]
[Diagram: input units x1, x2, x3, x4; hidden units a1, a2, a3.]
[Technical details: Sparse autoencoder or sparse version of Hinton’s DBN.]
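For readers unfamiliar with the sparse autoencoder mentioned in the technical note, the following is a minimal sketch of one common formulation: a single hidden layer trained to reconstruct its input, with a KL-divergence penalty pushing the average hidden activation toward a small target. The layer sizes, `rho`, `beta`, and the function names are illustrative assumptions, and this is not claimed to be the exact model of Lee, Ranganath & Ng.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sparse_autoencoder_loss(X, W1, b1, W2, b2, rho=0.05, beta=3.0):
    """Reconstruction error plus a KL sparsity penalty on mean hidden activations.

    X: inputs, shape (m, d). W1: (d, h), b1: (h,), W2: (h, d), b2: (d,).
    rho is the target mean activation; beta weights the sparsity term.
    """
    A = sigmoid(X @ W1 + b1)      # hidden features, shape (m, h)
    X_hat = sigmoid(A @ W2 + b2)  # reconstruction, shape (m, d)
    recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))
    rho_hat = A.mean(axis=0)      # average activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + beta * kl, A


# Hypothetical sizes: 196 inputs (a 14x14 patch), 64 hidden units, 32 examples.
rng = np.random.default_rng(0)
d, h, m = 196, 64, 32
X = rng.random((m, d))
W1 = 0.01 * rng.standard_normal((d, h))
W2 = 0.01 * rng.standard_normal((h, d))
loss, A = sparse_autoencoder_loss(X, W1, np.zeros(h), W2, np.zeros(d))
print(A.shape, loss > 0)
```

Stacking layers means training a second model of the same kind on the hidden features `A`, which is how the "higher layer" in the diagram would be obtained under this formulation.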
![Page 27: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/27.jpg)
Learning feature hierarchies
Input image
Model V1
Higher layer (Model V2?)
Higher layer (Model V3?)
[Lee, Ranganath & Ng, 2007]
[Technical details: Sparse autoencoder or sparse version of Hinton’s DBN.]
[Diagram: input units x1, x2, x3, x4; hidden units a1, a2, a3.]
![Page 28: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/28.jpg)
Hierarchical Sparse coding (Sparse DBN): Trained on face images
pixels
edges
object parts (combination of edges)
object models
[Honglak Lee]
Training set: Aligned images of faces.
![Page 29: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/29.jpg)
Machine learning applications
![Page 30: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/30.jpg)
Unsupervised feature learning (Self-taught learning)
Testing: What is this?
Motorcycles Not motorcycles
Unlabeled images …
[Lee, Raina and Ng, 2006; Raina, Lee, Battle, Packer & Ng, 2007]
![Page 31: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/31.jpg)
Video Activity recognition (Hollywood 2 benchmark)
| Method | Accuracy |
| --- | --- |
| Hessian + ESURF [Willems et al 2008] | 38% |
| Harris3D + HOG/HOF [Laptev et al 2003, 2004] | 45% |
| Cuboids + HOG/HOF [Dollar et al 2005, Laptev 2004] | 46% |
| Hessian + HOG/HOF [Laptev 2004, Willems et al 2008] | 46% |
| Dense + HOG/HOF [Laptev 2004] | 47% |
| Cuboids + HOG3D [Klaser 2008, Dollar et al 2005] | 46% |
| Unsupervised feature learning (our method) | 52% |
Unsupervised feature learning significantly improves on the previous state-of-the-art.
[Le, Zhou & Ng, 2011]
![Page 32: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/32.jpg)
| Modality | Task | Prior art | Prior art accuracy | Stanford feature learning accuracy |
| --- | --- | --- | --- | --- |
| Audio | TIMIT Phone classification | Clarkson et al., 1999 | 79.6% | 80.3% |
| Audio | TIMIT Speaker identification | Reynolds, 1995 | 99.7% | 100.0% |
| Images | CIFAR Object classification | Ciresan et al., 2011 | 80.5% | 82.0% |
| Images | NORB Object classification | Scherer et al., 2010 | 94.4% | 95.0% |
| Multimodal (audio/video) | AVLetters Lip reading | Zhao et al., 2009 | 58.9% | 65.8% |
| Video | Hollywood2 Classification | Laptev et al., 2004 | 48% | 53% |
| Video | KTH | Wang et al., 2010 | 92.1% | 93.9% |
| Video | UCF | Wang et al., 2010 | 85.6% | 86.5% |
| Video | YouTube | Liu et al., 2009 | 71.2% | 75.8% |
| Text/NLP | Paraphrase detection | Das & Smith, 2009 | 76.1% | 76.4% |
| Text/NLP | Sentiment (MR/MPQA data) | Nakagawa et al., 2010 | 77.3% | 77.7% |

Galaxy
![Page 33: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/33.jpg)
How do you build a high-accuracy learning system?
![Page 34: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/34.jpg)
Supervised Learning: Labeled data
• Choices of learning algorithm:
  – Memory based
  – Winnow
  – Perceptron
  – Naïve Bayes
  – SVM
  – ….
• What matters the most?
[Figure: accuracy vs. training set size (millions). Banko & Brill, 2001]
“It’s not who has the best algorithm that wins. It’s who has the most data.”
![Page 35: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/35.jpg)
Unsupervised Learning
Large numbers of features are critical. The specific learning algorithm is important, but ones that can scale to many features also have a big advantage.
[Adam Coates]
![Page 36: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/36.jpg)
![Page 37: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/37.jpg)
Learning from Labeled data
![Page 38: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/38.jpg)
Model
Training Data
![Page 39: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/39.jpg)
Model
Training Data
Machine (Model Partition)
![Page 40: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/40.jpg)
Model
Machine (Model Partition)
Core
Training Data
![Page 41: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/41.jpg)
Model
Training Data
Basic DistBelief Model Training
Parallelize across ~100 machines (~1600 cores). Stochastic gradient descent.
But training is still slow with large data sets.
Add another dimension of parallelism, and have multiple model instances in parallel.
![Page 42: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/42.jpg)
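Asynchronous Distributed Stochastic Gradient Descent
[Diagram: a model replica, fed by data, fetches parameters p from the parameter server and sends back an update ∆p. The server applies p’ = p + ∆p and returns p’; the next update ∆p’ gives p’’ = p’ + ∆p’.]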
![Page 43: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/43.jpg)
Parameter Server
Model Workers
Data Shards
p’ = p + ∆p
∆p p’
Asynchronous Distributed Stochastic Gradient Descent
![Page 44: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/44.jpg)
Asynchronous Distributed Stochastic Gradient Descent
Parameter Server
Slave models
Data Shards
From an engineering standpoint, superior to a single model with the same number of total machines:
• Better robustness to individual slow machines
• Makes forward progress even during evictions/restarts
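The update rule p’ = p + ∆p on these slides can be illustrated with a toy, single-process simulation. The sketch below is not DistBelief: it uses Python threads as stand-ins for model workers, a lock-protected array as the parameter server, and a small noiseless linear regression as the model. All class names, sizes, and the learning rate are assumptions made for illustration.

```python
import threading

import numpy as np


class ParameterServer:
    """Holds the shared parameters and applies incoming updates one at a time."""

    def __init__(self, dim):
        self.p = np.zeros(dim)
        self._lock = threading.Lock()

    def fetch(self):
        with self._lock:
            return self.p.copy()

    def push(self, delta):
        with self._lock:
            self.p += delta  # p' = p + Δp


def worker(server, X, y, lr, steps, seed):
    """One model replica: repeatedly fetch p, compute a gradient on its shard, push Δp."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        p = server.fetch()  # may already be stale when the update is pushed
        i = rng.integers(len(X))
        grad = (X[i] @ p - y[i]) * X[i]  # squared-error gradient for one example
        server.push(-lr * grad)


# Hypothetical problem: recover true_p from noiseless linear data.
rng = np.random.default_rng(0)
true_p = np.array([2.0, -1.0, 0.5])
X = rng.standard_normal((4000, 3))
y = X @ true_p

shards = np.array_split(np.arange(len(X)), 4)  # one data shard per worker
server = ParameterServer(dim=3)
threads = [
    threading.Thread(target=worker, args=(server, X[idx], y[idx], 0.05, 500, seed))
    for seed, idx in enumerate(shards)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(np.round(server.p, 2))
```

No worker waits for any other, so a slow or restarted worker only delays its own updates. That is the robustness property listed in the bullets above, at the cost of occasionally applying gradients computed from stale parameters.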
![Page 45: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/45.jpg)
Acoustic Modeling for Speech Recognition
Async SGD and L-BFGS can both speed up model training.
Reaching the model quality that DistBelief achieved in 4 days took 55 days using a GPU....
DistBelief can support much larger models than a GPU (useful for unsupervised learning).
![Page 46: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/46.jpg)
![Page 47: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/47.jpg)
Speech recognition on Android
![Page 48: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/48.jpg)
Application to Google Streetview
[with Yuval Netzer, Julian Ibarz]
![Page 49: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/49.jpg)
Learning from Unlabeled data
![Page 50: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/50.jpg)
Unsupervised Learning
Large numbers of features are critical. The specific learning algorithm is important, but ones that can scale to many features also have a big advantage.
[Adam Coates]
![Page 51: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/51.jpg)
(training: 50,000 32x32 images)
10 million parameters
![Page 52: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/52.jpg)
(training: 10,000,000 200x200 images)
1 billion parameters
![Page 53: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/53.jpg)
Training procedure
What features can we learn if we train a massive model on a massive amount of data? Can we learn a “grandmother cell”?
• Train on 10 million images (YouTube)
• 1000 machines (16,000 cores) for 1 week.
• Test on novel images
Training set (YouTube) Test set (FITW + ImageNet)
![Page 54: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/54.jpg)
Top stimuli from the test set Optimal stimulus by numerical optimization
The face neuron
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
![Page 55: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/55.jpg)
Cat neuron
Top stimuli from the test set. Average of top stimuli from test set.
![Page 56: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/56.jpg)
ImageNet classification: 22,000 classes
…
smoothhound, smoothhound shark, Mustelus mustelus
American smooth dogfish, Mustelus canis
Florida smoothhound, Mustelus norrisi
whitetip shark, reef whitetip shark, Triaenodon obseus
Atlantic spiny dogfish, Squalus acanthias
Pacific spiny dogfish, Squalus suckleyi
hammerhead, hammerhead shark
smooth hammerhead, Sphyrna zygaena
smalleye hammerhead, Sphyrna tudes
shovelhead, bonnethead, bonnet shark, Sphyrna tiburo
angel shark, angelfish, Squatina squatina, monkfish
electric ray, crampfish, numbfish, torpedo
smalltooth sawfish, Pristis pectinatus
guitarfish
roughtail stingray, Dasyatis centroura
butterfly ray
eagle ray
spotted eagle ray, spotted ray, Aetobatus narinari
cownose ray, cow-nosed ray, Rhinoptera bonasus
manta, manta ray, devilfish
Atlantic manta, Manta birostris
devil ray, Mobula hypostoma
grey skate, gray skate, Raja batis
little skate, Raja erinacea
…
Stingray
Manta ray
![Page 57: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/57.jpg)
Random guess: 0.005%
State-of-the-art (Weston, Bengio ’11): 9.5%
Feature learning from raw pixels: ?
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
![Page 58: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/58.jpg)
Random guess: 0.005%
State-of-the-art (Weston, Bengio ’11): 9.5%
Feature learning from raw pixels: 18.3%
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
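The 0.005% baseline follows from uniform guessing over the class count. As a quick check, assuming the rounded figure of 22,000 classes quoted earlier, the arithmetic looks like this (variable names are illustrative):

```python
num_classes = 22_000  # rounded class count from the slide
chance = 1.0 / num_classes  # accuracy of a uniform random guess
print(f"{chance:.3%}")  # 0.005%

prior_art = 9.5          # percent, Weston & Bengio '11 as cited on the slide
feature_learning = 18.3  # percent, from the slide
ratio = feature_learning / prior_art
print(f"{ratio:.2f}x")  # 1.93x
```

So the reported result is close to double the cited prior accuracy, on a task where guessing succeeds roughly once in 22,000 attempts.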
![Page 59: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/59.jpg)
Scaling up with HPC GPU cluster
GPUs with CUDA: 1 very fast node. Limited memory; hard to scale out.
“Cloud” infrastructure: Many inexpensive nodes. Comm. bottlenecks, node failures.
HPC cluster: GPUs with Infiniband. Difficult to program: lots of MPI and CUDA code.
Network fabric
![Page 60: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/60.jpg)
Stanford GPU cluster
• Current system
  – 64 GPUs in 16 machines.
  – Tightly optimized CUDA for Deep Learning operations.
  – 47x faster than single-GPU implementation.
  – Train 11.2 billion parameter, 9 layer neural network in < 4 days.
[Figure: factor speedup (y-axis ticks 1, 10, 100) vs. # GPUs (1, 4, 9, 16, 36, 64), with curves for models of 680M, 1.9B, 3.0B, 6.9B, and 11.2B parameters.]
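A rough reading of the cluster figures above: dividing the reported speedup by the GPU count gives the parallel efficiency, and dividing the parameter count by the GPU count gives the per-GPU share under the assumption of an even split. That even split is an assumption made here for illustration; the slide does not say how parameters are partitioned.

```python
gpus = 64
machines = 16
speedup = 47.0          # reported speedup over a single-GPU implementation
total_params = 11.2e9   # largest model on the slide

gpus_per_machine = gpus // machines
efficiency = speedup / gpus            # fraction of ideal linear scaling
params_per_gpu = total_params / gpus   # assumes an even split across GPUs

print(gpus_per_machine)                    # 4
print(f"{efficiency:.1%}")                 # 73.4%
print(f"{params_per_gpu / 1e6:.0f}M")      # 175M
```

An efficiency of roughly 73% at 64 GPUs indicates that communication overhead is present but not dominant at this scale.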
![Page 61: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/61.jpg)
Discussion: Engineering vs. Data
![Page 62: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/62.jpg)
Discussion: Engineering vs. Data
Human ingenuity
Data/learning
Contribution to performance
![Page 63: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/63.jpg)
Discussion: Engineering vs. Data
Time
Contribution to performance
Now
![Page 64: Andrew Ng Machine Learning and AI via Brain simulations Andrew Ng Stanford University Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning](https://reader037.vdocuments.us/reader037/viewer/2022103005/56649f465503460f94c67c69/html5/thumbnails/64.jpg)
• Deep Learning: Lets us learn our features.
• Discover the fundamental computational principles that underlie perception.
• Scaling up has been key to achieving good performance.
• Didn’t talk about: Recursive deep learning for NLP.
• Online tutorial on deep learning: http://deeplearning.stanford.edu/wiki
Deep Learning
Adam Coates Quoc Le Honglak Lee Andrew Saxe Andrew Maas Chris Manning Jiquan Ngiam Richard Socher Will Zou
Stanford
Kai Chen Greg Corrado Jeff Dean Matthieu Devin Andrea Frome Rajat Monga Marc’Aurelio Ranzato Paul Tucker Kay Le