Self-taught Learning: Transfer Learning from Unlabeled Data
TRANSCRIPT
[Page 1]
Rajat Raina
Honglak Lee, Roger Grosse, Alexis Battle, Chaitanya Ekanadham, Helen Kwong, Benjamin Packer, Narut Sereewattanawoot
Andrew Y. Ng
Stanford University

Self-taught Learning: Transfer Learning from Unlabeled Data
[Page 2]
The “one learning algorithm” hypothesis

There is some evidence that the human brain uses essentially the same algorithm to understand many different input modalities.
– Example: Ferret experiments, in which the “input” for vision was plugged into the auditory part of the brain, and the auditory cortex learns to “see.” [Roe et al., 1992]

Self-taught Learning
(Roe et al., 1992. Hawkins & Blakeslee, 2004)
[Page 3]
The “one learning algorithm” hypothesis: If we could find this one learning algorithm, we would be done. (Finally!)
This talk

If the brain really is one learning algorithm, it would suffice to just:
– Find a learning algorithm for a single layer, and
– Show that it can build a small number of layers.

We evaluate our algorithms:
– Against biology, e.g., sparse RBMs for V2: poster yesterday (Lee et al.)
– On applications.

Finding a deep learning algorithm
[Page 5]
Supervised learning

(Figure: labeled Cars vs. Motorcycles images, split into Train and Test)

Supervised learning algorithms may not work well with limited labeled data.
[Page 6]
Learning in humans

Your brain has 10^14 synapses (connections). You will live for 10^9 seconds. If each synapse requires 1 bit to parameterize, you need to “learn” 10^14 bits in 10^9 seconds, or 10^5 bits per second.

Human learning is largely unsupervised, and uses readily available unlabeled data.

(Geoffrey Hinton, personal communication)
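The estimate above is simple enough to check directly (assuming, as the slide does, 1 bit per synapse):

```python
synapses = 10**14        # ~number of synapses (connections) in the brain
lifetime_s = 10**9       # ~lifetime in seconds (roughly 30 years)
bits_per_second = synapses // lifetime_s
print(bits_per_second)   # 100000, i.e. 10^5 bits per second
```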
[Page 8]
“Brain-like” Learning

Cars vs. Motorcycles: Train / Test
Unlabeled images (randomly downloaded from the Internet)
[Page 9]
“Brain-like” Learning

– Labeled digits + unlabeled English characters → ?
– Labeled webpages + unlabeled newspaper articles → ?
– Labeled Russian speech + unlabeled English speech → ?
[Page 10]
“Self-taught Learning” (same examples as the previous slide)
[Page 11]
Recent history of machine learning

– 20 years ago: Supervised learning
– 10 years ago: Semi-supervised learning
– 10 years ago: Transfer learning
– Next: Self-taught learning?

(Figure: example image classes for each setting: Cars, Motorcycles; Bus, Tractor, Aircraft, Helicopter; natural scenes)
[Page 12]
Self-taught Learning

Labeled examples: $\{(x_l^{(i)}, y^{(i)})\}_{i=1}^{m}$, with $x_l^{(i)} \in \mathbb{R}^n$ and $y^{(i)} \in \{1, \ldots, T\}$.
Unlabeled examples: $\{x_u^{(i)}\}_{i=1}^{k}$, with $x_u^{(i)} \in \mathbb{R}^n$ and $k \gg m$.

The unlabeled and labeled data:
– Need not share labels y.
– Need not share a generative distribution.

Advantage: Such unlabeled data is often easy to obtain.
[Page 13]
A self-taught learning algorithm

Overview: Represent each labeled or unlabeled input as a sparse linear combination of “basis vectors” $\{b_j\}_{j=1}^{s}$:

$x = \sum_j a_j b_j$, with $b_j \in \mathbb{R}^n$ and $a_j \in \mathbb{R}$.

Example: x = 0.8 * b87 + 0.3 * b376 + 0.5 * b411
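In code, the sparse-combination view above is just a matrix of basis columns times a mostly-zero coefficient vector. A minimal numpy sketch (the dimensions, seed, and basis indices are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 16, 512                       # input dimension, number of bases (illustrative)
B = rng.standard_normal((n, s))      # columns are basis vectors b_j
B /= np.linalg.norm(B, axis=0)       # unit-norm bases

a = np.zeros(s)                      # sparse activation vector
a[[87, 376, 411]] = [0.8, 0.3, 0.5]  # only a few nonzero coefficients
x = B @ a                            # x = 0.8*b_87 + 0.3*b_376 + 0.5*b_411
```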
[Page 14]
A self-taught learning algorithm

Key steps:
1. Learn good bases $b_j$ using the unlabeled data $x_u^{(i)}$.
2. Use these learnt bases to construct “higher-level” features $x = \sum_j a_j b_j$ for the labeled data.
3. Apply a standard supervised learning algorithm on these features.

Example: x = 0.8 * b87 + 0.3 * b376 + 0.5 * b411
[Page 15]
Learning the bases: Sparse coding

Given only unlabeled data $x_u^{(i)}$, we find good bases b using sparse coding:

$\min_{b, a} \ \sum_i \Big\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \Big\|^2 + \beta \sum_i \big\| a^{(i)} \big\|_1$

(The first term is the reconstruction error; the second is the sparsity penalty.)

[Details: An extra normalization constraint on $\|b_j\|^2$ is required.]

(Efficient algorithms: Lee et al., NIPS 2006)
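As a concrete (if naive) illustration, this objective can be minimized by alternating between the activations and the bases. The numpy sketch below uses plain ISTA for the a-step and a projected gradient step for the b-step; it stands in for the much faster algorithms of Lee et al., and all names and hyperparameters are illustrative:

```python
import numpy as np

def sparse_coding(X, s=32, beta=0.1, outer_iters=20, lr=0.01, seed=0):
    """Naive alternating minimization of the sparse-coding objective
        min_{b,a}  sum_i ||x_u^(i) - sum_j a_j^(i) b_j||^2 + beta * sum_i ||a^(i)||_1
    with ||b_j|| <= 1 (up to a constant rescaling of beta).
    X holds one unlabeled example per column."""
    n, m = X.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, s))
    B /= np.linalg.norm(B, axis=0)             # satisfy the norm constraint
    A = np.zeros((s, m))
    for _ in range(outer_iters):
        # a-step: ISTA on the L1-penalized least squares (bases fixed)
        L = np.linalg.norm(B.T @ B, 2) + 1e-8  # Lipschitz constant of the gradient
        for _ in range(10):
            A -= (B.T @ (B @ A - X)) / L       # gradient step on reconstruction error
            A = np.sign(A) * np.maximum(np.abs(A) - beta / L, 0.0)  # soft threshold
        # b-step: gradient step on reconstruction error, project back to ||b_j|| <= 1
        B -= lr * (B @ A - X) @ A.T
        B /= np.maximum(np.linalg.norm(B, axis=0), 1.0)
    return B, A
```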
[Page 16]
Example bases

Natural images. Learnt bases: “Edges”
Handwritten characters. Learnt bases: “Strokes”
[Page 17]
Constructing features

Using the learnt bases b, compute features for the examples $x_l$ from the classification task by solving:

Features of $x_l$ = $\arg\min_{a} \ \Big\| x_l - \sum_j a_j b_j \Big\|^2 + \beta \, \| a \|_1$

(reconstruction error plus sparsity penalty, as before)

Example: xl = 0.8 * b87 + 0.3 * b376 + 0.5 * b411

Finally, learn a classifier using a standard supervised learning algorithm (e.g., SVM) over these features.
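With the bases fixed, computing the feature vector for one example is an L1-regularized least-squares (lasso) problem. A minimal ISTA sketch in numpy, again standing in for the efficient solvers actually used (function name and hyperparameters are illustrative):

```python
import numpy as np

def sparse_features(x, B, beta=0.1, iters=100):
    """Features of x under fixed bases B (one basis vector b_j per column):
    approximately solves argmin_a 0.5*||x - B a||^2 + beta*||a||_1 via ISTA."""
    a = np.zeros(B.shape[1])
    L = np.linalg.norm(B.T @ B, 2) + 1e-8   # Lipschitz constant of the gradient
    for _ in range(iters):
        a -= (B.T @ (B @ a - x)) / L        # gradient step on reconstruction error
        a = np.sign(a) * np.maximum(np.abs(a) - beta / L, 0.0)  # soft threshold
    return a
```

The resulting sparse vectors a then replace (or augment) the raw inputs as features for an off-the-shelf classifier such as an SVM.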
[Page 18]
Image classification

Large image (Platypus from Caltech101 dataset)
Feature visualization
[Page 22]
Image classification

Caltech101 (15 labeled images per class), classification accuracy:
– Baseline: 16%
– PCA: 37%
– Sparse coding: 47% (36.0% error reduction)

Other reported results: Fei-Fei et al., 2004: 16%; Berg et al., 2005: 17%; Holub et al., 2005: 40%; Serre et al., 2005: 35%; Berg et al., 2005: 48%; Zhang et al., 2006: 59%; Lazebnik et al., 2006: 56%
[Page 23]
Character recognition

(Domains: Digits, Handwritten English, English font)

Handwritten English classification (20 labeled images per handwritten character), bases learnt on digits:
– Raw: 54.8%
– PCA: 54.8%
– Sparse coding: 58.5% (8.2% error reduction)

English font classification (20 labeled images per font character), bases learnt on handwritten English:
– Raw: 17.9%
– PCA: 14.5%
– Sparse coding: 16.6%
– Sparse coding + Raw: 20.2% (2.8% error reduction)
[Page 24]
Text classification

(Domains: Reuters newswire, Webpages, UseNet articles)

Webpage classification (2 labeled documents per class), bases learnt on Reuters newswire:
– Raw words: 62.8%
– PCA: 63.3%
– Sparse coding: 64.3% (4.0% error reduction)

UseNet classification (2 labeled documents per class), bases learnt on Reuters newswire:
– Raw words: 61.3%
– PCA: 60.7%
– Sparse coding: 63.8% (6.5% error reduction)
[Page 25]
Shift-invariant sparse coding

(Figure: reconstruction of a signal from sparse features and basis functions)

(Algorithms: Grosse et al., UAI 2007)
[Page 26]
Audio classification

Speaker identification (5 labels, TIMIT corpus, 1 sentence per speaker), bases learnt on different dialects:
– Spectrogram: 38.5%
– MFCCs: 43.8%
– Sparse coding: 48.7% (8.7% error reduction)

Musical genre classification (5 labels, 18 seconds per genre), bases learnt on different genres and songs:
– Spectrogram: 48.4%
– MFCCs: 54.0%
– Music-specific model: 49.3%
– Sparse coding: 56.6% (5.7% error reduction)

(Details: Grosse et al., UAI 2007)
[Page 27]
Sparse deep belief networks

Sparse RBM (new):
– v: visible layer
– h: hidden layer
– W, b, c: parameters

(Details: Lee et al., NIPS 2007. Poster yesterday.)
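For flavor, here is a hedged numpy sketch of one training step of such a sparse RBM: standard contrastive divergence (CD-1) plus a crude penalty nudging the mean hidden activation toward a small target. This is only a schematic of the idea; the exact regularizer is in Lee et al., NIPS 2007, and all names and hyperparameters here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_rbm_step(V, W, b, c, lr=0.01, target=0.02, sparsity_cost=0.1, rng=None):
    """One CD-1 update for a binary RBM, with an extra sparsity term that
    pushes the mean hidden activation toward `target` (sketch only)."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Positive phase
    H = sigmoid(V @ W + c)                        # P(h=1 | v)
    Hs = (rng.random(H.shape) < H).astype(float)  # sample hidden units
    # Negative phase (one Gibbs step)
    Vn = sigmoid(Hs @ W.T + b)                    # mean-field reconstruction
    Hn = sigmoid(Vn @ W + c)
    # CD-1 gradients
    W += lr * (V.T @ H - Vn.T @ Hn) / len(V)
    b += lr * (V - Vn).mean(axis=0)
    c += lr * (H - Hn).mean(axis=0)
    # Sparsity term: nudge hidden biases toward the target activation
    c -= lr * sparsity_cost * (H.mean(axis=0) - target)
    return W, b, c
```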
[Page 28]
Sparse deep belief networks

Image classification (Caltech101 dataset):
– 1-layer sparse DBN: 44.5%
– 2-layer sparse DBN: 46.6% (3.2% error reduction)

(Details: Lee et al., NIPS 2007. Poster yesterday.)
[Page 29]
Summary

– Self-taught learning: Unlabeled data does not share the labels of the classification task.
– Use unlabeled data to discover features.
– Use sparse coding to construct an easy-to-classify, “higher-level” representation: x = 0.8 * b87 + 0.3 * b376 + 0.5 * b411

(Figure: Cars vs. Motorcycles, plus unlabeled images)
[Page 30]
THE END
[Page 31]
Related Work

– Weston et al., ICML 2006: Make stronger assumptions on the unlabeled data.
– Ando & Zhang, JMLR 2005: For natural language tasks and character recognition, use heuristics to construct a transfer learning task using unlabeled data.