The Role of Learning in Vision
3.30pm: Rob Fergus
3.40pm: Andrew Ng
3.50pm: Kai Yu
4.00pm: Yann LeCun
4.10pm: Alan Yuille
4.20pm: Deva Ramanan
4.30pm: Erik Learned-Miller
4.40pm: Erik Sudderth
4.50pm: Spotlights - Qiang Ji, M-H Yang
4.55pm: Discussion
5.30pm: End
[Diagram grouping the talks by theme: Feature / Deep Learning; Compositional Models; Learning Representations; Overview; Low-level Representations; Learning on the fly]
An Overview of Hierarchical Feature Learning and Relations to Other Models
Rob Fergus
Dept. of Computer Science, Courant Institute,
New York University
Motivation
• Multitude of hand-designed features currently in use
  – SIFT, HOG, LBP, MSER, Color-SIFT, …
• Maybe some way of learning the features?
• Also, these features just capture low-level edge gradients
Felzenszwalb, Girshick, McAllester and Ramanan [PAMI 2009]
Yan & Huang (winner of the PASCAL 2010 classification competition)
Beyond Edges?
• Mid-level cues
  – “Tokens” from Vision by D. Marr: continuation, parallelism, junctions, corners
• High-level object parts
• Difficult to hand-engineer. What about learning them?
Deep/Feature Learning Goal
• Build a hierarchy of feature extractors (≥ 1 layers)
  – All the way from pixels to classifier
  – Homogeneous structure per layer
  – Unsupervised training
[Diagram: Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier]
• Numerous approaches:
  – Restricted Boltzmann Machines (Hinton, Ng, Bengio, …)
  – Sparse coding (Yu, Fergus, LeCun)
  – Auto-encoders (LeCun, Bengio)
  – ICA variants (Ng, Cottrell)
  & many more…
Single Layer Architecture
Input: Image Pixels / Features → Filter → Normalize → Pool → Output: Features / Classifier
Details in the boxes matter (especially in a hierarchy)
Links to neuroscience
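A minimal NumPy sketch of one Filter → Normalize → Pool layer. It assumes choices the slide does not specify: valid-mode 2D correlation with a filter bank, a rectifying (ReLU) non-linearity, divisive L2 normalization across feature maps at each location, and non-overlapping max pooling. The helper names (`correlate2d_valid`, `single_layer`) are hypothetical.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation (no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def single_layer(image, filters, pool=2, eps=1e-6):
    """Filter -> normalize -> pool for one layer (hypothetical helper).

    image:   (H, W) array of pixels or a single input feature map
    filters: (K, kh, kw) filter bank
    returns: (K, H', W') pooled feature maps
    """
    # Filter: one response map per filter, followed by a rectifying non-linearity.
    maps = np.stack([np.maximum(correlate2d_valid(image, f), 0.0) for f in filters])
    # Normalize: divide each location by the L2 norm of the responses across
    # all K maps, so the features compete to explain the same input.
    norm = np.sqrt(np.sum(maps ** 2, axis=0, keepdims=True)) + eps
    maps = maps / norm
    # Pool: non-overlapping max pooling over pool x pool spatial blocks.
    k, h, w = maps.shape
    h2, w2 = h // pool, w // pool
    maps = maps[:, :h2 * pool, :w2 * pool]
    return maps.reshape(k, h2, pool, w2, pool).max(axis=(2, 4))

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))
bank = rng.standard_normal((4, 3, 3))
out = single_layer(img, bank)
print(out.shape)  # (4, 7, 7)
```

Stacking layers would feed the output maps back in as input; this sketch takes a single input map, so a real hierarchy would also need filters that span several input channels.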
Example Feature Learning Architectures
Pixels / Features
→ Filter with Dictionary (patch / tiled / convolutional) + Non-linearity
→ Normalization between feature responses: Local Contrast Normalization (Subtractive / Divisive), (Group) Sparsity, Max / Softmax
→ Spatial/Feature pooling (Sum or Max)
→ Features
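Local contrast normalization can be sketched as a subtractive step followed by a divisive step. This version assumes a Gaussian weighting window, reflect padding at the borders, a single feature map, and a divisor floored at its mean; the window size, sigma, and the function names are illustrative assumptions rather than settings from the talk.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=2.0):
    """Normalized 2D Gaussian window (weights sum to 1); size should be odd."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def smooth_same(x, kernel):
    """Same-size weighted local average, with reflect padding at the borders."""
    ph, pw = kernel.shape[0] // 2, kernel.shape[1] // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)), mode="reflect")
    out = np.zeros_like(x, dtype=float)
    for di in range(kernel.shape[0]):
        for dj in range(kernel.shape[1]):
            out += kernel[di, dj] * padded[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def local_contrast_normalize(x, size=9, sigma=2.0, eps=1e-6):
    k = gaussian_kernel(size, sigma)
    # Subtractive step: remove the Gaussian-weighted local mean.
    v = x - smooth_same(x, k)
    # Divisive step: divide by the Gaussian-weighted local standard deviation.
    local_std = np.sqrt(smooth_same(v ** 2, k))
    # Floor the divisor at its mean so low-contrast regions are not blown up.
    divisor = np.maximum(local_std, max(local_std.mean(), eps))
    return v / divisor

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
# The output is unchanged by a global gain and offset of the input.
print(np.allclose(local_contrast_normalize(10 * img + 5), local_contrast_normalize(img)))  # True
```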
SIFT Descriptor
Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
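The SIFT pipeline above can be sketched in simplified form. As an assumption, image gradients binned into 8 orientation channels stand in for the oriented filters, and the Gaussian weighting, trilinear interpolation, clipping-and-renormalization, and keypoint orientation/scale steps of full SIFT are omitted; `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np

def sift_like_descriptor(patch, n_cells=4, n_bins=8, eps=1e-12):
    """Simplified SIFT-style descriptor for a square grayscale patch (sketch)."""
    # "Filter": image gradients, split into n_bins orientation channels.
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    channels = np.zeros((n_bins,) + patch.shape)
    for b in range(n_bins):
        channels[b] = np.where(bins == b, mag, 0.0)
    # Spatial pool (sum) over an n_cells x n_cells grid of cells.
    h, w = patch.shape
    ch, cw = h // n_cells, w // n_cells
    desc = np.empty((n_cells, n_cells, n_bins))
    for i in range(n_cells):
        for j in range(n_cells):
            cell = channels[:, i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
            desc[i, j] = cell.sum(axis=(1, 2))
    desc = desc.ravel()
    # Normalize to unit length.
    return desc / (np.linalg.norm(desc) + eps)

rng = np.random.default_rng(0)
d = sift_like_descriptor(rng.random((16, 16)))
print(d.shape)  # (128,)
```

The final unit-length normalization is what makes the descriptor insensitive to a global change in image contrast.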
Spatial Pyramid Matching
SIFT Features → Filter with Visual Words → Max → Multi-scale spatial pool (Sum) → Classifier
Lazebnik, Schmid, Ponce [CVPR 2006]
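A sketch of the Spatial Pyramid Matching feature under stated assumptions: a codebook of visual words is already available (how it is learned is not shown), feature positions are scaled to [0, 1), and the per-level weighting used in the paper is omitted. Hard assignment to the nearest word plays the role of the Max step; per-cell word histograms on 1×1, 2×2 and 4×4 grids are the multi-scale sum pooling. `spatial_pyramid` is a hypothetical helper.

```python
import numpy as np

def spatial_pyramid(descriptors, positions, codebook, levels=(1, 2, 4)):
    """Spatial-pyramid bag-of-words vector (sketch).

    descriptors: (N, D) local features, e.g. SIFT
    positions:   (N, 2) feature locations (x, y) scaled to [0, 1)
    codebook:    (K, D) visual words, assumed learned beforehand
    levels:      grid sizes per pyramid level (1x1, 2x2, 4x4)
    """
    # "Filter with visual words" + "Max": hard-assign each descriptor
    # to its nearest word (the max over negative squared distances).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    k = codebook.shape[0]
    feats = []
    for g in levels:
        # Multi-scale spatial pool (sum): one word histogram per grid cell.
        cells = np.minimum((positions * g).astype(int), g - 1)
        cell_id = cells[:, 1] * g + cells[:, 0]
        hist = np.zeros((g * g, k))
        np.add.at(hist, (cell_id, words), 1.0)
        feats.append(hist.ravel())
    return np.concatenate(feats)

rng = np.random.default_rng(0)
v = spatial_pyramid(rng.random((50, 128)), rng.random((50, 2)), rng.random((8, 128)))
print(v.shape)  # (168,)
```

With K words the vector has K × (1 + 4 + 16) entries, and each level's histograms sum to the number of descriptors.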
Role of Normalization
• Lots of different mechanisms (max, sparsity, LCN etc.)
• All induce local competition between features to explain the input
  – “Explaining away”
  – Just like top-down models
  – But a more local mechanism
Example: Convolutional Sparse Coding
[Figure: filters, convolution, and an L1 penalty |·|₁ on each feature map]
Zeiler et al. [CVPR’10/ICCV’11], Kavukcuoglu et al. [NIPS’10], Yang et al. [CVPR’10]
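Convolutional sparse coding reconstructs the input as a sum of filters convolved with feature maps, with an L1 penalty on the maps: minimize ½‖y − Σ_k f_k ∗ z_k‖² + λ Σ_k ‖z_k‖₁. The L1 term is what makes the feature maps compete to explain the input. The 1D sketch below keeps the filters fixed and infers the maps with ISTA (iterative shrinkage-thresholding); the 1D signal, the use of ISTA, and the step-size bound are assumptions for illustration, not necessarily what the cited papers use.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |v|_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def conv_sparse_code(y, filters, lam=0.1, n_iter=200):
    """Infer sparse maps z_k so that y ~ sum_k conv(z_k, f_k) (1D ISTA sketch).

    y:       (n,) signal
    filters: (K, m) fixed filter bank
    returns: (K, n - m + 1) feature maps
    """
    k, m = filters.shape
    z = np.zeros((k, y.size - m + 1))
    # Step 1/L, with L = sum_k |f_k|_1^2 an upper bound on the gradient's Lipschitz constant.
    step = 1.0 / np.sum(np.abs(filters).sum(axis=1) ** 2)
    for _ in range(n_iter):
        recon = sum(np.convolve(z[i], filters[i], mode="full") for i in range(k))
        resid = y - recon
        # Gradient of the reconstruction term: -correlate(residual, f_k) for each map.
        grad = -np.stack([np.correlate(resid, filters[i], mode="valid") for i in range(k)])
        z = soft_threshold(z - step * grad, step * lam)
    return z

def objective(y, filters, z, lam):
    recon = sum(np.convolve(z[i], filters[i], mode="full") for i in range(len(filters)))
    return 0.5 * np.sum((y - recon) ** 2) + lam * np.abs(z).sum()

rng = np.random.default_rng(0)
f = rng.standard_normal((3, 5))
z_true = np.zeros((3, 60))
z_true[0, 10] = z_true[1, 30] = z_true[2, 45] = 1.0
y = sum(np.convolve(z_true[i], f[i]) for i in range(3))
z = conv_sparse_code(y, f, lam=0.05)
print(objective(y, f, z, 0.05) < objective(y, f, np.zeros_like(z), 0.05))  # True
```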
Role of Pooling
• Spatial pooling
  – Invariance to small transformations
Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]
• Pooling across feature groups
  – Gives AND/OR type behavior
  – Compositional models of Zhu, Yuille
  – Larger receptive fields
Zeiler, Taylor, Fergus [ICCV 2011]
• Pooling with latent variables (& springs)
  – Pictorial structures models
Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
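A toy illustration of spatial pooling giving invariance to small transformations: a single feature response shifted by one position changes the raw map, but the max-pooled output is unchanged as long as the response stays inside the same pooling region. The pool size and the `max_pool_1d` helper are illustrative assumptions.

```python
import numpy as np

def max_pool_1d(x, size):
    """Non-overlapping 1D max pooling (any trailing remainder is dropped)."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

response = np.zeros(16)
response[4] = 1.0                 # a feature fires at position 4
shifted = np.roll(response, 1)    # the same feature, shifted by one position

print(np.array_equal(response, shifted))                                  # False
print(np.array_equal(max_pool_1d(response, 4), max_pool_1d(shifted, 4)))  # True
```

A larger shift that crosses a pooling boundary, such as `np.roll(response, 4)`, does change the pooled output, so the invariance is only local.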
Object Detection with Discriminatively Trained Part-Based Models
HOG Pyramid → Apply object part filters → Pool part responses (latent variables & springs) → Non-max Suppression (Spatial) → Score
Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
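The part-pooling step can be sketched as a max over part placements of the part filter response minus a spring (deformation) cost, added to the root score. This brute-force version is a simplification under assumed settings: a single component, an isotropic quadratic spring, a fixed search radius, and no feature pyramid; the paper computes this kind of max efficiently with generalized distance transforms. `deformation_pool` and `dpm_score` are hypothetical names.

```python
import numpy as np

def deformation_pool(part_response, anchor, spring=0.1, radius=3):
    """Best placement of one part near its anchor: max over displacements of
    (part filter response - quadratic spring cost). Brute force (sketch)."""
    h, w = part_response.shape
    ay, ax = anchor
    best, best_pos = -np.inf, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = ay + dy, ax + dx
            if 0 <= y < h and 0 <= x < w:
                s = part_response[y, x] - spring * (dx * dx + dy * dy)
                if s > best:
                    best, best_pos = s, (y, x)
    return best, best_pos

def dpm_score(root_score, part_responses, anchors, spring=0.1, radius=3):
    """Root score plus the pooled score of every part; placements are latent."""
    total = root_score
    placements = []
    for resp, anc in zip(part_responses, anchors):
        s, pos = deformation_pool(resp, anc, spring, radius)
        total += s
        placements.append(pos)
    return total, placements

part = np.zeros((10, 10))
part[6, 5] = 2.0   # strongest part response, one cell below its anchor (5, 5)
score, where = dpm_score(1.0, [part], [(5, 5)], spring=0.25)
print(score, where)  # 2.75 [(6, 5)]
```

The spring cost lets a part move to where its filter responds best while penalizing placements far from the anchor; the chosen placement is the latent variable.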