The Role of Learning in Vision

3.30pm: Rob Fergus
3.40pm: Andrew Ng
3.50pm: Kai Yu
4.00pm: Yann LeCun
4.10pm: Alan Yuille
4.20pm: Deva Ramanan
4.30pm: Erik Learned-Miller
4.40pm: Erik Sudderth
4.50pm: Spotlights - Qiang Ji, M-H Yang
4.55pm: Discussion
5.30pm: End

Feature / Deep Learning

Compositional Models

Learning Representations

Overview

Low-level Representations

Learning on the fly

An Overview of Hierarchical Feature Learning and Relations to Other Models

Rob Fergus

Dept. of Computer Science, Courant Institute,

New York University

Motivation

• Multitude of hand-designed features currently in use
– SIFT, HOG, LBP, MSER, Color-SIFT, …

• Could the features be learned instead?

• Also, these features capture only low-level edge gradients

Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2007

Yan & Huang (Winner of PASCAL 2010 classification competition)

• Mid-level cues

Beyond Edges?

“Tokens” from Vision by D. Marr:

Continuation, Parallelism, Junctions, Corners

• High-level object parts:

• Difficult to hand-engineer → What about learning them?

• Build hierarchy of feature extractors (≥ 1 layers)
– All the way from pixels → classifier
– Homogeneous structure per layer
– Unsupervised training

Deep/Feature Learning Goal

Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier

• Numerous approaches:
– Restricted Boltzmann Machines (Hinton, Ng, Bengio, …)
– Sparse coding (Yu, Fergus, LeCun)
– Auto-encoders (LeCun, Bengio)
– ICA variants (Ng, Cottrell)
& many more…

Single Layer Architecture

Input: Image Pixels / Features → Filter → Normalize → Pool → Output: Features / Classifier

Details in the boxes matter (especially in a hierarchy)

Links to neuroscience
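The filter, normalize, pool stages named on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function names, the rectifying non-linearity, the choice of divisive normalization across feature maps, and 2x2 max pooling are picked here for concreteness and are not taken from the talk.

```python
import numpy as np

def filter_bank(image, filters):
    """Valid cross-correlation of a 2-D image with each filter, then rectify."""
    fh, fw = filters.shape[1:]
    h, w = image.shape
    out = np.empty((len(filters), h - fh + 1, w - fw + 1))
    for k, f in enumerate(filters):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(image[i:i + fh, j:j + fw] * f)
    return np.maximum(out, 0.0)  # assumed non-linearity

def normalize(maps, eps=1e-8):
    """Divisive normalization across feature maps at each location (one assumed choice)."""
    norm = np.sqrt(np.sum(maps ** 2, axis=0, keepdims=True))
    return maps / (norm + eps)

def pool(maps, size=2):
    """Non-overlapping spatial max pooling."""
    k, h, w = maps.shape
    h2, w2 = h // size, w // size
    cropped = maps[:, :h2 * size, :w2 * size]
    return cropped.reshape(k, h2, size, w2, size).max(axis=(2, 4))

def single_layer(image, filters):
    """Input pixels/features -> filter -> normalize -> pool -> output features."""
    return pool(normalize(filter_bank(image, filters)))

# Toy input: a vertical step edge, and two hand-picked 3x3 gradient filters.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
filters = np.array([
    [[-1, 0, 1]] * 3,                      # responds to vertical edges
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],  # responds to horizontal edges
], dtype=float)
features = single_layer(image, filters)
print(features.shape)
```

Stacking such layers means feeding `features` back in as the input of the next layer, which is where the details of each box start to interact.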

Example Feature Learning Architectures

Pixels / Features
→ Filter with Dictionary (patch/tiled/convolutional) + Non-linearity
→ Normalization between feature responses: Local Contrast Normalization (Subtractive / Divisive), (Group) Sparsity, Max / Softmax
→ Pooling: Spatial/Feature (Sum or Max)
→ Features

SIFT Descriptor

Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
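The SIFT pipeline above (oriented filtering, spatial sum pooling, unit-length normalization) can be approximated as follows. This is a simplified sketch, not the real SIFT descriptor: it substitutes gradient-orientation binning for the Gabor filtering on the slide, and it omits Gaussian weighting, interpolation between bins, and clipping. The function name and the 4x4 grid with 8 orientation bins are assumptions chosen for illustration.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, bins=8):
    """Orientation histograms summed over a grid of cells, then L2-normalized."""
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)
    bin_idx = np.minimum((orientation / (2 * np.pi) * bins).astype(int), bins - 1)

    h, w = patch.shape
    ch, cw = h // grid, w // grid
    desc = np.zeros((grid, grid, bins))
    for i in range(grid):
        for j in range(grid):
            cell = (slice(i * ch, (i + 1) * ch), slice(j * cw, (j + 1) * cw))
            # Spatial pool (sum): accumulate gradient magnitude per orientation bin.
            desc[i, j] = np.bincount(bin_idx[cell].ravel(),
                                     weights=magnitude[cell].ravel(),
                                     minlength=bins)
    desc = desc.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc  # normalize to unit length

patch = np.random.default_rng(0).random((16, 16))
print(sift_like_descriptor(patch).shape)
```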

Spatial Pyramid Matching

SIFT Features → Filter with Visual Words → Max → Multi-scale spatial pool (Sum) → Classifier

Lazebnik, Schmid, Ponce [CVPR 2006]
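As a rough illustration of the spatial pyramid stages, the sketch below hard-assigns each local descriptor to its nearest visual word and then sum-pools the assignments over grids of 1x1, 2x2, and 4x4 cells. It is a sketch under assumptions: the function name, the normalized `positions` in [0, 1), and the random codebook are hypothetical, and the per-level weights of the pyramid match kernel are left out.

```python
import numpy as np

def spatial_pyramid(descriptors, positions, codebook, levels=3):
    """Concatenate visual-word histograms over 1x1, 2x2, 4x4, ... grids."""
    # Hard assignment: index of the nearest visual word for each descriptor.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)

    k = len(codebook)
    pooled = []
    for level in range(levels):
        cells = 2 ** level
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        hist = np.zeros((cells, cells, k))
        np.add.at(hist, (cy, cx, words), 1.0)  # sum pool within each cell
        pooled.append(hist.ravel())
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
descriptors = rng.random((50, 8))   # stand-ins for SIFT descriptors
positions = rng.random((50, 2))     # (x, y) locations scaled to [0, 1)
codebook = rng.random((5, 8))       # stand-in visual words
print(spatial_pyramid(descriptors, positions, codebook).shape)
```

The resulting vector is what would be handed to the classifier.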

Role of Normalization

• Lots of different mechanisms (max, sparsity, LCN etc.)

• All induce local competition between features to explain input
– “Explaining away”
– Just like top-down models
– But more local mechanism
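One of the mechanisms listed, local contrast normalization (LCN), can be sketched as a subtractive step followed by a divisive step. This is a simplified version under assumptions: a square box window stands in for the weighted window that published variants typically use, and the names `box_mean`, `radius`, and `eps` are chosen here for illustration.

```python
import numpy as np

def box_mean(x, radius):
    """Mean over a (2r+1) x (2r+1) window, replicating values at the border."""
    padded = np.pad(x, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros(x.shape, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / size ** 2

def local_contrast_normalize(x, radius=2, eps=1e-4):
    x = np.asarray(x, dtype=float)
    centered = x - box_mean(x, radius)                    # subtractive step
    local_std = np.sqrt(box_mean(centered ** 2, radius))
    return centered / np.maximum(local_std, eps)          # divisive step

x = np.random.default_rng(0).random((12, 12))
same = np.allclose(local_contrast_normalize(x),
                   local_contrast_normalize(3.0 * x + 7.0))
print(same)
```

Because each response is divided by the energy of its neighbourhood, a strong response suppresses its neighbours, which is the local competition the slide refers to.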

Example: Convolutional Sparse Coding

[Figure: filters, convolution, and an L1 penalty |.|_1 on each feature map]

Zeiler et al. [CVPR’10/ICCV’11], Kavukcuoglu et al. [NIPS’10], Yang et al. [CVPR’10]
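The |.|1 terms on this slide are the sparsity penalties placed on the feature maps. A common way to write a convolutional sparse coding objective is sketched below. The notation is assumed here and does not come from the slide: x is the input image, d_k are the K filters, z_k are the corresponding feature maps, * denotes convolution, and λ trades reconstruction error against sparsity. The cited papers differ in details such as the exact penalty and how colour channels are handled.

```latex
\min_{z,\,d}\;
\frac{1}{2}\,\Big\| x - \sum_{k=1}^{K} d_k * z_k \Big\|_2^2
\;+\; \lambda \sum_{k=1}^{K} \| z_k \|_1
```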

Role of Pooling

• Spatial pooling
– Invariance to small transformations

Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]

• Pooling across feature groups
– Gives AND/OR type behavior
– Compositional models of Zhu, Yuille
– Larger receptive fields

Zeiler, Taylor, Fergus [ICCV 2011]

• Pooling with latent variables (& springs)
– Pictorial structures models

Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
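The invariance claim for spatial pooling can be checked on a toy 1-D signal: a response that shifts by one position within a pooling window leaves the max-pooled output unchanged. This is a minimal sketch, with the function name and window size chosen for illustration.

```python
import numpy as np

def max_pool_1d(x, size):
    """Non-overlapping max pooling over windows of `size` samples."""
    x = np.asarray(x, dtype=float)
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

response = [0, 0, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 1, 0, 0, 0, 0]   # same response, moved right by one sample

print(max_pool_1d(response, 4))
print(max_pool_1d(shifted, 4))
```

The invariance only holds while the shift stays inside one window. A response that crosses a window boundary does change the pooled output, which is why the slide speaks of small transformations.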

Object Detection with Discriminatively Trained Part-Based Models

HOG Pyramid → Apply object part filters → Pool part responses (latent variables & springs) → Non-max Suppression (Spatial) → Score

Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
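Pooling part responses with latent variables and springs can be illustrated in 1-D: the latent variable is the displacement of a part from its anchor, and the spring is a penalty that grows with that displacement. The sketch below is a simplification under assumptions. It uses a purely quadratic penalty and a brute-force search, and the names `spring` and `max_shift` are hypothetical. The cited model works in 2-D with learned deformation costs.

```python
def pool_part_response(response, anchor, spring=0.5, max_shift=2):
    """Max over displacements of (filter response - spring penalty)."""
    best_score, best_shift = float("-inf"), 0
    for shift in range(-max_shift, max_shift + 1):
        pos = anchor + shift
        if 0 <= pos < len(response):
            score = response[pos] - spring * shift ** 2
            if score > best_score:
                best_score, best_shift = score, shift
    return best_score, best_shift

part_response = [0.0, 0.2, 0.1, 3.0, 0.0]

# Soft spring: the part moves one step to reach the strong response.
print(pool_part_response(part_response, anchor=2, spring=0.5))
# Stiff spring: moving costs more than it gains, so the part stays put.
print(pool_part_response(part_response, anchor=2, spring=5.0))
```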
