
Page 1: Fcv learn yu


Sparse Coding and Its Extensions for Visual Recognition

Kai Yu

Media Analytics Department, NEC Labs America, Cupertino, CA

Page 2: Fcv learn yu

Visual Recognition is HOT in Computer Vision


Caltech 101

PASCAL VOC

80 Million Tiny Images

ImageNet

Page 3: Fcv learn yu

The pipeline of machine visual perception


Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition

• Most critical for accuracy
• Accounts for most of the computation
• Most time-consuming in the development cycle
• Often hand-crafted in practice

Most Efforts in Machine Learning

Page 4: Fcv learn yu

Computer vision features

SIFT, Spin image, HoG, RIFT, GLOH

Slide credit: Andrew Ng

Page 5: Fcv learn yu

Learning everything from data


Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition

Machine Learning (over the whole pipeline)

Page 6: Fcv learn yu

BoW + SPM Kernel


• Combining multiple features, this method was the state of the art on Caltech-101, PASCAL VOC, 15 Scene Categories, …

Figure credit: Fei-Fei Li, Svetlana Lazebnik

Bag-of-visual-words representation (BoW) based on vector quantization (VQ)

Spatial pyramid matching (SPM) kernel
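To make the BoW step concrete, here is a minimal sketch, assuming a k-means codebook of 1024 visual words over 128-d SIFT-like descriptors (all sizes and data are illustrative, not from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative stand-in for SIFT descriptors pooled from training images.
descriptors = np.random.randn(2000, 128)

# Dictionary learning for VQ: a k-means codebook of visual words.
codebook = KMeans(n_clusters=1024, n_init=1).fit(descriptors)

def bow_histogram(image_descriptors):
    """Assign each descriptor to its nearest codeword (VQ coding),
    then average-pool the one-hot codes into a normalized histogram."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=1024).astype(float)
    return hist / max(hist.sum(), 1.0)
```

The SPM kernel then repeats this pooling inside each cell of a spatial pyramid and concatenates the resulting histograms.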

Page 7: Fcv learn yu

Winning Method in PASCAL VOC before 2009


Multiple Feature Sampling Methods → Multiple Visual Descriptors → VQ Coding, Histogram, SPM → Nonlinear SVM


Page 8: Fcv learn yu

Convolutional Neural Networks


• The architectures of some successful methods are not so different from CNNs

Conv. Filtering → Pooling → Conv. Filtering → Pooling
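A minimal numpy sketch of one such filtering-plus-pooling stage, stacked twice; the random filter bank is purely illustrative (a CNN learns its filters):

```python
import numpy as np
from scipy.signal import convolve2d

def filter_pool_stage(image, filters, pool=2):
    """Convolve with a filter bank, rectify, then max-pool over
    non-overlapping pool x pool regions."""
    maps = [np.maximum(convolve2d(image, f, mode='valid'), 0) for f in filters]
    pooled = []
    for m in maps:
        h, w = (m.shape[0] // pool) * pool, (m.shape[1] // pool) * pool
        m = m[:h, :w].reshape(h // pool, pool, w // pool, pool)
        pooled.append(m.max(axis=(1, 3)))
    return pooled

# Conv. Filtering -> Pooling -> Conv. Filtering -> Pooling
img = np.random.rand(32, 32)
bank = [np.random.randn(5, 5) for _ in range(4)]
stage1 = filter_pool_stage(img, bank)
stage2 = [m2 for m1 in stage1 for m2 in filter_pool_stage(m1, bank)]
```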

Page 9: Fcv learn yu

BoW+SPM: the same architecture


Local Gradients → Pooling (e.g., SIFT, HOG) → VQ Coding → Average Pooling (obtain histogram) → Nonlinear SVM

Observations:
• Nonlinear SVM is not scalable
• VQ coding may be too coarse
• Average pooling is not optimal
• Why not learn the whole thing?

Page 10: Fcv learn yu

Develop better methods


Better Coding → Better Pooling → Scalable Linear Classifier

Page 11: Fcv learn yu

Sparse Coding


Sparse coding (Olshausen & Field, 1996) was originally developed to explain early visual processing in the brain (edge detection).

Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …]

Coding: for data vector x, solve LASSO to find the sparse coefficient vector a
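The slide's formulas were lost in extraction; in standard notation, training and coding are usually written as follows (a reconstruction, so the exact constraints and constants may differ from the original slide).

Training (dictionary learning over patches $x_i$):
\[
\min_{\{a_i\},\,\{\phi_j\}} \sum_i \Big\| x_i - \sum_j a_{i,j}\,\phi_j \Big\|_2^2 + \lambda \sum_i \|a_i\|_1
\qquad \text{s.t. } \|\phi_j\|_2 \le 1
\]

Coding (LASSO, with the dictionary $\Phi$ fixed):
\[
a^\star = \arg\min_a \|x - \Phi a\|_2^2 + \lambda \|a\|_1
\]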

Page 12: Fcv learn yu

Sparse Coding Example

Natural images and learned bases (Φ1, …, Φ64): “edges”


x ≈ 0.8 × Φ36 + 0.3 × Φ42 + 0.5 × Φ63

[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0] (feature representation)

Test example

Compact & easily interpretable. Slide credit: Andrew Ng

Page 13: Fcv learn yu

Testing: What is this?

Motorcycles / Not motorcycles

Unlabeled images

Self-taught Learning [Raina, Lee, Battle, Packer & Ng, ICML 07]


Slide credit: Andrew Ng

Page 14: Fcv learn yu

Classification Result on Caltech 101


SIFT VQ + Nonlinear SVM: 64%
Pixel Sparse Coding + Linear SVM: 50%

(9K images, 101 classes)

Page 15: Fcv learn yu


Local Gradients → Pooling (e.g., SIFT, HOG) → Sparse Coding → Max Pooling → Scalable Linear Classifier

Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR09]
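A sketch of the coding and pooling steps of this pipeline, using scikit-learn's Lasso as the sparse coder; the dictionary, sizes, and regularization weight are illustrative (note scikit-learn scales the squared error by 1/(2n)):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
D = rng.standard_normal((128, 1024))   # 1024 bases for 128-d SIFT descriptors
D /= np.linalg.norm(D, axis=0)         # unit-norm bases

def sparse_code(x, lam=0.1):
    """LASSO coding: approximately min_a ||x - D a||^2 + lam * ||a||_1."""
    return Lasso(alpha=lam, max_iter=2000).fit(D, x).coef_

def image_feature(sift_descriptors):
    """Sparse-code every descriptor, then max-pool the absolute codes
    over the image (or over each spatial pyramid cell)."""
    codes = np.array([sparse_code(x) for x in sift_descriptors])
    return np.abs(codes).max(axis=0)
```

The resulting feature feeds a linear SVM, which keeps training and testing scalable.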

Page 16: Fcv learn yu


SIFT VQ + Nonlinear SVM: 64%
SIFT Sparse Coding + Linear SVM: 73%

Caltech-101

Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR09]

Page 17: Fcv learn yu

What have we learned?


Local Gradients → Pooling (e.g., SIFT, HOG) → Sparse Coding → Max Pooling → Scalable Linear Classifier

1. Sparse coding is useful (why?)
2. A hierarchical architecture is needed

Page 18: Fcv learn yu

MNIST Experiments


Error: 4.54%    Error: 3.75%    Error: 2.64%

• When SC achieves the best classification accuracy, the learned bases look like digits – each basis has a clear local class association.

Page 19: Fcv learn yu

Distribution of coefficients (SIFT, Caltech-101)


Neighboring bases tend to get nonzero coefficients

Page 20: Fcv learn yu


Interpretation 1: Discover subspaces
• Each basis is a “direction”
• Sparsity: each datum is a linear combination of only several bases
• Related to topic models

Interpretation 2: Geometry of the data manifold
• Each basis is an “anchor point”
• Sparsity is induced by locality: each datum is a linear combination of neighboring anchors

Page 21: Fcv learn yu

A Function Approximation View to Coding


• Setting: f(x) is a nonlinear feature extraction function on image patches x
• Coding: a nonlinear mapping x → a; typically a is high-dimensional and sparse
• Nonlinear learning: f(x) = ⟨w, a⟩

A coding scheme is good if it makes learning f(x) easier

Page 22: Fcv learn yu


A Function Approximation View to Coding – The General Formulation

Function approximation error ≤ an unsupervised learning objective (a bound reconstructed below)
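The inequality itself did not survive extraction. For a Lipschitz-smooth f, the LCC bound (Yu, Zhang & Gong, NIPS 09) has roughly this shape; this is my reconstruction from the paper, not the slide's exact rendering:

\[
\Big| f(x) - \sum_v a_v(x)\, f(v) \Big|
\;\le\; \alpha \Big\| x - \sum_v a_v(x)\, v \Big\|_2
\;+\; \beta \sum_v |a_v(x)|\, \|v - x\|_2^2,
\qquad \sum_v a_v(x) = 1
\]

Both right-hand terms, the reconstruction error and a locality penalty, can be minimized on unlabeled data, which is why the bound is an unsupervised learning objective.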

Page 23: Fcv learn yu

Local Coordinate Coding (LCC)


• Dictionary learning: k-means (or hierarchical k-means)

• Coding: for x, obtain its sparse representation a

Step 1 – ensure locality: find the K nearest bases

Step 2 – ensure low coding error: fit x by a constrained least-squares combination of those K bases (a sketch follows the citations below)

Yu, Zhang & Gong, NIPS 09; Wang, Yang, Yu, Lv & Huang, CVPR 10
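A minimal numpy sketch of the two steps, following the analytic approximation from the LLC paper (Wang et al., CVPR 10); the dictionary size, K, and the regularizer are illustrative:

```python
import numpy as np

def lcc_code(x, B, K=5, reg=1e-4):
    """B: (num_bases, dim) dictionary from k-means; x: (dim,) descriptor.
    Step 1 (locality): keep only the K nearest bases.
    Step 2 (low coding error): least-squares fit on those bases,
    with coefficients constrained to sum to one."""
    idx = np.argsort(((B - x) ** 2).sum(axis=1))[:K]
    Bk = B[idx] - x                        # shift bases to the datum
    C = Bk @ Bk.T + reg * np.eye(K)        # regularized local covariance
    w = np.linalg.solve(C, np.ones(K))
    w /= w.sum()                           # enforce sum(w) == 1
    a = np.zeros(len(B))
    a[idx] = w                             # sparse code over the full dictionary
    return a
```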

Page 24: Fcv learn yu

Super-Vector Coding (SVC)


• Dictionary learning: k-means (or hierarchical k-means)

• Coding: for x, obtain its sparse representation a

Step 1 – find the nearest basis of x and obtain its VQ coding, e.g. [0, 0, 1, 0, …]

Step 2 – form the super-vector coding, e.g. [0, 0, 1, 0, …, 0, 0, (x − m3), 0, …] (sketched below)

Zhou, Yu, Zhang, and Huang, ECCV 10

Zero-order part + local tangent (first-order) part
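A sketch of super-vector coding for one descriptor; the constant s weighting the zero-order part is a tunable detail I leave as a parameter:

```python
import numpy as np

def super_vector(x, centers, s=1.0):
    """centers: (num, dim) k-means centers; x: (dim,) descriptor.
    Step 1: VQ -- the nearest center k gives a one-hot code.
    Step 2: place s and the local tangent (x - m_k) in block k;
    all other blocks stay zero."""
    k = np.argmin(((centers - x) ** 2).sum(axis=1))
    num, dim = centers.shape
    sv = np.zeros(num * (dim + 1))
    block = k * (dim + 1)
    sv[block] = s                                    # zero-order (VQ) part
    sv[block + 1: block + 1 + dim] = x - centers[k]  # local tangent part
    return sv
```

Image-level features are then obtained by pooling these super vectors over the image and feeding them to a linear classifier.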

Page 25: Fcv learn yu

Function Approximation based on LCC


[Figure: data points and bases; f(x) is approximated as locally linear around each basis]

Yu, Zhang & Gong, NIPS 09

Page 26: Fcv learn yu

Function Approximation based on SVC

[Figure: data points and cluster centers; piecewise locally linear (first-order) approximation via local tangents]

Zhou, Yu, Zhang, and Huang, ECCV 10

Page 27: Fcv learn yu

PASCAL VOC Challenge 2009


[Table: per-class results – columns: Classes, Ours, Best of Other Teams, Difference]

No.1 for 18 of 20 categories

We used only the HOG feature on grayscale images

Page 28: Fcv learn yu

ImageNet Challenge 2010


VQ + Intersection Kernel: ~40%
Various Coding Methods + Linear SVM: 64%~73%

(1.4 million images, 1,000 classes, metric: top-5 hit rate)

Page 29: Fcv learn yu

Hierarchical sparse coding


Conv. Filtering → Pooling → Conv. Filtering → Pooling

Learning from unlabeled data

Yu, Lin, & Lafferty, CVPR 11

Page 30: Fcv learn yu

A two-layer sparse coding formulation


Page 31: Fcv learn yu

MNIST results -- classification

HSC vs. CNN: HSC gives even better performance than CNN; more remarkably, HSC learns its features in an unsupervised manner!

Page 32: Fcv learn yu

MNIST results -- effect of hierarchical learning

Comparing the Fisher scores of HSC and SC: discriminative power is significantly improved by HSC, even though HSC is unsupervised coding

Page 33: Fcv learn yu

MNIST results -- learned codebook

One dimension in the second layer: invariance to translation, rotation, and deformation

Page 34: Fcv learn yu

Caltech101 results -- classification

The learned descriptor performs slightly better than SIFT + SC

Page 35: Fcv learn yu

Conclusion and Future Work

A “function approximation” view can be used to derive novel sparse coding methods.

Locality is one way to achieve sparsity, and it is genuinely useful. But we need a deeper understanding of feature learning methods.

Interesting directions:
– Hierarchical coding
– Deep learning (many papers now!)
– Faster methods for sparse coding (e.g., from LeCun’s group)
– Learning features from a richer structure of data, e.g., video (learning invariance to out-of-plane rotation)

Page 36: Fcv learn yu

References


• Learning Image Representations from Pixel Level via Hierarchical Sparse Coding, Kai Yu, Yuanqing Lin, John Lafferty. CVPR 2011.

• Large-scale Image Classification: Fast Feature Extraction and SVM Training, Yuanqing Lin, Fengjun Lv, Liangliang Cao, Shenghuo Zhu, Ming Yang, Timothee Cour, Thomas Huang, Kai Yu. CVPR 2011.

• ECCV 2010 Tutorial, Kai Yu, Andrew Ng (with links to some source code).

• Deep Coding Networks, Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu. NIPS 2010.

• Image Classification using Super-Vector Coding of Local Image Descriptors, Xi Zhou, Kai Yu, Tong Zhang, Thomas Huang. ECCV 2010.

• Efficient Highly Over-Complete Sparse Coding using a Mixture Model, Jianchao Yang, Kai Yu, Thomas Huang. ECCV 2010.

• Improved Local Coordinate Coding using Local Tangents, Kai Yu, Tong Zhang. ICML 2010.

• Supervised Translation-Invariant Sparse Coding, Jianchao Yang, Kai Yu, Thomas Huang. CVPR 2010.

• Learning Locality-Constrained Linear Coding for Image Classification, Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang. CVPR 2010.

• Nonlinear Learning using Local Coordinate Coding, Kai Yu, Tong Zhang, Yihong Gong. NIPS 2009.

• Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang. CVPR 2009.