histograms of sparse codes for object detectionazizpour/rg/materials/hossein_008_slides.pdf · a...
TRANSCRIPT
![Page 1: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/1.jpg)
Histograms of Sparse Codes for Object DetectionXiaofeng Ren (Amazon), Deva Ramanan (UC Irvine)
Presented by Hossein Azizpour
![Page 2: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/2.jpg)
What does the paper do? (learning) a new representation
local histograms of sparse encodings
replaces HOG…
(sliding window) detection
and improves result
![Page 3: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/3.jpg)
How is the feature extracted? (summary)Offline partrandomly select a set of n local image intensity patches 𝑌 = [𝑦1, 𝑦2,…,𝑦𝑛]
generate a codebook D = [𝑑1, 𝑑2,…, 𝑑𝑚 ] able to reconstruct Y:
𝑌 = 𝐷𝑋 where X = [𝑥1, 𝑥2,…, 𝑥𝑛] is the new encoding of the patches
Random Selection
Learn Codebook
I Y D
![Page 4: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/4.jpg)
How is the feature extracted? (summary)Online partcompute the encodings for patches around each pixel.
do average pooling over fixed regions of pixels
(optionally) reduce dimensions using a projection matrix
post process the extracted feature (e.g. L2 normalization)
Patch encoding at each pixel
averagepooling
dimreduction
use instead of HOGin training a detector
e.g. DPM
![Page 5: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/5.jpg)
Generating the Codebook Given a set of image patches
Jointly find dictionary
sparse code matrix
K is a predefined sparsity level
minimize residual rather than a complete reconstruction (𝑌 = 𝐷𝑋 + 𝛜)
solve for X and D in the following objective using K-SVD
![Page 6: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/6.jpg)
Generating the Codebook (K-SVD) alternate between the estimation of X and D
given a dictionary D, use Orthogonal Matching Pursuit (OMP) to find the codes Xa greedy method to select K codes
given the codes X, update dictionary D using SVD
![Page 7: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/7.jpg)
Generating the Codebook (K-SVD) K-SVD as generalization of K-means
Alternates between estimation of D and X
A good approximate solver of the optimization
![Page 8: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/8.jpg)
Generating the Codebook (OMP)
Greedily selects the codes for the encodings
Updates all thecoefficients
There is a fast versionCalled Batch OMP
![Page 9: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/9.jpg)
Feature Extraction (Binning)8x8 cells
soft binning (bilinear interpolation)
4 – neighborhood
average pooling on 16x16 neighborhood
absolute value of sparse codes
F = (|𝑥1|, |𝑥2|, … , |𝑥𝑚|)
contrast sensitive features [ 𝑥𝑖 , max 𝑥𝑖 , 0 ,max(−𝑥𝑖 , 0)]
![Page 10: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/10.jpg)
Feature Extraction (post processing) L2 normalization
power transform 𝐹 = 𝐹𝛼 (element-wise)
![Page 11: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/11.jpg)
Feature Extraction (example)3m dimensional feature
HOG can be replaced
![Page 12: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/12.jpg)
Feature Extraction (dim reduction)Too long feature vectors -> slow training and testing
(somewhat) supervised dimensionality reduction
Train root filters (𝑤1, 𝑤2, … , 𝑤𝑞) for different classes/subclasses using original 3m dimensional features.
𝑤𝑖 = 𝑤𝑖1|| 𝑤𝑖
2|| … ||𝑤𝑖𝐶 where 𝑤𝑖
𝑐 is the corresponding part of cell c.
stack all cells of all weight vectors to produce matrix 𝑊 =
𝑤11𝑇
⋮
𝑤𝑖𝑐𝑇
⋮
𝑤𝑞𝐶𝑇
Do PCA dimensionality reduction on 𝑊′ = 𝑊𝑃, 𝑊 = 𝑈𝑆𝑃𝑇 ,
Use the projection matrix to transform original cell features to lower dimensions 𝐹′ = 𝐹𝑃
![Page 13: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/13.jpg)
Training DetectorDeformable Parts Model (DPM)Root only
Root with Parts
fixing part latencyusing originally trained DPMs
to make training faster!
![Page 14: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/14.jpg)
Experiments (Different Parameters)INRIA pedestrian
root-only
HOG AP = 80.2%
sparsity level vs Dictionary Size
fix K = 1
histogram of sparse codes
![Page 15: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/15.jpg)
Experiments (Different Parameters)Patch size vs Dictionary size
K-medoid clustering wouldn’t gain performance larger than 3x3
Fixed to 5x5
![Page 16: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/16.jpg)
Experiments (Different Parameters)K-SVD vs K-means
activated code can have aweight other than 1
possible change of sign
![Page 17: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/17.jpg)
Experiments (Different Parameters)Power transform
Fixed at 0.25
double helinger kernel!
![Page 18: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/18.jpg)
Experiments (Different Parameters)Supervised PCA (on models) vs PCA on data
PASCAL 4 classes
bus
cat
diningtable
Motorbike
More effective for person
fixed at 100
![Page 19: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/19.jpg)
Final Experiments
INRIA root only
PASCALroot only
PASCALwith parts
![Page 20: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/20.jpg)
Visualizing HSC with Reconstructions(1) Image
(2) HSC
(3) HOG
Courtesy of Carl Voldrick et al 2012 “Inverting and Visualizing Features”
![Page 21: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/21.jpg)
Some detections…
![Page 22: Histograms of Sparse Codes for Object Detectionazizpour/rg/materials/hossein_008_slides.pdf · a greedy method to select K codes given the codes X, update dictionary D using SVD](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f5199071b3d9c5901439294/html5/thumbnails/22.jpg)
Conclusions we can easily? go beyond hand crafted HOG by a sort of feature learning
deep learning
primitive shape codes might work better than simple gradient orientation
PCA on model instead of data seems promising