BMVA2013 1K. Mikolajczyk, Local Feature Descriptors for Visual Recognition
Krystian Mikolajczyk
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, UK
Local Feature Descriptors for Visual Recognition
BMVA2014 Tutorial
Introduction
• Krystian Mikolajczyk, Reader in Robot Vision
• University of Surrey, Guildford, UK
• 50 km south-west of London
• 5 Faculties
• 12 000 students, 2500 staff
• Faculty of Electronic and Physical Sciences
• Electronic Engineering Department
• CVSSP - Centre for Vision, Speech, and Signal Processing
  – 19 Academics
  – 30 Research Fellows
  – 60 PhD students
Research
• Image enhancement, video restoration, super-resolution, HDR imaging
• Image and video representation: local descriptors, motion estimation, segmentation, clustering
• Machine learning methods: LDA, KDA, SVM
• Retrieval, indexing, data structures
• Image and video recognition: object detection, scene classification, activity recognition
Program
• Local Feature Definitions / Properties
• Applications
• Interest point detectors
• Local Descriptors
• Evaluations
Features Detector
• Definition: A feature detector (extractor) is an algorithm taking an image as input and outputting a set of regions ("local features").
• "Local Features" are regions, i.e. in principle arbitrary sets of pixels, not necessarily contiguous, which are at least:
  – distinguishable in an image regardless of viewpoint/illumination
  – robust to occlusion: must be local
  – must have a discriminative neighborhood: they are "features"
• Terminology has not stabilised: Local Feature = Interest "Point" = Keypoint = Feature "Point" = The "Patch" = Distinguished Region = Features = (Transformation) Covariant Region
• Definition: A descriptor is computed on an image region defined by a detector. The descriptor is a representation of the intensity (colour, ...) function on the region.
Feature Detectors
• Invariance (or covariance) to a broad class of geometric and photometric transforms
• Efficiency: close to real-time performance
• Quantity/density of features to cover small objects/parts of scenes
• Robustness to:
  – occlusion and clutter (requires locality)
  – noise, blur, discretization, compression
• Distinctiveness: individual features can be matched to a large database of objects
• Stability over time (to support long-temporal-baseline matching)
• Geometric accuracy: precise localization
• Generalization to similar objects
• Even coverage, complementarity, number of geometric constraints, …
• No detector dominates in all aspects; some properties are competing, e.g. level of invariance vs. speed
Feature Descriptor
• Definition: A descriptor is computed on an image region defined by a detector. The descriptor is a representation of the intensity (colour, ...) function on the region.
Desiderata for feature descriptors:
• Discriminability
• Robustness to misalignment, illumination, blur, compression, …
• Efficiency: real-time often required
• Compactness: small memory footprint. Very significant in mobile and large-scale applications
• Note: The region on which a descriptor is computed is called a measurement region. This may be directly the feature detector output or any other function of it (e.g. convex hull, triple area region, ...)
Local Features
• Methods based on "Local Features" are the state-of-the-art for a number of computer vision problems (mostly those that require local correspondences).
  – Registration
  – Stereo vision
  – Motion estimation
  – Matching
  – Retrieval
  – Image & Video Classification
  – Detection
  – Action recognition
  – Robot navigation
However, there are still many issues to address
Example 1: Wide baseline matching
• Establish correspondence between two (or more) images
• Useful in visual geometry: camera calibration, 3D reconstruction, structure and motion estimation, …
Local transformation: scale/affine. Detector: affine-Harris. Descriptor: SIFT
Example 2: Panoramic mosaic
M. Brown, D. Lowe, B. Hearn, J. Beis
Example 3: 3D reconstruction
• Photo Tourism overview
[Figure: input photographs, scene reconstruction (sparse correspondence, relative camera positions and orientations, point cloud), Photo Explorer]
Slide: N. Snavely
Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," ACM Transactions on Graphics (SIGGRAPH Proceedings)
Example 3: 3D reconstruction
• 57,845 downloaded images, 11,868 registered images. This video: 4,619 images.
• The Old City of Dubrovnik
Building Rome in a Day, Agarwal, Snavely, Simon, Seitz, Szeliski, ICCV 2009. See also [Havlena, Torii, Knopp and Pajdla, CVPR 2009].
Example 4: Query by example search in large scale image datasets
Find these objects ...in these images and 1M more
Search the web with a visual query …
Example 5: Google Goggles
Slide credit: I. Laptev
Example 6: Where am I?
• Place recognition - retrieval in a structured (on a map) database
[Knopp, Sivic, Pajdla, ECCV 2010] http://www.di.ens.fr/willow/research/confusers/
[Figure: query, query expansion (Panoramio, Flickr, … ), image indexing with spatial verification, image database, best match; confuser suppression uses only negative training data (from geotags)]
Example 7: Re-acquisition in tracking
• Tracking loop
  – Detect: correspondence generation + PROSAC
  – Update: structured SVM + stochastic gradient descent
Hare, Saffari, Torr, CVPR 2012
ECCV 2012 Modern features: … Introduction.
Example 8: Object category recognition
Sliding window detector
• Classifier: SVM with linear kernel
• BOW representation for ROI
Example detections for dog
Lampert et al., CVPR 08: Efficient branch and bound search over all windows
Human Action Recognition
Local features meet Invariants: Schmid and Mohr, 1997.
C. Schmid, R. Mohr, "Local Gray-Value Invariants for Image Retrieval", IEEE Trans. PAMI, vol. 19 (5), 1997, pp. 530--535.
• Multi scale differential gray value invariants computed at Harris points
• Similarity‐based geometric constraint to reject mismatches
• Canonical frame not used.
D. Lowe, Object recognition from local scale-invariant features, ICCV, 1999
Detector:
• Scale-space peaks of Difference-of-Gaussians filter response (Lindeberg 1995)
• Similarity frame from modes of gradient histogram
SIFT Descriptor:
• Local histograms of gradient orientation
• Allows for small misalignments => robust to non-similarity transforms
Indexing:
• Modified kD-tree structure
Verification:
• Hough transform based clustering of correspondences with similar transformations
Fast, efficient implementation, real-time recognition
D. G. Lowe: "Distinctive image features from scale-invariant keypoints". IJCV, 2004.
J. Sivic & A. Zisserman, Video Google …, ICCV 2003
• Given an image or a part of it, return its label or a ranking of labels
• Local image patches
  – Interest points
  – Regular grid sampling
• Image descriptors
  – Histogram of gradients
• Visual vocabulary
  – Clustering of descriptors
  – Learning codeword weights per category
• Codeword occurrence distribution per image
[Figure: each codeword points to a list of image indexes; candidate images are ranked by voting]
Image Classification
http://kahlan.eps.surrey.ac.uk/featurespace/web/Classification_Exercise.zip
• Local image patches
• Image descriptors
• Visual vocabulary
• Codeword occurrence distribution per image
• Machine learning – classifiers
[Figure: 4. Classification: histograms of codeword frequency, kernel matrix, Kernel Discriminant Analysis, class labels]
Properties of the ideal feature
• Local: features are local, so robust to occlusion and clutter (no prior segmentation)
• Invariant (or covariant)
• Robust: noise, blur, discretization, compression, etc. do not have a big impact on the feature
• Distinctive: individual features can be matched to a large database of objects
• Quantity: many features can be generated for even small objects
• Accurate: precise localization
• Efficient: close to real-time performance
How to cope with transformations?
• Exhaustive search
• Robustness
• Invariance
Invariance
• Integration, e.g.
  – moment invariants, …
• Heuristics, e.g.
  – Difference of intensity values for photometric offset
  – Ratio of intensity values for photometric scale factor
• Selection and normalization, e.g.
  – Automatic scale selection (Lindeberg et al., 1996)
  – Orientation assignment
  – Affine normalization ('deskewing')
• …
Photometric transformations
Modelled as a linear transformation: scaling + offset (Color features, T. Gevers)
$$ I' = aI + b $$
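A minimal sketch of how a descriptor can be made invariant to the linear model I' = aI + b, assuming a positive gain a: subtracting the mean removes the offset and dividing by the standard deviation removes the scale factor. The helper name `normalize` and the values of `a` and `b` are hypothetical and chosen only for illustration.

```python
def normalize(patch):
    """Zero-mean, unit-variance normalization of a list of intensities."""
    n = len(patch)
    mean = sum(patch) / n
    std = (sum((v - mean) ** 2 for v in patch) / n) ** 0.5
    return [(v - mean) / std for v in patch]

patch = [10.0, 20.0, 35.0, 50.0]
a, b = 1.8, 12.0                            # hypothetical gain and offset
transformed = [a * v + b for v in patch]    # I' = a I + b

original_norm = normalize(patch)
transformed_norm = normalize(transformed)   # matches original_norm up to rounding
```

The same idea underlies the "difference" and "ratio" heuristics listed on the Invariance slide: differences cancel b, ratios cancel a.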
Geometric transformations
• Translation
• Euclidean (translation + rotation)
• Similarity (translation + rotation + scale)
• Affine transformations
• Projective transformations
Only holds for planar patches!
The need for geometric invariance
Overview of existing detectors
• Hessian & Harris
• Lowe: DoG
• Mikolajczyk & Schmid: Hessian/Harris-Laplacian/Affine
• Tuytelaars & Van Gool: EBR and IBR
• Matas: MSER
• Kadir & Brady: Salient Regions
• Others
Hessian detector (Beaudet, 1978)
• Hessian determinant
$$ \mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix} $$
$$ \det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2 $$
In Matlab: Ixx.*Iyy - (Ixy).^2
[Figure: second-derivative responses Ixx, Iyy, Ixy]
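The determinant-of-Hessian response can be sketched in a few lines. This is an illustration only: it assumes plain central finite differences on a list-of-lists image, whereas a practical detector would use Gaussian derivatives at a chosen scale. The function name and the test images are hypothetical.

```python
def hessian_determinant(img):
    """det(Hessian) = Ixx * Iyy - Ixy^2 at interior pixels (borders stay 0)."""
    h, w = len(img), len(img[0])
    det = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            det[y][x] = ixx * iyy - ixy ** 2
    return det

# Blob-like structure: a paraboloid bump centred at (3, 3).
blob = [[max(0.0, 9.0 - (x - 3) ** 2 - (y - 3) ** 2) for x in range(7)]
        for y in range(7)]
# Edge-like structure: intensity varies along x only.
ridge = [[float(x * x) for x in range(7)] for y in range(7)]

blob_response = hessian_determinant(blob)[3][3]
ridge_response = hessian_determinant(ridge)[3][3]
```

The blob gives a strong positive response, while the structure that varies in one direction only gives zero, because one of the two second derivatives vanishes.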
Hessian detector (Beaudet, 1978)
Harris detector (Harris, 1988)
• Second moment matrix / autocorrelation matrix
$$ \mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix} $$
1. Image derivatives: $I_x$, $I_y$
2. Square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$
3. Gaussian filter $g(\sigma_I)$: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$
4. Cornerness function – both eigenvalues are strong
$$ har = \det[\mu(\sigma_I, \sigma_D)] - \alpha\,[\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2 = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2 $$
5. Non-maxima suppression
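Steps 1 to 4 of the Harris detector can be sketched as follows. The sketch makes two simplifying assumptions that are not on the slide: derivatives are central differences rather than Gaussian derivatives at scale σ_D, and the integration filter g(σ_I) is replaced by a 3x3 box sum. The value alpha = 0.04 is a commonly used setting, not one given in the source; non-maxima suppression (step 5) is omitted.

```python
def harris_response(img, alpha=0.04):
    h, w = len(img), len(img[0])
    # 1. Image derivatives (central differences; border pixels stay 0).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    har = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 2.-3. Products of derivatives, accumulated over a 3x3 window
            # (box window used here in place of the Gaussian g(sigma_I)).
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # 4. Cornerness: det(mu) - alpha * trace(mu)^2.
            har[y][x] = sxx * syy - sxy ** 2 - alpha * (sxx + syy) ** 2
    return har

# Bright square occupying the bottom-right quadrant of an 8x8 image.
square = [[1.0 if (x >= 4 and y >= 4) else 0.0 for x in range(8)]
          for y in range(8)]
har = harris_response(square)
corner_score = har[4][4]   # at the corner of the square
edge_score = har[4][6]     # on the straight top edge of the square
flat_score = har[1][1]     # in the uniform background
```

The corner scores positive (both eigenvalues large), the edge scores negative (one eigenvalue dominates), and the flat region scores zero.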
Scale invariant detectors: Laplacian of Gaussian
• Local maxima in scale space of Laplacian of Gaussian (LoG)
$$ L_{xx}(\sigma) + L_{yy}(\sigma) $$
[Figure: LoG responses at scales σ, σ2, σ3, σ4, σ5, giving a list of (x, y, σ)]
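Lowe: DoG
• LoG -> diffusion equation -> derivative with respect to scale
$$ \frac{\partial L}{\partial s} = \tfrac{1}{2}\,\nabla \cdot \nabla L = \tfrac{1}{2}\,\Delta L, \qquad \Delta L = L_{xx}(\sigma) + L_{yy}(\sigma), \qquad s = \sigma^2 $$
$$ \frac{\partial L}{\partial \sigma} = \sigma\,\Delta L \approx \frac{L(k\sigma) - L(\sigma)}{k\sigma - \sigma} $$
$$ (k - 1)\,\sigma^2\,\Delta L \approx L(k\sigma) - L(\sigma) $$
• Scale normalized Laplacian: $\sigma^2\,(L_{xx}(\sigma) + L_{yy}(\sigma))$
[Figure: L(kσ) − L(σ) = difference-of-Gaussians image]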
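The approximation (k − 1) σ² ΔL ≈ L(kσ) − L(σ) can be checked numerically on the Gaussian kernel itself, since L is the image convolved with that kernel and convolution is linear. The sketch below uses the closed-form 2D Gaussian and its Laplacian; the function names and the values of σ and k are illustrative assumptions.

```python
import math

def gaussian(x, y, sigma):
    """Isotropic 2D Gaussian with standard deviation sigma."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def laplacian_of_gaussian(x, y, sigma):
    """Closed-form Gxx + Gyy of the 2D Gaussian."""
    r2 = x * x + y * y
    return (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

def dog(x, y, sigma, k):
    """Difference of Gaussians: G(k*sigma) - G(sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

def scaled_log(x, y, sigma, k):
    """(k - 1) * sigma^2 * Laplacian of Gaussian."""
    return (k - 1) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)

sigma, k = 2.0, 1.01
exact = dog(0.0, 0.0, sigma, k)
approx = scaled_log(0.0, 0.0, sigma, k)
relative_error = abs(exact - approx) / abs(approx)
```

The agreement improves as k approaches 1; for larger scale ratios the DoG is still proportional to the scale-normalized Laplacian to a good approximation, which is what matters for locating extrema.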
Lowe: DoG
Properties
• Scale-invariant
• Simple, efficient scheme
• Laplacian fires more on edges than determinant of Hessian
Harris Laplace
1. Initialization: Multiscale Harris corner detection
[Figure: detecting local maxima at scales σ, σ2, σ3, σ4]
Harris Laplace
1. Initialization: Multiscale Harris corner detection
2. Scale selection based on Laplacian
[Figure: Harris points and Harris-Laplace points]
Harris Affine
1. Detect multi-scale Harris points
2. Automatically select the scales
3. Adapt affine shape based on second order moment matrix
4. Refine point location
Harris & Hessian Affine
Properties
• Scale or affine invariant
• Detects blob- and corner-like structures
• Large number of regions
• Well suited for object class recognition
• Less accurate than some competitors
T. Tuytelaars, B. Leibe
Matas: Maximally Stable Extremal Regions (MSERs)
• Based on watershed algorithm
Maximally Stable Extremal Regions
Properties
• Affine invariant
• Detects blob-like structures
• Simple, efficient scheme
• High repeatability
• Fires on similar features as IBR (regions need not be convex, but need to be closed)
• Sensitive to image blur
Kadir & Brady: salient regions
• Based on entropy
Kadir & Brady: salient regions
• Maxima in entropy, combined with inter-scale saliency
• Extended to affine invariance
Kadir& Brady: salient regions
Properties
• Scale or affine invariant
• Detects blob-like structures
• Very good for object class recognition
• Limited number of regions
• Slow to extract
Affine normalization (‘deskewing’)
rotate
rescale
Local descriptors - rotation invariance
• Estimation of the dominant orientation
  – extract gradient orientation
  – histogram over gradient orientation
  – peak in this histogram
• Rotate patch in dominant direction
[Figure: histogram of gradient orientations over 0 to 2π]
• Plus: invariance
• Minus: less discriminant, additional noise
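The dominant-orientation estimate can be sketched as a histogram of gradient orientations followed by a peak search. The choices below are assumptions for illustration, not taken from the slide: central-difference gradients, 36 orientation bins, votes weighted by gradient magnitude, and no interpolation of the peak.

```python
import math

def dominant_orientation(patch, bins=36):
    """Return the centre (in radians) of the strongest orientation bin."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            angle = math.atan2(gy, gx) % (2 * math.pi)   # in [0, 2*pi)
            index = int(angle / (2 * math.pi) * bins) % bins
            hist[index] += math.hypot(gx, gy)            # magnitude-weighted vote
    peak = max(range(bins), key=hist.__getitem__)
    return (peak + 0.5) * 2 * math.pi / bins

# Intensity increasing along x: gradient points along the x axis (angle 0).
ramp_x = [[float(x) for x in range(8)] for y in range(8)]
# Intensity increasing along y: gradient points along the y axis (angle pi/2).
ramp_y = [[float(y) for x in range(8)] for y in range(8)]

theta_x = dominant_orientation(ramp_x)
theta_y = dominant_orientation(ramp_y)
```

Rotating the patch by the negative of the returned angle before computing the descriptor gives the rotation invariance; the bin width bounds the orientation error, which is one source of the "additional noise" noted above.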
Affine Normalization
• Scale, stretch and skew
• Fixed size disk
ASIFT - a new affine invariant detector?
Idea:
• (due to Lepetit et al. and others) Synthesize warped views of both images in two-view matching
• Match all pairs of synthesized images.
• Impose a geometric constraint to prune the tentative correspondences
• Positives:
  – yes, more correct correspondences are found =>
  – some (very) difficult matching problems solvable
• Negatives:
  – detection time goes up significantly
  – matching time goes up even more significantly (quadratically)
  – problematic use in e.g. retrieval
  – issue with evaluation (not all reported matches are inliers)
• ASIFT is NOT a detector – rather a "matching" scheme
  – generates a "redundant" representation that slows down the matching significantly
  – any detector may benefit from this "matching" scheme
Guoshen Yu, Jean-Michel Morel: A fully affine invariant image comparison method. ICASSP 2009: 1597-1600
G. Yu and J.M. Morel, ASIFT: An Algorithm for Fully Affine Invariant Comparison, Image Processing On Line, 2011.
ECCV 2012 Modern features: … Detectors.
Affine Invariance by Sampling Viewsphere
• Why use DoG ("SIFT")? Replacing DoG by HessianAffine or MSER is beneficial and more efficient!
• HessianAffine matches from direct matching:
• HessianAffine matches with synthesized images (viewsphere sampling)
• Conclusions:
  1. generating synthesized views works
  2. no reason to use DoG
Efficient methods
• Consider that descriptor calculation may take longer than the detection process! Sometimes, "auxiliary calculations" like non-maximum suppression dominate computation time.
• Consider the required level of invariance: in some applications, a reduced level of invariance is sufficient
• Consider fast approximations
• Use fast implementations, e.g. on GPU (GPU SURF, GPU SIFT)
Speeded Up Robust Features
• Idea:
  – Approximate Hessian + SIFT calculation with a computationally efficient algorithm.
• Properties:
  – exploit the integral image
  – the SURF detector is an approximation to the Hessian
  – reuse the calculations needed for detection in descriptor computation
  – maintain robustness to rotation, scale and illumination change
  – approximately 2x faster than DoG, 10x faster than the Hessian-Laplace detector
Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, SURF: Speeded Up Robust Features, ECCV 2006.
Citations: 2300 (2010), 4000 (2012)
The Integral image (Sum Table)
To calculate the sum in the DBCA rectangle, only 3 additions are needed
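A minimal sketch of the sum table and the constant-time rectangle sum. The slide's figure is not available here, so the assignment of the labels A, B, C, D to corners is an assumption: A is the table entry above-left of the rectangle, B above-right, C below-left and D below-right, which gives sum = D − B − C + A.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    # One extra row and column of zeros avoids special cases at the border.
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1."""
    a = ii[top][left]       # A
    b = ii[top][right]      # B
    c = ii[bottom][left]    # C
    d = ii[bottom][right]   # D
    return d - b - c + a    # four lookups, three additions/subtractions

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
whole = box_sum(ii, 0, 0, 3, 3)          # entire image
lower_right = box_sum(ii, 1, 1, 3, 3)    # 5 + 6 + 8 + 9
upper_right = box_sum(ii, 0, 1, 2, 3)    # 2 + 3 + 5 + 6
```

The cost of `box_sum` does not depend on the rectangle size, which is why box-filter approximations of derivatives can be evaluated at any scale at the same cost.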
SURF Detection
• Approximate second order derivatives with box filters (mean/average filter)
Hessian-based interest point localization: Lxx(x,y,σ) is the convolution of the Gaussian second order derivative with the image
SURF Detection
• Scale analysis easily handled with the integral image
[Figure: filter sizes 9 x 9, 15 x 15, 21 x 21, 27 x 27 (1st octave), 39 x 39, 51 x 51 … (2nd octave)]
CenSurE-Oct detector
• Approximation of LoG / DoG by octagonal box filters
• Sum of intensities inside an octagon calculated in O(1) using 3 integral images:
• Zero DC response requires normalisation
• Constant DC response over scales
• 3x3x3 non-maxima suppression
• Edge responses suppressed using Harris measure
• Scale sampling:
M. Agrawal, K. Konolige, M. R. Blas. CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching, ECCV 2008
STAR detector
• Approximating LoG / DoG using 2 integral images
• 5x5 spatial non-max suppression
• A single response at a point: maximum over scales
• Edge responses suppressed using Hessian
K. Konolige et al. View-Based Maps. Journal of Robotics 2010 http://pr.willowgarage.com/wiki/Star_Detector
CenSurE and STAR det. vs. DoG
• Small scale features: only integer locations and scales
• Limited rotation invariance
• Only 2x speedup over the DoG detector

Detection time [s]    Without SSE instructions    With SSE instructions
VLF SIFT (DoG)        0.34                        0.14
STAR                  0.16                        0.08
CenSurE               0.28                        X
Fast-9 and Fast-ER (E. Rosten)
• In some situations (controlled lighting, tracking), invariance/robustness is less important than speed
• A simple detector based on intensity comparisons could be very fast, and yet "repeatable enough"
• Detection: n contiguous pixels on a 16-pixel circle are all darker/brighter than the central pixel by at least t (n = 12 in the original segment test, n = 9 in Fast-9).
• http://www.edwardrosten.com/work/fast.html
Citations: 730 (2012)
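The segment test described above can be sketched directly. This is a plain, unoptimized illustration: it checks every ring pixel, whereas the actual FAST implementations gain their speed from an early-rejection ordering or a learned decision tree. The ring is the usual 16-pixel circle of radius 3; the function name, threshold and test images are hypothetical.

```python
# Offsets (dx, dy) of the 16-pixel circle of radius 3, in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t, n=12):
    """True if n contiguous ring pixels are all brighter or all darker
    than the centre pixel by at least t."""
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):                 # +1: brighter test, -1: darker test
        flags = [sign * (p - centre) >= t for p in ring]
        run = 0
        for f in flags + flags:          # doubled list handles wrap-around runs
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False

flat = [[100] * 7 for _ in range(7)]
# Dark square in the bottom-right quadrant; (3, 3) is its corner pixel.
corner = [[0 if (x >= 3 and y >= 3) else 100 for x in range(7)]
          for y in range(7)]

on_flat = is_fast_corner(flat, 3, 3, t=50)
on_corner_n12 = is_fast_corner(corner, 3, 3, t=50, n=12)
on_corner_n9 = is_fast_corner(corner, 3, 3, t=50, n=9)
```

For this right-angle corner, 11 of the 16 ring pixels are brighter than the centre, so the point passes the test with n = 9 but not with n = 12.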
Fast‐9 and Fast‐ER (E. Rosten)
Other feature detectors
• Edge-based detectors
  – Jurie et al., Mikolajczyk et al., …
• Combinations of small-scale features
  – Brown & Lowe
• Vertical line segments
  – Goedeme et al.
• Speeded-Up Robust Features (SURF)
  – Bay et al.
• Fast Features
  – Rosten et al.
• Segmentation based features
  – Malik et al., Koniusz et al.
Program
• Local Feature Definitions / Properties
• Applications
• Interest point detectors
• Local Descriptors
• Evaluations
Descriptors
[Figure: extract affine regions, normalize regions, eliminate rotational + illumination, compute appearance descriptors: SIFT (Lowe '04)]
Descriptors history
• Normalized cross-correlation (NCC) [~ 60s]
• Gaussian derivative-based descriptors
  – Differential invariants [Koenderink and van Doorn'87]
  – Steerable filters [Freeman and Adelson'91]
• Moment invariants [Van Gool et al.’96]
• SIFT [Lowe’99]
• Shape context [Belongie et al.’02]
• Gradient PCA [Ke and Sukthankar’04]
• SURF descriptor [Bay et al.’08]
• DAISY descriptor [Tola et al.'08, Winder et al.'09]
• …….
SIFT descriptor [Lowe'99]
• Spatial binning and binning of the gradient orientation
• 4x4 spatial grid, 8 orientations of the gradient, dim 128
• Soft-assignment to spatial bins
• Normalization of the descriptor to norm one (robust to illumination)
• Comparison with Euclidean distance
[Figure: image patch, gradient, spatial (x, y) grid of orientation histograms]
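A simplified sketch of the 4x4x8 layout described above. It is not Lowe's implementation: it assumes a fixed 16x16 patch, central-difference gradients, and hard assignment to spatial and orientation bins (no soft assignment, no Gaussian weighting, no clipping). It does keep the two properties named on the slide: normalization to unit norm and comparison by Euclidean distance.

```python
import math

def sift_like_descriptor(patch):
    """patch: 16x16 list of lists. Returns a 128-dimensional vector."""
    size, cells, bins = 16, 4, 8
    hist = [0.0] * (cells * cells * bins)
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * bins) % bins   # orientation bin
            cy, cx = y * cells // size, x * cells // size  # spatial cell
            hist[(cy * cells + cx) * bins + o] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

ramp_x = [[float(x) for x in range(16)] for y in range(16)]
brighter = [[2.0 * v + 5.0 for v in row] for row in ramp_x]   # I' = aI + b
ramp_y = [[float(y) for x in range(16)] for y in range(16)]

d_x = sift_like_descriptor(ramp_x)
d_brighter = sift_like_descriptor(brighter)
d_y = sift_like_descriptor(ramp_y)
```

Gradients cancel the intensity offset and the normalization cancels the gain, so `d_x` and `d_brighter` coincide, while the patch with a different gradient direction is far away in Euclidean distance.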
SURF: Speeded Up Robust Features
• Approximate derivatives with Haar wavelets
• Exploit integral images
Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool "SURF: Speeded Up Robust Features", Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346‐‐359, 2008
Citations: 4500 (2012)
DAISY
• Optimized for dense sampling
• Log-polar grid
• Gaussian smoothing
• Dealing with occlusions
Engin Tola, Vincent Lepetit, and Pascal Fua, DAISY: An Efficient Dense Descriptor Applied to Wide‐Baseline Stereo, TPAMI 32(5), 2010.
Citations: 150 (2012)
Fast and compact descriptors
• Binary descriptors
• Comparison of pairs of intensity values
  – LBP
  – BRIEF
  – ORB
  – BRISK
LBP: Local Binary Patterns
T. Ojala, M. Pietikäinen, and D. Harwood (1994), "Performance evaluation of texture measures with classification based on Kullback discrimination of distributions", ICPR 1994, pp. 582-585.
M. Heikkilä, M. Pietikäinen, C. Schmid, Description of interest regions with LBP, Pattern Recognition 42 (3), 425-436.
• First proposed for texture recognition in 1994.
Citations: 2500 (2012)
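A minimal sketch of the basic 3x3 LBP operator: each of the 8 neighbours is compared with the centre pixel and the results are packed into one byte. The neighbour ordering and the use of >= for the comparison are conventions assumed here; a descriptor would then be a histogram of these codes over a region.

```python
def lbp_code(img, x, y):
    """8-bit LBP code of pixel (x, y) from its 3x3 neighbourhood."""
    centre = img[y][x]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (0, -1), (1, -1), (1, 0),
               (1, 1), (0, 1), (-1, 1), (-1, 0)]
    code = 0
    for bit, (dx, dy) in enumerate(offsets):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code

img = [[5, 9, 1],
       [4, 6, 7],
       [2, 3, 8]]
# A monotonic intensity change (here 3*v + 10) leaves every comparison intact.
brighter = [[3 * v + 10 for v in row] for row in img]

code = lbp_code(img, 1, 1)
code_brighter = lbp_code(brighter, 1, 1)
```

Because only the ordering of intensities matters, the code is unchanged under any monotonically increasing intensity transformation.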
BRIEF: Binary Robust Independent Elementary Features
• Random selection of pairs of intensity values.
• Fixed sampling pattern of 128, 256 or 512 pairs.
• Hamming distance to compare descriptors (XOR).
M. Calonder, V. Lepetit, C. Strecha, P. Fua, BRIEF: Binary Robust Independent Elementary Features, 11th European Conference on Computer Vision, 2010.
Citations: 149 (2012)
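The three bullets above map onto very little code. The sketch below assumes a uniform random sampling pattern over a 16x16 patch, generated once with a fixed seed, and skips the patch pre-smoothing used in practice; the function names and patch contents are hypothetical.

```python
import random

def make_pattern(n_pairs=256, patch_size=16, seed=0):
    """Fixed sampling pattern: n_pairs random pairs of pixel positions."""
    rng = random.Random(seed)
    def pick():
        return rng.randrange(patch_size), rng.randrange(patch_size)
    return [(pick(), pick()) for _ in range(n_pairs)]

def brief(patch, pattern):
    """One bit per pair: 1 if the first pixel is darker than the second."""
    bits = 0
    for i, ((x1, y1), (x2, y2)) in enumerate(pattern):
        if patch[y1][x1] < patch[y2][x2]:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits: XOR followed by a bit count."""
    return bin(a ^ b).count("1")

pattern = make_pattern()
patch = [[(7 * x + 13 * y) % 31 for x in range(16)] for y in range(16)]
brighter = [[2 * v + 3 for v in row] for row in patch]
inverted = [[-v for v in row] for row in patch]

d = brief(patch, pattern)
d_brighter = brief(brighter, pattern)
d_inverted = brief(inverted, pattern)
```

The descriptor is stored as a single integer, so matching costs one XOR and one population count per pair of descriptors.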
Various others
• BRISK: Binary Robust Invariant Scalable Keypoints
• FREAK: Fast Retina Keypoint
• CARD: Compact and Realtime Descriptor
• LDB: Local Difference Binary
LIOP: Local Intensity Order Pattern for Feature Description
Zhenhua Wang, Bin Fan, Fuchao Wu, Local Intensity Order Pattern for Feature Description, ICCV 2011.
• Robustness to monotonic intensity changes
• Data-driven division into cells
(and predecessors MROGH and MRRID)
Linear Discriminant Projections
M. Brown, G. Hua and S. Winder, Discriminant Learning of Local Image Descriptors, PAMI 2010.
H. Cai, K. Mikolajczyk, J. Matas, Linear Discriminant Projections, PAMI 2010.
• Learn configuration and other parameters from training data obtained from 3D reconstructions
Citations: 194 (2012)
Linear Discriminant Projections
• Training data = set of corresponding image patches
Descriptor Learning Using Optimisation
• Learning of
  – spatial pooling regions
  – dimensionality reduction
• Learning from very weak supervision
[Figure: descriptor computation blocks: pre-rectified keypoint patch, non-linear transform, spatial pooling (learning), dimensionality reduction (learning), normalisation and cropping, descriptor vector]
K. Simonyan et al., Descriptor Learning Using Convex Optimisation, ECCV 2012
M. Brown, G. Hua and S. Winder, Discriminant Learning of Local Image Descriptors. PAMI 2010.
D‐BRIEF: Discriminative BRIEF
T. Trzcinski and V. Lepetit, Efficient Discriminative Projections for Compact Binary Descriptors, European Conference on Computer Vision (ECCV) 2012
• Learn linear projections that map image patches to a more discriminative subspace
• Exploit integral images
Dimensionality Reduction
• PCA or LDE can reduce dimensionality to 30% without performance decrease
• Improves clustering performance
• Especially useful with combined grayvalue and color descriptors
• Improves matching/recognition performance, e.g. Scene 15
  – SIFT 128 dim: 83.5%
  – PCA 30 dim: 82.9%
  – LDE 30 dim: 84.5%
H. Cai, K. Mikolajczyk, J Matas, Linear Discriminant Projections, PAMI 2010.
Program
• Local Feature Definitions / Properties
• Applications
• Interest point detectors
• Local Descriptors
• Evaluations
Setting up an evaluation
• Which problem? Performance in different applications/niches may vary significantly.
  – Category recognition
  – Matching
  – Retrieval
• What dataset?
  – Pascal VOC 2007
  – Oxford image pairs
  – Oxford - Paris buildings
• Protocol and criteria?
  – Public dataset
  – Avoiding the risk of over-fitting/optimizing to the data
Detector evaluations
[Figure: regions A and B in two images related by a homography]
Two points are correctly matched if
$$ \frac{A \cap B}{A \cup B} > T, \qquad T = 40\% $$
$$ \mathrm{precision} = \frac{\#\,\text{correct matches}}{\#\,\text{all matches}} $$
$$ \mathrm{recall} = \frac{\#\,\text{correct matches}}{\#\,\text{ground truth correspondences}} $$
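The overlap criterion and the two measures can be sketched with regions represented as sets of pixels. For simplicity the regions below are axis-aligned rectangles that are assumed to be already mapped into a common image frame; in the evaluations discussed here the regions are typically ellipses projected through the ground-truth homography. All names and numbers are illustrative.

```python
def rect_pixels(x0, y0, x1, y1):
    """Set of pixel coordinates covered by a rectangle (x1, y1 exclusive)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

def overlap(a, b):
    """Intersection over union of two pixel sets."""
    return len(a & b) / len(a | b)

def is_correct_match(a, b, threshold=0.4):
    return overlap(a, b) > threshold

def precision_recall(n_correct, n_all_matches, n_ground_truth):
    precision = n_correct / n_all_matches
    recall = n_correct / n_ground_truth
    return precision, recall

a = rect_pixels(0, 0, 10, 10)
b = rect_pixels(2, 0, 12, 10)     # shifted by 2 pixels: overlap 80 / 120
c = rect_pixels(8, 0, 18, 10)     # shifted by 8 pixels: overlap 20 / 180

good = is_correct_match(a, b)
bad = is_correct_match(a, c)
precision, recall = precision_recall(n_correct=30, n_all_matches=40,
                                     n_ground_truth=60)
```

Sweeping the descriptor-distance threshold that decides which matches are reported changes `n_all_matches` and `n_correct`, which traces out the precision-recall curve.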
Performance measure: Precision-Recall
$$ \mathrm{precision} = \frac{\text{correct}}{\text{correct} + \text{incorrect}} $$
$$ \mathrm{recall} = \frac{\text{correct}}{\text{ground truth}} $$
• High precision = very few incorrect images
• Low precision = all images
• High recall = all ground truth images
• Low recall = none of ground truth images
[Figure: recall vs. 1-precision curves, both axes from 0 to 1, for a good approach and a bad approach]
Matching test: Precision-recall area
$$ \mathrm{precision} = \frac{\#\,\text{correct matches}}{\#\,\text{all matches}} $$
$$ \mathrm{recall} = \frac{\#\,\text{correct matches}}{\#\,\text{ground truth correspondences}} $$
[Figure: precision-recall area plot]
Previous Evaluations
• 2D Scene – Homography
  – C. Schmid, R. Mohr, and C. Bauckhage, "Evaluation of interest point detectors," IJCV, 2000.
  – K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," CVPR, 2003.
  – T. Kadir, M. Brady, and A. Zisserman, "An affine invariant method for selecting salient regions in images," in ECCV, 2004.
  – K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool, "A comparison of affine region detectors," IJCV, 2005.
  – A. Haja, S. Abraham, and B. Jahne, Localization accuracy of region detectors, CVPR 2008.
  – T. Dickscheid, F. Schindler, W. Förstner, Coding Images with Local Features, IJCV 2011.
Previous Evaluations
• 3D Scene - epipolar constraints
  – F. Fraundorfer and H. Bischof, "Evaluation of local detectors on non-planar scenes," in AAPR, 2004.
  – P. Moreels and P. Perona, "Evaluation of features detectors and descriptors based on 3D objects," IJCV, 2007.
  – S. Winder and M. Brown, "Learning local image descriptors," CVPR, 2007, 2009.
  – A. L. Dahl, H. Aanæs and K. S. Pedersen, Finding the Best Feature Detector-Descriptor Combination, 3DIMPVT, 2011.
Recent Evaluations
• Recent detectors
O. Miksik and K. Mikolajczyk, Evaluation of Local Detectors and Descriptors for Fast Feature Matching, ICPR 2012
Recent Descriptor Evaluations
• Computation times for the different descriptors for 1000 SURF keypoints
O. Miksik and K. Mikolajczyk, Evaluation of Local Detectors and Descriptors for Fast Feature Matching, ICPR 2012
J. Heinly, E. Dunn, J-M. Frahm, Comparative Evaluation of Binary Features, ECCV 2012
Previous Evaluations
• Image/object categories
– K. Mikolajczyk, B. Leibe, and B. Schiele, “Local features for object class recognition,” in ICCV, 2005
– E. Seemann, B. Leibe, K. Mikolajczyk, and B. Schiele, “An evaluation of local shape‐based features for pedestrian detection,” in BMVC, 2005.
– M. Stark and B. Schiele, “How good are local features for classes of geometric objects,” in ICCV, 2007.
– K. E. A. van de Sande, T. Gevers and C. G. M. Snoek, Evaluation of Color Descriptors for Object and Scene Recognition. CVPR, 2008.
Approach
• Bags-of-features
  1. Interest point / region detector
  2. Descriptors
  3. K-means clustering (4000 clusters)
  4. Histogram of cluster occurrences (NN assignment)
  5. Chi-square distance and RBF kernel for KDA or SVM classifier
• J. Zhang, M. Marszalek, S. Lazebnik and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV, 2007
• K. E. A. van de Sande, T. Gevers and C. G. M. Snoek, Evaluation of Color Descriptors for Object and Scene Recognition, CVPR, 2008
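Steps 4 and 5 of the bags-of-features pipeline can be sketched as below. The codebook is given directly as a tiny hypothetical list of 2D codewords (in the pipeline above it would come from K-means with 4000 clusters on real descriptors). The factor 0.5 in the chi-square distance and the kernel parameter `gamma` are conventions assumed here, not values from the slides.

```python
import math

def assign_histogram(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword (NN) assignments."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        dists = [sum((p - q) ** 2 for p, q in zip(d, c)) for c in codebook]
        hist[dists.index(min(dists))] += 1.0
    total = sum(hist)
    return [v / total for v in hist]

def chi_square(h1, h2):
    """Chi-square distance between two histograms (empty bins skipped)."""
    return 0.5 * sum((p - q) ** 2 / (p + q)
                     for p, q in zip(h1, h2) if p + q > 0)

def chi_square_rbf(h1, h2, gamma=1.0):
    """RBF kernel on the chi-square distance."""
    return math.exp(-gamma * chi_square(h1, h2))

codebook = [(0, 0), (10, 0), (0, 10)]                 # hypothetical codewords
image_a = [(1, 1), (9, 1), (0, 9), (1, 0)]            # hypothetical descriptors
image_b = [(9, 0), (10, 1), (8, 1), (1, 9)]

hist_a = assign_histogram(image_a, codebook)          # [0.5, 0.25, 0.25]
hist_b = assign_histogram(image_b, codebook)          # [0.0, 0.75, 0.25]
distance = chi_square(hist_a, hist_b)
kernel_ab = chi_square_rbf(hist_a, hist_b)
kernel_aa = chi_square_rbf(hist_a, hist_a)
```

Evaluating `chi_square_rbf` for all pairs of training images gives the kernel matrix that is passed to the KDA or SVM classifier.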
Evaluation
• PASCAL VOC measures
  – Average precision for every object category
  – Mean average precision
[Figure: precision-recall curve per category => AP]
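A short sketch of average precision and its mean over categories. It computes the plain, non-interpolated AP from a ranked list of relevance labels; this is an assumption made for brevity, since the PASCAL VOC protocols define their own interpolated variants of AP. The rankings below are made up.

```python
def average_precision(ranked_labels):
    """ranked_labels: 1 (relevant) or 0, sorted by decreasing classifier score.
    Returns the mean of the precision values at each relevant item."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank      # precision at this recall level
    return total / hits if hits else 0.0

def mean_average_precision(rankings_per_category):
    aps = [average_precision(r) for r in rankings_per_category]
    return sum(aps) / len(aps)

ap_perfect = average_precision([1, 1, 0, 0])   # all relevant items ranked first
ap_mixed = average_precision([1, 0, 1, 0])     # (1/1 + 2/3) / 2
ap_late = average_precision([0, 1])            # 1/2
m_ap = mean_average_precision([[1, 1, 0, 0], [0, 1]])
```

AP summarises one category's precision-recall curve in a single number; MAP averages that number over all categories.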
MAP Ranking: color/gray, density, dimensionality ...
[Figure: results table with columns MAP, #dimensions, density]
• SIFT still dominates (histograms of gradient locations and orientations)
• Opponent chromatic space (normalized red-green, blue-yellow, and intensity Y)
Grayvalue descriptors
• Observations
  – Color improves
  – All based on histograms of gradient locations and orientations
  – Dimensionality not much correlated with the performance
  – Density strongly correlated (the more the better)
  – Results biased by density
  – Implementation details matter
[Figure: results table with columns MAP ranking, density, #dimensions]
Features
• Which features to use?
– Affine invariant features if large viewpoint changes are expected (>30 degrees)
– Level of invariance needed depends on number of model images
– Features need to be distinctive: risk for false matches is large
– At least a few good matches (if time for post‐processing is not an issue)
– Take into account typical image content (blobs/corners/prints/…)
• MSER, SURF, DoG, Harris/Hessian-Laplace/Affine
• www.featurespace.org, VLFeat, OpenCV (data, code)
Conclusions
• Histograms of gradient location-orientation dominate
• Color brings improvement for most classes
  – opponent chromatic space
• Feature number
  – the more the better
• Similar ranking in image matching
  – performance generalizes across applications
• Exercise: http://kahlan.eps.surrey.ac.uk/featurespace/web/Classification_Exercise.zip