![Page 1: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/1.jpg)
Real-Time Computer Vision
Microsoft Computer Vision School
Vincent Lepetit - CVLab - EPFL (Lausanne, Switzerland)
1
![Page 2: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/2.jpg)
demo
2
![Page 3: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/3.jpg)
applications
...
3
![Page 4: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/4.jpg)
• How the demo works (including Randomized Trees);
• More recent work.
4
![Page 5: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/5.jpg)
Background
• 3D world to 2D images (projection matrix, internal parameters, external parameters, homography, ...);
• Robust estimation (non-linear least-squares, RANSAC, robust estimators, ...);
• Feature point matching (affine region detectors, SIFT, ...).
5
![Page 6: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/6.jpg)
From the 3D World to a 2D Image
[Figure: a 3D point M, expressed in the world coordinate system, and its projection m in the image.]
What is the relation between the 3D coordinates of a point M and its corresponding point m in the image captured by the camera?
6
![Page 7: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/7.jpg)
Perspective Projection
[Figure: a 3D point M in the world coordinate system, the camera center C, and the projection m of M in the image.]
The image formation is modeled as a perspective projection, which is realistic for standard cameras:
The rays passing through each 3D point M and its corresponding image point m all intersect at a single point C, the camera center.
7
![Page 8: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/8.jpg)
Expressing M in the Camera Coordinate System
[Figure: the point M and its projection m, with the world coordinate system and the camera coordinate system (X, Y, Z) centered at C.]
Step 1: Express the coordinates of M in the camera coordinate system as Mcam.
This transformation corresponds to a Euclidean displacement (a rotation plus a translation):
$$M_{cam} = RM + T$$
where R is a 3x3 rotation matrix and T is a 3-vector.
8
![Page 9: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/9.jpg)
Homogeneous Coordinates
Let's replace $M$ by the homogeneous 4-vector $\tilde{M}$: just add a 1 as the fourth coordinate.
$$M = \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \rightarrow \tilde{M} = \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$
Now, the Euclidean displacement can be expressed as a linear transformation instead of an affine one:
$$M_{cam} = RM + T \;\rightarrow\; \begin{pmatrix} X_{cam} \\ Y_{cam} \\ Z_{cam} \end{pmatrix} = R \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} + T \;\rightarrow\; \begin{pmatrix} X_{cam} \\ Y_{cam} \\ Z_{cam} \end{pmatrix} = (R \mid T) \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \;\rightarrow\; M_{cam} = (R \mid T)\,\tilde{M}$$
(R | T) is a 3x4 matrix.
9
![Page 10: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/10.jpg)
Projection
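Computation of the coordinates of m in the image plane from Mcam (expressed in the camera coordinate system), simply using Thales' theorem:
[Figure: side view of the camera coordinate system, with the camera center C, the image plane at distance f, the point Mcam at depth Z with coordinate X, and its projection of coordinate mX.]
$$\frac{m_X}{f} = \frac{X}{Z} \;\rightarrow\; m_X = f\,\frac{X}{Z}$$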
10
![Page 11: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/11.jpg)
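From Projection to Image
Coordinates of m in pixels?
[Figure: the image plane at distance f from C, with the image coordinate system (u, v), the offsets u0 and v0, and one pixel of size 1/ku by 1/kv.]
$$m_X = f\,\frac{X}{Z}, \quad m_Y = f\,\frac{Y}{Z}$$
$$m_u = u_0 + k_u m_X, \quad m_v = v_0 + k_v m_Y$$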
11
![Page 12: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/12.jpg)
Putting
• the perspective projection and
• the transformation into pixel coordinates
together:
$$m_X = f\,\frac{X}{Z}, \quad m_Y = f\,\frac{Y}{Z}$$
$$m_u = u_0 + k_u m_X, \quad m_v = v_0 + k_v m_Y$$
In matrix form:
$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} k_u f & 0 & u_0 \\ 0 & k_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$
$(u, v, w)^\top$ defines m in homogeneous coordinates:
$$\begin{cases} m_u = \dfrac{u}{w} = u_0 + k_u f\,\dfrac{X}{Z} \\[2mm] m_v = \dfrac{v}{w} = v_0 + k_v f\,\dfrac{Y}{Z} \end{cases}$$
12
![Page 13: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/13.jpg)
The Full Transformation
The two transformations are chained to form the full transformation from a 3D point in the world coordinate system to its projection in the image:
$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} k_u f & 0 & u_0 \\ 0 & k_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} & R_{13} & T_1 \\ R_{21} & R_{22} & R_{23} & T_2 \\ R_{31} & R_{32} & R_{33} & T_3 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \underbrace{\begin{pmatrix} P_{11} & P_{12} & P_{13} & P_{14} \\ P_{21} & P_{22} & P_{23} & P_{24} \\ P_{31} & P_{32} & P_{33} & P_{34} \end{pmatrix}}_{\text{projection matrix}} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$
The product of the internal calibration matrix and the external calibration matrix is a 3x4 matrix called the "projection matrix".
The projection matrix is defined up to a scale factor.
13
![Page 14: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/14.jpg)
The Full Transformation
R, T, and the products kuf and kvf can be extracted from the projection matrix.
14
![Page 15: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/15.jpg)
Homography
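For 3D points lying on the plane Z = 0, the projection reduces to a homography:
$$m' = P\tilde{M} = [P_1\,P_2\,P_3\,P_4] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = [P_1\,P_2\,P_3\,P_4] \begin{pmatrix} X \\ Y \\ 0 \\ 1 \end{pmatrix} = [P_1\,P_2\,P_4] \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix} = H_{3\times 3}\, m$$
where $P_1, \ldots, P_4$ are the columns of P and $m = (X, Y, 1)^\top$.
[Figure: a point M / m on the plane and its image m′, related by H3×3.]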
15
![Page 16: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/16.jpg)
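Computing a Projection Matrix or a Homography from Point Correspondences by solving a linear system
For a correspondence $m' = Hm$ (up to a scale factor), with $m = [u, v, 1]^\top$ and $m' = [u', v', 1]^\top$:
$$\begin{pmatrix} u & v & 1 & 0 & 0 & 0 & -uu' & -vu' & -u' \\ 0 & 0 & 0 & u & v & 1 & -uv' & -vv' & -v' \end{pmatrix} \begin{pmatrix} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \\ H_{33} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
The minus signs follow from $u' = (H_{11}u + H_{12}v + H_{13}) / (H_{31}u + H_{32}v + H_{33})$, and similarly for $v'$, after multiplying by the denominator.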
16
![Page 17: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/17.jpg)
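Computing a Projection Matrix or a Homography from Point Correspondences with a non-linear optimization
• Non-linear least-squares minimization: minimization of a physical, meaningful error (the reprojection error, in pixels).
For a projection matrix, with 3D points $M_i$ and their measured projections $m'_i$:
$$\min_{R,T} \sum_i \text{dist}^2\left(P_{R,T}\,M_i,\; m'_i\right)$$
For a homography, with points $m_i$ and their measured correspondents $m'_i$:
$$\min_{R,T} \sum_i \text{dist}^2\left(H_{R,T}\,m_i,\; m'_i\right)$$
• Minimization algorithms: Gauss-Newton or Levenberg-Marquardt (very efficient).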
17
![Page 18: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/18.jpg)
A Look at the Reprojection Error
reprojection error
1D camera under 2D translation
100 "3D points" taken at random in [400;1000]x[-500;+500]
True camera position at (0, 0)
18
![Page 19: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/19.jpg)
Gaussian Noise on the Projections
White cross: true camera position;
Black cross: global minimum of the objective function.
In that case, the global minimum of the objective function is close to the true camera pose.
19
![Page 20: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/20.jpg)
What if there are Outliers ?
[Figure: camera center C, 3D points M1 to M4 and their projections m1 to m4; one of the projections is an incorrect measure (outlier).]
20
![Page 21: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/21.jpg)
Gaussian Noise on the Projections + 20% outliers
White cross: true camera position;
Black cross: global minimum of the objective function.
The global minimum is now far from the true camera pose.
21
![Page 22: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/22.jpg)
What Happened ?
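The errors on the 2D point locations mi (the measured projections, written $m'_i$ in the formula) are assumed to be independent and to follow a Gaussian (Normal) distribution with identical covariance matrices σI;
This assumption is violated when mi is an outlier.
[Figure: camera center C, 3D points M1 to M4 and their projections m1 to m4.]
Bayesian interpretation:
$$\arg\min_{R,T} \sum_i \text{dist}^2\left(P_{R,T}\,M_i,\; m'_i\right) = \arg\max_{R,T} \prod_i N\left(m'_i;\; P_{R,T}\,M_i,\; \sigma I\right)$$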
22
![Page 23: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/23.jpg)
Robust estimation
Idea: Replace the Normal distribution by a more suitable distribution, or equivalently replace the least-squares estimator by a "robust estimator" or "M-estimator":
$$\arg\min_{R,T} \sum_i \text{dist}^2\left(P_{R,T}\,M_i,\; m'_i\right) \;\rightarrow\; \arg\min_{R,T} \sum_i \rho\left(\text{dist}\left(P_{R,T}\,M_i,\; m'_i\right)\right)$$
23
![Page 24: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/24.jpg)
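Example of an M-estimator: The Tukey Estimator
[Plot: ρ(x) compared with x².]
The Tukey estimator assumes the measures follow a distribution that is a mixture of:
• a Normal distribution, for the inliers,
• a uniform distribution, for the outliers.
$$\rho(x) = \begin{cases} \dfrac{c^2}{6}\left(1 - \left(1 - \left(\dfrac{x}{c}\right)^2\right)^3\right) & \text{if } |x| \le c \\[2mm] \dfrac{c^2}{6} & \text{if } |x| > c \end{cases}$$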
24
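The Tukey function is short enough to write out directly. The sketch below is a plain transcription of the formula on this slide; the function name `tukey_rho` is a hypothetical choice, and how the threshold c is set (typically from an estimate of the noise level) is outside its scope.

```cpp
#include <cmath>

// Tukey robust estimator rho(x) with threshold c (c > 0 is assumed).
double tukey_rho(double x, double c) {
    const double limit = c * c / 6.0;
    if (std::fabs(x) > c) return limit;        // outliers: constant cost
    const double t = 1.0 - (x / c) * (x / c);  // 1 - (x/c)^2
    return limit * (1.0 - t * t * t);          // inliers: smooth, bounded cost
}
```

For small residuals the function behaves like x²/2, so inliers are treated as in least-squares up to a constant factor, while any residual larger than c contributes the same constant c²/6 and has zero gradient. That zero gradient is what makes the objective flat where all correspondences are considered outliers.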
![Page 25: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/25.jpg)
[Figure: Normal distribution (inliers) + Uniform distribution (outliers) = Mixture. Taking -log(.) of the Normal distribution gives the least-squares cost; taking -log(.) of the mixture gives the Tukey estimator.]
25
![Page 26: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/26.jpg)
Gaussian Noise on the Projections + 20% outliers + Tukey estimator
White cross: true camera position;
Black cross: global minimum of the objective function.
The global minimum is very close to the true camera pose.
BUT:
- local minima;
- the objective function is flat where all the correspondences are considered outliers.
26
![Page 27: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/27.jpg)
Gaussian Noise on the Projections + 50% outliers + Tukey estimator
Even more local minima.
Numerical optimization can get trapped in a local minimum.
27
![Page 28: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/28.jpg)
RANSAC
28
![Page 29: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/29.jpg)
How to Optimize ?
Idea: sampling the space of solutions (the camera pose space here):
29
![Page 30: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/30.jpg)
How to Optimize ?
Idea: sampling the space of solutions:
+ Numerical Optimization from the best sampled pose.
Problem: Exhaustive regular sampling is too expensive in 6 dimensions.
Can we do a smarter sampling?
30
![Page 31: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/31.jpg)
RANSAC
RANSAC: RANdom SAmple Consensus
Line fitting: the "throwing out the worst residual" heuristic can fail (example from the original paper [Fischler81]):
[Figure: line fitting with one outlier, showing the ideal line and the final least-squares solution.]
31
![Page 32: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/32.jpg)
RANSAC
As before, we could do a regular sampling, but it would not be optimal:
Ideal line
32
![Page 33: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/33.jpg)
Idea:
Generate hypotheses from subsets of the measurements.
If a subset contains no gross errors, the estimated parameters (the hypothesis) are close to the true ones.
Take several subsets at random, retain the best one.
Ideal line
33
![Page 34: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/34.jpg)
The quality of a hypothesis is evaluated by the number of measures that lie "close enough" to the predicted line.
We need to choose a threshold (T) to decide if a measure is "close enough".
RANSAC returns the best hypothesis, i.e. the hypothesis p with the largest number of inliers:
$$\sum_i \begin{cases} 1 & \text{if } \text{dist}(m_i, \text{line}(p)) \le T \\ 0 & \text{if } \text{dist}(m_i, \text{line}(p)) > T \end{cases}$$
34
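The line-fitting version of RANSAC described on these slides can be sketched as follows. This is a minimal illustration under assumptions, not the speaker's implementation: the names `Point`, `LineHypothesis`, `count_inliers`, and `ransac_line` are hypothetical, the number of iterations is fixed by the caller rather than adapted to the inlier ratio, and the final refinement step is omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Point { double x, y; };

// Line a*x + b*y + c = 0 with (a, b) of unit norm, plus its inlier count.
struct LineHypothesis { double a, b, c; int inliers; };

// Number of points whose distance to the line is at most the threshold T.
int count_inliers(const std::vector<Point>& pts, double a, double b, double c, double T) {
    int n = 0;
    for (const Point& p : pts)
        if (std::fabs(a * p.x + b * p.y + c) <= T) ++n;
    return n;
}

// Assumes pts contains at least two distinct points.
LineHypothesis ransac_line(const std::vector<Point>& pts, double T,
                           int iterations, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, pts.size() - 1);
    LineHypothesis best{0.0, 0.0, 0.0, -1};
    for (int it = 0; it < iterations; ++it) {
        // Minimal subset for a line: two points drawn at random.
        const Point& p = pts[pick(rng)];
        const Point& q = pts[pick(rng)];
        const double dx = q.x - p.x, dy = q.y - p.y;
        const double norm = std::hypot(dx, dy);
        if (norm == 0.0) continue;  // same point drawn twice: no line
        // Hypothesis: the line through p and q, with unit normal (a, b).
        const double a = -dy / norm, b = dx / norm;
        const double c = -(a * p.x + b * p.y);
        // Score the hypothesis by its number of inliers; keep the best.
        const int n = count_inliers(pts, a, b, c, T);
        if (n > best.inliers) best = LineHypothesis{a, b, c, n};
    }
    return best;
}
```

Because each hypothesis comes from only two measurements, a hypothesis built from two inliers is already close to the ideal line, and it is the inlier count over all measurements that lets it win against hypotheses contaminated by an outlier.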
![Page 35: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/35.jpg)
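RANSAC for Homographies
To apply RANSAC to homography estimation, we need a way to compute a homography from a subset of measurements. Each correspondence between $m = [u, v, 1]^\top$ and $m' = [u', v', 1]^\top$ gives two linear equations in the entries of H:
$$\begin{pmatrix} u & v & 1 & 0 & 0 & 0 & -uu' & -vu' & -u' \\ 0 & 0 & 0 & u & v & 1 & -uv' & -vv' & -v' \end{pmatrix} \begin{pmatrix} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \\ H_{33} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
Since RANSAC only provides a solution estimated from a limited number of data, it must be followed by a robust minimization to refine the solution.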
35
![Page 36: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/36.jpg)
How to Get the Correspondences ?
• Extract Feature Points / Keypoints / Regions (Harris corner detector, extrema of Laplacian, affine region detectors, ...);
• Standard approach: match them based on Euclidean distances between descriptors such as SIFT, SURF, ...
[Figure: a keypoint m in one image and its correspondent m′ in the other image.]
36
![Page 37: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/37.jpg)
Affine Region Detectors
Hessian-Affine detector MSER detector
37
![Page 38: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/38.jpg)
Affine Normalization
Warp by $M_1^{1/2}$    Warp by $M_2^{1/2}$
We still have to correct for the orientation!
38
![Page 39: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/39.jpg)
Select Canonical Orientation
• Create histogram of local gradient directions computed over the image patch;
• Each gradient contributes for its norm, weighted by its distance to patch center;
• Assign canonical orientation at peak of smoothed histogram.
[Plot: histogram of gradient directions over the range 0 to 2π.]
39
![Page 40: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/40.jpg)
Select Canonical Orientation
40
![Page 41: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/41.jpg)
Description Vector
?
...
41
![Page 42: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/42.jpg)
SIFT Description Vector
Made of local histograms of gradients:
In practice: 8 orientations x 4 x 4 histograms = 128-dimensional vector.
Normalised to be robust to light changes.
...
42
![Page 43: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/43.jpg)
Matching Regions
43
![Page 44: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/44.jpg)
Matching: Approximate Nearest Neighbour
Best-Bin-First: Approximate nearest-neighbour search in k-d tree
44
![Page 45: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/45.jpg)
Keypoint Matching
The standard approach is a particular case of classification:
• Pre-processing: make the actual classification easier;
• Search in the database: nearest neighbor classification.
Idea: let’s try another classification method!
45
![Page 46: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/46.jpg)
One Class per Keypoint
One class per keypoint: the set of the keypoint’s possible appearances under various perspective, lighting, noise...
class 1
class 2
46
![Page 47: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/47.jpg)
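Training phase
[Figure: example patches of class 1, class 2, ... are used to train the classifier.]
Run-Time
[Figure: the classifier assigns an input patch to a class (here, class 1).]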
47
![Page 48: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/48.jpg)
Which Classifier?
We want a classifier that:
• can handle many classes;
• is very fast;
• has reasonable recognition performance (a very high recognition rate is not a necessary requirement).
48
![Page 49: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/49.jpg)
Which Classifier?
• Randomized Trees [Amit & Geman, 1997];
• Random forests [Breiman, 2001].
49
![Page 50: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/50.jpg)
An (Ideal) Single Tree
binary test
binary test
binary test
class #
50
![Page 51: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/51.jpg)
How to Build the Tree ?
binary test ?
training set
51
![Page 52: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/52.jpg)
binary test ?
training set
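found by minimizing the entropy after the test, which splits the training set S into Sleft and Sright:
$$\arg\min_{\text{test}} \; \frac{|S_{left}|}{|S|}\,\text{Entropy}(S_{left}) + \frac{|S_{right}|}{|S|}\,\text{Entropy}(S_{right})$$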
52
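The criterion on this slide can be evaluated from per-class sample counts alone. The sketch below is an illustration under assumptions: the function names `entropy` and `split_score` are hypothetical, each subset is summarized by a vector holding the number of samples of each class, and the entropy is the Shannon entropy in bits.

```cpp
#include <cmath>
#include <vector>

// Shannon entropy (in bits) of a set described by its per-class sample counts.
double entropy(const std::vector<int>& class_counts) {
    int total = 0;
    for (int n : class_counts) total += n;
    double h = 0.0;
    for (int n : class_counts) {
        if (n == 0) continue;  // 0 * log(0) is taken as 0
        const double p = static_cast<double>(n) / total;
        h -= p * std::log2(p);
    }
    return h;
}

// Score of a candidate binary test: weighted entropy of the two subsets it
// creates. Lower is better. Assumes the two subsets are not both empty.
double split_score(const std::vector<int>& left, const std::vector<int>& right) {
    int n_left = 0, n_right = 0;
    for (int n : left) n_left += n;
    for (int n : right) n_right += n;
    const double n = n_left + n_right;
    return n_left / n * entropy(left) + n_right / n * entropy(right);
}
```

For two classes with 4 samples each, a test that sends each class to its own side scores 0, while a test that splits both classes evenly scores 1 bit, so the first test would be retained.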
![Page 53: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/53.jpg)
binary test
training set
S
Problem: runs quickly out of training samples for the deeper tests
53
![Page 54: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/54.jpg)
Idea: Use Several Sub-Optimal Trees
Each tree is trained with a random subset of the training set.
54
![Page 55: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/55.jpg)
Idea: Use Several Sub-Optimal Trees
The leaves contain the probabilities over the classes, computed from the training set.
55
![Page 56: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/56.jpg)
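Classification with Several Sub-Optimal Trees
The test sample is dropped into each tree, and the probabilities in the leaves it reached are averaged:
[Figure: 1/3 x (leaf distribution from tree 1 + leaf distribution from tree 2 + leaf distribution from tree 3) = averaged distribution.]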
56
![Page 57: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/57.jpg)
Visual Interpretation
Each tree partitions the space in a different way and computes the probability of each class for each cell of the partition:
57
![Page 58: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/58.jpg)
Visual Interpretation
Combining the trees gives a fine partition with a better estimate of the class probabilities:
58
![Page 59: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/59.jpg)
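For Patches
Possible tests: compare the intensities of two pixels around the keypoint after Gaussian smoothing:
$$f_i = \begin{cases} 1 & \text{if } I(m + dm_{i,1}) \le I(m + dm_{i,2}) \\ 0 & \text{otherwise} \end{cases}$$
where I is the image after Gaussian smoothing, and $m + dm_{i,1}$ and $m + dm_{i,2}$ are two pixel locations around the keypoint m.
• Very efficient to compute;
• Invariant to light changes by any increasing function of the intensities.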
59
![Page 60: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/60.jpg)
Results
60
![Page 61: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/61.jpg)
Randomized Trees (and Random Ferns) applied to image patches are becoming a powerful tool for Computer Vision.
61
![Page 62: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/62.jpg)
[Shotton et al, CVPR’11]
Used to infer body parts in the Kinect body tracking system.
The tests rely on the depth map.
62
![Page 63: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/63.jpg)
Tests in [Shotton et al, CVPR’11]
Classes are the body parts. The goal is to label each pixel with the label of the part it belongs to.
Tests compare the depth of two pixels around the considered pixel m.
The displacements are normalized by the depth of the considered pixel for invariance:
$$f_i(m) = \begin{cases} 1 & \text{if } \text{depth}\left(m + \dfrac{dm_1}{\text{depth}(m)}\right) \le \text{depth}\left(m + \dfrac{dm_2}{\text{depth}(m)}\right) \\ 0 & \text{otherwise} \end{cases}$$
63
![Page 64: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/64.jpg)
3D Pose Estimation
Mean-Shift is used to find the joint locations from the body parts.
64
![Page 65: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/65.jpg)
Training
“Training 3 trees to depth 20 from 1 million images takes about 1 day on a 1000 core cluster” [Shotton et al, CVPR’11]
Most of the training data is synthetic:
65
![Page 66: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/66.jpg)
A Subtree
Average of the patches that reach this node
66
![Page 67: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/67.jpg)
[Gall and Lempitsky, CVPR’09; Barinova et al, CVPR’10]
Hough Forest for Object Detection:
• Random Forests used to make each patch vote for the object centroid;
• The tests compare the output of filters and histograms-of-gradient between 2 pixels;
• The leaves contain the displacement toward the object center.
[Figure: each patch votes for the object centroid; votes from the 3 patches; accumulated votes from all patches; final detection.]
67
![Page 68: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/68.jpg)
Tests used in [Gall and Lempitsky, CVPR’09]
Channels: the 3 color channels, absolute values of the first and second derivatives of the image, and 9 channels from HoG (Histograms-of-Gradients).
$$f_i(m) = \begin{cases} 1 & \text{if } \text{channel}_i(m + dm_1) < \text{channel}_i(m + dm_2) + \tau \\ 0 & \text{otherwise} \end{cases}$$
68
![Page 69: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/69.jpg)
[Bosch et al, ICCV’07]
Image Classification using Random Forests and Ferns
Use a sliding window to detect objects.
Much faster than SVMs, with similar recognition performance.
69
![Page 70: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/70.jpg)
[Bosch et al, ICCV’07]
Tests:
$$f_i(m) = \begin{cases} 1 & \text{if } n^\top x_m + b \le 0 \\ 0 & \text{otherwise} \end{cases}$$
n and b: random vector and scalar.
$x_m$: vector computed from a Pyramidal Histogram-of-Gradients.
70
![Page 71: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/71.jpg)
[Kalal et al, CVPR’10]
TLD (aka Predator), for Track, Learn, Detect:
• Random Ferns used to speed up detection;
• Trained online: the distributions in the leaves are updated online, using the incoming images.
71
![Page 72: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/72.jpg)
[Kalal et al, CVPR’10]
• Tests: 2-bit binary patterns;
• Trained online: the distributions in the leaves are updated online, using the incoming images.
72
![Page 73: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/73.jpg)
Random Ferns: A Simplified Tree-Like Classifier
73
![Page 74: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/74.jpg)
For Keypoint Recognition, We Can Use Random Tests!
Number of trees
Recognition rate
Comparison of the recognition rates for 200 keypoints:
tests selected by minimizing entropy
tests with random locations
74
![Page 75: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/75.jpg)
We can use random tests
• For a small number of classes
– we can try several tests, and
– retain the best one according to some criterion.
75
![Page 76: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/76.jpg)
We can use random tests
• For a small number of classes
– we can try several tests, and
– retain the best one according to some criterion.
• When the number of classes is large
– any test does a decent job:
76
![Page 77: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/77.jpg)
Why it is Interesting
• Building the trees takes no time (we still have to estimate the posterior probabilities);
• Allows incremental learning;
• Simplifies the classifier structure.
77
![Page 78: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/78.jpg)
The Tree Structure is not Needed
78
![Page 79: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/79.jpg)
The Tree Structure is not Needed
f1
f2
f3
79
![Page 80: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/80.jpg)
The Tree Structure is not Needed
f1
f2
f3
Results of pixel comparisons (0 or 1) Class Label
The distributions can be expressed simply, as:
80
![Page 81: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/81.jpg)
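We are looking for
$$\arg\max_i P(C = c_i \mid \text{patch})$$
If patch can be represented by a set of image features { fi }:
$$P(C = c_i \mid \text{patch}) = P(C = c_i \mid f_1, f_2, \ldots, f_n, f_{n+1}, \ldots, f_N)$$
which is proportional to (assuming a uniform prior over the classes)
$$P(f_1, f_2, \ldots, f_N \mid C = c_i)$$
but a complete representation of the joint distribution is infeasible.
Naive Bayesian ignores the correlation:
$$P(f_1, f_2, \ldots, f_N \mid C = c_i) \approx \prod_j P(f_j \mid C = c_i)$$
Compromise: group the features into small sets of n features, and model the joint distribution within each group only:
$$P(f_1, f_2, \ldots, f_N \mid C = c_i) \approx P(f_1, \ldots, f_n \mid C = c_i) \times P(f_{n+1}, \ldots, f_{2n} \mid C = c_i) \times \cdots$$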
81
![Page 82: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/82.jpg)
Training
82
![Page 83: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/83.jpg)
Training
83
![Page 84: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/84.jpg)
Training
[Figure: binary test results (0/1) for a training patch and the resulting leaf index.]
84
![Page 85: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/85.jpg)
Training
[Figure: binary test results (0/1) for two training patches and the resulting leaf indices.]
85
![Page 86: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/86.jpg)
Training
[Figure: binary test results (0/1) for three training patches and the resulting leaf indices.]
86
![Page 87: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/87.jpg)
Training
87
![Page 88: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/88.jpg)
Training
88
![Page 89: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/89.jpg)
Training Results
Normalize:
$$\sum_{(f_1, \ldots, f_n) \,\in\, \{000,\, 001,\, \ldots,\, 111\}} P(f_1, f_2, \ldots, f_n \mid C = c_i) = 1$$
89
![Page 90: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/90.jpg)
Training Results
Normalize:
$$\sum_{(f_1, \ldots, f_n) \,\in\, \{000,\, 001,\, \ldots,\, 111\}} P(f_1, f_2, \ldots, f_n \mid C = c_i) = 1$$
90
![Page 91: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/91.jpg)
Recognition
91
![Page 92: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/92.jpg)
Normalization
Normalize:
$$\sum_{(f_1, \ldots, f_n) \,\in\, \{000,\, 001,\, \ldots,\, 111\}} P(f_1, f_2, \ldots, f_n \mid C = c_i) = 1$$
92
![Page 93: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/93.jpg)
Subtlety with Normalization
$$p_{\text{leaf, class}} = \frac{\text{Number of samples(leaf, class)}}{\text{Number of samples(class)}}$$
is too selective: Number of samples(leaf, class) can be 0 simply because the training set is finite.
We use:
$$p_{\text{leaf, class}} = \frac{\text{Number of samples(leaf, class)} + N_{regularization}}{\text{Number of samples(class)} + \text{Number of leaves} \times N_{regularization}}$$
This can be done by simply initializing the counters to Nregularization instead of 0.
93
![Page 94: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/94.jpg)
Influence of Nregularization
$$p_{\text{leaf, class}} = \frac{\text{Number of samples(leaf, class)} + N_{regularization}}{\text{Number of samples(class)} + \text{Number of leaves} \times N_{regularization}}$$
[Plot: recognition rate as a function of Nregularization (log scale).]
94
![Page 95: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/95.jpg)
Implementation of Feature Point Recognition with Ferns
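    // Assumed declarations (not shown on the slide): H classes, M ferns,
    // S tests per fern; K points to the patch center; D holds the pixel
    // offset pairs; PF holds the per-leaf class scores (presumably
    // log-probabilities, since they are summed); P receives the H scores.
    for (int i = 0; i < H; i++) P[i] = 0.;
    for (int k = 0; k < M; k++) {
        int index = 0, *d = D + k * 2 * S;
        for (int j = 0; j < S; j++) {
            index <<= 1;                      // append one bit per test
            if (*(K + d[0]) < *(K + d[1]))    // compare two pixel intensities
                index++;
            d += 2;
        }
        // Leaf reached in fern k (element type float is an assumption).
        const float *p = PF + k * shift2 + index * shift1;
        for (int i = 0; i < H; i++) P[i] += p[i];
    }
• Very simple to implement;
• No need for orientation, perspective, light correction.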
95
![Page 96: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/96.jpg)
Ferns versus SIFT
[Scatter plot: number of inliers for Ferns versus number of inliers for SIFT; each point corresponds to an image from a 1000-frame sequence.]
Ferns are much faster, sometimes more accurate, but SIFT does not need training.
96
![Page 97: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/97.jpg)
Randomized Trees vs Ferns
Ferns are more discriminant but more sensitive to outliers.
Different combination strategies: average (RT) / product (Ferns)
[Plot: recognition rate versus number of structures, with four curves: Ferns with product, RT (with random tests) with product, Ferns with average, RT (with random tests) with average.]
97
![Page 98: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/98.jpg)
Randomized Trees vs Ferns
Influence of the number of classes:
[Plot: recognition rate, with curves for Ferns with product and Ferns with average.]
98
![Page 99: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/99.jpg)
Memory and Computation Time
• Recognition time grows linearly with the number of Trees/Ferns and the number of classes.
• Recognition time grows linearly with the depth of Trees/Ferns (that is, with the logarithm of the number of leaves).
• Memory grows linearly with the number of Trees/Ferns and the number of classes.
• Memory grows exponentially with the depth of Trees/Ferns.
• Increasing the depth may result in overfitting.
• Increasing the number of Trees/Ferns (usually) improves recognition.
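As a rough worked example of the memory bullets above, assume that each leaf stores one 4-byte value per class, and take purely illustrative values of M = 30 Ferns, depth S = 10 and H = 200 classes (none of these numbers come from the slides):

$$\begin{aligned}
\text{memory} &= M \cdot 2^{S} \cdot H \cdot 4 \text{ bytes} \\
&= 30 \cdot 1024 \cdot 200 \cdot 4 \text{ bytes} \\
&= 24\,576\,000 \text{ bytes} \approx 24.6 \text{ MB}
\end{aligned}$$

Under these assumptions, one extra level of depth doubles the figure, whereas one extra Fern only adds 1/30 of it.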
99
![Page 100: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/100.jpg)
Influence of the Number of Ferns
[Plot: recognition rate versus number of structures, with four curves: Ferns with product, RT (with random tests) with product, Ferns with average, RT (with random tests) with average.]
Increasing the number of Ferns/Trees improves the recognition rate, but increases the computation time and memory.
100
![Page 101: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/101.jpg)
Number of Ferns / Number of Leaves / Memory / Computation Time
[Plots: recognition rate as a function of Fern size and number of Ferns; computation time as a function of Fern size.]
101
![Page 102: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/102.jpg)
Conclusions on Randomized Trees and Ferns
• Simple to implement, Ferns even simpler;
• Both very fast, but dumb: need a lot of training examples to learn.
• Use a lot of memory to store the posterior distributions in the leaves.
102
![Page 103: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/103.jpg)
We now have correspondences between a reference image of the object and the input image:
Some correspondences are correct, some are not.
We can estimate the homography between the 2 images by applying RANSAC on subsets of 4 correspondences.
103
![Page 104: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/104.jpg)
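Computing a Homography from Point Correspondences by solving a linear system
[Figure: corresponding points m and m'.]
$$\tilde{m}' = H \tilde{m}, \qquad \tilde{m} = [u, v, 1]^\top, \qquad \tilde{m}' = [k u', k v', k]^\top$$
$$\begin{bmatrix} u & v & 1 & 0 & 0 & 0 & -u u' & -v u' & -u' \\ 0 & 0 & 0 & u & v & 1 & -u v' & -v v' & -v' \end{bmatrix} \begin{bmatrix} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \\ H_{33} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$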
104
![Page 105: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/105.jpg)
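Computing a Homography from Point Correspondences by solving a linear system
[Figure: corresponding points m and m'.]
$$\tilde{m}' = H \tilde{m}, \qquad \tilde{m} = [u, v, 1]^\top, \qquad \tilde{m}' = [k u', k v', k]^\top$$
$$u' = \frac{H_{11} u + H_{12} v + H_{13}}{H_{31} u + H_{32} v + H_{33}}, \qquad v' = \frac{H_{21} u + H_{22} v + H_{23}}{H_{31} u + H_{32} v + H_{33}}$$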
105
![Page 106: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/106.jpg)
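Computing a Homography from Point Correspondences by solving a linear system
$$u' = \frac{H_{11} u + H_{12} v + H_{13}}{H_{31} u + H_{32} v + H_{33}}, \qquad v' = \frac{H_{21} u + H_{22} v + H_{23}}{H_{31} u + H_{32} v + H_{33}}$$
Multiplying each equation by its denominator gives two linear equations in the entries of H:
$$\begin{bmatrix} u & v & 1 & 0 & 0 & 0 & -u u' & -v u' & -u' \\ 0 & 0 & 0 & u & v & 1 & -u v' & -v v' & -v' \end{bmatrix} \begin{bmatrix} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \\ H_{33} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
Using four correspondences:
$$B X = 0_8$$
with $X = [H_{11}, H_{12}, H_{13}, H_{21}, H_{22}, H_{23}, H_{31}, H_{32}, H_{33}]^\top$, where B is the 8 × 9 matrix obtained by stacking the two rows of each correspondence.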
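A minimal sketch of how this system could be assembled, two rows per correspondence. The type and function names are hypothetical, and the rows carry the minus signs that follow from multiplying out the two fractions for u' and v'. Solving for X is left to a linear-algebra library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Row9 = std::array<double, 9>;

// One correspondence (u, v) -> (u', v'); field names are illustrative.
struct Correspondence { double u, v, up, vp; };

// Builds the 2n x 9 matrix B, with X = [H11, H12, ..., H33] as the unknown.
std::vector<Row9> build_dlt_matrix(const std::vector<Correspondence>& c) {
    assert(c.size() >= 4);  // four correspondences give the 8 x 9 system
    std::vector<Row9> B;
    B.reserve(2 * c.size());
    for (const Correspondence& m : c) {
        // Row from the u' equation, then row from the v' equation.
        B.push_back(Row9{m.u, m.v, 1.0, 0.0, 0.0, 0.0,
                         -m.u * m.up, -m.v * m.up, -m.up});
        B.push_back(Row9{0.0, 0.0, 0.0, m.u, m.v, 1.0,
                         -m.u * m.vp, -m.v * m.vp, -m.vp});
    }
    return B;
}

// Dot product of one row of B with a candidate X; 0 for the true homography.
double row_times_x(const Row9& row, const Row9& x) {
    double s = 0.0;
    for (std::size_t i = 0; i < 9; ++i) s += row[i] * x[i];
    return s;
}
```

For a pure translation by (2, 3), X = [1, 0, 2, 0, 1, 3, 0, 0, 1] makes every row of B vanish, which is a quick way to check the signs.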
106
![Page 107: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/107.jpg)
How to Solve this Linear System ?
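• X is a null vector of B, that is, an eigenvector of $B^\top B$ with eigenvalue 0.
• In practice: the eigenvector of $B^\top B$ corresponding to the smallest eigenvalue.
$$B X = 0_8$$
with $X = [H_{11}, H_{12}, H_{13}, H_{21}, H_{22}, H_{23}, H_{31}, H_{32}, H_{33}]^\top$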
107
![Page 108: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/108.jpg)
Computing a Homography from Point Correspondences with a non-linear optimization
• Non-linear least-squares minimization: minimization of a physically meaningful error (the reprojection error, in pixels):
$$\min_{R,T} \sum_i \mathrm{dist}^2\left(H_{R,T}\, m_i,\; m'_i\right)$$
• Minimization algorithms: Gauss-Newton or Levenberg-Marquardt (very efficient).
[Figure: a point m, its correspondent m', and the reprojection $H_{R,T}\, m$.]
108
![Page 109: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/109.jpg)
Numerical Optimization
Start from an initial guess p0:
p0 can be taken randomly but should be as close as possible to the global minimum:
- pose computed at time t-1;
- pose predicted from the pose computed at time t-1 and a motion model;
- ...
[Figure: successive estimates p0, p1, p2.]
109
![Page 110: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/110.jpg)
Numerical Optimization
General methods:
• Gradient descent / Steepest Descent;
• Conjugate Gradient;
• ...
Non-linear Least-squares optimization:
• Gauss-Newton;
• Levenberg-Marquardt;
• ...
110
![Page 111: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/111.jpg)
Numerical Optimization
We want to find p that minimizes:
$$E(p) = \sum_i \mathrm{dist}^2\left(H_{R(p),T(p)}\, m_i,\; m'_i\right) = \|f(p) - b\|^2$$
where
• p is a vector of parameters that define the camera pose (translation vector + parameters of the rotation matrix);
• b is a vector made of the measurements (here the $m'_i$);
• f is the function that relates the camera pose to these measurements.
$$f(p) = \begin{bmatrix} u(H_{R(p),T(p)}\, m_1) \\ v(H_{R(p),T(p)}\, m_1) \\ \vdots \end{bmatrix}, \qquad b = \begin{bmatrix} u(m'_1) \\ v(m'_1) \\ \vdots \end{bmatrix}$$
111
![Page 112: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/112.jpg)
Gradient descent / Steepest Descent
$$p_{i+1} = p_i - \lambda \nabla E(p_i)$$
$$E(p_i) = \|f(p_i) - b\|^2 = \big(f(p_i) - b\big)^T \big(f(p_i) - b\big) \;\Rightarrow\; \nabla E(p_i) = 2 J^T \big(f(p_i) - b\big)$$
with J the Jacobian matrix of f, computed at $p_i$.
Weaknesses:
- How to choose λ?
- Needs a lot of iterations in long and narrow valleys.
112
![Page 113: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/113.jpg)
The Gauss-Newton and the Levenberg-Marquardt algorithms
But first, the Linear Least-Squares Case:
$$E(p) = \|f(p) - b\|^2$$
If the function f is linear, i.e. f(p) = Ap, p can be estimated as:
$$p = A^{+} b$$
where $A^{+}$ is the pseudo-inverse of A: $A^{+} = (A^T A)^{-1} A^T$
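For instance, fitting a line b ≈ p0 + p1·x is a linear least-squares problem in which the rows of A are [1, x_i]. A minimal sketch (the function name `fit_line` is hypothetical) that solves the 2×2 normal equations $(A^T A)\,p = A^T b$ directly rather than forming the pseudo-inverse explicitly:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Fits b ~ p[0] + p[1] * x by linear least squares.
// A has rows [1, x_i], so A^T A is 2x2 and can be inverted in closed form.
std::array<double, 2> fit_line(const std::vector<double>& x,
                               const std::vector<double>& b) {
    assert(x.size() == b.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sxx = 0.0, sb = 0.0, sxb = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sxx += x[i] * x[i];
        sb += b[i];
        sxb += x[i] * b[i];
    }
    // A^T A = [[n, sx], [sx, sxx]] and A^T b = [sb, sxb].
    const double det = n * sxx - sx * sx;
    assert(det != 0.0);  // the x_i must not all be equal
    return {(sxx * sb - sx * sxb) / det, (n * sxb - sx * sb) / det};
}
```

Called on x = {0, 1, 2, 3} and b = {1, 3, 5, 7}, it recovers the intercept 1 and the slope 2.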
113
![Page 114: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/114.jpg)
Non-Linear Least-Squares: The Gauss-Newton algorithm
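Iteration steps:
$$p_{i+1} = p_i + \Delta_i$$
$\Delta_i$ is chosen to minimize the residual $\|f(p_{i+1}) - b\|^2$. It is computed by approximating f to the first order:
$$\begin{aligned}
\Delta_i &= \arg\min_{\Delta} \|f(p_i + \Delta) - b\|^2 \\
&= \arg\min_{\Delta} \|f(p_i) + J\Delta - b\|^2 && \text{first-order approximation: } f(p_i + \Delta) \approx f(p_i) + J\Delta \\
&= \arg\min_{\Delta} \|\varepsilon_i + J\Delta\|^2 && \varepsilon_i = f(p_i) - b \text{ denotes the residual at iteration } i
\end{aligned}$$
$\Delta_i$ is the solution of the system $J\Delta = -\varepsilon_i$ in the least-squares sense:
$$\Delta_i = -J^{+} \varepsilon_i$$
where $J^{+}$ is the pseudo-inverse of J.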
114
![Page 115: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/115.jpg)
Non-Linear Least-Squares: The Levenberg-Marquardt Algorithm
In the Gauss-Newton algorithm:
$$\Delta_i = -\left(J^T J\right)^{-1} J^T \varepsilon_i$$
In the Levenberg-Marquardt algorithm:
$$\Delta_i = -\left(J^T J + \lambda I\right)^{-1} J^T \varepsilon_i$$
Levenberg-Marquardt Algorithm:
0. Initialize λ with a small value: λ = 0.001
1. Compute $\Delta_i$ and $E(p_i + \Delta_i)$
2. If $E(p_i + \Delta_i) > E(p_i)$: λ ← 10 λ and go back to 1 [happens when the linear approximation of f is too coarse]
3. If $E(p_i + \Delta_i) < E(p_i)$: λ ← λ / 10, $p_{i+1} \leftarrow p_i + \Delta_i$ and go back to 1.
Once converged, set λ ← 0 and continue up to convergence.
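The loop above can be sketched in C++ for a toy problem with a single scalar parameter, so that $J^T J$, $J^T \varepsilon$ and the identity are plain numbers. The model f_j(p) = exp(p·x_j), the function names and the stopping threshold are assumptions made for illustration, and the final pass with λ = 0 is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// E(p) = sum_j (exp(p * x_j) - b_j)^2 for the toy model.
static double energy(double p, const std::vector<double>& x,
                     const std::vector<double>& b) {
    double e = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        const double r = std::exp(p * x[j]) - b[j];
        e += r * r;
    }
    return e;
}

double levenberg_marquardt(double p, const std::vector<double>& x,
                           const std::vector<double>& b, int max_steps = 200) {
    assert(x.size() == b.size());
    double lambda = 0.001;                          // step 0
    for (int step = 0; step < max_steps; ++step) {
        double jtj = 0.0, jte = 0.0;                // J^T J and J^T eps
        for (std::size_t j = 0; j < x.size(); ++j) {
            const double f = std::exp(p * x[j]);
            const double jac = x[j] * f;            // d f_j / d p
            jtj += jac * jac;
            jte += jac * (f - b[j]);
        }
        const double delta = -jte / (jtj + lambda); // step 1
        if (std::fabs(delta) < 1e-12) break;        // step is negligible: stop
        if (energy(p + delta, x, b) < energy(p, x, b)) {
            p += delta;                             // step 3: accept the step
            lambda /= 10.0;
        } else {
            lambda *= 10.0;                         // step 2: reject the step
        }
    }
    return p;
}
```

Starting from p = 0 on data generated with p = 0.5, the first few steps are rejected while λ grows, after which the iteration behaves like Gauss-Newton.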
115
![Page 116: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/116.jpg)
Non-Linear Least-Squares: the Levenberg-Marquardt Algorithm
$$\Delta_i = -\left(J^T J + \lambda I\right)^{-1} J^T \varepsilon_i$$
• When λ is small, LM behaves similarly to the Gauss-Newton algorithm.
• When λ becomes large, LM behaves similarly to a steepest descent to guarantee convergence.
116
![Page 117: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/117.jpg)
Another Way to Refine the Pose:Template Matching
117
![Page 118: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/118.jpg)
Global region tracking by maximizing cross-correlation:
• Useful for objects difficult to model using local features;
• Accurate.
Template T
Input Image Ip
118
![Page 119: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/119.jpg)
Lucas-Kanade Algorithm
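$$\min_p \sum_j \big(W(I, p)[m_j] - T[m_j]\big)^2$$
Gauss-Newton step:
$$\Delta_i = J_p^{+} \cdot \varepsilon_{p,I}$$
where $J_p^{+}$ is the pseudo-inverse of the Jacobian of W(I, p) evaluated at p and the $m_j$, and
$$\varepsilon_{p,I} = \big(\ldots, T[m_j] - W(I, p)[m_j], \ldots\big)^T$$
[Figure: template T with a sample location $m_j$, and input image I with the warp p.]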
119
![Page 120: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/120.jpg)
Template T
Lucas-Kanade Algorithm
Computing J and J+ is computationally expensive.
p0
p
120
![Page 121: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/121.jpg)
Inverse Compositional Algorithm [Baker et al. IJCV03]
Template T
Input Image It
$$p_i = p_{i-1} + dp_i$$
$$dp_i = J_{p=0}^{+}\, \varepsilon_{p=0,I}$$
$J_{p=0}$ is a constant matrix and can therefore be precomputed!
-pi-1dpi
121
![Page 122: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/122.jpg)
ESM (Efficient Second-order Method)
(1) $I = T + J_{p=0}\, dp + dp^T H_{p=0}\, dp$ [second-order Taylor expansion]
(2) $J_{p=dp} = J_{p=0} + 2\, dp^T H_{p=0}$ [differentiation of (1) with respect to p]
(3) $dp^T H_{p=0} = \tfrac{1}{2}\left(J_{p=dp} - J_{p=0}\right)$ [from Equation (2)]
(4) $I = T + J_{p=0}\, dp + \tfrac{1}{2}\left(J_{p=dp} - J_{p=0}\right) dp$ [by injecting (3) in (1)]
(5) $dp = \left[\tfrac{1}{2}\left(J_{p=0} + J_{p=dp}\right)\right]^{+} (I - T)$ [from Equation (4)]
Like Gauss-Newton, but with $J_{p=0}$ replaced by $\tfrac{1}{2}\left(J_{p=0} + J_{p=dp}\right)$. Both $J_{p=dp}$ and a pseudo-inverse must be computed at each iteration, but far fewer iterations are needed.
122
![Page 123: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/123.jpg)
BRIEF [ECCV’10]
very fast feature point descriptor
123
![Page 124: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/124.jpg)
Remark
• Moving legacy code to new CPUs does not result in a speed-up anymore;
• Should consider the features of new platforms: parallelism (multi-cores, GPU), locality, ...
124
![Page 125: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/125.jpg)
BRIEF descriptor
[Figure: the image patch goes through Gaussian smoothing and is then turned into the bit string 1, 1, 0, ..., 0, 1.]
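A minimal sketch of how such a bit string could be produced, assuming that each bit is the outcome of one intensity comparison between two locations of the already smoothed patch. The 64-bit length, the `TestPair` type and the function name are illustrative choices, not the released implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two pixel locations inside the patch (hypothetical layout).
struct TestPair { int x1, y1, x2, y2; };

// Computes a 64-bit descriptor from a smoothed patch stored row by row.
std::uint64_t brief64(const std::vector<float>& patch, int width,
                      const std::array<TestPair, 64>& tests) {
    assert(width > 0 && patch.size() % static_cast<std::size_t>(width) == 0);
    std::uint64_t descriptor = 0;
    for (std::size_t i = 0; i < tests.size(); ++i) {
        const TestPair& t = tests[i];
        const float a = patch[t.y1 * width + t.x1];
        const float b = patch[t.y2 * width + t.x2];
        if (a < b) descriptor |= (std::uint64_t{1} << i);  // bit i = test i
    }
    return descriptor;
}
```

The same set of test locations has to be used for every patch, so that bit i always means the same comparison.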
125
![Page 126: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/126.jpg)
BRIEF descriptor
[Figure: the image patch goes through Gaussian smoothing and is then turned into the bit string 1, 1, 0, ..., 0, 1.]
Alternatively, using integral images:
126
![Page 127: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/127.jpg)
Integral Images
$$\text{Integral Image}(u, v) = \sum_{i=1}^{u} \sum_{j=1}^{v} \text{Image}(i, j)$$
127
![Page 128: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/128.jpg)
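How to Use Integral Images
[Figure: the sum of the pixels inside a rectangle is obtained from the integral image at its four corners: bottom-right − top-right − bottom-left + top-left.]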
128
![Page 129: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/129.jpg)
[Viola & Jones, IJCV’01]
Features computed in constant time
129
![Page 130: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/130.jpg)
Computing Integral Images
IntegralImage[u][v] = IntegralImage[u][v-1] + LineBuffer[u] + Image[u][v]
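A minimal sketch of one common way to build and query an integral image. It uses a running row sum and a zero-padded first row and column rather than the slide's LineBuffer, and the function names and the row-major layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Padded integral image of a w x h row-major image:
// ii[(v + 1) * (w + 1) + (u + 1)] = sum of img over columns 0..u, rows 0..v.
std::vector<long long> integral_image(const std::vector<int>& img, int w, int h) {
    assert(w > 0 && h > 0 && img.size() == static_cast<std::size_t>(w) * h);
    std::vector<long long> ii(static_cast<std::size_t>(w + 1) * (h + 1), 0);
    for (int v = 0; v < h; ++v) {
        long long row_sum = 0;  // running sum of the current row
        for (int u = 0; u < w; ++u) {
            row_sum += img[v * w + u];
            // Value just above in the integral image plus the row sum so far.
            ii[(v + 1) * (w + 1) + (u + 1)] = ii[v * (w + 1) + (u + 1)] + row_sum;
        }
    }
    return ii;
}

// Sum of img over columns u0..u1-1 and rows v0..v1-1, in four lookups.
long long box_sum(const std::vector<long long>& ii, int w,
                  int u0, int v0, int u1, int v1) {
    const int s = w + 1;  // row stride of the padded integral image
    return ii[v1 * s + u1] - ii[v0 * s + u1] - ii[v1 * s + u0] + ii[v0 * s + u0];
}
```

The cost of `box_sum` does not depend on the size of the rectangle, which is what makes box features computable in constant time.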
130
![Page 131: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/131.jpg)
Evaluation
131
![Page 132: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/132.jpg)
Evaluation
132
![Page 133: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/133.jpg)
Computation Speed
For BRIEF, most of the time is spent in Gaussian smoothing.
133
![Page 134: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/134.jpg)
Matching Speed
distance(BRIEF descriptor 1, BRIEF descriptor 2)
= Hamming distance(BRIEF descriptor 1, BRIEF descriptor 2)
= number of bits set to 1(BRIEF descriptor 1 xor BRIEF descriptor 2)
= popcount(BRIEF descriptor 1 xor BRIEF descriptor 2)
10- to 15-fold speed increase on Intel's Bloomfield (SSE 4.2) and AMD's Phenom (SSE 4a)
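A portable sketch of this distance, assuming a 256-bit descriptor stored as four 64-bit words (the length and the names are illustrative). `std::bitset<64>::count` is standard C++, and compilers can map it to a hardware popcount instruction when the target supports one.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// A 256-bit binary descriptor stored as four 64-bit words (assumed length).
using Descriptor = std::array<std::uint64_t, 4>;

// Hamming distance = number of bits set in (a xor b).
int hamming_distance(const Descriptor& a, const Descriptor& b) {
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d += static_cast<int>(std::bitset<64>(a[i] ^ b[i]).count());
    return d;
}
```

Matching a descriptor against a database then reduces to a handful of xor and popcount operations per candidate.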
134
![Page 135: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/135.jpg)
Matching Speed
135
![Page 136: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/136.jpg)
Picking the Locations
[Figure: five ways of picking the test locations: uniform distribution; Gaussian distribution; Gaussian distribution for location and length; uniform distribution on polar coordinates; census transform locations.]
136
![Page 137: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/137.jpg)
Picking the Locations
[Figure: five ways of picking the test locations: uniform distribution; Gaussian distribution; Gaussian distribution for location and length; uniform distribution on polar coordinates; census transform locations.]
137
![Page 138: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/138.jpg)
Rotation and Scale Invariance
138
![Page 139: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/139.jpg)
Rotation and Scale Invariance
Duplicate the Descriptors: 18 rotations x 3 scales
139
![Page 140: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/140.jpg)
Code released under the GPL on the CVLab website.
140
![Page 141: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/141.jpg)
DOT [CVPR’10]
dense descriptor for object detection
Joint work with Stefan Hinterstoisser (TU Munich)
141
![Page 142: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/142.jpg)
Template matching with an efficient representation of the images and the templates.
object detection with a sliding window and template matching
142
![Page 143: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/143.jpg)
143
![Page 144: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/144.jpg)
144
![Page 145: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/145.jpg)
Initial Similarity Measure
145
![Page 146: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/146.jpg)
Making the Similarity Measure Robust to Small Motions
146
![Page 147: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/147.jpg)
Downsampling
147
![Page 148: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/148.jpg)
Ignoring the Dependencies between the Regions...
148
![Page 149: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/149.jpg)
Lists of Dominant Orientations
149
![Page 150: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/150.jpg)
Fast Computation with Bitwise Operations
0000110000010000
150
![Page 151: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/151.jpg)
Code available under LGPL license at http://campar.in.tum.de/personal/hinterst/index/
151
![Page 152: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/152.jpg)
New Method, LINE [PAMI, under revision]
152
![Page 153: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/153.jpg)
Initial Similarity Measure
Previous measure:
$$E_{\text{Steger}}(I, O, c) = \sum_{r} \Big| \cos\big(\text{orientation}(O, r) - \text{orientation}(I, c + r)\big) \Big|$$
153
![Page 154: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/154.jpg)
Making the Similarity Measure Robust to Small Motions
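$$E_{\text{Steger}}(I, O, c) = \sum_{r} \Big| \cos\big(\text{orientation}(O, r) - \text{orientation}(I, c + r)\big) \Big|$$
$$E(I, O, c) = \sum_{r} \max_{t \in \text{region}(c + r)} \Big| \cos\big(\text{orientation}(O, r) - \text{orientation}(I, t)\big) \Big|$$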
154
![Page 155: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/155.jpg)
Avoiding Recomputing the max Operator
1. Spread the gradients
155
![Page 156: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/156.jpg)
2. Precompute response maps
Because
• we consider only a discrete set of gradient directions;
• we do not consider the gradient norms,
we can precompute a response for each region in the image and each gradient direction in the template.
156
![Page 157: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/157.jpg)
Optimized Version
1. The sets of orientations in the image regions are encoded with a binary representation, for example 11010.
157
![Page 158: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/158.jpg)
Optimized Version
2. The binary representation is used as an index to lookup tables with the precomputed responses for each gradient direction in the template:
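A minimal sketch of these two steps, assuming 8 quantized orientations covering 180 degrees (so that one byte encodes a set of orientations) and a response equal to the largest |cos| of the orientation difference, as in the similarity measure. The names and the table size are illustrative and may differ from the released code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kOrientations = 8;  // assumed number of quantized orientations

// Step 1: bit o of the result is set when orientation o occurs in the region.
std::uint8_t encode_orientations(const std::vector<int>& orientations) {
    std::uint8_t mask = 0;
    for (int o : orientations) {
        assert(o >= 0 && o < kOrientations);
        mask = static_cast<std::uint8_t>(mask | (1u << o));
    }
    return mask;
}

// Step 2: lookup table for one template orientation.
// table[mask] = max over the orientations o present in mask of
//               |cos(angle(template_o) - angle(o))|, and 0 for an empty mask.
std::array<double, 256> response_table(int template_o) {
    const double step = std::acos(-1.0) / kOrientations;  // pi / 8 per bin
    std::array<double, 256> table{};
    for (int mask = 0; mask < 256; ++mask) {
        for (int o = 0; o < kOrientations; ++o) {
            if (mask & (1 << o)) {
                const double r = std::fabs(std::cos((template_o - o) * step));
                if (r > table[mask]) table[mask] = r;
            }
        }
    }
    return table;
}
```

At detection time the max over a region then costs a single table lookup, indexed by the byte that encodes the region.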
158
![Page 159: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/159.jpg)
Avoiding Cache Misses
The response maps are re-arranged into linear memories:
159
![Page 160: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/160.jpg)
Using the Linear Memories
The similarity measure can be computed for all the image locations by summing linear memories, shifted by an offset that depends on the template.
160
![Page 161: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/161.jpg)
Advantage of Linearizing the Memory
Speed-up factor
161
![Page 162: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/162.jpg)
DOT [CVPR’10]
LINE
162
![Page 163: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/163.jpg)
LINE-MOD [Hinterstoisser et al, ICCV’11]
Extension to the Kinect: the templates combine the image and the depth map.
163
![Page 164: Vincent Lepetit - Real-time computer vision](https://reader034.vdocuments.us/reader034/viewer/2022051412/5485c96eb4af9f910d8b4e84/html5/thumbnails/164.jpg)
thanks!
164