
Low Complexity Keypoint Recognition and Pose Estimation

Vincent Lepetit

Real-Time 3D Object Detection

Runs at 15 Hz

[Videos: real-time 3D object detection.]

Keypoint Recognition

The general approach [Lowe, Matas, Mikolajczyk] is a particular case of classification:

• One class per keypoint: the set of the keypoint's possible appearances under various perspectives, lighting conditions, noise, etc.;

• Nearest neighbor classification: search in the database;

• Pre-processing makes the actual classification easier.


Training phase → Classifier, used at run-time to recognize the keypoints.

A New Classifier: Ferns

Joint Work with Mustafa Özuysal

We are looking for:

argmax_i P(C = c_i | patch)

If the patch can be represented by a set of binary image features { f_j }:

P(C = c_i | patch) = P(C = c_i | f_1, f_2, …, f_N),

which is proportional to P(f_1, …, f_N | C = c_i), but a complete representation of the joint distribution is infeasible. Naive Bayes ignores the correlations between the features:

P(f_1, …, f_N | C = c_i) ≈ ∏_j P(f_j | C = c_i)

Compromise: group the features into M small sets, the ferns F_1, …, F_M, model the joint distribution within each fern, and assume independence only between ferns:

P(f_1, …, f_N | C = c_i) ≈ ∏_{k=1}^{M} P(F_k | C = c_i)

Presentation on an Example

Ferns: Training

The tests compare the intensities of two pixels around the keypoint: invariant to light changes by any monotonically increasing function.

Posterior probabilities are estimated during training:
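A minimal sketch of such a test, assuming a hypothetical layout where each fern holds S pixel pairs and the S comparison bits form the fern's leaf index. Because only the ordering of intensities matters, any monotonically increasing remapping of the patch leaves the index unchanged.

```c
#include <assert.h>

#define S 3 /* tests per fern (illustrative) */

/* d holds S pairs of pixel positions in the patch; the S comparison
 * bits, most significant first, form the fern's leaf index. */
int fern_index(const unsigned char *patch, const int d[S][2]) {
    int index = 0;
    for (int j = 0; j < S; j++) {
        index <<= 1;
        if (patch[d[j][0]] < patch[d[j][1]]) /* binary intensity test */
            index++;
    }
    return index;
}
```

Remapping every intensity with, say, g(x) = x/2 + 5 changes all the pixel values but none of the comparisons, so the computed index is identical.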

Ferns: Training

[Figure: for each training sample of a class, the binary test outcomes form a bit string per fern; the corresponding bin of that class's histogram is incremented (++).]
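The training step above can be sketched for a single fern and a single class; the counts-to-probabilities step below uses a uniform regularizer (one sample added to every leaf) so that leaves never seen in training keep a non-zero probability. The constants and function names are illustrative, not the authors' implementation.

```c
#include <assert.h>

#define LEAVES 4 /* 2^S leaves per fern (S = 2 tests, illustrative) */

static int counts[LEAVES]; /* training samples of one class per leaf */
static int total;          /* total training samples of that class   */

/* One training patch of the class fell into leaf `index` of this fern. */
void train_sample(int index) { counts[index]++; total++; }

/* Regularized estimate of P(F = index | C): adding 1 to every leaf
 * count keeps unseen leaves from getting probability zero. */
double posterior(int index) {
    return (counts[index] + 1.0) / (total + LEAVES);
}
```

Without the regularizer, a single unseen leaf would zero out the whole product when the fern responses are multiplied at recognition time.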

Ferns: Training Results

Ferns: Recognition

It Really Works

[Videos: keypoint recognition in action.]

Ferns outperform Trees

500 classes; no orientation or perspective correction.

[Plot: recognition rate vs. number of structures, for FERNS and TREES.]

Ferns responses are combined multiplicatively (Naive Bayes rule); trees responses are combined additively (average).

Optimized Locations versus Random Locations: We Can Use Random Tests

Comparison of the recognition rates for 200 keypoints:

[Plot: recognition rate vs. number of trees, for information-gain optimization and random tests.]

We Can Use Random Tests

For a small number of classes, we can try several tests and retain the best one according to some criterion. When the number of classes is large, any test does a decent job:

Another Graphical Interpretation

[Figure: graphical interpretation, shown over several animation steps.]

We Can Use Random Tests: Why It Is Interesting

• Building the ferns takes no time (except for estimating the posterior probabilities);

• Simplifies the classifier structure;

• Allows incremental learning.

Comparison with SIFT: Recognition Rate

[Plot: number of inliers vs. frame index, for FERNS and SIFT.]

Comparison with SIFT: Computation Time

• SIFT: 1 ms to compute the descriptor of a keypoint (not including the convolutions);

• FERNS: 13.5 microseconds to classify one keypoint into 200 classes.

Keypoint Recognition in Ten Lines of Code

    for (int i = 0; i < H; i++) P[i] = 0.;            // reset the H class scores
    for (int k = 0; k < M; k++) {                     // for each of the M ferns
        int index = 0, *d = D + k * 2 * S;            // d: the fern's S pixel pairs
        for (int j = 0; j < S; j++) {                 // build the S-bit leaf index
            index <<= 1;
            if (*(K + d[0]) < *(K + d[1]))            // binary intensity test on the patch K
                index++;
            d += 2;
        }
        float *p = PF + k * shift2 + index * shift1;  // posteriors stored for that leaf
        for (int i = 0; i < H; i++) P[i] += p[i];     // accumulate the log-posteriors
    }

Very simple to implement; no need for orientation or perspective correction; (almost) no parameters to tune; very fast.

Ferns Tuning

• The number of ferns, and

• The number of tests per fern

can be tuned to adapt to the hardware in terms of CPU power and memory size.
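The memory side of that trade-off is easy to make concrete: the posterior tables hold M × 2^S entries per class, so memory grows exponentially in the number of tests per fern and only linearly in the number of ferns and classes. A small illustrative calculation (the parameter values are examples, not the talk's settings):

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of the posterior tables: M ferns, S tests per fern,
 * H classes, one float per entry. */
size_t ferns_memory(int M, int S, int H) {
    return (size_t)M * ((size_t)1 << S) * (size_t)H * sizeof(float);
}
```

For example, 30 ferns of 10 tests with 200 classes need 30 × 1024 × 200 × 4 bytes ≈ 24 MB; adding one test per fern doubles that.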

Feature Harvesting

Estimate the posterior probabilities from a training video sequence:

[Video: training sequence.]

Feature Harvesting

[Loop: detect the object in the current frame → update the classifier.]

With the ferns, we can easily:

- add a class;

- remove a class;

- add samples of a class to refine the classifier.

Incremental learning: no need to store image patches, and we can select the keypoints the classifier can recognize.

Test Sequence

[Video: training examples and the resulting matches.]

Handling Light Changes

[Videos: detection under changing lighting conditions.]

Low Complexity Keypoint Recognition and Pose Estimation


EPnP: An Accurate Non-Iterative O(n) Solution to the PnP Problem

Joint Work with Francesc Moreno-Noguer


The Perspective-n-Point (PnP) Problem

Given: 2D/3D correspondences, internal parameters known.

Sought: Rotation, Translation?

How to take advantage of the internal parameters?

Solutions exist for the specific cases n = 3 [...], n = 4 [...], n = 5 [...], and the general case [...].

A Stable Algorithm

[Plot: mean and median rotation error (%), ‖q_true − q‖ / ‖q‖, vs. number of points used to estimate the pose.]

LHM: Lu, Hager, Mjolsness, Fast and Globally Convergent Pose Estimation from Video Images, PAMI'00 (alternately optimizes over rotation and translation);

EPnP: our method.

A Fast Algorithm

[Plot: median rotation error (%), ‖q_true − q‖ / ‖q‖, vs. computation time (sec, logarithmic scale).]

General Approach

Estimate the coordinates of the 3D points in the camera coordinate system:

known p_i^world → p_i^camera? → estimated p_i^camera → Rotation, Translation [Lu et al. PAMI'00]

Introducing Control Points

The 3D points are expressed as a weighted sum of four control points c_1, c_2, c_3, c_4:

p_i = Σ_{j=1}^{4} α_ij c_j

x = [c_1^camera; c_2^camera; c_3^camera; c_4^camera]

12 unknowns: the coordinates of the control points in the camera coordinate system.

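A small sketch of the control-point parameterization: the weights α_ij, computed in the world coordinate system, also hold in the camera coordinate system, so once the four camera-frame control points are known every 3D point follows by the same weighted sum. The values below are hypothetical.

```c
#include <assert.h>

/* p_i^camera = sum_j alpha_ij * c_j^camera: reconstruct one 3D point
 * from the four camera-frame control points and its weights. */
void point_from_controls(const double alpha[4], const double c[4][3],
                         double p[3]) {
    for (int k = 0; k < 3; k++) {
        p[k] = 0.0;
        for (int j = 0; j < 4; j++)
            p[k] += alpha[j] * c[j][k];
    }
}
```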

The Point Reprojections Give a Linear System

For each correspondence i:

w_i [u_i; v_i; 1] = A p_i^camera = A Σ_{j=1}^{4} α_ij c_j^camera

Rewriting and concatenating the equations from all the correspondences:

M x = 0, with x = [c_1^camera; c_2^camera; c_3^camera; c_4^camera]

The Solution as a Weighted Sum of Eigenvectors

M x = 0, hence Mᵀ M x = 0: x belongs to the null space of Mᵀ M:

∃ N, {β_i} such that x = Σ_{i=1}^{N} β_i v_i

with v_i the eigenvectors of Mᵀ M associated to null eigenvalues.

Computing Mᵀ M is the most costly operation, and it is linear in n, the number of correspondences.

From 12 Unknowns to 1, 2, 3, or 4

∃ N, {β_i} such that x = Σ_{i=1}^{N} β_i v_i

• The β_i are our N new unknowns;

• N is the dimension of the null space of Mᵀ M;

• Without noise: N = 1 (scale ambiguity);

• In practice: no exactly zero eigenvalues, but several very small ones, and N ≥ 1 (depending on the noise on the 2D locations).

We found that only the cases N = 1, 2, 3, and 4 must be considered.

How the Control Points Vary with the β_i

When varying the β_i in

x = Σ_{i=1}^{N} β_i v_i = [c_1^camera; c_2^camera; c_3^camera; c_4^camera]

[Video: the reprojections in the image and the corresponding 3D points.]

Imposing the Rigidity Constraint

The distances between the control points c_1, …, c_4 must be preserved:

‖c_k^camera − c_l^camera‖² = ‖c_k^world − c_l^world‖²   (known)

This gives 6 quadratic equations in the β_i.
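The right-hand sides of those constraints are just the six pairwise squared distances between the four world-frame control points, computed once; a minimal sketch (hypothetical function name and values):

```c
#include <assert.h>

/* The 4 control points give 6 unordered pairs (k, l); return the
 * squared world-frame distances, the known right-hand sides. */
int squared_distances(const double c[4][3], double rho[6]) {
    int n = 0;
    for (int k = 0; k < 4; k++)
        for (int l = k + 1; l < 4; l++) {
            double s = 0.0;
            for (int m = 0; m < 3; m++) {
                double d = c[k][m] - c[l][m];
                s += d * d;
            }
            rho[n++] = s;
        }
    return n; /* always 6 */
}
```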

The Case N = 1

[c_1^camera; c_2^camera; c_3^camera; c_4^camera] = β₁ v₁

and the 6 quadratic equations ‖c_k^camera − c_l^camera‖² = ‖c_k^world − c_l^world‖² (known) become

|β₁| · ‖v₁^[k] − v₁^[l]‖ = ‖c_k^world − c_l^world‖

(v₁^[k]: the three coordinates of v₁ corresponding to control point k). β₁ can easily be computed:

• Its absolute value is the solution of a linear system;

• Its sign is chosen so that the handedness of the control points is preserved.
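The "linear system" for |β₁| is overdetermined (6 equations, 1 unknown), so it has a closed-form least-squares solution; a sketch under that reading, with illustrative inputs:

```c
#include <assert.h>

/* Least-squares |beta1| from the 6 equations |beta1| * dv[i] = dc[i],
 * where dv[i] = ||v1^[k] - v1^[l]|| and dc[i] = ||c_k^w - c_l^w||
 * for the 6 control-point pairs. */
double solve_beta1(const double dv[6], const double dc[6]) {
    double num = 0.0, den = 0.0;
    for (int i = 0; i < 6; i++) {
        num += dv[i] * dc[i]; /* normal-equation numerator   */
        den += dv[i] * dv[i]; /* normal-equation denominator */
    }
    return num / den;
}
```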

The Case N = 2

[c_1^camera; c_2^camera; c_3^camera; c_4^camera] = β₁ v₁ + β₂ v₂

and 6 quadratic equations ‖c_k^camera − c_l^camera‖² = ‖c_k^world − c_l^world‖² (known).

We use the linearization technique. With the new unknowns β₁₁ = β₁², β₁₂ = β₁β₂, and β₂₂ = β₂², this gives 6 linear equations for 3 unknowns:

L [β₁₁, β₁₂, β₂₂]ᵀ = [ρ₁, …, ρ₆]ᵀ

with L a known 6×3 matrix and the ρ the squared world-frame distances.

The Case N = 3

[c_1^camera; c_2^camera; c_3^camera; c_4^camera] = Σ_{i=1}^{3} β_i v_i

and 6 quadratic equations ‖c_k^camera − c_l^camera‖² = ‖c_k^world − c_l^world‖² (known).

Same linearization technique. The unknowns are now β₁₁, β₁₂, β₁₃, β₂₂, β₂₃, β₃₃ (with β_ab = β_a β_b), which gives 6 linear equations for 6 unknowns:

L [β₁₁, β₁₂, β₁₃, β₂₂, β₂₃, β₃₃]ᵀ = [ρ₁, …, ρ₆]ᵀ

with L a known 6×6 matrix.

The Case N = 4

Six quadratic equations in β₁, β₂, β₃, β₄. The linearization introduces 10 products β_ab = β_a β_b (β₁₁, β₁₂, β₁₃, β₁₄, β₂₂, β₂₃, β₂₄, β₃₃, β₃₄, β₄₄), giving 6 linear equations for 10 unknowns:

L [β₁₁, …, β₄₄]ᵀ = [ρ₁, …, ρ₆]ᵀ

with L a known 6×10 matrix.

Not enough equations anymore! Relinearization: the β_ab are expressed as a linear combination of the eigenvectors of the system.

Algorithm Summary

1. The control point coordinates are the (12) unknowns;

2. The 3D points should project onto the given corresponding 2D locations: a linear system in the control point coordinates;

3. The control point coordinates can be expressed as a linear combination of the null eigenvectors of this linear system: the weights (the β_i) are the new unknowns (no more than 4);

4. Adding the rigidity constraints gives quadratic equations in the β_i;

5. Solving for the β_i depends on their number (linearization or relinearization).

Results

[Videos: pose estimation results.]

Thank you.

Questions ?


The Point Reprojections Give a Linear System

From point reprojection, for each correspondence i:

w_i [u_i; v_i; 1] = A p_i^camera = A Σ_{j=1}^{4} α_ij c_j^camera

Let's expand:

⇔ ∀i,  w_i [u_i; v_i; 1] = [f_u 0 u_c; 0 f_v v_c; 0 0 1] Σ_{j=1}^{4} α_ij [x_j^camera; y_j^camera; z_j^camera]

⇔  Σ_{j=1}^{4} α_ij f_u x_j^camera + α_ij (u_c − u_i) z_j^camera = 0
    Σ_{j=1}^{4} α_ij f_v y_j^camera + α_ij (v_c − v_i) z_j^camera = 0

Concatenating the equations from all the correspondences:

⇔ M x = 0, with x = [c_1^camera; c_2^camera; c_3^camera; c_4^camera]
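The two expanded equations above give the two rows that correspondence i contributes to M, in the 12 control-point coordinates. A sketch (my own row layout, not the authors' code), which can be checked on synthetic data: with consistent control points, weights, and projection, each row dotted with x must vanish.

```c
#include <assert.h>
#include <math.h>

/* Fill the two rows of M contributed by correspondence i.
 * Row layout over x: [x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4]. */
void fill_rows(const double alpha[4], double u, double v,
               double fu, double fv, double uc, double vc,
               double row_u[12], double row_v[12]) {
    for (int j = 0; j < 4; j++) {
        row_u[3 * j + 0] = alpha[j] * fu;       /* alpha_ij f_u x_j term */
        row_u[3 * j + 1] = 0.0;
        row_u[3 * j + 2] = alpha[j] * (uc - u); /* alpha_ij (u_c - u_i) z_j */
        row_v[3 * j + 0] = 0.0;
        row_v[3 * j + 1] = alpha[j] * fv;       /* alpha_ij f_v y_j term */
        row_v[3 * j + 2] = alpha[j] * (vc - v); /* alpha_ij (v_c - v_i) z_j */
    }
}
```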
