
Low Complexity Keypoint Recognition and Pose Estimation

Vincent Lepetit

Real-Time 3D Object Detection

Runs at 15 Hz

[Videos: real-time 3D object detection.]

Keypoint Recognition

The general approach [Lowe, Matas, Mikolajczyk] is a particular case of classification:

• One class per keypoint: the set of the keypoint's possible appearances under various perspectives, lighting conditions, noise, etc.;

• Nearest neighbor classification: search in the database;

• Pre-processing makes the actual classification easier.


Training phase → Classifier, used at run-time to recognize the keypoints.

A New Classifier: Ferns

Joint Work with Mustafa Özuysal

We are looking for:

argmax_i P(C = c_i | patch)

If the patch can be represented by a set of binary image features { f_j }:

P(C = c_i | patch) = P(C = c_i | f_1, f_2, …, f_N),

which is proportional to P(f_1, …, f_N | C = c_i), but a complete representation of the joint distribution is infeasible. Naive Bayes ignores the correlations between the features:

P(f_1, …, f_N | C = c_i) ≈ ∏_j P(f_j | C = c_i)

Compromise: group the features into M small sets, the ferns F_1, …, F_M, model the joint distribution within each fern, and assume independence only between ferns:

P(f_1, …, f_N | C = c_i) ≈ ∏_{k=1}^{M} P(F_k | C = c_i)

Presentation on an Example

Ferns: Training

The tests compare the intensities of two pixels around the keypoint: invariant to light changes by any monotonically increasing function.

Posterior probabilities are estimated during training:
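A minimal sketch of such a test, assuming a hypothetical layout where each fern holds S pixel pairs and the S comparison bits form the fern's leaf index. Because only the ordering of intensities matters, any monotonically increasing remapping of the patch leaves the index unchanged.

```c
#include <assert.h>

#define S 3 /* tests per fern (illustrative) */

/* d holds S pairs of pixel positions in the patch; the S comparison
 * bits, most significant first, form the fern's leaf index. */
int fern_index(const unsigned char *patch, const int d[S][2]) {
    int index = 0;
    for (int j = 0; j < S; j++) {
        index <<= 1;
        if (patch[d[j][0]] < patch[d[j][1]]) /* binary intensity test */
            index++;
    }
    return index;
}
```

Remapping every intensity with, say, g(x) = x/2 + 5 changes all the pixel values but none of the comparisons, so the computed index is identical.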

Ferns: Training

[Figure: for each training sample of a class, the binary test outcomes form a bit string per fern; the corresponding bin of that class's histogram is incremented (++).]
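The training step above can be sketched for a single fern and a single class; the counts-to-probabilities step below uses a uniform regularizer (one sample added to every leaf) so that leaves never seen in training keep a non-zero probability. The constants and function names are illustrative, not the authors' implementation.

```c
#include <assert.h>

#define LEAVES 4 /* 2^S leaves per fern (S = 2 tests, illustrative) */

static int counts[LEAVES]; /* training samples of one class per leaf */
static int total;          /* total training samples of that class   */

/* One training patch of the class fell into leaf `index` of this fern. */
void train_sample(int index) { counts[index]++; total++; }

/* Regularized estimate of P(F = index | C): adding 1 to every leaf
 * count keeps unseen leaves from getting probability zero. */
double posterior(int index) {
    return (counts[index] + 1.0) / (total + LEAVES);
}
```

Without the regularizer, a single unseen leaf would zero out the whole product when the fern responses are multiplied at recognition time.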

Ferns: Training Results

Ferns: Recognition

It Really Works

[Videos: keypoint recognition in action.]

Ferns outperform Trees

500 classes; no orientation or perspective correction.

[Plot: recognition rate vs. number of structures, for FERNS and TREES.]

Ferns responses are combined multiplicatively (Naive Bayes rule); trees responses are combined additively (average).

Optimized Locations versus Random Locations: We Can Use Random Tests

Comparison of the recognition rates for 200 keypoints:

[Plot: recognition rate vs. number of trees, for information-gain optimization and random tests.]

We Can Use Random Tests

For a small number of classes, we can try several tests and retain the best one according to some criterion. When the number of classes is large, any test does a decent job:

Another Graphical Interpretation

[Figure: graphical interpretation, shown over several animation steps.]

We Can Use Random Tests: Why It Is Interesting

• Building the ferns takes no time (except for estimating the posterior probabilities);

• Simplifies the classifier structure;

• Allows incremental learning.

Comparison with SIFT: Recognition Rate

[Plot: number of inliers vs. frame index, for FERNS and SIFT.]

Comparison with SIFT: Computation Time

• SIFT: 1 ms to compute the descriptor of a keypoint (not including the convolutions);

• FERNS: 13.5 microseconds to classify one keypoint into 200 classes.

Keypoint Recognition in Ten Lines of Code

    for (int i = 0; i < H; i++) P[i] = 0.;            // reset the H class scores
    for (int k = 0; k < M; k++) {                     // for each of the M ferns
        int index = 0, *d = D + k * 2 * S;            // d: the fern's S pixel pairs
        for (int j = 0; j < S; j++) {                 // build the S-bit leaf index
            index <<= 1;
            if (*(K + d[0]) < *(K + d[1]))            // binary intensity test on the patch K
                index++;
            d += 2;
        }
        float *p = PF + k * shift2 + index * shift1;  // posteriors stored for that leaf
        for (int i = 0; i < H; i++) P[i] += p[i];     // accumulate the log-posteriors
    }

Very simple to implement; no need for orientation or perspective correction; (almost) no parameters to tune; very fast.

Ferns Tuning

• The number of ferns, and

• The number of tests per fern

can be tuned to adapt to the hardware in terms of CPU power and memory size.
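The memory side of that trade-off is easy to make concrete: the posterior tables hold M × 2^S entries per class, so memory grows exponentially in the number of tests per fern and only linearly in the number of ferns and classes. A small illustrative calculation (the parameter values are examples, not the talk's settings):

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of the posterior tables: M ferns, S tests per fern,
 * H classes, one float per entry. */
size_t ferns_memory(int M, int S, int H) {
    return (size_t)M * ((size_t)1 << S) * (size_t)H * sizeof(float);
}
```

For example, 30 ferns of 10 tests with 200 classes need 30 × 1024 × 200 × 4 bytes ≈ 24 MB; adding one test per fern doubles that.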

Feature Harvesting

Estimate the posterior probabilities from a training video sequence:

[Video: training sequence.]

Feature Harvesting

[Loop: detect the object in the current frame → update the classifier.]

With the ferns, we can easily:

- add a class;

- remove a class;

- add samples of a class to refine the classifier.

Incremental learning: no need to store image patches, and we can select the keypoints the classifier can recognize.

Test Sequence

[Video: training examples and the resulting matches.]

Handling Light Changes

[Videos: detection under changing lighting conditions.]

Low Complexity Keypoint Recognition and Pose Estimation


EPnP: An Accurate Non-Iterative O(n) Solution to the PnP Problem

Joint Work with Francesc Moreno-Noguer


The Perspective-n-Point (PnP) Problem

Given: 2D/3D correspondences, internal parameters known.

Sought: Rotation, Translation?

How to take advantage of the internal parameters?

Solutions exist for the specific cases n = 3 [...], n = 4 [...], n = 5 [...], and the general case [...].

A Stable Algorithm

[Plot: mean and median rotation error (%), ‖q_true − q‖ / ‖q‖, vs. number of points used to estimate the pose.]

LHM: Lu, Hager, Mjolsness, Fast and Globally Convergent Pose Estimation from Video Images, PAMI'00 (alternately optimizes over rotation and translation);

EPnP: our method.

A Fast Algorithm

[Plot: median rotation error (%), ‖q_true − q‖ / ‖q‖, vs. computation time (sec, logarithmic scale).]

General Approach

Estimate the coordinates of the 3D points in the camera coordinate system:

known p_i^world → p_i^camera? → estimated p_i^camera → Rotation, Translation [Lu et al. PAMI'00]

Introducing Control Points

The 3D points are expressed as a weighted sum of four control points c_1, c_2, c_3, c_4:

p_i = Σ_{j=1}^{4} α_ij c_j

x = [c_1^camera; c_2^camera; c_3^camera; c_4^camera]

12 unknowns: the coordinates of the control points in the camera coordinate system.

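A small sketch of the control-point parameterization: the weights α_ij, computed in the world coordinate system, also hold in the camera coordinate system, so once the four camera-frame control points are known every 3D point follows by the same weighted sum. The values below are hypothetical.

```c
#include <assert.h>

/* p_i^camera = sum_j alpha_ij * c_j^camera: reconstruct one 3D point
 * from the four camera-frame control points and its weights. */
void point_from_controls(const double alpha[4], const double c[4][3],
                         double p[3]) {
    for (int k = 0; k < 3; k++) {
        p[k] = 0.0;
        for (int j = 0; j < 4; j++)
            p[k] += alpha[j] * c[j][k];
    }
}
```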

The Point Reprojections Give a Linear System

For each correspondence i:

w_i [u_i; v_i; 1] = A p_i^camera = A Σ_{j=1}^{4} α_ij c_j^camera

Rewriting and concatenating the equations from all the correspondences:

M x = 0, with x = [c_1^camera; c_2^camera; c_3^camera; c_4^camera]

The Solution as a Weighted Sum of Eigenvectors

M x = 0, hence Mᵀ M x = 0: x belongs to the null space of Mᵀ M:

∃ N, {β_i} such that x = Σ_{i=1}^{N} β_i v_i

with v_i the eigenvectors of Mᵀ M associated to null eigenvalues.

Computing Mᵀ M is the most costly operation, and it is linear in n, the number of correspondences.

From 12 Unknowns to 1, 2, 3, or 4

∃ N, {β_i} such that x = Σ_{i=1}^{N} β_i v_i

• The β_i are our N new unknowns;

• N is the dimension of the null space of Mᵀ M;

• Without noise: N = 1 (scale ambiguity);

• In practice: no exactly zero eigenvalues, but several very small ones, and N ≥ 1 (depending on the noise on the 2D locations).

We found that only the cases N = 1, 2, 3, and 4 must be considered.

How the Control Points Vary with the β_i

When varying the β_i in

x = Σ_{i=1}^{N} β_i v_i = [c_1^camera; c_2^camera; c_3^camera; c_4^camera]

[Video: the reprojections in the image and the corresponding 3D points.]

Imposing the Rigidity Constraint

The distances between the control points c_1, …, c_4 must be preserved:

‖c_k^camera − c_l^camera‖² = ‖c_k^world − c_l^world‖²   (known)

This gives 6 quadratic equations in the β_i.
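The right-hand sides of those constraints are just the six pairwise squared distances between the four world-frame control points, computed once; a minimal sketch (hypothetical function name and values):

```c
#include <assert.h>

/* The 4 control points give 6 unordered pairs (k, l); return the
 * squared world-frame distances, the known right-hand sides. */
int squared_distances(const double c[4][3], double rho[6]) {
    int n = 0;
    for (int k = 0; k < 4; k++)
        for (int l = k + 1; l < 4; l++) {
            double s = 0.0;
            for (int m = 0; m < 3; m++) {
                double d = c[k][m] - c[l][m];
                s += d * d;
            }
            rho[n++] = s;
        }
    return n; /* always 6 */
}
```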

The Case N = 1

[c_1^camera; c_2^camera; c_3^camera; c_4^camera] = β₁ v₁

and the 6 quadratic equations ‖c_k^camera − c_l^camera‖² = ‖c_k^world − c_l^world‖² (known) become

|β₁| · ‖v₁^[k] − v₁^[l]‖ = ‖c_k^world − c_l^world‖

(v₁^[k]: the three coordinates of v₁ corresponding to control point k). β₁ can easily be computed:

• Its absolute value is the solution of a linear system;

• Its sign is chosen so that the handedness of the control points is preserved.
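The "linear system" for |β₁| is overdetermined (6 equations, 1 unknown), so it has a closed-form least-squares solution; a sketch under that reading, with illustrative inputs:

```c
#include <assert.h>

/* Least-squares |beta1| from the 6 equations |beta1| * dv[i] = dc[i],
 * where dv[i] = ||v1^[k] - v1^[l]|| and dc[i] = ||c_k^w - c_l^w||
 * for the 6 control-point pairs. */
double solve_beta1(const double dv[6], const double dc[6]) {
    double num = 0.0, den = 0.0;
    for (int i = 0; i < 6; i++) {
        num += dv[i] * dc[i]; /* normal-equation numerator   */
        den += dv[i] * dv[i]; /* normal-equation denominator */
    }
    return num / den;
}
```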

The Case N = 2

[c_1^camera; c_2^camera; c_3^camera; c_4^camera] = β₁ v₁ + β₂ v₂

and 6 quadratic equations ‖c_k^camera − c_l^camera‖² = ‖c_k^world − c_l^world‖² (known).

We use the linearization technique. With the new unknowns β₁₁ = β₁², β₁₂ = β₁β₂, and β₂₂ = β₂², this gives 6 linear equations for 3 unknowns:

L [β₁₁, β₁₂, β₂₂]ᵀ = [ρ₁, …, ρ₆]ᵀ

with L a known 6×3 matrix and the ρ the squared world-frame distances.

The Case N = 3

[c_1^camera; c_2^camera; c_3^camera; c_4^camera] = Σ_{i=1}^{3} β_i v_i

and 6 quadratic equations ‖c_k^camera − c_l^camera‖² = ‖c_k^world − c_l^world‖² (known).

Same linearization technique. The unknowns are now β₁₁, β₁₂, β₁₃, β₂₂, β₂₃, β₃₃ (with β_ab = β_a β_b), which gives 6 linear equations for 6 unknowns:

L [β₁₁, β₁₂, β₁₃, β₂₂, β₂₃, β₃₃]ᵀ = [ρ₁, …, ρ₆]ᵀ

with L a known 6×6 matrix.

The Case N = 4

Six quadratic equations in β₁, β₂, β₃, β₄. The linearization introduces 10 products β_ab = β_a β_b (β₁₁, β₁₂, β₁₃, β₁₄, β₂₂, β₂₃, β₂₄, β₃₃, β₃₄, β₄₄), giving 6 linear equations for 10 unknowns:

L [β₁₁, …, β₄₄]ᵀ = [ρ₁, …, ρ₆]ᵀ

with L a known 6×10 matrix.

Not enough equations anymore! Relinearization: the β_ab are expressed as a linear combination of the eigenvectors of the system.

Algorithm Summary

1. The control point coordinates are the (12) unknowns;

2. The 3D points should project onto the given corresponding 2D locations: a linear system in the control point coordinates;

3. The control point coordinates can be expressed as a linear combination of the null eigenvectors of this linear system: the weights (the β_i) are the new unknowns (no more than 4);

4. Adding the rigidity constraints gives quadratic equations in the β_i;

5. Solving for the β_i depends on their number (linearization or relinearization).

Results

[Videos: pose estimation results.]

Thank you.

Questions ?


The Point Reprojections Give a Linear System

From point reprojection, for each correspondence i:

w_i [u_i; v_i; 1] = A p_i^camera = A Σ_{j=1}^{4} α_ij c_j^camera

Let's expand:

⇔ ∀i,  w_i [u_i; v_i; 1] = [f_u 0 u_c; 0 f_v v_c; 0 0 1] Σ_{j=1}^{4} α_ij [x_j^camera; y_j^camera; z_j^camera]

⇔  Σ_{j=1}^{4} α_ij f_u x_j^camera + α_ij (u_c − u_i) z_j^camera = 0
    Σ_{j=1}^{4} α_ij f_v y_j^camera + α_ij (v_c − v_i) z_j^camera = 0

Concatenating the equations from all the correspondences:

⇔ M x = 0, with x = [c_1^camera; c_2^camera; c_3^camera; c_4^camera]
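The two expanded equations above give the two rows that correspondence i contributes to M, in the 12 control-point coordinates. A sketch (my own row layout, not the authors' code), which can be checked on synthetic data: with consistent control points, weights, and projection, each row dotted with x must vanish.

```c
#include <assert.h>
#include <math.h>

/* Fill the two rows of M contributed by correspondence i.
 * Row layout over x: [x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4]. */
void fill_rows(const double alpha[4], double u, double v,
               double fu, double fv, double uc, double vc,
               double row_u[12], double row_v[12]) {
    for (int j = 0; j < 4; j++) {
        row_u[3 * j + 0] = alpha[j] * fu;       /* alpha_ij f_u x_j term */
        row_u[3 * j + 1] = 0.0;
        row_u[3 * j + 2] = alpha[j] * (uc - u); /* alpha_ij (u_c - u_i) z_j */
        row_v[3 * j + 0] = 0.0;
        row_v[3 * j + 1] = alpha[j] * fv;       /* alpha_ij f_v y_j term */
        row_v[3 * j + 2] = alpha[j] * (vc - v); /* alpha_ij (v_c - v_i) z_j */
    }
}
```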
