
Compact Signatures for High-speed Interest Point Description and Matching

Calonder, Lepetit, Fua, Konolige, Bowman, Mihelich

(as rendered by Lord)

Just Kidding

• Actually, we’re doing three papers

– Fast Keypoint Recognition in Ten Lines of Code (Ferns)

– Keypoint Signatures for Fast Learning and Recognition (Signatures)

– Compact Signatures for High-speed Interest Point Description and Matching (Compact Signatures)

• We will be doing them briefly, so don’t worry

• Context: we’re talking about keypoint description and matching

Ferns

• Problem: Features designed to be invariant or robust to commonly-observed deformations (e.g. SIFT) are slow to compute, limiting how many can be handled in many practical applications

• Solution: Move most of the computation offline via a discriminative learning framework

Ferns

We want to assign the patch around a keypoint to the most probable class ĉ_i given the binary features f_j calculated over it:

Standard Bayes’s Rule:

Assuming a uniform prior, this becomes a maximum likelihood expression:
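Written out, the three steps above would read roughly as follows. The notation is an assumption chosen to match the text: C is the class variable, c_i a candidate class, and N the number of binary features (not to be confused with the base-set size N used in the Signatures part).

```latex
% Most probable class given the features
\hat{c}_i = \arg\max_{c_i} P(C = c_i \mid f_1, f_2, \ldots, f_N)

% Bayes's rule
P(C = c_i \mid f_1, \ldots, f_N)
  = \frac{P(f_1, \ldots, f_N \mid C = c_i)\, P(C = c_i)}{P(f_1, \ldots, f_N)}

% Uniform prior, and a denominator that does not depend on the class
\hat{c}_i = \arg\max_{c_i} P(f_1, f_2, \ldots, f_N \mid C = c_i)
```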

Choose a very simple feature, the sign of the difference between two pixels:
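A minimal sketch of such a feature, assuming the patch is a 2-D list of grey values and the two pixel locations are given as (row, column) pairs. The function name and the toy patch are made up for illustration.

```python
def binary_feature(patch, p1, p2):
    """Return 1 if the pixel at p1 is darker than the pixel at p2, else 0."""
    (r1, c1), (r2, c2) = p1, p2
    return 1 if patch[r1][c1] < patch[r2][c2] else 0


# Toy 3x3 patch of grey values (hypothetical data).
patch = [
    [10, 200, 30],
    [40, 50, 60],
    [70, 80, 90],
]

f = binary_feature(patch, (0, 0), (0, 1))  # compares 10 against 200
g = binary_feature(patch, (2, 2), (0, 0))  # compares 90 against 10
```

The feature costs one comparison and two memory reads, which is the whole point: all the expensive work goes into learning which outcomes are typical for which class.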

Ferns

Need about 300 of these features for accurate classification. The full joint distribution (2^300 entries per class) thus can’t be represented.

As usual, seek to alleviate this problem with independence assumptions. At the extreme:

This (complete independence) will of course not really work on anything. So, a simple in-between:

These groups are the ferns, with features assigned to groups at random. Model dependence within each group, and assume independence between groups:
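In symbols, the two factorizations would look roughly like this. The notation is assumed rather than taken from the slides: N features are split into M groups F_k of S features each, and σ(k, j) is a random assignment of feature indices to groups.

```latex
% Extreme case: complete independence between features (naive Bayes)
P(f_1, \ldots, f_N \mid C = c_i) = \prod_{j=1}^{N} P(f_j \mid C = c_i)

% In-between: joint within each fern, independence between ferns
P(f_1, \ldots, f_N \mid C = c_i) = \prod_{k=1}^{M} P(F_k \mid C = c_i),
\qquad F_k = \{ f_{\sigma(k,1)}, \ldots, f_{\sigma(k,S)} \}
```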

Ferns

The titular ten lines:

The fern form has M × 2^S parameters, with M between 30 and 50, and S about 10.
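To make the parameter count concrete, here is a toy fern classifier. It is a sketch under stated assumptions and not the paper's ten lines: patches are flattened lists of grey values, each fern is S random pixel pairs, the class-conditional tables are estimated from counts with add-one smoothing, and all names are hypothetical.

```python
import math
import random


def fern_index(patch, pairs):
    """Pack the S binary pixel tests of one fern into an integer in [0, 2**S)."""
    idx = 0
    for a, b in pairs:
        idx = (idx << 1) | (1 if patch[a] < patch[b] else 0)
    return idx


def train(samples, ferns, n_classes, S):
    """Estimate log P(F_k | C = c) from counts, with add-one smoothing."""
    counts = [[[1] * (2 ** S) for _ in range(n_classes)] for _ in ferns]
    for patch, c in samples:
        for k, pairs in enumerate(ferns):
            counts[k][c][fern_index(patch, pairs)] += 1
    return [[[math.log(v / sum(row)) for v in row] for row in per_fern]
            for per_fern in counts]


def classify(patch, ferns, log_p, n_classes):
    """Sum log-probabilities over ferns and return the best-scoring class."""
    scores = [sum(log_p[k][c][fern_index(patch, pairs)]
                  for k, pairs in enumerate(ferns))
              for c in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


rng = random.Random(0)
M, S, n_pixels, n_classes = 5, 3, 16, 2  # toy sizes, far below the slide's values

# Each fern is S pairs of distinct pixel positions, drawn at random.
ferns = [[tuple(rng.sample(range(n_pixels), 2)) for _ in range(S)]
         for _ in range(M)]

# Two toy classes: an increasing and a decreasing intensity ramp.
ramp_up = list(range(n_pixels))
ramp_down = ramp_up[::-1]
log_p = train([(ramp_up, 0), (ramp_down, 1)], ferns, n_classes, S)

n_params = M * 2 ** S  # table entries per class
```

Each fern contributes one table of 2^S entries per class, hence M × 2^S entries per class. Reading the slide's count as per-class is an interpretation; with M = 50 and S = 10 it comes to 51,200.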

Ferns

• Other details, which we’ll skip:

– Modeling confidence in empirical estimates

– Using thresholds to reduce evaluation count

– Relationship with Random Trees

– Comparison against SIFT

Signatures

• Problem: Ferns are based on an offline training phase, so you can’t learn new features online. This renders ferns useless for, e.g., SLAM.

• Solution: Describe new classes in terms of the old (assuming the initial set is rich enough).

Signatures

Pull some keypoints at random from an arbitrary textured scene (here, N DoG/SIFT points, no two within 5 pixels of each other):

Call these points the “base set”, and train a Randomiz(s)ed Tree classifier on them. (Call the method “Generic Trees”.)

The response of a keypoint from the base set to the classifier trained on the base set should peak at that keypoint:

You also warp the base set patches to make the class recognition transformation-invariant (TBD):

Signatures

The response of a keypoint not in the base set tends to peak in multiple (but relatively few) locations. This response is the keypoint’s “signature” (intended to be transformation-invariant):

By thresholding, you can replace this signature with a sparse approximation to itself:
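A small sketch of the thresholding step, assuming the signature is a list of N classifier responses and that entries at or below the threshold t are dropped. The response values are invented, and whether the comparison is strict is an assumption.

```python
def sparsify(signature, t):
    """Keep responses above threshold t as {base_index: value}; drop the rest."""
    return {i: v for i, v in enumerate(signature) if v > t}


# Hypothetical response of a new keypoint to a classifier with N = 7 base points.
signature = [0.002, 0.41, 0.0, 0.005, 0.33, 0.25, 0.003]

sparse = sparsify(signature, t=0.01)
resembled = sorted(sparse)  # indices of the base patches that survive
```

Storing only the surviving (index, value) pairs is what makes the "signature length" depend on t as well as on N.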

A signature is essentially the collection of base patches you most resemble:

Signatures

For evaluation, signatures are matched using best-bin-first with geometric ground truths on baseline pairs like this:

N and t determine “signature length”, N explicitly and t implicitly (N affects both description and matching cost; t affects only matching cost)

(At t=0.01,) signature lengths are short and tightly distributed

Experimentally, found reason to go beyond N=300

Signatures

t does not have to be terribly large to max out your matching performance:

Signatures

The selling point of this is that it gives very similar performance to SIFT, at a fraction of the cost in time (TBD):

According to the paper, this represents a 35-time speedup. Division gives me about 53. Am I misunderstanding something, or was that a typo?

(They also show this can be applied to SLAM, but we’ll note that without getting into it yet. TBD.)

Compact Signatures

• Problem: Signatures are naturally sparse, but the first attempt at them did not exploit this: matching time and memory usage are higher than needed.

• Solution: Compress the signatures through random projection.

• (This is the whole paper.)

Compact Signatures

You again have a base classifier consisting of J fern units. Now, though, the “ferns” are combined additively, like random trees, so they’re not really the ferns detailed in the reference (TBD):

And again, there is a sparse version of the response created by thresholding against θ:

With base size N, feature count d, and bytes to store a float b, the memory requirement of the approach is

For J=50, d=10, and N=500, this exceeds 100 MB.
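The quoted figure can be checked with a few lines, assuming the requirement is J · 2^d · N · b bytes (every one of the 2^d leaves of every fern stores an N-vector of floats) and b = 4 for single precision. Both the formula and the value of b are assumptions that happen to agree with the number above.

```python
def leaf_table_bytes(J, d, N, b=4):
    """J ferns x 2**d leaves per fern x N floats per leaf x b bytes per float."""
    return J * (2 ** d) * N * b


total = leaf_table_bytes(J=50, d=10, N=500)
megabytes = total / 1e6
```

Here `total` is 102,400,000 bytes, or about 102.4 MB.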

Compact Signatures

However, you can compress this with a random ortho-projection (ROP) matrix Φ:

Because of the linear combination of fern responses, you can pre-compress the leaf vectors, avoiding storing their uncompressed versions:

This effectively replaces N by M (row dimension of Φ), dividing memory requirement by N/M, and requiring N/M times fewer operations in computing descriptors.
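The pre-compression trick rests on linearity: projecting the sum of the leaf vectors gives the same result as summing the projected leaf vectors. A quick numerical check, with a plain Gaussian random matrix standing in for Φ and toy sizes; all names and values here are illustrative assumptions.

```python
import random


def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def vec_sum(vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(col) for col in zip(*vectors)]


rng = random.Random(1)
N, M, J = 12, 4, 3  # base size, compressed size, number of ferns (toy values)

phi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
# One selected leaf vector per fern, each of length N.
leaves = [[rng.random() for _ in range(N)] for _ in range(J)]

project_after = matvec(phi, vec_sum(leaves))                      # sum, then compress
project_before = vec_sum([matvec(phi, leaf) for leaf in leaves])  # compress, then sum

max_diff = max(abs(a - b) for a, b in zip(project_after, project_before))
```

Since the two agree up to rounding, only the M-dimensional projected leaves need to be stored, which is where the N/M saving in memory and operations comes from.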

There is then further (SIMD-enabling) bit-level compression:

Compact Signatures

The transformation from the previous approach (top) to this one (bottom) can be pictured like this:

Compact Signatures

Compact Signatures

There’s no reason to make M larger than 176, and no reason to worry much about how you do the projection:

Compact Signatures

This paper was about time and space:

There are details about PTAM incorporation and a small appendix on compressive sensing, which we don’t do in detail here.

Ta :-*

TBD

• “Transformation-invariance”

• SLAM application

• Ferns vs. random trees
