recognition by linear combinations of models by shimon ullman & rosen basri presented by: piotr...

Recognition by Linear Combinations of Models

By Shimon Ullman & Ronen Basri

Presented by: Piotr Dollar

Nov. 19, 2002

Key Ideas

A 2D image that has undergone a linear transformation can be expressed as a linear combination of a few other 2D images!

This is very, very cool! Note: not a linear combination of pixels, but rather a linear combination of point coordinates.

Given a new 2D image of an object, we can test to see if it is a transformed version of a model object by seeing if it is a linear combination of 2D model images. Object Recognition.

Aside: Orthographic Projection

Given an object we take its orthographic projection (lose the z coordinate). This is an approximation of the perspective projection that makes the math more tractable.

So long as the object is small compared to its distance from the camera, the orthographic projection is a good approximation.
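The claim about object size versus camera distance can be illustrated numerically. The sketch below is not from the talk: it assumes a pinhole camera with focal length 1, a hypothetical cloud of random points in a unit cube, and it rescales the orthographic image by 1/distance so that the two images have comparable overall size (the comparison is therefore against a scaled orthographic projection).

```python
import numpy as np

def orthographic(points):
    """Drop the z coordinate of each (x, y, z) point."""
    return points[:, :2]

def perspective(points, focal_length=1.0):
    """Pinhole projection: scale (x, y) by focal_length / z."""
    return focal_length * points[:, :2] / points[:, 2:3]

def relative_gap(distance, seed=0):
    """Largest gap between the two projections, relative to the image size."""
    rng = np.random.default_rng(seed)
    # Hypothetical object: 50 random points in a cube of side 1 on the optical axis.
    obj = rng.uniform(-0.5, 0.5, size=(50, 3))
    placed = obj + np.array([0.0, 0.0, distance])
    persp = perspective(placed)
    # Assumption: rescale by 1/distance so both images share the same overall size.
    ortho = orthographic(placed) / distance
    return np.max(np.abs(persp - ortho)) / np.max(np.abs(persp))

near = relative_gap(2.0)     # object size comparable to its distance
far = relative_gap(200.0)    # object small compared to its distance
print(near, far)
```

Under these assumptions the gap shrinks as the object moves away from the camera, which is the regime the talk works in.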

Producing the 2D image

Take the orthographic projection of an object to get a 2D edge map.

The rim is the “set of points on object’s surface whose normal is perpendicular to the viewing direction”, and the orthographic projection of the rim generates the silhouette.

Objects with Smooth Boundaries

More difficult, since as perspective changes, so do the points on the rim.

We can estimate curvature at each point and thus how the boundaries would change.

This is a generalization, since objects with sharp edges can be dealt with as smooth but having very little curvature.

Leads to ugly math, but same high level idea.

Serge said I can skip this, so I will! From now on we will deal only with objects with sharp edges.

Rest of Presentation

For most of the talk, we focus only on the case of rotation about the vertical axis. Going through the details will demonstrate many of the key ideas of Ullman’s approach.

Next we will show how to find the coefficients of the linear combination.

Finally we show how to modify the algorithm if general linear and rigid transformations are allowed.

This is a different format from the paper.

Ok

here we go

Rotation around the Vertical Axis

z is the viewing direction, x is the horizontal axis, y is the vertical axis.

For now assume no occlusion.

Given an object O, let P1 be an image (orthographic) of O, P2 another image of O after it has been rotated by α (where α ≠ kπ), and P̂ an image of O after a rotation of O by θ.

The projection of p = (x, y, z) from O is:

p1 = (x1, y1) = (x, y) in P1
p2 = (x2, y2) = (x cos α + z sin α, y) in P2
p̂ = (x̂, ŷ) = (x cos θ + z sin θ, y) in P̂
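A minimal NumPy sketch of these three projections follows. The object points and the two angles are made-up values for illustration, and the rotation matrix uses the sign convention that gives x cos(angle) + z sin(angle) for the rotated x-coordinate, matching the formulas above.

```python
import numpy as np

def rotate_about_y(points, angle):
    """Rotate (x, y, z) points about the vertical (y) axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    # Sign convention chosen so that x' = x cos(angle) + z sin(angle).
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return points @ R.T

def project(points):
    """Orthographic projection along z: keep (x, y)."""
    return points[:, :2]

# Hypothetical object O: a few 3D points (not from the talk).
O = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, -1.0],
              [-1.0, 2.0, 0.5]])
alpha, theta = np.radians(30.0), np.radians(50.0)

P1 = project(O)                             # first model image
P2 = project(rotate_about_y(O, alpha))      # second model image
P_new = project(rotate_about_y(O, theta))   # image to be explained
print(P2)
```

Only the x-coordinates differ between the three images; the y-coordinates are untouched by a rotation about the vertical axis.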

The Cool Part

For any θ there exist a, b such that for every point p in O:

x̂ = a x1 + b x2

That is, every point in the third image is a linear combination of the corresponding points in the first and second images. So, if the first image had k points with coordinates (x1_1, y1_1)…(x1_k, y1_k), and after rotating by α the points had coordinates (x2_1, y2_1)…(x2_k, y2_k), then after a rotation by θ the points would have x-coordinates:

[x̂_1, …, x̂_k] = a[x1_1, …, x1_k] + b[x2_1, …, x2_k]

Proof

Let:
a = sin(α-θ) / sin α
b = sin θ / sin α

Then:
a x1 + b x2
= [sin(α-θ) / sin α] x + [sin θ / sin α] (x cos α + z sin α)
= x [sin(α-θ) + sin θ cos α] / sin α + z sin θ
= x cos θ + z sin θ

(The last step uses sin(α-θ) = sin α cos θ - cos α sin θ.)

That’s it.
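The identity in the proof can also be checked numerically. This is only a sanity check under made-up inputs: the object coordinates are random, and the angles 0.7 and 1.9 radians are arbitrary choices with the first one not a multiple of π.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical object: x and z coordinates of 20 points (y plays no role here).
x, z = rng.normal(size=20), rng.normal(size=20)
alpha, theta = 0.7, 1.9   # radians; alpha must not be a multiple of pi

x1 = x                                            # x-coordinates in the first image
x2 = x * np.cos(alpha) + z * np.sin(alpha)        # after rotating by alpha
x_new = x * np.cos(theta) + z * np.sin(theta)     # after rotating by theta

a = np.sin(alpha - theta) / np.sin(alpha)
b = np.sin(theta) / np.sin(alpha)

# The same pair (a, b) should reproduce every point of the third image.
max_error = np.max(np.abs(a * x1 + b * x2 - x_new))
print(max_error)
```

The error is at the level of floating-point rounding, and the same two coefficients work for all points at once, which is what makes the test on a whole image possible.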

Application

Suppose we have the images P1 and P2 of the object O. Now we are given some new image P (with labeled points) and asked if P could be an image of O after a rotation about the vertical axis.

Intuition: if we can show there is an a, b such that [x̂_1, …, x̂_k] = a[x1_1, …, x1_k] + b[x2_1, …, x2_k], where the x̂_i are the x-coordinates of the points in P, then it is possible that P is an image of O (but we can never be sure). If no such a, b exist then P cannot be an image of O.

But wait! (constraints on a and b)

However, recall that a and b are related:
a = sin(α-θ) / sin α
b = sin θ / sin α

Can show that the following relation must hold between a and b (just plug and chug):

a² + b² + 2ab cos α = 1

Thus, to show that P is possibly an image of O, a and b must satisfy the additional condition given above.
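For readers who want the algebra spelled out, here is one way to do the substitution. The shorthand u = α - θ and v = θ (so that u + v = α) is introduced only for this derivation; multiplying the left-hand side of the constraint by sin²α clears the denominators of a and b.

```latex
\begin{align*}
\sin^2\alpha\,\bigl(a^2 + b^2 + 2ab\cos\alpha\bigr)
  &= \sin^2 u + \sin^2 v + 2\sin u\,\sin v\,\cos(u+v) \\
  &= \sin^2 u + \sin^2 v - 2\sin^2 u\,\sin^2 v + 2\sin u\cos u\,\sin v\cos v \\
  &= \sin^2 u\cos^2 v + \cos^2 u\sin^2 v + 2\sin u\cos v\,\cos u\sin v \\
  &= (\sin u\cos v + \cos u\sin v)^2 \\
  &= \sin^2(u+v) = \sin^2\alpha .
\end{align*}
```

Dividing both sides by sin²α, which is nonzero because α is not a multiple of π, gives the constraint.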

Testing the constraint

Note that in order to test (a² + b² + 2ab cos α = 1) we would need to know α. This poses a serious problem since all we have is two images P1 and P2. So how do we proceed?

Approach 1: Recover α From 3D Structure

This requires first recovering the 3D structure of the object O, which defeats the purpose!

One of the nicest things about Ullman’s method is that it does not require us to know the 3D structure of the object! If we had the 3D structure of the object then other methods could be used.

(If we wanted 3D structure we could use the “structure from motion” (SFM) theorem, which says that given 3 orthographic projections of 4 non-coplanar points we can recover the structure. Note that we would need an additional image of the model.)

Approach 2: Recover α Directly

We can use the constraint itself to recover α. That is, if we had an a, b that we were sure satisfied a² + b² + 2ab cos α = 1, then we could find α from this equation based on a and b.

If we know that a third image of O, call it P3, was taken after a rotation about the y-axis, and we find P3 in terms of P1 and P2, then we get a, b that we know satisfy the constraint and can thus calculate α.

Note that this again requires 3 model images, just like an application of SFM.
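A rough sketch of this recovery, under assumptions that are not in the talk: the helper name `recover_alpha` is hypothetical, the object points are random, and the coefficients are obtained by least squares. Solving the constraint for the cosine gives cos α = (1 - a² - b²) / (2ab), which needs a and b to be nonzero and only determines α up to sign; the code returns the value in [0, π].

```python
import numpy as np

def recover_alpha(a, b):
    """Solve a^2 + b^2 + 2ab cos(alpha) = 1 for alpha in [0, pi]; needs a*b != 0."""
    cos_alpha = (1.0 - a**2 - b**2) / (2.0 * a * b)
    # Clip guards against rounding pushing the value slightly outside [-1, 1].
    return np.arccos(np.clip(cos_alpha, -1.0, 1.0))

rng = np.random.default_rng(1)
x, z = rng.normal(size=10), rng.normal(size=10)   # hypothetical object points
true_alpha, theta3 = 0.7, 1.9                     # made-up rotation angles (radians)

x1 = x                                                  # x-coordinates in P1
x2 = x * np.cos(true_alpha) + z * np.sin(true_alpha)    # x-coordinates in P2
x3 = x * np.cos(theta3) + z * np.sin(theta3)            # x-coordinates in P3

# Express P3 in terms of P1 and P2, then read alpha off the constraint.
(a, b), *_ = np.linalg.lstsq(np.column_stack([x1, x2]), x3, rcond=None)
alpha_hat = recover_alpha(a, b)
print(alpha_hat)
```

With noise-free synthetic points the recovered angle matches the one used to generate P2; with real correspondences the estimate would only be as good as the fitted a and b.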

Approach 3: Ignore the constraint

That is, do not test if a and b satisfy the constraint. This will increase the chance of “false positives”: the chance that P is a linear combination of P1 and P2 even though it is not an image of O.

Note that false positives are already possible (since an image P of some object O2 could look just like an image of O after some rotation would).

As the number of points increases the likelihood of a false positive falls drastically anyway (according to Ullman).

This is the approach he uses. Although he tells us what constraints must be satisfied by the coefficients in different cases, he never uses these constraints.

Step back

take a deep breath

Finding a and b

But how do we actually find a and b?

We are given k model images Mi. Each model image is simply two vectors: a vector of the x coordinates Mi_x and a vector of the y coordinates Mi_y. We are also given an image P as two vectors Px and Py.

We can choose k. For the case of rotation about the y-axis k must be at least 2.

Now we want to find a series of coefficients such that:

Px = c1 * M1_x + … + ck * Mk_x
Py = d1 * M1_y + … + dk * Mk_y

(note that in the case of rotation about the vertical axis the y-coordinates of points do not change)

Minimal Alignment

We have 2k equations with 2k unknowns. Can solve this explicitly to get the coefficients ci and di.

Let X = [M1_x … Mk_x]. Then c = X⁻¹ Px.

If we use an over-determined system (additional model images) then we take the pseudo-inverse of X.
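The coefficient solve can be sketched with NumPy as follows. The function name `fit_coefficients` and the synthetic data are assumptions for illustration. In this sketch there are more points than model images, so X is not square and the pseudo-inverse is used; when X is square and invertible the pseudo-inverse coincides with X⁻¹. The residual is included because it is what a recognition test would look at.

```python
import numpy as np

def fit_coefficients(models, target):
    """Least-squares coefficients c with target ~ sum_i c[i] * models[i]."""
    X = np.column_stack(models)       # one column per model image
    c = np.linalg.pinv(X) @ target    # equals inv(X) @ target if X is square and invertible
    residual = np.linalg.norm(X @ c - target)
    return c, residual

rng = np.random.default_rng(2)
n_points = 8
x, z = rng.normal(size=n_points), rng.normal(size=n_points)   # hypothetical object

M1_x = x                                   # x-coordinates in model image 1
M2_x = x * np.cos(0.7) + z * np.sin(0.7)   # model image 2, rotated by 0.7 rad
Px = x * np.cos(1.9) + z * np.sin(1.9)     # new view of the same object
Qx = rng.normal(size=n_points)             # x-coordinates unrelated to the object

c, res_same = fit_coefficients([M1_x, M2_x], Px)
_, res_other = fit_coefficients([M1_x, M2_x], Qx)
print(c, res_same, res_other)
```

For the view of the same object the residual is essentially zero and c reproduces the a and b from the proof; for unrelated coordinates the residual stays clearly away from zero. The y-coordinates would be handled the same way with Mi_y and Py.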

Getting the Models and P

Select a series of features that appear in all the model images as well as the image P. Find the correspondences. This gives you Mi_x, Mi_y and Px, Py.

Occlusion not a problem whatsoever so long as we have good segmentation and correspondence algorithms.

Then, given Mi_x, Mi_y and P…

Other approaches

Brute force: search for a, b. Why do this?

Linear Mappings: Find an L such that for any V that is a linear combination of the Mi_x,

L*V = c * q

where c is a scalar and q some fixed vector.

Just linear algebra, nothing particularly more interesting than minimal alignment.

Ok

On to general linear transformations

General Linear Transformations

General Rotation in 3D space
Rigid Transformations & Scaling in 3D space
Using two views only

(on board to avoid matrix in PPT)

Other References

High-level Vision by Ullman, MIT Press 1997 (especially chapter 5).
