Mathematics, Shape, Computer Vision

Massimo Ferri

Univ. of Bologna, Italy

[email protected]


• Introduction

• Contours (gradients etc)

• Alignments (Hough)

• Shape synthesis (Fourier etc.)

• Shape from X (Epipoles etc. etc.)

• Recognition (Transformation groups)

• Retrieval (Distances)

• Application: naevus/melanoma diagnosis

Ljubljana, 1/7/2013 M. Ferri - Mathematics, Shape, Computer Vision 2/62

Introduction

Some mathematically minded textbooks:

• R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2004.

• E. Trucco, A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice Hall 1998.

• O. Faugeras, Three-dimensional computer vision: a geometric viewpoint, MIT Press Cambridge, MA, USA 1993.

• Y. Ma, S. Soatto, J. Kosecka, S.S. Sastry, An Invitation to 3-D Vision: From Images to Geometric Models, Springer Verlag 2003.


Computer vision consists (mainly) of techniques for automatically obtaining information about a 3D environment (called the scene) from one or several images.

Applications:

• Biomedical applications
  – Diagnosis
  – Surgical aids
  – Aids for the handicapped
  – …

• Industrial applications
  – Inspection
  – Manipulation
  – …

• Optical Character Recognition
• Remote sensing
• Augmented reality
• Robot navigation
• Safety
• Security
• …


The levels of Computer Vision

• Low level
  – Extraction of elementary features
    • Alignments
    • Junctions
    • …
  – Segmentation
    • Contours
    • Regions
  – Textures
  – …


• Middle level
  – Correspondences
    • Stereo
    • Motion
  – Shape
    • Representation
    • Topology
    • Distances
  – Geometry
    • Convexity
    • Visibility
    • Decompositions
    • Invariants
    • Transforms
  – 3D
    • Shape from
      – Shading
      – Texture
      – Motion
      – Stereovision
      – Defocussing
    • Active vision
      – Interferometry
      – Structured light


• High level
  – Recognition
  – Pose estimation
  – Retrieval
  – Description
  – Man-machine interaction


What is an image?

• Continuous model

$f: D \to \mathbb{R}$, with $D \subset \mathbb{R}^n$

$\forall x \in D,\ 0 \le f(x) \le M$

• Discrete image
  – Sampling (finite point set)
  – Quantizing (finite value set)
  – Tessellation (pixels)
  – Representation (bits)


A very useful tool: Convolution

Continuous model:

The convolution of a function f with a kernel h is defined as:

$(f * h)(x) = \int_{\mathbb{R}^n} f(t)\, h(x - t)\, dt$


Discrete model:

Functions are replaced by matrices and integrals by sums.
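
A minimal sketch of the discrete model in pure Python, assuming zero padding outside the image and a kernel centred on its middle entry (both are choices made here, not stated on the slide). The names `convolve2d`, `image` and `kernel` are illustrative.

```python
def convolve2d(image, kernel):
    """Discrete 2D convolution with zero padding; the output has the size of image."""
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    cr, cc = krows // 2, kcols // 2  # position of the kernel centre
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(krows):
                for b in range(kcols):
                    # Convolution flips the kernel: the offset is subtracted.
                    y, x = i - (a - cr), j - (b - cc)
                    if 0 <= y < rows and 0 <= x < cols:
                        acc += kernel[a][b] * image[y][x]
            out[i][j] = acc
    return out


# Convolving a unit impulse with a kernel returns the kernel itself.
impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
result = convolve2d(impulse, kernel)
```

The impulse test is a quick sanity check that the sum really implements a convolution and not a correlation.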


An example: smoothing with a Gaussian kernel (to get rid of noise).
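
As an illustration, here is a one-dimensional sketch of Gaussian smoothing. The kernel radius, the value of sigma, the border handling (replicating the end samples) and the sample data are all assumptions made for the example.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sampled Gaussian, normalized so that the weights sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma=1.0, radius=3):
    """Convolve a 1D signal with a Gaussian kernel, replicating border samples."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i - k, 0), n - 1)  # clamp the index at the borders
            acc += kernel[k + radius] * signal[j]
        out.append(acc)
    return out


noisy = [10, 12, 9, 11, 30, 10, 9, 12, 10, 11]  # made-up data with one outlier
smoothed = smooth(noisy)
```

The outlier is spread over its neighbours and its peak is lowered, which is the price paid for noise suppression: genuine sharp features are blurred too.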


Contours (gradients etc)

To deal with objects, one first has to isolate them from the background.

One way is by finding contours.

A different strategy focusses on regions.



To detect contours, one finds the pixels where the norm of the gradient of f exceeds a given threshold $\tau$: $\|\nabla f\| > \tau$.

Partial derivatives are obtained by convolution. Various masks are available.
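
A minimal sketch of gradient thresholding with the Sobel masks. The masks are applied by correlation, which differs from convolution only in the sign of the derivative and so does not affect the norm; border pixels are skipped, and the test image and threshold are made up.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_mask(image, mask, i, j):
    """Weighted sum of the 3x3 neighbourhood of pixel (i, j)."""
    return sum(mask[a][b] * image[i + a - 1][j + b - 1]
               for a in range(3) for b in range(3))


def edge_pixels(image, threshold):
    """Interior pixels whose gradient norm exceeds the threshold."""
    rows, cols = len(image), len(image[0])
    edges = set()
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = apply_mask(image, SOBEL_X, i, j)
            gy = apply_mask(image, SOBEL_Y, i, j)
            if math.hypot(gx, gy) > threshold:
                edges.add((i, j))
    return edges


# A vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
edges = edge_pixels(image, threshold=10)
```

Raising or lowering `threshold` changes how many pixels are kept, which is the effect the "Different thresholds" slide shows.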



Contours obtained with the Canny, Sobel and Roberts operators.



Different thresholds



A different strategy: zero crossings of the Laplacian



(Apparent) contour: the projection of the rim, i.e. of the locus of critical points of the projection itself.



A wrong conclusion by the great vision scientist D. Marr: “In general, of course, points of inflection in a contour need have no significance for the surface”.



On the contrary, the inflection points of the contour correspond to points of zero Gaussian curvature of the surface!

J.J. Koenderink, What does the occluding contour tell us about solid shape?, Perception 13 (1984), 321-330.



Contours can be very informative on the topology of the observed surface.

Whitney, Haefliger, Koenderink, Weiss & Callahan, Pignoni, Edelsbrunner Morozov & Patel,…



As an alternative to contour extraction, one can focus on region growing.

Problem: replace the grey tone function f with a function h close to f, with smaller gradient (piecewise constant, if possible) and with a small total length of the discontinuity curves.

Solution: find h and a set Γ of discontinuity curves which minimize a suitable functional.



The Mumford-Shah functional
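
For reference, the functional is commonly written as below. The notation is an assumption and may differ from the original slide: f is the grey tone function on the image domain Ω, h is the approximating function, Γ is the set of discontinuity curves with total length |Γ|, and μ, ν are positive weights.

```latex
E(h,\Gamma) \;=\; \mu^{2} \iint_{\Omega} (h - f)^{2}\,dx\,dy
            \;+\; \iint_{\Omega \setminus \Gamma} \|\nabla h\|^{2}\,dx\,dy
            \;+\; \nu\,|\Gamma|
```

The three terms match the three requirements of the problem: h close to f, small gradient of h away from the discontinuities, and short discontinuity curves.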

D. Mumford, J. Shah, Optimal approximations by piecewise smooth functions and associated variational problems, Comm. Pure Appl. Math. 42 (1989), 577–685.



Segmentation by thresholding compared with Mumford-Shah segmentation.



A variant of the Mumford-Shah functional for image restoration: the subset D of unreliable pixels is excluded from minimization.

M. Nitzberg, D. Mumford, T. Shiota, Filtering, Segmentation and Depth, LNCS 662, Springer 1993.



Alignments (Hough)

Problem: Determine the straight lines on which the image elements are mainly aligned.


A solution: each image point P “votes” for points in the dual Hough plane (where each point represents a straight line), namely for the points representing the lines through P.

The points of the Hough plane that received the most votes represent the lines of the image plane on which most points lie.

In practice, in the Hough plane:
• Threshold
• Find clusters.
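
The voting scheme can be sketched as follows. This sketch assumes the (theta, rho) parametrization x cos(theta) + y sin(theta) = rho, a common choice that is not necessarily the one used on the slides, and a threshold expressed as a fraction of the maximum vote count; the point set is made up.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rho_step=1.0):
    """Each point votes for every cell (theta index, rho index) of a line through it."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes


def strongest_cells(votes, fraction=0.4):
    """Threshold step: keep the cells with at least `fraction` of the top vote count."""
    top = max(votes.values())
    return {cell: n for cell, n in votes.items() if n >= fraction * top}


# Ten points on the line y = x plus two stray points.
points = [(k, k) for k in range(10)] + [(3, 8), (7, 1)]
votes = hough_lines(points)
candidates = strongest_cells(votes)
```

The line y = x corresponds to theta = 135 degrees and rho = 0. Because of the discretization, neighbouring cells collect the same votes, which is why a cluster-finding step follows the thresholding.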


P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959


Hough transform with 40% threshold


The method can be generalized to other parametrized spaces of curves.


Radiotelescope “North Cross”, Istituto Nazionale di Astrofisica, SETI Project: 32 m paraboloid and E-W, N-S arrays.


Spectrometer “Serendip IV”:

Bandwidth: 15 MHz

Channels: 25,165,800

The output consists of matrices with columns corresponding to channels and rows to sampling times.


How to recognize an ET signal in the time-frequency matrices?

Its “signature” is the Doppler effect due to rotation/revolution. In the output matrices the signal appears as piecewise sinusoidal.


Signal received on July 7, 1998 by the radiotelescope “North Cross”


Shape synthesis (Fourier etc)

Under suitable continuity and integrability conditions, given a function f: R → C there exists its Fourier transform F: R → C.

Luckily, we can recover f from F by means of the inverse transform.

The Fourier transform reorganizes data in the frequency domain.


In the graphical representations of the Fourier transform of f, what is normally plotted is the amplitude function of F, called the Fourier spectrum of f. The transform of a 2D signal (typically an image) can be performed in two stages, along two coordinates in sequence.



Periodic functions (above) and their Fourier transforms



Elimination of undesired frequencies

Focussing

Data reduction


What about a discrete version of the Fourier transform of f: R → C? Answer: given N equally spaced samples (for simplicity x = 0, ..., N-1), an approximation of the Fourier transform of f is given by the discrete Fourier transform of the samples.

Conversely, given a suitable sampling of F, one obtains an approximation of f by the inverse discrete Fourier transform.
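
The discrete transform pair can be sketched directly from its definition. The normalization used here (the factor 1/N on the forward transform) is one common convention and an assumption of this sketch; other texts put it on the inverse. The sample values are made up.

```python
import cmath


def dft(samples):
    """F(u) = (1/N) * sum over x of f(x) * exp(-2*pi*i*u*x/N)."""
    n = len(samples)
    return [sum(samples[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
            for u in range(n)]


def inverse_dft(coeffs):
    """f(x) = sum over u of F(u) * exp(2*pi*i*u*x/N)."""
    n = len(coeffs)
    return [sum(coeffs[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n))
            for x in range(n)]


f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
F = dft(f)
recovered = inverse_dft(F)  # equals f up to rounding errors
```

With this convention F(0) is the mean of the samples. The direct sums cost O(N^2) operations; in practice a fast Fourier transform would be used.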


This can be used to synthesize a Jordan curve C. Let C be regularly sampled by N points $(x_0, y_0), \ldots, (x_{N-1}, y_{N-1})$; they can be seen as complex numbers $s_k = x_k + i\,y_k$. Now apply the discrete Fourier transform to the sequence $s_0, \ldots, s_{N-1}$.

The resulting numbers a(u) are called Fourier descriptors of the curve C (better: of the approximating polygon).

The original points are recovered by the inverse discrete Fourier transform.

But if you use only the descriptors relative to the lowest frequencies, you get a curve C’ which is smoother than C but fairly similar to it.
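
A minimal sketch of Fourier descriptors and of the low-frequency reconstruction. It assumes the 1/N factor on the forward transform and interprets "the M lowest frequencies" as the frequencies 0, ±1, ..., ±(M-1)/2 for odd M; both are assumptions. The helper `unit_square` is a hypothetical test curve.

```python
import cmath


def fourier_descriptors(points):
    """a(u) = (1/N) * sum over k of s_k * exp(-2*pi*i*u*k/N), with s_k = x_k + i*y_k."""
    s = [complex(x, y) for x, y in points]
    n = len(s)
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n)) / n
            for u in range(n)]


def reconstruct(a, m):
    """Rebuild N points from the m lowest frequencies only (m odd)."""
    n = len(a)
    half = m // 2
    # Index u and index n - u represent the frequencies +u and -u.
    kept = [u for u in range(n) if min(u, n - u) <= half]
    pts = []
    for k in range(n):
        z = sum(a[u] * cmath.exp(2j * cmath.pi * u * k / n) for u in kept)
        pts.append((z.real, z.imag))
    return pts


def unit_square(per_side):
    """Boundary of the unit square, sampled counterclockwise."""
    t = [k / per_side for k in range(per_side)]
    return ([(v, 0.0) for v in t] + [(1.0, v) for v in t] +
            [(1.0 - v, 1.0) for v in t] + [(0.0, 1.0 - v) for v in t])


curve = unit_square(10)               # N = 40 samples
a = fourier_descriptors(curve)
rounded = reconstruct(a, 5)           # smoother curve C'
exact = reconstruct(a, len(curve))    # all descriptors: the samples come back
```

Keeping only the zero frequency collapses the curve to its centroid; adding frequencies progressively restores the corners.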



A curve sampled with N=1024 points and its reconstructions from M=3, 21, 61, 201 and 401 descriptors.


A different change of basis is used for face recognition and retrieval, with the “eigenfaces” of a covariance matrix.



Information carried by pixel sampling: the original 98x98 image and its samplings at 3x3, 5x5, 11x11, 15x15 and 20x20.


Information carried by coefficients of eigenvectors



A combinatorial synthesis of the shape of a polyhedron is given by the aspect graph. An aspect is the set of faces which can be seen from a given direction. The vertices of the graph are the aspects, and two aspects are adjacent if and only if one can pass continuously from one to the other.



Any two images corresponding to the same aspect are related by piecewise homographies. There is a smooth version, in which equivalent images are related by diffeomorphisms of the contours. The Gaussian sphere is then divided into regions of directions yielding equivalent images. Transition curves can be labelled by “catastrophes”.



Shape from X (epipoles etc etc)

There are quite a lot of techniques for inferring shape from particular image features, e.g.

• stereo
• shading
• structured light
• motion
• focussing/defocussing
• zooming
• laser range finding
• texture
• ...



Shape from stereo



The scene point P, of coordinates (X, Y, Z), projects to the point p of coordinates (x, y) on the image plane. The relation between the coordinates depends on the focal length f:

$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$



As a first, simplified stereo setting, consider a pair of cameras with coplanar image planes, so that the optical axes are parallel. Assume also that the focal length is the same for both. Fix a Cartesian frame for each camera, so that the abscissa axes coincide with the straight line connecting the vantage points.



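Any scene point P projects to a point $p_l$ in the left image and to a point $p_r$ in the right image. The disparity $d = x_r - x_l$ is the variable which allows us to compute the depth Z of point P. From the similar triangles $O_l O_r P$ and $p_l p_r P$ one gets

$\frac{T + x_l - x_r}{Z - f} = \frac{T}{Z}$

whence

$Z = f\,\frac{T}{d}$

where T is the distance between the vantage points (the stereo base).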



For a single point the computation is simple, once the stereo base T is known. The real problem is the matching problem, i.e. understanding which points in the right image correspond to which points in the left image. The search for matching pairs is generally performed by comparing neighborhoods.
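
Both steps can be sketched on a single scanline. The matching criterion used here (sum of squared differences over a small window) is one common way of comparing neighborhoods and is an assumption, as are the function names and the made-up intensity rows; the depth formula is Z = f T / d with d = x_r - x_l.

```python
def best_match(left_row, right_row, x, half_window=1):
    """Column of right_row whose neighbourhood best matches that of left_row[x] (SSD)."""
    patch = left_row[x - half_window: x + half_window + 1]
    best_x, best_cost = None, float("inf")
    for c in range(half_window, len(right_row) - half_window):
        candidate = right_row[c - half_window: c + half_window + 1]
        cost = sum((p - q) ** 2 for p, q in zip(patch, candidate))
        if cost < best_cost:
            best_x, best_cost = c, cost
    return best_x


def depth_from_disparity(x_left, x_right, focal_length, stereo_base):
    """Z = f * T / d, with disparity d = x_right - x_left."""
    d = x_right - x_left
    if d == 0:
        raise ValueError("zero disparity: the point is at infinity")
    return focal_length * stereo_base / d


left_row = [0, 0, 5, 9, 5, 0, 0, 0]
right_row = [0, 0, 0, 0, 5, 9, 5, 0]
x_l = 3
x_r = best_match(left_row, right_row, x_l)
Z = depth_from_disparity(x_l, x_r, focal_length=8.0, stereo_base=100.0)
```

On real images the window comparison is ambiguous in uniform or repetitive regions, which is why the matching problem, and not the depth formula, is the hard part.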



A big help (when the image planes are at an angle) comes from the epipolar constraint: Given a point p in the left image, the search for a match in the right image can be limited to the epipolar line which corresponds (under a computable projectivity) to the epipolar line containing p.



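To be precise, the homogeneous coordinates $X_r$ and $X_l$ of matching points are conjugate:

$X_r^T E X_l = 0$

where E is obtained as follows. Let $x_l = R\,x_r + T$ be the orthogonal Cartesian frame transformation; then $E = R^T S$ with

$S = \begin{pmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{pmatrix}$

the skew-symmetric matrix associated with $T = (T_x, T_y, T_z)$.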
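
A small numerical check of the constraint, assuming that S is the skew-symmetric (cross-product) matrix of T and using 3D point coordinates in the two camera frames as homogeneous image coordinates. The rotation, translation and point are made up.

```python
def skew(t):
    """Matrix S such that S v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential(rotation, translation):
    """E = R^T S for the frame change x_l = R x_r + T."""
    return matmul(transpose(rotation), skew(translation))


def epipolar_residual(e, x_right, x_left):
    """Value of x_r^T E x_l; it vanishes for matching points."""
    ex = matvec(e, x_left)
    return sum(x_right[i] * ex[i] for i in range(3))


R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z
T = [1.0, 0.5, 0.2]
P_right = [1.0, 2.0, 5.0]
P_left = [a + b for a, b in zip(matvec(R, P_right), T)]   # same scene point
E = essential(R, T)
residual = epipolar_residual(E, P_right, P_left)
```

For a fixed left point, the equation is linear in the right point, so the candidates for a match lie on a line: the epipolar line.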



Shape from shading



Image irradiance E(p): light power per unit area received at the image point p.

Scene radiance L(P): light power per unit area emitted at the scene point P in direction d.

Lambertian surface: a surface whose radiance is constant w.r.t. direction d.

I(P): vector with the direction of the incident light at P, and modulus equal to the incident light power per unit area.

n(P): unit normal vector at P.

Ljubljana, 3/7/2013 M. Ferri - Mathematics, Shape, Computer Vision 14/77


For a Lambertian surface:

$L(P) = \rho\, I(P)^T n(P)$

where the albedo ρ is a positive constant. Then

$E(p) = L(P)\,\frac{\pi}{4}\left(\frac{d}{f}\right)^2 \cos^4\alpha$

where d is the lens diameter, f is the focal length and α is the (small) angle between pP and the optical axis. So, to a good approximation, E(p) is proportional to $I(P)^T n(P)$.



The possible normal directions at P form a whole cone, so observing a single point cannot reveal the normal direction. But we generally see a whole region; if there is a point where the normal is known, then we can get the normals of all points of the region.



When lighting is controlled, another strategy is possible: one lights up the object of interest with three different light sources. Each produces a cone of potential normals at each point; the intersection of the three cones is the true normal. Knowledge of normals and face areas (for polyhedra) and of Gaussian curvatures (for smooth surfaces) makes a complete reconstruction possible (convex case)!
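
With three light sources, intersecting the three cones amounts to solving a 3x3 linear system. The sketch below assumes a Lambertian surface, three known and linearly independent light vectors, and brightness values already scaled so that each equals ρ times the scalar product of light vector and normal. The function names and numbers are illustrative.

```python
import math


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m g = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions are (nearly) coplanar")
    solution = []
    for col in range(3):
        replaced = [[rhs[i] if j == col else m[i][j] for j in range(3)]
                    for i in range(3)]
        solution.append(det3(replaced) / d)
    return solution


def normal_and_albedo(lights, brightness):
    """Each row of lights is one light vector I_k; brightness[k] = rho * I_k . n."""
    g = solve3(lights, brightness)          # g = rho * n
    rho = math.sqrt(sum(c * c for c in g))
    return [c / rho for c in g], rho


lights = [[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
brightness = [0.4, 0.4, 0.7]  # generated from n = (0, 0.6, 0.8), rho = 0.5
n, rho = normal_and_albedo(lights, brightness)
```

The system is solved independently at every pixel, so no point with known normal is needed, unlike in the single-image case.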



Theorem (Minkowski, 1897) – Given two convex polyhedra in Euclidean 3D space, if a bijection between their face sets exists, such that corresponding faces have the same normal and the same area, then the polyhedra are congruent. (Note that the equality of the numbers of edges of corresponding faces is not required: it follows automatically!)



Luckily, there is a smooth version of the theorem: Theorem (Aleksandroff, 1942) – Given two convex solid bodies in Euclidean 3D space, if a bijection between their boundaries exists, such that corresponding points have the same normal and the same Gaussian curvature, then the solid bodies are congruent.



In practice, knowledge of the depth of one point and the direction (a,b,-1) of the normal at each point yields the depth of the other points of the region by integration:
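
A minimal sketch of the integration on a pixel grid. It assumes that the normal direction (a, b, -1) means a = ∂Z/∂x and b = ∂Z/∂y, that x runs along columns and y along rows, and that the known depth is the one at the top-left pixel. The integration path (down the first column, then along each row) and the trapezoidal rule are choices made here.

```python
def integrate_depth(a, b, z0, step=1.0):
    """Recover Z on a grid from a ~ dZ/dx, b ~ dZ/dy and the depth z0 at pixel (0, 0)."""
    rows, cols = len(a), len(a[0])
    z = [[0.0] * cols for _ in range(rows)]
    z[0][0] = z0
    for i in range(1, rows):
        # Move down the first column, integrating dZ/dy.
        z[i][0] = z[i - 1][0] + 0.5 * (b[i - 1][0] + b[i][0]) * step
    for i in range(rows):
        for j in range(1, cols):
            # Move along each row, integrating dZ/dx.
            z[i][j] = z[i][j - 1] + 0.5 * (a[i][j - 1] + a[i][j]) * step
    return z


# Normals of the plane Z = 2x + 3y + 1 on a 3x4 grid.
a = [[2.0] * 4 for _ in range(3)]
b = [[3.0] * 4 for _ in range(3)]
depth = integrate_depth(a, b, z0=1.0)
```

With noisy normals the result depends on the path, so practical methods average over several paths or solve a least-squares problem instead.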



Shape from structured light



In controlled environments, it is possible to project a sheet of light or a grid onto the object of interest. The deformation imposed by the surface gives information on the local shape. Actually, structured light can be reduced to stereo, since the undeformed grid can be considered as the second image. The problem of matching, however, comes into play again.



A tentative disambiguating grid



A texture – even a not too regular one – plays much the same rôle as structured light. Its deformation on a surface gives again information on the normal direction. (This is – at least intuitively – well-known to fashion designers)



An interesting variation on structured light is the use of moiré fringes, interference patterns that one gets by projecting a very thin and dense grid, and interposing a similar grid in front of the lens. The fringes are level curves of a Morse function defined on the surface. Their singular points then carry topological information on the surface itself.



Attention gets focused on the singular points. They correspond to critical points of the function, and show their indices.



Recognition (transformation groups)

The simplest type of shape recognition is by superimposition: One tries to deform a template into the given image. Problem: Different environments imply different transformation groups; the wider the group, the greater is the freedom, but also the computational complexity, due to a greater number of parameters.



• translation: movement conserving a direction
• movement (direct congruence): det > 0
• congruence: nonvanishing det
• similitude
• affinity
• homography
• homeomorphism



But here there is a homeomorphism too!



Instead of superimposing the images themselves, one can extract from the image invariants with respect to the given transformation groups. E.g. lengths and angles are invariant under movements; angles and length ratios under similitudes; area ratios under affinities.



An important invariant under similitudes is the form factor $C = 4\pi A / P^2$ (A area, P perimeter). An affine invariant for triples of collinear points B, C, D is the simple ratio d(B,D)/d(C,D). For homographies we have the cross ratio of four collinear points B, C, D, E: [d(B,D) d(C,E)] / [d(C,D) d(B,E)].

As for homeomorphisms, the whole body of Algebraic Topology is expressly dedicated to the construction and study of suitable topological invariants.
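
The similitude, affine and projective invariants above can be checked numerically. In this sketch collinear points are represented by their abscissas on the common line, and the particular affine map and homography of the line are made-up examples.

```python
import math


def form_factor(area, perimeter):
    """C = 4*pi*A / P^2; it equals 1 for a disc."""
    return 4.0 * math.pi * area / perimeter ** 2


def simple_ratio(b, c, d):
    """d(B,D) / d(C,D) for three collinear points."""
    return abs(d - b) / abs(d - c)


def cross_ratio(b, c, d, e):
    """[d(B,D) d(C,E)] / [d(C,D) d(B,E)] for four collinear points."""
    return (abs(d - b) * abs(e - c)) / (abs(d - c) * abs(e - b))


def affine_map(x):
    return 3.0 * x + 2.0


def homography(x):
    """A projective map of the line; determinant 2*3 - 1*0.5 is nonzero."""
    return (2.0 * x + 1.0) / (0.5 * x + 3.0)


pts = [0.0, 1.0, 2.0, 4.0]
cr_before = cross_ratio(*pts)
cr_after = cross_ratio(*[homography(x) for x in pts])
sr_before = simple_ratio(*pts[:3])
sr_after = simple_ratio(*[affine_map(x) for x in pts[:3]])
sr_projective = simple_ratio(*[homography(x) for x in pts[:3]])
```

The simple ratio survives the affine map but not the homography, while the cross ratio survives both: the wider the group, the fewer the invariants.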



A very general setting with practical applications in shape recognition is Persistent Topology, initiated in the ’90s with the name of Size Theory. It tries to answer a rather philosophical type of question: What is shape?

P. Frosini, Measuring shapes by size functions, Proc. of SPIE, Intelligent Robots and Computer Vision X: Algorithms and Techniques, Boston, MA, 1991, vol. 1607.



Which object has “the same shape” as the upper circle? In our opinion, this depends on the observer (his/her viewpoint, interest, tasks…).



similitudes


affinities



homographies



homeomorphisms



Retrieval (distances)

Search engines which search for images using images (instead of words) are already coming out of the early research stage. Whatever the shape representation they use, they obviously need smart distances.



How much do these two curves differ w.r.t. ordinate as a measuring function?

Our proposal: the minimum cost, in terms of the measuring functions f and f’, of transforming one into the other (the natural pseudodistance).


P. Frosini, M. Mulazzani, Size homotopy groups for computation of natural size distances, Bull. of the Belgian Math. Soc. - Simon Stevin, 6 (1999), 455-464.



But the natural pseudodistance is difficult (or even impossible) to compute. Therefore we need a computable lower bound for it. Luckily, we have it: the matching distance between the already seen Persistent Betti Number functions of the size pairs.



The matching distance

All information carried by a Persistent Betti Number function can be condensed into the formal series of its cornerpoints.



It turns out that the matching distance between size functions yields a lower bound for the natural pseudodistance.
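
A brute-force sketch of a matching (bottleneck-type) distance between two finite sets of cornerpoints. It makes several assumptions: cornerpoints are pairs (u, v) with u < v, multiplicities are given by repeating a point, cornerpoints at infinity are ignored, a point may be matched to the diagonal at cost (v - u)/2, and two proper points are compared in the max norm. The search over all permutations is only feasible for a handful of points.

```python
from itertools import permutations


def matching_distance(corners_a, corners_b):
    """Smallest possible value of the largest cost over all matchings."""
    # Pad with None (the diagonal) so that every point may remain unmatched.
    a = list(corners_a) + [None] * len(corners_b)
    b = list(corners_b) + [None] * len(corners_a)

    def cost(p, q):
        if p is None and q is None:
            return 0.0
        if p is None:
            return (q[1] - q[0]) / 2.0   # distance of q from the diagonal
        if q is None:
            return (p[1] - p[0]) / 2.0
        return float(max(abs(p[0] - q[0]), abs(p[1] - q[1])))

    best = float("inf")
    for perm in permutations(range(len(b))):
        worst = max((cost(a[i], b[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, worst)
    return best


A = [(0, 4), (1, 2)]
B = [(0, 5)]
d = matching_distance(A, B)
```

Here the best matching pairs (0, 4) with (0, 5) and sends the short-lived cornerpoint (1, 2) to the diagonal, so small features contribute little to the distance.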

S. Biasotti, A. Cerri, P. Frosini, D. Giorgi, C. Landi, Multidimensional size functions for shape comparison, Journal of Mathematical Imaging and Vision 32 (2008), 161–179.

A. Cerri, B. Di Fabio, M. Ferri, P. Frosini, C. Landi, Betti numbers in multidimensional persistent homology are stable functions, Math. Meth. Appl. Sci. (2012), DOI: 10.1002/mma.2704.



We used it successfully for searching a database of randomly generated polygonals...



… a database of randomly generated smooth curves …



… a database of trade marks …



… and a public database of sea fauna.

Queries with the first three retrieved items, for CSS and for our system.



Application: naevus/melanoma diagnosis

Melanocytic lesion images acquired under polarized light and mild magnification.

• Goal: distinction between naevus and melanoma

• Problems:

– No template for either class

– Various diagnostic criteria

– Morphological analysis not always sufficient

– Processing speed compatible with medical consulting room environment



We concentrated on the search for asymmetries of

• boundary shape

• color distribution

• pattern distribution.

This is performed for each lesion as follows.

We take a bundle of 45 lines through the center of mass, and for each we compare the two halves of the lesion, separated by the line.



Instead of making a geometric comparison (e.g. by superimposition) we performed a qualitative comparison by computing the distance between the 0-PBNs of the two halves.

The chosen measuring functions are:
• distance from the splitting line
• sum of luminance along segments
• sum of color variations along segments.

Then, the distances from all splittings are gathered into a function (called A-curve), of which some classical invariants (one for each A-curve) are computed.



An image and one of its splittings.

The A-curve of the image (meas. fct.: luminance).



From this curve the software extracts min, max, average, min plus the value at 90° from min, integral, first moment, variation, min derivative, max derivative, integral of absolute value of derivative, variation of absolute value of derivative.

A Support Vector Machine with a 3rd order kernel is fed with these numbers, computed for each measuring function.



melanoma

naevus

The vectors also contain three more parameters: area, perimeter, and a bumpiness measure coming from the 0-PBN’s of the whole lesion, with distance from center of mass as the filtering function.



Experimentation

The data set contains 50 melanomas and 927 naevi. Receiver Operating Characteristic (ROC) curve: It plots Sensitivity vs. (1-Specificity)
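
How such a curve is built can be sketched as follows. The scores and labels are made up and are not the data of the experiment; a higher score is assumed to mean "more likely melanoma", and the area under the curve is added as a common summary that the slides do not mention.

```python
def roc_points(scores, labels):
    """Points (1 - specificity, sensitivity) for every decision threshold.

    labels: 1 = melanoma (positive), 0 = naevus (negative).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC polyline."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical classifier outputs
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
auc = area_under_curve(curve)
```

With very unbalanced classes, such as 50 melanomas against 927 naevi, the ROC curve is convenient because sensitivity and specificity are each computed within one class.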

M. Ferri, I. Stanganelli, Size functions for the morphological analysis of melanocytic lesions, Int. J. Biomed. Imaging 2010 (2010), Article ID 621357, doi:10.1155/2010/621357


THANKS FOR YOUR ATTENTION !

[email protected]

http://vis.dm.unibo.it

