
November 8, 2010 12:1 WSPC/S0219-8436 191-IJHRS0219843610002234

International Journal of Humanoid Robotics Vol. 7, No. 3 (2010) 429–450
© World Scientific Publishing Company
DOI: 10.1142/S0219843610002234

GEOMETRIC TECHNIQUES FOR HUMANOID PERCEPTION

ALBERTO PETRILLI-BARCELÓ∗, HERIBERTO CASARRUBIAS-VARGAS†, MIGUEL BERNAL-MARIN‡, EDUARDO BAYRO-CORROCHANO§ and RÜDIGER DILLMAN

Department of Electrical Engineering and Computer Science, CINVESTAV Unidad Guadalajara, Guadalajara, Zapopan, 45015, Mexico

∗[email protected]
†[email protected]
‡[email protected]
§[email protected]

Received 12 August 2009
Accepted 18 April 2010

In this article, we propose a conformal model for 3D visual perception. In our model, the two views are fused in an extended 3D horopter model. For visual simultaneous localization and mapping (SLAM), an extended Kalman filter (EKF) technique is used for 3D reconstruction and determination of the robot head pose. In addition, the Viola and Jones machine-learning technique is applied to improve the robot relocalization. The 3D horopter, the EKF-based SLAM, and the Viola and Jones machine-learning technique are key elements for building a strong real-time perception system for robot humanoids. A variety of interesting experiments show the efficiency of our system for humanoid robot vision.

Keywords: Robot humanoid vision; 3D reconstruction; tracking and relocalization.

1. Introduction

This article presents a geometric approach to building a humanoid perception system. The traditional 2D horopter is reformulated as a 3D horopter using Conformal Geometric Algebra (CGA). In this framework, the visual space is represented as a family of horopter spheres, which, together with their Poncelet points, lead remarkably to a 3D log-polar representation of the visual space. There is abundant research activity on image processing using 2D log-polar schemes with either monocular or stereo systems.1,2 However, this kind of work basically represents the Cartesian world using polar coordinates, and it fails to fuse the data from the two cameras of the stereoscopic vision system into a single framework. We believe that our human-like computational scheme looks promising for the processing of visual space data. Each camera in the perception system utilizes an extended Kalman filter (EKF)-based SLAM approach for 3D reconstruction and estimates the pose


Int. J. Human. Robot. 2010.07:429-450. Downloaded from www.worldscientific.com by CENTRO DE INVESTIGACION Y DE ESTUDIOS AVANZADOS DEL IPN (CINVESTAV) SERVICIOS BIBLIOGRAFICOS on 10/19/12. For personal use only.


and motion of the binocular head. Like human eyes, the cameras work in a master-slave fashion. To initialize and improve the 3D reconstruction, stereo triangulation is used. Since robot drifting causes a loss of position, the robot uses both the Viola and Jones machine-learning technique and the EKF-based SLAM method to reduce the covariance error in localization and pose. Furthermore, when the robot returns to previously visited navigation areas, it reutilizes old features stored in its state vector, which consequently helps to diminish the EKF covariance degradation. Several experiments confirm the efficiency of our approach, which combines the EKF filtering and the Viola and Jones machine-learning technique in a synergistic and cooperative fashion. The structure of this article comprises the following sections. Section 2 gives a brief introduction to CGA. Section 3 describes the conformal model for stereoscopic perception and provides insights into the implementation details. Section 4 explains and presents many experiments for robot egomotion, 3D reconstruction, and relocalization using an EKF-based SLAM and the Viola and Jones machine-learning approach; both techniques work cooperatively. Section 5 is devoted to the conclusions.

2. Geometric Algebra: An Outline

The Geometric Algebra (GA) Gp,q,r is constructed over the vector space Vp,q,r, where p, q, r denote the signature of the algebra; if q = 0 and r = 0, the metric is Euclidean; if only r = 0, the metric is pseudo-Euclidean; if r ≠ 0, the metric is degenerate. The dimension of Gn=p+q+r is 2^n, and Gn is constructed by the application of the geometric product over the basis vectors ei. The geometric product between two vectors a, b is defined as

ab = a · b + a ∧ b

and it has two parts: the inner product a · b is the symmetric part, while the wedge product (outer product) a ∧ b is the antisymmetric part.

In Gp,q,r, the geometric product of two basis vectors is defined as

eiej :=  1 ∈ R    for i = j ∈ {1, . . . , p},
        −1 ∈ R    for i = j ∈ {p + 1, . . . , p + q},
         0 ∈ R    for i = j ∈ {p + q + 1, . . . , n},
        eij = ei ∧ ej    for i ≠ j;

this leads to a basis for Gn that contains elements of different grades called blades (e.g. scalars, vectors, bivectors, trivectors, etc.):

1, ei, ei ∧ ej, ei ∧ ej ∧ ek, . . . , e1 ∧ e2 ∧ · · · ∧ en,

which is called a basis blade; the element of maximum grade is the pseudoscalar I = e1 ∧ e2 ∧ · · · ∧ en. A linear combination of basis blades, all of the same grade k, is called a k-vector. The linear combination of such k-vectors is called a multivector, and multivectors with certain characteristics represent different geometric


objects or entities (such as points, lines, planes, circles, spheres, etc.), depending on the GA in which we are working (e.g., a point (a, b, c) is represented in G3,0,0 [the GA of the 3D Euclidean space E3] as x = ae1 + be2 + ce3; however, a circle cannot be defined in G3,0,0, but it is possible to define it in G4,1,0 (CGA) as a four-vector z = s1 ∧ s2 [the intersection of two spheres in the same space]). Given a multivector M, if we are interested in extracting only the blades of a given grade, we write 〈M〉r, where r is the grade of the blades we want to extract (obtaining a homogeneous multivector M′, or an r-vector).

The reader should consult Ref. 3 for a detailed explanation of CGA and its applications.
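As an illustration of these basis-blade products, the following is a minimal, hypothetical sketch (our own, not from the paper) that encodes basis blades as bitmasks and computes their geometric product for an arbitrary signature (p, q, r):

```python
# Hypothetical sketch: geometric product of basis blades in G_{p,q,r}.
# A blade is a bitmask: bit i set means e_{i+1} is a factor (e.g. 0b011 = e1^e2).

def reorder_sign(a, b):
    """Sign from swapping factors of b past factors of a into canonical order."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps & 1 else 1

def blade_gp(a, b, p, q, r=0):
    """Return (sign, blade) for the geometric product of basis blades a and b."""
    sign = reorder_sign(a, b)
    for i in range(p + q + r):
        if (a & b) >> i & 1:          # e_{i+1} appears in both factors: it contracts
            if p <= i < p + q:
                sign = -sign          # e_{i+1}^2 = -1
            elif i >= p + q:
                return 0, 0           # degenerate: e_{i+1}^2 = 0
    return sign, a ^ b

# In G3 (Euclidean): e1 e2 = e12, e2 e1 = -e12, e1 e1 = 1
print(blade_gp(0b001, 0b010, 3, 0))   # (1, 3)  -> +e12
print(blade_gp(0b010, 0b001, 3, 0))   # (-1, 3) -> -e12
```

With such a product table, the geometric product extends from basis blades to general multivectors by linearity.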

2.1. Conformal Geometric Algebra

When working in CGA, G4,1,0, we embed the Euclidean space in a higher-dimensional space with two extra basis vectors that have a particular meaning; in this way, we represent particular entities of the Euclidean space with subspaces of the conformal space. The extra vectors we add are e+ and e−, defined by the properties e+² = 1, e−² = −1, e+ · e− = 0. With these two vectors, we define the null vectors

e0 = ½(e− − e+) and e = e− + e+,

interpreted as the origin and the point at infinity, respectively. From now on, points in the 3D Euclidean space are represented in lowercase, while conformal points appear with underlined letters; also, the conformal entities will be expressed in the Outer Product Null Space (OPNS) (noted with an asterisk; also known as the dual of the entity), and not in the Inner Product Null Space (IPNS) (without an asterisk), unless it is specified explicitly. To go from OPNS to IPNS, we need to multiply the entity by the pseudoscalar. To map a point x ∈ E3 to the conformal space in G4,1,0 (using IPNS), we use

x = x + ½x²e + e0.  (1)
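The embedding of Eq. (1) can be checked numerically. The following sketch is our own illustration (not the paper's code), with e0 and e expanded over the (e+, e−) basis of signature (+, −); it verifies that conformal points are null vectors and that their inner product encodes Euclidean distance:

```python
# Illustration of Eq. (1): coordinates over (e1, e2, e3, e+, e-)
# with diagonal metric (1, 1, 1, 1, -1); e0 = (e- - e+)/2, e = e- + e+.

METRIC = (1, 1, 1, 1, -1)

def conformal(p):
    """Map a Euclidean 3D point p to its conformal form x = p + 0.5*p^2*e + e0."""
    p2 = sum(c * c for c in p)
    return [p[0], p[1], p[2],
            0.5 * p2 - 0.5,    # e+ component: p^2/2 from e, -1/2 from e0
            0.5 * p2 + 0.5]    # e- component: p^2/2 from e, +1/2 from e0

def inner(u, v):
    return sum(g * a * b for g, a, b in zip(METRIC, u, v))

x = conformal((1.0, 2.0, 3.0))
print(inner(x, x))                    # 0.0: conformal points are null vectors
y = conformal((0.0, 0.0, 0.0))
z = conformal((3.0, 4.0, 0.0))
print(inner(y, z))                    # -12.5 = -0.5 * |y - z|^2
```

The second property, X · Y = −½|x − y|², is what makes spheres and distances linear objects in the conformal model.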

Applying the wedge operator "∧" on points, we can express new entities in CGA. All geometric entities from CGA are shown in Table 1 for quick reference.

The pseudoscalar in CGA G4,1,0 is defined as I = IE E, where IE = e1e2e3 is the pseudoscalar from G3,0,0 and E = e+e− is the pseudoscalar from the Minkowski plane.

In GA there exist specific operators to model rotations and translations, called rotors and translators, respectively. In CGA, such operators are called versors and are defined by Eq. (2), R being the rotor and T the translator:

R = e^(−lθ/2) = cos(θ/2) − sin(θ/2) l;   T = e^(et/2) = 1 + et/2,   (2)


Table 1. Entities in CGA.

Entity   IPNS                                    OPNS
Sphere   s = p + ½(p² − ρ²)e + e0                s* = a ∧ b ∧ c ∧ d
Point    x = x + ½x²e + e0                       x* = (−Ex − ½x²e + e0)IE
Plane    P = N IE − de,                          P* = e ∧ a ∧ b ∧ c
         N = (a − b) ∧ (a − c),
         d = (a ∧ b ∧ c)IE
Line     L = P1 ∧ P2 = rIE + eM IE,              L* = e ∧ a ∧ b
         r = a − b, M = a ∧ b
Circle   z = s1 ∧ s2,                            z* = a ∧ b ∧ c
         sz = (e · z)z, ρz² = z²/(e ∧ z)²
P-pair   PP = s1 ∧ s2 ∧ s3                       PP* = a ∧ b

where the rotation axis l = l1e23 + l2e31 + l3e12 is a unit bivector that represents a line (in IPNS) through the origin in CGA, θ is the rotation angle, and t = t1e1 + t2e2 + t3e3 is the translation vector in E3.

Such operators are applied to any entity of any dimension by multiplying the entity by the operator from the left, and by the reverse of the operator from the right, as shown in Eq. (3):

x′ = σx σ̃,  (3)

where x is any entity mentioned in Table 1, and σ is a versor (a rotor, translator, or motor, mentioned below). Using Eq. (3), it is easy to transform any entity from CGA (points, point pairs, lines, circles, planes, and spheres), not only points, as is usual in other algebras.

Vector calculus is a coordinate-dependent mathematical system, and its cross product cannot be extended to higher dimensions. The representation of geometric primitives is based on lengthy equations, and for linear transformations one uses matrix representations with redundant coefficients. In contrast, CGA, a coordinate-free system, provides a fruitful description language to represent primitives and constraints in any dimension, and by using successive reflections with bivectors, one builds versors to carry out linear transformations avoiding redundant coefficients.
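As a concrete, hypothetical illustration of the versor sandwich of Eqs. (2) and (3) (our own sketch, not code from the paper): a Euclidean rotor R = cos(θ/2) − sin(θ/2) l is isomorphic to a unit quaternion, so applying R x R̃ to a vector can be written with quaternion products:

```python
# Sketch: apply the rotor R = cos(θ/2) − sin(θ/2) l to a 3D vector as a
# quaternion sandwich q v q~, using the G3-rotor/quaternion isomorphism.
import math

def qmul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def rotor_apply(theta, axis, v):
    """Rotate vector v by angle theta about `axis` (the dual of the bivector l)."""
    n = math.sqrt(sum(a * a for a in axis))
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    q = (c, s * axis[0] / n, s * axis[1] / n, s * axis[2] / n)
    q_rev = (q[0], -q[1], -q[2], -q[3])      # the reverse R~ of the rotor
    return qmul(qmul(q, (0.0, *v)), q_rev)[1:]

# Rotating e1 by 90 degrees about e3 yields e2 (up to float rounding):
print(rotor_apply(math.pi / 2, (0, 0, 1), (1.0, 0.0, 0.0)))
```

Unlike a 3×3 rotation matrix, the same sandwich form in CGA transforms lines, circles, planes, and spheres, not only points.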

3. Conformal Model for Stereoscopic Perception Systems

Now, we explain how we developed a conformal model to represent 3D information in visual space. We start by discussing the role of conformal image mapping for modeling the human visual system. Then, we explain how we use the horopter in our conformal model to build a stereoscopic perception system. In the next section, we describe how we implement a real-time system using our model.


3.1. Conformal image mapping for modeling the human visual system

In the last few decades, there has been a lot of research to elucidate image mapping in the visual systems of primates and humans. Figure 1(a) shows the human visual system. The visual cues excite nodes of the retina, which generate biological signals. These signals traverse the optical fibers and finally are merged at the neocortex to produce a disparity map. Our brain uses this map to recreate a space and time impression of the visual space. Figure 1(b) depicts the so-called Vieth–Müller circle, or geometric locus, of the 3D points 1 to 5, which cause the same disparity on the eyes' retinas; see the lines passing the nodal points arriving at points 1 to 5 on the retinas. If one suffers from myopia, the circle, also called the theoretical horopter,

Fig. 1. (a) Human vision system. (b) Horopter and the Vieth–Müller circle. (c) Cartesian representation of the stereo vision system. (d) Polar representation of the stereo vision system.


is deformed to an ellipsoid or empirical horopter. In the ideal case, we can regard the eyeball as perfectly symmetric. To build an artificial vision system, we have to achieve an almost circular horopter. Figure 1(c) depicts the human stereo vision using Cartesian coordinates, and Fig. 1(d) depicts it using polar coordinates. In a series of outstanding papers, Eric L. Schwartz shows the developments concerning the image mapping of visual information to the neocortex. He claims that "to simulate the image properties of the human visual system (and perhaps other sensory systems) conformal image mapping is a necessary technique".4 The mapping function

w = k log(z + a) (4)

is a widely accepted approximation to the topographic structure of the primate V1 foveal and parafoveal regions. An extension of it, obtained by simply adding an additional parameter, captures the full-field topographic map in terms of the dipole map function

w = k log((z + a)/(z + b)).  (5)

However, these models are still unsatisfactory, as they cannot describe topographic shear, because they are both explicitly complex-analytic, or conformal. Balasubramanian et al.5 suggested a very simple procedure for topographic shear in V1, V2, and V3, assuming that cortical topographic shear is rotational (a compression along iso-eccentricity contours). The authors model the constant rotational shear with a quasiconformal mapping called the wedge mapping. Using five independent parameters, this mapping yields an approximation to the V1, V2, and V3 topographic structures, unifying these three areas into a single V1–V2–V3 complex, as follows: First, we represent any point in the visual hemifield with the complex variable z = r e^(iθ), where r and θ denote the eccentricity and polar angle, respectively. The wedge map for the three visual areas Vk, k = 1, 2, 3, is the map

ηk(r e^(iθ)) = r e^(iΘk(θ)),  (6)

where the respective functions for V1, V2, and V3 are given by

Θ1(θ) = α1θ,

Θ2(θ) = −α2(θ − π/2) + Θ1(π/2)    if 0+ ≤ θ ≤ π/2,
        −α2(θ + π/2) + Θ1(−π/2)   if −π/2 ≤ θ ≤ 0−,

Θ3(θ) = α3θ + Θ2(0+)    if 0+ ≤ θ ≤ π/2,
        α3θ + Θ2(0−)    if −π/2 ≤ θ ≤ 0−.   (7)

The wedge map warps three copies of V1, V2, and V3 of the visual hemifield and localizes them into a pie form, where each one is compressed by an amount αk in the azimuthal direction, thus resulting in a rotational shear in each of the wedges. Finally, the wedge map is further modified via a dipole map using Eq. (5). The result is the full wedge-dipole model, depicted semiqualitatively in Fig. 2.
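A minimal numerical sketch of Eqs. (5)–(7) follows. It is our own illustration, and the parameter values (α1, α2, α3, k, a, b) are arbitrary placeholders, not the fitted values of Ref. 5:

```python
# Sketch of the wedge-dipole model: a wedge map (Eqs. (6)-(7)) followed by a
# dipole map (Eq. (5)). All parameter values here are illustrative only.
import cmath
import math

ALPHA = (0.5, 0.25, 0.25)          # placeholder azimuthal compressions a1, a2, a3

def theta_k(theta, k):
    """Piecewise azimuthal compression Θ_k(θ) for visual area Vk (Eq. (7))."""
    a1, a2, a3 = ALPHA
    if k == 1:
        return a1 * theta
    if k == 2:
        if theta >= 0:
            return -a2 * (theta - math.pi / 2) + theta_k(math.pi / 2, 1)
        return -a2 * (theta + math.pi / 2) + theta_k(-math.pi / 2, 1)
    if theta >= 0:                  # k == 3, upper (0+) branch
        return a3 * theta + theta_k(1e-12, 2)
    return a3 * theta + theta_k(-1e-12, 2)

def wedge(z, k):
    """Wedge map η_k(r e^{iθ}) = r e^{iΘ_k(θ)} (Eq. (6)): radius kept, angle sheared."""
    return cmath.rect(abs(z), theta_k(cmath.phase(z), k))

def dipole(w, kgain=1.0, a=0.5, b=40.0):
    """Dipole map w = k log((z + a)/(z + b)) (Eq. (5))."""
    return kgain * cmath.log((w + a) / (w + b))

z = cmath.rect(2.0, math.pi / 4)   # eccentricity 2, polar angle 45 degrees
w = wedge(z, 1)                    # V1: angle compressed by a1, eccentricity kept
print(abs(w), cmath.phase(w))
print(dipole(w))
```

The composition dipole(wedge(z, k)) is the full wedge-dipole map; only the azimuthal shear differs among the three areas.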


Fig. 2. The wedge-dipole model semiqualitatively superimposed on the topography of the human visual areas.

Many visual cortex architectures of primates and humans have an important feature responsible for the procedure of mixing the visual data of the left and right eyes. It has been shown that ocular dominance columns represent thin strips (5–10 minutes of arc) alternating the left- and right-eye input to the brain. According to Yeshurun and Schwartz,6 such an architecture, when operated upon with a cepstral filter, provides a strong cue for binocular stereopsis. The creature can sense depth using this visual cue.

In our work, we have a different motivation; we extend the 2D horopter concept7 to the 3D horopter sphere for fusing the left and right stereoscopic images on a sphere (Fig. 3(c)). This is basically a 3D representation, using polar coordinates, of the 3D visual space. This representation occurs after the stereopsis has been computed. When one traverses the spheres outwardly, one gets the sense of directed depth. In CGA, the directed depth is a vector pointing outward from the egocenter of our horopter. Its magnitude is the scalar value of the depth.

Fig. 3. (a) The horopter and sphere centers on the bisector line when the depth δ grows. (b) Spherical horopter and the unit sphere, where the horopter depends on the azimuth angle κ. (c) Conformal horopter configuration. The left camera center LC, the right camera center RC, and the fixation point FP define a circle, which, by means of varying κ, defines a family of spheres or horopter spheres.


Why do we believe that this representation is useful? Let us answer this question by first showing the advantages of our mathematical system. In CGA, the computational unit is the sphere. One can map all the 3D visual information onto a family of spheres using the 3D horopter concept. As the 3D visual information is mapped on spheres, we can start to use this representation in the 5D space of CGA. We can then exploit all the computational advantages of this mathematical system, such as applying incidence algebra among circles, planes, and spheres, and utilizing linear transformations such as translators, rotors, and dilators in terms of spinors. In this way, the 3D visual information on the sphere can be treated more efficiently in the CGA framework for various humanoid tasks such as recognition, reasoning, planning, and conducting autonomous actions.

We can also claim that all the efforts on conformal mapping are restricted to the mapping of the primate and human visual areas; however, we are introducing a mathematical framework in order to have an artificial way to fuse in 3D the images of the left and right cameras for the depth sensing necessary for recognition, representation, reasoning, and planning. Furthermore, using the powerful CGA framework, we can relate quite advantageously the algebra of visual primitives of perception with the kinematics and dynamics of robot mechanisms.

3.2. Horopter and the conformal model

The horopter is the 3D geometric locus in space where an object has to be placed in order to stimulate exactly two points in correspondence in the left and right retinas of a biological binocular vision system.8 In Fig. 3(c), we see a horopter depending on an azimuth angle κ. In other words, the horopter represents a set of points that cause minimal (almost zero) disparity on the retinas. We draw the horopter tracing an arc through the fixation point and the nodal points of the two retinas (see Fig. 3(a)). The theoretical horopter is known as the Vieth–Müller circle. Note that each fixation distance has its own Vieth–Müller circle. According to this theoretical view, the following assumptions can be made: each retina is a perfect circle; both retinas are of the same size; corresponding points are perfectly matched in their retina locations; and points in correspondence are evenly spaced across the nasal and temporal retinas of the right and left eyes.

If an object is located on either side of the horopter, a small amount of disparity is caused on the eyes. The brain analyzes this disparity and computes the relative distance of the object with respect to the horopter. In a narrow range near the horopter, stereopsis does not exist; the disparities there are too small to stimulate stereopsis. Empirical horopter measurements (even those done using the Nonius method) do not agree with the Vieth–Müller circle. There are two obvious reasons for this inconsistency: either irregularities in the distribution of visual directions in the two eyes or optical distortion in the retinal image. There are various physiological reasons why the horopter can be distorted. Another cause of distortion is the asymmetric distribution of oculocentric visual


directions. In addition to a regional asymmetry in local signs in one eye, the distribution between the two eyes may not be congruent (correspondence problem), which may be another cause of horopter distortion. Asymmetric mapping from the retina to the neocortex in both eyes also causes a deviation of the horopter from the Vieth–Müller circle.

The simple configuration of the horopter shown in Fig. 3(a) is nothing more than a very naive geometric representation, using polar coordinates, of the geometric locus of the visual space. In contrast, using the tools of CGA, we can claim that binocular vision can be reformulated advantageously using a spherical retina. Now, we show how we find the horopter in the sphere of conformal geometry. Actually, we are dealing with a bunch of spheres intersecting the centers of the cameras LC and RC (see Fig. 3(b)). This is the pencil of spheres in the projective space of spheres. Note that the camera centers LC and RC are Poncelet points. Since a stereo system only sees in front of it, we consider the spheres emerging toward the front. When the space locus of objects expands, the centers of the spheres move along the bisector line of the stereo rig; this is when the depth δ grows (see Fig. 3(a)). From now on, we will use the term "horopter sphere" rather than "horopter circle", because when we change the azimuth of the horopter circle, we are simply selecting a different circle of a particular horopter sphere si (see Fig. 3(b)). As a result, we can consider that all the points of the visual space lie on the pencil of horopter spheres. Let us translate this description into equations of CGA.

We call the unit horopter sphere s0 the one whose center is the egocenter of the stereo rig (see Fig. 3(c)) and that has the sphere equation

s0 = p − ½ρ0²e∞ = c0 + ½(c0² − ρ0²)e∞ + e0,

where its center (egocenter) is attached to the true origin of the space of CGA and the radius is half the stereo-rig length, ρ0 = ½|LC − RC|. The center ci of any horopter sphere si moves toward the point at infinity as

ci = ci + ½ci²e∞ + e0,

where ci is the Euclidean 3D vector. Thus, we can write the equation of the sphere si as

si = ci + ½(ci² − ρi²)e∞ + e0,

where the radius is computed in terms of the stereopsis depth, ρi = ½(1 + δ).

Consider the figure of the model for the visual human system in Fig. 3(c). We see that the horopter circles lie on a pencil of planes πi. We can obtain the same circles zi simply by intersecting, in our conformal model, such a pencil of planes with the pencil of tangent spheres, as depicted in Fig. 3(c). The intersection is computed using the meet operation of the duals of the plane and sphere and taking the dual of the result: zi = πi ∧ si. Now, taking the meet of any two horopter spheres, we obtain a circle that lies on the frontoparallel plane with respect to the digital cameras' common plane: z = si ∧ sj. Then, taking the meet of this circle with the unit horopter sphere, we regain the Poncelet points LC and RC, which in our terms are called the point pair PPLR = z ∧ s0 = L ∧ s0*. (Note that the second part of the equation computes the point pair by wedging the dual of the sphere, s0*, with the line crossing the camera centers LC and RC.)

If we further consider Fig. 3(b), the intersecting plane πi cuts the horopter spheres, generating a geometric locus on the plane, as depicted in Figs. 3(b) and 3(c).


Fig. 4. (a) Log-polar representation of visual space. (b) Nodal points LC–RC (camera centers), image planes IL–IR, and the azimuth angle κ.

These horopter circles fulfill an interesting property. If one takes an inversion of all horopter circles with centers on the line l, we get the radial lines of the polar diagram; see the right side of Fig. 4(a). Now, since the plane πi (by varying the angle κ) intersects the family of horopter spheres, producing the horopter circles of Fig. 4(a), whose inverse is a 2D log-polar diagram, we can conclude that the inverse of the arrangement of horopter spheres and Poncelet points is equivalent to a 3D log-polar diagram, as depicted in Fig. 4(b). To understand this better, let us take any radial line of the 2D log-polar diagram and express it in CGA: L = X ∧ Y ∧ e∞. Now, applying an inversion to this line, we get a circle, i.e., z = e4Le4. Note that this inversion is implemented as a reflection on e4. The 3D log-polar diagram is an extraordinary result, because contrary to the general belief that conformal image processing takes place in a 2D log-polar diagram, we can consider that the visual processing rather takes place in a 3D log-polar diagram. This claim is novel as well as promising, because this framework can be used for 3D flow estimation, as opposed to the use of one view or even an arrangement of two log-polar cameras.
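For readers who want a concrete picture of the 2D case that this construction generalizes, here is a small, hypothetical sketch (not from the paper) of the classical log-polar mapping of image coordinates, under which radial lines and concentric circles become the straight coordinate lines of the (log r, θ) grid:

```python
# Sketch: 2D log-polar mapping (x, y) -> (log r, theta) about a fixation point.
import math

def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian image point to log-polar coordinates around (cx, cy)."""
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

def from_log_polar(rho, theta, cx=0.0, cy=0.0):
    """Inverse mapping back to Cartesian coordinates."""
    r = math.exp(rho)
    return cx + r * math.cos(theta), cy + r * math.sin(theta)

# Points on a common radial line share theta; points on a circle share rho.
rho1, th1 = to_log_polar(2.0, 2.0)
rho2, th2 = to_log_polar(4.0, 4.0)
print(abs(th1 - th2) < 1e-12)          # True: same radial line
```

In the 3D extension described above, the role of the radial lines is played by the horopter circles, and the inversion on e4 plays the role of the logarithmic warping.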

4. Egomotion, 3D Reconstruction, and Relocalization

Robots equipped with a stereo rig for 3D vision require a hand–eye calibration procedure so that the robot global-coordinate system is coordinated with the coordinate system of the stereo rig. However, due to unmodeled parameters of the pan-tilt unit and noise, it is still difficult to have precise coordination between those coordinate systems. After various attempts, we decided that, on the one hand, we should resort to Kalman filter techniques for the image stabilization and, on the other hand, we should carry out the tracking by applying sliding-mode-based nonlinear control techniques. Consequently, we can get a more precise perception and tracking system. In this article, we will not give the details of our algorithms for the nonlinear control of the pan-tilt unit; they can be found in Ref. 9.
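To make the filtering idea concrete, the following sketch (our own illustration, not the paper's implementation) shows the prediction half of a Kalman/EKF cycle under a constant-velocity motion model, the model family used for the camera state in Sec. 4.1; the 1D state layout and the noise value are placeholder assumptions:

```python
# Sketch: EKF/KF prediction for a 1D constant-velocity state [position, velocity].
# x' = F x,  P' = F P F^T + Q. Matrices are kept explicit for readability.

def predict(x, P, dt, q=0.1):
    """One prediction step; x = [pos, vel], P is its 2x2 covariance."""
    F = [[1.0, dt],
         [0.0, 1.0]]                       # constant-velocity transition
    x_new = [x[0] + dt * x[1], x[1]]       # F @ x
    # F P F^T
    FP = [[sum(F[i][k] * P[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    P_new = [[sum(FP[i][k] * F[j][k] for k in range(2)) for j in range(2)]
             for i in range(2)]
    # Process noise grows the uncertainty (placeholder diagonal Q).
    P_new[0][0] += q * dt
    P_new[1][1] += q * dt
    return x_new, P_new

x, P = [0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]
x, P = predict(x, P, dt=0.5)
print(x)        # [0.5, 1.0]
print(P[0][0])  # position variance has grown: 1.3
```

The measurement update then shrinks P again; in the SLAM formulation of Eq. (8) the same predict/update cycle runs over the full 13-parameter camera state plus all feature states.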

Int. J. Human. Robot. 2010.07:429-450. Downloaded from www.worldscientific.com by CENTRO DE INVESTIGACION Y DE ESTUDIOS AVANZADOS DEL IPN (CINVESTAV) SERVICIOS BIBLIOGRAFICOS on 10/19/12. For personal use only.


November 8, 2010 12:1 WSPC/S0219-8436 191-IJHRS0219843610002234

Geometric Techniques for Humanoid Perception 439

4.1. Relocalization and mapping

In order to perceive the environment autonomously using stereo vision, our robot uses simultaneous localization and mapping (SLAM). To implement this, we used the EKF-based approach proposed by Davison.10 We extended the monocular EKF approach to stereo vision to improve the perception accuracy. We will briefly summarize the monocular SLAM and then give additional details for stereo SLAM. For the sake of simplicity, the EKF is formulated to work in the 3D Euclidean geometric algebra $G_{3,0,0}$, which is a subalgebra of the CGA $G_{4,1}$ for 3D space. The estimated state and covariance of the digital camera are given by

$$
x = \begin{pmatrix} x_v \\ y_1 \\ y_2 \\ \vdots \end{pmatrix}, \qquad
P = \begin{pmatrix}
P_{xx} & P_{xy_1} & P_{xy_2} & \cdots \\
P_{y_1 x} & P_{y_1 y_1} & P_{y_1 y_2} & \cdots \\
P_{y_2 x} & P_{y_2 y_1} & P_{y_2 y_2} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}, \qquad (8)
$$

where the camera state vector $x_v$ comprises a 3D translation vector $t^W$, an orientation rotor $R^{WR}$ (isomorphic to a quaternion), a velocity vector $v^W$, and an angular-velocity vector $w^W$. This amounts to a total of 13 parameters. Feature states $y_i$ correspond to 3D points of objects or 3D points in the environment. In order to correct the rotor $R$ and the translation vector $t$, we rectify them using motors (a rotor and a dual rotor). Since a motor is given by

$$
M = TR = \left(1 + \frac{I t}{2}\right) R = R + I\,\frac{t}{2}R = R + I R', \qquad (9)
$$

the rotor $R$ and the dual rotor $R'$, which involves the translation, must fulfill the following constraints:

$$
M\widetilde{M} = 1, \qquad (10)
$$
$$
R\widetilde{R} = 1, \qquad (11)
$$
$$
R\widetilde{R'} + R'\widetilde{R} = 0, \qquad (12)
$$

where the last equation indicates that $R$ has to be orthogonal to the dual rotor $R'$, and it is valid up to a scalar. Unfortunately, in practice, the rotor $R$ estimated by the EKF is usually not truly orthogonal to the estimated $R' = \frac{t}{2}R$. We adjust both rotors for orthogonality using the simple technique suggested in Ref. 11, and the translation vector is recomputed as follows:
$$
t = 2R'\widetilde{R}.
$$
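Since rotors in $G_{3,0,0}$ are isomorphic to quaternions, the orthogonalization and translation-recovery step can be sketched with plain quaternion arithmetic. This is a minimal illustration under that identification (the helper names are ours), not the exact formulation of Ref. 11:

```python
import numpy as np

def qmul(a, b):
    # Hamilton product of quaternions [w, x, y, z]
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qconj(q):
    # Quaternion conjugate, playing the role of the rotor reverse
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def enforce_motor_constraints(R, Rp):
    # Normalize the rotor so that R R~ = 1, then project the dual rotor
    # onto the complement of R: the coefficient dot product <R, R'> = 0
    # expresses the orthogonality constraint of Eq. (12).
    R = R / np.linalg.norm(R)
    Rp = Rp - np.dot(R, Rp) * R
    return R, Rp

def translation_from_motor(R, Rp):
    # t = 2 R' R~ is a pure quaternion; its vector part is the translation
    return 2.0 * qmul(Rp, qconj(R))[1:]
```

For a clean motor built from a rotation and a translation, the recovery is exact; after noisy EKF estimates, the projection step restores the constraint before the translation is recomputed.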

In the EKF prediction step, a model for smooth motion is used involving the Gaussian-distributed perturbations $V^W$ and $\Omega^R$, which affect the camera's linear and angular velocities, respectively. The explicit process for motion in a time step $\Delta t$ is given by

$$
x_v = \begin{pmatrix} t^W_{\mathrm{new}} \\ R^{WR}_{\mathrm{new}} \\ v^W_{\mathrm{new}} \\ w^R_{\mathrm{new}} \end{pmatrix}
= \begin{pmatrix}
t^W + (v^W + V^W)\,\Delta t \\
R^{WR}\, R\big((w^R + \Omega^R)\,\Delta t\big) \\
v^W + V^W \\
w^R + \Omega^R
\end{pmatrix}. \qquad (13)
$$

440 A. Petrilli-Barcelo et al.

Fig. 5. Visualization of the camera's motion: (a) constant-velocity model for smooth motion and (b) nonzero acceleration caused by a shaking motion.

Figure 5(a) depicts the potential deviations from a constant-velocity trajectory, and Fig. 5(b) depicts a shaking motion during robot maneuvers.

The EKF implementation demands computation of the Jacobians of this motion function with respect to both $x_v$ and the perturbation vector.
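A minimal sketch of this prediction step, representing the rotor $R^{WR}$ as a quaternion (the helper names are illustrative, not from the original system):

```python
import numpy as np

def qmul(a, b):
    # Hamilton product of quaternions [w, x, y, z] (rotor composition)
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rotor_from_rotvec(v):
    # Rotor for a rotation of |v| radians about the axis v/|v|
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = v / theta
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * axis))

def predict(t_w, R_wr, v_w, w_r, dt, V_w, Om_r):
    # Constant-velocity prediction of Eq. (13) with perturbations V^W, Omega^R
    t_new = t_w + (v_w + V_w) * dt
    R_new = qmul(R_wr, rotor_from_rotvec((w_r + Om_r) * dt))
    return t_new, R_new, v_w + V_w, w_r + Om_r
```

With zero perturbations this reduces to uniform translation and rotation; the Gaussian samples $V^W$, $\Omega^R$ model the unpredictable accelerations of Fig. 5(b).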

4.2. Machine learning to support the robot's spatial orientation and navigation

A landmark is literally a geographic feature used by explorers and others to find their way back or to move through a certain area. In the 3D map-building process, a robot can use such landmarks to remember where it has been while it explores its environment. The robot can also use landmarks to find its position in a previously built map, thereby facilitating its relocalization. Since we are using a stereo camera system, the 3D position and pose of any object can be obtained and represented in the 3D virtual environment of the map. Thus, a natural or artificial landmark located in the environment can greatly help the mobile robot to know its actual position and relative pose with respect to both the map and the environment.

Viola and Jones introduced a fast machine-learning approach to face detection based on the so-called AdaBoost algorithm.12 This approach can be used to detect our static landmarks. Once the landmarks have been selected and trained, the robot can utilize them to navigate in the environment. This navigation is supported by the 3D virtual map, and only one camera (the left camera) is used to run the Viola and Jones algorithm. If a landmark is found, we obtain a subimage $I_L$ from the left camera image. This $I_L$ is the region of the image where the landmark was found (Fig. 6(a)).

When a landmark is identified in the left camera image, we must make sure that the landmark is also present in the right camera image in order to obtain its 3D position. The landmark in the right image is likewise detected by the Viola and Jones algorithm, and its region is identified by a subimage $I_R$.
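The workhorse of the Viola and Jones detector is the integral image, which makes any rectangular (Haar-like) feature computable in constant time; a trained cascade combines thousands of such features. The mechanism can be sketched in a few lines (this toy two-rectangle feature is illustrative, not part of the authors' system):

```python
import numpy as np

def integral_image(img):
    # Summed-area table with a zero border: ii[y, x] = sum of img[:y, :x]
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum over img[y:y+h, x:x+w] in O(1) via four table lookups
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, x, y, w, h):
    # Two-rectangle (left minus right) Haar feature: a contrast measure
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)
```

In practice one would load a trained cascade (for instance OpenCV's `CascadeClassifier`) rather than evaluate individual features by hand.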


Fig. 6. (a) Four landmarks found in the left camera image using the Viola and Jones algorithm. The images show the points of interest. (b) Identification of an object and determination of its 3D position, represented in the 3D virtual environment using a sphere.


4.3. Landmark position and pose estimation

When we talk about landmark position estimation, we are looking strictly for the 3D location of these landmarks in the environment, not for the pose (position and orientation) of the object found. To do this, we precalculate the depth using the disparity of one fixed point of the object. After the landmark has been identified in both images, we proceed to calculate the points of interest. To do this, we apply the Canny edge-detection operator to $I_L$. To find the correspondences of these points, we use epipolar geometry13 and the zero-mean normalized cross-correlation (ZNCC).

Correspondences of an image patch are searched for along the epipolar line by calculating the ZNCC only in a given interval $(d_{\min}, \ldots, d_{\max})$ of so-called disparities.14,15 A small disparity represents a large distance to the camera, whereas a large value indicates a small distance (parallax).
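A minimal sketch of this search, assuming rectified images so that the epipolar line is the image row (the patch size and interval bounds below are illustrative choices, not the authors' parameters):

```python
import numpy as np

def zncc(a, b):
    # Zero-mean normalized cross-correlation of two equally sized patches
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_disparity(left, right, x, y, half, d_min, d_max):
    # Scan the epipolar row of the right image for the disparity d that
    # maximizes the ZNCC with the left patch centered at (x, y).
    patch = left[y - half:y + half + 1, x - half:x + half + 1]
    best_d, best_s = d_min, -1.0
    for d in range(d_min, d_max + 1):
        xr = x - d                       # rectified stereo: x_r = x_l - d
        if xr - half < 0:
            break
        cand = right[y - half:y + half + 1, xr - half:xr + half + 1]
        s = zncc(patch, cand)
        if s > best_s:
            best_s, best_d = s, d
    return best_d, best_s
```

Because ZNCC subtracts the patch means and normalizes, the score is invariant to local brightness offset and gain, which is why it is a common choice for stereo matching.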

When all the points are matched in both images, we proceed to calculate their 3D positions using stereo triangulation. Then, we integrate this set of points to obtain its center of gravity and place the center of a virtual sphere on it. The radius of the covering sphere is calculated so that it contains the largest number of points of the landmark. The sphere is stored in the 3D virtual map using CGA and labeled as a landmark (see Fig. 6(b)).
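Under the usual rectified pinhole model with focal length $f$, baseline $b$, and principal point $(c_x, c_y)$, the triangulation and the covering-sphere construction can be sketched as follows (the `keep` fraction is an illustrative way of discarding outlier points, not the authors' exact rule):

```python
import numpy as np

def triangulate(xl, xr, y, f, b, cx, cy):
    # Rectified stereo triangulation: Z = f*b/d with disparity d = xl - xr,
    # then back-project to X, Y using the pinhole model.
    d = xl - xr
    Z = f * b / d
    return np.array([(xl - cx) * Z / f, (y - cy) * Z / f, Z])

def landmark_sphere(points, keep=0.9):
    # Center the sphere on the centroid of the matched 3D points and pick
    # the radius that covers the requested fraction of them.
    pts = np.asarray(points, dtype=float)
    c = pts.mean(axis=0)
    r = np.quantile(np.linalg.norm(pts - c, axis=1), keep)
    return c, r
```

The resulting center and radius are exactly the data needed to represent the landmark as a CGA sphere in the 3D virtual map.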

Since the robot uses the EKF to estimate the position and pose of its binocular head, we also use the 3D landmarks to compute the pose of the binocular head relative to the 3D landmark location. Once the object has been recognized using the Viola and Jones method, we use stereo to obtain the relative 3D pose. For this, a very simple method is utilized; namely, a plane of the object is used as a reference, and the robot computes its center of gravity and its normal. This information is fed into the state vector of the EKF in order to reduce the pose and position error. Note that through time the EKF covariance increases due to robot drift and noise. However, by using the pose gained from the 3D landmarks, we can greatly reduce the covariance. In addition, if the robot has carried out a shaking motion, the covariance increases as well. Thus, using the pose of the landmarks, the robot can improve its performance. Note that this procedure provides intermediate help for the robot, both for the initialization of the EKF and to obtain a much more robust relocalization during long robot tours.

4.4. Integration of SLAM and machine learning in the conformal vision system

The cameras of the binocular vision system work in a master–slave fashion while each camera fixates on points of interest to estimate its own pose and the 3D points. Points of interest are tracked through time. Note that with the EKF, the 3D reconstruction is monocular, without the need to establish frame-by-frame point correspondences satisfying epipolar geometry. It may be the same with the human visual system, because it appears that point correspondence between the left and right eyes is not necessarily established. If some points leave the field of view,


the EKF keeps in its state vector, for a while, those 3D points that are not visible. The cameras use the stereo depth for the EKF initialization only at the start. When the robot uses a landmark found with machine learning and stereo to obtain the disparity, the robot updates the 3D points in its state vector; consequently, the covariance diminishes rapidly, in fewer than ten frames.

In Figs. 7(a) and 7(b), we see different points of interest on objects and on the wall. Figure 7(c) depicts the cyclopean eye, where the 3D points estimated by the EKF lie on the Vieth–Müller circles, or loci of zero disparity (ZOD); namely, all points lying on the same circle produce the same disparity with respect to the egocenter. One can appreciate that these circles lie on spheres whose radii grow toward infinity. Figure 7(d) shows the EKF-estimated rotation angles of the cameras.

In Fig. 8(a), we see points of interest lying on the edge of a staircase. Figure 8(b) depicts the cyclopean eye, where the staircase 3D points were estimated by the EKF. The cyclopean view can be used by a humanoid to climb the staircase.

Fig. 7. (a), (b) Left and right images of objects and wall points. (c) Cyclopean view of 3D space. (d) Estimation of the rotation angles of the left and right cameras.

Fig. 8. (a) Left and right images of points of interest on a staircase. (b) Cyclopean view of the staircase.


(a)

(b)

(c)

Fig. 9. (a) Left and right images of objects and wall points. (b) Vergence in the humanoid. (c) Cyclo-pean view of 3D space and estimation of the rotation angles of the left and right cameras: the leftcurves are EKF estimates and the right ones are supplied by the motor encoders.

In Fig. 9(a), we see a pair of stereo sequences showing points of interest on objects and the wall. The head of the robot was tilted slowly from the floor to the ceiling, tracking these points of interest, and simultaneously the cameras performed a vergence motion from the inside to the outside, i.e., from fixating


Fig. 10. (a) Selected object. (b) Monocular recognition of a cup using the Viola and Jones method. This retrieves 3D information, which can be used for grasping and manipulation.

near to fixating far away. Figure 9(c) shows the estimated angles of the left and right camera views compared with the ground-truth angles supplied by the motor encoders (curves at the right). One can see acceptable angle estimation by the EKF.

In Fig. 10(a), we see detected points of interest on a cup from the set of objects lying on the table shown in Fig. 7(a). Using features from a small window, the robot applies the Viola and Jones machine-learning procedure to recognize the cup using just one camera; then the related 3D object information is retrieved, which is useful for grasping and manipulation, as depicted in Fig. 10(b). Here, stereo vision can further help to obtain the object's pose.


4.5. Robot navigation

In this section, we show how the EKF and the Viola and Jones machine-learning methods work cooperatively to reduce the degradation of the covariance of the EKF. In Fig. 11(a), the image sequence shows the robot moving along a corridor; Fig. 11(b) is a virtual representation of the navigation path and the pose of the binocular head, where the ellipsoids represent the pose uncertainty, which is directly proportional to the evolution of the EKF covariance. The robot perception works at a rate of 20 frames per second. The figure shows frames 450, 900, 1700, and 2800. In the

Fig. 11. (a) Left images with points of interest. (b) Robot navigating in the 3D virtual world with the pose of the binocular head. The ellipsoids indicate the pose uncertainty. (c) Zoom of the ellipsoids.


Fig. 12. (a) Left images of the stereo vision system with two recognized landmarks. (b) Relocalization of the robot during navigation in the 3D virtual world. The robot's trajectory shows the reduction in the size of the old pose ellipsoids when the robot finds a landmark. (c) Lateral perspective, to better appreciate the reduction in the pose uncertainty (size of the ellipsoids) when the robot finds the two landmarks and uses their relative position to reduce the covariances.


first image, we can see that the ellipsoids are relatively small because, at the very beginning, the EKFs for the stereo cameras were initialized using the depth gained from stereo vision. This is used only at the very beginning; henceforth, the accuracy of the pose and 3D reconstruction relies only on the application of the Viola and Jones method.

When old features are found again in the visual space, this helps to reduce the EKF covariance and thus to relocalize the robot. Another way to reduce the covariance and relocalize the robot is to find landmarks by using the Viola and Jones machine-learning method. Once a landmark is identified using machine learning, by taking a set of points of the object's frame, the distance and orientation of the robot with respect to this plane are computed. This information is fed into the EKF state vector, which helps to reduce the covariance and relocalize the robot (Fig. 12).
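The covariance reduction can be illustrated with the standard EKF measurement update: feeding a landmark-derived observation $z$ with small measurement noise sharply shrinks $P$. The dimensions and noise levels below are illustrative, not the values used on the robot:

```python
import numpy as np

def ekf_update(x, P, z, H, Rm):
    # Standard EKF measurement update with linear observation model H
    # and measurement-noise covariance Rm.
    S = H @ P @ H.T + Rm                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)              # corrected state
    P = (np.eye(len(x)) - K @ H) @ P     # reduced covariance
    return x, P
```

Starting from a large prior covariance, a single accurate landmark observation pulls the state toward the measurement and collapses the uncertainty ellipsoid, which is the effect visible in Fig. 12.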

5. Conclusions

In this work, we introduced a novel conformal model for human-like vision. The standard concept of horopter circles is extended to a horopter sphere, which leads to a 3D log-polar representation of the visual space. Robots navigate and track features; for this purpose, we have equipped the robot perception system with a hybrid approach combining EKF-based SLAM and the Viola and Jones machine-learning technique. This cooperative approach works in real time, looking for 3D landmarks to compute the relative pose of the robot. As a result, the estimation error of the 3D pose by the EKF can be bounded. Working cooperatively, both techniques greatly improve the relocalization of the robot in a 3D map. A detailed experimental analysis shows the efficiency of our perception system, which is useful for humanoids.

References

1. C. Capurro, F. Panerai and G. Sandini, Dynamic vergence using log-polar images, International Journal of Computer Vision 24(1) (1997) 79–94.

2. R. Mandelbaum, L. McDowell, L. Bogoni, B. Reich and M. Hansen, Real-time stereo processing, obstacle detection, and terrain estimation from vehicle-mounted stereo cameras, Proceedings of the 4th IEEE Workshop on Applications of Computer Vision (WACV'98) (1998), pp. 288–294.

3. E. Bayro-Corrochano, Geometric Computing: For Wavelet Transforms, Robot Vision, Learning, Control and Action (Springer, ISBN 978-1-84882-928-2, 2010).

4. C. Frederick and E. L. Schwartz, Conformal image warping, IEEE Computer Graphics and Applications 10 (1990) 54–61.

5. M. Balasubramanian, J. Polimeni and E. L. Schwartz, The V1–V2–V3 complex: quasiconformal dipole maps in primate striate and extra-striate cortex, Neural Networks 15(10) (2002) 1157–1163.

6. Y. Yeshurun and E. L. Schwartz, Cepstral filtering on a columnar image architecture: a fast algorithm for binocular stereo segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 11(5) (1989) 759–767.


7. W. C. Hoffman, The Lie algebra of visual perception, Journal of Mathematical Psychology 3 (1966) 65–98.

8. S. Steinman, Binocular Vision Module: The Empirical Horopter (Addison-Wesley, Reading, 1994).

9. E. Bayro-Corrochano and D. Gonzalez-Aguirre, Human-like vision using conformal geometric algebra, Proceedings of the International Conference on Robotics and Automation (ICRA'2008, Pasadena, CA, May 19–23, 2008), pp. 1299–1304.

10. A. J. Davison, I. Reid, N. Molton and O. Stasse, MonoSLAM: real-time single camera SLAM, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6) (2007) 1052–1067.

11. E. Bayro-Corrochano and Y. Zhang, The motor extended Kalman filter: a geometric approach for rigid motion estimation, Journal of Mathematical Imaging and Vision 13 (2000) 79–100.

12. P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (December 2001), pp. 511–518.

13. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision (Cambridge University Press, Cambridge, 2000).

14. P. Azad, T. Gockel and R. Dillmann, Computer Vision: Principles and Practice (Elektor Electronics, 2008).

15. O. Faugeras et al., Real-time correlation-based stereo: algorithm, implementation and applications (INRIA Technical Report No. 2013, 1993).

16. E. Bayro-Corrochano, Robot perception and action using conformal geometric algebra, in E. Bayro-Corrochano (ed.), Handbook of Geometric Computing: Applications in Pattern Recognition, Computer Vision, Neuralcomputing and Robotics (Springer, Heidelberg, Germany, 2005), Chap. 13, pp. 405–460.

17. J. Castano, E. Zalama and J. Gomez, Reconstruction of three-dimensional models of environments with a mobile robot, Proceedings of the International Conference on Robotics and Automation (ICRA'2003, New Orleans, 2003).

18. D. Comaniciu and P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 603–619.

19. E. Bayro-Corrochano and D. Gonzales-Aguirre, Human-like vision using conformal geometric algebra, Proceedings of the International Conference on Robotics and Automation (ICRA'2006) (Orlando, Florida, May 2006), pp. 1299–1304.

20. M. Pollefeys and S. Sinha, Iso-disparity surfaces for general stereo configurations, Computer Vision — ECCV 2004 (European Conference on Computer Vision), T. Pajdla and J. Matas (eds.) (LNCS, Springer-Verlag, 2004), Vol. 3023, pp. 509–520.

21. Z. Zhang, Flexible camera calibration by viewing a plane from unknown orientations, Seventh International Conference on Computer Vision (ICCV'99) (1999), Vol. 1, pp. 666–672.

Alberto Petrilli-Barcelo received his bachelor's degree in physics and mathematics from the Superior School of Physics and Mathematics, ESFM-IPN (2003), and his master's degree in computer sciences from the Center for Computer Research, CIC-IPN (2006). Currently, he is a Ph.D. student at the Center for Research and Advanced Studies (CINVESTAV-IPN). His research interests include computer vision, augmented reality, 3D reconstruction, tracking, and humanoid robotics.


Heriberto Casarrubias-Vargas received his bachelor's degree in physics and mathematics from the Superior School of Physics and Mathematics, ESFM-IPN (2004), and his master's degree in computer sciences from the Center for Computer Research, CIC-IPN (2006). Currently, he is a Ph.D. student at the Center for Research and Advanced Studies (CINVESTAV-IPN). His research interests include computer vision, image analysis, pattern recognition, machine learning, and SLAM.

Miguel Bernal-Marin received his B.Sc. degree in mathematics from Guadalajara University (UdG) in 2003 and his M.Sc. degree in electrical engineering from the Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Guadalajara Campus, Mexico, in 2007. Currently, he is a Ph.D. student in electrical engineering at CINVESTAV-IPN, Campus Guadalajara. His research interests include robotics, humanoids, computer vision, SLAM, geometric methods for modeling environments, and CGA applied to robotics.

Eduardo Bayro-Corrochano gained his Ph.D. in cognitive computer science in 1993 from the University of Wales at Cardiff. At present, he is a full professor at CINVESTAV Unidad Guadalajara, Mexico, in the Department of Electrical Engineering and Computer Science. His current research interest focuses on geometric methods for artificial perception and action systems, including geometric neural networks, visually guided robotics, humanoids, color image processing, and Lie bivector algebras for early vision and robot maneuvering. He built the humanoid MEXONE, with 36 DOF and a height of 105 cm. He is editor and author of four international books and has published over 120 refereed journal papers, book chapters, and conference papers. He is a fellow of the IAPR society.

Rüdiger Dillmann received his Ph.D. from the University of Karlsruhe in 1980. Since 1987, he has been a professor in the Department of Computer Science, and since 2001, director of the research group Industrial Applications of Informatics and Microsystems (IAIM) at the University of Karlsruhe. Since 2002, he has also been director of the Research Center for Information Science (FZI), Karlsruhe. His special interests include intelligent, autonomous, and mobile robotics, machine learning, machine vision, man–machine interaction, computer science in medicine, and simulation techniques. He is author or co-author of more than 100 scientific publications and several books.
