
Light Source Estimation from

Spherical Reflections

by

Dirk Schnieders

A thesis submitted for the degree of

Doctor of Philosophy

at The University of Hong Kong

March 2011


Hand with Reflecting Sphere, M. C. Escher 1935


Declaration

I declare that the thesis and the research work thereof represent my own work, except where due acknowledgement is made, and that it has not been previously included in a thesis, dissertation or report submitted to this University or to any other institution for a degree, diploma or other qualifications.

Signed . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Dirk Schnieders


Acknowledgements

I would like to thank the Hong Kong Government, the Japan Society for Promotion of Science, Microsoft Research Asia, the German Academic Exchange Service and the organizers of ECCV 2008 and CVPR 2010 for providing financial support for my research.

Shuda Li contributed to Chapter 2, Zhenwen Dai to Chapter 3

and Xingdou Fu to Chapter 4, and Chen Liang provided his source

code for the synthetic experiments. I am very grateful for their

advice and help. I would like to express my gratitude towards my

supervisor Kwan-Yee Kenneth Wong for his support and guidance.

In addition, I am thankful to Yasuyuki Matsushita for inviting

me to Beijing for an internship at Microsoft Research Asia and

Atsushi Nakazawa for inviting me to Osaka for a research stay at

Osaka University.

Last, but not least, I wish to thank my family and friends for their

love and support.

Dirk Schnieders


Abstract of thesis entitled

Light Source Estimation from Spherical Reflections

Submitted by

Dirk Schnieders

for the degree of Doctor of Philosophy

at The University of Hong Kong

in March 2011

Abstract

In the first part of this thesis, a novel method for recovering light

directions and camera poses from a single sphere is introduced.

Traditional methods for estimating light directions using spheres

either assume both the radius and center of the sphere being known

precisely, or they depend on multiple calibrated views to recover

these parameters. It will be shown that the light directions can

be uniquely determined from the specular highlights observed in

a single view of a sphere without knowing or recovering the exact

radius and center of the sphere. Besides, if multiple cameras are

observing the sphere, its images will uniquely define the translation

vector of each camera from a common world origin centered at the

sphere center. It will be shown that the relative rotations between

the cameras can be recovered using two or more light directions

estimated from each view. Closed form solutions for recovering the

light directions and camera poses are presented.

The thesis then considers an area light source, which is estimated

in 3D space by reconstructing its edges. An empirical analysis on

existing methods for line estimation from a single view is carried

out, and it is shown that line estimation from a single view of a


sphere is an ill-conditioned configuration. By considering a second

identical sphere, a closed form solution for single view polygonal

light estimation is proposed. In addition, an iterative approach

based on two unknown views of just a single sphere is proposed.

Finally, a novel method for reconstructing a visual display (a rectangular light source) from spherical reflections on the cornea of human eyes is proposed. Reconstruction of eyes and display is useful for point-of-gaze estimation, which can be approximated from the 3D positions of the iris and display. It is shown that iris boundaries and display reflections in a single intrinsically calibrated image provide enough information for such estimation. The proposed method assumes a simplified geometric eyeball model with certain anatomical constants, which are used to reconstruct the eye. By using minimal information to perform the reconstruction, the corresponding hardware setup can be greatly simplified, which in turn results in a simplified and automatic reconstruction. (358 words)


Contents

1 Introduction
  1.1 Motivation
  1.2 Approach
  1.3 Contributions
  1.4 Thesis Outline

2 Light Direction Estimation
  2.1 Introduction
  2.2 Related Work
  2.3 Reconstruction of Sphere
  2.4 Illuminant Direction Estimation
  2.5 Camera Pose Estimation
  2.6 Summary
  2.7 Experimental Results
    2.7.1 Synthetic Data
    2.7.2 Real Data
  2.8 Conclusions

3 Polygonal Light Source Estimation
  3.1 Introduction
  3.2 Related Work
  3.3 Line Reconstruction
    3.3.1 Intersection in Plücker Space
    3.3.2 Empirical Analysis
    3.3.3 Two Spheres & One View
    3.3.4 Two Views & One Sphere
  3.4 Polygonal Light Source Estimation
    3.4.1 Extraction of Polygon
    3.4.2 Camera Position Estimation
  3.5 Experimental Results on Real Data
    3.5.1 Two Spheres & One View
    3.5.2 Two Views & One Sphere
  3.6 Comparison to a Point-based Algorithm
  3.7 Conclusions

4 Display and Gaze Estimation
  4.1 Introduction
  4.2 Related Work
  4.3 The Eye
    4.3.1 Anatomy
    4.3.2 Dimensions
    4.3.3 Optics
    4.3.4 Approximate Geometric Model
    4.3.5 Movement
  4.4 Limbus Reconstruction
    4.4.1 Closed Form Solution
    4.4.2 Noise Analysis
    4.4.3 Experimental Results
  4.5 Display Edge Reconstruction
    4.5.1 Variations in Anatomical Parameter
  4.6 Visual Display Reconstruction
  4.7 Point-of-Gaze Estimation
  4.8 Experimental Results
  4.9 Conclusions

5 Conclusions
  5.1 Summary
  5.2 Future Work

A Quaternion Representation of Rotations
B Plücker Representation of Lines
C Spherical Forward Projection

References


Chapter 1

Introduction

Then God said, "Let there be light"; and there was light.

Genesis 1:3

1.1 Motivation

Photography is the process of capturing light with a camera. The word is based on the two Greek words photos ("light") and graphe ("drawing"), together meaning "drawing with light". Light is the most important element of a photograph.

Computer vision techniques are based on information that is extracted from images. It can be difficult to robustly extract this information because light variations can have a large impact on an object's appearance. As a result, most computer vision techniques fail under arbitrary lighting conditions.

In segmentation, for instance, intensity changes that result from surface shading are a major problem; if the lighting conditions were known, one could correct for illumination artifacts like shading, specular reflections and shadows. Similarly, if the light source position were known, one could use specular highlights and shadows in other computer vision areas as a source of information instead of treating them as statistical outliers.

This thesis presents theoretical and practical solutions for light source

estimation from spherical specular reflections (i.e. specular reflections on a


sphere). Novel algorithms will be introduced for the estimation of

• multiple light directions and camera poses,

• a polygonal light source, and

• a visual display and the point-of-gaze of a subject in front of the display.

Inferring information about light from images has applications in computer vision, computer graphics and several other areas. For instance, light positions can be used to determine the shape of objects using shape-from-shading [Horn (1977)], shape-from-shadows [Daum & Dudek (1997)], shape-from-specularities [Blake & Brelstaff (1988)] or photometric stereo [Woodham (1980)]. The appearance of an object depends on both light position and camera position, which can be exploited for camera pose estimation [Lagger et al. (2008)]. To render virtual/real objects into a scene realistically, augmented reality and image based rendering techniques use the precise location of light sources for rendering [Bimber & Raskar (2005)]. In medical applications, light estimates are used for quantification of skin cancer and burn scars [Powell et al. (2000)]. Other applications use light sources in controlled setups for gaze estimation and recognition [Osadchy et al. (2003)].

Light source estimation has applications in the humanities, where scholars

analyze light sources in realistic paintings to address a number of technical

problems in the history of art [Stork (2009)]. In recent work [Johnson et al. (2008)], methods for illuminant position estimation were applied to Jan Vermeer's Girl with a Pearl Earring (c. 1665-1666). A physical model analysis of the girl's pearl, a cast shadow analysis and an occluding contour analysis were employed to determine light positions in the three dimensions of the picture space.

1.2 Approach

This thesis aims at tackling the problem of light source estimation from specular reflections on spheres (billiard balls or eyeballs were used in experiments). Specular reflection is the mirror-like reflection of light, in which light from a


single incoming direction is reflected into a single outgoing direction. Specular

reflections are very predictable because they obey the following two laws:

1. The incident ray, the reflected ray and the normal to the reflection surface at the point of incidence lie in the same plane.

2. The angle which the incident ray makes with the normal is equal to the angle which the reflected ray makes with the same normal.

Apart from specular reflections we will use object boundaries like the silhouette

of a sphere (in Chapter 2 and Chapter 3) and the iris boundary (in Chapter 4)

to reconstruct a spherical object. The reconstruction allows us to determine

a normal at a given point of incidence. Given the normal and an intrinsically

calibrated camera the light position can be estimated.

1.3 Contributions

Through the studies of the properties of specular reflections on spherical ob-

jects, theories have been developed in this thesis to provide practical solutions

for light estimation. The main contributions of this thesis are:

• a closed form solution for recovering multiple distant point light

sources and camera positions from a single sphere (Chapter 2). It is

shown that the light directions can be uniquely determined from the

specular highlights observed in a single view of a sphere without knowing

or recovering the exact radius and center of the sphere. If the sphere is

being observed by multiple cameras, its images will uniquely define the

translation vector of each camera from a common world origin centered at

the sphere center. The relative rotation between the cameras is recovered

using two or more light directions estimated from each view. Preliminary

results of this research have been published in [Wong et al. (2008)].

• a novel solution for the recovery of a planar polygonal area light

source (Chapter 3). The light source is represented as a closed path of

a sequence of straight 3-space line segments. Each 3-space line will be


estimated independently from its reflections (a) on two spheres observed

from a single viewpoint or (b) from two views of a single sphere. An empir-

ical analysis will show that spherical line estimation from a single view

and a single sphere is ill-conditioned. Preliminary results of this research

have been published in [Schnieders et al. (2009)].

• an approximate solution for the estimation of eye positions and

visual display from a single image (Chapter 4). Specular reflections

on the cornea of the quasi-spherical eye provide strong constraints on

the environment surrounding the subject, and can be exploited to find

the position of the visual display in front of the subject. A geometric

eye model based on the anatomy of human eyes is proposed and is used

to determine the location of the eyes. The point-of-gaze is estimated

from the positions of the eyes and the display. In contrast to existing

work, there are no subject specific parameters necessary for determining

where the subject is gazing relative to a visual display. Apart from the

camera intrinsics, setup specific parameters do not need to be known.

Preliminary results of this research have been published in [Schnieders

et al. (2010)].

1.4 Thesis Outline

The remainder of this thesis is organized as follows.

Chapter 2. This chapter recovers multiple point light sources from their

reflections on a sphere. It begins by giving a survey on existing methods

for distant point light estimation and then addresses the problem of sphere

reconstruction from a single image. With unknown radius, a one-parameter

family of solutions will be obtained with all the sphere centers lying on the line

joining the camera center and the true sphere center. The standard technique

for recovering light directions from the observed highlights of a sphere with

known radius and location is then briefly reviewed. The chapter then proves

that any sphere from the family of solutions recovered from a single image


can be used to estimate the light directions. A method for recovering the

camera poses based on quaternion representation of rotations is presented, and

a summary of the proposed algorithms is given. Finally, experimental results

on both synthetic and real data demonstrate the accuracy and the robustness

of the method.

Chapter 3. In this chapter, a polygonal light source is estimated. It

first briefly reviews the literature on related work and then introduces the the-

ory for line reconstruction. An algebraic formulation for line intersections in

Plucker-space is formulated and an empirical analysis shows that spherical line

estimation from a single view is ill-conditioned. It is shown that a line can be

estimated robustly by introducing an additional sphere or an additional view.

The line reconstruction algorithm is applied to polygonal light source estima-

tion, and experiments on real data show the usefulness of the proposed method.

Finally, the proposed algorithm is compared to a point based reconstruction

algorithm and experiments on real data are presented.

Chapter 4. This chapter estimates a visual display and a point-of-gaze

from eye reflections. A survey of the literature on gaze estimation is first pre-

sented. An introduction to the general shape and dimensions of the human eye

follows and an approximate geometric eye model is proposed. The geometric

eye model consists of two sphere segments of different sizes placed one in front

of the other. It is shown that the limbus (iris boundary) can be reconstructed

up to a sign ambiguity from its perspective projection. Using the theory pro-

posed in Chapter 3, a closed form solution for the reconstruction of a display

edge from eye reflections is proposed. A rectangle is extracted from the four in-

dependently estimated display edges. Finally, the point-of-gaze from different

subjects is determined on the visual display in experiments.

Chapter 5. This chapter presents a summary of the theories and algo-

rithms developed in this dissertation, followed by a brief discussion of possible

future work.


Chapter 2

Light Direction Estimation

2.1 Introduction

This chapter introduces a novel, closed form solution for recovering multiple

distant point light sources and camera positions from a single sphere. Tradi-

tional methods for estimating light directions using spheres either assume both

the radius and center of the sphere being known precisely, or depend on multi-

ple calibrated views to recover these parameters. This chapter will show that

the light directions can be uniquely determined from the specular highlights

observed in a single view of a sphere without knowing or recovering the exact

radius and center of the sphere. Besides, if the sphere is being observed by

multiple cameras, its images will uniquely define the translation vector of each

camera from a common world origin centered at the sphere center. It will be

shown that the relative rotation between the cameras can be recovered using

two or more light directions estimated from each view.

The robustness of the proposed method is evaluated using noisy data and

the method is compared to the 8 point algorithm [Hartley (1995); Longuet-

Higgins (1981)]. Experimental results for 3D reconstruction and augmented

reality show that the presented method is practical and accurate.

There is a large number of applications that require estimating light directions. Note that the analysis of an object's appearance under near light sources is often difficult. For near light, both the direction and the distance of the light source vary over the surface of the object. For the sake of simplicity,


computer vision techniques such as shape from shading, photometric stereo

and augmented reality often assume directional light sources. The proposed

method enables the estimation of multiple light directions and camera poses in a simple closed form solution. This can greatly simplify the experi-

mental setup of many computer vision applications which require directional

lights and multiple viewpoints.

The rest of this chapter is organized as follows. Sect. 2.2 discusses previous

work on distant point light estimation. Sect. 2.3 addresses the problem of

sphere reconstruction from a single image. It is shown that with unknown

radius, a one-parameter family of solutions will be obtained with all the sphere

centers lying on the line joining the camera center and the true sphere center.

Sect. 2.4 briefly reviews the standard technique for recovering light directions

from the observed highlights of a sphere with known radius and location. It

then proves that any sphere from the family of solutions recovered from a

single image can be used to estimate the light directions. Sect. 2.5 presents

a method for recovering the camera poses based on quaternions using the

recovered sphere and light directions. A brief summary of the algorithms is

given in Sect. 2.6. Finally, experimental results on both synthetic and real

data are given in Sect. 2.7, followed by conclusions in Sect. 2.8.

2.2 Related Work

In the literature, there exists a relatively large body of work dealing with the

estimation of light directions. In early work [Pentland (1982)], a maximum-

likelihood method was proposed for the estimation of a single distant light

source. In experiments, estimates of the maximum-likelihood method were

compared to those made by humans. In the context of shape from shading

an iterative algorithm that alternately estimates the surface shape and the

direction of a single source was published in [Brooks & Horn (1985)]. Similarly,

the work [Ikeuchi & Sato (1991)] and [Zheng & Chellappa (1991a)] estimated

a single distant point light source. However, multiple light sources are of-

ten present in a natural environment, and the problem of estimating multiple

illuminant directions is generally more challenging.


Surface normals and image intensities at occluding boundaries impose con-

straints to estimate multiple light directions [Yang & Yuille (1991)]. However,

a unique solution for more than four light sources cannot be computed from a

single image under the Lambertian model. In [Zhang et al. (2001)], multiple

illuminants from a sphere of known physical size were estimated by identifying

critical points. These points have a maximal change of intensity and occur

on a smooth surface whenever the illuminant direction is perpendicular to the

normal of the surface. Critical points were determined on a Lambertian sphere

with a least squares and iterative technique. Unfortunately, critical points are

difficult to detect because they are sensitive to noise. In addition, the method

only succeeds if there are no opposite light directions.

To tackle the aforementioned shortcomings concerning noise sensitivity,

it was proposed to segment an object into light patches [Wang & Samaras

(2008)], where each patch is illuminated by a different set of sources. The

boundaries of those light patches are the critical points. For an object with

known geometry and albedo, light directions and intensities were determined.

In other work [Wang & Samaras (2003)], shading and shadow cues were com-

bined in a hybrid approach to improve the estimation of a set of directional

light sources. In [Li et al. (2003)], cues from shading, shadow, and specular

reflections were combined for the estimation of a directional illuminant in a

textured scene. Unlike existing work, they can deal with effects of texture

but make the assumption that texture edges in the scene are not too densely

distributed. Textured objects illuminated by multiple point light sources are

studied in [Lagger & Fua (2008)]. In contrast to the work of Li et al., specular

reflections were used to address the light source recovery. The algorithm of

Lagger and Fua can operate in the presence of an arbitrary texture and an

unknown number of light directions, but unfortunately, 3D geometry is as-

sumed to be known. The aforementioned methods are mostly based on the

Lambertian model, and they all require prior knowledge of the projections of a

reference object with known geometry to give the relationship between surface

orientations and image intensities.

The specular reflection component of light is known to work in a very pre-

dictable manner, and it can be exploited for light estimation. A mirror sphere


was utilized in [Debevec (1998)] to estimate the global illumination in a real

world scene. Using such a mirror sphere might, however, change the scene illu-

mination due to its strong reflection properties. Besides, it can be challenging

to extract the boundary of a reflective object. Instead of using a purely spec-

ular sphere, a sphere which exhibits both specular and diffuse components was

utilized in [Zhou & Kambhamettu (2002)]. Zhou and Kambhamettu proposed

an iterative method to recover the location and radius of a sphere from a pair

of calibrated images, and used the recovered sphere to estimate the light di-

rections from the specular highlights on the sphere. Similar to their work, this

chapter considers the problem of recovering multiple distant light sources from

a single sphere with unknown radius and location. Unlike [Zhou & Kamb-

hamettu (2002)] which requires multiple fully calibrated views for recovering

the radius and location of the sphere via an iterative method, it will be shown

in this chapter that light directions can be recovered directly from a scaled

sphere estimated from a single view. Given multiple views of the sphere, a

closed form solution is introduced to estimate the relative positions and ori-

entations of the cameras using the recovered light directions. Hence, both

the light directions and camera poses can be recovered using a single sphere.

The proposed method will work under the assumption of a perspective camera

with known intrinsics, observing a sphere that reflects multiple distant point

light sources. It will be shown that at least two light sources are necessary to

determine the extrinsic parameters of the cameras. A closed form solution for all the estimations is presented. This closed form solution is computed in a single step and, unlike an iterative method, does not depend on a good initial guess.

Specular reflections have been used previously for camera pose refinements

[Lagger et al. (2008)], where it was demonstrated that reflections can be ex-

ploited for accurate registration of shiny objects. Instead of using specularities

as an additional constraint for camera pose, in this thesis we will use specu-

larities as the only source of information for rotation estimation.


2.3 Reconstruction of Sphere

In the following, it will be shown that a sphere can be reconstructed up to its radius by determining the central axis of the cone defined by the sphere's projection and the camera center.

Consider a pinhole camera P viewing a sphere S. Without loss of generality, let the radius and center of the sphere be R and $[\,X_c\ Y_c\ Z_c\,]^T$ respectively, and let the camera coordinate system coincide with the world coordinate system. The sphere S can be represented as a quadric surface by the 4×4 symmetric matrix
$$Q_s = \begin{bmatrix} I_3 & -S_c \\ -S_c^T & S_c^T S_c - R^2 \end{bmatrix}, \qquad (2.1)$$

where $S_c = [\,X_c\ Y_c\ Z_c\,]^T$ is the sphere center. Any 3D point X lying on S will satisfy the equation $X^T Q_s X = 0$, where X represents its homogeneous coordinates. Suppose the 3×3 calibration matrix K of P is known; the projection matrix for P can then be written as $P = K[\,I_3\ \ 0\,]$. The image of S under P will be a conic C. This (point) conic C can be represented by a 3×3 symmetric matrix C, given by [Hartley & Zisserman (2004)]
$$C = (P Q_s^* P^T)^*, \qquad (2.2)$$
where $Q_s^*$ denotes the dual to the quadric $Q_s$ and is equal to
$$Q_s^* = Q_s^{-1} = -\begin{bmatrix} S_c S_c^T/R^2 - I_3 & S_c/R^2 \\ S_c^T/R^2 & 1/R^2 \end{bmatrix}. \qquad (2.3)$$
The conic image can now be expressed in terms of K, $S_c$ and R as
$$C = (K K^T - (K S_c)(K S_c)^T/R^2)^*. \qquad (2.4)$$

Any 2D point x lying on C will satisfy the equation $x^T C x = 0$, where x represents the homogeneous coordinates of the point.

The conic image C and the camera P will define a cone $Q_{co}$, which will be tangent to S and can be represented by a 4×4 matrix
$$Q_{co} = P^T C P = \begin{bmatrix} K^T C K & 0 \\ 0^T & 0 \end{bmatrix}. \qquad (2.5)$$


Figure 2.1: The conic image C of the sphere S and the camera center O will define a right circular cone. This cone is tangent to S and its axis passes through the sphere center Sc.

Note that Qco is a right circular cone. Its axis will pass through the camera

center O and the sphere center Sc (see Fig. 2.1). If the radius R of the sphere S

is known, Sc can be uniquely determined along this axis. In the next paragraph,

a closed form solution for Sc will first be derived under a special case (C is a

circle). The method for estimating Sc under the general case (C is a conic)

will then be discussed.

Special case: Consider the case where the sphere center lies along the

positive Z-axis, and the camera calibration matrix is given by the identity

matrix I3. Under this configuration, the sphere center will have coordinates

Sc = [ 0 0 d ]T. Note that d is also the distance between the camera center

and the sphere center. The image of the sphere can be obtained using (2.4),

and is given by
$$C = (I - S_c S_c^T/R^2)^* = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{R^2}{R^2 - d^2} \end{bmatrix}. \qquad (2.6)$$
Note that C represents a circle with radius $r = \sqrt{-\frac{R^2}{R^2 - d^2}}$. The center of C is at the origin, which is also the image of the sphere center. Given the radius r of C, the distance d between the camera center and the sphere center can be


recovered as
$$d = R\,\frac{\sqrt{1 + r^2}}{r}, \qquad (2.7)$$

and the location of the sphere center follows.

General case: Consider the case where the sphere center and the camera

calibration matrix are given by Sc and K respectively. Generally, the image

of the sphere will no longer be a circle centered at the origin, but a conic C

centered at an arbitrary point xa. Note that xa is in general not the image

of Sc. In order to recover Sc from C, the effect of K is first removed by

normalizing the image using $K^{-1}$. The conic C will be transformed to a conic $\hat{C} = K^T C K$ in the normalized image. This conic can be diagonalized into
$$\hat{C} = M D M^T = M \begin{bmatrix} a & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & b \end{bmatrix} M^T, \qquad (2.8)$$
where M is an orthogonal matrix whose columns are the eigenvectors of $\hat{C}$, and D is a diagonal matrix consisting of the corresponding eigenvalues. Let $a > 0$ and $b < 0$; then the matrix $M^T$ defines a rotation that will transform $\hat{C}$ to the circle D with radius $r = \sqrt{-\frac{b}{a}}$ centered at the origin. This transformation corresponds to rotating the camera about its center until its principal axis passes

through the sphere center. This reduces the general case to the previously

described special case, and the distance d between the camera center and the

sphere center can be recovered in terms of r and R. Finally, the sphere center

can be recovered as
$$S_c = M[\,0\ \ 0\ \ d\,]^T = d\,m_3, \qquad (2.9)$$

where m3 is the third column of M.
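To make the recovery above concrete, the following is a minimal numpy sketch of the sphere-center computation, assuming the conic matrix of the silhouette and the calibration matrix are given; the function name, the eigenvalue sign handling and the choice of the solution in front of the camera are illustrative assumptions rather than part of the thesis.

```python
import numpy as np

def sphere_center_from_conic(C, K, R):
    """Sketch of Sect. 2.3: recover the sphere center from its conic image.

    C: 3x3 symmetric conic matrix of the sphere silhouette (x^T C x = 0)
    K: 3x3 intrinsic calibration matrix
    R: assumed sphere radius; any value picks one member of the
       one-parameter family of solutions along the cone axis
    """
    # Normalize the image: C_hat = K^T C K removes the effect of K.
    C_hat = K.T @ C @ K

    # Diagonalize C_hat = M diag(a, a, b) M^T with a > 0 and b < 0 (2.8).
    w, M = np.linalg.eigh(C_hat)               # ascending eigenvalues
    if np.sum(w > 0) == 1:                     # the conic is only defined up to scale,
        w = -w                                 # so flip the overall sign if needed
    order = np.argsort(w)[::-1]                # positive pair first, negative last
    w, M = w[order], M[:, order]
    a, b = 0.5 * (w[0] + w[1]), w[2]

    # Circle radius in the rotated view and camera-to-center distance (2.7).
    r = np.sqrt(-b / a)
    d = R * np.sqrt(1.0 + r**2) / r

    # The sphere center lies along the back-rotated principal axis (2.9).
    m3 = M[:, 2] if M[2, 2] > 0 else -M[:, 2]  # pick the solution in front of the camera
    return d * m3
```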

2.4 Illuminant Direction Estimation

Suppose the center Sc of a sphere with known radius R has been estimated using the method described in the previous section. It is then straightforward to recover the light direction from the observed highlight on the sphere.


Figure 2.2: The angle of the incoming light equals the angle of the outgoing light at a surface point with a highlight.

Simply

construct a ray from the camera center through a pixel corresponding to a

highlight and locate the intersection of this ray with the sphere to determine

the point on the sphere giving rise to the highlight (see Fig. 2.2). By using the

property that the angle of the incoming light equals the angle of the outgoing

light to the camera at a surface point with highlight (see Fig. 2.3), the light

direction L can be recovered as
$$L = V - (2N \cdot V)N, \qquad (2.10)$$
where
$$V = \frac{K^{-1}x}{|K^{-1}x|} \qquad (2.11)$$
is the unit viewing vector constructed from the specular highlight point x in the image, $N = \frac{X - S_c}{|X - S_c|}$ is the unit surface normal vector at X, and X is a point with a specular highlight on the sphere that projects to x in the image. The point X is determined as the first intersection between the viewing ray defined by the vector V and the sphere.
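The ray-sphere intersection and the mirror reflection of (2.10)-(2.11) can be sketched as follows (numpy assumed; the function name and the error handling are illustrative):

```python
import numpy as np

def light_direction_from_highlight(x, K, Sc, R):
    """Sketch of Sect. 2.4: light direction from one specular highlight.

    x: highlight pixel (2-vector), K: 3x3 calibration matrix,
    Sc: sphere center in camera coordinates, R: sphere radius consistent with Sc.
    """
    # Unit viewing vector through the highlight pixel (2.11).
    v = np.linalg.solve(K, np.array([x[0], x[1], 1.0]))
    v /= np.linalg.norm(v)

    # First intersection X of the ray lambda * v with the sphere (Sc, R).
    b = -2.0 * np.dot(v, Sc)
    c = np.dot(Sc, Sc) - R**2
    disc = b**2 - 4.0 * c
    if disc < 0:
        raise ValueError("viewing ray misses the sphere")
    lam = (-b - np.sqrt(disc)) / 2.0           # smaller root = nearer intersection
    X = lam * v

    # Unit surface normal at X and mirror reflection (2.10).
    n = (X - Sc) / np.linalg.norm(X - Sc)
    return v - 2.0 * np.dot(n, v) * n          # direction towards the illuminant
```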

Now suppose the radius R of the sphere is unknown. It has been shown in

Sect. 2.3 that there exists a one-parameter family of solutions for the sphere

center Sc which all lie on the straight line joining the camera center and the


Figure 2.3: The light direction can be recovered as L = V − (2N · V)N.

true sphere center. It will now be shown that the light direction recovered

from an observed highlight using any of these scaled spheres will be identical.

In other words, light directions can be recovered from the highlights observed

in the image of a sphere without knowing its size and location.

Proposition 2.4.1 Consider a ray cast from the camera center and the fam-

ily of spheres with varying radius recovered from the conic image of a sphere. If

this ray intersects any of these spheres, it will intersect all the spheres and the

first point of intersection with each sphere will all have the same unit surface

normal.

Proof The cone constructed from the camera center and the conic image of the sphere is tangent to all the recovered spheres. Hence, any ray lying within

this cone will intersect all these spheres, whereas any ray lying outside this

cone will intersect none of them.

To prove that the intersection points have the same unit surface normal, it

is sufficient to consider only the cross-section containing both the ray and the

line defined by the sphere centers (see Fig. 2.4). Without loss of generality,

consider a sphere S1 from the family, and let its radius and center be r1 and

Sc1 respectively. Suppose the ray intersects S1 at X1. The surface normal N1

at X1 is given by the vector Sc1X1. Now consider a second sphere S2 from

the family, and let its radius and center be r2 and Sc2 respectively. A line

being parallel to N1 can be constructed from Sc2, and let the intersection


Figure 2.4: The first intersection point of the ray with each sphere from the family of solutions will have the same unit surface normal.

point between this line and S2 be X2. By construction, the surface normal

N2 at X2 will be parallel to N1. Consider the two triangles $\triangle OS_{c1}X_1$ and $\triangle OS_{c2}X_2$. Obviously, $|X_1S_{c1}| : |X_2S_{c2}| = r_1 : r_2$. It follows from (2.7) that $|OS_{c1}| : |OS_{c2}| = r_1 : r_2$. Finally, by construction, $\angle OS_{c1}X_1 = \angle OS_{c2}X_2$. Hence $\triangle OS_{c1}X_1$ and $\triangle OS_{c2}X_2$ are similar and $\angle S_{c1}OX_1 = \angle S_{c2}OX_2$. It

follows that the ray will intersect S2 at X2 at which the surface normal N2 is

parallel to the surface normal N1 at X1. Since the two spheres being considered

are chosen arbitrarily, the same argument applies to all spheres in the family,

and the proof is completed.

This proof shows that there is a homothetic transformation with center O

between the first sphere and the second sphere preserving the directions of

vectors.

From (2.10), the light direction L only depends on the unit viewing vector

V and the unit surface normal N. The following corollary therefore follows

immediately from Proposition 1:

Corollary 2.4.2 The light direction estimated from an observed specular high-

light in an image of a sphere will be independent of the radius used in recovering

the location of the sphere center.


2.5 Camera Pose Estimation

Suppose two images of a sphere are captured from two distinct viewpoints. By

applying the method described in Sect. 2.3 to each image independently, the

sphere center can be recovered in each of the two camera-centered coordinate

systems respectively. By assuming an arbitrary but fixed radius for the sphere

in both views, it is possible to relate the two cameras in a common coordinate

system. Without loss of generality, let the sphere center in the camera-centered

coordinate system of the first view be Sc and that of the second view be S′c

respectively. By considering a common world coordinate system centered at

the sphere center, the projection matrices for the two views can be written as

$$P = K[\,I\ \ S_c\,], \qquad P' = K'[\,I\ \ S'_c\,]. \qquad (2.12)$$

Note that the above projection matrices are not unique. Due to the symmetry

exhibited in the geometry of the sphere, an arbitrary rotation about the sphere

center (i.e., the world origin) can be applied to the camera without changing

the image of the sphere. This corresponds to rotating the camera around

the sphere while keeping the cone constructed from the image tangent to the

sphere. Hence, by choosing the first camera as a reference view, a more general

form of the projection matrices for the two views is given by

$$P = K[\,I\ \ S_c\,], \qquad P' = K'[\,R\ \ S'_c\,], \qquad (2.13)$$

where R is a 3×3 rotation matrix with three degrees of freedom (corresponding

to the direction of a rotation axis plus the angle of rotation about the axis).

By assuming the light directions being fixed (globally) in both views, the

highlights observed in the two images can be exploited to uniquely determine

the relative rotation between the two cameras. Note that the location of the

highlight on the sphere surface will depend on both the light direction and the

viewpoint. Hence the locations of the highlights due to the same light direction

will be different under two distinct viewpoints, and their projections on the

two images do not provide a pair of point correspondence. Nonetheless, using


the method described in Sect. 2.4, the light direction towards the illuminant

can be recovered in each of the two camera-centered coordinate systems.

Without loss of generality, let the (unit) light direction in the camera-

centered coordinate system of the first view be L and that of the second view

be L′ respectively. Since these two directions are parallel in the common world

coordinate system, the rotation matrix R relating the two cameras will bring

L to L′, i.e.,

RL = L′. (2.14)

The above equation places two independent constraints on R. Hence, observ-

ing two highlights produced by two distinct light directions in two images will

provide four constraints, which is enough to determine the three parameters

of R. In practice, estimated light directions are not exact (noise) and a rota-

tion mapping a set of given light directions {L1...Lm} to another set of light

directions {L′1...L′m} may not be found exactly. Instead we are interested in a

rotation which minimizes the sum of squared residual errors
$$\varepsilon = \sum_{j=1}^{m} \|L'_j - R L_j\|^2, \qquad (2.15)$$
which is minimal if the sum of dot products
$$c = \sum_{j=1}^{m} R L_j \cdot L'_j \qquad (2.16)$$
is as large as possible. Using unit quaternions (Sect. A) to represent the rotation, one can rewrite this as
$$c = \sum_{j=1}^{m} q\,\dot{L}_j\,q^* \cdot L'_j, \qquad (2.17)$$
where q is the unit quaternion representation of the rotation matrix R, and $\dot{L}$ is the quaternion representation of the light direction vector L. This can be rewritten as
$$c = \sum_{j=1}^{m} q\,\dot{L}_j \cdot \dot{L}'_j\,q, \qquad (2.18)$$


since for unit quaternions $qq^* = 1$ and $(qp) \cdot (qr) = p \cdot r$. Using matrix representations for quaternion multiplications, this rewrites to
$$c = \sum_{j=1}^{m} (\bar{L}_j q) \cdot (\bar{L}'_j q) = q^T W q, \qquad (2.19)$$
where $\bar{L}_j$ and $\bar{L}'_j$ are the orthogonal 4×4 quaternion matrices (A.3) of $\dot{L}_j$ and $\dot{L}'_j$, and W is a 4×4 symmetric matrix
$$W = \sum_{j=1}^{m} \bar{L}_j^T \bar{L}'_j. \qquad (2.20)$$

From basic eigenvalue theory, the unit quaternion which maximizes (2.19) is

given by the eigenvector e of W corresponding to the largest positive eigen-

value. Finally the orthonormal rotation matrix can be extracted from (A.6)

as the lower-right-hand submatrix of ETE, where E is the quaternion matrix

representation (A.3) of e.
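A compact numpy sketch of this rotation recovery is given below. The explicit left and right quaternion multiplication matrices stand in for the quaternion matrices of (A.3), whose exact convention is not reproduced in this chapter, and the function names are illustrative.

```python
import numpy as np

def left_qmat(v):
    """4x4 matrix of left quaternion multiplication by the pure quaternion (0, v)."""
    x, y, z = v
    return np.array([[0, -x, -y, -z],
                     [x,  0, -z,  y],
                     [y,  z,  0, -x],
                     [z, -y,  x,  0]], dtype=float)

def right_qmat(v):
    """4x4 matrix of right quaternion multiplication by the pure quaternion (0, v)."""
    x, y, z = v
    return np.array([[0, -x, -y, -z],
                     [x,  0,  z, -y],
                     [y, -z,  0,  x],
                     [z,  y, -x,  0]], dtype=float)

def rotation_from_light_directions(L, L_prime):
    """Rotation R with R @ L[j] ~ L_prime[j] for matching unit light directions.

    L, L_prime: (m, 3) arrays of corresponding unit directions (m >= 2)."""
    # Build the symmetric 4x4 matrix W of (2.19)-(2.20).
    W = sum(right_qmat(l).T @ left_qmat(lp) for l, lp in zip(L, L_prime))

    # The optimal unit quaternion is the eigenvector of W with the largest eigenvalue.
    vals, vecs = np.linalg.eigh(W)
    w, x, y, z = vecs[:, np.argmax(vals)]

    # Convert the quaternion to an orthonormal rotation matrix.
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])
```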

Finding the transformation between points measured in two different coor-

dinate systems (also known as absolute orientation) is a well studied subject

within the field of photogrammetry, and the results obtained here coincide with

results derived in [Horn (1987)].

2.6 Summary

This section summarizes the proposed algorithm for the recovery of multiple

distant point light sources and camera positions. The algorithm requires

• the number of images n,

• the number of light sources m,

• an outline of the sphere in form of a conic matrix Ci for each image,

• a camera intrinsic calibration matrix Ki for each image,

• a specular pixel xij for each image and light, and


• the radius of the sphere R

as input. The computation is performed in the following three steps:

1. Determine the sphere centers Sci for each camera i (Algorithm 1).

2. Calculate the light directions Lij for each camera i and light j (Algo-

rithm 2).

3. Calculate the rotation matrices Ri for each camera i (Algorithm 3).

Finally, the output of the algorithm is given as the rotation matrices Ri, the

translation vectors Ti = Sci, and the light directions L0j.

Algorithm 1 Calculation of sphere centers Sci for each camera i.
Require: conic matrices Ci, calibration matrices Ki, number of images n, radius of sphere R.
  for i = 1 to n do
    C ⇐ Ki^T Ci Ki
    M diag(a1, a2, b) M^{-1} ⇐ eigenvalue decomposition of C {see (2.8)}, with a1, a2 > 0 and b < 0
    a ⇐ (a1 + a2)/2
    r ⇐ √(−b/a)
    d ⇐ R √(1 + r²)/r
    Sci ⇐ M [ 0 0 d ]^T
  end for

The essential matrix [Longuet-Higgins (1981)], relating points in the image of the first camera to corresponding points in the image of the i-th camera, can be obtained as
$$E_i = [\,T_1 - R_i T_i\,]_\times R_i, \qquad (2.21)$$
where the operation
$$[v]_\times = \begin{bmatrix} 0 & -v_z & v_y \\ v_z & 0 & -v_x \\ -v_y & v_x & 0 \end{bmatrix} \qquad (2.22)$$
denotes the matrix representation of the cross product for the vector $v = [\,v_x\ v_y\ v_z\,]^T$.
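A direct transcription of (2.21)-(2.22), assuming numpy arrays for the rotations and translations, might look as follows:

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x of (2.22)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def essential_matrix(T1, Ri, Ti):
    """Essential matrix of (2.21) relating the first and the i-th camera."""
    return skew(T1 - Ri @ Ti) @ Ri
```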


Algorithm 2 Calculation of light directions Lij for each camera i and light j.
Require: calibration matrices Ki, sphere centers Sci, specular points xij, number of images n, number of lights m, radius of sphere R.
  for i = 1 to n do
    for j = 1 to m do
      V ⇐ Ki^{-1} x̃ij / |Ki^{-1} x̃ij|, where x̃ij is the homogeneous representation of xij
      X ⇐ first intersection of the ray L(λ) = λV with the sphere (Sci, R)
      N ⇐ (X − Sci)/R
      Lij ⇐ V − (2N · V)N
    end for
  end for

Algorithm 3 Calculation of rotation matrices Ri for each camera i.
Require: light directions Lij, number of images n, number of lights m.
  R1 ⇐ I
  for i = 2 to n do
    W ⇐ Σ_{j=1}^{m} (L̄ij)^T L̄0j, where L̄ij and L̄0j are the quaternion matrices of Lij and L0j {see (2.20)}
    perform an eigenvalue decomposition of W
    e ⇐ eigenvector corresponding to the largest eigenvalue
    Ri ⇐ lower-right-hand 3×3 submatrix of E^T E, where E is the quaternion matrix representation (A.3) of e
  end for



2.7 Experimental Results

The closed form solutions described in the previous sections for recovering

sphere centers, light directions and camera poses have been implemented. Ex-

periments on both synthetic and real data were carried out and the results are

presented in the following sections.

2.7.1 Synthetic Data

The experimental setup consists of a synthetic sphere being viewed by two

identical synthetic cameras under four distinct directional lights. The synthetic

images had a dimension of 958 × 838 pixels, and the intrinsic parameters of the cameras were given by the calibration matrix
$$K = \begin{bmatrix} 800 & 0 & 479 \\ 0 & 800 & 419 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (2.23)$$

Fig. 2.5 renders the two camera positions relative to the synthetic sphere

and Fig. 2.6(a) shows two views of a synthetic unit sphere under four light

sources rendered using the parameters of the two cameras.

Images of the sphere were obtained analytically as a conic using (2.4). The

specular point on the sphere (satisfying (2.10)) for a given light direction was

determined with the closed form solution of Sect. C. Fig. 2.6(b) shows the

obtained conics and the specular points.

In order to evaluate the robustness of the proposed method, uniformly

distributed random noise was added to the conic as well as to the locations of

the specular highlights. To add noise to the conic, points were first sampled

and perturbed in a radial direction from the conic center. A noisy conic was

obtained as a conic robustly fitted to these noisy points using a direct least

squares method [Fitzgibbon et al. (1999)]. Noise was added directly to the


Figure 2.5: Two synthetic cameras viewing a sphere under four lights.

coordinates of a specular highlight. Fig. 2.6(c) shows the noisy conic samples

with the fitted conics and the noisy specular points.
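The noisy-conic generation described above can be sketched as follows. The algebraic least-squares fit below is a simple stand-in for the direct method of Fitzgibbon et al. (1999), and the uniform perturbation magnitude is an assumption; a generator such as numpy's default_rng can be passed as rng.

```python
import numpy as np

def fit_conic_lsq(pts):
    """Algebraic least-squares conic fit (stand-in for Fitzgibbon et al. (1999)).
    Returns the 3x3 symmetric conic matrix for ax^2 + bxy + cy^2 + dx + ey + f = 0."""
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack([x*x, x*y, y*y, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(D)
    a, b, c, d, e, f = Vt[-1]                  # null vector of the design matrix
    return np.array([[a, b/2, d/2],
                     [b/2, c, e/2],
                     [d/2, e/2, f]])

def noisy_conic(samples, center, noise_level, rng):
    """Perturb conic samples radially from the conic center and refit (Sect. 2.7.1)."""
    radial = samples - center
    radial /= np.linalg.norm(radial, axis=1, keepdims=True)
    noisy = samples + radial * rng.uniform(-noise_level, noise_level, (len(samples), 1))
    return fit_conic_lsq(noisy)
```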

Experiments on synthetic data with noise levels ranging from 0.0 to 3.0

pixels were carried out. To obtain statistically more meaningful results, 250

independent trials for each noise level were conducted to estimate both the

light directions and the camera poses from the noisy conics and highlights.

Fig. 2.7 plots the mean angular error (in degrees) in the estimated light di-

rection against the noise level (in pixels) for a particular light source. The noise

behavior of the other light sources was similar to the one shown in Fig. 2.7.

It can be seen that the error increases linearly with the noise level. For

a noise level of 1.0 pixel, the error in the estimated light directions is only

around 0.3◦. Fig. 2.8 shows a plot of the mean angular errors (in degrees)

in the estimated rotations of the cameras against the noise level (in pixels).

Experiments for 2 lights and for 4 lights were performed, and the rotation

was decomposed into a rotation axis, and a rotation angle around the axis.

It can be seen that the errors increased linearly with the noise level, which is

expected as the computation of the rotation depends directly on the estimated

light directions. For a noise level of 1.0 pixel using two lights, the mean angular


Figure 2.6: (a) Two views of a synthetic sphere under four light sources rendered using an OpenGL Phong Shader with the viewing parameters of the two synthetic cameras. (b) The obtained conic and specular points. (c) Conics fitted to noisy samples and noisy specular points.


Figure 2.7: A plot of the mean angular error (in degrees) in the estimated light directions against the noise level (in pixels).

errors were approximately 0.45° and 0.3° for the angle between the rotation axis

and for the rotation angle respectively. For a noise level of 1.0 pixel using all

four lights, the error decreased to 0.35◦ and 0.2◦ respectively.

The proposed method was evaluated against the normalized 8 point al-

gorithm. This algorithm, introduced in the Nature article [Longuet-Higgins

(1981)], deals with the recovery of the camera pose from a set of correspond-

ing image points. It was later extended [Hartley (1995)] for the recovery of the

fundamental matrix, which encodes the camera pose and intrinsics. We have

selected it here because it is a widely used method for pose estimation, and compared to the 5 point algorithm [Nistér (2004)] and the 7 point algorithm

[Hartley & Zisserman (2004)], it is significantly less complex and therefore eas-

ier to implement. To apply the 8 point algorithm on the synthetic data, a grid

of 3× 3 points was generated on three perpendicular planes (total 27 points).

The resulting points Ui were projected into the images with the equations

PUi = ui and P′Ui = u′i using the ground truth camera matrices P and P′.

Point correspondences {ui ←→ u′i} were provided as input to the 8 point algo-

rithm, and the obtained fundamental matrix F was compared to the proposed

method. For each point ui, its corresponding epipolar line Fui was computed,


Figure 2.8: A plot of the mean angular errors (in degrees) in the estimated rotations of the cameras against the noise level (in pixels). Experiments for 2 lights and for 4 lights were performed, and the rotation is represented using a rotation axis and a rotation angle around the axis.

and the distance to the corresponding point u′i was estimated. This was done

in both directions and the mean distance was used as a measure of quality

$$e = \sum_{i=1}^{p} \frac{d(F u_i, u'_i) + d(F^T u'_i, u_i)}{2p}, \qquad (2.24)$$
where
$$d(l, p) = \frac{|a x_0 + b y_0 + c|}{\sqrt{a^2 + b^2}} \qquad (2.25)$$

is the distance (in pixels) between a point p = [ x0 y0 ]T and a line l =

[ a b c ]T. In the literature [Hartley & Zisserman (2004)] this measurement

is referred to as the symmetric transfer error. We found that at least 10 points

were necessary to achieve accurate results for the estimation of the fundamental

matrix from noisy correspondences. Fig. 2.9 plots the mean distance to the

epipolar line (in pixels) against the noise level (in pixels). It can be seen that

the proposed method with two, four and eight specular reflections performed

better than the 8 point algorithm using 10 points but worse than the 8 point


Figure 2.9: A plot of the mean distance to the epipolar line (in pixels) against the noise level (in pixels).

algorithm using all available 27 points. Increasing the number of specular

points (adding more light sources) resulted in a smaller error.
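For reference, the error measure of (2.24)-(2.25) can be computed with a few lines of numpy (the function names are illustrative):

```python
import numpy as np

def point_line_distance(l, p):
    """Distance (2.25) between a 2D point p = (x0, y0) and a line l = (a, b, c)."""
    a, b, c = l
    return abs(a * p[0] + b * p[1] + c) / np.hypot(a, b)

def symmetric_transfer_error(F, u, u_prime):
    """Mean symmetric distance (2.24) of p correspondences to their epipolar lines.

    F: 3x3 fundamental matrix; u, u_prime: (p, 2) arrays of matching points."""
    total = 0.0
    for ui, vi in zip(u, u_prime):
        ui_h, vi_h = np.append(ui, 1.0), np.append(vi, 1.0)
        total += point_line_distance(F @ ui_h, vi)      # epipolar line in view 2
        total += point_line_distance(F.T @ vi_h, ui)    # epipolar line in view 1
    return total / (2 * len(u))
```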

2.7.2 Real Data

In real experiments, five light sources were placed approximately 3 meters away

from a blue billiard ball which had a diameter of 54mm. Seven images of the

billiard ball were taken at distinct viewpoints and Fig. 2.10 shows two of the

input images. Note that the billiard ball was imaged with other objects, in-

cluding another billiard ball, but those objects were not used for the estimation

of light and camera. The authors of [Li et al. (2008)] provided their experi-

mental data and here just the sphere is used to estimate both camera pose and

light directions. In Li et al., the shape of the smooth textureless object was recovered and the camera poses were estimated with a planar checkerboard.

Segmentation of the sphere. The outline of the sphere needs to be

extracted from the images. A simple color image segmentation method based

on region growing and robust conic fitting was applied to locate the boundary of

the imaged sphere automatically. The sphere segmentation algorithm proceeds


Figure 2.10: Five light sources were placed approximately 3 meters away from a blue billiard ball. This figure shows two (views E and F) out of the seven images. Note that the blue billiard ball was imaged with other objects, including a red billiard ball, but those objects were not used for the estimation of light and camera. Images provided by [Li et al. (2008)].

as follows:

1. Find the sphere by searching for the largest 4-neighbor connected region

of color bc with a threshold τ1.

2. Find all pixels belonging to the sphere by applying 8-neighbor region growing from the pixels found in the previous step with a second threshold τ2, where τ1 < τ2.

3. Find the outline of the detected pixels by selecting, for each vertical and horizontal scan line, the first and last pixel.

4. Fit a conic to the resulting points of the previous step using a least

squares method [Fitzgibbon et al. (1999)].

5. Robustly remove outliers with a RANSAC approach for conic fitting [Li

et al. (2005)]. The conic from the previous step is used as an initialization

for this method.

For the images in this experiment the thresholds were set to τ1 = 130 and τ2 =

180, and the color of the ball was set to bc = [ 0 0 190 ]T. This segmentation


algorithm will fail if there is a large object of similar color. Fig. 2.11 shows

results of the segmentation method for a case without (left) and with (right)

outliers. The first row shows the input image, second row shows the result

after the 8-neighbor region growing, third row shows the first and last pixel

for each scan line and the last row plots the fitted conic.
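A rough sketch of this segmentation pipeline is given below. Scipy's connected-component labelling stands in for the explicit region growing, the Euclidean color distance is an assumption about how the thresholds are applied, the RANSAC refinement of step 5 is omitted, and fit_conic_lsq is the helper sketched in Sect. 2.7.1 above.

```python
import numpy as np
from scipy import ndimage

def segment_sphere(img, ball_color, tau1=130, tau2=180):
    """Sketch of the sphere segmentation of Sect. 2.7.2.

    img: HxWx3 color image, ball_color: reference color bc, tau1 < tau2 thresholds.
    Returns the 3x3 conic matrix fitted to the sphere outline.
    """
    dist = np.linalg.norm(img.astype(float) - ball_color, axis=2)

    # 1. Largest 4-connected region of pixels close to the ball color (threshold tau1).
    four_conn = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
    seeds, _ = ndimage.label(dist < tau1, structure=four_conn)
    largest = np.argmax(np.bincount(seeds.ravel())[1:]) + 1

    # 2. 8-connected region growing with the looser threshold tau2,
    #    keeping only the components that touch the seed region.
    candidates, _ = ndimage.label(dist < tau2, structure=np.ones((3, 3)))
    grown = np.isin(candidates, np.unique(candidates[seeds == largest]))
    grown &= candidates > 0

    # 3. Outline: first and last sphere pixel of every row and column.
    pts = []
    for r in range(grown.shape[0]):
        cols = np.flatnonzero(grown[r])
        if cols.size:
            pts += [(cols[0], r), (cols[-1], r)]
    for c in range(grown.shape[1]):
        rows = np.flatnonzero(grown[:, c])
        if rows.size:
            pts += [(c, rows[0]), (c, rows[-1])]

    # 4. Least-squares conic fit (the RANSAC refinement of step 5 is omitted here).
    return fit_conic_lsq(np.array(pts, dtype=float))
```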

Segmentation of the specular highlights. Highlights were extracted

as the center of the n brightest regions inside the conic, where n is the number

of light sources. The camera exposure was adjusted such that the size of the highlights was minimal. In the following,

it is assumed that the highlight is elliptical and that the center of the ellipse

corresponds to the most probable light direction. Note that this is an ap-

proximation to simplify the highlight extraction. The algorithm proceeds as

follows:

1. Select all pixels whose gray value pg is close to saturation: (255 − pg) < τ3.

2. Keep the n largest 8-neighbor connected regions from the pixels selected in the previous step.

3. Fit an ellipse to each of the resulting connected components.

4. The center of each ellipse is used as the location of the specular highlight.

For all images in the experiment the threshold was set to τ3 = 25. Highlights

were matched among different views according to their distance to the centroid

of a particular view. Fig. 2.12 shows cropped input images in the first column,

extracted specular highlights and silhouette of the ball in the second column

and matched specular highlights with their centroid in the last column.
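The highlight extraction can be sketched in the same style; the region centroid below is a simple stand-in for fitting an ellipse and taking its center, and the mask of the sphere interior is assumed to be available from the fitted conic.

```python
import numpy as np
from scipy import ndimage

def extract_highlights(gray, conic_mask, n_lights, tau3=25):
    """Sketch of the highlight segmentation of Sect. 2.7.2.

    gray: grayscale image, conic_mask: boolean mask of the sphere interior,
    n_lights: number of light sources, tau3: saturation threshold.
    """
    # 1. Near-saturated pixels inside the sphere silhouette: (255 - pg) < tau3.
    bright = ((255 - gray.astype(int)) < tau3) & conic_mask

    # 2. Keep the n largest 8-connected bright regions.
    labels, num = ndimage.label(bright, structure=np.ones((3, 3)))
    sizes = ndimage.sum(bright, labels, index=range(1, num + 1))
    keep = np.argsort(sizes)[::-1][:n_lights] + 1

    # 3./4. Use the centroid of each region as the highlight location
    #       (stand-in for fitting an ellipse and taking its center).
    centers = ndimage.center_of_mass(bright, labels, index=keep)
    return [(c, r) for r, c in centers]        # (x, y) pixel coordinates
```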

The intrinsic parameters of the camera were obtained using a planar checker-

board pattern [Zhang (2000)]. The intrinsic parameters of the camera may also

be recovered using existing techniques based on the conic images of the sphere

[Agrawal & Davis (2003); Zhang et al. (2007)], eliminating the need of the

planar pattern.

The checkerboard pattern provides dense point correspondences among dif-

ferent viewpoints. These point correspondences were not used for the pose


Figure 2.11: Results of the segmentation method for a case without outliers (left) and with outliers (right). The first row is the input image, second row the result after the 8-neighbor region growing, third row shows the first and last pixel for each scan line and the last row plots the fitted conic.


Figure 2.12: This figure shows cropped input images in the first column, extracted specular highlights and silhouette of the sphere in the second column and matched specular highlights with their centroid in the last column.


Table 2.1: Mean distance to epipolar lines for the first real experiment.

view pair    error [pixel]
A-B          2.691
A-C          2.998
A-D          3.431
A-E          3.667
A-F          3.670
A-G          1.992

These point correspondences were not used for the pose estimation; instead, for each point, the corresponding epipolar line was determined and the distance to the corresponding point was calculated. This

was done in both directions and the mean distance of the epipolar line from

the corresponding point (also known as symmetric transfer error [Hartley &

Zisserman (2004)]) was computed and used as a measure of quality (see (2.24)).
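A minimal sketch of this error measure is given below; it assumes that the fundamental matrix F relating the two views has been assembled from the estimated poses and intrinsics (an assumption of the sketch, not a step stated in the text).

```python
import numpy as np

def point_line_distance(line, pt):
    """Distance of a homogeneous image point to a homogeneous image line."""
    a, b, c = line
    x, y, w = pt
    return abs(a * x / w + b * y / w + c) / np.hypot(a, b)

def symmetric_transfer_error(F, pts1, pts2):
    """Mean distance of corresponding points to their epipolar lines,
    evaluated in both directions; pts1, pts2 are (N, 3) homogeneous points."""
    d = []
    for x1, x2 in zip(pts1, pts2):
        d.append(point_line_distance(F @ x1, x2))    # epipolar line in view 2
        d.append(point_line_distance(F.T @ x2, x1))  # epipolar line in view 1
    return float(np.mean(d))
```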

Table 2.1 lists the mean distances to the corresponding epipolar lines for

the 6 image pairs. It can be seen from the table that the maximum error was below 3.7 pixels for an image resolution of 4752 × 3168 pixels. Fig. 2.13 shows one of the input images with an estimated epipolar line and its corresponding point. This figure helps to judge the magnitude of the error for this particular experiment. For the case shown, the point-to-line distance was 4.3 pixels, above the average error, and the point was rendered with a radius of 2.5 pixels.

In a second experiment, the scene consisted of a few simple objects on a

table. This time, a red billiard ball was placed on the table and just two light

sources illuminated the scene (Fig. 2.14). Like in the previous experiment,

the light was kept constant while the viewpoint was changed. Extraction of

the sphere conic was performed in the same way as above, but this time the

specular highlights were extracted from the same images. The correctness

of the camera pose and light direction estimation was empirically verified by

reconstructing the scene objects and by augmenting additional virtual objects

into the images.

To perform the reconstruction, feature points were manually selected in

two selected views. Fig. 2.15 shows the points for the two selected views F

and I. Note that the feature points were connected into triangles so that the reconstruction can be rendered as a triangle mesh.

Figure 2.13: A point and its corresponding epipolar line in one of the input images.

Since the two cameras were

calibrated with respect to the center of the sphere, it was straightforward to

recover 3D structure. Each point observed by one camera gave two equations

in three unknowns and observing the same feature point with the other cam-

era provided two further equations. This system of four equations in three unknowns was overconstrained and was solved by the ordinary linear least squares method.
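A minimal sketch of this triangulation, assuming 3 × 4 projection matrices that already include the camera intrinsics, is given below.

```python
import numpy as np

def triangulate(Ps, xs):
    """Ordinary linear least squares triangulation: every observation (u, v)
    under a 3x4 projection matrix P contributes two linear equations in the
    three unknown point coordinates; two views give a 4x3 overconstrained system."""
    A, b = [], []
    for P, (u, v) in zip(Ps, xs):
        for row, coord in ((P[0], u), (P[1], v)):
            A.append(coord * P[2, :3] - row[:3])
            b.append(row[3] - coord * P[2, 3])
    X, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return X
```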

Fig. 2.16 shows the reconstruction results rendered from different view-

points with and without textures, and Fig. 2.17 shows the estimated camera

positions. The two cameras in red correspond to views F and I. Note that

adjacent faces of the rectangular boxes in the reconstruction appear to meet

at a right angle and the relative positions of the objects appear to be correct.

Reconstructions of the four rectangular boxes are quantitatively compared to

physical measurements in Table 2.2. It can be seen that the maximum error

is below 0.5cm and the mean estimation error is just 0.19cm.

An interesting application for camera pose estimation is Augmented Reality,

in which virtual objects are superimposed (augmented) onto the real images.


Figure 2.14: In the second experiment, an image sequence consisting of a few

simple objects on a table was used. This time a red billiard ball was placed on

the table and just two light sources illuminated the scene. As in the previous

experiment, the light was kept constant and the viewpoint was changed.


Figure 2.15: To perform the reconstruction, feature points are manually se-

lected in views F and I. Note that the feature points are connected into

triangles so that the reconstruction can be rendered as a triangle mesh.

Figure 2.16: Reconstruction result rendered from different viewpoints with and

without textures.


Figure 2.17: A rendering of estimated camera positions and the reconstruction.

The two cameras in red correspond to views F and I.

Table 2.2: Reconstructions of the four rectangular boxes are quantitatively

compared to physical measurements. The maximum error is below 0.5cm and

the mean estimation error is just 0.19cm.

edge                   measured [cm]   estimated [cm]   error [cm]
Book length            23.0            22.88            0.12
Book height            2.5             2.31             0.19
Book depth             16.5            16.23            0.27
Phone box length       8.9             8.84             0.06
Phone box height       8.0             7.87             0.13
Phone box depth        14.5            14.10            0.40
Zoom Lens box length   13.5            13.56            0.06
Zoom Lens box height   13.5            13.23            0.27
Zoom Lens box depth    20.0            19.54            0.46
Lens box length        10.0            10.06            0.06
Lens box height        10.5            10.40            0.10
Lens box depth         11.0            11.11            0.11


Figure 2.18: Close up of the view K with and without augmented bunny and

sphere.

In Fig. 2.19 a virtual bunny was augmented into the scene on top of the book.

Fig. 2.18 shows a close up view with and without augmentation of sphere and

bunny. A sphere was also rendered on top of the real sphere and both the

color of the sphere and the bunny have been set to the color of the billiard

ball. Sphere and bunny were rendered with the estimated light positions to

increase the level of realism. The accurate registration of the sphere and the bunny in the images demonstrates the quality of the estimated camera poses.

2.8 Conclusions

This chapter recovered both the light directions and camera poses from a single

sphere. The two main contributions of this chapter were

1. a closed form solution for recovering light directions from the specular

highlights observed in a single image of a sphere with unknown size and

location; and

2. a closed form solution for recovering the relative camera poses using the

estimated sphere and light directions.

It was shown that a scaled sphere can be reconstructed from its image, given

the intrinsic parameters of the camera. The translation of the sphere center

from the camera center can be determined uniquely, but the distance between them will be scaled by the unknown radius of the sphere.

Figure 2.19: A virtual bunny was augmented into the scene on top of the book. A sphere was rendered on top of the real sphere and both the color of the sphere and the bunny have been set to the color of the billiard ball.

It was then shown

that the light directions can be recovered independent of the radius chosen in

locating the sphere. If the sphere is observed by multiple views, the sphere

center, recovered using a common fixed radius, will fix the translations of the

cameras from the sphere center. The relative rotations between the cameras

were then determined by aligning the relative light directions recovered in each

view. As closed form solutions exist for all the computation steps involved, the proposed method is extremely fast and efficient. Experiments on

both synthetic and real images showed promising results. With the proposed

method, both the light directions and camera poses can be estimated simulta-

neously. This greatly eases the task of multiple-view light estimation.

The proposed method has the following limitations which could be ad-

dressed in future work:

• The estimation quality of both light directions and camera poses depends

on the quality of the extracted silhouette and specular highlights. If

the sphere's silhouette or the specular highlights cannot be accurately extracted, the camera pose and light direction estimates will be inaccurate. More sophisticated methods for silhouette and highlight extraction, e.g. based

on temporal tracking information, could be employed.

• The center of the specular highlight does not correspond to the most

probable light direction that causes the specular highlight. In future

work we would like to study the shape of the specular highlight to further

improve estimation results.


Chapter 3

Polygonal Light Source Estimation

3.1 Introduction

Many computer vision techniques use point light sources to reduce the com-

plexity in modeling image formation and to simplify the analysis. As an ex-

ample, consider the classic shape from shading technique (SfS), which recovers

the 3D shape of an object by relating the intensity values to a normal vector

and light source direction. Point light sources can be estimated easily, as we

have seen in the previous chapter. However, there are different types of light

sources in practice [Langer et al. (1997)], and not all sources can be approxi-

mated by a point (see Fig. 3.1). In general, a light source can be considered

as a point if its size is negligible compared with its distance from the scene.

In this chapter, a novel solution for the recovery of a planar area light source

is proposed. Here, we consider a planar polygonal light source, instead of a

general area light source. A planar polygonal light source can be represented as

a closed path of a sequence of straight 3-space line segments. In the proposed

method, each 3-space line will be estimated independently, and subsequently,

a planar polygon will be extracted.

Area light sources have been used in computer vision for several applica-

tions: photometric stereo [Funk & Yang (2007)], normal map reconstruction

[Francken et al. (2008b)], acquisition of mirror objects [Tarini et al. (2005)],


Figure 3.1: In commercial or institutional buildings, large fluorescent area

lights are commonly used.

scanning of small scale surface details [Francken et al. (2008a)], nearby area

light photometric stereo [Clark (2006)], and real-time surface reconstruction

[Schindler (2008)]. In augmented reality, it is possible to overlay synthetic images on top of real images without knowing the positions of light sources. When,

however, a realistic rendering is required, light sources need to be estimated

[Bimber & Raskar (2005)].

Unfortunately, in vision applications, the camera and the light source usually face a similar direction, so the light source is not directly visible. Instead

of a direct view of the light source, we consider a specular reflection on a

reference sphere. It is required that the sphere is reflective (reflecting the

specular component of light), and, as in Chapter 2, a standard billiard ball is

used in practice (Fig. 3.2).

This chapter has the following main contributions:

1. It is shown that spherical line estimation from a single view and single

sphere is ill-conditioned, i.e. a small error in the input results in a large

error in the output. An empirical analysis shows that there exists a

strong dependency between the line distance (to the sphere) and the reconstruction error.

Figure 3.2: A billiard ball reflecting a single rectangular fluorescent office lamp.

2. A novel closed-form solution for line reconstruction from two spherical

reflections is proposed. In contrast to existing work ([Francken et al.

(2007); Lanman et al. (2006a); Nayar (1988); Nene & Nayar (1998)]),

this work estimates lines instead of points; it is not necessary to provide

point correspondences.

3. Spheres do not need to be calibrated, and the reconstruction is up to the

scale of the sphere. A complex calibration for the spherical mirrors (as

proposed in [Lanman et al. (2006b)]) is not necessary.

4. A novel algorithm for reconstruction of a polygon from the reflections (a)

on two spheres observed from a single viewpoint or (b) from two views

of a single sphere.

The rest of the chapter is organized as follows. A survey of the literature

on related work is given in Sect. 3.2. In Sect. 3.3, the theory for line recon-

struction is introduced. Sect. 3.3.1 introduces an algebraic formulation for line

intersections in Plucker space and Sect. 3.3.2 describes an empirical analysis


showing that spherical line estimation from a single view is ill-conditioned.

Sect. 3.3.3 shows that a line can be estimated robustly by introducing an

additional sphere and Sect. 3.3.4 introduces an additional view of a single

sphere. The line reconstruction algorithm is applied to polygonal light source

estimation in Sect. 3.4. Experiments on real data show the usefulness of the

proposed method in Sect. 3.5. Finally, the proposed method is compared to a

point-based reconstruction algorithm in Sect. 3.6, followed by conclusions in

Sect. 3.7.

3.2 Related Work

There exists a relatively large amount of research dealing with point illuminant

estimation, and many early results were published in the context of SfS [Brooks

& Horn (1985); Pentland (1990); Zheng & Chellappa (1991b)]. A survey of

related methods can be found in [Zhang et al. (1999)].

The literature in the field of area light source estimation is sparse. In

[Debevec (1998)], the global illumination in the context of augmented reality

was estimated. Parameters such as distance and size of the illuminants were

not estimated explicitly.

Spherical mirrors have previously been used for reconstruction in [Nayar

(1988)], where points were triangulated using a single camera and multiple

spheres. In the paper, Nayar described how to match a point in the image of

one sphere to a particular point in the image of another sphere. This work

was extended in a later paper [Nene & Nayar (1998)], for the use of planar,

ellipsoidal, hyperboloidal, and paraboloidal mirrors. A screen camera setup

was estimated in [Tarini et al. (2005)] and [Francken et al. (2007)], where screen

corners were triangulated from multiple spheres. The proposed work in this

chapter differs from these existing methods, because, instead of reconstructing

points, we reconstruct lines. Lines provide stronger constraints (four vs. three

degrees of freedom), and in addition, lines do not require point correspondences.

The position of a rectangular computer screen was estimated in [Funk &

Yang (2007)] from a planar mirror. The mirror had a calibration pattern

attached to it, and it reflected another calibration pattern on the computer


screen. In [Sturm & Bonfort (2006)], a theoretical formulation for the estima-

tion of the pose of a known object relative to a camera was derived for the case

where the camera had no direct view of the object. In experiments, a planar

LCD monitor emitting multiple patterns reflected on planar mirrors was used.

It was shown that at least three such planar mirrors were necessary to solve

the pose problem. A screen-camera setup was estimated from the reflections

of gray code patterns on multiple spheres in [Francken et al. (2009)]. Each

pixel on the spherical mirror was uniquely identified using coded illumination

patterns. In contrast to these existing methods, the proposed method does

not need any patterns for the estimation. This enables the estimation of light

sources which cannot be controlled.

In this chapter, an area light source is recovered in 3D from an image of

the specular highlight it produces on a sphere. Unlike existing work [Zhou &

Kambhamettu (2008)], which used an iterative approach for estimating area

light sources from specularities observed on two spheres, this chapter provides

a closed form solution by treating an area light source as a polygon in 3-space.

Lines are independently determined as the intersections of reflected rays on a

sphere. Compared to an iterative method, the closed form solution presented

here has the advantage of guaranteed convergence.

There exists previous work on estimating a line from a single view. Single

view line estimation from a sphere was formulated and solved in [Lanman

et al. (2006a)], however practical results remained inaccurate. In [Lanman

et al. (2006b)], the authors solved their inaccurate line estimation by carefully

estimating all parameters of their system. In this chapter it will be shown

that even with ground truth calibration, line estimation from a single view of

a sphere cannot be accurately solved. A closed form solution based on two spheres

is therefore proposed. This chapter also develops an iterative approach based

on two unknown views of just a single sphere. The rotation relating the two

unknown views can be estimated by assuming a planar light source.


3.3 Line Reconstruction

Mathematically, 3-space lines behave differently from 3-space points and planes. Points and planes in 3-space have three degrees of freedom; lines, however, have four. Moreover, three points in general position define a plane by incidence and three planes in general position determine a point. Four lines in general position, however, do not define another line by incidence; instead, there are exactly two further lines intersecting all four [Schubert (1874)]. Schubert's

classic result in enumerative geometry is applied here and it will be shown that

it is possible to recover the position of 3-space lines from a single image of a

single sphere.

In the following, two-view line reconstruction is reviewed, and single-view

line reconstruction from a spherical reflection is introduced.


Figure 3.3: The line L in 3-space can be determined from projections in two

views. The projections l and l′ back-project to the planes P^T l and P′^T l′ respectively. If L lies on an epipolar plane, its position cannot be determined.

A 3-space line projects to a 2-space line in an image as

l = PL, (3.1)

where P is the 3 × 6 line projection matrix, L is the 6-vector Plücker line

representation (see Sect. B), and l is the homogeneous 3-vector of the 2-space

line.

Figure 3.4: A point x on the image of a reflected line will back-project to an incident ray V and strike the sphere. V reflects on the sphere according to the law of reflection to the reflected ray R which will intersect L.

The back-projection of l via the same camera (this time represented as

the familiar 3 × 4 point projection matrix P) results in the plane P^T l. Unfortunately, any line on this plane (including L) will project to l, which makes line reconstruction from a single view ambiguous. Suppose a second camera with line projection matrix P′ projects L to l′ = P′L (see Fig. 3.3). This allows the reconstruction of L as

L = P^T l ∧ P′^T l′, (3.2)

where A ∧ B denotes the Plücker line coordinates of the intersection of the planes A and B. Note that L can be determined uniquely because a line in 3-space has four degrees of freedom and the imaged line provides two constraints in each view. There exists a degeneracy for two-view line reconstruction, which occurs when L lies on an epipolar plane (i.e. L intersects the camera baseline). In this case the epipoles lie on the imaged lines.

To reconstruct a 3-space line from a single view, let us now consider

an image ls of L formed by its reflection on a sphere. While l is a line, ls

will in general be a curve. The lines defined by the back-projection of points

on ls will intersect the sphere and reflect according to the reflection law (see

Fig. 3.4). The resulting reflected rays will intersect L. In contrast to the

case of a pinhole camera viewing a line, the reflected rays will not intersect a


single vantage point, but instead they will intersect another line. In contrast

to central projection, this enables line reconstruction from just a single view

(see Fig. 3.5).


Figure 3.5: The reflected rays constructed from an image of the reflection of

a line L on a sphere will in general intersect two lines, namely the line L and

the line A passing through the sphere center and the camera center.

Proposition 3.3.1 The reflected rays constructed from an image of the re-

flection of a line L on a sphere will in general intersect two lines, namely the

line L and the axis A passing through the sphere center and the camera center.

Proof Let us denote the back-projection of a point x ∈ ls as the incident

ray V and its reflection on the sphere as the reflected ray R. The viewing

ray will leave the camera center C, pass through the point x, and intersect

the sphere at a point of reflection. Let V and R be the unit vectors in the

directions of the incident ray and the reflected ray respectively. The law of

reflection states that the incident angle equals the reflected angle. Therefore

the reflection direction is given by R = V − (2N ·V)N, where N is the unit

normal vector at the point of reflection. The reflected ray R passes through

the point of reflection in the direction R and will, by construction, intersect

the line L at some point. All the reflected rays constructed in such a way

will intersect the line L.


To show the intersection with the other line, note that the lines V, R and

N are coplanar, where N is defined as the line from the sphere center S in

direction N. The camera center C is on V and the sphere center S is on N,

therefore the line A from the camera center C to the sphere center S also lies

on the same plane as V, R and N. This applies to all reflected rays and it

follows that any reflected ray R will intersect A and L (see Fig. 3.6).
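This construction can be sketched as follows. It is only a sketch of the proof's construction: the camera rotation is assumed to be the identity (so the back-projected direction is already expressed in the world frame), and the reflected ray is returned in the 6-vector Plücker form (direction, moment) used later in this chapter.

```python
import numpy as np

def reflected_ray(K, C, S, r, pixel):
    """Back-project an image point, intersect the viewing ray with the sphere
    (centre S, radius r) and reflect it; returns Plucker coordinates (R, P x R)."""
    V = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    V /= np.linalg.norm(V)                 # incident ray direction

    oc = C - S
    b = V @ oc
    disc = b * b - oc @ oc + r * r
    if disc < 0:
        raise ValueError("viewing ray misses the sphere")
    P = C + (-b - np.sqrt(disc)) * V       # nearest intersection: point of reflection
    N = (P - S) / r                        # unit surface normal

    R = V - 2.0 * (N @ V) * N              # law of reflection
    return np.concatenate([R, np.cross(P, R)])
```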


Figure 3.6: The image of a line L on a sphere with center S is determined by

reflected viewing rays R which will intersect two lines, the line L and a line

A passing through S and camera center C.

In general, reflected rays are not coplanar and therefore will intersect A at distinct points. Reconstruction will fail if there is a plane containing L and A, i.e. if L intersects A or if L is parallel to A. As illustrated in Fig. 3.7, ls will be a line intersecting the projection of the sphere center if L intersects A.

Proposition 3.3.2 The two lines L and A are in general the only two lines

intersecting the reflected rays.

Proposition 3.3.2 follows from the classic enumerative geometry work Kalkül der Abzählenden Geometrie [Schubert (1874)], in which it was shown that the

number of lines intersecting four lines in general position will be two. If the

four lines lie on a doubly ruled surface (single sheet hyperboloid, hyperbolic

paraboloid or plane), they will produce infinitely many intersecting lines (compare [Gasparini & Sturm (2008); Hilbert (1952)]).

Figure 3.7: If L intersects A the reconstruction will fail. In those cases ls will be a line intersecting the projection of the sphere center.

In the general case, the reflected

rays do not lie on a doubly ruled surface and therefore the reflected rays will

intersect at most two lines; in the degenerate case, the reflected rays lie on a doubly ruled surface and produce infinitely many intersecting lines. In practice the

degenerate case can easily be detected, because the reflected rays will produce

a matrix (defined in (3.4)) with a rank r < 4 (see Sect. 3.3.1).

Corollary 3.3.3 Reconstruction of a line from its reflection on a sphere is

possible by solving for the two lines intersecting its reflected rays and selecting

the one which does not pass through the camera center.

Note that a line L is in the null-space of a line projection matrix P if PL = 0. To select the correct line among the two solution lines, i.e. the line not passing through the camera center, one can simply select the line L for which

PL ≠ 0. (3.3)


3.3.1 Intersection in Plücker Space

In order to formulate line intersections algebraically, we adopt the 6-vector Plücker coordinates representation for lines in 3-space (Sect. B).

Solving for the two lines which intersect m > 3 given lines R1, R2, ..., Rm is equivalent to finding the null space of a matrix M formed by the permuted Plücker lines Ri (see B for the definition of the permutation):

$$M\mathbf{y} = \begin{bmatrix} \mathbf{R}_1^T \\ \mathbf{R}_2^T \\ \vdots \\ \mathbf{R}_m^T \end{bmatrix} \mathbf{y} = \mathbf{0}. \qquad (3.4)$$

The 6-vector y maps M to a null vector, if the inner product Ri · y = 0 for

each row i. Given the m > 3 reflected rays R1,R2, ...,Rm from the previous

section we can solve for y by singular value decomposition of the m×6 matrix

M into the product of three matrices

$$M = U\Sigma V^T = \begin{bmatrix} u_{11} & \cdots & u_{1m} \\ \vdots & & \vdots \\ u_{m1} & \cdots & u_{mm} \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_6 \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} v_{11} & \cdots & v_{16} \\ \vdots & & \vdots \\ v_{61} & \cdots & v_{66} \end{bmatrix}^T, \qquad (3.5)$$

where U is an m × m orthogonal matrix, Σ is an m × 6 rectangular diagonal matrix with non-negative real numbers σ_i on the diagonal, and V is a 6 × 6 orthogonal matrix. Let the singular values σ_i of Σ be ordered in decreasing magnitude. The reflected rays will produce two intersecting lines (see Proposition 3.3.2), therefore the rank r of M will be 4, i.e. the number of non-zero singular values will be 4. Let the last two columns of V, corresponding to the two zero singular values σ_5 and σ_6, be

$$D = (v_{15}, \dots, v_{65}) \qquad (3.6)$$

$$E = (v_{16}, \dots, v_{66}). \qquad (3.7)$$


The null space of M can be parameterized by D and E as the line

L(t) = Dt+ E. (3.8)

Not all points on (3.8) represent 3-space lines, but just those satisfying

L(t) ·L(t) = 0. (3.9)

This was first formulated and solved in [Teller & Hohmeyer (1999)] as

$$(D \cdot D)\,t^2 + 2(D \cdot E)\,t + (E \cdot E) = 0, \qquad (3.10)$$

for which the two real roots t+ and t− corresponding to the intersecting lines L(t+) and L(t−) can be computed as

$$t_{\pm} = \frac{-(D \cdot E) \pm \sqrt{(D \cdot E)^2 - (D \cdot D)(E \cdot E)}}{D \cdot D}. \qquad (3.11)$$
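A minimal sketch of this computation is given below. It assumes each reflected ray is given as a 6-vector (d, m) with moment m = p × d; with that convention, a row of M is the permuted ray (m, d), so that the ordinary dot product with y realizes the required (reciprocal) inner product.

```python
import numpy as np

def reciprocal(a, b):
    """Permuted (reciprocal) inner product of two Plucker 6-vectors (d, m);
    it vanishes iff the two lines intersect."""
    return a[:3] @ b[3:] + a[3:] @ b[:3]

def intersecting_lines(rays, tol=1e-8):
    """Sketch of (3.4)-(3.11): the two lines intersecting all m > 3 reflected rays."""
    M = np.array([np.concatenate([r[3:], r[:3]]) for r in rays])  # permuted rays
    _, s, Vt = np.linalg.svd(M)
    if np.sum(s > tol * s[0]) < 4:        # numerical rank below 4: degenerate case
        raise ValueError("reflected rays are (nearly) coplanar")
    D, E = Vt[-2], Vt[-1]                 # null-space basis, eq. (3.6)-(3.7)
    # Klein quadric constraint (3.9) leads to the quadratic (3.10) in t.
    a, b, c = reciprocal(D, D), reciprocal(D, E), reciprocal(E, E)
    roots = np.roots([a, 2.0 * b, c])     # the two real roots t+ and t-
    return [D * t + E for t in np.real(roots)]
```

Among the two solution lines, the one not passing through the camera center is then selected via (3.3).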

Let the rank of the matrix M be r. In general, r = 4; however, if the m reflected rays R1, R2, ..., Rm are coplanar, one or more of the four largest singular values

become zero and therefore r < 4. Due to noise M will always have full rank.

Therefore, in practice, it is necessary to evaluate the numerical rank to detect

coplanar reflected lines.

There are cases where reflected lines will be nearly coplanar. These cases

correspond to the poor reconstruction results obtained in short baseline stereo

systems. The next section identifies situations in which reflected lines are

nearly coplanar. Sect. 3.3.3 and Sect. 3.3.4 propose solutions to overcome

these difficulties.

3.3.2 Empirical Analysis

The theoretical formulation of Sect. 3.3.1 was implemented and this section

analyzes reconstructions from synthetic images.

Camera and sphere parameters are readily available because we are dealing

with synthetic data.

Figure 3.8: The planarity of reflected rays depends on the distance of the line to the sphere center.

The synthetic images in the experiments had a dimension

of 1229× 731 pixels and the intrinsic parameters of the cameras were given by

the camera calibration matrix

$$K = \begin{bmatrix} 1700 & 0 & 614.5 \\ 0 & 1700 & 365.5 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (3.12)$$

A synthetic sphere of radius 1.0 centered at the world origin was rendered with

a camera centered at [0 0 −5]^T pointing towards the positive z-direction.

Four parallel lines at different distances from the sphere were added to the

scene, and for each line its reflection on the sphere and projection to the image

was calculated. Fig. 3.9 shows the 3D setup (left) and the 2D projections

(right).

The lines were reconstructed from their reflections and the reconstruction

error was measured. Fig. 3.10 plots the reconstruction error against the line

distance. The reconstruction error was defined as the mean distance of points

on the ground truth line to the reconstructed line.

Since we are dealing with synthetic data, the reconstruction error will be

zero without any noise. Fig. 3.10 therefore plots the error for different noise

levels. Uniformly distributed random noise was added to the x and y coordi-

nates of points on the projected line. It can be seen that for all noise levels

the reconstruction error increased with a larger line distance (this is illustrated

in Fig. 3.8).

Figure 3.9: The setup for the synthetic experiment (left) and projections of lines to the image (right). It can be seen that lines at different distances reflect differently on the sphere.

[Plot: reconstruction error [radius] against line distance [radius] for noise levels of 0.25, 0.5, 0.75 and 1.0 pixel.]

Figure 3.10: The reconstruction error against the distance of lines for several noise levels. Unfortunately, even with a moderate noise level, the reconstruction error is too large for distant lines.

Unfortunately, even with a moderate noise level of 0.5 pixel, the reconstruction error was high for distant lines. To explain this we plotted

the planarity of reflected lines against the line distance in Fig. 3.11. Let the

planarity be defined as the mean angle reflected lines make with a best fitted

plane. A larger value therefore represents less planar lines. It can be seen (in

Fig. 3.11) that the planarity of the reflected lines is generally low for near lines

and it even decreased for distant lines. Coplanar reflected rays, or rays with a

low planarity, are undesirable because all lines on the plane will intersect the

reflected rays. The null-space selection will be unstable as a consequence. The

synthetic experiments above show that robust single view line estimation from

spherical reflections with a general line position is impossible.

[Plot: planarity [degree] against line distance [radius].]

Figure 3.11: The planarity of reflected lines against the line distance. The

definition of planarity here is the mean angle the reflected lines make with a

best fitted plane. A larger planarity value, i.e. a larger mean angle, represents

less coplanar lines.

In the following sections, two solutions to the line estimation problem are

proposed. Both solutions will avoid nearly coplanar lines and it will be shown

that the reconstruction error can be reduced significantly.


3.3.3 Two Spheres & One View

Instead of a single sphere reflecting a line, let us consider a scene with two

identical spheres (i.e. same radius) at distinct locations. It will be shown

experimentally that an additional sphere will avoid nearly coplanar reflected

rays.

A second synthetic sphere of radius 1.0 centered at [2 0 2]^T was added to the scene. For each of the lines in the scene, its reflections on both spheres and their projections to the image were computed. Fig. 3.12 shows the 3D setup (left) and the 2D projections (right).

Figure 3.12: A second sphere is added to the synthetic scene and the lines are

reflected on both spheres.

Line reconstruction for two spheres can, again, be solved with a null space

computation. The reflected rays R′1, ..., R′n of the second sphere can be added to (3.4) as additional rows of the matrix, giving the new equation

$$M'\mathbf{y}' = \begin{bmatrix} \mathbf{R}_1^T \\ \vdots \\ \mathbf{R}_m^T \\ \mathbf{R}_1'^T \\ \vdots \\ \mathbf{R}_n'^T \end{bmatrix} \mathbf{y}' = \mathbf{0}. \qquad (3.13)$$

The 6-vector y′ maps M′ to a null vector if the inner products Ri · y′ = 0 and R′j · y′ = 0 for i = 1 . . . m and j = 1 . . . n. We can solve for y′ by singular value

decomposition of the (m + n) × 6 matrix M′. Note that the reflected rays of


both spheres will produce a single intersecting line, and the rank of M′ will be

5. The null space of M′ is therefore given by the last column of V in (3.5).
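Under the same conventions as the sketch in Sect. 3.3.1, the two-sphere case reduces to stacking the reflected rays of both spheres and taking the single null vector; this is a sketch, not the thesis implementation.

```python
import numpy as np

def line_from_two_spheres(rays_sphere1, rays_sphere2):
    """Eq. (3.13): M' has rank 5, so the reconstructed line is the right
    singular vector associated with the smallest singular value."""
    rays = list(rays_sphere1) + list(rays_sphere2)
    M = np.array([np.concatenate([r[3:], r[:3]]) for r in rays])
    return np.linalg.svd(M)[2][-1]
```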

The lines were reconstructed from their reflections on both spheres and

Fig. 3.13 plots the reconstruction error against the line distance. Compared to

Fig. 3.10, it can be seen that the reconstruction error has decreased significantly

for noise levels of 0.5 pixel and 1.0 pixel. In addition, experiments with higher

noise levels (1.5 pixel and 2.0 pixel) were performed and the results show a

reasonable reconstruction error. A noise level of 2.0 pixel had a reconstruction

error of around 0.1 sphere radius.

Note that there exists a degeneracy: the reflected rays will be coplanar if the baseline (the line connecting the two sphere centers) and the line to be reconstructed lie on the same plane.

[Plot: reconstruction error [radius] against line distance [radius] for noise levels of 0.5, 1.0, 1.5 and 2.0 pixel.]

Figure 3.13: The reconstruction error has significantly decreased after intro-

ducing a second sphere.

3.3.4 Two Views & One Sphere

It has been shown in the previous section that the reconstruction error will

significantly decrease after introducing a second sphere. Instead of adding an

addition sphere into the scene, this section will add an addition camera (or

55

Page 65: Light Source Estimation from Spherical Reflections

3.3 Line Reconstruction

move the same camera) to reduce the reconstruction error. We therefore have

just a single sphere being viewed from two distinct viewpoints. It will be shown

experimentally that an additional view will avoid (nearly) coplanar reflected

rays.

Several lines at different distances from the sphere center were added to

the scene and for each line its reflection on the sphere and projection to the

images was estimated. Fig. 3.14 shows the 3D setup and the 2D projections

for this configuration.

Since we are dealing with synthetic data, both camera projection matrices

are known. It will be shown in Sect. 3.4.2 how the camera positions can be

determined for real data. The reflected rays for both views can be determined

and written as (3.13). Note that the reflected rays of both views will produce

a single intersecting line, and the rank of M′ will be 5.

Figure 3.14: Two views capturing a single synthetic sphere. This figure shows

(from left to right) the 3D setup of the synthetic scene and the images of the

line reflected on the sphere.

For the synthetic experiment, the lines were reconstructed from their reflections in both views and Fig. 3.15 plots the reconstruction error against the

line distance. Compared to Fig. 3.10, it can be seen that the reconstruction

error has decreased significantly. A noise level of 2.0 pixel had a reconstruc-

tion error of around 0.3 sphere radius. Note that the error is larger than in the two-sphere case but significantly smaller than in the single-sphere case.

Introducing a second sphere is therefore more effective than having a second

view of the same sphere.


[Plot: reconstruction error [radius] against line distance [radius] for noise levels of 0.5, 1.0, 1.5 and 2.0 pixel.]

Figure 3.15: The reconstruction error has decreased after introducing a second

view of a single sphere. Note that the error is larger than in the two-sphere case but significantly lower than in the single-sphere case.


3.4 Polygonal Light Source Estimation

This section applies the line reconstruction algorithms of the previous sections

to the estimation of a polygonal light source. Let a polygon be a planar

closed path in 3-space, composed of a finite sequence of straight line segments.

Suppose an image of a sphere reflecting a polygonal light source is captured.

Reflected rays can be determined given the sphere and, from the previous

sections, the 3-space line for an edge of the polygon can be estimated from the

reflected rays of either two spheres in one view (see Sect. 3.3.3) or one sphere

in two views (see Sect. 3.3.4). Repeating the reconstruction for all sides of

the polygonal light source results in n 3-space lines. The 3-space lines will in

general not intersect each other, and they will not lie on the same plane. It is

therefore necessary to extract a planar polygon from the n 3-space lines and

Sect. 3.4.1 will introduce a simple closed form solution for this.

For two views of a single sphere it is necessary to determine the positions

of cameras relative to the sphere. Sect. 3.4.2 proposes an iterative approach,

which will relate the two cameras in a common coordinate system centered at

the sphere center.

3.4.1 Extraction of Polygon

It is straightforward to extract a planar polygon from n 3-space lines. Let the

polygon lie on the plane π, which we estimated from the lines as the null-space

of

$$N\pi = \begin{bmatrix} L_1 \\ L_2 \\ \vdots \\ L_n \end{bmatrix} \pi = \mathbf{0}, \qquad (3.14)$$

where L_i are the 4 × 4 Plücker matrices (defined in B.1) of the n 3-space lines. By projecting each line onto π, a planar polygon can be extracted.
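A minimal sketch of this plane extraction is given below, assuming each reconstructed edge is given in the 6-vector (d, m) convention used in the earlier sketches; the 4 × 4 Plücker matrix is built from two points on the line.

```python
import numpy as np

def plucker_matrix(line):
    """4x4 Plucker matrix of a line given as (d, m) with m = p x d."""
    d, m = line[:3], line[3:]
    p0 = np.cross(d, m) / (d @ d)      # point on the line closest to the origin
    A = np.append(p0, 1.0)
    B = np.append(p0 + d, 1.0)
    return np.outer(A, B) - np.outer(B, A)

def polygon_plane(lines):
    """Eq. (3.14): the plane pi through the n edge lines is the right singular
    vector of the stacked Plucker matrices with the smallest singular value."""
    N = np.vstack([plucker_matrix(l) for l in lines])   # (4n) x 4
    return np.linalg.svd(N)[2][-1]
```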

3.4.2 Camera Position Estimation

If two images of a single sphere are taken from two distinct viewpoints, reflected

rays can only be determined if the relative camera positions are known. This


section will relate the two cameras in a common coordinate system, centered at

the sphere center. The projection matrices for the two cameras can be written

as

P = K[ I C ]

P′ = K′[ E C′ ], (3.15)

where K and K′ are the two known camera calibration matrices. C and C′ are

the translations between camera and sphere and can be recovered in each of the

two camera-centered coordinate systems respectively (Sect. 2.3). By assuming

a fixed radius for the sphere in both views, it is possible to relate the two

cameras in a common coordinate system centered at the sphere center. Due to

the symmetry exhibited in the geometry of the sphere, an arbitrary rotation

about the sphere center (i.e., the world origin) can be applied to the camera

without changing the image of the sphere. This corresponds to rotating the

camera around the sphere while keeping a visual cone constructed from the

image tangent to the sphere. Let E be a 3× 3 rotation matrix corresponding

to this unknown rotation. Note that the location of the highlight on the sphere

surface will depend on both the location of the polygonal light source and the

viewpoint. Given the correct E, the reflected rays for both views meet at the

light source and construct a planar polygon. For a given rotation matrix the n

light source edges L1,L2, . . . ,Ln can be estimated. Consider the set of linear

equations in the form

$$\begin{bmatrix} L_1 \\ L_2 \\ \vdots \\ L_n \end{bmatrix} \pi = \mathbf{0} + \mathbf{r}, \qquad (3.16)$$

where r is the vector of residual errors. The plane π can be estimated as the

linear least squares solution from this set of equations. We are interested in

finding the rotation E such that the sum of squares of the residual errors r^T r is minimal.

Instead of performing an optimization directly on the parameterized search space, an initial estimate is first found as the global minimum over a coarse subdivision of the search space. The optimization is subsequently initialized with this estimate. This procedure avoids early termination in a local minimum.
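A sketch of this two-stage search is given below. It is not the thesis implementation: the rotation is parameterized by Euler angles, the residual r^T r of the plane fit (3.16) is evaluated as the squared smallest singular value of the stacked line matrix, and estimate_edges is a hypothetical callback that reconstructs the edge lines for a candidate rotation E and returns their 4 × 4 Plücker matrices.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def plane_residual(angles, estimate_edges):
    """Residual r^T r of the plane fit (3.16) for the rotation given by angles."""
    E = Rotation.from_euler("xyz", angles).as_matrix()
    N = np.vstack(estimate_edges(E))             # stacked Plucker matrices
    return np.linalg.svd(N, compute_uv=False)[-1] ** 2

def estimate_rotation(estimate_edges, steps=12):
    """Global minimum over a coarse subdivision, followed by local refinement."""
    grid = np.linspace(-np.pi, np.pi, steps, endpoint=False)
    best = min(product(grid, repeat=3),
               key=lambda a: plane_residual(a, estimate_edges))
    res = minimize(plane_residual, np.array(best), args=(estimate_edges,),
                   method="Nelder-Mead")
    return Rotation.from_euler("xyz", res.x).as_matrix()
```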


3.5 Experimental Results on Real Data

This section describes experiments illustrating the usefulness of the proposed

method. Sect. 3.5.1 presents experiments for the estimation of a standard flu-

orescent office lamp from two spheres in a single view and Sect. 3.5.2 presents

experiments for the estimation of a smaller desk light source from two views

and a single sphere. The estimates will be compared to the light source speci-

fications provided by the manufacturers, and synthetically generated specular

reflections on the spheres will be compared to the specular reflections in the

real images.

3.5.1 Two Spheres & One View

Two identical yellow billiard balls were imaged from a single viewpoint (see

Fig. 3.16). The spheres were put below a standard rectangular fluorescent

office lamp. The light source had a dimension of 270mm× 1170mm, while the

diameters of the billiard balls were 57mm. Intrinsic parameters of the camera

were obtained a priori by Zhang’s camera calibration method [Zhang (1999)].

A cubic Bézier-spline snake was applied to extract the contours of the spheres

in the image, and conics were then fitted to these contours using a direct least

squares method [Fitzgibbon & Fisher (1995)]. Specular reflections of the light

source were selected and matched manually. Fig. 3.16 shows the input image,

segmented input image and reconstructed spheres together with an augmented

object. The spheres and the augmented object have been rendered under the

estimated area light. It can be seen in a close up of the two original spheres

(Fig. 3.17 left), and the two synthetically rendered spheres (Fig. 3.17 right),

that the specular reflections are approximately in the same locations.

A 3D scene of the camera, spheres and reconstructed light source is ren-

dered in Fig. 3.18. The light source estimation result was compared against

the ground truth in Table 3.1 and Table 3.2.


Figure 3.16: Two identical billiard balls were imaged from a single viewpoint

(top). Extracted contours and specular reflections (middle). Spheres and an

augmented object rendered under the estimated area light (bottom).


Figure 3.17: A close up of the two original spheres (left) and the two rendered

spheres (right). The specular reflections are approximately identical.

Table 3.1: Estimation results for the two spheres and single view case. This

table compares the length of the edges of the light source against the specifi-

cation provided by the manufacturer.

edge   ground truth length [mm]   estimated length [mm]   error [mm]
A      270.00                     253.75                  16.25
B      1170.00                    1160.33                 9.67
C      270.00                     237.14                  32.86
D      1170.00                    1124.23                 45.77


Table 3.2: Estimation results for the two spheres and single view case. This

table compares the angle of the estimated edges of the light source against the

specification provided by the manufacturer.

edge   ground truth angle [degree]   estimated angle [degree]   error [degree]
A-B    90.00                         95.58                      5.58
B-C    90.00                         83.65                      6.35
C-D    90.00                         88.03                      1.97
D-A    90.00                         92.74                      2.74

Figure 3.18: A rendered 3D scene of the camera, spheres and reconstructed

light source.


Table 3.3: Estimation results for the two views and a single sphere case. This

table compares the length of the estimated edges of the light source against

the specification provided by manufacturer.

edge   ground truth length [mm]   estimated length [mm]   error [mm]
A      68.00                      69.31                   1.31
B      33.00                      32.87                   0.13
C      68.00                      67.63                   0.37
D      33.00                      35.28                   2.28

Table 3.4: Estimation results for the two views and a single sphere case. This

table compares the angle of the estimated edges of the light source against the

specification provided by manufacturer.

edge   ground truth angle [degree]   estimated angle [degree]   error [degree]
A-B    90.00                         85.23                      4.77
B-C    90.00                         92.12                      2.12
C-D    90.00                         89.84                      0.16
D-A    90.00                         92.81                      2.81

3.5.2 Two Views & One Sphere

In this experiment, two photos of a single sphere were taken. This time, a

blue billiard ball was illuminated by a smaller rectangular desk light source

that had a dimension of 68mm × 33mm. Again, intrinsic parameters of the

camera were obtained a priori by Zhang’s camera calibration method [Zhang

(1999)]. A cubic Bézier-spline snake was applied to extract the contours of the

sphere in the images, and conics were then fitted to these contours using a

direct least squares method [Fitzgibbon & Fisher (1995)]. Specular reflections

of the light source were selected and matched manually. Fig. 3.19 shows the

input image, segmented input image and the original image augmented with

a synthetic sphere under the estimated area light source.

A 3D scene of the camera, sphere, and reconstructed light source is rendered in Fig. 3.20. The estimation result was compared to the ground truth

in Table 3.3 and Table 3.4.


Figure 3.19: Two images of a blue billiard ball were taken from different view-

points (top). Extracted contours and specular reflection (middle). A synthetic

sphere is rendered under the estimated area light source (bottom). It can be

seen that the specular reflections are approximately identical.


Figure 3.20: Two views of a rendered 3D scene with the two cameras, sphere

and reconstructed light source.

3.6 Comparison to a Point-based Algorithm

In this section, it will be shown that the proposed method is more robust to

noise compared to a point-based algorithm. Let a polygon be reflected onto

two spheres. The point-based algorithm proceeds as follows.

1. Reflected rays are estimated for each vertex of the polygon.

2. The 3D coordinates of a vertex are recovered as the 3D point closest to the two reflected rays (see the sketch below).

3. A planar polygon is extracted from the recovered vertices.

This closed form solution solves the same problem as the proposed method.

Without any noise on the specular reflections, both methods will give the

same results. Different estimates are produced, however, when noise is in-

volved. Note that the point based method needs direct point-to-point corre-

spondences, while the proposed method requires corresponding sets of points

(curves), where each set has to have at least four points.
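A minimal sketch of step 2 of the point-based algorithm above is given below, with each reflected ray described by a point c_i on the ray and a direction d_i (hypothetical parameter names for this sketch).

```python
import numpy as np

def closest_point_to_rays(c1, d1, c2, d2):
    """Midpoint of the common perpendicular of the rays c_i + t_i d_i,
    i.e. the 3D point closest to both reflected rays."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    # Least squares solution of (c1 + t1 d1) - (c2 + t2 d2) = 0 for (t1, t2).
    A = np.column_stack([d1, -d2])
    (t1, t2), *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)
    return 0.5 * ((c1 + t1 * d1) + (c2 + t2 * d2))
```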

A synthetic experiment was prepared to compare the performance of the

proposed reconstruction method against the point based reconstruction. Fig. 3.21

shows the synthetic input images (top) and ground truth sphere silhouettes

together with the specular reflection of lines (bottom).


Figure 3.21: A polygon reflected on two spheres (top). Sphere silhouettes and

reflected projections of polygonal edges and vertices (bottom).

[Plot: reconstruction error [radius] against noise level [pixel] for the point-based method and the proposed method.]

Figure 3.22: The reconstruction error (in radius of the sphere) is plotted

against the noise level (in pixel) for the proposed method and the point-based

method. It can be seen that the proposed method significantly outperforms

the point-based method.


The reconstruction error is plotted against the noise level in Fig. 3.22. It can

be seen that the proposed method significantly outperforms the point-based

method. The point-based method relies on a single point-to-point correspon-

dence for the estimation of the 3D coordinates. It is therefore more sensitive

to noise compared to the proposed method which performs the reconstruction

based on corresponding sets of points. Note that in practical applications, the

proposed method will further outperform the point-based reconstruction be-

cause it can be expected that detection of curves is more robust than detection

of points.

3.7 Conclusions

This chapter recovered a polygonal light source from the image of a specular

sphere. Vision applications where the light source is not directly visible will

benefit from the proposed method. The main contributions in this chapter

were

1. An empirical analysis showing that single view and single sphere line

reconstruction is ill-conditioned.

2. A novel closed form solution for line reconstruction from two spherical

reflections.

3. Spheres do not need to be calibrated, and the reconstruction is up to the

scale of the sphere.

4. A novel algorithm for reconstruction of a polygon from the reflections

on two spheres observed from a single viewpoint or from two views of a

single sphere.

Experiments on both synthetic and real data showed promising results. The

proposed methods have the following limitations:

• The methods aim at reconstructing polygons, but do not actually enforce

that the result is a polygon, and in general, the reconstructed lines will

not form polygons. A post-processing step is required to output polygons.


In future work, the post-processing step could be eliminated and the

estimation of the area light source could be done in a single procedure.

• The shape of the reflected line on the sphere could be studied in future

work. Recently, it has been shown in [Agrawal et al. (2010)] that this

reflection is a quartic curve. This information could be used to better

segment the specular reflections and therefore improve estimation results.


Chapter 4

Display and Gaze Estimation

4.1 Introduction

In the previous chapters we have inserted a specular billiard ball into a scene

and analyzed specular reflections for recovery of point light sources and area

light sources. This chapter will deal with specular reflections of a visual display

on the cornea of human eyes. It will be shown that a rectangular light source,

in this case a rectangular visual display, can be reconstructed from its specular

reflections on the eye. Reconstruction of eyes and visual display is useful for

point-of-gaze estimation, which can be approximated from the 3D positions of

the iris and the visual display. It will be shown that the iris outlines, which are

ellipses in the image, and the eye reflections in a single intrinsically calibrated

image provide enough information for this reconstruction.

We are a highly visual species, and measuring our eye gaze (where we are

looking) has a large number of applications in several fields including psychol-

ogy, industrial engineering and advertising [Duchowski (2007)]. Despite some

recent advances, most systems can only work under a confined and controlled

environment. For instance, other than setup specific calibration in which the

relative positions of the cameras, visual display and active illuminations are

determined, most systems also require a subject specific calibration. This is

typically done by having the subject fixate on multiple points with known

locations in the scene. This is an obstacle in applications requiring minimal

subject cooperation, such as applications with infants [Guestrin & Eizenman


(2008)]. In a commercial state-of-the-art system, the subject specific calibra-

tion will take up to 5 minutes [SRResearch (2009)]. Some systems require

the subject to keep his face considerably still through the use of a bite bar

or a forehead support, which might bring physical discomfort. The aforemen-

tioned requirements, together with high cost of the hardware (∼US$2,000 -

US$45000), greatly hinder the applicability of eye gaze estimation systems

and often limit their use to a laboratory setting.

Instead of improving accuracies of existing systems, this chapter investi-

gates the minimal information required. By using minimal information, the

corresponding hardware setup can be greatly simplified, which in turn results

in a simplified and automatic reconstruction.

This chapter has the following contributions:

1. Estimate the positions of eyes and visual display from a single image.

Specular reflections on the cornea of the eye provide strong constraints

on the environment surrounding the subject, and can be exploited to

find the positions of objects in front of the subject.

2. No subject specific parameters for determining where a user is gazing

relative to a visual display. Apart from the camera intrinsics, setup

specific parameters do not need to be known.

3. Remove ambiguities by verifying the intersection between the optic axes of the eyes and the visual display. The elliptical image of the eye's circular limbus can be back-projected into 3D, yielding two possible circles. Existing work disambiguates these by making use of anthropometric knowledge

of the structure of the eyeball [Wang et al. (2005b)].

In contrast to commercial technologies, active illumination (such as infrared

light) is not employed in this work, and, as a result, off-the-shelf equipment

can be used.

The rest of this chapter is organized as follows. Sect. 4.2 gives a comparison

to existing work. Sect. 4.3 provides an introduction to the general shape and

dimensions of the human eye and proposes an approximate geometric model.

In Sect. 4.4, it will be shown that the limbus can be reconstructed, up to a


sign ambiguity, from its perspective projection, and a closed form solution for

the reconstruction of a display edge is proposed in Sect. 4.5. A rectangle is

extracted from the four display edges in Sect. 4.6. Finally, the point-of-gaze is

estimated on the visual display in Sect. 4.7, followed by experimental results

in Sect. 4.8 and conclusions in Sect. 4.9.

4.2 Related Work

Early methods for eye movement measurement employed contact lenses with

embedded coils or rubber rings. Popular in the 1970s, an eye-monitoring technique known as electro-oculography relied on electrical measurements from electrodes placed around the eyes for measuring the eye gaze [Duchowski (2007)].

These crude and highly intrusive techniques of the past have been replaced by

refined and relatively user-friendly vision-based methods. Carefully calibrated

state-of-the-art methods based on active illumination are capable of recording

eye movements with a sampling rate of 2000Hz and an accuracy of up to 0.5◦

[SRResearch (2009)]. Being one of the most important challenges in eye gaze

estimation, techniques for reducing the effort of calibration have been studied

before.

Multiple views. In an interesting work [Chen & Ji (2008)], a stereo gaze

tracking system with two cameras was proposed. Subject specific parameters

were estimated by a simple calibration method involving the subject gazing at

only four positions on the visual display. Another work [Guestrin & Eizenman

(2008)] required an even simpler calibration procedure in which the subject had

to fixate on just a single point. A custom made setup with two synchronized

cameras and four infrared light sources was employed for the calibration. It

was shown in [Shih et al. (2000)] that without additional information about the cornea and pupil size, at least two cameras and two light sources are needed

to recover the eye position and gaze in 3D. Reconstruction of the pupil ellipse

in 3D space from multiple views was done in [Kohlbecher et al. (2008)].

These methods have reduced the subject specific calibration requirements

of early methods significantly by introducing an additional camera. However,


having more cameras also means that it is necessary to synchronize the cameras

as well as to estimate their relative positions and orientations. The cameras

usually cannot see the visual display, which further complicates the calibration procedure, and additional hardware and user interaction are needed. Guestrin and Eizenman [Guestrin & Eizenman (2008)] calibrated their system using a double-sided checkerboard pattern and an auxiliary camera, while Chen and Ji [Chen & Ji (2008)] used a pattern to calibrate the stereo cameras and a

planar mirror for the visual display calibration. Shih et al. [Shih et al. (2000)]

additionally required light positions to be known.

Single view. The cross-ratio is an invariant of projective space and it can be used for gaze estimation. In [Yoo & Chung (2005)] and later in [Kang et al. (2008)], four infrared LEDs were attached to the corners of a visual display, and another infrared LED was positioned near the center of a zoom camera. An additional camera was used to guide the zoom camera. Their

method did not need any setup specific calibration, and their subject specific

calibration was a simple procedure in which the subject had to fixate on the

four LEDs in the corners of the visual display. Other single view gaze-direction

estimation methods exist [Wang et al. (2005b), Wu et al. (2004), Wang et al.

(2003)]. Since we are interested in the point-of-gaze relative to a visual display,

we will exclude such methods in this discussion.

Specular reflection. Reflections are a powerful cue for pose estimation

in computer vision [Lagger et al. (2008)]. In [Francken et al. (2009)] the reflec-

tions of gray code illumination patterns on a spherical mirror were imaged to

calibrate a camera and a visual display. Reflections on the cornea of the eye

were studied in [Nishino & Nayar (2004a), Nishino & Nayar (2004b), Nishino

& Nayar (2006)]. By relating their corneal imaging system to a catadioptric

system, a projection of the environment on the cornea can be computed. Re-

cent advances [Nitschke et al. (2009)] enable the estimation of display corner

positions from multiple images of a subject that moves around a visual display.

For each subject position, light directions were estimated and finally triangu-

lated to recover the 3D positions of the corners. Motivated by these results,


this chapter employs reflected curves instead of reflected points, which pose stronger constraints on the 3D edges of the visual display (see Chapter 3). It

will be shown that two human eyeballs provide enough constraints for accurate

visual display estimation from a single view.

In summary, it can be said that most existing methods use multiple views,

while those using a single view utilize active illumination and tailor-made hardware. The commercial potential of such technology is acknowledged. However, this chapter introduces a passive method based on a single

view that is not as accurate as the existing active multiple view methods, but

is readily available to any person with a camera and a computer.

4.3 The Eye

This section provides an introduction to the general shape and dimensions

of the human eye, and proposes an approximate geometric eye model. This

geometric eye model will be used later to determine the position of the eyes

from an image. The section on the anatomy of the eye is based on information

provided in [Adler & Moses (1975); Besharse et al. (2010); Forrester et al.

(2008); Snell & Lemp (2007)], to which the interested reader may refer for a

complete discussion on the topic. A detailed review of recent eye models can

be found in [Hansen & Ji (2010)].

4.3.1 Anatomy

The eyeball is situated inside the orbital cavity, a location that serves to protect

it. The anterior surface of the eyeball is exposed (see Fig. 4.1) and partially

protected by the eyelids, which assist in the distribution of tears. A colorful,

thin, pigmented circular disk known as the iris regulates the amount of light

coming into the eye through a central aperture, called the pupil. The function of the iris is analogous to that of a camera's diaphragm. The color of the iris

varies from dark brown to light blue and it may vary from one eye to the

other in the same person. The upper eye lid covers part of the iris, while the


Figure 4.1: Outer view of right eye. (Illustration taken from Cancer (2009))

lower lid crosses the edge of the cornea. With poor illumination the pupil is

dilated and with excessively bright light the pupil is constricted. There may

be a slight degree of asymmetry in pupil radius between right and left eyes in

normal individuals.

Pupil and iris are surrounded by the white, opaque sclera, which covers

five-sixths of the eyeball; the other one-sixth is occupied by the transparent

cornea (see Fig. 4.2). The cornea has a higher curvature than the rest of the

eyeball and refracts the light entering the eye. The shape of the eye can be

described with two spheres, a smaller one anteriorly, the cornea, and a larger

posterior sphere, the sclera. Note that the eyeball is not a perfect sphere; it is often described as being slightly flattened in the vertical plane. The

surface of the cornea is very smooth and has a thin film of tear fluid on it,

which makes it quite reflective. The tear fluid is a watery secretion containing

antibacterial enzymes and bactericidal proteins, which serve as a defense against microorganisms. Tears also keep the front of the eyeball lubricated so that it

can move easily beneath the lids. The sclera is directly connected to the cornea,

and the boundary zone between the two is known as corneoscleral junction or

limbus.

Behind the iris and the pupil is the transparent lens. The lens is an impor-

tant part of the eye, because it can change its dioptric power so that distant


Figure 4.2: A vertical cross section showing the anatomy of the eye.

and near objects can be focused on an internal layer of the eyeball, known as

the retina. Photochemical transduction occurs on the retina, and the resulting signals are transmitted along the optic nerve to the brain for higher cortical processing. The

central part of the retina is important for visual functions and the fovea cen-

tralis is located here. It is a small region that has the most sensitive cells and

has the highest visual acuity. Between the retina and the lens is the vitreous

body. The vitreous is a colorless, transparent gel consisting of 98% water,

which contributes slightly to the dioptric power of the eye.

4.3.2 Dimensions

In the literature [Forrester et al. (2008); Snell & Lemp (2007)], it is noted

that the dimensions of the eye components can vary considerably from one person to another. The literature on the biometry of the human eye was reviewed, and this section presents the empirical values found. A total of 1000 eyes were

measured in [Stenstrom (1948)], while the biometric data of 176 and 220 eyes

were obtained in [Kiely et al. (1982)] and in [Guillon et al. (1986)] respectively.

The scientific findings of nine papers concerning the measurements of human

eyes were summarized in [Clark (1973)]. Table 4.1 lists eye components and

their dimensions with the references where they were found.


Table 4.1: Eye components and their dimensions.

Eye Component | Dimension [mm]
posterior sphere radius (sclera) | 11.5 [Forrester et al. (2008)]; 12.0 [Snell & Lemp (2007)]; 11.5 vert., 11.75 horiz. [Snell & Lemp (2007)]
anterior sphere radius (cornea) | 7.8 [Adler & Moses (1975); Forrester et al. (2008)]; 7.7 [Snell & Lemp (2007)]; 7.86 (7.00 to 8.65) [Stenstrom (1948)]; 7.80 (7.00 to 9.00) [Clark (1973)]
limbus radius | 6.0 [Forrester et al. (2008)]; 5.6 [Snell & Lemp (2007)]
limbus width | 1.5 to 2.0 [Forrester et al. (2008)]
pupil radius | 4.0 dilated, 0.5 constricted [Snell & Lemp (2007)]

Table 4.2: Empirical values for the dimensions of the cornea.

Reference | Radius [mm] | Asphericity
[Kiely et al. (1982)] | 7.72 ± 0.27 | −0.26 ± 0.18
[Guillon et al. (1986)] | 7.77 ± 0.25 | −0.19 ± 0.16 (steep meridian); −0.17 ± 0.15 (flat meridian)

The cornea was modeled as an ellipsoid instead of a simple sphere in [Guil-

lon et al. (1986)] and [Kiely et al. (1982)]. Table 4.2 shows the dimensions of

the cornea in terms of the radius and asphericity of an ellipsoid.

4.3.3 Optics

Light enters the eye through the cornea and is focused on the retina with the

combined optical power of the cornea and lens. The lens’s dioptric power is

about 15 diopters, while the dioptric power of the entire eye is 58 diopters.

Most of the eye’s dioptric power is provided by the cornea. Light comes from

the air environment, with a refractive index of 1.00, to the eye, with an approx-


imate refractive index of 1.33. The optical power of the eye is attributed to a

combination of the higher refractive indices and the curvatures of the cornea

and lens.

The nodal point is a point immediately behind the back surface of the

lens, where the image becomes reversed and inverted. The visual axis (also

known as the line of sight) connects the fovea centralis with the nodal point

and continues towards the viewed object (see Fig. 4.3). The location of the

fovea centralis is subject dependent and may change from one eye to the other

in the same subject.


Figure 4.3: The visual axis (also known as the line of sight) connects the fovea

centralis with the nodal point and continues towards the viewed object.

4.3.4 Approximate Geometric Model

To interpret an image of an eye, a geometric model is required. Although the eye has an approximately spherical shape, it cannot simply be modeled by a

single sphere because the cornea has a higher curvature than the rest of the

eyeball. We therefore propose an eyeball model with two sphere segments of

different sizes placed one in front of the other (see Fig. 4.4). Note that the

cornea could be modeled as an ellipsoid, but the asphericity is low in practice

(see Table 4.2).


Figure 4.4: The proposed geometric eye model.

Let the anterior, smaller sphere segment have radius and center rA and CA

respectively, and the posterior, larger sphere segment have radius and center

rP and CP respectively. The limbus is modeled as a circle with radius rL and

center CL. Let the limbus be positioned such that its center lies on the line

connecting CA and CP; this line is the optic axis and it coincides with the supporting plane normal NL of the limbus. Note that NL does not coincide with the

visual axis (see Fig. 4.5), because the fovea is slightly displaced from the optic

axis. The angle ϕ between the optic axis and visual axis is subject dependent

(and may change among eyes of the same subject), but it is relatively small

(∼ 5◦ [Liou & Brennan (1997)]). In our model we assume that optic and visual

axes coincide.

The differences in eye parameter values are small (see Sect. 4.3.2); it is reasonable to assume that the various parameters of a human eye are close to certain anatomical constants. The anatomical constants of our eye model are

given in Table 4.3 and were selected based on the empirical values given in

Sect. 4.3.2.

The distance between the limbus center and the anterior sphere center is

modeled as the variable

$|\mathbf{C}_L\mathbf{C}_A| = \sqrt{r_A^2 - r_L^2}$, (4.1)


Figure 4.5: The fovea is slightly displaced from the optic axis. The angle ϕ

between the optic axis and visual axis is subject dependent, but it is relatively

small.

Name | Parameter | Value
limbus radius | rL | 5.55 mm
anterior sphere radius | rA | 7.77 mm
posterior sphere radius | rP | 11.75 mm

Table 4.3: Anatomical constants of the proposed geometric eye model.


and it is used to obtain the anterior sphere center

CA = CL − |CLCA|NL. (4.2)

The posterior sphere center CP may be obtained similarly.
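To make (4.1) and (4.2) concrete, the following minimal sketch (Python with numpy; the function name and argument convention are illustrative and not part of the proposed method) computes the anterior sphere center from an estimated limbus center and supporting plane normal using the anatomical constants of Table 4.3.

import numpy as np

# Anatomical constants of the eye model (Table 4.3), in millimetres.
R_LIMBUS = 5.55
R_ANTERIOR = 7.77

def anterior_sphere_center(C_L, N_L, r_L=R_LIMBUS, r_A=R_ANTERIOR):
    """Return C_A from the limbus center C_L and its unit supporting plane
    normal N_L, following eqs. (4.1) and (4.2)."""
    C_L, N_L = np.asarray(C_L, float), np.asarray(N_L, float)
    N_L = N_L / np.linalg.norm(N_L)            # make sure the normal has unit length
    d = np.sqrt(r_A**2 - r_L**2)               # |C_L C_A|, eq. (4.1)
    return C_L - d * N_L                       # eq. (4.2)

For example, anterior_sphere_center([0, 0, 450], [0, 0, -1]) places the anterior sphere center 5.44 mm behind a limbus that directly faces the camera.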

4.3.5 Movement

During accommodation, i.e. the optical change in the dioptric power of the

eye, the curvature of the lens is changed, which gives the eye the ability to

change its point-of-focus from distant objects to near objects.

In addition to being able to change the shape of the lens, an eye can rotate

and translate inside the orbital cavity. Six extraocular muscles control the

movement of the eye. The amount of translation is very slight, approximately

2.0 mm along the anteroposterior axis and 0.5 mm along the frontal plane.

Ocular rotations take place around the center of the eye. Like other animals

with two eyes, humans perceive the world through binocular vision. The vi-

sual axes for both eyes will intersect at the point-of-gaze. Because eyes are

spatially separated, they point in different directions to fixate a near object

(see Fig. 4.6).

Figure 4.6: Eyes point in different directions to fixate a near object.


4.4 Limbus Reconstruction

In this section, it will be shown how the limbus can be reconstructed

up to a sign ambiguity from its perspective projection. The projective equation

of a circle has been derived previously in the context of camera calibration

(see [Chen et al. (2004); Safaee-Rad et al. (1992); Zheng & Liu (2008)]). We

reformulate the problem here for our purpose of reconstructing the limbus.

4.4.1 Closed Form Solution

It will now be shown how the limbus center CL and the normal of its supporting

plane NL can be recovered in the camera coordinate system in terms of the

limbus radius rL. Let the limbus be represented as a circle in 3-space by a

3× 3 symmetric matrix

$L = \begin{bmatrix} I_2 & -\mathbf{C}_L \\ -\mathbf{C}_L^T & \mathbf{C}_L^T\mathbf{C}_L - r_L^2 \end{bmatrix}$. (4.3)

Any 2D point on the limbus plane, with homogeneous representation X, lying

on the limbus will satisfy the equation

$X^T L X = 0$. (4.4)

Without loss of generality, let the limbus be centered at the world origin

$\mathbf{C}_L = 0$, and let the normal of its supporting plane coincide with the z-axis of the world coordinate system, $\mathbf{N}_L = [0\ 0\ 1]^T$. A rigid motion denoted

by [ R T ], where R is a 3 × 3 rotation matrix and T a translation vector,

defines the transformation from the limbus coordinate system to the camera

coordinate system. According to [Hartley & Zisserman (2004)], a planar ho-

mography, mapping points on a world plane to points on the image plane, can

be defined as

x = HX = K[ R1 R2 T ]X, (4.5)

where x and X are the homogeneous representations of points on the image

plane and on the world plane respectively, and Ri is the i-th column of R.


The image of the limbus is a symmetric 3× 3 matrix representing a conic

section. H and L determine the image of the limbus as

$L_{img} = H^{-T} L H^{-1}$, (4.6)

which follows from (4.4), (4.5) and $x^T L_{img} x = 0$.
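As a concrete illustration of (4.3) to (4.6), the following sketch (Python with numpy; all names are illustrative) builds the planar homography H = K[R1 R2 T] for a given pose and maps the limbus circle to its image conic. The same computation can be used to generate the synthetic conics of Sect. 4.4.2.

import numpy as np

def limbus_circle_matrix(r_L):
    """Circle of radius r_L centered at the origin of its supporting plane, eq. (4.3)."""
    L = np.eye(3)
    L[2, 2] = -r_L**2
    return L

def project_limbus(K, R, T, r_L):
    """Image conic of the limbus, eq. (4.6): L_img = H^{-T} L H^{-1},
    with the planar homography H = K [R1 R2 T] of eq. (4.5)."""
    H = K @ np.column_stack((R[:, 0], R[:, 1], T))
    H_inv = np.linalg.inv(H)
    L_img = H_inv.T @ limbus_circle_matrix(r_L) @ H_inv
    return L_img / np.linalg.norm(L_img)                # conics are defined up to scale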

Let us now consider a simplified case, where both the camera calibration

matrix and the rotation matrix are given by the identity matrix K = R = I3.

Under this simplified configuration, the image of the limbus can be obtained

using (4.6) and is given by

$L'_{img} = \begin{bmatrix} 1 & 0 & -\frac{t_1}{t_3} \\ 0 & 1 & -\frac{t_2}{t_3} \\ -\frac{t_1}{t_3} & -\frac{t_2}{t_3} & \frac{t_1^2 + t_2^2 - r_L^2}{t_3^2} \end{bmatrix}$, (4.7)

where $T = [t_1\ t_2\ t_3]^T$. Note that $L'_{img}$ represents a circle centered at $\mathbf{c}' = [\frac{t_1}{t_3}\ \frac{t_2}{t_3}]^T$ with radius $r' = \frac{r_L}{t_3}$. The translation vector (representing the

circle center in the camera coordinate system) can be recovered in terms of r′

and c′ as

$T' = \frac{r_L}{r'}\begin{bmatrix}\mathbf{c}'\\1\end{bmatrix}$. (4.8)

Consider now the general case where the rotation matrix and the camera cal-

ibration matrix are given by R and K respectively. The effect of K is first

removed by normalizing the image using $K^{-1}$. The conic $L_{img}$ will be transformed to a conic $\hat{L}_{img} = K^T L_{img} K$ in the normalized image, which can be

diagonalized into

$\hat{L}_{img} = MDM^T = M\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & d \end{bmatrix}M^T$, (4.9)

where M is an orthogonal matrix whose columns are the eigenvectors of $\hat{L}_{img}$, and D is a diagonal matrix consisting of the corresponding eigenvalues. The matrix $M^T$ defines a rotation that will transform $\hat{L}_{img}$ to the ellipse D, which is

centered at the origin and whose principal axes are aligned with the coordinate

system.


In order to relate the general case with the previous simplified case, the gen-

eralized ellipse D is transformed to a circle E by applying the transformation

$E = N^T D N$. It can be shown that the matrix

$N = \begin{bmatrix} g\cos\alpha & s_1 g\sin\alpha & s_2 h \\ \sin\alpha & -s_1\cos\alpha & 0 \\ s_1 s_2 h\cos\alpha & s_2 h\sin\alpha & -s_1 g \end{bmatrix}$ (4.10)

will transform the ellipse D to the circle E, where $g = \sqrt{\frac{b-d}{a-d}}$, $h = \sqrt{\frac{a-b}{a-d}}$ and $s_{1,2}$ are undetermined signs (see [Chen et al. (2004)] for a detailed derivation).

Without loss of generality, let ab > 0, ad < 0 and |a| > |b|. The circle E

will have the center

$\mathbf{c} = \begin{bmatrix} -s_2\frac{\sqrt{(a-b)(b-d)}\cos\alpha}{b} \\ -s_1 s_2\frac{\sqrt{(a-b)(b-d)}\sin\alpha}{b} \end{bmatrix}$ (4.11)

and radius

$r = s_3\frac{\sqrt{-ad}}{b}$, (4.12)

where s3 is another undetermined sign. Note that α represents a rotation angle

around the limbus z-axis. It can be chosen arbitrarily because the projection

of the circular limbus will not be affected by a rotation around its normal.

Using (4.8), the translation

$T = MN\,\frac{r_L}{r}\begin{bmatrix}\mathbf{c}\\1\end{bmatrix}$ (4.13)

can be recovered, and the rotation is given by

R = MN. (4.14)

We finally recover the center of the limbus and the normal of its supporting

plane in the camera coordinate system as

$\mathbf{C}_L = T, \qquad \mathbf{N}_L = R\,[0\ 0\ 1]^T$. (4.15)


Figure 4.7: There are 8 solutions for cutting a cone with a plane in such a way

that the intersection is a circle of a given radius. By enforcing the constraint

that the limbus center lies in front of the camera and the limbus normal is facing

the camera, only two solutions (shown in red and green) remain feasible.

Note that the three unknown signs $s_1$, $s_2$ and $s_3$ represent $2^3 = 8$ possible solutions for the limbus. These solutions represent the 8 possibilities for cutting a cone with a plane in such a way that the intersection is a circle of a given radius (see Fig. 4.7). Two of the unknown signs can be recovered by enforcing

(a) CL is in front of the camera, and

(b) NL is facing the camera,

which leaves just a single unknown sign.
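The closed form solution can be summarized in the following sketch (Python with numpy; a minimal illustration of Sect. 4.4.1 under the stated assumptions, not the implementation used for the experiments). It enumerates the sign choices and keeps only the solutions that satisfy constraints (a) and (b).

import numpy as np
from itertools import product

def reconstruct_limbus(L_img, K, r_L):
    """Minimal sketch of the closed form limbus reconstruction (Sect. 4.4.1).
    L_img: 3x3 image conic of the limbus, K: camera intrinsics, r_L: limbus radius.
    Returns the feasible (C_L, N_L) pairs in camera coordinates."""
    # Normalized conic: remove the intrinsics and diagonalize, eq. (4.9).
    L_hat = K.T @ L_img @ K
    L_hat = 0.5 * (L_hat + L_hat.T)                    # enforce symmetry
    w, M = np.linalg.eigh(L_hat)
    # Order the eigenvalues (a, b, d) such that a*b > 0, a*d < 0 and |a| >= |b|.
    s = np.sign(w)
    d_idx = [i for i in range(3)
             if s[i] * s[(i + 1) % 3] < 0 and s[i] * s[(i + 2) % 3] < 0][0]
    a_idx, b_idx = sorted((i for i in range(3) if i != d_idx), key=lambda i: -abs(w[i]))
    a, b, d = w[a_idx], w[b_idx], w[d_idx]
    M = M[:, [a_idx, b_idx, d_idx]]
    g = np.sqrt((b - d) / (a - d))
    h = np.sqrt((a - b) / (a - d))
    alpha = 0.0                                        # free rotation about the limbus normal
    ca, sa = np.cos(alpha), np.sin(alpha)
    solutions = []
    for s1, s2 in product((1.0, -1.0), repeat=2):
        N = np.array([[g * ca, s1 * g * sa, s2 * h],
                      [sa, -s1 * ca, 0.0],
                      [s1 * s2 * h * ca, s2 * h * sa, -s1 * g]])          # eq. (4.10)
        c = np.array([-s2 * np.sqrt((a - b) * (b - d)) * ca / b,
                      -s1 * s2 * np.sqrt((a - b) * (b - d)) * sa / b])    # eq. (4.11)
        r = np.sqrt(-a * d) / abs(b)                   # eq. (4.12), sign chosen so that r > 0
        C_L = M @ N @ np.append(r_L / r * c, r_L / r)  # eqs. (4.13) and (4.15)
        N_L = M @ N @ np.array([0.0, 0.0, 1.0])        # eqs. (4.14) and (4.15)
        if C_L[2] > 0 and N_L @ C_L < 0:               # constraints (a) and (b)
            solutions.append((C_L, N_L))
    return solutions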

4.4.2 Noise Analysis

It is difficult to locate the limbus precisely in practice because the iris blends gradually into the sclera. The performance of the limbus reconstruction algorithm was

investigated with controlled synthetic data. The experimental setup consisted

of a circle with radius rL representing the limbus and being viewed by a camera

from different viewing angles. The limbus was imaged from a distance of 45cm,

the focal length of the synthetic camera was 70mm and the image resolution

was 1129 × 716 pixels.


Figure 4.8: This figure shows the input images and a 3D rendering of the setup

for the synthetic experiment.

The image of the limbus was obtained analytically using (4.6). Gaussian

noise of different standard deviations was added to the conic by sampling points on it and perturbing them in a radial direction from the center. A noisy conic was

obtained as a conic fitted to these noisy points using a least squares method

[Fitzgibbon et al. (1999)].
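The perturbation procedure can be sketched as follows (Python with numpy; a plain algebraic least squares conic fit via SVD is used here in place of the constrained fit of [Fitzgibbon et al. (1999)], so this is only an approximation of the actual experimental pipeline, and all names are illustrative).

import numpy as np

def sample_conic_points(L_img, n=100):
    """Sample n points on the ellipse x^T L_img x = 0 (x homogeneous)."""
    A, b = L_img[:2, :2], L_img[:2, 2]
    x0 = np.linalg.solve(A, -b)                          # ellipse center
    p0 = np.r_[x0, 1.0] @ L_img @ np.r_[x0, 1.0]
    pts = []
    for t in np.linspace(0.0, 2.0 * np.pi, n, endpoint=False):
        d = np.array([np.cos(t), np.sin(t)])
        s = np.sqrt(max(-p0 / (d @ A @ d), 0.0))         # distance to the conic along d
        pts.append(x0 + s * d)
    return np.array(pts)

def fit_conic(pts):
    """Plain algebraic least squares conic fit (smallest singular vector of the
    design matrix); an unconstrained stand-in for [Fitzgibbon et al. (1999)]."""
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack((x * x, x * y, y * y, x, y, np.ones_like(x)))
    _, _, Vt = np.linalg.svd(D)
    a, b, c, d, e, f = Vt[-1]
    return np.array([[a, b / 2, d / 2], [b / 2, c, e / 2], [d / 2, e / 2, f]])

def noisy_conic(L_img, sigma, n=100, rng=None):
    """Perturb sampled conic points radially (Gaussian, std sigma in pixels) and refit."""
    rng = rng or np.random.default_rng(0)
    pts = sample_conic_points(L_img, n)
    center = pts.mean(axis=0)
    dirs = pts - center
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return fit_conic(pts + rng.normal(0.0, sigma, (n, 1)) * dirs)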

Experiments with noise levels ranging from 0.0 to 4.0 pixels were carried

out for distinct viewing directions (optic axis to limbus plane: 80◦ to 40◦ in

10◦ intervals). The synthetic images and a 3D rendering of the setup can be

seen in Fig. 4.8. For each noise level, 200 independent trials were conducted

to estimate the mean error in limbus center and limbus normal. The unknown

signs in (4.13) and (4.14) were determined by the ground truth.

Fig. 4.9 shows a plot of the mean error in the limbus normal (in degree)

against the noise level (in pixel), and Fig. 4.10 shows a plot of the mean error in

the limbus center (in cm) against the noise level (in pixel). It can be seen that

the mean error in the limbus normal increased approximately linearly with the

noise level. The mean error in the limbus center increased linearly for large angles and non-linearly for smaller angles. A noise level with a standard deviation of 4.0 pixels resulted in a mean error of approximately 4.2° for the 80° case and

less for the other cases. Note that small errors in the limbus center and normal


Figure 4.9: Sensitivity of the limbus reconstruction to imperfect data. The

mean error in the angle of the limbus normal (in degree) is plotted against the

noise level (in pixel).


Figure 4.10: Sensitivity of the limbus reconstruction to imperfect data. The

mean error in the distance of the limbus center (in cm) is plotted against the

noise level (in pixel).


Figure 4.11: A semi-automatic approach is used for the segmentation of the

limbus. The manually initialized B-spline is shown in red and the extracted

limbus is shown in green.

will have a large impact on the point-of-gaze estimation accuracy. Assuming

a limbus plane parallel to the visual display at a distance of 50mm, an error

of 4.2◦ in the limbus normal (no error in limbus center) would be equivalent

to an error of ε = tan(4.2°) · 50 mm ≈ 3.67 mm on the visual display.

4.4.3 Experimental Results

This section presents experimental results for limbus reconstruction from a

subject gazing on different points on a visual display. A camera was positioned

below the visual display and was calibrated using a planar pattern [Zhang

(1999)].

To reconstruct the limbus from an image, its projection, the ellipse Limg,

needs to be obtained. There exists a plethora of work dealing with the problem

of limbus segmentation [Chen et al. (2009); Fabian et al. (2010); Li et al. (2005);

Zuo et al. (2008)], and [Hansen & Ji (2010)] summarizes work on eye detection and segmentation; unfortunately, most assume a circular projection. Here, a

simple semi-automatic approach is employed to extract an elliptical projection

of the limbus. A B-spline snake is manually initialized close to the visible


Figure 4.12: First row: A subject gazing at three different points in a visual

display. Second row: Estimated limbus normal (in yellow) and limbus (in

green)


Figure 4.13: A close up of the left eye.


Figure 4.14: The estimated limbus and the normal of its supporting plane

rendered in 3D.

limbus. The B-spline is updated with a linear least squares method such that

its samples move towards local intensity discontinuities. Subsequently a least

squares method [Fitzgibbon et al. (1999)] fits an ellipse to samples of the B-

spline (see Fig. 4.11). The subject was asked to actively control his eyelids so that the visibility of the limbus was maximized (not blocked by the lower or upper eyelid). This was done to simplify the limbus segmentation. The captured

images and estimation results are shown in Fig. 4.12; Fig. 4.13 shows a close

up of the left eye and a 3D rendering of the estimation result can be found in

Fig. 4.14. The unknown signs in (4.13) and (4.14) were determined manually

for this experiment.

The limbus has been reconstructed and it is now possible to determine

the anterior sphere center from (4.2). The next section will estimate a single

display edge from its reflection on the cornea, and Sect. 4.6 will extract a

rectangle from four independently estimated display edges.

4.5 Display Edge Reconstruction

In this section a closed form solution for the reconstruction of a display edge

is proposed. It will be shown that the 2D reflections of display edges on a pair

of anterior spheres will determine the 3D display edges. The 3D display edges

of the visual display are first independently reconstructed with the method

proposed in this section and a subsequent optimization (see Sect. 4.6) will

extract a planar rectangle representing the visual display.


Figure 4.15: A 2D point on the display edge determines the viewing ray V

and reflects on the anterior sphere as the reflected ray R. The reflected rays

are constructed in such a way for both eyes and all 2D points on the display

edge. The display edge can be determined from this set of reflected rays.

Let the back-projection of a point p on the 2D display edge be the viewing

ray V. The ray V will leave the camera center, pass through the point p and

intersect the anterior sphere at some point P (see Fig. C.1). The position of

the anterior sphere can be determined from (4.15) and (4.2).

Given rA, the point P can be determined. Let V reflect on the anterior

sphere as the reflected ray R. Let $\hat{V}$ and $\hat{R}$ be the unit vectors in the directions of V and R respectively. The law of reflection states that the incident angle must be equal to the reflection angle, and the reflection direction is therefore given by $\hat{R} = \hat{V} - 2(\hat{N}\cdot\hat{V})\hat{N}$, where $\hat{N}$ is the unit normal vector at point P. R passes through P in the direction $\hat{R}$ and will, by construction, intersect

the display edge at some point. Note that all the reflected rays constructed

in such a way will intersect the display edge. Similar to the line estimation

from two spheres in the previous chapter (see Sect. 3.3.3), the display edge is

estimated here as the null-space of


Figure 4.16: The reconstruction error (in cm) for varying anterior sphere

radius (in mm) and limbus radius (in mm).

$\begin{bmatrix} R_{L1}^T \\ \vdots \\ R_{Lm}^T \\ R_{R1}^T \\ \vdots \\ R_{Rn}^T \end{bmatrix}\mathbf{y} = 0$, (4.16)

where $R_{L1}, \ldots, R_{Lm}$ are the reflected rays of the left eye in permuted Plücker line notation, and $R_{R1}, \ldots, R_{Rn}$ are the reflected rays of the right eye in permuted Plücker line notation.
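A minimal sketch of this construction is given below (Python with numpy; all function names are illustrative, and the coordinate ordering of the permuted Plücker vectors is chosen here so that the incidence test is a plain dot product, which differs in indexing detail from (B.6) and (B.9)). Each marked image point is back-projected, reflected on the corneal sphere, and the display edge is recovered as the right null-space (smallest singular vector) of the stacked matrix of (4.16).

import numpy as np

def reflect_on_sphere(K, p_img, C_sphere, r_sphere):
    """Back-project the image point p_img, intersect the viewing ray with the
    corneal sphere (C_sphere, r_sphere) and return the point of reflection P
    and the unit reflected direction (assumes the ray actually hits the sphere)."""
    v = np.linalg.inv(K) @ np.array([p_img[0], p_img[1], 1.0])
    v /= np.linalg.norm(v)                              # viewing ray through the camera center
    tc = v @ C_sphere
    t = tc - np.sqrt(tc**2 - (C_sphere @ C_sphere - r_sphere**2))
    P = t * v                                           # nearest intersection with the sphere
    n = (P - C_sphere) / r_sphere                       # unit surface normal at P
    r = v - 2.0 * (n @ v) * n                           # law of reflection
    return P, r / np.linalg.norm(r)

def permuted_plucker(P, d):
    """Permuted Plücker coordinates of the line through P with direction d:
    moment first, then direction, so that the incidence test with a line written
    as (direction; moment) is a plain dot product."""
    return np.r_[np.cross(P, d), d]

def estimate_display_edge(rays):
    """Display edge as the null-space of the stacked reflected rays, eq. (4.16).
    rays is a list of (P, direction) pairs gathered from both eyes."""
    A = np.vstack([permuted_plucker(P, d) for P, d in rays])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]                                       # (direction; moment) of the edge, up to scale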

In the next section, we will analyze the reconstruction error introduced by

variations in the anatomical parameters.

4.5.1 Variations in Anatomical Parameter

The geometric model proposed in Sect. 4.3 is an approximation, and, as noted

in the literature [Forrester et al. (2008); Snell & Lemp (2007)], dimensions of

the eye components can vary considerably from one person to another. Based


on the literature review on the biometry of the human eye (see Sect. 4.3.2),

this section analyzes the error produced by variations in these parameters. In [Guillon et al. (1986)], 220 eyes were measured with a mean anterior sphere radius of 7.77mm; all measurements were within ±0.25mm of the mean (see

Table 4.2). No data for variations of the limbus radius was found and we

assume the same variation.

Synthetic experiments were prepared to investigate the reconstruction error

for varying anterior sphere radius and limbus radius. The reconstruction error

is defined as the distance between a reconstructed display edge and its ground

truth. The distance here is the mean distance of samples on the ground truth

display edge to the reconstructed display edge.

In the synthetic experiment, the limbus was imaged from a distance of

55cm, the focal length of the synthetic camera was 70mm and the image resolution was 1129 × 716 pixels. Fig. 4.16 plots the reconstruction error against

the variations in anterior sphere radius and limbus radius. It can be seen that

there is a linear relationship between variations in the anatomical parameters and

the reconstruction error. For a relatively small variation of 0.05mm in the

limbus radius, the reconstruction error was 0.58cm and 0.61cm for the positive

and negative variations respectively. The same variation in the anterior sphere

radius resulted in a reconstruction error of 0.96cm and 0.89cm for the positive

and negative variations respectively. The reconstruction error was therefore

120 times the variation in the limbus radius and 180 times the variation in

the radius of the anterior sphere. This shows that variations in anatomical

constants are significant and an accurate reconstruction of the visual display

cannot be expected.

4.6 Visual Display Reconstruction

Display edges are independently estimated as four lines {Ltop, Lright, Lbottom, Lleft} with the closed form solution of the previous section. A rectangle will be ex-

tracted from the four lines in this section. In general the four lines will not

intersect, nor form a perfect rectangle, because:


• The smooth transition from iris to sclera can cause an inaccurate location

of the limbus. It has been shown (see Sect. 4.4.2) that a noisy limbus

will result in large reconstruction errors.

• The proposed geometric eye model of Sect. 4.3 is an approximation, and individual parameters could differ from the anatomical parameters

listed in Table 4.3. It has been shown in Sect. 4.5.1 that small variations

in the anatomical parameters will result in large reconstruction errors.

Limbus and anatomical parameters are adjusted through an optimization. For

a rectangular visual display, opposite sides {Ltop, Lbottom} and {Lright, Lleft} are parallel, therefore ∠(Ltop, Lbottom) = ∠(Lright, Lleft) = 0, where ∠(Li, Lj)

represents the angle between the lines Li and Lj. The energy

E1 = ∠(Ltop,Lbottom) + ∠(Lright,Lleft) (4.17)

will be minimal for rectangular lines. Similarly, all angles in a rectangle are 90 degrees and the term

E2 = |∠(Ltop,Lright)− 90|+ |∠(Lright,Lbottom)− 90|+

|∠(Lbottom,Lleft)− 90|+ |∠(Lleft,Ltop)− 90| (4.18)

will be minimal for rectangular lines. We therefore seek the minimum of the

function

$E(r_A, r_L, L^L_{img}, L^R_{img}) = E_1 + \omega E_2$ (4.19)

for the corneal sphere radius $r_A$, the limbus radius $r_L$, the limbic projection of the left eye $L^L_{img}$, and the limbic projection of the right eye $L^R_{img}$. As an initialization for $L^L_{img}$ and $L^R_{img}$, the semi-automatic approach described in

Sect. 4.4.3 was used. rA and rL were initialized with the values listed in

Table 4.3. The display reflections were marked manually. Alternatively, the

work [Wang et al. (2005a)] on separating reflections on human iris images

could be used. Parameters of the ellipse were bounded by ±1.5 pixel and the

radii of corneal sphere and limbus were bounded to lie within ±0.25mm and

±0.2mm respectively. An initial minimum was found through an exhaustive

(but coarse) sampling of the search space. In this initial search, the ellipse


Figure 4.17: Reflection of reconstructed visual display (red) on the corneal

sphere before (left) and after (right) the optimization.

parameters were sampled with an interval of 0.25 pixels and the anatomical parameters with an interval of 0.05mm. A final steepest descent optimization was initialized with the minimum found by the coarse search. The combination

of two optimizations avoids an early termination in a local minimum. The

drawback of this strategy is that it is time-consuming and does not guarantee

the global minimum. The reflection of an estimated visual display on the

corneal sphere is shown with and without optimization in Fig. 4.17.
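The energy of (4.17) to (4.19) can be sketched as follows (Python with numpy; the four edges are represented here simply by their 3D direction vectors, and the weight ω and all names are placeholders). In the full optimization the edges are re-estimated from the perturbed ellipses and anatomical parameters at every evaluation.

import numpy as np

def angle_between(d1, d2):
    """Angle in degrees between two 3D line directions (orientation ignored)."""
    c = abs(d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def rectangle_energy(d_top, d_right, d_bottom, d_left, omega=1.0):
    """E = E1 + omega * E2 of eqs. (4.17)-(4.19) for four edge direction vectors."""
    E1 = angle_between(d_top, d_bottom) + angle_between(d_right, d_left)   # opposite sides parallel
    E2 = (abs(angle_between(d_top, d_right) - 90.0) +
          abs(angle_between(d_right, d_bottom) - 90.0) +
          abs(angle_between(d_bottom, d_left) - 90.0) +
          abs(angle_between(d_left, d_top) - 90.0))                        # right angles
    return E1 + omega * E2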

4.7 Point-of-Gaze Estimation

The intersection of the optic axes of both eyes with the visual display can be

computed. In general there will be no single intersection, because the optic

axis is distinct from the visual axis (see Sect. 4.3). A single point-of-gaze is

approximated as the point on the visual display that minimizes the sum of the

distances to the two lines (see Fig. 4.18).
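A minimal numerical sketch of this approximation is shown below (Python with numpy and scipy; the parametrization of the display plane and all names are illustrative, and a derivative-free search is used for simplicity).

import numpy as np
from scipy.optimize import minimize

def point_of_gaze(plane_pt, plane_n, eye_centers, gaze_dirs):
    """Point on the display plane minimizing the sum of distances to the two
    optic axes. plane_pt, plane_n: a point on the display and its unit normal;
    eye_centers, gaze_dirs: origins and unit directions of the two optic axes."""
    # two orthonormal vectors spanning the display plane
    u = np.cross(plane_n, [1.0, 0.0, 0.0])
    if np.linalg.norm(u) < 1e-6:
        u = np.cross(plane_n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(plane_n, u)

    def dist_to_line(p, o, d):
        return np.linalg.norm(np.cross(p - o, d))       # point-to-line distance (d unit length)

    def cost(ab):
        p = plane_pt + ab[0] * u + ab[1] * v
        return sum(dist_to_line(p, o, d) for o, d in zip(eye_centers, gaze_dirs))

    res = minimize(cost, x0=np.zeros(2), method="Nelder-Mead")
    return plane_pt + res.x[0] * u + res.x[1] * v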

To disambiguate and solve for the unknown sign s2 (from Sect. 4.4) we

reconstruct both gaze directions for each eye and verify whether the gaze of both eyes intersects the visual display.

4.8 Experimental Results

Experiments on different subjects were carried out and results are presented

in this section. Four subjects were asked to gaze at nine points in a small

120mm × 120mm area on a visual display. To prevent occlusions from the eyelid


Figure 4.18: Point-of-gaze (PoG) is approximated by intersecting the optic

axes with the visual display. The point on the visual display minimizing the

two intersections is selected as the PoG.

and reflections of the nose on the cornea, we placed the camera below the visual display (∼45° between the display normal and the camera's principal axis). We employed a 70mm lens and focused manually so that the subject's eyes were sharp at a distance of around 45cm from the display. Results for the viewing directions of the subjects are shown in Fig. 4.19 and Fig. 4.21 (Fig. 4.20 and Fig. 4.22 show close-ups of the right eye). The results of the gaze estimation are shown in Fig. 4.23. Average errors for the subjects were around 10 percent. Please

note that all experimental results have been obtained with a single image per

gaze direction. The display was estimated for each image independently.

4.9 Conclusions

This chapter introduces a method for reconstructing eyes and display from

reflections on the cornea and investigates the minimal information required

for gaze estimation. It is shown that limbus and eye reflections on a single

intrinsically calibrated image provide enough information for this. Quantita-

tive results with real data show that it is possible to determine approximately

where a user is focusing his attention relative to a visual display from just a

single image. As expected, the accuracy of the method is lower than that of


Figure 4.19: Estimated limbus normal (in yellow) and limbus (in green).


Figure 4.20: A close up of the estimated limbus normal (in yellow) and limbus

(in green).


Figure 4.21: Estimated limbus normal (in yellow) and limbus (in green).


Figure 4.22: A close up of the estimated limbus normal (in yellow) and limbus

(in green).


Figure 4.23: Point-of-gaze estimation results. Four subjects were asked to gaze

at 9 points on a visual display. The point-of-gaze estimation is plotted as a

red dot, while the ground truth is plotted as a cross. The visual display was

reconstructed independently for each of the 36 images. Average errors were

15.14mm, 16.90mm, 14.40mm and 25.13mm respectively.


commercial systems, which use multiple images and active illumination. Gaze estimation has potential direct and indirect implications for various fields, including computer science (HCI), psychology (perception), industrial engineering (usability) and advertising. We believe that these areas will still benefit from a system with reduced accuracy. Please note that all experiments were performed using just a single image for the estimation of gaze, display and eyes, and we found that the main challenge is the accurate estimation of the display.

Eye gaze estimation has a history of more than 100 years, yet it is still not widely used because the technology has usability shortcomings that hinder its applicability. One of the main challenges is the calibration requirement for gaze estimation. The minimal setup and calibration requirements of our method make this technology accessible to everyone, and not just to those who buy a commercial gaze tracker and are willing to run through a tedious system calibration process.

The following shortcomings have been identified in the proposed method,

and we wish to address them in future work:

• A spherical approximation of the eye’s reflective element is not optimal.

We would like to study a better, more complex approximation of the

cornea. An ellipsoid could be used that better fits the empirical values

in the literature.

• The semi-automatic limbus segmentation and the manual screen reflec-

tion segmentation could be improved to be fully automatic, robust and

able to deal with illumination changes, varying focus, contact lenses and partial occlusion from the eyelid. There are sophisticated segmentation algorithms for limbus segmentation (e.g. [Li et al. (2005); Ryan et al. (2008)]) and for accurate reflection segmentation (e.g. [Wang

et al. (2005a)]) available.

• Temporal information could be used to improve the limbus and reflection

segmentation and solve for the unknown sign.


Chapter 5

Conclusions

5.1 Summary

This thesis has presented theoretical and practical solutions for light source

estimation from spherical specular reflections. Novel methods were introduced

for the estimation of

• multiple light directions and camera poses (Chapter 2),

• polygonal light source (Chapter 3), and

• visual display and point-of-gaze (Chapter 4).

A brief summary of the algorithms and techniques introduced is given below.

The problem of recovering both the light directions and camera poses from

a single sphere was addressed in Chapter 2. The two main contributions of

this chapter were a closed form solution for recovering light directions from

the specular highlights observed in a single image of a sphere with unknown

size and location; and a closed form solution for recovering the relative camera

poses using the estimated sphere and light directions. It was shown that given

the intrinsic parameters of a camera, a scaled sphere can be reconstructed from

its image. The translation direction of the sphere center from the camera center

can be determined uniquely, but the distance between them will be scaled by

the unknown radius of the sphere. It was then shown that the light directions

can be recovered independent of the radius chosen in locating the sphere. If


the sphere is observed by multiple views, the sphere center recovered using a

common fixed radius will fix the translations of the cameras from the sphere

center. The relative rotations between the cameras can then be determined

by aligning the relative light directions recovered in each view. As there exist closed form solutions for all the computation steps involved, the proposed method is extremely fast and efficient. This greatly eases the task of multiple-view light estimation.

A polygonal light source from the image of a specular sphere was recovered

in Chapter 3. An empirical analysis showed that single-view, single-sphere line reconstruction is ill-conditioned. The main contribution of this chapter was a novel algorithm for the reconstruction of a polygon from the reflections on

two spheres observed from a single viewpoint or from two views of a single

sphere. It was shown that spheres do not need to be calibrated, and the

reconstruction is up to the scale of the sphere.

A method for reconstructing eyes and display from reflections on the cornea

was introduced in Chapter 4. The minimal information required for gaze estimation was investigated. It was shown that limbus and eye reflections on a

single intrinsically calibrated image provide enough information. Quantitative

results with real data showed that it is possible to determine approximately

where a user is focusing his attention relative to a visual display from just a

single image. As expected, the accuracy of the method is lower than that of commercial systems, which use multiple images and active illumination. Eye gaze estimation has a history of more than 100 years, yet it is still not widely used because the technology has usability shortcomings that hinder its applicability. One of the main challenges is the calibration requirement for gaze estimation. The minimal setup and calibration requirements of the proposed method make this technology accessible to everyone, and not just to those who buy a commercial gaze tracker and are willing to run through a tedious system calibration process.


5.2 Future Work

Though the methods in this thesis are novel and very practical, there is certainly room for improvement:

• Throughout the thesis it has been assumed that the intrinsic camera

parameters are known. In future work, the intrinsics could be estimated

from the silhouettes of spheres/circles and the projections of

specular highlights.

• Near light sources, i.e. light sources which cannot be approximated by a

single direction, could be estimated in future work. Most computer vision

algorithms assume directional lighting conditions. However, real data is

often captured under near light sources with intensity fall-off effects, and

in practice, it is often difficult to assume directional lighting, especially

when dealing with large scenes or a limited working space.

• In computer vision, it is often assumed that there are only light sources of

a specific type present in the scene. In the real world there are, however,

a large number of different light sources. Future work should estimate

the general natural lighting conditions present in everyday environments.

• It would be very beneficial if we could estimate light without the need for any calibration object such as a specular sphere.


Appendix A

Quaternion Representation of

Rotations

Quaternions were first introduced in [Hamilton (1844)] and are denoted with a circle. They can be represented by a 4-vector

q = (w, x, y, z)T , (A.1)

or alternatively by a sum of a real number w and three imaginary numbers x,

y and z as

q = w + ix + jy + kz, (A.2)

where $i^2 = j^2 = k^2 = ijk = -1$. Note that the purely imaginary quaternion

q = 0 + ix + jy + kz represents the vector q = (x, y, z)T . The conjugate of a

quaternion negates its imaginary part q∗ = w − ix − jy − kz (similar to the

conjugation of a complex number).

Products of quaternions can be expressed as a multiplication between an

orthogonal 4× 4 quaternion matrix and a quaternion

$qr = \begin{bmatrix} w & -x & -y & -z \\ x & w & -z & y \\ y & z & w & -x \\ z & -y & x & w \end{bmatrix} r = Qr$

or

$rq = \begin{bmatrix} w & -x & -y & -z \\ x & w & z & -y \\ y & -z & w & x \\ z & y & -x & w \end{bmatrix} r = \bar{Q}r$. (A.3)


It can be seen that the conjugate of a quaternion can be represented by the

transpose of the quaternion matrix.

A rotation

x′ = Rx (A.4)

can be represented using unit quaternions (‖q‖ = 1) as

x′ = qxq∗. (A.5)

This can be seen from $qxq^* = (Qx)q^* = \bar{Q}^T Q x$, where

$\bar{Q}^T Q = \begin{bmatrix} q\cdot q & 0 & 0 & 0 \\ 0 & w^2+x^2-y^2-z^2 & 2(xy-wz) & 2(xz+wy) \\ 0 & 2(yx+wz) & w^2-x^2+y^2-z^2 & 2(yz-wx) \\ 0 & 2(zx-wy) & 2(zy+wx) & w^2-x^2-y^2+z^2 \end{bmatrix}$. (A.6)

If q is a unit quaternion, $q\cdot q = 1$ and the lower-right-hand submatrix of $\bar{Q}^T Q$ is an orthonormal matrix representing the rotation. Therefore, if q is a unit quaternion, (A.5) can be used to represent a rotation.

It is much easier to enforce a quaternion to have unit length than it is to

ensure a matrix is orthonormal, and therefore unit quaternions are a popular

representation for rotations.
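A short sketch of (A.3) to (A.6) in Python with numpy (names are illustrative): the two quaternion matrices, the rotation of a vector by a unit quaternion, and the equivalent 3 × 3 rotation matrix.

import numpy as np

def quat_matrix(q):
    """Left multiplication matrix Q of eq. (A.3): q r = Q r."""
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])

def quat_matrix_bar(q):
    """Right multiplication matrix of eq. (A.3): r q = Qbar r."""
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q via x' = q x q*, eqs. (A.5)/(A.6)."""
    q = np.asarray(q, float)
    q = q / np.linalg.norm(q)                  # enforce unit length
    x = np.r_[0.0, v]                          # purely imaginary quaternion
    return (quat_matrix_bar(q).T @ quat_matrix(q) @ x)[1:]

def rotation_matrix(q):
    """3x3 rotation matrix: the lower-right submatrix of Qbar^T Q for a unit quaternion."""
    q = np.asarray(q, float)
    q = q / np.linalg.norm(q)
    return (quat_matrix_bar(q).T @ quat_matrix(q))[1:, 1:]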

For a more complete discussion on quaternions, the interested reader may

refer to [Hamilton (1844)] or [Horn (1987)].


Appendix B

Plücker Representation of Lines

Plücker line coordinates, introduced by Julius Plücker, are widely used in problems involving lines in computer graphics, computational geometry and

computer vision. Notation and properties are reviewed in the following.

A line in 3-space can be determined by the join of two distinct points that it

contains, or equivalently, the intersection of two distinct planes that contain it.

Unfortunately, lines cannot be represented conveniently together with points

and planes; lines, with their four degrees of freedom, are represented by a

homogeneous 5-vector, whereas points and planes, with their three degrees of

freedom, are represented by a homogeneous 4-vector. Various representations

for 3-space lines have been introduced to overcome this discrepancy, and this

thesis adopts Plücker line coordinates [Hodge & Pedoe (1947)].

Each pair of distinct points with homogeneous representations $A = [a_x\ a_y\ a_z\ 1]^T$ and $B = [b_x\ b_y\ b_z\ 1]^T$ defines a Plücker matrix representing a line in 3-space as

$L = AB^T - BA^T = \begin{bmatrix} 0 & a_xb_y - a_yb_x & a_xb_z - a_zb_x & a_x - b_x \\ a_yb_x - a_xb_y & 0 & a_yb_z - a_zb_y & a_y - b_y \\ a_zb_x - a_xb_z & a_zb_y - a_yb_z & 0 & a_z - b_z \\ b_x - a_x & b_y - a_y & b_z - a_z & 0 \end{bmatrix}$. (B.1)

Note that L is a skew-symmetric matrix with just 6 independent non-zero


elements¹

$L = \begin{bmatrix} 0 & l_{12} & l_{13} & l_{14} \\ -l_{12} & 0 & l_{23} & -l_{42} \\ -l_{13} & -l_{23} & 0 & l_{34} \\ -l_{14} & l_{42} & -l_{34} & 0 \end{bmatrix}$. (B.2)

A dual Plücker matrix is obtained from the intersection of two planes P and Q as

$L^* = PQ^T - QP^T$. (B.3)

The plane defined by the join of a point X and the line L is

π = L∗X (B.4)

and L∗X = 0 if, and only if, X is on L. The point defined by the intersection

of the line L with the plane π is

X = Lπ (B.5)

and Lπ = 0 if, and only if, L is on π. The four degrees of freedom for the

3-space line are represented as the 6 independent non-zero elements of L in

homogeneous representation and the requirement det L = 0.

A Plücker line coordinate representation can be defined as a homogeneous 6-vector that collects the six independent elements of L:

$\mathbf{L} = [l_{12}\ l_{13}\ l_{14}\ l_{23}\ l_{42}\ l_{34}]^T$. (B.6)

The requirement det L = 0 can be expressed as a permuted inner product

$\mathbf{L}\cdot\mathbf{L} = 0$ (B.7)

or equivalently

$l_{12}l_{34} + l_{13}l_{42} + l_{14}l_{23} = 0$, (B.8)

which is known as the quadratic Plücker relation or Klein quadric. The permuted vector used in (B.7) can be defined as

$\bar{\mathbf{L}} = [l_{34}\ l_{42}\ l_{23}\ l_{14}\ l_{13}\ l_{12}]^T$. (B.9)

¹The element $l_{42}$ is used instead of $l_{24}$ because it eliminates negative signs in subsequent equations.


Let two lines L and L′ be the join of the points A, B and A′, B′. The two

lines intersect if and only if these four points are coplanar, i.e.

det[ A B A′ B′ ] = 0 (B.10)

or equivalently

$l_{12}l'_{34} + l_{13}l'_{42} + l_{14}l'_{23} + l_{23}l'_{14} + l_{42}l'_{13} + l_{34}l'_{12} = 0$. (B.11)

Again, this can be expressed as a permuted inner product

L ·L′ = 0. (B.12)

In summary, it can be said that two lines L and L′ intersect (i.e. are coplanar)

if they satisfy (B.12), and the 6-vector L represents a line (i.e. intersects itself)

in 3-space if it satisfies (B.7).
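The construction and the incidence test can be summarized in a small sketch (Python with numpy; the helper names are illustrative). The first function collects the six independent elements of (B.1) in the order of (B.6), the second produces the permuted vector of (B.9), and two lines intersect when the permuted inner product (B.11)/(B.12) vanishes.

import numpy as np

def plucker_from_points(A, B):
    """Plücker 6-vector (B.6) of the line joining the 3D points A and B:
    [l12, l13, l14, l23, l42, l34], taken from L = A B^T - B A^T of (B.1)."""
    Ah, Bh = np.r_[A, 1.0], np.r_[B, 1.0]
    L = np.outer(Ah, Bh) - np.outer(Bh, Ah)
    return np.array([L[0, 1], L[0, 2], L[0, 3], L[1, 2], L[3, 1], L[2, 3]])

def permute(L_vec):
    """Permuted vector of (B.9): [l34, l42, l23, l14, l13, l12]."""
    return L_vec[::-1]

def lines_intersect(L1, L2, tol=1e-9):
    """Incidence test (B.11)/(B.12): the permuted inner product must vanish."""
    return abs(permute(L1) @ L2) < tol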

In addition to providing a convenient incidence operation, Plücker coordinates can be used in defining a linear map from a line in 3-space to its image [Hartley & Zisserman (2004)]. For this purpose let us define a 3 × 6 line projection matrix

$\mathcal{P} = \begin{bmatrix} P_2 \wedge P_3 \\ P_3 \wedge P_1 \\ P_1 \wedge P_2 \end{bmatrix}$, (B.13)

where $P_i^T$ are the rows of the point projection matrix, and $P_i \wedge P_j$ are the Plücker line coordinates of the intersection of the planes $P_i$ and $P_j$. The linear map from a line in 3-space to its image can now be written as

$\mathbf{l} = \mathcal{P}\mathbf{L}$. (B.14)

Rows of a point projection matrix P can be interpreted as camera planes, and similarly rows of a line projection matrix $\mathcal{P}$ can be interpreted as lines. These lines are the intersections of pairs of camera planes. They are in the null-space of $\mathcal{P}$ and intersect at the camera center. The lines $\mathbf{L}$ for which $\mathcal{P}\mathbf{L} = 0$ pass through the camera center.


Appendix C

Spherical Forward Projection

The specular reflection of an arbitrary scene point on a sphere can be de-

termined analytically. It has been suggested previously [Ding et al. (2009);

Micusik & Pajdla (2004)] that this is a difficult problem, but it was shown

recently [Agrawal et al. (2010)] that the forward projection on a spherical mir-

ror results in a simple 4th degree equation with a closed form solution. In the

following, this solution will be derived.

Without loss of generality, let the plane of reflection (defined by sphere

center, camera center and a scene point) be the z = 0 plane (see Fig. C.1),

let the sphere center coincide with the world origin $S_c = [0\ 0\ 0]^T$, and let the camera center lie on the positive y-axis at a distance d from the origin, $C = [0\ d\ 0]^T$. Let the scene point be denoted as $L = [l_x\ l_y\ 0]^T$ and let the radius of the sphere be r.

The intersection of the z=0 plane and the sphere results in a circle centered

at the origin. The point of reflection $P = [p_x\ p_y\ 0]^T$ is on this circle and satisfies

$p_x = \sqrt{r^2 - p_y^2}$. (C.1)

The normal vector at the point of reflection P is given by

$N = \frac{1}{r}\begin{bmatrix} \sqrt{r^2 - p_y^2} \\ p_y \\ 0 \end{bmatrix}$. (C.2)

The viewing ray from the camera center to the point of reflection is


Figure C.1: Reflection on a sphere.

$V = P - C = \begin{bmatrix} \sqrt{r^2 - p_y^2} \\ p_y - d \\ 0 \end{bmatrix}$, (C.3)

and the reflected ray follows from the law of reflection

$R = V - 2N(N\cdot V) = \frac{1}{r^2}\begin{bmatrix} (2dp_y - r^2)\sqrt{r^2 - p_y^2} \\ 2dp_y^2 - p_y r^2 - dr^2 \\ 0 \end{bmatrix}$. (C.4)

The condition that the scene point L should lie on this reflected ray can be

expressed as

$R \times (L - P) = 0 \iff \frac{1}{r^2}\begin{bmatrix} (2dp_y - r^2)\sqrt{r^2 - p_y^2} \\ 2dp_y^2 - p_y r^2 - dr^2 \\ 0 \end{bmatrix} \times \begin{bmatrix} l_x - \sqrt{r^2 - p_y^2} \\ l_y - p_y \\ 0 \end{bmatrix} = 0$, (C.5)


which can be rearranged to

$(4d^2l_x^2 + 4d^2l_y^2)\,p_y^4 + \big(-4dr^2l_x^2 - 4dl_y(dr^2 + r^2l_y)\big)\,p_y^3 + \big(l_x^2(r^4 - 4d^2r^2) + (dr^2 + r^2l_y)^2 - 4d^2r^2l_y^2\big)\,p_y^2 + \big(2dr^4l_x^2 + 4dr^2l_y(dr^2 + r^2l_y)\big)\,p_y + d^2r^4l_x^2 - r^2(dr^2 + r^2l_y)^2 = 0$. (C.6)

This 4th degree equation has four real solutions. The correct solution that we

are looking for satisfies

R ·N = −V ·N. (C.7)
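This closed form solution can be sketched as follows (Python with numpy; a minimal illustration under the coordinate assumptions above, with illustrative names). The quartic (C.6) in p_y is solved numerically and the root consistent with (C.7), taking the reflected direction towards L, is returned.

import numpy as np

def spherical_forward_projection(d, r, lx, ly):
    """Reflection point P = [p_x, p_y, 0] on a mirror sphere of radius r centered at
    the origin, for a camera at C = [0, d, 0] and a scene point L = [lx, ly, 0].
    Assumes lx >= 0 so that p_x = +sqrt(r^2 - p_y^2) as in (C.1)."""
    c1 = d * r**2 + r**2 * ly                           # shorthand for (d r^2 + r^2 l_y) in (C.6)
    coeffs = [4 * d**2 * lx**2 + 4 * d**2 * ly**2,
              -4 * d * r**2 * lx**2 - 4 * d * ly * c1,
              lx**2 * (r**4 - 4 * d**2 * r**2) + c1**2 - 4 * d**2 * r**2 * ly**2,
              2 * d * r**4 * lx**2 + 4 * d * r**2 * ly * c1,
              d**2 * r**4 * lx**2 - r**2 * c1**2]        # the 4th degree equation (C.6) in p_y
    C = np.array([0.0, d, 0.0])
    L = np.array([lx, ly, 0.0])
    for p_y in np.roots(coeffs):
        if abs(p_y.imag) > 1e-9 or abs(p_y.real) > r:
            continue
        P = np.array([np.sqrt(r**2 - p_y.real**2), p_y.real, 0.0])    # (C.1)
        N = P / r                                                      # (C.2)
        V = (P - C) / np.linalg.norm(P - C)                            # (C.3), normalized
        R = (L - P) / np.linalg.norm(L - P)                            # direction towards the scene point
        if np.isclose(R @ N, -(V @ N), atol=1e-6):                     # law-of-reflection check, (C.7)
            return P
    return None

For example, spherical_forward_projection(d=5.0, r=1.0, lx=3.0, ly=4.0) returns the point on the unit mirror sphere at which that scene point is seen by the camera.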


References

Adler, F.H. & Moses, R.A. (1975). Adler’s Physiology of the Eye : Clinical

Application. Mosby. 74, 77

Agrawal, A., Taguchi, Y. & Ramalingam, S. (2010). Analytical for-

ward projection for axial non-central dioptric and catadioptric cameras. In

Proceedings of the European Conference on Computer Vision. 69, 111

Agrawal, M. & Davis, L. (2003). Camera calibration using spheres: a semi-

definite programming approach. In Proceedings of the International Confer-

ence on Computer Vision. 28

Besharse, J., Dartt, D., Battelle, B., Dana, R., Beebe, D., Bex,

P., Bishop, P., Bok, D., D’Amore, P., Edelhauser, H., Mcloon,

L., Niederkorn, J., Reh, T. & Tamm, E. (2010). Encyclopedia of the

Eye, Four-Volume Set . Elsevier. 74

Bimber, O. & Raskar, R. (2005). Spatial Augmented Reality: Merging Real

and Virtual Worlds . A. K. Peters, Ltd. 2, 40

Blake, A. & Brelstaff, G. (1988). Geometry from specularities. In Pro-

ceedings of the International Conference on Computer Vision. 2

Brooks, M.J. & Horn, B.K.P. (1985). Shape and source from shading. In

Proceedings of the International Joint Conference on Artificial Intelligence.

7, 42

Cancer (2009). National cancer institutes. http://www.cancer.gov. 75


Chen, J. & Ji, Q. (2008). 3d gaze estimation with a single camera without

ir illumination. In Proceedings of the International Conference on Pattern

Recognition. 72, 73

Chen, Q., Wu, H. & Wada, T. (2004). Camera calibration with two arbi-

trary coplanar circles. In Proceedings of the European Conference on Com-

puter Vision. 82, 84

Chen, Y., Adjouadi, M., Han, C. & Barreto, A. (2009). A new un-

constrained iris image analysis and segmentation method in biometrics. In

Proceedings of the International Symbosium on Biomedical Imaging . 88

Clark, B.A.J. (1973). Variations in corneal topography. The Australian

Journal of Optometry, 56(11), 399–413. 76, 77

Clark, J.J. (2006). Photometric stereo with nearby planar distributed illu-

minants. In Proceedings of the Canadian Conference on Computer and Robot

Vision. 40

Daum, M. & Dudekb, G. (1997). Out of the dark: Using shadows to re-

construct 3d surfaces. In Proceedings of the Asian Conference on Computer

Vision. 2

Debevec, P. (1998). Rendering synthetic objects into real scenes: bridging

traditional and image-based graphics with global illumination and high dy-

namic range photography. Transactions on Graphics . 9, 42

Ding, Y., Yu, J. & Sturm, P. (2009). Multiperspective stereo matching and

volumetric reconstruction. In Proceedings of the International Conference on

Computer Vision. 111

Duchowski, A.T. (2007). Eye Tracking Methodology: Theory and Practice.

Springer-Verlag New York. 70, 72

Fabian, T., Gaura, J. & Kotas, P. (2010). An algorithm for iris extrac-

tion. In Proceedings of the International Conference on Image Processing

Theory Tools and Applications , 464–468. 88


Fitzgibbon, A., Pilu, M. & Fisher, R. (1999). Direct least square fitting of ellipses. Transactions on Pattern Analysis and Machine Intelligence, 21(5), 476–480. 21, 27, 86, 90

Fitzgibbon, A.W. & Fisher, R.B. (1995). A buyer's guide to conic fitting. In Proceedings of the British Machine Vision Conference. 60, 64

Forrester, J.V., Dick, A.D., McMenamin, P.G. & Roberts, F. (2008). The Eye: Basic Sciences in Practice. Saunders Elsevier. 74, 76, 77, 92

Francken, Y., Hermans, C. & Bekaert, P. (2007). Screen-camera calibration using a spherical mirror. In Proceedings of the Canadian Conference on Computer and Robot Vision. 41, 42

Francken, Y., Cuypers, T., Mertens, T., Gielis, J. & Bekaert, P. (2008a). High quality mesostructure acquisition using specularities. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 40

Francken, Y., Hermans, C., Cuypers, T. & Bekaert, P. (2008b). Fast normal map acquisition using an lcd screen emitting gradient patterns. In Proceedings of the Canadian Conference on Computer and Robot Vision. 39

Francken, Y., Hermans, C. & Bekaert, P. (2009). Screen-camera calibration using gray codes. In Proceedings of the Canadian Conference on Computer and Robot Vision. 43, 73

Funk, N. & Yang, Y.H. (2007). Using a raster display for photometric stereo. In Proceedings of the Canadian Conference on Computer and Robot Vision. 39, 42

Gasparini, S. & Sturm, P. (2008). Multi-view matching tensors from lines for general camera models. In Computer Vision and Pattern Recognition Workshops. 47

Guestrin, E.D. & Eizenman, M. (2008). Remote point-of-gaze estimation requiring a single-point calibration for applications with infants. In Proceedings of the Symposium on Eye Tracking Research & Applications. 70, 72, 73

Guillon, M., Lydon, D.P.M. & Wilson, C. (1986). Corneal topography: A clinical model. Ophthalmic and Physiological Optics, 6, 1475–1313. 76, 77, 93

Hamilton, W.R. (1844). On quaternions, or on a new system of imaginaries in algebra. Philosophical Magazine, 25, 489–495. 106, 107

Hansen, D.W. & Ji, Q. (2010). In the eye of the beholder: A survey of models for eyes and gaze. Transactions on Pattern Analysis and Machine Intelligence, 32, 478–500. 74, 88

Hartley, R.I. (1995). In defence of the 8-point algorithm. In Proceedings of the International Conference on Computer Vision. 6, 24

Hartley, R.I. & Zisserman, A. (2004). Multiple View Geometry in Computer Vision. Cambridge University Press. 10, 24, 25, 31, 82, 110

Hilbert, D. (1952). Geometry and the Imagination. Chelsea Publishing Company. 47

Hodge, W.V.D. & Pedoe, D. (1947). Methods of Algebraic Geometry. Cambridge University Press. 108

Horn, B.K.P. (1977). Understanding image intensities. Artificial Intelligence, 8, 201–231. 2

Horn, B.K.P. (1987). Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America, 4, 629–642. 18, 107

Ikeuchi, K. & Sato, K. (1991). Determining reflectance properties of an object using range and brightness images. Transactions on Pattern Analysis and Machine Intelligence, 13, 1139–1153. 7

Johnson, M.K., Stork, D.G., Biswas, S. & Furuichi, Y. (2008). Inferring illumination direction estimated from disparate sources in paintings: an investigation into jan vermeer's girl with a pearl earring. In Computer Image Analysis in the Study of Art. 2

Kang, J.J., Eizenman, M., Guestrin, E.D. & Eizenman, E. (2008). Investigation of the cross-ratios method for point-of-gaze estimation. Transactions on Biomedical Engineering, 55, 2293–2302. 73

Kiely, P.M., Smith, G. & Carney, L.G. (1982). The mean shape of the human cornea. Optica Acta, 29, 1027–1040. 76, 77

Kohlbecher, S., Bardins, S., Bartl, K., Schneider, E., Poitschke, T. & Ablassmeier, M. (2008). Calibration-free eye tracking by reconstruction of the pupil ellipse in 3d space. In Eye Tracking Research & Applications. 72

Lagger, P. & Fua, P. (2008). Retrieving multiple light sources in the presence of specular reflections and texture. Computer Vision and Image Understanding, 111, 207–218. 8

Lagger, P., Salzmann, M., Lepetit, V. & Fua, P. (2008). 3d pose refinement from reflections. In Proceedings of the Conference on Computer Vision and Pattern Recognition, 1–8. 2, 9, 73

Langer, M. & Zucker, S. (1997). What is a light source? In Proceedings of the Conference on Computer Vision and Pattern Recognition. 39

Lanman, D., Crispell, D., Wachs, M. & Taubin, G. (2006a). Spherical catadioptric arrays: Construction, multi-view geometry, and calibration. In Proceedings of the International Symposium on 3D Data Processing Visualization and Transmission. 41, 43

Lanman, D., Wachs, M., Taubin, G. & Cukierman, F. (2006b). Reconstructing a 3d line from a single catadioptric image. In Proceedings of the International Symposium on 3D Data Processing Visualization and Transmission. 41, 43

Li, D., Winfield, D. & Parkhurst, D.J. (2005). Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches. In Computer Vision and Pattern Recognition Workshops. 27, 88, 102

Li, S., Wong, K. & Schnieders, D. (2008). Using illumination estimated from silhouettes to carve surface details on visual hull. In Proceedings of the British Machine Vision Conference. 26, 27

Li, Y., Lin, S., Lu, H. & Shum, H.Y. (2003). Multiple-cue illumination estimation in textured scenes. In Proceedings of the International Conference on Computer Vision. 8

Liou, H.L. & Brennan, N.A. (1997). Anatomically accurate, finite model eye for optical modeling. Journal of the Optical Society of America, 14, 1684–1695. 79

Longuet-Higgins, H.C. (1981). A computer algorithm for reconstructing a scene from two projections. Nature, 293, 133–135. 6, 19, 24

Micusik, B. & Pajdla, T. (2004). Autocalibration & 3d reconstruction with non-central catadioptric cameras. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 111

Nayar, S. (1988). Sphereo: Determining depth using two specular spheres and a single camera. In Proceedings of the Cambridge Symposium on Advances in Intelligent Robotics Systems. 41, 42

Nene, S.A. & Nayar, S.K. (1998). Stereo with mirrors. In Proceedings of the International Conference on Computer Vision. 41, 42

Nishino, K. & Nayar, S.K. (2004a). Eyes for relighting. In Transactions on Graphics. 73

Nishino, K. & Nayar, S.K. (2004b). The world in an eye. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 73

Nishino, K. & Nayar, S.K. (2006). Corneal imaging system: Environment from eyes. International Journal of Computer Vision, 70, 23–40. 73

Nistér, D. (2004). An efficient solution to the five-point relative pose problem. Transactions on Pattern Analysis and Machine Intelligence, 26, 756–770. 24

Nitschke, C., Nakazawa, A. & Takemura, H. (2009). Display-camera calibration from eye reflections. In Proceedings of the International Conference on Computer Vision. 73

Osadchy, M., Jacobs, D. & Ramamoorthi, R. (2003). Using specularities for recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 2

Pentland, A.P. (1982). Finding the illuminant direction. Journal of the Optical Society of America, 72, 448. 7

Pentland, A.P. (1990). Linear shape from shading. International Journal of Computer Vision, 4, 153–162. 42

Powell, M., Sarkar, S. & Goldgof, D. (2000). Calibration of light sources. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 2

Ryan, W.J., Woodard, D.L., Duchowski, A.T. & Birchfield, S.T. (2008). Adapting starburst for elliptical iris segmentation. In Proceedings of the International Conference on Biometrics: Theory, Applications and Systems. 102

Safaee-Rad, R., Tchoukanov, I., Smith, K.C. & Benhabib, B. (1992). Three-dimensional location estimation of circular features for machine vision. Transactions on Robotics and Automation, 8, 624–640. 82

Schindler, G. (2008). Photometric stereo via computer screen lighting for real-time surface reconstruction. In Proceedings of the International Symposium on 3D Data Processing Visualization and Transmission. 40

Schnieders, D., Wong, K.Y.K. & Dai, Z. (2009). Polygonal light source estimation. In Proceedings of the Asian Conference on Computer Vision. 4

Schnieders, D., Fu, X. & Wong, K.Y.K. (2010). Reconstruction of display and eyes from a single image. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 4

Schubert, H. (1874). Kalkül der abzählenden Geometrie. Teubner. 44, 47

Shih, S.W., Wu, Y.T. & Liu, J. (2000). A calibration-free gaze tracking technique. In Proceedings of the International Conference on Pattern Recognition. 72, 73

Snell, R. & Lemp, M.A. (2007). Clinical Anatomy of the Eye. Wiley-Blackwell. 74, 76, 77, 92

SRResearch (2009). EyeLink 1000 Technical Specifications. SR Research Ltd., 5516 Osgoode Main St., Ottawa, Ontario, Canada. 71, 72

Stenstrom, S. (1948). Investigation of the variation and the correlation of the optical elements of human eyes. American Journal of Optometry, 25, 340–350. 76, 77

Stork, D. (2009). Computer analysis of lighting in realist master art: Current methods and future challenges. In International Conference on Image Analysis and Processing. 2

Sturm, P. & Bonfort, T. (2006). How to compute the pose of an object without a direct view? In Proceedings of the Asian Conference on Computer Vision. 43

Tarini, M., Lensch, H.P.A., Goesele, M. & Seidel, H.P. (2005). 3d acquisition of mirroring objects using striped patterns. Graphical Models, 67, 233–259. 39, 42

Teller, S. & Hohmeyer, M. (1999). Determining the lines through four lines. Journal of Graphics Tools, 4, 11–22. 50

Wang, H., Lin, S., Liu, X. & Kang, S.B. (2005a). Separating reflections in human iris images for illumination estimation. In Proceedings of the International Conference on Computer Vision. 94, 102

Wang, J.G., Sung, E. & Venkateswarlu, R. (2003). Eye gaze estimation from a single image of one eye. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 73

Wang, J.G., Sung, E. & Venkateswarlu, R. (2005b). Estimating the eye gaze from one eye. Computer Vision and Image Understanding, 98, 83–103. 71, 73

Wang, Y. & Samaras, D. (2003). Multiple directional illuminant estimation from a single image. In International Conference on Computer Vision Workshops. 8

Wang, Y. & Samaras, D. (2008). Estimation of multiple directional illuminants from a single image. Image and Vision Computing, 26, 1179–1195. 8

Wong, K.Y.K., Schnieders, D. & Li, S. (2008). Recovering light directions and camera poses from a single sphere. In Proceedings of the European Conference on Computer Vision. 3

Woodham, R.J. (1980). Photometric method for determining surface orientation from multiple images. Optical Engineering, 19, 139–144. 2

Wu, H., Chen, Q. & Wada, T. (2004). Conic-based algorithm for visual line estimation from one image. In Proceedings of the International Conference on Automatic Face and Gesture Recognition. 73

Yang, Y. & Yuille, A. (1991). Sources from shading. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 8

Yoo, D.H. & Chung, M.J. (2005). A novel non-intrusive eye gaze estimation using cross-ratio under large head motion. Computer Vision and Image Understanding, 98, 25–51. 73

Zhang, H., Wong, K.Y.K. & Zhang, G. (2007). Camera calibration from images of spheres. Transactions on Pattern Analysis and Machine Intelligence, 29, 499–502. 28

Zhang, R., Tsai, P.S., Cryer, J.E. & Shah, M. (1999). Shape from shading: A survey. Transactions on Pattern Analysis and Machine Intelligence, 21, 690–706. 42

Zhang, Y. & Yang, Y.H. (2001). Multiple illuminant direction detection with application to image synthesis. Transactions on Pattern Analysis and Machine Intelligence, 23, 915–920. 8

Zhang, Z. (1999). Flexible camera calibration by viewing a plane from unknown orientations. In Proceedings of the International Conference on Computer Vision. 60, 64, 88

Zhang, Z. (2000). A flexible new technique for camera calibration. Transactions on Pattern Analysis and Machine Intelligence, 22, 1330–1334. 28

Zheng, Q. & Chellappa, R. (1991a). Estimation of illuminant direction, albedo, and shape from shading. Transactions on Pattern Analysis and Machine Intelligence, 13, 680–702. 7

Zheng, Q. & Chellappa, R. (1991b). Estimation of illuminant direction, albedo, and shape from shading. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 42

Zheng, Y. & Liu, Y. (2008). The projective equation of a circle and its application in camera calibration. In Proceedings of the International Conference on Pattern Recognition. 82

Zhou, W. & Kambhamettu, C. (2002). Estimation of illuminant direction and intensity of multiple light sources. In Proceedings of the European Conference on Computer Vision. 9

Zhou, W. & Kambhamettu, C. (2008). A unified framework for scene illuminant estimation. Image and Vision Computing, 26, 415–429. 43

Zuo, J., Ratha, N.K. & Connell, J.H. (2008). A new approach for iris segmentation. In Computer Vision and Pattern Recognition Workshops. 88
