Online Submission ID: 010
Inverse Transient Light Transport via Time-Images
Abstract
We show that multi-path analysis of a photo from a time-of-flight (ToF) camera provides a tantalizing opportunity to infer the geometry and reflectance of not only visible but also hidden parts of a scene. We provide a framework for analyzing global light transport from just a single viewpoint using cameras which capture a temporal profile for each pixel at picosecond resolution. Unlike traditional cameras which estimate intensity per pixel, I(x, y), the ultra-short sampling cameras capture a time-image I(x, y, t). This time-image encodes global dynamical interactions of light with the scene elements.
In this paper we model the dynamics of global illumination using transient properties of light transport formulated as a linear time invariant state space system. We also propose formulations and algorithms that use the time-image I(x, y, t) for reasoning about scene content in scenarios where transient reasoning exposes scene properties that are beyond the reach of traditional computer vision. In particular, we present a system identification algorithm for inferring the depth and scene structure, as well as for measuring the 4-dimensional bidirectional reflectance distribution function (BRDF) of the scene elements, using transient global light transport. We conclude by presenting inverse rendering results using the inferred scene properties and traditional global illumination rendering.
Keywords: Light transport, Global illumination, Multi-path analysis, Inverse problems, Inverse rendering, Computational Imaging
1. Introduction
Cameras are an integral part of computer graphics and vision research, and 2-dimensional intensity images I(x, y) have long been used to observe and interpret scenes. The recovered digital models have included properties such as 3D geometry, scene lighting, and surface reflectance. These have found applications in robotics, industrial sensing, security, and user interfaces. New sensors and methods for interpreting scenes will clearly be of benefit in many application areas. This paper introduces a framework for reasoning about transient light transport and shows that it can allow new properties of scenes to be observed and interpreted.
During the exposure time, the incoming light arriving at a pixel is integrated along the angular, temporal and wavelength dimensions to record a single intensity value. Many distinct scenes result in identical projections, and thus identical recorded intensity values on the sensor. It is therefore challenging to estimate scene properties that are not directly observable, such as the BRDF of scene elements, the depth in a multi-reflective scene, or the overall scale of a scene. It is possible to sample other dimensions: 4D lightfield cameras sample both the incoming angle of light and the spatial location, and have paved the way for powerful algorithms that perform many unconventional tasks. We show that capturing the time-variation of incoming light similarly provides powerful tools.
Steady-state and Transient Light Transport: Steady-state light transport assumes an equilibrium in global illumination due to the effectively infinite speed of light. In a room-sized environment, a microsecond exposure (integration) time is long enough for a light impulse to fully traverse all the possible multi-paths introduced by inter-reflections between scene elements and reach equilibrium (steady) state. Traditional video cameras sample light very slowly compared to the time scale at which the transient properties of light come into play. Videos may be interpreted as a sequence of images of different but static worlds because the exposure time of each frame is sufficiently long.
In transient light transport, we assume that the speed of light is finite. As light scatters around a scene, it takes different paths, and longer paths take a longer time to traverse. Even a single pulse of light along a single ray-beam can evolve into a complex pattern in time, as shown in Figure ??. The ray impulse is a space-time impulse, and the resultant time-image I(x, y, t) records the pattern of photon arrival rates at each pixel. It is analogous to the impulse response function in the signal processing literature. We define the Space Time Impulse Response (STIR(S)) of a scene, S, as a five-dimensional function: for each outgoing 2D ray-impulse direction of illumination we record the 3D ray-impulse time-image. Unlike a traditional 2D pixel which measures the number of photons, transient transport describes the rate of incoming photons as a function of time. The forward model for transient light transport for direct and global illumination is well understood in graphics, e.g., in ray tracing or photon mapping. In this paper, we emphasize the inverse transient light transport problem to enable novel scene understanding by inferring scene characteristics.
Our approach: This work proposes an imaging model which samples light on the picosecond scale, equivalent to light travel on the order of millimeters. At this scale, it is possible to reason about the individual paths light takes within a scene, enabling measurement of scene properties that are beyond the reach of traditional machine vision. We enable this reasoning with a theoretical formulation for transient global light transport.
The photon-bounce counting recursive linear systems used in graphics and vision are well-suited for steady state transport. However, analyzing dynamic time-based interactions requires a different approach. System identification (sysid) is the algorithmic procedure of building dynamical models from measured input/output data and estimating their parameters. We choose a 'grey-box' model of sysid where, although the specifics of the internals are only partially known, the physical laws governing light transport allow estimation of unknown free parameters, such as the geometry and reflectance of visible as well as hidden scene elements.
In this paper, we show how one can invert the captured STIR into geometric and reflectance properties of the scene by discretizing it into patches. We perform this in two steps. We first exploit information about onsets for estimating the geometry of patches with unknown reflectance. Onsets are used in LIDAR for estimating the geometry of only visible and diffuse patches. In the second stage, we use the state-space formulation to infer the irradiance from one patch to another, indirectly computing the reflectance of each patch. This allows reasoning about the general surface reflectance of visible as well as hidden patches.
Our work is deeply inspired by pioneering work in the analysis of light transport by Seitz, Matsushita and Kutulakos [2005], dual photography by Sen et al. [2005] and the separation of direct and global components by Nayar et al. [2006]. Earlier inverse light transport used diffuse-world assumptions due to the limited data available from traditional steady state cameras. A measured STIR contains more information, and thus requires only minimal assumptions about the scene in order to infer its properties. In dual photography [Sen et al. 2005], to reconstruct a surface hidden from the camera, one requires a light source to illuminate that hidden surface. Transient reasoning allows inferences even when no device is in the line of sight of that surface.
In addition to simulated results, we built a prototype physical system using a directional femtosecond laser and directionally sensitive picosecond-accurate detectors. However, conducting full experiments is beyond our current abilities. We instead show all the key elements: geometry, photometry, bounce observations, and functioning in free space. We emphasize that our hardware experimentation, although based on expensive components, is extremely preliminary, and the paper is focused on providing the theoretical tools necessary to enable new applications. Importantly, our hardware design is well aligned with existing commercial devices, and we believe that STIR imaging will find practical applications.
Limitations: Our approach shares limitations with existing active illumination systems in terms of power and use scenarios: the sources must overcome ambient illumination, and only scenes within a finite distance from the camera can be imaged. In addition, we require precise high-frequency pulsed opto-electronics at a speed and quality that are not currently available at consumer prices. We expect that solid state lasers will continue to increase in power and frequency, doing much to alleviate these concerns.
An important challenge we face is to collect strong multi-path signals. Direct reflections from scene elements are significantly stronger than light which has traveled a more complex path, possibly reflecting from diffuse surfaces. Computationally, solving the inverse light transport problem introduces significant complexity. As with all inverse problems, the inverse light transport problem has degenerate cases in which multiple solutions (scenes) exist for the same observable STIR. Importantly, although our additional time-domain data is noisy, it still restricts the class of solutions to a greater extent than the more limited data of traditional cameras. As is common, we make a set of a priori assumptions that serve as robustness constraints for the optimization algorithm we use to estimate the scene parameters.
Contributions: We propose a model of the dynamics of global illumination and inverse rendering using transient properties of light transport. This paper makes the following contributions:
• A state space dynamical system formulation to derive the relationship between the transient and steady state impulse response, and a state update approach based on time rather than the photon bounce methods commonly used in computer graphics.
• A specific system identification algorithm which accepts the scene STIR and infers the scene structure and reflectance.
• Example inverse rendering scenarios in which transient reasoning can be used to infer scene properties that are beyond the reach of traditional machine vision: including direct observation of distance, separation of direct and indirect illumination, and inference of scene elements hidden from the camera.
• Reflectance-based segmentation, relighting, and material replacement from a single STIR photo.
• A hardware design for transient imaging based on a femtosecond laser and picosecond-accurate detectors, as well as initial experiments from our prototype.
2. Related work
2.1. Imaging Solutions
LIDAR: LIDAR systems modulate light, typically on the order of nanoseconds, and measure the phase of the reflected signal to determine depth [Kamerman 1993]. Flash LIDAR systems use a 2D imager to provide fast measurement of full depth maps [Iddan and Yahav 2001; Lange and Seitz 2001; Gvili et al. 2003]. Importantly, a number of companies are pushing this technology towards consumer price points [Canesta; MESA Imaging; 3DV Systems; PMD Technologies]. The quality of phase estimation can be improved by simulating the expected shape of the reflected signal [Jutzi 2006], or by estimating the effect of ambient light [Miyagawa and Kanade 1997; Schroeder et al. 1999; Kawakita et al. 2000; Gonzalez-Banos and Davis 2004]. Some LIDAR systems do measure the transient photometric response function (TPRF) explicitly. Separately detecting multiple peaks in the sensor response can allow two surfaces, such as a forest canopy and the ground plane, to be detected [Blair et al. 1999; Hofton et al. 2000], and waveform analysis can detect surface discontinuities [Vandapel et al. 2004]. However, all of these methods reason locally about the sensor response in a single direction, rather than about the global scene structure. This paper proposes that complex global reasoning about scene contents is possible given a measured TPRF. Although SONAR does not use light propagation, reasoning about global structure in that domain is common [Russell et al. 1996].
Time gated imaging: Time gated imaging allows a reflected pulse of light to be integrated over extremely short windows, effectively capturing I(x, y, tδ). Multiple captures while incrementing the time window tδ allow I(x, y, t) to be captured; for example, Busck and Heiselberg show a response function measured to 100 picosecond accuracy [Busck and Heiselberg 2004]. While gated imaging is related to LIDAR, it has uses beyond 3D imaging. Nanosecond windows are used for imaging tanks at a range of kilometers [Andersson 2006]. Picosecond gating allows imaging in turbid water [McLean et al. 1995]. Femtosecond windows allow ballistic photons to be separated from scattered photons while imaging through biological tissue [Farsiu et al. 2007; Das et al. 1993]. Most applications make limited use of global reasoning about scene characteristics, instead using a single time-gated window to improve the signal to noise ratio while imaging.
Streak cameras: Streak cameras are ultrafast photonic recorders which deposit photons across a spatial dimension, rather than integrating them in a single pixel. Using a 2D array, I(x, yδ, t) can be measured. Sweeping the fixed direction yδ allows I(x, y, t) to be captured. Picosecond streak cameras have been available for decades [Campillo and Shapiro 1983]. Modern research systems can function in the attosecond range [Itatani et al. 2002]. Commercially available products image in the femtosecond regime [Hamamatsu].
2.2. Algorithmic approaches
Global light transport: Light often follows a complex path between the emitter and sensor. A description of forward steady-state light transport in a scene is referred to as the rendering equation [Kajiya 1986a]. Extensions have been described to include transient light transport [Arvo 1993], but no rendering work has yet built on this foundation. Accurate measurement of physical scene properties, often called inverse rendering, requires reasoning about this path [Marschner 1998; Patow and Pueyo 2003]. Complex models have been developed for reconstructing specular scenes [Kutulakos and Steger 2007], transparent scenes [Morris and Kutulakos 2007], Lambertian scenes [Nayar et al. 1990], reflectance properties [Yu et al. 1999], joint lighting and reflectance [Ramamoorthi and Hanrahan 2001], and scattering properties [Narasimhan et al. 2006]. All of this work has made use of traditional cameras which provide measurements only of steady-state light transport phenomena. In this work, we propose that transient light transport can both be observed and meaningfully used to improve estimates of scene properties.
Capturing light transport: Recent work in image-based modeling and computational photography has shown several methods for capturing steady-state light transport [Sen et al. 2005; Garg et al. 2006; Masselus et al. 2003; Debevec et al. 2000]. The incident illumination is represented as a 4D illumination field and the resultant radiance is represented as a 4D view field. Taken together, the 8D reflectance field represents all time-invariant interaction of a scene. More relevant to this paper, the light transport has been decomposed into direct and indirect components under the assumption that the scene has no high-frequency components [Nayar et al. 2006], as well as into multi-bounce components under the assumption that the scene is Lambertian [Seitz et al. 2005].
3. Transient Light Transport
The theory of light transport describes the interaction of light rays with a scene. Incident illumination provides the first set of light rays that travel towards other elements in the scene and the camera. The direct bounce is followed by a complex pattern of inter-reflections whose dynamics are governed by the scene geometry and the material properties of the scene elements. This process continues until an equilibrium light flow is attained.
We consider a scene S (figure 1) composed of M small planar facets (patches with unit area) p1, . . . , pM with geometry G = {Z, D, N, V}, comprising the patch positions Z = [z1, . . . , zM] where each zi ∈ R³; the distance matrix D = [dij], where dij = dji (dii = 0) is the Euclidean distance between patches pi and pj; the relative orientation matrix N = [n1, . . . , nM], consisting of the unit surface normal vectors ni ∈ R³ at each patch pi with respect to a fixed coordinate system; and the visibility matrix V = [vij], where vij = vji is 0 or 1 depending on whether or not patch pi is occluded from pj. For analytical convenience, we consider the camera (observer) and illumination (source) as a single patch denoted by p0. All the analysis that follows can be generalized to include multiple sources and an observer at an arbitrary position in the scene.
We now introduce a variation of the rendering equation [Kajiya 1986b] in which light takes a finite amount of time to traverse distances within the scene. This introduces delays in the arrival of light from one patch to another. Let t denote a discrete time step and Lij : i, j = 0, . . . , M be the set of radiances for rays that travel between scene patches. Transient light transport is governed by the following dynamical equation, which we term the transient rendering equation:
Lij[t] = Eij[t] + Σ_{k=0}^{M} fkij · Lki[t − δki]    (1)
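As a sanity check on equation (1), the recursion can be stepped forward directly once the form factors and integer delays are known. The following is a minimal sketch of this forward model (all names and the toy scene are illustrative, not from a real renderer): a single ray impulse is propagated through a source patch p0 and two scene patches.

```python
# Minimal forward simulation of the transient rendering equation (1),
# assuming c = 1 (delays equal distances) and integer delays.
# All names here are illustrative, not from a real API.

def simulate_transient(E, f, delta, T):
    """E[t][i][j]: emissive radiance, f[k][i][j]: form factors,
    delta[i][j]: integer propagation delays, T: number of time steps.
    Returns L[t][i][j], the transient radiance of ray p_i -> p_j."""
    M = len(delta)  # number of patches, including the source/observer p0
    L = [[[0.0] * M for _ in range(M)] for _ in range(T)]
    for t in range(T):
        for i in range(M):
            for j in range(M):
                total = E[t][i][j]
                for k in range(M):
                    d = delta[k][i]
                    if t - d >= 0:  # radiance emitted earlier arrives now
                        total += f[k][i][j] * L[t - d][k][i]
                L[t][i][j] = total
    return L

# Toy scene: source p0 and two patches p1, p2, unit delays everywhere.
M, T = 3, 6
delta = [[1] * M for _ in range(M)]
f = [[[0.0] * M for _ in range(M)] for _ in range(M)]
f[0][1][0] = 0.5   # light from p0 hitting p1, reflected to the camera p0
f[0][1][2] = 0.2   # ... and reflected towards p2
f[1][2][0] = 0.3   # light from p1 hitting p2, reflected to the camera
E = [[[0.0] * M for _ in range(M)] for _ in range(T)]
E[0][0][1] = 1.0   # ray impulse towards p1 at t = 0

L = simulate_transient(E, f, delta, T)
# Direct bounce arrives at t = 1; the two-bounce path p0->p1->p2->p0 at t = 2.
```

The direct bounce and the weaker multi-bounce return arrive at distinct time steps, which is exactly the structure the STIR records.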
Equation 1 states that the radiance Lij[t] of a ray traveling from patch pi to pj at time t is the sum of the emissive radiance Eij[t] and the appropriately weighted sum of the incoming radiances from all other patches pk, k ≠ i, from earlier time instants. For simplicity, let the speed of light c = 1; the propagation delay δij is then equal to the distance dij (see figure 2). The scalar weights fkij, or form factors, denote the proportion of light incident from patch pk onto pi that will be directed towards pj:

fkij = ρkij · (cos(θin) cos(θout) / ‖zi − zj‖²) · vki vij

where ρkij is the directional reflectance, which depends on the material property and obeys Helmholtz reciprocity (ρkij = ρjik); θin is the incident angle and θout is the viewing angle (see figure 3). Additionally, since a patch does not interact with itself, fkij = 0 for k = i or i = j. We assume that the scene is static and that material properties are constant over the imaging interval. The source and observer patch p0 does not participate in inter-reflections: fi0j = 0; i, j = 0, . . . , M.

Figure 1: A scene consisting of M = 5 patches and the illumination-camera patch p0. The patches have different spatial coordinates (zxi, zyi, zzi), orientations ni and relative visibility between patches vij. The patches also have different material properties; for instance, p4 is translucent and p5 is a mirror.
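The form factor expression above can be evaluated directly from the geometry G. A minimal sketch, assuming the incident and viewing angles are measured against the normal ni of the receiving patch (the function and variable names are ours, for illustration):

```python
import math

# Sketch of the scalar form factor f_kij from the scene geometry.
# Assumes theta_in / theta_out are measured against the normal n_i of
# the receiving patch p_i; rho_kij is the directional reflectance.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def form_factor(z_k, z_i, z_j, n_i, rho_kij, v_ki, v_ij):
    """f_kij: fraction of light incident from p_k onto p_i sent to p_j."""
    w_in = [a - b for a, b in zip(z_k, z_i)]    # direction towards p_k
    w_out = [a - b for a, b in zip(z_j, z_i)]   # direction towards p_j
    cos_in = max(0.0, dot(w_in, n_i) / math.sqrt(dot(w_in, w_in)))
    cos_out = max(0.0, dot(w_out, n_i) / math.sqrt(dot(w_out, w_out)))
    d2 = sum((a - b) ** 2 for a, b in zip(z_i, z_j))  # ||z_i - z_j||^2
    return rho_kij * (cos_in * cos_out / d2) * v_ki * v_ij

# Head-on geometry: both rays along the normal, unit distance -> f = rho.
f_head_on = form_factor((0, 0, 1), (0, 0, 0), (0, 0, 1), (0, 0, 1),
                        1.0, 1, 1)
```

Visibility terms vki and vij simply zero out the contribution when either leg of the path is occluded, matching the definition of V.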
Figure 2: A ray impulse E02[t] directed towards patch p2 at time t. This ray illuminates p2 at time instant t + δ02 and generates the directional radiance vector [L20[t + δ02], L21[t + δ02], L23[t + δ02]]. These light rays travel towards the camera p0 and the scene patches p1 and p3, resulting in global illumination.
We model illumination using the emitter patch p0. All other patches in the scene are non-emissive: Eij[t] = 0 for i = 1, . . . , M; j = 0, . . . , M; t = 0, . . . , ∞. Illumination is the set of radiances {E0j[t] : j = 1, . . . , M; t = 0, . . . , ∞} representing the light emitted towards all scene patches at different time instants.
Figure 3: Scalar form factors fkij are the proportion of light incident from patch pk onto pi that will be directed towards pj.
The outgoing light field at patch pi is the vector of directional radiances L[i, t] = [Li0[t], . . . , LiM[t]]ᵀ, and for the entire scene we have the transient light field matrix L[t] = [L[1, t], . . . , L[M, t]]ᵀ, which contains M(M − 1) + M scalar radiances. We can only observe the projection of L[t] that is directed towards the camera p0. At each time t we record a vector of M intensity values Lc[t] = [L10[t − δ10], . . . , LM0[t − δM0]]ᵀ.
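The camera projection Lc[t] can be sketched as follows, assuming the transient light field is stored as L[t][i][j] with integer delays (an illustrative layout, not a fixed data format):

```python
# The camera at p0 records, at time t, radiance that left each patch p_i
# towards p0 a travel time delta_i0 earlier. Indexing L[t][i][j] for the
# ray p_i -> p_j is an illustrative choice.

def camera_observation(L, delta_i0, t):
    """Return L_c[t] = [L_10[t - d_10], ..., L_M0[t - d_M0]]."""
    M = len(delta_i0)                 # number of scene patches p_1..p_M
    obs = []
    for i in range(1, M + 1):
        t_emit = t - delta_i0[i - 1]  # when the observed radiance left p_i
        obs.append(L[t_emit][i][0] if t_emit >= 0 else 0.0)
    return obs

# Toy data: patch p1 sends radiance 0.5 towards the camera at t = 1,
# and the p1 -> p0 delay is 1, so the camera sees it at t = 2.
T, M = 4, 2
L = [[[0.0] * (M + 1) for _ in range(M + 1)] for _ in range(T)]
L[1][1][0] = 0.5
Lc = camera_observation(L, [1, 2], 2)
```

Everything in L[t] that is not directed at p0 is invisible to the camera; inverting that loss of information is the subject of the system identification sections.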
A transient light field camera (see figure 4) comprises a generalized sensor and a pulsed illumination source. The sensor is directional: it can separately sense the incident light field arriving from distinct patches. It can also time-sample the incoming irradiance at picosecond or shorter time scales. A transient light field camera captures a time image {I(ix, iy, t) : i = 1, . . . , M}: a collection of time profiles of the irradiances arriving at the camera pixel (ix, iy) observing patch pi. Due to bandwidth limitations, the incoming light can only be sampled at discrete instants. The pulsed illumination source should be capable of transmitting directional impulses, and the sensor and illumination source need to be synchronized.
Figure 4: The transient light field camera consists of a pulsed illumination source and a high bandwidth detector which are synchronized with each other. It can time-sample the incoming light field and sense the direction of incident light rays.
We define the ray impulse for a patch pi as a unit pulse that is directed only towards pi at time t = 0. Analogous to the notion of the impulse response of a multiple-input multiple-output (MIMO) filter, we define the Space Time Impulse Response (STIR) of the scene S, denoted STIR(S), as the collection of time profiles arriving at the camera from all patches when only a single patch is illuminated. We can measure STIR(S) using the transient light field camera as follows (also see figure 5):
1. Illuminate patch pi with an impulse ray.
2. Observe the radiance profile arriving at the sensor from all the patches, i.e., record Lc[t] for t = 0, . . . , ∞.
3. Repeat steps 1-2 for all observable scene patches pi : i = 1, . . . , M.
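The three-step measurement loop above can be sketched as a small driver, where `fire_impulse_and_record` stands in for the synchronized laser-and-detector hardware (a hypothetical callback, not a real device API):

```python
# Sketch of the STIR measurement loop: one ray impulse per patch,
# recording the time image L_c[t] each time. `fire_impulse_and_record`
# is a stand-in for the synchronized laser + detector hardware.

def measure_stir(M, T, fire_impulse_and_record):
    """Returns STIR[i][t]: the vector of M radiances observed at time t
    when patch p_i alone is illuminated by a ray impulse at t = 0."""
    stir = {}
    for i in range(1, M + 1):                       # step 1: illuminate p_i
        stir[i] = [fire_impulse_and_record(i, t)    # step 2: record L_c[t]
                   for t in range(T)]
    return stir                                     # step 3: all i covered

# Tiny synthetic detector: illuminating patch i yields a unit direct
# bounce from patch i at t = 2*i and nothing else.
demo = measure_stir(2, 6,
                    lambda i, t: [1.0 if (t == 2 * i and j == i) else 0.0
                                  for j in range(1, 3)])
```

In practice the infinite recording window t = 0, . . . , ∞ is truncated to a finite T after which the multi-path returns fall below the noise floor.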
Figure 5: Measuring the STIR of a scene with 3 patches. A ray impulse is directed towards each patch and a time image is recorded. The time image is the time profile of radiances arriving at the camera from the different patches. The STIR is simply the collection of time images captured under the different ray impulse illuminations.
At macroscopic scales, light transport is a linear phenomenon. We note that the transient rendering equation (1) describes a MIMO linear time invariant (LTI) system. One of the important properties of an LTI system is that scaled and time-shifted inputs result in correspondingly scaled and time-shifted outputs. Hence the STIR of the scene can be used to completely characterize its behavior under any general illumination.
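This LTI property is just superposition: any illumination is a sum of scaled, delayed ray impulses, so the resulting time image is the same combination of scaled, delayed copies of the STIR. A one-patch sketch with an illustrative impulse response h:

```python
# Superposition for an LTI system: the response to any input is the
# discrete convolution of the input with the impulse response (here,
# one patch's STIR time profile h; the numbers are illustrative).

def respond(h, u, T):
    """y[t] = sum_s h[t - s] * u[s]  (discrete convolution)."""
    return [sum(h[t - s] * u[s] for s in range(t + 1) if t - s < len(h))
            for t in range(T)]

h = [0.0, 1.0, 0.0, 0.25]            # direct bounce at t=1, echo at t=3
u1 = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # unit ray impulse at t = 0
u2 = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0]  # the same impulse, scaled, at t = 2

y1 = respond(h, u1, 6)
y2 = respond(h, u2, 6)
# y2 is y1 scaled by 3 and delayed by 2 samples, as LTI predicts.
```

This is why measuring the STIR once per patch suffices: the response to an arbitrary pulsed illumination never has to be measured separately.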
The transient rendering equation (1) is a difference equation describing the evolution of a discrete linear dynamical system. If the delays in the scene are finite, then we need only a finite number of states to drive the forward light transport process without any loss of radiance information. Our key contribution is to show that transient light transport can be represented as a MIMO LTI state space system whose parameters can be estimated using a system identification framework. We develop our formulation for a scene with arbitrary form factors and geometry, but for simplicity we analyze this scenario:
• Each patch in the scene is visible from all the other patches, i.e., vij ≠ 0 : i, j = 0, . . . , M.
• Each patch pi has a non-zero diffuse component.
Clearly these assumptions are not true for scenes with occluders and objects with no diffuse component, such as mirrors. In later sections, we extend our framework to general scenes. In the next section we show how to model transient light transport as a state space system.
4. State Space Formulation
Now we formulate transient light transport in a scene S as a traditional state space system which can be parameterized in terms of the scene geometry G and the form factors fkij. A state space representation is a mathematical model of a dynamical system with input,
output and state variables related by first-order differential equations. A state space model can be represented in matrix form when the dynamical system is linear and time invariant [Kailath 1979]. It provides a convenient and compact way to model and analyze systems with multiple inputs and outputs. The internal state variables are the smallest possible subset of system variables that can represent the entire state of the system at any given time. State variables must be linearly independent, and the minimum number of state variables required to represent a given system is usually equal to the order n of the system's defining differential equation.
Our contribution is to show that transient light transport in any real scene can be represented as the following LTI state space system¹ (see figure 6):

x[t + 1] = A x[t] + B u[t]
y[t] = C x[t] + η[t]

where x[t] ∈ Rⁿ is the state vector, which in our case is related to the transient light field matrix L[t]; y[t] ∈ Rᴹ is the output vector, which is related to the time image Lc[t]; and u[t] ∈ Rᵖ is the input (or control) vector, which can be derived from the instantaneous illumination vector E[t].

Figure 6: Discrete state space representation of the transient light transport process in any scene. The state matrix A is related to inter-reflection between scene elements (indirect component). The input matrix B controls how the illumination u[t] directly interacts with the scene (direct component), and the observation matrix C determines how much of the global light transport x[t] reaches the camera y[t]. The observations are recorded with zero-mean AWGN noise η(t) ∼ N(0, σ²). The system matrices can be parameterized in terms of the scene geometry G and form factors Θ.
The matrix A ∈ Rⁿˣⁿ is the state matrix, which embeds the form factors for inter-reflection between scene elements; B ∈ Rⁿˣᵖ is the input matrix, which consists of the form factors related to illumination; and C ∈ Rᴹˣⁿ is the output or projection matrix, which embeds the form factors related to the observer (camera). The time variable is t, and all system matrices A, B, C are time invariant. We assume we are dealing with only finite measurement precisions, and hence all the delays are rational and can be adjusted to be integers. We now show how to formally construct the system matrices using the delays and form factors.
State Vector x[t]: The state vector is essentially a vectorization of the transient light field matrix L[t] for all possible delay values.
¹The discrete time-invariant state space system is usually expressed as

x[t + 1] = A x[t] + B u[t] + w[t]
y[t] = C x[t] + D u[t] + η[t]

where D ∈ Rᴹˣᵖ is the gain matrix which models the direct feed-through of the input. It is often the zero matrix 0 ∈ Rᴹˣᵖ, which is also true for our formulation, since light from the source must interact with the scene before reaching back to the camera. The vectors w[t] and η[t] are Gaussian noise vectors which model the process and measurement noise respectively. We only consider measurement noise η[t] in our analysis and treat process noise as an intrinsic property of the system.
The intuition behind the construction of the state vector is that for each pair of patches pi and pj we need to hold the state of the outgoing light field Lij[t] until it reaches pj. Thus, the system needs a memory of δij states for each patch pair. If all the distances (and hence delays) in the scene are finite, we only need a finite number of states to model the transient rendering equation (1). The size of the state vector is the order of the system, n = Σi Σj δij. We construct the state vector x[t] as follows:

x[t] = [x1[t], x2[t], . . . , xM[t]]ᵀ
xi[t] = [xij,r[t]], j = 0, . . . , M; j ≠ i; r = 1, . . . , δij
xij,r[t] = Lij[t − r]
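The bookkeeping behind x[t] can be made concrete with an index map from each (i, j, r) triple to its slot in the vector, which also yields the system order n (a sketch with our own naming; delta holds the integer delays, including the source/observer row 0):

```python
# Sketch of the state-vector layout: one state x_{ij,r}[t] = L_ij[t - r]
# per (i, j, r) with r = 1..delta_ij, so the order is n = sum_ij delta_ij.

def build_state_index(delta, M):
    """delta[i][j]: integer delay between p_i and p_j for i, j = 0..M.
    Maps each (i, j, r) to its position in x[t]; returns (index, n)."""
    index, n = {}, 0
    for i in range(1, M + 1):          # outgoing patch p_i
        for j in range(M + 1):         # receiving patch p_j, j != i
            if j == i:
                continue
            for r in range(1, delta[i][j] + 1):
                index[(i, j, r)] = n   # slot holding L_ij[t - r]
                n += 1
    return index, n

# M = 2 scene patches plus the source/observer p0; symmetric delays.
delta = [[0, 2, 3],
         [2, 0, 1],
         [3, 1, 0]]
index, n = build_state_index(delta, 2)   # n = sum over ordered pairs
```

Larger scenes and longer delays grow n quickly, which is the main cost of the time-based state update compared to bounce-based recursions.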
Control Vector u[t]: The input pulses originate at p0 and reach a patch pi after a time delay δ0i. To model this propagation delay, we need δ0i states for each pi. We have also shown that general illumination can be realized using the ray impulse set {E0i[t] : i = 1, . . . , M; t = 0, . . . , ∞}. It therefore suffices to model the input vector u[t] for ray impulse illumination only.

u[t] = [u1[t], u2[t], . . . , uM[t]]ᵀ
ui[t] = [uir[t]], r = 0, . . . , δ0i
uir[t] = E0i[t] if t = 0 and r = δ0i; 0 otherwise
Output Vector y[t]: The output of the linear state space system is the directional irradiance measured at the camera (p0) arriving from the M patches. Thus y[t] = Lc[t] = [L10[t − δ10], . . . , LM0[t − δM0]]ᵀ.
State Matrix A: The state matrix A must model two aspects of the transient light transport process: the interactions due to inter-reflections (indirect component) and the intrinsic delays δij. The intuition behind constructing A is that the rows of A that correspond to the ray radiance Lij[t] should model the light transport (equation 1). This implies that they should contain the form factors fkij, k = 1, . . . , M at the appropriate column indices governed by the delays δki : k = 1, . . . , M. Each remaining row of A must contain exactly a single 1 entry along the off-diagonal to model the time propagation. As a simple example, we show the A matrix for a Lambertian scene with M = 5 patches in figure 7.
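For the Lambertian case the construction can be sketched directly: the row holding the freshest copy of Li[t] receives the form factors fki at the columns that store Lk delayed by δki, and every other row carries a single off-diagonal 1 that shifts time forward. The state ordering below is our own choice, for illustration:

```python
# Sketch of assembling A for a Lambertian scene, where f_kij reduces
# to f_ki and L_ij[t] to L_i[t]. States are laid out per patch as
# (L_i[t], L_i[t-1], ..., L_i[t-D+1]); the layout is illustrative.

def build_A_lambertian(f, delta, M, D):
    """f[k][i]: Lambertian form factors, delta[k][i]: integer delays
    (patches 1..M), D: per-patch buffer length >= max delay."""
    n = M * D
    col = lambda i, r: (i - 1) * D + r   # slot of L_i[t - r]
    A = [[0.0] * n for _ in range(n)]
    for i in range(1, M + 1):
        # freshest state L_i[t+1] gathers delayed radiance from all k != i
        for k in range(1, M + 1):
            if k != i:
                A[col(i, 0)][col(k, delta[k][i] - 1)] = f[k][i]
        # remaining states just shift time: x_{i,r}[t+1] = x_{i,r-1}[t]
        for r in range(1, D):
            A[col(i, r)][col(i, r - 1)] = 1.0
    return A

# Two patches, delay 2 between them, weak coupling f = 0.1 each way.
f = [[0.0] * 3 for _ in range(3)]
f[1][2] = f[2][1] = 0.1
delta = [[0, 0, 0], [0, 0, 2], [0, 2, 0]]
A = build_A_lambertian(f, delta, 2, 2)
```

The sparsity pattern (a few form-factor entries plus shift 1s) mirrors the structure described for figure 7.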
Input Matrix B: The input matrix B models the transport of light from the emitter patch p0 to the other scene patches (direct component). Since B multiplies the input (illumination) vector, the rows of B corresponding to the ray radiance Lij[t] should contain the form factor f0ij at the appropriate column index governed by δ0i. The remaining rows of B are all zeros, 0₁ₓₚ. The B matrix for the same Lambertian scene with M = 5 patches is shown in figure 8.
Output Matrix C: The projection matrix C is related to observability and embeds the form factors fij0 : i, j = 1, . . . , M in each of its M rows, again at the appropriate column indices governed by δi0. The matrix C for the example Lambertian scene is shown in figure 9.
The initial state x(0) is the zero vector 0 ∈ Rⁿ since we assume no a priori scene illumination. The effect of ambient light can be modeled by setting x(0) to the value of the ambient illumination. We end the formulation by restating that all the system matrices A, B and C depend on the relative distances between patches and on the form factors. A models the indirect component of light transport (inter-reflections), B governs how the illumination interacts with the scene (direct component), and C is the projection matrix which controls how the overall light transport in the scene reaches the observer (camera).
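As an illustrative sketch (not the paper's code), the construction of A for a diffuse scene can be written down directly. The delay-line state layout, the integer-step delays, and all names below are our own simplifying assumptions:

```python
import numpy as np

def build_state_matrix(F, delays):
    """Assemble the state matrix A for a diffuse (Lambertian) scene.

    F[k, i]      : form factor f_ki, the fraction of radiance leaving
                   patch k that is re-radiated by patch i (F[i, i] = 0).
    delays[k, i] : propagation delay delta_ki in integer time steps (>= 1).

    The state stacks a delay line of length D per patch:
    x = [L_1[t], L_1[t-1], ..., L_1[t-D+1], L_2[t], ...].
    """
    M = F.shape[0]
    D = int(delays.max())              # longest delay line needed
    A = np.zeros((M * D, M * D))
    for i in range(M):
        row = i * D                    # row that computes L_i[t+1]
        for k in range(M):
            if k == i:
                continue
            # L_i[t+1] += f_ki * L_k[t + 1 - delta_ki]
            A[row, k * D + int(delays[k, i]) - 1] += F[k, i]
        for j in range(1, D):          # off-diagonal 1s: the shift register
            A[i * D + j, i * D + j - 1] = 1.0
    return A
```

The off-diagonal 1s implement the time propagation described above; B and C would be assembled analogously, placing the f0ij and fij0 entries at delay-governed columns.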
Figure 7: The state matrix A for a Lambertian scene containing M = 5 diffuse patches. Using the diffuse assumption the form factors fkij can be reduced to fki, since the same amount of radiance will be directed towards all the patches. This also implies that Lij[t] can be reduced to Li[t]. As an illustration, the first row of A, which corresponds to the state variable L1[t], contains the form factors f21, f31, f41, f51 (we let f11 = 0). These are in column indices governed by the delays δ21, δ31, δ41, δ51. The off-diagonal 1s model the forward propagation of light.
Figure 8: The input matrix B for a Lambertian scene containing M = 5 diffuse patches. The rows of B that correspond to the state variable Li[t] contain the form factors f0i : i = 1, . . . , 5. These are in column indices governed by the delays δ0i : i = 1, . . . , 5.
Figure 9: The output matrix C for a Lambertian scene containing M = 5 diffuse patches. The M rows of C contain a 1 in each row in the column position that corresponds to the state variable Li[t − δi0]. This models the fact that the diffuse light rays reach the camera after a delay.
4.1. System Identification
System identification is the algorithmic procedure of building dynamical models from measured input/output data and estimating their parameters. In the context of the light transport problem, system identification refers to two kinds of inverse problems:
• Replicating the I/O behavior of the scene, which enables applications such as scene relighting under a novel viewpoint and illumination
• Estimating the parameters of the state space system that describes the transient light transport process. These parameters are the form factors [fkij] : i, j, k = 0, . . . , M, which can be used to devise novel algorithms for scene understanding and material sensing
The form factors can be vectorized into the parameter vector Θ, where Θ has length O(M³) for a scene containing M patches. The system matrices A, B and C are parameterized by the delays and by Θ, and we have the parameterized state space model M(Θ):

M(Θ) : x[t + 1] = A(Θ)x[t] + B(Θ)u[t]
       y[t] = C(Θ)x[t] + η[t]
Our system identification problem can be formally stated as: given STIR(S), estimate A(Θ), B(Θ), C(Θ).
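Forward simulation of M(Θ), mapping an illumination sequence u[t] to the observation y[t], is the standard LTI recursion. A minimal numpy sketch, with A, B, C assumed already built (names and shapes are our own conventions):

```python
import numpy as np

def simulate(A, B, C, u, x0=None):
    """Run the LTI recursion x[t+1] = A x[t] + B u[t], y[t] = C x[t].

    u : (T, p) input sequence; returns the (T, m) output sequence y.
    x0 defaults to the zero vector, i.e. no a priori scene illumination.
    """
    x = np.zeros(A.shape[0]) if x0 is None else x0.copy()
    ys = []
    for u_t in u:
        ys.append(C @ x)       # observe before the state advances
        x = A @ x + B @ u_t    # propagate one time step
    return np.array(ys)
```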
The reflectance terms ρkij occurring in fkij can be arbitrary (within the bounds of energy conservation and Helmholtz reciprocity). Ideally we would like to estimate the whole set of reflectances, whose cardinality scales as O(M³) in the number of scene patches M. As an example, for a real world scene comprising M = O(10³) patches, we have O(10⁹) form factors to estimate. The inverse process of computing estimates for such a large number of parameters is extremely computationally expensive and is an ill-conditioned inverse problem. Moreover, the form factors enter the observations recorded by the camera nonlinearly due to the recursive behavior of the system. System identification for transient light transport is a challenging inverse problem, and we have demonstrated initial progress in this area by formulating it based on a thorough understanding of the physical process.
If the STIR of the scene is available then it can be used directly to obtain the geometry G of the scene by solving the inverse geometry problem discussed in section 5. It can also be used to construct the I/O pair {y[t], u[t] : t = 0, . . . , ∞}, which we will use in section 6 to estimate Θ by solving the inverse reflectance problem. If we only have a time image which profiles the light arriving at the camera from the scene, {y[t] : t = 0, . . . , ∞}, in response to an arbitrary illumination {u[t] : t = 0, . . . , ∞}, then we can obtain STIR(S) by using subspace system identification []. We now propose the following system identification algorithm.
5. Inverse Geometry
Unlike traditional time-of-flight imaging, our goal is to compute not only the direct distance from the first bounce, but also the pairwise distances between patches. If the camera and light source are intrinsically calibrated, the Euclidean direction of each ray is known, directly providing the 3D location of each patch. But, to support more sophisticated operations later, such as estimating positions of hidden patches, we will instead exploit the second and higher order bounces for inter-patch distance estimation. The first step in our
Algorithm 1 SYSID[STIR(S)]
1. Solve inverse geometry using STIR(S):
   - Estimate the distance matrix D using onsets
   - Compute the coordinate set Z using isometric embedding
   - Compute the surface normals N using a smoothness assumption
2. Use D to construct the model M(Θ)
3. Solve the inverse reflectance problem:
   - Estimate form factors Θ using M(Θ)
inversion pipeline is to use the STIR(S) to infer the scene geometry G = {X, D, N, V}, which is then used to construct the model M(Θ).
5.1. Distances from STIR
We now demonstrate how to compute the distance matrix D = [dij] : i, j = 0, 1, . . . , M from onsets contained in the STIR(S). Note that D is a symmetric matrix with a 0 diagonal and it may have up to M(M + 1)/2 distinct entries. Define O1 = {O1i | i = 1, . . . , M} as the set of first onsets: the collection of all times O1i when the transient light field camera receives the first non-zero response from patch pi when illuminating the same patch (see figure 5). It is easy to observe that O1i is the time taken by the impulse ray originating at p0 directed towards pi to arrive back at p0 after the first bounce; this corresponds to the direct path p0 → pi → p0. Similarly we can define O2 = {O2ij | i, j = 1, . . . , M; j ≠ i} as the set of second onsets: the collection of times when the transient light field camera receives the first non-zero response from a patch pj when illuminating a different patch pi (see figure 5). This corresponds to the multi-path p0 → pi → pj → p0. Note that ideally O2ij = O2ji. It is worth noting that although all the onsets contained in O1 and O2 correspond to first and second bounces respectively, they are contained in completely different time profiles Lci[t] : i = 1, . . . , M; t = 0, . . . , ∞ that comprise the STIR(S). This is a result of the procedure we use to measure the STIR (section 3).
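Onset extraction from a measured time profile is a simple thresholding step; a sketch, where the noise-floor threshold eps is our assumption:

```python
import numpy as np

def first_onset(profile, eps=1e-9):
    """Return the index of the first above-threshold sample in a STIR
    time profile, or None if the profile never responds."""
    idx = np.flatnonzero(np.abs(profile) > eps)
    return int(idx[0]) if idx.size else None
```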
In order to compute D using O1 and O2 we construct the forward distance transform matrix T2 of size M(M + 1)/2 × M(M + 1)/2, which models the sums of the appropriate combinations of path lengths contained in the distance vector d = [dij] : i = 0, . . . , M; j = i + 1, . . . , M and relates them to the vector of observed onsets O. Then we solve the linear system T2d = O to obtain d and construct D. As an example, consider the scene in figure 10 consisting of 3 patches (M = 3). The linear system for this scene that allows us to solve for the distances can be constructed as follows:

\begin{bmatrix}
2 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 1\\
0 & 0 & 0 & 2 & 0 & 0\\
0 & 0 & 0 & 1 & 1 & 1\\
0 & 0 & 0 & 0 & 0 & 2
\end{bmatrix}
\begin{bmatrix}
d_{01}\\ d_{12}\\ d_{13}\\ d_{02}\\ d_{23}\\ d_{03}
\end{bmatrix}
=
\begin{bmatrix}
O^1_1\\ O^2_{12}\\ O^2_{13}\\ O^1_2\\ O^2_{23}\\ O^1_3
\end{bmatrix}
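Given the onset vector, recovering d is a linear solve. A numpy sketch for the M = 3 operator above; least squares also accommodates noisy or redundant onsets (the distance values are synthetic):

```python
import numpy as np

# Forward distance transform for the M = 3 example: each row sums the
# distances traversed along one observed onset path.
T2 = np.array([
    [2, 0, 0, 0, 0, 0],   # O^1_1  = 2*d01            (p0 -> p1 -> p0)
    [1, 1, 0, 1, 0, 0],   # O^2_12 = d01 + d12 + d02
    [1, 0, 1, 0, 0, 1],   # O^2_13 = d01 + d13 + d03
    [0, 0, 0, 2, 0, 0],   # O^1_2  = 2*d02
    [0, 0, 0, 1, 1, 1],   # O^2_23 = d02 + d23 + d03
    [0, 0, 0, 0, 0, 2],   # O^1_3  = 2*d03
], dtype=float)

d_true = np.array([1.0, 0.8, 1.2, 1.1, 0.9, 1.3])  # [d01,d12,d13,d02,d23,d03]
onsets = T2 @ d_true                               # noiseless observed onsets

# Least squares handles the square full-rank case exactly and degrades
# gracefully when extra (redundant) onset rows are appended.
d_est, *_ = np.linalg.lstsq(T2, onsets, rcond=None)
```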
It can be verified that for any M the forward distance transform matrix T2 is full rank and well-conditioned. Due to synchronization errors, device delays and response times, the observed onsets have measurement uncertainties, which we assume correspond to a bounded error ε in D; we generate distance approximations d̂ij such that (dij − ε) ≤ d̂ij ≤ (dij + ε) : i, j = 1, . . . , M. Moreover, we can use the redundancy in second onset
Figure 10: A scene with M = 3 patches showing the distances between the patches that form the distance matrix D.
values (O2ij = O2ji) to obtain multiple estimates of D and reduce error by averaging them.
5.2. Structure from Pairwise Distances
An algebraic framework for estimating the scene structure Z using the distance estimates D is discussed in appendix ??. The problem of estimating structure from distances can be stated as finding an isometric embedding of the M patches into R³. For computational convenience we take p0 to be the origin, i.e. z0 = (0, 0, 0). An example of recovering structure from noisy distance estimates using the isometric embedding algorithm is shown in figure 11. The embedding algorithm and the use of convex optimization to compute the optimal coordinate set estimate Z in the presence of distance uncertainties are also explained in appendix ??. The estimated coordinates Z can be used to recompute robust distance estimates.
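Classical multidimensional scaling is one standard realization of such an isometric embedding (the appendix details the paper's variant); a numpy sketch, recovering coordinates up to a rigid transform:

```python
import numpy as np

def embed_from_distances(D):
    """Classical MDS: recover 3D coordinates (up to a rigid transform)
    from a full M x M pairwise Euclidean distance matrix D."""
    M = D.shape[0]
    J = np.eye(M) - np.ones((M, M)) / M     # centering matrix
    G = -0.5 * J @ (D ** 2) @ J             # Gram matrix of centered points
    w, V = np.linalg.eigh(G)
    top = np.argsort(w)[::-1][:3]           # three largest eigenvalues
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```

In the noiseless case the recovered pairwise distances match D exactly; with noisy distance estimates the eigenvalue truncation acts as a simple denoiser.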
In general the orientation matrix N = [n1, . . . , nM], which consists of unit surface normal vectors ni ∈ R³, cannot be computed even with the knowledge of the distances D and structure Z. However, we can estimate N for patches sharing a common surface normal. Also, if we assume that we are imaging piecewise smooth surfaces, then given the coordinate set Z we can fit an analytical (piecewise) surface and compute N.
5.3. Scenes with Occluders
Now we consider the case of estimating the geometry of a scene that contains a set H of patches hidden from the camera: v0i = 0 : pi ∈ H. To illustrate our analysis we make two assumptions: that there is no inter-reflection amongst hidden patches, and that no two third bounces involving hidden patches arrive at the same time in the same time profile contained in STIR(S). The likelihood of the latter assumption being true for general scenes is very high because we are illuminating one patch at a time. If a patch pi is not visible from the camera p0, then the sets of first and second onsets do not contain any responses from pi, i.e. the vector of distances dH = [dij] : pi ∈ H, j = 0, . . . , M cannot be estimated using O1 and O2. Hence we need to consider the set of third onsets O3 = {O3ijk : i, j, k = 1, . . . , M; i ≠ j; j ≠ k}, which corresponds to third bounces. Note that the number of first onsets |O1| is O(M), and there are O(M²) second onsets and O(M³) third onsets. Also, Euclidean geometry imposes that O3ijk = O3kji. Although we can easily identify (label) the onsets contained in O1 and O2 using the STIR(S), labeling the onsets in O3 is non-trivial. We used the following procedure, discussed in more detail below, to compute distance estimates D for scenes with hidden patches:
1 Estimate the distance set DS\S(H) for the restricted scene which contains only the visible patches
Figure 11: The original and estimated geometries of a piecewise smooth scene with M = 147 patches. Each of the 3 planar surfaces is made up of M/3 patches that have a non-zero diffuse component. Given the noisy STIR of this scene, the distances D are computed using up to the 2-bounce onsets and the distance transform T2. We average redundant onsets contained in the STIR to reduce noise in the estimates. Then the patch coordinates are estimated using isometric embedding (appendix ??) and a piecewise smooth surface is fitted using the Matlab function griddata. This fitted surface is used to compute patch normals using the function surfnorm.
2 Use DS\S(H) and the stated assumptions to label the time onsets contained in O3
3 Construct the distance transform operator T3 that relates the third-bounce arrival times OH that involve the hidden patches to the distances dH to hidden patches
4 Solve the resulting linear system T3dH = OH and obtain the complete distance set D
Figure 12: A scene with M = 4 patches where the two patches p2 and p3 are not directly visible from the camera, and the STIR onset arrival profile. The green (first) and blue (second) onsets are a result of direct observations from patches p1 and p4. The pattern of arrival of third onsets depends on the relative distance of the hidden patches p2 and p3 from the visible patches. The onsets that correspond to light traversing the same Euclidean distance can be easily identified (they have the same arrival times in different time profiles). Once the onsets are labeled, they are used to obtain distances that involve hidden patches.
As a simple example consider the scene in figure 12. Assume that the patches p2 and p3 are hidden. We first compute the distances d01, d04, d14 between visible patches. The distances (d21, d24) and (d31, d34) are not directly observable, although once these distances are estimated, d02, d03, d23 can be computed using triangulation. Now we apply our labeling algorithm to identify third onsets: the onsets O3141, O3414 are readily identified since we know the distances to patches p1 and p4. The onsets O3121, O3131, O3424, O3434, O3124, O3134, O3421, O3431 can be disambiguated using the fact that O3421 = O3124 and O3431 = O3134, and that they arrive in different time profiles of the STIR(S). We sort the remaining onsets based on their arrival times and label them based on their proximity to visible patches. This labeling procedure can be generalized to multiple planar hidden patches as:
1 Establish an ordering of hidden patches based on their proximity to an arbitrary visible patch
2 Identify the onsets that obey the constraint O3ijk = O3kji and label them using the ordering in step 1
3 Sort the remaining onsets according to their arrival time and again use the ordering in step 1 to label them
The onset arrival profile for the example scene is shown in figure 12. Now we construct the distance operator T3 and set up the system of equations to solve for the distances involving hidden patches as follows:

\begin{bmatrix}
2 & 0 & 0 & 0\\
1 & 1 & 0 & 0\\
0 & 0 & 2 & 0\\
0 & 0 & 1 & 1
\end{bmatrix}
\begin{bmatrix}
d_{21}\\ d_{24}\\ d_{31}\\ d_{34}
\end{bmatrix}
= c
\begin{bmatrix}
O^3_{121} - O^1_1\\
O^3_{124} - (O^1_1 + O^1_4)/2\\
O^3_{131} - O^1_1\\
O^3_{134} - (O^1_1 + O^1_4)/2
\end{bmatrix}
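As with T2, the hidden-patch distances follow from a direct linear solve. A numpy sketch for the 4-patch example, with synthetic onset differences and c = 1:

```python
import numpy as np

# Distance operator for the example; unknowns are [d21, d24, d31, d34].
# Each right-hand-side entry is an observed third-onset time with the
# known visible-patch round trips subtracted out.
T3 = np.array([
    [2, 0, 0, 0],   # O^3_121 - O^1_1             = 2*d21
    [1, 1, 0, 0],   # O^3_124 - (O^1_1 + O^1_4)/2 = d21 + d24
    [0, 0, 2, 0],   # O^3_131 - O^1_1             = 2*d31
    [0, 0, 1, 1],   # O^3_134 - (O^1_1 + O^1_4)/2 = d31 + d34
], dtype=float)

d_hidden_true = np.array([0.7, 0.9, 1.1, 0.6])  # synthetic ground truth
rhs = T3 @ d_hidden_true                        # onset differences (c = 1)
d_hidden = np.linalg.solve(T3, rhs)
```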
Figure 13: Estimating distances in scenes with hidden patches: the original and estimated geometries of the piecewise smooth scene as in figure 11. The second planar surface is occluded and all patches have a non-zero diffuse component. Again, given the noisy STIR of this scene, we used up to the third-bounce onsets and the operator T3 to estimate the distance matrix D. We also averaged redundant onsets contained in the STIR to reduce noise in the estimates. The coordinates, surface fitting and the normals were computed using exactly the same Matlab routines as in figure 11.
An example of estimating distances in a scene containing hidden patches is shown in figure 13. With our stated assumptions, extending this construction to any number of hidden patches is straightforward². Once the distance set D is estimated we do not need special constructions any further, and we proceed to estimate the structure (Z) and orientation (N) as discussed in section 5. Having obtained the geometry G, we use the distance matrix to obtain the delays ∆. We can then construct the model M(Θ).
6. Inverse Reflectance
In this section we develop a system identification formulation to estimate the form factors contained in the parameter vector Θ. The numerical sensitivity of the model M(Θ) and the performance of the inverse reflectance procedure may vary dramatically between state space parameterizations. Hence it is very important to carefully choose a model parameterization whose behavior is as close to the physical process as possible. We have used physical insight and the knowledge of the inferred geometry to construct M(Θ) so that it closely matches the actual transient light transport process.
The structure of the system matrices A(Θ), B(Θ), C(Θ) of the state space model M(Θ) is parameterized by the scene geometry G and the form factors contained in Θ. The inverse reflectance problem is to estimate Θ given a time image {y[t], u[t] | t = 0, 1, . . . , T} and the structural parameterization of M(Θ), where T is the number of time samples under consideration. Note that if we are just given the STIR(S) we can easily construct a time image from it. The next step is formulating the estimation of the model parameters as an optimization problem: we seek a ΘOPT that minimizes an error norm J(·) between the predicted output y[t|Θ] = M(Θ, u[t]) and the observed output y[t]. Regularization is frequently used to overcome the non-uniqueness of the minimizer Θ, and a penalty term is
² An important generalization of the hidden patches scenario is to estimate distances in the case of multiple interactions between hidden patches. It can be shown that if a hidden patch can have at most N inter-reflections with the other hidden patches, then we need to utilize onsets that correspond to up to (N + 3) bounces, i.e. the sets O1, O2, . . . , O(N+3).
added to the cost function:

ΘOPT = arg min_{Θ ∈ Ω} J(y[t|Θ], y[t]) + λ ‖Θ‖²
where λ is a user-selectable control and Ω is the solution space of all possible parameter values. We used two common quadratic error criteria: least squares Output Error Minimization (OEM) and one-step Prediction Error Minimization (PEM):
PEM : J(y[t|Θ], y[t]) = (1/T) Σ_{t=0}^{T−1} ‖ y[t|(t − 1), Θ] − y[t] ‖²
OEM : J(y[t|Θ], y[t]) = (1/T) Σ_{t=0}^{T−1} ‖ y[t|Θ] − y[t] ‖²
For both the PEM and OEM cost functions it can be shown that the model output is linear in the unknown parameter vector Θ: y[t|Θ] = Φ(t)ᵀΘ, where Φ(t) is a linear transform derived using the structural parameterization of the model M(Θ). This is discussed in detail in appendix ??.
The next step is iterative numerical minimization of the cost function in equation 2 to compute the optimum parameter values ΘOPT starting from an initial guess Θ0. We obtained good initial estimates for parameter values (such as diffuse reflectance) using a single image taken with a traditional camera, which captures steady state light transport. To determine a numerical solution to the parameter optimization problem (2) we minimized the cost function using the following iterative descent methods: the Gauss-Newton, Steepest Descent and Levenberg-Marquardt algorithms. We found that the Levenberg-Marquardt (LM) algorithm outperformed the other two methods in terms of both convergence and computational complexity. An excerpt of the Matlab code that implements the above procedure for our PEM formulation using the System Identification Toolbox and Control System Toolbox is included in appendix 12.
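The appendix excerpt relies on Matlab toolboxes; for illustration only, the core Levenberg-Marquardt loop for the OEM criterion fits in a few lines of Python. A finite-difference Jacobian is used here purely for brevity, where the analytic/adjoint Jacobian discussed below would be substituted:

```python
import numpy as np

def lm_minimize(resid_fn, theta0, n_iter=50, mu=1e-2):
    """Levenberg-Marquardt on the stacked residual r(theta) = y[t|theta] - y[t].
    Finite-difference Jacobian; damped normal equations give the step."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = resid_fn(theta)
        J = np.empty((r.size, theta.size))
        for j in range(theta.size):
            tp = theta.copy()
            tp[j] += 1e-6
            J[:, j] = (resid_fn(tp) - r) / 1e-6   # d r / d theta_j
        step = np.linalg.solve(J.T @ J + mu * np.eye(theta.size), J.T @ r)
        if np.linalg.norm(resid_fn(theta - step)) < np.linalg.norm(r):
            theta, mu = theta - step, mu * 0.5    # accept: relax damping
        else:
            mu *= 10.0                            # reject: increase damping
    return theta
```

The damping parameter mu interpolates between Gauss-Newton (small mu) and steepest descent (large mu), which is why LM tends to dominate either method alone.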
The computation of the Jacobian J′(y[t|Θi], y[t]) of the cost function is a central step in the numerical minimization process. For our case of linear state space system identification, an analytic Jacobian can be efficiently obtained by simulating the linear state space system M(Θ). It is shown in appendix ?? that the calculation of the Jacobian boils down to simulating M(Θ) for every element of the parameter vector. Therefore, if there are O(M³) parameters in Θ, we need to simulate O(M³) linear dynamical systems at each step of the algorithm in order to compute both the Jacobian and the Hessian. This computational burden is circumvented by using adjoint methods [], allowing the computation of the Jacobian using only 2 simulations of M(Θ).
Inverse problems such as system identification usually do not have a unique solution. In the case of transient light transport, this implies that there are several possible scenes with different state space parameterizations M(Θ) that correspond to the same STIR(S) or time image {y[t], u[t] | t = 0, 1, . . . , T}. We now summarize the unique properties of our state space based formulation that allow us to regularize the inverse system identification problem, produce robust parameter estimates in the presence of noise, and reduce computational complexity:
1 Imposing structural constraints on the system matrices A(Θ), B(Θ), C(Θ) using the scene geometry G constrains the solution space of possible transfer functions {M(Θ) | Θ ∈ Ω} to scenes having the same G as the original scene but possibly different form factors
2 The structural constraints also automatically ensure that our state space parameterization M(Θ) is asymptotically stable. This means that given a bounded input, the dynamical system will eventually reach steady state. It is also a property of the physical light transport process. Requiring asymptotic stability of the model leads to a further restriction of the solution space. A detailed discussion on stability can be found in appendix ??
3 Exploiting physical properties of light transport, we can further constrain the parameters fijk of M(Θ). For instance, Helmholtz reciprocity (ρijk = ρkji; 0 < ρijk < 1) and energy conservation (Σ_{i,j,k} fijk < 1; 0 < fijk < 1) restrict the otherwise unconstrained parameter set (Ω = R^|Θ|) to a much smaller subset bounded by the unit hypercube. Starting with good initial estimates, the convergence and performance can be further improved
4 One of the biggest advantages of modeling light transport as a linear state space system is that we have an analytical expression for the Jacobian J′(·) of the cost function used in the numerical minimization procedure. Moreover, J′(·) is sparse for our formulation. Usually the Jacobian is approximated by finite differences, which may lead to poor convergence. Having explicit access to the Jacobian of the cost function not only reduces computational complexity but also improves the convergence and numerical performance of the minimization algorithm, especially near ΘOPT
5 PEM is an online algorithm and is used to update estimates as the I/O data is being recorded. In the presence of low measurement noise (high SNR) it produces accurate estimates from few observations. If the measurement noise is high, the offline OEM produces unbiased and consistent estimates of the parameter vector Θ as T → ∞. Noise analysis is discussed in appendix ??
6 Our formulation allows a straightforward integration of higher order BRDF parameterizations in the state space model, as discussed in section 6.1
Like all ill-conditioned inverse problems, our inverse reflectance algorithm may be unstable. Even with the key structural and computational properties of our state space formulation, we suffer from poor convergence in low SNR (high measurement noise) conditions. Also, there is a strict dependence on the accuracy of the estimation of the scene geometry. Depending on the noise, the initial guesses and the structure of the scene, we might converge to a local minimum. The rule of thumb is to use as much a priori information about the scene as possible to impose structural and parametric constraints, and to start with good initial guesses. In the next section, we show how to use BRDF models to reduce model complexity and improve scalability.
6.1. Simplified Reflectance Models
One way to reduce the dimensionality of the problem is to use a parametric model of the BRDF. If the BRDF is unparameterized, there are O(M²) form factors per patch. Once we have computed the scene geometry G, then by assuming Lambertian (diffuse) or Phong reflection models we can decrease the number of parameters at each patch to just 1 or 3 respectively.
6.1.1 Lambertian Scenes
For demonstration of concept we will use the Lambertian or diffuse model:

ρkij = ρdi
fkij = ρdi cos(θin) = fki
lout[i → j] = ρdi cos(θin) lin[k → i]

where ρdi is the diffuse reflectance of patch pi, and lout[i → j] is the fraction of the incident light lin[k → i] arriving from pk onto pi that is reflected towards pj. Under the Lambertian assumption each patch radiates light equally over a hemisphere of directions, i.e. lout[i → j] = lout[i] : ∀j = 1, . . . , M. Once the scene geometry G is estimated (or known), the problem of estimating O(M³) form factors reduces to inferring just O(M) reflectance parameters, since cos(θin) is available.
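A sketch of the resulting per-patch form factor computation; the function name is ours, and visibility and distance falloff terms are omitted, matching the simplified expression above:

```python
import numpy as np

def lambertian_form_factor(rho_d, p_k, p_i, n_i):
    """f_ki = rho_d_i * cos(theta_in): fraction of radiance arriving at
    patch i (position p_i, unit normal n_i) from the direction of patch k
    that is re-radiated, under the diffuse model in the text."""
    d = (p_k - p_i) / np.linalg.norm(p_k - p_i)   # unit direction i -> k
    cos_in = max(np.dot(n_i, d), 0.0)             # clamp grazing/back-facing
    return rho_d * cos_in
```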
6.1.2 Parameterized General Scenes
We can use the Modified-Phong BRDF, capable of modeling glossy (specular + diffuse) patches:

ρkij = ρdi + ρsi [cos(θref)]^αsi
fkij = ρkij [ vki vij cos(θin) cos(θout) / d²ij ]
lout[i → j] = fkij lin[k → i]

The model parameters ρdi, ρsi and αsi control the behavior of the material, ranging from purely diffuse to highly glossy, and θref is the angle between the direction of true reflection and the viewing direction. Again, if the scene geometry G is estimated (or known), we need to estimate only 3M parameters.
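The corresponding glossy form factor can be sketched as follows; the variable names and the mirror-direction computation of θref are our own reading of the expression above:

```python
import numpy as np

def phong_form_factor(rho_d, rho_s, alpha, p_k, p_i, p_j, n_i,
                      v_ki=1.0, v_ij=1.0):
    """Modified-Phong form factor f_kij from the text:
    (rho_d + rho_s * cos(theta_ref)^alpha) * v_ki * v_ij
        * cos(theta_in) * cos(theta_out) / d_ij^2."""
    w_in = p_k - p_i
    w_in = w_in / np.linalg.norm(w_in)            # unit direction i -> k
    w_out = p_j - p_i
    d_ij = np.linalg.norm(w_out)
    w_out = w_out / d_ij                          # unit direction i -> j
    cos_in = max(np.dot(n_i, w_in), 0.0)
    cos_out = max(np.dot(n_i, w_out), 0.0)
    w_ref = 2.0 * cos_in * n_i - w_in             # mirror reflection of w_in
    cos_ref = max(np.dot(w_ref, w_out), 0.0)      # cos(theta_ref)
    rho = rho_d + rho_s * cos_ref ** alpha
    return rho * v_ki * v_ij * cos_in * cos_out / d_ij ** 2
```

Setting rho_s = 0 recovers the Lambertian case of section 6.1.1 (up to the 1/d² and visibility factors retained here).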
8 Implementation
In our initial experiments, we opted to verify our theory using a pulsed laser and a single photodiode. We believe this relatively simple configuration demonstrates that a practical implementation is within reach. In particular, we intend this prototype to show that it is possible to reason about multi-bounce global transport using the STIR of a scene. A 2D implementation using time-gated or sweep imagers is left for future work.
Figure 14: Verifying distance calculations. (a) Configuration of two detectors receiving the same femtosecond laser pulse with a distance-induced delay. (b) Plot from a 10 GHz scope showing the arrival times.
8.1 Device
Our sensor is a commercially available reverse-biased silicon photodiode (Thorlabs FDS02) intended for fiber coupling. The sensor has an active area of 250 micron diameter and a condensing lens of 1 millimeter diameter. Sensor signals (photocurrents) are digitized by a 5 GHz sampling oscilloscope (LeCroy Wavemaster 8500A). The combination results in an impulse response width (-3 dB) of 210 picoseconds. No optical or electrical amplifiers are used, and the sensor has a gain of unity.
Scene illumination is provided by a modelocked Ti:sapphire oscillator, manufactured by Kapteyn-Murnane Laboratories, with a center wavelength of 810 nm, emitting pulses of 50 fs FWHM duration at a repetition rate of 93.68 MHz. The bandwidth of these pulses so far exceeds the response bandwidth of our sensor that we consider the laser pulses to be effectively impulses. The average laser power is 420 milliwatts, corresponding to peak powers greater than 84 kW. The high peak powers generated by our oscillator are critical for the preservation of the signal-to-noise ratio in a photodiode generating current proportional to incident power.
8.2 Experiments and Performance Analysis
The combination of a high-speed photodiode with a modulated source of illumination is the basis of the optical time-domain reflectometer commonly used in the telecommunications industry to detect back reflections caused by defects in optical fibers. The novelty of our device is twofold: it functions in free space, rather than being confined to an optical fiber; and the scale of its operation is centimeters, rather than the dozens of meters to hundreds of kilometers usual in telecommunications. The increased difficulty, on account of the signal loss due to both misalignment and inverse-square diffusion loss in free space, as well as the much higher speeds and the ensuing further inhibition of the signal, is clear.
The success of a transient light field camera depends on the ability to recover two broad factors of scene information: photometry and geometry. We successfully demonstrated that both factors can be recovered in a free-space setting, via three experiments. The first experiment, depicted in Figure ??, tested the precision³ of distance measurements achievable with our device, and correspondingly the ability of our sensor to recover scene geometry. A plot of the time delay against a reference arm as the detector travels on a linear stage is shown in Figure ??.
The second experiment demonstrated the recoverability of scene photometry after multiple bounces; that is, that a twice-diffused signal remains detectable. A scene diagram is found in Figure ??. The two diffusers were both anisotropic: the dull side of aluminum foil and a similar copper foil. The impulse response of this scene, consisting solely of the second bounce from the second patch, was successfully recorded, showing that a scene with two diffusions does not result in an irrecoverable loss of signal. Therefore, the photometric characteristics of the two diffusers can be inferred.
The third experiment combined demonstrations of our device's ability to recover both geometry and photometry. The scene, drawn in Figure ?? and pictured in Figure ??, consists of two mirrors outside the field of view of the sensor. The presence of the multiple hidden mirror patches is nonetheless revealed in the time profile of third bounces. Though our sensor did not have the resolution to fully distinguish the two different path lengths, the fact that the amplitude of the signal varied when one or both of the mirrors were removed is strong evidence for the recovery of the third bounce.
3A note on precision vs. resolution in our system: the temporal andtherefore spatial resolution of our system is known absolutely, based on theconvention that two finite-width pulses can be resolved provided that theircenters are separated by half their FWHM; in our case: 105 ps, correspond-ing to a distance of 3.15 cm. The precision and accuracy, however, arecharacteristics of the measurement noise.
4LeCroy Wavemaster 8500A5Manufactured by Kapteyn-Murnane Laboratories.
11
Online Submission ID: 010
effectively—impulses. Average laser power was 420 milliwatts,879
corresponding to peak powers greater than 84 kW. The high peak880
powers generated by our oscillator were critical for the preserva-881
tion of the signal-to-noise ratio in a photodiode generating current882
proportional to incident power.883
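The half-FWHM resolvability convention in the footnote fixes the spatial resolution implied by the sensor's temporal response. A minimal sketch of that arithmetic (in Python; the constant and function are ours, not part of the measurement pipeline):

```python
# Spatial resolution implied by the temporal impulse response, following the
# half-FWHM resolvability convention stated in the footnote.
C_LIGHT = 299_792_458.0  # speed of light, m/s

def spatial_resolution_cm(fwhm_ps: float) -> float:
    """Distance corresponding to the minimum resolvable time separation
    (half the impulse-response FWHM) for a one-way path."""
    dt_s = 0.5 * fwhm_ps * 1e-12   # half-FWHM, in seconds
    return C_LIGHT * dt_s * 100.0  # metres -> centimetres

# 210 ps impulse-response FWHM -> 105 ps resolvable separation -> ~3.15 cm
print(round(spatial_resolution_cm(210.0), 2))
```

This reproduces the 3.15 cm figure quoted above for the 210 ps system response.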
9.2. Experiments
The combination of a high-speed photodiode with a modulated illumination source is the basis of the optical time-domain reflectometer commonly used in the telecommunications industry to detect back reflections caused by defects in optical fibers. The novelty of our device is twofold: it operates in free space rather than being confined to an optical fiber, and its scale of operation is centimeters rather than the dozens of meters to hundreds of kilometers usual in telecommunications. This setting is considerably harder: free-space operation brings signal loss from both misalignment and inverse-square diffusion, and the much shorter distances demand higher speeds, further weakening the signal.
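To give a sense of the inverse-square losses at play, here is a back-of-the-envelope sketch (Python, illustrative only; it ignores BRDF shape, patch areas, and alignment, and the aperture value is our assumption based on the sensor's 1 mm condensing lens):

```python
# Rough inverse-square budget for free-space diffuse bounces.
import math

def captured_fraction(r_m: float, aperture_m2: float = 7.9e-7) -> float:
    """Fraction of diffusely re-emitted power intercepted by a small
    aperture at distance r, assuming hemispherical spreading:
    a / (2 * pi * r^2). Default aperture ~ a 1 mm diameter lens."""
    return aperture_m2 / (2.0 * math.pi * r_m**2)

# Each additional diffuse bounce multiplies in another captured fraction,
# so a two-bounce path is weaker by many orders of magnitude.
one = captured_fraction(0.5)   # single 50 cm diffuse segment
two = one * captured_fraction(0.5)
print(one, two)
```

Even at 50 cm, a single diffuse bounce delivers well under a millionth of the re-emitted power to the aperture, which is why the high peak powers of Section 9.1 matter.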
The success of a transient light field camera depends on the ability to recover two broad factors from scene information: photometry and geometry. We demonstrated that both factors can be recovered in a free-space setting through three experiments. The first experiment, depicted in Figure ??, tested the precision6 of distance measurements achievable with our device, and correspondingly the ability of our sensor to recover scene geometry. A plot of the time delay against a reference arm as the detector travels on a linear stage is shown in Figure ??.
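The delay estimate underlying the first experiment can be mimicked in a few lines: align a sampled pulse against a reference trace by cross-correlation. This is a toy stand-in for the oscilloscope measurement (Python; the pulse shape, sample count, and shift are invented):

```python
# Toy version of the first experiment: estimate the arrival-time shift of a
# sampled pulse against a reference trace by cross-correlation.
import numpy as np

def delay_samples(reference: np.ndarray, delayed: np.ndarray) -> int:
    """Lag (in samples) that best aligns `delayed` with `reference`."""
    corr = np.correlate(delayed, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)

t = np.arange(400)
pulse = lambda t0: np.exp(-0.5 * ((t - t0) / 5.0) ** 2)  # Gaussian stand-in
ref, moved = pulse(100.0), pulse(137.0)                  # 37-sample shift
print(delay_samples(ref, moved))  # 37
```

Sweeping the detector along a stage and plotting the recovered lag against position yields the kind of linear relation shown in the figure.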
The second experiment demonstrated the recoverability of scene photometry after multiple bounces; that is, that a twice-diffused signal remains detectable. A scene diagram is shown in Figure ??. The two diffusers were both anisotropic: the dull side of aluminum foil and a similar copper foil. The impulse response of this scene, consisting solely of the second bounce from the second patch, was successfully recorded, showing that a scene with two diffusions does not result in an irrecoverable loss of signal. The photometric characteristics of the two diffusers can therefore be inferred.
The third experiment combined the two demonstrations, testing our device's ability to recover geometry and photometry together. The scene, drawn in Figure ?? and pictured in Figure ??, consists of two mirrors outside the field of view of the sensor. The presence of the multiple hidden mirror patches is nonetheless revealed in the time profile of third bounces. Although our sensor did not have the resolution to fully distinguish the two different path lengths, the fact that the signal amplitude varied when one or both of the mirrors were removed is strong evidence for the recovery of the third bounce.
10. Future work
An important outcome of the imaging approach in this paper is that, by exploiting emerging technologies for sensors, optics, and active illumination, we open a new space of methods and algorithms for solving hard problems in computer vision and graphics based on time-of-flight analysis. This in turn has significant applications in areas such as medical imaging and military reconnaissance. We expect our analysis and results to pave the way not only for improved approaches to existing hard problems but also for the definition of a novel class of problems that can be cast in our framework.
6 A note on precision vs. resolution in our system: the temporal, and therefore spatial, resolution of our system is known absolutely, based on the convention that two finite-width pulses can be resolved provided that their centers are separated by half their FWHM; in our case 105 ps, corresponding to a distance of 3.15 cm. The precision and accuracy, however, are characteristics of the measurement noise.
We consider two primary directions for future work:
Theoretical: There is considerable scope for more compact representations of the described state space system. Better optimization algorithms for state space identification can be devised using surface BRDF models. The state space framework can also be applied to rendering problems in computer graphics.
Implementation: Building hardware for a time-of-flight camera capable of imaging at picosecond or better resolution is crucial to testing our approach in practice and on real scenes. We propose the design of the first transient light field camera in Figure ??.
11. Conclusion
We have presented a conceptual framework for scene understanding through the modeling and analysis of global light transport. We explore new opportunities in multi-path analysis using time-of-flight sensors. Our approach to scene understanding is four-fold:
• Measure the scene's transient photometric response function (TPRF) using the directional time-of-flight camera and active impulse illumination.
• Estimate the structure and geometry of the scene from the observed TPRF.
• Use the estimated structure and geometry, along with a priori models of surface light scattering, to infer the BRDF form factors of the scene.
• Construct higher-order inference engines that use the estimated scene properties for higher-level scene abstraction and understanding.
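The first two steps above can be sketched with a toy per-pixel transient profile: synthesize I(t) as a sum of delayed returns, then read depth off the first return. This is an illustration only (Python; the bin width, gains, and detour length are all invented):

```python
# Minimal sketch of pipeline steps one and two: a per-pixel transient
# response as a sum of delayed returns, and depth from the first return.
import numpy as np

C = 3.0e8    # m/s
DT = 1e-11   # 10 ps time bins

def transient_profile(depth_m, bounce_gains=(1.0, 0.25), extra_path_m=0.4, n=700):
    """I(t) for one pixel: a direct return plus one weaker indirect return."""
    prof = np.zeros(n)
    for k, g in enumerate(bounce_gains):
        path = 2.0 * depth_m + k * extra_path_m   # round trip, plus a detour
        prof[int(round(path / C / DT))] += g
    return prof

def depth_from_first_return(prof):
    first = np.flatnonzero(prof > 0)[0]           # earliest nonzero time bin
    return first * DT * C / 2.0                   # invert the round-trip time

prof = transient_profile(0.75)
print(depth_from_first_return(prof))              # recovers the 0.75 m depth
```

Later returns in the same profile carry the multi-bounce information that the system identification machinery of Section 2 and the appendix exploits.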
The time-image camera described here is not available today, but the picosecond-resolution impulse response can be captured by scanning in time or space. Emerging trends in femtosecond-accurate emitters, detectors, and nonlinear optics may support single-shot time-image cameras. The goal of this paper is to explore the opportunities in multi-path analysis of the transient response. We developed the theoretical basis for this analysis and demonstrated potential methods for recovering scene properties using simple examples, but a complete procedure for estimating scene parameters remains future work. The contribution of this work is conceptual rather than experimental. We hope to influence the direction of future research in time-of-flight systems, both in terms of sensor design and algorithms for scene understanding.
Femtosecond and picosecond lasers are high-power devices, and increasing their power requires careful design. Fortunately, there is an increasing trend toward solid state lasers; in the future, ultra-short-duration solid state lasers should bring down costs the same way 3DV and Canesta have done with solid state nanosecond-scale sensors and emitters. Nothing in physical law prevents such a trend for the foreseeable future. We do, however, require precise optoelectronics that introduce minimal measurement noise.
References
3DV SYSTEMS. http://www.3dvsystems.com/.
ANDERSSON, P. 2006. Long-range three-dimensional imaging using range-gated laser radar images. Optical Engineering 45, 034301.
ARVO, J. 1993. Transfer equations in global illumination. In Global Illumination, SIGGRAPH 93 Course Notes.
BLAIR, J., RABINE, D., AND HOFTON, M. 1999. The Laser Vegetation Imaging Sensor: a medium-altitude, digitisation-only, airborne laser altimeter for mapping vegetation and topography. ISPRS Journal of Photogrammetry and Remote Sensing 54, 2-3, 115–122.
BUSCK, J., AND HEISELBERG, H. 2004. Gated viewing and high-accuracy three-dimensional laser radar. Applied Optics 43, 24, 4705–4710.
CAMPILLO, A., AND SHAPIRO, S. 1983. Picosecond streak camera fluorometry: a review. IEEE Journal of Quantum Electronics 19, 4, 585–603.
CANESTA. http://canesta.com/.
DAS, B., YOO, K., AND ALFANO, R. 1993. Ultrafast time-gated imaging in thick tissues: a step toward optical mammography. Optics Letters 18, 13, 1092–1094.
DEBEVEC, P., HAWKINS, T., TCHOU, C., DUIKER, H.-P., SAROKIN, W., AND SAGAR, M. 2000. Acquiring the reflectance field of a human face. In SIGGRAPH '00: Proceedings of the 27th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 145–156.
FARSIU, S., CHRISTOFFERSON, J., ERIKSSON, B., MILANFAR, P., FRIEDLANDER, B., SHAKOURI, A., AND NOWAK, R. 2007. Statistical detection and imaging of objects hidden in turbid media using ballistic photons. Appl. Opt. 46, 23, 5805–5822.
GARG, G., TALVALA, E.-V., LEVOY, M., AND LENSCH, H. P. A. 2006. Symmetric photography: Exploiting data-sparseness in reflectance fields. In Rendering Techniques 2006: Eurographics Symposium on Rendering, Eurographics Association, Nicosia, Cyprus, T. Akenine-Möller and W. Heidrich, Eds., 251–262.
GONZALEZ-BANOS, H., AND DAVIS, J. 2004. Computing depth under ambient illumination using multi-shuttered light. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), vol. 2.
GVILI, R., KAPLAN, A., OFEK, E., AND YAHAV, G. 2003. Depth keying. SPIE Elec. Imaging 5006, 564–574.
HAMAMATSU. http://www.hamamatsu.com/.
HOFTON, M., MINSTER, J., AND BLAIR, J. 2000. Decomposition of laser altimeter waveforms. IEEE Transactions on Geoscience and Remote Sensing 38, 4 (Part 2), 1989–1996.
IDDAN, G., AND YAHAV, G. 2001. 3D imaging in the studio (and elsewhere...). In Proc. SPIE, vol. 4298, 48–55.
IMMEL, D. S., COHEN, M. F., AND GREENBERG, D. P. 1986. A radiosity method for non-diffuse environments. In SIGGRAPH '86 Proceedings, 133–142.
ITATANI, J., QUERE, F., YUDIN, G., IVANOV, M., KRAUSZ, F., AND CORKUM, P. 2002. Attosecond streak camera. Physical Review Letters 88, 17, 173903.
KAILATH, T. 1979. Linear Systems. Prentice Hall Information and System Sciences Series.
KAJIYA, J. T. 1986. The rendering equation. In SIGGRAPH '86: Proceedings of the 13th annual conference on Computer graphics and interactive techniques, ACM, New York, NY, USA, 143–150.
KAMERMAN, G. 1993. Laser Radar. Chapter 1 of Active Electro-Optical Systems, Vol. 6, The Infrared and Electro-Optical Systems Handbook.
KAWAKITA, M., IIZUKA, K., AIDA, T., KIKUCHI, H., FUJIKAKE, H., YONAI, J., AND TAKIZAWA, K. 2000. Axi-Vision Camera (real-time distance-mapping camera). Appl. Opt. 39, 22, 3931–3939.
KUTULAKOS, K. N., AND STEGER, E. 2007. A theory of refractive and specular 3D shape by light-path triangulation. International Journal of Computer Vision 76, 13–29.
LANGE, R., AND SEITZ, P. 2001. Solid-state time-of-flight range camera. IEEE Journal of Quantum Electronics 37, 3, 390–397.
LJUNG, L. 1987. System Identification: Theory for the User. Prentice Hall Information and System Sciences Series.
MARSCHNER, S. 1998. Inverse Rendering for Computer Graphics. PhD dissertation, Cornell University, Ithaca, NY, USA.
MASSELUS, V., PEERS, P., DUTRE, P., AND WILLEMS, Y. D. 2003. Relighting with 4D incident light fields. In SIGGRAPH '03: ACM SIGGRAPH 2003 Papers, ACM, New York, NY, USA, 613–620.
MCLEAN, E., BURRIS JR., H., AND STRAND, M. 1995. Short-pulse range-gated optical imaging in turbid water. Applied Optics 34, 21.
MESA IMAGING. http://www.mesa-imaging.ch/.
MIYAGAWA, R., AND KANADE, T. 1997. CCD-based range-finding sensor. IEEE Transactions on Electron Devices 44, 10, 1648–1652.
MORRIS, N. J. W., AND KUTULAKOS, K. N. 2007. Reconstructing the surface of inhomogeneous transparent scenes by scatter trace photography. In Proceedings of the 11th International Conference on Computer Vision.
NARASIMHAN, S. G., GUPTA, M., DONNER, C., RAMAMOORTHI, R., NAYAR, S. K., AND JENSEN, H. W. 2006. Acquiring scattering properties of participating media by dilution. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, ACM, New York, NY, USA, 1003–1012.
NAYAR, S. K., IKEUCHI, K., AND KANADE, T. 1990. Shape from interreflections. In Third International Conference on Computer Vision.
NAYAR, S. K., KRISHNAN, G., GROSSBERG, M. D., AND RASKAR, R. 2006. Fast separation of direct and global components of a scene using high frequency illumination. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, ACM, New York, NY, USA, 935–944.
PATOW, G., AND PUEYO, X. 2003. A survey of inverse rendering problems. Computer Graphics Forum 22, 4, 663–687.
PMD TECHNOLOGIES. http://www.pmdtec.com/.
RAMAMOORTHI, R., AND HANRAHAN, P. 2001. A signal-processing framework for inverse rendering. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, ACM, New York, NY, USA, 117–128.
RUSSELL, G., BELL, J., HOLT, P., AND CLARKE, S. 1996. Sonar image interpretation and modelling. In Proceedings of the 1996 Symposium on Autonomous Underwater Vehicle Technology (AUV '96), 317–324.
SCHROEDER, W., FORGBER, E., AND ESTABLE, S. 1999. Scannerless laser range camera. Sensor Review 19, 4, 28–29.
SEITZ, S. M., MATSUSHITA, Y., AND KUTULAKOS, K. N. 2005. A theory of inverse light transport. In Proc. Tenth IEEE International Conference on Computer Vision (ICCV 2005), vol. 2, 1440–1447.
SEN, P., CHEN, B., GARG, G., MARSCHNER, S. R., HOROWITZ, M., LEVOY, M., AND LENSCH, H. P. A. 2005. Dual photography. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, ACM, New York, NY, USA, 745–755.
VANDAPEL, N., AMIDI, O., AND MILLER, J. 2004. Toward laser pulse waveform analysis for scene interpretation. In Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA '04), vol. 1.
VERHAEGEN, M., AND VERDULT, V. 2007. Filtering and System Identification: A Least Squares Approach. Cambridge University Press.
YU, Y., DEBEVEC, P., MALIK, J., AND HAWKINS, T. 1999. Inverse global illumination: recovering reflectance models of real scenes from photographs. In SIGGRAPH '99: Proceedings of the 26th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 215–224.
Appendix: Matlab source code
%% System identification code
% M = # of patches in scene; Delays = pairwise delay matrix
% Y(:,1:T) = time-image output for T time instants
% U(:,1:T) = time-image input for T time instants

% Initial guess for parameter values f(i,j,k)
initialGuess = 0.1.*ones(M,M,M);

% Construct system matrices (implements Section 2)
[A, B, C] = constructSysMat(M, Delays, initialGuess);

% Initial state vector
L0 = zeros(size(A,1), 1);

% Construct state space model with initial estimates
ssModel = idss(A, B, C, 0, 0, L0, 'Ts', 1, ...
    'SSparameterization', 'structured');

% Mark the parameters to be estimated by setting them to NaN
[Aest, Best, Cest] = setParams2NaN(A, B, C);
ssModel.As = Aest; ssModel.Bs = Best; ssModel.Cs = Cest;

% Create I/O data
data = iddata(Y(:,1:T)', U(:,1:T)', 1);

% Prediction error minimization to estimate parameters
modelEstimated = pem(data, ssModel);

% Read out the estimated parameters
estimates = readEstimates(modelEstimated.As, ...
    modelEstimated.Bs, modelEstimated.Cs);