



1 Illumination

Over time, the camera has morphed from a cumbersome view camera into an extremely portable handheld device. But illumination remains expensive, space-consuming and awkward. Given the sophistication of today’s consumer cameras, one can claim that the only significant thing that separates an amateur from a professional today is the choice of lighting. What can we learn from the expert? How can we create programmable lighting that reduces the emphasis on human judgment at the time of capture?

One of the oldest examples of computational illumination is Edgerton’s strobe photography at MIT in the 1930s. Instead of a short-exposure shutter, he used a traditional camera with a novel strobe of very short duration. By varying the flash and camera parameters, one can capture a variety of challenging scenes. What parameters of lighting are programmable? They are (1) presence and absence: we can take a photo with or without an external light; (2) duration and brightness; (3) color, wavelength and polarization; (4) position and orientation; (5) modulation in time (strobing) or space (like a projector). In addition, we can sometimes exploit changes in natural lighting.

1.1 Exploiting Duration and Brightness

1.1.1 Stroboscope for Freezing High-Speed Motion

Harold Edgerton pushed the instantaneous ideal to extremes, using ultra-short strobes to illuminate transient phenomena and ultra-short shutters to measure ultra-bright phenomena, such as in his famous high-speed movies of atomic bomb explosions.

1.1.2 Sequential Multi-Flash Stroboscopy

In some cases, he also used a sequence of flash pulses to superimpose a time-sampled sequence onto a single photograph, such as a golf swing.


Figure 1 An early example of Computational Illumination. By controlling the flash duration, Edgerton demonstrated the freezing of motion.

1.1.3 Flash Dynamic Range

Current cameras use onboard sensors and processing to coarsely estimate the flash level and the exposure settings. But these estimates are based on aggregate measurements and lead to the common problem of over-illumination or under-illumination. It is difficult to find a single flash intensity that lights up distant or dark objects without saturating nearby or bright objects. The quality of the flash/no-flash images may also be limited in terms of dynamic range. Figure 2 shows an example of such an HDR scene. In such cases, Agrawal et al. [Agrawal et al 2005] advocate using several images taken under different flash intensities and exposures to build a flash-exposure HDR image.


Figure 2 Flash Exposure High Dynamic Range Sampling (Agrawal, Raskar, Nayar, Li Siggraph 2005)

Figure 2 shows an example of this exploration of the two-dimensional space of flash intensity and exposure parameters. In a scene, the required flash brightness is a function of scene depth, natural illumination and surface reflectance. For example, a distant point with dark reflectance will require a bright flash. A nearby point that is already well lit by natural lighting will be overexposed by a flash. A faraway point, however, will not be lit even by a very bright flash; the only way to ensure it is well exposed is to use a longer exposure time. So, in general, in a challenging scene one may need to take multiple pictures along the exposure and flash-brightness axes. The example shows photos taken at six exposure settings and four flash-brightness settings; the four flash settings comprise no flash plus three increasing brightness values. Several consumer as well as professional cameras support a manual setting for flash intensity. Taking 24 images may be overkill, however. Agrawal et al. [Agrawal et al 2005] present a greedy approach: after taking each photo they analyze the pixel values for over- or under-exposure and suggest the next optimal exposure and flash setting. By adaptively sampling the flash-exposure space they minimize the number of images captured for any given scene.
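To make the greedy idea concrete, the following toy sketch in Python picks the next (exposure, flash) setting from simple under/over-exposure counts. The thresholds and the update rule are invented for illustration; this is the flavor of adaptive sampling, not the actual optimization in [Agrawal et al 2005].

import numpy as np

def next_setting(img, e, f, n_exposures, n_flash, low=0.05, high=0.95):
    # img: grayscale photo in [0, 1] taken at exposure index e, flash index f
    under = np.mean(img < low)    # fraction of under-exposed pixels
    over = np.mean(img > high)    # fraction of saturated pixels
    if under > over:
        # dark regions dominate: try a stronger flash first (helps nearby
        # dark objects), then a longer exposure (helps points the flash
        # cannot reach)
        if f + 1 < n_flash:
            return e, f + 1
        if e + 1 < n_exposures:
            return e + 1, f
    elif over > under:
        # saturation dominates: back off the flash first, then the exposure
        if f > 0:
            return e, f - 1
        if e > 0:
            return e - 1, f
    return None  # no obvious problem left; stop capturing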

1.2 Presence or Absence of Flash

The simplest form of computational illumination is perhaps the ubiquitous camera flash. [DiCarlo et al 2001] first explored the idea of capturing a pair of images from the same camera position: one illuminated with ambient light only, and the other using the camera flash as an additional light source. They use this image pair to estimate object reflectance functions and the spectral distribution of the ambient illumination. [Hoppe et al 2003] acquire multiple photos under different flash intensities, and allow the user to interpolate between them to simulate intermediate flash intensities.

1.2.1 Flash/No-flash Pair for Denoising

Concurrent work by [Petschnigg et al. 2004] and [Eisemann et al. 2004] proposed very similar techniques for combining the information in a flash and no-flash image pair into a single improved image. The no-flash photo captures large-scale illumination effects such as the ambiance of the scene, but in a low-light situation it generally has excessive noise. The flash photo, in contrast, has much lower noise and more high-frequency detail, but fails to preserve the mood of the scene. The basic idea is to decouple the high- and low-frequency components of the images, and then recombine them to preserve the desired characteristics: detail from the flash photo, and large-scale ambiance from the no-flash photo. This decoupling is achieved using a modified bilateral filter called the joint bilateral filter.

Smoothing an image with a bilateral filter produces an edge-preserving blur: it extracts a low-frequency component of the image while still preserving sharp features. In the joint bilateral filter, the intensity differences in the flash photo are used instead. Since the flash photo has lower noise, this gives a better result and avoids over- or under-blurring.
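A direct, unoptimized sketch of the joint bilateral filter in Python/NumPy, assuming registered single-channel images in [0, 1]. The spatial weights come from pixel distance as usual, but the edge-stopping range weights are computed from the low-noise flash image:

import numpy as np

def joint_bilateral(noflash, flash, radius=5, sigma_s=3.0, sigma_r=0.1):
    # Smooth the no-flash image, but let the flash image decide where
    # the edges are.
    h, w = noflash.shape
    out = np.zeros_like(noflash)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    nf = np.pad(noflash, radius, mode='edge')
    fl = np.pad(flash, radius, mode='edge')
    for y in range(h):
        for x in range(w):
            win_nf = nf[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            win_fl = fl[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # range weights use flash intensities, not the noisy no-flash ones
            rng = np.exp(-(win_fl - flash[y, x])**2 / (2 * sigma_r**2))
            wgt = spatial * rng
            out[y, x] = np.sum(wgt * win_nf) / np.sum(wgt)
    return out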

Figure 3 Eisemann and Durand, Siggraph 2004


Figure 4 Combining a no-flash and flash image. (Left) Top: photograph taken in a dark environment; the image is noisy and/or blurry. Bottom: flash photography provides a sharp but flat image with distracting shadows at the silhouettes of objects. (Middle) Zoom showing the noise of the available-light image. (Right) The technique merges the two images to transfer the ambiance of the available lighting. Note the shadow of the candle on the table. (Courtesy Elmar Eisemann and Fredo Durand, 2004)

1.2.2 Removing Flash Artifacts

Flash images are known to suffer from several problems: saturation of nearby objects, poor illumination of distant objects, reflections of objects strongly lit by the flash, and strong highlights due to the reflection of the flash itself by glossy surfaces. One approach is to use a flash and no-flash (ambient) image pair to produce a better flash image. Agrawal et al. [Agrawal et al 2005] rely on the observation that the orientation of image gradients due to reflectance and geometry is illumination invariant, while the orientation of gradients due to changes in illumination is not. They propose a gradient projection scheme, based on a gradient coherence model, to separate the illumination effects from the rest of the image.

Figure 5 below shows flash and ambient images of a painting, where the ambient image includes annoying reflections of the photographer. The low-exposure flash image avoids reflections, but has a hot spot. One can remove the reflections in the ambient image by removing the component of the ambient image gradients perpendicular to the flash image gradients. Reconstruction from the projected gradients yields a reflection-free result; reconstruction from the residual gradients recovers the reflection layer. The work by Agrawal et al. [Agrawal et al 2005] also shows how to compensate for flash intensity falloff due to depth by exploiting the ratio of the flash and no-flash photos.
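A minimal sketch of the projection step (Python/NumPy), assuming registered single-channel ambient and flash images. Each ambient gradient is projected onto the corresponding flash gradient direction; the residual carries the reflection layer. Integrating the projected gradient field back into an image (e.g. via a Poisson solve) is omitted here:

import numpy as np

def project_gradients(ambient, flash, eps=1e-6):
    # Per-pixel projection of ambient gradients onto flash gradients;
    # components perpendicular to the flash gradient (the reflections)
    # are discarded.
    gay, gax = np.gradient(ambient)
    gfy, gfx = np.gradient(flash)
    dot = gax * gfx + gay * gfy
    norm2 = gfx**2 + gfy**2 + eps
    scale = dot / norm2                      # signed projection length
    px, py = scale * gfx, scale * gfy        # reflection-free gradients
    rx, ry = gax - px, gay - py              # residual = reflection layer
    return (px, py), (rx, ry)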



Figure 5 Removing flash artifacts with gradient vector projection. Undesirable artifacts in photography can be reduced by comparing image gradients at corresponding locations in a pair of flash and ambient images. (Agrawal, Raskar, Nayar, Li Siggraph 2005)

1.2.3 Flash-based Matting

http://research.microsoft.com/research/pubs/view.aspx?pubid=1614

(From the paper’s abstract: In this paper, we propose a novel approach to extract mattes using a pair of flash/no-flash images. Our approach, which we call flash matting, was inspired by the simple observation that the most noticeable difference between the flash and no-flash images is the foreground object, if the background scene is sufficiently distant. We apply a new matting algorithm called joint Bayesian flash matting to robustly recover the matte from flash/no-flash images, even for scenes in which the foreground and the background are similar or the background is complex. Experimental results involving a variety of complex indoor and outdoor scenes show that it is easy to extract high-quality mattes using an off-the-shelf, flash-equipped camera. We also describe extensions to flash matting for handling more general scenes.

Show a result here on flash-based matting.)

1.3 Modifying Color and Wavelength

HAEBERLI, P. 1994. Grafica Obscura web site. http://www.graficaobscura.com/synth/index.html



1.4 Position and Orientation of Lighting

We can also change the position and orientation of the lights. Changing the position introduces new shading as well as shadows. For a light with a shaped output profile, changing the orientation changes the absolute intensity at each scene point, but it does not change the incident direction of light at any point in the scene.

1.4.1 Shaping Lighting using Reflectors and Guides

1.4.2 Shape and Detail Enhancement using Multi-Position Flashes

Raskar et al. [Raskar et al 2004] used a multi-flash camera to find the silhouettes in a scene. They take four photos of an object with four different light positions (above, below, left and right of the lens). Shadows are cast along the depth discontinuities, and detecting these shadows locates the discontinuities in the scene. The detected silhouettes are then used for stylizing the photograph and highlighting important features. They also demonstrate silhouette detection in video using a repeated fast sequence of flashes. A simplified version of the detection step is sketched below.
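This Python/NumPy sketch captures the core of the detection: in each flash image, a pixel that is lit while its neighbor one step along the flash direction falls into shadow sits at a depth edge. The one-pixel shift and threshold are illustrative simplifications; the published algorithm traverses full epipolar rays from each flash epipole.

import numpy as np

def depth_edges(imgs, flash_steps, thresh=0.75):
    # imgs: list of grayscale photos, one per flash position
    # flash_steps: integer (dy, dx) step from each flash toward the scene
    imax = np.maximum.reduce(imgs) + 1e-6   # shadow-free max composite
    edges = np.zeros(imgs[0].shape, dtype=bool)
    for img, (dy, dx) in zip(imgs, flash_steps):
        ratio = img / imax                  # ~1 where lit, ~0 in flash shadow
        # a lit-to-shadow transition along the flash direction marks a
        # depth edge, with the shadow on its far side
        shifted = np.roll(ratio, (dy, dx), axis=(0, 1))
        edges |= (ratio > thresh) & (shifted < thresh)
    return edges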


Figure 6 Multi-flash Camera for Depth Edge Detection. (Left) A camera with four flashes. (Right) Photos due to individual flashes, highlighted shadows, and epipolar traversal to compute single-pixel-wide depth edges.

1.4.3 Relighting using Domes and Light Waving

Light fields [Levoy 1996] and the Lumigraph [Gortler 1996] reduced the more general plenoptic function [Adelson 1991] to a four-dimensional function L(u, v, s, t) that describes the presence of light in free space, ignoring the effects of wavelength and time. Here (u, v) and (s, t) are coordinates on two parallel planes that together describe a ray of light in space. A slightly different parameterization can be used to describe the incident light field on an object. If we think of the object as surrounded by a whole sphere of imaginary inward-looking projectors, (θ, φ) describes the angular position of a projector on the unit sphere, and (u, v) the pixel position within that projector. The function Li(u, v, θ, φ) thus gives complete control over the incident light on an object in free space. Similarly, a sphere of inward-looking cameras would capture the entire radiant light field of an object, Lr(u, v, θ, φ). Debevec et al. [Debevec et al 2001] introduced the 8D reflectance field that describes the relationship between the incident and radiant light fields of a scene. An additional dimension of time is sometimes added to describe light interaction with an object that changes over time.

While the reflectance field gives a complete description of how light interacts with a scene, acquiring the complete function would require enormous amounts of time and storage. Significant work has gone into acquiring lower-dimensional subsets of this function and using them for restricted relighting and rendering.

Most image-based relighting work relies on the simple observation that light interacts linearly with materials [Nimeroff 1994, Haeberli 1992]. If a fixed camera makes an image Ii of a fixed scene lit only by a light Li, then the same scene lit by many lights scaled by weights wi will make an image Iout = Σi wi Ii. Adjusting the weights lets us “relight” the image, as if the weights modulated the lights rather than the images.
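In code, relighting from a basis of single-light photos is just a weighted sum. A minimal Python sketch, with the basis stored as an (n, h, w, 3) array (an assumed layout):

import numpy as np

def relight(basis, weights):
    # basis: (n, h, w, 3) array, image i lit only by light i
    # weights: length-n array of light intensities w_i
    # returns I_out = sum_i w_i * I_i, by linearity of light transport
    return np.tensordot(weights, basis, axes=1)

# e.g. relight(stack, np.array([0.5, 2.0, 0.0]))
# dims light 0, boosts light 1 and turns light 2 off entirely.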

Debevec et al. [Debevec et al 2001] used a light stage, comprising a light mounted on a rotating robotic arm, to acquire the non-local reflectance field of a human face. The point-like light source can be thought of as a simplified projector with a single pixel, reducing the incident light field to a 2D function. They acquired images of the face from a small number of cameras under densely sampled lighting directions, and demonstrated the generation of novel images from the original viewpoints under arbitrary illumination. This is done by simply adjusting the weights wi to match the desired illumination intensity from each direction. They are also able to simulate small changes in viewpoint using a simple model of skin reflectance. Hawkins et al. [Hawkins et al 2001] used a similar setup for digitizing cultural artifacts, arguing for the use of reflectance fields in digital archiving instead of geometric models and reflectance textures. Koudelka et al. [Koudelka et al 2001] acquire a set of images from a single viewpoint as a point light source moves around the object, estimate the surface geometry using two sets of basis images, then estimate the apparent BRDF for each pixel and use it to render the object under arbitrary illumination.

Debevec et al. [Debevec et al 2002] proposed an enhanced light stage comprising a large number (156) of inward-pointing LEDs distributed on a spherical structure, about two meters in diameter, around the actor. Each light can be set to an arbitrary color and intensity to simulate the effect of a real-world environment around the actor. The images gathered by the light stage, together with a mask of the actor captured using infrared sources and a detector, were used to seamlessly composite the actor into a virtual set while maintaining consistent illumination. Malzbender et al. [Malzbender et al 2001] used 50 inward-looking flashes placed on a hemispherical dome, together with a novel scheme for compressing and storing the 4D reflectance field, called the Polynomial Texture Map. They assume that the color of a pixel changes smoothly as the light moves around the object, and store only the coefficients of the biquadratic polynomial that best models this change at each pixel. This highly compact representation allows real-time rendering of the scene under arbitrary illumination and works fairly well for diffuse objects; specular highlights, however, are not modeled well by the polynomial and result in visual artifacts.
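The Polynomial Texture Map model is compact enough to state in a few lines: each pixel stores six coefficients of a biquadratic in the projected light direction (lu, lv), so that L = a0·lu² + a1·lv² + a2·lu·lv + a3·lu + a4·lv + a5. The sketch below (Python) evaluates the model and fits the coefficients by least squares from images under known lights; the array layout is an assumption, but the polynomial is the published one:

import numpy as np

def ptm_eval(coeffs, lu, lv):
    # coeffs: (h, w, 6) per-pixel biquadratic coefficients
    # (lu, lv): projection of the unit light direction onto the image plane
    a = coeffs
    return (a[..., 0] * lu * lu + a[..., 1] * lv * lv +
            a[..., 2] * lu * lv + a[..., 3] * lu +
            a[..., 4] * lv + a[..., 5])

def ptm_fit(images, light_dirs):
    # images: (n, h, w) luminances under n known lights (n >= 6)
    # light_dirs: (n, 2) array of (lu, lv) per image
    lu, lv = light_dirs[:, 0], light_dirs[:, 1]
    A = np.stack([lu * lu, lv * lv, lu * lv, lu, lv,
                  np.ones_like(lu)], axis=1)          # (n, 6) design matrix
    n, h, w = images.shape
    coeffs, *_ = np.linalg.lstsq(A, images.reshape(n, -1), rcond=None)
    return coeffs.T.reshape(h, w, 6)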

The free-form light stage [Masselus 2002] presented a way to acquire a 4D slice of the reflectance field without an extensive light stage. Instead, a handheld light source is moved freely around the object. The light position is estimated automatically from four diffuse spheres placed near the object in the field of view of the camera. The data acquisition time was reported as 25-30 minutes. Winnemoeller et al. [Winnemoeller et al 2005] used dimensionality reduction and a slightly constrained light scanning pattern to estimate approximate light source positions without the need for any additional fiducials in the scene.

Akers et al. [Akers et al 2003] use spatially varying image weights on images acquired with a light stage similar to [Debevec et al 2001], with a painting interface that allows an artist to locally modify the relit image as desired. While the spatially varying mask gives greater flexibility, it may also give results that are not physically realizable and look unrealistic. [Anrys et al. 2004] and [Mohan et al. 2005] used a similar painting interface to help a novice user with lighting design for photography. The user sketches a target image, and the system finds optimal weights for each basis image to produce the physically realizable result closest to the target. [Mohan et al. 2005] argue that accurate calibration is unnecessary for photographic relighting, and propose a novel reflector-based acquisition system: a moving-head gimbaled disco light is placed inside a diffuse enclosure together with the object to be photographed. The spot cast by the light on the enclosure acts as an area light source that illuminates the object, and is moved by simply rotating the light while capturing images for various light positions. The idea of area light sources was also used in Bayesian relighting [Fuchs 2005].
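One way to realize the “closest physically realizable result” is non-negative least squares over the basis images, sketched below in Python. The non-negativity constraint guarantees no light is given a negative intensity; this formulation is an assumption in the spirit of these papers, not their exact objective:

import numpy as np
from scipy.optimize import nnls

def fit_light_weights(basis_images, target):
    # basis_images: list of n images, image i lit only by light i
    # target: user-sketched target image of the same shape
    # Solve min_w || A w - target ||^2 subject to w >= 0, so the
    # result sum_i w_i * I_i is achievable with real lights.
    A = np.stack([im.ravel() for im in basis_images], axis=1)  # (pixels, n)
    w, _residual = nnls(A, target.ravel())
    return w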


1.4.4 Towards Reflectance Fields Capture in 4D, 6D and 8D

1.5 Modulation in Space

We can create an intelligent flash that behaves much like a projector. Shree Nayar coined the term ‘CamPro’ for a projector that supports the operation of a camera. Here we can change not only the overall brightness but also the radiance of every ray emitted from the projector-flash. In the future, the projector may be replaced by smart lasers or by light sources with highly programmable mask patterns in front of them.

1.5.1 Projector for Structured Light

1. http://eia.udg.es/~jpages/ReportCodedLight03.pdf

(Coded structured light is considered one of the most reliable techniques for recovering the surface of objects. This technique is based on projecting a light pattern and imaging the illuminated scene from one or more points of view. Since the pattern is coded, correspondences between image points and points of the projected pattern can be easily found. The decoded points can be triangulated and 3D information is recovered. We present an overview of the existing techniques, as well as a new and definitive classification of patterns for structured light sensors. We have implemented a set of representative techniques in this field and present some comparative results. The advantages and constraints of the different patterns are also discussed.)

Such structured light schemes have been improved with codes that also exploit stripe boundaries. http://graphics.stanford.edu/papers/realtimerange/

2. Space-time coding: http://grail.cs.washington.edu/projects/stfaces/
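As a concrete example of pattern coding, the sketch below (Python) generates binary-reflected Gray code stripe patterns and decodes the projector column seen by each camera pixel. Thresholding the captured photos into bit planes and the final triangulation are omitted:

import numpy as np

def gray_code_patterns(width, n_bits=10):
    # One row per bit plane (MSB first); tile each row vertically to the
    # projector height to get the actual stripe patterns.
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)               # binary-reflected Gray code
    return np.stack([(gray >> b) & 1 for b in reversed(range(n_bits))])

def decode_columns(bits):
    # bits: (n_bits, h, w) binarized camera images, MSB first.
    # Gray-to-binary conversion is a running XOR over the bit planes.
    acc = np.zeros(bits.shape[1:], dtype=np.int64)
    code = np.zeros_like(acc)
    for plane in bits:
        acc ^= plane.astype(np.int64)
        code = (code << 1) | acc
    return code                             # projector column per pixel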

1.5.2 Masks for Shadows and Light Attenuation

1. Nayar, direct/global separation:
http://www1.cs.columbia.edu/CAVE/projects/separation/
http://www1.cs.columbia.edu/CAVE/projects/separation/separation_gallery.php

2. Raskar and Prakash, motion capture: http://www.merl.com/people/raskar/LumiNetra/

1.6 Modulation in Time

We can also change the pattern of the flash in time, using strobes to synchronize with activity in the scene.

1.6.1 High Frequency Strobes for Freezing Periodic Motion

We can slow down or freeze high-speed periodic motion using a strobe whose frequency nearly matches the frequency of the motion. For example, vocal folds vibrating at 1000 Hz can be viewed with a laryngoscope with auxiliary lighting. If the strobe also flashes at 1000 Hz, the vocal folds appear frozen as long as the person maintains a continuous pitched sound. If the strobe instead flashes at 999 Hz, the slight mismatch creates the illusion that the vocal folds are moving only once per second (1 Hz). This makes it easy for the observing doctor to assess the correctness of the vocal fold movement and to detect any distortions in the fold shape. http://www.divop.com/downloads/SS109BOV.pdf
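The apparent slow motion is just temporal aliasing: the motion advances some fraction of a cycle between flashes, and only that fraction is visible. A small arithmetic sketch (the 999 Hz figure matches the slow-motion case above):

def apparent_frequency(f_motion, f_strobe):
    # cycles of motion completed between consecutive flashes
    frac = (f_motion / f_strobe) % 1.0
    if frac > 0.5:
        frac -= 1.0         # motion appears to run backwards
    return frac * f_strobe  # apparent cycles per second

# apparent_frequency(1000.0, 1000.0) -> 0.0 Hz (frozen)
# apparent_frequency(1000.0, 999.0)  -> ~1.0 Hz (slow motion)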

1.6.2 Colored Strobes for Trailing Edges of Motion

Sometimes two strobes of different colors are used, with a phase delay between them or with slightly different frequencies. If the object is static, the two colors simply add up. If the object is moving, it shows colored trails.

1.7 Exploiting Natural Illumination Variations

Sometimes we cannot actively change the illumination for photography, but we can still exploit natural variations, such as the change in sunlight over the course of a day.

1.7.1 Intrinsic Images

Decomposing images into layers, parts, and other types of pieces is often a useful image processing task. In an intrinsic image decomposition, the goal is to decompose the input image I into a reflectance image R and an illumination image L such that

I(x, y) = R(x, y) × L(x, y), or equivalently log I = log R + log L.

(More at
http://www.cs.toronto.edu/~zemel/Courses/CS2541/Lect/intrinsic.pdf
http://www.ai.mit.edu/courses/6.899/papers/13_02.PDF)
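One well-known instance, used for webcam sequences like the one in Figure 8 below, estimates reflectance from a registered image stack shot under varying illumination: reflectance edges persist across frames while illumination edges move, so the per-pixel median of the log-image gradients estimates the log-reflectance gradient [Weiss 2001]. A minimal sketch (Python), leaving out the final Poisson integration that turns the gradient field back into an image:

import numpy as np

def reflectance_gradients(images, eps=1e-6):
    # images: list/stack of registered frames under varying illumination
    logs = np.log(np.stack(images) + eps)            # (n, h, w)
    gx = np.median(np.diff(logs, axis=2), axis=0)    # median horiz. log-grad
    gy = np.median(np.diff(logs, axis=1), axis=0)    # median vert. log-grad
    # (gx, gy) is the log-reflectance gradient field; integrate it
    # (Poisson solve) to get log R, then L = I / R per frame.
    return gx, gy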

Figure 7 Intrinsic Images. The goal is to decompose an image into its reflectance (intrinsic) and illumination layer.


Figure 8 Intrinsic Images from a webcamera sequence. (Permission Yair Weiss)

Every image is the product of the characteristics of a scene. Two of the most important characteristics are its shading and reflectance: the shading of a scene is the interaction of its surfaces with the illumination, while the reflectance describes how each point reflects light. The ability to recover the reflectance of each point in the scene and how it is shaded is important because interpreting an image requires deciding how these two factors affect it. For example, the geometry of an object in the scene cannot be recovered without isolating the shading at every point. Likewise, segmentation would be simpler given the reflectance of each point in the scene. Tappen et al. present a system which finds the shading and reflectance of each point in a scene by decomposing an input image into two images, one containing the shading of each point and the other containing the reflectance of each point. These two images are a form of the representation known as intrinsic images [Barrow and Tenenbaum 1978], because each image contains one intrinsic characteristic of the scene.
(http://people.csail.mit.edu/people/mtappen/nips02_final.pdf)

1.7.2 Context Enhancement of Night-time Photos

http://www.merl.com/people/raskar/NPAR04/


Figure 9 The night-time photo is context-enhanced into the photo on the right using a prior daytime image. (Raskar, Ilie, Yu 2004)
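A much simplified version of the idea is a spatially varying blend that keeps the night image where it is well lit and falls back to the daytime image elsewhere. This toy weighted blend is not the gradient-domain fusion used in the actual paper, and the window size and thresholds are invented:

import numpy as np
from scipy.ndimage import gaussian_filter

def context_enhance(night, day, sigma=15, lo=0.1, hi=0.5):
    # night, day: registered float RGB images in [0, 1] of the same view
    lum = night.mean(axis=2)                 # night-image luminance
    # smooth local brightness -> per-pixel weight in [0, 1]
    w = np.clip((gaussian_filter(lum, sigma) - lo) / (hi - lo), 0.0, 1.0)
    # keep night pixels where they carry information, day context elsewhere
    return w[..., None] * night + (1.0 - w[..., None]) * day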

1.7.3 Shadow Matting

http://grail.cs.washington.edu/projects/digital-matting/shadow-matting/
