deriving intrinsic images from image sequencesyweiss/iccv01.pdf · cylinder as a single segment in...

Deriving intrinsic imagesfrom imagesequences

Yair WeissComputerScienceDivision

UC BerkeleyBerkeley, CA [email protected]

Abstract

Intrinsic imagesare a usefulmidlevel descriptionof scenesproposedby Barrow and Tenenbaum[1]. An image is de-composedinto two images: a reflectanceimage andan il-luminationimage. Finding such a decompositionremainsa difficult problem in computervision. Here we focuson a slightly easierproblem: given a sequenceof

�im-

ageswhere thereflectanceis constantandthe illuminationchanges,canwerecover

�illumination imagesanda sin-

glereflectanceimage?Weshowthat thisproblemis still ill-posedandsuggestapproachingit asa maximum-likelihoodestimationproblem.Following recentworkon thestatisticsof natural images, we usea prior that assumesthat illu-minationimageswill give rise to sparsefilter outputs. Weshowthat this leadsto a simple, novel algorithm for re-covering reflectanceimages. We illustrate the algorithm’sperformanceon real andsyntheticimagesequences.

In: Proc ICCV (2001)

1 Introduction

Barrow andTenenbaum(1978)introducedtheterm“intrin-sic images”to referto a midlevel decompositionof thesortdepictedin figure 1. The observed imageis a productoftwo images:anillumination imageandareflectanceimage.We call this a midlevel descriptionbecauseit falls shortofa full, 3D descriptionof the scene:theintrinsic imagesareviewpoint dependentandthephysicalcausesof changesinillumination at differentpointsarenot madeexplicit (e.g.thecastshadow versustheattachedshadows in figure1c).

Barrow andTenenbaumarguedthatsucha midlevel de-scription,depsitenotmakingexplicit all thephysicalcausesof imagefeatures,canbeextremelyusefulfor supportingarangeof visualinferences.Forexample,thetaskof segmen-tationmaybepoorly definedon the input imageandmanysegmentationalgorithmsmakeuseof arbitrarythresholdsinorderto avoid beingfooledby illuminationchanges.Ontheintrinsic, reflectanceimage,on the otherhand,even prim-itive segmentationalgorithmswould correctlysegmentthecylinder asa singlesegmentin figure 1b. Similarly, view-

basedtemplatematchingandshape-from-shadingwouldbesignificantlylessbrittle if they couldwork on the intrinsicimagerepresentationratherthanon theinput image.

Recoveringtwo intrinsic imagesfrom a singleinput im-ageremainsa difficult problemfor computervision sys-tems. This is a classicill-posed problem: the numberofunknowns is twice the numberof equations.Denotingby��

the input imageandby � � ��the reflectanceim-

ageand � � ��the illumination image,the threeimages

arerelatedby: �� (1)

Obviously, one can always set � ��and satisfy

the equationsby setting � � �� . Despitethis

difficulty, someprogresshasbeenmadetowardsachiev-ing this decomposition.LandandMcCann’s Retinex algo-rithm [7] couldsuccessfullydecomposescenesin whichthereflectanceimagewaspiecewise constant.This algorithmhasbeencontinuouslyextendedover the years(e.g. [4]).More recently, FreemanandViola [3] have useda waveletprior to classify imagesinto one of two classes:all re-flectanceor all illumination.

In this paperwe focuson a slightly easierversionof theproblem.Givena sequenceof

�images� �� ! "�# in

whichthereflectanceis constantovertimeandonly theillu-minationchanges,canwethensolvefor asinglereflectanceimage� ��

and�

illumination images� � ��$�%�! "�# ?Ourwork wasmotivatedby theubiquityof imagesequencesof this form on the world wide web: “webcam” picturesfrom outdoorscenesasin figure2. Thecamerais stationaryandthesceneis mostlystationary:thepredominantchangein thesequencearechangesin illumination.

While this problemseemseasier, it is still completelyill-posed: at every pixel thereare

�equationsand

�'&(�unknowns. Again, one can simply set � � ��)�*�

and� � ��+�,�� andfully satisfytheequations.Ob-

viously, someadditionalconstraintsareneeded.Oneversionof this problemthat hasreceivedmuchat-

tentionis thecasewhen � � ��areattachedshadows of

asingle,convex, lambertiansurface: � � ��-�/.01� ��21

Input = reflectance 3 illumination

a b c

Figure1: Theintrinsic imagedecomposition[1].

Figure 2: Imagesfrom a “webcam” at www.berkeley.edu/webcams/sproul.html. Most of the changesare changesinillumination. Canwe usesuchimagesequencesto derive intrinsic images?.4 � ��

, with05��

thesurfacenormalat��

and.4

a vec-tor in the directionof the light source. This is the photo-metricstereoproblemwith unknown light-source,andcanbe solved usingSVD techniquesup to a GeneralizedBasReliefambiguity[12, 6].

Farid andAdelsonaddresseda specialcaseof this prob-lem [2]. They assumedthat � ��6�879��

, i.e.that all illumination imagesare relatedby a scalar. Theyusedindependentcomponentanalysisto solve for � � ��and � ��

.

A similar problem has recently been addressedbySzeliski,AvidanandAnandan[11]. They dealtwith addi-tive transparency sequencesso that

��:� � � ��;&� � ��<=�?>A@��5<=�?>ABC��. Of course, if we exponentiate

both sideswe get equation1 with an additionalconstraintthat the illumination imagesare warpedimagesof a sin-gle illumination image. They showed that since � ��is boundedbelow (i.e. that � � ��

is positive), settingD� ��E�(FHGJI ! �� cangive a goodestimatefor thereflectance.They usedthemin filter asaninitialization fora secondestimationprocedurethatestimatedthemotionof� � ��

andimprovedtheestimate.

In thispaperwe takeadifferentapproach.Weformulatethe problemasa maximum-likelihoodestimationproblembasedon the assumptionthat derivative-like filter outputs

applied to � will tend to be sparse. We derive the MLestimatorunder this assumptionand show that it gives asimple,novel algorithmfor recovering reflectance.We il-lustratethe algorithm’s performanceon real andsyntheticimagesequences.

2 ML estimator assuming sparseness

For convenience,we work here in the log domain. De-note by K � ��L��M�N�� the logarithmsof the in-put, reflectanceand illumination images. We are given:K � ��O�PL��Q&RN��

andwish to recoverL��

andN��

.To make theproblemsolvable,wewantto assumeadis-

tribution overN� ��

. Our first thoughtwas to make asimilar assumptionto that madein the Retinex work: thatillumination imagesarelower contrastthanrelfectanceim-ages. We found, however, that while this may hold truein the Mondrianworld studiedby LandandMcCann,it israrely true for theoutdoorscenesof the typeshown in fig-ure2. Edgesdueto illumination oftenhave ashigh a con-trastasthosedueto reflectancechanges.

Weuseaweaker, moregenericprior thatis motivatedbyrecentwork on the statisticsof naturalimages.A remark-ably robust property of natural imagesthat has received

2

a b

−4 −3 −2 −1 0 1 2 3 4 50

2000

4000

6000

8000

10000

12000

−0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 10

5000

10000

15000

−1 −0.5 0 0.5 10

0.2

0.4

0.6

0.8

1

x

exp(

−|a

x|)

c d e

Figure3: Weuseaprior motivatedby recentwork onthestatisticsof naturalscenes.Derivativefilter outputstendto besparsefor a wide rangeof images.a-b images: c-d histogramsof horizontalderivative filter outputs.e A Laplciandistribution.Notethesimilar shapeto theobservedhistograms.

muchattentionlately is thefact thatwhenderivative filtersareappliedto naturalimages,the filter outputstendto besparse[8, 10]. Figure3 illustratesthis fact: the imageofthefaceandtheoutdoorscenehavesimilar histogramsthatarepeakedat zeroandfall off muchfasterthana Gaussian.This propertyis robust enoughthat it continuesto hold ifwe apply a pixelwise log function to eachimage(the his-togramsshown areactuallyfor thelog images).Thesepro-totypicalhistogramscanbewell fit by aLaplaciandistribu-tion S � �� #TVUXW�Y�Z @ Z (althoughbetterfits areobtainedwithrichermodels).Figure3eshowsaLaplaciandistribution.

We will thereforeassumethatwhenderivativefilters areappliedto

N��the resultingfilter outputsare sparse:

more exactly, we will assumethe filter outputsare inde-pendentover spaceandtime andhave a Laplaciandensity.Assumewehave

0filters �%[]\ � wedenotethefilter outputs

by ^]\ �� K�_+[A\ . We useL \ to denotethereflectance

imagefilteredby the ` th filterL \ ��L _a[A\ .

Claim 1: Assumefilter outputsappliedtoN� ��

areLaplaciandistributedandindpendentover spaceandtime.Thenthe ML estimateof the filtered reflectanceimage

DL \aregivenby: DL \ � ��cb+dfe�g hji ! ^]\ �� (2)

Proof: AssumingLaplaciandensitiesandindependence

yieldsthelikelihood:S � ^]\lk L \ m� �npo@rq B%q ! U W�s�Z t�uXv @rq B%q ! w W�xyujv @rq B w Z (3)� �n U Wzs-{}|M~ �$~ �CZ t u v @rq B%q ! w W�x u v @rq B w Z (4)

Maximizing the likelihoodis equivalentto minimizing thesum of absolutedeviations from ^ \ � �� . The sum ofabsolutevalues(or

N # norm) is minimizedby the median.�.Claim 1 gives us the ML estimatefor the filtered re-

flectanceimagesDL \ . To recover

L, theestimatedreflectance

function, we solve the overconstrainedsystemsof linearequations: []\+_ DL�� DL \ (5)

It canbeshown thatthepsuedo-inversesolutionis givenby:DL�� _�� \ [ x\ _ DL \C� (6)

with [ x\ the reversedfilter of [ \ : [ \ � �� [ x\ ��<��<��and

�asolutionto:� _�� \ [ x\ _a[]\j� �'�

(7)

3

frame 1

P

Q

frame 2

P

Q

frame 3

P

Q

horiz filter vertical filter



reflectance image

P

Q

median horiz median vertical

Figure4: An illustrationof theML estimationalgorithm

Notethat�

is indpendentof theimagesequencebeingcon-sideredandcanbecomputedin advance.

Figure4 illustratesthealgorithmfor calculatingtheMLreflectancefunction. The threeframesin the leftmostcol-umn show a circle illuminatedwith a squarecastshadow.The shadow is moving over time. The middle and rightcolumnsshow the horizontalandverticalfilter outputsap-pliedto thissequence.Takingapixelwisemedianovertimegivestheestimatedfilteredreflectanceimagesin thebottomrow. Finally, applyingequation6 givesthe ML estimatedreflectancefunction. Oncewe have anestimatefor

L�� wecanestimate

N��;� K � ��;<)L�� .

The ML estimatehassomesimilaritiesto the temporalmedianfilter that is often usedin accumulatingmosaicsfrom imagesequences(e.g.[9]) but hasvery differentper-formancecharacteristics.In figure 4 taking the temporalmedianof the threeframesin the left column would notgive the right reflectancefunction. A pixel whoseinten-sity is bright in all frames(e.g. pixel P in figure 4a) anda pixel whoseintensity is dark in all frames(e.g. pixel Qin figure 4a) must have different medians. Thus, a pixelwhoseintensityis alwaysdarkmustbeestimatedashavingadifferentreflectancethatapixel whoseintensityis alwayslight.

In the ML estimate,on the other hand, this is not the

case.Notethatpixels S and � areestimatedashaving thesamereflectanceeven thoughonewasalways lighter thantheother. This is becausetheML estimateperformsa tem-poralmedianonthefilteredimages,not theoriginal images.In termsof probabilisticmodeling,a temporalmedianfilteron imagesis theML estimateif we assumedthat

N� ��is sparse,i.e. thatmostpixelsin

N�� arecloseto zero.

This is rarely true for naturalimages.In contrast,hereweareassumingthat the filter outputsappliedto

N��are

sparse,anassumptionthatoftenholdsfor naturalimages.What if [ \ _ N� ��

doesnot have exactly a Laplaciandistribution?Thefollowingclaimshowsthattheexactformof thedistribution is not importantaslong asthefilter out-putsaresparse.

Claim 2: Let �� S � k []�C_ N�� k�� . Thenthees-timatedfilteredreflectancesarewithin � of thetruefilteredreflectanceswith probabilityat least:�z�� "�#+� � �=� �y��< �� W � � ��

Proof: If morethan �A�C� of thesamplesof [ \ _ N� ��are within � of somevalue, then by the definition of themedian,the medianmust be within � of that value. Theclaim follows from thebinomial formula for the sumof

�independentevents.

�4

Claim 2 doesnot require that the illumination imageshave a Laplaciandistribution in their filter outputs,ratherthemoresparsethefilter outputsarethequickerthemedianestimatewill convergeto the truefiltered reflectancefunc-tion. For example,thelighting imagein figure1c,doesnothaveaLaplaciandistributionbut is verysparse: j�X� of thefilter outputshave magnitudelessthan � �(� � of themax-imal magnitude.Claim 2 guaranteesthe k DL \ <¡L \�kC�� withprobability at least ��¢ £r¤ given only threeframesandwithprobabilityat least�z¢ £C¥ givenfiveframes.

3 Results - synthetic sequences

All theresultsshown in thispaperusedjust two filters: hor-izontalandverticalderivativefilters.

Figure5 show thefirst andlast framesfrom a sequencein which a squarecastshadow is moving over a circle andellipse.Notethatthecircle is alwaysin shadow andthattheellipseis alwayshalf in shadow. Despitethis, the ML re-flectanceestimatecorrectlygetsrid of theshadowsonboth.For comparison,figure5 alsoshows thetemporalmin filterandtemporalmeanfilter. Both of theseapproachessufferfrom the fact that if a pixel is always in shadow, the esti-matedreflectanceis alsodarker.

Figure6 shows thefirst andlastframesfrom a syntheticadditive transparency sequence.The imageof Reaganisconstantfor all framesandwe addedan imageof Einsteinthat is moving diagonally(speed¦ pixelsper frame). Fig-ure6 alsoshowstherecoveredReaganandEinsteinimages.They are indistinguishablefrom the original image. Forcomparison,figures6 alsoshows the min andmedianfil-ters. Again, the min filter assumesthat all pixels at sometime seea blackpixel from the Einsteinimage. Sincethatassumptiondoesnot hold, theestimateis quitebad.

4 Results - real sequences

Figure7 shows two framesfrom a sequencethat is partoftheYale facedatabaseB [5]. A strobelight at §A¦ differentlocationsilluminatedaperson’sface,giving §A¦ imageswithchangingillumination. Theimagesweretakenwith a cam-erawith linear responsefunction. Figure7 alsoshows theestimatedreflectanceand illumination images. Note thatnearly all the specularhighlights are gonealthoughtherearestill somehighlightsat thetip of thenose.Althoughit ishardtomeasureperformancein thistask,observersdescribetheillumination imagesas“looking likemarblestatues”,aswouldbeexpectedfrom anillumination imageof a face.

Figure8 shows two framesfrom a sequencetakenfromthe“WebCam”at UC Berkeley. We used ¤X� imagestakenat different times. Figure 8 also shows the estimatedre-flectanceandillumination images.Note that thecastshad-

ows by the treesandbuildingsaremostlygone. Note alsothatwe did not have to do anythingspecialto getrid of thepeoplein the images:sincewe usea genericprior for thelighting imageswe caneasilyaccomodatechangesthatarenotpurelydueto lighting. TheresultsarenotasgoodastheYalesequence,andwebelievethis is partiallydueto theau-tomaticgaincontrolonthewebcamerasothattheresponsefunctionis far from linear.

Even theseresultscanbe goodenoughfor someappli-cations. Figure9a shows a color sceneof SproulplazainBerkeley. SupposeamaliciusStanfordhackerwantedto in-sertthe Stanfordlogo on the plaza. Figure9b shows whathappenswhen

7blending is usedto compositethe logo

and the image. The result is noticeablyfake. Figure 9cshows theresultwhen

7blendingis usedon theestimated

reflectanceimage,andtheimageis rerenderedwith thees-timatedillumination image.Thelogo appearsto bepartofthescene.

5 Discussion

Deriving intrinsicimagesfrom asingleimageremainsadif-ficult problemfor computervision systems.Herewe havefocusedon a slightly easierproblem: recovering intrinsicimagesfrom an imagesequencein which the illuminationvariesbut the reflectanceis constant.We showed that theproblemis still ill-posed and suggestedaddinga statisti-cal assumptionbasedon recentwork in statisticsof naturalscenes:thatderivativefilter outputson theillumination im-agewill tendto besparse.We showedthatthis assumptionleadsto a novel, simplealgorithmfor reflectancerecoveryandshowedencouragingresultson a numberof imagese-quences.

Both the cameramodel and the statisticalassumptionswe have usedcan be extended. We assumeda linear re-sponsein thecameramodelandonecanderivetheML esti-matorwhenthecamerais nonlinear. We havealsoassumedthatfilter outputsareindependentacrossspaceandtime. Itwould be interestingto derive ML estimatorswhenthede-pendency is taken into account. We hopethat progressinstatisticalmodelingof illumination imageswill enableusto tackletheoriginal problemposedby Barrow andTenen-baum:recoveringintrinsic imagesfrom a singleimage.

Acknowledgements

I thankA. Efrosfor pointingout theutility of webcamsforcomputervision. Supportedby MURI-ARO-DAAH04-96-1-0341.

5

first frame lastframe ML illumination ML reflectance min filter meanfilter

Figure5: A syntheticsequencein which a squarecastshadow translatesdiagonally. Note that the pixels surroundingthediamondarealwaysin shadow, yet their estimatedreflectanceis thesameasthatof pixels thatwerealwaysin light. In themin andmeanfilters, this is not thecaseandtheestimatedreflectancesarequitewrong.

Reaganimage Einsteinimage first frame lastframe

ML Reagan ML Einstein min filter medianfilter

Figure6: Resultson a syntheticadditive transparency sequence.The Einsteinimageis translateddiagonallywith speed¦pixelsperframeandaddedto theReaganimage.TheML estimatesarenearlyexactwhile themin andmedianfiltersarenot.

frame2 frame11 ML reflectance ML illumination 2 ML illumination 11

Figure7: Resultson onefacefrom theYaleFaceDatabaseB [5]. Therewere §A¦ imagestakenwith variablelighting. Notethattherecoveredreflectanceimageis almostfreeof specularitiesandis freeof castshadows. TheML illumination imagesareshown with a logarithmicnonlinearityto increasedynamicrange.

6

frame1 frame11 ML reflectance

ML illumination 1 ML illumination 2

Figure8: Resultson a webcamsequencefrom: www.berkeley.edu/webcams/sproul.html. Therewere ¤j� imagesthatvariedmostly in illumination. Note that the ML reflectanceimageis free of castshadows. The illumination imagesareshown with a logarithmicnonlinearityto increasedynamicrange.

a b c

Figure9: Intrinsic imagesareuseful for imagemanipulation.a. The original imageof Sproulplazain Berkeley. b. TheStanfordlogo is

7blendedwith theimage:theresultis noticeablefake. c. TheStanfordlogo is

7blendedin thereflectance

imageandthenrenderedwith thederivedillumination image.

7

References

[1] H.G.Barrow andJ.M.Tenenbaum.Recoveringintrin-sic scenecharacteristicsfrom images. In A. Hansonand E. Riseman,editors, ComputerVision Systems.AcademicPress,1978.

[2] H. Farid and E.H. Adelson. Separatingreflectionsfrom imagesby useof independentcomponentsanal-ysis. Journal of the optical society of america,16(9):2136–2145,1999.

[3] W.T. FreemanandP.A. Viola. Bayesianmodelof sur-faceperception.In Adv. Neural InformationProcess-ing Systems10. MIT Press,1998.

[4] B. V. Funt,M. S.Drew, andM. Brockington.Recover-ing shadingfrom color images.In G. Sandini,editor,Proceedings:SecondEuropeanConferenceon Com-puterVision, pages124–132.Springer-Verlag,1992.

[5] A.S. Georghiades,P.N. Belhumeur, and D.J. Krieg-man.Fromfew to many: Generativemodelsfor recog-nition undervariableposeandillumination. In IEEEInt. Conf. on AutomaticFace and Gesture Recogni-tion, pages277–284,2000.

[6] K. Hayakawa. Photometricstereoundera light sourcewith arbitrarymotions. Journal of theoptical societyof America, 11(11),1994.

[7] E. H. Land andJ. J. McCann. Lightnessandretinextheory. Journal of the Optical Societyof America,61:1–11,1971.

[8] B.A. OlshausenandD. J.Field. Emergenceof simple-cell receptivefield propertiesby learningasparsecodefor naturalimages.Nature, 381:607–608,1996.

[9] H.Y. Shum and R. Szeliski. Constructionand re-finementof panoramicmosaicswith globalandlocalalignment. In ProceedingsIEEE CVPR, pages953–958,1998.

[10] E.P. Simoncelli. Statistical models for im-ages:compressionrestorationandsynthesis. In ProcAsilomar Conferenceon Signals,Systemsand Com-puters, pages673–678,1997.

[11] R. Szeliksi,S.Avidan,andP. Anandan.Layerextrac-tion from multiple imagescontainingreflectionsandtransparency. In ProceedingsIEEE CVPR, 2000.

[12] A.L. Yuille, D. Snow, R.Epstein,andP.N. Belhumeur.Determininggenerativemodelsof objectsundervary-ing illumination: shapeandalbedofrom multiple im-agesusingSVD andintegrability. InternationalJour-nal of ComputerVision, 35(3):203–222,1999.

8

deriving intrinsic images from image sequencesyweiss/iccv01.pdf · cylinder as a single segment in...

Documents