
Unmixing-Based Soft Color Segmentation for Image Manipulation

YAĞIZ AKSOY
ETH Zürich and Disney Research Zürich
TUNÇ OZAN AYDIN and ALJOŠA SMOLIĆ
Disney Research Zürich
and
MARC POLLEFEYS
ETH Zürich

We present a new method for decomposing an image into a set of soft color segments that are analogous to color layers with alpha channels that have been commonly utilized in modern image manipulation software. We show that the resulting decomposition serves as an effective intermediate image representation, which can be utilized for performing various, seemingly unrelated, image manipulation tasks. We identify a set of requirements that soft color segmentation methods have to fulfill, and present an in-depth theoretical analysis of prior work. We propose an energy formulation for producing compact layers of homogeneous colors and a color refinement procedure, as well as a method for automatically estimating a statistical color model from an image. This results in a novel framework for automatic and high-quality soft color segmentation that is efficient, parallelizable, and scalable. We show that our technique is superior in quality compared to previous methods through quantitative analysis as well as visually through an extensive set of examples. We demonstrate that our soft color segments can easily be exported to familiar image manipulation software packages and used to produce compelling results for numerous image manipulation applications without forcing the user to learn new tools and workflows.

Categories and Subject Descriptors: I.3.8 [Computer Graphics]: Applications; I.4.6 [Image Processing and Computer Vision]: Segmentation—Pixel classification; I.4.8 [Image Processing and Computer Vision]: Scene Analysis—Color

General Terms: Image Processing

Authors’ addresses: Y. Aksoy and M. Pollefeys, Department of Computer Science, ETH Zurich, Universitatstrasse 6, CH-8092, Zurich, Switzerland; emails: {yaksoy, marc.pollefeys}@inf.ethz.ch; T. Aydin, Disney Research Zurich, Stampfenbachstrasse 48, CH-8006, Zurich, Switzerland; email: [email protected]; A. Smolic, Graphics Vision and Visualisation Group, School of Computer Science and Statistics, Trinity College Dublin, Dublin 2, Ireland; email: [email protected].
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2017 ACM 0730-0301/2017/03-ART19 $15.00

DOI: http://dx.doi.org/10.1145/3002176

Additional Key Words and Phrases: Soft segmentation, digital compositing, image manipulation, layer decomposition, color manipulation, color unmixing, green-screen keying

ACM Reference Format:

Yağız Aksoy, Tunç Ozan Aydin, Aljoša Smolić, and Marc Pollefeys. 2017. Unmixing-based soft color segmentation for image manipulation. ACM Trans. Graph. 36, 2, Article 19 (March 2017), 19 pages.
DOI: http://dx.doi.org/10.1145/3002176

1. INTRODUCTION

The goal of soft color segmentation is to decompose an image into a set of layers with alpha channels, such as in Figure 1(b). These layers usually consist of fully opaque and fully transparent regions, as well as pixels with alpha values between the two extremes wherever multiple layers overlap. Ideally, the color content of a layer should be homogeneous, and its alpha channel should accurately reflect the color contribution of the layer to the input image. Equally important is to ensure that overlaying all layers yields the input image. If a soft color segmentation method satisfies these and a number of other well-defined criteria that we discuss in detail later, then the resulting layers can be used for manipulating the image content conveniently through applying per-layer modifications. These image manipulations can range from subtle edits to give the feeling that the image was shot in a different season of the year (Figure 1(c)) to more pronounced changes that involve dramatic hue shifts and replacing the image background (Figure 1(d)). In this article, we propose a novel soft color segmentation method that produces a powerful intermediate image representation, which in turn allows a rich set of image manipulations to be performed within a unified framework using standard image editing tools.

Obtaining layers that meet the demanding quality requirements of image manipulation applications is challenging, as even barely visible artifacts on individual layers can have a significant negative impact on quality when certain types of image edits are applied. That said, once we devise a soft color segmentation method that reliably produces high-quality layers, numerous image manipulation tasks can be performed with little extra effort by taking advantage of this image decomposition. Importantly, the resulting layers naturally integrate into the layer-based workflows of widely used image manipulation packages. By using soft color segmentation as a black box, and importing the resulting layers into their favorite image manipulation software, users can make use of their already-existing skills.

ACM Transactions on Graphics, Vol. 36, No. 2, Article 19, Publication date: March 2017.

Fig. 1. Our method automatically decomposes an input image (a) into a set of soft segments (b). In practice, these soft segments can be treated as layers that are commonly utilized in image manipulation software. Using this relation, we achieve compelling results in color editing (c), compositing (d), and many other image manipulation applications conveniently under a unified framework.

While the traditional hard segmentation is one of the most active fields of visual computing, soft color segmentation has received surprisingly little attention so far. In addition to direct investigations of soft color segmentation [Tai et al. 2007; Tan et al. 2016], certain natural alpha matting and green-screen keying methods presented soft color segmentation methods—without necessarily labeling them as such—as a component in their pipeline. While it may seem at first glance that one can simply use any of these previous methods for practical and high-quality photo manipulation, a closer look reveals various shortcomings of the currently available soft color segmentation methods. To this end, in this article, we provide a theoretical analysis of the problems with previous work, which we also supplement with quantitative evaluation, as well as visual examples that support our arguments.

We address the two main challenges of soft color segmentation: devising a color unmixing scheme that results in high-quality soft color segments and automatically determining a content-adaptive color model from an input image. We propose a novel energy formulation that we call sparse color unmixing (SCU) that decomposes the image into layers of homogeneous colors. The main advantage of SCU when compared to similar formulations used in different applications such as green-screen keying [Aksoy et al. 2016] is that it produces compact color segments by favoring fully opaque or transparent pixels. We also enforce spatial coherency in opacity channels and accordingly propose a color refinement procedure that is required for preventing visual artifacts while applying image edits. We additionally propose a method for automatically estimating a color model corresponding to an input image, which comprises a set of distinct and representative color distributions. Our method determines the size of the color model automatically in a content-adaptive manner. We show that the color model estimation can efficiently be performed using our novel projected color unmixing (PCU) formulation.

We present a comprehensive set of results in the article and supplementary material that show that our method consistently produces high-quality layers. Given such layers, we demonstrate that numerous common image manipulation applications can be reduced to trivial per-layer operations that can be performed conveniently through familiar software tools.

2. RELATED WORK

Previous work uses the term soft segmentation in various contexts, such as the probabilistic classification of computerized tomography (CT) scans [Posirca et al. 2011], computing per-pixel foreground/background probabilities [Yang et al. 2010b], and interactive image segmentation utilizing soft input constraints [Yang et al. 2010a]. In fact, generally speaking, even the traditional k-means clustering algorithm can be considered as a soft segmentation method, as it computes both a label as well as a confidence value for each point in the feature space [Tai et al. 2007]. In contrast to these approaches, we seek to compute proper alpha values rather than arbitrarily defined confidence values or probabilities.

The goals of our method are similar to those of the alternating optimization soft color segmentation technique [Tai et al. 2007] and the decomposition technique via RGB-space geometry [Tan et al. 2016], which decompose input images into a set of soft color segments and demonstrate their use in various image manipulation applications. A number of techniques proposed in the natural matting and green-screen keying literature, such as the ones by Singaraju and Vidal [2011], Chen et al. [2013], and Aksoy et al. [2016], can also be regarded as soft color segmentation methods. While the goal of natural matting is to compute the alpha channel of the foreground in an image, the aforementioned methods can also be used to obtain soft color layers from the input image. Our method has several advantages when compared to these works in terms of the quality of the opacity channels, the color content of individual layers, and the absence of need for user input. We present an in-depth discussion of our differences from, and the shortcomings of, these methods in Sections 4 and 6.

While we focus on soft color segmentation for the purpose of image manipulation, other methods in the literature have been proposed that utilize spatial information [Levin et al. 2008b; Tai et al. 2005], as well as texture [Yeung et al. 2008] for soft segmentation. The object motion deblurring method by Pan et al. [2016] performs soft segmentation simultaneously with blur kernel estimation to increase performance. There is also a substantial body of work on traditional, hard segmentation. For a discussion and evaluation of hard segmentation methods, we refer the reader to the recent survey by Pont-Tuset and Marques [2015]. Similarly, there are also examples of hard color segmentation such as the one by Omer and Werman [2004]. Decomposing an image into several layers with transparency also has distinct applications with specialized methods such as illumination decomposition for realistic material recoloring [Carroll et al. 2011], anti-aliasing recovery [Yang et al. 2011], vectorizing bitmaps [Richardt et al. 2014], and reflection removal [Shih et al. 2015].


Fig. 2. The original image (a) is shown with the alpha channels of the layers corresponding to the yellow of the road lines estimated by the proposed sparse color unmixing (b) and by the color unmixing [Aksoy et al. 2016] (c). Notice the spurious alpha values on the road in (c). When our sparse color unmixing results are utilized for a color change (d), the regions that do not contain the yellow of the road are not affected. On the other hand, a color change using the color unmixing (e) affects unrelated regions as well. The color change was applied after matte regularization and color refinement stages for both (d) and (e).

3. SOFT COLOR SEGMENTATION

Our main objective in this article is to decompose an image into multiple partially transparent segments of homogeneous colors, that is, soft color segments, which can then be used for various image manipulation tasks. Borrowing from image manipulation terminology, we will refer to such soft color segments simply as layers throughout this article.

A crucial property of soft color segmentation is that overlaying all layers obtained from an image yields the original image itself, so editing individual layers is possible without degrading the original image. In mathematical terms, for a pixel $p$, we denote the opacity value as $\alpha_i^p$ and the layer color in RGB as $u_i^p$ for the $i$th layer, and we want to satisfy the color constraint:

$$\sum_i \alpha_i^p u_i^p = c^p \quad \forall p, \qquad (1)$$

where $c^p$ denotes the original color of the pixel. The total number of layers will be denoted by $N$.

We assume that the original input image is fully opaque and thus require the opacity values over all layers to add up to unity, which we express as the alpha constraint:

$$\sum_i \alpha_i^p = 1 \quad \forall p. \qquad (2)$$

Finally, the permissible range for the alpha and color values is enforced by the box constraint:

$$\alpha_i^p,\, u_i^p \in [0, 1] \quad \forall i, p. \qquad (3)$$

For convenience, we will drop the superscript $p$ in the remainder of the article and present our formulation at the pixel level, unless stated otherwise.

It should be noted that different representations for overlaying multiple layers exist. We use Equation (1), which we refer to as an alpha-add representation, in our formulation, which does not assume any particular ordering of the layers. This representation has also been used by Tai et al. [2007] and Chen et al. [2013], among others. In most commercial image editing software, however, the representation proposed by Porter and Duff [1984], referred to in this article as the overlay representation, is used. The difference and conversion between the two representations are presented in the appendix.
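As a minimal illustration of the alpha-add representation of Equation (1), the composite is simply the alpha-weighted sum of the layer colors, with no ordering among the layers. The two layers below are hypothetical placeholder values, not taken from the paper:

```python
import numpy as np

def alpha_add_composite(alphas, colors):
    """Overlay layers in the alpha-add representation (Eq. 1):
    the composite color is the alpha-weighted sum of the layer
    colors; no particular layer ordering is assumed."""
    alphas = np.asarray(alphas, dtype=float)  # shape (N,)
    colors = np.asarray(colors, dtype=float)  # shape (N, 3), RGB in [0, 1]
    return (alphas[:, None] * colors).sum(axis=0)

# One pixel shared 60/40 between two layers: the alphas sum to
# unity (Eq. 2) and all values lie in [0, 1] (Eq. 3).
alphas = [0.6, 0.4]
colors = [[1.0, 0.8, 0.0],   # a yellow layer
          [0.2, 0.2, 0.2]]   # a gray layer
c = alpha_add_composite(alphas, colors)  # the reconstructed pixel color
```

Editing one layer's color and re-compositing is what makes the decomposition useful: the color constraint guarantees that, unedited, the composite reproduces the input exactly.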

Our algorithm for computing high-quality soft color segments can be described by three stages: color unmixing, matte regularization, and color refinement, which will be discussed in the remainder of this section.

Color Unmixing: An important property we want to achieve within each layer is color homogeneity: The colors present in a layer should be sufficiently similar. To this end, we associate each layer with a 3D normal distribution representing the spread of the layer colors in RGB space, and we refer to the set of $N$ distributions as the color model. Our novel technique for automatically extracting the color model for an image is discussed in detail in Section 5.

Given the color model, we propose the sparse color unmixing energy function in order to find a preliminary approximation to the layer colors and opacities:

$$\mathcal{F}_S = \sum_i \alpha_i D_i(u_i) + \sigma \left( \frac{\sum_i \alpha_i}{\sum_i \alpha_i^2} - 1 \right), \qquad (4)$$

where the layer color cost $D_i(u_i)$ is defined as the squared Mahalanobis distance of the layer color $u_i$ to the layer distribution $\mathcal{N}(\mu_i, \Sigma_i)$, and $\sigma$ is the sparsity weight that is set to 10 in our implementation. The energy function in Equation (4) is minimized for all $\alpha_i$ and $u_i$ simultaneously while satisfying the constraints defined in Equations (1)–(3) using the original method of multipliers [Bertsekas 1982]. For each pixel, for the layer with the best-fitting distribution, we initialize the alpha value to 1 and the layer color $u_i$ to the pixel color. The rest of the layers are initialized to zero alpha value and the mean of their distributions as layer colors. The first term in Equation (4) favors layer colors that fit well with the corresponding distribution, especially for layers with high alpha values, which is essential for getting homogeneous colors in each layer. The second term pushes the alpha values to be sparse, that is, favors 0 or 1 alpha values.
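As a concrete sketch, the per-pixel energy of Equation (4) can be evaluated as below. The two layer distributions are illustrative placeholders, and only the energy evaluation is shown, not the constrained minimization by the method of multipliers:

```python
import numpy as np

def mahalanobis_sq(u, mean, cov):
    """Squared Mahalanobis distance D_i(u_i) of a layer color u
    to the layer's 3D normal distribution N(mean, cov)."""
    d = u - mean
    return float(d @ np.linalg.inv(cov) @ d)

def sparse_color_unmixing_energy(alphas, colors, means, covs, sigma=10.0):
    """Sparse color unmixing energy F_S of Eq. (4): alpha-weighted
    layer color costs plus a sparsity term that is zero iff exactly
    one layer is fully opaque. sigma = 10 as in the text."""
    unmix = sum(a * mahalanobis_sq(u, m, S)
                for a, u, m, S in zip(alphas, colors, means, covs))
    a = np.asarray(alphas, dtype=float)
    sparsity = sigma * (a.sum() / (a ** 2).sum() - 1.0)
    return unmix + sparsity

# Two placeholder layer distributions (values are illustrative only).
means = [np.array([0.9, 0.8, 0.1]), np.array([0.2, 0.2, 0.2])]
covs = [np.eye(3) * 0.01, np.eye(3) * 0.01]

# Fully opaque assignment to the best-fitting layer: the sparsity
# term vanishes and only the (here zero) color cost remains.
e_opaque = sparse_color_unmixing_energy([1.0, 0.0], [means[0], means[1]], means, covs)

# Splitting the same pixel across both layers incurs a sparsity penalty.
e_mixed = sparse_color_unmixing_energy([0.5, 0.5], [means[0], means[1]], means, covs)
```

With the pixel assigned fully to its best-fitting layer the sparsity term vanishes, whereas spreading the same pixel across two layers is penalized by $\sigma$; this is what drives the optimization toward compact, mostly opaque-or-transparent layers.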

The first term in Equation (4) appears as the color unmixing energy proposed by Aksoy et al. [2016] as a part of their system for green-screen keying. They do not include a sparsity term in their formulation, and this inherently results in favoring small alpha values, which results in many layers appearing in regions that should actually be opaque in a single layer. The reason is that a better-fitting layer color for the layer with alpha close to 1 (hence a lower color unmixing energy) becomes favorable by leaking small contributions from others (assigning small alpha values to multiple layers) with virtually no additional cost, as the sample costs are multiplied with alpha values in the color unmixing energy. This decreases the compactness of the segmentation and, as a result, potentially creates visual artifacts when the layers are edited independently. Figure 2 shows such an example obtained through minimizing the color unmixing energy, where the alpha channel of the layer that captures the yellow road line is noisy on the asphalt region, even though the yellow of the road is not a part of the color of the asphalt.


Fig. 3. Two layers corresponding to the dark (top) and light wood color in the original image (a) are shown before (b) and after (c) matte regularization and color refinement.

While these errors might seem insignificant at first, they result in unintended changes in the image when subjected to various layer manipulation operations such as contrast enhancement and color changes, as demonstrated in Figure 2.

The sparsity term in Equation (4) is zero when one of the layers is fully opaque (and thus all other layers are fully transparent) and increases as the alpha values move away from zero or 1. Another term for favoring matte sparsity has been proposed by Levin et al. [2008b]:

$$\sum_i |\alpha_i|^{0.9} + |1 - \alpha_i|^{0.9}. \qquad (5)$$

This cost is differentiable in the open interval (0, 1). Its infinite one-sided derivatives at $\alpha_i = 0^+$ and $\alpha_i = 1^-$ cause the alpha values to stay at these extremes in the optimization process we employ. In spectral matting, this behavior of the function is used to keep alpha values from taking values outside [0, 1]. In our case, as the box constraints are enforced during the optimization of the sparse color unmixing energy, the negative values our sparsity cost takes outside the interval do not affect our results adversely.
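The two sparsity costs can be compared numerically. A small sketch, writing the ratio term of Equation (4) without its $\sigma$ weight; both costs prefer a single opaque layer over a mixed assignment, but only the ratio term reaches zero there:

```python
import numpy as np

def ratio_sparsity(alphas):
    """Sparsity term of Eq. (4) without the sigma weight: zero iff
    exactly one alpha is 1 and the rest are 0."""
    a = np.asarray(alphas, dtype=float)
    return a.sum() / (a ** 2).sum() - 1.0

def levin_sparsity(alphas):
    """Matte sparsity cost of Levin et al. [2008b], Eq. (5)."""
    a = np.asarray(alphas, dtype=float)
    return float((np.abs(a) ** 0.9 + np.abs(1.0 - a) ** 0.9).sum())

opaque = [1.0, 0.0]  # one fully opaque layer
mixed = [0.5, 0.5]   # pixel split evenly between two layers
```

For the opaque assignment the ratio term is exactly zero while the mixed one costs 1; the Levin et al. cost also ranks the opaque assignment lower, but its minimum over the box is nonzero.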

Matte Regularization: Sparse color unmixing is done independently for each pixel, and there is no term ensuring spatial coherency. This may result in sudden changes in opacities that do not quite agree with the underlying image texture, as shown in Figure 3(b). Hence, spatial regularization of the opacity channels is necessary for ensuring smooth layers as in Figure 3(c). This issue also occurs frequently in sampling-based natural matting. The common practice for alpha regularization is using the matting Laplacian introduced by Levin et al. [2008a] as the smoothness term and solving a linear system that also includes the spatially non-coherent alpha values, as proposed by Gastal and Oliveira [2010]. While this method is very effective in regularizing mattes, on the downside, it is computationally expensive and consumes a high amount of memory, especially as the image resolution increases.

The guided filter proposed by He et al. [2013] is an efficient edge-aware filtering method that filters one image using the edge characteristics extracted from a second image, referred to as the guide image. They discuss the theoretical similarity between their filter and the matting Laplacian and show that getting satisfactory alpha mattes is possible through guided filtering when the original image is used as the guide image. While filtering the mattes with the guided filter only approximates the behavior of the matting Laplacian, we observed that this approximation provides sufficient quality for the mattes obtained through sparse color unmixing. For a 1MP image, we use 60 as the filter radius and $10^{-4}$ as $\epsilon$ for the guided filter, as recommended by He et al. [2013] for matting, to regularize the alpha matte of each layer. As the resultant alpha values do not necessarily add up to 1, we normalize the sum of the alpha values for each pixel after filtering to get rid of small deviations from the alpha constraint. The filter radius is scaled according to the image resolution. Note that the layer colors are not affected by this filtering, and they will be updated in the next step.
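The regularization step can be sketched with a single-channel guided filter in its box-filter form, followed by the per-pixel renormalization described in the text. This is a simplified sketch: it assumes a grayscale guide (a color guide is the usual choice in practice), and `scipy` supplies the box filter:

```python
import numpy as np
from scipy.ndimage import uniform_filter  # box filter

def guided_filter(guide, src, radius, eps):
    """Single-channel guided filter [He et al. 2013]: filters `src`
    using edge information from `guide` via local linear models."""
    box = lambda x: uniform_filter(x, size=2 * radius + 1, mode='reflect')
    mean_g, mean_s = box(guide), box(src)
    cov_gs = box(guide * src) - mean_g * mean_s   # guide/src covariance
    var_g = box(guide * guide) - mean_g * mean_g  # guide variance
    a = cov_gs / (var_g + eps)                    # local linear coefficients
    b = mean_s - a * mean_g
    return box(a) * guide + box(b)

def regularize_mattes(guide, alphas, radius=60, eps=1e-4):
    """Filter each layer's alpha matte with the image as guide, then
    renormalize so the alphas sum to 1 per pixel (Eq. 2). The default
    radius and eps follow the 1MP values quoted in the text; the
    radius would be rescaled for other resolutions."""
    filtered = np.stack([guided_filter(guide, a, radius, eps) for a in alphas])
    filtered = np.clip(filtered, 0.0, 1.0)
    return filtered / filtered.sum(axis=0, keepdims=True)
```

Because the guided filter reproduces a constant input exactly, mattes that already sum to 1 stay close to the alpha constraint after filtering, and the renormalization only removes small deviations.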

While enforcing spatial coherency on opacity channels is trivial using off-the-shelf filtering, dealing with its side effects is not straightforward. Obtaining spatially smooth results while avoiding disturbing color artifacts requires a second step that we discuss next.

Color Refinement: As the original alpha values are modified due to regularization, we can no longer guarantee that all pixels still satisfy the color constraint defined in Equation (1). Violating the color constraint in general severely limits the ability to use soft segments for image manipulation. For illustration, Figure 6 shows a pair of examples where KNN matting fails to satisfy the color constraint, which results in unintended color shifts in their results. To avoid such artifacts, we introduce a second energy minimization step, where we replace the alpha constraint defined in Equation (2) in the color unmixing formulation with the following term that forces the final alpha values to be as close as possible to the regularized alpha values:

$$\sum_i (\alpha_i - \hat{\alpha}_i)^2 = 0, \qquad (6)$$

where $\hat{\alpha}_i$ represents the regularized alpha value of the $i$th layer. By running the energy minimization using this constraint, we recompute unmixed colors at all layers so they satisfy the color constraint while retaining spatial coherency of the alpha channel. Note that since the alpha values are determined prior to this second optimization, the sparsity term in Equation (4) becomes irrelevant. Hence, we only employ the unmixing term of the energy in this step. For the optimization, we initialize the layer colors as the values found in the previous energy minimization step.
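For intuition about the refinement step, consider the special case of fixed alphas and identity layer covariances: minimizing the unmixing term subject to the color constraint then has a closed form in which every layer color shifts from its distribution mean by a common residual. This isotropic case is only illustrative; the paper optimizes the full Mahalanobis cost numerically, and the box constraint of Equation (3) is ignored here:

```python
import numpy as np

def refine_colors_isotropic(c, alphas, means):
    """Minimize sum_i alpha_i * ||u_i - mu_i||^2 subject to
    sum_i alpha_i * u_i = c, with sum_i alpha_i = 1. Lagrange
    multipliers give u_i = mu_i + (c - sum_j alpha_j * mu_j):
    each layer color is shifted by the same compositing residual.
    (Box constraint on u_i is ignored in this sketch.)"""
    a = np.asarray(alphas, dtype=float)[:, None]  # (N, 1)
    means = np.asarray(means, dtype=float)        # (N, 3)
    residual = c - (a * means).sum(axis=0)        # what the means fail to explain
    return means + residual

# Placeholder pixel color, regularized alphas, and layer means.
c = np.array([0.55, 0.45, 0.20])
alphas = [0.5, 0.5]
means = [np.array([0.9, 0.7, 0.3]), np.array([0.1, 0.1, 0.1])]
u = refine_colors_isotropic(c, alphas, means)
```

Compositing the refined colors with the fixed alphas reproduces the pixel color exactly, which is precisely the color constraint the refinement restores after matte regularization.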

Finally, to summarize our color unmixing process: We first minimize the sparse color unmixing energy in Equation (4) for every pixel independently. We then regularize the alpha channels of the soft layers using the guided filter and refine the colors by running the energy minimization once again, this time augmented with the new alpha constraint defined in Equation (6). This way we achieve soft segments that satisfy the fundamental color, alpha, and box constraints, as well as the matte sparsity and spatial coherency requirements for high-quality soft segmentation.

Note that the two energy minimization steps are computed independently for each pixel, and the guided filter can be implemented as a series of box filters. These properties make our algorithm easily parallelizable and highly scalable.


4. ANALYSIS OF STATE OF THE ART

While the particular deficiencies of current soft color segmentation methods will be apparent from the qualitative and quantitative evaluation results in Section 6 and the supplemental material, our goal in this section is to highlight the theoretical reasons behind some of those deficiencies. To this end, we take a closer look at the formulations of the current methods.

The soft color segmentation methods in the literature can be categorized into two classes: unmixing based and affinity based. Unmixing-based methods [Tai et al. 2007; Aksoy et al. 2016], including ours, attempt to get unmixed colors and their corresponding alpha values by processing the observed color of each pixel with a statistical model for the layers. The method by Tan et al. [2016] can loosely be categorized as unmixing based as it computes the alpha values using a (non-statistical) color model with fixed layer colors. Affinity-based methods [Levin et al. 2008b; Singaraju and Vidal 2011; Chen et al. 2013], on the other hand, aim to use local or non-local pixel proximities to propagate a set of given labels to the rest of the image.

Alternating Optimization (AO): The alternating optimization algorithm proposed by Tai et al. [2007] is similar to ours in terms of the end goal. The authors define a Bayesian formulation that includes the alpha values, the layer colors, as well as the color model for the soft color segments, and find the maximum a posteriori (MAP) solution to the problem by alternatingly optimizing for the alpha values, the layer colors, and the color model parameters. Here, we will only analyze AO's alpha and layer color estimation formulations and defer the discussion on its color model estimation to Section 5.

AO estimates the alpha values by defining a Markov random field that encodes the probability of the alpha values given the layer colors and the color model. After some algebraic manipulation, their maximization can be expressed as minimizing E_α = Σ_p E_α^p, with E_α^p defined as

    E_α^p = (1 / 2σ_c²) ‖ c_p − Σ_i α_i^p u_i^p ‖² + (1 / σ_c²) Σ_i α_i^p D_i(u_i)    (7a)
          + Σ_{q ∈ N_p} log( 1 + (1 / 2σ_b) √( Σ_i (α_i^p − α_i^q)² ) ),    (7b)

where N_p is the 8-neighborhood of pixel p, and σ_b and σ_c are algorithmic constants. The color unmixing energy appears as the second term in Equation (7a), together with the color constraint, whereas Equation (7b) is the smoothness term. The MAP estimation in AO is done via loopy belief propagation, and the soft labels assigned by the belief propagation algorithm are taken as the alpha values. This global optimization scheme becomes a limiting factor for AO in scaling to high-resolution images.

Due to the iterative nature of the algorithm, AO uses the color values from the previous iteration in the alpha estimation step. Hence, it finds a compromise between the optimal alpha values and satisfying the color constraint, which is then corrected in the color estimation step. Experimentally, we observed that this interdependency, when coupled with the interdependency with the color model estimation, may result in layers oscillating between two local minima as the iterations progress. The smoothness term, on the other hand, depends on how close the estimated alpha values from the previous iteration are, rather than on the image texture. Their results typically have unnatural gradients in layer alphas, as can be seen in Figure 4.
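For concreteness, the per-pixel energy of Equation (7) can be sketched as below. This is an illustrative sketch, not AO's implementation: we assume the layer cost D_i is a squared-Mahalanobis-style distance as in the unmixing formulation, and the function and variable names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(u, mean, cov_inv):
    d = u - mean
    return float(d @ cov_inv @ d)

def ao_pixel_energy(c, alphas, us, means, cov_invs,
                    neighbor_alphas, sigma_c=1.0, sigma_b=1.0):
    """Sketch of Eq. (7): data term (7a) plus smoothness term (7b)."""
    # (7a): color-constraint residual + alpha-weighted layer costs D_i
    residual = c - sum(a * u for a, u in zip(alphas, us))
    e = np.dot(residual, residual) / (2.0 * sigma_c ** 2)
    e += sum(a * mahalanobis_sq(u, m, ci)
             for a, u, m, ci in zip(alphas, us, means, cov_invs)) / sigma_c ** 2
    # (7b): robust smoothness over alpha differences to the 8-neighborhood
    for q_alphas in neighbor_alphas:
        diff = np.asarray(alphas) - np.asarray(q_alphas)
        e += np.log(1.0 + np.sqrt(np.dot(diff, diff)) / (2.0 * sigma_b))
    return e
```

A pixel perfectly explained by a single layer, with neighbors sharing its alphas, has zero energy; mixed or inconsistent pixels are penalized.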

Fig. 4. The layers corresponding to the outer region of the northern lights in the input image (a), computed by the proposed algorithm (b), color unmixing (c), and alternating optimization (d). Notice that the layers in (c, d) have abrupt edges in opacity, while our result has a smooth transition following the input image content. Note that AO's result already has smoothness enforced in its optimization procedure. A comparison of our result and that of CU after our matte regularization is presented in Figure 2.

Decomposition via RGB-Space Geometry (RGBSG): Tan et al. [2016] take a different approach to the soft color unmixing problem by fixing the layer colors beforehand and optimizing only for the opacity values. Differing from all the approaches discussed in this section, as well as ours, they use the overlay layer representation defined in the appendix. This representation requires a pre-determined ordering of the layers, and RGBSG requires this ordering as input before the decomposition. The main advantage of the alpha-add representation over the overlay representation is its indifference to layer order, which decreases the amount of user input needed. However, their end goal and application scenario are similar to ours.

They construct a color model that encompasses the hull of the RGB values in the image, which will be discussed in Section 5.2, and fix the u_i's to these predetermined values. They define their energy function as

    E_RGBSG = ω_p E_p + ω_o E_o + ω_s E_s
    E_p = (1 / K) ‖ u_N − c + Σ_i ( (u_{i−1} − u_i) Π_{j=i}^N (1 − α_j) ) ‖²
    E_o = (1 / N) Σ_i −(1 − α_i)²,    E_s = (1 / N) Σ_i (∇α_i)²,    (8)

where K is 3 or 4 depending on whether they use RGB or RGBA optimization as defined in their article, ∇α_i is the opacity gradient, and ω_p = 375, ω_o = 1, and ω_s = 100 are algorithmic constants. Note that these are overlay alpha values rather than alpha-add alpha values, due to their compositing formulation. As the layer colors are determined by the color model, this optimization only determines the opacity values, unlike the other unmixing-based approaches. The color constraint is satisfied through the E_p term, while sparsity and smoothness are enforced via the E_o and E_s terms, respectively. Characteristically, their sparsity energy is somewhat similar to ours, as it also depends on the sum of squares of the alpha values. Their smoothness measure follows that of AO, as it also depends on the smoothness of the alpha values rather than on the image texture. Their layers do not necessarily give the original image when overlaid, due to the possibility of imaginary colors in their color model, as will be discussed in Section 5.2. However, their layers have solid colors by definition.

One particular characteristic of RGBSG is that it requires a color model that encompasses all the colors that appear in the image from the outside. This requirement is sometimes limiting, as the layers cannot have colors that are close to the center of the RGB cube. A demonstration of this behavior is presented in Figure 9.
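To make the overlay formulation concrete, the energy of Equation (8) can be sketched for a single pixel as follows. This is a minimal sketch, not Tan et al.'s implementation: the layer-color list u_0 (background) through u_N, the precomputed opacity-gradient magnitudes, and all names are illustrative assumptions.

```python
import numpy as np

def rgbsg_energy(c, layer_colors, alphas, K=3,
                 w_p=375.0, w_o=1.0, w_s=100.0, alpha_grads=None):
    """Sketch of Eq. (8) for one pixel. layer_colors = [u_0, ..., u_N]
    with u_0 the background; alphas = [alpha_1, ..., alpha_N] in
    bottom-to-top overlay order; alpha_grads holds |grad alpha_i|."""
    u, N = layer_colors, len(alphas)
    # E_p: overlay composite minus the observed color c
    resid = u[N] - c
    for i in range(1, N + 1):
        tail = np.prod([1.0 - alphas[j - 1] for j in range(i, N + 1)])
        resid = resid + (u[i - 1] - u[i]) * tail
    E_p = np.dot(resid, resid) / K
    E_o = -sum((1.0 - a) ** 2 for a in alphas) / N      # sparsity reward
    E_s = 0.0 if alpha_grads is None else sum(g ** 2 for g in alpha_grads) / N
    return w_p * E_p + w_o * E_o + w_s * E_s
```

For a single layer over a background, the composite reduces to α_1 u_1 + (1 − α_1) u_0, so a correctly unmixed pixel makes E_p vanish.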

Green-Screen Keying via Color Unmixing (CU): The first step of our algorithm, sparse color unmixing, shares common elements with the color unmixing formulation proposed by Aksoy et al. [2016]. While CU performs the energy minimization while satisfying the constraints in Equations (1)–(3) as we do, it lacks a sparsity measure in the energy formulation, as discussed in Section 3. The authors solve the matte sparsity problem by introducing a per-frame local color model, which locally determines subsets of their global color model. A first approximation of the local color model is computed via a Markov random field optimization, which then needs to be manually refined by the user in order to achieve high-quality keying results. Note that the user interaction for obtaining local color models is not provided to give artistic freedom to the user; it is required in order to fix the errors of the initial estimate.

CU also lacks a measure of spatial coherency. While they can achieve satisfactory results for green-screen keying with the help of local color models, when applied to soft color segmentation, their per-pixel formulation results in abrupt transitions between layers, as seen in Figure 4(c).

Multiple Image Layer Estimation (ML): The multiple image layer estimation method [Singaraju and Vidal 2011] makes use of the matting Laplacian proposed by Levin et al. [2008a]. The matting Laplacian encodes local affinities that effectively represent the alpha propagation between neighboring pixels. ML formulates the problem of estimating multiple soft layers as several sub-problems of two-layer estimation. Their formulation allows the estimation of N layers in closed form. However, they discuss in depth that it is not possible to solve for the layer alphas in closed form while satisfying both non-negative alpha values and alpha values summing up to 1 for N > 2.

Spectral Matting (SM): It is worth mentioning that the soft segmentation (as opposed to soft color segmentation) method spectral matting [Levin et al. 2008b] also extracts multiple layers by making use of the matting Laplacian. SM defines matting components, soft segments that can be determined from the eigenvectors of the matting Laplacian corresponding to its smallest eigenvalues. By making use of the sparsity prior shown in Equation (5), they find N plausible matting components, N being the number of layers specified by the user. The primary aim of SM is to extract spatially connected soft segments rather than layers of homogeneous colors. Their formulation fails to keep the alpha values in [0, 1].

Both ML and SM only estimate the alpha values. In order to get the layer colors, an additional step is needed for each of them. ML uses the layer color estimation method proposed by Levin et al. [2008a]. In this method, the authors define an energy for estimating the foreground and background colors (only for the N = 2 case) that makes use of layer alpha and color derivatives:

    Σ_p ‖ c_p − Σ_i α_i^p u_i^p ‖² + [∇_x α_0; ∇_y α_0]^T Σ_i [∇_x u_i^T ∇_x u_i; ∇_y u_i^T ∇_y u_i].    (9)

This energy, which takes the alpha values as input, propagates the color values in the image to obtain plausible colors agreeing with the alpha values. As the color constraint is only one of the terms in the energy, the result does not necessarily satisfy the color constraint.

Fig. 5. A layer extracted by the proposed method (b) and by KNN Matting (c). KNN's hard constraints on alpha values may cause artifacts, as shown in the inset.

Fig. 6. Our layers, when overlaid (b), give the original image (a), while layers extracted by KNN matting may result in erroneous reconstruction (c) by failing to satisfy the color constraint. The effect may be a local color loss, such as the lips in the top image, or a global degradation of image quality, as seen in the bottom image.

KNN Matting (KNN): KNN matting [Chen et al. 2013], in contrast to SM and ML, uses non-local affinities that are computed using the neighbors of each pixel in a feature space rather than only spatially nearby pixels. They also transform the layer estimation problem into a sparse linear system and solve for the alpha values of each layer separately. Unlike ML, they show that their algorithm naturally satisfies the constraint that the alpha values sum to 1. While the non-local approach indeed produces higher-quality soft layers compared to its local counterparts, the sparse affinity matrix they construct ends up having many entries far from the diagonal. This significantly increases the time required to solve the linear system. KNN puts hard constraints around the seed pixels, and this frequently results in disturbing blockiness artifacts due to their affinity definition, as shown in Figure 5.

KNN matting also proposes a layer color estimation algorithm that follows their non-local approach. By making a smoothness assumption on layer colors, they again construct a sparse linear system for estimating layer colors and solve it for each layer independently. We observed that independently solving for each layer fails to satisfy the color constraint in the final result, as shown in Figure 6. This is highly undesirable, especially for image editing applications, because information from the original image is lost before any edit is applied.

To summarize, the main advantages of our method are that it satisfies all the constraints and requirements discussed in Section 3, does not require a pre-determined layer ordering, provides smooth transitions between layers, and has a per-pixel formulation that permits efficient implementation and scalability to high-resolution images. In contrast, every other soft color segmentation method we analyzed in this section fails in at least one of these aspects. Scalability is especially crucial for practical applications: in terms of both computation time and memory consumption, current methods fail to scale to the resolutions captured by consumer-grade cameras. We will further discuss the visual implications of these shortcomings and the computational resources required by each algorithm in Section 6.

Fig. 7. Color model estimation is done by placing seed pixels (marked green on the top row) one by one and estimating a normal distribution (visualized as the hexagons) from the local neighborhood of each seed. To select the next seed pixel, we make use of the representation scores (bottom row; brighter means better represented). The pixels already marked as well represented are highlighted in blue on the top row. The hexagonal visualization shows the color variation along the three principal axes of the covariance matrices of the estimated 3D color distributions in their diagonals.

5. COLOR MODEL ESTIMATION

In Section 3, we defined the color model as the set of layer color distributions N_i that represent the color characteristics of the layers, which we then used as a fundamental component of our soft color segmentation formulation. In this section, we discuss the details of our automatic color model extraction process, which we treated as a black box until now.

One priority in the estimation of the color model is determining the number of prominent colors N automatically. As our layers are meant to be used by artists in image manipulation applications, it is important to keep N at a manageable number. At the same time, to enable meaningful edits, the color model should be able to represent the majority, if not all, of the pixels in the input image. In order to strike a balance between color homogeneity and the number of layers N, we avoid including colors that can already be well represented as a mixture of other main colors.

Figure 7 illustrates our color model estimation process. In order to determine how well a color model represents the input image, we define a per-pixel representation score r_p, which is the color unmixing energy obtained by minimization using the current (possibly incomplete) color model. The color unmixing energy can be used to assess the representativeness of an intermediate color model: if the model fails to fully represent a pixel color in the input image, the color unmixing energy will be high due to the D(u_i) term. By using the color unmixing energy (Equation (4) without the sparsity term) instead of only the D(u_i) terms, we make sure that colors that can already be represented as a mixture of several existing colors are not added to the color model. This increases the compactness of the estimated color model while preserving the color homogeneity of the to-be-estimated layers.

Our first goal when estimating the color model is to determine a set of seed pixels with distinct and representative colors. We rely on a greedy iterative scheme in which we keep selecting additional seed pixels until we determine that the current color model is sufficiently representative of the whole image. To select the next seed pixel, we rely on a voting scheme. To that end, we first divide the RGB color space into 10 × 10 × 10 bins. Every pixel votes for its own bin, and each vote is weighted by how well represented the pixel already is, such that the most underrepresented pixels get the highest voting power. We finally select the seed pixel from the bin with the most votes. If no bin has a significant number of votes, the algorithm terminates.

Mathematically, the vote of each pixel is computed as

    v_p = e^{−‖∇c_p‖} (1 − e^{−r_p}),    (10)

where ∇c_p represents the image gradient, which is often a good indicator of image regions with mixed colors. Since we would rather select seed pixels with pure colors, the expression above penalizes votes coming from high-gradient image regions.

After selecting which color bin to add to the color model, we choose the next seed pixel as

    s_i = argmax_{p ∈ bin} S_p e^{−‖∇c_p‖},    (11)

where S_p is the number of pixels in the same bin as pixel p within its 20 × 20 neighborhood. We then place a guided filter kernel around p to use as weights in estimating a normal distribution from the neighborhood of the seed.
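One round of the voting scheme (Equations (10) and (11)) can be sketched as below. This is a simplified sketch under stated assumptions: it omits the S_p neighborhood count and the guided-filter weighting, and all function and variable names are hypothetical.

```python
import numpy as np

def next_seed(image, rep_scores, grad_mag, bins=10):
    """One voting round. image: HxWx3 in [0,1]; rep_scores: per-pixel r_p;
    grad_mag: per-pixel ||grad c_p||. Returns a seed pixel and its bin's votes."""
    H, W, _ = image.shape
    # Eq. (10): underrepresented, low-gradient pixels vote the strongest
    votes = np.exp(-grad_mag) * (1.0 - np.exp(-rep_scores))
    # 10x10x10 RGB binning
    idx = np.minimum((image * bins).astype(int), bins - 1)
    flat = idx[..., 0] * bins * bins + idx[..., 1] * bins + idx[..., 2]
    tally = np.bincount(flat.ravel(), weights=votes.ravel(), minlength=bins ** 3)
    best_bin = int(tally.argmax())
    # Eq. (11), simplified: within the winning bin, prefer low-gradient pixels
    score = np.where(flat == best_bin, np.exp(-grad_mag), -np.inf)
    seed = np.unravel_index(int(score.argmax()), (H, W))
    return seed, tally[best_bin]
```

The caller would fit a normal distribution around the returned seed, recompute the representation scores, and repeat until no bin gathers enough votes.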

We use a representation threshold to determine which pixels are sufficiently well represented and remove them from the voting pool:

    Remove p if r_p < τ².    (12)

The representation threshold τ roughly indicates the number of standard deviations the color of a pixel can be away from the mean of a layer distribution while still being considered well represented. Smaller values of τ produce larger color models that may be cumbersome for users manipulating images, but the resulting layers are more homogeneous in terms of their color content. Conversely, a larger τ results in compact color models but possibly causes the layers to be less homogeneous. We show in Section 6.1 that although our algorithm requires τ as a parameter, fixing τ instead of the number of layers N generalizes well over different types of images. Through experimentation, we found τ = 5 to be a good compromise between color model size and layer homogeneity, and we use this value to produce all our results.

As stated earlier, the color model is computed as a pre-processing step to the soft color segmentation algorithm detailed in Section 3. A computational challenge here is estimating the representation scores efficiently. Instead of running the computationally demanding nonlinear optimization scheme of color unmixing every time we add a new seed pixel to the color model, we approximate the color unmixing cost using the novel projected color unmixing described in Section 5.1.

5.1 Approximating the Representation Score

From an implementation point of view, the main challenge with the aforementioned color model estimation scheme is the cost of recomputing the color unmixing energy every time we add a new entry to the color model. The key observation that lets us circumvent this challenge is that in order to estimate the representation scores r_p, we only need to know the minimized color unmixing energy F; we do not need to obtain the actual alpha and color values resulting from the minimization process.

We reformulate the representation score computation as

    r_p = min( { D_i(c_p), ∀i } ∪ { F̂_{i,j}(c_p), ∀i, ∀j ≠ i } ),    (13)

where F̂_{i,j}(c_p) represents an approximation of the minimized color unmixing energy using the i-th and j-th color distributions as input. The major simplifying assumption we make here is that mixed colors mainly constitute major contributions from at most two distributions in the model. The first term in Equation (13) corresponds to colors that fit a single distribution well, while the second represents two-color mixtures. We approximate the two-color minimum color unmixing energy using the method we call projected color unmixing.

5.1.1 Projected Color Unmixing. The color line assumption [Ruzon and Tomasi 2000], that is, that the unmixed layer colors and the observed pixel color should form a line in RGB space, is a useful tool especially for sampling-based natural matting methods in the literature, such as Gastal and Oliveira [2010]. While this assumption can be used for the two-layer case, it does not generalize to N > 2. In this section, we utilize the color line assumption to provide an approximation to the color unmixing energy [Aksoy et al. 2016] for the N = 2 case, which we illustrate in Figure 8.

Color unmixing samples colors from 3D normal distributions in color space. In order to make use of the color line assumption, we restrict the possible space of samples taken from each distribution to a 2D plane. For a pair of normal distributions N_1(μ_1, Σ_1) and N_2(μ_2, Σ_2), the plane of possible unmixed colors for the first layer is then defined by the normal vector n = μ_1 − μ_2 and the point μ_1.

We determine the approximations û_{1,2} to the unmixed colors u_{1,2} as projections of the observed color c onto the two planes:

    û_{1,2} = c − ( ((c − μ_{1,2}) · n) / (n · n) ) n.    (14)

The alpha values are then determined such that they satisfy the color constraint:

    α̂_1 = ‖c − û_2‖ / ‖û_1 − û_2‖,    α̂_2 = 1 − α̂_1.    (15)

Fig. 8. An illustration of projected color unmixing. See text for discussion.

If c does not lie between the two planes, we conclude that the pixel p cannot be represented as a mixture of samples drawn from the two color distributions.

In order to compute the cost of this color mixture, we project the normal distributions onto the corresponding planes as well. We then apply the color unmixing energy formulation using these 2D distributions:

    F̂ = α̂_1 D_1^proj(û_1) + α̂_2 D_2^proj(û_2),    (16)

where we refer to F̂ as the PCU energy.

If the two largest principal axes of a normal distribution lie close to the plane defined above, then the estimated layer color is very close to the one found by color unmixing, as illustrated by distribution 1 in Figure 8. Otherwise, the approximation error is larger, as illustrated by distribution 2.
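The geometric core of projected color unmixing (Equations (14) and (15)) can be sketched as follows. This illustrative sketch only computes the projections and alphas, not the projected 2D costs of Equation (16), and the function name is hypothetical.

```python
import numpy as np

def projected_unmix(c, mu1, mu2):
    """Project the observed color c onto the two planes through mu1 and mu2
    with shared normal n = mu1 - mu2 (Eq. 14), then recover the alphas
    that satisfy the color constraint (Eq. 15)."""
    n = mu1 - mu2
    nn = np.dot(n, n)
    u1 = c - (np.dot(c - mu1, n) / nn) * n   # projection onto plane 1
    u2 = c - (np.dot(c - mu2, n) / nn) * n   # projection onto plane 2
    # Signed position of c between the planes; equals Eq. (15) when valid
    a1 = np.dot(c - mu2, n) / nn
    if not 0.0 <= a1 <= 1.0:
        return None   # c does not lie between the planes: invalid mixture
    return u1, u2, a1, 1.0 - a1
```

Since û_1 − û_2 = n, the signed ratio above coincides with ‖c − û_2‖ / ‖û_1 − û_2‖ whenever c lies between the planes, and its sign flags the invalid case.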

The approximation to the alpha values in Equation (15) is in fact the same as the alpha estimation equation proposed by Chuang et al. [2001] and utilized by many sampling-based natural matting approaches:

    α_1^B = ( (c − u_2) · (u_1 − u_2) ) / ‖u_1 − u_2‖².    (17)

The cost of using a pair of samples is typically defined in relation to chromatic distortion [Gastal and Oliveira 2010]:

    C = ‖ c − α_1 u_1 − α_2 u_2 ‖,    (18)

which is essentially the deviation from the color constraint. While both chromatic distortion and PCU measure the quality of the unmixing using the two samples/distributions, a significant difference between them is that PCU measures the cost when the color constraint is exactly satisfied, using the statistical models for the layers.

Experimentally, we found that the projected color unmixing energy approximates the two-layer color unmixing energy with an error rate of 15% on average, while running approximately 3,000 times faster on a standard PC. By estimating the unmixing costs efficiently, we reduce the total color model computation time fivefold compared to using the color unmixing optimization procedure. On average, computing the color model for a 1MP image takes 9s.

A step-by-step example of our color model computation procedure is presented in Figure 7. To summarize, we add new layer color distributions to our color model by first computing the per-pixel representation scores efficiently using projected color unmixing (Equation (13)). We then group the image pixels into color space bins and execute a voting scheme in which pixels with lower representation scores receive higher weights. Within the bin with the most votes (Equation (10)), we select the seed pixel from an image region where the gradient magnitude is low (Equation (11)). We compute the parameters of a normal distribution from the neighborhood of the seed pixel and add this distribution to the color model. We remove the well-represented pixels (Equation (12)) and repeat the procedure until no bin has enough votes.

Fig. 9. The color model and corresponding layers computed by the proposed method (left) and by Tan et al. [2016] (right). The layers on the right have been converted from overlay representation to alpha-add representation (see the appendix) for a more meaningful comparison. The number of layers differs between the two algorithms because we use the original result of Tan et al. [2016], which has 5 layers, while our automatic color model estimation method determined 6 dominant colors in the image. See text for discussion.

5.2 Color Model Estimation Methods in Literature

There are several methods to compute a color model given an image. We discuss common clustering methods such as k-means and Gaussian mixture models (GMMs), as well as the methods integrated with their soft color segmentation or color editing counterparts by Tai et al. [2007], Chang et al. [2015], and Tan et al. [2016].

Our method generalizes well over different types of images with a fixed parameter τ = 5, as opposed to the rest of the algorithms, all of which need the number of layers N as an input parameter that changes dramatically from image to image. In contrast, KNN [Chen et al. 2013], ML [Singaraju and Vidal 2011], and CU [Aksoy et al. 2016] rely on user input in the form of scribbles or seed pixels rather than a color model. RGBSG [Tan et al. 2016] requires the ordering of the layers as input in addition to N. It should be noted that there are generalized methods such as G-means [Hamerly and Elkan 2003] or PG-means [Feng and Hamerly 2006] for automatically estimating the number of clusters in arbitrary data sets. However, they do not take into account the specific characteristics of estimating the number of color layers, such as mixed-color or high-gradient regions.

The color model, referred to as the palette by Chang et al. [2015], is determined by a modified version of k-means for their palette-based recoloring application. They simplify the problem by using color bins rather than all of the pixel colors in the image and by disregarding dark color entries.

Clustering methods such as k-means or expectation maximization for GMMs, as well as the palette estimation by Chang et al. [2015], tend to produce layer color distributions with means far away from the edges of the RGB cube. This often results in the underrepresentation of very bright or highly saturated colors in the color model.

On the other hand, the color model estimation of AO, as well as GMM, typically results in normal distributions with high covariances, which has an adverse effect on the color homogeneity of the resulting layers. This is due to the lower energy achieved with large covariances in the expectation maximization for GMM. In the case of AO, the parameters are estimated using the layer colors at a particular iteration of their algorithm. This causes their algorithm to begin with large covariances, as in the first iteration they assume opaque pixels after an initial clustering, and this inclusion of unmixed colors in the model estimation causes large color variation in each layer. Large covariances promote non-uniform layers, and as a result, further iterations do not tend to make the color distributions more compact.

RGBSG begins its model estimation by assuming that the input image is formed by a limited set of colors. This assumption does not generalize well to natural images, which may include much more complex color schemes and mixtures. Their alpha estimation method requires the model colors to envelop the convex hull formed by the pixel colors in the image; hence, they identify the model colors by simplifying the wrapping of the hull. This results in the selection of colors that are either on the edge of or outside the convex hull. Hence, the colors they pick do not necessarily exist in the image, and a dominant color that lies in the middle of the RGB cube cannot be selected as a model color. The effect of this can be observed in Figure 9, where orange was not included in their color model; instead yellow, which does not appear anywhere in the image, was selected. The orange pixels are then represented as a mixture of red and yellow. Tan et al. [2016] point out that this behavior allows the algorithm to discover hidden colors in the image. However, it is suboptimal for image editing in various scenarios: when the edited color does not exist in the image, the effects of the edit become hard to predict. In addition, the simplified wrapping of the hull does not necessarily stay inside the range of valid color values. They map these imaginary colors onto the acceptable range, which may result in their alpha estimation not satisfying the color constraint.

Our method, in contrast to some of the competing approaches, exclusively selects colors that exist in the image. By constructing the model from selected seed pixels instead of clustering, we can include highly saturated or bright colors.

While each approach has its advantages and disadvantages, it should be noted that the integrated approaches of RGBSG, AO, and ours have characteristics coupled with their corresponding layer estimation methods. RGBSG requires colors that envelop the pixel colors, and AO requires an iterative approach to refine the layers together with the model. Our approach constructs the color model using the unmixing energy, which yields a model able to represent the pixel colors with low unmixing energy; this in turn makes our estimated layers have small color variation. We evaluate our automatic technique in Section 6.1.

6. EVALUATION

Table I. Quantitative test results for ML [Singaraju and Vidal 2011], SM [Levin et al. 2008b], KNN [Chen et al. 2013], AO [Tai et al. 2007], CU [Aksoy et al. 2016], and the proposed algorithm. Italic font indicates values below the quantization limit.

                 SM      ML      KNN       AO       CU      Ours
    α ∉ [0, 1]   34%     30%     0.0001%   0%       0%      0%
    Rec. Error   0.0039  0.0138  0.0183    0.00002  0.0003  0.0005
    Color Var.   0.050   0.044   0.006     0.051    0.001   0.005
    Grad. Corr.  0.78    0.71    0.84      0.55     0.56    0.89

Evaluating soft color segmentation methods is challenging, since it is not clear how the optimal set of layers resulting from decomposing a particular input image should look. In fact, even the existence of such a ground-truth soft color segmentation is questionable at best. Accordingly, in this section we present two types of evaluation. As qualitative evaluation, we present layers computed by our method in comparison with previous methods in Figure 12 and in the supplementary material for visual inspection by the reader. These results provide insight into the overall quality level of each algorithm and serve as a visual reference for the characteristic artifacts each method produces. For quantitative evaluation, we devise a set of blind metrics for assessing how well each method satisfies the constraints and requirements discussed in Section 3. These metrics are as follows:

—Out-of-bounds alpha values: Checks whether the box constraint (Equation (3)) is satisfied for the alpha values. The metric returns the percentage of alpha values outside the permissible range.

—Reconstruction error: Checks whether the color constraint (Equation (1)) is satisfied. The metric computes the average squared distance between an input image and the alpha-weighted sum of all its layers.

—Color variance: Assesses the color homogeneity of the layers. The metric returns the sum of the individual variances of the RGB channels, averaged over all layers of an input image.

—Gradient correlation: Assesses whether the texture content of the individual layers agrees with that of the input image. Inconsistencies such as abrupt edges in a layer that are not visibly identifiable in the original input image result in a low score. The metric returns the correlation coefficient between the alpha channel gradients of the layers and the color gradients of the original image. We define the alpha channel gradient as the L2 norm of the vector comprising the gradients of the alpha values of every layer. Similarly, we compute the L2 norm of the gradients of each color channel as the color gradient.
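The gradient-correlation metric above can be sketched as follows; this is our reading of the description (finite-difference gradients via NumPy), not the authors' exact implementation.

```python
import numpy as np

def gradient_correlation(alphas, image):
    """Correlate the L2 norm of per-layer alpha gradients with the L2 norm
    of per-channel color gradients. alphas: N x H x W; image: H x W x 3."""
    def grad_pair(channel):
        gy, gx = np.gradient(channel)
        return [gx, gy]
    # Alpha-channel gradient: norm over all layers and both directions
    ga = [g for a in alphas for g in grad_pair(a)]
    alpha_grad = np.sqrt(sum(g ** 2 for g in ga))
    # Color gradient: norm over all channels and both directions
    gc = [g for ch in range(image.shape[2]) for g in grad_pair(image[..., ch])]
    color_grad = np.sqrt(sum(g ** 2 for g in gc))
    return float(np.corrcoef(alpha_grad.ravel(), color_grad.ravel())[0, 1])
```

A layer whose opacity edges line up with image edges scores near 1; abrupt alpha edges with no corresponding image edge pull the score down.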

Note that the above metrics only evaluate constraints and requirements that are not satisfied by at least one method. For example, we excluded the box constraint for color (Equation (3)), since the particular implementations of all the methods we consider in our evaluation produce color values within the permissible range. We executed the above metrics on a test set of 100 diverse images and report the average numbers in Table I.[1]

We also analyzed the runtime and memory requirements of eachmethod both as a function of the color model size (Figure 10) andimage resolution (Figure 11). For the former test, we selected eightimages at 1MP resolution with varying color model sizes from 4to 11. In the latter test, where we investigate the scalability of eachmethod in terms of image size, we chose an image with sevenlayers,2 which we resized from 5MP down to 40kP. It should be

¹ All images and the corresponding layers produced by the six methods shown in the figure are presented in the supplementary material.
² The average size of the color model was 6.88 for the 100 images on which we do the evaluation.

Fig. 10. Computational resources needed for soft color segmentation with respect to the number of layers. We selected example images with the corresponding layer counts to see the trend in each algorithm. Note that the data axes are in logarithmic scale.

Fig. 11. Computational resources needed for soft color segmentation with respect to the image size. We scaled an image with 7 layers to see the trend in each algorithm. Note that the data axes are in logarithmic scale.

pointed out that our parallelized C++ application is compared to the MATLAB implementations of the competing methods except for CU. KNN, SM, and ML solve large linear systems that cannot be efficiently parallelized. However, these graphs still give us an idea of the scalability of each method.

In our evaluation, we used publicly available implementations of SM and KNN, whereas for CU, we used the original source code. We implemented ML using Levin et al.'s [2008a] publicly available matting Laplacian and foreground layer estimation implementation. We utilized the latter also for computing the layer colors of SM. As AO's implementation was not available,³ we implemented the method ourselves in MATLAB, where we utilized the UGM toolbox for the loopy belief propagation [Schmidt 2007]. Our own results were generated using our research prototype written in C++ with parallelization using OpenMP.

³ On correspondence with the first author, we learned that the original code is no longer available.

ACM Transactions on Graphics, Vol. 36, No. 2, Article 19, Publication date: March 2017.


Fig. 12. Comparison of the soft color segments produced by various algorithms including ours. Results of all the algorithms shown here are provided for 100 different images in the supplementary material. See text for discussion.

SM and AO require the number of layers N a priori as input, whereas KNN obtains N seed pixels through user interaction, CU requires N scribbles, and ML similarly requires N user-specified regions. In order to enable a meaningful comparison among all competing methods, we utilized the number N, as well as the N seed points, determined by our color model (Section 5). As the input to ML, we used N image regions, each comprising pixels with colors similar to a seed pixel. In order to avoid biasing the comparison by artistic talent, we excluded the local color models from CU. It should also be noted that SM, by design, produces spatially connected soft layers rather than emphasizing color homogeneity. Nevertheless, for completeness, we include SM in our evaluation.

A representative qualitative comparison is presented in Figure 12, where we show the layers produced by all competing methods.⁴

The figure clearly illustrates the shortcomings of the propagation-based methods SM and ML. For example, in ML's case, the sky and its reflection on the lake end up in two separate layers, despite having similar colors. The quantitative results for both methods, in agreement with our theoretical analysis (Section 4), indicate that they, in fact, suffer from alpha values outside the permissible range. Their

⁴ See the supplementary material for many other examples and further discussion.


Fig. 13. The soft color segments produced by the proposed algorithm (left) and by the method by Tan et al. [2016] (right). The layers by Tan et al. [2016] have been converted to the alpha-add representation from the overlay representation (see the appendix) for a more meaningful comparison. An extension of this figure is available in the supplementary material. See text for discussion.

results on the remaining metrics reveal that they are generally worse than the others for the task of soft color segmentation (Table I).

On the other hand, both Figure 12 and Table I confirm our claims from Sections 4 and 5 on AO's potential color homogeneity issues and the consequences of solving for alpha and color values separately. The figure clearly shows spurious hard edges on AO's layers, and the method performs badly on both the color variance and gradient correlation metrics.

While KNN does not share AO's weaknesses, it instead tends to fail to satisfy the color constraint for the reasons discussed in Section 4. Another significant drawback of KNN is its prohibitively long runtime, which prevents the method from being used on high-resolution images on current PC hardware.

The layers computed by CU in Figure 12 clearly show the visual effect of the absence of spatial coherency, which manifests itself as spurious hard edges throughout the layers. This problem is also evident from the gradient correlation metric outcomes in Table I, where CU gets the worst score among all the algorithms we evaluated.

We could not include RGBSG [Tan et al. 2016] in these comparisons because it requires the user to define the ordering of the layers. The quality of its layers depends on this ordering, and hence we could not use a random ordering to determine its true performance. This ordering is rather hard to define prior to soft color segmentation, even by a user, and the need for it is a significant shortcoming. Instead, we compare our layers against RGBSG in Figure 13, using the layers provided by Tan et al. [2016] as supplementary material to their article.⁵

RGBSG provides opacity layers with smooth transitions as we do, as opposed to the other algorithms we analyze in this section. Also, their layer colors are defined to be solid, which makes their color variation score 0. However, their formulation and model estimation require the model colors to envelop the pixel colors of the original image from the outside in RGB space, which makes the color content of their layers differ from the dominant colors in the image. This behavior, when coupled with the solid color layers, results in reduced sparsity in their layers. Figure 13 demonstrates this particular shortcoming. While the girl's hoodie has a particular shade of blue, it was not included in the color model of RGBSG. As a result, that region also has strong green and white components in addition to blue. Also, the purple color does not exist in the original image but is included in the model of RGBSG and computed as an additional layer. By containing a subset of actual image colors,

⁵ Our layers for the 18 images used by Tan et al. [2016] are provided as supplementary material.

our automatically computed layers provide an intermediate image representation that is more intuitive to use.

To summarize, all current soft color segmentation methods suffer from one or more significant drawbacks. Our method, on the other hand, successfully satisfies the alpha, box, and color constraints and produces layers with homogeneous colors. The transitions between the layers of our method are highly correlated with the texture of the original image. Importantly, due to its highly parallelizable per-pixel formulation, our method is more memory efficient and runs more than 20× faster than KNN and RGBSG, which are the closest competition in terms of the quality of results. The proposed algorithm can process a 100MP image in 4h using up to 25GB of memory. Note that KNN and AO can only process a 2.5MP image within the same time budget, and SM requires more than 25GB of memory for a 5MP image. Also, note that our modifications to the color unmixing formulation by Aksoy et al. [2016] cause only a modest performance hit, while significantly improving the quality of the layers.

6.1 Color Model Estimation

We also compare our color model estimation method (Section 5) with other methods, such as the specialized color model estimation schemes from AO, RGBSG, and palette-based recoloring [Chang et al. 2015], as well as the general-purpose clustering methods k-means and expectation maximization for GMM.

Figure 14 shows exemplary results for discussion, whereas the full set of color models for 100 images is presented in the supplemental material. These results demonstrate several advantages of our color model estimation method over others.

First, Figure 14 clearly demonstrates that the number of main colors N in an image (and thus the desired cardinality of the color model) is highly content dependent. However, all current methods require N to be specified a priori. Figure 14 shows that any fixed number will be overkill for certain images, while being too restrictive for others. Our method is highly advantageous in this regard, as it determines N automatically by analyzing the image content using a fixed parameter τ = 5. Note that, to enable a meaningful comparison, we set N for the competing methods to the value determined by our method.

One characteristic of clustering-based methods is the lack of highly saturated or bright colors in the estimated model. This is because, as they form clusters, the centers tend to be far from the edges of the RGB cube to get a lower clustering energy. By sampling colors directly from the image using our voting-based scheme, we are able to get the colors as they appear in the image. This behavior is especially apparent in the last image in Figure 14,


Fig. 14. Comparison of the color models estimated by our method, alternating optimization [Tai et al. 2007], expectation maximization using Gaussian mixture models (GMM), the RGB-hull based method used in RGBSG [Tan et al. 2016], the palette estimation used in palette-based recoloring [Chang et al. 2015], and k-means clustering. The green dots on the images denote seed points determined by the proposed algorithm. The hexagonal visualization shows the color variation along the three principal axes of the covariance matrices of the estimated 3D color distributions in their diagonals. The square visualization shows a single solid color, which is used for methods that only estimate colors rather than distributions. See text for discussion.

where k-means and palette-based recoloring fail to capture the vivid colors in the image.

RGBSG color model estimation, on the other hand, selects colors that lie outside the convex hull formed by the pixel colors in the image. This results in many colors appearing in the color model that do not exist anywhere in the image, such as the bright green entries in the second and third examples in Figure 14.

Another advantage of our color model estimation is its tendency to produce homogeneous color distributions, which directly influences the color homogeneity of the resulting layers computed by the soft color segmentation method. The hexagonal visualizations of each method's estimated color distributions in Figure 14 reveal that AO and GMM produce noticeably more color variation in each color model entry compared to our method.

Our method can also capture distinct colors confined to small image regions, such as the skin color in the middle image in Figure 14, or the green of the plant in the fourth image in the figure, which are missed by all other methods except palette-based recoloring. Since such regions usually form small clusters in color space, k-means and GMM tend to merge them into larger clusters.

While the competing methods have particular disadvantages, we refer the reader to Section 5.2 for the discussion on how and why the specialized model estimation methods in AO and RGBSG, as well as ours, fit well with their corresponding layer estimation counterparts.

7. APPLICATIONS

A key advantage of our framework is that it allows numerous, seemingly unrelated, image manipulation applications to be executed trivially once an input image is decomposed into a set of layers. For example, color editing can simply be performed by translating the colors of one or more layers, green-screen keying amounts to removing the layer(s) corresponding to the background, texture overlay can be achieved by introducing new texture layers obtained from other images, and so on. In this section, we show high-quality results for different image manipulation applications that were produced by first letting our method do the heavy lifting and then applying a small set of basic per-layer operations.
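The per-layer workflow described above can be sketched as follows. The helper names and array layout are ours for illustration, not the paper's API; the only assumption carried over from the text is the alpha-add composition of layers.

```python
import numpy as np

def compose(alphas, colors):
    """Alpha-add composition: image = sum_n alpha_n * u_n.

    alphas: (N, H, W) alpha channels, colors: (N, H, W, 3) layer colors."""
    return (alphas[..., None] * colors).sum(axis=0)

def shift_layer_color(colors, n, delta):
    """Color editing: translate the RGB values of layer n by `delta`."""
    out = colors.copy()
    out[n] = np.clip(out[n] + np.asarray(delta, dtype=float), 0.0, 1.0)
    return out

def remove_layers(alphas, colors, drop):
    """Keying/compositing: drop the listed layers; the sum of the
    remaining alphas acts as the matte of what is kept."""
    keep = [i for i in range(alphas.shape[0]) if i not in set(drop)]
    return alphas[keep], colors[keep]
```

Recoloring a layer and re-composing leaves all other layers, and hence the soft transitions between them, untouched, which is what makes these edits artifact-free compared to mask-based selections.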

Our method naturally integrates into the layer-based workflow adopted by the majority of current image manipulation packages. In Figure 15 (bottom), we list some basic image manipulation

operations that are implemented in image manipulation software packages. In practice, users can easily import the layers computed automatically using our method into their preferred software package and perform these edits through familiar tools and interfaces. This way, we prevent any unnecessary learning effort, as well as take full advantage of the powerful layer-based editing tools available.

We produced our application results by first exporting the layers computed automatically by our method to Adobe Photoshop and then performing a set of operations on individual layers.

We present our results in Figures 15 and 16, where input images and the corresponding automatically estimated color models are shown on the top, and edited images are shown at the bottom, along with the list of image manipulation operations applied to obtain the presented results. These image manipulation operations are denoted by small icons underneath the edited images, and the corresponding legend is presented at the bottom of Figure 15. For example, Figure 15 (Image (3)) is obtained by changing brightness, colors, and saturation on certain layers, whereas in Figure 15 (Image (8)) we only applied color changes to a subset of the layers. For brevity, in the remainder of this section we will refer to the specific results presented in Figures 15 and 16 solely by their designated indices.

In the following paragraphs, we discuss numerous examples of image manipulation applications and highlight our corresponding results. These applications can be categorized into layer adjustments, where we modify properties of existing layers, and compositing, where we add new layers to an image or remove existing ones. Note that some of our results contain specific manipulations that can be classified under more than one application, which can easily be performed under a single framework using our method.

7.1 Layer Adjustments

Users can easily enhance or even completely change image colors, as well as adjust brightness and exposure, on a per-layer basis. Note that while performing such edits globally throughout the entire image is trivial, making local adjustments is often challenging and prone to producing visual artifacts.

Color enhancement/change: A number of effects can be achieved by simply using the hue/saturation tools available in most image manipulation software, as well as the more sophisticated color adjustment tools (such as the vibrance control in Photoshop)


Fig. 15. Example results that were generated using the layers provided by the proposed algorithm. Each set shows the input image (top), the color model, the edited image, and the set of operations applied to individual layers. The set of operations is defined at the bottom. See text for discussion.

on selected layers of an input image. We showcase several examples, including changing the colors of clothing (Images (1), (4), (6), (8), (13), and (18)), fauna (Image (3)), illumination (Images (2) and (7)), sky (Images (16) and (17)), fire (Image (17)), and metallic surfaces (Image (15)).

Brightness/exposure change: The brightness and exposure of specific layers can similarly be modified to make slight adjustments in the overall image appearance. Such subtle edits are performed in most of our results presented in Figures 15 and 16. Additionally, in Image (16) and Figure 1(c), we demonstrate a local enhancement of the luminance contrast of the clouds, which results in certain details becoming visible and giving the impression of more volume compared to the input images. A particularly interesting image manipulation is showcased in Image (19), where we increase the intensity of shadows to facilitate establishing the skater as the center of attention in the composition.

Skin tone correction: Adjusting skin tones is one of the mostcommon photo editing operations applied in practice. Since humans

are usually the most salient scene elements and our visual system is highly tuned for detecting any imperfections, especially on faces, even the slightest visual artifacts caused by re-adjusting skin colors can be disturbing for the viewer. Our results (Images (1), (4), (6), and (18)) show examples of modified skin tones, which we achieved by making the mid-tones richer in the corresponding layer using the curve tool. The highlights on the faces can also be edited to be weaker (Images (1) and (4)).

7.2 Compositing

Our layers also serve as a useful intermediate image representation for general-purpose compositing applications.

Green-screen keying: Images (11), (12), (13), and (14) show our green-screen keying results obtained automatically by simply removing the layers corresponding to the green screen. Note that any other background object on the green screen (such as the markers in Image (14)) can easily be removed using a standard garbage matte.


Fig. 16. Example results continued. See text for discussion.

We additionally present a video result for green-screen keying in the supplementary material.

Texture overlay: Our method also allows additional layers to be overlaid onto existing ones. Image (10) shows such an example, where we extract the drawing from the blackboard in the source image and overlay it on a new image only by applying a perspective transform to the corresponding layers. Note that, due to their accurate opacity channels, the transferred layers properly mix with the texture of the concrete ground and shadows. In Image (15), we make a file cabinet appear more rugged by overlaying a texture pattern.

Layer replacement: Our method can also be used for removing existing layers and adding new ones, even if the content is not captured in a green-screen setting. Image (9) shows an example where

we extract the object from the background with the help of roto masks and use it to create a web page. Note that details such as the shadow of the motorcycle are retained in the composited result with proper opacity. Finally, in Image (18), we completely replace the original background with an external image while properly retaining the reflection on the window.

Our results demonstrate that state-of-the-art quality can be achieved using our soft color segmentation as an intermediate image representation, which in turn trivializes numerous image manipulation applications. This suggests that soft color segmentation is, in fact, a fundamental technical problem in image manipulation. By isolating this problem and solving it effectively and efficiently, our method serves as a unified framework for high-quality image


Fig. 17. Step-by-step editing of the image on the left in Adobe Photoshop using our layers (a), by a professional artist using only Adobe Photoshop (b), using the palette-based image editing tool by Chang et al. [2015] (c), and using the layers computed by AO [Tai et al. 2007] (d), KNN [Chen et al. 2013] (e), and RGBSG [Tan et al. 2016] (f). The AO and RGBSG rows lack one step each (the color of the sunset for AO and the color of the sea for RGBSG) because the layers corresponding to the intended edits did not exist in their color models. See text for discussion.

manipulation, which gives users significant flexibility for realizing their artistic vision.

8. COMPARISONS AT THE APPLICATION LEVEL

We demonstrated the use of our soft color segments in image editing in Section 7. Theoretically, any soft color segmentation method could produce layers that can be used for the same applications. We analyzed in Section 6 the differences between the current state of the art in soft color segmentation and the proposed method. In this section, we will demonstrate how these differences affect image editing results. We will also present the results of professional artists using commercially available tools for some of the demonstrated applications.

Figure 17 shows an image being edited by using our layers, the layers by AO, KNN, and RGBSG, as well as by a professional artist using Adobe Photoshop and using the palette-based recoloring application by Chang et al. [2015]. Creating a mask in Photoshop for a targeted color edit results in color artifacts when the artist attempts to change the pink of the coat to yellow. The soft color transitions and fine structures are the main source of the problem, since the layer masks do not include unmixing of the colors, and the color changes result in suboptimal color transitions with the neighboring regions. The feedback we got from the artist was that very precise masks should be drawn at the pixel level manually and targeted color

edits should be done to make the transitions look more natural, which is a very time-consuming and error-prone task. The fine structures and transitions are dealt with better by the palette-based recoloring. However, in this case, we see significant artifacts around bright regions. Using the layers by AO also creates significant visual artifacts, as the pink layer could not cover the full extent of the color. The layers by KNN do not reconstruct the original image, and this results in a degraded version of the image even without editing. The pink layer covers areas around the hair and in the background in the RGBSG layers, which results in some artifacts in the edited result. It can also be observed that the transition from pink to blue in the RGBSG layers is suboptimal, and some pinkish hue can be observed in this region after the color change.

We present color editing results using our layers on images used by Tan et al. [2016] (RGBSG) and Chang et al. [2015] (PBR) in Figure 18. We observed some matte smoothness issues in the results by RGBSG, as apparent in the top two images. The color change from blue to pink in the bottom example results in artifacts around the soft transition from the chair to the background. The luminance constancy constraint of PBR, which states that the luminance of a particular palette color should always be lower than that of other palette colors that are originally brighter in the non-edited palette, results in overexposed regions when one attempts to increase the luminance of the sky in the top example. Also, the orange hue in the boat example still exists on the ground after the color change from orange


Fig. 18. Color editing results using our layers (a), using the layers by Tan et al. [2016] (b), and using the recoloring application by Chang et al. [2015] (c) on images used by Tan et al. [2016] and Chang et al. [2015] in their papers. An extension of this figure is available in the supplementary material. See text for discussion.

Fig. 19. Comparison to Aksoy et al. [2016] with manually edited active colors (a), and the same method with all colors activated at all pixels (b). Their results without the manual active color selection show severe visual artifacts. On the other hand, the proposed method (c) achieves comparable quality to (a) despite the lack of any user interaction. The three images on the right show the work of an independent professional artist using multiple commercially available tools (d), only Keylight (e), and only IBK (f), as reported by Aksoy et al. [2016].

to purple, alongside other artifacts around the boat itself. The color change in the bottom example erroneously affects the color of the chair completely when PBR is used.

In practice, green-screen keying depends heavily on user interaction. The common practice in industry requires a specialized compositing artist to combine multiple commercially available tools such as Keylight, IBK, or Primatte to get the best possible result. The state of the art in the academic literature [Aksoy et al. 2016] also requires multiple user interaction stages, although it decreases the total interaction time significantly. We compare our results with those of Aksoy et al. [2016] with user interaction, their results without user interaction using only the color unmixing proposed in the same article [Aksoy et al. 2016], as well as with the work of a professional compositing artist that was provided in their article in

Figure 19. It can be observed in both examples in Figure 19 that we are able to conserve intricate details of the foreground object while requiring only a simple mask to get rid of the markers, and so on, in the background to clean the matte in a post-process. Note that this mask is only one of the interaction steps required in commercially available tools. Unlike the proposed method, using color unmixing without the interaction steps described by Aksoy et al. [2016] results in disturbing artifacts.

In summary, while, given enough time, our results could potentially be reproduced by a skilled artist utilizing various specialized tools for each task, our method in general significantly reduces the manual labor required for achieving production-quality results and provides artists with a unified framework for performing various image manipulation tasks conveniently.


Fig. 20. In some cases, the proposed color model estimation algorithm may give more layers than the user intends to make use of, as seen in the bottom row (c). However, these layers can easily be combined to be edited together, an example of which is shown in the top row (b) with the original image and the color model (a).

Fig. 21. Although our automatic method is not able to deal with color spill (c) as well as Aksoy et al. [2016] (b), it is possible to use commercial keying software such as Keylight to get rid of the spill as a post-processing step (d).

9. LIMITATIONS

In some cases, the automatically estimated color model comprises color distributions that are subjectively similar, such as the three separate white layers and the separate dark brown/black layers in Figure 20(c). While this level of granularity might be useful, a more concise color model would be more convenient for certain edits. Fortunately, combining multiple layers into one is trivial in our framework (Figure 20(b)). It is worth noting that the transitions between layers with perceptually similar color distributions are still smooth and have accurate alpha values. Therefore, they can still be edited separately if the user intends to do so.

Our method has a significant advantage over the state of the art in green-screen keying [Aksoy et al. 2016] in that it completely replaces the cumbersome two-step user interaction process with a fully automatic workflow without sacrificing the quality of the results in general. That said, our sparse color unmixing energy formulation (Section 3), which is designed for general-purpose soft color segmentation, is not as effective in dealing with color spill (indirect illumination from the green screen) as the method by Aksoy et al. [2016]. Fortunately, we found that this problem can easily be alleviated in practice, as off-the-shelf tools can effectively remove spill even in challenging cases, such as Figure 21, after only a few mouse clicks.

The color model estimation method we propose selects seed pixels from the image and hence comprises only colors that exist in the image. This strategy is effective for including colors with high

brightness or saturation when compared to the clustering-based methods, as discussed in Section 6.1. However, one limitation is that it cannot identify colors that only appear as a mixture in the image, such as the original color of colored glass or smoke that is not dense. For estimating such a color that does not appear opaque anywhere in the image, a different specialized approach is needed that would estimate the partial contributions from an existing incomplete color model in order to isolate the missing color.

Finally, as our soft color segmentation method does not utilize high-level information such as semantic segmentation or face detectors, the spatial extent of our soft segments does not necessarily overlap with semantically meaningful object boundaries. That said, it is also fairly easy to limit the spatial extent of layers by introducing masks, since our layers naturally integrate into current image manipulation software.

10. CONCLUSION

We proposed a novel soft color segmentation method that is especially suitable for image manipulation applications. We introduced SCU, which gives us compact preliminary layers. We discussed the challenges of enforcing spatial coherence and proposed a color refinement step that prevents visual artifacts. We also proposed a method for automatically estimating the color model that is required for solving the color unmixing problem, which completely removes the need for user input. A key component in our color model estimation is projected color unmixing, which enables efficient computation of the representation scores we need for compact color models. We presented theoretical reasoning as well as quantitative and qualitative evaluations showing that our results are superior to previous work in soft color segmentation. We also demonstrated state-of-the-art results in various image manipulation applications. The future directions for our research include addressing the limitations of our method discussed in Section 9, as well as extending our per-frame method to exploit temporal information in order to natively handle image sequences.

APPENDIX

A. ALPHA-ADD AND OVERLAY LAYER REPRESENTATIONS

For handling layers, our blending formulation in Equation (1) corresponds to the alpha add mode present in Adobe After Effects. The normal blending option in Photoshop differs slightly from ours, which is defined for two layers as follows:

\[
u_o = \frac{\alpha_a u_a + \alpha_b u_b (1 - \alpha_a)}{\alpha_a + \alpha_b (1 - \alpha_a)}, \tag{19}
\]

\[
\alpha_o = \alpha_a + \alpha_b (1 - \alpha_a), \tag{20}
\]

where $u_o$ and $\alpha_o$ define the overlaid result with the Photoshop-adjusted alpha value, $u_a$ and $\alpha_a$ define the top layer, and $u_b$ and $\alpha_b$ define the bottom layer. We refer to the layers following this representation as overlay layers, with alpha values denoted by $\hat{\alpha}$, in opposition to the representation used in our formulation, to which we refer as alpha-add layers, with alpha values $\alpha$. Unlike the alpha-add representation, where the ordering of the layers is irrelevant, overlay layers depend on a pre-defined layer order.
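As an illustration, the two-layer normal blend above can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation; the function name and scalar inputs are our own, and colors are treated as single values in [0, 1]:

```python
def normal_blend(u_a, alpha_a, u_b, alpha_b):
    """Photoshop-style 'normal' blend of a top layer (u_a, alpha_a)
    over a bottom layer (u_b, alpha_b), per Equations (19)-(20)."""
    alpha_o = alpha_a + alpha_b * (1.0 - alpha_a)  # Eq. (20)
    if alpha_o == 0.0:
        # Fully transparent result; the color value is arbitrary.
        return 0.0, 0.0
    # Eq. (19): alpha-weighted color, normalized by the combined alpha.
    u_o = (alpha_a * u_a + alpha_b * u_b * (1.0 - alpha_a)) / alpha_o
    return u_o, alpha_o

# A half-opaque white layer over an opaque black layer yields mid gray:
print(normal_blend(1.0, 0.5, 0.0, 1.0))  # (0.5, 1.0)
```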

Assuming that the layers form an opaque image when overlaid, given the layer order from 1 to N, with the Nth layer being at the top, one can convert alpha-add layers to overlay layers as follows:

\[
\hat{\alpha}_n =
\begin{cases}
\dfrac{\alpha_n}{\sum_{i=1}^{n} \alpha_i} & \text{if } \sum_{i=1}^{n} \alpha_i > 0,\\
0 & \text{if } \sum_{i=1}^{n} \alpha_i = 0,
\end{cases}
\qquad n \in \{1, \ldots, N\}. \tag{21}
\]

Since some regions are completely occluded by the layers on top, the alpha values assigned to them are arbitrary, although we defined $\hat{\alpha}_n = 0$ when $\sum_{i=1}^{n} \alpha_i = 0$. If the artist intends to remove some of the layers during editing for compositing applications as we demonstrate in Section 7, then those layers should be placed at the bottom before the conversion.

Similarly, the overlay layers can be converted to alpha-add layers using the following formulation:

\[
\alpha_n = \hat{\alpha}_n \left( 1 - \sum_{i=n+1}^{N} \alpha_i \right), \qquad n \in \{1, \ldots, N\}. \tag{22}
\]

Note that the layer colors $u_i$ are not affected by these conversions.
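The two conversions can be sketched as follows. This is a minimal sketch under the stated assumption that the alpha-add alphas sum to 1 per pixel; the function names are our own. Converting to overlay alphas and back should reproduce the input:

```python
def alpha_add_to_overlay(alphas):
    """Convert alpha-add alphas (layer N last, on top) to overlay
    alphas per Equation (21). Assumes sum(alphas) == 1."""
    overlay = []
    partial = 0.0
    for a in alphas:              # n = 1 .. N, bottom to top
        partial += a              # running sum_{i=1}^{n} alpha_i
        overlay.append(a / partial if partial > 0 else 0.0)
    return overlay

def overlay_to_alpha_add(overlay):
    """Invert the conversion per Equation (22), top to bottom."""
    alphas = [0.0] * len(overlay)
    above = 0.0                   # sum of alpha-add alphas above layer n
    for n in reversed(range(len(overlay))):
        alphas[n] = overlay[n] * (1.0 - above)
        above += alphas[n]
    return alphas

# Round trip on three layers whose alpha-add values sum to 1:
alphas = [0.2, 0.5, 0.3]
overlay = alpha_add_to_overlay(alphas)  # bottom layer becomes fully opaque
print(overlay_to_alpha_add(overlay))    # recovers [0.2, 0.5, 0.3]
```

Note that the bottom overlay layer always becomes fully opaque (its running sum equals its own alpha), which matches the assumption that the layers form an opaque image when overlaid.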

ACKNOWLEDGMENTS

We thank Alessia Marra for discussions on possible applications and her help in result generation; Jean-Charles Bazin, Henning Zimmer, Jordi Pont-Tuset, and the anonymous reviewers for their feedback on the text; Yu-Wing Tai and Jianchao Tan for our correspondence about their publications; and Andrew Speilberg for the voice-over of the accompanying video.

The original images in Figures 1, 2, 3, 5, 14 (Images (1), (3), and (4)), 16 (Images (9), (15), (16), (17), and (19)), and 20 are from Death to the Stock Photo; in Figure 4 from Flickr user Marcelo Quinan; in Figure 6 (top) from Flickr user Jason Bolonski; in Figure 15 (Image (2)) from Flickr user Nana B Agyei; in Figure 15 (Image (3)) from Flickr user taymtaym; in Figure 15 (Image (7)) from Flickr user Paul Bica; in Figure 16 (Images (10), (13), and (14)) by Flickr user Dave Pape; and in Figures 16 (Images (11) and (12)) and 19 from the movie Tears of Steel by (CC) Blender Foundation (mango.blender.org). The result images in Figures 1(d) and 16 (Images (9)–(12)) were generated by Alessia Marra, and the ones in Figures 1(c), 15 (Images (1)–(8)), and 16 (Images (13)–(19)) by Yagız Aksoy.


Received August 2016; revised December 2016; accepted December 2016

ACM Transactions on Graphics, Vol. 36, No. 2, Article 19, Publication date: March 2017.