
Int J Comput Vis
DOI 10.1007/s11263-013-0675-3

Collaborative Personalization of Image Enhancement

Ashish Kapoor · Juan C. Caicedo · Dani Lischinski · Sing Bing Kang

Received: 21 February 2013 / Accepted: 16 November 2013
© Springer Science+Business Media New York 2013

Abstract This paper presents methods for personalization of image enhancement, which could be deployed in photo editing software and also in cloud-based image sharing services. We observe that users do have different preferences for enhancing images and that there are groups of people that share similarities in preferences. Our goal is to predict enhancements for novel images belonging to a particular user based on her specific taste, to facilitate the retouching process on large image collections. To that end, we describe an enhancement framework that can learn user preferences in an individual or collaborative way. The proposed system is based on a novel interactive application that allows us to collect users' enhancement preferences. We propose algorithms to predict personalized enhancements by learning a preference model from the provided information. Furthermore, the algorithm improves prediction performance as more enhancement examples are progressively added. We conducted experiments via Amazon Mechanical Turk to collect preferences from a large group of people. Results show that the proposed framework can suggest image enhancements more targeted

A. Kapoor (B) · S. B. Kang
Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
e-mail: [email protected]

J. C. Caicedo
University of Illinois at Urbana-Champaign, 201 North Goodwin Ave., Urbana, IL 61801, USA
e-mail: [email protected]

D. Lischinski
The Hebrew University of Jerusalem, Room 73, Ross Building, The Edmond J. Safra Campus, Jerusalem 91904, Israel
e-mail: [email protected]

S. B. Kang
e-mail: [email protected]

to individual users than commercial tools with global auto-enhancement functionalities.

Keywords Image enhancement · Personalization · Collaborative filtering · Crowdsourcing

1 Introduction

The widespread use of digital cameras now empowers many people to capture photographs of important events as well as everyday life activities. However, despite progress in hardware for high resolution images, stabilization capabilities, and light-adaptive functionalities, taking good quality pictures remains a challenge for casual photographers. At times, what promised to be a very memorable picture does not look like it should. Almost every photograph could benefit from some tone and color adjustment, but tweaking adjustment parameters for every single image is impractical.

Available software for managing digital albums, such as Windows Live Photo Gallery and Picasa, provides generic functionality for one-click enhancement. While these general purpose tools make it easy to improve image quality without fine-tuning enhancement controls, they do not consider any user preference. If the result of the auto-enhancement feature is unsatisfactory, the user would then have to fall back to manually refining the image appearance.

In this work, we show that people have different preferences for image enhancement. This means that when one person is allowed to enhance her photographs, she tends to adjust the final look and feel following some personal patterns and preferences. These preferences can arise for many reasons, including personal taste, artistic expression, or other personal aspirations associated with the image. We propose to investigate methodologies that can save the effort of


enhancing each picture separately, while at the same time considering the user's preferences to produce enhanced images. We address the problem of designing computer methods that can assist users in the task of enhancing a photo collection following personal preferences. This goal poses several research questions and challenges:

– Although it is reasonable to assume that users can have different preferences for image enhancement, it is important to identify the extent to which these preferences exist and how significant the differences may be.

– If there are different preferences between individual users, can we find users that share similar preferences at a larger scale?

– Is it possible to predict personalized enhancements for individuals or groups? This challenge presents various computational issues that are inter-disciplinary; more specifically, these issues straddle computer vision, human computer interaction, and machine learning.

– How can we build practical personalized image enhancement systems and evaluate their performance? Efficient computational methods are required to conduct real-time user studies, based on interactive systems.

This paper presents a framework for personalized image enhancement that demonstrates the potential of personalization for both individual profiles and collaborative filtering environments. We report user studies that indicate the existence of personal tastes for photo enhancement, and also that preferences can be shared among various individuals in a large group. To conduct such studies, we design an interactive user interface for image enhancement that allows subjects to easily choose improved images using data visualization techniques, while at the same time capturing and recording user preferences.

We study two different issues in building such predictive image enhancement systems: (a) individually learnt preferences and (b) collaboratively learnt preferences. To handle the first issue, we model a personal user profile by finding a set of diverse images in the collection and asking the user to train the system using this set only. Then, the system learns and predicts enhancements for the rest or for novel images. We propose to handle the second issue via a collaborative approach, where the system learns preferences from many groups of users and then allows prediction of enhancements for new users. The performance of the system improves as more information about individual preferences is incorporated. We compared the enhancements produced by our methods to free commercial auto-enhance tools and found that personalized enhancements, both individually and collaboratively learnt, lead to better predictions.

The contribution of this work is twofold: first, we present an end-to-end pipeline for personalized image enhancement at large scale, and discuss all the issues that such a system presents for implementation in practice. This involved various adaptations and extensions to existing techniques, and also the formulation of solutions to new problems that arise when building an interactive and adaptive system for image enhancement. Second, we present a unique computational model for image enhancement in a collaborative setting, which can scale to assist a large number of users to enhance images, requiring as little effort as possible from each one. The proposed solution extends the scope of collaborative filtering to a new setting that, to the best of our knowledge, has not been approached before. Parts of this work were published in Kang et al. (2010) and Caicedo et al. (2011), and this article presents a unified framework and extended discussions.

The structure of the paper is as follows: we first survey techniques relevant to our work (Sect. 2); this is followed by an overview of our personalized enhancement framework (Sect. 3), including details of the main components of an interactive enhancement system. Next, we present the model for collaborative image enhancement in Sect. 4. User studies and experimental results for both individual users and collaborative environments are presented in Sect. 5. Finally, Sect. 6 presents discussions and concluding remarks.

2 Previous Work

Our work involves image enhancement, user interaction, and collaborative filtering. In this section, we briefly review relevant work in these areas.

2.1 Image Enhancement

Most techniques for automatic image correction or enhancement typically focus on very specific features. For example, there is substantial work on denoising (e.g., Portilla et al. 2003), geometric distortion removal (e.g., Farid and Popescu 2001), and optical correction (e.g., Kang 2007) from a single image. There are also techniques for automatically linearizing the color space of a single image, e.g., through inverse gamma correction (Farid 2001) and linearizing the color edge profile in RGB space (Lin et al. 2004). Such approaches generally produce results that are objectively better than the inputs and thus user-independent.

There is also a fair amount of work done on automatic or semi-automatic color correction (e.g., Gehler et al. 2008; Gijsenij and Gevers 2007; Hsu et al. 2008). Because of the ill-posed nature of the problem, such techniques may fail when assumptions made (such as average pixels being gray) are not applicable. There are also learning-based techniques for automatically enhancing images (e.g., dictionary learning Elad and Aharon 2006; Mairal et al. 2009 and example-based


Freeman et al. 2002), but the database used for learning tends to be generic.

The basic idea of image-derived priors is to exploit information in a predefined image set to automatically determine the most likely parameters for enhancing an image. It has been used for a wide range of tasks including denoising (Fergus et al. 2006), color correction (Stanikunas 2004), and image restoration (Dale et al. 2009). These approaches use generic data sets with a large number of images that do not provide any information about user identity or user preferences. Grabler et al. (2009) use sample image manipulations to learn macros to be applied to images with similar content.

Joshi et al. (2010) narrowed the domain of prior images to a person's favorite photographs to develop a personal image enhancement framework. They focus their application on detecting and enhancing faces of the photographer's family and friends, matching pictures that the user previously selected as favorites. This personalization scheme differs from our proposed framework in the sense that we explicitly use a person's enhancement preferences, instead of extracting them from implicit information on favorite photos. Also, we extend the scope of preference priors to a multi-user environment to transfer preferences collaboratively.

The work of Bychkovsky et al. (2011) proposed learning image enhancements from examples given by experts. They hired 5 trained photographers to enhance 5,000 images and proposed algorithms to learn automatic adjustment functions for each person. Since this is infeasible for normal users, they also proposed strategies to transfer user adjustments to new photographers. Our work differs from this strategy in that we do not rely on thousands of images enhanced by experts; instead, we let a crowd of normal users collaborate and share similar preferences.

2.2 User Interaction

To make it easier for the user to train our system, we designed its interface so that the user only needs to click through a series of images that the user deems to be visually more pleasing. Our interface is similar to those of Marks et al. (1997) and Shapira et al. (2009). Adobe Photoshop has a pre-defined set of “actions” (macros) that simplifies batch image enhancement, and allows the user to save new actions. However, it is not clear how these actions can be automatically customized for a given user. Besides, our work combines various learning algorithms with user interaction to make the experience of enhancing images more intuitive and simpler, as described in Sect. 3.

2.3 Collaborative Filtering

Our method builds upon both personalization of image enhancement and collaborative filtering techniques. For personalization of image enhancement, our approach requires users to enhance a set of training images using an intuitive interface to collect preference information.

Collaborative filtering is an approach to build recommender systems, which analyzes patterns of user interest in products to provide personalized recommendations (Koren et al. 2009). Intuitively, collaborative filtering works by building a database of preferences for items by users, and matching new users against this database to find neighbors with similar preferences (Sarwar et al. 2001). Most existing approaches for collaborative filtering rely on the notion of users, items, and ratings. The analogues of these in our system are users, images, and enhancements. Items in collaborative filtering are discrete objects with unique identity (e.g., the Titanic movie, the Harry Potter book), so users can be linked to them without ambiguity. Images, on the other hand, are non-trivial to handle as in general they are very diverse, and determining similarity requires modeling visual as well as semantic understanding.

Traditionally, the goal of collaborative filtering is to recommend items to users, which is achieved by predicting scalar ratings. This problem has been naturally modelled as a matrix completion problem, for which different approaches have been studied, including probabilistic matrix factorization (Salakhutdinov and Mnih 2007), maximum margin matrix factorization (Rennie and Nathan 2005), and non-linear matrix factorization with Gaussian processes (GP) (Lawrence and Urtasun 2009). This is, however, very different from our goal with collaborative image enhancement, as instead of a single rating we need to predict a vector of enhancement parameters. Our goal is to predict structured enhancements for new given images, using a model of user preferences conditioned on image content. In this work this is accomplished by incorporating the wisdom of the crowd as a prior to assist the user with image content that has not been used for training before. These characteristics require computational models that are beyond the study of conventional collaborative filtering. We describe the proposed approach for collaborative image enhancement in Sect. 4.

3 Personalization of Image Enhancement

We present a system for learning image enhancement preferences as depicted in Fig. 1. Our first goal is to collect enhancement parameters that the user likes for a small set of training images. The next goal is to learn a model to predict enhancements for new images. This section presents the computational framework to achieve the first goal in an interactive system. The system first asks the user to enhance a set of training images from which our system will learn to personalize future enhancements. To assist the user during the training phase, a database is automatically constructed by selecting a representative set of training images. Then, a


simple and general interface guides the user to enhance each of these images. The model for learning novel enhancements using this information is presented in Sect. 4.

There are three components of the interactive system that serve as core building blocks to collect enhancement preferences: (a) the image enhancement pipeline (Sect. 3.1), (b) automatic database organization (Sect. 3.2), and (c) the user interface (Sect. 3.3). These components are central to our interactive system, and allow us to capture individual user preferences during training.

3.1 Image Enhancement Pipeline

Figure 2 presents the enhancement pipeline, which is focused on global enhancement operations in the spatial domain. The enhancement is performed in two steps: auto-enhancement and personalized enhancement. The auto-enhancement step is necessary to handle bad quality photos that the system is not trained to handle. This step generates a baseline image that is then further adjusted using personalized enhancement.

We use three parameters associated with contrast and two associated with color correction to account for global transformations that could make the appearance of an image more pleasant. We limit the number of parameters to five primarily to limit complexity, since the search space grows

Fig. 1 Basic idea of our approach. The control parameters for image enhancement are represented by vector φ

Fig. 2 Image enhancement pipeline

exponentially with the number of parameters. The following is the list of our enhancement parameters:

– Power curve: τ (contrast)
– S-curve: λ (contrast)
– S-curve: a (contrast)
– Temperature: T (color correction)
– Tint: h (color correction)

As for auto-enhance, we apply two basic operations. First, auto-white balance, making a gray-world assumption for the brightest 5 % of the pixels. Second, auto-contrast stretch, linearly stretching the brightness between 0.4 % of the darker pixels and 1 % of the brighter pixels. More details about these two operations and the five parameters for personalized enhancement can be found in the Appendix.
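The following is a minimal sketch of these two auto-enhance operations, assuming 8-bit RGB images stored as NumPy arrays; the function names and the exact percentile handling are illustrative rather than the implementation used in our system.

```python
import numpy as np

def auto_white_balance(img, bright_fraction=0.05):
    """Gray-world white balance computed over the brightest 5% of pixels."""
    flat = img.reshape(-1, 3).astype(np.float64)
    luma = flat.mean(axis=1)
    # Keep only the brightest fraction of pixels.
    thresh = np.percentile(luma, 100 * (1.0 - bright_fraction))
    bright = flat[luma >= thresh]
    # Scale each channel so that the bright pixels become gray on average.
    gains = bright.mean() / bright.mean(axis=0)
    return np.clip(flat * gains, 0, 255).reshape(img.shape).astype(np.uint8)

def auto_contrast_stretch(img, low=0.4, high=1.0):
    """Linearly stretch brightness between the 0.4% darkest and 1% brightest pixels."""
    flat = img.astype(np.float64)
    lo = np.percentile(flat, low)          # cut-off for the darkest pixels
    hi = np.percentile(flat, 100 - high)   # cut-off for the brightest pixels
    stretched = (flat - lo) / max(hi - lo, 1e-6) * 255.0
    return np.clip(stretched, 0, 255).astype(np.uint8)

def auto_enhance(img):
    """Baseline image: white balance followed by contrast stretch."""
    return auto_contrast_stretch(auto_white_balance(img))
```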

3.2 Training Data Collection and Organization

Since our personalized image enhancement scheme is data driven, it is critical to carefully consider issues in collection and organization of training data. The training images can be extracted from a personal collection of photos of a single user, or from a large collection of images from multiple users. In either case, the two critical tasks required to automatically organize the database are: (1) matching images with similar enhancement requirements and (2) finding an informative subset of images to ask the user for enhancements. We approach these two tasks using the learning algorithms presented below.

3.2.1 Learning a Distance Metric Between Images

We handle the issue of image similarity using distance metric learning. Note that there are many different metrics to compare images. However, our task is to determine similarity such that it correlates well with the enhancement parameters. In particular, we would like images that require similar enhancement parameters to be similar to each other; consequently, our goal is to learn a distance metric that enforces such regularization.

We construct the distance metric between two images as a linear combination of 38 different individual distances. These distances include differences of histograms in each of the channels of RGB, HSL, and intensity space using different ways to measure histogram distances (L2 norm, symmetric KL-divergence, smoothed L2 norm). We also consider the difference in intensity histogram of gradient images in both x and y directions. Finally, in addition to the distribution-based distances, we also consider distances that use entropy in each of the channels as well as the size and the aspect ratio of images (Fogarty et al. 2008). Formally, the parametric form of the distance metric between images I_i and I_j is


$$D^{\mathrm{images}}_{\alpha}(i, j) = \sum_{k=1}^{38} \alpha_k D_k(i, j). \tag{1}$$

Here, α are the parameters in the linear combination and D_k(·) denotes the individual distances computed.

We start by selecting a large set of images and computing distances among them. Assume that we knew enhancement parameters beforehand for all these images. Then, we would seek to learn a distance metric D^images_α, parameterized with α, such that it minimizes the following objective:

$$\alpha^{*} = \arg\min_{\alpha} \sum_{i,j} \left\| D^{\mathrm{images}}_{\alpha}(i, j) - D^{\mathrm{params}}(i, j) \right\|^{2}, \tag{2}$$

where D^params(i, j) is the L2 norm distance between the enhancement parameters for I_i and those for I_j.

The objective (2) examines all pairs of images and measures how much the distance in image space differs from the distance in the parameter space. Thus, minimizing this objective leads to finding an appropriate distance function that reflects how far apart two images should be in terms of their enhancement parameters. Note that this objective is convex; the unique optimum can be found using a closed form solution. However, since the size of the problem grows quadratically with the number of training images, we explored gradient descent approaches to make our solution scalable to large collections of images. In our first implementation, we used limited memory BFGS (Liu and Nocedal 1989), a quasi-Newton optimization algorithm. We then opted to follow the methodology of Jain et al. (2008), an online algorithm for learning Mahalanobis distances efficiently. This specific online algorithm is well suited for our task; for an image collection with 10,000 examples, the problem has about 100 million constraints, which makes it non-trivial for other distance metric learning methods to handle. In our large-scale user study, we learnt this distance using 300,000 web images on a single machine.

Note that the optimization procedure needs enhancement parameters for all images, and it is not feasible for any user to find these parameters manually. Instead, we used the automatically-determined parameters from our auto-enhance component to estimate D^params(·). Although not personalized, these parameters do capture how these images are likely to be enhanced, and thus it is assumed that the distance metric learnt using these proxy parameters leads to reasonable estimates of the relevant distance metric in the image space.

The learnt distance is adopted to establish a similarity relation between the images and serves as a core component of the system. As shown by the experimental results, the learnt distance in fact gives a reasonably good notion of how close two images are with respect to their enhancement requirements.
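Since objective (2) is linear in α, one compact way to view it is as an ordinary least-squares problem over image pairs. The sketch below assumes the 38 individual distances have been precomputed for a set of pairs and uses a closed-form solver; our actual implementations used limited memory BFGS and the online method of Jain et al. (2008) for scalability, so this is illustrative only and the names are not those of our code.

```python
import numpy as np

def learn_distance_weights(D_individual, D_params):
    """
    D_individual: array of shape (P, 38), one row per image pair (i, j),
                  holding the 38 individual distances D_k(i, j).
    D_params:     array of shape (P,), the L2 distance between the
                  enhancement parameters of each pair (the proxy targets).
    Returns alpha minimizing sum over pairs of
        (sum_k alpha_k * D_k(i, j) - D_params(i, j))^2.
    """
    alpha, *_ = np.linalg.lstsq(D_individual, D_params, rcond=None)
    return alpha

def learned_distance(alpha, d_pair):
    """Distance between one image pair given its 38 individual distances."""
    return float(d_pair @ alpha)
```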

3.2.2 Selection of Training Set

Ideally, we need a rich enough training data set that samples well the set of possible input images and appropriate enhancement transformations. Unfortunately, including a large number of images in the training set is not feasible, because each user would have to go through a tedious training phase. Thus, we seek to answer the following question: if the typical user is willing to spend the time enhancing, say, only 25 images, what should those 25 images be?

We cast this question of selecting the training images as a sensor placement problem (Krause et al. 2008). Each instance can be thought of as a possible sensor location, where a probe is placed in order to “sense” the space of images. Given a sensor budget (the number of training images), we choose a set that can provide maximum information about the rest of the images. Intuitively, our approach builds upon the observation that instances that are close to each other can be expected to share similar properties, including the appropriate enhancement parameters. Thus, our aim is to select a subset of instances that shares the highest mutual information with the rest of the high-dimensional space and is therefore most representative of the full set.

Krause et al. (2008) proved that the exact optimization of mutual information is NP-hard for the problem of placing k sensors out of N locations. However, they give theoretical guarantees for a constant-factor approximation algorithm that runs in polynomial time. We employ this approach to select a maximally informative set. Formally, we consider a GP (used in Krause et al. (2008)) perspective with covariance function (alternatively, kernel function):

$$k_{ij} = \exp\left(-\frac{D^{\mathrm{images}}_{\alpha}(i, j)}{\mathrm{mean}\!\left(D^{\mathrm{images}}_{\alpha}(:)\right)}\right) \tag{3}$$

Here D^images_α(i, j) is the learnt distance as described in Sect. 3.2.1. The procedure results in a ranking of all of the input images. The top k images are then selected as the training set to collect user preferences for image enhancement.

Even though the solution is approximate, it has good guarantees. In our case, the task is to first consider all example images sampled from a large collection, in an attempt to cover the continuous image enhancement parameter space. From those potential positions, the algorithm selects k points that give maximal mutual information of the space that they are covering. The 25 images selected using this technique are shown in Fig. 3.
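A simplified sketch of this selection step is given below: it builds the kernel of Eq. (3) from the learnt pairwise distances and greedily picks the set with (approximately) maximal mutual information using naive conditional-variance computations. The lazy evaluations and approximations needed at the scale of our study are omitted, and the helper names are illustrative.

```python
import numpy as np

def kernel_from_distances(D):
    """Kernel of Eq. (3): k_ij = exp(-D_ij / mean(D))."""
    return np.exp(-D / D.mean())

def conditional_variance(K, y, S):
    """Var(y | S) for a zero-mean GP with covariance matrix K."""
    if len(S) == 0:
        return K[y, y]
    K_SS = K[np.ix_(S, S)] + 1e-8 * np.eye(len(S))   # jitter for stability
    K_yS = K[y, S]
    return K[y, y] - K_yS @ np.linalg.solve(K_SS, K_yS)

def select_representatives(K, k):
    """Greedy mutual-information maximization in the style of Krause et al. (2008)."""
    n = K.shape[0]
    selected, remaining = [], list(range(n))
    for _ in range(k):
        best, best_gain = None, -np.inf
        for y in remaining:
            rest = [r for r in remaining if r != y]
            # The MI gain of adding y is 0.5 * log of this variance ratio:
            # Var(y | selected) / Var(y | all other unselected points).
            gain = conditional_variance(K, y, selected) / \
                   max(conditional_variance(K, y, rest), 1e-12)
            if gain > best_gain:
                best, best_gain = y, gain
        selected.append(best)
        remaining.remove(best)
    return selected
```

In our pipeline, D would be the matrix of learnt pairwise distances from Sect. 3.2.1, with k = 25 for the individual study and k = 200 for the collaborative study.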

3.3 Interactive User Interface for Training

We developed a user interface that allows a user to seamlessly explore the space of possible enhancements for each of the training images and indicate his/her preferences. This user


Fig. 3 Selected 25 training images from the LabelMe dataset. These images were used in the individual user study

Fig. 4 Two versions of the UI. a Our interface, where the central image is currently being considered while the images at the periphery are different modified versions. b A version where the same images are arranged in linear order. The subject can flip between (i.e., blink compare) any pair of enhanced image candidates by mousing over them

interface, shown in Fig. 4, was designed to allow non-expert users to quickly steer a given training image towards the most subjectively pleasing version, using only a few mouse clicks. The design of our interface is similar to those of Marks et al. (1997) and Shapira et al. (2009); the user has the choice of toggling between the 3 × 3 tiled view in (a) and the linear view in (b). While the tiled view provides an overview that reflects the underlying structure of the enhanced image space, the linear view enables a higher resolution view with an ability to do pairwise comparisons across candidates.

The basic idea in both views is to give the subject a UI where she can inspect and browse the space of images resulting from all possible combinations of enhancement parameters. Since the number of images resulting from all possible combinations is prohibitively large, we use tools from machine learning to reduce the number of images shown to the user at any given time, and lay them out in a manner that reflects some structure in the space of enhanced images.

More specifically, given an input image, we first apply the enhancement operators described in Sect. 3.1 to sample a neighborhood of the image in the enhancement space. This neighborhood is sampled by considering 3 parameter settings for each of the 5 enhancement parameters: a negative step, a zero step, and a positive step. All possible combinations of these steps yield 3^5 = 243 candidate images. From these images, we select 8 representatives using the same sensor selection procedure described in Sect. 3.2.2 and display them to the user. To lay the 8 images out on the screen, non-linear dimensionality reduction using ISOMAP (Tenenbaum et al. 2000) is employed, projecting the structure of the original space onto a 2D plane. Figure 4a shows an example of such a tiling.

The current, unmodified image is displayed as well, resulting in a total of 9 choices. The user then selects the version that he/she likes the best by clicking on it, and the entire procedure is repeated around the selected image. The process continues until the user selects the unmodified image. The user is also able to control the step size used to generate the variations.
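One round of this interaction can be sketched as follows, reusing select_representatives from the sketch in Sect. 3.2.2. For brevity the candidates are compared directly in parameter space rather than with the learnt image-space metric, and the ISOMAP layout and the rendering of each candidate are omitted; all names are illustrative.

```python
import itertools
import numpy as np

def candidate_parameters(current_phi, step_sizes):
    """All 3^5 = 243 parameter vectors: a negative, zero, or positive step per parameter."""
    offsets = itertools.product(*[(-s, 0.0, +s) for s in step_sizes])
    return np.array([np.asarray(current_phi) + np.asarray(o) for o in offsets])

def candidates_to_display(current_phi, step_sizes, k=8):
    """Pick k representative candidate parameter vectors to show around the current image."""
    phis = candidate_parameters(current_phi, step_sizes)
    # Pairwise distances between candidates (illustrative stand-in for the learnt metric)
    # and the kernel of Eq. (3) built from them.
    D = np.linalg.norm(phis[:, None, :] - phis[None, :, :], axis=-1)
    K = np.exp(-D / D.mean())
    reps = select_representatives(K, k)   # greedy selection from the Sect. 3.2.2 sketch
    return phis[reps]
```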

3.4 Learning Individual Preferences

Once the training information is collected for a particular user, an individual profile is defined and used as a non-parametric model of her preferences. This personal profile stores feature vectors describing each training image I_in,i (previously enhanced by the user during training), along with a vector of enhancement parameters φ_i. Given a new input image, this profile is then searched for the best matching


image using the learnt metric. The corresponding enhancement parameters are used to perform the enhancement.
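A minimal sketch of such a profile and its nearest-neighbour prediction is shown below; it assumes a callable that returns the 38 individual distances of Sect. 3.2.1 for a pair of images, and the class and method names are illustrative.

```python
import numpy as np

class PersonalProfile:
    """Non-parametric profile: training images and the user's chosen parameters."""

    def __init__(self, alpha, pair_distances):
        self.alpha = alpha                    # weights learnt in Sect. 3.2.1
        self.pair_distances = pair_distances  # callable: (img_a, img_b) -> 38 distances
        self.examples = []                    # list of (training image, phi)

    def add_example(self, image, phi):
        self.examples.append((image, np.asarray(phi)))

    def predict(self, new_image):
        """Return the enhancement parameters of the closest training image."""
        dists = [self.pair_distances(new_image, img) @ self.alpha
                 for img, _ in self.examples]
        return self.examples[int(np.argmin(dists))][1]
```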

4 Enhancing Images Collaboratively

The previous section presented a novel interactive system for assisting users in enhancing a set of images to indicate their personal preferences. This interactive system is data-driven and highly automated to simplify the way users enhance images, so they can concentrate on picking the best looking image instead of tweaking parameters with manual controls. In what follows, we present a model for learning to enhance images in a collaborative environment, i.e., when multiple users enhancing images share similar preferences. We call this setting collaborative image enhancement since it resembles collaborative filtering strategies to recommend movies or products. However, despite several similarities in concept, there are fundamental differences as well, as mentioned in Sect. 2.3.

Figure 5 summarizes the flow of tasks for collaborative image enhancement. We first acquire a database of images that are enhanced by multiple users, and represent the database as a table (leftmost of Fig. 5) whose rows are individual users and columns are images. Each entry in the table corresponds to the enhancement parameters associated with a particular user u and image x_i, and is represented by the vector θ^u_i. The images used for training are a fixed set, and are reasonably representative of images that need enhancement. Note that this set of representative images is large; hence, it is unreasonable to expect that each user will provide enhancement parameters for all the images. Instead, every user enhances only a small subset of images, and as such, there are quite a few entries in the table that remain unobserved (denoted as ‘?’).

Once this database is acquired, we analyze it to learn a statistical model that explicitly encodes clustering of users. Specifically, we recover groupings of users and estimate the preference parameters of each individual group for all the representative images (middle of Fig. 5). Once such a statistical model is learned, we can enhance a new (i.e., unseen during training) image for any cluster by considering its similarity to the images in the representative set.

Let us denote the collection of all the n images as X = {x_i}. The user study provides us with sets of enhancement parameters chosen by all the m users. Let θ^u_i be the enhancement parameters chosen by user u for image x_i. Because of resource and time constraints, every user enhances only a small subset of images (20 images out of 200), and consequently there are a lot of images for which we do not directly observe the enhancement parameters corresponding to each user. Our goal is to derive a framework that can (1) infer these missing enhancement parameters for every user in the study, and more importantly, (2) determine enhancement parameters for new images and users. The framework is shown in Fig. 5.

Our model is motivated by methods in collaborative filtering. In particular, the first key underlying assumption here is that similar users should have similar enhancement parameters for similar images. Assumptions like this are also at the heart of many collaborative filtering methodologies; we could in fact adapt those methods to infer missing enhancement parameters for all the users in the study. However, most of the work in collaborative filtering focuses on predicting a scalar quantity, while our goal is to model a vector of enhancement preferences. While off-the-shelf approaches could be individually adapted to each enhancement parameter, such a simplistic scheme ignores the structure of relationships across parameters and will be sub-optimal.

In this work, we extend collaborative filtering to a structured prediction setting where we jointly predict all the components of the enhancement preferences. Specifically, our model not only encodes similarity across images and users but also models relationships between different components of the enhancement space. The notion of user similarity in our system corresponds to grouping of users into clusters. Thus, users are clustered if they have similar

Fig. 5 Overview of collaborative personalization of image enhancement. A database of users is built, with θ^u_i the observed enhancement vector for user u and image x_i. Enhancements are not observed for every combination of images and users, which is indicated by the symbol ‘?’ in the matrix. An algorithm is used to discover cluster membership for users, and to infer enhancement vectors y^c_i associated with cluster c for each ith image. New images are enhanced based on similarities to existing images and the cluster to which the user belongs


enhancement parameters for all the images. In the next subsections, we describe (1) a probabilistic graphical model for jointly predicting enhancement preferences that explicitly encodes similarity across images and groups users into clusters, (2) an efficient inference algorithm, and (3) extensions to make predictions on unseen images and users that were not part of the user study.

4.1 Probabilistic Model for Enhancements

We propose a model that encodes the dependence of enhancement parameters on image content as well as user preferences. Specifically, given the collection of images and clustering of users, we assume that there are latent enhancement preference vectors y^c_i which correspond to cluster c and image x_i. Further, the enhancement parameter vectors we observe in the Mechanical Turk (MT) study are simply noisy versions of these latent true enhancement preferences. Figure 6 illustrates the factor graph corresponding to the proposed model. The observed enhancement preferences θ^u_i from different users are denoted as circles, and we introduce a discrete random variable (squares) h_u, u ∈ {1, .., m} for each user that indicates the cluster the user belongs to. We also use Y_i to denote the collection of all true enhancement preferences for the ith image across all the clusterings. The shaded nodes correspond to random variables that are observed. For example, in Figure 6, the enhancement preferences for user 1 are known for images 2 and n, while user m did not enhance image 2. This is consistent with our data set where users enhance only a subset of all training images.

Fig. 6 Factor graph depicting the proposed model. Shaded nodes correspond to observed preferences. Not all observations are available for all the images and users. Square boxes depict latent random variables corresponding to cluster membership

The model imposes smoothness constraints using a GP prior (Seeger 2004) in order to account for the image content, enforcing the assumption that “similar” images should have similar preference parameters. In particular, for each cluster c we induce a GP prior (denoted as GP(c)), where each of the five components of the latent variables y^c_i and y^c_j are assumed to be jointly Gaussian with zero mean and the covariance specified using a kernel function applied to x_i and x_j. Formally, GP(c) ∼ ∏_{p=1}^{5} N(y^c(p); 0, K). Here y^c(p) is the column vector of the pth component of the enhancement preference for all images corresponding to cluster c, and K is a kernel matrix¹ with K_ij = k(x_i, x_j) that encodes similarity between pairs of images. We use the kernel described in Eq. 3.

Note that the GP prior above does not explicitly encode the relationship between the components of the parameters, and we can assume that the components of y^c_i have been whitened (zero mean, with unit covariance) beforehand². Also, note that all the dimensions of the latent variable Y are coupled in the model, and performing inference will preserve the relationships between different components.

Let Θ be all the enhancement preferences from all the users and for all the images. Our proposed model induces a conditional probability distribution p(Θ, Y, h|X) using the GP prior p(Y|X), prior probabilities on the cluster membership p(h), and the potential terms p(θ^u_i | x_i, Y_i, h_u) that link the latent image preferences to the ones that are observed. Thus, the conditional distribution induced by our model can be written as

$$p(\Theta, \mathbf{Y}, \mathbf{h} \mid X) = \frac{1}{Z}\, p(\mathbf{Y} \mid X)\, p(\mathbf{h})\, p(\Theta \mid X, \mathbf{Y}, \mathbf{h}) = \frac{1}{Z} \prod_{c=1}^{k} GP(c) \prod_{u=1}^{m} p(h_u) \prod_{i=1}^{n} \phi_{h_u}\!\left(\theta^u_i, Y_i\right),$$

where GP(c) ∼ ∏_{p=1}^{5} N(y^c(p); 0, K), Z is the partition function (normalization term), and the potential φ_{h_u}(θ^u_i, Y_i) corresponding to a user u and image x_i takes the following form:

$$\phi_{h_u}\!\left(\theta^u_i, Y_i\right) \propto \exp\left(-\frac{\left\| y^{h_u}_i - \theta^u_i \right\|^{2}}{2\sigma^{2}}\right). \tag{4}$$

Here, y^{h_u}_i is the hidden random variable for the cluster indicated by h_u and the image x_i, and σ² is the noise parameter that determines how tight the relation between the smoothness constraint and the final label is. By changing the value of σ we can emphasize or de-emphasize the effect of the GP prior. In summary, the model provides a powerful framework for encoding the dependence of enhancement parameters on image content (via the GP prior) as well as the clustering of users, and allows us to combine the prior assumptions with the data that is observed in the MT study. We would like to mention that the proposed model is in the spirit of existing research on mixtures of GP models (Tresp 2001; Kapoor et al. 2005). The core distinguishing factor here is that the proposed model is used to predict a vector in a framework that resembles collaborative filtering.

¹ This kernel matrix is a positive semidefinite matrix and is akin to the kernel matrix used in classifiers such as SVMs.
² In this work, we whiten the enhancement preferences before applying the model and eventually re-project back to the original space, which preserves the structure of the parameter space.

4.2 Inference in the Model

Given the observations Θ_o from the MT study, the key task is to infer the posterior distribution p(Y, h|X, Θ_o) over the latent true enhancement preferences and the cluster membership for all the users. Performing exact inference is prohibitive as the joint distribution is a product of Gaussian (the GP prior and the φ(·) potentials) and non-Gaussian terms (cluster membership). We resort to approximate inference techniques in order to get around this problem. In particular, we perform an approximate inference by maximizing the variational lower bound with the assumption that the posterior over the unobserved random variables Y and h can be factorized:

$$F = \int_{\mathbf{Y}, \mathbf{h}} q(\mathbf{Y})\, q(\mathbf{h}) \log \frac{p(\mathbf{Y}, \mathbf{h} \mid X, \Theta_o)}{q(\mathbf{Y})\, q(\mathbf{h})} \;\leq\; \log \int_{\mathbf{Y}, \mathbf{h}} p(\mathbf{Y}, \mathbf{h} \mid X, \Theta_o),$$

where q(Y) = ∏_{c=1}^{k} ∏_{p=1}^{5} q(y^c(p)) is assumed to be a Gaussian distribution and q(h) = ∏_{u=1}^{m} q(h_u) is a discrete joint distribution over the unobserved labels. The approximate inference algorithm aims to compute good approximations q(Y) and q(h) to the real posteriors by iteratively optimizing the above described variational bound. Specifically, given the approximations q_t(Y) and q_t(h) from the tth iteration, and assuming a uniform prior over p(h), the update rules are:

$$q_{t+1}(\mathbf{y}^c) \propto GP(c) \prod_{\theta^u_i \in \Theta_o} \left[\phi_c\!\left(\theta^u_i, Y_i\right)\right]^{q_t(h_u = c)},$$

$$q_{t+1}(h_u = c) \propto \prod_{\theta^u_i \in \Theta_o} \phi_c\!\left(\theta^u_i, \mathrm{mean}\!\left(q_{t+1}(Y_i)\right)\right).$$

Intuitively, the update of the image enhancement preferences considers the cluster membership from the previous iteration and uses it to decide whether a data term should be included in the update for each cluster. Similarly, the update for the posterior over the cluster membership considers the mean enhancement preferences from the previous iteration. Thus, starting from a random initialization, the parameters and the posterior of the cluster memberships are iteratively updated until convergence. Upon convergence, we obtain the posterior distribution of cluster membership for each user (q(h)) as well as the distribution over the true image enhancement preferences (q(Y)), approximated as a Gaussian distribution.
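The sketch below outlines one possible implementation of these update rules, assuming whitened enhancement vectors, a uniform prior over p(h), and a boolean mask marking which user/image entries were observed. Initialization, convergence checks, and numerical safeguards are simplified, and the variable names are illustrative.

```python
import numpy as np

def variational_em(K, Theta, observed, n_clusters, sigma2=1e-3, n_iters=20, seed=0):
    """
    K:        (n, n) kernel matrix between training images (Eq. 3).
    Theta:    (m, n, d) observed enhancement vectors (whitened); arbitrary where unobserved.
    observed: (m, n) boolean mask of which user/image entries were collected.
    Returns cluster responsibilities R of shape (m, k) and latent means Y of shape (k, n, d).
    """
    rng = np.random.default_rng(seed)
    m, n, d = Theta.shape
    k = n_clusters
    R = rng.dirichlet(np.ones(k), size=m)               # q(h_u = c), random init
    K_inv = np.linalg.inv(K + 1e-8 * np.eye(n))
    Y = np.zeros((k, n, d))

    for _ in range(n_iters):
        # Update q(Y): a responsibility-weighted GP regression per cluster.
        for c in range(k):
            w = (R[:, c:c+1] * observed).sum(axis=0)     # (n,) effective observation counts
            b = np.einsum('u,uid->id', R[:, c], Theta * observed[:, :, None])
            A = K_inv + np.diag(w / sigma2)              # posterior precision
            Y[c] = np.linalg.solve(A, b / sigma2)        # posterior mean per dimension
        # Update q(h): responsibilities from squared error to each cluster's means.
        log_R = np.zeros((m, k))
        for c in range(k):
            err = ((Theta - Y[c][None]) ** 2).sum(axis=2)          # (m, n)
            log_R[:, c] = -(err * observed).sum(axis=1) / (2 * sigma2)
        log_R -= log_R.max(axis=1, keepdims=True)
        R = np.exp(log_R)
        R /= R.sum(axis=1, keepdims=True)
    return R, Y
```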

4.3 Handling New Images and Users

Besides inferring about the images in the database, we are also interested in predicting enhancement preferences for an image x_test that was not in the original training set, as well as for a user who was not part of the user study.

The more straightforward case is where we infer enhancement preferences for a user u who was a part of the user study and had enhanced training images for us. Here, we simply need to run the inference algorithm where we augment the random variable Y with Y_test, the collection of random variables that represent the true enhancement parameters for the new test image across all the clusterings. Thus, the new image can be enhanced using the mean of the inferred enhancement preferences corresponding to the cluster the user u belongs to. Note that if we have already performed variational inference on the training data, inference after the augmentation only requires one iteration of the update equations. This is because the new image does not introduce any new information about the clusterings of the users or the enhancement parameters, hence making it fairly efficient.

It is trickier if we need to enhance images for a user who is not part of the study. However, once we know the cluster membership of the new user, we can simply use the scheme mentioned above to estimate the enhancement preferences. Thus, the problem really reduces to establishing the cluster membership of the user. To this end, we can employ several methods. For example, the new user can enhance some of the images from the training set, which in turn will give us evidence about membership. The user can now be considered a part of the available corpus and inference can be run as before. Alternatively, we can also engage the user in an introductory dialogue where training images are used; the resulting enhancements will indicate membership. In this work, we use the former approach of asking users to enhance a subset of training images to test the extension of the system to new users and images.
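As an illustration of the prediction step for a new image, the sketch below treats the inferred cluster means as pseudo-observations at the training images and uses the standard GP posterior mean at the test image. This is a simplification of the single extra variational update described above, not the exact procedure, and the names are illustrative.

```python
import numpy as np

def predict_for_new_image(k_test, K, Y, R, user_index):
    """
    k_test:     (n,) kernel values between the test image and the n training images.
    K:          (n, n) kernel matrix between training images.
    Y:          (k, n, d) inferred latent preference means per cluster (from variational_em).
    R:          (m, k) cluster responsibilities from inference.
    user_index: index of the user for whom we predict.
    Returns a d-dimensional enhancement vector for the new image.
    """
    c = int(np.argmax(R[user_index]))                          # user's most likely cluster
    weights = np.linalg.solve(K + 1e-8 * np.eye(K.shape[0]), k_test)
    return weights @ Y[c]                                      # GP posterior mean at x_test
```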

5 User Studies and Experiments

The proposed model is evaluated in two settings: individual personalization of enhancement (Sect. 5.1) and collaborative personalization of enhancement (Sect. 5.2). The first is a controlled setup that aims to observe the extent to which individual models help to obtain personalized enhancements. The latter is a large scale setup involving multiple users and web images in a collaborative environment.


Fig. 7 A photograph of a window in a wall automatically selected for testing. From left to right: input, Picasa auto-enhanced, Photo Gallery auto-enhanced, enhanced using preferences of subjects 1, 7, and 9, respectively. In this case, the subject-enhanced images were favored by the respective subjects. Notice the significant differences in color across the different versions (Color figure online)

5.1 Individual Personalization of Enhancement

Our work is predicated on the assumption that image enhancement is highly person-dependent. First we show, through user studies, that this assumption is valid, and then we show that it is feasible to learn preferences in image enhancement. The following experiments focus on the question: how significant are personal preferences in image enhancement? To answer this question, we investigate the impact of an individual model on a user's preferences and compare it to other ways of enhancement. An individual model is a non-parametric profile of enhancements, as described in Sect. 3.4 (Fig. 7).

The experiment was run in two phases: the first was the training phase, where we collected image enhancement data; the second phase included user studies with pairwise comparisons among various enhanced flavors of test images in order to test the effectiveness of the personalized image enhancement system. For both stages, we used the same 14 subjects (colleagues and acquaintances in our organization), 9 males and 5 females. None of the subjects are experts in photography. An example image enhanced by different methods and users is shown in Fig. 7.

5.1.1 Experimental Setup

For the first stage of the study, subjects are asked to enhance a set of training images to collect enhancement preferences. We selected 25 training images (Fig. 3) from 5,000 images of the LabelMe dataset (Russell et al. 2008)³ following the procedure described in Sect. 3.2.2. We found 25 to be a reasonable number, resulting in an interactive training session of 25–45 min in our user studies. Next, we conducted the second stage of our experiment: pairwise comparison. Let us denote the set of subjects as B = {b_i, i = 1, ..., 14}, with b_i being the ith subject. We ask B to perform pairwise comparisons amongst the following versions of a test image:

³ Available at http://labelme.csail.mit.edu/

1. Original
2. Auto-corrected using Picasa⁴
3. Auto-corrected using Windows Live Photo Gallery⁵
4. Personal model (using the subject's preferences)
5. Median model (using a “median” subject's preferences)

The “median” subject is selected by first computing the sum-of-squared distance over the enhancement parameters associated with all the training images for subject b_i and those for each of the other subjects B − {b_i}. The “median” subject is the subject from B − {b_i} with the median distance. The “median” strategy simulates the case of generic preference parameters that satisfy the majority of the population. Note that instead of computing the average, it is preferable to use a median operator; the median operator allows us to choose a specific enhancement profile instead of a possibly non-existent one.
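A small sketch of this selection, assuming each subject's training enhancements are stacked into one matrix keyed by subject id (the names are illustrative):

```python
import numpy as np

def median_subject(phi_by_subject, target):
    """
    phi_by_subject: dict subject_id -> (n_train, 5) array of enhancement parameters.
    target:         subject id b_i for whom the median subject is sought.
    Returns the id of the other subject at the median sum-of-squared distance.
    """
    ref = phi_by_subject[target]
    others = [(s, float(((p - ref) ** 2).sum()))
              for s, p in phi_by_subject.items() if s != target]
    others.sort(key=lambda t: t[1])
    return others[len(others) // 2][0]   # subject with the median distance
```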

The following pairs were compared: 4–1, 4–2, 4–3, 4–5, and 2–3. Note that the order of pairs and the placement of the first and second images within each pair shown to the subject are randomized. The interface for the pairwise comparison is shown in Fig. 8. The subject selects either “left,” “right,” or “no preference.” The user study was conducted in the same room, under the same lighting condition, and on the same display as the first (learning) phase.

We used 20 test images for our user study, and for each of those images the subject selects her preferred version in 5 pairwise comparisons. In that way, we collect 100 votes from each of the 14 subjects to aggregate preference statistics. Test images were selected from a large group of images taken by our colleagues and from the web. They were selected based on two criteria: (1) there is a reasonable variation in scenes and lighting condition, and (2) they look like they require some form of color correction and/or contrast enhancement. The images are all different from those used for training. The pairwise comparison portion took each subject between 10 and 20 min to complete.

⁴ http://picasa.google.com/
⁵ http://download.live.com/photogallery


Fig. 8 Interface for pairwise comparison

5.1.2 Results of Pairwise Comparison

The results of the user study are summarized in the two graphs in Fig. 9. We first looked at the result of pairwise comparisons across different subjects. In particular, considering the 20 images in the test data, we look at the percentage of times a participant chose a system in each of the comparisons (subject vs. input, subject vs. median, subject vs. Picasa, subject vs. Photo Gallery, and Picasa vs. Photo Gallery). In summary, for every subject we have the percentage of times (out of 20 images) that the participant chose a system over another for each pairwise task, and can analyze the data for significant effects.

Figure 9a graphically shows the means of these percentages averaged over all 14 subjects (error bars denote the standard error). We further did significance analysis using the Wilcoxon (1945) rank sum test for each of the pairwise conditions, and found significant differences in scores between subject versus input (p < 0.01) and subject versus median (p < 0.01). These results indicate that the participants overwhelmingly selected their own model (mean = 59.30 %) over the input image (mean = 30.70 %), suggesting that the procedure did help enhance the input image. More interestingly, the participants also preferred their own model (mean = 57.10 %) over the Median model (mean = 28.20 %), suggesting that the preferences among participants vary quite a bit, and providing further evidence that personalized image enhancement is required instead of just a single “auto-enhance” functionality. Figure 7 shows the different versions for one of the test images.

Finally, the participants showed some bias in preference towards the personalized enhancement when compared to existing commercial systems (mean = 50.35 vs. 39.30 % against Photo Gallery and mean = 44.65 vs. 41.75 % against Picasa). While the difference was significant for Photo Gallery (p < 0.05), it was not for Picasa. Note that the proposed system only uses 5 simple enhancement operations; we hypothesize that the personalized enhancement has the potential to further improve upon the performance of the existing commercial systems by using them as a “pre-process” step and then overlaying the personalization.

Next, we also compared the number of subjects that preferred one system over another in the five pairwise comparisons. Specifically, we consider that a participant prefers one system over another when she chose more images

Fig. 9 Results of the individual user study show the overwhelming preference of subjects for images enhanced by their own personalized model. Subjects selected one of the two enhanced versions of a test image in 5 pairwise comparisons. Each group of three bars corresponds to one pairwise comparison. a Graph comparing mean frequency of image version favored (in percent over 20 images, with standard deviation bars). b Graph comparing number of subjects predominantly favoring the image version. The results are averaged over 14 subjects


corresponding to the former system than the latter. Figure 9b graphically shows the results. To judge the significance of the numbers, we did an exact binomial sign test, and the results indicate that the subject's personalized model was significantly preferred over the input image (p < 0.01) and the median model (p < 0.01).

Note that this individual personalization framework is mostly suitable for small-scale training for personalized image enhancement. One of the bottlenecks of applying such a supervised learning paradigm is the difficulty in collecting training images for every individual user and photograph style. In Sect. 5.2, we specifically focus on the issue of scaling the enhancement framework. The key question that motivates the next part of the work is: are there any clusters in the training data? The existence of clusters (both in terms of users and enhancement operations) would suggest a simpler approach via “preset” preferences (each set corresponding to a cluster), with the best “preset” preference explicitly or implicitly selected by the user.

We handle the scaling problem by observing and learning from multiple users using methods similar to collaborative filtering. For more practical larger-scale training and to facilitate the discovery of common features across users, we propose that image enhancement be done collaboratively.

5.2 Collaborative Personalization of Enhancement

We carried out a large scale user study to explore the effectiveness of collaborative image enhancement. This experiment was conducted in two stages as well, starting from the collection of enhancement parameters from a large population of users, from which a model has to be learnt. The second stage aims to validate whether the collaborative model predicts better enhancements for an individual.

5.2.1 Experimental Setup

A larger image database was built for this experiment by downloading a set of real photographs from Flickr, following a “sampling through time” strategy in which the parameter “date taken” is randomly changed (Chua et al. 2009; Hays and Efros 2008). The list of keywords includes “landscape,” “nature,” “people,” and “sports,” among others, to cover a diverse range of photographs without focusing on particular objects. We filtered out images that are too small (images smaller than 800 × 800 are removed), black-and-white, or have very little intensity variation (measured by histogram entropy). In the end, our downloaded set consisted of more than 300,000 photos.

We also observed that users tend to upload many similar images, so the sensor placement algorithm (Sect. 3.2.2) was applied to images from the same user, to keep only the three most “different” images from each individual. We then combined these filtered images and ran the algorithm again to select a final photo collection for the user study. We found that with 200 images, we kept about 84 % of the information from a set of approximately 25,000 candidates. We then use these 200 as training images, which is significantly more photographs for training than were used in the previous experiment (200 vs. 25). However, each subject handles only 20 images.

The next step was to build the user-image matrix of enhancement parameters (the same parameters as described in Sect. 3.1). We opted for Amazon Mechanical Turk (AMT) to engage people in our image enhancement task. Each person had to enhance a small number of images by selecting the most preferred enhanced version via an interactive web system. We managed to collect enhancement parameters from 336 valid users (Fig. 10).

Two main criteria guided the design of the system for image enhancement used in this study: (1) ease of use, and (2) web orientation. For the first criterion, the user interface does not require parameter tweaking to apply enhancements, because our users are not expected to be expert photographers. Consequently, we simply used the 3 × 3 presentation mode of the UI described earlier (Fig. 4a), with an enhanced version of the image in each cell. For the second design criterion, the application is required to be a web-oriented system, to allow people to enhance images online through AMT. We opted to deploy our application using cloud computing services, which allows our system to scale up as needed to compute image enhancements online. In our user study, we asked each participant to enhance 20 images, which are randomly assigned from the collection of 200 photos. The assignment of images to users attempts to have approximately the same number of users enhancing each image. The maximum time allowed to complete this task was 1 hour and the reward was US$1.50. The click count, click rate, and time stamps were recorded to identify spammers.

5.2.2 Learning the Collaborative Model

We first perform experiments to determine the appropriate number of clusters, using the data collected in the AMT study. Train-test splits were created by randomly choosing 10 % of the observed enhancement vectors in the user-image matrix as test examples. The other 90 % is used to run the inference algorithm described in Sect. 4.2, with σ² fixed to 10⁻³. Evaluating the quality of user adjustments is difficult, since there is no universally agreed-upon perceptual metric. We therefore resorted to an objective measure of intensity difference, reporting the average root mean square error (RMSE) in parameter space (enhancement vectors) as well as in image intensity space (resulting enhanced images).
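The evaluation protocol in parameter space amounts to the following sketch; intensity-space RMSE additionally requires rendering each enhancement, which is omitted here.

```python
# Minimal sketch of the evaluation protocol: hold out 10% of the observed
# entries of the user-image matrix and report RMSE over 5-D parameter vectors.
import numpy as np

def split_observed(mask, test_frac=0.1, seed=0):
    """mask: boolean (n_users, n_images) matrix of observed enhancements."""
    rng = np.random.default_rng(seed)
    obs = np.argwhere(mask)                    # (n_obs, 2) user/image indices
    rng.shuffle(obs)
    n_test = int(len(obs) * test_frac)
    return obs[n_test:], obs[:n_test]          # train indices, test indices

def rmse(pred, target):
    """pred, target: (m, 5) arrays of enhancement parameter vectors."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```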

Figure 11 highlights the variation of RMSE as we change the number of clusters in the model.

Fig. 10 Box plots of values for each parameter (first three control global contrast, last two control color correction). The x-axis indicates cluster number and the y-axis corresponds to the range of values. Red lines denote the median, end lines are at the lower and upper quartile values, and crosses beyond the ends of the whiskers are outliers (Color figure online)

Fig. 11 Average estimation error on training for different numbers of clusters

Note that considering only one cluster is equivalent to assuming that all users follow the same preferences for enhancing images. We observe that the error is highest in both parameter and intensity spaces when the number of clusters is fixed to one, and reduces significantly as additional clusters are incorporated. This result is consistent with the observations reported in Sect. 5.1, where we show that different users do have significantly different preferences. We also find that the error was lowest in the parameter space for 3 clusters and very close to its minimum in the intensity space. Thus, we perform the rest of the analysis and experiments with the number of clusters fixed to 3.

The proposed method can be thought of as a clustering algorithm with constraints imposed by similarity in the image appearance space. Consequently, it is in principle the same as an EM algorithm for learning a Gaussian mixture model, and it does not favor solutions in which most users form their own distinct clusters. We believe that the increase in error with more clusters may be due to the small number of enhancement operations allowed, and we can expect to see different clusterings if additional operations are permitted.
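To make the analogy concrete, the plain (unconstrained) EM counterpart of this step can be written in a few lines; the appearance-space constraints of the actual model are omitted, and the data array below is only a placeholder.

```python
# Plain EM / Gaussian-mixture analogue of the clustering step, without the
# image-appearance constraints used in the paper. X holds one enhancement
# parameter vector per observed (user, image) pair.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.rand(500, 5)                     # placeholder enhancement vectors
gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(X)                    # cluster index per enhancement
print(np.bincount(labels))                     # cluster sizes
```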

Figure 12 shows example images enhanced according to the preferences of users in each of the three clusters, giving a general indication of the preferred appearance in each cluster. While the corrected images in cluster 2 are slightly more saturated than the others, images in cluster 3 seem to have more contrast. A more quantitative description of the preferred enhancements in each cluster can be seen in Fig. 10, which presents the distribution of the five parameters as box plots. The box plots are over the means of the inferred distribution over Y (all 200 images for the 3 clusters), where the red line in each column denotes the median.

Fig. 12 Example images enhanced according to the preferences discovered for each cluster

Note that the largest differences across clusters are for the power curve and the S-curve inflection point, suggesting that variation in contrast is a dominating factor across the clusters. There are some differences across the remaining three parameters as well, suggesting a difference in parameter choice amongst the users.

5.2.3 Testing the Collaborative Model

Next, we evaluate the ability of the proposed model to predict enhancement preferences for a new user on unseen images in a collaborative environment. In this test, 10 % of all users were randomly chosen as test subjects, and inference was performed using only the data observed for the rest of the users. We simulate the real-life situation where every test user provides evidence about her cluster membership by enhancing some images.

Fig. 13 Average error of predicting personalized enhancement parameters. Auto-enhancement tools do not learn user preferences. Our approach predicts personalized enhancements with increasing accuracy as users provide new examples

Since this work is about collaboratively using information about image enhancement available from many different users, we show results for scenarios where a test user has provided so little information that the personalized model described in Sect. 5.1 cannot be learned. We look at the RMS error in intensity space as each user enhances one image at a time (chosen randomly) and compare the results with the auto-enhancement output of Picasa and Windows Live Photo Gallery.

Figure 13 presents the plot of RMSE obtained by the system with an increasing number of enhancements available from the user. We find a consistent reduction in estimation error as more enhancements are added to the personal profile. This is due to the fact that additional enhanced images provide more information about the cluster membership of the user, enabling the inference procedure to produce a better recommendation. We also observe that the performance of the collaborative approach is far better than that of the two auto-enhancement tools. While all three approaches perform similarly when there is no evidence about the cluster membership of the user, the performance of the collaborative strategy greatly improves as the user starts enhancing images. Also, note that the gains start showing up with as few as 2 images, indicating that strong gains can be obtained with collaborative enhancement of images.
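This behavior can be illustrated with a short sketch: the test user's few enhancements update a posterior over the three clusters, and predictions for unseen images are the responsibility-weighted mix of the per-cluster parameters. The full inference procedure of Sect. 4.2 is richer than this simplified view.

```python
# Hedged sketch of collaborative prediction for a new user: update a posterior
# over clusters from a few observed enhancements, then predict unseen images
# as the posterior-weighted mix of per-cluster parameter means.
import numpy as np

def cluster_posterior(user_params, cluster_params, sigma2=1e-3):
    """user_params: dict image_id -> observed (5,) vector.
    cluster_params: (K, n_images, 5) per-cluster parameter means."""
    K = cluster_params.shape[0]
    log_post = np.zeros(K)                         # uniform prior over clusters
    for img, y in user_params.items():
        diff = cluster_params[:, img, :] - y       # (K, 5)
        log_post += -0.5 * (diff ** 2).sum(axis=1) / sigma2
    log_post -= log_post.max()                     # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def predict(img, post, cluster_params):
    """Posterior-weighted prediction of the (5,) parameter vector for image img."""
    return np.einsum("k,kd->d", post, cluster_params[:, img, :])
```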

Finally, Fig. 14 shows several example images with the corresponding enhanced versions produced by Photo Gallery and Picasa, together with the enhanced version produced by our method for a subject in cluster 3. In general, each tool produces a different enhancement, resulting in a different final look and feel. Notice that the suggestion made by our method has been inferred using the subjective preferences learned from people in cluster 3, and thus it is more likely to be preferred by a user belonging to that population.

6 Discussion

In this work, we focus on user-specific image enhancement instead of correcting arbitrarily bad images (which span a large space).

Fig. 14 Example images with the corresponding enhanced versions generated by auto-enhancement tools and the proposed approach for a subject that belongs to cluster 3

We see the role of personalization as refining the output of a "generic" automatic enhancement module. Our improvement over Picasa is marginal and statistically insignificant; the "generic" (i.e., non-user-specific) portion of our system is likely to be less effective than that of Picasa. However, note that our back-end "generic" version can easily be replaced with Picasa or Windows Live Photo Gallery, thus potentially providing further opportunities for enhancement.

In our system, only 5 parameters are used for personalizing image enhancement. While the results do favor the personalized versions, it is likely that more parameters are needed to optimize the personalization effect. There are other important features, such as auto-cropping, filters (e.g., sharpening), other types of contrast enhancement (e.g., shadow-midtone-highlight curve specification), and optical correction (e.g., vignetting and barrel distortion correction).

Clearly, we need to balance the initial training set size with the user effort involved in personalizing the system. We are looking into incremental training: the user manually corrects the result if the system-generated output is inadequate, and the correction is used to update the user's preference database.

7 Conclusions and Future Work

In this paper, we demonstrated two key ideas: (1) image enhancement through color correction and contrast improvement is person-specific, and (2) increasing the scale of training, both to determine clusters of preferences and for personalized auto-enhancement, can be accomplished through a collaborative framework.

To show that image enhancement is strongly person-specific, we designed an end-to-end pipeline that covers training, user interface issues, and testing. We use a distance metric learning technique that allows us to match images that have similar enhancement requirements.

We use an active sensor selection technique to select the training set from a larger database of images. We also use it during the training phase, where a subset of images that best represents variation in the space of enhanced images is determined for display and user selection. Results suggest that while general techniques for enhancing images are helpful, image enhancement has a strong personalization component, which should help improve the (subjective) quality of images even further.

The user study done to characterize the significance of personal preferences is necessarily of small scale, mostly to control the environment under which subjects view the enhanced images. In the second part of the paper, we addressed the issue of establishing clusters of enhancement preferences for a much larger set of subjects. To that end, we proposed a collaborative framework. Instead of building a system that is trained to enhance images for a specific user, our technique is a principled and practical way of collaborating at web scale, and can encompass alternative parameterizations and error criteria. The main idea is that we pre-discover the clusters of personalized enhancement parameters through collaborative learning on large image collections. Then, for a new user, we only need to figure out which cluster she belongs to.

Results of the cluster analysis showed the existence of three main groups, mainly characterized by differences in contrast preferences. Experimental results indicate that the collaborative enhancement strategy helps significantly in making better predictions of enhancement parameters than existing one-button commercial auto-enhance tools. We do not claim that our results extend to a much larger scale with many more users and a much richer set of enhancement knobs; more experiments are required to substantiate such a claim.

Future work includes active selection of images that would enable better estimation of cluster membership with the fewest image enhancements by the user. Also, the problem of completing enhancement parameters in the user-image matrix may be formulated as completing values in a tensor, which is an interesting extension for further research. A central component of the proposed framework is a distance metric that measures the extent to which two pictures require similar enhancements. We formulated this as a metric learning problem at large scale and adopted it as a building block in our system. However, correctly defining an appropriate measure for this purpose may be a research problem by itself, requiring the design of new features and metrics.

8 Appendix: Detailed Enhancement Operations

We approximate the processes of linearization and reversion to the original nonlinear space with gamma curves with parameters γ = 2.2 and γ⁻¹ = 0.455, respectively. RGB is linearized using c_linear = c^γ, where c = R, G, B (normalized).

Fig. 15 Two examples of contrast curves. a S-curve to specify global contrast change. b Parameterized curve to specify shadow, mid-tone, and highlight regions. In our enhancement pipeline, we use (a)

The linearized RGB values are then color corrected and contrast enhanced, and finally "unlinearized" by applying the inverse operation.
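As a minimal sketch, the pipeline reads as follows; the color-correction and contrast functions are placeholders for the operations described in the next subsections.

```python
# Sketch of the linearize -> enhance -> unlinearize pipeline, with gamma = 2.2
# applied per normalized RGB channel.
import numpy as np

GAMMA = 2.2

def linearize(rgb):                               # rgb values in [0, 1]
    return np.power(rgb, GAMMA)

def unlinearize(rgb_linear):
    return np.power(rgb_linear, 1.0 / GAMMA)      # exponent 1/2.2 = 0.455

def enhance(rgb, color_correct, contrast_curve):
    lin = linearize(rgb)
    lin = color_correct(lin)                      # e.g. temperature/tint correction
    lin = contrast_curve(lin)                     # e.g. power and S-curves below
    return np.clip(unlinearize(lin), 0.0, 1.0)
```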

8.1 Contrast Curve Specification

To manipulate the contrast, we use the power and S-curves.

Power curve: τ. This is equivalent to a gamma curve (but note that it is kept separate from the linearization gamma curve, as seen in Fig. 2): y = x^τ, with x and y being the normalized input and output intensities.

S-curve: λ, a. The S-curve is also commonly used to globally modify contrast. The formula we used to specify the S-curve is

y = \begin{cases} a - a\left(1 - \dfrac{x}{a}\right)^{\lambda} & \text{if } x \le a \\ a + (1 - a)\left(\dfrac{x - a}{1 - a}\right)^{\lambda} & \text{otherwise,} \end{cases} \qquad (5)

with x and y being the normalized input and output intensities (see Fig. 15a).
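Both contrast curves are straightforward to implement; a direct sketch of the power curve and Eq. (5) is given below, assuming normalized intensities and 0 < a < 1.

```python
# Direct implementation of the power curve and the S-curve of Eq. (5),
# applied to normalized intensities x in [0, 1]; assumes 0 < a < 1.
import numpy as np

def power_curve(x, tau):
    return np.power(x, tau)

def s_curve(x, lam, a):
    x = np.asarray(x, dtype=np.float64)
    lower = a - a * np.power(np.clip(1.0 - x / a, 0.0, 1.0), lam)
    upper = a + (1.0 - a) * np.power(np.clip((x - a) / (1.0 - a), 0.0, 1.0), lam)
    return np.where(x <= a, lower, upper)
```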

8.2 Color Correction: Temperature T and Tint h

We color correct based on color temperature T and tint h, rather than applying a 3 × 3 diagonal matrix.⁶ The notion of temperature and tint is a more natural parameterization from the photographer's perspective. Also, we deal with only two instead of three parameters. (One can "normalize" the matrix so that the resulting luminance is unchanged, yielding two independent parameters, but these numbers are perceptually less meaningful.)

The color temperature of a light source is determined by comparing its chromaticity with that of an ideal black-body radiator.

⁶ Details on the process of color correction using T and h can be found in Wyszecki and Stiles (1982).

In practice, however, color temperature is specified relative to a standard, usually D65, which is equivalent to 6,500 K. The temperature curve is along the blue–yellow line. Unintuitive as it may seem, higher color temperatures (5,000 K or more) are "cool" colors (green–blue), while lower ones are "warm" colors (yellow–red). Tint, however, is orthogonal to color temperature, and controls changes along the green–magenta axis.

8.3 Auto-Correction as Preprocess Step

Our version of auto-correction consists of auto-white balance followed by auto-contrast stretch.

Auto-white balance. We make the gray-world assumption for the brightest 5 % of the pixels (Buchsbaum 1980). The gray-world assumption is that the average color of all pixels in the image, i.e., the white point, is gray (R = G = B) with the same luminance as the average color. A matrix M is computed such that it transforms the average color to gray; this matrix M is then applied to all the pixels in the image. We found empirically that it is more robust to compute the white point from a certain percentage of the brightest pixels that are not saturated. More sophisticated techniques (such as Hsu et al. (2008)) may be used, but even then limitations exist. This portion generates 3 parameters, each a per-color-band multiplier.
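A minimal sketch of this step is given below; the 5 % bright-pixel fraction follows the description above, while the saturation cutoff is an illustrative assumption.

```python
# Sketch of auto-white balance: gray-world assumption over the brightest 5%
# of non-saturated pixels, realized as one multiplier per color band.
# The saturation cutoff is an assumption for illustration.
import numpy as np

def auto_white_balance(rgb, top_frac=0.05, sat_cutoff=0.98):
    """rgb: (H, W, 3) array with values in [0, 1]."""
    lum = rgb.mean(axis=2)
    not_saturated = (rgb < sat_cutoff).all(axis=2)
    thresh = np.quantile(lum[not_saturated], 1.0 - top_frac)
    bright = not_saturated & (lum >= thresh)
    avg = rgb[bright].mean(axis=0)               # average color of bright pixels
    gains = avg.mean() / avg                     # map the average color to gray
    return np.clip(rgb * gains, 0.0, 1.0), gains
```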

Auto-contrast stretch. We find the intensity I0 such that at most 0.4 % of the pixels are darker than or as dark as I0, and the intensity I1 such that at most 1 % of the pixels are brighter than or as bright as I1. We then linearly stretch the brightness so that I0 is mapped to 0 and I1 is mapped to 255 (with appropriate clamping at 0 and 255). This portion generates 2 parameters, a shift and a scale, applied identically to all color bands.
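A corresponding sketch of the contrast stretch, using the 0.4 % and 1 % limits as percentile cutoffs, is given below.

```python
# Sketch of the auto-contrast stretch: I0 at the 0.4th percentile and I1 at
# the 99th percentile of intensity, then a linear map of [I0, I1] to [0, 255].
import numpy as np

def auto_contrast_stretch(intensity):
    """intensity: array of pixel intensities in [0, 255]."""
    I0 = np.percentile(intensity, 0.4)
    I1 = np.percentile(intensity, 99.0)
    scale = 255.0 / max(I1 - I0, 1e-6)
    shift = -I0 * scale
    stretched = np.clip(intensity * scale + shift, 0, 255)
    return stretched.astype(np.uint8), (shift, scale)
```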

References

Buchsbaum, G. (1980). A spatial processor model for object colour perception. Journal of the Franklin Institute, 310, 337–350.

Bychkovsky, V., Paris, S., Chan, E., & Durand, F. (2011). Learning photographic global tonal adjustment with a database of input/output image pairs. In CVPR'11.

Caicedo, J., Kapoor, A., & Kang, S. B. (2011). Collaborative personalization of image enhancement. In CVPR'11.

Chua, T. S., Tang, J., Hong, R., Li, H., Luo, Z., & Zheng, Y. (2009). NUS-WIDE: A real-world web image database from National University of Singapore. In CIVR '09, ACM.

Dale, K., Johnson, M. K., Sunkavalli, K., Matusik, W., & Pfister, H. (2009). Image restoration using online photo collections. In ICCV.

Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 54(12), 3736–3745.

Farid, H. (2001). Blind inverse gamma correction. IEEE Transactions on Image Processing, 10(10), 1428–1433.

Farid, H., & Popescu, A. C. (2001). Blind removal of lens distortions. Journal of the Optical Society of America, 18(9), 2072–2078.

Fergus, R., Singh, B., Hertzmann, A., Roweis, S. T., & Freeman, W. T. (2006). Removing camera shake from a single photograph. ACM Transactions on Graphics, 25(3), 787–794.

Fogarty, J., Tan, D., Kapoor, A., & Winder, S. (2008). CueFlik: Interactive concept learning in image search. In Conference on human factors in computing systems (CHI).

Freeman, W. T., Jones, T. R., & Pasztor, E. C. (2002). Example-based super-resolution. IEEE Computer Graphics and Applications, 22, 56–65.

Gehler, P. V., Rother, C., Blake, A., Minka, T., & Sharp, T. (2008). Bayesian color constancy revisited. In CVPR.

Gijsenij, A., & Gevers, T. (2007). Color constancy using natural image statistics. In CVPR.

Grabler, F., Agrawala, M., Li, W., Dontcheva, M., & Igarashi, T. (2009). Generating photo manipulation tutorials by demonstration. ACM Transactions on Graphics, 28(3), 1–9.

Hays, J., & Efros, A. (2008). Scene completion using millions of photographs. Communications of the ACM, 51(10), 87–94.

Hsu, E., Mertens, T., Paris, S., Avidan, S., & Durand, F. (2008). Light mixture estimation for spatially varying white balance. ACM Transactions on Graphics (SIGGRAPH), 27(3), article 70.

Jain, P., Kulis, B., Dhillon, I., & Grauman, K. (2008). Online metric learning and fast similarity search. In NIPS.

Joshi, N., Matusik, W., Adelson, E. H., & Kriegman, D. J. (2010). Personal photo enhancement using example images. ACM Transactions on Graphics, 29(2), 1–15.

Kang, S. B. (2007). Automatic removal of chromatic aberration from a single image. In CVPR.

Kang, S. B., Kapoor, A., & Lischinski, D. (2010). Personalization of image enhancement. In CVPR.

Kapoor, A., Ahn, H., & Picard, R. W. (2005). Mixture of Gaussian processes for combining multiple modalities. In Workshop on multiple classifier systems.

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), 30–37.

Krause, A., Singh, A., & Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9, 235–284.

Lawrence, N. D., & Urtasun, R. (2009). Non-linear matrix factorization with Gaussian processes. In Proceedings of the 26th annual international conference on machine learning. ACM.

Lin, S., Gu, J., Yamazaki, S., & Shum, H. Y. (2004). Radiometric calibration using a single image. In CVPR (Vol. 2, pp. 938–945).

Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming B, 45(3), 503–528.

Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2009). Non-local sparse models for image restoration. In ICCV.

Marks, J., Andalman, B., Beardsley, P., Freeman, W., Gibson, S., Hodgins, J., et al. (1997). Design galleries: A general approach to setting parameters for computer graphics and animation. In ACM SIGGRAPH (pp. 389–400).

Portilla, J., Strela, V., Wainwright, M. J., & Simoncelli, E. P. (2003). Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Transactions on Image Processing, 12(11), 1338–1351.

Rennie, J. D. M., & Srebro, N. (2005). Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd international conference on machine learning. ACM.

Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: A database and web-based tool for image annotation. IJCV, 77(1–3), 157–173.

Salakhutdinov, R., & Mnih, A. (2007). Probabilistic matrix factorization. In Advances in neural information processing systems (Vol. 20). Cambridge, MA: MIT Press.

Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In WWW. New York, NY: ACM.

Seeger, M. (2004). Gaussian processes for machine learning. International Journal of Neural Systems, 14(2), 69–106.

Shapira, L., Shamir, A., & Cohen-Or, D. (2009). Image appearance exploration by model based navigation.

Stanikunas, R. (2004). Investigation of color constancy with a neural network. Neural Networks, 17(3), 327–337.

Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323.

Tresp, V. (2001). Mixtures of Gaussian processes. In Advances in neural information processing systems.

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80–83.

Wyszecki, G., & Stiles, W. S. (1982). Color science: Concepts and methods, quantitative data and formulae. New York: Wiley.
