
In NPAR 2010

Visual Explanations

Doug DeCarlo    Matthew Stone
Department of Computer Science & Center for Cognitive Science

Rutgers, the State University of New Jersey

Abstract

Human perceptual processes organize visual input to make the structure of the world explicit. Successful techniques for automatic depiction, meanwhile, create images whose structure clearly matches the visual information to be conveyed. We discuss how analyzing these structures and realizing them in formal representations can allow computer graphics to engage with perceptual science, to mutual benefit. We call these representations visual explanations: their job is to account for patterns in two dimensions as evidence of a visual world.

Keywords: non-photorealism, depiction, perception

1 Introduction

Computer graphics increasingly aims to support people as they interact with shapes, scenes, animations, and images. That means creating renderings and depictions that make visual ideas clear. It means supporting people in accessing, comparing and adapting visual content from existing sources—and allowing them to create new content easily and naturally, for example by sketching.

In approaching these open-ended challenges, we face fundamental problems in advancing the art and science of depiction [Durand 2002]. Human perceptual skills mirror the richness with which people understand and engage with the world, yet we somehow want our systems to take full advantage of those skills. We take our inspiration from the results and practices of artists, but what makes those artists inspiring is often precisely the deep and unpredictable ways they exploit their visual intelligence. We need to find ways to come to grips with this complexity step-by-step. We need frameworks that help us link computer graphics and perceptual science, and methodologies that help us delimit new research ideas, evaluate systems and assess research progress.

This paper advocates a focus on the representation of meaningful structure in imagery—that is, on models that assign a structural organization to images that corresponds to the real-world relationships in a depicted scene. As justified further in Section 2, we call these representations visual explanations, because their job is to account for patterns in two dimensions as evidence of a visual world.

Working with visual explanations delimits the scope of research in depiction in two important ways. First, in the spirit of Marr [1982], it keeps the specification of visual information separate from computational questions about how visual information might be rendered or perceived. This relates graphics and perception, and opens up a range of algorithmic techniques to tie them together, including machine learning, probabilistic inference and optimization. Second, it sharpens the links between new research questions and clearly-defined classes of visual information. We naturally expect new subproblems to involve certain perceptual judgments and not others, to describe certain kinds of imagery and not others, and to enable certain kinds of depiction and not others. As we emphasize in Section 3, these methodological constraints provide a scaffold to guide research in depiction.

Representation is already a key focus in perceptual science and in computer graphics. Nevertheless, we think its power remains underestimated. We envision increasingly powerful representations that gradually track more real-world information through more sophisticated structures and inferences. These simultaneously enable a richer range of successful applications in computer graphics and a deeper understanding of human perceptual abilities.

2 Explanations

When you look at a scene you understand it. The unconscious inferences¹ you make don't just describe the geometry and appearance of the scene. They answer a range of further questions: What are the objects? How are they arranged? What are they made out of? How can they move? How were they formed? And why? What causal processes are currently underway? What has just happened? What is going to happen next? The list goes on. For you to know the answers to these questions, your visual system must build up a representation of sufficient richness and complexity. The answers emerge as a side-effect of this perceptual inference.

This representation is special because it connects appearance and reality. Ultimately, it describes the imagery on the retina. But it builds up that imagery through structures that embody your understanding of the visual world.² We are also familiar with such structural organization in computer representations of imagery. For example, we can represent an image directly as a collection of pixels or represent it as the output of a PostScript program that generates vector graphics. Both might specify the same pattern of pixels, but the PostScript program generates it through higher-level primitives that compose the image in a rule-governed way.
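To make the contrast concrete, here is a minimal sketch in Python (our own illustration, not from the paper): the same tiny image specified once as bare pixels and once as the output of a rule-governed primitive standing in for the PostScript program.

```python
# Two representations of the same 8x8 image. The pixel grid states the
# imagery directly; the "program" composes it from a higher-level
# primitive (a filled rectangle), mirroring the PostScript example.
WIDTH, HEIGHT = 8, 8

# Representation 1: a bare collection of pixels.
pixels = [[0] * WIDTH for _ in range(HEIGHT)]
for y in range(2, 6):
    for x in range(2, 6):
        pixels[y][x] = 255

# Representation 2: the output of a tiny program built on one primitive.
def fill_rect(img, x0, y0, x1, y1, value):
    """Fill the half-open rectangle [x0, x1) x [y0, y1) with value."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            img[y][x] = value

structured = [[0] * WIDTH for _ in range(HEIGHT)]
fill_rect(structured, 2, 2, 6, 6, 255)   # one primitive call, not 16 assignments

assert pixels == structured   # identical imagery, different structure
```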

The perceptual system, however, organizes imagery using a particular set of primitives that are tuned to human experience. Perceptual scientists use examples like Figure 1 to draw powerful and surprising inferences about what these representations specify, and what they omit. We see square A in Figure 1 as a patch of dark paint that's brightly illuminated. We see square B as a patch of light paint that's in shadow. Thus, our representations must have primitives both for paint—local variation in reflectance—and for shadow—local variation in illumination [Barrow and Tenenbaum 1978; Tappen et al. 2005]. And they can infer both paint and shadow anywhere in imagery. Our perceptual judgments draw on our representation of the imagery, not the imagery itself. That's why we have no visual access to the surprising fact that the two squares are rendered in the same shade of gray [Adelson 2000].

¹Helmholtz [1962] used the term unconscious inference to describe the operation of perceptual systems. Although his specific proposals face theoretical and empirical difficulties, Helmholtz's insights nevertheless established an influential groundwork for computational accounts of perception.

²As Searle [1983] distills it, the content of a perceptual experience relates the character of your experience to the properties of the world that your experience clues you into—looking at a ball, your percept is that it looks round because it is round, for example.

Figure 1: The checker shadow illusion by Edward H. Adelson. Surprisingly, pixels in squares A and B are equal in intensity.
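The paint/shadow vocabulary can be made concrete with the standard intrinsic-image decomposition, luminance = reflectance × illumination [Barrow and Tenenbaum 1978]. Here is a worked numeric example (our illustrative numbers, not measurements of Adelson's figure) showing how identical pixels can receive different explanations:

```python
# Square A: dark paint (low reflectance), brightly illuminated.
reflectance_A, illumination_A = 0.25, 1.0
# Square B: light paint (high reflectance), in shadow.
reflectance_B, illumination_B = 0.50, 0.5

luminance_A = reflectance_A * illumination_A   # 0.25
luminance_B = reflectance_B * illumination_B   # 0.25

assert luminance_A == luminance_B        # the pixels match...
assert reflectance_A != reflectance_B    # ...but the explanations differ
```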

In this paper, we use the term visual explanation to describe any representation that organizes imagery in a way that mirrors the structure of what's depicted and explicitly shows something important about the relationship between imagery and the world. In this sense, a visual explanation can be thought of as a "semantic representation" for graphics. But it puts the emphasis on the representation—not on the semantics.³

We are not in a position to give a complete or definitive account of visual explanations. The scope of human perceptual representation is too expansive. Some perceptual scientists, for example, have even suggested that human perceptual representations relate imagery to the causal processes involved in the creation and function of an observed object [Leyton 1992]. In addition, our perceptual explanations abstract away from the physics of illumination in intriguing ways that are not well understood. Figure 2(b) shows a cube. In other words, even though the visual world has no lines in it, your visual system still entertains the hypothesis that the lines on the paper reflect the edges of a three-dimensional object.

Instead, we suggest that perceptual science and computer graphics can come together to delimit specific classes of visual explanation that make sense of restricted classes of objects and imagery. For example, for imagery like that in Figure 1, our intuitive percept invokes paint and shadow. This suggests a specific vocabulary for connecting such images with the scenes they depict. For imagery like that in Figure 2(b), by contrast, our intuitive percept analyzes edges as convex creases and occlusions marking object boundaries in a polyhedral scene [Waltz 1975]. This requires a different vocabulary for relating images and scenes. For imagery like that in Figure 3(a), our intuitive percept appeals to the changes in appearance associated with partial occlusion by a transparent object, yet another possibility for representation [Singh and Anderson 2002; Anderson and Winawer 2008]. Thus, we assume that visual explanations can usefully range from the simple to the complex. In a specific application, visual explanations need only be as elaborate as is required to describe the imagery used and the key information about the visual world it encodes.

³Technically, a semantics is just a general scheme for associating representations with information. For example, representations of geometry already have a perfectly good semantics, in terms of things like the positions of points on surfaces. The geometry does have a deeper meaning, in the sense that it offers indispensable clues to the structure, history and causality of the scene. But this kind of meaning is really just evidence. You piece it together through detective work. And to state your hypotheses and your conclusions, you need more expressive representations—representations whose semantics does directly characterize these broader aspects of the world and their relationship to geometry.

Figure 2: Orthographic line drawings of cubes from two viewpoints. Depictions are from a generic viewpoint (a,b), or corner-aligned viewpoint (c,d); with hidden edges (a,c) or without (b,d).

Figure 3: Two different depictions of a transparent grey rectangle over a black and white checkerboard. The gray rectangle in (a) appears as a closer, transparent object. The imagery in (b) appears flat (after Singh and Anderson [2002]).

Perceptual scientists think of the visual system as carrying out inference to the best explanation. There are lots of ways of conceptualizing this inference: looking for the simplest [Kanizsa 1979] or most compact explanation [Hochberg and McAlister 1953], the most generic one [Binford 1981; Lowe 1985; Biederman 1987; Freeman 1994], or the most likely one [Helmholtz 1962; Weiss et al. 2002; Kersten et al. 2004]. Surprisingly, these notions are more closely related than you might expect [van der Helm 2007; Li and Vitanyi 2008; Feldman 2009]. For example, they all predict that explanations that appeal to coincidences are dispreferred. Thus, we are less likely to see a cube in Figure 2(c) or (d) when the imagery has salient two-dimensional symmetry. We are less likely to see transparency in Figure 3(b) when color changes align with the pattern of paint we must already find on the surface.
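The shared prediction can be seen in a minimal Bayesian sketch (our own toy numbers, not a model from the literature): a hypothesis that needs a viewpoint coincidence to produce the imagery earns a low likelihood, so it loses to an explanation that produces the imagery generically.

```python
# Model comparison for the symmetric imagery of Figure 2(c,d).
# Each hypothesis gets an illustrative (prior, likelihood) pair: a cube
# yields the perfectly symmetric drawing only from a tiny, accidental
# set of viewpoints; a flat hexagonal pattern yields it without any
# coincidence.
hypotheses = {
    "3D cube, accidental corner-aligned view": (0.5, 0.001),
    "flat 2D pattern with hexagonal symmetry": (0.5, 0.200),
}

evidence = sum(p * l for p, l in hypotheses.values())
for name, (prior, likelihood) in hypotheses.items():
    posterior = prior * likelihood / evidence
    print(f"{name}: posterior = {posterior:.3f}")
# The coincidence-free explanation dominates (about 0.995 vs 0.005),
# matching the intuition that we tend to see Figure 2(c,d) as flat.
```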

Asking whether an explanation is right is therefore a very different question from asking if the image is right. When we consider possible explanations, we must connect the image with what we want to depict in the world and what the viewer understands. Artists and art critics have long pointed out the need for artists to learn how to see in a way that lets them make effective depictions [Ruskin 1857]. This guidance can also include explicit connections to what is known about perception [Hodges 2003; Andrews 2006], so that artists learn to be sensitive to the effects of different visual cues in encouraging particular percepts.

We face the same challenge in designing systems. Our perceptual system can deliver classes of explanation that we don't anticipate—and some systems run into problems for this reason. A "shower door effect" [Meier 1996] happens when painterly brushstrokes fail to follow a moving object but are instead attached to the viewing plane. It's as if the depicted scene is seen through a bumpy, cloudy surface. Here our visual system finds a real-world explanation for the visual effects of painterly rendering, even when they are intended as a purely artistic, evocative embellishment.

Figure 4: Abstraction of a map of Australia [Mi et al. 2009] showing (a) the original shape; (b) an automatic abstraction and (c) the parts used to construct it; (d) an abstraction produced by extracting only foreground parts; (e) an abstraction produced by using only part salience.

Our visual system can find a real-world explanation for other rendering approximations. If we omit an object's shadow, it may appear to float. The visual system demands an explanation of the lighting in the scene. Without a nearby cast shadow onto the floor, a better explanation puts the object in midair with a shadow falling at a distance, perhaps outside the scene altogether.

In all of these cases, however, the connection with perception happens at the level of representation, where we specify models of scenes and assess people's intuitions about imagery. We do not need precise knowledge of the mechanisms of visual processing. We just need to model the set of hypotheses the visual system entertains, and the coherence of each of these hypotheses as an analysis of the available imagery. This is why we advocate a focus on representation. It suggests that computer graphics researchers can connect with perception simply by being explicit about their assumptions in how imagery is organized to match the depicted world. If empirical results are needed to assess whether perception organizes imagery the same way, we can flesh out our representations by relying on human intuitions and machine learning. Nevertheless, our results can not only enrich the kinds of depictions we create but feed back into the evolving literature on human perceptual explanations.

3 Implementation: Case Study

Experiments in depiction often involve building a computational model of imagery and using the model to create a picture in a particular style. In this context, working with visual explanations requires asking what visual information the model could possibly capture, and how well the model succeeds at encoding this visual information. This section summarizes some of our own work on the abstraction of 2D shape [Mi et al. 2009], which we approached through this methodology. We built our application using visual explanations that describe 2D shapes in terms of parts. A part-based visual explanation of a shape shows how the shape can be assembled out of simple components, each of which has one principal symmetry axis and one connection to the rest of the shape. Parts are well-enough defined from the perceptual and computational literature [Singh and Hoffman 2001] that we could make sense of our successes and failures in capturing parts of shapes. This helped us in extending, refining and improving our representations over the course of the project.

Our specific problem is abstraction of shape. We give an example in Figure 4(a)-(c). We approach abstraction through transformations that clarify the visual explanation of the important features of the shape. Our transformations yield new shapes with fewer and clearer parts. The approach aims not for abstractions that look similar to their originals—in fact, they will be plainly different—but rather for shapes that the viewer understands in similar ways.

Figure 5: Shark: (a) original shape; (b) explanation in terms of its parts; (c) abstraction.

Figure 6: Removal of background and foreground parts.

Figure 5 motivates our use of a part-based representation for abstraction. In (a), we see this shark as a streamlined body, with fins attached—the visual explanation in (b) makes this structure explicit. The abstraction in Figure 5(c) eliminates much visual detail. The form of the shark's body and fins has been simplified, and a number of fins have been removed. However, we understand the abstract image in the same way as the detailed one: a streamlined body with fins attached. In this sense, the original and the abstraction share a common visual explanation for their important features.

Our part-based representation of shape is a tree. Nodes in the tree are parts. Each includes the specification of the geometry of a simple component of the shape. Edges in the tree correspond to attachments between parts. Attachments describe the local geometry where parts come together.
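This tree can be written down directly. The sketch below is our own minimal rendering of that description, with hypothetical field names; it is not the data structure from Mi et al. [2009].

```python
from dataclasses import dataclass, field

Point = tuple[float, float]

@dataclass
class Attachment:
    child: "Part"
    boundary: list[Point]      # local geometry where the parts come together

@dataclass
class Part:
    name: str
    outline: list[Point]       # geometry of this simple component
    axis: list[Point]          # its one principal symmetry axis
    children: list[Attachment] = field(default_factory=list)

# A toy fragment of the shark in Figure 5: a body with one fin attached.
body = Part("body", outline=[(0, 0), (8, 0), (8, 1), (0, 1)],
            axis=[(0, 0.5), (8, 0.5)])
fin = Part("dorsal fin", outline=[(2, 1), (3, 2), (4, 1)],
           axis=[(3, 1), (3, 2)])
body.children.append(Attachment(fin, boundary=[(2, 1), (4, 1)]))
```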


By considering core cases of part-based explanations, we realized that the explanations we wanted to capture involved both foreground and background parts. Consider, for example, the top tail fin of the shark of Figure 5, enlarged in Figure 6. The most natural explanation organizes the top fin into a single coherent foreground piece. In this piece is an indentation—part of the background. This indentation is clearly visible on the left, and is removed in the abstraction in the center.

Geometrically, however, we can analyze the shark tail entirely with foreground parts, as on the right of Figure 6. The indentation now marks the transition between the tip of the fin (one foreground part) and the base of the fin (a second foreground part). Visual explanations with only foreground parts lead to distortions in part shape as long parts get incorrectly truncated. These effects remain visible in abstractions of more complex shapes, as in Figure 4(d).

Having identified the parts we were aiming for, our goal was simply to compute those parts, not to model the processes of human perception. In fact, it was straightforward to analyze when our algorithm was failing and how to improve it. A key problem for part-based explanations is disambiguation. As in Figure 6, parts described by symmetry and cuts suggest a range of organizations of the shape, not one in particular. Our first implementation disambiguated part structure using a metric of perceptual salience to find the most salient visual structure. However, because we had a clear understanding of the visual explanations we aimed to produce, we were able to discover that this heuristic was inadequate. It made frequent mistakes that should have been avoidable given the domain limits and representational affordances of our algorithm. Figure 4(e) shows how building part structures in salience order prevents the system from deriving abstract shapes with clear and recognizable parts. From simple examples, and complex ones such as these, we were able to conclude that picking parts to increase symmetry led to the desired explanations much more often than salience did. It took a lot of effort to adjust the thresholds by hand, however. In retrospect, developing an explicit set of training representations would have streamlined this engineering considerably.
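To make the disambiguation step concrete, here is a minimal sketch (our own toy formulation under simplified assumptions; the system's actual metrics were more involved): candidate part decompositions are scored, and a symmetry-based score stands in where the salience-based one failed.

```python
import math

Point = tuple[float, float]
PartPoly = list[Point]             # a part as a list of polygon vertices
Decomposition = list[PartPoly]

def symmetry_score(part: PartPoly) -> float:
    """Crude symmetry measure: mirror each vertex about the vertical
    line through the part's centroid and check how closely the mirrored
    vertices land on original ones (1.0 = perfectly symmetric)."""
    cx = sum(x for x, _ in part) / len(part)
    err = 0.0
    for x, y in part:
        mx = 2 * cx - x      # mirrored x-coordinate
        err += min(math.hypot(mx - px, y - py) for px, py in part)
    return 1.0 / (1.0 + err / len(part))

def best_decomposition(candidates: list[Decomposition]) -> Decomposition:
    # Prefer the candidate whose parts are, in total, most symmetric,
    # rather than committing to parts in salience order.
    return max(candidates, key=lambda d: sum(symmetry_score(p) for p in d))
```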

Since publishing our work [Mi et al. 2009], we have gone on to run a human subjects study of our system's results. The design of the study reflects our interest in the system's visual explanations. By processing the results of the system to create shapes with parts preserved and without, we were able to show how preserving part structure contributes to abstraction.

Our critical engagement with our system's representations also highlights questions that we could not substantively evaluate—even when our system's performance was quite good. For example, no explicit representation controls the individual appearance of parts, and whether they remain recognizable. And nothing examines the result to verify that a part is still recognizable as such. Our system's results are still fairly effective, but we don't have the representational insights to argue that its effectiveness does not depend on a certain amount of luck.

Parts are important, but they do not exhaust our explanation of 2D shape. In Figure 7(a), we see the head of a snake. We naturally recognize the partial opening of its hinged mouth, and understand that the mouth could also be closed, as shown at right. A different kind of perceptual explanation is involved in the toast depicted in Figure 7(b). It looks like it has been cut with a knife, to remove a corner, as diagrammed at right. When we decided to limit our representation to parts, we knew that we could not expect cases like these to be successfully abstracted using our system.

Figure 7: Explanation of (a) a snake head and (b) a piece of toast.

This is a key point—starting from representations lets us see where future work lies, even when that is not immediately obvious in a system's output. Keeping individual parts recognizable is one direction to go. So is handling other kinds of shapes like those in Figure 7.

4 Discussion

We think there are many examples of innovative research in depiction whose contributions are most insightfully described as extending the range of visual explanations available for computer graphics. Even without an implementation of the underlying representation, visual explanations can help in describing a system's design and situating its contribution. For example, route maps are naturally understood as communicating directions—a kind of visual explanation [Agrawala and Stolte 2001]. Route maps distort an accurate map into a schematic diagram. The choices involved can be optimized with the goal of allowing the perceptual system to get the right explanation (i.e. turns in roads remain recognizable) and to do so easily (i.e. so that key details are large enough); a toy version of such an objective is sketched below. Analogously, a successful cutaway illustration [Li et al. 2007] has a visual explanation that spells out the relative location and placement of parts, and their contexts. The explanation is particularly valuable because it can exhibit forms and signal relationships that would otherwise be invisible due to occlusion. Animation, meanwhile, offers up even further possibilities to convey explanations effectively. For instance, careful placement of lighting or selection of viewpoints and character poses leads to scenes that are more engaging and easier to follow. When automated, as in the illustrative rendering style used in the video game Team Fortress 2 [Mitchell et al. 2007], forms can be rendered to make individual characters and team membership more recognizable.
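The sketch below is our own toy formulation for illustration, not the method of Agrawala and Stolte [2001]: a cost that trades legibility of every drawn segment against soft fidelity to true lengths.

```python
import math

def map_cost(true_len: list[float], drawn_len: list[float],
             min_len: float = 10.0, w: float = 0.1) -> float:
    """Penalize drawn segments too short to read (legibility) plus
    distortion of length ratios (fidelity), softly weighted by w."""
    legibility = sum(max(0.0, min_len - d) for d in drawn_len)
    fidelity = sum(abs(math.log(d / t))
                   for t, d in zip(true_len, drawn_len))
    return legibility + w * fidelity

# A 0.2-unit side street is drawn 12 units long so its turn stays
# recognizable, while a 50-unit highway shrinks to 30 units.
print(map_cost([0.2, 50.0], [12.0, 30.0]))
```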

Visual explanations are not limited to NPR. Work in 3D shape analysis can involve explicit construction of shape explanations, for example, in symmetry detection [Mitra et al. 2006; Pauly et al. 2008; Podolak et al. 2006]. Likewise, symmetrization [Mitra et al. 2007] can be seen as a manipulation that preserves and enhances aspects of an explanation grounded in symmetry. Some recent work in perceptually based rendering stresses visual equivalence, which models when viewers can discern differences in scenes [Ramanarayanan et al. 2007] rather than simply differences in images [Daly 1993]. Models of visual equivalence might ultimately be used to predict when the viewer recovers an equivalent explanation of the scene.

The examples we have offered in this paper motivate an approach to depiction that focuses on explanatory representations that summarize the visual information to be conveyed. We arrive in many ways at a refinement and a relaxation of the view of depiction as inverse vision advocated by Durand [2002]. In inverse vision, systems create renderings that make key information available to the perceptual system, so that impressions from the renderings match the impressions viewers would have if they saw the actual scene.

Our point is not that you simulate perception, or that you aim for open-ended correspondence between imagery and visual experience of the world. Instead, by focusing on representation, you narrow the questions of what you want to communicate and what people see, and model how specific information is reflected in specific imagery. As you investigate the constraints that allow this information to be signaled and inferred in principle, you are likely to find structure that the perceptual system also exploits. But you may find ways not only of recreating visual experience but of extending it. Depictions can convey information in ways that would be difficult or impossible to achieve with a realistic or detailed rendering. From this perspective, a good depiction is easy to see because the best explanation is a clear winner over qualitatively different alternatives, and it's easy to find. Willats [1997] points out a special case of this when only a single explanation exists, and calls such a picture an effective representation of the shape.

Most importantly, by focusing on the representations that connect imagery to perception and the world, you derive a way to delimit the open-ended generalizations that human perception potentially uses. At each step of investigation, identifying a target set of visual explanations can serve to restrict the scope of system building and empirical investigation, to connect system models with data from human perception, to assess the performance of system and model intelligibly and illuminatingly, and to pose questions for future investigation. In short, studying visual explanations promises steady progress towards deeper, more natural, and more human computer graphics.

Acknowledgements

Thanks to Maneesh Agrawala, Marc Alexa, Hedlena Bezerra, Jacob Feldman, Adam Finkelstein, Gabe Greenberg, Aaron Hertzmann, Kristian Hildebrand, Olga Karpenko, Eileen Kowler, Xiaofeng Mi, Andy Nealen, Zygmunt Pizlo, Manish Singh, and Jason Stanley. We gratefully acknowledge support from the Alexander von Humboldt Foundation and the National Science Foundation under grant CCF-0541185.

References

ADELSON, E. H. 2000. Lightness perception and lightness illusions. In The New Cognitive Neurosciences, second edition, M. S. Gazzaniga, Ed. MIT Press, Cambridge, MA, 339–351.

AGRAWALA, M., AND STOLTE, C. 2001. Rendering effective route maps: improving usability through generalization. In SIGGRAPH 2001, 241–249.

ANDERSON, B. L., AND WINAWER, J. 2008. Layered image representations and the computation of surface lightness. J. Vis. 8, 7 (7), 1–22.

ANDREWS, B. 2006. Introduction to perceptual principles in medical illustration. In Illustrative Visualization for Medicine and Science (Eurographics 2006 Tutorial Notes), I. Viola, M. C. Sousa, D. Ebert, B. Andrews, B. Gooch, and C. Tietjen, Eds.

BARROW, H. G., AND TENENBAUM, J. M. 1978. Recovering intrinsic scene characteristics from images. In Computer Vision Systems, A. Hanson and E. Riseman, Eds. Academic Press, New York, 3–26.

BIEDERMAN, I. 1987. Recognition-by-components: A theory of human image understanding. Psychological Review 94, 115–147.

BINFORD, T. O. 1981. Inferring surfaces from images. Artificial Intelligence 17, 205–244.

DALY, S. 1993. The visible differences predictor: an algorithm for the assessment of image fidelity. In Digital Images and Human Vision, A. B. Watson, Ed. MIT Press, 179–206.

DURAND, F. 2002. An invitation to discuss computer depiction. In International Symposium on Non-Photorealistic Rendering and Animation (NPAR) 2002, 111–124.

FELDMAN, J. 2009. Bayes and the simplicity principle in perception. Psychological Review 116, 4, 875–887.

FREEMAN, W. T. 1994. The generic viewpoint assumption in a framework for visual perception. Nature 368, 542–545.

HELMHOLTZ, H. 1962. Treatise on Physiological Optics, Volume 3. Dover (reprint of translation; original 1909), New York.

HOCHBERG, J. E., AND MCALISTER, E. 1953. A quantitative approach to figural "goodness". Journal of Experimental Psychology 46, 361–364.

HODGES, E. R. S. 2003. The Guild Handbook of Scientific Illustration, 2nd Edition. John Wiley and Sons, Inc.

KANIZSA, G. 1979. Organization in Vision: Essays on Gestalt Perception. Praeger, New York.

KERSTEN, D., MAMASSIAN, P., AND YUILLE, A. 2004. Object perception as Bayesian inference. Annual Review of Psychology 55, 271–304.

LEYTON, M. 1992. Symmetry, Causality, Mind. MIT Press, Cambridge, MA.

LI, M., AND VITANYI, P. 2008. An Introduction to Kolmogorov Complexity and Its Applications, Third Edition. Springer, New York.

LI, W., RITTER, L., AGRAWALA, M., CURLESS, B., AND SALESIN, D. 2007. Interactive cutaway illustrations of complex 3D models. ACM Transactions on Graphics 26, 3 (July), 31:1–31:11.

LOWE, D. G. 1985. Perceptual Organization and Visual Recognition. Kluwer Academic Publishers, Amsterdam.

MARR, D. 1982. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman, San Francisco.

MEIER, B. J. 1996. Painterly rendering for animation. In Proceedings of SIGGRAPH 96, Computer Graphics Proceedings, Annual Conference Series, 477–484.

MI, X., DECARLO, D., AND STONE, M. 2009. Abstraction of 2D shapes in terms of parts. In International Symposium on Non-Photorealistic Rendering and Animation (NPAR) 2009, 15–24.

MITCHELL, J. L., FRANCKE, M., AND ENG, D. 2007. Illustrative rendering in Team Fortress 2. In International Symposium on Non-Photorealistic Rendering and Animation (NPAR) 2007, ACM, 19–32.

MITRA, N. J., GUIBAS, L. J., AND PAULY, M. 2006. Partial and approximate symmetry detection for 3D geometry. ACM Transactions on Graphics 25, 3 (July), 560–568.

MITRA, N. J., GUIBAS, L. J., AND PAULY, M. 2007. Symmetrization. ACM Transactions on Graphics 26, 3 (July), 63:1–63:8.

PAULY, M., MITRA, N. J., WALLNER, J., POTTMANN, H., AND GUIBAS, L. 2008. Discovering structural regularity in 3D geometry. ACM Transactions on Graphics 27, 3, 43:1–43:11.

PODOLAK, J., SHILANE, P., GOLOVINSKIY, A., RUSINKIEWICZ, S., AND FUNKHOUSER, T. 2006. A planar-reflective symmetry transform for 3D shapes. ACM Transactions on Graphics 25, 3 (July), 549–559.

RAMANARAYANAN, G., FERWERDA, J., WALTER, B., AND BALA, K. 2007. Visual equivalence: Towards a new standard for image fidelity. ACM Transactions on Graphics 26, 3 (July), 76:1–76:11.

RUSKIN, J. 1857. The Elements of Drawing. Smith, Elder and Co.

SEARLE, J. 1983. Intentionality. Cambridge University Press, Cambridge.

SINGH, M., AND ANDERSON, B. L. 2002. Toward a perceptual theory of transparency. Psychological Review 109, 492–519.

SINGH, M., AND HOFFMAN, D. D. 2001. Part-based representations of visual shape and implications for visual cognition. In From Fragments to Objects: Grouping and Segmentation in Vision, T. F. Shipley and P. J. Kellman, Eds. Elsevier Science Ltd., 401–459.

TAPPEN, M. F., FREEMAN, W. T., AND ADELSON, E. H. 2005. Recovering intrinsic images from a single image. IEEE Trans. Pattern Anal. Mach. Intell. 27, 9, 1459–1472.

VAN DER HELM, P. A. 2007. The resurrection of simplicity in vision. In In the Mind's Eye: Julian Hochberg on the Perception of Pictures, Film, and the World, M. A. Peterson, B. Gillam, and H. Sedgwick, Eds. Oxford University Press, New York, 518–524.

WALTZ, D. L. 1975. Understanding line drawings of scenes with shadows. In The Psychology of Computer Vision, P. Winston, Ed. McGraw-Hill, 19–92.

WEISS, Y., SIMONCELLI, E. P., AND ADELSON, E. H. 2002. Motion illusions as optimal percepts. Nature Neuroscience 5, 598–604.

WILLATS, J. 1997. Art and Representation: New Principles in the Analysis of Pictures. Princeton University Press.
