globale ects in figure/ground segregation by a model with ......globale ects in figure/ground...

Global effects in Figure/Ground segregation bya model with only local interactions

N. Rubin, Center for Neural Science, New York UniversityM. C. Pugh, Department of Mathematics, University of TorontoCorresponding author: Nava Rubin, [email protected]

Abstract

Figure/Ground segregation is a fundamental problem in visual processing. Edges often arisebecause of occlusion, along the bounding contour of the occluding object. Having detectedan edge, the visual system therefore has to decide whether it is there due to occlusion, and ifso, which side of it belongs to the front surface (“the Figure”). This is known as the borderownership problem. Determining the border ownership of an edge cannot be done locally:it requires integrating information from an image region which contains the entire Figure orlarge portions of it. Thus, a neuron whose receptive field is small compared to the Figurecannot compute border ownership of an edge of that Figure in isolation. Recently it wasreported that the responses of V2 cells to an edge can be strongly modulated by the polarity ofborder ownership of the edge (Zhou et al.2000). These effects must therefore be the productof computations done by a network of neurons. One possibility is that such networks rely ondirect long-range connections (of the scale of the Figure) and/or feedback from areas withcells of large receptive fields. The model presented here offers an alternative. It is shown thatnetwork of small receptive field cells which interact only locally can nevertheless computeFigure/Ground segregation. The long-range effects emerge as a result of iterative propagationof information through a cascade of short-range connections. The model produces resultssimilar to those observed perceptually: it prefers enclosed, convex and/or smaller regionsas the Figure. Our results suggest that considerable portions of Figure/Ground segregationcan be accomplished by early visual cortex, without a need for feedback from higher areas.

Table of Contents

Section 1: Introduction, pp 2–4Section 2: The model, pp 4

§2.1: Formulating the problem, pp 6–10§2.2: Guiding principles, pp 10–11§2.3: Network and mathematical implemention

§2.3.1: Luminance edges and Figure/Ground transitions, p 12§2.3.2: Generating the candidate organizations: the cost function E , pp 12–15§2.3.3: Identifying undesired candidate organizations: the Figural entropy, pp 15–16

Section 3: Simulation results, pp 16–23Section 4: Discussion, p 23

§4.1: Timing considerations, pp 23–24§4.2: Region-based processes in physiology and perception, pp 24–25§4.3: Probability and neural representation, pp 25–26§4.4: Further extensions to the model, pp 26–27

Section 5: References, pp 27–31Appendix A: Computing P, pp 31–32Appendix B: Numerical methods and parameters, pp 32–34

1

1. Introduction

A visual image is a projection of a three-dimensional scene onto a two-dimensional sur-face. As a result, virtually every image ofa real-world scene includes occlusion. Whenone object occludes another, an edge is formedbetween them, defined by differences in sur-face brightness, color or texture. The edge isformed at the boundary of the front surface,and is informative about the shape of thatsurface. But in the generic case, an edge hasno relation to the shape of the occluded sur-face. In other words, when two surfaces areseparated by an edge, only one of those sur-faces “owns” the edge – the one which is infront. In order to recover a reliable repre-sentation of the shape of surfaces in the 3Dworld, the visual system must therefore beable to resolve “the border ownership prob-lem” (Nakayama et al.1995). Determinationof border ownership is essential for correctvisual segmentation and is intimately relatedto the process of recovering the true shape ofoccluded surfaces.

The first clear articulation of the ambigu-ity inherent in edges is attributed to EdgarRubin (1921, 1958). Rubin had the insightthat the border ownership problem occurs atevery edge, in every image, but he cleverlyused ambiguous figures to highlight this factand to study it further. He observed thateven when the shape of the border is suchthat both sides of it form the silhouette ofa known object, as in his famous face/vaseillusion, only one of the abutting objects canbe perceived at any given moment. The per-ception is bi-stable, alternating from one in-terpretation to the other over time, but thetwo objects cannot be perceived simultane-ously. Rubin (1921) termed the surface seenin front the “Figure” and the region seen be-hind the “Ground”. By restricting his studyto images with two layers of depth, he in-sured that the latter was indeed the back-ground – the distant-most, shapeless region,which nothing can come behind. In natural

scenes, the back region is often a surface witha well-defined shape of its own. But the needto resolve border ownership remains, sincethe border is still uninformative about theshape of the back surface.

Figure 1: An ambiguous Figure/Ground im-age demonstrating perceptual bi-stability for un-familiar shapes. (Adapted from Kanizsa andGerbino 1976)

Bi-stable Figure/Ground perception canarise also with shapes that do not depict fa-miliar objects. An example is shown in figure1 (Kanizsa and Gerbino 1976; see also Rubin1921, figure 2). The perception of this imagealternates between two interpretations – yel-low regions in front of a blue background, orblue regions in front of a yellow background.Most observers report the former interpreta-tion to be the dominant one (80%; Kanizsaand Gerbino 1976), but with prolonged view-ing, both interpretations will be seen. Thisshows that Figure/Ground alternations arenot driven by knowledge of familiar objects.

Rubin’s (1921, 1958) observations suggestthat the brain has a built-in mechanism whichmandates a border to belong only to one sur-face at a time, and not to both (see alsoNakayama et al.1995; Driver and Baylis 1996;Baylis and Driver 2001; Kourtzi and Kan-wisher 2001; Rubin 2001a). This hypoth-esized mechanism enforces a uni-directionalresolution of border ownership, and must op-erate on every edge, in every image. Am-biguous figures are a useful experimental tool,

2

but what they teach is just as valid for im-ages which give rise to unambiguous Fig-ure/Ground perception.

The neural basis of Figure/Ground (F/G)segregation, i.e., how the brain solves theborder ownership problem, remains largelyunknown. In particular, it is not yet knownat what stage, or “level” of cortical process-ing border ownership is resolved. (Or, ifmore than one level of processing is involved,what is the role of each level, and how dothey interact to produce the perceptual re-sults observed.) Recently, Zhou et al. (2000;see also Baumann et al. 1997) reported a setof striking findings which bear on the issue.They found that a large fraction of cells inearly visual cortex (18% in V1, 59% in V2,53% in V4) encode information about borderownership. Specifically, those cells exhibitedmarked differences in firing rates in responseto optimally oriented edges, depending onwhich side of the edge the Figure laid. An il-lustration of their findings is shown in figure2. The stimuli shown in panels A and B pro-duce identical stimulation within the cell’s“minimum response field” (see Zhou et al.2000, Methods) and its immediate surround,but the cell consistently responded more tothe stimulus in panel A than to that in B.The authors suggested that the difference inresponses is due to the inversion of borderownership polarity of the edge: the cell re-sponds when the Figure falls on one side ofthe edge, but not the other. Consistent withthis hypothesis, they found that cells whichwere insensitive to contrast polarity (3% ofthe cells recorded in V1, 15% in V2 and 15%in V4) retained their preferred border owner-ship polarity when the contrast of the imageswas reversed (panels C and D). Zhou et al.(2000) went on to perform a set of ingeniousexperiments which provide strong evidencethat the modulation of the cell’s firing rateswas indeed related to border ownership andF/G polarity.

Figure 2: An illustration of the findings of Zhouet al. (2000, adapted from their figure 4). Inspite of the identical stimulation within its ‘min-imum response field’ (ellipses), this V2 cell re-sponses strongly only when the Figure is to itsleft (panel A) and not when it is to the right(panel B). The cell shows preference to this Fig-ural polarity regardless of the contrast polarityof the edge (panels C, D). This suggests that thecell is involved in encoding border ownership.

At first glance, the findings of Zhou et al.(2000) may be taken to mean that F/G seg-regation is performed in early visual corti-cal areas. But this interpretation immedi-ately poses new problems. As the authorsnoted, “... the identification of a region as aFigure requires global image processing (thesystem needs to evaluate an area of the sizeof the Figure or more.)” In other words,the resolution of border ownership dependson information from image regions well out-side the receptive field of each individual cell.(Zhou et al. found cells that showed sensitiv-ity to F/G manipulations which took place

3

as far as 20◦ away from their classical recep-tive field.) The small sizes of receptive fieldsof cells in V1 and V2 make them seem un-suitable for such global computations. Theprocessing of global image information is gen-erally assumed to be the province of highlevel visual areas, primarily in inferotempo-ral cortex, where cells have large receptivefields and complex response characteristics.(We use the terms “low level” and “highlevel” cortical areas without reference to theirfunction, based solely on their level in the hi-erarchy of information stream in cortex, asassessed from the distribution of incomingand outgoing connections in specific layersin each area; cf. Felleman and Van Essen1991). Indeed, authors who previously re-ported that V1 cells are affected by imagemanipulations distant from their receptivefield (Lamme 1995; Zipser et al. 1996; butsee Rossi et al. 2001) proposed that thoseeffects may be the result of feedback fromhigher level areas (Lamme et al.1997). Zhouet al. (2000) also seem to favor the interpre-tation that the border ownership selectivitythey found in V1/V2 cells may result from“the presence of top-down signals”.

Nevertheless, there are a number of rea-sons why it may be advantageous for the vi-sual system to achieve Figure/Ground segre-gation in early visual cortex. A reliable seg-mentation process needs to be able to iden-tify the location and spatial extent of Fig-ures in the incoming visual image across theentire visual field, and to segment correctlysurfaces of any shape, including ones thatwere never seen before. The retinotopic or-ganization of areas V1 and V2 and the “gen-eral purpose” type of processing commonlyassociated with them (i.e., not designed forspecific shapes or objects) fit naturally withthese important requirements. ComputingF/G segregation in early visual cortex wouldalso free higher-level areas to perform morespecialized processing on a small number ofsegmented surfaces. Another advantage is

that areas V1/V2 have immediate access tohigh spatial resolution information about theimage (Lee et al. 1998.) F/G segregationis often sensitive to very subtle manipula-tions in the image: small changes (restrictedto, say, a 1 square-degree area), can changethe Figure/Ground assignments of large re-gions (Gillam 1987; Minguzzi 1987; Rubin2001b). Therefore, it would be good to per-form as much of the computation as possiblein the regions which contain cells with recep-tive fields of the scale of such image manip-ulations. The purpose of the work presentedhere was to show that Figure/Ground com-putations can indeed be achieved by a net-work whose architecture resembles that ofearly visual cortex.

2. The Model

We present a model which computes Fig-ure/Ground segregation using “units” whichresemble early cortical neurons (V1/V2). Themodel units have small receptive fields andeach unit is connected only to its nearestneighbors. Aside from these two attributes,the model does not attempt to be faithfulto physiological details. The computationsare done by a set of equations that treateach unit as a simple element characterizedsolely by its level of activity at each mo-ment. At present, no attempt is made toformulate how those equations would be im-plemented in a biologically realistic modelthat takes into account the full complexityof factors such as cells’ membrane potentialsand synaptic transmission. Nevertheless, weargue that the model has significant implica-tions for research on the neural basis of F/Gsegregation. It demonstrates that it is possi-ble to observe global effects in a network oflocal elements. Moreover, the model showsbiases similar to those of human observersin its F/G assignments: a tendency to judgeenclosed and/or smaller surfaces as the Fig-ure, and a bias to ascribe ownership to theconvex side of a border. The fact that these

4

effects can be obtained in a model which useslocally connected units with small receptivefields suggests that the possibility that F/Gsegregation is achieved in early cortex war-rants further consideration.

The general observation that global effectsmay arise in a system of locally-interactingelements is well known in the physical sci-ences (e.g., thermodynamic phase transitions,magnetization), and it has been used previ-ously in vision models (e.g., Marr and Poggio1976; Hildreth 1984; Sha’ashua and Ullman1988). The intuitive explanation is that asthe system evolves in time, information maybe mediated across long distances by a cas-cade of local interactions. How such inter-actions give rise to the specific global F/Geffects observed, however, has not been ad-dressed so far.

The approach we take is different fromthat prevalent in current vision studies. Mostmodels focus on edges: the bounding con-tours of surfaces. We term this approachcontour-based. In contour-based models, unitswhose receptive fields fall on homogeneousregions in the image normally do not partic-ipate in the processing and representation ofthe scene (Ullman 1976; Sha’ashua and Ull-man 1988; Heitger and von der Heydt 1993;Nitzberg et al.1993; Iverson and Zucker 1995;Williams 1996; Williams and Jacobs 1997;Yen and Finkel 1998). The goals of seg-mentation are thus defined in terms of con-tours: identify continuous contours, grouptogether parts of the same contour whichare separated in the image because of occlu-sion etc. Computations take place primar-ily along contours – e.g., testing for colinear-ity or relatability (cf. Kellman and Shipley,1991; Elder and Zucker 1993), using “asso-ciation fields” to identify contours embed-ded in a noisy background (Field et al. 1993;Yen and Finkel 1998), or constructing illu-sory and occluded contours (Heitger et al.1992; Heitger and von der Heydt 1993; cf.von der Heydt et al.1984; Peterhans and von

der Heydt 1989; Sugita 1999; Bakin et al.2000). In the contour-based approach, res-olution of border ownership may not evenbe part of the computational task: contoursmay be detected and represented indepen-dently of the surfaces they bound (but seeFinkel and Sajda 1992; Weiss 1997).

The approach presented here is region-based.Like contour-based models, it recognizes theimportant role of edges as a source of in-formation about surface boundaries in thescene. But the goal of region-based modelsis to identify the surfaces bounded by edges– not the edges per se. An edge is treated asa potential part of the bounding contour ofa region and the model attempts to find thishypothesized region. As a result, assignmentof border ownership is an integral part of thecomputational task in region-based models.To achieve this goal, the flow of informa-tion in region-based models is not confinedto be along contours. Instead, all units con-tribute to the computation – including thosewhose receptive fields fall within homoge-neous parts of the image. Possibly due tothe physiological findings of the abundanceof neural responses to edges (e.g., Hubel andWiesel 1968), there has been a longstandingfocus on contour-based computations for vi-sion, with relatively small attention to region-based processes (but see Mumford et al.1987;Paradiso and Nakayama 1991; Kimia et al.1995; Grossberg 1997). Recently, however,several region-based segmentation models incomputer vision have shown promising re-sults in delineating what are called “salient”regions in the image (roughly, the computervision equivalent of what we term Figuralsurfaces; cf. Zhu et al. 1995; Shi and Malik1997; Geiger et al. 1998; Sharon et al. 2000;see also Finkel and Sajda 1992; Grossberg1997). In this paper, we apply the region-based approach to resolving the border own-ership problem and relate it to human per-ception.

5

Figure 3: How region-based models give rise toglobal Figure/Ground effects. Signals about theprobability that a point belongs to the Figureare launched from locations near the edge andpropagated all around. The cumulative effectsof those signals is different on the two sides ofthe contour. This asymmetry leads the model toprefer the “inside” as the Figure. (The decreas-ing contrast of the circles denotes attenuationof signals away from their source.)

Allowing signals to propagate also withinhomogenous regions provides a natural wayfor global F/G effects to arise. An intu-itive illustration of this is shown in figure3. Consider three local segments along thebounding contour of an ellipse. The goal ofa region-based model is to find out if thesesegments belong to a contour that boundsa region. Because each unit has only localinformation, this is a non-trivial computa-tion. But by propagating signals betweenneighbors, the collection of units can actu-ally “find out” that one side is more likelyto be the Figure. Suppose that a cascade ofsignals is launched from each location, withthe level of activation of each unit denotinga possibility (or probability) that the unitbelongs to the Figure. Because there is noprior knowledge which side of each edge seg-ment may be the Figure, those signals wouldhave to be launched in a completely isotropicway – as indicated by the concentric circles.Nevertheless, inspecting figure 3 reveals thatthe signals on the inside of the ellipse rein-force each other more strongly: the maxi-mum (summed) activity is higher than for

signals on the outside, and points of equalactivity lie deeper (further from the edge)on the inside. Note, that for this imbal-ance to emerge, signals had to propagate alsobetween units that lie on homogeneous re-gions of the image. Region-based modelscan therefore exhibit a preference for the en-closed region to be the Figure without need-ing to implement any special bias in the indi-vidual units. (The simple case of the ellipseconfounds the properties of closure and con-vexity; those will be disentangled in section3.)

Before moving on, there are two qualifica-tion we need to make. One, the model pre-sented here is designed for situations wherethe foreground and background surfaces arehomogeneous, i.e., they are not textured anddo not contain other internal edges. We areassuming that computations such as identi-fying regions of uniform texture are handledby other modules (in the human or artificialvisual system). The output of these modulescan then be used as input to our model. Inthe tradition established by E. Rubin (1921),who used untextured images in his studies,we restricted ourselves to images of homoge-neous surfaces (like that shown in figure 1)and thus could concentrate on the computa-tions most relevant to F/G segregation andborder ownership. For real-world applica-tions, this model would therefore need to becombined with models that take care of thework of the other modules mentioned, e.g.,texture segregation. The other simplifica-tion we took, again following Rubin (1921),is that we restricted ourselves to images withtwo layers of depth (which also allows us torefer to the back surface as Ground). But ex-tending the model to handle more layers ofdepth does not require conceptual changes(see Discussion).

2.1. Formulating the problem. The modelstarts by representing the input image bya set of units which signal the luminanceand/or color at a small, restricted part of

6

the image. In physiologically-inspired terms,the input is represented using a set of smallreceptive-field units that tile the image retino-topically. The output is also represented bya set of retinotopically organized small re-ceptive field units. In principle, the den-sity of output units and the extent and lo-cation of their receptive fields may be dif-ferent from those of the input units; how-ever for simplicity we choose them here tobe the same. The activity level of each out-put unit represents the probability that thelocal region of the image corresponding tothe unit’s receptive field belongs to a Figu-ral surface. Using the index k for the modelunits, we denote the activity of output unitk by P (k), where 0 ≤ P (k) ≤ 1. The inter-pretation of an output unit k having a valueP (k) = 1 or P (k) = 0 is straightforward: theformer case indicates certainty that the cor-responding location is part of a Figural sur-face, whereas the latter indicates certaintythat that location is part of the background.Intermediate values of P (k) indicate varyingamounts of uncertainty about whether thelocation belongs to the Figure or the back-ground. (See section 4 for a discussion ofpossible neural implementations of interme-diate P values.)

Next, we need to define the desired rela-tionship, or transformation between the in-put and the output in the model. Broadlyspeaking, we want the output to correspondto what is observed in perception. Thus, toknow the desired output for a particular in-put image, we need to ask observers whatthey perceive as Figure in that image; thedesired output of the model would then bethe one with P (k) = 1 at units that fallwithin the (perceptually) Figural region, andP (k) = 0 elsewhere. The computations per-formed by the model should produced thedesired output (or a result close to it), i.e.behave like human perception.

To consider a concrete example of an in-put and its possible outputs, refer to figure

4. It presents an overview of the model, asapplied to one simple image. At this point,we focus only on select parts of figure 4.The top row shows the array of 100x100 in-put units representing the image. We usecolor to denote different image regions, toprevent confusion with illustrations of subse-quent model stages, where grayscale will beused to denote the values of P (k). (Recallthat in principle, these two regions could bedifferentiated by other attributes, e.g. lumi-nance or texture.) Although the input unitscan signal different values of color or lumi-nance, this does not solve the border owner-ship problem.

The top panel of figure 4, is perceived as ayellow ellipse (the F) in front of a blue back-ground. The desired output would thereforebe P (k) = 1 for all units k inside the ellipse,and P (k) = 0 for those outside. DenotingP (k) = 1 with white, P (k) = 0 with black,and using a grayscale for intermediate val-ues, the bottom panel of figure 4 shows theoutput produced by the model. Clearly, it isa good approximation of the desired output.

En route to finding this output, many otherpossible outputs are evaluated. Two of themare shown in panels I and J of figure 4. PanelI corresponds to a percept of an elliptical“hole” in a blue foreground, looking onto ayellow background. Panel J shows a pos-sible (but undesired) output with little re-lation to the input image: transitions be-tween F and G occur in places where thereare no edges in the image (note: we use theterm “edge” to mean luminance-, color- ortexture-defined edges in the input image.)Furthermore, many of the output units haveintermediate values of P (k) (gray), indicat-ing an output with many “undecided” units.The model evaluates the possible outputs bycomputing a certain value, or “cost” for eachof them. The success of the model is that theoutput with the lowest cost approximates

7

Network Implementation Mathematical FormulationStage

Input Image

Generate multiple sets oflocal F/G assignments;enforce flipping acrossedges (Principle i)

Generate a candidateorganization from eachset; local smoothingminimizes F/G flips awayfrom edges (Principle ii)

Evaluate candidateorganizations based onthe strength of the F unitsin them (Principle iii)

Input Image

Anchoring operatorsstore Po (set of localF/G assignments)

Computeminimal-cost solution Pfor each Po

ComputeFigural Entropy Sfor eachcandidate organization

Output: the organizationwith the strongest Figure

Select the P withlowest S

edge detection

A B C

D E F

H I J

Figure 4: An overview of the model. The left and right columns describe the stages guided byPrinciples i-iii and their mathematical formulation, respectively. The central column shows thenetwork implementation. Top panel, the input image. Bottom panel, the output. Grayscaleindicates P (k), the probability that unit k is Figure, with P (k) = 1 in white, P (k) = 0 in black,and gray for intermediate values.

8

well the F/G organization observed percep-tually, while undesired organizations (suchas 4I and J) lead to higher costs.

In a sense, the idea that the observed per-ceptual organization is the one which mini-mizes (or maximizes) a cost function datesback to the Gestalt Psychologists, who theo-rized that the brain selects visual interpreta-tions which maximize “perceptual goodness”(or Pragnanz; cf. Koffka 1935.) Here, how-ever, we are faced with the challenge of de-vising a formula that measures the “good-ness” of Figure/Ground organizations quan-titatively.

In addition to handling input images thatgive rise to unambiguous F/G interpretations(like that in figure 4), we would like the modelto be able to predict when a certain imagemay lead to perceptual bi-stability. Thus,for an input image like that in figure 1, weexpect to find two outputs with low cost –one corresponding to the yellow regions be-ing F, the other to the blue regions being F.As will be seen in section 3, the model willeven predict that the cost of the former willbe slightly lower than that of the latter, cor-responding to its observed perceptual domi-nance. Nevertheless, both costs will be muchlower than that of other, ‘nonsense’ organi-zations, which are not observed perceptually.

2.2. Guiding principles. In this section,we outline the main stages of computationsin the model. We reserve details of the math-ematical and network implementation for thenext section; here we focus on the ideas themodel is based on, and how they promoteoutputs that are in agreement with humanperception. Figure 4 will be used to givean overview of the stages of the model. Forreaders who are not interested in the math-ematical implementation, reading this sec-tion should be enough to understand how themodel works (i.e., it is possible to skip sec-tion 2.3 and go directly to the simulations).

The model is based on the following threeprinciples:

(i) F/G boundaries are likely to be presentalong luminance/color gradients (edges).

(ii) F/G boundaries are unlikely whereedges are absent.

(iii) Among all possible organizationssatisfying (1) and (2), prefer the one(s)where the Figural units (P (k) > 0.5)have the strongest F assignment – i.e.,where they are closest to 1.

These three principles may seem innocu-ous, but we will see that in concert they pro-vide enough constraints to allow the modelto converge at the right solutions. Further-more, these principles can be implementedin a physiologically plausible way, on a net-work of small receptive field units with localconnections between them.

At the first stage, the model generates setsof F/G assignments over the entire image.At this point, each set is only required to sat-isfy Principle (i): that the F/G assignmentsshould flip across the edges. The image issampled with a collection of local operatorswhose “receptive field” extends several in-put units, to make it possible to detect edgeswithin them, but still much smaller than thescale of the Figure (typically 8-15 units in di-ameter; see section 2.3.1 and Appendix B).If an edge (defined as gradient > criterion)falls within the receptive field of an opera-tor, the units within its receptive fields arelabeled border units. The operator then as-signs all its border units on one side of theedge as F, and those on the other side as G.Which side gets F is decided at random atthis point. The rest of the units in the model(i.e., the non-border units) are labeled as un-decided (0.5). We term the local operatorsanchoring operators to reflect the fact thatthey will exert influence on the model unitsto remain at their assigned values. The ex-tent of this influence will be large for borderunits (i.e., those assigned 1 or 0), and smallfor non-border units.

Since the model has no “global” knowl-edge (e.g., units with receptive fields of the

9

scale of the global Figure), F/G assignmentsare generated independently within each an-choring operator. As a result, there will bemany possible sets of F/G assignments. Eachset corresponds to a different combination ofthe two possible F/G polarities within eachanchoring operator. Panels A-C in Figure4 show three examples of such sets. (Theanchoring operators that fall on edges are il-lustrated by the red circles.) The two sets ofmost interest to us are those in panels A andB. Panel A corresponds to the interpretationobserved perceptually, of an elliptical Figure,and panel B to the reversed F/G interpreta-tion – an elliptically-shaped hole in a ‘front’surface. Explaining why A is preferred toB is a major goal of the model. But thereis also something in common to these twosets: they are globally consistent. Becauseof the local nature of the F/G assignments,the vast majority of the sets of initial F/Gassignments will not share this property, i.e.,they will be globally inconsistent. Panel Cshows one such example. Following the cir-cumference of the ellipse, the polarity of F/Gvalues changes erratically. This is unavoid-able given the independent assignments ateach operator – there is no way to a prioriguarantee global consistency of the anchor-ing F/G assignments. Among all possibleanchoring F/G assignments, there are manymore cases like panel C: given a anchoringoperators, there are 2a − 2 globally inconsis-tent anchoring F/G assignments and only 2consistent ones. Another major task of themodel is therefore to identify and exclude theglobally inconsistent sets. What will makethe model select the globally consistent an-choring F/G assignments, albeit rare, is thatthey lead to organizations that satisfy alsoPrinciples (ii) and (iii).

At the next stage, the model takes eachset of local F/G assignments and generatesfrom it a candidate organization: the bestF/G organization that can be obtained forthe given set. What makes it “best” is that

it is the organization which follows Princi-ple (ii) most closely: minimize F/G bound-aries in image regions that do not containedges. Formally, this is achieved by meansof a cost function E which penalizes for F/Gtransition away from edges. The candidateorganization is the one for which E is mini-mized for the given set of F/G assignments.(A detailed description of the mathematicalformulation, as well as an explanation of howthe minimization process is implemented inthe network, is given in section 2.3.2.) Con-sider panels D-F of figure 4, where the can-didate organizations obtained for the F/Gassignments in panels A-C, respectively areshown. When the set of F/G assignments isglobally consistent, as in panels A and B, theminimization of E leads to a candidate orga-nizations with the majority of units near 1or 0 (panels D and E, respectively). In con-trast, the candidate organization in panel F,which was generated from the globally incon-sistent set C, contains many units near 0.5(gray). The reason for this is that in order tominimize E (and thereby adhere to Principle(ii)), the model smoothly interpolated be-tween the conflicting values near the edges,so that sharp F/G transitions would not oc-cur away from the edges. The abundance ofunits near 0.5 in globally-inconsistent candi-date organizations will be used in the next,final stage of the model to prune them, leav-ing us only with the globally consistent or-ganization(s).

The final stage of the model implementsPrinciple (iii): it evaluates each of the candi-date organizations based on the strength ofits Figure and selects the strongest one(s).The ‘Figure’ in a candidate organization isdefined as the collection of units k that haveP (k) > 0.5. Referring back to figure 4, pan-els H, I and J show the Figures of candi-date organizations D, E and F, respectively(the units with P (k) < 0.5 were all set toblack). To evaluate the strength of each Fig-ure, the model computes a function called

10

entropy, denoted S, which is monotonicallydecreasing the more units in the Figure arenear 1. (For details see section 2.3.3.) Itthen chooses as its output the candidate or-ganization with lowest S. For the case ofthe ellipse in figure 4, the entropies of theFigures in panels H, I and J are 0.34, 0.74and 0.77 respectively. The output is there-fore that shown in panel H (reproduced inthe bottom row), which is consistent withperception. Note, that the model prunedthe undesired organizations although no ex-plicit computation to detect global inconsis-tencies was built in. An intuitive explana-tion for why the candidate organization inpanel F leads to high Figural entropy wasalready given: globally inconsistent sets ofF/G assignments (e.g., panel C) lead to can-didate organizations with many units with0.5 < P (k) � 1, which in turn lead to highentropy. With regard to the high Figuralentropy of the globally consistent candidateorganization in panel E, some intuition wasgiven in figure 3, and this issue will be revis-ited in section 3.

The large difference in values between theentropy of the Figure in panel H and thatof panel I predicts that the interpretation ofthe image as an elliptical Figure will be muchstronger than that of an elliptically-shapedhole. Indeed, the entropy of the “hole” inter-pretation is not much lower than that of the“nonsense” interpretation J. This is in verygood agreement with what is observed per-ceptually. For other images, however, theremay be more than one candidate organiza-tion with entropy considerably lower thanthe rest. In such cases, the model’s pre-diction is that perception will alternate be-tween these organizations. Some examplesof such bi-stable images will be discussed insection 3. (Note: the “hole” interpretationcan be promoted perceptually by introduc-ing stereoscopic cues which disambiguate thedepth relationships of image regions; here we

do not consider such additional factors in theresolution of border ownership.)

To summarize, once multiple sets of F/Gassignments are generated in accordance withPrinciple (i), the model proceeds in two stages.In the first stage it generates a candidateorganization from each set, by minimizinga cost function E . The structure of E im-plements Principle (ii), be penalizing F/Gtransitions away from edges. As a conse-quence, global inconsistencies in the F/G as-signments result in candidate organizationswith many ‘undecided’ units, while globallyconsistent sets result in candidate organiza-tion with few undecided units. In the secondstage, the model chooses the candidate or-ganization with the lowest entropy S, whichleads to the elimination of candidate orga-nizations with many undecided units, in ac-cordance with Principle (iii).

The present model allows for an efficientminimization of the cost function E as a dy-namical process in the network, but it doesnot offer a similarly efficient implementationfor finding the best candidate organization(s)with lowest S. Here, this second stage re-quires a search through the set of candidateorganizations, a process which is not bio-logically plausible. An efficient convergenceto the low entropy organization(s) can beachieved if one introduces local interactionswhich favor consistent polarity of F/G as-signments between neighboring anchoring op-erators. In effect, this turns the present “twostage” version of the model (first minimizethe cost E , then compute the entropy S) intoa single stage of minimizing a cost functionwhich incorporates both E and S (Pugh andRubin, in preparation). But since the math-ematical formulation of the combined E − Smodel is much more complicated, we limitthe discussion here to the two-stage model,which allows us to present the main ideasand achievements of such models in a sim-pler form.

11

2.3. Network and mathematical imple-mentation.

2.3.1. Luminance edges and Figure/Groundtransitions. The first stage in the model wasdescribed in detail in section 2.2. Below wegive a brief summary and introduce furthernotation and details. After a pre-processingstage of edge detection, the model samplesthe image with a dense set of anchoring oper-ators. F/G assignments are then generatedlocally within each anchoring operator. Wedenote the anchored values P0. The valuesof P0 reflect Principle (i): that F/G transi-tions are likely near edges. If an edge fallswithin an anchoring operator, it assigns val-ues P0(k) = 1 (F) or P0(k) = 0 (G) to itsborder units, such that all units on one sideof the edge are assigned one value, and thoseon the other side are assigned the oppositevalue. Otherwise, the units within the an-choring operator are non-border units andthey are assigned P0(k) = 0.5. The inde-pendent generation of P0 values within eachanchoring operator leads to multiple sets ofF/G assignments. The next stage will be togenerate a candidate organization from eachset.

The spatial distribution of anchoring oper-ators in the model is dense, i.e., an operatorcould be centered at any input unit location.But not all anchoring operators are activatedwhen processing a given image. First, toavoid conflicting anchoring assignments of asingle border unit, operators with overlap-ping receptive fields are not allowed to be si-multaneously active. In addition, the overalldensity of activated operators is a parameterin the model. In the simulations presentedhere, we choose which anchoring operatorswill be activated at random (subject to theconstraints listed above). We expect that fornatural scenes, anchoring operators may beactivated around the stronger edge segmentsin the image.

2.3.2. Generating the candidate organizations:the cost function E . The next step is to com-pute the candidate organization that eachset of F/G assignments gives rise to. Thisstep aims to satisfy Principle (ii): F/G bound-aries are unlikely where no luminance bor-ders are present. Thus, we seek a candidateorganization which agrees with the anchor-ing F/G values assigned to the border units,while minimizing the amount of F/G rever-sals at non-border units.

We denote the collection of values {P0(k)},which is defined on the entire intermediatelayer (both border and non-border units), byP0. We will denote the resulting candidateorganization by P: the collection of values{P (k)} that specify the probability that thecorresponding location in the image belongsto the Figure (for the given P0). Findinga network configuration that best satisfies aset of constraints – in our case, adherenceto the F/G assignments (Principle i) whileminimizing F/G boundaries away from edges(Principle ii) – is done by expressing themin a cost function that the network mini-mizes. (How network dynamics can performthis minimization will be shown later; cf fig-ure 5 and Appendix A.) We use the followingcost function E :

E(Q) =n2∑

k=1

∑

j∈Nk

µkj [Q(k) − Q(j)]2

+∑

k∈A

[Q(k) − P0(k)]2(1)

+∑

k∈ Ac

ν [Q(k) − P0(k)]2

E was written here as a function of Q; like P,it denotes a collection of F/G values of all theunits. However, unlike P, Q does not neces-sarily minimize E . For example, it could bea random collection of values – in this case,the value of E would simply be very large andfar from the minimum. Furthermore, thequadratic dependence of E on Q means thatthere is only one P that minimizes it, and

12

unit k in A unit k in Ac

j=k-n

1

Po(k)=1

0.1

Po(k)=0

Po(k)=0.5

j=k+1j=k-1

j=k+n

kj=0.1

kj=0

k0.1

kj=0.1

0.10.1

0.1

j=k-n

j=k+1j=k-1

j=k+n

k

k k

Figure 5: Network implementation of equation (1). Left, The connections to a border unit (i.e.,one that is near an edge and is covered by an anchoring operator). Right, The connections toa non-border unit. In both panels, the top layer shows the image, the intermediate layer showsthe values of P0 and the bottom layer shows the network which computes P . Appendix A showsthat this network will settle in a state which minimizes equation (1).

this is the candidate organization that arisesfrom P0. The other notations are best un-derstood by consulting figure 5, which showstwo illustrations of a small piece of the net-work. Because the network is based on localconnections, we focus on a single unit and itsnearest neighbors. For a unit with index k,its nearest neighbors to the ‘west’ and ‘east’have indices k − 1 and k + 1, respectively.Since the network is m×n, the nearest neigh-bors to the ‘north’ and ‘south’ are k−n andk + n.

The first term of equation (1) deals withthe connections between unit k and its near-est neighbors, which are denoted as a set byNk. Using j as a generic index for a neigh-boring unit, its input to unit k is weighted byµkj. The values of these weights are shownin the bottom layers in figure 5. They de-pend on whether unit k is near an edge (leftpanel) or not (right panel). For units near anedge, the incoming weights from neighborson the other side on the edge (in the figure,it is unit k + n) are set to zero. The weightsfrom units on the same side of the edge havevalue µ, which is a free parameter (µ shouldbe less than 1; in the simulations presented

later, we took µ = 0.1). For units not nearan edge (right panel), the incoming weightsequal µ for all neighbors. (Note, that thismeans that µkj, the incoming connection tounit k, is equal to µjk, the outgoing connec-tion from k to j. This symmetry will be usedlater.)

The second term in equation (1) deals withhow border units are affected by their an-chored F/G assignments (here, A stands for“anchored” and denotes the set of borderunits.) Consider again the unit k on the leftpanel of figure 5. This unit is not only nearan edge, but is also in A (recall that not allunits near edges will fall within an anchor-ing operator; for illustrative purposes, thefigure depicts a case where unit k as well asits neighbors are all in A.) The F/G assign-ments are shown on the intermediate layer.Unit k and three of its neighbors were as-signed P0(k) of 1 (white), while the fourthneighbor was assigned 0 (black). But unitk will only be affected by its own anchoredvalue, P0(k). This is denoted by the arrowfrom the intermediate layer unit k to the bot-tom layer. This connection weight is 1 (i.e.,it is not weighted by a free parameter.)

13

Finally, the third term in equation (1) dealswith how non-border units are affected bytheir unbiased F/G assignment of 0.5 (Ac

denotes the complement set of A). This isillustrated for unit k on the right panel offigure 5. The connection from the interme-diate layer to unit k in the bottom layer isweighted by a very small number, which wedenote as ν. It is another free parameter(e.g., ν = 0.0002 for the results presented insection 3).

To understand how the structure of thecost function E affects the solution P, con-sider the three choices of P0 in panels A-C of figure 4 and their resulting minimizersP shown in panels D-F. Despite their obvi-ous differences, the three minimizers sharesome common characteristics. One, they aresmooth except where there are edges in theimage; two, they show fidelity to the anchor-ing values near the edges; and three, theycontain some units with intermediate valuesof P (k) (i.e., neither 1 nor 0). These com-monalities directly relate to the three termsin E . Below, we discuss each term in turn,and then discuss their combined effect.

Smoothness. A notable thing about allthree candidate organizations in panels D-F of figure 4 is the absence of sharp transi-tions away from the edges. Along the edges,there are jumps between Figure and Ground(white and black, respectively), as dictatedby Principle (i). In contrast, away from theedges, the gray levels vary smoothly. Thisis an implementation of Principle (ii), andmathematically, it is a result of the first termin E . This term increases with every pair ofneighboring units which have different val-ues of P (k); the larger the difference, thelarger the additional cost. The exceptionto that is when the neighbors are separatedby a luminance border, in which case differ-ences between the values of P (k) and P (j)are not penalized, since µkj was set to 0 forsuch pairs (to allow for F/G changes acrossluminance borders). This is why sharp F/G

transitions occur also in portions of the edgeswhere anchoring units were not activated (cf.figure 4D-E.)

The smoothness away from the edges re-flects the fact that, for a fixed set of anchoredvalues along the edges, the function P whichminimizes the first term in E is given byinterpolating between the edge values. Wetherefore call the first term the smoothnessterm. It has the effect of smoothly propagat-ing, or spreading the F/G values from borderunits to non-border units, i.e., from edgesonto entire regions. Note, that although thesmoothness term explicitly couples only neigh-boring units (and therefore preserves the lo-cality of the model), a given unit can havean effect on the value at much more distantunits. This is because chains of pair-wiseterms [P (k) − P (k + 1)], [P (k + 1) − P (k +2)], . . . , [P (k + n− 1)− P (k + n)] effectivelycouple units separated by n other units (aslong as they are not separated by an edge).Intuitively, we can think of this as contri-butions to the cost by a cascade of pairs ofneighboring units with different values of P .Thus, although the model admits only localconnections (µkj 6= 0 only if units j and k arenearest-neighbors), the smoothness term en-ables long-range interactions (global effects)in the model.

Fidelity to the anchoring values. There isa clear relation between each candidate or-ganization and the set of anchoring valuesthat led to it. Specifically, the minimizersP are close to P0 at the border units. Thisis a result of the second term of E . Thisterm gets a positive contribution wheneverP (k) at a border unit k deviates from theanchored value P0(k) there. Thus, E pe-nalizes for deviations from anchored values,although it allows such deviations, in prin-ciple. The minimizer P will therefore tendto remain faithful to the anchored values atthe border units. This allows the model to

14

evaluate different possible P0’s through theireffect on P.

The effect of P0 at non-border units. Thethird term in E penalizes deviation of P (k)from P0(k) at non-border units. However,the cost for such deviations is much smallerthan at border units: they are weighted bythe small value of ν. The effect of this termis best understood by inspecting panels D orE in figure 4 – for simplicity we pick one ofthem, D. It shows the solution which mini-mizes E for the P0 shown in panel A. There,the anchoring values assigned at the edgeswere globally consistent. If E consisted onlyof the first two terms (i.e., if ν were set tozero), the minimizing solution would be verysimple: P (k) = 1 for all units k inside theellipse, and P (k) = 0 for all units outsideof it. This solution leads to nulling of boththe first and the second terms in E and sinceboth terms are quadratic (i.e., always posi-tive), a function which nulls them is neces-sarily the one which minimizes them.

Thus, with only the first two terms in E ,the spread of F/G values away from the bor-der units would be potentially unlimited: allunits, no matter how far away from the edge,may end up having “hard” F/G values P = 1or P = 0. This might seem like a goodthing, at first glance, but in fact there areseveral reasons why this is undesirable. Con-sidering physiological plausibility, it is morereasonable to assume that there would besome loss of signal as it is transmitted viaa chain of short-range connections. Froma computational point of view, attenuatingthe F/G signal as one moves away from theedge can be advantageous: for example, inresolving conflicting F/G signals which ar-rive at a single location from different direc-tions, it would allow the unit to go with thestronger signal which would then imply thecloser edge. Finally, as we shall see in thenext section, unlimited spread of the F/Gsignal leads to predictions which are incon-sistent with perception, specifically about the

effects of closure, convexity, and size on F/Gperception.

The interplay between the terms. Whileisolating the terms affords some understand-ing of the cost function, in the full problemthey play off of one another. If the cost func-tion consisted only of the second and thirdterms, then it would be minimized by P0

- i.e., the output would be identical to theinput (as these two terms control the ‘faith-fulness’ of the solution of equation (1) to theanchored values at border units and to theP = 0.5 values elsewhere). If, on the otherhand, the cost function consisted only of thefirst term, then the penalty it puts on devi-ations from smoothness would drive the sys-tem to a trivial solution of constant P valueswithin regions that do not contain a lumi-nance border, with no regard to the valuesset by the anchoring operators. The require-ment that the solution minimizes the sum ofall three terms creates an interplay betweenthese conflicting demands. As a result, theminimizer P will be as smooth as possible(first term) while remaining faithful to theinitial biases (second and third terms). Thefree parameters µ and ν affect the relativeweight of these demands. The second termhas no parameter and is of order 1. The firstterm scales like µ: by taking µ small we arevaluing fidelity to the anchored values at theborder units over smoothness. By taking ν

very small, we value a modicum of fidelity tothe unbiased values (P0(k) = 0.5) but this isthe weakest demand of the three.

2.3.3. Identifying undesired candidate orga-nizations: the Figural entropy. The final stepof the model is to evaluate the “global good-ness” of each of these organizations and thusselect the best output F/G organization(s).This stage implements Principle (iii): preferthe organization(s) where the Figure is thestrongest. The ’Figure’ in a candidate orga-nization is defined as the collection of Figural

15

units (units k with P (k) > 0.5). Its strengthis quantified by the Figural entropy:

(2)

S(P) = −1

Nfig

∑

k,P (k)>.5

2P (k) log2(P (k))

where Nfig is the number of Figural units.

The name “entropy” comes from the simi-larity of equation (2) to the formula for en-tropy in statistical mechanics (Reichl 1998)and information theory (Cover, 1991). How-ever, we make no claim for a physical orinformation-theoretic basis for S. The selec-tion of the function −2P log2(P ) is in factsomewhat arbitrary; any monotonically de-creasing function defined over [0.5, 1] woulddo for our purposes. Our choice has the ad-vantage that it offers a convenient scale to in-tuitively grasp the “goodness” of the Figure.The value of S is confined between zero andone: if all Figural units have P (k) = 1, S willbe 0; as more and more Figural units departfrom 1 and approach 0.5, S approaches itsmaximal value of 1.

Using the Figural entropy as a measure,the model ranks the candidate organizationsP of a given image, creating a hierarchy ofF/G organizations. In some cases, there willbe one candidate organization whose entropyvalue is significantly lower than that of allother organizations. As seen earlier (section2.2), this is the case for the ellipse in figure 4.The Figural entropy of the (potential) out-put in panel H is 0.34, while those of panelsI and J are 0.74 and 0.77, respectively. Theoutput of the model is therefore that in panelH (redrawn on the bottom panel), in agree-ment with perception. In other cases, theremay be more than one candidate organiza-tion with low entropy. The model predictsthat these situations will result in perceptualbi-stability, as will be discussed in the nextsection.

Before proceeding to the simulations, an-other note about S needs to be made. The

strength, or “goodness” of a candidate or-ganization is determined only by the valuesof the Figural units (P (k) > 0.5). Thisis motivated by the difference in perceptualquality of Figure and Ground (Rubin 1921,1958). However, mathematically it is possi-ble to draw information from the values ofthe Ground units as well. Consider the twoglobally consistent sets of F/G assignments,P0, in panels 4A and B. They are symmet-ric about 0.5 with respect to each other: oneset can be obtained from the other by thetransformation (x → (1 − x)), i.e., from 1 −P0. The structure of the cost function Epreserves the symmetry with respect to thistransformation. Therefore, the solution tothe P0 in panel B can be obtained by reflect-ing the solution to the P0 in panel A about0.5. In other words, the values of P (k) shownin panel E equal (1−P (k)) of those shown inpanel D. This relation is not limited to thepair of globally consistent solutions: each setP0 has a complementary set obtained by re-flection. But this fact is particularly usefulfor the globally consistent ones, because it al-lows one to evaluate the Figural entropies ofboth of them from a single solution. Specif-ically, applying equation (2) to the (1 − P )values of the background units of one solu-tion gives the same value as one would getfrom computing the Figural entropy of thecomplementary solution. We shall use thisextensively when discussing further resultsin the next section.

3. Simulation results

This section presents results from a set ofimages designed to study how the model be-haves under conditions which are known toaffect Figure/Ground perception. The bulkof this section can be understood even if sec-tion 2.3 was skipped, although we make oc-casional reference to it.

The first image was already introduced inthe previous section: it is the ellipse in fig-ure 4. We have already seen that the model

16

output corresponded to that observed per-ceptually: the organization with the lowestentropy in figure 4 was that of an ellipse infront (panel D). figure 3 gave some intuitionas to why this organization was preferredover that of panel E (an elliptic hole): theFigure units are, on average, stronger (closerto 1) when signals propagate “inwards” than“outwards”. Next, we wish to disentanglethe effect of the two properties that con-tributed to the preference of panel D: theellipse being the enclosed region and the con-vexity of its bounding contour.

To study the effect of convexity in isola-tion, we ran the model on images that con-tained regions with either convex or concavesides, where no region was enclosed in an-other. To save space, figure 6A presents onlythe result of the simulation for one of thetwo globally consistent solutions. (The orig-inal image can be inferred from it easily.)It is known that observers tend to perceiveconvex regions as the Figure (Kanizsa andGerbino 1976; see also Liu et al. 1999).

This tendency is also shown by the model:the entropy of the solution in figure 6A, i.e.,when the convex regions are Figure, was 0.48±0.03 (see Appendix B for how confidence in-tervals were computed). In contrast, the en-tropy for the other globally-consistent solu-tion, i.e. when the concave regions are pre-sumed Figure, was 0.71 ± 0.03. (Globallyinconsistent solutions produced even higherentropies and will not be discussed further.)

This difference between the two globallyconsistent solutions is a direct result of con-vexity. As one moves away from an edgeinto the Figural, convex region in figure 6A,the values of P (k) fall off from 1 (turn fromwhite to gray), and then start to rise againas one approaches the edge on the other sideof the region. Similarly, the P (k) values forunits on the concave, Ground side increasefrom 0 (turn from black to gray), and thendecrease back. However, the fall off from 1on the convex side is slower than the rise

0.6

0.7

0.8

0.9

1.0

0.6

0.7

0.8

0.9

1.0

0.6

0.7

0.8

0.9

1.0

0 10 20 30 40 50

C

B

A

Distance from edge

P(k

) [F

uni

ts,

] /

1-P

(k)

[G u

nits

,

]

Figure 6: The effect of convexity on Fig-ure/Ground organization. A, The candidateorganization P for a globally consistent setof F/G assignments where the convex re-gions are be Figure. B, A cross sectionshowing P (k) as a function of the distanceof Figural unit k from the edge (solid curve).The dashed curve shows P (k) for G (back-ground) units, and the dotted curve showsa reflected version of 1−P (k) for the sameunits, allowing the faster decay of the Gunits to be compared directly with the slowerdecay of the F units. C, Plots of the decayof P (k) and 1−P (k) for F and G units, re-spectively, as a function of distance from theedge. Three levels of convexity are shown,decreasing from top.

17

from 0 on the concave side. To illustratethis difference more clearly, panel B shows agraph of P (k) for units which lie along a cen-tral horizontal cross section. (For simplic-ity, only one cycle of the solution is shown.)The solid curve denotes the value of P (k) forF units, and the dashed curve for G units.To enable direct comparison, the values of(1 − P (k)) for the G units have been “re-flected” about the edge and redrawn on theFigural (convex) side with a dotted line. Insection 2.3.3 it was shown that the entropyof the other globally-consistent solution canbe derived from the (1−P (k)) values of theG units in this solution. The faster decay ofthe G units therefore explains the higher en-tropy for the other solution, where the con-cave regions are considered Figure.

The effect of convexity depends on the cur-vature of the edge: the more curved the edge,the more mutual reinforcement there is be-tween the F units on the convex side of theedge (see figure 3). As a result, the differ-ence in the rates of decay on the convex andconcave sides increases with curvature. Thisis shown in figure 6C. The values of P (k)for the F units and of (1 − P (k)) for the Gunits are plotted for three levels of curvature(the lowest level is zero, i.e., straight edges,for which the two curves coincide.) Conse-quently, the values of the Figural entropiesfor the two solutions (convex regions are Fversus concave regions are F) become closerand closer as the curvature decreases, andcoincide when the edges are straight. Thispredicts that the lower curvature images willgive rise to more perceptual bi-stability.

Note that the effect of convexity is in con-flict with another factor: the distance of aunit from the edge. A unit at the center of aconvex region of our image was always fur-ther away from the edges than a unit at thecenter of a concave region (cf. figure 6A).When all other factors are equal, then thegreater the distance from the edges, the moredecay a unit suffers. Nevertheless, the effect

of convexity more than compensates for thisattenuation.

Another form of conflict between two fac-tors – convexity and closure – is when thecircumference of an enclosed region containsportions which are concave (e.g., a kidneybean shape.) Near those portions, F unitson the “in” side of the curve will decay fasterthan G units on the “out”. But since the re-gion is enclosed, there are always more por-tions of its circumference which are convex(with respect to its inside), with the net re-sult being that the candidate organizationwhere the enclosed region is F prevails (datanot shown).

Next, we discuss the model’s behavior withperceptually ambiguous images. At first sight,one might expect that the model will sig-nal ambiguity by settling into a state wheremany units have P (k) = 0.5. However, thiswould not be consistent with perception: am-biguous images lead to multi-stability, wherethere are several (typically two) distinct con-figurations, each with well-defined Figure andGround regions. Upon prolonged viewing,which configuration is experienced alternates– but at any given moment, there is a definitesense which regions are Figure. Therefore,for ambiguous images, the model should gen-erate several candidate organizations withlow entropy (i.e., few units near 0.5). Themore comparable these entropies are, the morebalanced the competing percepts would be.Conversely, significant differences between the(low) entropies means that one of the per-cepts would be more dominant.

To test these predictions, we ran the modelon the perceptually bistable image shown infigure 1 (Kanizsa and Gerbino 1976; Kanizsa1979). The Figural entropy of the globallyconsistent organization with the yellow re-gions as Figure is 0.42±0.02, and for the blueregions as Figure it is 0.45 ± 0.02. The Fig-ural entropies for globally inconsistent con-figurations, in contrast, were much higher,averaging around 0.70. These results are in

18

Figure 7: Like figure 1, this ambiguous image also leads toperceptual bi-stability, but the greater degree of convexity hereleads to a stronger dominance of the blue regions as Figure(Adapted from Kanizsa 1979.)

good agreement with the perception of thisimage. The close values of the two globallyconsistent organizations predict that bothwould be perceived. At the same time, theslightly lower entropy of the “yellow is Fig-ure” interpretation predicts a dominance ofthis percept.

Kanizsa and Gerbino (1976) attributed theadvantage of the yellow regions in figure 1as Figure to convexity. Note, however, thatthese regions are not convex in the strict,mathematical sense of the word. It is pos-sible, in principle, to quantify intermediatedegrees of convexity (although we will notdo this here). A shape which is convex inthe usual sense (e.g., the ellipse or the lightregions in figure 6) would get a score of (say)1, while the yellow regions in figure 1 wouldget a lower score, but yet higher than theblue regions. It is interesting that both themodel and human perception show sensitiv-ity to such intermediate levels of convexity.Kanizsa (1979) was evidently aware of thisnuance. In discussing figure 7, he noted thatthe convexity there was “more accentuated”(than in figure 1), and that the dominance ofthe convex regions was, in turn, stronger (pg.109). The results the model gives for this im-age reflect this, too: the entropy for the blueregions as Figure is 0.48 ± 0.03, while thatof the yellow regions as Figure is 0.57±0.03.This difference is larger than that obtainedfor figure 1, but smaller that that obtainedfor the ellipse (see also Pao, Geiger and Ru-bin 1999).

Kanizsa and Gerbino (1976) used figures1 and 7 to demonstrate an additional point,which is that convexity wins over symme-try: in both images, the less-convex regions

are symmetric, but nevertheless perceptionis dominated by the non-symmetric, but more-convex regions. Our model suggests thatthe visual system’s greater sensitivity to oneglobal property (convexity) over another (sym-metry) may be related to what can be com-puted by networks of locally connected units.

Next, we consider the effect of size on Fig-ure/Ground organization. Graham (1929)found that smaller regions tend to be per-ceived as Figure. To isolate the effect ofsize in our model, we used images comprisedof parallel strips of alternating widths. Thestrips differed only in size, i.e., no differencescould arise from closure or convexity effects.We created a set of images with varying rela-tive strip sizes. For each image we computedthe Figural entropy for the two globally con-sistent solutions. The results are shown infigure 8. For all images except when thestrips were of equal width, the Figural en-tropy was significantly lower when the nar-rower strips were F. This effect grows as thedifference in size increases.

What is the cause for the effect of size?As in the case of convexity, the differencein entropies results from differences in de-cay from the 1, 0 anchored values near theedges. However, here the difference does notcome from the rate of decay: as was shownin figure 6c, for a straight edge the rate ofdecay is equal on the F and G sides. In-stead, the effect of size arises because thedecay extends further for the wider strips, re-sulting in a higher proportion of units withP (k) far from 1 or 0. As discussed in sec-tion 2.3.2, the decay occurs because the non-border units have an anchoring value of 0.5.

19

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

30 35 40 45 50Percent area (thin strips)

thin strips are F

thick strips are F

Figu

ral E

ntro

py

Figure 8: Effect of relative size on F/G organi-zation. The entropy of the two globally consis-tent solution is shown for images with differentratios of strip widths. Similar to perception, themodel shows a preference for the thinner stripsto be Figure.

The third term in equation (1), which penal-izes for deviations from this anchored value,is weighted by the small parameter ν. Therate of decay therefore depends on the valueof ν. Large values will lead to fast decay; asν is taken to be smaller and smaller, therewill be more and more propagation of theanchored values (1, 0) from the border unitsinto the depth of the regions. Therefore, thegraphs shown in figure 8b will change witha different choice of ν. Nevertheless, as longas ν is non-zero, there will be an effect ofsize. The perceptual effect of size thereforesupports the notion that non-border unitsshould have some small “inertia” towards anundecided F/G state. The specific way ofimplementing this may take different forms.In our model, ν was set for this image andthen its value was chosen according to a spe-cific scaling law for the other images; seeAppendix B). More experimental data areneeded in order to constrain ν or, more gen-erally, explore the implementation of F/Gdecay away from edges (as well as its poten-tial dependence on scale.)

The images we have considered so far dif-fered in several ways, but there is also an

important property which they all shared.In all cases, the contours that bound the re-gions perceived as Figure coincided with theluminance (and/or color) edges in the im-age. In terms of the model, this meant thatthese images yielded solutions that were al-ways in perfect agreement with both Prin-ciples (i) and (ii): there were F/G bound-aries along all luminance edges, and therewere no F/G boundaries elsewhere. (In thecase of ambiguous images, the polarity of theF/G boundaries could reverse, perceptuallyas well as in the model, but the boundariesalways coincided with the luminance edgesin the image.) There are, however, imagesthat give rise to percepts which deviate fromone or both Principles (i) and (ii) in places.The first class of such images we consider arethose that contain figures outlined by opencontours. Outlined figures can give rise to asense of an enclosed region even when thisregion is not bound by a luminance edge allaround. A simple example is shown in figure9A. Observers report perceiving an enclosed,near-elliptical Figural region. Although itmay be difficult to introspect what happens(perceptually) in the vicinity of the gap, for-mally there must be a transition between Ffor units inside and G for those outside. Thisis at odds with Principle (ii), which statedthat F/G transitions are unlikely where noluminance edges are present. But the modelis nevertheless capable of handling this sit-uation. The reason is that the implemen-tation of Principle (ii) via minimization ofE allowed for deviations from it: F/G tran-sitions away from edges are penalized, butthey are not banned.

To run the model on this image, we treatedthe outlining contour the same way we havetreated luminance edges so far. Panels Band C show two globally consistent candi-date organizations with the Figure units onthe inside versus outside, respectively. PanelD shows a globally inconsistent organizationwhere the polarity of the F/G assignments

20

A B C D

Figure 9: How the region-based model interprets figures outlined by open contours. A:, Perceptually,the image gives rise to a sense of an enclosed Figural region even though it is not bound by a luminanceedge all around. B, C: Globally consistent candidate organizations where the F units were set on theinside and outside, respectively. D: A globally inconsistent organization.

reverse between different anchoring opera-tors. The qualitative resemblance to the threecandidate organizations shown for the ellipse(panels 4D-F) is evident. The entropies ofthe three organizations in 9B-D are 0.46, 0.71and 0.73, respectively. The model thus pro-duces a nearly-elliptical Figural region (panelB), in accordance with perception. Uponcloser inspection, traces of the gap in theoutlining contour can be seen in the solution.Across the two sides of the outline, thereare sharp F/G transitions also in the por-tions where no anchoring operators were ac-tivated. (This is because the coefficients µkj

in the cost function, eq. (1), equal zero whenunits k and j are separated by an edge.) Incontrast, between the two ends of the out-line, the transition between F (P (k) > 0.5)and G (P (k) < 0.5) is more gradual. This isbecause, in the absence of a luminance con-tour there, the coefficients µkj are non-zero,enforcing a smooth transition between F andG.

Region-based models thus offer a naturalway to interpret how fragmentary edge in-formation can give rise to complete, enclosedFigural regions. Importantly, this can hap-pen even in the absence of “good continua-tion” of the contour fragments – as in thecase for figure 9a. This is in contrast withcontour based models, which do not offer anatural way to complete regions when goodcontinuation is absent. The advantage ofthe region-based approach follows from its

stated goal, to find Figural regions, and theresulting distinction it makes between lumi-nance edges and F/G transitions.

Another class of images that give rise topercepts where edge information and F/Gtransitions do not coincide – i.e., when Prin-ciples (i) and/or (ii) are violated – is thosethat give rise to globally inconsistent F/Gorganizations. An example is shown in fig-ure 10A. Observers report that there are two“lobe-shaped” Figural regions, a blue one onthe left and a yellow one on the right. Fol-lowing the single luminance edge that boundsthese two regions, it is evident that the per-cept just described involves a reversal of theF/G polarity at some point. This suggeststhat global consistency is not, in general,an absolute perceptual constraint, and thatother factors may override it. This impor-tant observation was already made by Rubin(1921, cf. figure 2).

To study the behavior of the model for thisimage, we compare several different candi-date organizations. We first consider a glob-ally consistent organization, shown in panelB. Here, the yellow region was taken as Fig-ure, leading to the right-hand lobe being Fig-ure and the left lobe as part of the back-ground. This organization (which, as alreadymentioned, does not correspond to percep-tion) yields a Figural entropy of 0.71± 0.02.Next, we flipped the F/G assignments of theanchoring operators around the left lobe, to

21

A B C D E

Figure 10: A: This image gives rise to a globally inconsistent Figure/Ground organization. Observersreport perceiving two lobe-shaped Figural regions, a blue one on the left and a yellow one on the right,but this necessitates a reversal of the F/G polarity along the continuous contour that bounds both. B: Aglobally consistent candidate organization. C: When the polarity of the F/G assignments around the leftlobe is reversed, the organization is not longer globally consistent, but nevertheless yields lower entropy.D: The output of the model based on the candidate organization in C yields, in addition to the twoFigural lobes, regions which are not observed perceptually. E: The organization observed perceptuallyis achieved when anchoring operators are suppressed from enforcing F/G flips along the vertical edgesthat separate the blue and yellow sides.

make the resulting organization better re-semble the reported percept. This is shownin panel C. The Figural entropy is lower:0.56±0.02. This organization, however, stilldiffers somewhat from that observed percep-tually. This can be seen more easily in panelD, where the background units (P (k) < 0.5)were all set to black, making the F/G tran-sitions clearly visible. In addition to the twolobes, this organization contains pie-shapedFigural regions which are not observed per-ceptually. Moreover, parts of the F/G tran-sitions around these regions occur where thereare no luminance edges in the image. Asdiscussed previously, such “spurious” transi-tions are an inevitable result of F/G assign-ments which are globally inconsistent. (Flip-ping the F/G assignments of the units aroundthe left lobe forced a transition between theG units outside it and the F units along theleft sides of the vertical edges.) The factthat the organization in panel C neverthelessyielded a lower entropy than that in panelB reflects the model’s strong preference forenclosed, smaller regions to be the Figure.The mutual reinforcement of Figure units in-side the lobes more than compensated for thepenalty incurred for the ‘gray’ (P (k) ' 0.5)units along the spurious F/G transition.

Enforcing F/G transitions along all edges,including the vertical ones, was done in or-der to adhere to Principle (i), which stated

that F/G transitions are likely along lumi-nance edges. The reported percept, however,suggests that for the image in panel A, thisprinciple may need to be reconsidered. Re-porting the two lobes as the Figures in theimage implies, among other things, that thevertical luminance edges are not perceived asF/G transitions. We therefore tested whathappens if we suppressed the anchoring oper-ators along the vertical edges. The result ofminimizing equation (1) for this modified setof F/G assignments is shown in panel E. Theentropy of this organization is 0.34 ± 0.04.Note, however, that in spite of its obviousadvantage, this organization will not be ob-tained by running the version of the modeldescribed here. The present version auto-matically enforces F/G flips across all lumi-nance edges (i.e., it can only give organiza-tions like those in panels B and C). PanelE was obtained by manually suppressing se-lect luminance edges from being loci of F/Gtransitions, as guided by our knowledge ofthe percept. In subsection 2.2, we mentionedthat there should be mechanisms that allowthe suppression of certain luminance edgesfrom being considered as F/G transitions.There, we gave as an example the case oftexture. The image in figure 10A serves asa reminder that, more generally, the visualsystem also has the means to admit the pos-sibility of sharp luminance variations on a

22

single surface (due to a change of surfaceproperty, e.g., paint color, or to illumination,e.g., shadows), or the existence of two abut-ting surfaces. The present model focuses onhow to assign F/G polarity to edges, whichis a major component in segmenting the im-age. But clearly, the ultimate interpretationof any image must involve many other fac-tors, and possibly more than one iteration,until a self-consistent, ecologically valid in-terpretation of the scene is reached.

4. Discussion

We presented a model for Figure/Ground(F/G) segregation. The starting point of themodel is the need to determine which of thetwo sides of an edge is in front (the Figure),and which side is in the background. This“border ownership” problem is particularlysevere when the units have only local infor-mation (small receptive fields), since its reso-lution requires integrating information froman image region approximately of the size ofthe Figure. Zhou et al. (2000) reported thatearly visual cortical cells (V1/V2/V4) showsensitivity to the polarity of border owner-ship induced by Figures much larger thantheir “minimum response fields”. We there-fore asked whether those effects could be ac-counted for by computations performed inthose early areas, or whether receiving globalinformation via feedback from higher areaswas necessary. We found that Figure/Groundsegregation can indeed be computed by amodel network of small receptive-field unitswhich interact only locally (nearest-neighborconnections). Importantly, the model showssensitivity to global image properties in away similar to human perception: it prefersenclosed, smaller or “more convex” regionsas the Figure. This suggests that the sen-sitivity to Figure/Ground polarity found byZhou et al. (2000) for early cells (V1, V2 andV4) may be computed in those regions, i.e.,it may not necessitate feedback from higherareas.

4.1. Timing considerations. What allowsthe model to show global effects despite hav-ing only local connections is that signals canpropagate through the network iteratively.The iterations lead to long-range propaga-tion of information as the system evolves withtime. This raises a natural question: cansuch a model be fast enough to account forthe rapid F/G segregation effects observedexperimentally? Zhou et al.(2000) concludedfrom their results (figure 20) that “the corti-cal processing that leads to border-ownershipdiscrimination requires no more than ∼ 25msec.” This poses a challenge to iteration-based models such as ours, but does not ex-clude them a-priori. The number of itera-tions needed for global effects to emerge isno more than half the number of units thatspan the most distant points in the Figure.Here, we restricted the model units to be ofa single scale, to keep things simple. Butin reality computations like those describedhere would need to be performed at multi-ple scales, including somewhat coarse ones.Cross-talk between the different scales couldthen allow the system to identify the ap-propriate scale for analysis of a given Fig-ure – that which converges rapidly to themost stable solution. In such a more realis-tic system, a relatively small number of it-erations may suffice for a wide range of Fig-ural surfaces. For modestly sized Figures,say those bound within 5◦, as few as 2-3iterations can provide an adequate solutioneven with 1◦ receptive field units. For largerFigural regions, the computations would becarried out by units with larger receptivefields, which are known to exist in early cor-tex (at least 3◦ parafoveally in V2, cf. Fosteret al. 1985; see also Van Essen et al. 1984;Maunsell and Newsome 1987; Sceniak et al.2001.) Given estimates of the extent of lat-eral connections, and how rapidly those sig-nals may spread (T’so et al. 1986; Grinvaldet al. 1994; Das and Gilbert 1995; see alsoMovshon and Newsome 1996; Girard et al.

23

2001; Hupe et al. 2001 for between-area it-erations times), Figural regions as large as30◦ may still be resolved within time framesof the order of 25 msec. Nevertheless, evenwith multiple-scale computations, iteration-based models predict a trend of slower con-vergence for computations on larger regions(cf. Paradiso and Nakayama 1991.) Furtherbehavioral and physiological experiments areneeded to test this prediction, that the timerequired for border-ownership resolutionshould grow with Figure size.

4.2. Region-based processes in physiol-ogy and perception. An important prop-erty of the present model is that it is region-based. The iterative propagation of signalsgives rise to global Figure/Ground effects be-cause these signals are allowed to propagateinto homogenous (edge-free) regions in theimage (recall figure 3). Although region-based computations have not been empha-sized in experimental or theoretical studiesin vision, there is evidence for the existenceof such processes. Cells whose receptive fieldsfall within homogeneous regions can showdifferential responses if the apparent bright-ness of the region they represent is affectedby changes to the far surround (Rossi et al.1996; MacEvoy et al. 1998; Rossi and Par-adiso 1999.) Several other studies (Lamme1995; Zipser et al.1996; Lee et al.1998; Lammeet al. 1999) showed that cells within homo-geneous, or homogeneously textured regionsexhibit heightened activity if those regionsbelong to a Figure (but see Rossi et al.2001.)The increased activity had a latency of 30-40ms relative to the onset of cells’ responses.Interestingly, some of the later studies re-ported that the increased activity was greaternear the edge than away from it, towards thecenter of the Figure (Lee et al. 1998; Lammeet al. 1999; Rossi et al. 2001) – similar towhat happens in our model.

Psychophysical studies have also focusedon the role of contour-based processes for

segmentation, especially for perceptual com-pletion of occluded and illusory surfaces (Kell-man and Shipley 1991; Ringach and Shap-ley 1996; Rubin 2001b; but see Mumfordet al. 1987). Nevertheless, there are percep-tual phenomena that are more naturally ex-plained by positing region-based processes.As shown in this paper, the preference forenclosed, convex regions to be perceived asFigure is one such example. More examplescan be found in the domain of illusory con-tours. Consider panels A and B in figure11: both show a group of lines which termi-nate along a circular arc. But observers re-port quite different percepts in the two cases:the terminators in panel A induce an illu-sory contour (IC) which bounds an occlud-ing white surface, while such an IC is notreported for panel B. Models of how ICs areinduced by line terminators have emphasizedcomputations along the contour defined bythe terminators (Heitger et al. 1992). Butsuch a contour-based approach cannot ac-count for the perceptual difference in pan-els A and B, since the shape of the contouralong the lines’ terminators is identical inthe two cases (panel C). In contrast, if sig-nals propagate not only along the contour,but also into the region posited as an oc-cluder, there would be more mutual enhance-ment in A (see panel D), similarly to how apreference for convexity arises in our region-based model. Other examples for IC phe-nomena that are more naturally explainedby a region-based approach can be found inGillam (1987; figures 30.8, 30.9). Indeed,the emergence of illusory contours is inti-mately related to the border ownership prob-lem (Nakayama et al. 1995), and thereforeit is quite likely that region-based processesplay a part in IC-related computations aswell.

From a computational point of view, region-based models have several attractive features,which explains their recent popularity in com-puter vision (e.g., Zhu et al. 1995; Shi and

24

A

DC

B

Figure 11: Region-based effects in the formation of illusory contours (ICs). The terminators in panel Agive rise to an IC which bounds an occluding white surface on the left. In contrast, no IC is observed inpanel B. The difference in the perception of the two images cannot be explained by a purely contour-based approach, as the contours traced by the terminators are identical in the two cases (panel C).Positing region-based computations, on the other hand, provides a natural explanation (panel D.)

Malik 1997; Geiger et al. 1998; Sharon et al.2000). In general, in such models the outputof the computation is a collection of pixels,grouped together and labeled as a ‘region’.Such regions are naturally bounded by closedcontours, and therefore region-based modelscan be less sensitive to image noise that in-terrupts parts of a luminance edge. (Recall,for example, figure 9, which readily yielded aFigural region in our model, but would posea problem for contour based models.) Fur-thermore, the bounding contours of the re-gions produced as output are globally consis-tent by definition, thus eliminating the needto test for self-consistency of traced contoursas valid surface boundaries (Williams andHanson 1996; Williams 1997.)

4.3. Probability and neural representa-tion. A key feature of the model is the useof intermediate values to represent proba-bilities that image locations are Figure orbackground. It is therefore natural to won-der about the neural plausibility of such aprobabilistic scheme. While in the present

model we do not commit to any specific neu-ral implementation, a few possibilities canbe suggested. One, there may be a singlepopulation of neurons whose role is to sig-nal whether the location they correspond tois part of the Figure (P (k) = 1), in whichcase they fire at a maximal rate, or in thebackground (P (k) = 0), in which case theyare quiescent. In this case, intermediate F/Gvalues would mean that the neurons are fir-ing less than their maximal rate. Anotherpossibility is to have two populations of cells:at each location, there would be a corre-sponding “F neuron” which fires maximallyif the location is part of the Figure and a “Gneuron” which fires if the location is partof the background. The F and G neuronswould be mutually inhibitory since, ideally,only one of them fires, corresponding to P (k)values of 1 and 0. The less desirable casein which both neurons fire (at intermediaterates) would correspond to an intermediatevalue of P (k). This latter scheme is similarto that favored by Zhou et al. (2000) to ac-count for their results (see their figure 28).

25

Interestingly, this scheme can offer an alter-native to the “temporal binding” hypothe-sis for how to encode that different neuronsbelong to the same surface (cf. Finkel andSajda 1992; Wang and Terman 1997; Roskies1999 and references therein). Instead of us-ing correlated firing as a code, the fact thata collection of F neurons are active simul-taneously may itself signal that they belongto the same surface, even for non-contiguouspopulations. (At its simplest form, however,this mechanism only allows the signaling ofone Figural surface at any given moment.)

The probabilistic nature of the activity ofunits in the model has an appealing advan-tage: it offers a natural way to incorporateadditional cues in a distributed way. In thepresent version, the setting of the F/G as-signments by anchoring operators was unbi-ased, i.e., there were equal chances for F tobe on either side of an edge. But there maybe other sources of information in the im-age (or image statistics) about which sideis more likely to be in front. Such infor-mation is easily incorporated into the modelby biasing the probability in favor of a cer-tain side to be F. The most obvious exam-ple is stereoscopic depth, which can disam-biguate the relative depth of (all or parts of)an edge. There are also monocular (picto-rial) cues for depth stratification, such as T-junctions (Rubin 2001b) and concave cusps(Stevens and Brookes 1988), which could eas-ily be incorporated in a probabilistic scheme.Global biases about depth relationships canbe implemented also. For example: the lowerparts of images tend to be perceived as nearerthan the upper parts (presumably reflectingthe learned statistics of the ground plane.)This effect can override the preference forconvex regions to be Figure: when a regionis divided horizontally by a curved edge, thebottom part is often seen as Figure even ifit lies on the concave side of the edge. Thisbias can be implemented in the model byhaving P0(k) change as a function of the y

value (height) of the unit in the image, i.e.,by introducing priors about the probabilityof lower and upper parts in the image to beFigure. In this regard, the model presentedhere is related to “Bayesian networks” and“belief nets” which have been used for im-age analysis in several domains (Weiss 1997;Freeman and Viola 1998).

4.4. Further extensions to the model.The model presented here was kept as sim-ple as possible in order to isolate and high-light the main ideas it implements. To makeit a realistic model of how F/G segregationmay be implemented in the brain, however,it needs to me extended in many directions.Some of these extensions have already beenmentioned. Below, we revisit the most im-portant one and then list further directionsto extend the model which were not discussedbefore.

An important modification to the modelwas mentioned in section 2.2. The present,two-stage version requires a search throughthe set of candidate organizations to find thelow-entropy ones. This was done in order topresent the principles of the model in theirsimplest form, but it is not biologically plau-sible. A one-stage model which minimizes acost function that incorporates both E and Scan be achieved by introducing local interac-tions which favor consistent polarity of F/Gassignments between neighboring anchoringoperators (Pugh and Rubin, in preparation).

Another limitation of the present versionof the model is that it does not representmore than two layers of depth at a time (theFigure and the Ground), and this obviouslyneeds to be addressed in order to handle real-world occlusion situations. (This does notnecessarily imply a need to invoke numerouslayers of F/G units: an alternative approachmay be to combine the model with an atten-tional/control module, so that detailed seg-mentation is performed on only a piece ofthe image at a time. In support of this idea,there is evidence that human observers are

26

able to perform only rather crude segmen-tation on “busy” images that contain manysurfaces; cf. Gurnsey et al. 1996) A relatedproblem is that the present model does nothave a mechanism to allow the background(or, more generally, occluded surfaces) to con-tinue behind the Figure. This is necessaryfor correct recovery of the shape of surfaces,including linking disjoint region into a uni-tary surface. Such “perceptual completion”processes clearly take place in human vision(E. Rubin 1921, 1958; Nakayama et al. 1995;Driver and Baylis 1996), and neural corre-lates of it have recently been reported (Baylisand Driver 2001; Kourtzi and Kanwisher 2001;Rubin 2001a). Future extensions of the modelshould therefore incorporate completion mech-anisms.

The region-based model we presented heremakes minimal contact with contour-basedcomputations: the only way in which con-tours were used was to determine where F/Gtransitions are likely to occur (Principle i).However, in reality it is likely that there is in-tensive cross-talk between contour-based andregion-based computations. One example ofhow such cross-talk may significantly improvethe performance of the model is by mak-ing use of local contour-orientation informa-tion. In the present version, signals propa-gate isotropically in all directions (cf. fig-ure 3). Preliminary results suggest that ifstronger signals are sent in the direction or-thogonal to the local edge orientation, themutual enhancement on the convex/enclosedside of the edge is more pronounced (Pao2001). Such “directional diffusion” thereforeoffers a way to significantly enhance F/G res-olution, and possibly also accelerate conver-gence.

Finally, the present model focused entirelyon how Figure/Ground resolution may beachieved in a stimulus-driven way, relyingsolely on the geometrical properties of shapesin the image. But it is known that high-level

factors such as object knowledge and atten-tion can affect F/G resolution (Peterson andGibson 1994; Driver and Baylis 1998). Anyultimate model of segmentation would un-doubtedly need to incorporate “top-down”information alongside “bottom-up” compu-tations like those described here. In addi-tion to enabling high-level effects to emerge,feedback from higher cortical areas may alsofacilitate the computations in complex situ-ations, e.g. for very large Figures or whenthe image is very cluttered (cf. Kienker etal. 1986).

Acknowledgments: We thank Davi Geiger,Jean-Michel Hupe, J. Anthony Movshon, KenNakayama, Robert Shapley, Eitan Sharon,Shimon Ullman and Yair Weiss for helpfuldiscussions and comments on the manuscript.NR is supported by NSF grant IBN-9720305and an Alfred P. Sloan Fellowship. MP issupported by NSF grants DMS97-21430 andDMS99-71392 and an Alfred P.Sloan Fellow-ship.

References

Bakin JS, Nakayama K, Gilbert CD (2000)Visual responses in monkey areas V1 and V2to three-dimensional surface configurations.J Neurosci 20:8188-8198.

Baumann R, van der Zwan R, Peterhans E(1997) Figure-ground segregation at contours:a neural mechanism in the visual cortex ofthe alert monkey. European Journal of Neu-roscience 9:1290-1303.

Baylis GC, Driver J (2001) Shape-coding inIT cells generalizes over contrast and mirrorreversal, but not figure-ground reversal. NatNeurosci 4:937-942.

Cover TM (1991) Elements of informationtheory. New York: John Wiley & Sons, Inc.,.

27

Das A, Gilbert CD (1995) Long-range hor-izontal connections and their role in corti-cal reorganization revealed by optical record-ing of cat primary visual cortex. Nature375:780-784.

Dongarra JJ, Bunch JR, Moler CB, StewartGW (1979) LINPACK Users’ Guide. Philadel-phia: SIAM.

Driver J, Baylis GC (1996) Edge-assignmentand figure-ground segmentation in short-termvisual matching. Cognit Psychol 31:248306.

Driver J, Baylis GC (1998) Attention andvisual object segmentation. In: The Atten-tive Brain (Parasuraman R, ed), pp 299325.Cambridge, MA: Mit Press.

Elder J, Zucker SW (1993) Contour Closureand the Perception of Shape. Vision Re-search 33:981-991.

Felleman DJ, Van Essen DC (1991) Distributedhierarchical processing in the primate cere-bral cortex. Cereb Cortex 1:1-47.

Field DJ, Hayes A, Hess RF (1993) Contourintegration by the human visual system: ev-idence for a local association field. VisionResearch 33:173-193.

Finkel LH, Sajda P (1992) Object discrimi-nation based on depth-from-occlusion. Neu-ral Computation 4:901-921.

Foster KH, Gaska JP, Nagler M, Pollen DA(1985) Spatial and temporal frequency selec-tivity of neurones in visual cortical areas V1and V2 of the macaque monkey. J Physiol365:331363.

Freeman WT, Viola PA (1998) Bayesian modelof surface perception. Neural InformationProcessing Systems 10:787-793.

Geiger D, Pao K, Rubin N (1998) Salientand multiple illusory surfaces. Proc. IEEEConf. Comp. Vis. Patt. Rec. June ’98,Santa Barbara:118-124.

Gillam B (1987) Perceptual grouping andsubjective contours. In: The Perception ofIllusory Contours (Petry S, Meyer GE, eds),pp 268-273. New York: Springer-Verlag.

Girard P, Hupe JM, Bullier J (2001) Feed-forward and feedback connections betweenareas V1 and V2 of the monkey have similarrapid conduction velocities. J Neurophysiol85:1328-1331.

Graham CH (1929) Area, color and bright-ness difference in a reversible configuration.J. General Psychology 2:470-481.

Grinvald A, Lieke EE, Frostig RD, HildesheimR (1994) Cortical point-spread function andlong-range lateral interactions revealed by real-time optical imaging of macaque monkey pri-mary visual cortex. J Neurosci 14:2545-2568.

Grossberg S (1997) Cortical dynamics of three-dimensional figure-ground perception of two-dimensional pictures. Psychol Rev 104:618-658.

Gurnsey R, Poirier F, Gascon E (1996) Thereis no evidence that Kanizsa-type subjectivecontours can be detected in parallel. Percep-tion 25:861-874.

Heitger F, Rosenthaler L, von der Heydt R,Peterhans E, Kubler O (1992) Simulation ofneural contour mechanisms: from simple toend-stopped cells. Vision Res 32:963-981.

Heitger F, von der Heydt R (1993) A com-putational model of neural contour process-ing: figure-ground segregation and illusorycontours. Proc. Int. Conf. Comp. Vis.4:32-40.

Hildreth EC (1984) The measurement of vi-sual motion. Cambridge, Mass: MIT Press.

Hubel DH, Wiesel TN (1968) Receptive fieldsand functional architecture of monkey stri-ate cortex. J Physiol 195:215-243.

28

Hupe JM, James AC, Girard P, Bullier J(2001) Response modulations by static tex-ture surround in area V1 of the macaquemonkey do not depend on feedback connec-tions from V2. J Neurophysiol 85:146-163.

Iverson L, Zucker SW (1995) Logical/linearOperators for Image Curves. IEEE Trans.Pattern Analysis and Machine Intelligence17:982-996.

Kanizsa G (1979) Organization in Vision.New York: Praeger.

Kanizsa G, Gerbino W (1976) Convexity andsymmetry in figure-ground organization. In:Vision and Artifact (Henle M, ed), pp 2532.New York: Springer Co.

Kellman P, Shipley T (1991) A theory of vi-sual interpolation in object perception. Cog-nitive Psychology 23:141-221.

Kienker PK, Sejnowski TJ, Hinton GE, Schu-macher LE (1986) Separating figure fromground with a parallel network. Perception15:197-216.

Kimia B, Tannenbaum A, Zucker SW (1995)Shapes, Shocks, and Deformations. Intl J.Computer Vision 15:189-224.

Koffka K (1935) Principles of Gestalt Psy-chology. New York: Harcourt Brace & Co.

Kourtzi Z, Kanwisher N (2001) Representa-tion of perceived object shape by the humanlateral occipital complex. Science 293:1506-1509.

Lamme VA (1995) The neurophysiology offigure-ground segregation in primary visualcortex. J Neurosci 15:1605-1615.

Lamme VA, Rodriguez-Rodriguez V, Spekrei-jse H (1999) Separate processing dynamicsfor texture elements, boundaries and surfacesin primary visual cortex of the macaque mon-key. Cereb Cortex 9:406-413.

Lamme VAF, Zipser K, Spekreijse H (1997)Figure-ground signals in V1 depend on ex-trastriate feedback. Invest. Ophthal. & Vis.Sci. (Suppl.) 38:4490.

Lee TS, Mumford D, Romero R, Lamme VA(1998) The role of the primary visual cortexin higher level vision. Vision Res 38:2429-2454.

Liu Z, Jacobs DW, Basri R (1999) The role ofconvexity in perceptual completion: beyondgood continuation. Vision Res 39:4244-4257.

MacEvoy SP, Kim W, Paradiso MA (1998)Integration of surface information in primaryvisual cortex. Nat Neurosci 1:616-620.

Marr D, Poggio T (1976) Cooperative com-putation of stereo disparity. Science 194:283-287.

Maunsell JH, Newsome WT (1987) Visualprocessing in monkey extrastriate cortex.Annu Rev Neurosci 10:363-401.

Minguzzi GF (1987) Anomalous Figures andthe Tendency to Continuation. In: The Per-ception of Illusory Contours (Petry S, MeyerG, eds), pp 71-75. New York: Springer-Verlag.

Movshon JA, Newsome WT (1996) Visualresponse properties of striate cortical neu-rons projecting to area MT in macaque mon-keys. J Neurosci 16:7733-7741.

Mumford D, Kosslyn SM, Hillger LA, Her-rnstein RJ (1987) Discriminating figure fromground: the role of edge detection and regiongrowing. Proc Natl Acad Sci U S A 84:7354-7358.

Nakayama K, He ZJ, Shimojo S (1995) Vi-sual surface representation: a critical linkbetween lower-level and higher-level vision.In: Visual Cognition (Kosslyn SM, OshersonDN, eds), pp 1-70. Cambridge, MA: MITpress.

29

Nitzberg M, Mumford D, Shiota T (1993)Filtering, segmentation, and depth.Berlin/New York: Springer-Verlag.

Pao H, Geiger D, Rubin N (1999) Measur-ing convexity for Figure/Ground separation.Proc. 7th IEEE Intl. Conf. Comp. Vision:948-955.

Pao HK (2001) A Continuous Model forSalient Shape Selection and Representation.In: Courant Institute for Mathematical Sci-ences: New York Univerity.

Paradiso MA, Nakayama K (1991) Bright-ness perception and filling-in. Vision Res31:1221-1236.

Peterhans E, von der Heydt R (1989) Mech-anisms of contour perception in monkey vi-sual cortex. II. Contours bridging gaps. JNeurosci 9:1749-1763.

Peterson MA, Gibson BS (1994) Must figure-ground organization precede object recogni-tion? An assumption in peril. PsychologicalScience 5:253-259.

Reichl LE (1998) A modern course in statis-tical physics, 2nd Edition. New York: JohnWiley.

Ringach DL, Shapley R (1996) Spatial andtemporal properties of illusory contours andamodal boundary completion. Vision Re-search 36:30373050.

Roskies AL (1999) The binding problem.Neuron 24:7-9.

Rossi AF, Desimone R, Ungerleider LG (2001)Contextual modulation in primary visual cor-tex of macaques. J Neurosci 21:1698-1709.

Rossi AF, Paradiso MA(1999) Neural correlates of perceived bright-ness in the retina, lateral geniculate nucleus,and striate cortex. J Neurosci 19:6145-6156.

Rossi AF, Rittenhouse CD, Paradiso MA (1996)The representation of brightness in primaryvisual cortex. Science 273:1104-1107.

Rubin E (1921) Visuell wahrgenommene Fig-uren. Copenhagen: Gyldendals.

Rubin E (1958) Figure and Ground. In: Read-ings in Perception (Beardslee DC, WertheimerM, eds), pp 194-203. Princeton, NJ: D. VanNostrand Company, Inc.

Rubin N (2001a) Figure and ground in thebrain. Nat Neurosci 4:857-858.

Rubin N (2001b) The role of junctions in sur-face completion and contour matching. Per-ception 30:339366.

Sceniak MP, Hawken MJ, Shapley R (2001)Visual spatial characterization of macaquev1 neurons. J Neurophysiol 85:1873-1887.

Sha’ashua A, Ullman S (1988) Structuralsaliency: the detection of globally salient struc-tures using a locally connected network. Proc.of 2nd Int. Conf. Comp. Vis. Clearwater,FL.:321327.

Sharon E, Brandt A, Basri R (2000) FastMultiscale Image Segmentation. In: Confer-ence on Computer Vision and Pattern Recog-nition, pp 70-77. South Carolina: IEEE.

Shi J, Malik J (1997) Normalized cuts andimage segmentation. Proc. of IEEE Conf.on Comp. Vis. and Patt. Rec. PuertoRico:731-737.

Stevens KA, Brookes A (1988) The concavecusp as a determiner of figure-ground. Per-ception 17:35-42.

Sugita Y (1999) Grouping of image fragmentsin primary visual cortex. Nature 401:269-272.

Ts’o DY, Gilbert CD, Wiesel TN (1986) Re-lationships between horizontal interactions

30

and functional architecture in cat striate cor-tex as revealed by cross-correlation analysis.J Neurosci 6:1160-1170.

Ullman S (1976) Filling-in the gaps: the shapeof subjective contours and a model for theirgeneration. Biological Cybernetics 25:1-6.

Van Essen DC, Newsome WT, Maunsell JH(1984) The visual field representation in stri-ate cortex of the macaque monkey: asymme-tries, anisotropies, and individual variability.Vision Res 24:429-448.

von der Heydt R, Peterhans E, Baumgart-ner G (1984) Illusory contours and corticalneuron responses. Science 224:1260-1262.

Wang D, Terman D (1997) Image segmenta-tion based on oscillatory correlation. NeuralComputation 9:805-836.

Weiss Y (1997) Interpreting images by prop-agating Bayesian beliefs. In: Neural Infor-mation Processing Systems (Mozer MC, Jor-dan MI, Petsche T, eds), pp 908-915.

Williams LR (1997) Topological reconstruc-tion of a smooth manifold-solid from its oc-cluding contours. Int. J. Comp. Vis. 23:93-108.

Williams LR, Hanson AR (1996) Perceptualcompletion of occluded surfaces. ComputerVision and Image Understanding 64:1-20.

Williams LR, Jacobs DW (1997) Stochasticcompletion fields: a neural model of illusorycontour shape and salience. Neural Compu-tation 9:837-858.

Yen SC, Finkel LH (1998) Extraction of Per-ceptually Salient Contours by Striate Corti-cal Networks. Vision Research 38:719-741.

Zhou H, Friedman HS, von der Heydt R (2000)Coding of border ownership in monkey vi-sual cortex. J Neurosci 20:6594-6611.

Zhu SC, Lee TS, Yuille A (1995) Region com-petition: unifying snakes, region growing and

MDL for image segmentation. Proc. FifthInt. Conf. in Comp. Vis., :416-425.

Zipser K, Lamme VA, Schiller PH (1996)Contextual modulation in primary visual cor-tex. J Neurosci 16:7376-7389.

Appendix A. Computing P

The Algebraic approach

To find the function P which minimizes Ewe use the fact that the derivative of any well-behaved function is zero at points where thefunction is minimal (or maximal). This is truealso for functions of more than one variable,such as E(Q(k)). Therefore, we take the deriva-tives of E as a function of Q(k) for all units kand demand that these derivatives equal zero:

(A1)∂E

∂Q= 0.

The derivatives of E are given by

∂E

∂Q(k)= 4

∑

j∈Nk

µkj [Q(k) − Q(j)]

+ 2νk [Q(k) − P0(k)] .(A2)

The first term comes from differentiating thefirst term in equation (1), using the symme-try µkj = µjk. Nk denotes the set of nearestneighbors of unit k, as before. The second termunifies the derivatives of the second and thirdterms of equation (1); the coefficient νk equals1 if unit k is a border unit and ν otherwise.Referring back to figure 5, note that the coef-ficients of the first term in equation A2 corre-spond to the connection weights between unitk and its neighbors j in the bottom layer, andthe coefficient in the second term correspondsto the connection from the unit storing P0(k) inthe intermediate later to unit k. Thus, the lo-cal neighborhood of unit k in the network com-putes the value of ∂E

∂Q(k) , i.e., the effect of a small

change in Q(k) on the cost function E .Equation (A2) is linear in Q, since it is the

derivative of E which is quadratic in Q. There-fore, equation (A1) can be solved as a set of Nlinear equations, determining the minimizer P.This is the algebraic approach. In the generic

31

case, such a system of N equations with N un-knowns would take O(N 3) operations to solve.But since each unit only interacts with its near-est neighbors, this makes the linear system inequation (A2) sparse. This allows the efficientmethods developed specifically for sparse sys-tems (Dongarra et al.1979) to be applied. Thesemethods find P quickly with O(N) computa-tions.

The dynamical-systems approach

The algebraic approach is useful for findingP numerically, but it has a significant disadvan-tage in terms of physiological plausibility. Wewould not want to suppose that a neural systemuses sparse linear algebra (at least, not explic-itly). Therefore, it is important to show thatthe computation of P has an alternative formu-lation, one that lends itself readily to networkimplementation. Eq. (A2) gives the value of thegradient of E for any Q. Instead of equating thegradient to zero, as in the algebraic approach,we use it to define a dynamical system. In thiscase, Q evolves from one moment to the next insuch a way as to decrease E the fastest. Specif-ically, for each unit k

(A3) Q(k, t + ∆t) = Q(k, t) − ∆t ·∂E

∂Q(k)

with ∂E∂Q(k) given by equation (A2). Since the

network in figure 5 computes these derivatives(locally) for every k, equation A3 therefore showsthat it will converge towards the state P whichminimizes the cost function E . In theory, oneneeds to let the system evolve infinitely to ob-tain this asymptotic value. However, for a de-sired degree of precision, P will be approximatedin a finite number of time steps.

In the continuum limit (N → ∞ with a fixedimage size and ∆t → 0), eq. (A3) yields a par-tial differential equation (PDE) which is alsoused to describe heat flow in the presence ofexternal heat sources. This is useful becauseit makes it possible to draw insights about thetime evolution of the model from known prop-erties of heat flow.

Appendix B. Numerical methods and

parameters

The model was simulated with Matlab (TheMathWorks, Inc., version 5.3.1.29215a, 1999.)The input image, represented by an m × n ma-trix J with values of 0 and 1, was edge-detectedwith the Canny operator. Anchoring operatorswere then placed on the image with a specifiedhigh density, disallowing overlap of two or moreoperators. Each anchoring operators was thenchecked to see if an edge fell within it, and ifso, all units within it were labeled border units.The border units were given F/G assignmentsP0 according to the following rule: for the twoglobally consistent organizations, each unit wasgiven the value of the image J at the same loca-tion, or of its contrast-reversed image 1−J, re-spectively. For all other organizations, for eachanchoring operator the border units on one sideof the edge received a value of 1 and those onthe other side received 0, with a random deci-sion which side receives which value. Finally, allnon-border units receive a value of P0 = 0.5.

Next, we find P which minimizes the costfunction E (eq. 1). As discussed in subsection2.3.2, this corresponds to solving the linear sys-tem given by equations (A1-A2).

Treating the unknown P as a vector of lengthN(= m×n), those equations can be rewritten as

A~P = ~c, where A is a symmetric N ×N matrixand ~c is a vector of length N . The matrix A issparse, with at most five nonzero entries in eachrow. Using the notation kwest, keast, knorth andksouth for the four nearest neighbors of unit k(their indices are k − 1, k + 1, k − n and k + n,respectively; see figure 5), the k-th row in thelinear system is:

Mwest(k) [P (kwest) − P (k)]

+ Meast(k) [P (keast) − P (k)]

+ Mnorth(k) [P (knorth) − P (k)]

+ Msouth(k) [P (ksouth) − P (k)](B1)

− V (k)P (k) = −V (k)P0(k).

The diffusion matrices are defined by:

Mwest(k) = µ J(kwest)J(k)

+ µ (1 − J(kwest)(1 − J(k))(B2)

32

image size lengthscale ν extensionFig.1 200 × 200 30 pixels 0.000556 27 pixelsFig.6 70 × 210 70 pixels 0.0002 10 pixelsFig.7 67 × 331 50 pixels 0.0002 36 pixelsFig.8 210 × 210 30 pixels 0.000556 27 pixelsFig.10 180 × 220 50 pixels 0.0002 36 pixelsImages used for illustrative purposes

Fig.4 100 × 100 50 pixels 0.0004 36 pixelsFig.9 86 × 80 50 pixels 0.0004 36 pixels

Table 1:

with Meast, Mnorth and Msouth defined analo-gously. They implement the requirement thatthe value of µkj, the connection between neigh-boring units k and j, should be zero if theyare separated by an edge, and µ otherwise (firstterm in eq. 1). V (k) is 1 for border units and νfor non-border units, implementing the secondand third terms in eq. (1).

Equation (B1) holds for each unit that is noton the boundary of the network (i.e., n < k <n(m − 1) and ((k − 1) mod n) 6= 0, n − 1). Forunits on the boundary, one has to choose whatboundary conditions to apply, thus determin-ing the remaining 2n + 2(m− 2) equations. Weused Dirichlet boundary conditions: P(k) = 0.5for all k on the boundary. As a result, the Pvalues go to 0.5 as one approaches the bound-ary. To prevent the boundary conditions fromaffecting the output of the model, we solved forP on an ‘extended’ version of the image whichcontained an additional band of units aroundthe boundary, and then ‘cropped’ it back be-fore computing the Figural entropy. (To obtainthe extended version of an image we simply de-fined the target image as a cropped version of alarger image.) The width of the band was deter-mined empirically to be such that the effect ofthe boundary dissipated faster than the bandwidth. (The effect of the boundary was con-sidered eliminated if the change in the entropyproduced by extending the band further was inthe fourth significant digit or higher.) All theresults shown in paper are for the cropped im-ages. Table 1 lists the widths of the band usedfor each image.

Finally, the entropy was computed accordingto eq. (2). When confidence intervals are given,they were computed by generating ten sets ofanchoring operators, producing P0 for each ac-cording to the case at hand (globally consistentwith polarity of J or 1 − J, or globally incon-sistent), and computing P and the resulting en-tropy for each. The values given for the entropyrepresent mean ± 2std.

Free parameters. Their values were chosen asfollows:The anchoring operator were always of diam-eter 7 units, except for figures 4 and 9 wherethey were made larger (15 units) for illustrativepurposes.The density of the anchoring operators was 0.88±0.05 operators per 100 units (the variability arisesfrom the need to eliminate overlapping opera-tors.) This value holds for all the results shown,except for figures 4 and 9 where they were placedmanually.The parameter µ was set to 0.1 for all the sim-ulations shown here.The choice of the parameter ν, which controlsthe rate of decay of P (k) away from the edges, isrelated to the scale of the image (as measuredby number of units, not physical size.) Con-sider the image in figure 8. It is 210×210 pixelsand was processed by a network of that manyunits. If the same image was doubled in reso-lution and processed by a network of 420 × 420units without changing ν, the entropy value forthe best candidate organization (as well as allother organizations) would be different. Thereason is that the new Figural regions extendtwice as many units and therefore would suffer

33

comparatively more decay than before, leadingto higher Figural entropy. But this higher valueis an artifact of the different scales of the images,i.e., it is not truly representative of a strongerFigural organization in one of them. To pre-serve Figural entropy when scaling an image,ν should be scaled by (l/ln)2, where l and lnare the old and new lengthscales, respectively.(Strictly speaking, the diameter of the anchor-ing operators needs to be scaled, too, but we didnot do this as we found it had a negligible ef-fect on the results.) Generally, the appropriatelengthscale of an image is determined not by itsoverall size but rather by the size of the Figuralregions. The decay rate should be scaled withrespect to the Figure if one is to make meaning-ful comparisons between the performance of themodel on different images. We fixed the valueof ν at 0.0002 for figure 6 and then scaled it forother images by their lengthscales as given inTable 1. (As mentioned in the Discussion, ulti-mately the computations would need to be doneat multiple scales, with cross-talk between thedifferent scales to choose the best organization.For the present purpose of explaining the princi-ples of the model, however, we pre-selected therelevant scale for each image manually.)

34

globale ects in figure/ground segregation by a model with ......globale ects in figure/ground...

Documents