Is perception continuous with cognition?
The Cognitive Impenetrability of Vision
Read Seeing & Visualizing Chapter 2 or the BBS article on my web site: ruccs.rutgers.edu/faculty/pylyshyn.html
The accepted answer goes along with intellectual (and political) fashions
The zeitgeist in the second half of the 20th century was one of populist values which emphasized egalitarianism and the limitless possibility of the human mind. In keeping with this spirit, many scholars mistakenly repudiated innateness and, emphasizing plasticity, embraced learning as the defining character of human nature.
It was in this era that the belief arose that everything we perceived or thought was dependent on our cultural or linguistic context. Hence the popularity of Whorf and Sapir, of Bruner’s New Look in perception as well as deconstruction movements in European (and, of course, Californian) thought.
The cultural ethos gave strong support to the view that the mind is highly plastic and at the mercy of the environment
Through the last half of the 1900s both the general public and the social science community assumed that perception and cognition were continuous – that you could not distinguish between the two.
Bruner’s New Look in Perception: Perception modeled on Science – hypothesis formation, verification & modification
Bruner, J. S. (1957). On perceptual readiness. Psychological Review, 64, 123-152.
Effect on Philosophy of Science: Feyerabend, Hanson, Kuhn (These philosophers assumed that there is no “innocent eye” so all observations are “theory-laden”. Quine’s “The myth of the given”)
What are some reasons for thinking that vision is cognitively penetrable (and not modular)?
Expectancy and the perception of patterns
Perception in noise of words vs nonwords, sentences, statistical properties of sequences
Assimilation of perception to the norms (Postman)
Explanation in terms of framing hypotheses (Potter)
Perceptual learning (bird watchers, wine tasters…)
Apparent motion and “problem solving” in vision (Rock)
The effect of hints and foreknowledge on closure figures or stereograms
● Neuroscience evidence for top-down effects
● The experience in computer vision (Heterarchy – Shirai)
● Why should we doubt the continuity thesis?
Illusions….
Methodological concerns (signal detection theory)
Some reasons to assume that vision is cognitively penetrable
Informal
Magic tricks
The proofreader’s problem
Speech perception
Phonetic monitoring
Cross modal effect on hearing (listening while reading)
Phoneme restoration effect
Most people believe that when we read we make predictions about the next word, which is why we often substitute the wrong word for an unexpected word. It is also why the predictability index (the Cloze Score) is a useful measure of readability, widely adopted in the past by newspapers and language teachers.
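The Cloze Score mentioned above can be illustrated with a minimal sketch. It assumes the usual cloze procedure (blank out every nth word, then score the fraction of blanks a reader restores exactly); the function names, the choice of every fifth word, and the sample sentence are hypothetical and not taken from the lecture.

```python
def make_cloze(words, every_nth=5):
    """Blank out every nth word; return the blanked text and the answer key."""
    blanked = list(words)
    answers = {}
    for i in range(every_nth - 1, len(words), every_nth):
        answers[i] = words[i]
        blanked[i] = "____"
    return blanked, answers


def cloze_score(answers, guesses):
    """Fraction of blanked words that the reader restored exactly."""
    if not answers:
        return 0.0
    correct = sum(
        1 for i, word in answers.items()
        if guesses.get(i, "").lower() == word.lower()
    )
    return correct / len(answers)


# Hypothetical example: two blanks, the reader restores one of them.
words = "the cat sat on the mat and the dog sat on the rug".split()
blanked, answers = make_cloze(words)
score = cloze_score(answers, {4: "the", 9: "lay"})
```

A more predictable text yields a higher score, which is the sense in which the score serves as a predictability index.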
If you think that mental imagery involves (uses) visual processes, it is important to know which aspects, stages, or visual functions it uses. If it involves reasoning and inferences based on what it sees then it’s not an interesting thesis, since mental imagery is assumed to be a modality of thought. In that case the thesis turns out to be the claim that imagery shares cortical resources with thought, which nobody doubted. If, on the other hand, visual perception is used to interpret mental images then this is a strong thesis since it suggests that images themselves are (or can be) seen.
Is a mental image something that is seen (with the mind’s eye)?
Seeing Mental Images
• The central tenet of the Picture Theory of mental imagery is that mental images constitute the same kind of spatial representations as are found on, say, the retina or primary visual cortex.
• If, on the other hand, a mental image is an already-interpreted form of representation, this would support the thesis I am putting forward: that images are only phenomenologically picture-like while in their function (i.e., in their causal role) they are conceptual and therefore symbolic descriptions.
• This not only makes a big difference in one’s view about the mind but also has repercussions for the treatment of reports of conscious experience in scientific theories.
Is vision like science itself?
• Does vision involve hypothesis formation and testing, as was once believed to be the method used in science? Potter and Bruner’s hypothesis testing experiment
• If the answer is YES then the question whether mental imagery uses vision becomes circular (or empty) since clearly vision serves thought by providing new information.
• So the question whether there are general top-down (hypothesis-proposing or hypothesis-tendering) effects becomes central.
Familiarity and the reconstruction of partially hidden patterns
Familiarity and the reconstruction of partially hidden sounds
Signal detection theory helps to isolate stages in information processing
VERNALIT INTERVAL TRLAVNEI
Meaning and perception of phonemes
The ‘phoneme restoration effect’
1. The pitcher’s thoughts about the dangerous batt▒ made him nervous
2. The soldier’s thoughts about the dangerous batt▒ made him nervous
How to get someone to see fragmented figures?
Autostereogram or Magic Eye® figures
Apparent motion is a function of early vision, yet is subject to various “intelligent” interpretations
In the “Ternus Configuration” short time delays result in “single element motion” (the middle object persists as the “same object” so it does not appear to move)
Long time delays result in “group motion” because the middle object does not persist but is perceived as a new object each time it reappears
But long delays, when the disappearance appears to be due to occlusion by an opaque surface, maintain objecthood, and therefore behave like short delays
Apparent “problem solving” in vision
[Figure: displays at Time 1 and Time 2 for Condition A and Condition B]
Does the circle appear to move or are there two circles being covered?
Apparent “problem solving” in vision (Rock)
[Figure: displays at Time 1 and Time 2 for Condition A and Condition B]
Another example from Irvin Rock
Because the rectangle is seen as transparent, the circle is seen to move in apparent motion
Here the rectangle is seen as opaque so it ‘explains’ the circle’s visible-invisible cycle. Therefore the circle is not seen to move.
Many reasons for thinking that vision is cognitively penetrable
Expectancy and the perception of patterns
Perception in noise of words vs nonwords, sentences, statistical properties of sequences
Assimilation of perception to the norms
Explanation in terms of readiness: Seeing as…
Perceptual learning (bird watchers, wine tasters…)
The effect of hints and foreknowledge on closure figures or stereograms
► Neuroscience evidence for top-down effects
The experience in computer vision
Why should we doubt the continuity thesis?
Illusions….
Methodological concerns (signal detection theory)
Centrifugal neural pathways
There are almost as many outward (efferent) nerve fibers as inward fibers
There is evidence of top-down control of sensors and top-down effects on percepts (e.g., filling-in effect for blind spot and other scotomas)
Early attentional gating of a cat’s auditory signals
Hernández-Peón, R., Scherrer, R. H., & Jouvet, M. (1956). Modification of electrical activity in the cochlear nucleus during "attention" in unanesthetized cats. Science, 123, 331-332.
Many reasons for thinking that vision is cognitively penetrable
Expectancy and the perception of patterns
Perception in noise of words vs nonwords, sentences, statistical properties of sequences
Assimilation of perception to the norms
Explanation in terms of readiness: Seeing as…
Perceptual learning (bird watchers, wine tasters…)
The effect of hints and foreknowledge on closure figures or stereograms
● Neuroscience evidence for top-down effects
► The experience in computer vision
Why should we doubt the continuity thesis?
Illusions….
Methodological concerns (signal detection theory)
Early experience in computational perception (and AI) suggested that knowledge-based perception leads to better performance
Minsky & Papert’s “Heterarchy, not hierarchy”, or the knowledge-based approach to perception; Shirai’s success in building an edge detector and other model-based vision systems.
Riseman & Hanson (1987): “It appears that human vision is fundamentally organized to exploit the use of contextual knowledge and expectations in the organization of spatial primitives… Thus the inclusion of knowledge-driven processes at some level in the image interpretation task, where there is still a great degree of ambiguity in the organization of the visual primitives, appears inevitable (286).”
Common Blackboard Communication Area
Acoustic/optical expert
Phonetic/edge-finding expert
Phone-pattern/contour expert
Word/2D shape expert
Syntax/3D shape expert
Semantic/object-recognition expert
The Blackboard Architecture used in many AI applications is highly non-modular because all parts can communicate with one another.
e.g. ‘Hearsay’ speech recognition system
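To make the non-modularity concrete, here is a minimal sketch of a blackboard control loop. It is not the Hearsay design: the two experts, the keys on the blackboard, and the strings they post are all hypothetical, chosen only to show a higher-level hypothesis feeding back into a lower-level description.

```python
def run_blackboard(blackboard, experts):
    """Let every expert read and write the shared blackboard until none has anything to add."""
    changed = True
    while changed:
        changed = False
        for expert in experts:
            if expert(blackboard):
                changed = True
    return blackboard


def edge_expert(bb):
    # Bottom-up step: propose edges from the raw signal.
    if "signal" in bb and "edges" not in bb:
        bb["edges"] = "weak edges"
        return True
    # Top-down step: an object hypothesis posted by another expert
    # changes the edge description (this is what makes it non-modular).
    if bb.get("object") == "cube" and bb.get("edges") == "weak edges":
        bb["edges"] = "edges sharpened to fit a cube"
        return True
    return False


def object_expert(bb):
    # Propose an object hypothesis as soon as any edge description exists.
    if "edges" in bb and "object" not in bb:
        bb["object"] = "cube"
        return True
    return False


board = run_blackboard({"signal": "image"}, [edge_expert, object_expert])
```

In a modular design the edge expert would never see the `object` entry; here any expert can read anything another expert has posted.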
Pandemonium
An early architecture, similar to the blackboard architecture, was proposed by Selfridge in 1959. This idea continues to be at the heart of many psychological models, including ones implemented as neural net (or connectionist) models.
On the other hand ….
From a functional perspective it makes sense that the earliest stage of vision should be built to be fast and very often (though not necessarily always) veridical.
The parable of the blind clockmaker and Simon’s “partially decomposable systems”
The influence of David Marr’s Principle of least commitment: Do not do something that you may later wish to undo (e.g., depth first search)
The beginnings of a modular view in computer vision
David Marr (1982):
“The principle of least commitment… requires not doing something that may later have to be undone, and I believe that it applies to all situations in which performance is fluent. It states that algorithms that are constructed according to a hypothesize-and-test strategy should be avoided because there is probably a better method.”
There is a lot of prima facie evidence that vision works independently of what we believe and what we expect.
In order to explain why that seems to be so we need to distinguish between the part of vision that is unique to vision and the part that is shared by all intellectual processes. The unique part is called Early Vision.
What evidence is there that vision is a modular process?
Irvin Rock produced a lot of the evidence that is cited in support of the view that perception involves “Inference” and “Problem Solving” in order to account for the visual input, but he also says:
“The major difference between perception and thought is that perception is based on a rather narrow range of internalized knowledge, as far as inference and problem solving are concerned… Perception must rigidly adhere to the appropriate internalized rules, so that it often seems unintelligent and inflexible in its imperviousness to other forms of knowledge (p 340)”.
From: Rock, I. (1983). The Logic of Perception. Cambridge, Mass.: MIT Press.
Illusions don’t depend on what you believe!
A B C
Do hints speed up closure of fragmented figures?
According to Reynolds (1985) the only thing that makes a difference to ease of closure is knowing that there is a sensible reading of the fragmented figure. Knowing the name of the figure does not help.
Meaning and difficult percepts
What helps us see them?
Fragmented figures; autostereograms (“Magic Eye”); Random dot stereograms
Category hints?
Description?
Model of what you should see?
Knowing where to look/attend?
Knowing they are ambiguous?
Having seen them once before?
Saye, A., & Frisby, J. P. (1975). The role of monocularly conspicuous features in facilitating stereopsis from random-dot stereograms. Perception, 4(2), 159-171.
Frisby, J. P., & Clatworthy, J. L. (1975). Learning to see complex random-dot stereograms. Perception, 4(2), 173-178.
Amodal completion is automatic and non-inferential (not rational)
What would the figures look like if the black square occluders were removed? Try to see through them.
Independence of feature detection and object (pattern) perception
Evidence from brain damage: a visual agnosic with spared nonvisual pattern recognition [Humphreys & Riddoch, 1987. To see but not to see: a case study of visual agnosia]
This patient had severe agnosia and could not visually recognize familiar things (including his wife’s face) or discriminate shapes.
But he had normal eye movements and sensory abilities (including stereo and motion detection)
He could see local features and, with enough time and effort, could often infer the identity of the object (just as the New Look suggests)
He could describe and draw objects from memory and could recognize objects by touch, so his pattern memory was normal
It “supports the view that the perceptual representation used in this matching process can be ‘driven’ solely by stimulus information, so that it is unaffected by contextual knowledge.” (H&R, p. 104)
Signal Detection Paradigm

Response                 Signal present            Signal absent
“Yes I see it”           Correct detection         False positive
“No I don’t see it”      False negative (miss)     Correct rejection
If correct detection improves without increase in false positives it’s an increase in sensitivity. If it improves but so does the false positive rate it suggests an increase in bias towards acceptance.
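The distinction between sensitivity and bias can be computed from the four cells of the table. The sketch below assumes the standard equal-variance Gaussian model, in which sensitivity is d' = z(hit rate) - z(false-positive rate) and the criterion is c = -(z(hit rate) + z(false-positive rate)) / 2. The function name, the 0.5 correction for extreme rates, and the counts in the example are assumptions of this sketch, not figures from any study cited here.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_positives, correct_rejections):
    """Return (d_prime, criterion) from the four cells of the yes/no table."""
    # Assumed correction: add 0.5 to each cell so that rates of exactly
    # 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fp_rate = (false_positives + 0.5) / (false_positives + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fp_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fp_rate))  # negative = bias toward "yes"
    return d_prime, criterion


# Hypothetical counts: an unbiased observer ...
d_neutral, c_neutral = sdt_measures(80, 20, 20, 80)
# ... and one who says "yes" more often, raising hits and false positives together.
d_liberal, c_liberal = sdt_measures(90, 10, 40, 60)
```

If an expectation manipulation moves only the criterion while d' stays roughly constant, the effect is a change in response bias, not in what was detected.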
Examples from the language module
Phoneme restoration effect appears to be a response bias effect
Lexical ambiguity appears to be resolved after a period of time, before which all options are available.
Meaning and perception of phonemes
1. The pitcher’s thoughts about the dangerous batter made him nervous
2. The pitcher’s thoughts about the dangerous battle made him nervous
3. The pitcher’s thoughts about the dangerous batt▒ made him nervous
4. The soldier’s thoughts about the dangerous batter made him nervous
5. The soldier’s thoughts about the dangerous battle made him nervous
6. The soldier’s thoughts about the dangerous batt▒ made him nervous
Signal detection analysis of responses shows that the effect is connected to the response selection stage*
*Samuel, A. G. (1981). Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General, 110(4), 474-494.
Maybe cognition has a post-perceptual selection function?
Swinney study of resolution of lexical ambiguity
Context: None
Ambiguous: The man was not surprised when he found several bugs▲ in the corner of his room.
Unambiguous: The man was not surprised when he found several insects▲ in the corner of his room.
Context: Biased-insect
Ambiguous: The man was not surprised when he found several spiders, roaches, and other bugs▲ in the corner of his room.
Unambiguous: The man was not surprised when he found several spiders, roaches, and other insects▲ in the corner of his room.
Context: Biased-spying
Ambiguous: The man was not surprised when he found several microphones, recorders, cameras, and other bugs▲ in the corner of his room.
Unambiguous: The man was not surprised when he found several microphones, recorders, cameras, and other illegal devices▲ in the corner of his room.
Visual reading task at ▲: Ant (related), Spy (inappropriate), bag (unrelated)
Swinney ambiguous-word priming experiment
No context:
Rumor had it that for years the government building had been plagued with problems. The man was not surprised when he found several bugs▲ in the corner of his room.
Context biased to insects:
Rumor had it that for years the government building had been plagued with problems. The man was not surprised when he found several spiders, roaches, and other bugs▲ in the corner of his room.
Context biased to spying:
Rumor had it that for years the government building had been plagued with problems. The man was not surprised when he found several microphones, recorders, cameras, and other bugs▲ in the corner of his room.
At points marked with ▲ a word was presented visually and subjects had to decide as fast as possible whether it was a word (a lexical decision task, where half of the time it was not a word). Examples of these words are ant, spy or sew. Nonwords were formed from the same letters: tna, ysp, swe.
Swinney ambiguous priming experiment
Swinney found that both senses of the ambiguous word (e.g., “bug”) primed the decision task, so both spy and ant were primed relative to the neutral word (sew).
The priming effect for the inappropriate sense of the word disappeared after about 0.7 to 1.0 seconds. After this only words related to the appropriate sense were primed.
Swinney, D. A. (1979). Lexical access during sentence comprehension: (Re)consideration of context effects. Journal of Verbal Learning & Verbal Behavior, 18(6), 645-659.
Visual Detection of Anomalous objects
Biederman
Alteration of perception with practice: The case of expert perceivers
Visual expertise often arises from learning what to attend to (pre-visual) as well as which patterns are diagnostic (post-visual).
Shiffrar & Biederman study of expert chicken sexers
Perception & recall of board positions by chess masters
Studies of athletes’ perception
Other fluent perceptions once thought to require access to knowledge
Noninvertibility of 3D-to-2D mapping
There is an important difference between constraints on visual interpretation being built into the architecture through evolution and the use of knowledge about the likelihood of particular scene contents in the visual interpretation of a particular scene
David Marr and Natural Constraints
Natural constraints involve only optical-geometrical properties – not physical constraints or statistical properties
Every perspective projection of edges is infinitely ambiguous, yet is almost always perceived univocally
Any set of 2D edges could have arisen from an unlimited number of 3D configurations. What makes the perception unique is the ‘assumption’ that certain configurations are non-accidental, i.e., they would not change with a small change in perspective.
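The ambiguity can be checked numerically. The sketch below assumes a simple pinhole model with the eye at the origin, in which a 3D point (x, y, z) projects to (f·x/z, f·y/z); the function names and the sample coordinates are hypothetical. Every point along the same line of sight lands on the same image point, so an edge of any size at any distance along those rays yields the same 2D edge.

```python
import math


def project(point, focal_length=1.0):
    """Perspective projection of a 3D point onto the image plane (pinhole model)."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def scale_along_ray(point, k):
    """Move a point along its line of sight by scaling its distance from the eye by k."""
    return tuple(k * coordinate for coordinate in point)


# A near edge endpoint and a far one, three times as distant, on the same ray.
near = (1.0, 2.0, 4.0)
far = scale_along_ray(near, 3.0)

same_image_point = all(
    math.isclose(a, b) for a, b in zip(project(near), project(far))
)
```

Since `k` can be any positive number, and can differ for each endpoint, the inverse mapping from image to scene has no unique solution without further constraints.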
Label propagation as an illustration of a natural constraint
Four types of edge labels (convex, concave, 2 boundaries)
These are the only valid junction labels for a world of polyhedra
They are the only physically possible vertices formed by the 4 edge labels
[Figure: valid labelings for the Arrow, T, Fork and L junctions]
Three distinct types of trihedral junctions are recognized as part of the labeling scheme (referred to as Y, T and Arrow junction), making a total of 192 trihedral junction labels and 16 dihedral junction labels. But most of these label combinations cannot occur in the physical world. … The complete Waltz set includes over 50 line labels. These can generate over 300 million logically possible junctions, of which only 1790 are physically possible (see Waltz, 1975).
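One way to arrive at the totals quoted above is simple counting, assuming each line meeting at a junction can independently take any of the four edge labels. The label symbols below (`+`, `-`, and two arrow directions for boundaries) are an assumed notation for this sketch.

```python
from itertools import product

# Assumed notation: convex, concave, and the two boundary directions.
EDGE_LABELS = ["+", "-", ">", "<"]
TRIHEDRAL_TYPES = ["Y", "T", "Arrow"]


def count_labelings(num_lines):
    """Number of ways to label the lines of one junction, ignoring physical possibility."""
    return sum(1 for _ in product(EDGE_LABELS, repeat=num_lines))


# Three lines per trihedral junction, three junction types.
trihedral_total = len(TRIHEDRAL_TYPES) * count_labelings(3)
# Two lines per dihedral (L) junction.
dihedral_total = count_labelings(2)
```

These are the logically possible combinations only; the point of the passage is that very few of them survive as physically possible.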
Illustration of how to assign possible labels to a figure using the label-consistency constraint
Start with junction A, followed by B, C, and D. The Arrows placed at A limit the choices for L’s at B, which in turn limit the choices for Arrows at C. At C automatic neighbor reexamination has an effect, eliminating all but one label at B and A. Finally, the C boundary label limits the Fork choices at D to the one shown.
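The elimination procedure described above can be sketched as a small constraint filter. This is an illustration of the label-consistency idea only: the three junctions, their shared edges, and the candidate labelings are a hypothetical toy catalog, not the Huffman-Clowes or Waltz junction sets, and boundary direction is ignored.

```python
def waltz_filter(junction_edges, candidates):
    """Discard junction labelings that no neighboring junction can agree with.

    junction_edges: junction name -> tuple of the edge names meeting there
    candidates:     junction name -> set of label tuples, one label per edge,
                    in the same order as junction_edges[name]
    """
    remaining = {j: set(c) for j, c in candidates.items()}

    def supported(j, labeling):
        # Each edge label must be matched by at least one surviving
        # labeling at every other junction that shares the edge.
        for edge, label in zip(junction_edges[j], labeling):
            for k, k_edges in junction_edges.items():
                if k == j or edge not in k_edges:
                    continue
                position = k_edges.index(edge)
                if not any(other[position] == label for other in remaining[k]):
                    return False
        return True

    changed = True
    while changed:  # re-examine neighbors until nothing more is eliminated
        changed = False
        for j in junction_edges:
            keep = {lab for lab in remaining[j] if supported(j, lab)}
            if keep != remaining[j]:
                remaining[j] = keep
                changed = True
    return remaining


# Toy triangle of junctions; C has a single candidate, which propagates to A and B.
edges = {"A": ("ab", "ca"), "B": ("ab", "bc"), "C": ("bc", "ca")}
start = {
    "A": {("+", "-"), ("-", "+")},
    "B": {("+", "+"), ("-", "-")},
    "C": {("+", "-")},
}
result = waltz_filter(edges, start)
```

Here the single candidate at C removes one option at A, which in turn removes one at B, leaving a unique labeling without any search over whole-figure interpretations.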
Many natural constraints take the form of label-propagation. By increasing the number of different label types, we increase the possible constraints in interpretation. By adding shadow edges and ‘cracks’ we end up with a unique labeling of figures like this
[Figure: line drawing with edges labeled +, -, C (crack) and S (shadow)]
Waltz, D. (1975). Understanding Line Drawings of Scenes with Shadows. In P. H. Winston (Ed.), The Psychology of Computer Vision (pp. 19-91). New York: McGraw-Hill.
The label consistency requirement can also explain why some figures are ambiguous… in this case because it has two globally consistent sets of labels
[Figure: the same figure shown with two different consistent sets of + and - edge labels]
Off-retinal info different from foveal info
Labels propagate over picture
Anorthoscope: Scan view How many contours are there?
Anorthoscope: Scan view What is the shape?
Perceiving a real figure as an impossible one!
Because the visual system applies Natural Constraints blindly, it can be tricked in reverse – to see an impossible figure knowing that it is real – as this illusory figure constructed by Richard Gregory shows.
Amodal completion seems to defy rational cues; vision has a logic of its own
What would this figure look like if the squares were removed?
Is there very short iconic storage in the preconceptual stage?
● Although the idea of pictorial long-term memory is not supported, there is some provisional evidence that sensory information outlasts the duration of the stimulus. Many people have studied these “sensory buffers” including George Sperling and Michael Posner.
Sperling’s partial report method for showing an iconic memory
Posner’s demonstration of short duration shape information
[Figure: conditions labeled Fast, Fast, Slower, Slower]
Some special cases: Natural Constraints or cognitive penetration into early vision?
Solving the correspondence problem
The Constancies (Müller-Lyer, lightness demo)
Direction of light built in
Concavity-convexity of very familiar stimuli – e.g., face recognition
Illusions of size and motion
Cavanagh_SlowGPSpokeS.mov
Color Motion
Here are two sets of rotating spokes. On the left, the relative luminance of the two colours varies slowly over a range that might include equiluminance (more likely on a CRT than a LCD monitor). If it does, the coloured spokes will appear to slow down briefly as the ratio moves through equiluminance, silencing the luminance response. To help judge the slowing, light and dark spokes are shown in the centre, moving at the same rate as the colour spokes. The slowing effect should grow as you watch through several cycles. To observe the slowing, you have to fixate the centre of the bull's-eye. If you look at the spokes, you can of course track them and see their actual speed.
Which is lighter: Square A or Square B?
Apparent Motion – another Natural Constraint (which of these two matches will vision choose?)
Dawson, M., & Pylyshyn, Z. W. (1988). Natural constraints in apparent motion. In Z. W. Pylyshyn (Ed.), Computational Processes in Human Vision: An interdisciplinary perspective (pp. 99-120). Stamford, CT: Ablex Publishing.
Whether these figures are seen as dimples or mounds depends on whether they are viewed in this orientation or upside-down (relative to the head)
Mountain or crater?
The Ames distorting room
Some special cases of modules
Micro-modules in vision (color & motion & contour)
Evidence for central (cognitive) architectural modules is rarer – the best example being the Theory of Mind Module (ToMM) – Leslie, 1994.
Is there a visual-motor “module”?
Milner & Goodale; patient DF. Ventral-dorsal pathways.
Postural, grasping, eye-movement responses are not subject to illusions that influence conscious vision and sometimes vice-versa.
Micro-modules?
Color, contour, motion and shadow; other information-flow constraints
The visual system consists of minimodules that are restricted in their intercommunication
[Figure: stimulus attributes (luminance, motion, binocular disparity, color, texture, edge polarity) feeding shape and derived attributes (shading, occlusion, surfaces)]
Not all contours are alike: Equiluminous boundaries are not interpreted as shadows
Equiluminous boundaries also don’t yield a depth percept
Where does this leave us?
● There is a large part of vision, called early vision (after Marr), that is impervious to direct cognitive influences. It can be affected by cognitively-determined selection at one of two loci: prior to the vision module, where attention may select individual objects, and after the vision module, where cognition may select from among possible interpretations
● Vision seems to take into account many general world-properties (including direction of light), but it nonetheless remains “ignorant” and unable to utilize relevant knowledge about a particular scene that the organism has.
● The built-in natural constraints appear to be highly restricted and are mostly confined to geometrical-optical properties and not to physical laws.
The Pulfrich Pendulum illusion shows that a basic physical principle, such as the impenetrability of solid objects, is not part of the built-in constraints on visual interpretation. Here the visual system readily yields a physically impossible interpretation.
Leslie, A. M. (1988). The necessity of illusion: Perception and thought in infancy. In L. Weiskrantz (Ed.), Thought Without Language. Oxford: Oxford Science.
Modularity is a common principle in nature (Simon’s ‘Partially Decomposable Systems’)
The parable of the blind watchmaker (Simon)
Perception may also have submodules. Vision appears to have submodules for color, motion, contour, stereopsis… and there is even evidence for modules such as face recognition
Evidence for central (cognitive) architectural modules is rare – the best example being the Theory of Mind Module (ToMM) (Leslie, 1994).
How does this connect with classical issues in the philosophy of mind?
Early vision seems to include more than has been assumed to be part of the sensorium (sensory transduction) since it involves complex inference-like processes. However, it does not appear to permit contact with information in memory so it does not allow recognition of objects in the visual field as particular known objects.
The question of whether representations at this level are conceptual has not been raised. I believe the answer may be tied to other distinctions that we have not yet discussed – in particular to the distinction between personal and subpersonal representations, and architectural vs intentional or representation-governed processes.
The issue of accessibility to consciousness has not been raised, even though it is central to some views of sensation (or sentience). I will come back to this question later.
Humphreys and Riddoch study of a case of visual agnosia
A remarkable case of classical visual agnosia is described in a book by Glyn Humphreys and Jane Riddoch
(Humphreys & Riddoch, 1987). … the patient was unable to recognize familiar objects, including faces of people well-known to him (e.g., his wife), and found it difficult to discriminate among simple shapes, despite the fact that he did not exhibit any intellectual deficit. As is typical in visual agnosias, this patient showed no purely sensory deficits, showed normal eye movement patterns, and appeared to have close to normal stereoscopic depth and motion perception. Despite the severity of his visual impairment, the patient could do many other visual and object-recognition tasks. For example, even though he could not recognize an object in its entirety, he could recognize its features and could describe and even draw the object quite well – either when it was in view or from memory. Because he recognized the component features, he often could figure out what the object was by a process of deliberate problem-solving, much as the continuity theory claims occurs in normal perception, except that for this patient it was a painstakingly slow process. From the fact that he could describe and copy objects from memory, and could recognize objects quite well by touch, it appears that there was no deficit in his memory for shape. These deficits seem to point to a dissociation between the ability to recognize an object (from different sources of information) and the ability to compute an integrated pattern from visual inputs which can serve as the basis for recognition. 
As Humphreys & Riddoch (1987, p. 104) put it, this patient’s pattern of deficits “supports the view that ‘perceptual’ and ‘recognition’ processes are separable, because his stored knowledge required for recognition is intact” and that inasmuch as recognition involves a process of somehow matching perceptual information against stored memories, then his case also “supports the view that the perceptual representation used in this matching process can be ‘driven’ solely by stimulus information, so that it is unaffected by contextual knowledge.”
It appears that in this patient the earliest stages in perception – those involving computing contours and simple shape features – are spared. So also is the ability to look up shape information in memory in order to recognize objects. What then is damaged? It appears that an intermediate stage of “integration” of visual features fails to function as it should. The pattern of dissociation shows that the intact capacity to extract features, together with the capacity to recognize objects from shape information, is insufficient for visual recognition so long as the unique visual capacity for integration is absent. But “integration”, according to the New Look (or Helmholtzian) view of perception, comes down to no more than making inferences from the basic shape features – a capacity that appears to be spared.
Independence of detection and category selection bias
Signal Detection Theory shows that expectations based on statistical frequency-of-occurrence are response bias effects, and therefore post-perceptual
Arthur Samuel’s study of phoneme restoration
Visual expertise often arises from learning where to attend (pre-visual) and which patterns to retain because they are diagnostic (post-visual).
Shiffrar & Biederman study of expert chicken sexers
Studies of athletes’ perception (and Chess masters’ memory)