slides09 fop10 v1&objects - university of...

Vision III: From Early Processing to Object Perception

Chapter 10 in Chaudhuri


Overview of Topics

• Beyond the retina: 2 pathways to V1

• Subcortical structures (LGN & SC)

• Primary visual cortex: Lines, direction, colour

• M vs. P pathways: Movement vs. Particulars

• Object & Face recognition

How We See Things (short version)

• RGCs: Dot detectors

• V1: Line orientation, motion, colour

• V4: Shapes

• Temporal lobe: Objects & Faces

Fundamental Concept: Brain Organization Schemes

Dorsal vs. Ventral Streams

• Dorsal stream mainly involved in motion processing (the “where pathway”)

• Ventral mainly involved in processing identity of objects (the “what pathway”)

• Will focus on ventral here, which takes most of its input from P pathway (more later)


From Eye To Brain


Contralaterality in Vision

• One might think info from left eye goes to right brain & vice versa, but no. Instead...

• Information from left half of visual field goes (first) to right half of brain & vice versa

• That is, everything to the left of what you’re fixating goes to the right hemisphere (first), and vice versa.


Contralaterality in Vision

• Nasal halves of retinas (close to nose):

• Capture light from temporal half of visual field

• Send signals across to contralateral side of brain

• Temporal halves of retinas (close to temples):

• Capture light from nasal half of visual field

• Send signals along to ipsilateral side of brain
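The routing rule in these bullets can be sketched as a small lookup. This is only an illustration: the function names `hemisphere_for` and `field_side_seen_by` are hypothetical and not from the lecture.

```python
OPPOSITE = {"left": "right", "right": "left"}


def hemisphere_for(eye: str, retinal_half: str) -> str:
    """Hemisphere that first receives signals from one half of one retina."""
    if eye not in OPPOSITE or retinal_half not in ("nasal", "temporal"):
        raise ValueError("eye must be left/right, retinal_half nasal/temporal")
    # Nasal fibres cross to the contralateral side; temporal fibres stay ipsilateral.
    return OPPOSITE[eye] if retinal_half == "nasal" else eye


def field_side_seen_by(eye: str, retinal_half: str) -> str:
    """Side of the visual field (left/right) imaged on that half of the retina."""
    if eye not in OPPOSITE or retinal_half not in ("nasal", "temporal"):
        raise ValueError("eye must be left/right, retinal_half nasal/temporal")
    # The nasal retina captures the temporal field, which lies on the same
    # side as the eye; the temporal retina captures the nasal field.
    return eye if retinal_half == "nasal" else OPPOSITE[eye]


for eye in ("left", "right"):
    for half in ("nasal", "temporal"):
        print(eye, half, "->", field_side_seen_by(eye, half), "field ->",
              hemisphere_for(eye, half), "hemisphere")
```

Every combination ends up with the left visual field in the right hemisphere and the right visual field in the left hemisphere, which is the contralaterality rule stated on the previous slide.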

[Figure: the visual pathway from the eyes to the brain, with the Optic Tract, Optic Radiations, and Primary Visual Cortex (V1) labelled. Build steps trace two routes: Nasal Visual Field → Temporal Retina → Ipsilateral Hemisphere, and Temporal Visual Field → Nasal Retina → Contralateral Hemisphere.]

Foveal Representation

• Is the fovea split, with info from each half carried to separate hemispheres?

• No, instead it is represented in both hemispheres

• Evidence for this is seen in “foveal sparing”, i.e., continued visual function in fovea after loss of a visual hemifield due to stroke

Questions

• Light from the temporal visual field of the right eye falls on the ________ half of the retina, which sends information to the ______ side of the brain

• What are the two large streams of visual information that exit V1 and go to the rest of the brain?

Two Pathways From Eye To Cortex

• Geniculocortical pathway:

• Lateral Geniculate Nucleus (LGN) of thalamus to V1

• ≈90% of RGC outputs

• Tectopulvinar pathway:

• Superior colliculus (aka “tectum”) to Pulvinar nucleus to visual cortex (many parts)

• ≈10% of RGC outputs


The LGN

• As we’ve seen, thalamus has nuclei for early sensory processing of all sensory modalities (except smell).

• LGN is the one for vision

• A knee-shaped part of the thalamus

• Has a six-layered structure

• Does early visual processing (centre-surround RFs)

• Good example of a module that is organized in layers and columns


The LGN

• LGN has left and right halves.

• Each half receives signals from right and left eyes

• Layers 2, 3, & 5 receive input from the ipsilateral eye

• Layers 1, 4, & 6 receive input from the contralateral eye

• C I I C I C “See I? I see! I see!”

• 1+4 = 6, not true, so “contra”; 2+3 = 5, true, so “ipsi”


The LGN

• LGN layers 1 & 2 are magnocellular, with large neurones

• Part of the M-pathway, responsible for motion

• LGN layers 3-6 are parvocellular, with small neurones

• Part of the P-pathway, responsible for colour and detail
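The layer assignments from the last two slides fit in one table. The sketch below encodes them and answers questions of the form "which layers of the right LGN receive input from the left eye?". The names `LGN_LAYERS` and `layers_for` are made up for this illustration.

```python
# layer number -> (eye laterality, cell type), as listed on the slides
LGN_LAYERS = {
    1: ("contralateral", "magnocellular"),
    2: ("ipsilateral", "magnocellular"),
    3: ("ipsilateral", "parvocellular"),
    4: ("contralateral", "parvocellular"),
    5: ("ipsilateral", "parvocellular"),
    6: ("contralateral", "parvocellular"),
}


def layers_for(lgn_side: str, eye: str) -> list[int]:
    """Layers of the given LGN (left/right) that receive input from the given eye."""
    laterality = "ipsilateral" if lgn_side == eye else "contralateral"
    return sorted(n for n, (lat, _) in LGN_LAYERS.items() if lat == laterality)


def layers_of_type(cell_type: str) -> list[int]:
    """Layers belonging to the magnocellular or parvocellular channel."""
    return sorted(n for n, (_, kind) in LGN_LAYERS.items() if kind == cell_type)


print(layers_for("right", "left"))      # contralateral eye
print(layers_of_type("magnocellular"))  # M-pathway layers
```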

The LGN

• Laterality:

• Red: Receive signals from the ipsilateral eye.

• Blue: Receive signals from the contralateral eye.

• Pathways (aka Channels)

• Solid: parvocellular layers.

• Dotted: magnocellular layers

Visual Processing in LGN

• LGN neurones have centre-surround receptive fields, just like retinal ganglion cells

• However, LGN cells are more “selective”, possibly representing some signal processing to reduce noise and produce sharper tuning than RGCs.

• These “cleaned” signals are sent on to V1
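A centre-surround receptive field is often modelled as a difference of two Gaussians: a narrow excitatory centre minus a broad inhibitory surround. The lecture does not name a model, so the difference-of-Gaussians form, the function names, and the widths `sigma_c` and `sigma_s` below are all assumptions made for illustration.

```python
import math


def dog_weight(dx, dy, sigma_c=1.0, sigma_s=2.0):
    """ON-centre weight at offset (dx, dy): centre Gaussian minus surround Gaussian."""
    r2 = dx * dx + dy * dy
    centre = math.exp(-r2 / (2 * sigma_c ** 2)) / (2 * math.pi * sigma_c ** 2)
    surround = math.exp(-r2 / (2 * sigma_s ** 2)) / (2 * math.pi * sigma_s ** 2)
    return centre - surround


def rf_response(image, cx, cy, radius=6):
    """Weighted sum of image intensities around (cx, cy)."""
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = cx + dx, cy + dy
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += dog_weight(dx, dy) * image[y][x]
    return total


SIZE, MID = 13, 6
spot = [[1.0 if (x, y) == (MID, MID) else 0.0 for x in range(SIZE)] for y in range(SIZE)]
uniform = [[1.0] * SIZE for _ in range(SIZE)]
# Light everywhere except a 3x3 patch over the centre of the field
ring = [[0.0 if abs(x - MID) <= 1 and abs(y - MID) <= 1 else 1.0
         for x in range(SIZE)] for y in range(SIZE)]

spot_r = rf_response(spot, MID, MID)        # light on the centre only
uniform_r = rf_response(uniform, MID, MID)  # light everywhere
ring_r = rf_response(ring, MID, MID)        # light on the surround only
print(spot_r, uniform_r, ring_r)
```

The small spot excites the unit, uniform light nearly cancels out, and light confined to the surround drives the response negative, which is the "dot detector" behaviour described for RGCs and LGN cells.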


Visual Processing in LGN

• The LGN’s connection to V1 goes both ways, with a descending fibre tract

• This has been found to play a role in attentional modulation

• Example: After I ask “Are there any questions?”, likely some of my magno LGN cells are more active (those for upward motion)


The LGN’s Retinotopic Map

• Each LGN layer is organized according to a Retinotopic Map

• Map: Each place on the retina corresponds to a place on the LGN in a systematic fashion

• Retinotopic: Neurones that are adjacent in LGN have RFs that are adjacent on the retina

Adjacent spots in the visual field correspond to adjacent spots on the retina, which in turn correspond to adjacent spots in the LGN

How do we know this?

• Single-cell recording experiments in monkeys

• Recorded from individual neurones with a very fine electrode.

• For example, our electrode might penetrate the LGN parallel to its surface, thus staying in the same layer.

• Measuring response of LGN neurones, and moving systematically across surface of LGN, we find the receptive fields move systematically across retina.


LGN Location Columns

• Single-cell recording experiments in monkeys

• If we instead move the electrode down through the LGN layers, we find the RFs are all in the same place

• That is, the LGN is organized into location columns.

• i.e., there are columns of neurones that all process information from the same location on retina


Questions

• Which layers of the right LGN receive inputs from the left eye?

• What do we mean when we say the LGN has location columns?

• What do we mean when we say the LGN has a retinotopic layout?

Superior Colliculus

• Small branch from the optic tract goes to SC

• Has retinotopic map of contralateral visual field

• Signals go from SC to another thalamic nucleus, the pulvinar

• From there, they go to many parts of the visual cortex


Superior Colliculus

• SC receives descending signals from visual, auditory and somatosensory cortices

• SC integrates these to coordinate eye and body movements toward stimuli

• Example: You hear a loud sound or feel a tap on your shoulder and look automatically in that direction


V1

• Primary visual cortex = striate cortex = Brodmann Area 17 = Visual Receiving Area = V1

• The first cortical area for visual processing

• The best-understood part of the cortex, thanks to work by such luminaries as Hubel & Wiesel, who won a Nobel Prize for research on V1

V1

• Organizational aspects:

• 6 layered structure

• Retinotopic map

• Ocular dominance columns

• Orientation selectivity columns

• Cytochrome oxidase blobs

• Location hypercolumns

Layers of V1

Layer 1: No neurones, just fibres from neurones below

Layers 2-3: Communicate horizontally with other visual cortical areas

Layer 4: Receives inputs from LGN; subdivided into 4A, 4B, 4Cα (receives magno inputs) & 4Cβ (receives parvo inputs)

Signals are then sent up/down from here to other layers

Layers 5-6: Send descending communications back to subcortical areas (LGN and SC)

Retinotopic Layout of V1

• RFs of adjacent V1 neurones are adjacent on the retina

• But, this retinotopic mapping is distorted relative to the surface area of the retina

• Foveal Magnification: Far more V1 neurones have RFs in the fovea than in the periphery

• i.e., foveal RGCs innervate a far larger area of V1 than one would predict based on the area of the fovea

Foveal Magnification: On the Retina vs. In V1

Foveal Magnification

• Why does V1 exhibit foveal magnification?

• One, there are simply more RGCs per unit area of fovea than in the peripheral retina

• Two, each foveal RGC innervates more cortical neurones

• Presumably this allows for more complex and precise processing of visual information from fovea
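Foveal magnification is often summarised with an inverse-linear function: millimetres of cortex per degree of visual field fall off as a / (E + e2), where E is eccentricity. The lecture gives no formula, so this functional form and the constants `a` and `e2` below are illustrative assumptions, not values from the slides.

```python
import math


def magnification_mm_per_deg(eccentricity_deg, a=17.3, e2=0.75):
    """Millimetres of V1 per degree of visual field (a, e2 are illustrative)."""
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return a / (eccentricity_deg + e2)


def cortical_distance_mm(eccentricity_deg, a=17.3, e2=0.75):
    """Distance along V1 from the foveal representation.

    This is the integral of the magnification function from 0 to the
    given eccentricity: a * ln(1 + E / e2).
    """
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return a * math.log(1.0 + eccentricity_deg / e2)


# Cortex devoted to one degree at the fovea vs. one degree at 20 degrees out
central = cortical_distance_mm(1.0) - cortical_distance_mm(0.0)
peripheral = cortical_distance_mm(21.0) - cortical_distance_mm(20.0)
print(central, peripheral)
```

Under these assumed constants, the central degree of the visual field takes up many times more cortex than a degree in the periphery, which is the distortion the retinotopic map shows.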


Questions

• What is the role of the SC in vision?

• Which layer of V1 receives signals from LGN?

• What is foveal magnification? Why does it occur?

Binocularity & Ocular Dominance

• At the LGN, all neurones are monocular

• However, at V1, the majority are binocular, taking inputs from both eyes

• This is the beginning of stereoscopic depth perception


Ocular Dominance

• Most V1 neurones are binocular

• But most show some preference for one eye or the other

• The preference varies systematically across the surface of the cortex


Ocular Dominance


How Ocular Dominance Comes About


Orientation Selectivity

• Most V1 neurones have elongated receptive fields

• These are ON/OFF or OFF/ON, like RGCs

• Each neurone responds best to a line of light (ON/OFF) or dark (OFF/ON) of a given orientation


Orientation Selectivity


Orientation Selectivity

• How does orientation selectivity in V1 neurones arise?

• Hubel & Wiesel proposed the model below, where several LGN cells having RFs lined up on the retina feed into one V1 cell
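The convergence idea can be tried out numerically. In this sketch a simple cell adds up the outputs of seven ON-centre LGN units whose receptive-field centres lie on a horizontal row. The difference-of-Gaussians receptive field, the rectification, the grid size, and every name here are assumptions made for illustration, not details from the lecture.

```python
import math


def lgn_response(image, cx, cy, radius=4):
    """Rectified response of an ON-centre unit (difference of Gaussians)."""
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = cx + dx, cy + dy
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                r2 = dx * dx + dy * dy
                centre = math.exp(-r2 / 2.0) / (2 * math.pi)    # sigma = 1
                surround = math.exp(-r2 / 8.0) / (8 * math.pi)  # sigma = 2
                total += (centre - surround) * image[y][x]
    return max(total, 0.0)  # firing rates cannot go below zero


def simple_cell_response(image, centres):
    """Sum of the LGN units that converge on one V1 simple cell."""
    return sum(lgn_response(image, cx, cy) for cx, cy in centres)


def bar_image(size, orientation):
    """One-pixel-wide bright bar through the middle of a dark image."""
    mid = size // 2
    img = [[0.0] * size for _ in range(size)]
    for i in range(size):
        if orientation == "horizontal":
            img[mid][i] = 1.0
        else:
            img[i][mid] = 1.0
    return img


SIZE = 15
row_of_centres = [(x, SIZE // 2) for x in range(4, 11)]  # RFs lined up horizontally
horizontal = simple_cell_response(bar_image(SIZE, "horizontal"), row_of_centres)
vertical = simple_cell_response(bar_image(SIZE, "vertical"), row_of_centres)
print(horizontal, vertical)
```

The horizontal bar covers the centres of all seven units at once, while the vertical bar covers only one or two, so the summed response is clearly larger for the bar that matches the row's orientation.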


How V1 cells are wired to RGCs to produce oriented receptive fields

[Figure: build-up of a V1 Simple Cell receptive field. Several LGN Cells (labelled Retinal Ganglion Cells in one build step) with ON-centre (+/-) receptive fields lined up in a row feed one V1 Simple Cell, giving an elongated field (a row of + alongside rows of -). Later panels show a tilted arrangement and the OFF-centre (–/+) version, with a row of – alongside rows of +.]

Directional Motion Selectivity

• Hubel & Wiesel also found V1 cells sensitive to the direction of motion of the stimulus

• These are the first stage in our ability to process moving stimuli
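One common way to describe a direction-selective unit is delay-and-compare: the signal from one location is delayed and then multiplied with the undelayed signal from a neighbouring location, and a mirror-image arm is subtracted. The sketch below follows that textbook formulation of a Reichardt detector; it is an assumption that it matches the wiring in the lecture's schematic, and the function names are hypothetical.

```python
def delayed(signal, delay):
    """Shift a signal later in time by `delay` steps, padding with zeros."""
    if delay < 0 or delay > len(signal):
        raise ValueError("delay must be between 0 and the signal length")
    return [0.0] * delay + list(signal[:len(signal) - delay])


def reichardt(left, right, delay=1):
    """Opponent delay-and-multiply detector.

    Positive output means left-to-right motion, negative means
    right-to-left, and zero means no net direction.
    """
    left_delayed = delayed(left, delay)
    right_delayed = delayed(right, delay)
    return sum(ld * r - rd * l
               for ld, r, rd, l in zip(left_delayed, right, right_delayed, left))


# A spot passes the left input at t=1 and the right input at t=2
left_input = [0.0, 1.0, 0.0, 0.0, 0.0]
right_input = [0.0, 0.0, 1.0, 0.0, 0.0]

rightward = reichardt(left_input, right_input)
leftward = reichardt(right_input, left_input)  # same spot moving the other way
flash = reichardt(left_input, left_input)      # both inputs lit simultaneously
print(rightward, leftward, flash)
```

The delay is what makes the unit direction-selective: only when the stimulus travels in the preferred direction at roughly the right speed do the delayed and undelayed signals arrive together.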


Directional Motion Selectivity


[Figure: Schematic of a Reichardt Motion Detector. Labelled parts: Striate Cortex Simple Cell, Delaying Interneuron, Striate Cortex Motion-Sensitive Cell (Reichardt Detector).]

Questions

• What are three stimulus characteristics that V1 neurones are tuned to?

• True or false: Orientation-selective cells are all ON/OFF (i.e., excitatory centre/inhibitory surround)?

Organization of V1

• How are the various cells in V1 organized?

• Hubel & Wiesel proposed the “ice cube” model, whereby orientation and ocular dominance columns varied independently

• Also proposed the location hypercolumn, which is a set of all orientation columns and two ocular dominance columns
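As a data structure, a location hypercolumn is the cross product of two ocular dominance columns with a full set of orientation columns, all serving one retinal location. The sketch below builds one; the number of orientation steps (`n_orientations=12`) and the field names are arbitrary choices for illustration, not figures from the lecture.

```python
from itertools import product


def build_hypercolumn(location, n_orientations=12):
    """List every (eye, orientation) column serving one retinal location."""
    step = 180.0 / n_orientations  # orientation repeats every 180 degrees
    orientations = [i * step for i in range(n_orientations)]
    return [
        {"location": location, "eye": eye, "orientation_deg": angle}
        for eye, angle in product(("left", "right"), orientations)
    ]


hypercolumn = build_hypercolumn(location=(0.0, 0.0))
print(len(hypercolumn))
print(hypercolumn[0])
```

Because eye preference and orientation are crossed independently, every orientation is available for each eye at every location, which is the "ice cube" arrangement.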


Ice Cube Model of V1

[Figure: ice cube model of V1, shown with the corresponding region of the Retina.]


• While it’s accurate as far as it goes, the ice cube model is not complete

• Motion direction selectivity is not incorporated, nor are colour processing, spatial frequency, or M vs. P channels.


Cytochrome Oxidase Blobs

• Interlaced with location hypercolumns are another set of columns that show high neural activity

• Once were thought to be involved in colour processing and called “colour blobs”

• But instead they seem to integrate info from M&P cells


M vs. P Channels

• As noted earlier, two channels start in retina:

• Magno = movement

• Parvo = particulars

• These continue on through V1 to higher visual areas


M vs. P: Anatomical Separation

                M                   P                   Separation
Retina          Parasol RGCs        Midget RGCs         Complete
LGN             Layers 1-2          Layers 3-6          Complete
V1, layer 4     4Cα                 4Cβ                 Complete
V1, blobs       blob & interblob    interblob only      Partial
Extrastriate    MT                  V4                  Partial
Pathways        Dorsal (“where”)    Ventral (“what”)    Partial

M vs. P: Functional Differences


Ultimately...

• M channel projects more to MT (motion processing area) and then to the dorsal stream (= “where pathway”)

• P channel projects more to V4 (form processing area) and then to the ventral stream (= “what pathway”)

• We will now take a closer look at the latter


Questions

• What is a location hypercolumn in V1?

• The M & P pathways are completely segregated up to what point in the visual system?

The Ventral Stream

• Consists of a network of areas, mostly in the inferior temporal (IT) area, that engage in high-level vision

• IT cortex can be divided into 3 zones:

• Posterior IT (PIT): Complex form processing

• Central IT (CIT): View invariant processing

• Anterior IT (AIT): Individuation / configuration / shape-invariant processing


Complex Form Processing Tasks (PIT/CIT)

• Differentiating illumination edges from reflectance edges.

• Inverse Projection: Determining 3D shape from 2D information

• Segmentation: Differentiating objects from background and each other.

• Viewpoint invariance: Objects look different from different viewpoints

• Shape invariance: Some objects, especially living things, change shape but nonetheless are recognized as the same object.

• Completion: Objects are often partially occluded. How do we complete the view of a partially-viewed object?

[Figure: a scene with an Illumination Edge and a Reflectance Edge labelled.]

It is difficult for a computer program (but easy for us) to determine which changes in lightness in this scene are due to properties of different parts of the scene, and which are due to changes in illumination.

Light Comes From Above

One assumption the visual system makes in differentiating shadow from reflectance change is that light comes from above. This has been true over our evolutionary history.

The Light-from-above Assumption


Shadows Have Fuzzy Edges

Another assumption the visual system makes is that shadows have fuzzy edges (penumbras).


Inverse Projection Problem

An infinite number of objects can create the same image on the retina. How do we know which one is out there?
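A worked example makes the ambiguity concrete. Under a simple pinhole (perspective) model, a 3D point (X, Y, Z) lands at (f·X/Z, f·Y/Z) in the image, so every point along the same line of sight produces the same image point. The pinhole model and the name `project` are assumptions used for illustration; the eye's optics are more complicated.

```python
def project(point, focal_length=1.0):
    """Perspective-project a 3D point (x, y, z) onto a 2D image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the eye (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


near_point = (1.0, 0.5, 2.0)
# Three times farther away and three times farther off-axis
far_point = tuple(3.0 * c for c in near_point)

print(project(near_point))
print(project(far_point))
```

Both points give the same image coordinates, so the image alone cannot say which one is out there. A small nearby object and a large distant one are indistinguishable without extra assumptions, which is where heuristics come in.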

Inverse Projection Rules

• The brain uses heuristics (rules of thumb that aren’t always true) to solve the otherwise impossible inverse projection problem. E.g.,

• “A straight line in the 2D image on the retina is a straight line in 3D reality”

• “If the tips of two lines meet in 2D, assume they meet at their tips in 3D reality”


Heuristics

Both of these assumptions hold true in most cases, but not all. Both are part of a more general rule that the visual system interprets the 3D world in a “stable” way, meaning that it will not change with slight changes in POV.


Inverse Projection Fail

• http://tinyurl.com/7duz9zb

• Your brain makes assumptions about how 2D projections arise from 3D objects

• In the case of the Devil’s Triangle, they not only fail to give you the correct interpretation, they actively prevent you from getting it!


Inverse Projection Epic Fail!


• Segmentation: Which parts of a scene belong to which objects? What is object vs. background?

• Part of the solution involves gestalt rules such as smoothness heuristics, but part is simple experience.

• Here Magritte messes with our segmentation heuristic by violating both gestalt rules and our experiences.


Segmentation

• Gestalt heuristics play a role in segmentation:

• Smoothness: Take the interpretation with the least sharp turns

• Prägnanz (simplicity): Take the interpretation with the fewest objects and types of objects
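The smoothness heuristic can be made concrete by scoring each candidate interpretation by how much its contours turn and picking the lowest score. Using total turning angle as the score is an assumption made for this sketch, and the names `total_turning` and `smoothest` are hypothetical; the lecture only states the rule informally.

```python
import math


def total_turning(path):
    """Sum of absolute heading changes (radians) along a polyline."""
    total = 0.0
    for a, b, c in zip(path, path[1:], path[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        d = abs(h2 - h1)
        total += min(d, 2 * math.pi - d)  # take the smaller of the two turns
    return total


def smoothest(interpretations):
    """Name of the interpretation whose contours turn the least overall."""
    return min(interpretations,
               key=lambda name: sum(total_turning(p) for p in interpretations[name]))


# An "X" in the image can be read two ways:
x_figure = {
    "two straight lines crossing": [
        [(-1, -1), (0, 0), (1, 1)],
        [(-1, 1), (0, 0), (1, -1)],
    ],
    "two V shapes touching at their tips": [
        [(-1, -1), (0, 0), (-1, 1)],
        [(1, -1), (0, 0), (1, 1)],
    ],
}

choice = smoothest(x_figure)
print(choice)
```

The crossing-lines reading has no turns at all, while each V shape contains a sharp 90° corner, so the heuristic prefers two straight lines.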


Smoothness Heuristic Fail


Perceptual Segregation: Figure and Ground

Figure-Ground Segmentation

• Heuristics used to determine which area is figure:

• Figures are located in the lower part of scene

• Figures are symmetrical

• Figures are small, backgrounds are large

• Figures are vertical

• Elements that are “meaningful” (i.e., have been seen as figures before) are figures

Figure-Ground Segmentation in V1

• Recordings from V1 in the monkey cortex show:

• Response to area that is figure

• No response to area that is ground

• This result is important because:

• V1 neurones are early in the nervous system

• It reveals both “feedforward” and “feedback” processing in the system

How a neurone in V1 responds to stimuli presented to its receptive field (green rectangle).

(a) The neurone responded when the stimulus on the receptive field is figure.

(b) No response when the same pattern on the receptive field is not figure!


Questions

• What are some heuristics the brain uses...

• ...to distinguish luminance changes from lightness changes?

• ...to solve the inverse projection problem?

• ...to solve the segmentation problem?


Viewpoint Invariance

“Ceci ne sont pas des pipes” (“These are not pipes”)

Viewpoint Invariance

• We recognize objects from different viewpoints, even though pattern of light on retina changes.

• i.e., there is a ∞-to-one relation between light patterns and objects (and vice versa).

• Biederman’s RBC and Tarr et al.’s view-based recognition models both tried to account for this

• We will see that this is another example of thesis-antithesis-synthesis

Human Viewpoint Invariance is Imperfect

Quick: Which pairs show two views of the same object?

A B C

Two Viewpoints on Viewpoint Invariance

• Structural-description models: 3D object representations are based on combinations of 3D volumetric primitives

• Image-description models: Ability to identify 3D objects comes from sets of stored 2D images from different perspectives

Structural-Description Models

• Marr’s model proposed a sequence of events using simple geometrical features:

• Edges (detected via V1 neurones)

• View-invariant features such as parallel lines, curve polarity, angle type. (PIT, maybe?)

• Geometrical shapes (again, PIT?)

• Relations between geometric shapes (CIT?).

Structural-Description Models

• Recognition-by-components theory by Biederman (developed from Marr’s ideas)

• Volumetric primitives are called geons

• Theory proposes there are 36 geons that combine to make all 3-D objects

• Geons include cylinders, rectangular solids, pyramids, etc.

Geons & Objects


Structural-Description Models

• Properties of geons

• View-invariant: They can be recognized from almost any viewpoint (except rare “accidental” viewpoints)

• Discriminability: They can be easily distinguished from one another.

• Principle of componential recovery: the ability to recognize an object if we can identify its geons
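A structural description can be written down as a set of geons plus the relations between them, and componential recovery then amounts to checking whether the recovered parts pick out exactly one stored object. The toy catalogue below (a mug and a pail built from the same two geons in different arrangements) and all names in it are illustrative assumptions, not material from the lecture.

```python
# Each object is a set of (geon, relation-to-body) pairs. Toy catalogue only.
OBJECTS = {
    "mug": frozenset({("cylinder", "body"), ("curved_tube", "attached_to_side")}),
    "pail": frozenset({("cylinder", "body"), ("curved_tube", "attached_to_top")}),
}


def recognise(recovered_parts):
    """Return the object whose full description was recovered, else None."""
    recovered = set(recovered_parts)
    matches = [name for name, parts in OBJECTS.items() if parts <= recovered]
    return matches[0] if len(matches) == 1 else None


full_view = [("cylinder", "body"), ("curved_tube", "attached_to_side")]
occluded_view = [("cylinder", "body")]  # the handle's geon cannot be extracted

print(recognise(full_view))
print(recognise(occluded_view))
```

The same two geons give different objects depending on how they are arranged, and when occlusion hides the features needed to extract one geon, recognition fails.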

It is difficult to identify the object behind the mask because the corners and curves that allow extraction of geons have been obscured.


Now that it is possible to identify geons, the object can be identified


Image-Description Models

• In contrast to structural description models, image-description models claim that:

• Ability to identify 3-D objects comes from stored 2-D viewpoints from different perspectives.

• Evidence for this comes from novel object studies:

• For a familiar object, view invariance occurs

• For a novel object, view invariance does not occur

• Shows that an observer must have the different viewpoints encoded before recognition can occur from all viewpoints
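The view-based account can be sketched as matching a test view against the nearest stored view, with match strength dropping as the angle between them grows. The linear falloff, the `tuning_width` parameter, and the function names are assumptions made for illustration, not a published model.

```python
def angular_distance(a_deg, b_deg):
    """Smallest angle between two viewing directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def match_strength(test_view_deg, stored_views_deg, tuning_width=30.0):
    """1.0 for a stored view, falling linearly to 0.0 at `tuning_width` degrees."""
    nearest = min(angular_distance(test_view_deg, v) for v in stored_views_deg)
    return max(0.0, 1.0 - nearest / tuning_width)


novel_object = [0.0]                                     # seen from one view only
familiar_object = [float(v) for v in range(0, 360, 30)]  # seen from all around

for view in (0.0, 15.0, 90.0):
    print(view, match_strength(view, novel_object), match_strength(view, familiar_object))
```

With a single stored view, recognition is good only near the trained viewpoint, as in the monkey data on the next slide. With many stored views, every test view is close to some stored one and performance looks view-invariant.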

Psychophysical curve showing that a monkey is better at identifying the view of the object that was presented during training (arrow). No view invariance.


Synthesis

• Tjan & Legge (1998):

• View-invariant performance found for simple objects (e.g., geometrical shapes)

• But not for complex objects (e.g., amoeboids, bent paper-clip objects, etc.)

• Complexity defined quantitatively via an ideal observer algorithm

• Recent models incorporate both image-description and structural description aspects.


Completion

• Completion of partially-viewed objects is based on gestalt heuristics such as smoothness and Prägnanz

• But, as with all heuristics, these sometimes fail.

• Sometimes we complete things that aren’t there...

• Experience obviously plays an important role


You Complete Me


Shape Invariance

We have little trouble recognizing this bird as such, despite its many changes in shape


Shape Invariance

A similar problem arises with facial expression


Questions

• Which theory best explains our ability to recognize objects from many views, structural description models or image-description models?

• What does completion refer to in vision?

Face Perception

• Face processing is a highly complex visual task

• Faces are quite uniform, but we individuate them with ease

• Facial expressions are subtle variations in face shape, but we decode them with ease

• Thought to be subserved by a network of brain areas, some of which are in AIT


“Your face is the same as everybody has – the two eyes... nose in the middle, mouth under. It's always the same. Now if you had the two eyes on the same side of the nose, for instance – or the mouth at the top – that would be some help.”

- Humpty Dumpty, Through the Looking Glass

My Clones?

Prosopagnosia

• An inability to recognize faces

• Often arises after damage to the AIT, specifically the fusiform face area (FFA)

• Specific to faces, recognition of other objects is unimpaired


Face-Selective Neurones

In areas of monkey cortex homologous to FFA, we find cells that respond specifically to faces


Perceptual Differences Between Faces & Objects

• A number of phenomena suggest that faces are processed in a qualitatively different way than other objects

• Inversion has a disproportionate effect on face recognition

• Inversion seems to disrupt configural processing in faces but not objects

• Composite effects exist for faces but not objects


Face Inversion

A little harder

A lot harder

Inversion Disrupts Configural Processing

Schwaninger, A., Carbon, C.C., & Leder, H. (2003). Expert face processing: Specialization and constraints. In G. Schwarzer & H. Leder (Eds.), Development of Face Processing, pp. 81-97. Göttingen: Hogrefe.

Composite Face Effect

Questions

• What is prosopagnosia?

• When does it occur?

• What is the composite face effect?