
Visual Computing Theory and Engineering Applications

- Marr’s Vision and Beyond

Prof. Li Song(宋利)

http://medialab.sjtu.edu.cn

Shanghai Jiao Tong University


Overview

Marr’s Vision Theory

Marr’s approach

• Distinguished different explanatory tasks at different levels

• Gave a general theoretical framework for combining them

• Applied the framework in considerable detail to a single example: the early visual system

Marr’s three levels

• Three different types of analysis of an information-processing system

• Computational

• Algorithmic

• Implementational

Computational analysis

• Form of task analysis of a cognitive system

▪ (a) Identify the specific information-processing problem that the system is configured to solve

▪ (b) Identify general constraints upon any solution to that problem

Algorithmic analysis

• Explains how the cognitive system actually performs the information-processing task

• Identify input information and output information

• Identify algorithm for transforming input into required output

• Specify how information is encoded

Implementational analysis

• Finds a physical realization for the algorithm

• Identify neural structures realizing the basic representational states to which the algorithm applies [e.g. populations of neurons]

• Identify neural mechanisms that transform those representational states according to the algorithm

Marr’s computational analysis of visual system

• Two basic conclusions from his task analysis

• The visual system’s job is to provide a 3D representation of the visual environment that can serve as input to recognition and classification processes, primarily information about the shape of objects and their spatial distribution

• This 3D representation is in an object-centered rather than viewer-centered frame of reference

Experimental evidence

• Possibility of double dissociations between perceptual abilities and recognition abilities

▪ Right parietal lesions - recognition abilities preserved, but problems in perceiving shapes from unusual perspectives

▪ Left parietal lesions - shape perception intact, but recognition and identification impaired

▪ Suggested to Marr that the visual system provides input to recognition systems

Theoretical considerations

• Recognition abilities are constant across changes in how things look to the perceiver due to:

▪ orientation of object

▪ its distance from perceiver

▪ partial occlusion by other objects

• The visual system provides information to recognition systems that abstracts away from these perspectival features, i.e. an observer-independent representation

Algorithmic analysis

• Input = light arriving at retina

• Output = 3D representation of environment

• Questions:

▪ What sort of information is extracted from the light at the retina?

▪ How does the system get from this information to a 3D representation of the environment?

The challenge

• “From an information-processing point of view, our primary purpose is to define a representation of the image of reflectance changes on a surface that is suitable for detecting changes in the image’s geometrical organization that are due to changes in the reflectance of the surface itself or to changes in the surface’s orientation or distance from the viewer” (Marr, Vision p. 44)

• Need to find representational primitives that allow inference backwards from structure of image to structure of environment

Representational primitives

• Basic information at retina = intensity value of light at each point in the retinal image

▪ Changes in intensity value provide clues as to surface boundaries

• Primitives allow structure to be imposed on patterns of intensity changes

▪ E.g. zero crossings (sudden intensity changes)

Zero crossings

• If we plot the second derivative of intensity on a graph, then sharp intensity changes will be signaled by the curve crossing zero

• Marr proposed a Laplacian of Gaussian filter to detect zero crossings
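The idea can be sketched in one dimension, where the Laplacian of Gaussian reduces to the second derivative of a Gaussian. This is only an illustrative sketch under stated assumptions: the function names (`log_kernel_1d`, `convolve_same`, `zero_crossings`), the kernel radius of 4 sigma, the edge-replicating border, and the `min_slope` threshold are all choices made for this example, not taken from the slides.

```python
import math

def log_kernel_1d(sigma, radius=None):
    """Sampled second derivative of a Gaussian (the 1D analogue of the LoG)."""
    if radius is None:
        radius = int(math.ceil(4 * sigma))  # assumed truncation at 4 sigma
    kernel = []
    for x in range(-radius, radius + 1):
        g = math.exp(-x * x / (2 * sigma * sigma))
        kernel.append((x * x / sigma**4 - 1 / sigma**2) * g)
    # Subtract the mean so the weights sum to (almost) zero and
    # constant regions produce a negligible response.
    mean = sum(kernel) / len(kernel)
    return [k - mean for k in kernel]

def convolve_same(signal, kernel):
    """Convolve with border replication; output length equals input length."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + r - j, 0), n - 1)  # clamp indices at the borders
            acc += k * signal[idx]
        out.append(acc)
    return out

def zero_crossings(response, min_slope=1.0):
    """Indices i where the response changes sign between i and i + 1.

    min_slope discards sign changes caused by tiny numerical ripples.
    """
    hits = []
    for i in range(len(response) - 1):
        a, b = response[i], response[i + 1]
        if a * b < 0 and abs(a - b) > min_slope:
            hits.append(i)
    return hits

# A step edge: intensity jumps from 10 to 50 between samples 19 and 20.
signal = [10.0] * 20 + [50.0] * 20
response = convolve_same(signal, log_kernel_1d(sigma=2.0))
edges = zero_crossings(response)  # expected to mark the step at index 19
```

The slope threshold matters in practice: the filtered signal also hovers around zero in flat regions, so only sign changes with a sufficiently steep slope are kept as edge candidates.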

Primal sketch

• identifies intensity changes in the 2D image

• basic information about the geometric organization of those intensity changes

• Primitives include: zero crossings, virtual lines, groups

2.5D sketch

• Displays orientation of visible surfaces in viewer-centered coordinates

• Represents distance of each point in visual field from viewer

• Also orientation of each point and contours of discontinuities

• Very basic information about depth

3D sketch

• characterizes shapes and their spatial organization

• object-centered

• basic volumetric and surface primitives are schematic (facilitates recognition)

Representation in the 3D sketch

• depends upon many shapes being recognizable as ensembles of generalized cones

• Generalized cones are easy to represent

• vector describing path of the figure’s axis of symmetry

• vector specifying perpendicular distance from every point on axis to shape’s surface
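The two vectors above map naturally onto a small data structure. The sketch below is a simplified illustration, assuming a straight axis along z and circular cross-sections; the class name `GeneralizedCone` and its fields are hypothetical and chosen only for this example.

```python
import math
from dataclasses import dataclass

@dataclass
class GeneralizedCone:
    # Sample positions along the axis of symmetry
    # (assumption: the axis is straight and runs along z).
    axis_z: list
    # Perpendicular distance from each axis sample to the surface
    # (assumption: circular cross-sections).
    radii: list

    def surface_points(self, n_angles=8):
        """Sample n_angles surface points around every axis position."""
        points = []
        for z, r in zip(self.axis_z, self.radii):
            for k in range(n_angles):
                theta = 2 * math.pi * k / n_angles
                points.append((r * math.cos(theta), r * math.sin(theta), z))
        return points

# A truncated cone: the radius shrinks from 2.0 to 1.0 along the axis.
cone = GeneralizedCone(axis_z=[0.0, 1.0, 2.0], radii=[2.0, 1.5, 1.0])
pts = cone.surface_points(n_angles=4)
```

A curved axis would additionally need a local perpendicular frame at each axis sample; that is left out here to keep the sketch short.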

20 Years After Marr

Recovering 3D shape of object by exploiting more constraints

Let us go back to Marr’s theory

The basic processing is as follows:

Modules

• Vision processing is organized into functional modules that are almost independent.

• Thus we can focus on a single function or algorithm at each step

• Let us begin with image representation

Image – math viewpoint

Image – DSP viewpoint

Image – vision viewpoint

Image – storage viewpoint

• Let’s open an image file in its “raw” format:

P6 (this is a PPM image)

Resolution: 512x512

Depth: 255 (maximum sample value; 8 bits per pixel in each channel)
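To make the header fields concrete, here is a minimal sketch of reading them from a binary PPM (P6) byte string. The function name `parse_ppm_header` is hypothetical, and the example image is a tiny 2x1 file built in memory rather than the 512x512 image from the slide.

```python
def parse_ppm_header(data: bytes):
    """Return (magic, width, height, maxval, pixel_offset) for a binary PPM."""
    tokens = []
    pos = 0
    while len(tokens) < 4:
        # Skip whitespace between header fields.
        while data[pos:pos + 1].isspace():
            pos += 1
        # A '#' starts a comment that runs to the end of the line.
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    magic = tokens[0].decode("ascii")
    width, height, maxval = (int(t) for t in tokens[1:])
    # Exactly one whitespace byte separates maxval from the pixel data.
    return magic, width, height, maxval, pos + 1

# A 2x1 image: one red pixel followed by one blue pixel.
ppm = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
magic, width, height, maxval, offset = parse_ppm_header(ppm)
pixels = ppm[offset:]
assert len(pixels) == width * height * 3  # 3 bytes per pixel when maxval <= 255
```

For the 512x512 image on the slide, the same arithmetic gives 512 * 512 * 3 = 786432 bytes of pixel data after the header.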

Image – computing viewpoint

From Image to Representation

• Intensity is affected by: geometry, reflection, lighting, and observation point

• A representation consists of tokens, that is, values of attributes such as points, lines, edges, and their combinations; these correspond to real physical changes on the surfaces of objects and can be used to infer structure

• The human eye scans an unknown object by tracking its contours, which are connected edges

• If we succeed in extracting the edges of an object, recognizing it can become much easier.

From token to sketch

• Physical assumptions:

▪ Surfaces, levels, similarity, continuity

• Characteristics of the primal sketch:

▪ Consists of primitives and reflects local image structure

▪ Steps: zero crossings -> raw primal sketch -> full primal sketch

• Zero crossings

▪ The 2nd derivative is zero; has a biological explanation

• Raw primal sketch

▪ Tokens: edge, blob, bar, and discontinuity

▪ Local configurations: similarity and configuration between tokens

• Full primal sketch

▪ Selection, combination, discrimination

▪ Forms a meaningful representation in a multi-scale way

Edge is the most important token

Edge’s mathematical features

Edge extraction algorithm

• Hundreds of methods

▪ 1959: Julesz, “A Method of Coding TV Signals Based on Edge Detection,” Compression, Video, Television.

▪ 1963: L. G. Roberts pioneered systematic research on edge detection

▪ …

Edge detection by filtering

Convolution(1)

Convolution(2)

Convolution(3)

Convolution(4)
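As a reminder of the operation these slides refer to, discrete 2D convolution can be sketched as follows. This is a plain illustrative implementation, assuming images and kernels stored as lists of rows and computing only the "valid" region where the kernel fits entirely inside the image; the function name `convolve2d_valid` is hypothetical.

```python
def convolve2d_valid(image, kernel):
    """2D convolution over the region where the kernel fully overlaps the image."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # The kernel is flipped in both axes; without the flip
                    # this would be correlation rather than convolution.
                    acc += kernel[kh - 1 - i][kw - 1 - j] * image[y + i][x + j]
            row.append(acc)
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
center = convolve2d_valid(image, identity)  # the identity kernel returns the center pixel
```

For symmetric kernels such as a Gaussian, the flip makes no difference, which is why convolution and correlation are often used interchangeably in filtering code.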

Edge Filtering(1)

Edge Filtering(2)

Edge Filtering(3)

Edge Filtering(4)

Edge Filtering(5)
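One common first-order edge filter is the Sobel operator; whether these slides use Sobel specifically is an assumption. The sketch below estimates the horizontal and vertical derivatives with the two 3x3 Sobel masks and thresholds the gradient magnitude. The helper names and the threshold of 200 are illustrative choices.

```python
import math

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]   # responds to horizontal intensity changes
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]    # responds to vertical intensity changes

def apply_mask(image, mask, y, x):
    """Weighted sum of the 3x3 neighborhood centered on (y, x)."""
    return sum(mask[i][j] * image[y - 1 + i][x - 1 + j]
               for i in range(3) for j in range(3))

def gradient_magnitude(image):
    """Gradient magnitude at interior pixels; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = apply_mask(image, SOBEL_X, y, x)
            gy = apply_mask(image, SOBEL_Y, y, x)
            mag[y][x] = math.hypot(gx, gy)
    return mag

# A vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
mag = gradient_magnitude(image)
edges = [[1 if m > 200 else 0 for m in row] for row in mag]
```

In this example the two columns adjacent to the step respond strongly, so a first-order filter by itself gives a two-pixel-wide edge; thinning steps such as non-maximum suppression address that.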

Canny detector(1)

Canny detector(2)

Canny detector(3)

Canny detector(4)

Canny detector(5)

2nd order edge filter(1)

Marr-Hildreth Edge Detector

Marr’s and Canny’s

But edge extraction is hard

• A very difficult problem! In natural images, people can easily see many obvious edges, but algorithms fail to extract them or return useless edges!

• There is still no general and robust edge detection algorithm!

• Why?

▪ Many factors; one of them is multiscale…

Multiscale

Multiscale everywhere


Multi-scale edge
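The effect of scale on edge detection can be illustrated with a 1D signal that contains both a fine detail and a large step. This is a sketch under stated assumptions: Gaussian smoothing truncated at 4 sigma, a forward-difference gradient, and a fixed threshold of 2.0; the function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Normalized Gaussian weights, truncated at 4 sigma (an assumed cutoff)."""
    radius = int(math.ceil(4 * sigma))
    weights = [math.exp(-x * x / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Gaussian smoothing with border replication."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    n = len(signal)
    return [
        sum(k * signal[min(max(i + r - j, 0), n - 1)] for j, k in enumerate(kernel))
        for i in range(n)
    ]

def detect_edges(signal, sigma, threshold=2.0):
    """Indices of local maxima of the gradient magnitude at the given scale."""
    smoothed = smooth(signal, sigma)
    grad = [abs(smoothed[i + 1] - smoothed[i]) for i in range(len(smoothed) - 1)]
    edges = []
    for i in range(1, len(grad) - 1):
        if grad[i] > threshold and grad[i] > grad[i - 1] and grad[i] >= grad[i + 1]:
            edges.append(i)
    return edges

# A large step at sample 32 plus a one-sample blip at sample 10.
signal = [0.0] * 32 + [100.0] * 32
signal[10] = 20.0

fine = detect_edges(signal, sigma=1.0)    # both flanks of the blip and the step
coarse = detect_edges(signal, sigma=4.0)  # only the step survives
```

At the fine scale the detector reports the two flanks of the blip as well as the step; at the coarse scale the blip is smoothed below the threshold and only the step remains. No single scale is right for every structure in an image, which is one reason edge detection is hard.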

Homework

• Further Reading

• Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues, IEEE TPAMI 2004

• Contour Detection and Hierarchical Image Segmentation, IEEE TPAMI 2011

• DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR 2015

• DeepContour: A Deep Convolutional Feature Learned by Positive-sharing Loss for Contour Detection, CVPR 2015

• Object Contour Detection with a Fully Convolutional Encoder-Decoder Network, CVPR 2016
