Visual Computing Theory and Engineering
Applications
- Marr’s Vision and Beyond
Prof. Li Song(宋利)
http://medialab.sjtu.edu.cn
Shanghai Jiao Tong University
Overview
Marr’s Vision Theory
Marr’s approach
• Distinguished different explanatory tasks at different levels
• Gave a general theoretical framework for combining them
• Applied the framework in considerable detail to a single example: the early visual system
Marr’s three levels
• 3 different types of analysis of an information-
processing system
• Computational
• Algorithmic
• Implementational
Computational analysis
• Form of task analysis of a cognitive system
▪ (a) Identify the specific information-
processing problem that the system is
configured to solve
▪ (b) Identify general constraints upon any
solution to that problem
Algorithmic analysis
• Explains how the cognitive system actually
performs the information-processing task
• Identify input information and output information
• Identify an algorithm for transforming the input into the required output
• Specify how the information is encoded
Implementational analysis
• Finds a physical realization for the algorithm
• Identify neural structures realizing the basic
representational states to which the algorithm applies
[e.g. populations of neurons]
• Identify neural mechanisms that transform those
representational states according to the algorithm
Marr’s computational analysis of visual system
• Two basic conclusions from his task analysis
• The visual system’s job is to provide a 3D
representation of the visual environment that can
serve as input to recognition and classification
processes – primarily information about shape of
objects and their spatial distribution
• This 3D representation is in an object-centered
rather than a viewer-centered frame of reference
Experimental evidence
• Possibility of double dissociations between perceptual abilities and recognition abilities
▪ Right parietal lesions: recognition abilities preserved, but problems in perceiving shapes from unusual perspectives
▪ Left parietal lesions: shape perception intact, but recognition and identification impaired
▪ Suggested to Marr that the visual system provides input to recognition systems
Theoretical considerations
• Recognition abilities are constant across changes in how things look to the perceiver due to
▪ orientation of the object
▪ its distance from the perceiver
▪ partial occlusion by other objects
• The visual system provides information to recognition systems that abstracts away from these perspectival features: an observer-independent representation
Algorithmic analysis
• Input = light arriving at retina
• Output = 3D representation of environment
• Questions:
▪ What sort of information is extracted from the light at the retina?
▪ How does the system get from this information to a 3D representation of the environment?
The challenge
• “From an information-processing point of view, our primary purpose is to define a representation of the image of reflectance changes on a surface that is suitable for detecting changes in the image’s geometrical organization that are due to changes in the reflectance of the surface itself or to changes in the surface’s orientation or distance from the viewer” (Marr, Vision p. 44)
• Need to find representational primitives that allow inference backwards from structure of image to structure of environment
Representational primitives
• Basic information at the retina = intensity value of light at each point in the retinal image
▪ Changes in intensity value provide clues as to surface boundaries
• Primitives allow structure to be imposed on patterns of intensity changes
▪ E.g. zero crossings (sudden intensity changes)
Zero crossings
• If we plot the second derivative of
intensity on a graph, then
sharp intensity changes will
be signaled by the curve
crossing zero
• Marr proposed a Laplacian
of Gaussian (LoG) filter to detect
zero crossings
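The zero-crossing idea can be sketched in one dimension. This is a minimal illustration, assuming NumPy; the helper names `gaussian_kernel` and `zero_crossings`, the value of sigma, and the synthetic step signal are choices made for the example and do not come from the slides. The signal is smoothed with a Gaussian, differentiated twice, and the sign change of the result marks the edge.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled 1D Gaussian, truncated at 3 sigma and normalized to sum to 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def zero_crossings(values, eps=1e-6):
    """Indices i where values[i] and values[i + 1] have opposite signs."""
    v = np.where(np.abs(values) < eps, 0.0, values)  # ignore numerical noise
    return [i for i in range(len(v) - 1) if v[i] * v[i + 1] < 0]

# Synthetic step edge: dark for samples 0..49, bright for samples 50..99.
intensity = np.concatenate([np.zeros(50), np.ones(50)])

kernel = gaussian_kernel(2.0)
radius = len(kernel) // 2
# Replicate the border values so the signal ends do not look like edges.
padded = np.pad(intensity, radius, mode="edge")
smoothed = np.convolve(padded, kernel, mode="valid")

# Discrete second derivative; entry i is centered on sample i + 1.
second_derivative = np.diff(smoothed, n=2)
crossings = zero_crossings(second_derivative)
print(crossings)
```

The second derivative is positive on the dark side of the step and negative on the bright side, so the only sign change sits at the step itself.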
Primal sketch
• Identifies intensity changes in the 2D image
• Gives basic information about the geometric organization of those intensity changes
• Primitives include: zero-crossings, virtual lines, groups
2.5D sketch
• Displays orientation of visible
surfaces in viewer-centered
coordinates
• Represents distance of each point
in visual field from viewer
• Also orientation of each point
and contours of discontinuities
• Very basic information about
depth
3D sketch
• Characterizes shapes and
their spatial organization
• Object-centered
• Basic volumetric and surface
primitives are schematic
(facilitates recognition)
Representation in the 3D sketch
• depends upon many shapes being recognizable as
ensembles of generalized cones
• Generalized cones are easy to represent
▪ a vector describing the path of the figure’s axis of symmetry
▪ a vector specifying the perpendicular distance from every point on the axis to the shape’s surface
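A small sketch of why generalized cones are compact to store: one list for the axis and one list of distances from the axis to the surface. The class name `GeneralizedCone`, the circular cross-section, and the `volume` helper are assumptions made for this illustration; the volume is computed as a sum of conical frustums, which is exact only for a straight axis.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GeneralizedCone:
    # Sampled points along the axis of symmetry.
    axis: List[Tuple[float, float, float]]
    # Perpendicular distance from each axis point to the surface
    # (a circular cross-section is assumed here).
    radius: List[float]

    def volume(self) -> float:
        """Sum of conical frustums between consecutive axis points."""
        total = 0.0
        for i in range(len(self.axis) - 1):
            h = math.dist(self.axis[i], self.axis[i + 1])
            r0, r1 = self.radius[i], self.radius[i + 1]
            total += math.pi * h / 3.0 * (r0 * r0 + r0 * r1 + r1 * r1)
        return total

# A cylinder and a cone differ only in their radius lists.
cylinder = GeneralizedCone(axis=[(0, 0, 0), (0, 0, 2)], radius=[1.0, 1.0])
cone = GeneralizedCone(axis=[(0, 0, 0), (0, 0, 3)], radius=[1.0, 0.0])
print(cylinder.volume(), cone.volume())
```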
20 Years After Marr
Recovering 3D shape of object by exploiting more constraints
Let us go back to Marr’s theory
The basic processing is as follows:
Modules
• Vision processing is organized into
functional modules that are almost independent.
• Thus we can focus on one specific function or
algorithm at each step
• Let us begin with image representation
Image – math viewpoint
Image – DSP viewpoint
Image – vision viewpoint
Image – storage viewpoint
• Let’s open an image file in its “raw”
format:
P6 (this is a PPM image)
Resolution: 512x512
Depth (maximum sample value): 255
(8 bits per pixel in each channel)
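As a sketch of what reading such a file involves, the snippet below parses a binary PPM header from bytes. The function name `parse_ppm_header` and the synthetic all-black 512x512 image are assumptions for illustration; the header layout (magic number, width, height, maximum value, then one whitespace byte before the pixel data, with optional `#` comments) follows the Netpbm PPM format.

```python
def parse_ppm_header(data: bytes):
    """Parse a binary PPM (P6) header.

    Returns (magic, width, height, maxval, offset), where offset is the
    index of the first pixel byte.
    """
    tokens = []
    pos = 0
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():            # skip whitespace
            pos += 1
        if data[pos:pos + 1] == b"#":                 # skip a comment line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    magic = tokens[0].decode("ascii")
    width, height, maxval = (int(t) for t in tokens[1:])
    # Exactly one whitespace byte separates maxval from the pixel data.
    return magic, width, height, maxval, pos + 1

# Synthetic stand-in for a real file: header plus all-black RGB pixels.
data = b"P6\n# synthetic example\n512 512\n255\n" + bytes(512 * 512 * 3)
magic, width, height, maxval, offset = parse_ppm_header(data)
bytes_per_sample = 1 if maxval < 256 else 2
pixel_bytes = len(data) - offset
print(magic, width, height, maxval, pixel_bytes)
```

With a maximum value of 255 each channel fits in one byte, so the pixel data occupies width x height x 3 bytes.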
Image – computing viewpoint
From Image to Representation
• Intensity is affected by: geometry, reflection, lighting, and observation (viewpoint)
• A representation consists of tokens, i.e. attribute values attached to points, lines, edges, and their combinations; the tokens correspond to real physical changes on object surfaces and can be used to infer structure
• The human eye scans an unknown object by tracking its contours, which are formed by connected edges
• If we succeed in extracting the edges of an object, recognizing it becomes much easier.
From token to sketch
• Physical assumptions
▪ Surfaces, levels, similarity, continuity
• Properties of the primal representation
▪ Consists of basic primitives that reflect local image structure
▪ Steps: zero crossings -> raw primal sketch -> full primal sketch
• Zero crossing
▪ The second derivative is zero; has a biological explanation
• Raw primal sketch
▪ Tokens: edge, blob, bar, and discontinuity
▪ Local configurations: similarity and configuration between tokens
• Full primal sketch
▪ Selection, combination, discrimination
▪ Forms a meaningful representation in a multi-scale way
Edge is the most important token
Edge’s mathematical features
Edge extraction algorithm
• Hundreds of methods
▪ 1959: Julesz, “A Method of Coding TV Signals
Based on Edge Detection” (compression, video,
television)
▪ 1963: L. G. Roberts pioneered systematic
research on edge detection
▪ …
Edge detection by filtering
Convolution(1)
Convolution(2)
Convolution(3)
Convolution(4)
Edge Filtering(1)
Edge Filtering(2)
Edge Filtering(3)
Edge Filtering(4)
Edge Filtering(5)
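Edge detection by filtering can be sketched as a 2D convolution of the image with a small derivative kernel. The example below assumes NumPy and uses the Sobel kernels as one common first-order edge filter; the slide content for these titles is not in the transcript, so the choice of Sobel, the helper name `convolve2d`, and the synthetic test image are assumptions for illustration.

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2D convolution: flip the kernel, slide it, multiply and sum."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * flipped)
    return out

# Sobel kernels: horizontal and vertical intensity derivatives.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

# Synthetic 8x8 image with a vertical step edge between columns 3 and 4.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

gx = convolve2d(image, SOBEL_X)
gy = convolve2d(image, SOBEL_Y)
magnitude = np.hypot(gx, gy)   # gradient magnitude per pixel
print(magnitude[0])
```

The gradient magnitude is nonzero only in the output columns whose 3x3 window straddles the step, and the vertical derivative `gy` is zero everywhere because all rows are identical.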
Canny detector(1)
Canny detector(2)
Canny detector(3)
Canny detector(4)
Canny detector(5)
2nd order edge filter(1)
2nd order edge filter(2)
2nd order edge filter(3)
2nd order edge filter(4)
Marr-Hildreth Edge Detector
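A compact sketch of the Marr-Hildreth idea, assuming NumPy: filter the image with a Laplacian of Gaussian (LoG) kernel and mark pixels where the response changes sign between neighbors. The function names, the kernel radius of 4 sigma, the slope threshold, and the synthetic image are assumptions for this illustration, not details taken from the slides.

```python
import numpy as np

def log_kernel(sigma):
    """Sampled Laplacian of Gaussian (up to a constant factor), zero-sum."""
    radius = int(np.ceil(4 * sigma))
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    k = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()   # force the samples to sum to zero

def filter2d(image, kernel):
    """Same-size filtering with replicated borders (the kernel is symmetric)."""
    r = kernel.shape[0] // 2
    h, w = image.shape
    padded = np.pad(image, r, mode="edge")
    out = np.zeros((h, w))
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def marr_hildreth(image, sigma=1.5, threshold=1e-3):
    """Boolean edge map from zero crossings of the LoG response."""
    response = filter2d(image.astype(float), log_kernel(sigma))
    response[np.abs(response) < 1e-12] = 0.0   # drop floating-point noise
    edges = np.zeros(image.shape, dtype=bool)
    # Sign change between horizontal neighbors.
    a, b = response[:, :-1], response[:, 1:]
    edges[:, :-1] |= (a * b < 0) & (np.abs(a - b) > threshold)
    # Sign change between vertical neighbors.
    a, b = response[:-1, :], response[1:, :]
    edges[:-1, :] |= (a * b < 0) & (np.abs(a - b) > threshold)
    return edges

# Synthetic 32x32 image with a vertical step between columns 15 and 16.
image = np.zeros((32, 32))
image[:, 16:] = 1.0
edges = marr_hildreth(image)
print(np.flatnonzero(edges[0]))
```

The threshold on the difference across the crossing is a simple stand-in for keeping only zero crossings with a steep enough slope.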
Marr’s and Canny’s
But edge extraction is hard
• A very difficult problem! In natural
images, people can easily see many
obvious edges, but algorithms fail to
extract them or return useless edges!
• There is still no general and robust edge
detection algorithm!
• Why?
▪ Many factors; one of them is multiscale…
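The effect of scale can be sketched in one dimension, assuming NumPy. The signal below is a synthetic box (two coarse edges) with a fine periodic texture added; the helper names, the 5-sigma kernel radius, and the noise threshold are assumptions for this illustration. The same zero-crossing detector is run at several smoothing scales on a periodic signal.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled 1D Gaussian, truncated at 5 sigma and normalized."""
    radius = int(np.ceil(5 * sigma))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def count_zero_crossings(signal, sigma, eps=1e-6):
    """Zero crossings of the second derivative after Gaussian smoothing."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(signal, radius, mode="wrap")       # treat signal as periodic
    smoothed = np.convolve(padded, kernel, mode="valid")
    second = np.roll(smoothed, -1) - 2 * smoothed + np.roll(smoothed, 1)
    second[np.abs(second) < eps] = 0.0                 # ignore negligible responses
    return int(np.sum(second[:-1] * second[1:] < 0))

n = np.arange(256)
box = ((n >= 64) & (n < 192)).astype(float)            # two coarse edges
texture = 0.2 * np.sin(2 * np.pi * n / 8 + 0.3)        # fine-scale detail
signal = box + texture

counts = {sigma: count_zero_crossings(signal, sigma)
          for sigma in (1.0, 2.0, 4.0, 8.0)}
print(counts)
```

At the finest scale the texture produces many zero crossings, while at the coarsest scale only the two edges of the box survive. No single scale is right for every edge, which is one reason a general edge detector is hard to build.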
Multiscale
Multiscale everywhere
Multi-scale edge
Homework
• Further Reading
▪ Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues, IEEE TPAMI 2004
▪ Contour Detection and Hierarchical Image Segmentation, IEEE TPAMI 2011
▪ DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR 2015
▪ DeepContour: A Deep Convolutional Feature Learned by Positive-sharing Loss for Contour Detection, CVPR 2015
▪ Object Contour Detection with a Fully Convolutional Encoder-Decoder Network, CVPR 2016