Computer Vision Group, University of California, Berkeley
Visual Grouping and Object Recognition
Jitendra Malik*
U.C. Berkeley
* with S. Belongie, C. Fowlkes, T. Leung, D. Martin, G. Mori, J. Puzicha, J. Shi, X. Ren
From images/video to objects
Labeled sets: tiger, grass, etc.
Consistency
[Figure: three segmentations of the same image, labeled A, B, and C]
• A, C are refinements of B
• A, C are mutual refinements
• A, B, C represent the same percept
• Attention accounts for differences
Perceptual organization forms a tree:
[Figure: segmentation tree. Root: Image. Children: BG, L-bird, R-bird. BG splits into grass, bush, far; each bird splits into head, eye, beak, body.]
Two segmentations are consistent when they can be explained by the same segmentation tree (i.e., they could be derived from a single perceptual organization).
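The refinement relation used on this slide can be checked mechanically. The sketch below is illustrative only: it assumes each segmentation is given as a flat sequence of region labels, one per pixel, and the function name `is_refinement` and the toy label maps are hypothetical, not taken from the talk.

```python
def is_refinement(fine, coarse):
    """Return True if every region of `fine` lies inside one region of `coarse`.

    Both arguments are equal-length sequences of region labels, one per pixel.
    """
    parent = {}  # maps each fine label to the coarse label it was first seen in
    for f, c in zip(fine, coarse):
        # A fine region that touches two coarse regions breaks refinement.
        if parent.setdefault(f, c) != c:
            return False
    return True


# Toy 1x6 label maps (hypothetical data).
seg_b = [0, 0, 0, 1, 1, 1]  # coarse: two regions
seg_a = [0, 0, 2, 1, 1, 1]  # splits the first region of B
seg_c = [0, 0, 0, 1, 3, 3]  # splits the second region of B
```

Here `seg_a` and `seg_c` both refine `seg_b`, while neither refines the other, which is the situation the slide attributes to differences in attention.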
Outline
• Finding boundaries
• Recognizing objects
• Recognizing actions
Finding boundaries: Is texture a problem or a solution?
[Figure panels: image; orientation energy]
Statistically optimal contour detection
• Use humans to segment a large collection of natural images.
• Train a classifier for the contour/non-contour classification using orientation energy and texture gradient as features.
Orientation Energy
• Gaussian 2nd derivative and its Hilbert pair
• $OE = (I * f_{\text{even}})^2 + (I * f_{\text{odd}})^2$
• Can detect combination of bar and edge features [Perona & Malik 90]
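A minimal sketch of orientation energy at a single orientation and scale, assuming NumPy. It is a simplification: filtering is 1D along image rows only (so it responds to vertical structure), the Hilbert pair is computed numerically with an FFT, and all function names and the default `sigma` are illustrative choices rather than the filter bank used in the talk.

```python
import numpy as np


def even_filter(sigma, radius):
    """Sampled second derivative of a Gaussian, shifted to zero mean."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    f = (x**2 / sigma**4 - 1 / sigma**2) * g
    return f - f.mean()


def hilbert_pair(f_even):
    """Discrete Hilbert transform of the even filter (gives the odd filter)."""
    freqs = np.fft.fftfreq(len(f_even))
    spectrum = np.fft.fft(f_even)
    # The Hilbert transform multiplies each frequency by -i * sign(frequency).
    return np.real(np.fft.ifft(-1j * np.sign(freqs) * spectrum))


def filter_rows(image, kernel):
    """Convolve every row with `kernel`, replicating edge pixels at the borders."""
    radius = len(kernel) // 2
    padded = np.pad(image, ((0, 0), (radius, radius)), mode="edge")
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])


def orientation_energy(image, sigma=2.0):
    """OE = (I * f_even)^2 + (I * f_odd)^2 for row-wise filters."""
    image = np.asarray(image, dtype=float)
    f_even = even_filter(sigma, int(6 * sigma))
    f_odd = hilbert_pair(f_even)
    return filter_rows(image, f_even) ** 2 + filter_rows(image, f_odd) ** 2
```

On a synthetic step edge the energy is near zero in flat regions and peaks at the step, because the odd filter responds to edges and the even filter to bars.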
Texture gradient = Chi square distance between texton histograms in half disks across edge
$$\chi^2(h_i, h_j) = \frac{1}{2} \sum_{m=1}^{K} \frac{[h_i(m) - h_j(m)]^2}{h_i(m) + h_j(m)}$$
[Figure: texton histograms labeled i, j, k, with example chi-square distances 0.1 and 0.8]
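The chi-square distance between two histograms is half the sum, over bins, of the squared difference divided by the sum of the two bin values. A minimal sketch in plain Python follows; the function name and the convention of skipping bins that are empty in both histograms are assumptions made for illustration.

```python
def chi_square(h_i, h_j):
    """Chi-square distance between two histograms with the same number of bins."""
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:  # bins empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total
```

For normalized histograms the distance ranges from 0 (identical) to 1 (no overlapping bins), so the texture gradient at a pixel is large when the two half disks contain different texton distributions.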
ROC curve for local boundary detection
Outline
• Finding boundaries
• Recognizing objects
• Recognizing actions
Biological Shape
• D’Arcy Thompson: On Growth and Form, 1917
  – studied transformations between shapes of organisms
Deformable Templates: Related Work
• Fischler & Elschlager (1973)
• Grenander et al. (1991)
• von der Malsburg (1993)
Matching Framework
• Find correspondences between points on shape
• Fast pruning
• Estimate transformation & measure similarity
[Figure: model and target shapes]
Comparing Pointsets
Shape Context
Count the number of points inside each bin, e.g.:
Count = 4
Count = 10
...
Compact representation of distribution of points relative to each point
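A minimal sketch of one shape context histogram, assuming NumPy and log-polar bins. The bin counts, the radial range, the normalization by mean pairwise distance, and the clipping of out-of-range distances into the nearest radial bin are all illustrative assumptions, not settings confirmed by the slides.

```python
import numpy as np


def shape_context(points, index, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of all other points relative to points[index]."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diffs = np.delete(points, index, axis=0) - points[index]
    dists = np.hypot(diffs[:, 0], diffs[:, 1])

    # Normalize by the mean pairwise distance so the descriptor ignores scale.
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    mean_dist = pairwise.sum() / (n * (n - 1))
    r = np.clip(dists / mean_dist, r_inner, r_outer)

    # Radial bins are uniform in log(r); angular bins are uniform in angle.
    log_edges = np.linspace(np.log(r_inner), np.log(r_outer), n_r + 1)
    r_bin = np.searchsorted(log_edges, np.log(r), side="right") - 1
    r_bin = np.clip(r_bin, 0, n_r - 1)
    theta = np.arctan2(diffs[:, 1], diffs[:, 0]) % (2 * np.pi)
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    hist = np.zeros((n_r, n_theta), dtype=int)
    np.add.at(hist, (r_bin, t_bin), 1)  # count the points falling in each bin
    return hist
```

Each of the N points on a shape gets its own histogram over the remaining N - 1 points, so a shape is summarized by N such descriptors.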
Shape Context
Comparing Shape Contexts
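Compute matching costs using the chi-squared distance between the shape context histograms $h_i$ (model point $i$) and $h_j$ (target point $j$):
$$C_{ij} = \frac{1}{2} \sum_{m=1}^{K} \frac{[h_i(m) - h_j(m)]^2}{h_i(m) + h_j(m)}$$
Recover correspondences by solving the linear assignment problem with costs $C_{ij}$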
[Jonker & Volgenant 1987]
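A minimal sketch of the correspondence step, assuming NumPy and SciPy. It builds the chi-square cost matrix between flattened, normalized shape context histograms and hands it to `scipy.optimize.linear_sum_assignment`, used here as a generic assignment solver; the helper names and the small `eps` guard against empty bins are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def chi_square_costs(model_hists, target_hists, eps=1e-12):
    """Cost matrix C[i, j] between model histogram i and target histogram j."""
    p = np.asarray(model_hists, dtype=float)
    q = np.asarray(target_hists, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)  # normalize each histogram to sum to 1
    q = q / q.sum(axis=1, keepdims=True)
    num = (p[:, None, :] - q[None, :, :]) ** 2
    den = p[:, None, :] + q[None, :, :] + eps  # eps avoids 0/0 in empty bins
    return 0.5 * (num / den).sum(axis=2)


def match_points(model_hists, target_hists):
    """Return (assignment, total_cost); assignment[i] is the target index for model i."""
    cost = chi_square_costs(model_hists, target_hists)
    rows, cols = linear_sum_assignment(cost)
    return cols, cost[rows, cols].sum()
```

The assignment is one-to-one, so every model point is paired with exactly one target point when the two point sets have the same size.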
Matching Framework
• Find correspondences between points on shape
• Fast pruning
• Estimate transformation & measure similarity
[Figure: model and target shapes]
Fast pruning
• Find best match for the shape context at only a few random points and add up cost
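$$\mathrm{dist}(S_{\text{query}}, S_i) = \sum_{u=1}^{r} \chi^2\left(SC_{\text{query}}^{u}, SC_i^{*}\right)$$
$$SC_i^{*} = \arg\min_{j} \chi^2\left(SC_{\text{query}}^{u}, SC_i^{j}\right)$$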
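A minimal sketch of the pruning step, assuming NumPy. For each of r randomly chosen query points it finds the best-matching shape context anywhere on a stored shape and sums those costs; shapes with the smallest sums form a shortlist for full matching. The function names, the `keep` parameter, and the fixed random seed are hypothetical.

```python
import numpy as np


def chi_square(p, q, eps=1e-12):
    """Chi-square distance along the last axis (supports broadcasting)."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps), axis=-1)


def pruning_distance(query_scs, shape_scs, sample_idx):
    """Sum over sampled query points of the best match cost on one stored shape."""
    total = 0.0
    for u in sample_idx:
        # Compare query descriptor u against every descriptor of the stored shape.
        total += chi_square(query_scs[u], shape_scs).min()
    return total


def shortlist(query_scs, database, r=3, keep=2, seed=0):
    """Rank stored shapes by pruning distance and keep the best few."""
    rng = np.random.default_rng(seed)
    sample_idx = rng.choice(len(query_scs), size=r, replace=False)
    dists = [pruning_distance(query_scs, s, sample_idx) for s in database]
    return np.argsort(dists)[:keep], dists
```

Because only r descriptors are compared per stored shape and no assignment problem is solved, this is far cheaper than full matching and is meant only to discard clearly dissimilar shapes.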
Matching Framework
• Find correspondences between points on shape
• Fast pruning
• Estimate transformation & measure similarity
[Figure: model and target shapes]
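Thin Plate Spline Model
• 2D counterpart to cubic spline: $f(x, y) = a_1 + a_x x + a_y y + \sum_{i=1}^{n} w_i \, U(\|(x_i, y_i) - (x, y)\|)$, with $U(r) = r^2 \log r^2$
• Minimizes bending energy: $I_f = \iint_{\mathbb{R}^2} \left[ \left(\frac{\partial^2 f}{\partial x^2}\right)^2 + 2 \left(\frac{\partial^2 f}{\partial x \, \partial y}\right)^2 + \left(\frac{\partial^2 f}{\partial y^2}\right)^2 \right] dx \, dy$
• Solve by inverting linear system
• Can be regularized when data is inexact
Duchon (1977), Meinguet (1979), Wahba (1991)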
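A minimal sketch of fitting a thin plate spline warp, assuming NumPy and the common kernel $U(r) = r^2 \log r^2$ (an assumption; other conventions differ by a constant factor). The spline is an affine part plus a weighted sum of kernels centered on the control points, and the coefficients come from one linear solve. The regularization parameter `lam` and all function names are illustrative.

```python
import numpy as np


def tps_kernel(r2):
    """U(r) = r^2 log r^2 evaluated from squared distances, with U(0) = 0."""
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out


def fit_tps(ctrl, targets, lam=0.0):
    """Solve [[K + lam*I, P], [P^T, 0]] [w; a] = [targets; 0] for w and a."""
    ctrl = np.asarray(ctrl, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(ctrl)
    r2 = ((ctrl[:, None, :] - ctrl[None, :, :]) ** 2).sum(axis=2)
    K = tps_kernel(r2) + lam * np.eye(n)   # lam > 0 relaxes exact interpolation
    P = np.hstack([np.ones((n, 1)), ctrl])  # affine basis: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.vstack([targets, np.zeros((3, targets.shape[1]))])
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]  # kernel weights w, affine coefficients a


def eval_tps(ctrl, w, a, query):
    """Apply the fitted warp to query points of shape (m, 2)."""
    ctrl = np.asarray(ctrl, dtype=float)
    query = np.asarray(query, dtype=float)
    r2 = ((query[:, None, :] - ctrl[None, :, :]) ** 2).sum(axis=2)
    return a[0] + query @ a[1:] + tps_kernel(r2) @ w
```

With `lam=0` the warp passes exactly through the target points; a purely affine mapping yields kernel weights near zero, which corresponds to zero bending energy.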
Matching Example
model target
Outlier Test Example
Object Recognition Experiments
• Handwritten digits
• COIL 3D objects (Nayar-Murase)
• Human body configurations
• Trademarks
Terms in Similarity Score
• Shape Context difference
• Local Image appearance difference
  – orientation
  – gray-level correlation in Gaussian window
  – … (many more possible)
• Bending energy
Handwritten Digit Recognition
• MNIST 60 000:
  – linear: 12.0%
  – 40 PCA + quad: 3.3%
  – 1000 RBF + linear: 3.6%
  – K-NN: 5%
  – K-NN (deskewed): 2.4%
  – K-NN (tangent dist.): 1.1%
  – SVM: 1.1%
  – LeNet 5: 0.95%
• MNIST 600 000 (distortions):
  – LeNet 5: 0.8%
  – SVM: 0.8%
  – Boosted LeNet 4: 0.7%
• MNIST 20 000:
  – K-NN, Shape Context matching: 0.63%
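The K-NN entries above classify a query by voting among its nearest stored examples under some distance. A minimal sketch in plain Python, assuming the distances from the query to every stored example (for instance, shape context matching scores) have already been computed; the function name and default `k` are hypothetical.

```python
from collections import Counter


def knn_classify(distances, labels, k=3):
    """Return the majority label among the k stored examples nearest the query."""
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    votes = Counter(labels[i] for i in nearest)
    # Ties go to the label that appears first among the nearest neighbours.
    return votes.most_common(1)[0][0]
```

Any dissimilarity can be plugged in, since the classifier only needs a ranking of stored examples and not a vector representation.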
COIL Object Database
Prototypes Selected for 2 Categories
Details in Belongie, Malik & Puzicha (NIPS 2000)
Error vs. Number of Views
Human body configurations
Deformable Matching
• Kinematic chain-based deformation model
• Use iterations of correspondence and deformation
• Keypoints on exemplars are deformed to locations on query image
Results
Trademark Similarity
Recognizing objects in scenes
Outline
• Finding boundaries
• Recognizing objects
• Recognizing actions
Examples of Actions
• Movement and posture change
  – run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …
• Object manipulation
  – pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), …
• Conversational gesture
  – point, …
• Sign Language
Key cues for action recognition
• “Morpho-kinesics” of action (shape and movement of the body)
• Identity of the object/s
• Activity context
Image/Video → Stick figure → Action
• Stick figures can be specified in a variety of ways or at various resolutions (degrees of freedom)
  – 2D joint positions
  – 3D joint positions
  – Joint angles
• Complete representation
• Evidence that it is effectively computable
Tracking by Repeated Finding
Achievable goals in 3 years
• Reasonable competence at object recognition at crude category level (~1000)
• Detection/Tracking of humans as kinematic chains, assuming adequate resolution.
• Recognition of ~10-100 actions and compositions thereof.