pictorial structures and distance transforms computer vision cs 543 / ece 549 university of illinois...
TRANSCRIPT
![Page 1: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/1.jpg)
Pictorial Structures and Distance Transforms
Computer VisionCS 543 / ECE 549
University of Illinois
Ian Endres
03/31/11
![Page 2: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/2.jpg)
Goal: Detect all instances of objectsCars
Faces
Cats
![Page 3: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/3.jpg)
Object model: last class• Statistical Template in Bounding Box
– Object is some (x,y,w,h) in image– Features defined wrt bounding box coordinates
Image Template Visualization
Images from Felzenszwalb
![Page 4: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/4.jpg)
Last class: sliding window detection
…
…
![Page 5: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/5.jpg)
Last class: statistical template
• Object model = log linear model of parts at fixed positions
+3 +2 -2 -1 -2.5 = -0.5
+4 +1 +0.5 +3 +0.5= 10.5
> 7.5?
> 7.5?
Non-object
Object
![Page 6: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/6.jpg)
When are statistical templates useful?
Caltech 101 Average Object Images
![Page 7: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/7.jpg)
Deformable objects
Images from Caltech-256
Slide Credit: Duan Tran
![Page 8: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/8.jpg)
Deformable objects
Images from D. Ramanan’s datasetSlide Credit: Duan Tran
![Page 9: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/9.jpg)
Object models: this class• Articulated parts model
– Object is configuration of parts– Each part is detectable
Images from Felzenszwalb
![Page 10: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/10.jpg)
Parts-based Models
Define object by collection of parts modeled by1. Appearance2. Spatial configuration
Slide credit: Rob Fergus
![Page 11: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/11.jpg)
How to model spatial relations?• Star-shaped model
Root
Part
Part
Part
Part
Part
![Page 12: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/12.jpg)
How to model spatial relations?• Star-shaped model
=X X
X
≈Root
Part
Part
Part
Part
Part
![Page 13: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/13.jpg)
How to model spatial relations?• Tree-shaped model
![Page 14: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/14.jpg)
How to model spatial relations?
Fergus et al. ’03Fei-Fei et al. ‘03
Leibe et al. ’04, ‘08Crandall et al. ‘05Fergus et al. ’05
Crandall et al. ‘05 Felzenszwalb & Huttenlocher ‘05
Bouchard & Triggs ‘05 Carneiro & Lowe ‘06Csurka ’04Vasconcelos ‘00
from [Carneiro & Lowe, ECCV’06]
O(N6) O(N2) O(N3) O(N2)
• Many others...
![Page 15: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/15.jpg)
Today’s class
1. Tree-shaped model–Example: Pictorial structures
• Felzenszwalb Huttenlocher 2005
2. Optimization with Dynamic Programming3. Distance Transforms
![Page 16: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/16.jpg)
Pictorial Structures Model
Part = oriented rectangle Spatial model = relative size/orientation
Felzenszwalb and Huttenlocher 2005
![Page 17: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/17.jpg)
Part representation• Background subtraction
![Page 18: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/18.jpg)
Pictorial Structures Model
Appearance likelihood Geometry likelihood
![Page 19: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/19.jpg)
Modeling the Appearance• Any appearance model could be used
– HOG Templates, etc.– Here: rectangles fit to background subtracted
binary map
• Train a detector for each part independently
Appearance likelihood Geometry likelihood
![Page 20: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/20.jpg)
Pictorial Structures Model
Minimize Energy (-log of the likelihood):
Appearance Cost Geometry Cost
![Page 21: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/21.jpg)
Optimization – Brute Force
H
T
F
Appearance Cost Deformation Cost
![Page 22: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/22.jpg)
Optimization
H
T
F
![Page 23: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/23.jpg)
Optimization - Dynamic Programming
H
T
F
![Page 24: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/24.jpg)
Optimization - Dynamic Programming
H
T
F
![Page 25: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/25.jpg)
Optimization - Dynamic Programming
H
T
F
![Page 26: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/26.jpg)
Optimization - Dynamic Programming
H
T
F
![Page 27: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/27.jpg)
Optimization - Complexity
H
T
F
?
For all lT?
Each part can take N locations
Overall? (for K parts)
![Page 28: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/28.jpg)
• For each pixel p, how far away is the nearest pixel q of set S– – G is often the set of edge pixels
Distance Transform
G: black pixelsp1
q1
p2
q2
![Page 29: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/29.jpg)
Computing the Distance Transform• 1-Dimension
![Page 30: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/30.jpg)
• Extending to N-Dimensions• Savings can be extended from 1-D to N-D cases if deformation cost is
separable:
Computing the Distance Transform
![Page 31: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/31.jpg)
Distance Transform - Applications• Set distances – e.g. Hausdorff Distance• Image processing – e.g. Blurring• Robotics – Motion Planning• Alignment
– Edge images– Motion tracks– Audio warping
• Deformable Part Models (of course!)
![Page 32: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/32.jpg)
Generalized Distance Transform• Original form: • General form:
– m(q): arbitrary function sampled at discrete points– For each p, find a nearby q with small m(q)
– For original DT:
– For part models:
• For some deformation costs,
![Page 33: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/33.jpg)
How do we do it?• Key idea: Construct lower “envelope” of data
• Given envelope, compute f(p) in O(N) time• Goal: Compute envelope in O(N) time
![Page 34: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/34.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
Start(1) = -InfEnd(1) = Inf
![Page 35: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/35.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
Start(1) = -InfEnd(1) = 3
Start(2) = 3End(2) = Inf
X
![Page 36: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/36.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
Start(1) = -InfEnd(1) = 3
Start(2) = 3End(2) = 4
Start(3) = 4End(3) = Inf
X
![Page 37: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/37.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
X
Start(1) = -InfEnd(1) = 3
Start(2) = 3End(2) = 4
Start(3) = 4End(3) = Inf
![Page 38: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/38.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
Start(1) = -InfEnd(1) = 3
Start(2) = 3End(2) = 4
Start(3) = []End(3) = []
Start(4) = 4End(4) = Inf
X
![Page 39: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/39.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
Start(1) = -InfEnd(1) = 3
Start(2) = 3End(2) = 4
Start(3) = []End(3) = []
Start(4) = 4End(4) = Inf
X
![Page 40: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/40.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
Start(1) = -InfEnd(1) = 3
Start(2) = 3End(2) = 4
Start(3) = []End(3) = []
Start(4) = 4End(4) = Inf
X
![Page 41: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/41.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
Start(1) = -InfEnd(1) = 3
Start(2) = []End(2) = []
Start(3) = []End(3) = []
Start(4) = 3End(4) = Inf
X
![Page 42: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/42.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
Start(1) = -InfEnd(1) = 2.6
Start(2) = []End(2) = []
Start(3) = []End(3) = []
Start(4) = 2.6End(4) = Inf
X
![Page 43: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/43.jpg)
• Key idea: Keep track of intersection points– fi(x), fj(x) only intersect once (let i<j)
Computing the Lower Envelope
Start(1) = -InfEnd(1) = 2.6
Start(2) = []End(2) = []
Start(3) = []End(3) = []
Start(4) = 2.6End(4) = 3.5
Start(5) = 3.5End(5) = Inf
X
![Page 44: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/44.jpg)
Is This O(N)?
• Yes!• N parabolas are added
– For each addition, may remove up to O(N) parabolas
• But, each parabola can only be deleted once• Thus, O(N) additions, and at most O(N)
deletions
![Page 45: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/45.jpg)
What distances can we use?
• Quadratic (with linear shift)
• Abs. diff
• Min-composition
• Bounded (Requires extra bookkeeping)
![Page 46: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/46.jpg)
Back to the Pictorial structures model
Problem: May need to infer more than one configuration
a) May be more than one object• Report scores for every root position, apply NMS
b) Optimal solution may be incorrect• Sampling
– Sample root node, then each node given parent, until all parts are sampled
![Page 47: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/47.jpg)
Sample poses from likelihood and choose best match with Chamfer distance to map
![Page 48: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/48.jpg)
Results for person matching
48
![Page 49: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/49.jpg)
Results for person matching - Mistakes• Ambiguous Background subtracted image
49
![Page 50: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/50.jpg)
Recently enhanced pictorial structures
BMVC 2009
![Page 51: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/51.jpg)
Deformable Latent Parts Model
Useful parts discovered during training
Detections
Template Visualization
Felzenszwalb et al. 2008
![Page 52: Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11](https://reader036.vdocuments.us/reader036/viewer/2022062315/5697bfd81a28abf838caeef4/html5/thumbnails/52.jpg)
Things to Remember• Rather than searching for whole object, can
locate parts that compose the object– Better encoding of spatial variation
• Models can be broken down into part appearance and spatial configuration– Wide variety of models
• Efficient optimization is often tricky, but many tricks available– Dynamic programming, Distance transforms