Segmentation
Slides Credit: Jim Rehg, G.Tech.
Christopher Rasmussen, UD
John Spletzer, Lehigh
Also, slides adapted from material provided by David Forsyth and Trevor Darrell
Segmentation and Grouping
• Obtain a compact representation from an image/motion sequence/set of tokens
• Should support application
• Broad theory is absent at present
• Grouping (or clustering): collect together tokens that “belong together”
• Fitting: associate a model with tokens
• Issues:
– Which model?
– Which token goes to which element?
– How many elements in the model?
What is Segmentation?
• Clustering image elements that “belong together”
– Partitioning
• Divide into regions/sequences with coherent internal properties (k-means)
– Grouping
• Identify sets of coherent tokens in the image (model fitting, Hough)
• Tokens: whatever we need to group
– Pixels
– Features (corners, lines, etc.)
– Larger regions (e.g., arms, legs, torso)
– Discrete objects (e.g., people in a crowd)
– Etc.
General ideas
• Tokens: whatever we need to group (pixels, points, surface elements, etc.)
• Top-down segmentation: tokens belong together because they lie on the same object
• Bottom-up segmentation: tokens belong together because they are locally coherent
• These two are not mutually exclusive
Basic ideas of grouping in humans
• Gestalt properties: elements in a collection can have properties that result from their relationships (e.g., the Müller-Lyer illusion)
– Gestaltqualität
• A series of factors affects whether elements should be grouped together
– Gestalt factors
Gestalt Theory of Grouping
• Psychological basis for why/how things are grouped bottom-up and top-down
• Figure-ground discrimination– Grouping can be seen in terms of allocating tokens to figure or ground
• Factors affecting token coherence
– Proximity
– Similarity: based on color, texture, orientation (aka parallelism), etc.
– Common fate: parallel motion
– Common region: tokens that lie inside the same closed region tend to be grouped together
– Closure: tokens or curves that tend to lead to closed curves tend to be grouped together
– Symmetry: curves that lead to symmetric groups are grouped together
– Continuity: tokens that lead to “continuous” curves (joining up nicely, rather than in the formal sense) tend to be grouped
– Familiar configuration: tokens that, when grouped, lead to a familiar object (e.g., the top-down recognition that allows us to see the Dalmatian)
from Forsyth & Ponce
Technique: Background Subtraction
• If we know what the background looks like, it is easy to identify “interesting bits”
• Applications
– Person in an office
– Tracking cars on a road
– Surveillance
• Approach:
– Use a moving average to estimate the background image
– Subtract it from the current frame
– Large absolute differences mark interesting pixels
• Trick: use morphological operations to clean up the resulting pixel mask
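The approach above can be sketched in Python; the function name, learning rate, threshold, and 3×3 structuring element are illustrative choices, not from the slides:

```python
import numpy as np
from scipy.ndimage import binary_opening

def background_subtract(frames, alpha=0.05, thresh=30):
    """Yield one foreground mask per frame after the first, using a
    moving-average background model (illustrative parameter values)."""
    bg = frames[0].astype(float)                  # initial background estimate
    for frame in frames[1:]:
        diff = np.abs(frame.astype(float) - bg)   # subtract from current frame
        mask = diff > thresh                      # large values = interesting
        mask = binary_opening(mask, structure=np.ones((3, 3)))  # cleanup
        bg = (1 - alpha) * bg + alpha * frame     # update moving average
        yield mask
```

On a static scene with one bright moving patch, the opening step removes isolated noise pixels while keeping the patch intact.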
Basic Approaches to Clustering
• Unknown number of clusters
– Agglomerative clustering: start with as many clusters as tokens and selectively merge
– Divisive clustering: start with one cluster for all tokens and selectively split
• Known number of clusters
– Selectively change cluster memberships of tokens
• Merging/splitting/rearranging stops when a threshold on token similarity is reached
– Within a cluster: as similar as possible
– Between clusters: as dissimilar as possible
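The agglomerative case can be sketched as single-link merging that stops at a similarity threshold (a toy O(n³) illustration, not an efficient implementation):

```python
import numpy as np

def agglomerative(points, stop_dist):
    """Start with one cluster per token; repeatedly merge the closest
    pair of clusters until the closest pair is farther than stop_dist."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None                                # (distance, a, b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single-link distance: closest pair of member points
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > stop_dist:                          # similarity threshold hit
            break
        clusters[a] += clusters.pop(b)             # merge the closest pair
    return clusters
```

Two well-separated pairs of points merge pairwise and then stop, leaving two clusters.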
Feature Space
• Every token is identified by a set of salient visual characteristics called features (akin to Gestalt grouping factors). For example:
– Position
– Color
– Texture
– Motion vector
– Size, orientation (if the token is larger than a pixel)
• The choice of features and how they are quantified implies a feature space in which each token is represented by a point
• Token similarity is thus measured by distance between points (aka “feature vectors”) in feature space
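As a tiny illustration (the feature values below are invented), tokens become points in feature space and similarity is a distance between feature vectors:

```python
import numpy as np

# Hypothetical tokens described by [x, y, r, g, b] feature vectors
tokens = np.array([
    [10.0, 12.0, 200.0, 30.0, 30.0],   # reddish token near (10, 12)
    [11.0, 13.0, 210.0, 25.0, 35.0],   # similar reddish token nearby
    [80.0, 90.0, 20.0, 30.0, 220.0],   # bluish token far away
])

# Pairwise Euclidean distances between feature vectors
dist = np.linalg.norm(tokens[:, None, :] - tokens[None, :, :], axis=-1)
```

Tokens 0 and 1 end up close in feature space, so a clustering algorithm would group them long before either joins token 2.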
k-means Clustering
• Initialization: given k categories and N points in feature space, pick k points at random; these are the initial cluster centers (means) μ1, …, μk. Repeat the following:
1. Assign all N points to clusters by the nearest μi (make sure no cluster is empty)
2. Recompute the mean μi of each cluster Ci from its member points
3. If no mean has changed by more than some Δ, stop
• Effectively carries out gradient descent on the within-cluster sum of squared distances, Σi Σx∈Ci ‖x − μi‖²
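The loop above can be sketched in NumPy (the seed parameter and the way an empty cluster is refilled are illustrative choices):

```python
import numpy as np

def kmeans(points, k, delta=1e-4, seed=0):
    """Minimal k-means: random init, assign, recompute, stop on small change."""
    rng = np.random.default_rng(seed)
    means = points[rng.choice(len(points), size=k, replace=False)]
    while True:
        # 1. assign every point to the nearest mean
        d = np.linalg.norm(points[:, None] - means[None, :], axis=-1)
        labels = d.argmin(axis=1)
        # make sure no cluster is empty: hand it the worst-fit point
        for c in range(k):
            if not np.any(labels == c):
                labels[d.min(axis=1).argmax()] = c
        # 2. recompute each cluster mean from its member points
        new_means = np.array([points[labels == c].mean(axis=0)
                              for c in range(k)])
        # 3. stop when no mean moved more than delta
        if np.linalg.norm(new_means - means, axis=1).max() <= delta:
            return new_means, labels
        means = new_means
```

On two well-separated blobs, the means converge to the blob centers in a few iterations.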
Example: 3-means Clustering
from Duda et al.
Convergence in 3 steps
The means change, and so do the clusters.
Hough Transform (HT)
• Basic idea: change the problem from complicated pattern detection to peak finding in the parameter space of the shape
– Each pixel can lie on a family of possible shapes (e.g., for lines, the set of lines through that point)
– Shapes with more pixels on them have more evidence that they are present in the image
– Thus every pixel “votes” for a set of shapes, and the one(s) with the most votes “win” (i.e., exist)
courtesy of Massey U.
The Hough Transform
• The general idea:
– A line in the image can be parameterized by 2 variables
– Each edge pixel (x,y) corresponds to a family of lines L(x,y) = {l1,…,ln}
– Pixel (x,y) votes for each li ∈ L(x,y)
– Edge pixels that form a line will each place one vote for the same (ai,bi) – along with lots of other lines
– Lines that are in the image will receive more votes than ones that are not
y_i = a·x_i + b
The Hough Transform
• Edge pixels that form a line will each place one vote for the same (ai,bi) – along with lots of other lines
The Hough Transform
• The line feature detection problem is transformed into a peak detection problem!
• We need only find the cells with the most votes; these correspond to the lines with the greatest evidence in the image
• Issue: line representation
– How do we discretize a and b, given that a ∈ (−∞, ∞) and b ∈ (−∞, ∞)?
Solution: Polar Representation
• Instead of using the slope-intercept representation, we can use a polar representation
ρ = x·cos θ + y·sin θ
where ρ corresponds to the normal distance to the line and θ the polar angle
• These parameters are bounded by
θ ∈ [0, 2π], ρ ∈ [0, √(im_h² + im_w²)]
which we can discretize to an appropriate resolution
HT for Line Finding
• Fixing an image pixel (xi, yi) yields a set of points {(ρ, θ)} in line space corresponding to a sinusoidal curve described by ρ = xi·cos θ + yi·sin θ
• Each point on the curve in line space is a member of the set of lines through the pixel
• Collinear points yield curves that intersect in a single point
courtesy of R. Bock
The Algorithm
1. Take as input an edge image E(i,j)
2. Define a resolution dθ and dρ for θ and ρ, respectively
3. Construct an accumulator array A(m,n)=0, where m = (1/dρ)·ρmax and n = 2π/dθ, with ρmax = √(im_h² + im_w²)
The Algorithm
4. For each pixel with E(i,j) == 255 do
For θ = dθ : dθ : 2π
1. ρ = i·cos(θ) + j·sin(θ)
2. If ρ < 0, continue
3. Round ρ to the nearest multiple of dρ
4. A(i′,j′)++, where (i′,j′) index the rounded ρ and the current θ
5. Threshold A(i′,j′) to find all relevant lines
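Steps 1–5 can be sketched as follows; this sketch uses the common variant with θ ∈ [0, π) and a signed ρ instead of the θ ∈ [0, 2π), non-negative-ρ scheme above, which avoids the ρ < 0 skip:

```python
import numpy as np

def hough_lines(edges, d_theta=np.pi / 180, d_rho=1.0):
    """Vote in (rho, theta) space for each edge pixel (value 255)."""
    h, w = edges.shape
    rho_max = np.hypot(h, w)                       # largest possible |rho|
    thetas = np.arange(0, np.pi, d_theta)
    n_rho = int(2 * rho_max / d_rho) + 1           # signed rho spans 2*rho_max
    acc = np.zeros((n_rho, len(thetas)), dtype=int)
    ys, xs = np.nonzero(edges == 255)
    for x, y in zip(xs, ys):
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rhos + rho_max) / d_rho).astype(int)  # shift to >= 0
        acc[idx, np.arange(len(thetas))] += 1      # one vote per theta bin
    return acc, thetas, rho_max
```

Thresholding `acc` (or taking its argmax) then recovers the dominant lines; a horizontal row of edge pixels, for instance, produces a peak near θ = π/2.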
Example: HT for Line Finding
Edge-detectedimage
Accumulatorarray
“De-Hough” of lines ≥ 70% of max
courtesy of Massey U.
Mechanics of the Hough transform
• How big should the cells be?
– If too big, we can’t distinguish between different lines; if too small, noise causes lines to be missed
• How many lines?– Count the peaks in the array (thresholding)
• Who belongs to which line?– Tag the votes, or post-process
Hough Transform: Issues
• Noise
– Points slightly off the curve result in multiple intersections
• Can use larger bins, or smooth the accumulator array
• Non-maximum suppression is a good idea to get unique peaks
• Dimensionality
– Exponential increase in the size of the accumulator array as the number of shape parameters goes up
– HT works best for shapes with 3 or fewer parameters (e.g., lines, circles)
• Since the HT is a voting algorithm, it is robust to noise and can detect broken, disconnected, or partially occluded line segments