
Lecture 10: Segmentation by Clustering

Marshall Tappen

Region Segmentation

What is the idea underlying segmentation?

We want to group pixels together that “belong” together. Computer vision researchers aren't the first ones to think about this; grouping was also studied by the Gestalt school of psychologists.

Grouping

To perceive the image, the elements must be perceived as a whole. The Gestalt psychologists studied how elements could be grouped together, and identified a set of factors that lead to elements being grouped. I'm mentioning these ideas because they often come up in discussions of segmentation and grouping in computer vision. Here are some examples.

Proximity

Things that are nearby tend to be grouped together

(Figure from Forsyth and Ponce)

Similarity

Similar things tend to be grouped together

(Figure from Forsyth and Ponce)

Common Region

Tokens that lie in the same region tend to be grouped together

(Figure from Forsyth and Ponce)

Parallelism

Parallel lines or tokens tend to be grouped together

(Figure from Forsyth and Ponce)

Symmetry

We prefer groupings that lead to symmetric groups

(Figure from Forsyth and Ponce)

Closure

Tokens that lead to closed curves tend to be grouped together

(Figure from Forsyth and Ponce)

Grouping can lead to interesting effects

This is called the Kanizsa Triangle. Grouping is causing you to see illusory contours.

Back to Pixels

Our goal is to group pixels. We won't be able to incorporate all of the Gestalt cues, so we will have to focus on simpler cues:

RGB similarity

Proximity

Simple idea

Let's find three clusters in this data. These points could represent RGB triplets in 3D.

Simple idea

Begin by guessing where the “center” of each cluster is

Simple idea

Now assign each point to the closest cluster

Simple idea

Now move each cluster center to the center of the points assigned to it. Repeat this process until it converges.

Mathematically, What's going on?

Each cluster will be described by a center μj

Each point, xi, will be assigned to one cluster

Call this assignment c(i). Our goal is to find the assignments and centers that minimize

$$E = \sum_i \left\| x_i - \mu_{c(i)} \right\|^2$$

How do we do this?

Optimizing c(i) and μj jointly is too difficult

But! What if I know $\mu_j$ already?

How do I minimize this? Assign each point to its closest center: $c(i) = \arg\min_j \| x_i - \mu_j \|^2$.

How do we do this?

What if I know c(i) already? Then the best center for each cluster is the mean of the points assigned to it: $\mu_j = \frac{1}{N_j} \sum_{i : c(i) = j} x_i$.

Do you see why it's called k-means?
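A minimal NumPy sketch of these two alternating steps (an illustration with my own function name, not the lecture's code):

```python
import numpy as np

def kmeans(x, k, n_iters=100, seed=0):
    """Cluster points x of shape (n, d) into k clusters."""
    rng = np.random.default_rng(seed)
    # Begin by guessing centers: pick k random data points.
    mu = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each point to the closest center.
        dists = np.linalg.norm(x[:, None, :] - mu[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Step 2: move each center to the mean of its assigned points.
        new_mu = np.array([x[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):  # converged
            break
        mu = new_mu
    return mu, c
```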

K-Means

How does this translate to images?

(From Comaniciu and Meer)

Image Segmentation by K-Means

1) Select a value of K.
2) Select a feature vector for every pixel (color, texture, position, or a combination of these).
3) Define a similarity measure between feature vectors (usually Euclidean distance).
4) Apply the K-means algorithm.
5) Apply a connected-components algorithm.
6) Merge any component smaller than some threshold into the adjacent component that is most similar to it.

A sketch of the first four steps appears below.
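As a hedged sketch of steps 1–4, reusing the kmeans function above (the function name and the position_weight parameter are my own assumptions):

```python
import numpy as np

def segment_image(img, k=5, position_weight=0.0):
    """Segment an RGB image of shape (h, w, 3) by k-means on
    per-pixel feature vectors."""
    h, w, _ = img.shape
    feats = img.reshape(-1, 3).astype(float)
    if position_weight > 0:
        # Optionally append (scaled) pixel coordinates as features, which
        # encourages spatially compact clusters.  Scaling matters: features
        # with larger numeric ranges dominate the Euclidean distance.
        ys, xs = np.mgrid[0:h, 0:w]
        pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
        feats = np.hstack([feats, position_weight * pos])
    _, labels = kmeans(feats, k)
    return labels.reshape(h, w)
```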

K-means clustering using intensity alone and color alone

Image

Clusters on intensity

Clusters on color

K-means using color alone, 11 segments

Image

Clusters on color


Probabilistic Point of View

We'll take a generative point of view. How to generate a data point:

1) Choose a cluster, z, from (1 ... N)
2) Sample the point from the distribution associated with that cluster

A small sampling sketch follows below.
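A tiny sketch of this generative recipe for a 1D mixture of Gaussians (function name and parameters are my own, for illustration):

```python
import numpy as np

def sample_mixture(pi, mus, sigmas, n, seed=0):
    """Draw n points: first choose a cluster z with probabilities pi,
    then sample from that cluster's 1D Gaussian."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=n, p=pi)  # step 1: choose a cluster
    return rng.normal(np.asarray(mus)[z], np.asarray(sigmas)[z])  # step 2

# e.g. two clusters with mixing coefficients 0.3 and 0.7
points = sample_mixture([0.3, 0.7], [0.0, 5.0], [1.0, 0.5], n=1000)
```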

1D Example

This is called a Mixture Model. z indicates which cluster is chosen:

$$p(x) = \sum_k p(z = k)\, p(x \mid z = k)$$

Here $p(z = k)$ is the probability of choosing cluster k, and $p(x \mid z = k)$ is the probability of x given the cluster is k.

To make it a Mixture of Gaussians, let each cluster's distribution be a Gaussian:

$$p(x) = \sum_k \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

$\pi_k = p(z = k)$ is called a mixing coefficient.

Brief Review of Gaussians

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$$

Mixture of Gaussians

In Context of Our Previous Model

Now, we have means and covariances

How does this help with clustering?

If we had the parameters of the clusters, it would be easy to assign points to clusters

How do we get the cluster parameters? We'll maximize the likelihood of the data

Mathematically, this means maximizing the log of the mixture model's likelihood:

$$\ln p(X) = \sum_i \ln \left( \sum_k \pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right)$$

Now we run into a problem

This is hard to maximize. But we can lower bound it, and if the lower bound is easy to work with, we can maximize it instead. That should push the true function up.

Lower Bounding

We use a theorem called Jensen's inequality: for weights $q_k$ that have to add up to one,

$$\ln \sum_k q_k a_k \ge \sum_k q_k \ln a_k$$

Applying this to each term of the log-likelihood, with $a_k = \pi_k \mathcal{N}(x_i \mid \mu_k, \Sigma_k) / q_{ik}$, gives a lower bound we can maximize.

This looks familiar

The optimal choice of $q_{ik}$ looks a lot like using Bayes' rule to find the probability of that point's cluster:

$$q_{ik} = p(z = k \mid x_i) = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_j \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$

Now life is easier. We can now differentiate to find the parameters. This is called the M-step; the previous step is called the E-step. You are always increasing a lower bound.

Complete set of steps: find the means, covariances, and mixing coefficients.

Where this comes from

Let's differentiate with respect to $\mu_k$ and set the derivative to zero. This gives

$$\mu_k = \frac{\sum_i q_{ik}\, x_i}{\sum_i q_{ik}}$$

Mixing Coefficients

Maximizing subject to the constraint that the mixing coefficients sum to one gives

$$\pi_k = \frac{1}{N} \sum_i q_{ik}$$

where N is the number of data points.

EM Algorithm

Computing $q_{ik}$ with Bayes' rule is called the E-step. M-step: using these estimates of $q_{ik}$, maximize over the rest of the parameters.

Find the means, covariances, and mixing coefficients.
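Putting the E-step and M-step together, a compact sketch of EM for a mixture of Gaussians (my own illustrative code, with small regularization terms added to keep covariances invertible):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(x, k, n_iters=50, seed=0):
    """Fit a k-component Gaussian mixture to x of shape (n, d) by EM."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    mu = x[rng.choice(n, size=k, replace=False)]            # means
    sigma = np.array([np.cov(x.T) + 1e-6 * np.eye(d)] * k)  # covariances
    pi = np.full(k, 1.0 / k)                                # mixing coefficients
    for _ in range(n_iters):
        # E-step: responsibilities q[i, j] = p(z = j | x_i) via Bayes' rule.
        q = np.stack([pi[j] * multivariate_normal.pdf(x, mu[j], sigma[j])
                      for j in range(k)], axis=1)
        q /= q.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = q.sum(axis=0)
        mu = (q.T @ x) / nk[:, None]
        for j in range(k):
            xc = x - mu[j]
            sigma[j] = (q[:, j, None] * xc).T @ xc / nk[j] + 1e-6 * np.eye(d)
        pi = nk / n
    return pi, mu, sigma, q
```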

Back to clustering

Now we have $p(z = k \mid x_i)$ for every point. This can be seen as a soft clustering: each point belongs to every cluster with some probability.
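Continuing the em_gmm sketch above, a hard segmentation can be read off the soft assignments (again just an assumed usage pattern):

```python
# feats: per-pixel feature vectors, e.g. as built for k-means above
pi, mu, sigma, q = em_gmm(feats, k=5)
hard_labels = q.argmax(axis=1)  # most probable cluster per point
```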

How many clusters?

Remember the line problem?

Basic Idea

We want to fit the data well, but we don't want a model that is too complex

We are balancing two issues: fitting the data, and model complexity (here, that is the number of lines). Three popular criteria for evaluating this trade-off:

AIC – An Information Criterion

L is the squared error in our predictions (there is also a probabilistic interpretation, involving the log-likelihood). The variable p is the number of parameters.

BIC – Bayes Information Criterion

L is the squared error in our predictions (there is also a probabilistic interpretation, involving the log-likelihood). The variable N is the number of data points; BIC penalizes each parameter by a factor that grows with log N. BIC is also called MDL (Minimum Description Length).
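Under the log-likelihood interpretation, the standard forms are AIC = −2 ln L + 2p and BIC = −2 ln L + p ln N; a tiny sketch for comparing models (my own helper names):

```python
import numpy as np

def aic(log_lik, p):
    """Akaike: penalize each parameter by a constant 2."""
    return -2.0 * log_lik + 2.0 * p

def bic(log_lik, p, n):
    """Bayes/MDL: the per-parameter penalty grows with log(n)."""
    return -2.0 * log_lik + p * np.log(n)

# Choose the number of clusters (or lines) that minimizes the criterion.
```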

It doesn't always work (But it's close)

Another Clustering Application

In this case, we have a video and we want to segment out what's moving or changing

from C. Stauffer and W. Grimson

Easy Solution

Average a bunch of frames to get a “background” image. Then compute the difference between the background and each frame to find the foreground.
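A minimal sketch of this averaging-and-differencing idea (the threshold value is an arbitrary assumption):

```python
import numpy as np

def background_difference(frames, threshold=30.0):
    """frames: array of shape (t, h, w, 3).  Average over time to get a
    "background" image, then flag pixels that differ from it."""
    background = frames.mean(axis=0)
    diff = np.abs(frames.astype(float) - background).max(axis=-1)
    masks = diff > threshold  # per-frame foreground masks, shape (t, h, w)
    return background, masks
```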

The difficulty with this approach

The background changes

(From Stauffer and Grimson)

Solution

Fit a mixture model to the background, i.e., a background pixel can have multiple colors. In Stauffer and Grimson's method, each pixel gets its own mixture of Gaussians fit to its history of colors.

This can be used for tracking in surveillance applications.

Advantages/Disadvantages

Advantages:

Easy to code!

Flexible: you can easily incorporate cues like proximity by including more features. But be careful about scaling! (Why? Features with larger numeric ranges dominate the distance measure.)

Monotonic optimization: each iteration improves the objective.

Advantages/Disadvantages

Disadvantages:

It only converges to a local minimum.

You still need to initialize it, and that can have a big impact on the quality of the results.
