clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · k–means 1. initialize a set of...

126
Clustering

Upload: others

Post on 06-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Clustering

Page 2: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Image Segmentation

Page 3: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Image Segmentation

Pink/White pixel : Apple blossom Orange pixel : Orange

Green pixel : leaf

Page 4: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Image Segmentation

Page 5: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Pixels as features

Page 6: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Principle of clustering:

Put things that are closer to each

other (in feature space) into the

same group

Page 7: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Pixels as features

Page 8: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

But what is a ‘good’ cluster?

Low in-group variability

High out of group

variability

Page 9: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Compactness: Min(in group variability)

Need a measure that shows how ‘compact’ our clusters

are

Distance based measures

Page 10: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance-based Measures

Total distance between each element in the cluster and

every other element

Page 11: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance-based Measures

Distance between farthest points in cluster

Page 12: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance-based Measures

Total distance of every element in the cluster from the

Centroid in the cluster

Page 13: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance-based Measures

Total distance of every element in the cluster from the

Centroid in the cluster

Page 14: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance-based Measures

Total distance of every element in the cluster from the

Centroid in the cluster

Page 15: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Finding clusters: K-means

Page 16: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K-means algorithm

Minimizes scatter: Distance from centroid

Page 17: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

What is a ‘Centroid’

clusteri

icluster xn

m1

Page 18: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

What is a ‘Centroid’

clusteri

ii

clusteri

i

cluster xww

m1

Page 19: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 20: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 21: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 22: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 23: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 24: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 25: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 26: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 27: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 28: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points are clustered, recompute centroids

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

ii

clusteri

i

cluster xww

m1

Page 29: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 30: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 31: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 32: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 33: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 34: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 35: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 36: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 37: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points are clustered, recompute centroids

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

ii

clusteri

i

cluster xww

m1

Page 38: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points are clustered, recompute centroids

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

ii

clusteri

i

cluster xww

m1

Page 39: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Another example

Page 40: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Another example

Page 41: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Another example

Page 42: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Another example

Page 43: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Going back to our first example

Page 44: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Going back to our first example

Page 45: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Going back to our first example

4 clusters

Page 46: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Going back to our first example

6 clusters

Page 47: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with K-means

Initial conditions important

Page 48: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with K-means

Initial conditions important

Page 49: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with K-means

Initial conditions important

Page 50: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with K-means

Initial conditions important

Page 51: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with K-means

Initial conditions important

Page 52: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with K-means

What is K?

Page 53: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with K-means

K=2

Page 54: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with K-means

K=5

Page 55: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Is there an optimal clustering

method?

Page 56: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Optimal method: Exhaustive Enumeration

Compute distances between every single pair of data

points and cluster on that

Page 57: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Optimal method: Exhaustive Enumeration

Compute distances between every single pair of data

points and cluster on that

Page 58: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Optimal method: Exhaustive Enumeration

Compute distances between every single pair of data

points and cluster on that

Very very computationally expensive

If M data points and we want N clusters:

Compute goodness for every possible combination

N

i

Mi iNi

N

M 0

)()1(!

1

Page 59: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Optimal method: Exhaustive Enumeration

Compute distances between every single pair of data

points and cluster on that

Very very computationally expensive

If M data points and we want N clusters:

Compute goodness for every possible combination

N

i

Mi iNi

N

M 0

)()1(!

1

Page 60: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Optimal method: Exhaustive Enumeration

Compute distances between every single pair of data

points and cluster on that

Very very computationally expensive

If M data points and we want N clusters:

Compute goodness for every possible combination

K-means: Fast but greedy

Page 61: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Going back to our first example

Page 62: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Hierarchical clustering

Page 63: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each
Page 64: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Hierarchical clustering: Bottom up

Page 65: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Page 66: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Page 67: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Page 68: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Page 69: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Initially, every point is its own cluster

Page 70: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Page 71: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Page 72: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Page 73: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Page 74: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Bottom up clustering

Page 75: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Notes about bottom up clustering

Single Link: Nearest neighbor distance

Page 76: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Notes about bottom up clustering

Single Link: Nearest neighbor distance

Complete link: Farthest neighbor distance

Page 77: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Notes about bottom up clustering

Single Link: Nearest neighbor distance

Complete link: Farthest neighbor distance

Centroid: Distance between centroids

Page 78: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Hierarchical clustering: Top Down

Page 79: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Top down clustering

Page 80: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Top down clustering

Page 81: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Top down clustering

Page 82: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K-Means for Top–Down clustering 1. Start with one cluster

2. Split each cluster into two: Perturb centroid of cluster slightly (by < 5%)

to generate two centroids

3. Initialize K means with new set of centroids

4. Iterate Kmeans until convergence

5. If the desired number of clusters is not obtained, return to 2

Page 83: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K-Means for Top–Down clustering 1. Start with one cluster

2. Split each cluster into two: Perturb centroid of cluster slightly (by < 5%) to

generate two centroids

3. Initialize K means with new set of centroids

4. Iterate Kmeans until convergence

5. If the desired number of clusters is not obtained, return to 2

Page 84: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K-Means for Top–Down clustering 1. Start with one cluster

2. Split each cluster into two: Perturb centroid of cluster slightly (by < 5%) to

generate two centroids

3. Initialize K means with new set of centroids

4. Iterate Kmeans until convergence

5. If the desired number of clusters is not obtained, return to 2

Page 85: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K-Means for Top–Down clustering 1. Start with one cluster

2. Split each cluster into two: Perturb centroid of cluster slightly (by < 5%) to

generate two centroids

3. Initialize K means with new set of

centroids

4. Iterate Kmeans until convergence

5. If the desired number of clusters is not obtained, return to 2

Page 86: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K-Means for Top–Down clustering 1. Start with one cluster

2. Split each cluster into two: Perturb centroid of cluster slightly (by < 5%) to

generate two centroids

3. Initialize K means with new set of

centroids

4. Iterate Kmeans until convergence

5. If the desired number of clusters is not obtained, return to 2

Page 87: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K-Means for Top–Down clustering 1. Start with one cluster

2. Split each cluster into two: Perturb centroid of cluster slightly (by < 5%) to

generate two centroids

3. Initialize K means with new set of

centroids

4. Iterate Kmeans until convergence

5. If the desired number of clusters is

not obtained, return to 2

Page 88: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K-Means for Top–Down clustering 1. Start with one cluster

2. Split each cluster into two: Perturb centroid of cluster slightly (by < 5%) to

generate two centroids

3. Initialize K means with new set of

centroids

4. Iterate Kmeans until convergence

5. If the desired number of clusters is

not obtained, return to 2

Page 89: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

When is a data point in a cluster?

Page 90: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance from cluster

Euclidean distance from centroid

Page 91: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance from cluster

Distance from the closest point

Page 92: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance from cluster

Distance from the farthest point

Page 93: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance from cluster

Probability of data measured on cluster distribution

Page 94: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

A closer look at ‘Distance’

Page 95: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points are clustered, recompute centroids

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

ii

clusteri

i

cluster xww

m1

Page 96: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

A closer look at ‘Distance’

Original algorithm uses L2 norm and weight=1

This is an instance of generalized EM

The algorithm is not guaranteed to converge for other

distance metrics

clusteri

i

cluster

cluster xN

m1

2||||),( clusterclustercluster mxmx distance

Page 97: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with Euclidean distance

Page 98: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with Euclidean distance

Page 99: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with Euclidean distance

Page 100: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Problems with Euclidean distance

Page 101: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Better way: Map it to different space

f([x,y]) -> [x,y,z]

x = x

y = y

z = a(x2 + y2)

Page 102: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

The Kernel trick

Page 103: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

The Kernel trick

Transform data to higher dimensional space (even

infinite!)

z = F(x)

Page 104: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

The Kernel trick

Transform data to higher dimensional space (even

infinite!)

z = F(x)

Compute distance in higher dimensional space

d(x1, x2) = ||z1- z2||2 = ||F(x1) – F(x2)||

2

Page 105: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

The cool part

Distance in low dimensional space:

||x1- x2||2 = (x1- x2)

T(x1- x2) = x1.x1 + x2.x2 -2 x1.x2

Page 106: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

The cool part

Distance in low dimensional space:

||x1- x2||2 = (x1- x2)

T(x1- x2) = x1.x1 + x2.x2 -2 x1.x2

Distance in high dimensional space:

d(x1, x2) =||F(x1) – F(x2)||2

= F(x1). F(x1) + F(x2). F(x2) -2 F(x1). F(x2)

Note: Every term involves dot products!

Page 107: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel function

Kernel function is just

K(x1,x2) = F(x1). F(x2)

Going back to our distance function in the high

dimensional space:

d(x1, x2) =||F(x1) – F(x2)||2

= F(x1). F(x1) + F(x2). F(x2) -2 F(x1). F(x2)

= K(x1,x1) + K(x2,x2) - 2K(x1,x2)

Kernel functions are more efficient than dot products

Page 108: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Typical Kernel Functions Linear: K(x,y) = xTy + c

Polynomial K(x,y) = (axTy + c)n

Gaussian: K(x,y) = exp(-||x-y||2/s2)

Exponential: K(x,y) = exp(-||x-y||/l)

Several others

Choosing the right Kernel with the right parameters for

your problem is an art

Page 109: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K-means

Page 110: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 111: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 112: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 113: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 114: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 115: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 116: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 117: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 118: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points clustered, recompute cluster centroid

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

i

cluster

cluster xN

m1

Page 119: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points are clustered, recompute centroids

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

ii

clusteri

i

cluster xww

m1

Page 120: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points are clustered, recompute centroids

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

ii

clusteri

i

cluster xww

m1

Page 121: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Kernel K–means 1. Initialize a set of centroids randomly

2. For each data point x, find the distance from the centroid for each cluster

3. Put data point in the cluster of the closest centroid

• Cluster for which dcluster is minimum

4. When all data points are clustered, recompute centroids

5. If not converged, go back to 2

),( clustercluster mxd distance

clusteri

ii

clusteri

i

cluster xww

m1

Page 122: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance metric

FF

FFF

clusteri

ii

T

clusteri

ii

2

cluster )x(wC)x()x(wC)x(||m)x(||)cluster,x(d

Page 123: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance metric

FF

FFF

clusteri

ii

T

clusteri

ii

2

cluster )x(wC)x()x(wC)x(||m)x(||)cluster,x(d

FFFFFF

clusteri clusteri clusterj

j

T

iji

2

i

T

i

T )x()x(wwC)x()x(wC2)x()x(

Page 124: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Distance metric

FF

FFF

clusteri

ii

T

clusteri

ii

2

cluster )x(wC)x()x(wC)x(||m)x(||)cluster,x(d

FFFFFF

clusteri clusteri clusterj

j

T

iji

2

i

T

i

T )x()x(wwC)x()x(wC2)x()x(

clusteri clusteri clusterj

jiji

2

ii )x,x(KwwC)x,x(KwC2)x,x(K

Page 125: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Other clustering methods

Regression based clustering

Find a regression representing each cluster

Associate each point to the cluster with the best

regression

Related to kernel methods

Page 126: Clusteringpmuthuku/mlsp_page/lectures/... · 2012-09-27 · K–means 1. Initialize a set of centroids randomly 2. For each data point x, find the distance from the centroid for each

Questions?