CS 461: Machine Learning, Lecture 7

DESCRIPTION

CS 461: Machine Learning, Lecture 7. Dr. Kiri Wagstaff ([email protected]). Plan for today: unsupervised learning (k-means clustering, EM clustering) and Homework 4. Review from Lecture 6: parametric methods; data comes from a distribution (Bernoulli, Gaussian) and their parameters.

TRANSCRIPT
2/21/09 CS 461, Winter 2009 1
CS 461: Machine Learning, Lecture 7

Dr. Kiri Wagstaff ([email protected])
Plan for Today
- Unsupervised learning
  - K-means clustering
  - EM clustering
- Homework 4
Review from Lecture 6
- Parametric methods
  - Data comes from a distribution: Bernoulli, Gaussian, and their parameters
  - How good is a parameter estimate? (bias, variance)
- Bayes estimation
  - ML: use the data (assume equal priors)
  - MAP: use the prior and the data
- Parametric classification: maximize the posterior probability
Clustering
Chapter 7
Unsupervised Learning
The data has no labels! What can we still learn?

- Salient groups in the data
- Density in feature space

Key approach: clustering. But also:

- Association rules
- Density estimation
- Principal components analysis (PCA)
Clustering
Group items by similarity
Density estimation, cluster models
Applications of Clustering
- Image segmentation [Ma and Manjunath, 2004]
- Data mining: targeted marketing
- Remote sensing: land cover types
- Text analysis [Selim Aksoy]
Applications of Clustering
Text Analysis: Noun Phrase Coreference
Input text: "John Simon, Chief Financial Officer of Prime Corp. since 1986, saw his pay jump 20%, to $1.3 million, as the 37-year-old also became the financial-services company’s president."

- Cluster JS: John Simon, Chief Financial Officer, his, the 37-year-old, president
- Cluster PC: Prime Corp., the financial-services company
- Singletons: 1986, pay, 20%, $1.3 million
[Andrew Moore]
Sometimes easy
Sometimes impossible
and sometimes in between
K-means

1. Ask user how many clusters they’d like (e.g., k = 5)
2. Randomly guess k cluster center locations
3. Each datapoint finds out which center it’s closest to (thus each center “owns” a set of datapoints)
4. Each center finds the centroid of the points it owns…
5. …and jumps there
6. …Repeat until terminated!

[Andrew Moore]
K-means Start: k=5
Example generated by Dan Pelleg’s super-duper fast K-means system:
Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999, (KDD99) (available on www.autonlab.org/pap.html)
[ Andrew Moore]
K-means continues…

[Andrew Moore]
K-means terminates
[ Andrew Moore]
K-means Algorithm
1. Randomly select k cluster centers
2. While (points change membership):
   1. Assign each point to its closest cluster (use your favorite distance metric)
   2. Update each center to be the mean of its items

Objective function: variance

$$V = \sum_{c=1}^{k} \sum_{x_j \in C_c} \mathrm{dist}(x_j, \mu_c)^2$$

K-means applet
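As a sketch of the algorithm above (not the applet's implementation), here is a minimal 1-D version in plain Python, with the variance objective alongside:

```python
import random

def kmeans(points, k, init_centers=None, max_iters=100):
    """Lloyd's algorithm on 1-D points, following the pseudocode above.

    Returns (centers, assignments). `init_centers` lets the caller fix
    the random step for reproducibility; otherwise k points are sampled.
    """
    centers = list(init_centers) if init_centers else random.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point joins its closest center.
        new_assignments = [min(range(k), key=lambda c: abs(p - centers[c]))
                           for p in points]
        if new_assignments == assignments:  # no membership changes: done
            break
        assignments = new_assignments
        # Update step: each center moves to the mean of the points it owns.
        for c in range(k):
            owned = [p for p, a in zip(points, assignments) if a == c]
            if owned:
                centers[c] = sum(owned) / len(owned)
    return centers, assignments

def variance(points, centers, assignments):
    """Objective V: sum over clusters of squared distances to the center."""
    return sum((p - centers[a]) ** 2 for p, a in zip(points, assignments))
```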
K-means Algorithm: Example
1. Randomly select k cluster centers
2. While (points change membership):
   1. Assign each point to its closest cluster (use your favorite distance metric)
   2. Update each center to be the mean of its items

Objective function: variance

$$V = \sum_{c=1}^{k} \sum_{x_j \in C_c} \mathrm{dist}(x_j, \mu_c)^2$$

Data: [1, 15, 4, 2, 17, 10, 6, 18]
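For this data with k = 2, different initializations can converge to different partitions; evaluating V directly shows they are not equally good. (The two candidate partitions below were worked out by hand and are illustrative.)

```python
def cluster_variance(clusters):
    """V = sum over clusters of squared distances to the cluster mean."""
    total = 0.0
    for c in clusters:
        mu = sum(c) / len(c)
        total += sum((x - mu) ** 2 for x in c)
    return total

# Two partitions of the data that k-means can reach,
# depending on where the initial centers land:
a = cluster_variance([[1, 2, 4, 6], [10, 15, 17, 18]])   # lower variance
b = cluster_variance([[1, 2, 4, 6, 10], [15, 17, 18]])   # higher variance
```

Both are fixed points of the update rule, so which one k-means finds depends on initialization: a preview of the local-optima issue below.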
K-means for Compression
Original image: 159 KB. Clustered, k = 4: 53 KB.
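The savings come from storing a small palette of k centroid colors plus a per-pixel index instead of a full color per pixel. A back-of-the-envelope sketch for raw (uncompressed) sizes; the helper name is illustrative, and the slide's 159 KB → 53 KB figures come from an actual image file format, so they won't match this arithmetic exactly:

```python
import math

def quantized_size_bits(n_pixels, k, bits_per_channel=8, channels=3):
    """Raw size after replacing each pixel with a cluster index + palette.

    No entropy coding is assumed; real formats compress further, so this
    is only an illustrative upper bound.
    """
    index_bits = math.ceil(math.log2(k))            # bits per pixel
    palette_bits = k * channels * bits_per_channel  # store the k centroids
    return n_pixels * index_bits + palette_bits

raw = 256 * 256 * 3 * 8                 # 24-bit RGB, 256x256 image
quant = quantized_size_bits(256 * 256, 4)  # 2 bits/pixel + tiny palette
```

With k = 4 the per-pixel cost drops from 24 bits to 2, roughly a 12x reduction before any further compression.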
Issue 1: Local Optima
K-means is greedy! Converging to a non-global optimum:
[Example from Andrew Moore]
Issue 2: How long will it take?
We don’t know! K-means is O(nkdI)

- d = # features (dimensionality)
- I = # iterations

The number of iterations depends on the random initialization: a “good” init needs few iterations, a “bad” init needs many. How can we tell the difference before clustering? We can’t, so we use heuristics to guess a “good” init.
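One family of such heuristics (not covered in these slides) spreads the initial centers apart, e.g. k-means++-style seeding. A minimal 1-D sketch:

```python
import random

def plusplus_init(points, k, seed=0):
    """k-means++-style seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest existing center,
    so the initial centers tend to be spread apart (1-D points here)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance of each point to its nearest chosen center.
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        # Weighted draw: roulette-wheel selection over d2.
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers
```

Points already chosen have weight zero, so the seeding never picks the same center twice (for distinct data values).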
Issue 3: How many clusters?
The “Holy Grail” of clustering
Issue 3: How many clusters?
Select the k that gives the partition with the least variance? No: adding clusters can only reduce the variance, so this criterion favors ever-larger k.

The best k depends on the user’s goal
[Dhande and Fiore, 2002]
Issue 4: How good is the result?
Rand index:

- A = # pairs in the same cluster in both partitions
- B = # pairs in different clusters in both partitions
- Rand = (A + B) / total number of pairs
(Figure: two partitions of the same ten points, labeled 1–10.)
Rand = (5 + 26) / 45
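A direct implementation of this definition (pair counting is O(n²), which is fine at this scale):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand = (A + B) / total pairs: A counts pairs grouped together in
    both partitions, B counts pairs separated in both."""
    n = len(labels_a)
    agree = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:  # together in both, or apart in both
            agree += 1
    return agree / (n * (n - 1) // 2)
```

For the ten-point example above, the total is C(10, 2) = 45 pairs, giving Rand = (5 + 26) / 45. Note the index only compares cluster memberships, so it is unaffected by relabeling the clusters.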
K-means: Parametric or Non-parametric?
Cluster models: means. Data models?

- All clusters are spherical
- Distance in any direction is the same
- A cluster may be arbitrarily “big” to include outliers
EM Clustering
Parametric solution: model the data distribution

- Each cluster: a Gaussian model, $\mathcal{N}(\mu, \sigma)$
- Data: a “mixture of models”
- Hidden value $z^t$ is the cluster of item $t$
- E-step: estimate cluster memberships
- M-step: maximize likelihood (clusters, params)

Likelihood: $L(\mu, \sigma \mid X) = P(X \mid \mu, \sigma)$

E-step:

$$E[z^t \mid X, \mu, \sigma] = \frac{p(x^t \mid C_i, \mu_i, \sigma_i)\, P(C_i)}{\sum_j p(x^t \mid C_j, \mu_j, \sigma_j)\, P(C_j)}$$
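A sketch of the E-step formula above for 1-D Gaussians; the parameter values in any usage are made up for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

def e_step(x, mus, sigmas, priors):
    """E[z^t | x, params]: posterior probability of each cluster C_i
    for one point x, i.e. p(x | C_i) P(C_i) normalized over clusters."""
    nums = [gaussian_pdf(x, m, s) * p
            for m, s, p in zip(mus, sigmas, priors)]
    z = sum(nums)
    return [n / z for n in nums]
```

As a sanity check, a point midway between two equally weighted, equal-variance clusters gets responsibility 0.5 from each.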
The GMM assumption

- There are k components. The i’th component is called ω_i.
- Component i has an associated mean vector μ_i.
- Each component generates data from a Gaussian with mean μ_i and covariance matrix σ²I.

Assume that each datapoint is generated according to the following recipe:

1. Pick a component at random: choose component i with probability P(ω_i).
2. Datapoint ~ N(μ_i, σ²I)

[Andrew Moore]
The General GMM assumption
- There are k components. The i’th component is called ω_i.
- Component i has an associated mean vector μ_i.
- Each component generates data from a Gaussian with mean μ_i and covariance matrix Σ_i.

Assume that each datapoint is generated according to the following recipe:

1. Pick a component at random: choose component i with probability P(ω_i).
2. Datapoint ~ N(μ_i, Σ_i)

[Andrew Moore]
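The recipe above, as a 1-D sampling sketch (the slide's general version is multivariate with full Σ_i; here each σ_i is a scalar for brevity):

```python
import random

def sample_gmm(mus, sigmas, priors, n, seed=0):
    """Generate n points from a 1-D Gaussian mixture: pick component i
    with probability P(i), then draw from N(mu_i, sigma_i^2)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        i = rng.choices(range(len(mus)), weights=priors)[0]
        points.append(rng.gauss(mus[i], sigmas[i]))
    return points
```

EM clustering inverts exactly this generative process: given only the points, recover the μ_i, Σ_i, and P(ω_i) that most likely produced them.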
EM in action
http://www.the-wabe.com/notebook/em-algorithm.html
Gaussian Mixture Example: Start
[ Andrew Moore]
After first iteration
[ Andrew Moore]
After 2nd iteration
[ Andrew Moore]
After 3rd iteration
[ Andrew Moore]
After 4th iteration
[ Andrew Moore]
After 5th iteration
[ Andrew Moore]
After 6th iteration
[ Andrew Moore]
After 20th iteration
[ Andrew Moore]
EM Benefits
- Model the actual data distribution, not just centers
- Get a probability of membership in each cluster, not just a distance
- Clusters do not need to be “round”
EM Issues?
- Local optima
- How long will it take?
- How many clusters?
- Evaluation
Summary: Key Points for Today
- Unsupervised learning: why? how?
- K-means clustering: iterative, sensitive to initialization, non-parametric, local optimum, Rand index
- EM clustering: iterative, sensitive to initialization, parametric, local optimum
Next Time
Clustering

- Reading: Alpaydin Ch. 7.1–7.4, 7.8
- Reading questions: Gavin, Ronald, Matthew

Next time: reinforcement learning – robots!