Machine Learning for Data Science (CS4786), Lecture 5 (2017-09-07)
Gaussian Mixture Models
Course Webpage :http://www.cs.cornell.edu/Courses/cs4786/2017fa/
Clustering demo
Two elongated ellipses
Iris dataset: Flowers
Iris-Setosa Iris-versicolor Iris-virginica
Smiley dataset
Wisconsin Breast Cancer dataset
K-means: pitfalls
• Looks for spherical clusters
• of the same radius
• and with roughly equal numbers of points
• Can we design an algorithm that addresses these shortcomings?
Ellipsoid: fit each cluster $C_j$ with its own covariance
$$\Sigma_j = \frac{1}{|C_j|} \sum_{t \in C_j} (x_t - r_j)(x_t - r_j)^\top$$
and measure the distance of a point $x$ to the representative $r_j$ by the Mahalanobis distance
$$(x - r_j)^\top \Sigma_j^{-1} (x - r_j)$$
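A minimal NumPy sketch of these two quantities (the toy points below are illustrative, not from the lecture):

```python
import numpy as np

# Toy cluster: an elongated point cloud (illustrative data).
points = np.array([[2.0, 0.1], [-2.0, -0.1], [1.0, 0.0], [-1.0, 0.0]])
r_j = points.mean(axis=0)  # cluster representative

# Sigma_j = (1/|C_j|) * sum_t (x_t - r_j)(x_t - r_j)^T
diffs = points - r_j
sigma = diffs.T @ diffs / len(points)

def mahalanobis_sq(x, r, cov):
    """Squared Mahalanobis distance (x - r)^T cov^{-1} (x - r)."""
    d = x - r
    return float(d @ np.linalg.solve(cov, d))

print(mahalanobis_sq(np.array([1.0, 0.05]), r_j, sigma))  # ≈ 0.5
```

Because the covariance stretches along the cluster's long axis, a point far along that axis can still have a small Mahalanobis distance.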
K-MEANS CLUSTERING
For all $j \in [K]$, initialize cluster centroids $r_j^0$ randomly and set $m = 1$.
Repeat until convergence (or until patience runs out):
1. For each $t \in \{1, \dots, n\}$, set the cluster identity of the point:
   $c^m(x_t) = \arg\min_{j \in [K]} \|x_t - r_j^{m-1}\|$
2. For each $j \in [K]$, set the new representative:
   $r_j^m = \frac{1}{|C_j^m|} \sum_{x_t \in C_j^m} x_t$
3. $m \leftarrow m + 1$
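The two alternating steps can be written as a short NumPy sketch (the data and initialization scheme below are illustrative, not from the lecture):

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    """Plain k-means as on the slide: assign each point to its
    nearest representative, then refit each representative as the
    mean of its assigned points."""
    rng = np.random.default_rng(seed)
    r = X[rng.choice(len(X), size=K, replace=False)]  # random init
    c = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # step 1: cluster identity = index of the nearest representative
        c = ((X[:, None, :] - r[None, :, :]) ** 2).sum(-1).argmin(1)
        # step 2: new representative = mean of the cluster's points
        new_r = np.array([X[c == j].mean(0) if np.any(c == j) else r[j]
                          for j in range(K)])
        if np.allclose(new_r, r):  # converged
            break
        r = new_r
    return r, c

# two well-separated blobs (illustrative data)
X = np.array([[0, 0], [0.1, 0], [0, 0.1],
              [10, 10], [10.1, 10], [10, 10.1]])
r, c = kmeans(X, K=2)
```

On well-separated spherical blobs like these, the loop recovers the blob means; the pitfalls above arise when the true clusters are elongated or unbalanced.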
ELLIPSOIDAL CLUSTERING
For all $j \in [K]$, initialize cluster centroids $r_j^0$ and ellipsoids $\Sigma_j^0$ randomly and set $m = 1$.
Repeat until convergence (or until patience runs out):
1. For each $t \in \{1, \dots, n\}$, set the cluster identity of the point:
   $c^m(x_t) = \arg\min_{j \in [K]} (x_t - r_j^{m-1})^\top (\Sigma_j^{m-1})^{-1} (x_t - r_j^{m-1})$
2. For each $j \in [K]$, set the new representative and ellipsoid:
   $r_j^m = \frac{1}{|C_j^m|} \sum_{x_t \in C_j^m} x_t$,
   $\Sigma_j^m = \frac{1}{|C_j^m|} \sum_{t \in C_j^m} (x_t - r_j^m)(x_t - r_j^m)^\top$
3. $m \leftarrow m + 1$
HARD GAUSSIAN MIXTURE MODEL
For all $j \in [K]$, initialize cluster centroids $r_j^0$, ellipsoids $\Sigma_j^0$, and initial proportions $\pi^0$ randomly, and set $m = 1$.
Repeat until convergence (or until patience runs out):
1. For each $t \in \{1, \dots, n\}$, set the cluster identity of the point:
   $c^m(x_t) = \arg\min_{j \in [K]} (x_t - r_j^{m-1})^\top (\Sigma_j^{m-1})^{-1} (x_t - r_j^{m-1}) - \log(\pi_j^{m-1})$
2. For each $j \in [K]$, set the new parameters:
   $r_j^m = \frac{1}{|C_j^m|} \sum_{x_t \in C_j^m} x_t$,
   $\Sigma_j^m = \frac{1}{|C_j^m|} \sum_{t \in C_j^m} (x_t - r_j^m)(x_t - r_j^m)^\top$,
   $\pi_j^m = \frac{|C_j^m|}{n}$
3. $m \leftarrow m + 1$
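One iteration of this hard assign-and-refit loop might look like the following NumPy sketch (the function name, toy data, ridge regularization, and initialization are my own, not from the lecture):

```python
import numpy as np

def hard_gmm_step(X, r, sigma, pi):
    """One hard-GMM iteration as on the slide: assign each point by
    Mahalanobis distance minus log prior, then refit each cluster's
    mean, covariance, and mixing proportion from its members."""
    n, K = len(X), len(r)
    scores = np.empty((n, K))
    for j in range(K):
        d = X - r[j]
        inv = np.linalg.inv(sigma[j])
        scores[:, j] = np.einsum('ni,ij,nj->n', d, inv, d) - np.log(pi[j])
    c = scores.argmin(axis=1)  # step 1: cluster identities
    for j in range(K):         # step 2: refit parameters
        Xj = X[c == j]
        if len(Xj) == 0:
            continue  # keep the old parameters for an empty cluster
        r[j] = Xj.mean(axis=0)
        d = Xj - r[j]
        # small ridge keeps the covariance invertible (my addition)
        sigma[j] = d.T @ d / len(Xj) + 1e-6 * np.eye(X.shape[1])
        pi[j] = len(Xj) / n
    return r, sigma, pi, c

X = np.array([[0, 0], [0.1, 0], [0, 0.1],
              [10, 10], [10.1, 10], [10, 10.1]])
r = np.array([[0.0, 0.0], [10.0, 10.0]])
sigma = np.array([np.eye(2), np.eye(2)])
pi = np.array([0.5, 0.5])
r, sigma, pi, c = hard_gmm_step(X, r, sigma, pi)
```

The $-\log \pi_j$ term biases assignments toward clusters that already hold more points, which is what lets cluster sizes become unequal.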
Multivariate Gaussian
Two parameters:
• Mean $\mu \in \mathbb{R}^d$
• Covariance matrix $\Sigma$ of size $d \times d$
$$p(x; \mu, \Sigma) = (2\pi)^{-d/2} \det(\Sigma)^{-1/2} \exp\!\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$
Multivariate Gaussian (example)
[Figure: surface plot of $\exp(-(10x^2 + y^2)/2)$ over $x, y \in [-3, 3]$, an elongated ellipsoidal bump centered at the origin.]
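The density formula evaluates directly in NumPy; this sketch (function name mine) also matches the plotted example, whose exponent $-(10x^2 + y^2)/2$ corresponds to $\Sigma^{-1} = \mathrm{diag}(10, 1)$:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """p(x; mu, Sigma) = (2 pi)^{-d/2} det(Sigma)^{-1/2}
       * exp(-(1/2) (x - mu)^T Sigma^{-1} (x - mu))"""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(sigma) ** (-0.5)
    return norm * np.exp(-0.5 * quad)

# the plotted (unnormalized) surface exp(-(10 x^2 + y^2)/2)
# corresponds to Sigma^{-1} = diag(10, 1), i.e. Sigma = diag(0.1, 1)
sigma = np.diag([0.1, 1.0])
peak = gaussian_pdf(np.zeros(2), np.zeros(2), sigma)
```

The small variance along $x$ (0.1) is what makes the surface narrow in that direction, exactly the kind of elongated cluster plain k-means handles badly.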
HARD GAUSSIAN MIXTURE MODEL
For all $j \in [K]$, initialize cluster centroids $r_j^0$, ellipsoids $\Sigma_j^0$, and initial proportions $\pi^0$ randomly, and set $m = 1$.
Repeat until convergence (or until patience runs out):
1. For each $t \in \{1, \dots, n\}$, set the cluster identity of the point:
   $c^m(x_t) = \arg\max_{j \in [K]} \; p(x_t; r_j^{m-1}, \Sigma_j^{m-1}) \times \pi_j^{m-1}$
2. For each $j \in [K]$, set the new parameters:
   $r_j^m = \frac{1}{|C_j^m|} \sum_{x_t \in C_j^m} x_t$,
   $\Sigma_j^m = \frac{1}{|C_j^m|} \sum_{t \in C_j^m} (x_t - r_j^m)(x_t - r_j^m)^\top$,
   $\pi_j^m = \frac{|C_j^m|}{n}$
3. $m \leftarrow m + 1$
(SOFT) GAUSSIAN MIXTURE MODEL
For all $j \in [K]$, initialize cluster centroids $r_j^0$, ellipsoids $\Sigma_j^0$, and proportions $\pi^0$ randomly and set $m = 1$.
Repeat until convergence (or until patience runs out):
1. For each $t \in \{1, \dots, n\}$, set the soft cluster membership, normalized so that $\sum_{j} Q_t^m(j) = 1$:
   $Q_t^m(j) \propto p(x_t; r_j^{m-1}, \Sigma_j^{m-1}) \times \pi_j^{m-1}$
2. For each $j \in [K]$, set the new parameters as $Q$-weighted averages:
   $r_j^m = \frac{\sum_{t=1}^n Q_t^m(j)\, x_t}{\sum_{t=1}^n Q_t^m(j)}$,
   $\Sigma_j^m = \frac{\sum_{t=1}^n Q_t^m(j)\,(x_t - r_j^m)(x_t - r_j^m)^\top}{\sum_{t=1}^n Q_t^m(j)}$,
   $\pi_j^m = \frac{\sum_{t=1}^n Q_t^m(j)}{n}$
3. $m \leftarrow m + 1$
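The soft updates translate almost line-for-line into NumPy; here is a sketch of one iteration (the function names, ridge term, and toy demo are my own, not from the lecture):

```python
import numpy as np

def gaussian_density(X, r, cov):
    """Multivariate normal density of each row of X."""
    d = X.shape[1]
    diff = X - r
    quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(cov), diff)
    return ((2 * np.pi) ** (-d / 2)
            * np.linalg.det(cov) ** (-0.5) * np.exp(-0.5 * quad))

def soft_gmm_step(X, r, sigma, pi):
    """One soft-GMM iteration: responsibilities Q_t(j) proportional
    to p(x_t; r_j, Sigma_j) * pi_j (normalized over j), followed by
    Q-weighted refits of means, covariances, and proportions."""
    n, K = len(X), len(r)
    Q = np.stack([pi[j] * gaussian_density(X, r[j], sigma[j])
                  for j in range(K)], axis=1)
    Q /= Q.sum(axis=1, keepdims=True)  # normalize over clusters
    for j in range(K):
        w = Q[:, j]
        r[j] = (w[:, None] * X).sum(0) / w.sum()
        d = X - r[j]
        sigma[j] = np.einsum('n,ni,nj->ij', w, d, d) / w.sum()
        sigma[j] += 1e-6 * np.eye(X.shape[1])  # keep invertible
        pi[j] = w.sum() / n
    return r, sigma, pi, Q

X = np.array([[0, 0], [0.1, 0], [0, 0.1],
              [10, 10], [10.1, 10], [10, 10.1]])
r = np.array([[0.0, 0.0], [10.0, 10.0]])
sigma = np.array([np.eye(2), np.eye(2)])
pi = np.array([0.5, 0.5])
r, sigma, pi, Q = soft_gmm_step(X, r, sigma, pi)
```

Every point now contributes to every cluster in proportion to its responsibility, so no point is ever irrevocably committed to one cluster, unlike in the hard variants above.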