APPLIED MACHINE LEARNING
Methods for Clustering
K-means, Soft K-means
DBSCAN
Objectives
Learn basic techniques for data clustering
• K-means and soft K-means, GMM (next lecture)
• DBSCAN
Understand the issues and major challenges in clustering
• Choice of metric
• Choice of number of clusters
What is clustering?
Clustering is a type of multivariate statistical analysis also known as cluster analysis, unsupervised classification analysis, or numerical taxonomy.
Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.
Cluster: a collection of data objects that are "similar" to one another and thus can be treated collectively as one group.
Classification versus Clustering
Supervised Classification = Classification: we know the class labels and the number of classes.
Unsupervised Classification = Clustering: we do not know the class labels and may not know the number of classes.
Classification versus Clustering
Unsupervised classification (clustering) is a hard problem when no two objects have exactly the same features. We need to determine how similar two or more objects are to one another.
Which clusters can you create?
Which two subgroups of pictures are similar and why?
What is Good Clustering?
A good clustering method produces high-quality clusters in which:
• the intra-class (that is, intra-cluster) similarity is high;
• the inter-class similarity is low.
Note that the quality measure of a cluster depends on the similarity measure used!
Exercise:
Intra-class similarity is the highest when:
a) you choose to classify images with and without glasses
b) you choose to classify images of person 1 against person 2
(Images: person 1 and person 2, each with and without glasses.)
Exercise:
(Figure: projection of the face images onto the first two principal components after PCA; person 1 and person 2, with and without glasses.)
Intra-class similarity is the highest when:
a) you choose to classify images with and without glasses
b) you choose to classify images of person 1 against person 2
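As an aside, the projection used in this exercise takes only a few lines. The sketch below is illustrative, assuming the face images are flattened into the rows of a matrix X; random data stands in for the course's face dataset, which is not part of this transcript.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4096))   # placeholder: 40 flattened 64x64 "images"

pca = PCA(n_components=3)
Z = pca.fit_transform(X)          # coordinates of each image on e1, e2, e3

# Plotting Z[:, 0] against Z[:, 1] (e1 vs e2), or Z[:, 0] against Z[:, 2]
# (e1 vs e3), gives the projections discussed on the following slides.
print(Z.shape)                    # (40, 3)
```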
Exercise:
The eigenvector e1 is composed of a mix of the main characteristics of the two faces and is hence explanatory of both. However, since the two faces have little in common, the two groups have different coordinates on e1, while within each subgroup the images with and without glasses have quasi-identical coordinates. Projecting onto e1 hence offers a means to compute a metric of similarity across the two persons.
(Figure: projection onto e1 against e2; person 1 and person 2, with and without glasses.)
Exercise:
When projecting onto e1 and e3, we can separate the images of person 1 with and without glasses, as the eigenvector e3 embeds features distinctive primarily of person 1.
(Figure: projection onto e1 against e3; person 1 and person 2, with and without glasses.)
Exercise:
Design a method to find the groups when you no longer have the class labels.
(Figure: projection onto the first two principal components after PCA, now without class labels.)
Sensitivity to Prior Knowledge
Priors:
• Data cluster within a circle.
• There are 2 clusters.
(Figure: dataset in (x1, x2, x3) separated into relevant data, grouped in two circular clusters, and outliers (noise).)
Sensitivity to Prior Knowledge
Priors:
• Data follow a complex distribution.
• There are 3 clusters.
(Figure: the same dataset grouped into three non-globular clusters.)
Clusters' Types
K-means produces globular clusters. DBSCAN produces non-globular clusters.
(Figure: examples of globular and non-globular clusters.)
What is Good Clustering?
Requirements for good clustering:
• Discovery of clusters with arbitrary shape
• Ability to deal with noise and outliers
• Insensitivity to the ordering of input records
• Scalability
• Ability to handle high dimensionality
• Interpretability and reusability
How to cluster?
What choice of model (circle, ellipse) for the cluster? How many models?
K-means Clustering
What choice of model for the cluster? A circle. How many models? A fixed number, K = 2. Where to place them for optimal clustering?
K-means clustering generates a number K of disjoint clusters so as to minimize:

$$ J = \min_{\mu_1, \ldots, \mu_K} \sum_{k=1}^{K} \sum_{x_i \in c_k} \left\| x_i - \mu_k \right\|^2 $$

where $x_i$ is the $i$-th data point, $\mu_k$ the geometric centroid, and $c_k$ the cluster label or number.
K-means Clustering
Initialization: initialize the positions of the centers of the clusters at random. (In mldemos, centroids are initialized on one datapoint each, with no overlap across centroids.)
K-means Clustering
Assignment Step (E-step):
• Calculate the distance from each data point to each centroid.
• Assign the responsibility of each data point to its "closest" centroid. If a tie happens (i.e. two centroids are equidistant from a data point), assign the data point to the winning centroid with the smallest index.

$$ k^i = \arg\min_k d(x_i, \mu_k), \qquad r_k^i = \begin{cases} 1 & \text{if } k = k^i \\ 0 & \text{otherwise} \end{cases} $$

where $r_k^i$ is the responsibility of cluster $k$ for point $x_i$.
K-means Clustering
Update Step (M-step): recompute the position of each centroid based on the assignment of the points:

$$ \mu_k = \frac{\sum_i r_k^i \, x_i}{\sum_i r_k^i} $$
K-means Clustering
The assignment and update steps are then repeated: each data point is reassigned to its closest centroid, and each centroid is recomputed as the mean of the points assigned to it.
K-means Clustering
Stopping Criterion: go back to the assignment step and repeat the process until the clusters are stable.
K-means Clustering
K-means creates a hard partitioning of the dataset.
(Figure: the hard partition; the boundary passes through the intersection points equidistant from the centroids.)
Effect of the distance metric on K-means
(Figure: K-means partitions obtained with the L1, L2, L3, and L8 norms.)
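To make the effect concrete, here is a minimal sketch (not from the lecture) of the assignment step under different Minkowski (Lp) norms; the points and centroids are made up.

```python
import numpy as np

def assign(X, centroids, p):
    # Lp (Minkowski) distance from every point to every centroid;
    # the p-th root is omitted since it does not change the argmin.
    d = (np.abs(X[:, None, :] - centroids[None, :, :]) ** p).sum(axis=2)
    return d.argmin(axis=1)          # index of the closest centroid

X = np.array([[0.0, 0.0], [1.0, 4.0], [4.0, 1.0], [5.0, 5.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
for p in (1, 2, 3, 8):               # the norms compared on this slide
    print(p, assign(X, centroids, p))
```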
K-means Clustering: Algorithm
1. Initialization: pick K arbitrary centroids and set their geometric means to random values. (In mldemos, centroids are initialized on one datapoint each, with no overlap across centroids.)
2. Calculate the distance from each data point to each centroid.
3. Assignment Step (E-step): assign the responsibility of each data point to its "closest" centroid; if a tie happens (i.e. two centroids are equidistant from a data point), assign the data point to the winning centroid with the smallest index:
$$ k^i = \arg\min_k d(x_i, \mu_k), \qquad r_k^i = \begin{cases} 1 & \text{if } k = k^i \\ 0 & \text{otherwise} \end{cases} $$
4. Update Step (M-step): adjust each centroid to be the mean of the data points assigned to it:
$$ \mu_k = \frac{\sum_i r_k^i \, x_i}{\sum_i r_k^i} $$
5. Go back to step 2 and repeat the process until the clusters are stable.
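For concreteness, here is a minimal NumPy sketch of the algorithm above under the stated conventions (centroids initialized on distinct datapoints, ties broken toward the smallest centroid index); it does not attempt to reproduce mldemos exactly.

```python
import numpy as np

def kmeans(X, K, rng=np.random.default_rng(0), max_iter=100):
    # Step 1: initialize the K centroids on distinct datapoints (no overlap).
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Steps 2-3 (E-step): squared distances to each centroid; argmin
        # breaks ties in favor of the smallest centroid index.
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                     # Step 5: the clusters are stable
        labels = new_labels
        # Step 4 (M-step): move each centroid to the mean of its points.
        for k in range(K):
            if np.any(labels == k):
                mu[k] = X[labels == k].mean(axis=0)
    return mu, labels
```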
K-means Clustering
The K-means algorithm is a simple version of Expectation-Maximization applied to a model composed of isotropic Gaussian functions (see next lecture).
K-means Clustering: Properties
• There are always K clusters.
• The clusters do not overlap (soft K-means relaxes this assumption, see next slides).
• Each member of a cluster is closer to its own cluster's centroid than to any other centroid.
The algorithm is guaranteed to converge in a finite number of iterations, but it converges to a local optimum! It is hence very sensitive to the initialization of the centroids.
Soft K-means Clustering
Assignment Step (E-step):
• Calculate the distance from each data point to each centroid.
• Assign each centroid a responsibility for each data point.
Each data point $x_i$ is given a soft 'degree of assignment' to each of the means $\mu_k$:

$$ r_k^i = \frac{e^{-\beta \, d(\mu_k, x_i)}}{\sum_{k'} e^{-\beta \, d(\mu_{k'}, x_i)}} $$

where $r_k^i \in [0, 1]$ is the responsibility of cluster $k$ for point $x_i$, normalized over the clusters: $\sum_k r_k^i = 1$.
Soft K-means Clustering
Update Step (M-step): recompute the position of each centroid based on the assignment of the points. The model parameters, i.e. the means, are adjusted to match the weighted sample means of the data points that they are responsible for:

$$ \mu_k = \frac{\sum_i r_k^i \, x_i}{\sum_i r_k^i} $$

The update algorithm of soft K-means is identical to that of hard K-means, aside from the fact that the responsibilities for a particular cluster are now real numbers varying between 0 and 1.
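The two steps above fit in a few lines. Below is a sketch of one soft K-means iteration, assuming the squared Euclidean distance for d(., .); beta is the stiffness introduced on the next slide.

```python
import numpy as np

def soft_kmeans_step(X, mu, beta):
    # E-step: responsibilities r_k^i, normalized over the clusters.
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # d(mu_k, x_i)
    r = np.exp(-beta * d)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted sample means of the points each cluster is
    # responsible for.
    mu_new = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu_new, r
```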
Soft K-means Clustering
$\beta$ is the stiffness: $1/\beta$ measures the disparity across clusters. A small $\beta$ corresponds to a large $1/\beta$ (soft, spread-out responsibilities); a large $\beta$ corresponds to a small $1/\beta$ (sharp assignments, approaching hard K-means).
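A quick illustration of the stiffness, reusing the soft_kmeans_step sketch above on a made-up point lying closer to one of two centroids:

```python
import numpy as np

X = np.array([[1.0, 0.0]])                  # one point
mu = np.array([[0.0, 0.0], [3.0, 0.0]])     # two centroids
for beta in (0.1, 1.0, 10.0):
    _, r = soft_kmeans_step(X, mu, beta)
    # small beta: near-uniform responsibilities; large beta: near-hard
    print(beta, r[0])
```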
Soft K-means Clustering
(Figure: the soft K-means algorithm with a small (left), medium (center), and large (right) stiffness; panel values: 10, 5, 1.)
Soft K-means Clustering
(Figure: iterations of the soft K-means algorithm from the random initialization (left) to convergence (right), computed with β = 10.)
(Soft) K-means Clustering: Properties
Advantages:
• Computationally faster than other clustering techniques.
• Produces tighter clusters, especially if the clusters are globular.
• Guaranteed to converge.
Drawbacks:
• Does not work well with non-globular clusters.
• Sensitive to the choice of initial partitions: different initial partitions can result in different final clusters.
• Assumes a fixed number K of clusters. It is therefore good practice to run the algorithm several times with different values of K to determine the optimal number of clusters.
K-means Clustering: Weaknesses
• Unbalanced clusters: K-means takes into account only the distance between the means and the data points; it has no representation of the variance of the data within each cluster.
• Elongated clusters: K-means imposes a fixed shape (a sphere) for each cluster.
K-means Clustering: Weaknesses
Very sensitive to the choice of the number of clusters K and to the initialization (mldemos example).
K-means: Limitations
K-means would not be able to reject outliers.
(Figure: dataset in (x1, x2, x3) with relevant data and outliers (noise).)
K-means: Limitations
K-means would not be able to reject outliers: it assigns all datapoints to a cluster, so outliers get assigned to the closest cluster.
DBSCAN can identify outliers and can generate non-globular clusters.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
1. Pick a datapoint at random.
2. Count the datapoints within a distance ε of it.
3. If this count is below mdata, mark the datapoint as an outlier.
4. Go back to 1.
(Figure: outliers (noise) flagged in the dataset.)
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
1. Pick a datapoint at random.
2. Count the datapoints within a distance ε of it.
3. Assign each datapoint found to the same cluster.
4. Go back to 1.
(Figure: Cluster 1 grows from the ε-neighborhood.)
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
1. Pick a datapoint at random.
2. Count the datapoints within a distance ε of it.
3. Assign each datapoint found to the same cluster.
4. Merge two clusters if the distance between them is < ε.
(Figure: Cluster 1 and Cluster 2 merge into a single Cluster 1.)
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Hyperparameters:
• ε: size of the neighborhood
• mdata: minimum number of datapoints
(Figure: final clustering with Cluster 1, Cluster 2, and outliers (noise).)
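In practice one rarely codes DBSCAN by hand. Here is a short sketch using scikit-learn's implementation, whose eps and min_samples arguments play the roles of the ε and mdata hyperparameters above; the data is made up.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
blob = rng.normal(loc=0.0, scale=0.3, size=(50, 2))   # one dense cluster
noise = rng.uniform(-3.0, 3.0, size=(5, 2))           # sparse outliers
X = np.vstack([blob, noise])

db = DBSCAN(eps=0.5, min_samples=4).fit(X)
print(db.labels_)   # cluster index per point; -1 marks outliers (noise)
```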
Comparison: K-means / DBSCAN

                      K-means                  DBSCAN
Hyperparameters       K: number of clusters    ε: neighborhood size; mdata: min. number of datapoints
Computational cost    O(K*M)                   O(M*log(M)), M: number of datapoints
Type of cluster       Globular                 Non-globular (arbitrary shapes, non-linear boundaries)
Robustness to noise   Not robust               Robust to outliers within ε

K-means is computationally cheap. However, it is not robust to noise and produces only globular clusters. DBSCAN is computationally more intensive, but it can detect noise automatically and produces clusters of arbitrary shape. Both K-means and DBSCAN depend on choosing the hyperparameters well. To determine the hyperparameters, use evaluation methods for clustering (next).
Evaluation of Clustering Methods
Clustering methods rely on hyperparameters: the number of clusters, the elements in the cluster, the distance metric. We need to determine the goodness of these choices. But clustering is unsupervised classification: we do not know the real number of clusters or the data labels, so these choices are difficult to evaluate without ground truth.
Evaluation of Clustering Methods
Two types of measures: internal versus external measures.
Internal measures rely on measures of similarity: (low) intra-cluster distance versus (high) inter-cluster distance. Internal measures are problematic, as the metric of similarity is often already optimized by the clustering algorithm.
External measures rely on ground truth (class labels): given a (sub)set of known class labels, compute the similarity of the clusters to the class labels. In real-world data, it is often hard or infeasible to gather ground truth.
Internal Measure: RSS
The Residual Sum of Squares (RSS) is an internal measure (available in mldemos). It computes the distance (in norm 2) of each datapoint from its centroid, summed over all clusters:

$$ \mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \| x - \mu_k \|^2 $$
RSS for K-Means
The goal of K-means is to find cluster centers $\mu_k$ which minimize the distortion, and the RSS is exactly this measure of distortion:

$$ \mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \| x - \mu_k \|^2 $$

By increasing $K$ we decrease the RSS; so what is the optimal $K$ such that $\mathrm{RSS} \to 0$? $\mathrm{RSS} = 0$ when $K = M$: one then has as many clusters as datapoints! However, the RSS can still be used to determine an 'optimal' $K$ by monitoring the slope of its decrease as $K$ increases.
(Figure: RSS decreasing to 0 as $K$ grows toward $M$, for $M = 100$ datapoints in $N = 2$ dimensions.)
K-means Clustering: Examples
Procedure: run K-means while monotonically increasing the number of clusters; for each value, run K-means with several initializations and take the best run; use the RSS measure to quantify the improvement in clustering and determine a plateau. The optimal $k$ is at the 'elbow' of the curve.
(Figure: RSS curve for $M = 100$ datapoints in $N = 2$ dimensions; the elbow is at $k = 4$ clusters.)
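A sketch of this procedure with scikit-learn, on made-up data with four true clusters; KMeans calls the RSS `inertia_`, and `n_init` runs several initializations and keeps the best:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
centers = [(0, 0), (3, 0), (0, 3), (3, 3)]
X = np.vstack([rng.normal(c, 0.2, size=(25, 2)) for c in centers])

for K in range(1, 9):
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
    print(K, round(km.inertia_, 2))   # RSS vs K; the elbow appears at K = 4
```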
K-means with RSS: Examples
Cluster Analysis of Hedge Funds (fonds spéculatifs) [N. Das, 9th Int. Conf. on Computing in Economics and Finance, 2011]
There is no legal definition of hedge funds: they form a wide category of investment funds with high risk and high returns, and a variety of strategies for guiding the investment.
Research Question: classify the types of hedge funds based on the information provided to the client.
Data Dimensions (Features): asset class, size of the hedge fund, incentive fee, risk level, and liquidity of the hedge fund.
K-means with RSS: Examples
Procedure: run K-means while monotonically increasing the number of clusters; for each value, run K-means with several initializations and take the best run; use the RSS measure to quantify the improvement in clustering and determine a plateau (cutoff).
Number of Clusters (K): optimal results are found with 7 clusters.
K-means Clustering: Examples
The 'elbow' or 'plateau' method for choosing the optimal $k$ from the RSS curve can be unreliable for certain datasets. Which one is the 'optimal' $k$ here, $k = 2$ or $k = 11$? We don't know! We need an additional penalty or criterion.
(Figure: RSS curve with no clear elbow, for $M = 100$ datapoints in $N = 3$ dimensions.)
Other Metrics to Evaluate Clustering Methods
AIC and BIC determine how well the model fits the dataset in a probabilistic sense (a maximum-likelihood measure). The measure is balanced by how many parameters are needed to obtain a good fit:

$$ \text{Akaike Information Criterion:} \quad AIC = -2\ln(L) + 2B $$
$$ \text{Bayesian Information Criterion:} \quad BIC = -2\ln(L) + \ln(M)\, B $$

where $L$ is the maximum likelihood of the model, $B$ the number of free parameters, and $M$ the number of datapoints. The second term is a penalty for the increase in computational cost due to the number of parameters and the number of datapoints.
As the number of datapoints (observations) increases, BIC assigns more weight to simpler models than AIC does. A low BIC implies either fewer explanatory variables, a better fit, or both.
Choosing AIC versus BIC depends on the application: is the purpose of the analysis to make predictions, or to decide which model best represents reality? AIC may have better predictive ability than BIC, but BIC finds a computationally more efficient solution.
AIC for K-Means
For the particular case of K-means, we do not have a maximum-likelihood estimate of the model, so we cannot evaluate
$$ AIC = -2\ln(L) + 2B $$
with $L$ the likelihood of the model and $B$ the number of free parameters. However, we can formulate a metric based on the RSS that penalizes for model complexity (the number $K$ of clusters), conceptually following AIC:
$$ AIC_{RSS} = \mathrm{RSS} + 2B, \qquad \mathrm{RSS} = \sum_{k=1}^{K} \sum_{x \in C_k} \| x - \mu_k \|^2 $$
where the number of free parameters is $B = K \cdot N$, with $K$ the number of clusters and $N$ the number of dimensions; the weighting factor of $B$ penalizes model complexity.
BIC for K-Means
For the particular case of K-means, we do not have a maximum-likelihood estimate of the model, so we cannot evaluate
$$ BIC = -2\ln(L) + \ln(M)\, B $$
However, we can formulate a metric based on the RSS that penalizes for model complexity (the number $K$ of clusters and the number $M$ of datapoints), conceptually following BIC:
$$ BIC_{RSS} = \mathrm{RSS} + \ln(M)\, B $$
where $B = K \cdot N$, with $K$ the number of clusters and $N$ the number of dimensions; the weighting factor $\ln(M)$ penalizes with respect to the number of datapoints (i.e. computational complexity).
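A sketch computing these RSS-based scores on made-up two-cluster data, with B = K·N free parameters; the optimal K minimizes the score:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ((0, 0), (4, 4))])
M, N = X.shape

for K in range(1, 7):
    rss = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).inertia_
    B = K * N                                   # number of free parameters
    aic_rss = rss + 2 * B
    bic_rss = rss + np.log(M) * B
    print(K, round(aic_rss, 1), round(bic_rss, 1))   # both minimal at K = 2
```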
K-means Clustering: Examples
Procedure: run K-means while monotonically increasing the number of clusters; for each value, run K-means with several initializations and take the best run; use the AIC/BIC curves to find the optimal $k$, which is at $\min(AIC)$ or $\min(BIC)$.
(Figure: for $M = 100$ datapoints in $N = 3$ dimensions, both $\min(BIC)$ and $\min(AIC)$ give $k = 2$ clusters.)
BIC for K-Means
(Figure: $BIC_{RSS} = \mathrm{RSS} + \ln(M)(K \cdot N)$ for a clustering with $K = 14$ clusters of $M = 100$ datapoints in $N = 2$ dimensions.)
BIC for K-Means
(Figure: $BIC_{RSS} = \mathrm{RSS} + \ln(M)(K \cdot N)$ for a clustering with $K = 4$ clusters of the same $M = 100$ datapoints in $N = 2$ dimensions.)
AIC / BIC for DBSCAN
Compute the centroid of each cluster and apply the AIC/BIC of K-means:

         DBSCAN large ε    DBSCAN medium ε    DBSCAN small ε
RSS      43                26                 0.5
BIC      42                34                 78
AIC      69                51                 24
AIC / BIC for DBSCAN
Compute the centroid of each cluster and apply the AIC/BIC of K-means:

         K-means    DBSCAN large ε    DBSCAN medium ε    DBSCAN small ε
RSS      51         95                59                 0.6
BIC      65         118               88                 331
AIC      55         102               67                 93
Evaluation of Clustering Methods
Two types of measures: internal versus external measures.
External measures assume that a subset of datapoints has class labels (semi-supervised learning). They measure how well these labeled datapoints are clustered. This requires an idea of the number of existing classes and some labeled datapoints, so it is interesting mainly in cases where labeling is highly time-consuming and the data is very large (e.g. in speech recognition).
Semi-Supervised Learning
Clustering F1-Measure (careful: similar to, but not the same as, the F-measure we will see for classification!)
It trades off clustering all datapoints of the same class into the same cluster against making sure that each cluster contains points of only one class.

$$ F_1(C, K) = \sum_{c_i \in C} \frac{|c_i|}{M} \max_{k} F_1(c_i, k) $$

$$ F_1(c_i, k) = \frac{2\, R(c_i, k)\, P(c_i, k)}{R(c_i, k) + P(c_i, k)}, \qquad R(c_i, k) = \frac{n_{ik}}{|c_i|}, \qquad P(c_i, k) = \frac{n_{ik}}{|k|} $$

where $M$ is the number of labeled datapoints, $C$ the set of classes, $K$ the number of clusters, and $n_{ik}$ the number of members of class $c_i$ in cluster $k$.
Recall $R(c_i, k)$: the proportion of the labeled datapoints of class $c_i$ that end up in cluster $k$ (correctly clustered). Precision $P(c_i, k)$: the proportion of the datapoints of cluster $k$ that belong to class $c_i$.
(Figure: two classes (class 1, class 2) with labeled and unlabeled datapoints, grouped into two clusters $k_1$ and $k_2$.)

$$ R(c_1, k_1) = \frac{2}{2}, \quad R(c_2, k_2) = \frac{4}{4}, \qquad P(c_1, k_1) = \frac{2}{6}, \quad P(c_2, k_2) = \frac{4}{6} $$
For each class, one picks the cluster with the maximal F1-measure, and each class is weighted by its fraction of the labeled points:

$$ F_1(C, K) = \frac{2}{6} F_1(c_1, k_1) + \frac{4}{6} F_1(c_2, k_2) = 0.7 $$
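A sketch of the clustering F1-measure as defined above. One modelling choice is hedged here: following the worked example, the precision denominator |k| counts all points in the cluster, including unlabeled ones.

```python
import numpy as np

def clustering_f1(classes, clusters):
    # classes[i]: class label of point i, or -1 if the point is unlabeled;
    # clusters[i]: cluster assigned to point i.
    labeled = classes >= 0
    M = labeled.sum()                       # number of labeled datapoints
    score = 0.0
    for c in np.unique(classes[labeled]):
        in_c = labeled & (classes == c)
        best = 0.0
        for k in np.unique(clusters):
            n_ck = np.sum(in_c & (clusters == k))
            if n_ck == 0:
                continue
            R = n_ck / in_c.sum()               # recall R(c, k)
            P = n_ck / np.sum(clusters == k)    # precision P(c, k)
            best = max(best, 2 * R * P / (R + P))
        score += in_c.sum() / M * best      # weight by |c|/M, max over k
    return score

# The worked example: cluster 1 holds 2 labeled class-1 points plus 4
# unlabeled; cluster 2 holds 4 labeled class-2 points plus 2 unlabeled.
classes  = np.array([1, 1, -1, -1, -1, -1, 2, 2, 2, 2, -1, -1])
clusters = np.array([1, 1,  1,  1,  1,  1, 2, 2, 2, 2,  2,  2])
print(clustering_f1(classes, clusters))   # -> 0.7 (up to float rounding)
```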
Summary of F1-Measure
The clustering F1-measure picks, for each class, the cluster with the maximal F1-measure, weighting each class by its fraction of the labeled points. Recall is the proportion of the datapoints of a class that are correctly clustered; precision is the proportion of the datapoints of the cluster that belong to the class. The measure trades off clustering all datapoints of the same class into the same cluster against making sure that each cluster contains points of only one class. (Careful: similar to, but not the same as, the F-measure we will see for classification!)
Summary of Lecture
We introduced two clustering techniques, K-means and DBSCAN, and discussed their pros and cons in terms of computational time and power of representation (globular/non-globular clusters). We also introduced metrics to evaluate clustering and to help choose the hyperparameters:
• Internal measures (RSS, AIC, BIC)
• External measures: the F1-measure (also called the F-measure for clustering)
Next week: Practical on Clustering. You will compare the performance of K-means and DBSCAN on your datasets and use the internal and external measures to assess this performance and choose the hyperparameters.
Robotic Application of Clustering Methods
Humans display a variety of hand postures when grasping objects. How can we generate correct hand postures on robots?
El-Khoury, S., Li, M. and Billard, A. (2013) On the Generation of a Variety of Grasps. Robotics and Autonomous Systems.
Robotic Application of Clustering Methods
Platforms: a 4-DOF industrial hand (Barrett Technology) and a 9-DOF humanoid hand (the iCub robot).
Problem: choose the points of contact and generate a feasible posture for the fingers to touch the object at the correct points and with the desired force.
Difficulty: high degrees of freedom (a large number of possible points of contact, a large number of DOFs to control).
Formulate the problem as constraint-based optimization: minimize the generated torques at the fingertips under the constraints of force closure, kinematic feasibility, and collision avoidance.
The nonconvex optimization yields several local/feasible solutions: from 1890 trials, the two hands converge to 791 and 612 feasible solutions, taking ~2.65 s and ~12.14 s per solution, respectively. This takes too long for a realistic application.
Apply K-means to all solutions and group them into clusters.
(Figure: grasp solutions grouped into 11 clusters and into 20 clusters.)
A. Shukla and A. Billard, NIPS 2012