Distance metric learning, with application to clustering with side-information
Eric P. Xing, Andrew Y. Ng, Michael I. Jordan and Stuart Russell
University of California, Berkeley
{epxing,ang,jordan,russell}@cs.berkeley.edu
Neural Information Processing Systems 2002
2004/9 2
Abstract
Many algorithms rely critically on being given a good metric over the input space.
– The aim is to provide a more systematic way for users to indicate what they consider “similar”.
If a clustering algorithm fails to find a clustering that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found.
In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in $\mathbb{R}^n$, learns a distance metric over $\mathbb{R}^n$ that respects these relationships.
Introduction
Many algorithms need to be given good metrics that reflect reasonably well the important relationships between the data.
– K-means, nearest-neighbors, SVMs, and so on.
Many learning and data mining algorithms depend on a good metric; the problem is particularly acute in unsupervised settings such as clustering.
One important family of algorithms that learn metrics are the unsupervised ones that take an input dataset and find an embedding of it in some space, such as:
– Multidimensional Scaling (MDS)
– Locally Linear Embedding (LLE)
• These have the “no right answer” problem
• In this respect they are similar to Principal Components Analysis (PCA)
Introduction
In clustering with similarity information, the algorithm is told that certain pairs are “similar” or “dissimilar”, and it searches for a clustering that puts the similar pairs into the same cluster and the dissimilar pairs into different clusters.
This gives a way of using similarity side-information to find clusters that reflect a user’s notion of meaningful clusters.
Learning Distance Metrics
Suppose we have some set of points $\{x_i\}_{i=1}^m \subseteq \mathbb{R}^n$, and are given information that certain pairs of them are “similar”:
$$S:\ (x_i, x_j) \in S \ \text{ if } x_i \text{ and } x_j \text{ are similar}$$
How can we learn a distance metric d(x,y) between points x and y that respects this; specifically, so that “similar” points end up close to each other?
Consider learning a distance metric of the form
$$d(x,y) = d_A(x,y) = \|x - y\|_A = \sqrt{(x-y)^T A (x-y)}$$
Learning Distance Metrics
– To ensure that this distance is a metric (satisfying non-negativity and the triangle inequality), we require that A be positive semi-definite, $A \succeq 0$.
– Setting A = I gives Euclidean distance.
– If we restrict A to be diagonal, this corresponds to learning a metric in which the different axes are given different “weights”.
– More generally, A parameterizes a family of Mahalanobis distances over $\mathbb{R}^n$.
– Learning such a distance metric is also equivalent to finding a rescaling of the data that replaces each point x with $A^{1/2}x$ and applying the standard Euclidean metric to the rescaled data.
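The equivalence between the learned distance and Euclidean distance on rescaled data can be checked numerically. The sketch below is illustrative only: the matrix `A`, the two points, and the helper names `mahalanobis` and `psd_sqrt` are assumptions made for this example and do not come from the slides.

```python
import numpy as np

def mahalanobis(x, y, A):
    """d_A(x, y) = sqrt((x - y)^T A (x - y)) for a positive semi-definite A."""
    d = x - y
    return float(np.sqrt(d @ A @ d))

def psd_sqrt(A):
    """Symmetric square root A^{1/2} of a PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)        # guard against tiny negative round-off
    return (V * np.sqrt(w)) @ V.T    # V diag(sqrt(w)) V^T

# Hypothetical example values.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

A_half = psd_sqrt(A)
d_metric = mahalanobis(x, y, A)                               # distance under A
d_rescaled = float(np.linalg.norm(A_half @ x - A_half @ y))   # Euclidean after rescaling
d_identity = mahalanobis(x, y, np.eye(2))                     # A = I: plain Euclidean
```

Here `d_metric` and `d_rescaled` agree up to floating-point error, which is the rescaling property stated in the last bullet.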
Learning Distance Metrics
A simple way of defining a criterion for the desired metric is to demand that pairs of points $(x_i, x_j)$ in S have, say, small squared distance between them:
$$\min_A \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2$$
This is trivially solved with A = 0, which is not useful, so we add the constraint:
$$\sum_{(x_i,x_j)\in D} \|x_i - x_j\|_A \ge 1$$
– This ensures that A does not collapse the dataset into a single point.
– Here, D can be a set of pairs of points known to be “dissimilar”; if no such information is available, D can be taken to be all pairs not in S.
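To make the criterion and the constraint concrete, here is a minimal sketch that evaluates both for a few candidate matrices. The toy points, the pair lists, and the function names `objective` and `constraint` are hypothetical choices for illustration.

```python
import numpy as np

def sq_dist_A(x, y, A):
    """Squared distance ||x - y||_A^2 = (x - y)^T A (x - y)."""
    d = x - y
    return float(d @ A @ d)

def objective(X, S, A):
    """Sum of squared A-distances over the similar pairs in S."""
    return sum(sq_dist_A(X[i], X[j], A) for i, j in S)

def constraint(X, D, A):
    """Sum of (non-squared) A-distances over the pairs in D; feasible if >= 1."""
    return sum(np.sqrt(sq_dist_A(X[i], X[j], A)) for i, j in D)

# Hypothetical toy data: similar pairs differ only along axis 1,
# dissimilar pairs differ only along axis 0.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
S = [(0, 1), (2, 3)]
D = [(0, 2), (1, 3)]

A_zero = np.zeros((2, 2))       # trivial solution: objective 0, but constraint violated
A_axis0 = np.diag([1.0, 0.0])   # weights only the axis that separates dissimilar pairs
A_eye = np.eye(2)               # ordinary Euclidean distance
```

With `A_zero` both sums are zero, so the constraint rules it out; `A_axis0` keeps the objective at zero on this toy data while satisfying the constraint.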
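Learning Distance Metrics
This gives the optimization problem:
$$\min_A \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 \quad \text{s.t.} \quad \sum_{(x_i,x_j)\in D} \|x_i - x_j\|_A \ge 1, \quad A \succeq 0$$
• The optimization problem is convex, which enables us to derive efficient, local-minima-free algorithms to solve it.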
Note. Newton-Raphson Method: [equations not preserved in this transcript]
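Learning Distance Metrics
The case of diagonal A
– In the case that we want to learn a diagonal $A = \mathrm{diag}(A_{11}, A_{22}, \ldots, A_{nn})$, we can derive an efficient algorithm using the Newton-Raphson method to optimize
$$g(A) = g(A_{11}, \ldots, A_{nn}) = \sum_{(x_i,x_j)\in S} \|x_i - x_j\|_A^2 - \log\Big(\sum_{(x_i,x_j)\in D} \|x_i - x_j\|_A\Big)$$
– To keep $A \succeq 0$, the Newton update $H^{-1}\nabla g$ is replaced by $\alpha H^{-1}\nabla g$, where $\alpha$ is a step-size parameter optimized via a line-search to give the largest downhill step subject to $A_{ii} \ge 0$.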
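A damped Newton-Raphson iteration for the diagonal case can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it takes g to be the sum of squared distances over S minus the log of the summed distances over D, and it replaces the exact line search with simple step halving. The function names, the `ridge` and `eps` safeguards, and the toy data are all hypothetical.

```python
import numpy as np

def g_grad_hess(a, sq_S, sq_D, eps=1e-12):
    """Value, gradient and Hessian of
    g(a) = sum_S ||x_i - x_j||_A^2 - log(sum_D ||x_i - x_j||_A), with A = diag(a).
    sq_S, sq_D have shape (num_pairs, n) and hold elementwise squared differences."""
    s = sq_S.sum(axis=0)                          # per-axis sum over S
    r = np.sqrt(sq_D @ a + eps)                   # A-distance of each pair in D
    T = r.sum()
    grad_T = (sq_D / (2.0 * r[:, None])).sum(axis=0)
    g = a @ s - np.log(T)
    grad = s - grad_T / T
    curv = (sq_D.T / (4.0 * r ** 3)) @ sq_D       # minus the Hessian of T
    hess = curv / T + np.outer(grad_T, grad_T) / T ** 2
    return g, grad, hess

def learn_diagonal_metric(X, S, D, iters=100, ridge=1e-10):
    sq_S = np.array([(X[i] - X[j]) ** 2 for i, j in S])
    sq_D = np.array([(X[i] - X[j]) ** 2 for i, j in D])
    a = np.ones(X.shape[1])                       # start from the Euclidean metric
    for _ in range(iters):
        g, grad, hess = g_grad_hess(a, sq_S, sq_D)
        step = np.linalg.solve(hess + ridge * np.eye(a.size), grad)
        alpha, accepted = 1.0, False
        while alpha > 1e-12:                      # halve alpha until feasible and downhill
            a_new = a - alpha * step
            if np.all(a_new >= 0.0) and g_grad_hess(a_new, sq_S, sq_D)[0] < g:
                a, accepted = a_new, True
                break
            alpha *= 0.5
        if not accepted:                          # no improving feasible step: stop
            break
    return a

# Hypothetical toy data: three clusters of two points each.
X = np.array([[0.0, 0.0], [0.1, 0.4],
              [3.0, 0.0], [3.1, 0.4],
              [0.0, 3.0], [0.1, 3.4]])
S = [(0, 1), (2, 3), (4, 5)]
D = [(i, j) for i in range(6) for j in range(i + 1, 6) if (i, j) not in S]
a_learned = learn_diagonal_metric(X, S, D)        # learned diagonal weights A_ii
```

Step halving only approximates the "largest downhill step" and can make slow progress when the optimum lies on the boundary $A_{ii} = 0$; a more careful line search would be needed in that case.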
Learning Distance Metrics
The case of full A
– Newton’s method often becomes prohibitively expensive, requiring $O(n^6)$ time to invert the Hessian over $n^2$ parameters.
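The $O(n^6)$ figure follows from a short count, assuming a standard cubic-time matrix inversion (the numerical example with n = 30 is only an illustration):

```latex
\text{parameters of a full } A:\ n^2
\quad\Rightarrow\quad
H \in \mathbb{R}^{n^2 \times n^2}
\quad\Rightarrow\quad
\text{cost of inverting } H = O\big((n^2)^3\big) = O(n^6)
% Example: n = 30 gives n^2 = 900 parameters and 900^3 = 7.29 \times 10^8 operations, up to constants.
```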
Learning Distance Metrics
Experiments and Examples
In the experiments with synthetic data, S was a randomly sampled 1% of all pairs of similar points.
We can use the fact discussed earlier that learning $\|\cdot\|_A$ is equivalent to finding a rescaling of the data $x \to A^{1/2}x$.
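The sampling of S described for the synthetic experiments can be sketched as below. The cluster labels, the `seed` argument, and the function name `sample_similar_pairs` are assumptions made for this example; the slides do not say how the sampling was implemented.

```python
import itertools
import numpy as np

def sample_similar_pairs(labels, fraction=0.01, seed=0):
    """Randomly sample a fraction of all same-label pairs to form S."""
    labels = np.asarray(labels)
    # Enumerate every pair of points that share a label ("similar" pairs).
    similar = [(i, j)
               for i, j in itertools.combinations(range(len(labels)), 2)
               if labels[i] == labels[j]]
    k = max(1, int(fraction * len(similar)))      # at least one pair
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(similar), size=k, replace=False)
    return [similar[t] for t in chosen]

# Hypothetical ground truth: two clusters of 50 points each.
labels = np.repeat([0, 1], 50)
S = sample_similar_pairs(labels)                  # 1% of the 2450 similar pairs
```

Only the index pairs are sampled here; the points themselves, and any learned matrix A used to rescale them, are handled separately.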
Experiments and Examples - Examples of learned distance metrics
Experiments and Examples - Application to clustering
Clustering with side information:
– We are given S, and told that each pair $(x_i, x_j) \in S$ means $x_i$ and $x_j$ belong to the same cluster.
Let $\hat{c}_i$ ($i = 1, \ldots, m$) be the cluster to which point $x_i$ is assigned by an automatic clustering algorithm, and let $c_i$ be some “correct” or desired clustering of the data.
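Given the two labelings, one natural way to score a clustering is the fraction of point pairs on which $\hat{c}$ and $c$ agree about "same cluster" versus "different cluster". The scoring formula used on the slides is not preserved in this transcript, so treat this measure, and the function name `pairwise_accuracy`, as assumptions.

```python
import numpy as np

def pairwise_accuracy(c_true, c_pred):
    """Fraction of pairs (i, j), i < j, on which both labelings agree
    that the two points are in the same cluster or in different clusters."""
    c_true = np.asarray(c_true)
    c_pred = np.asarray(c_pred)
    same_true = c_true[:, None] == c_true[None, :]
    same_pred = c_pred[:, None] == c_pred[None, :]
    iu = np.triu_indices(len(c_true), k=1)        # each unordered pair once
    return float(np.mean(same_true[iu] == same_pred[iu]))

# Hypothetical labelings of four points.
acc_relabelled = pairwise_accuracy([0, 0, 1, 1], [1, 1, 0, 0])  # same partition, renamed labels
acc_crossed = pairwise_accuracy([0, 0, 1, 1], [0, 1, 0, 1])     # a different partition
```

Because only pair relationships are compared, the score does not depend on how the clusters are numbered.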
Conclusions
We have presented an algorithm that, given examples of similar pairs of points in $\mathbb{R}^n$, learns a distance metric that respects these relationships.
The method is based on posing metric learning as a convex optimization problem, which allowed us to derive efficient, local-optima-free algorithms.
We also showed examples of diagonal and full metrics learned from simple artificial examples, and demonstrated on artificial and on UCI datasets how our methods can be used to improve clustering performance.