![Page 1: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/1.jpg)
An Impossibility Theorem for Clustering
By Jon Kleinberg
![Page 2: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/2.jpg)
Definitions
Clustering function: operates on a set S of two or more points and the distances among them
$f(S, d) = \Gamma$, where $\Gamma$ is a partition of S
Distance function: $d : S \times S \to \mathbb{R}$
the distance is 0 only for d(i,i)
Does not require the triangle inequality.
![Page 3: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/3.jpg)
Many different clustering criteria
k-center
k-median
k-means
Inter-Intra
etc.
![Page 4: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/4.jpg)
k-Center
Minimize the maximum distance from any point to its nearest center
![Page 5: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/5.jpg)
k-median
Minimize the average distance from each point to its nearest center
k-means: minimize the average squared distance from each point to its nearest center
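The three centroid criteria can rank the same candidate centers differently. The sketch below is only an illustration: the 1-D points, the two candidate center sets, and the function names are assumptions, not taken from the slides. It uses sums rather than averages, which gives the same ranking for a fixed number of points.

```python
def nearest_center_distances(points, centers):
    """Distance from each 1-D point to its nearest center."""
    return [min(abs(p - c) for c in centers) for p in points]

def k_center_cost(points, centers):
    # Worst-case distance to the nearest center.
    return max(nearest_center_distances(points, centers))

def k_median_cost(points, centers):
    # Total distance to the nearest center.
    return sum(nearest_center_distances(points, centers))

def k_means_cost(points, centers):
    # Total squared distance to the nearest center.
    return sum(x ** 2 for x in nearest_center_distances(points, centers))

points = [0, 1, 4, 9, 10, 12]      # hypothetical sample
option_a = [1, 10]
option_b = [2, 10.5]

for name, cost in [("k-center", k_center_cost),
                   ("k-median", k_median_cost),
                   ("k-means", k_means_cost)]:
    print(name, cost(points, option_a), cost(points, option_b))
```

On this sample, k-center and k-means prefer `option_b` while k-median prefers `option_a`, so the choice of criterion changes which centers look best.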
![Page 6: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/6.jpg)
Inter-Intra
[Figure: a clustering C annotated with T(C) and D(C)]
Maximize D(C) – T(C)
![Page 7: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/7.jpg)
Motivation
Each criterion optimizes different features
Is there one clustering criterion with phenomenal cosmic powers?
![Page 8: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/8.jpg)
Method
Give three intuitive axioms that any criterion should satisfy
Surprise: Not possible to satisfy all three
Reminiscent of Arrow’s Impossibility Theorem: no voting rule for ranking candidates can satisfy a small set of natural axioms at once
![Page 9: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/9.jpg)
Axiom 1 – Scale-Invariance
For any distance function d and any β > 0 we have that f(S,d) = f(S,βd)
![Page 10: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/10.jpg)
Axiom 2 - Richness
Range(f) is equal to the set of all partitions of S
i.e., every possible clustering can be generated given the right distances
![Page 11: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/11.jpg)
Axiom 3 - Consistency
Let d and d’ be two distance functions. If f(d) = Γ and d’ is such that the distance between any two points in the same cluster of Γ is no larger than in d, and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ
[Figure: within a cluster, d’(i,j) ≤ d(i,j); between clusters, d’(i,j) ≥ d(i,j)]
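The condition that the Consistency axiom places on d’ can be checked mechanically for a given clustering. The following is a minimal sketch; the function names, the four sample points, and the two modified distance functions are illustrative assumptions.

```python
from itertools import combinations

def cluster_of(partition):
    """Map each point to the index of the cluster that contains it."""
    return {p: idx for idx, cluster in enumerate(partition) for p in cluster}

def is_consistent_transformation(d, d_new, partition):
    """True if d_new never increases a within-cluster distance and
    never decreases a between-cluster distance, relative to d."""
    label = cluster_of(partition)
    for i, j in combinations(sorted(label), 2):
        same = label[i] == label[j]
        if same and d_new(i, j) > d(i, j):
            return False
        if not same and d_new(i, j) < d(i, j):
            return False
    return True

# Hypothetical example: two clusters on a line.
partition = [{0, 1}, {10, 11}]
label = cluster_of(partition)
d = lambda i, j: abs(i - j)
# Shrink inside clusters, stretch between clusters: allowed.
d_ok = lambda i, j: d(i, j) / 2 if label[i] == label[j] else 2 * d(i, j)
# Shrink everything: between-cluster distances decrease, so not allowed.
d_bad = lambda i, j: d(i, j) / 2

print(is_consistent_transformation(d, d_ok, partition))   # True
print(is_consistent_transformation(d, d_bad, partition))  # False
```

A clustering function satisfies Consistency if its output is unchanged for every d’ that passes this check against its own output on d.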
![Page 12: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/12.jpg)
Definition
Anti-chain: A collection of partitions is an anti-chain if it does not contain two distinct partitions such that one is a refinement of the other
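A clustering function whose range is an anti-chain cannot satisfy Richness: the set of all partitions contains pairs where one refines the other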
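The anti-chain definition can be tested directly on small collections of partitions. This is a sketch under assumed names (`is_refinement`, `is_antichain`) and an assumed three-point example.

```python
from itertools import combinations

def normalize(partition):
    """Represent a partition as a hashable set of frozensets."""
    return frozenset(frozenset(c) for c in partition)

def is_refinement(fine, coarse):
    """True if every cluster of `fine` lies inside some cluster of `coarse`."""
    return all(any(a <= b for b in coarse) for a in fine)

def is_antichain(partitions):
    parts = {normalize(p) for p in partitions}  # keep distinct partitions only
    for p, q in combinations(parts, 2):
        if is_refinement(p, q) or is_refinement(q, p):
            return False
    return True

# All two-cluster partitions of {1, 2, 3}: none refines another.
two_blocks = [[{1}, {2, 3}], [{2}, {1, 3}], [{3}, {1, 2}]]
# Adding the one-cluster partition breaks the property,
# because every partition refines it.
everything = two_blocks + [[{1, 2, 3}]]

print(is_antichain(two_blocks))   # True
print(is_antichain(everything))   # False
```

The second call shows why a range containing all partitions can never be an anti-chain once there are at least two points.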
![Page 13: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/13.jpg)
Main Result
For each $n \ge 2$, there is no clustering function f that satisfies Scale-Invariance, Richness and Consistency
Implied by proof that if f satisfies Scale-Invariance and Consistency, then Range(f) is an anti-chain
![Page 14: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/14.jpg)
Reminder of Axioms
Scale-Invariance: For any distance function d and any β > 0 we have that f(d) = f(βd)
Richness: Range(f) is equal to the set of all partitions of S
Consistency: Let d and d’ be two distance functions. If f(d) = Γ and d’ is such that the distance between any two points in the same cluster of Γ is no larger than in d, and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ
![Page 15: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/15.jpg)
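Single Linkage
Cluster by repeatedly merging the two closest clusters, where the distance between two clusters is the distance between their closest points
[Figure: points on a line at 0, 1, 4, 9, 10, 12, 15, 19, 20]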
![Page 16: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/16.jpg)
Any two axioms
For every pair of axioms, there is a stopping condition for single linkage that satisfies both
Consistency + Richness: only link if distance is less than r
Consistency + SI (Scale-Invariance): stop when you have k connected components
Richness + SI: if x is the diameter of the graph, only add edges with weight at most βx, for a fixed β < 1
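The three stopping conditions can be compared side by side on the points from the single-linkage slide. The code below is a rough sketch, not the presenter's implementation: the function name, the `stop` mode strings, and the parameter values (k = 3, r = 3, β = 0.22) are assumptions chosen for illustration.

```python
from itertools import combinations

def single_linkage(points, d, stop, param):
    """Single linkage with one of three stopping rules:
    'k'     - stop once `param` clusters remain
    'r'     - only link pairs at distance < `param`
    'scale' - only link pairs at distance <= `param` * (max pairwise distance)
    """
    parent = {p: p for p in points}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    edges = sorted(combinations(points, 2), key=lambda e: d(*e))
    diameter = max(d(i, j) for i, j in edges)
    n_clusters = len(points)
    for i, j in edges:  # closest pairs first
        if stop == 'k' and n_clusters <= param:
            break
        if stop == 'r' and d(i, j) >= param:
            break
        if stop == 'scale' and d(i, j) > param * diameter:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            n_clusters -= 1
    clusters = {}
    for p in points:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(sorted(c) for c in clusters.values())

pts = [0, 1, 4, 9, 10, 12, 15, 19, 20]
d = lambda i, j: abs(i - j)
d_scaled = lambda i, j: 3 * abs(i - j)   # same points, all distances tripled

print(single_linkage(pts, d, 'k', 3))
print(single_linkage(pts, d, 'r', 3))
print(single_linkage(pts, d_scaled, 'r', 3))
print(single_linkage(pts, d, 'scale', 0.22))
```

Tripling every distance leaves the 'k' and 'scale' outputs unchanged, but the 'r' rule then links nothing and returns singletons, which is how the distance-r rule fails Scale-Invariance.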
![Page 17: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/17.jpg)
Centroid-Based Clustering
(k,g)-centroid clustering function: Choose T, a set of k centroid points, such that $\sum_{i \in S} g(d(i,T))$ is minimized, where $d(i,T)$ is the distance from i to its nearest centroid in T
If g is the identity, we get k-median, etc.
Result: For every $k \ge 2$, every function g, and n significantly larger than k, the (k,g)-centroid clustering function does not satisfy Consistency.
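For very small inputs, the (k,g)-centroid objective can be minimized by brute force, which makes the definition concrete. This is a sketch only: the function name `kg_centroid`, the sample points, and the exhaustive search are assumptions, and the search is exponential in k.

```python
from itertools import combinations

def kg_centroid(points, d, k, g):
    """Try every k-subset T of the points and keep the one that
    minimizes the sum over i of g(d(i, T)), with d(i, T) the distance
    from i to its nearest centroid in T."""
    def cost(T):
        return sum(g(min(d(i, t) for t in T)) for i in points)

    best = min(combinations(points, k), key=cost)
    # Assign each point to its nearest chosen centroid.
    clusters = {t: [] for t in best}
    for i in points:
        nearest = min(best, key=lambda t: d(i, t))
        clusters[nearest].append(i)
    return best, sorted(clusters.values())

points = [0, 1, 4, 9, 10, 12]          # hypothetical sample
d = lambda i, j: abs(i - j)

identity = lambda x: x                  # k-median-style objective
square = lambda x: x ** 2               # k-means-style objective

print(kg_centroid(points, d, 2, identity))
print(kg_centroid(points, d, 2, square))
```

Swapping `g` changes only how distances are penalized; the search over centroid sets and the nearest-centroid assignment stay the same.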
![Page 18: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/18.jpg)
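Proof: A contradiction
[Figure: two groups of points, X (size m) and Y (size λm). Points within X are at distance r, points within Y are at distance ε, and points in X are at distance r+δ from points in Y.]
With one center in X and one in Y:
$\sum_{i \in S} g(d(i,T)) \approx m\,g(r) + \lambda m\,g(\varepsilon)$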
![Page 19: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/19.jpg)
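A new distance function
[Figure: X is split into X0 (size m/2) and X1 (size m/2). Points within X0 or within X1 are at distance r’; points in X0 are at distance r from points in X1. Y (size λm) is unchanged: distance ε within Y and r+δ to X.]
r’ < r
With one center in X0 and one in X1:
$\sum_{i \in S} g(d(i,T)) \approx m\,g(r') + \lambda m\,g(r+\delta)$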
![Page 20: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/20.jpg)
Wrapping Up
If we pick λ, r, r’, ε and δ right then we can have:
$m\,g(r) + \lambda m\,g(\varepsilon) > m\,g(r') + \lambda m\,g(r+\delta)$
But then our new centers are in X0 and X1
But the new distance function only shrinks distances inside the cluster X, so Consistency says it should still give us X and Y.
This covers the case where k is 2.
![Page 21: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/21.jpg)
Discussion: Relaxing Axioms
Refinement-consistency: if d’ is an f(d)-transformation of d, then f(d’) is a refinement of f(d)
Near-Richness: all partitions except the trivial one can be obtained
With these two relaxations in place of Consistency and Richness, a clustering function satisfying all the axioms exists.
What other relaxations could we have?
![Page 22: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/22.jpg)
Discussion
Does this mean there is a law of continuous employment for clustering criterion creators?
Is the clustering function properly defined?
  Allow overlaps
  Allow outliers
Are these the right axioms?
  All partitions possible vs. power set
Axioms for graph clustering?
![Page 23: An Impossibility Theorem for Clustering By Jon Kleinberg](https://reader036.vdocuments.us/reader036/viewer/2022081419/56649e555503460f94b4c5c7/html5/thumbnails/23.jpg)
Questions?