clustering with diversityyike/clusterslides.pdf · recall ‘-diversity: given a set v of points in...

46
1-1 Clustering with Diversity July 2010 Jian Li University of Maryland, College Park Ke Yi and Qin Zhang Hong Kong University of Science & Technology ICALP 2010

Upload: others

Post on 25-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

1-1

Clustering with Diversity

July 2010

Jian Li

University of Maryland, College Park

Ke Yi and Qin Zhang

Hong Kong University of Science & Technology

ICALP 2010

Page 2: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

2-1

Clustering - overview

A general clustering problem:

Given a set V of points in a metric space, partition Vinto a set of disjoint clusters such that a certain objectivefunction is minimized,

subject to some

Page 3: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

2-2

Clustering - overview

A general clustering problem:

Given a set V of points in a metric space, partition Vinto a set of disjoint clusters such that a certain objectivefunction is minimized,

subject to some

cluster-level constraints: impose restrictions, for exam-ple, on the number of clusters (exactly k clusters) oron the size of each cluster (at least l elements).

Page 4: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

2-3

Clustering - overview

A general clustering problem:

Given a set V of points in a metric space, partition Vinto a set of disjoint clusters such that a certain objectivefunction is minimized,

subject to some

cluster-level constraints: impose restrictions, for exam-ple, on the number of clusters (exactly k clusters) oron the size of each cluster (at least l elements).

instance-level constraints: specify whether particularpair of items can be clustered together based on somebackground knowledge.

Page 5: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

3-1

Clustering with diversity

all points partitioned into one cluster must havedistinct colors;

`-diversity: Given a set V of points in a metric space,each of which has a color, subject to:

each cluster has size ≥ l.

Goal (this paper): minimize max{diameter of clusters}.

Page 6: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

3-2

Clustering with diversity

all points partitioned into one cluster must havedistinct colors;

`-diversity: Given a set V of points in a metric space,each of which has a color, subject to:

each cluster has size ≥ l.

⇒ `-anonymity

Goal (this paper): minimize max{diameter of clusters}.

Page 7: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

4-1

The main motivation

`-diversity is motivated from privacy preservation for data pub-lication (Machanavajjhala et. al. 2006), follows `-anonymity(a.k.a. k-anonymity, Samarati 2001).

pneumonia

pneumonia

pneumonia

pneumonia

HIV

HIV

bronchitis

bronchitis

bronchitis

dyspepsia

DiseaseT-1D(Name)

1 (Adam)

2 (Bob)

3 (Calvin)

4 (Daisy)

5 (Elam)

6 (Frank)

7 (George)

8 (Henry)

9 (Ivy)

10 (Jane)

Age Gender Degree

29

25

25

29

40

45

35

37

50

60

M

M

M

F

F

F

M

M

M

M

M.Sc.

M.Sc.

B.Sc.

B.Sc.

B.Sc.

B.Sc.

B.Sc.

B.Sc.

Ph.D.

Ph.D.

(a) The microdata

sensitiveattribute

quasi-identifier (QI)

Page 8: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

4-2

The main motivation

`-diversity is motivated from privacy preservation for data pub-lication (Machanavajjhala et. al. 2006), follows `-anonymity(a.k.a. k-anonymity, Samarati 2001).

pneumonia

pneumonia

pneumonia

pneumonia

HIV

HIV

bronchitis

bronchitis

bronchitis

dyspepsia

DiseaseT-1D(Name)

1 (Adam)

2 (Bob)

3 (Calvin)

4 (Daisy)

5 (Elam)

6 (Frank)

7 (George)

8 (Henry)

9 (Ivy)

10 (Jane)

Age Gender Degree

29

25

25

29

40

45

35

37

50

60

M

M

M

F

F

F

M

M

M

M

M.Sc.

M.Sc.

B.Sc.

B.Sc.

B.Sc.

B.Sc.

B.Sc.

B.Sc.

Ph.D.

Ph.D.

(a) The microdata

sensitiveattribute

quasi-identifier (QI)

pneumonia

pneumonia

pneumonia

pneumonia

HIV

HIV

bronchitis

bronchitis

bronchitis

dyspepsia

DiseaseQIs

(25-29, M, MSc)

(25-29, *, BSc)

(40-45, M, BSc)

(35-37, M, BSc)

(50-60, F, PhD)

(b) A 2-anonymous table

Page 9: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

4-3

The main motivation

`-diversity is motivated from privacy preservation for data pub-lication (Machanavajjhala et. al. 2006), follows `-anonymity(a.k.a. k-anonymity, Samarati 2001).

pneumonia

pneumonia

pneumonia

pneumonia

HIV

HIV

bronchitis

bronchitis

bronchitis

dyspepsia

DiseaseT-1D(Name)

1 (Adam)

2 (Bob)

3 (Calvin)

4 (Daisy)

5 (Elam)

6 (Frank)

7 (George)

8 (Henry)

9 (Ivy)

10 (Jane)

Age Gender Degree

29

25

25

29

40

45

35

37

50

60

M

M

M

F

F

F

M

M

M

M

M.Sc.

M.Sc.

B.Sc.

B.Sc.

B.Sc.

B.Sc.

B.Sc.

B.Sc.

Ph.D.

Ph.D.

(a) The microdata

sensitiveattribute

quasi-identifier (QI)

pneumonia

pneumonia

pneumonia

pneumonia

HIV

HIV

bronchitis

bronchitis

bronchitis

dyspepsia

DiseaseQIs

(25-29, M, MSc)

(25-29, *, BSc)

(40-45, M, BSc)

(35-37, M, BSc)

(50-60, F, PhD)

(b) A 2-anonymous table

pneumonia

pneumonia

pneumonia

pneumonia

HIV

HIV

bronchitis

bronchitis

bronchitis

dyspepsia

DiseaseQIs

(c) An 2-diverse table

(25-29, M, *)

(25-29, *, *)

(50-60, F, PhD)

(35-40, M, BSc)

(37-45, M, BSc)

Page 10: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

5-1

Previous work

`-anonymity

A 2-approximation in the metric space is known(Aggarwal el. al. 2006).

Page 11: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

5-2

Previous work

`-anonymity

A 2-approximation in the metric space is known(Aggarwal el. al. 2006).

Many heuristic solutions have been proposed in the DB commu-nity (i.e. LeFevre et. al. 2006, Ghinita et. al. 2007).But no theoretical result.

`-diversity

Page 12: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

5-3

Previous work

`-anonymity

A 2-approximation in the metric space is known(Aggarwal el. al. 2006).

Many heuristic solutions have been proposed in the DB commu-nity (i.e. LeFevre et. al. 2006, Ghinita et. al. 2007).But no theoretical result.

`-diversity

Outliers (remove some points to get better clusters)

First considered by Charikar et. al. 2001 for facility locationand k-median.

A 4-approximation is known for `-anonymity(Aggarwal el. al. 2006).

Page 13: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

6-1

Related work on instance-level constraints

ML constraints and CL constraints (Wagstaff and Cardie 2000)

must-link (ML): two points must be clustered together.cannot-link (CL): two points must be separated.

`-diverse clustering can be seen as a special case where nodeswith the same color must satisfy CL constraints.

No approximation algorithm is studied.

Page 14: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

6-2

Related work on instance-level constraints

ML constraints and CL constraints (Wagstaff and Cardie 2000)

must-link (ML): two points must be clustered together.cannot-link (CL): two points must be separated.

`-diverse clustering can be seen as a special case where nodeswith the same color must satisfy CL constraints.

No approximation algorithm is studied.

Correlation clustering (Bansal et. al. 2004)

Minimize the violation of the given constraints.

Best approximation algorithms are due to Ailon et. al. 2008

Page 15: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

7-1

Our results for l-diversity

A 2-approximation algorithm

(if the problem has feasible solutions).

A matching lower bound assuming P 6= NP

(even there are only 3 colors)

An O(1)-approximation algorithm for the infeasible case.

(if the problem does not have a feasible solution, we remove the

least possible number of points to get a feasible solution)

Page 16: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

8-1

2-approximation algorithm

Page 17: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

9-1

Recall `-diversity: Given a set V of points in a metricspace, each of which has a color, subject to:

1. each cluster has size ≥ l;

2. all points partitioned into one cluster must have dis-tinct colors.

Goal: try to minimize the largest diameter of clusters.

Problems and definitions

Page 18: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

9-2

Recall `-diversity: Given a set V of points in a metricspace, each of which has a color, subject to:

1. each cluster has size ≥ l;

2. all points partitioned into one cluster must have dis-tinct colors.

Goal: try to minimize the largest diameter of clusters.

Problems and definitions

Given V , construct G(V,E) with

• each v ∈ V has a color c(v);

• for each pair of points u,v with distinct colors, creat anedge e = (u, v) with weight w(e), which is the distanceof u, v in the metric space.

• diameter of C ⊆ V : maxu,v∈C(w(e(u, v))).

Page 19: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

10-1

Definitions (cont.)

star forest: a forest where eachconnected component is a star.

Page 20: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

10-2

Definitions (cont.)

star forest: a forest where eachconnected component is a star.

spanning star forest: a star forestwith no isolated point.

Page 21: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

10-3

Definitions (cont.)

star forest: a forest where eachconnected component is a star.

spanning star forest: a star forestwith no isolated point.

Page 22: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

10-4

Definitions (cont.)

star forest: a forest where eachconnected component is a star.

spanning star forest: a star forestwith no isolated point.

semi-valid spanning star forest: a spanning star forest witheach component containing at least ` colors.

` = 3

Page 23: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

10-5

Definitions (cont.)

star forest: a forest where eachconnected component is a star.

spanning star forest: a star forestwith no isolated point.

semi-valid spanning star forest: a spanning star forest witheach component containing at least ` colors.

` = 3

Page 24: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

10-6

Definitions (cont.)

star forest: a forest where eachconnected component is a star.

spanning star forest: a star forestwith no isolated point.

semi-valid spanning star forest: a spanning star forest witheach component containing at least ` colors.

` = 3

valid spanning star forest: a semi-valid spanning star forestwith each component containing points with distinct colors.

Page 25: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

10-7

Definitions (cont.)

star forest: a forest where eachconnected component is a star.

spanning star forest: a star forestwith no isolated point.

semi-valid spanning star forest: a spanning star forest witheach component containing at least ` colors.

` = 3

valid spanning star forest: a semi-valid spanning star forestwith each component containing points with distinct colors.

Page 26: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

11-1

A review on 2-approximation for `-anonymity

General idea of the algorithm in Aggarwal el. al. 2006.

1. Let e1, e2, . . . be the edge of G in a non-decreasingorder of weights.

2. Consider each graph Gi formed by the first i edgesEi = {e1, e2, . . . , ei}.

3. For each Gi = (V,Ei), try to

Page 27: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

11-2

A review on 2-approximation for `-anonymity

General idea of the algorithm in Aggarwal el. al. 2006.

1. Let e1, e2, . . . be the edge of G in a non-decreasingorder of weights.

2. Consider each graph Gi formed by the first i edgesEi = {e1, e2, . . . , ei}.

3. For each Gi = (V,Ei), try to

find a maximal independent set (IS) I s.t.

(1) there is a spanning star forest in Gi with thenodes in I being the star centers,

(2) each star has at least ` nodes.

Page 28: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

11-3

A review on 2-approximation for `-anonymity

General idea of the algorithm in Aggarwal el. al. 2006.

1. Let e1, e2, . . . be the edge of G in a non-decreasingorder of weights.

2. Consider each graph Gi formed by the first i edgesEi = {e1, e2, . . . , ei}.

3. For each Gi = (V,Ei), try to

find a maximal independent set (IS) I s.t.

(1) there is a spanning star forest in Gi with thenodes in I being the star centers,

(2) each star has at least ` nodes.

Let the diameter of the OPT clustering be d∗ withw(ei∗) = d∗. Aggarwal el. al. shows that the trial onGi∗ must succeed.

Page 29: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

11-4

A review on 2-approximation for `-anonymity

General idea of the algorithm in Aggarwal el. al. 2006.

1. Let e1, e2, . . . be the edge of G in a non-decreasingorder of weights.

2. Consider each graph Gi formed by the first i edgesEi = {e1, e2, . . . , ei}.

3. For each Gi = (V,Ei), try to

find a maximal independent set (IS) I s.t.

(1) there is a spanning star forest in Gi with thenodes in I being the star centers,

(2) each star has at least ` nodes.

Let the diameter of the OPT clustering be d∗ withw(ei∗) = d∗. Aggarwal el. al. shows that the trial onGi∗ must succeed.

⇒ 2-approximation (SOL ≤ 2d∗)

Page 30: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

12-1

Our algorithm for `-diversity

Our algorithm for `-diversity follows the same frame-work but we try to

find a maximal IS I s.t.

there is a valid spanning star forest in Gi with thenodes in I being the star centers.

Page 31: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

12-2

Our algorithm for `-diversity

Our algorithm for `-diversity follows the same frame-work but we try to

find a maximal IS I s.t.

there is a valid spanning star forest in Gi with thenodes in I being the star centers.

We can also prove that the algorithm will succeed on Gi∗ ,⇒ 2-approximation.

Page 32: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

12-3

Our algorithm for `-diversity

Our algorithm for `-diversity follows the same frame-work but we try to

find a maximal IS I s.t.

there is a valid spanning star forest in Gi with thenodes in I being the star centers.

We can also prove that the algorithm will succeed on Gi∗ ,⇒ 2-approximation.

The additional challenge: nodes in a star havedistinct colors.

Page 33: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

13-1

Two tests

S-TEST(Gi, I): check if there exists a semi-valid span-ning star forest in Gi with nodes in I being star centers.

⇒S-TEST

Page 34: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

13-2

Two tests

S-TEST(Gi, I): check if there exists a semi-valid span-ning star forest in Gi with nodes in I being star centers.

V-TEST(Gi, I): check if there exists a valid spanningstar forest in Gi with nodes in I being star centers.

⇒⇒S-TEST V-TEST

Page 35: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

14-1

The 2-approximate algorithm

1. Let I be an arbitrary maximal IS in Gi

2. While S-TEST(Gi, I) is passed

(a) (S, S′)← V-TEST(Gi,I) /* S ⊆ V − I, S′ ⊆ I */

(b) If S = ∅ then Succeed; else

i. I ← I − S′ + S/* |S′| < |S|, |I| increase and I is still an IS */

ii. Add nodes to I until it is a maximal IS

3. Fail;

We just perform the trial on graphs G1, G2, . . . one byone, until we find on some Gi a valid spanning forestwith a maximal independent set as centers.

Page 36: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

14-2

The 2-approximate algorithm

1. Let I be an arbitrary maximal IS in Gi

2. While S-TEST(Gi, I) is passed

(a) (S, S′)← V-TEST(Gi,I) /* S ⊆ V − I, S′ ⊆ I */

(b) If S = ∅ then Succeed; else

i. I ← I − S′ + S/* |S′| < |S|, |I| increase and I is still an IS */

ii. Add nodes to I until it is a maximal IS

3. Fail;

We just perform the trial on graphs G1, G2, . . . one byone, until we find on some Gi a valid spanning forestwith a maximal independent set as centers.

redistribute pointsandpick new centers

Page 37: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

14-3

The 2-approximate algorithm

1. Let I be an arbitrary maximal IS in Gi

2. While S-TEST(Gi, I) is passed

(a) (S, S′)← V-TEST(Gi,I) /* S ⊆ V − I, S′ ⊆ I */

(b) If S = ∅ then Succeed; else

i. I ← I − S′ + S/* |S′| < |S|, |I| increase and I is still an IS */

ii. Add nodes to I until it is a maximal IS

3. Fail;

We just perform the trial on graphs G1, G2, . . . one byone, until we find on some Gi a valid spanning forestwith a maximal independent set as centers.

Claim: Both tests succeed on Gi∗ .⇒ ∃ a valid spanning star forest on Gi∗

Page 38: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

15-1

Lowerbound

Theorem: There is no polynomial-time approximation al-gorithm for `-diversity that achieves an approximation fac-tor less than 2 unless P = NP.

By reduction to 3D-matching.

Page 39: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

15-2

Lowerbound

Theorem: There is no polynomial-time approximation al-gorithm for `-diversity that achieves an approximation fac-tor less than 2 unless P = NP.

By reduction to 3D-matching.

Holds even there are only 3 colors.

If there are 2 colors, the problem can be solved in polynomialtime by finding perfect matchings in the G1, G2, . . ..

Page 40: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

16-1

The infeasible case (high-level sketch)

If some color has more than bn/lc points, there is no fea-sible solution, thus we must remove some points.

Least number of points that should be removed to get afeasible solution can be computed, say, k points.

Goal: Compute an optimal clustering by deleting k points.

Page 41: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

16-2

The infeasible case (high-level sketch)

If some color has more than bn/lc points, there is no fea-sible solution, thus we must remove some points.

Least number of points that should be removed to get afeasible solution can be computed, say, k points.

Goal: Compute an optimal clustering by deleting k points.

Additional challenge: have to decide (even know k):

which points should be deleted?

Page 42: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

16-3

The infeasible case (high-level sketch)

If some color has more than bn/lc points, there is no fea-sible solution, thus we must remove some points.

Least number of points that should be removed to get afeasible solution can be computed, say, k points.

Goal: Compute an optimal clustering by deleting k points.

Additional challenge: have to decide (even know k):

which points should be deleted?

Our strategy: to find

a valid star forest spanning n− k nodes in G28i∗ .

⇒ 56-approximation

Page 43: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

17-1

Futher work

Try to minimize the sum of the diameters of the clusters.

Page 44: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

17-2

Futher work

Try to minimize the sum of the diameters of the clusters.

Design approximation algorithms for the problem byremoving any fixed number of points.

Page 45: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

17-3

Futher work

Try to minimize the sum of the diameters of the clusters.

Design approximation algorithms for the problem byremoving any fixed number of points.

Consider other instance-level constraints like the gen-eral Must-Link and Cannot-Link constraints,

Page 46: Clustering with Diversityyike/clusterslides.pdf · Recall ‘-diversity: Given a set V of points in a metric space, each of whichhas a color, subject to: 1.each cluster has size l;

18-1

The End

T HANK YOU

Q and A