bi-criteria linear-time approximations for generalized k-mean/median/center


Bi-criteria Linear-time Approximations for Generalized k-Mean/Median/Center

Speaker: Dan Feldman

Joint work with Amos Fiat, Danny Segev & Micha Sharir

1-Line Median

Let P be a set of n points in $\mathbb{R}^d$.

The line median $\ell^*$ minimizes $\sum_{p \in P} \mathrm{dist}(p, \ell)$ over all lines $\ell$ in $\mathbb{R}^d$.
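The sketch below shows how the objective is evaluated. It is a minimal pure-Python illustration, and the names `dist_point_line` and `line_median_cost` are hypothetical, not taken from the talk. A line is written as a point `a` plus a nonzero direction `u`. The code scores one given candidate line; it does not search for $\ell^*$.

```python
import math
from typing import Sequence

def dist_point_line(p: Sequence[float], a: Sequence[float], u: Sequence[float]) -> float:
    """Distance from p to the line {a + s*u : s real} in R^d (u must be nonzero)."""
    w = [pi - ai for pi, ai in zip(p, a)]                                  # vector from a to p
    s = sum(wi * ui for wi, ui in zip(w, u)) / sum(ui * ui for ui in u)    # projection coefficient
    return math.sqrt(sum((wi - s * ui) ** 2 for wi, ui in zip(w, u)))      # length of orthogonal part

def line_median_cost(points, a, u) -> float:
    """1-line median objective of one candidate line: the sum of distances."""
    return sum(dist_point_line(p, a, u) for p in points)

P = [(0.0, 1.0), (1.0, -1.0), (2.0, 2.0)]
print(line_median_cost(P, a=(0.0, 0.0), u=(1.0, 0.0)))  # x-axis: 1 + 1 + 2 = 4.0
```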


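k-Line Median

(1 + ε)-Approximation (PTAS)

A set L of k lines is a $(1+\varepsilon)$-approximation if $\sum_{p \in P} \mathrm{dist}(p, L) \le (1+\varepsilon)\,\mathrm{OPT}$,

where $\mathrm{OPT} = \min_{|L^*| = k} \sum_{p \in P} \mathrm{dist}(p, L^*)$ and $\mathrm{dist}(p, L) = \min_{\ell \in L} \mathrm{dist}(p, \ell)$.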

k-Line Mean: minimizes $\sum_{p \in P} \mathrm{dist}^2(p, L)$ over sets L of k lines.

k-Line Center: minimizes $\max_{p \in P} \mathrm{dist}(p, L)$ over sets L with $|L| = k$.
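The three objectives differ only in how the per-point distances $\mathrm{dist}(p, L)$ are combined. The sketch below assumes lines given as hypothetical `(a, u)` pairs (anchor point, nonzero direction); all function names are illustrative.

```python
import math

def dist_point_line(p, a, u):
    """Distance from p to the line through a with nonzero direction u."""
    w = [pi - ai for pi, ai in zip(p, a)]
    s = sum(wi * ui for wi, ui in zip(w, u)) / sum(ui * ui for ui in u)
    return math.sqrt(sum((wi - s * ui) ** 2 for wi, ui in zip(w, u)))

def dist_to_set(p, lines):
    """dist(p, L): distance from p to the nearest line of L."""
    return min(dist_point_line(p, a, u) for a, u in lines)

def k_line_median_cost(points, lines):
    return sum(dist_to_set(p, lines) for p in points)        # sum of distances

def k_line_mean_cost(points, lines):
    return sum(dist_to_set(p, lines) ** 2 for p in points)   # sum of squared distances

def k_line_center_cost(points, lines):
    return max(dist_to_set(p, lines) for p in points)        # maximum distance

axes = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0), (0.0, 1.0))]  # L = the two coordinate axes
P = [(3.0, 1.0), (2.0, 5.0), (4.0, 4.0)]
print(k_line_median_cost(P, axes), k_line_mean_cost(P, axes), k_line_center_cost(P, axes))
# 7.0 21.0 4.0
```

Each point is charged to its nearest line, here 1, 2 and 4. The median adds these, the mean adds their squares, and the center takes the largest.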

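k-Line Median

Can you cover P by k lines?

• Does OPT = 0?

• Does 2·OPT = 0?

• Does n·OPT = 0?

A multiplicative approximation of any factor answers this question, since its cost is 0 exactly when OPT = 0.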

k-Line Median/Mean/Center is NP-Hard

It is NP-hard to decide whether a set $P \subseteq \mathbb{R}^2$ of n points can be covered by k lines [Megiddo and Tamir, 1983].

Hence, unless P = NP, there is no non-trivial approximation to the k-line median/mean/center that takes poly(k) time.

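(α, β)-Approximation

L is a β-approximation for the k-line median if $|L| = k$ and $\sum_{p \in P} \mathrm{dist}(p, L) \le \beta \cdot \mathrm{OPT}$.

L is an (α, β)-approximation for the k-line median if $|L| \le \alpha$ and $\sum_{p \in P} \mathrm{dist}(p, L) \le \beta \cdot \mathrm{OPT}$.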
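A bi-criteria approximation relaxes the number of lines (α) as well as the cost factor (β). The minimal check below takes OPT as an input, because computing OPT is NP-hard in general. The value `opt=1.0` in the example is a placeholder, and all names are hypothetical. The three lines cover the sample points exactly, so their cost is 0 and the β condition holds for any OPT ≥ 0.

```python
import math

def dist_point_line(p, a, u):
    """Distance from p to the line through a with nonzero direction u."""
    w = [pi - ai for pi, ai in zip(p, a)]
    s = sum(wi * ui for wi, ui in zip(w, u)) / sum(ui * ui for ui in u)
    return math.sqrt(sum((wi - s * ui) ** 2 for wi, ui in zip(w, u)))

def median_cost(points, lines):
    """Sum over points of the distance to the nearest line."""
    return sum(min(dist_point_line(p, a, u) for a, u in lines) for p in points)

def is_alpha_beta_approx(points, lines, opt, alpha, beta):
    """(alpha, beta) test for the k-line median, with OPT supplied by the caller."""
    return len(lines) <= alpha and median_cost(points, lines) <= beta * opt

P = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 1.0), (2.0, 2.0)]
L3 = [((0.0, 0.0), (1.0, 0.0)),   # x-axis
      ((0.0, 0.0), (0.0, 1.0)),   # y-axis
      ((0.0, 0.0), (1.0, 1.0))]   # diagonal y = x
print(is_alpha_beta_approx(P, L3, opt=1.0, alpha=3, beta=0.5))  # True: cost 0 <= OPT/2
print(is_alpha_beta_approx(P, L3, opt=1.0, alpha=2, beta=0.5))  # False: 3 lines > alpha
```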

Example: The 2-Line Median of P, $L^* = \{\ell_1^*, \ell_2^*\}$

(3, ½)-approximation to the 2-Line Median of P: $L = \{\ell_1, \ell_2, \ell_3\}$, with $\sum_{p \in P} \mathrm{dist}(p, L) \le \tfrac{1}{2}\,\mathrm{OPT}$

(4, 1/10)-approximation to the 2-Line Median of P: $L = \{\ell_1, \ell_2, \ell_3, \ell_4\}$, with $\sum_{p \in P} \mathrm{dist}(p, L) \le \mathrm{OPT}/10$

k j-Flat Median/Mean/Center

A set F of k j-dimensional flats that minimizes the

• sum of distances (median),

• sum of squared distances (mean), or

• maximum distance (center)

from P to F.
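For j-flats, the distance is the length of the part of $p - a$ that is orthogonal to the flat's directions. The sketch below is a minimal pure-Python version that assumes a flat is stored as an anchor `a` plus an orthonormal basis obtained by Gram-Schmidt; the function names are hypothetical. Points in degenerate position yield a lower-dimensional flat, because dependent directions are dropped.

```python
import math

def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt; (nearly) dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > eps:
            basis.append([x / norm for x in w])
    return basis

def flat_through(points):
    """Affine flat through the points: anchor points[0] plus an orthonormal basis of
    the directions points[i] - points[0]; j+1 points in general position give a j-flat."""
    a = list(points[0])
    return a, orthonormal_basis([[x - y for x, y in zip(q, a)] for q in points[1:]])

def dist_point_flat(p, flat):
    """Length of the component of p - a orthogonal to the flat's directions."""
    a, basis = flat
    w = [pi - ai for pi, ai in zip(p, a)]
    for b in basis:
        c = sum(wi * bi for wi, bi in zip(w, b))
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return math.sqrt(sum(x * x for x in w))

def flat_costs(points, flats):
    """(median, mean, center) objectives of a set of flats F."""
    d = [min(dist_point_flat(p, f) for f in flats) for p in points]
    return sum(d), sum(x * x for x in d), max(d)

plane = flat_through([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])  # z = 0, j = 2
print(flat_costs([(5.0, 7.0, 3.0), (1.0, 2.0, -1.0)], [plane]))  # (4.0, 10.0, 3.0)
```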

Results for j ≥ 1, k = 1

• Mean: $O(nd^2)$ time, exact (SVD) [Pearson, 1901]; $nd \cdot \mathrm{poly}(j, 1/\varepsilon)$ time, PTAS [Deshpande et al.], [Sarlos], [Har-Peled] (2006)

• Median: $nd \cdot 2^{\mathrm{poly}(j, 1/\varepsilon)}$ time, PTAS [Shyamalkumar & Varadarajan, 2007]

• Center: $nd \cdot \exp\left(\mathrm{poly}(2^{j^2}, 1/\varepsilon)\right)$ time, PTAS [Har-Peled & Varadarajan, 2004]

Results for j = 1, k ≥ 1

• Mean/Median: $nd \cdot k^{O(1)} + (\varepsilon^{-1} \log n)^{O(dk)}$ time, PTAS [Feldman et al., 2006]

• Center: $n \log n \cdot (1/\varepsilon)^{\mathrm{poly}(d,k)}$ time, PTAS [Agarwal et al., 2002]; $O(dnk^3 \log^4 n)$ time for an $(O(dk \log k), 8)$-approximation [Agarwal & Procopiuc, 2000]

Results for j, k > 1

PTAS that takes $d \cdot n^{\mathrm{poly}(j, k, 1/\varepsilon)}$ time.

• Mean: [Deshpande et al., 2006]

• Median: [Shyamalkumar & Varadarajan, 2007]

• Center: [Har-Peled & Varadarajan, 2002]

Our Result

A set F which is an (α, β)-approximation to the k j-flat mean/median/center of P simultaneously, where

$|F| \le \alpha = \log n \cdot (jk \log\log n)^{O(j)}$

$\beta = 2^{O(j)}$

computed in $dn \cdot (jk)^{O(j)}$ time.

Applications for j = 1, k ≥ 1

First (1 + ε)-approximations (with exactly k lines) that take time linear in n:

• for the k-line median/mean of $P \subseteq \mathbb{R}^d$, using [Feldman et al., 2006]

• for the k-line center of $P \subseteq \mathbb{R}^d$, using [Agarwal et al., 2002]

Applications for k = 1, j ≥ 1

• PTAS for the 1 j-flat median, using [Feldman et al., 2006]

• More efficient algorithm for the 1 j-flat center, using [Har-Peled & Varadarajan, 2004]

• First coresets for the k-line and j-flat median/mean/center

The Algorithm

Input: A set of n points $P \subset \mathbb{R}^d$, and integers $k, j \ge 1$.

Output (with high probability): F, an (α, β)-approximation to the k j-flat mean/median/center of P.

Initialization

1) t ← 1 (counter for iterations)

2) F ← ∅ (the output set of j-flats)

Iteration t

3) Pick a random sample $S_t$ of the remaining points of P, $|S_t| = O(j^2 k^2 t)$.

   $F_t$ := all possible j-dimensional flats that pass through (j + 1) points of $S_t$.

4) $F \leftarrow F \cup F_t$

5) For every remaining point p, compute $\mathrm{dist}(p, F_t)$.

6) Remove $P_t$: the half of the remaining points that is closer to $F_t$.

7) t ← t + 1

8) Repeat steps 3 to 7 until there are no more input points.

9) Return F.
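The loop above can be sketched in pure Python as follows. Several parts are assumptions of this sketch and are not taken from the talk. The hypothetical parameter `sample_size` stands in for $|S_t|$, so the $O(j^2k^2t)$ bound is not reproduced. The closer half is rounded up so that every round removes at least one point. When fewer than j + 1 points remain, the sketch uses the flat they span. All $\binom{|S_t|}{j+1}$ flats are enumerated exhaustively, as step 3 describes.

```python
import math
import random
from itertools import combinations

def _basis(vectors, eps=1e-12):
    # Gram-Schmidt; dependent directions are dropped
    out = []
    for v in vectors:
        w = list(v)
        for b in out:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        n = math.sqrt(sum(x * x for x in w))
        if n > eps:
            out.append([x / n for x in w])
    return out

def flat_through(pts):
    # affine flat spanned by pts: anchor pts[0] plus orthonormal directions
    a = list(pts[0])
    return a, _basis([[x - y for x, y in zip(q, a)] for q in pts[1:]])

def dist_to_flat(p, flat):
    a, basis = flat
    w = [x - y for x, y in zip(p, a)]
    for b in basis:
        c = sum(x * y for x, y in zip(w, b))
        w = [x - c * y for x, y in zip(w, b)]
    return math.sqrt(sum(x * x for x in w))

def bicriteria_flats(P, j, sample_size, seed=0):
    """Hypothetical sketch of steps 1-9 (one loop pass per iteration t)."""
    rng = random.Random(seed)
    remaining = [tuple(p) for p in P]
    F = []                                                   # step 2: F <- empty set
    while remaining:                                         # step 8: until no points remain
        m = min(sample_size, len(remaining))
        S = rng.sample(remaining, m)                         # step 3: random sample S_t
        if m > j:
            F_t = [flat_through(c) for c in combinations(S, j + 1)]
        else:                                                # assumption: too few points left,
            F_t = [flat_through(S)]                          # so use the flat they span
        F.extend(F_t)                                        # step 4: F <- F u F_t
        d = [min(dist_to_flat(p, f) for f in F_t) for p in remaining]  # step 5
        order = sorted(range(len(remaining)), key=lambda i: d[i])
        half = (len(remaining) + 1) // 2                     # step 6: closer half, rounded up
        remaining = [remaining[i] for i in order[half:]]
    return F

P = [(float(i), 0.0) for i in range(1, 6)] + [(0.0, float(i)) for i in range(1, 6)]
F = bicriteria_flats(P, j=1, sample_size=4, seed=1)
print(len(F))
```

Because half of the remaining points disappear in every round, the loop runs about $\log_2 n$ times. This is where the $\log n$ factor in $|F|$ comes from.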


Proof of Correctness for the case of lines (j = 1)

Let $F^*$ be any set of k lines in $\mathbb{R}^d$.

Consider $F_t$, constructed during the t-th iteration.

A point $b \in P$ is bad for $F_t$ if:

$\mathrm{dist}(b, F_t) > 4\,\mathrm{dist}(b, F^*)$

A point $g \in P$ is good for $F_t$ otherwise:

$\mathrm{dist}(g, F_t) \le 4\,\mathrm{dist}(g, F^*)$
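The bad/good split is a simple per-point test. The sketch below uses hypothetical sets: $F^*$ is the x-axis and $F_t$ is the line y = 1. Lines are stored as `(a, u)` pairs and all names are illustrative.

```python
import math

def dist_point_line(p, a, u):
    """Distance from p to the line through a with nonzero direction u."""
    w = [pi - ai for pi, ai in zip(p, a)]
    s = sum(wi * ui for wi, ui in zip(w, u)) / sum(ui * ui for ui in u)
    return math.sqrt(sum((wi - s * ui) ** 2 for wi, ui in zip(w, u)))

def dist_to_set(p, lines):
    return min(dist_point_line(p, a, u) for a, u in lines)

def split_bad_good(points, F_t, F_star, factor=4.0):
    """Bad: dist(p, F_t) > factor * dist(p, F*); good: otherwise."""
    bad, good = [], []
    for p in points:
        if dist_to_set(p, F_t) > factor * dist_to_set(p, F_star):
            bad.append(p)
        else:
            good.append(p)
    return bad, good

F_star = [((0.0, 0.0), (1.0, 0.0))]  # hypothetical F*: the x-axis
F_t = [((0.0, 1.0), (1.0, 0.0))]     # hypothetical F_t: the line y = 1
bad, good = split_bad_good([(0.0, 0.0), (2.0, 0.1), (0.0, 0.5), (5.0, 3.0)], F_t, F_star)
print(bad)   # [(0.0, 0.0), (2.0, 0.1)]
print(good)  # [(0.0, 0.5), (5.0, 3.0)]
```

Points very close to $F^*$ are the ones at risk of being bad, because even a small distance to $F_t$ exceeds four times their tiny optimal cost.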

Main Technical Theorem: We can map every bad point $b \in P_t$ to a distinct good point $g \in P_{t+1}$.

$\mathrm{dist}(b, F) \le \mathrm{dist}(b, F_t)$, because $F \supseteq F_t$.

Since $b \in P_t$ and $g \in P_{t+1}$: $\mathrm{dist}(b, F_t) \le \mathrm{dist}(g, F_t)$ ($P_t$ is the half removed in iteration t for being closer to $F_t$, and g survived that iteration).

Since g is good for $F_t$: $\mathrm{dist}(g, F_t) \le 4\,\mathrm{dist}(g, F^*)$.

Therefore $\mathrm{dist}(b, F) \le 4\,\mathrm{dist}(g, F^*)$.

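Applied for k-line Median:

$\sum_{p \in P} \mathrm{dist}(p, F) = \sum_{g} \mathrm{dist}(g, F) + \sum_{b} \mathrm{dist}(b, F) \le \sum_{g} 4\,\mathrm{dist}(g, F^*) + \sum_{g} 4\,\mathrm{dist}(g, F^*) \le 8 \sum_{p \in P} \mathrm{dist}(p, F^*)$

Here g ranges over the good points and b over the bad points. Each point is judged against the $F_t$ of the iteration that removed it. In the second sum, g ranges over the distinct good points assigned to the bad points.

Similarly for the k j-flat mean/center of P.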

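• The number of bad points is at most $|B| = |P_t|/8$.

• $|P_{t+1}| = |P_t|/2$.

The number of good points in $P_{t+1}$ is at least

$|P_{t+1}| - |B| \ge |P_t|/2 - |P_t|/8 \ge |B|$,

so every bad point of $P_t$ can be assigned a distinct good point of $P_{t+1}$.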

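Proof of the Technical Theorem

[Figure: a line $f^*$ of $F^*$, sample points $b_0$ and $b_1$, candidate lines $f_0$ and $f_1$, and an input point p]

Goal: $\mathrm{dist}(p, f_1) \le 4\,\mathrm{dist}(p, f^*)$ for most points p.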

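Claim: Only $|B| = |P_t|/8$ points are bad for $f_1 \in F_t$.

$B_0$: the $|P_t|/16$ points closest to $f^*$.

$B_0$ probably contains a sample point $b_0 \in S_t$. Let $f_0$ be the line through $b_0$ parallel to $f^*$.

For every white point $p \in P \setminus B_0$:

$\mathrm{dist}(p, f_0) \le \mathrm{dist}(p, f^*) + \mathrm{dist}(b_0, f^*) \le 2\,\mathrm{dist}(p, f^*)$,

where the last step uses $\mathrm{dist}(b_0, f^*) \le \mathrm{dist}(p, f^*)$, since $b_0 \in B_0$ and $p \notin B_0$.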
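The first inequality holds for the following reason. Let Π be the projection onto the orthogonal complement of the direction of $f^*$. Then $\mathrm{dist}(p, f_0) = \|\Pi(p - b_0)\|$, and for any point a on $f^*$ this is at most $\|\Pi(p - a)\| + \|\Pi(b_0 - a)\|$ by the triangle inequality. The randomized check below is a sanity test, not a proof. The sampling ranges and the name `max_violation` are arbitrary choices of this sketch.

```python
import math
import random

def dist_point_line(p, a, u):
    """Distance from p to the line through a with nonzero direction u."""
    w = [pi - ai for pi, ai in zip(p, a)]
    s = sum(wi * ui for wi, ui in zip(w, u)) / sum(ui * ui for ui in u)
    return math.sqrt(sum((wi - s * ui) ** 2 for wi, ui in zip(w, u)))

def max_violation(trials=2000, d=3, seed=0):
    """Largest dist(p, f0) - [dist(p, f*) + dist(b0, f*)] over random instances,
    where f0 is the line through b0 parallel to f*; should be <= 0 up to rounding."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        a = [rng.uniform(-5, 5) for _ in range(d)]       # f* passes through a ...
        u = [rng.uniform(0.1, 1.0) for _ in range(d)]    # ... with nonzero direction u
        b0 = [rng.uniform(-5, 5) for _ in range(d)]
        p = [rng.uniform(-5, 5) for _ in range(d)]
        lhs = dist_point_line(p, b0, u)                  # dist(p, f0): same direction, through b0
        rhs = dist_point_line(p, a, u) + dist_point_line(b0, a, u)
        worst = max(worst, lhs - rhs)
    return worst

print(max_violation() <= 1e-9)  # True
```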

$B_1$: the $|P_t|/16$ points with the smallest angle to $f_0$ (measured at $b_0$).

$B_1$ probably contains a sample point $b_1 \in S_t$. Let $f_1 \in F_t$ be the line through $b_0$ and $b_1$.

For every white point $p \in P \setminus B_1$:

$\mathrm{dist}(p, f_1) \le 2\,\mathrm{dist}(p, f_0)$

Hence, for every white point p (outside $B_0 \cup B_1$):

$\mathrm{dist}(p, f_1) \le 2\,\mathrm{dist}(p, f_0) \le 4\,\mathrm{dist}(p, f^*)$

All the white points are good for $f_1$.

$|B| = |B_0| + |B_1| = |P_t|/16 + |P_t|/16 = |P_t|/8$

Only the black points B are bad for $F_t$.

Lines/Edges Detection

Prediction/Analyzing Time Series

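Place the origin at $b_0$, so that $f_0$ and $f_1$ both pass through the origin. For $p \notin B_1$ we have $\theta(b_1, f_0) \le \theta(p, f_0)$, hence:

$\theta(p, f_1) \le \theta(p, f_0) + \theta(b_1, f_0) \le 2\,\theta(p, f_0)$,

or $\theta(p, f_1)/2 \le \theta(p, f_0)$, so (angles to a line lie in $[0, \pi/2]$, where sine is increasing):

$\sin\theta(p, f_1) = 2\sin\frac{\theta(p, f_1)}{2}\cos\frac{\theta(p, f_1)}{2} \le 2\sin\frac{\theta(p, f_1)}{2} \le 2\sin\theta(p, f_0)$.

So, we have $\sin\theta(p, f_1) \le 2\sin\theta(p, f_0)$.

The distance from p to $f_1$ is thus bounded by

$\mathrm{dist}(p, f_1) = \|p\|\sin\theta(p, f_1) \le \|p\| \cdot 2\sin\theta(p, f_0) = 2\,\mathrm{dist}(p, f_0)$.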
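The derivation rests on two facts. The first is $\mathrm{dist}(p, f) = \|p\|\sin\theta(p, f)$ for a line f through the origin. The second is $\sin\theta_1 \le 2\sin\theta_0$ whenever $\theta_1/2 \le \theta_0 \le \pi/2$. The sketch below checks both numerically on a grid, as an illustration and not a proof; the function names and the grid resolution are arbitrary.

```python
import math

def angle_to_line(p, u):
    """Angle in [0, pi/2] between nonzero vector p and the line through the origin spanned by u."""
    cos_t = abs(sum(x * y for x, y in zip(p, u))) / (math.hypot(*p) * math.hypot(*u))
    return math.acos(min(1.0, cos_t))          # clamp guards against rounding above 1

def dist_via_angle(p, u):
    return math.hypot(*p) * math.sin(angle_to_line(p, u))   # ||p|| sin(theta)

def worst_gap(steps=200):
    """max of sin(t1) - 2 sin(t0) over a grid with t1 in [0, pi/2] and t1/2 <= t0 <= pi/2."""
    worst = -math.inf
    for i in range(steps + 1):
        t1 = (math.pi / 2) * i / steps
        for k in range(steps + 1):
            t0 = t1 / 2 + (math.pi / 2 - t1 / 2) * k / steps
            worst = max(worst, math.sin(t1) - 2 * math.sin(t0))
    return worst

print(worst_gap() <= 1e-12)                      # True: sin(t1) <= 2 sin(t0) on the grid
print(dist_via_angle((1.0, 1.0), (1.0, 0.0)))    # approximately 1.0: distance from (1, 1) to the x-axis
```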
