Bi-criteria Linear-time Approximations for Generalized k-Mean/Median/Center
Speaker: Dan Feldman
Joint work with Amos Fiat, Danny Segev & Micha Sharir
1-Line Median
Let $P$ be a set of $n$ points in $\mathbb{R}^d$.
The line median $\ell^*$ is the line that minimizes $\sum_{p \in P} \operatorname{dist}(p, \ell)$ over all lines $\ell$ in $\mathbb{R}^d$.
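A minimal Python sketch (not from the talk) of the 1-line median objective: it evaluates $\sum_{p \in P} \operatorname{dist}(p, \ell)$ for a candidate line $\ell$ given by a point `a` and a direction `u`. The helper names `dist_point_line` and `line_median_cost` are illustrative; the code only scores a given line and does not search for $\ell^*$.

```python
import math

def dist_point_line(p, a, u):
    """Euclidean distance from p to the line {a + s*u : s real} in R^d."""
    v = [pi - ai for pi, ai in zip(p, a)]
    s = sum(x * y for x, y in zip(v, u)) / sum(x * x for x in u)  # projection coefficient
    return math.sqrt(sum((vi - s * ui) ** 2 for vi, ui in zip(v, u)))

def line_median_cost(points, a, u):
    """1-line median objective: sum of distances from the points to the line."""
    return sum(dist_point_line(p, a, u) for p in points)

P = [(0.0, 1.0), (2.0, -1.0), (5.0, 0.0)]
print(line_median_cost(P, a=(0.0, 0.0), u=(1.0, 0.0)))  # x-axis: 1 + 1 + 0 → 2.0
```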
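k-Line Median
For a set $L$ of lines, $\operatorname{dist}(p, L) = \min_{\ell \in L} \operatorname{dist}(p, \ell)$. The k-line median of $P$ is a set of $k$ lines that minimizes $\sum_{p \in P} \operatorname{dist}(p, L)$.
(1 + ε)-Approximation (PTAS)
A set $L$ of $k$ lines is a $(1+\varepsilon)$-approximation if
$$\sum_{p \in P} \operatorname{dist}(p, L) \le (1+\varepsilon)\,\mathrm{OPT}, \qquad \mathrm{OPT} = \min_{|L^*| = k} \sum_{p \in P} \operatorname{dist}(p, L^*).$$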
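k-Line Mean
Minimize the sum of squared distances, $\sum_{p \in P} \operatorname{dist}^2(p, L)$, over all sets $L$ of $k$ lines.
k-Line Center
Minimize the maximum distance, $\max_{p \in P} \operatorname{dist}(p, L)$, over all sets $L$ of $k$ lines.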
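The three objectives differ only in how the per-point distances $\operatorname{dist}(p, L)$ are aggregated. A sketch assuming each line is given as a (point, direction) pair; `k_line_costs` and the other names are hypothetical.

```python
import math

def dist_point_line(p, a, u):
    """Distance from p to the line through a with direction u."""
    v = [pi - ai for pi, ai in zip(p, a)]
    s = sum(x * y for x, y in zip(v, u)) / sum(x * x for x in u)
    return math.sqrt(sum((vi - s * ui) ** 2 for vi, ui in zip(v, u)))

def dist_to_lines(p, lines):
    """dist(p, L): distance from p to the closest line in L."""
    return min(dist_point_line(p, a, u) for a, u in lines)

def k_line_costs(points, lines):
    """Return (median, mean, center) = (sum, sum of squares, max) of dist(p, L)."""
    d = [dist_to_lines(p, lines) for p in points]
    return sum(d), sum(x * x for x in d), max(d)

P = [(0, 1), (2, -3), (5, 1), (0, 7)]
L = [((0, 0), (1, 0)), ((0, 0), (0, 1))]  # the x-axis and the y-axis (k = 2)
print(k_line_costs(P, L))  # → (3.0, 5.0, 2.0)
```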
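k-Line Median
Can you cover P by k lines?
- Does OPT = 0?
- Does 2·OPT = 0?
- Does n·OPT = 0?
Covering $P$ by $k$ lines is the same as $\mathrm{OPT} = 0$, and any multiplicative $\alpha$-approximation (whether $\alpha = 2$ or $\alpha = n$) returns cost 0 exactly when $\mathrm{OPT} = 0$, so it would answer this question.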
k-Line Median/Mean/Center is NP-Hard
It is NP-hard to decide whether a set $P \subseteq \mathbb{R}^2$ of $n$ points can be covered by $k$ lines [Megiddo and Tamir, 1983].
Hence, unless $\mathsf{P} = \mathsf{NP}$, there are no non-trivial approximations to the k-line median/mean/center that take poly(k) time.
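(α, β)-Approximation
$L$ is a $\beta$-approximation for the k-line median if
$$|L| = k \quad\text{and}\quad \sum_{p \in P} \operatorname{dist}(p, L) \le \beta \cdot \mathrm{OPT}.$$
$L$ is an $(\alpha, \beta)$-approximation for the k-line median if
$$|L| \le \alpha \quad\text{and}\quad \sum_{p \in P} \operatorname{dist}(p, L) \le \beta \cdot \mathrm{OPT}.$$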
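The two definitions can be written as predicates. This is an illustrative sketch: the function names are hypothetical, `cost` stands for whichever objective (median, mean, or center) is being approximated, and `opt` is assumed to be known, which it is not in practice.

```python
def is_beta_approx(num_lines, k, cost, opt, beta):
    """beta-approximation: exactly k lines and cost <= beta * OPT."""
    return num_lines == k and cost <= beta * opt

def is_alpha_beta_approx(num_lines, cost, opt, alpha, beta):
    """(alpha, beta)-approximation: at most alpha lines and cost <= beta * OPT."""
    return num_lines <= alpha and cost <= beta * opt

# Hypothetical numbers for a 2-line median instance with OPT = 10:
print(is_alpha_beta_approx(3, 5.0, 10.0, alpha=3, beta=0.5))  # True: 3 lines, cost = OPT / 2
print(is_beta_approx(3, 2, 5.0, 10.0, beta=1.0))              # False: 3 lines, not k = 2
```

Note that an $(\alpha, \beta)$-approximation with $\alpha > k$ may have cost below $\mathrm{OPT}$, so $\beta < 1$ is possible.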
Example: the 2-line median of $P$ is $L^* = \{\ell_1^*, \ell_2^*\}$.
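A $(3, \tfrac{1}{2})$-approximation to the 2-line median of $P$: $L = \{\ell_1, \ell_2, \ell_3\}$ with
$$\sum_{p \in P} \operatorname{dist}(p, L) \le \tfrac{1}{2}\,\mathrm{OPT}.$$
Using more than $k$ lines can make the cost smaller than $\mathrm{OPT}$.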
A $(4, 10)$-approximation to the 2-line median of $P$: $L = \{\ell_1, \ell_2, \ell_3, \ell_4\}$ with
$$\sum_{p \in P} \operatorname{dist}(p, L) \le 10\,\mathrm{OPT}.$$
k j-Flat Median/Mean/Center
A set $F$ of $k$ $j$-dimensional flats that minimizes, respectively, the sum of distances, the sum of squared distances, or the maximum distance from $P$ to $F$.
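Distances to a j-flat reduce to removing the components along an orthonormal basis of the flat's directions. A sketch assuming each flat is given by an origin and j spanning directions (orthonormalized here with Gram-Schmidt); all names are hypothetical.

```python
import math

def make_flat(origin, directions):
    """Represent {origin + sum_i c_i * directions[i]} by its origin and an orthonormal
    basis of its direction space (Gram-Schmidt; dependent directions are dropped)."""
    basis = []
    for v in directions:
        w = [float(x) for x in v]
        for u in basis:
            c = sum(x * y for x, y in zip(w, u))
            w = [x - c * y for x, y in zip(w, u)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-12:
            basis.append([x / norm for x in w])
    return [float(x) for x in origin], basis

def dist_to_flat(p, flat):
    """Length of the component of p - origin orthogonal to the flat."""
    origin, basis = flat
    r = [pi - oi for pi, oi in zip(p, origin)]
    for u in basis:
        c = sum(x * y for x, y in zip(r, u))
        r = [x - c * y for x, y in zip(r, u)]
    return math.sqrt(sum(x * x for x in r))

def flat_costs(points, flats):
    """(median, mean, center) costs: sum, sum of squares, and max of dist(p, F)."""
    d = [min(dist_to_flat(p, f) for f in flats) for p in points]
    return sum(d), sum(x * x for x in d), max(d)

# Example in R^3 with k = 1, j = 2: F = {the xy-plane}.
F = [make_flat((0, 0, 0), [(1, 0, 0), (0, 1, 0)])]
P = [(1, 2, 3), (4, 5, -1), (0, 0, 0)]
print(flat_costs(P, F))  # → (4.0, 10.0, 3.0)
```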
Results for $j \ge 1$, $k = 1$
- Mean: $O(nd^2)$ time, exact (SVD) [Pearson, 1901]; $nd \cdot \mathrm{poly}(j, 1/\varepsilon)$ time, PTAS [Deshpande et al.], [Sarlos], [Har-Peled] (2006)
- Median: $nd \cdot 2^{\mathrm{poly}(j, 1/\varepsilon)}$ time, PTAS [Shyamalkumar & Varadarajan, 2007]
- Center: $nd \cdot \exp\left(\mathrm{poly}\left(2^{\binom{j}{2}}, 1/\varepsilon\right)\right)$ time, PTAS [Har-Peled & Varadarajan, 2004]
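For $k = 1$, $j = 1$, the mean (sum of squared distances) is minimized by the line through the centroid of $P$ along the top principal direction of the centered points, which is what the SVD computes. The sketch below swaps the SVD for plain power iteration on the $d \times d$ scatter matrix so that it needs no libraries; it approximates the exact SVD solution numerically, it is not one of the PTAS algorithms cited above, and its function names are hypothetical.

```python
import math
import random

def one_line_mean(points, iters=200, seed=0):
    """Return (c, u): centroid c and unit direction u of the best-fit line for the
    sum of squared distances. u approximates the top eigenvector of the scatter
    matrix A = X^T X of the centered points, found by power iteration."""
    n, d = len(points), len(points[0])
    c = [sum(p[i] for p in points) / n for i in range(d)]
    X = [[p[i] - c[i] for i in range(d)] for p in points]  # centered points
    A = [[sum(x[i] * x[k] for x in X) for k in range(d)] for i in range(d)]
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start: almost surely not orthogonal to the answer
    norm = math.sqrt(sum(x * x for x in u))
    u = [x / norm for x in u]
    for _ in range(iters):
        w = [sum(A[i][k] * u[k] for k in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # A u = 0, e.g. all points coincide: every direction is optimal
            break
        u = [x / norm for x in w]
    return c, u

def sq_cost(points, c, u):
    """1-line mean objective for the line through c with unit direction u."""
    total = 0.0
    for p in points:
        v = [pi - ci for pi, ci in zip(p, c)]
        s = sum(x * y for x, y in zip(v, u))
        total += sum(x * x for x in v) - s * s  # squared distance via Pythagoras
    return total

P = [(t, 2 * t + 1) for t in range(-3, 4)]  # points exactly on y = 2x + 1
c, u = one_line_mean(P)
print(c, u, sq_cost(P, c, u))
```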
Results for $j = 1$, $k \ge 1$
- Mean/Median: $nd \cdot k^{O(1)} + (\varepsilon^{-1}\log n)^{O(dk)}$ time, PTAS [Feldman et al., 2006]
- Center: $n \log n \cdot (1/\varepsilon)^{\mathrm{poly}(d, k)}$ time, PTAS [Agarwal et al., 2002]; $O(dnk^3 \log^4 n)$ time for an $(O(dk \log k), 8)$-approximation [Agarwal & Procopiuc, 2000]
Results for $j, k > 1$
PTAS that takes $d \cdot n^{\mathrm{poly}(j, k, 1/\varepsilon)}$ time.
- Mean: [Deshpande et al., 2006]
- Median: [Shyamalkumar & Varadarajan, 2007]
- Center: [Har-Peled & Varadarajan, 2002]
Our Result
A set $F$ that is simultaneously an $(\alpha, \beta)$-approximation to the k j-flat mean, median, and center of $P$, where
$$|F| \le \alpha = \log n \cdot (jk \log\log n)^{O(j)}, \qquad \beta = 2^{O(j)},$$
computed in $dn \cdot (jk)^{O(j)}$ time.
Applications for $j = 1$, $k \ge 1$
First $(1+\varepsilon)$-approximations (with exactly $k$ lines) that take time linear in $n$:
- for the k-line median/mean of $P \subset \mathbb{R}^d$, using [Feldman et al., 2006]
- for the k-line center of $P \subset \mathbb{R}^d$, using [Agarwal et al., 2002]
Applications for $k = 1$, $j \ge 1$
- PTAS for the 1 j-flat median, using [Feldman et al., 2006]
- A more efficient algorithm for the 1 j-flat center, using [Har-Peled & Varadarajan, 2004]
- First coresets for the k-line and j-flat median/mean/center
The Algorithm
Input: a set of $n$ points $P \subset \mathbb{R}^d$ and integers $k, j \ge 1$.
Output (with high probability): $F$, an $(\alpha, \beta)$-approximation to the k j-flat mean/median/center of $P$.
Initialization
1) $t \leftarrow 1$ (iteration counter)
2) $F \leftarrow \emptyset$ (the output set of j-flats)
3) Pick a random sample $S_t \subset P$ of size $|S_t| = O(j^2 k^2 t)$, and let $F_t$ be the set of all $j$-dimensional flats that pass through $j + 1$ points of $S_t$.
4) $F \leftarrow F \cup F_t$
5) For every $p \in P$, compute $\operatorname{dist}(p, F_t)$.
6) Remove from $P$ the set $P_t$: the half of $P$ that is closer to $F_t$.
7) $t \leftarrow t + 1$
8) Repeat steps 3 to 7 on the remaining points until there are no more input points.
9) Return $F$.
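A compact Python sketch of the iteration above, written for illustration only. Assumptions not taken from the talk: `c0` is a hypothetical constant standing in for the $O(\cdot)$ in $|S_t|$; when fewer than $j + 1$ points are sampled, the affine hull of the whole sample is used as a single candidate flat; "half" is rounded up; and a fixed random seed is used. It enumerates all $\binom{|S_t|}{j+1}$ candidate flats naively, so it is meant for small inputs, not as the paper's linear-time implementation.

```python
import itertools
import math
import random

def make_flat(pts):
    """Affine hull of pts as (origin, orthonormal basis); dimension <= len(pts) - 1."""
    origin = [float(x) for x in pts[0]]
    basis = []
    for q in pts[1:]:
        w = [qi - oi for qi, oi in zip(q, origin)]
        for u in basis:  # Gram-Schmidt against the directions found so far
            c = sum(x * y for x, y in zip(w, u))
            w = [x - c * y for x, y in zip(w, u)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-12:  # skip affinely dependent points
            basis.append([x / norm for x in w])
    return origin, basis

def dist_to_flat(p, flat):
    origin, basis = flat
    r = [pi - oi for pi, oi in zip(p, origin)]
    for u in basis:  # remove the component along each orthonormal direction
        c = sum(x * y for x, y in zip(r, u))
        r = [x - c * y for x, y in zip(r, u)]
    return math.sqrt(sum(x * x for x in r))

def dist_to_set(p, flats):
    return min(dist_to_flat(p, f) for f in flats)

def bicriteria(points, k, j, c0=4, seed=0):
    """Sample-and-halve loop; c0 is a hypothetical constant for |S_t| = O(j^2 k^2 t)."""
    rng = random.Random(seed)
    remaining = list(points)  # points not yet removed
    F, t = [], 1              # steps 1-2
    while remaining:
        m = min(c0 * j * j * k * k * t, len(remaining))
        S = rng.sample(remaining, m)                            # step 3: sample S_t
        subsets = list(itertools.combinations(S, j + 1)) or [tuple(S)]
        F_t = [make_flat(list(s)) for s in subsets]             # flats through j + 1 sample points
        F.extend(F_t)                                           # step 4
        remaining.sort(key=lambda p: dist_to_set(p, F_t))       # step 5
        del remaining[: (len(remaining) + 1) // 2]              # step 6: drop the closer half
        t += 1                                                  # step 7
    return F                                                    # step 9

P = [(x, 0.5 * x + 1) for x in range(20)]  # 20 points on one line
F = bicriteria(P, k=1, j=1)
print(len(F), max(dist_to_set(p, F) for p in P))
```

For points that all lie on one line, every pair of distinct sample points spans that line, so the returned set covers $P$ up to round-off.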
Proof of Correctness for the Case of Lines ($j = 1$)
Let $F^*$ be any set of $k$ lines in $\mathbb{R}^d$, and consider the set $F_t$ constructed during the $t$-th iteration.
A point $b \in P$ is bad for $F_t$ if $\operatorname{dist}(b, F_t) > 4\operatorname{dist}(b, F^*)$.
A point $g \in P$ is good for $F_t$ otherwise, i.e., $\operatorname{dist}(g, F_t) \le 4\operatorname{dist}(g, F^*)$.
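The good/bad split can be checked directly for concrete line sets. A sketch with hypothetical names, assuming lines are given as (point, direction) pairs; `F_star` plays the role of $F^*$ and the default factor 4 matches the definition above.

```python
import math

def dist_point_line(p, a, u):
    """Distance from p to the line through a with direction u."""
    v = [pi - ai for pi, ai in zip(p, a)]
    s = sum(x * y for x, y in zip(v, u)) / sum(x * x for x in u)
    return math.sqrt(sum((vi - s * ui) ** 2 for vi, ui in zip(v, u)))

def dist_to_lines(p, lines):
    return min(dist_point_line(p, a, u) for a, u in lines)

def split_good_bad(points, F_t, F_star, factor=4.0):
    """Return (good, bad): p is bad for F_t if dist(p, F_t) > factor * dist(p, F_star)."""
    good, bad = [], []
    for p in points:
        if dist_to_lines(p, F_t) > factor * dist_to_lines(p, F_star):
            bad.append(p)
        else:
            good.append(p)
    return good, bad

F_star = [((0, 0), (1, 0))]  # reference solution: the x-axis
F_t = [((0, 1), (1, 0))]     # candidate line: y = 1
good, bad = split_good_bad([(0, 2), (0, 0.1), (3, 0), (0, -1)], F_t, F_star)
print(good, bad)  # → [(0, 2), (0, -1)] [(0, 0.1), (3, 0)]
```

Points lying on $F^*$ but not on $F_t$ are always bad, since the right-hand side is 0.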
Main Technical Theorem
We can map every bad point $b \in P_t$ to a distinct good point $g \in P_{t+1}$.
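For a bad point $b \in P_t$ mapped to a good point $g \in P_{t+1}$:
- $\operatorname{dist}(b, F) \le \operatorname{dist}(b, F_t)$, because $F \supseteq F_t$.
- Since $b \in P_t$ was removed at iteration $t$ while $g \in P_{t+1}$ was not, $\operatorname{dist}(b, F_t) \le \operatorname{dist}(g, F_t)$.
- Since $g$ is good for $F_t$, $\operatorname{dist}(g, F_t) \le 4\operatorname{dist}(g, F^*)$.
Hence $\operatorname{dist}(b, F) \le 4\operatorname{dist}(g, F^*)$.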
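Applied to the k-line median: split $P$ into good points $g$ and bad points $b$, each with respect to the iteration in which it was removed. Every good point satisfies $\operatorname{dist}(g, F) \le 4\operatorname{dist}(g, F^*)$, and every bad point is charged to its distinct good point, so
$$\sum_{p \in P} \operatorname{dist}(p, F) = \sum_{g} \operatorname{dist}(g, F) + \sum_{b} \operatorname{dist}(b, F) \le \sum_{g} 4\operatorname{dist}(g, F^*) + \sum_{g} 4\operatorname{dist}(g, F^*) \le 8 \sum_{p \in P} \operatorname{dist}(p, F^*).$$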
Similarly for the k j-flat mean/center of $P$.
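- The number of bad points is at most $|B| = |P_t|/8$.
- $|P_{t+1}| = |P_t|/2$.
The number of good points in $P_{t+1}$ is therefore at least
$$|P_{t+1}| - |B| \ge \frac{|P_t|}{2} - \frac{|P_t|}{8} \ge |B|,$$
so every bad point can be mapped to a distinct good point.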
Proof of the Technical Theorem
Claim: for a line $f_1 \in F_t$ constructed below, only $|B| = |P_t|/8$ points are bad; every other point $p$ satisfies $\operatorname{dist}(p, f_1) \le 4\operatorname{dist}(p, f^*)$.
$B_0$: the $|P_t|/16$ points closest to $f^*$. $B_0$ probably contains a sample point $b_0 \in S_t$.
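Let $f_0$ be the line through $b_0$ parallel to $f^*$. For every white point $p \in P \setminus B_0$ we have $\operatorname{dist}(b_0, f^*) \le \operatorname{dist}(p, f^*)$, so
$$\operatorname{dist}(p, f_0) \le \operatorname{dist}(p, f^*) + \operatorname{dist}(b_0, f^*) \le 2\operatorname{dist}(p, f^*).$$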
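A randomized sanity check of the $f_0$ step, assuming (as above) that $f_0$ is the translate of $f^*$ through $b_0$. It is a numerical illustration with hypothetical names, not a proof: it asserts the triangle inequality for every sampled $p$, and the factor-2 bound for every sampled $p$ that is at least as far from $f^*$ as $b_0$ is.

```python
import math
import random

def dist_point_line(p, a, u):
    """Distance from p to the line through a with direction u."""
    v = [pi - ai for pi, ai in zip(p, a)]
    s = sum(x * y for x, y in zip(v, u)) / sum(x * x for x in u)
    return math.sqrt(sum((vi - s * ui) ** 2 for vi, ui in zip(v, u)))

def check_parallel_bound(trials=2000, d=3, seed=1):
    """Sample random p; f* = a_star + s*u and f_0 = b0 + s*u share direction u.
    Returns how many sampled p were 'white' (dist(b0, f*) <= dist(p, f*));
    raises AssertionError if either inequality fails."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-5.0, 5.0) for _ in range(d)]

    a_star, u, b0 = vec(), vec(), vec()
    d_b0 = dist_point_line(b0, a_star, u)
    white = 0
    for _ in range(trials):
        p = vec()
        d0, d_star = dist_point_line(p, b0, u), dist_point_line(p, a_star, u)
        assert d0 <= d_star + d_b0 + 1e-9       # triangle inequality
        if d_b0 <= d_star:                       # p is at least as far from f* as b0
            white += 1
            assert d0 <= 2 * d_star + 1e-9      # the factor-2 bound
    return white

print(check_parallel_bound())
```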
$B_1$: the $|P_t|/16$ points with the smallest angle to $f_0$ (measured at $b_0$). $B_1$ probably contains a sample point $b_1 \in S_t$; let $f_1 \in F_t$ be the line through $b_0$ and $b_1$.
For every white point $p \in P \setminus B_1$:
$$\operatorname{dist}(p, f_1) \le 2\operatorname{dist}(p, f_0).$$
Combining the two bounds, for every white point $p$:
$$\operatorname{dist}(p, f_1) \le 2\operatorname{dist}(p, f_0) \le 4\operatorname{dist}(p, f^*).$$
All the white points are good for $f_1$.
The black points are $B = B_0 \cup B_1$, so
$$|B| \le |B_0| + |B_1| = \frac{|P_t|}{16} + \frac{|P_t|}{16} = \frac{|P_t|}{8}.$$
Only the black points $B$ can be bad for $F_t$.
Lines/Edges Detection
Prediction/Analyzing Time Series
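Why $\operatorname{dist}(p, f_1) \le 2\operatorname{dist}(p, f_0)$: take $b_0$ as the origin and let $\theta(p, f) \in [0, \pi/2]$ denote the angle between $p$ and a line $f$ through $b_0$. For $p \notin B_1$ we have $\theta(b_1, f_0) \le \theta(p, f_0)$, so
$$\theta(p, f_1) \le \theta(p, f_0) + \theta(b_1, f_0) \le 2\,\theta(p, f_0),$$
or $\theta(p, f_1)/2 \le \theta(p, f_0)$. Since sine is increasing on $[0, \pi/2]$,
$$\sin\theta(p, f_1) = 2\sin\frac{\theta(p, f_1)}{2}\cos\frac{\theta(p, f_1)}{2} \le 2\sin\frac{\theta(p, f_1)}{2} \le 2\sin\theta(p, f_0).$$
So we have $\sin\theta(p, f_1) \le 2\sin\theta(p, f_0)$. The distance from $p$ to $f_1$ is thus bounded by
$$\operatorname{dist}(p, f_1) = \|p\|\sin\theta(p, f_1) \le \|p\| \cdot 2\sin\theta(p, f_0) = 2\operatorname{dist}(p, f_0).$$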
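A numerical illustration of this angle bound in the plane: $b_0$ is at the origin, $f_0$ is the x-axis, $p$ is a unit vector at angle $\theta_0 = \theta(p, f_0)$, and $f_1$ is any line through the origin at angle at most $\theta_0$ on either side of $f_0$ (the condition $\theta(b_1, f_0) \le \theta(p, f_0)$). The grid search and all names are assumptions made for this sketch; since both distances scale with $\|p\|$, unit vectors suffice.

```python
import math

def dist_to_line_through_origin(p, phi):
    """Distance from the 2D point p to the line through the origin at angle phi."""
    u = (math.cos(phi), math.sin(phi))
    s = p[0] * u[0] + p[1] * u[1]
    return math.hypot(p[0] - s * u[0], p[1] - s * u[1])

def max_ratio(steps=200):
    """Largest dist(p, f1) / dist(p, f0) over a grid of theta0 in (0, pi/2]
    and angles phi of f1 with |phi| <= theta0."""
    worst = 0.0
    for i in range(1, steps + 1):
        theta0 = (math.pi / 2) * i / steps
        p = (math.cos(theta0), math.sin(theta0))
        d0 = dist_to_line_through_origin(p, 0.0)  # dist(p, f0) = sin(theta0)
        for m in range(-steps, steps + 1):
            phi = theta0 * m / steps
            worst = max(worst, dist_to_line_through_origin(p, phi) / d0)
    return worst

print(max_ratio())  # a value just below 2, as the bound predicts
```

The worst case on this grid is $\varphi = -\theta_0$ with small $\theta_0$, where the ratio is $2\cos\theta_0$, so the factor 2 is tight in the limit.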