Coresets and Sketches for High Dimensional Subspace Approximation Problems Morteza Monemizadeh TU Dortmund Joint work with: D. Feldman, C. Sohler, D. Woodruff SODA 2010


Page 1:

Coresets and Sketches for

High Dimensional Subspace Approximation Problems

Morteza Monemizadeh

TU Dortmund

Joint work with: D. Feldman, C. Sohler, D. Woodruff

SODA 2010

Page 2:

Unbounded Precision

Insertion-only Streaming:

Input: $P = \{p_1, p_2, \dots, p_n\} \subseteq \mathbb{R}^d$, $j \ge 0$

[Figure: the stream, split at the head of the stream into seen points and unseen points]

Page 3:

Subspace Problem

Find a j-subspace $F$:

$\mathrm{OPT} = \min_{F \subseteq \mathbb{R}^d} \mathrm{cost}(P, F) = \min_{F \subseteq \mathbb{R}^d} \sum_{p_i \in P} \mathrm{dist}(p_i, F)$

[Figure: points $p_1, \dots, p_6$; Euclidean distance]
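The cost function on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the names `dist_to_subspace` and `cost` are hypothetical, and the subspace is assumed to be given by an orthonormal basis.

```python
import math

def dist_to_subspace(p, basis):
    """Euclidean distance from point p to the subspace spanned by `basis`.

    Assumption of this sketch: `basis` is a list of orthonormal vectors.
    An empty list stands for the 0-dimensional subspace {0}.
    """
    sq_norm = sum(x * x for x in p)
    sq_proj = sum(sum(x * b for x, b in zip(p, vec)) ** 2 for vec in basis)
    # Clamp at 0 to guard against tiny negative values caused by rounding.
    return math.sqrt(max(sq_norm - sq_proj, 0.0))

def cost(points, basis):
    """cost(P, F): sum of Euclidean distances from the points to F."""
    return sum(dist_to_subspace(p, basis) for p in points)

P = [(3.0, 4.0), (1.0, 0.0), (0.0, -2.0)]
x_axis = [(1.0, 0.0)]        # a 1-subspace (line) in R^2
print(cost(P, x_axis))       # 4 + 0 + 2 = 6.0
```

Note that the objective sums the distances themselves, not their squares, so minimizing it is a different problem from the least-squares fit that an SVD solves.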

Page 4:

Subspace Approximation

PTAS: Find a j-subspace $F'$ such that

$\mathrm{cost}(P, F') \le (1 + \varepsilon) \cdot \mathrm{OPT}$

Page 5:

Simple Cases

$\mathrm{cost}(P, F) = \sum_{p_i \in P} (\mathrm{dist}(p_i, F))$

j = 0: 1-median

j : PCA/SVD

Machine Learning

LSI, PageRank, HITS

Collaborative Filtering, Recommendation Systems

Clustering

k-median

Page 6:

Simple Cases

$j = d - 1$:

Linear regression

Nonlinear regression

Shape-fitting

Page 7:

Known Before

d = O(1): Low-dimensions

Coresets (Har-Peled)

Dynamic Programming (Arora, Mitchell)

d = O(n): High-dimensions

Dimensionality Reduction (Indyk, Rabani, …)

Page 8:

Simple PTAS

Centroid Set: $\Gamma = \{F_1, F_2, \dots, F_i, \dots, F_{|\Gamma|}\}$

$\exists F_i$: $\mathrm{cost}(P, F_i) \le (1 + \varepsilon) \cdot \mathrm{OPT}$

PTAS: $O(nd \cdot |\Gamma|)$

Page 9:

PTAS

Weak Coreset: $S = \{s_1, s_2, \dots, s_i, \dots, s_{|S|}\}$

$\forall F_i \in \Gamma$: $|\mathrm{cost}(S, F_i) - \mathrm{cost}(P, F_i)| \le \varepsilon \cdot \mathrm{cost}(P, F_i)$

$|S| = O(|\Gamma|)$

$|\Gamma| = 2^{\mathrm{poly}(j/\varepsilon)}$

PTAS: $O(d \cdot |\Gamma|^2) = O(d \cdot 2^{\mathrm{poly}(j/\varepsilon)})$
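The way a weak coreset and a centroid set combine into a PTAS can be sketched for lines through the origin in the plane. Everything here is an assumption for illustration: the grid of angles is a hypothetical stand-in for the centroid set (not the construction from the talk), the weighted set is simply the input with unit weights, and `is_weak_coreset` and `best_candidate` are made-up names.

```python
import math

def dist_to_line(p, theta):
    """Distance from 2-D point p to the line through the origin at angle theta."""
    return abs(-math.sin(theta) * p[0] + math.cos(theta) * p[1])

def weighted_cost(S, theta):
    """cost(S, F) for a weighted set S = [(point, weight), ...]."""
    return sum(w * dist_to_line(p, theta) for p, w in S)

def is_weak_coreset(S, P, candidates, eps):
    """Check |cost(S, F) - cost(P, F)| <= eps * cost(P, F) for every candidate F."""
    for theta in candidates:
        cost_P = sum(dist_to_line(p, theta) for p in P)
        if abs(weighted_cost(S, theta) - cost_P) > eps * cost_P:
            return False
    return True

def best_candidate(S, candidates):
    """Evaluate every candidate on S only and return the cheapest one."""
    return min(candidates, key=lambda theta: weighted_cost(S, theta))

# Hypothetical stand-in for the centroid set: a uniform grid of angles.
candidates = [k * math.pi / 180 for k in range(180)]
P = [(2.0, 2.0), (-1.0, -1.0), (3.0, 3.0)]
# Hypothetical stand-in for the coreset: the input itself with unit weights.
S = [(p, 1.0) for p in P]
best = best_candidate(S, candidates)
```

The search touches only S and the candidate list, so its cost is on the order of |S| times |Γ| distance evaluations, independent of n.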

Page 10:

Tools

Weak Coreset

Centroid Set

Page 11:

Coreset Construction

Assumptions:

d=2, j=1

Have a 1-subspace (line) $F^*$: $\mathrm{cost}(P, F^*) \le O(1) \cdot \mathrm{OPT}$

Fix a 1-subspace (line): $F_i$

GOAL: with $\mathrm{Prob} \ge 1 - \delta$,

$\exists Q \subseteq \mathbb{R}^d$: $|\mathrm{cost}(P, F_i) - \mathrm{cost}(Q, F_i)| \le \varepsilon \cdot \mathrm{cost}(P, F^*) \le O(1) \cdot \varepsilon \cdot \mathrm{OPT} \le O(\varepsilon) \cdot \mathrm{cost}(P, F_i)$

$|Q| = O(\log(1/\delta) / \varepsilon^2)$

Page 12:

1st Try

Sampling u.a.r. or even non-uniformly:

$\mathbb{E}(\mathrm{cost}(Q, F_i)) = \mathrm{cost}(P, F_i)$

[Figure: the line $F_i$]
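The unbiasedness claim can be checked with a short calculation. The symbols $m$ (number of samples) and $q_k$ (probability of drawing $p_k$) are assumptions introduced for this sketch; the slide does not name them. Suppose $Q$ consists of $m$ independent draws, where each draw returns $p_k$ with probability $q_k > 0$ and gives it weight $1/(m q_k)$:

```latex
\begin{align*}
\mathbb{E}\bigl[\mathrm{cost}(Q, F_i)\bigr]
  &= \sum_{t=1}^{m} \sum_{k=1}^{n} q_k \cdot \frac{\mathrm{dist}(p_k, F_i)}{m \, q_k}
   = \sum_{k=1}^{n} \mathrm{dist}(p_k, F_i)
   = \mathrm{cost}(P, F_i).
\end{align*}
```

Uniform sampling is the special case $q_k = 1/n$ with weight $n/m$. Unbiasedness alone says nothing about how far a single sample can deviate from its mean, which is presumably why this is only a first try.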

Page 13:

2nd Try

[Figure: lines $F^*$ and $F_i$; weighted points $(p_1, 1)$, $(p_j, 1)$, $(\bar{p}_1, -1)$, $(\bar{p}_1, +1)$, $(\bar{p}_j, -1)$, $(\bar{p}_j, +1)$]

Page 14:

[Figure, three panels, each showing the lines $F^*$ and $F_i$:]

Panel 1, points $(\bar{p}_1, +1)$, $(\bar{p}_j, +1)$: $\mathrm{cost}(\bar{P}, F_i) = \sum_{p_j \in P} \mathrm{dist}(\bar{p}_j, F_i)$

Panel 2, points $(p_1, 1)$, $(p_j, 1)$, $(\bar{p}_1, -1)$, $(\bar{p}_j, -1)$: $\mathrm{cost}(P, F_i) - \mathrm{cost}(\bar{P}, F_i)$

Panel 3, points $(p_1, 1)$, $(p_j, 1)$, $(\bar{p}_1, -1)$, $(\bar{p}_1, +1)$, $(\bar{p}_j, -1)$, $(\bar{p}_j, +1)$: $\mathrm{cost}(P, F_i)$

Page 15:

[Figure: lines $F^*$ and $F_i$; points $(p_1, 1)$, $(\bar{p}_1, -1)$, $(p_j, 1)$, $(\bar{p}_j, -1)$, and $(p_j, 2)$, $(\bar{p}_j, -2)$]

$\mathbb{E}[\mathrm{cost}(S, F_i)] = \mathrm{cost}(P, F_i) - \mathrm{cost}(\bar{P}, F_i)$

$\mathrm{cost}(S, F_i) \le |\mathrm{cost}(P, F_i) - \mathrm{cost}(\bar{P}, F_i)| \le \mathrm{cost}(P, F^*)$

Page 16:

Chernoff Bounds

$\mathbb{E}[\mathrm{cost}(S, F_i)] = \mathrm{cost}(P, F_i) - \mathrm{cost}(\bar{P}, F_i)$

$\mathrm{cost}(S, F_i) \le |\mathrm{cost}(P, F_i) - \mathrm{cost}(\bar{P}, F_i)| \le \mathrm{cost}(P, F^*)$

$|S| = O(\log(1/\delta) / \varepsilon^2)$

$|\mathrm{cost}(S, F_i) - (\mathrm{cost}(P, F_i) - \mathrm{cost}(\bar{P}, F_i))| \le \varepsilon \cdot \mathrm{cost}(P, F^*)$
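One way to read the sampling step is as importance sampling on the signed pairs $(p, +1)$, $(\bar{p}, -1)$. The sketch below is an illustration under stated assumptions, not the construction from the talk: it assumes $\bar{p}$ is the orthogonal projection of $p$ onto $F^*$, that $p$ is drawn with probability $\mathrm{dist}(p, F^*) / \mathrm{cost}(P, F^*)$, and that lines pass through the origin in the plane. The function names are hypothetical.

```python
import math
import random

def dist_to_line(p, theta):
    """Distance from 2-D point p to the line through the origin at angle theta."""
    return abs(-math.sin(theta) * p[0] + math.cos(theta) * p[1])

def project_onto_line(p, theta):
    """Orthogonal projection of p onto the line through the origin at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    t = p[0] * c + p[1] * s
    return (t * c, t * s)

def estimate_difference(points, theta_star, theta_i, m, rng):
    """Estimate cost(P, F_i) - cost(P_bar, F_i) from m weighted samples.

    Assumptions of this sketch: P_bar is the projection of P onto F*, and
    p is drawn with probability dist(p, F*) / cost(P, F*).
    """
    d_star = [dist_to_line(p, theta_star) for p in points]
    total = sum(d_star)                      # cost(P, F*)
    if total == 0.0:
        return 0.0                           # P lies on F*, so P equals P_bar
    sample = rng.choices(points, weights=d_star, k=m)
    estimate = 0.0
    for p in sample:
        p_bar = project_onto_line(p, theta_star)
        diff = dist_to_line(p, theta_i) - dist_to_line(p_bar, theta_i)
        # Reweight by 1 / (m * probability). By the triangle inequality
        # |diff| <= dist(p, F*), so each term is at most cost(P, F*) / m.
        estimate += diff * total / (m * dist_to_line(p, theta_star))
    return estimate

rng = random.Random(0)
P = [(1.0, 2.0), (4.0, -1.0), (-3.0, 0.5), (2.0, 0.1)]
est = estimate_difference(P, 0.0, math.pi / 6, 200, rng)
```

Because every sampled term is bounded by $\mathrm{cost}(P, F^*) / m$ in absolute value, a Chernoff-type bound applies to the sum, which matches the role the bound $\mathrm{cost}(P, F^*)$ plays on the slide.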

Page 17:

Recursion

[Figure: lines $F_i$ and $F^*$, the point $0$, and points $(\bar{p}_1, +1)$, $(\bar{p}_j, +1)$]

$\mathrm{cost}(\bar{P}, F_i) = \sum_{p_j \in P} \mathrm{dist}(\bar{p}_j, F_i)$

Page 18:

Recursion

[Figure: lines $F_i$ and $F^*$, the point $0$, and points $(\bar{p}_1, +1)$, $(\bar{p}_j, +1)$, $(\bar{\bar{p}}_j, +1)$, $(\bar{\bar{p}}_j, -1)$]

$\mathrm{cost}(\bar{P}, F_i) = \sum_{p_j \in P} \mathrm{dist}(\bar{p}_j, F_i)$

Page 19:

[Figure, three panels, each showing the lines $F^*$ and $F_i$:]

Panel 1, the point $0$ and points $(\bar{p}_1, +1)$, $(\bar{p}_j, +1)$, $(\bar{\bar{p}}_j, +1)$, $(\bar{\bar{p}}_j, -1)$: $\mathrm{cost}(\bar{P}, F_i)$

Panel 2, the point $(0, n)$: $\mathrm{cost}(\vec{0} \times n, F_i)$

Panel 3, the point $0$ and points $(\bar{p}_1, +1)$, $(\bar{p}_j, +1)$, $(\bar{\bar{p}}_1, -1)$, $(\bar{\bar{p}}_j, -1)$: $\mathrm{cost}(\bar{P}, F_i) - \mathrm{cost}(\vec{0} \times n, F_i)$

Page 20:

[Figure: lines $F^*$ and $F_i$, the point $0$, and points $(\bar{\bar{p}}_1, -1)$, $(\bar{p}_1, +1)$, $(\bar{p}_j, +1)$, and $(\bar{\bar{p}}_j, +2)$, $(\bar{p}_j, +2)$]

$\mathbb{E}(\mathrm{cost}(S', F_i)) = \mathrm{cost}(\bar{P}, F_i) - \mathrm{cost}(\vec{0} \times n, F_i)$

$\mathrm{cost}(S', F_i) \le O(1) \cdot \mathrm{cost}(P, F^*)$

$|S| = O(\log(1/\delta) / \varepsilon^2)$

$|\mathrm{cost}(S', F_i) - (\mathrm{cost}(\bar{P}, F_i) - \mathrm{cost}(\vec{0} \times n, F_i))| \le \varepsilon \cdot \mathrm{cost}(P, F^*)$

Page 21:

$\mathrm{cost}(P, F_i) = \mathrm{cost}(\vec{0} \times n, F_i) + \bigl(\mathrm{cost}(\bar{P}, F_i) - \mathrm{cost}(\vec{0} \times n, F_i)\bigr) + \bigl(\mathrm{cost}(P, F_i) - \mathrm{cost}(\bar{P}, F_i)\bigr)$

$|\mathrm{cost}(S', F_i) - (\mathrm{cost}(\bar{P}, F_i) - \mathrm{cost}(\vec{0} \times n, F_i))| \le \varepsilon \cdot \mathrm{cost}(P, F^*)$

$|\mathrm{cost}(S, F_i) - (\mathrm{cost}(P, F_i) - \mathrm{cost}(\bar{P}, F_i))| \le \varepsilon \cdot \mathrm{cost}(P, F^*)$
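The two sampling bounds combine through the telescoping sum by the triangle inequality. The worked step below is a sketch of that combination; the name $\widehat{C}$ for the combined estimate is an assumption introduced here, and the last inequality uses $\mathrm{cost}(P, F^*) \le O(1) \cdot \mathrm{OPT} \le O(1) \cdot \mathrm{cost}(P, F_i)$ from the coreset construction assumptions.

```latex
\begin{align*}
\widehat{C}
  &:= \mathrm{cost}(\vec{0} \times n, F_i) + \mathrm{cost}(S', F_i) + \mathrm{cost}(S, F_i), \\
\bigl|\widehat{C} - \mathrm{cost}(P, F_i)\bigr|
  &\le \bigl|\mathrm{cost}(S', F_i) - \bigl(\mathrm{cost}(\bar{P}, F_i) - \mathrm{cost}(\vec{0} \times n, F_i)\bigr)\bigr| \\
  &\quad + \bigl|\mathrm{cost}(S, F_i) - \bigl(\mathrm{cost}(P, F_i) - \mathrm{cost}(\bar{P}, F_i)\bigr)\bigr| \\
  &\le 2\varepsilon \cdot \mathrm{cost}(P, F^*)
   \le O(\varepsilon) \cdot \mathrm{cost}(P, F_i).
\end{align*}
```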

Page 22:

Strong Coreset

$S = \{s_1, s_2, \dots, s_i, \dots, s_{|S|}\}$

$\forall F_i \subseteq \mathbb{R}^d$: $|\mathrm{cost}(S, F_i) - \mathrm{cost}(P, F_i)| \le \varepsilon \cdot \mathrm{cost}(P, F_i)$

$|S| = O(d \, j^{O(j^2)} \cdot \varepsilon^{-2} \cdot \log n)$

Stream: $|S| = O\left(d \left(\frac{j \cdot 2^{\sqrt{\log n}}}{\varepsilon^2}\right)^{\mathrm{poly}(j)}\right)$

Page 23:

Centroid Set

Centroid Set: $\Gamma = \{F_1, F_2, \dots, F_i, \dots, F_{|\Gamma|}\}$

$\exists F_i$: $\mathrm{cost}(P, F_i) \le (1 + \varepsilon) \cdot \mathrm{OPT}$

$|\Gamma| = 2^{\mathrm{poly}(j/\varepsilon)}$

In time $n \cdot 2^{\mathrm{poly}(j/\varepsilon)}$

Page 24:

Centroid Set Construction

Page 25:

Bounded Precision

[Figure: the points $p_1, p_2, \dots, p_i, \dots, p_n$ stacked as the matrix A[n,d]]

Stream S: …, (i,j, -5), …, (i,j, +10), … : |S| = poly(n,M)

Updates: A[i,j] - 5, A[i,j] + 10

$A[i, j] \in \{-M, \dots, +M\}$
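The update model on this slide can be made concrete with a small replay function. This sketch only illustrates the input model; it stores the whole matrix, which is exactly what a streaming algorithm avoids. The name `apply_stream`, the 0-based indices, and the check after every update are assumptions of this sketch.

```python
def apply_stream(n, d, updates, M):
    """Replay updates (i, j, delta) on an n x d integer matrix that starts at zero."""
    A = [[0] * d for _ in range(n)]
    for i, j, delta in updates:
        A[i][j] += delta
        # Assumption: entries must stay in {-M, ..., +M} after every update.
        if abs(A[i][j]) > M:
            raise ValueError(f"entry ({i}, {j}) left the range [-{M}, {M}]")
    return A

updates = [(0, 1, -5), (0, 1, +10), (1, 2, 3)]
A = apply_stream(2, 3, updates, M=10)
print(A)  # [[0, 5, 0], [0, 0, 3]]
```

The same entry can be decreased and later increased, which is what separates this model from the insertion-only stream on the earlier slide.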

Page 26:

Bounded Precision

1-pass streaming algorithm

Space: $\tilde{O}(n j^4 \cdot \log(nd)) / \varepsilon^5$

Time: $M^{\mathrm{poly}(j/\varepsilon)}$

Page 27:

Open Problems

Coreset size:

$|S| = O(d \, j^{O(j^2)} \cdot \varepsilon^{-2} \cdot \log n)$

Stream: $|S| = O\left(d \left(\frac{j \cdot 2^{\sqrt{\log n}}}{\varepsilon^2}\right)^{\mathrm{poly}(j)}\right)$

PTAS: $O(nd \cdot \mathrm{poly}(j/\varepsilon) + (n + d) \cdot 2^{\mathrm{poly}(j/\varepsilon)})$

What other classes of Clustering?

Page 28:

Thanks!