Sampling algorithms for l2 regression and applications


Page 1: Sampling algorithms for l2 regression and applications

Sampling algorithms for l2 regression and applications

Michael W. Mahoney

Yahoo Research
http://www.cs.yale.edu/homes/mmahoney

(Joint work with P. Drineas and S. (Muthu) Muthukrishnan)

SODA 2006

Page 2: Sampling algorithms for l2 regression and applications

Regression problems

We seek sampling-based algorithms for solving l2 regression.

We are interested in overconstrained problems, n >> d. Typically, there is no x such that Ax = b.

Page 3: Sampling algorithms for l2 regression and applications

“Induced” regression problems

[Figure: an induced problem formed from sampled rows of A and the corresponding sampled “rows” (elements) of b, with scaling to account for undersampling.]
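In symbols (my notation; the slide's figure shows this construction): let S ∈ R^{n×r} have one nonzero entry per column marking each sampled index, and let D ∈ R^{r×r} be diagonal with entries 1/√(r p_j). The induced problem is then:

```latex
% Induced subproblem (S = sampling matrix, D = rescaling; assumed notation)
\tilde{x}_{\mathrm{opt}} = \arg\min_{x \in \mathbb{R}^{d}}
    \left\| D S^{T} b - D S^{T} A x \right\|_{2}
```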

Page 4: Sampling algorithms for l2 regression and applications

Regression problems, definition

The lp regression problem: given A ∈ R^{n×d} and b ∈ R^n, compute Zp = min_{x ∈ R^d} ||b − A x||_p. For p = 2 this is l2 regression, with optimal value Z2.

There is work by K. Clarkson in SODA 2005 on sampling-based algorithms for l1 regression (p = 1) for overconstrained problems.

Page 5: Sampling algorithms for l2 regression and applications

Exact solution

x_opt = A⁺ b, where A⁺ denotes the pseudoinverse of A.

A x_opt = A A⁺ b is the projection of b on the subspace spanned by the columns of A.

Page 6: Sampling algorithms for l2 regression and applications

Singular Value Decomposition (SVD)

A = U Σ Vᵀ, where:

U (V): orthogonal matrix containing the left (right) singular vectors of A.

Σ: diagonal matrix containing the singular values of A.

ρ: rank of A.

Computing the SVD takes O(d²n) time. The pseudoinverse of A is A⁺ = V Σ⁻¹ Uᵀ.
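A minimal numpy sketch of this exact solution (the random A and b are placeholders; assumes A has full column rank):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10                          # overconstrained: n >> d
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Thin SVD: A = U diag(sigma) V^T, with U of size n x d
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Pseudoinverse solution: x_opt = A^+ b = V Sigma^{-1} U^T b
x_opt = Vt.T @ ((U.T @ b) / sigma)

# A @ x_opt is the projection of b on the column span of A
Z2 = np.linalg.norm(b - A @ x_opt)       # optimal value Z2
print(Z2)
assert np.allclose(x_opt, np.linalg.lstsq(A, b, rcond=None)[0])
```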

Page 7: Sampling algorithms for l2 regression and applications

Questions …

Can sampling methods provide accurate estimates for l2 regression?

Is it possible to approximate the optimal vector x_opt and the optimal value Z2 by looking only at a small sample from the input?

(Even if it takes some sophisticated oracles to actually perform the sampling …)

Equivalently, is there an induced subproblem of the full regression problem whose optimal solution and value Z2,s approximate the optimal solution and its value Z2?

Page 8: Sampling algorithms for l2 regression and applications

Creating an induced subproblem

Algorithm

• Fix a set of probabilities pi, i=1…n, summing up to 1.

• Pick r indices from {1…n} in r i.i.d. trials, with respect to the pi’s.

• For each sampled index j, keep the j-th row of A and the j-th element of b; rescale both by (1/(r pj))^(1/2).
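A short numpy sketch of these three steps (the function and variable names are my own):

```python
import numpy as np

def sample_and_rescale(A, b, p, r, rng=None):
    """Draw r indices i.i.d. from the probabilities p; keep those rows of A
    and elements of b, rescaling both by (1 / (r * p_j))**0.5."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(A.shape[0], size=r, replace=True, p=p)
    scale = 1.0 / np.sqrt(r * p[idx])
    return A[idx] * scale[:, None], b[idx] * scale

def solve_induced(A, b, p, r, rng=None):
    """Solve the induced l2 regression problem on the sample."""
    As, bs = sample_and_rescale(A, b, p, r, rng)
    x_tilde, *_ = np.linalg.lstsq(As, bs, rcond=None)
    return x_tilde
```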

Page 9: Sampling algorithms for l2 regression and applications

The induced subproblem

[Figure: the induced subproblem min_x ||b_s − A_s x||_2, with b_s the sampled elements of b and A_s the sampled rows of A, both rescaled.]

Page 10: Sampling algorithms for l2 regression and applications

Our results

If the pi satisfy certain conditions, then with probability at least 1 − δ, the solution x̃_opt of the induced subproblem satisfies ||b − A x̃_opt||_2 ≤ (1 + ε) Z2.

The sampling complexity is r = O(d²) rows (up to factors depending on ε and δ).

Page 11: Sampling algorithms for l2 regression and applications

Our results, cont’d

If the pi satisfy certain conditions, then with probability at least 1 − δ, ||x_opt − x̃_opt||_2 is also small: the bound scales with √ε and with κ(A), and it degrades as less of b lies in the span of the columns of A.

The sampling complexity is again r = O(d²) rows.

κ(A): condition number of A
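As an illustration, an end-to-end check using leverage-score probabilities (the squared row norms of U; a simplification on my part, since the paper's conditions also involve b⊥). It reuses solve_induced from the sketch above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

U, _, _ = np.linalg.svd(A, full_matrices=False)
p = np.sum(U**2, axis=1) / d            # squared row norms of U; sums to 1

x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)
x_tilde = solve_induced(A, b, p, r=20 * d * d, rng=rng)

Z2 = np.linalg.norm(b - A @ x_opt)
Z2_s = np.linalg.norm(b - A @ x_tilde)
print(Z2_s / Z2)                        # close to 1, i.e., a (1 + eps) factor
```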

Page 12: Sampling algorithms for l2 regression and applications

Back to induced subproblems …

[Figure: the induced subproblem, built from sampled elements of b and sampled rows of A, both rescaled.]

The relevant information for l2 regression if n >> d is contained in an induced subproblem of size O(d²)-by-d.

(upcoming writeup: we can reduce the sampling complexity to r = O(d).)

Page 13: Sampling algorithms for l2 regression and applications

Conditions on the probabilities, SVD

Let U_(i) denote the i-th row of U.

Let U⊥ ∈ R^{n×(n−ρ)} denote the orthogonal complement of U.

U (V): orthogonal matrix containing the left (right) singular vectors of A.

Σ: diagonal matrix containing the singular values of A.

ρ: rank of A.

Page 14: Sampling algorithms for l2 regression and applications

Conditions on the probabilities, interpretation

What do the lengths of the rows of the n x d matrix U = U_A “mean”?

Consider possible n x d matrices U of d left singular vectors:

I_{n|k} = k columns from the identity: row lengths are 0 or 1; I_{n|k} x -> x.

H_{n|k} = k columns from the n x n Hadamard (real Fourier) matrix: row lengths are all equal; H_{n|k} x -> maximally dispersed.

U_k = k columns from any orthogonal matrix: row lengths are between 0 and 1.

The lengths of the rows of U = U_A correspond to a notion of information dispersal.
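A quick numerical illustration of this dispersal notion (scipy's hadamard gives the ±1 matrix; dividing by √n makes its columns orthonormal):

```python
import numpy as np
from scipy.linalg import hadamard

n, k = 16, 4
I_cols = np.eye(n)[:, :k]                 # k columns of the identity
H_cols = hadamard(n)[:, :k] / np.sqrt(n)  # k orthonormal Hadamard columns

# Squared row lengths: how dispersed is the information across rows?
print(np.sum(I_cols**2, axis=1))  # 1.0 for k rows, 0.0 for the rest
print(np.sum(H_cols**2, axis=1))  # k/n for every row: maximally dispersed
```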

Page 15: Sampling algorithms for l2 regression and applications

Conditions for the probabilities

The conditions that the pi must satisfy, for some β1, β2, β3 ∈ (0,1], involve:

the lengths of the rows of the matrix U of left singular vectors of A

the component b⊥ of b not in the span of the columns of A

The sampling complexity depends on the βi:

small βi => more sampling

Page 16: Sampling algorithms for l2 regression and applications

Computing “good” probabilities

Open question: can we compute “good” probabilities faster, in a pass-efficient manner?

Some assumptions might be acceptable (e.g., bounded condition number of A, etc.)

In O(nd²) time we can easily compute pi's that satisfy all three conditions, with β1 = β2 = β3 = 1/3.

(Too expensive in practice for this problem!)
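One plausible O(nd²) instantiation is sketched below. The leverage term matches the first condition; mixing in a distribution built from b⊥ is my stand-in for the remaining conditions, whose exact form the slides do not show:

```python
import numpy as np

def sampling_probabilities(A, b):
    """O(n d^2): SVD of A, then mix the row-leverage distribution of U
    with a distribution on the residual b_perp = b - U U^T b.
    A sketch only, not the paper's exact three-condition recipe."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(U**2, axis=1) / U.shape[1]   # sums to 1 (full column rank)
    b_perp = b - U @ (U.T @ b)                # component of b off span(A)
    res = b_perp**2 / (b_perp @ b_perp)       # sums to 1; assumes b_perp != 0
    p = 0.5 * lev + 0.5 * res
    return p / p.sum()                        # guard against rounding
```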

Page 17: Sampling algorithms for l2 regression and applications

Critical observation

[Figure: the same sample-and-rescale operation is applied to b and to the rows of A.]

Page 18: Sampling algorithms for l2 regression and applications

Critical observation, cont’d

[Figure: writing A = U Σ Vᵀ, it suffices to sample & rescale only U.]

Page 19: Sampling algorithms for l2 regression and applications

Critical observation, cont’d

Important observation: U_s (the matrix of sampled, rescaled rows of U) is almost orthogonal, and we can bound the spectral and the Frobenius norm of U_sᵀ U_s − I.

(FKV98, DK01, DKM04, RV04)
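A quick empirical check of this almost-orthogonality, sampling rows of U with leverage probabilities and the same rescaling as before (my own setup):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 20000, 10
U, _, _ = np.linalg.svd(rng.standard_normal((n, d)), full_matrices=False)

p = np.sum(U**2, axis=1) / d                 # leverage probabilities
r = 4 * d * d
idx = rng.choice(n, size=r, replace=True, p=p)
Us = U[idx] / np.sqrt(r * p[idx])[:, None]   # sampled, rescaled rows of U

E = Us.T @ Us - np.eye(d)                    # in expectation, Us^T Us = I
print(np.linalg.norm(E, 2), np.linalg.norm(E, "fro"))  # both small
```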

Page 20: Sampling algorithms for l2 regression and applications

Application: CUR-type decompositions

Create an approximation to A, using rows and columns of A: O(1) columns, O(1) rows, and a carefully chosen U.

1. How do we draw the rows and columns of A to include in C and R?

2. How do we construct U?

Goal: provide (good) bounds for some norm of the error matrix A − CUR.
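A hedged sketch of one CUR-type construction: sample columns and rows with probabilities proportional to their squared Euclidean norms (a standard choice in this line of work), then set U = C⁺ A R⁺, which minimizes ||A − CUR||_F for the chosen C and R. The paper's actual construction may differ:

```python
import numpy as np

def cur_decomposition(A, c, r, rng=None):
    """Sample c columns and r rows of A with squared-norm probabilities,
    rescaled as in the regression sampling; U = C^+ A R^+ then minimizes
    ||A - C U R||_F for the chosen C and R."""
    rng = rng or np.random.default_rng()
    total = np.sum(A**2)
    pc = np.sum(A**2, axis=0) / total             # column probabilities
    pr = np.sum(A**2, axis=1) / total             # row probabilities
    jc = rng.choice(A.shape[1], size=c, replace=True, p=pc)
    jr = rng.choice(A.shape[0], size=r, replace=True, p=pr)
    C = A[:, jc] / np.sqrt(c * pc[jc])            # n x c, rescaled columns
    R = A[jr, :] / np.sqrt(r * pr[jr])[:, None]   # r x m, rescaled rows
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R
```

Computing U from all of A, as above, is for illustration only; question 2 on this slide is precisely how to construct U without touching all of A, and the slides do not show that construction.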