direct robust matrix factorization

Direct Robust Matrix Factorization Liang Xiong, Xi Chen, Jeff Schneider Presented by xxx School of Computer Science Carnegie Mellon University


Post on 22-Feb-2016


TRANSCRIPT

Page 1: Direct Robust  Matrix Factorization

Direct Robust Matrix Factorization

Liang Xiong, Xi Chen, Jeff Schneider
Presented by xxx

School of Computer Science
Carnegie Mellon University

Page 2: Direct Robust  Matrix Factorization


Matrix Factorization

• Extremely useful…
– Assumes the data matrix is of low-rank.
– PCA/SVD, NMF, Collaborative Filtering…
– Simple, effective, and scalable.

• For Anomaly Detection
– Assumption: the normal data is of low-rank, and anomalies are poorly approximated by the factorization.

DRMF: Liang Xiong, Xi Chen, Jeff Schneider

Page 3: Direct Robust  Matrix Factorization


Robustness Issue

• Usually not robust (sensitive to outliers)
– Because of the L2 (Frobenius) measure they use.

• For anomaly detection, of course we have outliers.


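[Formula image not captured in this transcript. Its annotations: "Minimize the approximation error" (objective) and "Low rank" (constraint).]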
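Judging from the two annotations, the missing formula is presumably the standard low-rank approximation problem that SVD solves. The sketch below reuses the symbols X, L and K from the algorithm slide; the exact notation on the original slide is an assumption.

```latex
\min_{L} \; \lVert X - L \rVert_F
\quad \text{s.t.} \quad \operatorname{rank}(L) \le K
```

Because the Frobenius norm squares every residual, a single very large entry of X can dominate the objective, which is the robustness issue this slide describes.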

Page 4: Direct Robust  Matrix Factorization


Why outliers matter


• Simulation
– We use SVD to find the first basis of 10 sine signals.
– To make it more fun, let us turn one point of one signal into a spike (the outlier).

[Figure: input signals and output basis for three cases. No outlier: cool. Moderate outlier: disturbed. Wild outlier: totally lost.]
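The simulation can be approximated with a few lines of NumPy. This is a minimal sketch, not the authors' script: the signal length, the amplitudes, the spike position, and the spike sizes (20 for "moderate", 1000 for "wild") are all assumptions chosen for illustration.

```python
import numpy as np

def first_basis(X):
    """Return the first right singular vector of X (unit length)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[0]

# 10 sine signals with different amplitudes, one per row (assumed setup).
t = np.linspace(0.0, 2.0 * np.pi, 100)
amplitudes = np.linspace(0.5, 1.5, 10)
clean = np.outer(amplitudes, np.sin(t))
clean_basis = first_basis(clean)

similarity = {}
for name, spike in [("none", 0.0), ("moderate", 20.0), ("wild", 1000.0)]:
    X = clean.copy()
    X[0, 25] += spike  # turn one point of one signal into a spike
    basis = first_basis(X)
    # |cosine| between the recovered and the clean basis (sign-invariant).
    similarity[name] = abs(float(basis @ clean_basis))

for name, value in similarity.items():
    print(f"{name:>8}: {value:.3f}")
```

A similarity close to 1 means the first basis is still the sine shape; as the spike grows, the basis rotates toward the spike and the similarity drops.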

Page 5: Direct Robust  Matrix Factorization


Direct Robust Matrix Factorization (DRMF)

• Throw outliers out of the factorization, and problem solved!

• Mathematically, this is DRMF (the formula is an image on the slide and is missing from this transcript):
– ‖S‖₀: number of non-zeros in S.
– S is the “trash can” for outliers.
– There should be only a small number of outliers.
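A plausible reconstruction of the missing objective follows, built from the symbols used elsewhere in the deck (data X, low-rank part L, outlier matrix S, rank K, and outlier budget e from the algorithm slide). The exact form on the original slide is an assumption.

```latex
\min_{L,\,S} \; \lVert (X - S) - L \rVert_F
\quad \text{s.t.} \quad \operatorname{rank}(L) \le K,
\;\; \lVert S \rVert_0 \le e
```

Under this reading, S absorbs at most e entries of X, and only the remainder X - S has to be explained by the rank-K factorization.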

Page 6: Direct Robust  Matrix Factorization


DRMF Algorithm

• Input: Data X.
• Output: Low-rank L; Outliers S.

• Iterate (block coordinate descent):
– Let C = X – S. Do rank-K SVD: L = SVD(C, K).
– Let E = X – L. Do thresholding:

$$S_{ij} = \begin{cases} E_{ij}, & |E_{ij}| \ge t \\ 0, & \text{otherwise} \end{cases}$$

• t: the e-th largest element of {|Eij|}.

• That’s it! Everyone could try at home.
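The two alternating steps can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names `rank_k_svd` and `drmf`, the fixed iteration count, the zero initialization of S, and the synthetic test data are all hypothetical. It keeps exactly the e largest residuals, which matches thresholding at t except when there are ties.

```python
import numpy as np

def rank_k_svd(C, K):
    """Best rank-K approximation of C via a truncated SVD."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return (U[:, :K] * s[:K]) @ Vt[:K]

def drmf(X, K, e, n_iter=30):
    """Alternate the L-step and the S-step for a fixed number of rounds."""
    S = np.zeros(X.shape)
    for _ in range(n_iter):
        L = rank_k_svd(X - S, K)   # L-step: rank-K SVD of C = X - S
        E = X - L                  # residual
        S = np.zeros(X.shape)      # S-step: keep the e largest |E_ij|
        if e > 0:
            top = np.argpartition(np.abs(E).ravel(), -e)[-e:]
            S.flat[top] = E.flat[top]
    return L, S

# Synthetic check: rank-2 matrix plus 20 large entry outliers (assumed setup).
rng = np.random.default_rng(0)
L_true = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 50))
e = 20
outlier_idx = rng.choice(L_true.size, size=e, replace=False)
X = L_true.copy()
X.flat[outlier_idx] += 10.0 * rng.choice([-1.0, 1.0], size=e)

L_hat, S_hat = drmf(X, K=2, e=e)
err_drmf = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
err_svd = np.linalg.norm(rank_k_svd(X, 2) - L_true) / np.linalg.norm(L_true)
detected = set(np.flatnonzero(S_hat).tolist())
overlap = len(detected & set(outlier_idx.tolist()))
print(f"relative error: DRMF {err_drmf:.2e}, plain SVD {err_svd:.2e}")
print(f"outliers found: {overlap} of {e}")
```

A practical version would likely stop when L and S no longer change rather than after a fixed number of rounds.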

Page 7: Direct Robust  Matrix Factorization


Related Work

• Nuclear norm minimization (NNM)
– Effective methods with nice theoretical properties from compressive sensing.
– NNM is the convex relaxation of DRMF.

• A parallel work, GoDec by Zhou et al., appeared in ICML’11.
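The relaxation formula itself is missing from the transcript. For orientation, the usual convex form associated with RPCA replaces the rank with the nuclear norm and the L0 count with the L1 norm; the version below is that standard form, and the trade-off weight \lambda is an assumption rather than something stated on the slide.

```latex
\min_{L,\,S} \; \lVert L \rVert_* + \lambda \lVert S \rVert_1
\quad \text{s.t.} \quad X = L + S
```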


Page 8: Direct Robust  Matrix Factorization


Pros & Cons

• Pros:
– No compromise/relaxation => High quality
– Efficient
– Easy to implement and use

• Cons:
– Difficult theory
  • Because of the rank and the L0 norm…
– Non-convex.
  • Local minima exist, but they can be greatly mitigated by initializing DRMF with the solution of its convex version, NNM.

Page 9: Direct Robust  Matrix Factorization


Highly Extensible

• Structured Outliers
– Outlier rows instead of entries? Just use structured measurements.

• Sparse Input / Missing data
– Useful for Recommendation, Matrix Completion.

• Non-Negativity like in NMF
– Still readily solvable with the constraints.

• For large-scale problems
– Use approximate SVD solvers.
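As an illustration of the structured-outlier extension, one natural reading is to threshold whole rows by their residual norm instead of thresholding single entries. The sketch below follows that reading; the function name `drmf_rows`, the use of the Euclidean row norm, the fixed iteration count, and the synthetic data are assumptions, not details given in the slides.

```python
import numpy as np

def drmf_rows(X, K, e, n_iter=30):
    """DRMF variant that discards up to e whole rows as outliers (a sketch)."""
    S = np.zeros(X.shape)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :K] * s[:K]) @ Vt[:K]         # rank-K fit of X - S
        E = X - L
        row_err = np.linalg.norm(E, axis=1)     # reconstruction error per row
        rows = np.argsort(row_err)[::-1][:e]    # the e worst-fitting rows
        S = np.zeros(X.shape)
        S[rows] = E[rows]                       # move those rows into S
    return L, S, row_err

# Synthetic check: 100 rows from a rank-2 model, 3 rows replaced by noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 20))
true_rows = [7, 42, 77]
X[true_rows] = 3.0 * rng.standard_normal((3, 20))

L_hat, S_hat, row_err = drmf_rows(X, K=2, e=3)
ranked = np.argsort(row_err)[::-1]              # rows ranked by error
print("most anomalous rows:", sorted(ranked[:3].tolist()))
```

Ranking rows by `row_err` gives an anomaly ordering of the same kind as ranking items by reconstruction error.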

Page 10: Direct Robust  Matrix Factorization


Simulation Study

• Factorize noisy low-rank matrices to find entry outliers.
– SVD: plain SVD.
– RPCA, SPCP: two representative NNM methods.

[Figure panels: error of recovering normal entries; detection rate of outlier entries; running time (log-scale).]

Page 11: Direct Robust  Matrix Factorization


Simulation Study

• Sensitivity to outliers
– We examine the recovery errors as the outlier amplitude grows.
– Noiseless case. All assumptions by RPCA hold.

Page 12: Direct Robust  Matrix Factorization


Find Stranger Digits

• USPS dataset is used. We mix a few ‘7’s into many ‘1’s, and then ask DRMF to find those ‘7’s. Unsupervised.
– Treat each digit as a row in the matrix.
– Rank the digits by reconstruction errors.
– Use the structured extension of DRMF: row outliers.

• Resulting ranked list: [figure not captured in this transcript]

Page 13: Direct Robust  Matrix Factorization


Conclusion

• DRMF is a direct and intuitive solution to the robust factorization problem.

• Easy to implement and use.
• Highly extensible.
• Good empirical performance.

Please direct questions to Liang Xiong ([email protected])