Principled Regularization for Probabilistic Matrix Factorization
Robert Bell, Suhrid BalakrishnanAT&T Labs-Research
Duke Workshop on Sensing and Analysis of High-Dimensional Data
July 26-28, 2011
Probabilistic Matrix Factorization (PMF)
• Approximate a large n-by-m matrix R by
– M = P′Q
– P and Q each have k rows, k << n, m
– m_ui = p_u′q_i
– R may be sparsely populated
• Prime tool in Netflix Prize
– 99% of ratings were missing
Regularization for PMF
• Needed to avoid overfitting
– Even after limiting rank of M
– Critical for sparse, imbalanced data
• Penalized least squares
– Minimize
  Σ_{(u,i) observed} (r_ui − p_u′q_i)² + λ (Σ_u ‖p_u‖² + Σ_i ‖q_i‖²)
– or
  Σ_{(u,i) observed} (r_ui − p_u′q_i)² + λ_P Σ_u ‖p_u‖² + λ_Q Σ_i ‖q_i‖²
– λ's selected by cross validation
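The penalized least-squares objective above is typically minimized by alternating ridge regressions: with Q fixed, each p_u has a closed-form update, and symmetrically for each q_i. A minimal NumPy sketch (hypothetical names `pmf_als` and `R_obs`; this is an illustration of the single-λ objective, not the authors' code):

```python
import numpy as np

def pmf_als(R_obs, n, m, k, lam, n_iters=50, seed=0):
    """Alternating least squares for the penalized PMF objective:
    sum over observed (u, i) of (r_ui - p_u' q_i)^2
      + lam * (sum_u ||p_u||^2 + sum_i ||q_i||^2).
    R_obs is a dict mapping (u, i) -> rating."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n, k))
    Q = 0.1 * rng.standard_normal((m, k))
    # Index the observed entries by user and by item once up front.
    by_user = {u: [] for u in range(n)}
    by_item = {i: [] for i in range(m)}
    for (u, i), r in R_obs.items():
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    I = np.eye(k)
    for _ in range(n_iters):
        for u, obs in by_user.items():      # ridge regression per user
            if obs:
                Qi = Q[[i for i, _ in obs]]
                r = np.array([r for _, r in obs])
                P[u] = np.linalg.solve(Qi.T @ Qi + lam * I, Qi.T @ r)
        for i, obs in by_item.items():      # ridge regression per item
            if obs:
                Pu = P[[u for u, _ in obs]]
                r = np.array([r for _, r in obs])
                Q[i] = np.linalg.solve(Pu.T @ Pu + lam * I, Pu.T @ r)
    return P, Q
```

Each inner solve is an ordinary ridge regression, which is why λ is selected by cross validation just as in penalized linear models.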
Research Questions
• Should we use separate λ_P and λ_Q?
Research Questions
• Should we use separate λ_P and λ_Q?
• Should we use k separate λ's for each dimension of P and Q?
Matrix Completion with Noise (Candes and Plan, Proc IEEE, 2010)
• Rank reduction without explicit factors
– No pre-specification of k, rank(M)
• Regularization applied directly to M
– Trace norm, aka nuclear norm
– Sum of the singular values of M
• Minimize ‖M‖_* subject to Σ_{(u,i) observed} (r_ui − m_ui)² ≤ δ
• "Equivalent" to L2 regularization for P, Q
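A short NumPy illustration of the trace norm, and of singular-value soft-thresholding, the proximal step that generic trace-norm solvers iterate (this is a sketch of the standard operator, not Candes and Plan's specific algorithm; the function names are made up):

```python
import numpy as np

def nuclear_norm(M):
    # Trace (nuclear) norm: the sum of the singular values of M.
    return np.linalg.svd(M, compute_uv=False).sum()

def svt(M, tau):
    # Singular-value soft-thresholding: the proximal operator of
    # tau * ||.||_*. Shrinks every singular value by tau and floors
    # at zero, which is what drives the recovered rank down.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Soft-thresholding all singular values by the same amount is the sense in which the trace norm cannot regularize different factor dimensions differently.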
Research Questions
• Should we use separate λ_P and λ_Q?
• Should we use k separate λ's for each dimension of P and Q?
• Should we use the trace norm for regularization?
Bayesian Matrix Factorization (BPMF) (Salakhutdinov and Mnih, ICML 2008)
• Let r_ui ~ N(p_u′q_i, σ²)
• No PMF-type regularization
• p_u ~ N(μ_P, Λ_P⁻¹) and q_i ~ N(μ_Q, Λ_Q⁻¹)
• Priors for σ², μ_P, μ_Q, Λ_P, Λ_Q
• Fit by Gibbs sampling
• Substantial reduction in prediction error relative to PMF with L2 regularization
Research Questions
• Should we use separate λ_P and λ_Q?
• Should we use k separate reg. parameters for each dimension of P and Q?
• Should we use the trace norm for regularization?
• Does BPMF "regularize" appropriately?
Matrix Factorization with Biases
• Let m_ui = μ + a_u + b_i + p_u′q_i
• Regularization similar to before
– Minimize
  Σ_{(u,i) observed} (r_ui − m_ui)² + λ (Σ_u a_u² + Σ_i b_i² + Σ_u ‖p_u‖² + Σ_i ‖q_i‖²)
– or
  Σ_{(u,i) observed} (r_ui − m_ui)² + λ_a Σ_u a_u² + λ_b Σ_i b_i² + λ_P Σ_u ‖p_u‖² + λ_Q Σ_i ‖q_i‖²
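A literal transcription of the second objective, with separate λ's for biases and factors, makes the bookkeeping concrete (the helper name `biased_objective` is hypothetical):

```python
import numpy as np

def biased_objective(R_obs, mu, a, b, P, Q, lam_a, lam_b, lam_P, lam_Q):
    """Penalized least squares for the biased model
    m_ui = mu + a_u + b_i + p_u' q_i, with a separate lambda
    for each exchangeable parameter set (a, b, P, Q).
    R_obs is a dict mapping (u, i) -> rating."""
    sse = sum((r - (mu + a[u] + b[i] + P[u] @ Q[i])) ** 2
              for (u, i), r in R_obs.items())
    return (sse
            + lam_a * (a ** 2).sum() + lam_b * (b ** 2).sum()
            + lam_P * (P ** 2).sum() + lam_Q * (Q ** 2).sum())
```

Setting all four λ's equal recovers the first, single-λ objective as a special case.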
Research Questions
• Should we use separate λ_P and λ_Q?
• Should we use k separate reg. parameters for each dimension of P and Q?
• Should we use the trace norm for regularization?
• Does BPMF "regularize" appropriately?
• Should we use separate λ's for the biases?
Some Things this Talk Will Not Cover
• Various extensions of PMF
– Combining explicit and implicit feedback
– Time-varying factors
– Non-negative matrix factorization
– L1 regularization
– λ's depending on user or item sample sizes
• Efficiency of optimization algorithms
– Use Newton's method, each coordinate separately
– Iterate to convergence
No Need for Separate λ_P and λ_Q
• M = (cP)′(c⁻¹Q) is invariant for c ≠ 0
• For initial P and Q
– Solve for c to minimize λ_P ‖cP‖² + λ_Q ‖c⁻¹Q‖²
– c = (λ_Q ‖Q‖² / (λ_P ‖P‖²))^{1/4}
– Gives 2 (λ_P λ_Q)^{1/2} ‖P‖ ‖Q‖
• Sufficient to let λ_P = λ_Q = λ_PQ
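The rescaling argument is easy to verify numerically. The sketch below (hypothetical names `penalty` and `best_c`) checks that the closed-form c attains the stated minimum 2(λ_P λ_Q)^{1/2}‖P‖‖Q‖, which depends on λ_P and λ_Q only through their product:

```python
import numpy as np

def penalty(c, lam_P, lam_Q, P, Q):
    # lam_P * ||c P||_F^2 + lam_Q * ||c^{-1} Q||_F^2
    return lam_P * c**2 * (P**2).sum() + lam_Q * (Q**2).sum() / c**2

def best_c(lam_P, lam_Q, P, Q):
    # Closed-form minimizer of penalty over c:
    # c = (lam_Q ||Q||^2 / (lam_P ||P||^2))^{1/4}
    return (lam_Q * (Q**2).sum() / (lam_P * (P**2).sum())) ** 0.25
```

Since the fit term P′Q is unchanged by the rescaling, only the product λ_P λ_Q matters, so one shared λ_PQ suffices.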
Bayesian Motivation for L2 Regularization
• Simplest case: only one item
– R is n-by-1
– r_u1 = a_1 + ε_u1, a_1 ~ N(0, τ²), ε_u1 ~ N(0, σ²)
• Posterior mean (or MAP) of a_1 satisfies
– minimizes Σ_{u=1}^n (r_u1 − a_1)² + λ_a a_1²
– λ_a = σ²/τ²
– â_1 = (n/(n + λ_a)) r̄_1
• Best λ is inversely proportional to τ²
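The one-item calculation can be checked directly: the ridge minimizer with λ_a = σ²/τ² coincides with the Gaussian posterior mean (function names here are illustrative):

```python
import numpy as np

def ridge_bias(r, lam):
    # Closed-form minimizer of sum_u (r_u - a)^2 + lam * a^2:
    # a_hat = (n / (n + lam)) * mean(r)
    n = len(r)
    return n * np.mean(r) / (n + lam)

def posterior_mean(r, sigma2, tau2):
    # Posterior mean of a ~ N(0, tau2) given r_u = a + N(0, sigma2)
    # noise: precision-weighted shrinkage of the sample mean.
    n = len(r)
    return (n / sigma2) / (n / sigma2 + 1 / tau2) * np.mean(r)
```

The identity λ_a = σ²/τ² is what justifies different λ's for parameter sets with different prior variances.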
Implications for Regularization of PMF
• Allow λ_a ≠ λ_b
– If τ_a² ≠ τ_b²
• Allow λ_a ≠ λ_b ≠ λ_PQ
• Allow λ_PQ1 ≠ λ_PQ2 ≠ … ≠ λ_PQk?
– Trace norm does not
– BPMF appears to
Simulation Experiment Structure
• n = 2,500 users, m = 400 items
• 250,000 observed ratings
– 150,000 in Training (to estimate a, b, P, Q)
– 50,000 in Validation (to tune λ's)
– 50,000 in Test (to estimate MSE)
• Substantial imbalance in ratings
– 8 to 134 ratings per user in Training data
– 33 to 988 ratings per item in Training data
Simulation Model
• r_ui = a_u + b_i + p_u1 q_i1 + p_u2 q_i2 + ε_ui
• Elements of a, b, P, Q, and ε
– Independent normals with mean 0
– Var(a_u) = 0.09
– Var(b_i) = 0.16
– Var(p_u1 q_i1) = 0.04
– Var(p_u2 q_i2) = 0.01
– Var(ε_ui) = 1.00
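A sketch of a generator for this model. The slides fix only the variance of each factor product; since Var(XY) = Var(X)Var(Y) for independent zero-mean X and Y, the sketch splits each product variance evenly, giving each factor standard deviation Var(pq)^{1/4} (an assumption). It draws the full matrix; subsampling the 250,000 observed entries would follow:

```python
import numpy as np

def simulate_ratings(n=2500, m=400, seed=0):
    """Draw r_ui = a_u + b_i + p_u1 q_i1 + p_u2 q_i2 + eps_ui with the
    component variances from the simulation model (0.09, 0.16, 0.04,
    0.01, 1.00)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 0.3, n)               # Var(a_u)  = 0.09
    b = rng.normal(0.0, 0.4, m)               # Var(b_i)  = 0.16
    s1 = 0.04 ** 0.25                         # Var(p1*q1) = 0.04
    s2 = 0.01 ** 0.25                         # Var(p2*q2) = 0.01
    p1, q1 = rng.normal(0, s1, n), rng.normal(0, s1, m)
    p2, q2 = rng.normal(0, s2, n), rng.normal(0, s2, m)
    eps = rng.normal(0.0, 1.0, (n, m))        # Var(eps)  = 1.00
    return (a[:, None] + b[None, :]
            + np.outer(p1, q1) + np.outer(p2, q2) + eps)
```

The component variances sum to 1.30, so the overall rating variance of a simulated matrix should land near that value.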
Evaluation
• Test MSE for estimation of m_ui = E(r_ui)
– MSE = (1/|Test|) Σ_{(u,i) ∈ Test} (m̂_ui − m_ui)²
• Limitations
– Not real data
– Only one replication
– No standard errors
PMF Results for k = 0

Restrictions on λ's   | Values of λ_a, λ_b | MSE for m | ΔMSE
Grand mean; no (a, b) | NA                 | .2979     |
λ_a = λ_b = 0         | 0                  | .0712     | −.2267
λ_a = λ_b             | 9.32               | .0678     | −.0034
Separate λ_a, λ_b     | 9.26, 9.70         | .0678     | .0000
PMF Results for k = 1

Restrictions on λ's      | Values of λ_a, λ_b, λ_PQ1 | MSE for m | ΔMSE
Separate λ_a, λ_b        | 9.26, 9.70                | .0678     |
λ_a = λ_b = λ_PQ1        | 11.53                     | .0439     | −.0239
Separate λ_a, λ_b, λ_PQ1 | 8.50, 10.13, 13.44        | .0439     | .0000
PMF Results for k = 2

Restrictions on λ's             | Values of λ_a, λ_b, λ_PQ1, λ_PQ2 | MSE for m | ΔMSE
Separate λ_a, λ_b, λ_PQ1        | 8.50, 10.13, 13.44, NA           | .0439     |
λ_a, λ_b, λ_PQ1 = λ_PQ2         | 8.44, 9.94, 19.84, 19.84         | .0441     | +.0002
Separate λ_a, λ_b, λ_PQ1, λ_PQ2 | 8.43, 10.24, 13.38, 27.30        | .0428     | −.0013
Results for Matrix Completion
• Performs poorly on raw ratings
– MSE = .0693
– Not designed to estimate biases
• Fit to residuals from PMF with k = 0
– MSE = .0477
– "Recovered" rank was 1
– Worse than MSEs from PMF: .0428 to .0439
Results for BPMF
• Raw ratings
– MSE = .0498, using k = 3
– Early stopping
– Not designed to estimate biases
• Fit to residuals from PMF with k = 0
– MSE = .0433, using k = 2
– Near .0428, for best PMF w/ biases
Summary
• No need for separate λ_P and λ_Q
• Theory suggests using separate λ's for distinct sets of exchangeable parameters
– Biases vs. factors
– For individual factors
• Tentative simulation results support the need for separate λ's across factors
– BPMF does so automatically
– PMF requires a way to do efficient tuning