Probabilistic Matrix Factorization

Piyush Rai, IIT Kanpur

Probabilistic Machine Learning (CS772A)

Feb 8, 2016


Matrix Factorization

Given a matrix X of size N × M, approximate it via a low-rank decomposition

X ≈ UVᵀ

Each entry of X can be written as

X_nm ≈ u_nᵀ v_m = ∑_{k=1}^{K} u_nk v_mk

Note: K ≪ min{M, N}

U: N × K row latent factor matrix, u_n: K × 1 latent factors of row n

V: M × K column latent factor matrix, v_m: K × 1 latent factors of column m

X may have missing entries
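To make the decomposition concrete, here is a minimal numpy sketch (not from the slides; the sizes and random factors are purely illustrative):

```python
import numpy as np

N, M, K = 100, 80, 5          # K << min(N, M)
rng = np.random.default_rng(0)

U = rng.normal(size=(N, K))   # row latent factor matrix, u_n = U[n]
V = rng.normal(size=(M, K))   # column latent factor matrix, v_m = V[m]

X_approx = U @ V.T            # N x M low-rank approximation X ~ U V^T
# A single entry is an inner product of two K-dimensional factors:
assert np.isclose(X_approx[3, 7], U[3] @ V[7])
```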


Matrix Factorization: Examples and Applications

Some applications:

Learning embeddings from dyadic/relational data (each matrix entry is a dyad, e.g., a user-item rating, document-word count, user-user link, etc.). Thus it also performs dimensionality reduction.

Matrix completion, i.e., predicting missing entries in X via the learned embeddings (useful in recommender systems/collaborative filtering, e.g., the Netflix Prize competition, link prediction in social networks, etc.): X_nm ≈ u_nᵀ v_m


Interpreting the Embeddings

The embeddings/latent factors/latent features can be given interpretations (e.g., as genres if X is a user-movie rating matrix)

[Figure: a cartoon illustration of matrix-factorization-based embeddings (or "genres") learned from a user-movie rating data set, with embedding dimension K = 2]

Similar things (users/movies) get embedded nearby in the embedding space (two things are deemed similar if their embeddings are similar). Thus useful for computing similarities and/or making recommendations

Picture courtesy: "Matrix Factorization Techniques for Recommender Systems", Koren et al, 2009


Interpreting the Embeddings

[Figure: another illustration of two-dimensional embeddings, here of movies only]

Similar movies get embedded nearby in the embedding space

Picture courtesy: "Matrix Factorization Techniques for Recommender Systems", Koren et al, 2009

Matrix Factorization

Recall our model X ≈ UVᵀ, or X = UVᵀ + E where E is the noise matrix

Goal: learn U and V, given a subset Ω of the entries of X (let's call it X_Ω)

Some notation:

Ω = {(n, m) : X_nm is observed}

Ω_n^u: column indices of observed entries in row n

Ω_m^v: row indices of observed entries in column m
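A small sketch of one way to build these index sets, assuming missing entries of X are marked with NaN (this representation is an assumption, not from the slides):

```python
import numpy as np

X = np.array([[5.0, np.nan, 3.0],
              [np.nan, 4.0, np.nan]])

# Omega: all (n, m) pairs where X_nm is observed
Omega = list(zip(*np.where(~np.isnan(X))))

# Omega_u[n]: column indices observed in row n
# Omega_v[m]: row indices observed in column m
Omega_u = {n: np.where(~np.isnan(X[n, :]))[0] for n in range(X.shape[0])}
Omega_v = {m: np.where(~np.isnan(X[:, m]))[0] for m in range(X.shape[1])}
```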

Probabilistic Matrix Factorization

Assume the latent factors u_n, v_m and each matrix entry X_nm are real-valued

u_n ∼ N(u_n | 0, λ_U⁻¹ I_K), n = 1, ..., N

v_m ∼ N(v_m | 0, λ_V⁻¹ I_K), m = 1, ..., M

X_nm ∼ N(X_nm | u_nᵀ v_m, σ²), ∀(n, m) ∈ Ω

This is equivalent to X_nm = u_nᵀ v_m + ε_nm, where the noise/residual ε_nm ∼ N(0, σ²)
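The generative process above can be sampled directly; a minimal sketch, assuming the hyperparameters λ_U, λ_V, σ are given (the values here are illustrative):

```python
import numpy as np

def sample_pmf(N, M, K, lam_u=1.0, lam_v=1.0, sigma=0.5, seed=0):
    """Draw one dataset from the PMF generative model above."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0.0, np.sqrt(1.0 / lam_u), size=(N, K))  # u_n ~ N(0, lam_u^{-1} I_K)
    V = rng.normal(0.0, np.sqrt(1.0 / lam_v), size=(M, K))  # v_m ~ N(0, lam_v^{-1} I_K)
    X = U @ V.T + rng.normal(0.0, sigma, size=(N, M))       # X_nm = u_n^T v_m + eps_nm
    return U, V, X
```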


Probabilistic Matrix Factorization

Our basic model:

u_n ∼ N(u_n | 0, λ_U⁻¹ I_K), n = 1, ..., N

v_m ∼ N(v_m | 0, λ_V⁻¹ I_K), m = 1, ..., M

X_nm ∼ N(X_nm | u_nᵀ v_m, σ²), ∀(n, m) ∈ Ω

Note: Many variations are possible, e.g., adding row/column biases (a_n, b_m) and row/column features (x_n^U, x_m^V); we will not consider those here

X_nm ∼ N(X_nm | u_nᵀ v_m + a_n + b_m + β_Uᵀ x_n^U + β_Vᵀ x_m^V, σ²)

Note: The Gaussian assumption on X_nm may not be appropriate if the data is not real-valued, e.g., binary/counts/ordinal (though it often still works well nevertheless)

Likewise, if we want to impose specific constraints on the latent factors (e.g., non-negativity, sparsity, etc.), then Gaussian priors on u_n, v_m are not appropriate

Here we will focus only on the Gaussian case (it leads to a simple algorithm)
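For illustration only, the extended mean with biases and features might be computed as below (the array names a, b, beta_u, beta_v, Xu, Xv are hypothetical, not from the slides):

```python
import numpy as np

def mean_with_biases(U, V, a, b, beta_u, beta_v, Xu, Xv, n, m):
    # E[X_nm] = u_n^T v_m + a_n + b_m + beta_U^T x_n^U + beta_V^T x_m^V
    return U[n] @ V[m] + a[n] + b[m] + beta_u @ Xu[n] + beta_v @ Xv[m]
```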


Parameter Estimation via MAP

Let's do MAP estimation (recall, we have priors on the latent factors)

The log-posterior log p(X_Ω, U, V) = log p(X_Ω | U, V) p(U) p(V) is given by

L = log p(X_Ω | U, V) + log p(U) + log p(V)

  = log ∏_{(n,m)∈Ω} p(X_nm | u_n, v_m) + log ∏_{n=1}^{N} p(u_n) + log ∏_{m=1}^{M} p(v_m)

With Gaussian likelihood and priors, ignoring the constants, we have

L = ∑_{(n,m)∈Ω} −(1/2σ²)(X_nm − u_nᵀ v_m)² − ∑_{n=1}^{N} (λ_U/2) ||u_n||² − ∑_{m=1}^{M} (λ_V/2) ||v_m||²

Can solve for the row and column latent factors u_n, v_m in an alternating fashion
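As a sanity check, the objective can be coded directly; a sketch of the negative of L above (constants dropped), assuming Omega is a list of observed (n, m) pairs:

```python
import numpy as np

def neg_log_posterior(X, U, V, Omega, sigma2, lam_u, lam_v):
    """Negative log-posterior, up to additive constants."""
    sq_err = sum((X[n, m] - U[n] @ V[m]) ** 2 for (n, m) in Omega)
    return (sq_err / (2 * sigma2)               # Gaussian likelihood term
            + 0.5 * lam_u * np.sum(U ** 2)       # prior on row factors
            + 0.5 * lam_v * np.sum(V ** 2))      # prior on column factors
```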


Solving for Row Latent Factors

The (negative) log-posterior:

L = ∑_{(n,m)∈Ω} (1/2σ²)(X_nm − u_nᵀ v_m)² + ∑_{n=1}^{N} (λ_U/2) ||u_n||² + ∑_{m=1}^{M} (λ_V/2) ||v_m||²

For the row latent factor u_n (with all column factors fixed), the objective is

L_{u_n} = ∑_{m∈Ω_n^u} (1/2σ²)(X_nm − u_nᵀ v_m)² + (λ_U/2) u_nᵀ u_n

Taking the derivative w.r.t. u_n and setting it to zero, we get

u_n = ( ∑_{m∈Ω_n^u} v_m v_mᵀ + λ_U σ² I_K )⁻¹ ( ∑_{m∈Ω_n^u} X_nm v_m )

Note: with V fixed, we can solve for all u_n (n = 1, ..., N) in parallel
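A minimal sketch of this closed-form update, solving the linear system rather than forming the inverse explicitly (equivalent, but more numerically stable):

```python
import numpy as np

def update_row_factor(X, V, Omega_u_n, n, sigma2, lam_u):
    """Closed-form MAP update for u_n with all column factors fixed.
    Omega_u_n: array of column indices observed in row n."""
    K = V.shape[1]
    Vn = V[Omega_u_n]                          # stack the relevant v_m as rows
    A = Vn.T @ Vn + lam_u * sigma2 * np.eye(K) # sum_m v_m v_m^T + lam_U sigma^2 I_K
    b = Vn.T @ X[n, Omega_u_n]                 # sum_m X_nm v_m
    return np.linalg.solve(A, b)
```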


Solving for Column Latent Factors

The (negative) log-posterior:

L = ∑_{(n,m)∈Ω} (1/2σ²)(X_nm − u_nᵀ v_m)² + ∑_{n=1}^{N} (λ_U/2) ||u_n||² + ∑_{m=1}^{M} (λ_V/2) ||v_m||²

For the column latent factor v_m (with all row factors fixed), the objective is

L_{v_m} = ∑_{n∈Ω_m^v} (1/2σ²)(X_nm − u_nᵀ v_m)² + (λ_V/2) v_mᵀ v_m

Taking the derivative w.r.t. v_m and setting it to zero, we get

v_m = ( ∑_{n∈Ω_m^v} u_n u_nᵀ + λ_V σ² I_K )⁻¹ ( ∑_{n∈Ω_m^v} X_nm u_n )

Note: with U fixed, we can solve for all v_m (m = 1, ..., M) in parallel


The Complete Algorithm

Input: partially observed matrix X_Ω

Initialize the column latent factors v_1, ..., v_M randomly, e.g., from the prior, i.e., v_m ∼ N(0, λ_V⁻¹ I_K)

Iterate until convergence:

Update each row latent factor u_n, n = 1, ..., N (can be done in parallel)

u_n = ( ∑_{m∈Ω_n^u} v_m v_mᵀ + λ_U σ² I_K )⁻¹ ( ∑_{m∈Ω_n^u} X_nm v_m )

Update each column latent factor v_m, m = 1, ..., M (can be done in parallel)

v_m = ( ∑_{n∈Ω_m^v} u_n u_nᵀ + λ_V σ² I_K )⁻¹ ( ∑_{n∈Ω_m^v} X_nm u_n )

Final prediction for any entry: X_nm = u_nᵀ v_m
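A compact sketch of the whole scheme, combining the row update with the column update from the previous slide; it assumes NaN marks the missing entries, and the hyperparameter defaults are illustrative:

```python
import numpy as np

def pmf_map_als(X, K, sigma2=0.25, lam_u=1.0, lam_v=1.0, n_iters=50, seed=0):
    """MAP estimation for PMF by alternating closed-form updates (NaN = missing)."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    obs = ~np.isnan(X)
    U = rng.normal(0.0, np.sqrt(1.0 / lam_u), size=(N, K))
    V = rng.normal(0.0, np.sqrt(1.0 / lam_v), size=(M, K))   # init from the prior
    for _ in range(n_iters):
        for n in range(N):                                    # row updates (parallelizable)
            idx = np.where(obs[n])[0]
            if idx.size == 0:
                continue
            Vn = V[idx]
            A = Vn.T @ Vn + lam_u * sigma2 * np.eye(K)
            U[n] = np.linalg.solve(A, Vn.T @ X[n, idx])
        for m in range(M):                                    # column updates (parallelizable)
            idx = np.where(obs[:, m])[0]
            if idx.size == 0:
                continue
            Um = U[idx]
            A = Um.T @ Um + lam_v * sigma2 * np.eye(K)
            V[m] = np.linalg.solve(A, Um.T @ X[idx, m])
    return U, V

# Usage: U, V = pmf_map_als(X, K=5); X_hat = U @ V.T predicts every entry X_nm = u_n^T v_m
```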


Matrix Factorization as Linear Regression

Suppose we are solving for the column latent factor v_m (with U fixed): the observed entries {X_nm : n ∈ Ω_m^v} play the role of responses and the fixed row factors u_n the role of inputs, so estimating v_m is a regularized least-squares (ridge) regression problem

Likewise, solving for each row latent factor u_n is a least-squares regression problem, with the v_m as inputs
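This equivalence is easy to verify numerically; a sketch, assuming scikit-learn is available, that checks the PMF closed-form update for v_m against ridge regression with penalty alpha = λ_V σ² (the synthetic data is illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
K, sigma2, lam_v = 5, 0.25, 1.0
U_obs = rng.normal(size=(30, K))          # fixed row factors u_n, n in Omega_m^v
x_obs = U_obs @ rng.normal(size=K) + rng.normal(0, np.sqrt(sigma2), 30)  # responses X_nm

# Ridge regression of x_obs on U_obs, with penalty matched to the PMF objective
ridge = Ridge(alpha=lam_v * sigma2, fit_intercept=False).fit(U_obs, x_obs)

# PMF closed-form update for v_m from the previous slides
A = U_obs.T @ U_obs + lam_v * sigma2 * np.eye(K)
v_closed = np.linalg.solve(A, U_obs.T @ x_obs)
assert np.allclose(ridge.coef_, v_closed)  # same solution
```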


Matrix Factorization as Linear Regression

A very useful way to think about matrix factorization

We can take the regularized least-squares-like objective

arg min_{u_n} ∑_{m∈Ω_n^u} (1/2σ²)(X_nm − u_nᵀ v_m)² + (λ_U/2) u_nᵀ u_n

... and replace it by other loss functions and regularizers

This makes it easy to extend the model in various ways, e.g.:

Handle other types of entries in the matrix X, e.g., binary, counts, etc., by changing the loss function or the likelihood term (see the sketch after this list)

Impose constraints on the latent factors, by changing the regularizer or the prior on the latent factors
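As one hypothetical example of changing the likelihood/loss (not covered in these slides), binary entries could be modeled with a Bernoulli likelihood; there is then no closed-form update, so a gradient step on u_n might look like this sketch:

```python
import numpy as np

def grad_step_row_logistic(X, U, V, Omega_u_n, n, lam_u, lr=0.1):
    """One gradient step on u_n under a Bernoulli(sigmoid(u_n^T v_m)) likelihood
    for binary entries X_nm in {0, 1} (contrast with the Gaussian closed form)."""
    Vn = V[Omega_u_n]
    p = 1.0 / (1.0 + np.exp(-(Vn @ U[n])))           # predicted P(X_nm = 1)
    grad = Vn.T @ (p - X[n, Omega_u_n]) + lam_u * U[n]  # neg. log-posterior gradient
    U[n] -= lr * grad
```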
