Lecture notes for Bayesian methods in recommendation systems

Upload: xudong-sun

Post on 23-Jan-2017


Bayesian method in Recommendation system

Lecturer: Xudong Sun

DSOR-AISBI

January 20, 2016

1 Preliminaries

Exponential family model

Sufficient Statistics

Bayesian formula and Conjugate Prior

Statistical sampling

2 Probabilistic Matrix Factorization

3 Bayesian Matrix Factorization

4 Bayesian Factorization Machine

Preliminaries

Exponential family model

Exponential family model of order 1

Exponential tilting: $f(y;\theta) = \dfrac{e^{s(y)\theta} f_0(y)}{\int e^{s(x)\theta} f_0(x)\,dx}$

$f_0(x)$ is a uniform distribution on a defined support

a valid $\theta$ is called a natural parameter

$s(y)$ is a transform of the random variable; the dimension of $s(y)$ defines the order of the exponential family

cumulant-generating function: $\kappa(\theta) = \log \int e^{s(y)\theta} f_0(y)\,dy$

The natural parameter set is a convex set: $\mathcal{N} = \{\theta : \kappa(\theta) < \infty\}$

Partition function $Z(\theta) = e^{\kappa(\theta)}$ (the density is normalized by $e^{-\kappa(\theta)}$) and moment generating function $E\{e^{tY}\} = \int e^{ty} f(y)\,dy$

Physics: If you have the partition function, you have everything!

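Exponential family model of order 1

Suppose $s(y) = y$ and $f_0$ is uniform on $(0, 1)$

$f(y;\theta) = \dfrac{e^{y\theta} f_0(y)}{\int e^{x\theta} f_0(x)\,dx}$

$\kappa(\theta) = \log\{(e^{\theta} - 1)/\theta\}$, why? Because $\int_0^1 e^{x\theta}\,dx = (e^{\theta} - 1)/\theta$

$f(y;\theta) = \theta e^{\theta y}/(e^{\theta} - 1)$ for $0 < y < 1$

Exponential family model without tilting:

$f(y;\theta) = \dfrac{e^{s(y)\theta} f_0(y)}{e^{\kappa(\theta)}} = \exp\{s(y)\theta - \kappa(\theta) + \log f_0(y)\}$

$f(y;\omega) = \exp\{s(y)\theta(\omega) - b(\omega) + c(y)\}$

$s(y)$ is called a natural observation

$\theta$ is called a natural parameter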
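The closed form for the cumulant-generating function can be checked numerically. The sketch below assumes $f_0$ is uniform on $(0, 1)$ and $s(y) = y$; the helper names and the value $\theta = 1.5$ are illustrative choices, not part of the lecture.

```python
import math

def kappa_closed_form(theta):
    """kappa(theta) = log((e^theta - 1) / theta), valid for theta != 0."""
    return math.log((math.exp(theta) - 1.0) / theta)

def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def kappa_numeric(theta):
    # f0 is uniform on (0, 1) and s(y) = y, so the integrand is e^{y * theta}.
    return math.log(integrate(lambda y: math.exp(y * theta), 0.0, 1.0))

def tilted_density(y, theta):
    """f(y; theta) = theta * e^{theta * y} / (e^theta - 1) on (0, 1)."""
    return theta * math.exp(theta * y) / (math.exp(theta) - 1.0)

theta = 1.5  # illustrative value
kappa_gap = abs(kappa_closed_form(theta) - kappa_numeric(theta))
total_mass = integrate(lambda y: tilted_density(y, theta), 0.0, 1.0)
print(kappa_gap, total_mass)
```

The gap between the two values of $\kappa$ should be close to zero, and the tilted density should integrate to roughly one.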

Case study for exponential family model of order 1

Find the natural observation and natural parameter for the following distributions:

exponential p.d.f. $f(y;\omega) = \omega^{-1}\exp(-y/\omega)$

binomial p.m.f. $C_m^r\,\pi^r (1-\pi)^{m-r}$

Poisson p.m.f. $\lambda^x e^{-\lambda}/x!$
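One way to check a candidate answer is to rebuild each p.m.f. from its exponential-family pieces and compare. The sketch below uses the standard choices $\theta = \log\lambda$, $\kappa(\theta) = e^{\theta}$ for the Poisson and $\theta = \log\{\pi/(1-\pi)\}$, $\kappa(\theta) = m\log(1+e^{\theta})$ for the binomial; for the exponential density the analogous choice is $s(y) = y$, $\theta = -1/\omega$. These are offered as a check, not as the solution given in the lecture, and the function names and test values are illustrative.

```python
import math

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_expfam(x, lam):
    theta = math.log(lam)          # natural parameter
    kappa = math.exp(theta)        # cumulant function, equals lam
    return math.exp(x * theta - kappa) / math.factorial(x)  # f0(x) = 1/x!

def binomial_pmf(r, m, pi):
    return math.comb(m, r) * pi ** r * (1 - pi) ** (m - r)

def binomial_expfam(r, m, pi):
    theta = math.log(pi / (1 - pi))            # natural parameter (log-odds)
    kappa = m * math.log(1 + math.exp(theta))  # cumulant function
    return math.exp(r * theta - kappa) * math.comb(m, r)  # f0(r) = C(m, r)

poisson_gap = max(abs(poisson_pmf(x, 2.5) - poisson_expfam(x, 2.5))
                  for x in range(10))
binomial_gap = max(abs(binomial_pmf(r, 8, 0.3) - binomial_expfam(r, 8, 0.3))
                   for r in range(9))
print(poisson_gap, binomial_gap)
```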

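Exponential family model of order p

Exponential tilting: $f(y;\theta) = \dfrac{e^{s^T(y)\theta} f_0(y)}{\int e^{s^T(x)\theta} f_0(x)\,dx}$

$\sum_i c_i s_i(y) = 0$ implies $c_i = 0$ (the components of $s(y)$ are linearly independent)

$\theta(\omega) = [\theta_1(\omega), \ldots, \theta_p(\omega)]^T$, with $\{1, \theta_1(\omega), \ldots, \theta_p(\omega)\}$ linearly independent

$f(y;\omega) = \exp\{s^T(y)\theta(\omega) - b(\omega)\} f_0(y)$

in case there is a 1-1 mapping between $\omega$ and $\theta$, $f(y;\theta) = \exp\{s^T(y)\theta - \kappa(\theta)\} f_0(y)$

$E\,s(Y) = \dfrac{d\kappa(\theta)}{d\theta}$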
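The identity $E\,s(Y) = d\kappa(\theta)/d\theta$ can be checked on a concrete family. The sketch below assumes the Poisson family with $s(y) = y$ and $\kappa(\theta) = e^{\theta}$, and compares a finite-difference derivative of $\kappa$ with the mean obtained by direct summation; the value of $\theta$ and the truncation point are illustrative.

```python
import math

def kappa(theta):
    """Cumulant function of the Poisson family: kappa(theta) = e^theta."""
    return math.exp(theta)

def mean_of_s(theta, x_max=200):
    """E s(Y) with s(y) = y, by direct summation of the Poisson p.m.f."""
    lam = math.exp(theta)
    total, pmf = 0.0, math.exp(-lam)   # pmf starts at P(Y = 0)
    for x in range(1, x_max):
        pmf *= lam / x                 # P(Y = x) from P(Y = x - 1)
        total += x * pmf
    return total

theta, h = 0.7, 1e-5
kappa_prime = (kappa(theta + h) - kappa(theta - h)) / (2 * h)  # central difference
expected_s = mean_of_s(theta)
print(kappa_prime, expected_s)
```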

Case study for exponential family model of order p

Find the natural observation and natural parameter for the following distributions:

beta p.d.f. $f(y;a,b) = \dfrac{y^{a-1}(1-y)^{b-1}}{B(a,b)}$, $0 < y < 1$

one-dimensional Gaussian $f(y \mid \mu, \sigma^2) = \dfrac{1}{(2\pi)^{0.5}\sigma}\exp\left\{-\dfrac{1}{2\sigma^2}(y-\mu)^2\right\}$
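For the Gaussian case, a candidate answer is $s(y) = (y, y^2)$ with $\theta = (\mu/\sigma^2, -1/(2\sigma^2))$. The sketch below rebuilds the density from these pieces and compares it with the usual formula. The expression for $\kappa(\theta)$ and the constant $f_0 = (2\pi)^{-1/2}$ follow from expanding the square; they are a check on the exercise rather than the solution from the lecture.

```python
import math

def gaussian_pdf(y, mu, sigma2):
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def gaussian_expfam(y, mu, sigma2):
    s = (y, y * y)                              # natural observation
    theta = (mu / sigma2, -1.0 / (2 * sigma2))  # natural parameter
    # kappa(theta) = -theta1^2 / (4 * theta2) - 0.5 * log(-2 * theta2)
    kappa = -theta[0] ** 2 / (4 * theta[1]) - 0.5 * math.log(-2 * theta[1])
    f0 = 1.0 / math.sqrt(2 * math.pi)           # base density factor
    return math.exp(s[0] * theta[0] + s[1] * theta[1] - kappa) * f0

ys = [-2.0, -0.5, 0.0, 1.3, 4.0]                # illustrative evaluation points
gaussian_gap = max(abs(gaussian_pdf(y, 1.0, 2.0) - gaussian_expfam(y, 1.0, 2.0))
                   for y in ys)
print(gaussian_gap)
```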

Sufficient Statistics

$f(y;\theta) = g\{s(y);\theta\}\,h(y) \iff f_{Y \mid S}(y \mid s;\theta)$ does not depend on $\theta$

$s(y)$ is called a sufficient statistic

$Y = [y_1, y_2, \ldots, y_n]$ is the observed sample

What can sufficient statistics do?

e.g. $f(y;\lambda) = \lambda e^{-\lambda y_1}\,\lambda e^{-\lambda y_2} = \lambda^2 e^{-\lambda(y_1+y_2)} \times 1$, so $s(y) = y_1 + y_2$ and $h(y) = 1$
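To illustrate what sufficiency buys, the sketch below takes two different exponential samples with the same sum and shows that their likelihoods agree for every rate $\lambda$, so any inference about $\lambda$ depends on the data only through the sum. The sample values are made up for illustration.

```python
import math

def exp_likelihood(sample, lam):
    """Joint density of i.i.d. Exponential(rate=lam) observations."""
    return math.prod(lam * math.exp(-lam * y) for y in sample)

sample_a = [0.2, 1.8, 1.0]   # sum = 3.0
sample_b = [1.5, 0.5, 1.0]   # different values, same sum = 3.0
rates = [0.1 * k for k in range(1, 51)]
max_gap = max(abs(exp_likelihood(sample_a, lam) - exp_likelihood(sample_b, lam))
              for lam in rates)

# The maximum-likelihood estimate n / sum(y) depends only on the sufficient statistic.
mle_a = len(sample_a) / sum(sample_a)
mle_b = len(sample_b) / sum(sample_b)
print(max_gap, mle_a, mle_b)
```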

Sufficient Statistics for Exponential family model

For i.i.d. observations $y_1, \ldots, y_n$: $S = \sum_{i=1}^{n} s(y_i)$

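Bayesian formula and Conjugate Prior

conjugate prior in Exponential family models

likelihood in exponential family model: $\prod_{i=1}^{n} f(y_i \mid \omega) = \exp\{S^T(y)\theta(\omega) - n\,b(\omega)\}\prod_{i=1}^{n} f_0(y_i)$

prior: $\pi(\omega) = \exp\{\varepsilon^T\theta(\omega) - \nu\,b(\omega) + c(\varepsilon, \nu)\}$

posterior: $p(\omega \mid D) = \dfrac{1}{Z}\,p(D \mid \omega)\,\pi(\omega) \propto \exp\{(\varepsilon + S)^T\theta(\omega) - (\nu + n)\,b(\omega)\}$, where $1/Z$ is the normalizing (partition) factor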
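The update $(\varepsilon, \nu) \to (\varepsilon + S, \nu + n)$ can be checked on a concrete pair. The sketch below assumes a Poisson likelihood with $\omega = \lambda$, $\theta(\omega) = \log\lambda$ and $b(\omega) = \lambda$, for which the prior $\exp\{\varepsilon\log\lambda - \nu\lambda\}$ is a Gamma density with shape $\varepsilon + 1$ and rate $\nu$. The hyperparameter values and the counts are hypothetical.

```python
import math

def conjugate_update(eps, nu, data):
    """Posterior hyperparameters (eps + S, nu + n), with S = sum of y_i."""
    return eps + sum(data), nu + len(data)

def grid_posterior_mean(eps, nu, data, lam_max=30.0, n=60000):
    """Posterior mean of lambda by brute-force integration of prior x likelihood."""
    h = lam_max / n
    num = den = 0.0
    for k in range(n):
        lam = (k + 0.5) * h
        log_prior = eps * math.log(lam) - nu * lam            # unnormalized prior
        log_lik = sum(y * math.log(lam) - lam for y in data)  # 1/y! factors dropped
        w = math.exp(log_prior + log_lik)
        num += lam * w
        den += w
    return num / den

eps, nu = 2.0, 1.0            # hypothetical prior hyperparameters
data = [3, 1, 4, 2, 2]        # hypothetical Poisson counts
eps_post, nu_post = conjugate_update(eps, nu, data)
closed_form_mean = (eps_post + 1) / nu_post   # mean of Gamma(shape=eps_post+1, rate=nu_post)
numeric_mean = grid_posterior_mean(eps, nu, data)
print(eps_post, nu_post, closed_form_mean, numeric_mean)
```

The brute-force posterior mean should agree with the closed-form Gamma mean, which is the practical payoff of conjugacy: no integration is needed.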

Posterior $\propto$ prior $\times$ likelihood

If the prior is conjugate to the likelihood, then the posterior has the same form as the prior

Multivariate Gaussian

\[
p(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left\{-\tfrac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\} \tag{1}
\]

variance: $\mathrm{cov}(x, x) = E\{(x - E\{x\})^2\}$

covariance: $\mathrm{cov}(x, y) = E\{(x - E\{x\})(y - E\{y\})\}$

question: how do you estimate the covariance of a Gaussian given $N$ samples?

Cholesky decomposition
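A minimal sketch connecting the two points above: the Cholesky factor $L$ of $\Sigma$ turns standard normal draws into draws with covariance $\Sigma$, and the sample covariance with divisor $N - 1$ is the usual unbiased estimate. The mean, covariance, seed and sample size are illustrative, and only the standard library is used.

```python
import math
import random

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_gaussian(mu, L, rng):
    """x = mu + L z with z ~ N(0, I) has covariance L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]

def sample_covariance(xs):
    """Unbiased estimator: sum of (x - mean)(x - mean)^T divided by N - 1."""
    N, d = len(xs), len(xs[0])
    mean = [sum(x[a] for x in xs) / N for a in range(d)]
    return [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in xs) / (N - 1)
             for b in range(d)] for a in range(d)]

rng = random.Random(0)
mu = [1.0, -2.0]
Sigma = [[2.0, 0.6], [0.6, 1.0]]
L = cholesky(Sigma)
samples = [sample_gaussian(mu, L, rng) for _ in range(20000)]
Sigma_hat = sample_covariance(samples)
print(Sigma_hat)
```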

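Wishart distribution

\[
\mathcal{W}(\Lambda \mid W_0, \nu_0) = \frac{1}{Z}\,|\Lambda|^{(\nu_0 - D - 1)/2} \exp\left(-\tfrac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda)\right)
\]

where $\nu_0$ is the number of degrees of freedom, $D$ is the dimension of $\Lambda$, and $Z$ is the normalizing constant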

statistical sampling

Metropolis-Hastings algorithm:

\[
A_k(z^*, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^*)\,q(z^{(\tau)} \mid z^*)}{\tilde{p}(z^{(\tau)})\,q(z^* \mid z^{(\tau)})}\right)
\]

where $q(z^* \mid z^{(\tau)})$ is the proposal distribution of the new sample $z^*$ conditioned on the current sample $z^{(\tau)}$, and $\tilde{p}$ is the unnormalized target density.

Gibbs sampler: sample from one conditional distribution at a time

Convergence: the chain satisfies detailed balance with respect to $p(z)$, so the samples follow $p(z)$ once the first batch of samples (burn-in) is discarded.
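The acceptance rule can be sketched in a few lines. The example below assumes a one-dimensional standard normal target, known only up to its normalizing constant, and a symmetric Gaussian random-walk proposal, so the two $q$ terms cancel in the ratio. The step size, burn-in length, starting point and seed are illustrative choices.

```python
import math
import random

def log_p_tilde(z):
    """Unnormalized log-density of the target; a standard normal is assumed here."""
    return -0.5 * z * z

def metropolis_hastings(n_samples, step=1.0, burn_in=1000, seed=0):
    rng = random.Random(seed)
    z = 5.0                                  # deliberately poor starting point
    chain = []
    for t in range(n_samples + burn_in):
        z_star = z + rng.gauss(0.0, step)    # symmetric proposal: q(z*|z) = q(z|z*)
        # log of min(1, p~(z*) / p~(z)); the q terms cancel for a symmetric proposal
        log_accept = min(0.0, log_p_tilde(z_star) - log_p_tilde(z))
        if math.log(1.0 - rng.random()) < log_accept:
            z = z_star                       # accept; otherwise keep the current state
        if t >= burn_in:                     # discard the first batch of samples
            chain.append(z)
    return chain

chain = metropolis_hastings(50000)
chain_mean = sum(chain) / len(chain)
chain_var = sum((z - chain_mean) ** 2 for z in chain) / len(chain)
print(chain_mean, chain_var)
```

After burn-in the chain mean and variance should be close to 0 and 1, the moments of the assumed target.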

Probabilistic Matrix Factorization

[Figure: graph model for probabilistic matrix factorization, with nodes $\sigma_U$, $\sigma_V$, $\sigma_R$, $U$, $V$, $R_{ij}$ and the labels "User" and "Items"]

Point estimate of MAP

\[
\mathrm{err}_{i,j} = R_{ij} - U_i^T V_j
\]

\[
\frac{\partial E}{\partial U_i} = \sum_{j=1}^{N_{item}} I_{i,j}\,(\mathrm{err}_{i,j})(-V_j) + \lambda_U U_i \tag{2}
\]

\[
\frac{\partial E}{\partial V_j} = \sum_{i=1}^{N_u} I_{i,j}\,(\mathrm{err}_{i,j})(-U_i) + \lambda_V V_j \tag{3}
\]
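Equations (2) and (3) are the gradients of an objective that the slide does not state; the sketch below assumes the usual regularized squared error $E = \tfrac{1}{2}\sum_{i,j} I_{i,j}\,\mathrm{err}_{i,j}^2 + \tfrac{\lambda_U}{2}\sum_i \|U_i\|^2 + \tfrac{\lambda_V}{2}\sum_j \|V_j\|^2$ and runs plain batch gradient descent on a toy rating dictionary. The ratings, latent dimension, learning rate and regularization weights are all hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(R, U, V, lam_U, lam_V):
    """E = 0.5 * (squared errors on observed entries + L2 penalties)."""
    sq = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in R.items())
    reg = lam_U * sum(dot(u, u) for u in U) + lam_V * sum(dot(v, v) for v in V)
    return 0.5 * (sq + reg)

def gradient_step(R, U, V, lam_U, lam_V, lr):
    """One batch gradient-descent step using equations (2) and (3)."""
    D = len(U[0])
    grad_U = [[lam_U * x for x in u] for u in U]   # regularization terms
    grad_V = [[lam_V * x for x in v] for v in V]
    for (i, j), r in R.items():                    # observed entries only (I_ij = 1)
        err = r - dot(U[i], V[j])
        for d in range(D):
            grad_U[i][d] += err * (-V[j][d])
            grad_V[j][d] += err * (-U[i][d])
    for u, g in zip(U, grad_U):
        for d in range(D):
            u[d] -= lr * g[d]
    for v, g in zip(V, grad_V):
        for d in range(D):
            v[d] -= lr * g[d]

rng = random.Random(0)
R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (2, 1): 1.0, (2, 2): 5.0}  # toy ratings
U = [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(3)]
V = [[rng.gauss(0, 0.1) for _ in range(2)] for _ in range(3)]
history = []
for _ in range(500):
    history.append(objective(R, U, V, 0.05, 0.05))
    gradient_step(R, U, V, 0.05, 0.05, lr=0.02)
print(history[0], history[-1])
```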

Bayesian Matrix factorization

Wishart distribution

\[
\mathcal{W}(\Lambda \mid W_0, \nu_0) = \frac{1}{Z}\,|\Lambda|^{(\nu_0 - D - 1)/2} \exp\left(-\tfrac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda)\right)
\]

\[
p(\Theta_U \mid \Theta_0) = \mathcal{N}(\mu_U \mid \mu_0, (\beta_0\Lambda_U)^{-1})\,\mathcal{W}(\Lambda_U \mid W_0, \nu_0), \qquad \Theta_U = \{\mu_U, \Lambda_U\}
\]

Graph model

\[
p(R, U, V, \Theta_U, \Theta_V \mid \Theta_0, \alpha) = p(R \mid U, V, \alpha)\,p(U \mid \mu_U, \Lambda_U)\,p(V \mid \mu_V, \Lambda_V)\,p(\Theta_U \mid \Theta_0)\,p(\Theta_V \mid \Theta_0)
\]

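sampling on U and V

\[
p(U_i \mid U_{-i}, R, V, \Theta_U, \Theta_V, \Theta_0, \alpha) = \frac{p(R \mid U,V,\alpha)\,p(U \mid \mu_U,\Lambda_U)\,[\,p(V \mid \mu_V,\Lambda_V)\,p(\Theta_U \mid \Theta_0)\,p(\Theta_V \mid \Theta_0)\,]}{p(R, U_{-i}, V, \Theta_U, \Theta_V \mid \Theta_0, \alpha)}
\]

Given $R, U_{-i}, V, \Theta_U, \Theta_V, \Theta_0$, the bracketed factors and the denominator are constant, as are all factors that do not involve $U_i$, so

\[
p(U_i \mid U_{-i}, R, V, \Theta_U, \Theta_V, \Theta_0, \alpha) \propto \prod_{j}\left[\mathcal{N}(R_{ij} \mid \langle U_i, V_j\rangle, \alpha^{-1})\right]^{I(i,j)}\,\mathcal{N}(U_i \mid \mu_U, \Lambda_U^{-1})
\]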
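Because both factors are Gaussian in $U_i$, the conditional is itself Gaussian. Completing the square gives precision $\Lambda_i^* = \Lambda_U + \alpha\sum_j I(i,j)\,V_j V_j^T$ and mean $\mu_i^* = (\Lambda_i^*)^{-1}\{\alpha\sum_j I(i,j)\,R_{ij} V_j + \Lambda_U\mu_U\}$. The slide does not spell these out, so the sketch below should be read as one possible implementation of the Gibbs step under that derivation, with hypothetical function names and toy inputs.

```python
import math
import random

def cholesky(A):
    """Lower-triangular L with L L^T = A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def sample_user_factor(ratings, V, mu_U, Lambda_U, alpha, rng):
    """One Gibbs draw of U_i; ratings maps item index j to R_ij (observed only)."""
    D = len(mu_U)
    prec = [row[:] for row in Lambda_U]                       # starts at Lambda_U
    rhs = [sum(Lambda_U[a][b] * mu_U[b] for b in range(D)) for a in range(D)]
    for j, r in ratings.items():
        for a in range(D):
            rhs[a] += alpha * r * V[j][a]
            for b in range(D):
                prec[a][b] += alpha * V[j][a] * V[j][b]
    L = cholesky(prec)
    mean = backward_sub(L, forward_sub(L, rhs))               # prec^{-1} rhs
    z = [rng.gauss(0.0, 1.0) for _ in range(D)]
    noise = backward_sub(L, z)                                # covariance prec^{-1}
    return [m + e for m, e in zip(mean, noise)], mean, prec

V = [[1.0, 0.0], [0.0, 1.0]]                                  # toy item factors
draw, post_mean, post_prec = sample_user_factor(
    {0: 2.0, 1: 4.0}, V, mu_U=[0.0, 0.0],
    Lambda_U=[[1.0, 0.0], [0.0, 1.0]], alpha=1.0, rng=random.Random(0))
print(post_mean, post_prec)
```

Sampling $V_j$ is symmetric, with the roles of users and items exchanged.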

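sampling on hyperparameter

\[
p(\Theta_U \mid R, U, V, \Theta_V, \Theta_0, \alpha) = \frac{p(R \mid U,V,\alpha)\,p(U \mid \mu_U,\Lambda_U)\,p(V \mid \mu_V,\Lambda_V)\,p(\Theta_U \mid \Theta_0)\,p(\Theta_V \mid \Theta_0)}{p(R, U, V, \Theta_V \mid \Theta_0, \alpha)}
\]

Given $R, U, V, \Theta_V, \Theta_0$, the denominator is constant, and so are $p(R \mid U,V,\alpha) = \prod_{i,j}[\mathcal{N}(R_{ij} \mid \langle U_i, V_j\rangle, \alpha^{-1})]^{I(i,j)}$, $p(V \mid \mu_V,\Lambda_V)$ and $p(\Theta_V \mid \Theta_0) = \mathcal{N}(\mu_V \mid \mu_0, (\beta_0\Lambda_V)^{-1})\,\mathcal{W}(\Lambda_V \mid W_0, \nu_0)$. Therefore, with $\mathcal{N}(U \mid \mu_U, \Lambda_U^{-1}) = \prod_i \mathcal{N}(U_i \mid \mu_U, \Lambda_U^{-1})$,

\[
p(\Theta_U \mid R, U, V, \Theta_V, \Theta_0, \alpha) \propto \mathcal{N}(U \mid \mu_U, \Lambda_U^{-1})\,\mathcal{N}(\mu_U \mid \mu_0, (\beta_0\Lambda_U)^{-1})\,\mathcal{W}(\Lambda_U \mid W_0, \nu_0) \propto \mathcal{N}(\mu_U \mid \mu_0^*, (\beta_0^*\Lambda_U)^{-1})\,\mathcal{W}(\Lambda_U \mid W_0^*, \nu_0^*)
\]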
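The starred hyperparameters are not listed on the slide. The sketch below assumes the standard Normal-Wishart conjugate update: $\beta_0^* = \beta_0 + N$, $\nu_0^* = \nu_0 + N$, $\mu_0^* = (\beta_0\mu_0 + N\bar{U})/(\beta_0 + N)$ and $(W_0^*)^{-1} = W_0^{-1} + \sum_i (U_i - \bar{U})(U_i - \bar{U})^T + \tfrac{\beta_0 N}{\beta_0 + N}(\mu_0 - \bar{U})(\mu_0 - \bar{U})^T$, where $N$ is the number of users and $\bar{U}$ their mean vector. The function name and inputs are hypothetical, and the function works with $W_0^{-1}$ directly to avoid a matrix inversion.

```python
def normal_wishart_update(U, mu0, beta0, W0_inv, nu0):
    """Return (mu_star, beta_star, W_star_inv, nu_star) given the rows of U."""
    N, D = len(U), len(mu0)
    U_bar = [sum(u[a] for u in U) / N for a in range(D)]
    beta_star = beta0 + N
    nu_star = nu0 + N
    mu_star = [(beta0 * mu0[a] + N * U_bar[a]) / beta_star for a in range(D)]
    # Scatter matrix: sum_i (U_i - U_bar)(U_i - U_bar)^T
    scatter = [[sum((u[a] - U_bar[a]) * (u[b] - U_bar[b]) for u in U)
                for b in range(D)] for a in range(D)]
    shrink = beta0 * N / (beta0 + N)
    W_star_inv = [[W0_inv[a][b] + scatter[a][b]
                   + shrink * (mu0[a] - U_bar[a]) * (mu0[b] - U_bar[b])
                   for b in range(D)] for a in range(D)]
    return mu_star, beta_star, W_star_inv, nu_star

U = [[1.0, 0.0], [3.0, 0.0]]                 # two hypothetical user vectors
mu_star, beta_star, W_star_inv, nu_star = normal_wishart_update(
    U, mu0=[0.0, 0.0], beta0=2.0, W0_inv=[[1.0, 0.0], [0.0, 1.0]], nu0=2)
print(mu_star, beta_star, W_star_inv, nu_star)
```

A full Gibbs step would then draw $\Lambda_U$ from the Wishart with these parameters and $\mu_U$ from the Gaussian; drawing from a Wishart is left out of this sketch.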

Bayesian Factorization machine

A review of the factorization machine model

Gaussian and Wishart