Lecture notes for Bayesian methods in recommendation systems

Bayesian Methods in Recommendation Systems


Lecturer: Xudong Sun,[email protected]

DSOR-AISBI

January 20, 2016


1 Preliminaries

Exponential family model

Sufficient Statistics

Bayesian formula and Conjugate Prior

Statistical sampling

2 Probabilistic Matrix Factorization

3 Bayesian Matrix Factorization

4 Bayesian Factorization Machine


Preliminaries


Exponential family model of order 1

Exponential tilting: $f(y;\theta) = \dfrac{e^{s(y)\theta} f_0(y)}{\int e^{s(x)\theta} f_0(x)\,dx}$

$f_0(x)$ is a uniform distribution on a defined support

a valid $\theta$ is called a natural parameter

$s(y)$ is a transform of the random variable, which defines the order of the exponential family

cumulant-generating function: $\kappa(\theta) = \log \int e^{s(y)\theta} f_0(y)\,dy$

The natural parameter set is a convex set: $N = \{\theta : \kappa(\theta) < \infty\}$

Partition function $Z(\theta) = e^{\kappa(\theta)}$, so the normalizing factor of the density is $e^{-\kappa(\theta)}$; moment-generating function $E\{e^{tY}\} = \int e^{ty} f(y)\,dy$

Physics: if you have the partition function, you have everything!


Preliminaries


Exponential family model of order 1

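Suppose $s(y) = y$ and $f_0$ is the uniform density on $[0, 1]$

$f(y;\theta) = \dfrac{e^{y\theta} f_0(y)}{\int e^{x\theta} f_0(x)\,dx}$

$\kappa(\theta) = \log\{(e^{\theta} - 1)/\theta\}$, why? Because $\int_0^1 e^{x\theta}\,dx = (e^{\theta} - 1)/\theta$

$f(y;\theta) = \theta e^{\theta y}/(e^{\theta} - 1)$ for $0 \le y \le 1$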
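The closed forms for the tilted uniform density can be checked numerically. The sketch below is a minimal check that assumes $f_0$ is uniform on $[0, 1]$ and $s(y) = y$; the function names, the value of `theta`, and the midpoint integration rule are illustrative choices, not part of the lecture.

```python
import math

def kappa(theta):
    """Closed-form cumulant-generating function for the tilted Uniform(0, 1)."""
    return math.log((math.exp(theta) - 1.0) / theta)

def tilted_density(y, theta):
    """f(y; theta) = theta * exp(theta * y) / (exp(theta) - 1) on [0, 1]."""
    return theta * math.exp(theta * y) / (math.exp(theta) - 1.0)

def midpoint_integral(func, a, b, steps=20000):
    """Plain midpoint rule; accurate enough for these smooth integrands."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

theta = 1.5  # arbitrary nonzero natural parameter (assumption for the demo)

# Numerical normalizer: integral of exp(theta * y) * f0(y) with f0 = 1 on [0, 1].
normalizer = midpoint_integral(lambda y: math.exp(theta * y), 0.0, 1.0)
# The tilted density should integrate to one.
total_mass = midpoint_integral(lambda y: tilted_density(y, theta), 0.0, 1.0)

print("log normalizer:", math.log(normalizer), "closed form:", kappa(theta))
print("total mass:", total_mass)
```

The two printed values on the first line should agree to many digits, and the total mass should be very close to 1.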

Exponential family model without tilting:

$f(y;\theta) = \dfrac{e^{s(y)\theta} f_0(y)}{e^{\kappa(\theta)}} = \exp\{s(y)\theta - \kappa(\theta) + \log f_0(y)\}$

$f(y;\omega) = \exp\{s(y)\theta(\omega) - b(\omega) + c(y)\}$

$s(y)$ is called a natural observation

$\theta$ is called a natural parameter


Preliminaries


Case study for exponential family model of order 1

Find the natural observation and natural parameter for the following distributions:

exponential p.d.f. $f(y;\omega) = \omega^{-1}\exp(-y/\omega)$

binomial p.m.f. $f(r;\pi) = \binom{m}{r}\pi^{r}(1-\pi)^{m-r}$

Poisson p.m.f. $f(x;\lambda) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}$
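One way to work the three cases is to rewrite each density in the form $\exp\{s\,\theta - b + c\}$ and read off the terms. The block below is a worked sketch of that rewriting; the symbols $b$ and $c$ follow the general form used for the order-1 family, and the layout is only one possible presentation.

```latex
\begin{align*}
\text{Exponential:}\quad
  & \omega^{-1} e^{-y/\omega}
    = \exp\Bigl\{ y \cdot \bigl(-\tfrac{1}{\omega}\bigr) - \log\omega \Bigr\},
  && s(y) = y,\quad \theta = -\tfrac{1}{\omega},\quad b = \log\omega \\
\text{Binomial:}\quad
  & \binom{m}{r}\pi^{r}(1-\pi)^{m-r}
    = \exp\Bigl\{ r \log\tfrac{\pi}{1-\pi} + m\log(1-\pi) + \log\tbinom{m}{r} \Bigr\},
  && s(r) = r,\quad \theta = \log\tfrac{\pi}{1-\pi},\quad b = -m\log(1-\pi) \\
\text{Poisson:}\quad
  & \frac{\lambda^{x} e^{-\lambda}}{x!}
    = \exp\bigl\{ x \log\lambda - \lambda - \log x! \bigr\},
  && s(x) = x,\quad \theta = \log\lambda,\quad b = \lambda
\end{align*}
```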


Preliminaries


Exponential family model of order p

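Exponential tilting: $f(y;\theta) = \dfrac{e^{s^T(y)\theta} f_0(y)}{\int e^{s^T(x)\theta} f_0(x)\,dx}$

$\sum_i c_i s_i(y) = 0$ implies $c_i = 0$ (the components of $s$ are linearly independent)

$\theta = [1, \theta_1(\omega), \ldots, \theta_p(\omega)]$

$f(y;\omega) = \exp\{s^T(y)\theta(\omega) - b(\omega)\} f_0(y)$

in case there is a 1-1 mapping between $\omega$ and $\theta$: $f(y;\theta) = \exp\{s^T(y)\theta - \kappa(\theta)\} f_0(y)$

$E\,s(Y) = \dfrac{d\kappa(\theta)}{d\theta}$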
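The identity $E\,s(Y) = d\kappa(\theta)/d\theta$ can be checked numerically on a one-parameter example. The sketch below assumes the tilted Uniform(0, 1) family with $s(y) = y$, for which $\kappa(\theta) = \log\{(e^{\theta}-1)/\theta\}$; the finite-difference step and the integration grid are arbitrary illustrative choices.

```python
import math

def kappa(theta):
    """Cumulant-generating function of the tilted Uniform(0, 1) family."""
    return math.log((math.exp(theta) - 1.0) / theta)

def mean_of_s(theta, steps=20000):
    """E[s(Y)] with s(y) = y, integrated with the midpoint rule on [0, 1]."""
    h = 1.0 / steps
    norm = math.exp(theta) - 1.0
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y * theta * math.exp(theta * y) / norm
    return h * total

theta = 1.5   # arbitrary natural parameter (assumption for the demo)
eps = 1e-5    # finite-difference step

# Central difference approximation of d kappa / d theta.
dkappa = (kappa(theta + eps) - kappa(theta - eps)) / (2.0 * eps)
expected_s = mean_of_s(theta)

print("d kappa / d theta:", dkappa)
print("E s(Y):           ", expected_s)
```

Both numbers should match the closed form $e^{\theta}/(e^{\theta}-1) - 1/\theta$.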


Preliminaries


Case study for exponential family model of order p

Find the natural observation and natural parameter for the following distributions:

beta p.d.f. $f(y;a,b) = \dfrac{y^{a-1}(1-y)^{b-1}}{B(a,b)}$, $0 < y < 1$

one-dimensional Gaussian $f(y\mid\mu,\sigma^2) = \dfrac{1}{(2\pi)^{0.5}\sigma}\exp\left\{-\dfrac{1}{2\sigma^2}(y-\mu)^2\right\}$


Preliminaries



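$f(y;\theta) = g\{s(y);\theta\}\,h(y) \iff f_{Y\mid S}(y\mid s;\theta)$ does not depend on $\theta$

$s(y)$ is called a sufficient statistic

$Y = [y_1, y_2, \ldots, y_n]$ is the observed sample

What can sufficient statistics do?

e.g. $f(y;\lambda) = \lambda e^{-\lambda y_1}\,\lambda e^{-\lambda y_2} = \lambda^2 e^{-\lambda(y_1+y_2)} \times 1$, so $s(y) = y_1 + y_2$ and $h(y) = 1$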
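The exponential example can be made concrete: two samples with the same size and the same sum give the same likelihood for every $\lambda$, so nothing beyond the sufficient statistic matters for inference about $\lambda$. The data values and function names below are made up for illustration.

```python
import math

def exp_likelihood(samples, lam):
    """Joint density of i.i.d. Exponential(lam) observations."""
    return math.prod(lam * math.exp(-lam * y) for y in samples)

def likelihood_from_statistic(n, total, lam):
    """The same likelihood written through the sufficient statistic (n, sum of y)."""
    return lam ** n * math.exp(-lam * total)

data_a = [0.5, 2.5]  # hypothetical observations
data_b = [1.0, 2.0]  # different observations with the same sum

for lam in (0.3, 1.0, 2.0):
    full_a = exp_likelihood(data_a, lam)
    full_b = exp_likelihood(data_b, lam)
    reduced = likelihood_from_statistic(2, 3.0, lam)
    print(lam, full_a, full_b, reduced)
```

Each printed row should show three equal values up to floating-point rounding.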


Preliminaries


Sufficient Statistics for Exponential family model

i.i.d.: $S = \sum_{i=1}^{n} s(y_i)$


Preliminaries


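Conjugate prior in exponential family models

likelihood in exponential family model: $\prod_{i=1}^{n} f(y_i\mid\omega) \propto \exp\{S^T(y)\theta(\omega) - n\,b(\omega)\}$

prior: $\pi(\omega) = \exp\{\varepsilon^T\theta(\omega) - \nu\,b(\omega) + c(\varepsilon,\nu)\}$

posterior: $p(\omega\mid D) \propto p(D\mid\omega)\,\pi(\omega) \propto \exp\{(\varepsilon + S)^T\theta(\omega) - (\nu + n)\,b(\omega)\}$, where the normalizing (partition) factor does not depend on $\omega$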


Preliminaries


Posterior ∝ prior × likelihood

If the prior is conjugate to the likelihood, then the posterior has the same form as the prior
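A small conjugate pair makes the statement concrete. The sketch below assumes an exponential likelihood $\lambda e^{-\lambda y}$ with a Gamma(shape, rate) prior on $\lambda$, which is the standard conjugate choice for this likelihood; the prior values and data are invented for the demo. The update adds the sample size to the shape and the sum of the observations to the rate, mirroring the way the conjugate prior absorbs $n$ and $S$.

```python
import math

def gamma_pdf(lam, shape, rate):
    """Gamma density with shape/rate parameterization."""
    return rate ** shape / math.gamma(shape) * lam ** (shape - 1.0) * math.exp(-rate * lam)

def exp_likelihood(samples, lam):
    """Joint density of i.i.d. Exponential(lam) observations."""
    return math.prod(lam * math.exp(-lam * y) for y in samples)

def conjugate_update(shape, rate, samples):
    """Posterior Gamma parameters: shape + n, rate + sum of observations."""
    return shape + len(samples), rate + sum(samples)

prior_shape, prior_rate = 2.0, 1.0   # hypothetical prior
samples = [0.8, 1.7, 0.4, 2.1]       # hypothetical observations
post_shape, post_rate = conjugate_update(prior_shape, prior_rate, samples)

# prior * likelihood divided by the claimed posterior must not depend on lambda.
ratios = [
    gamma_pdf(lam, prior_shape, prior_rate) * exp_likelihood(samples, lam)
    / gamma_pdf(lam, post_shape, post_rate)
    for lam in (0.2, 0.7, 1.5, 3.0)
]
print(post_shape, post_rate)
print(ratios)
```

The ratios are all equal to the same constant (the marginal likelihood of the data), which confirms that the posterior stays in the Gamma family.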


Preliminaries


Multivariate Gaussian

$p(x_1, x_2, \ldots, x_n) = \dfrac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left\{-\tfrac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$ (1)

variance: $\mathrm{cov}(x,x) = E\{(x - E\{x\})^2\}$

covariance: $\mathrm{cov}(x,y) = E\{(x - E\{x\})(y - E\{y\})\}$

question: how do you estimate the covariance of a Gaussian given N samples?

Cholesky decomposition
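The two ideas on this slide fit together: a Cholesky factor $L$ with $LL^T = \Sigma$ turns standard normal draws $z$ into draws $x = \mu + Lz$ from $N(\mu, \Sigma)$, and the sample covariance of those draws estimates $\Sigma$. The sketch below uses plain Python lists; the mean, covariance, sample size, and the choice of the unbiased $1/(N-1)$ estimator are assumptions made for illustration.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (symmetric positive definite input)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def sample_gaussian(mean, lower, rng):
    """Draw x = mean + L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]

def sample_covariance(samples):
    """Unbiased estimate: centered outer products summed and divided by N - 1."""
    count, dim = len(samples), len(samples[0])
    mean = [sum(x[d] for x in samples) / count for d in range(dim)]
    return [
        [sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in samples) / (count - 1)
         for b in range(dim)]
        for a in range(dim)
    ]

rng = random.Random(0)
mu = [1.0, -2.0]                    # hypothetical mean
sigma = [[2.0, 0.6], [0.6, 1.0]]    # hypothetical covariance
lower = cholesky(sigma)
draws = [sample_gaussian(mu, lower, rng) for _ in range(20000)]
estimate = sample_covariance(draws)
print(estimate)
```

With 20000 draws the estimated matrix should be close to `sigma`, with small random deviations in each entry.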


Preliminaries


Wishart distribution

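$W(\Lambda\mid W_0,\nu_0) = \dfrac{1}{Z}\,|\Lambda|^{(\nu_0-D-1)/2}\exp\left(-\tfrac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda)\right)$, where $\nu_0$ is the number of degrees of freedom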


Preliminaries

statistical sampling

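Metropolis-Hastings algorithm: $A_k(z^*, z^{(\tau)}) = \min\left(1, \dfrac{\tilde{p}(z^*)\,q(z^{(\tau)}\mid z^*)}{\tilde{p}(z^{(\tau)})\,q(z^*\mid z^{(\tau)})}\right)$, where $q(z^*\mid z^{(\tau)})$ is the proposal distribution of the new sample $z^*$ conditioned on the current sample $z^{(\tau)}$.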
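The acceptance rule can be sketched in a few lines. The example below assumes a standard normal target known only up to a constant and a Gaussian random-walk proposal; both are illustrative choices, as are the step size, burn-in length, and seed. The proposal terms are kept explicit to mirror the formula, even though they cancel for a symmetric proposal.

```python
import math
import random

def log_target(z):
    """Unnormalized log density; a standard normal is assumed for the demo."""
    return -0.5 * z * z

def log_proposal(z_new, z_old, step):
    """log q(z_new | z_old) for a Gaussian random walk, up to a constant."""
    return -0.5 * ((z_new - z_old) / step) ** 2

def metropolis_hastings(num_samples, step=1.0, burn_in=1000, seed=0):
    rng = random.Random(seed)
    current = 0.0
    chain = []
    for iteration in range(num_samples + burn_in):
        candidate = current + rng.gauss(0.0, step)
        # log of p~(z*) q(z_tau | z*) / (p~(z_tau) q(z* | z_tau))
        log_ratio = (log_target(candidate) + log_proposal(current, candidate, step)
                     - log_target(current) - log_proposal(candidate, current, step))
        # Accept with probability min(1, ratio); otherwise keep the current state.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = candidate
        if iteration >= burn_in:
            chain.append(current)
    return chain

chain = metropolis_hastings(20000)
chain_mean = sum(chain) / len(chain)
chain_var = sum((z - chain_mean) ** 2 for z in chain) / len(chain)
print(chain_mean, chain_var)
```

The chain mean and variance should land near 0 and 1, the moments of the assumed target.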

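Gibbs sampler: sample from one conditional distribution at a time

Convergence: detailed balance makes $p(z)$ the stationary distribution of the chain; the samples approximately follow it once the first batch of samples (the burn-in) is discarded.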
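A Gibbs sampler is easiest to see on a target whose conditionals are known in closed form. The sketch below assumes a zero-mean bivariate Gaussian with unit variances and correlation `rho`, for which each conditional is $N(\rho \cdot \text{other}, 1-\rho^2)$; the value of `rho`, the burn-in length, and the seed are illustrative.

```python
import math
import random

def gibbs_bivariate_normal(rho, num_samples, burn_in=500, seed=0):
    """Alternate draws from x | y and y | x, discarding the burn-in sweeps."""
    rng = random.Random(seed)
    cond_std = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    chain = []
    for sweep in range(num_samples + burn_in):
        x = rng.gauss(rho * y, cond_std)  # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_std)  # y | x ~ N(rho * x, 1 - rho^2)
        if sweep >= burn_in:
            chain.append((x, y))
    return chain

def correlation(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in pairs) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in pairs) / n
    return cov / math.sqrt(var_x * var_y)

pairs = gibbs_bivariate_normal(rho=0.8, num_samples=20000)
estimated_rho = correlation(pairs)
print(estimated_rho)
```

The estimated correlation should be close to the 0.8 used to define the conditionals.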


Probabilistic Matrix Factorization

Figure: graphical model for probabilistic matrix factorization (nodes $\sigma_U$, $\sigma_V$, $\sigma_R$, $U$, $V$, $R_{ij}$; plates over users and items)


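Probabilistic Matrix Factorization

MAP point estimate

$\mathrm{err}_{i,j} = R_{ij} - U_i^T V_j$

$\dfrac{\partial E}{\partial U_i} = \sum_{j=1}^{N_{item}} I_{i,j}\,(\mathrm{err}_{i,j})(-V_j) + \lambda_U U_i$ (2)

$\dfrac{\partial E}{\partial V_j} = \sum_{i=1}^{N_u} I_{i,j}\,(\mathrm{err}_{i,j})(-U_i) + \lambda_V V_j$ (3)

Here $I_{i,j} = 1$ if user $i$ rated item $j$ and 0 otherwise, and $E = \tfrac{1}{2}\sum_{i,j} I_{i,j}\,\mathrm{err}_{i,j}^2 + \tfrac{\lambda_U}{2}\sum_i \|U_i\|^2 + \tfrac{\lambda_V}{2}\sum_j \|V_j\|^2$ is the objective whose gradients are (2) and (3).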
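Gradients (2) and (3) translate directly into a batch gradient-descent loop. The sketch below is a toy illustration under stated assumptions: the rating dictionary, rank, learning rate, regularization weights, and iteration count are all invented, and the objective is taken to be the regularized squared error over observed ratings that matches these gradients.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def objective(ratings, U, V, lam_u, lam_v):
    """Half squared error on observed ratings plus L2 penalties on U and V."""
    squared = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in ratings.items())
    norm_u = sum(dot(row, row) for row in U)
    norm_v = sum(dot(row, row) for row in V)
    return 0.5 * squared + 0.5 * lam_u * norm_u + 0.5 * lam_v * norm_v

def gradient_step(ratings, U, V, lam_u, lam_v, lr):
    """One batch update using equations (2) and (3)."""
    grad_u = [[lam_u * value for value in row] for row in U]
    grad_v = [[lam_v * value for value in row] for row in V]
    for (i, j), r in ratings.items():      # only observed entries, I_ij = 1
        err = r - dot(U[i], V[j])
        for k in range(len(U[i])):
            grad_u[i][k] += err * (-V[j][k])
            grad_v[j][k] += err * (-U[i][k])
    for row, grad in zip(U, grad_u):
        for k in range(len(row)):
            row[k] -= lr * grad[k]
    for row, grad in zip(V, grad_v):
        for k in range(len(row)):
            row[k] -= lr * grad[k]

rng = random.Random(0)
# Hypothetical observed ratings: (user, item) -> rating.
ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0, (2, 1): 2.0, (2, 2): 5.0}
rank = 2
U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(3)]
V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(3)]

initial = objective(ratings, U, V, 0.1, 0.1)
for _ in range(1000):
    gradient_step(ratings, U, V, 0.1, 0.1, 0.01)
final = objective(ratings, U, V, 0.1, 0.1)
print(initial, final)
```

The final objective should be well below the initial one; the result is a single MAP point estimate, in contrast to the posterior samples used in the Bayesian treatment.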


Bayesian Matrix factorization

Wishart distribution

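$W(\Lambda\mid W_0,\nu_0) = \dfrac{1}{Z}\,|\Lambda|^{(\nu_0-D-1)/2}\exp\left(-\tfrac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda)\right)$, where $\nu_0$ is the number of degrees of freedom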



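$p(\Theta_U\mid\Theta_0) = N(\mu_U\mid\mu_0,(\beta_0\Lambda_U)^{-1})\,W(\Lambda_U\mid W_0,\nu_0)$, with $\Theta_U = \{\mu_U,\Lambda_U\}$ and $\nu_0$ the degrees of freedom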



Graph model



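Sampling the hyperparameters

With $R, U, V, \Theta_V, \Theta_0$ held constant (so the denominator is a constant):

$p(\Theta_U \mid R, U, V, \Theta_V, \alpha) = \dfrac{p(R\mid U,V,\alpha)\,p(U\mid\mu_U,\Lambda_U)\,p(V\mid\mu_V,\Lambda_V)\,p(\Theta_U\mid\Theta_0)\,p(\Theta_V\mid\Theta_0)}{p(R,U,V,\Theta_V,\alpha)}$

$\propto \left[\prod_{i,j} N(R_{ij}\mid\langle U_i,V_j\rangle,\alpha^{-1})^{I(i,j)}\right] N(U\mid\mu_U,\Lambda_U)\,N(V\mid\mu_V,\Lambda_V)\,\{N(\mu_V\mid\mu_0,(\beta_0\Lambda_V)^{-1})\,W(\Lambda_V\mid W_0,\nu_0)\} \times \{N(\mu_U\mid\mu_0,(\beta_0\Lambda_U)^{-1})\,W(\Lambda_U\mid W_0,\nu_0)\}$

$\propto N(U\mid\mu_U,\Lambda_U)\,N(\mu_U\mid\mu_0,(\beta_0\Lambda_U)^{-1})\,W(\Lambda_U\mid W_0,\nu_0)$ (dropping every factor that does not involve $\Theta_U$)

$= N(\mu_U\mid\mu_0^*,(\beta_0^*\Lambda_U)^{-1})\,W(\Lambda_U\mid W_0^*,\nu_0^*)$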
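The starred parameters are not spelled out on the slide. The block below gives the standard Gaussian-Wishart conjugate update as a sketch, assuming $N(U\mid\mu_U,\Lambda_U)$ stands for $\prod_{i=1}^{N_u} N(U_i\mid\mu_U,\Lambda_U^{-1})$ with $\Lambda_U$ a precision matrix; the symbols $\bar{U}$ and $\bar{S}$ are introduced here for the sample mean and the centered scatter of the user factors and are not notation from the lecture.

```latex
\begin{align*}
\bar{U} &= \frac{1}{N_u}\sum_{i=1}^{N_u} U_i,
&
\bar{S} &= \frac{1}{N_u}\sum_{i=1}^{N_u} (U_i - \bar{U})(U_i - \bar{U})^T, \\
\beta_0^* &= \beta_0 + N_u,
&
\nu_0^* &= \nu_0 + N_u, \\
\mu_0^* &= \frac{\beta_0\mu_0 + N_u\bar{U}}{\beta_0 + N_u},
&
\bigl[W_0^*\bigr]^{-1} &= W_0^{-1} + N_u\bar{S}
  + \frac{\beta_0 N_u}{\beta_0 + N_u}\,(\mu_0 - \bar{U})(\mu_0 - \bar{U})^T .
\end{align*}
```

The same update with $V$, $N_{item}$, and $\Theta_V$ in place of $U$, $N_u$, and $\Theta_U$ applies to the item hyperparameters.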


Bayesian Factorization machine

A review of the factorization machine model

Gaussian and Wishart
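As a reminder for the review, a second-order factorization machine predicts $\hat{y}(x) = w_0 + \sum_i w_i x_i + \sum_{i<j} \langle v_i, v_j\rangle x_i x_j$. The sketch below assumes this standard form, which the slide itself does not write out; it compares the direct double sum with the usual linear-time rewriting of the pairwise term. All parameter values and function names are made up for illustration.

```python
import math

def fm_predict_naive(x, w0, w, V):
    """Direct evaluation: bias + linear term + all pairwise interactions."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n) for j in range(i + 1, n)
    )
    return w0 + linear + pairwise

def fm_predict_fast(x, w0, w, V):
    """Pairwise term as 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        total = sum(V[i][f] * x[i] for i in range(n))
        total_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - total_sq)
    return w0 + linear + pairwise

# Hypothetical model with 4 features and 2 latent factors per feature.
x = [1.0, 0.0, 2.0, 0.5]
w0 = 0.1
w = [0.2, -0.3, 0.05, 0.4]
V = [[0.1, 0.3], [0.2, -0.1], [-0.4, 0.2], [0.3, 0.5]]

naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
print(naive, fast)
```

Both functions should return the same prediction; the second costs time proportional to the number of features times the number of factors rather than the square of the number of features.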
