exploring statistical language models for recommender systems [recsys '15 ds slides]

DOCTORAL SYMPOSIUM: Exploring Statistical Language Models for Recommender Systems

RecSys 2015, 16-20 September, Vienna, Austria

Daniel Valcarce (@dvalcarce)

Information Retrieval Lab, University of A Coruña, Spain

Motivation

Information Retrieval vs Information Filtering (1)

Information Retrieval (IR)

# Goal: Retrieve relevant documents according to the information need of a user

# Examples: Search engines (web, multimedia...)

# Input: The user’s query (explicit).

Information Filtering (IF)

# Goal: Select relevant items from an information stream for a given user

# Examples: spam filters, recommender systems

# Input: The user’s history (implicit).

Some people consider them different fields:

# U. Hanani, B. Shapira and P. Shoval: Information Filtering: Overview of Issues, Research and Systems in User Modeling and User-Adapted Interaction (2001)

while others consider them the same thing:

# N. J. Belkin and W. B. Croft: Information filtering and information retrieval: two sides of the same coin? in Communications of the ACM (1992)

What is undeniable is that they are closely related:

# Why not apply techniques from one field to the other?

# It has already been done!

Some retrieval techniques are:

# Vector: Vector Space Model

# MF: Latent Semantic Indexing (LSI)

# Probabilistic: LDA, Language Models (LM)

Some CF techniques are:

# Vector: Pairwise similarities (cosine, Pearson)

# MF: SVD, NMF

# Probabilistic: LDA and other PGMs


Language Models for Recommendation: Research goals

Language Models (LM) represented a breakthrough in Information Retrieval:

# State-of-the-art technique for text retrieval

# Solid statistical foundation

Maybe they can also be useful in RecSys:

# Are LM a good framework for Collaborative Filtering?

# Can LM be adapted to deal with temporal (TARS) and/or contextual information (CARS)?

# A principled formulation of LM that combines Content-Based and Collaborative Filtering?

Language Models for Recommendation: Related work

Little work has been done on using Language Models for CF:

# J. Wang, A. P. de Vries and M. J. Reinders: A User-Item Relevance Model for Log-based Collaborative Filtering in ECIR 2006

# A. Bellogín, J. Wang and P. Castells: Bridging Memory-Based Collaborative Filtering and Text Retrieval in Information Retrieval (2013)

# J. Parapar, A. Bellogín, P. Castells and Á. Barreiro: Relevance-Based Language Modelling for Recommender Systems in Information Processing & Management (2013)

Relevance-Based Language Models for Collaborative Filtering

Relevance-Based Language Models

Relevance-Based Language Models, or Relevance Models (RM), are a pseudo-relevance feedback technique from IR.

Pseudo-relevance feedback is an automatic query expansion technique.

The expanded query is expected to yield better results than the original one.

Pseudo-relevance feedback

Figure (diagram): the information need is expressed as a query to the Retrieval System; Query Expansion turns it into an expanded query that is sent back to the Retrieval System.
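The feedback loop can be sketched in a few lines of Python. This is a deliberately simple illustration of pseudo-relevance feedback, not the Relevance Model itself: the toy `retrieve` function (hypothetical) scores documents by query-term counts, the top `k` documents are assumed relevant, and expansion terms are chosen by raw frequency in those documents. The collection and all names are made up for the example.

```python
from collections import Counter
from typing import List

Doc = List[str]

def retrieve(query: List[str], docs: List[Doc]) -> List[int]:
    """Toy retrieval: rank document indices by the number of query-term occurrences."""
    scores = [sum(doc.count(t) for t in query) for doc in docs]
    return sorted(range(len(docs)), key=lambda d: scores[d], reverse=True)

def expand_query(query: List[str], docs: List[Doc], k: int = 2, m: int = 2) -> List[str]:
    """Pseudo-relevance feedback: assume the top-k results are relevant and
    append the m most frequent new terms found in them to the query."""
    feedback = retrieve(query, docs)[:k]
    counts = Counter(t for d in feedback for t in docs[d] if t not in query)
    return query + [t for t, _ in counts.most_common(m)]

docs = [
    ["most", "populated", "state", "california", "population"],
    ["state", "california", "population", "census"],
    ["shark", "ocean"],
]
expanded = expand_query(["most", "populated", "state"], docs)
print(expanded)
# A second call to retrieve(expanded, docs) would produce the final ranking.
```

Relevance Models replace the raw frequency count with a probabilistic estimate of how likely each term is under the relevant set, but the loop (retrieve, assume relevance, expand, retrieve again) stays the same.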

Relevance-Based Language Models for CF Recommendation (1)

| IR | RecSys |
|---|---|
| User’s query | User’s profile |
| most^1, populated^1, state^2 | Titanic^2, Avatar^3, Shark^5 |

Parapar et al. (2013):

RM2:

$$p(i|R_u) \propto p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i|v)\, p(v)}{p(i)}\, p(j|v)$$

# $I_u$ is the set of items rated by user $u$

# $V_u$ is the neighbourhood of user $u$, computed using a clustering algorithm

# $p(i|u)$ is computed by smoothing the maximum likelihood estimate with the probability in the collection

# $p(i)$ and $p(v)$ are the item and user priors
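A minimal sketch of RM2 scoring for one user, under several assumptions not taken from the slides: toy ratings, uniform priors, Jelinek-Mercer smoothing for $p(\cdot|v)$, and a neighbourhood passed in by the caller (the clustering step that builds $V_u$ is not reproduced). The product over $j \in I_u$ is accumulated in log space to avoid underflow on long profiles; the logarithm is monotonic, so the ranking is unchanged. All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}

def collection_model(ratings: Ratings) -> Dict[str, float]:
    """p(i|C): rating mass of item i over the total rating mass."""
    mass: Dict[str, float] = defaultdict(float)
    for profile in ratings.values():
        for item, r in profile.items():
            mass[item] += r
    total = sum(mass.values())
    return {item: m / total for item, m in mass.items()}

def jm_model(profile: Dict[str, float], p_c: Dict[str, float], lam: float) -> Dict[str, float]:
    """Jelinek-Mercer smoothed p(i|v) for every item in the collection (lam > 0 keeps it positive)."""
    total = sum(profile.values())
    return {i: (1 - lam) * profile.get(i, 0.0) / total + lam * pc for i, pc in p_c.items()}

def rm2_rank(ratings: Ratings, u: str, neighbours: List[str],
             lam: float = 0.5) -> List[Tuple[str, float]]:
    """Rank items not rated by u by log p(i|R_u), up to an additive constant."""
    p_c = collection_model(ratings)
    p_item = {i: 1 / len(p_c) for i in p_c}          # uniform item prior (assumption)
    p_user = {v: 1 / len(ratings) for v in ratings}  # uniform user prior (assumption)
    p_given = {v: jm_model(ratings[v], p_c, lam) for v in neighbours}
    scores = {}
    for i in p_c:
        if i in ratings[u]:
            continue  # recommend only items the user has not rated
        log_score = math.log(p_item[i])
        for j in ratings[u]:
            inner = sum(p_given[v][i] * p_user[v] / p_item[i] * p_given[v][j]
                        for v in neighbours)
            log_score += math.log(inner)
        scores[i] = log_score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ratings = {
    "u1": {"Titanic": 2, "Avatar": 3, "Shark": 5},
    "u2": {"Titanic": 4, "Avatar": 5, "Alien": 4},
    "u3": {"Shark": 4, "Alien": 5, "Up": 1},
}
print(rm2_rank(ratings, "u1", neighbours=["u2", "u3"]))
```

Here the user profile plays the role of the query: each rated item $j$ acts like a query term, and candidate items are scored by how strongly the neighbours associate them with every item in the profile.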


Smoothing methods

Smoothing in RM2


$p(i|u)$ is computed by smoothing the maximum likelihood estimate:

$$p_{ml}(i|u) = \frac{r_{u,i}}{\sum_{j \in I_u} r_{u,j}}$$

with the probability in the collection:

$$p(i|C) = \frac{\sum_{v \in U} r_{v,i}}{\sum_{j \in I,\, v \in U} r_{v,j}}$$
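Both estimates are simple ratios of rating mass. A minimal sketch, assuming ratings are stored as a dict of user profiles; the profile of `u1` reuses the Titanic/Avatar/Shark example from the IR vs RecSys slide, while the other users and the function names are made up:

```python
from collections import defaultdict
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}

def p_ml(ratings: Ratings, u: str, i: str) -> float:
    """Maximum likelihood estimate: r_{u,i} / sum_{j in I_u} r_{u,j}."""
    profile = ratings[u]
    return profile.get(i, 0.0) / sum(profile.values())

def p_collection(ratings: Ratings) -> Dict[str, float]:
    """p(i|C): ratings received by item i over all ratings in the collection."""
    mass: Dict[str, float] = defaultdict(float)
    for profile in ratings.values():
        for item, r in profile.items():
            mass[item] += r
    total = sum(mass.values())
    return {item: m / total for item, m in mass.items()}

ratings = {
    "u1": {"Titanic": 2, "Avatar": 3, "Shark": 5},
    "u2": {"Titanic": 4, "Avatar": 1},
    "u3": {"Shark": 2, "Alien": 4},
}
print(p_ml(ratings, "u1", "Shark"))    # 5 / 10 = 0.5
print(p_collection(ratings)["Shark"])  # 7 / 21
```

Note that $p_{ml}(i|u)$ is zero for every item outside $I_u$ (here `p_ml(ratings, "u1", "Alien")`), which is the sparsity problem that smoothing with $p(i|C)$ addresses.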

Why use smoothing?

In Information Retrieval, smoothing provides:

# A way to deal with data sparsity

# The inverse document frequency (IDF) role

# Document length normalisation

In RecSys, we have the same problems:

# Data sparsity

# Item popularity vs item specificity

# Profiles with different lengths


Smoothing techniques

Jelinek-Mercer (JM): Linear interpolation. Parameter λ.

$$p_\lambda(i|u) = (1 - \lambda)\, p_{ml}(i|u) + \lambda\, p(i|C)$$

Dirichlet priors (DP): Bayesian analysis. Parameter µ.

$$p_\mu(i|u) = \frac{r_{u,i} + \mu\, p(i|C)}{\mu + \sum_{j \in I_u} r_{u,j}}$$

Absolute Discounting (AD): Subtract a constant δ.

$$p_\delta(i|u) = \frac{\max(r_{u,i} - \delta, 0) + \delta\, |I_u|\, p(i|C)}{\sum_{j \in I_u} r_{u,j}}$$
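The three formulas translate directly into code. A minimal sketch, assuming the same dict-of-profiles representation with toy data; `smooth` and `p_collection` are hypothetical names, and the method is selected by a string flag only for brevity:

```python
from collections import defaultdict
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}

def p_collection(ratings: Ratings) -> Dict[str, float]:
    """p(i|C): ratings received by item i over all ratings in the collection."""
    mass: Dict[str, float] = defaultdict(float)
    for profile in ratings.values():
        for item, r in profile.items():
            mass[item] += r
    total = sum(mass.values())
    return {item: m / total for item, m in mass.items()}

def smooth(profile: Dict[str, float], p_c: Dict[str, float],
           method: str, param: float) -> Dict[str, float]:
    """Smoothed p(i|u) for every item in the collection.

    method: "jm" (param = lambda), "dp" (param = mu) or "ad" (param = delta)."""
    total = sum(profile.values())  # sum_{j in I_u} r_{u,j}
    n_rated = len(profile)         # |I_u|
    probs = {}
    for i, pc in p_c.items():
        r = profile.get(i, 0.0)
        if method == "jm":    # Jelinek-Mercer: linear interpolation
            probs[i] = (1 - param) * r / total + param * pc
        elif method == "dp":  # Dirichlet priors
            probs[i] = (r + param * pc) / (param + total)
        elif method == "ad":  # Absolute Discounting
            probs[i] = (max(r - param, 0.0) + param * n_rated * pc) / total
        else:
            raise ValueError(f"unknown smoothing method: {method}")
    return probs

ratings = {
    "u1": {"Titanic": 2, "Avatar": 3, "Shark": 5},
    "u2": {"Titanic": 4, "Avatar": 1},
    "u3": {"Shark": 2, "Alien": 4},
}
p_c = p_collection(ratings)
print(smooth(ratings["u1"], p_c, "ad", 0.1)["Alien"])  # unseen item, now non-zero
```

Each method yields a proper distribution over items. For Absolute Discounting this holds as long as δ does not exceed the smallest rating in the profile, because the mass subtracted from the rated items, $\delta |I_u|$, is exactly what is redistributed through $p(i|C)$.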

Experiments with smoothing

Smoothing: ranking accuracy

Plot: nDCG@10 against the smoothing parameter (λ and δ from 0.1 to 1; µ from 0 to 1000) for RM2 + AD, RM2 + JM and RM2 + DP.

Figure: nDCG@10 values of RM2 varying the smoothing method, using 400 nearest neighbours according to Pearson’s correlation on the MovieLens 100k dataset
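For reference, one common way to compute nDCG@10 for a single user is sketched below. It assumes graded relevance taken from held-out ratings and linear gains discounted by $\log_2(\text{rank} + 1)$; the slides do not state the gain function or how test relevance is defined, so both are assumptions. The reported figure would be this value averaged over test users.

```python
import math
from typing import Dict, List

def ndcg_at_k(ranking: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """nDCG@k with linear gains; `relevance` maps held-out items to graded relevance."""
    dcg = sum(relevance.get(item, 0.0) / math.log2(pos + 2)
              for pos, item in enumerate(ranking[:k]))
    # Ideal DCG: the best possible ordering of this user's relevant items
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k(["Alien", "Up", "Titanic"], {"Up": 5, "Titanic": 3}))
```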

Smoothing: diversity

Plot: Gini@10 against the smoothing parameter (λ and δ from 0.1 to 1; µ from 0 to 1000) for the three smoothing methods.

Figure: Gini@10 values of RM2 varying the smoothing method, using 400 nearest neighbours according to Pearson’s correlation on the MovieLens 100k dataset
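The slides do not define Gini@10. In the tables, higher Gini@10 goes together with higher novelty and is discussed as diversity, which is consistent with reporting the complement $1 - G$ of the Gini index over how often each catalogue item is recommended. The sketch below assumes that reading; the function name is hypothetical, and it requires at least two catalogue items and at least one recommendation.

```python
from collections import Counter
from typing import List

def gini_diversity_at_k(rec_lists: List[List[str]], catalogue: List[str], k: int = 10) -> float:
    """1 - Gini index of how often each catalogue item appears in the top-k lists.

    1.0: recommendations spread evenly over the catalogue;
    0.0: all recommendations go to a single item."""
    counts = Counter(item for recs in rec_lists for item in recs[:k])
    total = sum(counts.values())
    n = len(catalogue)
    shares = sorted(counts.get(i, 0) / total for i in catalogue)  # ascending order
    gini = sum((2 * idx - n - 1) * p for idx, p in enumerate(shares, start=1)) / (n - 1)
    return 1.0 - gini

print(gini_diversity_at_k([["a", "b"], ["a", "c"]], ["a", "b", "c", "d"], k=2))
```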

Smoothing: novelty

Plot: MSI@10 against the smoothing parameter (λ and δ from 0.1 to 1; µ from 0 to 1000) for the three smoothing methods.

Figure: MSI@10 values of RM2 varying the smoothing method, using 400 nearest neighbours according to Pearson’s correlation on the MovieLens 100k dataset
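Assuming MSI stands for mean self-information, a common novelty metric, an item rated by $|U_i|$ of the $|U|$ training users carries $\log_2(|U| / |U_i|)$ bits, so rarer items count as more novel. The sketch averages over the top-k positions and then over users; whether the reported MSI@10 averages or sums over positions is not stated, so its scale may differ from the tables. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}

def msi_at_k(rec_lists: List[List[str]], ratings: Ratings, k: int = 10) -> float:
    """Mean self-information of the top-k recommended items, averaged over users."""
    n_users = len(ratings)
    popularity = Counter(item for profile in ratings.values() for item in profile)
    per_user = []
    for recs in rec_lists:
        top = recs[:k]
        if not top:
            continue
        # max(..., 1) guards against items absent from the training data (assumption)
        bits = [math.log2(n_users / max(popularity[i], 1)) for i in top]
        per_user.append(sum(bits) / len(bits))
    return sum(per_user) / len(per_user) if per_user else 0.0

train = {"u1": {"a": 5, "b": 3}, "u2": {"a": 4}, "u3": {"a": 2}, "u4": {"a": 1}}
print(msi_at_k([["b"], ["a"]], train, k=1))  # (2 bits + 0 bits) / 2
```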

More about smoothing in RM2 for CF

More details about smoothing in:

D. Valcarce, J. Parapar, Á. Barreiro: A Study of Smoothing Methods for Relevance-Based Language Modelling of Recommender Systems in ECIR 2015

Priors

Priors in RM2


p(i) and p(v) are the item and user priors:

# Allow a priori information to be introduced into the model

# Provide a principled way of modelling business rules!


Prior estimates

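Uniform (U) and Linear (L) estimates:

User prior:

$$p_U(u) = \frac{1}{|U|} \qquad p_L(u) = \frac{\sum_{i \in I_u} r_{u,i}}{\sum_{v \in U} \sum_{j \in I_v} r_{v,j}}$$

Item prior:

$$p_U(i) = \frac{1}{|I|} \qquad p_L(i) = \frac{\sum_{u \in U_i} r_{u,i}}{\sum_{j \in I} \sum_{v \in U_j} r_{v,j}}$$

where $U_i$ is the set of users who rated item $i$.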
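Both estimates can be computed in one pass over the ratings. A minimal sketch, assuming the dict-of-profiles representation and taking $I$ to be the items with at least one rating; the function names are hypothetical. Note that the linear item prior $p_L(i)$ has the same form as the collection model $p(i|C)$ used for smoothing.

```python
from collections import defaultdict
from typing import Dict, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}
Prior = Dict[str, float]

def user_priors(ratings: Ratings) -> Tuple[Prior, Prior]:
    """Uniform p_U(u) = 1/|U| and linear p_L(u), proportional to u's total rating mass."""
    total = sum(r for profile in ratings.values() for r in profile.values())
    uniform = {u: 1 / len(ratings) for u in ratings}
    linear = {u: sum(profile.values()) / total for u, profile in ratings.items()}
    return uniform, linear

def item_priors(ratings: Ratings) -> Tuple[Prior, Prior]:
    """Uniform p_U(i) = 1/|I| and linear p_L(i), proportional to the ratings i received."""
    mass: Dict[str, float] = defaultdict(float)
    for profile in ratings.values():
        for item, r in profile.items():
            mass[item] += r
    total = sum(mass.values())
    uniform = {i: 1 / len(mass) for i in mass}
    linear = {i: m / total for i, m in mass.items()}
    return uniform, linear

ratings = {
    "u1": {"Titanic": 2, "Avatar": 3, "Shark": 5},
    "u2": {"Titanic": 4, "Avatar": 1},
    "u3": {"Shark": 2, "Alien": 4},
}
pu_user, pl_user = user_priors(ratings)
pu_item, pl_item = item_priors(ratings)
print(pl_user["u1"], pl_item["Shark"])  # 10/21 and 7/21
```

Swapping these dictionaries into an RM2 scorer is how the U/L combinations in the next table would be produced; a business rule could likewise be encoded as a custom item prior.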

Experiments with priors

Priors on MovieLens 100k

| User prior | Item prior | nDCG@10 | Gini@10 | MSI@10 |
|---|---|---|---|---|
| Linear | Linear | 0.0922 | 0.4603 | 28.4284 |
| Uniform | Linear | 0.2453 | 0.2027 | 16.4022 |
| Uniform | Uniform | 0.3296 | 0.0256 | 6.8273 |
| Linear | Uniform | 0.3423 | 0.0264 | 6.7848 |

Table: nDCG@10, Gini@10 and MSI@10 values of RM2 varying the prior estimates, using 400 nearest neighbours according to Pearson’s correlation on the MovieLens 100k dataset and Absolute Discounting (δ = 0.1)

More about priors in:

D. Valcarce, J. Parapar and Á. Barreiro: A Study of Priors for Relevance-Based Language Modelling of Recommender Systems in RecSys 2015!

Comparison with other CF algorithms

Comparison on MovieLens 100k

| Algorithm | nDCG@10 | Gini@10 | MSI@10 |
|---|---|---|---|
| SVD | 0.0946 | 0.0109 | 14.6129 |
| SVD++ | 0.1113 | 0.0126 | 14.9574 |
| NNCosNgbr | 0.1771 | 0.0344 | 16.8222 |
| UIR-Item | 0.2188 | 0.0124 | 5.2337 |
| PureSVD | 0.3595 | 0.1364 | 11.8841 |
| RM2-JM | 0.3175 | 0.0232 | 9.1087 |
| RM2-DP | 0.3274 | 0.0251 | 9.2181 |
| RM2-AD | 0.3296 | 0.0256 | 9.2409 |
| RM2-AD-L-U | 0.3423 | 0.0264 | 9.2004 |

Table: nDCG@10, Gini@10 and MSI@10 values of different CF recommendation algorithms

Conclusions and future directions

Conclusions

IR techniques can be employed in RecSys

# Not only methods such as SVD...

# but also Language Models!

Language Models provide a principled and interpretable framework for recommendation.

Relevance-Based Language Models are competitive, but there is room for improvement:

# More sophisticated priors

# Neighbourhood computation

◦ Different similarity metrics: cosine, Kullback–Leibler divergence

◦ Matrix factorisation: NMF, SVD

◦ Spectral clustering: NC
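The experiments in these slides select 400 nearest neighbours by Pearson’s correlation, and alternative similarity metrics are listed above as future work. A minimal sketch of k-nearest-neighbour selection with a pluggable similarity, assuming Pearson’s correlation is computed over co-rated items (one common convention; others centre on each user’s overall mean) and cosine over the full rating vectors. All names are hypothetical. A divergence such as Kullback–Leibler would need the sort order reversed, since lower values mean closer users.

```python
import math
from typing import Callable, Dict, List

Profile = Dict[str, float]  # item -> rating

def pearson(a: Profile, b: Profile) -> float:
    """Pearson's correlation over the items both users rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)
                    * sum((b[i] - mb) ** 2 for i in common))
    return num / den if den > 0 else 0.0

def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of the two rating vectors (missing ratings count as zero)."""
    num = sum(a[i] * b[i] for i in set(a) & set(b))
    den = (math.sqrt(sum(x * x for x in a.values()))
           * math.sqrt(sum(x * x for x in b.values())))
    return num / den if den > 0 else 0.0

def nearest_neighbours(ratings: Dict[str, Profile], u: str, k: int,
                       sim: Callable[[Profile, Profile], float] = pearson) -> List[str]:
    """The k users most similar to u under `sim` (higher is closer)."""
    others = [(v, sim(ratings[u], ratings[v])) for v in ratings if v != u]
    others.sort(key=lambda vs: vs[1], reverse=True)
    return [v for v, _ in others[:k]]

ratings = {
    "u1": {"a": 1, "b": 2, "c": 3},
    "u2": {"a": 2, "b": 4, "c": 6},
    "u3": {"a": 3, "b": 2, "c": 1},
}
print(nearest_neighbours(ratings, "u1", k=1))
print(nearest_neighbours(ratings, "u1", k=1, sim=cosine))
```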


Future work

Improve novelty and diversity figures:

# RM2 performance is similar to PureSVD in terms of nDCG

# but it fails in terms of diversity and novelty

Introduce more evidence into the LM framework apart from ratings:

# Content-based information (hybrid recommender)

# Temporal and contextual information (TARS & CARS)


Thank you!

@dvalcarce

http://www.dc.fi.udc.es/~dvalcarce

Time and Context in Language Models

Time:

# X. Li and W. B. Croft: Time-based Language Models in CIKM 2003

# K. Berberich, S. Bedathur, O. Alonso and G. Weikum: A language modeling approach for temporal information needs in ECIR 2010

Context:

# H. Rode and D. Hiemstra: Conceptual Language Models for Context-Aware Text Retrieval in TREC 2004

# L. Azzopardi: Incorporating Context within the Language Modeling Approach for ad hoc Information Retrieval. PhD Thesis (2005)
