Predicting Preference Tags to Improve Item Recommendation Tanwistha Saha, Huzefa Rangwala, Carlotta Domeniconi
Department of Computer Science
George Mason University, VA USA
• User preferences and item descriptors are represented by the tags (short text snippets) users choose while rating items
• Tags can be seen as “multi-labels” that are descriptive of both the user and the item
• Collaborative Filtering (CF) based Recommender Systems [1] improve their performance by integrating these tags as user preferences and item descriptors in the system
• Collective Classification [2] on the implicit user-user network and item-item network can be used to predict tags for users/items that don’t have any
• Two users tagging/rating at least one common item are connected in the user-user network
• Two items rated by at least one user are connected in the item-item network
• Active Learning [3] methods on relational networks are used to improve the collective classifier’s performance on tag prediction when many users/items lack tag information
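As a concrete illustration of the two bullets above, the implicit user-user and item-item networks can be derived directly from the rating matrix. The sketch below is ours (function and variable names are illustrative, not from the poster):

```python
from collections import defaultdict

def build_implicit_networks(ratings):
    """ratings: iterable of (user, item) pairs from the rating matrix.
    Returns the edge sets of the implicit user-user and item-item networks."""
    items_of_user = defaultdict(set)
    users_of_item = defaultdict(set)
    for u, i in ratings:
        items_of_user[u].add(i)
        users_of_item[i].add(u)

    # Two users are connected if they rated/tagged at least one common item.
    user_edges = set()
    for users in users_of_item.values():
        ordered = sorted(users)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                user_edges.add((ordered[a], ordered[b]))

    # Two items are connected if at least one user rated both.
    item_edges = set()
    for items in items_of_user.values():
        ordered = sorted(items)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                item_edges.add((ordered[a], ordered[b]))
    return user_edges, item_edges
```

Collective classification then propagates tag labels over these edges to users/items that have none.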
BACKGROUND
OBJECTIVE AND SETUP
Latent Factor Models with User and Item Tags
• Factorization Machines (FM) [4] with user and item tags (User-Item-Tag-FM)
• User-Tag-FM (UT-FM): uses predicted tags from collective classification for users only
• Item-Tag-FM (IT-FM): uses predicted tags from collective classification for items only
Neighborhood-based Models with User and Item Tags
• User-Item-Tag-KNN (UIT-KNN): the rating of an item by a user is computed from the tag-based similarity between the user and the item
• User-Tag-KNN (UT-KNN): the predicted rating of an item is a weighted average of the ratings given by the “nearest neighbors” of the user, computed using tag-based similarity
• Item-Tag-KNN (IT-KNN): the predicted rating of an item is a weighted average of the ratings of the “nearest neighbors” of the item, computed using tag-based similarity
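The poster does not spell out the tag-based similarity itself; a common choice, shown here purely as an assumption, is cosine similarity between binary tag sets:

```python
import math

def tag_similarity(tags_a, tags_b):
    """Cosine similarity between two binary tag sets (assumed representation:
    each user/item is the set of tags attached to it)."""
    a, b = set(tags_a), set(tags_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

Any other set similarity (e.g. Jaccard) would slot into the neighborhood models the same way.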
Active Learning for Tag Prediction using Collective Classification
Active learning selectively queries informative users/items for their preference tags. We use the FLIP algorithm [5] to choose which users/items to query. Among the nodes z (in the implicit user-user or item-item network) for which we *do not* have tag information, those with the highest FLIP scores S[z] are queried.
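The FLIP score rewards nodes whose predicted tags keep changing across collective-classification iterations, i.e. nodes the classifier is unsure about. A minimal sketch (our own names; `tag_history` holds the node’s predicted tag-probability vector at each iteration):

```python
def flip_score(tag_history):
    """S[z]: sum over iterations of the absolute change in the node's
    predicted tag probabilities between consecutive iterations."""
    score = 0.0
    for prev, curr in zip(tag_history, tag_history[1:]):
        score += sum(abs(c - p) for p, c in zip(prev, curr))
    return score

def select_queries(histories, budget):
    """Pick the `budget` untagged nodes with the highest FLIP scores.
    histories: {node z: list of per-iteration tag-probability vectors}."""
    ranked = sorted(histories, key=lambda z: flip_score(histories[z]), reverse=True)
    return ranked[:budget]
```

Queried nodes are labeled by an oracle and added to the training set, after which the collective classifier is retrained.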
METHODOLOGY
1. X. Su & T. M. Khoshgoftaar. “A survey of collaborative filtering techniques”, Advances in Artificial Intelligence, 2009.
2. S. A. Macskassy & F. Provost. “Classification in networked data: A toolkit and a univariate case study”, Journal of Machine Learning Research, 2007.
3. Burr Settles. “Active Learning Literature Survey”, Technical Report, University of Wisconsin-Madison, 2010.
4. S. Rendle. “Factorization Machines with LIBFM”, ACM Transactions on Intelligent Systems and Technology, TIST 2012.
5. T. Saha, H. Rangwala & C. Domeniconi. “FLIP: Active Learning for Relational Network Classification”, European Conference on Machine Learning, ECML 2014.
We propose:
• Using collective classification to predict user preference tags and item descriptive tags as side information in state-of-the-art collaborative filtering recommender systems
• Using active learning to incrementally add informative samples to the training set of the collective classifier
Our results on several real world relational network datasets show that using collective classification to predict preference tags in tag-based recommender systems is more effective for datasets with high rating density.
The experiments were run on Argo Research Cluster http://orc.gmu.edu/
CONCLUSIONS
REFERENCES
Rating prediction with User-Item-Tag-FM:

$$f(x) = w_0 + w_u + w_i + \sum_{s=1}^{|P|}\hat{t}_{us}w_s + \sum_{s=1}^{|P|}\hat{t}_{us}\langle v_i, v_s\rangle + \sum_{s=1}^{|P|}\sum_{s'>s}^{|P|}\hat{t}_{us}\hat{t}_{us'}\langle v_s, v_{s'}\rangle + \sum_{q=1}^{|P|}\sum_{q'>q}^{|P|}\hat{t}_{iq}\hat{t}_{iq'}\langle v_q, v_{q'}\rangle + \sum_{q=1}^{|P|}\hat{t}_{iq}w_q + \sum_{q=1}^{|P|}\hat{t}_{iq}\langle v_u, v_q\rangle + \sum_{f=1}^{F}v_{uf}v_{if}$$

where $f(x)$ is the predicted rating and $\hat{t}$ is the predicted tag for a user/item.
Neighborhood-based rating prediction:

$$\hat{R}_{u,i} = \frac{\sum_{j \in N_i} S_{i,j}\, R_{u,j}}{\sum_{j \in N_i} S_{i,j}}$$
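This similarity-weighted average can be sketched directly (illustrative names; neighbors $N_i$ and tag-based similarities $S_{i,j}$ are assumed precomputed):

```python
def predict_rating(neighbor_ratings, sims):
    """neighbor_ratings: {neighbor j in N_i: the user's rating R_{u,j}}
    sims: {neighbor j: tag-based similarity S_{i,j} to the target item}.
    Returns the similarity-weighted average rating."""
    num = sum(sims[j] * r for j, r in neighbor_ratings.items())
    den = sum(sims[j] for j in neighbor_ratings)
    return num / den if den else 0.0
```

The same formula serves UT-KNN with user neighbors and user-user similarities in place of item ones.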
RESULTS WITH ACTIVE LEARNING
• Hamming loss for item tag prediction decreases faster with increasing training data under active learning (FLIP) than under random selection
• On the LibraryThing dataset, active learning (FLIP) achieves with only 65% labeled data the same RMSE that the traditional setting obtains with 80% labeled data
$$S[z] = \sum_{j=1}^{\mathrm{maxiter}} \sum_{p=1}^{|P|} \left|\hat{t}^{\,j}_{zp} - \hat{t}^{\,j-1}_{zp}\right|$$
Contact (any query/comments): tsaha@masonlive.gmu.edu rangwala@cs.gmu.edu carlotta@cs.gmu.edu
RESULTS WITH COLLABORATIVE FILTERING
Datasets for the experiments are publicly available at: http://www.cs.gmu.edu/~mlbio/TagRecSys/
Mean Absolute Error (MAE) for the tag-based latent factor models is lower than that of the baseline model (FM), which does not use tags. All scores are averages over 5 independent runs. Statistical significance is assessed by paired t-tests at significance level 0.10 (p-value < 0.1).
MAE for tag-based methods:

Datasets     | FM              | IT-FM           | IT-FM-G         | UIT-FM          | UIT-FM-G
TripAdvisor  | 0.8704 ± 0.0077 | 0.8480 ± 0.0043 | 0.8310 ± 0.0047 | 0.8429 ± 0.0044 | 0.8312 ± 0.0063
MovieLens    | 0.5741 ± 0.0013 | 0.5706 ± 0.0016 | 0.5674 ± 0.0015 | 0.5751 ± 0.0019 | 0.5724 ± 0.0011
LibraryThing | 0.6128 ± 0.0014 | 0.6045 ± 0.0015 | 0.5959 ± 0.0012 | 0.6085 ± 0.0019 | 0.5985 ± 0.0012

p-values for MAE of tag-based methods w.r.t. the baseline (FM):

Dataset      | UIT-FM vs FM | UIT-FM-G vs FM | IT-FM vs FM | IT-FM-G vs FM
TripAdvisor  | 0            | 0              | 0           | 0
MovieLens    | 0.7645       | 0.4202         | 0.1283      | 0
LibraryThing | 0.1059       | 0              | 0           | 0
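The p-values above come from paired t-tests over the per-run MAE of each tag-based method versus FM. A minimal sketch of the underlying statistic (in practice a library routine such as scipy.stats.ttest_rel would also supply the p-value):

```python
import math
from statistics import mean, stdev

def paired_t_statistic(errors_a, errors_b):
    """Paired t-statistic over per-run error differences:
    t = mean(d) / (stdev(d) / sqrt(n)), where d_k = errors_a[k] - errors_b[k].
    Requires at least 2 runs and non-constant differences."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

With 5 runs per method (n = 5, so 4 degrees of freedom), the resulting t-statistic is compared against the t-distribution to obtain the reported p-values.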
[Figure: four example users tagging one item with preference tags, e.g. “Inspiring, feel-good”, “Feel-good, drama”, “Crime drama, 90s”, “Prison movie”]
[Figure: a user applies tags (e.g. Fiction, Comedy, Horror, Thriller) to items]
[Figure: system overview. All users in the system and the user-item rating matrix yield the implicit user-user and item-item networks; collective classification predicts tags for users/items; the predicted tags feed user-user / item-item CF or Factorization Machines, producing the top-N predicted items and the predicted rating of an item]