
Page 1: CS590I: Information Retrieval

CS590I: Information Retrieval

CS-590I

Information Retrieval

Collaborative Filtering

Luo Si

Department of Computer Science

Purdue University

Page 2: CS590I: Information Retrieval

Abstract

Outline:
– Introduction to collaborative filtering
– Main framework
– Memory-based collaborative filtering approach
– Model-based collaborative filtering approach
  – Aspect model & two-way clustering model
  – Flexible mixture model
  – Decoupled model
– Unified filtering by combining content and collaborative filtering
– Active learning for collaborative filtering

Page 3: CS590I: Information Retrieval

What is Collaborative Filtering?

Collaborative Filtering (CF): making recommendation decisions for a specific user based on the judgments of users with similar tastes.

Content-Based Filtering: recommend by analyzing the content information.

Collaborative Filtering: make recommendations based on the judgments of similar users.

Page 4: CS590I: Information Retrieval

What is Collaborative Filtering?

Collaborative Filtering (CF): making recommendation decisions for a specific user based on the judgments of users with similar tastes.

                     Item 1   Item 2   Item 3   Item 4   Item 5
Train_User 1            1        5        3        3        4
Train_User 2            4        1        5        3        2
Test User               1        ?        3        –        4

Page 5: CS590I: Information Retrieval

What is Collaborative Filtering?

Collaborative Filtering (CF): making recommendation decisions for a specific user based on the judgments of users with similar tastes.

                     Item 1   Item 2   Item 3   Item 4   Item 5
Train_User 1            1        5        3        3        4
Train_User 2            4        1        5        3        2
Test User               1        5        3        –        4

(The "?" from the previous slide is now filled in with the predicted rating, 5.)

Page 6: CS590I: Information Retrieval

Why Collaborative Filtering?

Advantages of Collaborative Filtering:
– Collaborative filtering does not need the content information required by CBF
– The contents of items may belong to a third party (not accessible or available)
– The contents of items may be difficult to index or analyze (e.g., multimedia information)

Problems of Collaborative Filtering:
– Privacy issues: how to share one's interests without disclosing too much detailed information?

Page 7: CS590I: Information Retrieval

Why Collaborative Filtering?

Applications of Collaborative Filtering:
– E-Commerce
– Email ranking: borrow email rankings from your office mates (be careful…)
– Web search? (e.g., local search)

Page 8: CS590I: Information Retrieval

Formal Framework for Collaborative Filtering

What we have:
• Assume there are some ratings by the training users U_1 … U_N on objects O_1 … O_M
• The test user U_t provides some amount of additional training data (a few known ratings)

What we do:
• Predict the test user's rating R_ut(O_j) for an unrated object O_j based on the training information

[Figure: an N × M rating matrix with training users U_1 … U_N as rows and objects O_1 … O_M as columns, partially filled with ratings; the test user U_t has a few known ratings, and the goal is to fill in R_ut(O_j).]

Page 9: CS590I: Information Retrieval

Memory-Based Approaches

Memory-Based Approaches:
– Given a specific user u, find a set of similar users
– Predict u's rating based on the ratings of the similar users

Issues:
– How to determine the similarity between users?
– How to combine the ratings from similar users to make the predictions (how to weight different users)?

Page 10: CS590I: Information Retrieval

Memory-Based Approaches

How to determine the similarity between users? Measure the similarity in the rating patterns of different users.

Pearson Correlation Coefficient Similarity (using each user's average rating $\bar{R}_u$):

$$w_{u_t,u} = \frac{\sum_{o} \big(R_{u_t}(o) - \bar{R}_{u_t}\big)\big(R_u(o) - \bar{R}_u\big)}{\sqrt{\sum_{o} \big(R_{u_t}(o) - \bar{R}_{u_t}\big)^2}\;\sqrt{\sum_{o} \big(R_u(o) - \bar{R}_u\big)^2}}$$

Vector Space Similarity:

$$w_{u_t,u} = \frac{\sum_{o} R_{u_t}(o)\, R_u(o)}{\sqrt{\sum_{o} R_{u_t}(o)^2}\;\sqrt{\sum_{o} R_u(o)^2}}$$

Prediction:

$$\hat{R}_{u_t}(o) = \bar{R}_{u_t} + \frac{\sum_{u} w_{u_t,u}\,\big(R_u(o) - \bar{R}_u\big)}{\sum_{u} |w_{u_t,u}|}$$

(The similarity sums run over the items rated by both users; the prediction sum runs over the training users who rated the target item.)

Page 11: CS590I: Information Retrieval

Memory-Based Approaches

How to combine the ratings from similar users for predicting? Weight similar users by their similarity with a specific user; use these weights to combine their ratings.

Prediction:

$$\hat{R}_{u_t}(o) = \bar{R}_{u_t} + \frac{\sum_{u} w_{u_t,u}\,\big(R_u(o) - \bar{R}_u\big)}{\sum_{u} |w_{u_t,u}|}$$
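To make the two steps above concrete, here is a minimal NumPy sketch (not from the slides) that computes mean-centered similarity weights over co-rated items and then forms the weighted prediction; the function name, the users-by-items array layout, and the use of NaN for missing ratings are illustrative assumptions.

```python
import numpy as np

def predict_rating(R_train, r_test, item):
    """Memory-based CF sketch. R_train: (num_users x num_items) array of training
    ratings, r_test: the test user's ratings, with np.nan marking unrated items.
    Returns the predicted rating of `item` for the test user."""
    test_mean = np.nanmean(r_test)
    num, den = 0.0, 0.0
    for r_u in R_train:
        u_mean = np.nanmean(r_u)
        common = ~np.isnan(r_u) & ~np.isnan(r_test)   # items rated by both users
        if not common.any() or np.isnan(r_u[item]):
            continue
        a, b = r_u[common] - u_mean, r_test[common] - test_mean
        w = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        num += w * (r_u[item] - u_mean)               # weighted, mean-centered rating
        den += abs(w)
    return test_mean + num / den if den > 0 else test_mean
```

On the toy table used on the following slides, the similarity weights this sketch produces match the values shown there (0.92 and -0.44).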

Page 12: CS590I: Information Retrieval

Memory-Based Approaches

Remove User-specific Rating Bias

                     Item 1   Item 2   Item 3   Item 4   Item 5
Train_User 1            1        5        3        3        4
Train_User 2            4        1        5        3        2
Test User               1        ?        3        –        4

Page 13: CS590I: Information Retrieval

Memory-Based Approaches

                         Item 1   Item 2   Item 3   Item 4   Item 5
Train_User 1                1        5        3        3        4
Sub Mean (Train1)        -2.2      1.8     -0.2     -0.2      0.8
Train_User 2                4        1        5        3        2
Sub Mean (Train2)           1       -2        2        0       -1
Test User                   1        ?        3        –        4
Sub Mean (Test)        -1.667        –     0.333       –     1.333

Normalize Rating

Page 14: CS590I: Information Retrieval

Memory-Based Approaches

                         Item 1   Item 2   Item 3   Item 4   Item 5
Train_User 1                1        5        3        3        4
Sub Mean (Train1)        -2.2      1.8     -0.2     -0.2      0.8
Train_User 2                4        1        5        3        2
Sub Mean (Train2)           1       -2        2        0       -1
Test User                   1        ?        3        –        4
Sub Mean (Test)        -1.667        –     0.333       –     1.333

Calculate Similarity: Wtrn1_test=0.92; Wtrn2_test=-0.44;
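As a quick check, the two weights on this slide can be reproduced with the snippet below; the only assumption is the placement of the test user's three known ratings on items 1, 3, and 5, which is the layout consistent with the subtracted means shown above.

```python
import numpy as np

train1 = np.array([1, 5, 3, 3, 4], dtype=float)
train2 = np.array([4, 1, 5, 3, 2], dtype=float)
test   = np.array([1, np.nan, 3, np.nan, 4])   # assumed positions of the known ratings

def centered_cosine(r_u, r_t):
    """Cosine similarity of mean-centered ratings over the items both users rated."""
    common = ~np.isnan(r_u) & ~np.isnan(r_t)
    a = r_u[common] - np.nanmean(r_u)           # subtract each user's own mean rating
    b = r_t[common] - np.nanmean(r_t)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(centered_cosine(train1, test), 2))  # ~0.92
print(round(centered_cosine(train2, test), 2))  # ~-0.44
```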

Page 15: CS590I: Information Retrieval

Memory-Based Approaches

                         Item 1   Item 2   Item 3   Item 4   Item 5
Train_User 1                1        5        3        3        4
Sub Mean (Train1)        -2.2      1.8     -0.2     -0.2      0.8
Train_User 2                4        1        5        3        2
Sub Mean (Train2)           1       -2        2        0       -1
Test User                   1        ?        3        –        4
Sub Mean (Test)        -1.667        –     0.333       –     1.333

Make prediction (for the unknown rating on Item 2):

2.67 + (1.8 × 0.61 + (−2) × (−0.78)) / (0.92 + 0.44) ≈ 4.62

Page 16: CS590I: Information Retrieval

Memory-Based Approaches

                         Item 1   Item 2   Item 3   Item 4   Item 5
Train_User 1                1        5        3        3        4
Sub Mean (Train1)        -2.2      1.8     -0.2     -0.2      0.8
Train_User 2                4        1        5        3        2
Sub Mean (Train2)           1       -2        2        0       -1
Test User                   1        5        3        –        4
Sub Mean (Test)        -1.667        –     0.333       –     1.333

Make prediction (for Item 2):

2.67 + (1.8 × 0.61 + (−2) × (−0.78)) / (0.92 + 0.44) ≈ 4.62

Page 17: CS590I: Information Retrieval

Memory-Based Approaches

Problems with memory-based approaches:
– Associated with a large online computation cost (have to go over all users; any fast indexing approach?)
– Heuristic methods to calculate user similarity and make the rating prediction

Possible solutions:
– Cluster users/items offline to save online computation cost
– Propose more solid probabilistic modeling methods

Page 18: CS590I: Information Retrieval

Model-Based Approaches: Aspect Model (Hofmann et al., 1999)

– Model individual ratings as a convex combination of preference factors:

$$P(o_{(l)}, u_{(l)}, r_{(l)}) = \sum_{z \in Z} P(z)\, P(o_{(l)} \mid z)\, P(u_{(l)} \mid z)\, P(r_{(l)} \mid z)$$

Two-Sided Clustering Model (Hofmann et al., 1999)

– Assume each user and each item belong to exactly one user group and one item group:

$$P(o_{(l)}, u_{(l)}, r_{(l)}) = P(o_{(l)})\, P(u_{(l)})\, I_{x(l),v}\, J_{y(l),u}\, C_{vu}$$

where $I_{x(l),v}$ and $J_{y(l),u}$ are indicator variables and $C_{vu}$ is the association parameter.

[Graphical model: latent variable Z with prior P(Z); P(o|Z), P(u|Z), and P(r|Z) generate each observed triple (O_l, U_l, R_l), repeated L times.]

Page 19: CS590I: Information Retrieval

Model-Based Approaches: Aspect Model (Hofmann et al., 1999)

– Model individual ratings as a convex combination of preference factors:

$$P(o_{(l)}, u_{(l)}, r_{(l)}) = \sum_{z \in Z} P(z)\, P(o_{(l)} \mid z)\, P(u_{(l)} \mid z)\, P(r_{(l)} \mid z)$$

[Graphical model: latent variable Z with prior P(Z); P(o|Z), P(u|Z), and P(r|Z) generate each observed triple (O_l, U_l, R_l), repeated L times.]

Brief description of the expectation-maximization (EM) training process…
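Since the slide only sketches the EM training verbally, here is a hedged NumPy illustration of how EM for the aspect model can be implemented on observed (user, object, rating) index triples; the function name, the random initialization, and the fixed iteration count are assumptions, not the authors' code.

```python
import numpy as np

def aspect_model_em(users, objects, ratings, n_users, n_objects, n_ratings,
                    n_z=5, n_iter=50, seed=0):
    """EM for P(o,u,r) = sum_z P(z) P(o|z) P(u|z) P(r|z) over observed index triples."""
    rng = np.random.default_rng(seed)
    Pz = np.full(n_z, 1.0 / n_z)
    Po = rng.dirichlet(np.ones(n_objects), n_z)   # P(o|z), shape (n_z, n_objects)
    Pu = rng.dirichlet(np.ones(n_users), n_z)     # P(u|z)
    Pr = rng.dirichlet(np.ones(n_ratings), n_z)   # P(r|z)
    for _ in range(n_iter):
        # E-step: posterior q(z | o_l, u_l, r_l) for every observed triple l
        q = Pz[:, None] * Po[:, objects] * Pu[:, users] * Pr[:, ratings]
        q /= q.sum(axis=0, keepdims=True)
        # M-step: re-estimate every parameter table from the posteriors
        Pz = q.sum(axis=1) / q.shape[1]
        for z in range(n_z):
            Po[z] = np.bincount(objects, weights=q[z], minlength=n_objects)
            Pu[z] = np.bincount(users,   weights=q[z], minlength=n_users)
            Pr[z] = np.bincount(ratings, weights=q[z], minlength=n_ratings)
        Po /= Po.sum(axis=1, keepdims=True)
        Pu /= Pu.sum(axis=1, keepdims=True)
        Pr /= Pr.sum(axis=1, keepdims=True)
    return Pz, Po, Pu, Pr
```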

Page 20: CS590I: Information Retrieval

Thoughts:

Previous algorithms all cluster users and objects either implicitly (memory-based) or explicitly (model-based)

– Aspect model allows users and objects to belong to different classes, but clusters them together

– Two-sided clustering model clusters users and objects separately, but only allows them to belong to one single class


Page 21: CS590I: Information Retrieval

Previous Work: Thoughts

Cluster users and objects separately AND allow them to belong to different classes

Flexible Mixture Model (FMM)

Page 22: CS590I: Information Retrieval

Flexible Mixture Model (FMM):

Cluster users and objects separately AND allow them to belong to different classes

$$P(o_{(l)}, u_{(l)}, r_{(l)}) = \sum_{Z_o, Z_u} P(Z_o)\, P(Z_u)\, P(o_{(l)} \mid Z_o)\, P(u_{(l)} \mid Z_u)\, P(r_{(l)} \mid Z_o, Z_u)$$

• Training Procedure: Annealed Expectation Maximization (AEM) algorithm

E-Step: Calculate Posterior Probabilities

$$P(Z_o, Z_u \mid o_{(l)}, u_{(l)}, r_{(l)}) = \frac{\big(P(Z_o)\,P(Z_u)\,P(o_{(l)} \mid Z_o)\,P(u_{(l)} \mid Z_u)\,P(r_{(l)} \mid Z_o, Z_u)\big)^{b}}{\sum_{Z_o', Z_u'} \big(P(Z_o')\,P(Z_u')\,P(o_{(l)} \mid Z_o')\,P(u_{(l)} \mid Z_u')\,P(r_{(l)} \mid Z_o', Z_u')\big)^{b}}$$

where b is the annealing parameter.

[Graphical model: latent variables Z_o and Z_u with priors P(Z_o), P(Z_u); P(o|Z_o), P(u|Z_u), and P(r|Z_o, Z_u) generate each observed triple (O_l, U_l, R_l), repeated L times.]

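A minimal sketch of the annealed E-step above (the M-step re-estimates each parameter table from these posteriors, analogously to the aspect-model updates earlier); the array shapes and the name of the annealing exponent b are assumptions.

```python
import numpy as np

def fmm_annealed_e_step(users, objects, ratings, Pzo, Pzu, Po, Pu, Pr, b=1.0):
    """Posterior P(Z_o, Z_u | o_l, u_l, r_l) for each observed triple, with an
    annealing exponent b (b < 1 flattens the posteriors early in training).
    Shapes: Pzo (Ko,), Pzu (Ku,), Po (Ko, n_objects), Pu (Ku, n_users),
    Pr (Ko, Ku, n_ratings); returns an array of shape (Ko, Ku, L)."""
    joint = (Pzo[:, None, None] * Pzu[None, :, None]
             * Po[:, None, objects] * Pu[None, :, users]
             * Pr[:, :, ratings]) ** b
    return joint / joint.sum(axis=(0, 1), keepdims=True)
```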

Page 23: CS590I: Information Retrieval

M-Step: update the parameters P(Z_o), P(Z_u), P(o_{(l)} | Z_o), P(u_{(l)} | Z_u), and P(r_{(l)} | Z_o, Z_u).

• Prediction Procedure: fold-in process to calculate the joint probabilities

$$P(o, u_t, r) = \sum_{Z_o, Z_u} P(Z_o)\, P(Z_u)\, P(o \mid Z_o)\, P(u_t \mid Z_u)\, P(r \mid Z_o, Z_u)$$

Fold-in process by the EM algorithm; calculate the expectation for prediction:

$$\hat{R}_{u_t}(o) = \sum_{r} r\, \frac{P(o, u_t, r)}{\sum_{r'} P(o, u_t, r')}$$

“Flexible Mixture Model for Collaborative Filtering”, ICML’03

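The fold-in prediction can be written in a few lines; this sketch assumes the model's parameter tables are given as NumPy arrays and that the test user's class distribution P(u_t | Z_u) has already been obtained by the fold-in EM step.

```python
import numpy as np

def predict_fmm(Pzo, Pzu, Po, Pu_t, Pr, obj, rating_values):
    """Expected rating of object `obj` for the test user:
    R_hat(o) = sum_r r * P(o, u_t, r) / sum_r' P(o, u_t, r'), with
    P(o, u_t, r) = sum_{Zo,Zu} P(Zo) P(Zu) P(o|Zo) P(u_t|Zu) P(r|Zo,Zu).
    Pu_t is the folded-in vector P(u_t | Zu); Pr has shape (Ko, Ku, n_ratings)."""
    p_r = np.einsum('i,j,i,j,ijr->r', Pzo, Pzu, Po[:, obj], Pu_t, Pr)
    p_r = p_r / p_r.sum()                       # normalize over rating values
    return np.asarray(rating_values, float) @ p_r
```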

Page 24: CS590I: Information Retrieval

Thoughts:

Previous algorithms address the problem that users with similar tastes may have different rating patterns only implicitly (by normalizing user ratings)


Page 25: CS590I: Information Retrieval

Previous Work: Thoughts

Thoughts:

Explicitly decouple users' preference values from the rating values.

Decoupled Model (DM)

(Illustration: one user's "nice" rating is 5 and "mean" rating is 2; another user's "nice" rating is 3 and "mean" rating is 1; the same preference can surface as different raw ratings.)

Page 26: CS590I: Information Retrieval

Decoupled Model (DM)

Decoupled Model (DM): separate the preference value $Z_{pre} \in \{1, \ldots, k\}$ (1 = disfavor, k = favor) from the rating value $r \in \{1, 2, 3, 4, 5\}$.

Joint probability:

$$P(o_{(l)}, u_{(l)}, r_{(l)}) = \sum_{Z_o, Z_u, Z_{pre}, Z_R} P(Z_o)\, P(Z_u)\, P(o_{(l)} \mid Z_o)\, P(u_{(l)} \mid Z_u)\, P(Z_R \mid u_{(l)})\, \big[ P(Z_{pre} \mid Z_o, Z_u)\, P(r_{(l)} \mid Z_{pre}, Z_R) \big]$$

“Preference-Based Graphical Model for Collaborative Filtering”, UAI’03; “A Study of Mixture Models for Collaborative Filtering”, Journal of IR

[Graphical model: latent variables Z_o, Z_u, Z_pre, and Z_R with priors P(Z_o), P(Z_u); P(o | Z_o), P(u | Z_u), and P(r | Z_pre, Z_R) generate each observed triple (O_l, U_l, R_l), repeated L times.]

Page 27: CS590I: Information Retrieval

Experimental Data

Datasets: MovieRating and EachMovie

                                   MovieRating      EachMovie
Number of users                         500            2000
Number of movies                       1000            1682
Avg. # of rated items per user         87.7           129.6
Scale of ratings                  1,2,3,4,5     1,2,3,4,5,6

Evaluation: MAE, the average absolute deviation of the predicted ratings from the actual ratings on test items:

$$MAE = \frac{1}{L_{Test}} \sum_{l} \Big| r_{(l)} - \hat{R}_{u_{(l)}}(o_{(l)}) \Big|$$
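For completeness, the MAE measure above in code form (the array names are illustrative):

```python
import numpy as np

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings on the test set."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return np.mean(np.abs(predicted - actual))
```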

Page 28: CS590I: Information Retrieval

Vary the number of training users: test the behavior of the algorithms with different amounts of training data
– For MovieRating: 100 and 200 training users
– For EachMovie: 200 and 400 training users

Vary the amount of given information from the test user: test the behavior of the algorithms with different amounts of given information from the test user
– For both testbeds: vary among 5, 10, or 20 given items


Page 29: CS590I: Information Retrieval

Results of the Flexible Mixture Model

[Figure: four MAE plots, MovieRating with 100 and 200 training users (MAE roughly 0.75–0.89) and EachMovie with 200 and 400 training users (MAE roughly 0.9–1.3), each plotted against the number of items given by the test user (5, 10, 20).]

Page 30: CS590I: Information Retrieval

Experimental Results: Improved by Combining FMM and DM

Results on MovieRating (MAE):

Training Users   Algorithm   5 Items Given   10 Items Given   20 Items Given
100              FMM             0.829           0.822            0.807
100              FMM+DM          0.792           0.772            0.741
200              FMM             0.800           0.787            0.768
200              FMM+DM          0.770           0.750            0.728

Results on EachMovie (MAE):

Training Users   Algorithm   5 Items Given   10 Items Given   20 Items Given
200              FMM             1.07            1.04             1.02
200              FMM+DM          1.06            1.01             0.99
400              FMM             1.05            1.03             1.01
400              FMM+DM          1.04            1.00             0.97

Page 31: CS590I: Information Retrieval

Combine Collaborative Filtering and Content-Based Filtering

Content-Based Filtering (CBF): Recommend by analyzing the content information

Example: "A group of aliens visit earth…………………….. kind of friendship in which E.T. learns…………" → Science Fiction? Yes.
"Young Harry is in love and wants to marry an actress, much to the displeasure of his family…." → Science Fiction? No.

Unified Filtering (UF): combining both the content-based information and the collaborative rating information for more accurate recommendation.

Content information is very useful when few users have rated an object.

Page 32: CS590I: Information Retrieval

Content-Based Filtering and Unified Filtering

Content-Based Filtering (CBF):
– Generative methods (e.g., Naïve Bayes)
– Discriminative methods (e.g., SVM, logistic regression)

Unified Filtering by combining CF and CBF:
– Linearly combine the scores from CF and CBF
– Personalized linear combination of the scores
– Bayesian combination with collaborative ensemble learning
  - Usually more accurate
  - Can be used to combine features (e.g., actors for movies)

Page 33: CS590I: Information Retrieval

Unified Filtering by flexible mixture model and exponential model

Unified Filtering with mixture model and exponential model (UFME):
– Mixture model for the rating information
– Exponential model for the content information

[Graphical model: latent variables Z_o and Z_u; Z_o depends on the content words d_{o_l j} of object O_l, Z_u depends on the user; P(o), P(u), and P(r | Z_o, Z_u) generate each observed triple (O_l, U_l, R_l), repeated L times.]

Mixture model for the rating information:

$$P(o_{(l)}, u_{(l)}, r_{(l)}) = \sum_{Z_o, Z_u} P(Z_o \mid d_{o_{(l)}})\, P(Z_u \mid u_{(l)})\, P(o_{(l)})\, P(u_{(l)})\, P(r_{(l)} \mid Z_o, Z_u)$$
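As an illustration of how the two components combine, here is a small sketch (under the same naming assumptions as the other snippets) that evaluates the rating distribution for one object-user pair, taking P(Z_o | d) from the content model on the next slide as an input.

```python
import numpy as np

def ufme_rating_distribution(Pzo_given_d, Pzu_given_u, Pr):
    """P(r | o, u) proportional to sum_{Zo,Zu} P(Zo|d_o) P(Zu|u) P(r|Zo,Zu);
    the object/user priors P(o), P(u) are constant in r and cancel when
    normalizing over rating values. Pr has shape (Ko, Ku, n_ratings)."""
    p_r = np.einsum('i,j,ijr->r', Pzo_given_d, Pzu_given_u, Pr)
    return p_r / p_r.sum()
```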

Page 34: CS590I: Information Retrieval

Unified Filtering by flexible mixture model and exponential model

Unified Filtering with mixture model and exponential model (UFME):

Mixture model for the rating information: as on the previous slide.

Exponential model for the content information:

$$P(Z_o \mid d_{o_{(l)}}) = \frac{\exp\big(\sum_j \lambda_{Z_o j}\, d_{o_{(l)} j}\big)}{\sum_{Z_o'} \exp\big(\sum_j \lambda_{Z_o' j}\, d_{o_{(l)} j}\big)}$$

where $d_{o_{(l)} j}$ counts the j-th specific word in the content of object $o_{(l)}$ and $\lambda_{Z_o j}$ are the exponential-model parameters.

[Graphical model: as on the previous slide, with the content words d_{o_l j} of each object feeding P(Z_o | d).]
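The softmax form of the exponential model is straightforward to evaluate; in this sketch, `lam` is an assumed weight matrix of shape (number of object clusters, vocabulary size) and `d` a word-count vector for one object's content.

```python
import numpy as np

def p_cluster_given_content(lam, d):
    """P(Z_o | d) = exp(sum_j lam[z, j] * d[j]) / sum_z' exp(sum_j lam[z', j] * d[j]),
    where d is the word-count vector of an object's content."""
    scores = lam @ d                       # one score per object cluster
    scores -= scores.max()                 # numerical stability before exponentiating
    p = np.exp(scores)
    return p / p.sum()
```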

Page 35: CS590I: Information Retrieval

Unified Filtering by flexible mixture model and exponential model

Training Procedure:
– E-Step: calculate the posterior probabilities (expectation step of EM)
– M-Step: update the parameters; second, refine the object cluster distribution P(Z_o | d) with the content information by maximizing the corresponding objective, using iterative scaling training

“Unified Filtering by Combining Collaborative Filtering and Content-Based Filtering via Mixture Model and Exponential Model”, CIKM’04

Page 36: CS590I: Information Retrieval

Experiment Results

Table: MAE results for three filtering algorithms on the EachMovie testbed: pure content-based filtering (CBF), pure collaborative filtering (CF), and unified filtering by combining the mixture model and exponential model (UFME).

Training Users   Algorithm   0 Items Given   5 Items Given   10 Items Given   20 Items Given
50               CBF              1.43            1.21            1.24             1.19
50               CF               1.21            1.14            1.13             1.12
50               UFME             1.19            1.11            1.10             1.09
100              CBF              1.43            1.23            1.21             1.19
100              CF               1.17            1.08            1.07             1.05
100              UFME             1.17            1.08            1.06             1.05

Page 37: CS590I: Information Retrieval

Experiment Results

Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
forev       previou     mad         inhabit     custom
depress     passion     hang        dress       hang
mate        court       rape        relat       forev
broken      forget      finish      door        water
abandon     sea         arrest      younger     food

Table: Five most indicative words (those with the highest P(Z_o | w) values) for 5 movie clusters. Each column corresponds to a different movie cluster; all listed words are stemmed.

Page 38: CS590I: Information Retrieval

Summarization

What have we talked about so far?

Propose the flexible mixture model

– Demonstrate the power of clustering users and objects separately AND allowing them to belong to different classes

Propose the decoupled model

– Demonstrate the power of extracting preference values from the surface rating values

Propose the unified probabilistic model for unified filtering

– Demonstrate the power of taking advantage of content information with limited rating information

Page 39: CS590I: Information Retrieval

Model-Based Approaches: Aspect Model (Hofmann et al., 1999)

– Model individual ratings as a convex combination of preference factors:

$$P(o_{(l)}, u_{(l)}, r_{(l)}) = \sum_{z \in Z} P(z)\, P(o_{(l)} \mid z)\, P(u_{(l)} \mid z)\, P(r_{(l)} \mid z)$$

Two-Sided Clustering Model (Hofmann et al., 1999)

– Assume each user and each item belong to exactly one user group and one item group:

$$P(o_{(l)}, u_{(l)}, r_{(l)}) = P(o_{(l)})\, P(u_{(l)})\, I_{x(l),v}\, J_{y(l),u}\, C_{vu}$$

where $I_{x(l),v}$ and $J_{y(l),u}$ are indicator variables and $C_{vu}$ is the association parameter.

[Graphical model: latent variable Z with prior P(Z); P(o|Z), P(u|Z), and P(r|Z) generate each observed triple (O_l, U_l, R_l), repeated L times.]

Page 40: CS590I: Information Retrieval

Active Learning

Motivation of Active Learning: learn the correct model using a small number of labeled examples.

Actively select training data: select the unlabeled examples that lead to the largest reduction in the expected loss function.

Key problems:
– Loss function: what is the best loss function? Prediction uncertainty? Model uncertainty? Should we consider the distribution of the test data?
– Should we consider only the current model, or do we need to incorporate model uncertainty?

Page 41: CS590I: Information Retrieval

Problem with Using Current Models in Active Learning

[Figure: a two-feature (f1, f2) toy example with positive and negative classes and a few labeled examples; the correct decision boundary is vertical while the estimated decision boundary is horizontal. Slide borrowed from Rong Jin.]

The estimated model is a horizontal line while the true model is a vertical line. Active learning based on the current model will tend to select examples close to the horizontal line, which leads to inefficient learning.

Page 42: CS590I: Information Retrieval

Active Learning for CF

The key of active learning for CF is to efficiently identify the user type for the active user, {p(y|z)}_z.

Previous approaches:
– 'Model entropy method': search for the object whose rating information can most efficiently reduce the uncertainty in {p(y|z)}_z
– 'Prediction entropy method': search for the object whose rating information can most efficiently reduce the uncertainty in the predicted ratings

Problems with previous approaches:
– Both methods only utilize a single model (i.e., the current model) for estimating the expected cost
– Since each user can belong to multiple user classes, purifying the distribution {p(u|z)}_z may not necessarily be the right goal

Page 43: CS590I: Information Retrieval

A Bayesian Approach

Key: find the object whose rating can help the model efficiently identify the distribution of user classes for the active user, i.e. θ = {θ_z = p(y|z)}_z.

– When the true model θ^true is given, find the example by

$$x^* = \arg\min_{x \in X}\; -\sum_{r} p(r \mid x, \theta^{true}) \sum_{z} \theta^{true}_z \log \theta^{x,r}_z$$

– Replacing the true model θ^true with the distribution over parameters p(θ' | D):

$$x^* = \arg\min_{x \in X}\; -\sum_{r} \int \Big(\sum_{z} \theta'_z \log \theta^{x,r}_z\Big)\, p(r \mid x, \theta')\, p(\theta' \mid D)\, d\theta'$$
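A hedged sketch of this selection rule: it assumes the loss is the KL divergence between the true user-class distribution and the distribution obtained after observing the new rating, and it uses a simple multiplicative Bayesian update θ^{x,r}_z ∝ θ_z · p(r | x, z) in place of the paper's exact update; all names are illustrative.

```python
import numpy as np

def select_item(theta_true, theta_est, p_r_given_xz, candidates):
    """Pick the candidate item minimizing the expected KL divergence between the
    true user-class distribution and the estimate updated after seeing the rating.
    p_r_given_xz: array of shape (n_items, n_classes, n_ratings); distributions
    are assumed strictly positive."""
    best, best_loss = None, np.inf
    for x in candidates:
        loss = 0.0
        for r in range(p_r_given_xz.shape[2]):
            p_r = theta_true @ p_r_given_xz[x, :, r]      # P(r | x, theta_true)
            post = theta_est * p_r_given_xz[x, :, r]      # assumed multiplicative update
            post /= post.sum()
            loss += p_r * np.sum(theta_true * np.log(theta_true / post))
        if loss < best_loss:
            best_loss, best = loss, x
    return best
```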

Page 44: CS590I: Information Retrieval

Approximate Posterior Distribution

Directly estimating the posterior distribution p(θ | D) is difficult.

Approximate p(θ | D) using a Laplacian approximation:

– Expand the log-likelihood function around its maximum point θ*

$$\log \frac{p(D \mid \theta)}{p(D \mid \theta^*)} \approx \sum_z \alpha_z \log \frac{\theta_z}{\theta^*_z}, \qquad \text{where } \alpha_z = \sum_i \frac{\theta^*_z\, p(r_i \mid x_i, z)}{\sum_{z'} \theta^*_{z'}\, p(r_i \mid x_i, z')}$$

which yields a Dirichlet-form approximation for p(θ | D) around θ*.

Page 45: CS590I: Information Retrieval

Efficiently Compute Expectation

Integration can be computed analytically: under the Dirichlet approximation of p(θ | D), the integral in

$$x^* = \arg\min_{x \in X}\; -\sum_{r} \int \Big(\sum_{z} \theta'_z \log \theta^{x,r}_z\Big)\, p(r \mid x, \theta')\, p(\theta' \mid D)\, d\theta'$$

has a closed form, and the updated model θ^{x,r} is obtained efficiently from the posterior distribution p(θ | D) and the rating model p(r | x, z).

Page 46: CS590I: Information Retrieval

Experiment: Result for ‘EachMovie’

Bayesian > random prediction >> model

[Figure: MAE on EachMovie versus the number of feedback items (1–6) for the random, Bayesian, model, and prediction selection strategies; MAE values range roughly from 1.0 to 1.18.]

Page 47: CS590I: Information Retrieval

Experiment: Results for ‘MovieRating’

Bayesian > random > prediction > model

[Figure: MAE on MovieRating versus the number of feedback items (1–6) for the random, Bayesian, model, and prediction selection strategies; MAE values range roughly from 0.79 to 0.91.]

Page 48: CS590I: Information Retrieval

Conclusion

Existing active learning algorithms do not work for the aspect model for CF

– Model entropy method sometimes works substantially worse than the simple random method

The Bayesian approach toward active learning works substantially better than random selection

Future work

– Extend the Bayesian analysis to active learning for robust estimation of prediction errors.

Page 49: CS590I: Information Retrieval

Conclusion and Future Work

Some of my related work:

– “Flexible Mixture Model for Collaborative Filtering”, ICML’03
– “Preference-Based Graphical Model for Collaborative Filtering”, UAI’03
– “A Study of Methods for Normalizing User Ratings in Collaborative Filtering”, SIGIR’04
– “A Bayesian Approach toward Active Learning for Collaborative Filtering”, UAI’04
– “Unified Filtering by Combining Collaborative Filtering and Content-Based Filtering via Mixture Model and Exponential Model”, CIKM’04
– “A Study of Mixture Models for Collaborative Filtering”, JIR’06

Page 50: CS590I: Information Retrieval

Abstract

Outline:
– Introduction to collaborative filtering
– Main framework
– Memory-based collaborative filtering approach
– Model-based collaborative filtering approach
  – Aspect model & two-way clustering model
  – Flexible mixture model
  – Decoupled model
– Unified filtering by combining content and collaborative filtering
– Active learning for collaborative filtering