A SCALABLE COLLABORATIVE FILTERING FRAMEWORK BASED ON CO-CLUSTERING

Authors/ Thomas George and Srujana Merugu

Source/ ICDM’05, pp. 625-628

Presenter/ Allen

OUTLINE

Introduction
Related Work
Problem Definition
Collaborative Filtering via Co-clustering
Scalable Collaborative Filtering System
Experimental Results
Conclusion

INTRODUCTION

Due to the overwhelming increase in web-based activities, users are often forced to choose from a large number of products or content items.

To aid users in the decision making process, it has become increasingly important to design recommender systems.

Collaborative filtering identifies the likely preferences of a user based on the known preferences of other users.

INTRODUCTION (CONT.)

Existing collaborative filtering methods are based on correlation criteria, singular value decomposition (SVD) or non-negative matrix factorization (NNMF).

Drawback: the training component is computationally expensive.

Practical scenarios such as real-time news personalization require dynamic collaborative filtering.

The key idea:
    Simultaneously obtain user and item neighborhoods via co-clustering.
    Generate predictions based on average ratings.

INTRODUCTION (CONT.)

Two new contributions:

Dynamic collaborative filtering approach
    Supports the entry of new users, items and ratings via a hybrid of incremental and batch versions of the co-clustering algorithm.

A scalable, real-time collaborative filtering system
    Develops parallel versions of the co-clustering, prediction and incremental training routines.

Notation:
    Matrices: written as $A$, with $A_{ij}$ denoting the corresponding matrix elements.
    Sets: enumerated as $\{x_i\}_{i=1}^{n}$, where $x_i$ are the elements of the set.

RELATED WORK

Recommender systems
    Content-based filtering systems
    Collaborative filtering systems

Co-clustering

SVD- and NNMF-based filtering techniques predict the unknown ratings based on a low-rank approximation of the original ratings matrix. The missing values are filled with the average ratings.

Incremental versions of SVD have been proposed to reduce the computational cost. (SDM 2003)

PROBLEM DEFINITION

Let $U = \{u_i\}_{i=1}^{m}$ be the set of users such that $|U| = m$, and $P = \{p_j\}_{j=1}^{n}$ be the set of items such that $|P| = n$.

Let $A$ be the $m \times n$ ratings matrix such that $A_{ij}$ is the rating of user $u_i$ for item $p_j$.

Let $W$ be the $m \times n$ matrix corresponding to the confidence of the ratings in $A$: $W_{ij} = 1$ if the rating is known and $0$ otherwise.

Let the user clustering be $\rho: \{1, \dots, m\} \to \{1, \dots, k\}$ and the item clustering be $\gamma: \{1, \dots, n\} \to \{1, \dots, l\}$, where $k$ is the number of user clusters and $l$ is the number of item clusters.

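PROBLEM DEFINITION (CONT.)

The approximate matrix $\hat{A}$ is given by

$$\hat{A}_{ij} = A^{COC}_{gh} + (A^{R}_{i} - A^{RC}_{g}) + (A^{C}_{j} - A^{CC}_{h})$$

where $g = \rho(i)$ is the user cluster of $u_i$ and $h = \gamma(j)$ is the item cluster of $p_j$.

$A^{R}_{i}$ and $A^{C}_{j}$ are the average ratings of user $u_i$ and item $p_j$.

$A^{COC}_{gh}$, $A^{RC}_{g}$ and $A^{CC}_{h}$ are the average ratings of the corresponding co-cluster, user cluster and item cluster.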
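A minimal Python sketch of this approximation may help make the five kinds of averages concrete. It is an illustration, not the authors' implementation. It assumes the estimate is the co-cluster average adjusted by the user's and item's offsets from their cluster averages, that averages are taken over known ratings only, and that an empty row, column or cluster falls back to the global average. The function names `summary_averages` and `approximate`, the nested-list storage and the toy data are all hypothetical.

```python
def mean(values, default):
    """Average of a list, or `default` when the list is empty."""
    return sum(values) / len(values) if values else default


def summary_averages(A, W, rho, gamma, k, l):
    """Compute user, item, cluster and co-cluster averages over known ratings."""
    m, n = len(A), len(A[0])
    known = [(i, j) for i in range(m) for j in range(n) if W[i][j]]
    glob = mean([A[i][j] for i, j in known], 0.0)
    # Assumption: empty rows, columns or clusters fall back to the global average.
    A_R = [mean([A[i][j] for j in range(n) if W[i][j]], glob) for i in range(m)]
    A_C = [mean([A[i][j] for i in range(m) if W[i][j]], glob) for j in range(n)]
    A_RC = [mean([A[i][j] for i, j in known if rho[i] == g], glob)
            for g in range(k)]
    A_CC = [mean([A[i][j] for i, j in known if gamma[j] == h], glob)
            for h in range(l)]
    A_COC = [[mean([A[i][j] for i, j in known
                    if rho[i] == g and gamma[j] == h], glob)
              for h in range(l)] for g in range(k)]
    return A_R, A_C, A_RC, A_CC, A_COC


def approximate(i, j, rho, gamma, averages):
    """Co-cluster average plus the user's and the item's individual offsets."""
    A_R, A_C, A_RC, A_CC, A_COC = averages
    g, h = rho[i], gamma[j]
    return A_COC[g][h] + (A_R[i] - A_RC[g]) + (A_C[j] - A_CC[h])


# Toy data: 0 marks an unknown rating, so W is 1 exactly where A is non-zero.
A = [[5, 4, 1, 0],
     [4, 5, 0, 1],
     [1, 0, 5, 4],
     [0, 1, 4, 5]]
W = [[1 if r else 0 for r in row] for row in A]
rho = [0, 0, 1, 1]    # user clusters (0-based)
gamma = [0, 0, 1, 1]  # item clusters (0-based)

averages = summary_averages(A, W, rho, gamma, k=2, l=2)
estimate_00 = approximate(0, 0, rho, gamma, averages)  # known rating, gives 4.5
estimate_03 = approximate(0, 3, rho, gamma, averages)  # unknown rating, gives 1.0
```

In this toy case every user, item and cluster has the same average, so the two offset terms vanish and the estimate reduces to the co-cluster average.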

COLLABORATIVE FILTERING VIA CO-CLUSTERING

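Static training (co-clustering): the goal is to minimize

$$\sum_{i=1}^{m} \sum_{j=1}^{n} W_{ij} \, (A_{ij} - \hat{A}_{ij})^2$$

over the user clustering $\rho$ and the item clustering $\gamma$.

The row and column assignment steps can be implemented efficiently by pre-computing the invariant parts of the update cost functions.

Row updating: for each user $u_i$, choose the cluster $g$ minimizing

$$\sum_{j=1}^{n} W_{ij} \, \bigl(A_{ij} - A^{COC}_{g\gamma(j)} - A^{R}_{i} + A^{RC}_{g} - A^{C}_{j} + A^{CC}_{\gamma(j)}\bigr)^2$$

Column updating: for each item $p_j$, choose the cluster $h$ minimizing

$$\sum_{i=1}^{m} W_{ij} \, \bigl(A_{ij} - A^{COC}_{\rho(i)h} - A^{R}_{i} + A^{RC}_{\rho(i)} - A^{C}_{j} + A^{CC}_{h}\bigr)^2$$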

STATIC TRAINING: CO-CLUSTERING
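The alternating structure of static training can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' algorithm as published: the initial assignments are supplied by the caller, averages are recomputed once per iteration, empty clusters fall back to the global average, and none of the pre-computation that makes the updates efficient is attempted. The names `averages` and `cocluster` and the toy matrix are hypothetical.

```python
def mean(values, default):
    """Average of a list, or `default` when the list is empty."""
    return sum(values) / len(values) if values else default


def averages(A, W, rho, gamma, k, l):
    """User, item, cluster and co-cluster averages over known ratings."""
    m, n = len(A), len(A[0])
    known = [(i, j) for i in range(m) for j in range(n) if W[i][j]]
    glob = mean([A[i][j] for i, j in known], 0.0)
    A_R = [mean([A[i][j] for j in range(n) if W[i][j]], glob) for i in range(m)]
    A_C = [mean([A[i][j] for i in range(m) if W[i][j]], glob) for j in range(n)]
    A_RC = [mean([A[i][j] for i, j in known if rho[i] == g], glob)
            for g in range(k)]
    A_CC = [mean([A[i][j] for i, j in known if gamma[j] == h], glob)
            for h in range(l)]
    A_COC = [[mean([A[i][j] for i, j in known
                    if rho[i] == g and gamma[j] == h], glob)
              for h in range(l)] for g in range(k)]
    return A_R, A_C, A_RC, A_CC, A_COC


def cocluster(A, W, rho, gamma, k, l, max_iter=20):
    """Alternate row and column reassignments until no assignment changes."""
    m, n = len(A), len(A[0])
    rho, gamma = list(rho), list(gamma)
    for _ in range(max_iter):
        A_R, A_C, A_RC, A_CC, A_COC = averages(A, W, rho, gamma, k, l)

        def sq_err(i, j, g, h):
            estimate = A_COC[g][h] + (A_R[i] - A_RC[g]) + (A_C[j] - A_CC[h])
            return (A[i][j] - estimate) ** 2

        def row_cost(i, g):
            # Squared error of user i's known ratings if i joined cluster g.
            return sum(sq_err(i, j, g, gamma[j]) for j in range(n) if W[i][j])

        new_rho = [min(range(k), key=lambda g: row_cost(i, g)) for i in range(m)]

        def col_cost(j, h):
            # Squared error of item j's known ratings if j joined cluster h.
            return sum(sq_err(i, j, new_rho[i], h) for i in range(m) if W[i][j])

        new_gamma = [min(range(l), key=lambda h: col_cost(j, h)) for j in range(n)]
        if new_rho == rho and new_gamma == gamma:
            break
        rho, gamma = new_rho, new_gamma
    return rho, gamma


# Toy data: a fully observed 4 x 4 matrix with two obvious user and item groups.
A = [[5, 5, 1, 1],
     [5, 5, 1, 1],
     [1, 1, 5, 5],
     [1, 1, 5, 5]]
W = [[1] * 4 for _ in range(4)]

# Start with user 2 placed in the wrong cluster; training moves it.
final_rho, final_gamma = cocluster(A, W, [0, 0, 0, 1], [0, 0, 1, 1], k=2, l=2)
# final_rho == [0, 0, 1, 1] and final_gamma == [0, 0, 1, 1]
```

The result depends on the starting assignment: a procedure of this kind can stop at a poor local optimum, so in practice several initializations may be worth trying.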

PREDICTION
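A minimal sketch of how prediction could work once the averages are stored follows. It assumes the estimate for a known user and item is the co-cluster average plus the user and item offsets. The fallbacks for unseen users or items (user average, item average, then global average) are assumptions made for this illustration, as are the `model` dictionary layout, the function name `predict` and all values in the example.

```python
def predict(user, item, model):
    """Estimate a rating from stored averages, with simple fallbacks."""
    rho, gamma = model["rho"], model["gamma"]
    known_user = user in rho
    known_item = item in gamma
    if known_user and known_item:
        g, h = rho[user], gamma[item]
        return (model["A_COC"][g][h]
                + (model["A_R"][user] - model["A_RC"][g])
                + (model["A_C"][item] - model["A_CC"][h]))
    if known_user:
        # Assumed fallback: unseen item, use the user's own average.
        return model["A_R"][user]
    if known_item:
        # Assumed fallback: unseen user, use the item's average.
        return model["A_C"][item]
    # Assumed fallback: nothing known, use the global average.
    return model["A_glob"]


# Hypothetical model with two users, two items and a 2 x 2 co-cluster grid.
model = {
    "rho": {"ann": 0, "bob": 1},
    "gamma": {"film_a": 0, "film_b": 1},
    "A_R": {"ann": 4.0, "bob": 2.0},
    "A_C": {"film_a": 3.0, "film_b": 3.5},
    "A_RC": [3.5, 2.5],
    "A_CC": [3.0, 3.0],
    "A_COC": [[4.5, 1.0], [1.0, 4.5]],
    "A_glob": 3.2,
}

both_known = predict("ann", "film_a", model)   # 4.5 + 0.5 + 0.0 = 5.0
new_item = predict("ann", "film_z", model)     # 4.0
new_user = predict("zoe", "film_b", model)     # 3.5
both_new = predict("zoe", "film_z", model)     # 3.2
```

Each prediction is a handful of lookups and additions, so its cost does not depend on the number of users or items.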

INCREMENTAL TRAINING
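One way to picture incremental training is as bookkeeping on running sums and counts, so that a new rating updates the stored averages without rerunning co-clustering. The sketch below is an assumption-laden illustration rather than the authors' routine. In particular, it leaves new users and items unassigned to any cluster until the next batch run, which is a choice made here for simplicity. The class name `RunningStats` and its key scheme are hypothetical.

```python
from collections import defaultdict


class RunningStats:
    """Running sums and counts from which the averages can be read at any time."""

    def __init__(self, rho, gamma):
        self.rho = dict(rho)      # user -> user cluster (known users only)
        self.gamma = dict(gamma)  # item -> item cluster (known items only)
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def add_rating(self, user, item, rating):
        """Fold one new rating into every statistic it contributes to."""
        g = self.rho.get(user)
        h = self.gamma.get(item)
        keys = [("glob",), ("R", user), ("C", item)]
        # Assumption: users or items without a cluster only update their own
        # averages and the global average until the next batch co-clustering.
        if g is not None:
            keys.append(("RC", g))
        if h is not None:
            keys.append(("CC", h))
        if g is not None and h is not None:
            keys.append(("COC", g, h))
        for key in keys:
            self.sums[key] += rating
            self.counts[key] += 1

    def average(self, *key):
        """Current average for a key, or None if nothing has been recorded."""
        count = self.counts.get(key, 0)
        return self.sums[key] / count if count else None


stats = RunningStats(rho={"ann": 0}, gamma={"film_a": 0})
stats.add_rating("ann", "film_a", 4)  # known user, known item
stats.add_rating("ann", "film_b", 2)  # known user, new item

ann_average = stats.average("R", "ann")       # 3.0
cocluster_average = stats.average("COC", 0, 0)  # 4.0
new_item_average = stats.average("C", "film_b")  # 2.0
```

Each update touches at most six counters, which is what makes a scheme of this kind attractive when ratings arrive continuously.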

SCALABLE COLLABORATIVE FILTERING SYSTEM

The system uses a distributed-memory representation for the data objects, so each of the processors P1 and P2 is in fact a cluster of processors. P1 handles prediction and incremental training; P2 is responsible for static training.

PARALLEL CO-CLUSTERING

EXPERIMENTAL RESULTS

Datasets and algorithm
    Movie-lens (100K): 943 users and 1682 movies, with 100,000 ratings (1-5).
    BookCrossing: 470034 users and 133438 books, with 269392 ratings (1-10).
    Movie1-Movie10: subsets containing 10-100% of the ratings of Movie-lens 100K.
    80% training and 20% testing for all the datasets.

Evaluation metric: Mean Absolute Error (MAE)

The experiments evaluated effectiveness in terms of MAE and efficiency in terms of execution time.
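For reference, MAE is the average absolute difference between predicted and actual ratings on the test set. A minimal Python sketch, with a hypothetical function name and made-up ratings:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up test ratings and predictions on a 1-5 scale.
actual_ratings = [4, 3, 5, 1]
predicted_ratings = [3.5, 3, 4, 2]
mae = mean_absolute_error(actual_ratings, predicted_ratings)
# (0.5 + 0 + 1 + 1) / 4 = 0.625
```

Lower values are better, and the value is expressed in the same units as the rating scale.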

MAE COMPARISON

Mov1: Movie-lens
Mov2: BookCrossing
Mov3: 10 subsets of Movie-lens

K=3

VARIATION OF MAE WITH # PARAMETERS

Number of prediction parameters:
    COCLUST: (m+n+kl-k-l) values
    SVD, NNMF: (m+n)(k+l) values

Movie3 dataset
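To make the two parameter counts concrete, the short Python sketch below evaluates both expressions as they are written on this slide. The dimensions are those of Movie-lens 100K (m = 943, n = 1682); the choice k = l = 3 is an assumption made purely for illustration, and the function names are hypothetical.

```python
def coclust_parameters(m, n, k, l):
    """Count from the slide for COCLUST: m + n + kl - k - l."""
    return m + n + k * l - k - l


def factorization_parameters(m, n, k, l):
    """Count from the slide for SVD and NNMF: (m + n)(k + l)."""
    return (m + n) * (k + l)


m, n = 943, 1682  # users and movies in Movie-lens 100K
k = l = 3         # assumed number of user and item clusters

coclust_count = coclust_parameters(m, n, k, l)              # 2628
factorization_count = factorization_parameters(m, n, k, l)  # 15750
```

Under these assumptions the co-clustering model stores about one sixth as many values as the factorization models.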

EFFICIENCY

Prediction time: the time needed for prediction on each given test pair of Movie-lens.

Training time (co-clustering) vs. data size

Movie-lens dataset

Experimental devices: AMD 1.4 GHz processors on 128 compute nodes with 384 MB RAM

TRAINING TIME VS. # OF PROCESSORS

Movie-lens dataset

Experimental devices: AMD 1.4 GHz, varying number of processors, 384 MB RAM

CONCLUSION

Recommender systems are proving to be extremely useful for a number of online activities such as e-commerce.

In dynamic scenarios, where new users, items and ratings enter the system at a rapid rate, both efficiency and effectiveness must be addressed.

This paper proposed a new dynamic CF approach based on co-clustering.

Empirical results indicate high-quality predictions at a much lower computational cost.
