Posted on 24-Feb-2016


Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions

Applied Algorithm Lab, Wooram Heo

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005

Outline

• Recommender Systems

• Problem statement

• Survey of Recommender Systems
  – Content-Based Methods
  – Collaborative Methods
  – Hybrid Methods

Recommender Systems

• Systems for recommending items (e.g., books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.

• Many online stores provide recommendations (e.g., Amazon, CDNow).

• Recommenders have been shown to substantially increase sales at on-line stores.

Recommender Systems

• Examples

Problem statement

• The recommendation problem is to estimate ratings for the items that have not been seen by a user.

• Estimation is usually based on the ratings the user has given to other items and on some other information.

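Problem statement

• $C$: the set of all users
• $S$: the set of all possible items that can be recommended
• $u: C \times S \rightarrow R$: a utility function measuring the usefulness of item $s$ to user $c$, where $R$ is a totally ordered set (e.g., nonnegative integers or real numbers within a certain range)
• For each user $c \in C$, we want to choose the item $s'_c \in S$ that maximizes the user's utility:
  $\forall c \in C, \quad s'_c = \arg\max_{s \in S} u(c, s)$
• Utility is usually defined only on a subset of $C \times S$ (the user-item pairs that have been rated), so it needs to be extrapolated to the whole space $C \times S$.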
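The argmax formulation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. Utilities are taken to be ratings stored in a dict keyed by `(user, item)`, and the argmax is restricted to items the user has not rated yet. The extrapolation of $u$ to unseen pairs is done by a deliberately naive placeholder, the item's mean known rating. Every name here (`item_mean_estimate`, `recommend`) is hypothetical, and any of the methods surveyed below could be plugged in as `estimate`.

```python
from typing import Callable, Dict, List, Optional, Tuple

Ratings = Dict[Tuple[str, str], float]  # known utilities u(c, s), keyed by (user, item)


def item_mean_estimate(known: Ratings) -> Callable[[str, str], float]:
    """Hypothetical placeholder estimator: predict u(c, s) as item s's mean known rating."""
    def estimate(c: str, s: str) -> float:
        vals = [r for (_, item), r in known.items() if item == s]
        return sum(vals) / len(vals) if vals else 0.0
    return estimate


def recommend(users: List[str], items: List[str], known: Ratings,
              estimate: Callable[[str, str], float]) -> Dict[str, Optional[str]]:
    """For each user c, pick s'_c = argmax of estimate(c, s) over the items c has not rated."""
    best: Dict[str, Optional[str]] = {}
    for c in users:
        unseen = [s for s in items if (c, s) not in known]
        best[c] = max(unseen, key=lambda s: estimate(c, s)) if unseen else None
    return best


known = {("alice", "s1"): 5, ("bob", "s2"): 2, ("bob", "s3"): 4,
         ("carol", "s1"): 3, ("carol", "s3"): 5}
print(recommend(["alice", "bob", "carol"], ["s1", "s2", "s3"], known, item_mean_estimate(known)))
# {'alice': 's3', 'bob': 's1', 'carol': 's2'}
```

Separating `estimate` from `recommend` mirrors the two-step view on the slide: the system first extrapolates $u$, then takes the argmax per user.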

Recommender System Categories

• Content-based recommendations
  – The user will be recommended items similar to the ones the user preferred in the past

• Collaborative recommendations
  – The user will be recommended items that people with similar tastes and preferences liked in the past

• Hybrid approaches
  – These methods combine collaborative and content-based methods

Content-Based Methods

• Recommend items similar to those the user preferred in the past
• User profiling is the key
• E.g., in a movie recommender application, the system looks for what the movies the user rated highly have in common:
  – Specific actors
  – Directors
  – Genres
  – etc.

Content-Based Methods

• The content-based approach has its roots in information retrieval
  – Documents, web sites (URLs), and news messages

• Designed mostly to recommend text-based items
  – Content is usually described with keywords

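Content-Based Methods

• The TF-IDF weight of keyword $k_i$ in document $d_j$ is defined as
  $TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}}, \quad IDF_i = \log\frac{N}{n_i}, \quad w_{i,j} = TF_{i,j} \times IDF_i$
  where $f_{i,j}$ is the number of times $k_i$ appears in $d_j$, $N$ is the total number of documents, and $n_i$ is the number of documents in which $k_i$ appears

• The content of document $d_j$ is defined as $Content(d_j) = (w_{1j}, \ldots, w_{Kj})$, where $K$ is the number of keywords

• Cosine similarity measure between the profile vector $\vec{w}_c$ of user $c$ and the vector $\vec{w}_s$ of item $s$:
  $u(c,s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\vec{w}_c \cdot \vec{w}_s}{\|\vec{w}_c\|_2 \, \|\vec{w}_s\|_2} = \frac{\sum_{i=1}^{K} w_{i,c} \, w_{i,s}}{\sqrt{\sum_{i=1}^{K} w_{i,c}^2} \sqrt{\sum_{i=1}^{K} w_{i,s}^2}}$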
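The TF-IDF weighting and cosine scoring can be sketched in plain Python. In this minimal sketch, documents are lists of keywords, $\log$ is the natural log (the base only rescales all weights, which cosine ignores), and every document is assumed to contain at least one keyword. The user profile $\vec{w}_c$ is a hypothetical choice made only for illustration: the average TF-IDF vector of the documents the user liked. The function names are ours, not from the survey.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse keyword -> weight


def tfidf(docs: Dict[str, List[str]]) -> Dict[str, Vector]:
    """w_ij = TF_ij * IDF_i, with TF_ij = f_ij / max_z f_zj and IDF_i = log(N / n_i).

    Assumes every document has at least one keyword.
    """
    n_docs = len(docs)
    doc_freq: Dict[str, int] = {}  # n_i: number of documents containing keyword k_i
    for words in docs.values():
        for k in set(words):
            doc_freq[k] = doc_freq.get(k, 0) + 1
    weights: Dict[str, Vector] = {}
    for name, words in docs.items():
        counts: Dict[str, int] = {}  # f_ij: raw keyword frequencies in this document
        for k in words:
            counts[k] = counts.get(k, 0) + 1
        max_f = max(counts.values())
        weights[name] = {k: (f / max_f) * math.log(n_docs / doc_freq[k])
                         for k, f in counts.items()}
    return weights


def cosine(a: Vector, b: Vector) -> float:
    """cos(a, b) = a.b / (||a||_2 ||b||_2); 0.0 if either vector is all zeros."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile(weights: Dict[str, Vector], liked: List[str]) -> Vector:
    """Hypothetical user profile w_c: the average TF-IDF vector of the liked documents."""
    prof: Vector = {}
    for name in liked:
        for k, w in weights[name].items():
            prof[k] = prof.get(k, 0.0) + w / len(liked)
    return prof


docs = {"d1": ["space", "alien", "war"], "d2": ["space", "alien", "comedy"],
        "d3": ["romance", "comedy"], "d4": ["romance", "drama"]}
weights = tfidf(docs)
w_c = profile(weights, ["d1"])  # the user liked d1
scores = {d: cosine(w_c, weights[d]) for d in ("d2", "d3", "d4")}
print(scores)  # d2 shares keywords with d1 and scores highest; d3 and d4 score 0.0
```

The toy run also shows the overspecialization problem listed next: only documents that share keywords with what the user already liked can get a nonzero score.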

Disadvantages

• Not all content is well represented by keywords
  – Multimedia data

• Items represented by the same set of features are indistinguishable

• Overspecialization problem
  – The user is recommended only items similar to those already rated

• New user problem
  – No rating history is available to build a profile

Collaborative Methods

• Use other users' recommendations (ratings) to judge an item's utility

• The key is to find users or user groups whose interests match those of the current user

• More users and more ratings give better results

• Can also recommend items dissimilar to the ones seen in the past

Collaborative Methods

[Figure: a user database holds each user's ratings for items A–Z. The active user's ratings are correlation-matched against this database, and the ratings of the best-matching users are used to extract recommendations (here item C, which the active user has not rated).]

Collaborative Methods

• Memory-based algorithms
  – The value of the unknown rating $r_{c,s}$ for user $c$ and item $s$ is usually computed as an aggregate of the ratings of some other users for the same item:
    $r_{c,s} = \operatorname{aggr}_{c' \in \hat{C}} r_{c',s}$
  – where $\hat{C}$ denotes the set of $N$ users that are the most similar to user $c$ and who have rated item $s$
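The slide leaves the aggregate abstract. A minimal Python sketch follows, built on one common choice that we assume here: a similarity-weighted sum normalised by $k = 1 / \sum_{c' \in \hat{C}} |sim(c, c')|$. The similarities $sim(c, c')$ are passed in as precomputed numbers, and the toy ratings and the function names `neighbourhood` and `predict` are hypothetical.

```python
from typing import Dict, List

UserRatings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def neighbourhood(active: str, item: str, ratings: UserRatings,
                  sims: Dict[str, float], n: int) -> List[str]:
    """C_hat: the n users most similar to `active` who have rated `item`."""
    raters = [u for u in ratings if u != active and item in ratings[u]]
    return sorted(raters, key=lambda u: sims[u], reverse=True)[:n]


def predict(active: str, item: str, ratings: UserRatings,
            sims: Dict[str, float], n: int) -> float:
    """Assumed aggregate: r_cs = k * sum_{c' in C_hat} sim(c, c') * r_c's, k = 1 / sum |sim|."""
    c_hat = neighbourhood(active, item, ratings, sims, n)
    norm = sum(abs(sims[u]) for u in c_hat)
    if norm == 0:
        raise ValueError("no similar user has rated this item")
    return sum(sims[u] * ratings[u][item] for u in c_hat) / norm


ratings = {"me": {"A": 9, "B": 3}, "u1": {"A": 9, "C": 9},
           "u2": {"A": 10, "C": 8}, "u3": {"A": 5, "C": 2}}
sims = {"u1": 0.9, "u2": 0.8, "u3": 0.1}  # hypothetical precomputed sim(me, c')
print(predict("me", "C", ratings, sims, n=2))  # weighted mean of u1's 9 and u2's 8
```

Restricting $\hat{C}$ to the top $N$ raters keeps weakly similar users (here `u3`) from diluting the prediction. A plain average over $\hat{C}$ would be another valid aggregate.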

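Collaborative Methods

• Similarity between two users $x$ and $y$, computed over $S_{xy}$, the set of items rated by both users
  – Pearson correlation coefficient:
    $sim(x,y) = \frac{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)(r_{y,s} - \bar{r}_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)^2 \sum_{s \in S_{xy}} (r_{y,s} - \bar{r}_y)^2}}$
    where $\bar{r}_x$ is the average rating of user $x$
  – Cosine similarity:
    $sim(x,y) = \cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|_2 \, \|\vec{y}\|_2} = \frac{\sum_{s \in S_{xy}} r_{x,s} \, r_{y,s}}{\sqrt{\sum_{s \in S_{xy}} r_{x,s}^2} \sqrt{\sum_{s \in S_{xy}} r_{y,s}^2}}$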
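Both similarity measures can be sketched in Python over the co-rated items $S_{xy}$. The following choices are our own assumptions. Each user's ratings live in an `item -> rating` dict with at least one entry. $\bar{r}_x$ is taken over all items user $x$ has rated. Both functions return `0.0` when the measure is undefined, for example when there are no co-rated items. The ratings are toy values.

```python
import math
from typing import Dict

Ratings = Dict[str, float]  # item -> rating for one user (assumed non-empty)


def pearson(x: Ratings, y: Ratings) -> float:
    """Pearson correlation over co-rated items S_xy; means are over each user's own ratings."""
    common = [s for s in x if s in y]  # S_xy
    mean_x = sum(x.values()) / len(x)
    mean_y = sum(y.values()) / len(y)
    num = sum((x[s] - mean_x) * (y[s] - mean_y) for s in common)
    den = math.sqrt(sum((x[s] - mean_x) ** 2 for s in common)
                    * sum((y[s] - mean_y) ** 2 for s in common))
    return num / den if den else 0.0  # 0.0 when undefined (e.g. no co-rated items)


def cosine_sim(x: Ratings, y: Ratings) -> float:
    """Cosine of the two users' rating vectors restricted to S_xy."""
    common = [s for s in x if s in y]
    dot = sum(x[s] * y[s] for s in common)
    den = (math.sqrt(sum(x[s] ** 2 for s in common))
           * math.sqrt(sum(y[s] ** 2 for s in common)))
    return dot / den if den else 0.0


active = {"A": 9, "B": 3, "Z": 5}
u1 = {"A": 9, "B": 3, "C": 9, "Z": 5}
u2 = {"A": 10, "B": 4, "C": 8, "Z": 1}
for name, other in (("u1", u1), ("u2", u2)):
    print(name, round(pearson(active, other), 3), round(cosine_sim(active, other), 3))
```

Pearson subtracts each user's mean, so it discounts users who rate everything high or low. Cosine compares raw rating vectors directly.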

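Collaborative Methods

• Model-based algorithms
  – Use the collection of ratings to learn a model, which is then used to predict ratings. For example, if ratings are integers between $0$ and $n$, the unknown rating can be estimated as its expected value
    $r_{c,s} = E(r_{c,s}) = \sum_{i=0}^{n} i \times \Pr(r_{c,s} = i \mid r_{c,s'}, s' \in S_c)$
    where $S_c$ is the set of items rated by user $c$
  – Cluster models and Bayesian networks are used to estimate this probability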
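The expected-value formula can be illustrated with a very small cluster model in Python. In this sketch, ratings are integers in $0..n$ and the cluster assignment is supplied by hand. A real cluster model would learn the assignment from the ratings, for example with k-means. The conditioning on the user's own ratings $r_{c,s'}$ is approximated by cluster membership only, and $\Pr(r_{c,s} = i)$ is estimated as the fraction of same-cluster users who gave item $s$ the rating $i$. The function name and the data are hypothetical.

```python
from collections import Counter
from typing import Dict

UserRatings = Dict[str, Dict[str, int]]  # user -> {item: integer rating in 0..n}


def expected_rating(active: str, item: str, ratings: UserRatings,
                    cluster_of: Dict[str, int], n: int) -> float:
    """E(r_cs) = sum_{i=0}^{n} i * Pr(r_cs = i), with Pr(r_cs = i) estimated as the
    fraction of users in the active user's cluster who gave `item` the rating i."""
    peers = [u for u in ratings
             if u != active and cluster_of[u] == cluster_of[active] and item in ratings[u]]
    if not peers:
        raise ValueError("no user in this cluster has rated the item")
    counts = Counter(ratings[u][item] for u in peers)  # missing ratings count as 0
    return sum(i * counts[i] / len(peers) for i in range(n + 1))


ratings = {"me": {"a": 4}, "u1": {"s": 5}, "u2": {"s": 3}, "u3": {"s": 1}}
cluster_of = {"me": 0, "u1": 0, "u2": 0, "u3": 1}  # hypothetical, e.g. from k-means
print(expected_rating("me", "s", ratings, cluster_of, n=5))  # 4.0
```

Here `u3` falls in a different cluster and is ignored, so the prediction is the mean of the two same-cluster ratings, 5 and 3.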

Collaborative Methods

• Model-based approaches use various machine learning techniques
  – K-means clustering
  – Gibbs sampling
  – Bayesian model
  – Probabilistic relational model
  – Linear regression
  – Maximum entropy model
  – Markov decision process
  – Probabilistic latent semantic analysis
  – Latent Dirichlet allocation
  – etc.

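Disadvantages

• Finding similar users or user groups is not easy

• New user problem: no preferences available

• New item problem: no ratings available

• Sparsity problem: the number of observed ratings is usually very small compared with the number of ratings to be predicted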

END
