
Post on 12-Jan-2016


Page 1: Clustering-based Collaborative filtering for web page recommendation CSCE 561 project Proposal Mohammad Amir Sharif mas4108@louisiana.edu

Clustering-based Collaborative Filtering for Web Page Recommendation

CSCE 561 Project Proposal

Mohammad Amir Sharif

mas4108@louisiana.edu

Page 2:

Presentation Outline

- Introduction to Recommendation Systems
- Clustering-based recommendation algorithm
- Implementing clustering-based web page recommendation using Mahout
- Experimental set-up
- Evaluation of the developed system
- References

Page 3:

Information Overload

- News items
- Books, journals, research papers
- TV programs, music CDs, movie titles
- Consumer products, e-commerce items
- Web pages, Usenet articles, e-mails

Page 4:

Introduction

What is a recommendation system?
- Recommends related items
- Provides personalized experiences

Page 5:

Introduction (cont.)

Components of a recommender system:
- Set of users, set of items (products)
- Implicit/explicit user ratings on items
- Additional information: trust, collaboration, etc.
- Algorithms for generating recommendations

Page 6:

Introduction (cont.)

Recommendation techniques:
- Collaborative Filtering (CF)
  - Memory-based algorithms: user-based, item-based
  - Model-based algorithms: Bayesian network; clustering; rule-based; machine learning on graphs; PLSA; matrix factorization
- Content-based recommendation
- Hybrid approaches

Page 7:

CF Algorithm

Problems: large-scale data; sparse rating matrix

Page 8:

Clustering-based Collaborative Filtering

[Figure: user clustering splits the user-item matrix into Cluster 1 and Cluster 2; item-based CF is applied within each cluster.]

- Find the most similar cluster for the active user.
- Apply a similarity measure between the current user and the other users.
- Use the users' similarities to predict the recommendation value of an item for the active user.

Page 9:

Experimental set-up

- Preprocessed data for 13745 user sessions over 683 pages are available for this experiment.
- The user-item pageview matrix has size 13745×683; each cell holds the time a user spent viewing a page in a particular session.
- Apache Mahout, which works on top of Hadoop, will be used to cluster the user sessions.
- Apache Hadoop and Apache Mahout are open-source software libraries for large-scale distributed computing and machine learning, respectively.

Page 10:

Experimental set-up (cont.)

Sample row of the data set:

0 0 3 0 0 5 0 4 6 2 0 0 0 0 0 7

Vector similarity
- Similarity between the active session and each cluster center
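As a rough illustration of matching an active session to a cluster center, the sketch below assumes cosine similarity, since the slide only says "vector similarity". The helper names `cosine_similarity` and `nearest_cluster` and the two cluster centers are hypothetical; only the active session vector comes from the sample row on the slide.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_cluster(session, centers):
    """Index of the cluster center most similar to the session vector."""
    return max(range(len(centers)),
               key=lambda k: cosine_similarity(session, centers[k]))

# Sample row from the slide: pageview times of one session over 16 pages.
active_session = [0, 0, 3, 0, 0, 5, 0, 4, 6, 2, 0, 0, 0, 0, 0, 7]

# Hypothetical cluster centers (illustrative values, not from the data set).
centers = [
    [4, 5, 0, 3, 0, 0, 2, 0, 0, 0, 6, 0, 0, 1, 0, 0],
    [0, 0, 2, 0, 0, 4, 0, 5, 5, 1, 0, 0, 0, 0, 0, 6],
]

best = nearest_cluster(active_session, centers)
```

Here the first center shares no viewed pages with the active session, so the second center is selected. In the proposed system the centers would come from Mahout's clustering output rather than being written by hand.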

Page 11:

Experimental set-up (cont.)

[Equation: similarity of users u and v, $W_{u,v}$]

[Equation: predicted recommendation of user a for item i, $P_{a,i}$]

Here, $r_{u,i}$ is user u's pageview time for page i, $I$ is the set of pages, and $\bar{r}_u$ and $\bar{r}_v$ are the average pageview times of users u and v.
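The symbols defined on this slide match the usual Pearson-correlation similarity and mean-centered weighted prediction used in memory-based CF, so the sketch below assumes those forms: $w_{u,v} = \frac{\sum_{i \in I}(r_{u,i}-\bar{r}_u)(r_{v,i}-\bar{r}_v)}{\sqrt{\sum_{i \in I}(r_{u,i}-\bar{r}_u)^2}\sqrt{\sum_{i \in I}(r_{v,i}-\bar{r}_v)^2}}$ and $p_{a,i} = \bar{r}_a + \frac{\sum_u (r_{u,i}-\bar{r}_u)\,w_{a,u}}{\sum_u |w_{a,u}|}$. It further assumes that each user is a dict from page to pageview time containing only viewed pages, that the sums run over pages both users viewed, and that averages are taken over each user's own viewed pages. The function names and the sample data are hypothetical.

```python
import math

def mean_rating(ratings):
    """Average pageview time over the pages a user actually viewed."""
    return sum(ratings.values()) / len(ratings)

def pearson(r_u, r_v):
    """Assumed Pearson correlation w_{u,v} over the pages both users viewed."""
    common = set(r_u) & set(r_v)
    if not common:
        return 0.0
    mu_u, mu_v = mean_rating(r_u), mean_rating(r_v)
    num = sum((r_u[i] - mu_u) * (r_v[i] - mu_v) for i in common)
    den_u = math.sqrt(sum((r_u[i] - mu_u) ** 2 for i in common))
    den_v = math.sqrt(sum((r_v[i] - mu_v) ** 2 for i in common))
    den = den_u * den_v
    return num / den if den else 0.0

def predict(active, others, item):
    """Assumed prediction p_{a,i}: active user's mean plus a similarity-weighted deviation."""
    mu_a = mean_rating(active)
    num = den = 0.0
    for r_u in others:
        if item not in r_u:
            continue  # this neighbour never viewed the page
        w = pearson(active, r_u)
        num += w * (r_u[item] - mean_rating(r_u))
        den += abs(w)
    return mu_a + num / den if den else mu_a

# Hypothetical pageview times (seconds) for three sessions.
active = {"p1": 4, "p2": 2, "p3": 6}
u1 = {"p1": 5, "p2": 3, "p3": 7, "p4": 9}
u2 = {"p1": 2, "p2": 6, "p4": 1}

p4_prediction = predict(active, [u1, u2], "p4")
```

In the clustering-based setting, `others` would be restricted to the sessions in the cluster closest to the active session, which is what keeps the neighbour search small.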

Page 12:

Evaluation Metrics

Mean Absolute Error (MAE) and Normalized Mean Absolute Error (NMAE):

[Equations: MAE and NMAE]

where $r_{max}$ and $r_{min}$ are the upper and lower bounds of the pageview time, $p_{i,j}$ is the prediction for user i on item j, and $r_{i,j}$ is the pageview time of user i for page j.
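A minimal sketch of the two metrics, assuming the standard definitions $\mathrm{MAE} = \frac{1}{n}\sum_{i,j}|p_{i,j} - r_{i,j}|$ over the $n$ predicted user-page pairs and $\mathrm{NMAE} = \mathrm{MAE} / (r_{max} - r_{min})$. The function names and the sample values are hypothetical.

```python
def mae(predictions, actuals):
    """Mean absolute error over paired predictions and observed pageview times."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predictions, actuals)) / len(predictions)

def nmae(predictions, actuals, r_min, r_max):
    """MAE divided by the range of possible pageview times."""
    return mae(predictions, actuals) / (r_max - r_min)

# Hypothetical predicted and observed pageview times.
predicted = [3.5, 5.0, 2.0, 6.5]
observed = [3.0, 6.0, 2.0, 7.0]

error = mae(predicted, observed)              # (0.5 + 1.0 + 0.0 + 0.5) / 4
normalized = nmae(predicted, observed, 0, 7)  # error divided by the range 7 - 0
```

Normalizing by the range makes errors comparable across data sets whose pageview times are on different scales.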

Page 13:

References

- http://hadoop.apache.org/
- http://mahout.apache.org/
- Pham, M. C., Cao, Y., Klamma, R., Jarke, M., A Clustering Approach for Collaborative Filtering Recommendation Using Social Network Analysis, Journal of Universal Computer Science, vol. 17, no. 4 (2011), 583-604

Page 14:

Thank you

To know more, contact me.

E-mail: mas4108@louisiana.edu

Page 15:

An Efficient Information Retrieval System

Objectives:
- Efficient retrieval that incorporates keyword positions, and occurrences of keywords in headings or titles, into the inverted index.
- Retrieve relevant documents considering the proximity of query terms (example: "dogs" and "race" within 4 words).
- Evaluation of the system.

Extension to Assignment #3
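To make the proximity objective concrete, here is a small sketch of a "within k words" check over stored word positions. It assumes that "within 4 words" means the two positions differ by at most 4; the function names and the sample sentence are hypothetical.

```python
def within_k_words(positions_a, positions_b, k):
    """True if some occurrence of term A lies within k words of some occurrence of term B.

    Both arguments are sorted lists of word positions in one document.
    """
    i = j = 0
    while i < len(positions_a) and j < len(positions_b):
        if abs(positions_a[i] - positions_b[j]) <= k:
            return True
        # Advance the pointer that is behind: later positions on the other
        # side are only farther away from it.
        if positions_a[i] < positions_b[j]:
            i += 1
        else:
            j += 1
    return False

def term_positions(text):
    """Map each lowercased word to the sorted list of its 0-based positions."""
    positions = {}
    for pos, word in enumerate(text.lower().split()):
        positions.setdefault(word, []).append(pos)
    return positions

doc = "the dogs ran a long race while other dogs slept"
pos = term_positions(doc)
match = within_k_words(pos["dogs"], pos["race"], 4)
```

Because both position lists are sorted, the check is a single linear merge, which is why storing positions in the postings keeps proximity queries cheap.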

Page 16:

Inverted Index

[Figure: an index file listing index terms (system, computer, database, science) with their document frequencies (df: 3, 2, 4, 1), each pointing to a postings list of (Dj, tfj) entries: (D2, 4), (D5, 2), (D1, 3), (D7, 4).]

[Figure: positional postings lists for the terms cats, dogs, fish, goats, sheep, whales, with entries "(1,1): 1", "(1,2): 2,3", "(4,1): 1", "(3,1): 2", "(2,1): 3", "(3,1): 1", "(2,1): 2", "(2,1): 1", "(3,1): 2", "(4,2): 1".]

Page 17:

Inverted Index (cont.)

[Figure: index data structure]
- HashMap tokenHash: maps a String token to a TokenInfo
- TokenInfo: double idf; ArrayList occList
- TokenOccurence (one per entry in occList): DocumentReference docRef; int count
- DocumentReference: File file; double length

Page 18:

Inverted Index (cont.)

[Figure: extended index data structure]
- HashMap tokenHash: maps a String token to a TokenInfo
- TokenInfo: double idf; ArrayList occList
- TokenOccurence (one per entry in occList): DocumentReference docRef; Weight (based on frequency, heading, etc.); Positions (stores the positions of occurrences)
- DocumentReference: File file; double length

Page 19:

Creating an Inverted Index

Create an empty HashMap, H;
For each document, D (i.e. file in an input directory):
    Create a HashMapVector, V, for D;
    For each (non-zero) token, T, in V:
        If T is not already in H, create an empty TokenInfo for T and insert it into H;
        Create a TokenOccurence for T in D and add it to the occList in the TokenInfo for T;
Compute IDF for all tokens in H;
Compute vector lengths for all documents in H;
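The index-construction steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's Java implementation: documents are given as an in-memory dict instead of files, tokenization is a plain whitespace split, IDF is taken as the natural log of N/df, and each occurrence also records word positions in line with the proposed extension. The name `build_index` and the sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Build an inverted index from {doc_id: text}.

    Returns (index, idf, lengths):
      index[token]    -> list of (doc_id, count, positions) occurrences
      idf[token]      -> log(N / df)
      lengths[doc_id] -> Euclidean length of the document's tf-idf vector
    """
    index = defaultdict(list)                    # H in the pseudocode
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        counts = Counter(tokens)                 # V: token -> count in this document
        positions = defaultdict(list)
        for pos, tok in enumerate(tokens):
            positions[tok].append(pos)
        for tok, count in counts.items():
            index[tok].append((doc_id, count, positions[tok]))

    n_docs = len(docs)
    idf = {tok: math.log(n_docs / len(occs)) for tok, occs in index.items()}

    # Vector length of each document: sqrt of the summed squared tf-idf weights.
    squared = defaultdict(float)
    for tok, occs in index.items():
        for doc_id, count, _ in occs:
            squared[doc_id] += (count * idf[tok]) ** 2
    lengths = {doc_id: math.sqrt(squared[doc_id]) for doc_id in docs}
    return dict(index), idf, lengths

# Hypothetical three-document collection.
docs = {
    "D1": "dogs race dogs",
    "D2": "cats and dogs",
    "D3": "fish race",
}
index, idf, lengths = build_index(docs)
```

IDF and vector lengths are computed only after every document has been scanned, because both depend on the final document frequencies.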

Page 20:

Inverted-Index Retrieval Algorithm

Create a HashMapVector, Q, for the query.
Create empty HashMap, R, to store retrieved documents with scores.
For each token, T, in Q:
    Let I be the IDF of T, and K be the count of T in Q;
    Set the weight of T in Q: W = K * I;
    Let L be the list of TokenOccurences of T from H;
    For each TokenOccurence, O, in L:
        Let D be the document of O, and C be the count of O (tf of T in D);
        If D is not already in R (D was not previously retrieved)
            Then add D to R and initialize score to 0.0;
        Increment D's score by W * I * C; (product of T-weight in Q and D)
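A compact Python rendering of the score-accumulation loop is given below as a sketch. The index is a small hand-written dict with hypothetical IDF values, the function name `retrieve` is made up for illustration, and the final sort by score is an addition for readability; no length normalization is applied because the steps on this slide stop at accumulating scores.

```python
from collections import Counter

def retrieve(query, index, idf):
    """Score documents for a query by accumulating tf-idf products per token.

    index[token] -> list of (doc_id, count) occurrences; idf[token] -> float.
    Returns (doc_id, score) pairs, highest score first.
    """
    q_counts = Counter(query.lower().split())    # Q: token -> count in the query
    scores = {}                                  # R: doc_id -> accumulated score
    for token, k in q_counts.items():
        if token not in index:
            continue                             # token absent from the collection
        i = idf[token]
        w = k * i                                # weight of the token in the query
        for doc_id, c in index[token]:
            # (k * idf) * (idf * c): product of the token's weights in Q and in D
            scores[doc_id] = scores.get(doc_id, 0.0) + w * i * c
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Tiny hand-made index with hypothetical idf values.
index = {
    "dogs": [("D1", 2), ("D2", 1)],
    "race": [("D1", 1), ("D3", 1)],
}
idf = {"dogs": 0.5, "race": 1.0}

ranking = retrieve("dogs race", index, idf)
```

Only documents that contain at least one query token are ever touched, which is the main efficiency gain of retrieving through the inverted index instead of scanning every document.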

Page 21:

Precision and Recall

$$\text{recall} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of relevant documents}}$$

$$\text{precision} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of documents retrieved}}$$

[Figure: the entire document collection, with overlapping sets of relevant documents and retrieved documents, split into four regions: retrieved & relevant, not retrieved but relevant, retrieved & irrelevant, not retrieved & irrelevant.]

Page 22:

Computing Recall/Precision

n   doc #  relevant  recall; precision
1   588    x         R=1/6=0.167; P=1/1=1
2   576
3   589    x         R=2/6=0.333; P=2/3=0.667
4   342
5   590    x         R=3/6=0.5; P=3/5=0.6
6   717
7   984
8   772    x         R=4/6=0.667; P=4/8=0.5
9   321    x         R=5/6=0.833; P=5/9=0.556
10  498
11  113
12  628
13  772
14  592    x         R=6/6=1.0; P=6/14=0.429
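The recall and precision values on this slide can be reproduced with a short loop over the ranked list. In the sketch below, the relevance flags follow the "x" marks at ranks 1, 3, 5, 8, 9 and 14, and the total of 6 relevant documents is read off the denominators on the slide; the function name is hypothetical.

```python
def precision_recall_points(ranked_relevance, total_relevant):
    """Return (rank, recall, precision) at each rank where a relevant document appears."""
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((rank, hits / total_relevant, hits / rank))
    return points

# Relevance flags for ranks 1..14, taken from the ranked list on the slide.
relevant_ranks = {1, 3, 5, 8, 9, 14}
flags = [rank in relevant_ranks for rank in range(1, 15)]

points = precision_recall_points(flags, total_relevant=6)
```

Each tuple in `points` corresponds to one R/P pair on the slide; for example, the third relevant document appears at rank 5, giving recall 3/6 and precision 3/5.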

Page 23:

Evaluation

[Figure: precision (y-axis, 0 to 1) versus recall (x-axis, 0.1 to 1) curves for two systems, "No Position" and "Position".]

- Considering position information, the system should give better performance.
- The curve closest to the upper right-hand corner of the graph indicates the best performance.

Page 24:

Thank you

To know more, contact me.

E-mail: mas4108@louisiana.edu