Topic-Sensitive PageRank Taher H. Haveliwala

Upload: imogene-montana

Post on 30-Dec-2015


Page 1: Topic-Sensitive PageRank

Topic-Sensitive PageRank

Taher H. Haveliwala

Page 2: Topic-Sensitive PageRank

PageRank

Importance is propagated through the link structure, and a single global ranking vector is pre-computed.
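To make the propagation idea concrete, here is a minimal power-iteration sketch of plain (unbiased) PageRank. It assumes a tiny hypothetical link graph with no dangling pages, a link-following probability `d = 0.85` (a common default, not a value taken from these slides), and a hypothetical function name `pagerank`. It is an illustration, not Haveliwala's implementation.

```python
def pagerank(graph, d=0.85, tol=1e-10, max_iter=1000):
    """Plain PageRank by power iteration.

    graph maps each page to the list of pages it links to.
    Assumes every page has at least one out-link (no dangling pages).
    """
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}              # start from the uniform vector
    for _ in range(max_iter):
        # random jump: (1 - d) spread uniformly over all pages
        new_rank = {p: (1.0 - d) / n for p in pages}
        # propagation: each page splits d * its rank among its out-links
        for p, outlinks in graph.items():
            share = d * rank[p] / len(outlinks)
            for q in outlinks:
                new_rank[q] += share
        diff = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if diff < tol:
            break
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}    # hypothetical toy web
scores = pagerank(web)
print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'a', 'b']
```

Page `c` ends up first because it collects importance from both other pages.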

Page 3: Topic-Sensitive PageRank

PageRank

Page 4: Topic-Sensitive PageRank

Topic-Sensitive PageRank

Basic idea: for each topic, the importance score of every page is computed in advance. The composite score of a page is then calculated by combining the page's topic-specific scores, weighted according to the topics of the query.
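The combination step can be sketched as a weighted sum. In Haveliwala's formulation each topic-specific score is weighted by the probability that the query belongs to that topic. The function name `composite_score`, the topic names, and all numbers below are hypothetical.

```python
def composite_score(page, topic_probs, topic_ranks):
    """Weighted sum of topic-specific scores.

    topic_probs: {topic: P(topic | query)}
    topic_ranks: {topic: {page: topic-specific PageRank score}}
    """
    return sum(p_topic * topic_ranks[topic].get(page, 0.0)
               for topic, p_topic in topic_probs.items())


# Made-up topic-specific scores for two pages.
topic_ranks = {
    "sports":    {"x": 0.30, "y": 0.05},
    "computers": {"x": 0.02, "y": 0.40},
}
topic_probs = {"sports": 0.8, "computers": 0.2}    # query looks mostly sports-related

for page in ("x", "y"):
    print(page, round(composite_score(page, topic_probs, topic_ranks), 3))
# x 0.244
# y 0.12
```

Page `y` is strong only in the "computers" topic, so a mostly sports-related query ranks `x` above it.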

Page 5: Topic-Sensitive PageRank

Topic-Sensitive PageRank

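ODP-biasing: the 16 top-level categories of the Open Directory Project (ODP) are used as topics. Let $T_j$ be the set of URLs in ODP category $c_j$.

In computing the PageRank vector for topic $c_j$, we replace the uniform damping vector $\vec{p} = [1/N]_{N \times 1}$ (N = number of pages) with the non-uniform vector $\vec{p} = \vec{v}_j$, where $v_{ji} = 1/|T_j|$ if $i \in T_j$, and $v_{ji} = 0$ otherwise.

The resulting topic-specific vector will be referred to as $\vec{PR}(\alpha, \vec{v}_j)$.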
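A sketch of the topic-biased computation, assuming the same power iteration as ordinary PageRank with only the damping (random-jump) vector swapped for one that puts mass 1/|T_j| on each URL in T_j and zero elsewhere. The helper names `topic_vector` and `biased_pagerank`, the toy graph, the topic set, and `d = 0.85` are hypothetical choices for illustration.

```python
def topic_vector(pages, topic_pages):
    """Non-uniform damping vector: 1/|T_j| on pages in T_j, 0 elsewhere."""
    return {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0) for p in pages}


def biased_pagerank(graph, v, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration where the random jump follows v instead of the uniform vector.

    graph maps each page to its out-links; assumes no dangling pages.
    """
    rank = dict(v)
    for _ in range(max_iter):
        new_rank = {p: (1.0 - d) * v[p] for p in graph}   # jump according to v
        for p, outlinks in graph.items():
            share = d * rank[p] / len(outlinks)           # follow an out-link
            for q in outlinks:
                new_rank[q] += share
        diff = sum(abs(new_rank[p] - rank[p]) for p in graph)
        rank = new_rank
        if diff < tol:
            break
    return rank


web = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["a"]}   # hypothetical graph
uniform = topic_vector(web, set(web))      # every page: ordinary PageRank
biased = topic_vector(web, {"d"})          # hypothetical topic with T_j = {"d"}

plain_scores = biased_pagerank(web, uniform)
topic_scores = biased_pagerank(web, biased)
print(round(plain_scores["d"], 3), round(topic_scores["d"], 3))
```

Biasing the jump toward `d` raises its score relative to the unbiased run, while the link structure still decides how that importance flows onward.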

Page 6: Topic-Sensitive PageRank

Topic-Sensitive PageRank

We chose to make the topic prior $P(c_j)$ uniform.
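For context, Haveliwala's paper estimates the query's topic distribution with a unigram (naive Bayes) model, $P(c_j \mid q) \propto P(c_j) \prod_i P(q_i \mid c_j)$; with a uniform prior the $P(c_j)$ factor cancels. The sketch below assumes hypothetical per-topic term probabilities and a hypothetical floor of 1e-6 for unseen terms (a smoothing choice not taken from the slides); `topic_posteriors` is a name made up for this example.

```python
import math


def topic_posteriors(query_terms, term_probs, unseen=1e-6):
    """P(c_j | q) under a unigram model with a uniform topic prior P(c_j).

    term_probs maps topic -> {term: P(term | topic)}; unseen is a hypothetical
    floor used for terms missing from a topic's distribution.
    """
    topics = list(term_probs)
    log_prior = math.log(1.0 / len(topics))         # uniform P(c_j)
    log_scores = {}
    for t in topics:
        s = log_prior
        for w in query_terms:
            s += math.log(term_probs[t].get(w, unseen))
        log_scores[t] = s
    # normalize in log space for numerical stability
    top = max(log_scores.values())
    unnorm = {t: math.exp(s - top) for t, s in log_scores.items()}
    z = sum(unnorm.values())
    return {t: u / z for t, u in unnorm.items()}


term_probs = {                                       # made-up numbers
    "sports":    {"jaguar": 0.01, "score": 0.05},
    "computers": {"jaguar": 0.01, "score": 0.001},
}
print(topic_posteriors(["jaguar", "score"], term_probs))
```

Because the prior is uniform, the posterior here is driven entirely by how likely each topic makes the query terms.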

Page 7: Topic-Sensitive PageRank

Topic-Sensitive PageRank

Page 8: Topic-Sensitive PageRank

Experiment

Page 9: Topic-Sensitive PageRank

Experimental Results

Similarity measure for induced rankings: the overlap of two top-k sets A and B is $|A \cap B| / k$, with $k = 20$.

Kendall's τ distance measure (agreement in the relative order of pairs of URLs)
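Both similarity measures can be sketched in a few lines. The overlap follows the definition on this slide. The Kendall function is the plain normalized Kendall τ distance (fraction of discordant pairs) and assumes both lists rank exactly the same items, which is a simplification: comparing top-k lists with different members needs an extended variant that this sketch does not implement. Function names are hypothetical.

```python
from itertools import combinations


def overlap(list_a, list_b, k=20):
    """|A ∩ B| / k for the top-k sets of two rankings."""
    return len(set(list_a[:k]) & set(list_b[:k])) / k


def kendall_distance(rank_a, rank_b):
    """Fraction of item pairs that the two rankings order differently.

    Simplified: both rankings must contain exactly the same items.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must contain the same items")
    pos_a = {x: i for i, x in enumerate(rank_a)}
    pos_b = {x: i for i, x in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1 for u, v in pairs if (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) < 0
    )
    return discordant / len(pairs)


print(overlap(["a", "b", "c", "d"], ["b", "a", "e", "f"], k=4))  # 0.5
print(kendall_distance(["a", "b", "c"], ["b", "a", "c"]))        # 0.3333333333333333
```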

Page 10: Topic-Sensitive PageRank

Experimental Results

Page 11: Topic-Sensitive PageRank

Experimental Results

Page 12: Topic-Sensitive PageRank

Experimental Results

Page 13: Topic-Sensitive PageRank

Experimental Results

Page 14: Topic-Sensitive PageRank

Experimental Results

Page 15: Topic-Sensitive PageRank

Experimental Results

Query-Sensitive Scoring User Study

10 queries (randomly selected from our test set) and 5 volunteers. For each query, the volunteer was shown 2 result rankings:

1. top 10 results ranked with the unbiased PageRank vector
2. top 10 results ranked with the topic-sensitive PageRank vector

Page 16: Topic-Sensitive PageRank

Experimental Results

User Study (cont.): The volunteer was asked to

1. select all URLs that were “relevant” to the query
2. select the better of the two ranking lists

(The volunteers were not told anything about how either ranking was generated.)

Page 17: Topic-Sensitive PageRank

Experimental Results

Page 18: Topic-Sensitive PageRank

Experimental Results

Page 19: Topic-Sensitive PageRank

Experimental Results

Context-Sensitive Scoring

Page 20: Topic-Sensitive PageRank

Experimental Results

Page 21: Topic-Sensitive PageRank

Other issues

Sources of search context: a hierarchical directory, users’ browsing patterns, bookmarks, email archives

Page 22: Topic-Sensitive PageRank

Other issues

Flexibility: can be applied to any kind of context

Transparency: one can tune the classifier used on the search context, or adjust the topic weights

Privacy: a client-side program could use the user context to generate the user profile locally

Efficiency: both the query-time cost and the offline preprocessing cost are low

Page 23: Topic-Sensitive PageRank

Automatic Identification of User Interest For Personalized Search

Feng Qiu, Junghoo Cho

Page 24: Topic-Sensitive PageRank

User Preference Representation

Topic preference vector: T = [T(1), …, T(m)], where T(i) represents the user’s degree of interest in the i-th topic

Page 25: Topic-Sensitive PageRank

User Preference Representation

Page 26: Topic-Sensitive PageRank

User Model

Topic-Driven Random Surfer Model
• The user browses the web in a two-step process.
• First, the user chooses a topic of interest t for the ensuing sequence of random walks, with probability T(t).
• Then, with equal probability, she jumps to one of the pages on topic t.
• Starting from this page, the user performs a random walk such that at each step, with probability d, she randomly follows an out-link on the current page; with the remaining probability 1 − d she gets bored, picks a new topic of interest for the next sequence of random walks based on T, and jumps to a page on the chosen topic.
• This process is repeated forever.
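Because PageRank is linear in its random-jump vector, the stationary distribution of this topic-driven random surfer should equal the T-weighted sum of the per-topic vectors (written TSPR_t on the following slides). The sketch below checks that numerically on a hypothetical four-page graph with no dangling pages, two hypothetical topics, and a made-up T, using d = 0.85; helper names are illustrative.

```python
def biased_pagerank(graph, v, d=0.85, tol=1e-12, max_iter=10000):
    """PageRank whose random jump follows distribution v (no dangling pages assumed)."""
    rank = dict(v)
    for _ in range(max_iter):
        new_rank = {p: (1.0 - d) * v[p] for p in graph}
        for p, outlinks in graph.items():
            for q in outlinks:
                new_rank[q] += d * rank[p] / len(outlinks)
        diff = sum(abs(new_rank[p] - rank[p]) for p in graph)
        rank = new_rank
        if diff < tol:
            break
    return rank


def uniform_on(pages, topic_pages):
    """Jump vector that picks one of the topic's pages with equal probability."""
    return {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0) for p in pages}


web = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["a"]}
topics = {"t1": {"a", "b"}, "t2": {"d"}}   # hypothetical topic -> pages
T = {"t1": 0.7, "t2": 0.3}                  # hypothetical topic preference vector

# Per-topic vectors TSPR_t: the random jump lands only on pages of topic t.
tspr = {t: biased_pagerank(web, uniform_on(web, members)) for t, members in topics.items()}

# Surfer: on each jump, pick topic t with probability T(t), then a page on t.
surfer_v = {p: sum(T[t] * uniform_on(web, topics[t])[p] for t in topics) for p in web}
surfer = biased_pagerank(web, surfer_v)

combo = {p: sum(T[t] * tspr[t][p] for t in topics) for p in web}
print(max(abs(surfer[p] - combo[p]) for p in web) < 1e-9)  # True
```

This is why a user's visit distribution can be expressed with only m precomputed topic vectors instead of a fresh PageRank computation per user.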

Page 27: Topic-Sensitive PageRank

User Model

Topic-Driven Searcher Model
• The user always visits web pages through a search engine, in a two-step process.
• First, the user chooses a topic of interest t with probability T(t).
• Then the user goes to the search engine and issues a query on the chosen topic t.
• The search engine then returns pages ranked by $\mathrm{TSPR}_t(p)$, on which the user clicks.

Page 28: Topic-Sensitive PageRank

User Model

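Relationship between V and T, where V(p) is the probability that the user visits page p:

Under the topic-driven random surfer model: $V(p) = \sum_{i=1}^{m} T(i)\,\mathrm{TSPR}_i(p)$

Under the topic-driven searcher model (assuming that, once topic t is chosen, the user clicks page p with probability $\mathrm{TSPR}_t(p)$): $V(p) = \sum_{i=1}^{m} T(i)\,\mathrm{TSPR}_i(p)$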

Page 29: Topic-Sensitive PageRank

Learning Topic Preference Vector

Problem

Given V and the vectors $\mathrm{TSPR}_i$, find T satisfying $V = \sum_{i=1}^{m} T(i)\,\mathrm{TSPR}_i$

Page 30: Topic-Sensitive PageRank

Learning Topic Preference Vector

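Linear regression: minimize the squared error $\sum_p \big(V(p) - \sum_{i=1}^{m} T(i)\,\mathrm{TSPR}_i(p)\big)^2$

Maximum likelihood estimator: $T^* = \arg\max_T \prod_{p \in C} V(p)$, where C is the user's click history (one entry per click) and

$V(p) = \sum_{i=1}^{m} T(i)\,\mathrm{TSPR}_i(p)$ = the probability that the user visits the page p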
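The slides do not say how the maximum likelihood problem is solved. One standard option, used here purely as an illustration, is expectation-maximization (EM) for mixture weights: each iteration splits every click among the topics in proportion to $T(i)\,\mathrm{TSPR}_i(p)$ and re-averages. Each EM step never decreases the likelihood, and because the log-likelihood is concave in T the iterates approach its maximum. The TSPR values, click history, and function names are hypothetical.

```python
import math


def log_likelihood(T, tspr, clicks):
    """Sum over clicks of log V(p), with V(p) = sum_i T(i) * TSPR_i(p)."""
    return sum(math.log(sum(T[t] * tspr[t][p] for t in tspr)) for p in clicks)


def estimate_T_em(tspr, clicks, iters=200):
    """EM for the mixture weights T, keeping the TSPR vectors fixed."""
    topics = list(tspr)
    T = {t: 1.0 / len(topics) for t in topics}            # start from uniform
    for _ in range(iters):
        totals = {t: 0.0 for t in topics}
        for p in clicks:
            v_p = sum(T[t] * tspr[t][p] for t in topics)  # V(p) under current T
            for t in topics:
                totals[t] += T[t] * tspr[t][p] / v_p      # topic t's share of this click
        T = {t: totals[t] / len(clicks) for t in topics}
    return T


tspr = {                                                  # made-up TSPR vectors
    "t1": {"a": 0.6, "b": 0.3, "c": 0.1},
    "t2": {"a": 0.1, "b": 0.2, "c": 0.7},
}
clicks = ["a", "a", "b", "a", "c", "a"]                   # made-up click history
T_hat = estimate_T_em(tspr, clicks)
print({t: round(w, 3) for t, w in T_hat.items()})
```

Mostly clicking page `a`, which topic t1 favors, pushes the estimate toward t1. The linear-regression variant would instead fit T to an empirical estimate of V with least squares.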

Page 31: Topic-Sensitive PageRank

Ranking Search Results Using Topic Preference Vectors

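Ranking of page p = $\sum_{i=1}^{m} T(i)\,\mathrm{TSPR}_i(p)$

because this equals V(p), the probability that a user with topic preference vector T visits page p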

Page 32: Topic-Sensitive PageRank

Evaluation Metrics

Accuracy of topic preference vector: compares $T_e$, our estimate based on the user’s click history, with $T$, the user’s actual topic preference vector

Page 33: Topic-Sensitive PageRank

Evaluation Metrics

Accuracy of personalized ranking: the Kendall distance between two sorted lists of top-k pages, one based on the estimated personalized ranking scores and one computed from the user’s true preference vector

Page 34: Topic-Sensitive PageRank

Evaluation Metrics

Improvement in search quality: average rank of the relevant pages in the search result, $\frac{1}{|S|}\sum_{p \in S} R(p)$

S denotes the set of pages the user u selected

R(p) is the rank of page p in the search result
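The average-rank metric is simple arithmetic, and lower is better. This sketch assumes 1-based ranks and hypothetical result lists; `average_rank` is an illustrative name.

```python
def average_rank(ranking, selected):
    """Mean 1-based position of the user-selected pages in a result list."""
    position = {page: i + 1 for i, page in enumerate(ranking)}
    return sum(position[p] for p in selected) / len(selected)


baseline = ["p1", "p2", "p3", "p4", "p5"]       # hypothetical original order
personalized = ["p4", "p1", "p2", "p5", "p3"]   # hypothetical re-ranked order
clicked = {"p4", "p5"}                          # pages the user selected

print(average_rank(baseline, clicked))      # 4.5
print(average_rank(personalized, clicked))  # 2.5
```

Moving the selected pages up the list lowers the average rank from 4.5 to 2.5.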

Page 35: Topic-Sensitive PageRank

Experiments

User study: 10 subjects in the UCLA Computer Science Department, 04/2004 – 10/2004 (6 months)

Data: queries to Google, the returned results, and the clicked URLs

Average number of queries per subject = 255.6; average number of clicks per query = 0.91

Page 36: Topic-Sensitive PageRank

Experiments

Accuracy of learning method: a synthetic dataset generated by simulation based on our topic-driven searcher model

Generation of topic preference vector
• Randomly choose K topics and assign random weights to them. The weights of the other topics are set to zero. The vector is then normalized.

Generation of click history
• Use the generated topic preference vector to generate the clicks according to the visit probability distribution dictated by the topic-driven searcher model.
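The generation procedure might look like the following. Assumptions not stated on the slide: topic weights are drawn uniformly from (0, 1]; each click is produced by drawing a topic i from T and then a page with probability $\mathrm{TSPR}_i(p)$; and the TSPR table, seed, and helper names are hypothetical.

```python
import random


def random_preference_vector(num_topics, k, rng):
    """Pick k topics, give them random positive weights, zero the rest, normalize."""
    T = [0.0] * num_topics
    for i in rng.sample(range(num_topics), k):
        T[i] = 1.0 - rng.random()          # uniform on (0, 1], never zero
    total = sum(T)
    return [w / total for w in T]


def generate_clicks(T, tspr, n_clicks, rng):
    """Searcher model: topic i ~ T, then clicked page p ~ TSPR_i (assumed click model)."""
    pages = list(tspr[0])
    clicks = []
    for _ in range(n_clicks):
        i = rng.choices(range(len(T)), weights=T)[0]
        page = rng.choices(pages, weights=[tspr[i][p] for p in pages])[0]
        clicks.append(page)
    return clicks


rng = random.Random(0)
tspr = [                                    # hypothetical TSPR_i for 4 topics, 3 pages
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.1, "b": 0.8, "c": 0.1},
    {"a": 0.2, "b": 0.2, "c": 0.6},
    {"a": 0.3, "b": 0.3, "c": 0.4},
]
T = random_preference_vector(len(tspr), k=2, rng=rng)
clicks = generate_clicks(T, tspr, n_clicks=50, rng=rng)
print([round(w, 3) for w in T], clicks[:10])
```

Since the true T is known here, an estimate learned from `clicks` can be compared against it directly.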

Page 37: Topic-Sensitive PageRank

Experiments

Accuracy of estimated topic preference vector

Page 38: Topic-Sensitive PageRank

Experiments

Accuracy of estimated topic preference vector

Page 39: Topic-Sensitive PageRank

Experiments

Accuracy of Personalized PageRank

Page 40: Topic-Sensitive PageRank

Experiments

Accuracy of Personalized PageRank

Page 41: Topic-Sensitive PageRank

Experiments

Quality of Personalized Search

Page 42: Topic-Sensitive PageRank

Experiments

Quality of Personalized Search

Page 43: Topic-Sensitive PageRank

Conclusion

Proposed a framework for personalizing web search using the user’s search history and TSPR

Conducted both simulation experiments on synthetic data and a real-life user study to evaluate the approach

Page 44: Topic-Sensitive PageRank

Thank you