Topic-Sensitive PageRank

Taher H. Haveliwala

PageRank

• Importance is propagated
• A global ranking vector is pre-computed

PageRank


Basic idea

• For each topic, an importance score is computed for each page
• The composite score of a page is calculated by combining the page's per-topic scores according to the topics of the query
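A minimal sketch of the combination step, assuming the composite score is a weighted sum of the per-topic scores and that the weights express how strongly the query matches each topic. The names `composite_score`, `topic_scores`, and `topic_weights` are hypothetical, and the numbers are toy values.

```python
def composite_score(topic_scores, topic_weights):
    """Combine the per-topic importance scores of one page into a single score.

    topic_scores[j]  : importance of the page under topic j (assumed precomputed)
    topic_weights[j] : weight of topic j for the current query (assumed given)
    """
    if len(topic_scores) != len(topic_weights):
        raise ValueError("one weight per topic is required")
    return sum(w * s for w, s in zip(topic_weights, topic_scores))


# Toy example with 3 topics: the query is split evenly between topics 0 and 1.
page_scores = [0.5, 0.25, 0.0]
query_weights = [0.5, 0.5, 0.0]
score = composite_score(page_scores, query_weights)
```

Only the per-topic scores need to be computed offline; the weights can be chosen at query time.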


ODP-Biasing

• The top-level categories of the Open Directory (16 topics) are used
• Let Tj be the set of URLs in the ODP category cj

In computing the PageRank vector for topic cj, we replace the uniform damping vector by the non-uniform vector where

It will be referred to as
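The biased computation can be sketched as a power iteration. This is an illustration under stated assumptions, not the talk's implementation: the non-uniform vector is assumed to put equal weight on every page in Tj and zero elsewhere, `alpha` is an assumed jump probability, and the function name and toy graph are hypothetical.

```python
def topic_pagerank(out_links, topic_pages, alpha=0.25, iterations=100):
    """Power iteration for a PageRank vector biased toward topic_pages.

    out_links   : dict mapping each page to the list of pages it links to
    topic_pages : set of pages in the topic (Tj in the slides)
    alpha       : probability of jumping instead of following a link (assumed value)
    """
    pages = list(out_links)
    # Assumption: the jump vector is uniform over the topic's pages, zero elsewhere.
    jump = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0) for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {p: alpha * jump[p] for p in pages}
        for p in pages:
            targets = out_links[p]
            if targets:
                share = (1.0 - alpha) * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # A page without out-links sends its weight back through the jump vector.
                for q in pages:
                    new_rank[q] += (1.0 - alpha) * rank[p] * jump[q]
        rank = new_rank
    return rank


# Toy graph: page "d" has no in-links and is not in the topic.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"], "d": ["a"]}
rank_topic = topic_pagerank(graph, topic_pages={"c"})
```

In this sketch a page that is neither in the topic set nor reachable from it ends up with a score of zero, which is the intended effect of the bias.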


We chose to make P(cj) uniform
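A hedged sketch of where a uniform P(cj) could enter, assuming it is the prior in a Bayes-rule estimate of the probability of each topic given the query terms. The term counts are hypothetical toy data, and add-one smoothing is an illustrative choice that is not taken from the talk.

```python
import math


def topic_probabilities(query_terms, term_counts):
    """Estimate P(topic | query) with a uniform prior over topics.

    term_counts : dict topic -> dict term -> count (hypothetical toy data)
    """
    vocabulary = {t for counts in term_counts.values() for t in counts}
    prior = 1.0 / len(term_counts)  # uniform P(cj)
    scores = {}
    for topic, counts in term_counts.items():
        total = sum(counts.values()) + len(vocabulary)  # add-one smoothing
        log_score = math.log(prior)
        for term in query_terms:
            log_score += math.log((counts.get(term, 0) + 1) / total)
        scores[topic] = math.exp(log_score)
    norm = sum(scores.values())
    return {topic: s / norm for topic, s in scores.items()}


counts = {"Sports": {"golf": 8, "club": 2}, "Arts": {"music": 9, "club": 1}}
probs = topic_probabilities(["golf"], counts)
```

With a uniform prior, the ranking of topics depends only on how well the query terms match each topic's term distribution.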

Experiment

Experimental Results

Similarity Measure for Induced Rankings

• Overlap of two sets A and B (each of size k) = |A ∩ B| / k
• k = 20
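A small sketch of the overlap measure, assuming it is the size of the intersection of the two top-k sets divided by k. The function name and the URLs are hypothetical.

```python
def overlap_similarity(ranking_a, ranking_b, k=20):
    """Fraction of the top-k items that two rankings have in common."""
    top_a = set(ranking_a[:k])
    top_b = set(ranking_b[:k])
    return len(top_a & top_b) / k


# Toy example with k = 4: the two top-4 sets share "u1" and "u3".
first = ["u1", "u2", "u3", "u4", "u5"]
second = ["u3", "u1", "u6", "u7", "u2"]
overlap = overlap_similarity(first, second, k=4)
```

The measure ignores the order inside the top k, so it is usually read together with an order-sensitive measure.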

Kendall’s distance measure
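For orientation, the standard normalized Kendall distance between two rankings of the same items can be sketched as below. The variant used in the talk may differ, for example in how it treats items that appear in only one of the lists, so this is an illustration of the general idea only.

```python
from itertools import combinations


def kendall_distance(ranking_a, ranking_b):
    """Fraction of item pairs that the two rankings order differently."""
    if set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain the same items")
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)


# Only the pair ("b", "c") is ordered differently, out of three pairs.
example_distance = kendall_distance(["a", "b", "c"], ["a", "c", "b"])
```

Identical rankings give 0 and exactly reversed rankings give 1.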


Query-Sensitive Scoring

User Study

• 10 queries (randomly selected from our test set)
• 5 volunteers
• For each query, the volunteer was shown 2 result rankings:
  1. top 10 results ranked with the unbiased PageRank vector
  2. top 10 results ranked with the topic-sensitive PageRank vector


User Study (cont'd)

The volunteer was asked to
  1. select all URLs which were “relevant” to the query
  2. select the ranking list which is better

(They were not told anything about how either of the rankings was generated.)


Context-Sensitive Scoring

Other issues

Search Context

• hierarchical directory
• users’ browsing patterns
• bookmarks
• email archives

Other issues

• Flexibility: applies to any kind of context
• Transparency: tune the classifier used on the search context, or adjust topic weights
• Privacy: a client-side program could use the user context to generate the user profile locally
• Efficiency: the query-time cost and the offline preprocessing cost are low

Automatic Identification of User Interest For Personalized Search

Feng Qiu, Junghoo Cho

User Preference Representation

Topic Preference Vector

• T = [T(1),…,T(m)]
• T(i) represents the user’s degree of interest in the i-th topic

User Preference Representation

User Model

Topic-Driven Random Surfer Model

• The user browses the web in a two-step process.
• First, the user chooses a topic of interest t for the ensuing sequence of random walks with probability T(t).
• Then, with equal probability, she jumps to one of the pages on topic t.
• Starting from this page, the user then performs a random walk such that, at each step, with probability d she randomly follows an out-link on the current page; with the remaining probability 1-d she gets bored, picks a new topic of interest for the next sequence of random walks based on T, and jumps to a page on the chosen topic.
• This process is repeated forever.
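The surfer described above can be simulated directly. The sketch below is illustrative only: the graph, topic assignments, value of `d`, and number of steps are hypothetical, and a page without out-links is assumed to trigger a new topic jump.

```python
import random


def simulate_topic_surfer(out_links, topic_pages, preference, d=0.85,
                          steps=20000, seed=0):
    """Estimate visit frequencies of a topic-driven random surfer.

    out_links   : dict page -> list of linked pages
    topic_pages : dict topic -> list of pages on that topic
    preference  : dict topic -> T(topic), the topic preference vector
    """
    rng = random.Random(seed)
    topics = list(preference)
    weights = [preference[t] for t in topics]

    def jump():
        # Pick a topic according to T, then a page on that topic uniformly.
        topic = rng.choices(topics, weights=weights)[0]
        return rng.choice(topic_pages[topic])

    visits = {page: 0 for page in out_links}
    page = jump()
    for _ in range(steps):
        visits[page] += 1
        targets = out_links[page]
        if targets and rng.random() < d:
            page = rng.choice(targets)  # follow a random out-link
        else:
            page = jump()               # get bored and pick a new topic
    return {p: c / steps for p, c in visits.items()}


# Two disconnected components; the user only cares about topic "x".
web = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
pages_by_topic = {"x": ["a"], "y": ["c"]}
freq = simulate_topic_surfer(web, pages_by_topic, {"x": 1.0, "y": 0.0})
```

Because the user never chooses topic "y" in this toy setup, pages "c" and "d" are never visited.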

User Model

Topic-Driven Searcher Model

• The user always visits web pages through a search engine in a two-step process.
• First, the user chooses a topic of interest t with probability T(t).
• Then the user goes to the search engine and issues a query on the chosen topic t.
• The search engine then returns pages ranked by TSPRt(p), on which the user clicks.

User Model

Relationship between V and T

• Under Topic-Driven Random Surfer Model

• Under Topic-Driven Searcher Model

Learning Topic Preference Vector

Problem

Given V and TSPRi, find T that satisfies

Learning Topic Preference Vector

Linear regression: minimize the square-root error

Maximum likelihood estimator **

= the probability that the user visits the page p
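One way to picture the maximum likelihood approach is the sketch below. It assumes the visit probability of a page is the T-weighted mixture of the per-topic TSPR scores (an assumption; the talk's own formula may differ) and uses expectation-maximization for the mixture weights, which is a standard technique swapped in here and not necessarily the authors' procedure. All names and numbers are hypothetical.

```python
def estimate_topic_preferences(clicks, tspr, iterations=200):
    """EM estimate of the topic preference vector T from clicked pages.

    clicks : list of clicked pages (the user's click history)
    tspr   : list of dicts, tspr[i][p] = score of page p under topic i,
             treated as the probability of visiting p given topic i
    """
    m = len(tspr)
    T = [1.0 / m] * m  # start from a uniform preference vector
    for _ in range(iterations):
        totals = [0.0] * m
        for p in clicks:
            joint = [T[i] * tspr[i].get(p, 0.0) for i in range(m)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # page has zero score under every topic; skip it
            for i in range(m):
                totals[i] += joint[i] / norm  # responsibility of topic i
        used = sum(totals)
        if used == 0.0:
            break
        T = [t / used for t in totals]
    return T


# Toy data: 70% of the clicks go to page "a".
toy_tspr = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]
toy_clicks = ["a"] * 7 + ["b"] * 3
T_est = estimate_topic_preferences(toy_clicks, toy_tspr)
```

In this toy case the click frequencies are matched exactly by T = [0.75, 0.25], and the iteration converges to that vector.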

Ranking Search Results Using Topic Preference Vectors

Ranking of page p =

because

Evaluation Metrics

Accuracy of topic preference vector

• Te is our estimation based on the user’s click history
• T is the user’s actual topic preference vector
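A hedged sketch of one plausible accuracy measure: the relative error between Te and T. The use of the Euclidean norm and the function name are assumptions for illustration.

```python
import math


def relative_error(estimated, actual):
    """Norm of (estimated - actual) divided by the norm of actual."""
    diff = math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)))
    norm = math.sqrt(sum(a ** 2 for a in actual))
    return diff / norm


# Toy vectors over 3 topics.
T_actual = [0.5, 0.5, 0.0]
T_estimated = [0.4, 0.5, 0.1]
error = relative_error(T_estimated, T_actual)
```

A value of 0 means the estimate matches the actual vector exactly; larger values mean a worse estimate.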

Evaluation Metrics

Accuracy of personalized ranking

• Kendall distance between two lists:
• the sorted list of top-k pages based on the estimated personalized ranking scores
• the sorted list of top-k pages computed from the user’s true preference vector

Evaluation Metrics

Improvement in search quality

• Average rank of relevant pages in the search result
• S denotes the set of the pages the user u selected
• R(p) is the ranking of the page p
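Using the definitions of S and R(p), the metric can be sketched as the mean of R(p) over the pages in S. Ranks are assumed to be 1-based, and the page names are hypothetical.

```python
def average_rank(selected_pages, ranking):
    """Mean 1-based rank R(p) over the pages p in S that the user selected."""
    position = {page: i + 1 for i, page in enumerate(ranking)}
    ranks = [position[p] for p in selected_pages]
    return sum(ranks) / len(ranks)


# The user selected p2 and p4; a better ranking places them nearer the top.
selected = ["p2", "p4"]
baseline = average_rank(selected, ["p1", "p2", "p3", "p4"])
personalized = average_rank(selected, ["p2", "p4", "p1", "p3"])
```

A smaller average rank means the selected pages appear higher in the result list.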

Experiments

User Study

• 10 subjects in the UCLA Computer Science Department
• 04/2004 – 10/2004 (6 months)
• Queries to Google, results and clicked URLs
• Average number of queries per subject = 255.6
• Average number of clicks per query = 0.91

Experiments

Accuracy of Learning Method

• Synthetic dataset generated by simulation based on our topic-driven searcher model
• Generation of topic preference vector: randomly choose K topics and assign random weights to them. The weights of the others are set to zero. The vector is then normalized.
• Generation of click history: use the generated topic preference vector to generate the clicks according to the visit probability distribution dictated by the topic-driven searcher model.
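The two generation steps can be sketched as follows. The per-topic visit probabilities are assumed to be given (here as hypothetical toy dictionaries), and the function names, seed, and sizes are illustrative.

```python
import random


def random_preference_vector(m, k, rng):
    """Pick k of m topics, give them random weights, and normalize to sum 1."""
    weights = [0.0] * m
    for i in rng.sample(range(m), k):
        weights[i] = rng.random()
    total = sum(weights)
    return [w / total for w in weights]


def generate_clicks(preference, visit_prob, n_clicks, rng):
    """Draw clicks: choose a topic by T, then a page by that topic's visit probabilities.

    visit_prob[i] : dict page -> probability that a user interested in
                    topic i visits the page (assumed given)
    """
    topics = list(range(len(preference)))
    clicks = []
    for _ in range(n_clicks):
        topic = rng.choices(topics, weights=preference)[0]
        pages = list(visit_prob[topic])
        probs = [visit_prob[topic][p] for p in pages]
        clicks.append(rng.choices(pages, weights=probs)[0])
    return clicks


rng = random.Random(1)
T_synthetic = random_preference_vector(m=4, k=2, rng=rng)
toy_visit_prob = [{"a": 1.0}, {"b": 1.0}, {"c": 1.0}, {"d": 1.0}]
synthetic_clicks = generate_clicks(T_synthetic, toy_visit_prob, 50, rng)
```

In this toy setup each topic has a single page, so every generated click falls on a page whose topic has non-zero weight in the preference vector.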

Experiments

Accuracy of estimated topic preference vector

Experiments

Accuracy of Personalized PageRank

Experiments

Quality of Personalized Search

Conclusion

Proposed a framework for personalizing web search using the user's search history and TSPR

Conducted both theoretical and real-life experiments to evaluate the approach

Thank you
