Post on 22-Dec-2015
Topic-Sensitive PageRank
Taher H. Haveliwala
PageRank
Importance is propagated
A global ranking vector is pre-computed
Topic-Sensitive PageRank
Basic idea
For each topic, an importance score is computed for every page
The composite score of a page is calculated by combining the page's per-topic scores according to the topics of the query
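A minimal sketch of the combining step, assuming the per-topic scores are already available as a matrix and that the query has been mapped to a weight per topic. The names `composite_scores`, `topic_ranks`, and `query_topic_weights` are hypothetical, and the weighted sum is one plausible way to combine the scores, not necessarily the exact formula used on the slides.

```python
import numpy as np

def composite_scores(topic_ranks, topic_weights):
    """Combine per-topic page scores into one score per page.

    topic_ranks:   array of shape (num_topics, num_pages), one score vector per topic
    topic_weights: array of shape (num_topics,), how strongly the query relates
                   to each topic (assumed input, normalized here to sum to 1)
    """
    topic_ranks = np.asarray(topic_ranks, dtype=float)
    weights = np.asarray(topic_weights, dtype=float)
    weights = weights / weights.sum()
    # Weighted sum over topics: one composite score per page.
    return weights @ topic_ranks

# Toy example: 2 topics, 3 pages (made-up numbers).
topic_ranks = np.array([[0.5, 0.3, 0.2],
                        [0.1, 0.2, 0.7]])
query_topic_weights = np.array([0.75, 0.25])
scores = composite_scores(topic_ranks, query_topic_weights)
```

Because the per-topic vectors are pre-computed, only this cheap weighted sum has to happen at query time.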
Topic-Sensitive PageRank
ODP-Biasing
The top-level categories of the Open Directory (16 topics) are used
Let T_j be the set of URLs in the ODP category c_j
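In computing the PageRank vector for topic c_j, we replace the uniform damping vector by the non-uniform vector $v_j$, where $v_{ji} = 1/|T_j|$ if $i \in T_j$ and $v_{ji} = 0$ otherwise
The result is the topic-sensitive PageRank vector for c_j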
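The biasing step can be sketched as a power iteration whose damping (teleport) vector is concentrated on the pages of one topic. This is an illustrative sketch, not the author's implementation: the function name, the teleport probability `alpha=0.25`, the choice to spread the teleport mass uniformly over the topic's pages, and the handling of pages without out-links are all assumptions.

```python
import numpy as np

def topic_sensitive_pagerank(adj, topic_pages, alpha=0.25, iters=100):
    """Power iteration with a topic-biased damping vector.

    adj[i][j] = 1 if page i links to page j.
    topic_pages: indices of the pages listed under the topic.
    alpha: teleport probability (assumed value).
    """
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    # Non-uniform damping vector: uniform over the topic's pages, zero elsewhere.
    v = np.zeros(n)
    v[list(topic_pages)] = 1.0 / len(topic_pages)
    out_deg = A.sum(axis=1)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        # Each page with out-links splits its rank evenly among them.
        share = np.divide(rank, out_deg, out=np.zeros(n), where=out_deg > 0)
        follow = A.T @ share
        # Assumption: rank held by pages without out-links is redistributed via v.
        dangling = rank[out_deg == 0].sum()
        rank = (1 - alpha) * (follow + dangling * v) + alpha * v
    return rank

# Toy 4-page graph; page 3 is the only page listed under the topic.
adj = [[0, 1, 1, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 1],
       [0, 0, 1, 0]]
biased = topic_sensitive_pagerank(adj, topic_pages=[3])
uniform = topic_sensitive_pagerank(adj, topic_pages=[0, 1, 2, 3])
```

Passing every page as `topic_pages` recovers the ordinary, unbiased PageRank, which makes the two vectors easy to compare.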
Topic-Sensitive PageRank
We chose to make P(cj) uniform
Topic-Sensitive PageRank
Experiment
Experimental Results
Similarity Measure for Induced Rankings
Overlap of two sets A and B (each of size k) = $|A \cap B| / k$, with k = 20
Kendall’s distance measure
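Both similarity measures are easy to sketch in code. The overlap follows the definition above; the Kendall-style distance below is a simplified variant that only counts pairs of items present in both lists, which may differ from the exact measure used in the experiments. Function names are hypothetical.

```python
from itertools import combinations

def overlap(a, b, k=20):
    """Fraction of the top-k items that two rankings share."""
    return len(set(a[:k]) & set(b[:k])) / k

def kendall_distance(a, b):
    """Fraction of item pairs (restricted to items in both lists)
    that the two rankings order differently."""
    in_b = set(b)
    common = [x for x in a if x in in_b]        # kept in a's order
    pos_b = {x: i for i, x in enumerate(b)}
    pairs = list(combinations(common, 2))
    if not pairs:
        return 0.0
    # x precedes y in a; the pair is discordant if b puts x after y.
    discordant = sum(1 for x, y in pairs if pos_b[x] > pos_b[y])
    return discordant / len(pairs)
```

A distance of 0 means the two rankings agree on every shared pair, and 1 means one is the exact reverse of the other.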
Query-Sensitive Scoring
User Study
10 queries (randomly selected from our test set)
5 volunteers
For each query, the volunteer was shown 2 result rankings:
• 1. top 10 results ranked with the unbiased PageRank vector
• 2. top 10 results ranked with the topic-sensitive PageRank vector
Experimental Results
User Study (cont'd)
The volunteer was asked to
• 1. select all URLs which were “relevant” to the query
• 2. select the ranking list which is better
(They were not told anything about how either of the rankings was generated.)
Experimental Results
Context-Sensitive Scoring
Other issues
Search Context
• hierarchical directory
• users’ browsing patterns
• bookmarks
• email archives
Other issues
Flexibility: applies to any kind of context
Transparency: tune the classifier used on the search context, or adjust topic weights
Privacy: a client-side program could use the user context to generate the user profile locally
Efficiency: the query-time cost and the offline preprocessing cost are low
Automatic Identification of User Interest For Personalized Search
Feng Qiu Junghoo Cho
User Preference Representation
Topic Preference Vector
T = [T(1), …, T(m)]
T(i) represents the user’s degree of interest in the i-th topic
User Model
Topic-Driven Random Surfer Model
• The user browses the web in a two-step process.
• First, the user chooses a topic of interest t for the ensuing sequence of random walks with probability T(t).
• Then, with equal probability, she jumps to one of the pages on topic t.
• Starting from this page, the user then performs a random walk, such that at each step, with probability d, she randomly follows an out-link on the current page; with the remaining probability 1-d she gets bored, picks a new topic of interest for the next sequence of random walks based on T, and jumps to a page on the chosen topic.
• This process is repeated forever.
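The surfer process above can be simulated directly to estimate how often each page is visited. This is a rough sketch under stated assumptions: the function and parameter names are hypothetical, `d=0.85` is an assumed value, and a page without out-links is treated as forcing a jump to a new topic.

```python
import random
from collections import Counter

def simulate_topic_driven_surfer(out_links, topic_pages, T, d=0.85,
                                 steps=100_000, seed=0):
    """Estimate visit frequencies under the topic-driven random surfer model.

    out_links:   dict mapping a page to the list of pages it links to
    topic_pages: topic_pages[t] is the list of pages on topic t
    T:           topic preference vector (one weight per topic)
    d:           probability of following an out-link (assumed value)
    """
    rng = random.Random(seed)
    topics = list(range(len(T)))
    visits = Counter()

    def jump():
        # Pick a topic with probability T(t), then a page on it uniformly.
        t = rng.choices(topics, weights=T)[0]
        return rng.choice(topic_pages[t])

    page = jump()
    for _ in range(steps):
        visits[page] += 1
        links = out_links.get(page, [])
        if links and rng.random() < d:
            page = rng.choice(links)   # follow a random out-link
        else:
            page = jump()              # bored (or dead end): pick a new topic
    return {p: count / steps for p, count in visits.items()}

# Toy example: two topics over four pages (made-up link structure).
out_links = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
topic_pages = [["a", "b"], ["c", "d"]]
V = simulate_topic_driven_surfer(out_links, topic_pages, T=[0.8, 0.2])
```

The returned frequencies approximate the probability that the user visits each page, which is the quantity the later slides relate to T.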
User Model
Topic-Driven Searcher Model
• The user always visits web pages through a search engine in a two-step process.
• First, the user chooses a topic of interest t with probability T(t).
• Then the user goes to the search engine and issues a query on the chosen topic t.
• The search engine then returns pages ranked by TSPR_t(p), on which the user clicks.
User Model
Relationship between V and T
Under Topic-Driven Random Surfer Model
Under Topic-Driven Searcher Model
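For the random surfer model, the relationship can be worked out from the model description alone. The derivation below is a sketch under assumptions that are not stated on the slides: $M$ is the column-stochastic link matrix, $u_i$ is the uniform distribution over the pages on topic $i$, every page has at least one out-link, and $TSPR_i$ is computed with the same follow probability $d$. It covers only the random surfer case, not the searcher model.

```latex
\begin{align*}
TSPR_i &= d\,M\,TSPR_i + (1-d)\,u_i
  &&\Rightarrow\quad TSPR_i = (1-d)\,(I - dM)^{-1} u_i \\
V &= d\,M\,V + (1-d)\sum_{i=1}^{m} T(i)\,u_i
  &&\Rightarrow\quad V = (1-d)\,(I - dM)^{-1} \sum_{i=1}^{m} T(i)\,u_i \\
\text{hence}\quad V(p) &= \sum_{i=1}^{m} T(i)\,TSPR_i(p).
\end{align*}
```

In words: because the stationary distribution is linear in the jump vector, the visit probabilities are the T-weighted mixture of the per-topic vectors.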
Learning Topic Preference Vector
Problem
Given V and TSPR_i, find the T that satisfies the relationship between V and T under the chosen user model
Learning Topic Preference Vector
Linear regression
• Minimize the square-root error
Maximum likelihood estimator **
• V(p) = the probability that the user visits the page p
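One way to sketch the maximum likelihood idea is with expectation-maximization over mixture weights. This is a swapped-in technique for illustration, not necessarily the estimator used by the authors. It assumes the visit probability of a page is a T-weighted mixture of per-topic visit probabilities, and that every clicked page has non-zero probability under at least one topic. The function name and the `click_probs` input layout are hypothetical.

```python
import numpy as np

def estimate_topic_preferences(click_probs, iters=200):
    """Estimate T by maximizing the likelihood of the observed clicks (EM).

    click_probs[k, i]: probability of the k-th clicked page under topic i,
                       e.g. derived from TSPR_i (assumed input).
    """
    P = np.asarray(click_probs, dtype=float)
    n_clicks, m = P.shape
    T = np.full(m, 1.0 / m)                 # start from uniform interest
    for _ in range(iters):
        # E-step: how responsible each topic is for each click.
        weighted = P * T
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new preference = average responsibility per topic.
        T = resp.mean(axis=0)
    return T

# Toy click history: two clicks only explainable by topic 0, one by topic 1.
click_probs = np.array([[0.9, 0.0],
                        [0.8, 0.0],
                        [0.0, 0.7]])
T_est = estimate_topic_preferences(click_probs)
```

Each update keeps T non-negative and summing to 1, which an unconstrained linear regression would not guarantee by itself.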
Ranking Search Results Using Topic Preference Vectors
Ranking of page p =
because
Evaluation Metrics
Accuracy of topic preference vector
T_e is our estimate based on the user’s click history
T is the user’s actual topic preference vector
Evaluation Metrics
Accuracy of personalized ranking
Kendall distance between two lists:
• the sorted list of top-k pages based on the estimated personalized ranking scores
• the sorted list of top-k pages computed from the user’s true preference vector
Evaluation Metrics
Improvement in search quality
Average rank of relevant pages in the search result: $\frac{1}{|S|} \sum_{p \in S} R(p)$
S denotes the set of pages the user u selected
R(p) is the ranking of the page p
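The average-rank metric is short enough to sketch directly. The function name is hypothetical, ranks are assumed to start at 1, and selected pages that do not appear in the result list are skipped, which is a design choice of this sketch rather than something the slides specify.

```python
def average_rank(ranking, selected):
    """Mean 1-based position of the user's selected pages in a result list."""
    position = {page: i + 1 for i, page in enumerate(ranking)}
    ranks = [position[p] for p in selected if p in position]
    if not ranks:
        raise ValueError("none of the selected pages appear in the ranking")
    return sum(ranks) / len(ranks)
```

For example, `average_rank(["a", "b", "c", "d"], {"b", "d"})` averages positions 2 and 4. A lower value means the pages the user selected sit closer to the top of the list.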
Experiments
User Study
10 subjects in the UCLA Computer Science Department
04/2004 – 10/2004 (6 months)
Queries to Google, results and clicked URLs
• average number of queries per subject = 255.6
• average number of clicks per query = 0.91
Experiments
Accuracy of Learning Method
Synthetic dataset generated by simulation based on our topic-driven searcher model
Generation of topic preference vector
• Randomly choose K topics and assign random weights to them. The weights of the other topics are set to zero. The vector is then normalized.
Generation of click history
• Use the generated topic preference vector to generate the clicks according to the visit probability distribution dictated by the topic-driven searcher model
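The two generation steps can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and `topic_visit_probs` stands in for whatever per-topic visit distribution the searcher model dictates, which is taken here as a given input.

```python
import numpy as np

def random_topic_preference(m, K, rng):
    """Pick K of m topics, give them random weights, zero the rest, normalize."""
    T = np.zeros(m)
    chosen = rng.choice(m, size=K, replace=False)
    T[chosen] = rng.random(K)
    return T / T.sum()

def generate_clicks(T, topic_visit_probs, n_clicks, rng):
    """Sample clicked page indices from the T-weighted mixture of
    per-topic visit distributions (rows of topic_visit_probs)."""
    D = np.asarray(topic_visit_probs, dtype=float)
    D = D / D.sum(axis=1, keepdims=True)    # make each row a distribution
    visit = T @ D                            # overall visit probability per page
    return rng.choice(D.shape[1], size=n_clicks, p=visit)

rng = np.random.default_rng(0)
T_true = random_topic_preference(m=16, K=3, rng=rng)
# Made-up per-topic visit distributions over 50 pages.
topic_visit_probs = rng.random((16, 50))
clicks = generate_clicks(T_true, topic_visit_probs, n_clicks=200, rng=rng)
```

Since the true vector is known for synthetic data, the estimate learned from `clicks` can be compared against `T_true` directly.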
Experiments
Accuracy of estimated topic preference vector
Experiments
Accuracy of Personalized PageRank
Experiments
Quality of Personalized Search
Conclusion
Proposed a framework for personalizing web search using the user's search history and TSPR
Conducted both theoretical and real-life experiments to evaluate the approach
Thank you