optimizing search engines using clickthrough data
DESCRIPTION
ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ ΣΧΟΛΗ ΗΛΕΚΤΡΟΛΟΓΩΝ ΜΗΧΑΝΙΚΩΝ ΚΑΙ ΜΗΧΑΝΙΚΩΝ ΥΠΟΛΟΓΙΣΤΩΝ ΤΟΜΕΑΣ ΤΕΧΝΟΛΟΓΙΑΣ ΠΛΗΡΟΦΟΡΙΚΗΣ ΚΑΙ ΥΠΟΛΟΓΙΣΤΩΝ. Optimizing search engines using clickthrough data. Problem. Optimization of web search results ranking Ranking algorithms mainly based on similarity - PowerPoint PPT PresentationTRANSCRIPT
Optimizing search engines using clickthrough data
ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟΣΧΟΛΗ ΗΛΕΚΤΡΟΛΟΓΩΝ ΜΗΧΑΝΙΚΩΝΚΑΙ ΜΗΧΑΝΙΚΩΝ ΥΠΟΛΟΓΙΣΤΩΝΤΟΜΕΑΣ ΤΕΧΝΟΛΟΓΙΑΣ ΠΛΗΡΟΦΟΡΙΚΗΣ ΚΑΙ ΥΠΟΛΟΓΙΣΤΩΝ
2
Problem Optimization of web search results ranking
Ranking algorithms mainly based on similarity Similarity between query keywords and page text keywords Similarity between pages (Page Rank)
No consideration for user personal preferences
Digital librariesHigh rank
VacationLow rank
3
Problem E.g.: query “delos”
Digital libraries Archaeology Vacation
Room for improvement by incorporating user behaviour data: user feedback Use of previous implicit search information to enhance result ranking
VacationLow rank
Digital librariesHigh rank
4
Types of user feedbackExplicit feedback
User explicitly judges relevance of results with the query
Costly in time and resources Costly for user limited effectiveness Direct evaluation of results
Implicit feedback Extracted from log files
Large amount of information Indirect evaluation of results through click behaviour
5
Implicit feedback (Categories) Clicked results
Absolute relevance: Clicked result -> Relevant
Risky: poor user behaviour quality Percentage of result clicks for a query Frequently clicked groups of results Links followed from result page
Relative relevance: Clicked result -> More relevant than non-clicked
More reliable Time
Between clicks E.g., fast clicks maybe bad results
Spent on a result page E.g., a lot of time relevant page
First click, scroll E.g., maybe confusing results
6
Implicit feedback (Categories) Query chains: sequences of reformulated queries to
improve results of initial search Detection:
Query similarity Result sets similarity Time
Connection between results of different queries: Enhancement of a bad query with another from the query chain Comparison of result relevance between different query results
Scrolling Time (quality of results) Scrolling behaviour (quality of results) Percentage of page (results viewed)
Other features Save, print, copy etc of result page maybe relevant Exit type (e.g. close window poor results)
7
Joachims approach Clickthrough data
Relative relevance Indicated by user behaviour study
Method Training of svm functions Training input: inequations of query result rankings Trained function: weight vector for features examined Use of trained vector to give weights to examined
features Experiments
Comparison of method with existing search engines
8
Clickthrough data Form
Triplets (q,r,c) = (query,ranked results,links clicked) …in the log file of a proxy server
Relative relevance (regarding a specific query) dk more relevant than dl if
dk clicked and dl not clicked and dk lower initial ranking than dl
d5 more relevant than d4d5 more relevant than d2
d3 more relevant than d2
9
Clickthrough data (Studies) Why not absolute relevance?
User behaviour influenced by initial ranking
Study: Rank and viewership Percentage of queries where a user
viewed the search result presented at a particular rank
Conclusion: Most times users view only the few first results
Study: Rank and viewership Percentage of queries where a user
clicked the result presented at a particular rank, both in normal and swapped conditions
Conclusion: Users tend to click on higher ranked results irrespective of content
10
Method (System train input) Data extracted from log
Relevance inequalities: dk <rq dl dk more relevant than dl for query q
Construct relevance inequalites for every query q and every result pair (dk , dl) for q
For each link di (result of query q): construct a feature vector Φ(q,di)
d5 more relevant than d4d5 more relevant than d2
d3 more relevant than d2
11
Method (System train input) Feature vector:
Describes the quality of the match between a document di and a query q
Φ(q,di) = [rank_X, top1_X, top10_X, …..]
12
Method (System train input) Weight vector:
w = [w1, w2,…,wn] Assigns weights to each of the features in Φ(q,di) Sdi = w * Φ(q,di) assigns a score to document di for
query q w: unknown must be trained by solving the system of
relative relevance inequalities derived from clickthrough data
13
Method (System training) Initial input transformation:
dk <rm dl => Sdk >rm Sdl => w * Φ(qm,dk) > w * Φ(qm,dl) => [w1, w2,…,wn] Φ(qm,dk) > [w1, w2,…,wn] Φ(qm,dl) System of relevance inequalites for every query q and
every result pair (dk , dl) for q Object of training:
Knowing, for every clicked link di , its feature vector, Find an optimal weight vector w which satisfies most of
the inequalities above (optimal solution) With vector w known, we can calculate a score for every
link, whether clicked or not, hence a new ranking
d5 more relevant than d4d5 more relevant than d2
d3 more relevant than d2
14
Method (System training) Intuitively
Example: 2 dimension feature vector X-axis: similarity between query terms and link text Y-axis: time spent on link page
Looking for optimal weight vector w If training input: r1 < r2 < r3 < r4
Optimal solution w1: text similarity more important for ranking If training input: r2 < r3 < r1 < r4
Optimal solution w2: time more important for ranking w optimized by maximizing δ
Link 1
Link 2Link 3
Link 4
15
Experiments Based on a meta-search engine
Combination of results from different search engines
“Fair” presentation of results: For a meta-meta-search engine combining 2 engines: Top z results of meta-search contain x results from engine A y results from engine b x + y = z |x – y| <= 1
16
Experiments Meta-search
engine Assumptions
Users click more often on more relevant links
Preference for one of the search engines not influenced by ranking
17
Experiments (Results) Conditions
20 users (university students) 260 training queries collected 20 days of training 10 days of evaluation
Comparison with Google MSNSearch Toprank (meta-search for Google, MSNSearch, Altavista, Excite,
Hotbot) Results
18
Experiments (Results) Weight vector
19
Open issues Trade-off between amount of training data and
homogeneity Clustering algorithms to find homogenous groups
of users Adaptation to the properties of a particular
document collection Incremental online learning/feedback algorithm Protection from spamming
20
Joachims approach II Clickthrough data
Addition of new relative relevance criterionsAddition of query chains
Method Modified feature vector
Rank features Term/document features
Constraints to the svm optimization problem To avoid trivial/incorrect solutions
21
Query chains Sequences of reformulated queries
Poor results for first query Too many/few terms Not representative terms
q1: “special collections” q2: “rare books” Incorrect spelling
q1: “Lexis Nexus” q2: “Lexis Nexis” Execution of new query
Addition/deletion of query terms to get better results In every query of the sequence, searching for the same thing
Ways of detection Term similarity between queries Percentage of common results Time between queries
22
Query chains detection method 1st strategy
Data recorded from log for 1285 queries: query, date, IP address, results returned, number of
clicks on the results, session id uniquely assigned Manual grouping of queries in query chains Division of data set into training and testing set Training
Feature vector for every pair of query Training of a weight vector indicating which features
are more important for a query pair to be a query chain
Testing Recognition of query chains in testing set Comparison with manual grouping 94.3% accuracy
2nd strategy Assumption: Query chain if:
Queries from the same IP Time between 2 queries < 30 min 91.6% accuracy
Adoption of (simpler) 2nd strategy
23
Clickthrough data (studies) Conclusion: Most
times users view the first 2 results
Conclusion: Most times users view the next result after the last clicked
24
Relevance criteria (query) Click >q Skip Above
dk more relevant than dl if dk clicked and dl not clicked and dk lower initial ranking than dl
Click First >q No-Click Second d1 more relevant than d2 if
d1 clicked d2 not clicked
d5 more relevant than d4d5 more relevant than d2d3 more relevant than d2
d1 more relevant than d2
25
Relevance criteria (query chain) Click >q1 Skip Above
dk more relevant than dl if dk clicked and dl not clicked and dk lower initial ranking than dl
Click First >q1 No-Click Second d1 more relevant than d2 if
d1 clicked d2 not clicked
FOR QUERY 1 (q’)d8 more relevant than d7d8 more relevant than d6
FOR QUERY 1 (q’)d5 more relevant than d6
QUERY 1 QUERY 2
26
Relevance criteria (query chain II)
Click >q1 Skip Earlier Query
Click First >q1 Top 2 Earlier Query
FOR QUERY 1 (q’)d6 more relevant than d1d6 more relevant than d2d6 more relevant than d4
FOR QUERY 1 (q’)d6 more relevant than d1d6 more relevant than d2
QUERY 1 QUERY 2
27
Relevance criteria (Experiment) Sequences of reformulated queries
Search on 16 subjectsRelevance inequalities produced by previous
criteriaComparison with explicit relevance judgements
on query chains
28
Method (SVM training) Feature vector consists of
Rank features φfirank(d, q)
Term/document features φterms(d, q) Rank features
One φfirank(d, q) for every retrieval
function fi examined Every φfi
rank(d, q) consists of 28 features
For ranks 1,2,3,…10,15,20,…100 in the initial ranking
If rank of d in initial ranking ≤ 1,2,3,…10,15,20,…100 set 1 else set 0
Allow us to learn weights for and combine different retrieval functions
29
Method (SVM training) Term/document features
φterms(d, q) consists of N*M features N number of terms M number of documents
Set 1 if di = d and tj ε q Trains weight vector to learn
assosiations between terms and documents
30
Method (Constraints) Problem
Most relevance criteria suggest that a lower ranked link (in the initial ranking) is better than a higher ranked link
Trivial solution: reverse the initial ranking by assigning negative values to weight vector
Solution Each w*φfi
rank(d, q) ≥ a minimun positive value
31
Experiments Based on a meta-search engine
Initial retrieval function from Nutch Training
9949 queries 7429 clicks Pqc: Total of 120134 relative relevance inequalities Pnc: A set of Pqc generated without use of query chains
Results: (rel0 initial ranking)
32
Experiments Results
Trained weights for term/document features
33
Open issues Tolerance of noise and malicious clicks Different weights in each query of a query chain Personalized ranking functions Improvement of performance
34
Related work (Fox et al. 2005) Evaluation of implicit measures Use of explicit ratings of user satisfaction Method
Bayesian modeling and decision trees Gene analysis
Results: Importance of combination of Clickthrough Time Exit type
Features used:
35
Related work (Agichtein, Brill, Dumais 2006) Incorporation of implicit
feedback features directly into the initial ranking algorithm BM25F RankNet
Features used:
36
Related work Query/links clustering based on features extracted
from implicit feedback Score based on interrelations between queries and
web pages
37
Possible extensions Utilization of time feedback
As relative relevance criterion As a feature
Other types of feedback Scrolling Exit type
Combination with absolute relevance clickthrough feedback Percentage of result clicks for a query Links followed from result page
Query chains Improvement of detection method
Link association rules For frequently clicked groups of results
Query/links clustering Constant training of ranking functions
38
References Search Engines that Learn from Implicit Feedback
Joachims, Radlinski Optimizing Search Engines Using Clickthrough Data
Joachims Query Chains: Learning to Rank from Implicit Feedback
Radlinski, Joachims Evaluating implicit measures to improve web search
Fox et al. Implicit Feedback for Inferring User Preference A Bibliography
Kelly, Teevan
Improving Web Search Ranking by Incorporating User Behavior Agichtein, Brill, Dumais
Optimizing web search using web click-through data Xue, Zeng, Chen, Yu, Ma, Xi, Fan
39
Clickthrough data (Studies) Why not absolute relevance?
Study: Rank and viewership Percentage of queries where a user clicked the result presented
at a particular rank, both in normal and swapped conditions
Conclusion: Users tend to click on higher ranked results irrespective of content
40
Types of user feedbackExplicit feedback
User explicitly judges relevance of results with the query
Costly in time and resources Costly for user limited effectiveness Direct evaluation of results
Implicit feedback Extracted from log files
Large amount of information Real user behaviour (not expert judgement) Indirect evaluation of results through click behaviour