matching task profiles and user needs in personalized web search

1

Matching Task Profiles and User Needs in Personalized Web Search

Julia Luxenburger, Shady Elbassuoni, Gerhard Weikum

CIKM’08

Advisor: Chia-Hui Chang

Student: Teng-Kai Fan

Date: 2009-10-13

2

Outline

Introduction
Model and Algorithms
  Architecture
  Personalization Framework
Experiments
Conclusion and Future Work

3

Introduction

Personalization provides a better search experience to individual users by taking the user's goals, tasks, and contexts into account.

The paper introduces a language model for user tasks that represents the user profile.

The personalization framework selectively matches the actual user information need with relevant past user tasks.

4

Architecture

Search personalization runs on the client side, through a proxy that runs locally and can intercept all HTTP traffic.

Result re-ranking: whenever a user action allows the query representation to be updated, the unseen results are re-ranked.

Query expansion: for some queries, the system may rewrite the query sent to the search engine.

Merging of personalized and original results: personalized result ranks and original web ranks are aggregated to form the final result ranking. Combination method: Dwork et al., “Rank aggregation methods for the web,” WWW’01.
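The slides cite Dwork et al. for the combination method without saying which aggregation method is used. As a minimal sketch, assuming a plain Borda count (chosen here only for illustration) and hypothetical result IDs, merging the two rankings could look like this:

```python
from collections import defaultdict

def borda_merge(rankings):
    """Merge several ranked lists of result IDs with a Borda count.

    Each list awards an item (n - position) points, where n is the size of
    the union of all items; an item missing from a list gets 0 points from it.
    """
    universe = {item for ranking in rankings for item in ranking}
    n = len(universe)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Sort by descending score; break ties by item ID for determinism.
    return sorted(universe, key=lambda item: (-scores[item], item))

# Hypothetical example data, not taken from the paper.
web_rank = ["a", "b", "c", "d"]
personalized_rank = ["c", "a", "d", "b"]
merged = borda_merge([web_rank, personalized_rank])
print(merged)  # ['a', 'c', 'b', 'd']
```

A Borda count only needs the rank positions, so it works even when the search engine exposes no relevance scores.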

5

Personalization Framework

User profile: query chains (subsequently posed queries), result sets, clicked result pages, and the whole clickstream of subsequently visited web pages.

Search session: determined by the user's timing as well as the relatedness of subsequent user actions. Actions are (1) queries, (2) result clicks, and (3) other page visits.

Task: the user’s past search and browse behavior. Tasks are obtained by hierarchical clustering of the user’s profile.

Facet: query facets are obtained by hierarchical clustering of the query’s result set, with each result represented by its title and snippet.

7

Task: the user’s past search and browse behavior, obtained by hierarchical clustering of the user’s profile.

Session: determined by the user's timing as well as the relatedness of subsequent user actions.

Profile: query chains (subsequently posed queries), result sets, clicked result pages, and the clickstream of subsequently visited web pages.

8

Selective Personalization Strategy

Case I: the current query is the first query in the current session. The top-k tasks T1,…,Tk most similar to the query are retrieved from the user’s profile.

Case II: some query history already exists, and the current query is a refinement of a previously issued query in the same session. The tasks from the user profile are then accompanied by a current task, made up of all the actions of the currently active session and represented by the language model Tk+1.

9

Selective Personalization Strategy cont.

The Kullback-Leibler (KL) divergence is considered between a query facet Fi and a task Tj.

The KL divergence characterizes the strength of their similarity: the smaller the divergence, the more similar the facet and the task.

If KL(F∗i, T∗j) is larger than a threshold σ, the current query is taken to address a previously unexplored task, and the search results are not biased.

Otherwise, the query sent to Google may be reformulated, or the original search results may be re-ranked.
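The threshold test can be illustrated with a small sketch. It assumes that the chosen pair (F∗i, T∗j) is the facet-task pair with the smallest KL divergence, that language models are plain term-to-probability dictionaries, and that unseen terms are floored at a small epsilon. The function names and example data are hypothetical.

```python
import math

def kl_divergence(p, q, epsilon=1e-9):
    """KL(p || q) over the vocabulary of p; q is floored at epsilon."""
    return sum(pw * math.log(pw / max(q.get(w, 0.0), epsilon))
               for w, pw in p.items() if pw > 0)

def select_pair(facets, tasks, sigma):
    """Return the (facet, task) names with the smallest KL divergence,
    or None when even the best pair exceeds the threshold sigma."""
    divergence, facet_name, task_name = min(
        (kl_divergence(f, t), fn, tn)
        for fn, f in facets.items()
        for tn, t in tasks.items())
    if divergence > sigma:
        return None  # previously unexplored task: do not bias the results
    return facet_name, task_name

# Hypothetical facets of the query "jaguar" and one past task.
facets = {
    "jaguar_car": {"jaguar": 0.5, "car": 0.5},
    "jaguar_cat": {"jaguar": 0.5, "cat": 0.5},
}
tasks = {"buy_car": {"jaguar": 0.4, "car": 0.5, "dealer": 0.1}}

print(select_pair(facets, tasks, sigma=1.0))   # ('jaguar_car', 'buy_car')
print(select_pair(facets, tasks, sigma=0.05))  # None
```

In Case II the language model of the current session would simply be added to `tasks` as one more entry.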

10

Means of Personalization

The query representation is updated with the terms that best discriminate the query facet F∗i from all other query facets while being most similar to the task T∗j.

That is, the terms that have the largest impact on the KL divergence between the union of the chosen facet-task pair and the remaining query facets.

11

Means of Personalization cont.

A threshold δ allows for an automatic reformulation of the query sent to Google.

Terms with v(w) < δ qualify for query expansion.

Terms with δ < v(w) < τ and P(w|∪iFi) > 0 qualify for re-ranking the original top-50 search results.
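The two thresholds split the candidate terms into two groups. The following sketch takes the scores v(w) as given, since the slides do not spell out how v(w) is computed; the function name, the threshold values, and the example scores are made up for illustration.

```python
def classify_terms(scores, facet_union_prob, delta, tau):
    """Split candidate terms by their score v(w).

    scores: term -> v(w)
    facet_union_prob: term -> P(w | union of all query facets)
    Assumes delta < tau, as implied by the condition delta < v(w) < tau.
    """
    expansion, rerank = [], []
    for term, v in scores.items():
        if v < delta:
            # Strong enough to rewrite the query sent to the search engine.
            expansion.append(term)
        elif delta < v < tau and facet_union_prob.get(term, 0.0) > 0:
            # Only used to re-rank the results that were already retrieved.
            rerank.append(term)
    return expansion, rerank

# Hypothetical scores and probabilities.
scores = {"dealer": -0.9, "price": -0.4, "zoo": 0.3, "engine": -0.3}
facet_union_prob = {"dealer": 0.1, "price": 0.05, "engine": 0.0}

expansion, rerank = classify_terms(scores, facet_union_prob,
                                   delta=-0.5, tau=0.0)
print(expansion)  # ['dealer']
print(rerank)     # ['price']
```

Requiring P(w|∪iFi) > 0 for re-ranking terms means that a term has to occur in the result snippets before it can influence the order of the results.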

12

Task Language Model

The language model of a user task is a weighted mixture of its components: queries, result clicks, clickstream documents, and query-independent browsed documents.

Thus, the task language model T combines Q and B, where Q is a uniform mixture of the task’s query chains and B is the average of the individual browsed documents’ language models.
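As a rough sketch of this mixture, assuming unigram models stored as term-to-probability dictionaries and a single mixing weight between Q and B (the name `query_weight` and its default value are assumptions; the slides do not give the weights):

```python
def mix(models, weights):
    """Weighted mixture of unigram language models (dicts term -> prob).

    The weights are normalized so that the result sums to 1 whenever
    every input model does.
    """
    total = sum(weights)
    mixed = {}
    for model, weight in zip(models, weights):
        for term, prob in model.items():
            mixed[term] = mixed.get(term, 0.0) + (weight / total) * prob
    return mixed

def task_model(query_chain_models, browsed_doc_models, query_weight=0.5):
    # Q: uniform mixture of the task's query-chain models.
    q = mix(query_chain_models, [1.0] * len(query_chain_models))
    # B: average of the individual browsed documents' models.
    b = mix(browsed_doc_models, [1.0] * len(browsed_doc_models))
    # T: weighted mixture of Q and B; query_weight is an assumed value.
    return mix([q, b], [query_weight, 1.0 - query_weight])

# Hypothetical toy models.
chains = [{"jaguar": 1.0}, {"dealer": 1.0}]
docs = [{"jaguar": 0.5, "engine": 0.5}]
T = task_model(chains, docs, query_weight=0.5)
print(T)  # {'jaguar': 0.5, 'dealer': 0.25, 'engine': 0.25}
```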

13

Query language model

Let QC denote a query chain Q1, Q2,…,Qk.

The query language model is the average of all query chains’ models.

14

Query language model cont.

The mixture model combines the following components:

q: the query string.
CR: the set of clicked result items.
NR: non-clicked result items ranked above a clicked one.
UR: unseen results ranked below the lowest-ranked clicked item.
CS: the set of clickstream documents beyond the result documents.

15

Query language model cont.

All constituent language models employ Dirichlet prior smoothing with μ = 2000, where c(w,.) denotes the frequency of word w in (.).
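For reference, Dirichlet prior smoothing in its standard form estimates P(w|D) = (c(w,D) + μ·P(w|C)) / (|D| + μ), where P(w|C) is a background model. A minimal sketch follows; which background collection the authors use is not stated in the slides, so `collection_model` is a hypothetical input here.

```python
from collections import Counter

def dirichlet_smoothed(tokens, collection_model, mu=2000):
    """Standard Dirichlet prior smoothing over the background vocabulary.

    P(w|D) = (c(w, D) + mu * P(w|C)) / (|D| + mu)
    Terms that are absent from collection_model are ignored in this sketch.
    """
    counts = Counter(tokens)
    length = len(tokens)
    return {w: (counts.get(w, 0) + mu * p_c) / (length + mu)
            for w, p_c in collection_model.items()}

# Hypothetical background model and a two-token "document".
collection_model = {"jaguar": 0.5, "car": 0.5}
model = dirichlet_smoothed(["jaguar", "jaguar"], collection_model, mu=2)
print(model)  # {'jaguar': 0.75, 'car': 0.25}
```

With μ = 2000, short texts such as queries and snippets stay close to the background model, and only longer documents move the estimate noticeably.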

16

Facet Language Model

The facet language model F is the uniform mixture of the language models of the result snippets s ∈ F.
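A small sketch of such a uniform mixture, assuming each snippet is already tokenized and modeled by unsmoothed relative frequencies (the slides do not say how the snippet models are estimated, so this is a simplification):

```python
from collections import Counter

def snippet_model(tokens):
    """Maximum-likelihood unigram model of one snippet."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}

def facet_model(snippets):
    """Uniform mixture: every snippet contributes with weight 1/|F|."""
    model = {}
    for tokens in snippets:
        for w, p in snippet_model(tokens).items():
            model[w] = model.get(w, 0.0) + p / len(snippets)
    return model

# Hypothetical facet consisting of two tokenized snippets.
facet = facet_model([["jaguar", "car"], ["jaguar", "dealer"]])
print(facet)  # {'jaguar': 0.5, 'car': 0.25, 'dealer': 0.25}
```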

17

Experimental Setup

Seven volunteers installed the proxy to log their search and browsing activities for a period of 2 months.

18

Experimental Setup cont.

Each participant evaluated 8 self-chosen search tasks.

A search task is a sequence of queries, clicks, and browsing actions that continues until the user’s information need is satisfied.

For each task, the participant was presented with the top-50 Google results.

Then the participant was asked to mark each result as highly relevant, relevant or completely irrelevant.

Furthermore, we asked users to group the top-10 results of each query by giving labels to them.

The evaluation covered 59 search tasks and 89 individual evaluation queries.

19

Experimental Setup cont.

Measure: Discounted Cumulative Gain (DCG)

i: the rank of the result within the result set.
G(i): the relevance level of the result.

G(i) = 2 for highly relevant documents.
G(i) = 1 for relevant documents.
G(i) = 0 for non-relevant documents.
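The slides do not show which DCG variant is used. As a sketch, assuming the common recursive definition DCG(1) = G(1) and DCG(i) = DCG(i-1) + G(i)/log2(i), the measure can be computed from the relevance levels above as follows:

```python
import math

def dcg(gains):
    """Return the DCG value at every rank for a list of gains G(1..n).

    Assumed variant: DCG(1) = G(1); DCG(i) = DCG(i-1) + G(i) / log2(i).
    """
    values, total = [], 0.0
    for i, gain in enumerate(gains, start=1):
        total += gain if i == 1 else gain / math.log2(i)
        values.append(total)
    return values

# Hypothetical judgments for the top-4 results:
# highly relevant, relevant, non-relevant, highly relevant.
print(dcg([2, 1, 0, 2]))  # [2.0, 3.0, 3.0, 4.0]
```

The logarithmic discount means that a highly relevant result at rank 4 contributes only half as much as the same result at rank 1 or 2.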

Parameters for the task query language model

20

Evaluation results with re-ranking

Fixed: the same fixed number of expansion terms.

Flexible: the optimal threshold τ.

Enforced (no tasks) is unaware of both query facets and tasks.

Enforced (no facets) distinguishes between history tasks but always treats a query in its entirety.

21

Query Expansion Evaluation

v(w) < τ

v(w) < δ

22

Correlating KL-divergence with performance gains

Goal: to determine whether a query benefits from personalization.

A negative correlation indicates that queries with more relevant information in the local index benefit more from personalization.

23

Parameterizing the personalization framework

24

Efficiency

Tasks are computed offline. Incremental clustering is used to fold in each new session.

Hammouda et al., “Incremental document clustering using cluster similarity histograms,” WI’07.

25

Conclusion

The authors proposed a thorough language model that addresses user tasks and matches the user's needs with past user tasks.

The model considers previously viewed documents and past queries.

The proposed method achieved significant gains over both the Google ranking and traditional personalization approaches.
