Personalized Search Result Diversification via Structured Learning SHANGSONG LIANG, ZHAOCHUN REN, MAARTEN DE RIJKE UNIVERSITY OF AMSTERDAM PRESENTED BY YU HU



Tackling Ambiguous Queries

Personalization approach:

◦ Tailor the results to the specific interests of the user

◦ Weakness: the user profile may be inaccurate

◦ Weakness: the query may be unrelated to the personalized information

Diversification approach:

◦ Maximize the probability of showing an interpretation relevant to the user

◦ Weakness: outliers

[Figure: the ambiguous query "Queen" handled two ways. Diversification produces a diversified result list covering the query's interpretations; personalization consults the user profile to produce a personalized ordering.]

Overview of PSVMdiv

Given a user and a query, predict a diverse set of documents:

◦ Formulate a discriminant based on maximizing search result diversification

◦ Perform training using the structured support vector machine framework

◦ Use a User Interest Topic model (an LDA-style topic model):

◦ Infer a per-document, per-user multinomial distribution over topics and determine whether a document can cater to a specific user

◦ During training, use features extracted from three sources

The Learning Problem

• u: the set of documents user u is interested in

• x: a set of candidate documents

• y: a subset of x, the predicted documents

• Given a user and a set of documents, select a subset of documents that maximizes search result diversification for the user

• A loss function Δ(y, ȳ) quantifies the quality of a prediction ȳ against the ground truth y

The Learning Problem

• Learn a hypothesis function h to predict a y given x and u

• Labeled training data is assumed to be available

• Find a function h such that the empirical risk is minimized

• Let a discriminant F compute how well a predicted y fits x and u

• The hypothesis predicts the y that maximizes F: h(x, u; w) = argmax_y F(x, u, y; w)

• Each (x, u, y) is described through a feature vector Φ(x, u, y)

• The discriminant function is assumed to be linear in the feature space: F(x, u, y; w) = wᵀΦ(x, u, y)
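The linear discriminant and the argmax hypothesis can be sketched as follows. The feature map, candidate documents, topic labels, and weights here are toy assumptions for illustration, not the paper's actual features; the paper also predicts the subset more efficiently than this brute-force enumeration.

```python
import itertools

# Toy data: 4 candidate documents, each tagged with one "topic".
topic = {"d1": "band", "d2": "band", "d3": "monarch", "d4": "chess"}

def phi(x, u, y):
    """Hypothetical feature map Phi(x, u, y) -> fixed-length vector.
    Feature 1: how many selected docs overlap the user's interest set u.
    Feature 2: how many distinct topics the selection y covers."""
    overlap = sum(1 for d in y if d in u)
    topics = len({topic[d] for d in y})
    return [overlap, topics]

def F(w, x, u, y):
    """Linear discriminant: F(x, u, y; w) = w . phi(x, u, y)."""
    return sum(wi * fi for wi, fi in zip(w, phi(x, u, y)))

def h(w, x, u, k):
    """Hypothesis: predict the size-k subset y of x that maximizes F."""
    return max(itertools.combinations(x, k), key=lambda y: F(w, x, u, y))

x = ["d1", "d2", "d3", "d4"]
u = ["d1"]              # the user has shown interest in d1
w = [1.0, 1.0]          # weights; in the paper these are learned

print(h(w, x, u, 2))    # → ('d1', 'd3')
```

The chosen pair both matches the user's interests (d1) and covers two distinct topics, which is exactly the personalization-plus-diversity trade-off the discriminant is meant to score.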

Standard SVMs and Additional Constraints

• Start from the optimization problem for standard structured SVMs

• Additional constraints for diversity

• Additional constraints for consistency with the user's interests
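The transcript omits the slide's equations. As a reconstruction under the assumption that PSVMdiv builds on the standard margin-rescaling structured SVM (with Δ the loss function and Φ the joint feature map introduced earlier), the base optimization problem reads:

```latex
\min_{\mathbf{w},\,\boldsymbol{\xi}\ge 0}\;
  \frac{1}{2}\|\mathbf{w}\|^{2} + \frac{C}{n}\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
\mathbf{w}^{\top}\Phi(x_i,u_i,y_i)\;\ge\;
\mathbf{w}^{\top}\Phi(x_i,u_i,\bar{y}) + \Delta(y_i,\bar{y}) - \xi_i
\qquad \forall i,\ \forall \bar{y}\neq y_i.
```

Each constraint requires the ground-truth subset y_i to score higher than any alternative ȳ by a margin that grows with the loss Δ; the diversity and user-interest constraints on the slide are added on top of this quadratic program.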

User Interest Topic Model

• Captures per-user and per-document distributions over topics

• Builds on Latent Dirichlet Allocation:

◦ α is the Dirichlet prior on the per-document topic distributions

◦ β is the Dirichlet prior on the per-topic word distributions

◦ θ_i is the topic distribution for document i

◦ ϕ_k is the word distribution for topic k

◦ z_ij is the topic for the j-th word in document i

◦ w_ij is the specific word
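The LDA generative process behind these symbols can be sketched with the standard library alone; the vocabulary, topic count, and hyperparameter values below are toy assumptions.

```python
import random

random.seed(0)

vocab = ["queen", "band", "royal", "chess"]
K = 2          # number of topics
alpha = 0.5    # Dirichlet prior on per-document topic distributions
beta = 0.5     # Dirichlet prior on per-topic word distributions

def sample_dirichlet(prior, dim):
    """Draw from a symmetric Dirichlet via normalized Gamma samples."""
    g = [random.gammavariate(prior, 1.0) for _ in range(dim)]
    s = sum(g)
    return [v / s for v in g]

# phi_k: word distribution for topic k, drawn once per topic
phi = [sample_dirichlet(beta, len(vocab)) for _ in range(K)]

def generate_document(n_words):
    """Generate one document: draw theta_i, then for each word
    draw its topic z_ij ~ Mult(theta_i) and the word w_ij ~ Mult(phi[z])."""
    theta = sample_dirichlet(alpha, K)
    words = []
    for _ in range(n_words):
        z = random.choices(range(K), weights=theta)[0]
        words.append(random.choices(vocab, weights=phi[z])[0])
    return words

doc = generate_document(6)
print(doc)
```

Inference runs this process in reverse: given the observed words w_ij, estimate θ and ϕ. The User Interest Topic model extends this by additionally inferring a per-user topic distribution.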

Feature Space

Three types of features:

◦ Features extracted directly from the tokens' statistical information in the documents: similarity scores between a document d ∈ y and the set of documents u the user is interested in; cosine, Euclidean, and KL-divergence metrics are considered

◦ Features generated by the proposed user-interest LDA-style topic model: similarity scores between a document's multinomial distribution over topics and the user's multinomial distribution over topics, as generated by the User Interest Topic model; again, cosine, Euclidean, and KL-divergence metrics are considered

◦ Features used by unsupervised personalized diversification algorithms: the main probabilities used in state-of-the-art unsupervised personalized diversification methods, such as p(d|q), the probability that d is relevant to q, and p(c|d), the probability that d belongs to a category c
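The three similarity metrics named above can be sketched as follows; the toy topic distributions and the epsilon smoothing constant are illustrative assumptions.

```python
import math

def cosine(p, q):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kl_divergence(p, q, eps=1e-10):
    """KL divergence D(p || q) between two probability distributions;
    a small epsilon guards against zero probabilities."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))

# Toy topic distributions: one document vs. one user profile.
doc_topics  = [0.7, 0.2, 0.1]
user_topics = [0.6, 0.3, 0.1]

features = [cosine(doc_topics, user_topics),
            euclidean(doc_topics, user_topics),
            kl_divergence(doc_topics, user_topics)]
print(features)
```

Each metric yields one feature per (document, user) pair; stacking them over both token-level vectors and topic-level distributions gives the first two feature groups.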

Dataset

A publicly available personalized diversification dataset:

◦ Contains private evaluation information from 35 users on 180 search queries

◦ Ambiguous queries, each no longer than two keywords

◦ 751 subtopics for the queries, with most queries having more than 2 subtopics

◦ Over 3,800 relevance judgments, covering at least the top 5 results for each query

◦ Each relevance judgment includes 3 main assessments:

◦ 4-grade assessment of how relevant the result is to the user's interests (user relevance)

◦ 4-grade assessment of how relevant the result is to the evaluated query (topic relevance)

◦ 2-grade assessment of whether a subtopic is related to the evaluated query

Baselines

PSVMdiv is compared to 11 baselines:

◦ Traditional retrieval: BM25

◦ Plain diversification: IA-select, xQuAD

◦ Plain personalization: PersBM25

◦ Two-step (first diversification, then personalization): xQuADBM25

◦ Personalized diversification: PIA-select, PIA-selectBM25, PxQuAD, PxQuADBM25

◦ Supervised diversification: SVMdiv, SVMrank

Results & Analysis: Supervised vs. Unsupervised

Results & Analysis: Effect of the UIT Model

Results & Analysis: Effect of the Constraints

Query-Level Analysis

Conclusion

Pro:

◦ The User Interest Topic Model

Con:

◦ Evaluated on a single, small dataset

Thank you!

Questions?