Query Suggestion Using Hitting Time
Qiaozhu Mei †, Dengyong Zhou ‡, Kenneth Church ‡
† University of Illinois at Urbana-Champaign
‡ Microsoft Research, Redmond
Motivating Examples
MSG
1. Difficult for a user to express an information need
2. Difficult for a search engine to infer an information need
Query suggestions: express the information need accurately; make the information need easy to infer
Sports center
Food additive
Motivating Examples (Cont.)
Welcome to the hotel california
Suggestions
hotel california
eagles hotel california
hotel california band
hotel california by the eagles
hotel california song
lyrics of hotel california
listen hotel california eagle
Motivating Examples: Personalization
Mountain safety research
Metropolis Street Racer
Molten salt reactor
Mars Sample Return
Magnetic Stripe Reader
…
MSR
Actually Looking for Microsoft Research…
Research Questions
• How can we generate query suggestions in a principled way?
• Can we generate personalized query suggestions using the same method?
• Can this method be generalized to other search related tasks?
Rest of This Talk
• Random Walk, Hitting Time, and Bipartite Graph
• Generating Query Suggestion
• Personalized Query Suggestion
• Experiments
• Discussion and Summary
Random Walk and Hitting Time
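[Figure: random walk on a graph; from vertex i the walk moves to j with P = 0.7 and to k with P = 0.3; A is the target set of vertices]
• Hitting time
– $T^A$: the first time that the random walk is at a vertex in A
• Mean hitting time
– $h_i^A$: expectation of $T^A$ given that the walk starts from vertex i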
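The two definitions can be checked by simulation. The sketch below estimates the mean hitting time by averaging sampled walks. Only the moves i → j (0.7) and i → k (0.3) come from the slide; the rest of the chain, the vertex name `a`, and the function names are assumptions made for illustration.

```python
import random

# Hypothetical toy chain. Only i -> j (0.7) and i -> k (0.3) are from the slide;
# every other transition is an assumption for illustration.
P = {
    "i": [("j", 0.7), ("k", 0.3)],
    "j": [("a", 1.0)],
    "k": [("a", 0.5), ("i", 0.5)],
    "a": [("a", 1.0)],
}
A = {"a"}  # target set


def sample_hitting_time(start, rng):
    """Run one walk from `start` and return the first time it is in A."""
    state, t = start, 0
    while state not in A:
        next_states, probs = zip(*P[state])
        state = rng.choices(next_states, weights=probs)[0]
        t += 1
    return t


def estimate_mean_hitting_time(start, n_walks=20000, seed=0):
    """Monte Carlo estimate of the mean hitting time from `start` to A."""
    rng = random.Random(seed)
    total = sum(sample_hitting_time(start, rng) for _ in range(n_walks))
    return total / n_walks


if __name__ == "__main__":
    print(estimate_mean_hitting_time("i"))
```

For this toy chain the exact value from i is 2 / 0.85 (about 2.35), so the estimate should land close to that.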
Computing Hitting Time
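[Figure: the same graph with vertices i, j, k and target set A]
$T^A$: the first time that the random walk is at a vertex in A
$$T^A = \min\{t : X_t \in A,\ t \ge 0\}$$
$h_i^A$: expectation of $T^A$ given that the walk starts from vertex i
$$h_i^A = \sum_{j \in V} p(i \to j)\, h_j^A + 1 \quad \text{for } i \notin A$$
$$h_i^A = 0 \quad \text{for } i \in A$$
Iterative Computation
Example: $h_i^A = 0.7\, h_j^A + 0.3\, h_k^A + 1$
Apparently, $h_i^A = 0$ for those $i \in A$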
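The recurrence above can be iterated directly. A minimal sketch follows, assuming a small hypothetical chain in which only the 0.7 and 0.3 transitions out of i come from the slide; the function name `mean_hitting_times` and the iteration count are illustrative choices.

```python
def mean_hitting_times(P, A, n_iter=100):
    """Iterate h_i = 1 + sum_j p(i -> j) * h_j for i not in A, with h_i = 0 on A."""
    h = {v: 0.0 for v in P}
    for _ in range(n_iter):
        h = {
            v: 0.0 if v in A else 1.0 + sum(p * h[u] for u, p in P[v].items())
            for v in P
        }
    return h


# Hypothetical chain: only i -> j (0.7) and i -> k (0.3) are from the slide.
P = {
    "i": {"j": 0.7, "k": 0.3},
    "j": {"a": 1.0},
    "k": {"a": 0.5, "i": 0.5},
    "a": {"a": 1.0},
}

if __name__ == "__main__":
    for vertex, value in mean_hitting_times(P, A={"a"}).items():
        print(vertex, round(value, 4))
```

Starting from all zeros, the t-th iterate is the hitting time truncated at t steps, so stopping early gives a lower bound on the exact value.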
Bipartite Graph and Hitting Time
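Expected proximity of query i to query A: the hitting time of i → A, $h_i^A$
Bipartite graph:
- Edges between V1 and V2
- No edges inside V1 or V2
- Edges are weighted
- e.g., V1 = query; V2 = Url
[Figure: weighted bipartite graph with i, k, A in V1 and j in V2; w(i, j) = 3, with a further edge of weight 7 at i and a further edge of weight 1 at j; other labels shown: 4, 5, 0.7, 0.4]
$$p(i \to j) = \frac{w(i,j)}{d_i} = \frac{3}{3+7}$$
$$p(j \to i) = \frac{w(i,j)}{d_j} = \frac{3}{3+1}$$
$$p(i \to k) = \sum_{j \in V_2} \frac{w(i,j)}{d_i} \cdot \frac{w(k,j)}{d_j}$$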
• Convert to a directed graph; one group (e.g., the urls) can even be collapsed, leaving transitions between queries only
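The three transition formulas can be sketched in a few lines. The weights below reuse the numbers visible on the slide (w(i, j) = 3, d_i = 3 + 7, d_j = 3 + 1); where the edges of weight 7 and 1 attach, and the node name `u2`, are assumptions.

```python
from collections import defaultdict


def query_to_query_transitions(w):
    """Collapse the Url side: p(i -> k) = sum_j w(i,j)/d_i * w(k,j)/d_j."""
    d_query = defaultdict(float)  # weighted degree of each query (V1)
    d_url = defaultdict(float)    # weighted degree of each url (V2)
    for (q, u), weight in w.items():
        d_query[q] += weight
        d_url[u] += weight

    p = defaultdict(lambda: defaultdict(float))
    for (i, j), w_ij in w.items():
        for (k, j2), w_kj in w.items():
            if j2 == j:  # i and k share url j
                p[i][k] += (w_ij / d_query[i]) * (w_kj / d_url[j])
    return p


# Assumed placement of the slide's weights: i has edges 3 and 7, j has edges 3 and 1.
w = {("i", "j"): 3.0, ("i", "u2"): 7.0, ("k", "j"): 1.0}

if __name__ == "__main__":
    p = query_to_query_transitions(w)
    print(dict(p["i"]))  # transitions out of query i
```

With this placement, p(i → j) = 3/10 and p(j → k) = 1/4, so p(i → k) is their product; each row of the collapsed matrix still sums to 1.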
Generate Query Suggestion
[Figure: query-Url click graph; queries: aa (marked T), american airline, mexiana; Urls: www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, en.wikipedia.org/wiki/Mexicana; example edge weights: 300, 15]
• Construct a (kNN) subgraph from the query log data (with a predefined number of queries/urls)
• Compute transition probabilities p(i → j)
• Compute hitting time $h_i^A$
• Rank candidate queries using $h_i^A$
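The steps above can be sketched end to end on a toy click log. This is a simplified illustration, not the authors' implementation: it skips the kNN subgraph construction, walks on the collapsed query-to-query graph, and uses made-up click counts (only the query and Url names and the values 300 and 15 appear in the slide's figure).

```python
from collections import defaultdict


def rank_suggestions(clicks, target, n_iter=200):
    """Rank queries by mean hitting time to `target` (smaller = closer)."""
    d_q, d_u = defaultdict(float), defaultdict(float)
    by_url = defaultdict(list)
    for (q, u), c in clicks.items():
        d_q[q] += c
        d_u[u] += c
        by_url[u].append((q, c))

    # Collapsed transitions: p(i -> k) = sum_j w(i,j)/d_i * w(k,j)/d_j
    p = defaultdict(lambda: defaultdict(float))
    for (i, j), w_ij in clicks.items():
        for k, w_kj in by_url[j]:
            p[i][k] += (w_ij / d_q[i]) * (w_kj / d_u[j])

    # Iterate the hitting-time recurrence with h = 0 at the target query.
    h = {q: 0.0 for q in d_q}
    for _ in range(n_iter):
        h = {
            i: 0.0 if i == target
            else 1.0 + sum(prob * h[k] for k, prob in p[i].items())
            for i in d_q
        }
    return sorted((q for q in h if q != target), key=h.get)


# Hypothetical click counts; the edge placement is an assumption.
clicks = {
    ("aa", "www.aa.com"): 300,
    ("aa", "www.theaa.com/travelwatch/planner_main.jsp"): 15,
    ("american airline", "www.aa.com"): 100,
    ("american airline", "en.wikipedia.org/wiki/Mexicana"): 5,
    ("mexiana", "en.wikipedia.org/wiki/Mexicana"): 20,
    ("mexiana", "www.aa.com"): 2,
}

if __name__ == "__main__":
    print(rank_suggestions(clicks, target="aa"))
```

In this toy log, "american airline" shares most of its clicks with "aa" on www.aa.com, so it reaches the target in fewer expected steps than "mexiana".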
Intuition
• Why does it work?
– A url is close to a query if freq(q, url) dominates the number of clicks on this url (most people use q to reach the url)
– A query is close to the target query if it is close to many urls that are close to the target query
Personalized Query Suggestion
• Queries are ambiguous
• Different users → different information needs → different query suggestions
• Simple approach: build the graph and compute hitting time based solely on the user's history
• Data sparseness
– E.g., you cannot see a query if you never used it
• Alternative: modify the bipartite graph instead of rebuilding it entirely
Personalize the Bipartite Graph
[Figure: query-Url bipartite graph; queries: aa (marked T), american airline, alcoholics anonymous; Urls: www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, www.alcoholics-anonymous.org, en.wikipedia.org/wiki/Alcoholics_Anonymous; pseudo query P: "aa" + user]
• Introduce a pseudo (personalized) query P: "aa" + user
• Reweight edges using personalized probabilities: p(Url | User, Query), p(Query | User, Url)
• Key: how to compute p(Url | User, Query)
– From w(url, user, query)
– Sparse data!
– Compute a smoothed p(Url | User, Query)
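One way to picture the pseudo query is as an extra node added to the existing graph. The sketch below is an assumption about how that could look, not the method from the talk: the function name `add_pseudo_query`, the `mass` parameter that scales probabilities into edge weights, the user id, and all numbers are hypothetical.

```python
def add_pseudo_query(weights, query, user, p_url_given_user_query, mass=1.0):
    """Return (pseudo_node, new_weights) with edges pseudo -> url added.

    Each new edge weight is mass * p(Url | User, Query). `mass` is a
    hypothetical knob; the global graph in `weights` is left untouched.
    """
    pseudo = (query, user)
    personalized = dict(weights)  # copy, so the shared graph is not rebuilt
    for url, prob in p_url_given_user_query.items():
        personalized[(pseudo, url)] = mass * prob
    return pseudo, personalized


# Hypothetical global click weights and a hypothetical smoothed distribution.
weights = {
    ("aa", "www.aa.com"): 300.0,
    ("american airline", "www.aa.com"): 100.0,
    ("alcoholics anonymous", "www.alcoholics-anonymous.org"): 40.0,
}
p_user = {"www.aa.com": 0.1, "www.alcoholics-anonymous.org": 0.9}

if __name__ == "__main__":
    node, graph = add_pseudo_query(weights, "aa", "user42", p_user, mass=10.0)
    print(node, {k: v for k, v in graph.items() if k[0] == node})
```

Hitting time toward the pseudo node can then be computed on the modified graph in the same way as for an ordinary query.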
Personalization with Backoff (Mei and Church 08)
$$P(Url \mid IP, Q) = \lambda_0 P(Url \mid IP_0, Q) + \lambda_1 P(Url \mid IP_1, Q) + \lambda_2 P(Url \mid IP_2, Q) + \lambda_3 P(Url \mid IP_3, Q) + \lambda_4 P(Url \mid IP_4, Q)$$
$IP_4$: 156.111.188.243 (full personalization: sparse data!)
$IP_3$: 156.111.188.*
$IP_2$: 156.111.*.*
$IP_1$: 156.*.*.*
$IP_0$: *.*.*.* (no personalization: lose the opportunity)
Personalization with backoff: we don't have enough data for everyone!
- Back off to classes of users (e.g., IP)
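The backoff idea can be sketched as a weighted mixture of click distributions over progressively shorter IP prefixes. The weights, the counts, and the Url `www.msrgear.com` below are made up, and renormalizing over the levels that have data is a design choice of this sketch, not something stated in the talk.

```python
def ip_prefixes(ip):
    """Most specific first: full IP, then 3, 2, 1 and 0 leading bytes."""
    parts = ip.split(".")
    return [".".join(parts[:k] + ["*"] * (4 - k)) for k in range(4, -1, -1)]


def backoff_prob(url, ip, query, counts, lambdas):
    """Interpolate P(Url | IP_k, Q) over prefix levels k = 4..0.

    `counts[(prefix, query)]` maps url -> click count. `lambdas` lists the
    mixture weights from lambda_4 down to lambda_0. Levels without data are
    skipped and the remaining weights renormalized (an assumption here).
    """
    total, used = 0.0, 0.0
    for lam, prefix in zip(lambdas, ip_prefixes(ip)):
        url_counts = counts.get((prefix, query), {})
        n = sum(url_counts.values())
        if n:
            total += lam * url_counts.get(url, 0) / n
            used += lam
    return total / used if used else 0.0


# Hypothetical counts: one personal click, plus a global distribution.
counts = {
    ("156.111.188.243", "msr"): {"research.microsoft.com": 1},
    ("*.*.*.*", "msr"): {"research.microsoft.com": 1, "www.msrgear.com": 9},
}
lambdas = [0.4, 0.2, 0.2, 0.1, 0.1]  # made-up weights, lambda_4 ... lambda_0

if __name__ == "__main__":
    print(ip_prefixes("156.111.188.243"))
    print(backoff_prob("research.microsoft.com", "156.111.188.243", "msr",
                       counts, lambdas))
```

A single personal click is enough to shift most of the probability toward the user's preferred Url, while the global distribution still contributes when personal data is thin.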
Experiments
• Query suggestion using query logs
– Commercial search engine log (1.5 years)
– 637 million queries; 585 million urls
– Query-click bipartite graph
• Author/keyword suggestion using DBLP
– Titles and authors from DBLP
– 110k papers, 580k authors
– Coauthor graph, keyword graph, author-keyword bipartite graph
• Baselines: nearest neighbor; personalized PageRank
Result: Query Suggestion
Hitting time
wikipedia friends
friends tv show wikipedia
friends home page
friends warner bros
the friends series
friends official site
friends(1994)
friendship
friends poem
friendster
friends episode guide
friends scripts
how to make friends
true friends
Yahoo
secret friends
friends reunited
hide friends
hi 5 friends
find friends
poems for friends
friends quotes
Query = friends
Result: Query Suggestion (II)
Query = aa
Hitting time:
alcoholics anonymous
automobile association
theaa
american airlines
american air
american airline ticket reservation
Yahoo:
aa route planner
aa route finder
aa airlines
aa meetings
aa autoroute
aa road map
Live:
aa route finder
aa route planner
aa airlines
american airlines
aa meeting
aa road map
Query = ranknet
Hitting time:
learning to rank
ndcg measure ir
ndcg
lambdarank
Chris burges
pairwise test
Results: Personalized Query Suggestion
Query = msr
No personalization
mountian safety research
msrcorp
msr outdoor equipment
msr camp stoves
msr snowshoes
msr racing
Personalized
Microsoft research
research
what is research
research website
microsoft research and development
yahoo research labs
Result: Author Suggestion
Query = Jon Kleinberg
Hitting time (favors students, especially current students):
Aleksandrs Slivkins
Mark Sandler
Tom Wexler
Lars Backstrom
Elliot Anshelevich
Xiangyang Lan
Nearest neighbor (famous researchers + former students; personalized PageRank is similar):
Prabhakar Raghavan
Eva Tardos
Daniel P. Huttenlocher
David Kempe
Amit Kumar
Andrew Tomkins
Result: Keyword Suggestion
Query = olap
Dimension updates
OLAP data
OLAP cubes
OLAP queries
View size
Hierarchical cluster
Query = social network
Knowledge collaboration
Community structure
Resource organization
Information kiosks
Efficient searching
Network extraction
Query = pagerank
Pagerank computation
Ranking systems
Pagerank approximation
Incremental computations
Web spam
Iterative computation
Result: Keyword Suggestion for Author
Query = Jiawei Han
Baselines:
mining
data
frequent
Efficient
pattern
data mining
Hitting time:
large databases
frequent pattern
sequential pattern
pattern mining
frequent
multi dimensional
Query = Michael I. Jordan
Baselines:
learning
statistical
kernel
markov
inference
model
Hitting time:
Dirichlet process
approximate inference
dirichlet
mean field
supervised learning
graphic models
Discussions
• Hitting time effectively boosts infrequent queries
– Nearest neighbor and personalized PageRank favor frequent queries
• Fast convergence: a few iterations on a subgraph capture most of the value
• No parameters to tune
• Can be generalized to many other tasks (on different graphs)
Ranking on Query log Graph and Search Tasks
• Query → Query: query suggestion
• Url → Url: finding related pages
– e.g., www.cs.jhu.edu/~brill → research.microsoft.com/users/brill
• IP → IP: finding similar users
• Url → Query: annotation, summarization, ad terms
• Query → Url: search
• IP, Query → Url: personalized search
• IP, Query → Query: personalized query suggestion
• Many other opportunities!
Summary
• Generate query suggestions using hitting time on query-click graph
• Personalized query suggestion
• Generalizable to other search tasks
• Future work:
– Different types of graphs: e.g., query sessions
– Combine with other features
– Large-scale evaluation
Thanks!