![Page 1: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/1.jpg)
CIKM 2011, Glasgow
Behavior-driven clustering of queries into topics
Luca Maria Aiello, Debora Donato, Umut Ozertem, Filippo Menczer
![Page 2: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/2.jpg)
USER PROFILING IN SEARCH ENGINES
27/10/2011
• Granularity levels: Query, Session, Goal, Mission, Topic
• Aggregation: concise representation, meaningful semantics
![Page 3: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/3.jpg)
MISSIONS AND TOPICS
A topic is a mental object or cognitive content, i.e., the sum of what can be perceived, discovered or learned about any real or abstract entity.
A search mission is a set of queries that express a complex search need, possibly articulated into smaller goals.
![Page 4: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/4.jpg)
QUERY STREAM DECOMPOSITION
• Queries in the same mission: same topic
• Queries in consecutive missions: different topic
Donato et al.: Do you want to take notes? Identifying research missions in Y! search pad. WWW’10
Taxonomies / User behavior and intent
![Page 5: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/5.jpg)
MERGING MISSIONS
![Page 6: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/6.jpg)
TOPIC DETECTOR STATS
• Gradient Boosted Decision Tree (GBDT)
• Aggregation (min, max, avg, std) of 62 query pair features
• AUC 0.95
• 10X cross validation on 500K pairs

| Lexical features | Behavioral features |
|---|---|
| Trigrams/terms cosine | Probability fwd |
| Common prefix/suffix | Session total click avg |
| Length difference | Session total time avg |
| … | … |
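The aggregation step listed above (min, max, avg, std of query pair features) could look roughly like the sketch below. It is an illustration only: the three lexical features are simplified guesses at "trigrams cosine", "common prefix" and "length difference", the function names are hypothetical, and the GBDT classifier that would consume the resulting vector is not shown.

```python
import math
from collections import Counter
from itertools import product
from statistics import mean, pstdev


def trigram_cosine(q1, q2):
    """Cosine similarity between character-trigram count vectors."""
    def grams(q):
        return Counter(q[i:i + 3] for i in range(len(q) - 2))
    a, b = grams(q1), grams(q2)
    dot = sum(a[g] * b[g] for g in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def common_prefix_len(q1, q2):
    """Number of leading characters the two queries share."""
    n = 0
    for c1, c2 in zip(q1, q2):
        if c1 != c2:
            break
        n += 1
    return n


def length_difference(q1, q2):
    """Absolute difference in query length, in characters."""
    return abs(len(q1) - len(q2))


# Hypothetical feature list; the slides mention 62 features, only 3 are sketched.
PAIR_FEATURES = [trigram_cosine, common_prefix_len, length_difference]


def aggregate_features(set_a, set_b, pair_features=PAIR_FEATURES):
    """Turn two query sets into one fixed-length vector.

    Each pairwise feature is computed on every cross pair (q1, q2) and
    summarised by min, max, mean and population standard deviation.
    """
    vector = []
    for feature in pair_features:
        values = [feature(q1, q2) for q1, q2 in product(set_a, set_b)]
        vector += [min(values), max(values), mean(values), pstdev(values)]
    return vector
```

Because the vector length depends only on the number of features, query sets of any size can be compared by the same classifier.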
![Page 7: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/7.jpg)
GREEDY AGGLOMERATIVE TOPIC EXTRACTION (GATE)
• Topic detector applied to pairs of query sets
• O(|M|^2 · log|M|) (heavily parallelizable)
1. Missions of the same user: supermissions
2. Query sets of different users: higher-level topics
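A minimal sketch of a greedy agglomerative merge over query sets, under stated assumptions: the word-overlap scorer `jaccard` is a toy stand-in for the trained topic detector, the `threshold` value is arbitrary, and the round-based merge schedule is one plausible reading of "greedy", not necessarily the procedure used in GATE.

```python
from itertools import combinations


def jaccard(a, b):
    """Toy stand-in for the topic detector: word overlap of two query sets."""
    words_a = {w for q in a for w in q.split()}
    words_b = {w for q in b for w in q.split()}
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def greedy_merge(query_sets, score=jaccard, threshold=0.3):
    """Repeatedly merge the best-scoring pairs of clusters.

    In each round all pairs are scored and sorted, then merged greedily
    from the top; a cluster takes part in at most one merge per round.
    The loop stops when no pair reaches the threshold.
    """
    clusters = [frozenset(s) for s in query_sets]
    while True:
        scored = sorted(
            ((score(a, b), i, j)
             for (i, a), (j, b) in combinations(enumerate(clusters), 2)),
            reverse=True,
        )
        used, merged = set(), []
        for s, i, j in scored:
            if s < threshold:
                break  # remaining pairs score even lower
            if i in used or j in used:
                continue
            used.update((i, j))
            merged.append(clusters[i] | clusters[j])
        if not merged:
            return clusters
        clusters = merged + [c for k, c in enumerate(clusters) if k not in used]
```

Run first on the missions of a single user and then on the resulting query sets across users, the same routine would mirror the two stages listed on the slide.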
![Page 8: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/8.jpg)
EVALUATION
40K users
3 months Y! log
![Page 9: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/9.jpg)
EVALUATION: BASELINE
• OSLOM community detection algorithm
  – Weighted undirected graph
  – Maximizing local fitness function of clusters
  – Automatic hierarchy detection
Lancichinetti et al.: Finding statistically significant communities in networks. PLoS ONE, 2011.
2URL cover graph
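As a rough illustration of the kind of weighted undirected graph a community detection baseline takes as input, the sketch below builds a query graph from click data. The edge rule is an assumption, not taken from the slides: two queries are linked when they share at least one clicked URL, and the edge weight is the number of shared URLs. The function name and the input format are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def url_cover_graph(clicks):
    """Build a weighted undirected query graph from (query, clicked_url) pairs.

    Returns a dict mapping an alphabetically ordered query pair to the
    number of clicked URLs the two queries have in common.
    """
    queries_by_url = defaultdict(set)
    for query, url in clicks:
        queries_by_url[url].add(query)

    weights = Counter()
    for queries in queries_by_url.values():
        # Every pair of queries that clicked this URL gains one unit of weight.
        for q1, q2 in combinations(sorted(queries), 2):
            weights[(q1, q2)] += 1
    return dict(weights)
```

Queries whose clicked URLs are shared with no other query get no edge at all under this rule, so they never enter the graph.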
![Page 10: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/10.jpg)
EVALUATION: QUERY SET COVERAGE
Fraction of queries considered in the clustering phase
URL cover graph connected components size distribution
GATE: 1; OSLOM: 0.2
![Page 11: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/11.jpg)
EVALUATION: SINGLETON RATIO
Fraction of queries that remain isolated in singleton clusters
GATE: 0.55-0.27; OSLOM: 0.88
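The two ratios used in these evaluation slides can be computed as in the sketch below. It assumes a clustering is given as a list of disjoint sets of queries, and that the singleton ratio is normalised by the number of clustered queries; the slides do not state the denominator, so that choice is an assumption.

```python
def query_set_coverage(all_queries, clusters):
    """Fraction of all queries that the clustering phase considered."""
    universe = set(all_queries)
    clustered = set().union(*clusters) if clusters else set()
    return len(clustered & universe) / len(universe)


def singleton_ratio(clusters):
    """Fraction of clustered queries left alone in a cluster of size one."""
    total = sum(len(c) for c in clusters)
    singles = sum(1 for c in clusters if len(c) == 1)
    return singles / total if total else 0.0
```

For example, with five queries of which four are clustered as `[{"a", "b"}, {"c"}, {"d"}]`, coverage is 0.8 and the singleton ratio is 0.5.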
![Page 12: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/12.jpg)
EVALUATION: AGGREGATION ABILITY
Topics aggregated in two consecutive steps or levels
GATE: 500K; OSLOM: 100K
![Page 13: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/13.jpg)
EVALUATION: PURITY vs. COVERAGE
• Coverage
  – Number of unique clicked URLs for the query
• Purity
  – Average pointwise mutual information of pairs of query-related relevant terms
• Relevant terms are extracted from top clicked results using a predefined dictionary
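The purity measure above averages pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), over pairs of terms. The sketch below is one way to compute it, under assumptions not stated in the slides: probabilities are estimated as document frequencies over a reference collection, pairs that never co-occur are skipped rather than scored as minus infinity, and the function name is hypothetical.

```python
import math
from itertools import combinations


def average_pmi(terms, documents):
    """Average PMI over all pairs of distinct terms.

    `documents` is a collection of term sets used to estimate probabilities.
    Pairs with zero co-occurrence are skipped (an assumption of this sketch);
    0.0 is returned when no pair can be scored.
    """
    docs = [set(d) for d in documents]
    n = len(docs)

    def prob(*ts):
        # Fraction of documents that contain every term in ts.
        return sum(all(t in d for t in ts) for d in docs) / n

    values = []
    for x, y in combinations(sorted(set(terms)), 2):
        joint = prob(x, y)
        if joint > 0:
            values.append(math.log2(joint / (prob(x) * prob(y))))
    return sum(values) / len(values) if values else 0.0
```

A positive average means the terms co-occur more often than independence would predict, which is the sense in which a topic with related terms is "pure".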
![Page 14: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/14.jpg)
EVALUATION: PURITY vs. COVERAGE
![Page 15: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/15.jpg)
EVALUATION: PURITY vs. COVERAGE
![Page 16: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/16.jpg)
USER PROFILING
![Page 17: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/17.jpg)
USER PROFILING FROM TOPICS
[Figure labels: TopicDetector, Missions, Topics, User topical profile]
![Page 18: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/18.jpg)
PROFILES FOR “PREDICTION”
• Sequence of missions of the profiled user vs. sequence of a random one
• Sequence-profile match using topic detector
• Success: 0.65 (0.72 less frequent, 0.55 most frequent)
![Page 19: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/19.jpg)
CONCLUSIONS
• New behavior-driven notion of topics
• Bottom-up topic extraction algorithm
• Favorable comparison with graph-based clustering
• Effective user profiling

• Other baselines
• More accurate predictions
![Page 20: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/20.jpg)
ACKNOWLEDGMENTS
Fil Menczer: Prof. Informatics @ IU, Director CNetS @ IU
Umut Ozertem: Yahoo! Search Sciences, Yahoo! Labs @ Sunnyvale
Emre Velisapaoglu: Yahoo! Search Sciences, Yahoo! Labs @ Sunnyvale
Debora Donato: Yahoo! Search Sciences, Yahoo! Labs @ Sunnyvale
![Page 21: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/21.jpg)
![Page 22: Behavior-driven clustering of queries into topics](https://reader035.vdocuments.us/reader035/viewer/2022070422/56816571550346895dd80800/html5/thumbnails/22.jpg)