query classification
Query Classification and KDDCUP 2005
Qiang Yang, Dou Shen
Query Classification and Online Advertisement
QC as Machine Learning
Inspired by the KDDCUP’05 competition:
Classify a query into a ranked list of categories.
Queries are collected from real search engines.
Target categories are organized in a tree, with each node being a category.
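To make the output format concrete, here is a minimal sketch, assuming some classifier has already assigned a score to each leaf of the target tree: the answer for a query is the leaves sorted by score and truncated to k. The taxonomy, the scores, and the name rank_categories are hypothetical illustrations, not part of the original system.

```python
from typing import Dict, List, Tuple

# Hypothetical target taxonomy: each parent node maps to its leaf categories.
TAXONOMY: Dict[str, List[str]] = {
    "Computers": ["Computers\\Hardware", "Computers\\Software"],
    "Shopping": ["Shopping\\Auctions", "Shopping\\Stores & Products"],
}

def rank_categories(scores: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring leaf categories, best first."""
    leaves = {leaf for children in TAXONOMY.values() for leaf in children}
    # Only leaves of the tree are valid labels; any other key is ignored.
    ranked = sorted(
        ((cat, s) for cat, s in scores.items() if cat in leaves),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]

# Hypothetical classifier scores for the query "ipod".
scores = {
    "Computers\\Hardware": 0.7,
    "Shopping\\Stores & Products": 0.9,
    "Computers\\Software": 0.2,
    "Shopping\\Auctions": 0.4,
}
print(rank_categories(scores, k=2))
```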
How to do it?
Solutions: Query Enrichment + Staged Classification
(Diagram: queries mapped to target categories)
Solution 1: Query/Category Enrichment
Solution 2: Bridging Classifier
Query Enrichment
(Diagram: a query is enriched with textual information from its search results (title, snippet, full text) and with category information.)
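Queries are too short to classify on their own, so enrichment replaces the query with text drawn from its search results. A minimal sketch, assuming the results are already in hand: the hard-coded RESULTS list stands in for a real search engine call, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical search results for the query "ipod"; a real system would
# fetch these (title, snippet, and optionally full text) from a search engine.
RESULTS: List[Dict[str, str]] = [
    {"title": "Apple iPod", "snippet": "Buy the iPod music player online"},
    {"title": "iPod - Wikipedia", "snippet": "The iPod is a portable media player"},
]

def tokenize(text: str) -> List[str]:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def enrich_query(query: str, results: List[Dict[str, str]]) -> Counter:
    """Expand a short query into a bag of words built from its search results."""
    bag = Counter(tokenize(query))
    for page in results:
        bag.update(tokenize(page["title"]))
        bag.update(tokenize(page["snippet"]))
    return bag

enriched = enrich_query("ipod", RESULTS)
print(enriched.most_common(3))
```

The enriched bag of words, rather than the raw one-word query, is what the downstream classifiers see.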
Classifiers
Map by word matching (direct and extended matching)
  High precision, low recall
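The precision/recall trade-off of word matching can be seen in a small sketch: direct matching fires only when a word from a category name appears verbatim in the enriched query text, while extended matching also accepts synonyms of those words. The category keywords, the synonym table, and the function names are hypothetical; the original system's synonym source is not described on this slide.

```python
from typing import Dict, List, Set

# Hypothetical keywords taken from target category names.
CATEGORY_WORDS: Dict[str, Set[str]] = {
    "Computers\\Hardware": {"computer", "hardware"},
    "Living\\Travel & Vacation": {"travel", "vacation"},
}

# Hypothetical synonym table for extended matching (an assumption).
SYNONYMS: Dict[str, Set[str]] = {
    "vacation": {"holiday"},
    "computer": {"pc", "laptop"},
}

def direct_match(words: List[str]) -> List[str]:
    """Categories whose name words appear verbatim in the enriched text."""
    vocab = set(words)
    return sorted(cat for cat, kws in CATEGORY_WORDS.items() if kws & vocab)

def extended_match(words: List[str]) -> List[str]:
    """Like direct_match, but each category word also matches its synonyms."""
    vocab = set(words)
    hits = []
    for cat, kws in CATEGORY_WORDS.items():
        expanded = set(kws)
        for kw in kws:
            expanded |= SYNONYMS.get(kw, set())
        if expanded & vocab:
            hits.append(cat)
    return sorted(hits)

text = ["cheap", "holiday", "deals", "laptop"]
print(direct_match(text))    # no category word appears verbatim
print(extended_match(text))  # synonyms recover both categories
```

A match is rarely wrong when it fires, but most texts never mention a category name literally, which is where recall is lost.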
SVM
  Apply synonym-based classifiers to map Web pages from ODP to the target taxonomy
  Obtain <page, target category> pairs as training data
  Train SVM classifiers for the target categories
  Higher recall
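The SVM stage learns from <page, target category> pairs instead of relying on literal word overlap. The slides do not say how the SVMs were trained, so the following is only a sketch under stated assumptions: a one-vs-rest linear SVM, each binary model fitted by Pegasos-style subgradient descent on the regularized hinge loss (visiting examples in a fixed order, no bias term). The training pairs and all names are hypothetical stand-ins for ODP pages already mapped to the target taxonomy.

```python
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse feature vector: term -> weight

def featurize(text: str) -> Vector:
    """Bag-of-words term counts."""
    return {term: float(n) for term, n in Counter(text.lower().split()).items()}

def dot(w: Vector, x: Vector) -> float:
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train_linear_svm(data: List[Tuple[Vector, int]], lam: float = 0.01,
                     epochs: int = 50) -> Vector:
    """Pegasos-style subgradient descent on lam/2*||w||^2 + hinge loss.

    Labels are +1/-1. Examples are visited in a fixed order (plain Pegasos
    samples them at random), and there is no bias term, for brevity.
    """
    w: Vector = {}
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)          # decreasing step size
            margin = y * dot(w, x)         # computed with the current weights
            for f in w:                    # shrink: gradient of the regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:               # hinge loss is active
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def train_one_vs_rest(pages: List[Tuple[str, str]]) -> Dict[str, Vector]:
    """One binary SVM per target category (category vs. all others)."""
    feats = [(featurize(text), cat) for text, cat in pages]
    return {
        cat: train_linear_svm([(x, 1 if c == cat else -1) for x, c in feats])
        for cat in sorted({c for _, c in pages})
    }

def predict(models: Dict[str, Vector], text: str) -> str:
    """Pick the category whose linear model gives the highest score."""
    x = featurize(text)
    return max(models, key=lambda cat: dot(models[cat], x))

# Hypothetical <page, target category> training pairs.
PAGES = [
    ("cpu motherboard memory", "Computers\\Hardware"),
    ("graphics card cpu", "Computers\\Hardware"),
    ("flight hotel booking", "Living\\Travel & Vacation"),
    ("beach hotel holiday", "Living\\Travel & Vacation"),
]
models = train_one_vs_rest(PAGES)
print(predict(models, "cheap cpu"))
```

A production system would use a tested SVM library; the hand-rolled version only shows where the <page, target category> pairs enter training.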
Bridging Classifier
Problem with Solution 1: when the target taxonomy changes, training must be repeated!
Solution: connect the target taxonomy and the queries by using an intermediate taxonomy as a bridge.
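Bridging Classifier (Cont.)
How to connect? Let $C_i^T$ denote a target category and $C_j^I$ an intermediate category. Assuming $C_i^T$ depends on $q$ only through $C_j^I$:
$$p(C_i^T \mid q) = \sum_j p(C_i^T \mid C_j^I)\, p(C_j^I \mid q) = \sum_j p(C_i^T \mid C_j^I)\, \frac{p(q \mid C_j^I)\, p(C_j^I)}{p(q)} \propto \sum_j p(C_i^T \mid C_j^I)\, p(q \mid C_j^I)\, p(C_j^I)$$
where
$p(C_j^I)$: prior probability of $C_j^I$
$p(C_i^T \mid C_j^I)$: the relation between $C_i^T$ and $C_j^I$
$p(q \mid C_j^I)$: the relation between $C_j^I$ and $q$
$p(C_i^T \mid q)$: the relation between $C_i^T$ and $q$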
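The bridging score for a target category is $\sum_j p(C_i^T \mid C_j^I)\, p(q \mid C_j^I)\, p(C_j^I)$. The sketch below computes it from three precomputed tables; how those probabilities are estimated is not shown, and the category names and numbers are hypothetical.

```python
from typing import Dict

def bridge_scores(
    p_q_given_ci: Dict[str, float],
    prior_ci: Dict[str, float],
    p_ct_given_ci: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Unnormalized p(C_i^T | q) for every target category C_i^T.

    p_q_given_ci[j]     ~ p(q | C_j^I), from a classifier on the intermediate taxonomy
    prior_ci[j]         ~ p(C_j^I)
    p_ct_given_ci[i][j] ~ p(C_i^T | C_j^I), how target i relates to intermediate j
    """
    return {
        target: sum(row.get(j, 0.0) * p_q_given_ci[j] * prior_ci[j] for j in p_q_given_ci)
        for target, row in p_ct_given_ci.items()
    }

# Hypothetical tables: two intermediate categories, two target categories.
scores = bridge_scores(
    {"Computers/Hardware": 0.02, "Shopping": 0.01},
    {"Computers/Hardware": 0.5, "Shopping": 0.5},
    {
        "Computers\\Hardware": {"Computers/Hardware": 0.9, "Shopping": 0.1},
        "Shopping\\Stores & Products": {"Computers/Hardware": 0.1, "Shopping": 0.9},
    },
)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Dropping the common factor $1/p(q)$ leaves the ranking unchanged, which is all query classification needs.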
Category Selection for Intermediate Taxonomy
Category selection for reducing complexity:
  Total Probability (TP)
  Mutual Information
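The slide only names the selection criteria. Reading "Total Probability" literally, one plausible version scores each intermediate category by the probability mass it receives from the target taxonomy, $\mathrm{TP}(C_j^I) = \sum_i p(C_j^I \mid C_i^T)\, p(C_i^T)$, and keeps the top n. This definition is our assumption and may differ from the authors'. Mutual information is not sketched. All names and numbers are hypothetical.

```python
from typing import Dict, List

def select_by_total_probability(
    p_ci_given_ct: Dict[str, Dict[str, float]],  # [target][intermediate] -> p(C_j^I | C_i^T)
    prior_ct: Dict[str, float],                  # p(C_i^T)
    n: int,
) -> List[str]:
    """Keep the n intermediate categories with the largest total probability."""
    tp: Dict[str, float] = {}
    for target, row in p_ci_given_ct.items():
        for inter, p in row.items():
            # Law of total probability: accumulate p(C_j^I | C_i^T) p(C_i^T).
            tp[inter] = tp.get(inter, 0.0) + p * prior_ct[target]
    return sorted(tp, key=tp.get, reverse=True)[:n]

# Hypothetical tables: two target categories, three intermediate categories.
P_CI_GIVEN_CT = {
    "Computers\\Hardware": {"Computers/Hardware": 0.7, "Computers/Software": 0.2, "Arts/Music": 0.1},
    "Computers\\Software": {"Computers/Hardware": 0.1, "Computers/Software": 0.8, "Arts/Music": 0.1},
}
PRIOR_CT = {"Computers\\Hardware": 0.5, "Computers\\Software": 0.5}
print(select_by_total_probability(P_CI_GIVEN_CT, PRIOR_CT, n=2))
```

Intermediate categories that no target category points to (here "Arts/Music") are pruned, shrinking the sum in the bridging formula.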
Experiment: Data Sets & Evaluation
KDDCUP
Started in 1997, KDD Cup is the leading data mining and knowledge discovery competition in the world, organized by ACM SIGKDD.
KDDCUP 2005
Task: categorize 800K search queries into 67 categories.
Three awards: (1) Performance Award; (2) Precision Award; (3) Creativity Award.
Participation: 142 registered groups; 37 solutions submitted by 32 teams.
Evaluation data: 800 queries randomly selected from the 800K query set; 3 human labelers labeled the entire evaluation query set.
Evaluation measurements: Precision and Performance (F1).
$$\text{Overall F1} = \frac{1}{3}\sum_{i=1}^{3} \left(\text{F1 against human labeler } i\right)$$
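A small sketch of the overall F1: compute F1 against each labeler separately, then average. It assumes precision = correctly tagged / tagged and recall = correctly tagged / human-labeled, with counts summed over all evaluation queries; the queries and labels below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Labels = Dict[str, Set[str]]  # query -> set of categories

def precision_recall_f1(pred: Labels, truth: Labels) -> Tuple[float, float, float]:
    """Counts summed over all queries, against one labeler's labels."""
    correct = sum(len(pred.get(q, set()) & truth[q]) for q in truth)
    n_pred = sum(len(pred.get(q, set())) for q in truth)
    n_true = sum(len(labels) for labels in truth.values())
    p = correct / n_pred if n_pred else 0.0
    r = correct / n_true if n_true else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def overall_f1(pred: Labels, labelers: List[Labels]) -> float:
    """Average the F1 obtained against each human labeler."""
    return sum(precision_recall_f1(pred, t)[2] for t in labelers) / len(labelers)

# Hypothetical system output and three labelers.
pred = {"q1": {"A", "B"}, "q2": {"C"}}
labelers = [
    {"q1": {"A"}, "q2": {"C"}},
    {"q1": {"A", "B"}, "q2": {"D"}},
    {"q1": {"B"}, "q2": {"C"}},
]
print(overall_f1(pred, labelers))
```

Averaging over labelers keeps any single annotator's judgment from dominating the score.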
Experiment Results: Comparing Different Methods from Different Groups
(Figures: comparison among our own methods; comparison with other teams in KDDCUP 2005)
Result of Bridging Classifiers
Using a bridging classifier allows the target classes to change freely, without the need to retrain the classifier!
(Figure: Performance of the Bridging Classifier with Different Granularity of Intermediate Taxonomy)
Target-Transfer Learning
A classifier, once trained, stays constant.
When the target classes change, the classifier must be retrained with new data: too costly, and not online.
Bridging classifier: allows the target to change.
Application: advertisements come and go, but our query-to-target mapping does not need to be retrained!
We call this the target-transfer learning problem.
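In a bridging classifier, the query side, $p(q \mid C_j^I)$ and $p(C_j^I)$, depends only on the intermediate taxonomy, so a new target set only needs a new table $p(C^T \mid C_j^I)$. A minimal sketch of that workflow, assuming the query-side probabilities are precomputed per query for illustration; the class, method names, ad categories, and numbers are all hypothetical.

```python
from typing import Dict, List

class BridgingClassifier:
    """Query side is fixed after training; target categories are swappable."""

    def __init__(self, p_q_given_ci: Dict[str, Dict[str, float]],
                 prior_ci: Dict[str, float]):
        # p_q_given_ci[query][j] ~ p(q | C_j^I); precomputed here for illustration.
        self.p_q_given_ci = p_q_given_ci
        self.prior_ci = prior_ci
        self.p_ct_given_ci: Dict[str, Dict[str, float]] = {}

    def set_targets(self, p_ct_given_ci: Dict[str, Dict[str, float]]) -> None:
        """Install a new target taxonomy; the query-side model is untouched."""
        self.p_ct_given_ci = p_ct_given_ci

    def classify(self, query: str) -> List[str]:
        """Targets ranked by sum_j p(C^T | C_j^I) p(q | C_j^I) p(C_j^I)."""
        evidence = self.p_q_given_ci[query]
        scores = {
            t: sum(row.get(j, 0.0) * evidence[j] * self.prior_ci[j] for j in evidence)
            for t, row in self.p_ct_given_ci.items()
        }
        return sorted(scores, key=scores.get, reverse=True)

clf = BridgingClassifier(
    {"ipod": {"Computers/Hardware": 0.03, "Shopping": 0.02}},
    {"Computers/Hardware": 0.5, "Shopping": 0.5},
)
# First set of ad categories.
clf.set_targets({
    "Electronics ads": {"Computers/Hardware": 0.8, "Shopping": 0.2},
    "Travel ads": {"Shopping": 0.1},
})
print(clf.classify("ipod"))
# The ads change; only the target table is replaced, with no retraining.
clf.set_targets({
    "Music player ads": {"Computers/Hardware": 0.5, "Shopping": 0.5},
    "Laptop ads": {"Computers/Hardware": 0.2},
})
print(clf.classify("ipod"))
```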
Task 2: Can a computer do this?
Data: Web Search Queries
Consider the following search queries: “AAAI”, “Machine Learning”, “Constraint Reasoning”.
(Diagram: “AAAI” as the parent of “Machine Learning” and “Constraint Reasoning”)
AAAI 07, joint work with D. Shen, J. Sun, M. Qin, Z. Chen et al.
Queries have different granularity: Car vs. BMW; BMW vs. AUDI.
Can we organize the queries into hierarchies?
Benefits of building query hierarchies: online query suggestion, query classification, query clustering.
Difficulties of building query hierarchies: queries are short, and the hierarchical structure cannot be predefined.
Clickthrough Data
(Diagram: a user, “Person 1”, submits the query “SIGIR” to a search engine; the clicked results are marked √.)
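Clickthrough logs can be viewed as a bipartite graph linking each query to the pages users clicked for it. A minimal sketch of building that graph from (user, query, clicked URL) records and listing the queries that share a clicked page with a given one; the log records and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical log records: (user, query, clicked URL).
LOG: List[Tuple[str, str, str]] = [
    ("person1", "SIGIR", "sigir.org"),
    ("person1", "SIGIR", "dl.acm.org/conference/sigir"),
    ("person2", "information retrieval conference", "sigir.org"),
    ("person3", "BMW", "bmw.com"),
]

def clicks_by_query(log: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Query side of the query-URL bipartite graph (queries lowercased)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for _user, query, url in log:
        graph[query.lower()].add(url)
    return dict(graph)

def co_clicked(graph: Dict[str, Set[str]], query: str) -> List[str]:
    """Other queries that share at least one clicked page with `query`."""
    mine = graph[query]
    return sorted(q for q, urls in graph.items() if q != query and urls & mine)

g = clicks_by_query(LOG)
print(co_clicked(g, "sigir"))
```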
Intuitive Ideas
Our goal: mine the query hierarchies from clickthrough data.
If two queries are related to each other, they should share some of the same or similar clicked Web pages.
For two queries q_i and q_j, q_i is more general if most of the clicked pages of q_j have similar pages among the clicked pages of q_i, but not the other way around.
If a query is specific, the contents of its clicked pages are relatively consistent,