query classification

Query Classification and KDDCUP 2005, by Qiang Yang and Dou Shen


Post on 03-Feb-2016

DESCRIPTION

Query Classification and KDDCUP 2005. Qiang Yang, Dou Shen. Query Classification and Online Advertisement. QC as Machine Learning: inspired by the KDDCUP’05 competition; classify a query into a ranked list of categories; queries are collected from real search engines.

TRANSCRIPT

Page 1: Query Classification

Query Classification and KDDCUP 2005

Qiang Yang, Dou Shen

Page 2: Query Classification


Query Classification and Online Advertisement

Page 3: Query Classification


QC as Machine Learning

Inspired by the KDDCUP’05 competition

Classify a query into a ranked list of categories

Queries are collected from real search engines

Target categories are organized in a tree, with each node being a category

Page 4: Query Classification


How to do it?

Page 5: Query Classification


Solutions: Query Enrichment + Staged Classification

[Figure: Queries and Target Categories]

Solution 1: Query/Category Enrichment

Solution 2: Bridging classifier

Page 6: Query Classification


Query enrichment

Textual information: Title, Snippet, Full text

Category information: Category

Page 7: Query Classification


Classifiers

Map by Word Matching

Direct and Extended Matching

High precision, low recall

SVM: Apply synonym-based classifiers to map Web pages from ODP to target taxonomy

Obtain <pages, target category> as the training data

Train SVM classifiers for the target categories

Higher recall

[Figure labels: Device, DE]
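A minimal sketch of the word-matching step on this slide, assuming an intermediate (for example ODP) category is mapped to a target category when the two names share a word (direct matching) or a synonym of one (extended matching). The category names, the `SYNONYMS` table and all function names are hypothetical illustrations, not the authors' implementation.

```python
import re

# Hypothetical synonym table for extended matching; a real system might
# draw these from a thesaurus.
SYNONYMS = {
    "hardware": {"device", "equipment"},
    "movies": {"film", "films", "cinema"},
}

def tokens(name):
    """Lower-cased word set of a category name or path."""
    return set(re.findall(r"[a-z]+", name.lower()))

def direct_match(target, intermediate):
    """Direct matching: the two category names share at least one word."""
    return bool(tokens(target) & tokens(intermediate))

def extended_match(target, intermediate):
    """Extended matching: also accept synonyms of the target's words."""
    expanded = set(tokens(target))
    for word in tokens(target):
        expanded |= SYNONYMS.get(word, set())
    return bool(expanded & tokens(intermediate))

def map_categories(targets, intermediates, matcher):
    """Map each intermediate category to the target categories it matches."""
    return {
        inter: [t for t in targets if matcher(t, inter)]
        for inter in intermediates
    }
```

Direct matching misses "Arts/Film" for a target named "Entertainment/Movies"; extended matching recovers it through the synonym table, which is the recall gain the slide attributes to the extended step.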

Page 8: Query Classification


Bridging Classifier

Problem with Solution 1: When target is changed, training needs to repeat!

Solution: Connect the target taxonomy and queries by taking an intermediate taxonomy as a bridge

Page 9: Query Classification


Bridging Classifier (Cont.)

How to connect?

Prior prob. of $C_j^I$

The relation between $C_i^T$ and $C_j^I$

The relation between $q$ and $C_j^I$

The relation between $q$ and $C_i^T$
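The annotations on this slide label the terms of a formula that did not survive in the transcript. A hedged reconstruction follows, writing $C_i^T$ for a target category, $C_j^I$ for an intermediate category and $q$ for the query. The approximation step assumes that $C_i^T$ and $q$ are conditionally independent given $C_j^I$; that assumption is introduced for this sketch and is not stated on the slide.

```latex
\begin{aligned}
p(C_i^T \mid q)
  &= \sum_j p(C_i^T, C_j^I \mid q) \\
  &\approx \sum_j p(C_i^T \mid C_j^I)\, p(C_j^I \mid q) \\
  &\propto \sum_j p(C_i^T \mid C_j^I)\, p(q \mid C_j^I)\, p(C_j^I)
\end{aligned}
```

The last line uses Bayes' rule on $p(C_j^I \mid q)$ and drops $p(q)$, which is the same for every target category and so does not affect the ranking.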

Page 10: Query Classification


Category Selection for Intermediate Taxonomy

Category Selection for Reducing Complexity

Total Probability (TP)

Mutual Information

Page 11: Query Classification


Experiment: Data Sets & Evaluation

KDDCUP

Starting in 1997, KDD Cup is the leading Data Mining and Knowledge Discovery competition in the world, organized by ACM SIGKDD

KDDCUP 2005

Task: Categorize 800K search queries into 67 categories

Three Awards: (1) Performance Award; (2) Precision Award; (3) Creativity Award

Participation: 142 registration groups; 37 solutions submitted from 32 teams

Evaluation data

800 queries randomly selected from the 800K query set

3 human labelers labeled the entire evaluation query set

Evaluation measurements: Precision and Performance (F1)

$$\text{Overall F1} = \frac{1}{3}\sum_{i=1}^{3} \left(\text{F1 against human labeler } i\right)$$
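A minimal sketch of how the overall F1 on this slide (the mean of the F1 scores against each of the three human labelers) could be computed. It assumes precision and recall are micro-averaged over (query, category) pairs, which is an assumption of this sketch and not something the slide states; the data layout and function names are hypothetical.

```python
def f1_against_labeler(predicted, labeled):
    """Micro-averaged precision, recall and F1 of one run against one labeler.

    Both arguments map a query string to a set of category names.
    """
    correct = sum(len(cats & labeled.get(q, set())) for q, cats in predicted.items())
    n_predicted = sum(len(cats) for cats in predicted.values())
    n_labeled = sum(len(cats) for cats in labeled.values())
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_labeled if n_labeled else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def overall_f1(predicted, labelers):
    """Average the F1 obtained against each human labeler."""
    scores = [f1_against_labeler(predicted, lab)[2] for lab in labelers]
    return sum(scores) / len(scores)
```

Averaging over labelers rather than merging their labels keeps a run from being rewarded or penalized for disagreements among the human judges themselves.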

Page 12: Query Classification


Experiment Results: Compare Different Methods From Different Groups

[Figure: Comparison among our own methods]

[Figure: Comparison with other teams in KDDCUP2005]

Page 13: Query Classification


Result of Bridging Classifiers

Using a bridging classifier allows the target classes to change freely without the need to retrain the classifier!

Performance of the Bridging Classifier with Different Granularity of Intermediate Taxonomy

Page 14: Query Classification


Target-transfer Learning

Classifier, once trained, stays constant

When target classes change, the classifier needs to be retrained with new data

Too costly

Not online

Bridging Classifier: allows the target to change

Application: advertisements come and go, but our query-to-target mapping need not be retrained!

We call this the target-transfer learning problem
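A minimal sketch of why a bridging classifier need not be retrained when the targets change. It assumes the query-to-intermediate model is trained once and only the intermediate-to-target table is swapped per target taxonomy. Every probability, category name and function name below is a made-up illustration.

```python
def bridge(p_inter_given_q, p_target_given_inter):
    """Rank targets by the sum over intermediate categories of
    p(target | intermediate) * p(intermediate | query)."""
    scores = {}
    for inter, p_iq in p_inter_given_q.items():
        for target, p_ti in p_target_given_inter.get(inter, {}).items():
            scores[target] = scores.get(target, 0.0) + p_ti * p_iq
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Output of the (trained once) query-to-intermediate classifier for one query.
p_inter_given_q = {"Computers/Hardware": 0.7, "Shopping/Electronics": 0.3}

# Two different target sets; each needs only a new mapping table.
targets_a = {
    "Computers/Hardware": {"Hardware": 1.0},
    "Shopping/Electronics": {"Buying Guides": 0.6, "Hardware": 0.4},
}
targets_b = {
    "Computers/Hardware": {"Ad: laptops": 0.8, "Ad: printers": 0.2},
    "Shopping/Electronics": {"Ad: laptops": 0.5, "Ad: cameras": 0.5},
}

ranked_a = bridge(p_inter_given_q, targets_a)
ranked_b = bridge(p_inter_given_q, targets_b)
```

Switching from `targets_a` to `targets_b` reuses `p_inter_given_q` unchanged, which mirrors the advertisement scenario: new ads only require a new intermediate-to-target table.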

Page 15: Query Classification


Task 2: Can a computer do this?

Page 16: Query Classification


Data: Web Search Queries

Consider the following search queries: “AAAI”, “Machine Learning”, “Constraint Reasoning”

[Figure: AAAI; Machine learning; Constraint Reasoning]

Page 17: Query Classification


AAAI 07, joint work with D. Shen, J. Sun, M. Qin, Z. Chen et al.

Queries have different granularity

Car vs. BMW; BMW vs. AUDI

Can we organize the queries into hierarchies?

Benefits of building query hierarchies

Provide online query suggestion

Query classification

Query clustering

Difficulties of building query hierarchies

Queries are short

The hierarchical structure cannot be pre-defined

Page 18: Query Classification


Clickthrough Data

[Figure labels: Person 1, “SIGIR”, Search Engine]

Page 19: Query Classification


Intuitive Ideas

Our goal: mine the query hierarchies from clickthrough data

If two queries are related to each other, they should share some of the same or similar clicked Web pages;

For two queries qi and qj, qi is more general if most of the clicked pages of qj have similar pages to some clicked pages of qi, while not the other way around

If a query is specific, the contents of its clicked pages are relatively consistent,