EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH
WANG.H, LIANG.Y, FU.L, XUE.G, YU.Y
SIGIR’09

Advisor: Koh Jia-Ling
Nonhlanhla Shongwe
2010-09-28



Preview

• Introduction
• AdSearch
    • Bid phrase clustering
    • Index structure for efficient ad search
    • Query processing
• Experimental evaluation
• Conclusion


Introduction

• The Web has become an important venue for advertising, e.g. Google, Yahoo
• There are mainly two kinds of advertising channels
    • Contextual advertising
    • Sponsored advertising
• Ranking is derived from
    • relevance to the user query
    • page content


Introduction cont’d

• Ads are characterized by bid phrases: keywords the advertisers choose for their ads
• Syntactic approaches suffer from low recall
• Example
    • Query: “job training”
    • Ad: career college
    • The ad has no syntactic match with the query, so it is not proposed


Introduction cont’d

• The problem is even worse because of
    • the short length of ads
    • the sparsity of the bid phrases
• Propose an efficient ad search solution
    • Tackle these issues with query expansion


AdSearch Overview


AdSearch cont’d

• Bid phrase clustering
    • Bipartite Graph Construction for Bid Phrases and Ads
    • Agglomerative Iterative Clustering


Bipartite Graph Construction for Bid Phrases and Ads

1. B = {A, B, C} (bid phrases)
2. A = {Ad0, Ad1, Ad2, Ad3, Ad4} (ads)
3. G = {vba, vbb, vbc} (bid-phrase vertices)
4. G = {va0, va1, va2, va3, va4} (ad vertices)

Corpus data C
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}
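As a rough illustration of the construction on this slide, the sketch below builds the bipartite graph from the corpus data: one edge per (bid phrase, ad) pair, plus the reverse view from each ad to its bid phrases. The function name `build_bipartite_graph` and the dictionary layout are assumptions made for this example, not names from the paper.

```python
from collections import defaultdict

# Corpus data from the slide: bid phrase -> ads that bid on it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def build_bipartite_graph(corpus):
    """Return the edge list and the ad -> bid phrases view (hypothetical helper)."""
    # One edge for every (bid phrase, ad) pair in the corpus.
    edges = sorted((phrase, ad) for phrase, ads in corpus.items() for ad in ads)
    # Reverse direction: which bid phrases are attached to each ad.
    ad_to_phrases = defaultdict(set)
    for phrase, ad in edges:
        ad_to_phrases[ad].add(phrase)
    return edges, dict(ad_to_phrases)

edges, ad_to_phrases = build_bipartite_graph(corpus)
for ad in sorted(ad_to_phrases):
    print(ad, sorted(ad_to_phrases[ad]))
```

The reverse view is what the later clustering slides use when they compare ads by the bid phrases they share.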


Agglomerative Iterative Clustering

Jaccard Similarity

Corpus data C
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

(A, B) = 1/4
(B, C) = 2/4
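The two values above follow from the standard Jaccard definition, the size of the intersection divided by the size of the union. A minimal sketch that reproduces them from the corpus data (the function name `jaccard` is chosen for this example):

```python
# Corpus data from the slide: bid phrase -> ads that bid on it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def jaccard(x, y):
    """Jaccard similarity |x & y| / |x | y| of two sets (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

print(jaccard(corpus["A"], corpus["B"]))  # A and B share only Ad3 out of 4 ads
print(jaccard(corpus["B"], corpus["C"]))  # B and C share Ad2 and Ad3 out of 4 ads
```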


Agglomerative Iterative Clustering cont’d

Corpus data C
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

Bid phrases: A, B, C
Ads: Ad0, Ad1, Ad2, Ad3, Ad4


Corpus data C
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

Bipartite graph
Bid phrases: A, B, C
Ads: Ad0, Ad1, Ad2, Ad3, Ad4

Bid phrase similarities
(A, B) = 0.25
(A, C) = 0.25
(B, C) = 0.5

Merge: B with C, then A

Bid phrases per ad
Ad0 = {A}
Ad1 = {B}
Ad2 = {B, C}
Ad3 = {A, B, C}
Ad4 = {C}

Ad similarities
(Ad0, Ad1) = 0
(Ad0, Ad2) = 0
(Ad0, Ad3) = 0.33
(Ad0, Ad4) = 0
(Ad1, Ad2) = 0.5
(Ad1, Ad3) = 0.33
(Ad1, Ad4) = 0
(Ad2, Ad3) = 0.67
(Ad2, Ad4) = 0.5
(Ad3, Ad4) = 0.33

Merge: (Ad2, Ad3), (Ad2, Ad4), (Ad1, Ad2), (Ad0, Ad3)

Clusters
Bid phrases: A | B, C
Ads: Ad0 | Ad1, Ad4 | Ad2, Ad3
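The bid-phrase merge order on this slide can be reproduced with a deliberately simplified, one-sided agglomerative loop: repeatedly merge the two clusters whose ad sets have the highest Jaccard similarity. This is only a sketch under that assumption; the paper's procedure iterates between the bid-phrase side and the ad side of the bipartite graph, which is not modelled here, and the name `agglomerate` is hypothetical.

```python
from itertools import combinations

# Corpus data from the slide: bid phrase -> ads that bid on it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def agglomerate(corpus):
    """Greedy bottom-up merging of bid-phrase clusters (simplified sketch)."""
    # Each cluster is a frozenset of bid phrases mapped to the union of their ads.
    clusters = {frozenset([phrase]): set(ads) for phrase, ads in corpus.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the most similar pair of clusters by Jaccard over their ad sets.
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: jaccard(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((sorted(a), sorted(b)))
        clusters[a | b] = clusters.pop(a) | clusters.pop(b)
    return merges

merges = agglomerate(corpus)
print(merges)
```

On the slide's data, B and C merge first (similarity 0.5), and A joins them afterwards, which matches the order given above.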


AdSearch cont’d

• Index structure for efficient ad search
    • Mapping clusters of Bid Phrases to Index Terms
    • Block-based Index Structure
    • Dictionaries


Mapping clusters of Bid Phrases to Index Terms

[Figure: clusters of the bid phrases A, B, C, D, E]


Block-based Index Structure

Traditional index: 3 inverted lists
• Each contains: Index = bid phrase, List = ads

Block-based index: 1 inverted list
• Contains: Index = 3 bid phrases, List = ads and bid phrases

Query = B


Block-based Index Structure cont’d

Advantages over the traditional method
• Similar bid phrases and their corresponding ads are placed together
• Fewer merge operations are needed, and they can sometimes be avoided entirely
• Expanding phrase B with phrases A and C is not efficient in the traditional method
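To make the contrast concrete, here is a small sketch of both layouts for the running example. It assumes a traditional index with one sorted posting list per bid phrase, and a block stored as a single sorted list of (ad, bid phrases) postings. The helper names and the in-memory layout are assumptions for illustration, not the paper's actual on-disk format.

```python
import heapq

# Traditional inverted index: one sorted posting list per bid phrase.
inverted = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}

def expand_traditional(index, phrases):
    """Expanding a query needs a multi-way merge over one list per phrase."""
    result = []
    for ad in heapq.merge(*(index[p] for p in phrases)):
        if not result or result[-1] != ad:  # drop duplicates across lists
            result.append(ad)
    return result

def build_block(index, phrases):
    """Store similar bid phrases together: one list of (ad, bid phrases)."""
    block = {}
    for phrase in phrases:
        for ad in index[phrase]:
            block.setdefault(ad, []).append(phrase)
    return sorted(block.items())

def expand_block(block):
    """With a block, the expanded result is a single sequential scan."""
    return [ad for ad, _ in block]

block = build_block(inverted, ["A", "B", "C"])
print(expand_traditional(inverted, ["A", "B", "C"]))
print(expand_block(block))
```

Both calls return the same ads; the difference is that the block version reads one list and performs no merge at query time, because the merge was done once when the block was built.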


Dictionaries

Dictionary D is used to
• record the mapping from a bid phrase to its corresponding artificial word
• locate the block corresponding to a bid phrase

Bid phrase    Artificial word (path)
A             6:0
B             6_5:1
C             6_5:2


Dictionaries cont’d

Dictionary C (counter dictionary) is used to record the number of distinct ads per cluster

Corpus data C
A = {Ad0, Ad3}
B = {Ad1, Ad2, Ad3}
C = {Ad2, Ad3, Ad4}

Cluster path    Distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4

Entries: (6, 2), (6_5, 4)
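The two dictionaries can be sketched as plain Python dicts. The code below takes dictionary D exactly as listed on the slides and derives the counter dictionary from it and the corpus data, assuming the part of an artificial word before the colon is the cluster path. The names `D`, `build_counter` and `counter` are chosen for this example.

```python
# Corpus data from the slide: bid phrase -> ads that bid on it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

# Dictionary D from the slide: bid phrase -> artificial word "path:position".
D = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}

def build_counter(corpus, D):
    """Counter dictionary: cluster path -> number of distinct ads (sketch)."""
    ads_per_path = {}
    for phrase, word in D.items():
        # Assumption: the text before ':' is the cluster path.
        path = word.partition(":")[0]
        ads_per_path.setdefault(path, set()).update(corpus[phrase])
    return {path: len(ads) for path, ads in ads_per_path.items()}

counter = build_counter(corpus, D)
print(counter)
```

Using sets when collecting ads is what makes the count "distinct": Ad2 and Ad3 appear under both B and C but are counted once for path 6_5.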


AdSearch cont’d

• Query processing
    • Finding Related Bid Phrases with Corresponding Ads
    • Ranking Top-k Relevant Ads


Finding Related Bid Phrases with Corresponding Ads

The process to find related bid phrases
• Input: user queries
• Look up dictionary D to get the corresponding artificial words
• Find the minimum clusters that contain enough ads
    • e.g. for the top 2 ads, M = 1.5 * 2 = 3

Query: ABD

Bid phrase    Artificial word (path)
A             6:0
B             6_5:1
C             6_5:2

Cluster path    Distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4
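The lookup steps above might be sketched as follows. The factor 1.5 and both dictionaries come from the slides; everything else is an assumption made for illustration, in particular that a path such as 6_5 can be widened to its prefix 6 when the current cluster holds too few ads, and that the search stops at the top-level path. The function names are hypothetical.

```python
import math

# Dictionary D (bid phrase -> artificial word) and the counter dictionary
# (cluster path -> number of distinct ads), as listed on the slides.
D = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}
counter = {"6": 2, "6_5": 4}

def required_ads(k, factor=1.5):
    """Number of candidate ads M needed for a top-k request (M = 1.5 * k)."""
    return math.ceil(factor * k)

def smallest_cluster(phrase, k):
    """Return the first cluster path, starting from the phrase's own
    cluster, that holds at least M distinct ads (simplified sketch)."""
    m = required_ads(k)
    path = D[phrase].partition(":")[0]
    while True:
        # Stop when the cluster is big enough or there is no wider prefix.
        if counter.get(path, 0) >= m or "_" not in path:
            return path
        path = path.rsplit("_", 1)[0]  # assumption: drop the last path segment

print(required_ads(2))
print(smallest_cluster("B", 2))
```

For a top-2 request, M is 3; bid phrase B sits in cluster 6_5, which already holds 4 distinct ads, so no widening is needed.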


Finding Related Bid Phrases with Corresponding Ads

The process to find related bid phrases
• Return the clusters; those containing at least one bid phrase are stored in one group
• Perform a multi-way merge operation to get the final results

Ad             Ad1    Ad2    Ad3      Ad4
Bid phrases    A      B,C    A,B,C    C


Ranking Top-k Relevant Ads

• A procedure expands the user query with related bid phrases and gets a list of ads
• To get the top k, use a scoring function

Q: query
B(x): set of related bid phrases
Similarity between x and y
tfidf(y, ad): term frequency and inverse document frequency
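The scoring formula itself did not survive in this transcript, so the sketch below assumes one plausible form built only from the symbols listed above: for each query phrase x in Q and each related phrase y in B(x), add similarity(x, y) * tfidf(y, ad). Treat that form as an assumption rather than the paper's definition. The `tfidf` function is a toy stand-in (1.0 if the ad carries the bid phrase, else 0.0), and `related` is a hypothetical B(x) for the running example.

```python
import heapq

# Corpus data from the slides: bid phrase -> ads that bid on it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def sim(x, y):
    """Similarity between bid phrases x and y: Jaccard over their ad sets."""
    union = corpus[x] | corpus[y]
    return len(corpus[x] & corpus[y]) / len(union) if union else 0.0

def tfidf(phrase, ad):
    """Toy stand-in for tfidf(y, ad); a real system would use term statistics."""
    return 1.0 if ad in corpus[phrase] else 0.0

# Hypothetical B(x): B expanded with its cluster mate C.
related = {"B": ["B", "C"]}

def score(query, ad):
    """Assumed scoring form: sum of sim(x, y) * tfidf(y, ad) over x in Q, y in B(x)."""
    return sum(sim(x, y) * tfidf(y, ad) for x in query for y in related[x])

def top_k(query, ads, k):
    return heapq.nlargest(k, ads, key=lambda ad: score(query, ad))

ads = ["Ad0", "Ad1", "Ad2", "Ad3", "Ad4"]
best = top_k(["B"], ads, 2)
print(best)
```

With this toy setup, Ad4 is retrieved for query B even though it only carries phrase C, which is the effect query expansion is meant to have; it simply scores lower than ads that match B directly.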


Experimental evaluation

Both Chinese and English


Experimental evaluation cont’d

Name: Description
• CQS1 (Chinese) or EQS1 (English): 100 randomly sampled bid phrases, each associated with a few distinct ads
• CQS2 (Chinese) or EQS2 (English): 100 selected pairs of bid phrases, where each pair returns ads associated with both of its bid phrases
• CQS3 (Chinese) or EQS3 (English): constructed similarly, with queries composed of 3 to 4 bid phrases
• CQF (Chinese Frequent Query set) and EQF (English Frequent Query set): built from 100 popular bid phrases


Experimental evaluation cont’d

Evaluation of the clustering step


Experimental evaluation cont’d

Efficiency evaluation
• AdSearch was implemented with fixed and unfixed block sizes
• The block size is defined as the fraction of distinct ads in the block relative to all ads
• AdSearch(0.001): the block size 0.001 fixes the number of distinct ads in each block
    • For example, for the Chinese data set: 524,868 * 0.001 = 525 distinct ads per block (rounded)
• Inv: performs query expansion on top of the traditional inverted index
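The block-size arithmetic for the Chinese data set can be checked in a couple of lines. The function name `ads_per_block` is made up for this example, and rounding to the nearest integer is an assumption that happens to match the slide's figure of 525.

```python
def ads_per_block(total_ads, block_size):
    """Distinct ads per block when block size is a fraction of all ads."""
    return round(total_ads * block_size)

chinese_ads = 524_868  # number of ads in the Chinese data set, from the slide
print(ads_per_block(chinese_ads, 0.001))
```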


Experimental evaluation cont’d

Effectiveness evaluation

• 50 queries were randomly selected
• 10 people were invited to evaluate the ads returned by AdSearch and Baidu


Experimental evaluation cont’d

Effectiveness evaluation


Conclusion

Introduced the AdSearch system, which consists of
• Bid phrase clustering
    • Constructs a bipartite graph from the bid phrases and ads
    • Uses agglomerative iterative clustering to cluster similar ads
• Index structure for efficient ad search
    • Uses a block-based index structure to index all ads and bid phrases
    • Uses dictionaries to record mappings between bid phrases and ads
• Query processing
    • Explained how ads are retrieved and ranked to get the top-k results


THANK YOU


Introduction cont’d

[Figure: diagram for Q = “job training” showing All Docs, Relevant Docs (R), Relevant Ads, and Relevant Ads in the Ads set (Ra)]