

An important problem in sponsored search advertising is keyword generation, which bridges the gap between the keywords that advertisers bid on and the queries issued by their potential customers. We propose an efficient relevance feedback-based interactive model for keyword generation in sponsored search advertising. We formulate the ranking of relevant terms as a supervised learning problem and suggest new terms for the seed term by leveraging user relevance feedback. Active learning is employed to select the most informative samples from a set of candidate terms for user labeling. Experiments show that our approach significantly improves the relevance of the generated terms while requiring little user effort.

Represent each seed term by a characteristic document that contains the snippets retrieved from its top search hits. As preprocessing, remove stop words from these documents and stem the terms using Porter's stemmer. The terms with the highest TF-IDF weights in a characteristic document are selected as the candidates for the corresponding seed term.
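The candidate-generation step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes IDF is computed over the collection of all seeds' characteristic documents, and it swaps Porter's stemmer for a crude suffix stripper (`crude_stem`) so the example runs without third-party packages. The stop-word list, the toy snippets and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "is", "are"}


def crude_stem(word: str) -> str:
    """Crude suffix stripper standing in for Porter's stemmer.

    In practice one would call nltk.stem.PorterStemmer().stem(word) instead.
    """
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def candidate_terms(char_docs: dict[str, str], seed: str, k: int) -> list[str]:
    """Top-k TF-IDF terms of the seed's characteristic document.

    char_docs maps every seed term to its characteristic document (the
    concatenated snippets of its top search hits); IDF is computed over this
    collection of characteristic documents (an assumption).
    """
    docs = {s: preprocess(text) for s, text in char_docs.items()}
    n_docs = len(docs)
    # Document frequency: how many characteristic documents contain each term.
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    tf = Counter(docs[seed])
    weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    # Highest weight first; ties broken alphabetically so the output is deterministic.
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]


char_docs = {  # toy snippets, not data from the poster
    "shoes": "Buy shoes, sandals and boots. Boots and heels for women. Shoes sale.",
    "books": "Buy books online. Books, novels and textbooks for sale.",
    "pets": "Pet supplies for dogs and cats. Buy pet food online.",
}
print(candidate_terms(char_docs, "shoes", 3))  # ['boot', 'shoe', 'heel']
```

Terms shared by every characteristic document (such as "buy") get an IDF of zero, so generic words drop out of the candidate list.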

Advertising Keyword Generation Using Active Learning

Hao Wu, Guang Qiu, Xiaofei He, Yuan Shi, Mingcheng Qu, Jing Shen, Jiajun Bu and Chun Chen
College of Computer Science and Technology, Zhejiang University, China


Sponsored search is a successful form of online advertising, with annual revenue in the billions of dollars.

Advertisers: With the goal of branding and/or marketing, advertisers create ads and bid on relevant keywords (or terms). The auction winners have their ads displayed as sponsored links alongside the organic search results (as illustrated in Figure 1).

Web Users: Search on keywords (or terms) and click relevant sponsored links.

The two salient features of our relevance feedback-based interactive model:

Leveraging active learning to select the most informative samples for user labeling.
Achieving a good tradeoff between performance and user satisfaction.

Future Work:

Explore the use of phrases.
Extend the model to other applications, such as term clustering.

Contact:

Hao Wu, Master's Candidate
College of Computer Science and Technology, Zhejiang University, China

Phone: (+86) 13819132899

Thanks to Jian Pei (School of Computing Science, Simon Fraser University) for many invaluable discussions on this research.

1. Abstract

2. Introduction

3. Methodology

4. Experimental Results

Dataset Setup:

Seed terms: 100 category names from eBay and Amazon, spread widely over different topics.
Candidates: 400 candidate terms for each seed term.
Benchmark: 20 seed terms for user evaluation, namely Books, Software, Pets, Furniture, Bedding, Lighting, Hair, Toys, Baby, Shoes, Jewelry, Watches, Gift, Motorcycle, Battery, Wedding, Phones, Guitar, Skin and Travel.
Results are presented in Figure 3.

5. Conclusion and Future Work

Acknowledgements

Figure 3: Average Precision Curves (TED for Transductive Experimental Design, Random for Random Sampling, Baseline1 for Pseudo-Relevance Feedback, and Baseline0 for Original Candidate Terms)

3.1 Candidate Term Generation

3.2 Active Learning

Figure 1: Sponsored Links in Pink Rectangles Alongside the Organic Search Results

Keyword Generation:

Keyword generation (or suggestion) methods help advertisers find all the terms relevant to their products or services. Term relevance is crucial for capturing valid queries and clicks from potential customers. For example, given an initial keyword that represents a concept, such as "shoes" (i.e., a seed term), "footwear", "sandals", "heel", "boots" and "clogs" can be good relevant terms (suggestions).

Results:

Both random sampling and TED significantly outperform pseudo-relevance feedback, since they exploit additional user relevance feedback. TED consistently performs best.
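How precision curves like those in Figure 3 might be computed is sketched below. The poster does not define the metric precisely; this sketch assumes each curve plots precision among the top n re-ranked terms, averaged over the benchmark seed terms. The rankings and relevance judgments are invented toy values, and the function names are hypothetical.

```python
def precision_at(ranked: list[str], relevant: set[str], n: int) -> float:
    """Fraction of the top-n ranked terms that the user judged relevant."""
    return sum(1 for t in ranked[:n] if t in relevant) / n


def precision_curve(rankings: dict[str, list[str]],
                    judgments: dict[str, set[str]],
                    cutoffs: list[int]) -> list[float]:
    """Precision@n for each cutoff n, averaged over all seed terms."""
    return [
        sum(precision_at(rankings[s], judgments[s], n) for s in rankings) / len(rankings)
        for n in cutoffs
    ]


# Toy re-ranked candidate lists and user judgments (not the poster's data).
rankings = {"shoes": ["boots", "ebay", "sandals", "cheap"],
            "books": ["novel", "textbook", "free", "kindle"]}
judgments = {"shoes": {"boots", "sandals"}, "books": {"novel", "textbook", "kindle"}}
print(precision_curve(rankings, judgments, [1, 2, 4]))  # [1.0, 0.75, 0.625]
```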

The Interactive Model:

Use search engines to match a seed term with a large set of ranked candidate terms. Users provide relevant/irrelevant labels on just a few candidates (training samples selected by the active learning algorithm). All candidate terms are then re-ranked by the learned relevance model. The framework is illustrated in Figure 2.

Figure 2: Proposed Framework for Keyword Generation
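One round of the interactive model (select a few candidates, ask the user, learn, re-rank) can be sketched as below. The poster does not specify the form of the relevance model, so this sketch substitutes a simple centroid-difference linear scorer; the default selector is random sampling (the "Random" baseline), and an active learner such as TED would be passed in as `select`. All names, the two-dimensional toy features and the simulated user are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
Selector = Callable[[list[str], int], list[str]]


def random_select(pool: list[str], n: int) -> list[str]:
    """Random-sampling baseline for choosing which candidates to label."""
    return random.Random(0).sample(pool, n)


def centroid(vectors: list[Vector], dim: int) -> list[float]:
    """Component-wise mean (zero vector if no vectors were labeled)."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def interactive_rerank(
    features: dict[str, Vector],
    ask_user: Callable[[str], bool],
    n_labels: int,
    select: Optional[Selector] = None,
) -> list[str]:
    """One round: pick a few candidates, collect relevant/irrelevant labels
    from the user, fit a relevance model and re-rank every candidate."""
    terms = list(features)
    chosen = (select or random_select)(terms, n_labels)
    labels = {t: ask_user(t) for t in chosen}
    dim = len(features[terms[0]])
    pos = centroid([features[t] for t in chosen if labels[t]], dim)
    neg = centroid([features[t] for t in chosen if not labels[t]], dim)
    # Stand-in relevance model: project onto the direction from the
    # irrelevant centroid to the relevant centroid.
    w = [p - q for p, q in zip(pos, neg)]
    score = {t: sum(wi * xi for wi, xi in zip(w, features[t])) for t in terms}
    return sorted(terms, key=lambda t: (-score[t], t))


features = {"boots": [0.9, 0.8], "sandals": [0.8, 0.7], "ebay": [0.1, 0.9], "cheap": [0.2, 0.1]}
relevant = {"boots", "sandals"}  # simulated user judgments
ranking = interactive_rerank(features, lambda t: t in relevant, 2,
                             select=lambda pool, n: ["boots", "cheap"])
print(ranking)  # ['boots', 'sandals', 'ebay', 'cheap']
```

Only two labels were requested here, yet the unlabeled "sandals" is ranked above "ebay", which is the effect the interactive model relies on.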

Performance & User Satisfaction:

The number of samples selected for labeling should be as small as possible, since labeling is labor-intensive. At the same time, the quality of the training samples directly affects performance. A natural strategy is therefore to select the most informative samples using active learning.

Feature Representation for each (seed, candidate) term pair:

TF and TF-IDF of the candidate in the seed's search-snippet document; TF of the seed in the candidate's search-snippet document; search-snippet similarity; common search URLs.
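The five features listed above could be assembled for one (seed, candidate) pair roughly as follows. This is a sketch under assumptions the poster leaves open: snippet similarity is taken to be cosine similarity of term-frequency vectors, the common-URL feature is taken to be a raw count, and the IDF table is assumed to be precomputed. Function names and inputs are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_features(seed, cand, seed_doc, cand_doc, seed_urls, cand_urls, idf):
    """Feature vector for one (seed, candidate) pair.

    seed_doc / cand_doc: preprocessed tokens of each term's search-snippet document.
    seed_urls / cand_urls: sets of URLs returned for each term.
    idf: hypothetical precomputed inverse document frequencies.
    """
    seed_tf, cand_tf = Counter(seed_doc), Counter(cand_doc)
    tf_cand_in_seed = seed_tf[cand]
    return [
        float(tf_cand_in_seed),                # TF of candidate in seed's snippets
        tf_cand_in_seed * idf.get(cand, 0.0),  # TF-IDF of candidate in seed's snippets
        float(cand_tf[seed]),                  # TF of seed in candidate's snippets
        cosine(seed_tf, cand_tf),              # snippet similarity
        float(len(seed_urls & cand_urls)),     # number of common search URLs
    ]


f = pair_features(
    "shoe", "boot",
    seed_doc=["shoe", "boot", "sale", "boot"],
    cand_doc=["boot", "leather", "shoe"],
    seed_urls={"u1", "u2", "u3"},
    cand_urls={"u2", "u3", "u4"},
    idf={"boot": 2.0},
)
print(f[:3], round(f[3], 4), f[4])  # [2.0, 4.0, 1.0] 0.7071 2.0
```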

Transductive Experimental Design (TED):

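Select candidates that are hard to predict and representative of the unlabeled candidates. Consider a regularized linear regression problem over the selected samples $X = [x_1, \ldots, x_m]$ with labels $y = (y_1, \ldots, y_m)^\top$:

$$J(w) = \sum_{i=1}^{m} \left(w^\top x_i - y_i\right)^2 + \mu \|w\|^2$$

The maximum likelihood estimate of w is given by

$$\hat{w} = \left(XX^\top + \mu I\right)^{-1} X y$$

The quality of predictions on the target data V is characterized by the predictive covariance $V^\top C_w V$, where $C_w \propto \left(XX^\top + \mu I\right)^{-1}$ is the inverted Hessian of J(w) (up to a constant factor). One should find a subset X of V that minimizes the total predictive variance:

$$\min_{X \subseteq V,\ |X| = m} \operatorname{Tr}\left(V^\top \left(XX^\top + \mu I\right)^{-1} V\right)$$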
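Solving the subset-selection problem exactly is combinatorial, so a greedy sequential approximation is a common choice. The sketch below assumes a linear kernel $K = V^\top V$ over the candidates' feature vectors: each step adds the candidate $i$ that most reduces the trace objective, with gain $\|K_{:,i}\|^2 / (K_{ii} + \mu)$, and then updates the residual kernel $K \leftarrow K - K_{:,i} K_{:,i}^\top / (K_{ii} + \mu)$. The function name, the default $\mu$ and the toy points are hypothetical, and this is not presented as the authors' implementation.

```python
def ted_select(points: list[list[float]], m: int, mu: float = 0.1) -> list[int]:
    """Greedy sequential TED with a linear kernel.

    Each step picks the candidate whose kernel column most reduces the total
    predictive variance, ||K_i||^2 / (K_ii + mu), then updates the residual
    kernel K <- K - K_i K_i^T / (K_ii + mu).
    """
    n = len(points)
    # Linear kernel: K[i][j] = <v_i, v_j>.
    K = [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]
    selected: list[int] = []
    for _ in range(min(m, n)):
        best, best_gain = -1, -1.0
        for i in range(n):
            if i in selected:
                continue
            gain = sum(K[j][i] ** 2 for j in range(n)) / (K[i][i] + mu)
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        col = [K[j][best] for j in range(n)]
        denom = K[best][best] + mu
        K = [[K[a][b] - col[a] * col[b] / denom for b in range(n)] for a in range(n)]
    return selected


# Two toy clusters of feature vectors; TED should label one point from each.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(ted_select(points, 2))
```

After the first pick, the residual kernel shrinks for every candidate similar to it, so the second pick comes from the other cluster: the selected samples are both hard to predict and representative, as the TED criterion intends.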