A Statistical Comparison of Tag and Query Logs

Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie
SIGIR 2009

June 4, 2010
Hyunwoo Kim

Contents
– Introduction
– Building a Dataset
– Are the Distributions Similar?
– Investigating Website Content
– Conclusion

Introduction

Questions
1. Are queries and tags similar across URLs?
2. Can tag data be used to approximate user queries to a search engine?
3. Can query logs be used to suggest new tags for a particular webpage?
4. For what types of websites is the correlation between the term distributions for queries and tags the highest?
5. Which of the distributions, tags or queries, is most closely related to the content of the clicked websites?

Building a Dataset

AOL query log
– Sizable
– Recent (2006)
– English queries
– Available to academic researchers
– 657,426 users
– A period of 3 months from March to May, 2006

Delicious tags
– Collaborative tagging system

Final dataset: 4145 complete URLs
– Google query, stemming, pruning
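The slide names stemming and pruning but does not spell out the steps. The sketch below shows one way a per-URL term distribution could be built from raw query or tag strings. Everything in it is an assumption for illustration: `naive_stem` is a crude stand-in for a real stemmer, and `min_count` is a hypothetical pruning threshold, not a value taken from the paper.

```python
import re
from collections import Counter


def naive_stem(term: str) -> str:
    """Crude stand-in for a real stemmer (assumption): drop one trailing 's'."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def term_distribution(texts, min_count=2, stem=naive_stem):
    """Tokenize, stem, prune rare terms, and normalize counts to probabilities."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            counts[stem(token)] += 1
    # Pruning: keep only terms seen at least min_count times (hypothetical threshold).
    kept = {term: c for term, c in counts.items() if c >= min_count}
    total = sum(kept.values())
    return {term: c / total for term, c in kept.items()} if total else {}


# Made-up queries for a single URL, for illustration only.
queries = ["new york times", "ny times news", "times crossword"]
dist = term_distribution(queries, min_count=2)
```

Note that the crude stemmer conflates "news" with "new"; a proper stemmer would be swapped in through the `stem` parameter.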

Are the Distributions Similar?

Example URL: http://www.nytimes.com
[Figure not recovered; surviving labels: "tags", "or"]

Kullback-Leibler divergence
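The formula on this slide did not survive extraction. The standard definition is D_KL(P || Q) = sum over terms t of P(t) * log(P(t) / Q(t)). A minimal sketch of that definition follows, assuming each distribution is a dict mapping terms to probabilities. The example distributions are made up, and returning infinity for terms missing from Q is a choice of this sketch; any smoothing the authors may have applied is not shown here.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in bits, for dicts mapping term -> probability."""
    total = 0.0
    for term, p_t in p.items():
        if p_t == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        q_t = q.get(term, 0.0)
        if q_t == 0.0:
            return math.inf  # P puts mass where Q has none
        total += p_t * math.log2(p_t / q_t)
    return total


# Made-up term distributions for one URL.
query_dist = {"news": 0.5, "times": 0.5}
tag_dist = {"news": 0.25, "times": 0.75}

kl_qt = kl_divergence(query_dist, tag_dist)
kl_tq = kl_divergence(tag_dist, query_dist)
```

The two directions give different values, so the measure is not symmetric.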

Jensen-Shannon divergence
– Symmetric measure

Overlap coefficient
– Vq: query logs
– Vr: tags
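The formulas for these two measures were also lost in extraction. The sketch below uses the standard definitions: the Jensen-Shannon divergence as the average KL divergence of each distribution from their midpoint M = (P + Q) / 2, and the overlap coefficient as |A ∩ B| / min(|A|, |B|) over two vocabularies. Whether the slide used exactly these forms is an assumption, and the example vocabularies are made up.

```python
import math


def _kl_to_mixture(p, m):
    """D_KL(P || M) in bits; M is non-zero wherever P is non-zero."""
    return sum(p_t * math.log2(p_t / m[t]) for t, p_t in p.items() if p_t > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}
    return 0.5 * _kl_to_mixture(p, m) + 0.5 * _kl_to_mixture(q, m)


def overlap_coefficient(vocab_a, vocab_b):
    """|A & B| / min(|A|, |B|) for two vocabularies (any iterables of terms)."""
    a, b = set(vocab_a), set(vocab_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Made-up vocabularies for one URL.
query_vocab = {"news", "times", "nyt"}
tag_vocab = {"news", "times", "newspaper", "media"}
overlap = overlap_coefficient(query_vocab, tag_vocab)  # 2 shared terms / min(3, 4)

js = js_divergence({"news": 0.5, "times": 0.5}, {"news": 0.25, "times": 0.75})
```

Unlike the KL divergence, the Jensen-Shannon divergence stays finite when a term appears in only one of the two distributions.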

Open Directory Project

Investigating Website Content

Conclusion

Similarity between query terms and tags
– Vocabularies contain a large amount of overlap
– Term frequency distributions are correlated
– Similarity is not dependent on the topic area

Queries are more similar to content than tags are
Queries and tags are more similar to one another than to content

Future work
– Models for automatically removing noise from the tag and query logs
– Techniques for predicting useful tags from query distributions
– Techniques for the effective use of tag data to improve different forms of Web search

Thank you
