Evaluating Novelty and Diversity
Charles Clarke
School of Computer Science
University of Waterloo
two talks in one!
Goals for Evaluation Measures
• meaningful
• tractable
• reusable
Evaluation Framework
We examine a framework for evaluation.
Specific measures covered by the framework include:
Clarke et al. (SIGIR ’08)
Agrawal et al. (WSDM ’09)
Clarke et al. (ICTIR ’09)
Talk #1: Evaluating Diversity
Query: “windows”
1. Microsoft Windows
   a) When will Windows 7 be released?
   b) What’s the Windows update URL?
   c) I want to download Windows Live Essentials
2. House windows
   a) Where can I buy replacement windows?
   b) What brands are available?
   c) Aluminum or vinyl?
3. Windows Restaurant, Las Vegas
Nuggets
Nugget = any binary property of a document
• Provides address of a Pella dealer.
• Discusses history of the Windows OS.
• Is the Windows update page.
(factual, topical, and navigational, respectively)
Problem: potentially thousands per query.
Evaluation
• Model user information needs using nuggets. Different users will be interested in different combinations of nuggets.
• Express judgments in terms of nuggets. Judgments may be automatic or manual. Judgments are binary: Does this document contain this nugget?
• Nuggets link users and documents
Interdependencies
Problem: Complex interdependencies between nuggets.
Three possible simplifying assumptions:
1. User interested in nugget A will always be interested in nugget B.
2. User interested in nugget A will never be interested in nugget B.
3. Nuggets A and B are independent.
Possible Assumption #1
If a user interested in nugget A will always be interested in nugget B, then A and B can be treated as the same nugget.
Possible Assumption #2
A user interested in nugget A will never be interested in nugget B (and vice versa). A user’s interest in nugget A depends on their interest in nugget B.
Nugget A and nugget B may be viewed as representing different interpretations of the query.
Query: “windows”
1. Microsoft Windows
   a) When will Windows 7 be released?
   b) What’s the Windows update URL?
   c) I want to download Windows Live Essentials
2. House windows
   a) Where can I buy replacement windows?
   b) What brands are available?
   c) Aluminum or vinyl?
3. Windows Restaurant, Las Vegas
Query Interpretations
• Assume M interpretations
• Compute any effectiveness measure with respect to each interpretation (Sj)
• Compute weighted average (where pj is probability of interpretation j)
• Agrawal et al., 2009
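The weighted average over interpretations can be sketched as follows. This is a minimal illustration, not the formulation from Agrawal et al.: the choice of precision at k as the per-interpretation measure, the function names, the document IDs, and the probabilities are all assumptions made for the example.

```python
from typing import Callable, Sequence, Set, Tuple


def precision_at_k(relevant: Set[str], ranking: Sequence[str], k: int) -> float:
    """Fraction of the top-k results relevant to a single interpretation."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def intent_aware(
    ranking: Sequence[str],
    interpretations: Sequence[Tuple[float, Set[str]]],
    measure: Callable[[Set[str], Sequence[str], int], float],
    k: int,
) -> float:
    """Weighted average of per-interpretation scores Sj, weighted by pj."""
    return sum(p_j * measure(relevant_j, ranking, k)
               for p_j, relevant_j in interpretations)


# Hypothetical data: (pj, documents relevant to interpretation j).
interpretations = [
    (0.6, {"d1", "d4"}),  # e.g. Microsoft Windows
    (0.3, {"d2"}),        # e.g. house windows
    (0.1, {"d3"}),        # e.g. Windows Restaurant, Las Vegas
]
ranking = ["d1", "d2", "d5", "d4"]

score = intent_aware(ranking, interpretations, precision_at_k, k=4)
```

Any other per-interpretation measure with the same signature could be passed in place of `precision_at_k`.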
Possible Assumption #3
A user’s interest in nugget A is independent of their interest in nugget B.
The probability that the user is interested in nugget A is a constant (pA).
The probability that the user is interested in nugget B is a constant (pB).
Query: “windows”
1. Microsoft Windows
   a) When will Windows 7 be released?
   b) What’s the Windows update URL?
   c) I want to download Windows Live Essentials
2. House windows
   a) Where can I buy replacement windows?
   b) What brands are available?
   c) Aluminum or vinyl?
3. Windows Restaurant, Las Vegas
Relevance framework
A document is relevant if it contains any relevant information, i.e., at least one of the N nuggets that the user is interested in.
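The formula on this slide did not survive extraction. Under the independence assumption (#3), one form consistent with the statement above, and with the formulation in Clarke et al. (SIGIR ’08), is given below. The notation is an assumption: u is the user, d the document, and n_i the i-th of the N nuggets.

$$
P(R = 1 \mid u, d) \;=\; 1 - \prod_{i=1}^{N} \bigl(1 - P(n_i \in u)\, P(n_i \in d)\bigr)
$$

Each factor is the probability that nugget i fails to make the document relevant, either because the user is not interested in it or because the document does not contain it.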
Relevance
• Assume constant user probabilities
• Assume constant document probabilities
• J(d, i) = 1 iff document d is judged to contain nugget i
count the nuggets
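With constant probabilities, estimating relevance reduces to counting judged nuggets. The sketch below assumes the form used in Clarke et al. (SIGIR ’08), where the user probability is a constant `gamma` and the document probability is a constant `alpha` whenever J(d, i) = 1; the parameter names, the judgment table, and the values are illustrative assumptions.

```python
from typing import Dict, Set


def probability_of_relevance(doc: str,
                             judgments: Dict[str, Set[int]],
                             gamma: float,
                             alpha: float) -> float:
    """Estimate P(relevant) for one document from its nugget count.

    judgments maps a document ID to the set of nuggets i with J(d, i) = 1.
    Each judged nugget independently fails to satisfy the user with
    probability (1 - gamma * alpha), so only the count matters.
    """
    count = len(judgments.get(doc, set()))
    return 1.0 - (1.0 - gamma * alpha) ** count


# Hypothetical binary judgments: d1 contains nuggets 1 and 2, d2 none.
judgments = {"d1": {1, 2}, "d2": set()}

p_d1 = probability_of_relevance("d1", judgments, gamma=0.5, alpha=0.5)
p_d2 = probability_of_relevance("d2", judgments, gamma=0.5, alpha=0.5)
```

A document with more judged nuggets gets a higher estimate, with diminishing returns as the count grows.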
Probability of Relevance
Estimated probability of relevance replaces relevance in standard evaluation measures, including nDCG, MAP, and Rank-biased precision.
Assumptions #2 and #3 can then be combined.
Other estimation methods possible.
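As one illustration of the substitution, the sketch below uses estimated probabilities of relevance as the gain values in nDCG. It is a simplified sketch: the log2 discount is the common choice, the ideal ranking is approximated by re-sorting the same list rather than all judged documents, and the probability values are made up for the example.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with the usual log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """DCG normalized by the DCG of the same gains in descending order.

    Simplification: a full evaluation would build the ideal ranking from
    all judged documents, not only those that were retrieved.
    """
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical estimated probabilities of relevance, in ranked order.
estimated = [0.0, 0.4375, 0.25]
score = ndcg(estimated)
```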
Research Issues (talk #1)
• Identifying nuggets automatically
  – Clustering
  – Co-clicks
  – Query refinement
• Automatic judging
  – Patterns
  – Classification
• How many nuggets are enough?
• Estimating probability of relevance
Conclusions (talk #1)
• Evaluating diversity requires us to model and represent the diversity.
• Nuggets represent one possible solution.
• Simple user model; simple assumptions; simple judging.
Questions?
Intermission
The TREC 2009 Web Track
• traditional adhoc task
• novelty and diversity task
• ClueWeb09 dataset (one billion pages)
• explore effectiveness measures
• http://plg.uwaterloo.ca/~trecweb
Intermission: Free sample topic
<topic number=0>
  <query> physical therapist </query>
  <description> The user requires information regarding the profession and the services it provides. </description>
  <subtopic number=1> What does a physical therapist do? </subtopic>
  <subtopic number=2> Where can I find a physical therapist? </subtopic>
  <subtopic number=3> How much does physical therapy cost per hour? </subtopic>
  …
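A sample topic like this can be read into a simple structure in which each subtopic plays the role of a nugget. The sketch below is an assumption-laden illustration, not official TREC tooling: it uses regular expressions rather than an XML parser because the sample as shown is truncated and has unquoted attributes, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

# The sample topic as it appears on the slide, truncation included.
TOPIC_TEXT = """<topic number=0> <query> physical therapist </query>
<description> The user requires information regarding the profession
and the services it provides. </description>
<subtopic number=1> What does a physical therapist do? </subtopic>
<subtopic number=2> Where can I find a physical therapist? </subtopic>
<subtopic number=3> How much does physical therapy cost per hour? </subtopic> …"""

SUBTOPIC_RE = re.compile(r"<subtopic number=(\d+)>\s*(.*?)\s*</subtopic>", re.DOTALL)
QUERY_RE = re.compile(r"<query>\s*(.*?)\s*</query>", re.DOTALL)


def parse_query(text: str) -> Optional[str]:
    """Return the query string, or None if no <query> element is found."""
    match = QUERY_RE.search(text)
    return match.group(1) if match else None


def parse_subtopics(text: str) -> Dict[int, str]:
    """Map each subtopic number to its text."""
    return {int(number): body for number, body in SUBTOPIC_RE.findall(text)}


query = parse_query(TOPIC_TEXT)
subtopics = parse_subtopics(TOPIC_TEXT)
```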
Talk #2: Evaluating Novelty
Novelty
• Novelty depends on diversity.
• Previous talk considered probability of relevance in isolation (e.g., for the top-ranked document).
• In this talk we will examine how user context impacts the probability of relevance.
User context
Simplest context model
• Ranked list
• User scans result 1, 2, 3, 4, 5, … in order.
• Novelty of result k considered in light of the first k-1 results.
Relevance framework
Relevance
Assuming constant probabilities.
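The formula for this slide was lost in extraction. The sketch below follows the form in Clarke et al. (SIGIR ’08), under the ranked-list context model: a nugget that already appeared in r earlier results contributes with an extra factor of (1 - alpha)^r. The constants `gamma` and `alpha`, the function name, and the judgment table are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Set


def novelty_aware_gains(ranking: Sequence[str],
                        judgments: Dict[str, Set[int]],
                        gamma: float,
                        alpha: float) -> List[float]:
    """Estimate probability of relevance at each rank given earlier results.

    For the document at rank k, each judged nugget is discounted by
    (1 - alpha) ** r, where r counts the earlier documents (ranks 1..k-1)
    judged to contain the same nugget.
    """
    seen: Dict[int, int] = {}  # nugget -> number of earlier docs containing it
    gains: List[float] = []
    for doc in ranking:
        nuggets = judgments.get(doc, set())
        p_not_relevant = 1.0
        for nugget in nuggets:
            discount = (1.0 - alpha) ** seen.get(nugget, 0)
            p_not_relevant *= 1.0 - gamma * alpha * discount
        gains.append(1.0 - p_not_relevant)
        # Update the context only after scoring the current document.
        for nugget in nuggets:
            seen[nugget] = seen.get(nugget, 0) + 1
    return gains


# Hypothetical judgments: "a" and "b" repeat nugget 1, "c" adds nugget 2.
judgments = {"a": {1}, "b": {1}, "c": {2}}
gains = novelty_aware_gains(["a", "b", "c"], judgments, gamma=0.5, alpha=0.5)
```

In this example the second result repeats a nugget the user has already seen, so its estimate is lower than that of the third result, which contributes a new one.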
Beyond the ranked list
Research issues (talk #2)
• Better user models
• Prior browsing context, local context, etc.
• Evaluating impact of result presentation methods
  – Better captions
  – Query suggestions
  – Instant answers (stock quotes, weather, product prices, definitions)
Conclusions (talk #2)
• Modeling and representing diversity allows us to consider novelty.
• User models should be simple enough to be tractable.
• User models should be complex enough to be meaningful.
Questions?