TRANSCRIPT
![Page 1: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/1.jpg)
Aggregated Search: Motivations, Methods, and Milestones
Jaime Arguello
INLS 613: Text Data Mining
November 19, 2014
Tuesday, November 18, 14
![Page 2: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/2.jpg)
Traditional Information Retrieval
• retrieving full-text documents from a single collection in response to a user’s query
P(D|Q) ∝ P(Q|D) × P(D)
‣ P(Q|D): query-document similarity
‣ P(D): document prior
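A minimal sketch of how the ranking formula P(D|Q) ∝ P(Q|D) × P(D) could be computed, working in log space. The additive smoothing, the whitespace tokenizer, the toy documents, and the name `score` are illustrative assumptions and do not come from the slides.

```python
import math
from collections import Counter

def score(query, doc_text, prior, vocab_size, mu=1.0):
    """Return log P(Q|D) + log P(D), which ranks documents like P(D|Q)."""
    terms = doc_text.lower().split()
    counts = Counter(terms)
    # Query likelihood with additive smoothing (an assumed, simple choice).
    log_likelihood = sum(
        math.log((counts[t] + mu) / (len(terms) + mu * vocab_size))
        for t in query.lower().split()
    )
    return log_likelihood + math.log(prior)

# Hypothetical two-document collection with a uniform document prior.
docs = {
    "a": "pittsburgh steelers football pittsburgh",
    "b": "weather forecast today",
}
priors = {"a": 0.5, "b": 0.5}
vocab_size = len({t for text in docs.values() for t in text.split()})

ranking = sorted(
    docs,
    key=lambda d: score("pittsburgh", docs[d], priors[d], vocab_size),
    reverse=True,
)
print(ranking)
```

With a uniform prior the ranking is driven entirely by the query-document similarity term; a non-uniform prior would shift documents up or down independently of the query.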
![Page 3: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/3.jpg)
Information Needs in Today’s World
news
local business listings
![Page 4: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/4.jpg)
Information Needs in Today’s World
images
video
![Page 5: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/5.jpg)
Information Needs in Today’s World
blogs
books
![Page 6: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/6.jpg)
Information Needs in Today’s World
weather
movies
stock information
![Page 7: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/7.jpg)
Information Needs in Today’s World
translation
text-entry calculations
flight information
social media updates
![Page 8: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/8.jpg)
Information Needs in Today’s World
• different information needs are associated with:
‣ different retrievable items (representations)
‣ different definitions of relevance
‣ different information-seeking behavior
• they cannot be supported by a single search engine (in the traditional sense)
• the trend is towards specialization and integration
‣ highly specialized services brought together within a single search interface
![Page 9: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/9.jpg)
Aggregated Search
[Figure: aggregated results page with maps, images, books, and web result blocks]
![Page 10: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/10.jpg)
Aggregated Search
• aggregated search: providing users with integrated access to multiple specialized search services within a single interface
![Page 11: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/11.jpg)
Background
![Page 12: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/12.jpg)
Aggregated Search on the Web
• vertical: a search service that focuses on a particular domain or type of media
[Figure: portal interface sending the query “pittsburgh” to the verticals (web, maps, books, news, local, images, ...)]
![Page 13: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/13.jpg)
Why Aggregate Information?
• a user may not know that a vertical has relevant content (e.g., news)
• a user may want results from multiple verticals at once (e.g., planning a trip)
![Page 14: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/14.jpg)
Task Decomposition
[Figure: portal interface sending the query “pittsburgh” to the verticals (web, maps, books, news, local, images, ...)]
![Page 15: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/15.jpg)
Task Decomposition
• vertical selection: predicting which verticals, if any, are relevant to the query
[Figure: portal interface sending the query “pittsburgh” to the verticals (web, maps, books, news, local, images, ...)]
![Page 16: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/16.jpg)
Task Decomposition
• vertical results presentation: predicting where in the web results to present the vertical results
[Figure: portal interface sending the query “pittsburgh” to the verticals (web, maps, books, news, local, images, ...), next to a results page that places maps, images, and books results among the web results]
![Page 17: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/17.jpg)
Outline
• vertical selection (Arguello et al., SIGIR 2009; Arguello et al., SIGIR 2010)
• aggregated search evaluation (Arguello et al., ECIR 2011)
![Page 18: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/18.jpg)
Vertical Selection
• predicting which vertical(s), if any, are relevant to the query
[Figure: portal interface sending the query “pittsburgh” to the verticals (web, maps, books, news, local, images, ...)]
![Page 19: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/19.jpg)
Prior Research
federated search
• automatically searching across multiple distributed collections (of full-text documents)
[Figure: distributed collections C1, C2, C3, ..., Cn]
![Page 20: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/20.jpg)
Prior Research
federated search
• resource selection: predicting which collections to search
[Figure: the query “pittsburgh” and collections C1, C2, C3, ..., Cn]
![Page 21: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/21.jpg)
Prior Research
federated search
• results merging: combining their results into a single merged ranking
[Figure: the query “pittsburgh”, collections C1, C2, C3, ..., Cn, and a merged ranking]
![Page 22: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/22.jpg)
Prior Research
federated search: unsupervised resource selection
• resource relevance as a function of sample relevance
[Figure: the query q issued against a sample index built from documents sampled from each collection]
![Page 23: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/23.jpg)
Prior Research
federated search: unsupervised resource selection
1. combine cross-collection samples within a single index
[Figure: documents sampled from each collection combined into one sample index]
![Page 24: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/24.jpg)
Prior Research
federated search: unsupervised resource selection
2. conduct a retrieval to predict a set of relevant samples
[Figure: the query q issued against the sample index, returning a set of relevant samples]
![Page 25: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/25.jpg)
Prior Research
federated search: unsupervised resource selection
3. select resources as a function of sample relevance
[Figure: the query q issued against the sample index; collections are selected based on their relevant samples]
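The three steps above can be sketched as follows. This is a simplified, ReDDE-style illustration under stated assumptions: the term-overlap scoring stands in for a real retrieval model, the vote scaling by collection size over sample size is one common choice, and every name and data value here (`select_resources`, `samples`, `collection_sizes`) is hypothetical.

```python
from collections import Counter

def select_resources(query, samples, collection_sizes, top_k=3, n_select=1):
    """Pick collections whose sampled documents rank highly for the query."""
    q_terms = set(query.lower().split())
    # Step 1: combine the samples from every collection into a single index.
    index = [(name, doc) for name, docs in samples.items() for doc in docs]
    # Step 2: run a retrieval over the sample index (assumed term-overlap score).
    scored = [
        (sum(1 for t in doc.lower().split() if t in q_terms), name)
        for name, doc in index
    ]
    matches = [s for s in scored if s[0] > 0]
    top = sorted(matches, key=lambda s: s[0], reverse=True)[:top_k]
    # Step 3: each retrieved sample votes for its collection, scaled by how
    # many documents in the full collection one sampled document represents.
    votes = Counter()
    for _, name in top:
        votes[name] += collection_sizes[name] / len(samples[name])
    return [name for name, _ in votes.most_common(n_select)]

# Hypothetical sampled documents and collection sizes.
samples = {
    "news": ["pittsburgh election results", "stock market news"],
    "travel": ["pittsburgh hotels downtown", "cheap pittsburgh flights", "paris hotels"],
    "recipes": ["apple pie", "pasta sauce"],
}
collection_sizes = {"news": 1000, "travel": 3000, "recipes": 500}

print(select_resources("pittsburgh hotels", samples, collection_sizes))
```

Note that all evidence here comes from collection content, which is the limitation of this family of methods that motivates the supervised approach later in the talk.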
![Page 26: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/26.jpg)
Prior Research
federated search: limitations
• assume document type and retrieval algorithm homogeneity across resources
• derive evidence exclusively from collection content
![Page 27: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/27.jpg)
Sources of Evidence for Vertical Selection
[Figure: for the query “pittsburgh hotels”, evidence from the query, vertical content, and vertical query-logs across the verticals (web, local, images, travel, news, ...)]
![Page 28: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/28.jpg)
Sources of Evidence for Vertical Selection
[Figure: for the query “pittsburgh pics”, evidence from the query, vertical content, and vertical query-logs across the verticals (web, local, images, travel, news, ...)]
![Page 29: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/29.jpg)
Supervised Vertical Selection
• machine learning approach: predict vertical relevance as a function of a set of features
• learn a different model for each vertical
• training data: a set of queries with (positive and negative) relevance labels for each candidate vertical
![Page 30: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/30.jpg)
Features
• vertical corpus features
‣ similarity between the query and sampled documents
• vertical query-log features
‣ similarity between the query and vertical query-traffic
• query features (vertical independent)
‣ the query’s topical category (e.g., travel-related)
‣ presence of a particular term (e.g., “pittsburgh pics”)
‣ geographic named entity types (e.g., city name)
![Page 31: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/31.jpg)
Evaluation Methodology
![Page 32: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/32.jpg)
Task Formulation
single vertical prediction
• given a query, predict a single relevant vertical or predict that no vertical is relevant
![Page 33: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/33.jpg)
Task Formulation
single vertical prediction
• predict a single relevant vertical
[Figure: results page showing travel results with the web results]
![Page 34: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/34.jpg)
Task Formulation
single vertical prediction
• predict that no vertical is relevant
[Figure: results page showing web results only]
![Page 35: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/35.jpg)
Supervised Classification Framework
• logistic regression models
• use highest confidence prediction
• default to no vertical if highest confidence is below threshold
[Figure: the query Q is passed to one model per vertical (travel model, images model, news model, maps model, ...); the n + 1 possible outputs are P(travel|Q), P(images|Q), P(news|Q), P(maps|Q), ..., P(no vertical|Q)]
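A minimal sketch of the decision rule described above: one logistic regression model per vertical, take the most confident prediction, and fall back to no vertical below a threshold. The feature names, weights, and the threshold of 0.5 are hypothetical values chosen for illustration; in the setting described in the talk the weights would be learned from labeled queries.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_vertical(features, models, threshold=0.5):
    """Return (prediction, confidence) using one logistic model per vertical."""
    best_vertical, best_conf = None, 0.0
    for vertical, (weights, bias) in models.items():
        # Linear score over the query's features, squashed to a probability.
        z = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
        conf = sigmoid(z)
        if conf > best_conf:
            best_vertical, best_conf = vertical, conf
    # Default to "no vertical" when even the best model is not confident.
    if best_conf < threshold:
        return "no vertical", best_conf
    return best_vertical, best_conf

# Hypothetical hand-set models: vertical -> (feature weights, bias).
models = {
    "images": ({"has_term_pics": 3.0}, -1.0),
    "travel": ({"category_travel": 3.0}, -1.0),
}

print(predict_vertical({"has_term_pics": 1.0}, models))
print(predict_vertical({}, models))
```

Training a separate model per vertical lets each one weight the same features differently, at the cost of needing labeled queries for every vertical.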
![Page 36: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/36.jpg)
36
Evaluation Metric
single-vertical precision
• percentage of queries for which we make a correct prediction
‣ correctly predict a vertical that is relevant
‣ correctly predict that no vertical is relevant
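The metric above can be sketched as a small function. The data layout is an assumption made for illustration: `predictions` maps each query to one vertical name or `None` (no vertical), and `relevant` maps each query to the set of verticals the annotators marked relevant.

```python
def single_vertical_precision(predictions, relevant):
    """Fraction of queries with a correct single-vertical prediction."""
    correct = 0
    for query, predicted in predictions.items():
        truth = relevant.get(query, set())
        if predicted is None:
            # Correct only if no vertical is relevant to the query.
            correct += not truth
        else:
            # Correct if the predicted vertical is among the relevant ones.
            correct += predicted in truth
    return correct / len(predictions)

# Hypothetical predictions and annotations for four queries.
predictions = {"q1": "travel", "q2": None, "q3": "news", "q4": None}
relevant = {"q1": {"travel", "local"}, "q2": set(), "q3": {"images"}, "q4": {"maps"}}

print(single_vertical_precision(predictions, relevant))
```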
![Page 37: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/37.jpg)
Verticals
[Figure: bar chart of the percentage of queries (axis from 0% to 30%) for which each vertical is relevant; verticals shown: maps, jobs, movies, games, finance, tv, autos, video, sports, health, directory, music, news, images, travel, reference, local, shopping]
![Page 38: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/38.jpg)
Queries
• about 25,000 queries drawn randomly from commercial Web traffic
• human annotators assigned between 0 and 6 relevant verticals per query
• 70% of queries were assigned either one relevant vertical or none
![Page 39: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/39.jpg)
Single-Evidence Baselines
• redde: content-based resource selection method
• clarity: query difficulty measure
• qlog: similarity to vertical query-traffic
• soft.redde: redde variant
• no.rel: always predict no vertical relevant
![Page 40: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/40.jpg)
Experimental Results
![Page 41: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/41.jpg)
Results
single-vertical precision
clarity         0.254
no.rel          0.263▲
soft.redde      0.324▲
redde           0.336▲
qlog            0.368▲
classification  0.583▲
▲ statistically significant improvement in performance (p < 0.05) compared to all worse-performing methods
![Page 42: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/42.jpg)
Results
single-vertical precision
• 58% improvement over best single-evidence baseline (classification 0.583 vs. qlog 0.368)
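The 58% figure is a relative improvement of the classification approach (0.583) over the best single-evidence baseline, qlog (0.368). A short check of the arithmetic, with `relative_improvement` as a hypothetical helper name:

```python
def relative_improvement(new, baseline):
    """Relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline

gain = relative_improvement(0.583, 0.368)
print(f"{gain:.1%}")
```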
![Page 43: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/43.jpg)
Discussion
• is evidence integration helpful?
all              0.583
no.querylog      0.583    0.03%
no.boolean       0.583   -0.03%
no.clarity       0.582   -0.10%
no.geographical  0.577▼  -1.01%
no.redde         0.568▼  -2.60%
no.soft.redde    0.567▼  -2.67%
no.category      0.552▼  -5.33%
▼ statistically significant decrease in performance (p < 0.05) compared to the model that uses all features
![Page 44: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/44.jpg)
Discussion
• is evidence integration helpful?
‣ multiple types of evidence contribute to performance (corpus, query-log, and query features)
![Page 45: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/45.jpg)
Discussion
• is evidence integration helpful?
‣ most predictive source of evidence requires human supervision
![Page 46: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/46.jpg)
Discussion
• per-vertical performance
travel     0.842
health     0.788
music      0.772
games      0.771
autos      0.730
sports     0.726
tv         0.716
movies     0.688
finance    0.655
local      0.619
jobs       0.570
shopping   0.563
images     0.483
video      0.459
news       0.456
reference  0.348
maps       0.000
directory  0.000
![Page 47: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/47.jpg)
Discussion: per-vertical performance
• topically focused
![Page 48: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/48.jpg)
Discussion: per-vertical performance
• topically diverse
![Page 49: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/49.jpg)
Discussion: per-vertical performance
• text-impoverished
![Page 50: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/50.jpg)
Discussion: per-vertical performance
• highly dynamic content and user interests
![Page 51: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/51.jpg)
Milestones
• traditional resource selection methods not well-suited for this environment
‣ derive evidence exclusively from collection content
• a machine learning approach performs better
• multiple types of evidence contribute to performance
• the most predictive type of evidence (the query category) requires human supervision
![Page 52: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/52.jpg)
Vertical Selection Adaptation (Arguello et al., SIGIR 2010)
• learn a model for a new vertical using only existing vertical training data
[Figure: verticals local, images, travel, news, ..., finance alongside the web results]
![Page 53: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/53.jpg)
Outline
• vertical selection (Arguello et al., SIGIR 2009; Arguello et al., SIGIR 2010)
• aggregated search evaluation (Arguello et al., ECIR 2011)
![Page 54: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/54.jpg)
Vertical Results Presentation
• predicting where in the web results to present the vertical results
[Figure: portal interface sending the query “pittsburgh” to the verticals (web, maps, books, news, local, images, ...), next to a results page that places maps, images, and books results among the web results]
![Page 55: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/55.jpg)
End-To-End Output
[Figure: results page with blocks in the order maps, web, news, images, web]
![Page 56: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/56.jpg)
How good is this presentation?
[Figure: results page with blocks in the order maps, web, news, images, web]
![Page 57: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/57.jpg)
Is this one better?
[Figure: results page with blocks in the order maps, web, images, web, news]
![Page 58: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/58.jpg)
What about this one?
[Figure: results page with blocks in the order maps, web, images, web, weather]
![Page 59: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/59.jpg)
Aggregated Search Evaluation Objectives
1. define a set of layout constraints
‣ will define the set of all possible presentations
2. define an evaluation metric that can measure the quality of any possible presentation
3. validate the metric by ensuring that it correlates with human preferences
![Page 60: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/60.jpg)
Layout Constraints and Task Formulation
![Page 61: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/61.jpg)
Layout Constraints and Task Formulation
[Figure: results page with vertical slot 1, vertical slot 2, and vertical slot 3]
![Page 62: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/62.jpg)
Layout Constraints and Task Formulation
[Figure: results page with vertical slots 1, 2, and 3 and web blocks 1 and 2]
![Page 63: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/63.jpg)
Layout Constraints and Task Formulation
web block 1
web block 2
image block
books block
weather block
news block
map block
imaginary end-of-results block
![Page 64: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/64.jpg)
Layout Constraints and Task Formulation
• formulate vertical results presentation as block ranking
• web blocks are always presented and maintain their natural order
• suppressed vertical blocks are effectively tied
[figure: block ranking split into displayed and suppressed blocks]
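The block-ranking formulation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: blocks are plain strings, the function names `interleavings` and `enumerate_presentations` are hypothetical, and a presentation is modeled as a ranking of displayed blocks plus an unordered set of suppressed verticals (the ones that would fall below the imaginary end-of-results block).

```python
from itertools import permutations


def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def enumerate_presentations(web_blocks, vertical_blocks):
    """Return (displayed_ranking, suppressed_set) pairs allowed by the constraints.

    Web blocks are always displayed and keep their natural order.
    Any subset of verticals may be displayed, in any order.
    Suppressed verticals are returned as an unordered set because they are tied.
    """
    presentations = []
    for k in range(len(vertical_blocks) + 1):
        for shown in permutations(vertical_blocks, k):
            suppressed = frozenset(vertical_blocks) - frozenset(shown)
            for ranking in interleavings(list(web_blocks), list(shown)):
                presentations.append((tuple(ranking), suppressed))
    return presentations


if __name__ == "__main__":
    all_presentations = enumerate_presentations(
        ["web block 1", "web block 2"], ["image block", "news block"]
    )
    for ranking, suppressed in all_presentations[:3]:
        print(ranking, sorted(suppressed))
```

With two web blocks and two verticals this yields 19 presentations: 1 with no vertical displayed, 6 with one, and 12 with both.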
![Page 65: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/65.jpg)
Aggregated Search Evaluation Objectives
1. define a set of layout constraints
‣ will define the set of all possible presentations
2. define an evaluation metric that can measure the quality of any possible presentation
3. validate the metric by ensuring that it correlates with human preferences
![Page 66: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/66.jpg)
How good is a presentation?
web 1
images
weather
web 2
• problem: a query with 10 blocks has 10! = 3,628,800 possible rankings/presentations
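The count quoted above is the number of orderings of 10 distinct blocks. A quick check, with a hypothetical helper name `count_rankings`:

```python
import math


def count_rankings(num_blocks):
    """Number of ways to order num_blocks distinct blocks (n factorial)."""
    return math.factorial(num_blocks)


if __name__ == "__main__":
    print(count_rankings(10))
```

Judging every full presentation directly is therefore impractical, which is the motivation for deriving quality from judgements on individual blocks.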
![Page 67: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/67.jpg)
Metric-Based Evaluation Approach
• given a query, collect human judgements on individual blocks and use these to derive a “ground truth” presentation
• evaluate alternative presentations based on their distance to this “ground truth”
![Page 68: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/68.jpg)
Metric-Based Evaluation Approach
1. compose web and vertical blocks for the query
web block 1
web block 2
image block
books block
weather block
news block
map block
![Page 69: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/69.jpg)
Metric-Based Evaluation Approach
2. collect preference judgements on all block pairs
‣ left is better, right is better, both are bad?
example pair: maps block vs. image block
![Page 70: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/70.jpg)
Metric-Based Evaluation Approach
2. collect preference judgements on all block pairs
‣ left is better, right is better, both are bad?
example pair: web block 1 vs. image block
![Page 71: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/71.jpg)
Metric-Based Evaluation Approach
3. use these pairwise preference judgements to construct a ground truth or reference presentation σ∗
Schulze voting method (Schulze, 2010)
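As a rough illustration of step 3, the sketch below applies the Schulze method to pairwise preference counts. It assumes the judgements have already been tallied into a dictionary mapping `(a, b)` to the number of assessors who preferred block `a` over block `b`. The function name `schulze_ranking` and the input format are assumptions for this example, and the "both are bad" option (which would drive suppression) is not modeled.

```python
def schulze_ranking(blocks, prefs):
    """Rank blocks with the Schulze method.

    prefs[(a, b)] is the number of assessors who preferred a over b.
    Missing pairs count as zero.
    """
    def d(a, b):
        return prefs.get((a, b), 0)

    # p[a, b] = strength of the strongest path from a to b.
    p = {}
    for a in blocks:
        for b in blocks:
            if a != b:
                p[a, b] = d(a, b) if d(a, b) > d(b, a) else 0

    # Floyd-Warshall style widest-path update; k is the intermediate block.
    for k in blocks:
        for a in blocks:
            if a == k:
                continue
            for b in blocks:
                if b == k or b == a:
                    continue
                p[a, b] = max(p[a, b], min(p[a, k], p[k, b]))

    # A block beats another if its strongest path to it is stronger than the reverse.
    wins = {
        a: sum(1 for b in blocks if b != a and p[a, b] > p[b, a])
        for a in blocks
    }
    # sorted() is stable, so tied blocks keep their input order.
    return sorted(blocks, key=lambda a: -wins[a])


if __name__ == "__main__":
    blocks = ["weather block", "web block 1", "image block"]
    prefs = {
        ("web block 1", "image block"): 3, ("image block", "web block 1"): 1,
        ("web block 1", "weather block"): 4,
        ("image block", "weather block"): 3, ("weather block", "image block"): 1,
    }
    print(schulze_ranking(blocks, prefs))
```

The preference counts in the usage example are made up for illustration and are not data from the study.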
![Page 72: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/72.jpg)
Metric-Based Evaluation Approach
4. evaluate a presentation σ1 based on its distance to the reference σ∗: K(σ∗, σ1)
generalized Kendall’s tau (Kumar et al., 2010)
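To make step 4 concrete, here is a sketch of the plain (unweighted) Kendall tau distance between a reference ranking and a candidate ranking. It is a simplification: the generalized version cited on the slide is not reproduced here, and neither is any handling of tied (suppressed) blocks. The function name `kendall_tau_distance` is an assumption for this example.

```python
from itertools import combinations


def kendall_tau_distance(reference, candidate):
    """Count pairs of blocks that the two rankings place in opposite order.

    Only blocks present in both rankings are compared.
    """
    pos_ref = {block: i for i, block in enumerate(reference)}
    pos_cand = {block: i for i, block in enumerate(candidate)}
    shared = [block for block in reference if block in pos_cand]
    return sum(
        1
        for a, b in combinations(shared, 2)
        if (pos_ref[a] - pos_ref[b]) * (pos_cand[a] - pos_cand[b]) < 0
    )


if __name__ == "__main__":
    reference = ["web 1", "images", "weather", "web 2"]
    candidate = ["web 1", "weather", "images", "web 2"]
    print(kendall_tau_distance(reference, candidate))
```

A distance of 0 means the candidate orders every shared pair the same way as the reference; larger values mean more disagreements.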
![Page 73: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/73.jpg)
Metric-Based Evaluation Approach
1. compose web and vertical blocks for the query
2. collect redundant preference judgements on every block-pair
3. derive reference presentation (Schulze voting method)
4. evaluate a presentation based on its distance to the reference (generalized Kendall’s tau)
![Page 74: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/74.jpg)
Aggregated Search Evaluation Objectives
1. define a set of layout constraints
‣ will define the set of all possible presentations
2. define an evaluation metric that can measure the quality of any possible presentation
3. validate the metric by ensuring that it correlates with human preferences
![Page 75: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/75.jpg)
Empirical Metric Validation
• does the metric agree with human preferences on pairs of full presentations?
σ1: web 1, images, books, weather, news, map, web 2
vs.
σ2: web 1, weather, map, web 2
![Page 76: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/76.jpg)
Empirical Metric Validation
• agreement: K(σ1, σ∗) < K(σ2, σ∗)
![Page 77: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/77.jpg)
Methodology
• show assessors pairs of presentations and ask which they prefer
• sample presentation-pairs from particular regions of the metric-space
• bin combinations over the H, M, and L regions: H-H, H-M, H-L, M-M, M-L, L-L
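The six bin combinations listed above are the unordered pairs (with repetition) of the three bins. The sketch below generates them and draws presentation-pairs for each combination. It assumes presentations have already been assigned to bins by their distance to the reference; how the bin boundaries were chosen is not stated on the slide. The names `bin_combinations` and `sample_pairs` are hypothetical.

```python
import random
from itertools import combinations_with_replacement

BINS = ("H", "M", "L")


def bin_combinations(bins=BINS):
    """Return the unordered bin pairs, e.g. H-H, H-M, ..., L-L."""
    return ["-".join(pair) for pair in combinations_with_replacement(bins, 2)]


def sample_pairs(binned, pairs_per_comb, seed=0):
    """Draw presentation-pairs for every bin combination.

    binned maps a bin label to a list of presentations in that bin.
    Draws are independent, so the same pair may be drawn more than once.
    """
    rng = random.Random(seed)
    sampled = {}
    for a, b in combinations_with_replacement(BINS, 2):
        picks = []
        for _ in range(pairs_per_comb):
            if a == b:
                # Two distinct presentations from the same bin.
                picks.append(tuple(rng.sample(binned[a], 2)))
            else:
                picks.append((rng.choice(binned[a]), rng.choice(binned[b])))
        sampled[f"{a}-{b}"] = picks
    return sampled


if __name__ == "__main__":
    print(bin_combinations())
    toy = {"H": ["h1", "h2", "h3"], "M": ["m1", "m2"], "L": ["l1", "l2"]}
    print(sample_pairs(toy, pairs_per_comb=4)["H-L"])
```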
![Page 78: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/78.jpg)
Materials
• 72 queries selected manually
• 13 verticals constructed from freely available Web APIs (Google, eBay, Yahoo, YouTube)
• human judgements also collected using Amazon’s Mechanical Turk
• 72 queries × 6 bin combinations × 4 pairs per bin combination × 4 judges per pair = 6,912 judgements
![Page 79: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/79.jpg)
Empirical Metric Validation Results
![Page 80: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/80.jpg)
Inter-assessor Agreement: block-pairs
κfleiss = 0.660 (substantial)
example pair: web block 1 vs. image block
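For readers unfamiliar with the statistic, the following sketch computes Fleiss' kappa from a table of category counts. It assumes every item was judged by the same number of assessors and that the categories are the three options from the block-pair interface (left is better, right is better, both are bad). The function name `fleiss_kappa` and the toy counts are illustrative only, not data from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a list of per-item category counts.

    counts[i][j] is the number of assessors who put item i in category j.
    Every row must sum to the same number of assessors.
    """
    num_items = len(counts)
    num_raters = sum(counts[0])
    num_categories = len(counts[0])

    # Proportion of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in counts) / (num_items * num_raters)
        for j in range(num_categories)
    ]
    # Observed agreement per item.
    p_i = [
        (sum(c * c for c in row) - num_raters) / (num_raters * (num_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / num_items
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Columns: left is better, right is better, both are bad (4 assessors each).
    toy_counts = [[4, 0, 0], [3, 1, 0], [0, 4, 0], [1, 1, 2]]
    print(round(fleiss_kappa(toy_counts), 3))
```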
![Page 81: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/81.jpg)
Inter-assessor Agreement: presentation-pairs
σ1: web 1, images, books, weather, news, map, web 2
vs.
σ2: web 1, weather, map, web 2
![Page 82: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/82.jpg)
Inter-assessor Agreement: presentation-pairs

| bin | κfleiss |
| --- | --- |
| H-H | 0.066 |
| H-M | 0.290 |
| H-L | 0.303 |
| M-M | 0.216 |
| M-L | 0.179 |
| L-L | 0.237 |
| ALL | 0.237 |

• fair agreement
• lower than agreement on block-pair judgements
![Page 83: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/83.jpg)
Inter-assessor Agreement: presentation-pairs
• very low agreement on H-H pairs (κfleiss = 0.066)
• these pairs were almost identical in terms of the top-ranked blocks
![Page 84: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/84.jpg)
Inter-assessor Agreement: presentation-pairs
• higher agreement on H-M and H-L pairs (κfleiss = 0.290 and 0.303)
• assessors agreed on pairs where one presentation was close to the reference and the other was far
![Page 85: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/85.jpg)
Metric Agreement: metric vs. majority preference

| bin | pairs (3/4 majority or better) | % agree | pairs (4/4 majority) | % agree |
| --- | --- | --- | --- | --- |
| H-H | 164 | 60.37▲ | 47 | 65.96▲ |
| H-M | 210 | 81.90▲ | 95 | 87.37▲ |
| H-L | 204 | 84.31▲ | 97 | 91.75▲ |
| M-M | 184 | 57.61▲ | 75 | 58.67 |
| M-L | 187 | 50.80 | 71 | 54.93 |
| L-L | 202 | 63.37▲ | 77 | 63.64▲ |
| ALL | 1151 | 67.07▲ | 462 | 72.51▲ |

▲ = statistically significant agreement based on a sign-test. The null hypothesis is that the metric selects the preferred presentation randomly with equal probability.
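The sign test mentioned in the table note can be sketched as an exact binomial test with success probability 0.5. The slide does not say whether the test was one-sided or two-sided, nor which significance level was used, so the two-sided form and the function name `sign_test_p_value` are assumptions. The agreement counts in the usage example are reconstructed by rounding the reported percentages (84.31% of 204 pairs is about 172, and 50.80% of 187 pairs is about 95).

```python
from math import comb


def sign_test_p_value(agreements, total):
    """Two-sided exact sign test.

    Null hypothesis: the metric picks the preferred presentation at random,
    so each pair is an agreement with probability 0.5.
    """
    k = max(agreements, total - agreements)
    # Probability of a result at least this extreme in one tail.
    tail = sum(comb(total, i) for i in range(k, total + 1)) / 2 ** total
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Counts reconstructed from the 3/4-majority column (an approximation).
    print("H-L:", sign_test_p_value(172, 204))
    print("M-L:", sign_test_p_value(95, 187))
```

Under these assumptions the H-L bin is far below conventional significance thresholds while the M-L bin is not, which matches the ▲ markers in the table.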
![Page 86: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/86.jpg)
Metric Agreement: metric vs. majority preference
• when assessors agreed with each other, the metric agreed with the assessors
![Page 87: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/87.jpg)
Metric Agreement: metric vs. majority preference
• on average, the reference presentation was good
![Page 88: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/88.jpg)
Metric Agreement: metric vs. majority preference
• the metric is less correlated with human preferences for presentations far from the reference (low-quality presentations)
![Page 89: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/89.jpg)
Discussion: block-pair assessment interface
• in some cases, assessors favored a particular vertical only in the context of a full presentation
• suggests interactions between cross-vertical results
example block-pair: images vs. web
![Page 90: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/90.jpg)
Discussion: block-pair assessment interface
• add context to the block-pair assessment interface?
[figure: presentations σ1 and σ2 compared side by side, each listing web 1, images, weather]
![Page 91: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/91.jpg)
Discussion: crowd-sourcing relevance judgements
• 20% of the assessors did 80% of the work
‣ “traps” can be used to detect careless assessors
‣ determines reliability for 80% of the work
‣ determining reliability for the other 20% is difficult
• increase fixed costs: qualification tests
• lots of interesting research on deriving judgements (and learning models) from multiple noisy assessors
• TREC 2011 will offer a crowd-sourcing track
![Page 92: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/92.jpg)
Milestones: metric-based evaluation approach
• with fewer than 100 judgements per query, we can evaluate any possible presentation
‣ portable test collection
‣ does not require live system with (many) users
‣ facilitates model learning (my current research)
• correlates with human preferences (when humans agree with each other)
• general: bottom-up construction of “ground truth” + rank-based distance metric
![Page 93: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/93.jpg)
Concluding Remarks
![Page 94: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/94.jpg)
Information Needs in Today’s World
• IR systems will continue to support a wider range of information needs
• different information needs require customized solutions
• the trend is towards specialization and integration
• aggregated search provides integrated access to specialized systems within a single search interface
‣ predicting which back-ends to display
‣ predicting how to combine their results
![Page 95: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/95.jpg)
Aggregated Search: current challenges and opportunities
• dynamic content and dynamic user interests (e.g. news)
‣ implicit user feedback
‣ generate training data retrospectively
• improve cohesion between cross-vertical results
‣ capitalize on highly-confident verticals
• incorporate user-context and session-level evidence into aggregated search decisions
![Page 96: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/96.jpg)
Aggregated Search: extensions
• aggregation in other environments
‣ mobile search, library search, enterprise search, personal information management, social network search
• search assistance and customized interactions
‣ search tools can be viewed as verticals
‣ predict when and how to make search tools available to a user
![Page 97: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/97.jpg)
Aggregated Search: extensions (search assistance)
![Page 98: Aggregated Search: Motivations, Methods, and Milestones...Aggregated Search: Motivations, Methods, and Milestones Jaime Arguello INLS 613: Text Data Mining jarguell@email.unc.edu November](https://reader035.vdocuments.us/reader035/viewer/2022070212/61066e8d2c9a08106d7fa98e/html5/thumbnails/98.jpg)
Aggregated Search: extensions (customized interactions)