azzopardi2012economics of iir_tech_talk
DESCRIPTION
In this talk, I discuss how microeconomics can be used to describe, explain and predict the interactions of a user and an information retrieval system. The work is based on the ACM SIGIR 2011 paper ( http://dl.acm.org/citation.cfm?id=2009923 ), which is available to download from: http://www.dcs.gla.ac.uk/~leif/papers/azzopardi2011economics.pdf
TRANSCRIPT
![Page 1: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/1.jpg)
The Economics in Interactive Information Retrieval
Leif Azzopardi
http://www.dcs.gla.ac.uk/~leif
![Page 2: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/2.jpg)
Cost
Interaction
Benefit
![Page 3: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/3.jpg)
Interactive and Iterative Search
A simplified, abstracted, representation
[Diagram labels: User, Information Need, Queries, System, Documents Returned, Relevant Information]
![Page 4: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/4.jpg)
Observational & Empirical
Theoretical & Formal
Information Foraging Theory
ASK
Berry Picking
IS&R Framework
Pirolli (1999)
![Page 5: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/5.jpg)
Interactive Information Retrieval needs formal models to:
• describe, explain and predict the interaction of users with systems,
• provide a basis on which to reason about interaction,
• understand the relationships between interaction, performance and cost,
• help guide the design, development and research of information systems, and
• derive laws and principles of interaction.
Theoretical & Formal
A Major Research Challenge
Belkin (2008)
Jarvelin (2011)
![Page 6: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/6.jpg)
How do users behave?
Patent searchers typically examine 100-200 documents per query (using a Boolean system)
User queries tend to be short (only 2-3 terms)
Web searchers typically only examine the first page of results
Users adapt to degraded systems by issuing more queries
Users rarely provide explicit relevance feedback
Users will often pose a series of short queries
Patent searchers usually express longer and complex queries
Why do users behave like this?
![Page 7: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/7.jpg)
So why do users pose short queries?
User queries tend to be short
But longer queries tend to be more effective!
![Page 8: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/8.jpg)
So why do users pose short queries?
[Plot: Performance versus Query Length (No. of Terms), showing Total Performance and Marginal Performance curves]
Exponentially diminishing returns kick in after 2 query terms
Around 2-3 terms is where the user gets the most bang for their buck
Azzopardi (2009)
![Page 9: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/9.jpg)
How can we use microeconomics to model the search process?
![Page 10: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/10.jpg)
Microeconomics
Consumer Theory
Production Theory
Utility Maximization
Cost Minimization
![Page 11: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/11.jpg)
Production Theory
a.k.a. Theory of Firms
[Diagram labels: Inputs (Capital, Labor); The Firm; Technology (arrows labeled Utilizes and Constrains); Output (Widgets)]
Varian (1987)
![Page 12: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/12.jpg)
Production Functions
[Plot: Capital versus Labor, showing a Production Function curve]
![Page 13: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/13.jpg)
Production Functions
[Plot: Capital versus Labor, with Production Function curves for Quantity 1, Quantity 2 and Quantity 3]
Production Function: Quantity = F(Capital, Labor)
![Page 14: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/14.jpg)
Production Functions
[Plot: Capital versus Labor, with Production Function curves for Quantity 1, Quantity 2 and Quantity 3, and the Production Set marked]
![Page 15: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/15.jpg)
Production Functions
[Plot: Capital versus Labor, with Production Function curves for Quantity 1, Quantity 2 and Quantity 3, and the Production Set marked]
Technology constrains the production set
![Page 16: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/16.jpg)
Applying Production Theory to Interactive Information Retrieval
![Page 17: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/17.jpg)
Interactive and Iterative Search
A simplified, abstracted, representation
[Diagram labels: User, Information Need, Queries, System, Documents Returned, Relevant Information]
![Page 18: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/18.jpg)
Search as Production
[Diagram labels: Inputs (Queries, Assessments); The Firm; Search Engine Technology (arrows labeled Utilizes and Constrains); Output (Relevance Gain)]
![Page 19: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/19.jpg)
Search Production Function
[Plot: No. of Queries (Q) versus No. of Assessments per Query (A), with curves for Gain = 10, Gain = 20 and Gain = 30]
Gain = F(Q, A)
The function represents how well a system could be used, i.e. the minimum input required to achieve a given level of gain.
![Page 20: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/20.jpg)
Few Queries, Lots of Assessments?
Lots of Queries, Few Assessments?
Or some other way?
What strategies can the user employ when interacting with the search system to achieve their end goal?
What is the most cost-efficient way for a user to interact with an IR system?
![Page 21: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/21.jpg)
Modeling Caveats
of an economic model of the search process
Abstracted
Simplified
Representative
Gain = F(Q,A)
![Page 22: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/22.jpg)
What does the model tell us about search & interaction?
![Page 23: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/23.jpg)
Search Scenario
• Task: Find news articles about ….
• Goal: To find a number of relevant documents and reach the desired level of Cumulative Gain.
• Output: Total Cumulative Gain (G) across the session
• Inputs:
– Y No. of Queries, and
– X No. of Assessments per Query
• Collections:
– TREC News Collections (AP, LA, Aquaint)
– Each topic had about 30 or more relevant documents
• Simulation: built using C++ and the Lemur IR toolkit
![Page 24: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/24.jpg)
Simulating User Interaction
TREC Documents marked Relevant
Issues Y Queries of Length 3
TREC Aquaint Topics
Simulated User
Assesses X Documents per Query
Models: Probabilistic, Vector Space, Boolean
Queries generated from Relevant set
Record X & Y for each level of gain
Select the best query first/next
The simulation assumes the user has perfect information – in order to find out how well the system could be used.
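The perfect-information user described above can be sketched in a few lines. This is only a toy illustration, not the C++/Lemur simulation used in the talk: the function name `queries_needed`, the hand-written rankings and the use of a simple count of relevant documents as "gain" are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set


def queries_needed(
    rankings: Dict[str, List[str]],
    relevant: Set[str],
    depth: int,
    target_gain: int,
) -> Optional[int]:
    """Return Y, the number of queries needed to reach target_gain when
    X = depth documents are assessed per query, or None if unreachable.

    The simulated user has perfect information: at every step it issues
    the candidate query that adds the most unseen relevant documents.
    """
    found: Set[str] = set()
    remaining = dict(rankings)
    issued = 0

    def new_gain(query: str) -> int:
        # Relevant documents in the top `depth` results not yet found.
        return len((set(remaining[query][:depth]) & relevant) - found)

    while len(found) < target_gain and remaining:
        best = max(remaining, key=new_gain)  # select the best query next
        if new_gain(best) == 0:
            return None  # no remaining query helps at this depth
        found |= set(remaining.pop(best)[:depth]) & relevant
        issued += 1
    return issued if len(found) >= target_gain else None


# Hypothetical ranked lists: "d*" documents are relevant, "n*" are not.
RANKINGS = {
    "q1": ["d1", "d2", "n1", "d3", "n2"],
    "q2": ["d4", "n3", "d5", "d1", "d6"],
    "q3": ["n4", "d6", "d7", "n5", "d8"],
}
RELEVANT = {f"d{i}" for i in range(1, 9)}

# Record X (depth) and Y (queries) for one level of gain.
for depth in (1, 2, 5):
    print(depth, queries_needed(RANKINGS, RELEVANT, depth, target_gain=4))
```

Sweeping the depth for a fixed gain traces out one production curve; a depth that returns `None` is an input combination that is not feasible for that gain.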
![Page 25: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/25.jpg)
Search Production Curves
Same Retrieval Model, Different Gain
[Plot: No. of Queries versus No. of Assessments per Query on the TREC Aquaint Collection, with curves for BM25 at NCG = 0.2 and BM25 at NCG = 0.4]
NCG = 0.4: 8 Q & 15 A/Q, or 4 Q & 40 A/Q
NCG = 0.2: 7.7 Q & 5 A/Q, or 3.6 Q & 15 A/Q
To double the gain requires more than double the no. of assessments
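The last claim can be checked with simple arithmetic on the four points quoted on this slide, taking the total number of assessments as queries times assessments per query. The variable names below are illustrative only.

```python
# (NCG level, no. of queries, assessments per query), as quoted on the slide.
points = [
    (0.2, 7.7, 5),
    (0.2, 3.6, 15),
    (0.4, 8.0, 15),
    (0.4, 4.0, 40),
]

# Total documents assessed for each strategy, keyed by (NCG, depth).
totals = {(ncg, depth): queries * depth for ncg, queries, depth in points}

for (ncg, depth), total in sorted(totals.items()):
    print(f"NCG={ncg}: depth {depth} per query -> {total:.1f} assessments in total")

# Holding the depth at 15 documents per query, compare NCG 0.2 with NCG 0.4.
ratio_at_depth_15 = totals[(0.4, 15)] / totals[(0.2, 15)]
print(f"assessment ratio for double the gain: {ratio_at_depth_15:.2f}")
```

At a fixed depth of 15 documents per query, doubling the gain takes roughly 2.2 times as many assessments in total.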
![Page 26: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/26.jpg)
Search Production Curves
Different Retrieval Models, Same Gain
[Plot: No. of Queries versus No. of Assessments per Query on the TREC Aquaint Collection, with curves for BM25, BOOL and TFIDF at NCG = 0.4]
No input combinations with depth less than this are technically feasible!
BM25 provides more strategies (i.e. input combinations) than BOOL or TFIDF
User Adaptation:
– BM25: 5 Q @ 25 A/Q
– BOOL: 10 Q @ 25 A/Q
More queries on the degraded systems
For the same gain, BOOL and TFIDF require a lot more interaction.
![Page 27: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/27.jpg)
Search Production Function
Cobb-Douglas Production Function

| Model | K | α | Goodness of Fit |
|---|---|---|---|
| BM25 | 5.39 | 0.58 | 0.995 |
| BOOL | 3.47 | 0.58 | 0.992 |
| TFIDF | 1.69 | 0.50 | 0.997 |

Example Values on Aquaint when NCG = 0.6
Terms of the function:
• No. of queries issued
• No. of Assessments per query
• Mixing parameter (α) determined by the technology
• Efficiency (K) of the technology used
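The four labelled terms match the standard two-input Cobb-Douglas form. Assuming that form, and assuming the exponent α is attached to queries rather than to assessments (an assumption for illustration), the search production function can be written as:

$$
G = F(Q, A) = K \cdot Q^{\alpha} \cdot A^{1-\alpha}
$$

Here Q is the number of queries issued, A is the number of assessments per query, K is the efficiency of the technology and α is the mixing parameter.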
![Page 28: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/28.jpg)
Using the Cobb-Douglas Search Function
We can differentiate the function to find the rate of change of gain with respect to each input variable
Marginal Product of Querying
– the change in gain over the change in querying
– i.e. how much more gain do we get if we pose extra queries
Marginal Product of Assessing
– the change in gain over the change in assessing
– i.e. how much more gain do we get if we assess extra documents
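As a worked sketch, assume the search function has the standard Cobb-Douglas form $G = K \cdot Q^{\alpha} \cdot A^{1-\alpha}$ (the assignment of α to queries is an assumption). The two marginal products then follow from the partial derivatives:

$$
MP_Q = \frac{\partial G}{\partial Q} = \alpha K Q^{\alpha-1} A^{1-\alpha} = \alpha \frac{G}{Q},
\qquad
MP_A = \frac{\partial G}{\partial A} = (1-\alpha) K Q^{\alpha} A^{-\alpha} = (1-\alpha) \frac{G}{A}
$$

Both shrink as their own input grows, which is the diminishing-returns behaviour the production curves show.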
![Page 29: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/29.jpg)
Technical Rate of Substitution
TRS of Assessments for Queries
How many more assessments per query are needed, if one less query was posed?
[Plot: No. of Queries versus No. of Assessments per Query for BM25 at NCG = 0.4, annotated along the curve with TRS values of 0.4, 1.2, 2.5, 4.2 and 8.3]
At this point if you gave up one query you’d need to assess 1.2 extra docs/query
EXAMPLE: If 5 queries are submitted, instead of 6, then 24.2 docs/query need to be assessed, instead of 20 docs/query
6 Q @ 20 A/Q = 120 A
5 Q @ 24.2 A/Q = 121 A
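The same trade-off can be computed from a fitted function rather than read off the plot. The sketch below assumes a Cobb-Douglas form G = K * Q**alpha * A**(1 - alpha). Both function names are hypothetical, and alpha = 0.5 is an illustrative value, not the fitted value for BM25 at NCG = 0.4, so the result is close to but not the same as the 24.2 quoted in the example.

```python
def trs_assessments_for_queries(queries: float, depth: float, alpha: float) -> float:
    """Slope dA/dQ along an isoquant of G = K * Q**alpha * A**(1 - alpha).

    It equals -(MP_Q / MP_A); the negative sign means that giving up
    queries must be paid for with extra assessments per query.
    """
    return -(alpha / (1.0 - alpha)) * (depth / queries)


def depth_after_dropping_queries(
    queries: float, depth: float, alpha: float, dropped: float = 1.0
) -> float:
    """Assessments per query needed to keep the same gain with fewer queries."""
    new_queries = queries - dropped
    if new_queries <= 0:
        raise ValueError("at least one query must remain")
    # Constant gain implies Q**alpha * A**(1 - alpha) stays fixed.
    return depth * (queries / new_queries) ** (alpha / (1.0 - alpha))


ALPHA = 0.5  # illustrative mixing parameter (assumption)
print(trs_assessments_for_queries(6, 20, ALPHA))
print(depth_after_dropping_queries(6, 20, ALPHA))
```

The first value is the instantaneous rate at 6 queries and 20 assessments per query; the second is the depth needed after actually moving from 6 to 5 queries.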
![Page 30: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/30.jpg)
What about the cost of interaction?
![Page 31: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/31.jpg)
User Search Cost Function
A linear cost function, with terms:
• No. of queries issued
• No. of Assessments per query
• Relative cost of a Query to an Assessment (β)
• Total no. of documents assessed
What is the relative cost of a query?
Using cognitive costs of querying and assessing taken from Gwizdka (2010):
• The average cost of querying was 2628 ms
• The average cost of assessing was 2266 ms
• So β was set to 2628/2266 = 1.1598
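The labelled terms suggest a cost of the form c(Q, A) = β·Q + Q·A, measured in units of one assessment. That reading is an assumption here. Combined with an assumed Cobb-Douglas production function, the cheapest strategy for a target gain can be found by a simple sweep over depths. The gain targets of 20 and 40 are hypothetical, K and α are the BM25 example values, and the helper names are illustrative.

```python
def queries_for_gain(gain: float, depth: int, k: float, alpha: float) -> float:
    """Queries needed to reach `gain` at `depth` assessments per query,
    inverting G = K * Q**alpha * A**(1 - alpha)."""
    return (gain / (k * depth ** (1.0 - alpha))) ** (1.0 / alpha)


def cost(queries: float, depth: float, beta: float) -> float:
    """Linear cost: beta per query plus one unit per assessed document."""
    return beta * queries + queries * depth


def cheapest_strategy(gain, k, alpha, beta, max_depth=300):
    """Return (queries, depth, cost) with the lowest cost over integer depths."""
    best = None
    for depth in range(1, max_depth + 1):
        q = queries_for_gain(gain, depth, k, alpha)
        c = cost(q, depth, beta)
        if best is None or c < best[2]:
            best = (q, depth, c)
    return best


BETA = 1.1598  # relative cost of a query, as stated on the slide
low = cheapest_strategy(gain=20.0, k=5.39, alpha=0.58, beta=BETA)
high = cheapest_strategy(gain=40.0, k=5.39, alpha=0.58, beta=BETA)
print(low)
print(high)
```

In this sketch the cheapest depth is the same at both gain levels while the number of queries grows. The curves in the talk come from the simulation, so their exact minima are not expected to match these numbers.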
![Page 32: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/32.jpg)
Cost Efficient Strategies
BM25 0.4 and 0.6 Gains
[Plots: No. of Queries versus No. of Assessments per Query, and Cost versus No. of Assessments per Query, with the Minimum Cost marked]
On BM25, to increase gain, pose more queries but examine the same no. of docs per query
![Page 33: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/33.jpg)
Cost Efficient Strategies
BOOL 0.4 & 0.6 Gains
[Plots: No. of Queries versus No. of Assessments per Query, and Cost versus No. of Assessments per Query, with the Minimum Cost marked]
On Boolean, to increase gain, issue about the same no. of queries, but examine more docs per query
![Page 34: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/34.jpg)
Contrasting Systems
[Plots for BM25 0.4 and 0.6 Gains and for BOOL 0.4 and 0.6 Gains: No. of Queries versus No. of Assessments per Query, and Cost versus No. of Assessments per Query]
BM25 is less costly to use than BOOL
On BM25 issue more queries
But examine fewer docs per query
![Page 35: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/35.jpg)
A Hypothetical Experiment
What happens if
Querying costs go down?
– More queries issued
– Decrease in assessments per query
Querying costs go up?
– Decrease in queries issued
– Increase in assessments per query
![Page 36: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/36.jpg)
Changing the Relative Query Cost
[Plot: Cost versus No. of Assessments per Query]
As β increases, the relative cost of querying goes up, so it becomes cheaper to assess more documents per query and consequently to query less!
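The direction of this effect can be illustrated analytically under two assumptions: a cost of c(Q, A) = β·Q + Q·A, and a Cobb-Douglas production function G = K * Q**alpha * A**(1 - alpha) with alpha above 0.5. Minimising the cost along a fixed-gain curve then gives an optimal depth of (1 - alpha)·β / (2·alpha - 1). The function name `optimal_depth` and the β values swept below are illustrative.

```python
def optimal_depth(alpha: float, beta: float) -> float:
    """Cost-minimising assessments per query for c = beta*Q + Q*A on an
    isoquant of G = K * Q**alpha * A**(1 - alpha).

    Setting d/dA [(beta + A) * A**(-(1 - alpha) / alpha)] = 0 gives
    A* = (1 - alpha) * beta / (2 * alpha - 1), valid only for alpha > 0.5.
    """
    if alpha <= 0.5:
        raise ValueError("no interior minimum when alpha <= 0.5")
    return (1.0 - alpha) * beta / (2.0 * alpha - 1.0)


ALPHA = 0.58  # BM25 example value from the talk
betas = (0.5, 1.1598, 2.0, 5.0)  # hypothetical relative query costs
depths = [optimal_depth(ALPHA, b) for b in betas]
for b, d in zip(betas, depths):
    print(f"beta={b}: cheapest depth ~ {d:.2f} assessments per query")
```

Under these assumptions the cheapest depth grows in proportion to β, so the user assesses more per query and issues fewer queries.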
![Page 37: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/37.jpg)
Implications for Design
• Knowing how benefit, interaction and cost relate can help guide how we design systems
– We can theorize about how changes to the system will affect the user’s interaction
  • Is this desirable? Do we want the user to query more? Or for them to assess more?
– We can categorize the type of user
  • Is this a savvy rational user? Or is this a user behaving irrationally?
– We can scrutinize the introduction of new features
  • Are they going to be of any use? Are they worth it for the user? i.e. how much more performance must they deliver, or how little must they cost?
![Page 38: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/38.jpg)
Future Directions
• Validate the theory by conducting observational & empirical research
– Do the predictions about user behavior hold?
• Incorporate other inputs into the model
– Find Similar, Relevance Feedback, Browsing,
– Query length, Query Type, etc
• Develop more accurate cost functions
– Obtain Better Estimates of Costs
• Model other search tasks
![Page 40: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/40.jpg)
Selected References
• Varian, H., Intermediate Microeconomics, 1987
• Varian, H., Economics and Search, ACM SIGIR Forum, 1999
• Pirolli, P., Information Foraging Theory, 1999
• Belkin, N., Some (what) grand challenges of Interactive Information Retrieval, ACM SIGIR Forum, 2008
• Azzopardi, L., Query Side Evaluation, ACM SIGIR 2009
– http://dl.acm.org/citation.cfm?doid=1571941.1572037
• Azzopardi, L., The Economics of Interactive Information Retrieval, ACM SIGIR 2011
– http://dl.acm.org/citation.cfm?doid=2009916.2009923
• Jarvelin, K., IR Research: Systems, Interaction, Evaluation and Theories, ACM SIGIR Forum, 2011
![Page 41: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/41.jpg)
Search Production Function
Example
[Plot axes: Interaction X (vertical) versus Interaction Y (horizontal)]
G = F(X, Y)
![Page 42: Azzopardi2012economics of iir_tech_talk](https://reader035.vdocuments.us/reader035/viewer/2022062616/54b6ed554a7959aa218b46ef/html5/thumbnails/42.jpg)
Search Production Function
Example application for web search
[Plot: Length of Query (L) versus No. of Assessments (A), with curves for P@10 = 0.1, P@10 = 0.2 and P@10 = 0.3]
P@10 = F(L, A)