factors affecting response rates for real-life mir queries jin ha leem. cameron jonesj. stephen...

1
FACTORS AFFECTING RESPONSE RATES FOR REAL-LIFE MIR QUERIES Jin Ha Lee M. Cameron Jones J. Stephen Downie Graduate School of Library and Information Science University of Illinois at Urbana-Champaign Possible Explanations: Some questions are either impossible or difficult to answer given the query statement, regardless of the price. Users may also need more words to adequately describe their information needs for difficult questions that are less likely to get answered. As users provide more information they may also increase the chance that they are providing incorrect information which may result in search failure. Discussion Discussion Our results show that price has no effect and adding more words has a negative effect, although minimal, on the likelihood of a query being answered. Future Work: A qualitative analysis of the queries to determine which of the answered queries were answered correctly and also assess the level of accuracy of user provided information features for answered queries. Special Thanks to: The Andrew W. Mellon Foundation and the National Science Foundation (Grant No. NSF IIS-0327371) Conclusion and Future Work Conclusion and Future Work Results Results The increase in the proportion of queries answered drops rapidly over time. More than half of all answered queries (51.69%) are answered within 2 hours of being posted and 83.71% within the first 24 hours. Difference in the mean price: (Δµ = 0.097) Logistic regression model: logit(π) = - 0.079 + 0.000X The query price has no statistically significant effect on the probability of getting an answer (likelihood-ratio test statistic = 0.013; df = 1; p = 0.908). Difference in the mean query length: (Δµ = 4.553) Logistic regression model: logit(π) = 0.099 - 0.006X The word count has a statistically significant effect on the probability of getting an answer (likelihood-ratio test statistic = 14.342; df = 1; p = 0.000), but the substantive difference is very small. Price Price Query Length Query Length Response Rate and Time Response Rate and Time Total number of queries: 2,208 Number of answered queries: 1,062 Number of unanswered queries: 1,146 Overall probability of a query being answered ≈ 0.481 Project: This poster reports on research conducted as part of the Human Use of Music Information Retrieval Systems (HUMIRS) project. Objective: To improve our understanding of the factors related to a query being answered. Analysis: First, we examine how the proportion of queries answered varies over time. Then we compare selected features of answered and unanswered queries, namely the price offered for the answer and the length of the query, in order to understand if these variables affect the probability of a query being answered. Introduction Introduction Data: 2,208 real-life music-related queries collected from the Google Answers website’s music category. Features Examined: (1) if the query was answered or not; (2) the elapsed time in minutes between when the query was posted and when it was answered; (3) the number of content words in the query (i.e., stop words removed); and (4) the price offered for the answer. Statistical Analysis: A logistic regression model describes how the probability of a query being answered depends on the explanatory variables. We test if that probability is independent of variable X by the likelihood-ratio test. A well-fitting model is significant at the .05 level. Data and Analysis Data and Analysis

Upload: alyson-allen

Post on 13-Dec-2015

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: FACTORS AFFECTING RESPONSE RATES FOR REAL-LIFE MIR QUERIES Jin Ha LeeM. Cameron JonesJ. Stephen Downie Graduate School of Library and Information Science

FACTORS AFFECTING RESPONSE RATES FOR REAL-LIFE MIR QUERIES

Jin Ha Lee M. Cameron Jones J. Stephen Downie

Graduate School of Library and Information ScienceUniversity of Illinois at Urbana-Champaign

Possible Explanations:

Some questions are either impossible or difficult to answer given the query statement, regardless of the price.

Users may also need more words to adequately describe their information needs for difficult questions that are less likely to get answered.

As users provide more information they may also increase the chance that they are providing incorrect information which may result in search failure.

DiscussionDiscussion

Conclusion: Our results show that price has no effect and adding more words has a negative effect, although minimal, on the likelihood of a query being answered.

Future Work: A qualitative analysis of the queries to determine which of the answered queries were answered correctly and also assess the level of accuracy of user provided information features for answered queries.

Special Thanks to: The Andrew W. Mellon Foundation and the National Science Foundation (Grant No. NSF IIS-0327371)

Conclusion and Future WorkConclusion and Future Work

ResultsResults

The increase in the proportion of queries answered drops rapidly over time.

More than half of all answered queries (51.69%) are answered within 2 hours of being posted and 83.71% within the first 24 hours.

Difference in the mean price: (Δµ = 0.097)

Logistic regression model:logit(π) = - 0.079 + 0.000X

The query price has no statistically significant effect on the probability of getting an answer (likelihood-ratio test statistic = 0.013; df = 1; p = 0.908).

Difference in the mean query length: (Δµ = 4.553)

Logistic regression model: logit(π) = 0.099 - 0.006X

The word count has a statistically significant effect on the probability of getting an answer (likelihood-ratio test statistic = 14.342; df = 1; p = 0.000), but the substantive difference is very small.

Pri

ceP

rice

Qu

ery

Len

gth

Qu

ery

Len

gth

Res

po

nse

Rat

e an

d T

ime

Res

po

nse

Rat

e an

d T

ime

Total number of queries: 2,208Number of answered queries: 1,062 Number of unanswered queries: 1,146 Overall probability of a query being answered ≈ 0.481

Project: This poster reports on research conducted as part of the Human Use of Music Information Retrieval Systems (HUMIRS) project.

Objective: To improve our understanding of the factors related to a query being answered.

Analysis: First, we examine how the proportion of queries answered varies over time. Then we compare selected features of answered and unanswered queries, namely the price offered for the answer and the length of the query, in order to understand if these variables affect the probability of a query being answered.

IntroductionIntroduction

Data: 2,208 real-life music-related queries collected from the Google Answers website’s music category.

Features Examined: (1) if the query was answered or not; (2) the elapsed time in minutes between when the query was posted and when it was answered; (3) the number of content words in the query (i.e., stop words removed); and (4) the price offered for the answer.

Statistical Analysis: A logistic regression model describes how the probability of a query being answered depends on the explanatory variables. We test if that probability is independent of variable X by the likelihood-ratio test. A well-fitting model is significant at the .05 level.

Data and AnalysisData and Analysis