A Classification-based Approach to Question Routing in Community Question Answering
Tom Chao Zhou1, Michael R. Lyu1, Irwin King1,2
1 The Chinese University of Hong Kong
2 AT&T Labs Research
{czhou,lyu,king}@[email protected]
Workshop on Community Question Answering on the Web, in Conjunction with World Wide Web 2012
April 17, 2012
Introduction
Problem Definition and Feature
Experiments
Conclusions and Future Work
Related Work
Community-based Question Answering
• Knowledge dissemination, information seeking
• Natural language questions
• Explicit, self-contained answers
How CQA Works
• CQA users submit a question
• Get answers?
– Yes: answer selection, question resolved
– No: question not resolved
• The number of posted questions grows fast
• Can users get their questions resolved within a reasonable period?
Whether Questions Get Resolved
• Randomly sample 140 questions from each category in Yahoo! Answers
• 26 top-level categories
• In total 3,640 questions
• Track the status of each question
Percentage of Questions Resolved

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 11.95% | 19.95% | 24.75% | 26.48% | 27.31% | 51.32% | 61.92% | 63.41% | 64.45% |
How CQA Works
• How about carefully selecting a set of CQA users who may be interested in the question?
Question Routing
• Definition
– Routing open questions to suitable answerers who may be interested in the question
– Yes: interested in the question
– No: not interested in the question
Question Routing
• Benefits
– Asker's perspective
  • Reduce the time lag between when a question is posted and when it is answered
– Answerer's perspective
  • Answerers are more enthusiastic about answering questions that interest them
– CQA's perspective
  • Leverage users' answering passion, improving the CQA service and boosting users' adhesiveness and loyalty to the system
Problem Definition
Question Routing Problem
Given a question and a user in CQA, determine whether the user will contribute his/her knowledge to answer the question
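One way to picture this formulation is as binary labels over (question, user) pairs. The sketch below is illustrative only: the `Question` and `User` classes, their fields, and the choice to pair every question with every user are assumptions, since the slides do not say how training pairs were constructed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qid: str
    category: str
    title: str


@dataclass(frozen=True)
class User:
    uid: str
    answered_qids: frozenset  # ids of questions this user has answered


def routing_label(question: Question, user: User) -> int:
    """Return 1 if the user answered the question, otherwise 0."""
    return 1 if question.qid in user.answered_qids else 0


def build_pairs(questions, users):
    """Pair every question with every user (an assumption) and label each pair."""
    return [(q.qid, u.uid, routing_label(q, u)) for q in questions for u in users]
```

A classifier is then trained on features derived from each pair, with the label as the target.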
Feature Investigation
• Local Features
– Need only local information about the question, the user's history, and the question-user relationship
• Global Features
– Take into account the global information of the CQA service
– Consider the category as the global information
– Questions in the same category discuss similar topics
– Incorporating global information acts as a smoothing effect
Feature Investigation

Feature Investigation Summary

| # of features | Question | User History | Question-User Relationship |
| --- | --- | --- | --- |
| Local Features | 3 | 10 | 7 |
| Global Features | 3 | 2 | 1 |
Local Features
• Question (3 features)
– Question Length
  • Agichtein et al. 2008 found question length to be an important feature for measuring question quality
  1. Title length
  2. Detail length
– Question Type
  3. 5W1H type
  – Why, what, where, who and how
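The three local question features could be extracted roughly as follows. This is a minimal sketch under stated assumptions: lengths are counted in whitespace-separated words (the slides do not say whether words or characters were used), the type is taken from the first interrogative word in the title, and the word list is the conventional 5W1H set, which includes "when" even though the slide's own list omits it. The function name `question_features` is hypothetical.

```python
import re

# Assumption: the conventional 5W1H set; the slide lists only five of these.
WH_WORDS = ("who", "what", "when", "where", "why", "how")


def question_features(title, detail=""):
    """Return title length, detail length, and a coarse 5W1H type."""
    tokens = re.findall(r"[a-z]+", title.lower())
    # First interrogative word found in the title, or "other" if none.
    wh_type = next((tok for tok in tokens if tok in WH_WORDS), "other")
    return {
        "title_length": len(title.split()),
        "detail_length": len(detail.split()),
        "wh_type": wh_type,
    }
```

For a linear classifier the categorical `wh_type` would still need to be encoded numerically, for example one indicator per type.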
Local Features
• User History (10 features)
– A user's history has implications for the user's interests and behaviors
– Profile, question and answering behaviors
  1. Member since
  2. Percentage of best answers
  3. Total points
  4. Number of answers
  5. Number of best answers
  6. Number of asked questions
  7. Number of resolved questions
  8. Number of stars received
  9. Answer/question ratio
  10. Best answer/question ratio
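The ratio-style features in this list can be derived from the raw counts. The sketch below assumes that "percentage of best answers" means best answers divided by all answers, and that a ratio with a zero denominator is reported as 0.0; neither choice is stated in the slides, and the function names are hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0


def user_history_ratios(n_answers, n_best_answers, n_questions):
    """Derive the ratio features from a user's raw activity counts."""
    return {
        "pct_best_answer": safe_ratio(n_best_answers, n_answers),
        "answer_question_ratio": safe_ratio(n_answers, n_questions),
        "best_answer_question_ratio": safe_ratio(n_best_answers, n_questions),
    }
```

Guarding the division matters in practice because many users have asked no questions or given no answers.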
Local Features
• Question-User Relationship (7 features)
– Capture the relationship between a question and a user
– Feature adapted from the existing CQA service
  1. Top contributor
– Features that measure how interested the user is in the category the given question belongs to
  2. Ratio of answered questions in the category
  3. Ratio of best-answered questions in the category
  4. Ratio of asked questions in the category
  5. Ratio of starred questions in the category
– Features describing the similarity between the question's language model and the user's language model
  6. KL-divergence between the given question and the user's answered questions
  7. KL-divergence between the given question and the user's background language model (answered, asked, and starred questions)
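Features 6 and 7 compare two language models with KL-divergence. A minimal sketch follows, assuming unigram models over whitespace tokens, additive smoothing with a hypothetical parameter `alpha`, and the direction KL(question || user); the slides do not specify the model order, the smoothing, or the direction.

```python
import math
from collections import Counter


def unigram_lm(texts, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def question_user_kl(question_text, user_texts, alpha=0.1):
    """Lower values mean the question looks more like the user's texts."""
    vocab = set(question_text.lower().split())
    for text in user_texts:
        vocab.update(text.lower().split())
    p = unigram_lm([question_text], vocab, alpha)
    q = unigram_lm(user_texts, vocab, alpha)
    return kl_divergence(p, q)
```

Passing only the user's answered questions as `user_texts` would correspond to feature 6; adding asked and starred questions would correspond to feature 7.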
Global Features
• Question (3 features)
– Category-level features that smooth each question
  1. Average title length
  2. Average detail length
– Whether the question is representative of its category
  3. KL-divergence between the given question and the questions in the category the given question belongs to
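The two category-level averages can be computed in a single pass over the question collection. The sketch below assumes questions arrive as `(category, title, detail)` tuples and that length is counted in words; both are illustrative choices rather than details given in the slides.

```python
from collections import defaultdict


def category_length_averages(questions):
    """questions: iterable of (category, title, detail) tuples."""
    # Per category: [total title words, total detail words, question count]
    sums = defaultdict(lambda: [0, 0, 0])
    for category, title, detail in questions:
        acc = sums[category]
        acc[0] += len(title.split())
        acc[1] += len(detail.split())
        acc[2] += 1
    return {
        category: {"avg_title_length": t / n, "avg_detail_length": d / n}
        for category, (t, d, n) in sums.items()
    }
```

Every question in a category then receives the same two values, which is what gives these features their smoothing role.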
Global Features
• User History (2 features)
– Capture the uniqueness of a user
• Question-User Relationship (1 feature)
– The more similar the language model of a user's answered questions is to that of the questions in a category, the more likely the user is to answer questions from that category
  • KL-divergence between the user's answered questions and the questions in the category the given question belongs to
Experiments
• Classification Algorithm
– Support vector machines (SVM) with linear kernel
• Metrics
– Precision, recall, F1 for positive class
– Accuracy for both classes
• Dataset
– Crawled from 3,500 users' “Answers”, “Questions”, and “Starred Questions” pages on Yahoo! Answers
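The four metrics follow the standard definitions and can be computed from the confusion counts. This is a plain-Python sketch, not the evaluation code used in the experiments; the function name `binary_metrics` and the convention of returning 0.0 for an undefined ratio are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

Here the positive class is "the user answers the question", so precision asks how many routed pairs were correct and recall asks how many actual answerers were found.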
Effect of Local Features

| Features | Precision | Recall | F1 | Accuracy |
| --- | --- | --- | --- | --- |
| Question | 0.5314 | 0.3896 | 0.4496 | 0.5157 |
| User History | 0.8278 | 0.4682 | 0.5981 | 0.6805 |
| Question-User Relationship | 0.5824 | 0.935 | 0.7178 | 0.6267 |

• Question-User Relationship achieves the best F1 and recall
– Captures the user's performance and interests in the category of the given question
– Captures the semantic relatedness of the given question and the user
• User History achieves the best precision
– Some users are quite active in the system
– These highly active users account for only a small percentage of all users
Effect of Local Features

| Features | Precision | Recall | F1 | Accuracy |
| --- | --- | --- | --- | --- |
| Q + QU Relationship | 0.5974 | 0.9134 | 0.7223 | 0.6435 |
| U + QU Relationship | 0.7362 | 0.8275 | 0.7792 | 0.7619 |
| Q + U + QU Relationship | 0.7418 | 0.8253 | 0.7814 | 0.7655 |
| Top 10 features in Local features | 0.6964 | 0.8095 | 0.7487 | 0.7241 |

• The combination of all local features achieves the best F1
• Results from using only the top 10 features are also encouraging
Effect of Local Features
• Two most important local features
– KL-divergence between the given question and questions answered by the user
  • Captures the most accurate semantic relatedness between the given question and the knowledge of the user
– KL-divergence between the given question and questions answered, asked, and starred by the user
  • Also considers the user's interests by incorporating other factors
Effect of Local and Global Features

| Features | Precision | Recall | F1 | Accuracy |
| --- | --- | --- | --- | --- |
| Local | 0.7418 | 0.8253 | 0.7814 | 0.7655 |
| Global | 0.5779 | 0.8713 | 0.6949 | 0.6109 |
| Local + Global | 0.7279 | 0.8499 | 0.7842 | 0.7689 |

• Combining local and global features keeps the best elements of both and consequently achieves the best F1 score
Effect of Local and Global Features
• Three most important features
– KL-divergence between the given question and questions answered by the user
– KL-divergence between the given question and questions answered, asked, and starred by the user
– KL-divergence between the given question and questions from the same category
  • If a question is quite typical of its category, it has a higher chance of being answered by users; this could also partially explain why CQA services usually have well-structured categories
Related Work
• Question Routing
– Zhou et al. 2009, expertise-based question routing
– Li and King 2010, language-model-based framework for combining expertise estimation and availability estimation
– Li et al. 2011, category-sensitive language model
• Link Analysis and Expert Finding
– Jurczyk and Agichtein, 2007
– Zhang, Ackerman and Adamic, 2007
– Apply PageRank and HITS in social media
Conclusions
• Formulate question routing as a classification task
• Derive a variety of local and global features
• Analyze the contributions from different sources
• Thorough experimental study
Future Work
• Semi-supervised approach
• Incorporate social aspects into the model
Thanks Q&A