a question of complexity - measuring the maturity of online enquiry communities
Post on 05-Dec-2014
A QUESTION OF COMPLEXITY − MEASURING THE MATURITY OF ONLINE ENQUIRY COMMUNITIES
GRÉGOIRE BUREL¹ AND YULAN HE²
¹Knowledge Media Institute, The Open University, Milton Keynes, UK.
²School of Engineering & Applied Science, Aston University, UK.
HT 2013, Paris, France, 2013
OUTLINE
- Question Complexity and Community Maturity
  - Enquiry Communities
  - Server Fault
  - Needs and Motivations
  - Contributions
- Hypotheses and Validation
  - Two Definitions
  - Five Hypotheses
  - Validation
- Computing and Mapping Features
  - Predictors
  - Feature Computation: Users, Content and Threads
- Measuring Content Complexity and Community Maturity
  - Prediction Results
  - Feature Ranking
  - Community Maturity
- Future Work
- Conclusion
ENQUIRY COMMUNITIES
“Enquiry Communities are communities composed of askers and answerers looking for solutions to particular issues.”
- Server Fault (SF):
  - A web-based IT enquiry community specialised in server-related issues.
  - Factual questions rather than conversational questions.
  - Dataset (data up to April 2011):
    - 71,962 questions
    - 162,401 answers
    - 51,727 users
    - 4,999 topics (tags)
http://serverfault.com
ENQUIRY COMMUNITIES
- Enquiry Communities Needs (Rowe et al., 2011; Burel et al., 2012):
  - Community Managers:
    - Make sure that the community is "happy" (questions are solved).
    - Make sure that the community becomes more knowledgeable over time (users gain expertise and experience).
    - Identify and implement features that help users achieve their goals.
  - Askers:
    - Get answers related to a particular issue.
    - Make sure that a community can fulfil their needs before asking a question.
  - Answerers:
    - Find which questions they can answer.
    - Find questions that are challenging.
ISSUES AND MOTIVATION
- Enquiry Communities Needs:
  - Questions have uneven complexity:
    - It is difficult to identify how hard particular questions are and who can answer them.
  - Communities have different answering abilities:
    - Some communities can answer simple questions about a topic, while others can also answer complex questions.
    - How do we determine whether a community is able to answer complex questions?
  - Some communities are more knowledgeable and experienced than others:
    - How do we measure experience and expertise?
  - Features can support the identification of mature communities and complex content, but which ones?
    - What features help to measure community maturity and content complexity?
IDENTIFYING COMPLEX QUESTIONS AND MATURE COMMUNITIES
How do user, content, thread and platform features affect the identification of content complexity? How can we measure maturity based on content complexity?
1. Identifying Complex Questions:
   - Helping answerers find relevant and challenging questions.
2. Analysis of Complexity Predictors:
   - Helping community managers identify important complexity factors.
3. Measuring Community Maturity:
   - Helping users decide whether their question will be answered, and helping community managers understand their community's abilities.
CONTRIBUTIONS
How do user, content, thread and platform features affect content complexity? How can we use content complexity to measure the maturity of communities?
- Introduce a definition of question complexity and validate the hypothesis that question complexity increases with askers' community involvement.
- Study the influence of features relating to askers, answerers, questions and answers on question complexity prediction.
- Introduce the concept of community maturity, a measure of community knowledge and specialisation.
- Investigate the evolution of community maturity in Server Fault and demonstrate that community maturity is influenced by topical dynamics.
LITERATURE
- No empirical study of the relation between content complexity and community involvement.
- No free-form model of content complexity: existing models are typically very domain-dependent (Wu, 2009; Bachrach et al., 2012).
- Community health metrics (Welinder et al., 2010; Toral et al., 2009; Rowe et al., 2011) tend to neglect skill building as a key health indicator, despite the importance of this factor in user participation (Pal et al., 2012; Nam et al., 2009).
QUESTION COMPLEXITY AND MATURITY
- Definition 1 (Question Complexity):
  - Question complexity is a value representing the difficulty of a question and the level of expertise required to answer it.
- Definition 2 (Community Maturity):
  - Community maturity is a value representing the level of knowledge and specialisation achieved by a community. A more mature community focuses on more complex questions, whereas a less mature community has simpler and less focused questions.
QUESTION COMPLEXITY AND MATURITY
- Hypothesis 1 (Temporality):
  - For a given user, question complexity increases as a function of time and participation. The longer a user is actively involved in a community, the more complex her questions are.
- Hypothesis 2 (Enquiry):
  - For a given user, question complexity increases with the number of questions asked. The more questions a user asks, the more likely her questions are to become complex.
- Hypothesis 3 (Commitment):
  - For a given user, question complexity increases with her activity level. The more frequently a user is involved in a community, the more complex her questions are.
- Hypothesis 4 (Accomplishment):
  - For a given user, question complexity increases with the number of questions for which she has previously found answers. The more answers a user finds to her questions, the more likely she is to improve her knowledge and skills, and thus to ask more complex questions in the future.
- Hypothesis 5 (Focus):
  - For a given user, question complexity increases with her topical focus. The more a user concentrates her questions on specific topics, the more likely her questions are to become complex.
[Figure: complexity against participation]
HYPOTHESES VALIDATION
- Methodology:
  1. Select 510 question pairs based on the previous hypotheses:
     - Questions from early and late user contributions.
  2. Annotate the question pairs by selecting which question is the more complex:
     - Due to low inter-annotator agreement (κ = 0.146 for 3 annotators), we focus on pairs with more than 75% agreement (220 pairs, 440 questions).
  3. Calculate the statistical significance of each hypothesis:
     - Focus on Hypothesis 1: Temporality.
- Results (Hypothesis 1):
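The agreement filter and significance test in the methodology can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not say which kappa variant or which significance test was used, so Fleiss' kappa (a common choice when every item has the same number of annotators) and a one-sided exact sign test are assumptions, and the labels "early"/"late" (which question of a pair was judged more complex) are hypothetical.

```python
import math
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items that each received the same number of labels.

    ratings[i] holds the labels the annotators gave to question pair i,
    e.g. "early" or "late" for whichever question they judged more complex.
    """
    n = len(ratings[0])  # annotators per pair
    total_items = len(ratings)
    categories = {label for item in ratings for label in item}
    counts = [Counter(item) for item in ratings]
    # Mean observed agreement: fraction of agreeing annotator pairs per item.
    p_obs = sum(
        (sum(c[k] ** 2 for k in categories) - n) / (n * (n - 1)) for c in counts
    ) / total_items
    # Chance agreement from the overall label proportions.
    p_exp = sum(
        (sum(c[k] for c in counts) / (total_items * n)) ** 2 for k in categories
    )
    if p_exp == 1:
        return 1.0  # every annotator used a single label: agreement is trivial
    return (p_obs - p_exp) / (1 - p_exp)


def majority_share(item: list[str]) -> float:
    """Fraction of annotators who chose the most frequent label."""
    return Counter(item).most_common(1)[0][1] / len(item)


def confident_pairs(ratings: list[list[str]], threshold: float = 0.75) -> list[int]:
    """Indices of pairs whose majority share is strictly above `threshold`."""
    return [i for i, item in enumerate(ratings) if majority_share(item) > threshold]


def sign_test_p(successes: int, trials: int) -> float:
    """One-sided exact sign test: P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    tail = sum(math.comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


ratings = [
    ["late", "late", "late"],
    ["early", "late", "late"],
    ["early", "early", "early"],
    ["late", "late", "late"],
]
kept = confident_pairs(ratings)
later_wins = sum(Counter(ratings[i]).most_common(1)[0][0] == "late" for i in kept)
print(round(fleiss_kappa(ratings), 3), kept, sign_test_p(later_wins, len(kept)))
```

With three annotators and two labels, the majority share is either 2/3 or 1, so "more than 75% agreement" can only be met by unanimous pairs.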
FEATURES
1. User Features (Askers and Answerers):
   - Represent the characteristics and reputation of askers and answerers (e.g. reputation, number of best answers, normalised topic entropy…).
2. Questions and Answers Features:
   - Question and answer features (e.g. readability, ratings, number of views…).
   - Represent the relation between answers within a particular thread (e.g. topic reputation, elapsed days…).
   - Content-based features (e.g. term entropy, readability…).
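Two of the entropy-based features above can be sketched in a few lines. The slides do not give the formulas, so this is a sketch under assumptions: normalised topic entropy is taken as the Shannon entropy of a user's tag distribution divided by its maximum possible value (the log of the number of distinct tags), and term entropy as the entropy of a post's lower-cased, whitespace-separated tokens. The function names are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Iterable


def shannon_entropy(items: Iterable[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def normalised_topic_entropy(tags: list[str]) -> float:
    """0.0 = all questions on one topic (focused); 1.0 = spread evenly over topics."""
    distinct = len(set(tags))
    if distinct <= 1:
        return 0.0
    return shannon_entropy(tags) / math.log(distinct)


def term_entropy(text: str) -> float:
    """Entropy of the lower-cased, whitespace-tokenised words of a post."""
    return shannon_entropy(text.lower().split())


focused = ["linux", "linux", "linux", "apache"]
spread = ["linux", "windows", "dns", "apache"]
print(normalised_topic_entropy(focused), normalised_topic_entropy(spread))
```

Dividing by the log of the number of distinct tags makes users with different numbers of topics comparable; a lower value means a more focused asker.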
FEATURES

| Type | Features |
|---|---|
| Askers | Community Age (Experience), Community Age Difference, Number of Questions (Enquiry), Number of Answers, Asking Rate (Asker Commitment), Answering Rate, Ratio of Successfully-Answered Questions (Accomplishment), Ratio of Questions Successfully Answered by Others, Normalised Question Topic Entropy (Focus), Normalised Answer Topic Entropy, Average Number of Replies per Question, Average Number of Question Views, Z-score, Reputation. |
| Answerers | Askers features + mean and standard deviation forms. |
| Questions | Number of Views, Number of Words, Readability with Gunning Fog, Readability with Flesch-Kincaid Grade, Existing Value, Status, Number of Answers, Favourites, Score, Informativeness, Cumulative Term Entropy. |
| Answers | Questions features + mean and standard deviation forms + Elapsed Days, Elapsed Days First, Elapsed Days Last, Number of Comments Mean, Score. |
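Several asker features in the table map directly onto the hypotheses (Experience, Enquiry, Asker Commitment, Accomplishment). A minimal sketch of how they might be computed from a user's history follows. The `Question` record, its `resolved` flag (taken here to mean the question received an accepted answer) and measuring community age from a join date are assumptions, not the authors' definitions. Only questions asked before the reference date are counted, so the features describe the asker at the time of asking.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Question:
    asker: str
    asked_on: date
    resolved: bool  # hypothetical: True if the question received an accepted answer


def asker_features(history: list[Question], asker: str, joined: date, at: date) -> dict[str, float]:
    """Features of `asker` at date `at`, using only questions asked strictly before it."""
    past = [q for q in history if q.asker == asker and q.asked_on < at]
    community_age = (at - joined).days  # Experience / Temporality (Hypothesis 1)
    n_questions = len(past)  # Enquiry (Hypothesis 2)
    # Asker Commitment (Hypothesis 3): questions per day of membership.
    asking_rate = n_questions / community_age if community_age > 0 else 0.0
    # Accomplishment (Hypothesis 4): share of past questions that were resolved.
    solved = sum(q.resolved for q in past)
    success_ratio = solved / n_questions if n_questions else 0.0
    return {
        "community_age_days": community_age,
        "number_of_questions": n_questions,
        "asking_rate_per_day": asking_rate,
        "successfully_answered_ratio": success_ratio,
    }


history = [
    Question("alice", date(2010, 1, 10), True),
    Question("alice", date(2010, 2, 1), False),
    Question("bob", date(2010, 2, 3), True),
    Question("alice", date(2010, 3, 1), True),
]
print(asker_features(history, "alice", joined=date(2010, 1, 1), at=date(2010, 3, 1)))
```

The strict `asked_on < at` filter keeps the question being classified out of its own asker's features, which avoids leaking the target into the predictors.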
QUESTION COMPLEXITY PREDICTION
- Experimental Setting:
  1. Split the annotated questions (440 questions) into complex and non-complex questions.
  2. Compute features.
  3. Use a Logistic Regression algorithm and validate results using 10-fold cross-validation.
  4. Compute Precision (P), Recall (R), F-Measure (F1) and the area under the Receiver Operating Characteristic (ROC) curve for different feature groups.
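The evaluation steps can be sketched without any ML library. The fold assignment below is a plain shuffled (non-stratified) 10-fold split, and the ROC area is computed as the probability that a complex question outscores a non-complex one; both are illustrative choices, since the slides do not say which toolkit or fold scheme was used. The logistic regression itself is omitted: `scores` stands in for its predicted probabilities, and the 0.5 threshold is an assumption.

```python
import random


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """P, R and F1 for the positive class (1 = complex question)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Area under the ROC curve: chance a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


folds = k_fold_indices(440)  # 440 annotated questions -> 10 test folds of 44
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
print(precision_recall_f1(y_true, y_pred), roc_auc(y_true, scores))
```

In each round, one fold is held out for testing and the model is trained on the other nine; the metrics are then aggregated over the ten rounds.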
COMPLEXITY PREDICTION RESULTS
- Question Complexity Prediction (F1 0.60):
  - Baseline Models:
    - Asker's age in a community correlates better with question complexity than question length does.
    - Question length is not correlated with complex questions.
  - Feature Types Models and Complete Model:
    - Asker and answerer features perform best: question complexity is mostly related to askers' features.
    - The full model performs better than the feature type models.
FEATURES RANKING
- Features Ranking:
  1. For each feature, the Information Gain Ratio (IGR), Correlation-based Feature Selection (CFS) and F1 Feature Drop (FD) are computed.
  2. The features are then sorted by their respective importance.
  3. The best features are then selected to compute a new question complexity model, keeping the selection that gives the best F1.
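Information Gain Ratio, the first ranking criterion, divides the information a feature gives about the class by the feature's own entropy (the split information), which penalises features with many distinct values. The sketch below assumes features have already been discretised (numeric features such as community age would need binning first); the feature names in the example are hypothetical.

```python
import math
from collections import Counter


def entropy(values: list) -> float:
    """Shannon entropy in bits of the empirical distribution of `values`."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain_ratio(feature: list, labels: list) -> float:
    """IGR = (H(Y) - H(Y | X)) / H(X) for a discrete feature X and class Y."""
    groups: dict = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # H(Y | X): class entropy inside each feature value, weighted by its frequency.
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return (entropy(labels) - conditional) / split_info


def rank_features(features: dict[str, list], labels: list) -> list[str]:
    """Feature names sorted from highest to lowest IGR."""
    return sorted(
        features,
        key=lambda name: information_gain_ratio(features[name], labels),
        reverse=True,
    )


labels = [1, 1, 0, 0]  # 1 = complex question
features = {
    "asker_age_bin": ["old", "old", "new", "new"],  # hypothetical binned feature
    "views_bin": ["high", "low", "high", "low"],
}
print(rank_features(features, labels))
```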
FEATURES RANKING RESULTS
- Features Impact Comparison:
  - Asker's community age and topical focus are the most important features.
  - User features are the most significant (73.3% of the top ten features).
  - Answer features are ranked low.
  - Focused users are more likely to ask complex questions.
  - Questions with low existing value (Pal et al., 2010) are more likely to be complex (this complements findings on the question selection behaviour of experts (Pal et al., 2010)).
BEST MODEL RESULTS
- Best Model (F1 0.64):
  - The best model is obtained using CFS; the selected features are:
    1. Asker's question topical focus.
    2. Asker's ratio of successfully-answered questions.
    3. Asker's community age.
    4. Questions' existing value (Pal et al., 2010).
    5. Questions' views.
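CFS scores a feature subset rather than single features: it favours subsets whose features correlate with the class but not with each other, using Hall's merit heuristic. The sketch below uses the absolute Pearson correlation and a greedy forward search; Hall's formulation also allows other correlation measures (e.g. symmetrical uncertainty for discrete features) and other search strategies (e.g. best-first), so treat both choices as assumptions. The feature names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cfs_merit(subset: list[str], features: dict[str, list[float]], target: list[float]) -> float:
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_selection(
    features: dict[str, list[float]], target: list[float], tol: float = 1e-9
) -> list[str]:
    """Greedily add the feature that most increases merit; stop when merit stops improving."""
    selected: list[str] = []
    best = 0.0
    remaining = set(features)
    while remaining:
        merit, name = max((cfs_merit(selected + [f], features, target), f) for f in remaining)
        if merit <= best + tol:  # tolerance guards against floating-point noise
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected


target = [0, 0, 0, 1, 1, 1]  # 1 = complex question
features = {
    "topical_focus": [0.1, 0.2, 0.0, 0.9, 1.0, 0.8],
    "focus_copy": [0.1, 0.2, 0.0, 0.9, 1.0, 0.8],  # redundant duplicate
    "noise": [1, 0, 1, 0, 1, 0],
}
print(cfs_forward_selection(features, target))
```

The redundant copy adds no merit once its twin is selected, which is why CFS tends to return a short list of complementary features.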
COMMUNITY MATURITY
- Maturity Measure:
- Experimental Setting:
  1. Calculate maturity based on the proportion of complex questions asked per month.
  2. Compute maturity for different user sets, depending on their age in the community.
  3. Compute maturity for the most discussed topics (tags) and for users that have been active for more than a day.
  4. Observe the evolution of maturity for the most discussed topics and the different user groups.
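The maturity computation can be sketched as a grouped monthly proportion. This assumes, as a simplification, that maturity for a group and month is the share of that month's questions labelled complex (for example by the complexity model). The record fields (`month`, `complex`, `tag`, `asker_age_days`), the 180-day split between newer and older users and the grouping functions are hypothetical.

```python
from collections import defaultdict
from collections.abc import Callable, Iterable


def monthly_maturity(
    questions: Iterable[dict],
    group_of: Callable[[dict], str] = lambda q: "all",
) -> dict[tuple[str, str], float]:
    """Share of complex questions per (group, month), sorted by group then month."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    complex_counts: dict[tuple[str, str], int] = defaultdict(int)
    for q in questions:
        key = (group_of(q), q["month"])
        totals[key] += 1
        complex_counts[key] += int(q["complex"])
    return {key: complex_counts[key] / totals[key] for key in sorted(totals)}


questions = [
    {"month": "2010-01", "complex": True, "tag": "linux", "asker_age_days": 300},
    {"month": "2010-01", "complex": False, "tag": "linux", "asker_age_days": 0},
    {"month": "2010-02", "complex": True, "tag": "linux", "asker_age_days": 320},
    {"month": "2010-01", "complex": False, "tag": "windows", "asker_age_days": 5},
]
# Maturity per topic (tag) and month.
by_topic = monthly_maturity(questions, lambda q: q["tag"])
# Maturity per user-age group, keeping only askers active for more than a day.
by_users = monthly_maturity(
    (q for q in questions if q["asker_age_days"] > 1),
    lambda q: "older" if q["asker_age_days"] > 180 else "newer",
)
print(by_topic, by_users)
```

Plotting each group's values month by month gives the kind of evolution curves discussed in the results.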
COMMUNITY MATURITY RESULTS
[Figure: community maturity results; panels: Users, Topics/Communities]
- User Evolution:
  - Maturity increases over time.
  - The drop in maturity can be explained by the drop in average community age at the end of 2010 (from 229 to 185 days).
  - Committed users are more likely to become more mature (0.64 > 0.4).
- Community Evolution and Topics:
  - Maturity increases over time.
  - Different topics have different growth rates. For example:
    - Linux: slow but sustained growth → Linux users become more knowledgeable over time.
    - Windows-server-2008: initially high, then low → users migrating to Windows-server-2008-r2.
FUTURE WORK
- Perform a similar analysis on other Enquiry Communities:
  - Confirm our results on additional datasets.
- Derive a complexity metric that can be applied to any online community based on the 5 factors of complexity:
  - Create a measure that does not require annotations.
CONCLUSION
- We showed that current health measures do not help in identifying communities that become more topic-proficient over time.
- We introduced the concepts of question complexity and community maturity, and provided a complexity model (F1 ≈ 0.65) and a maturity measure.
- We showed that question complexity depends on user activity and commitment, as well as other factors (hypothesis testing).
- We found that complex questions depend on five key factors: 1) asker's question topical focus; 2) asker's ratio of successfully-answered questions; 3) asker's community age; 4) questions' existing value (Pal et al., 2010); and 5) questions' views.
- We showed that SF is a mature community and that maturity has topical dynamics.
QUESTIONS?
Web: http://evhart.online.fr
Email: g.burel@open.ac.uk
Twitter: @evhart
REFERENCES
- Rowe, M., Alani, H., Angeletou, S., and Burel, G. Report on social, technical and corporate needs in online communities. Tech. Rep. 3.1, ROBUST, 2011.
- Burel, G., He, Y., and Alani, H. Automatic Identification of Best Answers in Online Enquiry Communities. In Proceedings of ESWC 2012 (2012), Heraklion, Greece.
- Wu, M. The community health index. In Proceedings of the 4th International Conference on Persuasive Technology (New York, NY, USA, 2009), Persuasive ’09, ACM, pp. 24:1–24:2.
- Bachrach, Y., Graepel, T., Minka, T., and Guiver, J. How to grade a test without knowing the answers: a Bayesian graphical model for adaptive crowdsourcing and aptitude testing. arXiv preprint arXiv:1206.6386 (2012).
- Welinder, P., Branson, S., Belongie, S., and Perona, P. The multidimensional wisdom of crowds. In Proc. of NIPS (2010), pp. 2424–2432.
- Toral, S. L., Martínez-Torres, M. R., Barrero, F., and Cortals, F. An empirical study of the driving forces behind online communities. Internet Research 19, 4 (2009), 378–392.
- Pal, A., Chang, S., and Konstan, J. Evolution of experts in question answering communities. In Proceedings of the International AAAI Conference on Weblogs and Social Media (2012), pp. 274–281.
- Nam, K., Ackerman, M., and Adamic, L. Questions in, knowledge in?: a study of Naver’s question answering community. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (2009), pp. 779–788.