Data Triangulation in a User Evaluation of the Sealife Semantic Web Browsers
Helen Oliver, Patty Kostkova, Ed de Quincey
City eHealth Research Centre (CeRC), City University London
User-Centred Evaluation of Semantic Web Browsers
• The Semantic Web for Life Sciences
– Browse for meaning
– Find answers to critical questions faster
– Computer scientists love SWBs!
• First-ever user-centred evaluation of SWBs recruiting REAL-WORLD users
– Do real users love SWBs too?
• Realistic user-centred evaluation has been neglected for SWBs!
User-Centred Evaluation of Semantic Web Browsers
• Use Triangulation to consider all angles
– Essential to our innovative evaluation framework
– (Quantitative data: web server logs, questionnaire results) + (Qualitative data: semi-structured interviews) = (Validation AND Completeness)
• Triangulation has been neglected in user-centred evaluations of SWBs!
• Group A1: Infectious Disease Professionals
– CORESE-based SWB vs NeLI
– COHSE vs NeLI
• Group A2: Microbiologists
– GoPubMed/GoGene vs PubMed
Use of Triangulation for Semantic Web
• Quantitative Data Sources:
– Web Form Questionnaires
  • Pre-questionnaire
  • Post-task questionnaires
  • Post-questionnaire
– Web Server Logs
• Qualitative Data Sources:
– Semi-Structured Interviews (subset of participants)
• Evaluation Settings:
– Online
– Workshops
Value of Data Triangulation in Interpreting the Results
• Questionnaires
– Findability
– Usability
– System Speed
– Relevance
– Likeability
• Web Server Logs
– Task Completion Time
– Usage of Semantic Links
– # of External Pages Viewed
– Views of Target Documents
• Semi-Structured Interviews
– Answers to questions we didn’t think to ask…
– Observe participants to assess system intuitiveness
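The four log-based measures listed above could be derived from per-task event records along the following lines. This is only a sketch: the `LogEvent` record, the event kinds (`task_start`, `semantic_link`, `external_page`, `target_doc`, `task_end`), and the example events are assumptions made for illustration, not the format of the Sealife web server logs.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEvent:
    """One simplified log record (hypothetical format, not the real Sealife logs)."""
    user: str
    task: str
    time: datetime
    kind: str  # assumed kinds: task_start, semantic_link, external_page, target_doc, task_end


def task_metrics(events):
    """Summarise the events of one user on one task into the four log measures."""
    ordered = sorted(events, key=lambda e: e.time)
    start = next(e.time for e in ordered if e.kind == "task_start")
    end = next(e.time for e in ordered if e.kind == "task_end")
    counts = Counter(e.kind for e in ordered)  # missing kinds count as 0
    return {
        "completion_seconds": (end - start).total_seconds(),
        "semantic_links": counts["semantic_link"],
        "external_pages": counts["external_page"],
        "target_views": counts["target_doc"],
    }


def at(clock):
    """Build a timestamp on an arbitrary, made-up date."""
    return datetime.fromisoformat(f"2000-01-01T{clock}")


# Made-up events for one user and one task, for illustration only.
example_events = [
    LogEvent("u1", "t1", at("10:00:00"), "task_start"),
    LogEvent("u1", "t1", at("10:00:20"), "semantic_link"),
    LogEvent("u1", "t1", at("10:00:40"), "external_page"),
    LogEvent("u1", "t1", at("10:01:10"), "target_doc"),
    LogEvent("u1", "t1", at("10:01:30"), "task_end"),
]
example_metrics = task_metrics(example_events)
print(example_metrics)
```

Keeping the summary per user and per task makes it straightforward to line the log measures up with that user's post-task questionnaire later on.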
Sealife Results
• COHSE: 67 respondents
– 39 online
– 28 in workshops
• CORESE: 14 respondents
– 2 online (only 1 completed)
– 12 in workshops
• GoPubMed:
– 137 online
– 4 in workshop
• GoGene + Extended GoPubMed:
– 14 in workshop
Qualitative results not statistically significant (few interviews conducted)
Web Server Logs
• PubMed was faster than GoGene
• Faster => Better…
• So, users liked PubMed better than GoGene, right?
• Web Server Logs Don’t Lie!
Questionnaires
• Best for:
– Likeability, Information Findability, Relevance, System Speed: GoPubMed/GoGene
– Usability: COHSE
• Highest Number of Positive Ratings:
– GoPubMed/GoGene
• Largest Positive Mode Differences Between Control and Intervention:
– GoPubMed/GoGene
• Fewest Negative Mode Ratings Compared to Control:
– GoPubMed/GoGene NEVER had worse mode scores than PubMed!
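A mode difference between control and intervention can be computed per questionnaire criterion roughly as follows. The 1 to 5 rating scale, the criterion names used as keys, and all ratings in this sketch are invented for illustration; they are not the Sealife questionnaire data.

```python
from statistics import mode


def mode_difference(control, intervention):
    """Mode of the intervention ratings minus mode of the control ratings.

    A positive value means the most common rating was higher for the intervention.
    """
    return mode(intervention) - mode(control)


# Made-up ratings on an assumed 1-5 scale (5 = best), one list per platform.
ratings = {
    "findability": {"control": [3, 3, 4, 2, 3], "intervention": [4, 4, 5, 3, 4]},
    "usability": {"control": [4, 4, 3, 4], "intervention": [4, 3, 4, 4]},
}

differences = {
    criterion: mode_difference(r["control"], r["intervention"])
    for criterion, r in ratings.items()
}

# "Never worse than control" holds when no criterion has a negative difference.
never_worse = all(d >= 0 for d in differences.values())
print(differences, never_worse)
```

The mode is used here because the slide reports mode scores; with ordinal questionnaire ratings it avoids treating the scale as evenly spaced, which a mean would assume.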
Semi-Structured Interviews
• So the winner is GoPubMed/GoGene
• COHSE was rated the most usable
– what more could we want?
• Well…
– Critiques in GoPubMed/GoGene interviews were about the details
– Critiques in COHSE/CORESE interviews were about being able to use the systems at all
• At first, it turned out that some could not tell control from intervention!
• When asked for critiques of COHSE or CORESE, users gave abundant detail… about NeLI!
– Yes, but what about COHSE? “Those awful little boxes? They were really distracting, I didn’t really understand what they were.”
• Presentations explaining the SWBs improved users’ understanding
Validation
• We were expecting discrepancy between logs, questionnaires, and interviews
– True for COHSE’s findability ratings
  • Workshop users rated it as adequate or good
  • Logs showed that none of these users had found the answer
– Triangulation revealed discrepancies in plausible results
– Otherwise users were generally consistent
• We suspected one user of giving fake answers because she was exceptionally positive in her questionnaires and interview
– Task logs showed that she was one of the fastest (1-2 min per task)
  » …but 2 others were faster!
– Logs showed that she activated 4 link boxes
  » …matching the median for all respondents
– Logs showed that she viewed only 1 external page
  » …but some users didn’t view any and of those who did, 1 page was the mode
– Triangulation validated suspicious results
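Both validation checks on this slide amount to lining up each respondent's questionnaire answers with their log data. The sketch below shows one way this might be done. The record fields, the threshold of 3 for "adequate" on an assumed 1 to 5 scale, and every value in `records` are hypothetical and chosen only to mirror the kind of comparison described above.

```python
from statistics import median, multimode


def findability_discrepancies(records, adequate=3):
    """Users who rated findability as adequate or better although the logs
    show they never reached the answer."""
    return [
        r["user"]
        for r in records
        if r["findability"] >= adequate and not r["found_answer"]
    ]


def compare_to_group(user, field, records):
    """Place one user's logged value next to the group's median and mode(s)."""
    values = [r[field] for r in records]
    own = next(r[field] for r in records if r["user"] == user)
    return {
        "own": own,
        "median": median(values),
        "modes": multimode(values),
        "lower_than_own": sum(v < own for v in values),
    }


# Made-up merged records: questionnaire rating plus log-derived measures.
records = [
    {"user": "u1", "findability": 4, "found_answer": False,
     "task_minutes": 1.5, "link_boxes": 4, "external_pages": 1},
    {"user": "u2", "findability": 2, "found_answer": False,
     "task_minutes": 6.0, "link_boxes": 2, "external_pages": 0},
    {"user": "u3", "findability": 4, "found_answer": True,
     "task_minutes": 1.0, "link_boxes": 5, "external_pages": 1},
    {"user": "u4", "findability": 3, "found_answer": True,
     "task_minutes": 3.0, "link_boxes": 4, "external_pages": 1},
    {"user": "u5", "findability": 5, "found_answer": True,
     "task_minutes": 1.2, "link_boxes": 6, "external_pages": 0},
]

flagged = findability_discrepancies(records)
speed_check = compare_to_group("u1", "task_minutes", records)
links_check = compare_to_group("u1", "link_boxes", records)
print(flagged, speed_check, links_check)
```

A respondent whose values sit at the group median or mode, as in the second check, gives no log-based reason to discard their unusually positive answers.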
Completeness
• Logs showed that interviewees who spoke negatively about COHSE often had spent a long time on it
– Longer than 5 minutes
– Longer than they spent on the control platform
• Several users spent more time on GoGene than on PubMed or the extended GoPubMed, but:
– Said GoGene was their favourite
– Rated it highly on the questionnaires
• Triangulation shows the whole picture
– Faster !=> better
– Slower !=> worse
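To see why speed alone cannot stand in for preference, each respondent's logged time can be set against their questionnaire rating for both platforms. The following is a sketch under assumed field names and made-up values; the category labels are likewise only illustrative.

```python
from collections import Counter


def classify(row):
    """Label one respondent by relative speed and relative rating of the intervention."""
    speed = "slower" if row["minutes_intervention"] > row["minutes_control"] else "faster_or_equal"
    liking = "preferred" if row["rating_intervention"] > row["rating_control"] else "not_preferred"
    return f"{speed}/{liking}"


# Made-up rows: minutes from the logs, ratings from the questionnaires (assumed 1-5 scale).
rows = [
    {"user": "u1", "minutes_control": 2.0, "minutes_intervention": 4.5,
     "rating_control": 3, "rating_intervention": 5},
    {"user": "u2", "minutes_control": 3.0, "minutes_intervention": 6.0,
     "rating_control": 4, "rating_intervention": 2},
    {"user": "u3", "minutes_control": 5.0, "minutes_intervention": 2.0,
     "rating_control": 4, "rating_intervention": 4},
]

labels = {row["user"]: classify(row) for row in rows}
summary = Counter(labels.values())
print(labels)
```

Respondents landing in "slower/preferred" are the cases where a reading of the logs alone would point the wrong way, so they are natural candidates for follow-up in interviews.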
Discussion
– GoPubMed/GoGene workshop confirmed positive impressions
– CORESE workshop confirmed negative questionnaire results
– GoPubMed/GoGene workshop also confirmed:
  • That problems with this SWB were the most trivial
  • That somewhat higher questionnaire results masked dramatically better user experiences
– Impressions that COHSE was more usable were quashed by contact with users at workshop
  • Severity of problems would have gone undetected without interviews
– Low number of interviews means triangulation was not complete
  • Recruitment difficult given time pressures on user base
  • Workshops are resource-intensive
  • Future work: carefully sample a subset for interview
– Time constraints prevented gathering of observational data in situ
  • Future work: use video and/or eye tracking software
Conclusion
– We have developed a method of triangulating quantitative and qualitative data in user-centred evaluation of SWBs
  • This addresses a need for greater attention to a technique which is essential for accurate interpretation of data
– Having applied our evaluation framework we triangulated:
  • Quantitative data from the web server logs and from questionnaires
  • Qualitative data from semi-structured interviews eliciting users’ opinions on matters they identified as important
Conclusion
– Triangulation was indispensable for an accurate view of the results
  • Log data gave system speed
    – Questionnaires and interviews gave the meaning of the log data
  • Log data showed usage of semantic links
  • Log data showed whether users found the answers
    – Questionnaires and interviews revealed discrepancies between what users said and what they did
  • Questionnaires showed system intuitiveness
    – Only the interviews showed the full significance of the questionnaire results
– Only triangulation could answer the ultimate questions about user satisfaction
  • If any one data source had been left out, the results could have been misinterpreted