Data Triangulation in a User Evaluation of the Sealife Semantic Web Browsers

Helen Oliver, Patty Kostkova, Ed de Quincey

City eHealth Research Centre (CeRC), City University London

User-Centred Evaluation of Semantic Web Browsers

• The Semantic Web for Life Sciences
  – Browse for meaning
  – Find answers to critical questions faster
  – Computer scientists love SWBs!

• First-ever user-centred evaluation of SWBs recruiting REAL-WORLD users
  – Do real users love SWBs too?

• Realistic user-centred evaluation has been neglected for SWBs!

User-Centred Evaluation of Semantic Web Browsers

• Use triangulation to consider all angles
  – Essential to our innovative evaluation framework

  ( Quantitative data:
      • Web server logs
      • Questionnaire results
  + Qualitative data:
      • Semi-structured interviews )
  = (Validation AND Completeness)

• Triangulation has been neglected in user-centred evaluations of SWBs!

Group A1: Infectious Disease Professionals

CORESE-based SWB vs NeLI

COHSE vs NeLI

Group A2: Microbiologists

GoPubMed/GoGene vs PubMed

Use of Triangulation for Semantic Web

• Quantitative Data Sources:
  – Web Form Questionnaires
      • Pre-questionnaire
      • Post-task questionnaires
      • Post-questionnaire
  – Web Server Logs

• Qualitative Data Sources:
  – Semi-Structured Interviews (subset of participants)

• Evaluation Settings:
  – Online
  – Workshops

Value of Data Triangulation in Interpreting the Results

• Questionnaires
  – Findability
  – Usability
  – System Speed
  – Relevance
  – Likeability

• Web Server Logs
  – Task Completion Time
  – Usage of Semantic Links
  – Number of External Pages Viewed
  – Views of Target Documents

• Semi-Structured Interviews
  – Answers to questions we didn’t think to ask…
  – Observe participants to assess system intuitiveness
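The log-derived measures listed above can be illustrated with a minimal sketch. The slides do not describe the actual log format, so the `LogEvent` record, its fields, and the event kinds below are all hypothetical; the sketch only shows how per-user metrics of this kind might be tallied for a single task.

```python
from dataclasses import dataclass

# Hypothetical event kinds; the real log format is not described in the slides.
COUNTED = {
    "semantic_link": "semantic_links",
    "external_page": "external_pages",
    "target_view": "target_views",
}

@dataclass
class LogEvent:
    user: str
    seconds: float  # seconds since the session began (assumed field)
    kind: str       # "task_start", "task_end", or a key of COUNTED

def summarise(events):
    """Return {user: metrics} for one task, derived from LogEvent records."""
    summary = {}
    for event in sorted(events, key=lambda e: e.seconds):
        metrics = summary.setdefault(event.user, {
            "start": None, "end": None,
            "semantic_links": 0, "external_pages": 0, "target_views": 0,
        })
        if event.kind == "task_start":
            metrics["start"] = event.seconds
        elif event.kind == "task_end":
            metrics["end"] = event.seconds
        elif event.kind in COUNTED:
            metrics[COUNTED[event.kind]] += 1
    for metrics in summary.values():
        start, end = metrics.pop("start"), metrics.pop("end")
        # Completion time stays None when a task was never finished.
        metrics["completion_time"] = (
            end - start if start is not None and end is not None else None
        )
    return summary
```

Leaving the completion time as `None` for unfinished tasks keeps incomplete sessions visible rather than silently dropping them.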

Sealife Results

COHSE: 67 respondents
  39 online
  28 in workshops

CORESE: 14 respondents
  2 online (only 1 completed)
  12 in workshops

GoPubMed:
  137 online
  4 in workshop

GoGene + Extended GoPubMed:
  14 in workshop

Qualitative results not statistically significant (few interviews conducted)

Web Server Logs

• PubMed was faster than GoGene
• Faster => Better…
• So, users liked PubMed better than GoGene – right?
• Web Server Logs Don’t Lie!

Questionnaires

• Best for:
  – Likeability
  – Information Findability
  – Relevance
  – System Speed
      • GoPubMed/GoGene
  – Usability
      • COHSE

• Highest Number of Positive Ratings:
  – GoPubMed/GoGene

• Largest Positive Mode Differences Between Control and Intervention:
  – GoPubMed/GoGene

• Fewest Negative Mode Ratings Compared to Control:
  – GoPubMed/GoGene NEVER had worse mode scores than PubMed!
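The "mode difference" between control and intervention can be sketched as follows. This assumes ratings on a numeric scale (a 1-5 scale is used here purely for illustration); the sample values are invented and are not data from the study. It relies on `statistics.mode` from the Python standard library, which returns the first mode encountered when there are ties (Python 3.8+).

```python
from statistics import mode

def mode_difference(control_ratings, intervention_ratings):
    """Mode of the intervention ratings minus mode of the control ratings."""
    return mode(intervention_ratings) - mode(control_ratings)

# Invented 1-5 ratings for a single question; NOT data from the study.
control = [3, 3, 4, 2, 3]
intervention = [4, 5, 4, 4, 3]

# A positive value means the intervention's most common rating was higher.
difference = mode_difference(control, intervention)
```

A difference of zero or above on every question would correspond to the claim that the intervention never had a worse mode score than its control.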

Semi-Structured Interviews

• So the winner is GoPubMed/GoGene
• COHSE was rated the most usable
  – what more could we want?
• Well…
  – Critiques in GoPubMed/GoGene interviews were about the details
  – Critiques in COHSE/CORESE interviews were about being able to use the systems at all
      • At first, it turned out that some could not tell control from intervention!
      • When asked for critiques of COHSE or CORESE, users gave abundant detail… about NeLI!
  – Yes, but what about COHSE? “Those awful little boxes? They were really distracting, I didn’t really understand what they were.”
• Presentations explaining the SWBs improved users’ understanding

Validation

• We were expecting discrepancy between logs, questionnaires, and interviews
  – True for COHSE’s findability ratings
      • Workshop users rated it as adequate or good
      • Logs showed that none of these users had found the answer
  – Triangulation revealed discrepancies in plausible results
  – Otherwise users were generally consistent

• We suspected one user of giving fake answers because she was exceptionally positive in her questionnaires and interview
  – Task logs showed that she was one of the fastest (1-2 min per task)
      » …but 2 others were faster!
  – Logs showed that she activated 4 link boxes
      » …matching the median for all respondents
  – Logs showed that she viewed only 1 external page
      » …but some users didn’t view any and, of those who did, 1 page was the mode
  – Triangulation validated suspicious results
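The two validation checks described above can be sketched in a few lines. This is a hedged illustration, not the analysis actually used: the function names, the dictionary layout (respondent id mapped to a value), and the rating threshold are assumptions introduced here.

```python
from statistics import median, mode

def findability_discrepancies(ratings, found_answer, threshold=3):
    """Users who rated findability at or above `threshold` (assumed to mean
    "adequate or good") although the logs show they never found the answer."""
    return sorted(
        user for user, rating in ratings.items()
        if rating >= threshold and not found_answer[user]
    )

def plausibility_report(user, link_boxes, external_pages, task_minutes):
    """Compare one respondent's log metrics with those of all respondents."""
    # The mode is taken only over users who viewed at least one external page.
    viewers = [n for n in external_pages.values() if n > 0]
    typical_pages = mode(viewers) if viewers else 0
    return {
        "link_boxes_minus_median": link_boxes[user] - median(link_boxes.values()),
        "external_pages_minus_mode": external_pages[user] - typical_pages,
        "faster_respondents": sum(
            1 for minutes in task_minutes.values() if minutes < task_minutes[user]
        ),
    }
```

Values near zero in the report suggest that a respondent's behaviour was typical of the group, which is the kind of evidence that can clear an answer set that looked suspicious on the questionnaires alone.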

Completeness

• Logs showed that interviewees who spoke negatively about COHSE often had spent a long time on it
  – Longer than 5 minutes
  – Longer than they spent on the control platform

• Several users spent more time on GoGene than on PubMed or the extended GoPubMed, but:
  – Said GoGene was their favourite
  – Rated it highly on the questionnaires

• Triangulation shows the whole picture
  – Faster !=> better
  – Slower !=> worse
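The point that speed alone does not indicate preference can be made concrete by joining log timings with questionnaire ratings for each user. The record layout and field names below are assumptions made for this sketch; the slides do not specify how the two data sources were joined.

```python
def slower_but_preferred(records):
    """Users who spent longer on the intervention than on the control
    yet rated the intervention higher on the questionnaire."""
    return [
        r["user"] for r in records
        if r["intervention_minutes"] > r["control_minutes"]
        and r["intervention_rating"] > r["control_rating"]
    ]

def slow_and_disliked(records, long_task_minutes=5):
    """Users who spent longer than `long_task_minutes` on the intervention,
    longer than on the control, and rated the intervention lower."""
    return [
        r["user"] for r in records
        if r["intervention_minutes"] > long_task_minutes
        and r["intervention_minutes"] > r["control_minutes"]
        and r["intervention_rating"] < r["control_rating"]
    ]
```

Both groups are slower on the intervention in the logs, yet they report opposite experiences, so neither the timings nor the ratings can be interpreted without the other.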

Discussion

– GoPubMed/GoGene workshop confirmed positive impressions
– CORESE workshop confirmed negative questionnaire results
– GoPubMed/GoGene workshop also confirmed:
    • That problems with this SWB were the most trivial
    • That somewhat higher questionnaire results masked dramatically better user experiences
– Impressions that COHSE was more usable were quashed by contact with users at workshop
    • Severity of problems would have gone undetected without interviews
– Low number of interviews means triangulation was not complete
    • Recruitment difficult given time pressures on user base
    • Workshops are resource-intensive
    • Future work: carefully sample a subset for interview
– Time constraints prevented gathering of observational data in situ
    • Future work: use video and/or eye tracking software

Conclusion

– We have developed a method of triangulating quantitative and qualitative data in user-centred evaluation of SWBs
    • This addresses a need for greater attention to a technique which is essential for accurate interpretation of data
– Having applied our evaluation framework we triangulated:
    • Quantitative data from the web server logs and from questionnaires
    • Qualitative data from semi-structured interviews eliciting users’ opinions on matters they identified as important

Conclusion

– Triangulation was indispensable for an accurate view of the results
    • Log data gave system speed
        – Questionnaires and interviews gave the meaning of the log data
    • Log data showed usage of semantic links
    • Log data showed whether users found the answers
        – Questionnaires and interviews revealed discrepancies between what users said and what they did
    • Questionnaires showed system intuitiveness
        – Only the interviews showed the full significance of the questionnaire results
– Only triangulation could answer the ultimate questions about user satisfaction
    • If any one data source had been left out, the results could have been misinterpreted
