Usability Evaluation of Multi-Device/Platform User Interfaces Generated by Model-Driven Engineering
DESCRIPTION
Nowadays, several Computer-Aided Software Engineering environments exploit Model-Driven Engineering (MDE) techniques to generate a single user interface for a given computing platform, or multi-platform user interfaces for several computing platforms simultaneously. There is therefore a need to assess the usability of those generated user interfaces, either in isolation or compared to each other. This paper describes an MDE approach that generates multi-platform graphical user interfaces (e.g., desktop, web), which were subjected to an exploratory controlled experiment. The usability of user interfaces generated for these two platforms and used on multiple display devices (i.e., standard size, large, and small screens) was examined in terms of satisfaction, effectiveness, and efficiency. An experiment with a factorial repeated-measures design was conducted with 31 participants, namely postgraduate students and professors selected by convenience sampling. The data were collected with questionnaires and forms and were analyzed using parametric and non-parametric tests, such as ANOVA with repeated measures and Friedman's test, respectively. With a confidence level of 95%, efficiency was significantly better on large screens than on small ones, and on the desktop platform rather than on the web platform. The experiment also suggests that satisfaction tends to be better on standard size screens than on small ones. The results suggest that the tested MDE approach should incorporate enhancements in its multi-device/platform user interface generation process in order to improve the usability of the generated user interfaces.
TRANSCRIPT
Usability Evaluation of Multi-Device/Platform User Interfaces Generated by Model-Driven Engineering
Nathalie Aquino1, Jean Vanderdonckt1,2, Nelly Condori-Fernández1, Óscar Dieste3, Óscar Pastor1
1Centro de Investigación en Métodos de Producción de Software, Universidad Politécnica de Valencia, Spain; {naquino, nelly, opastor}@pros.upv.es
2Université catholique de Louvain, Louvain School of Management (LSM), [email protected]
3Facultad de Informática, Universidad Politécnica de Madrid, [email protected]
This work has been developed with the support of MICINN under the project SESAMO TIN2007-62894 and co-financed with ERDF, ITEA2 Call 3 UsiXML project under reference 20080026, MITYC under the project MyMobileWeb TSI-020301-2009-014, and GVA under grant BFPI/2008/209.
Agenda
• Introduction
• Experiment Definition
• Experiment Planning and Operation
• Validity Evaluation
• Analysis
• Conclusion
Introduction
• Multiple computing platforms and display devices
• Interactive applications
[Figure: interactive applications running on devices that share one computing platform, and on devices with different computing platforms]
Introduction
• Model-Driven Engineering (MDE) of User Interfaces (UIs)
– Good option to develop multi-device/platform interactive systems
– Several approaches currently exist: UIML, UsiXML, Teresa, Maria, Just-UI, OO-Method, among others
[Diagram: MDE of UIs aligned with MDA abstraction levels — interaction requirements, domain, and tasks (CIM); abstract user interface (PIM); concrete user interface (PSM); user interface code (code) — all within a user interface context defined by the user, the environment, and the platform]
Introduction
• General question: What is the quality of the multi-device/platform UIs generated by MDE?
Usability is a very important aspect of quality, especially in the case of interactive applications.
• Research question: Is the usability of UIs generated by MDE the same on different devices and platforms?
Experiment Definition
• In order to address the research question, an exploratory usability evaluation was conducted in a controlled experimental context
• Goal of the experiment:
Analyze multi-device/platform graphical UIs generated by MDE
for the purpose of evaluating their quality
with respect to usability
from the point of view of the researchers
in the context of computer science professionals using UIs of an interactive application on different computing platforms and devices
Experiment Definition
• Usability was measured in accordance with ISO 9241-11
– User satisfaction
– Effectiveness
– Efficiency
• Computing platforms
– Desktop: C# running on .NET
– Web: JavaServer Faces running on Java
• Devices
– Small screen: ultra-mobile PC and stylus
– Standard size screen: PC, mouse, and keyboard
– Wide screen: PC connected to a TV, mouse, and keyboard
• MDE approach
– OO-Method/OlivaNOVA
Experiment Definition
• OO-Method/OlivaNOVA - Presentation Model
[Diagram of the Presentation Model, organized in three levels. Level 1: Hierarchical Action Tree. Level 2: interaction units — Service Interaction Unit (SIU), Instance Interaction Unit (IIU), Population Interaction Unit (PIU), and Master/Detail Interaction Unit (MDIU, which combines a master interaction unit with detail interaction units). Level 3: elementary patterns used by the interaction units — introduction, defined selection, argument grouping, population preload, dependency, conditional navigation, filter, order criterion, display set, actions, and navigations.]
Interaction Units: main interactive operations.
SIU: scenario to execute a service.
PIU: scenario where multiple objects are presented.
IIU: scenario where information about a single object is presented. Special case of PIU.
MDIU: presents collections of objects that belong to different interrelated classes. Combination of other interaction units.
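To make the three-level structure more concrete, here is a minimal, illustrative Python sketch of how it could be represented as a data structure. This is not OlivaNOVA's actual metamodel; all class names, fields, and example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InteractionUnit:
    """Level 2: a main interactive operation (SIU, IIU, PIU, or MDIU)."""
    name: str
    kind: str                                          # "SIU", "IIU", "PIU", or "MDIU"
    patterns: List[str] = field(default_factory=list)  # Level 3 elementary patterns

@dataclass
class HierarchicalActionTree:
    """Level 1: the tree that organizes access to the interaction units."""
    label: str
    children: List["HierarchicalActionTree"] = field(default_factory=list)
    unit: Optional[InteractionUnit] = None             # leaves point to an interaction unit

# Hypothetical example: a PIU listing clients, reached from the action tree.
clients_piu = InteractionUnit(
    name="Clients",
    kind="PIU",
    patterns=["filter", "order criterion", "display set", "actions", "navigations"],
)
root = HierarchicalActionTree(
    label="Sales application",
    children=[HierarchicalActionTree(label="Manage clients", unit=clients_piu)],
)
```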
Experiment Planning and Operation
• Hypotheses
– Null hypotheses: when using interfaces automatically generated from InteractionUnit, the UsabilityAspect is the same for different platforms and devices
– Alternate hypotheses: when using interfaces automatically generated from InteractionUnit, the UsabilityAspect is not the same for different platforms and devices
– The following values are combined:
InteractionUnit = {SIU, PIU}
UsabilityAspect = {user satisfaction, effectiveness, efficiency}
– This yields 2 × 3 = 6 pairs of null and alternate hypotheses (enumerated in the sketch below)
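For illustration only, a tiny sketch that enumerates the six null hypotheses implied by the combination above (a hypothetical script, not part of the experiment's materials):

```python
from itertools import product

interaction_units = ["SIU", "PIU"]
usability_aspects = ["user satisfaction", "effectiveness", "efficiency"]

# 2 interaction units x 3 usability aspects = 6 null/alternate hypothesis pairs
for i, (iu, aspect) in enumerate(product(interaction_units, usability_aspects), start=1):
    print(f"H0.{i}: when using interfaces automatically generated from {iu}, "
          f"{aspect} is the same for different platforms and devices")
```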
Experiment Planning and Operation
• Response variables
– Satisfaction: overall satisfaction, system usefulness, information quality, and interface quality (measured with the CSUQ)
– Effectiveness: task completion percentage
– Efficiency: task completion percentage in relation to time on task (see the sketch below)
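As an illustration of how the two objective measures relate, a minimal sketch follows; the function names and the time unit are assumptions, not taken from the experiment's materials.

```python
def effectiveness(task_completion_percentage: float) -> float:
    """Effectiveness is the task completion percentage itself (0-100)."""
    return task_completion_percentage

def efficiency(task_completion_percentage: float, time_on_task_minutes: float) -> float:
    """Efficiency relates the task completion percentage to the time spent on
    the task; here it is expressed as percentage points completed per minute."""
    return task_completion_percentage / time_on_task_minutes

# Hypothetical example: a task 80% completed in 4 minutes.
print(effectiveness(80.0))    # 80.0
print(efficiency(80.0, 4.0))  # 20.0
```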
Experiment Planning and Operation
• Factors
– Device: small screen, standard size screen, wide screen
– Platform: desktop, web
Experiment Planning and Operation
• Experimental subjects
– Selected by convenience sampling: computer science postgraduate students and professors from Universidad Politécnica de Valencia were invited to participate
– Participation was voluntary
– Subjects did not receive incentives
– Subjects did not receive training
– 31 people participated
Experiment Planning and Operation
• Objects of study
– Multi-device/platform graphical UIs generated by MDE
http://www.pros.upv.es/users/naquino/mdp-usability-eval/
Web Site
Experiment Planning and Operation
• Experiment design
– Factorial 3×2×2 design with repeated measures: every subject tested all 12 combinations of Device {Small, Standard, Wide} × Platform {Desktop, Web} × Interaction Unit {SIU, PIU}
– The order in which subjects tested the different combinations was randomized (see the sketch below)
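Below is a sketch, not the actual randomization procedure used in the experiment, of how the order of the 12 combinations could be randomized for each subject:

```python
import random
from itertools import product

DEVICES = ["small screen", "standard size screen", "wide screen"]
PLATFORMS = ["desktop", "web"]
INTERACTION_UNITS = ["SIU", "PIU"]

def randomized_order(seed: int):
    """Return the 12 device x platform x interaction-unit combinations
    in a random order for one subject (repeated-measures design)."""
    combinations = list(product(DEVICES, PLATFORMS, INTERACTION_UNITS))
    random.Random(seed).shuffle(combinations)
    return combinations

# Hypothetical example: the testing order for subject number 7.
for step, combination in enumerate(randomized_order(seed=7), start=1):
    print(step, combination)
```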
Experiment Planning and Operation
• Tasks
– 12 tasks were defined, 6 for SIU and 6 for PIU
– Tasks were similar regarding complexity
– Subjects used a different, randomly assigned task in each of the 12 combinations of device, platform, and interaction unit
Web Site
Experiment Planning and Operation
• Experimental procedure
– Presentation with general information and instructions
– Demographic questionnaire
– Guideline: specifies the combination of device, platform, and interaction unit to use, together with the task to do
– For each task: start and completion time, and the CSUQ
Experiment Planning and Operation
• Experimental procedure
– Groups of at most three subjects at a time, on different days
– Two hours to complete the experiment
Experiment Planning and Operation
• Data collection
– Tasks were corrected to obtain the task completion percentage (effectiveness)
– Time on task was derived from the start and completion time of each task (efficiency)
– Overall satisfaction, system usefulness, information quality, and interface quality were aggregated from the CSUQ answers, following the rules specified by the designers of the CSUQ (see the sketch below)
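As an illustration only, assuming the commonly cited CSUQ scoring rules (19 items on a 7-point scale; system usefulness from items 1-8, information quality from items 9-15, interface quality from items 16-18, and overall satisfaction from all 19 items), the aggregation could look like this sketch; verify the groupings against the original CSUQ publication before reuse.

```python
from statistics import mean

def csuq_scores(answers):
    """Aggregate the 19 CSUQ item answers (1-7 scale) into the four scores.
    Item groupings follow the commonly cited scoring rules; treat them as an
    assumption and check the original CSUQ publication before relying on them."""
    assert len(answers) == 19, "the CSUQ has 19 items"
    return {
        "overall satisfaction": mean(answers),       # items 1-19
        "system usefulness": mean(answers[0:8]),     # items 1-8
        "information quality": mean(answers[8:15]),  # items 9-15
        "interface quality": mean(answers[15:18]),   # items 16-18
    }

# Hypothetical example: a respondent answering 5 to every item.
print(csuq_scores([5] * 19))
```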
Validity Evaluation
• Conclusion validity
– Threat category: reliability of measures
• Threat: satisfaction is a subjective measure. Action taken: the CSUQ has excellent psychometric reliability properties that have been reported in the literature
• Threat: start and completion times of tasks were measured manually
– Threat category: reliability of the application of treatments to subjects
• Threat: evaluations were carried out on different occasions. Actions taken: a standard procedure was designed to be applied equally on each occasion; assignment of devices, platforms, and tasks was carried out randomly
– Threat category: random heterogeneity of subjects
• Threat: subjects had different levels of experience in using the different devices
Validity Evaluation
• Internal validity
– Threat category: instrumentation
• Threat: badly designed instruments. Action taken: instrumentation, tasks, and objects of study were pre-validated by two persons
– Threat category: maturation
• Threat: repeated measures (learning effect). Action taken: different tasks, but with similar complexity, were proposed
• Threat: long experiment. Action taken: a five-minute break was given to participants at each change of device
Validity Evaluation
• Construct validity
– Threat: not having representative material. Action taken: we used an application implemented by the developers of the OlivaNOVA tool; this application is used in training courses about the tool
Validity Evaluation
• External validity
– Threat category: interaction of selection and treatment
• Threat: not having a representative population from which to generalize results. Action taken: subjects had different levels of experience regarding the different devices, but all of them had a background in computer science
– Threat category: interaction of setting and treatment
• Threat: not having representative material from which to generalize results. Action taken: we selected OO-Method/OlivaNOVA as a representative approach of MDE of UIs since it has been patented and is currently being used in commercial and industrial environments
Analysis
• Statistical analysis
– Statistical Package for the Social Sciences (SPSS) V16.0
– Confidence level of 95% (α = 0.05)
Analysis
• Data distribution for SIU
[Box plots of overall satisfaction, effectiveness, and efficiency. Line 1: small screen and web platform; Line 2: small screen and desktop platform; Line 3: standard screen and web platform; Line 4: standard screen and desktop platform; Line 5: wide screen and web platform; Line 6: wide screen and desktop platform]
Analysis
• Data distribution for PIU
[Box plots of overall satisfaction, effectiveness, and efficiency. Line 1: small screen and web platform; Line 2: small screen and desktop platform; Line 3: standard screen and web platform; Line 4: standard screen and desktop platform; Line 5: wide screen and web platform; Line 6: wide screen and desktop platform]
Analysis
• Main analysis performed on response variables
– Descriptive statistics: mean and standard deviation
– Outlier identification: box plots (see the sketch below)
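For illustration, a sketch of the descriptive statistics and a box-plot style outlier check, assuming the usual 1.5 × IQR whisker rule that box plots rely on; this is not the exact SPSS procedure used in the study.

```python
import numpy as np

def describe_and_flag_outliers(values):
    """Mean, standard deviation, and values falling outside the box-plot
    whiskers (more than 1.5 x IQR below Q1 or above Q3)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": float(values.mean()),
        "std": float(values.std(ddof=1)),
        "outliers": values[(values < lower) | (values > upper)].tolist(),
    }

# Hypothetical example with one obvious outlier.
print(describe_and_flag_outliers([12, 14, 15, 13, 16, 40]))
```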
Analysis
• Main analysis performed on response variables
– Hypotheses testing (see the sketch below)
• Performed both with and without discarding outliers
• Normality test: Kolmogorov-Smirnov
• If the distribution is normal, a parametric test: ANOVA with repeated measures; if significant, estimated marginal means (for paired comparisons) with Bonferroni as the confidence interval adjustment
• If the distribution is not normal, a non-parametric test: the Friedman test; if significant, Wilcoxon signed-rank tests (for paired comparisons) with a Bonferroni correction to control the error rate
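The study used SPSS; purely as an illustration, here is a sketch of the non-parametric path of this procedure using SciPy. The condition names and data are hypothetical placeholders, and SciPy's tests differ in some details from their SPSS counterparts.

```python
from itertools import combinations
import numpy as np
from scipy import stats

# One efficiency value per subject (31 subjects) for each of six
# device/platform conditions; the data here are random placeholders.
rng = np.random.default_rng(0)
conditions = ["small/web", "small/desktop", "standard/web",
              "standard/desktop", "wide/web", "wide/desktop"]
efficiency = {c: rng.uniform(0, 30, size=31) for c in conditions}

# Normality check per condition (Kolmogorov-Smirnov against a fitted normal).
for cond, values in efficiency.items():
    _, p = stats.kstest(values, "norm", args=(values.mean(), values.std(ddof=1)))
    print(f"{cond}: KS p = {p:.3f}")

# Omnibus non-parametric test across the six related samples.
_, p = stats.friedmanchisquare(*efficiency.values())
print(f"Friedman p = {p:.3f}")

# If significant: pairwise Wilcoxon signed-rank tests with a Bonferroni correction.
pairs = list(combinations(conditions, 2))
corrected_alpha = 0.05 / len(pairs)
for a, b in pairs:
    _, p = stats.wilcoxon(efficiency[a], efficiency[b])
    print(f"{a} vs {b}: p = {p:.4f} (significant if p < {corrected_alpha:.4f})")
```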
Analysis
• Main results for SIU
– Overall satisfaction, system usefulness, information quality, and efficiency are significantly better (confidence level of 95%) in the desktop platform than in the web platform
– Efficiency is significantly better in the standard size screen than in the small one, and significantly better in the wide screen than in the small one
– Effectiveness is significantly better when combining the small screen with the desktop platform than when combining the standard size screen with the web platform (only when discarding outliers)
– Interface quality was not affected by the use of different-sized devices or platforms
Analysis
• Main results for PIU
– Efficiency is significantly better for large screens than for small ones, and for the desktop platform than for the web one
– Overall satisfaction and system usefulness tend to be better (confidence level of 90%) for standard size screens than for small ones
– Information quality, interface quality, and effectiveness were not affected by the use of different-sized devices or platforms
Conclusion
• Regarding platforms, the desktop platform obtained the best results
• Regarding devices, the one with the small screen obtained the worst results
• Possible causes
– OO-Method/OlivaNOVA is mainly used to develop organizational information systems; in those environments, the desktop platform and standard size screens are more common than the other options
– The kinds of user interfaces that people are used to on small devices are different from the type of user interfaces generated with OlivaNOVA
• OO-Method/OlivaNOVA should incorporate enhancements in order to generate multi-device/platform user interfaces with suitable usability
Thank you very much for your attention
www.pros.upv.es