
[PaeLife: Personal Assistant to Enhance the Social Life of the Seniors, Contract nº AAL-2009-2-068]


D1.2 Usability Evaluation Methodology

Contract no AAL-2009-2-068

Start date of project 24/02/2012

Due date of deliverable M13

Completion date of deliverable March 2013

Lead partner for deliverable Université de Technologie de Troyes (UTT)

Type of version xxx v5.0


NATURE OF THE DELIVERABLE

R Report X

P Prototype

D Demonstrator

Project co-funded by the European Commission within the AAL Program, call 2

Dissemination Level

PU Public

PP Restricted to other programme participants (including AALA) X

RE Restricted to a group specified by the consortium (including AALA)

CO Confidential, only for members of the consortium (including AALA)


Document History

Issue Date Version Change Made / Reason for this Issue

19/11/2012 1 General guideline for qualitative usability evaluation

14/12/2012 2 Adaptation of guideline to partners’ research interests

21/02/2013 3 Updated document after discussion and partners’ contribution

03/03/2013 4 Integrated methodology after partners’ input

08/03/2013 5 Integrated methodology document after internal review

Document Main Author Karine Lan (UTT)

Document signed off by Myriam Lewkowicz (UTT)


Table of contents

INTRODUCTION

Scope and structure of the Methodology document

1. IMPROVING USABILITY WHEN DESIGNING FOR THE ELDERLY

What is “usability”?

Usability norm

WP6.1 and WP6.2-6.5: producing insights as part of an iterative design process

2. WP 6.1: PRELIMINARY USABILITY EVALUATION OF PAELIFE TECHNOLOGIES AND PROTOTYPES

Preliminary usability questioning around the prototype

Verbal protocol analysis

User needs elicitation interviews

Focus group

Insights of WP6.1

Precision note concerning roadmap for Deliverable 6.2

3. WP6.2-6.5: FIELD TRIALS AND USABILITY EVALUATION

Phase 1: Usability tests of the first version of the PLA

Usability tests

Focus group

Phase 2: Usability tests of the final version of the PLA

One-month field trials

Focus group

Insights of WP6.2-6.5

CONCLUSIVE REMARKS

BIBLIOGRAPHY


Introduction

Society is getting older. In addition to changes in demographics, there has been an enormous change in technological capabilities (how products work, look, act, and react to the people who use them) and in the potential of these technologies. The PaeLife project is based on the observation that technology has an impact on all aspects of our lives, particularly on the way we grow older. The project’s main goal is to fight isolation and exclusion and to allow the elderly to be more productive and independent and to have a more social and fulfilling life, by empowering these elderly users with a Personal (Virtual) Life Assistant (PLA). The PLA would be a virtual presence that supports social communication, learning and entertainment in an integrated way via an optimized interface.

PaeLife focuses on individuals who have recently retired, who are used to some level of technology usage, and who want to keep themselves active, productive and socially engaged. Although today’s elderly people over 65 may show some resistance to the adoption of technology, tomorrow’s elderly will have used technology during the last one or two decades of their lives. More and more consumers and users of technology are therefore becoming part of the category “older adult”. Aging occurs on many levels: (1) biological, (2) psychological and (3) social. Concerning the interaction with technologies, aging brings changes in perception, cognition and movement control (Fisk et al., 2009). This demographic change therefore brings important changes in the demand for products and services that are adapted to older adults, fit into their context of use and meet their needs. The usability evaluation that will be carried out in the PaeLife project aims at producing feedback and recommendations as part of an iterative design process (WP6.1 and WP6.2-6.5) to ensure that the developed PLA meets these requirements.

Motivation

The first aim of this methodology document is to be a general guideline for a qualitative approach, supplemented by quantitative measures of performance and log file analysis, to the usability evaluation of the technology and the prototype (WP6.1) and of the developed PLA application (WP6.2-6.5). The second aim is to plan the timing of the different steps of usability evaluation as part of the iterative design process. The document presents the protocol that has been planned following the discussions and contributions of all the partners involved in the evaluation.

It was unanimously agreed, during videoconference discussions and the consortium meeting, that each partner involved in the evaluation and field trials will adapt the methodology proposed in WP1.4 to its own context, to the specificities of the field, and to the resources and competencies available in its team. Based on the partners’ experience and research interests, a protocol has been elaborated for the evaluation of the prototype and for the field trials of the PLA. It appeared absolutely necessary to distinguish the different phases of the design process, and to adapt the evaluation and testing methodology to the needs and the level of progress of the project.

Apart from the differences in fieldwork techniques, we have selected different participants for WP6.1 and WP6.2-6.5. The idea is to avoid biasing the experience of the first end users of the PLA, which would happen if they had already tested the prototype: its interaction sequences would unavoidably be different because of the interface, if not the functions.

Goal of the deliverable

This deliverable is the method document presented in WP1.4. It is organized in three parts. The first part establishes the importance of usability when designing technological products and services for the elderly. The second part describes the protocol for WP6.1, in which the prototype will be evaluated in terms of usability and usefulness. The third part describes the protocol and the schedule of the field trials that the PaeLife project partners plan to use for WP6.2-6.5.

1. Improving usability when designing for the elderly

A common stereotype of older adults is that they do not and will not use technology, which is far from true. Adults over 65 want to keep up with technology and take advantage of what a technological world has to offer. Understanding new technologies makes them feel connected to others and to the world in general. When older adults reject technology, it tends to be because they do not perceive a benefit in it, not necessarily because it is too difficult or time-consuming to learn. The “not worth it” impression is more likely to be reinforced by an unusable interface¹ (Pak and McLaughlin, 2011: 4). Indeed, usability and interface are two important ergonomic notions in designing HCI. However, developing a usable product involves more than considering the user interface.

What is “usability”?

A “usable” product is one that is learnt naturally, that is easy and effective to use, and that is not unpleasant to the user. Usability is a quality attribute that assesses how easy user interfaces (applications, websites, homepages, etc.) are to use. Another important quality attribute is utility, which refers to the design’s functionality: does it do what users need? For Nielsen (1994), usability and utility are equally important and together determine whether something is useful. The author gives clear definitions of these notions:

Utility = whether it provides the features you need.

Usability = how easy & pleasant these features are to use.

Useful = usability + utility.

1 ‘Unusable’ and ‘usable’ are the adjectives derived from the concept of usability.


For Nielsen, “It matters little that something is easy if it's not what you want. It's also no good if the system can hypothetically do what you want, but you can't make it happen because the user interface is too difficult”. Therefore, for products and services designed for the elderly, the “need of use”, or rather the benefits of use, must be made clear before older adults will voluntarily adopt a technology (Fisk et al., 2009).

To study a design’s utility, the same user research methods that improve usability can be used. These methods are described in parts 2 and 3 of this report. From an ergonomics perspective, usability can be measured using different techniques. The insights they produce aim at improving users’ well-being (health, security, satisfaction, comfort…) and the systems’ global efficiency (Baccino et al., 2005: 15). For products designed for the elderly, apart from enhancing market penetration, improved usability will improve quality of life and, for some classes of products, save lives (Fisk et al., 2009). The word “usability” also refers to methods for improving ease of use during the design process (Nielsen, 1994). Among the several methods for studying usability, the most basic and useful is user testing (usability tests), which is described below and will be used mainly in WP6.2-6.5.

Usability norm

Usability is defined in the ISO 9241² standard as “the extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” (ISO 9241-11:2011). Effectiveness is the accuracy and completeness with which users achieve certain goals; it can be measured by error rates or by the proportion of actions completed. Efficiency is the relation between (i) the accuracy and completeness with which users achieve certain goals and (ii) the resources expended in achieving them; indicators of efficiency include task completion time and learning time. Satisfaction is the users’ comfort with, and positive attitudes towards, the use of the system; users’ (subjective) satisfaction can be measured by attitude rating scales.
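ISO 9241-11 defines these three components but does not impose a single formula for each. As an illustration only, the sketch below computes some common operationalisations from hypothetical task-attempt records: completion rate and errors per attempt for effectiveness, mean time on completed tasks and time-based efficiency (goals achieved per second, averaged over attempts) for efficiency, and a mean attitude rating for satisfaction. The record fields, the example data and the 1 to 5 rating scale are assumptions, not part of the PaeLife protocol.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One participant's attempt at one task (hypothetical record format)."""
    participant: str
    task: str
    completed: bool
    errors: int
    seconds: float  # time on task, assumed > 0


def completion_rate(attempts: list[TaskAttempt]) -> float:
    """Effectiveness: share of attempts in which the goal was achieved."""
    return sum(a.completed for a in attempts) / len(attempts)


def errors_per_attempt(attempts: list[TaskAttempt]) -> float:
    """Effectiveness: mean number of errors per attempt."""
    return sum(a.errors for a in attempts) / len(attempts)


def mean_completion_time(attempts: list[TaskAttempt]) -> float:
    """Efficiency: mean time on task, over completed attempts only."""
    times = [a.seconds for a in attempts if a.completed]
    return mean(times) if times else float("nan")


def time_based_efficiency(attempts: list[TaskAttempt]) -> float:
    """Efficiency: goals achieved per second, averaged over all attempts.

    A failed attempt contributes 0; a completed one contributes 1 / seconds.
    """
    return mean((1.0 if a.completed else 0.0) / a.seconds for a in attempts)


def mean_satisfaction(ratings: list[int]) -> float:
    """Satisfaction: mean score on an attitude rating scale (assumed 1 to 5)."""
    return mean(ratings)


# Hypothetical example data, for illustration only.
attempts = [
    TaskAttempt("P1", "send_message", True, 0, 60.0),
    TaskAttempt("P1", "read_news", True, 1, 120.0),
    TaskAttempt("P2", "send_message", False, 3, 180.0),
    TaskAttempt("P2", "read_news", True, 0, 90.0),
]
ratings = [4, 3, 5, 4]

print(completion_rate(attempts))       # 0.75
print(errors_per_attempt(attempts))    # 1.0
print(mean_completion_time(attempts))  # 90.0
print(mean_satisfaction(ratings))      # 4
```

Time-based efficiency is only one of several possible efficiency indicators; learning time, also mentioned by the standard, would require repeated sessions per participant and is not covered by this sketch.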

However, as specified in the ISO 9241 standard, the usability of an object is not a constant, fixed parameter. An object can only be defined as usable in relation to a precise type of user (the main target it addresses) and to a specific context (Baccino et al., 2005). There are instances where increasing usability for some users may reduce it for others. For example, making an interface usable for people with visual impairments could involve forcing unnecessary audio on all users³ (Pak and McLaughlin, 2011). Measuring usability when designing for the elderly therefore requires taking into account these two parameters:

• the context of use;

• the type of user.

2 This norm is the reference for contributing to standardization in WP7.1

3 However, even if it is unnecessary, the audio does not necessarily make the user interface less usable.


The usability evaluation planned for the PaeLife project fits these requirements of adapting the design. First, the methodology addresses the PaeLife objectives, in which users’ limitations and needs are studied in their context (Human, Environment and Application). Second, while contributing to the state of the art in Speech Recognition (SR) systems, the usability evaluation and testing of the PLA will take into account state-of-the-art guidelines on designing for older adults⁴, that is, guidelines that consider the normal age-related decline in abilities. The qualitative and observational analysis described in this document, mainly based on ethnography and video recordings and carried out in WP6.1 and WP6.2-6.5, will hopefully enable the identification of:

• information needs;

• visual and auditory requirements;

• demands for focused attention and for retaining information in memory;

• the time necessary to react to signals;

• physical requirements.

Ideally, identifying these specific needs and requirements, as well as general usability “problems”, will provide the basis for improving the usability of the interface. Interviews will supplement these insights in order to improve the usefulness of the functions and services of the PLA.

WP6.1 and WP6.2-6.5: producing insights as part of an iterative design process

WP6.1 is the preliminary usability evaluation of PaeLife technologies and prototypes. At this stage, based on the discussions among the project partners concerning the services that the developed PLA will provide, the current LHC⁵ prototype and the developed application will be noticeably different. These differences concern the interface and the devices as well as the functions. Since evaluating a prototype and evaluating an implemented device have different objectives, we have selected the techniques that are most coherent at this stage, depending on the number of prototypes available and on the time remaining before the implemented PLA (WP5) is available for field trials in WP6.2-6.5. Depending on when the text of the graphical interface becomes available after translation into French, Hungarian and Polish and implementation, the preliminary usability evaluation will start in April 2013. The first insights to inform the development in WP5 for the second iteration will hopefully be available by the end of May 2013. Testing the LHC prototype will provide preliminary user feedback on the quality and usability of the speech and gesture modalities, the proposed mixed HCI interface and the PLA services. This feedback will be used in WP2, WP3 and WP4 to adjust the proposed techniques. In this methodology we focus on the preliminary usability evaluation of the prototype, which is connected to D6.4, while the exact methodologies for voice talent selection and technology testing (D6.1-D6.3) must be adapted during the execution of WP6, taking into account the statements of the present document.

4 We draw on two books published in the “Human Factors & Aging Series” in 2009 and 2011:

- Fisk, A.D., et al. (2009), Designing for Older Adults: Principles and Creative Human Factors Approaches (second edition), London and New York, CRC Press

- Pak, R. and McLaughlin, A. (2011), Designing Displays for Older Adults, Boca Raton, London and New York, CRC Press

5 LHC is Living Home Center, an application developed by Microsoft prior to the PaeLife project, which will evolve and be improved as part of the PaeLife project. The application developed as part of the PaeLife project is referred to in this document as the PLA.

WP6.2-6.5 will focus on the field trials and usability evaluation of the pilot application. The consortium has decided on the following schedule for the availability of the application produced by WP5:

• September 2013: availability of the first version of the application;

• October 2013: availability of the final application after iterative improvement informed by WP6.

The evaluation tasks in WP6.2-6.5 are organized in two stages, each immediately following the availability of a version of the application. The first stage will follow the availability of the first version (September 2013) and will consist of usability tests and a focus group. The second stage is the evaluation of the final application, available in October 2013, and will consist of a one-month field trial at the elderly participants’ homes and a focus group.


Figure 1: Timing and organization of the usability evaluation in WP6

The methods for each stage have been chosen according to their coherence with the level of progress in the iterative design process. Together with the preliminary usability questioning carried out in WP6.1, the two stages of WP6.2-6.5 bring the total number of tests to three, which is considered a good number of iterations (Nielsen, 1993). The planning and protocol of the different tasks of WP6 are described in parts 2 and 3 of this deliverable.

2. WP 6.1: Preliminary usability evaluation of PaeLife technologies and prototypes

An important question, both methodological and organisational, has been raised and discussed by the project partners: how are the insights of WP6.1 going to inform the development and implementation of the technologies (WP2-4) and the application (WP5), so that there is a progression between WP6.1 and WP6.2-6.5? It appeared absolutely necessary to create clear bridges between WP6.1 + WP6.2-6.5 and WP2-5 in terms of planning and coordination, so that the insights of the evaluation WPs can efficiently inform the implementation WPs. Through this coordination and collaboration between WPs, it will be possible to make the most of (i) the prototype as an artefact for reflective practice and (ii) user participation in the iterative design approach. To achieve this efficiency, the partners doing the evaluation and the partners doing the development have defined a working method and will collaboratively define how the insights are formalized, so as to transfer the knowledge gained from the first prototype testing and usefully inform the development in WP2-5.

Preliminary usability questioning around the prototype

Adopting the view that a technology’s adoption and success rely on its usefulness and not just on the usability of its interface, WP6.1 will consist of what we call “preliminary usability questioning” and qualitative interviews. It will involve three to five users, depending on the time available. In other words, though the form may resemble simple usability tests, the focus will not be specifically on testing the usability of the prototype’s interface. The objective is to use the prototype to question usability issues in a general way, and as a basis for contextual interviews to elicit user needs.

Verbal protocol analysis

Like usability tests, the preliminary usability questioning will involve users experiencing certain aspects of the prototype and solving tasks with it. The sessions will be organized around a technique called think aloud testing or verbal protocol analysis. The slight difference between the two terms is that the first is used in an experimental framework, around the interface of a developed product or prototype being tested, and the second in a more “natural” activity situation. The principle, however, is the same: participants perform a task or set of tasks and verbalize their thoughts (“talk aloud”) while doing so. The technique relies on users’ spoken comments, in which they verbalize how they use the system, explaining what they are trying to do and the problems they experience.

The basic assumption of verbal protocol analysis is that when people talk aloud while performing a task, the verbal stream can be taken as a reflection of the cognitive processes in use. Concerning usability and usefulness issues, verbal protocols can reveal information about misconceptions and conceptual change; strategy acquisition, use and mastery; task performance; affective response; and the like. Verbal protocols are useful at any stage of the development lifecycle. During evaluation, they make it possible to:

1. determine how effective the device (in our case, the prototype) is as an integrated social communication tool;

2. assess the user’s learning and performance. Indeed, verbal protocols can provide valuable information about the user’s cognitive processes, beyond simple measures of accuracy and time on task.

In WP6.1, however, we will approach the protocols only in general terms, without implementing an actual coding scheme that maps verbalizations to the cognitive processes of interest.

The type of verbal protocol analysis planned for WP6.1 is the concurrent verbal protocol, that is, verbalizations delivered while the participant performs the task and, ideally, unprompted by the experimenter. It should be distinguished from the retrospective verbal protocol, which will be used mainly at a later stage, in WP6.2-6.5 (cf. part 3 of this document). However, depending on the difficulty of the task (linked mainly to users’ competencies and abilities to use the LHC prototype, which will be revealed in situ), we might prefer to use retrospective verbal protocols. Indeed, thinking aloud while performing the task can impose an additional load on the user, which can alter the way he or she performs the task.

When using the retrospective protocol, however, we will prefer the self-confrontation interview, in order to avoid the bias that comes from relying on the user’s memory after the task. In this protocol, users are asked to comment on their actions while watching the video recording of their system use. Thus, for retrospective protocols, video recording will be preferred, whereas for concurrent protocols, the verbalizations may be collected by direct note taking, simple audio recording or video recording. Whenever video recording is used, participants will be asked to sign a declaration of consent.

Video recording has a major advantage: the phenomena are saved, including the user’s non-verbal behaviour and visual landmarks regarding the different states of the interface. This makes it much easier for the researcher to relate the user’s verbalizations to the actions he or she has performed. For internet applications, it is also possible to save the logs of the browsing that has been done. Data collection is rapid, because very few special arrangements need to be made on-site, and data analysis is usually conducted off-site. The analysis, however, takes much longer: the main disadvantage of this method is that analysing audio and video recordings afterwards is time-consuming.
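When both a transcript of the recording and an interaction log are available, relating verbalizations to actions amounts largely to a timestamp alignment. The sketch below pairs each utterance with the most recent logged action that precedes it within a time window. The data format (seconds from session start), the action names and the 10-second window are hypothetical choices for illustration; this document does not specify a log format or an alignment rule.

```python
from bisect import bisect_right
from typing import Optional

Event = tuple[float, str]  # (seconds from session start, label); hypothetical format


def align_utterances(
    utterances: list[Event],
    actions: list[Event],
    max_gap: float = 10.0,
) -> list[tuple[float, str, Optional[str]]]:
    """Pair each utterance with the latest action logged at or before it.

    `actions` must be sorted by time. If no action precedes the utterance,
    or the closest preceding one is older than `max_gap` seconds, the
    utterance is paired with None so that the analyst can code it manually.
    """
    action_times = [t for t, _ in actions]
    aligned = []
    for t, text in utterances:
        i = bisect_right(action_times, t) - 1  # index of last action with time <= t
        if i >= 0 and t - action_times[i] <= max_gap:
            aligned.append((t, text, actions[i][1]))
        else:
            aligned.append((t, text, None))
    return aligned


# Hypothetical session, for illustration only.
actions = [(5.0, "open_contacts"), (12.0, "tap_call_button"), (30.0, "close_app")]
utterances = [(3.0, "where do I start?"), (14.5, "it's ringing now"), (50.0, "done")]

for row in align_utterances(utterances, actions):
    print(row)
```

Leaving unmatched utterances as None, rather than forcing a match, keeps the automatic step conservative: the video remains the reference for interpreting what the user was reacting to.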

This type of think aloud testing will allow usability problems with the prototype’s interface to be identified. But the main focus of attention is not on the usability of the interface; it is on usefulness in a more global sense. Thus, the task analysis, especially of authentic user tasks, aims at (i) understanding the users’ perception of the usefulness of the functionalities and (ii) identifying user needs. The tasks allow the user’s actions to be observed in a given situation and context. This is why, even though the activity performed by the user is somewhat “artificially provoked”, the tests and interviews will be carried out at the user’s place. Indeed, for both practical and methodological⁶ reasons, the usability tests and interviews will be done at the users’ homes or at the elderly association to which they belong, i.e., a familiar place where they conduct ordinary day-to-day activities. However, care must be taken to avoid interruptions (phone calls, people coming into the room, etc.). The think aloud testing will last a maximum of one hour. These think aloud sessions around the prototype will lead into the interviews, which will take place during the same morning or afternoon session, after a short break. The objective of these qualitative interviews is to elicit and understand user needs and the users’ context of use of the technologies they already use at home as part of their daily life.

6 The practical reason is that none of the universities in the consortium is equipped with a usability lab. The methodological reason is that we do not favour experimental situations completely disconnected from ordinary practice (though we are aware that task assignments are somewhat artificial), since our research interest lies not only in usability issues but in “usage” in a broader sense.

User needs elicitation interviews

The individual in-depth interviews will be semi-directive, will last from one to two hours, and should be recorded and partially transcribed⁷. These interviews will focus on the users’ current uses of ICTs, their careers as users (the different technologies they have used so far), how they learnt to use these technologies, and how their use of technology is embedded in their way of life, particularly in the way they relate to their friends and family. The interviews will also include a session of practical observation of the real use of the ICTs they own and currently use in their day-to-day home environment. For example, if users describe how they use their smartphone, they will be asked to demonstrate this using the verbal protocol analysis.

Interviews and observations are both qualitative methods that will yield precise descriptions of current practices, and thus provide a full picture of the context in which users could use the PLA developed in the project. This understanding of the context of use of ICTs will be extremely useful for the analysis of users’ needs, which will be completed with (i) the validation of the functionalities (following the first focus group) and (ii) information from previously published studies on the use of ICTs by the elderly.

The objective of these interviews is to gain a broad understanding of the elderly users' way of life, so that the utility of the prototype and its functionalities can be examined as part of a research interest in social usage. These interviews will make it possible to grasp the context of use and thus supplement the insights about user requirements provided by WP1 (cf. D1.1), addressing a gap pointed out by WP1: “This discrepancy between the observation results and questionnaire data may suggest that the users rated how satisfied they were with themselves when completing the task, rather than how satisfying the task itself was. There is not enough data, however, to verify the suspicion, hence it would need further investigation.” The argument is based on the presumption that, although completing a particular task may be fun, users may not be very pleased with their own results when completing it, which affects the final rating. This suggests asking two different questions during the future interviews:

1. How satisfying was the task?

2. How pleased are you with completing the task?

Also, the interviews will iteratively validate the consortium's choice of devices, which was based on the insights of WP1 concerning users' preferences. The qualitative approach proposed in WP6 will focus on the details and the context of use, and investigate, in real time, the degree of satisfaction when using the prototype. These individual in-depth interviews will be supplemented by focus groups.

7 The transcriptions constitute a first level of analysis of what has been said during the interview. As documents, they have the advantage of being easily shared among members of the research team, and verbatim quotations are an efficient means of integrating the user's perspective into research reports.

Focus group

Focus groups are planned at different moments of the project to explore different research/design questions, which will be clearly identified once the different versions of the developed PLA are available. A total of 3 focus groups is planned, but this number may vary according to the research questions that emerge reflexively from the preliminary usability questioning (WP6.1), and according to the time remaining between the availability of the developed PLA for phases 1 and 2 of WP6.2-6.5 and the end date of the project. For WP6.1, there will be one focus group.

As the name suggests, a focus group is a group activity whose objective is to focus on specific aspects of a subject of study. The technique consists of forming small groups of users drawn from the product's target population, who are asked to interact around a relevant subject. It can be used in both the design phase and the evaluation phase. In the design phase, the objective of the focus group is to collect information about user characteristics and needs. In the evaluation phase, participants can be brought to talk about the prototype, yielding feedback, whether positive or negative. Unlike other techniques, e.g. usability tests, focus groups do not provide objective information on efficiency; they collect users' subjective impressions of the device's ease of use, ease of learning, etc.

Regardless of when the focus groups take place, the idea is to have 10 participants in each. Each focus group will automatically include 2 or 3 end users who, by the time they take part, will already have done the preliminary usability questioning (WP6.1), the usability tests (WP6.2-6.5 phase 1) or the field trials (WP6.2-6.5 phase 2). These users will therefore have a different status and role from the 7 or 8 other participants, since it is the evaluation of their own experiences that will have enabled (i) the identification of a range of existing usages, as well as the refinement of (ii) the list of usability problems with the LHC prototype (identified in WP6.1) and (iii) the list of potential functionalities of the system. These users will effectively become mediators between the researchers and the other users, and it is notably this characteristic that was taken into account when selecting them: they will already have been involved in the questionnaires and workshops (WP1) and will be involved in the field trials.

The topic of the focus group for WP6.1 will be the usability of different devices, focusing on the LHC prototype, and the perceived usefulness of these existing technologies in participants' daily lives. The idea is that users discuss the devices in a general way (devices they will have tested during the WP1 workshop and will be able to handle), then try the LHC and discuss it. The objective is to test and confirm the insights collected during the preliminary usability questioning based on the LHC prototype and during the individual interviews.

Insights of WP6.1

WP6.1 is expected to produce useful insights concerning:

the usefulness of the functions and modalities present in the LHC prototype;

usability issues and guidelines for the design of interfaces for older adults;

the communication habits and way of life of the elderly, and how they may be supported by the PLA;

the prioritization of features in a prototype.

These insights will be formalized as very precise requirements (see footnote 8) so that they can readily be used for development in WP5.

Note on the roadmap for Deliverable D6.2

The deliverable D6.2 “Preliminary Usability Evaluation of Multimodal HCI” cannot be produced at this stage of the project. Our aim is to carry out this evaluation with multimodal interaction built on technologies developed within the PaeLife project, not on existing ones. Because some of the modalities (e.g. touch) are delayed, and a first, very simple proof-of-concept prototype was created only recently, this evaluation will be carried out in the near future.

3. WP6.2-6.5: Field trials and usability evaluation

WP6.2 aims to plan the end-user field trials in the consortium countries, so that usability evaluation can be carried out in each of them: WP6.3 – Hungary, WP6.4 – France, WP6.5 – Poland.

8 Partners have agreed to discuss collaboratively the best way to formalize these insights once they have been produced.

Phase 1: Usability tests of the first version of the PLA

Usability tests

Usability testing is a technique used to evaluate a product by testing it with representative users. The goal is to identify usability problems, collect quantitative data on participants' performance (e.g. time on task, error rates) and determine participants' satisfaction with the product. Simple usability tests in which users think aloud are cheap, robust, flexible and easy to learn. They are qualitative in nature. Thinking aloud may be the single most valuable usability engineering method (Nielsen, 1993).

In the test, users will try to complete typical tasks. While doing so, each user identifies and describes usability problems in the system being tested; that is, he/she comments on his/her actions and on what is seen on the screen, in real time. A researcher-observer is present, prompts the user to keep talking, and watches, listens and takes notes. Some PaeLife partners will video-record the activities, the screen content and the users' comments. These data will be used in later analysis to identify usability problems. The earlier those problems are found and fixed, the less expensive they are. The identified problems will be formalized in a list, organized by degree of severity, that includes a detailed description of each usability problem.
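The severity-ordered problem list described above could be kept as a simple data structure. The sketch below is illustrative only: the deliverable does not fix a severity scale, so Nielsen's 0-4 severity ratings are assumed here, and the field names and the ordering rule (severity first, then number of affected users) are hypothetical choices.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Assumed scale (Nielsen's 0-4 severity ratings); not fixed by this deliverable."""
    NOT_A_PROBLEM = 0
    COSMETIC = 1
    MINOR = 2
    MAJOR = 3
    CATASTROPHE = 4


@dataclass
class UsabilityProblem:
    problem_id: str    # hypothetical identifier, e.g. "UP-01"
    description: str   # detailed description of the observed problem
    severity: Severity
    participants: int  # number of test users who ran into the problem


def by_severity(problems: list[UsabilityProblem]) -> list[UsabilityProblem]:
    """Order problems from most to least severe; ties go to the problem more users hit."""
    return sorted(problems, key=lambda p: (p.severity, p.participants), reverse=True)
```

Sorting on a tuple keeps the list readable for design teams: the most severe problems come first, and among equally severe problems, those observed with more participants are listed first.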

Studies on usability tests (Lindgaard and Chattratichart, 2007) show no significant correlation between the number of users and the number of severe problems identified. Nielsen recommends testing with a maximum of 5 users and summarizes his position as follows: “Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford.” We will adopt this position in order to carry out a more in-depth qualitative analysis of each test. Indeed, the data collected from a single test user bring a lot of insight: almost a third of all there is to know about the usability of the design. With a second user, there is some overlap in what you learn, but since people are different, the second user adds some new insight. The third user will do many things that were already observed with the first or the second user, and even some things that have already been observed twice, but will still generate a small amount of new data, even if not as much as the first and second users did. After the fifth user, it is a waste of time to observe the same findings repeatedly without learning much new (Nielsen, 2000). Therefore, for both methodological and practical reasons, the usability tests will involve 3 or 4 participants (see footnote 9) when the research interest lies in understanding the usability problems in detail, taking into account the context in which they happen. However, when the research interest is more focused on measuring the number of errors and the task completion rate, by gathering metrics on which statistics can be computed, a greater number of users may be needed. France will adopt the first approach, Poland and Hungary the second.
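The diminishing returns described above follow the model used in Nielsen (2000), in which the share of usability problems found with n users is 1 - (1 - L)^n, where L is the share a single user reveals. The sketch below assumes L = 0.31, the cross-project average Nielsen reports ("almost a third"); PaeLife's actual value of L is unknown and may differ.

```python
def proportion_found(n_users: int, discovery_rate: float = 0.31) -> float:
    """Expected share of usability problems found with n_users under the
    model 1 - (1 - L)^n. discovery_rate = 0.31 is an assumed average, not
    a value measured in PaeLife."""
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    return 1.0 - (1.0 - discovery_rate) ** n_users


if __name__ == "__main__":
    previous = 0.0
    for n in range(1, 8):
        share = proportion_found(n)
        # The marginal gain shrinks with each additional user.
        print(f"{n} users: {share:.0%} found (+{share - previous:.0%})")
        previous = share
```

Under this assumption, one user reveals about 31% of the problems and five users about 84%, which is why the gain from each additional participant becomes small after the fifth.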

Even though there is no correlation between the number of users and the number of severe problems identified, there is a significant correlation between the number of tasks and the number of problems identified: higher task coverage leads to more usability problems being identified. Therefore, knowledge of the usage domain will be mobilized to identify as many tasks as possible, with a clear goal analysis so as to minimize variation among the usability problems identified. The tasks will be identified by the research team after installing the first version of the application on the tablet and going through and testing the different functions and modalities, including, if possible, speech recognition.

9 The total number of participants for WP6 is 10 in each country (the same people who took part in the WP1 workshops). WP6.2-6.5 will be the last evaluation of the final application, based on field trials, and will involve 6 participants (see below).

The usability tests are based on:

1. predefined task assignments

2. users' own authentic tasks.

Combining both types of task is useful because each yields different insights. On the one hand, predefined tasks allow the research teams to address specific usability problems and thus collect useful information about the interface and about users' difficulties or preferred ways of interacting with the device. These exploratory tests keep users within specific areas of interest that have been clearly identified at the level of the interface. On the other hand, authentic use of the system under non-task-based conditions allows a more varied set of problems to be identified, since other parts of the system are explored, and is not restricted to usability issues.

The tasks, whether predefined or authentic, are carried out after the different functions have been explained to users. The instructions given to users about the usability issues they have to report will be both deductive (explaining conceptually what a usability problem is before the activity) and inductive (the user discovers issues by him/herself after being given examples). This combination of deductive and inductive instructions will hopefully allow users to identify more problems than deductive instructions alone, as it caters for differences in users' preferences.

Apart from allowing the identification of usability problems, task assignments – especially the

authentic user tasks – aim to:

measure the efficiency and performance through task times;

gain an understanding of the users' perception of the utility/usefulness of the functionalities;

identify user needs.
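The task-time and completion metrics mentioned above (and the error counts and completion rates favoured by the Hungarian and Polish teams) could be summarized per task as in the sketch below. The record fields are hypothetical, and computing mean time over successful attempts only is an assumed convention, not one fixed by this protocol.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    participant: str  # hypothetical anonymised ID
    task: str         # task label chosen by the research team
    completed: bool   # did the user reach the task goal?
    seconds: float    # time on task, e.g. measured from the video recording
    errors: int = 0   # number of errors observed during the attempt


def task_summary(attempts: list[TaskAttempt], task: str) -> dict:
    """Completion rate, mean time on task (successful attempts only) and mean errors."""
    rows = [a for a in attempts if a.task == task]
    if not rows:
        raise ValueError(f"no attempts recorded for task {task!r}")
    successes = [a for a in rows if a.completed]
    return {
        "attempts": len(rows),
        "completion_rate": len(successes) / len(rows),
        # Failed attempts are excluded so that abandoned tasks do not distort task time.
        "mean_time_s": mean(a.seconds for a in successes) if successes else None,
        "mean_errors": mean(a.errors for a in rows),
    }
```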

Our conception of usability tests is therefore more qualitative than quantitative: the aim is not just to see whether the user has succeeded in a given task and to measure how much time it took. Usability tests allow the user's actions to be observed in a given situation and context, even though this situation is somewhat “artificially provoked”. The video recordings will serve to analyse (i) the metric data concerning the tasks and, perhaps more importantly from our research perspective, (ii) the situated action of the user, drawing on a qualitative approach.


Focus group

A focus group will be organized following the usability tests, involving, out of a total of 10 participants, the 3-4 users per country who took part in the usability tests. The aim will be to discuss the usability problems identified (why they are ‘problems’) and to generate ideas on how to fix them. It will be organized in two complementary ways:

Group interviewing – every participant is invited to talk, with a moderator facilitating the discussion

Small group activities – e.g. testing, or at least trying, the application, either independently or with a moderator present, before discussing it in the group interview

This subjective evaluation of participants’ perception will supplement the metric and qualitative data

collected during the usability tests.

Phase 2: Usability tests of the final version of the PLA

One-month field trials

The field trials will be based mainly on observational methods. These methods involve an investigator observing users as they work in a field study and taking notes on, or video-recording, the activity that takes place. Observation may be either direct, where the investigator is actually present during the task (what is known as ethnography), or indirect, where the task is viewed a posteriori by other means, such as a video recording. The method is useful early in user requirements specification for obtaining qualitative data. It is also useful for studying tasks and processes as they are currently carried out. As explained above, observational methods in field studies are qualitative in nature. However, not all the partners have the same scientific interests and background in qualitative approaches. After fruitful discussions, we have therefore agreed that France (UTT, which leads these tasks) will adopt observational-qualitative methods, based mainly on ethnography and video analysis, and that Hungary and Poland will adopt a more quantitative approach, using log file analysis. Despite the difference in methods of analysis, the timing and protocol of the field trials will be the same for all partners involved in this WP, so that the results can be cross-compared more efficiently. We are convinced of the value of complementing qualitative and quantitative analyses in the field trials to obtain global insights.

Based on (i) the resources available, both human (1 field researcher in each country) and material (2 video recording devices for France, and most probably 2 devices for each country for cost reasons), and (ii) the amount of log data that we think will be necessary, we are planning the field trials with two end users in parallel. The field trial with each end user will last one month, as specified in the proposal. Assuming that the final application is actually available in October 2013, three months will be available, i.e. three successive one-month rounds of two users each. This makes a total of 6 users participating in the field trials.
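The arithmetic behind the six participants (one-month rounds run back to back, two users in parallel per round) can be sketched as follows. The participant labels are hypothetical placeholders, and starting each round on the first day of the month is a simplifying assumption.

```python
from datetime import date


def field_trial_rounds(start: date, months: int, parallel_users: int) -> list[tuple[date, list[str]]]:
    """Back-to-back one-month rounds, each running `parallel_users` trials in parallel.

    Rounds are assumed to start on the 1st of each month; labels are hypothetical."""
    rounds = []
    for r in range(months):
        # Advance the start month by r, wrapping into the next year when needed.
        month_index = start.month - 1 + r
        round_start = date(start.year + month_index // 12, month_index % 12 + 1, 1)
        users = [f"user-{r * parallel_users + u + 1}" for u in range(parallel_users)]
        rounds.append((round_start, users))
    return rounds
```

For example, `field_trial_rounds(date(2013, 10, 1), months=3, parallel_users=2)` yields rounds starting on 1 October, 1 November and 1 December 2013, with six participants in total; a later availability date simply shifts the rounds.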

During this month, the users will keep a media diary. The idea initially considered was a written or audio media diary for self-reporting, in which users record their comments, suggestions and ideas. Instead, we have chosen a “videoconference media diary”, in which, instead of writing, the user discusses his/her use of the PLA and usability problems with the researcher during online sessions held using the PLA itself. A videoconference media diary has a double advantage. First, data collection is not restricted by users being unwilling to write or forgetting to keep the diary. Second, since the research object is precisely ICT-mediated social interaction, users will indeed use this technology to communicate, in a motivated but not “artificial” way. Since this type of diary keeping may be intrusive, the frequency with which the researcher contacts the user needs to be determined with the users themselves. However, if some users are categorically against regular videoconference sessions with the researcher, the written media diary will be used instead.

Before the beginning of each one-month field trial, we plan to organize a meeting with the two test participants. The objective is to inform them about the objectives of the field trials and the whole scenario. This can be an opportunity to show them the PLA and collect opinions in a less formal environment than a focus group, and possibly to train them in using the PLA. The users will also be given a presentation of the functions of the PLA and of the tasks they will be asked to carry out during the field trials. These scenarios will be defined when the final application is ready; for example, the prescribed tasks could be, in week 1, to write x emails and to arrange a meeting and enter it in the calendar; in week 2, to organize a Skype teleconference; and so on.

This one-month field trial will be organized around weekly milestones.

Visit 1: the PLA is installed at the end user's home. Comments expressing appreciation or difficulties are recorded (either by note-taking or video recording), as in a usability test but less formally, with no predefined tasks. The researcher can assist the user, depending on his/her personality, level of competence with technologies, etc., to build the user's confidence in using the PLA.

Week 1: the end user is left with the device and can become familiar with it, using it at his/her own pace.

Visit 2: at the beginning of the second week, the researchers go to the user's home. A short interview is conducted to collect comments on the first week of use (to supplement the impressions, difficulties, satisfaction, etc., when a written media diary is kept instead of a Skype media diary). In France, a simple-to-use video recording device is installed at the users' homes. The researcher explains to the user how to switch on the recording and why it is being recorded. In Poland and Hungary, the users are informed that log data will be saved, even though data acquisition is completely transparent and anonymised for them.


Week 2: as in week 1, the user is left with the device and can use it freely in the usual home environment and context. He/she continues keeping the media diary, either in writing or through regular Skype sessions with the researcher. The difference from week 1 (for French users) is that whenever the user uses the PLA, he/she video-records his/her activity, switching off the recording when he/she finishes using the PLA. An important methodological point here is that there should be no “mise-en-scène” (staging): the user is not supposed to comment on what he/she is doing in the video, or to perform special tasks for the needs of the recording. He/she simply turns the recording on and acts “normally”, i.e. in a usual and natural way. The video recordings will then be analysed by the French social scientists and used for self-confrontation interviews.

The log files tracking the end users' activities will be collected automatically by the system and will serve as the basis for analysing the tasks achieved by the Hungarian and Polish users. We will collect information from internet service providers about data transmission and the dates on which the prototype was used. On that basis, we will build a usage profile for every end user and present statistical data for all users. Log file analysis (i.e. analysis of the files that track the user's activity through the interface) is used to study behaviour, to identify the browsing strategies used most often, and to identify the mistakes users make most frequently. Log files are a list of the tasks the server actually completed, corresponding to the server requests and responses.
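A per-user usage profile of the kind described above could be built from application logs roughly as sketched below. The log format (CSV with timestamp, anonymised user ID, event type and detail) and the event names `open` and `error` are hypothetical: the actual PLA log schema is agreed between WP5 and WP6 and is not specified here, and ISP transmission data are not covered by this sketch.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample in an assumed log format; real PLA logs may differ.
SAMPLE_LOG = """timestamp,user,event,detail
2013-10-01T10:02:11,HU-01,open,email
2013-10-01T10:05:40,HU-01,error,send_failed
2013-10-02T16:20:03,HU-01,open,calendar
2013-10-01T09:00:00,PL-02,open,videocall
"""


def usage_profiles(log_text: str) -> dict[str, dict]:
    """Aggregate raw log rows into one usage profile per anonymised user."""
    days = defaultdict(set)             # distinct calendar days with activity
    functions = defaultdict(Counter)    # how often each function was opened
    errors = Counter()                  # number of error events
    for row in csv.DictReader(io.StringIO(log_text)):
        user = row["user"]
        days[user].add(datetime.fromisoformat(row["timestamp"]).date())
        if row["event"] == "open":
            functions[user][row["detail"]] += 1
        elif row["event"] == "error":
            errors[user] += 1
    return {
        user: {
            "active_days": len(days[user]),
            "most_used": functions[user].most_common(1)[0][0] if functions[user] else None,
            "errors": errors[user],
        }
        for user in days
    }
```

Profiles built this way can then be pooled across users to produce the statistical data mentioned above.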

All access and error information coming from the PLA application will be logged. Partners from WP5 and WP6 have agreed on the level of detail at which the system will log activity. This information will concern several usability aspects:

Response time: for pages to be displayed and for requests (from when a user presses a button until the action/function is completed). This information will show whether everything is working and whether the response time is reasonable.

Problems experienced while using the portal. This information will be relevant to track and fix any problems. All abnormal/error situations will be logged by the application, which will display an appropriate message.

Errors in forms: empty fields, mandatory information missing.

Data transfer: this information allows evaluation of whether the requests/responses have the appropriate format.
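The four logged aspects listed above could be summarized as in the following sketch, which counts entries per aspect and describes response times. The entry structure is hypothetical, and the 1-second threshold for a "slow" response is an assumption borrowed from Nielsen's response-time guidance, not a target fixed by the project.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional

# The four logged aspects listed in the protocol (labels are hypothetical).
CATEGORIES = ("response_time", "problem", "form_error", "data_transfer")


@dataclass
class LogEntry:
    category: str                       # one of CATEGORIES
    name: str                           # page, button or form field concerned
    duration_s: Optional[float] = None  # only set for response_time entries


def summarize(entries: list[LogEntry], slow_threshold_s: float = 1.0) -> dict:
    """Count entries per aspect and describe response times.

    slow_threshold_s = 1.0 is an assumption, not a project requirement."""
    unknown = {e.category for e in entries} - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    counts = {c: sum(1 for e in entries if e.category == c) for c in CATEGORIES}
    times = [e.duration_s for e in entries
             if e.category == "response_time" and e.duration_s is not None]
    return {
        "counts": counts,
        "mean_response_s": mean(times) if times else None,
        # The median is less sensitive than the mean to a few very slow requests.
        "median_response_s": median(times) if times else None,
        "slow_share": sum(t > slow_threshold_s for t in times) / len(times) if times else None,
    }
```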

Data transmission statistics will be anonymous due to regulations on personal data protection and

privacy issues.

The value of combining a qualitative approach (France) and a quantitative approach (Hungary and Poland) is that the integrated analysis will provide a holistic understanding of how the PLA is used, yielding insights into both the context of use, through observational analysis of usage in the home, and patterns of use and performance indicators such as average task time.


Visit 3: the researcher (France) takes back the recording device to collect the video data. Hungarian and Polish researchers can collect the first log files at the same time. If the media diary is written, this is also a good opportunity to check that it is kept regularly enough to provide useful information. At this stage, the user may be encouraged to make suggestions concerning usability issues as well as functionalities. Above all, the interviews may focus on usage, that is, aim at understanding the context of use of the PLA, exploring the ways in which the PLA is embedded in the user's way of life, i.e. how it acts as an “assistant” and not just as a device that is occasionally used.

Week 3: the user continues to use the device, perhaps in a more critical way.

Week 3 (researchers): the researchers use this week either (France) to watch the video recordings, select extracts to be shown to the user and prepare the self-confrontation interviews, or (Hungary and Poland) to do a first analysis of the log files collected so far.

Visit 4: self-confrontation interviews (France), drawing on retrospective verbal protocols. The users are shown video extracts of their own activity and asked to explain what happened, what task they were trying to achieve, what difficulties they encountered, whether they felt satisfaction or frustration, etc. Videos are shown because users cannot remember everything they have done; the extracts therefore elicit more interesting insights than a posteriori interviews alone. Also, instead of a researcher interpreting the user's actions from an external perspective, video analysis is used differently: the user him/herself explains and interprets his/her past actions.

Week 4: last week of field trials

The user keeps using the PLA at home, so that the field trials last a whole month, allowing the user enough time to feel that he/she has tested all the functionalities and to understand how the PLA can (or cannot) be integrated into his/her daily life and support his/her activities.

Visit 5: final debriefing interview of the one-month trial. It is neither possible nor desirable to determine a priori exactly what the content of this final interview will or should be. The topics discussed, and their relative importance, will depend on the phenomena that emerge from the one-month trial.
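The weekly milestones of a single field trial can be laid out as a simple visit calendar. In this sketch the 7-day offsets are an assumption: the protocol fixes Visit 2 at the start of week 2 and Visit 5 at the end of the month, and Visits 3 and 4 are placed at the start of weeks 3 and 4 accordingly.

```python
from datetime import date, timedelta

# Visits of one field trial as described in the protocol; day offsets are assumed.
MILESTONES = [
    (0, "Visit 1: install the PLA, informal walkthrough"),
    (7, "Visit 2: short interview; camera (France) or log notice (Hungary, Poland)"),
    (14, "Visit 3: collect video / first log files; usage interview"),
    (21, "Visit 4: self-confrontation interview (France)"),
    (28, "Visit 5: final debriefing interview"),
]


def visit_calendar(trial_start: date) -> list[tuple[date, str]]:
    """Dates of the five visits of a one-month trial starting on trial_start."""
    return [(trial_start + timedelta(days=offset), label) for offset, label in MILESTONES]
```

Because both participants of a round start on the same day, one calendar per round is enough for the field researcher to plan home visits.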


Focus group

A final focus group will be organized, depending on the time available between the moment insights are produced from the analyses of the field trials and the end of the project. It will act as a debriefing/summing-up session, in which researchers will be able to confirm the phenomena that have emerged from WP6 as a whole. All end users who took part in a test will share their insights and comment on the field trials. The results of the meeting will also be included in the final documentation.

Insights of WP6.2-6.5

The usability tests, field trials and focus group will produce analyses that will be formalized in D6.5 and D6.6 according to three aspects: observation, interpretation and recommendation. The recommendations proposed will be argued from concrete examples, and the specifications will be written as clearly as possible for design teams, so that they are useful for the final iteration.

Conclusive remarks

This Evaluation Methodology document, deliverable D1.2, aims to serve as the basis for an integrated methodology and protocol for WP6.1 and WP6.2-6.5, unanimously agreed upon by all the partners. The objective, which we believe is practically achievable, is to share each other's results and collaborate on a global analysis of the qualitative and quantitative data that will be collected for the evaluation part.

Given the approach adopted in WP6, as explained above, coordination becomes even more important. Partners have agreed to bridge the different WPs efficiently, for example by making use of the insights from the evaluation of the prototype in WP6.1 (in terms of usability of the interface and usefulness of functions) before finalizing the implementation of the device. We believe that the iterative design process adopted by PaeLife will succeed in significantly improving the usability and usefulness of the PLA and in following the guidelines for designing for older adults, so that the PLA fits the needs and context of use of older people.


Bibliography

Baccino, T. et al. (2005). Mesure de l'utilisabilité des interfaces. Paris: Hermès Science Publisher.

Fisk, A.D. et al. (2009). Designing for Older Adults: Principles and Creative Human Factors Approaches (2nd ed.). London and New York: CRC Press.

ISO 9241-210:2010. Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems.

Lindgaard, G., & Chattratichart, J. (2007). Usability testing: what have we overlooked? In Proceedings of CHI 2007 (pp. 1415-1424).

Nielsen, J. (1993). Iterative user interface design. Jakob Nielsen's Alertbox, November 1, 1993.

Nielsen, J. (1994). Usability Engineering. San Francisco, CA: Morgan Kaufmann Publishers Inc.

Nielsen, J. (2000). Why you only need to test with 5 users. Jakob Nielsen's Alertbox, March 19, 2000.

Pak, R., & McLaughlin, A. (2011). Designing Displays for Older Adults. Boca Raton, London and New York: CRC Press.