Measuring user engagement: the do, the do not do, and the we do not know

DESCRIPTION

In the online world, user engagement refers to the quality of the user experience that emphasises the phenomena associated with wanting to use an application longer and frequently. User engagement is a multifaceted, complex phenomenon; this gives rise to a number of measurement approaches. Common ways to evaluate user engagement include self-report measures, e.g., questionnaires; physiological methods, e.g., cursor and eye tracking; and web analytics, e.g., number of site visits and click depth. These methods represent various trade-offs in terms of the setting (laboratory versus in the wild), the object of measurement (user behaviour, affect or cognition) and the scale of data collected. This talk will present various efforts aimed at combining approaches to measure engagement. A particular focus will be what these measures, individually and combined, can and cannot tell us about user engagement. The talk will use examples of studies on news sites, social media, and native advertising.

TRANSCRIPT

Page 1

Measuring user engagement: the do, the do not do, and the we do not know

Mounia Lalmas [email protected]

World Usability Day Berlin – November 2014

Page 2

About me

§ Since October 2013: Principal Research Scientist at Yahoo Labs London
  › User engagement, native advertising, social media, search

§ 2011–2013: Visiting Principal Scientist at Yahoo Labs Barcelona
  › User engagement, social media, search

§ 2008–2010: Microsoft Research/RAEng Research Professor at the University of Glasgow
  › Quantum theory to model information retrieval

§ 1999–2008: Lecturer (assistant professor) to Professor at Queen Mary, University of London
  › XML retrieval and evaluation (INEX)

Blog: labtomarket.wordpress.com

Page 3

This talk

§ What is user engagement
  › Definitions
  › Characteristics
  › Approaches

§ Attributes of user engagement measurement
  › Scalability
  › Setting
  › Objectivity versus subjectivity
  › Temporality

Page 4

What is user engagement?

Page 5

What is user engagement?

"User engagement is a quality of the user experience that emphasizes the phenomena associated with wanting to use a technological resource longer and frequently" (Attfield et al., 2011)

It is the emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource, and it can be observed through:

› self-report: happy, sad, enjoyment, …
› analytics: click, upload, read, comment, share, …
› physiology: gaze, body heat, mouse movement, …

Page 6

Why is it important to engage users?

§ In today's wired world, users have enhanced expectations about their interactions with technology, resulting in increased competition amongst the purveyors and designers of interactive systems.

§ In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.

(O'Brien, Lalmas & Yom-Tov, 2014)

Page 7

Patterns of user engagement

Online sites differ in how users engage with them:

› Games: users spend much time per visit
› Search: users come frequently and do not stay long
› Social media: users come frequently and stay long
› Niche: users come on average once a week, e.g. for a weekly post
› News: users come periodically, e.g. morning and evening
› Service: users visit the site when needed, e.g. to renew a subscription

(Lehmann et al., 2012)

Page 8

Why is it important to measure and interpret user engagement well?

[Slide shows CTR (click-through rate) as an example of a commonly used metric.]

Page 9

Characteristics of user engagement

Focused attention (Webster & Ho, 1997; O'Brien, 2008)
• Users must be focused to be engaged
• Distortions in the subjective perception of time are used to measure it

Positive affect (O'Brien & Toms, 2008)
• Emotions experienced by the user are intrinsically motivating
• An initial affective "hook" can induce a desire for exploration, active discovery or participation

Aesthetics (Jacques et al., 1995; O'Brien, 2008)
• The sensory, visual appeal of an interface stimulates the user and promotes focused attention
• Linked to design principles (e.g. symmetry, balance, saliency)

Endurability (Read, MacFarlane & Casey, 2002; O'Brien, 2008)
• People remember enjoyable, useful, engaging experiences and want to repeat them
• Reflected in, e.g., the propensity of users to recommend an experience/a site/a product

Page 10

Characteristics of user engagement (continued)

Novelty (Webster & Ho, 1997; O'Brien, 2008)
• Novelty, surprise, unfamiliarity and the unexpected
• Appeals to users' curiosity; encourages inquisitive behaviour and promotes repeated engagement

Richness and control (Jacques et al., 1995; Webster & Ho, 1997)
• Richness captures the growth potential of an activity
• Control captures the extent to which a person is able to achieve this growth potential

Reputation, trust and expectation (Attfield et al., 2011)
• Trust is a necessary condition for user engagement
• An implicit contract among people and entities which is more than technological

Motivation, interests, incentives and benefits (Jacques et al., 1995; O'Brien & Toms, 2008)
• Why should users engage?
• Difficult to study: hard to set up "laboratory"-style experiments for these factors

Page 11

Attributes of user engagement

§ Scale (large versus small)
§ Setting (laboratory versus field)
§ Objective versus subjective
§ Temporality (short- versus long-term)

No one attribute is better than another: it depends on aims and constraints.

Page 12

Measuring user engagement

Measures and their attributes:

§ Self-report: questionnaire, interview, think-aloud and think-after protocols
  › subjective; short- and long-term; lab and field; small scale

§ Physiology: EEG, SCL, fMRI, eye tracking, mouse tracking
  › objective; short-term; lab and field; small and large scale

§ Analytics: intra- and inter-session metrics, data science
  › objective; short- and long-term; field; large scale

Page 13

Towards reliable and valid measurement

Page 14

Scalability

› dozens – qualitative studies & physiology
› hundreds to thousands – online surveys
› millions – analytics

From rich but limited in generalisation … to powerful but hard to explain.

Page 15

Large scale measurement – analytics

Intra-session measurement. Metrics:
• Dwell time
• Session duration
• Bounce rate
• Play time (video)
• Mouse movement
• Click-through rate (CTR)
• Number of pages viewed (click depth)
• Conversion rate
• Number of UGC items (e.g. comments)
• …

Dwell time is used as a proxy of user interest, of relevance, and of conversion.
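To make these metrics concrete, here is a minimal sketch (not from the talk) of how some of the intra-session metrics above could be computed from a page-view log; the log layout and field names are assumptions.

```python
# Minimal sketch: intra-session engagement metrics from a hypothetical
# page-view log. Each session is a time-ordered list of (timestamp, url).
from datetime import datetime

session = [
    (datetime(2014, 11, 6, 9, 0, 0), "/home"),
    (datetime(2014, 11, 6, 9, 0, 42), "/article/123"),
    (datetime(2014, 11, 6, 9, 5, 10), "/article/456"),
]

def dwell_times(views):
    """Dwell time per page = gap to the next page view (last page unknown)."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(views, views[1:])]

def session_duration(views):
    """Time between the first and last recorded page view."""
    return (views[-1][0] - views[0][0]).total_seconds()

def is_bounce(views):
    """A bounce is a session with a single page view."""
    return len(views) == 1

print(dwell_times(session))       # [42.0, 268.0]
print(session_duration(session))  # 310.0
print(is_bounce(session))         # False
print(len(session))               # click depth: 3 pages viewed
```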

Page 16

Small scale measurement – eye tracking

18 users, 16 tasks each (choose one story & rate it); eye movements recorded.

Attention (gaze): interest plays no role; position > saliency.
Selection: mainly driven by interest; position > attention.

(Navalpakkam et al., 2012; Lin et al., 2007)

Page 17

Small scale measurement – focused attention questionnaire

5-point scale (strongly disagree to strongly agree):

1. I lost myself in this news task experience
2. I was so involved in my news tasks that I lost track of time
3. I blocked things out around me when I was completing the news tasks
4. When I was performing these news tasks, I lost track of the world around me
5. The time I spent performing these news tasks just slipped away
6. I was absorbed in my news tasks
7. During the news task experience I let myself go

(O'Brien & Toms, 2010)

Page 18

Small scale measurement – PANAS questionnaire

(10 positive items and 10 negative items)

§ "You feel this way right now, that is, at the present moment" [1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely] [randomize items]

Negative items: distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid
Positive items: interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active

(Watson, Clark & Tellegen, 1988)
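As a concrete illustration, a minimal sketch of PANAS scoring: positive and negative affect are conventionally the sums of their ten 1–5 items, each ranging from 10 to 50. The ratings below are a hypothetical respondent, and item randomisation is omitted.

```python
# Minimal sketch of PANAS scoring (Watson, Clark & Tellegen, 1988).
POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]

def panas_scores(ratings):
    """ratings: dict mapping each of the 20 items to a 1-5 response."""
    pa = sum(ratings[item] for item in POSITIVE)  # positive affect, 10-50
    na = sum(ratings[item] for item in NEGATIVE)  # negative affect, 10-50
    return pa, na

# Hypothetical respondent: mildly positive, low negative affect.
ratings = {**{item: 4 for item in POSITIVE}, **{item: 1 for item in NEGATIVE}}
pa, na = panas_scores(ratings)
print(pa, na)  # 40 10
```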

Page 19

Small scale measurement – gaze and self-reporting

§ News domain; interest; 57 users; reading tasks (114)
§ Questionnaire (qualitative data)
§ Eye tracking recorded (quantitative data)

Three metrics: gaze, focused attention and positive affect.

All three metrics align: interesting content promotes all engagement metrics.

(Arapakis et al., 2014)

Page 20

From small- to large-scale measurement – mouse tracking

§ Navigation and interaction with a digital environment usually involve the use of a mouse (selecting, positioning, clicking)

§ Several works show the mouse cursor to be a weak proxy of gaze (attention)

§ A low-cost, scalable alternative

§ Can be performed in a non-invasive manner, without removing users from their natural setting

Page 21

Relevance, dwell time & cursor

"Reading" a relevant long document vs "scanning" a long non-relevant document.

(Guo & Agichtein, 2012)
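A toy sketch of the intuition, not Guo & Agichtein's actual model: combine dwell time with simple cursor features to separate "reading" from "scanning". The features and thresholds are illustrative assumptions.

```python
# Toy heuristic: readers dwell longer, move the cursor slowly, and pause
# repeatedly while following the text; scanners move fast with few pauses.
def label_visit(dwell_seconds, cursor_speed_px_s, pauses):
    reading_score = 0
    reading_score += dwell_seconds > 30        # long dwell
    reading_score += cursor_speed_px_s < 150   # slow, text-following cursor
    reading_score += pauses >= 3               # repeated pauses down the page
    return "reading" if reading_score >= 2 else "scanning"

print(label_visit(75, 90, 5))   # reading
print(label_visit(12, 400, 1))  # scanning
```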

Page 22

Mouse gestures → features

[Figure: an example cursor trajectory plotted as (x, y) positions over time t, with resting-cursor periods (500 ms, 1000 ms, 1500 ms) and clicks marked along the path.]

22 users reading two articles; 176,550 cursor positions; 2,913 mouse gestures.

(Arapakis, Lalmas & Valkanas, 2014)
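A minimal sketch, assuming cursor logs arrive as (t_ms, x, y) samples, of how a trajectory could be segmented into gestures at resting periods and turned into simple features; the thresholds and feature set are assumptions, not the paper's pipeline.

```python
# Segment a cursor trajectory into gestures and derive per-gesture features.
import math

def segment_gestures(points, rest_ms=500):
    """Split a trajectory of (t_ms, x, y) samples wherever the cursor
    stays at the same position for at least rest_ms."""
    gestures, current = [], [points[0]]
    for prev, cur in zip(points, points[1:]):
        resting = (cur[1], cur[2]) == (prev[1], prev[2]) and cur[0] - prev[0] >= rest_ms
        if resting and current:
            gestures.append(current)
            current = []
        current.append(cur)
    if current:
        gestures.append(current)
    return gestures

def gesture_features(gesture):
    """Path length, duration and average speed of one gesture."""
    path = sum(math.dist(a[1:], b[1:]) for a, b in zip(gesture, gesture[1:]))
    duration = (gesture[-1][0] - gesture[0][0]) / 1000.0
    return {"path_px": path, "duration_s": duration,
            "speed_px_s": path / duration if duration else 0.0}
```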

Page 23

Towards a taxonomy of mouse gestures for user engagement measurement

§ The top-ranked clustering configuration was spectral clustering on the original dataset, with a hyperbolic tangent kernel and k = 38

• Certain types of mouse gestures occur more or less often, depending on user interest in the article
• Significant correlations exist between certain types of mouse gestures and self-report measures
• Cursor behaviour goes beyond measuring frustration: it informs about both positive and negative interaction
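A sketch of the clustering step named above using scikit-learn: spectral clustering over a hyperbolic tangent (sigmoid) kernel with k = 38. The feature matrix and kernel parameters are assumptions; kernel values are shifted to be non-negative, since spectral clustering expects an affinity (similarity) matrix.

```python
# Spectral clustering of gesture feature vectors with a tanh (sigmoid) kernel.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import sigmoid_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))   # stand-in for the 2,913 gesture feature vectors

A = sigmoid_kernel(X)            # tanh(gamma * <x, y> + coef0)
A = A - A.min()                  # shift to non-negative affinities

labels = SpectralClustering(n_clusters=38, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(np.bincount(labels))       # cluster sizes
```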

Page 24

Setting

laboratory versus "in the wild"

From a high level of consistency and control … to greater external validity and more "true to life".

Page 25

Crowdsourcing and self-report

§ How the visual catchiness (saliency) of "relevant" information impacts
  › focused attention
  › affect

§ Saliency model of visual attention developed by Itti & Koch (2000)

Page 26

Manipulating saliency

[Figure: a web page screenshot with its saliency maps under the salient and non-salient conditions.]

(McCay-Peet, Lalmas & Navalpakkam, 2012)

Page 27

Study design

§ 8 tasks = finding the latest news or headline on a celebrity or entertainment topic

§ Affect measured pre- and post-task using the Positive Affect Schedule (PANAS), e.g. "determined", "attentive"

§ Focused attention measured with the 7-item focused attention scale (e.g. "I was so involved in my news tasks that I lost track of time", "I blocked things out around me when I was completing the news tasks") and perceived time

§ Interest level in topics (pre-task) and questionnaire (post-task), e.g. "I was interested in the content of the web pages", "I wanted to find out more about the topics that I encountered on the web pages"

§ 189 (90+99) participants from Amazon Mechanical Turk

Page 28

Using crowdsourcing works

§ When headlines are non-salient: users are slow at finding them, report more distraction due to web page features, and show a drop in affect

§ When headlines are salient: users find them faster, report that it is easy to focus, and maintain positive affect

§ Users reported it was "easier to focus in the salient condition", BUT there was no significant improvement on the focused attention scale and no difference in perceived time spent on tasks

User interest in web page content is a good predictor of focused attention, itself a good predictor of positive affect.
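A sketch of how such a predictor chain can be checked with two simple regressions using statsmodels; the data below are synthetic stand-ins for the study's questionnaire scores, with the effect sizes chosen arbitrarily.

```python
# Two-step check of the chain: interest -> focused attention -> positive affect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 189                                        # participants, as in the study
interest = rng.normal(3.5, 0.8, n)             # pre-task interest (5-point scale)
focus = 0.6 * interest + rng.normal(0, 0.5, n)  # synthetic link
affect = 0.5 * focus + rng.normal(0, 0.5, n)    # synthetic link

m1 = sm.OLS(focus, sm.add_constant(interest)).fit()  # interest -> focus
m2 = sm.OLS(affect, sm.add_constant(focus)).fit()    # focus -> affect
print(m1.params, m1.pvalues)
print(m2.params, m2.pvalues)
```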

Page 29

Objectivity vs subjectivity

objective – analytics and physiological measures
subjective – self-report

Towards reliability and validity … mapping objective and subjective measurements onto each other.

Page 30

"Ugly" vs "Normal" interface

[Figure: the "ugly" and "normal" variants of the BBC News and Wikipedia pages used in the study.]

Page 31

Mouse tracking and self-reporting

§ 324 users from Amazon Mechanical Turk (between-subject design)
§ Two domains (BBC News and Wikipedia)
§ Two tasks (reading and search)
§ "Normal" vs "ugly" interface

§ Questionnaires (qualitative data)
  › focused attention, positive affect
  › interest, aesthetics
  › + demographics, hardware

§ Mouse tracking (quantitative data)
  › movement speed, movement rate, click rate, pause length, percentage of time still

(Warnock & Lalmas, 2013)

Page 32

Mouse tracking could not tell us much about

§ focused attention and positive affect
§ user interest in the task/topic
§ aesthetics

BUT:
› the "ugly" variant did not result in lower user aesthetics scores (although BBC > Wikipedia)
› the comments left said otherwise …
  › Wikipedia: "The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background."; "The webpage was entirely blue. I don't know if it was supposed to be like that, but it definitely detracted from the browsing experience."
  › BBC News: "The website's layout and color scheme were a bitch to navigate and read."; "Comic sans is a horrible font."

Page 33

Flawed methodology? Non-existent signal? Wrong metric? Wrong measure?

§ Hawthorne effect

§ Design
  › Usability versus engagement
  › Within- versus between-subject

§ Mouse movement analysis was not sophisticated enough, as shown by more recent work (Arapakis et al., 2014)

Page 34

Temporality

short-term versus long-term

From intra-session … to inter-session.

Page 35

Large scale measurements – analytics

Intra-session measures:
• Dwell time
• Session duration
• Bounce rate
• Play time (video)
• Mouse movement
• Click-through rate (CTR)
• Number of pages viewed (click depth)
• Conversion rate
• Number of UGC items (comments)
• …

Inter-session measures:
• Fraction of return visits
• Time between visits (inter-session time, absence time)
• Total view time per month (video)
• Lifetime value (number of actions)
• Number of sessions per unit of time
• Total usage time per unit of time
• Number of friends on site (social networks)
• Number of UGC items (comments)
• …

Intra-session engagement measures success in attracting the user to remain on the site for as long as possible (activity). Inter-session engagement is measured by observing lifetime user value (loyalty, popularity).

Page 36

Inter-session metric – absence time

A short absence is a sign of loyalty, and an important indication of user engagement.

(Dupret & Lalmas, 2013)
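A minimal sketch of computing absence time (time between consecutive visits) per user; the visit log below is hypothetical.

```python
# Absence time per user from a hypothetical log of visit timestamps.
from datetime import datetime

visits = {
    "user_a": [datetime(2014, 11, 1), datetime(2014, 11, 2), datetime(2014, 11, 4)],
    "user_b": [datetime(2014, 11, 1), datetime(2014, 11, 20)],
}

def absence_times_days(timestamps):
    """Gaps, in days, between consecutive visits."""
    ts = sorted(timestamps)
    return [(b - a).days for a, b in zip(ts, ts[1:])]

for user, ts in visits.items():
    print(user, absence_times_days(ts))
# user_a [1, 2]   <- short absences: returns often, read as loyalty
# user_b [19]     <- long absence
```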

Page 37

Absence time – search experience

1. Clicks after the 5th result reflect a poorer user experience; users cannot find what they are looking for
2. No click means a bad user experience
3. Clicking at the bottom is a sign of low overall ranking quality
4. Users finding their answers quickly (clicking sooner) return sooner to the search application
5. Returning to the same search result page is a worse user experience than reformulating the query

[Diagram: search session metrics evaluated against absence time.]

Page 38

Conclusions

Page 39

Measuring user engagement

1. No one measurement is perfect or complete.
2. Studies have different constraints.
3. Measurement should be applied consistently, with attention to reliability.
4. Mostly "normal" interaction.
5. "It is a capital mistake to theorize before one has data." – Arthur Conan Doyle

What is a good signal? What is a good metric? What is a correct interpretation?

Page 40

Danke schön (thank you)

This talk is based on the tutorial & book "Measuring User Engagement" (with Heather O'Brien and Elad Yom-Tov).