Measuring user engagement: the do, the do not do, and the we do not know

DESCRIPTION

In the online world, user engagement refers to the quality of the user experience that emphasises the phenomena associated with wanting to use an application longer and frequently. User engagement is a multifaceted, complex phenomenon; this gives rise to a number of measurement approaches. Common ways to evaluate user engagement include self-report measures, e.g., questionnaires; physiological methods, e.g., cursor and eye tracking; and web analytics, e.g., number of site visits and click depth. These methods represent various trade-offs in terms of the setting (laboratory versus in the wild), the object of measurement (user behaviour, affect or cognition) and the scale of data collected. This talk will present various efforts aimed at combining approaches to measure engagement. A particular focus will be what these measures, individually and combined, can and cannot tell us about user engagement. The talk will use examples of studies on news sites, social media, and native advertising.

TRANSCRIPT

Page 1

Measuring user engagement: the do, the do not do, and the we do not know

Mounia Lalmas [email protected]

World Usability Day Berlin – November 2014

Page 2

About me

§ Since October 2013: Principal Research Scientist at Yahoo Labs London
  › User engagement, native advertising, social media, search

§ 2011–2013: Visiting Principal Scientist at Yahoo Labs Barcelona
  › User engagement, social media, search

§ 2008–2010: Microsoft Research/RAEng Research Professor at the University of Glasgow
  › Quantum theory to model information retrieval

§ 1999–2008: Lecturer (assistant professor) to Professor at Queen Mary, University of London
  › XML retrieval and evaluation (INEX)

Blog: labtomarket.wordpress.com

Page 3

This talk

§ What is user engagement
  › Definitions
  › Characteristics
  › Approaches

§ Attributes of user engagement measurement
  › Scalability
  › Setting
  › Objectivity versus subjectivity
  › Temporality

Page 4

What is user engagement?

Page 5

What is user engagement?

"User engagement is a quality of the user experience that emphasizes the phenomena associated with wanting to use a technological resource longer and frequently" (Attfield et al., 2011)

It is the emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource, and it can be observed through:

› self-report: happy, sad, enjoyment, …
› analytics: click, upload, read, comment, share, …
› physiology: gaze, body heat, mouse movement, …

Page 6

Why is it important to engage users?

§ In today's wired world, users have enhanced expectations about their interactions with technology, resulting in increased competition amongst the purveyors and designers of interactive systems.

§ In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.

(O'Brien, Lalmas & Yom-Tov, 2014)

Page 7

Patterns of user engagement

Online sites differ in how users engage with them:

› Games: users spend much time per visit
› Search: users come frequently and do not stay long
› Social media: users come frequently and stay long
› Niche: users come on average once a week, e.g. for a weekly post
› News: users come periodically, e.g. morning and evening
› Service: users visit the site when needed, e.g. to renew a subscription

(Lehmann et al., 2012)

Page 8

Why is it important to measure and interpret user engagement well?

[Slide shows CTR (click-through rate) as an example of a commonly used metric.]

Page 9

Characteristics of user engagement

Focused attention (Webster & Ho, 1997; O'Brien, 2008)
• Users must be focused to be engaged
• Distortions in the subjective perception of time are used to measure it

Positive affect (O'Brien & Toms, 2008)
• Emotions experienced by the user are intrinsically motivating
• An initial affective "hook" can induce a desire for exploration, active discovery or participation

Aesthetics (Jacques et al., 1995; O'Brien, 2008)
• The sensory, visual appeal of an interface stimulates the user and promotes focused attention
• Linked to design principles (e.g. symmetry, balance, saliency)

Endurability (Read, MacFarlane & Casey, 2002; O'Brien, 2008)
• People remember enjoyable, useful, engaging experiences and want to repeat them
• Reflected in, e.g., the propensity of users to recommend an experience/a site/a product

Page 10

Characteristics of user engagement (continued)

Novelty (Webster & Ho, 1997; O'Brien, 2008)
• Novelty, surprise, unfamiliarity and the unexpected
• Appeals to users' curiosity; encourages inquisitive behaviour and promotes repeated engagement

Richness and control (Jacques et al., 1995; Webster & Ho, 1997)
• Richness captures the growth potential of an activity
• Control captures the extent to which a person is able to achieve this growth potential

Reputation, trust and expectation (Attfield et al., 2011)
• Trust is a necessary condition for user engagement
• An implicit contract among people and entities which is more than technological

Motivation, interests, incentives and benefits (Jacques et al., 1995; O'Brien & Toms, 2008)
• Why should users engage?
• Difficult to study: hard to set up "laboratory"-style experiments for these factors

Page 11

Attributes of user engagement

§ Scale (large versus small)
§ Setting (laboratory versus field)
§ Objective versus subjective
§ Temporality (short- versus long-term)

No one attribute is better than another: it depends on aims and constraints.

Page 12

Measuring user engagement

Measures and their attributes:

§ Self-report: questionnaire, interview, think-aloud and think-after protocols
  › subjective; short- and long-term; lab and field; small scale

§ Physiology: EEG, SCL, fMRI, eye tracking, mouse tracking
  › objective; short-term; lab and field; small and large scale

§ Analytics: intra- and inter-session metrics, data science
  › objective; short- and long-term; field; large scale

Page 13

Towards reliable and valid measurement

Page 14

Scalability

› dozens – qualitative studies & physiology
› hundreds to thousands – online surveys
› millions – analytics

From rich but limited in generalisation … to powerful but hard to explain.

Page 15

Large scale measurement – analytics

Intra-session measurement. Metrics:
• Dwell time
• Session duration
• Bounce rate
• Play time (video)
• Mouse movement
• Click-through rate (CTR)
• Number of pages viewed (click depth)
• Conversion rate
• Number of UGC items (e.g. comments)
• …

Dwell time is used as a proxy of user interest, of relevance, and of conversion.
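To make these metrics concrete, here is a minimal sketch (not from the talk) of how some of the intra-session metrics above could be computed from a page-view log; the log layout and field names are assumptions.

```python
# Minimal sketch: intra-session engagement metrics from a hypothetical
# page-view log. Each session is a time-ordered list of (timestamp, url).
from datetime import datetime

session = [
    (datetime(2014, 11, 6, 9, 0, 0), "/home"),
    (datetime(2014, 11, 6, 9, 0, 42), "/article/123"),
    (datetime(2014, 11, 6, 9, 5, 10), "/article/456"),
]

def dwell_times(views):
    """Dwell time per page = gap to the next page view (last page unknown)."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(views, views[1:])]

def session_duration(views):
    """Time between the first and last recorded page view."""
    return (views[-1][0] - views[0][0]).total_seconds()

def is_bounce(views):
    """A bounce is a session with a single page view."""
    return len(views) == 1

print(dwell_times(session))       # [42.0, 268.0]
print(session_duration(session))  # 310.0
print(is_bounce(session))         # False
print(len(session))               # click depth: 3 pages viewed
```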

Page 16

Small scale measurement – eye tracking

18 users, 16 tasks each (choose one story & rate it); eye movements recorded.

Attention (gaze): interest plays no role; position > saliency.
Selection: mainly driven by interest; position > attention.

(Navalpakkam et al., 2012; Lin et al., 2007)

Page 17

Small scale measurement – focused attention questionnaire

5-point scale (strongly disagree to strongly agree):

1. I lost myself in this news task experience
2. I was so involved in my news tasks that I lost track of time
3. I blocked things out around me when I was completing the news tasks
4. When I was performing these news tasks, I lost track of the world around me
5. The time I spent performing these news tasks just slipped away
6. I was absorbed in my news tasks
7. During the news task experience I let myself go

(O'Brien & Toms, 2010)

Page 18

Small scale measurement – PANAS questionnaire

(10 positive items and 10 negative items)

§ "You feel this way right now, that is, at the present moment" [1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely] [randomize items]

Negative items: distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid
Positive items: interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active

(Watson, Clark & Tellegen, 1988)
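As a concrete illustration, a minimal sketch of PANAS scoring: positive and negative affect are conventionally the sums of their ten 1–5 items, each ranging from 10 to 50. The ratings below are a hypothetical respondent, and item randomisation is omitted.

```python
# Minimal sketch of PANAS scoring (Watson, Clark & Tellegen, 1988).
POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]

def panas_scores(ratings):
    """ratings: dict mapping each of the 20 items to a 1-5 response."""
    pa = sum(ratings[item] for item in POSITIVE)  # positive affect, 10-50
    na = sum(ratings[item] for item in NEGATIVE)  # negative affect, 10-50
    return pa, na

# Hypothetical respondent: mildly positive, low negative affect.
ratings = {**{item: 4 for item in POSITIVE}, **{item: 1 for item in NEGATIVE}}
pa, na = panas_scores(ratings)
print(pa, na)  # 40 10
```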

Page 19

Small scale measurement – gaze and self-reporting

§ News domain; interest; 57 users; reading tasks (114)
§ Questionnaire (qualitative data)
§ Eye tracking recorded (quantitative data)

Three metrics: gaze, focused attention and positive affect.

All three metrics align: interesting content promotes all engagement metrics.

(Arapakis et al., 2014)

Page 20

From small- to large-scale measurement – mouse tracking

§ Navigation and interaction with a digital environment usually involve the use of a mouse (selecting, positioning, clicking)

§ Several works show the mouse cursor to be a weak proxy of gaze (attention)

§ A low-cost, scalable alternative

§ Can be performed in a non-invasive manner, without removing users from their natural setting

Page 21

Relevance, dwell time & cursor

"Reading" a relevant long document vs "scanning" a long non-relevant document.

(Guo & Agichtein, 2012)
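A toy sketch of the intuition, not Guo & Agichtein's actual model: combine dwell time with simple cursor features to separate "reading" from "scanning". The features and thresholds are illustrative assumptions.

```python
# Toy heuristic: readers dwell longer, move the cursor slowly, and pause
# repeatedly while following the text; scanners move fast with few pauses.
def label_visit(dwell_seconds, cursor_speed_px_s, pauses):
    reading_score = 0
    reading_score += dwell_seconds > 30        # long dwell
    reading_score += cursor_speed_px_s < 150   # slow, text-following cursor
    reading_score += pauses >= 3               # repeated pauses down the page
    return "reading" if reading_score >= 2 else "scanning"

print(label_visit(75, 90, 5))   # reading
print(label_visit(12, 400, 1))  # scanning
```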

Page 22

Mouse gestures → features

[Figure: an example cursor trajectory plotted as (x, y) positions over time t, with resting-cursor periods (500 ms, 1000 ms, 1500 ms) and clicks marked along the path.]

22 users reading two articles; 176,550 cursor positions; 2,913 mouse gestures.

(Arapakis, Lalmas & Valkanas, 2014)
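A minimal sketch, assuming cursor logs arrive as (t_ms, x, y) samples, of how a trajectory could be segmented into gestures at resting periods and turned into simple features; the thresholds and feature set are assumptions, not the paper's pipeline.

```python
# Segment a cursor trajectory into gestures and derive per-gesture features.
import math

def segment_gestures(points, rest_ms=500):
    """Split a trajectory of (t_ms, x, y) samples wherever the cursor
    stays at the same position for at least rest_ms."""
    gestures, current = [], [points[0]]
    for prev, cur in zip(points, points[1:]):
        resting = (cur[1], cur[2]) == (prev[1], prev[2]) and cur[0] - prev[0] >= rest_ms
        if resting and current:
            gestures.append(current)
            current = []
        current.append(cur)
    if current:
        gestures.append(current)
    return gestures

def gesture_features(gesture):
    """Path length, duration and average speed of one gesture."""
    path = sum(math.dist(a[1:], b[1:]) for a, b in zip(gesture, gesture[1:]))
    duration = (gesture[-1][0] - gesture[0][0]) / 1000.0
    return {"path_px": path, "duration_s": duration,
            "speed_px_s": path / duration if duration else 0.0}
```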

Page 23

Towards a taxonomy of mouse gestures for user engagement measurement

§ The top-ranked clustering configuration was spectral clustering on the original dataset, with a hyperbolic tangent kernel and k = 38

• Certain types of mouse gestures occur more or less often, depending on user interest in the article
• Significant correlations exist between certain types of mouse gestures and self-report measures
• Cursor behaviour goes beyond measuring frustration: it informs about both positive and negative interaction
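A sketch of the clustering step named above using scikit-learn: spectral clustering over a hyperbolic tangent (sigmoid) kernel with k = 38. The feature matrix and kernel parameters are assumptions; kernel values are shifted to be non-negative, since spectral clustering expects an affinity (similarity) matrix.

```python
# Spectral clustering of gesture feature vectors with a tanh (sigmoid) kernel.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import sigmoid_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))   # stand-in for the 2,913 gesture feature vectors

A = sigmoid_kernel(X)            # tanh(gamma * <x, y> + coef0)
A = A - A.min()                  # shift to non-negative affinities

labels = SpectralClustering(n_clusters=38, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(np.bincount(labels))       # cluster sizes
```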

Page 24

Setting

laboratory versus "in the wild"

From a high level of consistency and control … to greater external validity and more "true to life".

Page 25

Crowdsourcing and self-report

§ How the visual catchiness (saliency) of "relevant" information impacts
  › focused attention
  › affect

§ Saliency model of visual attention developed by Itti & Koch (2000)

Page 26

Manipulating saliency

[Figure: a web page screenshot with its saliency maps under the salient and non-salient conditions.]

(McCay-Peet, Lalmas & Navalpakkam, 2012)

Page 27

Study design

§ 8 tasks = finding the latest news or headline on a celebrity or entertainment topic

§ Affect measured pre- and post-task using the Positive Affect Schedule (PANAS), e.g. "determined", "attentive"

§ Focused attention measured with the 7-item focused attention scale (e.g. "I was so involved in my news tasks that I lost track of time", "I blocked things out around me when I was completing the news tasks") and perceived time

§ Interest level in topics (pre-task) and questionnaire (post-task), e.g. "I was interested in the content of the web pages", "I wanted to find out more about the topics that I encountered on the web pages"

§ 189 (90+99) participants from Amazon Mechanical Turk

Page 28

Using crowdsourcing works

§ When headlines are non-salient: users are slow at finding them, report more distraction due to web page features, and show a drop in affect

§ When headlines are salient: users find them faster, report that it is easy to focus, and maintain positive affect

§ Users reported it was "easier to focus in the salient condition", BUT there was no significant improvement on the focused attention scale and no difference in perceived time spent on tasks

User interest in web page content is a good predictor of focused attention, itself a good predictor of positive affect.
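A sketch of how such a predictor chain can be checked with two simple regressions using statsmodels; the data below are synthetic stand-ins for the study's questionnaire scores, with the effect sizes chosen arbitrarily.

```python
# Two-step check of the chain: interest -> focused attention -> positive affect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 189                                        # participants, as in the study
interest = rng.normal(3.5, 0.8, n)             # pre-task interest (5-point scale)
focus = 0.6 * interest + rng.normal(0, 0.5, n)  # synthetic link
affect = 0.5 * focus + rng.normal(0, 0.5, n)    # synthetic link

m1 = sm.OLS(focus, sm.add_constant(interest)).fit()  # interest -> focus
m2 = sm.OLS(affect, sm.add_constant(focus)).fit()    # focus -> affect
print(m1.params, m1.pvalues)
print(m2.params, m2.pvalues)
```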

Page 29

Objectivity vs subjectivity

objective – analytics and physiological measures
subjective – self-report

Towards reliability and validity … mapping objective and subjective measurements onto each other.

Page 30

"Ugly" vs "Normal" interface

[Figure: the "ugly" and "normal" variants of the BBC News and Wikipedia pages used in the study.]

Page 31

Mouse tracking and self-reporting

§ 324 users from Amazon Mechanical Turk (between-subject design)
§ Two domains (BBC News and Wikipedia)
§ Two tasks (reading and search)
§ "Normal" vs "ugly" interface

§ Questionnaires (qualitative data)
  › focused attention, positive affect
  › interest, aesthetics
  › + demographics, hardware

§ Mouse tracking (quantitative data)
  › movement speed, movement rate, click rate, pause length, percentage of time still

(Warnock & Lalmas, 2013)

Page 32

Mouse tracking could not tell us much about

§ focused attention and positive affect
§ user interest in the task/topic
§ aesthetics

BUT:
› the "ugly" variant did not result in lower user aesthetics scores (although BBC > Wikipedia)
› the comments left said otherwise …
  › Wikipedia: "The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background."; "The webpage was entirely blue. I don't know if it was supposed to be like that, but it definitely detracted from the browsing experience."
  › BBC News: "The website's layout and color scheme were a bitch to navigate and read."; "Comic sans is a horrible font."

Page 33

Flawed methodology? Non-existent signal? Wrong metric? Wrong measure?

§ Hawthorne effect

§ Design
  › Usability versus engagement
  › Within- versus between-subject

§ Mouse movement analysis was not sophisticated enough, as shown by more recent work (Arapakis et al., 2014)

Page 34

Temporality

short-term versus long-term

From intra-session … to inter-session.

Page 35

Large scale measurements – analytics

Intra-session measures:
• Dwell time
• Session duration
• Bounce rate
• Play time (video)
• Mouse movement
• Click-through rate (CTR)
• Number of pages viewed (click depth)
• Conversion rate
• Number of UGC items (comments)
• …

Inter-session measures:
• Fraction of return visits
• Time between visits (inter-session time, absence time)
• Total view time per month (video)
• Lifetime value (number of actions)
• Number of sessions per unit of time
• Total usage time per unit of time
• Number of friends on site (social networks)
• Number of UGC items (comments)
• …

Intra-session engagement measures success in attracting the user to remain on the site for as long as possible (activity). Inter-session engagement is measured by observing lifetime user value (loyalty, popularity).

Page 36

Inter-session metric – absence time

A short absence is a sign of loyalty, and an important indication of user engagement.

(Dupret & Lalmas, 2013)
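A minimal sketch of computing absence time (time between consecutive visits) per user; the visit log below is hypothetical.

```python
# Absence time per user from a hypothetical log of visit timestamps.
from datetime import datetime

visits = {
    "user_a": [datetime(2014, 11, 1), datetime(2014, 11, 2), datetime(2014, 11, 4)],
    "user_b": [datetime(2014, 11, 1), datetime(2014, 11, 20)],
}

def absence_times_days(timestamps):
    """Gaps, in days, between consecutive visits."""
    ts = sorted(timestamps)
    return [(b - a).days for a, b in zip(ts, ts[1:])]

for user, ts in visits.items():
    print(user, absence_times_days(ts))
# user_a [1, 2]   <- short absences: returns often, read as loyalty
# user_b [19]     <- long absence
```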

Page 37

Absence time – search experience

1. Clicks after the 5th result reflect a poorer user experience; users cannot find what they are looking for
2. No click means a bad user experience
3. Clicking at the bottom is a sign of low overall ranking quality
4. Users finding their answers quickly (clicking sooner) return sooner to the search application
5. Returning to the same search result page is a worse user experience than reformulating the query

[Diagram: search session metrics evaluated against absence time.]

Page 38

Conclusions

Page 39

Measuring user engagement

1. No one measurement is perfect or complete.
2. Studies have different constraints.
3. Measurement should be applied consistently, with attention to reliability.
4. Mostly "normal" interaction.
5. "It is a capital mistake to theorize before one has data." – Arthur Conan Doyle

What is a good signal? What is a good metric? What is a correct interpretation?

Page 40

Danke schön (thank you)

This talk is based on the tutorial & book "Measuring User Engagement" (with Heather O'Brien and Elad Yom-Tov).