Discovering Common Motifs in Cursor Movement Data

Dmitry Lagun, Emory University, 2014

Upload: yandex

Post on 19-Jun-2015


DESCRIPTION

Mouse cursor movements can provide valuable information on how users interact and engage with web documents. This interaction data is far richer than traditional click data, and can be used to improve evaluation and presentation of web information systems. Unfortunately, the diversity and complexity inherent in this interaction data make it more difficult to capture salient behavior characteristics through traditional feature engineering. To address this problem, we introduce a novel approach of automatically discovering frequent subsequences, or motifs, in mouse cursor movement data. In order to scale our approach to realistic datasets, we introduce novel optimizations for motif discovery, specifically designed for mining cursor movement data. We show that encoding the motifs discovered from thousands of real web search sessions as features enables significant improvements in important web search tasks. These results, complemented with visualization and qualitative analysis, demonstrate that our approach is able to automatically capture key characteristics of mouse cursor movement behavior, providing a valuable new tool for online user behavior analysis. In addition to the application of motifs to web mining, we demonstrate that a similar technique can be successfully applied in the medical domain to the task of predicting future decline of memory function and subsequent development of Alzheimer’s disease.

TRANSCRIPT

Page 1: Discovering Common Motifs in Cursor Movement Data

1

Discovering Common Motifs in Cursor Movement Data

Dmitry Lagun, 2014

Emory University

Page 2: Discovering Common Motifs in Cursor Movement Data

2

Thank you!

Mikhail Ageev Qi Guo Eugene Agichtein

Page 3: Discovering Common Motifs in Cursor Movement Data

3

The Importance of Online User Attention

• “Attention is focused mental engagement on a particular item of information.” (Davenport & Beck 2001, p. 20)

Abundance of information

Scarcity of attention

Page 4: Discovering Common Motifs in Cursor Movement Data

4

The Importance of Online User Attention

• “Eye-mind Hypothesis” [Just and Carpenter, 1980]

• “When a subject looks at a word or object, he or she also thinks about (process cognitively), and for exactly as long as the recorded fixation.”

Page 5: Discovering Common Motifs in Cursor Movement Data

5

The Importance of Online User Attention

• Attention is critical for science of cognition (vision, language, memory)

• Many industry applications:
– Web search intent, quality, presentation, satisfaction
– UI usability testing
– Display advertising, customer engagement, branding

Page 6: Discovering Common Motifs in Cursor Movement Data

6

Measurement of Attention

• Eye Tracking
– Based on corneal reflection of infra-red light

Infra-red cameras

Users spend most of the time on top search results

Page 7: Discovering Common Motifs in Cursor Movement Data

7

Applications

Examination Strategies [Buscher et al.]

Web Page Re-Design [Leiva et al.]

Behavior Biased Summaries [Ageev et al.]

Query-Expansion & Relevance Feedback [Buscher et al.]

Parkinson, ADHD, FASD [Tseng et al.]

Prediction of Cognitive Impairment [Zola et al.]

Search Relevance [Guo & Agichtein]

Search Abandonment [Huang et al.]

Page 8: Discovering Common Motifs in Cursor Movement Data

8

Applications

Examination Strategies [Buscher et al.]

Web Page Re-Design [Leiva et al.]

Behavior Biased Summaries [Ageev et al.]

Query-Expansion & Relevance Feedback [Buscher et al.]

Parkinson, ADHD, FASD [Tseng et al.]

Prediction of Cognitive Impairment [Zola et al.]

Search Relevance [Guo & Agichtein]

Search Abandonment [Huang et al.]

Our focus

Page 9: Discovering Common Motifs in Cursor Movement Data

9

emory math and cs

Search

Page 10: Discovering Common Motifs in Cursor Movement Data

10

Search Logs

Web Pages

Search Engine Ranking

emory math and cs


Page 11: Discovering Common Motifs in Cursor Movement Data

11

Search Logs

Web Pages

Search Engine Ranking

click

emory math and cs


Page 12: Discovering Common Motifs in Cursor Movement Data

12

Search Logs

Web Pages

Search Engine Ranking

Relevant or Not?

Ranking

emory math and cs


Page 13: Discovering Common Motifs in Cursor Movement Data

13

Prior Work: Cursor Movement on Landing Pages

• Post Click Behavior Model [Guo and Agichtein, WWW 2012]
• Two basic patterns: “Reading” and “Scanning”

Reading Scanning

“Reading”: consuming or verifying when (seemingly) relevant information is found

“Scanning”: not yet found the relevant information, still in the process of visually searching

Page 14: Discovering Common Motifs in Cursor Movement Data

14

Post-Click Behavior (PCB) Data Improves Ranking

• PCB and PCB_User consistently outperform DTR (baseline)

[Guo & Agichtein, WWW 2012] [Guo, Lagun & Agichtein, CIKM 2012]

DTR = Dwell time + Rank

[Figure axis: NDCG]

Page 15: Discovering Common Motifs in Cursor Movement Data

15

Post-Click Behavior (PCB) Model Features

• Average cursor position, cursor speed, direction

• Travelled distance, horizontal and vertical ranges

• Max/Min cursor positions on the screen
• Scroll speed, frequency and scroll distance
• Cursor position in a region-of-interest
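Hand-engineered features of this kind can be sketched in a few lines. The snippet below is only an illustration, not the PCB model's actual feature extractor: the function name `trajectory_features`, the `(t, x, y)` sample format, and the feature keys are assumptions made for the example, and scroll and region-of-interest features are left out.

```python
import math

def trajectory_features(points):
    """Summary features of a cursor trajectory.

    points: list of (t, x, y) samples ordered by time t in seconds
    (a hypothetical input format chosen for this sketch).
    """
    xs = [x for _, x, _ in points]
    ys = [y for _, _, y in points]
    # Travelled distance: sum of straight-line steps between consecutive samples.
    distance = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(points, points[1:])
    )
    duration = points[-1][0] - points[0][0]
    return {
        "mean_x": sum(xs) / len(xs),
        "mean_y": sum(ys) / len(ys),
        "min_x": min(xs), "max_x": max(xs),
        "min_y": min(ys), "max_y": max(ys),
        "range_x": max(xs) - min(xs),
        "range_y": max(ys) - min(ys),
        "distance": distance,
        "mean_speed": distance / duration if duration > 0 else 0.0,
    }

features = trajectory_features([(0.0, 0, 0), (0.5, 3, 4), (1.0, 3, 10)])
```

Each feature collapses the whole trajectory into one number, which is why the shape of individual movements is lost.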

Can we automatically discover meaningful features of cursor trajectory?

Page 16: Discovering Common Motifs in Cursor Movement Data

16

Our Approach: Cursor Motif Mining

Instead of engineering complex features, discover common subsequences (motifs)

A motif is a frequently occurring sequence of cursor movements.

Similar

Page 17: Discovering Common Motifs in Cursor Movement Data

17

Mouse Cursor Data: Challenges

Different users examine web pages at different speeds, and hence move the mouse slower or faster.

Similar types of movements can appear in different parts of a web page (top vs. bottom).

Page 18: Discovering Common Motifs in Cursor Movement Data

18

Mouse Cursor Data: Challenges

Different users examine web pages at different speeds, and hence move the mouse slower or faster. [Flexible Distance Metric, DTW]

Similar types of movements can appear in different parts of a web page (top vs. bottom). [Location Invariance: normalize subsequence position]
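The slide does not say how the subsequence position is normalized. A minimal sketch, assuming the normalization simply shifts each subsequence so that it starts at the origin (subtracting the mean position would be an equally plausible choice); `normalize_position` is a hypothetical helper name:

```python
def normalize_position(subsequence):
    """Shift a subsequence of (x, y) points so that its first point is (0, 0)."""
    x0, y0 = subsequence[0]
    return [(x - x0, y - y0) for x, y in subsequence]

# The same movement made near the top and near the bottom of a page.
top = [(100, 50), (110, 55), (120, 50)]
bottom = [(100, 900), (110, 905), (120, 900)]
same_shape = normalize_position(top) == normalize_position(bottom)
```

After normalization the two subsequences are identical, so any distance measure treats them as the same movement regardless of where on the page it happened.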

Page 19: Discovering Common Motifs in Cursor Movement Data

19

Motif Discovery Pipeline

Generate Motif Candidates

Discover Frequent Candidates

De-duplicate / Output Motifs

Distance Measure

Page 20: Discovering Common Motifs in Cursor Movement Data

20

Candidate Generation

window size

sliding window

Motif candidates
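Candidate generation with a sliding window can be sketched as follows. The function name and the `step` parameter are assumptions for illustration; the slide only names the window size and the sliding window.

```python
def sliding_window_candidates(trajectory, window_size, step=1):
    """Return every contiguous subsequence of length window_size."""
    return [
        trajectory[start:start + window_size]
        for start in range(0, len(trajectory) - window_size + 1, step)
    ]

candidates = sliding_window_candidates([1, 2, 3, 4, 5], window_size=3)
```

With a step of 1, neighbouring candidates overlap in all but one sample, which is the redundancy that the de-duplication stage later has to remove.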

Page 21: Discovering Common Motifs in Cursor Movement Data

21

Distance Measure

• Which time series are similar?
• Popular Choices:
– Euclidean Distance (ED)
– Dynamic Time Warping (DTW)
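The difference between the two measures is easiest to see on a small example. The sketch below uses 1-D series and squared point costs for brevity; these choices, and the function names, are assumptions of the example rather than details taken from the talk.

```python
import math

def euclidean_distance(a, b):
    """Point-by-point distance between two equal-length 1-D series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) dynamic-programming DTW with squared point costs."""
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[len(a)][len(b)])

# The same ramp performed at two different speeds.
slow = [0, 0, 1, 1, 2, 2]
fast = [0, 1, 2, 2, 2, 2]
ed = euclidean_distance(slow, fast)
dtw = dtw_distance(slow, fast)
```

ED compares samples at the same time index, so a speed difference shows up as distance. DTW is allowed to stretch the time axis, so the two ramps align exactly.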

Page 22: Discovering Common Motifs in Cursor Movement Data

22

Frequent Motif Mining

• Similarity Search
– How many subsequences in the dataset are similar to the given candidate subsequence?

[Figure: matrix with motif candidates on both axes]

dist(i,j) – how similar the i-th motif candidate is to the j-th motif candidate.

Algorithm Parameters:
max_dist – distance below which two subsequences are considered “similar”
min_count – minimal frequency of a motif candidate

Brute force search is computationally expensive
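A brute-force version of this search makes the cost visible. The sketch below is an illustration only: `frequent_candidates` is a hypothetical name, and `abs_diff` is a toy distance standing in for DTW.

```python
def frequent_candidates(candidates, dist, max_dist, min_count):
    """Return indices of candidates with at least min_count similar neighbours."""
    frequent = []
    for i, candidate in enumerate(candidates):
        # Count the other candidates that lie within max_dist of this one.
        count = sum(
            1
            for j, other in enumerate(candidates)
            if i != j and dist(candidate, other) <= max_dist
        )
        if count >= min_count:
            frequent.append(i)
    return frequent

def abs_diff(a, b):
    # Toy distance for the example; a real run would plug in DTW here.
    return sum(abs(x - y) for x, y in zip(a, b))

toy = [[0, 0], [0, 1], [1, 0], [9, 9]]
result = frequent_candidates(toy, abs_diff, max_dist=1, min_count=2)
```

Every candidate is compared with every other candidate, so the number of distance computations grows quadratically with the number of candidates, and each DTW call is itself quadratic in the window size.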

Page 23: Discovering Common Motifs in Cursor Movement Data

23

De-Duplication (only keep cluster centroids)

• Similarity search can generate many frequent candidates that are similar to each other (due to redundancy in motif candidate generation)
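The slide does not describe the clustering procedure. As a rough stand-in, the sketch below uses a greedy pass that keeps a candidate only if it is farther than `max_dist` from everything already kept; the kept item acts as the representative of its group rather than a true centroid. The names `deduplicate` and `abs_diff` are assumptions of the example.

```python
def deduplicate(frequent, dist, max_dist):
    """Greedily keep candidates that are farther than max_dist from all kept ones."""
    kept = []
    for candidate in frequent:
        if all(dist(candidate, representative) > max_dist for representative in kept):
            kept.append(candidate)
    return kept

def abs_diff(a, b):
    # Toy distance for the example; a real run would plug in DTW here.
    return sum(abs(x - y) for x, y in zip(a, b))

overlapping = [[0, 1, 2], [0, 1, 3], [1, 2, 3], [10, 10, 10]]
motifs = deduplicate(overlapping, abs_diff, max_dist=2)
```

The result of a greedy pass depends on the order of the input, so sorting candidates by frequency first would be a natural refinement.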

Page 24: Discovering Common Motifs in Cursor Movement Data

24

Motif Discovery Pipeline

Generate Motif Candidates

Discover Frequent Candidates

De-duplicate / Output Motifs

Distance Metric

Page 25: Discovering Common Motifs in Cursor Movement Data

25

Optimizations in Similarity Search

• Early stopping
– in DTW computation (takes O(n^2) time)
– in lower bound computation (takes O(n) time) [Keogh et al.]
• Parallel Computation
– No dependency between distance computations, so multiple cores can be used
• Distance Metric Learning
• Spatial Indexing
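Early stopping can be sketched for both computations. The code below assumes 1-D series of equal length, squared point costs, and a warping band of half-width `r`; it uses the LB_Keogh envelope bound, which is the lower bound usually associated with the Keogh et al. citation, although the slide does not name it. For brevity the envelope is recomputed per point, so this version of the bound is O(n·r) rather than O(n).

```python
def lb_keogh(query, candidate, r, best_so_far=float("inf")):
    """Lower bound on band-constrained DTW cost; stops once it exceeds best_so_far."""
    total = 0.0
    n = len(candidate)
    for i, q in enumerate(query):
        # Envelope of the candidate inside the warping band around position i.
        window = candidate[max(0, i - r):min(n, i + r + 1)]
        lower, upper = min(window), max(window)
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (lower - q) ** 2
        if total > best_so_far:
            return float("inf")  # cannot be within the threshold
    return total

def dtw_early_abandon(a, b, r, best_so_far=float("inf")):
    """Band-constrained DTW cost; stops once a whole row exceeds best_so_far."""
    inf = float("inf")
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = d + min(prev[j], curr[j - 1], prev[j - 1])
        # Costs only grow along a path, so the row minimum bounds the final cost.
        if min(curr) > best_so_far:
            return inf
        prev = curr
    return prev[m]

query = [0, 1, 2, 1, 0]
near = [0, 0, 1, 2, 1]
far = [5, 5, 5, 5, 5]
near_cost = dtw_early_abandon(query, near, r=1)
far_cost = dtw_early_abandon(query, far, r=1, best_so_far=1.0)
```

In a similarity search, `best_so_far` would be set from `max_dist`, the cheap bound would be tried first, and the full DTW would run only for candidates the bound cannot rule out.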

Page 26: Discovering Common Motifs in Cursor Movement Data

26

Distance Measure Learning

• Goal: Fast pruning of unpromising candidates in similarity search

Features (x_max, y_max, …, feature_k)

Features (x_max, y_max, …, feature_k)

Tune the weights with Gradient based method (e.g. SGD)
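The slide does not state the training objective. The sketch below assumes one plausible reading: each subsequence is summarized by a small feature vector, and non-negative weights are fit by SGD so that a weighted squared feature difference approximates an expensive target distance such as DTW. The function names, the learning rate, and the synthetic data are all assumptions of the example.

```python
import random

def weighted_distance(weights, fa, fb):
    """Cheap surrogate distance: weighted squared differences of summary features."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, fa, fb))

def fit_weights(pairs, targets, n_features, lr=0.1, epochs=50):
    """Fit non-negative weights by SGD on squared error against target distances."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for (fa, fb), target in zip(pairs, targets):
            error = weighted_distance(weights, fa, fb) - target
            for k in range(n_features):
                gradient = 2.0 * error * (fa[k] - fb[k]) ** 2
                # Project back to non-negative values so the result stays a distance.
                weights[k] = max(0.0, weights[k] - lr * gradient)
    return weights

def mean_squared_error(weights, pairs, targets):
    return sum(
        (weighted_distance(weights, fa, fb) - t) ** 2
        for (fa, fb), t in zip(pairs, targets)
    ) / len(pairs)

# Synthetic stand-in for (feature pair, expensive distance) training examples.
rng = random.Random(0)
true_weights = [2.0, 0.5]
pairs = [([rng.random(), rng.random()], [rng.random(), rng.random()]) for _ in range(200)]
targets = [weighted_distance(true_weights, fa, fb) for fa, fb in pairs]
learned = fit_weights(pairs, targets, n_features=2)
```

Once the weights are tuned, the surrogate costs a handful of multiplications per pair and can be used to discard candidates before any DTW computation is attempted.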

Page 27: Discovering Common Motifs in Cursor Movement Data

27

Spatial Indexing

• Goal: Fast pruning of unpromising candidates in similarity search

• Indexes motif candidates in weighted feature space

• Improves asymptotic time for similarity search
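The talk does not name the index structure. As one simple possibility, the sketch below uses a uniform grid whose cell side equals the search radius, so a range query only has to inspect the neighbouring cells. `GridIndex` and its methods are hypothetical names; the points are assumed to be feature vectors that have already been scaled by the learned weights.

```python
from collections import defaultdict
from itertools import product

class GridIndex:
    """Uniform grid over feature vectors; the cell side equals the search radius."""

    def __init__(self, radius):
        self.radius = radius
        self.cells = defaultdict(list)

    def _cell(self, point):
        return tuple(int(v // self.radius) for v in point)

    def add(self, item_id, point):
        self.cells[self._cell(point)].append((item_id, point))

    def query(self, point):
        """Return ids of stored items within radius (Euclidean) of point."""
        base = self._cell(point)
        found = []
        # Only the 3**k cells around the query cell can hold points within radius.
        for offset in product((-1, 0, 1), repeat=len(point)):
            cell = tuple(b + o for b, o in zip(base, offset))
            for item_id, other in self.cells.get(cell, []):
                if sum((a - b) ** 2 for a, b in zip(point, other)) <= self.radius ** 2:
                    found.append(item_id)
        return found

index = GridIndex(radius=1.0)
index.add(0, (0.1, 0.1))
index.add(1, (0.9, 0.2))
index.add(2, (5.0, 5.0))
neighbours = index.query((0.5, 0.5))
```

A grid works well only in a few dimensions because the number of neighbouring cells grows as 3^k; tree-based indexes are the usual alternative for larger feature sets.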

Page 28: Discovering Common Motifs in Cursor Movement Data

28

Timing Experiments

Page 29: Discovering Common Motifs in Cursor Movement Data

29

Example of Discovered Motif

discovered motif

eye gaze

mouse cursor

matching subsequence

Page 30: Discovering Common Motifs in Cursor Movement Data

30

Motifs Discovery: Examples

On Search Engine Result Pages (SERPs)

On “Landing” Pages (non-SERPs)

Page 31: Discovering Common Motifs in Cursor Movement Data

31

Discovered motifs have many uses

• Summarize typical mouse cursor usages
– E.g. create a dictionary of typical cursor usages
• Compact (task-free) representation
– Characterize the entire cursor trajectory based on which motifs appear in it
• For classification/regression:
– Compute whether a particular motif appears in a given mouse cursor trajectory

Page 32: Discovering Common Motifs in Cursor Movement Data

32

Using motifs as features for Classification/Regression

• We can measure how similar a mouse movement trajectory is to each of the discovered motifs

window size

sliding window

motif
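One way to turn motifs into features, assuming the similarity of a trajectory to a motif is taken as the smallest distance between the motif and any same-length window of the trajectory (the slide does not specify the aggregation). The function names and the toy distance are assumptions of the example.

```python
def motif_feature(trajectory, motif, dist):
    """Smallest distance between the motif and any same-length window of the trajectory."""
    w = len(motif)
    windows = (trajectory[s:s + w] for s in range(len(trajectory) - w + 1))
    return min((dist(window, motif) for window in windows), default=float("inf"))

def motif_feature_vector(trajectory, motifs, dist):
    """One feature per discovered motif."""
    return [motif_feature(trajectory, m, dist) for m in motifs]

def abs_diff(a, b):
    # Toy distance for the example; a real run would plug in DTW here.
    return sum(abs(x - y) for x, y in zip(a, b))

trajectory = [0, 0, 1, 2, 3, 3, 3]
motifs = [[1, 2, 3], [3, 2, 1]]
vector = motif_feature_vector(trajectory, motifs, abs_diff)
```

The resulting vector has a fixed length regardless of how long the trajectory is, so it can be fed directly to a standard classifier or regressor.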

Page 33: Discovering Common Motifs in Cursor Movement Data

33

Motifs for Relevance Prediction

• Baselines
– Cursor Hover (on the search result page) [Huang et al., CHI 2011]
– Post Click Behavior Model [Guo & Agichtein, WWW 2012]
  • Dwell time
  • Statistics of cursor movements: max, min, range, etc.
  • Statistics of scrolling activity: max, min, range, etc.

Reading Scanning

Page 34: Discovering Common Motifs in Cursor Movement Data

34

Dataset

• User study (21 users)

– mostly informational search tasks

– 566 search queries

– 1340 page views

– 854 relevance judgments

Page 35: Discovering Common Motifs in Cursor Movement Data

35

Motifs are Better than Previous Models (PCB, Hover)

Feature Group                    Pearson Correlation
Cursor Hover                     0.120
Post Click Behavior              0.392
Motifs                           0.394 (+0.5%)
Post Click Behavior + Motifs     0.468 (+19.4%)
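The percentages appear to be relative gains over the Post Click Behavior correlation of 0.392. Under that reading, which is an inference from the numbers rather than something the slide states, they can be reproduced as follows (`relative_gain` is a hypothetical helper):

```python
def relative_gain(value, baseline):
    """Relative improvement of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0

baseline = 0.392  # Post Click Behavior
motifs_gain = relative_gain(0.394, baseline)
combined_gain = relative_gain(0.468, baseline)
print(f"Motifs: {motifs_gain:+.1f}%")
print(f"Post Click Behavior + Motifs: {combined_gain:+.1f}%")
```

Motifs alone roughly match the hand-engineered PCB features, while the combination improves on either, which suggests the two feature groups capture complementary information.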

Page 36: Discovering Common Motifs in Cursor Movement Data

36

Motifs are Helpful for Web Search Result Ranking

Page 37: Discovering Common Motifs in Cursor Movement Data

37

Conclusions

• It is possible to automatically discover meaningful motifs from mouse cursor data

• Motifs are helpful for relevance prediction & ranking

• Cursor motifs provide compact (task free) representation for the entire cursor trajectory

Page 38: Discovering Common Motifs in Cursor Movement Data

38

Applications of Gaze/Mouse Cursor Tracking in Medical Domain

Page 39: Discovering Common Motifs in Cursor Movement Data

39

Background: Mild Cognitive Impairment (MCI) and Alzheimer’s Disease

• Alzheimer’s disease (AD) affects more than 5M Americans, expected to grow in the coming decade

• Memory impairment (aMCI) indicates onset of AD (affects hippocampus first)

• Visual Paired Comparison (VPC) task: promising for early diagnosis of both MCI and AD before it is detectable by other means

Page 40: Discovering Common Motifs in Cursor Movement Data

40

VPC Task: Eye Tracking Equipment

Page 41: Discovering Common Motifs in Cursor Movement Data

41

Impaired Subjects spent 50% on Novel Image after Long Delay

Page 42: Discovering Common Motifs in Cursor Movement Data

42

VPC Task: Eye Tracking

Page 43: Discovering Common Motifs in Cursor Movement Data

43

Exploiting Eye Gaze Movement Data

Novelty Preference

fixation duration distribution

+

Page 44: Discovering Common Motifs in Cursor Movement Data

44

Shapelets are Helpful for Prediction of Cognitive Decline

• Shapelets – “class specific” motifs

Page 45: Discovering Common Motifs in Cursor Movement Data

45

Shapelets are Helpful for Prediction of Cognitive Decline

• Shapelets – “class specific” motifs

Baseline AUC = 0.892 ± 0.003

Shapelets AUC = 0.916 ± 0.006

Page 46: Discovering Common Motifs in Cursor Movement Data

46

User Attention on Web Pages

Page 47: Discovering Common Motifs in Cursor Movement Data

47

Cross-Domain User Study

• Research Question
– Does web page content affect user attention?

• Domains
– Search (Google), Wikipedia, Shopping (Amazon), Social (Twitter), News (CNN)

• 20 users (4 + 20 tasks per user)

• 400 tasks, 1700 page views

• 500K gaze/cursor measurements (sampled every 50 ms)

?search domain X

Page 48: Discovering Common Motifs in Cursor Movement Data

48

Web Search Pages

Page 49: Discovering Common Motifs in Cursor Movement Data

49

News Search Pages

Page 50: Discovering Common Motifs in Cursor Movement Data

50

Shopping Search Pages

Page 51: Discovering Common Motifs in Cursor Movement Data

51

Twitter Search Pages

Page 52: Discovering Common Motifs in Cursor Movement Data

52

Conclusions

• It is possible to automatically discover meaningful motifs from mouse cursor data

• Motifs are helpful for relevance prediction, ranking and prediction of cognitive impairment

• Attention patterns vary significantly across search interfaces

Page 53: Discovering Common Motifs in Cursor Movement Data

53

Thank You!

• This work was supported by

Page 54: Discovering Common Motifs in Cursor Movement Data

54

Emory IR Lab: Research Areas

• Modeling collaborative content creation for information organization, indexing, and search

• Mining search behavior data to improve information finding.

• Medical applications of Search, NLP, behavior modeling.

Page 55: Discovering Common Motifs in Cursor Movement Data

55

UFindIt: Remote Search Behavior Studies

Misha Ageev (MGU & Yandex), Dmitry Lagun (Emory), Denis Savenkov (Emory)

SIGIR 2011 (best paper award), SIGIR 2013, EMNLP 2013

Page 56: Discovering Common Motifs in Cursor Movement Data

56

Search behavior models for Touch Screens

Ongoing project, looking for students

Guo et al., SIGIR 2013

Page 57: Discovering Common Motifs in Cursor Movement Data

Dynamics in User Generated Content

Wikipedia

Major events (e.g., natural disasters, sports) affect the content change in Wikipedia articles.

Use content change for ranking:
• Words used in early revisions of the documents are more essential and important to the documents.
• Words used during a major event may reflect relevance change between words and documents.

Twitter

Topic transitions in Tweet streams:
• What you’ve tweeted before may affect what you will tweet in the near future.

Sentiment change in Twitter during major events:
• People respond differently to the same event since they could hold different prior opinions (e.g., conservatives vs. liberals).

Yu Wang (Ph.D. expected 2014) [CIKM 2010, KDD 2012, CIKM 2013]

Page 58: Discovering Common Motifs in Cursor Movement Data

Community Question Answering (CQA)

1. What are the factors influencing answer contributions in CQA Systems?
– Analyzing answerer behavior [ECIR 2011]

2. What kind of searches benefit most from CQA services and archives?
– Understanding how searchers become askers [SIGIR 2011]

3. How to improve search quality with CQA data?
– Predicting searcher satisfaction with CQA data [SIGIR 2012]

Qiaoling Liu, Ph.D. expected: 2014

Page 59: Discovering Common Motifs in Cursor Movement Data

59

• Emory IR Lab is looking for a few good Ph.D. students to start September 2015

• Information retrieval and web search: search behavior, ranking, user interfaces, content analysis, Question Answering

• Social media and social network mining applications: political science, public health, advertising

• Psychology, Neuroscience, Medicine applications: computational attention, memory, cognition, language

Contact: Eugene Agichtein, Associate Professor

[email protected]/~eugene/

Computer Science Ph.D. Program information and application process: http://www.mathcs.emory.edu/programs-grad/

Page 60: Discovering Common Motifs in Cursor Movement Data

60

Atlanta, GA