http://www.swell-project.net
Collecting a dataset of information behaviour in context
Maya Sappelli, TNO & Radboud University Nijmegen
Suzan Verberne, Radboud University Nijmegen
Saskia Koldijk, TNO & Radboud University Nijmegen
Wessel Kraaij, TNO & Radboud University Nijmegen
Supported by the Dutch National Program:
Information behaviour in context
But…
• Controlled search experiment: lacks context for search; unnatural motive/behaviour
• Uncontrolled data collection: privacy issues; noise
Data Collection
Data Labeling: Event Stream to Event Blocks
[Figure: example Event Block; legible labels: Outlook, Inbox]
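The slide does not spell out how the event stream is cut into event blocks. As a minimal sketch, assuming a block is a run of consecutive events that share the same application and window title, the grouping could look like this. The `Event` fields and the sample data are hypothetical, not the project's actual log format.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Event:
    timestamp: float    # seconds since session start (hypothetical field)
    app: str            # active application, e.g. "Outlook"
    window_title: str   # active window title, e.g. "Inbox"
    kind: str           # e.g. "key" or "click"


def to_event_blocks(events):
    """Group consecutive events with the same (app, window_title) into blocks.

    Assumption: a change of application or window title starts a new block.
    """
    ordered = sorted(events, key=lambda e: e.timestamp)
    blocks = []
    for (app, title), group in groupby(ordered, key=lambda e: (e.app, e.window_title)):
        group = list(group)
        blocks.append({
            "app": app,
            "window_title": title,
            "start": group[0].timestamp,
            "end": group[-1].timestamp,
            "n_events": len(group),
        })
    return blocks


# Hypothetical event stream: two events in Outlook, one in Word, back to Outlook.
events = [
    Event(0.0, "Outlook", "Inbox", "click"),
    Event(1.5, "Outlook", "Inbox", "key"),
    Event(4.0, "Word", "Report.docx", "key"),
    Event(6.0, "Outlook", "Inbox", "click"),
]
blocks = to_event_blocks(events)
```

Returning to the same window later yields a new block rather than extending the earlier one, because only consecutive events are merged.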
Data Labeling: presenting Event Blocks
• Mechanical Turk
• 9416 event blocks
• Cohen’s kappa 0.78
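For reference, Cohen's kappa compares the observed agreement between two sets of labels with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The slide does not say which pair of labelings was compared, so the sketch below uses two hypothetical label lists purely to illustrate the formula.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four event blocks (not taken from the dataset).
annotator_1 = ["Stress", "Stress", "Privacy", "Privacy"]
annotator_2 = ["Stress", "Stress", "Privacy", "Stress"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the observed agreement is 0.75 and the chance agreement is 0.5, so kappa comes out at 0.5.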
Data Labeling: result
Distribution of Labels
Labels: Einstein, Information Overload, Stress, Healthy, Privacy, Perth, Roadtrip, Napoleon, Indeterminable, No Label
Total no. of event blocks: 9416
Average no. of event blocks per participant: 377
Examples of analyses with the data
• Stress-related behavioural research
• Information-related behavioural research
  1. System-oriented
  2. User-oriented
System-oriented analysis
Total # queries: 980
Of which followed by a click on a URL: 732
Of which followed by a switch to Word/PowerPoint: 125
Of which with Ctrl+C: 15
With a dwell time of >= 30 seconds: 44
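Counts of this kind can be derived from a time-ordered event log. The following is a minimal sketch under stated assumptions: the event types (`query`, `click`, `switch_to_office`), the dictionary layout, and the rule that a click belongs to the most recent query are all hypothetical, and the sample events are made up rather than drawn from the dataset.

```python
def query_funnel(events, min_dwell=30.0):
    """Count queries, queries followed by a click, and follow-up behaviour.

    `events` is a time-ordered list of dicts with keys "t" (seconds) and
    "type". Everything between one query and the next is attributed to the
    earlier query (an assumption of this sketch).
    """
    # Split the stream into segments that each start at a query.
    segments = []
    for ev in events:
        if ev["type"] == "query":
            segments.append([ev])
        elif segments:
            segments[-1].append(ev)

    counts = {"queries": 0, "clicked": 0, "switched": 0, "long_dwell": 0}
    for seg in segments:
        counts["queries"] += 1
        click_idx = next(
            (i for i, ev in enumerate(seg) if ev["type"] == "click"), None
        )
        if click_idx is None:
            continue
        counts["clicked"] += 1
        after_click = seg[click_idx + 1:]
        if any(ev["type"] == "switch_to_office" for ev in after_click):
            counts["switched"] += 1
        # Dwell time: gap between the click and the next event in the same
        # segment; clicks with no later event in the segment are skipped.
        if after_click and after_click[0]["t"] - seg[click_idx]["t"] >= min_dwell:
            counts["long_dwell"] += 1
    return counts


# Hypothetical log: three queries, two with a click, one with a long dwell
# followed by a switch to an office application.
log = [
    {"t": 0.0, "type": "query"},
    {"t": 5.0, "type": "click"},
    {"t": 50.0, "type": "switch_to_office"},
    {"t": 60.0, "type": "query"},
    {"t": 62.0, "type": "click"},
    {"t": 70.0, "type": "query"},
]
funnel = query_funnel(log)
```

A real analysis would also need to decide how to treat clicks whose dwell time runs into the next query, which this sketch simply leaves uncounted.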
User-oriented analysis
Discussion: challenges
• Combining data from multiple sources is not trivial
• Incomplete queries logged due to Google’s query suggestions
• Clicks without change in window title (esp. Google Images)
• Noise from browser logging
Conclusions
• Dataset of information behaviour of knowledge workers
• Main contributions of the dataset:
  1. Combination of data types
  2. Natural information-seeking behaviour
  3. In-context recordings