EUBA: The Emory User Behavior Analysis System
Eugene Agichtein, Qi Guo and Ryan Kelly
Intelligent Information Access Lab, http://ir.mathcs.emory.edu
Math & Computer Science Department
Arthur Murphy, Selden Deemer, Kyle Fenton
Emory Libraries
Intelligent Information Access Lab
http://ir.mathcs.emory.edu/
Goals/Motivation
Evaluate effectiveness of search and discovery with automatic behavioral metrics
Perform aggregate and longitudinal studies
Develop tools for usability studies “in the wild”
    Scale (hundreds/thousands of “participants”)
    Realistic behavior and tasks
    On-demand playback of “interesting” sessions
Unified analysis/query framework for internal and external resource access and usage statistics
    Web-based query and statistics interface
    Access auditing, privacy, anonymity enforced
Approach: Client-side instrumentation
Implemented on top of the Emory installation of the LibX Toolbar (http://www.libx.org)
Extended LibX to track UI events:
    JavaScript patch to sample mouse movements and other events on pre-specified web search pages.
    Events are encoded into a string, buffered, and periodically sent to the server (on the internal library network).
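The encode, buffer, and periodic-send steps above can be sketched as follows. This is a minimal illustration in Python rather than the JavaScript of the actual LibX patch, and it is not the authors' implementation: the `EventBuffer` class, the `kind:t_ms:field...` wire format, the `|` separator, and the batch-size and age thresholds are all assumptions made for the example.

```python
import time
from typing import Callable, List


class EventBuffer:
    """Buffers encoded UI events and hands them to `send` in batches."""

    def __init__(self, send: Callable[[str], None], max_events: int = 50,
                 max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send              # stand-in for the upload to the log server
        self._max_events = max_events  # flush once this many events are buffered
        self._max_age_s = max_age_s    # or once this much time has passed
        self._clock = clock
        self._events: List[str] = []
        self._last_flush = clock()

    @staticmethod
    def encode(kind: str, t_ms: int, *fields: object) -> str:
        # Hypothetical wire format "kind:t_ms:field1:field2".
        # A real encoder would also have to escape ":" and "|" inside fields.
        return ":".join([kind, str(t_ms), *map(str, fields)])

    def add(self, kind: str, t_ms: int, *fields: object) -> None:
        self._events.append(self.encode(kind, t_ms, *fields))
        too_many = len(self._events) >= self._max_events
        too_old = self._clock() - self._last_flush >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        # Send the whole batch as one string, then start a new batch.
        if self._events:
            self._send("|".join(self._events))
            self._events = []
        self._last_flush = self._clock()


sent: List[str] = []  # collects batches instead of contacting a server
buf = EventBuffer(send=sent.append, max_events=3, max_age_s=3600.0)
buf.add("move", 0, 120, 340)
buf.add("move", 10, 124, 338)
buf.add("click", 25, "search_button")  # third event triggers a flush
print(sent)
```

Batching keeps the number of requests low even when mouse positions are sampled many times per second; the trade-off is that events still in the buffer are lost if the browser closes before the next flush.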
Events captured (v0.5, Aug. 2008)
Button/link clicks/URL changes
    Name of the button, link, other meta-info
Mouse movements
    (x,y) coordinates sampled ~every 10ms
Scrolling
    Start, stop position, ~every 10ms
Text entry, keypress (ctrl-c, ctrl-v)
    Query text, options changes
Menu item events
    Print, bookmark, save (all of them)
Hover over important elements
Mouse-in/out of browser
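Sampling mouse positions roughly every 10 ms amounts to dropping events that arrive too soon after the last one kept. The sketch below shows one way to do that; the `sample_every` function and the `(t_ms, x, y)` tuple layout are assumptions for illustration, not details taken from the EUBA code.

```python
from typing import Iterable, Iterator, Optional, Tuple

Sample = Tuple[int, int, int]  # (t_ms, x, y), an assumed layout


def sample_every(events: Iterable[Sample],
                 interval_ms: int = 10) -> Iterator[Sample]:
    """Keep an event only if at least interval_ms passed since the last kept one."""
    last_t: Optional[int] = None
    for t, x, y in events:
        if last_t is None or t - last_t >= interval_ms:
            last_t = t
            yield (t, x, y)


# Raw mouse events arrive at irregular times (milliseconds).
raw = [(0, 10, 10), (3, 11, 10), (9, 12, 11),
       (10, 13, 11), (14, 14, 12), (21, 15, 12)]
sampled = list(sample_every(raw))
print(sampled)  # [(0, 10, 10), (10, 13, 11), (21, 15, 12)]
```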
How it works
On login to Learning Commons, Firefox is started with http://irlib.library.emory.edu/consent.cgi?user=USERID
    If previously opted in (or out), go to homepage
    Else show consent form
Store user choice in database; if opted in, also store salted hash string for user login
    Can track opted-in user behavior over “lifetime”
    No way to recover login id by dictionary attack
    Can be removed at any time by deleting mapping
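The consent check and salted-hash mapping described above might look roughly like the sketch below. Everything named here is hypothetical: the `ConsentStore` class, the in-memory dictionaries standing in for the database, the choice of SHA-256, and the single site-wide secret salt. A fixed salt is assumed because the same login has to map to the same identifier across sessions for “lifetime” tracking to work, and the salt must stay secret for the hash to resist guessing of login ids.

```python
import hashlib
from typing import Dict, Optional

SITE_SALT = b"example-secret-salt"  # hypothetical; a real salt must be kept secret


def pseudonym(login: str, salt: bytes = SITE_SALT) -> str:
    """Salted hash of the login id, used in place of the login in the logs."""
    return hashlib.sha256(salt + login.encode("utf-8")).hexdigest()


class ConsentStore:
    """In-memory stand-in for the consent database."""

    def __init__(self) -> None:
        self.choice: Dict[str, bool] = {}   # login -> opted in?
        self.mapping: Dict[str, str] = {}   # login -> pseudonym (opted-in users only)

    def next_page(self, login: str) -> str:
        # Users who already chose (in or out) skip the consent form.
        return "homepage" if login in self.choice else "consent_form"

    def record_choice(self, login: str, opted_in: bool) -> None:
        self.choice[login] = opted_in
        if opted_in:
            self.mapping[login] = pseudonym(login)
        else:
            self.mapping.pop(login, None)

    def log_id(self, login: str) -> Optional[str]:
        # Identifier under which events are logged; None means do not log.
        return self.mapping.get(login)

    def forget(self, login: str) -> None:
        # Deleting the mapping unlinks the user from previously logged data.
        self.mapping.pop(login, None)


store = ConsentStore()
print(store.next_page("alice"))        # consent_form
store.record_choice("alice", True)
print(store.next_page("alice"))        # homepage
print(store.log_id("alice") == pseudonym("alice"))  # True
store.forget("alice")
print(store.log_id("alice"))           # None
```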
How it works (2 of 3): Consent
Request for Logging of Internet Use
To improve our web services, Emory Libraries are evaluating the use of our discovery tools (EUCLID, Databases, eJournals, Research Guides, Reserves Direct, Google Scholar, etc.).
We would like to capture the web traffic of your browser session to enable us to log and evaluate our patrons’ success in finding scholarly resources within the Learning Commons.
All data logged will be anonymous so that specific internet use will not be connected to a specific individual. (Details of Research Protocol)
Despite the data capture safeguards, you may wish to “opt out” of this log file recording process. Please select a choice:
This study is being undertaken by the University Libraries under the auspices of Emory University’s Institutional Review Board. To contact the Principal investigators of this study, please send email to: [email protected] or [email protected].
Log my Internet use during this semester
Do not log my Internet use during this semester
Continue Logon
http://irlib.library.emory.edu/
How it works (3 of 3): which URLs?
For all visited URLs LibX notifies the server; information varies by type of site:
Black list (known private sites):
    Only domain name is saved
    All “https://” and “mail.*” URLs
White list (known search/discovery sites):
    EUCLID, Primo, Google, Google Scholar, Yahoo and Live search engines, Wikipedia
    All events captured
Gray list (search results and important public sites)
    Mouse moves and clicks (no keypress/text)
The rest:
    Only URL, button clicks, and menu items
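The per-site rules above can be read as a small classification function. The sketch below is one possible reading, not the deployed logic: the host names in `WHITE_HOSTS` and `GRAY_HOSTS` are placeholders, the `Policy` names are invented for the example, and checking the black list first is an assumption about precedence.

```python
from enum import Enum
from urllib.parse import urlsplit


class Policy(Enum):
    DOMAIN_ONLY = "only domain name is saved"           # black list
    ALL_EVENTS = "all events captured"                  # white list
    MOUSE_AND_CLICKS = "mouse moves and clicks only"    # gray list
    URL_CLICKS_MENUS = "URL, button clicks, menu items" # the rest


# Placeholder host lists; the real lists are not given in the slides.
WHITE_HOSTS = {"www.google.com", "scholar.google.com", "en.wikipedia.org"}
GRAY_HOSTS = {"www.example.org"}


def classify(url: str) -> Policy:
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Black list is checked first (assumed precedence): https and mail.* hosts.
    if parts.scheme == "https" or host.startswith("mail."):
        return Policy.DOMAIN_ONLY
    if host in WHITE_HOSTS:
        return Policy.ALL_EVENTS
    if host in GRAY_HOSTS:
        return Policy.MOUSE_AND_CLICKS
    return Policy.URL_CLICKS_MENUS


def loggable_url(url: str) -> str:
    """Return what may be stored for this URL under its policy."""
    if classify(url) is Policy.DOMAIN_ONLY:
        return urlsplit(url).hostname or ""
    return url


print(classify("http://www.google.com/search?q=libx"))      # Policy.ALL_EVENTS
print(classify("http://mail.example.com/inbox"))            # Policy.DOMAIN_ONLY
print(loggable_url("https://bank.example.com/account?id=1")) # bank.example.com
```

Checking the black list before the others means a private URL is never logged in full even if its host also appears on another list.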
Emory User Behavior Analysis System
Combines client-side instrumentation, server-side caching, log management, querying, and analysis
    Client-side instrumentation, data mining/machine learning (Qi Guo)
    Log DB parsing, indexing, web-based interface for querying, playback, annotation (Ryan Kelly)
Plan: to release the system to research/library community (2009?)
EUBA Web-based analysis interface
Prototype: http://ir.mathcs.emory.edu/library/private/index.pl
user: test
password: notsafe
Future Plans
Incorporate log data for ranking, discovery, query suggestion, collaborative filtering
Richer statistics and visualization
Streamline usability studies
Comments and suggestions welcome!