Analyzing Web Logs
Sarah Waterson
18 April 2002, SIMS 213
Group for User Interface Research
Talk Outline
  What is a web log?
  Where do they come from?
  Why are they relevant?
  How can we analyze them?
  Study
  Discussion
What is a web log?
  A record of a visit to a web page
    Visitor (IP address)
    URL
    Time of visit
    Time spent on a page
    Browser used
    Referring URL
    Type of request
    Reply code
    Number of bytes in the reply
    etc…
What is a clickstream?
  A record of a path through web pages
    Visitor (IP address)
    URL
    Time of visit
    Time spent on a page
    Browser used
    Referring URL
    Type of request
    Reply code
    Number of bytes in the reply
    Next URL
    etc…
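A minimal sketch of how a clickstream could be derived from individual visit records, assuming each record carries a visitor IP, a time of visit, and a URL. The record layout, the sample data, and the name build_clickstreams are illustrative assumptions, not something taken from the talk.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical visit records: (visitor IP, time of visit, URL).
visits = [
    ("10.0.0.1", datetime(2002, 3, 29, 3, 0, 14), "/index.html"),
    ("10.0.0.2", datetime(2002, 3, 29, 3, 0, 20), "/index.html"),
    ("10.0.0.1", datetime(2002, 3, 29, 3, 0, 44), "/products.html"),
    ("10.0.0.1", datetime(2002, 3, 29, 3, 2, 14), "/contact.html"),
]


def build_clickstreams(records):
    """Group visits by visitor, order them by time, and attach the
    next URL and the seconds spent on each page."""
    by_visitor = defaultdict(list)
    for ip, when, url in records:
        by_visitor[ip].append((when, url))

    streams = {}
    for ip, hits in by_visitor.items():
        hits.sort()
        steps = []
        for i, (when, url) in enumerate(hits):
            if i + 1 < len(hits):
                next_when, next_url = hits[i + 1]
                seconds = (next_when - when).total_seconds()
            else:
                # Last page of the path: no next URL, time spent unknown.
                next_url, seconds = None, None
            steps.append({"url": url, "next_url": next_url, "seconds": seconds})
        streams[ip] = steps
    return streams


clickstreams = build_clickstreams(visits)
for step in clickstreams["10.0.0.1"]:
    print(step)
```

Time spent on the last page of a path cannot be computed this way, because no later request marks when the visitor left.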
What is a Web Log?
  Apache web log:
    205.188.209.10 - - [29/Mar/2002:03:58:06 -0800] "GET /~sophal/whole5.gif HTTP/1.0" 200 9609 "http://www.csua.berkeley.edu/~sophal/whole.html" "Mozilla/4.0 (compatible; MSIE 5.0; AOL 6.0; Windows 98; DigExt)"
    216.35.116.26 - - [29/Mar/2002:03:59:40 -0800] "GET /~alexlam/resume.html HTTP/1.0" 200 2674 "-" "Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html)"
    202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/indextop.html HTTP/1.1" 200 3510 "http://www.csua.berkeley.edu/~tahir/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] "GET /~tahir/animate.js HTTP/1.1" 200 14261 "http://www.csua.berkeley.edu/~tahir/indextop.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
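Entries like the ones above can be split into the fields listed earlier (visitor, time, request, reply code, bytes, referring URL, browser). The following is a rough sketch that assumes the Apache "combined" layout seen in these samples; the regular expression and the name parse_line are illustrative, and real logs may need a more forgiving pattern.

```python
import re

# Apache "combined" log layout: host, identity, user, [time], "request",
# status, bytes, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Apache writes "-" when no bytes were sent in the reply.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# One of the sample entries from the slide.
sample = (
    '202.155.20.142 - - [29/Mar/2002:03:00:14 -0800] '
    '"GET /~tahir/indextop.html HTTP/1.1" 200 3510 '
    '"http://www.csua.berkeley.edu/~tahir/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
)
entry = parse_line(sample)
print(entry["host"], entry["url"], entry["status"], entry["bytes"])
```

Lines that do not fit the pattern return None rather than raising, so a caller can count and inspect malformed entries separately.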
Where do they come from?
  Servers
    Done on most web servers
    Standard formats
  Clients
    Browsers, loggers on client machine
    Must send data back
  Proxies (proxy log)
    Similar to servers
    Hang out in between client and server
Why are web logs relevant?
  Lots of data
    Quantitative analysis is much more fun!
  User behavior, patterns
    Real users, tasks
    Or at least more realistic users and tasks
  Leaving the usability lab
    Testing effect
  Fast, easy, cheap
    Automatic or almost-automatic
Ed Chi asks…
  Usage:
    How has information been accessed?
    How frequently?
    What’s popular? What’s not?
    How do people enter the site? Exit?
    Where do people spend time?
    How long do they spend there?
    How do people travel within the site?
    Who are the people visiting?
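Several of these usage questions reduce to simple counts once requests are grouped into per-visitor paths. A small sketch, assuming the paths have already been extracted; the sessions list and its URLs are made up for illustration.

```python
from collections import Counter

# Hypothetical sessions: each is the ordered list of URLs one visitor requested.
sessions = [
    ["/index.html", "/products.html", "/contact.html"],
    ["/index.html", "/products.html"],
    ["/products.html", "/index.html", "/products.html"],
]

# What's popular? Count every page view across all sessions.
page_views = Counter(url for path in sessions for url in path)

# How do people enter the site? Exit? Count first and last pages.
entry_pages = Counter(path[0] for path in sessions if path)
exit_pages = Counter(path[-1] for path in sessions if path)

print("Most popular:", page_views.most_common(1))
print("Entry pages:", dict(entry_pages))
print("Exit pages:", dict(exit_pages))
```

Counts like these say what happened but not why, which is where the task-aware methods later in the talk come in.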
Ed Chi asks…
  Structural:
    What information has been added, deleted, modified, moved?
  Usage + Structural:
    What happens when the site changes? (Google)
    Does navigation change?
    Does popularity change?
    What about missing data?
How do you analyze web logs?
  1. Data Mining: task or intent unknown
    “Automated extraction of hidden predictive information from (large) databases” – Kurt Thearling
    Server log analysis
  2. Remote Usability Testing: task or intent known
    Similar to traditional lab usability testing
    Clickstream analysis
  What are people doing?
  How well does the site support what people are doing?
How? Data Mining
  Statistics and numbers galore!
  Gazillions of tools for server log analysis
    Computers > Software > Internet > Site Management > Log Analysis
  Usually charts, graphs, numbers galore
    Analog & NetTracker: typical statistics
    In 3D too (eBizinsights)
How? Data Mining cont’d
  Other interesting work:
    Web Ecologies (Chi)
      Development over time
    Information scent (Chi)
      Behavior patterns
      Understand how to organize info
  “Information scent is made of cues that people use to decide whether a path is interesting.”
  Useful for web designers?
Web Ecologies (Chi 1998)
How? Remote Usability Testing
  Analyze clickstream in the context of the task and user intentions
  Can be gathered on client, server, and via proxy
  Varied granularities of interaction
    Mouse movements to page access
  Varied levels of user awareness
    Interactive to invisible
  Varied levels of access
    Site only to entire web
How? Remote Usability Testing
  WebVip and VisVip (NIST)
    Server side logging
    Javascript instrumentation
    Individual paths within context of site
    Animation/replay sessions
  Questions:
    What part of site used for a task? Not used?
    How long to finish task? Per page?
    What sorts of behavior for task?
How? Remote Usability Testing
  ClickViz (Blue Martini)
    Server side logging
    Custom instrumentation
    Aggregate paths based on file system
    Include demographics, purchase history
    Filtering
  Questions:
    How does visitor of type X compare to type Y?
    Success vs. “failure”
How? Remote Usability Testing
  NetRaker Clickstream
  Vividence ClickStreams
    Not restricted to servers
    Testing suites
    Interesting aggregation methods
How? Remote Usability Testing
  WebQuilt (GUIR)
  Logging Design Goals:
    Extensible, Scalable
    Allow for unobtrusive, “naturalistic” user interaction
    Multi-platform, multi-device compatibility
    Fast and easy to deploy on any website
  Solution: Proxy-based logger rewrites links
    Nearly invisible to user
    Independent of client browser
    Infer actions (e.g. back button clicks)
    Stand alone or use with other tools
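To illustrate the general idea of a proxy that rewrites links so that every follow-up click passes back through it, here is a simplified sketch. It is not WebQuilt's actual implementation: the proxy address, the url query parameter, and the function rewrite_links are hypothetical, and the regular expression only handles double-quoted href attributes.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical proxy endpoint, chosen for illustration only.
PROXY_URL = "http://proxy.example.com/log"

# Simplification: only double-quoted href attributes are rewritten.
HREF_PATTERN = re.compile(r'href="([^"]*)"')


def rewrite_links(html, page_url):
    """Point every double-quoted href at the proxy, passing the original
    absolute target in a query parameter."""
    def replace(match):
        # Resolve relative links against the page they came from.
        target = urljoin(page_url, match.group(1))
        return 'href="%s?url=%s"' % (PROXY_URL, quote(target, safe=""))
    return HREF_PATTERN.sub(replace, html)


page = '<a href="dealers.html">Dealers</a> <a href="http://example.org/">Out</a>'
print(rewrite_links(page, "http://example.com/cars/index.html"))
```

Because the rewriting happens in the served HTML, nothing needs to be installed on the client, which matches the "independent of client browser" goal above.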
How? Remote Usability Testing
  WebQuilt (GUIR)
  Visual Analysis Tool:
    Put data within context of the design
    Show deviations from expected paths
    Interactive graph
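One simple way to flag a deviation from an expected path is to compare a recorded path against a designer-specified one, step by step. The sketch below assumes both are plain lists of URLs; the function first_deviation and the sample paths are hypothetical and are not how the WebQuilt tool computes its graph.

```python
def first_deviation(expected, actual):
    """Return the index of the first step where the actual path leaves the
    expected path, or None if the actual path follows it throughout."""
    for i, url in enumerate(actual):
        if i >= len(expected) or url != expected[i]:
            return i
    return None


# Hypothetical designer-specified path and two recorded user paths.
expected_path = ["/home", "/cars", "/cars/sentra", "/cars/sentra/safety"]
on_track = ["/home", "/cars", "/cars/sentra"]
off_track = ["/home", "/cars", "/dealers", "/cars", "/cars/sentra"]

print(first_deviation(expected_path, on_track))
print(first_deviation(expected_path, off_track))
```

A path that stops early but never strays is treated as following the expected path; whether that counts as success or abandonment is a separate question.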
Study: Purpose
  Exploratory comparison of lab and remote usability testing with mobile devices
  What types of usability issues can we:
    find with either method?
    find with one that we can’t find with the other?
  Design implications
    testing tools
    testing strategies
Study: The Mobile Web
  Limited and/or new interaction methods
    Small screens
    Graffiti, keypads, thumb-pads
  Beyond the desktop
    Driving, traveling, walking
    Noisy, public
  Gathering good usability data is vital to making these interfaces, and subsequently these devices, successful.
Study: Design
  10 users asked to find:
    Anti-lock brake information on the latest Nissan Sentra
    The closest Nissan dealer
  http://pda.edmunds.com
  Handspring Visor Edge with OmniSky wireless modem
  5 users in the lab
  5 users in the wild
  Web-based questionnaires
Study: Identifying Usability Issues
  Lab Data
    Tester observations
    Participant comments
    Questionnaire
  Remote Data
    Clickstream analysis
    Questionnaire
  Severity Levels
    0 indicates a comment
    1 to 5 (minor to critical)
  Four Categories
    Device
    Browser
    Site Design
    Test Design
Study: Caveats
  Analysis and observation for both tests done by same person
  Issues identified from remote tests first
    Avoids biasing remote analysis tools
  Looking for potential problem areas
Study: Results
  Totals: 18 unique issues, 7 found remotely

                 Lab   Remote
  Device           4        1
  Browser          2        0
  Test Design      6        2
  Site Design      9        5

  Site Design
    5 of the 9 issues
    3 of the 4 with severity level > 3
  1/3 device or browser related
  Test Design
    2 of the 6 issues
    2 of the 4 with severity level > 3
Study: Process Observations
  Remote usability testing can capture some usability issues that lab testing already discovers
  Lab testing gets me:
    Qualitative observations
    Thinking aloud comments
    Non-content usability issues
Study: Process Observations
  What can remote testing get us that labs can’t?
    Lab effect
      Quitting a task is easier when not in lab
      Network problems more realistic
    With more users
      Patterns emerge
      Can reduce uncertainty
    Faster
Study: Conclusions
  Remote usability testing is a promising technique for capturing realistic usage data for mobile web site design
  Main concerns
    Gathering user feedback on mobile devices is even more difficult because of limited input
    Understanding users can be ambiguous
      Potentially alleviated by ability to test larger number of users
Discussion
  Comments
  Questions
  Where does web log analysis fit into a design cycle?
    Understanding what methods to use when and where
  Experiences? These or other tools?
  [Figure: design cycle (Design, Prototype, Evaluate)]