webquilt: capturing and visualizing the web experience at
DESCRIPTION
Research I did a while back on using a web proxy to capture web interactions remotely and then visualize them. WebQuilt is a tool to support remote usability testing of web sites: a web logging and visualization system that helps web design teams run usability tests (both local and remote) and analyze the collected data. Logging is done through a proxy, which overcomes many of the problems with server-side and client-side logging. Captured usage traces can be aggregated and visualized in a zooming interface that shows the web pages people viewed. The visualization also shows the most common paths taken through the website for a given task, as well as the optimal path for that task as designated by the designer. This paper discusses the architecture of WebQuilt and describes how it can be extended for new kinds of analyses and visualizations. Authors are Jason Hong and James Landay.
TRANSCRIPT
WebQuilt: Capturing and Visualizing the Web Experience
Jason I. Hong, James A. Landay
Group for User Interface Research, EECS Department
University of California at Berkeley
World Wide Web 10
May 04 2001 2
Motivation
• Many websites have usability problems
  - 62% web shoppers gave up past month (Spool)
  - 39% failed in buying attempts (Creative Good)
• Two problems all web designers face
  - Understanding users' tasks
  - Understanding obstacles in completing tasks
• Many methods for understanding tasks
  - E.g. interviews, ethnographic observations, surveys, focus groups
• Focus here is on understanding obstacles
Understanding Obstacles Today
• Traditional usability tests
  - Pros: extremely useful qualitative information
  - Cons: lots of time, small websites, few people, local
• Server-side logging
  - Pros: easy to collect, remote testing, lots of tools
  - Cons: restricted access, little on tasks and problems
• Client-side logging
  - Pros: can track everything, remote testing
  - Cons: installation, platform-dependent, analysis tools
Streamlining Current Practices
• Fast and easy to deploy on any website
• Compatible with range of OS and browsers
• Better tools for analyzing the data
WebQuilt Approach
• Fast and easy to deploy on any website
• Compatible with range of OS and browsers
• Better tools for analyzing the data
[Diagram: Client Browser sends a Request to the Web Server, which returns a Response]
[Diagram: a Proxy is placed between the Client Browser and the Web Server and writes a WebQuilt Log]
WebQuilt Usage
• Set up several tasks, recruit 20–100 people
• Email participants a URL that uses the proxy
• Ask them to complete the predefined tasks
• Collect lots of remote (or local) data
• Aggregate, view, and interact with data
• Find problems, fix, repeat
[Diagram: iterative cycle of Design, Prototype, Evaluate]
Outline
Background and Motivation
WebQuilt Architecture
Usage Experience and Visualizations
Summary and Future Work
Overall Architecture
[Diagram: Proxy Logger → Log Files → Action Inferencer → Graph Merger → Graph Layout → Viz, with stages marked Online or Offline]
Proxy
• Lies between browser and server
  http://domain.com/webquilt?replace=http://www.yahoo.com
• One log file per user session
• Currently use Java servlets
  - Important part is log file format
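The URL form on the Proxy slide can be sketched in a few lines. This is only an illustration, not WebQuilt's servlet code: the function names `proxied_url` and `target_of` are hypothetical, and, like the slide, the sketch passes the target URL unescaped in the `replace` parameter.

```python
# Placeholder proxy address taken from the slide.
PROXY_BASE = "http://domain.com/webquilt"


def proxied_url(target: str, proxy_base: str = PROXY_BASE) -> str:
    """Wrap a target URL so the request goes through the proxy."""
    return f"{proxy_base}?replace={target}"


def target_of(url: str) -> str:
    """Recover the target URL from a proxied URL of the form shown above."""
    _, _, target = url.partition("?replace=")
    return target


print(proxied_url("http://www.yahoo.com"))
```

Participants only ever see the wrapped URL, which is why nothing needs to be installed on the client or the server.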
Log File Format
Time (ms)  From TID  To TID  Parent ID  HTTP Response  Frame ID  Link ID  HTTP Method  URL + Query
6062       0         1       -1         200            -1        -1       GET          http://www.google.com
11191      1         2       -1         200            -1        -1       GET          http://www.phish.com/index.htmq=Phish&btnI=I%27m+Feeling+Lucky
167525     2         3       -1         200            -1        1        GET          http://www.phish.com/bios.html
31043      3         4       -1         200            -1        2        GET          https://www.phish.com/bin/catalog.cgi
68772      2         5       -1         200            -1        15       GET          http://www.emusic.com/features/phish
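To make the nine columns concrete, here is a minimal parsing sketch. It assumes the fields are whitespace-separated in the order shown on the slide and that the URL is the last field; the `Transaction` class and `parse_line` function are hypothetical names, not part of WebQuilt.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    time_ms: int
    from_tid: int
    to_tid: int
    parent_id: int
    http_response: int
    frame_id: int
    link_id: int
    http_method: str
    url: str


def parse_line(line: str) -> Transaction:
    """Parse one log line; the URL takes whatever follows the eighth field."""
    parts = line.split(maxsplit=8)
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}: {line!r}")
    *numbers, method, url = parts
    time_ms, src, dst, parent, status, frame, link = (int(n) for n in numbers)
    return Transaction(time_ms, src, dst, parent, status, frame, link, method, url)


row = parse_line("167525 2 3 -1 200 -1 1 GET http://www.phish.com/bios.html")
print(row.from_tid, row.to_tid, row.url)
```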
The Proxy at Runtime
[Diagram: Client Browser, WebQuilt Proxy (WebProxy Servlet, Proxy Editor, HTTPClient Package), and Web Server, plus a store holding Cached Pages and WebQuilt Logs; arrows numbered 1 to 5 mark the steps that follow]
1. Process Client Request
2. Retrieve Requested Document
3. Process and return the page
Start with: <A HREF="computers.html">
End up with: <A HREF="http://tasmania.cs.berkeley.edu/webquilt?replace=http://www.yahoo.com/computers.html&tid=1&linkid=12">
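The link rewriting shown in the "Start with / End up with" example can be sketched as follows. This is a rough illustration under stated assumptions, not the Proxy Editor itself: it only handles double-quoted `href` attributes on anchor tags, the function name `rewrite_links` is hypothetical, and numbering links in document order starting at 1 is a guess at how `linkid` is assigned.

```python
import re
from urllib.parse import urljoin

# Proxy address as it appears on the slide.
PROXY = "http://tasmania.cs.berkeley.edu/webquilt"

# Matches an anchor tag up to and including a double-quoted href value.
HREF = re.compile(r'(<a\s[^>]*?href=")([^"]*)(")', re.IGNORECASE)


def rewrite_links(html: str, page_url: str, tid: int) -> str:
    """Point every link at the proxy, tagging it with the transaction and link IDs."""
    counter = 0

    def repl(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        # Resolve relative links against the page they appeared on.
        absolute = urljoin(page_url, match.group(2))
        new_href = f"{PROXY}?replace={absolute}&tid={tid}&linkid={counter}"
        return f"{match.group(1)}{new_href}{match.group(3)}"

    return HREF.sub(repl, html)


print(rewrite_links('<A HREF="computers.html">', "http://www.yahoo.com/", tid=1))
```

Because every link on the returned page points back at the proxy, the next click is captured without any change to the browser or the web server.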
4. Store the page
5. Log the transaction
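The five runtime steps can be strung together in a small sketch. Everything here is an assumption made for illustration: the function name `handle_request`, the injected `fetch` and `rewrite` callables, the in-memory `cache` and `log`, and the choice to cache the processed page. The actual proxy is a Java servlet whose internals the slides do not show.

```python
from typing import Callable, Dict, List, Tuple


def handle_request(
    proxied_url: str,
    fetch: Callable[[str], Tuple[int, str]],
    rewrite: Callable[[str], str],
    cache: Dict[str, str],
    log: List[Tuple[int, str, str]],
) -> str:
    # 1. Process client request: pull the real target out of the proxy URL.
    _, _, target = proxied_url.partition("?replace=")
    # 2. Retrieve requested document from the real web server.
    status, body = fetch(target)
    # 3. Process the page (for example, rewrite its links) before returning it.
    page = rewrite(body)
    # 4. Store the page (assumption: the processed copy is what gets cached).
    cache[target] = page
    # 5. Log the transaction (only three fields here; the real log has nine).
    log.append((status, "GET", target))
    return page


cache: Dict[str, str] = {}
log: List[Tuple[int, str, str]] = []
page = handle_request(
    "http://domain.com/webquilt?replace=http://www.yahoo.com",
    fetch=lambda url: (200, "<html>hi</html>"),  # stand-in for a real HTTP fetch
    rewrite=str.upper,                           # stand-in for link rewriting
    cache=cache,
    log=log,
)
print(page, log)
```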
Additional Proxy Functionality
• Handling Cookies
  - Cookies are only sent from the browser back to the web server that set them
User ID  Domain      Cookie
AAA      yahoo.com   xyzzy
AAA      google.com  asdfg
BBB      yahoo.com   abcde
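The cookie table suggests that the proxy keeps cookies itself, keyed by user and domain, since the browser only ever talks to the proxy's domain. A minimal sketch of such a lookup table follows; the class name `ProxyCookieStore` and its methods are hypothetical, and a real implementation would also need to handle cookie paths and expiry.

```python
from typing import Dict, Optional, Tuple


class ProxyCookieStore:
    """Remembers one cookie string per (user ID, domain) pair."""

    def __init__(self) -> None:
        self._cookies: Dict[Tuple[str, str], str] = {}

    def remember(self, user_id: str, domain: str, cookie: str) -> None:
        # Called when a web server sets a cookie on a response.
        self._cookies[(user_id, domain)] = cookie

    def cookie_for(self, user_id: str, domain: str) -> Optional[str]:
        # Called before forwarding a request, so the server sees its own cookie.
        return self._cookies.get((user_id, domain))


store = ProxyCookieStore()
store.remember("AAA", "yahoo.com", "xyzzy")
store.remember("AAA", "google.com", "asdfg")
store.remember("BBB", "yahoo.com", "abcde")
print(store.cookie_for("AAA", "google.com"))
```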
• Handling Secure Socket Layer
  - Encrypts page requests and data
  - E.g. shopping carts, financials
  - Split into two SSL requests
[Diagram: Client Browser ↔ Proxy over SSL, and Proxy ↔ Web Server over SSL]
Action Inferencer
• Takes a single log file and converts it into a list of actions
  - "Clicked on link" or "Hit back button"
• Inference because it still must guess
  - Back and forward actions are local to the browser, so the proxy does not see them
Re-Assembling User Actions
From TID  To TID  URL + Query
0         1       http://www.google.com
1         2       http://www.phish.com/index.htmq=Phish&btnI=I%27m+Feeling+Lucky
2         3       http://www.phish.com/bios.html
3         4       https://www.phish.com/bin/catalog.cgi
2         5       http://www.emusic.com/features/phish
[Diagram: the transactions above are replayed one at a time to build a graph: Start → 1 → 2 → 3 → 4, with a second branch 2 → 5]
Action Inferencer
Two action sequences are consistent with the graph Start → 1 → 2 → 3 → 4 plus the branch 2 → 5:
• Case 1: Start → 1 → 2 → 3 → 4 → 3 → 2 → 5 (link, back, link)
• Case 2: Start → 1 → 2 → 3 → 4 → 3 → 2 → 1 → 2 → 5 (link, back, forward, link)
• Case 1 by default (shortest path)
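The default choice (Case 1) can be reproduced with a simple back-stack sketch. This is not necessarily how WebQuilt's Action Inferencer is implemented: the function name `infer_actions` is hypothetical, forward actions are never inferred, and the sketch simply inserts the fewest back steps needed to reach the page a transaction started from.

```python
from typing import List, Tuple

Action = Tuple[str, int]  # (kind, TID of the page reached)


def infer_actions(transactions: List[Tuple[int, int]]) -> List[Action]:
    """Turn (from TID, to TID) pairs into link and back actions."""
    history = [0]  # TID 0 stands for the start of the session
    actions: List[Action] = []
    for src, dst in transactions:
        if src not in history:
            raise ValueError(f"page {src} is not reachable by going back")
        # The proxy never saw these steps, so infer them: go back until
        # the page the click came from is on top of the history.
        while history[-1] != src:
            history.pop()
            actions.append(("back", history[-1]))
        history.append(dst)
        actions.append(("link", dst))
    return actions


# From TID / To TID pairs taken from the example log.
print(infer_actions([(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]))
```

For the example log this yields four link actions, two back actions (to pages 3 and 2), and a final link to page 5, which is the Case 1 sequence.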
Merger
• Combines multiple log files into a single directed graph
  - Web pages are nodes
  - Actions are edges
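A merge of several sessions into one directed graph might look like the sketch below. It assumes each session has already been reduced to an ordered list of pages visited; the function name `merge_sessions` and the page names are made up for the example, and counting traversals per edge is an assumption about what later stages need.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def merge_sessions(sessions: Iterable[List[str]]) -> Tuple[Set[str], Counter]:
    """Merge page sequences into nodes plus edges weighted by traversal count."""
    nodes: Set[str] = set()
    edges: Counter = Counter()
    for path in sessions:
        nodes.update(path)
        # Each consecutive pair of pages is one action, hence one edge.
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += 1
    return nodes, edges


nodes, edges = merge_sessions([
    ["home", "catalog", "checkout"],
    ["home", "catalog", "help"],
])
print(sorted(nodes), dict(edges))
```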
Graph Layout
• Assign (x,y) to all nodes
• Force-directed placement
  - Keep connected nodes close
  - Push unconnected nodes far apart
• Edge-weighted depth-first
  - Most traffic along top
  - Less followed paths below
  - Grid to help organize and align
• Plug-in new algorithms here
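The edge-weighted depth-first idea can be illustrated with a small grid layout. This is a guess at the general technique, not WebQuilt's layout code: the function name `layout`, the use of (column, row) grid cells as coordinates, and the page names are all assumptions. The heaviest outgoing edge is followed first and stays on the current row, and each lighter branch starts a new row below.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def layout(edges: Dict[Edge, int], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign (column, row) grid cells by a depth-first walk, heaviest edge first."""
    children: Dict[str, List[Tuple[int, str]]] = {}
    for (src, dst), weight in edges.items():
        children.setdefault(src, []).append((weight, dst))

    positions: Dict[str, Tuple[int, int]] = {}
    next_row = 0

    def visit(node: str, col: int) -> None:
        nonlocal next_row
        positions[node] = (col, next_row)
        first = True
        # Follow the most-travelled edge first so it stays on the same row.
        for _, child in sorted(children.get(node, []), key=lambda wc: -wc[0]):
            if child in positions:
                continue  # already placed (shared page or cycle)
            if not first:
                next_row += 1  # lighter branches drop to a new row below
            first = False
            visit(child, col + 1)

    visit(root, 0)
    return positions


print(layout({("A", "B"): 5, ("A", "C"): 1, ("B", "D"): 3, ("B", "E"): 2}, "A"))
```

In the example, A, B, and D share the top row because they lie on the most-travelled path, while E and C are pushed to rows below.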
Visualization
Future Work
• More sophisticated logging
  - Lower level events (e.g. AT&T WET)
  - Personalized web pages
• More sophisticated visualizations
  - More use of semantic zooming
  - Dynamic filtering
• Continue getting feedback from designers
  - Initiated interviews with web designers
  - Still need to do evaluations
Take Home Ideas
• Need more tools for improving web site usability
• Proxy logging
  - Logging where task is already known
  - Any website, any browser, remote testing
• Visualizing logged data
  - Aggregates large data sets
  - Interact with in a zooming interface
• Pluggable architecture
Acknowledgements
• Special thanks to Jeff Heer, Tim Sohn, and Sarah Waterson
Group for User Interface Research, EECS Department
University of California at Berkeley
Download WebQuilt at: http://guir.berkeley.edu/webquilt
Extra Slides
Berkeley Website A
Casa de Fruta A
Casa de Fruta B
In Case You're Feeling Evil…
• URLs can be of the form: http://userid@domain/page.html
• Most web servers ignore the userid part, but…
  http://[email protected]…/…
• Can auto-track people's actions once they hit your page without their knowledge or consent