the role of information scent - act-ract-r.psy.cmu.edu/wordpress/wp-content/themes/act-r/... ·...
TRANSCRIPT
The Role of Information Scent in On-line Browsing:Extensions of the ACT-R Utility and Concept Formation Mechanisms
Peter PirolliUser Interface Research Area
Supported in part byOffice of Naval Research (ONR)Advanced Research and Development Agency (ARDA)
Information ScentBrowsing often requires that we use visual or textual cues to direct our actions.
These local (proximal) cues are called information scent.
The theory of information scent is a psychological theory
Used to develop new user interface designs and new Web site evaluation tools
(a) (b)
(c) (d)
Overview
� Why information scent is important� SNIF-ACT Model of Navigation
– Where to go next (navigational choices)?– When to stop (leaving a Web site)?
� InfoCLASS Model of topic category formation– What categories of topics are where (learning
about the information environment)
Example: Looking for Pirolli’sPersonal Page on the PARC Site
Company
ResearchDocument Content
SensemakingUserInterface
People
Pirolli
Hierarchical Navigation SpacesStrong Information Scent= Navigation Cost
Linear in Depth
Weak Information Scent= Navigation Cost
Exponential in Depth
Navigation Costs as Function of Information Scent
Notes: Average branching factor = 10Depth = 10
0 2 4 6 8 100
50
100
150
Depth
Num
ber
of p
ages
vis
ited
.100
.125
.150
0 2 4 6 8 100
50
100
150
.100
.150
Probability of choosing wrong link Probabilities within
range of empirical values in Woodruff et al. (2002)
Phase Transition in Navigation Costs as Function of Information Scent
0 0.05 0.1 0.15 0.20
20
40
60
80
100
0 2 4 6 8 100
50
100
150
Depth
Num
ber
of p
ages
visi
ted
.100
.125
.150
0 2 4 6 8 100
50
100
150
Depth
Num
ber
of p
ages
visi
ted
.100
.125
.150
Probability of Choosing Wrong Link
Num
ber
of P
ages
Vis
ited
per
Lev
el
Notes: Average branching factor = 10Depth = 10
Linear Exponential
Probability of choosing wrong link
UserSystem
Interface Content
RepresentativeTask Sample
Instrumentation
Database
DatabaseUser Trace
Data
UserSimulation
Model
User Tracing
WWW Studies
Survey of user tasks (N = 2188). Develop ecologically valid tasks for study in experiments.
Develop instrumentation to record user trace data.
Experiments
Analyze data. Develop cognitive model. Validate against user trace data.
WWW Experiment & Model
� 12 Stanford University students� 6 tasks� 2 tasks analyzed and modeled for 4 participants� Example tasks
– Find specific movie posters for your new living room– Find dates for a performance by a comedy troupe
Pirolli, P. & Fu, W. (2003). SNIF-ACT: A model of information foraging on the World Wide Web. Proceedings of the Conference on User Modeling.
Web Behavior Graphs (WBGs)
� Links providing better information scent yield more direct navigation
� People abandon Web sites when information scent diminishes
S1 S2 S3 S4 S6
S7 S9
S10
S13
S16 S17 S18 S19
S21
S22
S23
S24
S25 S28 S29 S30 S31 S32
S33 S34 S36
S37 X
S5
S5
S5
S5
S5
S8
S8
S12
S12
S11
S11
S15
S15
S20
S20
S20
S11
S11
S11
S14
S14
S14
S26 S27
S27
S26 S35
S35
S1 S2 S3 S4 S5 S6 S8
S9
S7
S7
Find a posterfor the movie “Antz”
Find the tour schedule of the“Second City Comedy Troupe”
Card, S., Pirolli, P., Van Der Wege, M., Morrison, J., Reeder, R., Schraedley, P., & Boshart, J. (2001). CHI Proceedings
SNIF-ACT
� Scent-based Navigation and Information Foraging in the ACT theory
� Declarative knowledge– User goal (e.g. Find the poster for “Antz”)– Perceived aspects of Web page & browser– Large spreading activation network representing word
associations
� Procedural knowledge– Productions representing basic Web browsing actions
� Utility: Information Scent– Mutual relevance between link text and user goal
Cognitive Model of Information Scent
cell
patient
dose
beam
new
medical
treatments
procedures
InformationGoal Link Text
� Spreading activation– Activation reflects likelihood of
relevance given past history and current context
Spreading activation
Bi = ln( ψ)Pr(i)Pr(not i)
Sji = ln( ψ)Pr(j|i)Pr(j|not i)
i“bread”
j“butter”
Ai = Bi + ΣWjSji
Activation of node i
Base-levelactivation
Activation spreadfrom linked nodes j
Base-level reflects log odds of occurrence
Strength of link spread reflects log likelihoododds of cooccurrance
spreading activation networks
Documentcorpus
Wordstatistics
Spreadingactivationnetwork
~ 200 X 200 million sparse “word” matrix
~ 55 million associations
Information Scent:Random Utility Model based on Activation
GJGJGJ VU ||| ε+=
∑∈
=Gi
iGJ AV |
∑∈
=
CK
V
V
QK
QJ
eeCGJ
|
|
),|Pr( µ
µ
Utility of alternative J, out of set C, in context of goal G:
Deterministic
Stochastic
Deterministic utility is based on activation:
Probability of choosing alternative J:
Summed activation fromscent cues to goal
Specific alternative J
Set of Alternatives (C)
GoalG
Evaluation:Link-following actions
Link XLink WLink V
…
Info ScentSNIF-ACT
User ActionsPage A choose “link X”Page B choose “link Y”Page C choose “link Z”
…
Link KLink YLink M
…
Info Scent
Page A
Page B
Link ZLink OLink B
…
Info ScentPage C
Rank=1
Rank=1
Rank=2
Observed vs Predicted Link Choice
Rank of link by SNIF-ACT predicted utility
Observed frequency thatlink is chosen by user
Theory: Decision to Leave Web Site
Expected Utility(via Information Scent)
Foraging Time
Expected utility of starting over at a new relevant site
Expected utility of continuing at site
Decision to stop& go elsewhere
Data: Sequences of Moves Just Prior to Leaving a Site
Match of SNIF-ACT to User Data
SNIF-ACTPredictedutilityof nextlink
User Move Before Leaving Site
SNIF-ACTAverageutility ofnew site
Cancer
Skin
Forming Concepts about Information Sources
Treatments Statistics
Standard Web Site Search Engine
Forming Concepts about Information Sources
Scatter/GatherDocument Clustering Browser
Browser Study
Pirolli, P., Hearst, M., Schank, P., & Diehl, C. (1996). Scatter-Gather browsing communicates the topic structure of a very large text collection. Proceedings of the ‘CHI ’96 Conference.
� Browsers– Scatter-Gather (N = 8)– Standard Similarity Search Engine (N = 8)
� 12 Tasks (TREC)– E.g., Find documents on new medical procedures
for cancer
Inferred Topic Structure
Browse documents
Typical Topic-Structure Tree Diagram
Coherence of Mental Topics Among Users
SGSS
Person
CPerson A
Person B
Hierarchical clustering of users based on similarity of topic trees
InfoCLASS
� Information Category Learning by Adaptation to Scent Stimuli– Rational Analysis Model of Human Category
Learning (Anderson, 1991, 1991)� Assumes
– Topics are mental categories– Mental categories are probabilistic collections of
concepts– Links, thumbnails, citations provide information
scent cues that evoke mental concepts
Evoking Categories and Inferences
cell
patient
dose
beam
Perceive cues
Health
Strength between observed UI features (words) and category reflects log odds
ActivateCategory
medical
treatments
cancer
Strength between category and inferred features reflects log odds
Infer
Cues correspond to existing mental topics: Choose most activated (highest odds)
cell
patient
dose
beam
Health
Patents
medical
treatments
cancer
Otherwise create new category
cell
patient
dose
beam
New
InfoCLASS Simulation
� Scatter/Gather (SG): expose to cluster summaries from user logs
� Standard Search (SS) Expose to search result lists from user logs
� Evaluate ADA (category coherence) for the two groups of users
SG Logs SS Logs
Parse
Stoplist
Parse
Stoplist
Items Items
Categorization Categorization
Compare
Simulation Performance
141241486507816151963370745152565420628120133177052810114815794331421926526339149767470228594643301
SSSGSSSG
496 1123
Items Categories
125 29
Scatter-Gather (SG) simulations develop more categories than Standard Search (SS).
Same trend as Pirolli et al (1996) comparison of SG vsSS users.
Category structure and coherence
1.292SS
.949SG
ADA x 10-6 bits
0 10 20 30 40 50 600
20
40
60
80
100
120
Rank S ize of Category
Num
ber o
f Cat
egor
y M
embe
rs
0 5 10 15 20 25 300
50
100
150
200
250
300
350
400
450
Rank S ize of Category
Num
ber o
f cat
egor
y M
embe
rs
Scatter/Gather User 1 Similarity Search User 1
Scatter-Gather simulations exhibit greater category coherence (less divergence).
Summary� Importance
– Quality of information scent can effect qualitative changes in navigation costs (linear to exponential)
– Browsers can differ in the how they communicate the topic structure of a collection of information
� SNIF-ACT model of navigation– Navigation & when to stop
� InfoClass– Model of formation of mental categories of topic structure
� Applications– New UIs (ScentTrails, Relevance Enhanced Thumbnails, …)– Web Site Evaluation tools (Bloodhound, Lumberjack)
BACKUP SLIDES
Eye Movements:High Information Scent(Pirolli, Card, & Van Der Wege, 2003, TOCHI)
Eye Movements:Low Information Scent(Pirolli, Card, & Van Der Wege, 2003, TOCHI)
Information Scent and the Cost of Navigation(based on Hogg & Huberman, 1987)
D = depth of search hierarchyz = average branching factor
(1- q) = prob. of false alarm ( Pr[FA] )µµµµ(q, z) = qz =average no. branches explored
A(U, q, z) = average no. nodes explored within distance U= (1- µµµµ(q, z)U +1)/(1-µµµµ(q, z))
N(D, z, q) = average no. nodes examined before desired goal found
= (z - 1)q2[ ]Σ A(s - 1, q, z)
s=1
D-1
Prototype Formation
� If there are no existing categories, then create a new category (kNew) and assign instance P to the category, otherwise
� Determine the prob that the instance comes from a new category, Pr(kNew|P), and compare that to the existing category with the highest probability of including the instance, Pr(kMax|P)– Assign P to kNew if Pr(kNew|P) > Pr(kMax|P), else– Assign P to kMax
Coupling Parameter (c)
cnccnk k
+−=
)()Pr(
1
cncckNew +−
−=)(
)Pr(1
1
Number of items in category
Number of items experienced
Categorization Model
Item = [ ]00102
w1 w2 wN….Series of multinomial trials
∏ −=i
innip
Zppp 1
212121
1 α
αααααα
),,(),,|,,Pr(
�
��
Category = [ ]11415 .....
w1 w2 wNVector of probabilities
Dirichlet
[ ]
[ ]∑
∑
∑
++=
=
=
=
=
jj
iii
i
ii
ii
ii
nnp
p
p
0
0
0
1
1
αα
α
αα
αα
E
EExpected value
Noninformative prior
Posterior distribution
Average Divergence from Average Entropy
∑
=x wq
xpxpqpD)()(log)()||(
[ ]54321 ppppp [ ]54321 ppppp [ ]54321 ppppp
[ ]54321 ppppp
K
DADA
K
kk∑
== 1
)||( pp
p=
pk
Category structure and coherence
1.292SS
.949SG
ADA x 10-6 bits
0 10 20 30 40 50 600
20
40
60
80
100
120
Rank S ize of Category
Num
ber o
f Cat
egor
y M
embe
rs
0 5 10 15 20 25 300
50
100
150
200
250
300
350
400
450
Rank S ize of Category
Num
ber o
f cat
egor
y M
embe
rs
Scatter/Gather User 1 Similarity Search User 1