
Ed H. Chi IMA Digital Library Workshop 2001-02-23

Page 1: Ed H. Chi

www.geekbiker.com

U of Minnesota
Ph.D.: Visualization Spreadsheets
M.S.: Computational Biology

Expertise: InfoVis, Study of the Web, TaeKwonDo, Poetry, Motorcycling, Pottery

Page 2: Information Scent

Modeling User Browsing Strategies on the Web

Ed H. Chi, Peter Pirolli

User Interface Research Group

This research was supported in part by Office of Naval Research contract number 'N00014-96-C-007'.

Page 3: Comparison to Library

• Experience tells us:
• general layout of content
  – which floor, which section.
• which books are of greatest interest
  – by the wear on the spines.
• which information is timely or deadwood
  – by looking at the circulation check-out stamps inside the book covers.

Page 4: Trends and Problems

• 200M Web users, 6M web sites
• Web design process ad-hoc, not optimal
• Some tools extract behaviors and correlations but not intentionally

• Being successful requires making the Web more useful and usable to a broader audience

Page 5: Information Foraging

[Figure: plot with axes labeled "Amount of Accessible Knowledge" and "Cost [Time]"]

Page 6: Underlying Concept

• Users' information seeking is similar to hunter-gatherers' optimization strategies.

Page 7: Underlying Concept

• Information Scent is the user's perception of the cost and value of information.
  – Similar to hunters following animal footprints.

Page 8:

[Figure: surfers flowing through a Web site from Enter to Exit, with pages labeled p1, p2, p3]

(a) Users enter a web site at various pages and begin surfing

(b) Continuing surfers distribute themselves down various paths

(c) Surfers arrive at pages having traveled different paths

(d) After some number of page visits surfers leave the web site

Page 9: Information Scent

Users forage by surfing along links

Foragers use proximal cues (text snippets or graphics) to access distal content (destination page)

Scent is the proximal perception of value and cost of distal content

[Figure labels: content, link, snippet]

Page 10: Assumptions

Users have information goals; their surfing patterns are guided by information scent

Two questions
– Given an information goal and a starting point: Where do users go? (Behavior)
– Given some surfing pattern: What is the user’s goal? (Need)

Page 11: WUFIS: Web User Flow by Information Scent

[Diagram labels: User information goal; Web site; Web page content, links; Web user flow simulation; Predicted paths]

Page 12: How does it work?

Start users at page with some goal

Flow users through the network

Examine user patterns
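The three steps on this slide can be sketched as a simple flow simulation. This is a minimal sketch under stated assumptions, not the WUFIS implementation: the 3-page site, the proportion matrix `S` (indexed `S[to][from]`), the function name `flow_users`, and the `alpha` dropout factor are all hypothetical.

```python
def flow_users(S, start, steps, alpha=1.0):
    """Push a user distribution through S[to][from] for a number of clicks.

    Returns the cumulative expected number of visits per page.
    alpha < 1.0 would model users leaving the site at each click (assumption).
    """
    n = len(S)
    visits = list(start)   # users counted at their starting page
    current = list(start)  # users currently sitting on each page
    for _ in range(steps):
        current = [
            alpha * sum(S[dst][src] * current[src] for src in range(n))
            for dst in range(n)
        ]
        visits = [v + c for v, c in zip(visits, current)]
    return visits


# Hypothetical site: page 0 links to pages 1 and 2, page 1 links to page 2.
S = [
    [0.0, 0.0, 0.0],
    [0.25, 0.0, 0.0],
    [0.75, 1.0, 0.0],
]

# Step 1: start 100 users at page 0. Step 2: flow them for two clicks.
visits = flow_users(S, start=[100.0, 0.0, 0.0], steps=2)

# Step 3: examine the pattern, here by ranking pages by expected visits.
ranking = sorted(range(len(visits)), key=lambda page: -visits[page])
print(visits, ranking)
```

Ranking pages by accumulated visits is only one way to "examine user patterns"; the choice here is for illustration.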

Page 13: WUFIS Algorithm

Step 1: Relevant Documents

$$R = W^{T} Q$$

Weight Matrix $W$ (rows = words, columns = documents) and Query $Q$ (one entry per word):

$$
W = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 1 & 0 & 1 & 1 & 1 & 0
\end{bmatrix},
\qquad
Q = \begin{bmatrix} 0\\0\\0\\0\\0\\0\\1\\1 \end{bmatrix}
$$

Page 14: WUFIS Algorithm (cont.)

R = Relevant documents

T = Topology matrix

Step 2: Scent Matrix $S$, derived from $T$ and $R$ (axes labeled "from" and "to"):

$$
S = \begin{bmatrix}
0 & 0.269 & 0.269 & 0 & 0 & 0 & 0\\
1 & 0 & 0.731 & 0 & 0 & 0.212 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0.576 & 0\\
0 & 0 & 0 & 0 & 0 & 0.212 & 0\\
0 & 0.731 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
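A minimal sketch of how the numbers on these two slides could arise. It assumes R = W^T Q, and it assumes that each "from" column of the scent matrix is a softmax over the relevance of the pages that column links to. The softmax rule is not stated on the slides; it is inferred from the displayed values (0.269 and 0.731 equal 1/(1+e) and e/(1+e)). The topology matrix is the one shown on the IUNIS slide, read as T[to][from], which is also an assumption. Function names are hypothetical.

```python
import math

# Word x document weights W (8 words, 7 documents), as shown on the slide.
W = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1, 0],
]
Q = [0, 0, 0, 0, 0, 0, 1, 1]  # query: one entry per word

# Topology matrix, read as T[to][from] (orientation is an assumption).
T = [
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]


def relevant_documents(W, Q):
    """R = W^T Q: one relevance score per document."""
    n_docs = len(W[0])
    return [sum(W[w][d] * Q[w] for w in range(len(W))) for d in range(n_docs)]


def scent_matrix(T, R):
    """Assumed rule: softmax of R over the pages each 'from' page links to."""
    n = len(T)
    S = [[0.0] * n for _ in range(n)]
    for src in range(n):
        total = sum(T[dst][src] * math.exp(R[dst]) for dst in range(n))
        if total == 0:
            continue  # a page with no outgoing links keeps an all-zero column
        for dst in range(n):
            S[dst][src] = T[dst][src] * math.exp(R[dst]) / total
    return S


R = relevant_documents(W, Q)
S = scent_matrix(T, R)
for row in S:
    print(" ".join(f"{value:.3f}" for value in row))
```

Under these assumptions every column with outgoing links sums to 1, so a column can be read as the proportion of users leaving that page along each link.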

Page 15: Prelim. Evaluation of WUFIS

Show that WUFIS generates good URL destinations based on information need.

19 Websites
Size: 27-12,000 pages
Info Provider, eCommerce, Large Corp.
Info Need from very general (product info) to very specific (migraine headaches)

The top ten URL positions from the simulation are extracted.

Each URL is blindly rated for relevancy.

Page 16: WUFIS Evaluation

570 ratings are collected = 3 variations of the algorithm x 10 URLs x 19 sites

Tabulated, Averaged.
Result = 7.54 (out of 10)

[Table: 19 Websites; Website Info, Algorithm Performance]

Page 17: IUNIS: Inferring User Need by Info Scent

[Diagram labels: User information goal; Web site; Web page content, links; Web user flow simulation; observed paths]

Page 18: Extracting Paths

Longest Repeating Sequence (LRS)
– New path mining technique
– Extracts significant surfing paths
– Reduces the complexity of path model
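The slide does not define LRS, so the following is a hedged sketch of one common reading: a contiguous page sequence is "repeating" if it occurs at least `min_count` times across sessions, and it is "longest" if at least one of its occurrences cannot be extended by one page on either side and still be repeating. The function name, thresholds, and sample sessions are hypothetical.

```python
from collections import Counter


def longest_repeating_sequences(sessions, min_count=2, min_len=2):
    """Return repeating contiguous subsequences that are maximal somewhere."""
    # Count every occurrence of every contiguous subsequence.
    counts = Counter()
    for s in sessions:
        for i in range(len(s)):
            for j in range(i + min_len, len(s) + 1):
                counts[tuple(s[i:j])] += 1

    def repeating(seq):
        return counts[seq] >= min_count

    result = set()
    for s in sessions:
        for i in range(len(s)):
            for j in range(i + min_len, len(s) + 1):
                seq = tuple(s[i:j])
                if not repeating(seq):
                    continue
                # Can this particular occurrence grow and stay repeating?
                grows_left = i > 0 and repeating(tuple(s[i - 1:j]))
                grows_right = j < len(s) and repeating(tuple(s[i:j + 1]))
                if not (grows_left or grows_right):
                    result.add(seq)
    return sorted(result)


# Hypothetical sessions: A-B-C repeats, and B-C also occurs on its own.
sessions = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "E"],
    ["X", "B", "C"],
]
paths = longest_repeating_sequences(sessions)
print(paths)
```

Here A-B is dropped because every occurrence extends to the repeating A-B-C, while B-C survives because it also appears after X, where no longer repeating sequence contains it. This is how the technique can reduce the number of paths kept in the model.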

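Page 19: IUNIS

P = observed user path

T = topology matrix

W = word x document weights

K = relevant keywords

Step 1: Topology Path, $T P$ (axes labeled "from" and "to"):

$$
T = \begin{bmatrix}
0 & 1 & 1 & 0 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
\qquad
P = \begin{bmatrix} 0\\0\\0\\3\\0\\2\\1 \end{bmatrix},
\qquad
T P = \begin{bmatrix} 0\\2\\0\\2\\2\\1\\0 \end{bmatrix}
$$

Step 2: Weight Path, $K = W P$, with $P$ now the topology path from step 1 (rows of $W$ = words, columns = documents):

$$
W = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 1 & 0 & 1 & 1 & 1 & 0
\end{bmatrix},
\qquad
P = \begin{bmatrix} 0\\2\\0\\2\\2\\1\\0 \end{bmatrix}
$$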
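A minimal sketch of the two matrix products suggested by this slide, using the matrices and vectors it shows. Which vector is the observed path P is an assumption: it is inferred from the fact that T times (0, 0, 0, 3, 0, 2, 1) gives the other displayed vector, (0, 2, 0, 2, 2, 1, 0). The keyword vector K is not shown on the slide; the value below is simply what the product yields under that reading. Names such as `matvec` and `top_words` are hypothetical.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Topology matrix from the slide, read as T[to][from] (assumption).
T = [
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

# Word x document weights from the slide (8 words, 7 documents).
W = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 1, 0],
]

P = [0, 0, 0, 3, 0, 2, 1]        # observed user path (assumed identification)

topology_path = matvec(T, P)     # step 1: T P
K = matvec(W, topology_path)     # step 2: word weights along that path

# Words with the largest weights would serve as the keyword summary.
top_words = sorted(range(len(K)), key=lambda w: -K[w])[:3]
print(topology_path, K, top_words)
```

Indices in `top_words` are 0-based row numbers of W; mapping them back to actual words would need the site's vocabulary, which the slide does not give.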

Page 20: Evaluation of IUNIS

Goal:
Show that keyword summaries produced by IUNIS are good at communicating the content of the user paths.

Dataset:
8 participants
10 random paths from (5/18/1998, xerox.com, path length=6)
booklets of pages on paths (in order)

Page 21: Evaluation of IUNIS

Procedure:
Single rating sheet with the ten 20-word summaries. Beside each summary, users are asked to rate the summaries on a 5-point Likert Scale. A copy of this rating sheet is attached to each of the ten path booklets.

Users are asked to read through each booklet and rate each of the path summaries.

Users are also asked to identify which of the ten summaries was the best match.

Page 22: Evaluation of IUNIS

Results:
Matching summary mean = 4.58 (median=5)
Non-matching summary mean = 1.97 (median=1)
Difference highly significant (p < .001)
Best match summary: 5.6 out of 10 (Cohen Kappa=0.51)

The evaluation yields strong evidence that IUNIS generates good summaries of the Web paths.

Page 23: ScentViz Tasks

Overall site
– High-level traffic flow and routes?
– Ease of access and costs?

Given a specific Web page
– Where do users come from?
– Where do they go?
– What other pages are related?

Users
– What are the interests of the users?
– Where should they go based on their need?
– Do observed data match simulation?

Page 24: Visualization Demo

Dome Tree
Usage Based Layout
Path Embedding

Page 25: Scenario 1: Page Types

Multi-way branching point

investor/sitemap.htm

Page 26: Scenario 1: Drill-down

Few well-traveled future paths
– shareholder info
– 1998 fact book
– financial doc order

Conclusion: good local sitemap

Page 27: Scenario 2: Well-traveled

Related information all over the site

One well-worn path on the left relating to product tutorial

Scansoft/tbpro98win/index.htm

Page 28: Scenario 3: Identify Need

Need of path from shareinfo to orderdoc

reinvestment, stock, brochure, dividend, shareholder

investor/sitemap.htm

Page 29: Scenario 4: Scent Predict

Scent computed based on “pagis” need

Good match between scent and LRS paths

Scansoft/pagis/index.html

Page 30: InfoScent Summary

The overall goal is to model Web user information needs
– Bridge gap between clicks and information needs
– Predict user navigation behavior
– Develop new applications and Web usability metrics

Page 31: Questions?

Ed H. Chi
[email protected]
http://www.geekbiker.com