T-Scroll: Visualizing Trends in a Time-Series of Documents for Interactive User Exploration
Yoshiharu Ishikawa and Mikine Hasegawa
Nagoya University, Japan
[email protected]

Outline
- Background and objective
- Related work
- Novelty-based document clustering
- Overview of T-Scroll system
- Evaluation
- Conclusions and future work


Background
- Time-series of documents
  - Examples: news articles delivered on the Internet, online academic journals
  - Continually delivered every day
- Problems
  - A large number of documents: appropriate summarization is required
  - Topics change over time: topic detection/tracking and trend extraction are useful


Objectives
- Development and evaluation of T-Scroll (Trend/Topic-Scroll)
  - User interface for visualizing the transition of topics extracted from a time-series of documents
- System features
  - Built on a document clustering system that outputs new clustering results periodically
  - Clusters are displayed along the time axis like a scroll
  - Links are shown between related clusters to represent topic transitions
  - Several features support interactive exploratory analysis


Visualization of a time-series of documents
- A few systems visualize trends in a time-series of documents
- ThemeRiver (Havre et al., IEEE TVCG, 2002) [4]
  - Visualizes topic streams like a river
  - Focuses on visual impact
  - No features for analysis and browsing
- TimeMine (Swan and Allan, SIGIR'00) [5]
  - Extracts topics from a time-series of documents
  - Displays timelines on the screen to represent topics


ThemeRiver

Analysis of the articles related to Cuba (1960 – 1961)


TimeMine Swan & Allan (U. of Massachusetts)


Analysis of time-dependent clusters
- Mei & Zhai (KDD'05) [6]
  - Statistical approach for discovering major topics from a time-series of documents
  - Probabilistic modeling
- MONIC (Spiliopoulou et al., KDD'06) [7]
  - Detects various types of patterns from cluster transitions
  - Examples: splitting/merging of clusters, cluster size changes
  - Based on the analysis of historical snapshots of clusters


Novelty-based document clustering (1)
- Developed by our group (ECDL'01 [8], WWW Journal 2007 [10], etc.)
- Clusters documents incrementally based on their similarity and novelty
- Features
  - Similarity takes novelty into account
    - High weights are assigned to recent documents, low weights to old ones
    - Document weights decay as time passes, based on the concept of obsolescence (aging)
  - Old documents whose weights are smaller than the threshold are deleted
  - Incremental processing: low update cost
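
The deletion of obsolete documents can be sketched as follows. This is only an illustration, assuming the exponential decay weight (forgetting factor raised to the document's age) described on the later slides; the function names, the dictionary layout, and the sample values are hypothetical and not taken from the authors' implementation.

```python
import math


def max_age(forgetting_factor: float, threshold: float) -> float:
    """Age at which a weight of forgetting_factor ** age has decayed to the threshold."""
    return math.log(threshold) / math.log(forgetting_factor)


def prune(acquired_at: dict[str, float], now: float,
          forgetting_factor: float, threshold: float) -> dict[str, float]:
    """Keep documents whose decayed weight is not smaller than the threshold.

    acquired_at maps a (hypothetical) document id to its acquisition time.
    """
    return {
        doc: t
        for doc, t in acquired_at.items()
        if forgetting_factor ** (now - t) >= threshold
    }


# Hypothetical example: three documents acquired at times 0, 5, and 9.
docs = {"a": 0.0, "b": 5.0, "c": 9.0}
kept = prune(docs, now=10.0, forgetting_factor=0.5, threshold=0.1)
```

Because every weight decays at the same rate, the threshold is equivalent to a fixed maximum age, which is one way to see why the update cost stays low: old documents simply drop out of the working set.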


Novelty-based document clustering (2)
- Periodical clustering processes are performed on a time-series of documents

[Figure: documents placed along a time axis and clustered at successive clustering times τ. Example topics: "Yeltsin's Death", "Blair to Resign", "New President Sarkozy", and other articles. Callout at the later clustering time: "Yeltsin's Death" and other documents are obsolete!]


Document similarity (1)
- dw_i|_τ: the weight of document d_i at the current time τ, where T_i is the acquisition time of d_i

  $dw_i|_\tau = \lambda^{\tau - T_i}$

- λ (0 < λ < 1): forgetting factor; determines the forgetting speed
- The weight of a document decreases exponentially as time passes
- Assumption: each delivered document gradually loses its value as time passes

[Figure: plot of dw_i against time t. The weight is 1 at the acquisition time T_i and decays to λ^{τ − T_i} at the current time τ.]
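
As a small worked example of the exponential forgetting on this slide, the sketch below evaluates the weight λ^(τ − T_i) and the corresponding half-life. The function names and the sample forgetting factor are assumptions chosen for illustration.

```python
import math


def document_weight(acquired_at: float, now: float, forgetting_factor: float) -> float:
    """Weight of a document acquired at time T_i, evaluated at the current time tau."""
    if not 0.0 < forgetting_factor < 1.0:
        raise ValueError("forgetting factor must satisfy 0 < lambda < 1")
    if now < acquired_at:
        raise ValueError("current time must not precede the acquisition time")
    return forgetting_factor ** (now - acquired_at)


def half_life(forgetting_factor: float) -> float:
    """Time span after which a document's weight has dropped to one half."""
    return math.log(0.5) / math.log(forgetting_factor)


# Hypothetical setting: lambda = 0.5 per day, document acquired two days ago.
w_new = document_weight(acquired_at=10.0, now=10.0, forgetting_factor=0.5)
w_old = document_weight(acquired_at=8.0, now=10.0, forgetting_factor=0.5)
```

A forgetting factor close to 1 gives a long half-life (slow forgetting); a value close to 0 makes documents obsolete quickly.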


Document similarity (2)
- Similarity score of documents d_i and d_j

  $sim(d_i, d_j) = \Pr(d_i, d_j) = \Pr(d_i)\,\Pr(d_j)\,\frac{\mathbf{d}_i \cdot \mathbf{d}_j}{len_i \, len_j}$

  - Based on the novelty of documents and the word occurrence patterns in the documents
  - Extension of the tf-idf method
  - New documents have a high impact on the clustering result
- Document clustering: k-means method
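
The similarity score can be sketched as a cosine similarity scaled by the two document weights. The sketch assumes that each document is a sparse term-weight vector (for example tf-idf values) stored as a dictionary, that len_i is the Euclidean length of that vector, and that Pr(d_i) is supplied by the caller; all names and sample values are hypothetical.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    len_u = math.sqrt(sum(w * w for w in u.values()))
    len_v = math.sqrt(sum(w * w for w in v.values()))
    if len_u == 0.0 or len_v == 0.0:
        return 0.0
    return dot / (len_u * len_v)


def novelty_similarity(pr_i: float, pr_j: float,
                       vec_i: dict[str, float], vec_j: dict[str, float]) -> float:
    """sim(d_i, d_j) = Pr(d_i) * Pr(d_j) * (d_i . d_j) / (len_i * len_j)."""
    return pr_i * pr_j * cosine(vec_i, vec_j)


# Two documents with identical direction; the older one has a smaller weight.
recent_pair = novelty_similarity(0.5, 0.5, {"election": 1.0}, {"election": 2.0})
stale_pair = novelty_similarity(0.5, 0.125, {"election": 1.0}, {"election": 2.0})
```

With identical content, the pair that includes an older document scores lower, which is how new documents come to dominate the clustering result.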


T-Scroll: Idea
- Periodical clustering results are displayed like a scroll
- Links represent related cluster pairs


System functionalities (1)
- Cluster labels: selected based on the formula

  $score(t_j) = \sum_{d_i \in C_p} \Pr(d_i) \cdot tf_{ij}$

  - Pr(d_i): document weight; tf_ij: term frequency count
- Cluster sizes: ellipse size roughly corresponds to the number of documents
- Links: a link is shown if the score is greater than the threshold

  $score(C_i, C_j) = \Pr(d \in C_j \mid d \in C_i) = \frac{|C_i \cap C_j|}{|C_i|}$
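
Both scores on this slide are simple to compute once clusters are available as sets of document ids. The sketch below assumes a label score that sums Pr(d_i) times the term frequency over the documents of one cluster, and a link score equal to the fraction of documents in the earlier cluster that also appear in the later one. The data layout, function names, and sample values are hypothetical.

```python
from collections import defaultdict


def label_scores(cluster: set[str], pr: dict[str, float],
                 tf: dict[str, dict[str, int]]) -> dict[str, float]:
    """Score each term by the weighted term frequency over the cluster's documents."""
    scores: dict[str, float] = defaultdict(float)
    for doc in cluster:
        for term, count in tf[doc].items():
            scores[term] += pr[doc] * count
    return dict(scores)


def link_score(c_i: set[str], c_j: set[str]) -> float:
    """Fraction of the documents in c_i that are also contained in c_j."""
    if not c_i:
        return 0.0
    return len(c_i & c_j) / len(c_i)


# Hypothetical data: two documents in one cluster.
pr = {"d1": 0.5, "d2": 0.25}
tf = {"d1": {"election": 2}, "d2": {"election": 4, "vote": 1}}
scores = label_scores({"d1", "d2"}, pr, tf)
best_label = max(scores, key=scores.get)

# Hypothetical clusters from two consecutive clustering results.
earlier = {"d1", "d2", "d3", "d4"}
later = {"d3", "d4", "d5"}
draw_link = link_score(earlier, later) > 0.3  # 0.3 is an arbitrary threshold
```

Note that the link score is not symmetric: dividing by the size of the earlier cluster measures how much of that cluster carries over into the later one.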


System functionalities (2)
- Cluster quality: visualized using different colors for the cluster border lines, from red (good) to purple (bad)

  $quality(C) = |C| \cdot avg\_sim(C)$

  $avg\_sim(C) = \frac{1}{|C|\,(|C| - 1)} \sum_{d_i, d_j \in C,\ d_i \neq d_j} sim(d_i, d_j)$

- A high score is achieved if (1) the cluster size is large and (2) the documents contained in the cluster are similar
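
The quality measure can be sketched as the cluster size multiplied by the average pairwise similarity. The sketch takes the similarity function as a parameter so that it stays independent of any particular document representation; the function names and the toy similarity are assumptions for illustration.

```python
from itertools import permutations
from typing import Callable, Sequence


def avg_sim(cluster: Sequence[str], sim: Callable[[str, str], float]) -> float:
    """Average similarity over all ordered pairs of distinct documents."""
    n = len(cluster)
    if n < 2:
        return 0.0
    total = sum(sim(a, b) for a, b in permutations(cluster, 2))
    return total / (n * (n - 1))


def quality(cluster: Sequence[str], sim: Callable[[str, str], float]) -> float:
    """Cluster size multiplied by the average intra-cluster similarity."""
    return len(cluster) * avg_sim(cluster, sim)


def constant_sim(a: str, b: str) -> float:
    """Toy similarity: every pair of documents has similarity 0.5."""
    return 0.5


small = quality(["d1", "d2"], constant_sim)
large = quality(["d1", "d2", "d3"], constant_sim)
```

With equal average similarity, the larger cluster receives the higher score, matching the two conditions listed on the slide.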


System functionalities (3)
- Drill-down/roll-up: the user can interactively specify the interval between two consecutive clusterings (e.g., one day, one week)
- Displaying keyword list: the user can browse the keyword list for a specified cluster
- Access to original documents
- Keyword-based emphasis: clusters that contain a user-specified keyword are emphasized


Demo


System implementation
- T-Scroll module
  - Written in Perl: generates an SVG file
  - A browser displays the generated SVG file
  - The SVG file includes scripts (JavaScript) used for interactive manipulation
- Clustering module
  - Written in Ruby
  - Novelty-based incremental document clustering
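
To illustrate the kind of output such a module produces, here is a minimal sketch that writes a scroll-like SVG with one ellipse per cluster and one line per link. It is written in Python purely for illustration (the slides state that the actual module is written in Perl), and the tuple layout, sizes, colors, and function name are all assumptions rather than the authors' design.

```python
from xml.sax.saxutils import escape


def render_scroll(clusters, links, col_width=160, row_height=70):
    """Return an SVG document as a string.

    clusters: list of (time_index, row_index, label, n_docs) tuples
    links:    list of (source, target) indices into clusters
    """
    if not clusters:
        return '<svg xmlns="http://www.w3.org/2000/svg" width="0" height="0"></svg>'

    def center(cluster):
        time_index, row_index = cluster[0], cluster[1]
        return (time_index + 0.5) * col_width, (row_index + 0.5) * row_height

    width = (max(c[0] for c in clusters) + 1) * col_width
    height = (max(c[1] for c in clusters) + 1) * row_height
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']

    # Links are drawn first so that the ellipses are painted on top of them.
    for source, target in links:
        x1, y1 = center(clusters[source])
        x2, y2 = center(clusters[target])
        parts.append(f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="gray"/>')

    for cluster in clusters:
        cx, cy = center(cluster)
        # The ellipse grows with the number of documents, capped to fit its column.
        rx = min(0.45 * col_width, 20 + 2 * cluster[3])
        ry = 0.4 * rx
        parts.append(f'<ellipse cx="{cx}" cy="{cy}" rx="{rx}" ry="{ry}" '
                     f'fill="white" stroke="red"/>')
        parts.append(f'<text x="{cx}" y="{cy}" text-anchor="middle">'
                     f'{escape(cluster[2])}</text>')

    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical clustering results at three consecutive times.
svg = render_scroll(
    clusters=[(0, 0, "A & B", 10), (1, 0, "B", 25), (2, 1, "C", 5)],
    links=[(0, 1), (1, 2)],
)
```

Interactive behavior such as keyword emphasis would be added by embedding script elements in the generated file, as the slide describes.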


System architecture

[Figure: system architecture diagram. Components: RSS Feed Module, Clustering Module, T-Scroll Main Module (Perl), SVG Output Module (Perl), SVG Control Module (JavaScript), and a browser with a plug-in. Data: news articles (input), clustering result (output of clustering, input to T-Scroll), SVG file (includes JavaScript). User interaction: command inputs, cluster display, interactive manipulation.]


Evaluation
- 10 users
- Data set
  - Japanese news articles collected from news web sites from Sept. 2006 to Feb. 2007
  - 100 articles per day
  - Clustering was performed at six-hour intervals
- Evaluation criteria
  - Overall impressions
  - Evaluation of each function
  - Observability of topics
  - Comparison with ThemeRiver


Overall impression
- Each user gives scores between 0 and 5

[Figure: bar chart of scores (0 to 5) for Usability, Understandability, Usefulness, and Design.]


Evaluation of each function

[Figure: bar chart of scores (0 to 5) for each function: Scroll, DocNum, Label, Quality, Keyword, TitleList, Emphasis, Interval.]


Observability of topics (1)
- Can users observe the major topics in Nov. 2006?
- Five major topics were specified by us; each user scores how clearly he or she can observe each topic

[Figure: bar chart of scores (0 to 5) for Topic 1 through Topic 5.]


Observability of topics (2)
- 10 users (different from the former experiments)
- Users report the topics they observed and their scores, with no prior information
- Topics 1 to 5 are the major topics used in the previous experiments
- Topic 2 (big hurricane) was regarded as a normal weather topic

[Figure: chart (axis 0 to 8) showing the number of answers and the score for Topics 1, 4, 3, 6, 7, 5, 8, 9, 10, and 11.]


Comparison with ThemeRiver (1)
- A ThemeRiver-like display figure was manually created for news articles in Dec. 2006
- 11 users (different from the previous experiments)
- Questions to users
  - Overall impressions
  - Observability of topics


Comparison with ThemeRiver (2)
Overall impression

Category                        No. of replies
T-Scroll is better              2
T-Scroll is slightly better     3
Almost same                     3
ThemeRiver is slightly better   3
ThemeRiver is better            0


Comparison with ThemeRiver (3)
Can users observe the five major topics that we selected?

Category      No. of replies
Good          0
Possible      3
No good       4
Impossible    4


Summary of experiments
- Overall impressions
  - Good, but usability needs improvement
  - Some users commented on the response speed
- System functionalities
  - Several features (quality information, article lists, etc.) are useful in practice
  - Appropriate labels are necessary: label selection should be improved
- Comparison with ThemeRiver
  - ThemeRiver has visual impact, but its display tends to become complicated when there are many topics


Conclusions and future work
- Development and evaluation of the T-Scroll system
  - Based on a novelty-based incremental clustering method
  - Scroll-like display for showing changing trends
  - Several features for interactive analysis
- Evaluation
  - Overall impression
  - Observability of topics
  - Comparison with ThemeRiver
- Future work
  - More sophisticated keyword (label) selection
  - Improvement of interactive response speed