
Technical Report no. 2004-05

CiteWiz: A Tool for the Visualization of Scientific Citation Networks

Niklas Elmqvist Philippas Tsigas

Department of Computing Science, Chalmers University of Technology and Goteborg University

412 96 Goteborg, Sweden

Goteborg, 2004

Abstract

The management of citation data for scientific articles is part of everyday life for a researcher. In this paper, we present CiteWiz, an extensible framework for visualization of scientific networks. The system is based on a taxonomy of citation database usage for researchers, and uses the Growing Polygons causality visualization technique, suitably modified to the context of citation data. Using this technique, hierarchies of articles with potentially very long citation chains can be graphically represented. The visualization is augmented with mechanisms for parent-child visualization, color assignment, as well as suitable interaction techniques for interacting with the view hierarchy and the individual articles in the dataset. Furthermore, the tool includes a static timeline visualization for overviewing the causality and importance of authors and articles in a citation database. Informal pilot studies with active researchers indicate that CiteWiz is useful for a wide range of scientific activities, and could well become a standard tool in their everyday work.

Keywords: citation networks, bibliographic visualization, information visualization, causal relations

1 Introduction

One of the key tasks of scientific research is the management and study of existing work in a given field of inquiry. The specific nature of the tasks involved in this venture varies greatly depending on the situation and the role of the researcher: for a new student just entering a research area, the task is that of orientation within the existing work; for a reviewer, one of originality and correctness checking; for a conference organizer, one of chronological survey; and, finally, for an experienced scientist, one of staying abreast of new developments and identifying current hot topics in his or her area of choice. Researchers spend a considerable portion of their time on these tasks, ample evidence that it is in everyone’s best interest to streamline this process as much as possible, and that large time savings can be made.

The highly connected and highly contextual nature of citation networks and the large amounts of data to be displayed suggest that techniques of information visualization could successfully be brought to bear in this area.

In this paper, we present CiteWiz, a tool for bibliographic visualization of the chronology and influences in networks of scientific articles. The primary visualization in CiteWiz is an implementation of the Growing Polygons [6] causality visualization technique, suitably adapted to the context of citation data, but the architecture is sufficiently flexible to allow other visualizations to be plugged in. The tool was designed for use by researchers, scientists, and students alike, and its baseline features were established through extended discussions in a focus group consisting of such users. Guided by these discussions, we created a prototype implementation of the tool with a user interface that allows for normal browsing and filtering of the citation meta-data as well as building hierarchical views of the dataset for visualization. A formal user study is currently in progress to assess the efficiency of using the tool, but our initial pilot tests suggest that CiteWiz is more efficient to use for analysis tasks related to overviews and influences than traditional citation database search interfaces.

Causality and influences both play a large role in tracing the history of ideas and trends in a scientific community, and these are core strengths of the Growing Polygons technique. In order to allow us to make use of this technique, we show how to model citations in scientific articles using general causal relations, and we introduce the slightly relaxed concept of influence between articles in a citation database. We chose an article-centered approach (as opposed to an author-centered one) in our implementation, where the articles themselves are the active entities (represented by processes), and citations are the information-bearing messages between them. To allow the technique to cope with potentially huge datasets, we also improved its scalability in two different ways: we implemented multi-level process hierarchies for grouping sets of articles together, and we added a focus+context technique with variable time scale to handle long event histories. The visualization was accordingly supplemented with a number of interaction techniques to support these new features as well as interaction techniques targeted specifically at citation visualization; these include collapsing and expanding the group hierarchy, navigating in the citation network by following backward and forward references, and getting details-on-demand of the complete bibliographical data for a specific paper.

In addition to this, CiteWiz also contains a static influence visualization that renders a timeline of the articles or authors in the citation database, scaling their size and color depending on the number of citations and the citation density (see Figure 7). In this way, the authors (or articles) in the database form a “human pyramid” allowing users to easily see who the giants in the field are, and on whose work they rest.

This paper begins (Section 2) with a review of existing work in the field of bibliographic visualization. We then move on to describe the problem domain (citation networks) in more detail in Section 3, and present a breakdown of the roles, tasks, and subtasks of researchers working with citation meta-data. In Section 4 we present the CiteWiz tool itself, and in Section 5 we give three usage scenarios from using the tool with the InfoVis 1995-2002 dataset [7]. We end the paper with conclusions and our plans for future work.

2 Related Work

The common model of viewing citation networks as directed graphs (see the next section) lends itself quite naturally to visualizing bibliographical data as simple node-link diagrams. However, node-link diagrams scale poorly with network size, and furthermore only present local dependency information; it is easy to see direct citing and cited articles, but the user must traverse the graph in order to see dependencies more than one step away. CiteWiz, on the other hand, provides the surrounding context through influence mapping, and gives a more straightforward way to see the chronology of articles.

Modjeska et al. [13] propose a minimum set of functions necessary for effective bibliographic visualization: (i) display of complete bibliographic information, (ii) filtering by record fields, (iii) display of chronology and influence of articles, (iv) information views at different levels of detail, (v) multiple simultaneous views, and (vi) visualization of large search results. They also present the BIVTECI prototype system that partially implements this specification, but the visualization used in the tool is restricted to node-link diagrams with visualized attributes. CiteWiz also implements this minimum functionality, but instead employs the Growing Polygons causality visualization technique in order to handle larger search results and provide stronger chronology and influence information.

The Butterfly [12] system provides a 3D visualization front-end of the DIALOG science citation databases, using the notion of “organic user interfaces” to build an information landscape as the user explores the results of various queries. Individual articles are represented by an innovative butterfly-shaped 3D object with references and citers on the left and right wings, respectively, and the system provides various graphical cues to orient the user when browsing the citation network. Butterfly uses a node-link diagram for overview and context, however, and has no mechanism for showing the cumulative influences and chronology of articles.

CiteWiz and the above-mentioned systems are all article-focused tools in that they emphasize the visualization of articles and their interdependencies. A number of group-focused techniques have also been proposed, where the emphasis lies on representing the groupings and structure of a scientific domain through metrics such as relevance, bibliographic coupling [11], and co-citation [15]. There is a large body of work in this area, but it is peripheral to the system described in this paper; examples include [1, 2, 3, 4, 10].

3 Citation Networks

Citation networks consist of bibliographical entries representing scientific works, each being a tuple of attributes such as title, authors, source, date, abstract, keywords, etc. In addition, each entry has a number of references to other entries representing the citations found in the article. Thus, citation networks can be seen as directed graphs where each node represents an article, out edges represent cited papers (i.e. the dependencies of the current paper), and in edges represent citing papers. A citation graph is generally not acyclic since articles may mutually cite each other; this is often the case when an author (or a team of authors) publishes two or more related articles at the same conference.
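The directed-graph model described above can be illustrated with a small sketch. It is not taken from the CiteWiz implementation; the function names and the dictionary-based input format are assumptions chosen for illustration. It builds the out-edge (cited) and in-edge (citing) maps and lists the mutual citations that make a citation graph cyclic.

```python
from collections import defaultdict


def build_citation_graph(references):
    """Return (cites, cited_by) adjacency maps from {article: [cited articles]}."""
    cites = defaultdict(set)     # out edges: article -> papers it cites
    cited_by = defaultdict(set)  # in edges: article -> papers citing it
    for article, refs in references.items():
        cites[article]           # make sure every article has an entry
        for ref in refs:
            cites[article].add(ref)
            cited_by[ref].add(article)
            cites[ref]           # cited papers are nodes too
    return cites, cited_by


def mutual_citations(cites):
    """Return sorted pairs (a, b) with a < b where a cites b and b cites a."""
    return sorted(
        (a, b)
        for a, refs in cites.items()
        for b in refs
        if a < b and a in cites.get(b, ())
    )


# Hypothetical three-paper network: A and B cite each other.
cites, cited_by = build_citation_graph({"A": ["B", "C"], "B": ["A"], "C": []})
print(mutual_citations(cites))
```

Listing the mutual pairs first makes it easy to decide, per pair, whether to drop both citations or keep one direction.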

Traditional bibliographical databases generally provide means for searching, sorting, and filtering the citation data in various ways (examples include IEEE Xplore¹, the ACM Digital Library [5], and CiteSeer [9]). These database interfaces serve as suitable reference implementations when assessing new visualizations for citation networks.

3.1 Formative Evaluation

In order to deduce the common user tasks associated with bibliographical databases, we organized a formative user evaluation using a focus group of six active researchers from our department. Our intention with this session was to identify the high-level issues and tasks involved with the use of bibliographical data, including various situations when researchers make use of such databases. The session lasted for approximately one hour, and led us to develop a taxonomy of citation database interaction based on user roles and the tasks and subtasks associated with each role. This taxonomy, presented in the following section, has proven useful when discussing bibliographic visualization and the analysis tasks involved in this activity, but may have a slight bias towards a researcher’s point of view; we plan to involve other users of citation databases (e.g. librarians) in future updates of the taxonomy.

3.2 Taxonomy of Citation Database Interaction

A researcher may assume any of a number of different roles when interacting with a citation database, and we have thus chosen to base our taxonomy on the concept of user roles and the goals and tasks associated with these. Clearly, a user has different goals to achieve depending on his or her current role, and these govern which tasks need to be carried out. Using this taxonomy, we can make decisions about which user roles and goals we want a tool to support, and accordingly which tasks we must implement.

In the taxonomy below, the terms group and subgroup refer to any (potentially hierarchical) clustering of articles (and subgroups) according to some criteria, such as shared keywords, author, source, etc. An event is defined as any scientific community activity, such as a journal issue, a conference, a workshop, etc. Furthermore, we have categorized the user tasks depending on where the focus of the task lies; making a distinction between (i) article-, (ii) event-, (iii) author-, and (iv) group-focused user tasks is useful when discussing the nature of a visualization tool.

¹ http://ieeexplore.ieee.org/

Role       Description

Novice     A researcher who is new to a specific field; can either be a new student or an experienced researcher moving to a new area.
Expert     An experienced researcher with intimate knowledge of a field.
Reviewer   A researcher tasked with peer-reviewing a new paper, potentially from a field he or she has only passing knowledge of.
Organizer  A researcher responsible for organizing, editing, and/or steering an event (such as a conference or journal).
Evaluator  A person, such as a recruiter, tasked with evaluating the work of a specific researcher.

Table 1: User roles in citation database usage.

Table 1 presents the roles we have identified, including a short description of each role. Table 2 gives a listing of the individual goals of each role, as well as the tasks involved with completing that particular goal. Finally, Table 3 shows the different tasks, including their focus category. Note that these tasks operate on the current working group and not necessarily the entire database; for instance, task T3 should be interpreted as “find the most influential paper in the current group of papers”.

3.3 Citations as Causal Relations

A causal ordering is a general relation that relates two events where one is the cause of the other. We can interpret citations in scientific articles as causal orderings in at least two different ways: either with authors as the active entities (processes) and their papers as events, or with papers as the active entities and a single event marking the paper’s publication for each entity. For both cases, we represent citations by causal relations between the events. In this paper, we have chosen the latter approach for the simple reason that the former causes problems with the visualization when authors combine to work together on a paper; thus, our visualization is fundamentally article-focused instead of author-focused.

Seeing that a citation in a scientific article can be modeled by a causal relation is quite straightforward; a citation implies that (a) the authors have read the cited paper (and thus, indirectly, that the cited paper existed before the citing paper), and that (b) the citing paper has a dependency on the cited paper. Admittedly, mutual citations cannot be represented and must be either removed entirely or broken arbitrarily. In this paper, we will use the term influence, which is a slightly relaxed interpretation of causality in this context: if a paper A cites a paper B, the authors of A have been influenced (in some undefined way) by paper B, and this is reflected in the paper (put shortly, A has been influenced by B).

Role       Goal                                 Tasks

Novice     Orientation in a new area            T2, T3, T5, T6
           Find open problems                   T4
Expert     Verify hypotheses/intuition          T1
           Stay updated                         T1
           Find papers quickly                  T1
Reviewer   Check originality                    T2, T3, T5
           Check correctness                    T2, T3
           Check adequacy of references         T2, T5
Organizer  Identify hot topics                  T4, T5, T6
           View chronology of an event          T7
           View collaborations between events   T8
Evaluator  View the career of an author         T7
           Assess the work of an author         T2, T3, T5

Table 2: Goals for each user role.
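The influence relation of Section 3.3 can be made concrete with a short sketch that derives, for every article, the set of articles that directly or transitively influenced it. This is an illustration under stated assumptions rather than the CiteWiz code: the function name and input dictionaries are hypothetical, and mutual citations are handled by keeping only citations to strictly earlier papers, which is one way of removing them entirely.

```python
def influence_sets(published, references):
    """Map each article to the set of articles that influenced it.

    published:  {article: publication time (any comparable value)}
    references: {article: iterable of cited articles}
    """
    influenced_by = {}
    # Visit articles oldest first so cited papers are resolved before citers.
    for article in sorted(published, key=published.get):
        result = set()
        for ref in references.get(article, ()):
            # Assumption: keep only citations to strictly earlier papers,
            # which drops mutual (same-time) citations altogether.
            if ref in published and published[ref] < published[article]:
                result.add(ref)
                result |= influenced_by[ref]  # transitive influences
        influenced_by[article] = result
    return influenced_by


# Hypothetical data: B and C cite each other in the same year.
published = {"A": 1995, "B": 1997, "C": 1997, "D": 2000}
references = {"B": ["A", "C"], "C": ["B"], "D": ["B"]}
print(influence_sets(published, references))
```

Because each article is processed after everything it may legally cite, a single pass suffices and no explicit cycle detection is needed.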

4 The CiteWiz Platform

The CiteWiz system is a modularized bibliographic visualization platform based on a central citation dataset and a number of dataset views that can be used as input for the available visualization techniques. The primary visualization technique in CiteWiz is an adaptation of the Growing Polygons causality visualization method, but the platform has been designed to be easily extensible with new visualizations. One such visualization extension is the Newton’s Shoulders diagram that provides a static timeline of the authors and articles in the database. Based on the taxonomy described in Section 3.2, we developed the tool to be primarily article-focused, meaning that we emphasize the visualization of articles and their interdependencies, but sufficient provisions exist for author-, group-, and event-focused user tasks as well.

An important point to note in the following description of the CiteWiz platform is that this is a system and not a visualization technique, and that many of the features (such as the tree view, the node-link arrows, and the navigation window) were added for convenience and flexibility, not necessarily to prove the purity of the platform.


Task  Description                                             Focus

T1    Find a particular paper                                 Article
T2    Find related papers                                     Article
T3    Find the most influential paper(s)                      Article
T4    Find hot topics (at a specific time)                    Group
T5    Partition an area into subareas                         Group
T6    Study the overall citation network                      Article
T7    Study the chronology of an author/event/group           Au/Ev/Gr
T8    Study the collaboration between authors/events/groups   Au/Ev/Gr

Table 3: Tasks for citation database interaction.

4.1 Datasets and Views

CiteWiz has a central citation dataset that is used for all queries and visualizations. Each entry in the set is a collection of name/value pairs, with fields for the conventional attributes such as title, authors, source (i.e. journal or proceedings name), keywords, abstract, etc. Entries also have a list of references to other entries cited in the paper. The dataset is loaded from disk using a simple XML-based file format for citation meta-data that was designed for the InfoVis 2004 contest [7]. This file format is basically a flat list of the bibliographical entries in the dataset.

Users can browse, filter, sort, and search the dataset in the CiteWiz application. In addition, users can also build views of the dataset for visualization; these are essentially subsets of the central dataset with the extra capability of containing hierarchical groups of bibliographical entries. This makes it possible to build complex structures of nested groups according to some criteria relevant to the user; for instance, when studying a dataset containing citation data for a specific conference over a period of time, one might create groups for each conference year, and the papers could then be arranged in subgroups representing the different sessions for each conference. Other groupings are possible and depend on the user’s goals. For instance, when performing author-focused tasks, it might be useful to create groups for each author in the dataset and add their papers, allowing for easy study of author chronology and collaboration.

Views can be saved to and loaded from disk using another straightforward XML format; each view file is associated with a specific dataset file, and uses the internal identifiers to refer to bibliographical entries in the dataset.
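As a rough illustration of the dataset/view distinction, the sketch below models entries as flat records and a view as a tree of groups that refer to entries by identifier. The class names, field names, and the grouping-by-year helper are all assumptions made for this example; they do not reflect the actual CiteWiz classes or its XML formats.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One bibliographical entry in the central dataset (fields illustrative)."""
    entry_id: str
    title: str
    authors: list
    year: int
    references: list = field(default_factory=list)  # ids of cited entries


@dataclass
class Group:
    """A node in a view: holds entry ids and/or nested subgroups."""
    name: str
    entry_ids: list = field(default_factory=list)
    subgroups: list = field(default_factory=list)

    def all_entry_ids(self):
        """Collect the ids of every entry in this group and its subgroups."""
        ids = list(self.entry_ids)
        for sub in self.subgroups:
            ids.extend(sub.all_entry_ids())
        return ids


def group_by_year(dataset):
    """Build a one-level view with one group per publication year."""
    by_year = {}
    for entry in dataset.values():
        by_year.setdefault(entry.year, []).append(entry.entry_id)
    root = Group("all years")
    for year in sorted(by_year):
        root.subgroups.append(Group(str(year), entry_ids=sorted(by_year[year])))
    return root


# Hypothetical three-entry dataset keyed by internal identifier.
entries = [
    Entry("p1", "First paper", ["X"], 1995),
    Entry("p2", "Second paper", ["Y"], 1996, ["p1"]),
    Entry("p3", "Third paper", ["X"], 1995),
]
dataset = {e.entry_id: e for e in entries}
view = group_by_year(dataset)
print([g.name for g in view.subgroups], view.all_entry_ids())
```

Storing only identifiers in the view keeps it a lightweight overlay on the central dataset, in the spirit of the view files described above.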

The CiteWiz tool does not currently contain any functionality for automatic construction of hierarchical views; all views must be manually defined. However, clustering algorithms for building views automatically could be worthwhile future extensions to the tool.


4.2 Influence Visualization

Views built by the user form the input for the visualization techniques supported by CiteWiz. As mentioned above, the primary visualization technique is currently the Growing Polygons [6] method for visualization of general causal relations, suitably modified to be able to handle citation networks and the scalability issues associated with these. We believe the focus on influence and causality visualization in the Growing Polygons technique makes it very well suited to citation networks. The technique uses a combination of 2D shapes, color, and animation to graphically represent a system of n active processes as n-sided so-called process polygons showing the influences affecting each process. Each process polygon is assigned a position on a large n-sided layout polygon, as well as a triangular sector of the polygons corresponding to this position. As time progresses, the process polygons grow from zero to full size, and the sectors of each polygon fill in as messages are received from other processes, signifying the information transfer. Figure 1 shows the end result at full time of a 5-process system; note for instance how process P0 (upper right) has all its sectors filled in, indicating that it has been influenced by (received messages from) all the other processes in the system, whereas processes P1 and P2 only show influences from P4 (upper left).

Figure 1: Growing Polygons visualization with 5 processes.

In our adaptation of the original technique, articles form the processes in the visualization (thus represented by article polygons), and citations are messages from a source (cited) article to a destination (citing) article. This mimics the information transfer implicit when authors reference another paper. Even if articles are more or less static once published, this article-focused approach gives us a way to easily see the influences and chronology of a set of articles, including global transitivity information for each article.


In order to make effective use of the Growing Polygons method in this context, we were forced to address two scalability issues in relation to (i) long execution times, and (ii) large quantities of visualized articles. For the former issue concerning time scalability, the problem lies in that visualizing a large citation network may result in very long chains of causality, and the visualization will then run out of space for displaying individual time segments. For the latter case, the quantity scalability issue comes from the fact that visualizing a sufficiently large number of articles means that each individual article gets assigned a very small polygon sector and it will thus be difficult to distinguish between neighboring sectors. Both of these issues can be partially addressed through zooming mechanisms, but this instead results in loss of overview.

Our solution for these concerns in the modified, more scalable version of the Growing Polygons method is two-fold. First, we introduce a focus+context [8] technique based on adjustable linear time windows that lets the user concentrate on certain areas of the execution while still retaining the context of the surrounding history (i.e. the focus view and the overview are integrated in the same visual space, as opposed to overview+detail techniques where the views are spatially or temporally separate). Secondly, we address the quantity concern by modifying the Growing Polygons technique to handle hierarchical views instead of flat article lists (this was our incentive for the distinction between datasets and views in the design of CiteWiz).

4.2.1 Linear Time Windows

As stated above, our solution to the time scalability problem is a focus+context technique based on a non-linear time scale and user-controlled linear time windows. Let T be the total number of time units in the execution we are studying. Each time window will then display k time units using a normal linear time scale (if k ≥ T, we have the standard Growing Polygons technique). A specific ratio r of the maximum radius Rmax of each article polygon is reserved for the time window, and the remaining space is distributed among the T − k time units outside of the time window. These peripheral time segments flanking the time window are called history panels. The user can control the parameters k and r, and can furthermore also control the index i, which is the index of the first time unit that is inside the time window. Normally, the user wants the window to show the k latest time steps in the execution, but it is useful to be able to change i to focus on different parts of the execution. Figure 2 shows an example of an article polygon with a linear time window centered on the middle of the time execution (i = 4).

The history panels flank the time window and provide the surrounding context to the user, including both future and past events (the panels are accordingly referred to as the future and past history panels). We distribute the remaining 1 − r ratio of the maximum polygon radius simply by allocating a fixed space to each time unit outside the window, equal to (1 − r)/(T − k) of the radius Rmax. A more intelligent space allocation scheme would assign recent time periods (i.e. those adjacent to the current location of the time window) more screen space than older history.
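The space allocation for time windows and history panels can be sketched as a mapping from time units to radial widths. The sketch assumes, as the description suggests, that each of the k units inside the window receives r·Rmax/k of the radius and each of the T − k units outside receives (1 − r)·Rmax/(T − k). The function names, the default Rmax of 1.0, and the choice T = 10 in the example are illustrative assumptions.

```python
def unit_widths(T, i, k, r, r_max=1.0):
    """Radial width given to each of the T time units."""
    if k >= T:
        return [r_max / T] * T  # degenerate case: plain linear time scale
    if k <= 0 or not (0 <= i <= T - k):
        raise ValueError("window must be non-empty and lie inside [0, T)")
    inside = r * r_max / k                  # units inside the time window
    outside = (1.0 - r) * r_max / (T - k)   # units in the history panels
    return [inside if i <= t < i + k else outside for t in range(T)]


def boundary_radii(T, i, k, r, r_max=1.0):
    """Radius at each of the T + 1 time-unit boundaries (0 at the centre)."""
    radii = [0.0]
    for width in unit_widths(T, i, k, r, r_max):
        radii.append(radii[-1] + width)
    return radii


# Parameters of Figure 2 (i = 4, k = 2, r = 0.5) with an assumed T = 10.
print(boundary_radii(10, 4, 2, 0.5))
```

With these parameters the two window units together occupy half of the radius, while the eight surrounding units share the other half equally.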

A linear time window has three free parameters that can be controlled by the user: the starting index i of the window, the number of time segments k shown in the window, and the ratio r of the maximum radius used by the time window. In our implementation, these parameters are synchronized for all time windows (one for each article polygon), since independently controlled time windows do not make sense in the context of causality, where the timing of individual events cannot be decoupled and where we are mainly interested in comparing the state of different articles. Sliders are provided to let the user control these parameters freely.

Figure 2: Growing Polygons visualization with linear time windows (i = 4, k = 2, r = 0.5).

4.2.2 Hierarchical Views

In order to allow the Growing Polygons technique to handle a large quantity of articles, we modify the visualization to be able to render hierarchical groups of articles instead of single articles. These correspond directly to the views of the central dataset built by the users. The view hierarchy is visualized by treating an article group as a normal article, except that the group will have the cumulative influences of all of its children. We derive these influences by a simple postorder traversal of the hierarchy, building the influence timelines of the internal nodes from the bottom up (i.e. starting with the articles in the leaves of the tree). The currently visible nodes (depending on how far the hierarchy has been expanded) are then rendered as normal article polygons, with the single exception that groups (i.e. non-leaves) have a drop shadow to signify that the polygon represents more than one article.
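The bottom-up derivation of group influences can be sketched as a postorder traversal. For brevity, this example reduces each influence timeline to a plain set of influencing articles, which is a simplification of what the text describes; the function name and the dictionary-based tree representation are likewise assumptions.

```python
def cumulative_influences(node, children, leaf_influences, result=None):
    """Postorder traversal: a group's influences are the union of its children's.

    children:        {group: [child nodes]}; leaves (articles) are absent
    leaf_influences: {article: set of influencing articles}
    """
    if result is None:
        result = {}
    kids = children.get(node, [])
    if not kids:
        # Leaf: an individual article with its own influence set.
        result[node] = set(leaf_influences.get(node, ()))
    else:
        combined = set()
        for child in kids:  # visit all children first ...
            cumulative_influences(child, children, leaf_influences, result)
            combined |= result[child]
        result[node] = combined  # ... then fill in the parent
    return result


# Hypothetical hierarchy: group a contains group b (articles d, e) and article c.
children = {"a": ["b", "c"], "b": ["d", "e"]}
leaf_influences = {"d": {"x"}, "e": {"y"}, "c": {"x", "z"}}
print(cumulative_influences("a", children, leaf_influences))
```

Since every node's result is cached, collapsing or expanding a group only changes which entries are rendered, not what has to be recomputed.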


4.2.3 Interaction Techniques

Merely visualizing the article hierarchy is not enough; users must also be able to browse it in order for the visualization to be useful. In our modified version of the Growing Polygons technique, we provide two simple interaction techniques for doing this: users can either click directly in the visualization to expand and collapse article groups (using the left and right mouse buttons, respectively), or they can use a separate tree navigation window to study the structure of the hierarchy. The same tree window can also be used to search for the full or partial name of a specific article, and the tree will be expanded to the level of the article to show the search result.

In addition to these interaction techniques, we also provide an overview map window with a color legend and clickable fields for quickly jumping to a specific article polygon.

4.2.4 Parent-Child Visualization

The parent-child relationships in the view hierarchy can be indicated by drawing the chord on the layout polygon connecting the article polygons of the first and last child of each parent. The Growing Polygons diagram to the right in Figure 3 gives an example of this, where the article groups b, e, and f are shown with dashed chords enclosing their children. Figure 4 shows an actual screenshot of our implementation with a partially expanded hierarchy and the parent regions plainly visible (filled in with their respective colors).

Figure 3: Simple article hierarchy (left) visualized as an expanded Growing Polygons diagram (right). The dashed region in the hierarchy shows the level of expansion, and the dashed lines (chords) in the GP diagram show parent-child relationships.

4.2.5 Color Assignment

Color assignment for the modified hierarchical Growing Polygons technique is slightly different than for the original technique. Even if we normally do not show the entire set of articles at the same time, we still need to statically allocate a fixed color to each article so that these remain invariant as the user expands and collapses the hierarchy. In addition, we need to assign colors to the interior nodes in the article hierarchy (i.e. the article groups), and this should ideally be done in such a way that the color of a parent node has some relation to the color of its children.

Figure 4: Modified Growing Polygons visualization with parent regions.

In our implementation, we achieve this by normalizing the HSV spectrum to a range [0, 1) and assigning intervals of this range through a simple top-down recursive traversal of the article tree. Each child gets assigned an interval of the allocated color range proportional to the number of articles (not counting internal article groups) in its branch (see Figure 5 for an example of color assignment on an article hierarchy consisting of n = 8 articles). This ensures that article colors are evenly distributed across the spectrum. The article or article group itself chooses the center of the allocated color range as its own color. In this way, parents and children will at least potentially have a visual relation.
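The top-down interval assignment can be sketched as follows. The tree representation, function names, and the choice of full saturation and value when converting a hue to RGB are assumptions made for this example; only the proportional splitting of the [0, 1) range and the use of the interval center follow the description above.

```python
import colorsys


def count_articles(node, children):
    """Number of leaf articles at or below a node (groups are not counted)."""
    kids = children.get(node, [])
    return 1 if not kids else sum(count_articles(c, children) for c in kids)


def assign_hues(node, children, lo=0.0, hi=1.0, hues=None):
    """Top-down recursion: each node takes the centre of its hue interval."""
    if hues is None:
        hues = {}
    hues[node] = (lo + hi) / 2.0
    kids = children.get(node, [])
    total = sum(count_articles(c, children) for c in kids)
    start = lo
    for child in kids:
        # Each child's share is proportional to the articles in its branch.
        share = (hi - lo) * count_articles(child, children) / total
        assign_hues(child, children, start, start + share, hues)
        start += share
    return hues


def hue_to_rgb(hue):
    """Convert a hue in [0, 1) to RGB at full saturation and value."""
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


# Hypothetical hierarchy with 8 articles: b holds 2 (25%), c holds 6 (75%).
children = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g", "h", "i", "j", "k"]}
hues = assign_hues("a", children)
print(hues["b"], hues["c"], hue_to_rgb(hues["d"]))
```

Because the intervals depend only on the static hierarchy, the resulting colors stay fixed however far the user expands or collapses the view.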

In fact, this can be taken one step further by rendering the geometry representing the influences of an article group with a color gradient based on the interval allocated to the group instead of using a single, flat color. This gives users a visual cue that the polygon represents a group of articles and not a single one, and might also help in perceiving the parent-child relationship among nodes. However, colors can be difficult to compare and group visually, and there is no natural way to perceive color difference in a color spectrum, so we have chosen not to perform this step in our implementation.

Figure 5: Color assignment (right) for a simple article hierarchy (left) with n = 8 articles. The bars to the right show the assignment; filled-in bars represent actual articles.

4.2.6 Details-On-Demand

As suggested by both Shneiderman [14] and Modjeska et al. [13], bibliographic visualization tools need to provide a mechanism to show the complete bibliographical data of an article. In CiteWiz, this is handled by a detail window that gives the full meta-data of the currently selected article. In addition to this, we augment the currently selected node in the visualization with blue arrows pointing from the cited nodes, and with red arrows pointing to citing nodes (see Figure 6), again for convenience.

Figure 6: Citation links displayed for a particular entity.


4.3 Static Timeline Visualization

Beyond the modified Growing Polygons visualization described above, CiteWiz also contains another visualization informally referred to as a Newton’s Shoulders diagram². This visualization creates a static, non-interactive timeline of either articles or authors in the central CiteWiz citation database, displaying each entity as an icon on the timeline according to their publication date (or the date of their first publication, in the case of authors). The surface area of each icon is scaled proportionally to the number of citations the article or author has received (rounded up so that the icon conforms to a uniform grid). The timeline is split up into suitable time units (years or months), and each time segment gets assigned space on the timeline equal to the size of the largest entity in the segment. The icons representing the entities for each time segment are then laid out using a greedy algorithm that places the entities in descending size within the allocated space on the timeline, always trying to minimize the distance to the centerpoint of the diagram. An example of such a Newton’s Shoulders diagram can be seen in Figure 7 depicting a modest-sized citation database of some 1000 authors.
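The icon scaling and the per-segment space allocation can be illustrated with a small sketch. It assumes square icons on a unit grid, so that an icon's side is the square root of its area rounded up to whole cells; the actual icon shapes, the scale factor, and the greedy placement step are not specified in enough detail here to reproduce, and the function names are hypothetical.

```python
import math


def icon_side(citations, area_per_citation=1.0):
    """Side (in grid cells) of a square icon whose area covers the citations."""
    area = max(citations, 0) * area_per_citation
    # Round up to whole grid cells; every entity gets at least one cell.
    return max(1, math.ceil(math.sqrt(area)))


def segment_extents(entities):
    """Timeline space per segment: the side of the largest icon in it.

    entities: iterable of (segment, citations) pairs, e.g. (year, count).
    """
    extents = {}
    for segment, citations in entities:
        extents[segment] = max(extents.get(segment, 0), icon_side(citations))
    return extents


# Hypothetical citation counts for three entities in two yearly segments.
print(segment_extents([(1995, 9), (1995, 2), (1996, 30)]))
```

Rounding up rather than to the nearest cell guarantees that an icon's area never understates the citation count it represents.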

As can be seen in Figure 7, we can orient the timeline vertically and use human figures for the entity icons, giving the impression of people standing on the shoulders of others. This is exactly the metaphor we had in mind when designing the visualization, and matches the intuition of the work of a researcher resting on the work of those who came before him or her. The diagram now tells us the relative chronology of researchers in a specific field, and instantly shows the most influential authors and their relationships (for instance, that George Robertson, Ben Shneiderman, Jock Mackinlay, and Stuart Card seem to be the “giants” of information visualization). Figure 11 shows a similar diagram for the articles in the same citation database, and we can note that the “Cone Trees” paper by Robertson et al. seems to be the most cited paper in the database, closely followed by Furnas’ work on generalized fisheye views.

These diagrams can be modified to show additional dimensions by applying color to the entity icons. The metric to display this way can be chosen arbitrarily; one useful metric for authors could be citation density, which we define as the total number of citations for an author divided by the total number of publications written by the author (i.e. a kind of “average paper quality” metric). Another, slightly more complex, metric would involve weighting citations for an author or article by their age so that recently cited articles or authors get a stronger and more visible color than older ones, signifying that this article or author is involved in a “hot topic”.
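Both metrics can be sketched in a few lines. Citation density follows the definition given above directly. The text does not specify how citations are weighted by age, so the exponential decay and the `half_life` parameter below are assumptions made purely for illustration, as are the function names.

```python
def citation_density(total_citations, num_publications):
    """Total citations for an author divided by the number of
    publications written by the author."""
    if num_publications == 0:
        return 0.0
    return total_citations / num_publications


def recency_weighted_citations(citing_years, current_year, half_life=5.0):
    """Sum of citations where each one counts 0.5 ** (age / half_life).

    citing_years holds the publication year of every citing article.
    The exponential decay is an assumed weighting, not one taken from
    the CiteWiz description."""
    return sum(
        0.5 ** ((current_year - year) / half_life) for year in citing_years
    )
```

With a half-life of five years, a citation made in the current year counts fully and one made five years earlier counts half, so two entities with the same raw citation count receive different color intensities depending on how recently they were cited.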

4.4 Implementation

The CiteWiz tool is implemented as a C++ application running under the Linux operating system, but should be easily portable to other platforms. It uses standard OpenGL for efficient 2D rendering, and the GTK+/GTK-- library for the graphical user interface components.

² So named after Sir Isaac Newton’s famous quote in a letter to Robert Hooke in 1676, “If I have seen further, it is by standing on the shoulders of giants.”

Figure 7: Newton’s Shoulders diagram of the authors in the IV04 contest citation database (vertical timeline, 1974 to 2003).

4.5 User Study

We have yet to conduct a formal user study of the CiteWiz tool, so we cannot present any quantitative data on the efficiency of the tool. However, we have conducted pilot studies using subjects from the original focus group of active researchers, and their reactions have so far been very positive. We plan to perform a formal user study in the near future based on the user tasks and roles formulated in Section 3.2.

5 Case Study: The IV04 Dataset

In order to clarify the various uses of CiteWiz for bibliographic visualization, we will present three different usage scenarios for the tool in this section. These scenarios are based on the user roles presented in Section 3.2, and involve the tasks discussed there. The citation dataset used is the IV04 dataset made public by the InfoVis 2004 contest [7], which currently contains all of the papers published in the InfoVis symposium series as well as their references (614 entries in total).

5.1 Novice

A novice is a researcher who is new to a specific field, and may thus either be a new student or an experienced researcher moving to a new area. In this case, the goal of the user is mainly that of orientation within the area. The CiteWiz tool may help to do this in several ways.

First of all, to get a grasp of the research field, the user may want to study a Newton’s Shoulders diagram of both the authors and the articles in the database. This will quickly pinpoint the important articles and authors of the area. Secondly, the novice user may want to overview the entire citation network of the field. This can be done by creating a multi-level view consisting of the areas and subareas of the field and adding the respective articles to their fields (for instance, on the basis of the keywords of each article). A Growing Polygons visualization of this view will show the relevance of each area. Studying the dependencies of the areas may also show their chronology and history, such as areas spawning other areas. For example, the numerous out edges of the “Visualizing Hierarchies and Databases” session of InfoVis 1995 in Figure 8 suggest that hierarchy visualization is a long-term research area within information visualization.

Figure 8: Hierarchy visualization in InfoVis 1995.

One of the key tasks of a novice user is to identify the most influential papers in a field. CiteWiz makes this task easy through the use of both the Growing Polygons technique, which provides users with an intuitive graphical method of identifying influential articles by studying the density of various colors in the visualization, and the Newton’s Shoulders diagram.

5.2 Reviewer

A reviewer is someone charged with peer-reviewing a new article for some scientific event, potentially from a field he or she has only passing knowledge of. In this way, the work of a reviewer may often mirror that of a novice, since the reviewer may have to become familiar with the field before passing judgment on the reviewed article. Furthermore, the reviewer may wish to see a graphical overview of the citation network of the references in the article to see in which area it belongs and identify relevant articles to read as background.

In addition to finding relevant work for checking the originality and correctness of an article under review, a reviewer may want to check the adequacy of references for the article. Here, too, an overview of the citation network may help in finding similar articles and making sure all relevant work is referenced.

5.3 Organizer

Finally, an organizer is someone active in a scientific community who is tasked with organizing, administering, or steering an event (i.e. a scientific activity such as a journal or a conference). In this case, bibliographic visualization can be used primarily to see high-level trends concerning the event and the field as a whole. Using CiteWiz, the organizer can build a chronological view of the event, creating groups for each issue or year (and, optionally, adding subgroups for the sessions and tracks of a conference, for instance). A Growing Polygons visualization of the view will then show the chronology and dependencies of each event instance (see Figure 9), and may reveal interesting facts; for instance, Figure 10 shows that the “Decision Trees and Clickstreams” session in InfoVis 2001 seems to have been a passing trend in that no future InfoVis papers reference any of the papers contained in that session.
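The “passing trend” observation corresponds to a simple query on the citation data, sketched below under assumptions: papers are tagged with a year and a session, citations are given as pairs of paper identifiers, and the function name and data layout are hypothetical rather than part of CiteWiz.

```python
def sessions_without_later_citations(papers, citations):
    """papers: {paper_id: (year, session)}.
    citations: iterable of (citing_id, cited_id) pairs.

    Returns the sorted list of (year, session) groups in which no
    paper is cited by a paper from a later year in the dataset."""
    groups = set(papers.values())
    cited_later = set()
    for citing, cited in citations:
        # Ignore references to works outside the grouped view.
        if citing in papers and cited in papers:
            if papers[citing][0] > papers[cited][0]:
                cited_later.add(papers[cited])
    return sorted(groups - cited_later)


# Toy data, invented for illustration only.
papers = {
    "p1": (2001, "Session A"),
    "p2": (2001, "Session B"),
    "p3": (2002, "Session C"),
    "p4": (2003, "Session D"),
}
citations = [("p3", "p1"), ("p4", "p3"), ("p2", "p1")]
print(sessions_without_later_citations(papers, citations))
```

Sessions from the most recent year always appear in the result, since no later papers exist yet that could cite them, so they should be read differently from older sessions that were never picked up.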

To see interdependencies and collaborations between other events, the organizer can add groups for each of the additional events to the view (possibly with top-level groups for each of the events to be compared). A visualization of this view will now reveal how much interaction goes on between the events, which event (if any) tends to be leading in terms of presenting new ideas and subareas, and how the events co-develop over time. Furthermore, building views of authors in relation to their participation in the organizer’s event and other events may show whether there exist groupings of authors in the area, or whether authors publish freely with no special preference for an event.
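As a rough, non-visual counterpart to such a view, the amount of interaction between events can be tallied from the citation pairs. The sketch below assumes each paper is labeled with the event it appeared in; `cross_event_citations` and the event labels are hypothetical.

```python
from collections import Counter


def cross_event_citations(event_of, citations):
    """event_of: {paper_id: event_name}.
    citations: iterable of (citing_id, cited_id) pairs.

    Returns a Counter mapping (citing_event, cited_event) to the
    number of citations going from the first event to the second."""
    counts = Counter()
    for citing, cited in citations:
        if citing in event_of and cited in event_of:
            counts[(event_of[citing], event_of[cited])] += 1
    return counts


# Toy data, invented for illustration only.
event_of = {"a": "Event X", "b": "Event X", "c": "Event Y"}
citations = [("c", "a"), ("c", "b"), ("b", "a"), ("a", "c")]
counts = cross_event_citations(event_of, citations)
```

A strongly asymmetric count between two events would hint that one of them tends to lead and the other to follow, which is the kind of pattern the visualization is meant to expose.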

Figure 9: Partially expanded chronological view of the InfoVis conferences.

6 Conclusions

We have described CiteWiz, an extensible platform for bibliographic visualization. The platform includes a modified version of the Growing Polygons method for visualizing causal relations. At the outset of the project, we conducted a formative evaluation using a focus group of active researchers, allowing us to formulate a taxonomy of the usage of citation databases. Guided by this taxonomy, we designed CiteWiz to emphasize the visualization of articles and their interdependencies. The modifications to the Growing Polygons technique were aimed primarily at adapting the method to citation networks, and included provisions for rendering hierarchies of articles rather than flat lists, and a focus+context technique with user-controlled time windows to more easily support long citation chains. Furthermore, we introduced interaction techniques to the tool to allow for expanding and collapsing the hierarchies, navigating forward and backward references in the network, and retrieving details-on-demand. In addition, the tool also contains another visualization technique called a Newton’s Shoulders diagram that constructs static timelines of articles or authors showing the causality and citations in a citation database. Finally, we also presented three usage scenarios for CiteWiz to highlight the wide range of uses possible for the tool.

Figure 10: Citation network for the “Decision Trees and Clickstreams” session of InfoVis 2001.

7 Future Work

There exists a wide variety of possible extensions to the CiteWiz tool, not least the design of new bibliographic visualization techniques to provide alternative views of the dataset. An interesting such technique would be an overview visualization of an entire dataset, allowing users to see general trends and major features rather than individual articles. Furthermore, document clustering algorithms for automatic construction of hierarchical views would be a useful addition to the tool. We are also investigating various ways to build a web-based interface for CiteWiz and make it accessible on the Internet.

Acknowledgements

The authors would like to thank our colleagues at Chalmers University of Technology for their thoughts and feedback during the focus group session. We also wish to thank Jean-Daniel Fekete, Georges Grinstein, and Catherine Plaisant for providing the InfoVis 2004 contest dataset that we used for testing the CiteWiz tool.

References

[1] Ulrik Brandes and Thomas Willhalm. Visualization of bibliographic networks with a reshaped landscape metaphor. In Proceedings of the Symposium on Data Visualisation 2002, pages 159–164. Eurographics Association, 2002.

[2] Matthew Chalmers and Paul Chitson. Bead: Explorations in information visualization. In Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 330–337, 1992.

[3] Chaomei Chen. Visualising semantic spaces and author co-citation networks in digital libraries. Information Processing and Management, 35(3):401–420, 1999.

[4] Chaomei Chen and Steven Morris. Visualizing evolving networks: Minimum spanning trees versus pathfinder networks. In Proceedings of the IEEE Symposium on Information Visualization 2003, pages 67–74, October 2003.

[5] Peter J. Denning. The ACM digital library goes live. Communications of the ACM, 40(7):28–29, July 1997.

[6] Niklas Elmqvist and Philippas Tsigas. Causality visualization using animated growing polygons. In Proceedings of the IEEE Symposium on Information Visualization 2003, pages 189–196, October 19–21 2003.

[7] Jean-Daniel Fekete, Georges Grinstein, and Catherine Plaisant. InfoVis 2004 Contest: The History of InfoVis, 2004. http://www.cs.umd.edu/hcil/iv04contest/.

[8] George W. Furnas. Generalized fisheye views. In Proceedings of the ACM CHI’86 Conference on Human Factors in Computer Systems, pages 16–23, 1986.

[9] C. Lee Giles, Kurt Bollacker, and Steve Lawrence. CiteSeer: An automatic citation indexing system. In Digital Libraries 98 - The Third ACM Conference on Digital Libraries, pages 89–98, June 1998.

[10] Matthias Hemmje, Clemens Kunkel, and Alexander Willett. Lyberworld – A visualization user interface supporting fulltext retrieval. In Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 249–259, 1994.

[11] Michael M. Kessler. Bibliographic coupling between scientific papers. American Documentation, 14(1):10–25, January 1963.

[12] Jock D. Mackinlay, Ramana Rao, and Stuart K. Card. An organic user interface for searching citation links. In Proceedings of ACM CHI’95 Conference on Human Factors in Computing Systems, volume 1 of Papers: Information Access, pages 67–73, 1995.

[13] David Modjeska, Vassilios Tzerpos, Petros Faloutsos, and Michalis Faloutsos. BIVTECI: A bibliographic visualization tool. In Proceedings of the 1996 Conference of the Centre of Advanced Studies on Collaborative Research, page 28, 1996.

[14] Ben Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages, pages 336–343, September 3–6 1996.

[15] Henry G. Small. Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4):265–269, July-August 1973.

Figure 11: Newton’s Shoulders diagram of the articles in the IV04 contest citation database (vertical timeline, 1974 to 2004).
