Post on 12-Jan-2016

Co-Cited Author Maps as Interfaces to Digital Libraries:

Kohonen and PFNet Displays for the Humanities

Howard D. White, Jan Buzydlowski, Xia Lin

College of Information Science and Technology

Drexel University, Philadelphia, PA

Co-citation is the mentioning of any two earlier documents in the bibliographic references of a later third document.

The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related.

Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.

Co-Citation Analysis

[Diagram: Doc 1, Doc 2, Doc 3]

Co-Citation Analysis

Lin, Xia. 1997. Map Displays for Information Retrieval. Journal of the American Society for Information Science 48: 40-54.

Chen, Chaomei. 1998. Bridging the Gap: The Use of Pathfinder Networks in Visual Navigation. Journal of Visual Languages and Computing 9: 267-286.

Document co-citation counts the times two papers are cited together.

Author co-citation counts the times two authors, e.g., Lin and Chen, are cited together.

Journal co-citation counts the times two journals are cited together.
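
The three kinds of count differ only in what is taken as the unit (paper, author, or journal). A minimal sketch of the counting step, assuming each citing item has already been reduced to a list of the units it cites; the function name and the sample data are hypothetical and are not taken from the system described here:

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each unordered pair, how many citing items mention both."""
    counts = Counter()
    for refs in reference_lists:
        # A pair counts once per citing item, however often each unit recurs in it.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical citing papers, reduced to the authors named in their references.
citing_papers = [
    ["Lin", "Chen", "Kohonen"],
    ["Lin", "Chen"],
    ["Chen", "Kohonen"],
]
author_counts = cocitation_counts(citing_papers)
print(author_counts[("Chen", "Lin")])  # pairs are keyed in alphabetical order
```

Passing lists of cited papers or cited journals instead of cited authors would give document or journal co-citation counts with the same function.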

Co-Citation Analysis

Data on co-citation are readily obtainable from databases of the Institute for Scientific Information (ISI) in Philadelphia, PA:

• Scisearch (Science Citation Index)

• Social Scisearch (Social Sciences Citation Index)

• Arts & Humanities Search (Arts & Humanities Citation Index)

These databases are searchable online through, e.g., the Dialog Corporation.

Author Co-Citation Analysis (ACA)

Detects patterns in the frequency with which any works by any two authors are jointly cited in later works.

Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.

Author Co-Citation Analysis

If Ben Shneiderman and Shakespeare are cited together in one article, it probably means little.

If Ben Shneiderman and Stuart Card are cited together in 205 articles,* it means a lot: their names have jointly come to symbolize something like “interactive interfaces for digital libraries.” Possibly no subject heading captures this concept.

In a cited-author (CA) search on Dialog,

SELECT CA=SHNEIDERMAN B AND CA=CARD SK

would retrieve the 205 citing articles.

*Actual count, 7/10/00
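
Logically, the AND in such a search is an intersection of two sets of citing articles. The sketch below illustrates that idea with a small in-memory index; it does not show how Dialog works internally, and the record IDs, the data, and both function names are hypothetical:

```python
def build_cited_author_index(records):
    """Map each cited author to the set of citing-record IDs that mention them."""
    index = {}
    for record_id, cited_authors in records.items():
        for author in cited_authors:
            index.setdefault(author, set()).add(record_id)
    return index


def select_and(index, *authors):
    """Return the IDs of records that cite every one of the given authors."""
    postings = [index.get(author, set()) for author in authors]
    return set.intersection(*postings) if postings else set()


# Hypothetical citing records and the authors each one cites.
records = {
    "A1": {"SHNEIDERMAN B", "CARD SK"},
    "A2": {"SHNEIDERMAN B", "SHAKESPEARE W"},
    "A3": {"CARD SK", "SHNEIDERMAN B", "KOHONEN T"},
}
index = build_cited_author_index(records)
hits = select_and(index, "SHNEIDERMAN B", "CARD SK")
print(sorted(hits))
```

The size of the returned set is the co-citation count for the pair.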

Underlying Database and Software

ISI gave our college 10 years’ worth of data from the Arts & Humanities Citation Index (AHCI 1988-1997) as a research grant. The data set has 1.26 million bibliographic records on articles and other items from humanities journals.

For retrievals from AHCI, we bought BRS Search, an industrial-strength engine, from Dataware, Inc.

Buzydlowski and Lin have written several special programs in Java and C to implement our system on top of the BRS Search software.

Our Project

Produces co-cited author maps in real time (a few seconds) on a Web site.

Low cognitive load: User merely has to enter name of a single author of interest as a “seed.”

• E.g., Dickinson-E for Emily Dickinson

System responds with the top authors co-cited with that seed—about 25 names ranked by frequency of co-occurrence.
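
The seed-to-ranked-list step can be sketched as follows, assuming each citing record is available as a list of cited-author names. The function name, the tie-breaking rule, and the sample records are hypothetical; the project's own implementation runs on BRS Search and is not shown here:

```python
from collections import Counter


def top_cocited(records, seed, n=25):
    """Rank the authors most often cited in the same records as the seed."""
    counts = Counter()
    for cited_authors in records:
        authors = set(cited_authors)
        if seed in authors:
            counts.update(authors - {seed})
    # Highest count first; ties are broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical citing records.
records = [
    ["DICKINSON E", "WHITMAN W", "EMERSON RW"],
    ["DICKINSON E", "WHITMAN W"],
    ["WHITMAN W", "EMERSON RW"],
    ["DICKINSON E", "PLATH S"],
]
ranking = top_cocited(records, "DICKINSON E", n=2)
print(ranking)
```

The third record is ignored because it does not cite the seed, so only co-occurrences with the seed enter the ranking.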

Quick Visualizations of a Database

User can choose to display the top 25 as either a Kohonen feature map (SOM, self-organizing map) or a Pathfinder network map (PFNET).

User can use either map as

• An aid to retrieving articles from AHCI 1988-97 that cite authors in various combinations. Combinations are made through drag-and-drop.

• Reproducible artwork in a new study, such as a review of a literature or a commentary on the author used as “seed.”

Maps in the Humanities

We are able to produce maps of authors in the humanities with high face validity.

• Can build maps around great names in literature, philosophy, history, religion, the fine arts. E.g., Dante, Picasso, D. H. Lawrence, Martin Luther, Edward Gibbon, Emily Dickinson, Plato, Vladimir Nabokov.

• Can also build maps around noted scholars, critics, or commentators. E.g., Simon Schama, Garry Wills, Elaine Showalter, Camille Paglia, Derek de Solla Price.

• System will work with authors in other ISI databases in the natural and social sciences. Also with other kinds of co-occurring terms: journal names, descriptors, etc.

Advantages of Maps

Ranked list of top 25 co-cited authors often contains names not previously known to user.

Both Kohonen maps and PFNETs show interconnections of the 25 authors not apparent in the one-dimensional ranking of a simple list.

Interpretation of Maps

Kohonen maps show high co-citation counts of authors by placing them closer in space.

PFNETs show highest co-citation counts of authors directly, as links between nodes bearing authors’ names. The counts themselves can be made to appear above the links.

Kohonen Feature Maps

Are a variety of neural network.

Are produced by an algorithm for unsupervised computer learning in which data points “compete” for the position on the output grid that best represents their numeric weights (co-citation counts) relative to all other points.
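
A minimal self-organizing map in the textbook formulation, where for each input the grid cell with the closest weights "wins" and pulls its neighbors toward that input. This is a sketch under stated assumptions, not the project's C procedures: the grid size, learning-rate schedule, Gaussian neighborhood, and the author profiles are all hypothetical choices.

```python
import math
import random


def best_matching_unit(grid, vector):
    """Return the grid cell whose weights are closest to the input vector."""
    return min(
        grid,
        key=lambda cell: sum((w - x) ** 2 for w, x in zip(grid[cell], vector)),
    )


def train_som(vectors, rows=3, cols=3, epochs=100, seed=0):
    """Train a small SOM and return a dict mapping (row, col) to weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    start_radius = max(rows, cols) / 2
    for epoch in range(epochs):
        remaining = 1 - epoch / epochs
        rate = 0.5 * remaining                       # learning rate decays
        radius = max(start_radius * remaining, 0.5)  # neighborhood shrinks
        for vector in rng.sample(vectors, len(vectors)):
            win_r, win_c = best_matching_unit(grid, vector)
            for (r, c), weights in grid.items():
                dist2 = (r - win_r) ** 2 + (c - win_c) ** 2
                pull = rate * math.exp(-dist2 / (2 * radius ** 2))
                for k in range(dim):
                    weights[k] += pull * (vector[k] - weights[k])
    return grid


# Hypothetical co-citation profiles: each row holds one author's counts
# against the same four reference authors.
profiles = {
    "AUTHOR A": [9, 8, 1, 0],
    "AUTHOR B": [9, 8, 1, 0],
    "AUTHOR C": [0, 1, 8, 9],
}
peak = max(max(row) for row in profiles.values())
scaled = {name: [x / peak for x in row] for name, row in profiles.items()}
grid = train_som(list(scaled.values()))
cells = {name: best_matching_unit(grid, vec) for name, vec in scaled.items()}
print(cells)
```

Authors with identical profiles always land in the same cell; where dissimilar authors end up depends on the random initialization and the training schedule.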

PFNETs

Are algorithmically connected graphs based on finding the “minimum-cost” path between any two nodes.

In ACA, this is generally the highest single co-citation count between author pairs (all pairs are examined).

Results in useful simplification of graph.

Use spring embedder algorithm to produce layout.
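
One common Pathfinder variant, PFNET(r = ∞, q = n − 1), can be sketched directly on co-citation counts: a direct link between two authors is kept only if no indirect path has a stronger weakest link. Whether the project uses exactly these parameters is an assumption, and the function name and matrix below are hypothetical. The layout step (spring embedder) is not shown.

```python
def pfnet_edges(sim):
    """Return the links kept by PFNET(r=infinity, q=n-1) on a similarity matrix."""
    n = len(sim)
    # strength[i][j]: the best achievable weakest link over any path from i to j,
    # computed with a Floyd-Warshall pass that uses max/min in place of min/sum.
    strength = [row[:] for row in sim]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(strength[i][k], strength[k][j])
                if via_k > strength[i][j]:
                    strength[i][j] = via_k
    # Keep a direct link only if no indirect path beats it.
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i][j] > 0 and sim[i][j] >= strength[i][j]
    ]


# Hypothetical symmetric co-citation counts for four authors (indices 0..3).
sim = [
    [0, 10, 3, 1],
    [10, 0, 8, 2],
    [3, 8, 0, 5],
    [1, 2, 5, 0],
]
links = pfnet_edges(sim)
print(links)
```

Here the link between authors 0 and 2 (count 3) is dropped because the path through author 1 has a weakest link of 8, which is what produces the simplified graph.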

PFNETs

Make sense as pictures of relations in databases!

Independent observers have found them highly intelligible:

• Xia Lin on Chinese philosophers

• Kate McCain on historians of science & technology

• Howard White on various literary figures and artists

Buzydlowski’s research will test interpretability of PFNETs and Kohonen maps as interfaces for domain experts and naïve users.

Interface Design Considerations

Link interface to valuable digital libraries (ISI citation databases and the journal literatures they lead to).

Focus on intellectual content: meaningful words, meaningfully presented.

Stress quick and flexible presentations over long-term displays.

Evidence We’re on Right Track

US Patent 6,038,574: “Method and Apparatus for Clustering Collection of Linked Documents Using Co-Citation Analysis”

Filed: March 18, 1998

Awarded: March 14, 2000

Inventors: James E. Pitkow, Peter L. Pirolli, Jock D. Mackinlay, Stuart K. Card, all of Xerox PARC

[Figure: PFNET of authors co-cited with F. Schleiermacher in AHCI, 1988-1997 (Biblical and literary hermeneutics). Node labels: SCHLEIERMACHER F, GADAMER HG, KANT I, HEGEL GWF, BARTH K, DILTHEY W, HEIDEGGER M, PLATO, BIBLE, ARISTOTLE, HABERMAS J, DERRIDA J, RICOEUR P, GOETHE JWV, BULTMANN R, FRANK M, NIETZSCHE F, TILLICH P, FICHTE JG, PANNENBERG W, TROELTSCH E, SCHELLING FWJ, SCHLEGEL FV, LUTHER M, EBELING G]

AuthorLink System Structure

[Diagram components: Web Interface (Java Applet); Web Server; Application Server (Java Servlets); cgi; BRS Search Engine/ISI Data; Kohonen Mapping Procedures in C; PFNET Mapping Procedures in C; …….. Procedures]
