Co-Cited Author Maps as Interfaces to Digital Libraries:
Kohonen and PFNET Displays for the Humanities
Howard D. White, Jan Buzydlowski, Xia Lin
College of Information Science and Technology
Drexel University, Philadelphia, PA
Co-citation is the mentioning of any two earlier documents in the bibliographic references of a later third document.
The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related.
Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.
Co-Citation Analysis
[Diagram: Doc 1, Doc 2, Doc 3]
Co-Citation Analysis
Lin, Xia. 1997. Map Displays for Information Retrieval. Journal of the American Society for Information Science 48: 40-54.
Chen, Chaomei. 1998. Bridging the Gap: The Use of Pathfinder Networks in Visual Navigation. Journal of Visual Languages and Computing 9: 267-286.
Document co-citation counts the number of times two papers are cited together.
Author co-citation counts the number of times two authors, e.g., Lin and Chen, are cited together.
Journal co-citation counts the number of times two journals are cited together.
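The three kinds of count above differ only in what is treated as the cited item. A minimal sketch in Python, assuming each citing document is represented simply as a list of the items it cites (the function name `cocitation_counts` and the toy records are invented for illustration, not taken from the ISI data):

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_records):
    """For every unordered pair of cited items, count the citing records that cite both."""
    counts = Counter()
    for cited in citing_records:
        # Deduplicate within a record so a pair is counted once per citing document.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts

# Toy data (invented): cited authors in three later articles.
records = [
    ["LIN X", "CHEN C", "KOHONEN T"],
    ["LIN X", "CHEN C"],
    ["CHEN C", "KOHONEN T", "CHEN C"],
]
author_counts = cocitation_counts(records)
print(author_counts[("CHEN C", "LIN X")])  # 2
```

Passing lists of cited papers or cited journal names instead of author names would give document or journal co-citation counts with the same function.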
Co-Citation Analysis
Data on co-citation are readily obtainable from databases of the Institute for Scientific Information (ISI) in Philadelphia, PA:
• Scisearch (Science Citation Index)
• Social Scisearch (Social Sciences Citation Index)
• Arts & Humanities Search (Arts & Humanities Citation Index)
These databases are searchable online through, e.g., the Dialog Corporation.
Author Co-Citation Analysis (ACA)
Detects patterns in the frequency with which any works by any two authors are jointly cited in later works.
Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.
Author Co-Citation Analysis
If Ben Shneiderman and Shakespeare are cited together in one article, it probably means little.
If Ben Shneiderman and Stuart Card are cited together in 205 articles,* it means a lot: their names have jointly come to symbolize something like “interactive interfaces for digital libraries.” Possibly no subject heading captures this concept.
In a cited-author (CA) search on Dialog,
SELECT CA=SHNEIDERMAN B AND CA=CARD SK
would retrieve the 205 citing articles.
*Actual count, 7/10/00
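The Boolean AND in a cited-author search amounts to intersecting the sets of articles that cite each author. A rough sketch in Python, assuming a hypothetical inverted index from cited-author name to citing-article IDs (this is neither Dialog nor BRS Search syntax, and the IDs are invented):

```python
def citing_articles(index, *authors):
    """Return IDs of articles that cite every author given (Boolean AND)."""
    sets = [index.get(a, set()) for a in authors]
    return set.intersection(*sets) if sets else set()

# Hypothetical index: cited author -> IDs of citing articles (invented values).
index = {
    "SHNEIDERMAN B": {101, 102, 103, 104},
    "CARD SK": {102, 103, 105},
    "SHAKESPEARE W": {103, 106},
}
both = citing_articles(index, "SHNEIDERMAN B", "CARD SK")
print(sorted(both))  # [102, 103]
```

The size of the intersection is the co-citation count for the pair.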
Underlying Database and Software
ISI gave our college 10 years’ worth of data from the Arts & Humanities Citation Index (AHCI 1988-1997) as a research grant. The data set has 1.26 million bibliographic records on articles and other items from humanities journals.
For retrievals from AHCI, we bought BRS Search, an industrial-strength engine, from Dataware, Inc.
Buzydlowski and Lin have written several special programs in Java and C to implement our system on top of the BRS Search software.
Our Project
Produces co-cited author maps in real time (a few seconds) on a Web site.
Low cognitive load: User merely has to enter name of a single author of interest as a “seed.”
• E.g., Dickinson-E for Emily Dickinson
System responds with the top authors co-cited with that seed—about 25 names ranked by frequency of co-occurrence.
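The seed-to-ranked-list step can be sketched as follows. This is an illustration in Python only, not the project's Java/C code on BRS Search; the function name `top_cocited` and the records are invented:

```python
from collections import Counter

def top_cocited(citing_records, seed, n=25):
    """Rank the authors most often cited together with the seed author."""
    counts = Counter()
    for cited in citing_records:
        authors = set(cited)
        if seed in authors:
            counts.update(authors - {seed})
    # Highest count first; ties broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

# Toy citing records (invented for illustration).
records = [
    ["DICKINSON E", "WHITMAN W", "EMERSON RW"],
    ["DICKINSON E", "WHITMAN W"],
    ["DICKINSON E", "EMERSON RW", "WHITMAN W"],
    ["DICKINSON E", "BIBLE"],
    ["WHITMAN W", "EMERSON RW"],
]
ranked = top_cocited(records, "DICKINSON E", n=3)
print(ranked)  # [('WHITMAN W', 3), ('EMERSON RW', 2), ('BIBLE', 1)]
```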
Quick Visualizations of a Database
User can choose to display the top 25 as either a Kohonen feature map (SOM, self-organizing map) or a Pathfinder network map (PFNET).
User can use either map as
• An aid to retrieving articles from AHCI 1988-97 that cite authors in various combinations. Combinations are made through drag-and-drop.
• Reproducible artwork in a new study, such as a review of a literature or a commentary on the author used as “seed.”
Maps in the Humanities
We are able to produce maps of authors in the humanities with high face validity.
• Can build maps around great names in literature, philosophy, history, religion, the fine arts. E.g., Dante, Picasso, D. H. Lawrence, Martin Luther, Edward Gibbon, Emily Dickinson, Plato, Vladimir Nabokov.
• Can also build maps around noted scholars, critics, or commentators. E.g., Simon Schama, Garry Wills, Elaine Showalter, Camille Paglia, Derek de Solla Price.
• System will work with authors in other ISI databases in the natural and social sciences. Also with other kinds of co-occurring terms: journal names, descriptors, etc.
Advantages of Maps
Ranked list of top 25 co-cited authors often contains names not previously known to user.
Both Kohonen maps and PFNETs show interconnections of the 25 authors not apparent in the one-dimensional ranking of a simple list.
Interpretation of Maps
Kohonen maps show high co-citation counts of authors by placing them closer in space.
PFNETs show highest co-citation counts of authors directly, as links between nodes bearing authors’ names. The counts themselves can be made to appear above the links.
Kohonen Feature Maps
Are a variety of neural network.
Are produced by an algorithm for unsupervised computer learning in which data points “compete” for the position on the output grid that best represents their numeric weights (co-citation counts) relative to all other points.
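A minimal self-organizing map sketch in Python, to make the competition step concrete. In the usual formulation the grid units compete to represent each input vector, and the winner and its grid neighbours move toward that input. Everything specific here is an assumption for illustration: the 4x4 grid, the linear decay schedules, the Gaussian neighbourhood, and the toy co-citation profiles. The project's own C procedures are not shown in the source.

```python
import math
import random

def best_unit(weights, v):
    """Grid position whose weight vector is closest (squared Euclidean) to v."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], v)))

def train_som(vectors, rows=4, cols=4, epochs=200, seed=0):
    """Train a small SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per grid unit, initialised at random in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        rate = 0.5 * (1.0 - frac)                               # learning rate decays
        radius = 1.0 + (max(rows, cols) / 2.0) * (1.0 - frac)   # neighbourhood shrinks
        for v in rng.sample(vectors, len(vectors)):             # shuffled pass over inputs
            win = best_unit(weights, v)                         # competition step
            for unit, w in weights.items():
                d2 = (unit[0] - win[0]) ** 2 + (unit[1] - win[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))         # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += rate * h * (v[k] - w[k])            # move toward the input
    return weights

# Toy co-citation profiles, scaled to [0, 1] (invented values, not AHCI data).
profiles = {
    "AUTHOR A": [1.0, 0.9, 0.1, 0.0],
    "AUTHOR B": [0.9, 1.0, 0.0, 0.1],
    "AUTHOR C": [0.1, 0.0, 1.0, 0.9],
    "AUTHOR D": [0.0, 0.1, 0.9, 1.0],
}
weights = train_som(list(profiles.values()))
positions = {name: best_unit(weights, v) for name, v in profiles.items()}
print(positions)  # grid cell assigned to each author
```

Authors with similar profiles should tend to land in the same or neighbouring cells, which is what places strongly co-cited authors close together on the map.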
PFNETs
Are algorithmically connected graphs based on finding the “minimum-cost” path between any two nodes.
In ACA, this is generally the highest single co-citation count between author pairs (all pairs are examined).
Results in useful simplification of graph.
Use spring embedder algorithm to produce layout.
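A sketch of the link-pruning idea in Python. It assumes the common Pathfinder setting r = ∞, q = n − 1 (the slides do not state the parameters) and works directly on co-citation counts as similarities: a direct link is kept only if no indirect path has a stronger weakest link. The function name and the toy matrix are invented, and layout by spring embedder is not covered here.

```python
def pathfinder_edges(names, counts):
    """Return retained links as (name_i, name_j, count) triples.

    counts is a symmetric matrix of co-citation counts (0 = never co-cited).
    """
    n = len(names)
    # best[i][j] = strongest "weakest link" over all paths from i to j.
    best = [row[:] for row in counts]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(best[i][k], best[k][j])
                if via_k > best[i][j]:
                    best[i][j] = via_k
    # Keep a direct link only if no indirect path beats it.
    return [(names[i], names[j], counts[i][j])
            for i in range(n) for j in range(i + 1, n)
            if counts[i][j] > 0 and counts[i][j] == best[i][j]]

# Toy co-citation matrix (invented values).
names = ["A", "B", "C", "D"]
counts = [
    [0, 10, 3, 1],
    [10, 0, 8, 0],
    [3, 8, 0, 5],
    [1, 0, 5, 0],
]
edges = pathfinder_edges(names, counts)
print(edges)  # [('A', 'B', 10), ('B', 'C', 8), ('C', 'D', 5)]
```

Here the A-C link (count 3) is dropped because the path A-B-C has a weakest link of 8, and A-D (count 1) is dropped because A-B-C-D has a weakest link of 5.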
PFNETs
Make sense as pictures of relations in databases!
Independent observers have found them highly intelligible:
• Xia Lin on Chinese philosophers
• Kate McCain on historians of science & technology
• Howard White on various literary figures and artists
Buzydlowski’s research will test interpretability of PFNETs and Kohonen maps as interfaces for domain experts and naïve users.
Interface Design Considerations
Link interface to valuable digital libraries (ISI citation databases and the journal literatures they lead to).
Focus on intellectual content: meaningful words, meaningfully presented.
Stress quick and flexible presentations over long-term displays.
Evidence We’re on Right Track
US Patent 6,038,574: “Method and Apparatus for Clustering Collection of Linked Documents Using Co-Citation Analysis”
Filed: March 18, 1998
Awarded: March 14, 2000
Inventors: James E. Pitkow, Peter L. Pirolli, Jock D. Mackinlay, Stuart K. Card, all of Xerox PARC
[Figure: PFNET of authors co-cited with F. Schleiermacher in AHCI, 1988-1997 (Biblical and literary hermeneutics). Node labels: SCHLEIERMACHER F, GADAMER HG, KANT I, HEGEL GWF, BARTH K, DILTHEY W, HEIDEGGER M, PLATO, BIBLE, ARISTOTLE, HABERMAS J, DERRIDA J, RICOEUR P, GOETHE JWV, BULTMANN R, FRANK M, NIETZSCHE F, TILLICH P, FICHTE JG, PANNENBERG W, TROELTSCH E, SCHELLING FWJ, SCHLEGEL FV, LUTHER M, EBELING G]
AuthorLink System Structure
[Diagram components: Web Interface (Java Applet); Web Server; cgi; Application Server (Java Servlets); Kohonen Mapping Procedures in C; PFNET Mapping Procedures in C; …….. Procedures; BRS Search Engine / ISI Data]