To See, or Not to See—Is That the Query?
Robert R. Korfhage
Dept. of Information Science, University of Pittsburgh
1991
Reviewed by Yi-Bu Chen, LIS 551 Information Retrieval
March 23, 2004
Overview
Problems of traditional IR systems
Visual information retrieval system I -- GUIDO
  Underlying model
  How it works
  Problems and development
Visual information retrieval system II -- VIBE
  Underlying model
  How it works
  Problems and development
Conclusions
Problems of Traditional IR Systems
In responding to a user’s query, traditional IR systems return a sequential list of documents:
  The documents may or may not be ranked by relevance.
  Even if they are ranked, users do not know how the systems determine relevance.
  By keeping users from viewing other documents, and offering little or no relevance feedback, such lists hinder users’ query reformulation.
  The sequential list does not give users a clear and comprehensive view of all the documents retrieved.
  Little or no effort is made to take user individuality into account.
The Questions Raised
How to present to users a complete view of the document space so that
  users are not kept away from the documents the systems deem less relevant;
  users are able to see significant relationships among the documents.
How to enable users to browse and navigate large document spaces with ease?
How to facilitate users’ query reformulation process?
GUIDO: Graphical User Interface for Data Organization
The underlying model -- the vector space model
A document collection is viewed as a multidimensional space whose dimensionality is determined by the number of vocabulary terms.
Query vectors serve as reference points called points of interest (POIs), against which each document is measured.
A document is represented by a point whose coordinates are its absolute distances from each POI.
The similarity of a document to a POI is measured by the absolute distance (a numerical value) between the two vectors.
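The mapping from term space to distance space can be sketched as follows. This is a minimal illustration, not GUIDO's actual code: the three-term vocabulary, the weights, the choice of the Euclidean metric, and the function names are all assumptions made for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length term-weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_coordinates(doc, pois, metric=euclidean):
    """Map a document vector to its tuple of distances, one per POI."""
    return tuple(metric(doc, poi) for poi in pois)

# Hypothetical 3-term vocabulary; the weights are illustrative only.
pois = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
doc = (1.0, 1.0, 0.0)

coords = distance_coordinates(doc, pois)
print(coords)  # one coordinate per POI
```

With two POIs the document is reduced to two numbers, whatever the size of the vocabulary.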
GUIDO: 1, 2, and 3 POIs
With a single POI (query), GUIDO displays documents in one dimension, just like many traditional IR systems.
With 2 POIs, the distance space (document space) is a half-infinite plank.
With 3 POIs, the distance space is a 3-dimensional prism. GUIDO is most useful with 2 or 3 POIs (reference points).
Fig. 1 The GUIDO space with 2 POIs
Fig. 2 The GUIDO space with 3 POIs
GUIDO: A 2-POI example
Document icons located lower in the plank have higher similarity to both POIs than those further out.
All documents having the same distances from both POIs appear at the same single position.
Fig. 3 The GUIDO space with 2 POIs, with distances measured using 3 different metrics (the city block metric, the Euclidean metric, and the maximal distance metric).
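The three metrics named in Fig. 3 have standard definitions, sketched below for a single document and POI. The two-term vectors and function names are illustrative assumptions.

```python
import math

def city_block(u, v):
    """City block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Euclidean (L2) distance: straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def maximal(u, v):
    """Maximal (L-infinity) distance: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))

# Hypothetical document and POI vectors over a 2-term vocabulary.
doc = (3.0, 4.0)
poi = (0.0, 0.0)

for metric in (city_block, euclidean, maximal):
    print(metric.__name__, metric(doc, poi))
```

The same document sits at a different distance from the POI under each metric, which is why the shape of the displayed space changes with the metric chosen.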
GUIDO: Browsing and Retrieval
Four models of combining distances for retrieval evaluation have been proposed.
Once a model is chosen, a cap is formed. Documents below the cap will be retrieved.
Placing the cap near the bottom of the plank sets a stricter threshold for retrieval, so fewer documents are retrieved.
Fig. 4 The GUIDO space with 2 POIs, capped with four different retrieval models (disjunctive, conjunctive, elliptical, and Cassini oval).
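The slides name the four retrieval models but do not give their formulas. The sketch below assumes the usual geometric readings for two POIs: disjunctive as "close to either POI", conjunctive as "close to both", elliptical as a bound on the sum of the two distances, and Cassini oval as a bound on their product. The function names and thresholds are hypothetical.

```python
def disjunctive(d1, d2, t):
    """Retrieve if the document is within t of at least one POI (assumed form)."""
    return min(d1, d2) <= t

def conjunctive(d1, d2, t):
    """Retrieve if the document is within t of both POIs (assumed form)."""
    return max(d1, d2) <= t

def elliptical(d1, d2, t):
    """Retrieve if the sum of the two distances is at most t (assumed form)."""
    return d1 + d2 <= t

def cassini(d1, d2, t):
    """Retrieve if the product of the two distances is at most t (assumed form)."""
    return d1 * d2 <= t

# A document at distance 1.0 from the first POI and 3.0 from the second.
d1, d2 = 1.0, 3.0
print(disjunctive(d1, d2, 2.0))  # near the first POI
print(conjunctive(d1, d2, 2.0))  # too far from the second POI
print(elliptical(d1, d2, 4.0))
print(cassini(d1, d2, 2.5))
```

Lowering `t` in any of these functions corresponds to moving the cap toward the bottom of the plank.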
GUIDO: Redefining the Display
Documents unrelated to the POIs are hidden.
By changing POIs, new document spaces can be generated (query reformulation).
Four ways of creating new POIs:
  Modify the weights of the terms in an existing POI
  Combine given POIs into a new one
  Select a particular document to be a POI
  Calculate a new POI from an interesting cluster of documents by a simple cluster analysis
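The four ways of creating a POI can be sketched on plain term-weight vectors. The slides do not say how POIs are combined or how a cluster is summarized, so averaging and the centroid are assumptions here, as are all function names.

```python
def reweight(poi, factors):
    """Scale each term weight of an existing POI by a per-term factor."""
    return tuple(w * f for w, f in zip(poi, factors))

def combine(pois):
    """Merge several POIs into one by averaging term weights (assumed rule)."""
    n = len(pois)
    return tuple(sum(ws) / n for ws in zip(*pois))

def from_document(doc):
    """Promote a document vector to a POI unchanged."""
    return tuple(doc)

def from_cluster(cluster):
    """Summarize a cluster of documents by its centroid (assumed analysis)."""
    return combine(cluster)

# Hypothetical 2-term vectors.
print(reweight((1.0, 2.0), (2.0, 0.5)))
print(combine([(1.0, 0.0), (0.0, 1.0)]))
print(from_document([0.3, 0.7]))
print(from_cluster([(2.0, 0.0), (4.0, 2.0), (0.0, 4.0)]))
```

Each result is an ordinary vector, so it can be used as a reference point in exactly the same way as a query.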
GUIDO: Problems and Future Directions
Loss of information in changing from the document space model to the distance space model.
  Document space model: only documents with identical descriptors are mapped to the same point.
  Distance space model: any documents whose distances to the various POIs are the same are mapped to the same point.
Inter-document distance is not represented.
Integrate GUIDO with other visual interfaces (such as BIRD, which is based on the Boolean model) to create a more complete document handling system.
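A small example makes the information loss concrete. The vectors below are hypothetical and the Euclidean metric is assumed: two documents with different descriptors are equidistant from both POIs, so they land on the same display point even though they are not close to each other.

```python
import math

# Two POIs and two different documents over a hypothetical 3-term vocabulary.
pois = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
doc_a = (0.0, 1.0, 0.0)
doc_b = (0.0, 0.0, 1.0)

def distance_coordinates(doc, pois):
    """Distances from a document to each POI (Euclidean, via math.dist)."""
    return tuple(math.dist(doc, poi) for poi in pois)

coords_a = distance_coordinates(doc_a, pois)
coords_b = distance_coordinates(doc_b, pois)

print(coords_a == coords_b)       # same point in distance space
print(math.dist(doc_a, doc_b))    # yet the documents are not identical
```

`math.dist` requires Python 3.8 or later.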
VIBE: VIsualization By Example
Developed by Olsen et al. (1991) on the basis of the same distance paradigm as GUIDO, with further reduction of dimensionality.
A document collection is viewed as a TWO-dimensional space.
The POIs can be positioned anywhere (NOT fixed as in GUIDO), and users can define as many POIs as they want.
A document is represented by a point whose coordinates are the RATIOS of its distances from each POI (NOT the absolute distances as used in GUIDO).
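The slides do not spell out the placement formula. One reading that depends only on ratios is sketched below: each document has a non-negative strength score per POI, and its icon is placed at the score-weighted average of the POI screen positions. The scores, positions, and function name are assumptions for illustration, not VIBE's documented algorithm.

```python
def vibe_position(scores, poi_positions):
    """Place a document at the score-weighted average of the POI positions.

    Returns None when the document has no strength for any POI.
    """
    total = sum(scores)
    if total == 0:
        return None
    x = sum(s * px for s, (px, _) in zip(scores, poi_positions)) / total
    y = sum(s * py for s, (_, py) in zip(scores, poi_positions)) / total
    return (x, y)

# Three POIs placed freely on a hypothetical 2-D screen.
poi_positions = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
scores = (1.0, 1.0, 2.0)  # hypothetical strengths, one per POI

pos = vibe_position(scores, poi_positions)
print(pos)  # pulled toward the third POI, which has the largest score
```

Moving a POI only changes its entry in `poi_positions`; every document position can then be recomputed from the same scores.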
VIBE: An Example
GUIDO vs. VIBE
GUIDO:
  POIs are fixed; 2-3 are the most useful.
  POIs can be modified.
  Document position is determined by the absolute distances to each POI.
  Position of a document indicates the strength of each POI with respect to the document.
  Loss of information, as different documents with the same distances to each POI are mapped to the same point.
VIBE:
  POIs are NOT fixed; 3 or more are the most useful.
  POIs can be modified AND moved.
  Document position is determined by the ratios of the distances to each POI.
  Position of a document indicates the relative strength of each POI with respect to the document.
  Icon size is proportional to the strength of the most significant POI.
  Further loss of information, as different documents with the same ratios of the distances to each POI are mapped to the same point.
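The further loss of information can be illustrated with two POIs on a line. The sketch assumes a ratio-based placement (a score-weighted average of POI positions) and an icon size equal to the largest score; both rules and all values are hypothetical.

```python
def place(scores, poi_xs):
    """Score-weighted average of POI x-positions (assumed ratio-based rule)."""
    return sum(s * x for s, x in zip(scores, poi_xs)) / sum(scores)

def icon_size(scores):
    """Icon size taken as the strength of the most significant POI (assumed)."""
    return max(scores)

poi_xs = (0.0, 3.0)      # two POIs on a hypothetical horizontal axis
weak = (1.0, 2.0)        # document weakly related to both POIs
strong = (2.0, 4.0)      # document twice as strongly related, same ratio

print(place(weak, poi_xs), place(strong, poi_xs))   # identical positions
print(icon_size(weak), icon_size(strong))           # sizes still differ
```

Under these assumptions, icon size recovers part of what the ratio-based position discards.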
VIBE: Recent development
Conclusions
Both GUIDO and VIBE give users full visibility of a large document set and allow them to control retrieval dynamically.
Both GUIDO and VIBE let users modify POIs and instantly see the effect of those modifications on the document space, which greatly facilitates query reformulation.
Unlike the traditional IR systems, GUIDO and VIBE enable users to browse the database and determine the relevance of a document before its retrieval.
Limitations in displays will need to be resolved before GUIDO and VIBE can go mainstream.
Additional Readings
Korfhage RR and Olsen KA, 1991. Information display: control of visual representations. IEEE.
Nuchprayoon A and Korfhage RR, 1994. GUIDO, a visual tool for retrieving documents. IEEE.
Korfhage RR and Nuchprayoon A, 1997. GUIDO: visualizing document retrieval. IEEE.
Morse E, Lewis M, and Olsen KA. Testing Visual Information Retrieval Methodologies Case Study: Comparative Analysis Of Textual, Icon, Graphical And “Spring” Displays. Manuscript on the web.