


To See, or Not to See—Is That the Query?

Robert R. Korfhage

Dept. of Information Science
University of Pittsburgh

1991

Reviewed by Yi-Bu Chen
LIS 551 Information Retrieval

March 23, 2004


Overview

Problems of traditional IR systems

Visual information retrieval system I -- GUIDO
  Underlying model
  How it works
  Problems and development

Visual information retrieval system II -- VIBE
  Underlying model
  How it works
  Problems and development

Conclusions


Problems of Traditional IR Systems

In responding to a user’s query, traditional IR systems return a sequential list of documents:

The documents may or may not be ranked by relevance.

Even if they are ranked, users do not know how the systems determine relevance.

By preventing users from viewing other documents, and with little or no relevance feedback, they hinder users’ query reformulation process.

The sequential list does not give users a clear and comprehensive view of all the documents retrieved.

Little or no effort is made to take user individuality into account.


The Questions Raised

How to present to users a complete view of the document space so that
  Users are not kept away from the documents the systems deem less relevant;
  Users are able to see significant relationships among the documents.

How to enable users to browse and navigate large document spaces with ease?

How to facilitate users’ query reformulation process?


GUIDO: Graphical User Interface for Data Organization

The underlying model -- the vector space model

A document collection is viewed as a multidimensional space whose dimensionality is determined by the number of vocabulary terms.

The query vectors are reference points called points of interest (POIs), against which each document is measured.

A document is represented by a point whose coordinates are its absolute distances from each POI.

Similarity of a document to a POI is measured by the absolute distance (a numerical value) between the two vectors: the smaller the distance, the more similar the document.
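The mapping from the vector space to GUIDO's distance space can be sketched as follows. This is a minimal illustration, not the system's actual code: the 3-term vocabulary, the weights, the function names, and the choice of the Euclidean metric are all assumptions made for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length term-weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_coordinates(doc, pois, metric=euclidean):
    """Map a document vector to its distance-space point: one distance per POI."""
    return tuple(metric(doc, poi) for poi in pois)

# Hypothetical 3-term vocabulary and weights, for illustration only.
poi_a = (1.0, 0.0, 0.0)   # query emphasising term 1
poi_b = (0.0, 1.0, 0.0)   # query emphasising term 2
doc = (1.0, 1.0, 0.0)     # document containing terms 1 and 2

coords = distance_coordinates(doc, [poi_a, poi_b])
print(coords)  # (1.0, 1.0)
```

With two POIs the document becomes a 2-D point, with three POIs a 3-D point, regardless of how many vocabulary terms the original vectors have.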


GUIDO: 1, 2, and 3 POIs

With a single POI (query), GUIDO displays documents in one dimension, just like many traditional IR systems.

With 2 POIs, the distance space (document space) is a half-infinite plank.

With 3 POIs, the distance space is a 3-dimensional prism. GUIDO is most useful with 2 or 3 POIs (reference points).

Fig. 1 The GUIDO space with 2 POIs

Fig. 2 The GUIDO space with 3 POIs
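The plank shape of the 2-POI space follows from the triangle inequality, assuming the distance measure is a true metric. The symbols are introduced here for illustration: $D$ is the distance between the two POIs, and $d_1$, $d_2$ are a document's distances to them.

```latex
\[
  |d_1 - d_2| \;\le\; D \;\le\; d_1 + d_2
\]
```

The left inequality confines every document to a strip of fixed width around the line $d_1 = d_2$. The right inequality closes the strip at one end ($d_1 + d_2 = D$, the documents lying "between" the POIs). Nothing bounds $d_1 + d_2$ from above, so the strip is open in the other direction, which gives the half-infinite plank.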


GUIDO: A 2-POI example

Document icons located lower in the plank have higher similarity to both POIs than those further out.

All documents with the same distances from both POIs appear at the same position.

Fig. 3 The GUIDO space with 2 POIs, with distances measured using 3 different metrics (the city block metric, the Euclidean metric, and the maximal distance metric).
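For reference, the three metrics named in Fig. 3 can be written out as below. The definitions are the standard ones; the function names and the sample vectors are illustrative assumptions.

```python
def city_block(u, v):
    """City block (Manhattan) metric: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Euclidean metric: straight-line distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def maximal(u, v):
    """Maximal distance (Chebyshev) metric: largest single coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))

# Hypothetical 2-term vectors, for illustration only.
doc = (3.0, 0.0)
poi = (0.0, 4.0)

print(city_block(doc, poi))  # 7.0
print(euclidean(doc, poi))   # 5.0
print(maximal(doc, poi))     # 4.0
```

The same document and POI yield three different distances, which is why the shape of the GUIDO space changes with the metric chosen.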


GUIDO: Browsing and Retrieval

Four models of combining distances for retrieval evaluation have been proposed.

Once a model is chosen, a cap is formed. Documents below the cap will be retrieved.

Placing the cap near the bottom of the plank sets a higher threshold for retrieval.

Fig. 4 The GUIDO space with 2 POIs, capped with four different retrieval models (disjunctive, conjunctive, elliptical, and Cassini oval).
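The slide names the four retrieval models without giving formulas. The sketch below is one plausible reading for the 2-POI case, assuming the usual geometric meanings: an ellipse bounds the sum of the two distances and a Cassini oval bounds their product, while the disjunctive and conjunctive models behave like OR and AND over the two distances. The function names and the threshold parameter `t` are hypothetical.

```python
def disjunctive(d1, d2, t):
    """Retrieve if the document is close enough to at least one POI (OR-like)."""
    return min(d1, d2) <= t

def conjunctive(d1, d2, t):
    """Retrieve if the document is close enough to both POIs (AND-like)."""
    return max(d1, d2) <= t

def elliptical(d1, d2, t):
    """Retrieve if the sum of the two distances is within the threshold."""
    return d1 + d2 <= t

def cassini(d1, d2, t):
    """Retrieve if the product of the two distances is within the threshold."""
    return d1 * d2 <= t

# A document close to one POI and far from the other (illustrative values).
d1, d2, t = 1.0, 4.0, 3.0
print(disjunctive(d1, d2, t))  # True
print(conjunctive(d1, d2, t))  # False
print(elliptical(d1, d2, t))   # False
print(cassini(d1, d2, t))      # False
```

Each function describes the region "below the cap" for its model. Lowering `t` moves the cap toward the bottom of the plank and retrieves fewer documents. Note that `t` is not on a comparable scale across models (a sum versus a product of distances), so in practice each model would need its own threshold.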


GUIDO: Redefining the Display

Documents unrelated to the POIs are hidden.

By changing POIs, new document spaces can be generated (query reformulation).

Four ways of creating new POIs:
  Modify the weights of the terms in an existing POI
  Combine given POIs into a new one
  Select a particular document to be a POI
  Calculate a new POI from an interesting cluster of the documents by a simple cluster analysis
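The four ways of creating a new POI can be sketched on plain term-weight vectors. The slide does not say how POIs are combined or how a cluster is summarised, so averaging and the cluster centroid are used here as assumptions; all function names are hypothetical.

```python
def reweight(poi, factors):
    """1. Modify term weights of an existing POI by per-term factors."""
    return tuple(w * f for w, f in zip(poi, factors))

def combine(pois):
    """2. Combine given POIs into a new one (assumed here: term-wise average)."""
    n = len(pois)
    return tuple(sum(ws) / n for ws in zip(*pois))

def from_document(doc):
    """3. Use a particular document's vector directly as a POI."""
    return tuple(doc)

def from_cluster(docs):
    """4. Derive a POI from a cluster of documents (assumed here: the centroid)."""
    return combine(docs)

# Hypothetical 2-term vectors, for illustration only.
poi_a, poi_b = (1.0, 0.0), (0.0, 1.0)

print(reweight(poi_a, (2.0, 1.0)))               # (2.0, 0.0)
print(combine([poi_a, poi_b]))                   # (0.5, 0.5)
print(from_document([0.3, 0.7]))                 # (0.3, 0.7)
print(from_cluster([(1.0, 1.0), (3.0, 1.0)]))    # (2.0, 1.0)
```

Whichever way the new POI is produced, every document's distances are then recomputed against it, which regenerates the display.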


GUIDO: Problems and Future Directions

Loss of information in changing from the document space model to the distance space model.
  Document space model: only documents with identical descriptors are mapped into the same point.
  Distance space model: any documents whose distances to the various POIs are the same are mapped into the same point.

Inter-document distance is not represented.

Integrate GUIDO with other visual interfaces (such as BIRD, which is based on the Boolean model) to create a more complete document handling system.
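A small constructed example makes the information loss concrete. The 4-term vectors below are made up for illustration, and the Euclidean metric (via the standard library's `math.dist`, Python 3.8+) is an assumption.

```python
import math

# Hypothetical 4-term vocabulary; each vector has weight on a single term.
poi_1 = (1, 0, 0, 0)
poi_2 = (0, 1, 0, 0)
doc_a = (0, 0, 1, 0)   # about term 3 only
doc_b = (0, 0, 0, 1)   # about term 4 only

def to_distance_space(doc, pois):
    """Distance-space point of a document: its distance to each POI."""
    return tuple(math.dist(doc, p) for p in pois)

point_a = to_distance_space(doc_a, [poi_1, poi_2])
point_b = to_distance_space(doc_b, [poi_1, poi_2])

print(point_a == point_b)            # True: both icons land on the same spot
print(math.dist(doc_a, doc_b) > 0)   # True: yet the documents differ
```

The two documents share no terms, but the display cannot tell them apart, and the nonzero distance between them appears nowhere in the distance space.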


VIBE: VIsualization By Example

Developed by Olsen et al. (1991) on the basis of the same distance paradigm as GUIDO, with further reduction of dimensionality.

A document collection is viewed as a TWO dimensional space.

The POIs can be positioned anywhere (NOT fixed as in GUIDO), and users can define as many POIs as wanted.

A document is represented by a point whose coordinates are the RATIOS of its distances from each POI (NOT the absolute distances as used in GUIDO).
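One way to realise ratio-based placement is to put each document at the weighted average of the POI screen positions. This is a sketch under assumptions, not VIBE's documented algorithm: it assumes every document has a non-negative strength score per POI (higher means more related), and the function name and sample values are hypothetical.

```python
def vibe_position(strengths, poi_positions):
    """Place a document at the strength-weighted average of the POI positions.

    Only the ratios between strengths matter, not their absolute size.
    Returns None for a document unrelated to every POI.
    """
    total = sum(strengths)
    if total == 0:
        return None
    x = sum(s * px for s, (px, _) in zip(strengths, poi_positions)) / total
    y = sum(s * py for s, (_, py) in zip(strengths, poi_positions)) / total
    return (x, y)

# Three POIs placed freely on a 2-D screen (illustrative coordinates).
pois = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

print(vibe_position((1.0, 1.0, 2.0), pois))  # (1.0, 2.0)
print(vibe_position((2.0, 2.0, 4.0), pois))  # (1.0, 2.0)
```

The second call doubles every strength yet lands on the same point, since only the ratios are used. Moving a POI on screen changes `poi_positions` and so shifts every document related to it, which is what lets users rearrange the display interactively.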


VIBE: An Example


GUIDO Vs. VIBE

GUIDO:
  POIs are fixed in position; 2 or 3 are the most useful.
  POIs can be modified.
  Document position is determined by the absolute distances to each POI.
  Position of a document indicates the strength of each POI with respect to the document.
  Loss of information, as different documents with the same distances to each POI are mapped to the same point.

VIBE:
  POIs are NOT fixed in position; 3 or more are the most useful.
  POIs can be modified AND moved.
  Document position is determined by the ratios of the distances to each POI.
  Position of a document indicates the relative strength of each POI with respect to the document.
  Icon size is proportional to the strength of the most significant POI.
  Further loss of information, as different documents with the same ratios of distances to each POI are mapped to the same point.


VIBE: Recent development


Conclusions

Both GUIDO and VIBE give users full visibility of a large document set and allow them to control the retrieval dynamically.

Both GUIDO and VIBE let users modify POIs and immediately see the impact of such modifications on the document space, which greatly facilitates the query reformulation process.

Unlike traditional IR systems, GUIDO and VIBE enable users to browse the database and judge the relevance of a document before retrieving it.

Limitations in the displays will need to be resolved before GUIDO and VIBE can go mainstream.


Additional Readings

Korfhage RR and Olsen KA, 1991. Information display: control of visual representations. IEEE.

Nuchprayoon A and Korfhage RR, 1994. GUIDO, a visual tool for retrieving documents. IEEE.

Korfhage RR and Nuchprayoon A, 1997. GUIDO: visualizing document retrieval. IEEE.

Morse E, Lewis M, and Olsen KA. Testing Visual Information Retrieval Methodologies Case Study: Comparative Analysis Of Textual, Icon, Graphical And “Spring” Displays. Manuscript on the web.
