Slides
• Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt or www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf
Interactions
LBSC 796/CMSC 838o
Daqing He, Douglas W. Oard
Session 5, March 8, 2004
Agenda
• Interactions in retrieval systems
• Query formulation
• Selection
• Examination
• Document delivery
System-Oriented Retrieval Model
[Diagram: Acquisition builds the Collection, Indexing builds the Index, and Search matches a Query against the Index to produce a Ranked List.]
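The stages in this diagram can be sketched in a few lines of Python. This is a toy illustration only; the function names, the IDF scoring scheme, and the two-document collection are my own, not from the slides.

from collections import defaultdict
from math import log

def build_index(collection):
    """Indexing: map each term to the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(query, index, n_docs):
    """Search: score documents by summed IDF of matched query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, set())
        if postings:
            idf = log(n_docs / len(postings))
            for doc_id in postings:
                scores[doc_id] += idf
    # Ranked list: best-scoring documents first
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Acquisition: a toy two-document collection
collection = {1: "information retrieval systems", 2: "library systems"}
index = build_index(collection)
print(search("information systems", index, len(collection)))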
Whose Process Is It?
• Who initiates a search process?
• Who controls the progress?
• Who ends a search process?
User-Oriented Retrieval Model
[Diagram: the User drives Source Selection and Query Formulation; the Query goes to Search, which consults the Index to return a Ranked List; the User then performs Document Selection, Document Examination, and Document Delivery. Inside the IR System, Collection Acquisition builds the Collection and Collection Indexing builds the Index.]
Taylor’s Conceptual Framework
• Four levels of “information needs”
  – Visceral: what you really want to know
  – Conscious: what you recognize that you want to know
  – Formalized (e.g., TREC topics): how you articulate what you want to know
  – Compromised (e.g., TREC queries): how you express what you want to know to a system
[Taylor 68]
Belkin’s ASK model
• Users are concerned with a problem
• But do not clearly understand
  – the problem itself
  – the information need to solve the problem
• This Anomalous State of Knowledge (ASK) calls for a clarification process to form a query
[Belkin 80, Belkin, Oddy, Brooks 82]
What are humans good at?
• Sense low level stimuli
• Recognize patterns
• Reason inductively
• Communicate with multiple channels
• Apply multiple strategies
• Adapt to changes or unexpected events
From Ben Shneiderman’s “Designing the User Interface”
What are computers good at?
• Sense stimuli outside the human range
• Calculate quickly and mechanically
• Store large quantities of data and recall it accurately
• Respond rapidly and consistently
• Perform repetitive actions reliably
• Maintain performance under heavy load and over extended periods
From Ben Shneiderman’s “Designing the User Interface”
What Should Interaction Be?
• Synergistic
  – Humans do the things that humans are good at
  – Computers do the things that computers are good at
  – The strength of one covers the weakness of the other
Source Selection
• People have their own preferences
• Different tasks require different sources
• Possible choices
  – ask for help from people or from machines
  – browsing, searching, or a combination
  – general-purpose vs. domain-specific IR systems
  – different collections
Query Formulation
[Diagram: the User performs Query Formulation to produce a Query; Search matches the Query against the Index built by Collection Indexing.]
User’s Goals
• Identify the right query for the current need
  – conscious/formalized need => compromised need
• How can the user achieve this goal?
  – Infer the right query terms
  – Infer the right composition of those terms
System’s Goals
• Help the user to
  – build links between needs
  – learn more about the system and the collection
How does System Achieve Its Goals?
• Ask more from the user
  – Encourage long/complex queries
    • Provide a large text-entry area
    • Use form filling or direct manipulation
  – Initiate interactions
    • Ask questions related to the needs
    • Engage in a dialogue with the user
• Infer from relevant items
  – Infer from previous queries
  – Infer from previously retrieved documents
Query Formulation Interaction Styles
• Shneiderman 97
  – Command Language
  – Form Fill-in
  – Menu Selection
  – Direct Manipulation
  – Natural Language
Credit: Marti Hearst
Form-Based Query Specification (Melvyl)
Credit: Marti Hearst
Form-based Query Specification (Infoseek)
Credit: Marti Hearst
Direct Manipulation Spec.: VQUERY (Jones 98)
Credit: Marti Hearst
High-Accuracy Retrieval of Documents
[Diagram: a Topic Statement goes to the Search Engine, producing Baseline Results; the system poses Clarification Questions, and the Answers to Clarification Questions are used to produce the final HARD Results.]
UMD HARD 2003 retrieval model
[Diagram of the HARD retrieval process: Clarification Questions elicit preferences among subtopic areas, recently viewed relevant documents, preferences for sub-collections or genres, and desired result formats; these drive Query Expansion, Document Reranking, Passage Retrieval, and Ranked List Merging to produce a Refined Ranked List.]
[He & Demner, 2003]
Dialogues in Need Negotiation
[Diagram: an Information Need is (1) formulated as a query, (2) negotiated in a dialogue with the Search Engine, and (3) matched against the Document Collection to produce Search Results.]
Personalization through User’s Search Contexts
[Diagram: an Incremental Learner observes the user’s search contexts (e.g., “African Queen” and “Casablanca” both under “Romantic Films”) and feeds what it learns to the Information Retrieval System to personalize results.]
[Goker & He, 2000]
Things That Hurt
• Obscure ranking methods
  – Unpredictable effects of adding or deleting terms
    • Only single-term queries avoid this problem
• Counterintuitive statistics
  – “clis”: AltaVista says 3,882 docs match the query
  – “clis library”: 27,025 docs match the query!
    • Every document with either term was counted
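The inflated count arises because the engine reported documents matching either term (OR semantics) rather than both. A toy illustration with invented documents:

docs = {
    1: {"clis", "library"},
    2: {"clis"},
    3: {"library"},
}
query = {"clis", "library"}

or_count  = sum(1 for terms in docs.values() if query & terms)   # any term matches
and_count = sum(1 for terms in docs.values() if query <= terms)  # all terms match

print(or_count, and_count)  # 3 1 -- the OR count grows as terms are added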
Browsing Retrieved Set
[Diagram: the User iterates between Query Formulation/Reformulation, Search over the Query, Document Selection/Reselection from the Ranked List, and Document Examination.]
Indicative vs. Informative
• Terms often applied to document abstracts
  – Indicative abstracts support selection
    • They describe the contents of a document
  – Informative abstracts support understanding
    • They summarize the contents of a document
• Applies to any information presentation
  – Content can be presented for indicative or informative purposes
User’s Browsing Goals
• Identify documents for some form of delivery
  – An indicative purpose
• Query enrichment
  – Relevance feedback (indicative)
    • User designates “more like this” documents
    • System adds terms from those documents to the query (see the sketch below)
  – Manual reformulation (informative)
    • Better approximation of the visceral information need
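The slides don’t name a specific feedback method; a common choice is Rocchio’s formula, which moves the query vector toward the centroid of the designated documents. A minimal sketch, with conventional but arbitrarily chosen weights:

from collections import Counter

def rocchio(query_vec, relevant_docs, alpha=1.0, beta=0.75):
    """Move the query toward the centroid of 'more like this' documents."""
    new_q = Counter({t: alpha * w for t, w in query_vec.items()})
    for doc_vec in relevant_docs:
        for t, w in doc_vec.items():
            new_q[t] += beta * w / len(relevant_docs)
    return new_q

q = Counter({"jaguar": 1.0})
fed_back = rocchio(q, [Counter({"jaguar": 2, "car": 3}),
                       Counter({"car": 1, "engine": 2})])
print(fed_back.most_common(3))  # the query now also weights "car" and "engine"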
System’s Goals
• Assist the user to
  – identify relevant documents
  – identify potentially useful terms
    • for clarifying the right information need
    • for generating better queries
Browsing Retrieved Set
[Diagram repeated: the query formulation, search, document selection, and document examination loop.]
A Selection Interface Taxonomy
• One-dimensional lists
  – Content: title, source, date, summary, ratings, ...
  – Order: retrieval status value, date, alphabetic, ...
  – Size: scrolling, specified number, RSV threshold
• Two-dimensional displays
  – Construction: clustering, starfields, projection
  – Navigation: jump, pan, zoom
• Three-dimensional displays
  – Contour maps, fishtank VR, immersive VR
Extraction-Based Summarization
• Robust technique, though the extracted summaries can be disfluent
• Four broad types:
  – Single-document vs. multi-document
  – Term-oriented vs. sentence-oriented
• Combination of evidence for selection:
  – Salience: similarity to the query
  – Selectivity: IDF or chi-squared
  – Emphasis: title, first sentence
• For multi-document summaries, suppress duplication
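A sketch of how those evidence sources might be combined to score sentences for extraction; the weights, the helper names, and the toy data are my own assumptions:

def score_sentence(sentence, position, query, idf, w=(1.0, 0.5, 0.3)):
    """Combine salience, selectivity, and emphasis into one score."""
    terms = sentence.lower().split()
    salience = sum(1 for t in terms if t in query)        # overlap with the query
    selectivity = sum(idf.get(t, 0.0) for t in terms)     # rare terms count more
    emphasis = 1.0 if position == 0 else 0.0              # first-sentence bonus
    return w[0] * salience + w[1] * selectivity + w[2] * emphasis

def summarize(sentences, query, idf, k=2):
    scored = sorted(enumerate(sentences),
                    key=lambda p: -score_sentence(p[1], p[0], query, idf))
    keep = sorted(i for i, _ in scored[:k])   # restore document order
    return [sentences[i] for i in keep]

idf = {"exotic": 2.0}
sents = ["Jaguar sales rose.", "The exotic cat hunts at night.", "Weather was mild."]
print(summarize(sents, {"jaguar"}, idf, k=1))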
Generated Summaries
• Fluent summaries for a specific domain
• Define a knowledge structure for the domain
  – Frames are commonly used
• Analysis: process documents to fill the structure
  – Studied separately as “information extraction”
• Compression: select which facts to retain
• Generation: create fluent summaries
  – Templates for initial candidates
  – Use a language model to select among alternatives
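A minimal frame-and-template sketch of these steps; the frame fields, facts, and template wording are invented for illustration, and the language-model selection step is omitted:

# Analysis would fill a domain frame like this via information extraction
frame = {"event": "acquisition", "buyer": "AlphaCo", "target": "BetaInc", "price": "$2B"}

# Compression: keep only the facts worth reporting
facts = {k: v for k, v in frame.items() if k in ("buyer", "target", "price")}

# Generation: instantiate a fluent template from the retained facts
template = "{buyer} acquired {target} for {price}."
print(template.format(**facts))  # AlphaCo acquired BetaInc for $2B.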
Google’s KWIC Summary
• For the query “University of Maryland College Park”
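A keyword-in-context (KWIC) snippet can be built by centering a window on each query-term hit. This is a simplified sketch (Google’s actual snippet algorithm is not public); the window size is an arbitrary choice:

def kwic_snippet(text, query_terms, window=4):
    """Return short contexts around each query-term occurrence."""
    words = text.split()
    snippets = []
    for i, w in enumerate(words):
        if w.lower().strip(".,") in query_terms:
            lo, hi = max(0, i - window), i + window + 1
            snippets.append("... " + " ".join(words[lo:hi]) + " ...")
    return snippets

text = "The University of Maryland College Park campus is in College Park, Maryland."
print(kwic_snippet(text, {"maryland"}))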
Teoma’s Query Refine Suggestions
url: www.teoma.com
Vivisimo’s Clustering Results
url: vivisimo.com
Kartoo’s Cluster Visualization
url: kartoo.com
Cluster Formation
• Based on inter-document similarity
  – Computed using the cosine measure, for example
• Heuristic methods can be fairly efficient
  – Pick any document as the first cluster “seed”
  – Add the most similar document to each cluster
    • Adding the same document to two clusters joins them
  – Check whether each cluster should be split
    • Does it contain two or more fairly coherent groups?
• Lots of variations on this have been tried
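One common single-pass variant of this heuristic, using cosine similarity; the join and split steps are omitted, and the 0.3 threshold is an arbitrary choice:

from math import sqrt

def cosine(a, b):
    num = sum(a[t] * b.get(t, 0.0) for t in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def single_pass_cluster(doc_vectors, threshold=0.3):
    """Assign each document to the most similar seed, or start a new cluster."""
    clusters = []   # list of (seed vector, [doc ids])
    for doc_id, vec in doc_vectors.items():
        sims = [(cosine(vec, seed), i) for i, (seed, _) in enumerate(clusters)]
        best = max(sims, default=(0.0, -1))
        if best[0] >= threshold:
            clusters[best[1]][1].append(doc_id)
        else:
            clusters.append((vec, [doc_id]))  # this document seeds a new cluster
    return [members for _, members in clusters]

docs = {1: {"cat": 1, "pet": 1}, 2: {"cat": 1}, 3: {"stock": 2}}
print(single_pass_cluster(docs))  # [[1, 2], [3]]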
Starfield
• Dynamic queries: IVEE/Spotfire/FilmFinder (Ahlberg & Shneiderman 93)
Constructing Starfield Displays
• Two attributes determine the position
  – Can be dynamically selected from a list
• Numeric position attributes work best
  – Date, length, rating, ...
• Other attributes can affect the display
  – Displayed as color, size, shape, orientation, ...
  – Interactively specified using “dynamic queries”
• Each point can represent a cluster
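A starfield is essentially a filtered scatterplot over two selected attributes. A matplotlib sketch; the film records are invented, and a real starfield would attach interactive sliders to the dynamic-query filter:

import matplotlib.pyplot as plt

films = [  # hypothetical records: (title, year, length_minutes, rating)
    ("Casablanca", 1942, 102, 8.5),
    ("The African Queen", 1951, 105, 7.7),
    ("Jaws", 1975, 124, 8.1),
]

# Dynamic query: a slider would adjust this filter interactively
min_rating = 8.0
kept = [f for f in films if f[3] >= min_rating]

# Position = (year, length); a third attribute (rating) drives point size
xs, ys, sizes = zip(*[(y, l, r * 10) for _, y, l, r in kept])
plt.scatter(xs, ys, s=sizes)
plt.xlabel("year"); plt.ylabel("length (minutes)")
plt.show()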
Projection
• Depict many numeric attributes in two dimensions
  – While preserving important spatial relationships
• Typically based on the vector space model
  – Which has about 100,000 numeric attributes!
• Approximates multidimensional scaling
  – Heuristic approaches are reasonably fast
• Often visualized as a starfield
  – But the dimensions lack any particular meaning
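One common fast heuristic for such a projection is truncated SVD of the document-term matrix, keeping the two strongest dimensions as plot coordinates. A NumPy sketch with an invented matrix; a real collection would use sparse matrices:

import numpy as np

# Hypothetical document-term matrix: rows = documents, columns = terms
X = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 3.0],
])

# Truncated SVD of the centered matrix: keep two dimensions
U, S, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
coords = U[:, :2] * S[:2]   # one (x, y) point per document
print(coords)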
Contour Map Displays
• Display cluster density as terrain elevation
  – Fit a smooth opaque surface to the data
• Visualize in three dimensions
  – Project to 2-D and allow manipulation
  – Use stereo glasses to create a virtual “fishtank”
  – Create an immersive virtual reality experience
    • Head-mounted stereo monitors and head tracking
    • “Cave” with wall projection and body tracking
ThemeView
Credit to: Pacific Northwest National Laboratory
Browsing Retrieved Set
[Diagram repeated: the query formulation, search, document selection, and document examination loop.]
Full-Text Examination Interfaces
• Most use scroll and/or jump navigation
  – Some experiments with zooming
• Long documents need special features
  – A “best passage” function helps users get started
    • Overlapping 300-word passages work well
  – A “next search term” function facilitates browsing
• Integrated functions for relevance feedback
  – Passage selection, query term weighting, ...
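A “best passage” function can slide overlapping fixed-size windows over the document and score each against the query. The 300-word window follows the slide; the 50%-overlap stride and the simple term-overlap score are my own assumptions:

def best_passage(text, query_terms, size=300, stride=150):
    """Return the overlapping window with the most query-term hits."""
    words = text.split()
    best, best_score = words[:size], -1
    for start in range(0, max(1, len(words) - size + 1), stride):
        window = words[start:start + size]
        score = sum(1 for w in window if w.lower() in query_terms)
        if score > best_score:
            best, best_score = window, score
    return " ".join(best)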
A Long Document
Document lens
Robertson & Mackinlay, UIST'93, Atlanta, 1993
TileBar
[Hearst et al 95]
SeeSoft
[Eick 94]
Things That Help
• Show the query in the selection interface
  – It provides context for the display
• Explain what the system has done
  – It is hard to control a tool you don’t understand
  – Highlight search terms, for example
• Complement what the system has done
  – Users add value by doing things the system can’t
  – Expose the information users need to judge utility
Document Delivery
[Diagram: the User moves from Document Examination to Document Delivery of the chosen Document.]
Delivery Modalities
• On-screen viewing
  – Good for hypertext, multimedia, cut-and-paste, ...
• Printing
  – Better resolution, portability, annotation, ...
• Fax-on-demand
  – Really just another way to get to a printer
• Synthesized speech
  – Useful for telephone and hands-free applications
Take-Away Messages
• The IR process belongs to the user
• Matching documents to a query is only part of the whole IR process
• But IR systems can help users
• IR systems need to support
  – Query formulation/reformulation
  – Document selection/examination
Two Minute Paper
• When examining documents in the selection and examination interfaces, which type of information need (visceral, conscious, formalized, or compromised) guides the user’s decisions? Please justify your answer.
• What was the muddiest point in today’s lecture?
Alternate Query Modalities
• Spoken queries
  – Used for telephone and hands-free applications
  – Reasonable performance with limited vocabularies
    • But some error-correction method must be included
• Handwritten queries
  – Palm Pilot Graffiti, touch-screens, ...
  – Fairly effective if some form of shorthand is used
    • Ordinary handwriting often has too much ambiguity