
Page 1: Slides

Slides

• Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt

www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf

Page 2: Slides

Interactions

LBSC 796/CMSC 838o

Daqing He, Douglas W. Oard

Session 5, March 8, 2004

Page 3: Slides

Slides

• Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt

www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf

Page 4: Slides

Agenda

• Interactions in retrieval systems

• Query formulation

• Selection

• Examination

• Document delivery

Page 5: Slides

System-Oriented Retrieval Model

[Diagram: the system side of retrieval. Offline: Acquisition → Collection → Indexing → Index. Online: Query → Search (against the Index) → Ranked List.]
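To make the pipeline concrete, here is a minimal sketch of the indexing and search stages in Python. It is illustrative only: the toy collection is invented, and plain term-frequency scoring stands in for whatever ranking function a real system would use.

  # A minimal sketch of the system-oriented pipeline:
  # acquisition -> indexing -> search -> ranked list.
  from collections import defaultdict

  collection = {                      # acquisition: gather documents
      "d1": "human computer interaction design",
      "d2": "computer retrieval of documents",
      "d3": "interaction with retrieval systems",
  }

  index = defaultdict(dict)           # indexing: term -> {doc: term freq}
  for doc_id, text in collection.items():
      for term in text.split():
          index[term][doc_id] = index[term].get(doc_id, 0) + 1

  def search(query):
      """Search: score each document by summed term frequency."""
      scores = defaultdict(int)
      for term in query.split():
          for doc_id, tf in index.get(term, {}).items():
              scores[doc_id] += tf
      return sorted(scores.items(), key=lambda x: -x[1])  # ranked list

  print(search("retrieval interaction"))  # -> [('d3', 2), ('d2', 1)]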

Page 6: Slides

Whose Process Is It?

• Who initiates a search process?

• Who controls the progress?

• Who ends a search process?

Page 7: Slides

User-Oriented Retrieval Model

[Diagram: the user's loop: Source Selection → Query Formulation → Search → Document Selection → Document Examination → Document Delivery, passing a Query, a Ranked List, and Documents along the way. On the IR-system side: Collection Acquisition → Collection → Collection Indexing → Index, which Search consults. The user drives the whole loop.]

Page 8: Slides

Taylor’s Conceptual Framework

• Four levels of “information needs”
– Visceral
• What you really want to know
– Conscious
• What you recognize that you want to know
– Formalized (e.g., TREC topics)
• How you articulate what you want to know
– Compromised (e.g., TREC queries)
• How you express what you want to know to a system

[Taylor 68]

Page 9: Slides

Belkin’s ASK model

• Users are concerned with a problem

• But do not clearly understand
– the problem itself
– the information need to solve the problem

This is an Anomalous State of Knowledge (ASK)

• A clarification process is needed to form a query

[Belkin 80, Belkin, Oddy, Brooks 82]

Page 10: Slides

What are humans good at?

• Sense low-level stimuli

• Recognize patterns

• Reason inductively

• Communicate through multiple channels

• Apply multiple strategies

• Adapt to changes or unexpected events

From Ben Shneiderman’s “Designing the User Interface”

Page 11: Slides

What are computers good at?

• Sense stimuli outside the human range

• Calculate quickly and mechanically

• Store large quantities of data and recall them accurately

• Respond rapidly and consistently

• Perform repetitive actions reliably

• Maintain performance under heavy load and over extended periods

From Ben Shneiderman’s “Designing the User Interface”

Page 12: Slides

What should Interaction be?

• Synergistic

• Humans do things that humans are good at

• Computers do things that computers are good at

• The strength of one covers the weakness of the other

Page 13: Slides

Source Selection

• People have their own preferences

• Different tasks require different sources

• Possible choices
– ask people or machines for help
– browsing, searching, or a combination
– general-purpose vs. domain-specific IR systems
– different collections

Page 14: Slides

Query Formulation

[Diagram: the user performs Query Formulation to produce a Query; Search matches it against the Index built by Collection Indexing.]

Page 15: Slides

User’s Goals

• User’s goals
– Identify the right query for the current need
• conscious/formalized need => compromised need

• How can the user achieve this goal?
– Infer the right query terms
– Infer the right composition of terms

Page 16: Slides

System’s Goals

• Help the user
– build links between needs
– know more about the system and the collection

Page 17: Slides

How does System Achieve Its Goals?

• Ask more from the user
– Encourage long/complex queries
• Provide a large text entry area
• Use form filling or direct manipulation
– Initiate interactions
• Ask questions related to the needs
• Engage in a dialogue with the user

• Infer from relevant items
– Infer from previous queries
– Infer from previously retrieved documents

Page 18: Slides

Query Formulation Interaction Styles

• Shneiderman 97
– Command Language
– Form Fillin
– Menu Selection
– Direct Manipulation
– Natural Language

Credit: Marti Hearst

Page 19: Slides

Form-Based Query Specification (Melvyl)

Credit: Marti Hearst

Page 20: Slides

Form-Based Query Specification (Infoseek)

Credit: Marti Hearst

Page 21: Slides

Direct Manipulation Spec. VQUERY (Jones 98)

Credit: Marti Hearst

Page 22: Slides

High-Accuracy Retrieval of Documents (HARD)

[Diagram: a Topic Statement goes to the Search Engine, which returns Baseline Results; the engine also poses Clarification Questions, and the user's Answers to Clarification Questions feed back in to produce the HARD Results.]

Page 23: Slides

UMD HARD 2003 Retrieval Model

[Diagram: clarification questions elicit preferences among subtopic areas, recently viewed relevant documents, preferences for sub-collections or genres, and desired result formats. These feed Query Expansion, Document Reranking, Passage Retrieval, and Ranked List Merging in the HARD retrieval process, producing a Refined Ranked List.]

[He & Demner, 2003]

Page 24: Slides

Dialogues in Need Negotiation

[Diagram: starting from an Information Need, the user (1) formulates a query, the search engine (2) negotiates the need with the user, and then (3) finds documents in the collection matching the query, returning the Search Results.]

Page 25: Slides

Personalization through the User’s Search Context

[Diagram: an Incremental Learner observes the user's search context (e.g., "African Queen" and "Casablanca" both suggest an interest in romantic films) and feeds what it learns to the Information Retrieval System.]

[Goker & He, 2000]

Page 26: Slides

Things That Hurt

• Obscure ranking methods
– Unpredictable effects of adding or deleting terms
• Only single-term queries avoid this problem

• Counterintuitive statistics (see the sketch below)
– “clis”: AltaVista says 3,882 docs match the query
– “clis library”: 27,025 docs match the query!
• Every document with either term was counted
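The jump in the count makes sense once you know the number reported is how many documents contain any of the query terms, not all of them. A toy illustration (invented documents, not AltaVista's actual data):

  # Why adding a term can inflate the reported match count:
  # the engine counts documents containing ANY term, while users
  # expect documents containing ALL terms. Toy data, illustrative only.
  docs = {
      "d1": {"clis", "school"},
      "d2": {"library", "science"},
      "d3": {"clis", "library"},
  }

  def count_any(query_terms):
      return sum(1 for terms in docs.values() if terms & query_terms)

  def count_all(query_terms):
      return sum(1 for terms in docs.values() if query_terms <= terms)

  print(count_any({"clis"}))             # 2
  print(count_any({"clis", "library"}))  # 3  <- count went UP
  print(count_all({"clis", "library"}))  # 1  <- what users expect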

Page 27: Slides

Browsing Retrieved Set

[Diagram: the interactive loop: Query Formulation → Query → Search → Ranked List → Document Selection → Document → Document Examination, with the user feeding back through Query Reformulation and Document Reselection.]

Page 28: Slides

Indicative vs. Informative

• Terms often applied to document abstracts
– Indicative abstracts support selection
• They describe the contents of a document
– Informative abstracts support understanding
• They summarize the contents of a document

• Applies to any information presentation
– Presented for indicative or informative purposes

Page 29: Slides

User’s Browsing Goals

• Identify documents for some form of delivery
– An indicative purpose

• Query enrichment
– Relevance feedback (indicative; see the sketch below)
• User designates “more like this” documents
• System adds terms from those documents to the query
– Manual reformulation (informative)
• Better approximation of the visceral information need
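A common way to implement the term-adding step is Rocchio-style feedback: move the query vector toward the documents the user marked relevant. A minimal sketch; the alpha/beta weights are conventional defaults and the toy vectors are invented, not values from the lecture:

  # Rocchio-style relevance feedback: shift the query vector toward
  # documents the user marked "more like this".
  from collections import Counter

  def rocchio(query_vec, relevant_vecs, alpha=1.0, beta=0.75):
      """Move the query vector toward the marked-relevant documents."""
      new_query = Counter()
      for term, w in query_vec.items():
          new_query[term] += alpha * w
      for doc_vec in relevant_vecs:
          for term, w in doc_vec.items():
              new_query[term] += beta * w / len(relevant_vecs)
      return dict(new_query)

  query = {"romantic": 1.0, "films": 1.0}
  marked = [{"casablanca": 0.8, "romantic": 0.5},
            {"african": 0.7, "queen": 0.7, "films": 0.4}]
  print(rocchio(query, marked))
  # the query gains terms such as 'casablanca' from the marked documents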

Page 30: Slides

System’s Goals

• Assist the user to
– Identify relevant documents
– Identify potentially useful terms
• for clarifying the right information need
• for generating better queries

Page 31: Slides

Browsing Retrieved Set

[Diagram repeated from Page 27: Query Formulation → Query → Search → Ranked List → Document Selection → Document → Document Examination, with Query Reformulation and Document Reselection feedback from the user.]

Page 32: Slides

A Selection Interface Taxonomy

• One-dimensional lists
– Content: title, source, date, summary, ratings, ...
– Order: retrieval status value, date, alphabetic, ...
– Size: scrolling, specified number, RSV threshold

• Two-dimensional displays
– Construction: clustering, starfields, projection
– Navigation: jump, pan, zoom

• Three-dimensional displays
– Contour maps, fishtank VR, immersive VR

Page 33: Slides

Extraction-Based Summarization

• Robust technique, though the extracted summaries can be disfluent

• Four broad types:
– Single-document vs. multi-document
– Term-oriented vs. sentence-oriented

• Combination of evidence for selection (see the sketch below):
– Salience: similarity to the query
– Selectivity: IDF or chi-squared
– Emphasis: title, first sentence

• For multi-document summarization, suppress duplication
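A sketch of single-document, sentence-oriented extraction combining the three evidence sources above. The weights and the toy IDF table are illustrative assumptions, not values from the lecture:

  # Score sentences by salience (query overlap), selectivity (mean IDF
  # of terms), and emphasis (first-sentence bonus); extract the top one.
  idf = {"retrieval": 2.0, "interactive": 1.5, "the": 0.01, "systems": 1.0}

  def score_sentence(sentence, query, position):
      terms = sentence.lower().split()
      salience = sum(1 for t in terms if t in query)        # query overlap
      selectivity = sum(idf.get(t, 0.5) for t in terms) / len(terms)
      emphasis = 1.0 if position == 0 else 0.0              # first sentence
      return 1.0 * salience + 0.5 * selectivity + 0.5 * emphasis

  def extract_summary(sentences, query, k=1):
      scored = [(score_sentence(s, query, i), s)
                for i, s in enumerate(sentences)]
      return [s for _, s in sorted(scored, reverse=True)[:k]]

  doc = ["Interactive retrieval systems involve the user.",
         "The history of the field is long.",
         "Retrieval quality depends on interactive feedback."]
  print(extract_summary(doc, {"interactive", "retrieval"}))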

Page 34: Slides

Generated Summaries

• Fluent summaries for a specific domain

• Define a knowledge structure for the domain
– Frames are commonly used

• Analysis: process documents to fill the structure
– Studied separately as “information extraction”

• Compression: select which facts to retain

• Generation: create fluent summaries (toy sketch below)
– Templates for initial candidates
– Use a language model to select an alternative
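To make the frame-and-template idea concrete, a toy sketch; the frame slots, the keyword matching standing in for real information extraction, and the template wording are all invented for illustration:

  # Toy generated summary: fill a domain frame via (hand-waved)
  # extraction, drop empty slots, and render a fixed template.
  frame = {"event": None, "location": None, "date": None}

  def analyze(text):
      """Stand-in for information extraction: fill frame slots."""
      filled = dict(frame)
      if "College Park" in text:
          filled["location"] = "College Park"
      if "lecture" in text:
          filled["event"] = "a lecture"
      return filled

  def compress(filled):
      """Compression: keep only the slots judged worth reporting."""
      return {k: v for k, v in filled.items() if v is not None}

  def generate(facts):
      """Generation: render the retained facts with a template."""
      return "There was {event} in {location}.".format(**facts)

  doc = "A lecture on interactive IR was held in College Park."
  print(generate(compress(analyze(doc))))
  # -> There was a lecture in College Park.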

Page 35: Slides

Google’s KWIC Summary

• For the query “University of Maryland College Park”
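KWIC (keyword-in-context) snippets show each query hit inside a window of surrounding words. A minimal sketch; the window size and sample page text are arbitrary illustrative choices:

  # Keyword-in-context (KWIC): show each query-term hit with a few
  # words of surrounding context, as search-result snippets do.
  def kwic(text, term, window=3):
      words = text.split()
      snippets = []
      for i, w in enumerate(words):
          if w.lower() == term.lower():
              lo, hi = max(0, i - window), i + window + 1
              snippets.append("... " + " ".join(words[lo:hi]) + " ...")
      return snippets

  page = ("The University of Maryland is located in College Park, "
          "a city near Washington. College Park hosts the campus.")
  for s in kwic(page, "Maryland"):
      print(s)
  # -> ... The University of Maryland is located in ...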

Page 36: Slides

Teoma’s Query Refine Suggestions

url: www.teoma.com

Page 37: Slides

Vivisimo’s Clustering Results

url: vivisimo.com

Page 38: Slides

Kartoo’s Cluster Visualization

url: kartoo.com

Page 39: Slides

Cluster Formation

• Based on inter-document similarity
– Computed using the cosine measure, for example

• Heuristic methods can be fairly efficient (see the sketch below)
– Pick any document as the first cluster “seed”
– Add the most similar document to each cluster
• Adding the same document will join two clusters
– Check to see if each cluster should be split
• Does it contain two or more fairly coherent groups?

• Lots of variations on this have been tried
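A minimal sketch of the seed-and-grow heuristic using cosine similarity over term-count vectors; the similarity threshold and toy vectors are illustrative, and the split check from the slide is omitted:

  # Heuristic clustering sketch: greedily attach each document to the
  # most similar existing cluster seed (cosine similarity), or start a
  # new cluster if nothing is similar enough.
  import math

  def cosine(a, b):
      dot = sum(a.get(t, 0) * b.get(t, 0) for t in a)
      na = math.sqrt(sum(v * v for v in a.values()))
      nb = math.sqrt(sum(v * v for v in b.values()))
      return dot / (na * nb) if na and nb else 0.0

  def cluster(doc_vecs, threshold=0.3):
      clusters = []                 # each cluster: a list of doc ids
      seeds = []                    # the seed vector of each cluster
      for doc_id, vec in doc_vecs.items():
          sims = [cosine(vec, s) for s in seeds]
          if sims and max(sims) >= threshold:
              clusters[sims.index(max(sims))].append(doc_id)
          else:                     # no cluster is close enough: new seed
              clusters.append([doc_id])
              seeds.append(vec)
      return clusters

  docs = {"d1": {"jaguar": 2, "car": 1},
          "d2": {"jaguar": 1, "cat": 2},
          "d3": {"car": 2, "engine": 1}}
  print(cluster(docs))  # -> [['d1', 'd2', 'd3']] with this low threshold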

Page 40: Slides

Starfield

Page 41: Slides

Dynamic Queries

• IVEE/Spotfire/FilmFinder (Ahlberg & Shneiderman 93)

Page 42: Slides

Constructing Starfield Displays

• Two attributes determine the position (toy example below)
– Can be dynamically selected from a list

• Numeric position attributes work best
– Date, length, rating, ...

• Other attributes can affect the display
– Displayed as color, size, shape, orientation, ...

• Each point can represent a cluster
– Interactively specified using “dynamic queries”
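A toy starfield in matplotlib; the attributes and data values are invented: date and length give the position, and rating is mapped to marker size:

  # Starfield sketch: two numeric attributes give position, a third is
  # shown as point size.
  import matplotlib.pyplot as plt

  docs = [  # (year, length_pages, rating)
      (1993, 10, 4.5), (1998, 25, 3.0), (2001, 8, 5.0), (2004, 15, 4.0),
  ]
  years = [d[0] for d in docs]
  lengths = [d[1] for d in docs]
  sizes = [d[2] * 30 for d in docs]   # rating mapped to marker size

  plt.scatter(years, lengths, s=sizes)
  plt.xlabel("Date")                  # position attribute 1
  plt.ylabel("Length (pages)")        # position attribute 2
  plt.title("Starfield: each point is a document")
  plt.show()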

Page 43: Slides

Projection

• Depict many numeric attributes in 2 dimensions
– While preserving important spatial relationships

• Typically based on the vector space model
– Which has about 100,000 numeric attributes!

• Approximates multidimensional scaling (see the sketch below)
– Heuristic approaches are reasonably fast

• Often visualized as a starfield
– But the dimensions lack any particular meaning
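One fast heuristic stand-in for multidimensional scaling is a truncated SVD of the TF-IDF document-term matrix. This sketch uses scikit-learn; the toy corpus is invented:

  # Project high-dimensional TF-IDF document vectors to 2-D with a
  # truncated SVD -- a fast heuristic stand-in for multidimensional
  # scaling.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.decomposition import TruncatedSVD

  corpus = [
      "interactive retrieval systems and users",
      "query formulation and reformulation",
      "document clustering and visualization",
      "users reformulate queries interactively",
  ]
  tfidf = TfidfVectorizer().fit_transform(corpus)   # docs x terms
  points = TruncatedSVD(n_components=2).fit_transform(tfidf)
  for doc, (x, y) in zip(corpus, points):
      print(f"({x:+.2f}, {y:+.2f})  {doc}")
  # each document becomes one point in a 2-D starfield; the two axes
  # are linear combinations of terms with no inherent meaning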

Page 44: Slides

Contour Map Displays

• Display cluster density as terrain elevation
– Fit a smooth opaque surface to the data

• Visualize in three dimensions
– Project to 2-D and allow manipulation
– Use stereo glasses to create a virtual “fishtank”
– Create an immersive virtual reality experience
• Head-mounted stereo monitors and head tracking
• A “cave” with wall projection and body tracking

Page 45: Slides

ThemeView

Credit: Pacific Northwest National Laboratory

Page 46: Slides

Browsing Retrieved Set

[Diagram repeated from Page 27: Query Formulation → Query → Search → Ranked List → Document Selection → Document → Document Examination, with Query Reformulation and Document Reselection feedback from the user.]

Page 47: Slides

Full-Text Examination Interfaces

• Most use scroll and/or jump navigation
– Some experiments with zooming

• Long documents need special features
– A “best passage” function helps users get started (see the sketch below)
• Overlapping 300-word passages work well
– A “next search term” function facilitates browsing

• Integrated functions for relevance feedback
– Passage selection, query term weighting, ...
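A sketch of the "best passage" function: overlapping fixed-size windows scored by query-term hits. The window follows the slide's 300-word suggestion; scoring by raw hit count is a simplification:

  # "Best passage" sketch: slide a 300-word window (150-word overlap)
  # over the document and return the window with the most query hits.
  def best_passage(text, query_terms, size=300, stride=150):
      words = text.split()
      best, best_hits = "", -1
      for start in range(0, max(1, len(words) - size + 1), stride):
          window = words[start:start + size]
          hits = sum(1 for w in window if w.lower() in query_terms)
          if hits > best_hits:
              best, best_hits = " ".join(window), hits
      return best

  # usage: jump the view to the densest passage for the query, e.g.
  # print(best_passage(long_document, {"retrieval", "interaction"}))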

Page 48: Slides

A Long Document

Page 49: Slides

Document lens

Robertson & Mackinlay, UIST'93, Atlanta, 1993

Page 50: Slides

TileBar

[Hearst et al 95]

Page 51: Slides

SeeSoft

[Eric 94]

Page 52: Slides

Things That Help

• Show the query in the selection interface
– It provides context for the display

• Explain what the system has done
– It is hard to control a tool you don’t understand
• Highlight search terms, for example

• Complement what the system has done
– Users add value by doing things the system can’t
– Expose the information users need to judge utility

Page 53: Slides

Document Delivery

[Diagram: after Document Examination, Document Delivery hands the Document to the User.]

Page 54: Slides

Delivery Modalities

• On-screen viewing
– Good for hypertext, multimedia, cut-and-paste, ...

• Printing
– Better resolution, portability, annotations, ...

• Fax-on-demand
– Really just another way to get to a printer

• Synthesized speech
– Useful for telephone and hands-free applications

Page 55: Slides

Take-Away Messages

• The IR process belongs to the user

• Matching documents to a query is only part of the whole IR process

• But IR systems can help users

• And IR systems need to support
– Query formulation/reformulation
– Document selection/examination

Page 56: Slides

Two Minute Paper

• When examining documents in the selection and examination interfaces, which type of information need (visceral, conscious, formalized, or compromised) guides the user’s decisions? Please justify your answer.

• What was the muddiest point in today’s lecture?

Page 57: Slides

Alternate Query Modalities

• Spoken queries
– Used for telephone and hands-free applications
– Reasonable performance with limited vocabularies
• But some error-correction method must be included

• Handwritten queries
– Palm Pilot Graffiti, touch-screens, ...
– Fairly effective if some form of shorthand is used
• Ordinary handwriting often has too much ambiguity