Slides
• Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt or www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf
Interactions
LBSC 796/CMSC 838o
Daqing He, Douglas W. Oard
Session 5, March 8, 2004
Agenda
• Interactions in retrieval systems
• Query formulation
• Selection
• Examination
• Document delivery
System-Oriented Retrieval Model
[Diagram: Acquisition builds the Collection, Indexing builds the Index, and Search matches a Query against the Index to produce a Ranked List.]
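The stages in this diagram can be sketched in a few lines of Python. This is a toy illustration only; the function names, the IDF scoring scheme, and the two-document collection are my own, not from the slides.

from collections import defaultdict
from math import log

def build_index(collection):
    """Indexing: map each term to the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(query, index, n_docs):
    """Search: score documents by summed IDF of matched query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, set())
        if postings:
            idf = log(n_docs / len(postings))
            for doc_id in postings:
                scores[doc_id] += idf
    # Ranked list: best-scoring documents first
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Acquisition: a toy two-document collection
collection = {1: "information retrieval systems", 2: "library systems"}
index = build_index(collection)
print(search("information systems", index, len(collection)))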
Whose Process Is It?
• Who initiates a search process?
• Who controls the progress?
• Who ends a search process?
User-Oriented Retrieval Model
[Diagram: the User drives Source Selection and Query Formulation; the Query goes to Search, which consults the Index to return a Ranked List; the User then performs Document Selection, Document Examination, and Document Delivery. Inside the IR System, Collection Acquisition builds the Collection and Collection Indexing builds the Index.]
Taylor’s Conceptual Framework
• Four levels of “information needs”
  – Visceral: what you really want to know
  – Conscious: what you recognize that you want to know
  – Formalized (e.g., TREC topics): how you articulate what you want to know
  – Compromised (e.g., TREC queries): how you express what you want to know to a system
[Taylor 68]
Belkin’s ASK model
• Users are concerned with a problem
• But do not clearly understand
  – the problem itself
  – the information need to solve the problem
• This Anomalous State of Knowledge (ASK) calls for a clarification process to form a query
[Belkin 80, Belkin, Oddy, Brooks 82]
What are humans good at?
• Sense low level stimuli
• Recognize patterns
• Reason inductively
• Communicate with multiple channels
• Apply multiple strategies
• Adapt to changes or unexpected events
From Ben Shneiderman’s “Designing the User Interface”
What are computers good at?
• Sense stimuli outside the human range
• Calculate quickly and mechanically
• Store large quantities of data and recall it accurately
• Respond rapidly and consistently
• Perform repetitive actions reliably
• Maintain performance under heavy load and over extended periods
From Ben Shneiderman’s “Designing the User Interface”
What Should Interaction Be?
• Synergistic
  – Humans do the things that humans are good at
  – Computers do the things that computers are good at
  – The strength of one covers the weakness of the other
Source Selection
• People have their own preferences
• Different tasks require different sources
• Possible choices
  – ask for help from people or from machines
  – browsing, searching, or a combination
  – general-purpose vs. domain-specific IR systems
  – different collections
Query Formulation
[Diagram: the User performs Query Formulation to produce a Query; Search matches the Query against the Index built by Collection Indexing.]
User’s Goals
• Identify the right query for the current need
  – conscious/formalized need => compromised need
• How can the user achieve this goal?
  – Infer the right query terms
  – Infer the right composition of those terms
System’s Goals
• Help the user to
  – build links between needs
  – learn more about the system and the collection
How does System Achieve Its Goals?
• Ask more from the user
  – Encourage long/complex queries
    • Provide a large text-entry area
    • Use form filling or direct manipulation
  – Initiate interactions
    • Ask questions related to the needs
    • Engage in a dialogue with the user
• Infer from relevant items
  – Infer from previous queries
  – Infer from previously retrieved documents
Query Formulation Interaction Styles
• Shneiderman 97
  – Command Language
  – Form Fill-in
  – Menu Selection
  – Direct Manipulation
  – Natural Language
Credit: Marti Hearst
Form-Based Query Specification (Melvyl)
Credit: Marti Hearst
Form-based Query Specification (Infoseek)
Credit: Marti Hearst
Direct Manipulation Spec.: VQUERY (Jones 98)
Credit: Marti Hearst
High-Accuracy Retrieval of Documents
[Diagram: a Topic Statement goes to the Search Engine, producing Baseline Results; the system poses Clarification Questions, and the Answers to Clarification Questions are used to produce the final HARD Results.]
UMD HARD 2003 retrieval model
[Diagram of the HARD retrieval process: Clarification Questions elicit preferences among subtopic areas, recently viewed relevant documents, preferences for sub-collections or genres, and desired result formats; these drive Query Expansion, Document Reranking, Passage Retrieval, and Ranked List Merging to produce a Refined Ranked List.]
[He & Demner, 2003]
Dialogues in Need Negotiation
[Diagram: an Information Need is (1) formulated as a query, (2) negotiated in a dialogue with the Search Engine, and (3) matched against the Document Collection to produce Search Results.]
Personalization through User’s Search Contexts
[Diagram: an Incremental Learner observes the user’s search contexts (e.g., “African Queen” and “Casablanca” both under “Romantic Films”) and feeds what it learns to the Information Retrieval System to personalize results.]
[Goker & He, 2000]
Things That Hurt
• Obscure ranking methods
  – Unpredictable effects of adding or deleting terms
    • Only single-term queries avoid this problem
• Counterintuitive statistics
  – “clis”: AltaVista says 3,882 docs match the query
  – “clis library”: 27,025 docs match the query!
    • Every document with either term was counted
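The inflated count arises because the engine reported documents matching either term (OR semantics) rather than both. A toy illustration with invented documents:

docs = {
    1: {"clis", "library"},
    2: {"clis"},
    3: {"library"},
}
query = {"clis", "library"}

or_count  = sum(1 for terms in docs.values() if query & terms)   # any term matches
and_count = sum(1 for terms in docs.values() if query <= terms)  # all terms match

print(or_count, and_count)  # 3 1 -- the OR count grows as terms are added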
Browsing Retrieved Set
[Diagram: the User iterates between Query Formulation/Reformulation, Search over the Query, Document Selection/Reselection from the Ranked List, and Document Examination.]
Indicative vs. Informative
• Terms often applied to document abstracts
  – Indicative abstracts support selection
    • They describe the contents of a document
  – Informative abstracts support understanding
    • They summarize the contents of a document
• Applies to any information presentation
  – Content can be presented for indicative or informative purposes
User’s Browsing Goals
• Identify documents for some form of delivery
  – An indicative purpose
• Query enrichment
  – Relevance feedback (indicative)
    • User designates “more like this” documents
    • System adds terms from those documents to the query (see the sketch below)
  – Manual reformulation (informative)
    • Better approximation of the visceral information need
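The slides don’t name a specific feedback method; a common choice is Rocchio’s formula, which moves the query vector toward the centroid of the designated documents. A minimal sketch, with conventional but arbitrarily chosen weights:

from collections import Counter

def rocchio(query_vec, relevant_docs, alpha=1.0, beta=0.75):
    """Move the query toward the centroid of 'more like this' documents."""
    new_q = Counter({t: alpha * w for t, w in query_vec.items()})
    for doc_vec in relevant_docs:
        for t, w in doc_vec.items():
            new_q[t] += beta * w / len(relevant_docs)
    return new_q

q = Counter({"jaguar": 1.0})
fed_back = rocchio(q, [Counter({"jaguar": 2, "car": 3}),
                       Counter({"car": 1, "engine": 2})])
print(fed_back.most_common(3))  # the query now also weights "car" and "engine"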
System’s Goals
• Assist the user to
  – identify relevant documents
  – identify potentially useful terms
    • for clarifying the right information need
    • for generating better queries
Browsing Retrieved Set
[Diagram repeated: the query formulation, search, document selection, and document examination loop.]
A Selection Interface Taxonomy
• One-dimensional lists
  – Content: title, source, date, summary, ratings, ...
  – Order: retrieval status value, date, alphabetic, ...
  – Size: scrolling, specified number, RSV threshold
• Two-dimensional displays
  – Construction: clustering, starfields, projection
  – Navigation: jump, pan, zoom
• Three-dimensional displays
  – Contour maps, fishtank VR, immersive VR
Extraction-Based Summarization
• Robust technique, though the extracted summaries can be disfluent
• Four broad types:
  – Single-document vs. multi-document
  – Term-oriented vs. sentence-oriented
• Combination of evidence for selection:
  – Salience: similarity to the query
  – Selectivity: IDF or chi-squared
  – Emphasis: title, first sentence
• For multi-document summaries, suppress duplication
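A sketch of how those evidence sources might be combined to score sentences for extraction; the weights, the helper names, and the toy data are my own assumptions:

def score_sentence(sentence, position, query, idf, w=(1.0, 0.5, 0.3)):
    """Combine salience, selectivity, and emphasis into one score."""
    terms = sentence.lower().split()
    salience = sum(1 for t in terms if t in query)        # overlap with the query
    selectivity = sum(idf.get(t, 0.0) for t in terms)     # rare terms count more
    emphasis = 1.0 if position == 0 else 0.0              # first-sentence bonus
    return w[0] * salience + w[1] * selectivity + w[2] * emphasis

def summarize(sentences, query, idf, k=2):
    scored = sorted(enumerate(sentences),
                    key=lambda p: -score_sentence(p[1], p[0], query, idf))
    keep = sorted(i for i, _ in scored[:k])   # restore document order
    return [sentences[i] for i in keep]

idf = {"exotic": 2.0}
sents = ["Jaguar sales rose.", "The exotic cat hunts at night.", "Weather was mild."]
print(summarize(sents, {"jaguar"}, idf, k=1))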
Generated Summaries
• Fluent summaries for a specific domain
• Define a knowledge structure for the domain
  – Frames are commonly used
• Analysis: process documents to fill the structure
  – Studied separately as “information extraction”
• Compression: select which facts to retain
• Generation: create fluent summaries
  – Templates for initial candidates
  – Use a language model to select among alternatives
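A minimal frame-and-template sketch of these steps; the frame fields, facts, and template wording are invented for illustration, and the language-model selection step is omitted:

# Analysis would fill a domain frame like this via information extraction
frame = {"event": "acquisition", "buyer": "AlphaCo", "target": "BetaInc", "price": "$2B"}

# Compression: keep only the facts worth reporting
facts = {k: v for k, v in frame.items() if k in ("buyer", "target", "price")}

# Generation: instantiate a fluent template from the retained facts
template = "{buyer} acquired {target} for {price}."
print(template.format(**facts))  # AlphaCo acquired BetaInc for $2B.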
Google’s KWIC Summary
• For the query “University of Maryland College Park”
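A keyword-in-context (KWIC) snippet can be built by centering a window on each query-term hit. This is a simplified sketch (Google’s actual snippet algorithm is not public); the window size is an arbitrary choice:

def kwic_snippet(text, query_terms, window=4):
    """Return short contexts around each query-term occurrence."""
    words = text.split()
    snippets = []
    for i, w in enumerate(words):
        if w.lower().strip(".,") in query_terms:
            lo, hi = max(0, i - window), i + window + 1
            snippets.append("... " + " ".join(words[lo:hi]) + " ...")
    return snippets

text = "The University of Maryland College Park campus is in College Park, Maryland."
print(kwic_snippet(text, {"maryland"}))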
Teoma’s Query Refine Suggestions
url: www.teoma.com
Vivisimo’s Clustering Results
url: vivisimo.com
Kartoo’s Cluster Visualization
url: kartoo.com
Cluster Formation
• Based on inter-document similarity
  – Computed using the cosine measure, for example
• Heuristic methods can be fairly efficient
  – Pick any document as the first cluster “seed”
  – Add the most similar document to each cluster
    • Adding the same document to two clusters joins them
  – Check whether each cluster should be split
    • Does it contain two or more fairly coherent groups?
• Lots of variations on this have been tried
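One common single-pass variant of this heuristic, using cosine similarity; the join and split steps are omitted, and the 0.3 threshold is an arbitrary choice:

from math import sqrt

def cosine(a, b):
    num = sum(a[t] * b.get(t, 0.0) for t in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def single_pass_cluster(doc_vectors, threshold=0.3):
    """Assign each document to the most similar seed, or start a new cluster."""
    clusters = []   # list of (seed vector, [doc ids])
    for doc_id, vec in doc_vectors.items():
        sims = [(cosine(vec, seed), i) for i, (seed, _) in enumerate(clusters)]
        best = max(sims, default=(0.0, -1))
        if best[0] >= threshold:
            clusters[best[1]][1].append(doc_id)
        else:
            clusters.append((vec, [doc_id]))  # this document seeds a new cluster
    return [members for _, members in clusters]

docs = {1: {"cat": 1, "pet": 1}, 2: {"cat": 1}, 3: {"stock": 2}}
print(single_pass_cluster(docs))  # [[1, 2], [3]]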
Starfield
• Dynamic queries: IVEE/Spotfire/FilmFinder (Ahlberg & Shneiderman 93)
Constructing Starfield Displays
• Two attributes determine the position
  – Can be dynamically selected from a list
• Numeric position attributes work best
  – Date, length, rating, ...
• Other attributes can affect the display
  – Displayed as color, size, shape, orientation, ...
  – Interactively specified using “dynamic queries”
• Each point can represent a cluster
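A starfield is essentially a filtered scatterplot over two selected attributes. A matplotlib sketch; the film records are invented, and a real starfield would attach interactive sliders to the dynamic-query filter:

import matplotlib.pyplot as plt

films = [  # hypothetical records: (title, year, length_minutes, rating)
    ("Casablanca", 1942, 102, 8.5),
    ("The African Queen", 1951, 105, 7.7),
    ("Jaws", 1975, 124, 8.1),
]

# Dynamic query: a slider would adjust this filter interactively
min_rating = 8.0
kept = [f for f in films if f[3] >= min_rating]

# Position = (year, length); a third attribute (rating) drives point size
xs, ys, sizes = zip(*[(y, l, r * 10) for _, y, l, r in kept])
plt.scatter(xs, ys, s=sizes)
plt.xlabel("year"); plt.ylabel("length (minutes)")
plt.show()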
Projection
• Depict many numeric attributes in two dimensions
  – While preserving important spatial relationships
• Typically based on the vector space model
  – Which has about 100,000 numeric attributes!
• Approximates multidimensional scaling
  – Heuristic approaches are reasonably fast
• Often visualized as a starfield
  – But the dimensions lack any particular meaning
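One common fast heuristic for such a projection is truncated SVD of the document-term matrix, keeping the two strongest dimensions as plot coordinates. A NumPy sketch with an invented matrix; a real collection would use sparse matrices:

import numpy as np

# Hypothetical document-term matrix: rows = documents, columns = terms
X = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 3.0],
])

# Truncated SVD of the centered matrix: keep two dimensions
U, S, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
coords = U[:, :2] * S[:2]   # one (x, y) point per document
print(coords)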
Contour Map Displays
• Display cluster density as terrain elevation
  – Fit a smooth opaque surface to the data
• Visualize in three dimensions
  – Project to 2-D and allow manipulation
  – Use stereo glasses to create a virtual “fishtank”
  – Create an immersive virtual reality experience
    • Head-mounted stereo monitors and head tracking
    • “Cave” with wall projection and body tracking
ThemeView
Credit to: Pacific Northwest National Laboratory
Browsing Retrieved Set
[Diagram repeated: the query formulation, search, document selection, and document examination loop.]
Full-Text Examination Interfaces
• Most use scroll and/or jump navigation
  – Some experiments with zooming
• Long documents need special features
  – A “best passage” function helps users get started
    • Overlapping 300-word passages work well
  – A “next search term” function facilitates browsing
• Integrated functions for relevance feedback
  – Passage selection, query term weighting, ...
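A “best passage” function can slide overlapping fixed-size windows over the document and score each against the query. The 300-word window follows the slide; the 50%-overlap stride and the simple term-overlap score are my own assumptions:

def best_passage(text, query_terms, size=300, stride=150):
    """Return the overlapping window with the most query-term hits."""
    words = text.split()
    best, best_score = words[:size], -1
    for start in range(0, max(1, len(words) - size + 1), stride):
        window = words[start:start + size]
        score = sum(1 for w in window if w.lower() in query_terms)
        if score > best_score:
            best, best_score = window, score
    return " ".join(best)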
A Long Document
Document lens
Robertson & Mackinlay, UIST'93, Atlanta, 1993
TileBar
[Hearst et al 95]
SeeSoft
[Eick 94]
Things That Help
• Show the query in the selection interface
  – It provides context for the display
• Explain what the system has done
  – It is hard to control a tool you don’t understand
  – Highlight search terms, for example
• Complement what the system has done
  – Users add value by doing things the system can’t
  – Expose the information users need to judge utility
Document Delivery
[Diagram: the User moves from Document Examination to Document Delivery of the chosen Document.]
Delivery Modalities
• On-screen viewing
  – Good for hypertext, multimedia, cut-and-paste, ...
• Printing
  – Better resolution, portability, annotation, ...
• Fax-on-demand
  – Really just another way to get to a printer
• Synthesized speech
  – Useful for telephone and hands-free applications
Take-Away Messages
• The IR process belongs to the user
• Matching documents to a query is only part of the whole IR process
• But IR systems can help users
• IR systems need to support
  – Query formulation/reformulation
  – Document selection/examination
Two Minute Paper
• When examining documents in the selection and examination interfaces, which type of information need (visceral, conscious, formalized, or compromised) guides the user’s decisions? Please justify your answer.
• What was the muddiest point in today’s lecture?
Alternate Query Modalities
• Spoken queries
  – Used for telephone and hands-free applications
  – Reasonable performance with limited vocabularies
    • But some error-correction method must be included
• Handwritten queries
  – Palm Pilot Graffiti, touch-screens, ...
  – Fairly effective if some form of shorthand is used
    • Ordinary handwriting often has too much ambiguity