Information Access I: Interactive Information Search
GSLT,
Göteborg, October 2003
Barbara Gawronska, Högskolan i Skövde
2nd intensive week:
Interactivity (Th 8-12 BG, 13-15 MM)
Multilingual systems and resources (Fr 8-10 MM, 10-12 BG)
Evaluation (Fr 13-15 BG)
Some repetition...: Data Retrieval vs. IR (2) (the German IR Research Group)
IR systems have to handle ”uncertain knowledge” (”unsicheres Wissen”):
Vague queries; reformulation is frequently required
The problem of the user’s own understanding of his/her information need
Limitations of knowledge representations
This implies a need for interaction.
A General Model of an IR system (Fuhr 1995:11)
[Figure labels: Data Analysis; Knowledge representation; Transformations; Internal Knowledge Structures; Information Retrieval; Retrieved Information]
A Basic Model of a Document Retrieval System
(Fuhr 1995:11)
[Figure labels: Document Analysis; Indexing, Classification, Clustering; Data Bank Structures; Retrieval operations (Boolean or stochastic); Document Retrieval; Retrieved Documents or Document Information]
A document from different perspectives (Meghini et al. 91, modified)
[Example document, translated from Swedish]
Article from NyttIT
The Compulsory School Project: summary of the first year
2003-09-05, FU-kansliet, Johanna Österberg
For the past year, the University has been running the recruitment project ”Compulsory school pupils: our future students”.
By reaching out in various ways to compulsory school pupils with information about university studies, the goal is to demystify higher education and to raise interest in it in general and in Högskolan i Skövde in particular. The aim is to open up the world of the university, increase diversity and reduce socially skewed recruitment.
Class visits
During autumn 2002 the University cooperated with Vasaskolan in Skövde and Centralskolan in Töreboda. At both schools, staff and students from the University met all final-year classes for about an hour to discuss the future and the different choices available in life. The differences between studying at lower or upper secondary school and at a university were also discussed. In total, about 200 pupils took part in these meetings. The parents of these pupils also received brief information about university studies in connection with parents' meetings about the choice of upper secondary school.
Layout
”Logical” structure (head, title, author…)
Semantics
Different aspects of a search
[Figure labels: Real object; DB object; Information request; Formal query; Object attributes; Logical view (structure specification); Layout view (layout specification); Semantic view (content specification)]
But where and when is interactivity needed?
[Figure: the ”Different aspects of a search” diagram, repeated]
How can the need for interaction refinement be diagnosed?
User studies (still too sparse):
User in contact with existing systems:
Free task choice
Predefined tasks
Wizard-of-Oz experiments
Relevance feedback (”real” and ”pseudo”)
Wizard-of-Oz experiments (Dahlbäck, Jönsson...)
Users tend to spontaneously produce a kind of ”controlled” language:
written-language syntax (complete sentences, ellipsis avoided)
”repairs” are not frequent
pronominal anaphora is less frequent than in human-human communication
Wizard-of-Oz experiments (3)
”Controlled” language in users (3)
A psycholinguistic reflection: it is not unlike ”baby-talk” (i.e. the way of talking to young children or to unskilled/unidiomatic speakers of a language)
This can make human-computer NLP dialogue a less complicated task than e.g. translating human-human dialogue
There seem to be age-related differences in the way of interacting with computer systems
But:
If the system gives the impression of being too smart, the user normally becomes more natural in his/her linguistic behaviour, which causes problems for the system...
Should the system’s responses remain a little ”stupid”???
But where and when is interactivity needed?
[Figure: the ”Different aspects of a search” diagram, repeated]
Information request level:
Common problems:
Spelling errors (recall Hercules’ lecture)
Connector interpretation: natural-language conjunctions vs. logical connectors; conjunction symbols in IR systems may be ambiguous:
”Food for cats and dogs”
Information request level (2)
Negation (examples inspired by Fuhr 1995):
”Drugs and sedatives without relation to aging”
”Drugs and sedatives, not related to aging”
”Drugs and sedatives, no aging”
”Drugs and sedatives, not age”
Information request (3)
What kind of feedback would be useful on this level?
Feedback, definition (Meadow et al. 2000: 246; McGraw-Hill 1971):
Feedback = information derived from the output of a process and used to control the process in the future
Possible feedback format on the information request level (?)
Predicate logic? for(food,cat) & for(food,dog)
Or
for(food,cat) or for(food,dog)
Or
for(food,cat) & dog
Generate NLP questions?
Leave everything to the user?
Or?
How to present the feedback? Menu choice?
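One way to picture the ”menu choice” option is the minimal sketch below. It is only an illustration under stated assumptions: the function names `candidate_readings` and `menu` are hypothetical, and the three readings are simply the predicate-logic alternatives listed on this slide for ”Food for cats and dogs”.

```python
def candidate_readings(head, first, second):
    """Return the formal readings of '<head> for <first> and <second>'.

    The three readings mirror the alternatives on the slide; a real system
    might generate more (or fewer) depending on its query language.
    """
    return [
        f"for({head},{first}) & for({head},{second})",   # food for both
        f"for({head},{first}) or for({head},{second})",  # food for either
        f"for({head},{first}) & {second}",               # cat food, plus dogs
    ]


def menu(readings):
    """Format the readings as a numbered menu the user can choose from."""
    return "\n".join(f"{i}. {r}" for i, r in enumerate(readings, start=1))


readings = candidate_readings("food", "cat", "dog")
print(menu(readings))
```

The sketch leaves the decision to the user; whether the readings are better shown as formulas or paraphrased as natural-language questions is exactly the open question raised above.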
Between information request level and formal query level
Meadow et al. 2000: 179ff: examples from Dialog:
SSELECT CAT is interpreted as: SS (= SELECT SETS) CAT
SELECTION (wrongly used instead of the standard command SELECT) is interpreted as: S (= SELECT) ION
What kind of feedback would be useful on this level?
Between the information request/formal query level and database objects
If the request/query is ambiguous:
Give some feedback and try to resolve the ambiguity before searching the database, or after the search, before presenting the documents (”delayed disambiguation”)?
Which search stage is most suitable for feedback/dialogue? What factors should be taken into account?
Search stages, or ”states” in searchers (Penniman & Dominick 1980, Chapman 1981)
Database selection
Exploration of individual terms (looking up terms in a thesaurus or an inverted file in order to decide which terms are to be used in the query)
Record search by term combinations
Record browsing and display
Record evaluation (for possible iteration)
Levels of search activities (Bates 1990, Fuhr 1995)
Strategy: a plan for an entire information search, e.g. find relevant literature for a course in IA
Stratagem: e.g. journal run, citation search...
Tactic: one or several moves made to further the search
Move: a single action
Levels of system involvement (Bates 1990)
1. No system involvement: all search activities are generated and executed by the human
2. Displays possible activities: the system lists search activities when asked. Some of the activities may be executable by the system, some may not.
3. Monitors the search and recommends search activities:
   a. Only when the searcher asks for suggestions
   b. Always when it identifies a need
4. Executes desired actions automatically
Query modification by relevance feedback(picture from M.A. Hearst, http://www.sims.berkeley.edu/courses/is202/f98/Lecture25/sld005.htm)
How to utilize terms extracted from relevant documents?
The extracted terms may be added to the query
They may be presented to the user, who makes the decision about modification
They can be used for re-weighting the terms in the query
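The first two options can be sketched as follows. This is a rough illustration, not a method taken from the slides: the function name `expansion_candidates`, the whitespace tokenisation and the ranking by document frequency are all assumptions made for the example.

```python
from collections import Counter


def expansion_candidates(query_terms, relevant_docs, k=3):
    """Suggest up to k terms from the relevant documents that are not in the query.

    Terms are ranked by the number of relevant documents they occur in;
    ties are broken alphabetically so that the result is deterministic.
    """
    query = {t.lower() for t in query_terms}
    counts = Counter()
    for doc in relevant_docs:
        # Count each term once per document (document frequency).
        counts.update({w for w in doc.lower().split() if w not in query})
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = ["cat food dry", "cat food kitten", "dog food dry"]
suggestions = expansion_candidates(["food"], docs, k=2)
print(suggestions)
```

The returned list could either be appended to the query automatically or shown to the user for confirmation, which corresponds to the first and second options above.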
A standard method for re-weighting: Rocchio’s Algorithm (Rocchio 1971)
Goal: to achieve an optimal query
An optimal query maximizes the difference between the average relevant vector and the average nonrelevant vector
A standard method for re-weighting: Rocchio’s Algorithm(Rocchio 1971; many modifications, e.g. Salton & McGill 1983; Picture from Srinivasan 2003, http://mingo.info-science.uiowa.edu:16080/courses/230/Lectures/Vector.html#1c)
Q_new = a · Q_old + b · (average relevant vector) - c · (average nonrelevant vector)
Rocchio’s Algorithm (2) (Rocchio 1971; many modifications, e.g. Salton & McGill 1983; a more formal way of expressing the same thing – Meadow et al. 2000:258)
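$$QW' = QW + \frac{\beta}{R} \sum_{D_i \in R} DW_i - \frac{\gamma}{N} \sum_{D_i \in N} DW_i$$

QW: the initial query vector
QW': the vector of the modified query
R = the number of relevant retrieved documents (under the summation sign: the set of these documents)
N = the number of nonrelevant retrieved documents (under the summation sign: the set of these documents)
DW_i = the vector of document D_i
β, γ = coefficients that must be determined experimentally (often β about 0.75, γ about 0.25); they correspond to b and c in the informal version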
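The re-weighting rule can be sketched in a few lines of plain Python: the new query is the old query plus a weighted average of the relevant document vectors, minus a weighted average of the nonrelevant ones. The function name `rocchio` and the parameter names `beta` and `gamma` are choices made for this example; the defaults 0.75 and 0.25 are the typical values mentioned on the slide, and the weight of the old query is assumed to be 1.

```python
def rocchio(query, relevant, nonrelevant, beta=0.75, gamma=0.25):
    """Return the modified query vector after one round of relevance feedback.

    query        -- list of term weights (the initial query vector)
    relevant     -- list of document vectors judged relevant
    nonrelevant  -- list of document vectors judged not relevant
    All vectors are assumed to have the same length as the query.
    """
    new_query = list(query)
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue  # no judgements of this kind: nothing to add or subtract
        for i in range(len(query)):
            average = sum(d[i] for d in docs) / len(docs)
            new_query[i] += coeff * average
    return new_query


q = [1.0, 0.0, 0.0]
rel = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
nonrel = [[0.0, 0.0, 1.0]]
print(rocchio(q, rel, nonrel))  # [1.75, 0.375, -0.25]
```

In this toy example the second term gains weight because it occurs in a relevant document, while the third term, which occurs only in the nonrelevant document, gets a negative weight. Many implementations set such negative weights to zero, which this sketch does not do.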
Future?
According to several studies, Machine Learning methods perform better than different variants of Rocchio’s algorithm.
Your experience?
Future?
Future users: a preliminary case study (age: 12-13)
First observations: the most frequent search goals are to DO things, not to read documents:
”Download movies”, ”Subscribe to X”, ”Translate X” etc.
Future? (young users)
Queries in English dominate (specific to Swedish kids, or? What does it mean for multilinguality?)
Narrow terms dominate; specific terms are more frequent than general ones
Quite aware of the danger of information overload
Short queries, 2-3 words per query
”No idea to search for subcategories” (!)