(Spoken) Dialogue and Information Retrieval, Antoine Raux, Dialogs on Dialogs Group, 10/24/2003

TRANSCRIPT

Page 1: (Spoken) Dialogue and Information Retrieval Antoine Raux Dialogs on Dialogs Group 10/24/2003

(Spoken) Dialogue and Information Retrieval

Antoine Raux

Dialogs on Dialogs Group

10/24/2003

Page 2:

Outline

Interactive Information Retrieval Systems (Belkin et al)

EUREKA: Dialogue-based IR for Low Bandwidth Devices

Voice Access to IR

Page 3:

Cases, Scripts, and Information-Seeking Strategies

Belkin, Cool (Rutgers); Stein, Thiel (GMD-IPSI)

Long journal article (1995)

From the IR community (Expert Systems)

Page 4:

IR as Interaction

Traditional IR research focuses on document/query representation and comparison

Need to focus on the user

Represent IR as a dialogue between an information seeker and an information provider

Page 5:

Information-Seeking Strategies

Represent information-seeking behavior along 4 dimensions:
    Method of Interaction (scanning vs searching)
    Goal of Interaction (learning vs selecting)
    Mode of Retrieval (recognition vs specification)
    Resource Considered (information vs meta-info)

Binary values → 16 strategies (ISS)
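
The count of 16 follows from four binary dimensions (2 × 2 × 2 × 2). A minimal Python sketch of that enumeration is given here; the dictionary keys and the running numbers printed next to each strategy are illustrative assumptions and do not reproduce the numbering used by Belkin et al.

```python
from itertools import product

# The four binary dimensions named on the slide, each with its two values.
# The short key names are assumptions made for this sketch.
DIMENSIONS = {
    "method": ("scanning", "searching"),
    "goal": ("learning", "selecting"),
    "mode": ("recognition", "specification"),
    "resource": ("information", "meta-information"),
}


def enumerate_strategies():
    """Return every combination of one value per dimension."""
    names = list(DIMENSIONS)
    return [dict(zip(names, values)) for values in product(*DIMENSIONS.values())]


strategies = enumerate_strategies()
for number, strategy in enumerate(strategies, start=1):
    # The running number is arbitrary, not the published ISS numbering.
    print(number, strategy)
```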

Page 6:

Dialogue Structures for Information Seeking

Mix of different formalisms:
    Recursive state-based schemas (COR), e.g. Request → Promise → Inform → Be contented
    Scripts: prototypical interaction for each ISS
    Goal trees

[Figure: goal tree. Node labels: Retrieve Specified Items; Specify Characteristic; Recognize Desired Items; Offer choice; Select and Specify]

Page 7:

Deriving Scripts from Data

Case-based approach: problem solving using previously stored solved instances

Match a sequence of actions to a state-based schema

Extract goal tree

Identify goal (which ISS?)

Page 8:

The MERIT System

Theory vs Practice…
    Graphical interface (not NL dialogue)
    User does case selection (for eventual case-based reasoning)
    Example task is relational database (not free text IR): uses form filling (!)

Page 9:

Discussion

Contribution to IR: user-centered view, application of many non-IR theories (discourse, CBR)

BUT: too complicated for the user?

Page 10:

Discussion

Contribution to Dialogue Systems: difficult task (not often dealt with in DS), CBR (can we learn dialogue structure from data?)

BUT: lacks a good, unified, practical framework (too many different paradigms applied…)

Page 11:

Dialogue-based IR: Why?

Google-like interface still predominant (despite MERIT)

Why?
    Users receive a lot of information (document titles, summaries) and use it as they want
    Very simple to learn
    Very flexible

BUT: works on high bandwidth devices

Page 12:

Dialogue-based IR: Why?

For low bandwidth devices (PDA, phone), information-rich interfaces don’t work
    Only small pieces of information exchanged at a time
    System has to select

Less information, more interaction

Page 13:

EUREKA: Idea

Use dialogue to submit queries to a web search engine, browse through the hierarchically clustered results, perform query reformulation/refinement, etc…

Page 14:

EUREKA: Overview

Backend: Vivisimo (through web scraper)

Dialogue Management: RavenClaw (successor of CMU Communicator)

Language Understanding: Light Open Vocabulary Parser

NLG/TTS: template-based & Festival

Page 15:

Backend: Vivisimo

Available clustering meta-search engine
    www.vivisimo.com

Hand-written Perl web scraper (hope Vivisimo doesn’t change their page design by the end of the semester…)

Page 16:

LOV Parser

Problem: traditional NL parsers require a dictionary → not applicable to open domain IR

Solution (implemented in C++):
    fix a small number of one-word commands (new_query, open, list_clusters)
    parse each line as “[command] [arguments]” or “[command]” or “[arguments]”
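
The parsing rule above can be sketched in a few lines. The slides state that the actual parser is written in C++; the Python below is only an illustration of the three input shapes, and the function name, the return format, and the assumption that a command arrives as a single underscore-joined token are all hypothetical.

```python
# Hypothetical command set: the slide lists only these three as examples.
COMMANDS = {"new_query", "open", "list_clusters"}


def parse_line(line):
    """Split an input line into (command, arguments).

    Either element may be None, covering the three shapes on the slide:
    "[command] [arguments]", "[command]" and "[arguments]".
    """
    words = line.split()
    if not words:
        return None, None
    first = words[0].lower()
    if first in COMMANDS:
        rest = " ".join(words[1:])
        return first, rest or None
    # No leading command: the whole line is open-vocabulary arguments.
    return None, " ".join(words)
```

Because only the first word is checked against a closed list, everything after it can come from an open vocabulary, which is the point of the design: no dictionary is needed for the query terms themselves.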

Page 17:

Dialogue Management: RavenClaw

Hierarchical agent architecture:

[Figure: agent tree rooted at EUREKA. Agent labels: Greet User, Prompt Query, New Query, Open Cluster, Submit Query, Get Cluster List, Get Doc List, Inform Results, Close Cluster]

Page 18:

NLG/TTS

Template-based Language Generation (e.g. “I found <n_doc> documents.”)

General purpose Festival voice for TTS

NB: Browsing through lists is not efficient with speech, even for lists of clusters
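
Template-based generation of the kind mentioned on this slide amounts to slot filling. The sketch below assumes the angle-bracket slot syntax of the slide's example; the template key `inform_results` and the `render` function are hypothetical names, not part of EUREKA.

```python
import re

# Hypothetical template store; only the template text comes from the slide.
TEMPLATES = {
    "inform_results": "I found <n_doc> documents.",
}


def render(template_name, **slots):
    """Fill every <slot> placeholder in the named template."""
    def fill(match):
        # Raises KeyError if the caller did not supply a value for the slot.
        return str(slots[match.group(1)])

    return re.sub(r"<(\w+)>", fill, TEMPLATES[template_name])


print(render("inform_results", n_doc=12))  # prints "I found 12 documents."
```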

Page 19:

Already Implemented

Working prototype

Commands:
    new_query
    list_clusters, list_documents
    open, close (cluster)
    more, back (list of clusters/documents)
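
The more and back commands suggest that lists are presented a few items at a time. A minimal sketch of such paging follows; the class name, the page size, and the behavior at either end of the list are assumptions, since the slides do not describe how the prototype handles them.

```python
class ListBrowser:
    """Present a long list a few items at a time."""

    def __init__(self, items, page_size=3):
        # page_size is an assumed parameter; the slides give no value.
        self.items = list(items)
        self.page_size = page_size
        self.start = 0

    def current(self):
        """Return the items on the current page."""
        return self.items[self.start:self.start + self.page_size]

    def more(self):
        """Move to the next page, staying put if this is the last one."""
        if self.start + self.page_size < len(self.items):
            self.start += self.page_size
        return self.current()

    def back(self):
        """Move to the previous page, staying put if this is the first one."""
        self.start = max(0, self.start - self.page_size)
        return self.current()
```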

Page 20:

Demo

Page 21:

Future Work

Add more functionalities (query refinement, summarization…)

Make clever use of the dialogue (not only command and control + browsing)
    System can provide advice to user on search strategies (e.g. “you need to refine the query”)
    User and system can negotiate to specify the user’s information need (cf Belkin: overview vs specific document)

Page 22:

Future Work/Discussion

Advantage of dialogue: more feedback from the user

How can dialogue improve the efficiency of low bandwidth IR?

Do we need to tailor IR techniques (e.g. clustering) for dialogue, or even design new techniques?

Page 23:

Vocal Access to IR

Problem: ASR introduces a lot of erroneous words in a spoken query (for an open domain, speaker independent system)

However, in an IR system: access to many text documents to help language modeling…

Page 24:

Vocal Access to a Newspaper Archive (Crestani 02)

Presents studies for a full voice-controlled IR system

No dialogue: user query → list of summaries

Focuses on issues of:
    TTS: can users make relevance judgments when they hear document descriptions synthesized over the phone? (answer: yes)
    ASR: how does IR perform with recognized queries?

Page 25:

Using IR Techniques to Deal with Recognition Errors

WER does have an impact on precision, although not much variation for WER in 27%-47%

Relevance feedback: use documents judged relevant by the user as query

Use prosodic stress to estimate information content of query terms

Include semantically/phonetically close terms in the query
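
As a rough illustration of the relevance feedback item, the sketch below builds a new query from the most frequent content words of documents the user judged relevant. This is a simplified stand-in and not the method evaluated in the cited study; the stopword list, the term count, and the function name are assumptions.

```python
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def feedback_query(relevant_docs, n_terms=5):
    """Return the n_terms most frequent content words of the relevant documents."""
    counts = Counter()
    for doc in relevant_docs:
        for word in doc.lower().split():
            word = word.strip(".,;:!?\"'()")
            if word and word not in STOPWORDS:
                counts[word] += 1
    return [term for term, _ in counts.most_common(n_terms)]
```

The appeal for spoken queries is that the new query terms come from clean text documents, so misrecognized words in the original query matter less in later rounds.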

Page 26:

Improving ASR (Fujii et al 02)

Fujii et al propose LM adaptation based on the IR corpus:
    Offline “adaptation”: train on the whole corpus
    Online adaptation: adapt on the top retrieved documents (then reperform ASR and IR)

Good results with offline trained LM (WER < 20%, AP loss of 20-30% from text IR)

No evaluation of online adaptation…
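
To make the offline/online distinction concrete, the sketch below trains a unigram model on a whole corpus and then mixes it with a model estimated from the top retrieved documents. Unigrams and linear interpolation are assumptions chosen for brevity; the slides do not say which model order or adaptation scheme Fujii et al use, and the function names and weight are hypothetical.

```python
from collections import Counter


def unigram_lm(texts):
    """Maximum-likelihood unigram probabilities from a list of texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(background, adapted, weight=0.5):
    """Mix two unigram models: weight * adapted + (1 - weight) * background."""
    vocab = set(background) | set(adapted)
    return {
        w: weight * adapted.get(w, 0.0) + (1 - weight) * background.get(w, 0.0)
        for w in vocab
    }


# "Offline": train on the whole (toy) corpus.
corpus_lm = unigram_lm(["a b", "a c"])
# "Online": estimate from the top retrieved documents, then mix.
top_docs_lm = unigram_lm(["c c"])
adapted_lm = interpolate(corpus_lm, top_docs_lm, weight=0.5)
```

In the online setting the adapted model would then be used for a second recognition pass, followed by a second retrieval pass.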

Page 27:

Vocal Access to IR: Discussion

Seems to work ok for some tasks

Clever use of IR techniques

BUT queries are neither spontaneous nor natural (maybe)
    LM for Web queries??

What about dialogue?