Enabling Domain Experts to Convey Questions to a Machine: A Modified, Template-Based Approach Peter Clark (Boeing Phantom Works) Ken Barker, Bruce Porter (Univ Texas at Austin) Vinay Chaudhri, Sunil Mishra, Jerome Thomere (SRI International)

Page 1

Enabling Domain Experts to Convey Questions to a Machine: A Modified, Template-Based Approach

Peter Clark (Boeing Phantom Works)
Ken Barker, Bruce Porter (Univ Texas at Austin)
Vinay Chaudhri, Sunil Mishra, Jerome Thomere (SRI International)

Page 2

How can End-Users Pose Questions?

Start-to-Finish Knowledge Capture:

• User needs to express:
  – domain knowledge
  – questions posed to that domain knowledge

• Posing questions:
  – can be straightforward, e.g., single-task systems:
    • “What disease does this patient have?”
  – or, can itself be a major “knowledge capture” challenge

• This talk:
  – How to pose questions (not how to answer them!)

Page 3

Some Example Questions…

1. When during RNA translation is the movement of a tRNA molecule from the A- to the P-site of a ribosome thought to occur?

2. What are the functions of RNA?

3. What happens to the DNA during RNA transcription?

4. In a cell, what factors affect the rate of protein production?

5. A mutation in DNA generates a UGA stop codon in the middle of the RNA coding for a particular protein. What nucleotide change has probably occurred?

Page 4

Some Previous Approaches

• Just allow one question to be asked
  – “What disease does this patient have?”
  – But: inappropriate for multifunctional systems

• Ask in Natural Language
  – e.g., for databases
    • “How many employees work for Joe?”
  – But: lacks sufficient constraint

• Use Question Templates
  – e.g., HPKB
    • “What risks/rewards would <country> face/expect in taking hostage citizens of <country>?”
  – But: domain-specific

Page 5

A Modified, Template-Based Approach

Claims:

1. Complex questions can be factored into
   – the question scenario (“Imagine that…”)
   – query to that scenario (“Thus, what is…”)

2. The scenario contains most of the complexity
   – The “raw query” itself is usually simple

3. The query can be mapped into one of a small number of domain-general templates
   – grouped around different modeling paradigms

Page 6

A Modified, Template-Based Approach

Basis for a modified, template-based approach:

Full Question = Scenario + Query

– Scenario: capture using graphical tools (Shaken)
– Query: capture with a finite set of domain-general templates

Page 7

Full Question = Scenario + Query

• Create using a graphical “representation builder”
  – Select objects from an ontology
  – Connect them together using a small library of relations
  – Graph converted to ground logic assertions

“A DNA virus invades the cell of a multicellular organism”
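
The graph-to-assertions step can be pictured with a small sketch. It assumes a scenario is a set of typed instances plus labelled edges between them. The instance names (`_Virus1`, `_Invade1`, ...), the relation names (`agent`, `object`, `has-part`), the class names, and the printed syntax are all illustrative assumptions; none of them are taken from Shaken or its ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """A scenario graph: typed instances (nodes) and labelled relations (edges)."""
    instances: dict = field(default_factory=dict)  # instance name -> ontology class
    relations: list = field(default_factory=list)  # (relation, source, target)

    def add_instance(self, name, ontology_class):
        self.instances[name] = ontology_class
        return name

    def connect(self, relation, source, target):
        # Both ends of an edge must already be nodes in the graph.
        for node in (source, target):
            if node not in self.instances:
                raise KeyError(f"unknown instance: {node}")
        self.relations.append((relation, source, target))

    def to_assertions(self):
        """Flatten the graph into ground (variable-free) assertions."""
        facts = [f"instance-of({name}, {cls})" for name, cls in self.instances.items()]
        facts += [f"{rel}({src}, {dst})" for rel, src, dst in self.relations]
        return facts


# Hypothetical encoding of the slide's example sentence.
s = Scenario()
virus = s.add_instance("_Virus1", "DNA-Virus")
cell = s.add_instance("_Cell1", "Cell")
organism = s.add_instance("_Organism1", "Multicellular-Organism")
invade = s.add_instance("_Invade1", "Invade")
s.connect("agent", invade, virus)
s.connect("object", invade, cell)
s.connect("has-part", organism, cell)

for fact in s.to_assertions():
    print(fact)
```

Every assertion mentions only named instances, which is what "ground" means here: the scenario describes one specific imagined situation rather than a general rule.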

Page 8

Full Question = Scenario + Query

• Huge variety of possible queries
  – But can be grouped according to reasoning paradigms the KB supports

• Catalog of 29 Domain-General Question Types
  – based on analysis of 339 cell biology questions
  – have a fill-in-the-blank template
  – “blanks” are (often complex) objects from the scenario
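
A fill-in-the-blank template whose blanks must be scenario objects might look roughly like the sketch below. The angle-bracket slot syntax, the class name `QuestionTemplate`, and the generalized wording "What happens to <entity> during <event>?" are assumptions made for illustration, patterned on the example question "What happens to the DNA during RNA transcription?"; the talk does not give the templates' internal form.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionTemplate:
    """A domain-general question type with named blanks in angle brackets."""
    name: str
    text: str

    def slots(self):
        return re.findall(r"<(\w+)>", self.text)

    def fill(self, scenario_objects, **bindings):
        """Fill each blank with an object that must come from the scenario."""
        missing = [slot for slot in self.slots() if slot not in bindings]
        if missing:
            raise ValueError(f"unfilled blanks: {missing}")
        for obj in bindings.values():
            if obj not in scenario_objects:
                raise ValueError(f"{obj!r} is not an object in the scenario")
        return re.sub(r"<(\w+)>", lambda m: bindings[m.group(1)], self.text)


# Hypothetical template and scenario objects.
template = QuestionTemplate("event-effects", "What happens to <entity> during <event>?")
scenario_objects = {"the DNA", "RNA transcription", "RNA polymerase"}

question = template.fill(scenario_objects, entity="the DNA", event="RNA transcription")
print(question)
```

Restricting the blanks to scenario objects is the design point: the template stays domain-general, and all domain-specific content comes from the scenario.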

Page 9

Paradigms and Some Templates…

1. Lookup & Simple Deductive Reasoning
   q2 “What is/are the function of RNA?”
   q4 “Is a ribosome a cytoplasmic organelle?”
   q6 “How many membranes are in the parts relationship to the ribosome?”

2. Discrete Event Simulation
   q12 “What happens to the DNA during RNA transcription?”

3. Qualitative Reasoning
   q25 “In cell protein synthesis, what factors affect the rate of protein production?”
   q26 “In RNA transcription, what factors might cause the transcription rate to increase?”

4. Analogical/Comparative Reasoning
   q29 “What is the difference between procaryotic mRNA and eucaryotic mRNA?”
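
One way to picture the catalog is as a mapping from reasoning paradigm to question types. The sketch below holds only the seven examples shown on this slide (the full catalog has 29 types); the dictionary layout and the helper `paradigm_of` are assumptions for illustration, not the system's data structure.

```python
# Paradigm -> {question type id: example instance from the slide}.
CATALOG = {
    "Lookup & Simple Deductive Reasoning": {
        "q2": "What is/are the function of RNA?",
        "q4": "Is a ribosome a cytoplasmic organelle?",
        "q6": "How many membranes are in the parts relationship to the ribosome?",
    },
    "Discrete Event Simulation": {
        "q12": "What happens to the DNA during RNA transcription?",
    },
    "Qualitative Reasoning": {
        "q25": "In cell protein synthesis, what factors affect the rate of protein production?",
        "q26": "In RNA transcription, what factors might cause the transcription rate to increase?",
    },
    "Analogical/Comparative Reasoning": {
        "q29": "What is the difference between procaryotic mRNA and eucaryotic mRNA?",
    },
}


def paradigm_of(question_type):
    """Return the reasoning paradigm that a question type is grouped under."""
    for paradigm, templates in CATALOG.items():
        if question_type in templates:
            return paradigm
    raise KeyError(question_type)


print(paradigm_of("q25"))
```

Grouping by paradigm reflects the slide's claim that the query, once chosen, tells the system which kind of reasoning to run over the scenario.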

Page 10

Question Reformulation

• Small set of question types, so users often must re-cast the original question in terms of those types
• For example…

7.1.5-270: “Where in a eucaryotic cell does RNA transcription take place?”
   Re-cast as: “What is/are the site of RNA transcription?”

7.1.4.118: “When is the sigma factor of bacterial RNA polymerase released with respect to RNA transcription?”
   Re-cast as: “During RNA transcription, when does the RNA polymerase | release | the sigma factor?”
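
The gap between an original question and its re-cast form can be illustrated with a toy pattern match. The regular expression below is a hypothetical stand-in for the "What is/are the ... of ...?" template; the talk does not describe the system as matching question strings, so this only shows why the original wording does not fit the template's shape while the re-cast wording does.

```python
import re

# Hypothetical pattern for the "What is/are the <slot> of <concept>?" template.
PATTERN = re.compile(r"^What is/are the (?P<slot>.+?) of (?P<concept>.+)\?$")


def match_template(question):
    """Return the template's blanks if the question fits it, else None."""
    m = PATTERN.match(question.strip())
    return m.groupdict() if m else None


original = "Where in a eucaryotic cell does RNA transcription take place?"
recast = "What is/are the site of RNA transcription?"

print(match_template(original))  # the original wording does not fit
print(match_template(recast))    # the re-cast wording fills both blanks
```

Note that the re-casting itself (recognizing that "where ... take place" means "site of") is the step left to the user, which is exactly the burden this slide describes.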

Page 11

Posing questions

Page 12

Posing questions (cont)

Page 13

Receiving Answers

Page 14

Receiving Answers (cont)

Page 15

Evaluation and Lessons Learned

• Large-scale trials in 2001
• 4 biology students used system for 4 weeks
• Their goals:
  – Encode 11-page subsection on cell biology
  – Test their representations using a set of 70 questions
    • Qns expressed in English
    • High-school level of difficulty
    • Qns set independently, no knowledge of our templates
• 18 of the 29 templates implemented at time of trials

Page 16

Results

• It works…
  – All 4 users able to pose most (~80%) of the qns
  – Answer score (average) = 2.23 (2 = “mostly correct”)
  – Exposes what the system is able to do

• …but three major challenges…

Page 17

Challenges

1. Users had difficulty reformulating their questions to match a template, e.g.

   (Original) “Where in a eucaryotic cell does RNA transcription take place?”
   (Desired) “What is/are the site of RNA transcription?”
   (User) “What is RNA transcription?”

   Heavy use of a few generic templates:

Page 18

Challenges

• Reformulation is not just a rewording task
• Rather, requires user to view problem in terms of one of the KB’s modeling paradigms
• Easier for us than for the users

Page 19

Challenges

2. Users need to be fluent with the graph tool and KB ontology for specifying scenarios
   • Not a problem in this case

Page 20

Challenges

3. Sometimes, the template approach breaks down

• Some questions require identifying the scenarios:
  • “What kinds of final products result from mRNA?”
• Similarly, identifying the right viewpoint/level of detail:
  • e.g., DNA as a line vs. sequence vs. two strands
• Some topics not covered by templates
  • Uncertainty, causal event structure
  • Diagnosis, abduction
• Some questions go beyond concepts in the KB
  • “What are the building blocks of proteins?”
• Can’t specify “impossible objects”
  • “Is <object> possible?”

Page 21

Summary

• Conveying questions can itself be a major “knowledge capture” challenge
• A modified, template-based approach:
  – Factor full questions into scenarios + templates
  – Templates are domain-general, and based on the modeling paradigms available
  – Balances flexibility vs. interpretability
• Results:
  – A catalog of templates
  – Approach works, but with significant caveats