
A Tripartite Question Answering Architecture for Integrating Diverse Knowledge Resources

Boris Katz, Gary Borchardt, Sue Felshin and Jimmy Lin

MIT Computer Science and Artificial Intelligence Laboratory

October 8, 2004


Moving Forward

Question answering today…
- Mostly focused on simple questions
- Driven by IR and named-entity detection
- One-shot interactions: “context free”
- Focused on textual documents

Future directions:
- More complex questions
- Deeper semantic processing
- Knowledge from multiple resources
- Extended user interactions: “scenario-based QA”
- Multimodal QA: retrieving audio and video

MIT AQUAINT Phase 2 Focus


Project Goals

- Develop advanced QA capabilities
  - Push the envelope in NLP technology
  - Create natural user-system interactions
  - Provide seamless access to heterogeneous data
  - Fuse knowledge from multiple resources
  - Integrate linguistic, statistical, and knowledge-based strategies
- Build a comprehensive end-to-end QA system
  - Focus on deployment in real-world environments
- Contribute to theories of knowledge representation and language comprehension


Tripartite QA Architecture

Figure: Complex Natural Language Questions enter at the top and Diverse Knowledge Sources sit at the bottom; between them are three layers: Natural Language Understanding, Knowledge Fusion and Complex Reasoning, and Uniform Access to Diverse Knowledge Resources.


Top Layer: Understanding Language

Coordinate natural language interactions with users

Primary responsibilities:
- Analyze natural language sentences
- Disambiguate user information needs interactively
- Manage discourse and dialog


Bottom Layer: Accessing Resources

Complex questions require multiple heterogeneous resources to answer

Our solution: OmniStore, a uniform knowledge repository based on ternary expressions

Sources of knowledge:
- Structured and semi-structured databases
- Syntactic and semantic relations automatically extracted from free text
- Natural language annotations attached to opaque knowledge segments
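As a rough illustration of what a uniform repository of ternary expressions buys, the sketch below stores facts that could come from different kinds of sources as <subject relation object> triples and answers a pattern containing variables. It is a minimal sketch under stated assumptions: the class `TripleStore`, its `match` method, the `?name` variable convention, and the relation names are all hypothetical and are not OmniStore's actual interface.

```python
# Minimal sketch of a uniform triple repository. TripleStore, match(), and the
# "?name" variable convention are hypothetical, not OmniStore's real API.
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Terms written as '?name' act as query variables."""
    return term.startswith("?")


class TripleStore:
    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, subj: str, rel: str, obj: str) -> None:
        self.triples.append((subj, rel, obj))

    def match(self, pattern: Triple) -> Iterator[Dict[str, str]]:
        """Yield one variable binding per stored triple that fits the pattern."""
        for triple in self.triples:
            bindings: Dict[str, str] = {}
            for p, t in zip(pattern, triple):
                if is_var(p):
                    # A repeated variable must bind to the same value each time.
                    if bindings.setdefault(p, t) != t:
                        break
                elif p != t:
                    break
            else:  # no break: every position matched
                yield bindings


store = TripleStore()
# Facts that might come from a database, extracted text, or an annotation,
# all normalized to the same triple form (values taken from the slides' examples).
store.add("Almaty", "capital-of", "Kazakhstan")
store.add("Aum Supreme Truth", "used", "sarin")
store.add("Kazakhstan", "is", "Asian")

print(list(store.match(("?city", "capital-of", "Kazakhstan"))))
# → [{'?city': 'Almaty'}]
```

The point of the design is that once every source is reduced to the same triple shape, a single matching routine serves all of them.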


OmniStore

Figure: Knowledge Extraction links Diverse Knowledge Sources (Databases, Semi-Structured Sources, and Other Knowledge-Based Systems; Arbitrary Procedures and Opaque Knowledge Segments; Unstructured Texts) to a Knowledge Repository of Ternary Expressions.


Middle Layer: Connecting the Pieces

Bridge the gap between questions and knowledge required to answer those questions

Knowledge fusion and complex reasoning:
- Decompose complex questions into combinations of simpler questions
- Efficiently access resources required to answer individual questions
- Combine smaller “nuggets of knowledge” into a coherent response


In This Presentation

- Explicit, syntactically based decomposition of questions
  - Using syntactic cues to decompose questions into combinations of simpler questions
  - Answering simpler questions with different resources
- Implicit, semantically based decomposition of questions
  - Applying domain rules to decompose questions into combinations of simpler questions
  - Answering simpler questions using the CNS WMD Terrorism Database
- Managing extended user interactions
  - Creating more natural dialog by handling ellipsis
- The beginnings of...


Figure: The MIT AQUAINT QA Server. The three layers (Natural Language Understanding; Knowledge Fusion and Complex Reasoning; Uniform Access to Diverse Knowledge Resources) are realized by START+, IMPACT+, and Omnibase+, respectively, connecting Complex Natural Language Questions to Diverse Knowledge Sources: the WMD Terrorism database, Infoplease, Biography.com, and WorldBook.


Answering Complex Questions

Syntactically decomposing questions:

How many people live in the capital of the third largest Asian country?
- What is the third largest Asian country?
- What is its capital?
- How many people live there?

Semantically decomposing questions:

Could HAMAS carry out an attack in the United States with biological agents?
- Does HAMAS have the expertise to carry out an attack using biological agents?
- Does HAMAS have the motivation to carry out an attack in the United States?


Syntactic Decomposition

- Parse questions into nested ternary expressions
- Successively resolve groups of ternary expressions containing unbound variables
  - Answer sub-questions by replacing variables with values

How many people live in the capital of the 3rd largest Asian country?

1. What is the 3rd largest Asian country? ANSWER = Kazakhstan

2. What is the capital of Kazakhstan? ANSWER = Almaty

3. How many people live in Almaty? ANSWER = 1.2 million
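The chained resolution in this example can be sketched as a loop in which each answer is substituted into the next sub-question. This is a hedged illustration only: `ask` and its lookup table `ANSWERS` are hypothetical stand-ins for dispatching a simple question to some resource, and the table merely replays the answers shown above rather than querying anything.

```python
# Sketch: a nested question answered as a chain of simpler questions,
# each answer substituted into the next sub-question. ask() and ANSWERS are
# hypothetical stand-ins for real resources; the values replay the slide.
from typing import Callable, Dict, List

ANSWERS: Dict[str, str] = {
    "What is the 3rd largest Asian country?": "Kazakhstan",
    "What is the capital of Kazakhstan?": "Almaty",
    "How many people live in Almaty?": "1.2 million",
}


def ask(question: str) -> str:
    """Hypothetical dispatcher: send one simple question to a resource."""
    return ANSWERS[question]


def answer_chain(steps: List[str], resolver: Callable[[str], str] = ask) -> List[str]:
    """Resolve sub-questions in order; '{prev}' is filled with the previous answer."""
    answers: List[str] = []
    prev = ""
    for template in steps:
        question = template.format(prev=prev)
        prev = resolver(question)
        answers.append(prev)
    return answers


steps = [
    "What is the 3rd largest Asian country?",
    "What is the capital of {prev}?",
    "How many people live in {prev}?",
]
print(answer_chain(steps))  # → ['Kazakhstan', 'Almaty', '1.2 million']
```

Passing the resolver as a parameter mirrors the idea that each sub-question may be routed to a different resource.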


A Complete Example

How many people live in the capital of the 3rd largest Asian country?

< < people+9814 live > in capital+9815 >
< people+9814 quantity *numeral* >
< capital+9815 related-to country+9813 >
< country+9813 is Asian >
< country+9813 is largest+9816 >
< largest+9816 mod third >

country+9813 = Kazakhstan: The third largest Asian country is Kazakhstan.

< < people+9814 live > in capital+9815 >
< people+9814 quantity *numeral* >
< capital+9815 related-to Kazakhstan >

capital+9815 = Almaty: The capital of Kazakhstan is Almaty.

< < people+9814 live > in Almaty >
< people+9814 quantity *numeral* >

*numeral* = 1.2 million: The population of Almaty is 1.2 million.
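The rewriting steps in this example amount to two operations: drop the group of expressions a sub-question has just answered, and substitute the resulting binding into what remains. The sketch below replays them with nested Python tuples standing in for ternary expressions; that representation and the helper names `substitute` and `resolve` are assumptions for illustration, not START's internals.

```python
# Sketch: nested tuples stand in for ternary expressions (an assumption,
# not START's data structures). Variables are plain strings like "country+9813".
from typing import Dict, List


def substitute(term, bindings: Dict[str, str]):
    """Recursively replace bound variables inside a possibly nested expression."""
    if isinstance(term, tuple):
        return tuple(substitute(t, bindings) for t in term)
    return bindings.get(term, term)


def resolve(exprs: List[tuple], answered: List[tuple], bindings: Dict[str, str]) -> List[tuple]:
    """Drop the expressions a sub-question answered; push its binding into the rest."""
    return [substitute(e, bindings) for e in exprs if e not in answered]


question = [
    (("people+9814", "live"), "in", "capital+9815"),
    ("people+9814", "quantity", "*numeral*"),
    ("capital+9815", "related-to", "country+9813"),
    ("country+9813", "is", "Asian"),
    ("country+9813", "is", "largest+9816"),
    ("largest+9816", "mod", "third"),
]

# Sub-question 1 used the last three expressions and found country+9813 = Kazakhstan.
step1 = resolve(question, question[3:], {"country+9813": "Kazakhstan"})
# Sub-question 2 used the related-to expression and found capital+9815 = Almaty.
step2 = resolve(step1, step1[2:], {"capital+9815": "Almaty"})

for expr in step2:
    print(expr)
# → (('people+9814', 'live'), 'in', 'Almaty')
# → ('people+9814', 'quantity', '*numeral*')
```

What remains after the second step is exactly the final sub-question from the example, with only *numeral* left unbound.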


Ellipsis

"What country in Africa has the largest population?"

"How about area?"

Possible antecedents for "area": "country", "Africa", "population"

There are three NPs in the previous query. Which one should be replaced?

START employs linguistic and ontological knowledge to resolve ambiguities:

- Lexical semantic properties of English nouns
- Reasoning over relevant domain knowledge
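One crude way to picture the use of lexical semantic properties here: the elliptical NP replaces the antecedent that belongs to the same semantic class, so "area" displaces "population" (both are measurable attributes of a country) rather than "country" or "Africa". The sketch below assumes a tiny hand-built class lexicon (`SEMANTIC_CLASS`, hypothetical) and plain string replacement; START's actual lexical and domain reasoning is considerably richer, and this is not its algorithm.

```python
# Sketch of class-based antecedent selection for ellipsis. SEMANTIC_CLASS and
# both functions are hypothetical illustrations, not START's implementation.
from typing import Dict, List, Optional

# Hypothetical lexicon: the semantic class of each noun.
SEMANTIC_CLASS: Dict[str, str] = {
    "country": "geopolitical-entity",
    "Africa": "region",
    "population": "country-attribute",
    "area": "country-attribute",
}


def pick_antecedent(new_np: str, candidates: List[str]) -> Optional[str]:
    """Return the single candidate NP that shares the new NP's semantic class."""
    target = SEMANTIC_CLASS.get(new_np)
    if target is None:
        return None
    matches = [np for np in candidates if SEMANTIC_CLASS.get(np) == target]
    return matches[0] if len(matches) == 1 else None


def resolve_ellipsis(previous: str, new_np: str, candidates: List[str]) -> Optional[str]:
    """Rebuild the full question by swapping the chosen antecedent for the new NP."""
    antecedent = pick_antecedent(new_np, candidates)
    if antecedent is None:
        return None  # unknown word or still ambiguous; a dialog system might ask back
    return previous.replace(antecedent, new_np, 1)


previous = "What country in Africa has the largest population?"
print(resolve_ellipsis(previous, "area", ["country", "Africa", "population"]))
# → What country in Africa has the largest area?
```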


A Visualization

Figure: Natural Language Questions pass through Syntactic and Semantic Decomposition using Domain Knowledge, yielding Symbolic Queries to Individual Resources (Resource 1, Resource 2, …, Resource n).


The WMD Terrorism Database


Knowledge Templates

Stylized natural language “wrappers” around selected database fields:

In [1995], [religious cult] [Aum Supreme Truth] carried out a [use of agent] in [Japan], involving [chemical agent] [sarin].
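Concretely, a knowledge template can be thought of as a sentence pattern whose bracketed slots are filled from one database record. The sketch below regenerates the example sentence; the field names (`year`, `group_type`, and so on) are hypothetical, since the database's actual schema is not given here, and the values are those shown in the example.

```python
# Sketch: a knowledge template as a sentence with one slot per database field.
# Field names are hypothetical; values are copied from the slide's example.
from typing import Dict

TEMPLATE = ("In [{year}], [{group_type}] [{group}] carried out a [{event_type}] "
            "in [{country}], involving [{agent_type}] [{agent}].")

record: Dict[str, str] = {
    "year": "1995",
    "group_type": "religious cult",
    "group": "Aum Supreme Truth",
    "event_type": "use of agent",
    "country": "Japan",
    "agent_type": "chemical agent",
    "agent": "sarin",
}

sentence = TEMPLATE.format(**record)
print(sentence)
```

Because every record in the same table shares one template, the database can be exposed as a set of English sentences without changing its contents.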


Query Arguments

In [1995], [religious cult] [Aum Supreme Truth] carried out a [use of agent] in [Japan], involving [chemical agent] [sarin].

The year field can be filled by any of these argument types:
- constants (e.g., "1995")
- unnamed variables (e.g., "something")
- named variables (e.g., "some year")
- restricted variables (e.g., "some year (> 1990)")
- reported variables (e.g., "what year")
- ...

Each field can be treated similarly…
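One way to model the argument types listed above is as different kinds of slot fillers matched against each record's fields: constants must match exactly, unnamed variables match anything, named and restricted variables bind a value (restricted ones must also pass a test), and reported variables bind and are returned as the answer. The sketch below is a hypothetical model of that behavior over the single record from the example; the class names, field names, and helper functions are assumptions, not the system's actual query language.

```python
# Hypothetical model of query argument types matched against database records.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Const:       # e.g. "1995": must match the field exactly
    value: str


@dataclass
class Unnamed:     # e.g. "something": matches anything, binds nothing
    pass


@dataclass
class Named:       # e.g. "some year": binds; a repeated name must agree
    name: str


@dataclass
class Restricted:  # e.g. "some year (> 1990)": a named variable plus a test
    name: str
    test: Callable[[str], bool]


@dataclass
class Reported:    # e.g. "what year": binds, and its value is the answer
    name: str


Arg = Union[Const, Unnamed, Named, Restricted, Reported]


def match_record(query: Dict[str, Arg], record: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return variable bindings if the record satisfies every argument, else None."""
    bindings: Dict[str, str] = {}
    for field, arg in query.items():
        value = record[field]
        if isinstance(arg, Const):
            if value != arg.value:
                return None
        elif isinstance(arg, Unnamed):
            continue
        else:
            if isinstance(arg, Restricted) and not arg.test(value):
                return None
            if bindings.setdefault(arg.name, value) != value:
                return None
    return bindings


def answer(query: Dict[str, Arg], records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Collect the reported variables' values from every matching record."""
    reported = [a.name for a in query.values() if isinstance(a, Reported)]
    results = []
    for record in records:
        bindings = match_record(query, record)
        if bindings is not None:
            results.append({name: bindings[name] for name in reported})
    return results


RECORDS = [{"year": "1995", "group_type": "religious cult", "group": "Aum Supreme Truth",
            "event_type": "use of agent", "country": "Japan",
            "agent_type": "chemical agent", "agent": "sarin"}]

# "In what year did some group carry out a use of agent in Japan, involving something?"
q1 = {"year": Reported("year"), "group": Named("group"),
      "event_type": Const("use of agent"), "country": Const("Japan"), "agent": Unnamed()}
print(answer(q1, RECORDS))  # → [{'year': '1995'}]

# "What group carried out something in some year (> 2000)?"
q2 = {"year": Restricted("year", lambda y: int(y) > 2000), "group": Reported("group")}
print(answer(q2, RECORDS))  # → []
```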


From Language to Queries

Many natural language questions can be represented by the same knowledge template:

[<a group>] could carry out an attack in [<a country>] using a [<an agent type>].

“Could the KKK be involved in an attack using biological weapons?”

“Could an attack be carried out in Italy involving chemical weapons?”

“Are any groups trying to conduct an attack in the United States?”

“What groups will be able to carry out an attack in the US?”

“In what countries could Hizballah execute an attack?”

“Aum Shinrikyo could carry out an attack with what agent types?”


Two Domain Rules

Rule 1

If: [some group] has the expertise to carry out an attack using a [some agent type].

And: [some group] has the motivation to carry out an attack in [some country].

Then: [some group] could carry out an attack in [some country] using a [some agent type].

(A group could carry out an attack if the group has the expertise and the motivation to do so.)

Rule 2

If: In [something], [something] [some group] carried out a [something (<>attempted acquisition) (<>hoax/prank/threat) (<>plot only)] in [something], involving [some agent type] [something].

Then: [some group] has the expertise to carry out an attack using a [some agent type].

(A group has the expertise to carry out an attack if the group has been involved in a WMD terrorism incident other than an attempted acquisition, a hoax, prank, or threat, or an unexecuted plot; "<>" marks an excluded event type.)
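Read as rules to chain backward from a question, the two rules above let a "could" question be answered by checking expertise against database incidents and motivation against some other source. The sketch below uses hypothetical placeholder records ("Group X", "Country A") rather than database content; the slides do not say how motivation is determined, so `has_motivation` is an assumed, caller-supplied check rather than an implementation of any rule shown here.

```python
# Sketch of the two domain rules over placeholder incident records.
# All names and data here are hypothetical illustrations.
from typing import Callable, Dict, List, Set, Tuple

# Event types that, per the expertise rule, do not demonstrate expertise.
NOT_EXECUTED = {"attempted acquisition", "hoax/prank/threat", "plot only"}

Incident = Dict[str, str]


def has_expertise(group: str, agent_type: str, incidents: List[Incident]) -> bool:
    """Rule 2: some incident by the group with this agent type was actually executed."""
    return any(
        inc["group"] == group
        and inc["agent_type"] == agent_type
        and inc["event_type"] not in NOT_EXECUTED
        for inc in incidents
    )


def could_attack(group: str, country: str, agent_type: str,
                 incidents: List[Incident],
                 has_motivation: Callable[[str, str], bool]) -> bool:
    """Rule 1: expertise with the agent type AND motivation regarding the country."""
    return has_expertise(group, agent_type, incidents) and has_motivation(group, country)


# Placeholder data for illustration only; not drawn from the WMD database.
incidents = [
    {"group": "Group X", "event_type": "use of agent", "agent_type": "chemical agent"},
    {"group": "Group Y", "event_type": "hoax/prank/threat", "agent_type": "biological agent"},
]
motivated: Set[Tuple[str, str]] = {("Group X", "Country A"), ("Group Y", "Country A")}


def has_motivation(group: str, country: str) -> bool:
    """Assumed stand-in for whatever source supplies motivation facts."""
    return (group, country) in motivated


print(could_attack("Group X", "Country A", "chemical agent", incidents, has_motivation))    # → True
print(could_attack("Group Y", "Country A", "biological agent", incidents, has_motivation))  # → False
```

Group Y fails only because its sole incident is a hoax, which the expertise rule excludes, even though it satisfies the motivation check.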


Terrorist Activities

Which groups have been involved in attacks in the United States?

Has Aum Shinrikyo carried out an attack in Japan with a biological agent?

In what countries have organizations executed attacks with radiological weapons?

Did the Japanese Red Army carry out a threat in Japan?

Has the KKK been engaged in an attack in the US?

What groups have put on a hoax in the United States?

Did Aum Supreme Truth plot to use a chemical agent in the United States?

Which groups have acquired a chemical weapon?

What groups have issued a threat?

Did the Animal Liberation Front issue a threat?


Database Contents

What group types are there?

What groups are in the WMD DB?

Is Aum Shinrikyo portrayed in the WMD Terrorism Database?

Is Turkey in the WMD Terrorism DB?

Is PFLP in the WMD Terrorism DB?

Is the Japanese Red Army specified in the WMD Terrorism Database?

Is the KKK included?

What event types are specified in the WMD DB?

What countries are in the WMD DB?

Is the Netherlands in the WMD DB?

What agent types are in the WMD Terrorism Database?


Relationships

What group type is Dark Harvest?

What groups are right-wing organizations?

What event types are left-wing groups associated with?

What group types are in Mexico?

What countries have criminal organizations?

Are criminal organizations in Lithuania?

Religious cults have a presence in what countries?

Are nationalist groups associated with radiological agents?

What groups are associated with use of agents?

Is Aum Shinrikyo associated with use of agents?

What groups does Canada have?

The Red Army Faction is in what countries?

What groups have a presence in Turkey?

What groups are associated with nuclear agents?

Has use of agents occurred in Germany?


Capabilities and Motivations

Does Hizballah want to carry out an attack in Lebanon?

Which groups have the motivation to carry out an attack in France?

Does Hizballah have the expertise to carry out an attack with a chemical agent?

What groups have the expertise to carry out an attack with a chemical agent?

Could Hizballah conduct an attack in Turkey using a biological agent?

Using what agent types could Hizballah execute an attack in Lebanon?

Are the Chechen rebels able to carry out an attack in Georgia using a chemical agent?

In what countries could Hizballah carry out an attack using biological agents?


Summary

Initial realization of the tripartite QA architecture completed

New capabilities:
- Explicit, syntactically based decomposition of questions
- Augmented handling of elliptic questions
- Implicit, semantically based decomposition of questions

Incorporated resources:
- CNS WMD Terrorism database
- A range of web-based resources (e.g., Infoplease)