
Page 1: Question Answering

Group Members: Satadru Biswas (05005021)

Tanmay Khirwadkar (05005016)

Arun Karthikeyan Karra (05d05020)

CS 626-460 Course Seminar, Group-2

Question Answering

Page 2: Question Answering

Outline

Introduction

Why Question Answering?

AskMSR

FALCON

Conclusion

Page 3: Question Answering

Introduction

Question Answering (QA) is the task of automatically answering a question posed in natural language.

To find the answer to a question, a QA computer program may use either a pre-structured database or a collection of natural language documents (a text corpus such as the World Wide Web or some local collection).

Page 4: Question Answering

A few sample questions

Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth

Q: How many lives were lost in the Pan Am crash in Lockerbie?

A: 270

Q: How long does it take to travel from London to Paris through the Channel?

A: three hours 45 minutes

Q: Which Atlantic hurricane had the highest recorded wind speed?

A: Gilbert (200 mph)

Page 5: Question Answering

Why Question Answering?

Google – query-driven search: answers to a query are documents

Question Answering – answer-driven search: answers to a query are phrases

Page 6: Question Answering

Approaches

Question classification
Finding the entailed answer type
Use of WordNet
High-quality document search

Page 7: Question Answering

Question Classes

Class 1
A: single datum or list of items
C: who, when, where, how (old, much, large)
Example: Who shot President Abraham Lincoln?
Answer: John Wilkes Booth

Class 2
A: multi-sentence
C: extract from multiple sentences
Example: Who was Picasso?
Answer: Picasso was a great Spanish painter

Class 3
A: across several texts
C: comparative/contrastive
Example: What are the Valdez Principles?

Page 8: Question Answering

Question Classes (contd.)

Class 4
A: an analysis of retrieved information
C: synthesized coherently from several retrieved fragments
Example: Which Atlantic hurricane had the highest recorded wind speed?
Answer: Gilbert (200 mph)

Class 5
A: result of reasoning
C: word/domain knowledge and common-sense reasoning
Example: What did Richard Feynman say upon hearing he would receive the Nobel Prize in Physics?

Page 9: Question Answering

Types of QA

Closed-domain question answering deals with questions under a specific domain, and can be seen as an easier task because NLP systems can exploit domain-specific knowledge frequently formalized in ontologies.

Open-domain question answering deals with questions about nearly everything, and can only rely on general ontologies and world knowledge. On the other hand, these systems usually have much more data available from which to extract the answer.

Page 10: Question Answering

QA Concepts

Question Classes: Different types of questions require the use of different strategies to find the answer.

Question Processing: A semantic model of question understanding and processing is needed, one that would recognize equivalent questions, regardless of the speech act or of the words, syntactic inter-relations or idiomatic forms.

Context and QA: Questions are usually asked within a context and answers are provided within that specific context.

Data sources for QA: Before a question can be answered, it must be known what knowledge sources are available.

Page 11: Question Answering

QA Concepts (cont.)

Answer Extraction: Answer extraction depends on the complexity of the question, on the answer type provided by question processing, on the actual data where the answer is searched, on the search method, and on the question focus and context.

Answer Formulation: The result of a QA system should be presented in as natural a way as possible.

Real-time question answering: There is a need for QA systems capable of extracting answers from large data sets in several seconds, regardless of the complexity of the question, the size and multitude of the data sources, or the ambiguity of the question.

Multi-lingual QA: The ability to answer a question posed in one language using an answer corpus in another language (or even several).

Page 12: Question Answering

QA Concepts (cont.)

Interactive QA: Often the questioner wants not only to reformulate the question but also to have a dialogue with the system.

Advanced reasoning for QA: More sophisticated questioners expect answers which are outside the scope of written texts or structured databases.

User profiling for QA: The user profile captures data about the questioner, comprising context data, domain of interest, reasoning schemes frequently used by the questioner, common ground established within different dialogues between the system and the user etc.

Page 13: Question Answering

Tanmay Khirwadkar

Question Answering with the Help of the Web

Page 14: Question Answering

Issues with traditional QA systems

Retrieval is performed against a small set of documents
Extensive use of linguistic resources: POS tagging, named-entity tagging, WordNet, etc.
Difficult to recognize answers that do not match the question syntax

E.g. Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth is perhaps America's most infamous assassin, having fired the bullet that killed Abraham Lincoln.

Page 15: Question Answering

The Web can help!

The Web is a gigantic data repository with extensive data redundancy
Factoids are likely to be expressed in hundreds of different ways
At least a few will match the way the question was asked

E.g. Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth shot President Abraham Lincoln.

Page 16: Question Answering

AskMSR

Based on the data redundancy of the Web
Process the question
Form a web search-engine query
Recognize the answer type
Rank answers on the basis of frequency
Project the answers onto the TREC corpus

Page 17: Question Answering

1. Query Reformulation

The question is often syntactically close to the answer

E.g. Where is the Louvre Museum located?
The Louvre Museum is located in Paris.

Who created the character of Scrooge?
Charles Dickens created the character of Scrooge.

Page 18: Question Answering

1. Query Reformulation

Classify the query into 7 categories: who, when, where, …
Hand-crafted category-specific rewrite rules: [String, L/R/-, Weight]
Weight – preference for a query: "Abraham Lincoln born on" preferred to "Abraham" "Lincoln" "born"
String – simple string manipulations

Page 19: Question Answering

1. Query Reformulation

E.g. for "What is X?" questions, move "is" to all possible positions:

Q: What is relative humidity?

["is relative humidity", LEFT, 5]
["relative is humidity", RIGHT, 5]
["relative humidity is", RIGHT, 5]
["relative humidity", NULL, 2]
["relative" AND "humidity", NULL, 1]

Some rewrites may be nonsensical
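The rewrite step above can be sketched in code. This is a minimal illustration, not the actual AskMSR implementation: the function name `rewrite_what_is` and the exact weights are assumptions, while the [string, position, weight] format follows the slides.

```python
# Sketch of AskMSR-style rewrite generation for "What is X?" questions.
# The rule format [string, L/R/NULL, weight] follows the slides; the
# function name and weight values are illustrative assumptions.

def rewrite_what_is(question):
    """Return [string, position, weight] rewrites for a 'What is X?' question."""
    words = question.rstrip("?").split()
    if len(words) < 3 or words[0].lower() != "what" or words[1].lower() != "is":
        return []
    rest = words[2:]  # e.g. ["relative", "humidity"]
    rewrites = []
    # Move "is" into every possible slot; some results are nonsensical
    # but harmless, since they rarely match any web text.
    for i in range(len(rest) + 1):
        phrase = " ".join(rest[:i] + ["is"] + rest[i:])
        side = "LEFT" if i == 0 else "RIGHT"
        rewrites.append([f'"{phrase}"', side, 5])
    # Back-off rewrites: the bare phrase, then a conjunction of words.
    rewrites.append([f'"{" ".join(rest)}"', "NULL", 2])
    rewrites.append([" AND ".join(f'"{w}"' for w in rest), "NULL", 1])
    return rewrites
```

For "What is relative humidity?" this reproduces the five rewrites listed on the slide.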

Page 20: Question Answering

2. Query the Search Engine

Send all rewrites to a Web search engine
Retrieve the top N results (100-200)
For speed, rely only on the search engine's "snippets", not the full text of the actual documents

Page 21: Question Answering

3. N-gram Harvesting

Process each snippet to retrieve the string to the left/right of the query match
Enumerate all n-grams (n = 1, 2 and 3)
Score of an n-gram: occurrence frequency weighted by the 'weight' of the rewrite rule that fetched the summary

Formula:

weight(n-gram) = Σ over summaries containing the n-gram of weight(rewrite rule that fetched the summary)
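The harvesting and scoring step can be sketched as follows. This is a simplified sketch: it n-grams the whole snippet rather than only the text adjacent to the query match, and the function name is an assumption.

```python
from collections import defaultdict

def harvest_ngrams(snippets):
    """Score 1-, 2- and 3-grams across search-engine snippets.

    `snippets` is a list of (text, rewrite_weight) pairs. Per the
    formula on the slide, an n-gram's score is its occurrence
    frequency weighted by the weight of the rewrite rule that
    fetched each snippet it occurs in.
    """
    scores = defaultdict(float)
    for text, weight in snippets:
        tokens = text.lower().split()
        for n in (1, 2, 3):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return dict(scores)
```

An n-gram appearing in several snippets accumulates the weights of all the rewrites that retrieved them.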

Page 22: Question Answering

4. Filtering Answers

Apply filters based on the question type of the query
Regular expressions
Natural-language analysis
E.g. "Genghis Khan", "Benedict XVI"

Boost the score of an answer when it matches the expected answer type

Remove answers from the candidate list when the set of answers is a closed set
"Which country …", "How many times …"
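A regular-expression filter of this kind might look like the sketch below. The patterns and the boost factor are illustrative assumptions, not the exact filters used by AskMSR.

```python
import re

# Illustrative answer-type filters keyed on the question prefix;
# the patterns and boost factor are assumptions for this sketch.
TYPE_PATTERNS = {
    "how many": re.compile(r"\b\d[\d,]*\b"),            # expects a number
    "when": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),  # expects a year
}

def filter_candidates(question, candidates, boost=2.0):
    """Boost candidates that match the expected answer type of `question`.

    `candidates` maps candidate answer -> score; candidates matching
    the type pattern have their score multiplied by `boost`.
    """
    q = question.lower()
    for prefix, pattern in TYPE_PATTERNS.items():
        if q.startswith(prefix):
            return {a: (s * boost if pattern.search(a) else s)
                    for a, s in candidates.items()}
    return dict(candidates)
```

For a "How many …" question, a numeric candidate like "270" gets boosted past non-numeric noise n-grams.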

Page 23: Question Answering

5. Answer Tiling

Shorter n-grams have higher weights
Solution: perform tiling
Combine overlapping shorter n-grams into longer n-grams
Score = maximum over constituent n-grams

E.g.
Pierre Baron (5)
Baron de Coubertin (20)
de Coubertin (10)
→ Pierre Baron de Coubertin (20)
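The tiling step can be sketched with a greedy merge of overlapping n-grams; greedy pairwise merging is an assumption about the exact procedure, but the max-of-constituents scoring follows the slide.

```python
def tile(a, b):
    """Merge b onto the end of a if a suffix of a equals a prefix of b."""
    wa, wb = a.split(), b.split()
    for k in range(min(len(wa), len(wb)), 0, -1):
        if wa[-k:] == wb[:k]:
            return " ".join(wa + wb[k:])
    return None

def tile_answers(scored):
    """Greedy answer tiling over a dict mapping n-gram -> score.

    Repeatedly merges any two overlapping candidates into the longer
    tiled n-gram, whose score is the maximum of its constituents'.
    """
    answers = dict(scored)
    changed = True
    while changed:
        changed = False
        for a, b in [(x, y) for x in answers for y in answers if x != y]:
            merged = tile(a, b)
            if merged:
                score = max(answers[a], answers[b], answers.get(merged, 0))
                answers.pop(a, None)
                answers.pop(b, None)
                answers[merged] = score
                changed = True
                break  # restart: the candidate set has changed
    return answers
```

On the slide's example, "Pierre Baron" (5), "Baron de Coubertin" (20) and "de Coubertin" (10) collapse into "Pierre Baron de Coubertin" with score 20.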

Page 24: Question Answering

6. Answer Projection

Retrieve supporting documents from the document collection for each answer
Use a standard IR system
IR query: Web query + candidate answer

Page 25: Question Answering

Results

System    Metric          Strict   Lenient
AskMSR    MRR             0.347    0.434
AskMSR    No answer (%)   49.2     40.0
AskMSR2   MRR             0.347    0.437
AskMSR2   No answer (%)   49.6     39.6

MRR = (1/|Q|) · Σ_{q ∈ Q} (1/rank_q)
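The Mean Reciprocal Rank metric above can be computed as below; representing an unanswered question as None (contributing 0) is a choice of this sketch.

```python
def mean_reciprocal_rank(ranks):
    """MRR = (1/|Q|) * sum over questions of 1/rank of the first correct answer.

    `ranks` holds, per question, the rank of the first correct answer,
    or None when no correct answer was returned (contributes 0).
    """
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r) / len(ranks)
```

For ranks [1, 2, None, 4] this gives (1 + 0.5 + 0 + 0.25) / 4 = 0.4375.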

Page 26: Question Answering

Arun Karthikeyan Karra (05d05020)

FALCON (Boosting Knowledge for QA Systems)

Page 27: Question Answering

FALCON Introduction

Another QA system
Integrates syntactic, semantic and pragmatic knowledge to achieve better performance
Handles question reformulations, incorporates the WordNet semantic net, and performs unifications on semantic forms to extract answers

Page 28: Question Answering

Architecture of FALCON

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Page 29: Question Answering

Working of FALCON: A gist

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Page 30: Question Answering

Question Reformulations (1)

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Page 31: Question Answering

Question Reformulations (2)

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Page 32: Question Answering

Expected Answer Type (1)

Page 33: Question Answering

Expected Answer Type (2)

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Page 34: Question Answering

Semantic Knowledge

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Page 35: Question Answering

Keywords and Alternations

Morphological alternations
Lexical alternations: Who killed Martin Luther King? How far is the moon?
Semantic alternations

Page 36: Question Answering

Results Reported

692 questions
Keyword alternations used for 89 questions
TREC-9 (Text Retrieval Conference)

Source: FALCON: Boosting Knowledge for Answer Engines, Harabagiu et al.

Page 37: Question Answering

Conclusion

Question Answering requires more complex NLP techniques than other forms of Information Retrieval

Two main approaches:
Data redundancy – AskMSR
Boosting the knowledge base – FALCON

Ultimate goal: a system we can 'talk' to

There is a long way to go ... and a lot more money to come

Page 38: Question Answering

References

Data Intensive Question Answering, Eric Brill et al., TREC-10, 2001.

An Analysis of the AskMSR Question-Answering System, Eric Brill et al., Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Philadelphia, July 2002, pp. 257-264.

FALCON: Boosting Knowledge for Answer Engines, Sanda Harabagiu, Dan Moldovan et al., Southern Methodist University, TREC-9, 2000.

Wikipedia

Page 39: Question Answering

EXTRA SLIDES

Page 40: Question Answering

Abductive Knowledge