Advanced Techniques for Answer Extraction and Formulation
Language Computer Corporation
www.languagecomputer.com
Dallas, Texas
PI: Dan Moldovan

Tasks
Task 1. QA System Taxonomy
Task 2. Answer fusion
Task 3. Develop methods for on-line ontology construction
Task 4. Develop an inference engine capable of providing answer justification
Task 5. Formulate concise and coherent answers
Task 6. Explore new QA System Architectures

Performance Analysis
Serial System Architecture
[Figure: serial pipeline from Question to Answer through modules M1 to M10]
M1: Keyword pre-processing (split/bind/spell)
M2: Construction of question representation
M3: Derivation of expected answer type
M4: Keyword selection
M5: Keyword expansion
M6: Actual retrieval of documents and passages
M7: Passage post-filtering
M8: Identification of candidate answers
M9: Answer ranking
M10: Answer formulation

Performance Analysis
Distribution of Errors
Module  Module definition                                       Errors (%)
M1      Keyword pre-processing (split/bind/spell check)         1.9
M2      Construction of internal question representation        5.2
M3      Derivation of expected answer type                      36.4
M4      Keyword selection (incorrectly added or excluded)       8.9
M5      Keyword expansion desirable but missing                 25.7
M6      Actual retrieval (limit on passage number or size)      1.6
M7      Passage post-filtering (incorrectly discarded)          1.6
M8      Identification of candidate answers                     8.0
M9      Answer ranking                                          6.3
M10     Answer formulation                                      4.4
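
The error distribution can be read as a ranking of where improvements would pay off most. A minimal sketch, using only the percentages listed above; the variable names are illustrative:

```python
# Error share per module, in percent, copied from the distribution above.
errors = {
    "M1": 1.9, "M2": 5.2, "M3": 36.4, "M4": 8.9, "M5": 25.7,
    "M6": 1.6, "M7": 1.6, "M8": 8.0, "M9": 6.3, "M10": 4.4,
}

# Rank modules from largest to smallest error share.
ranked = sorted(errors.items(), key=lambda kv: kv[1], reverse=True)

# Combined share of the two largest contributors.
top_two = round(sum(share for _, share in ranked[:2]), 1)

# Sanity check: the shares should add up to 100%.
total = round(sum(errors.values()), 1)

print(ranked[:2], top_two, total)
```

With these figures, M3 (expected answer type) and M5 (keyword expansion) together account for well over half of all errors.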

Performance Analysis
Impact of System Parameters
[Figure: Precision (MRR), on a scale from 0.34 to 0.43, versus Nd (20, 50, 200), with one series each for Np=50, Np=200, and Np=500]
Nd – maximum number of documents retrieved
Np – maximum number of passages processed
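
Precision is reported here as MRR (mean reciprocal rank). A minimal sketch of the usual definition, assuming each question contributes 1/rank of its first correct answer, or 0 when no correct answer is returned; the function name and the sample ranks are illustrative and are not the system's data:

```python
from typing import Optional, Sequence

def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all questions; None means no correct answer returned."""
    if not first_correct_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_correct_ranks if rank is not None)
    return total / len(first_correct_ranks)

# Hypothetical ranks for four questions: correct at rank 1, rank 2, never, rank 4.
print(mean_reciprocal_rank([1, 2, None, 4]))
```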

Performance Analysis
Impact of System Parameters
[Figure: Precision (MRR) and time (sec) versus Sp, in number of extra lines (±3, ±6, ±10, ±20, ±40). Precision labels: 0.4, 0.411, 0.421, 0.401, 0.387. Time labels: 32, 43, 59, 110, 265.]
Sp – size of retrieved passage

Performance Analysis
Architecture with Feedbacks
[Figure: Question -> M1+M2+M3+M4 -> M5 + lexico-semantic alternations -> M6 -> M7+M8 -> Logic Proving -> M9+M10 -> Answer, with feedback loops Loop 1, Loop 2, and Loop 3]

Performance Analysis
Impact of System Parameters
Feedback added               Precision (MRR)   Incremental enhancement
none                         0.421 = b         0%
Passage retrieval (loop 1)   0.468 = b1        b + 11%
Lexico-semantic (loop 2)     0.542 = b2        b1 + 15%
Proving (loop 3)             0.572 = b3        b2 + 5%
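
The incremental enhancements are relative gains over the previous configuration. A minimal sketch that recomputes them from the MRR values above, assuming gain = (current - previous) / previous; the helper name is illustrative. The recomputed values come out slightly above the whole percentages on the slide, which appear to be truncated.

```python
# MRR after each feedback loop is added, copied from the figures above.
mrr = [("none", 0.421), ("loop 1", 0.468), ("loop 2", 0.542), ("loop 3", 0.572)]

def relative_gain(previous: float, current: float) -> float:
    """Relative improvement of current over previous, in percent."""
    return (current - previous) / previous * 100.0

# Gain of each configuration over the one before it.
step_gains = [
    (name, round(relative_gain(mrr[i - 1][1], score), 1))
    for i, (name, score) in enumerate(mrr)
    if i > 0
]

# Gain of the full system (all three loops) over the baseline b.
overall = round(relative_gain(mrr[0][1], mrr[-1][1]), 1)

print(step_gains, overall)
```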

On-line Ontology Construction
Discover Concepts
Step 1: Pick a set of related seed concepts
Step 2: Form a corpus of N sentences that contain at least one of the seeds
Step 3: Parse the sentences in the corpus and extract the NPs that contain the seeds
Step 4: Apply filtering procedures that accept or reject new concepts
Step 5: Form an ontology: classify new concepts using subsumption
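
Steps 2 and 3 can be sketched as follows. This is a rough approximation, not the system's method: instead of parsing, it takes the words directly before the seed and cuts the phrase at a small stopword list. The function name, the stopword list, and the `max_modifiers` limit are all assumptions made for the example.

```python
def candidate_concepts(sentences, seed, max_modifiers=3):
    """Collect phrases made of the seed plus the modifiers directly before it."""
    # Hypothetical stopword list that ends a phrase; a parser would find NP borders.
    stop = {"the", "a", "an", "all", "some", "other", "by", "of", "and", "for", "in"}
    seed_words = seed.lower().split()
    n = len(seed_words)
    found = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            # Accept the seed in singular or plural form ("group" / "groups").
            plural_ok = window[-1] in (seed_words[-1], seed_words[-1] + "s")
            if window[:-1] != seed_words[:-1] or not plural_ok:
                continue
            # Walk left over alphabetic, non-stopword tokens.
            modifiers = []
            j = i - 1
            while (j >= 0 and len(modifiers) < max_modifiers
                   and tokens[j].isalpha() and tokens[j] not in stop):
                modifiers.insert(0, tokens[j])
                j -= 1
            if modifiers:
                found.add(" ".join(modifiers + seed_words))
    return found

corpus = [
    "All the suicide terrorist groups have support infrastructures in Europe .",
    "They were assassinated by islamist fundamentalist terrorist groups .",
]
print(sorted(candidate_concepts(corpus, "terrorist group")))
```

The candidates returned here would still go through the filtering of Step 4 before being classified in Step 5.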

On-line Ontology Construction
Discover Semantic Relations
Step 1: Select the semantic relation R
Step 2: Pick pairs of concepts among which R holds
Step 3: Form a corpus such that each sentence contains one pair of concepts
Step 4: Extract lexico-syntactic patterns between concepts Ci P Cj
Step 5: Apply semantic constraints determined a priori and decide whether or not the pattern Ci P Cj is a semantic relation R
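
Step 4 can be illustrated with a small sketch that collects the word sequence P found between the two concepts of each pair. It assumes plain string matching rather than parsing; the function name and the sample sentences are made up for the example.

```python
import re
from collections import Counter

def patterns_between(sentences, pairs):
    """Count the word sequences P that occur between Ci and Cj."""
    counts = Counter()
    for ci, cj in pairs:
        regex = re.compile(
            re.escape(ci) + r"\s+(.+?)\s+" + re.escape(cj), re.IGNORECASE
        )
        for sentence in sentences:
            match = regex.search(sentence)
            if match:
                counts[match.group(1).lower()] += 1
    return counts

# Made-up sentences, one per concept pair, for a cause/effect relation R.
sentences = [
    "Smoking causes hypertension in many adults .",
    "Stress leads to fatigue .",
    "Stress causes headache .",
]
pairs = [("smoking", "hypertension"), ("stress", "fatigue"), ("stress", "headache")]
print(patterns_between(sentences, pairs).most_common())
```

Frequent patterns are candidates for expressing R; Step 5 then decides, with semantic constraints, whether a given occurrence really does.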

Extracting Concepts
Methods:
1. From NPs that contain the seed.
Many of his fellow writer friends have been assassinated by islamist fundamentalist terrorist groups during the same years , in the nineties .
All the suicide terrorist groups have support infrastructures in Europe and in North America .
[Figure: extracted NPs attached to "terrorist group" by “is a” links]

Extracting Concepts (cont.)
2. From lexico-syntactic patterns containing the seed.
2.1 Via subsumption
Some domestic U.S. terrorist groups , including the Aryan Nation and the Phineas Priesthood , and some militia members are also religiously motivated in addition to being driven by a hatred of the federal government .
Terrorist groups including bin Laden 's , Hamas , Hizbollah , etc. in concert with Sudan , Iran and Iraq , form alliance , to be called " Jerusalem Foundation " , to coordinate global activities .
Religiously motivated terrorist groups , such as Usama bin Ladin 's group , al - Qaida , which is believed to have bombed the U.S. Embassies in Africa , represent a growing trend toward hatred of the United States .
[Figure: extracted concepts attached to "terrorist group" by “is a” links]
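
The "including" and "such as" cues in these sentences can be matched with a simple pattern. The sketch below is a crude stand-in for the parsing a real system would do: it keeps the capitalized items that follow the cue and stops at the first item that is not capitalized, so it misses names such as "bin Laden 's" or "al - Qaida". The function name and the heuristic are assumptions for illustration.

```python
import re

CUES = r"(?:including|such as)"

def hyponyms_from_pattern(sentence, seed_plural):
    """Return capitalized items listed after '<seed> including/such as ...'."""
    pattern = re.compile(
        re.escape(seed_plural) + r"\s*,?\s*" + CUES + r"\s+(.+)", re.IGNORECASE
    )
    match = pattern.search(sentence)
    if not match:
        return []
    # Split the enumeration on commas and on "and".
    items = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", match.group(1))
    names = []
    for item in items:
        item = re.sub(r"^the\s+", "", item.strip())
        # Heuristic: the enumeration ends at the first non-capitalized item.
        if not item or not item[0].isupper():
            break
        names.append(item)
    return names

sentence = (
    "Some domestic U.S. terrorist groups , including the Aryan Nation and "
    "the Phineas Priesthood , and some militia members are also religiously motivated ."
)
print(hyponyms_from_pattern(sentence, "terrorist groups"))
```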

Extracting Concepts (cont.)
2.2 Via lexical parallelism
During the same period , Erbakan and Refah leaders pledged their support for Hamas and other fundamentalist terrorist groups seeking to halt the Middle East peace process and to overthrow Egypt 's secular government.
[Figure: extracted concepts attached to "terrorist group" by “is a” links]

Power Ontology Tool

Ontology Snapshot
[Figure: ontology rooted at "terrorist group", with the following nodes]
fundamentalist terrorist group
Islamic terrorist group
islamist fundamentalist terrorist group
national Islamic terrorist group
Palestinian Islamic terrorist group
American terrorist group
Hamas
Hizbollah
Number of concepts automatically identified: 107
Number of concepts rejected interactively: 25
Number of concepts collected and classified: 107 - 25 = 82
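
Classification by subsumption can be sketched for multi-word concepts like those in the snapshot. The sketch assumes that concept A subsumes concept B when A's words form a proper suffix of B's words (the head noun stays on the right); this is an illustrative simplification, and the function name is hypothetical. Named instances such as Hamas cannot be placed this way and would rely on the "is a" patterns instead.

```python
def classify_by_subsumption(concepts):
    """Map each concept to its most specific subsumer among the given concepts."""
    words = {c: c.lower().split() for c in concepts}
    parents = {}
    for child, child_words in words.items():
        best = None
        for candidate, cand_words in words.items():
            # Candidate subsumes child if it is a strictly shorter suffix of it.
            is_suffix = (len(cand_words) < len(child_words)
                         and child_words[-len(cand_words):] == cand_words)
            if is_suffix and (best is None or len(cand_words) > len(words[best])):
                best = candidate
        parents[child] = best
    return parents

concepts = [
    "terrorist group",
    "fundamentalist terrorist group",
    "Islamic terrorist group",
    "islamist fundamentalist terrorist group",
    "national Islamic terrorist group",
    "Palestinian Islamic terrorist group",
    "American terrorist group",
]
for child, parent in classify_by_subsumption(concepts).items():
    print(f"{child} -> {parent}")
```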

Overall Results
Building a Corpus from the Web
Seed                        Total time (1)  # hits from SE (2)  Sent. ret. (3)  Base NPs (4)  Collected concepts (5)
Asian Countries             33 min          112756              876             467           34
cosmographers               8 min           210                 116             65            17
Eastern European countries  26 min          24523               468             245           15
Explosives                  53 min          168793              1408            840           57
Grenades                    47 min          77137               1334            866           92
Microsoft products          25 min          72951               762             454           48
Operating systems           39 min          1240119             839             529           58
Search engines              63 min          1869723             2000            871           60
Sports cars                 39 min          62311               493             221           40
Terrorist groups            54 min          22683               855             560           82
(1) Total time
(2) Number of hits returned by search engine
(3) Number of sentences retained
(4) Number of base NPs containing the seed identified in documents (including duplicates)
(5) Number of collected concepts

Semantic Constraints Imposed by Causation
Greenspan makes a recession
Greenspan makes a mistake

Semantic Constraints Imposed by Causation
Focus on < NP1 verb NP2 >
NP1: a hyponym of causal agent
verb: senses of verbs that mean causation
NP2: a hyponym of a causation class
- Human action
- Phenomenon
- State
- Psychological feature
- Event

Semantic Constraints Imposed by Causation
Greenspan makes a recession: causal agent, make v#5 (cause to, do, make), state
Greenspan makes a mistake: causal agent, make v#1 (make, do), state
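
The three constraints on < NP1 verb NP2 > can be sketched as a simple membership check. The lookup tables below are toy stand-ins for a lexical database and are hypothetical; the sketch also assumes, reading the two examples, that sense 5 of "make" (cause to, do, make) counts as a causation sense while sense 1 (make, do) does not.

```python
# Toy lookup tables; a real system would consult a lexical database instead.
CAUSAL_AGENTS = {"greenspan"}          # NP1 heads that are hyponyms of causal agent
CAUSATION_SENSES = {("make", 5)}       # (verb, sense number) pairs that mean causation
CAUSATION_CLASSES = {                  # classes allowed for NP2
    "human action", "phenomenon", "state", "psychological feature", "event",
}

def is_causation(np1_head: str, verb: str, sense: int, np2_class: str) -> bool:
    """True only if all three constraints on <NP1 verb NP2> hold."""
    return (
        np1_head.lower() in CAUSAL_AGENTS
        and (verb, sense) in CAUSATION_SENSES
        and np2_class in CAUSATION_CLASSES
    )

print(is_causation("Greenspan", "make", 5, "state"))  # "makes a recession"
print(is_causation("Greenspan", "make", 1, "state"))  # "makes a mistake"
```

Under these assumptions, only the verb sense separates the two sentences: NP1 and the NP2 class are the same in both.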

Answer Fusion
Study answer fusion at various levels of complexity
Questions asking simple facts
  What countries import sugar from Cuba?
Questions that require on-line ontology development
  What software products does Microsoft sell?
  What causes asthma?
  What are the effects of alcohol on the brain?
Speculative questions about future events
  Where will Al Qaeda strike next?

Answer Fusion
Answers are extracted by building an ontology on-line
Cause/effect ontology
Q: What causes hypertension?
[Figure: cause/effect ontology for "hypertension, high blood pressure", with the following nodes]
overwork, virus, fat, overindulgence, obesity, TV watching, environmental factors, exhaustion, chronic fatigue syndrome, alcohol, dehydration, laxative abuse, bacteria, atherosclerosis, caffeine, food poisoning, viruses, Salmonella, anger, high salt intake, smoking

Answer Fusion
Cause/effect ontology
Q: What are the effects of stress?
[Figure: cause/effect ontology for "stress, tension", with the following nodes]
hair loss, absenteeism, gastrointestinal tract disorders, illness, nerve damage, headache, physical problems, hyperactive behavior, reading inability, drug abuse, substance abuse, money spending, depression, homelessness, suicide attempt, weight loss, fatigue, reduced resistance to disease

Answer Fusion
Part-whole meronomy ontology
< NP1 have NP2 >  car has clutch
< NP2’s NP1 >     John’s hand
< NP1 of NP2 >    leg of a table
Q: What does the AH-64A Apache helicopter consist of?
[Figure: part-whole ontology for "AH-64A Apache helicopter", with the following nodes]
Hellfire air-to-surface missile
millimeter wave seeker
70mm Folding Fin Aerial rocket
30mm Cannon
camera
Armaments
General Electric 1700-GE engine
4-rail launchers
Four-bladed main rotor
Anti-tank laser guided missile
Longbow millimeter wave fire control radar
integrated radar frequency interferometer
Rotating turret
Tandem cockpit
Kevlar seats
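
The three part-whole patterns can be sketched as regular expressions over short phrases. This is a minimal illustration that handles single-word NPs only; the regexes, the article handling, and the function name are assumptions, not the system's patterns.

```python
import re

WORD = r"(\w[\w-]*)"
ARTICLE = r"(?:an? |the )?"

# Each entry: (regex, group index of the whole, group index of the part).
PATTERNS = [
    (re.compile(rf"^{WORD} (?:has|have) {ARTICLE}{WORD}$"), 1, 2),  # car has clutch
    (re.compile(rf"^{WORD}['’]s {WORD}$"), 1, 2),                   # John's hand
    (re.compile(rf"^{WORD} of {ARTICLE}{WORD}$"), 2, 1),            # leg of a table
]

def part_whole(phrase):
    """Return (whole, part) if the phrase matches one of the patterns, else None."""
    for regex, whole, part in PATTERNS:
        match = regex.match(phrase)
        if match:
            return match.group(whole), match.group(part)
    return None

for phrase in ["car has clutch", "John's hand", "leg of a table"]:
    print(phrase, "->", part_whole(phrase))
```

Note that the whole comes first in two of the patterns and second in the "of" pattern, which is why each entry records the group order explicitly.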

Answer Fusion
Questions with multiple ontologies
Q: What terrorist groups are in Asia?
  Build an ontology for terrorist groups
  Build an ontology for Asian countries
  Generate specific queries with combinations between two ontologies
[Figure: the terrorist groups ontology combined with the Asian countries ontology]
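
Generating queries from two ontologies amounts to pairing every concept of one with every concept of the other. A minimal sketch, assuming a query is simply two quoted phrases; the query format, the function name, and the placeholder country names are assumptions for illustration.

```python
from itertools import product

def combine_queries(groups, countries):
    """Build one keyword query per (group, country) pair."""
    return [f'"{group}" "{country}"' for group, country in product(groups, countries)]

groups = ["Hamas", "Hizbollah"]         # concepts from the terrorist-group ontology
countries = ["CountryA", "CountryB"]    # placeholders for Asian-country concepts

for query in combine_queries(groups, countries):
    print(query)
```

The number of queries grows as the product of the two ontology sizes, so in practice some filtering of the pairs would likely be needed.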

Thank you!
Papers:
Moldovan, Pasca, Surdeanu, Harabagiu, “Performance Issues and Error Analysis in an Open-Domain QA System”, ACL 2002.
Girju, Moldovan, “Mining Answers for Causation Questions”, AAAI Spring Symposium 2002.
Moldovan, Novischi, “Lexical Chains for Question Answering”, COLING 2002.