Answering List Questions using Co-occurrence and Clustering
Majid Razmara and Leila Kosseim
Concordia University
[email protected]
Introduction
• Question Answering
• TREC QA track: question series, corpora
Example question series. Target: American Girl dolls
  FACTOID: In what year were American Girl dolls first introduced?
  LIST: Name the historical dolls.
  LIST: Which American Girl dolls have had TV movies made about them?
  FACTOID: How much does an American Girl doll cost?
  FACTOID: How many American Girl dolls have been sold?
  FACTOID: What is the name of the American Girl store in New York?
  FACTOID: What corporation owns the American Girl company?
  OTHER: Other
Hypothesis
• Answer instances:
  1. have the same semantic entity class,
  2. co-occur within sentences, or
  3. occur in different sentences sharing similar contexts.
• Based on the Distributional Hypothesis: “Words occurring in the same contexts tend to have similar meanings” [Harris, 1954].
Target 232: “Dulles Airport”   Question 232.6: “Which airlines use Dulles?”

LTW_ENG_20050712.0032 (AQUAINT-2): United, which operates a hub at Dulles, has six luggage screening machines in its basement and several upstairs in the ticket counter area. Delta, Northwest, American, British Airways and KLM share four screening machines in the basement.

LTW_ENG_20060102.0106 (AQUAINT-2): Independence said its last flight Thursday will leave White Plains, N.Y., bound for Dulles Airport. Flyi suffered from rising jet fuel costs and the aggressive response of competitors, led by United and US Airways.

New York Times (Web): Continental Airlines sued United Airlines and the committee that oversees operations at Washington Dulles International Airport yesterday, contending that recently installed baggage-sizing templates inhibited competition.

Wikipedia (Web): At its peak of 600 flights daily, Independence, combined with service from JetBlue and AirTran, briefly made Dulles the largest low-cost hub in the United States.
Our Approach
1. Create an initial candidate list:
   • Answer Type Recognition
   • Document Retrieval
   • Candidate Answer Extraction
   The list may also be imported from an external source (e.g. a factoid QA system).
2. Extract co-occurrence information.
3. Cluster candidates based on their co-occurrence.
Answer Type Recognition
• 9 types: Person, Country, Organization, Job, Movie, Nationality, City, State, and Other
• Lexical patterns (a sketch follows below):
  ^(Name|List|What|Which) (persons|people|men|women|players|contestants|artists|opponents|students) → PERSON
  ^(Name|List|What|Which) (countries|nations) → COUNTRY
• Syntagmatic patterns (over POS tags) for Other types:
  ^(WDT|WP|VB|NN) (DT|JJ)* (NNS|NNP|NN|JJ)* (NNS|NNP|NN|NNPS) (VBN|VBD|VBZ|WP|$)
  ^(WDT|WP|VB|NN) (VBD|VBP) (DT|JJ|JJR|PRP$|IN)* (NNS|NNP|NN)* (NNS|NNP|NN)
• Type Resolution
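To make the pattern mechanics concrete, here is a minimal Python sketch (not the authors' code) of how the lexical patterns above could be applied; the pattern set is truncated to the two examples shown on the slide.

```python
import re

# Lexical patterns from the slide, mapping a question's opening words to an
# expected answer type; the full system has patterns for all 9 types.
LEXICAL_PATTERNS = [
    (re.compile(r"^(Name|List|What|Which)\s+(persons|people|men|women|players|"
                r"contestants|artists|opponents|students)\b", re.I), "PERSON"),
    (re.compile(r"^(Name|List|What|Which)\s+(countries|nations)\b", re.I),
     "COUNTRY"),
]

def recognize_answer_type(question: str) -> str:
    for pattern, answer_type in LEXICAL_PATTERNS:
        if pattern.match(question):
            return answer_type
    # No lexical match: fall back to the syntagmatic patterns and
    # the type resolution step described on the next slide.
    return "OTHER"

print(recognize_answer_type("What countries were affected by this earthquake?"))  # COUNTRY
```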
Type Resolution
• Resolves the answer subtype to one of the main types.
  Example: “List previous conductors of the Boston Pops.”
  Type: OTHER, Subtype: Conductor → PERSON
• Uses WordNet's hypernym hierarchy.
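A minimal sketch of how such a resolution could look, assuming NLTK's WordNet interface (the slides only state that the hypernym hierarchy is used; the mapping table is illustrative):

```python
from nltk.corpus import wordnet as wn

# Hypothetical main-type lemmas to look for while climbing the hierarchy.
MAIN_TYPES = {"person": "PERSON", "country": "COUNTRY",
              "organization": "ORGANIZATION", "movie": "MOVIE",
              "city": "CITY", "state": "STATE", "job": "JOB"}

def resolve_subtype(subtype: str) -> str:
    """Resolve a subtype (e.g. 'conductor') to a main answer type by
    breadth-first search over WordNet hypernyms of its first noun sense."""
    frontier = wn.synsets(subtype, pos=wn.NOUN)[:1]
    seen = set()
    while frontier:
        syn = frontier.pop(0)
        if syn in seen:
            continue
        seen.add(syn)
        for lemma in syn.lemma_names():
            if lemma.lower() in MAIN_TYPES:
                return MAIN_TYPES[lemma.lower()]
        frontier.extend(syn.hypernyms())
    return "OTHER"

print(resolve_subtype("conductor"))  # conductor -> musician -> ... -> PERSON
```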
Document Retrieval
• Document collections:
  • Source document collection: few documents, used to extract candidates
  • Domain document collection: many documents, used to extract co-occurrence information
• Query generation: a Google query on the Web and a Lucene query on the corpora build these collections.
Candidate Answer Extraction
• Term extraction: extract all terms that conform to the expected answer type.
  • Person, Organization, Job: intersection of several NE taggers (LingPipe, the Stanford tagger and the GATE NE tagger), for better precision
  • Country, State, City, Nationality: a gazetteer, for better precision
  • Movie, Other: capitalized and quoted terms
• Verification of Movie:
  numHits(GoogleQuery: intitle:Term site:www.imdb.com)
• Verification of Other:
  numHits(“SubType Term” OR “Term SubType”) / numHits(“Term”)
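A sketch of the two verification scores, assuming a hypothetical num_hits() wrapper that returns a search engine's hit count for a query string (the slides do not specify which API is used):

```python
def verify_other(term: str, subtype: str, num_hits) -> float:
    """Ratio of hits where the term appears next to its subtype
    (e.g. "subtype term" or "term subtype") to hits for the term alone."""
    pattern_hits = num_hits(f'"{subtype} {term}" OR "{term} {subtype}"')
    term_hits = num_hits(f'"{term}"')
    return pattern_hits / term_hits if term_hits else 0.0

def verify_movie(term: str, num_hits) -> int:
    """Hit count for the term occurring in an IMDb page title."""
    return num_hits(f"intitle:{term} site:www.imdb.com")
```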
Co-occurrence Information Extraction
• The documents in the domain collection are split into sentences.
• Each sentence is checked for whether it contains candidate answers.
[Figure: a candidates-by-sentences occurrence matrix, from which the candidate-by-candidate co-occurrence counts are derived]
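A minimal sketch of this step (with a deliberately crude sentence splitter and substring matching; the real system's tools are not specified):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents, candidates):
    """Count, for each pair of candidates, the sentences containing both,
    plus per-term sentence counts and the total number of sentences."""
    pair_counts = Counter()
    term_counts = Counter()
    n_sentences = 0
    for doc in documents:
        for sentence in doc.split("."):  # crude sentence splitter (assumption)
            sentence = sentence.lower()
            present = [c for c in candidates if c.lower() in sentence]
            n_sentences += 1
            term_counts.update(present)
            for a, b in combinations(sorted(present), 2):
                pair_counts[(a, b)] += 1
    return pair_counts, term_counts, n_sentences
```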
Hierarchical Agglomerative Clustering
• Steps:
  1. Put each candidate term $t_i$ in a separate cluster $C_i$.
  2. Compute the similarity between each pair of clusters (average linkage).
  3. Merge the two clusters with the highest inter-cluster similarity.
  4. Update all relations between this new cluster and the other clusters.
  5. Go to step 3 until there are only N clusters, or the similarity is less than a threshold.
• Average linkage:
$$\mathrm{similarity}(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{t_m \in C_i} \sum_{t_n \in C_j} \mathrm{similarity}(t_m, t_n)$$
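The loop above can be sketched directly in Python. This naive illustration (not the authors' implementation) recomputes average linkage from the raw pairwise similarities rather than updating them incrementally as in step 4:

```python
def cluster(terms, sim, n_clusters, threshold):
    """Average-linkage agglomerative clustering over similarity sim(a, b)."""
    clusters = [[t] for t in terms]                  # step 1: singletons
    while len(clusters) > n_clusters:                # stop at N clusters ...
        best_score, best_pair = None, None
        for i in range(len(clusters)):               # step 2: all cluster pairs
            for j in range(i + 1, len(clusters)):
                total = sum(sim(a, b) for a in clusters[i] for b in clusters[j])
                score = total / (len(clusters[i]) * len(clusters[j]))
                if best_score is None or score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score is None or best_score < threshold:
            break                                    # ... or below the threshold
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))          # step 3: merge best pair
    return clusters
```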
The Similarity Measure
• Similarity between each pair of candidates
• Based on co-occurrence within sentences
• Using chi-square ($\chi^2$)
• Shortcoming: $\chi^2$ does not work well with sparse data (see Future Work)
$$\chi^2 = \frac{N\,(O_{11}O_{22} - O_{12}O_{21})^2}{(O_{11}+O_{12})(O_{11}+O_{21})(O_{12}+O_{22})(O_{21}+O_{22})}$$
            term_i       ¬term_i      Total
term_j      O11          O21          O11 + O21
¬term_j     O12          O22          O12 + O22
Total       O11 + O12    O21 + O22    N
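In code, the measure computed from the contingency table above is a direct transcription:

```python
def chi_square(o11, o12, o21, o22):
    """Chi-square association for a candidate pair: O11 counts sentences
    containing both terms, O21 and O12 sentences with exactly one of the
    terms, O22 sentences with neither; N is the total sentence count."""
    n = o11 + o12 + o21 + o22
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator if denominator else 0.0
```

With the counts from the previous slide's sketch, O11 for a pair (a, b) is pair_counts[(a, b)], and the remaining cells follow from term_counts and n_sentences.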
Pinpointing the Right Cluster
• Question and target keywords are used as “spies”.
• Spies are:
  • inserted into the list of candidate answers,
  • treated as candidate answers, hence their similarity to one another and to the candidates is computed,
  • clustered along with the candidate answers.
• The cluster containing the most spies is returned, with the spies removed.
• Other approaches
Example. Target 269: “Pakistan earthquakes of October 2005”
Question 269.2: “What countries were affected by this earthquake?”
[Figure: the candidate clusters for this question. Cluster-31: oman. Cluster-2: spain, bangladesh, japan, germany, haiti, nepal, china, sweden, iran, mexico, vietnam, belgium, lebanon, iraq, russia, turkey. Cluster-9: pakistan, 2005, afghanistan, octob, u.s, india, affect, earthquak; this cluster contains the most (stemmed) spies and is returned, after which the spies are removed.]
Recall = 2/3   Precision = 2/3   F-score = 2/3
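A minimal sketch of the spy step, reusing the cluster() sketch from the clustering slide (names are illustrative, not the authors' code):

```python
def pinpoint(candidates, spies, cluster_fn):
    """Cluster the spies together with the candidates, pick the cluster
    holding the most spies, and return it with the spies stripped out."""
    spy_set = set(spies)
    clusters = cluster_fn(list(candidates) + list(spies))
    best = max(clusters, key=lambda c: sum(t in spy_set for t in c))
    return [t for t in best if t not in spy_set]
```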
TREC 2007 Results (F-measure)
[Figure: bar chart of the list-question F-measure of each TREC 2007 run: LymbaPA07, LCCFerret, ILQUA1, QASCU3, Ephyra3, UofL, FDUQAT16B, IITDIBM2007T, pronto07run3, lsv2007a, pircs07qa1, QUANTA, csail1, Intellexer7A, asked07c, Dal07t, uams07atch, MITRE2007B, DrexelRun2, eduFsc05, iiitqa07]
Best: 0.479   Median: 0.085   Worst: 0.000
Our result: F = 0.145
Evaluation of Clustering
• Baseline: the list of candidate answers prior to clustering
• Our approach: the list of candidate answers filtered by the clustering
• Theoretical maximum: the best possible output of clustering, given the initial list
Corpus           Questions   Approach          Precision   Recall   F-score
TREC 2004-2006   237         Baseline          0.064       0.407    0.098
                             Our Approach      0.141       0.287    0.154
                             Theoretical Max   1.000       0.407    0.472
TREC 2007        85          Baseline          0.075       0.388    0.106
                             Our Approach      0.165       0.248    0.163
                             Theoretical Max   1.000       0.388    0.485
Percentage of each Question Type in the Training Set
[Pie chart: Other 36%, Person 32%, Country 15%, Organization 5%, Movie 5%, City 3%, Job 2%, State 1%, Nationality 1%]
Evaluation of each Question Type
[Figure: F-score of each type (Person, Other, Country, State, Organization, Job, Movie, Nationality, City) in the training and test sets]
Future Work
• Developing a module that verifies whether each candidate is a member of the answer type, for the Movie and Other types.
• Using co-occurrence at the paragraph level rather than the sentence level; anaphora resolution can be used.
• Using another method for the similarity measure: $\chi^2$ does not work well with sparse data; for example, Yates' correction for continuity (Yates' $\chi^2$) could be used (see the sketch below).
• Using different clustering approaches.
• Using different similarity measures, e.g. Mutual Information.
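For illustration, Yates' correction modifies the $\chi^2$ computation above by shrinking the deviation term by N/2 (the standard formulation for 2x2 tables, sketched here):

```python
def yates_chi_square(o11, o12, o21, o22):
    """Yates-corrected chi-square for sparse 2x2 contingency tables."""
    n = o11 + o12 + o21 + o22
    deviation = max(abs(o11 * o22 - o12 * o21) - n / 2.0, 0.0)
    numerator = n * deviation ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator if denominator else 0.0
```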
Questions?