LING 581: Advanced Computational Linguistics, Lecture Notes, April 27th
TRANSCRIPT
QA Homework
• Idea: evaluate the feasibility of QA on the web
– using TREC 9 QA examples
– programming it up is optional
– see and appreciate why it’s hard to do...
• Steps:
– Pick 3 query groups
– Simulate (programmatically) QA
– use the Collins parser and WordNet to find answers to the queries
– submit report (before final class next week)
• Example
• Question group
– What kind of animal was Winnie the Pooh?
– Winnie the Pooh is what kind of animal?
– What species was Winnie the Pooh?
– Winnie the Pooh is an imitation of which animal?
– What was the species of Winnie the Pooh?
Example
• trees
• reformulate Qs into declarative sentences with missing wh-phrase
– ____ (kind of animal) is winnie the pooh
– winnie the pooh is ____ (species)
– winnie the pooh is an imitation of ____ (animal)
– the species of winnie the pooh is ____
Example
• Answers:
• Winnie the Pooh is such a popular character in Poland
• Winnie-the-Pooh Is My Co-worker
• Winnie the Pooh is a little, adorable and cute bear obsessed by honey.
• Winnie-the-Pooh is so fat.
• Winnie the Pooh is one of the things most closest to my heart
• Winnie the Pooh is his usual befuddled self
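One way to simulate the matching step: turn a declarative form with a gap (e.g. "winnie the pooh is ____") into a pattern and pull out whatever fills the gap in each retrieved sentence. The sketch below is only an illustration. The function names `template_to_regex` and `extract_fillers` are made up for this example, and it does plain string matching. Finding the head word of each filler would still need a parser such as the Collins parser.

```python
import re


def template_to_regex(template):
    """Turn a declarative template with one '____' gap into a regex.

    Literal words match case-insensitively, and the spaces between them
    also match hyphens, so 'winnie the pooh' matches 'Winnie-the-Pooh'.
    """
    before, _, after = template.partition("____")

    def literal(text):
        return r"[\s-]+".join(re.escape(word) for word in text.split())

    parts = []
    if before.strip():
        parts.append(literal(before) + r"\s+")
    parts.append(r"(?P<gap>.+?)")
    if after.strip():
        parts.append(r"\s+" + literal(after))
    else:
        # Gap runs to the end of the sentence, minus final punctuation.
        parts.append(r"\s*[.?!]?\s*$")
    return re.compile("".join(parts), re.IGNORECASE)


def extract_fillers(template, sentences):
    """Return the gap filler for every sentence that matches the template."""
    pattern = template_to_regex(template)
    fillers = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            fillers.append(match.group("gap").strip())
    return fillers


if __name__ == "__main__":
    snippets = [
        "Winnie-the-Pooh is so fat.",
        "Winnie the Pooh is his usual befuddled self",
    ]
    for filler in extract_fillers("winnie the pooh is ____", snippets):
        print(filler)
```

The fillers are whole phrases ("so fat", "his usual befuddled self"), which is why a head-word extraction step is still needed before any semantic check.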
Example
• Original declarative form:
– winnie the pooh is ____ (species)
• Check semantic relatedness of extracted head words using WordNet:
– character
– co-worker
– little
– one
– self
• Here, look at shortest paths
Summary
Headword      Length   #nodes
– bear           6       9258
– character      6       1072
– one            7        734
– co-worker      7       6488
– self           7      14456
– little        10      28706
Constraints: length < #nodes
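The shortest-path check can be sketched as a breadth-first search over a word graph. This is a minimal sketch under stated assumptions: the graph `TOY_LINKS` is a made-up toy, not real WordNet data. The name `shortest_path` is hypothetical. The second return value assumes that #nodes means the number of nodes the search has visited, which the slides do not spell out.

```python
from collections import deque

# Toy, hand-made link structure (NOT real WordNet relations).
TOY_LINKS = {
    "bear": ["carnivore"],
    "carnivore": ["bear", "mammal"],
    "mammal": ["carnivore", "animal"],
    "animal": ["mammal"],
    "island": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search from start to goal.

    Returns (path_length, nodes_visited), or None if goal is unreachable.
    nodes_visited counts every node discovered so far, including start.
    """
    if start == goal:
        return 0, 1
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in visited:
                continue
            visited.add(neighbour)
            if neighbour == goal:
                return dist + 1, len(visited)
            queue.append((neighbour, dist + 1))
    return None


if __name__ == "__main__":
    print(shortest_path(TOY_LINKS, "bear", "animal"))   # (3, 4)
    print(shortest_path(TOY_LINKS, "bear", "island"))   # None
```

Breadth-first search is used because the first time the goal is reached, the path found is guaranteed to be a shortest one when all links count equally.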
XWN
• Applications
– The Extended WordNet may be used as a Core Knowledge Base for applications such as Question Answering, Information Retrieval, Information Extraction, Summarization, Natural Language Generation, Inferences, and other knowledge intensive applications.
– The glosses contain a part of the world knowledge since they define the most common concepts of the English language.
XWN
Example:
• Dan Moldovan and Adrian Novischi, Lexical Chains for Question Answering, COLING 2002
COALS
• Take a look at an alternative to WordNet for computing similarity
– WordNet: handbuilt system
– COALS:
• the correlated occurrence analogue to lexical semantics
• (Rohde et al. 2004)
• an instance of a vector-based statistical model for similarity
– e.g., see also Latent Semantic Analysis (LSA)
– Singular Value Decomposition (SVD)
» sort by singular values, take top k and reduce the dimensionality of the co-occurrence matrix to rank k
• based on weighted co-occurrence data from large corpora
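The SVD step can be written out as follows. The symbols are notation chosen here, not taken from the slides: M is the (weighted) co-occurrence matrix, and the subscript k marks truncation to the k largest singular values. Using the rows of U_k Σ_k as reduced word vectors is one common choice and is assumed here.

```latex
% Full decomposition, singular values sorted in decreasing order
M = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \dots), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge 0

% Keep only the top k singular values and their vectors
M_k = U_k \Sigma_k V_k^{\top} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}

% Reduced k-dimensional vector for word a (row a of U_k \Sigma_k)
\mathbf{w}_a = (U_k \Sigma_k)_{a,:}
```

M_k is the best rank-k approximation of M in the least-squares (Frobenius norm) sense, which is why sorting by singular value and keeping the top k loses the least information.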
COALS
• Basic Idea:
– compute co-occurrence counts for (open class) words from a large corpus
– corpus:
• Usenet postings over 1 month
• 9 million (distinct) articles
• 1.2 billion word tokens
• 2.1 million word types
– 100,000th word occurred 98 times
– co-occurrence counts
• based on a ramped weighting system with window size 4
– excluding closed-class items
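Ramped window (size 4): weight given to each neighbour of the target word wi
position:  wi-4  wi-3  wi-2  wi-1  [wi]  wi+1  wi+2  wi+3  wi+4
weight:     1     2     3     4           4     3     2     1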
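The ramped counting scheme can be sketched in a few lines. This is an illustration only. The name `ramped_cooccurrence` and its parameters are hypothetical. It also assumes that a closed-class word is simply not counted but still occupies a position in the window, a detail the slides leave open.

```python
from collections import defaultdict


def ramped_cooccurrence(tokens, window=4, closed_class=frozenset()):
    """Weighted co-occurrence counts with a ramped window.

    A neighbour at distance d (1 <= d <= window) from the target adds
    weight (window - d + 1): 4 for adjacent words, down to 1 at distance 4.
    Assumption: closed-class words are skipped as targets and as
    neighbours, but still take up a slot in the window.
    """
    counts = defaultdict(int)
    for i, target in enumerate(tokens):
        if target in closed_class:
            continue
        for distance in range(1, window + 1):
            weight = window - distance + 1
            for j in (i - distance, i + distance):
                if 0 <= j < len(tokens) and tokens[j] not in closed_class:
                    counts[(target, tokens[j])] += weight
    return dict(counts)


if __name__ == "__main__":
    counts = ramped_cooccurrence(["a", "b", "c", "d", "e", "f"])
    print(counts[("a", "b")])  # 4: adjacent words
    print(counts[("a", "e")])  # 1: distance 4
```

Each row of the resulting table (all counts for one target word) is that word's raw co-occurrence vector.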
Worked Example: zealous
• run connectbf/3
– ?- connectbf(impassioned,zealous,X).
– X = 10 ?
– ?- connectbf(zealous,impassioned,X).
– X = 9 ?
• compare to b. ravenous
– ?- connectbf(ravenous,zealous,X).
– no
– ?- connectbf(zealous,ravenous,X).
• shortest link between impassioned and zealous
Old Code: WordNet 1.7.1
Task: Match each word in the first column with its definition in the second column
accolade       deviation
abate          abolish
aberrant       keen insight
abscond        lessen in intensity
acumen         sour or bitter
abscission     building up
acerbic        depart secretly
accretion      renounce
abjure         removal
abrogate       praise
[Slide repeats the matching task above, annotated with the values 3, 2, 3, 2, 2, 2, 2; which word-definition pairs they label is not recoverable from the transcript.]
COALS and the GRE
[Figures: nine bar charts, one per GRE word: ACCOLADE, ABERRANT, ABATE, ABSCOND, ACUMEN, ACERBIC, ACCRETION, ABJURE, ABROGATE. y-axis: Correlation. x-axis: the candidate definition headwords DEVIATION, INSIGHT, ABOLISH, LESSEN, SOUR, DEPART, BUILD, RENOUNCE, REMOVAL, PRAISE.]
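The charts report a correlation score between a GRE word and each candidate definition headword. As a rough sketch of how such a score could be computed and used to pick an answer, assume each word is represented by a vector of co-occurrence weights and that similarity is the Pearson correlation between two vectors. The names `pearson` and `best_match` are hypothetical, the vectors in the usage example are invented toy values, and the normalization COALS applies to the counts beforehand is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors.

    Returns 0.0 if either vector is constant (correlation undefined).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_match(word_vector, candidates):
    """Name of the candidate whose vector correlates best with word_vector."""
    return max(candidates, key=lambda name: pearson(word_vector, candidates[name]))


if __name__ == "__main__":
    # Invented toy vectors, for illustration only.
    target = [1, 2, 3]
    options = {"up": [1, 2, 4], "down": [3, 2, 1]}
    print(best_match(target, options))  # up
```

Under this reading, answering a matching question amounts to taking the tallest bar in the chart for that GRE word.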