statistical nlp spring 2010
DESCRIPTION
Statistical NLP Spring 2010. Lecture 21: Compositional Semantics. Dan Klein – UC Berkeley. Includes slides from Luke Zettlemoyer. Truth-Conditional Semantics. Linguistic expressions: “Bob sings” Logical translations: sings(bob) Could be p_1218(e_397) Denotation: - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/1.jpg)
Statistical NLPSpring 2010
Lecture 21: Compositional SemanticsDan Klein – UC Berkeley
Includes slides from Luke Zettlemoyer
![Page 2: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/2.jpg)
Truth-Conditional Semantics
Linguistic expressions: “Bob sings”
Logical translations: sings(bob) Could be p_1218(e_397)
Denotation: [[bob]] = some specific person (in some context) [[sings(bob)]] = ???
Types on translations: bob : e (for entity) sings(bob) : t (for truth-value)
S
NP
Bob
bob
VP
sings
y.sings(y)
sings(bob)
![Page 3: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/3.jpg)
Truth-Conditional Semantics Proper names:
Refer directly to some entity in the world Bob : bob [[bob]]W ???
Sentences: Are either true or false (given
how the world actually is) Bob sings : sings(bob)
So what about verbs (and verb phrases)? sings must combine with bob to produce sings(bob) The -calculus is a notation for functions whose arguments are
not yet filled. sings : x.sings(x) This is predicate – a function which takes an entity (type e) and
produces a truth value (type t). We can write its type as et. Adjectives?
S
NP
Bob
bob
VP
sings
y.sings(y)
sings(bob)
![Page 4: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/4.jpg)
Compositional Semantics So now we have meanings for the words How do we know how to combine words? Associate a combination rule with each grammar rule:
S : () NP : VP : (function application) VP : x . (x) (x) VP : and : VP : (intersection)
Example:
S
NP VP
Bob VP and
sings
VP
dancesbob
y.sings(y) z.dances(z)
x.sings(x) dances(x)
[x.sings(x) dances(x)](bob)
sings(bob) dances(bob)
![Page 5: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/5.jpg)
Denotation What do we do with logical translations?
Translation language (logical form) has fewer ambiguities
Can check truth value against a database Denotation (“evaluation”) calculated using the database
More usefully: assert truth and modify a database Questions: check whether a statement in a corpus
entails the (question, answer) pair: “Bob sings and dances” “Who sings?” + “Bob”
Chain together facts and use them for comprehension
![Page 6: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/6.jpg)
Other Cases
Transitive verbs: likes : x.y.likes(y,x) Two-place predicates of type e(et). likes Amy : y.likes(y,Amy) is just like a one-place predicate.
Quantifiers: What does “Everyone” mean here? Everyone : f.x.f(x) Mostly works, but some problems
Have to change our NP/VP rule. Won’t work for “Amy likes everyone.”
“Everyone likes someone.” This gets tricky quickly!
S
NP VP
Everyone VBP NP
Amylikes
x.y.likes(y,x)
y.likes(y,amy)
amy
f.x.f(x)
[f.x.f(x)](y.likes(y,amy))
x.likes(x,amy)
![Page 7: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/7.jpg)
Indefinites First try
“Bob ate a waffle” : ate(bob,waffle) “Amy ate a waffle” : ate(amy,waffle)
Can’t be right! x : waffle(x) ate(bob,x) What does the translation
of “a” have to be? What about “the”? What about “every”?
S
NP VP
Bob VBD NP
a waffleate
![Page 8: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/8.jpg)
Grounding
Grounding So why does the translation likes : x.y.likes(y,x) have anything
to do with actual liking? It doesn’t (unless the denotation model says so) Sometimes that’s enough: wire up bought to the appropriate
entry in a database
Meaning postulates Insist, e.g x,y.likes(y,x) knows(y,x) This gets into lexical semantics issues
Statistical version?
![Page 9: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/9.jpg)
Tense and Events In general, you don’t get far with verbs as predicates Better to have event variables e
“Alice danced” : danced(alice) e : dance(e) agent(e,alice) (time(e) < now)
Event variables let you talk about non-trivial tense / aspect structures “Alice had been dancing when Bob sneezed” e, e’ : dance(e) agent(e,alice)
sneeze(e’) agent(e’,bob) (start(e) < start(e’) end(e) = end(e’)) (time(e’) < now)
![Page 10: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/10.jpg)
Adverbs What about adverbs?
“Bob sings terribly”
terribly(sings(bob))?
(terribly(sings))(bob)?
e present(e) type(e, singing) agent(e,bob) manner(e, terrible) ?
Hard to work out correctly!
S
NP VP
Bob VBP ADVP
terriblysings
![Page 11: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/11.jpg)
Propositional Attitudes “Bob thinks that I am a gummi bear”
thinks(bob, gummi(me)) ? thinks(bob, “I am a gummi bear”) ? thinks(bob, ^gummi(me)) ?
Usual solution involves intensions (^X) which are, roughly, the set of possible worlds (or conditions) in which X is true
Hard to deal with computationally Modeling other agents models, etc Can come up in simple dialog scenarios, e.g., if you want to talk
about what your bill claims you bought vs. what you actually bought
![Page 12: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/12.jpg)
Trickier Stuff
Non-Intersective Adjectives green ball : x.[green(x) ball(x)] fake diamond : x.[fake(x) diamond(x)] ?
Generalized Quantifiers the : f.[unique-member(f)] all : f. g [x.f(x) g(x)] most? Could do with more general second order predicates, too (why worse?)
the(cat, meows), all(cat, meows) Generics
“Cats like naps” “The players scored a goal”
Pronouns (and bound anaphora) “If you have a dime, put it in the meter.”
… the list goes on and on!
x.[fake(diamond(x))
![Page 13: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/13.jpg)
Multiple Quantifiers Quantifier scope
Groucho Marx celebrates quantifier order ambiguity:“In this country a woman gives birth every 15 min.
Our job is to find that woman and stop her.”
Deciding between readings “Bob bought a pumpkin every Halloween” “Bob put a warning in every window” Multiple ways to work this out
Make it syntactic (movement) Make it lexical (type-shifting)
![Page 14: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/14.jpg)
Modeling Uncertainty?
Gaping hole warning! Big difference between statistical disambiguation and statistical
reasoning.
With probabilistic parsers, can say things like “72% belief that the PP attaches to the NP.”
That means that probably the enemy has night vision goggles. However, you can’t throw a logical assertion into a theorem prover
with 72% confidence. Not clear humans really extract and process logical statements
symbolically anyway. Use this to decide the expected utility of calling reinforcements?
In short, we need probabilistic reasoning, not just probabilistic disambiguation followed by symbolic reasoning!
The scout saw the enemy soldiers with night goggles.
![Page 15: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/15.jpg)
CCG Parsing
Combinatory Categorial Grammar Fully (mono-)
lexicalized grammar
Categories encode argument sequences
Very closely related to the lambda calculus
Can have spurious ambiguities (why?)
![Page 16: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/16.jpg)
Mapping to Logical Form
Learning to Map Sentences to Logical Form
Texas borders Kansas
borders(texas,kansas)
![Page 17: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/17.jpg)
Some Training Examples
Input: What states border Texas?Output: x.state(x) borders(x,texas)
Input: What is the largest state?Output: argmax(x.state(x), x.size(x))
Input: What states border the largest state?Output: x.state(x) borders(x, argmax(y.state(y), y.size(y)))
![Page 18: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/18.jpg)
CCG Lexicon
WordsCategory
Syntax : Semantics
Texas NP : texas
borders (S\NP)/NP : x.y.borders(y,x)
Kansas NP : kansas
Kansas city NP : kansas_city_MO
![Page 19: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/19.jpg)
Parsing Rules (Combinators)
• Application X/Y : f Y : a => X : f(a)
Y : a X\Y : f => X : f(a)
• Additional rules Composition Type Raising
(S\NP)/NPx.y.borders(y,x)
NPtexas
S\NPy.borders(y,texas)
NPkansas
S\NPy.borders(y,texas)
Sborders(kansas,texas)
![Page 20: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/20.jpg)
CCG Parsing
NPtexas
(S\NP)/NPx.y.borders(y,x)
borders KansasTexas
NPkansas
S\NPy.borders(y,kansas)
Sborders(texas,kansas)
![Page 21: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/21.jpg)
Parsing a Question
(S\NP)/NPx.y.borders(y,x)
border TexasWhat
NPtexas
S\NPy.borders(y,texas)
states
Nx.state(x)
S/(S\NP)/Nf.g.x.f(x)g(x)
S/(S\NP)g.x.state(x)g(x)
Sx.state(x) borders(x,texas)
![Page 22: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/22.jpg)
Lexical Generation
Words Category
Texas NP : texas
borders (S\NP)/NP : x.y.borders(y,x)
Kansas NP : kansas
... ...
Output Lexicon
Input Training ExampleSentence: Texas borders Kansas Logic Form: borders(texas,kansas)
![Page 23: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/23.jpg)
GENLEX
• Input: a training example (Si,Li)
• Computation:
1. Create all substrings of words in Si
2. Create categories from Li
3. Create lexical entries that are the cross product of these two sets
• Output: Lexicon
![Page 24: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/24.jpg)
GENLEX Cross Product
Output Substrings:Texas
borders
Kansas
Texas borders
borders Kansas
Texas borders Kansas
Output Categories:
NP : texasNP : kansas(S\NP)/NP : x.y.borders(y,x)
GENLEX is the cross product in these two output sets
X
Input Training ExampleSentence: Texas borders Kansas Logic Form: borders(texas,kansas)
Output Lexicon
![Page 25: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/25.jpg)
GENLEX: Output Lexicon
Words CategoryTexas NP : texas
Texas NP : kansas
Texas (S\NP)/NP : x.y.borders(y,x)borders NP : texas
borders NP : kansas
borders (S\NP)/NP : x.y.borders(y,x)... ...
Texas borders Kansas NP : texas
Texas borders Kansas NP : kansas
Texas borders Kansas (S\NP)/NP : x.y.borders(y,x)
![Page 26: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/26.jpg)
Weighted CCG
Given a log-linear model with a CCG lexicon , a feature vector f, and weights w.
The best parse is:
Where we consider all possible parses y for the sentence x given the lexicon .
y* argmaxy
w f (x, y)
![Page 27: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/27.jpg)
Inputs: Training set {(xi, zi) | i=1…n} of sentences and logical forms. Initial lexicon . Initial parameters w. Number of iterations T.
Computation: For t = 1…T, i =1…n:
Step 1: Check Correctness• Let
• If L(y*) = zi, go to the next example
Step 2: Lexical Generation• Set • Let
• Define i to be the lexical entries in
• Set lexicon to i
Step 3: Update Parameters• Let• If
• Set
Output: Lexicon and parameters w.
y* argmaxy
w f (x i, y)
GENLEX(x i,zi)
ˆ y arg maxy s.t. L(y )zi
w f (x i, y)
y argmaxy
w f (x i,y)
L( y ) zi
w w f (x i, ˆ y ) f (x i, y )
y
![Page 28: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/28.jpg)
Example Learned Lexical Entries
Words Categorystates N : x.state(x)major N/N : g.x.major(x)g(x)
population N : x.population(x)cities N : x.city(x)river N : x.river(x)
run through (S\NP)/NP : x.y.traverse(y,x)the largest NP/N : g.argmax(g,x.size(x))
rivers N : x.river(x)the highest NP/N : g.argmax(g,x.elev(x))the longest NP/N : g.argmax(g,x.len(x))
... ...
![Page 29: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/29.jpg)
Challenge Revisited
The lexical entries that work for:
Show me the latest flight from Boston to Prague on Friday
S/NP NP/N N N\N N\N N\N
… … … … … …
Will not parse:
Boston to Prague the latest on Friday
NP N\N NP/N N\N … … … …
![Page 30: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/30.jpg)
Disharmonic Application
Reverse the direction of the principal category:
X\Y : f Y : a => X : f(a)
Y : a X/Y : f => X : f(a)
Nx.flight(x)
N/Nf.x.f(x)one_way(x)
flights one way
Nx.flight(x)one_way(x)
![Page 31: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/31.jpg)
Missing content words
Insert missing semantic content
NP : c => N\N : f.x.f(x) p(x,c)
Nx.flight(x)
N\Nf.x.f(x)to(x,PRG)
flights to Prague
NPBOS
Boston
N\Nf.x.f(x)from(x,BOS)
Nx.flight(x)from(x,BOS)
Nx.flight(x)from(x,BOS)to(x,PRG)
![Page 32: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/32.jpg)
Missing content-free words
Bypass missing nounsN\N : f => N : f(x.true)
N/Nf.x.f(x)airline(x,NWA)
N\Nf.x.f(x)to(x,PRG)
Northwest Air to Prague
Nx.to(x,PRG)
Nx.airline(x,NWA) to(x,PRG)
![Page 33: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/33.jpg)
A Complete Parse
NPBOS
N\Nf.x.f(x)to(x,PRG)
Boston to Prague
NP/Nf.argmax(x.f(x),x.time(x))
the latest
N\Nf.x.f(x)day(x,FRI)
on Friday
Nx.day(x,FRI)
N\Nf.x.f(x)from(x,BOS)
N\Nf.x.f(x)from(x,BOS) to(x,PRG)
NP\Nf.argmax(x.f(x)from(x,BOS)to(x,PRG), x.time(x))
Nargmax(x.from(x,BOS)to(x,PRG)day(x,FRI), x.time(x))
![Page 34: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/34.jpg)
Geo880 Test Set
Precision Recall F1
Zettlemoyer & Collins 2007
95.49 83.20 88.93
Zettlemoyer & Collins 2005
96.25 79.29 86.95
Wong & Money 2007 93.72 80.00 86.31
Exact Match Accuracy:
![Page 35: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/35.jpg)
![Page 36: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/36.jpg)
![Page 37: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/37.jpg)
![Page 38: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/38.jpg)
Machine Translation Approach
SourceText
TargetText
nous acceptons votre opinion .
we accept your view .
![Page 39: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/39.jpg)
Translations from Monotexts
SourceText
TargetText
Translation without parallel text?
Need (lots of) sentences
[Fung 95, Koehn and Knight 02, Haghighi and Klein 08]
![Page 40: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/40.jpg)
Task: Lexicon Matching
SourceText
TargetText
Matchingmstate
world
name
Source Words s
nation
estado
política
Target Words t
mundo
nombre
[Haghighi and Klein 08]
![Page 41: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/41.jpg)
Data Representation
state
SourceText
Orthographic Features
1.01.0
1.0
#sttatte#
Context Features
20.05.0
10.0
worldpoliticssociety
![Page 42: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/42.jpg)
Data Representation
state
Orthographic Features1.0
1.0
1.0
#st
tatte#
5.0
20.0
10.0
Context Features
worldpoliticssociety
SourceText
estado
Orthographic Features1.0
1.0
1.0
#es
stado#
10.0
17.0
6.0
Context Features
mundopoliticasocieda
d
TargetText
![Page 43: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/43.jpg)
Generative Model (CCA)
estadostateSource Space Target Space
PAria
Canonical Space
![Page 44: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/44.jpg)
Generative Model (Matching)
Source Words s
Target Words tMatching
mstate
world
name
nation
estado
nombre
politica
mundo
![Page 45: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/45.jpg)
E-Step: Find best matching
M-Step: Solve a CCA problem
Inference: Hard EM
![Page 46: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/46.jpg)
Experimental Setup
Data: 2K most frequent nouns, texts from Wikipedia
Seed: 100 translation pairs
Evaluation: Precision and Recall against lexicon obtained from Wiktionary Report p0.33, precision at recall 0.33
![Page 47: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/47.jpg)
Lexicon Quality (EN-ES)Pr
ecis
ion
Recall
![Page 48: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/48.jpg)
Analysis
![Page 49: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/49.jpg)
Analysis
Interesting Matches Interesting Mistakes
![Page 50: Statistical NLP Spring 2010](https://reader035.vdocuments.us/reader035/viewer/2022070405/56813caf550346895da65b46/html5/thumbnails/50.jpg)
Language Variation