Building a Semantic Parser Overnight Yushi Wang Jonathan Berant Percy Liang T Raghuveer

Post on 13-Dec-2015

TRANSCRIPT

Page 1: Building a Semantic Parser Overnight Yushi Wang Jonathan Berant Percy Liang T Raghuveer

Building a Semantic Parser Overnight

Yushi Wang Jonathan Berant Percy Liang

T Raghuveer

Page 2

Abstract

A functionality-driven process for rapidly building a semantic parser in a new domain.

The logical forms are meant to cover the desired set of compositional operators, and the canonical utterances are meant to capture the meaning of the logical forms (although clumsily).

Crowdsourcing is then used to paraphrase these canonical utterances into natural utterances. The resulting data is used to train the semantic parser.

Study compositionality …paraphrases

Tested on 7 new domains

Page 3

Page 4

Logical form

• The logical form of a sentence is the form obtained by abstracting out the subject matter of its content terms, or by regarding the content terms as mere placeholders or blanks on a form. In an ideal logical language, the logical form can be determined from syntax alone.

• Original argument
– All humans are mortal.
– Socrates is human.
– Therefore, Socrates is mortal.

• Argument form
– All H are M.
– S is H.
– Therefore, S is M.

• A sentence may have multiple logical forms, and one logical form may correspond to multiple sentences.

Page 5

Seed Lexicon (L)

• A fixed database w is a set of triples (e1, p, e2), where e1 and e2 are entities (e.g., article1, 2015) and p is a property (e.g., publicationDate).
• The purpose of L is simply to connect each predicate with some representation in natural language.
• L contains rules of the form <t → s[p]>, where
– t is a natural language phrase (the representation),
– p is a database property or entity,
– s is a syntactic category (e.g., RELNP, TYPENP).
• Example: <person → TYPENP[person]>. Here ‘person’ is the natural language representation and TYPENP[person] is the logical representation.
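The database and lexicon described above can be sketched as plain data structures. This is a minimal illustration, not the authors' implementation: the names `Triple`, `LexEntry`, `render`, and `uncovered_properties` are hypothetical, and the toy facts are invented for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One database fact (e1, p, e2)."""
    e1: str
    p: str
    e2: str


class LexEntry(NamedTuple):
    """One seed-lexicon rule <t -> s[p]>."""
    phrase: str     # t: natural language phrase
    category: str   # s: syntactic category, e.g. TYPENP or RELNP
    predicate: str  # p: database property or entity


# Toy database w (assumed facts, for illustration only).
DATABASE = {
    Triple("article1", "publicationDate", "2015"),
    Triple("article1", "author", "alice"),
}

# Toy seed lexicon L.
SEED_LEXICON = [
    LexEntry("person", "TYPENP", "person"),
    LexEntry("article", "TYPENP", "article"),
    LexEntry("publication date", "RELNP", "publicationDate"),
    LexEntry("author", "RELNP", "author"),
]


def render(entry: LexEntry) -> str:
    """Format an entry in the slide's <t → s[p]> notation."""
    return f"<{entry.phrase} → {entry.category}[{entry.predicate}]>"


def uncovered_properties(database, lexicon):
    """Return database properties that have no phrase in the lexicon."""
    covered = {entry.predicate for entry in lexicon}
    return {triple.p for triple in database} - covered


print(render(SEED_LEXICON[0]))
print(uncovered_properties(DATABASE, SEED_LEXICON))
```

A check like `uncovered_properties` is one simple way to confirm that every property in the database has at least one natural language phrase before generating utterances.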

Page 6

Examples

• ‘person’ has the syntactic category TYPENP.
• All entities (‘alice’, ‘1950’) are ENTITYNP.
• Properties (‘publication date’) are RELNP.
• Unary predicates are realized as verb phrases (VP).
• Binary predicates are realized as either relational noun phrases (RELNP) or generalized transitive verbs (VP/NP).

Page 7

Domain-General Grammar

Page 8

Canonical Utterances

• Each canonical utterance is paired with a logical form, e.g. “article that has the largest publication date” and argmax(type.article, publicationDate).
• Lambda DCS is the logical language used.
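To make the pairing concrete, here is a minimal sketch of a single grammar rule that generates superlative canonical utterances together with their logical forms. The rule, the `superlatives` function, and the string encoding of logical forms are assumptions for illustration; the real grammar is richer and uses lambda DCS rather than plain strings.

```python
from itertools import product

# Phrase -> logical predicate, as a seed lexicon might supply them (toy values).
TYPES = {"article": "type.article"}
RELS = {"publication date": "publicationDate"}

# Superlative word -> logical operator (the argmin case is an assumed extension).
SUPERLATIVES = (("largest", "argmax"), ("smallest", "argmin"))


def superlatives(types, rels):
    """Yield (canonical utterance, logical form) pairs for one toy rule:
    TYPENP 'that has the' largest|smallest RELNP."""
    for (t_phrase, t_lf), (r_phrase, r_lf) in product(types.items(), rels.items()):
        for word, op in SUPERLATIVES:
            utterance = f"{t_phrase} that has the {word} {r_phrase}"
            logical_form = f"{op}({t_lf}, {r_lf})"
            yield utterance, logical_form


pairs = list(superlatives(TYPES, RELS))
for utterance, logical_form in pairs:
    print(f"{utterance!r} <-> {logical_form}")
```

Because the utterance and the logical form are built by the same rule, every generated utterance comes with its meaning attached, which is what lets the crowdsourced paraphrases be labeled for free.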

Page 9

Paraphrasing

• Synonym level: “block” ⇒ “brick”
• RELNP ⇒ prepositions: “meeting whose attendee is alice” ⇒ “meeting with alice”
• Complex RELNP, where the argument can become embedded: “player whose number of points is 15” ⇒ “player who scored 15 points”
• Superlative/comparative constructions ⇒ other RELNP-dependent words: “article that has the largest publication date” ⇒ “newest article”

Some examples

• “housing unit whose housing type is apartment” ⇒ “apartment”
• “university of student alice whose field of study is music” ⇒ “At which university did Alice study music?”

Page 10

Assumptions

• Canonical compositionality: using a small grammar, all logical forms expressible in natural language can be realized compositionally based on the logical form.

• Sublexical compositionality: the hypothesis is that the sublexical compositional units are small, so we only need to crowdsource a small number of canonical utterances to learn about most of the language variability in the given domain.
– “parent of alice whose gender is female” ⇒ “mother of alice”
– “person that is author of paper whose author is X” ⇒ “co-author of X”

• Bounded non-compositionality: natural utterances for expressing complex logical forms are compositional with respect to fragments of bounded size.
– “NP[number of NP[article CP[whose publication date is larger than NP[publication date of article 1]]]]” ⇒ “How many articles were published after article 1?”

Page 11

Crowdsourcing

• Amazon Mechanical Turk (AMT) workers paraphrase the canonical utterances.

• Paraphrases that share the same canonical utterance are collapsed, while identical paraphrases that have distinct canonical utterances are deleted.

• 26,098 examples were collected over all domains.

• 20 examples in each domain were manually analysed; 17% of the utterances were found to be inaccurate.
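The collapse-and-delete rule above can be read as a two-step filter over (canonical utterance, paraphrase) pairs. The sketch below is one plausible reading, not the authors' code: the function name `clean` and the sample pairs (in particular the ambiguous paraphrase) are made up for illustration, and no text normalization is applied.

```python
from collections import defaultdict


def clean(pairs):
    """Filter (canonical, paraphrase) pairs.

    1. Collapse exact duplicates of the same pair, keeping first-seen order.
    2. Delete every paraphrase that appears under more than one canonical
       utterance, since its meaning is ambiguous.
    """
    unique = list(dict.fromkeys(pairs))

    canonicals_by_paraphrase = defaultdict(set)
    for canonical, paraphrase in unique:
        canonicals_by_paraphrase[paraphrase].add(canonical)

    return [
        (canonical, paraphrase)
        for canonical, paraphrase in unique
        if len(canonicals_by_paraphrase[paraphrase]) == 1
    ]


# Hypothetical crowdsourced output.
raw = [
    ("article that has the largest publication date", "newest article"),
    ("article that has the largest publication date", "newest article"),
    ("meeting whose attendee is alice", "meeting with alice"),
    ("article whose author is alice", "alice's work"),
    ("meeting whose attendee is alice", "alice's work"),
]

cleaned = clean(raw)
for canonical, paraphrase in cleaned:
    print(f"{paraphrase!r} <- {canonical!r}")
```

In this toy run the duplicated “newest article” pair is kept once, and “alice's work” is dropped entirely because it was given for two different canonical utterances.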

Page 12

Domains, x,c

Page 13

Model and Learning

• The model is a log-linear distribution over candidate pairs (z, c) ∈ GEN(G ∪ Lx), where z is a logical form and c is its canonical utterance.
• G: the domain-general grammar
• Lx (or T(x)): lexical rules derived from the utterance x. For “article published in 2015 that cites article 1”:
– 2015 → NP[2015]
– article 1 → NP[article1]
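The formula on this slide did not survive in the transcript. For reference, a log-linear distribution over the candidate pairs has the following general shape. The feature vector φ, the parameter vector θ, and the conditioning on the database w are notation assumed here, not taken from the slide.

```latex
p_\theta(z, c \mid x, w) \;\propto\; \exp\!\big(\phi(x, c, z, w)^\top \theta\big),
\qquad (z, c) \in \mathrm{GEN}(G \cup L_x)
```

Each candidate pair is scored by a weighted sum of its features, and the scores are normalized over all candidates generated for the utterance x.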

Page 14

Features – Basic + Lexical

Page 15

Page 16

Accuracies

Analysis
• Tested on 7 domains
• Data: facts generated using entities and properties
• Training: 80%; Test: 20%
• Accuracy: the fraction of examples that yield the correct denotation
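The evaluation setup can be sketched in a few lines. This is an illustrative sketch only: the helper names `split` and `denotation_accuracy`, the random shuffling, and the use of sets as denotations are assumptions, not details given on the slide.

```python
import random


def split(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the examples and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def denotation_accuracy(predicted, gold):
    """Fraction of examples whose predicted denotation equals the gold one.

    A denotation is modelled here as a set of answers, so two logical forms
    that differ syntactically but return the same answers both count as correct.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


train, test = split(range(10))
print(len(train), len(test))

# Two toy test examples: the first prediction is right, the second is wrong.
accuracy = denotation_accuracy(
    [{"article1"}, {"article2"}],
    [{"article1"}, {"article3"}],
)
print(accuracy)
```

Scoring on denotations rather than logical forms means a parse is judged by the answer it produces when executed against the database.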

Error Analysis
• 70% of errors are due to the paraphrasing model: “restaurants that have waiters and you can sit outside” >>> “restaurant that has waiter service and that takes reservations”
• 12.5% are reordering issues: “What venue has fewer than two articles” >>> “article that has less than two venue”

Page 17

Thank you

Page 18

Sublexical compositionality

• The idea is that common, multi-part concepts are compressed to single words or simpler constructions.
• “person that is author of paper whose author is X” ⇒ “co-author of X”
• “person whose birthdate is birthdate of X” ⇒ “person born on the same day as X”
• “meeting whose start time is 3pm and whose end time is 5pm” ⇒ “meetings between 3pm and 5pm”
• “that allows cats and that allows dogs” ⇒ “that allows pets”