Mono- and bilingual modeling of selectional preferences Sebastian Padó Institute for Computational Linguistics Heidelberg University (joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)

Post on 21-Dec-2015


Page 1

Mono- and bilingual modeling of selectional preferences

Sebastian Padó

Institute for Computational Linguistics

Heidelberg University

(joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)

Page 2

Some context

• Computational lexical semantics: modeling the meaning of words and phrases
• Distributional approach
  • Observe the usage of words in corpora
  • Robustness: Broad coverage, manageable complexity
  • Flexibility: Corpus choice determines model

[Figure: diagram relating Corpus and Knowledge]

Page 3

Structure

Methods: Distributional semantics

Phenomena: Semantic relations in bilingual dictionaries

Application: Predictions of plausibility judgments

Page 4

Plausibility of Verb-Relation-Argument-Triples

Verb   Relation   Argument   Plausibility
eat    subject    customer   6.9
eat    object     customer   1.5
eat    subject    apple      1.0
eat    object     apple      6.4

• Central aspect of language
• Selectional preferences [Katz & Fodor 1963, Wilks 1975]
• Generalization of lexical similarity
• Incremental language processing [McRae & Matsuki 2009]
• Disambiguation [Toutanova et al. 2005], applicability of inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]

Page 5

Modelling Plausibility

• Approximating plausibility by frequency
• Two lexical variables: Frequency of most triples is zero
• Implausibility or sparse data?
  • Generalization based on an ontology (WordNet) [Resnik 1996]
  • Generalization based on vector space [Erk, Padó, and Padó 2010]

English corpus

(eat, obj, apple)       100
(eat, obj, hat)         1
(eat, obj, telephone)   0
(eat, obj, caviar)      0

(eat, obj, apple): highly plausible
(eat, obj, hat): somewhat plausible
(eat, obj, telephone): ?
(eat, obj, caviar): ?
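
The sparse-data problem above can be made concrete with a minimal sketch. The counts are the toy values from the slide. The function name `relative_frequency` is hypothetical, and using plain relative frequency as the plausibility estimate is an assumption made for illustration.

```python
from collections import Counter

# Toy corpus counts from the slide; unseen triples are simply absent.
triple_counts = Counter({
    ("eat", "obj", "apple"): 100,
    ("eat", "obj", "hat"): 1,
})

def relative_frequency(verb, rel, arg, counts=triple_counts):
    """Estimate plausibility as count(verb, rel, arg) / count(verb, rel).

    This is an illustrative assumption, not necessarily the estimator
    used in the talk.
    """
    total = sum(c for (v, r, _), c in counts.items() if v == verb and r == rel)
    if total == 0:
        return 0.0
    # Counter returns 0 for unseen keys without inserting them.
    return counts[(verb, rel, arg)] / total

for arg in ["apple", "hat", "telephone", "caviar"]:
    print(arg, round(relative_frequency("eat", "obj", arg), 3))
```

Both "telephone" and "caviar" receive the same score of zero, so frequency alone cannot separate an implausible argument from a plausible but unseen one.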

Page 6

Semantic Spaces

• Characterization of word meaning through a profile over occurrence contexts [Salton, Wang, and Yang 1974, Landauer & Dumais 1997, Schütze 1998]
• Geometrically: Vector in high-dimensional space
• High vector similarity implies high semantic similarity
• Nearest neighbors = synonyms

             cultiver   rouler
mandarine    5          1
clémentine   4          1
voiture      1          20

[Figure: French (Fr) vector space with axes cultiver and rouler; points mandarine, clémentine, voiture]
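
A minimal sketch of nearest-neighbor lookup in this toy space. The counts are those from the table. Cosine is assumed as the similarity measure (the slide only says "vector similarity"), and the function names are hypothetical.

```python
import math

# Toy co-occurrence counts from the slide; dimensions are (cultiver, rouler).
vectors = {
    "mandarine": (5, 1),
    "clémentine": (4, 1),
    "voiture": (1, 20),
}

def cosine(u, v):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbor(word, space=vectors):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in space if w != word)
    return max(others, key=lambda w: cosine(space[word], space[w]))

print(nearest_neighbor("mandarine"))
```

With these counts, "mandarine" is far closer to "clémentine" than to "voiture", matching the intuition that nearest neighbors tend to be synonyms or near-synonyms.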

Page 7

Similarity-based generalization [Pado, Pado & Erk 2010]

• Plausibility is average vector space similarity to seen arguments
  • (v, r, a): verb – relation – argument head word triple
  • seenargs: set of argument head words seen in the corpus
  • wt: weight function
  • Z: normalization constant
  • sim: semantic (vector space) similarity
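
A minimal sketch of the weighted average described above, assuming the reading Plausibility(v, r, a) = (1/Z) · Σ over a' in seenargs(v, r) of wt(a') · sim(a, a'). The toy vectors, the frequency-based weights, Z as the sum of the weights, and cosine as sim are all assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity, used here as the assumed `sim` function."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical two-dimensional vectors for a few head words.
vectors = {
    "apple": (5, 1),
    "orange": (4, 1),
    "caviar": (3, 2),
    "telephone": (1, 20),
}

# seenargs(eat, obj) with hypothetical weights wt (here: corpus frequencies).
seen_objects_of_eat = {"apple": 100, "orange": 20}

def plausibility(arg, seen_args, space=vectors, sim=cosine):
    """Weighted mean similarity of `arg` to the seen argument head words."""
    z = sum(seen_args.values())  # normalization constant Z (assumed: sum of weights)
    return sum(wt * sim(space[arg], space[a]) for a, wt in seen_args.items()) / z

for candidate in ["caviar", "telephone"]:
    print(candidate, round(plausibility(candidate, seen_objects_of_eat), 3))
```

Neither "caviar" nor "telephone" was seen as an object of "eat", yet the model separates them because "caviar" lies closer to the seen objects.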

Page 8

Geometrical interpretation

[Figure: vector space with the points Peter, husband, child, orange, apple, breakfast, caviar, and telephone; regions labeled Seen subjects of “eat” and Seen objects of “eat”]

Page 9

Evaluation

• Triples with human plausibility ratings [McRae et al. 1996]
  • Evaluation: Correlation of model predictions with human judgments
  • Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation
• Result: Vector space model attains almost the quality of the “deep” model, at 98% coverage

Model                                Coverage   Spearman’s rho
Resnik 1996 [ontology-based]         100%       0.123 n.s.
EPP [vector space-based]             98%        0.325 ***
U. Pado et al. 2006 [“deep” model]   78%        0.415 ***
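
A minimal sketch of the evaluation measure, computing Spearman's ρ as the Pearson correlation of ranks. The human ratings are the four "eat" values from the plausibility table earlier in the talk; the model scores are made up for illustration, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

human = [6.9, 1.5, 1.0, 6.4]      # ratings from the slide
model = [0.80, 0.30, 0.10, 0.90]  # hypothetical model predictions
print(round(spearman_rho(human, model), 3))
```

Because only ranks matter, the model is not penalized for using a different scale than the human raters.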

Page 10

From one to many languages…

• Vector space model reduces the need for language resources to predict plausibility judgments
  • No ontologies
• Still necessary: Observations of triples, target words
  • Large, accurately parsed corpus
  • Problematic for basically all languages except English
• Can we extend our strategy to new languages?

Resnik [Brockmann & Lapata 2002]   TIGER + GermaNet   ρ = .37
EPP [Pado & Peirsman 2010]         HGC                ρ = .33

Page 11

Predicting plausibility for new languages

•Transfer with a bilingual lexicon [Koehn and Knight 2002]

• Cross-lingual knowledge transfer

• Print dictionaries are problematic
  • Instead: acquire from distributional data

[Figure: the lexicon entries cultiver – grow and pomme – apple map (cultiver, Obj, pomme) to the English model (trained on the English corpus), which yields (grow, obj, apple): highly plausible]
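
A minimal sketch of the transfer step: translate verb and argument with a bilingual lexicon, then query an English model. The lexicon holds the two entries from the slide. The English model is replaced by a lookup table with a made-up score, and all function names are hypothetical.

```python
# Bilingual lexicon (French -> English) with the two entries from the slide.
lexicon = {"cultiver": "grow", "pomme": "apple"}

# Stand-in for a trained English plausibility model; the score is made up.
english_scores = {("grow", "obj", "apple"): 0.9}

def english_plausibility(verb, rel, arg):
    """Look up the (hypothetical) English plausibility; 0.0 if unknown."""
    return english_scores.get((verb, rel, arg), 0.0)

def transfer_plausibility(verb, rel, arg, lex=lexicon):
    """Translate verb and argument into English, then query the English model."""
    if verb not in lex or arg not in lex:
        return None  # no translation available
    return english_plausibility(lex[verb], rel, lex[arg])

print(transfer_plausibility("cultiver", "obj", "pomme"))
```

The grammatical relation is passed through unchanged here, which assumes that relation labels are comparable across the two languages.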

Page 12

Bilingual semantic space

• Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997]

• Dimensions are bilingual word pairs, can be bootstrapped

• Frequencies observable from comparable corpora

• Nearest neighbors: Cross-lingual synonyms ⟷ Translations

            (cultiver, grow)   (rouler, drive)
mandarine   5                  1
mandarin    4                  2
car         1                  20

[Figure: joint French (Fr) and English (E) vector space with axes cultiver/grow and rouler/drive; points mandarine, mandarin, car]
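
A minimal sketch of translation lookup in the joint space: for a French word, rank the English words by similarity and keep the top n. The counts are those from the table. Cosine similarity and the function names are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Dimensions are the bilingual word pairs (cultiver, grow) and (rouler, drive).
french = {"mandarine": (5, 1)}
english = {"mandarin": (4, 2), "car": (1, 20)}

def translation_candidates(word, source, target, n=1):
    """Return the n target-language words nearest to `word` in the joint space."""
    ranked = sorted(target, key=lambda t: cosine(source[word], target[t]), reverse=True)
    return ranked[:n]

print(translation_candidates("mandarine", french, english, n=1))
```

Raising n returns further neighbors as well, which need not be translations in the strict sense.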

Page 13

Nearest neighbors in bilingual space

• Similar usages / context profiles do not necessarily indicate synonymy

        (cultiver, grow)   (rouler, drive)
pear    5                  1
pomme   4                  2
car     1                  20

[Figure: joint French (Fr) and English (E) vector space with axes cultiver/grow and rouler/drive; points pear, pomme, car]

• Bilingual case: Peirsman & Pado (2011)
  • Lexicon extraction for EN/DE and EN/NL

Page 14

Evaluation against Gold Standard

•Evaluation of nearest cross-lingual neighbors against a translators’ dictionary

Page 15

Analysis of 200 noun pairs (EN-DE)

Meta-Relation                Relation      Frequency   Example
Synonymy (50%)                             99          Verhältnis - relationship
Semantic similarity (16%)    Antonymy      1           Inneres - exterior
                             Co-Hyponymy   15          Straßenbahn - bus
                             Hyponymy      3           Kunstwerk - painting
                             Hypernymy     15          Dramatiker - poet
Semantic relatedness (19%)                 39          Kapitel - essay
Errors (14%)                               28          DDR-Zeit - trainee

Page 16

Similarity by relation

Page 17

How to proceed?

• Classical reaction: Focus on cross-lingual synonyms
  • Aggressive filtering of nearest-neighbor lists
  • Risk: Sparse data issues
• Our hypothesis (preliminary version):
  • Non-synonymous pairs still provide information about bilingual similarity
  • Should be exploited for cross-lingual knowledge transfer
  • Experimental validation: Vary number of synonyms, observe effect on cross-lingual knowledge transfer

Page 18

Varying the number of neighbors

• Nearest neighbors: 50% are synonyms
• Further neighbors: quick decline to 10% synonyms

Page 19

Experimental setup

Lexicon entries:
rouler – drive
bagnole – jalopy, banger, car

(bagnole, subj, rouler) is passed to the English model (trained on the English corpus)

Consider plausibilities for:
(jalopy, subj, drive)
(banger, subj, drive)
(car, subj, drive)

Page 20

Details

• Model:
  • English model: trained on BNC as before
  • Bilingual lexicon extracted from BNC and Stuttgarter Nachrichtenkorpus HGC as comparable corpora
  • Prediction based on n nearest English neighbours for German argument
• Evaluation:
  • 90 German (v,r,a) triples with human plausibility ratings [Brockmann & Lapata 2003]
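
A minimal sketch of prediction from the n nearest English neighbours, using the bagnole example from the experimental setup. How the neighbour scores are combined is not stated on the slides; averaging is an assumption here. The neighbour order, the scores, and the function names are also hypothetical.

```python
from statistics import mean

# Single verb translation and a ranked neighbour list for the argument.
# The neighbour order follows the slide's listing and is an assumption.
verb_lexicon = {"rouler": "drive"}
neighbours = {"bagnole": ["jalopy", "banger", "car"]}

# Stand-in for the English model; all scores are made up.
english_scores = {
    ("drive", "subj", "jalopy"): 0.6,
    ("drive", "subj", "banger"): 0.3,
    ("drive", "subj", "car"): 0.9,
}

def english_plausibility(verb, rel, arg):
    """Look up the (hypothetical) English plausibility; 0.0 if unknown."""
    return english_scores.get((verb, rel, arg), 0.0)

def transfer_plausibility(verb, rel, arg, n):
    """Average the English plausibility over the n nearest neighbours of `arg`."""
    english_verb = verb_lexicon[verb]
    candidates = neighbours[arg][:n]
    return mean(english_plausibility(english_verb, rel, c) for c in candidates)

for n in (1, 2, 3):
    print(n, round(transfer_plausibility("rouler", "subj", "bagnole", n), 3))
```

Varying n is the experimental knob: n = 1 relies on the single closest neighbour, while larger n brings in neighbours that are merely similar or related.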

Page 21

Results – EN-DE

                         1 NN   2 NN   3 NN   4 NN   5 NN
Translated English EPP   0.34   0.41   0.44   0.46   0.40

Model                               Resources                                 Spearman’s ρ
Resnik [Brockmann & Lapata 2002]    TIGER corpus, German WordNet (GermaNet)   .37
EPP German [Pado & Peirsman 2010]   HGC corpus parsed with PCFG               .33

• Result: Transfer model significantly better than monolingual model, but only if non-synonymous neighbors are included

Page 22

Results: Details

                           1 NN   2 NN   3 NN   4 NN   5 NN
English EPP (all)          0.34   0.41   0.44   0.46   0.40
English EPP (subjects)     0.53   0.51   0.56   0.56   0.55
English EPP (objects)      0.58   0.61   0.61   0.64   0.58
English EPP (pp objects)   0.33   0.45   0.45   0.46   0.42

Page 23

Sources of the positive effect

• Non-synonyms are in fact informative for plausibility translation
• Semantically similar verbs: eat – munch – feast
  • Similar events, similar arguments [Fillmore et al. 2003, Levin 1993]
• Semantically related verbs: peel – cook – eat
  • Schemas/narrative chains: shared participants [Schank & Abelson 1977, Chambers & Jurafsky 2009]

Page 24

Our hypothesis with qualifications

• Using non-synonymous translation pairs is helpful
  1. if transferred knowledge is lexical
     • Many infrequently observed datapoints
  2. if knowledge is stable across semantically related/similar word pairs
     • Counterexample: polarity/sentiment judgments
     • food – feast – grub
     • Parallel experiment: best results for single nearest neighbor

Page 25

Summary

• Plausibility can be modeled with fairly shallow methods
  • Seen head words plus generalization in vector space
  • Precondition: accurately parsed corpus
• If unavailable: Transfer from better-endowed language
  • Translation through automatically induced lexicons
• Transfer of knowledge about certain phenomena can benefit from non-synonymous translations
  • Corresponding to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …