
Mono- and bilingual modeling of selectional preferences

Sebastian Padó
Institute for Computational Linguistics

Heidelberg University

(joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)

Some context

• Computational lexical semantics: modeling the meaning of words and phrases

• Distributional approach
  • Observe the usage of words in corpora
  • Robustness: broad coverage, manageable complexity
  • Flexibility: the choice of corpus determines the model

[Figure: overview diagram linking Knowledge, Corpus and Structure]

• Methods: distributional semantics
• Phenomena: semantic relations in bilingual dictionaries
• Application: predicting plausibility judgments

Plausibility of Verb-Relation-Argument Triples

Verb   Relation   Argument   Plausibility
eat    subject    customer   6.9
eat    object     customer   1.5
eat    subject    apple      1.0
eat    object     apple      6.4

• Central aspect of language
• Selectional preferences [Katz & Fodor 1963, Wilks 1975]
• Generalization of lexical similarity
• Incremental language processing [McRae & Matsuki 2009]
• Disambiguation [Toutanova et al. 2005], applicability of inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]

Modelling Plausibility

• Approximating plausibility by frequency
• Two lexical variables: the frequency of most triples is zero
  • Implausibility or sparse data?
• Generalization based on an ontology (WordNet) [Resnik 1996]
• Generalization based on a vector space [Erk, Padó, and Padó 2010]

Example: English corpus counts and the resulting judgments

(eat, obj, apple)       100   highly plausible
(eat, obj, hat)           1   somewhat plausible
(eat, obj, telephone)     0   ?
(eat, obj, caviar)        0   ?
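To make the sparse-data problem concrete, here is a minimal Python sketch that approximates plausibility by relative frequency over the toy counts above. The `counts` table and the `freq_plausibility` helper are illustrative, not part of the original model. Both unseen triples receive the same score, so frequency alone cannot separate the implausible (eat, obj, telephone) from the plausible but unseen (eat, obj, caviar).

```python
from collections import Counter

# Toy corpus counts for (verb, relation, argument) triples, from the example above.
counts = Counter({
    ("eat", "obj", "apple"): 100,
    ("eat", "obj", "hat"): 1,
})

def freq_plausibility(verb, rel, arg):
    """Relative frequency of `arg` among all arguments seen with (verb, rel)."""
    total = sum(c for (v, r, _), c in counts.items() if (v, r) == (verb, rel))
    if total == 0:
        return 0.0
    # A Counter returns 0 for unseen keys instead of raising KeyError.
    return counts[(verb, rel, arg)] / total

for arg in ["apple", "hat", "telephone", "caviar"]:
    print(arg, round(freq_plausibility("eat", "obj", arg), 3))
```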

Semantic Spaces

• Characterization of word meaning through a profile over occurrence contexts [Salton, Wong, and Yang 1975, Landauer & Dumais 1997, Schütze 1998]

• Geometrically: Vector in high-dimensional space

• High vector similarity implies high semantic similarity
• Nearest neighbors = synonyms

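Example (French): co-occurrence counts with the context words cultiver ('grow') and rouler ('drive')

                  cultiver   rouler
mandarine             5         1
clémentine            4         1
voiture ('car')       1        20

[Figure: mandarine, clémentine and voiture plotted in the two-dimensional space spanned by cultiver and rouler]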
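The following sketch computes similarities in this toy space. The slides do not say which similarity measure is used; cosine similarity is assumed here because it is a common choice for count vectors. With these counts, mandarine and clémentine point in almost the same direction, while voiture does not.

```python
import math

# Toy co-occurrence counts from the table above:
# dimension 0 = "cultiver", dimension 1 = "rouler".
vectors = {
    "mandarine": (5, 1),
    "clémentine": (4, 1),
    "voiture": (1, 20),
}

def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

print(round(cosine(vectors["mandarine"], vectors["clémentine"]), 3))  # 0.999
print(round(cosine(vectors["mandarine"], vectors["voiture"]), 3))     # 0.245
```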

Similarity-based generalization [Pado, Pado & Erk 2010]

• Plausibility is the weighted average vector space similarity to seen arguments
  • (v, r, a): verb – relation – argument head word triple
  • seenargs: set of argument head words seen in the corpus
  • wt: weight function
  • Z: normalization constant
  • sim: semantic (vector space) similarity
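The formula itself did not survive extraction. Read from the symbol list above, and writing Pl for the plausibility score, the model averages similarities to the seen arguments: Pl(v, r, a) = Σ_{a′ ∈ seenargs(v, r)} (wt(a′) / Z) · sim(a, a′). The sketch below implements that reading under stated assumptions: wt(a′) is the corpus frequency of a′, Z is the sum of the weights, sim is cosine similarity, and the 2-D vectors are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity (assumed as sim; the slides do not fix the measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def plausibility(verb, rel, arg, seen, vectors):
    """Weighted average similarity of `arg` to the arguments seen with (verb, rel).

    Assumptions for illustration: wt(a') = corpus frequency of a',
    Z = sum of all weights, sim = cosine similarity.
    """
    seenargs = seen.get((verb, rel), {})
    z = sum(seenargs.values())  # normalization constant Z
    if z == 0:
        return 0.0
    return sum(wt / z * cosine(vectors[arg], vectors[a]) for a, wt in seenargs.items())

# Hypothetical data: seen objects of "eat" with frequencies, and made-up 2-D vectors.
seen = {("eat", "obj"): {"apple": 100, "hat": 1}}
vectors = {
    "apple": (9, 1),
    "hat": (1, 7),
    "caviar": (8, 1),
    "telephone": (0, 9),
}

print(round(plausibility("eat", "obj", "caviar", seen, vectors), 2))
print(round(plausibility("eat", "obj", "telephone", seen, vectors), 2))
```

With these made-up vectors, the unseen argument caviar scores close to 1 because it lies near the frequent seen object apple, while telephone scores low: the generalization that raw frequency could not provide.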

Geometrical interpretation

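[Figure: semantic space in which the seen subjects of “eat” (Peter, husband, child) and the seen objects of “eat” (orange, apple, breakfast, caviar) are marked as two groups; telephone is plotted for comparison]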

Evaluation

•Triples with human plausibility ratings [McRae et al. 1996]

• Evaluation: correlation of model predictions with human judgments
  • Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation
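Spearman’s ρ is the Pearson correlation computed on ranks instead of raw values. A small self-contained sketch, using average ranks for ties; the human ratings are those of the four example triples from earlier, and the model scores are hypothetical:

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined if one variable is constant
    return cov / (sx * sy)


human = [6.9, 1.5, 1.0, 6.4]  # ratings of the four example triples
model = [0.8, 0.2, 0.1, 0.7]  # hypothetical model scores
print(round(spearman_rho(human, model), 3))  # 1.0
```

In practice, `scipy.stats.spearmanr` computes the same statistic and also returns a p-value.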

• Result: at 98% coverage, the vector space model almost reaches the quality of the “deep” model

Model                                  Coverage   Spearman’s ρ
Resnik 1996 [ontology-based]           100%       0.123 (n.s.)
EPP [vector space-based]                98%       0.325 ***
U. Pado et al. 2006 [“deep” model]      78%       0.415 ***

From one to many languages…

• The vector space model reduces the need for language resources to predict plausibility judgments
  • No ontologies
• Still necessary: observations of triples and target words
  • Large, accurately parsed corpus
  • Problematic for basically all languages except English
• Can we extend our strategy to new languages?

Model                              Resources          Spearman’s ρ
Resnik [Brockmann & Lapata 2002]   TIGER + GermaNet   .37
EPP [Pado & Peirsman 2010]         HGC                .33

Predicting plausibility for new languages

• Transfer with a bilingual lexicon [Koehn and Knight 2002]
  • Cross-lingual knowledge transfer
• Print dictionaries are problematic
  • Instead: acquire the lexicon from distributional data

Example: (cultiver, obj, pomme) is translated via the lexicon entries cultiver – grow and pomme – apple into (grow, obj, apple), which the English model (trained on an English corpus) judges highly plausible.
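A minimal sketch of the transfer step, assuming a one-best lexicon and a lookup table standing in for the English EPP model; both dictionaries and the score 0.9 are hypothetical. The relation label is kept unchanged, which assumes that grammatical relations carry over between the two languages.

```python
# Hypothetical one-best French -> English lexicon.
lexicon = {"cultiver": "grow", "pomme": "apple"}

# Hypothetical English plausibility scores (stand-in for the English EPP model).
english_plausibility = {("grow", "obj", "apple"): 0.9}

def transfer(verb, rel, arg):
    """Translate a French (verb, relation, argument) triple word by word and
    look up its plausibility in the English model; None if a word is untranslatable."""
    if verb not in lexicon or arg not in lexicon:
        return None
    return english_plausibility.get((lexicon[verb], rel, lexicon[arg]), 0.0)

print(transfer("cultiver", "obj", "pomme"))  # 0.9
```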

Bilingual semantic space

• Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997]

• Dimensions are bilingual word pairs; these can be bootstrapped

• Frequencies are observable in comparable corpora

• Nearest neighbors: Cross-lingual synonyms ⟷ Translations

                  (cultiver, grow)   (rouler, drive)
mandarine (Fr)            5                 1
mandarin (E)              4                 2
car (E)                   1                20

[Figure: mandarine, mandarin and car plotted in the space spanned by the bilingual dimensions cultiver/grow and rouler/drive]
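Because words from both languages live in one space, translation candidates can be read off as cross-lingual nearest neighbors. A sketch using the toy counts above, with cosine similarity assumed as the measure and a hypothetical `nearest_translations` helper:

```python
import math

# Toy counts from the table above. Each dimension is a bilingual word pair:
# dimension 0 = (cultiver, grow), dimension 1 = (rouler, drive).
vectors = {
    ("fr", "mandarine"): (5, 1),
    ("en", "mandarin"): (4, 2),
    ("en", "car"): (1, 20),
}

def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def nearest_translations(word, src="fr", tgt="en", k=1):
    """Rank target-language words by similarity to a source-language word."""
    query = vectors[(src, word)]
    candidates = [(w, cosine(query, vec))
                  for (lang, w), vec in vectors.items() if lang == tgt]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]

print(nearest_translations("mandarine", k=2))
```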

Nearest neighbors in bilingual space

• Similar usages / context profiles do not necessarily indicate synonymy

                  (cultiver, grow)   (rouler, drive)
pear (E)                  5                 1
pomme (Fr)                4                 2
car (E)                   1                20

[Figure: pear, pomme ('apple') and car in the bilingual space; pomme ends up next to pear, which is not its translation]

• Bilingual case: Peirsman & Pado (2011)
  • Lexicon extraction for EN/DE and EN/NL

Evaluation against Gold Standard

•Evaluation of nearest cross-lingual neighbors against a translators’ dictionary

Analysis of 200 noun pairs (EN-DE)

Meta-relation                Relation      Frequency   Example
Synonymy (50%)               –                 99      Verhältnis – relationship
Semantic similarity (16%)    Antonymy           1      Inneres ('interior') – exterior
                             Co-hyponymy       15      Straßenbahn ('tram') – bus
                             Hyponymy           3      Kunstwerk ('work of art') – painting
                             Hypernymy         15      Dramatiker ('dramatist') – poet
Semantic relatedness (19%)   –                 39      Kapitel ('chapter') – essay
Errors (14%)                 –                 28      DDR-Zeit ('GDR era') – trainee

[Figure: similarity by relation]

How to proceed?

• Classical reaction: focus on cross-lingual synonyms
  • Aggressive filtering of nearest-neighbor lists
  • Risk: sparse data issues

• Our hypothesis (preliminary version):
  • Non-synonymous pairs still provide information about bilingual similarity
  • They should be exploited for cross-lingual knowledge transfer
  • Experimental validation: vary the number of nearest neighbors (and thus the proportion of synonyms), observe the effect on cross-lingual knowledge transfer

Varying the number of neighbors

• Nearest neighbor: a synonym in 50% of cases
• Further neighbors: the proportion of synonyms quickly declines to 10%

Experimental setup

Translations: rouler – drive; bagnole ('car', colloquial) – jalopy, banger, car

To judge (rouler, subj, bagnole), consider the plausibilities assigned by the English model (trained on an English corpus) for:

(drive, subj, jalopy)
(drive, subj, banger)
(drive, subj, car)

Details

• Model:
  • English model: trained on the BNC, as before
  • Bilingual lexicon extracted from the BNC and the Stuttgarter Nachrichtenkorpus HGC as comparable corpora

• Prediction based on the n nearest English neighbors of the German argument

• Evaluation:
  • 90 German (v, r, a) triples with human plausibility ratings [Brockmann & Lapata 2003]
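The prediction step can be sketched as averaging English plausibilities over the n nearest English neighbors of the argument. The slides do not specify how the n predictions are combined; an unweighted mean is assumed here. The neighbor ranking follows the jalopy, banger, car example, and all scores and helper names are hypothetical.

```python
# Hypothetical ranked English neighbors of "bagnole" (most similar first),
# as they might come out of a bilingual space.
neighbors = {"bagnole": ["jalopy", "banger", "car"]}
verb_translation = {"rouler": "drive"}

# Hypothetical English plausibility scores (stand-in for the English EPP model).
english_plausibility = {
    ("drive", "subj", "jalopy"): 0.2,
    ("drive", "subj", "banger"): 0.3,
    ("drive", "subj", "car"): 0.9,
}

def transfer_nn(verb, rel, arg, n):
    """Unweighted mean English plausibility over the n nearest neighbors of `arg`."""
    en_verb = verb_translation[verb]
    candidates = neighbors[arg][:n]
    scores = [english_plausibility.get((en_verb, rel, a), 0.0) for a in candidates]
    return sum(scores) / len(scores)

for n in (1, 2, 3):
    print(n, round(transfer_nn("rouler", "subj", "bagnole", n), 2))
```

Increasing n changes which English words contribute evidence; whether that helps is exactly what the EN-DE experiments measure.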

Results – EN-DE

Spearman’s ρ with the n nearest English neighbors (n NN)

                         1 NN   2 NN   3 NN   4 NN   5 NN
Translated English EPP   0.34   0.41   0.44   0.46   0.40

Model                               Resources                     Spearman’s ρ
Resnik [Brockmann & Lapata 2002]    TIGER corpus, GermaNet        .37
EPP German [Pado & Peirsman 2010]   HGC corpus parsed with PCFG   .33

• Result: the transfer model is significantly better than the monolingual model, but only if non-synonymous neighbors are included

Results: Details

                            1 NN   2 NN   3 NN   4 NN   5 NN
English EPP (all)           0.34   0.41   0.44   0.46   0.40
English EPP (subjects)      0.53   0.51   0.56   0.56   0.55
English EPP (objects)       0.58   0.61   0.61   0.64   0.58
English EPP (PP objects)    0.33   0.45   0.45   0.46   0.42

Sources of the positive effect

• Non-synonyms are in fact informative for transferring plausibility

• Semantically similar verbs: eat – munch – feast
  • Similar events, similar arguments [Fillmore et al. 2003, Levin 1993]

• Semantically related verbs: peel – cook – eat
  • Schemas/narrative chains: shared participants [Schank & Abelson 1977, Chambers & Jurafsky 2009]

Our hypothesis with qualifications

• Using non-synonymous translation pairs is helpful
  1. if the transferred knowledge is lexical
     • Many infrequently observed data points
  2. if the knowledge is stable across semantically related/similar word pairs

• Counterexample: polarity/sentiment judgments
  • food – feast – grub
  • Parallel experiment: best results for the single nearest neighbor

Summary

• Plausibility can be modeled with fairly shallow methods
  • Seen head words plus generalization in vector space
  • Precondition: an accurately parsed corpus

• If unavailable: transfer from a better-endowed language
  • Translation through automatically induced lexicons

• Transfer of knowledge about certain phenomena can benefit from non-synonymous translations
  • This corresponds to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …
