the interface between model-theoretic and corpus-based semantics sebastian pado

Post on 04-Jan-2016


The interface between model-theoretic and corpus-based semantics

Sebastian Pado

Natural language semantics

• Model-theoretic semantics
– Compositional calculation of sentence meaning
– Formal descriptions of ambiguities
– Inference


• Corpus-based semantics
– Distributional, graded meaning representation
– Probabilistic knowledge acquisition from corpora
– Prediction of linguistic behaviour based on context


Complementary benefits

How to divide work between the approaches?

Model-theoretic semantics

Good for sentence level (closed word classes)

Limited coverage

Correct


Corpus-based semantics

Good for lexical level (open word classes)

High coverage, robustness

Approximative


Strategies

1. More expressive representations for corpus-based models of meaning: Compositionality in vector spaces
– Ongoing collaboration with Katrin Erk (Dept. of Linguistics, U. Texas at Austin)

2. Corpus-based methods for enrichment of formal meaning representations
– Core of SFB project proposal

Strategy 1

More expressive representations for corpus-based models of meaning

Compositionality in Vector Spaces

• Vector space: Representation of word meaning by context co-occurrences

• What is the representation of a phrase?
– Centroid of two vectors?
– No: Must take mode of combination into account
  • “a horse draws…”: pull
  • “draw a horse”: sketch
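The objection to the centroid can be made concrete with a minimal sketch. The context dimensions and co-occurrence counts below are invented for illustration and do not come from any corpus; the point is only that averaging is symmetric, so it cannot tell the subject reading from the object reading.

```python
def centroid(u, v):
    """Component-wise average of two equal-length vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]

# Hypothetical co-occurrence counts over four context dimensions:
# (pull, cart, sketch, pencil). All numbers are invented.
draw = [4.0, 3.0, 5.0, 4.0]
horse = [3.0, 5.0, 1.0, 0.0]

# "a horse draws ..." (horse as subject) vs. "draw a horse" (horse as object)
subject_phrase = centroid(horse, draw)
object_phrase = centroid(draw, horse)

# The centroid ignores the syntactic relation, so both phrases
# receive exactly the same representation.
print(subject_phrase == object_phrase)
```

Because the two phrases collapse onto one vector, no similarity measure applied afterwards can recover the "pull" versus "sketch" distinction.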

A first step

• Structured vector space model [Erk & Pado 2008]

– Covers Verb+Object, Verb+Subject combinations
– Word meaning consists of lexical vector plus selectional preferences (=experiences) for dependents/governors


– Phrase meaning consists of two vectors:
  • Verb meaning modified by nominal expectations about governor
  • Noun meaning modified by verbal expectations about dependent
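A rough sketch of how such a two-vector phrase meaning could be computed is given below. It assumes component-wise multiplication as the combination operation and cosine similarity for comparing against paraphrase candidates; the slides do not fix either choice. The class `Word`, the function `combine`, the relation labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass, field
from math import sqrt

@dataclass
class Word:
    lexical: list                                       # lexical vector
    prefs: dict = field(default_factory=dict)           # expectations about dependents
    inverse_prefs: dict = field(default_factory=dict)   # expectations about governors

def multiply(u, v):
    return [a * b for a, b in zip(u, v)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def combine(verb, noun, relation):
    """Return (verb in context, noun in context) for one syntactic relation."""
    verb_in_context = multiply(verb.lexical, noun.inverse_prefs[relation])
    noun_in_context = multiply(noun.lexical, verb.prefs[relation])
    return verb_in_context, noun_in_context

# Invented vectors over the dimensions (pull, cart, sketch, pencil)
draw = Word([4.0, 3.0, 5.0, 4.0],
            prefs={"subj": [3.0, 3.0, 2.0, 1.0], "obj": [1.0, 2.0, 4.0, 3.0]})
horse = Word([3.0, 5.0, 1.0, 0.0],
             inverse_prefs={"subj": [5.0, 4.0, 1.0, 0.0], "obj": [1.0, 1.0, 4.0, 3.0]})
pull, sketch = [5.0, 4.0, 0.0, 0.0], [0.0, 0.0, 5.0, 4.0]

draw_with_subject, _ = combine(draw, horse, "subj")   # "a horse draws ..."
draw_with_object, _ = combine(draw, horse, "obj")     # "draw a horse"

print(cosine(draw_with_subject, pull) > cosine(draw_with_subject, sketch))
print(cosine(draw_with_object, sketch) > cosine(draw_with_object, pull))
```

Unlike the centroid, the result depends on the relation: with these toy numbers the verb vector moves towards "pull" when the noun is the subject and towards "sketch" when it is the object.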

Current state

• Evaluation: Better distinction between contextually appropriate and inappropriate paraphrases (WSD-style task)

• Further research questions
– Generalisation to longer phrases
  • More expressive model of expectations
– Modelling of phrases involving closed word classes
  • E.g. negation

Strategy 2

Corpus-based methods for enrichment of formal meaning representations

Formal models of meaning in context

• Lexicon entries cannot provide the full range of readings for words/phrases
– Readings often productively negotiated in text
– Type/sort conflict

• Examples:
– Metonymy/Metaphor
– Telic adjectives (“fast typist”)
– Coercion/Reinterpretation

Example: Coercion

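• Wegen einer 15-jährigen kam es zu einem Streit, in dessen Verlauf sie verletzt wurde. (“Because of a 15-year-old girl, a fight broke out, in the course of which she was injured.”)

• […] Sie hatte sich mit einem 21-jährigen unterhalten. (“[…] She had been talking to a 21-year-old man.”)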

• Red and blue expressions are coreferring, but red expression has wrong type (wegen takes <e,t>; expression is <e>).

• Here, context overtly provides missing event

• Often, this is not the case: Operator must be recovered from general knowledge
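The two cases on this slide (event found overtly in context versus operator recovered from general knowledge) can be sketched as a small resolution procedure. This is an illustrative toy, not the project's method: the classes `Expression` and `Event`, the referent ids, the English glosses, and the placeholder operator name `EVENT_INVOLVING` are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expression:
    text: str
    sem_type: str   # type label in the slide's notation: "<e>" or "<e,t>"
    referent: str   # hypothetical discourse-referent id

@dataclass(frozen=True)
class Event:
    text: str
    participants: frozenset   # referent ids taking part in the event

def resolve_argument(required_type, argument, context_events, default_operator):
    """Return a filler of the required type for a possibly mistyped argument."""
    if argument.sem_type == required_type:
        return argument.text                  # no conflict, nothing to repair
    for event in context_events:
        if argument.referent in event.participants:
            return event.text                 # context overtly provides the event
    # Otherwise fall back to a general operator applied to the argument
    return f"{default_operator}({argument.text})"

girl = Expression("a 15-year-old girl", "<e>", "x1")
talk = Event("she talked to a 21-year-old man", frozenset({"x1", "x2"}))

overt = resolve_argument("<e,t>", girl, [talk], "EVENT_INVOLVING")
fallback = resolve_argument("<e,t>", girl, [], "EVENT_INVOLVING")
print(overt)
print(fallback)
```

In the first call the context supplies the missing event; in the second the context is empty, so only the placeholder operator remains, which is the case where general knowledge would be needed.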

The role of corpus methods

• Acquisition of general reinterpretation operators from corpora

• Recovery/prediction of operators for instances with type/sort conflict
– Making implicit meaning explicit: can be seen as context-driven semantic specification

• Interest primarily empirical
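One very simple way to picture acquisition and prediction is a frequency baseline over annotated conflict cases. The sketch below is an assumption-laden illustration, not the proposed method: the triples (trigger word, noun, operator) are invented toy annotations, and the back-off from the (trigger, noun) pair to the trigger alone is one possible design among many.

```python
from collections import Counter, defaultdict

# Invented toy annotations: (trigger word, noun, annotated operator)
annotated = [
    ("begin", "book", "read"),
    ("begin", "book", "read"),
    ("begin", "book", "write"),
    ("begin", "sandwich", "eat"),
    ("enjoy", "book", "read"),
]

def train(examples):
    """Count operators per (trigger, noun) pair and per trigger alone."""
    by_pair = defaultdict(Counter)
    by_trigger = defaultdict(Counter)
    for trigger, noun, operator in examples:
        by_pair[(trigger, noun)][operator] += 1
        by_trigger[trigger][operator] += 1
    return by_pair, by_trigger

def predict(model, trigger, noun):
    """Most frequent operator for the pair, backing off to the trigger."""
    by_pair, by_trigger = model
    for counts in (by_pair.get((trigger, noun)), by_trigger.get(trigger)):
        if counts:
            return counts.most_common(1)[0][0]
    return None   # nothing observed for this trigger

model = train(annotated)
print(predict(model, "begin", "book"))    # seen pair
print(predict(model, "begin", "novel"))   # unseen noun, backs off to trigger
print(predict(model, "finish", "book"))   # unseen trigger
```

The back-off step is where richer information (semantic classes, roles, dependency relations) could plausibly enter, since an unseen noun could then be generalised to a class rather than dropped.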

Project Steps

• Creation of multilingual corpus of type/sort conflict cases with human annotations
– Informed by formal considerations

• Development of CL methods to predict operators for conflict resolution

• Ideally, task-based evaluation (to be determined)

• Consequences/insights for formal descriptions

Research Questions

• When can operators be found overtly in context; when must general operators be recovered?
– Influence of local discourse?

• CL methods for efficient and accurate prediction of operators
– What linguistic levels are helpful? Semantic classes, semantic roles, dependency relations, …?
– Focus on more than one language: Can bilingual processing help?

• What is the level of generality of acquired operators?
– What shape do people’s expectations have?
– Do people’s judgments of recovered operators agree?

• Can empirical results have an impact on formal descriptions?
– E.g. do sort and type conflicts behave differently or similarly?

• Relation to work on textual entailment?

Collaborations

• D1 (Representation of ambiguities)
– Formal descriptions as information source for corpus development
– Attempt to transfer empirical results back into theory

• B5 (Polysemy in a conceptual system)
– Ontological information as knowledge source for CL operator models
– Entailment as shared evaluation task

• Open for other ideas
