Developing a TT-MCTAG for German with an RCG-based Parser

Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier⋆, Johannes Dellert

University of Tübingen, Germany
⋆CNRS-LORIA, France

LREC 2008, 28.05.2008

Developing a TT-MCTAG for German 1


Aims and scope

Presentation of an implementation framework for a German TAG-based grammar

How to design and maintain a grammatical resource (i.e., a German TT-MCTAG)?

How to connect this with a (2-layered) lexical resource?

How to parse German using these resources?

Outline:

1 The formalism: TAG and TT-MCTAG

2 The implementation framework: XMG and TuLiPA

3 The grammar: GerTT


Tree-Adjoining Grammar - Basics

A Tree Adjoining Grammar (TAG) is a set of elementary trees:

a finite set of initial trees

a finite set of auxiliary trees

E.g.:

auxiliary tree: [VP [ADV easily] VP*]

initial tree: [VP NP↓ [VP [V repaired] NP↓]]

Combinatorial operations:

substitution: replacing a non-terminal leaf with an initial tree

adjunction: replacing an internal node with an auxiliary tree
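The two operations can be sketched in a few lines of Python. This is only an illustrative sketch, not the data structure of any existing parser: the `Node` class, the helper names and the use of 1-based Gorn addresses given as tuples are assumptions made for this example. Trees are modified in place, so an elementary tree can be used only once here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # substitution node, written NP↓ on the slide
    foot: bool = False   # foot node of an auxiliary tree, written VP*


def node_at(tree: Node, address: Tuple[int, ...]) -> Node:
    """Follow a Gorn address (1-based daughter indices) down from the root."""
    node = tree
    for i in address:
        node = node.children[i - 1]
    return node


def find_foot(tree: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if tree.foot:
        return tree
    for child in tree.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def substitute(tree: Node, address: Tuple[int, ...], initial: Node) -> None:
    """Replace the substitution leaf at `address` with the initial tree."""
    target = node_at(tree, address)
    assert target.subst and target.label == initial.label
    target.children, target.subst = initial.children, False


def adjoin(tree: Node, address: Tuple[int, ...], auxiliary: Node) -> None:
    """Splice the auxiliary tree in at `address`; the old subtree moves below the foot."""
    target = node_at(tree, address)
    foot = find_foot(auxiliary)
    assert foot is not None and target.label == auxiliary.label == foot.label
    foot.children, foot.foot = target.children, False
    target.children = auxiliary.children


def words(tree: Node) -> List[str]:
    """Read the leaves from left to right."""
    if not tree.children:
        return [tree.label]
    return [w for child in tree.children for w in words(child)]


# [VP NP↓ [VP [V repaired] NP↓]]
repaired = Node("VP", [Node("NP", subst=True),
                       Node("VP", [Node("V", [Node("repaired")]),
                                   Node("NP", subst=True)])])
peter = Node("NP", [Node("Peter")])
fridge = Node("NP", [Node("the fridge")])
# [VP [ADV easily] VP*]
easily = Node("VP", [Node("ADV", [Node("easily")]), Node("VP", foot=True)])

# Substitutions first: adjunction changes the addresses below the adjunction site.
substitute(repaired, (1,), peter)
substitute(repaired, (2, 2), fridge)
adjoin(repaired, (2,), easily)
sentence = " ".join(words(repaired))
```

With these steps, `sentence` holds the word string of the derived tree, with the adverb placed between the subject and the verb.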


Tree-Adjoining Grammar - Example

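Elementary trees:

[NP Peter]

[VP NP↓ [VP [V repaired] NP↓]]

[NP the fridge]

[VP [ADV easily] VP*]

Derived tree:

[VP [NP Peter] [VP [ADV easily] [VP [V repaired] [NP the fridge]]]]

Derivation tree (edges labelled with the address of the node where the operation applies):

repaired
  1: Peter
  2: easily
  22: the fridge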


Tree-Adjoining Grammar - Basics

TAGs are mildly context-sensitive:

1 Polynomial time parsing complexity

2 Generation of limited crossing dependencies

3 Constant growth property (semilinearity)

Large TAG grammars:

English and Korean (XTAG, UPenn)

French TAG (Benoît Crabbé's PhD thesis)

...


Why not TAG for German?

The order of complements (and adjuncts) of a verb is flexible.

(1) Peter liebt Susi.
1: Peter loves Susi
2: Susi loves Peter

(2) dass Peter heute den Kühlschrank repariert hat
dass den Kühlschrank heute Peter repariert hat
...
('that Peter has repaired the fridge today')

TAG is inappropriate for German, because it is:

not powerful enough for some constructions (i.e., coherent constructions)

not descriptively adequate (i.e., one elementary tree for each permutation)


TT-MCTAG: a TAG-extension for German

Multi-Component TAG (MCTAG) with shared-nodes locality

Elementary structures are tuples ⟨γ, {β1, ..., βn}⟩:

a lexicalized elementary tree γ (the head tree)

a tree set {β1, ..., βn} (the complement trees)

Meaning of tree tuples: During derivation, the β-trees have to attach to the γ-tree (via node sharing).

Node sharing: In the derivation tree,

1 a β-tree must either be the immediate daughter of its γ-tree,

2 or the β-tree must be connected to the daughter of the γ-tree via a chain of root adjunctions.

E.g.: ⟨ [VP [V repariert]], { [VP NPnom↓ VP*], [VP NPacc↓ VP*] } ⟩


TT-MCTAG example

(3) dass den Kühlschrank heute Peter repariert
("that Peter repairs the fridge today")

Elementary structures:

[VP [ADV heute] VP*]

⟨ [VP [V repariert]], { [VP NPnom↓ VP*], [VP NPacc↓ VP*] } ⟩

[NP Peter]

[NP den K.]

Derivation tree (edges labelled with node addresses; 0 is the root):

repariert
  0: NPnom
    1: Peter
    0: heute
      0: NPacc
        1: den Kühlschrank
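The node-sharing condition can be checked mechanically on a derivation tree. The following is a minimal sketch, assuming the derivation tree of example (3) is stored as a map from each node to its mother and attachment address, with "0" standing for the root address as in the tree above. The function name and the encoding are hypothetical, and the check only looks at addresses (it does not verify that the intermediate trees are auxiliary trees).

```python
from typing import Dict, Tuple

ROOT = "0"  # root address, following the notation of the derivation tree above

# Derivation tree of example (3): daughter -> (mother, address in the mother).
derivation: Dict[str, Tuple[str, str]] = {
    "NPnom": ("repariert", "0"),
    "Peter": ("NPnom", "1"),
    "heute": ("NPnom", "0"),
    "NPacc": ("heute", "0"),
    "den Kühlschrank": ("NPacc", "1"),
}


def is_node_sharing(beta: str, gamma: str,
                    derivation: Dict[str, Tuple[str, str]]) -> bool:
    """True if beta is a daughter of gamma, or is linked to a daughter of
    gamma by a chain of root adjunctions."""
    node = beta
    while node in derivation:
        mother, address = derivation[node]
        if mother == gamma:
            return True   # reached a daughter of the head tree
        if address != ROOT:
            return False  # a non-root attachment breaks the chain
        node = mother
    return False          # reached the top without meeting gamma


nom_ok = is_node_sharing("NPnom", "repariert", derivation)
acc_ok = is_node_sharing("NPacc", "repariert", derivation)
```

Here the NPnom tree is an immediate daughter of the head tree, while the NPacc tree reaches it through the root adjunctions at heute and NPnom, so both complement trees satisfy the condition.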


The implementation framework:

metagrammar → XMG compiler → parser (TuLiPA) → parsing results

(the lexicon and the input sentence are further inputs to the parser)

XMG: eXtensible MetaGrammar (Duchier et al., 2004)

TuLiPA: Tübingen Linguistic Parsing Architecture (Parmentier et al., 2008)


eXtensible MetaGrammar (XMG)

(Duchier et al., 2004)

XMG lets one construct a grammar semi-automatically by describing tree fragments and their combination. The output structures are unlexicalized trees (tree schemata).

Essential for: consistency, design and maintenance efforts

Components:

1 a description language

2 a compiler

3 a viewer

4 output format: XML

⇒ XMG has been extended to describe tree sets.


XMG: An example

substitution node NP↓ + VP-projection [VP VP*] ⇒ complement tree [VP NP↓ VP*]

adverbial anchor AP⋄ + VP-projection [VP VP*] ⇒ adverbial tree [VP AP⋄ VP*]
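The point of the example is that one fragment (the VP-projection) is written once and reused in several trees. The toy Python sketch below mimics only this reuse. It is not XMG's description language, which works with tree descriptions and class combination; the `Tree` type and the `combine` function are assumptions made purely for illustration.

```python
from typing import NamedTuple, Tuple


class Tree(NamedTuple):
    root: str
    daughters: Tuple[str, ...]


# Shared fragment: a VP root over a VP* foot node.
VP_PROJECTION = Tree("VP", ("VP*",))


def combine(left_daughter: str, projection: Tree) -> Tree:
    """Add a node as the leftmost daughter of the projection fragment."""
    return Tree(projection.root, (left_daughter,) + projection.daughters)


def show(tree: Tree) -> str:
    """Render a tree in bracket notation."""
    return "[" + " ".join((tree.root,) + tree.daughters) + "]"


complement_tree = combine("NP↓", VP_PROJECTION)  # substitution node + projection
adverbial_tree = combine("AP⋄", VP_PROJECTION)   # adverbial anchor + projection
```

A change to `VP_PROJECTION` would propagate to both trees, which is the kind of systematic maintenance a metagrammar is meant to support.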


A 2-layered lexicon

Morphological lexicon

maps an (inflected) token to some lemma form, while preserving morphological information in a feature structure.

vergisst vergessen [pos=v; num=sg; per=3;]

Lemma lexicon

maps a lemma onto tree tuple families, while also containing selectional restrictions (e.g., case assignment).

*ENTRY: vergessen
*CAT: v
*SEM: BinaryRel[pred=vergessen]
*ACC: 1
*FAM: Vnp2
*FILTERS: []
*EX:
*EQUATIONS:
NParg1 → cas = nom
NParg2 → cas = acc
*COANCHORS:
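The two layers can be pictured as two chained lookups: token to lemma, then lemma to tree tuple family. The sketch below is a simplified illustration with the two entries above hard-coded as Python dictionaries; the dictionary layout and the `lookup` function are assumptions for this example and do not reflect the actual lexicon file format or the parser's internal API.

```python
from typing import Dict, List, Tuple

# Layer 1: inflected token -> (lemma, morphological features)
MORPH_LEXICON: Dict[str, List[Tuple[str, dict]]] = {
    "vergisst": [("vergessen", {"pos": "v", "num": "sg", "per": 3})],
}

# Layer 2: lemma -> tree tuple family plus selectional restrictions
LEMMA_LEXICON: Dict[str, List[dict]] = {
    "vergessen": [{
        "cat": "v",
        "family": "Vnp2",
        "equations": {"NParg1": {"cas": "nom"}, "NParg2": {"cas": "acc"}},
    }],
}


def lookup(token: str) -> List[dict]:
    """Chain both layers: token -> lemma -> family, keeping the morphology."""
    results = []
    for lemma, morph in MORPH_LEXICON.get(token, []):
        for entry in LEMMA_LEXICON.get(lemma, []):
            results.append({
                "token": token,
                "lemma": lemma,
                "morph": morph,
                "family": entry["family"],
                "equations": entry["equations"],
            })
    return results


hits = lookup("vergisst")
```

Keeping the layers apart means that a new inflected form only needs a morphological entry, while the syntactic information stays with the lemma.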


Tübingen Linguistic Parsing Architecture (TuLiPA)

(Parmentier et al., 2008)

Components:

1 TT-MCTAG-to-RCG converter (on-line)

2 RCG parser → RCG derivation forest → TT-MCTAG derivation forest

3 Parse viewer (derived tree, derivation tree, dependency view, semantic representation)

Availability of TuLiPA: written in Java and released under the GNU GPL (http://sourcesup.cru.fr/tulipa/)


TuLiPA: Why RCG?

RCG is useful, because:

it has attractive formal properties (polynomially parsable, full expressive power of MCS languages);

there exist parsing algorithms.

⇒ Parser can be reused for other mildly context-sensitive formalisms!

NB: RCG properly includes MCS. We use a restricted RCG, called simple RCG, that is included in MCS.
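To give an idea of what a simple RCG looks like, here is a standard textbook grammar for the non-context-free language {aⁿbⁿcⁿ}, with its three clauses hard-coded as a naive Python recognizer. This is an illustrative sketch only: it tries all ways of splitting the input into three ranges and is not the parsing algorithm used in TuLiPA.

```python
# Clauses of the simple RCG (X, Y, Z are range variables):
#   S(X Y Z)        -> A(X, Y, Z)
#   A(aX, bY, cZ)   -> A(X, Y, Z)
#   A(eps, eps, eps) -> eps


def derives_A(x: str, y: str, z: str) -> bool:
    """A(x, y, z) holds if x = a^n, y = b^n and z = c^n for the same n."""
    if x == "" and y == "" and z == "":
        return True  # clause A(eps, eps, eps) -> eps
    # clause A(aX, bY, cZ) -> A(X, Y, Z): strip one terminal from each argument
    return (x.startswith("a") and y.startswith("b") and z.startswith("c")
            and derives_A(x[1:], y[1:], z[1:]))


def derives_S(w: str) -> bool:
    """S(w) holds if w can be split into three adjacent ranges X Y Z with A(X, Y, Z)."""
    n = len(w)
    return any(derives_A(w[:i], w[i:j], w[j:])
               for i in range(n + 1)
               for j in range(i, n + 1))


accepted = derives_S("aabbcc")
rejected = derives_S("aabbc")
```

Each predicate covers several ranges of the input at once (here three), which is what lets RCG clauses describe discontinuous material.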


TuLiPA: The graphical frontend


Ongoing grammar development

GerTT (German TT-MCTAG)

Large-coverage TT-MCTAG for German, including semantics.

Linguistic principles:

no empty elements such as traces and PRO

no control and raising in the syntax

State of implementation:

free word order phenomena: scrambling, coherent constructions, verbal clustering

extraction phenomena: relative clauses, wh-questions, bridging constructions

ca. 70 XMG classes

Coverage testing based on the TSNLP test suite is currently being prepared.


Summary

TT-MCTAG:

More natural support of flexible word order languages, but still mildly context-sensitive (in fact only k-TT-MCTAG).

The implementation framework:

XMG + TuLiPA: Immediate control over implementational (consistency) and linguistic (coverage) aspects of the grammar.

XMG: Effortless means for making systematic changes in the grammar.

TuLiPA: Easily adaptable to other MCS formalisms (given an RCG conversion algorithm).

And GerTT is on its way ...


References

Denys Duchier, Joseph Le Roux, Yannick Parmentier (2004): The Metagrammar Compiler: An NLP Application with a Multi-paradigm Architecture. Second International Mozart/Oz Conference (MOZ'2004).

Yannick Parmentier, Laura Kallmeyer, Wolfgang Maier, Timm Lichte, Johannes Dellert (2008): TuLiPA: A syntax-semantics parsing environment for mildly context-sensitive formalisms. Proceedings of the Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+9).
