To cite this version:

Laurence Danlos, Aleksandre Maskharashvili, Sylvain Pogodalla. Interfacing Sentential and Discourse TAG-based Grammars. The 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12), Jun 2016, Düsseldorf, Germany. Proceedings of the 12th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+12). <http://www.aclweb.org/anthology/W16-3303>. <hal-01328697v3>

HAL Id: hal-01328697
https://hal.inria.fr/hal-01328697v3
Submitted on 10 Jun 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Interfacing Sentential and Discourse TAG-based Grammars

Laurence Danlos
Université Paris Diderot, ALPAGE, INRIA Paris–Rocquencourt, Institut Universitaire de France, Paris, F-75005, France

Aleksandre Maskharashvili, Sylvain Pogodalla
INRIA, Villers-lès-Nancy, F-54600, France
Université de Lorraine, CNRS, LORIA, UMR 7503, Vandœuvre-lès-Nancy, F-54500, France

Abstract

Tree-Adjoining Grammars (TAG) have been used both for syntactic parsing, with sentential grammars, and for discourse parsing, with discourse grammars. But the modeling of discourse connectives (coordinate conjunctions, subordinate conjunctions, adverbs, etc.) in TAG-based formalisms for discourse differs from their modeling in sentential grammars. Because of this mismatch, an intermediate, not TAG-related, processing step is required between the sentential and the discourse processes, both in parsing and in generation. We present a method to smoothly interface sentential and discourse TAG grammars, without using such an intermediate processing step. This method is based on Abstract Categorial Grammars (ACG) and relies on the modularity of the latter. It also provides the possibility, as in D-STAG, to build discourse structures that are directed acyclic graphs (DAG) and not only trees. All the examples may be run and tested with the appropriate software.

1 Introduction

It is usually assumed that the internal structure of a text, typically characterized by discourse or rhetorical relations, plays an important role in its overall interpretation. Building this structure may resort to different techniques such as segmenting the discourse into elementary discourse units and then relating them with appropriate relations (Marcu, 2000; Soricut and Marcu, 2003). Other techniques use discourse grammars, and a particular trend relies on tree grammars (Polanyi and van den Berg, 1996; Gardent, 1997; Schilder, 1997). This trend has been further developed by integrating the modeling of both clausal syntax and semantics, and discourse syntax and semantics within the framework of Tree-Adjoining Grammar (TAG, Joshi et al. (1975); Joshi and Schabes (1997)). This gave rise to the TAG for Discourse (D-LTAG) formalism (Webber and Joshi, 1998; Forbes et al., 2003; Webber, 2004; Forbes-Riley et al., 2006), and to the Discourse Synchronous TAG (D-STAG) formalism (Danlos, 2009; Danlos, 2011). The latter derives semantic interpretation using Synchronous Tree-Adjoining Grammars (STAG, Shieber and Schabes (1990); Nesson and Shieber (2006); Shieber (2006)).

While one may think that using similar frameworks for both levels should help to interface them, the interface is not as smooth as one might expect. Indeed, a shared feature of D-LTAG and D-STAG is that grammatical parsing and discourse parsing are performed at two different stages. Moreover, the result of the first stage requires additional, not TAG-related, processing before being able to enter the second stage. This intermediary step consists in discourse relation extraction in D-LTAG and in discourse normalization in D-STAG.

The reason for this intermediary step relates to the mismatch between the syntactic properties and the discourse properties of discourse markers. For instance, at the syntactic level, sentences as in (1) are well-formed.

(1) a. Then, John went to Paris.
    b. John then went to Paris.

The discourse marker then, an adverb, is considered as a modifier, either of the whole clause in (1a) or of the verb phrase in (1b). In TAG, such adverbs are represented as auxiliary trees with S or VP root nodes (XTAG Research Group, 2001; Abeillé, 2002). Using the elementary trees of Figure 1, Figure 2 (resp. Figure 4) shows the TAG analysis of (1a) (resp. of (1b)).

[Figure 1: Elementary trees of a toy TAG grammar. Initial trees: (NP Fred); (S NP↓ (VP (V went) (PP (Prep to) NP↓))); (NP Paris). Auxiliary trees: (S (Adv then) S∗); (VP (Adv then) VP∗).]

[Figure 2: TAG analysis of (1a): the trees for Fred and Paris are substituted into the tree anchored by went to, and the S-rooted auxiliary tree for then is adjoined at its S node.]

[Figure 3: Derived tree for (1a): (S (Adv then) (S (NP Fred) (VP (V went) (PP (Prep to) (NP Paris)))))]

[Figure 4: TAG analysis of (1b): the trees for Fred and Paris are substituted into the tree anchored by went to, and the VP-rooted auxiliary tree for then is adjoined at its VP node.]

[Figure 5: Derived tree for (1b): (S (NP Fred) (VP (Adv then) (VP (V went) (PP (Prep to) (NP Paris)))))]
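The two analyses can be replayed with a minimal Python sketch of substitution and adjunction over the toy grammar of Figure 1. The `Node` class and the helper names (`substitute`, `adjoin`, `words`, `lex`) are assumptions made for this illustration only, not part of any TAG toolkit, and the sketch follows the figures in using Fred as the subject.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # substitution site, written X↓ in the figures
    foot: bool = False   # foot node of an auxiliary tree, written X*


def copy(n: Node) -> Node:
    return Node(n.label, [copy(c) for c in n.children], n.subst, n.foot)


def substitute(tree: Node, initial: Node) -> Node:
    """Fill the leftmost substitution site labelled like the root of `initial`."""
    done = False

    def go(n: Node) -> Node:
        nonlocal done
        if not done and n.subst and n.label == initial.label:
            done = True
            return copy(initial)
        return Node(n.label, [go(c) for c in n.children], n.subst, n.foot)

    return go(tree)


def adjoin(tree: Node, aux: Node) -> Node:
    """Adjoin `aux` at the first (pre-order) internal node labelled like its root."""
    done = False

    def plug(a: Node, subtree: Node) -> Node:
        # Rebuild the auxiliary tree, hanging `subtree` at its foot node.
        if a.foot:
            return subtree
        return Node(a.label, [plug(c, subtree) for c in a.children], a.subst)

    def go(n: Node) -> Node:
        nonlocal done
        if not done and n.children and n.label == aux.label:
            done = True
            return plug(aux, copy(n))
        return Node(n.label, [go(c) for c in n.children], n.subst, n.foot)

    return go(tree)


def words(n: Node) -> List[str]:
    """Yield of a tree: its leaves, read left to right."""
    if not n.children:
        return [n.label]
    return [w for c in n.children for w in words(c)]


def lex(cat: str, word: str) -> Node:
    return Node(cat, [Node(word)])


# Elementary trees of Figure 1.
fred, paris = lex("NP", "Fred"), lex("NP", "Paris")
went_to = Node("S", [Node("NP", subst=True),
                     Node("VP", [lex("V", "went"),
                                 Node("PP", [lex("Prep", "to"),
                                             Node("NP", subst=True)])])])
then_s = Node("S", [lex("Adv", "then"), Node("S", foot=True)])
then_vp = Node("VP", [lex("Adv", "then"), Node("VP", foot=True)])

clause = substitute(substitute(went_to, fred), paris)
fronted = " ".join(words(adjoin(clause, then_s)))   # adjunction at S, as in (1a)
medial = " ".join(words(adjoin(clause, then_vp)))   # adjunction at VP, as in (1b)
print(fronted)
print(medial)
```

The only difference between the two derivations is the auxiliary tree that is adjoined, which is the property the discourse grammars discussed next cannot express directly.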

At the discourse level, it is difficult to interpret these sentences without referring to preceding sentences. The discourse relation (e.g., Narration) has two arguments: the discourse unit consisting of the clause in which the discourse cue appears (the host clause), and some other discourse unit (it can be a complex one). D-LTAG and D-STAG propose different models of such adverbials, in particular in the way the first argument is provided. But in both accounts, adverbials are fronted (see Figure 6(c) and Figure 9(a)). Hence sentences with medial adverbials such as (1b) are excluded without the intermediary step of discourse relation extraction.

A similar mismatch occurs with subordinate conjunctions. In a typical TAG analysis, they are modeled with auxiliary trees because they modify the matrix clause and are not part of its predicate-argument structure.¹ In D-LTAG, however, they are modeled with initial trees with two substitution sites (see Figure 6(a)) for the two discourse units they are predicating over.

So the question of relating the syntactic modeling and the discourse modeling arises. In particular, we do not want this relation to rely on an intermediary step, which has several drawbacks. First, it complicates the modeling of connectives that are ambiguous in their syntactic and discourse use, and prevents us from using standard grammar inference and disambiguation techniques. Second, while most of the syntax-semantics interfaces, in particular in TAG, aim at satisfying a compositional assumption (Gardent and Kallmeyer, 2003; Pogodalla, 2004; Kallmeyer and Romero, 2008; Nesson and Shieber, 2006), the syntax-discourse interface seems to escape it. Third, a better integration of the sentential and of the discourse components also seems an interesting feature if we want to better describe the interaction between discourse connectives and propositional attitude predicates (Danlos, 2013; Bernard and Danlos, 2016).

¹ It is not always the case, though. Bernard and Danlos (2016) propose different elementary trees, depending on the syntactic, semantic, and discourse properties of the conjunction.

Finally, when generation instead of parsing is at stake, this architecture also prevents the reversibility of the grammars and requires ad hoc post-processing. G-TAG, a TAG-based formalism dedicated to generation that includes elements of a discourse grammar, had this requirement (Danlos, 1998; Meunier, 1997; Danlos, 2000).

In this article, we describe how to interface a sentential and a discourse TAG-based grammar. We show how to link two such grammars and their proposed modelings of discourse connectives, overcoming the above-mentioned issue. We use an encoding of TAG into Abstract Categorial Grammar (ACG, de Groote (2001)), a grammatical framework based on the simply typed λ-calculus. As we aim at reusing previous work such as existing TAG sentential grammars as well as discourse analysis, our approach relies on two key features of ACG: the ACG account of the TAG operations and the ACG-based syntax-semantics interface for TAG (Pogodalla, 2004; Pogodalla, 2009) on the one hand; and the modular ACG composition, in order to smoothly integrate the syntactic and discourse behavior of adverbial connectives without using a two-step analysis, on the other hand. Note, however, that the operations we use in the ACG composition are not available as TAG operations. While the encoding of TAG into ACG is standard (de Groote, 2002; Pogodalla, 2009), our contribution is to use the interpreting device of ACG to relate (the ACG encoding of) a TAG sentential grammar and (the ACG encoding of) a TAG discourse grammar. The example grammars we use may be run and tested² on the ACG development software.³

2 TAG Based Discourse Grammars

As TAG grammars, D-LTAG and D-STAG do not differ from any other TAG grammar: they define elementary trees that can be combined using the operations of substitution and adjunction. However, while some elementary trees are anchored by lexical items (the discourse markers) as in sentential grammars, the others are anchored by clauses resulting from the syntactic analysis. Contrary to sentential grammars that contain a lot of different elementary tree families, discourse grammars have a small set of such families. In this section, we focus on these elementary trees, anchored by discourse markers. We show how the structure of these trees influences the interaction between the sentential and the discourse grammars, and why this interaction calls for an intermediary processing step. For an in-depth presentation of these formalisms, we refer the reader to (Webber and Joshi, 1998; Forbes et al., 2003; Webber, 2004; Forbes-Riley et al., 2006) for D-LTAG and to (Danlos, 2009; Danlos, 2011) for D-STAG.

² The ACG example files can be downloaded from http://hal.inria.fr/hal-01328697v3/file/acg-examples.zip. They also include the semantic interpretation that generates the expected DAG discourse structures. But because of lack of space, we cannot present here the semantic part that builds on the one proposed for D-STAG (Danlos, 2009; Danlos, 2011) and extends it.

³ http://www.loria.fr/equipes/calligramme/acg/#Software.

[Figure 6: D-LTAG elementary tree schemes. (a) D-LTAG initial trees for subordinate conjunctions: (Du Du↓ although Du↓). (b) D-LTAG auxiliary trees for coordinating conjunctions (and and ε): (Du Du∗ and Du↓). (c) D-LTAG auxiliary trees for connective adverbials: (Du then Du∗).]

D-LTAG. D-LTAG proposes three main families of elementary trees that capture different insights on discourse structures. Trees for subordinate conjunctions are modeled using initial trees with two substitution nodes, one for each of the arguments, as Figure 6(a) shows. This reflects the predicate-argument structure of these connectives at the discourse level. But this contrasts with the syntactic account of these connectives: because they are outside the domain of locality of the verbs to which they can adjoin (at S or VP nodes), they typically are modeled using auxiliary trees (see Figure 7).

[Figure 7: Syntactic modeling of subordinate conjunctions: (S S∗ Punct↓ (PP (P although) S↓))]

The second family of connectives is used to extend or to elaborate on clauses with auxiliary trees anchored by coordinate conjunctions (or by the empty connective). The first argument of the connective corresponds to the discourse unit the tree is adjoined to, and the second, the extending clause, corresponds to the clause that is substituted at the substitution node, as Figure 6(b) shows.

The third family also consists of auxiliary trees. But the latter are associated with a single clause as Figure 6(c) shows. The second argument comes from the anaphoric interpretation of the connectives anchoring such trees.

The two-stage process for parsing discourse proceeds as follows: first, each sentence gets a TAG analysis (derived and derivation trees) by a standard TAG. Then, each derivation tree is processed in order to identify the possible discourse connectives and their arguments from a syntactic point of view. The latter (one or two, depending on the connective) are added as initial trees with root Du to the discourse grammar, as well as the (discourse) elementary tree anchored by the connective. For instance, from the clausal derivation tree of Figure 8, the two arguments $\alpha^{d}_{s_1}$ and $\alpha^{d}_{s_2}$, and the connective $\beta^{d}_{s_3}$ are extracted. A similar extraction step takes care of the extraction of clause-medial adverbial connectives.

D-STAG. Contrary to D-LTAG, D-STAG models all discourse connectives with auxiliary trees that are adjoined to the discourse unit they extend. The clause content that serves as second argument of the connective is substituted within this tree. Figure 9 shows some of the schemes for the elementary (auxiliary) trees of a D-STAG. The three internal Du nodes are available for adjunctions, achieving different effects on the semantic trees (following the principles of synchronous TAG, each discourse elementary tree is paired with a semantic tree). Together with a higher-order type for the semantic trees, this allows D-STAG to structurally generate DAG discourse structures.⁴ But as the focus of this article is on the articulation between the sentential and the discourse grammar, we do not enter the details of the discourse-semantics interface here (see Danlos et al. (2015)).

⁴ Such structures are not easily available with D-LTAG, and this was a motivation to introduce D-STAG.

[Figure 8: Discourse elementary tree extraction. A clausal derivation tree rooted in $\alpha_{\text{saw}}$, which contains $\beta_{\text{while}}$ and $\alpha_{\text{eating}}$, is split into the initial trees $\alpha^{d}_{s_1}$ and $\alpha^{d}_{s_2}$ (both with root Du) and the connective tree $\beta^{d}_{s_3}$, anchored by while (see Figure 6(a)).]

D-STAG shares with D-LTAG the requirement of a transformation from the sequence of sentence analyses produced by a sentential grammar into a sequence of clauses and discourse connectives in a “discourse normal form”. The reasons are basically the same as for D-LTAG: discourse connectives need to be identified in order to anchor their associated discourse elementary trees, as they differ from their syntactic elementary trees, and clause-medial extractions need to be managed at this level as well.

Medial Adverbial Extraction. Looking at the elementary trees of Figure 9(a) (the problem is similar for D-LTAG elementary trees), we observe that the host clause of the adverbial is substituted into the elementary tree, at the Du↓ node. But at the sentential level, it is the auxiliary trees anchored by the adverbials that adjoin into the host clause. When the adjunction occurs at the top S node, we get the same surface form in both cases. However, whenever the adjunction occurs at the VP node in the sentential grammar, this is no longer the case: the adverbial is not fronted, and the discourse grammar cannot account for this position. An intermediary form, such as the discourse normal form in D-STAG, or the tree extraction in D-LTAG, is then required. In order to get rid of this intermediary step, we should be able to describe an operation that simultaneously substitutes a clause within the elementary tree of the discourse connective, and adjoins the auxiliary tree on the VP node. Figure 10(a) describes such an operation. The dotted lines would represent a dominance constraint that the tree to be substituted at Du↓ should satisfy. It is also natural then to use the same approach for fronted discourse adverbials, as in Figure 10(b).

[Figure 9: D-STAG elementary trees. (a) Adverbial connectives. (b) Postposed conjunctions. (c) Preposed conjunction. Each auxiliary tree has a Du root, a Du∗ foot node, a DC node dominating the connective (adv or conj), a punctuation node, and a Du↓ substitution node.]

Because the adverb has to adjoin within the tree that is being substituted, describing such an operation does not seem possible in TAG or in multicomponent TAG (at least in a single step). It would be possible with D-Tree Substitution Grammars (Rambow et al., 2001), but then the derivation trees would be different, the synchronous syntax-semantics interface would have to be redefined, and the reversibility properties (for generation) would have to be stated. We instead use an encoding with ACG, where these properties naturally follow from the standard encoding of TAG into ACG.

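[Figure 10: Auxiliary trees for discourse connectives. (a) Variant of the tree of Figure 9(a) with an empty DC node (ε), in which the adverb attaches at a VP node inside the clause substituted at Du↓. (b) The same tree with the adverb attached at an S node.]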

3 Abstract Categorial Grammars

ACG derives from type-theoretic grammars. Rather than a grammatical formalism on its own, it provides a framework in which several grammatical formalisms may be encoded (de Groote and Pogodalla, 2004), in particular TAG (de Groote, 2002). The definition of an ACG is based on type theory, λ-calculus, and linear logic. In particular, ACG generates languages of linear λ-terms, which generalize both string and tree languages.

As a key feature, ACG provides the user with direct control over the parse structures of the grammar, the abstract language. Such structures are later on interpreted by a morphism, the lexicon, to get the concrete object language. We use the standard notations of the typed λ-calculus.

Definition (Types). Let $A$ be a set of atomic types. The set $T(A)$ of implicative types built upon $A$ is defined with the following grammar:⁵

$$T(A) ::= A \mid T(A) \multimap T(A)$$

Definition (Higher-Order Signatures). A higher-order signature $\Sigma$ is a triple $\Sigma = \langle A, C, \tau \rangle$ where:

• $A$ is a finite set of atomic types;

• $C$ is a finite set of constants;

• $\tau : C \to T(A)$ is a function assigning types to constants.

We write $\Lambda(\Sigma)$ for the set of typed terms built on $\Sigma$. For $t \in \Lambda(\Sigma)$ and $\alpha \in T(A)$, we denote that $t$ has type $\alpha$ by $t :_{\Sigma} \alpha$ (possibly omitting the subscript).

⁵ We use the linear arrow $\multimap$ of linear logic (Girard, 1987) for the implication.

[Figure 11: ACG architecture for a discourse and clause grammar interface. Signatures: $\Lambda(\Sigma_{\text{D-STAG}})$, $\Lambda(\Sigma_{\text{TAG}})$, $\Lambda(\Sigma_{\text{trees}})$, $\Lambda(\Sigma_{\text{string}})$, $\Lambda(\Sigma_{\text{logic}})$. Lexicons: $\mathcal{G}_{\text{disc.}}$, $\mathcal{G}_{\text{derived}}$, $\mathcal{G}_{\text{yield}}$, $\mathcal{G}_{\text{TAG sem.}}$, $\mathcal{G}_{\text{D-STAG sem.}}$.]

Definition (Lexicon). Let $\Sigma_1 = \langle A_1, C_1, \tau_1 \rangle$ and $\Sigma_2 = \langle A_2, C_2, \tau_2 \rangle$ be two higher-order signatures. A lexicon $\mathcal{L} = \langle F, G \rangle$ from $\Sigma_1$ to $\Sigma_2$ is such that:

• $F : A_1 \to T(A_2)$. We also write $F : T(A_1) \to T(A_2)$ for its homomorphic extension;⁶

• $G : C_1 \to \Lambda(\Sigma_2)$. We also write $G : \Lambda(\Sigma_1) \to \Lambda(\Sigma_2)$ for its homomorphic extension;⁷

• $F$ and $G$ are such that for all $c \in C_1$, $G(c)$ is of type $F(\tau_1(c))$ (i.e., $G(c) :_{\Sigma_2} F(\tau_1(c))$).

We also use $\mathcal{L}$ instead of $F$ or $G$.
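The type part of a lexicon can be sketched in a few lines of Python. The classes `Atom` and `Arrow` and the helpers `arrow`, `extend`, and `show` are names assumed for this illustration; the example mapping sends each substitution type to the tree type τ and each adjunction type to τ ⊸ τ, as the tree interpretation of Section 4.1 does. The sketch only computes $F(\tau_1(c))$, the type that the third condition requires of $G(c)$; it does not type-check terms.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Arrow:
    left: "Type"
    right: "Type"


Type = Union[Atom, Arrow]


def arrow(*types: Type) -> Type:
    """Right-nested linear arrow: arrow(a, b, c) stands for a -o (b -o c)."""
    result = types[-1]
    for t in reversed(types[:-1]):
        result = Arrow(t, result)
    return result


def extend(F: Dict[str, Type]):
    """Homomorphic extension of F: F(a -o b) = F(a) -o F(b)."""
    def F_hat(t: Type) -> Type:
        if isinstance(t, Atom):
            return F[t.name]
        return Arrow(F_hat(t.left), F_hat(t.right))
    return F_hat


def show(t: Type) -> str:
    if isinstance(t, Atom):
        return t.name
    left = show(t.left)
    if isinstance(t.left, Arrow):
        left = f"({left})"
    return f"{left} -o {show(t.right)}"


tau = Atom("tau")
# Substitution types map to trees, adjunction types to tree-to-tree functions.
F = {"S": tau, "NP": tau, "SA": Arrow(tau, tau), "VPA": Arrow(tau, tau)}

went_to_type = arrow(Atom("SA"), Atom("VPA"), Atom("NP"), Atom("NP"), Atom("S"))
image = show(extend(F)(went_to_type))
print(image)
```

The printed image is the type that the interpretation of the verb's constant must have for the lexicon to be well formed.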

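Definition (Abstract Categorial Grammar and vocabulary). An abstract categorial grammar is a quadruple $\mathcal{G} = \langle \Sigma_1, \Sigma_2, \mathcal{L}, S \rangle$ where:

• $\Sigma_1 = \langle A_1, C_1, \tau_1 \rangle$ and $\Sigma_2 = \langle A_2, C_2, \tau_2 \rangle$ are two higher-order signatures. $\Sigma_1$ (resp. $\Sigma_2$) is called the abstract vocabulary (resp. the object vocabulary) and $\Lambda(\Sigma_1)$ (resp. $\Lambda(\Sigma_2)$) is the set of abstract terms (resp. the set of object terms).

• $\mathcal{L} : \Sigma_1 \to \Sigma_2$ is a lexicon.

• $S \in T(A_1)$ is the distinguished type of the grammar.

Given an ACG $\mathcal{G}_{\text{name}} = \langle \Sigma_1, \Sigma_2, \mathcal{L}_{\text{name}}, S \rangle$, we use the following notational variants for the interpretation $\beta$ (resp. $u$) of the type $\alpha$ (resp. of the term $t$): $\mathcal{G}_{\text{name}}(\alpha) = \beta$ and $\alpha :=_{\text{name}} \beta$ (resp. $\mathcal{G}_{\text{name}}(t) = u$ and $t :=_{\text{name}} u$). The subscript may be omitted if clear from the context.

⁶ Such that $F(\alpha \multimap \beta) = F(\alpha) \multimap F(\beta)$.

⁷ Such that $G(\lambda x.t) = \lambda x.G(t)$ and $G(t\,u) = G(t)\,G(u)$.

Definition (Abstract and Object Languages). Given an ACG $\mathcal{G}$, the abstract language is defined by

$$\mathcal{A}(\mathcal{G}) = \{ t \in \Lambda(\Sigma_1) \mid t :_{\Sigma_1} S \}$$

The object language is defined by

$$\mathcal{O}(\mathcal{G}) = \{ u \in \Lambda(\Sigma_2) \mid \exists t \in \mathcal{A}(\mathcal{G}) \text{ s.t. } u = \mathcal{G}(t) \}$$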

The process of recovering an abstract structure from an object term $o$ is called ACG parsing and consists in finding the inverse image of $\{o\}$ under the lexicon (lexicon inversion). In this perspective, derivation trees of TAG are represented as terms of an abstract language, while derived trees (and yields) are represented by terms of some object languages. It is an object language of trees in the derived tree case and an object language of strings in the yield case. The class of second-order ACG is polynomially parsable with the usual complexity bounds ($O(n^3)$ for ACG encoding CFG, $O(n^6)$ for ACG encoding TAG, Kanazawa (2008)).
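Lexicon inversion can be illustrated on the toy grammar of Figure 1 with a brute-force Python sketch: enumerate a finite set of abstract terms, interpret each one, and keep those whose yield is the input string. This is only meant to make the notion of inverse image concrete; it is not the polynomial algorithm referred to above, and the constant names, the tuple encoding of trees, and the `parse` function are assumptions of this sketch.

```python
from itertools import product


def node(label):
    """Tree constructor: builds the tuple (label, child_1, ..., child_n)."""
    return lambda *children: (label, *children)


S2, VP2, PP2 = node("S"), node("VP"), node("PP")
NP1, V1, Adv1, Prep1 = node("NP"), node("V"), node("Adv"), node("Prep")
identity = lambda x: x

# Candidate fillers for each argument of the verb, with their tree images.
S_ADJ = {"I_S": identity, "C_then^S": lambda x: S2(Adv1("then"), x)}
VP_ADJ = {"I_VP": identity, "C_then^VP I_VP": lambda x: VP2(Adv1("then"), x)}
NPS = {"C_Fred": NP1("Fred"), "C_Paris": NP1("Paris")}


def went_to(S, A, s, c):
    """Tree image of the verb constant applied to its four arguments."""
    return S(S2(s, A(VP2(V1("went"), PP2(Prep1("to"), c)))))


def tree_yield(tree):
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in tree_yield(child)]


def parse(sentence):
    """Names of the arguments of every abstract term whose yield is `sentence`."""
    target = sentence.split()
    return [
        (s_adj, vp_adj, subj, obj)
        for (s_adj, S), (vp_adj, A), (subj, s), (obj, c)
        in product(S_ADJ.items(), VP_ADJ.items(), NPS.items(), NPS.items())
        if tree_yield(went_to(S, A, s, c)) == target
    ]


print(parse("Fred then went to Paris"))
print(parse("then Fred went to Paris"))
```

Each result names the arguments of one abstract term, that is, one derivation tree of the toy grammar.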

The lexicon, i.e., the way structures are interpreted, plays a crucial role in our proposal in two different ways. First, two interpretations may share the same abstract vocabulary, hence mapping a single structure into two different ones, typically a surface form and a semantic form. This composition, illustrated for instance by $\mathcal{G}_{\text{derived}}$ and $\mathcal{G}_{\text{TAG sem.}}$ sharing the $\Sigma_{\text{TAG}}$ vocabulary in Figure 11, allows for the semantic interpretation of derivation trees. Second, the result of a first interpretation can itself be interpreted by a second lexicon when the object vocabulary of the first interpretation is the abstract vocabulary of the second one. This composition, illustrated for instance by the $\mathcal{G}_{\text{yield}} \circ \mathcal{G}_{\text{derived}}$ composition in Figure 11, allows for modularity and partial specification of derivations. This is how we relate the discourse derivation trees to the clausal derivation trees in $\mathcal{G}_{\text{disc.}}$.


4 Examples

4.1 TAG as ACG

We present the TAG and D-STAG encoding using examples. This encoding follows (de Groote, 2001; de Groote, 2002; Pogodalla, 2009).

In order to encode a TAG into an ACG, we use a higher-order signature $\Sigma_{\text{TAG}}$ whose atomic types include $S$, $VP$, $NP$, $S_A$, $VP_A$, ... where the $X$ types stand for the categories $X$ of the nodes where a substitution can occur while the $X_A$ types stand for the categories $X$ of the nodes where an adjunction can occur. For each elementary tree $\gamma_{\text{lex. entry}}$, there is a constant $C_{\text{lex. entry}}$ whose type is based on the adjunction and substitution sites as Table 1 shows. The signature additionally contains constants $I_X : X_A$ that are meant to provide a fake auxiliary tree on adjunction sites where no adjunction actually takes place in a TAG derivation. Terms built on this signature are interpreted by $\mathcal{G}_{\text{derived}}$ in the higher-order signature whose unique atomic type is $\tau$, the type of trees. In this signature, for any $X$ of arity $n$ belonging to the ranked alphabet describing the elementary trees of the TAG, we have a constant

$$X_n : \underbrace{\tau \multimap \cdots \multimap \tau}_{n \text{ times}} \multimap \tau$$

Then $\mathcal{G}_{\text{yield}}$ interprets $\tau$ into $\sigma$, the type for strings, and $X_n$ as $\lambda x_1 \cdots x_n.\, x_1 + \cdots + x_n$. For instance, the lexicon of Table 1 allows one to interpret two terms of $\Lambda(\Sigma_{\text{TAG}})$ representing a derivation with an adjunction at the S node (resp. at the VP node) of the given sentences as equation (2a) (resp. (2b)) shows.

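$$\mathcal{G}_{\text{yield}} \circ \mathcal{G}_{\text{derived}}(C_{\text{went to}}\ C^{S}_{\text{then}}\ I_{VP}\ C_{\text{Fred}}\ C_{\text{Paris}}) = \text{then} + \text{Fred} + \text{went} + \text{to} + \text{Paris} \qquad \text{(2a)}$$

$$\mathcal{G}_{\text{yield}} \circ \mathcal{G}_{\text{derived}}(C_{\text{went to}}\ I_{S}\ (C^{VP}_{\text{then}}\ I_{VP})\ C_{\text{Fred}}\ C_{\text{Paris}}) = \text{Fred} + \text{then} + \text{went} + \text{to} + \text{Paris} \qquad \text{(2b)}$$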
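Equations (2a) and (2b) can be checked with a small Python sketch that transcribes the interpretations of Table 1 as curried functions. Trees are encoded as tuples and `tree_yield` plays the role of $\mathcal{G}_{\text{yield}}$; these encodings and the Python identifiers are assumptions of the sketch, not the format of the ACG example files, and the tree constructors do not enforce arities.

```python
def node(label):
    """Tree constructor X_n: builds the tuple (label, child_1, ..., child_n)."""
    return lambda *children: (label, *children)


S2, VP2, PP2 = node("S"), node("VP"), node("PP")
NP1, V1, Adv1, Prep1 = node("NP"), node("V"), node("Adv"), node("Prep")

# Fake auxiliary trees: no adjunction takes place at the site.
identity = lambda x: x
I_S = I_VP = identity

# Tree interpretations of the constants of Table 1.
C_Fred = NP1("Fred")
C_Paris = NP1("Paris")
C_went_to = lambda S: lambda A: lambda s: lambda c: \
    S(S2(s, A(VP2(V1("went"), PP2(Prep1("to"), c)))))
C_S_then = lambda x: S2(Adv1("then"), x)
C_VP_then = lambda A: lambda x: A(VP2(Adv1("then"), x))


def tree_yield(tree):
    """Concatenate the leaves of a derived tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in tree_yield(child)]


# (2a): adjunction of "then" at the S node.
derived_2a = C_went_to(C_S_then)(I_VP)(C_Fred)(C_Paris)
# (2b): adjunction of "then" at the VP node.
derived_2b = C_went_to(I_S)(C_VP_then(I_VP))(C_Fred)(C_Paris)

print(" + ".join(tree_yield(derived_2a)))
print(" + ".join(tree_yield(derived_2b)))
```

Both terms share the constant for the verb and differ only in which adjunction site receives the adverb, which is what the derivation trees of Figures 2 and 4 express.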

4.2 D-STAG as ACG

The ACG encoding of D-STAG follows the above-mentioned principles to encode the derived and the derivation trees resulting from the D-STAG elementary trees of Figure 9. As a consequence, we get the same derivation trees. The main differences with (Danlos, 2009; Danlos, 2011) lie in the interpretations:

• $\mathcal{G}_{\text{disc.}}$ implements the interface between the discourse grammar and the sentential grammar, avoiding the intermediate step of building a discourse normal form (or the extraction step in D-LTAG). It is central to our proposal.

• $\mathcal{G}_{\text{TAG sem.}}$⁸ implements the interpretation of the discourse structures. It slightly differs from (Danlos, 2011) in order to allow for a more unified view on the semantic types and to deal with the relative scope of quantifiers and discourse relations.

Sentence-Discourse Interface. The higher-order vocabulary $\Sigma_{\text{D-STAG}}$ includes the usual atomic types to describe the sentence level ($NP$, $VP$, $VP_A$, etc.) and new atomic types to describe the discourse level: $Du$, which is the type for discourse units, and the corresponding $Du_A$ type representing adjunction sites. A typical constant introducing a discourse marker such as $d^{S}_{\text{then}}$ has type

$$DC \triangleq Du_A \multimap Du_A \multimap Du_A \multimap Du \multimap Du_A$$

that reflects the auxiliary trees of D-STAG (Figure 9). (For comparison, see the encoding of $C^{VP}_{\text{then}}$, which encodes an auxiliary tree adjoining at a VP node.) We also use a type $T$ for full texts.

The key point to smoothly interface the sentential and the discourse grammar is to have the constant that describes a discourse marker $d_{dm}$ of type $DC$ at the discourse level interpreted using the corresponding auxiliary tree $C_{dm}$ at the right place, i.e., as adjoining into the host clause. So, crucially, the interpretation specifies an adjunction of the auxiliary tree into the tree that is being substituted (i.e., the argument of $Du$ type that is a parameter of $d_{dm}$ or, in D-STAG terms, the one plugged into the Du↓ node of the auxiliary trees of Figure 9). This operation mimics the insertion of the auxiliary tree in Figure 10.

In order to enable this adjunction, we interpret discourse units (with $Du$ type) as missing a subordinate conjunction, a fronted adverbial, or a clause-medial adverbial. This corresponds to interpreting the atomic type $Du$ as a second-order type such as $S_A \multimap VP_A \multimap S$.⁹ We actually rather interpret $Du$ as $S_A \multimap (VP_A \multimap VP_A) \multimap S$ in order to account for clause-medial adverbials occurring between two other adverbs such as in John suddenly then passionately kissed her.¹⁰ Accordingly, at the discourse level, the type of an intransitive verb will be $S_A \multimap VP_A \multimap VP_A \multimap NP \multimap S$ instead of $S_A \multimap VP_A \multimap NP \multimap S$, allowing us to specify the two $VP_A$ auxiliary trees that can adjoin before and after the possible discourse marker. This leads us to the interpretation of Table 2. Note that even though the same name can occur on both sides of the := symbol, the atomic types and the constants on the left hand side belong to $\Sigma_{\text{D-STAG}}$ while the (possibly complex) types and the terms on the right hand side belong to $\Lambda(\Sigma_{\text{TAG}})$.

⁸ Not discussed here but implemented in the example files.

⁹ Another solution would be to have $DC$ require a $(S_A \multimap VP_A \multimap Du)$ type as fourth parameter. But the ACG would not be second-order anymore.

¹⁰ It should be clear that from a technical point of view, both fronted and clause-medial missing adverbials could be dealt with the same way (i.e. with a $S_A \multimap VP_A \multimap S$ or a $(S_A \multimap S_A) \multimap (VP_A \multimap VP_A) \multimap S$ type). We leave it for further work to check the adequacy of the same phenomena occurring for fronted adverbials and how it compares with discourse connective modification or multiple connectives.

| Constants of $\Sigma_{\text{TAG}}$ | Their interpretations by $\mathcal{G}_{\text{derived}}$ |
|---|---|
| $C_{\text{Fred}} : NP$ | $\gamma_{\text{Fred}} : \tau$, with $\gamma_{\text{Fred}} = NP_1\ \text{Fred}$ |
| $C_{\text{went to}} : S_A \multimap VP_A \multimap NP \multimap NP \multimap S$ | $\gamma_{\text{went to}} : (\tau \multimap \tau) \multimap (\tau \multimap \tau) \multimap \tau \multimap \tau \multimap \tau$, with $\gamma_{\text{went to}} = \lambda S\,A\,s\,c.\ S\,(S_2\ s\ (A\ (VP_2\ (V_1\ \text{went})\ (PP_2\ (Prep_1\ \text{to})\ c))))$ |
| $C^{S}_{\text{then}} : S_A$ | $\gamma^{S}_{\text{then}} : \tau \multimap \tau$, with $\gamma^{S}_{\text{then}} = \lambda x.\ (S_2\ (Adv_1\ \text{then})\ x)$ |
| $C^{VP}_{\text{then}} : VP_A \multimap VP_A$ | $\gamma^{VP}_{\text{then}} : (\tau \multimap \tau) \multimap \tau \multimap \tau$, with $\gamma^{VP}_{\text{then}} = \lambda A\,x.\ A\,(VP_2\ (Adv_1\ \text{then})\ x)$ |

Table 1: Sample ACG lexicon encoding the TAG grammar of Figure 1

| Type or constant of $\Sigma_{\text{D-STAG}}$ | Interpretation by $\mathcal{G}_{\text{disc.}}$ |
|---|---|
| $NP_A$ | $NP_A$ |
| $N_A$ | $N_A$ |
| $VP$ | $VP$ |
| $NP$ | $NP$ |
| $N$ | $N$ |
| $T$ | $S$ |
| $Du_A$ | $S_A$ |
| $VP_A$ | $VP_A \multimap VP_A$ |
| $S_A$ | $S_A \multimap S_A$ |
| $Du$ | $S_A \multimap (VP_A \multimap VP_A) \multimap S$ |
| $S$ | $S_A \multimap (VP_A \multimap VP_A) \multimap S$ |
| $I_X : X_A$ | $\lambda P.\ P$ |
| $d_{\text{Fred}} : NP$ | $C_{\text{Fred}}$ |
| $d_{\text{went to}} : S_A \multimap VP_A \multimap VP_A \multimap NP \multimap NP \multimap S$ | $\lambda S\,a_1\,a_2\,s\,o\,c\,m.\ C_{\text{went to}}\ (S\ c)\ (a_2\ (m\ (a_1\ I_{VP})))\ s\ o$ |
| $d_{\text{in. anc.}} : S \multimap Du_A \multimap Du$ | $\lambda s\,m\,d_s\,d_v.\ \text{mod}\ (s\ d_s\ d_v)\ m$ |
| $d_{\text{anchor}} : S \multimap Du_A \multimap Du$ | $\lambda s\,m\,d_s\,d_v.\ \text{mod}\ (s\ d_s\ d_v)\ m$ |
| $d^{S}_{\text{then}} : Du_A \multimap Du_A \multimap Du_A \multimap Du \multimap Du_A$ | $\lambda d_1\,d_2\,d_3\,s.\ \text{cons}\ d_1\ d_2\ d_3\ (s\ C^{S}_{\text{then}}\ (\lambda x.x))$ |
| $d^{VP}_{\text{then}} : Du_A \multimap Du_A \multimap Du_A \multimap Du \multimap Du_A$ | $\lambda d_1\,d_2\,d_3\,s.\ \text{cons}\ d_1\ d_2\ d_3\ (s\ I_S\ C^{VP}_{\text{then}})$ |

Table 2: $\mathcal{G}_{\text{disc.}}$ interpretation for the sentence-discourse interface¹²

¹² mod and cons are two operators that have no other meaning than juxtaposing TAG derivation trees of elementary discourse units. They are interpreted as: $\text{mod} := \lambda s\,m.\ m\ s$ (it performs the actual adjunction on the derived tree) and $\text{cons} := \lambda s_1\,s_2\,s_3\,s\,x.\ s_1(s_2(S_3\ x\ .\ (s_3\ s)))$ (it builds a derived tree, inserting a period between the derived trees corresponding to the elementary discourse units).

We exemplify our approach on the examples (3). In D-STAG, the associated discourse representation is as in Figure 13, and the discourse derivation tree is the one of Figure 12 where the $C_i$ correspond to the derivation trees of the bracketed discourse units of the examples. In D-STAG, the discourse derivation tree of course results from the discourse normal form $F_6$ because $F_7$ then $F_8$, which is the same for (3a) and (3b).

[Figure 12: Discourse derivation trees. The tree is rooted in $d_{\text{initial anchor}}$ and built from the constants $C_6$, $C_7$, $C_8$, $d_{\text{because}}$, $d^{S}_{\text{then}}$, $d_{\text{anchor}}$, and $I_{Du}$.]

(3) a. [Fred went to the supermarket]₆ because [his fridge is empty]₇. Then, [he went to the movies]₈.
    b. [Fred went to the supermarket]₆ because [his fridge is empty]₇. [He]₈ then [went to the movies]₈.

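[Figure 13: Discourse structure for (3a), with the relation nodes $\varphi_{\text{Expl.}}$ and $\varphi_{\text{Narr.}}$ over the discourse units $F_6$, $F_7$, and $F_8$]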

If we define the terms $d_8$ and $d'_8$ as in (4) and (5), we can compute their interpretations (6)-(11) using the lexicons of Tables 1 and 2. They show that both positions for the adverbs are now available directly from the abstract terms representing discourse derivations. Consequently, the two terms defined in (12) and (13) account for both sentences of (3). Note that they differ only in the constant they use for the adverb.


(a) Derived tree for $d_8$:
[S S* . [S [Adv Then] [S [NP Fred] [VP [V went] [PP [Prep to] [NP [Det the] [N movies]]]]]]]

(b) Derived tree for $d'_8$:
[S S* . [S [NP Fred] [VP [Adv then] [VP [V went] [PP [Prep to] [NP [Det the] [N movies]]]]]]]

Figure 14: Interpretations as derived trees

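\[ d_8 = d^{S}_{\mathit{then}}\ I_{Du}\ I_{Du}\ I_{Du}\ (d_{\mathit{anchor}}\ C_8\ I_{Du}) : \mathit{Du}_A \tag{4} \]

\[ d'_8 = d^{VP}_{\mathit{then}}\ I_{Du}\ I_{Du}\ I_{Du}\ (d_{\mathit{anchor}}\ C_8\ I_{Du}) : \mathit{Du}_A \tag{5} \]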

5 Related Works

This problem of avoiding an intermediate step between a sentential and a discourse analysis has also been addressed within the framework of Combinatory Categorial Grammar (CCG, Steedman (2001); Steedman and Baldridge (2011)) by Nakatsu and White (2010). They propose a single grammar to treat both sentential and discourse phenomena using Discourse Combinatory Categorial Grammar (DCCG). This approach introduced “cue threading” where “connectives can be thought of as percolating from where they take scope semantically down to the clause in which they appear” (Nakatsu and White, 2010, p. 19). Here, the connective at the discourse level takes scope over its argument, but it is interpreted at the sentential level as an auxiliary tree adjoining within the clause.

6 Conclusion

This article shows how to interface TAG-based sentential and discourse grammars without resorting to a two-step process. It relies on the interpretation of abstract terms encoding discourse derivation trees into terms encoding sentential derivation trees using ACG. The approach also allows us to build DAG discourse structures. ACG grammars have been implemented to compute (and parse) the surface forms and associate them with the relevant semantic forms. In this article, we only applied the approach to D-STAG, but it should be clear that it applies to D-LTAG as well. The approach is also suitable to model connective modifications (... probably because it rains). Our future work will concern multiple connectives (... because then he discovered he was broke), some of which we already account for. It will also concern the integration of discourse structure constraints such as the right frontier principle and the interaction with pronominal anaphora resolution.

Finally, discourse grammars are highly ambiguous. Hence the ACGs we derive from such grammars are also ambiguous. We want to take advantage of our integrated approach to apply the disambiguation methods used in syntactic parsing. Moreover, as the analysis can now be dealt with at the level of the text, even with polynomial algorithms, the size of the input will be an issue. This calls for further analysis of discourse structuring, both in parsing and generation.

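\[ G_{\mathit{disc.}}(d_8) = \mathit{cons}\ I_S\ I_S\ I_S\ (\mathit{mod}\ (C_{\mathit{went\ to}}\ C^{S}_{\mathit{then}}\ I_{VP}\ C_{\mathit{Fred}}\ (C_{\mathit{the}}\ (C_{\mathit{movies}}\ I_N)))\ I_S) : S_A \tag{6} \]

\[ G_{\mathit{derived}} \circ G_{\mathit{disc.}}(d_8) = \text{[see the tree representation in Figure 14(a)]} \tag{7} \]

\[ G_{\mathit{yield}} \circ G_{\mathit{derived}} \circ G_{\mathit{disc.}}(d_8) = \lambda x.\ x + {.} + \mathit{Then} + \mathit{Fred} + \mathit{went} + \mathit{to} + \mathit{the} + \mathit{movies} : \sigma \multimap \sigma \tag{8} \]

\[ G_{\mathit{disc.}}(d'_8) = \mathit{cons}\ I_S\ I_S\ I_S\ (\mathit{mod}\ (C_{\mathit{went\ to}}\ I_S\ (C^{VP}_{\mathit{then}}\ I_{VP})\ C_{\mathit{Fred}}\ (C_{\mathit{the}}\ (C_{\mathit{movies}}\ I_N)))\ I_S) : S_A \tag{9} \]

\[ G_{\mathit{derived}} \circ G_{\mathit{disc.}}(d'_8) = \text{[see the tree representation in Figure 14(b)]} \tag{10} \]

\[ G_{\mathit{yield}} \circ G_{\mathit{derived}} \circ G_{\mathit{disc.}}(d'_8) = \lambda x.\ x + {.} + \mathit{Fred} + \mathit{then} + \mathit{went} + \mathit{to} + \mathit{the} + \mathit{movies} : \sigma \multimap \sigma \tag{11} \]

\[ d_3 = d_{\mathit{in.\,anc.}}\ C_6\ (d_{\mathit{because}}\ I_{Du}\ (d^{S}_{\mathit{then}}\ I_{Du}\ I_{Du}\ I_{Du}\ (d_{\mathit{anc.}}\ C_8\ I_{Du}))\ I_{Du}\ (d_{\mathit{anc.}}\ C_7\ I_{Du})) \tag{12} \]

\[ d'_3 = d_{\mathit{in.\,anc.}}\ C_6\ (d_{\mathit{because}}\ I_{Du}\ (d^{VP}_{\mathit{then}}\ I_{Du}\ I_{Du}\ I_{Du}\ (d_{\mathit{anc.}}\ C_8\ I_{Du}))\ I_{Du}\ (d_{\mathit{anc.}}\ C_7\ I_{Du})) \tag{13} \]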
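As an illustration only, the computation from the terms (6) and (9) to the yields (8) and (11) can be traced with a small Python sketch. The operators `mod` and `cons` follow the definitions of footnote 12. The definitions of the lexical constants (`C_Fred`, `C_went_to`, `C_S_then`, `C_VP_then`, and so on) are hypothetical stand-ins for the lexicon of Table 1, chosen so that the resulting yields agree with (8) and (11); the encoding of derived trees as nested tuples is likewise an assumption of this sketch.

```python
# Derived trees are nested tuples (label, child, ...); leaves are plain strings.
def node(label, *children):
    return (label, *children)

def tree_yield(t):
    """Left-to-right list of the leaves of a derived tree."""
    if isinstance(t, str):
        return [t]
    return [leaf for child in t[1:] for leaf in tree_yield(child)]

# Empty adjunctions I_S, I_VP, I_N: identity on trees.
I_S = I_VP = I_N = lambda t: t

# Hypothetical stand-ins for the sentential lexicon, curried like the lambda-terms.
C_Fred = node("NP", "Fred")
C_movies = lambda a: a(node("N", "movies"))
C_the = lambda n: node("NP", node("Det", "the"), n)
C_went_to = lambda s: lambda v: lambda subj: lambda obj: s(
    node("S", subj,
         v(node("VP", node("V", "went"), node("PP", node("Prep", "to"), obj)))))
C_S_then = lambda t: node("S", node("Adv", "Then"), t)          # adjoins at S
C_VP_then = lambda a: lambda t: a(node("VP", node("Adv", "then"), t))  # adjoins at VP

# Operators of footnote 12.
mod = lambda s: lambda m: m(s)
cons = lambda s1: lambda s2: lambda s3: lambda s: lambda x: s1(
    s2(node("S", x, ".", s3(s))))

obj = C_the(C_movies(I_N))

# Term (6): 'then' adjoined at the S node.
g_d8 = cons(I_S)(I_S)(I_S)(
    mod(C_went_to(C_S_then)(I_VP)(C_Fred)(obj))(I_S))

# Term (9): 'then' adjoined at the VP node.
g_d8_prime = cons(I_S)(I_S)(I_S)(
    mod(C_went_to(I_S)(C_VP_then(I_VP))(C_Fred)(obj))(I_S))

# Both results are functions of the preceding discourse x; "x" is a placeholder leaf.
print(" ".join(tree_yield(g_d8("x"))))        # x . Then Fred went to the movies
print(" ".join(tree_yield(g_d8_prime("x"))))  # x . Fred then went to the movies
```

Keeping every constant curried mirrors the linear lambda-terms directly, so each Python application corresponds to one application in (6) or (9).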


References

Anne Abeillé. 2002. Une grammaire électronique du français. Sciences du langage. CNRS Éditions.

Timothée Bernard and Laurence Danlos. 2016. Modelling discourse in STAG: Subordinate conjunctions and attributing phrases. In David Chiang and Alexander Koller, editors, Proceedings of the Twelfth International Workshop on Tree Adjoining Grammars and Related Framework (TAG+12). HAL open archive: hal-01329539.

Laurence Danlos, Aleksandre Maskharashvili, and Sylvain Pogodalla. 2015. Grammaires phrastiques et discursives fondées sur les TAG : une approche de D-STAG avec les ACG. In TALN 2015 - 22e conférence sur le Traitement Automatique des Langues Naturelles, Actes de TALN 2015, pages 158–169, Caen, France. Association pour le Traitement Automatique des Langues. HAL open archive: hal-01145994.

Laurence Danlos. 1998. G-TAG : Un formalisme lexicalisé pour la génération de textes inspiré de TAG. Traitement Automatique des Langues, 39(2). HAL open archive: inria-00098489.

Laurence Danlos. 2000. G-TAG: A lexicalized formalism for text generation inspired by Tree Adjoining Grammar. In Anne Abeillé and Owen Rambow, editors, Tree Adjoining Grammars: Formalisms, Linguistic Analysis, and Processing, volume 107 of CSLI Lecture Notes, pages 343–370. CSLI Publications. http://www.linguist.jussieu.fr/~danlos/Dossier%20publis/G-TAG-eng’01.pdf.

Laurence Danlos. 2009. D-STAG : un formalisme d'analyse automatique de discours basé sur les TAG synchrones. T.A.L., 50(1):111–143. HAL open archive: inria-00524743.

Laurence Danlos. 2011. D-STAG: a formalism for discourse analysis based on SDRT and using Synchronous TAG. In Philippe de Groote, Markus Egg, and Laura Kallmeyer, editors, 14th conference on Formal Grammar - FG 2009, volume 5591 of LNCS/LNAI, pages 64–84. Springer. DOI: 10.1007/978-3-642-20169-1_5.

Laurence Danlos. 2013. Connecteurs de discours adverbiaux : Problèmes à l'interface syntaxe-sémantique. Linguisticæ Investigationes, 36(2):261–275. HAL open archive: hal-00932184. DOI: 10.1075/li.36.2.05dan.

Philippe de Groote and Sylvain Pogodalla. 2004. On the expressive power of Abstract Categorial Grammars: Representing context-free formalisms. Journal of Logic, Language and Information, 13(4):421–438. HAL open archive: inria-00112956. DOI: 10.1007/s10849-004-2114-x.

Philippe de Groote. 2001. Towards Abstract Categorial Grammars. In Association for Computational Linguistics, 39th Annual Meeting and 10th Conference of the European Chapter, Proceedings of the Conference, pages 148–155. ACL anthology: P01-1033.

Philippe de Groote. 2002. Tree-Adjoining Grammars as Abstract Categorial Grammars. In Proceedings of the Sixth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+6), pages 145–150. Università di Venezia. URL: http://www.loria.fr/equipes/calligramme/acg/publications/2002-tag+6.pdf.

Katherine Forbes, Eleni Miltsakaki, Rashmi Prasad, Anoop Sarkar, Aravind K. Joshi, and Bonnie Lynn Webber. 2003. D-LTAG system: Discourse parsing with a Lexicalized Tree-Adjoining Grammar. Journal of Logic, Language and Information, 12(3):261–279. Special Issue: Discourse and Information Structure. DOI: 10.1023/A:1024137719751.

Katherine Forbes-Riley, Bonnie Lynn Webber, and Aravind K. Joshi. 2006. Computing discourse semantics: The predicate-argument semantics of discourse connectives in D-LTAG. Journal of Semantics, 23(1):55–106. DOI: 10.1093/jos/ffh032.

Claire Gardent and Laura Kallmeyer. 2003. Semantic construction in feature-based TAG. In Proceedings of the 10th Meeting of the European Chapter of the Association for Computational Linguistics (EACL), pages 123–130. ACL anthology: E03-1030.

Claire Gardent. 1997. Discourse tree adjoining grammar. CLAUS Report 89, Universit, Saarbr, April.

Jean-Yves Girard. 1987. Linear logic. Theoretical Computer Science, 50(1):1–102. DOI: 10.1016/0304-3975(87)90045-4.

Aravind K. Joshi and Yves Schabes. 1997. Tree-adjoining grammars. In Grzegorz Rozenberg and Arto K. Salomaa, editors, Handbook of formal languages, volume 3, chapter 2. Springer.

Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. 1975. Tree adjunct grammars. Journal of Computer and System Sciences, 10(1):136–163. DOI: 10.1016/S0022-0000(75)80019-5.

Laura Kallmeyer and Maribel Romero. 2008. Scope and situation binding for LTAG. Research on Language and Computation, 6(1):3–52. DOI: 10.1007/s11168-008-9046-6.

Makoto Kanazawa. 2008. A prefix-correct Earley recognizer for multiple context-free grammars. In Proceedings of the Ninth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+9), pages 49–56, Tuebingen, Germany, June 7–8. http://tagplus9.cs.sfu.ca/papers/Kanazawa.pdf.

Daniel Marcu. 2000. The Theory and Practice of Discourse Parsing and Summarization. The MIT Press.


Frédéric Meunier. 1997. Implantation du formalisme de génération G-TAG. Ph.D. thesis, Université Paris 7 - Denis Diderot.

Crystal Nakatsu and Michael White. 2010. Generating with discourse combinatory categorial grammar. Linguistic Issues in Language Technology, 4. http://journals.linguisticsociety.org/elanguage/lilt/article/view/1277.html.

Rebecca Nancy Nesson and Stuart M. Shieber. 2006. Simpler TAG semantics through synchronization. In Proceedings of the 11th Conference on Formal Grammar, Malaga, Spain, 7. CSLI Publications. http://cslipublications.stanford.edu/FG/2006/nesson.pdf.

Sylvain Pogodalla. 2004. Computing Semantic Representation: Towards ACG Abstract Terms as Derivation Trees. In Seventh International Workshop on Tree Adjoining Grammar and Related Formalisms - TAG+7, pages 64–71, Vancouver, BC, Canada. HAL open archive: inria-00107768.

Sylvain Pogodalla. 2009. Advances in Abstract Categorial Grammars: Language Theory and Linguistic Modeling. ESSLLI 2009 Lecture Notes, Part II. HAL open archive: hal-00749297.

Livia Polanyi and Martin H. van den Berg. 1996. Discourse structure and discourse interpretation. In Paul J. E. Dekker and Martin Stokhof, editors, Proceedings of the Tenth Amsterdam Colloquium. ILLC/Department of Philosophy, University of Amsterdam. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.221.

Owen Rambow, K. Vijay-Shanker, and David Weir. 2001. D-Tree Substitution Grammars. Computational Linguistics, 27(1):87–121. ACL anthology: J01-1004. DOI: 10.1162/089120101300346813.

Frank Schilder. 1997. Tree discourse grammar or how to get attached to a discourse? In Proceedings of the Tilburg Conference on Formal Semantics (IWCS-1997), pages 261–273. ftp://ftp.informatik.uni-hamburg.de/pub/unihh/informatik/WSV/schild97a.ps.gz.

Stuart M. Shieber and Yves Schabes. 1990. Synchronous tree-adjoining grammars. In Proceedings of the 13th International Conference on Computational Linguistics, volume 3, pages 253–258, Helsinki, Finland. http://www.eecs.harvard.edu/~shieber/Biblio/Papers/synch-tags.pdf. DOI: 10.3115/991146.991191.

Stuart M. Shieber. 2006. Unifying synchronous tree-adjoining grammars and tree transducers via bimorphisms. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), pages 377–384, Trento, Italy, 3–7 April. ACL anthology: E06-1048.

Radu Soricut and Daniel Marcu. 2003. Sentence level discourse parsing using syntactic and lexical information. In Marti Hearst and Mari Ostendorf, editors, Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), pages 149–156. ACL anthology: N03-1030.

Mark Steedman and Jason Baldridge. 2011. Combinatory categorial grammar. In Robert Borsley and Kersti Börjars, editors, Non-Transformational Syntax: Formal and Explicit Models of Grammar, chapter 5. Wiley-Blackwell.

Mark Steedman. 2001. The Syntactic Process. MIT Press.

Bonnie Lynn Webber and Aravind K. Joshi. 1998. Anchoring a lexicalized tree-adjoining grammar for discourse. In Manfred Stede, Leo Wanner, and Eduard Hovy, editors, Proceedings of the ACL/COLING workshop on Discourse Relations and Discourse Markers. ACL anthology: W98-0315.

Bonnie Lynn Webber. 2004. D-LTAG: extending lexicalized TAG to discourse. Cognitive Science, 28(5):751–779. DOI: 10.1207/s15516709cog2805_6.

XTAG Research Group. 2001. A Lexicalized Tree Adjoining Grammar for English. Technical Report IRCS-01-03, IRCS, University of Pennsylvania. ftp://ftp.cis.upenn.edu/pub/xtag/release-2.24.2001/tech-report.pdf.