

AMR Parsing with Latent Structural Information

Qiji Zhou1†, Yue Zhang2,3, Donghong Ji1∗, Hao Tang1

1Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, China
{qiji.zhou,dhji,tanghaopro}@whu.edu.cn

2School of Engineering, Westlake University
3Institute of Advanced Technology, Westlake Institute for Advanced Study
[email protected]

Abstract

Abstract Meaning Representations (AMRs) capture sentence-level semantics as structural representations of broad-coverage natural sentences. We investigate parsing AMR with explicit dependency structures and interpretable latent structures. We generate the latent soft structure without additional annotations, and fuse both dependency and latent structures via an extended graph neural network. The fused structural information helps our model achieve the best reported results on both AMR 2.0 (77.5% Smatch F1 on LDC2017T10) and AMR 1.0 (71.8% Smatch F1 on LDC2014T12).

1 Introduction

Abstract Meaning Representations (AMRs) (Banarescu et al., 2013) model sentence-level semantics as rooted, directed, acyclic graphs. Nodes in the graph are concepts which represent the events, objects and features of the input sentence, and edges between nodes represent semantic relations. AMR introduces re-entrance relations to depict node reuse in the graphs. It has been adopted in downstream NLP tasks, including text summarization (Liu et al., 2015; Dohare and Karnick, 2017), question answering (Mitra and Baral, 2016) and machine translation (Jones et al., 2012; Song et al., 2019).

AMR parsing aims to transform natural language sentences into AMR semantic graphs. Similar to constituent parsing and dependency parsing (Nivre, 2008; Dozat and Manning, 2017), AMR parsers mainly employ two parsing techniques: transition-based parsers (Wang et al., 2016; Damonte et al., 2017; Wang and Xue, 2017; Liu et al., 2018; Guo and Lu, 2018) use a sequence of transition actions

†Part of this work was done when the author was visiting Westlake University.

*Corresponding author.

Figure 1: An example of an AMR graph and its corresponding syntactic dependencies for the sentence “The boy came and left”. The dashed lines denote relations that are connected in the syntactic dependencies but do not appear in the AMR graph.

to incrementally construct the graph, while graph-based parsers (Flanigan et al., 2014; Lyu and Titov, 2018; Zhang et al., 2019a; Cai and Lam, 2019) divide the task into concept identification and relation extraction stages and then generate a full AMR graph with decoding algorithms such as greedy decoding and maximum spanning tree (MST). Additionally, reinforcement learning (Naseem et al., 2019) and sequence-to-sequence models (Konstas et al., 2017) have been exploited for AMR parsing as well.

Previous work (Wang et al., 2016; Artzi et al., 2015) shows that structural information can benefit AMR parsing. As illustrated in Figure 1, for example, syntactic dependencies can convey the main predicate-argument structure. However, dependency structural information may be noisy due to the error propagation of external parsers. Moreover, AMR concentrates on semantic relations, which can differ from syntactic dependencies. For instance, in Figure 1, AMR prefers to select the coordination (i.e., “and”) as the root, which is different from the syntactic dependencies (i.e., “came”).

Given the above observations, we investigate the effectiveness of latent syntactic dependencies for


AMR parsing. Different from existing work (Wang et al., 2016), which uses a dependency parser to provide explicit syntactic structures, we make use of a two-parameter distribution (Bastings et al., 2019) to induce latent graphs, which is differentiable under reparameterization (Kingma and Welling, 2014). We thus build an end-to-end model for AMR parsing with induced latent dependency structures as a middle layer, which is tuned during AMR training and can thus be better aligned with the needs of the AMR structure.

To better investigate the correlation between induced and gold syntax, and to better combine their strengths, we additionally consider fusing gold and induced structural dependencies in an align-free AMR parser (Zhang et al., 2019a). Specifically, we first obtain the input sentence’s syntactic dependencies1 and treat the input sentence as the prior of the probabilistic graph generator for inferring the latent graph. Second, we propose an extended graph neural network (GNN) for encoding the above structural information. Subsequently, we feed the encoded structural information into a two-stage align-free AMR parser (Zhang et al., 2019a) to improve AMR parsing.

To our knowledge, we are the first to incorporate latent syntactic structure in AMR parsing. Experimental results show that our model achieves 77.5% and 71.8% SMATCH F1 on the standard AMR benchmarks LDC2017T10 and LDC2014T12, respectively, outperforming all previous best reported results. Beyond that, to some extent, our model can interpret the probabilistic relations between the input words in AMR parsing by generating the latent graph.2

2 Baseline: Align-Free AMR Parsing

We adopt the parser of Zhang et al. (2019a) as our baseline, which treats AMR parsing as sequence-to-graph transduction.

2.1 Task Formalization

Our baseline splits AMR parsing into a two-stage procedure: concept identification and edge prediction. The first task aims to identify the concepts (nodes) of the AMR graph from the input tokens, and the second task predicts the semantic relations between the identified concepts.

1 We employ Stanford CoreNLP (Manning et al., 2014) to get the dependencies.

2 Our code will be available at: https://github.com/zhouqiji/ACL2020_AMR_Parsing.

Formally, for a given input sequence of words $w = \langle w_1, \ldots, w_n \rangle$, the goal of concept identification in our baseline is to sequentially predict the concept nodes $u = \langle u_1, \ldots, u_m \rangle$ of the output AMR graph, and to deterministically assign the corresponding indices $d = \langle d_1, \ldots, d_m \rangle$:

$$P(u) = \prod_{i=1}^{m} P(u_i \mid u_{<i}, d_{<i}, w).$$

After identifying the concept nodes $u$ and their corresponding indices $d$, we predict the semantic relations in the search space $R(u)$:

$$\mathrm{Predict}(u) = \operatorname*{argmax}_{r \in R(u)} \sum_{(u_i, u_j) \in r} \mathrm{score}(u_i, u_j),$$

where $r = \{(u_i, u_j) \mid 1 \le i, j \le m\}$ is a set of directed relations between concept nodes.

2.2 Align-Free Concept Identification

Our baseline extends the pointer-generator network with a self-copy mechanism for concept identification (See et al., 2017; Zhang et al., 2018a). The extended model can copy nodes not only from the source text, but also from the previously generated list of nodes on the target side.

The concept identifier first encodes the input sentence into concatenated vector embeddings with GloVe (Pennington et al., 2014), BERT (Devlin et al., 2019), POS (part-of-speech) and character-level (Kim et al., 2016) embeddings. Subsequently, we encode the embedded sentence with a two-layer bidirectional LSTM (Schuster and Paliwal, 1997; Hochreiter and Schmidhuber, 1997):

$$h_i^l = [\overrightarrow{f}^l(h_i^{l-1}, h_{i-1}^l);\ \overleftarrow{f}^l(h_i^{l-1}, h_{i+1}^l)],$$

where $h_i^l$ is the encoded hidden state of the $l$-th layer at time step $i$, and $h_i^0$ is the embedded token $w_i$.

Different from the encoding stage, the decoder does not use pre-trained BERT embeddings, but employs a two-layer LSTM to generate the decoding hidden state $s_t^l$ at each time step:

$$s_t^l = f^l(s_t^{l-1}, s_{t-1}^l),$$

where $s_t^{l-1}$ and $s_{t-1}^l$ are the hidden states from the previous layer and the previous time step respectively, and $s_0^l$ is the concatenation of the final bidirectional encoding hidden states. In addition, $s_t^0$ is generated from the concatenation of the previous node embedding $u_{t-1}$ and the attention vector $s_{t-1}$, which combines both source and target information:


$$s_t = \tanh(W_c[c_t; s_t^l] + b_c),$$

where $W_c$ and $b_c$ are trainable parameters, and $c_t$ is the context vector calculated from the attention-weighted encoding hidden states and the source attention distribution $a_t^{src}$, following Bahdanau et al. (2015).

The produced attention vector $s_t$ is used to generate the vocabulary distribution:

$$P_{vocab} = \mathrm{softmax}(W_{vocab}\, s_t + b_{vocab}),$$

as well as the target attention distribution:

$$e_t^{tgt} = v_{tgt}^{\top} \tanh(W_{tgt}\, s_{1:t-1} + U_{tgt}\, s_t + b_{tgt}), \qquad a_t^{tgt} = \mathrm{softmax}(e_t^{tgt}).$$

The source-side copy probability $p_{src}$, target-side copy probability $p_{tgt}$ and generation probability $p_{gen}$ are calculated from $s_t$ and can be treated as generation switches:

$$[p_{src}, p_{tgt}, p_{gen}] = \mathrm{softmax}(W_{switch}\, s_t + b_{switch}).$$

The final distribution is defined as follows. If $u_t$ is copied from existing nodes:

$$P^{(node)}(u_t) = p_{tgt} \sum_{i: u_i = u_t}^{t-1} a_t^{tgt}[i],$$

otherwise:

$$P^{(node)}(u_t) = p_{gen}\, P_{vocab}(u_t) + p_{src} \sum_{i: w_i = u_t}^{n} a_t^{src}[i],$$

where $a_t[i]$ is the $i$-th element of $a_t$. The existing indices are then deterministically assigned to the identified nodes based on whether the node is generated from the target-side distribution.
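To make the three switches concrete, the following is a minimal sketch (not the authors' code) of how they could be folded into a single vocabulary-indexed node distribution. For simplicity it merges target-side copies into the same distribution, whereas the baseline additionally tracks target-side node indices so that a re-pointed node can later be merged with the node it copies; all tensor names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def node_distribution(s_t, a_src, a_tgt, src_ids, prev_node_ids,
                      W_vocab, b_vocab, W_switch, b_switch):
    """Sketch of the final node distribution of the align-free concept identifier.

    s_t           : attention vector at step t, shape (hidden,)
    a_src         : source attention distribution over the n input words, shape (n,)
    a_tgt         : target attention distribution over the t-1 previous nodes, shape (t-1,)
    src_ids       : vocabulary ids of the n input words (LongTensor)
    prev_node_ids : vocabulary ids of the t-1 previously generated nodes (LongTensor)
    """
    p_vocab = F.softmax(W_vocab @ s_t + b_vocab, dim=-1)                # vocabulary distribution
    p_src, p_tgt, p_gen = F.softmax(W_switch @ s_t + b_switch, dim=-1)  # generation switches

    dist = p_gen * p_vocab                                   # generate a new concept from the vocabulary
    dist = dist.index_add(0, src_ids, p_src * a_src)         # copy a concept from the source text
    dist = dist.index_add(0, prev_node_ids, p_tgt * a_tgt)   # re-point to a previously generated node
    return dist
```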

2.3 Edge Prediction

Our baseline employs a deep biaffine attention classifier for semantic edge prediction (Dozat and Manning, 2017), which has been widely used in graph-based structure parsing (Peng et al., 2017; Lyu and Titov, 2018; Zhang et al., 2019a).

For a node $u_t$, the probability of $u_k$ being the head node of $u_t$ and the probability of label $l$ for the edge $(u_k, u_t)$ are defined as:

$$P_t^{(head)}(u_k) = \frac{\exp\big(\mathrm{score}^{(edge)}_{k,t}\big)}{\sum_{j=1}^{m} \exp\big(\mathrm{score}^{(edge)}_{j,t}\big)}, \qquad P_{k,t}^{(label)}(l) = \frac{\exp\big(\mathrm{score}^{(label)}_{k,t}[l]\big)}{\sum_{l'} \exp\big(\mathrm{score}^{(label)}_{k,t}[l']\big)},$$

where $\mathrm{score}^{(edge)}$ and $\mathrm{score}^{(label)}$ are calculated via biaffine attention.
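As an illustration of the scoring step, here is a compact sketch of biaffine edge and label scoring in the spirit of Dozat and Manning (2017); the exact parameterization used by the baseline (bias terms, MLP sizes) is an assumption.

```python
import torch
import torch.nn as nn

class BiaffineEdgeScorer(nn.Module):
    """Minimal sketch of deep biaffine edge/label scoring over m concept nodes."""

    def __init__(self, node_dim, hidden_dim, num_labels):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(node_dim, hidden_dim), nn.ELU())
        self.dep_mlp = nn.Sequential(nn.Linear(node_dim, hidden_dim), nn.ELU())
        self.U_edge = nn.Parameter(torch.randn(hidden_dim, hidden_dim) * 0.01)
        self.U_label = nn.Parameter(torch.randn(num_labels, hidden_dim, hidden_dim) * 0.01)

    def forward(self, nodes):                          # nodes: (m, node_dim)
        h = self.head_mlp(nodes)                       # head view,      (m, hidden)
        d = self.dep_mlp(nodes)                        # dependent view, (m, hidden)
        edge_scores = h @ self.U_edge @ d.t()          # (m, m): score of u_k heading u_t
        label_scores = torch.einsum('kh,lhg,tg->ktl', h, self.U_label, d)  # (m, m, labels)
        p_head = edge_scores.softmax(dim=0)            # normalize over candidate heads k
        p_label = label_scores.softmax(dim=-1)         # normalize over labels l
        return p_head, p_label
```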

3 Model

The overall structure of our model is shown in Figure 2. First, we use an external dependency parser (Manning et al., 2014) to obtain the explicit structural information, and obtain the latent structural information via a probabilistic latent graph generator. We then combine both explicit and latent structural information by encoding the input sentence through an extended graph neural network. Finally, we combine the encoded structural information with an align-free AMR parser to parse AMR graphs with the benefit of structural information.

3.1 Latent Graph Generator

We generate the latent graph of the input sentence via the HardKuma distribution (Bastings et al., 2019), which has both continuous and discrete behaviours. HardKuma can generate samples from the closed interval [0, 1] probabilistically. This feature allows us to predict soft connection probabilities between input words, which can be seen as a latent graph. Specifically, we treat the embedded input words as the prior of a two-parameter distribution, and then sample a soft adjacency matrix between the input words for representing dependencies.

HardKuma Distribution  The HardKuma distribution is derived from the Kumaraswamy distribution (Kuma) (Kumaraswamy, 1980), which is a two-parameter distribution over the open interval (0, 1), i.e., $K \sim \mathrm{Kuma}(a, b)$, where $a \in \mathbb{R}_{>0}$ and $b \in \mathbb{R}_{>0}$. The Kuma distribution is similar to the Beta distribution, but its CDF has a simpler analytical form, and the inverse CDF is:

$$C_K^{-1}(u; a, b) = \big(1 - (1 - u)^{1/b}\big)^{1/a}.$$

We can generate samples by:

$$C_K^{-1}(U; a, b) \sim \mathrm{Kuma}(a, b),$$

where $U \sim \mathcal{U}(0, 1)$ is the uniform distribution, and we can reconstruct this inverse CDF in a reparameterized fashion (Kingma and Welling, 2014; Nalisnick and Smyth, 2017).

Figure 2: Sketch of the model, which has four main components: (1) a latent graph generator for producing the soft-connected latent graph (§3.1); (2) an extended syntactic graph convolutional network for encoding the structural information (§3.2); (3) an align-free concept identifier for concept node generation (§2.2); (4) a deep biaffine classifier for relation edge prediction (§2.3).

In order to include the two discrete points 0 and 1, HardKuma employs a stretch-and-rectify method (Louizos et al., 2017), which leads the variable $T \sim \mathrm{Kuma}(a, b, l, r)$ to be sampled from a Kuma distribution stretched to an open interval $(l, r)$, where $l < 0$ and $r > 0$. The new CDF is:

$$C_T(t; a, b, l, r) = C_K\big((t - l)/(r - l); a, b\big).$$

We pass the stretched variable $T \sim \mathrm{Kuma}(a, b, l, r)$ through a hard-sigmoid function (i.e., $h = \min(1, \max(0, t))$) to obtain the rectified variable $H \sim \mathrm{HardKuma}(a, b, l, r)$. Therefore, the rectified variable covers the closed interval [0, 1]. Note that all negative values of $t$ are deterministically mapped to 0; in contrast, all samples $t > 1$ are mapped to 1.3 Because the rectified variable is sampled based on the Kuma distribution, HardKuma first samples a uniform variable over the open interval (0, 1) from the uniform distribution $U \sim \mathcal{U}(0, 1)$, and then generates a Kuma variable through the inverse CDF:

$$k = C_K^{-1}(u; a, b).$$

Second, we transform the Kuma variable to cover the stretched support:

$$t = l + (r - l)k.$$

Finally, we rectify the stretched variable to include the closed interval [0, 1] via the hard-sigmoid function:

$$h = \min(1, \max(0, t)).$$

3 Details of the derivations can be found in Bastings et al. (2019).
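The sampling path just described is compact enough to write down directly; the sketch below is an illustration under our own naming, with the support [-0.1, 1.1] taken from the hyper-parameter table in the appendix.

```python
import torch

def sample_hardkuma(a, b, l=-0.1, r=1.1, eps=1e-6):
    """Reparameterized HardKuma sample via the stretch-and-rectify recipe above.

    a, b: positive tensors of Kuma parameters (here, one entry per word pair).
    l, r: stretched support; [-0.1, 1.1] mirrors the setting in the appendix.
    """
    u = torch.rand_like(a).clamp(eps, 1.0 - eps)        # u ~ U(0, 1)
    k = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)     # k = C_K^{-1}(u; a, b), a Kuma sample
    t = l + (r - l) * k                                 # stretch to the interval (l, r)
    h = t.clamp(0.0, 1.0)                               # hard-sigmoid: rectify into [0, 1]
    return h
```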

Latent Graph  We generate the latent graph of the input words $w$ by sampling from the HardKuma distribution with trained parameters $a$ and $b$. We first calculate the prior $c$ of $(a, b)$ by employing multi-head self-attention (Vaswani et al., 2017):

$$c_a = \mathrm{Transformer}_a(v), \qquad c_b = \mathrm{Transformer}_b(v),$$

where $v = \langle v_1, \ldots, v_n \rangle$ is the embedded input words. Subsequently, we compute $a$ and $b$ as:

$$a = \mathrm{Norm}(c_a c_a^{T}), \qquad b = \mathrm{Norm}(c_b c_b^{T}),$$

where $a_i = \langle a_{i1}, \ldots, a_{in} \rangle$ and $b_i = \langle b_{i1}, \ldots, b_{in} \rangle$, $c_a, c_b \in \mathbb{R}^{n \times n}$, and $\mathrm{Norm}(x)$ is the normalization function. Hence, the latent graph $L$ is sampled via the learned parameters $a$ and $b$:

$$l_{ij} \sim \mathrm{HardKuma}(a_{ij}, b_{ij}, l, r).$$
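The generator can therefore be pictured as two self-attention encoders followed by a pairwise product and a HardKuma sample. The sketch below (reusing the sample_hardkuma helper from the earlier sketch) is an assumption-laden illustration: the module choices and the use of softplus as the positivity-enforcing normalization are ours, not necessarily the authors' exact Norm function.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentGraphGenerator(nn.Module):
    """Sketch of the latent graph generator: two self-attention encoders produce
    the priors c_a and c_b, whose pairwise products give HardKuma parameters
    a_ij, b_ij for every word pair."""

    def __init__(self, emb_dim, n_heads=8):
        super().__init__()
        # emb_dim must be divisible by n_heads
        self.attn_a = nn.TransformerEncoderLayer(emb_dim, n_heads)
        self.attn_b = nn.TransformerEncoderLayer(emb_dim, n_heads)

    def forward(self, v):
        # v: (n, 1, emb_dim) -- embedded input words as (seq_len, batch, dim)
        c_a = self.attn_a(v).squeeze(1)               # (n, d)
        c_b = self.attn_b(v).squeeze(1)               # (n, d)
        a = F.softplus(c_a @ c_a.t()) + 1e-4          # pairwise parameters a_ij > 0
        b = F.softplus(c_b @ c_b.t()) + 1e-4          # pairwise parameters b_ij > 0
        return sample_hardkuma(a, b)                  # (n, n) soft adjacency matrix L
```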


3.2 Graph Encoder

For a syntactic graph with $n$ nodes, the cell $A_{ij} = 1$ in the corresponding adjacency matrix represents an edge connecting word $w_i$ to word $w_j$. An L-layer syntactic GCN can be used to encode $A$, where the hidden vector of each word $w_i$ at the $l$-th layer is:

$$h_i^{(l)} = \sigma\Big(\sum_{j=1}^{n} \hat{A}_{ij} W^{(l)} h_j^{(l-1)} / d_i + b^{(l)}\Big),$$

where $\hat{A} = A + I$ with the $n \times n$ identity matrix $I$, $d_i = \sum_{j=1}^{n} \hat{A}_{ij}$ is the degree of word $w_i$ in the graph, used to normalize the activations so that word representations do not have significantly different magnitudes (Marcheggiani and Titov, 2017; Kipf and Welling, 2017), and $\sigma$ is a nonlinear activation function.
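A sketch of this degree-normalized layer is shown below (edge labels and directionality omitted, as in the paper); the choice of ReLU for the activation $\sigma$ is an assumption.

```python
import torch
import torch.nn as nn

class SyntacticGCNLayer(nn.Module):
    """Sketch of the degree-normalized GCN layer in the equation above."""

    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.b = nn.Parameter(torch.zeros(dim))

    def forward(self, H, A):
        # H: (n, dim) word representations; A: (n, n) binary adjacency matrix
        A_hat = A + torch.eye(A.size(0), device=A.device)   # add self-loops
        d = A_hat.sum(dim=1, keepdim=True)                  # node degrees d_i
        return torch.relu(A_hat @ self.W(H) / d + self.b)   # normalized neighbour aggregation
```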

In order to benefit from both explicit and latent structural information in AMR parsing, we extend the syntactic GCN (Marcheggiani and Titov, 2017; Zhang et al., 2018b) with a graph fusion layer and omit labels in the graph (i.e., we only consider connectivity in the GCN). Specifically, we propose to merge the parsed syntactic dependencies and the sampled latent graph through a graph fusion layer:

$$F = \pi L + (1 - \pi) D,$$

where $\pi$ is a trainable gate variable calculated via the sigmoid function, $D$ and $L$ are the parsed syntactic dependencies and the generated latent graph respectively, and $F$ represents the fused soft graph. Furthermore, $F$ is an $n \times n$ adjacency matrix over the input words $w$; different from the sparse adjacency matrix $A$, $F_{ij}$ denotes a soft connection degree from word $w_i$ to word $w_j$. We adapt the syntactic GCN with the fused adjacency matrix $F$ and employ a gate mechanism:

$$h_i^{(l)} = \mathrm{GELU}\Big(\mathrm{Lnorm}\Big(\sum_{j=1}^{n} G_j \big(F_{ij} W^{(l)} h_j^{(l-1)} + b^{(l)}\big)\Big)\Big).$$

We use GELU (Hendrycks and Gimpel, 2016) as the activation function, and apply layer normalization Lnorm (Ba et al., 2016) before passing the results into GELU. The scalar gate $G_j$ is calculated for each edge-node pair:

$$G_j = \mu\big(h_j^{(l-1)} \cdot v^{(l-1)} + b^{(l-1)}\big),$$

where $\mu$ is the logistic sigmoid function, and $v$ and $b$ are trainable parameters.
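Putting the fusion layer and the gated update together, a hedged sketch of the fused GCN layer is given below; the exact shapes of the gate parameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedGatedGCNLayer(nn.Module):
    """Sketch of the graph fusion layer plus the gated, layer-normalized GCN update."""

    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim)                     # W^(l) and b^(l)
        self.gate_v = nn.Parameter(torch.zeros(dim))     # v^(l-1)
        self.gate_b = nn.Parameter(torch.zeros(1))       # b^(l-1)
        self.pi_logit = nn.Parameter(torch.zeros(1))     # trainable fusion gate (pre-sigmoid)
        self.norm = nn.LayerNorm(dim)

    def forward(self, H, D, L):
        # H: (n, dim) word states; D: explicit dependency matrix; L: latent soft matrix
        pi = torch.sigmoid(self.pi_logit)
        fused = pi * L + (1.0 - pi) * D                  # F = pi * L + (1 - pi) * D
        G = torch.sigmoid(H @ self.gate_v + self.gate_b) # scalar gate G_j per node, shape (n,)
        msgs = self.W(H) * G.unsqueeze(-1)               # gated messages G_j (W h_j + b)
        return F.gelu(self.norm(fused @ msgs))           # sum_j F_ij * msg_j, then Lnorm and GELU
```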

3.3 Training

Similar to our baseline (Zhang et al., 2019a), we linearize the AMR concept nodes by a pre-order traversal over the training dataset. We obtain gradient estimates of $E(\phi, \theta)$ through Monte Carlo sampling from:

$$E(\phi, \theta) = \mathbb{E}_{\mathcal{U}(0, I)}\big[\log P^{(node)}(u_t \mid g_\phi(u, w), \theta) + \log P_t^{(head)}(u_k \mid g_\phi(u, w), \theta) + \log P_{k,t}^{(label)}(l \mid g_\phi(u, w), \theta)\big] + \lambda\, \mathrm{covloss}_t,$$

where $u_t$ is the reference node at time step $t$ with reference head $u_k$, and $l$ is the reference edge label between $u_k$ and $u_t$. The form $g_\phi(u, w)$ is shorthand for the latent graph samples drawn from the uniform distribution through the HardKuma distribution (§3.1).

Different from Bastings et al. (2019), we do not limit the sparsity of the sampled latent graphs, i.e., we do not control the proportion of zeros in the latent graph, because we prefer to retain the probabilistic connection information of each word in $w$. Finally, we introduce a coverage loss into our estimation to reduce duplication in node generation (See et al., 2017).
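The paper does not spell out covloss_t; assuming it follows the standard coverage penalty of See et al. (2017), a single step's term looks like the sketch below, where the coverage vector (our naming) is the running sum of previous source attention distributions.

```python
import torch

def coverage_loss(a_src_t, coverage_t):
    """Sketch of the coverage penalty of See et al. (2017): attending again to
    source positions that already received attention mass is penalized.

    a_src_t    : (n,) source attention distribution at step t
    coverage_t : (n,) sum of the source attention distributions of steps < t
    """
    return torch.minimum(a_src_t, coverage_t).sum()
```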

3.4 Parsing

We directly generate the latent graph via the PDF of the HardKuma distribution with the trained parameters $a$ and $b$. In the concept identification stage, we decode the node from the final probability distribution $P^{(node)}(u_t)$ at each time step, and apply beam search to sequentially generate the concept nodes $u$ and deterministically assign the corresponding indices $d$. For edge prediction, we use the biaffine classifier to calculate the edge scores over the generated nodes $u$ and indices $d$:

$$S = \{\mathrm{score}^{(edge)}_{i,j} \mid 0 \le i, j \le m\}.$$

Similar to Zhang et al. (2019a), we apply a maximum spanning tree (MST) algorithm (Chu, 1965; Edmonds, 1967) to generate the complete AMR graph, and restore the re-entrance relations by merging the respective nodes via their indices.
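For concreteness, the MST step over the score matrix S can be illustrated with an off-the-shelf Chu-Liu/Edmonds implementation; the sketch below uses networkx and is not the authors' code, and the dummy-root convention is an assumption.

```python
import networkx as nx

def decode_edges(edge_scores):
    """Hedged sketch of MST decoding over the biaffine scores S.

    edge_scores[k][t] is score^(edge)_{k,t}; index 0 is assumed to be a dummy root.
    """
    m = len(edge_scores)
    g = nx.DiGraph()
    for k in range(m):
        for t in range(1, m):          # the dummy root receives no incoming arcs
            if k != t:
                g.add_edge(k, t, weight=edge_scores[k][t])
    # highest-scoring arborescence = maximum spanning tree of head -> dependent arcs;
    # re-entrance relations are restored afterwards by merging duplicated nodes.
    tree = nx.maximum_spanning_arborescence(g)
    return sorted(tree.edges())        # list of (head, dependent) pairs
```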

4 Experiments

4.1 Setup

We use two standard AMR corpora: AMR 1.0 (LDC2014T12) and AMR 2.0 (LDC2017T10). AMR 1.0 contains 13,051 sentences in total. AMR 2.0 is larger and is split into 36,521, 1,368 and 1,371 sentences for the training, development and test sets respectively. We treat AMR 2.0 as the main dataset in our experiments since it is larger.

Data      Parser                     F1 (%)
AMR 2.0   Cai and Lam (2019)         73.2
          Lyu and Titov (2018)       74.4±0.2
          Lindemann et al. (2019)    75.3±0.1
          Naseem et al. (2019)       75.5
          Zhang et al. (2019a)       76.3±0.1
            - w/o BERT               74.6
          Zhang et al. (2019b)       77.0±0.1
          Ours                       77.5±0.2
            - w/o BERT               75.5±0.2
AMR 1.0   Flanigan et al. (2016)     66.0
          Pust et al. (2015)         67.1
          Wang and Xue (2017)        68.1
          Guo and Lu (2018)          68.3±0.4
          Zhang et al. (2019a)       70.2±0.1
            - w/o BERT               68.8
          Zhang et al. (2019b)       71.3±0.1
          Ours                       71.8±0.2
            - w/o BERT               70.0±0.2

Table 1: Main results of SMATCH F1 on the AMR 2.0 (LDC2017T10) and AMR 1.0 (LDC2014T12) test sets. Results are evaluated over 3 runs.

We tune hyper-parameters on the development set, and store the checkpoint with the best development result for evaluation. We employ the pre-processing and post-processing methods of Zhang et al. (2019a), and obtain the syntactic dependencies via Stanford CoreNLP (Manning et al., 2014). We train our model jointly with the Adam optimizer (Kingma and Ba, 2015). The learning rate is decayed based on the development-set results during training. Training takes approximately 22 hours on two Nvidia GeForce GTX 2080 Ti GPUs.

4.2 Results

Main Results  We compare SMATCH F1 scores (Cai and Knight, 2013) against the previous best reported models and other recent AMR parsers. Table 1 summarizes the results on both the AMR 1.0 and AMR 2.0 data sets. For AMR 2.0, with the benefit of the fused structural information, we improve over our baseline (Zhang et al., 2019a) by 1.2% F1 with the full model, and 0.9% F1 is gained without pre-trained BERT embeddings.4 In addition, our model outperforms the best reported model (Zhang et al., 2019b) by 0.5% F1. On AMR 1.0, there are only about 10k sentences for training. We outperform the best reported results by 0.5% Smatch F1. We observe that on the smaller data set our model obtains a larger improvement over our baseline (1.6% F1) than on the larger data set (1.2% F1).

Metric        N'19   Z'19a   Z'19b   Ours
SMATCH        75.5   76.3    77      77.5
Unlabeled     80     79      80      80.4
No WSD        76     77      78      78.2
Reentrancies  56     60      61      61.1
Concepts      86     85      86      85.9
Named Ent.    83     78      79      78.8
Wikification  80     86      86      86.5
Negation      67     75      77      76.1
SRL           72     70      71      71.0

Table 2: Fine-grained F1 scores on the AMR 2.0 (LDC2017T10) test set. N'19 is Naseem et al. (2019); Z'19a is Zhang et al. (2019a); Z'19b is Zhang et al. (2019b).

Fine-grained Results  Table 2 shows fine-grained parsing results for each sub-task on AMR 2.0, evaluated with the enhanced AMR evaluation tool (Damonte et al., 2017). We notice that our model brings more than 1% average improvement over our baseline (Zhang et al., 2019a) on most sub-tasks; in particular, the Unlabeled metric gains 1.4% F1 with the structural information, and the No WSD, Reentrancies, Negation and SRL sub-tasks all improve by more than 1.0% under our graph encoder. In addition, our model achieves comparable results to the best reported method (Zhang et al., 2019b) on each sub-task.

Ablation Study  We investigate the impact of different kinds of structural information in our model on AMR 2.0 with the main sub-tasks.5 Table 3 shows that the fused structure performs better on most sub-tasks than the explicit and latent structures. In particular, the models with explicit structures (i.e., both explicit and fused) outperform the model with only the latent structure by 0.5% F1 on the Reentrancies sub-task, which demonstrates that explicit dependency information can improve this sub-task. The latent structure performs better on the Concepts sub-task, and the fused structure brings more information to the Negation sub-task, obtaining 0.5% and 1.0% improvements over the explicit and latent structures respectively.

Additionally, we notice that both the latent and explicit models outperform the previous best reported Smatch F1 score, and the fused model reaches the best result. This shows that different types of structural information can help AMR parsing; we discuss the connection tendencies of each structure in §4.3.

4 We use pre-trained bert-base embeddings without fine-tuning.

5 We set all the hyper-parameters to the same values.

Metric        Explicit   Latent   Fused
SMATCH        77.4       77.4     77.5
Unlabeled     80.2       80.1     80.4
Reentrancies  61.1       60.6     61.1
Concepts      85.6       86.0     85.7
Negation      75.6       75.1     76.1
SRL           70.8       70.9     71.0

Table 3: Ablation results on AMR 2.0 (LDC2017T10) for the different kinds of structural information in our model.

Graph Type   UAS
Fused        84.9%
Latent       64.1%

Table 4: The UAS of the fused and latent graphs, calculated against the corresponding explicit dependencies on the test set (we calculate the UAS by predicting the maximum-probability heads in the latent graph).

4.3 Discussion

Experimental results show that both the explicit structure and the latent structure can improve the performance of AMR parsing, and the latent structural information reduces errors in sub-tasks such as Concepts and SRL. Different from the discrete relations of the explicit structure, the internal latent structure holds soft connection probabilities between the words in the input sentence, so that each fully-connected word receives information from all the other words.

Figure 3: The latent soft adjacency matrix (a) and the fused soft adjacency matrix (b) of the input sentence “The boy came and left”.

Figure 3 depicts the latent and fused soft adjacency matrices of the input sentence “The boy came and left”. It can be seen that the latent matrix (Figure 3a) tries to retain information from most word pairs, and the AMR root “and” holds high connection probabilities to each word in the sentence. In addition, the main predicates and arguments in the sentence tend to be connected with high probabilities. The fused matrix (Figure 3b) holds similar connection probabilities for the predicates and arguments in the sentence as well, and it reduces the connection degrees to the determiner “The”, which does not appear in the corresponding AMR graph. Moreover, the syntactic root “came” and the semantic root “and” retain most of the connection probability to the other words.

We compare the connections in the different structures in Figure 4. The latent graph (Figure 4a) prefers to connect most words, and the main predicates and arguments in the graph have higher connection probabilities. The fused graph (Figure 4c) shows that our model provides core structural information with interpretable relations. Specifically, it holds potential relations similar to the annotated AMR graph, and tries to attenuate the connection information to words which are not aligned to AMR concept nodes.

Beyond that, we calculate the Unlabeled Attachment Score (UAS) of the fused and latent graphs in Table 4. The unsupervised latent graph captures fewer explicit edges than the fused graph, and both the fused and latent graphs ignore some arcs of the explicit graph. This shows that a lower UAS does not imply a lower AMR parsing score, and that some arcs that do not appear in the explicit gold trees are more useful to AMR parsing. Consequently, we preserve the explicit and latent structural information simultaneously. The latent structure can not only improve AMR parsing, but also has the ability to interpret the latent connections between input words.


Figure 4: Different structures of the sentence “The boy came and left”. (a): the latent graph; (b): the syntactic graph; (c): the fused graph; (d): the AMR graph. (We construct the latent and fused graphs by selecting the top 2 possible soft connections for each word; in addition, we ignore edges whose connection probabilities are less than 0.5.)

5 Related Work

Transition-based AMR parsers (Wang et al., 2016; Damonte et al., 2017; Wang and Xue, 2017; Liu et al., 2018; Guo and Lu, 2018; Naseem et al., 2019) suffer from the lack of annotated alignments between words and concept nodes, which are crucial in these models. Lyu and Titov (2018) treat the alignments as a latent variable in their probabilistic model, which jointly obtains the concept, relation and alignment variables. Sequence-to-sequence AMR parsers transform AMR graphs into serialized sequences by external traversal rules, and then restore the generated AMR sequence, avoiding the alignment issue (Konstas et al., 2017; van Noord and Bos, 2017). Moreover, Zhang et al. (2019a) extend a pointer-generator (See et al., 2017), which can generate a node multiple times without alignment through the copy mechanism.

With regard to latent structure, Naradowsky et al. (2012) couple syntactically-oriented NLP tasks with combinatorially constrained hidden syntactic representations. Bowman et al. (2016), Yogatama et al. (2017) and Choi et al. (2018) generate unsupervised constituent trees for text classification. The latent constituent trees are shallower than human-annotated ones, and they can boost the performance of downstream NLP tasks (e.g., text classification). Guo et al. (2019) and Ji et al. (2019) employ self-attention and biaffine attention mechanisms respectively to generate softly connected graphs, and then adopt GNNs to encode the soft structure so that their tasks benefit from the structural information.

GCNs and their variants are increasingly applied to embed syntactic and semantic structures in NLP tasks (Kipf and Welling, 2017; Marcheggiani and Titov, 2017; Damonte and Cohen, 2019). Syntactic GCNs try to alleviate the error propagation of external parsers with a gate mechanism: they encode both relations and labels with the gates and filter the output of each GCN layer over the dependencies (Marcheggiani and Titov, 2017; Bastings et al., 2017). Damonte and Cohen (2019) encode AMR graphs via GCNs to improve the AMR-to-text generation task.

6 Conclusion

We investigate latent structure for AMR parsing, and we show that the inferred latent graph can interpret the connection probabilities between input words. Experimental results show that the latent structural information improves over the best reported parsing performance on both AMR 2.0 (LDC2017T10) and AMR 1.0 (LDC2014T12). We also propose to incorporate the latent graph into other multi-task learning problems (Chen et al., 2019; Kurita and Søgaard, 2019).

Acknowledgments

We thank the anonymous reviewers for their detailed comments. We are grateful to Zhiyang Teng for his discussions and suggestions. This work is supported by the National Natural Science Foundation of China (NSFC-61772378), the National Key Research and Development Program of China (No. 2017YFC1200500) and the Major Projects of the National Social Science Foundation of China (No. 11&ZD189). We also would like to acknowledge funding support from Westlake University and the Bright Dream Joint Institute for Intelligent Robotics.

References

Yoav Artzi, Kenton Lee, and Luke Zettlemoyer. 2015. Broad-coverage CCG semantic parsing with AMR. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1699–1710, Lisbon, Portugal. Association for Computational Linguistics.

Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. CoRR, abs/1607.06450.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186, Sofia, Bulgaria. Association for Computational Linguistics.

Joost Bastings, Wilker Aziz, and Ivan Titov. 2019. Interpretable neural predictions with differentiable binary variables. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2963–2977, Florence, Italy. Association for Computational Linguistics.

Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Sima'an. 2017. Graph convolutional encoders for syntax-aware neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pages 1957–1967.

Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, and Christopher Potts. 2016. A fast unified model for parsing and sentence understanding. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers.

Deng Cai and Wai Lam. 2019. Core semantic first: A top-down approach for AMR parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3797–3807, Hong Kong, China. Association for Computational Linguistics.

Shu Cai and Kevin Knight. 2013. Smatch: an evaluation metric for semantic feature structures. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 748–752, Sofia, Bulgaria. Association for Computational Linguistics.

Mingda Chen, Qingming Tang, Sam Wiseman, and Kevin Gimpel. 2019. A multi-task approach for disentangling syntax and semantics in sentence representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2453–2464, Minneapolis, Minnesota. Association for Computational Linguistics.

Jihun Choi, Kang Min Yoo, and Sang-goo Lee. 2018. Learning to compose task-specific tree structures. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 5094–5101.

Yoeng-Jin Chu. 1965. On the shortest arborescence of a directed graph. Scientia Sinica, 14:1396–1400.

Marco Damonte and Shay B. Cohen. 2019. Structural neural encoders for AMR-to-text generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3649–3658, Minneapolis, Minnesota. Association for Computational Linguistics.

Marco Damonte, Shay B. Cohen, and Giorgio Satta. 2017. An incremental parser for abstract meaning representation. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 536–546, Valencia, Spain. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186.

Shibhansh Dohare and Harish Karnick. 2017. Text summarization using abstract meaning representation. CoRR, abs/1706.01678.

Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.

Jack Edmonds. 1967. Optimum branchings. Journal of Research of the National Bureau of Standards B, 71(4):233–240.

Jeffrey Flanigan, Chris Dyer, Noah A. Smith, and Jaime Carbonell. 2016. CMU at SemEval-2016 task 8: Graph-based AMR parsing with infinite ramp loss. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 1202–1206, San Diego, California. Association for Computational Linguistics.

Jeffrey Flanigan, Sam Thomson, Jaime Carbonell, Chris Dyer, and Noah A. Smith. 2014. A discriminative graph-based parser for the abstract meaning representation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1426–1436, Baltimore, Maryland. Association for Computational Linguistics.

Zhijiang Guo and Wei Lu. 2018. Better transition-based AMR parsing with a refined search space. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1712–1722, Brussels, Belgium. Association for Computational Linguistics.

Zhijiang Guo, Yan Zhang, and Wei Lu. 2019. Attention guided graph convolutional networks for relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 241–251, Florence, Italy. Association for Computational Linguistics.

Dan Hendrycks and Kevin Gimpel. 2016. Bridging nonlinearities and stochastic regularizers with Gaussian error linear units. CoRR, abs/1606.08415.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Tao Ji, Yuanbin Wu, and Man Lan. 2019. Graph-based dependency parsing with graph neural networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2475–2485, Florence, Italy. Association for Computational Linguistics.

Bevan Jones, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, and Kevin Knight. 2012. Semantics-based machine translation with hyperedge replacement grammars. In Proceedings of COLING 2012, pages 1359–1376, Mumbai, India. The COLING 2012 Organizing Committee.

Yoon Kim, Yacine Jernite, David A. Sontag, and Alexander M. Rush. 2016. Character-aware neural language models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, pages 2741–2749.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.

Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings.

Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.

Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, and Luke Zettlemoyer. 2017. Neural AMR: Sequence-to-sequence models for parsing and generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 146–157, Vancouver, Canada. Association for Computational Linguistics.

Ponnambalam Kumaraswamy. 1980. A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2):79–88.

Shuhei Kurita and Anders Søgaard. 2019. Multi-task semantic dependency parsing with policy gradient for learning easy-first strategies. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2420–2430, Florence, Italy. Association for Computational Linguistics.

Matthias Lindemann, Jonas Groschwitz, and Alexander Koller. 2019. Compositional semantic parsing across graphbanks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4576–4585, Florence, Italy. Association for Computational Linguistics.

Fei Liu, Jeffrey Flanigan, Sam Thomson, Norman Sadeh, and Noah A. Smith. 2015. Toward abstractive summarization using semantic representations. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1077–1086, Denver, Colorado. Association for Computational Linguistics.

Yijia Liu, Wanxiang Che, Bo Zheng, Bing Qin, and Ting Liu. 2018. An AMR aligner tuned by transition-based parser. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 2422–2430.

Christos Louizos, Max Welling, and Diederik P. Kingma. 2017. Learning sparse neural networks through L0 regularization. CoRR, abs/1712.01312.

Chunchuan Lyu and Ivan Titov. 2018. AMR parsing as graph prediction with latent alignment. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 397–407, Melbourne, Australia. Association for Computational Linguistics.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60.

Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1506–1515, Copenhagen, Denmark. Association for Computational Linguistics.

Arindam Mitra and Chitta Baral. 2016. Addressing a question answering challenge by combining statistical methods with inductive rule learning and reasoning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, pages 2779–2785.

Eric T. Nalisnick and Padhraic Smyth. 2017. Stick-breaking variational autoencoders. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.

Jason Naradowsky, Sebastian Riedel, and David Smith. 2012. Improving NLP through marginalization of hidden syntactic structure. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 810–820, Jeju Island, Korea. Association for Computational Linguistics.

Tahira Naseem, Abhishek Shah, Hui Wan, Radu Florian, Salim Roukos, and Miguel Ballesteros. 2019. Rewarding Smatch: Transition-based AMR parsing with reinforcement learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4586–4592, Florence, Italy. Association for Computational Linguistics.

Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34(4):513–553.

Rik van Noord and Johan Bos. 2017. Neural semantic parsing by character-based translation: Experiments with abstract meaning representations. CoRR, abs/1705.09980.

Hao Peng, Sam Thomson, and Noah A. Smith. 2017. Deep multitask learning for semantic dependency parsing. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2037–2048, Vancouver, Canada. Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

Michael Pust, Ulf Hermjakob, Kevin Knight, Daniel Marcu, and Jonathan May. 2015. Parsing English into abstract meaning representation using syntax-based machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1143–1154, Lisbon, Portugal. Association for Computational Linguistics.

Mike Schuster and Kuldip K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083, Vancouver, Canada. Association for Computational Linguistics.

Linfeng Song, Daniel Gildea, Yue Zhang, Zhiguo Wang, and Jinsong Su. 2019. Semantic neural machine translation using AMR. Transactions of the Association for Computational Linguistics, 7:19–31.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5998–6008.

Chuan Wang, Sameer Pradhan, Xiaoman Pan, Heng Ji, and Nianwen Xue. 2016. CAMR at SemEval-2016 task 8: An extended transition-based AMR parser. In Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2016, San Diego, CA, USA, June 16-17, 2016, pages 1173–1178.


Chuan Wang and Nianwen Xue. 2017. Getting the most out of AMR parsing. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1257–1268, Copenhagen, Denmark. Association for Computational Linguistics.

Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, and Wang Ling. 2017. Learning to compose words into sentences with reinforcement learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.

Sheng Zhang, Xutai Ma, Kevin Duh, and Benjamin Van Durme. 2019a. AMR parsing as sequence-to-graph transduction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 80–94, Florence, Italy. Association for Computational Linguistics.

Sheng Zhang, Xutai Ma, Kevin Duh, and Benjamin Van Durme. 2019b. Broad-coverage semantic parsing as transduction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3784–3796, Hong Kong, China. Association for Computational Linguistics.

Sheng Zhang, Xutai Ma, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2018a. Cross-lingual decompositional semantic parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1664–1675, Brussels, Belgium. Association for Computational Linguistics.

Yuhao Zhang, Peng Qi, and Christopher D. Manning. 2018b. Graph convolution over pruned dependency trees improves relation extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2205–2215, Brussels, Belgium. Association for Computational Linguistics.


A Appendix

A.1 Details of Model Structures and Parameters

GloVe embeddings          dim 300
BERT embeddings           source BERT-base-cased; dim 768
POS tag embeddings        dim 100
CharCNN                   num filters 100; ngram filter sizes [3]
Graph Encoder             gcn hidden dim 512; gcn layers 1
Latent Graph Generator    HardKuma support [-0.1, 1.1]; k dim 64; v dim 64; n heads 8
Encoder                   hidden size 512; num layers 2
Decoder                   hidden size 1024; num layers 2
Deep Biaffine Classifier  edge hidden size 256; label hidden size 128
Optimizer                 type Adam; learning rate 0.001; max grad norm 5.0
Coverage loss weight λ    1.0
Beam size                 5
Dropout                   0.33
Batch size                train batch size 64; test batch size 32

Table 5: Hyper-parameter settings.

We select the best hyper-parameters based on the results on the development set, and we fix the hyper-parameters at the test stage. We use a two-layer highway LSTM as the encoder and a two-layer LSTM as the decoder for the align-free node generator. Table 5 shows the details.

A.2 More Examples

To discuss the generated latent graph in different situations, we provide two examples from the test set below.

Figure 5 gives the analysis of an interrogative sentence: “What advice could you give me?”. It shows that the latent graph of the sentence tends to hold the most information between predicates and arguments. Both the AMR root “advice” and the dependency root “give” receive more attention from the other words, and the fused graph retains more information about the predicates and arguments in the original sentence as well.

For a longer sentence with multiple predicate-argument structures, Figure 6 depicts the latent and fused graphs of the sentence “You could go to the library on saturdays and do a good 8 hours of studying there.”. In this case, the corresponding latent graph becomes shallower, and the AMR root “and” holds the most information from the other words. Besides, the fused graph indicates that predicates receive more information from other words, and, to some extent, phrases tend to be connected by the fused graph generator.


Figure 5: An analysis of the sentence “What advice could you give me?”.

Figure 6: An analysis of the sentence “You could go to the library on Saturdays and do a good 8 hours of studying there.”.